All members of each family were analyzed on the same array version: either the Illumina IMv1 (334 families) or Illumina IMv3 Duo (840 families) Bead array. The two arrays share 1,040,853 probes (representing 97% of probes on the IMv1 and 87% of probes on the IMv3). Of the 872 quartet families, 824 (94.5%) had all members hybridized and scanned simultaneously on the Illumina iScan in an effort
to minimize batch effects and technical variation. Genotyped samples were analyzed by using PLINK (Purcell et al., 2007) to identify incorrect sex, Mendelian inconsistencies, and cryptic relatedness by assessing inheritance by descent; 11 families were removed as a result.

CNV detection was performed by using three algorithms: (1) PennCNV Revision 220, (2) QuantiSNP v1.1, and (3) GNOSIS. PennCNV
and QuantiSNP are both based on hidden Markov models. GNOSIS uses a continuous distribution function to fit the intensity values from the HapMap data and determine thresholds for significant points in the tails of the distribution that are used to detect copy-number changes. Analysis and merging of CNV predictions were performed with CNVision (www.CNVision.org), an in-house script. Specific genotyping and CNV parameters are detailed in the Supplemental Experimental Procedures. Five percent of the samples failed and were rerun; 39 families were removed because of repeated failures. A CNV was classified as rare if ≤50% of its length overlapped regions present at >1% frequency in the March 2010 release of the Database of Genomic Variants (DGV).

Burden analyses were performed on the matched set of 872 probands and siblings. Typically, three outcomes were
assessed: proportion of individuals with ≥1 CNV matching the criteria (p value calculated with Fisher’s exact test); number of CNVs matching the criteria (p value calculated with sign test); and number of RefSeq genes within or overlapping CNVs matching the criteria (p value calculated with Wilcoxon paired test). Where burden was assessed for unequal numbers of probands and siblings (e.g., by sex), the sign test and Wilcoxon paired test were replaced with the Wilcoxon test.

To determine the probability of finding multiple rare de novo CNVs at the same location in probands, we first estimated how many likely positions in the genome were contributing to the observed de novo CNVs in siblings. As there are widely varying mutation rates for structural variation across the genome (Fu et al., 2010), some positions are more likely to result in de novo CNVs observed in our sample than others. Consequently, the likely number of positions is much smaller than the total possible number of positions. We refer to the likely CNV regions as effective copy-number-variable regions (eCNVRs) and calculate their quantity “C” using the so-called “unseen species problem,” which uses the frequency and number of observed CNV types (or species) to infer how many species are present in the population.
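
To illustrate the general idea behind an unseen-species estimate, the following minimal sketch applies the bias-corrected Chao1 estimator, one standard approach to this problem. The text above does not state which estimator was used to obtain C, so the choice of Chao1, the function name `chao1_estimate`, and the toy region labels are all assumptions made for illustration only.

```python
from collections import Counter


def chao1_estimate(region_labels):
    """Bias-corrected Chao1 estimate of the total number of distinct regions.

    region_labels: one label per observed de novo CNV, naming the region
    ("species") in which it falls. Labels here are hypothetical.
    """
    counts = Counter(region_labels)           # observations per region
    freq_of_freq = Counter(counts.values())   # number of regions seen k times
    s_obs = len(counts)                       # distinct regions observed
    f1 = freq_of_freq.get(1, 0)               # regions seen exactly once
    f2 = freq_of_freq.get(2, 0)               # regions seen exactly twice
    # Many singletons relative to doubletons imply many regions not yet seen.
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))


# Hypothetical toy data: seven de novo CNVs falling in five distinct regions.
observed = [
    "region_A",
    "region_B", "region_B",
    "region_C",
    "region_D", "region_D",
    "region_E",
]
c_hat = chao1_estimate(observed)
print(f"Observed regions: {len(set(observed))}, estimated total: {c_hat}")
```

In this toy example, three regions are seen once and two are seen twice, so the estimate adds one unseen region to the five observed. An estimate of this kind plays the role of C: a count of the positions that are realistically contributing de novo CNVs, rather than every possible position in the genome.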