Detection of copy number variation in whole-exome targeted DNA sequencing of schizophrenia patients

October 6, 2011
2:50 pm - 4:00 pm
Halligan 111


There has been much recent debate regarding the genetic architecture underlying susceptibility to schizophrenia and other neuropsychiatric diseases. Nonetheless, it is clear that rare copy number variations (CNVs) play a role in the pathology of schizophrenia. Most work on this topic has been done using pre-defined single-nucleotide polymorphism (SNP) arrays. To study this phenomenon at higher genomic resolution and identify rare CNVs (both deletions and duplications), we have undertaken high-coverage whole-exome targeted resequencing of 1000 matched Swedish schizophrenia cases and controls. Specifically, we use the high-coverage sequencing read depth as a proxy for copy number: we aim to identify regions where a particular sample shows significant depletion or enrichment in sequenced reads, and to infer a deletion or duplication CNV, respectively. However, this effort is confounded by multiple known factors, including the GC content of the targeted exome intervals, the size and sequence complexity of these intervals, sequencing batch, proximity to segmental duplications, and nucleotide-level population variation, all of which may lead to differential capture of exomic sequence or differential sequencing efficiency. Moreover, there are presumably multiple unknown factors, and interactions among all such factors, that lead to sample-specific variability in read depth that does not directly result from genomic copy number variation. To overcome these issues, we have performed rigorous, data-driven statistical normalization using a principal component analysis (PCA) approach. We detected and removed the read depth variation driven by these factors, resulting in a de-noised data set. We then made CNV calls on this data set using a hidden Markov model (HMM) algorithm that segments the exome into diploid, deletion, and duplication regions for each sample.
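
The two steps named in the abstract (PCA-based de-noising of read depth, then HMM segmentation) can be illustrated with a small sketch. This is not the speaker's implementation. It assumes a samples-by-targets read depth matrix, removes a fixed number of top principal components, z-scores each sample, and runs a three-state Viterbi decoder with unit-variance Gaussian emissions. All function names, the state means of -3, 0, and 3, the transition probability, and the synthetic data in `demo` are hypothetical choices made only for illustration.

```python
import numpy as np

DEL, DIP, DUP = 0, 1, 2  # hypothetical state labels: deletion, diploid, duplication


def pca_normalize(depth, n_remove=1):
    """Remove the top `n_remove` principal components from a
    samples x targets read depth matrix (assumed layout)."""
    # Center each target (column) across samples.
    centered = depth - depth.mean(axis=0, keepdims=True)
    # Rows of vt are the principal directions in target space.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    top = vt[:n_remove]
    # Subtract the projection of every sample onto the removed components.
    return centered - (centered @ top.T) @ top


def zscore_rows(x):
    """Z-score each sample (row) across its targets."""
    return (x - x.mean(axis=1, keepdims=True)) / x.std(axis=1, keepdims=True)


def viterbi_cnv(z, means=(-3.0, 0.0, 3.0), p_stay=0.999):
    """Most likely DEL/DIP/DUP path for one sample's z-scores.
    Emissions are unit-variance Gaussians; parameters are illustrative."""
    z = np.asarray(z, dtype=float)
    mu = np.asarray(means, dtype=float)
    n_states = len(mu)
    # Log emission scores, up to a constant shared by all states.
    log_emit = -0.5 * (z[:, None] - mu[None, :]) ** 2
    log_trans = np.full((n_states, n_states),
                        np.log((1.0 - p_stay) / (n_states - 1)))
    np.fill_diagonal(log_trans, np.log(p_stay))
    score = log_emit[0] + np.log(1.0 / n_states)  # uniform start
    back = np.zeros((len(z), n_states), dtype=int)
    for t in range(1, len(z)):
        cand = score[:, None] + log_trans  # cand[i, j]: state i -> state j
        back[t] = cand.argmax(axis=0)
        score = cand.max(axis=0) + log_emit[t]
    # Trace back the best path from the final target.
    path = np.empty(len(z), dtype=int)
    path[-1] = score.argmax()
    for t in range(len(z) - 1, 0, -1):
        path[t - 1] = back[t, path[t]]
    return path


def demo(seed=0):
    """Synthetic data: one batch-like confounder plus one planted deletion."""
    rng = np.random.default_rng(seed)
    n_samples, n_targets = 100, 200
    batch = rng.choice([-1.0, 1.0], size=n_samples)     # e.g. sequencing batch
    loading = rng.normal(0.0, 20.0, size=n_targets)     # per-target sensitivity
    depth = (100.0 + np.outer(batch, loading)
             + rng.normal(0.0, 1.0, size=(n_samples, n_targets)))
    depth[0, 80:90] -= 50.0  # heterozygous deletion in sample 0, targets 80-89
    z = zscore_rows(pca_normalize(depth, n_remove=1))
    return np.array([viterbi_cnv(row) for row in z])


if __name__ == "__main__":
    calls = demo()
    print("sample 0 deletion targets:", np.flatnonzero(calls[0] == DEL))
```

One caveat of this kind of normalization is that the number of removed components matters: removing too many can also strip out variance caused by a true CNV, so in practice that number would need to be chosen from the data rather than fixed as it is here.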


Menachem obtained BSc degrees in both Life Sciences and Computer Science at the Hebrew University of Jerusalem. He continued there for a joint MSc in Computer Science and Genomics/Bioinformatics, followed by a PhD in Computer Science. He currently holds a staff scientist position at the Stanley Center for Psychiatric Research at the Broad Institute of MIT and Harvard. His research interests have included protein family clustering, probabilistic graphical models, discrete optimization, protein design, and detection of structural genetic variation.