Results 1 - 18 of 18
1.
Res Sq ; 2024 May 10.
Article in English | MEDLINE | ID: mdl-38766095

ABSTRACT

Rare variants, which make up the vast majority of human genetic variation, are likely to have a more deleterious impact on human disease than common variants. Here we present the carrier statistic, a statistical framework that prioritizes disease-related rare variants by integrating gene expression data. By quantifying the impact of rare variants on gene expression, the carrier statistic can prioritize rare variants that have large functional consequences in diseased patients. Through simulation studies and analysis of a real multi-omics dataset, we demonstrated that the carrier statistic is applicable to studies with limited sample sizes (a few hundred) and achieves substantially higher sensitivity than existing rare-variant association methods. Application to Alzheimer's disease reveals 16 rare variants within 15 genes with extreme carrier statistics. We also found a strong excess of rare variants among the top prioritized genes in diseased patients compared with healthy individuals. The carrier statistic method can be applied to various rare variant types and is adaptable to other omics data modalities, offering a powerful tool for investigating the molecular mechanisms underlying complex diseases.
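
The abstract does not define the carrier statistic itself. Purely as an illustration of what quantifying the impact of a rare variant on gene expression can look like, the sketch below scores a variant by the mean z-score of its carriers' expression relative to non-carriers of the same gene. The function name `carrier_zscore` and this scoring rule are assumptions made here for illustration, not the published formula.

```python
import numpy as np

def carrier_zscore(expression, is_carrier):
    """Hypothetical rare-variant score (illustration only, NOT the published
    carrier statistic): mean z-score of carriers' expression, standardized
    against the non-carrier distribution for the same gene."""
    expression = np.asarray(expression, dtype=float)
    is_carrier = np.asarray(is_carrier, dtype=bool)
    background = expression[~is_carrier]
    if background.size < 2 or not is_carrier.any():
        raise ValueError("need at least one carrier and two non-carriers")
    mu = background.mean()
    sd = background.std(ddof=1)  # sample standard deviation of non-carriers
    if sd == 0:
        raise ValueError("non-carrier expression has zero variance")
    return float(((expression[is_carrier] - mu) / sd).mean())

# Toy data: eight individuals; the last one carries the rare variant
# and shows unusually low expression of the gene.
expr = [10.1, 9.8, 10.3, 10.0, 9.9, 10.2, 10.0, 4.0]
carrier = [False] * 7 + [True]
print(f"carrier z-score: {carrier_zscore(expr, carrier):.2f}")
```

A large negative or positive score flags a variant whose carriers sit far outside the expression range of non-carriers; how the published method handles multiple carriers, covariates, and significance is not described in the abstract.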

2.
bioRxiv ; 2024 Mar 21.
Article in English | MEDLINE | ID: mdl-38562756

ABSTRACT

Rare variants, which make up the vast majority of human genetic variation, are likely to have a more deleterious impact on human disease than common variants. Here we present the carrier statistic, a statistical framework that prioritizes disease-related rare variants by integrating gene expression data. By quantifying the impact of rare variants on gene expression, the carrier statistic can prioritize rare variants that have large functional consequences in diseased patients. Through simulation studies and analysis of a real multi-omics dataset, we demonstrated that the carrier statistic is applicable to studies with limited sample sizes (a few hundred) and achieves substantially higher sensitivity than existing rare-variant association methods. Application to Alzheimer's disease reveals 16 rare variants within 15 genes with extreme carrier statistics. The carrier statistic method can be applied to various rare variant types and is adaptable to other omics data modalities, offering a powerful tool for investigating the molecular mechanisms underlying complex diseases.

3.
Science ; 375(6583): eabh3021, 2022 02 25.
Article in English | MEDLINE | ID: mdl-35201886

ABSTRACT

Sleep quality declines with age; however, the underlying mechanisms remain elusive. We found that hyperexcitable hypocretin/orexin (Hcrt/OX) neurons drive sleep fragmentation during aging. In aged mice, Hcrt neurons exhibited more frequent neuronal activity epochs driving wake bouts, and optogenetic activation of Hcrt neurons elicited more prolonged wakefulness. Aged Hcrt neurons showed hyperexcitability with lower KCNQ2 expression and impaired M-current, mediated by KCNQ2/3 channels. Single-nucleus RNA-sequencing revealed adaptive changes to Hcrt neuron loss in the aging brain. Disruption of Kcnq2/3 genes in Hcrt neurons of young mice destabilized sleep, mimicking aging-associated sleep fragmentation, whereas the KCNQ-selective activator flupirtine hyperpolarized Hcrt neurons and rejuvenated sleep architecture in aged mice. Our findings demonstrate a mechanism underlying sleep instability during aging and a strategy to improve sleep continuity.


Subject(s)
Aging , Neurons/physiology , Orexins/physiology , Sleep Deprivation/physiopathology , Sleep , Wakefulness , Aminopyridines/pharmacology , Animals , CRISPR-Cas Systems , Electroencephalography , Electromyography , Female , Hypothalamic Area, Lateral/physiopathology , KCNQ2 Potassium Channel/genetics , KCNQ2 Potassium Channel/metabolism , KCNQ3 Potassium Channel/genetics , KCNQ3 Potassium Channel/metabolism , Male , Mice , Narcolepsy/genetics , Narcolepsy/physiopathology , Nerve Tissue Proteins/genetics , Nerve Tissue Proteins/metabolism , Neural Pathways , Optogenetics , Patch-Clamp Techniques , RNA-Seq , Sleep Quality
4.
Sci Rep ; 10(1): 7933, 2020 05 13.
Article in English | MEDLINE | ID: mdl-32404971

ABSTRACT

ChIP-seq is one of the core experimental resources available for understanding genome-wide epigenetic interactions and identifying the functional elements associated with diseases. The analysis of ChIP-seq data is important but poses a difficult computational challenge because of irregular noise and bias at various levels. Although many peak-calling methods have been developed, current computational tools still sometimes require manual inspection using data visualization. However, the huge volumes of ChIP-seq data make it almost impossible for human researchers to uncover all the peaks manually. Convolutional neural networks (CNNs), which are capable of achieving human-like classification accuracy, can be applied to this challenging problem. In this study, we design a novel supervised learning approach for identifying ChIP-seq peaks using CNNs and integrate it into a software pipeline called CNN-Peaks. As training data for our model, we use genomic segments in which human researchers have annotated the presence or absence of peaks. The trained model is then applied to predict peaks in previously unseen genomic segments from multiple ChIP-seq datasets, including benchmark datasets commonly used to validate peak-calling methods. We observe performance superior to that of previous methods.


Subject(s)
Chromatin Immunoprecipitation Sequencing , Computational Biology/methods , Neural Networks, Computer , Software , Algorithms , Binding Sites , Chromatin Immunoprecipitation Sequencing/methods , Databases, Nucleic Acid , Epigenesis, Genetic , Epigenomics/methods , Histones/metabolism , Humans , Nucleotide Motifs , Protein Binding , Transcription Initiation Site
5.
BMC Genomics ; 18(1): 321, 2017 04 24.
Article in English | MEDLINE | ID: mdl-28438122

ABSTRACT

BACKGROUND: High-resolution microarray technology is routinely used in basic research and clinical practice to efficiently detect copy number variants (CNVs) across the entire human genome. A new generation of arrays combining high probe densities with optimized designs will be an essential tool for genome analysis in the coming years. We systematically compared the genome-wide CNV detection power of all 17 available array designs from the Affymetrix, Agilent, and Illumina platforms by hybridizing the well-characterized genome of 1000 Genomes Project subject NA12878 to all arrays, and performing data analysis using both manufacturer-recommended and platform-independent software. We benchmarked the resulting CNV call sets from each array using a gold standard set of CNVs for this genome derived from 1000 Genomes Project whole genome sequencing data.
RESULTS: The arrays tested comprise both SNP and aCGH platforms with varying designs and contain between ~0.5 and ~4.6 million probes. Across the arrays, CNV detection varied widely in number of CNV calls (4-489), CNV size range (~40 bp to ~8 Mbp), and percentage of non-validated CNVs (0-86%). We discovered strikingly strong effects of specific array design principles on performance. For example, some SNP array designs with the largest numbers of probes and extensive exonic coverage produced a considerable number of CNV calls that could not be validated, compared to designs with probe numbers that are sometimes an order of magnitude smaller. This effect was only partially ameliorated by using different analysis software and optimizing data analysis parameters.
CONCLUSIONS: High-resolution microarrays will continue to be used as reliable, cost- and time-efficient tools for CNV analysis. However, different applications tolerate different limitations in CNV detection. Our study quantified how these arrays differ in total number and size range of detected CNVs as well as sensitivity, and determined how each array balances these attributes. This analysis will inform appropriate array selection for future CNV studies, and allow better assessment of the CNV-analytical power of both published and ongoing array-based genomics studies. Furthermore, our findings emphasize the importance of concurrent use of multiple analysis algorithms and independent experimental validation in array-based CNV detection studies.
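
Benchmarking call sets against a gold standard requires a rule for deciding when a call counts as validated. A minimal sketch, assuming the common 50% reciprocal-overlap criterion (the abstract does not state which criterion the study used) and a hypothetical `(chromosome, start, end, type)` tuple layout; the function names are illustrative only:

```python
from typing import List, Tuple

# Hypothetical layout: (chromosome, start, end, cnv_type), half-open coordinates.
Interval = Tuple[str, int, int, str]

def reciprocal_overlap(a: Interval, b: Interval) -> float:
    """Overlap length divided by the longer interval (the smaller of the two
    overlap fractions); 0.0 if the intervals do not overlap."""
    if a[0] != b[0]:
        return 0.0
    overlap = min(a[2], b[2]) - max(a[1], b[1])
    if overlap <= 0:
        return 0.0
    return min(overlap / (a[2] - a[1]), overlap / (b[2] - b[1]))

def percent_non_validated(calls: List[Interval], gold: List[Interval],
                          min_ro: float = 0.5) -> float:
    """Percentage of calls with no same-type gold-standard CNV at
    >= min_ro reciprocal overlap."""
    if not calls:
        return 0.0
    unmatched = sum(
        1 for c in calls
        if not any(c[3] == g[3] and reciprocal_overlap(c, g) >= min_ro for g in gold)
    )
    return 100.0 * unmatched / len(calls)

# Toy example: one call matches the gold standard, two do not.
gold = [("chr1", 1000, 5000, "DEL"), ("chr2", 20000, 30000, "DUP")]
calls = [("chr1", 1200, 4800, "DEL"),
         ("chr2", 29000, 40000, "DUP"),
         ("chr3", 0, 100, "DEL")]
print(f"{percent_non_validated(calls, gold):.1f}% non-validated")
```

Reciprocal overlap penalizes both oversized and undersized calls, which matters when arrays differ as widely in CNV size range as reported above.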


Subject(s)
DNA Copy Number Variations , Genome, Human/genetics , Oligonucleotide Array Sequence Analysis/methods , Cytogenetics , Genomics , Humans
6.
Nat Genet ; 47(2): 100-1, 2015 Feb.
Article in English | MEDLINE | ID: mdl-25627897

ABSTRACT

Large copy number variants (CNVs) are strongly associated with morphogenetic processes and common neurodevelopmental disorders. A new study uses the example of Williams-Beuren syndrome (WBS) and Williams-Beuren region duplication syndrome to illustrate how induced pluripotent stem cells (iPSCs) and next-generation genomics can lead to a better understanding of complex genetics.


Subject(s)
Chromosomes, Human, Pair 7/genetics , DNA Copy Number Variations , Gene Expression Regulation/genetics , Pluripotent Stem Cells/physiology , Transcription Factors, TFII/genetics , Williams Syndrome/genetics , Humans
7.
Nature ; 492(7429): 438-42, 2012 Dec 20.
Article in English | MEDLINE | ID: mdl-23160490

ABSTRACT

Reprogramming somatic cells into induced pluripotent stem cells (iPSCs) has been suspected of causing de novo copy number variation. To explore this issue, here we perform a whole-genome and transcriptome analysis of 20 human iPSC lines derived from the primary skin fibroblasts of seven individuals using next-generation sequencing. We find that, on average, an iPSC line manifests two copy number variants (CNVs) not apparent in the fibroblasts from which the iPSC was derived. Using PCR and digital droplet PCR, we show that at least 50% of those CNVs are present as low-frequency somatic genomic variants in parental fibroblasts (that is, the fibroblasts from which each corresponding human iPSC line is derived), and are manifested in iPSC lines owing to their clonal origin. Hence, reprogramming does not necessarily lead to de novo CNVs in iPSCs, because most of the line-manifested CNVs reflect somatic mosaicism in the human skin. Moreover, our findings demonstrate that clonal expansion, and iPSC lines in particular, can be used as a discovery tool to reliably detect low-frequency CNVs in the tissue of origin. Overall, we estimate that approximately 30% of the fibroblast cells have somatic CNVs in their genomes, suggesting widespread somatic mosaicism in the human body. Our study paves the way to understanding the fundamental question of the extent to which cells of the human body normally acquire structural alterations in their DNA post-zygotically.


Subject(s)
DNA Copy Number Variations/genetics , Induced Pluripotent Stem Cells/metabolism , Mosaicism , Skin/metabolism , Cell Differentiation , Cells, Cultured , Cellular Reprogramming , Clone Cells , Fibroblasts/cytology , Gene Expression Profiling , Genome, Human/genetics , Humans , Induced Pluripotent Stem Cells/cytology , Male , Neurons/cytology , Polymerase Chain Reaction , Reproducibility of Results , Skin/cytology
8.
Proc Natl Acad Sci U S A ; 109(44): 18018-23, 2012 Oct 30.
Article in English | MEDLINE | ID: mdl-23043118

ABSTRACT

Genetic variation between individuals has been extensively investigated, but differences between tissues within individuals are far less understood. It is commonly assumed that all healthy cells that arise from the same zygote possess the same genomic content, with a few known exceptions in the immune system and germ line. However, a growing body of evidence shows that genomic variation exists between differentiated tissues. We investigated the scope of somatic genomic variation between tissues within humans. Analysis of copy number variation by high-resolution array-comparative genomic hybridization in diverse tissues from six unrelated subjects reveals a significant number of intraindividual genomic changes between tissues. Many (79%) of these events affect genes. Our results have important consequences for understanding normal genetic and phenotypic variation within individuals, and they have significant implications for both the etiology of genetic diseases such as cancer and for immortalized cell lines that might be used in research and therapeutics.


Subject(s)
Genetic Variation , Comparative Genomic Hybridization , Gene Dosage , Humans
9.
Nature ; 470(7332): 59-65, 2011 Feb 03.
Article in English | MEDLINE | ID: mdl-21293372

ABSTRACT

Genomic structural variants (SVs) are abundant in humans, differing from other forms of variation in extent, origin and functional impact. Despite progress in SV characterization, the nucleotide resolution architecture of most SVs remains unknown. We constructed a map of unbalanced SVs (that is, copy number variants) based on whole genome DNA sequencing data from 185 human genomes, integrating evidence from complementary SV discovery approaches with extensive experimental validations. Our map encompassed 22,025 deletions and 6,000 additional SVs, including insertions and tandem duplications. Most SVs (53%) were mapped to nucleotide resolution, which facilitated analysing their origin and functional impact. We examined numerous whole and partial gene deletions with a genotyping approach and observed a depletion of gene disruptions amongst high frequency deletions. Furthermore, we observed differences in the size spectra of SVs originating from distinct formation mechanisms, and constructed a map of SV hotspots formed by common mechanisms. Our analytical framework and SV map serve as a resource for sequencing-based association studies.


Subject(s)
DNA Copy Number Variations/genetics , Genetics, Population , Genome, Human/genetics , Genomics , Gene Duplication/genetics , Genetic Predisposition to Disease/genetics , Genotype , Humans , Mutagenesis, Insertional/genetics , Reproducibility of Results , Sequence Analysis, DNA , Sequence Deletion/genetics
10.
J Child Psychol Psychiatry ; 52(4): 504-16, 2011 Apr.
Article in English | MEDLINE | ID: mdl-21204834

ABSTRACT

The study of the developing brain has begun to shed light on the underpinnings of both early and adult onset neuropsychiatric disorders. Neuroimaging of the human brain across developmental time points and the use of model animal systems have combined to reveal brain systems and gene products that may play a role in autism spectrum disorders, attention deficit hyperactivity disorder, obsessive compulsive disorder and many other neurodevelopmental conditions. However, precisely how genes may function in human brain development and how they interact with each other leading to psychiatric disorders is unknown. Because of an increasing understanding of neural stem cells and how the nervous system subsequently develops from these cells, we now have the ability to study disorders of the nervous system in a new way: by rewinding and reviewing the development of human neural cells. Induced pluripotent stem cells (iPSCs), developed from mature somatic cells, have allowed the development of specific cells in patients to be observed in real time. Moreover, they have allowed some neuronal-specific abnormalities to be corrected with pharmacological intervention in tissue culture. These exciting advances based on the use of iPSCs hold great promise for understanding, diagnosing and, possibly, treating psychiatric disorders. Specifically, examination of iPSCs from typically developing individuals will reveal how basic cellular processes and genetic differences contribute to individually unique nervous systems. Moreover, by comparing iPSCs from typically developing individuals and patients, differences at stem cell stages, through neural differentiation, and into the development of functional neurons may be identified that will reveal opportunities for intervention. The application of such techniques to early onset neuropsychiatric disorders is still on the horizon but has become a reality of current research efforts as a consequence of the revelations of many years of basic developmental neurobiological science.


Subject(s)
Brain Diseases/physiopathology , Brain Diseases/therapy , Mental Disorders/physiopathology , Mental Disorders/therapy , Neuropsychiatry/trends , Stem Cell Research , Adult , Animals , Brain/physiopathology , Brain Diseases/genetics , Child , DNA Copy Number Variations/genetics , Disease Models, Animal , Epigenesis, Genetic/genetics , Female , Genomic Structural Variation , Humans , Infant, Newborn , Male , Mental Disorders/genetics , Models, Genetic , Neurons/physiology , Pluripotent Stem Cells/physiology , Pregnancy
11.
Proc Natl Acad Sci U S A ; 106(29): 12031-6, 2009 Jul 21.
Article in English | MEDLINE | ID: mdl-19597142

ABSTRACT

Down syndrome (DS), or trisomy 21, is a common disorder associated with several complex clinical phenotypes. Although several hypotheses have been put forward, it is unclear whether particular gene loci on chromosome 21 (HSA21) are sufficient to cause DS and its associated features. Here we present a high-resolution genetic map of DS phenotypes based on an analysis of 30 subjects carrying rare segmental trisomies of various regions of HSA21. Using state-of-the-art genomics technologies, we mapped segmental trisomies at exon-level resolution and identified discrete regions of 1.8-16.3 Mb likely to be involved in the development of 8 DS phenotypes, 4 of which are congenital malformations. The 8 phenotypes are acute megakaryocytic leukemia, transient myeloproliferative disorder, Hirschsprung disease, duodenal stenosis, imperforate anus, severe mental retardation, DS-Alzheimer Disease, and DS-specific congenital heart disease (DSCHD). Our DS-phenotypic maps located DSCHD to a <2-Mb interval. Furthermore, the map enabled us to present evidence against the necessary involvement of other loci, as well as against specific hypotheses that have been put forward regarding the etiology of DS, namely the presence of a single DS consensus region and the sufficiency of DSCR1 and DYRK1A, or APP, in causing several severe DS phenotypes. Our study demonstrates the value of combining advanced genomics with cohorts of rare patients for studying DS, a prototype for the role of copy-number variation in complex disease.


Subject(s)
Chromosome Mapping , Chromosomes, Human, Pair 21/genetics , Down Syndrome/genetics , Trisomy/genetics , Humans , Infant , Meta-Analysis as Topic , Phenotype
12.
PLoS Genet ; 4(11): e1000249, 2008 Nov.
Article in English | MEDLINE | ID: mdl-18989455

ABSTRACT

Olfactory receptors (ORs), which are involved in odorant recognition, form the largest mammalian protein superfamily. The genomic content of OR genes is considerably reduced in humans, as reflected by the relatively small repertoire size and the high fraction (approximately 55%) of human pseudogenes. Since several recent low-resolution surveys suggested that OR genomic loci are frequently affected by copy-number variants (CNVs), we hypothesized that CNVs may play an important role in the evolution of the human olfactory repertoire. We used high-resolution oligonucleotide tiling microarrays to detect CNVs across 851 OR gene and pseudogene loci. Examining genomic DNA from 25 individuals with ancestry from three populations, we identified 93 OR gene loci and 151 pseudogene loci affected by CNVs, generating a mosaic of OR dosages across persons. Our data suggest that approximately 50% of the CNVs involve more than one OR, with the largest CNV spanning 11 loci. In contrast to earlier reports, we observe that CNVs are more frequent among OR pseudogenes than among intact genes, presumably due to both selective constraints and CNV formation biases. Furthermore, our results show an enrichment of CNVs among ORs with a close human paralog or lacking a one-to-one ortholog in chimpanzee. Interestingly, among the latter we observed an enrichment in CNV losses over gains, a finding potentially related to the known diminution of the human OR repertoire. Quantitative PCR experiments performed for 122 sampled ORs agreed well with the microarray results and uncovered 23 additional CNVs. Importantly, these experiments allowed us to uncover nine common deletion alleles that affect 15 OR genes and five pseudogenes. Comparison to the chimpanzee reference genome revealed that all of the deletion alleles are human derived, therefore indicating a profound effect of human-specific deletions on the individual OR gene content. Furthermore, these deletion alleles may be used in future genetic association studies of olfactory inter-individual differences.


Subject(s)
Evolution, Molecular , Gene Dosage , Genetic Variation , Genome, Human , Receptors, Odorant/genetics , Animals , Female , Gene Deletion , Gene Expression Profiling , Humans , Male , Pan troglodytes , Pseudogenes , Racial Groups/genetics , Recombination, Genetic
13.
Proc Natl Acad Sci U S A ; 105(40): 15499-504, 2008 Oct 07.
Article in English | MEDLINE | ID: mdl-18832167

ABSTRACT

Highly specific amplification of complex DNA pools without bias or template-independent products (TIPs) remains a challenge. We have developed a method using phi29 DNA polymerase and trehalose, with optimized control of amplification, to create micrograms of specific amplicons without TIPs from inputs as low as subfemtogram amounts of DNA. With an input of as little as 0.5-2.5 ng of human gDNA or a few cells, the product could be close to native DNA in locus representation. The amplicons from 5 and 0.5 ng of DNA faithfully demonstrated all previously known heterozygous segmental duplications and deletions (3 Mb to 18 kb) located on chromosome 22, and even a homozygous deletion smaller than 1 kb, with high-resolution chromosome-wide comparative genomic hybridization. With 550k Infinium BeadChip SNP typing, the >99.7% accuracy compared favorably with results on unamplified DNA. Importantly, the underrepresentation of chromosome termini that occurred with GenomiPhi v2 was greatly rescued with the present procedure, and the call rate and accuracy of SNP typing were also improved for amplicons from a 0.5-ng, partially degraded DNA input. In addition, the amplification proceeded logarithmically in terms of total yield before saturation; intact cells were amplified >50 times more efficiently than an equivalent amount of extracted DNA; and the locus imbalance for amplicons with 0.1 ng or lower input of DNA was variable, whereas for higher input it was largely reproducible. This procedure facilitates genomic analysis of single cells or other traces of DNA, and generates products suitable for analysis by massively parallel sequencing as well as microarray hybridization.


Subject(s)
DNA/analysis , Genome , Nucleic Acid Amplification Techniques/methods , Genomics , Nucleic Acid Hybridization/methods , Oligonucleotide Array Sequence Analysis/methods , Polymorphism, Single Nucleotide , Sensitivity and Specificity
14.
Genome Res ; 18(10): 1652-9, 2008 Oct.
Article in English | MEDLINE | ID: mdl-18765822

ABSTRACT

DNA methylation is an important component of epigenetic modification that influences the transcriptional machinery and is aberrant in many human diseases. Several methods have been developed to map DNA methylation either for limited regions or genome-wide. In particular, antibodies specific for methylated CpG have been successfully applied in genome-wide studies. However, despite the relevance of the results obtained, the interpretation of antibody enrichment is not trivial. Most importantly, coupling antibody-enriched methylated fragments with microarrays generates DNA methylation estimates that are not linearly related to the true methylation level. Here, we present an experimental and analytical methodology, MEDME (modeling experimental data with MeDIP enrichment), to obtain enhanced estimates that better describe the true DNA methylation level throughout the genome. We propose an experimental scenario for evaluating the true relationship in a high-throughput setting and a model-based analysis to predict absolute and relative DNA methylation levels. We successfully applied this model to compare the DNA methylation status of normal human melanocytes with that of a melanoma cell strain. Despite the low resolution typical of methods based on immunoprecipitation, we show that model-derived estimates of DNA methylation provide relatively high correlation with measured absolute and relative levels, as validated by bisulfite genomic DNA sequencing. Importantly, the model-derived DNA methylation estimates simplify the interpretation of the results at both the single-locus and chromosome-wide levels.


Subject(s)
Algorithms , DNA Methylation , Oligonucleotide Array Sequence Analysis/methods , CpG Islands , DNA/genetics , DNA/metabolism , DNA, Neoplasm/genetics , DNA, Neoplasm/metabolism , Epigenesis, Genetic , Genome, Human , Humans , Immunoprecipitation , Infant, Newborn , Melanocytes/metabolism , Sequence Analysis, DNA/methods
15.
Curr Opin Struct Biol ; 18(3): 366-74, 2008 Jun.
Article in English | MEDLINE | ID: mdl-18511261

ABSTRACT

Following recent technological advances there has been an increasing interest in genome structural variants (SVs), in particular copy-number variants (CNVs), that is, large-scale duplications and deletions. Although not immediately evident, CNV surveys make a conceptual connection between the fields of population genetics and protein families, in particular with regard to the stability and expandability of families. The mechanisms giving rise to CNVs can be considered as fundamental processes underlying gene duplication and loss; duplicated genes being the results of 'successful' copies, fixed and maintained in the population. Conversely, many 'unsuccessful' duplicates remain in the genome as pseudogenes. Here, we survey studies on CNVs, highlighting issues related to protein families. In particular, CNVs tend to affect specific gene functional categories, such as those associated with environmental response, and are depleted in genes related to basic cellular processes. Furthermore, CNVs occur more often at the periphery of the protein interaction network. In comparison, protein families associated with successful and unsuccessful duplicates are associated with similar functional categories but are differentially placed in the interaction network. These trends are likely reflective of CNV formation biases and natural selection, both of which differentially influence distinct protein families.


Subject(s)
Gene Duplication , Proteins/genetics , Proteins/chemistry
16.
Science ; 318(5849): 420-6, 2007 Oct 19.
Article in English | MEDLINE | ID: mdl-17901297

ABSTRACT

Structural variation of the genome involves kilobase- to megabase-sized deletions, duplications, insertions, inversions, and complex combinations of rearrangements. We introduce high-throughput and massive paired-end mapping (PEM), a large-scale genome-sequencing method for identifying structural variants (SVs) of approximately 3 kilobases (kb) or larger; it combines the rescue and capture of paired ends of 3-kb fragments, massive 454 sequencing, and a computational approach to map DNA reads onto a reference genome. PEM was used to map SVs in an African individual and in a putatively European individual, and identified shared and divergent SVs relative to the reference genome. Overall, we fine-mapped more than 1000 SVs and documented that the number of SVs among humans is much larger than initially hypothesized; many of the SVs potentially affect gene function. The breakpoint junction sequences of more than 200 SVs were determined with a novel pooling strategy and computational analysis. Our analysis provided insights into the mechanisms of SV formation in humans.
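
The core idea of paired-end mapping is that the two ends of a fragment of known size (here ~3 kb) should map a predictable distance apart on the reference, so deviations flag candidate structural variants. A minimal sketch, assuming concordant pairs map to opposite strands and using a hypothetical ±1 kb tolerance; real 454 paired-end libraries have their own orientation conventions, and the `ReadPair` type and `classify_pair` function are illustrative names only:

```python
from collections import Counter
from dataclasses import dataclass

@dataclass
class ReadPair:
    """One mapped read pair (hypothetical minimal representation)."""
    chrom1: str
    pos1: int
    strand1: str  # "+" or "-"
    chrom2: str
    pos2: int
    strand2: str

def classify_pair(p: ReadPair, expected_span: int = 3000,
                  tolerance: int = 1000) -> str:
    """Classify a pair by chromosome, relative orientation, and mapped span.
    Assumes concordant pairs map to opposite strands about expected_span apart."""
    if p.chrom1 != p.chrom2:
        return "interchromosomal candidate"
    if p.strand1 == p.strand2:
        return "inversion candidate"
    span = abs(p.pos2 - p.pos1)
    if span > expected_span + tolerance:
        return "deletion candidate"   # ends map farther apart than the fragment size
    if span < expected_span - tolerance:
        return "insertion candidate"  # ends map closer together than expected
    return "concordant"

pairs = [
    ReadPair("chr1", 100, "+", "chr1", 3100, "-"),
    ReadPair("chr1", 100, "+", "chr1", 9100, "-"),
    ReadPair("chr1", 100, "+", "chr1", 600, "-"),
    ReadPair("chr1", 100, "+", "chr1", 3100, "+"),
    ReadPair("chr1", 100, "+", "chr2", 3100, "-"),
]
print(Counter(classify_pair(p) for p in pairs))
```

In practice a single discordant pair is weak evidence; an SV call would normally require a cluster of pairs supporting the same event.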


Subject(s)
Genetic Variation , Genome, Human , Mutation , Chromosome Inversion , Chromosome Mapping , Computational Biology , Female , Gene Fusion , Humans , Mutagenesis, Insertional , Oligonucleotide Array Sequence Analysis , Recombination, Genetic , Repetitive Sequences, Nucleic Acid , Retroelements , Sequence Analysis, DNA , Sequence Deletion
17.
Proc Natl Acad Sci U S A ; 104(24): 10110-5, 2007 Jun 12.
Article in English | MEDLINE | ID: mdl-17551006

ABSTRACT

Copy-number variants (CNVs) are an abundant form of genetic variation in humans. However, approaches for determining exact CNV breakpoint sequences (physical deletion or duplication boundaries) across individuals, crucial for associating genotype to phenotype, have been lacking so far, and the vast majority of CNVs have been reported with approximate genomic coordinates only. Here, we report an approach, called BreakPtr, for fine-mapping CNVs (available from http://breakptr.gersteinlab.org). We statistically integrate both sequence characteristics and data from high-resolution comparative genome hybridization experiments in a discrete-valued, bivariate hidden Markov model. Incorporation of nucleotide-sequence information allows us to take into account the fact that recently duplicated sequences (e.g., segmental duplications) often coincide with breakpoints. In anticipation of an upcoming increase in CNV data, we developed an iterative "active" approach (initial scoring with a preliminary model, targeted validations, retraining of the model, and rescoring) and a flexible parameterization system that intuitively collapses from a full model of 2,503 parameters to a core one of only 10. Using our approach, we accurately mapped >400 breakpoints on chromosome 22 and a region of chromosome 11, refining the boundaries of many previously approximately mapped CNVs. Four predicted breakpoints flanked known disease-associated deletions. We validated an additional four predicted CNV breakpoints by sequencing. Overall, our results suggest a predictive resolution of approximately 300 bp. This level of resolution enables more precise correlations between CNVs and across individuals than previously possible, allowing the study of CNV population frequencies. Further, it enabled us to demonstrate a clear Mendelian pattern of inheritance for one of the CNVs.
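
BreakPtr's 2,503-parameter model is not reproduced here, but the decoding step of a discrete-valued, bivariate hidden Markov model can be sketched with a toy model. Assumptions: three hypothetical states (normal, deletion, duplication), aCGH log ratios pre-binned into loss/neutral/gain, a binary sequence flag (e.g., proximity to a segmental duplication), conditional independence of the two observation channels given the state, and made-up probabilities. Viterbi decoding then returns the most probable state path:

```python
import numpy as np

STATES = ["normal", "deletion", "duplication"]

# Hypothetical parameters for illustration only (not BreakPtr's).
log_start = np.log([0.90, 0.05, 0.05])
log_trans = np.log([[0.98, 0.01, 0.01],   # rows: from-state, columns: to-state
                    [0.04, 0.95, 0.01],
                    [0.04, 0.01, 0.95]])
# Channel 1: aCGH log ratio binned as 0 = loss, 1 = neutral, 2 = gain.
log_emit_acgh = np.log([[0.05, 0.90, 0.05],
                        [0.85, 0.10, 0.05],
                        [0.05, 0.10, 0.85]])
# Channel 2: binary flag, 0 = unique sequence, 1 = near a segmental duplication.
log_emit_seq = np.log([[0.95, 0.05],
                       [0.70, 0.30],
                       [0.70, 0.30]])

def viterbi(acgh_bins, seq_flags):
    """Most probable state path, treating the two channels as conditionally
    independent given the state (so their log emissions add)."""
    acgh = np.asarray(acgh_bins, dtype=int)
    seq = np.asarray(seq_flags, dtype=int)
    log_emit = log_emit_acgh[:, acgh] + log_emit_seq[:, seq]  # shape (3, n)
    n = log_emit.shape[1]
    score = log_start + log_emit[:, 0]
    back = np.zeros((n, len(STATES)), dtype=int)
    for t in range(1, n):
        cand = score[:, None] + log_trans  # cand[i, j]: best path so far, then i -> j
        back[t] = cand.argmax(axis=0)
        score = cand.max(axis=0) + log_emit[:, t]
    path = [int(score.argmax())]
    for t in range(n - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    path.reverse()
    return [STATES[s] for s in path]

bins = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1]
flags = [0] * len(bins)
print(viterbi(bins, flags))
```

The decoded state changes mark candidate breakpoints; their precision is limited by probe spacing, which is why the abstract reports resolution in base pairs.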


Subject(s)
Chromosome Breakage , Chromosomes, Human, Pair 11 , Chromosomes, Human, Pair 22 , Gene Dosage , Genetic Variation , Genome, Human , Algorithms , Base Sequence , Humans , Models, Genetic , Molecular Sequence Data , Nucleic Acid Hybridization , Oligonucleotide Array Sequence Analysis , Physical Chromosome Mapping , Polymerase Chain Reaction , Polymorphism, Genetic , Predictive Value of Tests , Reproducibility of Results , Sequence Analysis, DNA
18.
Proc Natl Acad Sci U S A ; 103(12): 4534-9, 2006 Mar 21.
Article in English | MEDLINE | ID: mdl-16537408

ABSTRACT

Deletions and amplifications of the human genomic sequence (copy number polymorphisms) are the cause of numerous diseases and a potential cause of phenotypic variation in the normal population. Comparative genomic hybridization (CGH) has been developed as a useful tool for detecting alterations in DNA copy number that involve blocks of DNA several kilobases or larger in size. We have developed high-resolution CGH (HR-CGH) to detect accurately and with relatively little bias the presence and extent of chromosomal aberrations in human DNA. Maskless array synthesis was used to construct arrays containing 385,000 oligonucleotides with isothermal probes of 45-85 bp in length; arrays tiling the beta-globin locus and chromosome 22q were prepared. Arrays with a 9-bp tiling path were used to map a 622-bp heterozygous deletion in the beta-globin locus. Arrays with an 85-bp tiling path were used to analyze DNA from patients with copy number changes in the pericentromeric region of chromosome 22q. Heterozygous deletions and duplications as well as partial triploidies and partial tetraploidies of portions of chromosome 22q were mapped with high resolution (typically up to 200 bp) in each patient, and the precise breakpoints of two deletions were confirmed by DNA sequencing. Additional peaks potentially corresponding to known and novel additional CNPs were also observed. Our results demonstrate that HR-CGH allows the detection of copy number changes in the human genome at an unprecedented level of resolution.
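
With a tiling path, the resolution of a mapped boundary is bounded by the probe spacing. The sketch below calls losses as runs of consecutive low-ratio probes on a simulated, noise-free 9-bp tiling path. It assumes an ideal single-copy (heterozygous) loss gives a log2 ratio near log2(1/2) = -1, and uses a hypothetical threshold of -0.5 and a minimum run of three probes; `call_losses` is an illustrative name, and real HR-CGH data are noisy and would need proper segmentation:

```python
import math
from typing import List, Tuple

def call_losses(positions: List[int], log2_ratios: List[float],
                threshold: float = -0.5, min_probes: int = 3) -> List[Tuple[int, int]]:
    """Return (first, last) probe positions of each run of at least min_probes
    consecutive probes whose log2 ratio falls below threshold."""
    segments = []
    run_start = None
    for i, r in enumerate(log2_ratios):
        if r < threshold:
            if run_start is None:
                run_start = i
        else:
            if run_start is not None and i - run_start >= min_probes:
                segments.append((positions[run_start], positions[i - 1]))
            run_start = None
    # Close a run that extends to the last probe.
    if run_start is not None and len(log2_ratios) - run_start >= min_probes:
        segments.append((positions[run_start], positions[-1]))
    return segments

# Simulated 9-bp tiling path with a one-copy loss over probes 50-118.
step = 9
positions = [i * step for i in range(200)]
ratios = [math.log2(0.5) if 50 <= i <= 118 else 0.0 for i in range(200)]
print(call_losses(positions, ratios))
```

Each true breakpoint lies between the last unaffected probe and the first lost probe, so its uncertainty is roughly one tiling step (9 bp here).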


Subject(s)
Chromosome Aberrations , Chromosomes, Human, Pair 22/genetics , Gene Dosage , Oligonucleotide Array Sequence Analysis/methods , Physical Chromosome Mapping/methods , Base Sequence , Gene Duplication , Globins/genetics , Humans , Molecular Sequence Data , Polymorphism, Genetic , Sequence Deletion