Your browser doesn't support javascript.
loading
Show: 20 | 50 | 100
Results 1 - 20 de 53
Filter
Add more filters

Publication year range
1.
Cell ; 171(3): 710-722.e12, 2017 Oct 19.
Article in English | MEDLINE | ID: mdl-28965761

ABSTRACT

To further our understanding of the genetic etiology of autism, we generated and analyzed genome sequence data from 516 idiopathic autism families (2,064 individuals). This resource includes >59 million single-nucleotide variants (SNVs) and 9,212 private copy number variants (CNVs), of which 133,992 and 88 are de novo mutations (DNMs), respectively. We estimate a mutation rate of ∼1.5 × 10-8 SNVs per site per generation with a significantly higher mutation rate in repetitive DNA. Comparing probands and unaffected siblings, we observe several DNM trends. Probands carry more gene-disruptive CNVs and SNVs, resulting in severe missense mutations and mapping to predicted fetal brain promoters and embryonic stem cell enhancers. These differences become more pronounced for autism genes (p = 1.8 × 10-3, OR = 2.2). Patients are more likely to carry multiple coding and noncoding DNMs in different genes, which are enriched for expression in striatal neurons (p = 3 × 10-3), suggesting a path forward for genetically characterizing more complex cases of autism.


Subject(s)
Autistic Disorder/genetics , DNA Copy Number Variations , Polymorphism, Single Nucleotide , Animals , DNA Mutational Analysis , Female , Genome-Wide Association Study , Humans , INDEL Mutation , Male , Mice
2.
Nat Methods ; 20(4): 550-558, 2023 04.
Article in English | MEDLINE | ID: mdl-36550274

ABSTRACT

Structural variants (SVs) account for a large amount of sequence variability across genomes and play an important role in human genomics and precision medicine. Despite intense efforts over the years, the discovery of SVs in individuals remains challenging due to the diploid and highly repetitive structure of the human genome, and by the presence of SVs that vastly exceed sequencing read lengths. However, the recent introduction of low-error long-read sequencing technologies such as PacBio HiFi may finally enable these barriers to be overcome. Here we present SV discovery with sample-specific strings (SVDSS)-a method for discovery of SVs from long-read sequencing technologies (for example, PacBio HiFi) that combines and effectively leverages mapping-free, mapping-based and assembly-based methodologies for overall superior SV discovery performance. Our experiments on several human samples show that SVDSS outperforms state-of-the-art mapping-based methods for discovery of insertion and deletion SVs in PacBio HiFi reads and achieves notable improvements in calling SVs in repetitive regions of the genome.


Subject(s)
Genomics , High-Throughput Nucleotide Sequencing , Humans , Sequence Analysis, DNA/methods , High-Throughput Nucleotide Sequencing/methods , Genomics/methods , Genome, Human , Repetitive Sequences, Nucleic Acid
3.
Nucleic Acids Res ; 51(4): e23, 2023 02 28.
Article in English | MEDLINE | ID: mdl-36625266

ABSTRACT

The discovery of cancer driver mutations is a fundamental goal in cancer research. While many cancer driver mutations have been discovered in the protein-coding genome, research into potential cancer drivers in the non-coding regions showed limited success so far. Here, we present a novel comprehensive framework Dr.Nod for detection of non-coding cis-regulatory candidate driver mutations that are associated with dysregulated gene expression using tissue-matched enhancer-gene annotations. Applying the framework to data from over 1500 tumours across eight tissues revealed a 4.4-fold enrichment of candidate driver mutations in regulatory regions of known cancer driver genes. An overarching conclusion that emerges is that the non-coding driver mutations contribute to cancer by significantly altering transcription factor binding sites, leading to upregulation of tissue-matched oncogenes and down-regulation of tumour-suppressor genes. Interestingly, more than half of the detected cancer-promoting non-coding regulatory driver mutations are over 20 kb distant from the cancer-associated genes they regulate. Our results show the importance of tissue-matched enhancer-gene maps, functional impact of mutations, and complex background mutagenesis model for the prediction of non-coding regulatory drivers. In conclusion, our study demonstrates that non-coding mutations in enhancers play a previously underappreciated role in cancer and dysregulation of clinically relevant target genes.


Subject(s)
Neoplasms , Oncogenes , Regulatory Sequences, Nucleic Acid , Humans , Mutation , Neoplasms/genetics
4.
Nucleic Acids Res ; 49(8): e47, 2021 05 07.
Article in English | MEDLINE | ID: mdl-33503255

ABSTRACT

Large scale catalogs of common genetic variants (including indels and structural variants) are being created using data from second and third generation whole-genome sequencing technologies. However, the genotyping of these variants in newly sequenced samples is a nontrivial task that requires extensive computational resources. Furthermore, current approaches are mostly limited to only specific types of variants and are generally prone to various errors and ambiguities when genotyping complex events. We are proposing an ultra-efficient approach for genotyping any type of structural variation that is not limited by the shortcomings and complexities of current mapping-based approaches. Our method Nebula utilizes the changes in the count of k-mers to predict the genotype of structural variants. We have shown that not only Nebula is an order of magnitude faster than mapping based approaches for genotyping structural variants, but also has comparable accuracy to state-of-the-art approaches. Furthermore, Nebula is a generic framework not limited to any specific type of event. Nebula is publicly available at https://github.com/Parsoa/Nebula.


Subject(s)
Genomics/methods , Genotyping Techniques/methods , Computer Simulation , Databases, Genetic , Genotype , High-Throughput Nucleotide Sequencing , INDEL Mutation , Polymorphism, Single Nucleotide , Software , Whole Genome Sequencing
5.
Bioinformatics ; 36(4): 1082-1090, 2020 02 15.
Article in English | MEDLINE | ID: mdl-31584621

ABSTRACT

MOTIVATION: We propose Meltos, a novel computational framework to address the challenging problem of building tumor phylogeny trees using somatic structural variants (SVs) among multiple samples. Meltos leverages the tumor phylogeny tree built on somatic single nucleotide variants (SNVs) to identify high confidence SVs and produce a comprehensive tumor lineage tree, using a novel optimization formulation. While we do not assume the evolutionary progression of SVs is necessarily the same as SNVs, we show that a tumor phylogeny tree using high-quality somatic SNVs can act as a guide for calling and assigning somatic SVs on a tree. Meltos utilizes multiple genomic read signals for potential SV breakpoints in whole genome sequencing data and proposes a probabilistic formulation for estimating variant allele fractions (VAFs) of SV events. RESULTS: In order to assess the ability of Meltos to correctly refine SNV trees with SV information, we tested Meltos on two simulated datasets with five genomes in both. We also assessed Meltos on two real cancer datasets. We tested Meltos on multiple samples from a liposarcoma tumor and on a multi-sample breast cancer data (Yates et al., 2015), where the authors provide validated structural variation events together with deep, targeted sequencing for a collection of somatic SNVs. We show Meltos has the ability to place high confidence validated SV calls on a refined tumor phylogeny tree. We also showed the flexibility of Meltos to either estimate VAFs directly from genomic data or to use copy number corrected estimates. AVAILABILITY AND IMPLEMENTATION: Meltos is available at https://github.com/ih-lab/Meltos. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Neoplasms , Genome , Genomic Structural Variation , Genomics , High-Throughput Nucleotide Sequencing , Humans , Neoplasms/genetics , Phylogeny , Sequence Analysis , Software
6.
Nature ; 517(7536): 608-11, 2015 Jan 29.
Article in English | MEDLINE | ID: mdl-25383537

ABSTRACT

The human genome is arguably the most complete mammalian reference assembly, yet more than 160 euchromatic gaps remain and aspects of its structural variation remain poorly understood ten years after its completion. To identify missing sequence and genetic variation, here we sequence and analyse a haploid human genome (CHM1) using single-molecule, real-time DNA sequencing. We close or extend 55% of the remaining interstitial gaps in the human GRCh37 reference genome--78% of which carried long runs of degenerate short tandem repeats, often several kilobases in length, embedded within (G+C)-rich genomic regions. We resolve the complete sequence of 26,079 euchromatic structural variants at the base-pair level, including inversions, complex insertions and long tracts of tandem repeats. Most have not been previously reported, with the greatest increases in sensitivity occurring for events less than 5 kilobases in size. Compared to the human reference, we find a significant insertional bias (3:1) in regions corresponding to complex insertions and long short tandem repeats. Our results suggest a greater complexity of the human genome in the form of variation of longer and more complex repetitive DNA that can now be largely resolved with the application of this longer-read sequencing technology.


Subject(s)
Genetic Variation/genetics , Genome, Human/genetics , Genomics , Sequence Analysis, DNA/methods , Chromosome Inversion/genetics , Chromosomes, Human, Pair 10/genetics , Cloning, Molecular , GC Rich Sequence/genetics , Haploidy , Humans , Mutagenesis, Insertional/genetics , Reference Standards , Tandem Repeat Sequences/genetics
7.
Nature ; 526(7571): 75-81, 2015 Oct 01.
Article in English | MEDLINE | ID: mdl-26432246

ABSTRACT

Structural variants are implicated in numerous diseases and make up the majority of varying nucleotides among human genomes. Here we describe an integrated set of eight structural variant classes comprising both balanced and unbalanced variants, which we constructed using short-read DNA sequencing data and statistically phased onto haplotype blocks in 26 human populations. Analysing this set, we identify numerous gene-intersecting structural variants exhibiting population stratification and describe naturally occurring homozygous gene knockouts that suggest the dispensability of a variety of human genes. We demonstrate that structural variants are enriched on haplotypes identified by genome-wide association studies and exhibit enrichment for expression quantitative trait loci. Additionally, we uncover appreciable levels of structural variant complexity at different scales, including genic loci subject to clusters of repeated rearrangement and complex structural variants with multiple breakpoints likely to have formed through individual mutational events. Our catalogue will enhance future studies into structural variant demography, functional impact and disease association.


Subject(s)
Genetic Variation/genetics , Genome, Human/genetics , Physical Chromosome Mapping , Amino Acid Sequence , Genetic Predisposition to Disease , Genetics, Medical , Genetics, Population , Genome-Wide Association Study , Genomics , Genotype , Haplotypes/genetics , Homozygote , Humans , Molecular Sequence Data , Mutation Rate , Polymorphism, Single Nucleotide/genetics , Quantitative Trait Loci/genetics , Sequence Analysis, DNA , Sequence Deletion/genetics
8.
Bioinformatics ; 35(20): 3923-3930, 2019 10 15.
Article in English | MEDLINE | ID: mdl-30937433

ABSTRACT

MOTIVATION: Several algorithms have been developed that use high-throughput sequencing technology to characterize structural variations (SVs). Most of the existing approaches focus on detecting relatively simple types of SVs such as insertions, deletions and short inversions. In fact, complex SVs are of crucial importance and several have been associated with genomic disorders. To better understand the contribution of complex SVs to human disease, we need new algorithms to accurately discover and genotype such variants. Additionally, due to similar sequencing signatures, inverted duplications or gene conversion events that include inverted segmental duplications are often characterized as simple inversions, likewise, duplications and gene conversions in direct orientation may be called as simple deletions. Therefore, there is still a need for accurate algorithms to fully characterize complex SVs and thus improve calling accuracy of more simple variants. RESULTS: We developed novel algorithms to accurately characterize tandem, direct and inverted interspersed segmental duplications using short read whole genome sequencing datasets. We integrated these methods to our TARDIS tool, which is now capable of detecting various types of SVs using multiple sequence signatures such as read pair, read depth and split read. We evaluated the prediction performance of our algorithms through several experiments using both simulated and real datasets. In the simulation experiments, using a 30× coverage TARDIS achieved 96% sensitivity with only 4% false discovery rate. For experiments that involve real data, we used two haploid genomes (CHM1 and CHM13) and one human genome (NA12878) from the Illumina Platinum Genomes set. Comparison of our results with orthogonal PacBio call sets from the same genomes revealed higher accuracy for TARDIS than state-of-the-art methods. Furthermore, we showed a surprisingly low false discovery rate of our approach for discovery of tandem, direct and inverted interspersed segmental duplications prediction on CHM1 (<5% for the top 50 predictions). AVAILABILITY AND IMPLEMENTATION: TARDIS source code is available at https://github.com/BilkentCompGen/tardis, and a corresponding Docker image is available at https://hub.docker.com/r/alkanlab/tardis/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
High-Throughput Nucleotide Sequencing , Segmental Duplications, Genomic , Algorithms , Genome, Human , Genomics , Humans , Software
9.
Am J Hum Genet ; 98(1): 58-74, 2016 Jan 07.
Article in English | MEDLINE | ID: mdl-26749308

ABSTRACT

We performed whole-genome sequencing (WGS) of 208 genomes from 53 families affected by simplex autism. For the majority of these families, no copy-number variant (CNV) or candidate de novo gene-disruptive single-nucleotide variant (SNV) had been detected by microarray or whole-exome sequencing (WES). We integrated multiple CNV and SNV analyses and extensive experimental validation to identify additional candidate mutations in eight families. We report that compared to control individuals, probands showed a significant (p = 0.03) enrichment of de novo and private disruptive mutations within fetal CNS DNase I hypersensitive sites (i.e., putative regulatory regions). This effect was only observed within 50 kb of genes that have been previously associated with autism risk, including genes where dosage sensitivity has already been established by recurrent disruptive de novo protein-coding mutations (ARID1B, SCN2A, NR3C2, PRKCA, and DSCAM). In addition, we provide evidence of gene-disruptive CNVs (in DISC1, WNT7A, RBFOX1, and MBD5), as well as smaller de novo CNVs and exon-specific SNVs missed by exome sequencing in neurodevelopmental genes (e.g., CANX, SAE1, and PIK3CA). Our results suggest that the detection of smaller, often multiple CNVs affecting putative regulatory elements might help explain additional risk of simplex autism.


Subject(s)
Autistic Disorder/genetics , DNA/genetics , Genome, Human , Exome , Female , Humans , Male , Pedigree , Polymorphism, Single Nucleotide
10.
Nature ; 499(7459): 471-5, 2013 Jul 25.
Article in English | MEDLINE | ID: mdl-23823723

ABSTRACT

Most great ape genetic variation remains uncharacterized; however, its study is critical for understanding population history, recombination, selection and susceptibility to disease. Here we sequence to high coverage a total of 79 wild- and captive-born individuals representing all six great ape species and seven subspecies and report 88.8 million single nucleotide polymorphisms. Our analysis provides support for genetically distinct populations within each species, signals of gene flow, and the split of common chimpanzees into two distinct groups: Nigeria-Cameroon/western and central/eastern populations. We find extensive inbreeding in almost all wild populations, with eastern gorillas being the most extreme. Inferred effective population sizes have varied radically over time in different lineages and this appears to have a profound effect on the genetic diversity at, or close to, genes in almost all species. We discover and assign 1,982 loss-of-function variants throughout the human and great ape lineages, determining that the rate of gene loss has not been different in the human branch compared to other internal branches in the great ape phylogeny. This comprehensive catalogue of great ape genome diversity provides a framework for understanding evolution and a resource for more effective management of wild and captive great ape populations.


Subject(s)
Genetic Variation , Hominidae/genetics , Africa , Animals , Animals, Wild/genetics , Animals, Zoo/genetics , Asia, Southeastern , Evolution, Molecular , Gene Flow/genetics , Genetics, Population , Genome/genetics , Gorilla gorilla/classification , Gorilla gorilla/genetics , Hominidae/classification , Humans , Inbreeding , Pan paniscus/classification , Pan paniscus/genetics , Pan troglodytes/classification , Pan troglodytes/genetics , Phylogeny , Polymorphism, Single Nucleotide/genetics , Population Density
11.
Genome Res ; 25(1): 142-54, 2015 Jan.
Article in English | MEDLINE | ID: mdl-25378250

ABSTRACT

Despite considerable genetic heterogeneity underlying neurodevelopmental diseases, there is compelling evidence that many disease genes will map to a much smaller number of biological subnetworks. We developed a computational method, termed MAGI (merging affected genes into integrated networks), that simultaneously integrates protein-protein interactions and RNA-seq expression profiles during brain development to discover "modules" enriched for de novo mutations in probands. We applied this method to recent exome sequencing of 1116 patients with autism and intellectual disability, discovering two distinct modules that differ in their properties and associated phenotypes. The first module consists of 80 genes associated with Wnt, Notch, SWI/SNF, and NCOR complexes and shows the highest expression early during embryonic development (8-16 post-conception weeks [pcw]). The second module consists of 24 genes associated with synaptic function, including long-term potentiation and calcium signaling with higher levels of postnatal expression. Patients with de novo mutations in these modules are more significantly intellectually impaired and carry more severe missense mutations when compared to probands with de novo mutations outside of these modules. We used our approach to define subsets of the network associated with higher functioning autism as well as greater severity with respect to IQ. Finally, we applied MAGI independently to epilepsy and schizophrenia exome sequencing cohorts and found significant overlap as well as expansion of these modules, suggesting a core set of integrated neurodevelopmental networks common to seemingly diverse human diseases.


Subject(s)
Autistic Disorder/diagnosis , Autistic Disorder/genetics , Gene Regulatory Networks , Algorithms , Cluster Analysis , Cohort Studies , Databases, Factual , Epilepsy/diagnosis , Epilepsy/genetics , Exome , Genetic Heterogeneity , Humans , Mutation, Missense , Phenotype , Schizophrenia/diagnosis , Schizophrenia/genetics , Sequence Analysis, RNA
12.
Genome Res ; 25(6): 792-801, 2015 Jun.
Article in English | MEDLINE | ID: mdl-25883321

ABSTRACT

Small insertions and deletions (indels) and large structural variations (SVs) are major contributors to human genetic diversity and disease. However, mutation rates and characteristics of de novo indels and SVs in the general population have remained largely unexplored. We report 332 validated de novo structural changes identified in whole genomes of 250 families, including complex indels, retrotransposon insertions, and interchromosomal events. These data indicate a mutation rate of 2.94 indels (1-20 bp) and 0.16 SVs (>20 bp) per generation. De novo structural changes affect on average 4.1 kbp of genomic sequence and 29 coding bases per generation, which is 91 and 52 times more nucleotides than de novo substitutions, respectively. This contrasts with the equal genomic footprint of inherited SVs and substitutions. An excess of structural changes originated on paternal haplotypes. Additionally, we observed a nonuniform distribution of de novo SVs across offspring. These results reveal the importance of different mutational mechanisms to changes in human genome structure across generations.


Subject(s)
Genetic Variation , Genome, Human , Alleles , Amino Acid Sequence , Female , Genomics , Haplotypes , Humans , INDEL Mutation , Male , Molecular Sequence Data , Mutation Rate , Polymorphism, Single Nucleotide , Retroelements/genetics , Sequence Alignment , Sequence Analysis, DNA
13.
Methods ; 129: 3-7, 2017 10 01.
Article in English | MEDLINE | ID: mdl-28583483

ABSTRACT

Structural variations (SV) are broadly defined as genomic alterations that affect >50bp of DNA, which are shown to have significant effect on evolution and disease. The advent of high throughput sequencing (HTS) technologies and the ability to perform whole genome sequencing (WGS), makes it feasible to study these variants in depth. However, discovery of all forms of SV using WGS has proven to be challenging as the short reads produced by the predominant HTS platforms (<200bp for current technologies) and the fact that most genomes include large amounts of repeats make it very difficult to unambiguously map and accurately characterize such variants. Furthermore, existing tools for SV discovery are primarily developed for only a few of the SV types, which may have conflicting sequence signatures (i.e. read pairs, read depth, split reads) with other, untargeted SV classes. Here we are introduce a new framework, Tardis, which combines multiple read signatures into a single package to characterize most SV types simultaneously, while preventing such conflicts. Tardis also has a modular structure that makes it easy to extend for the discovery of additional forms of SV.


Subject(s)
Genomic Structural Variation/genetics , Genomics , High-Throughput Nucleotide Sequencing/methods , Software , Algorithms , Genome, Human , High-Throughput Nucleotide Sequencing/trends , Humans , Sequence Analysis, DNA , Whole Genome Sequencing
14.
Nature ; 470(7332): 59-65, 2011 Feb 03.
Article in English | MEDLINE | ID: mdl-21293372

ABSTRACT

Genomic structural variants (SVs) are abundant in humans, differing from other forms of variation in extent, origin and functional impact. Despite progress in SV characterization, the nucleotide resolution architecture of most SVs remains unknown. We constructed a map of unbalanced SVs (that is, copy number variants) based on whole genome DNA sequencing data from 185 human genomes, integrating evidence from complementary SV discovery approaches with extensive experimental validations. Our map encompassed 22,025 deletions and 6,000 additional SVs, including insertions and tandem duplications. Most SVs (53%) were mapped to nucleotide resolution, which facilitated analysing their origin and functional impact. We examined numerous whole and partial gene deletions with a genotyping approach and observed a depletion of gene disruptions amongst high frequency deletions. Furthermore, we observed differences in the size spectra of SVs originating from distinct formation mechanisms, and constructed a map of SV hotspots formed by common mechanisms. Our analytical framework and SV map serves as a resource for sequencing-based association studies.


Subject(s)
DNA Copy Number Variations/genetics , Genetics, Population , Genome, Human/genetics , Genomics , Gene Duplication/genetics , Genetic Predisposition to Disease/genetics , Genotype , Humans , Mutagenesis, Insertional/genetics , Reproducibility of Results , Sequence Analysis, DNA , Sequence Deletion/genetics
15.
Proc Natl Acad Sci U S A ; 110(33): 13457-62, 2013 Aug 13.
Article in English | MEDLINE | ID: mdl-23884656

ABSTRACT

We analyzed 83 fully sequenced great ape genomes for mobile element insertions, predicting a total of 49,452 fixed and polymorphic Alu and long interspersed element 1 (L1) insertions not present in the human reference assembly and assigning each retrotransposition event to a different time point during great ape evolution. We used these homoplasy-free markers to construct a mobile element insertions-based phylogeny of humans and great apes and demonstrate their differential power to discern ape subspecies and populations. Within this context, we find a good correlation between L1 diversity and single-nucleotide polymorphism heterozygosity (r(2) = 0.65) in contrast to Alu repeats, which show little correlation (r(2) = 0.07). We estimate that the "rate" of Alu retrotransposition has differed by a factor of 15-fold in these lineages. Humans, chimpanzees, and bonobos show the highest rates of Alu accumulation--the latter two since divergence 1.5 Mya. The L1 insertion rate, in contrast, has remained relatively constant, with rates differing by less than a factor of three. We conclude that Alu retrotransposition has been the most variable form of genetic variation during recent human-great ape evolution, with increases and decreases occurring over very short periods of evolutionary time.


Subject(s)
Genetic Variation , Genome/genetics , Hominidae/genetics , Phylogeny , Alu Elements/genetics , Animals , Cluster Analysis , DNA Primers/genetics , Genomics , Hominidae/classification , Humans , Likelihood Functions , Long Interspersed Nucleotide Elements/genetics , Polymerase Chain Reaction , Polymorphism, Single Nucleotide , Principal Component Analysis , Species Specificity
16.
Genome Res ; 21(12): 2203-12, 2011 Dec.
Article in English | MEDLINE | ID: mdl-22048523

ABSTRACT

With the increasing popularity of whole-genome shotgun sequencing (WGSS) via high-throughput sequencing technologies, it is becoming highly desirable to perform comparative studies involving multiple individuals (from a specific population, race, or a group sharing a particular phenotype). The conventional approach for a comparative genome variation study involves two key steps: (1) each paired-end high-throughput sequenced genome is compared with a reference genome and its (structural) differences are identified; (2) the lists of structural variants in each genome are compared against each other. In this study we propose to move away from this two-step approach to a novel one in which all genomes are compared with the reference genome simultaneously for obtaining much higher accuracy in structural variation detection. For this purpose, we introduce the maximum parsimony-based simultaneous structural variation discovery problem for a set of high-throughput sequenced genomes and provide efficient algorithms to solve it. We compare the proposed framework with the conventional framework, on the genomes of the Yoruban mother-father-child trio, as well as the CEU trio of European ancestry (both sequenced by Illumina platforms). We observed that the conventional framework predicts an unexpectedly high number of de novo variations in the child in comparison to the parents and misses some of the known variations. Our proposed framework, on the other hand, not only significantly reduces the number of incorrectly predicted de novo variations but also predicts more of the known (true) variations.


Subject(s)
Genetic Variation , Genome, Human/physiology , Models, Genetic , Sequence Analysis, DNA/methods , Humans
17.
Genome Res ; 21(6): 840-9, 2011 Jun.
Article in English | MEDLINE | ID: mdl-21131385

ABSTRACT

Human genomes are now being rapidly sequenced, but not all forms of genetic variation are routinely characterized. In this study, we focus on Alu retrotransposition events and seek to characterize differences in the pattern of mobile insertion between individuals based on the analysis of eight human genomes sequenced using next-generation sequencing. Applying a rapid read-pair analysis algorithm, we discover 4342 Alu insertions not found in the human reference genome and show that 98% of a selected subset (63/64) experimentally validate. Of these new insertions, 89% correspond to AluY elements, suggesting that they arose by retrotransposition. Eighty percent of the Alu insertions have not been previously reported and more novel events were detected in Africans when compared with non-African samples (76% vs. 69%). Using these data, we develop an experimental and computational screen to identify ancestry informative Alu retrotransposition events among different human populations.


Subject(s)
Alu Elements/genetics , Genetic Variation , Genome, Human/genetics , Base Sequence , Black People/genetics , Computational Biology/methods , Genomics/methods , Humans , Molecular Sequence Data , Sequence Analysis, DNA/methods
18.
Genome Res ; 21(10): 1640-9, 2011 Oct.
Article in English | MEDLINE | ID: mdl-21685127

ABSTRACT

Structural variation has played an important role in the evolutionary restructuring of human and great ape genomes. Recent analyses have suggested that the genomes of chimpanzee and human have been particularly enriched for this form of genetic variation. Here, we set out to assess the extent of structural variation in the gorilla lineage by generating 10-fold genomic sequence coverage from a western lowland gorilla and integrating these data into a physical and cytogenetic framework of structural variation. We discovered and validated over 7665 structural changes within the gorilla lineage, including sequence resolution of inversions, deletions, duplications, and mobile element insertions. A comparison with human and other ape genomes shows that the gorilla genome has been subjected to the highest rate of segmental duplication. We show that both the gorilla and chimpanzee genomes have experienced independent yet convergent patterns of structural mutation that have not occurred in humans, including the formation of subtelomeric heterochromatic caps, the hyperexpansion of segmental duplications, and bursts of retroviral integrations. Our analysis suggests that the chimpanzee and gorilla genomes are structurally more derived than either orangutan or human genomes.


Subject(s)
Evolution, Molecular , Genomic Structural Variation , Gorilla gorilla/genetics , Pan troglodytes/genetics , Animals , Base Sequence , Chromosome Mapping , Chromosome Structures , Comparative Genomic Hybridization , Humans , In Situ Hybridization, Fluorescence , Karyotype , Molecular Sequence Data , Segmental Duplications, Genomic , Sequence Analysis, DNA
19.
BMC Genomics ; 14: 363, 2013 May 31.
Article in English | MEDLINE | ID: mdl-23721540

ABSTRACT

BACKGROUND: The only known albino gorilla, named Snowflake, was a male wild born individual from Equatorial Guinea who lived at the Barcelona Zoo for almost 40 years. He was diagnosed with non-syndromic oculocutaneous albinism, i.e. white hair, light eyes, pink skin, photophobia and reduced visual acuity. Despite previous efforts to explain the genetic cause, this is still unknown. Here, we study the genetic cause of his albinism and making use of whole genome sequencing data we find a higher inbreeding coefficient compared to other gorillas. RESULTS: We successfully identified the causal genetic variant for Snowflake's albinism, a non-synonymous single nucleotide variant located in a transmembrane region of SLC45A2. This transporter is known to be involved in oculocutaneous albinism type 4 (OCA4) in humans. We provide experimental evidence that shows that this amino acid replacement alters the membrane spanning capability of this transmembrane region. Finally, we provide a comprehensive study of genome-wide patterns of autozygogosity revealing that Snowflake's parents were related, being this the first report of inbreeding in a wild born Western lowland gorilla. CONCLUSIONS: In this study we demonstrate how the use of whole genome sequencing can be extended to link genotype and phenotype in non-model organisms and it can be a powerful tool in conservation genetics (e.g., inbreeding and genetic diversity) with the expected decrease in sequencing cost.


Subject(s)
Genomics , Gorilla gorilla/genetics , High-Throughput Nucleotide Sequencing , Inbreeding , Amino Acid Sequence , Animals , Female , Heterozygote , Male , Membrane Transport Proteins/chemistry , Membrane Transport Proteins/genetics , Microsatellite Repeats/genetics , Molecular Sequence Data , Mutation , Sequence Analysis, DNA
20.
J Pathol ; 227(1): 53-61, 2012 May.
Article in English | MEDLINE | ID: mdl-22294438

ABSTRACT

Next-generation sequencing is making sequence-based molecular pathology and personalized oncology viable. We selected an individual initially diagnosed with conventional but aggressive prostate adenocarcinoma and sequenced the genome and transcriptome from primary and metastatic tissues collected prior to hormone therapy. The histology-pathology and copy number profiles were remarkably homogeneous, yet it was possible to propose the quadrant of the prostate tumour that likely seeded the metastatic diaspora. Despite a homogeneous cell type, our transcriptome analysis revealed signatures of both luminal and neuroendocrine cell types. Remarkably, the repertoire of expressed but apparently private gene fusions, including C15orf21:MYC, recapitulated this biology. We hypothesize that the amplification and over-expression of the stem cell gene MSI2 may have contributed to the stable hybrid cellular identity. This hybrid luminal-neuroendocrine tumour appears to represent a novel and highly aggressive case of prostate cancer with unique biological features and, conceivably, a propensity for rapid progression to castrate-resistance. Overall, this work highlights the importance of integrated analyses of genome, exome and transcriptome sequences for basic tumour biology, sequence-based molecular pathology and personalized oncology.


Subject(s)
Adenocarcinoma/genetics , Gene Expression Regulation, Neoplastic , Genomics , Prostatic Neoplasms/genetics , Adenocarcinoma/secondary , Adenocarcinoma/therapy , Combined Modality Therapy , DNA, Neoplasm/analysis , Gene Amplification , Gene Dosage , Gene Expression Profiling , Gene Fusion , Humans , Male , Middle Aged , Neuroendocrine Cells/metabolism , Neuroendocrine Cells/pathology , Prognosis , Prostatectomy , Prostatic Neoplasms/pathology , Prostatic Neoplasms/therapy , RNA-Binding Proteins/genetics , RNA-Binding Proteins/metabolism , Sequence Analysis, DNA , Sequence Analysis, RNA
SELECTION OF CITATIONS
SEARCH DETAIL