Results 1 - 20 of 26
1.
Nat Genet ; 53(8): 1125-1134, 2021 08.
Article in English | MEDLINE | ID: mdl-34312540

ABSTRACT

Autism is a highly heritable complex disorder in which de novo mutation (DNM) variation contributes significantly to risk. Using whole-genome sequencing data from 3,474 families, we investigate another source of large-effect risk variation, ultra-rare variants. We report and replicate a transmission disequilibrium of private, likely gene-disruptive (LGD) variants in probands but find that 95% of this burden resides outside of known DNM-enriched genes. This variant class more strongly affects multiplex family probands and supports a multi-hit model for autism. Candidate genes with private LGD variants preferentially transmitted to probands converge on the E3 ubiquitin-protein ligase complex, intracellular transport and Erb signaling protein networks. We estimate that these variants are approximately 2.5 generations old and significantly younger than other variants of similar type and frequency in siblings. Overall, private LGD variants are under strong purifying selection and appear to act on a distinct set of genes not yet associated with autism.


Subjects
Autism Spectrum Disorder/genetics , Genetic Predisposition to Disease , Proteins/genetics , Autistic Disorder/genetics , Evolution, Molecular , Gene Dosage , Haplotypes , Humans , Linkage Disequilibrium , Models, Genetic , Mutation , Pedigree , Polymorphism, Single Nucleotide , Protein Interaction Maps/genetics , Siblings , Whole Genome Sequencing
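
The transmission disequilibrium reported in the abstract of record 1 can be illustrated with a minimal sketch. It assumes the simplest setting: each private variant is carried by one heterozygous parent, so under the null hypothesis it reaches the proband with probability 0.5. The function names and the counts are hypothetical and are not taken from the paper, and this is not presented as the authors' statistical procedure.

```python
from math import comb


def tdt_chi_square(transmitted: int, untransmitted: int) -> float:
    """McNemar-style TDT statistic: (b - c)^2 / (b + c), 1 degree of freedom."""
    total = transmitted + untransmitted
    if total == 0:
        raise ValueError("at least one informative transmission is required")
    return (transmitted - untransmitted) ** 2 / total


def exact_two_sided_p(transmitted: int, untransmitted: int) -> float:
    """Exact two-sided binomial p-value against a 50% transmission rate.

    Sums the probability of every outcome that is no more likely than the
    observed one. Integer arithmetic keeps the tail sum exact.
    """
    n = transmitted + untransmitted
    observed = comb(n, transmitted)
    tail = sum(comb(n, k) for k in range(n + 1) if comb(n, k) <= observed)
    return tail / 2 ** n


# Hypothetical counts: 60 variants transmitted to probands, 40 not transmitted.
chi2 = tdt_chi_square(60, 40)
p_value = exact_two_sided_p(60, 40)
print(f"chi-square = {chi2:.2f}, exact two-sided p = {p_value:.4f}")
```

The exact binomial form is used here because private variants are rare by construction, and small counts make the chi-square approximation unreliable.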
2.
Am J Hum Genet ; 108(8): 1436-1449, 2021 08 05.
Article in English | MEDLINE | ID: mdl-34216551

ABSTRACT

Despite widespread clinical genetic testing, many individuals with suspected genetic conditions lack a precise diagnosis, limiting their opportunity to take advantage of state-of-the-art treatments. In some cases, testing reveals difficult-to-evaluate structural differences, candidate variants that do not fully explain the phenotype, single pathogenic variants in recessive disorders, or no variants in genes of interest. Thus, there is a need for better tools to identify a precise genetic diagnosis in individuals when conventional testing approaches have been exhausted. We performed targeted long-read sequencing (T-LRS) using adaptive sampling on the Oxford Nanopore platform on 40 individuals, 10 of whom lacked a complete molecular diagnosis. We computationally targeted up to 151 Mbp of sequence per individual and searched for pathogenic substitutions, structural variants, and methylation differences using a single data source. We detected all genomic aberrations (including single-nucleotide variants, copy number changes, repeat expansions, and methylation differences) identified by prior clinical testing. In 8/8 individuals with complex structural rearrangements, T-LRS enabled more precise resolution of the mutation, leading to changes in clinical management in one case. In ten individuals with suspected Mendelian conditions lacking a precise genetic diagnosis, T-LRS identified pathogenic or likely pathogenic variants in six and variants of uncertain significance in two others. T-LRS accurately identifies pathogenic structural variants, resolves complex rearrangements, and identifies Mendelian variants not detected by other technologies. T-LRS represents an efficient and cost-effective strategy to evaluate high-priority genes and regions or complex clinical testing results.


Subjects
Chromosome Aberrations , Cytogenetic Analysis/methods , Genetic Diseases, Inborn/diagnosis , Genetic Diseases, Inborn/genetics , Genetic Predisposition to Disease , Genome, Human , Mutation , DNA Copy Number Variations , Female , Genetic Testing , High-Throughput Nucleotide Sequencing , Humans , Karyotyping , Male , Sequence Analysis, DNA
3.
Genome Res ; 31(8): 1313-1324, 2021 Aug.
Article in English | MEDLINE | ID: mdl-34244228

ABSTRACT

There are more than 55,000 variable number tandem repeats (VNTRs) in the human genome, notable for both their striking polymorphism and mutability. Despite their role in human evolution and genomic variation, they have yet to be studied collectively and in detail, partially owing to their large size, variability, and predominant location in noncoding regions. Here, we examine 467 VNTRs that are human-specific expansions, unique to one location in the genome, and not associated with retrotransposons. We leverage publicly available long-read genomes, including from the Human Genome Structural Variant Consortium, to ascertain the exact nucleotide composition of these VNTRs and compare their composition of alleles. We then confirm repeat unit composition in more than 3000 short-read samples from the 1000 Genomes Project. Our analysis reveals that these VNTRs contain highly structured repeat motif organization, modified by frequent deletion and duplication events. Although overall VNTR compositions tend to remain similar between 1000 Genomes Project superpopulations, we describe a notable exception with substantial differences in repeat composition (in PCBP3), as well as several VNTRs that are significantly different in length between superpopulations (in ART1, PROP1, DYNC2I1, and LOC102723906). We also observe that most of these VNTRs are expanded in archaic human genomes, yet remain stable in length between single generations. Collectively, our findings indicate that repeat motif variability, repeat composition, and repeat length are all informative modalities to consider when characterizing VNTRs and their contribution to genomic variation.

4.
Nature ; 594(7861): 77-81, 2021 06.
Article in English | MEDLINE | ID: mdl-33953399

ABSTRACT

The divergence of chimpanzee and bonobo provides one of the few examples of recent hominid speciation [1,2]. Here we describe a fully annotated, high-quality bonobo genome assembly, which was constructed without guidance from reference genomes by applying a multiplatform genomics approach. We generate a bonobo genome assembly in which more than 98% of genes are completely annotated and 99% of the gaps are closed, including the resolution of about half of the segmental duplications and almost all of the full-length mobile elements. We compare the bonobo genome to those of other great apes [1,3-5] and identify more than 5,569 fixed structural variants that specifically distinguish the bonobo and chimpanzee lineages. We focus on genes that have been lost, changed in structure or expanded in the last few million years of bonobo evolution. We produce a high-resolution map of incomplete lineage sorting and estimate that around 5.1% of the human genome is genetically closer to chimpanzee or bonobo and that more than 36.5% of the genome shows incomplete lineage sorting if we consider a deeper phylogeny including gorilla and orangutan. We also show that 26% of the segments of incomplete lineage sorting between human and chimpanzee or human and bonobo are non-randomly distributed and that genes within these clustered segments show significant excess of amino acid replacement compared to the rest of the genome.


Subjects
Evolution, Molecular , Genome/genetics , Genomics , Pan paniscus/genetics , Phylogeny , Animals , Eukaryotic Initiation Factor-4A/genetics , Female , Genes , Gorilla gorilla/genetics , Molecular Sequence Annotation/standards , Pan troglodytes/genetics , Pongo/genetics , Segmental Duplications, Genomic , Sequence Analysis, DNA
5.
Science ; 372(6537), 2021 04 02.
Article in English | MEDLINE | ID: mdl-33632895

ABSTRACT

Long-read and strand-specific sequencing technologies together facilitate the de novo assembly of high-quality haplotype-resolved human genomes without parent-child trio data. We present 64 assembled haplotypes from 32 diverse human genomes. These highly contiguous haplotype assemblies (average minimum contig length needed to cover 50% of the genome: 26 million base pairs) integrate all forms of genetic variation, even across complex loci. We identified 107,590 structural variants (SVs), of which 68% were not discovered with short-read sequencing, and 278 SV hotspots (spanning megabases of gene-rich sequence). We characterized 130 of the most active mobile element source elements and found that 63% of all SVs arise through homology-mediated mechanisms. This resource enables reliable graph-based genotyping from short reads of up to 50,340 SVs, resulting in the identification of 1526 expression quantitative trait loci as well as SV candidates for adaptive selection within the human population.


Subjects
Genetic Variation , Genome, Human , Haplotypes , Female , Genotype , High-Throughput Nucleotide Sequencing , Humans , INDEL Mutation , Interspersed Repetitive Sequences , Male , Population Groups/genetics , Quantitative Trait Loci , Retroelements , Sequence Analysis, DNA , Inversion, Sequence , Whole Genome Sequencing
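
The contiguity metric quoted in the abstract of record 5 ("minimum contig length needed to cover 50% of the genome") is commonly called N50, or NG50 when a fixed genome size is used instead of the assembly size. A minimal sketch of the calculation follows. The function name and the contig lengths are hypothetical, and this is not the pipeline used in the paper.

```python
from typing import Iterable, Optional


def n50(contig_lengths: Iterable[int], genome_size: Optional[int] = None) -> int:
    """Return the length of the contig at which the cumulative length of
    contigs, taken longest first, reaches half of the target size.

    With genome_size=None the target is the assembly's own total length (N50).
    Passing an assumed genome size gives NG50 instead.
    """
    lengths = sorted(contig_lengths, reverse=True)
    target = (genome_size if genome_size is not None else sum(lengths)) / 2
    running_total = 0
    for length in lengths:
        running_total += length
        if running_total >= target:
            return length
    return 0  # the assembly covers less than half of genome_size


# Hypothetical toy assembly (lengths in arbitrary units).
toy_contigs = [80, 70, 50, 40, 30, 20]
print(n50(toy_contigs))                   # N50 against the assembly size (290)
print(n50(toy_contigs, genome_size=400))  # NG50 against an assumed genome size
```

Sorting longest first is what makes the result a "minimum contig length": every contig counted before the threshold is at least as long as the value returned.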
6.
Viruses ; 13(2), 2021 02 04.
Article in English | MEDLINE | ID: mdl-33557409

ABSTRACT

Hepatocellular carcinoma (HCC) is a leading cause of cancer-related mortality. Almost half of HCC cases are associated with hepatitis B virus (HBV) infections, which often lead to HBV sequence integrations in the human genome. Accurate identification of HBV integration sites at a single nucleotide resolution is critical for developing a better understanding of the cancer genome landscape and of the disease itself. Here, we performed further analyses and characterization of HBV integrations identified by our recently reported VIcaller platform in recurrent or known HCC genes (such as TERT, MLL4, and CCNE1) as well as non-recurrent cancer-related genes (such as CSMD2, NKD2, and RHOU). Our pathway enrichment analysis revealed multiple pathways involving the alcohol dehydrogenase 4 gene, such as the metabolism pathways of retinol, tyrosine, and fatty acid. Further analysis of the HBV integration sites revealed distinct patterns involving the integration upper breakpoints, integrated genome lengths, and integration allele fractions between tumor and normal tissues. Our analysis also implies that the VIcaller method has diagnostic potential through discovering novel clonal integrations in cancer-related genes. In conclusion, although VIcaller is a hypothesis-free, virome-wide approach, it can still be applied to accurately identify genome-wide integration events of a specific candidate virus and their integration allele fractions.


Subjects
Carcinoma, Hepatocellular/genetics , Hepatitis B virus/genetics , Liver Neoplasms/genetics , Virus Integration , Carcinogenesis/genetics , Carcinoma, Hepatocellular/pathology , Carcinoma, Hepatocellular/virology , DNA, Viral/genetics , Gene Frequency , Genome, Human/genetics , Genome, Viral/genetics , Hepatitis B virus/physiology , Hepatitis B, Chronic/genetics , Hepatitis B, Chronic/pathology , Hepatitis B, Chronic/virology , Humans , Liver Neoplasms/pathology , Liver Neoplasms/virology , Software
7.
Nat Biotechnol ; 39(3): 302-308, 2021 03.
Article in English | MEDLINE | ID: mdl-33288906

ABSTRACT

Human genomes are typically assembled as consensus sequences that lack information on parental haplotypes. Here we describe a reference-free workflow for diploid de novo genome assembly that combines the chromosome-wide phasing and scaffolding capabilities of single-cell strand sequencing [1,2] with continuous long-read or high-fidelity [3] sequencing data. Employing this strategy, we produced a completely phased de novo genome assembly for each haplotype of an individual of Puerto Rican descent (HG00733) in the absence of parental data. The assemblies are accurate (quality value > 40) and highly contiguous (contig N50 > 23 Mbp) with low switch error rates (0.17%), providing fully phased single-nucleotide variants, indels and structural variants. A comparison of Oxford Nanopore Technologies and Pacific Biosciences phased assemblies identified 154 regions that are preferential sites of contig breaks, irrespective of sequencing technology or phasing algorithms.


Subjects
Genome, Human , High-Throughput Nucleotide Sequencing/methods , Parents , Sequence Analysis, DNA/methods , Single-Cell Analysis/methods , Algorithms , Haplotypes , Humans , Puerto Rico/ethnology
8.
Nat Commun ; 11(1): 4932, 2020 10 01.
Article in English | MEDLINE | ID: mdl-33004838

ABSTRACT

Most genes associated with neurodevelopmental disorders (NDDs) were identified with an excess of de novo mutations (DNMs) but the significance in case-control mutation burden analysis is unestablished. Here, we sequence 63 genes in 16,294 NDD cases and an additional 62 genes in 6,211 NDD cases. By combining these with published data, we assess a total of 125 genes in over 16,000 NDD cases and compare the mutation burden to nonpsychiatric controls from ExAC. We identify 48 genes (25 newly reported) showing significant burden of ultra-rare (MAF < 0.01%) gene-disruptive mutations (FDR 5%), six of which reach family-wise error rate (FWER) significance (p < 1.25E-06). Among these 125 targeted genes, we also reevaluate DNM excess in 17,426 NDD trios with 6,499 new autism trios. We identify 90 genes enriched for DNMs (FDR 5%; e.g., GABRG2 and UIMC1); of which, 61 reach FWER significance (p < 3.64E-07; e.g., CASZ1). In addition to doubling the number of patients for many NDD risk genes, we present phenotype-genotype correlations for seven risk genes (CTCF, HNRNPU, KCNQ3, ZBTB18, TCF12, SPEN, and LEO1) based on this large-scale targeted sequencing effort.


Subjects
Genetic Predisposition to Disease , Neurodevelopmental Disorders/genetics , Basic Helix-Loop-Helix Transcription Factors/genetics , CCCTC-Binding Factor/genetics , Case-Control Studies , Cohort Studies , DNA Mutational Analysis , DNA-Binding Proteins/genetics , Female , Genetic Association Studies , Heterogeneous-Nuclear Ribonucleoprotein U/genetics , High-Throughput Nucleotide Sequencing , Humans , KCNQ3 Potassium Channel/genetics , Male , Mutation , RNA-Binding Proteins/genetics , Repressor Proteins/genetics , Transcription Factors/genetics
9.
Genome Res ; 30(11): 1680-1693, 2020 11.
Article in English | MEDLINE | ID: mdl-33093070

ABSTRACT

Rhesus macaque is an Old World monkey that shared a common ancestor with humans ∼25 Myr ago and is an important animal model for human disease studies. A deep understanding of its genetics is therefore required for both biomedical and evolutionary studies. Among structural variants, inversions represent a driving force in speciation and play an important role in disease predisposition. Here we generated a genome-wide map of inversions between human and macaque, combining single-cell strand sequencing with cytogenetics. We identified 375 total inversions ranging from 859 bp to 92 Mbp, increasing by eightfold the number of previously reported inversions. Among these, 19 inversions flanked by segmental duplications overlap with recurrent copy number variants associated with neurocognitive disorders. Evolutionary analyses show that in 17 out of 19 cases, the Hominidae orientation of these disease-associated regions is derived. This suggests that duplicated sequences likely played a fundamental role in generating inversions in humans and great apes, creating architectures that nowadays predispose these regions to disease-associated genetic instability. Finally, we identified 861 genes mapping at 156 inversion breakpoints, with some showing evidence of differential expression in human and macaque cell lines, thus highlighting candidates that might have contributed to the evolution of species-specific features. This study depicts the most accurate fine-scale map of inversions between human and macaque using a two-pronged integrative approach that combines single-cell strand sequencing and cytogenetics, and represents a valuable resource toward understanding the biology and evolution of primate species.

11.
Am J Hum Genet ; 107(3): 445-460, 2020 09 03.
Article in English | MEDLINE | ID: mdl-32750315

ABSTRACT

Tandem repeats are proposed to contribute to human-specific traits, and more than 40 tandem repeat expansions are known to cause neurological disease. Here, we characterize a human-specific 69 bp variable number tandem repeat (VNTR) in the last intron of WDR7, which exhibits striking variability in both copy number and nucleotide composition, as revealed by long-read sequencing. In addition, greater repeat copy number is significantly enriched in three independent cohorts of individuals with sporadic amyotrophic lateral sclerosis (ALS). Each unit of the repeat forms a stem-loop structure with the potential to produce microRNAs, and the repeat RNA can aggregate when expressed in cells. We leveraged its remarkable sequence variability to align the repeat in 288 samples and uncover its mechanism of expansion. We found that the repeat expands in the 3'-5' direction, in groups of repeat units divisible by two. The expansion patterns we observed were consistent with duplication events, and a replication error called template switching. We also observed that the VNTR is expanded in both Denisovan and Neanderthal genomes but is fixed at one copy or fewer in non-human primates. Evaluating the repeat in 1000 Genomes Project samples reveals that some repeat segments are solely present or absent in certain geographic populations. The large size of the repeat unit in this VNTR, along with our multiplexed sequencing strategy, provides an unprecedented opportunity to study mechanisms of repeat expansion, and a framework for evaluating the roles of VNTRs in human evolution and disease.


Subjects
Adaptor Proteins, Signal Transducing/genetics , Amyotrophic Lateral Sclerosis/genetics , Evolution, Molecular , Tandem Repeat Sequences/genetics , Aged , Alzheimer Disease/genetics , Alzheimer Disease/pathology , Amyotrophic Lateral Sclerosis/pathology , DNA Repeat Expansion/genetics , Female , Gene Expression Regulation/genetics , Humans , Male , Minisatellite Repeats/genetics , Phenotype , Species Specificity
12.
Genome Biol ; 21(1): 202, 2020 08 10.
Article in English | MEDLINE | ID: mdl-32778141

ABSTRACT

BACKGROUND: The complex interspersed pattern of segmental duplications in humans is responsible for rearrangements associated with neurodevelopmental disease, including the emergence of novel genes important in human brain evolution. We investigate the evolution of LCR16a, a putative driver of this phenomenon that encodes one of the most rapidly evolving human-ape gene families, nuclear pore interacting protein (NPIP). RESULTS: Comparative analysis shows that LCR16a has independently expanded in five primate lineages over the last 35 million years of primate evolution. The expansions are associated with independent lineage-specific segmental duplications flanking LCR16a leading to the emergence of large interspersed duplication blocks at non-orthologous chromosomal locations in each primate lineage. The intron-exon structure of the NPIP gene family has changed dramatically throughout primate evolution with different branches showing characteristic gene models yet maintaining an open reading frame. In the African ape lineage, we detect signatures of positive selection that occurred after a transition to more ubiquitous expression among great ape tissues when compared to Old World and New World monkeys. Mouse transgenic experiments from baboon and human genomic loci confirm these expression differences and suggest that the broader ape expression pattern arose due to mutational changes that emerged in cis. CONCLUSIONS: LCR16a promotes serial interspersed duplications and creates hotspots of genomic instability that appear to be an ancient property of primate genomes. Dramatic changes to NPIP gene structure and altered tissue expression preceded major bouts of positive selection in the African ape lineage, suggestive of a gene undergoing strong adaptive evolution.

13.
Nat Genet ; 52(8): 849-858, 2020 08.
Article in English | MEDLINE | ID: mdl-32541924

ABSTRACT

Inversions play an important role in disease and evolution but are difficult to characterize because their breakpoints map to large repeats. We increased by sixfold the number (n = 1,069) of previously reported great ape inversions by using single-cell DNA template strand and long-read sequencing. We find that the X chromosome is most enriched (2.5-fold) for inversions, on the basis of its size and duplication content. There is an excess of differentially expressed primate genes near the breakpoints of large (>100 kilobases (kb)) inversions but not smaller events. We show that when great ape lineage-specific duplications emerge, they preferentially (approximately 75%) occur in an inverted orientation compared to that at their ancestral locus. We construct megabase-pair scale haplotypes for individual chromosomes and identify 23 genomic regions that have recurrently toggled between a direct and an inverted state over 15 million years. The direct orientation is most frequently the derived state for human polymorphisms that predispose to recurrent copy number variants associated with neurodevelopmental disease.


Subjects
Chromosome Inversion/genetics , Genome/genetics , Hominidae/genetics , Animals , Chromosomes/genetics , DNA Copy Number Variations/genetics , Evolution, Molecular , Female , Haplotypes/genetics , Humans , Male
14.
Genomics ; 112(1): 207-211, 2020 01.
Article in English | MEDLINE | ID: mdl-30710609

ABSTRACT

Viral sequence integrations in the human genome have been implicated in various human diseases. Viral integrations remain among the most challenging-to-detect structural changes of the human genome. No studies have systematically analyzed how molecular and bioinformatics factors affect the power (sensitivity) to detect viral integrations using high-throughput sequencing (HTS). We selected a wide range of molecular and bioinformatics factors covering genome sequence characteristics, HTS features, and viral integration detection. We designed a fast simulation-based framework to model the process of detecting variable viral integration events in the human genome. We then examined the associations of selected factors with viral integration detection power. We identified six factors that significantly affected viral integration detection power (P < 2 × 10⁻¹⁶). The strongest factors associated with detection power included proportion of sample cells with clonal viral integrations (Pearson's ρ = 0.64), sequencing depth (ρ = 0.37), length of viral integration (ρ = 0.37), paired-end read insert size (ρ = 0.23), user-defined threshold (number of supporting reads) to claim successful identification of integrations (ρ = -0.19), and read length (when sequence volume was fixed) (ρ = -0.09). As the first tool of its kind, VIpower incorporates all these factors, which can be manipulated in concert with each other to optimize the detection power. This tool may be used to estimate viral integration detection power for various combinations of sequencing or analytic parameters. It may also be used to estimate the parameters required to achieve a specific power when designing new sequencing experiments.


Subjects
High-Throughput Nucleotide Sequencing/methods , Software , Virus Integration , Genome, Human , Humans
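
The abstract of record 14 ranks factors by their Pearson correlation with detection power. The sketch below shows how such a coefficient is computed for one factor. The function name is hypothetical and the depth and power values are invented for illustration. They are not VIpower output, and this is not the tool's own code.

```python
from math import sqrt
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return covariance / sqrt(var_x * var_y)


# Invented example: one simulated run per sequencing depth, with the
# fraction of integrations detected at that depth.
sequencing_depth = [5, 10, 20, 30, 40, 50]
detection_power = [0.21, 0.38, 0.55, 0.70, 0.78, 0.83]
r = pearson_r(sequencing_depth, detection_power)
print(f"Pearson correlation between depth and power: {r:.2f}")
```

A coefficient near +1 or -1 indicates a strong linear association. The values reported in the abstract (for example 0.64 and -0.09) would then order the factors by the strength of that association.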
15.
Ann Hum Genet ; 84(2): 125-140, 2020 03.
Article in English | MEDLINE | ID: mdl-31711268

ABSTRACT

The sequence and assembly of human genomes using long-read sequencing technologies has revolutionized our understanding of structural variation and genome organization. We compared the accuracy, continuity, and gene annotation of genome assemblies generated from either high-fidelity (HiFi) or continuous long-read (CLR) datasets from the same complete hydatidiform mole human genome. We find that the HiFi sequence data assemble an additional 10% of duplicated regions and more accurately represent the structure of tandem repeats, as validated with orthogonal analyses. As a result, an additional 5 Mbp of pericentromeric sequences are recovered in the HiFi assembly, resulting in a 2.5-fold increase in the NG50 within 1 Mbp of the centromere (HiFi 480.6 kbp, CLR 191.5 kbp). Additionally, the HiFi genome assembly was generated in significantly less time with fewer computational resources than the CLR assembly. Although the HiFi assembly has significantly improved continuity and accuracy in many complex regions of the genome, it still falls short of the assembly of centromeric DNA and the largest regions of segmental duplication using existing assemblers. Despite these shortcomings, our results suggest that HiFi may be the most effective standalone technology for de novo assembly of human genomes.


Subjects
Biomarkers/analysis , Genetic Variation , Genome, Human , Haploidy , Hydatidiform Mole/genetics , Sequence Analysis, DNA/methods , Single-Cell Analysis/methods , Female , High-Throughput Nucleotide Sequencing , Humans , Molecular Sequence Annotation , Pregnancy
16.
Proc Natl Acad Sci U S A ; 116(46): 23243-23253, 2019 11 12.
Article in English | MEDLINE | ID: mdl-31659027

ABSTRACT

Short tandem repeats (STRs) and variable number tandem repeats (VNTRs) are important sources of natural and disease-causing variation, yet they have been problematic to resolve in reference genomes and genotype with short-read technology. We created a framework to model the evolution and instability of STRs and VNTRs in apes. We phased and assembled 3 ape genomes (chimpanzee, gorilla, and orangutan) using long-read and 10x Genomics linked-read sequence data for 21,442 human tandem repeats discovered in 6 haplotype-resolved assemblies of Yoruban, Chinese, and Puerto Rican origin. We define a set of 1,584 STRs/VNTRs expanded specifically in humans, including large tandem repeats affecting coding and noncoding portions of genes (e.g., MUC3A, CACNA1C). We show that short interspersed nuclear element-VNTR-Alu (SVA) retrotransposition is the main mechanism for distributing GC-rich human-specific tandem repeat expansions throughout the genome but with a bias against genes. In contrast, we observe that VNTRs not originating from retrotransposons have a propensity to cluster near genes, especially in the subtelomere. Using tissue-specific expression from human and chimpanzee brains, we identify genes where transcript isoform usage differs significantly, likely caused by cryptic splicing variation within VNTRs. Using single-cell expression from cerebral organoids, we observe a strong effect for genes associated with transcription profiles analogous to intermediate progenitor cells. Finally, we compare the sequence composition of some of the largest human-specific repeat expansions and identify 52 STRs/VNTRs with at least 40 uninterrupted pure tracts as candidates for genetically unstable regions associated with disease.


Subjects
Evolution, Molecular , Genome, Human , Primates/genetics , Tandem Repeat Sequences , Animals , Disease/genetics , Genomic Structural Variation , Humans , RNA Splicing
17.
Genome Res ; 29(5): 819-830, 2019 05.
Article in English | MEDLINE | ID: mdl-30872350

ABSTRACT

Oncoviral infection is responsible for 12%-15% of cancer in humans. Convergent evidence from epidemiology, pathology, and oncology suggests that new viral etiologies for cancers remain to be discovered. Oncoviral profiles can be obtained from cancer genome sequencing data; however, widespread viral sequence contamination and noncausal viruses complicate the process of identifying genuine oncoviruses. Here, we propose a novel strategy to address these challenges by performing virome-wide screening of early-stage clonal viral integrations. To implement this strategy, we developed VIcaller, a novel platform for identifying viral integrations that are derived from any characterized viruses and shared by a large proportion of tumor cells using whole-genome sequencing (WGS) data. The sensitivity and precision were confirmed with simulated and benchmark cancer data sets. By applying this platform to cancer WGS data sets with proven or speculated viral etiology, we newly identified or confirmed clonal integrations of hepatitis B virus (HBV), human papillomavirus (HPV), Epstein-Barr virus (EBV), and BK Virus (BKV), suggesting the involvement of these viruses in early stages of tumorigenesis in affected tumors, such as HBV in TERT and KMT2B (also known as MLL4) gene loci in liver cancer, HPV and BKV in bladder cancer, and EBV in non-Hodgkin's lymphoma. We also showed the capacity of VIcaller to identify integrations from some uncharacterized viruses. This is the first study to systematically investigate the strategy and method of virome-wide screening of clonal integrations to identify oncoviruses. Searching clonal viral integrations with our platform has the capacity to identify virus-caused cancers and discover cancer viral etiologies.


Subjects
Neoplasms/virology , Virus Integration/genetics , Whole Genome Sequencing , BK Virus/genetics , BK Virus/pathogenicity , Carcinogenesis/genetics , Cell Transformation, Neoplastic , DNA, Viral , DNA-Binding Proteins/genetics , Hepatitis B virus/genetics , Hepatitis B virus/pathogenicity , Herpesvirus 4, Human/genetics , Herpesvirus 4, Human/pathogenicity , Histone-Lysine N-Methyltransferase , Humans , Liver Neoplasms/genetics , Liver Neoplasms/virology , Lymphoma, Non-Hodgkin/genetics , Lymphoma, Non-Hodgkin/virology , Neoplasms/genetics , Papillomaviridae/genetics , Papillomaviridae/pathogenicity , Software , Urinary Bladder Neoplasms/genetics , Urinary Bladder Neoplasms/virology
18.
Cell ; 176(3): 663-675.e19, 2019 01 24.
Article in English | MEDLINE | ID: mdl-30661756

ABSTRACT

In order to provide a comprehensive resource for human structural variants (SVs), we generated long-read sequence data and analyzed SVs for fifteen human genomes. We sequence resolved 99,604 insertions, deletions, and inversions including 2,238 (1.6 Mbp) that are shared among all discovery genomes with an additional 13,053 (6.9 Mbp) present in the majority, indicating minor alleles or errors in the reference. Genotyping in 440 additional genomes confirms the most common SVs in unique euchromatin are now sequence resolved. We report a ninefold SV bias toward the last 5 Mbp of human chromosomes with nearly 55% of all VNTRs (variable number of tandem repeats) mapping to this portion of the genome. We identify SVs affecting coding and noncoding regulatory loci improving annotation and interpretation of functional variation. These data provide the framework to construct a canonical human reference and a resource for developing advanced representations capable of capturing allelic diversity.


Subjects
Gene Frequency/genetics , Genome, Human/genetics , Genomic Structural Variation/genetics , Alleles , Euchromatin/genetics , Genomics/methods , Humans , Minisatellite Repeats/genetics , Sequence Analysis, DNA/methods
19.
Nat Genet ; 51(1): 106-116, 2019 01.
Article in English | MEDLINE | ID: mdl-30559488

ABSTRACT

We combined de novo mutation (DNM) data from 10,927 individuals with developmental delay and autism to identify 253 candidate neurodevelopmental disease genes with an excess of missense and/or likely gene-disruptive (LGD) mutations. Of these genes, 124 reach exome-wide significance (P < 5 × 10⁻⁷) for DNM. Intersecting these results with copy number variation (CNV) morbidity data shows an enrichment for genomic disorder regions (30/253, likelihood ratio (LR) +1.85, P = 0.0017). We identify genes with an excess of missense DNMs overlapping deletion syndromes (for example, KIF1A and the 2q37 deletion) as well as duplication syndromes, such as recurrent MAPK3 missense mutations within the chromosome 16p11.2 duplication, recurrent CHD4 missense DNMs in the 12p13 duplication region, and recurrent WDFY4 missense DNMs in the 10q11.23 duplication region. Network analyses of genes showing an excess of DNMs highlight functional networks, including cell-specific enrichments in the D1+ and D2+ spiny neurons of the striatum.


Subjects
DNA Copy Number Variations/genetics , Mutation/genetics , Neurodevelopmental Disorders/genetics , Animals , Autistic Disorder/genetics , Chromosome Aberrations , Developmental Disabilities/genetics , Exome/genetics , Humans , Intellectual Disability/genetics , Intracellular Signaling Peptides and Proteins/genetics , Mi-2 Nucleosome Remodeling and Deacetylase Complex/genetics , Mice , Phenotype , Polymorphism, Single Nucleotide/genetics
20.
Genome Med ; 9(1): 101, 2017 11 27.
Article in English | MEDLINE | ID: mdl-29179772

ABSTRACT

Next-generation sequencing (NGS) is now more accessible to clinicians and researchers. As a result, our understanding of the genetics of neurodevelopmental disorders (NDDs) has rapidly advanced over the past few years. NGS has led to the discovery of new NDD genes with an excess of recurrent de novo mutations (DNMs) when compared to controls. Development of large-scale databases of normal and disease variation has given rise to metrics exploring the relative tolerance of individual genes to human mutation. Genetic etiology and diagnosis rates have improved, which have led to the discovery of new pathways and tissue types relevant to NDDs. In this review, we highlight several key findings based on the discovery of recurrent DNMs ranging from copy number variants to point mutations. We explore biases and patterns of DNM enrichment and the role of mosaicism and secondary mutations in variable expressivity. We discuss the benefit of whole-genome sequencing (WGS) over whole-exome sequencing (WES) to understand more complex, multifactorial cases of NDD and explain how this improved understanding aids diagnosis and management of these disorders. Comprehensive assessment of the DNM landscape across the genome using WGS and other technologies will lead to the development of novel functional and bioinformatics approaches to interpret DNMs and drive new insights into NDD biology.


Subjects
Neurodevelopmental Disorders/genetics , DNA Copy Number Variations , Humans , Mutation , Whole Genome Sequencing
...