Search | BVS CLAP/SMR-OPAS/OMS

1.

Megabase Length Hypermutation Accompanies Human Structural Variation at 17p11.2.

Beck, Christine R; Carvalho, Claudia M B; Akdemir, Zeynep C; Sedlazeck, Fritz J; Song, Xiaofei; Meng, Qingchang; Hu, Jianhong; Doddapaneni, Harsha; Chong, Zechen; Chen, Edward S; Thornton, Philip C; Liu, Pengfei; Yuan, Bo; Withers, Marjorie; Jhangiani, Shalini N; Kalra, Divya; Walker, Kimberly; English, Adam C; Han, Yi; Chen, Ken; Muzny, Donna M; Ira, Grzegorz; Shaw, Chad A; Gibbs, Richard A; Hastings, P J; Lupski, James R.

Cell ; 176(6): 1310-1324.e10, 2019 03 07.

Article in English | MEDLINE | ID: mdl-30827684

ABSTRACT

DNA rearrangements resulting in human genome structural variants (SVs) are caused by diverse mutational mechanisms. We used long- and short-read sequencing technologies to investigate end products of de novo chromosome 17p11.2 rearrangements and query the molecular mechanisms underlying both recurrent and non-recurrent events. Evidence for an increased rate of clustered single-nucleotide variant (SNV) mutation in cis with non-recurrent rearrangements was found. Indel and SNV formation are associated with both copy-number gains and losses of 17p11.2, occur up to ~1 Mb away from the breakpoint junctions, and favor C > G transversion substitutions; results suggest that single-stranded DNA is formed during the genesis of the SV and provide compelling support for a microhomology-mediated break-induced replication (MMBIR) mechanism for SV formation. Our data show an additional mutational burden of MMBIR consisting of hypermutation confined to the locus and manifesting as SNVs and indels predominantly within genes.

Subjects

Human Chromosome Pair 17 , Mutation , Multiple Abnormalities/genetics , Chromosome Breakpoints , Chromosome Disorders/genetics , Chromosome Duplication/genetics , DNA Copy Number Variations , DNA Repair/genetics , DNA Replication , Gene Rearrangement , Human Genome , Genomic Structural Variation , Humans , INDEL Mutation , Genetic Models , Single Nucleotide Polymorphism , Genetic Recombination , DNA Sequence Analysis/methods , Smith-Magenis Syndrome/genetics

2.

Sooty mangabey genome sequence provides insight into AIDS resistance in a natural SIV host.

Palesch, David; Bosinger, Steven E; Tharp, Gregory K; Vanderford, Thomas H; Paiardini, Mirko; Chahroudi, Ann; Johnson, Zachary P; Kirchhoff, Frank; Hahn, Beatrice H; Norgren, Robert B; Patel, Nirav B; Sodora, Donald L; Dawoud, Reem A; Stewart, Caro-Beth; Seepo, Sara M; Harris, R Alan; Liu, Yue; Raveendran, Muthuswamy; Han, Yi; English, Adam; Thomas, Gregg W C; Hahn, Matthew W; Pipes, Lenore; Mason, Christopher E; Muzny, Donna M; Gibbs, Richard A; Sauter, Daniel; Worley, Kim; Rogers, Jeffrey; Silvestri, Guido.

Nature ; 553(7686): 77-81, 2018 01 03.

Article in English | MEDLINE | ID: mdl-29300007

ABSTRACT

In contrast to infections with human immunodeficiency virus (HIV) in humans and simian immunodeficiency virus (SIV) in macaques, SIV infection of a natural host, sooty mangabeys (Cercocebus atys), is non-pathogenic despite high viraemia. Here we sequenced and assembled the genome of a captive sooty mangabey. We conducted genome-wide comparative analyses of transcript assemblies from C. atys and AIDS-susceptible species, such as humans and macaques, to identify candidates for host genetic factors that influence susceptibility. We identified several immune-related genes in the genome of C. atys that show substantial sequence divergence from macaques or humans. One of these sequence divergences, a C-terminal frameshift in the toll-like receptor-4 (TLR4) gene of C. atys, is associated with a blunted in vitro response to TLR-4 ligands. In addition, we found a major structural change in exons 3-4 of the immune-regulatory protein intercellular adhesion molecule 2 (ICAM-2); expression of this variant leads to reduced cell surface expression of ICAM-2. These data provide a resource for comparative genomic studies of HIV and/or SIV pathogenesis and may help to elucidate the mechanisms by which SIV-infected sooty mangabeys avoid AIDS.

Subjects

Acquired Immunodeficiency Syndrome/genetics , Cercocebus atys/genetics , Cercocebus atys/virology , Genetic Predisposition to Disease , Genome/genetics , Host Specificity/genetics , Simian Immunodeficiency Virus , Acquired Immunodeficiency Syndrome/virology , Amino Acid Sequence , Animals , Cell Adhesion Molecules/chemistry , Cell Adhesion Molecules/genetics , Cell Adhesion Molecules/metabolism , Cercocebus atys/immunology , Exons/genetics , Female , Frameshift Mutation/genetics , Genetic Variation , Genomics , HIV/pathogenicity , Humans , Macaca/virology , Sequence Deletion , Simian Acquired Immunodeficiency Syndrome/genetics , Simian Acquired Immunodeficiency Syndrome/virology , Simian Immunodeficiency Virus/pathogenicity , Species Specificity , Toll-Like Receptor 4/chemistry , Toll-Like Receptor 4/genetics , Toll-Like Receptor 4/immunology , Transcriptome/genetics , Whole Genome Sequencing

3.

muCNV: Genotyping Structural Variants for Population-level Sequencing.

Jun, Goo; Sedlazeck, Fritz; Zhu, Qihui; English, Adam; Metcalf, Ginger; Kang, Hyun Min; Lee, Charles; Gibbs, Richard; Boerwinkle, Eric.

Bioinformatics ; 2021 Mar 24.

Article in English | MEDLINE | ID: mdl-33760063

ABSTRACT

MOTIVATION: There are high demands for joint genotyping of structural variations with short-read sequencing, but efficient and accurate genotyping in population scale is a challenging task. RESULTS: We developed muCNV that aggregates per-sample summary pileups for joint genotyping of > 100,000 samples. Pilot results show very low Mendelian inconsistencies. Applications to large-scale projects in cloud show the computational efficiencies of muCNV genotyping pipeline. AVAILABILITY: muCNV is publicly available for download at: https://github.com/gjun/muCNV. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

4.

SVachra: a tool to identify genomic structural variation in mate pair sequencing data containing inward and outward facing reads.

Hampton, Oliver A; English, Adam C; Wang, Mark; Salerno, William J; Liu, Yue; Muzny, Donna M; Han, Yi; Wheeler, David A; Worley, Kim C; Lupski, James R; Gibbs, Richard A.

BMC Genomics ; 18(Suppl 6): 691, 2017 Oct 03.

Article in English | MEDLINE | ID: mdl-28984202

ABSTRACT

BACKGROUND: Characterization of genomic structural variation (SV) is essential to expanding the research and clinical applications of genome sequencing. Reliance upon short DNA fragment paired end sequencing has yielded a wealth of single nucleotide variants and internal sequencing read insertions-deletions, at the cost of limited SV detection. Multi-kilobase DNA fragment mate pair sequencing has supplemented the void in SV detection, but introduced new analytic challenges requiring SV detection tools specifically designed for mate pair sequencing data. Here, we introduce SVachra - Structural Variation Assessment of CHRomosomal Aberrations, a breakpoint calling program that identifies large insertions-deletions, inversions, inter- and intra-chromosomal translocations utilizing both inward and outward facing read types generated by mate pair sequencing. RESULTS: We demonstrate SVachra's utility by executing the program on large-insert (Illumina Nextera) mate pair sequencing data from the personal genome of a single subject (HS1011). An additional data set of long-read (Pacific BioSciences RSII) was also generated to validate SV calls from SVachra and other comparison SV calling programs. SVachra exhibited the highest validation rate and reported the widest distribution of SV types and size ranges when compared to other SV callers. CONCLUSIONS: SVachra is a highly specific breakpoint calling program that exhibits a more unbiased SV detection methodology than other callers.

Subjects

Genetic Variation , Genomics/methods , DNA Sequence Analysis/methods

5.

PacBio-LITS: a large-insert targeted sequencing method for characterization of human disease-associated chromosomal structural variations.

Wang, Min; Beck, Christine R; English, Adam C; Meng, Qingchang; Buhay, Christian; Han, Yi; Doddapaneni, Harsha V; Yu, Fuli; Boerwinkle, Eric; Lupski, James R; Muzny, Donna M; Gibbs, Richard A.

BMC Genomics ; 16: 214, 2015 Mar 19.

Article in English | MEDLINE | ID: mdl-25887218

ABSTRACT

BACKGROUND: Generation of long (>5 Kb) DNA sequencing reads provides an approach for interrogation of complex regions in the human genome. Currently, large-insert whole genome sequencing (WGS) technologies from Pacific Biosciences (PacBio) enable analysis of chromosomal structural variations (SVs), but the cost to achieve the required sequence coverage across the entire human genome is high. RESULTS: We developed a method (termed PacBio-LITS) that combines oligonucleotide-based DNA target-capture enrichment technologies with PacBio large-insert library preparation to facilitate SV studies at specific chromosomal regions. PacBio-LITS provides deep sequence coverage at the specified sites at substantially reduced cost compared with PacBio WGS. The efficacy of PacBio-LITS is illustrated by delineating the breakpoint junctions of low copy repeat (LCR)-associated complex structural rearrangements on chr17p11.2 in patients diagnosed with Potocki-Lupski syndrome (PTLS; MIM#610883). We successfully identified previously determined breakpoint junctions in three PTLS cases, and also were able to discover novel junctions in repetitive sequences, including LCR-mediated breakpoints. The new information has enabled us to propose mechanisms for formation of these structural variants. CONCLUSIONS: The new method leverages the cost efficiency of targeted capture-sequencing as well as the mappability and scaffolding capabilities of long sequencing reads generated by the PacBio platform. It is therefore suitable for studying complex SVs, especially those involving LCRs, inversions, and the generation of chimeric Alu elements at the breakpoints. Other genomic research applications, such as haplotype phasing and small insertion and deletion validation could also benefit from this technology.

Subjects

Genomics/methods , Chromosome Aberrations , Gene Library , Gene Rearrangement , Genetic Association Studies/methods , High-Throughput Nucleotide Sequencing/methods , Humans , Workflow

6.

Assessing structural variation in a personal genome - towards a human reference diploid genome.

English, Adam C; Salerno, William J; Hampton, Oliver A; Gonzaga-Jauregui, Claudia; Ambreth, Shruthi; Ritter, Deborah I; Beck, Christine R; Davis, Caleb F; Dahdouli, Mahmoud; Ma, Singer; Carroll, Andrew; Veeraraghavan, Narayanan; Bruestle, Jeremy; Drees, Becky; Hastie, Alex; Lam, Ernest T; White, Simon; Mishra, Pamela; Wang, Min; Han, Yi; Zhang, Feng; Stankiewicz, Pawel; Wheeler, David A; Reid, Jeffrey G; Muzny, Donna M; Rogers, Jeffrey; Sabo, Aniko; Worley, Kim C; Lupski, James R; Boerwinkle, Eric; Gibbs, Richard A.

BMC Genomics ; 16: 286, 2015 Apr 11.

Article in English | MEDLINE | ID: mdl-25886820

ABSTRACT

BACKGROUND: Characterizing large genomic variants is essential to expanding the research and clinical applications of genome sequencing. While multiple data types and methods are available to detect these structural variants (SVs), they remain less characterized than smaller variants because of SV diversity, complexity, and size. These challenges are exacerbated by the experimental and computational demands of SV analysis. Here, we characterize the SV content of a personal genome with Parliament, a publicly available consensus SV-calling infrastructure that merges multiple data types and SV detection methods. RESULTS: We demonstrate Parliament's efficacy via integrated analyses of data from whole-genome array comparative genomic hybridization, short-read next-generation sequencing, long-read (Pacific BioSciences RSII), long-insert (Illumina Nextera), and whole-genome architecture (BioNano Irys) data from the personal genome of a single subject (HS1011). From this genome, Parliament identified 31,007 genomic loci between 100 bp and 1 Mbp that are inconsistent with the hg19 reference assembly. Of these loci, 9,777 are supported as putative SVs by hybrid local assembly, long-read PacBio data, or multi-source heuristics. These SVs span 59 Mbp of the reference genome (1.8%) and include 3,801 events identified only with long-read data. The HS1011 data and complete Parliament infrastructure, including a BAM-to-SV workflow, are available on the cloud-based service DNAnexus. CONCLUSIONS: HS1011 SV analysis reveals the limits and advantages of multiple sequencing technologies, specifically the impact of long-read SV discovery. With the full Parliament infrastructure, the HS1011 data constitute a public resource for novel SV discovery, software calibration, and personal genome structural variation analysis.

Subjects

Human Genome , Genomic Structural Variation , DNA Sequence Analysis/methods , Computational Biology , Genetic Databases , Diploidy , Humans , Software

7.

PBHoney: identifying genomic variants via long-read discordance and interrupted mapping.

English, Adam C; Salerno, William J; Reid, Jeffrey G.

BMC Bioinformatics ; 15: 180, 2014 Jun 10.

Article in English | MEDLINE | ID: mdl-24915764

ABSTRACT

BACKGROUND: As resequencing projects become more prevalent across a larger number of species, accurate variant identification will further elucidate the nature of genetic diversity and become increasingly relevant in genomic studies. However, the identification of larger genomic variants via DNA sequencing is limited by both the incomplete information provided by sequencing reads and the nature of the genome itself. Long-read sequencing technologies provide high-resolution access to structural variants often inaccessible to shorter reads. RESULTS: We present PBHoney, software that considers both intra-read discordance and soft-clipped tails of long reads (>10,000 bp) to identify structural variants. As a proof of concept, we identify four structural variants and two genomic features in a strain of Escherichia coli with PBHoney and validate them via de novo assembly. PBHoney is available for download at http://sourceforge.net/projects/pb-jelly/. CONCLUSIONS: Implementing two variant-identification approaches that exploit the high mappability of long reads, PBHoney is demonstrated as being effective at detecting larger structural variants using whole-genome Pacific Biosciences RS II Continuous Long Reads. Furthermore, PBHoney is able to discover two genomic features: the existence of Rac-Phage in isolate; evidence of E. coli's circular genome.

Subjects

Genome , Genomics/methods , High-Throughput Nucleotide Sequencing/methods , Escherichia coli/genetics , Gene Deletion , Humans , INDEL Mutation , DNA Sequence Analysis/methods , Software

8.

Launching genomics into the cloud: deployment of Mercury, a next generation sequence analysis pipeline.

Reid, Jeffrey G; Carroll, Andrew; Veeraraghavan, Narayanan; Dahdouli, Mahmoud; Sundquist, Andreas; English, Adam; Bainbridge, Matthew; White, Simon; Salerno, William; Buhay, Christian; Yu, Fuli; Muzny, Donna; Daly, Richard; Duyk, Geoff; Gibbs, Richard A; Boerwinkle, Eric.

BMC Bioinformatics ; 15: 30, 2014 Jan 29.

Article in English | MEDLINE | ID: mdl-24475911

ABSTRACT

BACKGROUND: Massively parallel DNA sequencing generates staggering amounts of data. Decreasing cost, increasing throughput, and improved annotation have expanded the diversity of genomics applications in research and clinical practice. This expanding scale creates analytical challenges: accommodating peak compute demand, coordinating secure access for multiple analysts, and sharing validated tools and results. RESULTS: To address these challenges, we have developed the Mercury analysis pipeline and deployed it in local hardware and the Amazon Web Services cloud via the DNAnexus platform. Mercury is an automated, flexible, and extensible analysis workflow that provides accurate and reproducible genomic results at scales ranging from individuals to large cohorts. CONCLUSIONS: By taking advantage of cloud computing and with Mercury implemented on the DNAnexus platform, we have demonstrated a powerful combination of a robust and fully validated software pipeline and a scalable computational resource that, to date, we have applied to more than 10,000 whole genome and whole exome samples.

Subjects

Genomics/methods , High-Throughput Nucleotide Sequencing/methods , Internet , Software , Genome/genetics , Humans

9.

Comprehensive and accurate genome analysis at scale using DRAGEN accelerated algorithms.

Behera, Sairam; Catreux, Severine; Rossi, Massimiliano; Truong, Sean; Huang, Zhuoyi; Ruehle, Michael; Visvanath, Arun; Parnaby, Gavin; Roddey, Cooper; Onuchic, Vitor; Cameron, Daniel L; English, Adam; Mehtalia, Shyamal; Han, James; Mehio, Rami; Sedlazeck, Fritz J.

bioRxiv ; 2024 Jan 06.

Article in English | MEDLINE | ID: mdl-38260545

ABSTRACT

Research and medical genomics require comprehensive and scalable solutions to drive the discovery of novel disease targets, evolutionary drivers, and genetic markers with clinical significance. This necessitates a framework to identify all types of variants independent of their size (e.g., SNV/SV) or location (e.g., repeats). Here we present DRAGEN that utilizes novel methods based on multigenomes, hardware acceleration, and machine learning based variant detection to provide novel insights into individual genomes with ~30min computation time (from raw reads to variant detection). DRAGEN outperforms all other state-of-the-art methods in speed and accuracy across all variant types (SNV, indel, STR, SV, CNV) and further incorporates specialized methods to obtain key insights in medically relevant genes (e.g., HLA, SMN, GBA). We showcase DRAGEN across 3,202 genomes and demonstrate its scalability, accuracy, and innovations to further advance the integration of comprehensive genomics for research and medical applications.

10.

Analysis and benchmarking of small and large genomic variants across tandem repeats.

English, Adam C; Dolzhenko, Egor; Ziaei Jam, Helyaneh; McKenzie, Sean K; Olson, Nathan D; De Coster, Wouter; Park, Jonghun; Gu, Bida; Wagner, Justin; Eberle, Michael A; Gymrek, Melissa; Chaisson, Mark J P; Zook, Justin M; Sedlazeck, Fritz J.

Nat Biotechnol ; 2024 Apr 26.

Article in English | MEDLINE | ID: mdl-38671154

ABSTRACT

Tandem repeats (TRs) are highly polymorphic in the human genome, have thousands of associated molecular traits and are linked to over 60 disease phenotypes. However, they are often excluded from at-scale studies because of challenges with variant calling and representation, as well as a lack of a genome-wide standard. Here, to promote the development of TR methods, we created a catalog of TR regions and explored TR properties across 86 haplotype-resolved long-read human assemblies. We curated variants from the Genome in a Bottle (GIAB) HG002 individual to create a TR dataset to benchmark existing and future TR analysis methods. We also present an improved variant comparison method that handles variants greater than 4 bp in length and varying allelic representation. The 8.1% of the genome covered by the TR catalog holds ~24.9% of variants per individual, including 124,728 small and 17,988 large variants for the GIAB HG002 'truth-set' TR benchmark. We demonstrate the utility of this pipeline across short-read and long-read technologies.

11.

Characterization and visualization of tandem repeats at genome scale.

Dolzhenko, Egor; English, Adam; Dashnow, Harriet; De Sena Brandine, Guilherme; Mokveld, Tom; Rowell, William J; Karniski, Caitlin; Kronenberg, Zev; Danzi, Matt C; Cheung, Warren A; Bi, Chengpeng; Farrow, Emily; Wenger, Aaron; Chua, Khi Pin; Martínez-Cerdeño, Verónica; Bartley, Trevor D; Jin, Peng; Nelson, David L; Zuchner, Stephan; Pastinen, Tomi; Quinlan, Aaron R; Sedlazeck, Fritz J; Eberle, Michael A.

Nat Biotechnol ; 2024 Jan 02.

Article in English | MEDLINE | ID: mdl-38168995

ABSTRACT

Tandem repeat (TR) variation is associated with gene expression changes and numerous rare monogenic diseases. Although long-read sequencing provides accurate full-length sequences and methylation of TRs, there is still a need for computational methods to profile TRs across the genome. Here we introduce the Tandem Repeat Genotyping Tool (TRGT) and an accompanying TR database. TRGT determines the consensus sequences and methylation levels of specified TRs from PacBio HiFi sequencing data. It also reports reads that support each repeat allele. These reads can be subsequently visualized with a companion TR visualization tool. Assessing 937,122 TRs, TRGT showed a Mendelian concordance of 98.38%, allowing a single repeat unit difference. In six samples with known repeat expansions, TRGT detected all expansions while also identifying methylation signals and mosaicism and providing finer repeat length resolution than existing methods. Additionally, we released a database with allele sequences and methylation levels for 937,122 TRs across 100 genomes.
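The Mendelian concordance figure quoted above ("allowing a single repeat unit difference") can be illustrated with a minimal sketch. It assumes each genotype is a pair of allele lengths measured in repeat units and that a trio is consistent when one child allele matches a maternal allele and the other matches a paternal allele within the tolerance. The function names and the example trios are hypothetical, and TRGT's own evaluation may differ in detail.

```python
from itertools import permutations


def is_mendelian_consistent(child, mother, father, tolerance=1):
    """Return True if the child's two alleles can be explained by one
    allele from each parent, allowing `tolerance` repeat units of slack.

    Each genotype is a 2-tuple of allele lengths in repeat units
    (an assumption made for this illustration).
    """
    def matches(allele, parent):
        return any(abs(allele - p) <= tolerance for p in parent)

    # Try both assignments: (first from mother, second from father) and the reverse.
    return any(
        matches(from_mother, mother) and matches(from_father, father)
        for from_mother, from_father in permutations(child, 2)
    )


def mendelian_concordance(trios, tolerance=1):
    """Fraction of (child, mother, father) trios that are consistent."""
    if not trios:
        raise ValueError("no trios supplied")
    consistent = sum(
        is_mendelian_consistent(c, m, f, tolerance) for c, m, f in trios
    )
    return consistent / len(trios)


# Hypothetical genotypes at a single repeat locus, not data from the paper.
example_trios = [
    ((10, 15), (10, 12), (14, 20)),  # consistent only with a 1-unit tolerance
    ((10, 30), (10, 12), (14, 20)),  # 30 matches neither parent
]
rate = mendelian_concordance(example_trios)
```

Setting `tolerance=0` turns this into a strict check, which shows how much of the concordance depends on the one-unit allowance.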

12.

Benchmarking of small and large variants across tandem repeats.

English, Adam; Dolzhenko, Egor; Jam, Helyaneh Ziaei; Mckenzie, Sean; Olson, Nathan D; De Coster, Wouter; Park, Jonghun; Gu, Bida; Wagner, Justin; Eberle, Michael A; Gymrek, Melissa; Chaisson, Mark J P; Zook, Justin M; Sedlazeck, Fritz J.

bioRxiv ; 2023 Nov 01.

Article in English | MEDLINE | ID: mdl-37961319

ABSTRACT

Tandem repeats (TRs) are highly polymorphic in the human genome, have thousands of associated molecular traits, and are linked to over 60 disease phenotypes. However, their complexity often excludes them from at-scale studies due to challenges with variant calling, representation, and lack of a genome-wide standard. To promote TR methods development, we create a comprehensive catalog of TR regions and explore its properties across 86 samples. We then curate variants from the GIAB HG002 individual to create a tandem repeat benchmark. We also present a variant comparison method that handles small and large alleles and varying allelic representation. The 8.1% of the genome covered by the TR catalog holds ~24.9% of variants per individual, including 124,728 small and 17,988 large variants for the GIAB HG002 TR benchmark. We work with the GIAB community to demonstrate the utility of this benchmark across short and long read technologies.

13.

Structural variation across 138,134 samples in the TOPMed consortium.

Jun, Goo; English, Adam C; Metcalf, Ginger A; Yang, Jianzhi; Chaisson, Mark Jp; Pankratz, Nathan; Menon, Vipin K; Salerno, William J; Krasheninina, Olga; Smith, Albert V; Lane, John A; Blackwell, Tom; Kang, Hyun Min; Salvi, Sejal; Meng, Qingchang; Shen, Hua; Pasham, Divya; Bhamidipati, Sravya; Kottapalli, Kavya; Arnett, Donna K; Ashley-Koch, Allison; Auer, Paul L; Beutel, Kathleen M; Bis, Joshua C; Blangero, John; Bowden, Donald W; Brody, Jennifer A; Cade, Brian E; Chen, Yii-Der Ida; Cho, Michael H; Curran, Joanne E; Fornage, Myriam; Freedman, Barry I; Fingerlin, Tasha; Gelb, Bruce D; Hou, Lifang; Hung, Yi-Jen; Kane, John P; Kaplan, Robert; Kim, Wonji; Loos, Ruth J F; Marcus, Gregory M; Mathias, Rasika A; McGarvey, Stephen T; Montgomery, Courtney; Naseri, Take; Nouraie, S Mehdi; Preuss, Michael H; Palmer, Nicholette D; Peyser, Patricia A.

bioRxiv ; 2023 Jan 25.

Article in English | MEDLINE | ID: mdl-36747810

ABSTRACT

Ever larger Structural Variant (SV) catalogs highlighting the diversity within and between populations help researchers better understand the links between SVs and disease. The identification of SVs from DNA sequence data is non-trivial and requires a balance between comprehensiveness and precision. Here we present a catalog of 355,667 SVs (59.34% novel) across autosomes and the X chromosome (50bp+) from 138,134 individuals in the diverse TOPMed consortium. We describe our methodologies for SV inference resulting in high variant quality and >90% allele concordance compared to long-read de-novo assemblies of well-characterized control samples. We demonstrate utility through significant associations between SVs and important various cardio-metabolic and hematologic traits. We have identified 690 SV hotspots and deserts and those that potentially impact the regulation of medically relevant genes. This catalog characterizes SVs across multiple populations and will serve as a valuable tool to understand the impact of SV on disease development and progression.

14.

Structural variation across 138,134 samples in the TOPMed consortium.

Jun, Goo; English, Adam C; Metcalf, Ginger A; Yang, Jianzhi; Chaisson, Mark Jp; Pankratz, Nathan; Menon, Vipin K; Salerno, William J; Krasheninina, Olga; Smith, Albert V; Lane, John A; Blackwell, Tom; Kang, Hyun Min; Salvi, Sejal; Meng, Qingchang; Shen, Hua; Pasham, Divya; Bhamidipati, Sravya; Kottapalli, Kavya; Arnett, Donna K; Ashley-Koch, Allison; Auer, Paul L; Beutel, Kathleen M; Bis, Joshua C; Blangero, John; Bowden, Donald W; Brody, Jennifer A; Cade, Brian E; Chen, Yii-Der Ida; Cho, Michael H; Curran, Joanne E; Fornage, Myriam; Freedman, Barry I; Fingerlin, Tasha; Gelb, Bruce D; Hou, Lifang; Hung, Yi-Jen; Kane, John P; Kaplan, Robert; Kim, Wonji; Loos, Ruth J F; Marcus, Gregory M; Mathias, Rasika A; McGarvey, Stephen T; Montgomery, Courtney; Naseri, Take; Nouraie, S Mehdi; Preuss, Michael H; Palmer, Nicholette D; Peyser, Patricia A.

Res Sq ; 2023 Feb 03.

Article in English | MEDLINE | ID: mdl-36778386

ABSTRACT

Ever larger Structural Variant (SV) catalogs highlighting the diversity within and between populations help researchers better understand the links between SVs and disease. The identification of SVs from DNA sequence data is non-trivial and requires a balance between comprehensiveness and precision. Here we present a catalog of 355,667 SVs (59.34% novel) across autosomes and the X chromosome (50bp+) from 138,134 individuals in the diverse TOPMed consortium. We describe our methodologies for SV inference resulting in high variant quality and >90% allele concordance compared to long-read de-novo assemblies of well-characterized control samples. We demonstrate utility through significant associations between SVs and important various cardio-metabolic and hematologic traits. We have identified 690 SV hotspots and deserts and those that potentially impact the regulation of medically relevant genes. This catalog characterizes SVs across multiple populations and will serve as a valuable tool to understand the impact of SV on disease development and progression.

15.

Truvari: refined structural variant comparison preserves allelic diversity.

English, Adam C; Menon, Vipin K; Gibbs, Richard A; Metcalf, Ginger A; Sedlazeck, Fritz J.

Genome Biol ; 23(1): 271, 2022 12 27.

Article in English | MEDLINE | ID: mdl-36575487

ABSTRACT

The fundamental challenge of multi-sample structural variant (SV) analysis such as merging and benchmarking is identifying when two SVs are the same. Common approaches for comparing SVs were developed alongside technologies which produce ill-defined boundaries. As SV detection becomes more exact, algorithms to preserve this refined signal are needed. Here, we present Truvari (an SV comparison, annotation, and analysis toolkit) and demonstrate the effect of SV comparison choices by building population-level VCFs from 36 haplotype-resolved long-read assemblies. We observe over-merging from other SV merging approaches which cause up to a 2.2× inflation of allele frequency, relative to Truvari.
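To make concrete the question of "identifying when two SVs are the same", here is a minimal sketch of one common style of comparison: same type, sufficient reciprocal overlap, and similar size. It is an illustration only and not Truvari's implementation or API. The `SV` class, the function names, and the 0.7 thresholds are assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SV:
    """A simplified structural variant record (hypothetical, for illustration)."""
    chrom: str
    start: int   # 0-based, inclusive
    end: int     # 0-based, exclusive
    svtype: str  # e.g. "DEL"

    @property
    def size(self) -> int:
        return self.end - self.start


def reciprocal_overlap(a: SV, b: SV) -> float:
    """Shared span divided by the larger of the two spans (0.0 if disjoint)."""
    if a.chrom != b.chrom:
        return 0.0
    shared = min(a.end, b.end) - max(a.start, b.start)
    if shared <= 0:
        return 0.0
    return shared / max(a.size, b.size)


def size_similarity(a: SV, b: SV) -> float:
    """Ratio of the smaller size to the larger size."""
    larger = max(a.size, b.size)
    return min(a.size, b.size) / larger if larger > 0 else 0.0


def same_sv(a: SV, b: SV, min_overlap: float = 0.7, min_size_sim: float = 0.7) -> bool:
    """Treat two calls as the same event only if all three criteria pass."""
    return (
        a.svtype == b.svtype
        and reciprocal_overlap(a, b) >= min_overlap
        and size_similarity(a, b) >= min_size_sim
    )


# Hypothetical deletion calls: the first two overlap heavily, the third barely.
call_a = SV("chr1", 1000, 2000, "DEL")
call_b = SV("chr1", 1100, 2050, "DEL")
call_c = SV("chr1", 1900, 2900, "DEL")
```

Looser thresholds merge more calls into one record, which is the kind of choice the abstract links to inflated allele frequencies. A coordinate-only check like this one also ignores the inserted or deleted sequence itself.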

Subjects

Algorithms , Genomic Structural Variation , Humans , Gene Frequency , Alleles , Benchmarking , High-Throughput Nucleotide Sequencing , Human Genome

16.

xAtlas: scalable small variant calling across heterogeneous next-generation sequencing experiments.

Farek, Jesse; Hughes, Daniel; Salerno, William; Zhu, Yiming; Pisupati, Aishwarya; Mansfield, Adam; Krasheninina, Olga; English, Adam C; Metcalf, Ginger; Boerwinkle, Eric; Muzny, Donna M; Gibbs, Richard; Khan, Ziad; Sedlazeck, Fritz J.

Gigascience ; 12, 2022 12 28.

Article in English | MEDLINE | ID: mdl-36644891

ABSTRACT

BACKGROUND: The growing volume and heterogeneity of next-generation sequencing (NGS) data complicate the further optimization of identifying DNA variation, especially considering that curated high-confidence variant call sets frequently used to validate these methods are generally developed from the analysis of comparatively small and homogeneous sample sets. FINDINGS: We have developed xAtlas, a single-sample variant caller for single-nucleotide variants (SNVs) and small insertions and deletions (indels) in NGS data. xAtlas features rapid runtimes, support for CRAM and gVCF file formats, and retraining capabilities. xAtlas reports SNVs with 99.11% recall and 98.43% precision across a reference HG002 sample at 60× whole-genome coverage in less than 2 CPU hours. Applying xAtlas to 3,202 samples at 30× whole-genome coverage from the 1000 Genomes Project achieves an average runtime of 1.7 hours per sample and a clear separation of the individual populations in principal component analysis across called SNVs. CONCLUSIONS: xAtlas is a fast, lightweight, and accurate SNV and small indel calling method. Source code for xAtlas is available under a BSD 3-clause license at https://github.com/jfarek/xatlas.
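The recall and precision figures in the abstract above follow the standard definitions used when a callset is compared against a truth set. The sketch below shows that arithmetic. The function name and the counts are hypothetical and are not taken from the xAtlas evaluation.

```python
def precision_recall_f1(true_pos: int, false_pos: int, false_neg: int):
    """Compute precision, recall, and F1 from benchmark comparison counts.

    true_pos:  calls that match a truth-set variant
    false_pos: calls with no matching truth-set variant
    false_neg: truth-set variants that were not called
    """
    called = true_pos + false_pos
    truth = true_pos + false_neg
    precision = true_pos / called if called else 0.0
    recall = true_pos / truth if truth else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical counts chosen only to make the arithmetic easy to follow.
precision, recall, f1 = precision_recall_f1(true_pos=90, false_pos=10, false_neg=30)
```

Precision falls as false positives rise and recall falls as truth-set variants are missed, so the two numbers are normally reported together.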

Subjects

Algorithms , Software , Genome , INDEL Mutation , High-Throughput Nucleotide Sequencing/methods , Single Nucleotide Polymorphism

17.

Prevalence of alternative splicing choices in Arabidopsis thaliana.

English, Adam C; Patel, Ketan S; Loraine, Ann E.

BMC Plant Biol ; 10: 102, 2010 Jun 04.

Article in English | MEDLINE | ID: mdl-20525311

ABSTRACT

BACKGROUND: Around 14% of protein-coding genes of Arabidopsis thaliana genes from the TAIR9 genome release are annotated as producing multiple transcript variants through alternative splicing. However, for most alternatively spliced genes in Arabidopsis, the relative expression level of individual splicing variants is unknown. RESULTS: We investigated prevalence of alternative splicing (AS) events in Arabidopsis thaliana using ESTs. We found that for most AS events with ample EST coverage, the majority of overlapping ESTs strongly supported one major splicing choice, with less than 10% of ESTs supporting the minor form. Analysis of ESTs also revealed a small but noteworthy subset of genes for which alternative choices appeared with about equal prevalence, suggesting that for these genes the variant splicing forms co-occur in the same cell types. Of the AS events in which both forms were about equally prevalent, more than 80% affected untranslated regions or involved small changes to the encoded protein sequence. CONCLUSIONS: Currently available evidence from ESTs indicates that alternative splicing in Arabidopsis occurs and affects many genes, but for most genes with documented alternative splicing, one AS choice predominates. To aid investigation of the role AS may play in modulating function of Arabidopsis genes, we provide an on-line resource (ArabiTag) that supports searching AS events by gene, by EST library keyword search, and by relative prevalence of minor and major forms.

Subjects

Alternative Splicing , Arabidopsis/genetics , Genetic Models , Computational Biology/methods , Expressed Sequence Tags , Plant Gene Expression Regulation , Plant Genes , Plant Genome , Sequence Alignment , User-Computer Interface

18.

Author Correction: A robust benchmark for detection of germline large deletions and insertions.

Zook, Justin M; Hansen, Nancy F; Olson, Nathan D; Chapman, Lesley; Mullikin, James C; Xiao, Chunlin; Sherry, Stephen; Koren, Sergey; Phillippy, Adam M; Boutros, Paul C; Sahraeian, Sayed Mohammad E; Huang, Vincent; Rouette, Alexandre; Alexander, Noah; Mason, Christopher E; Hajirasouliha, Iman; Ricketts, Camir; Lee, Joyce; Tearle, Rick; Fiddes, Ian T; Barrio, Alvaro Martinez; Wala, Jeremiah; Carroll, Andrew; Ghaffari, Noushin; Rodriguez, Oscar L; Bashir, Ali; Jackman, Shaun; Farrell, John J; Wenger, Aaron M; Alkan, Can; Soylev, Arda; Schatz, Michael C; Garg, Shilpa; Church, George; Marschall, Tobias; Chen, Ken; Fan, Xian; English, Adam C; Rosenfeld, Jeffrey A; Zhou, Weichen; Mills, Ryan E; Sage, Jay M; Davis, Jennifer R; Kaiser, Michael D; Oliver, John S; Catalano, Anthony P; Chaisson, Mark J P; Spies, Noah; Sedlazeck, Fritz J; Salit, Marc.

Nat Biotechnol ; 38(11): 1357, 2020 Nov.

Article in English | MEDLINE | ID: mdl-32699374

ABSTRACT

An amendment to this paper has been published and can be accessed via a link at the top of the paper.

19.

A robust benchmark for detection of germline large deletions and insertions.

Zook, Justin M; Hansen, Nancy F; Olson, Nathan D; Chapman, Lesley; Mullikin, James C; Xiao, Chunlin; Sherry, Stephen; Koren, Sergey; Phillippy, Adam M; Boutros, Paul C; Sahraeian, Sayed Mohammad E; Huang, Vincent; Rouette, Alexandre; Alexander, Noah; Mason, Christopher E; Hajirasouliha, Iman; Ricketts, Camir; Lee, Joyce; Tearle, Rick; Fiddes, Ian T; Barrio, Alvaro Martinez; Wala, Jeremiah; Carroll, Andrew; Ghaffari, Noushin; Rodriguez, Oscar L; Bashir, Ali; Jackman, Shaun; Farrell, John J; Wenger, Aaron M; Alkan, Can; Soylev, Arda; Schatz, Michael C; Garg, Shilpa; Church, George; Marschall, Tobias; Chen, Ken; Fan, Xian; English, Adam C; Rosenfeld, Jeffrey A; Zhou, Weichen; Mills, Ryan E; Sage, Jay M; Davis, Jennifer R; Kaiser, Michael D; Oliver, John S; Catalano, Anthony P; Chaisson, Mark J P; Spies, Noah; Sedlazeck, Fritz J; Salit, Marc.

Nat Biotechnol ; 38(11): 1347-1355, 2020 11.

Article in English | MEDLINE | ID: mdl-32541955

ABSTRACT

New technologies and analysis methods are enabling genomic structural variants (SVs) to be detected with ever-increasing accuracy, resolution and comprehensiveness. To help translate these methods to routine research and clinical practice, we developed a sequence-resolved benchmark set for identification of both false-negative and false-positive germline large insertions and deletions. To create this benchmark for a broadly consented son in a Personal Genome Project trio with broadly available cells and DNA, the Genome in a Bottle Consortium integrated 19 sequence-resolved variant calling methods from diverse technologies. The final benchmark set contains 12,745 isolated, sequence-resolved insertion (7,281) and deletion (5,464) calls ≥50 base pairs (bp). The Tier 1 benchmark regions, for which any extra calls are putative false positives, cover 2.51 Gbp and 5,262 insertions and 4,095 deletions supported by ≥1 diploid assembly. We demonstrate that the benchmark set reliably identifies false negatives and false positives in high-quality SV callsets from short-, linked- and long-read sequencing and optical mapping.

Subjects

Germ-Line Mutation/genetics , INDEL Mutation/genetics , Diploidy , Genomic Structural Variation , Humans , Molecular Sequence Annotation , Sequence Analysis, DNA

20.

Functional equivalence of genome sequencing analysis pipelines enables harmonized variant calling across human genetics projects.

Regier, Allison A; Farjoun, Yossi; Larson, David E; Krasheninina, Olga; Kang, Hyun Min; Howrigan, Daniel P; Chen, Bo-Juen; Kher, Manisha; Banks, Eric; Ames, Darren C; English, Adam C; Li, Heng; Xing, Jinchuan; Zhang, Yeting; Matise, Tara; Abecasis, Goncalo R; Salerno, Will; Zody, Michael C; Neale, Benjamin M; Hall, Ira M.

Nat Commun ; 9(1): 4038, 2018 10 02.

Article in English | MEDLINE | ID: mdl-30279509

ABSTRACT

Hundreds of thousands of human whole genome sequencing (WGS) datasets will be generated over the next few years. These data are more valuable in aggregate: joint analysis of genomes from many sources increases sample size and statistical power. A central challenge for joint analysis is that different WGS data processing pipelines cause substantial differences in variant calling in combined datasets, necessitating computationally expensive reprocessing. This approach is no longer tenable given the scale of current studies and data volumes. Here, we define WGS data processing standards that allow different groups to produce functionally equivalent (FE) results, yet still innovate on data processing pipelines. We present initial FE pipelines developed at five genome centers and show that they yield similar variant calling results and produce significantly less variability than sequencing replicates. This work alleviates a key technical bottleneck for genome aggregation and helps lay the foundation for community-wide human genetics studies.

Subjects

Human Genetics/standards , Whole Genome Sequencing/standards , Genome, Human , Humans
