Search | BVS CLAP/SMR-OPS/OMS

1.

Exome sequencing and analysis of 454,787 UK Biobank participants.

Backman, Joshua D; Li, Alexander H; Marcketta, Anthony; Sun, Dylan; Mbatchou, Joelle; Kessler, Michael D; Benner, Christian; Liu, Daren; Locke, Adam E; Balasubramanian, Suganthi; Yadav, Ashish; Banerjee, Nilanjana; Gillies, Christopher E; Damask, Amy; Liu, Simon; Bai, Xiaodong; Hawes, Alicia; Maxwell, Evan; Gurski, Lauren; Watanabe, Kyoko; Kosmicki, Jack A; Rajagopal, Veera; Mighty, Jason; Jones, Marcus; Mitnaul, Lyndon; Stahl, Eli; Coppola, Giovanni; Jorgenson, Eric; Habegger, Lukas; Salerno, William J; Shuldiner, Alan R; Lotta, Luca A; Overton, John D; Cantor, Michael N; Reid, Jeffrey G; Yancopoulos, George; Kang, Hyun M; Marchini, Jonathan; Baras, Aris; Abecasis, Gonçalo R; Ferreira, Manuel A R.

Nature ; 599(7886): 628-634, 2021 11.

Article in English | MEDLINE | ID: mdl-34662886

ABSTRACT

A major goal in human genetics is to use natural variation to understand the phenotypic consequences of altering each protein-coding gene in the genome. Here we used exome sequencing (ref. 1) to explore protein-altering variants and their consequences in 454,787 participants in the UK Biobank study (ref. 2). We identified 12 million coding variants, including around 1 million loss-of-function and around 1.8 million deleterious missense variants. When these were tested for association with 3,994 health-related traits, we found 564 genes with trait associations at P ≤ 2.18 × 10⁻¹¹. Rare variant associations were enriched in loci from genome-wide association studies (GWAS), but most (91%) were independent of common variant signals. We discovered several risk-increasing associations with traits related to liver disease, eye disease and cancer, among others, as well as risk-lowering associations for hypertension (SLC9A3R2), diabetes (MAP3K15, FAM234A) and asthma (SLC27A3). Six genes were associated with brain imaging phenotypes, including two involved in neural development (GBE1, PLD1). Of the signals available and powered for replication in an independent cohort, 81% were confirmed; furthermore, association signals were generally consistent across individuals of European, Asian and African ancestry. We illustrate the ability of exome sequencing to identify gene-trait associations, elucidate gene function and pinpoint effector genes that underlie GWAS signals at scale.

Subject(s)

Biological Specimen Banks , Databases, Genetic , Exome Sequencing , Exome/genetics , Africa/ethnology , Asia/ethnology , Asthma/genetics , Diabetes Mellitus/genetics , Europe/ethnology , Eye Diseases/genetics , Female , Genetic Predisposition to Disease/genetics , Genetic Variation , Genome-Wide Association Study , Humans , Hypertension/genetics , Liver Diseases/genetics , Male , Mutation , Neoplasms/genetics , Quantitative Trait, Heritable , United Kingdom

2.

Mapping and characterization of structural variation in 17,795 human genomes.

Abel, Haley J; Larson, David E; Regier, Allison A; Chiang, Colby; Das, Indraniel; Kanchi, Krishna L; Layer, Ryan M; Neale, Benjamin M; Salerno, William J; Reeves, Catherine; Buyske, Steven; Matise, Tara C; Muzny, Donna M; Zody, Michael C; Lander, Eric S; Dutcher, Susan K; Stitziel, Nathan O; Hall, Ira M.

Nature ; 583(7814): 83-89, 2020 07.

Article in English | MEDLINE | ID: mdl-32460305

ABSTRACT

A key goal of whole-genome sequencing for studies of human genetics is to interrogate all forms of variation, including single-nucleotide variants, small insertion or deletion (indel) variants and structural variants. However, tools and resources for the study of structural variants have lagged behind those for smaller variants. Here we used a scalable pipeline (ref. 1) to map and characterize structural variants in 17,795 deeply sequenced human genomes. We publicly release site-frequency data to create the largest, to our knowledge, whole-genome-sequencing-based structural variant resource so far. On average, individuals carry 2.9 rare structural variants that alter coding regions; these variants affect the dosage or structure of 4.2 genes and account for 4.0-11.2% of rare high-impact coding alleles. Using a computational model, we estimate that structural variants account for 17.2% of rare alleles genome-wide, with predicted deleterious effects that are equivalent to loss-of-function coding alleles; approximately 90% of such structural variants are noncoding deletions (mean 19.1 per genome). We report 158,991 ultra-rare structural variants and show that 2% of individuals carry ultra-rare megabase-scale structural variants, nearly half of which are balanced or complex rearrangements. Finally, we infer the dosage sensitivity of genes and noncoding elements, and reveal trends that relate to element class and conservation. This work will help to guide the analysis and interpretation of structural variants in the era of whole-genome sequencing.

Subject(s)

Genetic Variation , Genome, Human/genetics , Whole Genome Sequencing , Alleles , Case-Control Studies , Epigenesis, Genetic , Female , Gene Dosage/genetics , Genetics, Population , High-Throughput Nucleotide Sequencing , Humans , Male , Molecular Sequence Annotation , Quantitative Trait Loci , Racial Groups/genetics , Software

3.

Exome sequencing and characterization of 49,960 individuals in the UK Biobank.

Van Hout, Cristopher V; Tachmazidou, Ioanna; Backman, Joshua D; Hoffman, Joshua D; Liu, Daren; Pandey, Ashutosh K; Gonzaga-Jauregui, Claudia; Khalid, Shareef; Ye, Bin; Banerjee, Nilanjana; Li, Alexander H; O'Dushlaine, Colm; Marcketta, Anthony; Staples, Jeffrey; Schurmann, Claudia; Hawes, Alicia; Maxwell, Evan; Barnard, Leland; Lopez, Alexander; Penn, John; Habegger, Lukas; Blumenfeld, Andrew L; Bai, Xiaodong; O'Keeffe, Sean; Yadav, Ashish; Praveen, Kavita; Jones, Marcus; Salerno, William J; Chung, Wendy K; Surakka, Ida; Willer, Cristen J; Hveem, Kristian; Leader, Joseph B; Carey, David J; Ledbetter, David H; Cardon, Lon; Yancopoulos, George D; Economides, Aris; Coppola, Giovanni; Shuldiner, Alan R; Balasubramanian, Suganthi; Cantor, Michael; Nelson, Matthew R; Whittaker, John; Reid, Jeffrey G; Marchini, Jonathan; Overton, John D; Scott, Robert A; Abecasis, Gonçalo R; Yerges-Armstrong, Laura.

Nature ; 586(7831): 749-756, 2020 10.

Article in English | MEDLINE | ID: mdl-33087929

ABSTRACT

The UK Biobank is a prospective study of 502,543 individuals, combining extensive phenotypic and genotypic data with streamlined access for researchers around the world (ref. 1). Here we describe the release of exome-sequence data for the first 49,960 study participants, revealing approximately 4 million coding variants (of which around 98.6% have a frequency of less than 1%). The data include 198,269 autosomal predicted loss-of-function (LOF) variants, a more than 14-fold increase compared to the imputed sequence. Nearly all genes (more than 97%) had at least one carrier with a LOF variant, and most genes (more than 69%) had at least ten carriers with a LOF variant. We illustrate the power of characterizing LOF variants in this population through association analyses across 1,730 phenotypes. In addition to replicating established associations, we found novel LOF variants with large effects on disease traits, including PIEZO1 on varicose veins, COL6A1 on corneal resistance, MEPE on bone density, and IQGAP2 and GMPR on blood cell traits. We further demonstrate the value of exome sequencing by surveying the prevalence of pathogenic variants of clinical importance, and show that 2% of this population has a medically actionable variant. Furthermore, we characterize the penetrance of cancer in carriers of pathogenic BRCA1 and BRCA2 variants. Exome sequences from the first 49,960 participants highlight the promise of genome sequencing in large population-based studies and are now accessible to the scientific community.

Subject(s)

Databases, Genetic , Exome Sequencing , Exome/genetics , Loss of Function Mutation/genetics , Phenotype , Aged , Bone Density/genetics , Collagen Type VI/genetics , Demography , Female , Genes, BRCA1 , Genes, BRCA2 , Genotype , Humans , Ion Channels/genetics , Male , Middle Aged , Neoplasms/genetics , Penetrance , Peptide Fragments/genetics , United Kingdom , Varicose Veins/genetics , ras GTPase-Activating Proteins/genetics

4.

Optimized sample selection for cost-efficient long-read population sequencing.

Ranallo-Benavidez, T Rhyker; Lemmon, Zachary; Soyk, Sebastian; Aganezov, Sergey; Salerno, William J; McCoy, Rajiv C; Lippman, Zachary B; Schatz, Michael C; Sedlazeck, Fritz J.

Genome Res ; 31(5): 910-918, 2021 05.

Article in English | MEDLINE | ID: mdl-33811084

ABSTRACT

An increasingly important scenario in population genetics is when a large cohort has been genotyped using a low-resolution approach (e.g., microarrays, exome capture, short-read WGS), from which a few individuals are resequenced using a more comprehensive approach, especially long-read sequencing. The subset of individuals selected should ensure that the captured genetic diversity is fully representative and includes variants across all subpopulations. For example, studies of human variation have historically focused on individuals of European ancestry, but this group represents a small fraction of overall human diversity. Addressing this, SVCollector identifies the optimal subset of individuals for resequencing by analyzing population-level VCF files from low-resolution genotyping studies. It then computes a ranked list of samples that maximizes the total number of variants present within a subset of a given size. To solve this optimization problem, SVCollector implements a fast, greedy heuristic and an exact algorithm using integer linear programming. We apply SVCollector to simulated data, 2504 human genomes from the 1000 Genomes Project, and 3024 genomes from the 3000 Rice Genomes Project and show that the rankings it computes are more representative than those of alternative naive strategies. When selecting an optimal subset of 100 samples in these cohorts, SVCollector identifies individuals from every subpopulation, whereas naive methods yield an unbalanced selection. Finally, we show that the number of variants present in cohorts selected using this approach follows a power-law distribution that is naturally related to the population genetic concept of the allele frequency spectrum, allowing us to estimate the diversity present with increasing numbers of samples.
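The greedy heuristic described in this abstract is a form of greedy maximum coverage: repeatedly pick the sample that adds the most variants not yet covered by the samples already chosen. The Python sketch below is a hypothetical illustration, not SVCollector's interface; it assumes each sample's non-reference variants have already been extracted from the population VCF into a set of variant IDs, and it does not show the exact integer-linear-programming mode.

```python
from typing import Dict, List, Set


def greedy_rank_samples(sample_variants: Dict[str, Set[str]], k: int) -> List[str]:
    """Rank up to k samples by greedy maximum coverage (hypothetical helper).

    At each step, choose the sample contributing the most variants not yet
    covered by previously chosen samples. Ties are broken by sample name so
    the ranking is deterministic.
    """
    covered: Set[str] = set()
    remaining = dict(sample_variants)
    ranking: List[str] = []
    while remaining and len(ranking) < k:
        # Most new variants first; alphabetical order breaks ties.
        best = min(remaining, key=lambda s: (-len(remaining[s] - covered), s))
        ranking.append(best)
        covered |= remaining.pop(best)
    return ranking


if __name__ == "__main__":
    # Toy data: sample -> IDs of the variants it carries.
    samples = {
        "A": {"v1", "v2", "v3"},
        "B": {"v1", "v2"},
        "C": {"v4", "v5"},
        "D": {"v3", "v6"},
    }
    print(greedy_rank_samples(samples, 3))  # prints ['A', 'C', 'D']
```

For the toy input, A is chosen first (3 new variants), then C (2 new), then D (1 new); B adds nothing once A is selected. Greedy selection is fast but not guaranteed to be optimal, which is why an exact formulation is also worth having for smaller problems.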

Subject(s)

Genome, Human , Polymorphism, Single Nucleotide , Exome/genetics , Gene Frequency , Genetics, Population , Humans , Sequence Analysis, DNA/methods

5.

Sparse Project VCF: efficient encoding of population genotype matrices.

Lin, Michael F; Bai, Xiaodong; Salerno, William J; Reid, Jeffrey G.

Bioinformatics ; 36(22-23): 5537-5538, 2021 04 01.

Article in English | MEDLINE | ID: mdl-33300997

ABSTRACT

SUMMARY: Variant Call Format (VCF), the prevailing representation for germline genotypes in population sequencing, suffers rapid size growth as larger cohorts are sequenced and more rare variants are discovered. We present Sparse Project VCF (spVCF), an evolution of VCF with judicious entropy reduction and run-length encoding, delivering >10× size reduction for modern studies with practically minimal information loss. spVCF interoperates with VCF efficiently, including tabix-based random access. We demonstrate its effectiveness with the DiscovEHR and UK Biobank whole-exome sequencing cohorts. AVAILABILITY AND IMPLEMENTATION: Apache-licensed reference implementation: github.com/mlin/spVCF. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
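The run-length idea mentioned in this summary can be illustrated on a toy genotype matrix. This is a hypothetical sketch loosely in the spirit of spVCF, not its published format: it assumes a cell identical to the same sample's cell in the previous row is replaced by a quote mark, and that consecutive quote marks within a row collapse into one token carrying the run length. It omits the entropy-reduction step and all non-GT fields, and assumes no genotype string itself begins with a quote mark.

```python
from typing import List

Row = List[str]  # one variant row: one genotype string per sample


def _run_token(run: int) -> str:
    # A single repeated cell is '"'; a longer run is '"' followed by its length.
    return '"' if run == 1 else f'"{run}'


def encode_rows(rows: List[Row]) -> List[Row]:
    """Replace cells equal to the cell above with run-length quote tokens."""
    encoded: List[Row] = []
    prev: Row = []
    for row in rows:
        out: Row = []
        run = 0
        for j, cell in enumerate(row):
            if prev and cell == prev[j]:
                run += 1
                continue
            if run:
                out.append(_run_token(run))
                run = 0
            out.append(cell)
        if run:
            out.append(_run_token(run))
        encoded.append(out)
        prev = row
    return encoded


def decode_rows(encoded: List[Row]) -> List[Row]:
    """Invert encode_rows by copying cells down from the previous decoded row."""
    rows: List[Row] = []
    prev: Row = []
    for tokens in encoded:
        row: Row = []
        for tok in tokens:
            if tok.startswith('"'):
                n = int(tok[1:]) if len(tok) > 1 else 1
                for _ in range(n):
                    row.append(prev[len(row)])
            else:
                row.append(tok)
        rows.append(row)
        prev = row
    return rows


if __name__ == "__main__":
    matrix = [
        ["0/0", "0/0", "0/1", "0/0"],
        ["0/0", "0/0", "0/0", "0/0"],
        ["0/0", "0/0", "0/0", "0/0"],
    ]
    for tokens in encode_rows(matrix):
        print("\t".join(tokens))
    assert decode_rows(encode_rows(matrix)) == matrix  # lossless round trip
```

In a cohort dominated by rare variants, most cells match the row above (typically homozygous reference), so long runs collapse into a single token; that is the intuition behind the size reduction, although the real format differs in its details.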

Subject(s)

Genomics , Software , Base Sequence , Genotype , Germ Cells

6.

Correction: Whole exome sequencing study identifies novel rare and common Alzheimer's-Associated variants involved in immune response and transcriptional regulation.

Bis, Joshua C; Jian, Xueqiu; Kunkle, Brian W; Chen, Yuning; Hamilton-Nelson, Kara L; Bush, William S; Salerno, William J; Lancour, Daniel; Ma, Yiyi; Renton, Alan E; Marcora, Edoardo; Farrell, John J; Zhao, Yi; Qu, Liming; Ahmad, Shahzad; Amin, Najaf; Amouyel, Philippe; Beecham, Gary W; Below, Jennifer E; Campion, Dominique; Cantwell, Laura; Charbonnier, Camille; Chung, Jaeyoon; Crane, Paul K; Cruchaga, Carlos; Cupples, L Adrienne; Dartigues, Jean-François; Debette, Stéphanie; Deleuze, Jean-François; Fulton, Lucinda; Gabriel, Stacey B; Genin, Emmanuelle; Gibbs, Richard A; Goate, Alison; Grenier-Boley, Benjamin; Gupta, Namrata; Haines, Jonathan L; Havulinna, Aki S; Helisalmi, Seppo; Hiltunen, Mikko; Howrigan, Daniel P; Ikram, M Arfan; Kaprio, Jaakko; Konrad, Jan; Kuzma, Amanda; Lander, Eric S; Lathrop, Mark; Lehtimäki, Terho; Lin, Honghuang; Mattila, Kari.

Mol Psychiatry ; 25(8): 1901-1903, 2020 Aug.

Article in English | MEDLINE | ID: mdl-31636380

ABSTRACT

A correction to this paper has been published and can be accessed via a link at the top of the paper.

7.

Whole exome sequencing study identifies novel rare and common Alzheimer's-Associated variants involved in immune response and transcriptional regulation.

Bis, Joshua C; Jian, Xueqiu; Kunkle, Brian W; Chen, Yuning; Hamilton-Nelson, Kara L; Bush, William S; Salerno, William J; Lancour, Daniel; Ma, Yiyi; Renton, Alan E; Marcora, Edoardo; Farrell, John J; Zhao, Yi; Qu, Liming; Ahmad, Shahzad; Amin, Najaf; Amouyel, Philippe; Beecham, Gary W; Below, Jennifer E; Campion, Dominique; Cantwell, Laura; Charbonnier, Camille; Chung, Jaeyoon; Crane, Paul K; Cruchaga, Carlos; Cupples, L Adrienne; Dartigues, Jean-François; Debette, Stéphanie; Deleuze, Jean-François; Fulton, Lucinda; Gabriel, Stacey B; Genin, Emmanuelle; Gibbs, Richard A; Goate, Alison; Grenier-Boley, Benjamin; Gupta, Namrata; Haines, Jonathan L; Havulinna, Aki S; Helisalmi, Seppo; Hiltunen, Mikko; Howrigan, Daniel P; Ikram, M Arfan; Kaprio, Jaakko; Konrad, Jan; Kuzma, Amanda; Lander, Eric S; Lathrop, Mark; Lehtimäki, Terho; Lin, Honghuang; Mattila, Kari.

Mol Psychiatry ; 25(8): 1859-1875, 2020 08.

Article in English | MEDLINE | ID: mdl-30108311

ABSTRACT

The Alzheimer's Disease Sequencing Project (ADSP) undertook whole exome sequencing in 5,740 late-onset Alzheimer disease (AD) cases and 5,096 cognitively normal controls primarily of European ancestry (EA), among whom 218 cases and 177 controls were Caribbean Hispanic (CH). An age-, sex- and APOE-based risk score and family history were used to select cases most likely to harbor novel AD risk variants and controls least likely to develop AD by age 85 years. We tested ~1.5 million single nucleotide variants (SNVs) and 50,000 insertion-deletion polymorphisms (indels) for association to AD, using multiple models considering individual variants as well as gene-based tests aggregating rare, predicted functional, and loss of function variants. Sixteen single variants and 19 genes that met criteria for significant or suggestive associations after multiple-testing correction were evaluated for replication in four independent samples: three with whole exome sequencing (2,778 cases, 7,262 controls) and one with genome-wide genotyping imputed to the Haplotype Reference Consortium panel (9,343 cases, 11,527 controls). The top findings in the discovery sample were also followed up in the ADSP whole-genome sequenced family-based dataset (197 members of 42 EA families and 501 members of 157 CH families). We identified novel and predicted functional genetic variants in genes previously associated with AD. We also detected associations in three novel genes: IGHG3 (p = 9.8 × 10⁻⁷), an immunoglobulin gene whose antibodies interact with β-amyloid, a long non-coding RNA AC099552.4 (p = 1.2 × 10⁻⁷), and a zinc-finger protein ZNF655 (gene-based p = 5.0 × 10⁻⁶). The latter two suggest an important role for transcriptional regulation in AD pathogenesis.

Subject(s)

Alzheimer Disease/genetics , Alzheimer Disease/immunology , Exome Sequencing , Gene Expression Regulation/genetics , Immunity/genetics , Transcription, Genetic/genetics , Aged , Aged, 80 and over , Alzheimer Disease/pathology , Amyloid beta-Peptides/immunology , Apolipoproteins E/genetics , Female , Haplotypes/genetics , Humans , Immunoglobulin G , Kruppel-Like Transcription Factors/genetics , Male , Polymorphism, Genetic/genetics , RNA, Long Noncoding/genetics

8.

VCPA: genomic variant calling pipeline and data management tool for Alzheimer's Disease Sequencing Project.

Leung, Yuk Yee; Valladares, Otto; Chou, Yi-Fan; Lin, Han-Jen; Kuzma, Amanda B; Cantwell, Laura; Qu, Liming; Gangadharan, Prabhakaran; Salerno, William J; Schellenberg, Gerard D; Wang, Li-San.

Bioinformatics ; 35(10): 1768-1770, 2019 05 15.

Article in English | MEDLINE | ID: mdl-30351394

ABSTRACT

SUMMARY: We report VCPA, our SNP/Indel Variant Calling Pipeline and data management tool used for the analysis of whole genome and exome sequencing (WGS/WES) for the Alzheimer's Disease Sequencing Project. VCPA consists of two independent but linkable components: pipeline and tracking database. The pipeline, implemented using the Workflow Description Language and fully optimized for the Amazon elastic compute cloud environment, includes steps from aligning raw sequence reads to variant calling using GATK. The tracking database allows users to view job running status in real time and visualize >100 quality metrics per genome. VCPA is functionally equivalent to the CCDG/TOPMed pipeline. Users can use the pipeline and the dockerized database to process large WGS/WES datasets on Amazon cloud with minimal configuration. AVAILABILITY AND IMPLEMENTATION: VCPA is released under the MIT license and is available for academic and nonprofit use for free. The pipeline source code and step-by-step instructions are available from the National Institute on Aging Genetics of Alzheimer's Disease Data Storage Site (http://www.niagads.org/VCPA). SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subject(s)

Alzheimer Disease , Data Management , Genomics , High-Throughput Nucleotide Sequencing , Humans , Software

9.

Quality control and integration of genotypes from two calling pipelines for whole genome sequence data in the Alzheimer's disease sequencing project.

Naj, Adam C; Lin, Honghuang; Vardarajan, Badri N; White, Simon; Lancour, Daniel; Ma, Yiyi; Schmidt, Michael; Sun, Fangui; Butkiewicz, Mariusz; Bush, William S; Kunkle, Brian W; Malamon, John; Amin, Najaf; Choi, Seung Hoan; Hamilton-Nelson, Kara L; van der Lee, Sven J; Gupta, Namrata; Koboldt, Daniel C; Saad, Mohamad; Wang, Bowen; Nato, Alejandro Q; Sohi, Harkirat K; Kuzma, Amanda; Wang, Li-San; Cupples, L Adrienne; van Duijn, Cornelia; Seshadri, Sudha; Schellenberg, Gerard D; Boerwinkle, Eric; Bis, Joshua C; Dupuis, Josée; Salerno, William J; Wijsman, Ellen M; Martin, Eden R; DeStefano, Anita L.

Genomics ; 111(4): 808-818, 2019 07.

Article in English | MEDLINE | ID: mdl-29857119

ABSTRACT

The Alzheimer's Disease Sequencing Project (ADSP) performed whole genome sequencing (WGS) of 584 subjects from 111 multiplex families at three sequencing centers. Genotype calling of single nucleotide variants (SNVs) and insertion-deletion variants (indels) was performed centrally using GATK-HaplotypeCaller and Atlas V2. The ADSP Quality Control (QC) Working Group applied QC protocols to project-level variant call format files (VCFs) from each pipeline, and developed and implemented a novel protocol, termed "consensus calling," to combine genotype calls from both pipelines into a single high-quality set. QC was applied to autosomal bi-allelic SNVs and indels, and included pipeline-recommended QC filters, variant-level QC, and sample-level QC. Low-quality variants or genotypes were excluded, and sample outliers were noted. Quality was assessed by examining Mendelian inconsistencies (MIs) among 67 parent-offspring pairs, and MIs were used to establish additional genotype-specific filters for GATK calls. After QC, 578 subjects remained. Pipeline-specific QC excluded ~12.0% of GATK and 14.5% of Atlas SNVs. Between pipelines, ~91% of SNV genotypes across all QCed variants were concordant; 4.23% and 4.56% of genotypes were exclusive to Atlas or GATK, respectively; the remaining ~0.01% of discordant genotypes were excluded. For indels, variant-level QC excluded ~36.8% of GATK and 35.3% of Atlas indels. Between pipelines, ~55.6% of indel genotypes were concordant, while 10.3% and 28.3% were exclusive to Atlas or GATK, respectively; the remaining ~0.29% of discordant genotypes were excluded. The final WGS consensus dataset contains 27,896,774 SNVs and 3,133,926 indels and is publicly available.
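The per-genotype consensus rule summarized in this abstract can be sketched as follows. This is a hypothetical illustration, not the ADSP QC Working Group's code: it assumes genotypes are unphased diploid strings (or None for no call after QC) and, as the concordant/exclusive/discordant breakdown above suggests, that a call made by only one pipeline is retained while a discordant pair is set to missing.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple

# Hypothetical representation: a diploid genotype string such as "0/1",
# or None when the pipeline has no call after QC.
Genotype = Optional[str]


def normalize(gt: Genotype) -> Genotype:
    """Canonicalize allele order so that "1/0" and "0/1" compare equal."""
    if gt is None:
        return None
    return "/".join(sorted(gt.replace("|", "/").split("/")))


def consensus_call(gatk: Genotype, atlas: Genotype) -> Tuple[str, Genotype]:
    """Combine one sample's genotype at one site from the two pipelines.

    Returns (category, consensus genotype). A discordant pair yields None,
    i.e. the genotype is excluded from the consensus set.
    """
    gatk, atlas = normalize(gatk), normalize(atlas)
    if gatk is None and atlas is None:
        return "missing", None
    if atlas is None:
        return "gatk_only", gatk
    if gatk is None:
        return "atlas_only", atlas
    if gatk == atlas:
        return "concordant", gatk
    return "discordant", None


def tally(pairs: Iterable[Tuple[Genotype, Genotype]]) -> Counter:
    """Count how many (GATK, Atlas) genotype pairs fall into each category."""
    return Counter(consensus_call(g, a)[0] for g, a in pairs)


if __name__ == "__main__":
    calls = [("0/1", "1/0"), ("0/0", None), (None, "1/1"), ("0/1", "1/1")]
    for g, a in calls:
        print(g, a, consensus_call(g, a))
    print(tally(calls))
```

Percentages such as those reported above would correspond to each category's share of the tallied pairs; the actual protocol also applied pipeline-specific and Mendelian-inconsistency-based filters before this step, which the sketch leaves out.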

Subject(s)

Alzheimer Disease/genetics , Genome-Wide Association Study/standards , Genotyping Techniques/standards , Quality Control , Whole Genome Sequencing/standards , Algorithms , Female , Genome-Wide Association Study/methods , Genotype , Genotyping Techniques/methods , Humans , Male , Polymorphism, Genetic , Whole Genome Sequencing/methods

10.

SVachra: a tool to identify genomic structural variation in mate pair sequencing data containing inward and outward facing reads.

Hampton, Oliver A; English, Adam C; Wang, Mark; Salerno, William J; Liu, Yue; Muzny, Donna M; Han, Yi; Wheeler, David A; Worley, Kim C; Lupski, James R; Gibbs, Richard A.

BMC Genomics ; 18(Suppl 6): 691, 2017 Oct 03.

Article in English | MEDLINE | ID: mdl-28984202

ABSTRACT

BACKGROUND: Characterization of genomic structural variation (SV) is essential to expanding the research and clinical applications of genome sequencing. Reliance upon short DNA fragment paired-end sequencing has yielded a wealth of single nucleotide variants and insertions-deletions internal to sequencing reads, at the cost of limited SV detection. Multi-kilobase DNA fragment mate pair sequencing has helped fill this gap in SV detection, but has introduced new analytic challenges requiring SV detection tools specifically designed for mate pair sequencing data. Here, we introduce SVachra (Structural Variation Assessment of CHRomosomal Aberrations), a breakpoint calling program that identifies large insertions-deletions, inversions, and inter- and intra-chromosomal translocations using both inward- and outward-facing read types generated by mate pair sequencing. RESULTS: We demonstrate SVachra's utility by running the program on large-insert (Illumina Nextera) mate pair sequencing data from the personal genome of a single subject (HS1011). An additional long-read data set (Pacific BioSciences RSII) was also generated to validate SV calls from SVachra and from other SV calling programs used for comparison. SVachra exhibited the highest validation rate and reported the widest distribution of SV types and size ranges when compared to other SV callers. CONCLUSIONS: SVachra is a highly specific breakpoint calling program whose SV detection is less biased than that of other callers.

Subject(s)

Genetic Variation , Genomics/methods , Sequence Analysis, DNA/methods

11.

Assessing structural variation in a personal genome: towards a human reference diploid genome.

English, Adam C; Salerno, William J; Hampton, Oliver A; Gonzaga-Jauregui, Claudia; Ambreth, Shruthi; Ritter, Deborah I; Beck, Christine R; Davis, Caleb F; Dahdouli, Mahmoud; Ma, Singer; Carroll, Andrew; Veeraraghavan, Narayanan; Bruestle, Jeremy; Drees, Becky; Hastie, Alex; Lam, Ernest T; White, Simon; Mishra, Pamela; Wang, Min; Han, Yi; Zhang, Feng; Stankiewicz, Pawel; Wheeler, David A; Reid, Jeffrey G; Muzny, Donna M; Rogers, Jeffrey; Sabo, Aniko; Worley, Kim C; Lupski, James R; Boerwinkle, Eric; Gibbs, Richard A.

BMC Genomics ; 16: 286, 2015 Apr 11.

Article in English | MEDLINE | ID: mdl-25886820

ABSTRACT

BACKGROUND: Characterizing large genomic variants is essential to expanding the research and clinical applications of genome sequencing. While multiple data types and methods are available to detect these structural variants (SVs), they remain less characterized than smaller variants because of SV diversity, complexity, and size. These challenges are exacerbated by the experimental and computational demands of SV analysis. Here, we characterize the SV content of a personal genome with Parliament, a publicly available consensus SV-calling infrastructure that merges multiple data types and SV detection methods. RESULTS: We demonstrate Parliament's efficacy via integrated analyses of data from whole-genome array comparative genomic hybridization, short-read next-generation sequencing, long-read (Pacific BioSciences RSII), long-insert (Illumina Nextera), and whole-genome architecture (BioNano Irys) data from the personal genome of a single subject (HS1011). From this genome, Parliament identified 31,007 genomic loci between 100 bp and 1 Mbp that are inconsistent with the hg19 reference assembly. Of these loci, 9,777 are supported as putative SVs by hybrid local assembly, long-read PacBio data, or multi-source heuristics. These SVs span 59 Mbp of the reference genome (1.8%) and include 3,801 events identified only with long-read data. The HS1011 data and complete Parliament infrastructure, including a BAM-to-SV workflow, are available on the cloud-based service DNAnexus. CONCLUSIONS: HS1011 SV analysis reveals the limits and advantages of multiple sequencing technologies, specifically the impact of long-read SV discovery. With the full Parliament infrastructure, the HS1011 data constitute a public resource for novel SV discovery, software calibration, and personal genome structural variation analysis.

Subject(s)

Genome, Human , Genomic Structural Variation , Sequence Analysis, DNA/methods , Computational Biology , Databases, Genetic , Diploidy , Humans , Software

12.

VCPA: genomic variant calling pipeline and data management tool for Alzheimer's Disease Sequencing Project.

Leung, Yuk Yee; Valladares, Otto; Chou, Yi-Fan; Lin, Han-Jen; Kuzma, Amanda B; Cantwell, Laura; Qu, Liming; Gangadharan, Prabhakaran; Salerno, William J; Schellenberg, Gerard D; Wang, Li-San.

Bioinformatics ; 35(11): 1985, 2019 06 01.

Article in English | MEDLINE | ID: mdl-31004159

13.

PBHoney: identifying genomic variants via long-read discordance and interrupted mapping.

English, Adam C; Salerno, William J; Reid, Jeffrey G.

BMC Bioinformatics ; 15: 180, 2014 Jun 10.

Article in English | MEDLINE | ID: mdl-24915764

ABSTRACT

BACKGROUND: As resequencing projects become more prevalent across a larger number of species, accurate variant identification will further elucidate the nature of genetic diversity and become increasingly relevant in genomic studies. However, the identification of larger genomic variants via DNA sequencing is limited by both the incomplete information provided by sequencing reads and the nature of the genome itself. Long-read sequencing technologies provide high-resolution access to structural variants often inaccessible to shorter reads. RESULTS: We present PBHoney, software that considers both intra-read discordance and soft-clipped tails of long reads (>10,000 bp) to identify structural variants. As a proof of concept, we identify four structural variants and two genomic features in a strain of Escherichia coli with PBHoney and validate them via de novo assembly. PBHoney is available for download at http://sourceforge.net/projects/pb-jelly/. CONCLUSIONS: Implementing two variant-identification approaches that exploit the high mappability of long reads, PBHoney is shown to be effective at detecting larger structural variants using whole-genome Pacific Biosciences RS II Continuous Long Reads. Furthermore, PBHoney is able to discover two genomic features: the existence of Rac-Phage in the isolate, and evidence of E. coli's circular genome.

Subject(s)

Genome , Genomics/methods , High-Throughput Nucleotide Sequencing/methods , Escherichia coli/genetics , Gene Deletion , Humans , INDEL Mutation , Sequence Analysis, DNA/methods , Software

14.

Yield of genetic association signals from genomes, exomes and imputation in the UK Biobank.

Gaynor, Sheila M; Joseph, Tyler; Bai, Xiaodong; Zou, Yuxin; Boutkov, Boris; Maxwell, Evan K; Delaneau, Olivier; Hofmeister, Robin J; Krasheninina, Olga; Balasubramanian, Suganthi; Marcketta, Anthony; Backman, Joshua; Reid, Jeffrey G; Overton, John D; Lotta, Luca A; Marchini, Jonathan; Salerno, William J; Baras, Aris; Abecasis, Goncalo R; Thornton, Timothy A.

Nat Genet ; 2024 Sep 25.

Article in English | MEDLINE | ID: mdl-39322778

ABSTRACT

Whole-genome sequencing (WGS), whole-exome sequencing (WES) and array genotyping with imputation (IMP) are common strategies for assessing genetic variation and its association with medically relevant phenotypes. To date, there has been no systematic empirical assessment of the yield of these approaches when applied to hundreds of thousands of samples to enable the discovery of complex trait genetic signals. Using data for 100 complex traits from 149,195 individuals in the UK Biobank, we systematically compare the relative yield of these strategies in genetic association studies. We find that WGS and WES combined with arrays and imputation (WES + IMP) have the largest association yield. Although WGS results in an approximately fivefold increase in the total number of assayed variants over WES + IMP, the number of detected signals differed by only 1% for both single-variant and gene-based association analyses. Given that WES + IMP typically results in savings of lab and computational time and resources expended per sample, we evaluate the potential benefits of applying WES + IMP to larger samples. When we extend our WES + IMP analyses to 468,169 UK Biobank individuals, we observe an approximately fourfold increase in association signals with the threefold increase in sample size. We conclude that prioritizing WES + IMP and large sample sizes rather than contemporary short-read WGS alternatives will maximize the number of discoveries in genetic association studies.

15.

Genetic risk factors for COVID-19 and influenza are largely distinct.

Kosmicki, Jack A; Marcketta, Anthony; Sharma, Deepika; Di Gioia, Silvio Alessandro; Batista, Samantha; Yang, Xiao-Man; Tzoneva, Gannie; Martinez, Hector; Sidore, Carlo; Kessler, Michael D; Horowitz, Julie E; Roberts, Genevieve H L; Justice, Anne E; Banerjee, Nilanjana; Coignet, Marie V; Leader, Joseph B; Park, Danny S; Lanche, Rouel; Maxwell, Evan; Knight, Spencer C; Bai, Xiaodong; Guturu, Harendra; Baltzell, Asher; Girshick, Ahna R; McCurdy, Shannon R; Partha, Raghavendran; Mansfield, Adam J; Turissini, David A; Zhang, Miao; Mbatchou, Joelle; Watanabe, Kyoko; Verma, Anurag; Sirugo, Giorgio; Ritchie, Marylyn D; Salerno, William J; Shuldiner, Alan R; Rader, Daniel J; Mirshahi, Tooraj; Marchini, Jonathan; Overton, John D; Carey, David J; Habegger, Lukas; Reid, Jeffrey G; Economides, Aris; Kyratsous, Christos; Karalis, Katia; Baum, Alina; Cantor, Michael N; Rand, Kristin A; Hong, Eurie L.

Nat Genet ; 56(8): 1592-1596, 2024 Aug.

Article in English | MEDLINE | ID: mdl-39103650

ABSTRACT

Coronavirus disease 2019 (COVID-19) and influenza are respiratory illnesses caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) and influenza viruses, respectively. Both diseases share symptoms and clinical risk factors (ref. 1), but the extent to which these conditions have a common genetic etiology is unknown. This is partly because host genetic risk factors are well characterized for COVID-19 but not for influenza, with the largest published genome-wide association studies for these conditions including >2 million individuals (ref. 2) and about 1,000 individuals (refs. 3-6), respectively. Shared genetic risk factors could point to targets to prevent or treat both infections. Through a genetic study of 18,334 cases with a positive test for influenza and 276,295 controls, we show that published COVID-19 risk variants are not associated with influenza. Furthermore, we discovered and replicated an association between influenza infection and noncoding variants in B3GALT5 and ST6GAL1, neither of which was associated with COVID-19. In vitro small interfering RNA knockdown of ST6GAL1 (an enzyme that adds sialic acid to the cell surface, which is used for viral entry) reduced influenza infectivity by 57%. These results mirror the observation that variants that downregulate ACE2, the SARS-CoV-2 receptor, protect against COVID-19 (ref. 7). Collectively, these findings highlight downregulation of key cell surface receptors used for viral entry as treatment opportunities to prevent COVID-19 and influenza.

Subject(s)

COVID-19 , Genetic Predisposition to Disease , Genome-Wide Association Study , Influenza, Human , SARS-CoV-2 , Humans , Influenza, Human/genetics , Influenza, Human/epidemiology , Influenza, Human/virology , COVID-19/genetics , COVID-19/virology , Risk Factors , SARS-CoV-2/genetics , Male , Female , Polymorphism, Single Nucleotide , Case-Control Studies , Middle Aged

16.

Structural variation across 138,134 samples in the TOPMed consortium.

Jun, Goo; English, Adam C; Metcalf, Ginger A; Yang, Jianzhi; Chaisson, Mark Jp; Pankratz, Nathan; Menon, Vipin K; Salerno, William J; Krasheninina, Olga; Smith, Albert V; Lane, John A; Blackwell, Tom; Kang, Hyun Min; Salvi, Sejal; Meng, Qingchang; Shen, Hua; Pasham, Divya; Bhamidipati, Sravya; Kottapalli, Kavya; Arnett, Donna K; Ashley-Koch, Allison; Auer, Paul L; Beutel, Kathleen M; Bis, Joshua C; Blangero, John; Bowden, Donald W; Brody, Jennifer A; Cade, Brian E; Chen, Yii-Der Ida; Cho, Michael H; Curran, Joanne E; Fornage, Myriam; Freedman, Barry I; Fingerlin, Tasha; Gelb, Bruce D; Hou, Lifang; Hung, Yi-Jen; Kane, John P; Kaplan, Robert; Kim, Wonji; Loos, Ruth J F; Marcus, Gregory M; Mathias, Rasika A; McGarvey, Stephen T; Montgomery, Courtney; Naseri, Take; Nouraie, S Mehdi; Preuss, Michael H; Palmer, Nicholette D; Peyser, Patricia A.

bioRxiv ; 2023 Jan 25.

Article in English | MEDLINE | ID: mdl-36747810

ABSTRACT

Ever larger Structural Variant (SV) catalogs highlighting the diversity within and between populations help researchers better understand the links between SVs and disease. The identification of SVs from DNA sequence data is non-trivial and requires a balance between comprehensiveness and precision. Here we present a catalog of 355,667 SVs (59.34% novel) across autosomes and the X chromosome (50bp+) from 138,134 individuals in the diverse TOPMed consortium. We describe our methodologies for SV inference resulting in high variant quality and >90% allele concordance compared to long-read de-novo assemblies of well-characterized control samples. We demonstrate utility through significant associations between SVs and various important cardio-metabolic and hematologic traits. We have identified 690 SV hotspots and deserts and those that potentially impact the regulation of medically relevant genes. This catalog characterizes SVs across multiple populations and will serve as a valuable tool to understand the impact of SV on disease development and progression.

17.

Structural variation across 138,134 samples in the TOPMed consortium.

Jun, Goo; English, Adam C; Metcalf, Ginger A; Yang, Jianzhi; Chaisson, Mark Jp; Pankratz, Nathan; Menon, Vipin K; Salerno, William J; Krasheninina, Olga; Smith, Albert V; Lane, John A; Blackwell, Tom; Kang, Hyun Min; Salvi, Sejal; Meng, Qingchang; Shen, Hua; Pasham, Divya; Bhamidipati, Sravya; Kottapalli, Kavya; Arnett, Donna K; Ashley-Koch, Allison; Auer, Paul L; Beutel, Kathleen M; Bis, Joshua C; Blangero, John; Bowden, Donald W; Brody, Jennifer A; Cade, Brian E; Chen, Yii-Der Ida; Cho, Michael H; Curran, Joanne E; Fornage, Myriam; Freedman, Barry I; Fingerlin, Tasha; Gelb, Bruce D; Hou, Lifang; Hung, Yi-Jen; Kane, John P; Kaplan, Robert; Kim, Wonji; Loos, Ruth J F; Marcus, Gregory M; Mathias, Rasika A; McGarvey, Stephen T; Montgomery, Courtney; Naseri, Take; Nouraie, S Mehdi; Preuss, Michael H; Palmer, Nicholette D; Peyser, Patricia A.

Res Sq ; 2023 Feb 03.

Article in English | MEDLINE | ID: mdl-36778386

ABSTRACT

Ever larger Structural Variant (SV) catalogs highlighting the diversity within and between populations help researchers better understand the links between SVs and disease. The identification of SVs from DNA sequence data is non-trivial and requires a balance between comprehensiveness and precision. Here we present a catalog of 355,667 SVs (59.34% novel) across autosomes and the X chromosome (50bp+) from 138,134 individuals in the diverse TOPMed consortium. We describe our methodologies for SV inference resulting in high variant quality and >90% allele concordance compared to long-read de-novo assemblies of well-characterized control samples. We demonstrate utility through significant associations between SVs and various important cardio-metabolic and hematologic traits. We have identified 690 SV hotspots and deserts and those that potentially impact the regulation of medically relevant genes. This catalog characterizes SVs across multiple populations and will serve as a valuable tool to understand the impact of SV on disease development and progression.

18.

Genome-wide analysis provides genetic evidence that ACE2 influences COVID-19 risk and yields risk scores associated with severe disease.

Horowitz, Julie E; Kosmicki, Jack A; Damask, Amy; Sharma, Deepika; Roberts, Genevieve H L; Justice, Anne E; Banerjee, Nilanjana; Coignet, Marie V; Yadav, Ashish; Leader, Joseph B; Marcketta, Anthony; Park, Danny S; Lanche, Rouel; Maxwell, Evan; Knight, Spencer C; Bai, Xiaodong; Guturu, Harendra; Sun, Dylan; Baltzell, Asher; Kury, Fabricio S P; Backman, Joshua D; Girshick, Ahna R; O'Dushlaine, Colm; McCurdy, Shannon R; Partha, Raghavendran; Mansfield, Adam J; Turissini, David A; Li, Alexander H; Zhang, Miao; Mbatchou, Joelle; Watanabe, Kyoko; Gurski, Lauren; McCarthy, Shane E; Kang, Hyun M; Dobbyn, Lee; Stahl, Eli; Verma, Anurag; Sirugo, Giorgio; Ritchie, Marylyn D; Jones, Marcus; Balasubramanian, Suganthi; Siminovitch, Katherine; Salerno, William J; Shuldiner, Alan R; Rader, Daniel J; Mirshahi, Tooraj; Locke, Adam E; Marchini, Jonathan; Overton, John D; Carey, David J.

Nat Genet ; 54(4): 382-392, 2022 04.

Article in English | MEDLINE | ID: mdl-35241825

ABSTRACT

Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) enters human host cells via angiotensin-converting enzyme 2 (ACE2) and causes coronavirus disease 2019 (COVID-19). Here, through a genome-wide association study, we identify a variant (rs190509934, minor allele frequency 0.2-2%) that downregulates ACE2 expression by 37% (P = 2.7 × 10-8) and reduces the risk of SARS-CoV-2 infection by 40% (odds ratio = 0.60, P = 4.5 × 10-13), providing human genetic evidence that ACE2 expression levels influence COVID-19 risk. We also replicate the associations of six previously reported risk variants, of which four were further associated with worse outcomes in individuals infected with the virus (in/near LZTFL1, MHC, DPP9 and IFNAR2). Lastly, we show that common variants define a risk score that is strongly associated with severe disease among cases and modestly improves the prediction of disease severity relative to demographic and clinical factors alone.

Subject(s)

COVID-19 , Angiotensin-Converting Enzyme 2/genetics , COVID-19/genetics , Genome-Wide Association Study , Humans , Risk Factors , SARS-CoV-2/genetics

19.

Advancing human genetics research and drug discovery through exome sequencing of the UK Biobank.

Szustakowski, Joseph D; Balasubramanian, Suganthi; Kvikstad, Erika; Khalid, Shareef; Bronson, Paola G; Sasson, Ariella; Wong, Emily; Liu, Daren; Wade Davis, J; Haefliger, Carolina; Katrina Loomis, A; Mikkilineni, Rajesh; Noh, Hyun Ji; Wadhawan, Samir; Bai, Xiaodong; Hawes, Alicia; Krasheninina, Olga; Ulloa, Ricardo; Lopez, Alex E; Smith, Erin N; Waring, Jeffrey F; Whelan, Christopher D; Tsai, Ellen A; Overton, John D; Salerno, William J; Jacob, Howard; Szalma, Sandor; Runz, Heiko; Hinkle, Gregory; Nioi, Paul; Petrovski, Slavé; Miller, Melissa R; Baras, Aris; Mitnaul, Lyndon J; Reid, Jeffrey G.

Nat Genet ; 53(7): 942-948, 2021 07.

Article in English | MEDLINE | ID: mdl-34183854

ABSTRACT

The UK Biobank Exome Sequencing Consortium (UKB-ESC) is a private-public partnership between the UK Biobank (UKB) and eight biopharmaceutical companies that will complete the sequencing of exomes for all ~500,000 UKB participants. Here, we describe the early results from ~200,000 UKB participants and the features of this project that enabled its success. The biopharmaceutical industry has increasingly used human genetics to improve success in drug discovery. Recognizing the need for large-scale human genetics data, as well as the unique value of the data access and contribution terms of the UKB, the UKB-ESC was formed. As a result, exome data from 200,643 UKB enrollees are now available. These data include ~10 million exonic variants, a rich resource of rare coding variation that is particularly valuable for drug discovery. The UKB-ESC precompetitive collaboration has further strengthened academic and industry ties and has provided teams with an opportunity to interact with and learn from the wider research community.

Subject(s)

Biological Specimen Banks , Drug Discovery , Exome Sequencing , Human Genetics , Research , Drug Discovery/methods , Genomics/methods , Humans , United Kingdom

20.

Parliament2: Accurate structural variant calling at scale.

Zarate, Samantha; Carroll, Andrew; Mahmoud, Medhat; Krasheninina, Olga; Jun, Goo; Salerno, William J; Schatz, Michael C; Boerwinkle, Eric; Gibbs, Richard A; Sedlazeck, Fritz J.

Gigascience ; 9(12)2020 12 21.

Article in English | MEDLINE | ID: mdl-33347570

ABSTRACT

BACKGROUND: Structural variants (SVs) are critical contributors to genetic diversity and genomic disease. To predict the phenotypic impact of SVs, there is a need for better estimates of both the occurrence and frequency of SVs, preferably from large, ethnically diverse cohorts. Thus, the current standard approach requires the use of short paired-end reads, from which SVs remain challenging to detect, especially at the scale of hundreds to thousands of samples. FINDINGS: We present Parliament2, a consensus SV framework that leverages multiple best-in-class methods to identify high-quality SVs from short-read DNA sequence data at scale. Parliament2 incorporates pre-installed SV callers that are optimized for efficient execution in parallel to reduce the overall runtime and costs. We demonstrate the accuracy of Parliament2 when applied to data from NovaSeq and HiSeq X platforms with the Genome in a Bottle (GIAB) SV call set across all size classes. The reported quality score per SV is calibrated across different SV types and size classes. Parliament2 has the highest F1 score (74.27%) measured across the independent gold standard from GIAB. We illustrate the compute performance by processing all 1000 Genomes samples (2,691 samples) in <1 day on GRCh38. Parliament2 improves the runtime performance of individual methods and is open source (https://github.com/slzarate/parliament2), and a Docker image, as well as a WDL implementation, is available. CONCLUSION: Parliament2 provides both a highly accurate single-sample SV call set from short-read DNA sequence data and enables cost-efficient application over cloud or cluster environments, processing thousands of samples.

Subject(s)

Genomics , Software , Genome, Human , High-Throughput Nucleotide Sequencing , Humans , Sequence Analysis
