Search | BVS España Search Portal

1.

High-coverage whole-genome sequencing of the expanded 1000 Genomes Project cohort including 602 trios.

Byrska-Bishop, Marta; Evani, Uday S; Zhao, Xuefang; Basile, Anna O; Abel, Haley J; Regier, Allison A; Corvelo, André; Clarke, Wayne E; Musunuri, Rajeeva; Nagulapalli, Kshithija; Fairley, Susan; Runnels, Alexi; Winterkorn, Lara; Lowy, Ernesto; Germer, Soren; Brand, Harrison; Hall, Ira M; Talkowski, Michael E; Narzisi, Giuseppe; Zody, Michael C.

Cell ; 185(18): 3426-3440.e19, 2022 09 01.

Article in English | MEDLINE | ID: mdl-36055201

ABSTRACT

The 1000 Genomes Project (1kGP) is the largest fully open resource of whole-genome sequencing (WGS) data consented for public distribution without access or use restrictions. The final, phase 3 release of the 1kGP included 2,504 unrelated samples from 26 populations and was based primarily on low-coverage WGS. Here, we present a high-coverage 3,202-sample WGS 1kGP resource, which now includes 602 complete trios, sequenced to a depth of 30X using Illumina. We performed single-nucleotide variant (SNV) and short insertion and deletion (INDEL) discovery and generated a comprehensive set of structural variants (SVs) by integrating multiple analytic methods through a machine learning model. We show gains in sensitivity and precision of variant calls compared to phase 3, especially among rare SNVs as well as INDELs and SVs spanning the frequency spectrum. We also generated an improved reference imputation panel, making variants discovered here accessible for association studies.

Subject(s)

Human Genome , Whole Genome Sequencing , Female , High-Throughput Nucleotide Sequencing/methods , Humans , INDEL Mutation , Male , Single Nucleotide Polymorphism

2.

High-coverage nanopore sequencing of samples from the 1000 Genomes Project to build a comprehensive catalog of human genetic variation.

Gustafson, Jonas A; Gibson, Sophia B; Damaraju, Nikhita; Zalusky, Miranda P G; Hoekzema, Kendra; Twesigomwe, David; Yang, Lei; Snead, Anthony A; Richmond, Phillip A; De Coster, Wouter; Olson, Nathan D; Guarracino, Andrea; Li, Qiuhui; Miller, Angela L; Goffena, Joy; Anderson, Zachary B; Storz, Sophie H R; Ward, Sydney A; Sinha, Maisha; Gonzaga-Jauregui, Claudia; Clarke, Wayne E; Basile, Anna O; Corvelo, André; Reeves, Catherine; Helland, Adrienne; Musunuri, Rajeeva Lochan; Revsine, Mahler; Patterson, Karynne E; Paschal, Cate R; Zakarian, Christina; Goodwin, Sara; Jensen, Tanner D; Robb, Esther; McCombie, W Richard; Sedlazeck, Fritz J; Zook, Justin M; Montgomery, Stephen B; Garrison, Erik; Kolmogorov, Mikhail; Schatz, Michael C; McLaughlin, Richard N; Dashnow, Harriet; Zody, Michael C; Loose, Matt; Jain, Miten; Eichler, Evan E; Miller, Danny E.

Genome Res ; 2024 Oct 30.

Article in English | MEDLINE | ID: mdl-39358015

ABSTRACT

Fewer than half of individuals with a suspected Mendelian or monogenic condition receive a precise molecular diagnosis after comprehensive clinical genetic testing. Improvements in data quality and costs have heightened interest in using long-read sequencing (LRS) to streamline clinical genomic testing, but the absence of control data sets for variant filtering and prioritization has made tertiary analysis of LRS data challenging. To address this, the 1000 Genomes Project (1KGP) Oxford Nanopore Technologies Sequencing Consortium aims to generate LRS data from at least 800 of the 1KGP samples. Our goal is to use LRS to identify a broader spectrum of variation so we may improve our understanding of normal patterns of human variation. Here, we present data from analysis of the first 100 samples, representing all 5 superpopulations and 19 subpopulations. These samples, sequenced to an average depth of coverage of 37× and sequence read N50 of 54 kbp, have high concordance with previous studies for identifying single nucleotide and indel variants outside of homopolymer regions. Using multiple structural variant (SV) callers, we identify an average of 24,543 high-confidence SVs per genome, including shared and private SVs likely to disrupt gene function as well as pathogenic expansions within disease-associated repeats that were not detected using short reads. Evaluation of methylation signatures revealed expected patterns at known imprinted loci, samples with skewed X-inactivation patterns, and novel differentially methylated regions. All raw sequencing data, processed data, and summary statistics are publicly available, providing a valuable resource for the clinical genetics community to discover pathogenic SVs.
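The read N50 quoted in this abstract (54 kbp) is a length-weighted summary statistic: the read length L such that reads of length L or longer contain at least half of all sequenced bases. The following is a minimal Python sketch of that definition only, not the consortium's pipeline; the function name `n50` and the toy read lengths are illustrative assumptions.

```python
def n50(lengths):
    """Return the N50 of a collection of read lengths.

    N50 is the length of the read at which the running total of bases,
    accumulated from the longest read downward, first reaches half of
    the total number of bases.
    """
    if not lengths:
        raise ValueError("need at least one read length")
    ordered = sorted(lengths, reverse=True)  # longest reads first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy read lengths (hypothetical values, not data from the study).
read_lengths = [2, 3, 4, 5, 6]
print(n50(read_lengths))
```

Because the statistic weights reads by the bases they contribute, a few very long reads raise N50 much more than they raise the mean read length.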

3.

Sequencing of 53,831 diverse genomes from the NHLBI TOPMed Program.

Taliun, Daniel; Harris, Daniel N; Kessler, Michael D; Carlson, Jedidiah; Szpiech, Zachary A; Torres, Raul; Taliun, Sarah A Gagliano; Corvelo, André; Gogarten, Stephanie M; Kang, Hyun Min; Pitsillides, Achilleas N; LeFaive, Jonathon; Lee, Seung-Been; Tian, Xiaowen; Browning, Brian L; Das, Sayantan; Emde, Anne-Katrin; Clarke, Wayne E; Loesch, Douglas P; Shetty, Amol C; Blackwell, Thomas W; Smith, Albert V; Wong, Quenna; Liu, Xiaoming; Conomos, Matthew P; Bobo, Dean M; Aguet, François; Albert, Christine; Alonso, Alvaro; Ardlie, Kristin G; Arking, Dan E; Aslibekyan, Stella; Auer, Paul L; Barnard, John; Barr, R Graham; Barwick, Lucas; Becker, Lewis C; Beer, Rebecca L; Benjamin, Emelia J; Bielak, Lawrence F; Blangero, John; Boehnke, Michael; Bowden, Donald W; Brody, Jennifer A; Burchard, Esteban G; Cade, Brian E; Casella, James F; Chalazan, Brandon; Chasman, Daniel I; Chen, Yii-Der Ida.

Nature ; 590(7845): 290-299, 2021 02.

Article in English | MEDLINE | ID: mdl-33568819

ABSTRACT

The Trans-Omics for Precision Medicine (TOPMed) programme seeks to elucidate the genetic architecture and biology of heart, lung, blood and sleep disorders, with the ultimate goal of improving diagnosis, treatment and prevention of these diseases. The initial phases of the programme focused on whole-genome sequencing of individuals with rich phenotypic data and diverse backgrounds. Here we describe the TOPMed goals and design as well as the available resources and early insights obtained from the sequence data. The resources include a variant browser, a genotype imputation server, and genomic and phenotypic data that are available through dbGaP (Database of Genotypes and Phenotypes). In the first 53,831 TOPMed samples, we detected more than 400 million single-nucleotide and insertion or deletion variants after alignment with the reference genome. Additional previously undescribed variants were detected through assembly of unmapped reads and customized analysis in highly variable loci. Among the more than 400 million detected variants, 97% have frequencies of less than 1% and 46% are singletons that are present in only one individual (53% among unrelated individuals). These rare variants provide insights into mutational processes and recent human evolutionary history. The extensive catalogue of genetic variation in TOPMed studies provides unique opportunities for exploring the contributions of rare and noncoding sequence variants to phenotypic variation. Furthermore, combining TOPMed haplotypes with modern imputation methods improves the power and reach of genome-wide association studies to include variants down to a frequency of approximately 0.01%.

Subject(s)

Genetic Variation/genetics , Human Genome/genetics , Genomics , National Heart, Lung, and Blood Institute (U.S.) , Precision Medicine , Cytochrome P-450 CYP2D6/genetics , Haplotypes/genetics , Heterozygote , Humans , INDEL Mutation , Loss of Function Mutation , Mutagenesis , Phenotype , Single Nucleotide Polymorphism , Population Density , Precision Medicine/standards , Quality Control , Sample Size , United States , Whole Genome Sequencing/standards

4.

taxMaps: comprehensive and highly accurate taxonomic classification of short-read data in reasonable time.

Corvelo, André; Clarke, Wayne E; Robine, Nicolas; Zody, Michael C.

Genome Res ; 28(5): 751-758, 2018 05.

Article in English | MEDLINE | ID: mdl-29588360

ABSTRACT

High-throughput sequencing is a revolutionary technology for the analysis of metagenomic samples. However, querying large volumes of reads against comprehensive DNA/RNA databases in a sensitive manner can be compute-intensive. Here, we present taxMaps, a highly efficient, sensitive, and fully scalable taxonomic classification tool. Using a combination of simulated and real metagenomics data sets, we demonstrate that taxMaps is more sensitive and more precise than widely used taxonomic classifiers and is capable of delivering classification accuracy comparable to that of BLASTN, but at up to three orders of magnitude less computational cost.

Subject(s)

Computational Biology/methods , High-Throughput Nucleotide Sequencing/methods , Metagenomics/methods , Software , Bacteria/classification , Bacteria/genetics , Nucleic Acid Databases , Humans , Microbiota/genetics , Reproducibility of Results , Rivers/microbiology , Species Specificity , Water Microbiology

5.

A crowdsourced set of curated structural variants for the human genome.

Chapman, Lesley M; Spies, Noah; Pai, Patrick; Lim, Chun Shen; Carroll, Andrew; Narzisi, Giuseppe; Watson, Christopher M; Proukakis, Christos; Clarke, Wayne E; Nariai, Naoki; Dawson, Eric; Jones, Garan; Blankenberg, Daniel; Brueffer, Christian; Xiao, Chunlin; Kolora, Sree Rohit Raj; Alexander, Noah; Wolujewicz, Paul; Ahmed, Azza E; Smith, Graeme; Shehreen, Saadlee; Wenger, Aaron M; Salit, Marc; Zook, Justin M.

PLoS Comput Biol ; 16(6): e1007933, 2020 06.

Article in English | MEDLINE | ID: mdl-32559231

ABSTRACT

A high-quality benchmark for small variants encompassing 88 to 90% of the reference genome has been developed for seven Genome in a Bottle (GIAB) reference samples. However, a reliable benchmark for large indels and structural variants (SVs) is more challenging. In this study, we manually curated 1235 SVs, which can ultimately be used to evaluate SV callers or train machine learning models. We developed a crowdsourcing app, SVCurator, to help GIAB curators manually review large indels and SVs within the human genome, and report their genotype and size accuracy. SVCurator displays images from short, long, and linked read sequencing data from the GIAB Ashkenazi Jewish Trio son [NIST RM 8391/HG002]. We asked curators to assign labels describing SV type (deletion or insertion), size accuracy, and genotype for 1235 putative insertions and deletions sampled from different size bins between 20 and 892,149 bp. 'Expert' curators were 93% concordant with each other, and 37 of the 61 curators had at least 78% concordance with a set of 'expert' curators. The curators were least concordant for complex SVs and SVs that had inaccurate breakpoints or size predictions. After filtering events with low concordance among curators, we produced high confidence labels for 935 events. The SVCurator crowdsourced labels were 94.5% concordant with the heuristic-based draft benchmark SV callset from GIAB. We found that curators can successfully evaluate putative SVs when given evidence from multiple sequencing technologies.

Subject(s)

Human Genome , Genomic Structural Variation , Heuristics , Humans , INDEL Mutation

6.

The developmental transcriptome atlas of the biofuel crop Camelina sativa.

Kagale, Sateesh; Nixon, John; Khedikar, Yogendra; Pasha, Asher; Provart, Nicholas J; Clarke, Wayne E; Bollina, Venkatesh; Robinson, Stephen J; Coutu, Cathy; Hegedus, Dwayne D; Sharpe, Andrew G; Parkin, Isobel A P.

Plant J ; 88(5): 879-894, 2016 12.

Article in English | MEDLINE | ID: mdl-27513981

ABSTRACT

Camelina sativa is currently being embraced as a viable industrial bio-platform crop due to a number of desirable agronomic attributes and the unique fatty acid profile of the seed oil that has applications for food, feed and biofuel. The recent completion of the reference genome sequence of C. sativa identified a young hexaploid genome. To complement this work, we have generated a genome-wide developmental transcriptome map by RNA sequencing of 12 different tissues covering major developmental stages during the life cycle of C. sativa. We have generated a digital atlas of this comprehensive transcriptome resource that enables interactive visualization of expression data through a searchable database of electronic fluorescent pictographs (eFP browser). An analysis of this dataset supported expression of 88% of the annotated genes in C. sativa and provided a global overview of the complex architecture of temporal and spatial gene expression patterns active during development. Conventional differential gene expression analysis combined with weighted gene expression network analysis uncovered similarities as well as differences in gene expression patterns between different tissues and identified tissue-specific genes and network modules. A high-quality census of transcription factors, analysis of alternative splicing and tissue-specific genome dominance provided insight into the transcriptional dynamics and sub-genome interplay among the well-preserved triplicated repertoire of homeologous loci. The comprehensive transcriptome atlas in combination with the reference genome sequence provides a powerful resource for genomics research which can be leveraged to identify functional associations between genes and understand the regulatory networks underlying developmental processes.

Subject(s)

Biofuels , Brassicaceae/metabolism , Plant Proteins/metabolism , Transcriptome/genetics , Brassicaceae/genetics , Plant Gene Expression Regulation/genetics , Plant Gene Expression Regulation/physiology , Plant Proteins/genetics , Polyploidy , Transcription Factors/genetics , Transcription Factors/metabolism

7.

Polyploid evolution of the Brassicaceae during the Cenozoic era.

Kagale, Sateesh; Robinson, Stephen J; Nixon, John; Xiao, Rong; Huebert, Terry; Condie, Janet; Kessler, Dallas; Clarke, Wayne E; Edger, Patrick P; Links, Matthew G; Sharpe, Andrew G; Parkin, Isobel A P.

Plant Cell ; 26(7): 2777-91, 2014 Jul.

Article in English | MEDLINE | ID: mdl-25035408

ABSTRACT

The Brassicaceae (Cruciferae) family, owing to its remarkable species, genetic, and physiological diversity as well as its significant economic potential, has become a model for polyploidy and evolutionary studies. Utilizing extensive transcriptome pyrosequencing of diverse taxa, we established a resolved phylogeny of a subset of crucifer species. We elucidated the frequency, age, and phylogenetic position of polyploidy and lineage separation events that have marked the evolutionary history of the Brassicaceae. Besides the well-known ancient α (47 million years ago [Mya]) and β (124 Mya) paleopolyploidy events, several species were shown to have undergone a further more recent (~7 to 12 Mya) round of genome multiplication. We identified eight whole-genome duplications corresponding to at least five independent neo/mesopolyploidy events. Although the Brassicaceae family evolved from other eudicots at the beginning of the Cenozoic era of the Earth (60 Mya), major diversification occurred only during the Neogene period (0 to 23 Mya). Remarkably, the widespread species divergence, major polyploidy, and lineage separation events during Brassicaceae evolution are clustered in time around epoch transitions characterized by prolonged unstable climatic conditions. The synchronized diversification of Brassicaceae species suggests that polyploid events may have conferred higher adaptability and increased tolerance toward the drastically changing global environment, thus facilitating species radiation.

Subject(s)

Brassicaceae/genetics , Cleome/genetics , Molecular Evolution , Plant Genome/genetics , Base Sequence , Brassicaceae/classification , Cleome/classification , Gene Library , High-Throughput Nucleotide Sequencing , Molecular Sequence Annotation , Molecular Sequence Data , Phylogeny , Plant Leaves/classification , Plant Leaves/genetics , Polyploidy , Messenger RNA/genetics , Plant RNA/chemistry , Plant RNA/genetics , DNA Sequence Analysis , Time Factors , Transcriptome

8.

The compact genome of the plant pathogen Plasmodiophora brassicae is adapted to intracellular interactions with host Brassica spp.

Rolfe, Stephen A; Strelkov, Stephen E; Links, Matthew G; Clarke, Wayne E; Robinson, Stephen J; Djavaheri, Mohammad; Malinowski, Robert; Haddadi, Parham; Kagale, Sateesh; Parkin, Isobel A P; Taheri, Ali; Borhan, M Hossein.

BMC Genomics ; 17: 272, 2016 Mar 31.

Article in English | MEDLINE | ID: mdl-27036196

ABSTRACT

BACKGROUND: The protist Plasmodiophora brassicae is a soil-borne pathogen of cruciferous species and the causal agent of clubroot disease of Brassicas including agriculturally important crops such as canola/rapeseed (Brassica napus). P. brassicae has remained an enigmatic plant pathogen and is a rare example of an obligate biotroph that resides entirely inside the host plant cell. The pathogen is the cause of severe yield losses and can render infested fields unsuitable for Brassica crop growth due to the persistence of resting spores in the soil for up to 20 years. RESULTS: To provide insight into the biology of the pathogen and its interaction with its primary host B. napus, we produced a draft genome of P. brassicae pathotypes 3 and 6 (Pb3 and Pb6) that differ in their host range. Pb3 is highly virulent on B. napus (but also infects other Brassica species) while Pb6 infects only vegetable Brassica crops. Both the Pb3 and Pb6 genomes are highly compact, each with a total size of 24.2 Mb, and contain less than 2% repetitive DNA. Clustering of genome-wide single nucleotide polymorphisms (SNP) of Pb3, Pb6 and three additional re-sequenced pathotypes (Pb2, Pb5 and Pb8) shows a high degree of correlation of cluster grouping with host range. The Pb3 genome features significant reduction of intergenic space with multiple examples of overlapping untranslated regions (UTRs). Dependency on the host for essential nutrients is evident from the loss of genes for the biosynthesis of thiamine and some amino acids and the presence of a wide range of transport proteins, including some unique to P. brassicae. The annotated genes of Pb3 include those with a potential role in the regulation of the plant growth hormones cytokinin and auxin. The expression profile of Pb3 genes, including putative effectors, during infection and their potential role in manipulation of host defence is discussed. CONCLUSION: The P. brassicae genome sequence reveals a compact genome, a dependency of the pathogen on its host for some essential nutrients and a potential role in the regulation of host plant cytokinin and auxin. Genome annotation supported by RNA sequencing reveals significant reduction in intergenic space which, in addition to low repeat content, has likely contributed to the P. brassicae compact genome.

Subject(s)

Brassica/parasitology , Protozoan Genome , Host-Parasite Interactions/genetics , Plasmodiophorida/genetics , Arabidopsis , Agricultural Crops/parasitology , Cytokinins/metabolism , Protozoan DNA/genetics , Host Specificity , Indoleacetic Acids/metabolism , Plant Diseases/parasitology , RNA Sequence Analysis , Transcriptome

9.

A high-density SNP genotyping array for Brassica napus and its ancestral diploid species based on optimised selection of single-locus markers in the allotetraploid genome.

Clarke, Wayne E; Higgins, Erin E; Plieske, Joerg; Wieseke, Ralf; Sidebottom, Christine; Khedikar, Yogendra; Batley, Jacqueline; Edwards, Dave; Meng, Jinling; Li, Ruiyuan; Lawley, Cynthia Taylor; Pauquet, Jérôme; Laga, Benjamin; Cheung, Wing; Iniguez-Luy, Federico; Dyrszka, Emmanuelle; Rae, Stephen; Stich, Benjamin; Snowdon, Rod J; Sharpe, Andrew G; Ganal, Martin W; Parkin, Isobel A P.

Theor Appl Genet ; 129(10): 1887-99, 2016 Oct.

Article in English | MEDLINE | ID: mdl-27364915

ABSTRACT

KEY MESSAGE: The Brassica napus Illumina array provides genome-wide markers linked to the available genome sequence, a significant tool for genetic analyses of the allotetraploid B. napus and its progenitor diploid genomes. A high-density single nucleotide polymorphism (SNP) Illumina Infinium array, containing 52,157 markers, was developed for the allotetraploid Brassica napus. A stringent selection process employing the short probe sequence for each SNP assay was used to limit the majority of the selected markers to those represented a minimum number of times across the highly replicated genome. As a result approximately 60% of the SNP assays display genome-specificity, resolving as three clearly separated clusters (AA, AB, and BB) when tested with a diverse range of B. napus material. This genome specificity was supported by the analysis of the diploid ancestors of B. napus, whereby 26,504 and 29,720 markers were scorable in B. oleracea and B. rapa, respectively. Forty-four percent of the assayed loci on the array were genetically mapped in a single doubled-haploid B. napus population allowing alignment of their physical and genetic coordinates. Although strong conservation of the two positions was shown, at least 3% of the loci were genetically mapped to a homoeologous position compared to their presumed physical position in the respective genome, underlining the importance of genetic corroboration of locus identity. In addition, the alignments identified multiple rearrangements between the diploid and tetraploid Brassica genomes. Although mostly attributed to genome assembly errors, some are likely evidence of rearrangements that occurred since the hybridisation of the progenitor genomes in the B. napus nucleus. Based on estimates for linkage disequilibrium decay, the array is a valuable tool for genetic fine mapping and genome-wide association studies in B. napus and its progenitor genomes.

Subject(s)

Brassica napus/genetics , Chromosome Mapping , Plant Genome , Genotyping Techniques , Single Nucleotide Polymorphism , Plant DNA/genetics , Diploidy , Genetic Markers , DNA Sequence Analysis , Tetraploidy

10.

A systems genomics and genetics approach to identify the genetic regulatory network for lignin content in Brassica napus seeds.

Zhang, Wentao; Higgins, Erin E; Robinson, Stephen J; Clarke, Wayne E; Boyle, Kerry; Sharpe, Andrew G; Fobert, Pierre R; Parkin, Isobel A P.

Front Plant Sci ; 15: 1393621, 2024.

Article in English | MEDLINE | ID: mdl-38903439

ABSTRACT

Seed quality traits of oilseed rape, Brassica napus (B. napus), exhibit quantitative inheritance determined by its genetic makeup and the environment via the mediation of a complex genetic architecture of hundreds to thousands of genes. Thus, instead of single gene analysis, network-based systems genomics and genetics approaches that combine genotype, phenotype, and molecular phenotypes offer a promising alternative to uncover this complex genetic architecture. In the current study, systems genetics approaches were used to explore the genetic regulation of lignin traits in B. napus seeds. Four QTL (qLignin_A09_1, qLignin_A09_2, qLignin_A09_3, and qLignin_C08) distributed on two chromosomes were identified for lignin content. The qLignin_A09_2 and qLignin_C08 loci were homologous QTL from the A and C subgenomes, respectively. Genome-wide gene regulatory network analysis identified eighty-three subnetworks (or modules), and three modules with 910 genes in total were associated with lignin content, which was confirmed by network QTL analysis. eQTL (expression quantitative trait loci) analysis revealed four cis-eQTL genes including lignin and flavonoid pathway genes, cinnamoyl-CoA-reductase (CCR1), and TRANSPARENT TESTA genes TT4, TT6, TT8, as causal genes. The findings validated the power of systems genetics to identify causal regulatory networks and genes underlying complex traits. Moreover, this information may enable the research community to explore new breeding strategies, such as network selection or gene engineering, to rewire networks to develop climate-resilient crops with better seed quality.

11.

Nanopore sequencing of 1000 Genomes Project samples to build a comprehensive catalog of human genetic variation.

Gustafson, Jonas A; Gibson, Sophia B; Damaraju, Nikhita; Zalusky, Miranda Pg; Hoekzema, Kendra; Twesigomwe, David; Yang, Lei; Snead, Anthony A; Richmond, Phillip A; De Coster, Wouter; Olson, Nathan D; Guarracino, Andrea; Li, Qiuhui; Miller, Angela L; Goffena, Joy; Anderson, Zachery; Storz, Sophie Hr; Ward, Sydney A; Sinha, Maisha; Gonzaga-Jauregui, Claudia; Clarke, Wayne E; Basile, Anna O; Corvelo, André; Reeves, Catherine; Helland, Adrienne; Musunuri, Rajeeva Lochan; Revsine, Mahler; Patterson, Karynne E; Paschal, Cate R; Zakarian, Christina; Goodwin, Sara; Jensen, Tanner D; Robb, Esther; McCombie, W Richard; Sedlazeck, Fritz J; Zook, Justin M; Montgomery, Stephen B; Garrison, Erik; Kolmogorov, Mikhail; Schatz, Michael C; McLaughlin, Richard N; Dashnow, Harriet; Zody, Michael C; Loose, Matt; Jain, Miten; Eichler, Evan E; Miller, Danny E.

medRxiv ; 2024 Mar 07.

Article in English | MEDLINE | ID: mdl-38496498

ABSTRACT

Less than half of individuals with a suspected Mendelian condition receive a precise molecular diagnosis after comprehensive clinical genetic testing. Improvements in data quality and costs have heightened interest in using long-read sequencing (LRS) to streamline clinical genomic testing, but the absence of control datasets for variant filtering and prioritization has made tertiary analysis of LRS data challenging. To address this, the 1000 Genomes Project ONT Sequencing Consortium aims to generate LRS data from at least 800 of the 1000 Genomes Project samples. Our goal is to use LRS to identify a broader spectrum of variation so we may improve our understanding of normal patterns of human variation. Here, we present data from analysis of the first 100 samples, representing all 5 superpopulations and 19 subpopulations. These samples, sequenced to an average depth of coverage of 37x and sequence read N50 of 54 kbp, have high concordance with previous studies for identifying single nucleotide and indel variants outside of homopolymer regions. Using multiple structural variant (SV) callers, we identify an average of 24,543 high-confidence SVs per genome, including shared and private SVs likely to disrupt gene function as well as pathogenic expansions within disease-associated repeats that were not detected using short reads. Evaluation of methylation signatures revealed expected patterns at known imprinted loci, samples with skewed X-inactivation patterns, and novel differentially methylated regions. All raw sequencing data, processed data, and summary statistics are publicly available, providing a valuable resource for the clinical genetics community to discover pathogenic SVs.

12.

Ancient orphan crop joins modern era: gene-based SNP discovery and mapping in lentil.

Sharpe, Andrew G; Ramsay, Larissa; Sanderson, Lacey-Anne; Fedoruk, Michael J; Clarke, Wayne E; Li, Rong; Kagale, Sateesh; Vijayan, Perumal; Vandenberg, Albert; Bett, Kirstin E.

BMC Genomics ; 14: 192, 2013 Mar 18.

Article in English | MEDLINE | ID: mdl-23506258

ABSTRACT

BACKGROUND: The genus Lens comprises a range of closely related species within the galegoid clade of the Papilionoideae family. The clade includes other important crops (e.g. chickpea and pea) as well as a sequenced model legume (Medicago truncatula). Lentil is a global food crop increasing in importance in the Indian sub-continent and elsewhere due to its nutritional value and quick cooking time. Despite this importance there has been a dearth of genetic and genomic resources for the crop and this has limited the application of marker-assisted selection strategies in breeding. RESULTS: We describe here the development of a deep and diverse transcriptome resource for lentil using next generation sequencing technology. The generation of data in multiple cultivated (L. culinaris) and wild (L. ervoides) genotypes together with the utilization of a bioinformatics workflow enabled the identification of a large collection of SNPs and the subsequent development of a genotyping platform that was used to establish the first comprehensive genetic map of the L. culinaris genome. Extensive collinearity with M. truncatula was evident on the basis of sequence homology between mapped markers and the model genome and large translocations and inversions relative to M. truncatula were identified. An estimate for the time divergence of L. culinaris from L. ervoides and of both from M. truncatula was also calculated. CONCLUSIONS: The availability of the genomic and derived molecular marker resources presented here will help change lentil breeding strategies and lead to increased genetic gain in the future.

Subject(s)

Lens (Plant)/genetics , Genetic Linkage , Genomics , Genotype , Medicago truncatula/genetics , Single Nucleotide Polymorphism , DNA Sequence Analysis

13.

Pangenome-based genome inference allows efficient and accurate genotyping across a wide spectrum of variant classes.

Ebler, Jana; Ebert, Peter; Clarke, Wayne E; Rausch, Tobias; Audano, Peter A; Houwaart, Torsten; Mao, Yafei; Korbel, Jan O; Eichler, Evan E; Zody, Michael C; Dilthey, Alexander T; Marschall, Tobias.

Nat Genet ; 54(4): 518-525, 2022 04.

Article in English | MEDLINE | ID: mdl-35410384

ABSTRACT

Typical genotyping workflows map reads to a reference genome before identifying genetic variants. Generating such alignments introduces reference biases and comes with substantial computational burden. Furthermore, short-read lengths limit the ability to characterize repetitive genomic regions, which are particularly challenging for fast k-mer-based genotypers. In the present study, we propose a new algorithm, PanGenie, that leverages a haplotype-resolved pangenome reference together with k-mer counts from short-read sequencing data to genotype a wide spectrum of genetic variation, a process we refer to as genome inference. Compared with mapping-based approaches, PanGenie is more than 4 times faster at 30-fold coverage and achieves better genotype concordances for almost all variant types and coverages tested. Improvements are especially pronounced for large insertions (≥50 bp) and variants in repetitive regions, enabling the inclusion of these classes of variants in genome-wide association studies. PanGenie efficiently leverages the increasing amount of haplotype-resolved assemblies to unravel the functional impact of previously inaccessible variants while being faster compared with alignment-based workflows.

Subject(s)

Genetic Variation , Human Genome , Genomics , Algorithms , Human Genome/genetics , Genome-Wide Association Study , Genomics/methods , Genotype , High-Throughput Nucleotide Sequencing , Humans , DNA Sequence Analysis
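The abstract of this record relies on k-mer counts taken directly from short reads, with no alignment step. As a rough illustration of what such counts are, the Python sketch below tallies every length-k substring across a set of reads. It is not PanGenie's implementation: the function name `count_kmers` and the toy reads are assumptions, and real tools typically also merge reverse complements and use far more compact data structures.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every length-k substring (k-mer) across a list of reads."""
    counts = Counter()
    for read in reads:
        # A read of length n contains n - k + 1 overlapping k-mers;
        # reads shorter than k contribute nothing.
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


# Toy reads (hypothetical sequences, not data from the study).
reads = ["ACGTAC", "CGTACG"]
for kmer, n in sorted(count_kmers(reads, 3).items()):
    print(kmer, n)
```

Counting needs only one pass over the reads, which is one reason k-mer-based approaches can avoid the cost of read alignment.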

14.

Curated variation benchmarks for challenging medically relevant autosomal genes.

Wagner, Justin; Olson, Nathan D; Harris, Lindsay; McDaniel, Jennifer; Cheng, Haoyu; Fungtammasan, Arkarachai; Hwang, Yih-Chii; Gupta, Richa; Wenger, Aaron M; Rowell, William J; Khan, Ziad M; Farek, Jesse; Zhu, Yiming; Pisupati, Aishwarya; Mahmoud, Medhat; Xiao, Chunlin; Yoo, Byunggil; Sahraeian, Sayed Mohammad Ebrahim; Miller, Danny E; Jáspez, David; Lorenzo-Salazar, José M; Muñoz-Barrera, Adrián; Rubio-Rodríguez, Luis A; Flores, Carlos; Narzisi, Giuseppe; Evani, Uday Shanker; Clarke, Wayne E; Lee, Joyce; Mason, Christopher E; Lincoln, Stephen E; Miga, Karen H; Ebbert, Mark T W; Shumate, Alaina; Li, Heng; Chin, Chen-Shan; Zook, Justin M; Sedlazeck, Fritz J.

Nat Biotechnol ; 40(5): 672-680, 2022 05.

Article in English | MEDLINE | ID: mdl-35132260

ABSTRACT

The repetitive nature and complexity of some medically relevant genes pose a challenge for their accurate analysis in a clinical setting. The Genome in a Bottle Consortium has provided variant benchmark sets, but these exclude nearly 400 medically relevant genes due to their repetitiveness or polymorphic complexity. Here, we characterize 273 of these 395 challenging autosomal genes using a haplotype-resolved whole-genome assembly. This curated benchmark reports over 17,000 single-nucleotide variations, 3,600 insertions and deletions and 200 structural variations each for human genome reference GRCh37 and GRCh38 across HG002. We show that false duplications in either GRCh37 or GRCh38 result in reference-specific, missed variants for short- and long-read technologies in medically relevant genes, including CBS, CRYAA and KCNE1. When masking these false duplications, variant recall can improve from 8% to 100%. Forming benchmarks from a haplotype-resolved whole-genome assembly may become a prototype for future benchmarks covering the whole genome.

Subject(s)

Human Genome , Human Genome/genetics , Haplotypes/genetics , Humans , DNA Sequence Analysis

15.

De novo sequence assembly of Albugo candida reveals a small genome relative to other biotrophic oomycetes.

Links, Matthew G; Holub, Eric; Jiang, Rays H Y; Sharpe, Andrew G; Hegedus, Dwayne; Beynon, Elena; Sillito, Dean; Clarke, Wayne E; Uzuhashi, Shihomi; Borhan, Mohammad H.

BMC Genomics ; 12: 503, 2011 Oct 13.

Article in English | MEDLINE | ID: mdl-21995639

ABSTRACT

BACKGROUND: Albugo candida is a biotrophic oomycete that parasitizes various species of Brassicaceae, causing a disease (white blister rust) with remarkable convergence in behaviour to unrelated rusts of basidiomycete fungi. RESULTS: A recent genome analysis of the oomycete Hyaloperonospora arabidopsidis suggests that a reduction in the number of genes encoding secreted pathogenicity proteins and enzymes for assimilation of inorganic nitrogen and sulphur represents a genomic signature for the evolution of obligate biotrophy. Here, we report a draft reference genome of a major crop pathogen Albugo candida (another obligate biotrophic oomycete) with an estimated genome of 45.3 Mb. This is very similar to the genome size of a necrotrophic oomycete Pythium ultimum (43 Mb) but less than half that of H. arabidopsidis (99 Mb). Sequencing of A. candida transcripts from infected host tissue and zoosporangia combined with genome-wide annotation revealed 15,824 predicted genes. Most of the predicted genes lack significant similarity with sequences from other oomycetes. Most intriguingly, A. candida appears to have a much smaller repertoire of pathogenicity-related proteins than H. arabidopsidis including genes that encode RXLR effector proteins, CRINKLER-like genes, and elicitins. Necrosis and Ethylene inducing Peptides were not detected in the genome of A. candida. Putative orthologs of tat-C, a component of the twin arginine translocase system, were identified from multiple oomycete genera along with proteins containing putative tat-secretion signal peptides. CONCLUSION: Albugo candida has a comparatively small genome amongst oomycetes, retains motility of sporangial inoculum, and harbours a much smaller repertoire of candidate effectors than was recently reported for H. arabidopsidis. This minimal gene repertoire could indicate a lack of expansion, rather than a reduction, in the number of genes that signify the evolution of biotrophy in oomycetes.

Subject(s)

Oomycetes/genetics , Amino Acid Sequence , Brassicaceae/parasitology , Expressed Sequence Tags , Fungal Proteins/chemistry , Fungal Proteins/metabolism , Genome , Molecular Sequence Data , Plant Diseases/parasitology , Sequence Alignment , RNA Sequence Analysis

16.

Performance assessment of DNA sequencing platforms in the ABRF Next-Generation Sequencing Study.

Foox, Jonathan; Tighe, Scott W; Nicolet, Charles M; Zook, Justin M; Byrska-Bishop, Marta; Clarke, Wayne E; Khayat, Michael M; Mahmoud, Medhat; Laaguiby, Phoebe K; Herbert, Zachary T; Warner, Derek; Grills, George S; Jen, Jin; Levy, Shawn; Xiang, Jenny; Alonso, Alicia; Zhao, Xia; Zhang, Wenwei; Teng, Fei; Zhao, Yonggang; Lu, Haorong; Schroth, Gary P; Narzisi, Giuseppe; Farmerie, William; Sedlazeck, Fritz J; Baldwin, Don A; Mason, Christopher E.

Nat Biotechnol ; 39(9): 1129-1140, 2021 09.

Article in English | MEDLINE | ID: mdl-34504351

ABSTRACT

Assessing the reproducibility, accuracy and utility of massively parallel DNA sequencing platforms remains an ongoing challenge. Here the Association of Biomolecular Resource Facilities (ABRF) Next-Generation Sequencing Study benchmarks the performance of a set of sequencing instruments (HiSeq/NovaSeq/paired-end 2 × 250-bp chemistry, Ion S5/Proton, PacBio circular consensus sequencing (CCS), Oxford Nanopore Technologies PromethION/MinION, BGISEQ-500/MGISEQ-2000 and GS111) on human and bacterial reference DNA samples. Among short-read instruments, HiSeq 4000 and X10 provided the most consistent, highest genome coverage, while BGI/MGISEQ provided the lowest sequencing error rates. The long-read instrument PacBio CCS had the highest reference-based mapping rate and lowest non-mapping rate. The two long-read platforms PacBio CCS and PromethION/MinION showed the best sequence mapping in repeat-rich areas and across homopolymers. NovaSeq 6000 using 2 × 250-bp read chemistry was the most robust instrument for capturing known insertion/deletion events. This study serves as a benchmark for current genomics technologies, as well as a resource to inform experimental design and next-generation sequencing variant calling.

Subject(s)

High-Throughput Nucleotide Sequencing/methods , High-Throughput Nucleotide Sequencing/standards , DNA Sequence Analysis/methods , DNA Sequence Analysis/standards , Base Pair Mismatch , Benchmarking , DNA/genetics , Bacterial DNA/genetics , Bacterial Genome , Human Genome , Humans

17.

Haplotype-resolved diverse human genomes and integrated analysis of structural variation.

Ebert, Peter; Audano, Peter A; Zhu, Qihui; Rodriguez-Martin, Bernardo; Porubsky, David; Bonder, Marc Jan; Sulovari, Arvis; Ebler, Jana; Zhou, Weichen; Serra Mari, Rebecca; Yilmaz, Feyza; Zhao, Xuefang; Hsieh, PingHsun; Lee, Joyce; Kumar, Sushant; Lin, Jiadong; Rausch, Tobias; Chen, Yu; Ren, Jingwen; Santamarina, Martin; Höps, Wolfram; Ashraf, Hufsah; Chuang, Nelson T; Yang, Xiaofei; Munson, Katherine M; Lewis, Alexandra P; Fairley, Susan; Tallon, Luke J; Clarke, Wayne E; Basile, Anna O; Byrska-Bishop, Marta; Corvelo, André; Evani, Uday S; Lu, Tsung-Yu; Chaisson, Mark J P; Chen, Junjie; Li, Chong; Brand, Harrison; Wenger, Aaron M; Ghareghani, Maryam; Harvey, William T; Raeder, Benjamin; Hasenfeld, Patrick; Regier, Allison A; Abel, Haley J; Hall, Ira M; Flicek, Paul; Stegle, Oliver; Gerstein, Mark B; Tubio, Jose M C.

Science ; 372(6537), 2021 04 02.

Article in English | MEDLINE | ID: mdl-33632895

ABSTRACT

Long-read and strand-specific sequencing technologies together facilitate the de novo assembly of high-quality haplotype-resolved human genomes without parent-child trio data. We present 64 assembled haplotypes from 32 diverse human genomes. These highly contiguous haplotype assemblies (average minimum contig length needed to cover 50% of the genome: 26 million base pairs) integrate all forms of genetic variation, even across complex loci. We identified 107,590 structural variants (SVs), of which 68% were not discovered with short-read sequencing, and 278 SV hotspots (spanning megabases of gene-rich sequence). We characterized 130 of the most active mobile element source elements and found that 63% of all SVs arise through homology-mediated mechanisms. This resource enables reliable graph-based genotyping from short reads of up to 50,340 SVs, resulting in the identification of 1526 expression quantitative trait loci as well as SV candidates for adaptive selection within the human population.
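
Editorial illustration (not part of the indexed abstract): the contiguity figure quoted above, the minimum contig length needed to cover 50% of the genome, is the statistic usually called N50. The sketch below is a minimal version, assuming contig lengths are available as a plain list of integers. The function name `n50` and the optional `total` argument (an assumed genome size, which turns the result into what is often called NG50) are hypothetical and not taken from the paper.

```python
def n50(contig_lengths, total=None):
    """Return the length L such that contigs of length >= L cover at least half of the total.

    `total` defaults to the summed contig length (N50). Passing an assumed
    genome size instead gives the NG50 variant. Returns None when the
    contigs cannot cover half of `total`.
    """
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("contig_lengths must not be empty")
    target = (sum(lengths) if total is None else total) / 2
    covered = 0
    for length in lengths:
        covered += length
        if covered >= target:
            return length
    return None
```

For example, `n50([2, 3, 4, 5, 6])` returns 5: the two longest contigs (6 + 5 = 11) are the first to reach half of the 20-base total.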

Subject(s)

Genetic Variation , Human Genome , Haplotypes , Female , Genotype , High-Throughput Nucleotide Sequencing , Humans , INDEL Mutation , Interspersed Repetitive Sequences , Male , Population Groups/genetics , Quantitative Trait Loci , Retroelements , DNA Sequence Analysis , Sequence Inversion , Whole Genome Sequencing

18.

Towards unambiguous transcript mapping in the allotetraploid Brassica napus.

Parkin, Isobel A P; Clarke, Wayne E; Sidebottom, Christine; Zhang, Wentao; Robinson, Stephen J; Links, Matthew G; Karcz, Steve; Higgins, Erin E; Fobert, Pierre; Sharpe, Andrew G.

Genome ; 53(11): 929-38, 2010 Nov.

Article in English | MEDLINE | ID: mdl-21076508

ABSTRACT

The architecture of the Brassica napus genome is marked by its evolutionary origins. The genome of B. napus was formed from the hybridization of two closely related diploid Brassica species, both of which evolved from a hexaploid ancestor. The extensive whole genome duplication events in its near and distant past result in the allotetraploid genome of B. napus maintaining multiple copies of most genes, which predicts a highly complex and redundant transcriptome that can confound any expression analyses. A stringent assembly of 142,399 B. napus expressed sequence tags allowed the development of a well-differentiated set of reference transcripts, which were used as a foundation to assess the efficacy of available tools for identifying and distinguishing transcripts in B. napus, including microarray hybridization and 3' anchored sequence tag capture. Microarray platforms cannot distinguish transcripts derived from the two progenitors or close homologues, although observed differential expression appeared to be biased towards unique transcripts. The use of 3' capture enhanced the ability to unambiguously identify homologues within the B. napus transcriptome but was limited by tag length. The ability to comprehensively catalogue gene expression in polyploid species could be transformed by the application of cost-efficient next generation sequencing technologies that will capture millions of long sequence tags.

Subject(s)

Brassica napus/genetics , Gene Expression Profiling/methods , Tetraploidy , Expressed Sequence Tags , Plant Gene Expression Regulation , Plant Genes , Plant Genome , Oligonucleotide Array Sequence Analysis

19.

Narrow genetic base shapes population structure and linkage disequilibrium in an industrial oilseed crop, Brassica carinata A. Braun.

Khedikar, Yogendra; Clarke, Wayne E; Chen, Lifeng; Higgins, Erin E; Kagale, Sateesh; Koh, Chu Shin; Bennett, Rick; Parkin, Isobel A P.

Sci Rep ; 10(1): 12629, 2020 07 28.

Article in English | MEDLINE | ID: mdl-32724070

ABSTRACT

Ethiopian mustard (Brassica carinata A. Braun) is an emerging sustainable source of vegetable oil, in particular for the biofuel industry. The present study exploited genome assemblies of the Brassica diploids, Brassica nigra and Brassica oleracea, to discover over 10,000 genome-wide SNPs using genotype by sequencing of 620 B. carinata lines. The analyses revealed a SNP frequency of one every 91.7 kb, a heterozygosity level of 0.30, and nucleotide diversity levels of 1.31 × 10^-5; the first five principal components captured only 13% of molecular variation, indicating low levels of genetic diversity among the B. carinata collection. Genome bias was observed, with greater SNP density found on the B subgenome. The 620 lines clustered into two distinct sub-populations (SP1 and SP2) with the majority of accessions (88%) clustered in SP1 with those from Ethiopia, the presumed centre of origin. SP2 was distinguished by a collection of breeding lines, implicating targeted selection in creating population structure. Two selective sweep regions on B3 and B8 were detected, which harbour genes involved in fatty acid and aliphatic glucosinolate biosynthesis, respectively. The assessment of genetic diversity, population structure, and LD in the global B. carinata collection provides critical information to assist future crop improvement.

Subject(s)

Agricultural Crops/genetics , Industry , Linkage Disequilibrium/genetics , Mustard Plant/genetics , Plant Chromosomes/genetics , Genetic Variation , Population Genetics , Plant Genome , Haplotypes/genetics , Single Nucleotide Polymorphism/genetics , Genetic Selection

20.

An archived activation tagged population of Arabidopsis thaliana to facilitate forward genetics approaches.

Robinson, Stephen J; Tang, Lily H; Mooney, Brent Ag; McKay, Sheldon J; Clarke, Wayne E; Links, Matthew G; Karcz, Steven; Regan, Sharon; Wu, Yun-Yun; Gruber, Margaret Y; Cui, Dejun; Yu, Min; Parkin, Isobel A P.

BMC Plant Biol ; 9: 101, 2009 Jul 31.

Article in English | MEDLINE | ID: mdl-19646253

ABSTRACT

BACKGROUND: Functional genomics tools provide researchers with the ability to apply high-throughput techniques to determine the function and interaction of a diverse range of genes. Mutagenized plant populations are one such resource that facilitate gene characterisation. They allow complex physiological responses to be correlated with the expression of single genes in planta, through either reverse genetics where target genes are mutagenized to assay the effect, or through forward genetics where populations of mutant lines are screened to identify those whose phenotype diverges from wild type for a particular trait. One limitation of these types of populations is the prevalence of gene redundancy within plant genomes, which can mask the effect of individual genes. Activation or enhancer populations, which not only provide knock-out but also dominant activation mutations, can facilitate the study of such genes. RESULTS: We have developed a population of almost 50,000 activation tagged A. thaliana lines that have been archived as individual lines to the T3 generation. The population is an excellent tool for both reverse and forward genetic screens and has been used successfully to identify a number of novel mutants. Insertion site sequences have been generated and mapped for 15,507 lines to enable further application of the population, while providing a clear distribution of T-DNA insertions across the genome. The population is being screened for a number of biochemical and developmental phenotypes; provisional data identifying novel alleles and genes controlling steps in proanthocyanidin biosynthesis and trichome development are presented. CONCLUSION: This publicly available population provides an additional tool for plant researchers to assist with determining gene function for the many as yet uncharacterised genes annotated within the Arabidopsis genome sequence (http://aafc-aac.usask.ca/FST). The presence of enhancer elements on the inserted T-DNA molecule allows both knock-out and dominant activation phenotypes to be identified for traits of interest.

Subject(s)

Arabidopsis/genetics , Plant Genome , Genomics/methods , Insertional Mutagenesis , DNA Mutational Analysis , Bacterial DNA/genetics , Plant DNA/genetics , Plant Genes
