Search | BVS (Virtual Health Library) Regional Portal

1.

A Chromosome-Length Assembly of the Hawaiian Monk Seal (Neomonachus schauinslandi): A History of "Genetic Purging" and Genomic Stability.

Mohr, David W; Gaughran, Stephen J; Paschall, Justin; Naguib, Ahmed; Pang, Andy Wing Chun; Dudchenko, Olga; Aiden, Erez Lieberman; Church, Deanna M; Scott, Alan F.

Genes (Basel) ; 13(7), 2022 07 18.

Article in English | MEDLINE | ID: mdl-35886053

ABSTRACT

The Hawaiian monk seal (HMS) is the single extant species of tropical earless seals of the genus Neomonachus. The species survived a severe bottleneck in the late 19th century and experienced subsequent population declines until becoming the subject of a NOAA-led species recovery effort beginning in 1976 when the population was fewer than 1000 animals. Like other recovering species, the Hawaiian monk seal has been reported to have reduced genetic heterogeneity due to the bottleneck and subsequent inbreeding. Here, we report a chromosomal reference assembly for a male animal produced using a variety of methods. The final assembly consisted of 16 autosomes, an X, and portions of the Y chromosomes. We compared variants in this animal to other HMS and to a frequently sequenced human sample, confirming about 12% of the variation seen in man. To confirm that the reference animal was representative of the HMS, we compared his sequence to that of 10 other individuals and noted similarly low variation in all. Variation in the major histocompatibility (MHC) genes was nearly absent compared to the orthologous human loci. Demographic analysis predicts that Hawaiian monk seals have had a long history of small populations preceding the bottleneck, and their current low levels of heterozygosity may indicate specialization to a stable environment. When we compared our reference assembly to that of other species, we observed significant conservation of chromosomal architecture with other pinnipeds, especially other phocids. This reference should be a useful tool for future evolutionary studies as well as the long-term management of this species.

Subjects

Seals, Earless; Animals; Chromosomes; Genomic Instability; Hawaii/epidemiology; Humans; Male; Seals, Earless/genetics

2.

Author Correction: A general approach for detecting expressed mutations in AML cells using single cell RNA-sequencing.

Petti, Allegra A; Williams, Stephen R; Miller, Christopher A; Fiddes, Ian T; Srivatsan, Sridhar N; Chen, David Y; Fronick, Catrina C; Fulton, Robert S; Church, Deanna M; Ley, Timothy J.

Nat Commun ; 13(1): 4216, 2022 Jul 21.

Article in English | MEDLINE | ID: mdl-35864110

3.

A next-generation human genome sequence.

Church, Deanna M.

Science ; 376(6588): 34-35, 2022 04.

Article in English | MEDLINE | ID: mdl-35357937

ABSTRACT

A near-complete sequence outlines a path for a more inclusive reference.

Subjects

Genome, Human; Base Sequence; Humans; Sequence Analysis, DNA

4.

Single-Cell Transcriptomics Reveals Early Emergence of Liver Parenchymal and Non-parenchymal Cell Lineages.

Lotto, Jeremy; Drissler, Sibyl; Cullum, Rebecca; Wei, Wei; Setty, Manu; Bell, Erin M; Boutet, Stéphane C; Nowotschin, Sonja; Kuo, Ying-Yi; Garg, Vidur; Pe'er, Dana; Church, Deanna M; Hadjantonakis, Anna-Katerina; Hoodless, Pamela A.

Cell ; 183(3): 702-716.e14, 2020 10 29.

Article in English | MEDLINE | ID: mdl-33125890

ABSTRACT

The cellular complexity and scale of the early liver have constrained analyses examining its emergence during organogenesis. To circumvent these issues, we analyzed 45,334 single-cell transcriptomes from embryonic day (E)7.5, when endoderm progenitors are specified, to E10.5 liver, when liver parenchymal and non-parenchymal cell lineages emerge. Our data detail divergence of vascular and sinusoidal endothelia, including a distinct transcriptional profile for sinusoidal endothelial specification by E8.75. We characterize two distinct mesothelial cell types as well as early hepatic stellate cells and reveal distinct spatiotemporal distributions for these populations. We capture transcriptional profiles for hepatoblast specification and migration, including the emergence of a hepatomesenchymal cell type and evidence for hepatoblast collective cell migration. Further, we identify cell-cell interactions during the organization of the primitive sinusoid. This study provides a comprehensive atlas of liver lineage establishment from the endoderm and mesoderm through to the organization of the primitive sinusoid at single-cell resolution.

Subjects

Cell Lineage/genetics; Liver/cytology; Liver/metabolism; Single-Cell Analysis; Transcriptome/genetics; Animals; Cell Movement; Embryo, Mammalian/cytology; Endothelium/cytology; Mesoderm/cytology; Mice; Signal Transduction; Stem Cells/cytology

5.

Thousands of human sequences provide deep insight into single genomes.

Church, Deanna M.

Nature ; 581(7809): 385-386, 2020 05.

Article in English | MEDLINE | ID: mdl-32461645

Subjects

Genetics, Population; Genome; Humans

6.

De novo assembly of the olive fruit fly (Bactrocera oleae) genome with linked-reads and long-read technologies minimizes gaps and provides exceptional Y chromosome assembly.

Bayega, Anthony; Djambazian, Haig; Tsoumani, Konstantina T; Gregoriou, Maria-Eleni; Sagri, Efthimia; Drosopoulou, Eleni; Mavragani-Tsipidou, Penelope; Giorda, Kristina; Tsiamis, George; Bourtzis, Kostas; Oikonomopoulos, Spyridon; Dewar, Ken; Church, Deanna M; Papanicolaou, Alexie; Mathiopoulos, Kostas D; Ragoussis, Jiannis.

BMC Genomics ; 21(1): 259, 2020 Mar 30.

Article in English | MEDLINE | ID: mdl-32228451

ABSTRACT

BACKGROUND: The olive fruit fly, Bactrocera oleae, is the most important pest in the olive fruit agribusiness industry. This is because female flies lay their eggs in the unripe fruits and upon hatching the larvae feed on the fruits thus destroying them. The lack of a high-quality genome and other genomic and transcriptomic data has hindered progress in understanding the fly's biology and proposing alternative control methods to pesticide use. RESULTS: Genomic DNA was sequenced from male and female Demokritos strain flies, maintained in the laboratory for over 45 years. We used short-, mate-pair-, and long-read sequencing technologies to generate a combined male-female genome assembly (GenBank accession GCA_001188975.2). Genomic DNA sequencing from male insects using 10x Genomics linked-reads technology followed by mate-pair and long-read scaffolding and gap-closing generated a highly contiguous 489 Mb genome with a scaffold N50 of 4.69 Mb and L50 of 30 scaffolds (GenBank accession GCA_001188975.4). RNA-seq data generated from 12 tissues and/or developmental stages allowed for genome annotation. Short reads from both males and females and the chromosome quotient method enabled identification of Y-chromosome scaffolds which were extensively validated by PCR. CONCLUSIONS: The high-quality genome generated represents a critical tool in olive fruit fly research. We provide an extensive RNA-seq data set, and genome annotation, critical towards gaining an insight into the biology of the olive fruit fly. In addition, elucidation of Y-chromosome sequences will advance our understanding of the Y-chromosome's organization, function and evolution and is poised to provide avenues for sterile insect technique approaches.

Subjects

Tephritidae/genetics; Y Chromosome/genetics; Y Chromosome/metabolism; Animals; Female; Genome, Insect/genetics; Male; Polymerase Chain Reaction

7.

Birth, expansion, and death of VCY-containing palindromes on the human Y chromosome.

Shi, Wentao; Massaia, Andrea; Louzada, Sandra; Handsaker, Juliet; Chow, William; McCarthy, Shane; Collins, Joanna; Hallast, Pille; Howe, Kerstin; Church, Deanna M; Yang, Fengtang; Xue, Yali; Tyler-Smith, Chris.

Genome Biol ; 20(1): 207, 2019 10 14.

Article in English | MEDLINE | ID: mdl-31610793

ABSTRACT

BACKGROUND: Large palindromes (inverted repeats) make up substantial proportions of mammalian sex chromosomes, often contain genes, and have high rates of structural variation arising via ectopic recombination. As a result, they underlie many genomic disorders. Maintenance of the palindromic structure by gene conversion between the arms has been documented, but over longer time periods, palindromes are remarkably labile. Mechanisms of origin and loss of palindromes have, however, received little attention. RESULTS: Here, we use fiber-FISH, 10x Genomics Linked-Read sequencing, and breakpoint PCR sequencing to characterize the structural variation of the P8 palindrome on the human Y chromosome, which contains two copies of the VCY (Variable Charge Y) gene. We find a deletion of almost an entire arm of the palindrome, leading to death of the palindrome, a size increase by recruitment of adjacent sequence, and other complex changes including the formation of an entire new palindrome nearby. Together, these changes are found in ~1% of men, and we can assign likely molecular mechanisms to these mutational events. As a result, healthy men can have 1-4 copies of VCY. CONCLUSIONS: Gross changes, especially duplications, in palindrome structure can be relatively frequent and facilitate the evolution of sex chromosomes in humans, and potentially also in other mammalian species.

Subjects

Chromosomes, Human, Y; Inverted Repeat Sequences; Nuclear Proteins/genetics; Base Sequence; DNA Copy Number Variations; Humans

8.

A general approach for detecting expressed mutations in AML cells using single cell RNA-sequencing.

Petti, Allegra A; Williams, Stephen R; Miller, Christopher A; Fiddes, Ian T; Srivatsan, Sridhar N; Chen, David Y; Fronick, Catrina C; Fulton, Robert S; Church, Deanna M; Ley, Timothy J.

Nat Commun ; 10(1): 3660, 2019 08 14.

Article in English | MEDLINE | ID: mdl-31413257

ABSTRACT

Virtually all tumors are genetically heterogeneous, containing mutationally-defined subclonal cell populations that often have distinct phenotypes. Single-cell RNA-sequencing has revealed that a variety of tumors are also transcriptionally heterogeneous, but the relationship between expression heterogeneity and subclonal architecture is unclear. Here, we address this question in the context of Acute Myeloid Leukemia (AML) by integrating whole genome sequencing with single-cell RNA-sequencing (using the 10x Genomics Chromium Single Cell 5' Gene Expression workflow). Applying this approach to five cryopreserved AML samples, we identify hundreds to thousands of cells containing tumor-specific mutations in each case, and use the results to distinguish AML cells (including normal-karyotype AML cells) from normal cells, identify expression signatures associated with subclonal mutations, and find cell surface markers that could be used to purify subclones for further study. This integrative approach for connecting genotype to phenotype is broadly applicable to any sample that is phenotypically and genetically heterogeneous.

Subjects

Leukemia, Myeloid, Acute/genetics; RNA/genetics; Sequence Analysis, RNA/methods; Single-Cell Analysis/methods; Gene Expression Profiling/methods; Genomics; Genotype; Humans; Mutation; Phenotype; Polymorphism, Single Nucleotide; Whole Genome Sequencing

9.

Multi-platform discovery of haplotype-resolved structural variation in human genomes.

Chaisson, Mark J P; Sanders, Ashley D; Zhao, Xuefang; Malhotra, Ankit; Porubsky, David; Rausch, Tobias; Gardner, Eugene J; Rodriguez, Oscar L; Guo, Li; Collins, Ryan L; Fan, Xian; Wen, Jia; Handsaker, Robert E; Fairley, Susan; Kronenberg, Zev N; Kong, Xiangmeng; Hormozdiari, Fereydoun; Lee, Dillon; Wenger, Aaron M; Hastie, Alex R; Antaki, Danny; Anantharaman, Thomas; Audano, Peter A; Brand, Harrison; Cantsilieris, Stuart; Cao, Han; Cerveira, Eliza; Chen, Chong; Chen, Xintong; Chin, Chen-Shan; Chong, Zechen; Chuang, Nelson T; Lambert, Christine C; Church, Deanna M; Clarke, Laura; Farrell, Andrew; Flores, Joey; Galeev, Timur; Gorkin, David U; Gujral, Madhusudan; Guryev, Victor; Heaton, William Haynes; Korlach, Jonas; Kumar, Sushant; Kwon, Jee Young; Lam, Ernest T; Lee, Jong Eun; Lee, Joyce; Lee, Wan-Ping; Lee, Sau Peng.

Nat Commun ; 10(1): 1784, 2019 04 16.

Article in English | MEDLINE | ID: mdl-30992455

ABSTRACT

The incomplete identification of structural variants (SVs) from whole-genome sequencing data limits studies of human genetic diversity and disease association. Here, we apply a suite of long-read, short-read, strand-specific sequencing technologies, optical mapping, and variant discovery algorithms to comprehensively analyze three trios to define the full spectrum of human genetic variation in a haplotype-resolved manner. We identify 818,054 indel variants (<50 bp) and 27,622 SVs (≥50 bp) per genome. We also discover 156 inversions per genome and 58 of the inversions intersect with the critical regions of recurrent microdeletion and microduplication syndromes. Taken together, our SV callsets represent a three to sevenfold increase in SV detection compared to most standard high-throughput sequencing studies, including those from the 1000 Genomes Project. The methods and the dataset presented serve as a gold standard for the scientific community allowing us to make recommendations for maximizing structural variation sensitivity for future genome sequencing studies.

Subjects

Genome, Human/genetics; Genomic Structural Variation; Genomics/methods; Haplotypes/genetics; Algorithms; Chromosome Mapping/methods; Databases, Genetic; High-Throughput Nucleotide Sequencing/methods; Humans; INDEL Mutation; Whole Genome Sequencing/methods

10.

The emergent landscape of the mouse gut endoderm at single-cell resolution.

Nowotschin, Sonja; Setty, Manu; Kuo, Ying-Yi; Liu, Vincent; Garg, Vidur; Sharma, Roshan; Simon, Claire S; Saiz, Nestor; Gardner, Rui; Boutet, Stéphane C; Church, Deanna M; Hoodless, Pamela A; Hadjantonakis, Anna-Katerina; Pe'er, Dana.

Nature ; 569(7756): 361-367, 2019 05.

Article in English | MEDLINE | ID: mdl-30959515

ABSTRACT

Here we delineate the ontogeny of the mammalian endoderm by generating 112,217 single-cell transcriptomes, which represent all endoderm populations within the mouse embryo until midgestation. We use graph-based approaches to model differentiating cells, which provides a spatio-temporal characterization of developmental trajectories and defines the transcriptional architecture that accompanies the emergence of the first (primitive or extra-embryonic) endodermal population and its sister pluripotent (embryonic) epiblast lineage. We uncover a relationship between descendants of these two lineages, in which epiblast cells differentiate into endoderm at two distinct time points: before and during gastrulation. Trajectories of endoderm cells were mapped as they acquired embryonic versus extra-embryonic fates and as they spatially converged within the nascent gut endoderm, which revealed these cells to be globally similar but retain aspects of their lineage history. We observed the regionalized identity of cells along the anterior-posterior axis of the emergent gut tube, which reflects their embryonic or extra-embryonic origin, and the coordinated patterning of these cells into organ-specific territories.

Subjects

Endoderm/cytology; Endoderm/embryology; Intestines/cytology; Intestines/embryology; Single-Cell Analysis; Animals; Blastocyst/cytology; Body Patterning; Cell Differentiation; Cell Lineage; Female; Gastrulation; Male; Mice

11.

Resolving the full spectrum of human genome variation using Linked-Reads.

Marks, Patrick; Garcia, Sarah; Barrio, Alvaro Martinez; Belhocine, Kamila; Bernate, Jorge; Bharadwaj, Rajiv; Bjornson, Keith; Catalanotti, Claudia; Delaney, Josh; Fehr, Adrian; Fiddes, Ian T; Galvin, Brendan; Heaton, Haynes; Herschleb, Jill; Hindson, Christopher; Holt, Esty; Jabara, Cassandra B; Jett, Susanna; Keivanfar, Nikka; Kyriazopoulou-Panagiotopoulou, Sofia; Lek, Monkol; Lin, Bill; Lowe, Adam; Mahamdallie, Shazia; Maheshwari, Shamoni; Makarewicz, Tony; Marshall, Jamie; Meschi, Francesca; O'Keefe, Christopher J; Ordonez, Heather; Patel, Pranav; Price, Andrew; Royall, Ariel; Ruark, Elise; Seal, Sheila; Schnall-Levin, Michael; Shah, Preyas; Stafford, David; Williams, Stephen; Wu, Indira; Xu, Andrew Wei; Rahman, Nazneen; MacArthur, Daniel; Church, Deanna M.

Genome Res ; 29(4): 635-645, 2019 04.

Article in English | MEDLINE | ID: mdl-30894395

ABSTRACT

Large-scale population analyses coupled with advances in technology have demonstrated that the human genome is more diverse than originally thought. To date, this diversity has largely been uncovered using short-read whole-genome sequencing. However, these short-read approaches fail to give a complete picture of a genome. They struggle to identify structural events, cannot access repetitive regions, and fail to resolve the human genome into haplotypes. Here, we describe an approach that retains long range information while maintaining the advantages of short reads. Starting from ~1 ng of high molecular weight DNA, we produce barcoded short-read libraries. Novel informatic approaches allow for the barcoded short reads to be associated with their original long molecules producing a novel data type known as "Linked-Reads". This approach allows for simultaneous detection of small and large variants from a single library. In this manuscript, we show the advantages of Linked-Reads over standard short-read approaches for reference-based analysis. Linked-Reads allow mapping to 38 Mb of sequence not accessible to short reads, adding sequence in 423 difficult-to-sequence genes including disease-relevant genes STRC, SMN1, and SMN2. Both Linked-Read whole-genome and whole-exome sequencing identify complex structural variations, including balanced events and single exon deletions and duplications. Further, Linked-Reads extend the region of high-confidence calls by 68.9 Mb. The data presented here show that Linked-Reads provide a scalable approach for comprehensive genome analysis that is not possible using short reads alone.

Subjects

Genome-Wide Association Study/methods; Polymorphism, Genetic; Whole Genome Sequencing/methods; Cell Line; Genome, Human; Humans; Intercellular Signaling Peptides and Proteins; Membrane Proteins/genetics; Survival of Motor Neuron 1 Protein/genetics; Survival of Motor Neuron 2 Protein/genetics

12.

Genomes for all.

Church, Deanna M.

Nat Biotechnol ; 36(9): 815-816, 2018 09 06.

Article in English | MEDLINE | ID: mdl-30188541

Subjects

Genetic Variation; Genome, Bacterial

13.

Corrigendum: Direct determination of diploid genome sequences.

Weisenfeld, Neil I; Kumar, Vijay; Shah, Preyas; Church, Deanna M; Jaffe, David B.

Genome Res ; 28(4): 606.1, 2018 04.

Article in English | MEDLINE | ID: mdl-29610250

14.

Reference quality assembly of the 3.5-Gb genome of Capsicum annuum from a single linked-read library.

Hulse-Kemp, Amanda M; Maheshwari, Shamoni; Stoffel, Kevin; Hill, Theresa A; Jaffe, David; Williams, Stephen R; Weisenfeld, Neil; Ramakrishnan, Srividya; Kumar, Vijay; Shah, Preyas; Schatz, Michael C; Church, Deanna M; Van Deynze, Allen.

Hortic Res ; 5: 4, 2018.

Article in English | MEDLINE | ID: mdl-29423234

ABSTRACT

Linked-Read sequencing technology has recently been employed successfully for de novo assembly of human genomes, however, the utility of this technology for complex plant genomes is unproven. We evaluated the technology for this purpose by sequencing the 3.5-gigabase (Gb) diploid pepper (Capsicum annuum) genome with a single Linked-Read library. Plant genomes, including pepper, are characterized by long, highly similar repetitive sequences. Accordingly, significant effort is used to ensure that the sequenced plant is highly homozygous and the resulting assembly is a haploid consensus. With a phased assembly approach, we targeted a heterozygous F1 derived from a wide cross to assess the ability to derive both haplotypes and characterize a pungency gene with a large insertion/deletion. The Supernova software generated a highly ordered, more contiguous sequence assembly than all currently available C. annuum reference genomes. Over 83% of the final assembly was anchored and oriented using four publicly available de novo linkage maps. A comparison of the annotation of conserved eukaryotic genes indicated the completeness of assembly. The validity of the phased assembly is further demonstrated with the complete recovery of both 2.5-Kb insertion/deletion haplotypes of the PUN1 locus in the F1 sample that represents pungent and nonpungent peppers, as well as nearly full recovery of the BUSCO2 gene set within each of the two haplotypes. The most contiguous pepper genome assembly to date has been generated which demonstrates that Linked-Read library technology provides a tool to de novo assemble complex highly repetitive heterozygous plant genomes. This technology can provide an opportunity to cost-effectively develop high-quality genome assemblies for other complex plants and compare structural and gene differences through accurate haplotype reconstruction.

15.

Dissecting the Causal Mechanism of X-Linked Dystonia-Parkinsonism by Integrating Genome and Transcriptome Assembly.

Aneichyk, Tatsiana; Hendriks, William T; Yadav, Rachita; Shin, David; Gao, Dadi; Vaine, Christine A; Collins, Ryan L; Domingo, Aloysius; Currall, Benjamin; Stortchevoi, Alexei; Multhaupt-Buell, Trisha; Penney, Ellen B; Cruz, Lilian; Dhakal, Jyotsna; Brand, Harrison; Hanscom, Carrie; Antolik, Caroline; Dy, Marisela; Ragavendran, Ashok; Underwood, Jason; Cantsilieris, Stuart; Munson, Katherine M; Eichler, Evan E; Acuña, Patrick; Go, Criscely; Jamora, R Dominic G; Rosales, Raymond L; Church, Deanna M; Williams, Stephen R; Garcia, Sarah; Klein, Christine; Müller, Ulrich; Wilhelmsen, Kirk C; Timmers, H T Marc; Sapir, Yechiam; Wainger, Brian J; Henderson, Daniel; Ito, Naoto; Weisenfeld, Neil; Jaffe, David; Sharma, Nutan; Breakefield, Xandra O; Ozelius, Laurie J; Bragg, D Cristopher; Talkowski, Michael E.

Cell ; 172(5): 897-909.e21, 2018 02 22.

Article in English | MEDLINE | ID: mdl-29474918

ABSTRACT

X-linked Dystonia-Parkinsonism (XDP) is a Mendelian neurodegenerative disease that is endemic to the Philippines and is associated with a founder haplotype. We integrated multiple genome and transcriptome assembly technologies to narrow the causal mutation to the TAF1 locus, which included a SINE-VNTR-Alu (SVA) retrotransposition into intron 32 of the gene. Transcriptome analyses identified decreased expression of the canonical cTAF1 transcript among XDP probands, and de novo assembly across multiple pluripotent stem-cell-derived neuronal lineages discovered aberrant TAF1 transcription that involved alternative splicing and intron retention (IR) in proximity to the SVA that was anti-correlated with overall TAF1 expression. CRISPR/Cas9 excision of the SVA rescued this XDP-specific transcriptional signature and normalized TAF1 expression in probands. These data suggest an SVA-mediated aberrant transcriptional mechanism associated with XDP and may provide a roadmap for layered technologies and integrated assembly-based analyses for other unsolved Mendelian disorders.

Subjects

Dystonic Disorders/genetics; Genetic Diseases, X-Linked/genetics; Genome, Human; Transcriptome/genetics; Alternative Splicing/genetics; Alu Elements/genetics; Base Sequence; CRISPR-Cas Systems/genetics; Cohort Studies; Family; Female; Genetic Loci; Haplotypes/genetics; High-Throughput Nucleotide Sequencing; Histone Acetyltransferases/genetics; Histone Acetyltransferases/metabolism; Humans; Induced Pluripotent Stem Cells/metabolism; Introns/genetics; Male; Minisatellite Repeats/genetics; Models, Genetic; Nerve Degeneration/genetics; Nerve Degeneration/pathology; Neural Stem Cells/metabolism; Neurons/metabolism; RNA, Messenger/genetics; RNA, Messenger/metabolism; Short Interspersed Nucleotide Elements; TATA-Binding Protein Associated Factors/genetics; TATA-Binding Protein Associated Factors/metabolism; Transcription Factor TFIID/genetics; Transcription Factor TFIID/metabolism

16.

Evaluation of GRCh38 and de novo haploid genome assemblies demonstrates the enduring quality of the reference assembly.

Schneider, Valerie A; Graves-Lindsay, Tina; Howe, Kerstin; Bouk, Nathan; Chen, Hsiu-Chuan; Kitts, Paul A; Murphy, Terence D; Pruitt, Kim D; Thibaud-Nissen, Françoise; Albracht, Derek; Fulton, Robert S; Kremitzki, Milinn; Magrini, Vincent; Markovic, Chris; McGrath, Sean; Steinberg, Karyn Meltz; Auger, Kate; Chow, William; Collins, Joanna; Harden, Glenn; Hubbard, Timothy; Pelan, Sarah; Simpson, Jared T; Threadgold, Glen; Torrance, James; Wood, Jonathan M; Clarke, Laura; Koren, Sergey; Boitano, Matthew; Peluso, Paul; Li, Heng; Chin, Chen-Shan; Phillippy, Adam M; Durbin, Richard; Wilson, Richard K; Flicek, Paul; Eichler, Evan E; Church, Deanna M.

Genome Res ; 27(5): 849-864, 2017 05.

Article in English | MEDLINE | ID: mdl-28396521

ABSTRACT

The human reference genome assembly plays a central role in nearly all aspects of today's basic and clinical research. GRCh38 is the first coordinate-changing assembly update since 2009; it reflects the resolution of roughly 1000 issues and encompasses modifications ranging from thousands of single base changes to megabase-scale path reorganizations, gap closures, and localization of previously orphaned sequences. We developed a new approach to sequence generation for targeted base updates and used data from new genome mapping technologies and single haplotype resources to identify and resolve larger assembly issues. For the first time, the reference assembly contains sequence-based representations for the centromeres. We also expanded the number of alternate loci to create a reference that provides a more robust representation of human population variation. We demonstrate that the updates render the reference an improved annotation substrate, alter read alignments in unchanged regions, and impact variant interpretation at clinically relevant loci. We additionally evaluated a collection of new de novo long-read haploid assemblies and conclude that although the new assemblies compare favorably to the reference with respect to continuity, error rate, and gene completeness, the reference still provides the best representation for complex genomic regions and coding sequences. We assert that the collected updates in GRCh38 make the newer assembly a more robust substrate for comprehensive analyses that will promote our understanding of human biology and advance our efforts to improve health.

Subjects

Contig Mapping/methods; Genome, Human; Genomics/methods; Sequence Analysis, DNA/methods; Software; Contig Mapping/standards; Genomics/standards; Haploidy; Haplotypes; Humans; Polymorphism, Genetic; Reference Standards; Sequence Analysis, DNA/standards

17.

Direct determination of diploid genome sequences.

Weisenfeld, Neil I; Kumar, Vijay; Shah, Preyas; Church, Deanna M; Jaffe, David B.

Genome Res ; 27(5): 757-767, 2017 05.

Article in English | MEDLINE | ID: mdl-28381613

ABSTRACT

Determining the genome sequence of an organism is challenging, yet fundamental to understanding its biology. Over the past decade, thousands of human genomes have been sequenced, contributing deeply to biomedical research. In the vast majority of cases, these have been analyzed by aligning sequence reads to a single reference genome, biasing the resulting analyses, and in general, failing to capture sequences novel to a given genome. Some de novo assemblies have been constructed free of reference bias, but nearly all were constructed by merging homologous loci into single "consensus" sequences, generally absent from nature. These assemblies do not correctly represent the diploid biology of an individual. In exactly two cases, true diploid de novo assemblies have been made, at great expense. One was generated using Sanger sequencing, and one using thousands of clone pools. Here, we demonstrate a straightforward and low-cost method for creating true diploid de novo assemblies. We make a single library from ~1 ng of high molecular weight DNA, using the 10x Genomics microfluidic platform to partition the genome. We applied this technique to seven human samples, generating low-cost HiSeq X data, then assembled these using a new "pushbutton" algorithm, Supernova. Each computation took 2 d on a single server. Each yielded contigs longer than 100 kb, phase blocks longer than 2.5 Mb, and scaffolds longer than 15 Mb. Our method provides a scalable capability for determining the actual diploid genome sequence in a sample, opening the door to new approaches in genomic biology and medicine.

Subjects

Contig Mapping/methods; Diploidy; Sequence Analysis, DNA/methods; Genome, Human; Genomic Library; Humans; Microfluidics/methods; Software

18.

Principles and Recommendations for Standardizing the Use of the Next-Generation Sequencing Variant File in Clinical Settings.

Lubin, Ira M; Aziz, Nazneen; Babb, Lawrence J; Ballinger, Dennis; Bisht, Himani; Church, Deanna M; Cordes, Shaun; Eilbeck, Karen; Hyland, Fiona; Kalman, Lisa; Landrum, Melissa; Lockhart, Edward R; Maglott, Donna; Marth, Gabor; Pfeifer, John D; Rehm, Heidi L; Roy, Somak; Tezak, Zivana; Truty, Rebecca; Ullman-Cullere, Mollie; Voelkerding, Karl V; Worthey, Elizabeth A; Zaranek, Alexander W; Zook, Justin M.

J Mol Diagn ; 19(3): 417-426, 2017 05.

Article in English | MEDLINE | ID: mdl-28315672

ABSTRACT

A national workgroup convened by the Centers for Disease Control and Prevention identified principles and made recommendations for standardizing the description of sequence data contained within the variant file generated during the course of clinical next-generation sequence analysis for diagnosing human heritable conditions. The specifications for variant files were initially developed to be flexible with regard to content representation to support a variety of research applications. This flexibility permits variation with regard to how sequence findings are described and this depends, in part, on the conventions used. For clinical laboratory testing, this poses a problem because these differences can compromise the capability to compare sequence findings among laboratories to confirm results and to query databases to identify clinically relevant variants. To provide for a more consistent representation of sequence findings described within variant files, the workgroup made several recommendations that considered alignment to a common reference sequence, variant caller settings, use of genomic coordinates, and gene and variant naming conventions. These recommendations were considered with regard to the existing variant file specifications presently used in the clinical setting. Adoption of these recommendations is anticipated to reduce the potential for ambiguity in describing sequence findings and facilitate the sharing of genomic data among clinical laboratories and other entities.

Subjects

High-Throughput Nucleotide Sequencing/methods; Sequence Analysis, DNA/methods; Databases, Genetic; Genetic Variation/genetics; Humans; Software

19.

A variant by any name: quantifying annotation discordance across tools and clinical databases.

Yen, Jennifer L; Garcia, Sarah; Montana, Aldrin; Harris, Jason; Chervitz, Stephen; Morra, Massimo; West, John; Chen, Richard; Church, Deanna M.

Genome Med ; 9(1): 7, 2017 01 26.

Article in English | MEDLINE | ID: mdl-28122645

ABSTRACT

BACKGROUND: Clinical genomic testing is dependent on the robust identification and reporting of variant-level information in relation to disease. With the shift to high-throughput sequencing, a major challenge for clinical diagnostics is the cross-identification of variants called on their genomic position to resources that rely on transcript- or protein-based descriptions. METHODS: We evaluated the accuracy of three tools (SnpEff, Variant Effect Predictor, and Variation Reporter) that generate transcript and protein-based variant nomenclature from genomic coordinates according to guidelines by the Human Genome Variation Society (HGVS). Our evaluation was based on transcript-controlled comparisons to a manually curated set of 126 test variants of various types drawn from data sources, each with HGVS-compliant transcript and protein descriptors. We further evaluated the concordance between annotations generated by SnpEff and Variant Effect Predictor and those in major germline and cancer databases: ClinVar and COSMIC, respectively. RESULTS: We find that there is substantial discordance between the annotation tools and databases in the description of insertions and/or deletions. Using our ground truth set of variants, constructed specifically to identify challenging events, accuracy was between 80 and 90% for coding changes and between 50 and 70% for protein changes for 114 to 126 variants. Exact concordance for SNV syntax was over 99.5% between ClinVar and Variant Effect Predictor and SnpEff, but less than 90% for non-SNV variants. For COSMIC, exact concordance for coding and protein SNVs was between 65 and 88% and less than 15% for insertions. Across the tools and datasets, there was a wide range of different but equivalent expressions describing protein variants. CONCLUSIONS: Our results reveal significant inconsistency in variant representation across tools and databases. While some of these syntax differences may be clear to a clinician, they can confound variant matching, an important step in variant classification. These results highlight the urgent need for the adoption of and adherence to uniform standards in variant annotation, with consistent reporting on the genomic reference, to enable accurate and efficient data-driven clinical care.
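To make the idea of "different but equivalent expressions" concrete, the sketch below maps a few common spellings of the same protein change to one string before matching. It is a deliberately simplified illustration, not the method used in the study and not a full HGVS parser; the function name `canonical_protein` and the example variants are hypothetical.

```python
import re

# Three-letter to one-letter amino acid codes, plus "Ter" for a stop codon.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
    "Ter": "*",
}


def canonical_protein(description):
    """Reduce a protein-level description to a one-letter form.

    Handles only the "p." prefix, the parentheses that mark a predicted
    consequence, and three-letter residue codes. Anything else is left
    unchanged.
    """
    text = description.strip()
    if text.startswith("p."):
        text = text[2:]
    text = text.strip("()")
    # Replace each capitalized three-letter token that is a known residue.
    text = re.sub(
        r"[A-Z][a-z]{2}",
        lambda m: THREE_TO_ONE.get(m.group(0), m.group(0)),
        text,
    )
    return "p." + text


# Three spellings of one (hypothetical) nonsense variant.
spellings = ["p.Arg97Ter", "p.(Arg97Ter)", "p.R97*"]
print({canonical_protein(s) for s in spellings})
```

Exact string comparison treats the three spellings as different variants, whereas comparison after canonicalization treats them as one. Real pipelines would also need to reconcile transcript versions and indel conventions, which this sketch does not attempt.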

Subjects

Data Accuracy; Genetic Variation; Genome, Human; Molecular Sequence Annotation/standards; Software/standards; Computational Biology/standards; Databases, Genetic; Humans; INDEL Mutation

20.

Alternate-locus aware variant calling in whole genome sequencing.

Jäger, Marten; Schubach, Max; Zemojtel, Tomasz; Reinert, Knut; Church, Deanna M; Robinson, Peter N.

Genome Med ; 8(1): 130, 2016 12 13.

Article in English | MEDLINE | ID: mdl-27964746

ABSTRACT

BACKGROUND: The last two human genome assemblies have extended the previous linear golden-path paradigm of the human genome to a graph-like model to better represent regions with a high degree of structural variability. The new model offers opportunities to improve the technical validity of variant calling in whole-genome sequencing (WGS). METHODS: We developed an algorithm that analyzes the patterns of variant calls in the 178 structurally variable regions of the GRCh38 genome assembly, and infers whether a given sample is most likely to contain sequences from the primary assembly, an alternate locus, or their heterozygous combination at each of these 178 regions. We investigate 121 in-house WGS datasets that have been aligned to the GRCh37 and GRCh38 assemblies. RESULTS: We show that stretches of sequences that are largely but not entirely identical between the primary assembly and an alternate locus can result in multiple variant calls against regions of the primary assembly. In WGS analysis, this results in characteristic and recognizable patterns of variant calls at positions that we term alignable scaffold-discrepant positions (ASDPs). In 121 in-house genomes, on average 51.8±3.8 of the 178 regions were found to correspond best to an alternate locus rather than the primary assembly sequence, and filtering these genomes with our algorithm led to the identification of 7863 variant calls per genome that colocalized with ASDPs. Additionally, we found that 437 of 791 genome-wide association study hits located within one of the regions corresponded to ASDPs. CONCLUSIONS: Our algorithm uses the information contained in the 178 structurally variable regions of the GRCh38 genome assembly to avoid spurious variant calls in cases where samples contain an alternate locus rather than the corresponding segment of the primary assembly. These results suggest the great potential of fully incorporating the resources of graph-like genome assemblies into variant calling, but also underscore the importance of developing computational resources that will allow a full reconstruction of the genotype in personal genomes. Our algorithm is freely available at https://github.com/charite/asdpex.

Subjects

Algorithms; Genetic Variation; Genome, Human; Heterozygote; Sequence Alignment/methods; Sequence Analysis, DNA/methods; Humans
