Search | BVS Bolivia

Author Correction: Assembly of a pan-genome from deep sequencing of 910 humans of African descent.

Sherman, Rachel M; Forman, Juliet; Antonescu, Valentin; Puiu, Daniela; Daya, Michelle; Rafaels, Nicholas; Boorgula, Meher Preethi; Chavan, Sameer; Vergara, Candelaria; Ortega, Victor E; Levin, Albert M; Eng, Celeste; Yazdanbakhsh, Maria; Wilson, James G; Marrugo, Javier; Lange, Leslie A; Williams, L Keoki; Watson, Harold; Ware, Lorraine B; Olopade, Christopher O; Olopade, Olufunmilayo; Oliveira, Ricardo R; Ober, Carole; Nicolae, Dan L; Meyers, Deborah A; Mayorga, Alvaro; Knight-Madden, Jennifer; Hartert, Tina; Hansel, Nadia N; Foreman, Marilyn G; Ford, Jean G; Faruque, Mezbah U; Dunston, Georgia M; Caraballo, Luis; Burchard, Esteban G; Bleecker, Eugene R; Araujo, Maria I; Herrera-Paz, Edwin F; Campbell, Monica; Foster, Cassandra; Taub, Margaret A; Beaty, Terri H; Ruczinski, Ingo; Mathias, Rasika A; Barnes, Kathleen C; Salzberg, Steven L.

Nat Genet ; 51(2): 364, 2019 02.

Article in English | MEDLINE | ID: mdl-30647471

ABSTRACT

In the version of this article initially published, the statement "there are no pan-genomes for any other animal or plant species" was incorrect. The statement has been corrected to "there are no reported pan-genomes for any other animal species, to our knowledge." We thank David Edwards for bringing this error to our attention. The error has been corrected in the HTML and PDF versions of the article.

Assembly of a pan-genome from deep sequencing of 910 humans of African descent.

Nat Genet ; 51(1): 30-35, 2019 01.

Article in English | MEDLINE | ID: mdl-30455414

ABSTRACT

We used a deeply sequenced dataset of 910 individuals, all of African descent, to construct a set of DNA sequences that is present in these individuals but missing from the reference human genome. We aligned 1.19 trillion reads from the 910 individuals to the reference genome (GRCh38), collected all reads that failed to align, and assembled these reads into contiguous sequences (contigs). We then compared all contigs to one another to identify a set of unique sequences representing regions of the African pan-genome missing from the reference genome. Our analysis revealed 296,485,284 bp in 125,715 distinct contigs present in the populations of African descent, demonstrating that the African pan-genome contains ~10% more DNA than the current human reference genome. Although the functional significance of nearly all of this sequence is unknown, 387 of the novel contigs fall within 315 distinct protein-coding genes, and the rest appear to be intergenic.

Subjects

Black People/genetics; Genome, Human/genetics; High-Throughput Nucleotide Sequencing/methods; Humans; Sequence Analysis, DNA/methods
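
As a rough check on the "~10% more DNA" figure in the abstract above, the novel sequence total can be compared with the length of the reference assembly. The sketch below uses the two numbers quoted in the abstract; the reference length (about 3.1 billion bp for GRCh38) is an assumption that the abstract does not state, and the function name is hypothetical.

```python
# Figures quoted in the abstract.
NOVEL_BP = 296_485_284
NOVEL_CONTIGS = 125_715

# Assumption (not from the abstract): approximate GRCh38 length in bp.
ASSUMED_REFERENCE_BP = 3_100_000_000


def percent_of_reference(novel_bp: int, reference_bp: int) -> float:
    """Return the novel sequence as a percentage of the reference length."""
    return 100.0 * novel_bp / reference_bp


def mean_contig_length(total_bp: int, n_contigs: int) -> float:
    """Return the average contig length in bp."""
    return total_bp / n_contigs


if __name__ == "__main__":
    pct = percent_of_reference(NOVEL_BP, ASSUMED_REFERENCE_BP)
    mean_len = mean_contig_length(NOVEL_BP, NOVEL_CONTIGS)
    print(f"novel sequence: {pct:.1f}% of the assumed reference length")
    print(f"mean contig length: {mean_len:.0f} bp")
```

Under that assumed reference length the ratio comes out a little below 10%, which is consistent with the rounded figure in the abstract.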

Scaling read aligners to hundreds of threads on general-purpose processors.

Langmead, Ben; Wilks, Christopher; Antonescu, Valentin; Charles, Rone.

Bioinformatics ; 35(3): 421-432, 2019 02 01.

Article in English | MEDLINE | ID: mdl-30020410

ABSTRACT

Motivation: General-purpose processors can now contain many dozens of processor cores and support hundreds of simultaneous threads of execution. To make best use of these threads, genomics software must contend with new and subtle computer architecture issues. We discuss some of these and propose methods for improving thread scaling in tools that analyze each read independently, such as read aligners. Results: We implement these methods in new versions of Bowtie, Bowtie 2 and HISAT. We greatly improve thread scaling in many scenarios, including on the recent Intel Xeon Phi architecture. We also highlight how bottlenecks are exacerbated by variable-record-length file formats like FASTQ and suggest changes that enable superior scaling. Availability and implementation: Experiments for this study: https://github.com/BenLangmead/bowtie-scaling. Bowtie: http://bowtie-bio.sourceforge.net. Bowtie 2: http://bowtie-bio.sourceforge.net/bowtie2. HISAT: http://www.ccb.jhu.edu/software/hisat. Supplementary information: Supplementary data are available at Bioinformatics online.

Subjects

Algorithms; Genomics; Software; Computer Systems

Germline Mutations in DNA Repair Genes in Lung Adenocarcinoma.

Parry, Erin M; Gable, Dustin L; Stanley, Susan E; Khalil, Sara E; Antonescu, Valentin; Florea, Liliana; Armanios, Mary.

J Thorac Oncol ; 12(11): 1673-1678, 2017 11.

Article in English | MEDLINE | ID: mdl-28843361

ABSTRACT

INTRODUCTION: Although lung cancer is generally thought to be environmentally provoked, anecdotal familial clustering has been reported, suggesting that there may be genetic susceptibility factors. We systematically tested whether germline mutations in eight candidate genes may be risk factors for lung adenocarcinoma. METHODS: We studied lung adenocarcinoma cases for which germline sequence data had been generated as part of The Cancer Genome Atlas project but had not been previously analyzed. We selected eight genes on the basis of prior anecdotal association with lung cancer or genome-wide association studies: ATM serine/threonine kinase gene (ATM); BRCA2, DNA repair associated gene (BRCA2); checkpoint kinase 2 gene (CHEK2); EGFR; parkin RBR E3 ubiquitin protein ligase gene (PARK2); telomerase reverse transcriptase gene (TERT); tumor protein p53 gene (TP53); and Yes associated protein 1 gene (YAP1). RESULTS: Among 555 lung adenocarcinoma cases, we detected 14 pathogenic mutations in five genes; they occurred at a frequency of 2.5% and represented an odds ratio (OR) of 66 (95% confidence interval: 33-125, p < 0.0001 [chi-square test]). The mutations fell most commonly in ATM (50%), followed by TP53, BRCA2, EGFR, and PARK2. Most (86%) of these variants had been reported in other familial cancer syndromes. Another 12 cases (2%) carried ultrarare variants that were predicted to be deleterious by three protein prediction programs; these most frequently involved ATM and BRCA2. CONCLUSIONS: A subset of patients with lung adenocarcinoma, at least 2.5% to 4.5%, carry germline variants that have been linked to cancer risk in Mendelian syndromes. The genes fall most frequently in DNA repair pathways. Our data indicate that patients with lung adenocarcinoma, similar to other solid tumors, include a subset of patients with inherited susceptibility.

Subjects

Adenocarcinoma/genetics; DNA Repair/genetics; Germ-Line Mutation/genetics; Lung Neoplasms/genetics; Adenocarcinoma/pathology; Adenocarcinoma of Lung; Adult; Aged; Aged, 80 and over; Female; Humans; Lung Neoplasms/pathology; Male; Middle Aged

The novel fusion transcript NR5A2-KLHL29FT is generated by an insertion at the KLHL29 locus.

Sun, Zhenguo; Ke, Xiquan; Salzberg, Steven L; Kim, Daehwan; Antonescu, Valentin; Cheng, Yulan; Huang, Binbin; Song, Jee Hoon; Abraham, John M; Ibrahim, Sariat; Tian, Hui; Meltzer, Stephen J.

Cancer ; 123(9): 1507-1515, 2017 05 01.

Article in English | MEDLINE | ID: mdl-28081303

ABSTRACT

BACKGROUND: Novel fusion transcripts (FTs) caused by chromosomal rearrangement are common factors in the development of cancers. In the current study, the authors used massively parallel RNA sequencing to identify new FTs in colon cancers. METHODS: RNA sequencing (RNA-Seq) and TopHat-Fusion were used to identify new FTs in colon cancers. The authors then investigated whether the novel FT nuclear receptor subfamily 5, group A, member 2 (NR5A2)-Kelch-like family member 29 FT (KLHL29FT) was transcribed from a genomic chromosomal rearrangement. Next, the expression of NR5A2-KLHL29FT was measured by quantitative real-time polymerase chain reaction in colon cancers and matched corresponding normal epithelia. RESULTS: The authors identified the FT NR5A2-KLHL29FT in normal and cancerous epithelia. While investigating this transcript, it was unexpectedly found that it was due to an uncharacterized polymorphic germline insertion of the NR5A2 sequence from chromosome 1 into the KLHL29 locus at chromosome 2, rather than a chromosomal rearrangement. This germline insertion, which occurred at a population frequency of 0.40, appeared to bear no relationship to cancer development. Moreover, expression of NR5A2-KLHL29FT was validated in RNA specimens from samples with insertions of NR5A2 at the KLHL29 gene locus, but not from samples without this insertion. It is interesting to note that NR5A2-KLHL29FT expression levels were significantly lower in colon cancers than in matched normal colonic epithelia (P = .029), suggesting the potential participation of NR5A2-KLHL29FT in the origin or progression of this tumor type. CONCLUSIONS: NR5A2-KLHL29FT was generated from a polymorphic insertion of the NR5A2 sequence into the KLHL29 locus. NR5A2-KLHL29FT may influence the origin or progression of colon cancer. Moreover, researchers should be aware that similar FTs may occur due to transchromosomal insertions that are not correctly annotated in genome databases, especially with current assembly algorithms. Cancer 2017;123:1507-1515. © 2017 American Cancer Society.

Subjects

Adaptor Proteins, Signal Transducing/genetics; Colon/metabolism; Colonic Neoplasms/genetics; Mutagenesis, Insertional; Oncogene Proteins, Fusion/genetics; RNA, Messenger/metabolism; Receptors, Cytoplasmic and Nuclear/genetics; Colonic Neoplasms/metabolism; Germ-Line Mutation; High-Throughput Nucleotide Sequencing; Humans; Real-Time Polymerase Chain Reaction; Sequence Analysis, RNA

POPcorn: An Online Resource Providing Access to Distributed and Diverse Maize Project Data.

Cannon, Ethalinda K S; Birkett, Scott M; Braun, Bremen L; Kodavali, Sateesh; Jennewein, Douglas M; Yilmaz, Alper; Antonescu, Valentin; Antonescu, Corina; Harper, Lisa C; Gardiner, Jack M; Schaeffer, Mary L; Campbell, Darwin A; Andorf, Carson M; Andorf, Destri; Lisch, Damon; Koch, Karen E; McCarty, Donald R; Quackenbush, John; Grotewold, Erich; Lushbough, Carol M; Sen, Taner Z; Lawrence, Carolyn J.

Int J Plant Genomics ; 2011: 923035, 2011.

Article in English | MEDLINE | ID: mdl-22253616

ABSTRACT

The purpose of the online resource presented here, POPcorn (Project Portal for corn), is to enhance accessibility of maize genetic and genomic resources for plant biologists. Currently, many online locations are difficult to find, some are best searched independently, and individual project websites often degrade over time, sometimes disappearing entirely. The POPcorn site makes available (1) a centralized, web-accessible resource to search and browse descriptions of ongoing maize genomics projects, (2) a single, stand-alone tool that uses web services and minimal data warehousing to search for sequence matches in online resources of diverse offsite projects, and (3) a set of tools that enables researchers to migrate their data to the long-term model organism database for maize genetic and genomic information: MaizeGDB. Examples demonstrating POPcorn's utility are provided herein.

Using the DFCI gene index databases for biological discovery.

Antonescu, Corina; Antonescu, Valentin; Sultana, Razvan; Quackenbush, John.

Curr Protoc Bioinformatics ; Chapter 1: 1.6.1-1.6.36, 2010 Mar.

Article in English | MEDLINE | ID: mdl-20205187

ABSTRACT

The DFCI Gene Index Web pages provide access to analyses of ESTs and gene sequences for nearly 114 species, as well as a number of resources derived from these. Each species-specific database is presented using a common format with a home page. A variety of methods exist that allow users to search each species-specific database. Methods implemented currently include nucleotide or protein sequence queries using WU-BLAST, text-based searches using various sequence identifiers, searches by gene, tissue and library name, and searches using functional classes through Gene Ontology assignments. This protocol provides guidance for using the Gene Index Databases to extract information.

Subjects

Computational Biology/methods; Databases, Genetic; Expressed Sequence Tags; Genes; Information Storage and Retrieval/methods; Internet; Software; User-Computer Interface

TIGR Gene Indices clustering tools (TGICL): a software system for fast clustering of large EST datasets.

Pertea, Geo; Huang, Xiaoqiu; Liang, Feng; Antonescu, Valentin; Sultana, Razvan; Karamycheva, Svetlana; Lee, Yuandan; White, Joseph; Cheung, Foo; Parvizi, Babak; Tsai, Jennifer; Quackenbush, John.

Bioinformatics ; 19(5): 651-2, 2003 Mar 22.

Article in English | MEDLINE | ID: mdl-12651724

ABSTRACT

TGICL is a pipeline for analysis of large Expressed Sequence Tags (EST) and mRNA databases in which the sequences are first clustered based on pairwise sequence similarity, and then assembled by individual clusters (optionally with quality values) to produce longer, more complete consensus sequences. The system can run on multi-CPU architectures including SMP and PVM.

Subjects

Database Management Systems; Databases, Nucleic Acid; Expressed Sequence Tags; Gene Expression Profiling/methods; Information Storage and Retrieval/methods; Sequence Alignment/methods; Sequence Analysis, DNA/methods; Cluster Analysis; Gene Expression Regulation/genetics; Sequence Homology; Software

Cross-referencing eukaryotic genomes: TIGR Orthologous Gene Alignments (TOGA).

Lee, Yuandan; Sultana, Razvan; Pertea, Geo; Cho, Jennifer; Karamycheva, Svetlana; Tsai, Jennifer; Parvizi, Babak; Cheung, Foo; Antonescu, Valentin; White, Joseph; Holt, Ingeborg; Liang, Feng; Quackenbush, John.

Genome Res ; 12(3): 493-502, 2002 Mar.

Article in English | MEDLINE | ID: mdl-11875039

ABSTRACT

Comparative genomics promises to rapidly accelerate the identification and functional classification of biologically important human genes. We developed the TIGR Orthologous Gene Alignment (TOGA) database to provide a cross-reference between fully and partially sequenced eukaryotic transcribed sequences. Starting with the assembled expressed sequence tag (EST) and gene sequences that comprise the 28 TIGR Gene Indices, we used high-stringency pair-wise sequence searches and a reflexive, transitive closure process to associate sequence-specific best hits, generating 32,652 tentative ortholog groups (TOGs). This has allowed us to identify putative orthologs and paralogs for known genes, as well as those that exist only as uncharacterized ESTs and to provide links to additional information including genome sequence and mapping data. TOGA provides an important new resource for the analysis of gene function in eukaryotes. In addition, an analysis of the most widely represented sequences can begin to provide insight into eukaryotic biological processes.

Subjects

Eukaryotic Cells; Genes/genetics; Sequence Alignment/methods; Algorithms; Animals; Cattle; Computational Biology/methods; Consensus Sequence/genetics; Databases, Genetic; Eukaryotic Cells/chemistry; Eukaryotic Cells/metabolism; Genome, Human; Humans; Mice; Phylogeny; Rats; Sequence Homology, Nucleic Acid
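
The TOGA abstract above describes associating best hits through a reflexive, transitive closure process. One way to picture that kind of grouping is a union-find pass over reciprocal best-hit pairs. The sketch below is illustrative only and is not the TOGA implementation: it assumes best hits arrive as (query, hit) identifier pairs and that "reflexive" means a pair counts only when each sequence is the other's best hit. The function names and sample identifiers are hypothetical.

```python
from collections import defaultdict


def find(parent, x):
    """Return the representative of x's group, compressing the path."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def group_by_transitive_closure(best_hits):
    """Group sequences linked by reciprocal best hits.

    best_hits: iterable of (query, hit) pairs (assumed input format).
    A pair is used only if the reverse pair is also present; groups
    are then merged transitively with union-find.
    """
    hits = set(best_hits)
    parent = {}
    for a, b in hits:
        if a == b or (b, a) not in hits:
            continue  # skip self-hits and one-directional hits
        parent.setdefault(a, a)
        parent.setdefault(b, b)
        root_a, root_b = find(parent, a), find(parent, b)
        if root_a != root_b:
            parent[root_b] = root_a
    groups = defaultdict(set)
    for seq in parent:
        groups[find(parent, seq)].add(seq)
    return sorted(sorted(members) for members in groups.values())


if __name__ == "__main__":
    # Hypothetical identifiers, for illustration only.
    example_hits = [
        ("human_1", "mouse_1"), ("mouse_1", "human_1"),
        ("mouse_1", "rat_1"), ("rat_1", "mouse_1"),
        ("human_2", "cow_2"),  # not reciprocal, so it is ignored
    ]
    print(group_by_transitive_closure(example_hits))
```

In this example human_1 and rat_1 land in the same group even though they never hit each other directly, because both are reciprocal best hits of mouse_1.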
