Search | BVS CLAP/SMR-OPS/OMS

Scaling read aligners to hundreds of threads on general-purpose processors.

Langmead, Ben; Wilks, Christopher; Antonescu, Valentin; Charles, Rone.

Bioinformatics ; 35(3): 421-432, 2019 02 01.

Article in English | MEDLINE | ID: mdl-30020410

ABSTRACT

Motivation: General-purpose processors can now contain many dozens of processor cores and support hundreds of simultaneous threads of execution. To make best use of these threads, genomics software must contend with new and subtle computer architecture issues. We discuss some of these and propose methods for improving thread scaling in tools that analyze each read independently, such as read aligners. Results: We implement these methods in new versions of Bowtie, Bowtie 2 and HISAT. We greatly improve thread scaling in many scenarios, including on the recent Intel Xeon Phi architecture. We also highlight how bottlenecks are exacerbated by variable-record-length file formats like FASTQ and suggest changes that enable superior scaling. Availability and implementation: Experiments for this study: https://github.com/BenLangmead/bowtie-scaling. Bowtie: http://bowtie-bio.sourceforge.net. Bowtie 2: http://bowtie-bio.sourceforge.net/bowtie2. HISAT: http://www.ccb.jhu.edu/software/hisat. Supplementary information: Supplementary data are available at Bioinformatics online.

Subject(s)

Algorithms; Genomics; Software; Computer Systems

The novel fusion transcript NR5A2-KLHL29FT is generated by an insertion at the KLHL29 locus.

Sun, Zhenguo; Ke, Xiquan; Salzberg, Steven L; Kim, Daehwan; Antonescu, Valentin; Cheng, Yulan; Huang, Binbin; Song, Jee Hoon; Abraham, John M; Ibrahim, Sariat; Tian, Hui; Meltzer, Stephen J.

Cancer ; 123(9): 1507-1515, 2017 05 01.

Article in English | MEDLINE | ID: mdl-28081303

ABSTRACT

BACKGROUND: Novel fusion transcripts (FTs) caused by chromosomal rearrangement are common factors in the development of cancers. In the current study, the authors used massively parallel RNA sequencing to identify new FTs in colon cancers. METHODS: RNA sequencing (RNA-Seq) and TopHat-Fusion were used to identify new FTs in colon cancers. The authors then investigated whether the novel FT nuclear receptor subfamily 5, group A, member 2 (NR5A2)-Kelch-like family member 29 FT (KLHL29FT) was transcribed from a genomic chromosomal rearrangement. Next, the expression of NR5A2-KLHL29FT was measured by quantitative real-time polymerase chain reaction in colon cancers and matched corresponding normal epithelia. RESULTS: The authors identified the FT NR5A2-KLHL29FT in normal and cancerous epithelia. While investigating this transcript, it was unexpectedly found that it was due to an uncharacterized polymorphic germline insertion of the NR5A2 sequence from chromosome 1 into the KLHL29 locus at chromosome 2, rather than a chromosomal rearrangement. This germline insertion, which occurred at a population frequency of 0.40, appeared to bear no relationship to cancer development. Moreover, expression of NR5A2-KLHL29FT was validated in RNA specimens from samples with insertions of NR5A2 at the KLHL29 gene locus, but not from samples without this insertion. It is interesting to note that NR5A2-KLHL29FT expression levels were significantly lower in colon cancers than in matched normal colonic epithelia (P = .029), suggesting the potential participation of NR5A2-KLHL29FT in the origin or progression of this tumor type. CONCLUSIONS: NR5A2-KLHL29FT was generated from a polymorphism insertion of the NR5A2 sequence into the KLHL29 locus. NR5A2-KLHL29FT may influence the origin or progression of colon cancer. Moreover, researchers should be aware that similar FTs may occur due to transchromosomal insertions that are not correctly annotated in genome databases, especially with current assembly algorithms. Cancer 2017;123:1507-1515. © 2017 American Cancer Society.

Subject(s)

Adaptor Proteins, Signal Transducing/genetics; Colon/metabolism; Colonic Neoplasms/genetics; Mutagenesis, Insertional; Oncogene Proteins, Fusion/genetics; RNA, Messenger/metabolism; Receptors, Cytoplasmic and Nuclear/genetics; Colonic Neoplasms/metabolism; Germ-Line Mutation; High-Throughput Nucleotide Sequencing; Humans; Real-Time Polymerase Chain Reaction; Sequence Analysis, RNA

Assembly of a pan-genome from deep sequencing of 910 humans of African descent.

Sherman, Rachel M; Forman, Juliet; Antonescu, Valentin; Puiu, Daniela; Daya, Michelle; Rafaels, Nicholas; Boorgula, Meher Preethi; Chavan, Sameer; Vergara, Candelaria; Ortega, Victor E; Levin, Albert M; Eng, Celeste; Yazdanbakhsh, Maria; Wilson, James G; Marrugo, Javier; Lange, Leslie A; Williams, L Keoki; Watson, Harold; Ware, Lorraine B; Olopade, Christopher O; Olopade, Olufunmilayo; Oliveira, Ricardo R; Ober, Carole; Nicolae, Dan L; Meyers, Deborah A; Mayorga, Alvaro; Knight-Madden, Jennifer; Hartert, Tina; Hansel, Nadia N; Foreman, Marilyn G; Ford, Jean G; Faruque, Mezbah U; Dunston, Georgia M; Caraballo, Luis; Burchard, Esteban G; Bleecker, Eugene R; Araujo, Maria I; Herrera-Paz, Edwin F; Campbell, Monica; Foster, Cassandra; Taub, Margaret A; Beaty, Terri H; Ruczinski, Ingo; Mathias, Rasika A; Barnes, Kathleen C; Salzberg, Steven L.

Nat Genet ; 51(1): 30-35, 2019 01.

Article in English | MEDLINE | ID: mdl-30455414

ABSTRACT

We used a deeply sequenced dataset of 910 individuals, all of African descent, to construct a set of DNA sequences that is present in these individuals but missing from the reference human genome. We aligned 1.19 trillion reads from the 910 individuals to the reference genome (GRCh38), collected all reads that failed to align, and assembled these reads into contiguous sequences (contigs). We then compared all contigs to one another to identify a set of unique sequences representing regions of the African pan-genome missing from the reference genome. Our analysis revealed 296,485,284 bp in 125,715 distinct contigs present in the populations of African descent, demonstrating that the African pan-genome contains ~10% more DNA than the current human reference genome. Although the functional significance of nearly all of this sequence is unknown, 387 of the novel contigs fall within 315 distinct protein-coding genes, and the rest appear to be intergenic.

Subject(s)

Black People/genetics; Genome, Human/genetics; High-Throughput Nucleotide Sequencing/methods; Humans; Sequence Analysis, DNA/methods

Author Correction: Assembly of a pan-genome from deep sequencing of 910 humans of African descent.

Nat Genet ; 51(2): 364, 2019 02.

Article in English | MEDLINE | ID: mdl-30647471

ABSTRACT

In the version of this article initially published, the statement "there are no pan-genomes for any other animal or plant species" was incorrect. The statement has been corrected to "there are no reported pan-genomes for any other animal species, to our knowledge." We thank David Edwards for bringing this error to our attention. The error has been corrected in the HTML and PDF versions of the article.

Germline Mutations in DNA Repair Genes in Lung Adenocarcinoma.

Parry, Erin M; Gable, Dustin L; Stanley, Susan E; Khalil, Sara E; Antonescu, Valentin; Florea, Liliana; Armanios, Mary.

J Thorac Oncol ; 12(11): 1673-1678, 2017 11.

Article in English | MEDLINE | ID: mdl-28843361

ABSTRACT

INTRODUCTION: Although lung cancer is generally thought to be environmentally provoked, anecdotal familial clustering has been reported, suggesting that there may be genetic susceptibility factors. We systematically tested whether germline mutations in eight candidate genes may be risk factors for lung adenocarcinoma. METHODS: We studied lung adenocarcinoma cases for which germline sequence data had been generated as part of The Cancer Genome Atlas project but had not been previously analyzed. We selected eight genes, ATM serine/threonine kinase gene (ATM), BRCA2, DNA repair associated gene (BRCA2), checkpoint kinase 2 gene (CHEK2), EGFR, parkin RBR E3 ubiquitin protein ligase gene (PARK2), telomerase reverse transcriptase gene (TERT), tumor protein p53 gene (TP53), and Yes associated protein 1 gene (YAP1), on the basis of prior anecdotal association with lung cancer or genome-wide association studies. RESULTS: Among 555 lung adenocarcinoma cases, we detected 14 pathogenic mutations in five genes; they occurred at a frequency of 2.5% and represented an OR of 66 (95% confidence interval: 33-125, p < 0.0001 [chi-square test]). The mutations fell most commonly in ATM (50%), followed by TP53, BRCA2, EGFR, and PARK2. Most (86%) of these variants had been reported in other familial cancer syndromes. Another 12 cases (2%) carried ultrarare variants that were predicted to be deleterious by three protein prediction programs; these most frequently involved ATM and BRCA2. CONCLUSIONS: A subset of patients with lung adenocarcinoma, at least 2.5% to 4.5%, carry germline variants that have been linked to cancer risk in Mendelian syndromes. The genes fall most frequently in DNA repair pathways. Our data indicate that patients with lung adenocarcinoma, similar to other solid tumors, include a subset of patients with inherited susceptibility.

Subject(s)

Adenocarcinoma/genetics; DNA Repair/genetics; Germ-Line Mutation/genetics; Lung Neoplasms/genetics; Adenocarcinoma/pathology; Adenocarcinoma of Lung; Adult; Aged; Aged, 80 and over; Female; Humans; Lung Neoplasms/pathology; Male; Middle Aged

POPcorn: An Online Resource Providing Access to Distributed and Diverse Maize Project Data.

Cannon, Ethalinda K S; Birkett, Scott M; Braun, Bremen L; Kodavali, Sateesh; Jennewein, Douglas M; Yilmaz, Alper; Antonescu, Valentin; Antonescu, Corina; Harper, Lisa C; Gardiner, Jack M; Schaeffer, Mary L; Campbell, Darwin A; Andorf, Carson M; Andorf, Destri; Lisch, Damon; Koch, Karen E; McCarty, Donald R; Quackenbush, John; Grotewold, Erich; Lushbough, Carol M; Sen, Taner Z; Lawrence, Carolyn J.

Int J Plant Genomics ; 2011: 923035, 2011.

Article in English | MEDLINE | ID: mdl-22253616

ABSTRACT

The purpose of the online resource presented here, POPcorn (Project Portal for corn), is to enhance accessibility of maize genetic and genomic resources for plant biologists. Currently, many online locations are difficult to find, some are best searched independently, and individual project websites often degrade over time, sometimes disappearing entirely. The POPcorn site makes available (1) a centralized, web-accessible resource to search and browse descriptions of ongoing maize genomics projects, (2) a single, stand-alone tool that uses web services and minimal data warehousing to search for sequence matches in online resources of diverse offsite projects, and (3) a set of tools that enables researchers to migrate their data to the long-term model organism database for maize genetic and genomic information: MaizeGDB. Examples demonstrating POPcorn's utility are provided herein.

Using the DFCI gene index databases for biological discovery.

Antonescu, Corina; Antonescu, Valentin; Sultana, Razvan; Quackenbush, John.

Curr Protoc Bioinformatics ; Chapter 1: 1.6.1-1.6.36, 2010 Mar.

Article in English | MEDLINE | ID: mdl-20205187

ABSTRACT

The DFCI Gene Index Web pages provide access to analyses of ESTs and gene sequences for nearly 114 species, as well as a number of resources derived from these. Each species-specific database is presented using a common format with a home page. A variety of methods exist that allow users to search each species-specific database. Methods implemented currently include nucleotide or protein sequence queries using WU-BLAST, text-based searches using various sequence identifiers, searches by gene, tissue and library name, and searches using functional classes through Gene Ontology assignments. This protocol provides guidance for using the Gene Index Databases to extract information.

Subject(s)

Computational Biology/methods; Databases, Genetic; Expressed Sequence Tags; Genes; Information Storage and Retrieval/methods; Internet; Software; User-Computer Interface

TIGR Gene Indices clustering tools (TGICL): a software system for fast clustering of large EST datasets.

Pertea, Geo; Huang, Xiaoqiu; Liang, Feng; Antonescu, Valentin; Sultana, Razvan; Karamycheva, Svetlana; Lee, Yuandan; White, Joseph; Cheung, Foo; Parvizi, Babak; Tsai, Jennifer; Quackenbush, John.

Bioinformatics ; 19(5): 651-2, 2003 Mar 22.

Article in English | MEDLINE | ID: mdl-12651724

ABSTRACT

TGICL is a pipeline for analysis of large Expressed Sequence Tags (EST) and mRNA databases in which the sequences are first clustered based on pairwise sequence similarity, and then assembled by individual clusters (optionally with quality values) to produce longer, more complete consensus sequences. The system can run on multi-CPU architectures including SMP and PVM.

Subject(s)

Database Management Systems; Databases, Nucleic Acid; Expressed Sequence Tags; Gene Expression Profiling/methods; Information Storage and Retrieval/methods; Sequence Alignment/methods; Sequence Analysis, DNA/methods; Cluster Analysis; Gene Expression Regulation/genetics; Sequence Homology; Software

Cross-referencing eukaryotic genomes: TIGR Orthologous Gene Alignments (TOGA).

Lee, Yuandan; Sultana, Razvan; Pertea, Geo; Cho, Jennifer; Karamycheva, Svetlana; Tsai, Jennifer; Parvizi, Babak; Cheung, Foo; Antonescu, Valentin; White, Joseph; Holt, Ingeborg; Liang, Feng; Quackenbush, John.

Genome Res ; 12(3): 493-502, 2002 Mar.

Article in English | MEDLINE | ID: mdl-11875039

ABSTRACT

Comparative genomics promises to rapidly accelerate the identification and functional classification of biologically important human genes. We developed the TIGR Orthologous Gene Alignment (TOGA) database to provide a cross-reference between fully and partially sequenced eukaryotic transcribed sequences. Starting with the assembled expressed sequence tag (EST) and gene sequences that comprise the 28 TIGR Gene Indices, we used high-stringency pair-wise sequence searches and a reflexive, transitive closure process to associate sequence-specific best hits, generating 32,652 tentative ortholog groups (TOGs). This has allowed us to identify putative orthologs and paralogs for known genes, as well as those that exist only as uncharacterized ESTs and to provide links to additional information including genome sequence and mapping data. TOGA provides an important new resource for the analysis of gene function in eukaryotes. In addition, an analysis of the most widely represented sequences can begin to provide insight into eukaryotic biological processes.

Subject(s)

Eukaryotic Cells; Genes/genetics; Sequence Alignment/methods; Algorithms; Animals; Cattle; Computational Biology/methods; Consensus Sequence/genetics; Databases, Genetic; Eukaryotic Cells/chemistry; Eukaryotic Cells/metabolism; Genome, Human; Humans; Mice; Phylogeny; Rats; Sequence Homology, Nucleic Acid
