Search | BVS Violência e Saúde

RNA-Seq mapping and detection of gene fusions with a suffix array algorithm.

Sakarya, Onur; Breu, Heinz; Radovich, Milan; Chen, Yongzhi; Wang, Yulei N; Barbacioru, Catalin; Utiramerur, Sowmi; Whitley, Penn P; Brockman, Joel P; Vatta, Paolo; Zhang, Zheng; Popescu, Liviu; Muller, Matthew W; Kudlingar, Vidya; Garg, Nriti; Li, Chieh-Yuan; Kong, Benjamin S; Bodeau, John P; Nutter, Robert C; Gu, Jian; Bramlett, Kelli S; Ichikawa, Jeffrey K; Hyland, Fiona C; Siddiqui, Asim S.

PLoS Comput Biol ; 8(4): e1002464, 2012.

Article in English | MEDLINE | ID: mdl-22496636

ABSTRACT

High-throughput RNA sequencing enables quantification of transcripts (both known and novel), exon/exon junctions and fusions of exons from different genes. Discovery of gene fusions, particularly those expressed at low abundance, is a challenge with short- and medium-length sequencing reads. To address this challenge, we implemented an RNA-Seq mapping pipeline within the LifeScope software. We introduced new features including filter and junction mapping, annotation-aided pairing rescue and accurate mapping quality values. We combined this pipeline with a Suffix Array Spliced Read (SASR) aligner to detect chimeric transcripts. Performing paired-end RNA-Seq of the breast cancer cell line MCF-7 using the SOLiD system, we called 40 gene fusions among over 120,000 splicing junctions. We validated 36 of these 40 fusions with TaqMan assays, of which 25 were expressed in MCF-7 but not the Human Brain Reference. An intra-chromosomal gene fusion involving the estrogen receptor alpha gene ESR1, and another involving RPS6KB1 (ribosomal protein S6 kinase beta-1), were recurrently expressed in a number of breast tumor cell lines and a clinical tumor sample.
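The abstract does not describe the internals of the SASR aligner, so the following is only a minimal sketch of the general idea behind suffix-array lookup of split reads. It assumes a naive suffix-array construction, exact matching only, and ordinary nucleotide strings (no SOLiD color space); the function names, the `#` separator and the toy sequences are all hypothetical.

```python
def build_suffix_array(text):
    """Start positions of all suffixes of text, in lexicographic order.

    Naive construction for illustration; real aligners use faster algorithms.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, sa, pattern):
    """Binary-search the suffix array for every position where pattern occurs."""
    m = len(pattern)
    lo, hi = 0, len(sa)
    while lo < hi:  # first suffix whose length-m prefix is >= pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo
    hi = len(sa)
    while lo < hi:  # first suffix whose length-m prefix is > pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[start:lo])


def split_read_candidates(text, sa, read, min_anchor=4):
    """Return (split_point, left_hits, right_hits) for every split of the read
    whose two parts both match the reference exactly."""
    hits = []
    for k in range(min_anchor, len(read) - min_anchor + 1):
        left = find_occurrences(text, sa, read[:k])
        right = find_occurrences(text, sa, read[k:])
        if left and right:
            hits.append((k, left, right))
    return hits


# Toy reference: two made-up "genes" joined by a separator character.
gene_a = "ACGTTGCAAG"
gene_b = "TTCCGGATCA"
reference = gene_a + "#" + gene_b
suffix_array = build_suffix_array(reference)

# A made-up chimeric read: end of gene_a followed by start of gene_b.
chimeric_read = gene_a[-5:] + gene_b[:5]
candidates = split_read_candidates(reference, suffix_array, chimeric_read)
```

In this toy case the read does not match the reference as a whole, but one split point yields two anchors that fall in different genes, which is the kind of evidence a fusion caller would then have to filter and validate.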

Subjects

Algorithms; Gene Fusion/genetics; Oligonucleotide Array Sequence Analysis/methods; Sequence Analysis, RNA/methods; Software; Base Sequence; Molecular Sequence Data

Pathway Tools version 13.0: integrated software for pathway/genome informatics and systems biology.

Karp, Peter D; Paley, Suzanne M; Krummenacker, Markus; Latendresse, Mario; Dale, Joseph M; Lee, Thomas J; Kaipa, Pallavi; Gilham, Fred; Spaulding, Aaron; Popescu, Liviu; Altman, Tomer; Paulsen, Ian; Keseler, Ingrid M; Caspi, Ron.

Brief Bioinform ; 11(1): 40-79, 2010 Jan.

Article in English | MEDLINE | ID: mdl-19955237

ABSTRACT

Pathway Tools is a production-quality software environment for creating a type of model-organism database called a Pathway/Genome Database (PGDB). A PGDB such as EcoCyc integrates the evolving understanding of the genes, proteins, metabolic network and regulatory network of an organism. This article provides an overview of Pathway Tools capabilities. The software performs multiple computational inferences including prediction of metabolic pathways, prediction of metabolic pathway hole fillers and prediction of operons. It enables interactive editing of PGDBs by DB curators. It supports web publishing of PGDBs, and provides a large number of query and visualization tools. The software also supports comparative analyses of PGDBs, and provides several systems biology analyses of PGDBs including reachability analysis of metabolic networks, and interactive tracing of metabolites through a metabolic network. More than 800 PGDBs have been created using Pathway Tools by scientists around the world, many of which are curated DBs for important model organisms. Those PGDBs can be exchanged using a peer-to-peer DB sharing system called the PGDB Registry.
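The abstract mentions reachability analysis of metabolic networks without giving its definition. One common formulation, sketched here as an assumption rather than as the Pathway Tools implementation, is forward propagation: starting from a set of seed metabolites, a reaction can fire once all of its substrates are available, and its products then become available too. The reaction list and metabolite names are hypothetical.

```python
def reachable_metabolites(reactions, seeds):
    """Forward-propagate availability through a reaction list.

    reactions: iterable of (substrates, products) tuples of metabolite names.
    seeds: metabolites assumed to be available at the start.
    """
    reached = set(seeds)
    changed = True
    while changed:  # repeat until no reaction adds a new metabolite
        changed = False
        for substrates, products in reactions:
            if set(substrates) <= reached and not set(products) <= reached:
                reached |= set(products)
                changed = True
    return reached


# Toy network with made-up metabolites.
toy_reactions = [
    (("A", "B"), ("C",)),  # A + B -> C
    (("C",), ("D",)),      # C -> D
    (("E",), ("F",)),      # E -> F, never fires without E
]
producible = reachable_metabolites(toy_reactions, {"A", "B"})
```

Metabolites that never enter the reached set point to gaps: either a missing nutrient in the seed set or a missing reaction in the network.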

Subjects

Computational Biology; Genome; Software; Systems Biology; Internet

The MetaCyc database of metabolic pathways and enzymes and the BioCyc collection of pathway/genome databases.

Caspi, Ron; Altman, Tomer; Dale, Joseph M; Dreher, Kate; Fulcher, Carol A; Gilham, Fred; Kaipa, Pallavi; Karthikeyan, Athikkattuvalasu S; Kothari, Anamika; Krummenacker, Markus; Latendresse, Mario; Mueller, Lukas A; Paley, Suzanne; Popescu, Liviu; Pujar, Anuradha; Shearer, Alexander G; Zhang, Peifen; Karp, Peter D.

Nucleic Acids Res ; 38(Database issue): D473-9, 2010 Jan.

Article in English | MEDLINE | ID: mdl-19850718

ABSTRACT

The MetaCyc database (MetaCyc.org) is a comprehensive and freely accessible resource for metabolic pathways and enzymes from all domains of life. The pathways in MetaCyc are experimentally determined, small-molecule metabolic pathways and are curated from the primary scientific literature. With more than 1400 pathways, MetaCyc is the largest collection of metabolic pathways currently available. Pathway reactions are linked to one or more well-characterized enzymes, and both pathways and enzymes are annotated with reviews, evidence codes, and literature citations. BioCyc (BioCyc.org) is a collection of more than 500 organism-specific Pathway/Genome Databases (PGDBs). Each BioCyc PGDB contains the full genome and predicted metabolic network of one organism. The network, which is predicted by the Pathway Tools software using MetaCyc as a reference, consists of metabolites, enzymes, reactions and metabolic pathways. BioCyc PGDBs also contain additional features, such as predicted operons, transport systems, and pathway hole-fillers. The BioCyc Web site offers several tools for the analysis of the PGDBs, including Omics Viewers that enable visualization of omics datasets on two different genome-scale diagrams and tools for comparative analysis. The BioCyc PGDBs generated by SRI are offered for adoption by any party interested in curation of metabolic, regulatory, and genome-related information about an organism.

Subjects

Computational Biology/methods; Databases, Genetic; Databases, Nucleic Acid; Animals; Computational Biology/trends; Databases, Protein; Genome, Archaeal; Genome, Bacterial; Genome, Plant; Genome, Viral; Humans; Information Storage and Retrieval/methods; Internet; Models, Biological; Protein Structure, Tertiary; Software

Machine learning methods for metabolic pathway prediction.

Dale, Joseph M; Popescu, Liviu; Karp, Peter D.

BMC Bioinformatics ; 11: 15, 2010 Jan 08.

Article in English | MEDLINE | ID: mdl-20064214

ABSTRACT

BACKGROUND: A key challenge in systems biology is the reconstruction of an organism's metabolic network from its genome sequence. One strategy for addressing this problem is to predict which metabolic pathways, from a reference database of known pathways, are present in the organism, based on the annotated genome of the organism. RESULTS: To quantitatively validate methods for pathway prediction, we developed a large "gold standard" dataset of 5,610 pathway instances known to be present or absent in curated metabolic pathway databases for six organisms. We defined a collection of 123 pathway features, whose information content we evaluated with respect to the gold standard. Feature data were used as input to an extensive collection of machine learning (ML) methods, including naïve Bayes, decision trees, and logistic regression, together with feature selection and ensemble methods. We compared the ML methods to the previous PathoLogic algorithm for pathway prediction using the gold standard dataset. We found that ML-based prediction methods can match the performance of the PathoLogic algorithm. PathoLogic achieved an accuracy of 91% and an F-measure of 0.786. The ML-based prediction methods achieved accuracy as high as 91.2% and F-measure as high as 0.787. The ML-based methods output a probability for each predicted pathway, whereas PathoLogic does not, which provides more information to the user and facilitates filtering of predicted pathways. CONCLUSIONS: ML methods for pathway prediction perform as well as existing methods, and have qualitative advantages in terms of extensibility, tunability, and explainability. More advanced prediction methods and/or more sophisticated input features may improve the performance of ML methods. However, pathway prediction performance appears to be limited largely by the ability to correctly match enzymes to the reactions they catalyze based on genome annotations.
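The abstract reports both accuracy and F-measure for each method. As a small illustration of how the two metrics are computed and why they can differ, the sketch below derives both from confusion-matrix counts. The counts are invented for the example and are not taken from the paper; the function name is hypothetical.

```python
def accuracy_and_f_measure(tp, fp, tn, fn):
    """Compute accuracy and F-measure (F1) from confusion-matrix counts.

    tp/fp: pathways predicted present that are truly present/absent.
    tn/fn: pathways predicted absent that are truly absent/present.
    """
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return accuracy, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return accuracy, f_measure


# Invented counts: most pathway instances are absent, so true negatives dominate.
acc, f1 = accuracy_and_f_measure(tp=40, fp=10, tn=940, fn=10)
```

Accuracy counts true negatives while the F-measure ignores them, so on a dataset where one class is much larger, accuracy can look high even when precision and recall on the smaller class are more modest.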

Subjects

Artificial Intelligence; Metabolic Networks and Pathways; Computational Biology/methods; Databases, Factual; Genome; Software

Automation of gene assignments to metabolic pathways using high-throughput expression data.

Popescu, Liviu; Yona, Golan.

BMC Bioinformatics ; 6: 217, 2005 Aug 31.

Article in English | MEDLINE | ID: mdl-16135255

ABSTRACT

BACKGROUND: Accurate assignment of genes to pathways is essential in order to understand the functional role of genes and to map the existing pathways in a given genome. Existing algorithms predict pathways by extrapolating experimental data in one organism to other organisms for which this data is not available. However, current systems classify all genes that belong to a specific EC family to all the pathways that contain the corresponding enzymatic reaction, and thus introduce ambiguity. RESULTS: Here we describe an algorithm for assignment of genes to cellular pathways that addresses this problem by selectively assigning specific genes to pathways. Our algorithm uses the set of experimentally elucidated metabolic pathways from MetaCyc, together with statistical models of enzyme families and expression data to assign genes to enzyme families and pathways by optimizing correlated co-expression, while minimizing conflicts due to shared assignments among pathways. Our algorithm also identifies alternative ("backup") genes and addresses the multi-domain nature of proteins. We apply our model to assign genes to pathways in the Yeast genome and compare the results for genes that were assigned experimentally. Our assignments are consistent with the experimentally verified assignments and reflect characteristic properties of cellular pathways. CONCLUSION: We present an algorithm for automatic assignment of genes to metabolic pathways. The algorithm utilizes expression data and reduces the ambiguity that characterizes assignments that are based only on EC numbers.
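The core intuition of choosing among several candidate genes by co-expression can be sketched as follows. This is a greedy simplification under stated assumptions, not the authors' algorithm: it ignores the statistical models of enzyme families and the resolution of conflicts between pathways that the abstract describes. Gene names and expression values are hypothetical.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat profile carries no co-expression signal
    return cov / (var_x * var_y) ** 0.5


def pick_gene_for_pathway(candidates, pathway_genes, expression):
    """Choose the candidate with the highest mean correlation to the genes
    already assigned to the pathway."""
    def score(gene):
        return mean(pearson(expression[gene], expression[p]) for p in pathway_genes)
    return max(candidates, key=score)


# Made-up expression profiles over four conditions.
expression = {
    "candidate1": [1.0, 2.0, 3.0, 4.0],
    "candidate2": [4.0, 3.0, 2.0, 1.0],
    "member1": [2.0, 4.0, 6.0, 8.0],
    "member2": [1.0, 3.0, 5.0, 8.0],
}
chosen = pick_gene_for_pathway(
    ["candidate1", "candidate2"], ["member1", "member2"], expression
)
```

Here both candidates are assumed to belong to the same enzyme family; the one whose profile rises and falls with the existing pathway members is selected, and the other would remain available as a possible "backup" gene.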

Subjects

Algorithms; Gene Expression Profiling/methods; Models, Biological; Oligonucleotide Array Sequence Analysis/methods; Asparagine/biosynthesis; Electronic Data Processing; Folic Acid/biosynthesis; Saccharomyces cerevisiae/genetics; Saccharomyces cerevisiae/metabolism

Expectation-maximization algorithms for fuzzy assignment of genes to cellular pathways.

Popescu, Liviu; Yona, Golan.

Comput Syst Bioinformatics Conf ; : 281-91, 2006.

Article in English | MEDLINE | ID: mdl-17369646

ABSTRACT

Cellular pathways are composed of multiple reactions and interactions mediated by genes. Many of these reactions are common to multiple pathways, and each reaction might be potentially mediated by multiple genes in the same genome. Existing pathway reconstruction procedures assign a gene to all pathways in which it might catalyze a reaction, leading to a many-to-many mapping of genes to pathways. However, it is unlikely that all genes that are capable of mediating a certain reaction are involved in all the pathways that contain it. Rather, it is more likely that each gene is optimized to function in specific pathway(s). Hence, existing procedures for pathway construction produce assignments that are ambiguous. Here we present a probabilistic algorithm for the assignment of genes to pathways that addresses this problem and reduces this ambiguity. Our algorithm uses expression data, database annotations and similarity data to infer the most likely assignments, and estimate the affinity of each gene with the known cellular pathways. We apply the algorithm to metabolic pathways in Yeast and compare the results to assignments that were experimentally verified.
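The abstract does not specify the probabilistic model, so the sketch below uses a generic stand-in: expectation-maximization for a spherical Gaussian mixture in which each gene may only be assigned to its candidate pathways. It illustrates fuzzy (soft) assignment from expression data alone and is not the authors' method; the fixed `sigma`, the function name and the toy data are assumptions.

```python
import math


def em_soft_assign(profiles, candidates, n_iter=20, sigma=1.0):
    """Soft-assign genes to pathways with a constrained Gaussian-mixture EM.

    profiles: gene -> expression vector (list of floats).
    candidates: gene -> list of pathways the gene could belong to.
    Returns gene -> {pathway: responsibility}, summing to 1 per gene.
    """
    pathways = sorted({p for ps in candidates.values() for p in ps})
    dim = len(next(iter(profiles.values())))
    # Start with uniform responsibilities over each gene's candidates.
    resp = {g: {p: 1.0 / len(ps) for p in ps} for g, ps in candidates.items()}
    for _ in range(n_iter):
        # M-step: update each pathway's prior and mean expression profile.
        prior, centre = {}, {}
        for p in pathways:
            weights = {g: r[p] for g, r in resp.items() if p in r}
            total = sum(weights.values()) + 1e-12  # guard against division by zero
            prior[p] = total / len(resp)
            centre[p] = [
                sum(w * profiles[g][d] for g, w in weights.items()) / total
                for d in range(dim)
            ]
        # E-step: recompute responsibilities, restricted to candidate pathways.
        for g, ps in candidates.items():
            log_w = {}
            for p in ps:
                dist2 = sum((a - b) ** 2 for a, b in zip(profiles[g], centre[p]))
                log_w[p] = math.log(prior[p]) - dist2 / (2 * sigma ** 2)
            top = max(log_w.values())  # subtract the max for numerical stability
            norm = sum(math.exp(v - top) for v in log_w.values())
            resp[g] = {p: math.exp(log_w[p] - top) / norm for p in ps}
    return resp


# Toy data: g5 could belong to either pathway but is expressed like P1 genes.
toy_profiles = {
    "g1": [0.0, 0.0], "g2": [0.1, 0.0],
    "g3": [5.0, 5.0], "g4": [5.1, 5.0],
    "g5": [0.2, 0.1],
}
toy_candidates = {
    "g1": ["P1"], "g2": ["P1"],
    "g3": ["P2"], "g4": ["P2"],
    "g5": ["P1", "P2"],
}
affinities = em_soft_assign(toy_profiles, toy_candidates)
```

The returned responsibilities play the role of the per-gene pathway affinities mentioned in the abstract: an ambiguous gene keeps a probability for each candidate pathway instead of being assigned to all of them equally.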

Subjects

Computational Biology/methods; Gene Expression Profiling; Genomics/methods; Algorithms; Catalysis; Cluster Analysis; Databases, Genetic; Fuzzy Logic; Metabolic Networks and Pathways; Methionine/biosynthesis; Models, Biological; Models, Genetic; Probability; Protein Interaction Mapping/methods; Serine/chemistry
