Results 1 - 13 of 13
1.
Med Biol Eng Comput ; 61(1): 243-258, 2023 Jan.
Article in English | MEDLINE | ID: mdl-36357628

ABSTRACT

This study explores machine learning-based assessment of predisposition to colorectal cancer based on single nucleotide polymorphisms (SNPs). Such a computational approach may be used as a risk indicator and an auxiliary diagnosis method that complements traditional methods such as biopsy and CT scan. Moreover, it may be used to develop a low-cost screening test for the early detection of colorectal cancers to improve public health. We employ several supervised classification algorithms and apply data imputation to fill in missing genotype values. The dataset includes SNPs observed in particular colorectal cancer-associated genomic loci located within DNA regions of 11 selected genes, obtained from 115 individuals. We make the following observations: (i) A random forest-based classifier using one-hot encoding and K-nearest neighbor (KNN)-based imputation performs best among the studied classifiers, with an F1 score of 89% and an area under the curve (AUC) score of 0.96. (ii) One-hot encoding together with KNN-based data imputation increases the F1 scores by around 26% in comparison to the baseline approach, which employs neither. (iii) The proposed model outperforms a commonly employed state-of-the-art approach, ColonFlag, under all evaluated settings by up to 24% in terms of the AUC score. Based on the high accuracy of the constructed predictive models, the studied 11 genes may be considered a gene panel candidate for colon cancer risk screening.
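The preprocessing named in this abstract (KNN-based imputation followed by one-hot encoding) can be sketched roughly as below. This is only an illustration and not the authors' pipeline: the genotype codes in `GENOTYPES`, the Hamming distance, the majority vote, and the function names are all assumptions, and the random forest classifier itself is omitted.

```python
from collections import Counter

# Hypothetical genotype codes for one biallelic SNP; None marks a missing call.
GENOTYPES = ("AA", "AG", "GG")

def hamming(a, b):
    """Count mismatches over loci that are observed in both samples."""
    return sum(1 for x, y in zip(a, b)
               if x is not None and y is not None and x != y)

def knn_impute(samples, k=3):
    """Fill each missing genotype with the majority vote of the k nearest samples."""
    imputed = [list(s) for s in samples]
    for i, sample in enumerate(samples):
        for j, value in enumerate(sample):
            if value is not None:
                continue
            # Donors must have an observed genotype at locus j.
            donors = [s for n, s in enumerate(samples)
                      if n != i and s[j] is not None]
            donors.sort(key=lambda s: hamming(sample, s))
            votes = Counter(s[j] for s in donors[:k])
            if votes:
                imputed[i][j] = votes.most_common(1)[0][0]
    return imputed

def one_hot(sample):
    """Expand each genotype into one 0/1 indicator per possible code."""
    return [int(g == code) for g in sample for code in GENOTYPES]

samples = [
    ["AA", "AG", None],
    ["AA", "AG", "GG"],
    ["AA", "AG", "GG"],
    ["GG", "GG", "AA"],
]
filled = knn_impute(samples, k=2)
features = [one_hot(s) for s in filled]
```

Each row of `features` has three indicators per locus, so a classifier treats the genotypes as unordered categories rather than as points on a numeric scale.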


Subjects
Algorithms , Colonic Neoplasms , Humans , Genotype , Phenotype , Supervised Machine Learning
2.
IEEE/ACM Trans Comput Biol Bioinform ; 18(3): 1014-1025, 2021.
Article in English | MEDLINE | ID: mdl-32750887

ABSTRACT

The metabolic wiring of patient cells is altered drastically in many diseases, including cancer. Understanding the nature of such changes may pave the way for new therapeutic opportunities as well as the development of personalized treatment strategies for patients. In this paper, we propose an algorithm called Metabolitics, which allows systems-level analysis of changes in the biochemical network of cells in disease states. It enables the study of a disease at both reaction- and pathway-level granularities for a detailed and summarized view of disease etiology. Metabolitics employs flux variability analysis with a dynamically built objective function based on biofluid metabolomics measurements in a personalized manner. Moreover, Metabolitics builds supervised classification models to discriminate between patients and healthy subjects based on the computed metabolic network changes. The use of Metabolitics is demonstrated for three distinct diseases, namely, breast cancer, Crohn's disease, and colorectal cancer. Our results show that the constructed supervised learning models successfully differentiate patients from healthy individuals with an average F1 score of 88 percent. In addition to confirming previously reported breast cancer-associated pathways, we discovered that Biotin Metabolism, along with Arginine and Proline Metabolism, is subject to a significant increase in flux capacity, which has not been reported before.


Subjects
Metabolic Networks and Pathways/genetics , Metabolomics/methods , Neoplasms/metabolism , Precision Medicine/methods , Algorithms , Humans , Neoplasms/diagnosis , Neoplasms/genetics , Supervised Machine Learning , Systems Biology
3.
J Bioinform Comput Biol ; 18(5): 2050026, 2020 10.
Article in English | MEDLINE | ID: mdl-33125294

ABSTRACT

Accurately identifying organisms based on their partially available genetic material is an important task for exploring the phylogenetic diversity in an environment. Specific fragments in the DNA sequence of a living organism have been defined as DNA barcodes and can be used as markers to identify species efficiently and effectively. The existing DNA barcode-based classification approaches suffer from three major issues: (i) most of them assume that the classification is done within a given taxonomic class and/or that input sequences are pre-aligned, (ii) high-performing classifiers, such as SVM, cannot scale to large taxonomies due to high memory requirements, and (iii) mutations and noise in input DNA sequences greatly reduce the taxonomic classification score. To address these issues, we propose a multi-level hierarchical classifier framework to automatically assign taxonomy labels to DNA sequences. We utilize an alignment-free approach called the spectrum kernel method for feature extraction. We build a proof-of-concept hierarchical classifier with two levels and evaluate it on real DNA sequence data from the Barcode of Life Data Systems. We demonstrate that the proposed framework provides a higher F1 score than regular classifiers. In addition, the hierarchical framework scales better to large datasets, enabling researchers to employ classifiers with high classification performance and high memory requirements on large datasets. Furthermore, we show that the proposed framework is more robust to mutations and noise in sequence data than the non-hierarchical classifiers.
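The spectrum kernel mentioned in this abstract compares two sequences through their k-mer counts, which is why no alignment is needed. A minimal sketch follows, assuming an exact-match k-mer spectrum; the function names and the default `k=3` are illustrative choices, not details taken from the paper.

```python
from collections import Counter

def spectrum(seq, k=3):
    """Count every overlapping k-mer in a DNA sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def spectrum_kernel(a, b, k=3):
    """Inner product of the k-mer count vectors of two sequences."""
    sa, sb = spectrum(a, k), spectrum(b, k)
    # A Counter returns 0 for k-mers absent from the second sequence.
    return sum(count * sb[kmer] for kmer, count in sa.items())

similarity = spectrum_kernel("ACGTACG", "ACGT", k=3)
```

Sequences of different lengths can be compared directly because each one is reduced to a count vector over the same k-mer vocabulary.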


Subjects
Classification/methods , DNA Barcoding, Taxonomic/methods , Deep Learning , Algorithms , Animals , Birds/classification , Birds/genetics , Chiroptera/classification , Chiroptera/genetics , Databases, Genetic , Ferns/classification , Ferns/genetics , Fungi/classification , Fungi/genetics , Phylogeny , Rodentia/classification , Rodentia/genetics , Support Vector Machine
4.
Article in English | MEDLINE | ID: mdl-25267793

ABSTRACT

Metabolic networks have become one of the centers of attention in life sciences research with the advancements in the metabolomics field. A vast array of studies analyze metabolites and their interrelations to seek explanations for various biological questions, and numerous genome-scale metabolic networks have been assembled for this purpose. The increasing focus on this topic comes with the need for software systems that store, query, browse, analyze and visualize metabolic networks. PathCase Metabolomics Analysis Workbench (PathCaseMAW) is built, released and runs on a manually created generic mammalian metabolic network. The PathCaseMAW system provides a database-enabled framework and Web-based computational tools for browsing, querying, analyzing and visualizing stored metabolic networks. The PathCaseMAW editor, with its user-friendly interface, can be used to create a new metabolic network and/or update an existing one. A network can also be created from an existing genome-scale reconstructed network using the PathCaseMAW SBML parser. The metabolic network can be accessed through a Web interface or an iPad application. For metabolomics analysis, the steady-state metabolic network dynamics analysis (SMDA) algorithm is implemented and integrated with the system. The SMDA tool is accessible through both the Web-based interface and the iPad application for metabolomics analysis based on a metabolic profile. PathCaseMAW is a comprehensive system with various data input and data access subsystems. It is easy to work with by design, and is a promising tool for metabolomics research and for educational purposes. Database URL: http://nashua.case.edu/PathwaysMAW/Web.


Subjects
Databases, Genetic , Internet , Metabolic Networks and Pathways , Metabolomics/methods , User-Computer Interface , Software
5.
J Bioinform Comput Biol ; 10(1): 1240003, 2012 Feb.
Article in English | MEDLINE | ID: mdl-22809304

ABSTRACT

With the recent advances in experimental technologies, such as gas chromatography and mass spectrometry, the number of metabolites that can be measured in biofluids of individuals has markedly increased. Given a set of such measurements, a very common task encountered by biologists is to identify the metabolic mechanisms that lead to changes in the concentrations of given metabolites and to interpret the metabolic consequences of the observed changes in terms of physiological problems, nutritional deficiencies, or diseases. In this paper, we present the steady-state metabolic network dynamics analysis (SMDA) approach in detail, together with its application in a cystic fibrosis study. We also present a computational performance evaluation of the SMDA tool against a mammalian metabolic network database. The query output space of the SMDA tool is exponentially large in the number of reactions of the network. However, (i) larger numbers of observations exponentially reduce the output size, and (ii) exploratory search and browsing of the query output space are provided to help users find what they are looking for.


Subjects
Metabolic Networks and Pathways , Metabolomics/methods , Cystic Fibrosis/metabolism , Humans , Models, Biological
6.
BMC Syst Biol ; 6(1): 67, 2012 Jun 14.
Article in English | MEDLINE | ID: mdl-22697505

ABSTRACT

BACKGROUND: Integrating metabolic pathway resources and metabolic network models, and deploying new tools on the integrated platform, can help perform more effective and more efficient systems biology research on understanding the regulation of metabolic networks. Therefore, the tasks of (a) integrating regulatory metabolic networks and existing models under a single database environment, and (b) building tools to help with modeling and analysis are desirable and intellectually challenging computational tasks. RESULTS: PathCase Systems Biology (PathCase-SB) is built and released. This paper describes the PathCase-SB user interfaces developed to date. The current PathCase-SB system provides a database-enabled framework and web-based computational tools towards facilitating the development of kinetic models for biological systems. PathCase-SB aims to integrate systems biology model data and metabolic network data of selected biological data sources on the web (currently, BioModels Database and KEGG, respectively), and to provide more powerful and/or new capabilities via the new web-based integrative framework. CONCLUSIONS: Each of the current four PathCase-SB interfaces, namely, the Browser, Visualization, Querying, and Simulation interfaces, has expanded and new capabilities as compared with the original data sources. PathCase-SB is already available on the web and is being used by researchers across the globe.


Subjects
Databases, Factual , Software , Systems Biology/methods , User-Computer Interface , Computer Simulation , Glycolysis/physiology , Internet , Metabolic Networks and Pathways , Models, Biological
7.
BMC Syst Biol ; 5: 188, 2011 Nov 09.
Article in English | MEDLINE | ID: mdl-22070889

ABSTRACT

BACKGROUND: Integrating metabolic pathway resources and regulatory metabolic network models, and deploying new tools on the integrated platform, can help perform more effective and more efficient systems biology research on understanding the regulation in metabolic networks. Therefore, the tasks of (a) integrating regulatory metabolic networks and existing models under a single database environment, and (b) building tools to help with modeling and analysis are desirable and intellectually challenging computational tasks. DESCRIPTION: PathCase Systems Biology (PathCase-SB) is built and released. The PathCase-SB database provides data and an API for multiple user interfaces and software tools. The current PathCase-SB system provides a database-enabled framework and web-based computational tools towards facilitating the development of kinetic models for biological systems. PathCase-SB aims to integrate data of selected biological data sources on the web (currently, BioModels database and KEGG), and to provide more powerful and/or new capabilities via the new web-based integrative framework. This paper describes the architecture and database design issues encountered in PathCase-SB's design and implementation, and presents the current design of PathCase-SB's architecture and database. CONCLUSIONS: The PathCase-SB architecture and database provide a highly extensible and scalable environment with easy and fast (real-time) access to the data in the database. PathCase-SB itself is already being used by researchers across the world.


Subjects
Databases, Factual , Metabolic Networks and Pathways , Models, Biological , Glycolysis/physiology , Software , Systems Biology/methods
8.
J Bioinform Comput Biol ; 8(2): 247-93, 2010 Apr.
Article in English | MEDLINE | ID: mdl-20401946

ABSTRACT

Metabolism is a representation of the biochemical principles that govern the production, consumption, degradation, and biosynthesis of metabolites in living cells. Organisms respond to changes in their physiological conditions or environmental perturbations (i.e. constraints) via cooperative implementation of such principles. Querying the inner working principles of metabolism under different constraints provides invaluable insights for both researchers and educators. In this paper, we propose a metabolism query language (MQL) and discuss its query processing. MQL enables researchers to explore the behavior of the metabolism with a wide range of predicates, including dietary and physiological condition specifications. The query results of MQL are enriched with both textual and visual representations, and its query processing is completely tailored to the underlying metabolic principles.


Subjects
Metabolic Networks and Pathways , Metabolomics/statistics & numerical data , Algorithms , Computational Biology , Computer Graphics , Databases, Factual , Humans , Information Storage and Retrieval , Liver/metabolism , Models, Biological , Software , Systems Biology
10.
Bioinformatics ; 24(21): 2526-33, 2008 Nov 01.
Article in English | MEDLINE | ID: mdl-18728044

ABSTRACT

MOTIVATION: As the blueprints of cellular actions, biological pathways characterize the roles of genomic entities in various cellular mechanisms, and as such, their availability, manipulation and queriability over the web are important to facilitate ongoing biological research. RESULTS: In this article, we present the new features of PathCase, a system to store, query, visualize and analyze metabolic pathways at different levels of genetic, molecular, biochemical and organismal detail. The new features include: (i) a web-based system with a new architecture, containing server-side and client-side components, and promoting scalability and the flexible and easy adaptation of different pathway databases, (ii) an interactive client-side visualization tool for metabolic pathways, with powerful visualization capabilities and integrated gene and organism viewers, (iii) two distinct querying capabilities: an advanced querying interface for computer-savvy users, and built-in queries for ease of use that can be issued directly from pathway visualizations, and (iv) a pathway functionality analysis tool. PathCase is now available for three different datasets, namely, KEGG pathways data, sample pathways from the literature and BioCyc pathways for humans. AVAILABILITY: Available online at http://nashua.case.edu/pathways


Subjects
Databases, Factual , Metabolic Networks and Pathways , Software , Computer Simulation , User-Computer Interface
11.
BMC Bioinformatics ; 9: 143, 2008 Mar 06.
Article in English | MEDLINE | ID: mdl-18325104

ABSTRACT

BACKGROUND: Genes and gene products are frequently annotated with Gene Ontology concepts based on the evidence provided in genomics articles. Manually locating and curating information about a genomic entity from the biomedical literature requires vast amounts of human effort. Hence, there is clearly a need for automated computational tools to annotate the genes and gene products with Gene Ontology concepts by computationally capturing the related knowledge embedded in textual data. RESULTS: In this article, we present an automated genomic entity annotation system, GEANN, which extracts information about the characteristics of genes and gene products in article abstracts from PubMed, and translates the discovered knowledge into Gene Ontology (GO) concepts, a widely used standardized vocabulary of genomic traits. GEANN utilizes textual "extraction patterns" and a semantic matching framework to locate phrases matching a pattern and produce Gene Ontology annotations for genes and gene products. In our experiments, GEANN reached a precision of 78% at a recall of 61%. On a select set of Gene Ontology concepts, GEANN either outperforms or is comparable to two other automated annotation studies. Use of WordNet for semantic pattern matching improves the precision and recall by 24% and 15%, respectively, and the improvement due to semantic pattern matching becomes more apparent as the Gene Ontology terms become more general. CONCLUSION: GEANN is useful for two distinct purposes: (i) automating the annotation of genomic entities with Gene Ontology concepts, and (ii) providing existing annotations with additional "evidence articles" from the literature. The use of textual extraction patterns that are constructed based on the existing annotations achieves high precision. The semantic pattern matching framework provides a more flexible pattern matching scheme than "exact matching", with the advantage of locating approximate pattern occurrences with similar semantics. The relatively low recall performance of our pattern-based approach may be enhanced either by employing a probabilistic annotation framework based on the annotation neighbourhoods in textual data, or, alternatively, by adjusting the statistical enrichment threshold to lower values for applications that put more value on achieving higher recall.


Subjects
Genes/physiology , Information Storage and Retrieval/methods , Natural Language Processing , Periodicals as Topic , Proteins/classification , Proteins/physiology , PubMed , Artificial Intelligence , Database Management Systems , Pattern Recognition, Automated , Proteins/chemistry , Vocabulary, Controlled
12.
Pac Symp Biocomput ; : 221-32, 2007.
Article in English | MEDLINE | ID: mdl-17990494

ABSTRACT

Annotating genes with Gene Ontology (GO) terms is crucial for biologists to characterize the traits of genes in a standardized way. However, manual curation of textual data, the most reliable form of gene annotation by GO terms, requires significant amounts of human effort, is very costly, and cannot keep up with the rate of increase in biomedical publications. In this paper, we present GEANN, a system to automatically infer new GO annotations for genes from biomedical papers based on the evidence support linked to PubMed, a biological literature database of 14 million papers. GEANN (i) extracts from text significant terms and phrases associated with a GO term, (ii) based on the extracted terms, constructs textual extraction patterns with reliability scores for GO terms, (iii) expands the pattern set through "pattern crosswalks", (iv) employs semantic pattern matching, rather than syntactic pattern matching, which allows for the recognition of phrases with close meanings, and (v) annotates genes based on the "quality" of the matched pattern to the genomic entity occurring in the text. On average, in our experiments, GEANN reached a precision of 78% at a recall of 57%.


Subjects
Genetics/statistics & numerical data , PubMed , Computational Biology , Databases, Genetic , Pattern Recognition, Automated
13.
Bioinformatics ; 23(20): 2775-83, 2007 Oct 15.
Article in English | MEDLINE | ID: mdl-17766269

ABSTRACT

MOTIVATION: Biological pathways provide significant insights into the interaction mechanisms of molecules. Presently, many essential pathways remain unknown or incomplete for newly sequenced organisms. Moreover, experimental validation of enormous numbers of possible pathway candidates in a wet-lab environment is time- and effort-intensive. Thus, there is a need for comparative genomics tools that help scientists predict pathways in an organism's biological network. RESULTS: In this article, we propose a technique to discover unknown pathways in organisms. Our approach makes in-depth use of Gene Ontology (GO)-based functionalities of enzymes involved in metabolic pathways as follows: (i) Model each pathway as a biological functionality graph of enzyme GO functions, which we call a pathway functionality template. (ii) Locate frequent pathway functionality patterns so as to infer previously unknown pathways through pattern matching in metabolic networks of organisms. We have experimentally evaluated the accuracy of the presented technique for 30 bacterial organisms to predict around 1500 organism-specific versions of 50 reference pathways. Using a cross-validation strategy on known pathways, we have been able to infer pathways with 86% precision and 72% recall for enzymes (i.e. nodes). The accuracy of the predicted enzyme relationships has been measured at 85% precision with 64% recall. AVAILABILITY: Code upon request. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
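Node-level precision and recall of the kind reported in this abstract can be computed by comparing the set of predicted enzymes with the set of reference enzymes for a pathway. The sketch below is illustrative only: the enzyme labels are made up, the function names are hypothetical, and the F1 helper is an added convenience that the abstract does not report.

```python
def precision_recall(predicted, reference):
    """Precision and recall of a predicted node set against a reference set."""
    predicted, reference = set(predicted), set(reference)
    true_positives = len(predicted & reference)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall

def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0

# Made-up enzyme labels for one pathway.
predicted = {"E1", "E2", "E3", "E4"}
reference = {"E1", "E2", "E3", "E5", "E6"}
p, r = precision_recall(predicted, reference)
f1 = f1_score(p, r)
```

The same functions apply to predicted enzyme relationships if each edge is represented as a pair of enzyme labels.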


Subjects
Algorithms , Bacterial Physiological Phenomena , Bacterial Proteins/metabolism , Information Storage and Retrieval/methods , Models, Biological , Signal Transduction/physiology , Computer Simulation