Results 1 - 7 of 7
1.
BMC Bioinformatics ; 21(1): 217, 2020 May 27.
Article in English | MEDLINE | ID: mdl-32460703

ABSTRACT

BACKGROUND: Enzymatic and chemical reactions are key for understanding biological processes in cells. Curated databases of chemical reactions exist, but these databases struggle to keep up with the exponential growth of the biomedical literature. Conventional text mining pipelines provide tools to automatically extract entities and relationships from the scientific literature, and partially replace expert curation, but such machine learning frameworks often require a large amount of labeled training data and thus lack scalability for both larger document corpora and new relationship types. RESULTS: We developed an application of Snorkel, a weakly supervised learning framework, for extracting chemical reaction relationships from biomedical literature abstracts. For this work, we defined a chemical reaction relationship as the transformation of chemical A to chemical B. We built and evaluated our system on small annotated sets of chemical reaction relationships from two corpora: curated bacteria-related abstracts from the MetaCyc database (MetaCyc_Corpus) and a more general set of abstracts annotated with MeSH (Medical Subject Headings) term Bacteria (Bacteria_Corpus; a superset of MetaCyc_Corpus). For the MetaCyc_Corpus, we obtained 84% precision and 41% recall (55% F1 score). Extending to the more general Bacteria_Corpus decreased precision to 62% with only a four-point drop in recall to 37% (46% F1 score). Overall, the Bacteria_Corpus contained two orders of magnitude more candidate chemical reaction relationships (nine million candidates vs 68,000 candidates) and had a larger class imbalance (2.5% positives vs 5% positives) as compared to the MetaCyc_Corpus. In total, we extracted 6871 chemical reaction relationships from nine million candidates in the Bacteria_Corpus. CONCLUSIONS: With this work, we built a database of chemical reaction relationships from almost 900,000 scientific abstracts without a large training set of labeled annotations. Further, we showed the generalizability of our initial application built on MetaCyc documents enriched with chemical reactions to a general set of articles related to bacteria.
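
The F1 scores quoted in the abstract can be checked from the reported precision and recall. The sketch below assumes the standard definition of F1 as the harmonic mean of precision and recall; the function name `f1_score` is illustrative and not taken from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract.
metacyc_f1 = f1_score(0.84, 0.41)
bacteria_f1 = f1_score(0.62, 0.37)

print(f"MetaCyc_Corpus F1: {metacyc_f1:.2f}")    # 0.55
print(f"Bacteria_Corpus F1: {bacteria_f1:.2f}")  # 0.46
```

Both values agree with the 55% and 46% figures stated in the abstract.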


Subject(s)
Data Mining/methods , Bacteria/metabolism , Biochemical Phenomena , Databases, Factual , Humans , Machine Learning , Publications , Software
2.
Bioinformatics ; 32(1): 106-13, 2016 Jan 01.
Article in English | MEDLINE | ID: mdl-26338771

ABSTRACT

MOTIVATION: A complete repository of gene-gene interactions is key for understanding cellular processes, human disease and drug response. These gene-gene interactions include both protein-protein interactions and transcription factor interactions. The majority of known interactions are found in the biomedical literature. Interaction databases, such as BioGRID and ChEA, annotate these gene-gene interactions; however, curation becomes difficult as the literature grows exponentially. DeepDive is a trained system for extracting information from a variety of sources, including text. In this work, we used DeepDive to extract both protein-protein and transcription factor interactions from over 100,000 full-text PLOS articles. METHODS: We built an extractor for gene-gene interactions that identified candidate gene-gene relations within an input sentence. For each candidate relation, DeepDive computed a probability that the relation was a correct interaction. We evaluated this system against the Database of Interacting Proteins and against randomly curated extractions. RESULTS: Our system achieved 76% precision and 49% recall in extracting direct and indirect interactions involving gene symbols co-occurring in a sentence. For randomly curated extractions, the system achieved between 62% and 83% precision based on direct or indirect interactions, as well as sentence-level and document-level precision. Overall, our system extracted 3356 unique gene pairs using 724 features from over 100,000 full-text articles. AVAILABILITY AND IMPLEMENTATION: Application source code is publicly available at https://github.com/edoughty/deepdive_genegene_app CONTACT: russ.altman@stanford.edu SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
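
The candidate-generation step described above (gene symbols co-occurring in a sentence) can be illustrated roughly as follows. This is a minimal sketch and not DeepDive's API or the authors' extractor: the function `candidate_gene_pairs`, the tokenizer, and the small gene lexicon are all hypothetical, and the probability-scoring step is not modeled.

```python
import re
from itertools import combinations


def candidate_gene_pairs(sentence: str, gene_symbols: set[str]) -> list[tuple[str, str]]:
    """Return every unordered pair of known gene symbols found in one sentence."""
    # Naive tokenizer (an assumption): runs of letters, digits, and hyphens.
    tokens = re.findall(r"[A-Za-z0-9-]+", sentence)
    # Deduplicate and sort so the output order is deterministic.
    mentioned = sorted({token for token in tokens if token in gene_symbols})
    return list(combinations(mentioned, 2))


# Hypothetical lexicon of gene symbols, for illustration only.
genes = {"TP53", "MDM2", "BRCA1"}
pairs = candidate_gene_pairs("MDM2 binds TP53 and promotes its degradation.", genes)
print(pairs)  # [('MDM2', 'TP53')]
```

In a full system each such candidate pair would then be scored, which is the role the abstract assigns to DeepDive.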


Subject(s)
Data Mining , Epistasis, Genetic , Information Storage and Retrieval , Publications , Software , Data Curation , Databases, Genetic , Humans
3.
Pac Symp Biocomput ; 23: 56-67, 2018.
Article in English | MEDLINE | ID: mdl-29218869

ABSTRACT

Bacteria in the human gut have the ability to activate, inactivate, and reactivate drugs with both intended and unintended effects. For example, the drug digoxin is reduced to the inactive metabolite dihydrodigoxin by the gut Actinobacterium E. lenta, and patients colonized with high levels of drug-metabolizing strains may have limited response to the drug. Understanding the complete space of drugs that are metabolized by the human gut microbiome is critical for predicting bacteria-drug relationships and their effects on individual patient response. Discovery and validation of drug metabolism via bacterial enzymes has yielded >50 drugs after nearly a century of experimental research. However, there are limited computational tools for screening drugs for potential metabolism by the gut microbiome. We developed a pipeline for comparing and characterizing chemical transformations using continuous vector representations of molecular structure learned using unsupervised representation learning. We applied this pipeline to chemical reaction data from MetaCyc to characterize the utility of vector representations for chemical reaction transformations. After clustering molecular and reaction vectors, we performed enrichment analyses and queries to characterize the space. We detected enriched enzyme names, Gene Ontology terms, and Enzyme Commission (EC) classes within reaction clusters. In addition, we queried reactions against drug-metabolite transformations known to be metabolized by the human gut microbiome. The top results for these known drug transformations contained similar substructure modifications to the original drug pair. This work enables high-throughput screening of drugs and their resulting metabolites against chemical reactions common to gut bacteria.


Subject(s)
Bacteria/metabolism , Gastrointestinal Microbiome/physiology , Pharmaceutical Preparations/metabolism , Biotransformation , Cluster Analysis , Computational Biology/methods , Databases, Pharmaceutical/statistics & numerical data , Drug Evaluation, Preclinical/statistics & numerical data , High-Throughput Screening Assays/statistics & numerical data , Humans , Pharmaceutical Preparations/chemistry , Quantitative Structure-Activity Relationship , Stochastic Processes
4.
Pac Symp Biocomput ; 23: 590-601, 2018.
Article in English | MEDLINE | ID: mdl-29218917

ABSTRACT

Obtaining relevant information about gene interactions is critical for understanding disease processes and treatment. With the rise in text mining approaches, the volume of such biomedical data is rapidly increasing, thereby creating a new problem for the users of this data: information overload. A tool for efficient querying and visualization of biomedical data that helps researchers understand the underlying biological mechanisms for diseases and drug responses, and ultimately helps patients, is sorely needed. To this end we have developed GeneDive, a web-based information retrieval, filtering, and visualization tool for large volumes of gene interaction data. GeneDive offers various features and modalities that guide the user through the search process to efficiently reach the information of their interest. GeneDive currently processes over three million gene-gene interactions with response times within a few seconds. For over half of the curated gene sets sourced from four prominent databases, more than 80% of the gene set members are recovered by GeneDive. In the near future, GeneDive will seamlessly accommodate other interaction types, such as gene-drug and gene-disease interactions, thus enabling full exploration of topics such as precision medicine. The GeneDive application and information about its underlying system architecture are available at http://www.genedive.net.


Subject(s)
Epistasis, Genetic , Precision Medicine/statistics & numerical data , Software , Computational Biology/methods , Computer Graphics/statistics & numerical data , Data Mining/statistics & numerical data , Databases, Genetic/statistics & numerical data , Gene Regulatory Networks , Humans , Information Storage and Retrieval/statistics & numerical data , Internet , User-Computer Interface
5.
Neoplasia ; 19(2): 65-74, 2017 02.
Article in English | MEDLINE | ID: mdl-28038319

ABSTRACT

The semaphorins and the plexins are a family of large, cysteine-rich proteins originally identified as regulators of axon growth and lymphocyte activation that are now known to provide motility and positional information for a number of cell and tissue types. For example, our group and others have shown that some malignancies overexpress Semaphorin 4D (S4D), which acts through its receptor Plexin-B1 (PB1) on endothelial cells to attract blood vessels from the surrounding stroma for the purpose of supporting tumor growth. While plexins are the known functional receptors for the semaphorins, there is evidence that transmembrane semaphorins may transmit a signal themselves through their short cytoplasmic tail, a phenomenon known as 'reverse signaling.' We used computational methods based upon correlated evolution of sequences of interacting proteins, mutational analysis, and in vitro and in vivo measurements of tumor aggressiveness to show that when bound to PB1, transmembrane S4D associates with the Rac GTPase exchange factor T lymphoma invasion and metastasis (Tiam) 1, which activates Rac and promotes proliferation, invasion and metastasis in oral squamous cell carcinoma (OSCC) cells. These results suggest that not only can S4D production by tumor cells affect the microenvironment, but engagement of this semaphorin at the cell surface activates a reverse signaling mechanism that influences tumor aggressiveness in OSCC.


Subject(s)
Antigens, CD/metabolism , Carcinoma, Squamous Cell/metabolism , Carcinoma, Squamous Cell/pathology , Guanine Nucleotide Exchange Factors/metabolism , Mouth Neoplasms/metabolism , Mouth Neoplasms/pathology , Semaphorins/metabolism , rac GTP-Binding Proteins/metabolism , Animals , Antigens, CD/chemistry , Biopsy , Carcinoma, Squamous Cell/mortality , Cell Line, Tumor , Cell Movement , Cell Proliferation , DNA-Binding Proteins , Disease Models, Animal , Gene Expression , Guanine Nucleotide Exchange Factors/chemistry , Humans , Mice , Mouth Neoplasms/mortality , Neoplasm Metastasis , Nuclear Proteins/metabolism , PDZ Domains , Prognosis , Protein Binding , Protein Interaction Domains and Motifs , Proteomics/methods , Semaphorins/chemistry , T-Lymphoma Invasion and Metastasis-inducing Protein 1 , Transcription Factors/metabolism
7.
Article in English | MEDLINE | ID: mdl-26306253

ABSTRACT

Individuals who suffer from schizophrenia comprise 1 percent of the United States population and are four times more likely to die of suicide than the general US population. Identification of at-risk individuals with schizophrenia is challenging when they do not seek treatment. Microblogging platforms allow users to share their thoughts and emotions with the world in short snippets of text. In this work, we leveraged the large corpus of Twitter posts and machine-learning methodologies to detect individuals with schizophrenia. Using features from tweets such as emoticon use, posting time of day, and dictionary terms, we trained, built, and validated several machine learning models. Our support vector machine model achieved the best performance with 92% precision and 71% recall on the held-out test set. Additionally, we built a web application that dynamically displays summary statistics between cohorts. This enables outreach to undiagnosed individuals, improved physician diagnoses, and destigmatization of schizophrenia.
