Results 1 - 10 of 10
1.
Methods Mol Biol ; 1159: 253-67, 2014.
Article in English | MEDLINE | ID: mdl-24788271

ABSTRACT

Drug development remains a time-consuming and highly expensive process with high attrition rates at each stage. Given the safety hurdles drugs must pass due to increased regulatory scrutiny, it is essential for pharmaceutical companies to maximize their return on investment by effectively extending drug life cycles. There have been many effective techniques, such as phenotypic screening and compound profiling, which identify new indications for existing drugs, often referred to as drug repurposing or drug repositioning. This chapter explores the use of text mining leveraging several publicly available knowledge resources and mechanism of action representations to link existing drugs to new diseases from biomedical abstracts in an attempt to generate biologically meaningful alternative drug indications.


Subject(s)
Data Mining/methods; Databases, Bibliographic; Drug Repositioning; Animals; Humans
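The drug-to-disease linking described in this chapter can be illustrated with a minimal sketch, assuming drug-target and target-disease relations have already been mined from abstracts. The function `propose_indications` and all drug, target, and disease names are hypothetical placeholders; the sketch only joins relations through shared targets (a crude stand-in for a mechanism-of-action representation) and is not the chapter's method.

```python
from collections import defaultdict


def propose_indications(drug_targets, target_diseases, known_indications):
    """Propose drug -> disease links through shared targets (drug -> target -> disease).

    Returns {drug: [(disease, n_supporting_targets), ...]}, sorted by support,
    skipping diseases already listed as known indications for the drug.
    """
    support = defaultdict(lambda: defaultdict(set))
    for drug, targets in drug_targets.items():
        known = known_indications.get(drug, set())
        for target in targets:
            for disease in target_diseases.get(target, ()):
                if disease not in known:
                    support[drug][disease].add(target)
    return {
        drug: sorted(((disease, len(ts)) for disease, ts in diseases.items()),
                     key=lambda pair: (-pair[1], pair[0]))
        for drug, diseases in support.items()
    }


# Hypothetical toy relations standing in for text-mined facts.
drug_targets = {"drugA": {"T1", "T2"}, "drugB": {"T3"}}
target_diseases = {"T1": {"D1", "D2"}, "T2": {"D2"}, "T3": {"D1"}}
known_indications = {"drugA": {"D1"}}
print(propose_indications(drug_targets, target_diseases, known_indications))
```

Counting supporting targets gives a simple way to rank candidates; a real pipeline would also weigh the direction of each effect (inhibition versus activation).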
2.
Int J Data Min Bioinform ; 10(4): 357-73, 2014.
Article in English | MEDLINE | ID: mdl-25946883

ABSTRACT

Identifying drug target candidates is an important early task in the drug discovery process. This task is supported by new high-throughput technologies that enable a better understanding of disease mechanisms, which makes effective analysis of the resulting large volumes of biological data critical. However, because much biological knowledge is recorded in the literature as natural-language text, the analysis and interpretation of high-throughput data have not reached their full potential. In this paper, we describe our use of text mining to find scientific information for target and biomarker discovery in the biomedical literature. Our approach uses natural language processing techniques to capture linguistic patterns for extracting biological knowledge from text. Additionally, we discuss how the extracted knowledge is used to analyze biological data such as next-generation sequencing and gene expression data.


Subject(s)
Computational Biology/methods; Data Mining/methods; Drug Design; Drug Industry/trends; Gene Expression Profiling; Mutation; Genome, Human; Humans; Inflammation/drug therapy; Precision Medicine/methods; Reproducibility of Results; Software; Tissue Distribution
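A toy version of pattern-based relation extraction, assuming a small hypothetical set of trigger verbs (`inhibits`, `activates`, `binds to`, `phosphorylates`) and capitalized entity names. The `PATTERN` regex and `extract_relations` are illustrative only; the paper's actual linguistic patterns are not given in the abstract.

```python
import re

# Hypothetical pattern: capitalized agent, a trigger verb, capitalized target.
PATTERN = re.compile(
    r"\b(?P<agent>[A-Z][A-Za-z0-9-]*)\s+"
    r"(?P<relation>inhibits|activates|binds(?: to)?|phosphorylates)\s+"
    r"(?P<target>[A-Z][A-Za-z0-9-]*)"
)


def extract_relations(sentence):
    """Return (agent, relation, target) triples matched in one sentence."""
    return [(m["agent"], m["relation"], m["target"]) for m in PATTERN.finditer(sentence)]


print(extract_relations("Imatinib inhibits ABL1 in leukemia cells."))
print(extract_relations("TP53 binds to MDM2."))
```

Surface patterns like this are brittle (negation, coordination, and passive voice all escape them), which is why systems of this kind usually work over parsed sentences rather than raw strings.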
3.
PLoS One ; 7(7): e40946, 2012.
Article in English | MEDLINE | ID: mdl-22911721

ABSTRACT

BACKGROUND: With the large amount of pharmacological and biological knowledge available in the literature, finding novel indications for existing drugs using in silico approaches has become increasingly feasible. Typical literature-based approaches generate new hypotheses in the form of protein-protein interaction networks by linking concepts based on their co-occurrence within abstracts. However, such approaches tend to generate too many hypotheses, and identifying new drug indications from large networks can be time-consuming. METHODOLOGY: In this work, we developed a method that acquires the necessary facts from the literature and knowledge bases, and identifies new drug indications through automated reasoning. This is achieved by encoding the molecular effects caused by drug-target interactions, and their links to various diseases and drug mechanisms, as domain knowledge in AnsProlog, a declarative language suited to automated reasoning, including reasoning with incomplete information. Unlike other literature-based approaches, our approach is more fine-grained, especially in identifying indirect relationships for drug indications. CONCLUSION/SIGNIFICANCE: To evaluate the capability of our approach to infer novel drug indications, we applied our method to 943 drugs from DrugBank and asked whether any of these drugs have potential anti-cancer activity based solely on information about their targets and molecular interaction types. A total of 507 drugs were found to have potential for use in cancer treatment. These potential anti-cancer drugs included 67 of the 81 known cancer drugs (a recall of 82.7%). In addition, they included 144 of the 289 non-cancer drugs currently being tested in clinical trials for cancer treatment (a recall of 49.8%).
These results suggest that our method can infer drug indications (original or alternative) based solely on molecular targets and interactions, and that it has the potential to discover novel indications for existing drugs.


Subject(s)
Computational Biology/methods; Computer Simulation; Drug Discovery/methods; Artificial Intelligence; Databases, Factual; Genetic Association Studies; Humans; Protein Binding
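The kind of rule this abstract describes can be sketched in Python rather than AnsProlog, assuming each drug-target pair is labeled `inhibits` or `activates` and each target is labeled as promoting or suppressing cancer. `infer_anticancer`, the labels, and the target names are hypothetical, and unlike answer set programming this sketch performs no reasoning with incomplete information. The `recall` helper simply reproduces the reported ratios (67/81 and 144/289).

```python
# Hypothetical rule set, written in Python instead of AnsProlog:
# a drug is flagged when it inhibits a cancer-promoting target
# or activates a cancer-suppressing target.
FLAGGING_RULES = {("inhibits", "promotes"), ("activates", "suppresses")}


def infer_anticancer(drug_effects, target_roles):
    """drug_effects: {(drug, target): "inhibits" | "activates"}
    target_roles: {target: "promotes" | "suppresses"} (role in cancer)."""
    flagged = set()
    for (drug, target), effect in drug_effects.items():
        if (effect, target_roles.get(target)) in FLAGGING_RULES:
            flagged.add(drug)
    return flagged


def recall(retrieved_relevant, total_relevant):
    return retrieved_relevant / total_relevant


effects = {("d1", "T_onc"): "inhibits", ("d2", "T_sup"): "activates", ("d3", "T_onc"): "activates"}
roles = {"T_onc": "promotes", "T_sup": "suppresses"}
print(sorted(infer_anticancer(effects, roles)))  # ['d1', 'd2']
print(round(recall(67, 81), 3), round(recall(144, 289), 3))  # 0.827 0.498
```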
4.
J Biomed Inform ; 45(5): 842-50, 2012 Oct.
Article in English | MEDLINE | ID: mdl-22564364

ABSTRACT

MOTIVATION: Genetic factors determine differences in pharmacokinetics, drug efficacy, and drug responses between individuals and sub-populations. Wrong dosages of drugs can lead to severe adverse drug reactions in individuals whose drug metabolism drastically differs from the "assumed average". Databases such as PharmGKB are excellent sources of pharmacogenetic information on enzymes, genetic variants, and drug response affected by changes in enzymatic activity. Here, we seek to aid researchers, database curators, and clinicians in their search for relevant information by automatically extracting these data from the literature. APPROACH: We automatically populate a repository of information on genetic variants, relations to drugs, occurrence in sub-populations, and associations with disease. We mine textual data from PubMed abstracts to discover such genotype-phenotype associations, focusing on SNPs that can be associated with variations in drug response. The overall repository covers relations found between genes, variants, alleles, drugs, diseases, adverse drug reactions, populations, and allele frequencies. We cross-reference these data to EntrezGene, PharmGKB, PubChem, and others. RESULTS: Entity recognition achieves a precision of 90-92% for the major entity types (gene, drug, disease), and relation extraction achieves 76-84% for relations involving these types. Compared with PharmGKB, our repository, built from 180,000 PubMed abstracts, covers 93% of the gene-drug associations and 97% of the gene-variant mappings in PharmGKB. AVAILABILITY: http://bioai4core.fulton.asu.edu/snpshot.


Subject(s)
Data Mining/methods; Databases, Genetic; Disease/genetics; Pharmacogenetics/methods; Polymorphism, Single Nucleotide; Animals; Genetic Association Studies/methods; Humans; Knowledge Bases; Mice; PubMed; Rats
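A minimal sketch of locating dbSNP `rs` identifiers and co-occurring gene names in a sentence, assuming a hypothetical gene lexicon. Sentence-level co-occurrence is only a crude stand-in for the relation extraction whose precision the abstract reports, and `snp_gene_cooccurrences` is not the system's code.

```python
import re

RS_ID = re.compile(r"\brs\d+\b")  # dbSNP reference SNP identifiers, e.g. rs1057910


def snp_gene_cooccurrences(sentence, gene_lexicon):
    """Pair every lexicon gene found in the sentence with every rs identifier in it."""
    snps = RS_ID.findall(sentence)
    genes = [g for g in gene_lexicon if re.search(rf"\b{re.escape(g)}\b", sentence)]
    return [(gene, snp) for gene in genes for snp in snps]


lexicon = ["CYP2C9", "VKORC1"]  # hypothetical lexicon
print(snp_gene_cooccurrences("The CYP2C9 variant rs1057910 was associated with drug response.", lexicon))
```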
5.
Bioinformatics ; 26(18): i547-53, 2010 Sep 15.
Article in English | MEDLINE | ID: mdl-20823320

ABSTRACT

MOTIVATION: Identifying drug-drug interactions (DDIs) is a critical process in drug administration and drug development. Clinical support tools often provide comprehensive lists of DDIs, but they usually lack supporting scientific evidence, and different tools can return inconsistent results. In this article, we propose a novel approach that integrates text mining and automated reasoning to derive DDIs. By extracting various facts about drug metabolism, our approach finds not only the DDIs that are explicitly mentioned in text but also potential interactions that can be inferred by reasoning. RESULTS: Our approach found several potential DDIs that are not present in DrugBank. We manually evaluated these interactions based on their supporting evidence and determined that 81.3% of them are correct. This suggests that our approach can uncover potential DDIs along with scientific evidence explaining the mechanism of the interactions.


Subject(s)
Data Mining; Drug Interactions; Databases, Factual; Enzymes/metabolism; Feasibility Studies; Humans; Logic; Natural Language Processing; Pharmaceutical Preparations/administration & dosage; Pharmaceutical Preparations/metabolism
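The inference step can be sketched with one classic metabolic rule, assuming enzyme facts have already been extracted: if drug A inhibits (or induces) an enzyme that metabolizes drug B, A may raise (or lower) B's exposure. `infer_ddis` and the drug names are hypothetical; the published system's rule set is not given in the abstract.

```python
def infer_ddis(inhibits, induces, metabolized_by):
    """Each argument maps a drug to a set of enzymes.

    Returns (perpetrator, victim, enzyme, predicted_effect) tuples."""
    rules = ((inhibits, "increased exposure"), (induces, "decreased exposure"))
    results = set()
    for victim, victim_enzymes in metabolized_by.items():
        for relation, effect in rules:
            for perpetrator, enzymes in relation.items():
                if perpetrator == victim:
                    continue  # a drug does not interact with itself here
                for enzyme in victim_enzymes & enzymes:
                    results.add((perpetrator, victim, enzyme, effect))
    return results


# Hypothetical extracted facts.
inhibits = {"drugX": {"CYP3A4"}}
induces = {"drugZ": {"CYP3A4"}}
metabolized_by = {"drugY": {"CYP3A4", "CYP2D6"}}
for ddi in sorted(infer_ddis(inhibits, induces, metabolized_by)):
    print(ddi)
```

Each inferred tuple keeps the enzyme that links the two drugs, which is the kind of mechanistic evidence the abstract says plain DDI lists lack.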
6.
Article in English | MEDLINE | ID: mdl-20498514

ABSTRACT

Proteins and their interactions govern virtually all cellular processes, such as regulation, signaling, metabolism, and structure. Most experimental findings pertaining to such interactions are discussed in research papers, which, in turn, get curated by protein interaction databases. Authors, editors, and publishers benefit from efforts to alleviate the tasks of searching for relevant papers, evidence for physical interactions, and proper identifiers for each protein involved. The BioCreative II.5 community challenge addressed these tasks in a competition-style assessment to evaluate and compare different methodologies, to raise awareness of the increasing accuracy of automated methods, and to guide future implementations. In this paper, we present our approaches for protein named entity recognition, including normalization, and for extraction of protein-protein interactions from full text. Our overall goal is to identify efficient individual components, and we compare various compositions that handle a single full-text article in 10 seconds to 2 minutes. We propose strategies to transfer document-level annotations to the sentence level, which allows the creation of a more fine-grained training corpus; we use this corpus to automatically derive around 5,000 patterns. We rank sentences by relevance to the task of finding novel interactions with physical evidence, using a sentence classifier built from this training corpus. Heuristics for paraphrasing sentences further remove unnecessary information that might interfere with patterns, such as additional adjectives, clauses, or bracketed expressions. In BioCreative II.5, we achieved an F-score of 22 percent for finding protein interactions and 43 percent for mapping proteins to UniProt IDs; disregarding species, the F-scores are 30 percent and 55 percent, respectively. On average, our best-performing setup required around 2 minutes per full text.
All data and pattern sets, as well as Java classes that extend third-party software, are available as supplementary information (see Appendix).


Subject(s)
Computational Biology/methods; Data Mining/methods; Databases, Genetic; Protein Interaction Mapping/methods; Databases, Protein; Natural Language Processing; Periodicals as Topic; Societies, Scientific
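Two pieces of this pipeline can be sketched: a paraphrasing heuristic that drops bracketed expressions, and the F-score used for evaluation (taken here to be the balanced F1, the harmonic mean of precision and recall, which is an assumption). `strip_brackets` and `f_score` are hypothetical helpers, not the released Java classes.

```python
import re

PARENS = re.compile(r"\s*\([^()]*\)")   # innermost (...) with its leading whitespace
SQUARE = re.compile(r"\s*\[[^\[\]]*\]")  # innermost [...] with its leading whitespace


def strip_brackets(sentence):
    """Remove bracketed expressions, peeling nested brackets from the inside out."""
    previous = None
    while previous != sentence:
        previous = sentence
        sentence = SQUARE.sub("", PARENS.sub("", sentence))
    return sentence


def f_score(tp, fp, fn):
    """Balanced F-score from true positives, false positives, and false negatives."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


print(strip_brackets("RAF1 (also known as c-Raf) binds MEK1 [12]."))  # RAF1 binds MEK1.
print(f_score(1, 1, 1))  # 0.5
```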
7.
Pac Symp Biocomput ; : 465-76, 2010.
Article in English | MEDLINE | ID: mdl-19908398

ABSTRACT

Biological pathways are seen as critical to our understanding of the mechanisms of biological functions. Manual curation has been the most popular method for collecting information about pathways. However, pathway annotation is very time-consuming, as it requires expert curators to identify and collect information from different sources. Even with the biological facts and interactions collected from various sources, curators have to apply their biological knowledge to arrange the acquired interactions so that together they perform a common biological function as a pathway. In this paper, we propose a novel approach for automated pathway synthesis that acquires facts from hand-curated knowledge bases. To compensate for the incompleteness of the knowledge bases, our approach also obtains facts through automated extraction from Medline abstracts. An essential component of our approach is applying logical reasoning to the acquired facts based on biological knowledge about pathways. By representing such knowledge, the reasoning component can assign to the acquired facts and interactions the ordering that is necessary for pathway synthesis. We demonstrate the feasibility of our approach with the development of a system that synthesizes pharmacokinetic pathways. We evaluate our approach by reconstructing the existing pharmacokinetic pathways available in PharmGKB. Our results show that our approach is capable not only of synthesizing these pathways but also of uncovering information that is not available in the manually annotated pathways.


Subject(s)
Pharmacokinetics; Artificial Intelligence; Carbamates/pharmacokinetics; Computational Biology; Humans; Knowledge Bases; MEDLINE; Metabolic Networks and Pathways; Models, Biological; Piperidines/pharmacokinetics; Pravastatin/pharmacokinetics; Synthetic Biology
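Ordering acquired reactions into a pathway can be sketched as a topological sort, assuming each fact is a hypothetical `(substrate, enzyme, product)` triple and the pathway is acyclic. This uses Python's standard `graphlib` module (Python 3.9+) and only illustrates the ordering step, not the paper's logical reasoning component.

```python
from graphlib import TopologicalSorter


def order_reactions(reactions):
    """reactions: (substrate, enzyme, product) facts in arbitrary order.

    Returns the reactions sorted so each one follows the reaction producing its substrate."""
    graph = {}  # compound -> set of compounds it is produced from
    for substrate, _enzyme, product in reactions:
        graph.setdefault(product, set()).add(substrate)
        graph.setdefault(substrate, set())
    rank = {compound: i for i, compound in enumerate(TopologicalSorter(graph).static_order())}
    return sorted(reactions, key=lambda reaction: rank[reaction[0]])


# Hypothetical facts gathered out of order from different sources.
facts = [("M1", "E2", "M2"), ("prodrug", "E1", "M1"), ("M2", "E3", "M3")]
for step in order_reactions(facts):
    print(step)
```

`static_order()` raises `graphlib.CycleError` if the facts contain a cycle, so real metabolic data with reversible steps would need extra handling.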
8.
Pac Symp Biocomput ; : 87-98, 2009.
Article in English | MEDLINE | ID: mdl-19209697

ABSTRACT

Curated biological knowledge of interactions and pathways is largely available from various databases, and network synthesis is a popular method to gain insight into the data. However, such data from curated databases presents a single view of the knowledge to the biologists, and it may not be suitable to researchers' specific needs. On the other hand, Medline abstracts are publicly accessible and encode the necessary information to synthesize different kinds of biological networks. In this paper, we propose a new paradigm in synthesizing biomolecular networks by allowing biologists to create their own networks through queries to a specialized database of Medline abstracts. With this approach, users can specify precisely what kind of information they want in the resulting networks. We demonstrate the feasibility of our approach in the synthesis of gene-drug, gene-disease and protein-protein interaction networks. We show that our approach is capable of synthesizing these networks with high precision and even finds relations that have yet to be curated in public databases. In addition, we demonstrate a scenario of recovering a drug-related pathway using our approach.


Subject(s)
MEDLINE; Models, Biological; Biometry; Databases, Factual; Disease/genetics; Humans; Natural Language Processing; Pharmacogenetics/statistics & numerical data; Protein Interaction Mapping/statistics & numerical data
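Query-driven network synthesis can be sketched as filtering extracted relations by entity type and by the number of supporting abstracts, assuming relations are stored as hypothetical `(source, source_type, target, target_type, pmid)` tuples. `build_network` and `min_support` are illustrative names, not the paper's query interface.

```python
from collections import defaultdict


def build_network(relations, source_type, target_type, min_support=1):
    """Keep edges between the requested entity types that are supported
    by at least min_support distinct abstracts; returns {(src, tgt): n_abstracts}."""
    support = defaultdict(set)
    for src, s_type, tgt, t_type, pmid in relations:
        if s_type == source_type and t_type == target_type:
            support[(src, tgt)].add(pmid)
    return {edge: len(pmids) for edge, pmids in support.items() if len(pmids) >= min_support}


# Hypothetical extracted relations; "p1".."p4" stand in for PubMed IDs.
relations = [
    ("G1", "gene", "D1", "drug", "p1"),
    ("G1", "gene", "D1", "drug", "p2"),
    ("G2", "gene", "X", "disease", "p3"),
    ("G3", "gene", "D2", "drug", "p4"),
]
print(build_network(relations, "gene", "drug", min_support=2))  # {('G1', 'D1'): 2}
```

Raising `min_support` trades recall for precision, one simple way a user could tailor a network to their own needs.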
9.
J Biomed Inform ; 42(1): 74-81, 2009 Feb.
Article in English | MEDLINE | ID: mdl-18595779

ABSTRACT

We propose a novel semi-supervised clustering method called GO Fuzzy c-means, which enables the simultaneous use of biological knowledge and gene expression data in a probabilistic clustering algorithm. Our method is based on the fuzzy c-means clustering algorithm and uses Gene Ontology annotations as prior knowledge to guide the grouping of functionally related genes. Unlike traditional clustering methods, our method can assign genes to multiple clusters, which better represents the behavior of genes. Two yeast (Saccharomyces cerevisiae) expression-profile datasets were used to compare our method with other state-of-the-art clustering methods. Our experiments show that our method produces far more biologically meaningful clusters, even when only a small percentage of Gene Ontology annotations is used. In addition, our experiments indicate that the use of prior knowledge in our method can predict gene functions effectively. The source code is freely available at http://sysbio.fulton.asu.edu/gofuzzy/.


Subject(s)
Cluster Analysis; Fuzzy Logic; Gene Expression Profiling/methods; Genes/physiology; Software; Algorithms; Computational Biology; Databases, Genetic; Genes, Fungal/physiology; Internet; Normal Distribution; Oligonucleotide Array Sequence Analysis; Reproducibility of Results; Saccharomyces cerevisiae/genetics
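For reference, the plain fuzzy c-means iteration on which GO Fuzzy c-means builds can be sketched as follows. This sketch omits the Gene Ontology prior entirely; the fuzzifier m = 2, the random initialization, and the fixed iteration count are assumptions. Each pass recomputes the centers as membership-weighted means and the memberships as u_ij = 1 / sum_k (d_ij / d_ik)^(2/(m-1)), so each gene gets a graded membership in every cluster.

```python
import math
import random


def fuzzy_c_means(points, c, m=2.0, iterations=100, seed=0):
    """Plain fuzzy c-means (no Gene Ontology prior).

    points: list of equal-length numeric tuples; c: number of clusters;
    m > 1: fuzzifier. Returns (centers, memberships), where memberships[i][j]
    is the degree to which point i belongs to cluster j (each row sums to 1).
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships, each row normalized to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() for _ in range(c)]
        total = sum(row)
        u.append([x / total for x in row])
    centers = []
    for _ in range(iterations):
        # Centers: means weighted by membership ** m.
        centers = []
        for j in range(c):
            weights = [u[i][j] ** m for i in range(n)]
            wsum = sum(weights)
            centers.append(tuple(
                sum(w * p[d] for w, p in zip(weights, points)) / wsum for d in range(dim)
            ))
        # Memberships: u_ij = 1 / sum_k (d_ij / d_ik) ** (2 / (m - 1)).
        for i, p in enumerate(points):
            dists = [max(math.dist(p, center), 1e-12) for center in centers]
            for j in range(c):
                u[i][j] = 1.0 / sum((dists[j] / dk) ** (2.0 / (m - 1.0)) for dk in dists)
    return centers, u


# Two hypothetical, well-separated groups of 2-D expression profiles.
profiles = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
centers, memberships = fuzzy_c_means(profiles, c=2)
```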
10.
Pac Symp Biocomput ; : 28-39, 2007.
Article in English | MEDLINE | ID: mdl-17992743

ABSTRACT

MOTIVATION: The promise of post-genome-era disease-related discoveries and advances has yet to be fully realized, with many opportunities for discovery hidden in the millions of biomedical papers published since. Public databases give access to data extracted from the literature by teams of experts, but their coverage is often limited and lags behind recent discoveries. We present a computational method that combines data extracted from the literature with data from curated sources in order to uncover possible gene-disease relationships that are not directly stated or were missed by the initial mining. METHOD: An initial set of genes and proteins is obtained from gene-disease relationships extracted from PubMed abstracts using natural language processing. Interactions involving the corresponding proteins are similarly extracted and integrated with interactions from curated databases (such as BIND and DIP), and each interaction is assigned a confidence measure depending on its source. The augmented list of genes and gene products is then ranked by combining two scores: one that reflects the strength of the relationship with the initial set of genes and incorporates user-defined weights, and another that reflects the importance of the gene in maintaining the connectivity of the network. We applied the method to atherosclerosis to assess its effectiveness. RESULTS: Top-ranked proteins from the method are related to atherosclerosis with an accuracy of 0.85 to 1.00 for the top 20 and 0.64 to 0.80 for the top 90 if duplicates are ignored, with 45% of the top 20 and 75% of the top 90 derived by the method rather than extracted from text. Thus, although the initial gene set and interactions were automatically extracted from text (and are subject to the imprecision of automatic extraction), their use for further hypothesis generation is valuable given adequate computational analysis.


Subject(s)
Protein Interaction Mapping/statistics & numerical data; Atherosclerosis/etiology; Atherosclerosis/genetics; Computational Biology; Databases, Genetic; Genomics/statistics & numerical data; Humans; Natural Language Processing; Proteomics/statistics & numerical data; PubMed
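The two-score ranking can be sketched as below, assuming hypothetical per-source confidence weights, a user-defined weight `alpha`, and, as the connectivity score, the fraction of connected node pairs lost when a gene is removed. That connectivity measure is a stand-in chosen for illustration; the abstract does not specify the paper's measure, and all gene names are placeholders.

```python
from collections import defaultdict

# Hypothetical confidence per evidence source.
SOURCE_CONFIDENCE = {"curated": 1.0, "text": 0.6}


def build_graph(interactions):
    """interactions: (protein_a, protein_b, source) triples -> weighted undirected graph."""
    graph = defaultdict(dict)
    for a, b, source in interactions:
        weight = max(graph[a].get(b, 0.0), SOURCE_CONFIDENCE[source])  # keep strongest evidence
        graph[a][b] = graph[b][a] = weight
    return graph


def connected_pairs(graph, removed=None):
    """Count node pairs joined by some path, optionally ignoring one removed node."""
    seen, pairs = set(), 0
    for start in graph:
        if start == removed or start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:  # depth-first search over one connected component
            node = stack.pop()
            size += 1
            for neighbor in graph[node]:
                if neighbor != removed and neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        pairs += size * (size - 1) // 2
    return pairs


def rank_genes(interactions, seeds, alpha=0.5):
    """Rank non-seed genes by alpha * seed strength + (1 - alpha) * connectivity loss."""
    graph = build_graph(interactions)
    total = connected_pairs(graph) or 1
    scores = {}
    for gene in graph:
        if gene in seeds:
            continue
        strength = sum(w for nbr, w in graph[gene].items() if nbr in seeds) / len(seeds)
        loss = 1 - connected_pairs(graph, removed=gene) / total
        scores[gene] = alpha * strength + (1 - alpha) * loss
    return sorted(scores, key=lambda g: (-scores[g], g))


interactions = [("S1", "A", "curated"), ("S2", "A", "text"), ("A", "B", "text"), ("S1", "C", "text")]
print(rank_genes(interactions, seeds={"S1", "S2"}))  # ['A', 'C', 'B']
```

Gene A ranks first because it is strongly linked to both seeds and also holds the network together; B and C tie on connectivity, so their seed links decide the order.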