Results 1 - 4 of 4
1.
BMC Med Inform Decis Mak ; 18(Suppl 5): 119, 2018 12 07.
Article in English | MEDLINE | ID: mdl-30526566

ABSTRACT

BACKGROUND: The Gene Ontology (GO) is a resource that supplies information about gene product function, using ontologies to represent biological knowledge. These ontologies cover three domains: Cellular Component (CC), Molecular Function (MF), and Biological Process (BP). GO annotation is the process of assigning functional information, expressed as GO terms, to relevant genes in the literature. It is a common task among the Model Organism Database (MOD) groups. Manual GO annotation relies on human curators who read the biomedical literature and assign GO terms to genes. This process is very time-consuming and labor-intensive. As a result, many MODs can afford to curate only a fraction of relevant articles. METHODS: GO terms from the CC domain can essentially be divided into two sub-hierarchies: subcellular location terms and protein complex terms. We cast the task of gene annotation using GO terms from the CC domain as relation extraction between genes and other entities: (1) extract cases where a protein is found to be in a subcellular location, and (2) extract cases where a protein is a subunit of a protein complex. For each relation extraction task, we use an approach based on triggers and syntactic dependencies to extract the desired relations among entities. RESULTS: We tested our approach on the BC4GO test set, a publicly available corpus for GO annotation. Our approach obtains an F1-score of 71%, a precision of 91%, and a recall of 58% for predicting GO terms from the CC domain for given genes. CONCLUSIONS: We have described a novel approach that treats gene annotation with GO terms from the CC domain as two relation extraction subtasks. Evaluation results show that our approach achieves an F1-score of 71% for predicting GO terms for given genes. Our approach can therefore be used to accelerate GO annotation for bio-annotators.
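As a quick consistency check on the figures in the abstract above, the F1-score is the harmonic mean of precision and recall. The sketch below recomputes it from the rounded values given there; the function and variable names are illustrative only and do not come from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Rounded figures reported in the abstract (CC-domain GO term prediction).
reported_precision = 0.91
reported_recall = 0.58

f1 = f1_score(reported_precision, reported_recall)
print(f"F1 = {f1:.3f}")  # F1 = 0.708, i.e. about 71%
```

The recomputed value agrees with the reported 71% after rounding, and it shows that the score is held down by recall rather than precision.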


Subjects
Computational Biology , Gene Ontology , Molecular Sequence Annotation , Natural Language Processing , Humans
2.
BMC Med Inform Decis Mak ; 18(Suppl 5): 117, 2018 12 07.
Article in English | MEDLINE | ID: mdl-30526643

ABSTRACT

BACKGROUND: The application of artificial intelligence techniques to electronic health records data plays an increasingly significant role in advancing clinical decision support. This study conducts a quantitative comparison of research on applying artificial intelligence to electronic health records in the USA and China, in order to discover the similarities and differences between the two countries. METHODS: Publications from both Web of Science and PubMed are retrieved to explore the research status and academic performance of the two countries quantitatively. Bibliometrics, geographic visualization, collaboration degree calculation, social network analysis, latent Dirichlet allocation, and affinity propagation clustering are applied to analyze research quantity, collaboration relations, and hot research topics. RESULTS: There are 1031 publications from the USA and 173 publications from China during the 2008-2017 period. The annual numbers of publications from the USA and China increase polynomially. JAMIA, with 135 publications, and JBI, with 13 publications, are the most prolific journals for the USA and China, respectively. Harvard University, with 101 publications, and Zhejiang University, with 12 publications, are the most prolific affiliations for the USA and China, respectively. Massachusetts is the most prolific region for the USA, with 211 publications, while for China, Taiwan ranks first with 47 publications. China has relatively higher institutional and international collaboration. Nine main research areas are identified for the USA and seven for China. CONCLUSIONS: The use of artificial intelligence on electronic health records shows a steadily growing presence and increasing visibility in both the USA and China over the years. The results of the study demonstrate the research similarities and differences, as well as the strengths and weaknesses, of the two countries.


Subjects
Artificial Intelligence , Bibliometrics , Electronic Health Records , Information Storage and Retrieval , PubMed , Artificial Intelligence/statistics & numerical data , China , Electronic Health Records/statistics & numerical data , Humans , Information Storage and Retrieval/statistics & numerical data , PubMed/statistics & numerical data , Taiwan , United States
3.
Database (Oxford) ; 2017, 2017 01 01.
Article in English | MEDLINE | ID: mdl-29220476

ABSTRACT

UniProt Knowledgebase (UniProtKB) is a publicly available database providing access to a vast amount of protein sequence and functional information. To widen the scope of the publications associated with a protein entry, UniProt has introduced the computationally mapped additional bibliography section, which includes literature collected from external sources. In this article, we describe a text mining system, eGenPub, which selects articles that are 'about' specific proteins and allows automatic identification of additional bibliography for given UniProt protein entries. Focusing initially on plant proteins, eGenPub uses a gene normalization tool called pGenN and a trained support vector machine model, which achieves a precision of 95.3%, to predict whether an article, based on its abstract, should be linked to a given UniProt entry. We have conducted full-scale PubMed processing using eGenPub for eight common plant species. Altogether, 9025 articles are identified as relevant bibliography for 4752 UniProt entries, among which 5252 are additional papers not in the existing publication section. This newly computationally mapped additional bibliography from eGenPub is being integrated into the UniProt production pipeline and can be accessed via the UniProtKB protein entry publication view.


Subjects
Data Mining , Databases, Bibliographic , Databases, Protein , Plant Proteins/genetics , Plant Proteins/metabolism , Plants , Plants/genetics , Plants/metabolism
4.
PLoS One ; 10(8): e0135305, 2015.
Article in English | MEDLINE | ID: mdl-26258475

ABSTRACT

BACKGROUND: Automatically detecting gene/protein names in the literature and connecting them to database records, also known as gene normalization, provides a means to structure the information buried in free-text literature. Gene normalization is critical for improving the coverage of annotation in databases, and is an essential component of many text mining systems and database curation pipelines. METHODS: In this manuscript, we describe a gene normalization system specifically tailored for plant species, called pGenN (pivot-based Gene Normalization). The system consists of three steps: dictionary-based gene mention detection, species assignment, and intra-species normalization. We have developed new heuristics to improve each of these steps. RESULTS: We evaluated the performance of pGenN on an in-house, expertly annotated corpus consisting of 104 plant-relevant abstracts. Our system achieved an F-value of 88.9% (Precision 90.9% and Recall 87.2%) on this corpus, outperforming state-of-the-art systems presented in BioCreative III. We have processed over 440,000 plant-related Medline abstracts using pGenN. The gene normalization results are stored in a local database for direct query from the pGenN web interface (proteininformationresource.org/pgenn/). The annotated literature corpus is also publicly available through the PIR text mining portal (proteininformationresource.org/iprolink/).
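The three steps named in the abstract above (dictionary-based mention detection, species assignment, intra-species normalization) can be illustrated with a toy sketch. This is not pGenN's implementation: the dictionary, the identifiers, the species list, and the "most frequently named species" rule are all hypothetical placeholders, and pGenN's own heuristics are not described in the abstract.

```python
import re

# Hypothetical toy dictionary: gene symbol -> {species: identifier}.
# The identifiers are placeholders, not real database accessions.
GENE_DICTIONARY = {
    "FT": {"Arabidopsis thaliana": "ID_AT_FT", "Oryza sativa": "ID_OS_FT"},
    "PHYB": {"Arabidopsis thaliana": "ID_AT_PHYB"},
}
SPECIES_NAMES = ["Arabidopsis thaliana", "Oryza sativa"]


def detect_mentions(text):
    """Step 1: dictionary-based detection via exact, case-sensitive token lookup."""
    tokens = re.findall(r"[A-Za-z0-9]+", text)
    return [token for token in tokens if token in GENE_DICTIONARY]


def assign_species(text):
    """Step 2: pick the most frequently named species (an assumed, simplistic rule)."""
    counts = {name: text.count(name) for name in SPECIES_NAMES}
    best = max(counts, key=counts.get)
    return best if counts[best] > 0 else None


def normalize(text):
    """Step 3: map each mention to the identifier listed for the assigned species."""
    species = assign_species(text)
    results = []
    for mention in detect_mentions(text):
        identifier = GENE_DICTIONARY[mention].get(species)
        if identifier is not None:
            results.append((mention, species, identifier))
    return results


abstract = "In Arabidopsis thaliana, PHYB regulates flowering through FT."
print(normalize(abstract))
```

In this toy version the symbol FT is ambiguous between two species in the dictionary, and the species assigned in step 2 is what selects a single identifier in step 3. That is the role the abstract attributes to intra-species normalization.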


Subjects
Data Mining/methods , Genes, Plant , Plant Proteins/genetics , Plants/genetics , Software , Databases, Genetic , Molecular Sequence Annotation , Natural Language Processing , Reference Standards , Terminology as Topic