Search | BVS - MINISTÉRIO DA SAÚDE

Bioinformatic workflow extraction from scientific texts based on word sense disambiguation and relation extraction.

Halioui, Ahmed; Valtchev, Petko; Diallo, Abdoulaye Baniré.

IEEE/ACM Trans Comput Biol Bioinform ; 2018 Jun 14.

Article in English | MEDLINE | ID: mdl-29994265

ABSTRACT

This paper introduces a method for automatic workflow extraction from texts using Process-Oriented Case-Based Reasoning (POCBR). While current workflow management systems mostly implement various complicated graphical tasks based on advanced distributed solutions (e.g., cloud computing and grid computing), workflow knowledge acquisition from texts using case-based reasoning offers more expressive and semantically richer case representations. In this context, we propose an ontology-based workflow extraction framework to acquire processual knowledge from texts. Our methodology extends classic NLP techniques to extract and disambiguate tasks and relations in texts. Using a graph-based representation of workflows and a domain ontology, our extraction process uses a context-aware approach to recognize workflow components: data and control flows. We applied our framework to a technical domain in bioinformatics, namely phylogenetic analyses. An evaluation based on workflow semantic similarities against a gold standard shows that our approach provides promising results in the process extraction domain. Both the data and the implementation of our framework are available at: http://labo.bioinfo.uqam.ca/tgrowler.
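As an illustration of the graph-based representation mentioned in this abstract, the following minimal sketch stores tasks as nodes and tags each edge as a data flow or a control flow. The `Workflow` class, its method names, and the example task labels are hypothetical and are not taken from the authors' implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Workflow:
    """Directed graph whose edges are tagged as data flow or control flow."""
    tasks: set = field(default_factory=set)
    edges: list = field(default_factory=list)  # (source, target, kind) triples

    def add_flow(self, source, target, kind):
        # Only the two component types named in the abstract are accepted.
        if kind not in ("data", "control"):
            raise ValueError(f"unknown flow kind: {kind}")
        self.tasks.update((source, target))
        self.edges.append((source, target, kind))

    def successors(self, task, kind=None):
        # Targets reachable from `task` in one step, optionally filtered by kind.
        return [t for s, t, k in self.edges
                if s == task and (kind is None or k == kind)]


# Hypothetical phylogenetic pipeline, for illustration only.
wf = Workflow()
wf.add_flow("sequence alignment", "model selection", "control")
wf.add_flow("sequence alignment", "tree inference", "data")
wf.add_flow("model selection", "tree inference", "control")
```

Keeping the flow kind on the edge rather than on the node lets the same pair of tasks be linked by both a data dependency and an ordering constraint.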

T-GOWler: Discovering Generalized Process Models Within Texts.

Halioui, Ahmed; Valtchev, Petko; Diallo, Abdoulaye Baniré.

J Comput Biol ; 24(8): 799-808, 2017 Aug.

Article in English | MEDLINE | ID: mdl-28742392

ABSTRACT

Contemporary workflow management systems are driven by explicit process models specifying the interdependencies between tasks. Creating these models is a challenging and time-consuming task. Existing approaches to mining concrete workflows into models tackle design aspects related to the diverging abstraction levels of the tasks. Concrete workflow logs represent tasks and cases of concrete events (partially or totally ordered) that ground hidden multilevel (abstract) semantics and contexts. Relevant generalized events could be rediscovered within these processes. In this article, we propose an ontology-based workflow mining system to generate patterns from sequences of events that are themselves extracted from texts. Our system, T-GOWler (Generalized Ontology-based WorkfLow minER within Texts), is based on two ontology-based modules: a workflow extractor and a pattern miner. To this end, it uses two different ontologies: a domain ontology (to support workflow extraction from texts) and a processual ontology (to mine generalized patterns from extracted workflows).

Subjects

Algorithms; Computational Biology/methods; Data Mining/methods; Gene Ontology; Semantics; Humans; Phylogeny; Workflow
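The idea of mining generalized patterns from concrete event sequences, as described in the T-GOWler abstract above, can be sketched roughly as follows. This is a simplified illustration and not the authors' algorithm: the one-level `TAXONOMY` mapping stands in for a processual ontology, and the function names and `min_support` parameter are assumptions. Concrete tool names are lifted to their parent concept, then consecutive pairs are counted once per workflow.

```python
from collections import Counter

# Hypothetical one-level is-a mapping from concrete tools to abstract tasks.
TAXONOMY = {
    "MUSCLE": "multiple sequence alignment",
    "ClustalW": "multiple sequence alignment",
    "PhyML": "tree inference",
    "RAxML": "tree inference",
}


def generalize(sequence, taxonomy):
    """Replace each concrete task by its parent concept when one is known."""
    return [taxonomy.get(task, task) for task in sequence]


def frequent_pairs(sequences, taxonomy, min_support):
    """Count consecutive task pairs after generalization, once per sequence."""
    counts = Counter()
    for seq in sequences:
        general = generalize(seq, taxonomy)
        counts.update(set(zip(general, general[1:])))
    return {pair: n for pair, n in counts.items() if n >= min_support}


# Three hypothetical concrete workflows extracted from texts.
workflows = [["MUSCLE", "PhyML"], ["ClustalW", "RAxML"], ["MUSCLE", "RAxML"]]
patterns = frequent_pairs(workflows, TAXONOMY, min_support=2)
```

With these inputs no concrete pair occurs in more than one workflow, yet the generalized pair (alignment followed by tree inference) is supported by all three, which is the kind of pattern that only becomes visible at a higher abstraction level.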

A machine learning approach for viral genome classification.

Remita, Mohamed Amine; Halioui, Ahmed; Malick Diouara, Abou Abdallah; Daigle, Bruno; Kiani, Golrokh; Diallo, Abdoulaye Baniré.

BMC Bioinformatics ; 18(1): 208, 2017 Apr 11.

Article in English | MEDLINE | ID: mdl-28399797

ABSTRACT

BACKGROUND: Advances in cloning and sequencing technology are yielding a massive number of viral genomes. The classification and annotation of these genomes constitute important assets in the discovery of genomic variability, taxonomic characteristics and disease mechanisms. Existing classification methods are often designed for specific, well-studied families of viruses. Thus, viral comparative genomic studies could benefit from more generic, fast and accurate tools for classifying and typing newly sequenced strains of diverse virus families. RESULTS: Here, we introduce a virus classification platform, CASTOR, based on machine learning methods. CASTOR is inspired by a well-known technique in molecular biology: restriction fragment length polymorphism (RFLP). It simulates, in silico, the restriction digestion of genomic material by different enzymes into fragments. It uses two metrics to construct feature vectors for machine learning algorithms in the classification step. We benchmark CASTOR for the classification of distinct datasets of human papillomaviruses (HPV), hepatitis B viruses (HBV) and human immunodeficiency viruses type 1 (HIV-1). Results reveal true positive rates of 99%, 99% and 98% for HPV Alpha species, HBV genotyping and HIV-1 M subtyping, respectively. Furthermore, CASTOR shows competitive performance compared to well-known HIV-1-specific classifiers (REGA and COMET) on whole genomes and pol fragments. CONCLUSION: The performance of CASTOR, together with its genericity and robustness, could enable novel and accurate large-scale virus studies. The CASTOR web platform provides open-access, collaborative and reproducible machine learning classifiers. CASTOR can be accessed at http://castor.bioinfo.uqam.ca .

Subjects

Genome, Viral; Genomics/methods; Machine Learning; Classification; Computer Simulation; HIV-1/genetics; Hepatitis B virus/genetics; Humans; Papillomaviridae/genetics; Sequence Analysis, DNA/methods; Sequence Analysis, RNA/methods
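The in silico restriction digestion described in the CASTOR abstract above can be illustrated with a small sketch. It is not CASTOR's code: the abstract does not say which two metrics are used, so fragment count and mean fragment length are assumed here purely for illustration, and the function names and toy sequence are hypothetical. The recognition sites and cut positions of EcoRI (G^AATTC) and HindIII (A^AGCTT) are standard.

```python
def digest(sequence, site, cut_offset):
    """Cut a linear sequence at every occurrence of `site`.

    The cut falls `cut_offset` bases after the start of each site.
    """
    fragments, start = [], 0
    pos = sequence.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(sequence[start:cut])
        start = cut
        pos = sequence.find(site, pos + 1)
    fragments.append(sequence[start:])
    return [f for f in fragments if f]  # drop empty fragments


def features(sequence, enzymes):
    """Build a flat feature vector: (fragment count, mean length) per enzyme.

    These two metrics are an assumption, not the ones reported for CASTOR.
    """
    vector = []
    for site, offset in enzymes:
        lengths = [len(f) for f in digest(sequence, site, offset)]
        mean = sum(lengths) / len(lengths) if lengths else 0.0
        vector += [len(lengths), mean]
    return vector


ENZYMES = [("GAATTC", 1), ("AAGCTT", 1)]  # EcoRI, HindIII
toy_genome = "AAGAATTCCCGAATTCTT"  # made-up sequence with two EcoRI sites
vec = features(toy_genome, ENZYMES)
```

Vectors built this way have a fixed length of two entries per enzyme regardless of genome size, so they can be passed directly to a standard classifier.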
