Search | Secretaria de Estado da Saúde

DMLS: an automated pipeline to extract the Drosophila modular transcription regulators and targets from massive literature articles.

Yang, Tzu-Hsien; Yu, Yu-Huai; Wu, Sheng-Hang; Chang, Fang-Yuan; Tsai, Hsiu-Chun; Yang, Ya-Chiao.

Database (Oxford) ; 2024: 0, 2024 Jun 20.

Article in English | MEDLINE | ID: mdl-38900628

ABSTRACT

Transcription regulation in multicellular species is mediated by modular combinations of transcription factor (TF) binding sites termed cis-regulatory modules (CRMs). Such CRM-mediated transcription regulation determines gene expression patterns during development. Biologists frequently investigate how CRMs regulate gene expression. However, knowledge of the target genes and regulatory TFs of the CRMs under study is mostly scattered across the literature. Researchers must devote substantial manual effort to searching the articles deposited in biomedical literature databases to obtain this information. Although several text-mining systems are now available for literature triage, these tools do not specifically address CRM-related literature prescreening and fail to correctly extract CRM target genes and regulatory TFs from the literature. For this reason, we constructed a supporting automated literature prescreener called Drosophila Modular transcription-regulation Literature Screener (DMLS) that: (i) prescreens articles describing experiments on modular transcription regulation, (ii) identifies the target genes and TFs of the CRMs under study in each such article and (iii) provides an automated and extensible pipeline to perform the task. We demonstrated that DMLS, in extracting the target gene and regulatory TF lists of the CRMs under study from given articles, achieved a test macro area under the ROC curve (auROC) of 89.7% and an area under the precision-recall curve (auPRC) of 77.6%, outperforming the intuitive gene name-occurrence-counting method by at least 19.9% in auROC and 30.5% in auPRC. The web service and the command line versions of DMLS are available at https://cobis.bme.ncku.edu.tw/DMLS/ and https://github.com/cobisLab/DMLS/, respectively.
Database Tool URL: https://cobis.bme.ncku.edu.tw/DMLS/.
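
The abstract compares DMLS against a gene name-occurrence-counting method but does not describe that baseline in detail. The following is only a minimal sketch of what such a baseline could look like, assuming it ranks candidate gene symbols by how often they appear in an article's text. The function name `rank_genes_by_occurrence`, the whole-word matching rule, and the example sentence are illustrative assumptions, not taken from the DMLS paper or code.

```python
import re
from collections import Counter


def rank_genes_by_occurrence(text, gene_names):
    """Rank candidate gene symbols by their number of occurrences in text.

    Hypothetical baseline: the matching rule below is an assumption.
    """
    counts = Counter()
    for name in gene_names:
        # Whole-word, case-sensitive match (Drosophila symbols are case-sensitive).
        pattern = r"(?<!\w)" + re.escape(name) + r"(?!\w)"
        counts[name] = len(re.findall(pattern, text))
    # Highest count first; ties are broken by symbol for a stable order.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


# Made-up example sentence, used only to exercise the function.
example_text = (
    "The eve stripe 2 enhancer is activated by bcd and hb "
    "and repressed by Kr and gt. Loss of bcd abolishes eve stripe 2."
)
candidates = ["eve", "bcd", "hb", "Kr", "gt"]
ranking = rank_genes_by_occurrence(example_text, candidates)
for gene, count in ranking:
    print(gene, count)
```

A count-based ranking like this cannot tell a CRM's target gene from its regulatory TFs, which is one plausible reason a dedicated extractor would outperform it.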

Subjects

Data Mining; Transcription Factors; Animals; Transcription Factors/genetics; Transcription Factors/metabolism; Data Mining/methods; Drosophila/genetics; Drosophila melanogaster/genetics; Databases, Genetic; Gene Expression Regulation; Drosophila Proteins/genetics; Drosophila Proteins/metabolism

CFA: An explainable deep learning model for annotating the transcriptional roles of cis-regulatory modules based on epigenetic codes.

Yang, Tzu-Hsien; Yu, Yu-Huai; Wu, Sheng-Hang; Zhang, Fang-Yuan.

Comput Biol Med ; 152: 106375, 2023 Jan.

Article in English | MEDLINE | ID: mdl-36502693

ABSTRACT

Metazoan gene expression is controlled by modular DNA segments called cis-regulatory modules (CRMs). CRMs can act as promoters, enhancers or insulators, adding further layers of transcription regulation. Experiments for determining CRM roles are low-throughput and costly, so large-scale investigation of CRM function still depends on computational methods. However, existing in silico tools recognize only enhancers or only promoters, and therefore accumulate errors when the promoter, enhancer and insulator roles of CRMs are considered together. Currently, no algorithm can consider these CRM roles concurrently. In this research, we developed the CRM Function Annotator (CFA) model. CFA provides complete CRM transcriptional role labeling based on the interpretation of epigenetic profiles. We demonstrated that CFA achieves high performance (test macro auROC/auPRC = 94.1%/90.3%) and outperforms existing tools in promoter/enhancer/insulator identification. Inspection of CFA also shows that, when labeling CRM roles, it recognizes explainable epigenetic codes consistent with previous findings. By considering higher-order combinations of the epigenetic codes, CFA significantly reduces false-positive rates in CRM transcriptional role annotation. CFA is available at https://github.com/cobisLab/CFA/.
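
The macro auROC figure quoted above is an average over per-class scores. The abstract does not say exactly how the averaging was done, so the following is a sketch under the common assumption that macro auROC is the unweighted mean of one-vs-rest auROC values, here over three hypothetical classes (promoter, enhancer, insulator). The function names and toy scores are illustrative and do not come from the CFA code.

```python
def binary_auroc(labels, scores):
    """auROC as the probability that a random positive outscores a
    random negative, counting ties as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def macro_auroc(label_matrix, score_matrix):
    """Unweighted mean of the per-class one-vs-rest auROC values."""
    n_classes = len(label_matrix[0])
    per_class = []
    for c in range(n_classes):
        labels = [row[c] for row in label_matrix]
        scores = [row[c] for row in score_matrix]
        per_class.append(binary_auroc(labels, scores))
    return sum(per_class) / n_classes


# Toy data: columns are (promoter, enhancer, insulator); values are made up.
y_true = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 0, 0], [0, 1, 0]]
y_score = [
    [0.80, 0.10, 0.10],
    [0.30, 0.60, 0.10],
    [0.20, 0.20, 0.60],
    [0.25, 0.50, 0.25],
    [0.10, 0.70, 0.20],
]
print(round(macro_auroc(y_true, y_score), 4))
```

Because every class contributes equally to the mean, a rare class with poor ranking lowers the macro score as much as a common one would.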

Subjects

Deep Learning; Promoter Regions, Genetic/genetics; Epigenesis, Genetic/genetics

RDDL: A systematic ensemble pipeline tool that streamlines balancing training schemes to reduce the effects of data imbalance in rare-disease-related deep-learning applications.

Yang, Tzu-Hsien; Liao, Zhan-Yi; Yu, Yu-Huai; Hsia, Min.

Comput Biol Chem ; 106: 107929, 2023 Oct.

Article in English | MEDLINE | ID: mdl-37517206

ABSTRACT

Identifying low-prevalence diseases, or rare diseases, in their early stages is key to their treatment. Deep learning techniques now provide promising tools for this purpose. Nevertheless, the low prevalence of rare diseases hinders the proper application of deep networks to disease identification because of the severe class-imbalance issue. In the past decades, several balancing methods have been studied to handle data imbalance. However, it has been verified that none of these methods guarantees superior performance over the others. This performance variation creates the need for a systematic pipeline with a comprehensive software tool for enhancing deep-learning applications in rare disease identification. We reviewed the existing balancing schemes and summarized them into a systematic deep ensemble pipeline, implemented in a tool called RDDL, for handling the data imbalance issue. Through two real case studies, we showed that rare disease identification could be improved with this systematic RDDL pipeline tool by lessening the data imbalance problem during model training. The RDDL pipeline tool is available at https://github.com/cobisLab/RDDL/.
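
The abstract refers to balancing schemes without naming them. As one generic illustration, the sketch below shows random oversampling, which duplicates minority-class samples until every class matches the size of the largest one. This is an assumed example of a balancing scheme, not a description of what RDDL implements; the function name `random_oversample` and the toy data are hypothetical.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Resample each class with replacement up to the largest class size."""
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    target = max(len(xs) for xs in by_class.values())
    out_x, out_y = [], []
    for y, xs in by_class.items():
        # Draw the missing samples at random from the class's own members.
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        for x in xs + extra:
            out_x.append(x)
            out_y.append(y)
    return out_x, out_y


# Toy data: 8 "common" samples (label 0) and 2 "rare" samples (label 1).
samples = list(range(10))
labels = [0] * 8 + [1] * 2
balanced_x, balanced_y = random_oversample(samples, labels)
print(Counter(balanced_y))
```

Oversampling should be applied to the training split only, so that duplicated rare-class samples do not leak into validation or test data.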

Subjects

Deep Learning; Humans; Rare Diseases; Software
