Search | VHL Regional Portal

DMLS: an automated pipeline to extract the Drosophila modular transcription regulators and targets from massive literature articles.

Yang, Tzu-Hsien; Yu, Yu-Huai; Wu, Sheng-Hang; Chang, Fang-Yuan; Tsai, Hsiu-Chun; Yang, Ya-Chiao.

Database (Oxford) ; 2024: 0, 2024 Jun 20.

Article in English | MEDLINE | ID: mdl-38900628

ABSTRACT

Transcription regulation in multicellular species is mediated by modular combinations of transcription factor (TF) binding sites termed cis-regulatory modules (CRMs). Such CRM-mediated transcription regulation determines gene expression patterns during development. Biologists frequently investigate how CRM-mediated transcription regulation affects gene expression. However, knowledge of the target genes and regulatory TFs participating in the CRMs under study is mostly fragmentary and scattered throughout the literature. Researchers must invest tremendous human effort to search through the articles deposited in biomedical literature databases to obtain this information. Although several novel text-mining systems are now available for literature triaging, these tools do not specifically focus on prescreening CRM-related literature and therefore fail to correctly extract the CRM target genes and regulatory TFs from the literature. For this reason, we constructed a supportive automated literature prescreener called Drosophila Modular transcription-regulation Literature Screener (DMLS) that (i) prescreens articles describing experiments on modular transcription regulation, (ii) identifies the described target genes and TFs of the CRMs under study for each article describing modular transcription regulation and (iii) features an automated and extendable pipeline to perform the task. We demonstrated that the final performance of DMLS in extracting the described target gene and regulatory TF lists of the CRMs under study for given articles achieved a test macro area under the ROC curve (auROC) of 89.7% and an area under the precision-recall curve (auPRC) of 77.6%, outperforming the intuitive gene name-occurrence-counting method by at least 19.9% in auROC and 30.5% in auPRC. The web service and the command line versions of DMLS are available at https://cobis.bme.ncku.edu.tw/DMLS/ and https://github.com/cobisLab/DMLS/, respectively.
Database Tool URL: https://cobis.bme.ncku.edu.tw/DMLS/.

Subjects

Data Mining; Transcription Factors; Animals; Transcription Factors/genetics; Transcription Factors/metabolism; Data Mining/methods; Drosophila/genetics; Drosophila melanogaster/genetics; Databases, Genetic; Gene Expression Regulation; Drosophila Proteins/genetics; Drosophila Proteins/metabolism
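
The DMLS abstract compares against a gene name-occurrence-counting baseline and reports a macro auROC. The following is a minimal sketch of what such a baseline and metric could look like, not the authors' implementation. The function names, the example sentence, and the reading of "macro" as a per-article average are all assumptions made for illustration.

```python
import re
from collections import Counter


def count_gene_mentions(text, gene_names):
    """Count whole-word, case-insensitive occurrences of each candidate name."""
    counts = Counter()
    for name in gene_names:
        # Lookarounds keep "eve" from matching inside a longer token.
        pattern = r"(?<![A-Za-z0-9])" + re.escape(name) + r"(?![A-Za-z0-9])"
        counts[name] = len(re.findall(pattern, text, flags=re.IGNORECASE))
    return counts


def auroc(scores, labels):
    """auROC via the Mann-Whitney statistic; tied pairs count as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def macro_auroc(per_article):
    """Average per-article auROC values (assumed meaning of 'macro')."""
    values = [auroc(scores, labels) for scores, labels in per_article]
    return sum(values) / len(values)


# Hypothetical article text and candidate list, for illustration only.
text = "The eve stripe 2 enhancer is bound by Bcd and Hb; Kr and gt repress eve."
candidates = ["eve", "bcd", "hb", "Kr", "gt", "ftz"]
mention_counts = count_gene_mentions(text, candidates)

# Rank candidates by mention count and score against made-up labels.
scores = [mention_counts[name] for name in candidates]
labels = [1, 1, 1, 0, 0, 0]  # hypothetical ground truth
baseline_auroc = auroc(scores, labels)
```

Ranking candidates purely by mention frequency ignores the sentence context in which a gene appears, which is the kind of limitation a dedicated extractor such as DMLS is designed to address.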

YTLR: Extracting yeast transcription factor-gene associations from the literature using automated literature readers.

Yang, Tzu-Hsien; Wang, Chung-Yu; Tsai, Hsiu-Chun; Yang, Ya-Chiao; Liu, Cheng-Tse.

Comput Struct Biotechnol J ; 20: 4636-4644, 2022.

Article in English | MEDLINE | ID: mdl-36090812

ABSTRACT

Cells adapt to environmental stresses mainly via transcription reprogramming. Correct transcription control is mediated by the interactions between transcription factors (TFs) and their target genes. These TF-gene associations can be probed by chromatin immunoprecipitation techniques and knockout experiments, revealing TF binding (TFB) and TF regulatory (TFR) evidence, respectively. Nevertheless, most evidence is still fragmentary in the literature and requires tremendous human effort to curate. We developed the first pipeline, called YTLR (Yeast Transcription-regulation Literature Reader), to automate TF-gene relation extraction from the literature. YTLR first identifies articles with TFB and TFR information. TF-gene binding pairs are then extracted from the TFB articles, and TF-gene regulatory associations are recognized from the TFR articles. On the gathered test sets, YTLR achieves an AUC of 98.8% in identifying articles with TFB evidence and an AUC of 83.4% in extracting the detailed TF-gene binding pairs. Similarly, YTLR obtains an AUC of 98.2% in identifying TFR articles and an AUC of 80.4% in extracting the detailed TF-gene regulatory associations. Furthermore, YTLR outperforms previous methods in both tasks. To help researchers extract TF-gene transcriptional relations from large sets of queried articles, we built an automated and easy-to-use software tool based on the YTLR pipeline. In summary, YTLR aims to provide easier literature pre-screening for curators and to help researchers gather yeast TF-gene transcriptional relation conclusions from articles in a high-throughput fashion. The YTLR pipeline software tool can be downloaded at https://github.com/cobisLab/YTLR/.
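
The YTLR abstract describes a two-stage flow: first select articles that carry binding or regulatory evidence, then extract TF-gene pairs only from the selected articles. The sketch below shows that control flow with pluggable stages. It is an illustration under stated assumptions, not YTLR's code: the keyword filter and the sentence co-occurrence extractor are deliberately naive stand-ins, and every name and example text in it is hypothetical.

```python
import re


def two_stage_extract(articles, is_relevant, extract_pairs):
    """Run stage 1 (article filter), then stage 2 (pair extraction)."""
    results = {}
    for article_id, text in articles.items():
        if is_relevant(text):  # stage 1: keep only evidence-bearing articles
            results[article_id] = extract_pairs(text)  # stage 2
    return results


def keyword_filter(text):
    """Toy stand-in for an article classifier: looks for the word 'ChIP'."""
    return "chip" in text.lower()


def make_cooccurrence_extractor(tf_names, gene_names):
    """Toy stand-in for a pair extractor: TF and gene in the same sentence."""
    def extract(text):
        pairs = set()
        for sentence in text.split("."):
            # Tokenize so that "GAL1" does not match inside "GAL10".
            tokens = set(re.findall(r"[A-Za-z0-9]+", sentence))
            for tf in tokens & tf_names:
                for gene in tokens & gene_names:
                    pairs.add((tf, gene))
        return pairs
    return extract


# Hypothetical mini-corpus, for illustration only.
articles = {
    "a1": "ChIP showed that Gal4 binds the GAL1 promoter. GAL10 was not tested.",
    "a2": "We review yeast stress biology.",
}
extractor = make_cooccurrence_extractor({"Gal4"}, {"GAL1", "GAL10"})
found = two_stage_extract(articles, keyword_filter, extractor)
```

Separating the two stages means the more expensive pair extraction runs only on articles that pass the filter, and each stage can be evaluated on its own, as the abstract does with separate AUC values.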

regCNN: identifying Drosophila genome-wide cis-regulatory modules via integrating the local patterns in epigenetic marks and transcription factor binding motifs.

Yang, Tzu-Hsien; Yang, Ya-Chiao; Tu, Kai-Chi.

Comput Struct Biotechnol J ; 20: 296-308, 2022.

Article in English | MEDLINE | ID: mdl-35035784

ABSTRACT

Transcription regulation in metazoa is controlled by the binding of transcription factors (TFs) or regulatory proteins to specific modular DNA regulatory sequences called cis-regulatory modules (CRMs). Understanding the distribution of CRMs on a genomic scale is essential for constructing the metazoan transcriptional regulatory networks that help diagnose genetic disorders. While traditional reporter-assay approaches to CRM identification can provide an in-depth understanding of the functions of some CRMs, these methods are usually costly and low-throughput. It is generally believed that reliable CRM predictions can be made by integrating diverse genomic data. Hence, researchers often first resort to computational algorithms for genome-wide CRM screening before performing specific experiments. However, existing in silico methods for finding potential CRMs are restricted by low sensitivity, poor prediction accuracy, or high computation time caused by the combinatorial complexity of TFBS composition. To overcome these obstacles, we designed a novel CRM identification pipeline called regCNN that considers the base-by-base local patterns in TF binding motifs and epigenetic profiles. On the test set, regCNN shows an accuracy/auROC of 84.5%/92.5% in CRM identification. By further considering local patterns in epigenetic profiles and TF binding motifs, it achieves a 4.7% (92.5%-87.8%) improvement in auROC over the average value-based pure multi-layer perceptron model. We also demonstrated that regCNN outperforms all currently available tools by at least 11.3% in auROC. Finally, regCNN is verified to be robust to its resizing window hyperparameter when dealing with the variable lengths of CRMs. The regCNN model can be downloaded at http://cobisHSS0.im.nuk.edu.tw/regCNN/.
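
The regCNN abstract contrasts features built from average values with features that keep base-by-base local patterns, and mentions resizing to handle CRMs of variable length. The sketch below illustrates that contrast on toy per-base signals. It is an assumption-laden illustration, not regCNN's preprocessing: the chunk-averaging resize, the function names, and the signals are all hypothetical.

```python
def mean_feature(signal):
    """Collapse a per-base signal to a single average value."""
    return sum(signal) / len(signal)


def resize_profile(signal, n_bins):
    """Resize a variable-length per-base signal to n_bins values.

    Each output value is the mean of one contiguous chunk, so coarse
    local structure survives while the length becomes fixed.
    """
    n = len(signal)
    if n_bins <= 0 or n < n_bins:
        raise ValueError("need 0 < n_bins <= len(signal)")
    profile = []
    for i in range(n_bins):
        start = i * n // n_bins
        end = (i + 1) * n // n_bins
        chunk = signal[start:end]
        profile.append(sum(chunk) / len(chunk))
    return profile


# Two hypothetical epigenetic-mark signals with the same average.
peaked = [0, 0, 0, 0, 4, 4, 4, 4]
flat = [2, 2, 2, 2, 2, 2, 2, 2]

same_mean = mean_feature(peaked) == mean_feature(flat)
peaked_profile = resize_profile(peaked, 2)
flat_profile = resize_profile(flat, 2)

# A shorter region with the same shape maps to the same fixed-size profile.
short_peaked_profile = resize_profile([0, 0, 0, 4, 4, 4], 2)
```

The two signals are indistinguishable by their mean but differ in their resized profiles, which is the kind of information an average-value model discards.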

Cancer DEIso: An integrative analysis platform for investigating differentially expressed gene-level and isoform-level human cancer markers.

Yang, Tzu-Hsien; Chiang, Yu-Hsuan; Shiue, Sheng-Cian; Lin, Po-Heng; Yang, Ya-Chiao; Tu, Kai-Chi; Tseng, Yan-Yuan; Tseng, Joseph T; Wu, Wei-Sheng.

Comput Struct Biotechnol J ; 19: 5149-5159, 2021.

Article in English | MEDLINE | ID: mdl-34589189

ABSTRACT

Transcript isoforms regulated by alternative splicing can substantially impact carcinogenesis, so cancer studies need evidence of both differential gene expression and aberrant isoform distributions. The Cancer Genome Atlas (TCGA) project was launched in 2008 to collect cancer-related genome mutation raw data from the population. While many repositories have tried to add insights into the raw data in TCGA, no existing database provides both comprehensive gene-level and isoform-level cancer stage marker investigation and survival analysis. We constructed Cancer DEIso to facilitate in-depth analyses for both gene-level and isoform-level human cancer studies. Patient RNA-seq data, sample sheets, patient clinical data, and human genome datasets were collected and processed in Cancer DEIso. Four functions for finding differentially expressed genes/isoforms between cancer stages were implemented: (i) search for potential gene/isoform markers for a specified cancer type and two of its stages; (ii) search for potentially induced cancer types and stages for a gene/isoform; (iii) expression survival analysis of a given gene/isoform for a given cancer type; and (iv) visualization of gene/isoform expression comparisons across stages. As an example, we demonstrate that Cancer DEIso can indicate potential colorectal cancer isoform diagnostic markers that are not easily detected when only gene-level expression is considered. Cancer DEIso is available at http://cosbi4.ee.ncku.edu.tw/DEIso/.
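
The Cancer DEIso abstract notes that some isoform markers are hard to detect from gene-level expression alone. The toy calculation below shows one way this can happen: two isoforms of a gene change in opposite directions between stages, so their sum stays constant. The expression values, the pseudocount, and the use of a log2 fold change are assumptions for illustration and do not reproduce the platform's statistics.

```python
import math


def log2_fold_change(before, after, pseudocount=1.0):
    """log2 ratio of two expression values; the pseudocount avoids log(0)."""
    return math.log2((after + pseudocount) / (before + pseudocount))


# Hypothetical isoform expression values for one gene at two cancer stages.
stage_one = {"isoform_1": 30.0, "isoform_2": 10.0}
stage_two = {"isoform_1": 10.0, "isoform_2": 30.0}

# Gene-level expression is taken here as the sum over isoforms.
gene_change = log2_fold_change(sum(stage_one.values()),
                               sum(stage_two.values()))

# Isoform-level changes are computed for each isoform separately.
isoform_changes = {
    name: log2_fold_change(stage_one[name], stage_two[name])
    for name in stage_one
}
```

In this example the gene-level change is zero while both isoforms change by more than twofold, so only an isoform-level analysis would flag them.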
