ABSTRACT
The electrochemical reduction reaction of nitrate (NO3RR) is an attractive route to produce ammonia at ambient conditions, but the conversion from nitrate to ammonia, which requires nine protons, has to compete with both the two-proton process of nitrite formation and the hydrogen evolution reaction. Extensive research efforts have thus been made in recent studies to develop electrocatalysts for the NO3RR facilitating the production of ammonia. Rather than designing another better electrocatalyst, herein, we synthesize an electrochemically inactive, porous, and chemically robust zirconium-based metal-organic framework (MOF) with enriched intraframework sulfonate groups, SO3-MOF-808, as a coating deposited on top of the catalytically active copper-based electrode. Although both the overall reaction rate and electrochemically active surface area of the electrode are barely affected by the MOF coating, with negatively charged sulfonate groups capable of enriching more protons near the electrode surface, the MOF coating significantly promotes the selectivity of the NO3RR toward the production of ammonia. In contrast, the use of MOF coating with positively charged trimethylammonium groups to repulse protons strongly facilitates the conversion of nitrate to nitrite, with selectivity of more than 90% at all potentials. Under the optimal operating conditions, the copper electrocatalyst with SO3-MOF-808 coating can achieve a Faradaic efficiency of 87.5% for ammonia production, a nitrate-to-ammonia selectivity of 95.6%, and an ammonia production rate of 97 µmol/cm2 h, outperforming all of those achieved by both the pristine copper (75.0%; 93.9%; 87 µmol/cm2 h) and copper with optimized Nafion coating (83.3%; 86.9%; 64 µmol/cm2 h). 
Findings here suggest the function of MOF as an advanced alternative to the commercially available Nafion to enrich protons near the surface of electrocatalyst for NO3RR, and shed light on the potential of utilizing such electrochemically inactive MOF coatings in a range of proton-coupled electrocatalytic reactions.
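As a rough illustration of how the reported figures relate to one another, the sketch below converts an ammonia production rate into the partial current density it implies, using the eight-electron stoichiometry NO3- + 9 H+ + 8 e- -> NH3 + 3 H2O. The function names are hypothetical, and the derived current densities are back-of-the-envelope values that assume the reported rate and Faradaic efficiency refer to the same operating point; they are not figures reported in the abstract.

```python
# Minimal sketch: relate ammonia production rate to partial current density.
# Assumption: 8 electrons are transferred per NH3 formed from nitrate.
FARADAY = 96485.0      # Faraday constant, C/mol
N_ELECTRONS_NH3 = 8    # NO3- + 9 H+ + 8 e- -> NH3 + 3 H2O


def partial_current_density(rate_umol_cm2_h, n_electrons=N_ELECTRONS_NH3):
    """Return the partial current density (mA/cm2) that sustains the given rate."""
    mol_per_cm2_s = rate_umol_cm2_h * 1e-6 / 3600.0
    return n_electrons * FARADAY * mol_per_cm2_s * 1e3


def implied_total_current_density(partial_ma_cm2, faradaic_efficiency):
    """Return the total current density (mA/cm2) implied by a Faradaic efficiency."""
    if not 0.0 < faradaic_efficiency <= 1.0:
        raise ValueError("faradaic_efficiency must be in (0, 1]")
    return partial_ma_cm2 / faradaic_efficiency


# Values quoted in the abstract for the SO3-MOF-808-coated copper electrode.
j_nh3 = partial_current_density(97.0)
j_total = implied_total_current_density(j_nh3, 0.875)
print(f"partial: {j_nh3:.2f} mA/cm2, implied total: {j_total:.2f} mA/cm2")
```

The same two functions can be applied to the pristine copper and Nafion-coated figures to compare the implied current densities on an equal footing.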
Subjects
Diffuse Large B-Cell Lymphoma , Salvage Therapy , Haploidentical Transplantation , Humans , Diffuse Large B-Cell Lymphoma/therapy , Diffuse Large B-Cell Lymphoma/pathology , Salvage Therapy/methods , Haploidentical Transplantation/methods , Combined Modality Therapy , Child , Male , Hematopoietic Stem Cell Transplantation/methods , Local Neoplasm Recurrence/therapy , Local Neoplasm Recurrence/pathology , B-Lymphocytes/pathology , Antineoplastic Combined Chemotherapy Protocols/therapeutic use , Female
ABSTRACT
BACKGROUND AND OBJECTIVE: Proteome microarrays are one of the popular high-throughput screening methods for large-scale investigation of protein interactions in cells. These interactions can be measured on protein chips when coupled with fluorescence-labeled probes, helping indicate potential biomarkers or discover drugs. Several computational tools were developed to help analyze the protein chip results. However, existing tools fail to provide a user-friendly interface for biologists and present only one or two data analysis methods suitable for limited experimental designs, restricting the use cases. METHODS: In order to facilitate the biomarker examination using protein chips, we implemented a user-friendly and comprehensive web tool called BAPCP (Biomarker Analysis tool for Protein Chip Platforms) in this research to deal with diverse chip data distributions. RESULTS: BAPCP is well integrated with standard chip result files and includes 7 data normalization methods and 7 custom-designed quality control/differential analysis filters for biomarker extraction among experiment groups. Moreover, it can handle cost-efficient chip designs that repeat several blocks/samples within one single slide. Using experiments of the human coronavirus (HCoV) protein microarray and the E. coli proteome chip that helps study the immune response of Kawasaki disease as examples, we demonstrated that BAPCP can accelerate the time-consuming week-long manual biomarker identification process to merely 3 min. CONCLUSIONS: The developed BAPCP tool provides substantial analysis support for protein interaction studies and conforms to the necessity of expanding computer usage and exchanging information in bioscience and medicine. The web service of BAPCP is available at https://cosbi.ee.ncku.edu.tw/BAPCP/.
Subjects
Biomarkers , Protein Array Analysis , Software , Biomarkers/metabolism , Humans , Internet , Proteome , User-Computer Interface , Escherichia coli , Proteomics/methods , Computational Biology
ABSTRACT
Transcription regulation in multicellular species is mediated by modular transcription factor (TF) binding site combinations termed cis-regulatory modules (CRMs). Such CRM-mediated transcription regulation determines the gene expression patterns during development. Biologists frequently investigate how CRMs regulate gene expression. However, knowledge of the target genes and regulatory TFs participating in the CRMs under study is mostly fragmentary and scattered throughout the literature. Researchers must devote substantial human effort to reading through the articles deposited in biomedical literature databases in order to obtain this information. Although several text-mining systems are now available for literature triaging, these tools do not specifically focus on CRM-related literature prescreening and fail to correctly extract the CRM target genes and regulatory TFs from the literature. For this reason, we constructed an automated literature prescreener called Drosophila Modular transcription-regulation Literature Screener (DMLS) that (i) prescreens articles describing experiments on modular transcription regulation, (ii) identifies the described target genes and TFs of the CRMs under study for each such article, and (iii) features an automated and extendable pipeline to perform the task. In extracting the described target gene and regulatory TF lists of the CRMs under study for given articles, DMLS achieved a test macro area under the ROC curve (auROC) of 89.7% and an area under the precision-recall curve (auPRC) of 77.6%, outperforming the intuitive gene name-occurrence-counting method by at least 19.9% in auROC and 30.5% in auPRC. The web service and the command line versions of DMLS are available at https://cobis.bme.ncku.edu.tw/DMLS/ and https://github.com/cobisLab/DMLS/, respectively.
Database Tool URL: https://cobis.bme.ncku.edu.tw/DMLS/.
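For readers unfamiliar with the "macro auROC" metric quoted above, the following minimal sketch shows one common way to compute it. It assumes that "macro" means the unweighted mean of per-task (or per-class) auROC values; how DMLS actually groups its evaluation is not stated in the abstract, and the function names and toy data are hypothetical.

```python
# Minimal sketch: auROC via the Mann-Whitney formulation, then a macro average.
# Assumption: "macro" = unweighted mean over per-task auROC values.

def auroc(labels, scores):
    """Probability that a random positive outscores a random negative (ties = 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def macro_auroc(tasks):
    """Average auROC over a list of (labels, scores) pairs, one pair per task."""
    return sum(auroc(labels, scores) for labels, scores in tasks) / len(tasks)


# Toy data for two hypothetical tasks.
task_a = ([1, 0, 1, 0], [0.9, 0.2, 0.6, 0.7])
task_b = ([1, 1, 0, 0], [0.8, 0.7, 0.3, 0.1])
print(macro_auroc([task_a, task_b]))
```

Because every task contributes equally regardless of its size, a macro average does not let a single large task dominate the reported score.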
Subjects
Data Mining , Transcription Factors , Animals , Transcription Factors/genetics , Transcription Factors/metabolism , Data Mining/methods , Drosophila/genetics , Drosophila melanogaster/genetics , Genetic Databases , Gene Expression Regulation , Drosophila Proteins/genetics , Drosophila Proteins/metabolism
ABSTRACT
Background: Hyperglycemia affects the outcomes of endovascular therapy (EVT) for acute ischemic stroke (AIS). This study compares the predictive ability of diabetes status and glucose measures on EVT outcomes using nationwide registry data. Methods: The study included 1,097 AIS patients who underwent EVT from the Taiwan Registry of Endovascular Thrombectomy for Acute Ischemic Stroke. The variables analyzed included diabetes status, admission glucose, glycated hemoglobin (HbA1c), admission glucose-to-HbA1c ratio (GAR), and outcomes such as 90-day poor functional outcome (modified Rankin Scale score ≥ 2) and symptomatic intracranial hemorrhage (SICH). Multivariable analyses investigated the independent effects of diabetes status and glucose measures on outcomes. A receiver operating characteristic (ROC) analysis was performed to compare their predictive abilities. Results: The multivariable analysis showed that individuals with known diabetes had a higher likelihood of poor functional outcomes (odds ratios [ORs] 2.10 to 2.58) and SICH (ORs 3.28 to 4.30) compared to those without diabetes. Higher quartiles of admission glucose and GAR were associated with poor functional outcomes and SICH. Higher quartiles of HbA1c were significantly associated with poor functional outcomes. However, patients in the second HbA1c quartile (5.6-5.8%) showed a non-significant tendency toward good functional outcomes compared to those in the lowest quartile (<5.6%). The ROC analysis indicated that diabetes status and admission glucose had higher predictive abilities for poor functional outcomes, while admission glucose and GAR were better predictors for SICH. Conclusion: In AIS patients undergoing EVT, diabetes status, admission glucose, and GAR were associated with 90-day poor functional outcomes and SICH. Admission glucose was likely the most suitable glucose measure for predicting outcomes after EVT.
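The glucose measures compared in this abstract can be illustrated with a short sketch that computes the admission glucose-to-HbA1c ratio (GAR) and assigns a value to a quartile. The units (glucose in mg/dL, HbA1c in %) are an assumption, since the abstract does not state them. Only the first HbA1c cut point (5.6%) comes from the abstract; the second follows from reading "5.6-5.8%" at one-decimal precision, and the third is purely hypothetical.

```python
# Minimal sketch: compute GAR and map a measurement to a quartile.
# Assumptions: glucose in mg/dL, HbA1c in percent; cut points partly hypothetical.
from bisect import bisect_right


def glucose_to_hba1c_ratio(glucose_mg_dl, hba1c_percent):
    """Return admission glucose divided by HbA1c (GAR)."""
    if hba1c_percent <= 0:
        raise ValueError("HbA1c must be positive")
    return glucose_mg_dl / hba1c_percent


def quartile_of(value, cut_points):
    """Return 1..4 given three ascending cut points (lower bounds of Q2, Q3, Q4)."""
    return bisect_right(cut_points, value) + 1


# 5.6 is from the abstract; 5.9 assumes one-decimal reporting; 6.5 is hypothetical.
HBA1C_CUTS = [5.6, 5.9, 6.5]

gar = glucose_to_hba1c_ratio(140.0, 5.6)
print(round(gar, 1), quartile_of(5.7, HBA1C_CUTS))
```

In a real analysis the cut points would be the empirical quartiles of the cohort rather than fixed constants.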
ABSTRACT
It is now known that RNAs play more active roles in cellular pathways beyond simply serving as transcription templates. These biological mechanisms might be mediated by higher-order RNA conformations, which makes understanding RNA secondary structures a necessary first step. However, experimental protocols for solving RNA structures are impractical for large-scale investigation because they are costly and time-consuming. Various computational tools were thus developed to predict RNA secondary structures from sequences. Recently, deep networks have been investigated to predict RNA structures directly from their sequences. However, existing deep-learning-based tools suffer to varying degrees from model overfitting due to their complicated problem formulation and defective model training processes, limiting their applicability across sequences from different structural families. In this research, we designed a two-stage RNA structure prediction strategy called DEBFold (deep ensemble boosting and folding) based on convolution encoding/decoding and self-attention mechanisms to enhance the existing thermodynamic structure models. Moreover, the model training process followed rigorous steps to achieve acceptable prediction generalization. On the family-wise reserved test sets and the PDB-derived test set, DEBFold achieves better structure prediction performance than traditional tools and existing deep-learning methods. In summary, we obtained a deep-learning-based structure prediction tool with strong across-family generalization performance. The DEBFold tool can be accessed at https://cobis.bme.ncku.edu.tw/DEBFold/.
Subjects
Computational Biology , Deep Learning , Nucleic Acid Conformation , RNA , RNA/chemistry , Computational Biology/methods , Molecular Models , Thermodynamics , Base Sequence
ABSTRACT
PURPOSE: Clinical risk scores are essential for predicting outcomes in stroke patients. The advancements in deep learning (DL) techniques provide opportunities to develop prediction applications using magnetic resonance (MR) images. We aimed to develop an MR-based DL imaging biomarker for predicting outcomes in acute ischemic stroke (AIS) and evaluate its additional benefit to current risk scores. METHOD: This study included 3338 AIS patients. We trained a DL model using deep neural network architectures on MR images and radiomics to predict poor functional outcomes at three months post-stroke. The DL model generated a DL score, which served as the DL imaging biomarker. We compared the predictive performance of this biomarker to five risk scores on a holdout test set. Additionally, we assessed whether incorporating the imaging biomarker into the risk scores improved the predictive performance. RESULTS: The DL imaging biomarker achieved an area under the receiver operating characteristic curve (AUC) of 0.788. The AUCs of the five studied risk scores were 0.789, 0.793, 0.804, 0.810, and 0.826, respectively. The imaging biomarker's predictive performance was comparable to four of the risk scores but inferior to one (p = 0.038). Adding the imaging biomarker to the risk scores improved the AUCs (p-values) to 0.831 (0.003), 0.825 (0.001), 0.834 (0.003), 0.836 (0.003), and 0.839 (0.177), respectively. The net reclassification improvement and integrated discrimination improvement indices also showed significant improvements (all p < 0.001). CONCLUSIONS: Using DL techniques to create an MR-based imaging biomarker is feasible and enhances the predictive ability of current risk scores.
Subjects
Brain Ischemia , Deep Learning , Ischemic Stroke , Stroke , Humans , Brain Ischemia/diagnostic imaging , Stroke/diagnostic imaging , Magnetic Resonance Imaging , Biomarkers , Retrospective Studies
ABSTRACT
miRNAs (microRNAs) target specific mRNA (messenger RNA) sites to regulate their translation expression. Although miRNA targeting can rely on seed region base pairing, animal miRNAs, including human miRNAs, typically cooperate with several cofactors, leading to various noncanonical pairing rules. Therefore, identifying the binding sites of animal miRNAs remains challenging. Because experiments for mapping miRNA targets are costly, computational methods are preferred for extracting potential miRNA-mRNA fragment binding pairs first. However, existing prediction tools can have significant false positives due to the prevalent noncanonical miRNA binding behaviors and the information-biased training negative sets that were used while constructing these tools. To overcome these obstacles, we first prepared an information-balanced miRNA binding pair ground-truth data set. A miRNA-mRNA interaction-aware model was then designed to help identify miRNA binding events. On the test set, our model (auROC = 94.4%) outperformed existing models by at least 2.8% in auROC. Furthermore, we showed that this model can suggest potential binding patterns for miRNA-mRNA sequence interacting pairs. Finally, we made the prepared data sets and the designed model available at http://cosbi2.ee.ncku.edu.tw/mirna_binding/download.
Subjects
MicroRNAs , Animals , Humans , MicroRNAs/metabolism , Messenger RNA/genetics , Messenger RNA/metabolism , Algorithms , Computational Biology/methods
ABSTRACT
Identifying low-prevalence diseases, or rare diseases, in their early stages is key to their treatment. Deep learning techniques now provide promising tools for this purpose. Nevertheless, the low prevalence of rare diseases hampers the proper application of deep networks for disease identification because of the severe class-imbalance issue. In the past decades, several balancing methods have been studied to handle the data-imbalance issue. However, none of these methods has been shown to guarantee superior performance over the others. This performance variation creates the need for a systematic pipeline with a comprehensive software tool for enhancing deep-learning applications in rare disease identification. We reviewed the existing balancing schemes and summarized them into a systematic deep ensemble pipeline, implemented in a tool called RDDL, for handling the data-imbalance issue. Through two real case studies, we showed that rare disease identification could be improved with the RDDL pipeline tool by lessening the data-imbalance problem during model training. The RDDL pipeline tool is available at https://github.com/cobisLab/RDDL/.
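To make the class-imbalance problem concrete, the sketch below shows random oversampling, one of the simplest balancing schemes of the kind the abstract refers to. It is offered only as an illustration: the abstract does not say which schemes RDDL implements, and the function name and toy data are hypothetical.

```python
# Minimal sketch: random oversampling of minority classes to the majority size.
# Assumption: this is a generic balancing scheme, not necessarily one used by RDDL.
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen minority samples until all classes are equal in size."""
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    target = max(len(xs) for xs in by_class.values())
    out_samples, out_labels = [], []
    for y, xs in by_class.items():
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        out_samples.extend(xs + extra)
        out_labels.extend([y] * target)
    return out_samples, out_labels


# Toy data: nine common cases (label 0) and one rare case (label 1).
samples = [f"patient_{i}" for i in range(10)]
labels = [0] * 9 + [1]
balanced_samples, balanced_labels = random_oversample(samples, labels)
print(Counter(balanced_labels))
```

Oversampling should be applied to the training split only; duplicating rare cases before splitting would leak copies of the same patient into the test set.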
Subjects
Deep Learning , Humans , Rare Diseases , Software
ABSTRACT
This study aimed to investigate the mechanical performance of early-strength carbon fiber-reinforced concrete (ECFRC) incorporating original carbon fiber (OCF), recycled carbon fiber (RCF), and sizing-removed carbon fiber (SCF). Compressive, flexural, and splitting tensile strengths were tested at three fiber-to-cement weight ratios (5‱, 10‱, and 15‱). The RCF was produced from waste bicycle parts made of carbon fiber-reinforced polymer (CFRP) through microwave-assisted pyrolysis (MAP). The SCF was obtained by applying a heat treatment to the OCF. Scanning electron microscopy (SEM) analysis with energy dispersive X-ray spectrometry (EDS) indicated the successful removal of sizing and impurities from the surface of the RCF and SCF. The mechanical test results showed that ECFRC with a 10‱ fiber-to-cement weight ratio had the greatest improvement in mechanical strength. Moreover, the ECFRC with 10‱ RCF exhibited compressive, flexural, and splitting tensile strengths that were higher than those of the benchmark specimen by 14.2%, 56.5%, and 22.5%, respectively. The ECFRC specimens with a 10‱ fiber-to-cement weight ratio were then subjected to impact tests at various impact energies to analyze their impact resistance. At 50 joules of impact energy, the impact number of the ECFRC with SCF was over 23 times that of the benchmark specimen (early-strength concrete without fiber) and was also greater than that of ECFRC with OCF and RCF.
ABSTRACT
Metazoa gene expression is controlled by modular DNA segments called cis-regulatory modules (CRMs). CRMs can convey promoter/enhancer/insulator roles, generating additional regulation layers in transcription. Experiments for understanding CRM roles are low-throughput and costly. Large-scale CRM function investigation still depends on computational methods. However, existing in silico tools only recognize enhancers or promoters exclusively, thus accumulating errors when considering CRM promoter/enhancer/insulator roles altogether. Currently, no algorithm can concurrently consider these CRM roles. In this research, we developed the CRM Function Annotator (CFA) model. CFA provides complete CRM transcriptional role labeling based on epigenetic profiling interpretation. We demonstrated that CFA achieves high performance (test macro auROC/auPRC = 94.1%/90.3%) and outperforms existing tools in promoter/enhancer/insulator identification. CFA is also inspected to recognize explainable epigenetic codes consistent with previous findings when labeling CRM roles. By considering the higher-order combinations of the epigenetic codes, CFA significantly reduces false-positive rates in CRM transcriptional role annotation. CFA is available at https://github.com/cobisLab/CFA/.
Subjects
Deep Learning , Genetic Promoter Regions/genetics , Genetic Epigenesis/genetics
ABSTRACT
OBJECTIVES: To prevent aesthetic and functional deformities, precise closed reduction is crucial in the management of nasal fractures. Plain film radiography (PF), ultrasonography (USG), and computed tomography can help confirm the diagnosis and classification of fractures and assist in performing closed reduction. However, no study in the literature reports on precise closed reduction assisted by PF measurements under the picture archiving and communication system (PACS). METHODS: We retrospectively evaluated 153 patients with nasal bone fracture treated between January 2013 and December 2017. In 34 patients (group A), surgeons performed precise closed reduction assisted by PF measurement of the distance between the fracture site and the nasal tip under PACS. In the other 119 patients (group B), reduction was performed based on the surgeon's experience. RESULTS: No significant differences in age, gender, Arbeitsgemeinschaft für Osteosynthesefragen (AO) classification, or reduction outcome were observed between group A and group B (P > .05). The operative time of group A (12.50 ± 4.64 minutes) was significantly shorter than that of group B (23.78 ± 11.20 minutes; P < .001). After adjustment for age, gender, and AO classification, the operative time in group A was 10.46 minutes shorter than in group B (P < .001). In addition, the severity of nasal bone fracture (AO classification, β = 3.37, P = .002) was positively associated with the operative time. CONCLUSIONS: In this study, closed reduction of nasal bone fracture assisted by PF measurements under PACS was performed precisely, thereby effectively decreasing operative time and the occurrence of complications. This procedure requires neither new instruments or a C-arm nor experience with USG or navigation. Moreover, reduction can be easily performed using this method, which requires a short operative time, helps achieve good reduction, involves less radiation exposure, and is cost-effective.
Subjects
Closed Reduction , Bone Fractures , Nasal Bone , Nasal Bone/diagnostic imaging , Nasal Bone/injuries , Nasal Bone/surgery , Humans , Bone Fractures/diagnostic imaging , Bone Fractures/surgery , Radiology Information Systems , Retrospective Studies , Male , Female , Adult , Operative Time , Treatment Outcome
ABSTRACT
Comparative analysis of the functional features of multiple gene lists is now a routine task owing to the advancement of high-throughput experiments. Several enrichment analysis tools were developed in the past. However, these tools mainly focus on one gene list and contain only gene ontology or interaction features. Moreover, comparative investigation and customized feature set reanalysis are still unavailable. Therefore, we constructed the YMLA (Yeast Multiple List Analyzer) platform in this research. YMLA includes 39 yeast features and facilitates comparative analysis among multiple gene lists via tabular views, heatmaps, and network plots. In addition, a customized feature set reanalysis function was implemented in YMLA to help form mechanism hypotheses based on a selected enriched feature subset. We demonstrated the biological applicability of YMLA via example lists consisting of genes with top/bottom translation efficiency values. The analysis results provided by YMLA reveal novel facts consistent with previous experiments. YMLA is available at https://cosbi7.ee.ncku.edu.tw/YMLA/.
Subjects
Saccharomyces cerevisiae , Software , Saccharomyces cerevisiae/genetics
ABSTRACT
Cells adapt to environmental stresses mainly via transcription reprogramming. Correct transcription control is mediated by the interactions between transcription factors (TFs) and their target genes. These TF-gene associations can be probed by chromatin immunoprecipitation techniques and knockout experiments, revealing TF binding (TFB) and regulatory (TFR) evidence, respectively. Nevertheless, most evidence is still fragmentary in the literature and requires tremendous human resources to curate. We developed the first pipeline, called YTLR (Yeast Transcription-regulation Literature Reader), to automate TF-gene relation extraction from the literature. YTLR first identifies articles with TFB and TFR information. TF-gene binding pairs are then extracted from the TFB articles, and TF-gene regulatory associations are recognized from the TFR articles. On gathered test sets, YTLR achieves an AUC of 98.8% in identifying articles with TFB evidence and an AUC of 83.4% in extracting the detailed TF-gene binding pairs. Similarly, YTLR obtains an AUC of 98.2% in identifying TFR articles and an AUC of 80.4% in extracting the detailed TF-gene regulatory associations. Furthermore, YTLR outperforms previous methods in both tasks. To help researchers extract TF-gene transcriptional relations from large sets of queried articles, we constructed an automated and easy-to-use software tool based on the YTLR pipeline. In summary, YTLR aims to provide easier literature pre-screening for curators and help researchers gather yeast TF-gene transcriptional relation conclusions from articles in a high-throughput fashion. The YTLR pipeline software tool can be downloaded at https://github.com/cobisLab/YTLR/.
ABSTRACT
RNA secondary structures can carry out essential cellular functions alone or interact with one another to form hierarchical tertiary structures. Experimental structure identification approaches can show the in vitro structures of RNA molecules. However, they are usually limited in resolution and are costly. In silico structure prediction tools are thus primarily relied on for pre-experiment analysis. Various structure prediction models have been developed over the decades. Since these tools are usually used before the actual RNA structures are known, evaluating and ranking the many secondary structure predictions of a given sequence is essential in computational analysis. In this research, we implemented a web service called SSRTool (RNA Secondary Structure prediction Ranking Tool) to assist in the ranking and evaluation of the predicted structures generated for a given sequence. Based on the computed species-specific interpretability significance in four common RNA structure-function aspects, SSRTool provides three functions along with visualization interfaces: (1) Rank user-generated predictions. (2) Provide an automated pipeline of structure prediction and ranking for a given sequence. (3) Infer the functional aspects of a given structure. We demonstrated the applicability of SSRTool via real case studies and reported similar trends between the computed species-specific rankings and the corresponding prediction F1 values. The SSRTool web service is available online at https://cobisHSS0.im.nuk.edu.tw/SSRTool/, http://cosbi3.ee.ncku.edu.tw/SSRTool/, or the redirecting site https://github.com/cobisLab/SSRTool/.
ABSTRACT
Kawasaki disease (KD) is a form of acute systemic vasculitis that primarily affects children and has become the most common cause of acquired heart disease. While the etiopathogenesis of KD remains unknown, the diagnostic criteria of KD have been well established. Nevertheless, the diagnosis of KD is currently based on subjective clinical symptoms, and no molecular biomarker is yet available. We have previously performed and combined methylation array (Illumina HumanMethylation450 BeadChip) and transcriptome array (Affymetrix GeneChip Human Transcriptome Array 2.0) to identify genes that are differentially methylated/expressed in KD patients compared with control subjects. We have found that decreased methylation levels combined with elevated gene expression can indicate genes (e.g., toll-like receptors and CD177) involved in the disease mechanisms of KD. In this study, we constructed a database called KDmarkers to allow researchers to access these valuable potential KD biomarkers identified via methylation array and transcriptome array. KDmarkers provides three search modes. First, users can search genes differentially methylated and/or differentially expressed in KD patients compared with control subjects. Second, users can check the KD patient groups in which a given gene is differentially methylated and/or differentially expressed. Third, users can explore the DNA methylation levels and gene expression levels in all samples (KD patients and controls) for a particular gene of interest. We further demonstrated that the results in KDmarkers are strongly associated with KD immune responses. All analysis results can be downloaded for downstream experimental designs. KDmarkers is available online at https://cosbi.ee.ncku.edu.tw/KDmarkers/.
ABSTRACT
Transcription regulation in metazoa is controlled by the binding events of transcription factors (TFs) or regulatory proteins on specific modular DNA regulatory sequences called cis-regulatory modules (CRMs). Understanding the distributions of CRMs on a genomic scale is essential for constructing the metazoan transcriptional regulatory networks that help diagnose genetic disorders. While traditional reporter-assay CRM identification approaches can provide an in-depth understanding of the functions of some CRMs, these methods are usually cost-inefficient and low-throughput. It is generally believed that reliable CRM predictions can be made by integrating diverse genomic data. Hence, researchers often first resort to computational algorithms for genome-wide CRM screening before specific experiments. However, existing in silico methods for searching potential CRMs are restricted by low sensitivity, poor prediction accuracy, or high computation time arising from the combinatorial complexity of TFBS composition. To overcome these obstacles, we designed a novel CRM identification pipeline called regCNN by considering the base-by-base local patterns in TF binding motifs and epigenetic profiles. On the test set, regCNN shows an accuracy/auROC of 84.5%/92.5% in CRM identification. By further considering local patterns in epigenetic profiles and TF binding motifs, it achieves a 4.7% (92.5%-87.8%) improvement in the auROC value over the average value-based pure multi-layer perceptron model. We also demonstrated that regCNN outperforms all currently available tools by at least 11.3% in auROC values. Finally, regCNN is verified to be robust against its resizing window hyperparameter in dealing with the variable lengths of CRMs. The model of regCNN can be downloaded at http://cobisHSS0.im.nuk.edu.tw/regCNN/.
ABSTRACT
RNA can provide vital cellular functions through its secondary or tertiary structure. Due to the low-throughput nature of experimental approaches, studies on RNA structures mainly resort to computational methods. However, current existing tools fail to consider RNA structure ensembles and do not provide ways to decipher functional hypotheses for the new predictions. In this research, a novel method was proposed to identify the functionally interpretable structure ensemble of a given RNA sequence and provide the meta-stable structure, or the most frequently observed functional RNA cellular conformation, based on the ensemble. In the prediction of meta-stable structures, the proposed method outperformed existing tools on a yeast test set. The inferred functional aspects were then manually checked and demonstrated a micro-averaging F1 value of 0.92. Further, a biological example of the yeast ASH1-E1 element was discussed to articulate that these functional aspects can also suggest testable hypotheses. Then the proposed method was verified to be well applicable to other species through a human test set. Finally, the proposed method was demonstrated to show resistance to sequence length-dependent performance deterioration.
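The "micro-averaging F1" reported for the manually checked functional aspects can be illustrated as follows. This is a minimal sketch under the assumption that each RNA may carry several functional-aspect labels at once (a multi-label setting); the aspect names and toy annotations are hypothetical and do not come from the paper.

```python
# Minimal sketch: micro-averaged F1 for multi-label predictions.
# Assumption: each item has a set of true labels and a set of predicted labels.

def micro_f1(true_sets, pred_sets):
    """Pool true positives, false positives and false negatives over all items."""
    tp = fp = fn = 0
    for truth, pred in zip(true_sets, pred_sets):
        tp += len(truth & pred)
        fp += len(pred - truth)
        fn += len(truth - pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Toy annotations for three hypothetical RNAs.
true_aspects = [{"aspect_1", "aspect_2"}, {"aspect_3"}, {"aspect_1"}]
pred_aspects = [{"aspect_1"}, {"aspect_3", "aspect_4"}, {"aspect_1"}]
score = micro_f1(true_aspects, pred_aspects)
print(score)
```

Micro-averaging pools the counts before computing F1, so frequent aspects weigh more heavily than rare ones; a macro average would instead compute F1 per aspect and then average.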
Subjects
Algorithms , RNA , Computational Biology , Humans , Nucleic Acid Conformation , Protein Secondary Structure , RNA/genetics
ABSTRACT
BACKGROUND: Piwi-interacting RNAs (piRNAs) are small non-coding RNAs (ncRNAs) that silence genomic transposable elements. Researchers have also found that piRNAs regulate various endogenous transcripts. However, there is no systematic understanding of piRNA binding patterns or of how piRNAs target genes. While various prediction methods have been developed for other similar ncRNAs (e.g., miRNAs), piRNAs have distinctive characteristics and require their own computational model for binding target prediction. RESULTS: Recently, transcriptome-wide piRNA binding events in C. elegans were probed by PRG-1 CLASH experiments. Based on the probed piRNA-messenger RNA (mRNA) binding pairs, we devised the first deep learning architecture based on multi-head attention to computationally identify piRNA targeting sites on mRNAs. In the devised deep network, the given piRNA and mRNA segment sequences are first one-hot encoded and then undergo a combined operation of convolution and squeezing-extraction to unravel motif patterns. We incorporate a novel multi-head attention sub-network to extract the hidden piRNA binding rules that can simulate the biological piRNA target recognition process. Finally, the true piRNA-mRNA binding pairs are identified by a deep fully connected sub-network. Our model obtains a high discriminatory power, with an AUC of 93.3% on an independent test set, and successfully extracts the verified binding pattern of a synthetic piRNA. These results demonstrate that the devised model achieves high prediction performance and suggests testable potential biological piRNA binding rules. CONCLUSIONS: In this research, we developed the first deep learning method to identify piRNA targeting sites on C. elegans mRNAs. The developed deep learning method is demonstrated to be highly accurate and can provide biological insights into piRNA-mRNA binding patterns.
The piRNA binding target identification network can be downloaded from http://cosbi2.ee.ncku.edu.tw/data_download/piRNA_mRNA_binding .
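The first step of the network described above, one-hot encoding of the piRNA and mRNA segment sequences, can be sketched as follows. The alphabet order (A, C, G, U), the conversion of T to U, and the all-zero row for unknown bases are assumptions made for illustration; the encoding used by the authors may differ.

```python
# Minimal sketch: one-hot encode an RNA sequence as a list of 4-element rows.
# Assumptions: column order A, C, G, U; T is read as U; unknown bases become zeros.
ALPHABET = "ACGU"


def one_hot(sequence):
    """Return one row per nucleotide with a single 1 in the matching column."""
    rows = []
    for base in sequence.upper().replace("T", "U"):
        rows.append([1 if base == symbol else 0 for symbol in ALPHABET])
    return rows


encoded = one_hot("AUGN")
for base, row in zip("AUGN", encoded):
    print(base, row)
```

A matrix of this shape (sequence length by four) is the usual input to a convolutional layer, whose filters can then respond to short motif-like patterns along the sequence.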
Subjects
Caenorhabditis elegans Proteins , MicroRNAs , Animals , Argonaute Proteins , Caenorhabditis elegans/genetics , Caenorhabditis elegans Proteins/genetics , DNA Transposable Elements , Messenger RNA/genetics , Small Interfering RNA/genetics
ABSTRACT
Transcript isoforms regulated by alternative splicing can substantially impact carcinogenesis, leading to a need to obtain clues for both gene differential expression and malfunctions of isoform distributions in cancer studies. The Cancer Genome Atlas (TCGA) project was launched in 2008 to collect cancer-related genome mutation raw data from the population. While many repositories tried to add insights into the raw data in TCGA, no existing database provides both comprehensive gene-level and isoform-level cancer stage marker investigation and survival analysis. We constructed Cancer DEIso to facilitate in-depth analyses for both gene-level and isoform-level human cancer studies. Patient RNA-seq data, sample sheets, patient clinical data, and human genome datasets were collected and processed in Cancer DEIso. And four functions to search differentially expressed genes/isoforms between cancer stages were implemented: (i) Search potential gene/isoform markers for a specified cancer type and its two stages; (ii) Search potentially induced cancer types and stages for a gene/isoform; (iii) Expression survival analysis on a given gene/isoform for some cancer; (iv) Gene/isoform stage expression comparison visualization. As an example, we demonstrate that Cancer DEIso can indicate potential colorectal cancer isoform diagnostic markers that are not easily detected when only gene-level expressions are considered. Cancer DEIso is available at http://cosbi4.ee.ncku.edu.tw/DEIso/.