1.
Interdiscip Sci ; 2024 Apr 29.
Article in English | MEDLINE | ID: mdl-38683279

ABSTRACT

The structures of fentanyl and its analogues are easily modified, and few analogues have been included in databases so far, which allows criminals to evade regulatory supervision. This paper introduces a molecular graph-based transformer model, combined with a data augmentation method based on substructure replacement, to generate novel fentanyl analogues. In total, 140,000 molecules were generated, and after a series of screening steps, 36,799 potential fentanyl analogues were obtained. We calculated the molecular properties of these 36,799 candidates, and the results showed that the model could learn some properties of the original fentanyl molecules. We compared the molecules generated by the transformer model with data augmentation against those generated by two other deep learning-based molecular generation models, and found that the proposed model generates more novel potential fentanyl analogues. Overall, the findings indicate that a molecular graph-based transformer model helps to explore the structures of potential fentanyl molecules and to understand the distribution of the original fentanyl molecules.

2.
J Theor Biol ; 571: 111538, 2023 08 21.
Article in English | MEDLINE | ID: mdl-37257720

ABSTRACT

The gut microbial community has been shown to play a significant role in various diseases, including colorectal cancer (CRC), which is a major public health concern worldwide. The accurate diagnosis and etiological analysis of CRC are crucial issues. Numerous methods have utilized gut microbiota to address these challenges; however, few have considered the complex interactions and individual heterogeneity of the gut microbiota, which are important issues in genetics and intestinal microbiology, particularly in high-dimensional cases. This paper presents a novel method called Binary matrix based on Logistic Regression (LRBmat) to address these concerns. The binary matrix in LRBmat can directly mitigate or eliminate the influence of heterogeneity, while also capturing information on gut microbial interactions with any order. LRBmat is highly adaptable and can be combined with any machine learning method to enhance its capabilities. The proposed method was evaluated using real CRC data and demonstrated superior classification performance compared to state-of-the-art methods. Furthermore, the association rules extracted from the binary matrix of the real data align well with biological properties and existing literature, thereby aiding in the etiological analysis of CRC.


Subjects
Colorectal Neoplasms , Gastrointestinal Microbiome , Microbiota , Humans , Microbial Interactions
3.
Bioinformatics ; 39(1)2023 01 01.
Article in English | MEDLINE | ID: mdl-36525367

ABSTRACT

SUMMARY: Non-coding RNAs, in particular miRNAs and lncRNAs, play important roles in transcriptional processes and participate in the regulation of various biological functions. Despite this importance, existing signaling pathway databases do not include information on miRNAs and lncRNAs. Here, we redesigned a pathway database, named NcPath, by integrating and visualizing a total of 178 308 human experimentally validated miRNA-target interactions (MTIs), 32 282 experimentally verified lncRNA-target interactions (LTIs) and 4837 experimentally validated human ceRNA networks across 222 KEGG pathways (including 27 sub-categories). To expand the application potential of NcPath, we identified 556 798 reliable lncRNA-protein-coding gene (PCG) interaction pairs by integrating co-expression relations, ceRNA relations, co-TF-binding interactions, co-histone-modification interactions, cis-regulation relations and lncPro Tool predictions between lncRNAs and PCGs. In addition, to determine the pathways in which miRNA/lncRNA targets are involved, we performed a KEGG enrichment analysis using a hypergeometric test. The NcPath database also provides information on MTIs/LTIs/ceRNA networks, PubMed IDs, gene annotations and the experimental verification method used. In summary, the NcPath database will serve as an important and continually updated platform that provides annotation and visualization of the pathways in which non-coding RNAs (miRNA and lncRNA) are involved, and supports multimodal non-coding RNA enrichment analysis. AVAILABILITY AND IMPLEMENTATION: The NcPath database is freely available at http://ncpath.pianlab.cn/. The code and manual to use NcPath can be found at https://github.com/Marscolono/NcPath/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
MicroRNAs , RNA, Long Noncoding , Humans , RNA, Long Noncoding/genetics , RNA, Long Noncoding/metabolism , RNA, Messenger/metabolism , Gene Regulatory Networks , MicroRNAs/genetics , MicroRNAs/metabolism , Signal Transduction
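
Note on the enrichment step in entry 3: the abstract names a hypergeometric test for KEGG enrichment but gives no implementation details. The following is a minimal sketch of the standard upper-tail test, assuming a background of N genes, K of which belong to the pathway, and a target list of n genes containing k pathway members. The function name and the example numbers are illustrative, not taken from NcPath.

```python
from math import comb


def hypergeom_enrichment_p(N, K, n, k):
    """Upper-tail hypergeometric p-value P(X >= k).

    N: genes in the background (assumed universe)
    K: background genes annotated to the pathway
    n: genes in the target list (e.g. targets of one miRNA)
    k: target-list genes that fall in the pathway
    """
    upper = min(K, n)
    total = comb(N, n)
    # Sum the probability of observing k, k+1, ..., upper pathway hits.
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total


# Toy numbers only: 10 background genes, 4 in the pathway,
# 3 targets of which 2 are pathway members.
p = hypergeom_enrichment_p(N=10, K=4, n=3, k=2)
print(p)
```

In practice the p-values would be computed for every pathway and then corrected for multiple testing; the abstract does not say which correction, if any, NcPath applies.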
4.
Brief Bioinform ; 23(6)2022 11 19.
Article in English | MEDLINE | ID: mdl-36184256

ABSTRACT

Fentanyl and its analogues are psychoactive substances, and concern about fentanyl abuse has existed for decades. Because the structure of fentanyl is easily modified, criminals may synthesize new fentanyl analogues to evade supervision. Drug supervision relies on matching structures against a database, and too few kinds of fentanyl analogues are included in the database, so it is necessary to identify more potential fentanyl analogues and expand their sample space. In this study, we introduced two deep generative models (SeqGAN and MolGPT) to generate potential fentanyl analogues, and a total of 11 041 valid molecules were obtained. The results showed that the generated molecules not only follow a property distribution similar to that of the original data, but also include potential fentanyl analogues that are not closely similar to any of the original molecules. Ten molecules that conform to the rules for fentanyl analogues were selected for NMR, MS and IR validation. The results indicated that these molecules are all unreported fentanyl analogues. Furthermore, this study is the first to apply deep learning to the generation of fentanyl analogues; it greatly expands the explorable space of fentanyl analogues and supports the supervision of fentanyl.


Subjects
Deep Learning , Fentanyl , Fentanyl/chemistry , Analgesics, Opioid/chemistry , Magnetic Resonance Spectroscopy , Data Management
5.
Brief Bioinform ; 23(6)2022 11 19.
Article in English | MEDLINE | ID: mdl-36155619

ABSTRACT

Identification of transcription factor binding sites (TFBSs) is essential to understanding of gene regulation. Designing computational models for accurate prediction of TFBSs is crucial because it is not feasible to experimentally assay all transcription factors (TFs) in all sequenced eukaryotic genomes. Although many methods have been proposed for the identification of TFBSs in humans, methods designed for plants are comparatively underdeveloped. Here, we present PlantBind, a method for integrated prediction and interpretation of TFBSs based on DNA sequences and DNA shape profiles. Built on an attention-based multi-label deep learning framework, PlantBind not only simultaneously predicts the potential binding sites of 315 TFs, but also identifies the motifs bound by transcription factors. During the training process, this model revealed a strong similarity among TF family members with respect to target binding sequences. Trans-species prediction performance using four Zea mays TFs demonstrated the suitability of this model for transfer learning. Overall, this study provides an effective solution for identifying plant TFBSs, which will promote greater understanding of transcriptional regulatory mechanisms in plants.


Subjects
Gene Expression Regulation , Transcription Factors , Humans , Binding Sites , Protein Binding , Transcription Factors/metabolism , Neural Networks, Computer
6.
Brief Bioinform ; 23(2)2022 03 10.
Article in English | MEDLINE | ID: mdl-35183063

ABSTRACT

Subcellular localization of microRNAs (miRNAs) is an important reflection of their biological functions. Considering the spatio-temporal specificity of miRNA subcellular localization, experimental detection techniques are expensive and time-consuming, which strongly motivates an efficient and economical computational method to predict miRNA subcellular localization. In this paper, we describe a computational framework, MiRLoc, to predict the subcellular localization of miRNAs. In contrast to existing methods, MiRLoc uses the functional similarity between miRNAs instead of sequence features and incorporates information about the subcellular localization of the corresponding target mRNAs. The results show that miRNA functional similarity data can be effectively used to predict miRNA subcellular localization, and that inclusion of subcellular localization information of target mRNAs greatly improves prediction performance.


Subjects
MicroRNAs , Algorithms , Computational Biology/methods , MicroRNAs/genetics , RNA, Messenger/genetics
7.
Brief Bioinform ; 23(2)2022 03 10.
Article in English | MEDLINE | ID: mdl-35189635

ABSTRACT

Protein lysine crotonylation (Kcr) is an important type of posttranslational modification that is associated with a wide range of biological processes. The identification of Kcr sites is critical to better understanding their functional mechanisms. However, the existing experimental techniques for detecting Kcr sites are not cost-effective, creating a great need for new computational methods to address this problem. We here describe Adapt-Kcr, an advanced deep learning model that utilizes adaptive embedding and is based on a convolutional neural network together with a bidirectional long short-term memory network and attention architecture. On the independent testing set, Adapt-Kcr outperformed the current state-of-the-art Kcr prediction model, with an improvement of 3.2% in accuracy and 1.9% in the area under the receiver operating characteristic curve. Compared to other Kcr models, Adapt-Kcr additionally had a more robust ability to distinguish between crotonylation and other lysine modifications. Another model (Adapt-ST) was trained to predict phosphorylation sites in SARS-CoV-2, and outperformed the equivalent state-of-the-art phosphorylation site prediction model. These results indicate that self-adaptive embedding features perform better than handcrafted features in capturing discriminative information; when used in an attention architecture, this could be an effective way of identifying protein Kcr sites. Together, our Adapt framework (including learned embedding features and attention architecture) has strong potential for the prediction of other protein posttranslational modification sites.


Subjects
Computational Biology , Deep Learning , Lysine/metabolism , Protein Processing, Post-Translational , Software , Algorithms , Benchmarking , Computational Biology/methods , Computational Biology/standards , Databases, Factual , Neural Networks, Computer , Phosphorylation , ROC Curve , Reproducibility of Results , User-Computer Interface
8.
Biomedicines ; 9(11)2021 Oct 20.
Article in English | MEDLINE | ID: mdl-34829731

ABSTRACT

The occurrence of cancer is closely related to the deregulation of certain pathways. Based on pathway deregulation scores (PDS) inferred by the Pathifier algorithm, we analyzed transcriptomic data of 13 different cancer types in The Cancer Genome Atlas database to identify cancer-specific deregulated pathways and prognostic pathways. The results showed that the individual-specific pathway deregulation scores can clearly distinguish different cancer types and their tumor-adjacent tissues. In addition, the cancer-specific deregulated pathways and prognostic pathways of different cancer types had high heterogeneity, and the identified cancer prognostic pathways have been reported to be closely related to the corresponding cancers. Furthermore, we also found that cancers with more deregulation pathways tend to be malignant and have worse prognoses. Finally, a Cox proportional Hazards model was constructed based on the prognostic pathways; this model successfully predicted survival and prognosis based on data from cancer samples. In addition, the performance of the breast cancer prognostic model was validated with an independent data set in the METABRIC database. Therefore, the prognostic pathways we identified have the potential to become targets for the treatment of cancer.

9.
PeerJ ; 9: e11426, 2021.
Article in English | MEDLINE | ID: mdl-34055486

ABSTRACT

Long non-coding RNA (lncRNA)-microRNA (miRNA) interactions are quickly emerging as important mechanisms underlying the functions of non-coding RNAs. Accordingly, predicting lncRNA-miRNA interactions provides an important basis for understanding the mechanisms of action of ncRNAs. However, the accuracy of the established prediction methods is still limited. In this study, we used structural consistency to measure the predictability of interactive links based on a bilayer network by integrating information for known lncRNA-miRNA interactions, an lncRNA similarity network, and an miRNA similarity network. In particular, by using the structural perturbation method, we proposed a framework called SPMLMI to predict potential lncRNA-miRNA interactions based on the bilayer network. We found that the structural consistency of the bilayer network was higher than that of any single network, supporting the utility of bilayer network construction for the prediction of lncRNA-miRNA interactions. Applying SPMLMI to three real datasets, we obtained areas under the curves of 0.9512 ± 0.0034, 0.8767 ± 0.0033, and 0.8653 ± 0.0021 based on 5-fold cross-validation, suggesting good model performance. In addition, the generalizability of SPMLMI was better than that of the previously established methods. Case studies of two lncRNAs (i.e., SNHG14 and MALAT1) further demonstrated the feasibility and effectiveness of the method. Therefore, SPMLMI is a feasible approach to identify novel lncRNA-miRNA interactions underlying complex biological processes.

10.
Front Genet ; 12: 650803, 2021.
Article in English | MEDLINE | ID: mdl-33815484

ABSTRACT

N6-methyladenosine (m6A), the most common posttranscriptional modification in eukaryotic mRNAs, plays an important role in mRNA splicing, editing, stability, degradation, etc. Since the methylation state is dynamic, methylation sequencing needs to be carried out over different time periods, which makes it difficult to identify RNA methyladenine sites. Thus, it is necessary to develop a fast and accurate method to identify RNA N6-methyladenosine sites in the transcriptome. In this study, we use first-order and second-order Markov models to identify RNA N6-methyladenine sites in three species (Saccharomyces cerevisiae, mouse, and Homo sapiens). Both models account for the correlation between adjacent nucleotides. The results show that the performance of our method is better than that of other existing methods. Furthermore, codons, which consist of three nucleotides, show usage biases in mRNA, and a second-order Markov model can capture exactly this kind of information. This may be the main reason why the second-order Markov model performs better than the first-order Markov model in the m6A prediction problem. In addition, we provide a corresponding web tool called MM-m6APred.

11.
Genes (Basel) ; 12(2)2021 01 28.
Article in English | MEDLINE | ID: mdl-33525573

ABSTRACT

In genome-wide association studies, detecting high-order epistasis is important for analyzing the occurrence of complex human diseases and explaining missing heritability. However, there are various challenges in the actual high-order epistasis detection process due to the large amount of data, "small sample size problem", diversity of disease models, etc. This paper proposes a multi-objective genetic algorithm (EpiMOGA) for single nucleotide polymorphism (SNP) epistasis detection. The K2 score based on the Bayesian network criterion and the Gini index of the diversity of the binary classification problem were used to guide the search process of the genetic algorithm. Experiments were performed on 26 simulated datasets of different models and a real Alzheimer's disease dataset. The results indicated that EpiMOGA was obviously superior to other related and competitive methods in both detection efficiency and accuracy, especially for small-sample-size datasets, and the performance of EpiMOGA remained stable across datasets of different disease models. At the same time, a number of SNP loci and 2-order epistasis associated with Alzheimer's disease were identified by the EpiMOGA method, indicating that this method is capable of identifying high-order epistasis from genome-wide data and can be applied in the study of complex diseases.


Subjects
Epistasis, Genetic/genetics , Genome-Wide Association Study/statistics & numerical data , Genome/genetics , Algorithms , Bayes Theorem , Humans , Models, Genetic , Polymorphism, Single Nucleotide/genetics
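
Note on the Gini objective in entry 11: the abstract says a Gini index guides the genetic algorithm but does not give its exact form. The sketch below shows one common reading, a sample-weighted Gini impurity of case/control labels within each genotype combination of a candidate SNP set. The function name, the 0/1/2 genotype coding and the toy data are assumptions for illustration, not EpiMOGA's implementation.

```python
from collections import Counter, defaultdict


def gini_score(genotypes, labels):
    """Weighted Gini impurity of labels across genotype combinations.

    genotypes: one tuple per sample with the genotype codes (assumed 0/1/2)
               of the candidate SNPs
    labels:    one label per sample (assumed 0 = control, 1 = case)
    Lower values mean the SNP combination separates cases from controls better.
    """
    cells = defaultdict(Counter)
    for genotype, label in zip(genotypes, labels):
        cells[tuple(genotype)][label] += 1

    n = len(labels)
    score = 0.0
    for counts in cells.values():
        m = sum(counts.values())
        impurity = 1.0 - sum((c / m) ** 2 for c in counts.values())
        score += (m / n) * impurity  # weight each cell by its share of samples
    return score


# Toy data: two SNPs, four samples.
snps = [(0, 0), (0, 0), (1, 2), (1, 2)]
print(gini_score(snps, [0, 0, 1, 1]))  # genotype cells are pure
print(gini_score(snps, [0, 1, 0, 1]))  # genotype cells are evenly mixed
```

In a multi-objective search this value would be minimized alongside a second criterion such as the K2 score mentioned in the abstract.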
12.
PLoS Comput Biol ; 17(2): e1008767, 2021 02.
Article in English | MEDLINE | ID: mdl-33600435

ABSTRACT

N6-methyladenine (6mA) is an important form of DNA modification associated with a wide range of biological processes. Accurately identifying 6mA sites on a genomic scale is crucial for understanding the biological functions of 6mA. However, the existing experimental techniques for detecting 6mA sites are not cost-effective, which creates a great need for new computational methods. In this paper, we developed a deep learning framework named Deep6mA to identify DNA 6mA sites without requiring any prior knowledge of 6mA or manually crafted sequence features; its performance is superior to that of other DNA 6mA prediction tools. Specifically, 5-fold cross-validation on a benchmark dataset of rice gives a sensitivity and specificity for Deep6mA of 92.96% and 95.06%, respectively, and an overall prediction accuracy of 94%. Importantly, we find that sequences with 6mA sites share similar patterns across different species. The model trained with rice data predicts the 6mA sites of three other species (Arabidopsis thaliana, Fragaria vesca and Rosa chinensis) with a prediction accuracy of over 90%. In addition, we find that (1) 6mA tends to occur at GAGG motifs, which suggests that the sequence near the 6mA site may be conserved; and (2) 6mA is enriched in the TATA box of the promoter, which may be the main way in which it regulates downstream gene expression.


Subjects
Adenine/analogs & derivatives , DNA Methylation , DNA/genetics , DNA/metabolism , Deep Learning , Adenine/metabolism , Arabidopsis/genetics , Arabidopsis/metabolism , Base Sequence , Binding Sites/genetics , Computational Biology , DNA, Plant/genetics , DNA, Plant/metabolism , Databases, Nucleic Acid , Fragaria/genetics , Fragaria/metabolism , Neural Networks, Computer , Oryza/genetics , Oryza/metabolism , Rosa/genetics , Rosa/metabolism , Species Specificity
13.
BMC Bioinformatics ; 22(1): 27, 2021 Jan 22.
Article in English | MEDLINE | ID: mdl-33482718

ABSTRACT

BACKGROUND: Large-scale gene expression profiling has been successfully applied to the discovery of functional connections among diseases, genetic perturbation, and drug action. To address the cost of ever-expanding gene expression profiles, a new, low-cost, high-throughput reduced representation expression profiling method called L1000 was proposed, with which one million profiles were produced. Although a set of ~1000 carefully chosen landmark genes that can capture ~80% of the information from the whole genome has been identified for use in L1000, the robustness of using these landmark genes to infer target genes is not satisfactory. Therefore, more efficient computational methods are still needed to mine the influential genes in the genome more deeply. RESULTS: Here, we propose a computational framework based on deep learning to mine a subset of genes that can cover more genomic information. Specifically, an AutoEncoder framework is first constructed to learn the non-linear relationships between genes, and then DeepLIFT is applied to calculate gene importance scores. Using this data-driven approach, we obtained a new landmark gene set. The results show that our landmark genes predict target genes more accurately and robustly than those of L1000 on two metrics [mean absolute error (MAE) and Pearson correlation coefficient (PCC)]. This indicates that the landmark genes detected by our method contain more genomic information. CONCLUSIONS: We believe that our proposed framework is well suited to the analysis of biological big data. Furthermore, the landmark genes inferred in this study can be used to greatly expand the number of gene expression profiles and so facilitate research into functional connections.


Subjects
Deep Learning , Gene Expression Profiling , Genomics , Genome , Transcriptome
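
Note on the metrics in entry 13: the abstract compares landmark gene sets by mean absolute error (MAE) and Pearson correlation coefficient (PCC) between predicted and measured target-gene expression. A minimal sketch of both metrics using their textbook definitions follows; the function names and the toy expression values are illustrative and not taken from the paper.

```python
from math import sqrt


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between measured and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def pearson_correlation(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


# Toy values standing in for one target gene across five samples.
measured = [2.0, 3.5, 1.0, 4.0, 2.5]
predicted = [2.2, 3.0, 1.4, 3.8, 2.9]
print(mean_absolute_error(measured, predicted))
print(pearson_correlation(measured, predicted))
```

A lower MAE and a higher PCC both indicate that the landmark genes recover target-gene expression more faithfully.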
14.
Genes (Basel) ; 11(11)2020 10 29.
Article in English | MEDLINE | ID: mdl-33138076

ABSTRACT

Identifying perturbed pathways at an individual level is important to discover the causes of cancer and develop individualized custom therapeutic strategies. Though prognostic gene lists have had success in prognosis prediction, using single genes that are related to the relevant system or specific network cannot fully reveal the process of tumorigenesis. We hypothesize that in individual samples, the disruption of transcription homeostasis can influence the occurrence, development, and metastasis of tumors and has implications for patient survival outcomes. Here, we introduced the individual-level pathway score, which can measure the correlation perturbation of the pathways in a single sample well. We applied this method to the expression data of 16 different cancer types from The Cancer Genome Atlas (TCGA) database. Our results indicate that different cancer types as well as their tumor-adjacent tissues can be clearly distinguished by the individual-level pathway score. Additionally, we found that there was strong heterogeneity among different cancer types and the percentage of perturbed pathways as well as the perturbation proportions of tumor samples in each pathway were significantly different. Finally, the prognosis-related pathways of different cancer types were obtained by survival analysis. We demonstrated that the individual-level pathway score (iPS) is capable of classifying cancer types and identifying some key prognosis-related pathways.


Subjects
Neoplasms/genetics , Case-Control Studies , Databases, Nucleic Acid , Female , Gene Expression Profiling , Gene Expression Regulation, Neoplastic , Humans , Kaplan-Meier Estimate , Male , Neoplasms/classification , Neoplasms/mortality , Prognosis , RNA-Seq
15.
PeerJ ; 8: e9161, 2020.
Article in English | MEDLINE | ID: mdl-32461838

ABSTRACT

Breast cancer is a disease with high heterogeneity. Cancer is not usually caused by a single gene, but by multiple genes and their interactions with each other and with their surroundings. Estimating breast cancer-specific gene-gene interaction networks is critical to elucidating the mechanisms of breast cancer from a biological network perspective. In this study, sample-specific gene-gene interaction networks of breast cancer samples were established by using a sample-specific network analysis method based on gene expression profiles. Then, gene-gene interaction networks and pathways related to breast cancer and its subtypes and stages were further identified. The similarities and differences among these subtype-related (and stage-related) networks and pathways were studied; the networks and pathways were highly specific for the Basal-like subtype and for Stages IV and V. Finally, pairwise gene interactions associated with breast cancer prognosis were identified by a Cox proportional hazards regression model, and a risk prediction model based on the gene pairs was established, which also performed very well on an independent validation data set. This work will help us to better understand the mechanism underlying the occurrence of breast cancer from the sample-specific network perspective.

16.
Bioinformatics ; 36(14): 4103-4105, 2020 08 15.
Article in English | MEDLINE | ID: mdl-32413127

ABSTRACT

MOTIVATION: DNA N4-methylcytosine (4mC) modification is an important epigenetic modification in prokaryotic DNA due to its role in regulating DNA replication and protecting the host DNA against degradation. An efficient algorithm to identify 4mC sites is needed for downstream analyses. RESULTS: In this study, we propose a new prediction method named SOMM4mC based on a second-order Markov model, which makes use of the transition probability between adjacent nucleotides to identify 4mC sites. The results show that the first-order and second-order Markov models are superior to the three existing algorithms in all six species (Caenorhabditis elegans, Drosophila melanogaster, Arabidopsis thaliana, Escherichia coli, Geoalkalibacter subterraneus and Geobacter pickeringii) for which benchmark datasets are available. Moreover, the classification performance of SOMM4mC is better than that of the first-order Markov model. In particular, for E. coli and C. elegans, the overall accuracies of SOMM4mC are 91.8% and 87.6%, which are 8.5% and 6.1% higher than those of the latest method, 4mcPred-SVM, respectively. This shows that SOMM4mC captures more discriminant sequence information through the dependency between adjacent nucleotides. AVAILABILITY AND IMPLEMENTATION: The web server of SOMM4mC is freely accessible at www.insect-genome.com/SOMM4mC. CONTACT: chenyuanyuan@njau.edu.cn or piancong@njau.edu.cn.


Subjects
Drosophila melanogaster , Geobacter , Algorithms , Animals , DNA/genetics , Epigenesis, Genetic
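
Note on the Markov-model classifiers in entries 10, 16 and 18: the abstracts state that transition probabilities between adjacent nucleotides are used to separate modified from unmodified sites, without giving formulas. The sketch below illustrates the general idea with a first-order, position-independent model scored by a log-odds ratio. The position-independent transitions, the pseudocount, the function names and the toy sequences are all assumptions; the published tools may differ, for example by using position-specific transitions. A second-order model would condition each nucleotide on the two preceding ones instead of one.

```python
from math import log

ALPHABET = "ACGT"


def train_transitions(seqs, pseudocount=1.0):
    """Estimate first-order transition probabilities P(next | current)."""
    counts = {a: {b: pseudocount for b in ALPHABET} for a in ALPHABET}
    for seq in seqs:
        for current, nxt in zip(seq, seq[1:]):
            counts[current][nxt] += 1
    return {
        a: {b: counts[a][b] / sum(counts[a].values()) for b in ALPHABET}
        for a in ALPHABET
    }


def log_odds(seq, pos_model, neg_model):
    """Sum of log-ratios of transition probabilities under the two models.

    A positive score means the sequence looks more like the positive set.
    """
    return sum(
        log(pos_model[a][b] / neg_model[a][b]) for a, b in zip(seq, seq[1:])
    )


# Toy training sets standing in for windows around modified / unmodified sites.
positive_model = train_transitions(["GAGGA", "GAGGC"])
negative_model = train_transitions(["TTTTT", "TTTAT"])

print(log_odds("GAGG", positive_model, negative_model))
print(log_odds("TTTT", positive_model, negative_model))
```

A sequence would be called a candidate site when its score exceeds a threshold chosen on validation data.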
17.
Mol Ther Nucleic Acids ; 19: 1423-1433, 2020 Mar 06.
Article in English | MEDLINE | ID: mdl-32160711

ABSTRACT

MicroRNAs (miRNAs) have been shown to be closely related to cancer progression. Traditional methods for discovering cancer-related miRNAs mostly require significant marginal differential expression, but some cancer-related miRNAs may be non-differentially or only weakly differentially expressed. Such miRNAs are called dark matter miRNAs (DM-miRNAs); an existing approach identifies them through changes in Pearson correlation across miRNA-target interactions (MTIs), but its efficiency relies heavily on restrictive assumptions. In this paper, a novel method was developed to discover DM-miRNAs using a support vector machine (SVM) based not only on the miRNA expression data but also on the expression of the targets each miRNA regulates. Applying the new method to breast and kidney cancer datasets identified, respectively, 9 and 24 potential DM-miRNAs that cannot be detected by previous methods. Eight and 15 of the newly discovered miRNAs have been found to be associated with breast and kidney cancers, respectively, in the existing literature. These results indicate that our new method is more effective in discovering cancer-related miRNAs.

18.
Bioinformatics ; 36(2): 388-392, 2020 01 15.
Article in English | MEDLINE | ID: mdl-31297537

ABSTRACT

MOTIVATION: Recent studies have shown that DNA N6-methyladenine (6mA) plays an important role in epigenetic modification of eukaryotic organisms. It has been found that 6mA is closely related to embryonic development, stress response and so on. Developing a new algorithm to quickly and accurately identify 6mA sites in genomes is important for exploring their biological functions. RESULTS: In this paper, we proposed a new classification method called MM-6mAPred based on a Markov model, which makes use of the transition probability between adjacent nucleotides to identify 6mA sites. The sensitivity and specificity of our method are 89.32% and 90.11%, respectively. The overall accuracy of our method is 89.72%, which is 6.59% higher than that of the previous method i6mA-Pred. This indicates that, compared with the 41 nucleotide chemical properties used by i6mA-Pred, the transition probability between adjacent nucleotides can capture more discriminant sequence information. AVAILABILITY AND IMPLEMENTATION: The web server of MM-6mAPred is freely accessible at http://www.insect-genome.com/MM-6mAPred/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
DNA/genetics , Adenine , DNA Methylation , Epigenesis, Genetic , Genome
19.
Brief Bioinform ; 21(2): 699-708, 2020 03 23.
Article in English | MEDLINE | ID: mdl-30649247

ABSTRACT

miRNAs are a type of small noncoding RNA. Many studies have shown that miRNAs are widely involved in the regulation of various pathways. The key to fully understanding the regulatory function of miRNAs is determining the pathways in which they participate. However, major pathway databases such as KEGG only include information regarding protein-coding genes. Here, we redesigned a pathway database (called miR+Pathway) by integrating and visualizing 8882 human experimentally validated miRNA-target interactions (MTIs) and 150 KEGG pathways. This database is freely accessible at http://www.insect-genome.com/miR-pathway. Researchers can intuitively determine the pathways, and the genes in those pathways, that are regulated by miRNAs, as well as the miRNAs that target the pathways. To determine the pathways in which the targets of one or more miRNAs are enriched, we performed a KEGG enrichment analysis of miRNA targets using the hypergeometric test. In addition, miR+Pathway provides information regarding MTIs, PubMed IDs and the experimental verification method. Users can retrieve the pathways regulated by an miRNA or a gene by entering its name.


Subjects
Databases, Genetic , MicroRNAs/genetics , Animals , Gene Expression Profiling , Gene Regulatory Networks , Humans , Information Storage and Retrieval