2.
Comput Biol Med ; 171: 108229, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38447500

ABSTRACT

Conventional COVID-19 testing methods are expensive and time-consuming. Chest X-ray (CXR) diagnostic approaches can alleviate these flaws to some extent. However, there is no accurate and practical automatic diagnostic framework with good interpretability. Applying artificial intelligence (AI) technology to medical radiography can help to detect the disease accurately, reduce the burden on healthcare organizations, and provide good interpretability. Therefore, this study proposes CodeNet, a new deep convolutional neural network (CNN) for COVID-19 diagnosis based on CXR. This method uses contrastive learning to make full use of latent image data to enhance the model's ability to extract features and generalize across different data domains. On the evaluation dataset, the proposed method achieves an accuracy as high as 94.20%, outperforming several other existing methods used for comparison. Ablation studies validate the efficacy of the proposed method, while interpretability analysis shows that the method can effectively guide clinical professionals. This work demonstrates the superior detection performance of a CNN using contrastive learning techniques on CXR images, paving the way for computer vision and artificial intelligence technologies to leverage massive medical data for disease diagnosis.


Subject(s)
COVID-19 , Deep Learning , Humans , COVID-19/diagnostic imaging , COVID-19 Testing , Artificial Intelligence , Neural Networks, Computer
3.
Brief Bioinform ; 24(4)2023 07 20.
Article in English | MEDLINE | ID: mdl-37225419

ABSTRACT

Single-cell RNA sequencing (scRNA-seq) detects whole transcriptome signals for large numbers of individual cells and is powerful for determining cell-to-cell differences and investigating the functional characteristics of various cell types. scRNA-seq datasets are usually sparse and highly noisy. Many steps in the scRNA-seq analysis workflow, including reasonable gene selection, cell clustering and annotation, as well as discovering the underlying biological mechanisms from such datasets, are difficult. In this study, we proposed an scRNA-seq analysis method based on the latent Dirichlet allocation (LDA) model. The LDA model estimates a series of latent variables, i.e. putative functions (PFs), from the input raw cell-gene data. Thus, we incorporated the 'cell-function-gene' three-layer framework into scRNA-seq analysis, as this framework is capable of discovering latent and complex gene expression patterns via a built-in model approach and obtaining biologically meaningful results through a data-driven functional interpretation process. We compared our method with four classic methods on seven benchmark scRNA-seq datasets. The LDA-based method performed best in the cell clustering test in terms of both accuracy and purity. By analysing three complex public datasets, we demonstrated that our method could distinguish cell types with multiple levels of functional specialization, and precisely reconstruct cell development trajectories. Moreover, the LDA-based method accurately identified the representative PFs and the representative genes for the cell types/cell stages, enabling data-driven cell cluster annotation and functional interpretation. According to the literature, most of the previously reported marker/functionally relevant genes were recognized.


Subject(s)
Gene Expression Profiling , Single-Cell Analysis , Gene Expression Profiling/methods , Sequence Analysis, RNA/methods , Single-Cell Analysis/methods , Transcriptome , Cluster Analysis , Algorithms
4.
BMC Genomics ; 23(Suppl 1): 269, 2022 Apr 06.
Article in English | MEDLINE | ID: mdl-35387615

ABSTRACT

BACKGROUND: In biological systems, metabolomics not only contributes to the discovery of metabolic signatures for disease diagnosis, but also helps to elucidate the underlying molecular disease-causing mechanisms. Therefore, identification of disease-related metabolites is of great significance for comprehensively understanding the pathogenesis of diseases and improving clinical medicine. RESULTS: In this paper, we propose a disease and literature driven metabolism prediction model (DLMPM) to identify the potential associations between metabolites and diseases based on a latent factor model. We build the disease glossary with disease terms from different databases and an association matrix based on the mapping between diseases and metabolites. The similarity of diseases and metabolites is used to complete the association matrix. Finally, we predict potential associations between metabolites and diseases based on the matrix decomposition method. In total, 1,406 direct associations between diseases and metabolites are found. There are 119,206 unknown associations between diseases and metabolites predicted with a coverage rate of 80.88%. Subsequently, we extract training sets and testing sets based on data increment from the database of disease-related metabolites and assess the performance of DLMPM on 19 diseases. As a result, DLMPM is proven to be successful in predicting potential metabolic signatures for human diseases with an average AUC value of 82.33%. CONCLUSION: In this paper, a computational model is proposed for exploring metabolite-disease pairs and has good performance in predicting potential metabolites related to diseases through adequate validation. The results show that DLMPM performs better in prioritizing candidate disease-related metabolites than previous methods and would help researchers to reveal more information about human diseases.


Subject(s)
Metabolomics , Publications , Computational Biology/methods , Databases, Factual , Humans , Metabolomics/methods
5.
IEEE/ACM Trans Comput Biol Bioinform ; 19(4): 2017-2025, 2022.
Article in English | MEDLINE | ID: mdl-33687846

ABSTRACT

The detection of drug-target interactions (DTIs) plays an important role in drug discovery and development, making DTI prediction an urgent problem. Existing computational methods usually use drug similarity, target similarity and DTI information to make predictions, and they are fast and inexpensive. However, they usually learn features for drugs and targets separately and lack a global view. In this study, we proposed a novel neighborhood-based global network model, named NGN, to accurately predict DTIs from a global perspective. We designed a distance constraint for features of all entities (drugs and targets) in the latent space to keep adjacent entities close together, and defined a global probability matrix to compute the predicted DTI scores on our constructed neighborhood-based global network. Results showed that NGN performed favorably compared with other state-of-the-art methods, surpassing them by 4.2-9.1 percent on AUPR values in the biggest dataset. Furthermore, several novel high-ranked DTIs were successfully predicted and confirmed by public sources, demonstrating the effectiveness of our method.


Subject(s)
Drug Discovery , Drug Discovery/methods , Drug Interactions
6.
Front Pharmacol ; 12: 799712, 2021.
Article in English | MEDLINE | ID: mdl-34955863

ABSTRACT

Modeling-based anti-cancer drug sensitivity prediction has been extensively studied in recent years. Because most drug sensitivity prediction models use only gene expression data, the remarkable impacts of gene mutation, methylation, and copy number variation on drug sensitivity are neglected. Drug sensitivity prediction can both help protect patients from some adverse drug reactions and improve the efficacy of treatment. Genomics data are extremely useful for the drug sensitivity prediction task. This article reviews the role of drug sensitivity prediction and describes a variety of methods for predicting drug sensitivity. Moreover, the research significance of drug sensitivity prediction, as well as existing problems, are discussed.

7.
Front Cell Dev Biol ; 9: 697035, 2021.
Article in English | MEDLINE | ID: mdl-34414185

ABSTRACT

Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), which causes coronavirus disease 2019 (COVID-19), has been undergoing various mutations. Analyzing the structural and energetic effects of mutations on protein-protein interactions between the receptor binding domain (RBD) of SARS-CoV-2 and angiotensin converting enzyme 2 (ACE2) or neutralizing monoclonal antibodies will be beneficial for epidemic surveillance, diagnosis, and optimization of neutralizing agents. According to the molecular dynamics simulation, a key mutation N439K in the SARS-CoV-2 RBD region created a new salt bridge with Glu329 of hACE2, which resulted in greater electrostatic complementarity, and created a weak salt bridge with Asp442 of RBD. Furthermore, the N439K-mutated RBD bound hACE2 with a higher affinity than wild-type, which may make the variant more infectious. In addition, the N439K-mutated RBD was markedly resistant to the SARS-CoV-2 neutralizing antibody REGN10987, which may lead to the failure of neutralization. The results are consistent with previous experimental conclusions and clarify the structural mechanism underlying the affinity changes. Our methods will offer guidance on assessing the infection efficiency and antigenicity effects of continuing mutations in SARS-CoV-2.

8.
Nucleic Acids Res ; 49(D1): D1413-D1419, 2021 01 08.
Article in English | MEDLINE | ID: mdl-33010177

ABSTRACT

SC2disease (http://easybioai.com/sc2disease/) is a manually curated database that aims to provide a comprehensive and accurate resource of gene expression profiles in various cell types for different diseases. With the development of single-cell RNA sequencing (scRNA-seq) technologies, uncovering cellular heterogeneity of different tissues for different diseases has become feasible by profiling transcriptomes across cell types at the cellular level. In particular, comparing gene expression profiles between different cell types and identifying cell-type-specific genes in various diseases offers new possibilities to address biological and medical questions. However, systematic, hierarchical and vast databases of gene expression profiles in human diseases at the cellular level are lacking. Thus, we reviewed the literature prior to March 2020 for studies which used scRNA-seq to study diseases with human samples, and developed the SC2disease database to summarize all the data by different diseases, tissues and cell types. SC2disease documents 946 481 entries, corresponding to 341 cell types, 29 tissues and 25 diseases. Each entry in the SC2disease database contains comparisons of differentially expressed genes between different cell types, tissues and disease-related health status. Furthermore, we reanalyzed the gene expression matrices with a unified pipeline to improve the comparability between different studies. For each disease, we also compare cell-type-specific genes with the corresponding genes of lead single nucleotide polymorphisms (SNPs) identified in genome-wide association studies (GWAS) to implicate cell type specificity of the traits.


Subject(s)
Autism Spectrum Disorder/genetics , Autoimmune Diseases/genetics , Cardiovascular Diseases/genetics , Databases, Factual , Gastrointestinal Diseases/genetics , Neoplasms/genetics , Neurodegenerative Diseases/genetics , Virus Diseases/genetics , Algorithms , Autism Spectrum Disorder/metabolism , Autism Spectrum Disorder/pathology , Autoimmune Diseases/metabolism , Autoimmune Diseases/pathology , Cardiovascular Diseases/metabolism , Cardiovascular Diseases/pathology , Gastrointestinal Diseases/metabolism , Gastrointestinal Diseases/pathology , Gene Expression Profiling , Genetic Heterogeneity , Genome-Wide Association Study , High-Throughput Nucleotide Sequencing , Humans , Internet , Neoplasms/metabolism , Neoplasms/pathology , Neurodegenerative Diseases/metabolism , Neurodegenerative Diseases/pathology , Organ Specificity , Polymorphism, Single Nucleotide , Single-Cell Analysis/methods , Software , Transcriptome , Virus Diseases/metabolism , Virus Diseases/pathology
10.
Front Cell Dev Biol ; 8: 557, 2020.
Article in English | MEDLINE | ID: mdl-32695786

ABSTRACT

Gastric cancer (GC) is the fourth most common malignant tumor. The mechanism underlying GC occurrence and development remains unclear. Previous studies have indicated that long non-coding RNAs (lncRNAs) are significantly associated with gastric cancer, but a systematic understanding of the role of lncRNAs in gastric cancer is lacking. In recent years, with the development of next-generation sequencing technology, tens of thousands of lncRNAs have been discovered. However, a large number of unannotated lncRNAs remain unidentified in different tissues, including potential gastric cancer-related lncRNAs. In this study, RNA sequencing (RNA-seq) data from 16 samples of eight gastric cancer patients were obtained and analyzed. A total of 1,854 previously unannotated lncRNAs were identified by ab initio assembly, and 520 differentially expressed lncRNAs were validated in the TCGA expression dataset. Methylation and copy number variation (CNV) array data from the same sample were integrated in the analysis. Changes in DNA methylation levels and CNVs may be responsible for the differential expression of 91 lncRNAs. Differentially expressed lncRNAs were enriched in coexpressed clusters of genes related to functions such as cell signaling, cell cycle, immune response, metabolic processes, angiogenesis, and regulation of retinoic acid (RA) receptors. Finally, a differentially expressed lncRNA, AC004510.3, was identified as a potential biomarker for the prediction of the overall survival of gastric cancer patients.

11.
Biomed Res Int ; 2020: 9235920, 2020.
Article in English | MEDLINE | ID: mdl-32596396

ABSTRACT

Enzymes are proteins that can efficiently catalyze specific biochemical reactions, and they are widely present in the human body. Developing an efficient method to identify human enzymes is vital to select enzymes from the vast number of human proteins and to investigate their functions. Nevertheless, only a limited amount of research has been conducted on the classification of human enzymes and nonenzymes. In this work, we developed a support vector machine- (SVM-) based predictor to classify human enzymes using the amino acid composition (AAC), the composition of k-spaced amino acid pairs (CKSAAP), and selected informative amino acid pairs through the use of a feature selection technique. A training dataset including 1117 human enzymes and 2099 nonenzymes and a test dataset including 684 human enzymes and 1270 nonenzymes were constructed to train and test the proposed model. The results of jackknife cross-validation showed that the overall accuracy was 76.46% for the training set and 76.21% for the test set, which are higher than the 72.6% achieved in previous research. Furthermore, various feature extraction methods and mainstream classifiers were compared in this task, and informative feature parameters of k-spaced amino acid pairs were selected and compared. The results suggest that our classifier can be used in human enzyme identification effectively and efficiently and can help to understand their functions and develop new drugs.
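The two sequence descriptors named in this abstract, amino acid composition (AAC) and the composition of k-spaced amino acid pairs (CKSAAP), can be illustrated with a minimal sketch. The function names `aac` and `cksaap` are hypothetical, the normalization by window count is one common convention and an assumption here, and the abstract does not describe the authors' own implementation or feature selection step.

```python
from itertools import product

# The 20 standard amino acids, one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aac(seq):
    """Amino acid composition: fraction of the sequence made up by each residue."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return {a: seq.count(a) / len(seq) for a in AMINO_ACIDS}


def cksaap(seq, k):
    """Composition of k-spaced amino acid pairs.

    Counts every ordered pair (x, y) where y lies k + 1 positions after x,
    i.e. with exactly k residues between them, and divides by the number
    of such windows (an assumed normalization). Returns 400 features.
    """
    seq = seq.upper()
    counts = {a + b: 0 for a, b in product(AMINO_ACIDS, repeat=2)}
    windows = len(seq) - k - 1
    if windows <= 0:
        return {pair: 0.0 for pair in counts}
    for i in range(windows):
        pair = seq[i] + seq[i + k + 1]
        if pair in counts:  # skip non-standard residues such as X
            counts[pair] += 1
    return {pair: c / windows for pair, c in counts.items()}
```

Concatenating `aac(seq)` with `cksaap(seq, k)` for several values of k gives a fixed-length vector that a classifier such as an SVM could consume; which k values and which pairs are informative is exactly what the feature selection step in the abstract addresses.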


Subject(s)
Amino Acids/chemistry , Enzymes/chemistry , Proteins/chemistry , Algorithms , Computational Biology , Databases, Protein , Enzymes/classification , Humans , Proteins/classification , Support Vector Machine
12.
Bioinformatics ; 36(18): 4757-4764, 2020 09 15.
Article in English | MEDLINE | ID: mdl-32573702

ABSTRACT

MOTIVATION: Evaluating genome similarity among individuals is an essential step in data analysis. Advanced sequencing technology detects more and rarer variants for massive individual genomes, thus enabling individual-level genome similarity evaluation. However, the current methodologies, such as the principal component analysis (PCA), lack the capability to fully leverage rare variants and are also difficult to interpret in terms of population genetics. RESULTS: Here, we introduce a probabilistic topic model, latent Dirichlet allocation, to evaluate individual genome similarity. A total of 2535 individuals from the 1000 Genomes Project (KGP) were used to demonstrate our method. Various aspects of variant choice and model parameter selection were studied. We found that relatively rare (0.001 20 000 bp) variants are more efficient for genome similarity evaluation. At least 100 000 such variants are necessary. In our results, the populations show significantly less mixed and more cohesive visualization than the PCA results. The global similarities among the KGP genomes are consistent with known geographical, historical and cultural factors. AVAILABILITY AND IMPLEMENTATION: The source code and data access are available at: https://github.com/lrjuan/LDA_genome. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Genome , Software , Genetics, Population , Humans , Models, Statistical , Principal Component Analysis
13.
Article in English | MEDLINE | ID: mdl-32047747

ABSTRACT

Although genome sequencing has become increasingly popular, the simulation of individual genomes is still important. This is because sequencing a large number of individual genomes is costly and genome data with extreme and boundary conditions, such as fatal genetic defects, are difficult to obtain. Privacy and legal barriers also prevent many applications of real data. Large sequencing projects in recent years have provided a deeper understanding of the human genome. However, there is a lack of tools that leverage known data to simulate personal genomes as realistically as possible. Here, we designed and developed PGsim, a comprehensive and highly customizable individual genome simulator that fully uses existing knowledge, such as variant allele frequencies in global or world main populations, mutation probability differences between protein-coding regions and non-coding regions, transition/transversion (Ti/Tv) ratios, Indel incidence, Indel length distribution, structural variation sites, and pathogenic mutation sites. Users can flexibly control the proportion and quantity of known variants, common variants, novel variants in both coding and non-coding regions, and special variants through detailed parameter settings. While ensuring that the simulated personal genome has sufficient randomness, PGsim makes the generated variants more realistic and reliable in terms of variant distribution, proportion, and population characteristics. PGsim is able to employ a huge volume database as background data to simulate personal genomes and does not require SQL database support. Users can easily change the variant databases used as needed. Because PGsim is a Perl script, there is no obstacle to running it on any version of the MAC OS or Linux systems, and no libraries, packages, interpreters, compilers, or other dependencies need to be installed in advance. The PGsim tool is publicly available at https://github.com/lrjuan/PGsim.
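One of the quantities listed above, the transition/transversion (Ti/Tv) ratio, is simple enough to show as a worked example. The sketch below is in Python rather than Perl and is not taken from PGsim; the names `is_transition` and `ti_tv_ratio` are hypothetical, and the input is assumed to be a list of (reference, alternate) base pairs for single-nucleotide variants.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(ref, alt):
    """True for purine<->purine or pyrimidine<->pyrimidine substitutions."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt:
        raise ValueError("ref and alt must differ")
    return {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES


def ti_tv_ratio(snvs):
    """Ratio of transitions to transversions over (ref, alt) pairs."""
    transitions = sum(1 for ref, alt in snvs if is_transition(ref, alt))
    transversions = len(snvs) - transitions
    if transversions == 0:
        raise ValueError("no transversions; ratio is undefined")
    return transitions / transversions
```

A simulator could use such a function as a sanity check, comparing the Ti/Tv ratio of its generated variants with the target value supplied by the user.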

14.
BMC Bioinformatics ; 20(Suppl 16): 582, 2019 Dec 02.
Article in English | MEDLINE | ID: mdl-31787106

ABSTRACT

BACKGROUND: Over the past decades, a large number of long non-coding RNAs (lncRNAs) have been identified. Growing evidence has indicated that the mutation and dysregulation of lncRNAs play a critical role in the development of many complex human diseases. Consequently, identifying potential disease-related lncRNAs is an effective means to improve the quality of disease diagnostics and treatment, which is the motivation of this work. Here, we propose a computational model (LncDisAP) for potential disease-related lncRNA identification based on multiple biological datasets. First, the associations between lncRNA and different data sources are collected from different databases. With these data sources as dimensions, we calculate the functional associations between lncRNAs by the recommendation strategy of collaborative filtering. Subsequently, a disease-associated lncRNA functional network is built with functional similarities between lncRNAs as the weight. Ultimately, potential disease-related lncRNAs can be identified based on ranked scores derived by random walk with restart (RWR). Then, training sets and testing sets are extracted from two different versions of a disease-lncRNA dataset to assess the performance of LncDisAP on 54 diseases. RESULTS: A lncRNA functional network is built based on the proposed computational model, and it contains 66,060 associations among 364 lncRNAs associated with 182 diseases in total. We extract 218 known disease-lncRNA pairs associated with 54 diseases to assess the network. As a result, the average AUC (area under the receiver operating characteristic curve) of LncDisAP is 78.08%. CONCLUSION: In this article, a computational model integrating multiple lncRNA-related biological datasets is proposed for identifying potential disease-related lncRNAs. The result shows that LncDisAP is successful in predicting novel disease-related lncRNA signatures. In addition, with several common cancers taken as case studies, we found some unknown lncRNAs that could be associated with these diseases through our network. These results suggest that this method can be helpful in improving the quality of disease diagnostics and treatment.
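The ranking step described above, random walk with restart on a weighted network, can be sketched generically as follows. This is a textbook formulation, not the LncDisAP code: the function name, the dictionary-of-dictionaries graph format, and the default restart probability of 0.3 are all assumptions made for illustration.

```python
def random_walk_with_restart(adj, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Stationary visiting probabilities of a walker that restarts at the seeds.

    adj   : {node: {neighbor: weight}}; every node must appear as a key and
            is assumed to have at least one outgoing edge.
    seeds : nodes known to be associated with the disease of interest.
    """
    nodes = list(adj)
    out_weight = {u: sum(adj[u].values()) for u in nodes}
    p0 = {u: (1.0 / len(seeds) if u in seeds else 0.0) for u in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        # With probability `restart` jump back to a seed ...
        nxt = {u: restart * p0[u] for u in nodes}
        # ... otherwise move to a neighbor in proportion to edge weight.
        for u in nodes:
            for v, w in adj[u].items():
                nxt[v] += (1.0 - restart) * p[u] * w / out_weight[u]
        change = sum(abs(nxt[u] - p[u]) for u in nodes)
        p = nxt
        if change < tol:
            break
    return p
```

Nodes that are not seeds are then ranked by their stationary probability; higher scores indicate closer functional proximity to the known disease-associated nodes.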


Subject(s)
Algorithms , Computational Biology/methods , Computer Simulation , Databases, Genetic , Disease/genetics , Gene Expression Regulation , RNA, Long Noncoding/genetics , Area Under Curve , Gene Regulatory Networks , Humans , ROC Curve
15.
BMC Bioinformatics ; 20(Suppl 18): 574, 2019 Nov 25.
Article in English | MEDLINE | ID: mdl-31760947

ABSTRACT

BACKGROUND: As the terminal products of cellular regulatory processes, functionally related metabolites have a close relationship with complex diseases, and are often associated with the same or similar diseases. Therefore, identifying disease-related metabolites plays a critical role in comprehensively understanding the pathogenesis of disease and in improving clinical medicine. Considering that a large number of metabolic markers of diseases remain to be explored, we propose a computational model to identify potential disease-related metabolites based on functional relationships and literature-derived scores between metabolites. First, after obtaining associations between metabolites and diseases from the Human Metabolome database, we calculate the similarities of metabolites with a modified collaborative filtering recommendation strategy that uses the similarities between diseases. Next, a disease-associated metabolite network (DMN) is built with similarities between metabolites as weights. To improve the ability to identify disease-related metabolites, we introduce text mining scores from an existing database of chemicals and proteins into DMN and build a new disease-associated metabolite network (FLDMN) by fusing functional associations and literature scores. Finally, we apply random walk with restart (RWR) to this network to predict candidate metabolites related to diseases. RESULTS: We construct the disease-associated metabolite network and its improved network (FLDMN) with 245 diseases, 587 metabolites and 28,715 disease-metabolite associations. Subsequently, we extract training sets and testing sets from two different versions of the Human Metabolome database and assess the performance of DMN and FLDMN on 19 diseases, respectively. As a result, the average AUC (area under the receiver operating characteristic curve) of DMN is 64.35%. As a further improved network, FLDMN is proven to be successful in predicting potential metabolic signatures for 19 diseases with an average AUC value of 76.03%. CONCLUSION: In this paper, a computational model is proposed for exploring metabolite-disease pairs and has good performance in predicting potential metabolites related to diseases through adequate validation. This result suggests that integrating literature and functional associations can be an effective way to construct a disease-associated metabolite network for prioritizing candidate disease-related metabolites.
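The AUC figures reported above summarize how well a ranking separates known positives from negatives. As a reminder of what the number means, here is a minimal sketch that computes AUC from its pairwise definition; the function name `roc_auc` and the toy scores are hypothetical, and the quadratic loop is chosen for clarity rather than speed.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via the pairwise (Mann-Whitney) definition.

    AUC equals the probability that a randomly chosen positive receives a
    higher score than a randomly chosen negative, counting ties as one half.
    """
    positives = [s for s, y in zip(scores, labels) if y]
    negatives = [s for s, y in zip(scores, labels) if not y]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))
```

An average AUC such as the ones quoted in the abstract would then be the mean of `roc_auc` over the per-disease rankings.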


Subject(s)
Computational Biology/methods , Metabolome , Algorithms , Computer Simulation , Data Mining , Databases, Factual , Humans , Publications/statistics & numerical data , ROC Curve
16.
Article in English | MEDLINE | ID: mdl-28981421

ABSTRACT

Copy number variants (CNVs) play important roles in human disease and evolution. With the rapid development of next-generation sequencing technologies, many tools have been developed for inferring CNVs based on whole-exome sequencing (WES) data. However, as a result of the sparse distribution of exons in the genome, the limitations of the WES technique, and the high level of signal noise in WES data, the efficacy of these tools remains less than desirable. Thus, there is a need for an effective tool that achieves considerable power in WES CNV discovery. In the present study, we describe a novel method, Estimation by Read Depth (RD) with Single-nucleotide variants from exome sequencing data (ERDS-exome). ERDS-exome employs a hybrid normalization approach to normalize WES data and incorporates RD and single-nucleotide variation information together as a hybrid signal into a paired hidden Markov model to infer CNVs from WES data. Based on systematic evaluations of real data from the 1000 Genomes Project against other state-of-the-art tools, we observed that ERDS-exome demonstrates higher sensitivity and provides comparable or even better specificity than other tools. ERDS-exome is publicly available at: https://erds-exome.github.io.
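To give a feel for how a hidden Markov model turns a noisy read-depth signal into copy number calls, the sketch below runs Viterbi decoding over normalized read-depth ratios. It is a deliberately simplified, single-signal stand-in and not the paired model of ERDS-exome: the three states, the Gaussian emission means and standard deviation, and the transition probabilities are all hypothetical values chosen for illustration.

```python
import math

# Hypothetical model: three copy number states with Gaussian emissions
# on the normalized read-depth ratio (1.0 = expected diploid depth).
STATES = ("deletion", "normal", "duplication")
MEANS = {"deletion": 0.5, "normal": 1.0, "duplication": 1.5}
SD = 0.1      # assumed noise level
STAY = 0.98   # assumed probability of keeping the same state


def log_emission(state, ratio):
    z = (ratio - MEANS[state]) / SD
    return -0.5 * z * z - math.log(SD * math.sqrt(2.0 * math.pi))


def log_transition(prev, cur):
    if prev == cur:
        return math.log(STAY)
    return math.log((1.0 - STAY) / (len(STATES) - 1))


def viterbi(ratios):
    """Most likely state path for a sequence of read-depth ratios."""
    if not ratios:
        return []
    # Uniform prior over the initial state.
    score = {s: -math.log(len(STATES)) + log_emission(s, ratios[0]) for s in STATES}
    backpointers = []
    for ratio in ratios[1:]:
        new_score, pointers = {}, {}
        for cur in STATES:
            best = max(STATES, key=lambda prev: score[prev] + log_transition(prev, cur))
            pointers[cur] = best
            new_score[cur] = score[best] + log_transition(best, cur) + log_emission(cur, ratio)
        score = new_score
        backpointers.append(pointers)
    # Trace back from the best final state.
    state = max(STATES, key=score.get)
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path
```

The high self-transition probability is what lets the model smooth over isolated noisy windows while still calling a run of consistently low or high ratios as a deletion or duplication.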

18.
BMC Genomics ; 17 Suppl 5: 530, 2016 08 31.
Article in English | MEDLINE | ID: mdl-27586009

ABSTRACT

BACKGROUND: The Gene Ontology (GO) has been used in high-throughput omics research as a major bioinformatics resource. The hierarchical structure of GO provides users with a convenient platform for biological information abstraction and hypothesis testing. Computational methods have been developed to identify functionally similar genes. However, none of the existing measurements take into account all the rich information in GO. Moreover, the web-based applications built on these existing methods to compute gene functional similarities provide only text-based outputs. Without a graphical visualization interface, the results are difficult to interpret. RESULTS: We present InteGO2, a web tool that allows researchers to calculate the GO-based gene semantic similarities using seven widely used GO-based similarity measurements. Also, we provide an integrative measurement that synergistically integrates all the individual measurements to improve the overall performance. Using HTML5 and cytoscape.js, we provide a graphical interface in InteGO2 to visualize the resulting gene functional association networks. CONCLUSIONS: InteGO2 is an easy-to-use HTML5 based web tool. With it, researchers can measure gene or gene product functional similarity conveniently, and visualize the network of functional interactions in a graphical interface. InteGO2 can be accessed via http://mlg.hit.edu.cn:8089/ .


Subject(s)
Gene Ontology , Software , Algorithms , Computational Biology , Gene Regulatory Networks , Genes, Bacterial
19.
Bioinformatics ; 32(8): 1130-7, 2016 04 15.
Article in English | MEDLINE | ID: mdl-26644415

ABSTRACT

MOTIVATION: Whole genome sequencing (WGS) of parent-offspring trios is a powerful approach for identifying disease-associated genes via detecting copy number variations (CNVs). Existing approaches, which detect CNVs for each individual in a trio independently, usually yield low detection accuracy. Joint modeling approaches leveraging Mendelian transmission within the parent-offspring trio can be an efficient strategy to improve CNV detection accuracy. RESULTS: In this study, we developed TrioCNV, a novel approach for jointly detecting CNVs in parent-offspring trios from WGS data. Using negative binomial regression, we modeled the read depth signal while considering both GC content bias and mappability bias. Moreover, we incorporated the family relationship and used a hidden Markov model to jointly infer CNVs for three samples of a parent-offspring trio. Through application to both simulated data and a trio from the 1000 Genomes Project, we showed that TrioCNV achieved better performance than existing approaches. AVAILABILITY AND IMPLEMENTATION: The software TrioCNV implemented using a combination of Java and R is freely available from the website at https://github.com/yongzhuang/TrioCNV CONTACT: ydwang@hit.edu.cn SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
DNA Copy Number Variations , Software , Algorithms , Genome , Humans , Parents
20.
Biomed Res Int ; 2015: 861402, 2015.
Article in English | MEDLINE | ID: mdl-26425556

ABSTRACT

A microRNA is a small noncoding RNA molecule, which functions in RNA silencing and posttranscriptional regulation of gene expression. To understand the mechanism of the activation of microRNA genes, the location of promoter regions driving their expression is required to be annotated precisely. Only a fraction of microRNA genes have confirmed transcription start sites (TSSs), which hinders our understanding of the transcription factor binding events. With the development of the next generation sequencing technology, the chromatin states can be inferred precisely by virtue of a combination of specific histone modifications. Using the genome-wide profiles of nine histone markers including H3K4me2, H3K4me3, H3K9Ac, H3K9me2, H3K18Ac, H3K27me1, H3K27me3, H3K36me2, and H3K36me3, we developed a computational strategy to identify the promoter regions of most microRNA genes in Arabidopsis, based upon the assumption that the distribution of histone markers around the TSSs of microRNA genes is similar to the TSSs of protein coding genes. Among 298 miRNA genes, our model identified 42 independent miRNA TSSs and 132 miRNA TSSs, which are located in the promoters of upstream genes. The identification of promoters will provide better understanding of microRNA regulation and can play an important role in the study of diseases at genetic level.


Subject(s)
Arabidopsis/genetics , Biomarkers/metabolism , Histones/metabolism , MicroRNAs/metabolism , Promoter Regions, Genetic , MicroRNAs/genetics , Open Reading Frames/genetics , ROC Curve , Support Vector Machine , Transcription Initiation Site