Results 1 - 20 of 50
1.
Genes (Basel) ; 15(5)2024 04 29.
Article in English | MEDLINE | ID: mdl-38790203

ABSTRACT

MicroRNAs (miRNAs), a class of small, non-coding RNAs, play a pivotal role in regulating gene expression at the post-transcriptional level. These regulatory molecules are integral to many biological processes and have been implicated in the pathogenesis of various diseases, including Human Immunodeficiency Virus (HIV) infection. This review aims to cover the current understanding of the multifaceted roles miRNAs assume in the context of HIV infection and pathogenesis. The discourse is structured around three primary focal points: (i) elucidation of the mechanisms through which miRNAs regulate HIV replication, encompassing both direct targeting of viral transcripts and indirect modulation of host factors critical for viral replication; (ii) examination of the modulation of miRNA expression by HIV, mediated through either viral proteins or the activation of cellular pathways consequent to viral infection; and (iii) assessment of the impact of miRNAs on the immune response and the progression of disease in HIV-infected individuals. Further, this review delves into the potential utility of miRNAs as biomarkers and therapeutic agents in HIV infection, underscoring the challenges and prospects inherent to this line of inquiry. The synthesis of current evidence positions miRNAs as significant modulators of the host-virus interplay, offering promising avenues for enhancing the diagnosis, treatment, and prevention of HIV infection.


Subjects
HIV Infections; MicroRNAs; Virus Replication; Humans; MicroRNAs/genetics; HIV Infections/genetics; HIV Infections/virology; Virus Replication/genetics; HIV-1/genetics; Host-Pathogen Interactions/genetics; Biomarkers; Gene Expression Regulation
2.
Front Mol Biosci ; 11: 1336336, 2024.
Article in English | MEDLINE | ID: mdl-38380430

ABSTRACT

Alternative polyadenylation (APA) increases transcript diversity by generating isoforms with 3' untranslated regions (3' UTRs) of different lengths. Because the 3' UTR harbors target sites for regulatory elements such as miRNAs and RNA-binding proteins, changes in this region can affect post-transcriptional regulation and translation. Moreover, the APA landscape can change with cell type, cell state, or condition. Given that APA events can affect protein expression, investigating translational control is crucial for understanding the overall cellular regulation process. Revisiting data from polysome profiling followed by RNA sequencing, we investigated the cardiomyogenic differentiation of pluripotent stem cells by identifying transcripts that show dynamic 3' UTR lengthening or shortening and are actively recruited to ribosome complexes. Our findings indicate that dynamic 3' UTR lengthening is not exclusively associated with differential expression during cardiomyogenesis but rather with recruitment to polysomes. We confirm that differentiated cardiomyocytes show a preference for shorter 3' UTRs compared with the pluripotent stage, although preferences vary across the days of the differentiation process. The most distinct regulatory changes are seen on day 4 of differentiation, the mesoderm commitment time point of cardiomyogenesis. After identifying the miRNAs predicted to specifically target the alternative 3' UTR region of the isoforms, we constructed a gene regulatory network for the cardiomyogenesis process, in which genes related to the cell cycle were identified. Altogether, our work sheds light on the regulation and dynamic 3' UTR changes of polysome-recruited transcripts that take place during the cardiomyogenic differentiation of pluripotent stem cells.

3.
BMC Bioinformatics ; 24(1): 60, 2023 Feb 23.
Article in English | MEDLINE | ID: mdl-36823571

ABSTRACT

BACKGROUND: Cell homeostasis relies on the concerted action of genes, and dysregulated genes can lead to disease. In living organisms, genes and their products do not act alone but within networks. Subsets of these networks can be viewed as modules that provide specific functionality to an organism. The Kyoto Encyclopedia of Genes and Genomes (KEGG) systematically analyzes gene functions, proteins, and molecules and combines them into pathways. Gene expression measurements (e.g., RNA-seq data) can be mapped to KEGG pathways to determine which modules are affected or dysregulated in a disease. However, genes acting in multiple pathways and other inherent issues complicate such analyses. Many current approaches use only gene expression data and pay too little attention to the existing knowledge stored in KEGG pathways when detecting dysregulated pathways. New methods that consider more precompiled information are required for a more holistic association between gene expression and diseases. RESULTS: PriPath is a novel approach that applies the generic process of grouping and scoring, followed by modeling, to analyze gene expression with KEGG pathways. In PriPath, KEGG pathways serve as the grouping function within a machine learning algorithm that selects the most significant KEGG pathways. A machine learning model is then trained on those groups to differentiate between disease and control samples. We tested PriPath on 13 gene expression datasets of various cancers and other diseases. Our approach successfully assigned biologically and clinically relevant KEGG terms to the samples based on the differentially expressed genes. We compared the performance of PriPath against other tools with a similar aim. For each dataset, we manually checked the top results of PriPath against the literature and found that most predictions are supported by previous experimental research.
CONCLUSIONS: PriPath can thus aid in determining dysregulated pathways, which has applications in medical diagnostics. In the future, we aim to extend this approach so that it can perform patient stratification based on gene expression and identify druggable targets, thereby covering two aspects of precision medicine.


Subjects
Computational Biology; Neoplasms; Humans; Computational Biology/methods; Neoplasms/genetics; Genome; Algorithms; Gene Expression; Gene Expression Profiling
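The PriPath abstract above names a generic grouping, scoring, and modeling process without giving its details. The Python sketch below illustrates only the grouping and scoring part, under stated assumptions: genes are grouped by pathway membership, and each group is scored by the leave-one-out accuracy of a simple nearest-centroid classifier restricted to that group's genes. The nearest-centroid classifier is a stand-in chosen for brevity, not PriPath's own model, and the pathway names, gene IDs, and expression values are hypothetical.

```python
from statistics import mean

def loo_nearest_centroid_accuracy(samples, labels, genes):
    """Leave-one-out accuracy of a nearest-centroid classifier using only `genes`."""
    correct = 0
    for i, held_out in enumerate(samples):
        # Build one centroid per class from every sample except the held-out one.
        centroids = {}
        for label in sorted(set(labels)):
            members = [s for j, s in enumerate(samples) if j != i and labels[j] == label]
            if members:
                centroids[label] = {g: mean(s[g] for s in members) for g in genes}
        # Predict the class whose centroid is closest (squared Euclidean distance).
        predicted = min(
            centroids,
            key=lambda lab: sum((held_out[g] - centroids[lab][g]) ** 2 for g in genes),
        )
        correct += predicted == labels[i]
    return correct / len(samples)

def rank_groups(samples, labels, groups):
    """Score each gene group (e.g. a pathway) and return (name, score) pairs, best first."""
    scores = {}
    for name, genes in groups.items():
        # Grouping step: keep only the group's genes that were actually measured.
        present = [g for g in genes if g in samples[0]]
        if present:
            # Scoring step: how well do these genes alone separate the classes?
            scores[name] = loo_nearest_centroid_accuracy(samples, labels, present)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical toy data: gene "A" separates the classes, "B" and "C" are noise.
samples = [
    {"A": 5.0, "B": 1.0, "C": 2.0},
    {"A": 6.0, "B": 2.0, "C": 1.0},
    {"A": 5.5, "B": 1.5, "C": 1.8},
    {"A": 1.0, "B": 1.2, "C": 2.1},
    {"A": 0.5, "B": 1.8, "C": 1.1},
    {"A": 1.5, "B": 1.4, "C": 1.7},
]
labels = ["disease"] * 3 + ["control"] * 3
groups = {"pathway_X": ["A", "B"], "pathway_Y": ["B", "C"], "pathway_Z": ["D"]}
ranking = rank_groups(samples, labels, groups)
print(ranking)
```

A complete analysis would then train a final model on the top-ranked groups; that modeling step is omitted here.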
4.
J Integr Bioinform ; 20(1)2023 Mar 01.
Article in English | MEDLINE | ID: mdl-36812104

ABSTRACT

Science has become a highly competitive undertaking with respect to, for example, resources, positions, students, and publications. At the same time, the number of journals presenting scientific findings is skyrocketing, while the knowledge gained per manuscript seems to be diminishing. Science has also become ever more dependent on computational analyses; virtually all biomedical applications, for example, involve computational data analysis. The science community develops many computational tools, and there are numerous alternatives for many computational tasks. The same is true for workflow management systems, leading to a tremendous duplication of effort. Software quality is often of low concern, and typically a small dataset is used as a proof of principle to support rapid publication. Installing and using such tools is complicated, so virtual machine images, containers, and package managers are employed more and more frequently. These simplify installation and use but solve neither the software quality issue nor the duplication of effort. We believe that a community-wide collaboration is needed to (a) ensure software quality, (b) increase reuse of code, (c) enforce proper software review, (d) increase testing, and (e) make interoperability more seamless. Such a science software ecosystem will overcome current issues and increase trust in current data analyses.


Subjects
Ecosystem; Trust; Humans; Computational Biology/methods; Software; Workflow
5.
Curr Pharm Biotechnol ; 24(7): 825-831, 2023.
Article in English | MEDLINE | ID: mdl-35619299

ABSTRACT

Diseases such as cancer are often defined by dysregulation of gene expression. Noncoding RNAs (ncRNAs) such as microRNAs are involved in gene expression and cell-cell communication. Many other ncRNAs exist, such as circular RNAs and small nucleolar RNAs. A wealth of knowledge is available for many ncRNAs, but the information is scattered across many databases. A small number of highly complementary ncRNA databases are discussed in this work. Their relevance for cancer research is highlighted, and some of their current problems and limitations are revealed. A central or shared database that enforces community reporting and quality standards is needed in the future.
Keywords: RNA-seq; Noncoding RNAs; Databases; Data repositories.


Subjects
MicroRNAs; Neoplasms; RNA, Long Noncoding; Humans; RNA, Long Noncoding/genetics; RNA, Untranslated/genetics; RNA, Untranslated/metabolism; MicroRNAs/genetics; MicroRNAs/metabolism; Neoplasms/genetics
6.
Turk J Biol ; 47(6): 366-382, 2023.
Article in English | MEDLINE | ID: mdl-38681776

ABSTRACT

Deep learning is a powerful machine learning technique that can learn from large amounts of data using artificial neural networks with multiple layers. This paper reviews some applications of deep learning in bioinformatics, a field that deals with analyzing and interpreting biological data. We first introduce the basic concepts of deep learning and then survey recent advances and challenges in applying deep learning to various bioinformatics problems, such as genome sequencing, gene expression analysis, protein structure prediction, drug discovery, and disease diagnosis. We also discuss future directions and opportunities for deep learning in bioinformatics. We aim to provide an overview of deep learning so that bioinformaticians applying deep learning models can consider all critical technical and ethical aspects. Thus, our target audience is biomedical informatics researchers who use deep learning models for inference. We hope this review will inspire more bioinformatics researchers to adopt deep learning methods for their research questions while considering fairness, potential biases, explainability, and accountability.

7.
Methods Mol Biol ; 2257: 235-254, 2022.
Article in English | MEDLINE | ID: mdl-34432282

ABSTRACT

Gene regulation is of utmost importance to cell homeostasis; thus, any dysregulation in it often leads to disease. MicroRNAs (miRNAs) are involved in posttranscriptional gene regulation, and consequently their dysregulation has been associated with many diseases. miRBase version 21 contains microRNAs from about 200 species organized into about 70 clades. It has been shown that not all miRNAs collected in the database are likely to be real, and, therefore, novel routes to distinguish between correct and false miRNAs should be explored. We introduce a novel approach based on k-mer frequencies and machine learning that assigns an unknown/unlabeled miRNA to its most likely clade/species of origin. A simple way to filter new data would be to ensure that a novel miRNA is classified close to the species it is said to originate from. For that purpose, an ensemble classifier of multiple two-class random forest classifiers was designed, in which each random forest was trained on one species-clade pair. The approach was tested with different sampling methods on a dataset taken from miRBase version 21 and was evaluated using a hierarchical F-measure. The approach predicted 81% to 94% of the test data correctly, depending on the sampling method. This is the first classifier that can classify miRNAs to their species of origin. This method will aid in the evaluation of miRNA database integrity and the analysis of noisy miRNA samples.


Subjects
MicroRNAs/genetics; Gene Expression Regulation; Machine Learning
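The species-of-origin classifier in the abstract above is built on k-mer frequencies. A minimal sketch of how such a feature vector can be computed from an RNA sequence follows; the choice of k, the normalization by window count, the function name, and the example sequence are assumptions for illustration, and the ensemble of two-class random forests that would consume these vectors is not shown.

```python
from itertools import product

def kmer_frequencies(seq, k=3, alphabet="ACGU"):
    """Relative frequency of every possible k-mer in an RNA sequence, in a fixed order."""
    seq = seq.upper().replace("T", "U")  # also accept DNA-style input
    all_kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(all_kmers, 0)
    total = 0
    # Slide a window of length k over the sequence and count each k-mer.
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:  # skip windows containing ambiguous bases such as N
            counts[kmer] += 1
            total += 1
    # Normalize by the number of counted windows so sequences of different
    # lengths yield comparable feature vectors (4**k entries in a fixed order).
    return [counts[km] / total if total else 0.0 for km in all_kmers]

features = kmer_frequencies("UGAGGUAGUAGGUUGUAUAGUU", k=3)
print(len(features))  # 64 features for k = 3
```

Because every sequence maps to the same fixed-length vector, miRNAs of different lengths can be fed to any standard classifier.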
8.
Methods Mol Biol ; 2257: 423-438, 2022.
Article in English | MEDLINE | ID: mdl-34432289

ABSTRACT

Mature microRNAs (miRNAs) are short RNA sequences about 18-24 nucleotides long that provide the recognition key within RISC for the posttranscriptional regulation of target RNAs. In the canonical pathway, mature miRNAs are produced via a multistep process. Their transcription (pri-miRNAs) and first processing step via the microprocessor complex (pre-miRNAs) occur in the nucleus. They are then exported into the cytosol and processed again by Dicer (dsRNA), and finally a single strand (mature miRNA) is incorporated into RISC (miRISC). The sequence of the incorporated miRNA provides RNA target recognition via hybridization. Once the target is bound, the mRNA is either degraded or its translation is inhibited, which ultimately leads to less protein production. Conversely, it has been shown that binding within the 5' UTR of the mRNA can lead to an increase in protein product. Regulation of homeostasis is very important for a cell; therefore, all steps in the miRNA-based regulation pathway, from transcription to the incorporation of the mature miRNA into RISC, are under tight control. While much research effort has been exerted in this area, the knowledge base is not sufficient for accurately modelling miRNA regulation computationally. The computational prediction of miRNAs is, however, necessary because it is not feasible to experimentally investigate all possible pairs of a miRNA and its targets, let alone all miRNAs and their targets. We here point out open challenges that are important for computational modelling or for our general understanding of miRNA-based regulation and show how investigating them is beneficial. It is our hope that this collection of challenges will lead to their resolution in the near future.


Subjects
MicroRNAs/genetics; Gene Expression Regulation; Genomics; RNA, Messenger
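The abstract above describes target recognition by hybridization of the incorporated miRNA. One widely used simplification is the canonical seed match, in which a target site is perfectly complementary to miRNA nucleotides 2-8 (a 7mer-m8 site). The sketch below scans a target sequence for such sites; it ignores site context, 3' supplementary pairing, and non-canonical sites, and the function names and example sequences are hypothetical.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

def reverse_complement(rna):
    """Reverse complement of an RNA string (A/C/G/U only)."""
    return "".join(COMPLEMENT[b] for b in reversed(rna))

def seed_sites(mirna, target, start=2, end=8):
    """0-based positions in `target` perfectly complementary to miRNA nucleotides start..end (1-based)."""
    seed = mirna.upper().replace("T", "U")[start - 1:end]
    # The target pairs antiparallel with the miRNA, so the site reads as the
    # reverse complement of the seed in the 5'->3' direction of the target.
    site = reverse_complement(seed)
    target = target.upper().replace("T", "U")
    return [i for i in range(len(target) - len(site) + 1) if target[i:i + len(site)] == site]

mirna = "UGAGGUAGUAGGUUGUAUAGUU"
target = "AAACUACCUCAAAGGCUACCUCGG"  # toy 3' UTR fragment with two planted sites
print(seed_sites(mirna, target))
```

Real target predictors add conservation, site accessibility, and context scores on top of this basic pairing rule.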
10.
Arch Toxicol ; 95(11): 3475-3495, 2021 Nov.
Article in English | MEDLINE | ID: mdl-34510227

ABSTRACT

microRNAs (miRNAs or miRs) are short non-coding RNA molecules which have been shown to be dysregulated and released into the extracellular milieu as a result of many drug and non-drug-induced pathologies in different organ systems. Consequently, circulating miRs have been proposed as useful biomarkers of many disease states, including drug-induced tissue injury. miRs have shown potential to support or even replace the existing traditional biomarkers of drug-induced toxicity in terms of sensitivity and specificity, and there is some evidence for their improved diagnostic and prognostic value. However, several pre-analytical and analytical challenges, mainly associated with assay standardization, require solutions before circulating miRs can be successfully translated into the clinic. This review will consider the value and potential for the use of circulating miRs in drug-safety assessment and describe a systems approach to the analysis of the miRNAome in the discovery setting, as well as highlighting standardization issues that at this stage prevent their clinical use as biomarkers. Highlighting these challenges will hopefully drive future research into finding appropriate solutions, and eventually circulating miRs may be translated to the clinic where their undoubted biomarker potential can be used to benefit patients in rapid, easy to use, point-of-care test systems.


Subjects
Biomarkers, Pharmacological; MicroRNAs/blood; Drug-Related Side Effects and Adverse Reactions/diagnosis; Humans; MicroRNAs/analysis; Sensitivity and Specificity
11.
J Integr Bioinform ; 18(1): 19-26, 2021 Mar 16.
Article in English | MEDLINE | ID: mdl-33721918

ABSTRACT

SARS-CoV-2 has spread worldwide and caused social, economic, and health turmoil. The first genome assembly of SARS-CoV-2 was produced in Wuhan, and it is widely used as a reference. Since then, more than a hundred additional SARS-CoV-2 genomes have been sequenced. While the genomes appear to be mostly identical, there are variations. Therefore, an alignment of all available genomes and the derived consensus sequence could be used as a reference that better serves the science community. Variations are significant, but representing them in a genome browser can become difficult, especially when the sequences are largely identical. Here we summarize the variation in one track. Other information not currently found in genome browsers for SARS-CoV-2, such as predicted miRNAs, predicted TRS, and secondary structure information, was also added as tracks to the consensus genome. We believe that a genome browser based on the consensus sequence is better suited for considering worldwide effects and can become a valuable resource in combating COVID-19. The genome browser is available at http://cov.iaba.online.


Subjects
COVID-19; Genome, Viral/genetics; SARS-CoV-2/genetics; Base Sequence; Humans; Software
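The abstract above does not say how the consensus sequence was derived. Assuming the genomes are already aligned to equal length, a simple majority-rule consensus, together with the list of columns that contain any variation (the kind of information a single variation track could display), might look like the sketch below. Ties are broken by the base seen first in the column, columns whose majority symbol is a gap are dropped, and the toy alignment is hypothetical.

```python
from collections import Counter

def consensus(alignment, gap="-"):
    """Majority-rule consensus of equal-length aligned sequences plus variable column indices."""
    if len({len(s) for s in alignment}) != 1:
        raise ValueError("all aligned sequences must have the same, non-zero set of lengths")
    consensus_bases = []
    variable_columns = []
    for col, column in enumerate(zip(*alignment)):
        counts = Counter(column)
        base, _ = counts.most_common(1)[0]  # ties resolved by first occurrence
        if len(counts) > 1:
            variable_columns.append(col)  # candidate position for a variation track
        if base != gap:  # skip columns where most sequences have a gap
            consensus_bases.append(base)
    return "".join(consensus_bases), variable_columns

alignment = ["ACGTACGT", "ACGTACGA", "ACCTACGT", "ACGTAC-T"]
print(consensus(alignment))
```

Column indices refer to the alignment, so mapping them back to genome coordinates of the consensus would require accounting for dropped gap columns.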
13.
PeerJ ; 8: e10216, 2020.
Article in English | MEDLINE | ID: mdl-33150092

ABSTRACT

For the identification and sequencing of proteins, mass spectrometry (MS) has become the tool of choice and, as such, drives proteomics. MS/MS spectra need to be assigned a peptide sequence, for which two strategies exist: either database search or de novo sequencing can be employed to establish peptide-spectrum matches. For database search, mzIdentML is the current community standard for data representation. There is no community standard for representing de novo sequencing results, but we previously proposed the de novo markup language (DNML). At the moment, each de novo sequencing solution uses a different data representation, complicating downstream data integration, which is crucial since ensemble predictions may be more useful than predictions of a single tool. We here propose the de novo MS Ontology (DNMSO), which can, for example, provide many-to-many mappings between spectra and peptide predictions. Additionally, we have implemented an application programming interface (API) that supports every file operation necessary for de novo sequencing, from spectra input to reading, writing, and creating the DNMSO format, as well as conversion from many other file formats. This API removes all overhead from the production of de novo sequencing tools and allows developers to concentrate entirely on algorithm development. We make the API and formal descriptions of the format freely available at https://github.com/savastakan/dnmso.

14.
J Integr Bioinform ; 16(3)2019 May 30.
Article in English | MEDLINE | ID: mdl-31145694

ABSTRACT

Big data and complex analysis workflows (pipelines) are common issues in data-driven science such as bioinformatics. A large number of computational tools are available for data analysis. Additionally, many workflow management systems have been developed to piece such tools together into data analysis pipelines. For example, more than 50 computational tools for read mapping are available, representing a large amount of duplicated effort. Furthermore, it is unclear whether these tools are correct, and only a few have a user base large enough to have encountered and reported most of the potential problems. Bringing together many largely untested tools in a computational pipeline is bound to lead to unpredictable results. Yet this is the current state. While data analysis is presently performed on personal computers, workstations, and clusters, the future will see development and analysis shift to the cloud. None of the workflow management systems is ready for this transition. This presents the opportunity to build a new system that will overcome current duplications of effort, introduce proper testing, allow for development and analysis in public and private clouds, and include reporting features leading to interactive documents.


Subjects
Computational Biology; Internet; Software; User-Computer Interface; Workflow
15.
Bioinformatics ; 35(20): 4020-4028, 2019 10 15.
Article in English | MEDLINE | ID: mdl-30895309

ABSTRACT

MOTIVATION: Disease is often manifested via changes in transcript and protein abundance. MicroRNAs (miRNAs) are instrumental in regulating protein abundance and may measurably influence transcript levels. miRNAs often target more than one mRNA (for humans, the average is three), and mRNAs are often targeted by more than one miRNA (for the genes considered in this study, the average is also three). Therefore, it is difficult to determine which miRNAs may cause the observed differential gene expression. We present maTE, a novel machine learning-based approach that integrates information about miRNA target genes with gene expression data. maTE depends on the availability of a sufficient number of patient and control samples. The samples are used to train classifiers that classify the samples accurately on a per-miRNA basis. Multiple high-scoring miRNAs are then used to build a final classifier that improves separation. RESULTS: The aim of the study is to find the set of miRNAs that, by regulating their target genes, best explains the difference between groups (e.g. cancer versus control). maTE provides a list of significant groups of genes, where each group is targeted by a specific miRNA. For the datasets used in this study, maTE generally achieves an accuracy well above 80%. The results also show that when the accuracy is much lower (e.g. ∼50%), the set of miRNAs provided is likely not causative of the difference in expression. This new approach of integrating miRNA regulation with expression data yields powerful results and is independent of external labels and training data. Thereby, this approach opens new avenues for exploring miRNA regulation and may enable the development of miRNA-based biomarkers and drugs. AVAILABILITY AND IMPLEMENTATION: The KNIME workflow implementing maTE is available at Bioinformatics online. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
MicroRNAs/genetics; Gene Expression Profiling; Humans; Machine Learning; Neoplasms; RNA, Messenger
16.
Methods Mol Biol ; 1912: 175-196, 2019.
Article in English | MEDLINE | ID: mdl-30635894

ABSTRACT

Proteins have a strong influence on the phenotype, and their aberrant expression leads to disease. MicroRNAs (miRNAs) are short RNA sequences that posttranscriptionally regulate protein expression. This regulation is driven by miRNAs acting as recognition sequences for their target mRNAs within a larger regulatory machinery. A miRNA can have many target mRNAs, and an mRNA can be targeted by many miRNAs, which makes it difficult to experimentally discover all miRNA-mRNA interactions. Therefore, computational methods have been developed for miRNA detection and miRNA target prediction. The abundance of available computational tools makes selection difficult. Additionally, interactions are not currently the focus of investigation, although they define the regulation more accurately than pre-miRNA detection or target prediction alone. We define an interaction as including both the miRNA source and the mRNA target. We present computational methods that allow these interactions to be investigated and show how they can be used to extend regulatory pathways. Finally, we present a list of points that should be taken into account when investigating miRNA-mRNA interactions. In the future, this may lead to a better understanding of functional interactions, which may pave the way for disease marker discovery and the design of miRNA-based drugs.


Subjects
Computational Biology/methods; Gene Regulatory Networks; MicroRNAs/metabolism; RNA, Messenger/metabolism; Animals; Computational Biology/instrumentation; Databases, Genetic; Gene Expression Profiling/instrumentation; Gene Expression Profiling/methods; High-Throughput Nucleotide Sequencing/instrumentation; High-Throughput Nucleotide Sequencing/methods; Humans; Machine Learning; MicroRNAs/isolation & purification; RNA, Messenger/isolation & purification; Sequence Analysis, RNA/instrumentation; Sequence Analysis, RNA/methods; Software
17.
Stud Health Technol Inform ; 253: 183-187, 2018.
Article in English | MEDLINE | ID: mdl-30147069

ABSTRACT

MicroRNAs (miRNAs), approximately 22 nucleotides long, are post-transcriptional regulators of gene expression and play active roles in modulating cellular processes. Gene regulation and miRNA regulation are intertwined, and the main aim of this study is to facilitate the analysis of miRNAs within gene regulatory pathways. VANESA enables the reconstruction of biological pathways and supports their visualization and simulation. To support integrative miRNA and gene pathway analyses, a custom database of experimentally proven miRNAs, integrating data from miRBase, TarBase, and miRTarBase, was added to DAWIS-M.D., the main data source for VANESA. Analysis of human KEGG pathways within DAWIS-M.D. showed that 661 miRNAs (~1/3 of recorded human miRNAs) lead to 65,474 interactions. hsa-miR-335-5p targets the most genes in our system (2,544), while the most targeted gene (by 71 miRNAs) is NUFIP2 (Nuclear Fragile X Mental Retardation Protein Interacting Protein 2). Amyotrophic lateral sclerosis (ALS), a complex neurodegenerative disease, was chosen as a proof-of-concept model. Using our system, it was possible to reduce the initial several hundred genes and miRNAs associated with ALS to eight genes, 19 miRNAs, and 31 interactions. This highlights the effectiveness of the implemented system in distilling important information from vast, highly convoluted, and otherwise hard-to-access regulatory networks.


Subjects
Amyotrophic Lateral Sclerosis/genetics; Databases, Genetic; Gene Expression Regulation; Gene Regulatory Networks; MicroRNAs; Gene Expression Profiling; Humans; Statistics as Topic
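Summary figures like those reported in the abstract above (the number of miRNAs and interactions, the miRNA with the most targets, and the most targeted gene) are degree counts over a bipartite miRNA-gene interaction list. A minimal sketch, assuming interactions are given as (miRNA, gene) pairs merged from several sources; the identifiers are hypothetical, and this is not the DAWIS-M.D. schema or code.

```python
from collections import defaultdict

def interaction_stats(pairs):
    """Degree-based summary of a miRNA-target interaction list of (mirna, gene) pairs."""
    targets = defaultdict(set)     # miRNA -> genes it targets
    regulators = defaultdict(set)  # gene -> miRNAs targeting it
    for mirna, gene in set(pairs):  # deduplicate records repeated across merged sources
        targets[mirna].add(gene)
        regulators[gene].add(mirna)
    top_mirna = max(targets, key=lambda m: len(targets[m]))
    top_gene = max(regulators, key=lambda g: len(regulators[g]))
    return {
        "mirnas": len(targets),
        "genes": len(regulators),
        "interactions": sum(len(genes) for genes in targets.values()),
        "top_mirna": (top_mirna, len(targets[top_mirna])),
        "top_gene": (top_gene, len(regulators[top_gene])),
    }

pairs = [
    ("miR-A", "G1"), ("miR-A", "G2"), ("miR-A", "G3"),
    ("miR-B", "G1"), ("miR-C", "G1"),
    ("miR-A", "G1"),  # duplicate record, counted once
]
print(interaction_stats(pairs))
```

When several miRNAs or genes share the maximum degree, `max` returns whichever it encounters first, so a real report should list all ties.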
18.
Chemosphere ; 199: 390-401, 2018 May.
Article in English | MEDLINE | ID: mdl-29453065

ABSTRACT

Puccinellia distans, common alkali grass, is found throughout the world and can survive in soils with boron concentrations that are lethal to other plant species. Indeed, P. distans accumulates very high levels of this element. Despite these interesting features, very little research has been performed to elucidate the boron tolerance mechanism in this species. In this study, P. distans samples were treated for three weeks with normal (0.5 mg L⁻¹) and elevated (500 mg L⁻¹) boron levels in hydroponic solution. Expressed sequence tags (ESTs) derived from shoot tissue were analyzed by RNA sequencing to identify genes up- and down-regulated under boron stress. In this way, 3312 differentially expressed transcripts were detected, of which 67.7% were up-regulated and 32.3% down-regulated in boron-treated plants. To partially confirm the RNA sequencing results, the expression levels of 32 randomly selected transcripts were analyzed in boron-treated plants. The results agreed with the expected direction of change (up- or down-regulation). A total of 1652 transcripts had homologs in A. thaliana and/or O. sativa and mapped to 1107 different proteins. Functional annotation of these proteins indicated that the boron tolerance and hyperaccumulation mechanisms of P. distans involve many transcriptomic changes, including alterations in the malate pathway, changes in cell wall components that may allow sequestration of excess boron without toxic effects, and increased expression of at least one putative boron transporter and two putative aquaporins. Elucidation of the boron accumulation mechanism is important for developing approaches to the bioremediation of boron-contaminated soils.


Subjects
Biodegradation, Environmental; Boron/pharmacokinetics; Gene Expression Profiling; Gene Expression Regulation, Plant; Poaceae/metabolism; Environmental Pollution/analysis; Expressed Sequence Tags; Hydroponics
19.
Nat Commun ; 8(1): 330, 2017 08 24.
Article in English | MEDLINE | ID: mdl-28839141

ABSTRACT

MicroRNAs are crucial for post-transcriptional gene regulation, and their dysregulation has been associated with diseases such as cancer; therefore, their analysis has become popular. The experimental discovery of miRNAs is cumbersome, and, thus, many computational tools have been proposed. Here we assess 13 ab initio pre-miRNA detection approaches using all relevant published and novel data sets, judging algorithm performance based on ten intrinsic performance measures. We present an extensible framework, izMiR, which allows for the unbiased comparison of existing algorithms, the addition of new ones, and the combination of multiple approaches into ensemble methods. In an exhaustive attempt, we condense the results of millions of computations and show that no method is clearly superior; however, we provide a guideline for biomedical researchers to select a tool. Finally, we demonstrate that combining all of the methods into one ensemble approach allows, for the first time, reliable purely computational pre-miRNA detection in large eukaryotic genomes.

As the experimental discovery of microRNAs (miRNAs) is cumbersome, computational tools have been developed for the prediction of pre-miRNAs. Here the authors develop a framework to assess the performance of existing and novel pre-miRNA prediction tools and provide guidelines for selecting an appropriate approach for a given data set.


Subjects
Algorithms; Computational Biology/methods; Gene Expression Regulation; MicroRNAs/genetics; RNA Precursors/genetics; Humans; Machine Learning; Reproducibility of Results
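The izMiR abstract above reports that combining the detection methods into one ensemble enables reliable pre-miRNA detection, but the combination rule is not described in the abstract. As an assumption, the sketch below combines per-tool scores for the same candidate hairpins by unweighted averaging and a fixed threshold; the tool names, scores, and the 0.5 threshold are hypothetical.

```python
from statistics import mean

def ensemble_predict(scores_by_tool, threshold=0.5):
    """Average per-tool pre-miRNA scores (0..1) for each candidate and apply a threshold.

    `scores_by_tool` maps a tool name to a list of scores, one per candidate hairpin,
    with the candidates in the same order for every tool.
    """
    lengths = {len(scores) for scores in scores_by_tool.values()}
    if len(lengths) != 1:
        raise ValueError("every tool must score the same candidates")
    # zip(*...) regroups the scores per candidate across all tools.
    averaged = [mean(candidate) for candidate in zip(*scores_by_tool.values())]
    return [(avg, avg >= threshold) for avg in averaged]

scores = {
    "tool_1": [0.9, 0.2, 0.6],
    "tool_2": [0.8, 0.4, 0.3],
    "tool_3": [0.7, 0.1, 0.7],
}
result = ensemble_predict(scores)
print(result)
```

Averaging lets a confident tool compensate for a hesitant one, whereas majority voting on each tool's own binary call would weigh all tools equally regardless of confidence; which works better depends on how well the tools' scores are calibrated.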
20.
J Integr Bioinform ; 14(2)2017 Jul 28.
Article in English | MEDLINE | ID: mdl-28753538

ABSTRACT

MicroRNAs (miRNAs) are involved in the post-transcriptional regulation of protein abundance and thus have a great impact on the resulting phenotype. It is, therefore, no wonder that they have been implicated in many diseases ranging from virus infections to cancer. This impact on the phenotype leads to great interest in establishing the miRNAs of an organism. Experimental methods are complicated, which led to the development of computational methods for pre-miRNA detection. Such methods generally employ machine learning to establish models for discriminating between miRNAs and other sequences. Positive training data for model establishment stem, for the most part, from miRBase, the miRNA registry. The quality of the entries in miRBase has been questioned, though. This unknown quality led to the development of filtering strategies in attempts to produce high-quality positive datasets, which can lead to a scarcity of positive data. To analyze the quality of filtered data, we developed a machine learning model and found that it is well able to establish data quality based on intrinsic measures. Additionally, we analyzed which features describing pre-miRNAs could discriminate between low- and high-quality data. Both models are applicable to data from miRBase and can be used for establishing high-quality positive data. This will facilitate the development of better miRNA detection tools, which will make the prediction of miRNAs in disease states more accurate. Finally, we applied both models to all miRBase data and provide the list of high-quality hairpins.


Subjects
Datasets as Topic/standards; Machine Learning; MicroRNAs/analysis; Humans; MicroRNAs/genetics; Registries