ABSTRACT
Prioritization or ranking of different cell types in a single-cell RNA sequencing (scRNA-seq) framework can be performed in a variety of ways, including: i) obtaining an indication of the proportion of cell types between the different conditions under study, ii) counting the number of differentially expressed genes (DEGs) between cell types and conditions in the experiment, or iii) prioritizing cell types based on prior knowledge about the conditions under study (i.e., a specific disease). These methods have drawbacks and limitations; thus, novel methods for improving cell ranking are required. Here we present a novel methodology that exploits prior knowledge in combination with expert-user information to accentuate cell types from a scRNA-seq analysis that yield the most biologically meaningful results with respect to a disease under study. Our methodology allows for ranking and prioritization of cell types based on how well their expression profiles relate to the molecular mechanisms and drugs associated with a disease. Molecular mechanisms, as well as drugs, are incorporated as prior knowledge in a standardized, structured manner. Cell types are then ranked/prioritized based on how well results from data-driven analysis of scRNA-seq data match the predefined prior knowledge. In addition, cell-cell communication perturbations between disease and control networks are used to further prioritize/rank cell types. Our methodology has substantial advantages over more traditional cell ranking techniques and provides an informative complementary methodology that utilizes prior knowledge in a rapid and automated manner, which has not previously been attempted by other studies. The methodology is also implemented as an R package entitled Single Cell Ranking Analysis Toolkit (scRANK) and is available for download and installation via GitHub (https://github.com/aoulas/scRANK).
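The idea of ranking cell types by how well their data-driven results match prior knowledge can be illustrated with a minimal Python sketch. This is not the scRANK implementation: the scoring rule (the fraction of prior-knowledge genes recovered among a cell type's DEGs), the function name `rank_cell_types`, and the toy gene lists are all assumptions made purely for illustration.

```python
def rank_cell_types(degs_by_cell_type, prior_genes):
    """Rank cell types by the fraction of prior-knowledge genes found among their DEGs.

    degs_by_cell_type: dict mapping a cell type name to its list of DEGs.
    prior_genes: genes linked to the disease (hypothetical prior knowledge).
    Returns a list of (cell_type, score) pairs, best score first.
    """
    prior = set(prior_genes)
    scores = {}
    for cell_type, degs in degs_by_cell_type.items():
        overlap = set(degs) & prior
        # Assumed score: share of the prior-knowledge set recovered by this cell type.
        scores[cell_type] = len(overlap) / len(prior) if prior else 0.0
    # sorted() is stable, so tied cell types keep their input order.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Toy input; gene lists are invented for the example.
    toy_degs = {
        "T cells": ["IL7R", "CD3E", "GZMB"],
        "B cells": ["CD19", "MS4A1"],
        "Monocytes": ["CD14", "LYZ", "IL1B", "TNF"],
    }
    toy_prior = ["IL1B", "TNF", "GZMB", "CD19"]
    for cell_type, score in rank_cell_types(toy_degs, toy_prior):
        print(cell_type, score)
```

A real analysis would combine several evidence sources (mechanisms, drugs, cell-cell communication), as the abstract describes; the single overlap score here only shows the general shape of such a ranking.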
Subjects
Computational Biology , RNA Sequence Analysis , Single-Cell Analysis , Single-Cell Analysis/methods , RNA Sequence Analysis/methods , Humans , Computational Biology/methods , Gene Expression Profiling/methods , RNA-Seq/methods , Algorithms , Software
ABSTRACT
The "replication crisis" is a methodological problem in which many scientific research findings have been difficult or impossible to replicate. Because the reproducibility of empirical results is an essential aspect of the scientific method, such failures endanger the credibility of theories based on them and possibly significant portions of scientific knowledge. An instance of the replication crisis, analytic replication, pertains to reproducing published results through computational reanalysis of the authors' original data. However, direct replications are costly, time-consuming, and unrewarded in today's publishing standards. We propose that bioinformatics and computational biology students replicate recent discoveries as part of their curriculum. Considering the above, we performed a pilot study in one of the graduate-level courses we developed and taught at our university. The course is entitled Intro to R Programming and is meant for students in our Master's and PhD programs who have little to no programming skills. As the course emphasized real-world data analysis, we thought it would be an appropriate setting to carry out this study. The primary objective was to expose the students to real biological data analysis problems. These include locating and downloading the needed datasets, understanding any underlying conventions and annotations, understanding the analytical methods, and regenerating multiple graphs from their assigned article. The secondary goal was to determine whether the assigned articles contained sufficient information for a graduate-level student to replicate their figures. Overall, the students successfully reproduced 39% of the figures. The main obstacles were the need for more advanced programming skills and the incomplete documentation of the applied methods. Students were engaged, enthusiastic, and focused throughout the semester. 
We believe that this teaching approach will allow students to make fundamental scientific contributions under appropriate supervision. It will teach them about the scientific process, the importance of reporting standards, and the importance of openness.
Subjects
Curriculum , Graduate Education , Humans , Pilot Projects , Reproducibility of Results , Graduate Education/methods , Students , Teaching
ABSTRACT
MOTIVATION: MicroRNA (miRNA) precursor arms simultaneously give rise to multiple isoforms called 'isomiRs'. IsomiRs from the same arm typically differ by a few nucleotides at either their 5' or 3' termini or both. In humans, the identities and abundances of isomiRs depend on a person's sex and genetic ancestry as well as on tissue type, tissue state and disease type/subtype. Moreover, nearly half of the time the most abundant isomiR differs from the miRNA sequence found in public databases. Accurate mining of isomiRs from deep sequencing data is thus important. RESULTS: We developed isoMiRmap, a fast, standalone, user-friendly mining tool that identifies and quantifies all isomiRs by directly processing short RNA-seq datasets. IsoMiRmap is a portable 'plug-and-play' tool, requires minimal setup, has modest computing and storage requirements, and can process an RNA-seq dataset with 50 million reads in just a few minutes on an average laptop. IsoMiRmap deterministically and exhaustively reports all isomiRs in a given deep sequencing dataset and quantifies them accurately (no double-counting). IsoMiRmap comprehensively reports all miRNA precursor locations from which an isomiR may be transcribed, tags as 'ambiguous' isomiRs whose sequences exist both inside and outside of the space of known miRNA sequences and reports the public identifiers of common single-nucleotide polymorphisms and documented somatic mutations that may be present in an isomiR. IsoMiRmap also identifies isomiRs with 3' non-templated post-transcriptional additions. Compared to similar tools, isoMiRmap is the fastest, reports more bona fide isomiRs, and provides the most comprehensive information related to an isomiR's transcriptional origin. AVAILABILITY AND IMPLEMENTATION: The codes for isoMiRmap are freely available at https://cm.jefferson.edu/isoMiRmap/ and https://github.com/TJU-CMC-Org/isoMiRmap/. 
IsomiR profiles for the datasets of the 1000 Genomes Project, spanning five population groups, and The Cancer Genome Atlas (TCGA), spanning 33 cancer studies, are also available at https://cm.jefferson.edu/isoMiRmap/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
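The statement that isomiRs from the same arm differ by a few nucleotides at their 5' or 3' termini can be made concrete with a small Python sketch. It is not isoMiRmap's algorithm or its naming scheme: the function `isomir_offsets`, the half-open coordinate convention, and the toy precursor sequence are assumptions introduced only for illustration.

```python
def isomir_offsets(precursor, ref_start, ref_end, read):
    """Describe a read as terminal offsets relative to a reference miRNA.

    precursor: precursor (hairpin) sequence as a string.
    ref_start, ref_end: reference miRNA as the half-open slice precursor[ref_start:ref_end].
    read: sequenced read to compare against the reference.
    Returns (five_prime_offset, three_prime_offset), or None if the read
    does not occur exactly in the precursor. Negative values mean the read
    begins or ends upstream of the reference terminus, positive downstream.
    """
    start = precursor.find(read)  # first exact occurrence only (toy assumption)
    if start == -1:
        return None
    end = start + len(read)
    return start - ref_start, end - ref_end


if __name__ == "__main__":
    # Invented 35-nt sequence; the reference miRNA is taken as positions 5..26.
    toy_precursor = "GCAUUGGACUAGCUAGGCUAACGGAUCCGAUUAGC"
    ref_start, ref_end = 5, 27
    reads = [toy_precursor[5:27], toy_precursor[4:27], toy_precursor[5:29]]
    for read in reads:
        print(read, isomir_offsets(toy_precursor, ref_start, ref_end, read))
```

A production tool also has to deal with reads that map to several precursors, sequence variants, and non-templated 3' additions, all of which the abstract lists and this sketch ignores.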
ABSTRACT
AMPA-type glutamate receptors mediate fast, excitatory neurotransmission in the brain, and their concentrations at synapses are important determinants of synaptic strength. We investigated the post-transcriptional regulation of GluA2, the calcium-impermeable AMPA receptor subunit, by examining the subcellular distribution of its mRNA and evaluating its translational regulation by microRNA in cultured mouse hippocampal neurons. Using computational approaches, we identified a conserved microRNA-124 (miR-124) binding site in the 3'UTR of GluA2 and demonstrated that miR-124 regulated the translation of GluA2 mRNA reporters in a sequence-specific manner in luciferase assays. While we hypothesized that this regulation might occur in dendrites, our biochemical and fluorescent in situ hybridization (FISH) data indicate that GluA2 mRNA does not localize to dendrites or synapses of mouse hippocampal neurons. In contrast, we detected significant concentrations of miR-124 in dendrites. Overexpression of miR-124 in dissociated neurons results in a 30% knockdown of GluA2 protein, as measured by immunoblot and quantitative immunocytochemistry, without producing any changes in GluA2 mRNA concentrations. While total GluA2 concentrations are reduced, we did not detect any changes in the concentration of synaptic GluA2. We conclude from these results that miR-124 interacts with GluA2 mRNA in the cell body to downregulate translation. Our data support a model in which GluA2 is translated in the cell body and subsequently transported to neuronal dendrites and synapses, and suggest that synaptic GluA2 concentrations are modified primarily by regulated protein trafficking rather than by regulated local translation.
Subjects
Gene Expression Regulation/genetics , Hippocampus/cytology , MicroRNAs/metabolism , Neurons/metabolism , Messenger RNA/metabolism , AMPA Receptors/genetics , Animals , Newborn Animals , Cultured Cells , Central Nervous System Stimulants/pharmacology , Dendrites/metabolism , Gene Expression Regulation/drug effects , Fluorescence In Situ Hybridization , Mice , Inbred C57BL Mice , MicroRNAs/genetics , MicroRNAs/pharmacology , Nerve Tissue Proteins/metabolism , Neurons/cytology , Neurons/drug effects , Picrotoxin/pharmacology , Point Mutation/genetics , Protein Binding/genetics , Protein Transport/drug effects , Protein Transport/genetics , AMPA Receptors/metabolism , CXCR Receptors/genetics , CXCR Receptors/metabolism , Synaptosomes/metabolism
ABSTRACT
BACKGROUND AND OBJECTIVE: The standard of care in Acute Myeloid Leukemia (AML) patients has remained essentially unchanged for nearly 40 years. Due to the complicated mutational patterns within and between individual patients and a lack of targeted agents for most mutational events, implementing individualized treatment for AML has proven difficult. We reanalysed the BeatAML dataset employing Machine Learning algorithms. The BeatAML project entails patients extensively characterized at the molecular and clinical levels and linked to drug sensitivity outputs. Our approach capitalizes on the molecular and clinical data provided by the BeatAML dataset to predict the ex vivo drug sensitivity for the 122 drugs evaluated by the project. METHODS: We utilized ElasticNet, which produces fully interpretable models, in combination with a two-step training protocol that allowed us to narrow down computations. We automated the genes' filtering step by employing two metrics, and we evaluated all possible data combinations to identify the best training configuration settings per drug. RESULTS: We report a Pearson correlation across all drugs of 0.36 when clinical and RNA sequencing data were combined, with the best-performing models reaching a Pearson correlation of 0.67. When we trained using the datasets in isolation, we noted that RNA sequencing data (Pearson: 0.36) attained three times the predictive power of whole exome sequencing data (Pearson: 0.11), with clinical data falling somewhere in between (Pearson: 0.26). Lastly, we present a paradigm of clinical significance. We used our models' prediction as a drug sensitivity score to rank an individual's expected response to treatment. For 78 out of 89 patients (88%), we identified a proposed drug that was more potent than the administered one, based on their ex vivo drug sensitivity data. 
CONCLUSIONS: In conclusion, our reanalysis of the BeatAML dataset using Machine Learning algorithms demonstrates the potential for individualized treatment prediction in Acute Myeloid Leukemia patients, addressing the longstanding challenge of treatment personalization in this disease. By leveraging molecular and clinical data, our approach yields promising correlations between predicted drug sensitivity and actual responses, highlighting a significant step forward in improving therapeutic outcomes for AML patients.
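The evaluation and ranking steps mentioned above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' pipeline: the helper names `pearson` and `rank_drugs`, the toy numbers, and the convention that a lower score means a more sensitive (more potent) response are all hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equally long numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (norm_x * norm_y)


def rank_drugs(predicted_scores, lower_is_more_sensitive=True):
    """Order drug names by predicted sensitivity score for one patient.

    predicted_scores: dict mapping drug name to the model's predicted score.
    lower_is_more_sensitive: assumed direction of the score (hypothetical).
    """
    return sorted(
        predicted_scores,
        key=predicted_scores.get,
        reverse=not lower_is_more_sensitive,
    )


if __name__ == "__main__":
    # Toy measured vs. predicted sensitivities for one drug across patients.
    measured = [120.0, 95.0, 180.0, 150.0, 110.0]
    predicted = [115.0, 105.0, 160.0, 155.0, 100.0]
    print("Pearson r:", round(pearson(measured, predicted), 3))
    # Toy per-patient predictions for three placeholder drugs.
    print(rank_drugs({"drug_A": 140.0, "drug_B": 90.0, "drug_C": 175.0}))
```

Comparing the top-ranked drug with the one actually administered, using the measured ex vivo values, is the kind of check the 78-of-89 result refers to.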
ABSTRACT
The COVID-19 pandemic has exemplified the importance of interoperable and equitable data sharing for global surveillance and to support research. While many challenges could be overcome, at least in some countries, many hurdles within the organizational, scientific, technical and cultural realms still remain to be tackled to be prepared for future threats. We propose to (i) continue supporting global efforts that have proven to be efficient and trustworthy toward addressing challenges in pathogen molecular data sharing; (ii) establish a distributed network of Pathogen Data Platforms to (a) ensure high quality data, metadata standardization and data analysis, (b) perform data brokering on behalf of data providers both for research and surveillance, (c) foster capacity building and continuous improvements, also for pandemic preparedness; (iii) establish an International One Health Pathogens Portal, connecting pathogen data isolated from various sources (human, animal, food, environment), in a truly One Health approach and following FAIR principles. To address these challenging endeavors, we have started an ELIXIR Focus Group where we invite all interested experts to join in a concerted, expert-driven effort toward sustaining and ensuring high-quality data for global surveillance and research.
Subjects
COVID-19 , Animals , Humans , COVID-19/epidemiology , Pandemics , Capacity Building , Information Dissemination
ABSTRACT
Computational methods for miRNA target prediction vary in the algorithm used; and while one can state opinions about the strengths or weaknesses of each particular algorithm, the fact of the matter is that they fall substantially short of capturing the full detail of physical, temporal and spatial requirements of miRNA::target-mRNA interactions. Here, we introduce a novel miRNA target prediction tool called Targetprofiler that utilizes a probabilistic learning algorithm in the form of a hidden Markov model trained on experimentally verified miRNA targets. Using a large scale protein downregulation data set we validate our method and compare its performance to existing tools. We find that Targetprofiler exhibits greater correlation between computational predictions and protein downregulation and predicts experimentally verified miRNA targets more accurately than three other tools. Concurrently, we use primer extension to identify the mature sequence of a novel miRNA gene recently identified within a cancer associated genomic region and use Targetprofiler to predict its potential targets. Experimental verification of the ability of this small RNA molecule to regulate the expression of CCND2, a gene with documented oncogenic activity, confirms its functional role as a miRNA. These findings highlight the competitive advantage of our tool and its efficacy in extracting biologically significant results.
Subjects
Algorithms , Cyclin D2 , Neoplastic Gene Expression Regulation , MicroRNAs , Neoplasm Proteins , Neoplasms , Neoplasm RNA , RNA Sequence Analysis/methods , Cyclin D2/biosynthesis , Cyclin D2/genetics , HeLa Cells , Humans , MicroRNAs/genetics , MicroRNAs/metabolism , Neoplasm Proteins/biosynthesis , Neoplasm Proteins/genetics , Neoplasms/genetics , Neoplasms/metabolism , Neoplasm RNA/genetics , Neoplasm RNA/metabolism
ABSTRACT
Technologies for profiling samples using different omics platforms have been at the forefront since the human genome project. Large-scale multi-omics data hold the promise of deciphering different regulatory layers. Yet, while there is a myriad of bioinformatics tools, each multi-omics analysis appears to start from scratch with an arbitrary decision over which tools to use and how to combine them. Therefore, it is an unmet need to conceptualize how to integrate such data and implement and validate pipelines in different cases. We have designed a conceptual framework (STATegra), aiming to make it as generic as possible for multi-omics analysis, combining available multi-omics analysis tools (machine learning component analysis, non-parametric data combination, and a multi-omics exploratory analysis) in a step-wise manner. While we have previously combined those integrative tools in several studies, here we provide a systematic description of the STATegra framework and its validation using two case studies from The Cancer Genome Atlas (TCGA). For both the Glioblastoma and the Skin Cutaneous Melanoma (SKCM) cases, we demonstrate an enhanced capacity of the framework (beyond that of the individual tools) to identify features and pathways compared to single-omics analysis. Such an integrative multi-omics analysis framework for identifying features and components facilitates the discovery of new biology. Finally, we provide several options for applying the STATegra framework when parametric assumptions are fulfilled and for the case when not all the samples are profiled for all omics. The STATegra framework is built using several tools, which are being integrated step-by-step as open source in the STATegRa Bioconductor package.
ABSTRACT
The development of single-cell sequencing technologies has allowed researchers to gain important new knowledge about the expression profile of genes in thousands of individual cells of a model organism or tissue. A common disadvantage of this technology is the loss of the three-dimensional (3-D) structure of the cells. Consequently, the Dialogue on Reverse Engineering Assessment and Methods (DREAM) organized the Single-Cell Transcriptomics Challenge, in which we participated, with the aim to address the following two problems: (a) to identify the top 60, 40, and 20 genes of the Drosophila melanogaster embryo that contain the most spatial information and (b) to reconstruct the 3-D arrangement of the embryo using information from those genes. We developed two independent techniques, leveraging machine learning models from least absolute shrinkage and selection operator (Lasso) and deep neural networks (NNs), which are applied to high-dimensional single-cell sequencing data in order to accurately identify genes that contain spatial information. Our first technique, Lasso.TopX, utilizes the Lasso and ranking statistics and allows a user to define a specific number of features they are interested in. The NN approach utilizes weak supervision for linear regression to accommodate for uncertain or probabilistic training labels. We show, individually for both techniques, that we are able to identify important, stable, and a user-defined number of genes containing the most spatial information. The results from both techniques achieve high performance when reconstructing spatial information in D. melanogaster and also generalize to zebrafish (Danio rerio). Furthermore, we identified novel D. melanogaster genes that carry important positional information and were not previously suspected. We also show how the indirect use of the full datasets' information can lead to data leakage and generate bias in overestimating the model's performance. 
Lastly, we discuss the applicability of our approaches to other feature selection problems outside the realm of single-cell sequencing and the importance of being able to handle probabilistic training labels. Our source code and detailed documentation are available at https://github.com/TJU-CMC-Org/SingleCell-DREAM/.
ABSTRACT
Single-cell RNA-sequencing (scRNAseq) technologies are rapidly evolving. Although very informative, in standard scRNAseq experiments, the spatial organization of the cells in the tissue of origin is lost. Conversely, spatial RNA-seq technologies designed to maintain cell localization have limited throughput and gene coverage. Mapping scRNAseq to genes with spatial information increases coverage while providing spatial location. However, methods to perform such mapping have not yet been benchmarked. To fill this gap, we organized the DREAM Single-Cell Transcriptomics challenge focused on the spatial reconstruction of cells from the Drosophila embryo from scRNAseq data, leveraging, as a silver standard, genes with in situ hybridization data from the Berkeley Drosophila Transcription Network Project reference atlas. The 34 participating teams used diverse algorithms for gene selection and location prediction, and were able to correctly localize clusters of cells. Selection of predictor genes was essential for this task. Predictor genes showed a relatively high expression entropy and high spatial clustering, and included prominent developmental genes such as gap and pair-rule genes and tissue markers. Application of the top 10 methods to a zebrafish embryo dataset yielded performance and statistical properties of the selected genes similar to those in the Drosophila data. This suggests that methods developed in this challenge are able to extract generalizable properties of genes that are useful to accurately reconstruct the spatial arrangement of cells in tissues.
Subjects
Computational Biology/methods , Gene Expression Profiling/methods , Single-Cell Analysis/methods , Spatial Analysis , Algorithms , Animals , Genetic Databases , Drosophila/genetics , Forecasting/methods , Developmental Gene Expression Regulation/genetics , Gene Regulatory Networks/genetics , RNA Sequence Analysis/methods , Transcriptome/genetics , Zebrafish/genetics
ABSTRACT
BACKGROUND: Multiple Sclerosis (MS) is a chronic inflammatory disease and a leading cause of progressive neurological disability among young adults. DNA methylation, which intersects genes and environment to control cellular functions on a molecular level, may provide insights into MS pathogenesis. METHODS: We measured DNA methylation in CD4+ T cells (n = 31), CD8+ T cells (n = 28), CD14+ monocytes (n = 35) and CD19+ B cells (n = 27) from relapsing-remitting (RRMS), secondary progressive (SPMS) patients and healthy controls (HC) using Infinium HumanMethylation450 arrays. Monocyte (n = 25) and whole blood (n = 275) cohorts were used for validations. FINDINGS: B cells from MS patients displayed most significant differentially methylated positions (DMPs), followed by monocytes, while only few DMPs were detected in T cells. We implemented a non-parametric combination framework (omicsNPC) to increase discovery power by combining evidence from all four cell types. Identified shared DMPs co-localized at MS risk loci and clustered into distinct groups. Functional exploration of changes discriminating RRMS and SPMS from HC implicated lymphocyte signaling, T cell activation and migration. SPMS-specific changes, on the other hand, implicated myeloid cell functions and metabolism. Interestingly, neuronal and neurodegenerative genes and pathways were also specifically enriched in the SPMS cluster. INTERPRETATION: We utilized a statistical framework (omicsNPC) that combines multiple layers of evidence to identify DNA methylation changes that provide new insights into MS pathogenesis in general, and disease progression, in particular. FUND: This work was supported by the Swedish Research Council, Stockholm County Council, AstraZeneca, European Research Council, Karolinska Institutet and Margaretha af Ugglas Foundation.
Subjects
DNA Methylation , Immunity , Multiple Sclerosis/etiology , Multiple Sclerosis/metabolism , Signal Transduction , Adult , Aged , B-Lymphocyte Subsets/immunology , B-Lymphocyte Subsets/metabolism , Biomarkers , CpG Islands , Disease Progression , Disease Susceptibility , Female , Humans , Immunophenotyping , Male , Middle Aged , Multiple Sclerosis/diagnostic imaging , Multiple Sclerosis/pathology , Chronic Progressive Multiple Sclerosis/diagnosis , Chronic Progressive Multiple Sclerosis/etiology , Chronic Progressive Multiple Sclerosis/metabolism , Relapsing-Remitting Multiple Sclerosis/diagnosis , Relapsing-Remitting Multiple Sclerosis/etiology , Relapsing-Remitting Multiple Sclerosis/metabolism , Quantitative Trait Loci , T-Lymphocyte Subsets/immunology , T-Lymphocyte Subsets/metabolism
ABSTRACT
Multiple Sclerosis (MS) is an autoimmune disease of the central nervous system with prominent neurodegenerative components. The triggering and progression of MS is associated with transcriptional and epigenetic alterations in several tissues, including peripheral blood. The combined influence of transcriptional and epigenetic changes associated with MS has not been assessed in the same individuals. Here we generated paired transcriptomic (RNA-seq) and DNA methylation (Illumina 450 K array) profiles of CD4+ and CD8+ T cells (CD4, CD8), using clinically accessible blood from healthy donors and MS patients in the initial relapsing-remitting and subsequent secondary-progressive stage. By integrating the output of a differential expression test with a permutation-based non-parametric combination methodology, we identified 149 differentially expressed (DE) genes in both CD4 and CD8 cells collected from MS patients. Moreover, by leveraging the methylation-dependent regulation of gene expression, we identified the gene SH3YL1, which displayed significant correlated expression and methylation changes in MS patients. Importantly, silencing of SH3YL1 in primary human CD4 cells demonstrated its influence on T cell activation. Collectively, our strategy based on paired sampling of several cell-types provides a novel approach to increase sensitivity for identifying shared mechanisms altered in CD4 and CD8 cells of relevance in MS in small sized clinical materials.
Subjects
Immunomodulation , Multiple Sclerosis/etiology , Multiple Sclerosis/metabolism , T-Lymphocyte Subsets/immunology , T-Lymphocyte Subsets/metabolism , Adult , Computational Biology/methods , DNA Methylation , Disease Management , Disease Progression , Disease Susceptibility , Female , Gene Expression Profiling , Humans , Lymphocyte Activation/genetics , Lymphocyte Activation/immunology , Male , Middle Aged , Multiple Sclerosis/diagnosis , Severity of Illness Index , Transcriptome
ABSTRACT
BACKGROUND: The advance of omics technologies has made it possible to measure several data modalities on a system of interest. In this work, we illustrate how the Non-Parametric Combination methodology (NPC) can be used for simultaneously assessing the association of different molecular quantities with an outcome of interest. We argue that NPC methods have several potential applications in integrating heterogeneous omics technologies, for example identifying genes whose methylation and transcriptional levels are jointly deregulated, or finding proteins whose abundance shows the same trends as the expression of their encoding genes. RESULTS: We implemented the NPC methodology within "omicsNPC", an R function specifically tailored for the characteristics of omics data. We compare omicsNPC against a range of alternative methods on simulated as well as on real data. Comparisons on simulated data show that omicsNPC produces unbiased / calibrated p-values and performs equally or significantly better than the other methods included in the study; furthermore, the analysis of real data shows that omicsNPC (a) exhibits higher statistical power than other methods, (b) is easily applicable in a number of different scenarios, and (c) yields results with improved biological interpretability. CONCLUSIONS: The omicsNPC function performs competitively in all comparisons conducted in this study. Taking into account that the method (i) requires minimal assumptions, (ii) can be used on different study designs and (iii) captures the dependences among heterogeneous data modalities, omicsNPC provides a flexible and statistically powerful solution for the integrative analysis of different omics data.
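The general NPC idea can be sketched in plain Python for a single gene measured in several modalities on the same samples. This is a minimal illustration, not the omicsNPC code: the test statistic (absolute difference in group means), the choice of Fisher's combining function, and the names `abs_mean_difference` and `npc_fisher` are assumptions. The key step is that the same permuted outcome labels are applied to every modality, which is how the procedure keeps the dependence between modalities.

```python
import math
import random


def abs_mean_difference(values, labels):
    """Absolute difference in means between samples labelled 1 and 0."""
    group1 = [v for v, g in zip(values, labels) if g == 1]
    group0 = [v for v, g in zip(values, labels) if g == 0]
    return abs(sum(group1) / len(group1) - sum(group0) / len(group0))


def npc_fisher(modalities, labels, n_perm=999, seed=0):
    """Permutation-based combination of per-modality evidence.

    modalities: list of value lists, one per modality, in the same sample order.
    labels: 0/1 outcome for each sample.
    Returns (partial_p_values, global_p_value).
    """
    rng = random.Random(seed)
    # Index 0 is the observed labelling; the rest are shared permutations.
    labelings = [list(labels)]
    for _ in range(n_perm):
        shuffled = list(labels)
        rng.shuffle(shuffled)
        labelings.append(shuffled)

    def to_p_values(stats):
        # p-value of each entry against the whole permutation distribution;
        # each entry counts itself, so no p-value is ever zero.
        n = len(stats)
        return [sum(1 for other in stats if other >= s) / n for s in stats]

    # p_values[m][b]: partial p-value of modality m under labelling b.
    p_values = [
        to_p_values([abs_mean_difference(values, lab) for lab in labelings])
        for values in modalities
    ]
    # Fisher's combining function, evaluated for every labelling.
    combined = [
        -2.0 * sum(math.log(p_values[m][b]) for m in range(len(modalities)))
        for b in range(n_perm + 1)
    ]
    global_p = sum(1 for c in combined if c >= combined[0]) / (n_perm + 1)
    return [p[0] for p in p_values], global_p


if __name__ == "__main__":
    # Toy data: 8 controls (0) and 8 cases (1), two modalities for one gene.
    labels = [0] * 8 + [1] * 8
    methylation = [0.20, 0.25, 0.22, 0.18, 0.24, 0.21, 0.19, 0.23,
                   0.60, 0.65, 0.58, 0.62, 0.66, 0.61, 0.59, 0.63]
    expression = [5.1, 4.8, 5.3, 5.0, 4.9, 5.2, 5.1, 4.7,
                  2.1, 2.4, 1.9, 2.2, 2.0, 2.3, 2.5, 1.8]
    partial, overall = npc_fisher([methylation, expression], labels)
    print("partial p-values:", partial)
    print("global p-value:", overall)
```

In an omics setting this would be repeated for every gene, followed by multiple-testing correction; other combining functions could be substituted for Fisher's in the `combined` step.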
Subjects
Gene Expression Profiling , Genomics , Statistics as Topic/methods , Nonparametric Statistics , Breast Neoplasms/genetics , Breast Neoplasms/pathology , Glioblastoma/genetics , Humans , Neoplasm Invasiveness , Schizophrenia/genetics
ABSTRACT
We address the problem of predicting the position of a miRNA duplex on a microRNA hairpin via the development and application of a novel SVM-based methodology. Our method combines a unique problem representation and an unbiased optimization protocol to learn from miRBase 19.0 an accurate predictive model, termed MiRduplexSVM. This is the first model that provides precise information about all four ends of the miRNA duplex. We show that (a) our method outperforms four state-of-the-art tools, namely MaturePred, MiRPara, MatureBayes and MiRdup, as well as a Simple Geometric Locator, when applied on the same training datasets employed for each tool and evaluated on a common blind test set. (b) In all comparisons, MiRduplexSVM shows superior performance, achieving up to a 60% increase in prediction accuracy for mammalian hairpins, and can generalize very well on plant hairpins, without any special optimization. (c) The tool has a number of important applications such as the ability to accurately predict the miRNA or the miRNA*, given the opposite strand of a duplex. Its performance on this task is superior to the 2-nt overhang rule commonly used in computational studies and similar to that of a comparative genomic approach, without the need for prior knowledge or the complexity of performing multiple alignments. Finally, it is able to evaluate novel, potential miRNAs found either computationally or experimentally. In relation to recent confidence evaluation methods used in miRBase, MiRduplexSVM was successful in identifying high-confidence potential miRNAs.
Subjects
MicroRNAs/chemistry , MicroRNAs/genetics , Theoretical Models , Algorithms , Base Sequence , Computational Biology/methods
ABSTRACT
Computational methods for miRNA target prediction are currently undergoing extensive review and evaluation. There is still a great need for improvement of these tools and bioinformatics approaches are looking towards high-throughput experiments in order to validate predictions. The combination of large-scale techniques with computational tools will not only provide greater credence to computational predictions but also lead to the better understanding of specific biological questions. Current miRNA target prediction tools utilize probabilistic learning algorithms, machine learning methods and even empirical biologically defined rules in order to build models based on experimentally verified miRNA targets. Large-scale protein downregulation assays and next-generation sequencing (NGS) are now being used to validate methodologies and compare the performance of existing tools. Tools that exhibit greater correlation between computational predictions and protein downregulation or RNA downregulation are considered the state of the art. Moreover, efficiency in prediction of miRNA targets that are concurrently verified experimentally provides additional validity to computational predictions and further highlights the competitive advantage of specific tools and their efficacy in extracting biologically significant results. In this review paper, we discuss the computational methods for miRNA target prediction and provide a detailed comparison of methodologies and features utilized by each specific tool. Moreover, we provide an overview of current state-of-the-art high-throughput methods used in miRNA target prediction.
Subjects
Computational Biology/methods , MicroRNAs/genetics , Algorithms , Animals , Artificial Intelligence , Humans , RNA Sequence Analysis , Software
ABSTRACT
Changes in the structure and/or the expression of protein-coding genes were thought to be the major cause of cancer for many decades. However, the recent discovery of non-coding RNA (ncRNA) transcripts suggests that the molecular biology of cancer is far more complex. MicroRNAs (miRNAs) are key players of the family of ncRNAs and they have been under extensive investigation because of their involvement in carcinogenesis, often taking up roles of tumor suppressors or oncogenes. Owing to the slow nature of experimental identification of miRNA genes, computational procedures have been applied as a valuable complement to cloning. Numerous computational tools, implemented to recognize the characteristic features of miRNA biogenesis, have resulted in the prediction of multiple novel miRNA genes. Computational approaches provide valuable clues as to which are the dominant features that characterize these regulatory units and, furthermore, narrow down the search space, making experimental verification faster and significantly cheaper. Moreover, in combination with large-scale, high-throughput methods, such as deep sequencing and tiling arrays, computational methods have aided in the discovery of putative molecular signatures of miRNA deregulation in human tumors. This chapter focuses on existing computational methods for identifying miRNA genes, provides an overview of the methodology undertaken by these tools, and underlines their contribution toward unraveling the role of miRNAs in cancer.
Subjects
MicroRNAs/genetics , Neoplasms/genetics , Animals , Computational Biology , Humans
ABSTRACT
Changes in the structure and/or the expression of protein-coding genes were thought to be the major cause of cancer for many decades. The recent discovery of non-coding RNA (ncRNA) transcripts (e.g., microRNAs) suggests that the molecular biology of cancer is far more complex. MicroRNAs (miRNAs) have been under investigation due to their involvement in carcinogenesis, often taking up roles of tumor suppressors or oncogenes. Due to the slow nature of experimental identification of miRNA genes, computational procedures have been applied as a valuable complement to cloning. Numerous computational tools, implemented to recognize the features of miRNA biogenesis, have resulted in the prediction of novel miRNA genes. Computational approaches provide clues as to which are the dominant features that characterize these regulatory units and, furthermore, narrow down the search space, making experimental verification faster and cheaper. In combination with large-scale, high-throughput methods, such as deep sequencing, computational methods have aided in the discovery of putative molecular signatures of miRNA deregulation in human tumors. This review focuses on existing computational methods for identifying miRNA genes, provides an overview of the methodology undertaken by these tools, and underlines their contribution towards unraveling the role of miRNAs in cancer.