ABSTRACT
The life sciences are undergoing major transformations driven by the ongoing revolution in high-throughput data acquisition technologies. Integrating these high-dimensional data, which range from omics to clinical data, is becoming one of the most challenging stages of analysis. It requires interdisciplinary developments aimed at a better understanding of human physiology for healthcare purposes. Biologists, bioinformaticians, physicians and other healthcare experts have to be involved in each step of the analysis process in order to investigate and interpret these various data. In this context, information visualization methods are gaining increasing attention within the life sciences. Software based on these methods is now well recognized as helping expert users carry out their data analysis tasks. This article reviews the current methods and techniques dedicated to information visualization and their current use in software development related to omics and/or clinical data.
Subject(s)
Computational Biology , Data Display , Datasets as Topic , Humans , Information Storage and Retrieval , Software
ABSTRACT
BACKGROUND: Bacterial sRNA-mediated regulatory networks have been introduced as a powerful way to analyze how quickly bacteria can rewire their regulation in response to changing environmental conditions. Identifying the mRNA targets of bacterial sRNAs is essential to investigate their functional activities. However, this step remains challenging because little is known about the topological and biological constraints behind the formation of sRNA-mRNA duplexes. Even with the most sophisticated bioinformatics target prediction tools, the large proportion of false predictions may be prohibitive for further analyses. To deal with this issue, sRNA target analyses can be carried out on the gene lists produced by RNA-seq experiments, when available. However, the number of resulting target candidates may still be huge and cannot be easily interpreted by domain experts, who need to compare various biological features to prioritize the target candidates. Therefore, novel strategies are needed to improve the specificity of computational prediction results before proposing new candidates for an expensive experimental validation stage. RESULT: To address this issue, we propose a new visualization tool, rNAV 2.0, for detecting and filtering bacterial sRNA targets for regulatory networks. rNAV is designed to cope with a variety of biological constraints, including gene annotations, conserved regions of interaction and specific patterns of regulation. Depending on the application, these constraints can be combined in various ways to analyze the target candidates, which are prioritized, for instance, by a known conserved interaction region or by a common function. CONCLUSION: The standalone application implements a set of known algorithms and interaction techniques, and applies them to the new problem of identifying reasonable sRNA target candidates.
Subject(s)
Computational Biology/methods , Gene Expression Regulation, Bacterial/genetics , RNA, Bacterial/genetics
ABSTRACT
The revolution in high-throughput sequencing technologies has enabled the acquisition of gigabytes of RNA sequences in many different conditions and has revealed an unexpected number of small RNAs (sRNAs) in bacteria. Ongoing exploitation of these data enables numerous applications for investigating bacterial trans-acting sRNA-mediated regulation networks. Focusing on sRNAs that regulate mRNA translation in trans, recent work has identified several sRNA-based regulatory pathways that are essential for key cellular processes. Although the number of known bacterial sRNAs is increasing, the experimental validation of their interactions with mRNA targets remains challenging and involves expensive and time-consuming experimental strategies. Hence, bioinformatics is crucial for selecting and prioritizing candidates before designing any experimental work. However, current software for target prediction produces a prohibitive number of candidates because of the lack of biological knowledge regarding the rules governing sRNA-mRNA interactions. Therefore, there is a real need to develop new approaches to help biologists focus on the most promising predicted sRNA-mRNA interactions. In this context, this review presents the advantages of combining bioinformatics and visualization approaches for analyzing predicted bacterial sRNA-mediated regulatory networks.
Subject(s)
Computational Biology , Gene Regulatory Networks , RNA, Bacterial/physiology
ABSTRACT
BACKGROUND: The search for enriched features has become widely used to characterize a set of genes or proteins. A key aspect of this technique is its ability to identify correlations amongst heterogeneous data such as Gene Ontology annotations, gene expression data and genome location of genes. Despite the rapid growth of available data, very little has been proposed in terms of formalization and optimization. Additionally, current methods mainly ignore the structure of the data, which leads to redundant results. For example, when searching for enrichment in GO terms, genes can be annotated with multiple GO terms, and these annotations should be propagated to the more general terms in the Gene Ontology. Consequently, the gene sets often overlap partially or totally, and this causes the reported enriched GO terms to be both numerous and redundant, overwhelming the researcher with non-pertinent information. This situation is not unique: it arises whenever some hierarchical clustering is performed (e.g. based on the gene expression profiles), the extreme case being when genes that are neighbors on the chromosomes are considered. RESULTS: We present a generic framework to efficiently identify the most pertinent over-represented features in a set of genes. We propose a formal representation of gene sets based on the theory of partially ordered sets (posets), and give a formal definition of target set pertinence. Algorithms and compact representations of target sets are provided for the generation and the evaluation of the pertinent target sets. The relevance of our method is illustrated through the search for enriched GO annotations in the proteins involved in a multiprotein complex. The results obtained demonstrate the gain in terms of pertinence (up to 64% redundancy removed), space requirements (up to 73% less storage) and efficiency (up to 98% fewer comparisons).
CONCLUSION: The generic framework presented in this article provides a formal approach to adequately represent available data and efficiently search for pertinent over-represented features in a set of genes or proteins. The formalism and the pertinence definition can be directly used by most of the methods and tools currently available for feature enrichment analysis.
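The redundancy described in the BACKGROUND can be illustrated with a minimal sketch. It is not the framework of the article: the toy ontology, the gene names and the helper functions `ancestors`, `propagate` and `redundant_pairs` are all hypothetical, and the only point is to show how propagating annotations to more general terms yields nested or identical gene sets.

```python
from collections import defaultdict

# Hypothetical toy ontology: child term -> parent terms (not real GO identifiers).
PARENTS = {
    "ribosome_assembly": ["organelle_assembly"],
    "organelle_assembly": ["cellular_process"],
    "translation": ["cellular_process"],
    "cellular_process": [],
}

# Hypothetical direct annotations: gene -> most specific terms.
DIRECT = {
    "geneA": ["ribosome_assembly"],
    "geneB": ["ribosome_assembly", "translation"],
    "geneC": ["translation"],
}


def ancestors(term):
    """Return the term together with all of its ancestors."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS[current])
    return seen


def propagate(direct):
    """Build term -> gene set after propagating annotations to ancestors."""
    gene_sets = defaultdict(set)
    for gene, terms in direct.items():
        for term in terms:
            for general_term in ancestors(term):
                gene_sets[general_term].add(gene)
    return dict(gene_sets)


def redundant_pairs(gene_sets):
    """List pairs of terms that carry exactly the same gene set."""
    terms = sorted(gene_sets)
    return [
        (a, b)
        for i, a in enumerate(terms)
        for b in terms[i + 1:]
        if gene_sets[a] == gene_sets[b]
    ]


GENE_SETS = propagate(DIRECT)
```

In this toy case, `organelle_assembly` inherits exactly the genes of `ribosome_assembly`, so testing both terms for enrichment would report the same information twice; every set is also included in the set of `cellular_process`, which is the inclusion order that a poset representation captures.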
Subject(s)
Computational Biology/methods , Data Compression/methods , Database Management Systems , Gene Expression Profiling/methods , Pattern Recognition, Automated , Algorithms , Artificial Intelligence , Cluster Analysis , Databases, Genetic/statistics & numerical data , Databases, Protein/statistics & numerical data , Efficiency , Gene Expression Profiling/statistics & numerical data , Information Theory , Proteins/classification , Proteins/genetics , Proteins/metabolism , Structure-Activity Relationship , Terminology as Topic , Work Simplification
ABSTRACT
The combination of sequencing and post-sequencing experimental approaches produces huge collections of data that are highly heterogeneous both in structure and in semantics. We propose a new strategy for the integration of such data. This strategy uses structured sets of sequences as a unified representation of biological information and defines a probabilistic measure of similarity between the sets. Sets can be composed of sequences that are known to have a biological relationship (e.g. proteins involved in a complex or a pathway) or that share similar values for a particular attribute (e.g. expression profile). We have developed a software tool, BlastSets, which implements this strategy. It relies on a database in which sets derived from diverse biological information can be deposited using a standard XML format. For a given query set, BlastSets returns the target sets in the database whose similarity to the query is statistically significant. The tool allowed us to automatically identify verified relationships between correlated expression profiles and biological pathways using publicly available data for Saccharomyces cerevisiae. It was also used to retrieve the members of a complex (the ribosome) by mining expression profiles. These first results validate the relevance of the strategy and demonstrate the promising potential of BlastSets.
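The abstract does not specify the probabilistic similarity measure. As a hedged illustration only, one common way to score the overlap between a query set and a target set is the upper tail of a hypergeometric distribution; the function `overlap_pvalue` and its parameters are assumptions made for this sketch and are not taken from BlastSets.

```python
from math import comb


def overlap_pvalue(universe_size, query_size, target_size, overlap):
    """Probability of sharing at least `overlap` elements by chance.

    Assumes the query is a random draw of `query_size` elements from a
    universe of `universe_size` elements, `target_size` of which belong
    to the target set (hypergeometric model, an assumption of this sketch).
    """
    max_overlap = min(query_size, target_size)
    total = comb(universe_size, query_size)
    tail = sum(
        comb(target_size, k) * comb(universe_size - target_size, query_size - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total
```

Under this model, a small value means that the observed overlap is unlikely to arise by chance, so target sets could be ranked by increasing p-value; any real use would also need a correction for multiple testing across the database.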
Subject(s)
Computational Biology/methods , Sequence Analysis/methods , Software , Databases, Genetic , Gene Expression Profiling , Genomics , Saccharomyces cerevisiae/genetics , Saccharomyces cerevisiae/metabolism , Systems Integration
ABSTRACT
Trypanosoma brucei is a protozoan parasite of major interest for the discovery of new genes as drug targets. This parasite alternates its life cycle between the mammalian host(s) (bloodstream form) and the insect vector (procyclic form), with two divergent modes of glucose metabolism, both amenable to in vitro culture. While the metabolic network of the bloodstream form has been well characterized, the flux distribution between the different branches of the glucose metabolic network in the procyclic form has not been addressed so far. We present a computational analysis (called Metaboflux) that exploits the metabolic topology of the procyclic form and allows the incorporation of multipurpose experimental data to increase the biological relevance of the model. The alternatives resulting from the structural complexity of networks are formulated as an optimization problem solved by a metaheuristic in which experimental data are modeled in a multiobjective function. Our results show that the current metabolic model is in agreement with experimental data and confirms the observed high metabolic flexibility of glucose metabolism. In addition, Metaboflux offers a rational explanation for the high flexibility in the ratio between final products from glucose metabolism, that is, flux redistribution through the malic enzyme steps.