1.
J Transl Med ; 22(1): 64, 2024 01 16.
Article in English | MEDLINE | ID: mdl-38229087

ABSTRACT

BACKGROUND: Atopic dermatitis (AD) is a prevalent chronic inflammatory skin disease whose pathophysiology involves the interplay between genetic and environmental factors, ultimately leading to dysfunction of the epidermis. While several treatments are effective in symptom management, many existing therapies offer only temporary relief and often come with side effects. For this reason, formulating an effective therapeutic plan is challenging, and more effective, targeted treatments that address the root causes of the condition are needed. Here, we hypothesise that modelling the complexity of the molecular buildup of atopic dermatitis can be a concrete means to drive drug discovery. METHODS: We preprocessed, harmonised and integrated publicly available transcriptomics datasets of lesional and non-lesional skin from AD patients. We inferred co-expression network models of both AD lesional and non-lesional skin and exploited their interactional properties by integrating them with a priori knowledge in order to extrapolate a robust AD disease module. Pharmacophore-based virtual screening was then used to build a tailored library of compounds potentially active against AD. RESULTS: In this study, we identified a core disease module for AD, pinpointing known and unknown molecular determinants underlying the skin lesions. We identified skin- and immune-cell-type signatures expressed by the disease module and characterised the impaired cellular functions underlying the complex phenotype of atopic dermatitis. Furthermore, by investigating the connectivity of genes belonging to the AD module, we prioritised novel putative biomarkers of the disease. Finally, we defined a tailored compound library by characterising the therapeutic potential of drugs targeting genes within the disease module, to facilitate and guide future drug discovery efforts towards novel pharmacological strategies for AD. 
CONCLUSIONS: Overall, our study reveals a core disease module providing unprecedented information about genetic, transcriptional and pharmacological relationships that foster drug discovery in atopic dermatitis.
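A common way to infer a gene co-expression network is to connect gene pairs whose expression profiles are strongly correlated across samples. The sketch below assumes plain Pearson correlation with a hard threshold of 0.8; the abstract does not say which inference method or cut-off the study used, and the gene names and values are hypothetical toy data.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation between two equally long numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def coexpression_edges(expr, threshold=0.8):
    """Return (gene1, gene2, r) for every gene pair whose absolute correlation
    across samples is at least `threshold` (an assumed cut-off).

    `expr` maps a gene name to its expression values, with samples in the
    same order for every gene. Constant profiles are not handled.
    """
    edges = []
    for g1, g2 in combinations(sorted(expr), 2):
        r = pearson(expr[g1], expr[g2])
        if abs(r) >= threshold:
            edges.append((g1, g2, r))
    return edges


# Hypothetical toy data: three genes measured in four samples.
expr = {
    "GENE_A": [1.0, 2.0, 3.0, 4.0],
    "GENE_B": [2.0, 4.1, 6.0, 8.2],
    "GENE_C": [4.0, 1.0, 3.0, 2.0],
}
for g1, g2, r in coexpression_edges(expr):
    print(g1, g2, round(r, 3))
```

Building one such network for lesional and one for non-lesional samples would allow the two edge sets to be compared, which is the kind of contrast the abstract describes.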


Subject(s)
Atopic Dermatitis , Humans , Atopic Dermatitis/drug therapy , Atopic Dermatitis/genetics , Skin , Gene Expression Profiling , Phenotype , Biomarkers
2.
Brief Bioinform ; 22(6)2021 11 05.
Article in English | MEDLINE | ID: mdl-34396389

ABSTRACT

Typical clustering analysis for large-scale genomics data combines two unsupervised learning techniques: dimensionality reduction and clustering (DR-CL) methods. It has been demonstrated that transforming gene expression to pathway-level information can improve the robustness and interpretability of disease grouping results. This approach, referred to as biological knowledge-driven clustering (BK-CL), is often neglected due to a lack of tools enabling systematic comparisons with more established DR-based methods. Moreover, classic clustering metrics based on group separability tend to favor the DR-CL paradigm, which may increase the risk of identifying less actionable disease subtypes that have ambiguous biological and clinical explanations. Hence, there is a need for developing metrics that assess biological and clinical relevance. To facilitate the systematic analysis of BK-CL methods, we propose a computational protocol for quantitative analysis of clustering results derived from both DR-CL and BK-CL methods. In addition, we propose a new BK-CL method that combines prior knowledge of disease-relevant genes, network diffusion algorithms and gene set enrichment analysis to generate robust pathway-level information. Benchmarking studies were conducted to compare the grouping results from different DR-CL and BK-CL approaches with respect to standard clustering evaluation metrics, concordance with known subtypes, association with clinical outcomes and disease modules in co-expression networks of genes. No single approach dominated every metric, showing the importance of multi-objective evaluation in clustering analysis. However, we demonstrated that, on gene expression data sets derived from TCGA samples, the BK-CL approach can find groupings that provide significant prognostic value in both breast and prostate cancers.
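Network diffusion spreads scores from a set of seed genes (for example, known disease-relevant genes) to their network neighbours. Random walk with restart is one widely used diffusion scheme; the abstract does not name the algorithm the BK-CL method uses, so this is only an illustrative sketch. The restart probability of 0.5 and the four-gene path network are hypothetical.

```python
def random_walk_with_restart(adj, seeds, restart=0.5, tol=1e-10, max_iter=1000):
    """Diffuse seed-gene scores over an undirected network.

    adj: dict mapping each node to a list of its neighbours (every node is
         assumed to have at least one neighbour).
    seeds: nodes that start with the probability mass (e.g. disease genes).
    restart: probability of jumping back to the seed set at each step.
    """
    nodes = sorted(adj)
    start = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(start)
    for _ in range(max_iter):
        nxt = {n: restart * start[n] for n in nodes}
        for u in nodes:
            # Spread the non-restart part of u's mass evenly over its neighbours.
            share = (1.0 - restart) * p[u] / len(adj[u])
            for v in adj[u]:
                nxt[v] += share
        converged = sum(abs(nxt[n] - p[n]) for n in nodes) < tol
        p = nxt
        if converged:
            break
    return p


# Hypothetical path network A - B - C - D with a single seed gene A.
adj = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
scores = random_walk_with_restart(adj, seeds={"A"})
for node, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(node, round(score, 3))
```

Scores decay with network distance from the seed, and the total mass stays at 1; in a pathway-level pipeline such diffused scores could then be fed into gene set enrichment analysis.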


Subject(s)
Biomarkers , Computational Biology/methods , Data Mining , Disease Susceptibility , Algorithms , Cluster Analysis , Genetic Databases , Gene Expression Profiling/methods , Gene Regulatory Networks , Genetic Predisposition to Disease , Genomics/methods , Humans , Prognosis , Signal Transduction , Survival Analysis , Workflow
3.
Bioinformatics ; 38(Suppl_2): ii20-ii26, 2022 09 16.
Article in English | MEDLINE | ID: mdl-36124794

ABSTRACT

MOTIVATION: In modern translational research, the development of biomarkers relies heavily on the use of omics technologies, but implementations with basic data mining algorithms frequently lead to false positives. Non-dominated Sorting Genetic Algorithm II (NSGA2) is an extremely effective algorithm for biomarker discovery but has rarely been evaluated on large-scale datasets. Exploration of the feature search space is the key to NSGA2's success, but in specific cases NSGA2 explores the space of possible feature combinations only shallowly, possibly leading to models with low predictive performance. RESULTS: We propose two improved NSGA2 algorithms for finding subsets of biomarkers exhibiting different trade-offs between accuracy and number of features. Their performance is investigated on gene expression data of breast cancer patients, and the results are compared with NSGA2 and LASSO. The benchmarking dataset includes internal and external validation sets. The results show that the proposed algorithms generate a better approximation of the optimal trade-offs between accuracy and set size. Moreover, validation and test accuracies are better than those provided by NSGA2 and LASSO. Remarkably, the GA-based methods provide biomarkers that achieve a very high prediction accuracy (>80%) with a small number of features (<10), representing a valid alternative to known biomarker models such as PAM50 and MammaPrint. AVAILABILITY AND IMPLEMENTATION: The software is publicly available on GitHub at github.com/UEFBiomedicalInformaticsLab/BIODAI/tree/main/MOO. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
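The "trade-offs between accuracy and feature number" are Pareto fronts: a biomarker set is kept if no other set is at least as accurate with no more features while being strictly better in one of the two. The sketch below shows this ranking idea in its simplest form. It is a naive O(n²)-per-front version, not NSGA2's fast non-dominated sort and not the authors' improved algorithms; the candidate (accuracy, feature count) pairs are hypothetical.

```python
def dominates(a, b):
    """True if solution a dominates b.

    Each solution is (accuracy, n_features): higher accuracy and fewer
    features are better. a dominates b if it is no worse in both objectives
    and strictly better in at least one.
    """
    acc_a, n_a = a
    acc_b, n_b = b
    no_worse = acc_a >= acc_b and n_a <= n_b
    strictly_better = acc_a > acc_b or n_a < n_b
    return no_worse and strictly_better


def non_dominated_fronts(solutions):
    """Split solutions into successive Pareto fronts (naive version)."""
    remaining = list(solutions)
    fronts = []
    while remaining:
        front = [s for s in remaining
                 if not any(dominates(o, s) for o in remaining)]
        fronts.append(front)
        remaining = [s for s in remaining if s not in front]
    return fronts


# Hypothetical biomarker subsets scored as (accuracy, number of features).
candidates = [(0.82, 5), (0.85, 9), (0.80, 5), (0.85, 12), (0.70, 2)]
for rank, front in enumerate(non_dominated_fronts(candidates), start=1):
    print(rank, front)
```

The first front holds the best available compromises; a genetic algorithm such as NSGA2 uses these ranks (plus a diversity measure) to choose which feature subsets survive to the next generation.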


Subject(s)
Algorithms , Biomedical Research , Humans , Software
4.
Bioinformatics ; 38(7): 2066-2069, 2022 03 28.
Article in English | MEDLINE | ID: mdl-35134136

ABSTRACT

PURPOSE: Endocrine disruptors are a rising concern due to the wide array of health issues they can cause. Although there are tools for mode of action (MoA)-based prediction of endocrine disruption (e.g. QSAR Toolbox and iSafeRat), none of them is based on toxicogenomics data. Here, we present EDTox, an R Shiny application enabling users to explore and use a computational method that we have recently published to identify and prioritize endocrine disrupting (ED) chemicals based on toxicogenomic data. The EDTox pipeline uses previously trained toxicogenomic-driven classifiers to make predictions on new, untested compounds by using their molecular initiating events. Furthermore, the R Shiny app allows users to extend the prediction systems by training and adding new classifiers based on newly available toxicogenomic data. This functionality helps users explore the ED potential of chemicals in new, untested exposure scenarios. AVAILABILITY AND IMPLEMENTATION: This tool is available as a web application (www.edtox.fi) and as stand-alone software on GitHub and Zenodo (https://doi.org/10.5281/zenodo.5817093). SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Software , Toxicogenetics
5.
Hum Genomics ; 16(1): 62, 2022 11 28.
Article in English | MEDLINE | ID: mdl-36437479

ABSTRACT

In recent years, there has been growing interest in characterizing the molecular basis of psoriasis. However, despite the availability of a large amount of molecular data, many pathogenic mechanisms of psoriasis are still poorly understood. In this study, we performed an integrated analysis of 23 public transcriptomic datasets encompassing both lesional and uninvolved skin samples from psoriasis patients. We defined comprehensive gene co-expression network models of psoriatic lesions and uninvolved skin. Moreover, we curated and exploited a wide range of functional information from multiple public sources in order to systematically annotate the inferred networks. The integrated analysis of transcriptomics data and co-expression networks highlighted genes that are frequently dysregulated and show aberrant patterns of connectivity in the psoriatic lesion compared with the unaffected skin. Our approach also allowed us to identify plausible, previously unknown actors in the expression of the psoriasis phenotype. Finally, we characterized communities of co-expressed genes associated with relevant molecular functions and expression signatures of specific immune cell types associated with the psoriasis lesion. Overall, integrating experiment-driven results with curated functional information from public repositories is an efficient approach to empower knowledge generation about psoriasis and may be applicable to other complex diseases.


Subject(s)
Psoriasis , Humans , Psoriasis/genetics , Skin/metabolism , Gene Regulatory Networks/genetics , Transcriptome/genetics
6.
Proc Natl Acad Sci U S A ; 117(52): 33474-33485, 2020 12 29.
Article in English | MEDLINE | ID: mdl-33318199

ABSTRACT

Contact dermatitis has a tremendous impact on the quality of life of affected patients. Currently, diagnostic regimes rely on allergy testing, exposure specification, and follow-up visits; however, distinguishing the clinical phenotypes of irritant and allergic contact dermatitis remains challenging. Employing integrative transcriptomic analysis and machine-learning approaches, we aimed to decipher disease-related signature genes to find suitable sets of biomarkers. A total of 89 positive patch-test reaction biopsies against four contact allergens and two irritants were analyzed via microarray. Coexpression network analysis and Random Forest classification were used to discover potential biomarkers, and selected biomarker models were validated in an independent patient group. Differential gene-expression analysis identified major gene-expression changes depending on the stimulus. Random Forest classification identified CD47, BATF, FASLG, RGS16, SYNPO, SELE, PTPN7, WARS, PRC1, EXO1, RRM2, PBK, RAD54L, KIFC1, SPC25, PKMYT, HISTH1A, TPX2, DLGAP5, TPX2, CH25H, and IL37 as potential biomarkers to distinguish allergic and irritant contact dermatitis in human skin. Validation experiments and prediction performance on external testing datasets demonstrated the potential applicability of the identified biomarker models in the clinic. Capitalizing on this knowledge, novel diagnostic tools can be developed to guide clinical diagnosis of contact allergies.


Subject(s)
Biomarkers/metabolism , Allergic Contact Dermatitis/diagnosis , Irritant Dermatitis/diagnosis , Machine Learning , Adult , Algorithms , Allergens , Genetic Databases , Allergic Contact Dermatitis/genetics , Irritant Dermatitis/genetics , Differential Diagnosis , Female , Gene Expression Regulation , Gene Regulatory Networks , Humans , Irritants , Leukocytes/metabolism , Male , Patch Tests , Reproducibility of Results , Severity of Illness Index , Skin/pathology , Transcriptome/genetics
7.
Brief Bioinform ; 21(6): 1937-1953, 2020 12 01.
Article in English | MEDLINE | ID: mdl-31774113

ABSTRACT

The drug discovery process starts with the identification of a disease-modifying target. This critical step traditionally begins with manual investigation of scientific literature and biomedical databases to gather evidence linking a molecular target to a disease, and to evaluate the efficacy, safety and commercial potential of the target. The high throughput and affordability of current omics technologies, which allow quantitative measurements of many putative targets (e.g. DNA, RNA, protein, metabolite), have exponentially increased the volume of scientific data available for this arduous task. Therefore, computational platforms identifying and ranking disease-relevant targets from existing biomedical data sources, including omics databases, are needed. To date, more than 30 drug target discovery (DTD) platforms exist. They provide information-rich databases and graphical user interfaces to help scientists identify putative targets and pre-evaluate their therapeutic efficacy and potential side effects. Here we survey and compare a set of popular DTD platforms that utilize multiple data sources and omics-driven knowledge bases (either directly or indirectly) for identifying drug targets. We also provide a description of omics technologies and related data repositories that are important for DTD tasks.


Subject(s)
Computational Biology , Drug Discovery , Genomics , Knowledge Bases , Factual Databases , Drug Delivery Systems , Pharmaceutical Preparations , Proteomics
8.
Bioinformatics ; 36(14): 4214-4216, 2020 08 15.
Article in English | MEDLINE | ID: mdl-32437556

ABSTRACT

SUMMARY: Estimating the efficacy of gene-target-disease associations is a fundamental step in drug discovery. An important data source for this laborious task is RNA expression, which can provide gene-disease associations on the basis of expression fold change and statistical significance. However, simple use of the log-fold change can lead to numerous false-positive associations. On the other hand, more sophisticated methods that utilize gene co-expression networks do not consider tissue specificity. Here, we introduce Transcriptome-driven Efficacy estimates for gene-based TArget discovery (ThETA), an R package that enables non-expert users to use novel efficacy scoring methods for drug-target discovery. In particular, ThETA allows users to search for gene perturbations (therapeutics) that reverse disease-gene expression and for genes that are closely related to disease genes in tissue-specific networks. ThETA also provides functions to integrate efficacy evaluations obtained with different approaches and to build an overall efficacy score, which can be used to identify and prioritize gene(target)-disease associations. Finally, ThETA implements visualizations to show tissue-specific interconnections between targets and disease genes, and to indicate biological annotations associated with the top selected genes. AVAILABILITY AND IMPLEMENTATION: ThETA is freely available for academic use at https://github.com/vittoriofortino84/ThETA. CONTACT: vittorio.fortino@uef.fi. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
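The log-fold change is the log ratio of mean expression between disease and control samples. The sketch below computes it with an assumed pseudocount of 1 and shows one reason, among others, why ranking by fold change alone can mislead: a barely expressed gene with tiny absolute changes can outrank a highly expressed gene that roughly doubles. This is not ThETA's efficacy score; the expression values are hypothetical.

```python
import math


def log2_fold_change(case, control, pseudocount=1.0):
    """log2 ratio of mean expression in case vs control samples.

    The pseudocount (an assumed value) avoids division by zero and log(0).
    """
    mean_case = sum(case) / len(case)
    mean_control = sum(control) / len(control)
    return math.log2((mean_case + pseudocount) / (mean_control + pseudocount))


# Hypothetical genes: one barely expressed, one highly expressed.
low = log2_fold_change(case=[3, 5], control=[0, 1])
high = log2_fold_change(case=[1000, 1100], control=[500, 520])
print(round(low, 2), round(high, 2))
```

Because the statistic ignores both variance and absolute expression level, it is usually paired with a significance test, or, as the abstract proposes, complemented with network-based evidence.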


Subject(s)
Software , Transcriptome , Drug Discovery , Gene Regulatory Networks
9.
Bioinformatics ; 36(1): 145-153, 2020 01 01.
Article in English | MEDLINE | ID: mdl-31233136

ABSTRACT

SUMMARY: Quantitative structure-activity relationship (QSAR) modelling is currently used in multiple fields to relate structural properties of compounds to their biological activities. This technique is also used for drug design purposes, with the aim of predicting parameters that determine drug behaviour. To this end, a sophisticated process, involving various analytical steps concatenated in series, is employed to identify and fine-tune the optimal set of predictors from a large dataset of molecular descriptors (MDs). The search for the optimal model requires optimizing multiple objectives at the same time, as the aim is to obtain the minimal set of features that maximizes the goodness of fit and the applicability domain (AD). Hence, a multi-objective optimization strategy, improving multiple parameters in parallel, can be applied. Here we propose a new multi-niche multi-objective genetic algorithm that simultaneously enables stable feature selection and yields robust, validated regression models with maximized AD. We benchmarked our method on two simulated datasets. Moreover, we analyzed an aquatic acute toxicity dataset and compared the performance of single- and multi-objective fitness functions on different regression models. Our results show that our multi-objective algorithm is a valid alternative to the classical QSAR modelling strategy for continuous response values, since it automatically finds the model with the best compromise between statistical robustness, predictive performance, widest AD, and smallest number of MDs. AVAILABILITY AND IMPLEMENTATION: The Python implementation of MaNGA is available at https://github.com/Greco-Lab/MaNGA. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Algorithms , Computational Biology , Chemical Models , Quantitative Structure-Activity Relationship , Computational Biology/methods , Drug Design
10.
Clin Exp Allergy ; 50(10): 1148-1158, 2020 10.
Article in English | MEDLINE | ID: mdl-32865840

ABSTRACT

BACKGROUND: After the Second World War, the population living in the Karelian region was strictly divided by the "iron curtain" between Finland and Russia. This resulted in different lifestyles, standards of living, and environmental exposures. Allergic manifestations and sensitization to common allergens have been much more common on the Finnish side than on the Russian side. OBJECTIVE: The remarkable allergy disparity between Finnish and Russian Karelia calls for immunological explanations. METHODS: Young people aged 15-20 years in Finnish (n = 69) and Russian (n = 75) Karelia were studied. The impact of genetic variation on the phenotype was studied by a genome-wide association analysis. Differences in gene expression (transcriptome) were explored in peripheral blood mononuclear cells (PBMC) and related to skin and nasal epithelium microbiota and sensitization. RESULTS: The genotype differences between the Finnish and Russian populations did not explain the allergy gap. The network of gene expression and skin and nasal microbiota was richer and more diverse in the Russian subjects. When the function of 261 differentially expressed genes was explored, innate immunity pathways were suppressed among Russians compared to Finns. Differences in gene expression paralleled the microbiota disparity. High Acinetobacter abundance in Russians correlated with suppression of the innate immune response. High total IgE was associated with an enhanced antiviral response in the Finnish but not in the Russian subjects. CONCLUSIONS AND CLINICAL RELEVANCE: Young populations living in Finnish and Russian Karelia show marked differences in genome-wide gene expression and host contrasting skin and nasal epithelium microbiota. The rich gene-microbe network in Russians seems to result in better-balanced innate immunity and is associated with low allergy prevalence.


Subject(s)
Health Status Disparities , Hypersensitivity/epidemiology , Innate Immunity , Microbiota/immunology , Adolescent , Age Factors , Female , Finland/epidemiology , Gene Regulatory Networks , Genome-Wide Association Study , Host-Microbial Interactions , Humans , Hypersensitivity/immunology , Hypersensitivity/microbiology , Hypersensitivity/virology , Innate Immunity/genetics , Immunoglobulin E/blood , Mononuclear Leukocytes/immunology , Mononuclear Leukocytes/microbiology , Mononuclear Leukocytes/virology , Male , Nasal Mucosa/immunology , Nasal Mucosa/microbiology , Nasal Mucosa/virology , Single Nucleotide Polymorphism , Prevalence , Russian Federation/epidemiology , Skin/immunology , Skin/microbiology , Skin/virology , Transcriptome , Young Adult
11.
Int J Mol Sci ; 21(8)2020 Apr 24.
Article in English | MEDLINE | ID: mdl-32344727

ABSTRACT

Endocrine disruptors (EDs) are defined as chemicals that mimic, block, or interfere with hormones in the body's endocrine systems and have been associated with a diverse array of health issues. The concept of endocrine disruption has recently been extended to metabolic alterations that may result in diseases, such as obesity, diabetes, and fatty liver disease, and constitute an increasing health concern worldwide. However, while epidemiological and experimental data on the close association of EDs and adverse metabolic effects are mounting, predictive methods and models to evaluate the detailed mechanisms and pathways behind these observed effects are lacking, thus restricting the regulatory risk assessment of EDs. The EDCMET (Metabolic effects of Endocrine Disrupting Chemicals: novel testing METhods and adverse outcome pathways) project brings together systems toxicologists; experimental biologists with a thorough understanding of the molecular mechanisms of metabolic disease and comprehensive in vitro and in vivo methodological skills; and, ultimately, epidemiologists linking environmental exposure to adverse metabolic outcomes. During its 5-year course, EDCMET aims to identify novel ED mechanisms of action, to generate (pre)validated test methods to assess the metabolic effects of EDs, and to predict emergent adverse biological phenotypes by following the adverse outcome pathway (AOP) paradigm.


Subject(s)
Endocrine Disruptors/adverse effects , Energy Metabolism/drug effects , Animals , Biomarkers , Disease Susceptibility , Endocrine System/drug effects , Endocrine System/metabolism , Environmental Exposure , Environmental Pollutants , Genetic Epigenesis , Humans , Metabolic Diseases/etiology , Metabolic Diseases/metabolism , Mitochondria/genetics , Mitochondria/metabolism , Cytoplasmic and Nuclear Receptors/genetics , Cytoplasmic and Nuclear Receptors/metabolism
12.
Bioinformatics ; 34(12): 2136-2138, 2018 06 15.
Article in English | MEDLINE | ID: mdl-29425308

ABSTRACT

Summary: Detecting and interpreting responsive modules from gene expression data by using network-based approaches is a common but laborious task. It often requires the application of several computational methods implemented in different software packages, forcing biologists to compile complex analytical pipelines. Here we introduce INfORM (Inference of NetwOrk Response Modules), an R shiny application that enables non-expert users to detect, evaluate and select gene modules with high statistical and biological significance. INfORM is a comprehensive tool for the identification of biologically meaningful response modules from consensus gene networks inferred by using multiple algorithms. It is accessible through an intuitive graphical user interface allowing for a level of abstraction from the computational steps. Availability and implementation: INfORM is freely available for academic use at https://github.com/Greco-Lab/INfORM. Supplementary information: Supplementary data are available at Bioinformatics online.
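A consensus network combines edge lists produced by several inference algorithms. One simple strategy, assumed here for illustration (the abstract does not specify how INfORM builds its consensus), is to average each edge's rank across methods and keep the top-k edges. Edges absent from a method's list receive that list's length plus one as their rank; the three rankings are hypothetical.

```python
def consensus_edges(rankings, top_k):
    """Rank-average consensus over edge lists from several inference methods.

    rankings: one list per method, each ordered from strongest to weakest edge;
              an edge is a tuple of two gene names in a fixed (sorted) order.
    An edge missing from a method's list is given rank len(list) + 1.
    """
    all_edges = set().union(*map(set, rankings))

    def rank_in(ranking, edge):
        return ranking.index(edge) + 1 if edge in ranking else len(ranking) + 1

    mean_rank = {
        e: sum(rank_in(r, e) for r in rankings) / len(rankings) for e in all_edges
    }
    # Lower mean rank = stronger consensus; ties broken alphabetically.
    return sorted(all_edges, key=lambda e: (mean_rank[e], e))[:top_k]


# Hypothetical rankings from three different inference algorithms.
method_1 = [("A", "B"), ("B", "C"), ("C", "D")]
method_2 = [("B", "C"), ("A", "B"), ("A", "D")]
method_3 = [("A", "B"), ("A", "D"), ("B", "C")]
print(consensus_edges([method_1, method_2, method_3], top_k=2))
```

Rank averaging is robust to methods that score edges on different scales, which is why it is a frequent choice when combining heterogeneous network inference algorithms.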


Subject(s)
Computational Biology/methods , Gene Expression , Gene Regulatory Networks , Software , Algorithms
13.
Part Fibre Toxicol ; 16(1): 28, 2019 07 05.
Article in English | MEDLINE | ID: mdl-31277695

ABSTRACT

BACKGROUND: Copper oxide (CuO) nanomaterials are used in a wide range of industrial and commercial applications. These materials can be hazardous, especially if they are inhaled. The pulmonary effects of CuO nanomaterials have been studied in healthy subjects, but limited knowledge exists today about their effects on lungs with allergic airway inflammation (AAI). The objective of this study was to investigate how pristine CuO modulates allergic lung inflammation and whether surface modifications can influence its reactivity. CuO and its carboxylated (CuO COOH), methylaminated (CuO NH3) and PEGylated (CuO PEG) derivatives were administered on four consecutive days via oropharyngeal aspiration in a mouse model of AAI. Standard genome-wide gene expression profiling as well as conventional histopathological and immunological methods were used to investigate the modulatory effects of the nanomaterials on both healthy and compromised immune systems. RESULTS: Our data demonstrate that although the CuO materials did not considerably influence hallmarks of allergic airway inflammation, they exacerbated the existing lung inflammation by eliciting dramatic pulmonary neutrophilia. Transcriptomic analysis showed that CuO, CuO COOH and CuO NH3 commonly enriched neutrophil-related biological processes, especially in healthy mice. In sharp contrast, CuO PEG had a significantly lower potential to trigger changes in the lungs of healthy and allergic mice, revealing that surface PEGylation suppresses the effects triggered by the pristine material. CONCLUSIONS: CuO as well as its functionalized forms worsen allergic airway inflammation by causing neutrophilia in the lungs; however, our results also show that surface PEGylation can be a promising approach for inhibiting the effects of pristine CuO. Our study provides information for the health and safety assessment of modified CuO materials and can be useful in the development of nanomedical applications.


Subject(s)
Copper/toxicity , Nanoparticles/toxicity , Neutrophil Infiltration/drug effects , Pneumonia/chemically induced , Polyethylene Glycols/chemistry , Transcriptome/drug effects , Animals , Copper/chemistry , Female , Gene Expression Profiling , Genome-Wide Association Study , Inbred BALB C Mice , Nanoparticles/chemistry , Ovalbumin/immunology , Pneumonia/genetics , Pneumonia/immunology , Pneumonia/pathology , Surface Properties
14.
Int J Mol Sci ; 21(1)2019 Dec 18.
Article in English | MEDLINE | ID: mdl-31861438

ABSTRACT

The explosion of omics data availability in cancer research has boosted knowledge of the molecular basis of cancer, although strategies for its definitive resolution are still not well established. The complexity of cancer biology, given by the high heterogeneity of cancer cells, leads to the development of pharmacoresistance in many patients, hampering the efficacy of therapeutic approaches. Machine learning techniques have been implemented to extract knowledge from cancer omics data in order to address fundamental issues in cancer research, such as the classification of clinically relevant sub-groups of patients and the identification of biomarkers for disease risk and prognosis. Rule induction algorithms are a group of pattern discovery approaches that represent discovered relationships in the form of human-readable associative rules. The application of such techniques to the modern plethora of collected cancer omics data can effectively boost our understanding of cancer-related mechanisms. In fact, the capability of these methods to extract a large amount of human-readable knowledge will eventually help to uncover unknown relationships between molecular attributes and the malignant phenotype. In this review, we describe applications and strategies for the use of rule induction approaches in cancer omics data analysis. In particular, we explore the canonical applications and the future challenges and opportunities posed by multi-omics integration problems.


Subject(s)
Genomics , Metabolomics , Neoplasms/etiology , Neoplasms/metabolism , Proteomics , Computational Biology/methods , Genetic Databases , Genomics/methods , Humans , Machine Learning , Metabolomics/methods , Proteomics/methods
15.
Bioinformatics ; 32(20): 3199-3200, 2016 10 15.
Article in English | MEDLINE | ID: mdl-27296981

ABSTRACT

The use of high-throughput RNA sequencing to predict dynamic operon structures in prokaryotic genomes has recently gained popularity in bioinformatics. We provide the R implementation of a novel method that uses transcriptomic features extracted from RNA-seq transcriptome profiles to develop ensemble classifiers for condition-dependent operon predictions. The CONDOP package provides deeper insight into RNA-seq data analysis and allows scientists to highlight operon organization in the context of transcriptional regulation with a few lines of code. AVAILABILITY AND IMPLEMENTATION: CONDOP is implemented in R and is freely available on CRAN. CONTACT: vittorio.fortino@helsinki.fi. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Operon , RNA Sequence Analysis , Software , High-Throughput Nucleotide Sequencing , RNA
16.
BMC Bioinformatics ; 16: 37, 2015 Feb 05.
Article in English | MEDLINE | ID: mdl-25652236

ABSTRACT

BACKGROUND: DAVID is the most popular tool for interpreting large lists of genes/proteins classically produced in high-throughput experiments. However, the DAVID website becomes difficult to use when analyzing multiple gene lists, as it does not provide an adequate visualization tool to show and compare multiple enrichment results in a concise and informative manner. RESULT: We implemented a new R-based graphical tool, BACA (Bubble chArt to Compare Annotations), which uses the DAVID web service for cross-comparing enrichment analysis results derived from multiple large gene lists. BACA is implemented in R and is freely available at the CRAN repository (http://cran.r-project.org/web/packages/BACA/). CONCLUSION: The BACA package allows R users to combine multiple annotation charts into one output graph, bypassing the DAVID website.


Subject(s)
Computational Biology/methods , Genetic Databases , Genes , Molecular Sequence Annotation , Proteins , Software
17.
BMC Bioinformatics ; 16: 261, 2015 Aug 19.
Article in English | MEDLINE | ID: mdl-26283178

ABSTRACT

BACKGROUND: Multiple high-throughput molecular profiles from omics technologies can be collected for the same individuals. Combining these data, rather than exploiting them separately, can significantly increase the power of clinically relevant patient subclassifications. RESULTS: We propose a multi-view approach in which the information from different data layers (views) is integrated at the level of the results of each single-view clustering iteration. It works by factorizing the membership matrices in a late-integration manner. We evaluated the effectiveness and performance of our method on six multi-view cancer datasets. In all cases, we found statistically significant patient sub-classes, identifying novel sub-groups not previously emphasized in the literature. Our method performed better than other multi-view clustering algorithms and, unlike other existing methods, it is able to quantify the contribution of single views to the final results. CONCLUSION: Our observations suggest that integration of prior information with genomic features in the subtyping analysis is an effective strategy for identifying disease subgroups. The methodology is implemented in R and the source code is available online at http://neuronelab.unisa.it/a-multi-view-genomic-data-integration-methodology/.


Subject(s)
Algorithms , Genomics/methods , Cluster Analysis , MicroRNAs/genetics , MicroRNAs/metabolism , RNA Sequence Analysis
18.
BMC Bioinformatics ; 16: 151, 2015 May 12.
Article in English | MEDLINE | ID: mdl-25962835

ABSTRACT

BACKGROUND: OMICs technologies allow assaying the state of a large number of different features (e.g., mRNA expression, miRNA expression, copy number variation, DNA methylation, etc.) from the same samples. The objective of these experiments is usually to find a reduced set of significant features that can be used to differentiate the conditions assayed. In terms of developing novel computational feature selection methods, this task is challenging due to the lack of fully annotated biological datasets to be used for benchmarking. A possible way to tackle this problem is to generate appropriate synthetic datasets whose composition and behaviour are fully controlled and known a priori. RESULTS: Here we propose a novel method centred on the generation of networks of interactions among different biological molecules, especially those involved in regulating gene expression. Synthetic datasets are obtained from models based on ordinary differential equations with known parameters. Our results show that the generated datasets closely mimic the behaviour of real data, since popular data analysis methods are able to selectively identify existing interactions. CONCLUSIONS: The proposed method can be used in conjunction with real biological datasets in the assessment of data mining techniques. The main strength of this method lies in full control over the simulated data while retaining coherence with real biological processes. The R package MVBioDataSim is freely available to the scientific community at http://neuronelab.unisa.it/?p=1722.
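The idea of generating synthetic data from an ODE model with known parameters can be illustrated on a single regulatory interaction. The sketch below assumes a toy model in which a regulator represses its target through a Hill function, dx/dt = k_max·Kⁿ/(Kⁿ + rⁿ) − decay·x, integrated with forward Euler steps; the model, all parameter values and the noise level are hypothetical and are not taken from MVBioDataSim.

```python
import random


def simulate_target(regulator, k_max=2.0, K=1.0, n=2, decay=0.5,
                    dt=0.01, t_end=20.0):
    """Euler integration of a toy repression model (assumed, not MVBioDataSim's):

        dx/dt = k_max * K**n / (K**n + regulator**n) - decay * x

    Starts from x = 0 and returns x at t_end, which is close to the
    steady state k_max / (1 + (regulator / K)**n) / decay.
    """
    x = 0.0
    production = k_max * K**n / (K**n + regulator**n)
    for _ in range(int(round(t_end / dt))):
        x += dt * (production - decay * x)
    return x


def synthetic_samples(n_samples, noise_sd=0.1, seed=0):
    """Draw regulator levels at random and record (regulator, noisy target)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_samples):
        regulator = rng.uniform(0.0, 3.0)
        target = simulate_target(regulator) + rng.gauss(0.0, noise_sd)
        rows.append((regulator, target))
    return rows


data = synthetic_samples(5)
for regulator, target in data:
    print(f"regulator={regulator:.2f} target={target:.2f}")
```

Because the true interaction (a repression of known strength) is fixed in advance, any feature selection or network inference method run on such samples can be scored against ground truth, which is the benchmarking use the abstract describes.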


Subject(s)
Algorithms , Computational Biology/methods , Computer Simulation , Gene Expression Profiling/methods , Gene Regulatory Networks , Genomics/methods , DNA Copy Number Variations , DNA Methylation , Datasets as Topic , Gene Expression Regulation , Humans , MicroRNAs/genetics
19.
BMC Bioinformatics ; 15: 145, 2014 May 16.
Article in English | MEDLINE | ID: mdl-24884724

ABSTRACT

BACKGROUND: Inferring operon maps is crucial to understanding the regulatory networks of prokaryotic genomes. Recently, RNA-seq-based transcriptome studies revealed that in many bacterial species the operon structure varies with changing environmental conditions. Therefore, new computational solutions that use both static and dynamic data are necessary to create condition-specific operon predictions. RESULTS: In this work, we propose a novel classification method that integrates RNA-seq-based transcriptome profiles with genomic sequence features to accurately identify the operons that are expressed under a measured condition. The classifiers are trained on a small set of confirmed operons and then used to classify the remaining gene pairs of the organism studied. Finally, by linking consecutive gene pairs classified as operons, our computational approach produces condition-dependent operon maps. We evaluated our approach on various RNA-seq expression profiles of the bacteria Haemophilus somni, Porphyromonas gingivalis, Escherichia coli and Salmonella enterica. Our results demonstrate that, using features depending on both transcriptome dynamics and genome sequence characteristics, we can identify operon pairs with high accuracy. Moreover, the combination of DNA sequence and expression data results in more accurate predictions than either one alone. CONCLUSION: We present a computational strategy for the comprehensive analysis of condition-dependent operon maps in prokaryotes. Our method can be used to generate condition-specific operon maps of many bacterial organisms for which high-resolution transcriptome data are available.
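The final step described above, linking consecutive gene pairs classified as operon pairs into whole operons, can be sketched as a single pass along the genome. The sketch assumes the genes are already in genomic order on one strand and that the pair-level classifier output is available as one boolean per adjacent pair; the classifier itself is not shown, and the gene IDs and predictions are hypothetical.

```python
def build_operon_map(genes, same_operon):
    """Join consecutive genes into operons.

    genes: gene IDs in genomic order along one strand.
    same_operon: one boolean per adjacent pair, i.e. len(genes) - 1 values;
                 True means genes[i] and genes[i + 1] were classified as an
                 operon pair under the measured condition.
    """
    if not genes:
        return []
    if len(same_operon) != len(genes) - 1:
        raise ValueError("need exactly one prediction per adjacent gene pair")
    operons = [[genes[0]]]
    for gene, linked in zip(genes[1:], same_operon):
        if linked:
            operons[-1].append(gene)  # extend the current transcription unit
        else:
            operons.append([gene])    # start a new one
    return operons


# Hypothetical pair-level predictions for five adjacent genes.
genes = ["g1", "g2", "g3", "g4", "g5"]
predictions = [True, False, True, True]
print(build_operon_map(genes, predictions))
```

Running the same linking step on pair predictions obtained under different conditions yields one operon map per condition, which is what makes the resulting maps condition-dependent.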


Subject(s)
Gene Expression Profiling/methods , Bacterial Genome , Operon , DNA Sequence Analysis/methods , RNA Sequence Analysis/methods , Genomics/methods , Molecular Sequence Annotation