Search | Biblioteca Virtual en Salud (Virtual Health Library)

Data mining in the Life Sciences with Random Forest: a walk in the park or lost in the jungle?

Touw, Wouter G; Bayjanov, Jumamurat R; Overmars, Lex; Backus, Lennart; Boekhorst, Jos; Wels, Michiel; van Hijum, Sacha A F T.

Brief Bioinform ; 14(3): 315-26, 2013 May.

Article in English | MEDLINE | ID: mdl-22786785

ABSTRACT

In the Life Sciences, 'omics' data are increasingly generated by different high-throughput technologies. Often, only the integration of these data uncovers biological insights that can be experimentally validated or mechanistically modelled; that is, sophisticated computational approaches are required to extract the complex non-linear trends present in omics data. Classification techniques allow a model to be trained on variables (e.g. SNPs in genetic association studies) to separate different classes (e.g. healthy subjects versus patients). Random Forest (RF) is a versatile classification algorithm suited to the analysis of these large data sets. In the Life Sciences, RF is popular because RF classification models have high prediction accuracy and provide information on the importance of variables for classification. For omics data, variables, or conditional relations between variables, are typically important for a subset of samples of the same class. For example, within a class of cancer patients, certain SNP combinations may be important for a subset of patients who have a specific subtype of cancer, but not for a different subset of patients. These conditional relationships can in principle be uncovered from the data with RF, as they are implicitly taken into account by the algorithm during the creation of the classification model. This review details RF properties that, to the best of our knowledge, are rarely or never used, and that help maximize the biological insight that can be extracted from complex omics data sets with RF.

Subject(s)

Algorithms, Biological Science Disciplines, Data Mining, Humans, Neoplasms/genetics, Single Nucleotide Polymorphism

Draft Genome Sequences of 24 Lactococcus lactis Strains.

Backus, Lennart; Wels, Michiel; Boekhorst, Jos; Dijkstra, Annereinou R; Beerthuyzen, Marke; Kelly, William J; Siezen, Roland J; van Hijum, Sacha A F T; Bachmann, Herwig.

Genome Announc ; 5(13)2017 Mar 30.

Article in English | MEDLINE | ID: mdl-28360177

ABSTRACT

The lactic acid bacterium Lactococcus lactis is widely used for the production of fermented dairy products. Here, we present the draft genome sequences of 24 L. lactis strains isolated from different environments and geographic locations.

Draft Genome Sequences of 11 Lactococcus lactis subsp. cremoris Strains.

Wels, Michiel; Backus, Lennart; Boekhorst, Jos; Dijkstra, Annereinou; Beerthuyzen, Marke; Siezen, Roland J; Bachmann, Herwig; van Hijum, Sacha A F T.

Genome Announc ; 5(11)2017 Mar 16.

Article in English | MEDLINE | ID: mdl-28302789

ABSTRACT

The lactic acid bacterium Lactococcus lactis is widely used for the fermentation of dairy products. Here, we present the draft genome sequences of 11 L. lactis subsp. cremoris strains isolated from different environments.

Draft Genome Sequence of Lactobacillus plantarum SF2A35B.

Bron, Peter A; Lee, I-Chiao; Backus, Lennart; van Hijum, Sacha A F T; Wels, Michiel; Kleerebezem, Michiel.

Genome Announc ; 4(1)2016 Feb 25.

Article in English | MEDLINE | ID: mdl-26950330

ABSTRACT

The lactic acid bacterium Lactobacillus plantarum is intensively studied as a model probiotic species. Here, we present the draft genome sequence of the exopolysaccharide-producing strain SF2A35B.

Draft Genome Sequence of Streptococcus thermophilus C106, a Dairy Isolate from an Artisanal Cheese Produced in the Countryside of Ireland.

Wels, Michiel; Serrano, L Mariela; Eibrink, Beerd-Jan; Backus, Lennart; Bongers, Roger S; Vriesendorp, Bastienne; Siezen, Roland J; van Hijum, Sacha A F T; Meijer, Wilco C.

Genome Announc ; 3(6)2015 Nov 25.

Article in English | MEDLINE | ID: mdl-26607891

ABSTRACT

The lactic acid bacterium Streptococcus thermophilus is widely used for the fermentation of dairy products. Here, we present the draft genome sequence of S. thermophilus C106 isolated from an artisanal cheese produced in the countryside of Ireland.

Screening metatranscriptomes for toxin genes as functional drivers of human colorectal cancer.

Dutilh, Bas E; Backus, Lennart; van Hijum, Sacha A F T; Tjalsma, Harold.

Best Pract Res Clin Gastroenterol ; 27(1): 85-99, 2013 Feb.

Article in English | MEDLINE | ID: mdl-23768555

ABSTRACT

The colonic mucosa is in constant physical interaction with a dense and complex bacterial community that comprises health-promoting and pathogenic microbes. Here, we highlight important clinical studies and experimental models that have linked the intestinal microbiota to the development of colorectal cancer (CRC). Moreover, we use recently published metatranscriptome sequencing data to test whether potentially carcinogenic toxin genes exhibit higher expression levels in human CRC tissue than in adjacent non-malignant mucosa. Our analyses show large variation in the expression of toxin(-related) genes from different species. Surprisingly, Enterobacterial toxins were among the most highly expressed, even though Enterobacteria were not among the most abundant species in these samples. Although we can differentiate on- and off-tumour sites based on toxin reads, the read depth profiles are quite similar and show only limited coverage of the toxin genes. Thus, extended metagenomic studies are needed to obtain a high-resolution picture of host-pathogen interactions during human CRC.

Subject(s)

Colorectal Neoplasms/genetics, Enterotoxins/genetics, Transcriptome/genetics, Gastrointestinal Tract/microbiology, Gene Expression/physiology, Host-Pathogen Interactions, Humans, Intestinal Mucosa/microbiology, Metagenome/physiology

Explaining microbial phenotypes on a genomic scale: GWAS for microbes.

Dutilh, Bas E; Backus, Lennart; Edwards, Robert A; Wels, Michiel; Bayjanov, Jumamurat R; van Hijum, Sacha A F T.

Brief Funct Genomics ; 12(4): 366-80, 2013 Jul.

Article in English | MEDLINE | ID: mdl-23625995

ABSTRACT

Complete or draft genome sequences are increasingly available for microbial organisms. These data form a potentially valuable resource for genotype-phenotype association and gene function prediction, provided that phenotypes are consistently annotated for all sequenced strains. In this review, we address the requirements for successful gene-trait matching. We outline a basic protocol for microbial functional genomics, including genome assembly, annotation of genotypes (including single nucleotide polymorphisms, orthologous groups and prophages), data pre-processing, genotype-phenotype association, and visualization and interpretation of results. The association methodologies described herein can also be applied to other data types, opening up possibilities to analyze transcriptome-phenotype associations and to correlate microbial population structure or activity, as measured by metagenomics, with environmental parameters.

Subject(s)

Genome-Wide Association Study/methods, Genomics/methods, Genetic Association Studies
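The microbial GWAS review above lists genotype-phenotype association as one step of its protocol. As a minimal sketch of what that step can look like, assuming binary presence/absence calls for each orthologous group and a binary phenotype per strain, the Python below scores each gene with a two-sided Fisher's exact test. The function names (`fisher_exact_two_sided`, `gene_trait_scan`) and the strain data are hypothetical illustrations; the review discusses several association approaches and is not claimed to use this exact one. In practice, `scipy.stats.fisher_exact` provides the same test.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]].

    Rows: gene present / absent. Columns: phenotype positive / negative.
    """
    n = a + b + c + d
    row1 = a + b  # strains carrying the gene
    col1 = a + c  # strains showing the phenotype
    denom = comb(n, row1)

    def pmf(k: int) -> float:
        # Hypergeometric probability that k of the gene carriers show the phenotype
        return comb(col1, k) * comb(n - col1, row1 - k) / denom

    p_obs = pmf(a)
    lo = max(0, row1 - (n - col1))
    hi = min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one
    total = sum(pmf(k) for k in range(lo, hi + 1) if pmf(k) <= p_obs * (1 + 1e-7))
    return min(1.0, total)


def gene_trait_scan(presence: dict, phenotype: list) -> list:
    """Rank genes (hypothetical orthologous groups) by association with a binary trait.

    presence: gene name -> list of 0/1 presence calls, one per strain
    phenotype: list of 0/1 phenotype calls, in the same strain order
    """
    results = []
    for gene, calls in presence.items():
        pairs = list(zip(calls, phenotype))
        a = sum(1 for g, p in pairs if g and p)
        b = sum(1 for g, p in pairs if g and not p)
        c = sum(1 for g, p in pairs if not g and p)
        d = sum(1 for g, p in pairs if not g and not p)
        results.append((gene, fisher_exact_two_sided(a, b, c, d)))
    # Smallest p-value (strongest association) first
    return sorted(results, key=lambda item: item[1])


if __name__ == "__main__":
    # Hypothetical toy data: 8 strains, first four show the phenotype
    phenotype = [1, 1, 1, 1, 0, 0, 0, 0]
    presence = {
        "OG_A": [1, 1, 1, 1, 0, 0, 0, 0],  # perfectly matches the trait
        "OG_B": [1, 0, 1, 0, 1, 0, 1, 0],  # unrelated to the trait
    }
    for gene, p in gene_trait_scan(presence, phenotype):
        print(f"{gene}\t{p:.4f}")
```

On this toy input, OG_A ranks first with p = 2/70 (about 0.0286) and OG_B gets p = 1.0. A real genome-scale scan would test thousands of genes, so the p-values would also need multiple-testing correction, which this sketch omits.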
