An algorithm for chemical genomic profiling that minimizes batch effects: bucket evaluations.

Shabtai, Daniel; Giaever, Guri; Nislow, Corey

Affiliation

Shabtai D; Department of Cell and Systems Biology and the Donnelly Centre, University of Toronto, Toronto, ON, Canada.

BMC Bioinformatics ; 13: 245, 2012 Sep 25.

Article in English | MEDLINE | ID: mdl-23009392

ABSTRACT

BACKGROUND: Chemical genomics is an interdisciplinary field that combines small molecule perturbation with traditional genomics to understand gene function and to study the mode(s) of drug action. A benefit of chemical genomic screens is their breadth; each screen can simultaneously capture the sensitivity of comprehensive collections of mutants or, in the case of mammalian cells, gene knock-downs. As with other large-scale experimental platforms, comparing and contrasting such profiles, e.g. for clustering known compounds with uncharacterized compounds, requires a robust means of comparing a large cohort of profiles. Existing methods for correlating different chemical profiles include diverse statistical discriminant analysis-based methods and specific gene filtering or normalization methods. Though powerful, none are ideal because they typically require one to define the disrupting effects, commonly known as batch effects, in order to distinguish true signal from experimental variation. These effects are not always known, and they can mask true biological differences. We present a method, Bucket Evaluations (BE), that surmounts many of these problems, is extensible to other datasets such as those obtained via gene expression profiling, and is platform independent.

RESULTS: We designed an algorithm to analyse chemogenomic profiles to identify potential targets of known drugs and new chemical compounds. We used levelled rank comparisons to identify drugs/compounds with similar profiles, an approach that minimizes batch effects and avoids the requirement of pre-defining the disrupting effects. We also tested the algorithm on gene expression microarray data and high throughput sequencing chemogenomic screens and found that the method is applicable to a variety of dataset types.

CONCLUSIONS: BE, along with various correlation methods applied to a collection of datasets, proved to be highly accurate for locating similarity between experiments. BE is a non-parametric correlation approach, which is suitable for locating correlations in somewhat perturbed datasets such as chemical genomic profiles. We created software and a user interface for using BE, which is publicly available.
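The abstract does not spell out how the levelled rank comparison is computed, so the following is only a minimal sketch of the general idea, not the published BE algorithm. It assumes that each profile maps gene names to sensitivity scores, that genes are ranked and grouped into buckets of growing size, and that genes shared between high-ranking buckets count most. The function names, the bucket sizes (`first_size`, `growth`) and the weighting `1 / ((i + 1) * (j + 1))` are all hypothetical choices made for illustration.

```python
import math
from typing import Dict, List, Set


def bucketize(profile: Dict[str, float], first_size: int = 2, growth: int = 2) -> List[Set[str]]:
    """Rank genes by descending score and split them into buckets of growing size.

    Bucket sizes (first_size, then multiplied by growth) are illustrative assumptions.
    """
    # Sort by score (highest first); break ties by gene name for determinism.
    ranked = sorted(profile, key=lambda g: (-profile[g], g))
    buckets: List[Set[str]] = []
    start, size = 0, first_size
    while start < len(ranked):
        buckets.append(set(ranked[start:start + size]))
        start += size
        size *= growth
    return buckets


def _raw_score(buckets_a: List[Set[str]], buckets_b: List[Set[str]]) -> float:
    """Sum weighted overlaps between every pair of buckets (hypothetical weighting)."""
    score = 0.0
    for i, bucket_a in enumerate(buckets_a):
        for j, bucket_b in enumerate(buckets_b):
            weight = 1.0 / ((i + 1) * (j + 1))  # top buckets contribute most
            score += weight * len(bucket_a & bucket_b)
    return score


def bucket_similarity(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Rank-based similarity in [0, 1] computed on the genes common to both profiles."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    buckets_a = bucketize({g: a[g] for g in shared})
    buckets_b = bucketize({g: b[g] for g in shared})
    # Normalise by the self-similarity of each profile so identical rankings give 1.
    norm = math.sqrt(_raw_score(buckets_a, buckets_a) * _raw_score(buckets_b, buckets_b))
    return _raw_score(buckets_a, buckets_b) / norm


if __name__ == "__main__":
    drug_a = {"g1": 9.0, "g2": 7.0, "g3": 5.0, "g4": 3.0, "g5": 1.0, "g6": 0.5}
    # Same screen with a made-up batch effect: every score rescaled and shifted.
    drug_a_other_batch = {g: 3.0 * v + 5.0 for g, v in drug_a.items()}
    # A profile with the opposite ranking.
    drug_b = {g: -v for g, v in drug_a.items()}
    print(bucket_similarity(drug_a, drug_a_other_batch))
    print(bucket_similarity(drug_a, drug_b))
```

Because only the rank order enters the score, any order-preserving distortion of the raw values (such as the rescaling in `drug_a_other_batch`) leaves the similarity unchanged, which illustrates why a rank-based comparison does not need the disrupting effect to be defined in advance.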

Subject(s)

Algorithms; Gene Expression Profiling/methods; Genome-Wide Association Study; Animals; Cluster Analysis; Gene Knockdown Techniques; High-Throughput Nucleotide Sequencing; Oligonucleotide Array Sequence Analysis; Saccharomyces cerevisiae/genetics; Saccharomyces cerevisiae/physiology; Sequence Analysis, DNA; Software

Full text: 1 | Collection: 01-international | Database: MEDLINE | Main subject: Algorithms / Gene Expression Profiling / Genome-Wide Association Study | Type of study: Prognostic_studies | Limits: Animals | Language: En | Journal: BMC Bioinformatics | Journal subject: MEDICAL INFORMATICS | Year: 2012 | Document type: Article | Country of affiliation: Canada