Model-Based Clustering With Data Correction For Removing Artifacts In Gene Expression Data.

Young, William Chad; Raftery, Adrian E; Yeung, Ka Yee

Affiliation

Young WC; Department of Statistics, University of Washington, Box 354322, Seattle, WA 98195.
Raftery AE; Department of Statistics, University of Washington, Box 354322, Seattle, WA 98195.
Yeung KY; Institute of Technology, University of Washington Tacoma, Campus Box 358426, 1900 Commerce Street, Tacoma, WA 98402.

Ann Appl Stat ; 11(4): 1998-2026, 2016 Feb.

Article in English | MEDLINE | ID: mdl-30740193

ABSTRACT

The NIH Library of Integrated Network-based Cellular Signatures (LINCS) contains gene expression data from over a million experiments, using Luminex Bead technology. Only 500 colors are used to measure the expression levels of the 1,000 landmark genes, so each color is shared by a pair of genes and the data for each pair must be deconvolved. The raw data are sometimes inadequate for reliable deconvolution, leading to artifacts in the final processed data. These artifacts include the expression levels of paired genes being flipped, paired genes being given the same value, and clusters of values that are not at the true expression level. We propose a new method called model-based clustering with data correction (MCDC) that identifies and corrects these three kinds of artifacts simultaneously. We show that MCDC improves the resulting gene expression data in terms of agreement with external baselines, and that it also improves the results of subsequent analysis.
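
The "flipped pair" artifact described in the abstract can be illustrated with a toy sketch. This is not the MCDC algorithm, which the abstract describes only as a model-based clustering method. The sketch assumes that most samples are unflipped, uses per-gene medians as a stand-in for a fitted cluster center, and compares squared distances instead of model likelihoods. The function name `correct_flips` and the example values are hypothetical.

```python
from statistics import median


def correct_flips(pairs):
    """Swap (a, b) -> (b, a) for samples that fit the cluster better when swapped.

    Toy illustration only (assumption, not the published method): the cluster
    center is the per-gene median, and fit is squared Euclidean distance.
    """
    center_a = median(a for a, _ in pairs)
    center_b = median(b for _, b in pairs)

    corrected, flipped = [], []
    for a, b in pairs:
        # Distance to the center with the values as recorded.
        d_as_is = (a - center_a) ** 2 + (b - center_b) ** 2
        # Distance to the center if the two genes' values were exchanged.
        d_swapped = (b - center_a) ** 2 + (a - center_b) ** 2
        is_flipped = d_swapped < d_as_is
        flipped.append(is_flipped)
        corrected.append((b, a) if is_flipped else (a, b))
    return corrected, flipped


# Made-up expression values for one gene pair across five samples;
# the fourth sample has its two values exchanged.
pairs = [(2.0, 8.1), (2.2, 7.9), (1.9, 8.0), (8.0, 2.1), (2.1, 8.2)]
corrected, flipped = correct_flips(pairs)
print(flipped)
print(corrected[3])
```

A median-based center is used here only because it is robust to a minority of flipped samples; a model-based approach would instead estimate cluster parameters and the artifact assignments jointly.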

Keywords

Gene regulatory network; LINCS; MCDC; Model-based clustering

Full text: 1 | Collection: 01-international | Database: MEDLINE | Language: English | Journal: Ann Appl Stat | Year: 2016 | Document type: Article | Country of publication: United States
