Comparison of normalization and differential expression analyses using RNA-Seq data from 726 individual Drosophila melanogaster.
Lin, Yanzhu; Golovnina, Kseniya; Chen, Zhen-Xia; Lee, Hang Noh; Negron, Yazmin L Serrano; Sultana, Hina; Oliver, Brian; Harbison, Susan T.
Affiliation
  • Lin Y; Laboratory of Systems Genetics, Center for Systems Biology, National Heart Lung and Blood Institute, 10 Center Drive, MSC 1640, Bethesda, MD, 20892, USA. yanzhu.lin@nih.gov.
  • Golovnina K; Developmental Genomics Section, Laboratory of Cellular and Developmental Biology, National Institute of Diabetes and Digestive and Kidney Diseases, Bethesda, MD, USA. kseniya.golovnina@nih.gov.
  • Chen ZX; Developmental Genomics Section, Laboratory of Cellular and Developmental Biology, National Institute of Diabetes and Digestive and Kidney Diseases, Bethesda, MD, USA. zhen-xia.chen@nih.gov.
  • Lee HN; Developmental Genomics Section, Laboratory of Cellular and Developmental Biology, National Institute of Diabetes and Digestive and Kidney Diseases, Bethesda, MD, USA. hangnoh.lee@nih.gov.
  • Negron YL; Laboratory of Systems Genetics, Center for Systems Biology, National Heart Lung and Blood Institute, 10 Center Drive, MSC 1640, Bethesda, MD, 20892, USA. yazmin.serranonegron@nih.gov.
  • Sultana H; Developmental Genomics Section, Laboratory of Cellular and Developmental Biology, National Institute of Diabetes and Digestive and Kidney Diseases, Bethesda, MD, USA. hina.sultana@nih.gov.
  • Oliver B; Developmental Genomics Section, Laboratory of Cellular and Developmental Biology, National Institute of Diabetes and Digestive and Kidney Diseases, Bethesda, MD, USA. briano@helix.nih.gov.
  • Harbison ST; Laboratory of Systems Genetics, Center for Systems Biology, National Heart Lung and Blood Institute, 10 Center Drive, MSC 1640, Bethesda, MD, 20892, USA. susan.harbison@nih.gov.
BMC Genomics; 17: 28, 2016 Jan 05.
Article in English | MEDLINE | ID: mdl-26732976
ABSTRACT

BACKGROUND:

A generally accepted approach to the analysis of RNA-Seq read count data does not yet exist. We sequenced the mRNA of 726 individuals from the Drosophila Genetic Reference Panel in order to quantify differences in gene expression among single flies. One of our experimental goals was to identify the optimal analysis approach for detecting differential gene expression among the factors we varied in the experiment: genotype, environment, sex, and their interactions. Here we evaluate three different filtering strategies, eight normalization methods, and two statistical approaches using our data set. We assessed differential gene expression among factors and performed a statistical power analysis using the eight biological replicates per genotype, environment, and sex in our data set.

RESULTS:

We found that the most critical considerations for the analysis of RNA-Seq read count data were the normalization method, the underlying data distribution assumption, and the number of biological replicates, an observation consistent with previous RNA-Seq and microarray analysis comparisons. Some common normalization methods, such as Total Count, Quantile, and RPKM normalization, did not align the data across samples. Furthermore, analyses using the Median, Quantile, and Trimmed Mean of M-values normalization methods were sensitive to the removal of low-expressed genes from the data set. Although it is robust in many types of analysis, the normal data distribution assumption produced results vastly different from those obtained under the negative binomial distribution. In addition, at least three biological replicates per condition were required in order to have sufficient statistical power to detect expression differences for the three-way interaction of genotype, environment, and sex.
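
As a brief reminder of why the two distributional assumptions can diverge, the negative binomial model as commonly parameterized in edgeR and DESeq lets the variance of a read count grow with its mean. The symbols below are not taken from the abstract and are introduced only for illustration: K_ij is the count for gene i in sample j, mu_ij its mean, and alpha_i a gene-wise dispersion.

```latex
K_{ij} \sim \mathrm{NB}(\mu_{ij}, \alpha_i), \qquad
\mathrm{Var}(K_{ij}) = \mu_{ij} + \alpha_i \, \mu_{ij}^{2}
```

A normal model fitted with a single variance per gene has no such mean-variance link, which is one plausible reason the two approaches can rank genes differently.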

CONCLUSIONS:

The best analysis approach to our data was to normalize the read counts using the DESeq method and apply a generalized linear model assuming a negative binomial distribution using either edgeR or DESeq software. Genes having very low read counts were removed after normalizing the data and fitting it to the negative binomial distribution. We describe the results of this evaluation and include recommended analysis strategies for RNA-Seq read count data.
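
The DESeq normalization mentioned above is commonly described as a median-of-ratios method: each sample's size factor is the median, over genes, of that sample's count divided by the gene's geometric mean across samples. The following is a minimal sketch of that idea in plain Python, not the authors' pipeline or the DESeq source code; the function names `size_factors` and `normalize` are hypothetical, and genes with a zero count in any sample are assumed to be skipped when estimating the factors.

```python
import math
from statistics import median


def size_factors(counts):
    """Estimate one size factor per sample by the median-of-ratios idea.

    counts: list of genes, each a list of raw read counts (one per sample).
    Genes with a zero in any sample are skipped (assumption of this sketch),
    because their geometric mean would be zero.
    """
    n_samples = len(counts[0])
    log_ratios = [[] for _ in range(n_samples)]
    for gene in counts:
        if all(c > 0 for c in gene):
            # Log of the geometric mean of this gene across samples.
            log_geo_mean = sum(math.log(c) for c in gene) / n_samples
            for j, c in enumerate(gene):
                log_ratios[j].append(math.log(c) - log_geo_mean)
    if not log_ratios[0]:
        raise ValueError("no gene has nonzero counts in every sample")
    # Median of the log ratios, transformed back to the count scale.
    return [math.exp(median(r)) for r in log_ratios]


def normalize(counts):
    """Divide every raw count by its sample's size factor."""
    factors = size_factors(counts)
    return [[c / f for c, f in zip(gene, factors)] for gene in counts]


# Toy data: sample 2 was sequenced at twice the depth of sample 1.
toy_counts = [[10, 20], [100, 200], [5, 10], [0, 3]]
toy_factors = size_factors(toy_counts)
toy_normalized = normalize(toy_counts)
```

In the toy data the second sample's factor comes out twice the first's, so the normalized counts of the first three genes agree across samples. Because the factor is a median over genes, a handful of very highly expressed genes influences it less than it would influence a total-count scaling.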
Subject(s)

Full text: 1 Collection: 01-international Database: MEDLINE Main subject: RNA, Messenger / Gene Expression Regulation / Drosophila melanogaster / High-Throughput Nucleotide Sequencing Study type: Prognostic_studies Limit: Animals Language: En Journal: BMC Genomics Journal subject: GENETICS Year: 2016 Document type: Article Affiliation country: United States