Substantial batch effects in TCGA exome sequences undermine pan-cancer analysis of germline variants.
Rasnic, Roni; Brandes, Nadav; Zuk, Or; Linial, Michal.
Affiliation
  • Rasnic R; The Rachel and Selim Benin School of Computer Science and Engineering, The Hebrew University of Jerusalem, Jerusalem, Israel. roni.rasnic@mail.huji.ac.il.
  • Brandes N; The Rachel and Selim Benin School of Computer Science and Engineering, The Hebrew University of Jerusalem, Jerusalem, Israel.
  • Zuk O; Department of Statistics, The Hebrew University of Jerusalem, Jerusalem, Israel.
  • Linial M; Department of Biological Chemistry, Institute of Life Sciences, The Hebrew University of Jerusalem, Jerusalem, Israel.
BMC Cancer; 19(1): 783, 2019 Aug 07.
Article in English | MEDLINE | ID: mdl-31391007
ABSTRACT

BACKGROUND:

In recent years, research on cancer predisposition germline variants has emerged as a prominent field. Reliable identification of somatic mutations depends on an accurate mapping of the patient's germline variants. In addition, statistics on germline variant frequencies in healthy individuals and in cancer patients form the basis for identifying candidate cancer predisposition genes. The Cancer Genome Atlas (TCGA) is one of the main sources of such data, providing a diverse collection of molecular data, including deep sequencing of more than 30 cancer types from > 10,000 patients.

METHODS:

We hypothesized that whole exome sequences from blood samples of cancer patients should not show systematic differences among cancer types. To test this hypothesis, we analyzed common and rare germline variants across six cancer types, covering 2241 samples from TCGA. Our analysis accounted for inherent variables in the data, including differences in variant calling protocols, sequencing platforms, and ethnicity.
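
For a single variant, this hypothesis can be made concrete as a test of equal allele frequency across groups. The sketch below is a minimal illustration under stated assumptions, not necessarily the test the authors used: it assumes alternate and total allele counts are available per group (cancer type or sequencing center) and applies a chi-square test of homogeneity to the resulting 2 × G table. The function names `chi2_sf` and `allele_homogeneity_test` and the toy counts are hypothetical.

```python
import math
from typing import Sequence, Tuple


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) for a chi-square variable with integer df >= 1.

    Uses the regularized upper incomplete gamma recurrence
    Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1), with y = x / 2,
    starting from Q(1, y) = exp(-y) (even df) or Q(1/2, y) = erfc(sqrt(y)) (odd df).
    """
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))
    while a < df / 2.0:
        q += math.exp(a * math.log(y) - y - math.lgamma(a + 1.0))
        a += 1.0
    return q


def allele_homogeneity_test(
    alt_counts: Sequence[int], total_alleles: Sequence[int]
) -> Tuple[float, int, float]:
    """Chi-square test that one variant's allele frequency is equal across groups.

    alt_counts[g]    -- alternate alleles observed in group g (hypothetical input)
    total_alleles[g] -- alleles genotyped in group g (2 per diploid sample)
    Returns (statistic, degrees of freedom, p-value).
    """
    ref_counts = [t - a for a, t in zip(alt_counts, total_alleles)]
    n = sum(total_alleles)
    row_totals = (sum(alt_counts), sum(ref_counts))
    if 0 in row_totals:
        # A monomorphic variant gives zero expected counts; the test is undefined.
        raise ValueError("both alleles must be observed at least once")
    stat = 0.0
    for row, row_total in zip((alt_counts, ref_counts), row_totals):
        for observed, col_total in zip(row, total_alleles):
            expected = row_total * col_total / n  # count expected under equal frequencies
            stat += (observed - expected) ** 2 / expected
    dof = len(total_alleles) - 1
    return stat, dof, chi2_sf(stat, dof)


if __name__ == "__main__":
    # Toy counts for three hypothetical groups of 100 samples (200 alleles) each.
    stat, dof, p = allele_homogeneity_test([20, 20, 60], [200, 200, 200])
    print(f"chi2 = {stat:.1f}, df = {dof}, p = {p:.2e}")
```

For the toy counts the statistic is 38.4 on 2 degrees of freedom (p ≈ 4.6e-9), so the third group's frequency stands out. Applied genome-wide, such a per-variant test would also need multiple-testing correction and stratification on ancestry, one of the variables the analysis accounted for.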

RESULTS:

We report substantial batch effects in germline variants associated with cancer types. We attribute the effect to the specific sequencing centers that produced the data. Specifically, we measured 30% variability in the number of reported germline variants per sample across sequencing centers. The batch effect is also reflected in nucleotide composition and variant frequencies. Importantly, it causes substantial differences in germline variant distribution patterns across numerous genes, including prominent cancer predisposition genes such as BRCA1, RET, MAX, and KRAS. For most known cancer predisposition genes, we found a distinct batch-dependent difference in germline variants.
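
The abstract does not define how the 30% variability figure was computed. As a hedged illustration only, one simple way to express between-center variability is to average the per-sample variant counts within each sequencing center and report the relative range of those means, (max - min) / min. The record layout, the function names `mean_variants_per_center` and `relative_range`, and the toy counts are all hypothetical.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple


def mean_variants_per_center(records: Iterable[Tuple[str, int]]) -> Dict[str, float]:
    """Average number of reported germline variants per sample, per sequencing center."""
    by_center: Dict[str, List[int]] = defaultdict(list)
    for center, n_variants in records:
        by_center[center].append(n_variants)
    return {center: mean(counts) for center, counts in by_center.items()}


def relative_range(center_means: Dict[str, float]) -> float:
    """(largest mean - smallest mean) / smallest mean: one possible variability measure."""
    low, high = min(center_means.values()), max(center_means.values())
    return (high - low) / low


if __name__ == "__main__":
    # Hypothetical (center, variants-per-sample) records; not TCGA data.
    samples = [("center_A", 90), ("center_A", 110), ("center_B", 120), ("center_B", 130)]
    means = mean_variants_per_center(samples)
    print(means, relative_range(means))
```

Because different centers may have sequenced different cancer types and populations, a summary like this is only interpretable after stratifying on those variables; otherwise center and cancer type remain confounded.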

CONCLUSION:

TCGA germline data are subject to strong batch effects, with substantial variability among TCGA sequencing centers. We claim that these batch effects are consequential for numerous TCGA pan-cancer studies. In particular, they may compromise the reliability and statistical power of efforts to detect new cancer predisposition genes. Furthermore, pan-cancer analyses should be reinterpreted in light of the source of the genomic data, after accounting for the reported batch effects.

Collection: 01-internacional | Database: MEDLINE | Main subject: Genome, Human / Germ-Line Mutation / Genomics / Exome / Neoplasms | Study type: Diagnostic_studies / Prognostic_studies | Limit: Humans | Language: English | Journal: BMC Cancer | Journal subject: NEOPLASMS | Year: 2019 | Document type: Article | Affiliation country: Israel

...