Detection of suspicious interactions of spiking covariates in methylation data.
Sieg, Miriam; Richter, Gesa; Schaefer, Arne S; Kruppa, Jochen.
Affiliation
  • Sieg M; Charité - University Medicine, corporate member of Freie Universität Berlin, Humboldt-Universität zu Berlin, and Berlin Institute of Health, Institute of Biometry and Clinical Epidemiology, Charitéplatz 1, Berlin, 10117, Germany.
  • Richter G; Berlin Institute of Health (BIH), Anna-Louisa-Karsch-Straße 2, Berlin, 10178, Germany.
  • Schaefer AS; Berlin Institute of Health (BIH), Anna-Louisa-Karsch-Straße 2, Berlin, 10178, Germany.
  • Kruppa J; Department of Periodontology and Synoptic Dentistry, Institute of Dental, Oral and Maxillary Medicine, Charité - University Medicine, Charitéplatz 1, Berlin, 10117, Germany.
BMC Bioinformatics; 21(1): 36, 2020 Jan 30.
Article in English | MEDLINE | ID: mdl-32000657
BACKGROUND: In methylation analyses such as epigenome-wide association studies, a large number of biomarkers are tested for an association between the measured continuous outcome and different covariates. A continuous covariate such as smoking pack years (SPY), a measure of lifetime exposure to tobacco toxins, can show a spike at zero: all non-smokers form a peak at zero, while the smokers are distributed over the remaining SPY values. A spike can also occur on the right side of the covariate distribution if a category "heavy smoker" is defined. Here, we focus on methylation data with a spike at the left or the right of the distribution of a continuous covariate. After the methylation data are generated, the analysis usually consists of preprocessing, quality control, and determination of differentially methylated sites, often performed in pipeline fashion. That is, the data are processed by a sequence of methods that are available in one software package. The pipelines can distinguish between categorical covariates, i.e. for group comparisons, and continuous covariates, i.e. for linear regression. The differential methylation analysis is often done internally by a linear regression without checking its inherent assumptions. A spike in the continuous covariate is then ignored and can cause biased results.
RESULTS: We reanalysed five data sets, four of them freely available from ArrayExpress, that include methylation data and smoking habits reported as smoking pack years. For this purpose, we developed an algorithm that checks for suspicious interactions between the values associated with the spike position and those associated with the non-spike positions of the covariate. Our algorithm helps to decide whether a suspicious interaction is present and whether further investigations should be carried out. This is especially important because the information on the differentially methylated sites is used for post-hoc analyses such as pathway analyses.
CONCLUSIONS: Our algorithm helps to check whether the linear regression assumptions hold in a methylation analysis pipeline. These assumptions should also be considered for machine learning approaches. In addition, we are able to detect outliers in the continuous covariate. Therefore, using our algorithm as a preprocessing step should produce more statistically robust results in methylation analysis.
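To make the spike problem concrete, the following is a minimal sketch on simulated data, not the authors' published algorithm. It assumes a spike is any minimum or maximum covariate value shared by at least a chosen fraction of samples, and it measures the discrepancy between the observed mean outcome at the spike and the value extrapolated from a linear fit to the non-spike samples. The function names `find_spike` and `spike_discrepancy`, the threshold `min_fraction`, and all simulated values are illustrative assumptions.

```python
import numpy as np


def find_spike(x, min_fraction=0.2):
    """Return the min or max of x if at least min_fraction of samples sit there, else None.

    The threshold is an illustrative assumption, not a value from the paper.
    """
    x = np.asarray(x, dtype=float)
    for pos in (x.min(), x.max()):
        if np.mean(x == pos) >= min_fraction:
            return pos
    return None


def spike_discrepancy(x, y, spike):
    """Observed mean outcome at the spike minus the value extrapolated
    from a straight-line fit to the non-spike samples."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    at_spike = x == spike
    # np.polyfit returns coefficients from highest degree to lowest
    slope, intercept = np.polyfit(x[~at_spike], y[~at_spike], deg=1)
    predicted = intercept + slope * spike
    observed = y[at_spike].mean()
    return observed - predicted


# Simulated example: 60 non-smokers (SPY = 0) and 40 smokers
rng = np.random.default_rng(0)
spy = np.concatenate([np.zeros(60), rng.uniform(1, 50, size=40)])
# Smokers follow a linear trend; non-smokers sit at a shifted methylation level
beta = np.where(spy == 0, 0.80, 0.60 - 0.002 * spy)
beta = beta + rng.normal(0, 0.01, size=spy.size)

spike = find_spike(spy)
gap = spike_discrepancy(spy, beta, spike)
print(f"spike at {spike}, discrepancy {gap:.3f}")
```

A discrepancy far from zero suggests that a single regression line over all samples would be pulled by the spike group, which is the kind of situation the abstract flags for further investigation.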

Full text: 1 Collection: 01-international Database: MEDLINE Main subject: Smoking / DNA Methylation Type of study: Diagnostic_studies Limits: Adult / Humans / Middle aged Language: English Journal: BMC Bioinformatics Journal subject: MEDICAL INFORMATICS Year: 2020 Document type: Article Affiliation country: Germany
