MethylToSNP: identifying SNPs in Illumina DNA methylation array data.

LaBarre, Brenna A; Goncearenco, Alexander; Petrykowska, Hanna M; Jaratlerdsiri, Weerachai; Bornman, M S Riana; Hayes, Vanessa M; Elnitski, Laura

Affiliation

LaBarre BA; Graduate Program in Bioinformatics, Boston University, Boston, MA, USA.
Goncearenco A; Genomic Functional Analysis Section, Translational and Functional Genomics Branch, National Human Genome Research Institute, National Institutes of Health, 49 Convent Dr., Bethesda, MD, 20892, USA.
Petrykowska HM; Genomic Functional Analysis Section, Translational and Functional Genomics Branch, National Human Genome Research Institute, National Institutes of Health, 49 Convent Dr., Bethesda, MD, 20892, USA.
Jaratlerdsiri W; Genomic Functional Analysis Section, Translational and Functional Genomics Branch, National Human Genome Research Institute, National Institutes of Health, 49 Convent Dr., Bethesda, MD, 20892, USA.
Bornman MSR; Laboratory for Human Comparative & Prostate Cancer Genomics, Garvan Institute of Medical Research, Darlinghurst, NSW, Australia.
Hayes VM; School of Health Systems and Public Health, University of Pretoria, Hatfield, Pretoria, South Africa.
Elnitski L; Laboratory for Human Comparative & Prostate Cancer Genomics, Garvan Institute of Medical Research, Darlinghurst, NSW, Australia.

Epigenetics Chromatin; 12(1): 79, 2019 Dec 20.

Article in English | MEDLINE | ID: mdl-31861999

ABSTRACT

BACKGROUND: Current array-based methods for the measurement of DNA methylation rely on sodium bisulfite conversion to differentiate between methylated and unmethylated cytosine bases in DNA. In the absence of genotype data, this process can lead to ambiguity in data interpretation when a sample has polymorphisms at a methylation probe site. A common way to minimize this problem is to exclude such potentially problematic sites, with some methods removing as much as 60% of array probes from consideration before data analysis.

RESULTS: Here, we present an algorithm implemented in an R Bioconductor package, MethylToSNP, which detects a characteristic data pattern to infer sites likely to be confounded by polymorphisms. Additionally, the tool provides a stringent reliability score to allow thresholding on SNP predictions. We calibrated parameters and thresholds used by the algorithm on simulated and real methylation data sets. We illustrate findings using methylation data from YRI (Yoruba in Ibadan, Nigeria), CEPH (European descent) and KhoeSan (southern African) populations. Our polymorphism predictions made using MethylToSNP have been validated through SNP databases and through bisulfite and genomic sequencing.

CONCLUSIONS: The benefits of this method are threefold. First, it prevents extensive data loss by considering only SNPs specific to the individuals in the study. Second, it offers the possibility to identify new polymorphisms in samples for which little is known about the genetic landscape. Third, it identifies variants as they exist in functional regions of a genome, such as CTCF (transcriptional repressor) sites and enhancers, which may be common alleles or personal mutations with the potential to deleteriously affect genomic regulatory activities. We demonstrate that MethylToSNP is applicable to Illumina 450K and Illumina 850K EPIC array data and is also backward compatible with the 27K methylation arrays. Going forward, this kind of nuanced approach can increase the amount of information derived from precious data sets by considering the samples of a project individually, enabling more informed decisions about data cleaning.
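
The abstract does not spell out the "characteristic data pattern". One pattern often associated with SNP-confounded probes is a split of per-sample beta values into three well-separated tiers (near 0, near 0.5, and near 1), resembling the three genotypes at a biallelic site. The Python sketch below illustrates that idea only; it is not the MethylToSNP implementation, and the function name `looks_like_snp_probe`, the gap threshold, and the tier boundaries are all hypothetical choices made for illustration.

```python
from typing import Sequence


def looks_like_snp_probe(
    betas: Sequence[float],
    min_gap: float = 0.15,   # assumed minimum separation between tiers
    min_samples: int = 6,    # assumed minimum number of samples to judge a probe
) -> bool:
    """Heuristic check for a three-tier beta-value pattern at one probe.

    Illustrative only: thresholds and tier boundaries are assumptions,
    not values taken from the MethylToSNP paper or package.
    """
    values = sorted(betas)
    if len(values) < min_samples:
        return False

    # Gap between each pair of neighbouring sorted values, with its position.
    gaps = [(values[i + 1] - values[i], i) for i in range(len(values) - 1)]

    # The two widest gaps are the candidate boundaries between three tiers.
    top_two = sorted(gaps, reverse=True)[:2]
    if any(gap < min_gap for gap, _ in top_two):
        return False

    cut1, cut2 = sorted(index for _, index in top_two)
    low = values[: cut1 + 1]
    mid = values[cut1 + 1 : cut2 + 1]
    high = values[cut2 + 1 :]

    def mean(xs: Sequence[float]) -> float:
        return sum(xs) / len(xs)

    # Require the tiers to sit roughly where genotype-like clusters would.
    return mean(low) < 0.3 and 0.3 <= mean(mid) <= 0.7 and mean(high) > 0.7


three_tier = [0.02, 0.05, 0.04, 0.48, 0.52, 0.50, 0.95, 0.97, 0.93]
unimodal = [0.80, 0.82, 0.85, 0.78, 0.81, 0.83, 0.79, 0.84]
print(looks_like_snp_probe(three_tier), looks_like_snp_probe(unimodal))
```

A heuristic this simple would miss sites where only two of the three genotype classes occur among the sampled individuals, and it produces no reliability score of the kind the abstract describes.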

Subjects

Algorithms; DNA Methylation; Oligonucleotide Array Sequence Analysis/methods; Polymorphism, Single Nucleotide; User-Computer Interface; CpG Islands; Databases, Genetic; Enhancer Elements, Genetic; Epigenomics/methods; Genome, Human; Humans; Namibia

Keywords

Bisulfite sequencing; CTCF sites; Data analysis; Enhancers; Illumina methylation array; Methylation probes; Polymorphisms; Single nucleotide polymorphisms (SNPs)

Full text: 1 | Database: MEDLINE | Main subject: Algorithms / User-Computer Interface / DNA Methylation / Oligonucleotide Array Sequence Analysis / Polymorphism, Single Nucleotide | Type of study: Prognostic studies | Limit: Humans | Country/Region as subject: Africa | Language: English | Year of publication: 2019 | Document type: Article
