PIP-SNP: a pipeline for processing SNP data featured as linkage disequilibrium bin mapping, genotype imputing and marker synthesizing.

Zhang, Wenchao; Kang, Yun; Dai, Xinbin; Xu, Shizhong; Zhao, Patrick X

Affiliation

Zhang W; Noble Research Institute LLC, 2510 Sam Noble Parkway, Ardmore, OK 73401, USA.
Kang Y; Noble Research Institute LLC, 2510 Sam Noble Parkway, Ardmore, OK 73401, USA.
Dai X; Noble Research Institute LLC, 2510 Sam Noble Parkway, Ardmore, OK 73401, USA.
Xu S; Department of Botany and Plant Sciences, University of California, Riverside, CA 92521, USA.
Zhao PX; Noble Research Institute LLC, 2510 Sam Noble Parkway, Ardmore, OK 73401, USA.

NAR Genom Bioinform ; 3(3): lqab060, 2021 Sep.

Article in English | MEDLINE | ID: mdl-34235432

ABSTRACT

Genome-wide association study data analyses often face two significant challenges: (i) high dimensionality of single-nucleotide polymorphism (SNP) genotypes and (ii) imputation of missing values. SNPs are not independent due to physical linkage and natural selection. The correlation of nearby SNPs is known as linkage disequilibrium (LD), which can be used for LD conceptual SNP bin mapping, missing genotype inferencing and SNP dimension reduction. We used a stochastic process to describe the SNP signals and proposed two types of autocorrelations to measure nearby SNPs' information redundancy. Based on the calculated autocorrelation coefficients, we constructed LD bins. We adopted a k-nearest neighbors algorithm (kNN) to impute the missing genotypes. We proposed several novel methods to find the optimal synthetic marker to represent the SNP bin. We also proposed methods to evaluate the information loss or information conservation between using the original genome-wide markers and using dimension-reduced synthetic markers. Our performance assessments on the real-life SNP data from a rice recombinant inbred line (RIL) population and a rice HapMap project show that the new methods produce satisfactory results. We implemented these functional modules in C/C++ and streamlined them into a web-based pipeline named PIP-SNP (https://bioinfo.noble.org/PIP_SNP/) for processing SNP data.
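The three steps named in the abstract (LD bin construction, kNN imputation, and synthetic markers) can be illustrated with a minimal Python sketch. This is not the PIP-SNP implementation, which the abstract says is written in C/C++. Everything here is an assumption made for illustration: the 0/1/2 genotype coding with NaN for missing calls, the squared Pearson correlation between adjacent SNPs standing in for the paper's autocorrelation measures, the `r2_min` threshold, the majority vote among neighbors, the bin mean used as a synthetic marker, and the function names `ld_bins` and `knn_impute`.

```python
import numpy as np


def ld_bins(G, r2_min=0.8):
    """Group consecutive SNP columns into bins (hypothetical helper).

    A new bin starts whenever the squared Pearson correlation between a SNP
    and its left neighbor falls below r2_min. This is a stand-in for the
    autocorrelation measures described in the abstract, not the paper's method.
    """
    bins = [[0]]
    for j in range(1, G.shape[1]):
        a, b = G[:, j - 1], G[:, j]
        ok = ~np.isnan(a) & ~np.isnan(b)  # individuals genotyped at both SNPs
        r2 = 0.0
        if ok.sum() > 2 and a[ok].std() > 0 and b[ok].std() > 0:
            r2 = np.corrcoef(a[ok], b[ok])[0, 1] ** 2
        if r2 >= r2_min:
            bins[-1].append(j)
        else:
            bins.append([j])
    return bins


def knn_impute(G, k=3):
    """Fill each NaN by majority vote among the k most similar individuals."""
    out = G.copy()
    for i, j in zip(*np.where(np.isnan(G))):
        donors = np.where(~np.isnan(G[:, j]))[0]  # individuals observed at SNP j
        if donors.size == 0:
            continue  # nothing to borrow from; leave the value missing
        dists = []
        for d in donors:
            shared = ~np.isnan(G[i]) & ~np.isnan(G[d])
            # mean absolute genotype difference over SNPs observed in both
            dists.append(
                np.abs(G[i, shared] - G[d, shared]).mean() if shared.any() else np.inf
            )
        nearest = donors[np.argsort(dists, kind="stable")[:k]]
        vals, counts = np.unique(G[nearest, j], return_counts=True)
        out[i, j] = vals[np.argmax(counts)]
    return out


# Toy data (made up): 6 individuals x 4 SNPs, one missing call.
G = np.array(
    [
        [0, 0, 2, 2],
        [0, 0, 0, 0],
        [2, 2, 2, 2],
        [2, 2, 0, 0],
        [0, np.nan, 2, 2],
        [2, 2, 0, 0],
    ],
    dtype=float,
)

bins = ld_bins(G)
filled = knn_impute(G, k=3)
# One simple synthetic marker per bin: the mean genotype across the bin's SNPs.
synthetic = np.column_stack([filled[:, b].mean(axis=1) for b in bins])
```

In this toy matrix the first two SNPs and the last two SNPs are perfectly correlated, so the sketch reduces four SNP columns to two synthetic columns after filling the single missing genotype.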

Full text: 1 | Collections: 01-international | Database: MEDLINE | Language: En | Journal: NAR Genom Bioinform | Publication year: 2021 | Document type: Article | Country of affiliation: United States | Country of publication: United Kingdom
