Rye: genetic ancestry inference at biobank scale.
Conley, Andrew B; Rishishwar, Lavanya; Ahmad, Maria; Sharma, Shivam; Norris, Emily T; Jordan, I King; Mariño-Ramírez, Leonardo.
Affiliation
  • Conley AB; National Institute on Minority Health and Health Disparities, National Institutes of Health, Bethesda, MD, USA.
  • Rishishwar L; IHRC-Georgia Tech Applied Bioinformatics Laboratory, Atlanta, GA, USA.
  • Ahmad M; PanAmerican Bioinformatics Institute, Valle del Cauca, Cali, Colombia.
  • Sharma S; National Institute on Minority Health and Health Disparities, National Institutes of Health, Bethesda, MD, USA.
  • Norris ET; IHRC-Georgia Tech Applied Bioinformatics Laboratory, Atlanta, GA, USA.
  • Jordan IK; PanAmerican Bioinformatics Institute, Valle del Cauca, Cali, Colombia.
  • Mariño-Ramírez L; School of Biological Sciences, Georgia Institute of Technology, Atlanta, GA, USA.
Nucleic Acids Res; 51(8): e44, 2023 May 8.
Article in English | MEDLINE | ID: mdl-36928108
Biobank projects are generating genomic data for many thousands of individuals. Computational methods are needed to handle these massive datasets, including genetic ancestry (GA) inference tools. Current methods for GA inference do not scale to biobank-size genomic datasets. We present Rye, a new algorithm for GA inference at biobank scale. We compared the accuracy and runtime performance of Rye to the widely used RFMix, ADMIXTURE and iAdmix programs and applied it to a dataset of 488,221 genome-wide variant samples from the UK Biobank. Rye infers GA based on principal component analysis of genomic variant samples from ancestral reference populations and query individuals. The algorithm's accuracy is powered by Metropolis-Hastings optimization and its speed is provided by non-negative least squares regression. Rye produces highly accurate GA estimates for three-way admixed populations (African, European and Native American) compared to RFMix and ADMIXTURE ($R^2 = 0.998$ to $1.00$), and shows a 50× runtime improvement compared to ADMIXTURE on the UK Biobank dataset. Rye analysis of UK Biobank samples demonstrates how it can be used to infer GA at both continental and subcontinental levels. We discuss user considerations and options for the use of Rye; the program and its documentation are distributed on the GitHub repository: https://github.com/healthdisparities/rye.
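The abstract names non-negative least squares (NNLS) regression on principal components as the source of Rye's speed. The sketch below illustrates that general idea only and is not the Rye implementation. It assumes each reference population is summarized by a centroid in PC space, that a query individual's PC coordinates are modeled as a non-negative mixture of those centroids, and that the weights are rescaled to sum to 1. The centroid values, the function names and the plain projected-gradient solver (swapped in here so the example needs no external library) are all hypothetical, and the Metropolis-Hastings optimization step is omitted.

```python
def nnls_projected_gradient(A, b, steps=2000):
    """Minimise ||A x - b||^2 subject to x >= 0 by projected gradient descent.

    A is a list of m rows with n entries each; b is a list of length m.
    This is a simple stand-in for a dedicated NNLS solver.
    """
    m, n = len(A), len(A[0])
    # Precompute A^T A and A^T b so each iteration costs O(n^2).
    gram = [[sum(A[k][i] * A[k][j] for k in range(m)) for j in range(n)]
            for i in range(n)]
    atb = [sum(A[k][i] * b[k] for k in range(m)) for i in range(n)]
    # trace(A^T A) bounds the largest eigenvalue, so 1/trace is a safe step.
    step = 1.0 / sum(gram[i][i] for i in range(n))
    x = [1.0 / n] * n
    for _ in range(steps):
        grad = [sum(gram[i][j] * x[j] for j in range(n)) - atb[i]
                for i in range(n)]
        # Gradient step followed by projection onto the non-negative orthant.
        x = [max(0.0, x[i] - step * grad[i]) for i in range(n)]
    return x


def estimate_ancestry(centroids, query_pcs):
    """Return ancestry fractions for one query individual.

    centroids maps a population label to its mean PC coordinates
    (an assumption of this sketch); query_pcs holds the query's coordinates.
    """
    pops = list(centroids)
    # Rows are PCs, columns are reference populations.
    A = [[centroids[p][k] for p in pops] for k in range(len(query_pcs))]
    weights = nnls_projected_gradient(A, query_pcs)
    total = sum(weights)
    return {p: w / total for p, w in zip(pops, weights)}


# Hypothetical centroids on three PCs for three reference groups.
centroids = {
    "AFR": [2.0, 0.0, 0.0],
    "EUR": [-1.0, 1.5, 1.0],
    "NAT": [-1.0, -1.5, 1.0],
}

# Query built as 0.5*AFR + 0.3*EUR + 0.2*NAT, so the fit should recover it.
query = [0.5, 0.15, 0.5]
fractions = estimate_ancestry(centroids, query)
print({pop: round(value, 3) for pop, value in fractions.items()})
```

Because each query individual is fitted independently against a small matrix of reference coordinates, this kind of per-sample regression scales linearly with the number of samples, which is consistent with the runtime advantage the abstract reports.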
Subject(s)

Full text: 1 Collection: 01-international Database: MEDLINE Main subject: Secale / Genetics, Population Limit: Humans Language: En Journal: Nucleic Acids Res Year: 2023 Document type: Article Affiliation country: United States
