Search | VHL Regional Portal

Quality Control Procedures for Genome-Wide Association Studies.

Truong, Van Q; Woerner, Jakob A; Cherlin, Tess A; Bradford, Yuki; Lucas, Anastasia M; Okeh, Chelsea C; Shivakumar, Manu K; Hui, Daniel H; Kumar, Rachit; Pividori, Milton; Jones, S Chris; Bossa, Abigail C; Turner, Stephen D; Ritchie, Marylyn D; Verma, Shefali S.

Curr Protoc ; 2(11): e603, 2022 Nov.

Article in English | MEDLINE | ID: mdl-36441943

ABSTRACT

Genome-wide association studies (GWAS) are being conducted at an unprecedented rate in population-based cohorts and have increased our understanding of the pathophysiology of many complex diseases. Regardless of the context, the practical utility of this information ultimately depends upon the quality of the data used for statistical analyses. Quality control (QC) procedures for GWAS are constantly evolving. Here, we enumerate some of the challenges in QC of genotyped GWAS data and describe the approaches involving genotype imputation of a sample dataset along with post-imputation quality assurance, thereby minimizing potential bias and error in GWAS results. We discuss common issues associated with QC of the GWAS data (genotyped and imputed), including data file formats, software packages for data manipulation and analysis, sex chromosome anomalies, sample identity, sample relatedness, population substructure, batch effects, and marker quality. We provide detailed guidelines along with a sample dataset to suggest current best practices and discuss areas of ongoing and future research. © 2022 Wiley Periodicals LLC.
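As an illustration of the marker-quality step named in this abstract, the following is a minimal sketch of a per-marker filter on call rate and minor allele frequency (MAF). The abstract gives no thresholds, so the cutoffs `min_call_rate=0.95` and `min_maf=0.01`, the function names, and the toy genotypes are all assumptions made for the example; this is not the authors' protocol.

```python
from typing import Dict, List, Optional, Sequence

# 0, 1, or 2 copies of the alternate allele; None marks a missing call.
Genotype = Optional[int]


def marker_call_rate(calls: Sequence[Genotype]) -> float:
    """Fraction of samples with a non-missing genotype at one marker."""
    return sum(c is not None for c in calls) / len(calls)


def minor_allele_frequency(calls: Sequence[Genotype]) -> float:
    """MAF computed over non-missing calls only."""
    observed = [c for c in calls if c is not None]
    if not observed:
        return 0.0
    alt_freq = sum(observed) / (2 * len(observed))
    return min(alt_freq, 1.0 - alt_freq)


def passes_marker_qc(
    calls: Sequence[Genotype],
    min_call_rate: float = 0.95,  # hypothetical threshold
    min_maf: float = 0.01,        # hypothetical threshold
) -> bool:
    """Keep a marker only if it is well genotyped and not (nearly) monomorphic."""
    return (
        marker_call_rate(calls) >= min_call_rate
        and minor_allele_frequency(calls) >= min_maf
    )


# Toy data: three markers genotyped in ten samples (invented for illustration).
markers: Dict[str, List[Genotype]] = {
    "snp_ok": [0, 1, 2, 1, 0, 1, 0, 2, 1, 0],
    "snp_missing": [0, None, None, 1, 0, 1, 0, 2, 1, 0],
    "snp_monomorphic": [0] * 10,
}

kept = [name for name, calls in markers.items() if passes_marker_qc(calls)]
print(kept)
```

In practice this kind of filtering is usually delegated to dedicated genetics software; the sketch only makes the two quantities concrete.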

Subject(s)

Genome-Wide Association Study, Research Design, Humans, Quality Control, Genotype, Sex Chromosome Aberrations

Genome-wide Association Analysis Across 16,956 Patients Identifies a Novel Genetic Association Between BMP6, NIPAL1, CNGA1 and Spondylosis.

Zhang, Yanfei; Grant, Ryan A; Shivakumar, Manu K; Zaleski, Michael; Sofoluke, Nelson; Slotkin, Jonathan R; Williams, Marc S; Lee, Ming Ta Michael.

Spine (Phila Pa 1976) ; 46(11): E625-E631, 2021 Jun 01.

Article in English | MEDLINE | ID: mdl-33332786

ABSTRACT

STUDY DESIGN: A case-control genome-wide association study (GWAS) on spondylosis. OBJECTIVE: Leveraging the multimodal dataset of Geisinger's MyCode initiative, we aimed to identify genetic associations with degenerative spine disease. SUMMARY OF BACKGROUND DATA: Degenerative spine conditions are a leading cause of global disability; however, the genetic underpinnings of these conditions remain under-investigated. Previous studies using a candidate-gene approach suggest a genetic risk for degenerative spine conditions, but large-scale GWASs are lacking. METHODS: Using ICD diagnosis codes, we identified 4434 patients with a diagnosis of spondylosis and available genotype data. We identified a population-based control group of 12,522 patients who did not have any diagnosis of osteoarthritis. A linear mixed, additive genetic model was employed to perform the genetic association tests, adjusting for age, sex, and genetic principal components to account for population structure and relatedness. Gene-based association tests were performed, and heritability and genetic correlations with other traits were investigated. RESULTS: We identified a genome-wide significant locus at rs12190551 (odds ratio = 1.034, 95% confidence interval 1.022-1.046, P = 8.5 × 10⁻⁹, minor allele frequency = 36.9%) located in the intron of BMP6. Additionally, NIPAL1 and CNGA1 achieved Bonferroni significance in the gene-based association tests. The estimated heritability was 7.19%. Furthermore, significant genetic correlations with pain, depression, lumbar spine bone mineral density, and osteoarthritis were identified. CONCLUSION: We demonstrated the use of a massive database of genotypes combined with electronic health record data to identify a novel and significant association with spondylosis. We also identified significant genetic correlations with pain, depression, bone mineral density, and osteoarthritis, suggesting shared genetic etiology and molecular pathways with these phenotypes. Level of Evidence: N/A.
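The odds ratio and confidence interval reported above can be related to a test statistic with a short worked calculation. The sketch below assumes the interval is a symmetric Wald interval on the log-odds scale and uses the conventional genome-wide threshold of 5 × 10⁻⁸; neither assumption is stated in the abstract. Because the inputs are rounded to three decimals and the study fitted a linear mixed model, the result (a P value of roughly 10⁻⁸) is not expected to reproduce the reported P value exactly.

```python
import math

# Values as reported in the abstract (already rounded there).
odds_ratio = 1.034
ci_low, ci_high = 1.022, 1.046

# Assumption: symmetric 95% Wald interval on the log-odds scale.
z_95 = 1.959964
log_or = math.log(odds_ratio)
std_err = (math.log(ci_high) - math.log(ci_low)) / (2 * z_95)

# Wald z statistic and two-sided P value from the standard normal tail.
z_score = log_or / std_err
p_two_sided = math.erfc(abs(z_score) / math.sqrt(2))

# Conventional genome-wide significance cutoff (an assumption, not from the source).
GENOME_WIDE_THRESHOLD = 5e-8
is_significant = p_two_sided < GENOME_WIDE_THRESHOLD

print(f"z = {z_score:.2f}, two-sided P = {p_two_sided:.1e}, significant: {is_significant}")
```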

Subject(s)

Bone Morphogenetic Protein 6/genetics, Cation Transport Proteins/genetics, Cyclic Nucleotide-Gated Cation Channels/genetics, Spondylosis, Case-Control Studies, Female, Genetic Predisposition to Disease/genetics, Genome-Wide Association Study, Humans, Male, Spondylosis/epidemiology, Spondylosis/genetics

Increasing the Density of Laboratory Measures for Machine Learning Applications.

Abedi, Vida; Li, Jiang; Shivakumar, Manu K; Avula, Venkatesh; Chaudhary, Durgesh P; Shellenberger, Matthew J; Khara, Harshit S; Zhang, Yanfei; Lee, Ming Ta Michael; Wolk, Donna M; Yeasin, Mohammed; Hontecillas, Raquel; Bassaganya-Riera, Josep; Zand, Ramin.

J Clin Med ; 10(1)2020 Dec 30.

Article in English | MEDLINE | ID: mdl-33396741

ABSTRACT

BACKGROUND: The imputation of missingness is a key step in Electronic Health Records (EHR) mining, as it can significantly affect the conclusions derived from the downstream analysis in translational medicine. The missingness of laboratory values in EHR is not at random, yet imputation techniques tend to disregard this key distinction. Consequently, the development of an adaptive imputation strategy designed specifically for EHR is an important step in improving the data imbalance and enhancing the predictive power of modeling tools for healthcare applications. METHOD: We analyzed the laboratory measures derived from Geisinger's EHR on patients in three distinct cohorts: patients tested for Clostridioides difficile (Cdiff) infection, patients with a diagnosis of inflammatory bowel disease (IBD), and patients with a diagnosis of hip or knee osteoarthritis (OA). We extracted Logical Observation Identifiers Names and Codes (LOINC), from which we excluded those with 75% or more missingness. The comorbidities, primary or secondary diagnosis, as well as active problem lists, were also extracted. The adaptive imputation strategy was designed based on a hybrid approach. The comorbidity patterns of patients were transformed into latent patterns and then clustered. Imputation was performed on a cluster of patients for each cohort independently to show the generalizability of the method. The results were compared with imputation applied to the complete dataset without incorporating the information from comorbidity patterns. RESULTS: We analyzed a total of 67,445 patients (11,230 IBD patients, 10,000 OA patients, and 46,215 patients tested for C. difficile infection). We extracted 495 LOINC codes and 11,230 diagnosis codes for the IBD cohort, 8160 diagnosis codes for the Cdiff cohort, and 2042 diagnosis codes for the OA cohort based on the primary/secondary diagnosis and active problem list in the EHR. Overall, the most improvement from this strategy was observed when the laboratory measures had a higher level of missingness. The best root mean square error (RMSE) difference for each dataset was recorded as -35.5 for the Cdiff, -8.3 for the IBD, and -11.3 for the OA dataset. CONCLUSIONS: An adaptive imputation strategy designed specifically for EHR that uses complementary information from the clinical profile of the patient can be used to improve the imputation of missing laboratory values, especially when laboratory codes with high levels of missingness are included in the analysis.
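The comparison described in this abstract (imputation within patient clusters versus imputation on the whole dataset, scored by an RMSE difference) can be sketched on toy data. The abstract does not specify the imputation algorithm, so simple mean imputation stands in for it here; the cluster labels, values, and the sign convention (cluster RMSE minus global RMSE, so negative means improvement) are assumptions made for the example.

```python
import math
from collections import defaultdict
from statistics import mean

# (cluster label, true lab value, observed?) for eight invented patients.
# Values flagged False are hidden from the imputer and used only for scoring.
records = [
    ("A", 10.0, True), ("A", 12.0, True), ("A", 11.0, True), ("A", 13.0, False),
    ("B", 30.0, True), ("B", 32.0, True), ("B", 31.0, True), ("B", 29.0, False),
]


def rmse(pairs):
    """Root mean square error over (true, predicted) pairs."""
    return math.sqrt(mean([(true - pred) ** 2 for true, pred in pairs]))


# Collect the observed values per cluster.
observed_by_cluster = defaultdict(list)
for cluster, value, is_observed in records:
    if is_observed:
        observed_by_cluster[cluster].append(value)

# Baseline: one mean over all observed values, ignoring cluster membership.
global_mean = mean([v for vals in observed_by_cluster.values() for v in vals])
# Cluster-aware: one mean per cluster.
cluster_means = {c: mean(vals) for c, vals in observed_by_cluster.items()}

held_out = [(c, v) for c, v, is_observed in records if not is_observed]
rmse_global = rmse([(v, global_mean) for _, v in held_out])
rmse_cluster = rmse([(v, cluster_means[c]) for c, v in held_out])

# Assumed sign convention: negative values favor the cluster-aware imputation.
rmse_difference = rmse_cluster - rmse_global
print(rmse_global, rmse_cluster, rmse_difference)
```

On this toy example the cluster-aware estimate is closer to the hidden values because the two clusters have very different typical lab values, which is the intuition behind using comorbidity-derived clusters.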

Rare variants in the splicing regulatory elements of EXOC3L4 are associated with brain glucose metabolism in Alzheimer's disease.

Miller, Jason E; Shivakumar, Manu K; Lee, Younghee; Han, Seonggyun; Horgousluoglu, Emrin; Risacher, Shannon L; Saykin, Andrew J; Nho, Kwangsik; Kim, Dokyoon.

BMC Med Genomics ; 11(Suppl 3): 76, 2018 Sep 14.

Article in English | MEDLINE | ID: mdl-30255815

ABSTRACT

BACKGROUND: Alzheimer's disease (AD) is one of the most common neurodegenerative diseases that causes problems related to brain function. How AD arises is understood to some extent at the molecular level; however, there is a lack of biomarkers that can be used for early diagnosis. Two popular methods to identify AD-related biomarkers use genetics and neuroimaging. Genes and neuroimaging phenotypes have provided some insights as to the potential for AD biomarkers. While the field of imaging-genomics has identified genetic features associated with structural and functional neuroimaging phenotypes, it remains unclear how variants that affect splicing could be important for understanding the genetic etiology of AD. METHODS: In this study, rare variants (minor allele frequency < 0.01) in splicing regulatory element (SRE) loci from whole genome sequencing (WGS) in the Alzheimer's Disease Neuroimaging Initiative (ADNI) cohort were used to identify genes that are associated with global brain cortical glucose metabolism in AD measured by FDG PET scans. Gene-based association analyses of rare variants were performed using the program BioBin and the optimal Sequence Kernel Association Test (SKAT-O). RESULTS: The gene EXOC3L4 was identified as significantly associated with global cortical glucose metabolism (false discovery rate (FDR) corrected p < 0.05) using SRE coding variants only. Three loci that may affect splicing within EXOC3L4 contribute to the association. CONCLUSION: Based on sequence homology, EXOC3L4 is likely a part of the exocyst complex. Our results suggest the possibility that variants which affect proper splicing of EXOC3L4 via SREs may impact vesicle transport, giving rise to AD-related phenotypes. Overall, by utilizing WGS and functional neuroimaging, we have identified a gene significantly associated with an AD-related endophenotype, potentially through a mechanism that involves splicing.
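The first two steps of the method described above, selecting rare variants by minor allele frequency and grouping them by gene, can be sketched as follows. Only the MAF < 0.01 cutoff comes from the abstract. The variant records, gene names, and the function `bin_rare_variants` are hypothetical, and the code does not reproduce the interface of BioBin or SKAT-O.

```python
from collections import defaultdict
from typing import Dict, List

RARE_MAF_THRESHOLD = 0.01  # cutoff stated in the abstract

# Invented variant annotations for illustration only.
variants = [
    {"id": "var1", "gene": "GENE_A", "maf": 0.004},
    {"id": "var2", "gene": "GENE_A", "maf": 0.200},   # common, excluded
    {"id": "var3", "gene": "GENE_B", "maf": 0.009},
    {"id": "var4", "gene": "GENE_A", "maf": 0.0005},
    {"id": "var5", "gene": "GENE_B", "maf": 0.010},   # not strictly below the cutoff
]


def bin_rare_variants(
    variants: List[dict], threshold: float = RARE_MAF_THRESHOLD
) -> Dict[str, List[str]]:
    """Group the IDs of variants with MAF strictly below `threshold` by gene."""
    bins: Dict[str, List[str]] = defaultdict(list)
    for variant in variants:
        if variant["maf"] < threshold:
            bins[variant["gene"]].append(variant["id"])
    return dict(bins)


gene_bins = bin_rare_variants(variants)
print(gene_bins)
```

Each resulting bin would then be the unit passed to a gene-based association test.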

Subject(s)

Alzheimer Disease/genetics, Alzheimer Disease/metabolism, Brain/metabolism, Glucose/metabolism, Single Nucleotide Polymorphism, RNA Splicing/genetics, Nucleic Acid Regulatory Sequences, Vesicular Transport Proteins/genetics, Alzheimer Disease/diagnostic imaging, Biomarkers/analysis, Brain/pathology, Cohort Studies, Computational Biology/methods, Humans, Neuroimaging/methods, Phenotype, Whole Genome Sequencing/methods

Codon bias among synonymous rare variants is associated with Alzheimer's disease imaging biomarker.

Miller, Jason E; Shivakumar, Manu K; Risacher, Shannon L; Saykin, Andrew J; Lee, Seunggeun; Nho, Kwangsik; Kim, Dokyoon.

Pac Symp Biocomput ; 23: 365-376, 2018.

Article in English | MEDLINE | ID: mdl-29218897

ABSTRACT

Alzheimer's disease (AD) is a neurodegenerative disorder with few biomarkers, even though it impacts a relatively large portion of the population and is predicted to affect significantly more individuals in the future. Neuroimaging has been used in concert with genetic information to improve our understanding of how AD arises and how it can potentially be diagnosed. Additionally, evidence suggests synonymous variants can have a functional impact on gene regulatory mechanisms, including those related to AD. Some synonymous codons are preferred over others, leading to a codon bias. The bias can arise with respect to codons that are more or less frequently used in the genome. A bias can also result from optimal and non-optimal codons, which have stronger and weaker codon-anticodon interactions, respectively. Although association tests have been utilized before to identify genes associated with AD, it remains unclear how codon bias plays a role and if it can improve rare variant analysis. In this work, rare variants from whole-genome sequencing from the Alzheimer's Disease Neuroimaging Initiative (ADNI) cohort were binned into genes using BioBin. An association analysis of the genes with an AD-related neuroimaging biomarker was performed using SKAT-O. While using all synonymous variants we did not identify any genome-wide significant associations, using only synonymous variants that affected codon frequency we identified several genes as significantly associated with the imaging phenotype. Additionally, significant associations were found using only rare variants that contain an optimal codon in the minor allele and a non-optimal codon in the major allele. These results suggest that codon bias may play a role in AD and that it can be used to improve detection power in rare variant association analysis.
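One way to picture the "synonymous variants that affected codon frequency" described above is a lookup that compares how often the reference and alternate codons are used. The sketch below is a simplified illustration: the usage values are placeholders rather than a curated codon usage table, the two-amino-acid codon map is a tiny excerpt of the genetic code, and the function name `codon_frequency_shift` is hypothetical. The abstract does not describe how the authors made this classification.

```python
# Placeholder usage values (per thousand codons); NOT a curated reference table.
CODON_USAGE = {"CTG": 39.6, "CTA": 7.2, "GCC": 27.7, "GCG": 7.4}

# Small excerpt of the standard genetic code, enough for the toy examples.
CODON_TO_AMINO_ACID = {"CTG": "Leu", "CTA": "Leu", "GCC": "Ala", "GCG": "Ala"}


def codon_frequency_shift(ref_codon: str, alt_codon: str) -> str:
    """Classify a synonymous change by the direction of the codon usage shift."""
    if CODON_TO_AMINO_ACID[ref_codon] != CODON_TO_AMINO_ACID[alt_codon]:
        raise ValueError("not a synonymous change")
    delta = CODON_USAGE[alt_codon] - CODON_USAGE[ref_codon]
    if delta > 0:
        return "more_frequent"
    if delta < 0:
        return "less_frequent"
    return "unchanged"


# CTG -> CTA keeps leucine but moves to a less used codon in this toy table.
print(codon_frequency_shift("CTG", "CTA"))
# GCG -> GCC keeps alanine and moves to a more used codon in this toy table.
print(codon_frequency_shift("GCG", "GCC"))
```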

Subject(s)

Alzheimer Disease/diagnostic imaging, Alzheimer Disease/genetics, Codon/genetics, Genetic Variation, Aged, Aged 80 and over, Biomarkers, Computational Biology/methods, Female, Genetic Association Studies, Humans, Magnetic Resonance Imaging, Male, Neuroimaging, Single Nucleotide Polymorphism, Whole Genome Sequencing
