1.
ArXiv ; 2024 Mar 04.
Article in English | MEDLINE | ID: mdl-38495569

ABSTRACT

Conditional testing via the knockoff framework allows one to identify, among a large number of possible explanatory variables, those that carry unique information about an outcome of interest, and it also provides a false discovery rate guarantee on the selection. This approach is particularly well suited to the analysis of genome-wide association studies (GWAS), which aim to identify genetic variants that influence traits of medical relevance. While conditional testing can be both more powerful and more precise than traditional GWAS analysis methods, its vanilla implementation encounters a difficulty common to all multivariate analysis methods: it is challenging to distinguish among multiple, highly correlated regressors. This impasse can be overcome by shifting the object of inference from single variables to groups of correlated variables. To achieve this, it is necessary to construct "group knockoffs." While successful examples are already documented in the literature, this paper substantially expands the set of algorithms and software for group knockoffs. We focus in particular on second-order knockoffs, for which we describe correlation matrix approximations that are appropriate for GWAS data and that result in considerable computational savings. We illustrate the effectiveness of the proposed methods with simulations and with the analysis of albuminuria data from the UK Biobank. The described algorithms are implemented in an open-source Julia package, Knockoffs.jl, for which both R and Python wrappers are available.
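
The false discovery rate guarantee mentioned in this abstract comes from the selection step of the knockoff filter. The following is a minimal Python sketch of that step only, using the standard knockoff+ threshold. It assumes the feature statistics `w` (one per variable, or one per group in the group setting) have already been computed from the original variables and their knockoffs. The function names and the example values are hypothetical and are not taken from Knockoffs.jl.

```python
import math


def knockoff_plus_threshold(w, q):
    """Return the knockoff+ threshold for statistics w at target FDR q.

    Returns math.inf when no candidate threshold meets the bound,
    in which case nothing is selected.
    """
    # Candidate thresholds are the distinct nonzero magnitudes of w.
    candidates = sorted({abs(x) for x in w if x != 0})
    for t in candidates:
        negatives = sum(1 for x in w if x <= -t)
        positives = sum(1 for x in w if x >= t)
        # Conservative estimate of the false discovery proportion at t.
        if (1 + negatives) / max(1, positives) <= q:
            return t
    return math.inf


def select(w, q):
    """Indices whose statistic is at or above the knockoff+ threshold."""
    tau = knockoff_plus_threshold(w, q)
    return [j for j, x in enumerate(w) if x >= tau]


# Hypothetical statistics: large positive values favour the original variable.
w = [5.2, 4.1, -0.3, 3.3, 0.0, 2.7, -1.1, 6.0, 0.4, 3.9]
selected = select(w, q=0.25)
print(selected)
```

Large positive statistics count as evidence for a variable, while negative statistics are used to estimate how many of the positives are false.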

2.
ArXiv ; 2024 Feb 20.
Article in English | MEDLINE | ID: mdl-38463500

ABSTRACT

Identifying which variables influence a response while controlling false positives pervades statistics and data science. In this paper, we consider a scenario in which we only have access to summary statistics, such as the values of marginal empirical correlations between each dependent variable of potential interest and the response. This situation may arise due to privacy concerns, e.g., to avoid the release of sensitive genetic information. We extend GhostKnockoffs (He et al., 2022) and introduce variable selection methods based on penalized regression that achieve false discovery rate (FDR) control. We report empirical results from extensive simulation studies, demonstrating enhanced performance over previous work. We also apply our methods to genome-wide association studies of Alzheimer's disease and show a significant improvement in power.
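
To make the notion of summary statistics concrete, the sketch below computes the marginal empirical (Pearson) correlation between each column of a data matrix and the response. It only illustrates the kind of input the abstract refers to and is not part of the GhostKnockoffs method. The function name and toy data are hypothetical, and the code assumes that no column and no response vector is constant.

```python
import math


def marginal_correlations(X, y):
    """Pearson correlation between each column of X and the response y.

    Assumes every column of X and y itself has nonzero variance.
    """
    n = len(y)
    y_mean = sum(y) / n
    y_dev = [v - y_mean for v in y]
    y_norm = math.sqrt(sum(d * d for d in y_dev))
    correlations = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        mean = sum(column) / n
        dev = [v - mean for v in column]
        norm = math.sqrt(sum(d * d for d in dev))
        correlations.append(sum(a * b for a, b in zip(dev, y_dev)) / (norm * y_norm))
    return correlations


# Toy data: 4 individuals, 2 variables coded as 0/1/2.
X = [[0, 1], [1, 1], [2, 0], [1, 2]]
y = [0.0, 1.0, 2.0, 1.0]
r = marginal_correlations(X, y)
print(r)
```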

3.
Bioinformatics ; 39(4), 2023 04 03.
Article in English | MEDLINE | ID: mdl-37067496

ABSTRACT

MOTIVATION: In a genome-wide association study, analyzing multiple correlated traits simultaneously is potentially superior to analyzing the traits one by one. Standard methods for multivariate genome-wide association study operate marker-by-marker and are computationally intensive. RESULTS: We present a sparsity constrained regression algorithm for multivariate genome-wide association study based on iterative hard thresholding and implement it in a convenient Julia package MendelIHT.jl. In simulation studies with up to 100 quantitative traits, iterative hard thresholding exhibits similar true positive rates, smaller false positive rates, and faster execution times than GEMMA's linear mixed models and mv-PLINK's canonical correlation analysis. On UK Biobank data with 470 228 variants, MendelIHT completed a three-trait joint analysis (n=185 656) in 20 h and an 18-trait joint analysis (n=104 264) in 53 h with an 80 GB memory footprint. In short, MendelIHT enables geneticists to fit a single regression model that simultaneously considers the effect of all SNPs and dozens of traits. AVAILABILITY AND IMPLEMENTATION: Software, documentation, and scripts to reproduce our results are available from https://github.com/OpenMendel/MendelIHT.jl.


Subjects
Genome-Wide Association Study , Software , Algorithms , Computer Simulation , Phenotype , Single Nucleotide Polymorphism
4.
Am J Hum Genet ; 110(2): 314-325, 2023 02 02.
Article in English | MEDLINE | ID: mdl-36610401

ABSTRACT

Admixture estimation plays a crucial role in ancestry inference and genome-wide association studies (GWASs). Computer programs such as ADMIXTURE and STRUCTURE are commonly employed to estimate the admixture proportions of sample individuals. However, these programs can be overwhelmed by the computational burdens imposed by the 10^5 to 10^6 samples and millions of markers commonly found in modern biobanks. An attractive strategy is to run these programs on a set of ancestry-informative SNP markers (AIMs) that exhibit substantially different frequencies across populations. Unfortunately, existing methods for identifying AIMs require knowing ancestry labels for a subset of the sample. This supervised learning approach creates a chicken-and-egg scenario. In this paper, we present an unsupervised, scalable framework that seamlessly carries out AIM selection and likelihood-based estimation of admixture proportions. Our simulated and real data examples show that this approach is scalable to modern biobank datasets. OpenADMIXTURE, our Julia implementation of the method, is open source and available for free.
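
As an illustration of what makes a marker ancestry-informative, the sketch below ranks SNPs by the absolute difference in allele frequency between two labelled populations. This is the supervised setting that the abstract describes as a limitation of existing methods, not the unsupervised procedure implemented in OpenADMIXTURE. The function names and toy genotypes (coded 0/1/2) are hypothetical.

```python
def allele_frequencies(genotypes):
    """Per-marker frequency of the counted allele from rows of 0/1/2 genotypes."""
    n = len(genotypes)
    p = len(genotypes[0])
    # Each individual carries two alleles, hence the factor of 2.
    return [sum(row[j] for row in genotypes) / (2 * n) for j in range(p)]


def rank_markers(pop_a, pop_b):
    """Rank markers by absolute allele-frequency difference, largest first."""
    freq_a = allele_frequencies(pop_a)
    freq_b = allele_frequencies(pop_b)
    diffs = [abs(a - b) for a, b in zip(freq_a, freq_b)]
    order = sorted(range(len(diffs)), key=lambda j: diffs[j], reverse=True)
    return order, diffs


# Toy data: 4 individuals per population, 3 markers.
pop_a = [[2, 0, 1], [2, 1, 1], [1, 0, 0], [2, 0, 1]]
pop_b = [[0, 0, 1], [0, 1, 1], [1, 0, 2], [0, 1, 1]]
order, diffs = rank_markers(pop_a, pop_b)
print(order, diffs)
```

The need for the population labels `pop_a` and `pop_b` is exactly the dependency that an unsupervised approach avoids.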


Subjects
Biological Specimen Banks , Genome-Wide Association Study , Humans , Genome-Wide Association Study/methods , Likelihood Functions , Population Groups , Software , Population Genetics
5.
Front Endocrinol (Lausanne) ; 13: 888429, 2022.
Article in English | MEDLINE | ID: mdl-35909562

ABSTRACT

Objective: A personalized simulation tool, p-THYROSIM, was developed (1) to better optimize replacement LT4 and LT4+LT3 dosing for hypothyroid patients, based on individual hormone levels, BMIs, and gender; and (2) to better understand how gender and BMI impact thyroid dynamical regulation over time in these patients. Methods: p-THYROSIM was developed by (1) modifying and refining THYROSIM, an established physiologically based mechanistic model of the system regulating serum T3, T4, and TSH level dynamics; (2) incorporating sex and BMI of individual patients into the model; and (3) quantifying it with 3 experimental datasets and validating it with a fourth containing data from distinct male and female patients across a wide range of BMIs. For validation, we compared our optimized predictions with previously published results on optimized LT4 monotherapies. We also optimized combination T3+T4 dosing and computed unmeasured residual thyroid function (RTF) across a wide range of BMIs from male and female patient data. Results: Compared with 3 other dosing methods, the accuracy of p-THYROSIM optimized dosages for LT4 monotherapy was better overall (53% vs. 44%, 43%, and 38%) and for extreme BMI patients (63% vs. ~51% low BMI, 48% vs. ~36% and 22% for high BMI). Optimal dosing for combination LT4+LT3 therapy and unmeasured RTFs was predictively computed with p-THYROSIM for male and female patients in low, normal, and high BMI ranges, yielding daily T3 doses of 5 to 7.5 µg of LT3 combined with 62.5-100 µg of LT4 for women or 75-125 µg of LT4 for men. Also, graphs of steady-state serum T3, T4, and TSH concentrations vs. RTF (range 0%-50%) for untreated patients showed that neither BMI nor gender had any effect on RTF predictions for our patient cohort data. Notably, the graphs provide a means for estimating unmeasurable RTFs for individual patients from their hormone measurements before treatment. 
Conclusions: p-THYROSIM can provide accurate monotherapies for male and female hypothyroid patients, personalized with their BMIs. Where combination therapy is warranted, our results predict that not much LT3 is needed in addition to LT4 to restore euthyroid levels, suggesting opportunities for further research exploring combination therapy with lower T3 doses and slow-releasing T3 formulations.


Subjects
Hypothyroidism , Patient-Specific Modeling , Thyroxine , Triiodothyronine , Body Mass Index , Drug Dose-Response Relationship , Female , Humans , Hypothyroidism/blood , Hypothyroidism/drug therapy , Male , Thyroid Hormones/administration & dosage , Thyroid Hormones/blood , Thyroid Hormones/pharmacology , Thyroid Hormones/therapeutic use , Thyrotropin/blood , Thyroxine/administration & dosage , Thyroxine/blood , Thyroxine/pharmacology , Thyroxine/therapeutic use , Triiodothyronine/administration & dosage , Triiodothyronine/blood , Triiodothyronine/pharmacology , Triiodothyronine/therapeutic use
6.
Bioinformatics ; 37(24): 4756-4763, 2021 12 11.
Article in English | MEDLINE | ID: mdl-34289008

ABSTRACT

MOTIVATION: Current methods for genotype imputation and phasing exploit the volume of data in haplotype reference panels and rely on hidden Markov models (HMMs). Existing programs all have essentially the same imputation accuracy, are computationally intensive and generally require prephasing the typed markers. RESULTS: We introduce a novel data-mining method for genotype imputation and phasing that substitutes highly efficient linear algebra routines for HMM calculations. This strategy, embodied in our Julia program MendelImpute.jl, avoids explicit assumptions about recombination and population structure while delivering similar prediction accuracy, better memory usage and an order of magnitude or better run-times compared to the fastest competing method. MendelImpute operates on both dosage data and unphased genotype data and simultaneously imputes missing genotypes and phase at both the typed and untyped SNPs (single nucleotide polymorphisms). Finally, MendelImpute naturally extends to global and local ancestry estimation and lends itself to new strategies for data compression and hence faster data transport and sharing. AVAILABILITY AND IMPLEMENTATION: Software, documentation and scripts to reproduce our results are available from https://github.com/OpenMendel/MendelImpute.jl. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Data Compression , Software , Genotype , Haplotypes , Single Nucleotide Polymorphism
7.
Gigascience ; 9(6), 2020 06 01.
Article in English | MEDLINE | ID: mdl-32491161

ABSTRACT

BACKGROUND: Consecutive testing of single nucleotide polymorphisms (SNPs) is usually employed to identify genetic variants associated with complex traits. Ideally one should model all covariates in unison, but most existing analysis methods for genome-wide association studies (GWAS) perform only univariate regression. RESULTS: We extend and efficiently implement iterative hard thresholding (IHT) for multiple regression, treating all SNPs simultaneously. Our extensions accommodate generalized linear models, prior information on genetic variants, and grouping of variants. In our simulations, IHT recovers up to 30% more true predictors than SNP-by-SNP association testing and exhibits a decrease of 2 to 3 orders of magnitude in false-positive rates compared with lasso regression. We also test IHT on the UK Biobank hypertension phenotypes and the Northern Finland Birth Cohort of 1966 cardiovascular phenotypes. We find that IHT scales to the large datasets of contemporary human genetics and recovers the plausible genetic variants identified by previous studies. CONCLUSIONS: Our real data analysis and simulation studies suggest that IHT can (i) recover highly correlated predictors, (ii) avoid over-fitting, (iii) deliver better true-positive and false-positive rates than either marginal testing or lasso regression, (iv) recover unbiased regression coefficients, (v) exploit prior information and group-sparsity, and (vi) be used with biobank-sized datasets. Although these advances are studied for genome-wide association study inference, our extensions are pertinent to other regression problems with large numbers of predictors.
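
The core iteration behind iterative hard thresholding can be sketched in a few lines of Python. The version below is the textbook least-squares form with a fixed step size: take a gradient step on the residual sum of squares, then keep only the `k` largest-magnitude coefficients. It does not include the generalized linear model, prior-weight, or group extensions described in the abstract, and the function names, step size, and toy design matrix are all assumptions made for illustration.

```python
def hard_threshold(beta, k):
    """Keep the k largest-magnitude entries of beta and set the rest to zero."""
    keep = set(sorted(range(len(beta)), key=lambda j: abs(beta[j]), reverse=True)[:k])
    return [b if j in keep else 0.0 for j, b in enumerate(beta)]


def iht(X, y, k, step, iterations=100):
    """Least-squares IHT with a fixed step size (illustrative sketch only)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(iterations):
        residual = [y[i] - sum(X[i][j] * beta[j] for j in range(p)) for i in range(n)]
        # X^T r is the negative gradient of 0.5 * ||y - X beta||^2.
        descent = [sum(X[i][j] * residual[i] for i in range(n)) for j in range(p)]
        beta = hard_threshold([beta[j] + step * descent[j] for j in range(p)], k)
    return beta


# Toy design: four mutually orthogonal +/-1 columns of length 8,
# so X^T X = 8 I and a step of 1/8 is a natural choice.
columns = (1, 2, 3, 4)
X = [[float((-1) ** bin(i & j).count("1")) for j in columns] for i in range(8)]
beta_true = [2.0, 0.0, -1.0, 0.0]
noise = [0.1, -0.2, 0.05, 0.0, -0.1, 0.2, 0.0, -0.05]
y = [sum(x * b for x, b in zip(row, beta_true)) + e for row, e in zip(X, noise)]

beta_hat = iht(X, y, k=2, step=1 / 8)
support = [j for j, b in enumerate(beta_hat) if b != 0.0]
print(support, beta_hat)
```

The orthogonal toy design keeps the example easy to verify by hand. Real genotype matrices have correlated columns, where the choice of step size and sparsity level `k` matters far more.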


Subjects
Computational Biology/methods , Genome-Wide Association Study/methods , Linear Models , Algorithms , Genetic Predisposition to Disease , Humans , Phenotype , Single Nucleotide Polymorphism , Reproducibility of Results
8.
Hum Genet ; 139(1): 61-71, 2020 Jan.
Article in English | MEDLINE | ID: mdl-30915546

ABSTRACT

Statistical methods for genome-wide association studies (GWAS) continue to improve. However, the increasing volume and variety of genetic and genomic data make computational speed and ease of data manipulation mandatory in future software. In our view, a collaborative effort of statistical geneticists is required to develop open source software targeted to genetic epidemiology. Our attempt to meet this need is called the OPENMENDEL project (https://openmendel.github.io). It aims to (1) enable interactive and reproducible analyses with informative intermediate results, (2) scale to big data analytics, (3) embrace parallel and distributed computing, (4) adapt to rapid hardware evolution, (5) allow cloud computing, (6) allow integration of varied genetic data types, and (7) foster easy communication between clinicians, geneticists, statisticians, and computer scientists. This article reviews and makes recommendations to the genetic epidemiology community in the context of the OPENMENDEL project.


Subjects
Computational Biology/methods , Human Genome , Genome-Wide Association Study , Statistical Models , Programming Languages , Algorithms , Humans , Single Nucleotide Polymorphism , Software