1.
bioRxiv ; 2024 May 06.
Article in English | MEDLINE | ID: mdl-38766054

ABSTRACT

Identifying the causal variants and mechanisms that drive complex traits and diseases remains a core problem in human genetics. The majority of these variants have individually weak effects and lie in non-coding gene-regulatory elements where we lack a complete understanding of how single nucleotide alterations modulate transcriptional processes to affect human phenotypes. To address this, we measured the activity of 221,412 trait-associated variants that had been statistically fine-mapped using a Massively Parallel Reporter Assay (MPRA) in 5 diverse cell-types. We show that MPRA is able to discriminate between likely causal variants and controls, identifying 12,025 regulatory variants with high precision. Although the effects of these variants largely agree with orthogonal measures of function, only 69% can plausibly be explained by the disruption of a known transcription factor (TF) binding motif. We dissect the mechanisms of 136 variants using saturation mutagenesis and assign impacted TFs for 91% of variants without a clear canonical mechanism. Finally, we provide evidence that epistasis is prevalent for variants in close proximity and identify multiple functional variants on the same haplotype at a small, but important, subset of trait-associated loci. Overall, our study provides a systematic functional characterization of likely causal common variants underlying complex and molecular human traits, enabling new insights into the regulatory grammar underlying disease risk.

2.
medRxiv ; 2024 May 13.
Article in English | MEDLINE | ID: mdl-38798542

ABSTRACT

Leveraging data from multiple ancestries can greatly improve fine-mapping power due to differences in linkage disequilibrium and allele frequencies. We propose MultiSuSiE, an extension of the sum of single effects model (SuSiE) to multiple ancestries that allows causal effect sizes to vary across ancestries based on a multivariate normal prior informed by empirical data. We evaluated MultiSuSiE via simulations and analyses of 14 quantitative traits leveraging whole-genome sequencing data in 47k African-ancestry and 94k European-ancestry individuals from All of Us. In simulations, MultiSuSiE applied to Afr47k+Eur47k was well-calibrated and attained higher power than SuSiE applied to Eur94k; interestingly, higher causal variant PIPs in Afr47k compared to Eur47k were entirely explained by differences in the extent of LD quantified by LD 4th moments. Compared to very recently proposed multi-ancestry fine-mapping methods, MultiSuSiE attained higher power and/or much lower computational costs, making the analysis of large-scale All of Us data feasible. In real trait analyses, MultiSuSiE applied to Afr47k+Eur94k identified 579 fine-mapped variants with PIP > 0.5, and MultiSuSiE applied to Afr47k+Eur47k identified 44% more fine-mapped variants with PIP > 0.5 than SuSiE applied to Eur94k. We validated MultiSuSiE results for real traits via functional enrichment of fine-mapped variants. We highlight several examples where MultiSuSiE implicates well-studied or biologically plausible fine-mapped variants that were not implicated by other methods.

3.
JAMA Netw Open ; 7(3): e243379, 2024 Mar 04.
Article in English | MEDLINE | ID: mdl-38546648

ABSTRACT

Importance: Subgroup analyses are often performed in oncology to investigate differential treatment effects and may even constitute the basis for regulatory approvals. Current understanding of the features, results, and quality of subgroup analyses is limited. Objective: To evaluate forest plot interpretability and credibility of differential treatment effect claims among oncology trials. Design, Setting, and Participants: This cross-sectional study included randomized phase 3 clinical oncology trials published prior to 2021. Trials were screened from ClinicalTrials.gov. Main Outcomes and Measures: Missing visual elements in forest plots were defined as a missing point estimate or use of a linear x-axis scale for hazard and odds ratios. Multiplicity of testing control was recorded. Differential treatment effect claims were rated using the Instrument for Assessing the Credibility of Effect Modification Analyses. Linear and logistic regressions evaluated associations with outcomes. Results: Among 785 trials, 379 studies (48%) enrolling 331 653 patients reported a subgroup analysis. The forest plots of 43% of trials (156 of 363) were missing visual elements impeding interpretability. While 4148 subgroup effects were evaluated, only 1 trial (0.3%) controlled for multiple testing. On average, trials that did not meet the primary end point conducted 2 more subgroup effect tests compared with trials meeting the primary end point (95% CI, 0.59-3.43 tests; P = .006). A total of 101 differential treatment effects were claimed across 15% of trials (55 of 379). Interaction testing was missing in 53% of trials (29 of 55) claiming differential treatment effects. Trials not meeting the primary end point were associated with greater odds of no interaction testing (odds ratio, 4.47; 95% CI, 1.42-15.55, P = .01). The credibility of differential treatment effect claims was rated as low or very low in 93% of cases (94 of 101). 
Conclusions and Relevance: In this cross-sectional study of phase 3 oncology trials, nearly half of trials presented a subgroup analysis in their primary publication. However, forest plots of these subgroup analyses largely lacked essential features for interpretation, and most differential treatment effect claims were not supported. Oncology subgroup analyses should be interpreted with caution, and improvements to the quality of subgroup analyses are needed.


Subject(s)
Medical Oncology; Neoplasms; Humans; Cross-Sectional Studies; Neoplasms/therapy; Odds Ratio
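
The multiplicity concern raised in this abstract can be illustrated with a short calculation. The sketch below is not taken from the study. It assumes independent tests at a nominal alpha of 0.05, and it uses an average of about 11 subgroup tests per trial, a figure derived here from the reported 4148 subgroup effects across 379 trials purely for illustration. The function names are hypothetical.

```python
def familywise_error_rate(n_tests: int, alpha: float = 0.05) -> float:
    """Chance of at least one false positive among n independent tests,
    each run at level alpha, when no true subgroup effect exists."""
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_threshold(n_tests: int, alpha: float = 0.05) -> float:
    """Per-test significance level that keeps the family-wise rate <= alpha."""
    return alpha / n_tests


# Illustrative average only (assumption): 4148 subgroup effects / 379 trials.
tests_per_trial = round(4148 / 379)
fwer = familywise_error_rate(tests_per_trial)

if __name__ == "__main__":
    print(f"tests per trial: {tests_per_trial}")
    print(f"family-wise error rate without correction: {fwer:.3f}")
    print(f"Bonferroni per-test threshold: {bonferroni_threshold(tests_per_trial):.4f}")
```

Under these assumptions, an uncorrected trial with this many subgroup tests has a substantial chance of at least one spurious "significant" subgroup, which is consistent with the abstract's call for cautious interpretation.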
4.
bioRxiv ; 2023 Nov 02.
Article in English | MEDLINE | ID: mdl-37961350

ABSTRACT

Large-scale whole-genome sequencing (WGS) studies have improved our understanding of the contributions of coding and noncoding rare variants to complex human traits. Leveraging association effect sizes across multiple traits in WGS rare variant association analysis can improve statistical power over single-trait analysis, and also detect pleiotropic genes and regions. Existing multi-trait methods have limited ability to perform rare variant analysis of large-scale WGS data. We propose MultiSTAAR, a statistical framework and computationally-scalable analytical pipeline for functionally-informed multi-trait rare variant analysis in large-scale WGS studies. MultiSTAAR accounts for relatedness, population structure and correlation among phenotypes by jointly analyzing multiple traits, and further empowers rare variant association analysis by incorporating multiple functional annotations. We applied MultiSTAAR to jointly analyze three lipid traits (low-density lipoprotein cholesterol, high-density lipoprotein cholesterol and triglycerides) in 61,861 multi-ethnic samples from the Trans-Omics for Precision Medicine (TOPMed) Program. We discovered new associations with lipid traits missed by single-trait analysis, including rare variants within an enhancer of NIPSNAP3A and an intergenic region on chromosome 1.

5.
Eur J Cancer ; 194: 113357, 2023 11.
Article in English | MEDLINE | ID: mdl-37827064

ABSTRACT

BACKGROUND: The 'Table 1 Fallacy' refers to the unsound use of significance testing for comparing the distributions of baseline variables between randomised groups to draw erroneous conclusions about balance or imbalance. We performed a cross-sectional study of the Table 1 Fallacy in phase III oncology trials. METHODS: From ClinicalTrials.gov, 1877 randomised trials were screened. Multivariable logistic regressions evaluated predictors of the Table 1 Fallacy. RESULTS: A total of 765 randomised controlled trials involving 553,405 patients were analysed. The Table 1 Fallacy was observed in 25% of trials (188 of 765), with 3% of comparisons deemed significant (59 of 2353), approximating the typical 5% type I error assertion probability. Application of trial-level multiplicity corrections reduced the rate of significant findings to 0.3% (six of 2345 tests). Factors associated with lower odds of the Table 1 Fallacy included industry sponsorship (adjusted odds ratio [aOR] 0.29, 95% confidence interval [CI] 0.18-0.47; multiplicity-corrected P < 0.0001), larger trial size (≥795 versus <280 patients; aOR 0.32, 95% CI 0.19-0.53; multiplicity-corrected P = 0.0008), and publication in a European versus American journal (aOR 0.06, 95% CI 0.03-0.13; multiplicity-corrected P < 0.0001). CONCLUSIONS: This study highlights the persistence of the Table 1 Fallacy in contemporary oncology randomised controlled trials, with one of every four trials testing for baseline differences after randomisation. Significance testing is a suboptimal method for identifying unsound randomisation procedures and may encourage misleading inferences. Journal-level enforcement is a possible strategy to help mitigate this fallacy.


Subject(s)
Neoplasms; Humans; Prevalence; Cross-Sectional Studies; Neoplasms/epidemiology; Neoplasms/therapy; Randomized Controlled Trials as Topic
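
The observation that the share of significant baseline comparisons approximates the nominal type I error rate can be reproduced in a small simulation. This is a minimal sketch, not the study's analysis: it assumes a single normally distributed baseline variable with known unit variance, two arms formed by valid randomisation, and a two-sided z-test at alpha = 0.05. The function name and all parameter values are hypothetical.

```python
import random
from statistics import NormalDist, mean


def baseline_rejection_rate(n_trials: int = 2000, n_per_arm: int = 100,
                            alpha: float = 0.05, seed: int = 0) -> float:
    """Fraction of simulated, properly randomised trials in which a baseline
    covariate differs 'significantly' between arms by chance alone."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    # Standard error of the difference in means when each arm has variance 1.
    se = (2.0 / n_per_arm) ** 0.5
    rejections = 0
    for _ in range(n_trials):
        # Both arms are drawn from the same population: no true imbalance.
        arm_a = [rng.gauss(0.0, 1.0) for _ in range(n_per_arm)]
        arm_b = [rng.gauss(0.0, 1.0) for _ in range(n_per_arm)]
        z = (mean(arm_a) - mean(arm_b)) / se
        p_value = 2.0 * (1.0 - std_normal.cdf(abs(z)))
        if p_value < alpha:
            rejections += 1
    return rejections / n_trials


if __name__ == "__main__":
    print(f"share of 'significant' baseline tests: {baseline_rejection_rate():.3f}")
```

Because both arms come from the same distribution, every rejection here is a false positive, so the test conveys nothing about whether randomisation was carried out soundly.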
7.
Am J Hum Genet ; 110(8): 1330-1342, 2023 08 03.
Article in English | MEDLINE | ID: mdl-37494930

ABSTRACT

Allelic series are of candidate therapeutic interest because of the existence of a dose-response relationship between the functionality of a gene and the degree or severity of a phenotype. We define an allelic series as a collection of variants in which increasingly deleterious mutations lead to increasingly large phenotypic effects, and we have developed a gene-based rare-variant association test specifically targeted to identifying genes containing allelic series. Building on the well-known burden test and sequence kernel association test (SKAT), we specify a variety of association models covering different genetic architectures and integrate these into a Coding-Variant Allelic-Series Test (COAST). Through extensive simulations, we confirm that COAST maintains the type I error and improves the power when the pattern of coding-variant effect sizes increases monotonically with mutational severity. We applied COAST to identify allelic-series genes for four circulating-lipid traits and five cell-count traits among 145,735 subjects with available whole-exome sequencing data from the UK Biobank. Compared with optimal SKAT (SKAT-O), COAST identified 29% more Bonferroni-significant associations with circulating-lipid traits, on average, and 82% more with cell-count traits. All of the gene-trait associations identified by COAST have corroborating evidence either from rare-variant associations in the full cohort (Genebass, n = 400,000) or from common-variant associations in the GWAS Catalog. In addition to detecting many gene-trait associations present in Genebass by using only a fraction (36.9%) of the sample, COAST detects associations, such as that between ANGPTL4 and triglycerides, that are absent from Genebass but that have clear common-variant support.


Subject(s)
Genetic Variation; Lipids; Computer Simulation; Genetic Association Studies; Phenotype; Genome-Wide Association Study
9.
JAMA Netw Open ; 6(6): e2319055, 2023 06 01.
Article in English | MEDLINE | ID: mdl-37342044

ABSTRACT

This cohort study demonstrates how to use cumulative event count curves to create a clinically meaningful end point by simultaneously considering recurrence, progression, and survival times from the individual patient.


Subject(s)
Medical Oncology; Humans; Endpoint Determination
10.
medRxiv ; 2023 Aug 29.
Article in English | MEDLINE | ID: mdl-37163049

ABSTRACT

High-dimensional clinical data are becoming more accessible in biobank-scale datasets. However, effectively utilizing high-dimensional clinical data for genetic discovery remains challenging. Here we introduce a general deep learning-based framework, REpresentation learning for Genetic discovery on Low-dimensional Embeddings (REGLE), for discovering associations between genetic variants and high-dimensional clinical data. REGLE uses convolutional variational autoencoders to compute a non-linear, low-dimensional, disentangled embedding of the data with highly heritable individual components. REGLE can incorporate expert-defined or clinical features and provides a framework to create accurate disease-specific polygenic risk scores (PRS) in datasets which have minimal expert phenotyping. We apply REGLE to both respiratory and circulatory systems: spirograms which measure lung function and photoplethysmograms (PPG) which measure blood volume changes. Genome-wide association studies on REGLE embeddings identify more genome-wide significant loci than existing methods and replicate known loci for both spirograms and PPG, demonstrating the generality of the framework. Furthermore, these embeddings are associated with overall survival. Finally, we construct a set of PRSs that improve predictive performance of asthma, chronic obstructive pulmonary disease, hypertension, and systolic blood pressure in multiple biobanks. Thus, REGLE embeddings can quantify clinically relevant features that are not currently captured in a standardized or automated way.

11.
JAMA Netw Open ; 6(4): e236498, 2023 04 03.
Article in English | MEDLINE | ID: mdl-37010873

ABSTRACT

This cohort study assesses the relative stability of median and mean survival time estimates reported in cancer clinical trials.


Subject(s)
Neoplasms; Humans; Survival Rate; Neoplasms/drug therapy; Survival Analysis
12.
Nat Genet ; 55(5): 787-795, 2023 05.
Article in English | MEDLINE | ID: mdl-37069358

ABSTRACT

Chronic obstructive pulmonary disease (COPD), the third leading cause of death worldwide, is highly heritable. While COPD is clinically defined by applying thresholds to summary measures of lung function, a quantitative liability score has more power to identify genetic signals. Here we train a deep convolutional neural network on noisy self-reported and International Classification of Diseases labels to predict COPD case-control status from high-dimensional raw spirograms and use the model's predictions as a liability score. The machine-learning-based (ML-based) liability score accurately discriminates COPD cases and controls, and predicts COPD-related hospitalization without any domain-specific knowledge. Moreover, the ML-based liability score is associated with overall survival and exacerbation events. A genome-wide association study on the ML-based liability score replicates existing COPD and lung function loci and also identifies 67 new loci. Lastly, our method provides a general framework to use ML methods and medical-record-based labels that does not require domain knowledge or expert curation to improve disease prediction and genomic discovery for drug design.


Subject(s)
Deep Learning; Pulmonary Disease, Chronic Obstructive; Humans; Genome-Wide Association Study/methods; Pulmonary Disease, Chronic Obstructive/genetics; Genetic Loci; Polymorphism, Single Nucleotide/genetics
13.
JAMA Cardiol ; 8(6): 554-563, 2023 06 01.
Article in English | MEDLINE | ID: mdl-37099283

ABSTRACT

Importance: In the Dapagliflozin Evaluation to Improve the Lives of Patients With Preserved Ejection Fraction Heart Failure (DELIVER) trial, dapagliflozin reduced the risk of time to first worsening heart failure (HF) event or cardiovascular death in patients with HF with mildly reduced or preserved ejection fraction (EF). Objective: To evaluate the effect of dapagliflozin on total (ie, first and recurrent) HF events and cardiovascular death in this population. Design, Setting, and Participants: In this prespecified analysis of the DELIVER trial, the proportional rates approach of Lin, Wei, Yang, and Ying (LWYY) and a joint frailty model were used to examine the effect of dapagliflozin on total HF events and cardiovascular death. Several subgroups were examined to test for heterogeneity in the effect of dapagliflozin, including left ventricular EF. Participants were enrolled from August 2018 to December 2020, and data were analyzed from August to October 2022. Interventions: Dapagliflozin, 10 mg, once daily or matching placebo. Main Outcomes and Measures: The outcome was total episodes of worsening HF (hospitalization for HF or urgent HF visit requiring intravenous HF therapies) and cardiovascular death. Results: Of 6263 included patients, 2747 (43.9%) were women, and the mean (SD) age was 71.7 (9.6) years. There were 1057 HF events and cardiovascular deaths in the placebo group compared with 815 in the dapagliflozin group. Patients with more HF events had features of more severe HF, such as higher N-terminal pro-B-type natriuretic peptide level, worse kidney function, more prior HF hospitalizations, and longer duration of HF, although EF was similar to those with no HF events. In the LWYY model, the rate ratio for total HF events and cardiovascular death for dapagliflozin compared with placebo was 0.77 (95% CI, 0.67-0.89; P < .001) compared with a hazard ratio of 0.82 (95% CI, 0.73-0.92; P < .001) in a traditional time to first event analysis. 
In the joint frailty model, the rate ratio was 0.72 (95% CI, 0.65-0.81; P < .001) for total HF events and 0.87 (95% CI, 0.72-1.05; P = .14) for cardiovascular death. The results were similar for total HF hospitalizations (without urgent HF visits) and cardiovascular death and in all subgroups, including those defined by EF. Conclusions and Relevance: In the DELIVER trial, dapagliflozin reduced the rate of total HF events (first and subsequent HF hospitalizations and urgent HF visits) and cardiovascular death regardless of patient characteristics, including EF. Trial Registration: ClinicalTrials.gov Identifier: NCT03619213.


Subject(s)
Frailty; Heart Failure; Humans; Female; Aged; Male; Heart Failure/complications; Heart Failure/drug therapy; Heart Failure/chemically induced; Ventricular Function, Left; Benzhydryl Compounds/therapeutic use
15.
Biometrics ; 79(2): 1472-1484, 2023 06.
Article in English | MEDLINE | ID: mdl-35218565

ABSTRACT

Sample sizes vary substantially across tissues in the Genotype-Tissue Expression (GTEx) project, where considerably fewer samples are available from certain inaccessible tissues, such as the substantia nigra (SSN), than from accessible tissues, such as blood. This severely limits power for identifying tissue-specific expression quantitative trait loci (eQTL) in undersampled tissues. Here we propose Surrogate Phenotype Regression Analysis (Spray) for leveraging information from a correlated surrogate outcome (eg, expression in blood) to improve inference on a partially missing target outcome (eg, expression in SSN). Rather than regarding the surrogate outcome as a proxy for the target outcome, Spray jointly models the target and surrogate outcomes within a bivariate regression framework. Unobserved values of either outcome are treated as missing data. We describe and implement an expectation conditional maximization algorithm for performing estimation in the presence of bilateral outcome missingness. Spray estimates the same association parameter estimated by standard eQTL mapping and controls the type I error even when the target and surrogate outcomes are truly uncorrelated. We demonstrate analytically and empirically, using simulations and GTEx data, that in comparison with marginally modeling the target outcome, jointly modeling the target and surrogate outcomes increases estimation precision and improves power.


Subject(s)
Algorithms; Quantitative Trait Loci; Phenotype; Regression Analysis
16.
BMC Bioinformatics ; 23(1): 208, 2022 Jun 01.
Article in English | MEDLINE | ID: mdl-35650523

ABSTRACT

BACKGROUND: Bioinformatics investigators often gain insights by combining information across multiple and disparate data sets. Merging data from multiple sources frequently results in data sets that are incomplete or contain missing values. Although missing data are ubiquitous, existing implementations of Gaussian mixture models (GMMs) either cannot accommodate missing data, or do so by imposing simplifying assumptions that limit the applicability of the model. In the presence of missing data, a standard ad hoc practice is to perform complete case analysis or imputation prior to model fitting. Both approaches have serious drawbacks, potentially resulting in biased and unstable parameter estimates. RESULTS: Here we present missingness-aware Gaussian mixture models (MGMM), an R package for fitting GMMs in the presence of missing data. Unlike existing GMM implementations that can accommodate missing data, MGMM places no restrictions on the form of the covariance matrix. Using three case studies on real and simulated 'omics data sets, we demonstrate that, when the underlying data distribution is close to a GMM, MGMM is more effective at recovering the true cluster assignments than either the existing GMM implementations that accommodate missing data, or fitting a standard GMM after state-of-the-art imputation. Moreover, MGMM provides an accurate assessment of cluster assignment uncertainty, even when the generative distribution is not a GMM. CONCLUSION: Compared to state-of-the-art competitors, MGMM demonstrates a better ability to recover the true cluster assignments for a wide variety of data sets and a large range of missingness rates. MGMM provides the bioinformatics community with a powerful, easy-to-use, and statistically sound tool for performing clustering and density estimation in the presence of missing data. MGMM is publicly available as an R package on CRAN: https://CRAN.R-project.org/package=MGMM.


Subject(s)
Computational Biology; Cluster Analysis; Computational Biology/methods; Normal Distribution
18.
JNCI Cancer Spectr ; 6(1)2022 01 05.
Article in English | MEDLINE | ID: mdl-35699499

ABSTRACT

When designing a comparative oncology trial for an overall or progression-free survival endpoint, investigators often quantify the treatment effect using a difference in median survival times. However, rather than directly designing the study to estimate this difference, it is almost always converted to a hazard ratio (HR) to determine the study size. At the analysis stage, the hazard ratio is utilized for formal analysis, yet because it may be difficult to interpret clinically, especially when the proportional hazards assumption is not met, the observed medians are also reported descriptively. The hazard ratio and median difference contrast different aspects of the survival curves. Whereas the hazard ratio places greater emphasis on late-occurring separation, the median difference focuses locally on the centers of the distributions and cannot capture either short- or long-term differences. Having 2 sets of summaries (a hazard ratio and the medians) may lead to incoherent conclusions regarding the treatment effect. For instance, the hazard ratio may suggest a treatment difference whereas the medians do not, or vice versa. In this commentary, we illustrate these commonly encountered issues using examples from recent oncology trials. We present a coherent alternative strategy that, unlike relying on the hazard ratio, does not require modeling assumptions and always results in clinically interpretable summaries of the treatment effect.


Subject(s)
Neoplasms; Research Design; Humans; Medical Oncology; Neoplasms/therapy; Proportional Hazards Models
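
The conversion from a median difference to a hazard ratio for sizing a study, mentioned in this abstract, can be sketched as follows. This is an illustration under stated assumptions rather than the commentary's own method: survival times are assumed exponential in both arms (so the hazard ratio equals the ratio of medians), allocation is 1:1, and the number of events comes from Schoenfeld's approximation. The medians of 12 and 16 months and the function names are hypothetical.

```python
import math
from statistics import NormalDist


def hazard_ratio_from_medians(median_control: float, median_treatment: float) -> float:
    """Under exponential survival, hazard = ln(2) / median, so the hazard ratio
    (treatment vs control) reduces to median_control / median_treatment."""
    return median_control / median_treatment


def required_events(hazard_ratio: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Schoenfeld's approximation for the total number of events needed in a
    1:1 randomised trial analysed with a two-sided log-rank test."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1.0 - alpha / 2.0)
    z_beta = z.inv_cdf(power)
    return math.ceil(4.0 * (z_alpha + z_beta) ** 2 / math.log(hazard_ratio) ** 2)


if __name__ == "__main__":
    hr = hazard_ratio_from_medians(median_control=12.0, median_treatment=16.0)
    print(f"implied hazard ratio: {hr:.2f}")
    print(f"events required: {required_events(hr)}")
```

The sketch makes the abstract's point concrete: the design target is stated as a median difference, but the sample size is driven entirely by the hazard ratio, and the link between the two holds only under the modeling assumption.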
20.
J Hum Genet ; 67(8): 449-458, 2022 Aug.
Article in English | MEDLINE | ID: mdl-35351958

ABSTRACT

Using the Taiwan Biobank, we aimed to identify traits and genetic variations that could predispose Han Chinese women to primary dysmenorrhea. Cases of primary dysmenorrhea included those who self-reported "frequent dysmenorrhea" in a dysmenorrhea-related Taiwan Biobank questionnaire, and those who have been diagnosed with severe dysmenorrhea by a physician. Controls were those without self-reported dysmenorrhea. Customized Axiom-Taiwan Biobank Array Plates were used to perform whole-genome genotyping, PLINK was used to perform association tests, and HaploReg was used to conduct functional annotations of SNPs and bioinformatic analyses. The GWAS analysis included 1186 cases and 24,020 controls. We identified 53 SNPs that achieved genome-wide significance (P < 5 × 10⁻⁸), which clustered in 2 regions. The first SNP cluster was on chromosome 1, and included 24 high LD (R² > 0.88) variants around the NGF gene (lowest P value of 3.83 × 10⁻¹³ for rs2982742). Most SNPs occurred within NGF introns, and were predicted to alter regulatory binding motifs. The second SNP cluster was on chromosome 2, including 7 high LD (R² > 0.94) variants around the IL1A and IL1B loci (lowest P value of 7.43 × 10⁻¹⁰ for rs11676014) and 22 SNPs that did not reach significance after conditional analysis. Most of these SNPs resided within IL1A and IL1B introns, while 2 SNPs may be in the promoter histone marks or promoter flanking regions of IL1B. To conclude, data from this study suggest that NGF, IL1A, and IL1B may be involved in the pathogenesis of primary dysmenorrhea in the Han Chinese in Taiwan.


Subject(s)
Dysmenorrhea; Interleukin-1alpha; Interleukin-1beta; Nerve Growth Factor; Biological Specimen Banks; Dysmenorrhea/epidemiology; Dysmenorrhea/genetics; Female; Genetic Predisposition to Disease; Genome-Wide Association Study; Humans; Interleukin-1alpha/genetics; Interleukin-1beta/genetics; Nerve Growth Factor/genetics; Polymorphism, Single Nucleotide; Taiwan