1.
Am J Hum Genet ; 110(8): 1330-1342, 2023 08 03.
Article in English | MEDLINE | ID: mdl-37494930

ABSTRACT

Allelic series are of candidate therapeutic interest because of the existence of a dose-response relationship between the functionality of a gene and the degree or severity of a phenotype. We define an allelic series as a collection of variants in which increasingly deleterious mutations lead to increasingly large phenotypic effects, and we have developed a gene-based rare-variant association test specifically targeted to identifying genes containing allelic series. Building on the well-known burden test and sequence kernel association test (SKAT), we specify a variety of association models covering different genetic architectures and integrate these into a Coding-Variant Allelic-Series Test (COAST). Through extensive simulations, we confirm that COAST maintains the type I error and improves the power when the pattern of coding-variant effect sizes increases monotonically with mutational severity. We applied COAST to identify allelic-series genes for four circulating-lipid traits and five cell-count traits among 145,735 subjects with available whole-exome sequencing data from the UK Biobank. Compared with optimal SKAT (SKAT-O), COAST identified 29% more Bonferroni-significant associations with circulating-lipid traits, on average, and 82% more with cell-count traits. All of the gene-trait associations identified by COAST have corroborating evidence either from rare-variant associations in the full cohort (Genebass, n = 400,000) or from common-variant associations in the GWAS Catalog. In addition to detecting many gene-trait associations present in Genebass by using only a fraction (36.9%) of the sample, COAST detects associations, such as that between ANGPTL4 and triglycerides, that are absent from Genebass but that have clear common-variant support.


Subject(s)
Genetic Variation , Lipids , Computer Simulation , Genetic Association Studies , Phenotype , Genome-Wide Association Study
2.
Am J Hum Genet ; 108(7): 1217-1230, 2021 07 01.
Article in English | MEDLINE | ID: mdl-34077760

ABSTRACT

Genome-wide association studies (GWASs) require accurate cohort phenotyping, but expert labeling can be costly, time intensive, and variable. Here, we develop a machine learning (ML) model to predict glaucomatous optic nerve head features from color fundus photographs. We used the model to predict vertical cup-to-disc ratio (VCDR), a diagnostic parameter and cardinal endophenotype for glaucoma, in 65,680 Europeans in the UK Biobank (UKB). A GWAS of ML-based VCDR identified 299 independent genome-wide significant (GWS; p ≤ 5 × 10⁻⁸) hits in 156 loci. The ML-based GWAS replicated 62 of 65 GWS loci from a recent VCDR GWAS in the UKB for which two ophthalmologists manually labeled images for 67,040 Europeans. The ML-based GWAS also identified 93 novel loci, significantly expanding our understanding of the genetic etiologies of glaucoma and VCDR. Pathway analyses support the biological significance of the novel hits to VCDR: select loci are near genes involved in neuronal and synaptic biology or harboring variants known to cause severe Mendelian ophthalmic disease. Finally, the ML-based GWAS results significantly improve polygenic prediction of VCDR and primary open-angle glaucoma in the independent EPIC-Norfolk cohort.


Subject(s)
Machine Learning , Optic Disk/anatomy & histology , Datasets as Topic , Fluorescein Angiography , Genome-Wide Association Study , Glaucoma, Open-Angle/diagnostic imaging , Humans , Models, Anatomic , Optic Disk/diagnostic imaging , Phenotype , Risk Assessment
3.
PLoS Genet ; 17(8): e1009713, 2021 08.
Article in English | MEDLINE | ID: mdl-34460823

ABSTRACT

Genome-wide association studies (GWASs) have uncovered a wealth of associations between common variants and human phenotypes. Here, we present an integrative analysis of GWAS summary statistics from 36 phenotypes to decipher multitrait genetic architecture and its link with biological mechanisms. Our framework incorporates multitrait association mapping along with an investigation of the breakdown of genetic associations into clusters of variants harboring similar multitrait association profiles. Focusing on two subsets of immunity and metabolism phenotypes, we then demonstrate how genetic variants within clusters can be mapped to biological pathways and disease mechanisms. Finally, for the metabolism set, we investigate the link between gene cluster assignment and the success of drug targets in randomized controlled trials.


Subject(s)
Computational Biology/methods , Polymorphism, Single Nucleotide , Quantitative Trait Loci , Cluster Analysis , Genetic Predisposition to Disease , Genome-Wide Association Study , Humans , Phenotype
4.
Biometrics ; 79(2): 1472-1484, 2023 06.
Article in English | MEDLINE | ID: mdl-35218565

ABSTRACT

Sample sizes vary substantially across tissues in the Genotype-Tissue Expression (GTEx) project, where considerably fewer samples are available from certain inaccessible tissues, such as the substantia nigra (SSN), than from accessible tissues, such as blood. This severely limits power for identifying tissue-specific expression quantitative trait loci (eQTL) in undersampled tissues. Here we propose Surrogate Phenotype Regression Analysis (Spray) for leveraging information from a correlated surrogate outcome (eg, expression in blood) to improve inference on a partially missing target outcome (eg, expression in SSN). Rather than regarding the surrogate outcome as a proxy for the target outcome, Spray jointly models the target and surrogate outcomes within a bivariate regression framework. Unobserved values of either outcome are treated as missing data. We describe and implement an expectation conditional maximization algorithm for performing estimation in the presence of bilateral outcome missingness. Spray estimates the same association parameter estimated by standard eQTL mapping and controls the type I error even when the target and surrogate outcomes are truly uncorrelated. We demonstrate analytically and empirically, using simulations and GTEx data, that in comparison with marginally modeling the target outcome, jointly modeling the target and surrogate outcomes increases estimation precision and improves power.


Subject(s)
Algorithms , Quantitative Trait Loci , Phenotype , Regression Analysis
5.
BMC Bioinformatics ; 23(1): 208, 2022 Jun 01.
Article in English | MEDLINE | ID: mdl-35650523

ABSTRACT

BACKGROUND: Bioinformatics investigators often gain insights by combining information across multiple and disparate data sets. Merging data from multiple sources frequently results in data sets that are incomplete or contain missing values. Although missing data are ubiquitous, existing implementations of Gaussian mixture models (GMMs) either cannot accommodate missing data, or do so by imposing simplifying assumptions that limit the applicability of the model. In the presence of missing data, a standard ad hoc practice is to perform complete case analysis or imputation prior to model fitting. Both approaches have serious drawbacks, potentially resulting in biased and unstable parameter estimates. RESULTS: Here we present missingness-aware Gaussian mixture models (MGMM), an R package for fitting GMMs in the presence of missing data. Unlike existing GMM implementations that can accommodate missing data, MGMM places no restrictions on the form of the covariance matrix. Using three case studies on real and simulated 'omics data sets, we demonstrate that, when the underlying data distribution is near to a GMM, MGMM is more effective at recovering the true cluster assignments than either the existing GMM implementations that accommodate missing data, or fitting a standard GMM after state-of-the-art imputation. Moreover, MGMM provides an accurate assessment of cluster assignment uncertainty, even when the generative distribution is not a GMM. CONCLUSION: Compared to state-of-the-art competitors, MGMM demonstrates a better ability to recover the true cluster assignments for a wide variety of data sets and a large range of missingness rates. MGMM provides the bioinformatics community with a powerful, easy-to-use, and statistically sound tool for performing clustering and density estimation in the presence of missing data. MGMM is publicly available as an R package on CRAN: https://CRAN.R-project.org/package=MGMM .


Subject(s)
Computational Biology , Cluster Analysis , Computational Biology/methods , Normal Distribution
6.
J Hum Genet ; 67(8): 449-458, 2022 Aug.
Article in English | MEDLINE | ID: mdl-35351958

ABSTRACT

Using the Taiwan Biobank, we aimed to identify traits and genetic variations that could predispose Han Chinese women to primary dysmenorrhea. Cases of primary dysmenorrhea included those who self-reported "frequent dysmenorrhea" in a dysmenorrhea-related Taiwan Biobank questionnaire, and those who have been diagnosed with severe dysmenorrhea by a physician. Controls were those without self-reported dysmenorrhea. Customized Axiom-Taiwan Biobank Array Plates were used to perform whole-genome genotyping, PLINK was used to perform association tests, and HaploReg was used to conduct functional annotations of SNPs and bioinformatic analyses. The GWAS analysis included 1186 cases and 24,020 controls. We identified 53 SNPs that achieved genome-wide significance (P < 5 × 10⁻⁸), which clustered in 2 regions. The first SNP cluster was on chromosome 1, and included 24 high LD (R² > 0.88) variants around the NGF gene (lowest P value of 3.83 × 10⁻¹³ for rs2982742). Most SNPs occurred within NGF introns, and were predicted to alter regulatory binding motifs. The second SNP cluster was on chromosome 2, including 7 high LD (R² > 0.94) variants around the IL1A and IL1B loci (lowest P value of 7.43 × 10⁻¹⁰ for rs11676014) and 22 SNPs that did not reach significance after conditional analysis. Most of these SNPs resided within IL1A and IL1B introns, while 2 SNPs may be in the promoter histone marks or promoter flanking regions of IL1B. To conclude, data from this study suggest that NGF, IL1A, and IL1B may be involved in the pathogenesis of primary dysmenorrhea in the Han Chinese in Taiwan.


Subject(s)
Dysmenorrhea , Interleukin-1alpha , Interleukin-1beta , Nerve Growth Factor , Biological Specimen Banks , Dysmenorrhea/epidemiology , Dysmenorrhea/genetics , Female , Genetic Predisposition to Disease , Genome-Wide Association Study , Humans , Interleukin-1alpha/genetics , Interleukin-1beta/genetics , Nerve Growth Factor/genetics , Polymorphism, Single Nucleotide , Taiwan
7.
Stat Med ; 41(20): 4022-4033, 2022 09 10.
Article in English | MEDLINE | ID: mdl-35688463

ABSTRACT

Selection trials are used to compare potentially active experimental treatments without a control arm. While sample size calculation methods exist for binary endpoints, no such methods are available for time-to-event endpoints, even though these are ubiquitous in clinical trials. Recent selection trials have begun using progression-free survival as their primary endpoint, but have dichotomized it at a specific time point for sample size calculation and analysis. This changes the clinical question and may reduce power to detect a difference between the arms. In this article, we develop the theory for sample size calculation in selection trials where the time-to-event endpoint is assumed to follow an exponential or Weibull distribution. We provide a free web application for sample size calculation, as well as an R package, that researchers can use in the design of their studies.


Subject(s)
Research Design , Humans , Patient Selection , Randomized Controlled Trials as Topic , Sample Size
8.
Clin Infect Dis ; 72(11): e887-e889, 2021 06 01.
Article in English | MEDLINE | ID: mdl-33053155

ABSTRACT

For survival analysis in comparative coronavirus disease 2019 trials, the routinely used hazard ratio may not provide a meaningful summary of the treatment effect. The mean survival time difference/ratio is an intuitive, assumption-free alternative. However, for short-term studies, landmark mortality rate differences/ratios are more clinically relevant and should be formally analyzed and reported.


Subject(s)
COVID-19 , Humans , Proportional Hazards Models , SARS-CoV-2 , Survival Analysis , Treatment Outcome
9.
Ann Intern Med ; 173(8): 632-637, 2020 10 20.
Article in English | MEDLINE | ID: mdl-32634024

ABSTRACT

Clinical trials of treatments for coronavirus disease 2019 (COVID-19) draw intense public attention. More than ever, valid, transparent, and intuitive summaries of the treatment effects, including efficacy and harm, are needed. In recently published and ongoing randomized comparative trials evaluating treatments for COVID-19, time to a positive outcome, such as recovery or improvement, has repeatedly been used as either the primary or key secondary end point. Because patients may die before recovery or improvement, data analysis of this end point faces a competing risk problem. Commonly used survival analysis techniques, such as the Kaplan-Meier method, often are not appropriate for such situations. Moreover, almost all trials have quantified treatment effects by using the hazard ratio, which is difficult to interpret for a positive event, especially in the presence of competing risks. Using 2 recent trials evaluating treatments (remdesivir and convalescent plasma) for COVID-19 as examples, a valid, well-established yet underused procedure is presented for estimating the cumulative recovery or improvement rate curve across the study period. Furthermore, an intuitive and clinically interpretable summary of treatment efficacy based on this curve is also proposed. Clinical investigators are encouraged to consider applying these methods for quantifying treatment effects in future studies of COVID-19.


Subject(s)
Betacoronavirus , Coronavirus Infections/therapy , Pandemics , Pneumonia, Viral/therapy , Randomized Controlled Trials as Topic/methods , COVID-19 , Coronavirus Infections/epidemiology , Humans , Immunization, Passive/methods , Pneumonia, Viral/epidemiology , SARS-CoV-2 , Treatment Outcome , COVID-19 Serotherapy
10.
Ann Intern Med ; 173(5): 368-374, 2020 09 01.
Article in English | MEDLINE | ID: mdl-32628533

ABSTRACT

In comparative studies, treatment effect is often assessed using a binary outcome that indicates response to the therapy. Commonly used summary measures for response include the cumulative and current response rates at a specific time point. The current response rate is sometimes called the probability of being in response (PBIR), which regards a patient as a responder only if they have achieved and remain in response at present. The methods used in practice for estimating these rates, however, may not be appropriate. Moreover, whereas an effective treatment is expected to achieve a rapid and sustained response, the response at a fixed time point does not provide information about the duration of response (DOR). As an alternative, a curve constructed from the current response rates over the entire study period may be considered, which can be used for visualizing how rapidly patients responded to therapy and how long responses were sustained. The area under the PBIR curve is the mean DOR. This connection between response and DOR makes this curve attractive for assessing the treatment effect. In contrast to the conventional method for analyzing the DOR data, which uses responders only, the above procedure includes all patients in the study. Although discussed extensively in the statistical literature, estimation of the current response rate curve has garnered little attention in the medical literature. This article illustrates how to construct and analyze such a curve using data from a recent study for treating renal cell carcinoma. Clinical trialists are encouraged to consider this robust and clinically interpretable procedure as an additional tool for evaluating treatment effects in clinical studies.


Subject(s)
Comparative Effectiveness Research , Data Interpretation, Statistical , Equivalence Trials as Topic , Antineoplastic Agents/therapeutic use , Carcinoma, Renal Cell/drug therapy , Humans , Kidney Neoplasms/drug therapy , Probability , Randomized Controlled Trials as Topic , Statistics as Topic/methods , Time Factors , Treatment Outcome
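The identity stated in the abstract above (the area under the PBIR curve is the mean DOR) can be checked numerically with a small sketch. The following is an illustration only, not the authors' estimator: it assumes no censoring, integer event times, and hypothetical patient data, and the function names are invented for this example. Non-responders are kept in the denominator, which is the point the abstract makes about including all patients.

```python
from fractions import Fraction

def pbir(t, intervals):
    """Fraction of ALL patients who are in response at time t.

    `intervals` holds one entry per patient: (onset, end) of response,
    or None for a patient who never responded (hypothetical data layout).
    """
    in_response = sum(1 for iv in intervals if iv is not None and iv[0] <= t < iv[1])
    return Fraction(in_response, len(intervals))

def area_under_pbir(intervals, tau):
    """Exact area under the PBIR step function on [0, tau]."""
    # The curve can only change at response onset or end times.
    cuts = sorted({0, tau} | {min(x, tau) for iv in intervals if iv for x in iv})
    return sum(pbir(a, intervals) * (b - a) for a, b in zip(cuts, cuts[1:]))

def mean_dor(intervals, tau):
    """Mean duration of response up to tau; non-responders contribute zero."""
    total = sum(max(0, min(iv[1], tau) - min(iv[0], tau)) for iv in intervals if iv)
    return Fraction(total, len(intervals))

# Four hypothetical patients followed to tau = 12 (one never responds).
patients = [(2, 10), (4, 6), None, (1, 12)]
print(area_under_pbir(patients, 12), mean_dor(patients, 12))
```

With censored follow-up the empirical fraction used here would no longer be valid, and an estimator that accounts for censoring would be needed.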
11.
Biometrics ; 76(4): 1262-1272, 2020 12.
Article in English | MEDLINE | ID: mdl-31883270

ABSTRACT

Quantitative traits analyzed in Genome-Wide Association Studies (GWAS) are often nonnormally distributed. For such traits, association tests based on standard linear regression are subject to reduced power and inflated type I error in finite samples. Applying the rank-based inverse normal transformation (INT) to nonnormally distributed traits has become common practice in GWAS. However, the different variations on INT-based association testing have not been formally defined, and guidance is lacking on when to use which approach. In this paper, we formally define and systematically compare the direct (D-INT) and indirect (I-INT) INT-based association tests. We discuss their assumptions, underlying generative models, and connections. We demonstrate that the relative powers of D-INT and I-INT depend on the underlying data generating process. Since neither approach is uniformly most powerful, we combine them into an adaptive omnibus test (O-INT). O-INT is robust to model misspecification, protects the type I error, and is well powered against a wide range of nonnormally distributed traits. Extensive simulations were conducted to examine the finite sample operating characteristics of these tests. Our results demonstrate that, for nonnormally distributed traits, INT-based tests outperform the standard untransformed association test, both in terms of power and type I error rate control. We apply the proposed methods to GWAS of spirometry traits in the UK Biobank. O-INT has been implemented in the R package RNOmni, which is available on CRAN.


Subject(s)
Genome-Wide Association Study , Models, Genetic , Linear Models , Phenotype
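As a rough illustration of the rank-based inverse normal transformation discussed in the abstract above, the sketch below maps each observation to a normal quantile of its rank. It covers only the transformation step, not the D-INT, I-INT, or O-INT tests, and it is not the RNOmni implementation. The offset k = 3/8 (the Blom offset) and the function name `rank_int` are assumptions made for this example; the abstract does not specify them.

```python
from statistics import NormalDist

def rank_int(values, k=0.375):
    """Rank-based inverse normal transformation (illustrative sketch).

    Each value is replaced by Phi^{-1}((rank - k) / (n - 2k + 1)),
    where ties receive the average of their ranks. The offset k is an
    assumed default, not taken from the abstract.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run of tied values starting at sorted position i.
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # 1-based average rank for the tie group
        for m in range(i, j + 1):
            ranks[order[m]] = average_rank
        i = j + 1
    standard_normal = NormalDist()
    return [standard_normal.inv_cdf((r - k) / (n - 2 * k + 1)) for r in ranks]

# A skewed hypothetical trait: the transformed values are symmetric about zero.
trait = [3.1, 0.2, 50.0, 1.7, 9.9]
print(rank_int(trait))
```

Because the transformation depends only on ranks, the extreme value 50.0 has no more influence than any other largest observation would.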
12.
Physiol Genomics ; 51(12): 630-643, 2019 12 01.
Article in English | MEDLINE | ID: mdl-31736414

ABSTRACT

Respiratory syncytial virus (RSV) causes severe lower respiratory tract disease in infants, young children, and susceptible adults. The pathogenesis of RSV disease is not fully understood, although toll-like receptor 4 (TLR4)-related innate immune response is known to play a role. The present study was designed to determine TLR4-mediated disease phenotypes and lung transcriptomics and to elucidate transcriptional mechanisms underlying differential RSV susceptibility in inbred strains of mice. Dominant negative Tlr4 mutant (C3H/HeJ, HeJ, Tlr4Lps-d) and its wild-type (C3H/HeOuJ, OuJ, Tlr4Lps-n) mice and five genetically diverse, differentially responsive strains bearing the wild-type Tlr4Lps-n allele were infected with RSV. Bronchoalveolar lavage, histopathology, and genome-wide transcriptomics were used to characterize the pulmonary response to RSV. RSV-induced lung neutrophilia [1 day postinfection (pi)], epithelial proliferation (1 day pi), and lymphocytic infiltration (5 days pi) were significantly lower in HeJ compared with OuJ mice. Pulmonary RSV expression was also significantly lower in HeJ than in OuJ. Upregulation of immune/inflammatory (Cxcl3, Saa1) and heat shock protein (Hspa1a, Hsph1) genes was characteristic of OuJ mice, while cell cycle and cell death/survival genes were modulated in HeJ mice following RSV infection. Strain-specific transcriptomics suggested virus-responsive (Oasl1, Irg1, Mx1) and epidermal differentiation complex (Krt4, Lce3a) genes may contribute to TLR4-independent defense against RSV in resistant strains including C57BL/6J. The data indicate that TLR4 contributes to pulmonary RSV pathogenesis, which is underlain by activation of cellular immunity, the inflammasome complex, and vascular damage. Distinct transcriptomics in differentially responsive Tlr4-wild-type strains provide new insights into the mechanism of RSV disease and potential therapeutic targets.


Subject(s)
Genetic Predisposition to Disease , Lung Injury/genetics , Respiratory Syncytial Virus Infections/metabolism , Respiratory Syncytial Viruses/isolation & purification , Toll-Like Receptor 4/metabolism , Transcriptome/genetics , Animals , Disease Models, Animal , Immunity, Cellular , Lung Injury/virology , Male , Mice , Mice, Inbred BALB C , Mice, Inbred C57BL , Mice, Inbred DBA , Mice, Transgenic , Phenotype , Respiratory Syncytial Virus Infections/virology , Toll-Like Receptor 4/genetics , Viral Load/genetics
15.
Proc Natl Acad Sci U S A ; 112(10): 3056-61, 2015 Mar 10.
Article in English | MEDLINE | ID: mdl-25713392

ABSTRACT

Dendritic cells (DCs) are the primary leukocytes responsible for priming T cells. To find and activate naïve T cells, DCs must migrate to lymph nodes, yet the cellular programs responsible for this key step remain unclear. DC migration to lymph nodes and the subsequent T-cell response are disrupted in a mouse we recently described lacking the NOD-like receptor NLRP10 (NLR family, pyrin domain containing 10); however, the mechanism by which this pattern recognition receptor governs DC migration remained unknown. Using a proteomic approach, we discovered that DCs from Nlrp10 knockout mice lack the guanine nucleotide exchange factor DOCK8 (dedicator of cytokinesis 8), which regulates cytoskeleton dynamics in multiple leukocyte populations; in humans, loss-of-function mutations in Dock8 result in severe immunodeficiency. Surprisingly, Nlrp10 knockout mice crossed to other backgrounds had normal DOCK8 expression. This suggested that the original Nlrp10 knockout strain harbored an unexpected mutation in Dock8, which was confirmed using whole-exome sequencing. Consistent with our original report, NLRP3 inflammasome activation remained unaltered in NLRP10-deficient DCs even after restoring DOCK8 function; however, these DCs recovered the ability to migrate. Isolated loss of DOCK8 via targeted deletion confirmed its absolute requirement for DC migration. Because mutations in Dock genes have been discovered in other mouse lines, we analyzed the diversity of Dock8 across different murine strains and found that C3H/HeJ mice also harbor a Dock8 mutation that partially impairs DC migration. We conclude that DOCK8 is an important regulator of DC migration during an immune response and is prone to mutations that disrupt its crucial function.


Subject(s)
Carrier Proteins/physiology , Cell Movement/genetics , Dendritic Cells/immunology , Guanine Nucleotide Exchange Factors/physiology , Adaptor Proteins, Signal Transducing , Animals , Apoptosis Regulatory Proteins , Carrier Proteins/genetics , Guanine Nucleotide Exchange Factors/genetics , Lymphocyte Activation , Mice , Mice, Inbred C3H , Mice, Knockout , Point Mutation
20.
N Engl J Med ; 381(11): e22, 2019 09 12.
Article in English | MEDLINE | ID: mdl-31509686