Results 1 - 20 of 32
1.
PLoS Genet ; 20(1): e1011134, 2024 Jan.
Article in English | MEDLINE | ID: mdl-38241355

ABSTRACT

It has been well established that cancer cells can evade immune surveillance by acquiring mutations. Understanding genetic alterations in cancer cells that contribute to immune regulation could lead to better immunotherapy patient stratification and identification of novel immuno-oncology (IO) targets. In this report, we describe genome-wide association analyses across 22 TCGA cancer types that explore the associations between genetic alterations in cancer cells and 74 immune traits. Results showed that the tumor microenvironment (TME) is shaped by different gene mutations in different cancer types. Out of the key genes that drive multiple immune traits, the top hit, KEAP1 in lung adenocarcinoma (LUAD), was selected for validation. KEAP1 mutations were found to explain more than 10% of the variance for multiple immune traits in LUAD. Using public scRNA-seq data, further analysis confirmed that KEAP1 mutations activate the NRF2 pathway and promote a suppressive TME. Activation of the NRF2 pathway is associated with lower T cell infiltration and higher T cell exhaustion. Meanwhile, several immune checkpoint genes, such as CD274 (PD-L1), are highly expressed in NRF2-activated cancer cells. By integrating multiple RNA-seq datasets, an NRF2 gene signature was curated, which predicts anti-PD1 therapy response better than the CD274 gene alone in a mixed cohort of different subtypes of non-small cell lung cancer (NSCLC) including LUAD, highlighting the important role of the KEAP1-NRF2 axis in shaping the TME in NSCLC. Finally, a list of ligands overexpressed in NRF2 pathway-activated cancer cells was identified; these ligands could potentially be targeted for TME remodeling in LUAD.


Subject(s)
Adenocarcinoma of Lung , Carcinoma, Non-Small-Cell Lung , Lung Neoplasms , Humans , Kelch-Like ECH-Associated Protein 1/genetics , Genome-Wide Association Study , NF-E2-Related Factor 2/genetics , Lung Neoplasms/genetics , Adenocarcinoma of Lung/genetics , Tumor Microenvironment/genetics , Prognosis
2.
Am J Hum Genet ; 110(5): 762-773, 2023 05 04.
Article in English | MEDLINE | ID: mdl-37019109

ABSTRACT

The ongoing release of large-scale sequencing data in the UK Biobank allows for the identification of associations between rare variants and complex traits. SAIGE-GENE+ is a valid approach to conducting set-based association tests for quantitative and binary traits. However, for ordinal categorical phenotypes, applying SAIGE-GENE+ by treating the trait as quantitative or by binarizing it can cause inflated type I error rates or power loss. In this study, we propose a scalable and accurate method for rare-variant association tests, POLMM-GENE, in which we use a proportional odds logistic mixed model to characterize ordinal categorical phenotypes while adjusting for sample relatedness. POLMM-GENE fully utilizes the categorical nature of phenotypes and thus controls type I error rates well while remaining powerful. In the analyses of UK Biobank 450k whole-exome-sequencing data for five ordinal categorical traits, POLMM-GENE identified 54 gene-phenotype associations.


Subject(s)
Exome , Genome-Wide Association Study , Genome-Wide Association Study/methods , Exome/genetics , Biological Specimen Banks , Phenotype , Data Analysis , United Kingdom
3.
Am J Hum Genet ; 108(5): 825-839, 2021 05 06.
Article in English | MEDLINE | ID: mdl-33836139

ABSTRACT

In genome-wide association studies, ordinal categorical phenotypes are widely used to measure human behaviors, satisfaction, and preferences. However, because of the lack of analysis tools, methods designed for binary or quantitative traits are commonly used inappropriately to analyze categorical phenotypes. To accurately model the dependence of an ordinal categorical phenotype on covariates, we propose an efficient mixed model association test, proportional odds logistic mixed model (POLMM). POLMM is computationally efficient to analyze large datasets with hundreds of thousands of samples, can control type I error rates at a stringent significance level regardless of the phenotypic distribution, and is more powerful than alternative methods. In contrast, the standard linear mixed model approaches cannot control type I error rates for rare variants when the phenotypic distribution is unbalanced, although they performed well when testing common variants. We applied POLMM to 258 ordinal categorical phenotypes on array genotypes and imputed samples from 408,961 individuals in UK Biobank. In total, we identified 5,885 genome-wide significant variants, of which, 424 variants (7.2%) are rare variants with MAF < 0.01.
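As an illustration of the proportional odds idea named in this abstract, the following minimal Python sketch turns a linear predictor into category probabilities for an ordinal phenotype. It is not the POLMM implementation: it omits the random effect that a mixed model uses for sample relatedness, and the function names, the cutpoint values, and the sign convention `logit P(Y <= j) = cutpoint_j - eta` are assumptions made for the example.

```python
import math


def cumulative_probs(eta, cutpoints):
    """P(Y <= j) for each cutpoint, assuming logit P(Y <= j) = cutpoint_j - eta.

    eta is a hypothetical linear predictor (covariates plus genotype effect);
    cutpoints must be sorted in increasing order.
    """
    return [1.0 / (1.0 + math.exp(-(c - eta))) for c in cutpoints]


def category_probs(eta, cutpoints):
    """Probability of each of the len(cutpoints) + 1 ordered categories."""
    cumulative = cumulative_probs(eta, cutpoints) + [1.0]
    probs = []
    previous = 0.0
    for value in cumulative:
        # Each category probability is a difference of adjacent cumulative ones.
        probs.append(value - previous)
        previous = value
    return probs


if __name__ == "__main__":
    # Three ordered categories defined by two illustrative cutpoints.
    print(category_probs(0.0, [-1.0, 1.0]))
    # A larger linear predictor shifts probability toward higher categories.
    print(category_probs(1.0, [-1.0, 1.0]))
```

Because the same `eta` enters every cumulative logit, a single genotype coefficient shifts all category boundaries together, which is the "proportional odds" assumption.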


Subject(s)
Computer Simulation , Genome-Wide Association Study , Models, Genetic , Phenotype , Biological Specimen Banks , Child , Female , Humans , Male , Research Design , United Kingdom
4.
Brief Bioinform ; 23(2)2022 03 10.
Article in English | MEDLINE | ID: mdl-35037014

ABSTRACT

Optimal methods could effectively improve the accuracy of predicting and identifying candidate driver genes. Various computational methods based on mutational frequency, network and function approaches have been developed to identify mutation driver genes in cancer genomes. However, a comprehensive evaluation of the performance levels of network-, function- and frequency-based methods is lacking. In the present study, we assessed and compared eight performance criteria for eight network-based, one function-based and three frequency-based algorithms using eight benchmark datasets. Under different conditions, the performance of approaches varied in terms of network, measurement and sample size. The frequency-based driverMAPS and network-based HotNet2 methods showed the best overall performance. Network-based algorithms using protein-protein interaction networks outperformed the function- and the frequency-based approaches. Precision, F1 score and Matthews correlation coefficient were low for most approaches. Thus, most of these algorithms require stringent cutoffs to correctly distinguish driver and non-driver genes. We constructed a website named Cancer Driver Catalog (http://159.226.67.237/sun/cancer_driver/), wherein we integrated the gene scores predicted by the foregoing software programs. This resource provides valuable guidance for cancer researchers and clinical oncologists prioritizing cancer driver gene candidates by using an optimal tool.
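For readers unfamiliar with the criteria named above, precision, F1 score, and the Matthews correlation coefficient can all be computed from a confusion matrix of predicted versus benchmark driver genes. The Python sketch below uses the standard textbook definitions; the function name and the example counts are illustrative assumptions and do not come from the study.

```python
import math


def classification_metrics(tp, fp, tn, fn):
    """Precision, recall, F1 and MCC from confusion-matrix counts.

    tp/fp/tn/fn are hypothetical counts of genes called driver or
    non-driver, compared against a benchmark list.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2.0 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "mcc": mcc}


if __name__ == "__main__":
    # Illustrative counts only: 100 benchmark drivers among 1,000 genes.
    print(classification_metrics(tp=50, fp=50, tn=850, fn=50))
```

With many more non-driver than driver genes, accuracy alone can look high while precision, F1, and MCC stay low, which is one reason these three criteria are informative for driver-gene benchmarks.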


Subject(s)
Neoplasms , Oncogenes , Algorithms , Computational Biology/methods , Gene Regulatory Networks , Humans , Mutation , Neoplasms/genetics , Software
5.
Nucleic Acids Res ; 50(D1): D72-D82, 2022 01 07.
Article in English | MEDLINE | ID: mdl-34792166

ABSTRACT

Rapid advances in high-throughput sequencing technologies have led to the discovery of thousands of extrachromosomal circular DNAs (eccDNAs) in the human genome. Loss-of-function experiments are difficult to conduct on circular and linear chromosomes, as they usually overlap. Hence, it is challenging to interpret the molecular functions of eccDNAs. Here, we present CircleBase (http://circlebase.maolab.org), an integrated resource and analysis platform used to curate and interpret eccDNAs in multiple cell types. CircleBase identifies putative functional eccDNAs by incorporating sequencing datasets, computational predictions, and manual annotations. It classifies them into six sections including targeting genes, epigenetic regulations, regulatory elements, chromatin accessibility, chromatin interactions, and genetic variants. The eccDNA targeting and regulatory networks are displayed by informative visualization tools and then prioritized. Functional enrichment analyses revealed that the top-ranked cancer cell eccDNAs were enriched in oncogenic pathways such as the Ras and PI3K-Akt signaling pathways. In contrast, eccDNAs from healthy individuals were not significantly enriched. CircleBase provides a user-friendly interface for searching, browsing, and analyzing eccDNAs in various cell/tissue types. Thus, it is useful to screen for potential functional eccDNAs and interpret their molecular mechanisms in human cancers and other diseases.


Subject(s)
Chromosomes/genetics , DNA, Circular/genetics , Databases, Genetic , Extrachromosomal Inheritance/genetics , Cell Lineage/genetics , Cytoplasm/genetics , Genome, Human/genetics , High-Throughput Nucleotide Sequencing , Humans
6.
Am J Hum Genet ; 107(2): 222-233, 2020 08 06.
Article in English | MEDLINE | ID: mdl-32589924

ABSTRACT

With increasing biobanking efforts connecting electronic health records and national registries to germline genetics, the time-to-event data analysis has attracted increasing attention in the genetics studies of human diseases. In time-to-event data analysis, the Cox proportional hazards (PH) regression model is one of the most used approaches. However, existing methods and tools are not scalable when analyzing a large biobank with hundreds of thousands of samples and endpoints, and they are not accurate when testing low-frequency and rare variants. Here, we propose a scalable and accurate method, SPACox (a saddlepoint approximation implementation based on the Cox PH regression model), that is applicable for genome-wide scale time-to-event data analysis. SPACox requires fitting a Cox PH regression model only once across the genome-wide analysis and then uses a saddlepoint approximation (SPA) to calibrate the test statistics. Simulation studies show that SPACox is 76-252 times faster than other existing alternatives, such as gwasurvivr, 185-511 times faster than the standard Wald test, and more than 6,000 times faster than the Firth correction and can control type I error rates at the genome-wide significance level regardless of minor allele frequencies. Through the analysis of UK Biobank inpatient data of 282,871 white British European ancestry samples, we show that SPACox can efficiently analyze large sample sizes and accurately control type I error rates. We identified 611 loci associated with time-to-event phenotypes of 12 common diseases, of which 38 loci would be missed within a logistic regression framework with a binary phenotype defined as event occurrence status during the follow-up period.


Subject(s)
Genome-Wide Association Study/methods , Biological Specimen Banks , Case-Control Studies , Data Analysis , Gene Frequency/genetics , Humans , Logistic Models , Phenotype , Proportional Hazards Models , Sample Size , United Kingdom , White People/genetics
7.
Am J Hum Genet ; 106(1): 3-12, 2020 01 02.
Article in English | MEDLINE | ID: mdl-31866045

ABSTRACT

In biobank data analysis, most binary phenotypes have unbalanced case-control ratios, and this can cause inflation of type I error rates. Recently, a saddle point approximation (SPA) based single-variant test has been developed to provide an accurate and scalable method to test for associations of such phenotypes. For gene- or region-based multiple-variant tests, a few methods exist that can adjust for unbalanced case-control ratios; however, these methods are either less accurate when case-control ratios are extremely unbalanced or not scalable for large data analyses. To address these problems, we propose SKAT- and SKAT-O-type region-based tests; in these tests, the single-variant score statistic is calibrated based on SPA and efficient resampling (ER). Through simulation studies, we show that the proposed method provides well-calibrated p values. In contrast, when the case-control ratio is 1:99, the unadjusted approach has greatly inflated type I error rates (90 times that of exome-wide sequencing α = 2.5 × 10⁻⁶). Additionally, the proposed method has similar computation time to the unadjusted approaches and is scalable for large sample data. In our application, the UK Biobank whole-exome sequence data analysis of 45,596 unrelated European samples and 791 PheCode phenotypes identified 10 rare-variant associations with p value < 10⁻⁷, including the associations between JAK2 and myeloproliferative disease, HOXB13 and cancer of prostate, and F11 and congenital coagulation defects. All analysis summary results are publicly available through a web-based visual server, and this availability can help facilitate the identification of the genetic basis of complex diseases.
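To make the saddlepoint idea concrete, here is a minimal Python sketch of a textbook saddlepoint tail approximation for a single-variant score statistic S = Σ gᵢ(Yᵢ − μᵢ) with Yᵢ ~ Bernoulli(μᵢ). It is not the authors' code and does not include the efficient resampling or the region-based SKAT aggregation described in the abstract. The function names, the Barndorff-Nielsen form of the tail formula, the bisection solver, and the fallback to a normal tail near the mean are all assumptions chosen for illustration.

```python
import math


def normal_upper_tail(z):
    """P(Z > z) for a standard normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def cgf_terms(t, g, mu):
    """K(t), K'(t), K''(t) for S = sum g_i * (Y_i - mu_i), Y_i ~ Bernoulli(mu_i)."""
    k0 = k1 = k2 = 0.0
    for gi, mi in zip(g, mu):
        e = math.exp(gi * t)
        denom = 1.0 - mi + mi * e
        k0 += math.log(denom) - t * gi * mi
        k1 += gi * mi * e / denom - gi * mi
        k2 += gi * gi * mi * (1.0 - mi) * e / (denom * denom)
    return k0, k1, k2


def spa_upper_tail(s, g, mu, tol=1e-10):
    """Saddlepoint approximation to P(S > s); g and mu are hypothetical inputs."""
    var = sum(gi * gi * mi * (1.0 - mi) for gi, mi in zip(g, mu))
    z = s / math.sqrt(var)
    if abs(z) < 0.1:
        # The formula is numerically unstable at the mean; use the normal tail.
        return normal_upper_tail(z)
    # Bracket the saddlepoint t solving K'(t) = s (K' is increasing in t).
    lo, hi = -1.0, 1.0
    for _ in range(6):
        if cgf_terms(lo, g, mu)[1] < s < cgf_terms(hi, g, mu)[1]:
            break
        lo *= 2.0
        hi *= 2.0
    else:
        raise ValueError("s is too extreme for this simple bracketing scheme")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if cgf_terms(mid, g, mu)[1] < s:
            lo = mid
        else:
            hi = mid
    t = 0.5 * (lo + hi)
    k0, _, k2 = cgf_terms(t, g, mu)
    w = math.copysign(math.sqrt(2.0 * (t * s - k0)), t)
    v = t * math.sqrt(k2)
    return normal_upper_tail(w + math.log(v / w) / w)


if __name__ == "__main__":
    # Illustrative unbalanced setting: 1,000 carriers, fitted risk 1% each.
    g = [1.0] * 1000
    mu = [0.01] * 1000
    s = 15.0
    print("normal:", normal_upper_tail(s / math.sqrt(sum(m * (1 - m) for m in mu))))
    print("SPA:   ", spa_upper_tail(s, g, mu))
```

In a skewed setting such as this one the normal approximation understates the upper-tail probability, which is the kind of miscalibration that motivates SPA-based tests for unbalanced case-control ratios.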


Subject(s)
Biological Specimen Banks , Exome Sequencing/methods , Exome/genetics , Genome-Wide Association Study , Phenomics , Polymorphism, Single Nucleotide , Case-Control Studies , Computer Simulation , Humans , Numerical Analysis, Computer-Assisted , Phenotype , United Kingdom
8.
Bioinformatics ; 38(18): 4337-4343, 2022 09 15.
Article in English | MEDLINE | ID: mdl-35876838

ABSTRACT

MOTIVATION: In the genome-wide association analysis of population-based biobanks, most diseases have low prevalence, which results in low detection power. One approach to tackle the problem is using family disease history, yet existing methods are unable to address type I error inflation induced by increased correlation of phenotypes among closely related samples, as well as unbalanced phenotypic distribution. RESULTS: We propose a new method for genetic association test with family disease history, mixed-model-based Test with Adjusted Phenotype and Empirical saddlepoint approximation, which controls for increased phenotype correlation by adopting a two-variance-component mixed model, accounts for case-control imbalance by using empirical saddlepoint approximation, and is flexible to incorporate any existing adjusted phenotypes, such as phenotypes from the LT-FH method. We show through simulation studies and analysis of UK Biobank data of white British samples and the Korean Genome and Epidemiology Study of Korean samples that the proposed method is robust and yields better calibration compared to existing methods while gaining power for detection of variant-phenotype associations. AVAILABILITY AND IMPLEMENTATION: The summary statistics and code generated in this study are available at https://github.com/styvon/TAPE. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Genome-Wide Association Study , Polymorphism, Single Nucleotide , Genome-Wide Association Study/methods , Case-Control Studies , Phenotype , Computer Simulation
9.
Blood ; 137(2): 155-167, 2021 01 14.
Article in English | MEDLINE | ID: mdl-33156908

ABSTRACT

The histone mark H3K27me3 and its reader/writer polycomb repressive complex 2 (PRC2) mediate widespread transcriptional repression in stem and progenitor cells. Mechanisms that regulate this activity are critical for hematopoietic development but are poorly understood. Here we show that the E3 ubiquitin ligase F-box only protein 11 (FBXO11) relieves PRC2-mediated repression during erythroid maturation by targeting its newly identified substrate bromo adjacent homology domain-containing 1 (BAHD1), an H3K27me3 reader that recruits transcriptional corepressors. Erythroblasts lacking FBXO11 are developmentally delayed, with reduced expression of maturation-associated genes, most of which harbor bivalent histone marks at their promoters. In FBXO11-/- erythroblasts, these gene promoters bind BAHD1 and fail to recruit the erythroid transcription factor GATA1. The BAHD1 complex interacts physically with PRC2, and depletion of either component restores FBXO11-deficient erythroid gene expression. Our studies identify BAHD1 as a novel effector of PRC2-mediated repression and reveal how a single E3 ubiquitin ligase eliminates PRC2 repression at many developmentally poised bivalent genes during erythropoiesis.


Subject(s)
Chromosomal Proteins, Non-Histone/metabolism , Erythropoiesis/physiology , F-Box Proteins/metabolism , Gene Expression Regulation/physiology , Polycomb Repressive Complex 2/metabolism , Protein-Arginine N-Methyltransferases/metabolism , Cell Line , Erythroblasts/metabolism , Humans , Proteolysis
10.
Am J Hum Genet ; 105(6): 1182-1192, 2019 12 05.
Article in English | MEDLINE | ID: mdl-31735295

ABSTRACT

The etiology of most complex diseases involves genetic variants, environmental factors, and gene-environment interaction (G × E) effects. Compared with marginal genetic association studies, G × E analysis requires more samples and detailed measure of environmental exposures, and this limits the possible discoveries. Large-scale population-based biobanks with detailed phenotypic and environmental information, such as UK-Biobank, can be ideal resources for identifying G × E effects. However, due to the large computation cost and the presence of case-control imbalance, existing methods often fail. Here we propose a scalable and accurate method, SPAGE (SaddlePoint Approximation implementation of G × E analysis), that is applicable for genome-wide scale phenome-wide G × E studies. SPAGE fits a genotype-independent logistic model only once across the genome-wide analysis in order to reduce computation cost, and SPAGE uses a saddlepoint approximation (SPA) to calibrate the test statistics for analysis of phenotypes with unbalanced case-control ratios. Simulation studies show that SPAGE is 33-79 times faster than the Wald test and 72-439 times faster than the Firth's test, and SPAGE can control type I error rates at the genome-wide significance level even when case-control ratios are extremely unbalanced. Through the analysis of UK-Biobank data of 344,341 white British European-ancestry samples, we show that SPAGE can efficiently analyze large samples while controlling for unbalanced case-control ratios.


Subject(s)
Biological Specimen Banks , Gene-Environment Interaction , Genetic Diseases, Inborn/genetics , Genome-Wide Association Study , Polymorphism, Single Nucleotide , Quantitative Trait, Heritable , Case-Control Studies , Female , Genetic Diseases, Inborn/epidemiology , Humans , Logistic Models , Male , Phenomics , Phenotype , United Kingdom/epidemiology
11.
Biostatistics ; 21(1): 33-49, 2020 01 01.
Article in English | MEDLINE | ID: mdl-30007308

ABSTRACT

It has been well acknowledged that methods for secondary trait (ST) association analyses under a case-control design (ST$_{\text{CC}}$) should carefully consider the sampling process to avoid biased risk estimates. A similar situation also exists in the extreme phenotype sequencing (EPS) designs, which is to select subjects with extreme values of continuous primary phenotype for sequencing. EPS designs are commonly used in modern epidemiological and clinical studies such as the well-known National Heart, Lung, and Blood Institute Exome Sequencing Project. Although naïve generalized regression or ST$_{\text{CC}}$ method could be applied, their validity is questionable due to difference in statistical designs. Herein, we propose a general prospective likelihood framework to perform association testing for binary and continuous STs under EPS designs (STEPS), which can also incorporate covariates and interaction terms. We provide a computationally efficient and robust algorithm to obtain the maximum likelihood estimates. We also present two empirical mathematical formulas for power/sample size calculations to facilitate planning of binary/continuous STs association analyses under EPS designs. Extensive simulations and application to a genome-wide association study of benign ethnic neutropenia under an EPS design demonstrate the superiority of STEPS over all its alternatives above.


Subject(s)
Genetic Association Studies/methods , Models, Theoretical , Computer Simulation , Humans , Likelihood Functions , Phenotype
12.
Blood ; 133(18): 1927-1942, 2019 05 02.
Article in English | MEDLINE | ID: mdl-30782612

ABSTRACT

Although many recent studies describe the emergence and prevalence of "clonal hematopoiesis of indeterminate potential" in aged human populations, a systematic analysis of the numbers of clones supporting steady-state hematopoiesis throughout mammalian life is lacking. Previous efforts relied on transplantation of "barcoded" hematopoietic stem cells (HSCs) to track the contribution of HSC clones to reconstituted blood. However, ex vivo manipulation and transplantation alter HSC function and thus may not reflect the biology of steady-state hematopoiesis. Using a noninvasive in vivo color-labeling system, we report the first comprehensive analysis of the changing global clonal complexity of steady-state hematopoiesis during the natural murine lifespan. We observed that the number of clones (ie, clonal complexity) supporting the major blood and bone marrow hematopoietic compartments declines with age by ∼30% and ∼60%, respectively. Aging dramatically reduced HSC in vivo-repopulating activity and lymphoid potential while increasing functional heterogeneity. Continuous challenge of the hematopoietic system by serial transplantation provoked the clonal collapse of both young and aged hematopoietic systems. Whole-exome sequencing of serially transplanted aged and young hematopoietic clones confirmed oligoclonal hematopoiesis and revealed mutations in at least 27 genes, including nonsense, missense, and deletion mutations in Bcl11b, Hist1h2ac, Npy2r, Notch3, Ptprr, and Top2b.


Subject(s)
Aging/physiology , Clone Cells/cytology , Hematopoiesis/physiology , Hematopoietic Stem Cells/cytology , Animals , Hematopoietic Stem Cell Transplantation , Mice
13.
Stem Cells ; 36(6): 943-950, 2018 06.
Article in English | MEDLINE | ID: mdl-29430853

ABSTRACT

Hematopoietic stem and progenitor cells (HSPCs) are necessary for life-long blood production and replenishment of the hematopoietic system during stress. We recently reported that nuclear factor I/X (Nfix) promotes HSPC survival post-transplant. Here, we report that ectopic expression of Nfix in primary mouse HSPCs extends their ex vivo culture from about 20 to 40 days. HSPCs overexpressing Nfix display hypersensitivity to supportive cytokines and reduced apoptosis when subjected to cytokine deprivation relative to controls. Ectopic Nfix resulted in elevated levels of c-Mpl transcripts and cell surface protein on primary murine HSPCs as well as increased phosphorylation of STAT5, which is known to be activated down-stream of c-MPL. Blocking c-MPL signaling by removal of thrombopoietin or addition of a c-MPL neutralizing antibody negated the antiapoptotic effect of Nfix overexpression on cultured HSPCs. Furthermore, NFIX was capable of binding to and transcriptionally activating a proximal c-Mpl promoter fragment. In sum, these data suggest that NFIX-mediated upregulation of c-Mpl transcription can protect primitive hematopoietic cells from stress ex vivo. Stem Cells 2018;36:943-950.


Subject(s)
Hematopoietic Stem Cells/metabolism , NFI Transcription Factors/metabolism , Receptors, Thrombopoietin/metabolism , Animals , Humans , Mice , Signal Transduction
14.
Methods ; 145: 67-75, 2018 08 01.
Article in English | MEDLINE | ID: mdl-29803781

ABSTRACT

Genome-wide association studies have discovered many biologically important associations of genes with phenotypes. Typically, genome-wide association analyses formally test the association of each genetic feature (SNP, CNV, etc) with the phenotype of interest and summarize the results with multiplicity-adjusted p-values. However, very small p-values only provide evidence against the null hypothesis of no association without indicating which biological model best explains the observed data. Correctly identifying a specific biological model may improve the scientific interpretation and can be used to more effectively select and design a follow-up validation study. Thus, statistical methodology to identify the correct biological model for a particular genotype-phenotype association can be very useful to investigators. Here, we propose a general statistical method to summarize how accurately each of five biological models (null, additive, dominant, recessive, co-dominant) represents the data observed for each variant in a GWAS study. We show that the new method stringently controls the false discovery rate and asymptotically selects the correct biological model. Simulations of two-stage discovery-validation studies show that the new method has these properties and that its validation power is similar to or exceeds that of simple methods that use the same statistical model for all SNPs. Example analyses of three data sets also highlight these advantages of the new method. An R package is freely available at www.stjuderesearch.org/site/depts/biostats/maew.
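The five biological models listed in this abstract correspond to different ways of coding a genotype as regression covariates. The Python sketch below shows the conventional codings for a biallelic variant with minor-allele counts 0, 1, and 2. The abstract does not spell out its codings, so the mapping and the function name are assumptions, and the sketch does not implement the model-selection or false-discovery-rate procedure of the paper.

```python
def code_genotype(minor_allele_count, model):
    """Return the covariate tuple for one genotype under a given model.

    minor_allele_count is 0, 1 or 2; model names follow the abstract, but
    the codings themselves are the conventional ones, assumed here.
    """
    g = minor_allele_count
    if g not in (0, 1, 2):
        raise ValueError("minor_allele_count must be 0, 1 or 2")
    if model == "null":
        return ()  # no genotype term at all
    if model == "additive":
        return (g,)  # each extra minor allele adds the same effect
    if model == "dominant":
        return (1 if g >= 1 else 0,)  # one copy is enough
    if model == "recessive":
        return (1 if g == 2 else 0,)  # two copies are required
    if model == "co-dominant":
        # Separate indicators for heterozygotes and minor homozygotes.
        return (1 if g == 1 else 0, 1 if g == 2 else 0)
    raise ValueError("unknown model: " + model)


if __name__ == "__main__":
    for model in ("null", "additive", "dominant", "recessive", "co-dominant"):
        print(model, [code_genotype(g, model) for g in (0, 1, 2)])
```

Under these codings the additive, dominant, and recessive models each spend one degree of freedom on the genotype, while the co-dominant model spends two.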


Subject(s)
Genome-Wide Association Study/methods , Models, Genetic , Polymorphism, Genetic , Statistics as Topic , Humans
15.
Anal Chem ; 89(5): 2956-2963, 2017 03 07.
Article in English | MEDLINE | ID: mdl-28194965

ABSTRACT

Isobaric labeling quantification by mass spectrometry (MS) has emerged as a powerful technology for multiplexed large-scale protein profiling, but measurement accuracy in complex mixtures is confounded by the interference from coisolated ions, resulting in ratio compression. Here we report that the ratio compression can be essentially resolved by the combination of pre-MS peptide fractionation, MS2-based interference detection, and post-MS computational interference correction. To recapitulate the complexity of biological samples, we pooled tandem mass tag (TMT)-labeled Escherichia coli peptides at 1:3:10 ratios and added in ∼20-fold more rat peptides as background, followed by the analysis of two-dimensional liquid chromatography (LC)-MS/MS. Systematic investigation shows that quantitative interference was impacted by LC fractionation depth, MS isolation window, and peptide loading amount. Exhaustive fractionation (320 × 4 h) can nearly eliminate the interference and achieve results comparable to the MS3-based method. Importantly, the interference in MS2 scans can be estimated by the intensity of contaminated y1 product ions, and we thus developed an algorithm to correct reporter ion ratios of tryptic peptides. Our data indicate that intermediate fractionation (40 × 2 h) and y1 ion-based correction allow accurate and deep TMT profiling of more than 10 000 proteins, which represents a straightforward and affordable strategy in isobaric labeling proteomics.


Subject(s)
Chromatography, High Pressure Liquid/methods , Peptides/analysis , Tandem Mass Spectrometry/methods , Algorithms , Animals , Brain/metabolism , Escherichia coli/metabolism , Escherichia coli Proteins/metabolism , Hydrogen-Ion Concentration , Ions/chemistry , Peptides/metabolism , Rats
16.
Ann Hum Genet ; 79(4): 294-309, 2015 Jul.
Article in English | MEDLINE | ID: mdl-25959545

ABSTRACT

In genetic association studies of an ordered categorical phenotype, it is usual either to regroup the multiple categories of the phenotype into two categories and apply logistic regression (LG), or to apply ordered logistic (oLG) or ordered probit (oPRB) regression, which account for the ordinal nature of the phenotype. However, these methods may lose statistical power or fail to control type I error, because of their model assumptions and/or unstable parameter estimation algorithms, when the genetic variant is rare or the sample size is limited. To solve this problem, we propose a set-valued (SV) system model to identify genetic variants associated with an ordinal categorical phenotype. We couple this model with an SV system identification algorithm to identify all the key system parameters. Simulations and two real data analyses show that SV and LG accurately controlled the type I error rate even at a significance level of 10⁻⁶, whereas oLG and oPRB did not in some cases. LG had significantly less power than the other three methods because it disregards the ordinal nature of the phenotype, and SV had similar or greater power than oLG and oPRB. We argue that SV should be employed in genetic association studies of ordered categorical phenotypes.


Subject(s)
Algorithms , Genetic Association Studies , Models, Genetic , Computer Simulation , Humans , Logistic Models , Neoplasm, Residual/genetics , Precursor Cell Lymphoblastic Leukemia-Lymphoma/genetics
17.
Hum Hered ; 78(2): 104-16, 2014.
Article in English | MEDLINE | ID: mdl-25096228

ABSTRACT

We propose in this paper a set-valued (SV) system model, which is a generalized form of logistic (LG) and Probit (Probit) regression, to be considered as a method for discovering genetic variants, especially rare genetic variants in next-generation sequencing studies, for a binary phenotype. We propose a new SV system identification method to estimate all underlying key system parameters for the Probit model and compare it with the LG model in the setting of genetic association studies. Across an extensive series of simulation studies, the Probit method maintained type I error control and had similar or greater power than the LG method, which is robust to different distributions of noise: logistic, normal, or t distributions. Additionally, the Probit association parameter estimate was 2.7-46.8-fold less variable than the LG log-odds ratio association parameter estimate. Less variability in the association parameter estimate translates to greater power and robustness across the spectrum of minor allele frequencies (MAFs), and these advantages are the most pronounced for rare variants. For instance, in a simulation that generated data from an additive logistic model with an odds ratio of 7.4 for a rare single nucleotide polymorphism with a MAF of 0.005 and a sample size of 2,300, the Probit method had 60% power whereas the LG method had 25% power at the α = 10⁻⁶ level. Consistent with these simulation results, the set of variants identified by the LG method was a subset of those identified by the Probit method in two example analyses. Thus, we suggest the Probit method may be a competitive alternative to the LG method in genetic association studies such as candidate gene, genome-wide, or next-generation sequencing studies for a binary phenotype.


Subject(s)
Genetic Association Studies/statistics & numerical data , Genetic Variation , Models, Statistical , Regression Analysis , Computer Simulation , DNA-Binding Proteins/genetics , Exome , Humans , Phenotype , Precursor Cell Lymphoblastic Leukemia-Lymphoma/genetics , Transcription Factors/genetics
18.
EBioMedicine ; 105: 105195, 2024 Jun 12.
Article in English | MEDLINE | ID: mdl-38870545

ABSTRACT

BACKGROUND: Response to antipsychotic drugs (APD) varies greatly among individuals and is affected by genetic factors. This study aims to demonstrate genome-wide associations between copy number variants (CNVs) and response to APD in patients with schizophrenia. METHODS: A total of 3030 patients of Han Chinese ethnicity were randomly assigned to APD treatment (aripiprazole, olanzapine, quetiapine, risperidone, ziprasidone, haloperidol or perphenazine) for six weeks. This study is a secondary data analysis. Percentage reduction on the Positive and Negative Syndrome Scale (PANSS) was used to assess APD efficacy, and a change of more than 50% was considered an APD response. Associations of CNV burden, gene sets, CNV loci and CNV break-points with APD efficacy were analysed. FINDINGS: A higher burden of CNV losses decreased the odds of 6-week APD response (OR = 0.66 [0.44, 0.98]). CNV losses in synaptic pathways involved in neurotransmitters were associated with the 2-week PANSS reduction rate. CNVs involved in sialylation (1p31.1 losses) and cellular metabolism (19q13.32 gains) were associated with the 6-week PANSS reduction rate at the genome-wide significance level. An additional 36 CNVs were associated with improvement in PANSS factors. The OR for 6-week APD response was 3.10 (95% CI: 1.33-7.19) for protective CNVs and 8.47 (95% CI: 1.92-37.43) for risk CNVs. CNVs interacted with the genetic risk score on APD efficacy (Beta = -1.53, SE = 0.66, P = 0.021). The area under the curve for discriminating 6-week APD response reached 80.45% (95% CI: 78.07%-82.82%). INTERPRETATION: Copy number variants contributed to poor APD efficacy, and the synaptic pathway involved in neurotransmitters was highlighted. FUNDING: National Natural Science Foundation of China, National Key R&D Program of China, China Postdoctoral Science Foundation.
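The response definition in the METHODS section (a PANSS change of more than 50%) can be written as a short calculation. The Python sketch below assumes the percentage is computed on raw PANSS totals relative to baseline; the abstract does not say whether the minimum possible score is subtracted first, so that choice, the function names, and the example scores are all assumptions.

```python
def panss_percent_reduction(baseline_total, endpoint_total):
    """Percentage reduction in PANSS total score relative to baseline.

    Assumes raw totals are used; the abstract does not specify whether a
    floor is subtracted before taking the ratio.
    """
    if baseline_total <= 0:
        raise ValueError("baseline_total must be positive")
    return 100.0 * (baseline_total - endpoint_total) / baseline_total


def is_responder(baseline_total, endpoint_total, threshold=50.0):
    """True when the reduction exceeds the threshold (more than 50% here)."""
    return panss_percent_reduction(baseline_total, endpoint_total) > threshold


if __name__ == "__main__":
    # Hypothetical scores for two patients at baseline and week 6.
    print(panss_percent_reduction(90, 40), is_responder(90, 40))
    print(panss_percent_reduction(90, 60), is_responder(90, 60))
```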

19.
Schizophr Bull ; 49(1): 208-217, 2023 01 03.
Article in English | MEDLINE | ID: mdl-36179110

ABSTRACT

BACKGROUND AND HYPOTHESIS: Complex schizophrenia symptoms were recently conceptualized as interactive symptoms within a network system. However, it remains unknown how a schizophrenia network changed during acute antipsychotic treatment. The present study aimed to evaluate the interactive change of schizophrenia symptoms under seven antipsychotics from individual time series. STUDY DESIGN: Data on 3030 schizophrenia patients were taken from a multicenter randomized clinical trial and used to estimate the partial correlation cross-sectional networks and longitudinal random slope networks based on multivariate multilevel model. Thirty symptoms assessed by The Positive and Negative Syndrome Scale clustered the networks. STUDY RESULTS: Five stable communities were detected in cross-sectional networks and random slope networks that describe symptoms change over time. Delusions, emotional withdrawal, and lack of spontaneity and flow of conversation featured as central symptoms, and conceptual disorganization, hostility, uncooperativeness, and difficulty in abstract thinking featured as bridge symptoms, all showing high centrality in the random slope network. Acute antipsychotic treatment changed the network structure (M-test = 0.116, P < .001) compared to baseline, and responsive subjects showed lower global strength after treatment (11.68 vs 14.18, S-test = 2.503, P < .001) compared to resistant subjects. Central symptoms and bridge symptoms kept higher centrality across random slope networks of different antipsychotics. Quetiapine treatment network showed improvement in excitement symptoms, the one featured as both central and bridge symptom. CONCLUSION: Our findings revealed the central symptoms, bridge symptoms, cochanging features, and individualized features under different antipsychotics of schizophrenia. This brings implications for future targeted drug development and search for pathophysiological mechanisms.


Subject(s)
Antipsychotic Agents , Schizophrenia , Humans , Antipsychotic Agents/pharmacology , Antipsychotic Agents/therapeutic use , Schizophrenia/drug therapy , Schizophrenia/diagnosis , Cross-Sectional Studies , Quetiapine Fumarate/therapeutic use
20.
Nat Genet ; 54(10): 1466-1469, 2022 10.
Article in English | MEDLINE | ID: mdl-36138231

ABSTRACT

Several biobanks, including UK Biobank (UKBB), are generating large-scale sequencing data. An existing method, SAIGE-GENE, performs well when testing variants with minor allele frequency (MAF) ≤ 1%, but inflation is observed in variance component set-based tests when restricting to variants with MAF ≤ 0.1% or 0.01%. Here, we propose SAIGE-GENE+ with greatly improved type I error control and computational efficiency to facilitate rare variant tests in large-scale data. We further show that incorporating multiple MAF cutoffs and functional annotations can improve power and thus uncover new gene-phenotype associations. In the analysis of UKBB whole exome sequencing data for 30 quantitative and 141 binary traits, SAIGE-GENE+ identified 551 gene-phenotype associations.


Subject(s)
Genome-Wide Association Study , Gene Frequency/genetics , Genome-Wide Association Study/methods , Phenotype , Exome Sequencing