Search | Virtual Health Library

1.

A powerful approach to identify replicable variants in genome-wide association studies.

Li, Yan; Lei, Haochen; Wen, Xiaoquan; Cao, Hongyuan.

Am J Hum Genet ; 111(5): 966-978, 2024 05 02.

Article in English | MEDLINE | ID: mdl-38701746

ABSTRACT

Replicability is the cornerstone of modern scientific research. Reliable identifications of genotype-phenotype associations that are significant in multiple genome-wide association studies (GWASs) provide stronger evidence for the findings. Current replicability analysis relies on the independence assumption among single-nucleotide polymorphisms (SNPs) and ignores the linkage disequilibrium (LD) structure. We show that such a strategy may produce either overly liberal or overly conservative results in practice. We develop an efficient method, ReAD, to detect replicable SNPs associated with the phenotype from two GWASs accounting for the LD structure. The local dependence structure of SNPs across two heterogeneous studies is captured by a four-state hidden Markov model (HMM) built on two sequences of p values. By incorporating information from adjacent locations via the HMM, our approach provides more accurate SNP significance rankings. ReAD is scalable, platform independent, and more powerful than existing replicability analysis methods with effective false discovery rate control. Through analysis of datasets from two asthma GWASs and two ulcerative colitis GWASs, we show that ReAD can identify replicable genetic loci that existing methods might otherwise miss.
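Illustrative note: the four-state HMM idea in this abstract can be sketched with a scaled forward-backward pass over paired p values. This is a minimal sketch, not the ReAD implementation. The state order, the Beta(a, 1) alternative density, and the hand-fixed transition and initial probabilities are all assumptions made for illustration; the actual method would estimate such quantities from the data.

```python
# States for each SNP: 0 = null in both studies, 1 = non-null in study 2 only,
# 2 = non-null in study 1 only, 3 = non-null in both (replicable).

def alt_density(p, a):
    """Hypothetical alternative density: Beta(a, 1) with a < 1 (mass near 0)."""
    return a * p ** (a - 1.0)

def normalize(v):
    s = sum(v)
    return [x / s for x in v]

def posterior_replicable(p1, p2, trans, init, a1=0.2, a2=0.2):
    """Posterior probability of state 3 at each position (p values must be > 0)."""
    n = len(p1)
    # Emission densities: null p values are uniform (density 1), and the two
    # studies are treated as independent given the hidden state.
    emit = []
    for x, y in zip(p1, p2):
        f1, f2 = alt_density(x, a1), alt_density(y, a2)
        emit.append([1.0, f2, f1, f1 * f2])
    # Forward pass, rescaled at every step to avoid underflow.
    fwd = [normalize([init[s] * emit[0][s] for s in range(4)])]
    for j in range(1, n):
        prev = fwd[-1]
        pred = [sum(prev[r] * trans[r][s] for r in range(4)) for s in range(4)]
        fwd.append(normalize([pred[s] * emit[j][s] for s in range(4)]))
    # Backward pass, also rescaled.
    bwd = [[1.0] * 4 for _ in range(n)]
    for j in range(n - 2, -1, -1):
        nxt = [emit[j + 1][s] * bwd[j + 1][s] for s in range(4)]
        bwd[j] = normalize(
            [sum(trans[r][s] * nxt[s] for s in range(4)) for r in range(4)]
        )
    # Combine and renormalize per position; keep the replicable state only.
    return [normalize([fwd[j][s] * bwd[j][s] for s in range(4)])[3] for j in range(n)]

# Hypothetical "sticky" chain: neighbouring SNPs tend to share a state.
trans = [[0.7 if r == s else 0.1 for s in range(4)] for r in range(4)]
init = [0.25] * 4
post = posterior_replicable([0.5, 1e-6, 1e-7, 0.6], [0.4, 1e-5, 1e-6, 0.7], trans, init)
```

One minus the returned posterior plays the role of a local false discovery rate for replicability, so positions can be ranked by it; neighbouring positions influence each other through the transition matrix, which is how adjacent information enters the ranking.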

Subject(s)

Asthma , Genome-Wide Association Study , Linkage Disequilibrium , Polymorphism, Single Nucleotide , Genome-Wide Association Study/methods , Humans , Asthma/genetics , Markov Chains , Colitis, Ulcerative/genetics , Reproducibility of Results , Phenotype , Genotype

2.

STAREG: Statistical replicability analysis of high throughput experiments with applications to spatial transcriptomic studies.

Li, Yan; Zhou, Xiang; Chen, Rui; Zhang, Xianyang; Cao, Hongyuan.

PLoS Genet ; 20(10): e1011423, 2024 Oct.

Article in English | MEDLINE | ID: mdl-39361716

ABSTRACT

Replicable signals from different yet conceptually related studies provide stronger scientific evidence and more powerful inference. We introduce STAREG, a statistical method for replicability analysis of high throughput experiments, and apply it to analyze spatial transcriptomic studies. STAREG uses summary statistics from multiple studies of high throughput experiments and models the joint distribution of p-values accounting for the heterogeneity of different studies. It effectively controls the false discovery rate (FDR) and has higher power by information borrowing. Moreover, it provides different rankings of important genes. With the EM algorithm in combination with pool-adjacent-violator-algorithm (PAVA), STAREG is scalable to datasets with millions of genes without any tuning parameters. Analyzing two pairs of spatially resolved transcriptomic datasets, we are able to make biological discoveries that otherwise cannot be obtained by using existing methods.

Subject(s)

Algorithms , Gene Expression Profiling , Transcriptome , Gene Expression Profiling/methods , Transcriptome/genetics , Humans , Reproducibility of Results , Animals , Models, Statistical

3.

Combining protein sequences and structures with transformers and equivariant graph neural networks to predict protein function.

Boadu, Frimpong; Cao, Hongyuan; Cheng, Jianlin.

Bioinformatics ; 39(39 Suppl 1): i318-i325, 2023 06 30.

Article in English | MEDLINE | ID: mdl-37387145

ABSTRACT

MOTIVATION: Millions of protein sequences have been generated by numerous genome and transcriptome sequencing projects. However, experimentally determining the function of the proteins is still a time-consuming, low-throughput, and expensive process, leading to a large protein sequence-function gap. Therefore, it is important to develop computational methods to accurately predict protein function to fill the gap. Even though many methods have been developed to use protein sequences as input to predict function, far fewer methods leverage protein structures in protein function prediction because there was a lack of accurate protein structures for most proteins until recently. RESULTS: We developed TransFun, a method using a transformer-based protein language model and 3D-equivariant graph neural networks to distill information from both protein sequences and structures to predict protein function. It extracts feature embeddings from protein sequences using a pre-trained protein language model (ESM) via transfer learning and combines them with 3D structures of proteins predicted by AlphaFold2 through equivariant graph neural networks. Benchmarked on the CAFA3 test dataset and a new test dataset, TransFun outperforms several state-of-the-art methods, indicating that the language model and 3D-equivariant graph neural networks are effective methods to leverage protein sequences and structures to improve protein function prediction. Combining TransFun predictions and sequence similarity-based predictions can further increase prediction accuracy. AVAILABILITY AND IMPLEMENTATION: The source code of TransFun is available at https://github.com/jianlin-cheng/TransFun.

Subject(s)

Benchmarking , Language , Amino Acid Sequence , Neural Networks, Computer , Software

4.

JUMP: replicability analysis of high-throughput experiments with applications to spatial transcriptomic studies.

Lyu, Pengfei; Li, Yan; Wen, Xiaoquan; Cao, Hongyuan.

Bioinformatics ; 39(6)2023 06 01.

Article in English | MEDLINE | ID: mdl-37279733

ABSTRACT

MOTIVATION: Replicability is the cornerstone of scientific research. The current statistical method for high-dimensional replicability analysis either cannot control the false discovery rate (FDR) or is too conservative. RESULTS: We propose a statistical method, JUMP, for the high-dimensional replicability analysis of two studies. The input is a high-dimensional paired sequence of p-values from two studies and the test statistic is the maximum of p-values of the pair. JUMP uses four states of the p-value pairs to indicate whether they are null or non-null. Conditional on the hidden states, JUMP computes the cumulative distribution function of the maximum of p-values for each state to conservatively approximate the probability of rejection under the composite null of replicability. JUMP estimates unknown parameters and uses a step-up procedure to control FDR. By incorporating different states of composite null, JUMP achieves a substantial power gain over existing methods while controlling the FDR. Analyzing two pairs of spatially resolved transcriptomic datasets, JUMP makes biological discoveries that otherwise cannot be obtained by using existing methods. AVAILABILITY AND IMPLEMENTATION: An R package JUMP implementing the JUMP method is available on CRAN (https://CRAN.R-project.org/package=JUMP).
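Illustrative note: the max-p statistic and step-up rule described in this abstract can be sketched as follows. This is a simplified sketch, not the code of the JUMP R package. It assumes independent, uniform null p-values, so that the maximum of a pair is below t with probability t² when both are null and at most t when exactly one is null. The state proportions `xi00`, `xi01`, and `xi10` are passed in as hypothetical known values, whereas the method itself estimates them.

```python
def maxp_step_up(p1, p2, xi00, xi01, xi10, alpha=0.05):
    """Flag pairs as replicable using the maximum p-value of each pair.

    xi00, xi01, xi10: assumed proportions of pairs that are null in both
    studies, null in study 1 only, and null in study 2 only.
    """
    pmax = [max(a, b) for a, b in zip(p1, p2)]
    m = len(pmax)
    order = sorted(range(m), key=lambda i: pmax[i])
    last = 0  # largest k whose estimated FDR is within the target
    for k, i in enumerate(order, start=1):
        t = pmax[i]
        # Conservative probability that a composite-null pair has max p <= t.
        null_prob = xi00 * t * t + (xi01 + xi10) * t
        if m * null_prob / k <= alpha:
            last = k
    rejected = [False] * m
    for i in order[:last]:
        rejected[i] = True
    return rejected

flags = maxp_step_up(
    [1e-6, 1e-5, 0.5, 0.9],
    [1e-6, 0.8, 1e-5, 0.7],
    xi00=0.7, xi01=0.1, xi10=0.1,
)
```

Only the first pair is small in both studies, so it is the only one flagged; pairs that are significant in just one study have a large maximum and are not rejected. Splitting the composite null into its states is what makes the bound less conservative than using t for every null pair.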

Subject(s)

Gene Expression Profiling , Transcriptome , Gene Expression Profiling/methods

5.

Proposal for collinear integrated acousto-optic tunable filters featuring ultrawide tuning ranges and multi-band operations.

Pan, Bingcheng; Cao, Hongyuan; Li, Huan; Dai, Daoxin.

Opt Express ; 30(14): 24747-24761, 2022 Jul 04.

Article in English | MEDLINE | ID: mdl-36237021

ABSTRACT

Integrated optical tunable filters are key components for a wide spectrum of applications, including optical communications and interconnects, spectral analysis, and tunable light sources, among others. Compared with their thermo-optic counterparts, integrated acousto-optic (AO) tunable filters provide a unique approach to achieve superior performance, including ultrawide continuous tuning ranges of hundreds of nm, low power consumption of sub-mW and fast tuning speed of sub-µs. Based on suspended one-dimensional (1D) AO waveguides in the collinear configuration, we propose and theoretically investigate an innovative family of integrated AO tunable filters (AOTFs) on thin-film lithium niobate. The AO waveguides perform as tunable wavelength-selective narrow-band polarization rotators, where highly efficient conversion between co-propagating TE0 and TM0 modes is enabled by the torsional acoustic A1 mode, which can be selectively excited by a novel antisymmetric wavefront interdigital transducer. Furthermore, we systematically and quantitatively explore the possibilities of exciting modulated acoustic waves, which contain multiple frequency components, along the AO waveguide to achieve independently reconfigurable multi-band operations, with tunable time-variant spectral shapes. By incorporating a complete set of ultrawide-band polarization-handling components, we have proposed and theoretically investigated several representative monolithic AOTF configurations, featuring different arrangements of single or cascaded identical AO waveguides. One of the present AOTF designs exhibits a theoretical linewidth of ~8 nm (~4 nm), a sidelobe suppression ratio of ~75 dB, and theoretically no excess loss at the center wavelength of 1550 nm (1310 nm), with an ultrawide tuning range of 1.25-1.65 µm (from O-band to L-band), a fast tuning speed of 0.14 µs, and a low power consumption of a few mW.

6.

OPTIMAL FALSE DISCOVERY RATE CONTROL FOR LARGE SCALE MULTIPLE TESTING WITH AUXILIARY INFORMATION.

Cao, Hongyuan; Chen, Jun; Zhang, Xianyang.

Ann Stat ; 50(2): 807-857, 2022 Apr.

Article in English | MEDLINE | ID: mdl-37138896

ABSTRACT

Large-scale multiple testing is a fundamental problem in high dimensional statistical inference. It is increasingly common that various types of auxiliary information, reflecting the structural relationship among the hypotheses, are available. Exploiting such auxiliary information can boost statistical power. To this end, we propose a framework based on a two-group mixture model with varying probabilities of being null for different hypotheses a priori, where a shape-constrained relationship is imposed between the auxiliary information and the prior probabilities of being null. An optimal rejection rule is designed to maximize the expected number of true positives when average false discovery rate is controlled. Focusing on the ordered structure, we develop a robust EM algorithm to estimate the prior probabilities of being null and the distribution of p-values under the alternative hypothesis simultaneously. We show that the proposed method has better power than state-of-the-art competitors while controlling the false discovery rate, both empirically and theoretically. Extensive simulations demonstrate the advantage of the proposed method. Datasets from genome-wide association studies are used to illustrate the new methodology.
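Illustrative note: under a two-group mixture with hypothesis-specific prior null probabilities, a common way to build a rejection rule is to rank hypotheses by local false discovery rate and reject while the running average stays below the target level. The sketch below shows that generic rule only; it is not claimed to be the exact procedure of the paper. The prior null probabilities and the Beta(a, 1) alternative density are hypothetical inputs, and the EM estimation step described in the abstract is omitted.

```python
def beta_alt_density(p, a=0.2):
    """Hypothetical alternative p-value density: Beta(a, 1), a < 1."""
    return a * p ** (a - 1.0)

def lfdr_rejections(pvals, pi0, alt_density=beta_alt_density, alpha=0.05):
    """Return (rejected flags, local FDR values); p-values must be > 0.

    pi0[i] is the assumed prior probability that hypothesis i is null.
    """
    lfdr = []
    for p, w in zip(pvals, pi0):
        null_part = w * 1.0  # uniform null density
        alt_part = (1.0 - w) * alt_density(p)
        lfdr.append(null_part / (null_part + alt_part))
    order = sorted(range(len(lfdr)), key=lambda i: lfdr[i])
    running_sum, last = 0.0, 0
    for k, i in enumerate(order, start=1):
        running_sum += lfdr[i]
        # The running mean of the k smallest local FDRs estimates the FDR
        # of rejecting those k hypotheses.
        if running_sum / k <= alpha:
            last = k
    rejected = [False] * len(lfdr)
    for i in order[:last]:
        rejected[i] = True
    return rejected, lfdr

rejected, lfdr = lfdr_rejections([1e-8, 0.5, 0.9], [0.5, 0.9, 0.9])
```

Auxiliary information enters through `pi0`: a hypothesis with a lower prior null probability gets a smaller local FDR for the same p-value and is therefore rejected earlier.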

7.

Regression analysis of additive hazards model with sparse longitudinal covariates.

Sun, Zhuowei; Cao, Hongyuan; Chen, Li.

Lifetime Data Anal ; 28(2): 263-281, 2022 04.

Article in English | MEDLINE | ID: mdl-35147908

ABSTRACT

The additive hazards model is often used to complement the proportional hazards model in the analysis of failure time data. Statistical inference for the additive hazards model with time-dependent longitudinal covariates requires the availability of the whole trajectory of the longitudinal process, which is not realistic in practice. The commonly used last value carried forward approach for intermittently observed longitudinal covariates can induce biased parameter estimation. The more principled joint modeling of the longitudinal process and failure time data imposes strong modeling assumptions, which are difficult to verify. In this paper, we propose methods that weigh the distance between the observational time of longitudinal covariates and the failure time, resulting in unbiased regression coefficient estimation. We establish the consistency and asymptotic normality of the proposed estimators. Simulation studies provide numerical support for the theoretical findings. Data from an Alzheimer's study illustrate the practical utility of the methodology.

Subject(s)

Proportional Hazards Models , Computer Simulation , Humans , Regression Analysis

8.

On computation of semiparametric maximum likelihood estimators with shape constraints.

Wang, Yudong; Ye, Zhi-Sheng; Cao, Hongyuan.

Biometrics ; 77(1): 113-124, 2021 03.

Article in English | MEDLINE | ID: mdl-32271941

ABSTRACT

Large sample theory of semiparametric models based on maximum likelihood estimation (MLE) with shape constraint on the nonparametric component is well studied. Relatively less attention has been paid to the computational aspect of semiparametric MLE. The computation of semiparametric MLE based on existing approaches such as the expectation-maximization (EM) algorithm can be computationally prohibitive when the missing rate is high. In this paper, we propose a computational framework for semiparametric MLE based on an inexact block coordinate ascent (BCA) algorithm. We show theoretically that the proposed algorithm converges. This computational framework can be applied to a wide range of data with different structures, such as panel count data, interval-censored data, and degradation data, among others. Simulation studies demonstrate favorable performance compared with existing algorithms in terms of accuracy and speed. Two data sets are used to illustrate the proposed computational method. We further implement the proposed computational method in R package BCA1SG, available at CRAN.

Subject(s)

Algorithms , Models, Statistical , Computer Simulation , Likelihood Functions

9.

False discovery rate control incorporating phylogenetic tree increases detection power in microbiome-wide multiple testing.

Xiao, Jian; Cao, Hongyuan; Chen, Jun.

Bioinformatics ; 33(18): 2873-2881, 2017 Sep 15.

Article in English | MEDLINE | ID: mdl-28505251

ABSTRACT

MOTIVATION: Next generation sequencing technologies have enabled the study of the human microbiome through direct sequencing of microbial DNA, resulting in an enormous amount of microbiome sequencing data. One unique characteristic of microbiome data is the phylogenetic tree that relates all the bacterial species. Closely related bacterial species have a tendency to exhibit a similar relationship with the environment or disease. Thus, incorporating the phylogenetic tree information can potentially improve the detection power for microbiome-wide association studies, where hundreds or thousands of tests are conducted simultaneously to identify bacterial species associated with a phenotype of interest. Despite much progress in multiple testing procedures such as false discovery rate (FDR) control, methods that take into account the phylogenetic tree are largely limited. RESULTS: We propose a new FDR control procedure that incorporates the prior structure information and apply it to microbiome data. The proposed procedure is based on a hierarchical model, where a structure-based prior distribution is designed to utilize the phylogenetic tree. By borrowing information from neighboring bacterial species, we are able to improve the statistical power of detecting associated bacterial species while controlling the FDR at desired levels. When the phylogenetic tree is mis-specified or non-informative, our procedure achieves a similar power as traditional procedures that do not take into account the tree structure. We demonstrate the performance of our method through extensive simulations and real microbiome datasets. We identified far more alcohol-drinking associated bacterial species than traditional methods. AVAILABILITY AND IMPLEMENTATION: R package StructFDR is available from CRAN. CONTACT: chen.jun2@mayo.edu. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subject(s)

Bacteria/genetics , High-Throughput Nucleotide Sequencing/methods , Microbiota/genetics , Phylogeny , Software , Genomics/methods , Humans , Polymorphism, Genetic , Sequence Analysis, DNA/methods

10.

Assessing agreement with multiple raters on correlated kappa statistics.

Cao, Hongyuan; Sen, Pranab K; Peery, Anne F; Dellon, Evan S.

Biom J ; 58(4): 935-43, 2016 Jul.

Article in English | MEDLINE | ID: mdl-26890370

ABSTRACT

In clinical studies, it is often of interest to see the diagnostic agreement among clinicians on certain symptoms. Previous work has focused on the agreement between two clinicians under two different conditions or the agreement among multiple clinicians under one condition. Few have discussed the agreement study with a design where multiple clinicians examine the same group of patients under two different conditions. In this paper, we use the intraclass kappa statistic for assessing nominal scale agreement with such a design. We derive an explicit variance formula for the difference of correlated kappa statistics and conduct hypothesis testing for the equality of kappa statistics. Simulation studies show that the method performs well with realistic sample sizes and may be superior to a method that did not take into account the measurement dependence structure. The practical utility of the method is illustrated on data from an eosinophilic esophagitis (EoE) study.

Subject(s)

Biometry/methods , Diagnostic Techniques and Procedures/standards , Models, Statistical , Computer Simulation , Humans , Reproducibility of Results , Research Design , Sample Size

11.

Angiotensin-converting enzyme inhibitors predict acute kidney injury during chemoradiation for head and neck cancer.

Spiotto, Michael T; Cao, Hongyuan; Mell, Loren; Toback, F Gary.

Anticancer Drugs ; 26(3): 343-9, 2015 Mar.

Article in English | MEDLINE | ID: mdl-25486599

ABSTRACT

Head and neck cancer patients undergoing chemoradiation experience considerable toxicities including acute kidney injury (AKI). However, it remains unclear what factors predispose patients to renal toxicity during treatment. Here, we assessed the predictors and outcomes of patients experiencing AKI during chemoradiation. We carried out a retrospective cohort study to assess the maximum changes in serum creatinine (Cr) in 173 patients with stage III-IV head and neck cancer treated with chemoradiation between 1999 and 2012. We defined AKI as Cr increases 26.5 µmol/l or more over the pretreatment baseline. AKI was associated with angiotensin-converting enzyme inhibitor (ACEI) use (33.0 vs. 11.0%; P=0.0004), but no other medications or comorbidities. On multivariate analysis, ACEI use, weight loss 10% or more of body weight, and performance status 70 or more predicted for Cr increments 26.5 µmol/l or more, whereas only ACEI use predicted for Cr increments of 44.2 µmol/l or greater. Furthermore, on multivariate analysis, AKI predicted for more interventions during radiotherapy including intravenous fluid use (P=0.0005) and hospitalizations (P=0.007), as well as long-term renal dysfunction (P<0.0001). Renal toxicity was not associated with worse locoregional control, progression-free survival, or overall survival. Renal toxicity during chemoradiation was associated with ACEI use alone or coupled with weight loss 10% or more of body weight during therapy. Our results suggest that actively managing ACEI use and intravascular volume status during chemoradiation may avoid AKI, minimize subsequent interventions, and reduce the risk for long-term renal dysfunction.

Subject(s)

Acute Kidney Injury/etiology , Angiotensin-Converting Enzyme Inhibitors/adverse effects , Chemoradiotherapy/adverse effects , Head and Neck Neoplasms/drug therapy , Head and Neck Neoplasms/radiotherapy , Acute Kidney Injury/chemically induced , Aged , Angiotensin-Converting Enzyme Inhibitors/therapeutic use , Cohort Studies , Creatinine/blood , Disease-Free Survival , Female , Head and Neck Neoplasms/mortality , Humans , Kidney Function Tests , Male , Middle Aged , Retrospective Studies , Treatment Outcome , Weight Loss/drug effects , Weight Loss/radiation effects

12.

Targeting Staphylococcus aureus α-toxin as a novel approach to reduce severity of recurrent skin and soft-tissue infections.

Sampedro, Georgia R; DeDent, Andrea C; Becker, Russell E N; Berube, Bryan J; Gebhardt, Michael J; Cao, Hongyuan; Bubeck Wardenburg, Juliane.

J Infect Dis ; 210(7): 1012-8, 2014 Oct 01.

Article in English | MEDLINE | ID: mdl-24740631

ABSTRACT

Staphylococcus aureus frequently causes recurrent skin and soft-tissue infection (SSTI). In the pediatric population, elevated serum antibody targeting S. aureus α-toxin is correlated with a reduced incidence of recurrent SSTI. Using a novel model of recurrent SSTI, we demonstrated that expression of α-toxin during primary infection increases the severity of recurrent disease. Antagonism of α-toxin by either a dominant-negative toxin mutant or a small molecule inhibitor of the toxin receptor ADAM10 during primary infection reduces reinfection abscess severity. Early neutralization of α-toxin activity during S. aureus SSTI therefore offers a new therapeutic strategy to mitigate primary and recurrent disease.

Subject(s)

Bacterial Toxins/toxicity , Hemolysin Proteins/toxicity , Soft Tissue Infections/pathology , Staphylococcal Skin Infections/pathology , Staphylococcus aureus/physiology , Animals , Bacterial Toxins/antagonists & inhibitors , Bacterial Toxins/metabolism , Hemolysin Proteins/antagonists & inhibitors , Hemolysin Proteins/metabolism , Male , Mice, Inbred C57BL , Recurrence , Soft Tissue Infections/drug therapy , Staphylococcal Skin Infections/drug therapy , Staphylococcus aureus/metabolism

13.

Racial parities in outcomes after radiotherapy for head and neck cancer.

Liu, Gene-Fu F; Ranck, Mark C; Solanki, Abhishek A; Cao, Hongyuan; Kolokythas, Antonia; Wenig, Barry L; Chen, Lucy; Ard, Stephanie; Weichselbaum, Ralph R; Halpern, Howard; Spiotto, Michael T.

Cancer ; 120(2): 244-52, 2014 Jan 15.

Article in English | MEDLINE | ID: mdl-24122486

ABSTRACT

BACKGROUND: Although black patients experience worse outcomes after treatment for squamous cell carcinoma of the head and neck (HNSCC), these conclusions were based on populations in which blacks comprised a minority of patients. The objective of the current study was to determine the impact of race on outcomes in patients with HNSCC who received radiotherapy at an institution in which blacks comprised the majority of patients. METHODS: In this retrospective cohort study, the authors reviewed 366 black patients and 236 white patients who had nonmetastatic HNSCC for which they received radiotherapy between 1990 and 2012. The primary study outcome measures were locoregional control, freedom from distant metastasis, progression-free survival, and overall survival. RESULTS: The median follow-up was 18.3 months for all patients. The 2-year locoregional control rate was 71.9% for black patients compared with 64.2% for white patients (hazard ratio, 0.72; P=.03). There was no difference between blacks and whites regarding 2-year freedom from distant metastasis, progression-free survival, or overall survival. Among the patients who had stage III through IVB disease, blacks and whites had similar outcomes. On multivariate analysis, race was not statistically significant for locoregional control, freedom from distant metastasis, progression-free survival, or overall survival. Despite these similar outcomes, black patients had worse socioeconomic factors and increased comorbidities but had similar treatment compliance compared with white patients. CONCLUSIONS: With more adverse prognostic factors, black patients experienced oncologic outcomes similar to the outcomes of white patients after receiving radiotherapy for HNSCC. The current data suggest that centers that treat large percentages of minority patients who receive radiotherapy for HNSCCs may overcome existing health care disparities through improved treatment compliance.

Subject(s)

Carcinoma, Squamous Cell/radiotherapy , Head and Neck Neoplasms/radiotherapy , Black or African American , Carcinoma, Squamous Cell/mortality , Carcinoma, Squamous Cell/pathology , Carcinoma, Squamous Cell/surgery , Dermatitis/etiology , Disease-Free Survival , Head and Neck Neoplasms/mortality , Head and Neck Neoplasms/pathology , Head and Neck Neoplasms/surgery , Humans , Middle Aged , Multivariate Analysis , Radiotherapy/adverse effects , Squamous Cell Carcinoma of Head and Neck , Treatment Outcome , White People

14.

Scalable lipid droplet microarray fabrication, validation, and screening.

Bell, Tracey N; Kusi-Appiah, Aubrey E; Tocci, Vincent; Lyu, Pengfei; Zhu, Lei; Zhu, Fanxiu; Van Winkle, David; Cao, Hongyuan; Singh, Mandip S; Lenhert, Steven.

PLoS One ; 19(7): e0304736, 2024.

Article in English | MEDLINE | ID: mdl-38968248

ABSTRACT

High throughput screening of small molecules and natural products is costly, requiring significant amounts of time, reagents, and operating space. Although microarrays have proven effective in the miniaturization of screening for certain biochemical assays, such as nucleic acid hybridization or antibody binding, they are not widely used for drug discovery in cell culture due to the need for cells to internalize lipophilic drug candidates. Lipid droplet microarrays are a promising solution to this problem as they are capable of delivering lipophilic drugs to cells at dosages comparable to solution delivery. However, the scalability of the array fabrication, assay validation, and screening steps has limited the utility of this approach. Here we take several new steps to scale up the process for lipid droplet array fabrication, assay validation in cell culture, and drug screening. A nanointaglio printing process has been adapted for use with a printing press. The arrays are stabilized for immersion into aqueous solution using a vapor coating process. In addition to delivery of lipophilic compounds, we found that we are also able to encapsulate and deliver a water-soluble compound in this way. The arrays can be functionalized by extracellular matrix proteins such as collagen prior to cell culture as the mechanism for uptake is based on direct contact with the lipid delivery vehicles rather than diffusion of the drug out of the microarray spots. We demonstrate this method for delivery to 3 different cell types and the screening of 92 natural product extracts on a microarray covering an area of less than 0.1 cm2. The arrays are suitable for miniaturized screening, for instance in high biosafety level facilities where space is limited and for applications where cell numbers are limited, such as in functional precision medicine.

Subject(s)

Lipid Droplets , Humans , Lipid Droplets/metabolism , Microarray Analysis/methods , Animals , Drug Evaluation, Preclinical/methods , High-Throughput Screening Assays/methods

15.

EndoPRS: Incorporating Endophenotype Information to Improve Polygenic Risk Scores for Clinical Endpoints.

Kharitonova, Elena V; Sun, Quan; Ockerman, Frank; Chen, Brian; Zhou, Laura Y; Cao, Hongyuan; Mathias, Rasika A; Auer, Paul L; Ober, Carole; Raffield, Laura M; Reiner, Alexander P; Cox, Nancy J; Kelada, Samir; Tao, Ran; Li, Yun.

medRxiv ; 2024 May 24.

Article in English | MEDLINE | ID: mdl-38826253

ABSTRACT

Polygenic risk score (PRS) prediction of complex diseases can be improved by leveraging related phenotypes. This has motivated the development of several multi-trait PRS methods that jointly model information from genetically correlated traits. However, these methods do not account for vertical pleiotropy between traits, in which one trait acts as a mediator for another. Here, we introduce endoPRS, a weighted lasso model that incorporates information from relevant endophenotypes to improve disease risk prediction without making assumptions about the genetic architecture underlying the endophenotype-disease relationship. Through extensive simulation analysis, we demonstrate the robustness of endoPRS in a variety of complex genetic frameworks. We also apply endoPRS to predict the risk of childhood onset asthma in UK Biobank by leveraging a paired GWAS of eosinophil count, a relevant endophenotype. We find that endoPRS significantly improves prediction compared to many existing PRS methods, including multi-trait PRS methods, MTAG and wMT-BLUP, which suggests advantages of endoPRS in real-life clinical settings.

16.

Medical records-based postmarketing safety evaluation of rare events with uncertain status.

Cao, Hongyuan; LaVange, Lisa M; Heyse, Joseph F; Mast, T Christopher; Kosorok, Michael R.

J Biopharm Stat ; 23(4): 744-55, 2013.

Article in English | MEDLINE | ID: mdl-23786578

ABSTRACT

We develop a simple statistic for comparing rates of rare adverse events between treatment groups in postmarketing safety studies where the events have uncertain status. In this setting, the statistic is asymptotically equivalent to the logrank statistic, but the limiting distribution has Poisson and binomial components instead of being Gaussian. We develop two new procedures for computing critical values: a Gaussian approximation and a parametric bootstrap. Both numerical and asymptotic properties of the procedures are studied. The test procedures are demonstrated on a postmarketing safety study of the RotaTeq vaccine. This vaccine was developed to reduce the incidence of severe diarrhea in infants.

Subject(s)

Consumer Product Safety , Medical Records/statistics & numerical data , Models, Statistical , Product Surveillance, Postmarketing/methods , Product Surveillance, Postmarketing/statistics & numerical data , Uncertainty , Humans , Rotavirus Vaccines/standards , Vaccines, Attenuated/standards

17.

Medical records-based postmarketing safety evaluation of rare events with uncertain status.

Cao, Hongyuan; LaVange, Lisa M; Heyse, Joseph F; Mast, T Christopher; Kosorok, Michael R.

J Biopharm Stat ; 23(1): 201-12, 2013.

Article in English | MEDLINE | ID: mdl-23331231

ABSTRACT

We develop a simple statistic for comparing rates of rare adverse events between treatment groups in postmarketing safety studies where the events have uncertain status. In this setting, the statistic is asymptotically equivalent to the logrank statistic, but the limiting distribution has Poisson and binomial components instead of being Gaussian. We develop two new procedures for computing critical values, a Gaussian approximation and a parametric bootstrap. Both numerical and asymptotic properties of the procedures are studied. The test procedures are demonstrated on a postmarketing safety study of the RotaTeq vaccine. This vaccine was developed to reduce the incidence of severe diarrhea in infants.

Subject(s)

Medical Records/standards , Patient Safety/standards , Product Surveillance, Postmarketing/methods , Product Surveillance, Postmarketing/standards , Randomized Controlled Trials as Topic/methods , Rotavirus Vaccines/adverse effects , Humans , Infant , Intussusception/etiology , Intussusception/prevention & control , Medical Records/statistics & numerical data , Normal Distribution , Patient Safety/statistics & numerical data , Product Surveillance, Postmarketing/statistics & numerical data , Randomized Controlled Trials as Topic/adverse effects , Randomized Controlled Trials as Topic/statistics & numerical data , Vaccines, Attenuated/adverse effects

18.

Combining protein sequences and structures with transformers and equivariant graph neural networks to predict protein function.

Boadu, Frimpong; Cao, Hongyuan; Cheng, Jianlin.

bioRxiv ; 2023 Jan 20.

Article in English | MEDLINE | ID: mdl-36711471

ABSTRACT

Motivation: Millions of protein sequences have been generated by numerous genome and transcriptome sequencing projects. However, experimentally determining the function of the proteins is still a time-consuming, low-throughput, and expensive process, leading to a large protein sequence-function gap. Therefore, it is important to develop computational methods to accurately predict protein function to fill the gap. Even though many methods have been developed to use protein sequences as input to predict function, far fewer methods leverage protein structures in protein function prediction because there was a lack of accurate protein structures for most proteins until recently. Results: We developed TransFun, a method using a transformer-based protein language model and 3D-equivariant graph neural networks to distill information from both protein sequences and structures to predict protein function. It extracts feature embeddings from protein sequences using a pre-trained protein language model (ESM) via transfer learning and combines them with 3D structures of proteins predicted by AlphaFold2 through equivariant graph neural networks. Benchmarked on the CAFA3 test dataset and a new test dataset, TransFun outperforms several state-of-the-art methods, indicating that the language model and 3D-equivariant graph neural networks are effective methods to leverage protein sequences and structures to improve protein function prediction. Combining TransFun predictions and sequence similarity-based predictions can further increase prediction accuracy. Availability: The source code of TransFun is available at https://github.com/jianlin-cheng/TransFun. Contact: chengji@missouri.edu.

19.

Statistical analysis of spatially resolved transcriptomic data by incorporating multiomics auxiliary information.

Li, Yan; Zhou, Xiang; Cao, Hongyuan.

Genetics ; 221(4), 2022 07 30.

Article in English | MEDLINE | ID: mdl-35731210

ABSTRACT

Effective control of false discovery rate is key for multiplicity problems. Here, we consider incorporating informative covariates from external datasets in the multiple testing procedure to boost statistical power while maintaining false discovery rate control. In particular, we focus on the statistical analysis of innovative high-dimensional spatial transcriptomic data while incorporating external multiomics data that provide distinct but complementary information to the detection of spatial expression patterns. We extend OrderShapeEM, an efficient covariate-assisted multiple testing procedure that incorporates one auxiliary study, to make it permissible to incorporate multiple external omics studies, to boost statistical power of spatial expression pattern detection. Specifically, we first use a recently proposed computationally efficient statistical analysis method, spatial pattern recognition via kernels, to produce the primary test statistics for spatial transcriptomic data. Afterwards, we construct the auxiliary covariate by combining information from multiple external omics studies, such as bulk and single-cell RNA-seq data using the Cauchy combination rule. Finally, we extend and implement the integrative analysis method OrderShapeEM on the primary P-values along with auxiliary data incorporating multiomics information for efficient covariate-assisted spatial expression analysis. We conduct a series of realistic simulations to evaluate the performance of our method with known ground truth. Four case studies in mouse olfactory bulb, mouse cerebellum, human breast cancer, and human heart tissues further demonstrate the substantial power gain of our method in detecting genes with spatial expression patterns compared to existing classic approaches that do not utilize any external information.

Subject(s)

Transcriptome , Animals , Humans , Mice
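
The abstract of record 19 mentions building the auxiliary covariate by combining information from several external omics studies with the Cauchy combination rule. As a rough illustration only, the sketch below shows the generic Cauchy combination of p-values: each p-value is mapped to a standard Cauchy variate, the variates are averaged with weights, and the result is mapped back to a p-value. The function name `cauchy_combine`, the equal default weights, and the input checks are assumptions made for this example; it is not the authors' implementation.

```python
import math


def cauchy_combine(pvalues, weights=None):
    """Combine p-values with the Cauchy combination rule (illustrative sketch).

    pvalues: p-values strictly between 0 and 1.
    weights: optional non-negative weights; equal weights are assumed if omitted.
    """
    pvalues = list(pvalues)
    if not pvalues:
        raise ValueError("at least one p-value is required")
    if any(not (0.0 < p < 1.0) for p in pvalues):
        raise ValueError("p-values must lie strictly between 0 and 1")

    if weights is None:
        weights = [1.0] * len(pvalues)
    weights = list(weights)
    if len(weights) != len(pvalues):
        raise ValueError("weights and pvalues must have the same length")
    total = sum(weights)
    if total <= 0 or any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative with a positive sum")

    # Map each p-value to a standard Cauchy variate and take the weighted mean.
    statistic = sum(
        (w / total) * math.tan((0.5 - p) * math.pi)
        for w, p in zip(weights, pvalues)
    )
    # Upper tail probability of the standard Cauchy distribution.
    return 0.5 - math.atan(statistic) / math.pi


if __name__ == "__main__":
    # Hypothetical p-values for one gene from two external studies.
    print(cauchy_combine([0.01, 0.20]))
```

With a single p-value the function returns that value unchanged (up to floating-point error), which is a quick sanity check on the transformation.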

20.

Diastereoselective Synthesis of Chromeno[3,2-d]isoxazoles via Brønsted Acid Catalyzed Tandem 1,6-Addition/Double Annulations of o-Hydroxyl Propargylic Alcohols.

Li, Zhu; Zhang, Pei-Xu; Li, Zhao-Zhao; Zhang, Xing-Lu; Cao, Hong-Yuan; Gao, Yu-Ning; Bian, Ming; Chen, Hui-Yu; Liu, Zhen-Jiang.

Org Lett ; 24(37): 6863-6868, 2022 Sep 23.

Article in English | MEDLINE | ID: mdl-36102802

ABSTRACT

A Brønsted acid catalyzed tandem process to access densely functionalized chromeno[3,2-d]isoxazoles with good to excellent yields and diastereoselectivities was disclosed. The procedure is proposed to involve a 1,6-conjugate addition/electrophilic addition/double annulations process of alkynyl o-quinone methides (o-AQMs), generated in situ from o-hydroxyl propargylic alcohols, with nitrones. Mild conditions, good functional group compatibility, easy scale-up of the reaction, and further product transformation demonstrated its potential application.
