1.
Nature ; 623(7985): 139-148, 2023 Nov.
Article in English | MEDLINE | ID: mdl-37748514

ABSTRACT

Post-acute infection syndromes may develop after acute viral disease [1]. Infection with SARS-CoV-2 can result in the development of a post-acute infection syndrome known as long COVID. Individuals with long COVID frequently report unremitting fatigue, post-exertional malaise, and a variety of cognitive and autonomic dysfunctions [2-4]. However, the biological processes that are associated with the development and persistence of these symptoms are unclear. Here 275 individuals with or without long COVID were enrolled in a cross-sectional study that included multidimensional immune phenotyping and unbiased machine learning methods to identify biological features associated with long COVID. Marked differences were noted in circulating myeloid and lymphocyte populations relative to the matched controls, as well as evidence of exaggerated humoral responses directed against SARS-CoV-2 among participants with long COVID. Furthermore, higher antibody responses directed against non-SARS-CoV-2 viral pathogens were observed among individuals with long COVID, particularly Epstein-Barr virus. Levels of soluble immune mediators and hormones varied among groups, with cortisol levels being lower among participants with long COVID. Integration of immune phenotyping data into unbiased machine learning models identified the key features that are most strongly associated with long COVID status. Collectively, these findings may help to guide future studies into the pathobiology of long COVID and help with developing relevant biomarkers.


Subject(s)
Antibodies, Viral , Herpesvirus 4, Human , Hydrocortisone , Lymphocytes , Myeloid Cells , Post-Acute COVID-19 Syndrome , SARS-CoV-2 , Humans , Antibodies, Viral/blood , Antibodies, Viral/immunology , Biomarkers/blood , Cross-Sectional Studies , Herpesvirus 4, Human/immunology , Hydrocortisone/blood , Immunophenotyping , Lymphocytes/immunology , Machine Learning , Myeloid Cells/immunology , Post-Acute COVID-19 Syndrome/diagnosis , Post-Acute COVID-19 Syndrome/immunology , Post-Acute COVID-19 Syndrome/physiopathology , Post-Acute COVID-19 Syndrome/virology , SARS-CoV-2/immunology
2.
Bioinformatics ; 40(5)2024 May 02.
Article in English | MEDLINE | ID: mdl-38603606

ABSTRACT

MOTIVATION: Predictive biological signatures provide utility as biomarkers for disease diagnosis and prognosis, as well as prediction of responses to vaccination or therapy. These signatures are identified from high-throughput profiling assays through a combination of dimensionality reduction and machine learning techniques. The genes, proteins, metabolites, and other biological analytes that compose signatures also generate hypotheses on the underlying mechanisms driving biological responses, thus improving biological understanding. Dimensionality reduction is a critical step in signature discovery to address the large number of analytes in omics datasets, especially for multi-omics profiling studies with tens of thousands of measurements. Latent factor models, which can account for the structural heterogeneity across diverse assays, effectively integrate multi-omics data and reduce dimensionality to a small number of factors that capture correlations and associations among measurements. These factors provide biologically interpretable features for predictive modeling. However, multi-omics integration and predictive modeling are generally performed independently in sequential steps, leading to suboptimal factor construction. Combining these steps can yield better multi-omics signatures that are more predictive while still being biologically meaningful. RESULTS: We developed a supervised variational Bayesian factor model that extracts multi-omics signatures from high-throughput profiling datasets that can span multiple data types. Signature-based multiPle-omics intEgration via lAtent factoRs (SPEAR) adaptively determines factor rank, emphasis on factor structure, data relevance and feature sparsity. The method improves the reconstruction of underlying factors in synthetic examples and prediction accuracy of coronavirus disease 2019 severity and breast cancer tumor subtypes. 
AVAILABILITY AND IMPLEMENTATION: SPEAR is a publicly available R-package hosted at https://bitbucket.org/kleinstein/SPEAR.


Subject(s)
Bayes Theorem , Humans , COVID-19/virology , Computational Biology/methods , Female , Genomics/methods , Supervised Machine Learning , Multiomics
3.
J Dairy Sci ; 2024 Jul 12.
Article in English | MEDLINE | ID: mdl-39004139

ABSTRACT

The transgalactosylase activity of β-galactosidases offers a convenient and promising strategy for conversion of lactose into high-value oligosaccharides, such as galacto-oligosaccharides (GOS) and human milk oligosaccharides (HMOs). In this study, we cloned and biochemically characterized a novel C-terminally truncated β-galactosidase (PaBgal2A-D) from Paenibacillus antarcticus with high transglycosylation activity. PaBgal2A-D is a member of glycoside hydrolase (GH) family 2. The optimal pH and temperature of PaBgal2A-D were determined to be pH 6.5 and 50°C, respectively. It was relatively stable within pH 5.0-8.0 and up to 50°C. PaBgal2A-D showed high transglycosylation activity for GOS synthesis, and the maximum yield of 50.8% (wt/wt) was obtained in 2 h. Moreover, PaBgal2A-D could synthesize lacto-N-neotetraose (LNnT) using lactose and lacto-N-triose II (LNT2), with a conversion rate of 16.4%. This study demonstrated that PaBgal2A-D could be a promising tool to prepare GOS and LNnT.

4.
World J Microbiol Biotechnol ; 37(5): 83, 2021 Apr 15.
Article in English | MEDLINE | ID: mdl-33855634

ABSTRACT

A novel chitosanase gene, designated as PbCsn8, was cloned from Paenibacillus barengoltzii. It shared the highest identity of 73% with the glycoside hydrolase (GH) family 8 chitosanase from Bacillus thuringiensis JAM-GG01. The gene was heterologously expressed in Bacillus subtilis as an extracellular protein, and the highest chitosanase yield of 1,108 U/mL was obtained by high-cell density fermentation in a 5-L fermentor. The recombinant chitosanase (PbCsn8) was purified to homogeneity and biochemically characterized. PbCsn8 was most active at pH 5.5 and 70 °C. It was stable in a wide pH range of 5.0-11.0 and up to 55 °C. PbCsn8 was a bifunctional enzyme, exhibiting both chitosanase and glucanase activities, with the highest specificity towards chitosan (360 U/mg), followed by barley β-glucan (72 U/mg) and lichenan (13 U/mg). It hydrolyzed chitosan to release mainly chitooligosaccharides (COSs) with degree of polymerization (DP) 2-3, while it hydrolyzed barley β-glucan to yield mainly glucooligosaccharides with DP > 5. PbCsn8 was further applied in COS production, and the highest COS yield of 79.3% (w/w) was obtained. This is the first report on a GH family 8 chitosanase from P. barengoltzii. The high yield and remarkable hydrolysis properties may make PbCsn8 a good candidate in industrial application.


Subject(s)
Chitin/analogs & derivatives , Glycoside Hydrolases/metabolism , Paenibacillus/enzymology , Paenibacillus/genetics , Paenibacillus/metabolism , Amino Acid Sequence , Bacterial Proteins/genetics , Bacterial Proteins/metabolism , Chitin/biosynthesis , Chitosan/metabolism , Cloning, Molecular , Glucans/metabolism , Glycoside Hydrolases/genetics , Hydrolysis , Industrial Microbiology , Oligosaccharides , Recombinant Proteins/genetics , Recombinant Proteins/metabolism , Substrate Specificity , beta-Glucans/metabolism
5.
Proc Natl Acad Sci U S A ; 114(43): 11368-11373, 2017 Oct 24.
Article in English | MEDLINE | ID: mdl-29073058

ABSTRACT

Maintaining a robust blood product supply is an essential requirement to guarantee optimal patient care in modern health care systems. However, daily blood product use is difficult to anticipate. Platelet products are the most variable in daily usage, have short shelf lives, and are also the most expensive to produce, test, and store. Due to the combination of absolute need, uncertain daily demand, and short shelf life, platelet products are frequently wasted due to expiration. Our aim is to build and validate a statistical model to forecast future platelet demand and thereby reduce wastage. We have investigated platelet usage patterns at our institution, and specifically interrogated the relationship between platelet usage and aggregated hospital-wide patient data over a recent consecutive 29-mo period. Using a convex statistical formulation, we have found that platelet usage is highly dependent on weekday/weekend pattern, number of patients with various abnormal complete blood count measurements, and location-specific hospital census data. We incorporated these relationships in a mathematical model to guide collection and ordering strategy. This model minimizes waste due to expiration while avoiding shortages; the number of remaining platelet units at the end of any day stays above 10 in our model during the same period. Compared with historical expiration rates during the same period, our model reduces the expiration rate from 10.5 to 3.2%. Extrapolating our results to the ∼2 million units of platelets transfused annually within the United States, if implemented successfully, our model can potentially save ∼80 million dollars in health care costs.
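The savings extrapolation in the abstract above can be checked with simple arithmetic. The sketch below uses only figures quoted in the abstract (about 2 million units transfused per year, expiration rates of 10.5% and 3.2%, and about 80 million dollars saved). The implied cost per unit is a derived back-of-envelope value, not a figure reported by the authors, and it assumes the expiration rates apply to the number of units transfused.

```python
# Back-of-envelope check of the extrapolation quoted in the abstract.
# All inputs are the approximate figures from the abstract; the cost per
# unit is derived here and is NOT a number reported by the authors.
UNITS_PER_YEAR = 2_000_000        # ~2 million platelet units per year
HISTORICAL_RATE = 0.105           # historical expiration rate (10.5%)
MODEL_RATE = 0.032                # expiration rate under the model (3.2%)
QUOTED_SAVINGS_USD = 80_000_000   # ~80 million dollars

def units_saved(units, old_rate, new_rate):
    """Units no longer wasted if the expiration rate drops from old_rate to new_rate."""
    return units * (old_rate - new_rate)

saved = units_saved(UNITS_PER_YEAR, HISTORICAL_RATE, MODEL_RATE)
implied_cost_per_unit = QUOTED_SAVINGS_USD / saved

print(f"units saved per year: {saved:,.0f}")
print(f"implied cost per unit (USD): {implied_cost_per_unit:,.0f}")
```

Under these assumptions the quoted savings correspond to a few hundred dollars per avoided expired unit, which gives a rough sense of the scale behind the headline number.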


Subject(s)
Models, Statistical , Platelet Transfusion/statistics & numerical data , Tertiary Healthcare , California , Electronic Health Records , Health Care Costs , Humans , Platelet Transfusion/economics , Tertiary Healthcare/economics
6.
Can J Stat ; 48(3): 447-470, 2020 Sep.
Article in English | MEDLINE | ID: mdl-36092475

ABSTRACT

We propose a simple method for evaluating the model that has been chosen by an adaptive regression procedure, our main focus being the lasso. This procedure deletes each chosen predictor and refits the lasso to get a set of models that are "close" to the chosen "base model," and compares the error rate of the base model with those of nearby models. If the deletion of a predictor leads to significant deterioration in the model's predictive power, the predictor is called indispensable; otherwise, the nearby model is called acceptable and can serve as a good alternative to the base model. This provides both an assessment of the predictive contribution of each variable and a set of alternative models that may be used in place of the chosen model. We call this procedure "Next-Door analysis" since it examines models "next" to the base model. It can be applied to supervised learning problems with ℓ1 penalization and stepwise procedures. We have implemented it in the R language as a library to accompany the well-known glmnet library.


The authors propose a simple method for evaluating models chosen by an adaptive regression procedure such as the lasso, which is their main focus. Their procedure removes each predictor in turn and refits the lasso to obtain a set of models that are close to the base model. They then compare the error rate of the base model with those of its neighbours. When removing a variable leads to a marked drop in the model's predictive power, the predictor is considered indispensable. Otherwise, the nearby model is judged acceptable and can serve as a replacement for the base model. This approach both measures the predictive contribution of each variable and builds a set of alternative models. The authors have named this approach "Next-Door analysis" because it examines models close to the base model. It can be applied to supervised learning problems with ℓ1 penalization and to stepwise procedures. The authors have implemented their method in R as a library accompanying the popular glmnet library.
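The delete-and-refit idea described in this abstract can be sketched in a few lines. The code below is a minimal illustration under stated assumptions, not the authors' R implementation: the lasso solver is a plain coordinate-descent routine written for this example, the data are synthetic, the penalty `lam` is fixed by hand, and a raw difference in held-out error stands in for the significance assessment the abstract mentions. All function names are hypothetical.

```python
import random

def soft_threshold(z, t):
    """Soft-thresholding operator used by the lasso coordinate update."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def lasso_cd(X, y, lam, n_iter=200):
    """Minimise (1/2n)*||y - X b||^2 + lam*||b||_1 by cyclic coordinate descent (no intercept)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X beta, with beta = 0 at the start
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # an all-zero (deleted) column keeps coefficient 0
            rho = sum(X[i][j] * (resid[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new_b = soft_threshold(rho, lam) / col_sq[j]
            delta = new_b - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new_b
    return beta

def mse(X, y, beta):
    """Mean squared prediction error of the linear model beta on (X, y)."""
    preds = [sum(v * b for v, b in zip(row, beta)) for row in X]
    return sum((yi - pi) ** 2 for yi, pi in zip(y, preds)) / len(y)

def next_door(X_tr, y_tr, X_te, y_te, lam):
    """Refit the lasso with each selected predictor deleted; report the change in held-out error."""
    base = lasso_cd(X_tr, y_tr, lam)
    base_err = mse(X_te, y_te, base)
    report = {}
    for j in [k for k, b in enumerate(base) if b != 0.0]:
        # Delete predictor j by zeroing its column, then refit.
        X_drop = [[0.0 if k == j else v for k, v in enumerate(row)] for row in X_tr]
        report[j] = mse(X_te, y_te, lasso_cd(X_drop, y_tr, lam)) - base_err
    return base_err, report

# Synthetic data (an assumption for illustration): y depends on x0 and x1 only.
random.seed(0)
def sample(n, p=5):
    X = [[random.gauss(0, 1) for _ in range(p)] for _ in range(n)]
    y = [3.0 * r[0] + 1.5 * r[1] + random.gauss(0, 0.5) for r in X]
    return X, y

X_tr, y_tr = sample(80)
X_te, y_te = sample(80)
base_err, report = next_door(X_tr, y_tr, X_te, y_te, lam=0.1)
for j, increase in sorted(report.items()):
    print(f"drop x{j}: held-out MSE changes by {increase:+.3f}")
```

A predictor whose deletion raises the held-out error sharply plays the role of an "indispensable" variable, while a deletion that leaves the error roughly unchanged yields an "acceptable" nearby model. A faithful analysis would choose the penalty by cross-validation and test whether the error difference is significant.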

7.
PLoS Comput Biol ; 13(12): e1005875, 2017 Dec.
Article in English | MEDLINE | ID: mdl-29281633

ABSTRACT

Mass cytometry (CyTOF) has greatly expanded the capability of cytometry. It is now easy to generate multiple CyTOF samples in a single study, with each sample containing single-cell measurement on 50 markers for more than hundreds of thousands of cells. Current methods do not adequately address the issues concerning combining multiple samples for subpopulation discovery, and these issues can be quickly and dramatically amplified with increasing number of samples. To overcome this limitation, we developed Partition-Assisted Clustering and Multiple Alignments of Networks (PAC-MAN) for the fast automatic identification of cell populations in CyTOF data closely matching that of expert manual-discovery, and for alignments between subpopulations across samples to define dataset-level cellular states. PAC-MAN is computationally efficient, allowing the management of very large CyTOF datasets, which are increasingly common in clinical studies and cancer studies that monitor various tissue samples for each subject.


Subject(s)
Single-Cell Analysis/statistics & numerical data , Animals , Biomarkers/analysis , Cluster Analysis , Computational Biology , Computer Simulation , Data Interpretation, Statistical , Databases, Factual , Flow Cytometry/statistics & numerical data , Gene Expression , Humans , Mice
8.
Stat Sin ; 28(3): 1225-1243, 2018 Jul.
Article in English | MEDLINE | ID: mdl-35677806

ABSTRACT

We propose a new method for supervised learning. The hubNet procedure fits a hub-based graphical model to the predictors, to estimate the amount of "connection" that each predictor has with other predictors. This yields a set of predictor weights that are then used in a regularized regression such as the lasso or elastic net. The resulting procedure is easy to implement, can often yield higher or competitive prediction accuracy with fewer features than the lasso, and can give insight into the underlying structure of the predictors. HubNet can be generalized seamlessly to supervised problems such as regularized logistic regression (and other GLMs), Cox's proportional hazards model, and nonlinear procedures such as random forests and boosting. We prove recovery results under a specialized model and illustrate the method on real and simulated data.

9.
Int J Biol Macromol ; 269(Pt 1): 132041, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38705315

ABSTRACT

Hemocyanin, an oxygen-transport protein, is widely distributed in the hemolymph of marine arthropods and mollusks, playing an important role in their physiological processes. Recently, hemocyanin has been recognized as a multifunctional glycoprotein involved in the immunological responses of aquatic invertebrates. Consequently, the link between hemocyanin functions and their potential applications has garnered increased attention. This review offers an integrated overview of hemocyanin's structure, physicochemical characteristics, and bioactivities to further promote the utilization of hemocyanin derived from marine products. Specifically, we review its implication in two aspects of food and aquaculture industries: quality and health. Hemocyanin's inducible phenoloxidase activity is thought to be an inducer of melanosis in crustaceans. New anti-melanosis agents targeted to hemocyanin need to be explored. The red-color change observed in shrimp shells is related to hemocyanin, affecting consumer preferences. Hemocyanin's adaptive modification in response to the aquatic environment is available as a biomarker. Additionally, hemocyanin is endowed with bioactivities encompassing anti-microbial, antiviral, and therapeutic activities. Hemocyanin is also a novel allergen and its allergenic features remain incompletely characterized.


Subject(s)
Hemocyanins , Hemocyanins/chemistry , Animals , Food Industry , Aquatic Organisms/chemistry , Humans
10.
medRxiv ; 2024 Mar 04.
Article in English | MEDLINE | ID: mdl-38496502

ABSTRACT

Strong sex differences in the frequencies and manifestations of Long COVID (LC) have been reported, with females significantly more likely than males to present with LC after acute SARS-CoV-2 infection [1-7]. However, whether immunological traits underlying LC differ between sexes, and whether such differences explain the differential manifestations of LC symptomology, is currently unknown. Here, we performed sex-based multi-dimensional immune-endocrine profiling of 165 individuals [8] with and without LC in an exploratory, cross-sectional study to identify key immunological traits underlying biological sex differences in LC. We found that female and male participants with LC experienced different sets of symptoms, and distinct patterns of organ system involvement, with female participants suffering from a higher symptom burden. Machine learning approaches identified differential sets of immune features that characterized LC in females and males. Males with LC had decreased frequencies of monocyte and DC populations, elevated NK cells, and plasma cytokines including IL-8 and TGF-β-family members. Females with LC had increased frequencies of exhausted T cells, cytokine-secreting T cells, higher antibody reactivity to latent herpes viruses including EBV, HSV-2, and CMV, and lower testosterone levels than their control female counterparts. Testosterone levels were significantly associated with lower symptom burden in LC participants over sex designation. These findings suggest distinct immunological processes of LC in females and males and illuminate the crucial role of immune-endocrine dysregulation in sex-specific pathology.

11.
Cell Rep Methods ; 4(3): 100731, 2024 Mar 25.
Article in English | MEDLINE | ID: mdl-38490204

ABSTRACT

Systems vaccinology studies have identified factors affecting individual vaccine responses, but comparing these findings is challenging due to varying study designs. To address this lack of reproducibility, we established a community resource for comparing Bordetella pertussis booster responses and for hosting annual contests for predicting patients' vaccination outcomes. We report here on our experiences with the "dry-run" prediction contest. We found that, among 20+ models adopted from the literature, the most successful model predicting vaccination outcome was based on age alone. This confirms our concerns about the reproducibility of conclusions between different vaccinology studies. Further, we found that, for newly trained models, handling of baseline information on the target variables was crucial. Overall, multiple co-inertia analysis gave the best results of the tested modeling approaches. Our goal is to engage the community in these prediction challenges by making data and models available and opening a public contest in August 2024.


Subject(s)
Multiomics , Vaccines , Humans , Vaccinology/methods , Reproducibility of Results , Computer Simulation
12.
medRxiv ; 2024 Feb 13.
Article in English | MEDLINE | ID: mdl-38405760

ABSTRACT

Age is a major risk factor for severe coronavirus disease-2019 (COVID-19), yet the mechanisms responsible for this relationship have remained incompletely understood. To address this, we evaluated the impact of aging on host and viral dynamics in a prospective, multicenter cohort of 1,031 patients hospitalized for COVID-19, ranging from 18 to 96 years of age. We performed blood transcriptomics and nasal metatranscriptomics, and measured peripheral blood immune cell populations, inflammatory protein expression, anti-SARS-CoV-2 antibodies, and anti-interferon (IFN) autoantibodies. We found that older age correlated with an increased SARS-CoV-2 viral load at the time of admission, and with delayed viral clearance over 28 days. This contributed to an age-dependent increase in type I IFN gene expression in both the respiratory tract and blood. We also observed age-dependent transcriptional increases in peripheral blood IFN-γ, neutrophil degranulation, and Toll-like receptor (TLR) signaling pathways, and decreases in T cell receptor (TCR) and B cell receptor signaling pathways. Over time, older adults exhibited a remarkably sustained induction of proinflammatory genes (e.g., CXCL6) and serum chemokines (e.g., CXCL9) compared to younger individuals, highlighting a striking age-dependent impairment in inflammation resolution. Augmented inflammatory signaling also involved the upper airway, where aging was associated with upregulation of TLR, IL17, type I IFN and IL1 pathways, and downregulation of TCR and PD-1 signaling pathways. Metatranscriptomics revealed that the oldest adults exhibited disproportionate reactivation of herpes simplex virus and cytomegalovirus in the upper airway following hospitalization. Mass cytometry demonstrated that aging correlated with reduced naïve T and B cell populations, and increased monocytes and exhausted natural killer cells. 
Transcriptional and protein biomarkers of disease severity markedly differed with age, with the oldest adults exhibiting greater expression of TLR and inflammasome signaling genes, as well as proinflammatory proteins (e.g., IL6, CXCL8), in severe COVID-19 compared to mild/moderate disease. Anti-IFN autoantibody prevalence correlated with both age and disease severity. Taken together, this work profiles both host and microbe in the blood and airway to provide fresh insights into aging-related immune changes in a large cohort of vaccine-naïve COVID-19 patients. We observed age-dependent immune dysregulation at the transcriptional, protein and cellular levels, manifesting in an imbalance of inflammatory responses over the course of hospitalization, and suggesting potential new therapeutic targets.

13.
Sci Transl Med ; 16(743): eadj5154, 2024 Apr 17.
Article in English | MEDLINE | ID: mdl-38630846

ABSTRACT

Age is a major risk factor for severe coronavirus disease 2019 (COVID-19), yet the mechanisms behind this relationship have remained incompletely understood. To address this, we evaluated the impact of aging on host immune response in the blood and the upper airway, as well as the nasal microbiome in a prospective, multicenter cohort of 1031 vaccine-naïve patients hospitalized for COVID-19 between 18 and 96 years old. We performed mass cytometry, serum protein profiling, anti-severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) antibody assays, and blood and nasal transcriptomics. We found that older age correlated with increased SARS-CoV-2 viral abundance upon hospital admission, delayed viral clearance, and increased type I interferon gene expression in both the blood and upper airway. We also observed age-dependent up-regulation of innate immune signaling pathways and down-regulation of adaptive immune signaling pathways. Older adults had lower naïve T and B cell populations and higher monocyte populations. Over time, older adults demonstrated a sustained induction of pro-inflammatory genes and serum chemokines compared with younger individuals, suggesting an age-dependent impairment in inflammation resolution. Transcriptional and protein biomarkers of disease severity differed with age, with the oldest adults exhibiting greater expression of pro-inflammatory genes and proteins in severe disease. Together, our study finds that aging is associated with impaired viral clearance, dysregulated immune signaling, and persistent and potentially pathologic activation of pro-inflammatory genes and proteins.


Subject(s)
COVID-19 , Humans , Aged , Adolescent , Young Adult , Adult , Middle Aged , Aged, 80 and over , SARS-CoV-2 , Prospective Studies , Multiomics , Chemokines
14.
medRxiv ; 2024 Jan 12.
Article in English | MEDLINE | ID: mdl-38260484

ABSTRACT

Background: Long COVID contributes to the global burden of disease. Proposed root cause hypotheses include the persistence of SARS-CoV-2 viral reservoir, autoimmunity, and reactivation of latent herpesviruses. Patients have reported various changes in Long COVID symptoms after COVID-19 vaccinations, leaving uncertainty about whether vaccine-induced immune responses may alleviate or worsen disease pathology. Methods: In this prospective study, we evaluated changes in symptoms and immune responses after COVID-19 vaccination in 16 vaccine-naïve individuals with Long COVID. Surveys were administered before vaccination and then at 2, 6, and 12 weeks after receiving the first vaccine dose of the primary series. Simultaneously, SARS-CoV-2-reactive TCR enrichment, SARS-CoV-2-specific antibody responses, antibody responses to other viral and self-antigens, and circulating cytokines were quantified before vaccination and at 6 and 12 weeks after vaccination. Results: Self-report at 12 weeks post-vaccination indicated 10 out of 16 participants had improved health, 3 had no change, 1 had worse health, and 2 reported marginal changes. Significant elevation in SARS-CoV-2-specific TCRs and Spike protein-specific IgG were observed 6 and 12 weeks after vaccination. No changes in reactivities were observed against herpes viruses and self-antigens. Within this dataset, higher baseline sIL-6R was associated with symptom improvement, and the two top features associated with non-improvement were high IFN-β and CNTF, among soluble analytes. Conclusions: Our study showed that in this small sample, vaccination improved the health or resulted in no change to the health of most participants, though few experienced worsening. Vaccination was associated with increased SARS-CoV-2 Spike protein-specific IgG and T cell expansion in most individuals with Long COVID. 
Symptom improvement was observed in those with baseline elevated sIL-6R, while elevated interferon and neuropeptide levels were associated with a lack of improvement.

15.
J Clin Invest ; 134(9)2024 May 01.
Article in English | MEDLINE | ID: mdl-38690733

ABSTRACT

BACKGROUND: Patients hospitalized for COVID-19 exhibit diverse clinical outcomes, with outcomes for some individuals diverging over time even though their initial disease severity appears similar to that of other patients. A systematic evaluation of molecular and cellular profiles over the full disease course can link immune programs and their coordination with progression heterogeneity. METHODS: We performed deep immunophenotyping and conducted longitudinal multiomics modeling, integrating 10 assays for 1,152 Immunophenotyping Assessment in a COVID-19 Cohort (IMPACC) study participants and identifying several immune cascades that were significant drivers of differential clinical outcomes. RESULTS: Increasing disease severity was driven by a temporal pattern that began with the early upregulation of immunosuppressive metabolites and then elevated levels of inflammatory cytokines, signatures of coagulation, formation of neutrophil extracellular traps, and T cell functional dysregulation. A second immune cascade, predictive of 28-day mortality among critically ill patients, was characterized by reduced total plasma Igs and B cells and dysregulated IFN responsiveness. 
We demonstrated that the balance disruption between IFN-stimulated genes and IFN inhibitors is a crucial biomarker of COVID-19 mortality, potentially contributing to failure of viral clearance in patients with fatal illness. CONCLUSION: Our longitudinal multiomics profiling study revealed temporal coordination across diverse omics that potentially explain the disease progression, providing insights that can inform the targeted development of therapies for patients hospitalized with COVID-19, especially those who are critically ill. TRIAL REGISTRATION: ClinicalTrials.gov NCT04378777. FUNDING: NIH (5R01AI135803-03, 5U19AI118608-04, 5U19AI128910-04, 4U19AI090023-11, 4U19AI118610-06, R01AI145835-01A1S1, 5U19AI062629-17, 5U19AI057229-17, 5U19AI125357-05, 5U19AI128913-03, 3U19AI077439-13, 5U54AI142766-03, 5R01AI104870-07, 3U19AI089992-09, 3U19AI128913-03, and 5T32DA018926-18); NIAID, NIH (3U19AI1289130, U19AI128913-04S1, and R01AI122220); and National Science Foundation (DMS2310836).


Subject(s)
COVID-19 , SARS-CoV-2 , Severity of Illness Index , Humans , COVID-19/immunology , COVID-19/mortality , COVID-19/blood , Male , Longitudinal Studies , SARS-CoV-2/immunology , Female , Middle Aged , Aged , Adult , Cytokines/blood , Cytokines/immunology , Multiomics
16.
Hum Vaccin Immunother ; 19(2): 2251830, 2023 Aug 01.
Article in English | MEDLINE | ID: mdl-37697867

ABSTRACT

Overfitting describes the phenomenon where a highly predictive model on the training data generalizes poorly to future observations. It is a common concern when applying machine learning techniques to contemporary medical applications, such as predicting vaccination response and disease status in infectious disease or cancer studies. This review examines the causes of overfitting and offers strategies to counteract it, focusing on model complexity reduction, reliable model evaluation, and harnessing data diversity. Through discussion of the underlying mathematical models and illustrative examples using both synthetic data and published real datasets, our objective is to equip analysts and bioinformaticians with the knowledge and tools necessary to detect and mitigate overfitting in their research.
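The gap between training and held-out performance that this abstract describes can be reproduced with a very small experiment. The sketch below is illustrative only and is not taken from the review: it assumes a synthetic one-dimensional dataset with 30% label noise and uses a 1-nearest-neighbour classifier, a highly flexible model that memorises its training set. The function names are hypothetical.

```python
import random

def make_data(n, rng, noise=0.3):
    """Draw x uniformly in [0, 1); the true label is 1 when x > 0.5, flipped with probability `noise`."""
    data = []
    for _ in range(n):
        x = rng.random()
        label = 1 if x > 0.5 else 0
        if rng.random() < noise:
            label = 1 - label
        data.append((x, label))
    return data

def predict_1nn(train, x):
    """Return the label of the training point closest to x."""
    return min(train, key=lambda t: abs(t[0] - x))[1]

def accuracy(train, data):
    """Fraction of points in `data` whose label the 1-NN classifier predicts correctly."""
    return sum(predict_1nn(train, x) == y for x, y in data) / len(data)

rng = random.Random(0)
train = make_data(200, rng)
test = make_data(200, rng)

train_acc = accuracy(train, train)  # each training point is its own nearest neighbour
test_acc = accuracy(train, test)    # fresh draws from the same distribution

print(f"training accuracy: {train_acc:.2f}")
print(f"held-out accuracy: {test_acc:.2f}")
```

The classifier scores perfectly on the data it has memorised, including the noisy labels, yet does much worse on new observations. Evaluating only on the training set would hide this, which is why the abstract stresses reliable model evaluation alongside reducing model complexity.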


Subject(s)
Machine Learning , Vaccination
17.
bioRxiv ; 2023 Dec 12.
Article in English | MEDLINE | ID: mdl-37961111

ABSTRACT

The disparity in genetic risk prediction accuracy between European and non-European individuals highlights a critical challenge in health inequality. To bridge this gap, we introduce JointPRS, a novel method that models multiple populations jointly to improve genetic risk predictions for non-European individuals. JointPRS has three key features. First, it encompasses all diverse populations to improve prediction accuracy, rather than relying solely on the target population with a singular auxiliary European group. Second, it autonomously estimates and leverages chromosome-wise cross-population genetic correlations to infer the effect sizes of genetic variants. Lastly, it provides an auto version that has comparable performance to the tuning version to accommodate the situation with no validation dataset. Through extensive simulations and real data applications to 22 quantitative traits and four binary traits in East Asian populations, nine quantitative traits and one binary trait in African populations, and four quantitative traits in South Asian populations, we demonstrate that JointPRS outperforms state-of-the-art methods, improving the prediction accuracy for both quantitative and binary traits in non-European populations.

18.
Res Sq ; 2023 Dec 25.
Article in English | MEDLINE | ID: mdl-38234764

ABSTRACT

The disparity in genetic risk prediction accuracy between European and non-European individuals highlights a critical challenge in health inequality. To bridge this gap, we introduce JointPRS, a novel method that models multiple populations jointly to improve genetic risk predictions for non-European individuals. JointPRS has three key features. First, it encompasses all diverse populations to improve prediction accuracy, rather than relying solely on the target population with a singular auxiliary European group. Second, it autonomously estimates and leverages chromosome-wise cross-population genetic correlations to infer the effect sizes of genetic variants. Lastly, it provides an auto version that has comparable performance to the tuning version to accommodate the situation with no validation dataset. Through extensive simulations and real data applications to 22 quantitative traits and four binary traits in East Asian populations, nine quantitative traits and one binary trait in African populations, and four quantitative traits in South Asian populations, we demonstrate that JointPRS outperforms state-of-the-art methods, improving the prediction accuracy for both quantitative and binary traits in non-European populations.

19.
bioRxiv ; 2023 Sep 27.
Article in English | MEDLINE | ID: mdl-36747790

ABSTRACT

MOTIVATION: Predictive biological signatures provide utility as biomarkers for disease diagnosis and prognosis, as well as prediction of responses to vaccination or therapy. These signatures are identified from high-throughput profiling assays through a combination of dimensionality reduction and machine learning techniques. The genes, proteins, metabolites, and other biological analytes that compose signatures also generate hypotheses on the underlying mechanisms driving biological responses, thus improving biological understanding. Dimensionality reduction is a critical step in signature discovery to address the large number of analytes in omics datasets, especially for multi-omics profiling studies with tens of thousands of measurements. Latent factor models, which can account for the structural heterogeneity across diverse assays, effectively integrate multi-omics data and reduce dimensionality to a small number of factors that capture correlations and associations among measurements. These factors provide biologically interpretable features for predictive modeling. However, multi-omics integration and predictive modeling are generally performed independently in sequential steps, leading to suboptimal factor construction. Combining these steps can yield better multi-omics signatures that are more predictive while still being biologically meaningful. RESULTS: We developed a supervised variational Bayesian factor model that extracts multi-omics signatures from high-throughput profiling datasets that can span multiple data types. Signature-based multiPle-omics intEgration via lAtent factoRs (SPEAR) adaptively determines factor rank, emphasis on factor structure, data relevance and feature sparsity. The method improves the reconstruction of underlying factors in synthetic examples and prediction accuracy of COVID-19 severity and breast cancer tumor subtypes. AVAILABILITY: SPEAR is a publicly available R-package hosted at https://bitbucket.org/kleinstein/SPEAR.

20.
iScience ; 26(12): 108387, 2023 Dec 15.
Article in English | MEDLINE | ID: mdl-38047068

ABSTRACT

Infection with West Nile virus (WNV) drives a wide range of responses, from asymptomatic to flu-like symptoms/fever or severe cases of encephalitis and death. To identify cellular and molecular signatures distinguishing WNV severity, we employed systems profiling of peripheral blood from asymptomatic and severely ill individuals infected with WNV. We interrogated immune responses longitudinally from acute infection through convalescence employing single-cell protein and transcriptional profiling complemented with matched serum proteomics and metabolomics as well as multi-omics analysis. At the acute time point, we detected both elevation of pro-inflammatory markers in innate immune cell types and reduction of regulatory T cell activity in participants with severe infection, whereas asymptomatic donors had higher expression of genes associated with anti-inflammatory CD16+ monocytes. Therefore, we demonstrated the potential of systems immunology using multiple cell-type and cell-state-specific analyses to identify correlates of infection severity and host cellular activity contributing to an effective anti-viral response.
