1.
Nature ; 623(7985): 139-148, 2023 Nov.
Article in English | MEDLINE | ID: mdl-37748514

ABSTRACT

Post-acute infection syndromes may develop after acute viral disease [1]. Infection with SARS-CoV-2 can result in the development of a post-acute infection syndrome known as long COVID. Individuals with long COVID frequently report unremitting fatigue, post-exertional malaise, and a variety of cognitive and autonomic dysfunctions [2-4]. However, the biological processes that are associated with the development and persistence of these symptoms are unclear. Here 275 individuals with or without long COVID were enrolled in a cross-sectional study that included multidimensional immune phenotyping and unbiased machine learning methods to identify biological features associated with long COVID. Marked differences were noted in circulating myeloid and lymphocyte populations relative to the matched controls, as well as evidence of exaggerated humoral responses directed against SARS-CoV-2 among participants with long COVID. Furthermore, higher antibody responses directed against non-SARS-CoV-2 viral pathogens were observed among individuals with long COVID, particularly Epstein-Barr virus. Levels of soluble immune mediators and hormones varied among groups, with cortisol levels being lower among participants with long COVID. Integration of immune phenotyping data into unbiased machine learning models identified the key features that are most strongly associated with long COVID status. Collectively, these findings may help to guide future studies into the pathobiology of long COVID and help with developing relevant biomarkers.


Subjects
Antibodies, Viral , Herpesvirus 4, Human , Hydrocortisone , Lymphocytes , Myeloid Cells , Post-Acute COVID-19 Syndrome , SARS-CoV-2 , Humans , Antibodies, Viral/blood , Antibodies, Viral/immunology , Biomarkers/blood , Cross-Sectional Studies , Herpesvirus 4, Human/immunology , Hydrocortisone/blood , Immunophenotyping , Lymphocytes/immunology , Machine Learning , Myeloid Cells/immunology , Post-Acute COVID-19 Syndrome/diagnosis , Post-Acute COVID-19 Syndrome/immunology , Post-Acute COVID-19 Syndrome/physiopathology , Post-Acute COVID-19 Syndrome/virology , SARS-CoV-2/immunology
2.
Bioinformatics ; 40(5)2024 May 02.
Article in English | MEDLINE | ID: mdl-38603606

ABSTRACT

MOTIVATION: Predictive biological signatures provide utility as biomarkers for disease diagnosis and prognosis, as well as prediction of responses to vaccination or therapy. These signatures are identified from high-throughput profiling assays through a combination of dimensionality reduction and machine learning techniques. The genes, proteins, metabolites, and other biological analytes that compose signatures also generate hypotheses on the underlying mechanisms driving biological responses, thus improving biological understanding. Dimensionality reduction is a critical step in signature discovery to address the large number of analytes in omics datasets, especially for multi-omics profiling studies with tens of thousands of measurements. Latent factor models, which can account for the structural heterogeneity across diverse assays, effectively integrate multi-omics data and reduce dimensionality to a small number of factors that capture correlations and associations among measurements. These factors provide biologically interpretable features for predictive modeling. However, multi-omics integration and predictive modeling are generally performed independently in sequential steps, leading to suboptimal factor construction. Combining these steps can yield better multi-omics signatures that are more predictive while still being biologically meaningful. RESULTS: We developed a supervised variational Bayesian factor model that extracts multi-omics signatures from high-throughput profiling datasets that can span multiple data types. Signature-based multiPle-omics intEgration via lAtent factoRs (SPEAR) adaptively determines factor rank, emphasis on factor structure, data relevance and feature sparsity. The method improves the reconstruction of underlying factors in synthetic examples and prediction accuracy of coronavirus disease 2019 severity and breast cancer tumor subtypes. 
AVAILABILITY AND IMPLEMENTATION: SPEAR is a publicly available R-package hosted at https://bitbucket.org/kleinstein/SPEAR.


Subjects
Bayes Theorem , Humans , COVID-19/virology , Computational Biology/methods , Female , Genomics/methods , Supervised Machine Learning , Multiomics
3.
World J Microbiol Biotechnol ; 37(5): 83, 2021 Apr 15.
Article in English | MEDLINE | ID: mdl-33855634

ABSTRACT

A novel chitosanase gene, designated as PbCsn8, was cloned from Paenibacillus barengoltzii. It shared the highest identity of 73% with the glycoside hydrolase (GH) family 8 chitosanase from Bacillus thuringiensis JAM-GG01. The gene was heterologously expressed in Bacillus subtilis as an extracellular protein, and the highest chitosanase yield of 1,108 U/mL was obtained by high-cell density fermentation in a 5-L fermentor. The recombinant chitosanase (PbCsn8) was purified to homogeneity and biochemically characterized. PbCsn8 was most active at pH 5.5 and 70 °C. It was stable in a wide pH range of 5.0-11.0 and up to 55 °C. PbCsn8 was a bifunctional enzyme, exhibiting both chitosanase and glucanase activities, with the highest specificity towards chitosan (360 U/mg), followed by barley β-glucan (72 U/mg) and lichenan (13 U/mg). It hydrolyzed chitosan to release mainly chitooligosaccharides (COSs) with degree of polymerization (DP) 2-3, whereas it hydrolyzed barley β-glucan to yield mainly glucooligosaccharides with DP > 5. PbCsn8 was further applied in COS production, and the highest COS yield of 79.3% (w/w) was obtained. This is the first report on a GH family 8 chitosanase from P. barengoltzii. The high yield and remarkable hydrolysis properties may make PbCsn8 a good candidate in industrial application.


Subjects
Chitin/analogs & derivatives , Glycoside Hydrolases/metabolism , Paenibacillus/enzymology , Paenibacillus/genetics , Paenibacillus/metabolism , Amino Acid Sequence , Bacterial Proteins/genetics , Bacterial Proteins/metabolism , Chitin/biosynthesis , Chitosan/metabolism , Cloning, Molecular , Glucans/metabolism , Glycoside Hydrolases/genetics , Hydrolysis , Industrial Microbiology , Oligosaccharides , Recombinant Proteins/genetics , Recombinant Proteins/metabolism , Substrate Specificity , beta-Glucans/metabolism
4.
Proc Natl Acad Sci U S A ; 114(43): 11368-11373, 2017 Oct 24.
Article in English | MEDLINE | ID: mdl-29073058

ABSTRACT

Maintaining a robust blood product supply is an essential requirement to guarantee optimal patient care in modern health care systems. However, daily blood product use is difficult to anticipate. Platelet products are the most variable in daily usage, have short shelf lives, and are also the most expensive to produce, test, and store. Due to the combination of absolute need, uncertain daily demand, and short shelf life, platelet products are frequently wasted due to expiration. Our aim is to build and validate a statistical model to forecast future platelet demand and thereby reduce wastage. We have investigated platelet usage patterns at our institution, and specifically interrogated the relationship between platelet usage and aggregated hospital-wide patient data over a recent consecutive 29-mo period. Using a convex statistical formulation, we have found that platelet usage is highly dependent on weekday/weekend pattern, number of patients with various abnormal complete blood count measurements, and location-specific hospital census data. We incorporated these relationships in a mathematical model to guide collection and ordering strategy. This model minimizes waste due to expiration while avoiding shortages; the number of remaining platelet units at the end of any day stays above 10 in our model during the same period. Compared with historical expiration rates during the same period, our model reduces the expiration rate from 10.5 to 3.2%. Extrapolating our results to the ∼2 million units of platelets transfused annually within the United States, if implemented successfully, our model can potentially save ∼80 million dollars in health care costs.


Subjects
Models, Statistical , Platelet Transfusion/statistics & numerical data , Tertiary Healthcare , California , Electronic Health Records , Health Care Costs , Humans , Platelet Transfusion/economics , Tertiary Healthcare/economics
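
The extrapolation at the end of the platelet abstract (record 4) can be checked with back-of-envelope arithmetic. The sketch below is not the authors' calculation. It assumes that the expiration rate is defined as expired / (expired + transfused) and that the same rates apply nationally; the helper `expired_units` and the "implied cost per unit" are hypothetical quantities introduced only for illustration.

```python
def expired_units(transfused, expiration_rate):
    """Units lost to expiration, ASSUMING rate = expired / (expired + transfused)."""
    return transfused * expiration_rate / (1.0 - expiration_rate)

# Figures quoted in the abstract (all approximate).
TRANSFUSED_PER_YEAR = 2_000_000   # platelet units transfused annually in the US
RATE_HISTORICAL = 0.105           # historical expiration rate
RATE_MODEL = 0.032                # expiration rate under the forecasting model
REPORTED_SAVINGS = 80_000_000     # dollars per year

# Units that would no longer expire if the lower rate held nationally.
units_saved = (expired_units(TRANSFUSED_PER_YEAR, RATE_HISTORICAL)
               - expired_units(TRANSFUSED_PER_YEAR, RATE_MODEL))

# Cost per unit that would make the reported savings consistent
# with the assumption above (an inferred value, not a figure from the paper).
implied_cost_per_unit = REPORTED_SAVINGS / units_saved

print(f"units saved per year:  {units_saved:,.0f}")
print(f"implied cost per unit: ${implied_cost_per_unit:,.0f}")
```

Under a different definition of the expiration rate the implied unit cost changes, so the output is only a plausibility check on the order of magnitude.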
5.
Can J Stat ; 48(3): 447-470, 2020 Sep.
Article in English | MEDLINE | ID: mdl-36092475

ABSTRACT

We propose a simple method for evaluating the model that has been chosen by an adaptive regression procedure, our main focus being the lasso. This procedure deletes each chosen predictor and refits the lasso to get a set of models that are "close" to the chosen "base model," and compares the error rate of the base model with those of nearby models. If the deletion of a predictor leads to significant deterioration in the model's predictive power, the predictor is called indispensable; otherwise, the nearby model is called acceptable and can serve as a good alternative to the base model. This provides both an assessment of the predictive contribution of each variable and a set of alternative models that may be used in place of the chosen model. We call this procedure "Next-Door analysis" since it examines models "next" to the base model. It can be applied to supervised learning problems with ℓ1 penalization and stepwise procedures. We have implemented it in the R language as a library to accompany the well-known glmnet library.


The authors propose a simple method for evaluating models chosen by an adaptive regression procedure such as the lasso, which is their main focus. Their procedure removes each predictor in turn and refits the lasso to obtain a set of models that are close to the base model. They then compare the error rate of the base model with those of its neighbors. When removing a variable leads to a marked drop in the model's predictive power, the predictor is deemed indispensable. Otherwise, the nearby model is deemed acceptable and can serve as a replacement for the base model. This approach both measures the predictive contribution of each variable and builds a set of replacement models. The authors have named this approach "Next-Door analysis" because it examines models close to the base model. It can be applied to supervised learning problems with ℓ1 penalization and to stepwise procedures. The authors have implemented their method in R as a library accompanying the popular glmnet library.
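
The delete-and-refit loop described in record 5 can be sketched in a few lines. This is a minimal illustration, not the authors' R library: it uses a hand-written coordinate-descent lasso without an intercept, a fixed penalty `lam`, and a plain held-out error difference in place of the significance assessment, which the abstract does not detail. All function names and the synthetic data are hypothetical.

```python
import random

def soft_threshold(z, lam):
    """Soft-thresholding operator used by the lasso coordinate update."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def lasso_cd(X, y, lam, excluded=(), n_iter=100):
    """Coordinate descent for (1/2n)||y - Xb||^2 + lam*||b||_1 (no intercept).
    Columns listed in `excluded` are held at zero."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if j in excluded or col_sq[j] == 0.0:
                continue
            # Correlation of column j with the partial residual.
            rho = sum(X[i][j] * (resid[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
    return beta

def mse(X, y, beta):
    """Mean squared prediction error of a linear model."""
    return sum((yi - sum(b * x for b, x in zip(beta, row))) ** 2
               for row, yi in zip(X, y)) / len(y)

def next_door(X_train, y_train, X_test, y_test, lam):
    """Fit the base model, then refit once per selected predictor with that
    predictor removed; return base error and the error increase per predictor."""
    base = lasso_cd(X_train, y_train, lam)
    base_err = mse(X_test, y_test, base)
    increase = {}
    for j in (j for j, b in enumerate(base) if b != 0.0):
        neighbor = lasso_cd(X_train, y_train, lam, excluded={j})
        increase[j] = mse(X_test, y_test, neighbor) - base_err
    return base_err, increase

# Synthetic data: only predictor 0 carries signal (hypothetical example).
rng = random.Random(0)

def make_data(n):
    X = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(n)]
    y = [3.0 * row[0] + 0.1 * rng.gauss(0, 1) for row in X]
    return X, y

X_train, y_train = make_data(200)
X_test, y_test = make_data(200)
base_err, increase = next_door(X_train, y_train, X_test, y_test, lam=0.1)
print("base test MSE:", round(base_err, 3))
print("error increase when each selected predictor is removed:", increase)
```

A predictor whose removal causes a large error increase would be a candidate for "indispensable"; a small increase points to an acceptable next-door model. A real analysis would choose `lam` by cross-validation and judge the increase against its sampling variability.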

6.
PLoS Comput Biol ; 13(12): e1005875, 2017 Dec.
Article in English | MEDLINE | ID: mdl-29281633

ABSTRACT

Mass cytometry (CyTOF) has greatly expanded the capability of cytometry. It is now easy to generate multiple CyTOF samples in a single study, with each sample containing single-cell measurements of 50 markers for hundreds of thousands of cells or more. Current methods do not adequately address the issues that arise when combining multiple samples for subpopulation discovery, and these issues can be quickly and dramatically amplified as the number of samples increases. To overcome this limitation, we developed Partition-Assisted Clustering and Multiple Alignments of Networks (PAC-MAN) for the fast automatic identification of cell populations in CyTOF data closely matching that of expert manual discovery, and for alignments between subpopulations across samples to define dataset-level cellular states. PAC-MAN is computationally efficient, allowing the management of very large CyTOF datasets, which are increasingly common in clinical studies and cancer studies that monitor various tissue samples for each subject.


Subjects
Single-Cell Analysis/statistics & numerical data , Animals , Biomarkers/analysis , Cluster Analysis , Computational Biology , Computer Simulation , Data Interpretation, Statistical , Databases, Factual , Flow Cytometry/statistics & numerical data , Gene Expression , Humans , Mice
7.
Stat Sin ; 28(3): 1225-1243, 2018 Jul.
Article in English | MEDLINE | ID: mdl-35677806

ABSTRACT

We propose a new method for supervised learning. The hubNet procedure fits a hub-based graphical model to the predictors, to estimate the amount of "connection" that each predictor has with other predictors. This yields a set of predictor weights that are then used in a regularized regression such as the lasso or elastic net. The resulting procedure is easy to implement, can often yield higher or competitive prediction accuracy with fewer features than the lasso, and can give insight into the underlying structure of the predictors. HubNet can be generalized seamlessly to supervised problems such as regularized logistic regression (and other GLMs), Cox's proportional hazards model, and nonlinear procedures such as random forests and boosting. We prove recovery results under a specialized model and illustrate the method on real and simulated data.

8.
Int J Biol Macromol ; 269(Pt 1): 132041, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38705315

ABSTRACT

Hemocyanin, an oxygen-transport protein, is widely distributed in the hemolymph of marine arthropods and mollusks, playing an important role in their physiological processes. Recently, hemocyanin has been recognized as a multifunctional glycoprotein involved in the immunological responses of aquatic invertebrates. Consequently, the link between hemocyanin functions and their potential applications has garnered increased attention. This review offers an integrated overview of hemocyanin's structure, physicochemical characteristics, and bioactivities to further promote the utilization of hemocyanin derived from marine products. Specifically, we review its implication in two aspects of food and aquaculture industries: quality and health. Hemocyanin's inducible phenoloxidase activity is thought to be an inducer of melanosis in crustaceans. New anti-melanosis agents targeted to hemocyanin need to be explored. The red-color change observed in shrimp shells is related to hemocyanin, affecting consumer preferences. Hemocyanin's adaptive modification in response to the aquatic environment is available as a biomarker. Additionally, hemocyanin is endowed with bioactivities encompassing anti-microbial, antiviral, and therapeutic activities. Hemocyanin is also a novel allergen and its allergenic features remain incompletely characterized.


Subjects
Hemocyanins , Hemocyanins/chemistry , Animals , Food Industry , Aquatic Organisms/chemistry , Humans
9.
medRxiv ; 2024 Mar 04.
Article in English | MEDLINE | ID: mdl-38496502

ABSTRACT

Strong sex differences in the frequencies and manifestations of Long COVID (LC) have been reported with females significantly more likely than males to present with LC after acute SARS-CoV-2 infection [1-7]. However, whether immunological traits underlying LC differ between sexes, and whether such differences explain the differential manifestations of LC symptomology is currently unknown. Here, we performed sex-based multi-dimensional immune-endocrine profiling of 165 individuals [8] with and without LC in an exploratory, cross-sectional study to identify key immunological traits underlying biological sex differences in LC. We found that female and male participants with LC experienced different sets of symptoms, and distinct patterns of organ system involvement, with female participants suffering from a higher symptom burden. Machine learning approaches identified differential sets of immune features that characterized LC in females and males. Males with LC had decreased frequencies of monocyte and DC populations, elevated NK cells, and plasma cytokines including IL-8 and TGF-β-family members. Females with LC had increased frequencies of exhausted T cells, cytokine-secreting T cells, higher antibody reactivity to latent herpes viruses including EBV, HSV-2, and CMV, and lower testosterone levels than their control female counterparts. Testosterone levels were significantly associated with lower symptom burden in LC participants over sex designation. These findings suggest distinct immunological processes of LC in females and males and illuminate the crucial role of immune-endocrine dysregulation in sex-specific pathology.

10.
Cell Rep Methods ; 4(3): 100731, 2024 Mar 25.
Article in English | MEDLINE | ID: mdl-38490204

ABSTRACT

Systems vaccinology studies have identified factors affecting individual vaccine responses, but comparing these findings is challenging due to varying study designs. To address this lack of reproducibility, we established a community resource for comparing Bordetella pertussis booster responses and for hosting annual contests for predicting patients' vaccination outcomes. We report here on our experiences with the "dry-run" prediction contest. We found that, among 20+ models adopted from the literature, the most successful model predicting vaccination outcome was based on age alone. This confirms our concerns about the reproducibility of conclusions between different vaccinology studies. Further, we found that, for newly trained models, handling of baseline information on the target variables was crucial. Overall, multiple co-inertia analysis gave the best results of the tested modeling approaches. Our goal is to engage the community in these prediction challenges by making data and models available and opening a public contest in August 2024.


Subjects
Multiomics , Vaccines , Humans , Vaccinology/methods , Reproducibility of Results , Computer Simulation
11.
medRxiv ; 2024 Jan 12.
Article in English | MEDLINE | ID: mdl-38260484

ABSTRACT

Background: Long COVID contributes to the global burden of disease. Proposed root cause hypotheses include the persistence of SARS-CoV-2 viral reservoir, autoimmunity, and reactivation of latent herpesviruses. Patients have reported various changes in Long COVID symptoms after COVID-19 vaccinations, leaving uncertainty about whether vaccine-induced immune responses may alleviate or worsen disease pathology. Methods: In this prospective study, we evaluated changes in symptoms and immune responses after COVID-19 vaccination in 16 vaccine-naïve individuals with Long COVID. Surveys were administered before vaccination and then at 2, 6, and 12 weeks after receiving the first vaccine dose of the primary series. Simultaneously, SARS-CoV-2-reactive TCR enrichment, SARS-CoV-2-specific antibody responses, antibody responses to other viral and self-antigens, and circulating cytokines were quantified before vaccination and at 6 and 12 weeks after vaccination. Results: Self-report at 12 weeks post-vaccination indicated 10 out of 16 participants had improved health, 3 had no change, 1 had worse health, and 2 reported marginal changes. Significant elevation in SARS-CoV-2-specific TCRs and Spike protein-specific IgG were observed 6 and 12 weeks after vaccination. No changes in reactivities were observed against herpes viruses and self-antigens. Within this dataset, higher baseline sIL-6R was associated with symptom improvement, and the two top features associated with non-improvement were high IFN-β and CNTF, among soluble analytes. Conclusions: Our study showed that in this small sample, vaccination improved the health or resulted in no change to the health of most participants, though few experienced worsening. Vaccination was associated with increased SARS-CoV-2 Spike protein-specific IgG and T cell expansion in most individuals with Long COVID. Symptom improvement was observed in those with baseline elevated sIL-6R, while elevated interferon and neuropeptide levels were associated with a lack of improvement.

12.
Sci Transl Med ; 16(743): eadj5154, 2024 Apr 17.
Article in English | MEDLINE | ID: mdl-38630846

ABSTRACT

Age is a major risk factor for severe coronavirus disease 2019 (COVID-19), yet the mechanisms behind this relationship have remained incompletely understood. To address this, we evaluated the impact of aging on host immune response in the blood and the upper airway, as well as the nasal microbiome in a prospective, multicenter cohort of 1031 vaccine-naïve patients hospitalized for COVID-19 between 18 and 96 years old. We performed mass cytometry, serum protein profiling, anti-severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) antibody assays, and blood and nasal transcriptomics. We found that older age correlated with increased SARS-CoV-2 viral abundance upon hospital admission, delayed viral clearance, and increased type I interferon gene expression in both the blood and upper airway. We also observed age-dependent up-regulation of innate immune signaling pathways and down-regulation of adaptive immune signaling pathways. Older adults had lower naïve T and B cell populations and higher monocyte populations. Over time, older adults demonstrated a sustained induction of pro-inflammatory genes and serum chemokines compared with younger individuals, suggesting an age-dependent impairment in inflammation resolution. Transcriptional and protein biomarkers of disease severity differed with age, with the oldest adults exhibiting greater expression of pro-inflammatory genes and proteins in severe disease. Together, our study finds that aging is associated with impaired viral clearance, dysregulated immune signaling, and persistent and potentially pathologic activation of pro-inflammatory genes and proteins.


Subjects
COVID-19 , Humans , Aged , Adolescent , Young Adult , Adult , Middle Aged , Aged, 80 and over , SARS-CoV-2 , Prospective Studies , Multiomics , Chemokines
13.
medRxiv ; 2024 Feb 13.
Article in English | MEDLINE | ID: mdl-38405760

ABSTRACT

Age is a major risk factor for severe coronavirus disease-2019 (COVID-19), yet the mechanisms responsible for this relationship have remained incompletely understood. To address this, we evaluated the impact of aging on host and viral dynamics in a prospective, multicenter cohort of 1,031 patients hospitalized for COVID-19, ranging from 18 to 96 years of age. We performed blood transcriptomics and nasal metatranscriptomics, and measured peripheral blood immune cell populations, inflammatory protein expression, anti-SARS-CoV-2 antibodies, and anti-interferon (IFN) autoantibodies. We found that older age correlated with an increased SARS-CoV-2 viral load at the time of admission, and with delayed viral clearance over 28 days. This contributed to an age-dependent increase in type I IFN gene expression in both the respiratory tract and blood. We also observed age-dependent transcriptional increases in peripheral blood IFN-γ, neutrophil degranulation, and Toll-like receptor (TLR) signaling pathways, and decreases in T cell receptor (TCR) and B cell receptor signaling pathways. Over time, older adults exhibited a remarkably sustained induction of proinflammatory genes (e.g., CXCL6) and serum chemokines (e.g., CXCL9) compared to younger individuals, highlighting a striking age-dependent impairment in inflammation resolution. Augmented inflammatory signaling also involved the upper airway, where aging was associated with upregulation of TLR, IL17, type I IFN and IL1 pathways, and downregulation of TCR and PD-1 signaling pathways. Metatranscriptomics revealed that the oldest adults exhibited disproportionate reactivation of herpes simplex virus and cytomegalovirus in the upper airway following hospitalization. Mass cytometry demonstrated that aging correlated with reduced naïve T and B cell populations, and increased monocytes and exhausted natural killer cells. Transcriptional and protein biomarkers of disease severity markedly differed with age, with the oldest adults exhibiting greater expression of TLR and inflammasome signaling genes, as well as proinflammatory proteins (e.g., IL6, CXCL8), in severe COVID-19 compared to mild/moderate disease. Anti-IFN autoantibody prevalence correlated with both age and disease severity. Taken together, this work profiles both host and microbe in the blood and airway to provide fresh insights into aging-related immune changes in a large cohort of vaccine-naïve COVID-19 patients. We observed age-dependent immune dysregulation at the transcriptional, protein and cellular levels, manifesting in an imbalance of inflammatory responses over the course of hospitalization, and suggesting potential new therapeutic targets.

14.
J Clin Invest ; 134(9)2024 May 01.
Article in English | MEDLINE | ID: mdl-38690733

ABSTRACT

BACKGROUND: Patients hospitalized for COVID-19 exhibit diverse clinical outcomes, with outcomes for some individuals diverging over time even though their initial disease severity appears similar to that of other patients. A systematic evaluation of molecular and cellular profiles over the full disease course can link immune programs and their coordination with progression heterogeneity. METHODS: We performed deep immunophenotyping and conducted longitudinal multiomics modeling, integrating 10 assays for 1,152 Immunophenotyping Assessment in a COVID-19 Cohort (IMPACC) study participants and identifying several immune cascades that were significant drivers of differential clinical outcomes. RESULTS: Increasing disease severity was driven by a temporal pattern that began with the early upregulation of immunosuppressive metabolites and then elevated levels of inflammatory cytokines, signatures of coagulation, formation of neutrophil extracellular traps, and T cell functional dysregulation. A second immune cascade, predictive of 28-day mortality among critically ill patients, was characterized by reduced total plasma Igs and B cells and dysregulated IFN responsiveness. We demonstrated that the balance disruption between IFN-stimulated genes and IFN inhibitors is a crucial biomarker of COVID-19 mortality, potentially contributing to failure of viral clearance in patients with fatal illness. CONCLUSION: Our longitudinal multiomics profiling study revealed temporal coordination across diverse omics that potentially explain the disease progression, providing insights that can inform the targeted development of therapies for patients hospitalized with COVID-19, especially those who are critically ill. TRIAL REGISTRATION: ClinicalTrials.gov NCT04378777. FUNDING: NIH (5R01AI135803-03, 5U19AI118608-04, 5U19AI128910-04, 4U19AI090023-11, 4U19AI118610-06, R01AI145835-01A1S1, 5U19AI062629-17, 5U19AI057229-17, 5U19AI125357-05, 5U19AI128913-03, 3U19AI077439-13, 5U54AI142766-03, 5R01AI104870-07, 3U19AI089992-09, 3U19AI128913-03, and 5T32DA018926-18); NIAID, NIH (3U19AI1289130, U19AI128913-04S1, and R01AI122220); and National Science Foundation (DMS2310836).


Subjects
COVID-19 , SARS-CoV-2 , Severity of Illness Index , Humans , COVID-19/immunology , COVID-19/mortality , COVID-19/blood , Male , Longitudinal Studies , SARS-CoV-2/immunology , Female , Middle Aged , Aged , Adult , Cytokines/blood , Cytokines/immunology , Multiomics
15.
Hum Vaccin Immunother ; 19(2): 2251830, 2023 Aug 01.
Article in English | MEDLINE | ID: mdl-37697867

ABSTRACT

Overfitting describes the phenomenon where a highly predictive model on the training data generalizes poorly to future observations. It is a common concern when applying machine learning techniques to contemporary medical applications, such as predicting vaccination response and disease status in infectious disease or cancer studies. This review examines the causes of overfitting and offers strategies to counteract it, focusing on model complexity reduction, reliable model evaluation, and harnessing data diversity. Through discussion of the underlying mathematical models and illustrative examples using both synthetic data and published real datasets, our objective is to equip analysts and bioinformaticians with the knowledge and tools necessary to detect and mitigate overfitting in their research.


Subjects
Machine Learning , Vaccination
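
The gap between training and test error that defines overfitting in record 15 is easy to reproduce on synthetic data. The sketch below is not taken from the review: it assumes a noisy sine curve as the data-generating process and uses k-nearest-neighbor regression, where `k = 1` is a very flexible model and `k = 15` a smoother one. All names and parameter values are illustrative.

```python
import math
import random

def knn_predict(train, x, k):
    """Average the targets of the k training points closest to x."""
    nearest = sorted(train, key=lambda pt: abs(pt[0] - x))[:k]
    return sum(y for _, y in nearest) / k

def mse(train, data, k):
    """Mean squared error of k-NN (fitted on `train`) evaluated on `data`."""
    return sum((y - knn_predict(train, x, k)) ** 2 for x, y in data) / len(data)

def sample(rng, n, noise=0.5):
    """Draw n points from y = sin(x) + Gaussian noise on [0, 2*pi]."""
    points = []
    for _ in range(n):
        x = rng.uniform(0.0, 2.0 * math.pi)
        points.append((x, math.sin(x) + rng.gauss(0.0, noise)))
    return points

rng = random.Random(1)
train, test = sample(rng, 200), sample(rng, 400)

# For each k: (error on the training data, error on held-out data).
results = {k: (mse(train, train, k), mse(train, test, k)) for k in (1, 15)}
for k, (train_err, test_err) in results.items():
    print(f"k={k:2d}  train MSE={train_err:.3f}  test MSE={test_err:.3f}")
```

With `k = 1` every training point is its own nearest neighbor, so the training error is zero while the held-out error is not; that gap is the signature of overfitting, and averaging over more neighbors trades a little bias for a smaller gap.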
16.
bioRxiv ; 2023 Dec 12.
Article in English | MEDLINE | ID: mdl-37961111

ABSTRACT

The disparity in genetic risk prediction accuracy between European and non-European individuals highlights a critical challenge in health inequality. To bridge this gap, we introduce JointPRS, a novel method that models multiple populations jointly to improve genetic risk predictions for non-European individuals. JointPRS has three key features. First, it encompasses all diverse populations to improve prediction accuracy, rather than relying solely on the target population with a singular auxiliary European group. Second, it autonomously estimates and leverages chromosome-wise cross-population genetic correlations to infer the effect sizes of genetic variants. Lastly, it provides an auto version that has comparable performance to the tuning version to accommodate the situation with no validation dataset. Through extensive simulations and real data applications to 22 quantitative traits and four binary traits in East Asian populations, nine quantitative traits and one binary trait in African populations, and four quantitative traits in South Asian populations, we demonstrate that JointPRS outperforms state-of-the-art methods, improving the prediction accuracy for both quantitative and binary traits in non-European populations.

17.
Res Sq ; 2023 Dec 25.
Article in English | MEDLINE | ID: mdl-38234764

ABSTRACT

The disparity in genetic risk prediction accuracy between European and non-European individuals highlights a critical challenge in health inequality. To bridge this gap, we introduce JointPRS, a novel method that models multiple populations jointly to improve genetic risk predictions for non-European individuals. JointPRS has three key features. First, it encompasses all diverse populations to improve prediction accuracy, rather than relying solely on the target population with a singular auxiliary European group. Second, it autonomously estimates and leverages chromosome-wise cross-population genetic correlations to infer the effect sizes of genetic variants. Lastly, it provides an auto version that has comparable performance to the tuning version to accommodate the situation with no validation dataset. Through extensive simulations and real data applications to 22 quantitative traits and four binary traits in East Asian populations, nine quantitative traits and one binary trait in African populations, and four quantitative traits in South Asian populations, we demonstrate that JointPRS outperforms state-of-the-art methods, improving the prediction accuracy for both quantitative and binary traits in non-European populations.

18.
bioRxiv ; 2023 Sep 27.
Article in English | MEDLINE | ID: mdl-36747790

ABSTRACT

MOTIVATION: Predictive biological signatures provide utility as biomarkers for disease diagnosis and prognosis, as well as prediction of responses to vaccination or therapy. These signatures are identified from high-throughput profiling assays through a combination of dimensionality reduction and machine learning techniques. The genes, proteins, metabolites, and other biological analytes that compose signatures also generate hypotheses on the underlying mechanisms driving biological responses, thus improving biological understanding. Dimensionality reduction is a critical step in signature discovery to address the large number of analytes in omics datasets, especially for multi-omics profiling studies with tens of thousands of measurements. Latent factor models, which can account for the structural heterogeneity across diverse assays, effectively integrate multi-omics data and reduce dimensionality to a small number of factors that capture correlations and associations among measurements. These factors provide biologically interpretable features for predictive modeling. However, multi-omics integration and predictive modeling are generally performed independently in sequential steps, leading to suboptimal factor construction. Combining these steps can yield better multi-omics signatures that are more predictive while still being biologically meaningful. RESULTS: We developed a supervised variational Bayesian factor model that extracts multi-omics signatures from high-throughput profiling datasets that can span multiple data types. Signature-based multiPle-omics intEgration via lAtent factoRs (SPEAR) adaptively determines factor rank, emphasis on factor structure, data relevance and feature sparsity. The method improves the reconstruction of underlying factors in synthetic examples and prediction accuracy of COVID-19 severity and breast cancer tumor subtypes. AVAILABILITY: SPEAR is a publicly available R-package hosted at https://bitbucket.org/kleinstein/SPEAR.

19.
iScience ; 26(12): 108387, 2023 Dec 15.
Article in English | MEDLINE | ID: mdl-38047068

ABSTRACT

Infection with West Nile virus (WNV) drives a wide range of responses, from asymptomatic infection to flu-like symptoms or fever, to severe cases of encephalitis and death. To identify cellular and molecular signatures distinguishing WNV severity, we employed systems profiling of peripheral blood from asymptomatic and severely ill individuals infected with WNV. We interrogated immune responses longitudinally from acute infection through convalescence, employing single-cell protein and transcriptional profiling complemented with matched serum proteomics and metabolomics as well as multi-omics analysis. At the acute time point, we detected both elevation of pro-inflammatory markers in innate immune cell types and reduction of regulatory T cell activity in participants with severe infection, whereas asymptomatic donors had higher expression of genes associated with anti-inflammatory CD16+ monocytes. Therefore, we demonstrated the potential of systems immunology using multiple cell-type and cell-state-specific analyses to identify correlates of infection severity and host cellular activity contributing to an effective anti-viral response.

20.
bioRxiv ; 2023 Aug 29.
Article in English | MEDLINE | ID: mdl-37693565

ABSTRACT

Computational models that predict an individual's response to a vaccine offer the potential for mechanistic insights and personalized vaccination strategies. These models are increasingly derived from systems vaccinology studies that generate immune profiles from human cohorts pre- and post-vaccination. Most of these studies involve relatively small cohorts and profile the response to a single vaccine. The ability to assess the performance of the resulting models would be improved by comparing their performance on independent datasets, as has been done with great success in other areas of biology such as protein structure prediction. To transfer this approach to systems vaccinology studies, we established a prototype platform that focuses on the evaluation of Computational Models of Immunity to Pertussis Booster vaccinations (CMI-PB). A community resource, CMI-PB generates experimental data for the explicit purpose of model evaluation, which is performed through a series of annual data releases and associated contests. Here we report on our experience with the first such 'dry run' for a contest where the goal was to predict individual immune responses based on pre-vaccination multi-omic profiles. Over 30 models adopted from the literature were tested, but only one was predictive, and it was based on age alone. The performance of new models built using CMI-PB training data was much better, but varied significantly based on the choice of pre-vaccination features used and the model building strategy. This suggests that previously published models developed for other vaccines do not generalize well to Pertussis Booster vaccination. Overall, these results reinforced the need for comparative analysis across models and datasets that CMI-PB aims to achieve. We are seeking wider community engagement for our first public prediction contest, which will open in early 2024.
