Search | BVS Regional Portal

1.

Multi-omic integration reveals alterations in nasal mucosal biology that mediate air pollutant effects on allergic rhinitis.

Irizar, Haritz; Chun, Yoojin; Hsu, Hsiao-Hsien Leon; Li, Yan-Chak; Zhang, Lingdi; Arditi, Zoe; Grishina, Galina; Grishin, Alexander; Vicencio, Alfin; Pandey, Gaurav; Bunyavanich, Supinda.

Allergy ; 2024 May 26.

Article in English | MEDLINE | ID: mdl-38796780

ABSTRACT

BACKGROUND: Allergic rhinitis is a common inflammatory condition of the nasal mucosa that imposes a considerable health burden. Air pollution has been observed to increase the risk of developing allergic rhinitis. We addressed the hypotheses that early life exposure to air toxics is associated with developing allergic rhinitis, and that these effects are mediated by DNA methylation and gene expression in the nasal mucosa. METHODS: In a case-control cohort of 505 participants, we geocoded participants' early life exposure to air toxics using data from the US Environmental Protection Agency, assessed physician diagnosis of allergic rhinitis by questionnaire, and collected nasal brushings for whole-genome DNA methylation and transcriptome profiling. We then performed a series of analyses including differential expression, Mendelian randomization, and causal mediation analyses to characterize relationships between early life air toxics, nasal DNA methylation, nasal gene expression, and allergic rhinitis. RESULTS: Among the 505 participants, 275 had allergic rhinitis. The mean age of the participants was 16.4 years (standard deviation = 9.5 years). Early life exposure to air toxics such as acrylic acid, phosphine, antimony compounds, and benzyl chloride was associated with developing allergic rhinitis. These air toxics exerted their effects by altering the nasal DNA methylation and nasal gene expression levels of genes involved in respiratory ciliary function, mast cell activation, pro-inflammatory TGF-β1 signaling, and the regulation of myeloid immune cell function. CONCLUSIONS: Our results expand the range of air pollutants implicated in allergic rhinitis and shed light on their underlying biological mechanisms in nasal mucosa.

2.

A comprehensive youth diabetes epidemiological dataset and web portal: Resource Development and Case Studies.

McDonough, Catherine; Li, Yan Chak; Vangeepuram, Nita; Liu, Bian; Pandey, Gaurav.

JMIR Public Health Surveill ; 2024 Apr 26.

Article in English | MEDLINE | ID: mdl-38666756

ABSTRACT

BACKGROUND: The prevalence of Type 2 diabetes (DM) and prediabetes (preDM) has been increasing among youth in recent decades in the United States, prompting an urgent need for understanding and identifying their associated risk factors. Such efforts, however, have been hindered by the lack of easily accessible youth preDM/DM data. OBJECTIVE: We aimed to first build a high quality, comprehensive epidemiological dataset focused on youth preDM/DM. Subsequently, we aimed to make this data accessible by creating a user-friendly web portal to share it and corresponding codes. Through this, we hope to address this significant gap and facilitate youth preDM/DM research. METHODS: Building on data from the National Health and Nutrition Examination Survey (NHANES) from 1999 to 2018, we cleaned and harmonized hundreds of variables relevant to preDM/DM (fasting plasma glucose level ≥100 mg/dL and/or HbA1C ≥5.7%) for youth aged 12-19 years (n=15,149). We identified individual factors associated with preDM/DM risk using bivariate statistical analyses and predicted preDM/DM status using our Ensemble Integration (EI) framework for multi-domain machine learning. We then developed a user-friendly web portal named Prediabetes/diabetes in youth ONline Dashboard (POND) to share the data and codes. RESULTS: We extracted 95 variables potentially relevant to preDM/DM risk organized into 4 domains (sociodemographic, health status, diet, and other lifestyle behaviors). The bivariate analyses identified 27 significant correlates of preDM/DM (P ≤0.0005, Bonferroni adjusted), including race/ethnicity, health insurance, BMI, added sugar intake, and screen time. Sixteen of these factors were also identified based on the EI methodology (Fisher's P of overlap=7.06x10^-6). In addition to those, the EI approach identified 11 additional predictive variables, including some known (e.g., meat and fruit intake and family income) and less recognized factors (e.g., number of rooms in homes). 
The factors identified in both analyses spanned over all 4 of the domains mentioned. These data and results, as well as other exploratory tools, can be accessed on POND (https://rstudio-connect.hpc.mssm.edu/POND/). CONCLUSIONS: Using NHANES data, we built one of the largest public epidemiological datasets for studying youth preDM/DM and identified potential risk factors using complementary analytical approaches. Our results align with the multifactorial nature of preDM/DM with correlates across several domains. Also, our data-sharing platform, POND, facilitates a wide range of applications to inform future youth preDM/DM studies.
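
The case definition given in the METHODS above (fasting plasma glucose level ≥100 mg/dL and/or HbA1C ≥5.7%) can be sketched as a small Python function. This is only an illustrative sketch: the function name, the argument names, and the handling of missing measurements are assumptions made here and are not taken from the POND code.

```python
from typing import Optional

FPG_THRESHOLD_MG_DL = 100.0  # fasting plasma glucose cutoff stated in the abstract
HBA1C_THRESHOLD_PCT = 5.7    # HbA1C cutoff stated in the abstract


def has_predm_or_dm(fpg_mg_dl: Optional[float],
                    hba1c_pct: Optional[float]) -> Optional[bool]:
    """Return True if either available measurement meets its cutoff.

    Assumption (not from the source): a missing value is passed as None.
    The result is None when neither measurement is available, and False
    only when every available measurement is below its cutoff.
    """
    if fpg_mg_dl is None and hba1c_pct is None:
        return None
    fpg_high = fpg_mg_dl is not None and fpg_mg_dl >= FPG_THRESHOLD_MG_DL
    hba1c_high = hba1c_pct is not None and hba1c_pct >= HBA1C_THRESHOLD_PCT
    return fpg_high or hba1c_high
```

Because the definition is "and/or", a participant is flagged as soon as one of the two measurements reaches its cutoff; how records with a single missing measurement should be treated is a design choice that the abstract does not specify.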

3.

Machine learning-driven identification of air toxic combinations associated with asthma symptoms among elementary school children in Spokane, Washington, USA.

Amiri, Solmaz; Li, Yan-Chak; Buchwald, Dedra; Pandey, Gaurav.

Sci Total Environ ; 921: 171102, 2024 Apr 15.

Article in English | MEDLINE | ID: mdl-38387571

ABSTRACT

Air toxics are atmospheric pollutants with hazardous effects on health and the environment. Although methodological constraints have limited the number of air toxics assessed for associations with health and disease, advances in machine learning (ML) enable the assessment of a much larger set of environmental exposures. We used ML methods to conduct a retrospective study to identify combinations of 109 air toxics associated with asthma symptoms among 269 elementary school students in Spokane, Washington. Data on the frequency of asthma symptoms for these children were obtained from Spokane Public Schools. Their exposure to air toxics was estimated by using the Environmental Protection Agency's Air Toxics Screening Assessment and National Air Toxics Assessment. We defined three exposure periods: the most recent year (2019), the last three years (2017-2019), and the last five years (2014-2019). We analyzed the data using the ML-based Data-driven ExposurE Profile (DEEP) extraction method. DEEP identified 25 air toxic combinations associated with asthma symptoms in at least one exposure period. Three combinations (1,1,1-trichloroethane, 2-nitropropane, and 2,4,6-trichlorophenol) were significantly associated with asthma symptoms in all three exposure periods. Four air toxics (1,1,1-trichloroethane, 1,1,2,2-tetrachloroethane, BIS (2-ethylhexyl) phthalate (DEHP), and 2,4-dinitrophenol) were associated only in combination with other toxics, and would not have been identified by traditional statistical methods. The application of DEEP also identified a vulnerable subpopulation of children who were exposed to 13 of the 25 significant combinations in at least one exposure period. On average, these children experienced the largest number of asthma symptoms in our sample. By providing evidence on air toxic combinations associated with childhood asthma, our findings may contribute to the regulation of these toxics to improve children's respiratory health.

Subjects

Air Pollutants, Air Pollution, Asthma, Trichloroethanes, Child, Humans, Air Pollutants/toxicity, Air Pollutants/analysis, Washington/epidemiology, Retrospective Studies, Asthma/chemically induced, Asthma/epidemiology, Environmental Exposure

4.

Exploring the Druggable Conformational Space of Protein Kinases Using AI-Generated Structures.

Herrington, Noah B; Stein, David; Li, Yan Chak; Pandey, Gaurav; Schlessinger, Avner.

bioRxiv ; 2023 Sep 02.

Article in English | MEDLINE | ID: mdl-37693436

ABSTRACT

Protein kinase function and interactions with drugs are controlled in part by the movement of the DFG and αC-Helix motifs, which enable kinases to adopt various conformational states. Small molecule ligands elicit therapeutic effects with distinct selectivity profiles and residence times that often depend on the kinase conformation(s) they bind. However, the limited availability of experimentally determined structural data for kinases in inactive states restricts drug discovery efforts for this major protein family. Modern AI-based structural modeling methods hold potential for exploring the previously experimentally uncharted druggable conformational space for kinases. Here, we first evaluated the currently explored conformational space of kinases in the PDB and models generated by AlphaFold2 (AF2) (1) and ESMFold (2), two prominent AI-based structure prediction methods. We then investigated AF2's ability to predict kinase structures in different conformations at various multiple sequence alignment (MSA) depths, based on this parameter's ability to explore conformational diversity. Our results showed a bias within the PDB and predicted structural models generated by AF2 and ESMFold toward structures of kinases in the active state over alternative conformations, particularly those conformations controlled by the DFG motif. Finally, we demonstrate that predicting kinase structures using AF2 at lower MSA depths allows the exploration of the space of these alternative conformations, including identifying previously unobserved conformations for 398 kinases. The results of our analysis of structural modeling by AF2 create a new avenue for the pursuit of new therapeutic agents against a notoriously difficult-to-target family of proteins.
Significance Statement: Greater abundance of kinase structural data in inactive conformations, currently lacking in structural databases, would improve our understanding of how protein kinases function and expand drug discovery and development for this family of therapeutic targets. Modern approaches utilizing artificial intelligence and machine learning have potential for efficiently capturing novel protein conformations. We provide evidence for a bias within AlphaFold2 and ESMFold to predict structures of kinases in their active states, similar to their overrepresentation in the PDB. We show that lowering the AlphaFold2 algorithm's multiple sequence alignment depth can help explore kinase conformational space more broadly. It can also enable the prediction of hundreds of kinase structures in novel conformations, many of whose models are likely viable for drug discovery.

5.

Facilitating youth diabetes studies with the most comprehensive epidemiological dataset available through a public web portal.

McDonough, Catherine; Li, Yan Chak; Vangeepuram, Nita; Liu, Bian; Pandey, Gaurav.

medRxiv ; 2023 Aug 04.

Article in English | MEDLINE | ID: mdl-37577465

ABSTRACT

The prevalence of type 2 diabetes mellitus (DM) and prediabetes (preDM) is rapidly increasing among youth, posing significant health and economic consequences. To address this growing concern, we created the most comprehensive youth-focused diabetes dataset to date derived from National Health and Nutrition Examination Survey (NHANES) data from 1999 to 2018. The dataset, consisting of 15,149 youth aged 12 to 19 years, encompasses preDM/DM relevant variables from sociodemographic, health status, diet, and other lifestyle behavior domains. An interactive web portal, POND (Prediabetes/diabetes in youth ONline Dashboard), was developed to provide public access to the dataset, allowing users to explore variables potentially associated with youth preDM/DM. Leveraging statistical and machine learning methods, we conducted two case studies, revealing established and lesser-known variables linked to youth preDM/DM. This dataset and portal can facilitate future studies to inform prevention and management strategies for youth prediabetes and diabetes.

6.

Developing better digital health measures of Parkinson's disease using free living data and a crowdsourced data analysis challenge.

Sieberts, Solveig K; Borzymowski, Henryk; Guan, Yuanfang; Huang, Yidi; Matzner, Ayala; Page, Alex; Bar-Gad, Izhar; Beaulieu-Jones, Brett; El-Hanani, Yuval; Goschenhofer, Jann; Javidnia, Monica; Keller, Mark S; Li, Yan-Chak; Saqib, Mohammed; Smith, Greta; Stanescu, Ana; Venuto, Charles S; Zielinski, Robert; Jayaraman, Arun; Evers, Luc J W; Foschini, Luca; Mariakakis, Alex; Pandey, Gaurav; Shawen, Nicholas; Synder, Phil; Omberg, Larsson.

PLOS Digit Health ; 2(3): e0000208, 2023 Mar.

Article in English | MEDLINE | ID: mdl-36976789

ABSTRACT

One of the promising opportunities of digital health is its potential to lead to more holistic understandings of diseases by interacting with the daily life of patients and through the collection of large amounts of real-world data. Validating and benchmarking indicators of disease severity in the home setting is difficult, however, given the large number of confounders present in the real world and the challenges in collecting ground truth data in the home. Here we leverage two datasets collected from patients with Parkinson's disease, which couple continuous wrist-worn accelerometer data with frequent symptom reports in the home setting, to develop digital biomarkers of symptom severity. Using these data, we performed a public benchmarking challenge in which participants were asked to build measures of severity across 3 symptoms (on/off medication, dyskinesia, and tremor). 42 teams participated and performance was improved over baseline models for each subchallenge. Additional ensemble modeling across submissions further improved performance, and the top models validated in a subset of patients whose symptoms were observed and rated by trained clinicians.

7.

Integrating multimodal data through interpretable heterogeneous ensembles.

Li, Yan Chak; Wang, Linhua; Law, Jeffrey N; Murali, T M; Pandey, Gaurav.

Bioinform Adv ; 2(1): vbac065, 2022.

Article in English | MEDLINE | ID: mdl-36158455

ABSTRACT

Motivation: Integrating multimodal data represents an effective approach to predicting biomedical characteristics, such as protein functions and disease outcomes. However, existing data integration approaches do not sufficiently address the heterogeneous semantics of multimodal data. In particular, early and intermediate approaches that rely on a uniform integrated representation reinforce the consensus among the modalities but may lose exclusive local information. The alternative late integration approach that can address this challenge has not been systematically studied for biomedical problems. Results: We propose Ensemble Integration (EI) as a novel systematic implementation of the late integration approach. EI infers local predictive models from the individual data modalities using appropriate algorithms and uses heterogeneous ensemble algorithms to integrate these local models into a global predictive model. We also propose a novel interpretation method for EI models. We tested EI on the problems of predicting protein function from multimodal STRING data and mortality due to coronavirus disease 2019 (COVID-19) from multimodal data in electronic health records. We found that EI accomplished its goal of producing significantly more accurate predictions than each individual modality. It also performed better than several established early integration methods for each of these problems. The interpretation of a representative EI model for COVID-19 mortality prediction identified several disease-relevant features, such as laboratory test (blood urea nitrogen and calcium) and vital sign measurements (minimum oxygen saturation) and demographics (age). These results demonstrated the effectiveness of the EI framework for biomedical data integration and predictive modeling. Availability and implementation: Code and data are available at https://github.com/GauravPandeyLab/ensemble_integration. 
Supplementary information: Supplementary data are available at Bioinformatics Advances online.
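
The late integration idea described in the Results above (one local model per data modality, then an ensemble that combines the local predictions) can be sketched in a few lines of Python. This is a toy sketch under stated assumptions, not the authors' implementation from the linked repository: the `CentroidScorer` local model, the unweighted mean used as the ensemble step, and the modality names and values are all hypothetical stand-ins chosen to keep the example dependency-free.

```python
from statistics import mean
from typing import Dict, List, Sequence

Vector = Sequence[float]


class CentroidScorer:
    """Toy local model (hypothetical): scores closeness to the positive-class centroid."""

    def fit(self, X: List[Vector], y: List[int]) -> "CentroidScorer":
        pos = [x for x, label in zip(X, y) if label == 1]
        neg = [x for x, label in zip(X, y) if label == 0]
        # Column-wise means give one centroid per class.
        self.pos_centroid = [mean(col) for col in zip(*pos)]
        self.neg_centroid = [mean(col) for col in zip(*neg)]
        return self

    @staticmethod
    def _dist(a: Vector, b: Vector) -> float:
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b)) ** 0.5

    def score(self, x: Vector) -> float:
        """Return a value in [0, 1]; higher means closer to the positive class."""
        d_pos = self._dist(x, self.pos_centroid)
        d_neg = self._dist(x, self.neg_centroid)
        total = d_pos + d_neg
        return 0.5 if total == 0 else d_neg / total


def fit_local_models(modalities: Dict[str, List[Vector]],
                     y: List[int]) -> Dict[str, CentroidScorer]:
    """Step 1: train one local model on each modality separately."""
    return {name: CentroidScorer().fit(X, y) for name, X in modalities.items()}


def late_integration_score(models: Dict[str, CentroidScorer],
                           sample: Dict[str, Vector]) -> float:
    """Step 2: combine local scores; an unweighted mean stands in for the ensemble."""
    return mean(models[name].score(sample[name]) for name in models)


# Made-up two-modality training data, purely for illustration.
train = {
    "labs": [[1.0, 2.0], [1.2, 1.8], [3.0, 4.0], [3.2, 4.1]],
    "vitals": [[98.0], [97.0], [88.0], [86.0]],
}
labels = [0, 0, 1, 1]
models = fit_local_models(train, labels)
high_risk = late_integration_score(models, {"labs": [3.1, 3.9], "vitals": [87.0]})
low_risk = late_integration_score(models, {"labs": [1.1, 1.9], "vitals": [97.5]})
```

Each modality keeps its own feature space and its own model, so information that exists in only one modality is not diluted by a shared representation; only the predictions are merged. A fuller version would replace the mean with a learned ensemble trained on held-out local predictions.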

8.

Integrating multimodal data through interpretable heterogeneous ensembles.

Li, Yan Chak; Wang, Linhua; Law, Jeffrey N; Murali, T M; Pandey, Gaurav.

bioRxiv ; 2022 Jul 25.

Article in English | MEDLINE | ID: mdl-35923321

ABSTRACT

Motivation: Integrating multimodal data represents an effective approach to predicting biomedical characteristics, such as protein functions and disease outcomes. However, existing data integration approaches do not sufficiently address the heterogeneous semantics of multimodal data. In particular, early and intermediate approaches that rely on a uniform integrated representation reinforce the consensus among the modalities, but may lose exclusive local information. The alternative late integration approach that can address this challenge has not been systematically studied for biomedical problems. Results: We propose Ensemble Integration (EI) as a novel systematic implementation of the late integration approach. EI infers local predictive models from the individual data modalities using appropriate algorithms, and uses effective heterogeneous ensemble algorithms to integrate these local models into a global predictive model. We also propose a novel interpretation method for EI models. We tested EI on the problems of predicting protein function from multimodal STRING data, and mortality due to COVID-19 from multimodal data in electronic health records. We found that EI accomplished its goal of producing significantly more accurate predictions than each individual modality. It also performed better than several established early integration methods for each of these problems. The interpretation of a representative EI model for COVID-19 mortality prediction identified several disease-relevant features, such as laboratory test (blood urea nitrogen (BUN) and calcium) and vital sign measurements (minimum oxygen saturation) and demographics (age). These results demonstrated the effectiveness of the EI framework for biomedical data integration and predictive modeling. Availability: Code and data are available at https://github.com/GauravPandeyLab/ensemble_integration . Contact: gaurav.pandey@mssm.edu.

9.

Machine learning-driven identification of early-life air toxic combinations associated with childhood asthma outcomes.

Li, Yan-Chak; Hsu, Hsiao-Hsien Leon; Chun, Yoojin; Chiu, Po-Hsiang; Arditi, Zoe; Claudio, Luz; Pandey, Gaurav; Bunyavanich, Supinda.

J Clin Invest ; 131(22), 2021 Nov 15.

Article in English | MEDLINE | ID: mdl-34609967

ABSTRACT

Air pollution is a well-known contributor to asthma. Air toxics are hazardous air pollutants that cause or may cause serious health effects. Although individual air toxics have been associated with asthma, only a limited number of studies have specifically examined combinations of air toxics associated with the disease. We geocoded air toxic levels from the US National Air Toxics Assessment (NATA) to residential locations for participants of our AiRway in Asthma (ARIA) study. We then applied Data-driven ExposurE Profile extraction (DEEP), a machine learning-based method, to discover combinations of early-life air toxics associated with current use of daily asthma controller medication, lifetime emergency department visit for asthma, and lifetime overnight hospitalization for asthma. We discovered 20 multi-air toxic combinations and 18 single air toxics associated with at least 1 outcome. The multi-air toxic combinations included those containing acrylic acid, ethylidene dichloride, and hydroquinone, and they were significantly associated with asthma outcomes. Several air toxic members of the combinations would not have been identified by single air toxic analyses, supporting the use of machine learning-based methods designed to detect combinatorial effects. Our findings provide knowledge about air toxic combinations associated with childhood asthma.

Subjects

Air Pollutants/adverse effects, Asthma/etiology, Machine Learning, Acrylates/adverse effects, Adolescent, Air Pollutants/analysis, Child, Ethyl Chloride/adverse effects, Female, Humans, Hydroquinones/adverse effects, Male, Risk Factors

10.

Clinical features of COVID-19 mortality: development and validation of a clinical prediction model.

Yadaw, Arjun S; Li, Yan-Chak; Bose, Sonali; Iyengar, Ravi; Bunyavanich, Supinda; Pandey, Gaurav.

Lancet Digit Health ; 2(10): e516-e525, 2020 Oct.

Article in English | MEDLINE | ID: mdl-32984797

ABSTRACT

Background: The COVID-19 pandemic has affected millions of individuals and caused hundreds of thousands of deaths worldwide. Predicting mortality among patients with COVID-19 who present with a spectrum of complications is very difficult, hindering the prognostication and management of the disease. We aimed to develop an accurate prediction model of COVID-19 mortality using unbiased computational methods, and identify the clinical features most predictive of this outcome. Methods: In this prediction model development and validation study, we applied machine learning techniques to clinical data from a large cohort of patients with COVID-19 treated at the Mount Sinai Health System in New York City, NY, USA, to predict mortality. We analysed patient-level data captured in the Mount Sinai Data Warehouse database for individuals with a confirmed diagnosis of COVID-19 who had a health system encounter between March 9 and April 6, 2020. For initial analyses, we used patient data from March 9 to April 5, and randomly assigned (80:20) the patients to the development dataset or test dataset 1 (retrospective). Patient data for those with encounters on April 6, 2020, were used in test dataset 2 (prospective). We designed prediction models based on clinical features and patient characteristics during health system encounters to predict mortality using the development dataset. We assessed the resultant models in terms of the area under the receiver operating characteristic curve (AUC) score in the test datasets. Findings: Using the development dataset (n=3841) and a systematic machine learning framework, we developed a COVID-19 mortality prediction model that showed high accuracy (AUC=0·91) when applied to test datasets of retrospective (n=961) and prospective (n=249) patients. 
This model was based on three clinical features: patient's age, minimum oxygen saturation over the course of their medical encounter, and type of patient encounter (inpatient vs outpatient and telehealth visits). Interpretation: An accurate and parsimonious COVID-19 mortality prediction model based on three features might have utility in clinical settings to guide the management and prognostication of patients affected by this disease. External validation of this prediction model in other populations is needed. Funding: National Institutes of Health.
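
The evaluation design described in the Methods above (a random 80:20 assignment to development and test data, with models assessed by AUC) can be sketched as follows. This is a minimal illustration, not the study's code: the function names and the fixed random seed are assumptions, and the AUC is computed with the standard pairwise-ranking definition rather than any particular library.

```python
import random
from typing import Sequence, Tuple


def split_80_20(records: Sequence, seed: int = 0) -> Tuple[list, list]:
    """Randomly assign records to a development set (80%) and a test set (20%)."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    cut = int(round(0.8 * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    A tie between a positive and a negative score counts as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum(
        (1.0 if p > n else 0.5 if p == n else 0.0)
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of deceased and surviving patients, which is why it is a convenient single number for comparing candidate models on held-out data.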

Subjects

COVID-19/mortality, Clinical Decision Rules, Age Factors, Aged, COVID-19/pathology, Datasets as Topic, Female, Humans, Logistic Models, Male, Middle Aged, Statistical Models, New York City/epidemiology, ROC Curve, Reproducibility of Results, Risk Factors

11.

Clinical predictors of COVID-19 mortality.

Yadaw, Arjun S; Li, Yan-Chak; Bose, Sonali; Iyengar, Ravi; Bunyavanich, Supinda; Pandey, Gaurav.

medRxiv ; 2020 May 22.

Article in English | MEDLINE | ID: mdl-32511520

ABSTRACT

BACKGROUND: The coronavirus disease 2019 (COVID-19) pandemic has affected millions of individuals and caused hundreds of thousands of deaths worldwide. It can be difficult to accurately predict mortality among COVID-19 patients presenting with a spectrum of complications, hindering the prognostication and management of the disease. METHODS: We applied machine learning techniques to clinical data from a large cohort of 5,051 COVID-19 patients treated at the Mount Sinai Health System in New York City, the global COVID-19 epicenter, to predict mortality. Predictors were designed to classify patients into Deceased or Alive mortality classes and were evaluated in terms of the area under the receiver operating characteristic (ROC) curve (AUC score). FINDINGS: Using a development cohort (n=3,841) and a systematic machine learning framework, we identified a COVID-19 mortality predictor that demonstrated high accuracy (AUC=0.91) when applied to test sets of retrospective (n=961) and prospective (n=249) patients. This mortality predictor was based on five clinical features: age, minimum O2 saturation during encounter, type of patient encounter (inpatient vs. various types of outpatient and telehealth encounters), hydroxychloroquine use, and maximum body temperature. INTERPRETATION: An accurate and parsimonious COVID-19 mortality predictor based on five features may have utility in clinical settings to guide the management and prognostication of patients affected by this disease.
