Search | BVS Violência e Saúde

1.

Personalized hypertension treatment recommendations by a data-driven model.

Hu, Yang; Huerta, Jasmine; Cordella, Nicholas; Mishuris, Rebecca G; Paschalidis, Ioannis Ch.

BMC Med Inform Decis Mak ; 23(1): 44, 2023 03 01.

Article in English | MEDLINE | ID: mdl-36859187

ABSTRACT

BACKGROUND: Hypertension is a prevalent cardiovascular disease with severe longer-term implications. Conventional management based on clinical guidelines does not facilitate personalized treatment that accounts for a richer set of patient characteristics. METHODS: Records from 1/1/2012 to 1/1/2020 at the Boston Medical Center were used, selecting patients with either a hypertension diagnosis or meeting diagnostic criteria (≥ 130 mmHg systolic or ≥ 90 mmHg diastolic, n = 42,752). Models were developed to recommend a class of antihypertensive medications for each patient based on their characteristics. Regression immunized against outliers was combined with a nearest-neighbor approach to associate with each patient an affinity group of other patients. This group was then used to make predictions of future Systolic Blood Pressure (SBP) under each prescription type. For each patient, we leveraged these predictions to select the class of medication that minimized their future predicted SBP. RESULTS: The proposed model, built with a distributionally robust learning procedure, leads to a reduction of 14.28 mmHg in SBP, on average. This reduction is 70.30% larger than the reduction achieved by the standard-of-care and 7.08% better than the corresponding reduction achieved by the second-best model, which uses ordinary least squares regression. All derived models outperform following the previous prescription or the current ground truth prescription in the record. We randomly sampled and manually reviewed 350 patient records; 87.71% of these model-generated prescription recommendations passed a sanity check by clinicians. CONCLUSION: Our data-driven approach for personalized hypertension treatment yielded significant improvement compared to the standard-of-care. The model implied potential benefits of computationally deprescribing and can support situations with clinical equipoise.

Subjects

Cardiovascular Diseases , Hypertension , Humans , Cluster Analysis , Hospitals , Medical Records
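The recommendation step described in entry 1 (predict future SBP under each medication class from a patient's neighbors, then choose the class with the lowest prediction) can be illustrated with a minimal sketch. It is a simplification, not the authors' method: plain k-nearest-neighbor averaging stands in for their distributionally robust regression, and the records, two-dimensional features, class labels `A`/`B`, and parameter `k` are all hypothetical.

```python
import math
from statistics import mean

# Hypothetical toy records: (patient features, medication class, observed future SBP in mmHg).
# The values are invented for illustration and do not come from the study.
RECORDS = [
    ((0.0, 0.0), "A", 130.0),
    ((0.1, 0.0), "A", 132.0),
    ((0.0, 0.1), "B", 120.0),
    ((0.1, 0.1), "B", 122.0),
    ((5.0, 5.0), "A", 110.0),
    ((5.1, 5.0), "B", 140.0),
]


def predict_sbp_by_class(features, records, k=2):
    """Predict future SBP under each class as the mean over the k nearest patients who received it."""
    predictions = {}
    for cls in sorted({c for _, c, _ in records}):
        # Distances from the new patient to past patients on this class, nearest first.
        treated = sorted(
            (math.dist(features, x), sbp) for x, c, sbp in records if c == cls
        )
        predictions[cls] = mean(sbp for _, sbp in treated[:k])
    return predictions


def recommend(features, records, k=2):
    """Return the class with the lowest predicted future SBP, plus all per-class predictions."""
    predictions = predict_sbp_by_class(features, records, k)
    return min(predictions, key=predictions.get), predictions


if __name__ == "__main__":
    print(recommend((0.0, 0.0), RECORDS))  # → ('B', {'A': 131.0, 'B': 121.0})
```

Restricting each neighbor pool to patients who actually received a given class is what makes the prediction class-specific; the study's robust regression replaces the plain average used here.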

2.

Predictive models of pregnancy based on data from a preconception cohort study.

Yland, Jennifer J; Wang, Taiyao; Zad, Zahra; Willis, Sydney K; Wang, Tanran R; Wesselink, Amelia K; Jiang, Tammy; Hatch, Elizabeth E; Wise, Lauren A; Paschalidis, Ioannis Ch.

Hum Reprod ; 37(3): 565-576, 2022 Mar 01.

Article in English | MEDLINE | ID: mdl-35024824

ABSTRACT

STUDY QUESTION: Can we derive adequate models to predict the probability of conception among couples actively trying to conceive? SUMMARY ANSWER: Leveraging data collected from female participants in a North American preconception cohort study, we developed models to predict pregnancy with performance of ~70% in the area under the receiver operating characteristic curve (AUC). WHAT IS KNOWN ALREADY: Earlier work has focused primarily on identifying individual risk factors for infertility. Several predictive models have been developed in subfertile populations, with relatively low discrimination (AUC: 59-64%). STUDY DESIGN, SIZE, DURATION: Study participants were female, aged 21-45 years, residents of the USA or Canada, not using fertility treatment, and actively trying to conceive at enrollment (2013-2019). Participants completed a baseline questionnaire at enrollment and follow-up questionnaires every 2 months for up to 12 months or until conception. We used data from 4133 participants with no more than one menstrual cycle of pregnancy attempt at study entry. PARTICIPANTS/MATERIALS, SETTING, METHODS: On the baseline questionnaire, participants reported data on sociodemographic factors, lifestyle and behavioral factors, diet quality, medical history and selected male partner characteristics. A total of 163 predictors were considered in this study. We implemented regularized logistic regression, support vector machines, neural networks and gradient boosted decision trees to derive models predicting the probability of pregnancy: (i) within fewer than 12 menstrual cycles of pregnancy attempt time (Model I), and (ii) within 6 menstrual cycles of pregnancy attempt time (Model II). Cox models were used to predict the probability of pregnancy within each menstrual cycle for up to 12 cycles of follow-up (Model III). We assessed model performance using the AUC and the weighted-F1 score for Models I and II, and the concordance index for Model III. 
MAIN RESULTS AND THE ROLE OF CHANCE: Model I and II AUCs were 70% and 66%, respectively, in parsimonious models, and the concordance index for Model III was 63%. The predictors that were positively associated with pregnancy in all models were: having previously breastfed an infant and using multivitamins or folic acid supplements. The predictors that were inversely associated with pregnancy in all models were: female age, female BMI and history of infertility. Among nulligravid women with no history of infertility, the most important predictors were: female age, female BMI, male BMI, use of a fertility app, attempt time at study entry and perceived stress. LIMITATIONS, REASONS FOR CAUTION: Reliance on self-reported predictor data could have introduced misclassification, which would likely be non-differential with respect to the pregnancy outcome given the prospective design. In addition, we cannot be certain that all relevant predictor variables were considered. Finally, though we validated the models using split-sample replication techniques, we did not conduct an external validation study. WIDER IMPLICATIONS OF THE FINDINGS: Given a wide range of predictor data, machine learning algorithms can be leveraged to analyze epidemiologic data and predict the probability of conception with discrimination that exceeds earlier work. STUDY FUNDING/COMPETING INTEREST(S): The research was partially supported by the U.S. National Science Foundation (under grants DMS-1664644, CNS-1645681 and IIS-1914792) and the National Institutes of Health (under grants R01 GM135930 and UL54 TR004130). In the last 3 years, L.A.W. has received in-kind donations for primary data collection in PRESTO from FertilityFriend.com, Kindara.com, Sandstone Diagnostics and Swiss Precision Diagnostics. L.A.W. also serves as a fibroid consultant to AbbVie, Inc. The other authors declare no competing interests. TRIAL REGISTRATION NUMBER: N/A.

Subjects

Fertility , Infertility , Cohort Studies , Female , Humans , Male , Pregnancy , Prospective Studies , Surveys and Questionnaires
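The AUC values reported in entry 2 for Models I and II have a direct pairwise reading: the probability that a randomly chosen participant who conceived receives a higher predicted score than a randomly chosen participant who did not, with ties counted as one half. A minimal sketch of that computation follows; the scores and labels are hypothetical, and the quadratic pairwise loop is chosen for clarity rather than speed. The concordance index used for Model III extends the same pairwise idea to censored time-to-event data and is not shown.

```python
def auc(scores, labels):
    """Pairwise AUC: share of (positive, negative) pairs ranked correctly; ties count 1/2."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Hypothetical predicted probabilities of pregnancy and observed outcomes (1 = conceived).
    print(auc([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0]))  # → 0.75
```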

3.

Social determinants of health and the prediction of missed breast imaging appointments.

Sotudian, Shahabeddin; Afran, Aaron; LeBedis, Christina A; Rives, Anna F; Paschalidis, Ioannis Ch; Fishman, Michael D C.

BMC Health Serv Res ; 22(1): 1454, 2022 Nov 30.

Article in English | MEDLINE | ID: mdl-36451240

ABSTRACT

BACKGROUND: Predictive models utilizing social determinants of health (SDH), demographic data, and local weather data were trained to predict missed imaging appointments (MIA) among breast imaging patients at the Boston Medical Center (BMC). Patients were characterized by many different variables, including social needs, demographics, imaging utilization, appointment features, and weather conditions on the date of the appointment. METHODS: This HIPAA-compliant retrospective cohort study was IRB-approved. Informed consent was waived. After data preprocessing steps, the dataset contained 9,970 patients and 36,606 appointments from 1/1/2015 to 12/31/2019. We identified 57 potentially impactful variables used in the initial prediction model and assessed each patient for MIA. We then developed a parsimonious model via recursive feature elimination, which identified the 25 most predictive variables. We utilized linear and nonlinear models including support vector machines (SVM), logistic regression (LR), and random forest (RF) to predict MIA and compared their performance. RESULTS: The highest-performing full model is the nonlinear RF, achieving the highest Area Under the ROC Curve (AUC) of 76% and average F1 score of 85%. Models limited to the most predictive variables were able to attain AUC and F1 scores comparable to models with all variables included. The variables most predictive of missed appointments included timing, prior appointment history, referral department of origin, and socioeconomic factors such as household income and access to caregiving services. CONCLUSIONS: Prediction of MIA with the data available is inherently limited by the complex, multifactorial nature of MIA. However, the algorithms presented achieved acceptable performance and demonstrated that socioeconomic factors were useful predictors of MIA. In contrast with non-modifiable demographic factors, we can address SDH to decrease the incidence of MIA.

Subjects

Social Determinants of Health , Social Factors , Humans , Retrospective Studies , Diagnostic Imaging , Socioeconomic Factors

4.

A Sharp Estimate on the Transient Time of Distributed Stochastic Gradient Descent.

Pu, Shi; Olshevsky, Alex; Paschalidis, Ioannis Ch.

IEEE Trans Automat Contr ; 67(11): 5900-5915, 2022 Nov.

Article in English | MEDLINE | ID: mdl-37284602

ABSTRACT

This paper is concerned with minimizing the average of n cost functions over a network in which agents may communicate and exchange information with each other. We consider the setting where only noisy gradient information is available. To solve the problem, we study the distributed stochastic gradient descent (DSGD) method and perform a non-asymptotic convergence analysis. For strongly convex and smooth objective functions, in expectation, DSGD asymptotically achieves the optimal network-independent convergence rate compared to centralized stochastic gradient descent (SGD). Our main contribution is to characterize the transient time needed for DSGD to approach the asymptotic convergence rate. Moreover, we construct a "hard" optimization problem that proves the sharpness of the obtained result. Numerical experiments demonstrate the tightness of the theoretical results.
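A minimal sketch of the DSGD iteration studied in entry 4, in one common form: each agent takes a step along its own noisy gradient and then averages with its neighbors through a doubly stochastic mixing matrix. Every concrete choice here is an assumption for illustration: scalar quadratic local costs f_i(x) = (x - b_i)^2 / 2 (so the minimizer of the network average is the mean of the b_i), a ring graph with uniform weights 1/3, Gaussian gradient noise, and the step size 1/(k + 2). The sketch does not compute the transient time analyzed in the paper.

```python
import random


def dsgd(targets, iterations=5000, noise_std=0.1, seed=0):
    """Run DSGD on f_i(x) = (x - targets[i])**2 / 2 over a ring of len(targets) agents."""
    n = len(targets)
    if n < 3:
        raise ValueError("the ring mixing weights below assume at least 3 agents")
    rng = random.Random(seed)
    x = [0.0] * n  # one scalar iterate per agent
    for k in range(iterations):
        step = 1.0 / (k + 2)  # diminishing step size (assumed schedule)
        # Local step: true gradient (x_i - b_i) corrupted by zero-mean Gaussian noise.
        local = [
            x[i] - step * ((x[i] - targets[i]) + rng.gauss(0.0, noise_std))
            for i in range(n)
        ]
        # Consensus step: average with the two ring neighbors (weight 1/3 each).
        x = [(local[i - 1] + local[i] + local[(i + 1) % n]) / 3.0 for i in range(n)]
    return x


if __name__ == "__main__":
    estimates = dsgd([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
    print([round(v, 3) for v in estimates])  # every agent ends close to the average, 3.5
```

Because the mixing weights are doubly stochastic and the local gradients are linear, the network average of the iterates follows a centralized SGD recursion with noise averaged over the n agents, which is the intuition behind the network-independent asymptotic rate.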

5.

Automated detection of mild cognitive impairment and dementia from voice recordings: A natural language processing approach.

Amini, Samad; Hao, Boran; Zhang, Lifu; Song, Mengting; Gupta, Aman; Karjadi, Cody; Kolachalama, Vijaya B; Au, Rhoda; Paschalidis, Ioannis Ch.

Alzheimers Dement ; 2022 Jul 07.

Article in English | MEDLINE | ID: mdl-35796399

ABSTRACT

INTRODUCTION: Automated computational assessment of neuropsychological tests would enable widespread, cost-effective screening for dementia. METHODS: A novel natural language processing approach is developed and validated to identify different stages of dementia based on automated transcription of digital voice recordings of subjects' neuropsychological tests conducted by the Framingham Heart Study (n = 1084). Transcribed sentences from the test were encoded into quantitative data and several models were trained and tested using these data and the participants' demographic characteristics. RESULTS: Average area under the curve (AUC) on the held-out test data reached 92.6%, 88.0%, and 74.4% for differentiating Normal cognition from Dementia, Normal or Mild Cognitive Impairment (MCI) from Dementia, and Normal from MCI, respectively. DISCUSSION: The proposed approach offers a fully automated identification of MCI and dementia based on a recorded neuropsychological test, providing an opportunity to develop a remote screening tool that could be adapted easily to any language.

6.

Routing and Rebalancing Intermodal Autonomous Mobility-on-Demand Systems in Mixed Traffic.

Wollenstein-Betech, Salomón; Salazar, Mauro; Houshmand, Arian; Pavone, Marco; Paschalidis, Ioannis Ch; Cassandras, Christos G.

IEEE Trans Intell Transp Syst ; 23(8): 12263-12275, 2022 Aug.

Article in English | MEDLINE | ID: mdl-37124136

ABSTRACT

This paper studies congestion-aware route-planning policies for intermodal Autonomous Mobility-on-Demand (AMoD) systems, whereby a fleet of autonomous vehicles provides on-demand mobility jointly with public transit under mixed traffic conditions (consisting of AMoD and private vehicles). First, we devise a network flow model to jointly optimize the AMoD routing and rebalancing strategies in a congestion-aware fashion by accounting for the endogenous impact of AMoD flows on travel time. Second, we capture the effect of exogenous traffic stemming from private vehicles adapting to the AMoD flows in a user-centric fashion by leveraging a sequential approach. Since our results are in terms of link flows, we then provide algorithms to retrieve the explicit recommended routes to users. Finally, we showcase our framework with two case studies considering the transportation sub-networks in Eastern Massachusetts and New York City, respectively. Our results suggest that for high levels of demand, pure AMoD travel can be detrimental due to the additional traffic stemming from its rebalancing flows. However, blending AMoD with public transit, walking and micromobility options can significantly improve the overall system performance by leveraging the high throughput of public transit combined with the flexibility of walking and micromobility.

7.

Learning from animals: How to Navigate Complex Terrains.

Zhu, Henghui; Liu, Hao; Ataei, Armin; Munk, Yonatan; Daniel, Thomas; Paschalidis, Ioannis Ch.

PLoS Comput Biol ; 16(1): e1007452, 2020 01.

Article in English | MEDLINE | ID: mdl-31917816

ABSTRACT

We develop a method to learn a bio-inspired motion control policy using data collected from hawkmoths navigating in a virtual forest. A Markov Decision Process (MDP) framework is introduced to model the dynamics of moths and sparse logistic regression is used to learn control policy parameters from the data. The results show that moths do not favor detailed obstacle location information in navigation, but rely heavily on optical flow. Using the policy learned from the moth data as a starting point, we propose an actor-critic learning algorithm to refine policy parameters and obtain a policy that can be used by an autonomous aerial vehicle operating in a cluttered environment. Compared with the moths' policy, the policy we obtain integrates both obstacle location and optical flow. We compare the performance of these two policies in terms of their ability to navigate in artificial forest areas. While the optimized policy can adjust its parameters to outperform the moth's policy in each different terrain, the moth's policy exhibits a high level of robustness across terrains.

Subjects

Computer Simulation , Models, Biological , Spatial Navigation/physiology , Algorithms , Animals , Computational Biology , Decision Making , Image Processing, Computer-Assisted , Machine Learning , Markov Chains , Moths/physiology

8.

The impact of payer status on hospital admissions: evidence from an academic medical center.

Zhao, Yanying; Paschalidis, Ioannis Ch; Hu, Jianqiang.

BMC Health Serv Res ; 21(1): 930, 2021 Sep 07.

Article in English | MEDLINE | ID: mdl-34493261

ABSTRACT

BACKGROUND: Many studies have investigated disparities by payer status in access to care. However, most are either disease-specific or cohort-specific, and large controlled studies quantifying the disparity at the facility level are rare. This study aims to examine how payer status affects patient hospitalization from the perspective of a facility. METHODS: We extracted all patients with a visit record at a medical center between 5/1/2009 and 4/30/2014, and then linked to each patient the outpatient and inpatient records from the three years before the target admission time. We conducted a retrospective observational study using conditional logistic regression. To control for illness across patients with different diseases when training the model, we constructed a three-dimensional variable using data stratification. The model is validated on a dataset distinct from the one used for training. RESULTS: Patients covered by private insurance or uninsured are less likely to be hospitalized than patients insured by the government. For uninsured patients, inequity in access to hospitalization is observed. The values of the standardized coefficients indicate that government-sponsored insurance has the greatest impact on improving patients' hospitalization. CONCLUSION: Attention is needed to improve access to care for uninsured patients. Basic preventive care services should also be enhanced, especially for people insured by the government. The findings can serve as a baseline against which to measure the anticipated effect of measures to reduce payer-status disparities in hospitalization.

Subjects

Insurance, Health , Medically Uninsured , Academic Medical Centers , Hospitalization , Hospitals , Humans , United States

9.

Learning parametric policies and transition probability models of markov decision processes from data.

Xu, Tingting; Zhu, Henghui; Paschalidis, Ioannis Ch.

Eur J Control ; 57: 68-75, 2021 Jan.

Article in English | MEDLINE | ID: mdl-33716408

ABSTRACT

We consider the problem of estimating the policy and transition probability model of a Markov Decision Process from data (state, action, next state tuples). The transition probability and policy are assumed to be parametric functions of a sparse set of features associated with the tuples. We propose two regularized maximum likelihood estimation algorithms for learning the transition probability model and policy, respectively. An upper bound is established on the regret, which is the difference between the average reward of the estimated policy under the estimated transition probabilities and that of the original unknown policy under the true (unknown) transition probabilities. We provide a sample complexity result showing that we can achieve a low regret with a relatively small amount of training samples. We illustrate the theoretical results with a healthcare example and a robot navigation experiment.
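The estimators in entry 9 are regularized and parametric in a sparse set of features. As a point of reference only, in the simplest tabular case (finite states and actions, no features, no regularization) maximum likelihood estimation reduces to normalized counts over the (state, action, next state) tuples. The sketch below shows that baseline, not the paper's algorithms; the state and action names are hypothetical.

```python
from collections import Counter, defaultdict


def estimate_transitions(tuples):
    """Tabular MLE of the transition model: P(s' | s, a) = N(s, a, s') / N(s, a)."""
    counts = defaultdict(Counter)
    for state, action, next_state in tuples:
        counts[(state, action)][next_state] += 1
    model = {}
    for key, counter in counts.items():
        total = sum(counter.values())
        model[key] = {s: c / total for s, c in counter.items()}
    return model


def estimate_policy(tuples):
    """Tabular MLE of the behavior policy: pi(a | s) = N(s, a) / N(s)."""
    counts = defaultdict(Counter)
    for state, action, _ in tuples:
        counts[state][action] += 1
    policy = {}
    for state, counter in counts.items():
        total = sum(counter.values())
        policy[state] = {a: c / total for a, c in counter.items()}
    return policy


if __name__ == "__main__":
    data = [
        ("s0", "stay", "s0"),
        ("s0", "go", "s1"),
        ("s0", "go", "s1"),
        ("s0", "go", "s0"),
        ("s1", "stay", "s1"),
    ]
    print(estimate_policy(data))  # → {'s0': {'stay': 0.25, 'go': 0.75}, 's1': {'stay': 1.0}}
```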

10.

A neural circuit model for a contextual association task inspired by recommender systems.

Zhu, Henghui; Paschalidis, Ioannis Ch; Chang, Allen; Stern, Chantal E; Hasselmo, Michael E.

Hippocampus ; 30(4): 384-395, 2020 04.

Article in English | MEDLINE | ID: mdl-32057161

ABSTRACT

Behavioral data shows that humans and animals have the capacity to learn rules of associations applied to specific examples, and generalize these rules to a broad variety of contexts. This article focuses on neural circuit mechanisms to perform a context-dependent association task that requires linking sensory stimuli to behavioral responses and generalizing to multiple other symmetrical contexts. The model uses neural gating units that regulate the pattern of physiological connectivity within the circuit. These neural gating units can be used in a learning framework that performs low-rank matrix factorization analogous to recommender systems, allowing generalization with high accuracy to a wide range of additional symmetrical contexts. The neural gating units are trained with a biologically inspired framework involving traces of Hebbian modification that are updated based on the correct behavioral output of the network. This modeling demonstrates potential neural mechanisms for learning context-dependent association rules and for the change in selectivity of neurophysiological responses in the hippocampus. The proposed computational model is evaluated using simulations of the learning process and the application of the model to new stimuli. Further, human subject behavioral experiments were performed and the results validate the key observation of a low-rank synaptic matrix structure linking stimuli to responses.

Subjects

Learning/physiology , Neural Networks, Computer , Photic Stimulation/methods , Psychomotor Performance/physiology , Visual Perception/physiology , Cohort Studies , Humans

11.

Asymptotic Network Independence in Distributed Stochastic Optimization for Machine Learning.

Pu, Shi; Olshevsky, Alex; Paschalidis, Ioannis Ch.

IEEE Signal Process Mag ; 37(3): 114-122, 2020 May.

Article in English | MEDLINE | ID: mdl-33746471

ABSTRACT

We provide a discussion of several recent results which, in certain scenarios, are able to overcome a barrier in distributed stochastic optimization for machine learning. Our focus is the so-called asymptotic network independence property, which is achieved whenever a distributed method executed over a network of n nodes asymptotically converges to the optimal solution at a comparable rate to a centralized method with the same computational power as the entire network. We explain this property through an example involving the training of ML models and sketch a short mathematical analysis for comparing the performance of distributed stochastic gradient descent (DSGD) with centralized stochastic gradient descent (SGD).

12.

Predicting Chronic Disease Hospitalizations from Electronic Health Records: An Interpretable Classification Approach.

Brisimi, Theodora S; Xu, Tingting; Wang, Taiyao; Dai, Wuyang; Adams, William G; Paschalidis, Ioannis Ch.

Proc IEEE Inst Electr Electron Eng ; 106(4): 690-707, 2018 Apr.

Article in English | MEDLINE | ID: mdl-30886441

ABSTRACT

Urban living in modern large cities has significant adverse effects on health, increasing the risk of several chronic diseases. We focus on the two leading clusters of chronic disease, heart disease and diabetes, and develop data-driven methods to predict hospitalizations due to these conditions. We base these predictions on the patients' medical history, recent and more distant, as described in their Electronic Health Records (EHR). We formulate the prediction problem as a binary classification problem and consider a variety of machine learning methods, including kernelized and sparse Support Vector Machines (SVM), sparse logistic regression, and random forests. To strike a balance between accuracy and interpretability of the prediction, which is important in a medical setting, we propose two novel methods: K-LRT, a likelihood ratio test-based method, and a Joint Clustering and Classification (JCC) method which identifies hidden patient clusters and adapts classifiers to each cluster. We develop theoretical out-of-sample guarantees for the latter method. We validate our algorithms on large datasets from the Boston Medical Center, the largest safety-net hospital system in New England.

13.

Accounting for observed small angle X-ray scattering profile in the protein-protein docking server ClusPro.

Xia, Bing; Mamonov, Artem; Leysen, Seppe; Allen, Karen N; Strelkov, Sergei V; Paschalidis, Ioannis Ch; Vajda, Sandor; Kozakov, Dima.

J Comput Chem ; 36(20): 1568-72, 2015 Jul 30.

Article in English | MEDLINE | ID: mdl-26095982

ABSTRACT

The protein-protein docking server ClusPro is used by thousands of laboratories, and models built by the server have been reported in over 300 publications. Although the structures generated by the docking include near-native ones for many proteins, selecting the best model is difficult due to the uncertainty in scoring. Small angle X-ray scattering (SAXS) is an experimental technique for obtaining low resolution structural information in solution. While not sufficient on its own to uniquely predict complex structures, accounting for SAXS data improves the ranking of models and facilitates the identification of the most accurate structure. Although SAXS profiles are currently available only for a small number of complexes, due to its simplicity the method is becoming increasingly popular. Since combining docking with SAXS experiments will provide a viable strategy for fairly high-throughput determination of protein complex structures, the option of using SAXS restraints is added to the ClusPro server. © 2015 Wiley Periodicals, Inc.

Subjects

Molecular Docking Simulation , Proteins/chemistry , Scattering, Small Angle , X-Ray Diffraction

14.

The impact of side-chain packing on protein docking refinement.

Moghadasi, Mohammad; Mirzaei, Hanieh; Mamonov, Artem; Vakili, Pirooz; Vajda, Sandor; Paschalidis, Ioannis Ch; Kozakov, Dima.

J Chem Inf Model ; 55(4): 872-81, 2015 Apr 27.

Article in English | MEDLINE | ID: mdl-25714358

ABSTRACT

We study the impact of optimizing the side-chain positions in the interface region between two proteins during the process of binding. Mathematically, the problem is similar to side-chain prediction, which has been extensively explored in the process of protein structure prediction. The protein-protein docking application, however, has a number of characteristics that necessitate different algorithmic and implementation choices. In this work, we implement a distributed approximate algorithm that can be implemented on multiprocessor architectures and enables a trade-off between accuracy and running speed. We report computational results on benchmarks of enzyme-inhibitor and other types of complexes, establishing that the side-chain flexibility our algorithm introduces substantially improves the performance of docking protocols. Furthermore, we establish that the inclusion of unbound side-chain conformers in the side-chain positioning problem is critical in these performance improvements. The code is available to the community under open source license.

Subjects

Molecular Docking Simulation , Proteins/chemistry , Proteins/metabolism , Algorithms , Thermodynamics , Time Factors

15.

A Message-Passing Algorithm for Wireless Network Scheduling.

Paschalidis, Ioannis Ch; Huang, Fuzhuo; Lai, Wei.

IEEE ACM Trans Netw ; 23(5): 1528-1541, 2015 Oct.

Article in English | MEDLINE | ID: mdl-26752942

ABSTRACT

We consider scheduling in wireless networks and formulate it as a Maximum Weighted Independent Set (MWIS) problem on a "conflict" graph that captures interference among simultaneous transmissions. We propose a novel, low-complexity, and fully distributed algorithm that yields high-quality feasible solutions. Our proposed algorithm consists of two phases, each of which requires only local information and is based on message-passing. The first phase solves a relaxation of the MWIS problem using a gradient projection method. The relaxation we consider is tighter than the simple linear programming relaxation and incorporates constraints on all cliques in the graph. The second phase of the algorithm starts from the solution of the relaxation and constructs a feasible solution to the MWIS problem. We show that our algorithm always outputs an optimal solution to the MWIS problem for perfect graphs. Simulation results compare our policies against Carrier Sense Multiple Access (CSMA) and other alternatives and show excellent performance.
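To make the MWIS formulation in entry 15 concrete: each link that may transmit is a vertex with a weight (for example, its queue length, an assumed choice here), two links that interfere are joined by an edge of the conflict graph, and a feasible schedule is a set of links with no edge between them. The exhaustive search below is only practical for a handful of links and is not the distributed message-passing algorithm proposed in the paper; the link names, weights, and conflicts are hypothetical.

```python
from itertools import combinations


def max_weight_independent_set(weights, conflicts):
    """Exhaustively find the heaviest set of links with no conflicting pair (tiny graphs only)."""
    links = sorted(weights)
    conflict_pairs = {frozenset(edge) for edge in conflicts}
    best_set, best_weight = (), 0
    for size in range(1, len(links) + 1):
        for subset in combinations(links, size):
            # A schedule is feasible only if no two chosen links interfere.
            if any(frozenset(pair) in conflict_pairs for pair in combinations(subset, 2)):
                continue
            total = sum(weights[link] for link in subset)
            if total > best_weight:
                best_set, best_weight = subset, total
    return set(best_set), best_weight


if __name__ == "__main__":
    # Hypothetical links on a path-shaped conflict graph: a-b, b-c, c-d interfere.
    link_weights = {"a": 3, "b": 2, "c": 4, "d": 1}
    edges = [("a", "b"), ("b", "c"), ("c", "d")]
    print(max_weight_independent_set(link_weights, edges))  # schedules links a and c, weight 7
```

The search is exponential in the number of links, which is why low-complexity distributed approximations such as the one described above matter in practice.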

16.

ITNR: Inversion Transformer-based Neural Ranking for cancer drug recommendations.

Sotudian, Shahabeddin; Paschalidis, Ioannis Ch.

Comput Biol Med ; 172: 108312, 2024 Apr.

Article in English | MEDLINE | ID: mdl-38503090

ABSTRACT

Personalized drug response prediction is an approach for tailoring effective therapeutic strategies for patients based on their tumors' genomic characterization. While machine learning methods are widely employed in the literature, they often struggle to capture drug-cell line relations across various cell lines. In addressing this challenge, our study introduces a novel listwise Learning-to-Rank (LTR) model named Inversion Transformer-based Neural Ranking (ITNR). ITNR utilizes genomic features and a transformer architecture to decipher functional relationships and construct models that can predict patient-specific drug responses. Our experiments were conducted on three major drug response data sets, showing that ITNR reliably and consistently outperforms state-of-the-art LTR models.

Subjects

Antineoplastic Agents , Neoplasms , Humans , Antineoplastic Agents/pharmacology , Antineoplastic Agents/therapeutic use , Cell Line , Genomics , Machine Learning , Neoplasms/drug therapy , Neoplasms/genetics

17.

Multi-Attribute Subset Selection enables prediction of representative phenotypes across microbial populations.

Herbst, Konrad; Wang, Taiyao; Forchielli, Elena J; Thommes, Meghan; Paschalidis, Ioannis Ch; Segrè, Daniel.

Commun Biol ; 7(1): 407, 2024 Apr 03.

Article in English | MEDLINE | ID: mdl-38570615

ABSTRACT

The interpretation of complex biological datasets requires the identification of representative variables that describe the data without critical information loss. This is particularly important in the analysis of large phenotypic datasets (phenomics). Here we introduce Multi-Attribute Subset Selection (MASS), an algorithm which separates a matrix of phenotypes (e.g., yield across microbial species and environmental conditions) into predictor and response sets of conditions. Using mixed integer linear programming, MASS expresses the response conditions as a linear combination of the predictor conditions, while simultaneously searching for the optimally descriptive set of predictors. We apply the algorithm to three microbial datasets and identify environmental conditions that predict phenotypes under other conditions, providing biologically interpretable axes for strain discrimination. MASS could be used to reduce the number of experiments needed to identify species or to map their metabolic capabilities. The generality of the algorithm allows addressing subset selection problems in areas beyond biology.

Subjects

Algorithms , Phenotype

18.

Predicting polycystic ovary syndrome with machine learning algorithms from electronic health records.

Zad, Zahra; Jiang, Victoria S; Wolf, Amber T; Wang, Taiyao; Cheng, J Jojo; Paschalidis, Ioannis Ch; Mahalingaiah, Shruthi.

Front Endocrinol (Lausanne) ; 15: 1298628, 2024.

Article in English | MEDLINE | ID: mdl-38356959

ABSTRACT

Introduction: Predictive models have been used to aid early diagnosis of PCOS, though existing models are based on small sample sizes and limited to fertility clinic populations. We built a predictive model using machine learning algorithms based on an outpatient population at risk for PCOS to predict risk and facilitate earlier diagnosis, particularly among those who meet diagnostic criteria but have not received a diagnosis. Methods: This is a retrospective cohort study using a safety-net hospital's electronic health records (EHR) from 2003 to 2016. The study population included 30,601 women aged 18-45 years without concurrent endocrinopathy who had any visit to Boston Medical Center for primary care, obstetrics and gynecology, endocrinology, family medicine, or general internal medicine. Four prediction outcomes were assessed for PCOS. The first outcome was PCOS ICD-9 diagnosis with additional model outcomes of algorithm-defined PCOS. The latter was based on Rotterdam criteria and merging laboratory values, radiographic imaging, and ICD data from the EHR to define irregular menstruation, hyperandrogenism, and polycystic ovarian morphology on ultrasound. Results: We developed predictive models using four machine learning methods: logistic regression, support vector machines, gradient boosted trees, and random forests. Hormone values (follicle-stimulating hormone, luteinizing hormone, estradiol, and sex hormone binding globulin) were combined to create a multilayer perceptron score using a neural network classifier. Prediction of PCOS prior to clinical diagnosis in an out-of-sample test set of patients achieved an average AUC of 85%, 81%, 80%, and 82% in Models I, II, III and IV, respectively. Significant positive predictors of PCOS diagnosis across models included hormone levels and obesity; negative predictors included gravidity and positive bHCG. Conclusion: Machine learning algorithms were used to predict PCOS based on a large at-risk population. 
This approach may guide early detection of PCOS within EHR-interfaced populations to facilitate counseling and interventions that may reduce long-term health consequences. Our model illustrates the potential benefits of an artificial intelligence-enabled provider assistance tool that can be integrated into the EHR to reduce delays in diagnosis. However, model validation in other hospital-based populations is necessary.
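The algorithm-defined outcomes rest on the Rotterdam criteria, under which PCOS is defined by at least two of three findings (ovulatory dysfunction, hyperandrogenism, polycystic ovarian morphology) once other causes are excluded. The minimal Python sketch below shows only that two-of-three rule. `PatientFlags` and `rotterdam_pcos` are hypothetical names; how the study derived each flag from laboratory values, imaging, and ICD codes is not described in the abstract, and this is not the authors' code.

```python
from dataclasses import dataclass


@dataclass
class PatientFlags:
    """Hypothetical per-patient criterion flags, assumed to be derived from EHR data."""
    irregular_menstruation: bool         # proxy for oligo-/anovulation
    hyperandrogenism: bool               # clinical or biochemical
    polycystic_ovarian_morphology: bool  # seen on ultrasound


def rotterdam_pcos(flags: PatientFlags) -> bool:
    """Return True when at least two of the three Rotterdam criteria are met.

    Exclusion of other endocrinopathies is assumed to happen upstream,
    mirroring the study's restriction to women without concurrent endocrinopathy.
    """
    criteria_met = sum([
        flags.irregular_menstruation,
        flags.hyperandrogenism,
        flags.polycystic_ovarian_morphology,
    ])
    return criteria_met >= 2


# Two of three criteria met -> algorithm-defined PCOS.
print(rotterdam_pcos(PatientFlags(True, True, False)))   # True
# Only one criterion met -> not algorithm-defined PCOS.
print(rotterdam_pcos(PatientFlags(False, False, True)))  # False
```

Encoding the rule as a pure function over boolean flags keeps the label definition separate from the harder step of extracting each flag from heterogeneous EHR fields.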

Subjects

Polycystic Ovary Syndrome, Humans, Female, Polycystic Ovary Syndrome/diagnosis, Retrospective Studies, Artificial Intelligence, Electronic Health Records, Luteinizing Hormone, Algorithms, Machine Learning

19.

Automating biomedical literature review for rapid drug discovery: Leveraging GPT-4 to expedite pandemic response.

Yang, Jingmei; Walker, Kenji C; Bekar-Cesaretli, Ayse A; Hao, Boran; Bhadelia, Nahid; Joseph-McCarthy, Diane; Paschalidis, Ioannis Ch.

Int J Med Inform ; 189: 105500, 2024 May 24.

Article in English | MEDLINE | ID: mdl-38815316

ABSTRACT

OBJECTIVE: The rapid expansion of the biomedical literature challenges traditional review methods, especially during outbreaks of emerging infectious diseases when quick action is critical. Our study aims to explore the potential of ChatGPT to automate the biomedical literature review for rapid drug discovery. MATERIALS AND METHODS: We introduce a novel automated pipeline that helps identify drugs for a given virus in response to a potential future global health threat. Our approach can be used to select PubMed articles identifying a drug target for the given virus. We tested our approach on two known pathogens: SARS-CoV-2, where the literature is vast, and Nipah, where the literature is sparse. Specifically, a panel of three experts reviewed a set of PubMed articles and labeled each as either describing a drug target for the given virus or not. The same task was given to the automated pipeline, and its performance was assessed by how closely its labels agreed with those of the human experts. We applied a number of prompt engineering techniques to improve the performance of ChatGPT. RESULTS: Our best configuration used GPT-4 by OpenAI and achieved an out-of-sample validation performance with accuracy/F1-score/sensitivity/specificity of 92.87%/88.43%/83.38%/97.82% for SARS-CoV-2 and 87.40%/73.90%/74.72%/91.36% for Nipah. CONCLUSION: These results highlight the utility of ChatGPT in drug discovery and development and reveal its potential to enable rapid drug target identification during a pandemic-level health emergency.
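The four reported figures (accuracy, F1-score, sensitivity, specificity) all follow from a confusion matrix that compares the pipeline's labels with the expert labels. The plain-Python sketch below uses the standard definitions of these metrics. It assumes the positive class (1) means "describes a drug target for the given virus"; `binary_metrics` is a hypothetical helper, not part of the authors' pipeline.

```python
def binary_metrics(expert: list[int], predicted: list[int]) -> dict[str, float]:
    """Score pipeline labels against expert labels.

    Assumes 1 = article describes a drug target (positive class), 0 = it does not.
    """
    if len(expert) != len(predicted):
        raise ValueError("label lists must have the same length")
    pairs = list(zip(expert, predicted))
    tp = sum(1 for e, p in pairs if e == 1 and p == 1)  # both say "drug target"
    tn = sum(1 for e, p in pairs if e == 0 and p == 0)  # both say "no target"
    fp = sum(1 for e, p in pairs if e == 0 and p == 1)  # pipeline over-calls
    fn = sum(1 for e, p in pairs if e == 1 and p == 0)  # pipeline misses

    def ratio(num: float, den: float) -> float:
        # Return 0.0 instead of dividing by zero when a class is empty.
        return num / den if den else 0.0

    sensitivity = ratio(tp, tp + fn)  # recall on the positive class
    specificity = ratio(tn, tn + fp)  # recall on the negative class
    precision = ratio(tp, tp + fp)
    f1 = ratio(2 * precision * sensitivity, precision + sensitivity)
    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "f1": f1,
        "sensitivity": sensitivity,
        "specificity": specificity,
    }


# Hypothetical labels for ten articles: tp=3, fn=1, tn=5, fp=1.
expert = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
print(binary_metrics(expert, predicted))
```

Reporting specificity alongside F1 matters here because F1 ignores true negatives. That is why a pipeline can reach high specificity (for example 97.82% for SARS-CoV-2) while its F1-score stays noticeably lower.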

20.

Predictive models of miscarriage on the basis of data from a preconception cohort study.

Yland, Jennifer J; Zad, Zahra; Wang, Tanran R; Wesselink, Amelia K; Jiang, Tammy; Hatch, Elizabeth E; Paschalidis, Ioannis Ch; Wise, Lauren A.

Fertil Steril ; 122(1): 140-149, 2024 Jul.

Article in English | MEDLINE | ID: mdl-38604264

ABSTRACT

OBJECTIVE: To use self-reported preconception data to derive models that predict the risk of miscarriage. DESIGN: Prospective preconception cohort study. SETTING: Not applicable. PATIENTS: Study participants were female, aged 21-45 years, residents of the United States or Canada, and attempting spontaneous pregnancy at enrollment during 2013-2022. Participants were followed for up to 12 months of pregnancy attempts; those who conceived were followed through pregnancy and postpartum. We restricted analyses to participants who conceived during the study period. EXPOSURE: On baseline and follow-up questionnaires completed every 8 weeks until pregnancy, we collected self-reported data on sociodemographic factors, reproductive history, lifestyle, anthropometrics, diet, medical history, and male partner characteristics. We included 160 potential predictor variables in our models. MAIN OUTCOME MEASURES: The primary outcome was miscarriage, defined as pregnancy loss before 20 weeks of gestation. We followed participants from their first positive pregnancy test until miscarriage or a censoring event (induced abortion, ectopic pregnancy, loss to follow-up, or 20 weeks of gestation), whichever occurred first. We fit both survival and static models using Cox proportional hazards models, logistic regression, support vector machines, gradient-boosted trees, and random forest algorithms. We evaluated model performance using the concordance index (survival models) and the weighted F1 score (static models). RESULTS: Among the 8,720 participants who conceived, 20.4% reported miscarriage. In multivariable models, the strongest predictors of miscarriage were female age, history of miscarriage, and male partner age.
The weighted F1 score ranged from 73% to 89% for static models and the concordance index ranged from 53% to 56% for survival models, indicating better discrimination (i.e., the ability of the model to distinguish between individuals with and without miscarriage) for the static models than for the survival models. No appreciable differences were observed across strata of miscarriage history or among models restricted to ≥8 weeks of gestation. CONCLUSION: Our findings suggest that miscarriage is not easily predicted on the basis of preconception lifestyle characteristics and that advancing age and a history of miscarriage are the most important predictors of incident miscarriage.
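The two evaluation metrics named above can be sketched in plain Python from their standard definitions. Harrell's concordance index counts, among comparable pairs (a participant who miscarried earlier versus one still under follow-up later), how often the model assigns the earlier event a higher risk. Tied risk scores count as one half, which is one common convention and an assumption here. The weighted F1 score averages per-class F1 weighted by each class's frequency. `harrell_c_index` and `weighted_f1` are hypothetical helpers, not the study's code. The study may have used library implementations instead.

```python
from collections import Counter


def harrell_c_index(times: list[float], events: list[int], risks: list[float]) -> float:
    """Harrell's C: fraction of comparable pairs ordered correctly by risk.

    A pair (i, j) is comparable when i had the event (events[i] == 1)
    strictly before j's observed time, whether or not j was censored.
    """
    concordant = 0.0
    comparable = 0
    for i in range(len(times)):
        if not events[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(len(times)):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5  # tied risk scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


def weighted_f1(y_true: list[int], y_pred: list[int]) -> float:
    """Per-class F1 averaged with weights equal to each class's share of y_true."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for label, count in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = count - tp
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        score += f1 * count / total
    return score


# Hypothetical data: weeks of follow-up, miscarriage indicator, model risk score.
print(round(harrell_c_index([2, 4, 6, 8], [1, 1, 0, 1], [0.9, 0.3, 0.6, 0.4]), 4))  # 0.6
# Hypothetical static labels: 1 = miscarriage, 0 = no miscarriage.
print(round(weighted_f1([1, 0, 0, 0, 1, 0], [1, 0, 1, 0, 0, 0]), 4))  # 0.6667
```

Because the weighted F1 gives the majority class (no miscarriage, about 80% of this cohort) most of the weight, it can look high even when the minority class is predicted poorly. A concordance index near 0.5 corresponds to near-random ranking. The two scales are not directly interchangeable.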

Subjects

Spontaneous Abortion, Humans, Female, Adult, Spontaneous Abortion/epidemiology, Pregnancy, Prospective Studies, Young Adult, Middle Aged, Risk Factors, Risk Assessment, United States/epidemiology, Predictive Value of Tests, Canada/epidemiology, Cohort Studies, Male, Self Report
