1.
Proc Natl Acad Sci U S A ; 121(26): e2405840121, 2024 Jun 25.
Article in English | MEDLINE | ID: mdl-38900798

ABSTRACT

Proteomics has been revolutionized by large protein language models (PLMs), which learn unsupervised representations from large corpora of sequences. These models are typically fine-tuned in a supervised setting to adapt the model to specific downstream tasks. However, the computational and memory footprint of fine-tuning (FT) large PLMs presents a barrier for many research groups with limited computational resources. Natural language processing has seen a similar explosion in the size of models, where these challenges have been addressed by methods for parameter-efficient fine-tuning (PEFT). In this work, we introduce this paradigm to proteomics through leveraging the parameter-efficient method LoRA and training new models for two important tasks: predicting protein-protein interactions (PPIs) and predicting the symmetry of homooligomer quaternary structures. We show that these approaches are competitive with traditional FT while requiring reduced memory and substantially fewer parameters. We additionally show that for the PPI prediction task, training only the classification head also remains competitive with full FT, using five orders of magnitude fewer parameters, and that each of these methods outperforms state-of-the-art PPI prediction methods with substantially reduced compute. We further perform a comprehensive evaluation of the hyperparameter space, demonstrate that PEFT of PLMs is robust to variations in these hyperparameters, and elucidate where best practices for PEFT in proteomics differ from those in natural language processing. All our model adaptation and evaluation code is available open-source at https://github.com/microsoft/peft_proteomics. Thus, we provide a blueprint to democratize the power of PLM adaptation to groups with limited computational resources.
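
To illustrate why LoRA needs far fewer trainable parameters than full fine-tuning, here is a minimal sketch in plain Python. It is not the authors' code (their repository is linked in the abstract); the function names, matrix sizes, and the rank and alpha values are hypothetical choices made only for illustration. The frozen weight W is left untouched and only the two small matrices A and B are trained.

```python
import random

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def lora_forward(x, w_frozen, a, b, alpha, rank):
    """Compute y = x W + (alpha / rank) * x A B, where W stays frozen."""
    base = matmul(x, w_frozen)
    update = matmul(matmul(x, a), b)
    scale = alpha / rank
    return [[p + scale * q for p, q in zip(r1, r2)] for r1, r2 in zip(base, update)]

def trainable_params(d_in, d_out, rank):
    """Number of trainable values in A (d_in x rank) plus B (rank x d_out)."""
    return rank * (d_in + d_out)

# Hypothetical toy dimensions; real PLM layers are much larger.
d_in, d_out, rank, alpha = 8, 8, 2, 4
rng = random.Random(0)
w = [[rng.gauss(0, 1) for _ in range(d_out)] for _ in range(d_in)]
a = [[rng.gauss(0, 0.01) for _ in range(rank)] for _ in range(d_in)]
b = [[0.0] * d_out for _ in range(rank)]  # zero init: the adapter starts as a no-op
x = [[rng.gauss(0, 1) for _ in range(d_in)]]

y = lora_forward(x, w, a, b, alpha, rank)
full_count = d_in * d_out                        # values updated by full fine-tuning
lora_count = trainable_params(d_in, d_out, rank) # values updated by LoRA
```

With a hypothetical 1280 x 1280 layer and rank 8, the same formula gives 20,480 trainable values against 1,638,400 for the full matrix, which is the kind of saving the abstract refers to.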


Subjects
Proteomics , Proteomics/methods , Proteins/chemistry , Proteins/metabolism , Natural Language Processing , Protein Interaction Mapping/methods , Computational Biology/methods , Humans , Algorithms
3.
Am J Med Genet A ; 194(11): e63596, 2024 Nov.
Article in English | MEDLINE | ID: mdl-38895864

ABSTRACT

The purpose of this study is to gain insights into potential genetic factors contributing to the infant's vulnerability to Sudden Unexpected Infant Death (SUID). Whole Genome Sequencing (WGS) was performed on 144 infants that succumbed to SUID, and 573 healthy adults. Variants were filtered by gnomAD allele frequencies and predictions of functional consequences. Variants of interest were identified in 88 genes, in 64.6% of our cohort. Seventy-three of these have been previously associated with SIDS/SUID/SUDP. Forty-three can be characterized as cardiac genes and are related to cardiomyopathies, arrhythmias, and other conditions. Variants in 22 genes were associated with neurologic functions. Variants were also found in 13 genes reported to be pathogenic for various systemic disorders and in two genes associated with immunological function. Variants in eight genes are implicated in the response to hypoxia and the regulation of reactive oxygen species (ROS) and have not been previously described in SIDS/SUID/SUDP. Seventy-two infants met the triple risk hypothesis criteria. Our study confirms and further expands the list of genetic variants associated with SUID. The abundance of genes associated with heart disease and the discovery of variants associated with the redox metabolism have important mechanistic implications for the pathophysiology of SUID.


Subjects
Genetic Predisposition to Disease , Sudden Infant Death , Whole Genome Sequencing , Humans , Sudden Infant Death/genetics , Sudden Infant Death/pathology , Female , Infant , Male , Infant, Newborn , Genetic Variation , Adult , Gene Frequency
4.
Pancreatology ; 2024 Sep 02.
Article in English | MEDLINE | ID: mdl-39261223

ABSTRACT

BACKGROUND/OBJECTIVES: Pancreatic cyst management can be distilled into three separate pathways - discharge, monitoring or surgery - based on the risk of malignant transformation. This study compares the performance of artificial intelligence (AI) models to clinical care for this task. METHODS: Two explainable boosting machine (EBM) models were developed and evaluated using clinical features only, or clinical features and cyst fluid molecular markers (CFMM) using a publicly available dataset, consisting of 850 cases (median age 64; 65 % female) with independent training (429 cases) and holdout test cohorts (421 cases). There were 137 cysts with no malignant potential, 114 malignant cysts, and 599 IPMNs and MCNs. RESULTS: The EBM and EBM with CFMM models had higher accuracy for identifying patients requiring monitoring (0.88 and 0.82) and surgery (0.66 and 0.82) respectively compared with current clinical care (0.62 and 0.58). For discharge, the EBM with CFMM model had a higher accuracy (0.91) than either the EBM model (0.84) or current clinical care (0.86). In the cohort of patients who underwent surgical resection, use of the EBM-CFMM model would have decreased the number of unnecessary surgeries by 59 % (n = 92), increased correct surgeries by 7.5 % (n = 11), increased the number of patients identified as requiring monitoring by 122 % (n = 76), and increased the number of patients correctly classified for discharge by 138 % (n = 18) compared to clinical care. CONCLUSIONS: EBM models had greater sensitivity and specificity for identifying the correct management compared with either clinical management or previous AI models. The model predictions are demonstrated to be interpretable by clinicians.

5.
Proc Natl Acad Sci U S A ; 118(18)2021 05 04.
Article in English | MEDLINE | ID: mdl-33903246

ABSTRACT

There are emerging opportunities to assess health indicators at truly small areas with increasing availability of data geocoded to micro geographic units and advanced modeling techniques. The utility of such fine-grained data can be fully leveraged if linked to local governance units that are accountable for implementation of programs and interventions. We used data from the 2011 Indian Census for village-level demographic and amenities features and the 2016 Indian Demographic and Health Survey in a bias-corrected semisupervised regression framework to predict child anthropometric failures for all villages in India. Of the total geographic variation in predicted child anthropometric failure estimates, 54.2 to 72.3% were attributed to the village level followed by 20.6 to 39.5% to the state level. The mean predicted stunting was 37.9% (SD: 10.1%; IQR: 31.2 to 44.7%), and substantial variation was found across villages ranging from less than 5% for 691 villages to over 70% in 453 villages. Estimates at the village level can potentially shift the paradigm of policy discussion in India by enabling more informed prioritization and precise targeting. The proposed methodology can be adapted and applied to diverse population health indicators, and in other contexts, to reveal spatial heterogeneity at a finer geographic scale and identify local areas with the greatest needs and with direct implications for actions to take place.


Subjects
Child Nutrition Disorders/epidemiology , Growth Disorders/epidemiology , Malnutrition/epidemiology , Anthropometry , Censuses , Child , Child Nutrition Disorders/metabolism , Child Nutrition Disorders/pathology , Child, Preschool , Female , Growth Disorders/metabolism , Growth Disorders/pathology , Humans , India/epidemiology , Male , Malnutrition/metabolism , Malnutrition/pathology , Rural Population/statistics & numerical data
6.
Int J Equity Health ; 22(1): 181, 2023 09 05.
Article in English | MEDLINE | ID: mdl-37670348

ABSTRACT

BACKGROUND: Socioeconomic status has long been associated with population health and health outcomes. While ameliorating social determinants of health may improve health, identifying and targeting areas where feasible interventions are most needed would help improve health equity. We sought to identify inequities in health and social determinants of health (SDOH) associated with local economic distress at the county level. METHODS: For 3,131 counties in the 50 US states and Washington, DC (wherein approximately 325,711,203 people lived in 2019), we conducted a retrospective analysis of county-level data collected from County Health Rankings in two periods (centering around 2015 and 2019). We used ANOVA to compare thirty-three measures across five health and SDOH domains (Health Outcomes, Clinical Care, Health Behaviors, Physical Environment, and Social and Economic Factors) that were available in both periods, changes in measures between periods, and ratios of measures for the least to most prosperous counties across county-level prosperity quintiles, based on the Economic Innovation Group's 2015-2019 Distressed Community Index Scores. RESULTS: With seven exceptions, in both periods, we found a worsening of values with each progression from more to less prosperous counties, with least prosperous counties having the worst values (ANOVA p < 0.001 for all measures). Between 2015 and 2019, all except six measures progressively worsened when comparing higher to lower prosperity quintiles, and gaps between the least and most prosperous counties generally widened. CONCLUSIONS: In the late 2010s, the least prosperous US counties overwhelmingly had worse values in measures of Health Outcomes, Clinical Care, Health Behaviors, the Physical Environment, and Social and Economic Factors than more prosperous counties. Between 2015 and 2019, for most measures, inequities between the least and most prosperous counties widened. Our findings suggest that local economic prosperity may serve as a proxy for health and SDOH status of the community. Policymakers and leaders in public and private sectors might use long-term, targeted economic stimuli in low prosperity counties to generate local, community health benefits for vulnerable populations. Doing so could sustainably improve health; not doing so will continue to generate poor health outcomes and ever-widening economic disparities.


Subjects
Health Behavior , Social Determinants of Health , Humans , Retrospective Studies , Economic Factors , Outcome Assessment, Health Care
7.
BMC Public Health ; 22(1): 2394, 2022 12 20.
Article in English | MEDLINE | ID: mdl-36539760

ABSTRACT

BACKGROUND: Despite an abundance of information on the risk factors of SARS-CoV-2, there have been few US-wide studies of long-term effects. In this paper we analyzed a large medical claims database of US based individuals to identify common long-term effects as well as their associations with various social and medical risk factors. METHODS: The medical claims database was obtained from a prominent US based claims data processing company, namely Change Healthcare. In addition to the claims data, the dataset also consisted of various social determinants of health such as race, income, education level and veteran status of the individuals. A self-controlled cohort design (SCCD) observational study was performed to identify ICD-10 codes whose proportion was significantly increased in the outcome period compared to the control period to identify significant long-term effects. A logistic regression-based association analysis was then performed between identified long-term effects and social determinants of health. RESULTS: Among the over 1.37 million COVID patients in our datasets we found 36 out of 1724 3-digit ICD-10 codes to be statistically significantly increased in the post-COVID period (p-value < 0.05). We also found one combination of ICD-10 codes, corresponding to 'other anemias' and 'hypertension', that was statistically significantly increased in the post-COVID period (p-value < 0.05). Our logistic regression-based association analysis with social determinants of health variables, after adjusting for comorbidities and prior conditions, showed that age and gender were significantly associated with the multiple long-term effects. Race was only associated with 'other sepsis', income was only associated with 'Alopecia areata' (autoimmune disease causing hair loss), while education level was only associated with 'Maternal infectious and parasitic diseases' (p-value < 0.05). CONCLUSION: We identified several long-term effects of SARS-CoV-2 through a self-controlled study on a cohort of over one million patients. Furthermore, we found that while age and gender are commonly associated with the long-term effects, other social determinants of health such as race, income and education levels have rare or no significant associations.


Subjects
COVID-19 , Humans , COVID-19/epidemiology , SARS-CoV-2 , Social Determinants of Health , Risk Factors , Comorbidity
8.
J Med Internet Res ; 23(5): e24742, 2021 05 20.
Article in English | MEDLINE | ID: mdl-33872190

ABSTRACT

BACKGROUND: Identifying new COVID-19 cases is challenging. Not every suspected case undergoes testing, because testing kits and other equipment are limited in many parts of the world. Yet populations increasingly use the internet to manage both home and work life during the pandemic, giving researchers mediated connections to millions of people sheltering in place. OBJECTIVE: The goal of this study was to assess the feasibility of using an online news platform to recruit volunteers willing to report COVID-19-like symptoms and behaviors. METHODS: An online epidemiologic survey captured COVID-19-related symptoms and behaviors from individuals recruited through banner ads offered through Microsoft News. Respondents indicated whether they were experiencing symptoms, whether they received COVID-19 testing, and whether they traveled outside of their local area. RESULTS: A total of 87,322 respondents completed the survey across a 3-week span at the end of April 2020, with 54.3% of the responses from the United States and 32.0% from Japan. Of the total respondents, 19,631 (22.3%) reported at least one symptom associated with COVID-19. Nearly two-fifths of these respondents (39.1%) reported more than one COVID-19-like symptom. Individuals who reported being tested for COVID-19 were significantly more likely to report symptoms (47.7% vs 21.5%; P<.001). Symptom reporting rates positively correlated with per capita COVID-19 testing rates (R²=0.26; P<.001). Respondents were geographically diverse, with all states and most ZIP Codes represented. More than half of the respondents from both countries were older than 50 years of age. CONCLUSIONS: News platforms can be used to quickly recruit study participants, enabling collection of infectious disease symptoms at scale and with populations that are older than those found through social media platforms. Such platforms could enable epidemiologists and researchers to quickly assess trends in emerging infections potentially before at-risk populations present to clinics and hospitals for testing and/or treatment.


Subjects
Advertising/methods , COVID-19 Testing/methods , Internet Use/statistics & numerical data , Social Media/statistics & numerical data , Adolescent , Adult , Aged , Aged, 80 and over , Female , Humans , Male , Middle Aged , Pandemics , Pilot Projects , SARS-CoV-2/isolation & purification , Surveys and Questionnaires , Young Adult
9.
J Acoust Soc Am ; 149(5): 3086, 2021 05.
Article in English | MEDLINE | ID: mdl-34241138

ABSTRACT

The goal of this project is to use acoustic signatures to detect, classify, and count the calls of four acoustic populations of blue whales so that, ultimately, the conservation status of each population can be better assessed. We used manual annotations from 350 h of audio recordings from the underwater hydrophones in the Indian Ocean to build a deep learning model to detect, classify, and count the calls from four acoustic song types. We used Siamese neural networks (SNN), a class of neural network architectures that find the similarity of their inputs by comparing feature vectors, and found that they outperformed the more widely used convolutional neural networks (CNN). Specifically, the SNN outperformed a CNN with a 2% accuracy improvement in population classification and a 1.7%-6.4% accuracy improvement in call count estimation for each blue whale population. In addition, even though we treat the call count estimation problem as a classification task and encode the number of calls in each spectrogram as a categorical variable, the SNN surprisingly learned the ordinal relationship among them. SNN are robust and are shown here to be an effective way to automatically mine large acoustic datasets for blue whale calls.
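
The idea of comparing feature vectors produced by a shared network can be sketched as follows. This is a toy illustration rather than the model from the study: the weights, the three-element inputs, and the labels `type_a` and `type_b` are all hypothetical, and a real Siamese network would learn its weights from spectrograms.

```python
import math

# Hypothetical shared weights (2 x 3); both inputs pass through the same map.
SHARED_WEIGHTS = [[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]]

def embed(x):
    """Map an input vector to a feature vector using the shared weights."""
    return [math.tanh(sum(w * v for w, v in zip(row, x))) for row in SHARED_WEIGHTS]

def distance(x1, x2):
    """Euclidean distance between the two embeddings (smaller means more similar)."""
    e1, e2 = embed(x1), embed(x2)
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(e1, e2)))

def classify(query, references):
    """Return the label of the reference example closest to the query."""
    return min(references, key=lambda label: distance(query, references[label]))

references = {"type_a": [1.0, 0.0, 0.0], "type_b": [0.0, 1.0, 0.0]}
label = classify([0.9, 0.1, 0.0], references)
```

Because classification reduces to a nearest-reference lookup in embedding space, new classes can in principle be added by supplying new reference examples, which is one commonly cited reason for choosing this family of architectures.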


Subjects
Balaenoptera , Acoustics , Animals , Indian Ocean , Neural Networks, Computer , Vocalization, Animal
10.
J Pediatr ; 220: 49-55.e2, 2020 05.
Article in English | MEDLINE | ID: mdl-32061407

ABSTRACT

OBJECTIVES: To assess the geographic variation of sudden unexpected infant death (SUID) and test if variation in geographic factors, such as state, latitude, and longitude, plays a role in SUID risk across the US. STUDY DESIGN: We analyzed the Centers for Disease Control and Prevention's Cohort Linked Birth/Infant Death dataset (2005-2010; 22 882 SUID cases, 25 305 837 live births, rate 0.90/1000). SUID was defined as infant deaths (ages 7-364 days) that included sudden infant death syndrome, ill-defined and unknown cause of mortality, and accidental suffocation and strangulation in bed. SUID geographic variation was analyzed using 2 statistical models, logistic regression and generalized additive model (GAM). RESULTS: Both models produced similar results. Without adjustment, there was marked geographic variation in SUID rates, but the variation decreased after adjusting for covariates including known risk factors for SUID. After adjustment, nine states demonstrated significantly higher or lower SUID mortality than the national average. Geographic contribution to SUID risk in terms of latitude and longitude was also attenuated after adjustment for covariates. CONCLUSION: Understanding why some states have lower SUID rates may enhance SUID prevention strategies.


Subjects
Sudden Infant Death/epidemiology , Centers for Disease Control and Prevention, U.S. , Datasets as Topic , Geography, Medical , Humans , Infant , Infant, Newborn , Models, Statistical , United States/epidemiology
11.
J Acoust Soc Am ; 147(3): 1834, 2020 03.
Article in English | MEDLINE | ID: mdl-32237822

ABSTRACT

Over a decade after the Cook Inlet beluga (Delphinapterus leucas) was listed as endangered in 2008, the population has shown no sign of recovery. Lack of ecological knowledge limits the understanding of, and ability to manage, potential threats impeding recovery of this declining population. National Oceanic and Atmospheric Administration Fisheries, in partnership with the Alaska Department of Fish and Game, initiated a passive acoustics monitoring program in 2017 to investigate beluga seasonal occurrence by deploying a series of passive acoustic moorings. Data have been processed with semi-automated tonal detectors followed by time-intensive manual validation. To reduce this labor-intensive and time-consuming process, in addition to increasing the accuracy of classification results, the authors constructed an ensembled deep learning convolutional neural network model to classify beluga detections as true or false. Using a 0.5 threshold, the final model achieves 96.57% precision and 92.26% recall on the testing dataset. This methodology proves to be successful at classifying beluga signals, and the framework can be easily generalized to other acoustic classification problems.
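
For readers unfamiliar with how precision and recall follow from a score threshold, the short sketch below shows the calculation. The scores and labels are made-up toy values, not data from the study, and treating a score equal to the threshold as a positive is an assumption.

```python
def precision_recall(scores, labels, threshold=0.5):
    """Return (precision, recall) after thresholding model scores.

    A detection counts as predicted-true when score >= threshold
    (the handling of ties is an assumption made for this sketch).
    """
    predicted = [s >= threshold for s in scores]
    tp = sum(1 for p, y in zip(predicted, labels) if p and y)       # true positives
    fp = sum(1 for p, y in zip(predicted, labels) if p and not y)   # false positives
    fn = sum(1 for p, y in zip(predicted, labels) if not p and y)   # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Toy example: 1 marks a validated detection, 0 a false detection.
scores = [0.9, 0.8, 0.6, 0.4, 0.3, 0.1]
labels = [1, 1, 0, 1, 0, 0]
precision, recall = precision_recall(scores, labels)
```

Raising the threshold generally trades recall for precision, so a reported pair such as the one in the abstract is specific to the 0.5 cut-off.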


Subjects
Beluga Whale , Deep Learning , Acoustics , Alaska , Animals , Oceans and Seas
13.
Sci Rep ; 14(1): 6002, 2024 03 12.
Article in English | MEDLINE | ID: mdl-38472269

ABSTRACT

In the United States the rate of stillbirth after 28 weeks' gestation (late stillbirth) is 2.7/1000 births. Fetuses that are small for gestational age (SGA) or large for gestational age (LGA) are at increased risk of stillbirth. SGA and LGA are often categorized as growth or birthweight ≤ 10th and ≥ 90th centile, respectively; however, these cut-offs are arbitrary. We sought to characterize the relationship between birthweight and stillbirth risk in greater detail. Data on singleton births between 28- and 44-weeks' gestation from 2014 to 2015 were extracted from the US Centers for Disease Control and Prevention live birth and fetal death files. Growth was assessed using customized birthweight centiles (Gestation Related Optimal Weight; GROW). The analyses included logistic regression using SGA/LGA categories and a generalized additive model (GAM) using birthweight centile as a continuous exposure. Although the SGA and LGA categories identified infants at risk of stillbirth, categorical models provided poor fits to the data within the high-risk bins, and in particular markedly underestimated the risk for the extreme centiles. For example, for fetuses in the lowest GROW centile, the observed rate was 39.8/1000 births compared with a predicted rate of 11.7/1000 from the category-based analysis. In contrast, the model-predicted risk from the GAM tracked closely with the observed risk, with the GAM providing an accurate characterization of stillbirth risk across the entire birthweight continuum. This study provides stillbirth risk estimates for each GROW centile, which clinicians can use in conjunction with other clinical details to guide obstetric management.


Subjects
Fetal Development , Stillbirth , Pregnancy , Infant, Newborn , Infant , Female , Humans , United States , Birth Weight , Infant, Small for Gestational Age , Gestational Age , Fetal Growth Retardation
14.
Med Phys ; 51(2): 1203-1216, 2024 Feb.
Article in English | MEDLINE | ID: mdl-37544015

ABSTRACT

BACKGROUND: Prostate-specific membrane antigen (PSMA) PET imaging represents a valuable source of information reflecting disease stage, response rate, and treatment optimization options, particularly with PSMA radioligand therapy. Quantification of radiopharmaceutical uptake in healthy organs from PSMA images has the potential to minimize toxicity by extrapolation of the radiation dose delivery towards personalization of therapy. However, segmentation and quantification of uptake in organs requires labor-intensive organ delineations that are often not feasible in the clinic nor scalable for large clinical trials. PURPOSE: In this work we develop and test the PSMA Healthy organ segmentation network (PSMA-Hornet), a fully-automated deep neural net for simultaneous segmentation of 14 healthy organs representing the normal biodistribution of [18F]DCFPyL on PET/CT images. We also propose a modified U-net architecture, a self-supervised pre-training method for PET/CT images, a multi-target Dice loss, and multi-target batch balancing to effectively train PSMA-Hornet and similar networks. METHODS: The study used manually-segmented [18F]DCFPyL PET/CT images from 100 subjects, and 526 similar images without segmentations. The unsegmented images were used for self-supervised model pretraining. For supervised training, Monte-Carlo cross-validation was used to evaluate the network performance, with 85 subjects in each trial reserved for model training, 5 for validation, and 10 for testing. Image segmentation and quantification metrics were evaluated on the test folds with respect to manual segmentations by a nuclear medicine physician, and compared to inter-rater agreement. The model's segmentation performance was also evaluated on a separate set of 19 images with high tumor load. RESULTS: With our best model, the lowest mean Dice coefficient on the test set was 0.826 for the sublingual gland, and the highest was 0.964 for liver. The highest mean error in tracer uptake quantification was 13.9% in the sublingual gland. Self-supervised pretraining improved training convergence, train-to-test generalization, and segmentation quality. In addition, we found that a multi-target network produced significantly higher segmentation accuracy than single-organ networks. CONCLUSIONS: The developed network can be used to automatically obtain high-quality organ segmentations for PSMA image analysis tasks. It can be used to reproducibly extract imaging data, and holds promise for clinical applications such as personalized radiation dosimetry and improved radioligand therapy.
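
The Dice coefficient reported above measures overlap between a predicted and a reference segmentation. The sketch below computes it per organ label on flattened label maps and averages it into a simple loss. This is only an assumed, unweighted hard-label version for illustration; the abstract does not specify how the authors' multi-target Dice loss is defined, and training losses typically operate on soft probabilities instead.

```python
def dice(pred, truth, label):
    """Dice coefficient 2|P ∩ T| / (|P| + |T|) for one label on flat label lists."""
    p = [v == label for v in pred]
    t = [v == label for v in truth]
    intersection = sum(1 for a, b in zip(p, t) if a and b)
    total = sum(p) + sum(t)
    return 2 * intersection / total if total else 1.0  # both empty: perfect agreement

def mean_dice_loss(pred, truth, labels):
    """One minus the unweighted mean Dice over the given labels (an assumed form)."""
    return 1.0 - sum(dice(pred, truth, label) for label in labels) / len(labels)

# Toy voxel labels: 0 = background, 1 and 2 = two hypothetical organs.
truth = [0, 1, 1, 2, 2, 2]
pred = [0, 1, 2, 2, 2, 2]
dice_organ_1 = dice(pred, truth, 1)
dice_organ_2 = dice(pred, truth, 2)
loss = mean_dice_loss(pred, truth, [1, 2])
```

Averaging per-organ scores, rather than pooling all voxels, keeps small structures such as the sublingual gland from being swamped by large ones such as the liver.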


Subjects
Antigens, Surface , Glutamate Carboxypeptidase II , Prostatic Neoplasms , Animals , Humans , Male , Image Processing, Computer-Assisted/methods , Positron Emission Tomography Computed Tomography/methods , Prostatic Neoplasms/diagnostic imaging , Prostatic Neoplasms/radiotherapy , Tissue Distribution
15.
JAMA Pediatr ; 178(9): 906-913, 2024 Sep 01.
Article in English | MEDLINE | ID: mdl-39073792

ABSTRACT

Importance: Rates of maternal obesity are increasing in the US. Although obesity is a well-documented risk factor for numerous poor pregnancy outcomes, it is not currently a recognized risk factor for sudden unexpected infant death (SUID). Objective: To determine whether maternal obesity is a risk factor for SUID and the proportion of SUID cases attributable to maternal obesity. Design, Setting, and Participants: This was a US nationwide cohort study using Centers for Disease Control and Prevention National Center for Health Statistics linked birth-infant death records for birth cohorts in 2015 through 2019. All US live births for the study years occurring at 28 weeks' gestation or later from complete reporting areas were eligible; SUID cases were deaths occurring at 7 to 364 days after birth with International Statistical Classification of Diseases, Tenth Revision cause of death code R95 (sudden infant death syndrome), R99 (ill-defined and unknown causes), or W75 (accidental suffocation and strangulation in bed). Data were analyzed from October 1 through November 15, 2023. Exposure: Maternal prepregnancy body mass index (BMI; calculated as weight in kilograms divided by height in meters squared). Main Outcome and Measure: SUID. Results: Of 18 857 694 live births eligible for analysis (median [IQR] age: maternal, 29 [9] years; paternal, 31 [9] years; gestational, 39 [2] weeks), 16 545 died of SUID (SUID rate, 0.88/1000 live births). After confounder adjustment, compared with mothers with normal BMI (BMI 18.5-24.9), infants born to mothers with obesity had a higher SUID risk that increased with increasing obesity severity. Infants of mothers with class I obesity (BMI 30.0-34.9) were at increased SUID risk (adjusted odds ratio [aOR], 1.10; 95% CI, 1.05-1.16); with class II obesity (BMI 35.0-39.9), a higher risk (aOR, 1.20; 95% CI, 1.13-1.27); and class III obesity (BMI ≥40.0), an even higher risk (aOR, 1.39; 95% CI, 1.31-1.47). A generalized additive model showed that increased BMI was monotonically associated with increased SUID risk, with an acceleration of risk for BMIs greater than approximately 25 to 30. Approximately 5.4% of SUID cases were attributable to maternal obesity. Conclusions and Relevance: The findings suggest that infants born to mothers with obesity are at increased risk of SUID, with a dose-dependent association between increasing maternal BMI and SUID risk. Maternal obesity should be added to the list of known risk factors for SUID. With maternal obesity rates increasing, research should identify potential causal mechanisms for this association.
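
The exposure definition and the crude rate in this abstract are simple enough to reproduce in a few lines. In the sketch below, the normal and obesity class I to III ranges are taken from the abstract; the "underweight" and "overweight" labels and the half-open interval boundaries are assumptions added so that every value maps to a category.

```python
def bmi(weight_kg, height_m):
    """Body mass index: weight in kilograms divided by height in meters squared."""
    return weight_kg / height_m ** 2

def bmi_category(value):
    """Map a BMI value to a category (interval edges are an assumption)."""
    if value < 18.5:
        return "underweight"        # label not given in the abstract
    if value < 25.0:
        return "normal"             # abstract: BMI 18.5-24.9
    if value < 30.0:
        return "overweight"         # label not given in the abstract
    if value < 35.0:
        return "class I obesity"    # abstract: BMI 30.0-34.9
    if value < 40.0:
        return "class II obesity"   # abstract: BMI 35.0-39.9
    return "class III obesity"      # abstract: BMI >= 40.0

# Crude SUID rate per 1000 live births from the counts quoted in the abstract.
suid_rate_per_1000 = 16545 / 18857694 * 1000

example_category = bmi_category(bmi(70.0, 1.75))  # hypothetical 70 kg, 1.75 m
```

The crude rate rounds to the 0.88/1000 quoted in the results; the adjusted odds ratios, by contrast, come from a regression model and cannot be recovered from these counts alone.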


Subjects
Obesity, Maternal , Sudden Infant Death , Humans , Female , Pregnancy , Sudden Infant Death/epidemiology , Sudden Infant Death/etiology , Risk Factors , Adult , Infant, Newborn , United States/epidemiology , Obesity, Maternal/epidemiology , Obesity, Maternal/complications , Infant , Body Mass Index , Pregnancy Complications/epidemiology , Cohort Studies
16.
JMIR Mhealth Uhealth ; 12: e57318, 2024 Jul 23.
Article in English | MEDLINE | ID: mdl-38913882

ABSTRACT

BACKGROUND: Conversational chatbots are an emerging digital intervention for smoking cessation. No studies have reported on the entire development process of a cessation chatbot. OBJECTIVE: We aim to report results of the user-centered design development process and randomized controlled trial for a novel and comprehensive quit smoking conversational chatbot called QuitBot. METHODS: The 4 years of formative research for developing QuitBot followed an 11-step process: (1) specifying a conceptual model; (2) conducting content analysis of existing interventions (63 hours of intervention transcripts); (3) assessing user needs; (4) developing the chat's persona ("personality"); (5) prototyping content and persona; (6) developing full functionality; (7) programming the QuitBot; (8) conducting a diary study; (9) conducting a pilot randomized controlled trial (RCT); (10) reviewing results of the RCT; and (11) adding a free-form question and answer (QnA) function, based on user feedback from pilot RCT results. The process of adding a QnA function itself involved a three-step process: (1) generating QnA pairs, (2) fine-tuning large language models (LLMs) on QnA pairs, and (3) evaluating the LLM outputs. RESULTS: We developed a quit smoking program spanning 42 days of 2- to 3-minute conversations covering topics ranging from motivations to quit, setting a quit date, choosing Food and Drug Administration-approved cessation medications, coping with triggers, and recovering from lapses and relapses. In a pilot RCT with 96% three-month outcome data retention, QuitBot demonstrated high user engagement and promising cessation rates compared to the National Cancer Institute's SmokefreeTXT text messaging program, particularly among those who viewed all 42 days of program content: 30-day, complete-case, point prevalence abstinence rates at 3-month follow-up were 63% (39/62) for QuitBot versus 38.5% (45/117) for SmokefreeTXT (odds ratio 2.58, 95% CI 1.34-4.99; P=.005). However, Facebook Messenger intermittently blocked participants' access to QuitBot, so we transitioned from Facebook Messenger to a stand-alone smartphone app as the communication channel. Participants' frustration with QuitBot's inability to answer their open-ended questions led us to develop a core conversational feature, enabling users to ask open-ended questions about quitting cigarette smoking and for the QuitBot to respond with accurate and professional answers. To support this functionality, we developed a library of 11,000 QnA pairs on topics associated with quitting cigarette smoking. Model testing results showed that Microsoft's Azure-based QnA maker effectively handled questions that matched our library of 11,000 QnA pairs. A fine-tuned, contextualized GPT-3.5 (OpenAI) responds to questions that are not within our library of QnA pairs. CONCLUSIONS: The development process yielded the first LLM-based quit smoking program delivered as a conversational chatbot. Iterative testing led to significant enhancements, including improvements to the delivery channel. A pivotal addition was the inclusion of a core LLM-supported conversational feature allowing users to ask open-ended questions. TRIAL REGISTRATION: ClinicalTrials.gov NCT03585231; https://clinicaltrials.gov/study/NCT03585231.


Subjects
Smoking Cessation , User-Centered Design , Humans , Smoking Cessation/methods , Smoking Cessation/psychology , Male , Adult , Female , Middle Aged
17.
Int J Public Health ; 69: 1607295, 2024.
Article in English | MEDLINE | ID: mdl-39132383

ABSTRACT

Objectives: To determine whether life expectancy (LE) changes between 2000 and 2019 were associated with race, rural status, local economic prosperity, and changes in local economic prosperity, at the county level. Methods: Between 12/1/22 and 2/28/23, we conducted a retrospective analysis of 2000 and 2019 data from 3,123 United States counties. For Total, White, and Black populations, we compared LE changes for counties across the rural-urban continuum, the local economic prosperity continuum, and for counties in which local economic prosperity dramatically improved or declined. Results: In both years, overall, across the rural-urban continuum, and for all studied populations, LE decreased with each progression from the most to least prosperous quintile (all p < 0.001); improving county prosperity between 2000-2019 was associated with greater LE gains (p < 0.001 for all). Conclusion: At the county level, race, rurality, and local economic distress were all associated with LE; improvements in local economic conditions were associated with accelerated LE. Policymakers should appreciate the health externalities of investing in areas experiencing poor economic prosperity if their goal is to improve population health.


Subjects
Life Expectancy, Rural Population, Humans, Life Expectancy/trends, Retrospective Studies, United States, Male, Female, Urban Population, Socioeconomic Factors, Financial Stress
18.
PLoS One ; 19(2): e0297271, 2024.
Article in English | MEDLINE | ID: mdl-38315667

ABSTRACT

Differentially private (DP) synthetic datasets are a solution for sharing data while preserving the privacy of individual data providers. Understanding the effects of utilizing DP synthetic data in end-to-end machine learning pipelines impacts areas such as health care and humanitarian action, where data is scarce and regulated by restrictive privacy laws. In this work, we investigate the extent to which synthetic data can replace real, tabular data in machine learning pipelines and identify the most effective synthetic data generation techniques for training and evaluating machine learning models. We systematically investigate the impacts of differentially private synthetic data on downstream classification tasks from the point of view of utility as well as fairness. Our analysis is comprehensive and includes representatives of the two main types of synthetic data generation algorithms: marginal-based and GAN-based. To the best of our knowledge, our work is the first that: (i) proposes a training and evaluation framework that does not assume that real data is available for testing the utility and fairness of machine learning models trained on synthetic data; (ii) presents the most extensive analysis of synthetic dataset generation algorithms in terms of utility and fairness when used for training machine learning models; and (iii) encompasses several different definitions of fairness. Our findings demonstrate that marginal-based synthetic data generators surpass GAN-based ones regarding model training utility for tabular data. Indeed, we show that models trained using data generated by marginal-based algorithms can exhibit similar utility to models trained using real data. Our analysis also reveals that the marginal-based synthetic data generated using AIM and MWEM PGM algorithms can train models that simultaneously achieve utility and fairness characteristics close to those obtained by models trained with real data.
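To make the idea of a marginal-based generator concrete, here is a minimal sketch that perturbs a one-way marginal (a histogram over one categorical column) with Laplace noise and then samples synthetic values from it. It is not AIM or MWEM PGM, which select and combine many marginals; it only illustrates the basic building block. The function names, the toy column, and the choice of add/remove-one-record neighbouring datasets (L1 sensitivity 1 for a histogram) are assumptions made for illustration.

```python
import random
from collections import Counter


def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two exponentials."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def dp_marginal(values, domain, epsilon, rng):
    """Return a noisy, normalised one-way marginal over `domain`.

    Assumes add/remove-one-record neighbours, so the histogram has
    L1 sensitivity 1 and the Laplace scale is 1 / epsilon.
    """
    counts = Counter(values)
    noisy = {
        v: max(counts.get(v, 0) + laplace_noise(1 / epsilon, rng), 0.0)
        for v in domain
    }
    total = sum(noisy.values())
    if total == 0:  # degenerate case: fall back to a uniform marginal
        return {v: 1 / len(domain) for v in domain}
    return {v: c / total for v, c in noisy.items()}


def sample_synthetic(marginal, n, rng):
    """Sample n synthetic values from the noisy marginal."""
    categories = list(marginal)
    weights = [marginal[c] for c in categories]
    return rng.choices(categories, weights=weights, k=n)


rng = random.Random(0)
domain = ["A", "B", "C"]                       # hypothetical category domain
real = ["A"] * 200 + ["B"] * 80 + ["C"] * 20   # toy "real" column
marginal = dp_marginal(real, domain, epsilon=1.0, rng=rng)
synthetic = sample_synthetic(marginal, n=1000, rng=rng)
print(marginal)
```

Clipping negative counts and renormalising are post-processing steps, so they do not weaken the privacy guarantee of the noisy histogram.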


Subjects
Algorithms, Health Facilities, Interior Design and Furnishings, Knowledge, Machine Learning
19.
Res Sq ; 2024 Apr 26.
Article in English | MEDLINE | ID: mdl-38746169

ABSTRACT

The majority of proteins must form higher-order assemblies to perform their biological functions. Despite the importance of protein quaternary structure, there are few machine learning models that can accurately and rapidly predict the symmetry of assemblies involving multiple copies of the same protein chain. Here, we address this gap by training several classes of protein foundation models, including ESM-MSA, ESM2, and RoseTTAFold2, to predict homo-oligomer symmetry. Our best model, named Seq2Symm, which uses ESM2, outperforms existing template-based and deep learning methods. It achieves an average PR-AUC of 0.48 and 0.44 across homo-oligomer symmetries on two different held-out test sets, compared to 0.32 and 0.23 for the template-based method. Because Seq2Symm can rapidly predict homo-oligomer symmetries using a single sequence as input (~80,000 proteins/hour), we have applied it to 5 entire proteomes and ~3.5 million unlabeled protein sequences to identify patterns in protein assembly complexity across biological kingdoms and species.
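The "average PR-AUC across homo-oligomer symmetries" reported above can be read as a per-class score averaged over classes. The sketch below uses average precision, one common estimator of the area under the precision-recall curve, and a plain macro average. Whether the authors used this exact estimator or averaging scheme is not stated in the abstract, so treat both as assumptions; the class names and scores are toy values, and tied scores are not handled specially.

```python
def average_precision(labels, scores):
    """Average precision for one class: mean precision at each true positive."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("average precision is undefined without positives")
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            total += hits / rank  # precision at this recall step
    return total / n_pos


def macro_pr_auc(per_class):
    """Unweighted mean of per-class average precision."""
    values = [average_precision(lab, sc) for lab, sc in per_class.values()]
    return sum(values) / len(values)


# Toy one-vs-rest labels and scores for two hypothetical symmetry classes.
per_class = {
    "C2": ([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1]),
    "C3": ([0, 1, 0, 0], [0.6, 0.5, 0.4, 0.3]),
}
ap_c2 = average_precision(*per_class["C2"])
ap_c3 = average_precision(*per_class["C3"])
macro = macro_pr_auc(per_class)
print(ap_c2, ap_c3, macro)
```

A macro average weights rare and common symmetry classes equally, which is why it is often preferred when class frequencies are highly imbalanced.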

20.
JAMA Ophthalmol ; 142(3): 226-233, 2024 Mar 01.
Article in English | MEDLINE | ID: mdl-38329740

ABSTRACT

Importance: Deep learning image analysis often depends on large, labeled datasets, which are difficult to obtain for rare diseases. Objective: To develop a self-supervised approach for automated classification of macular telangiectasia type 2 (MacTel) on optical coherence tomography (OCT) with limited labeled data. Design, Setting, and Participants: This was a retrospective comparative study. OCT images from May 2014 to May 2019 were collected by the Lowy Medical Research Institute, La Jolla, California, and the University of Washington, Seattle, from January 2016 to October 2022. Clinical diagnoses of patients with and without MacTel were confirmed by retina specialists. Data were analyzed from January to September 2023. Exposures: Two convolutional neural networks were pretrained using the Bootstrap Your Own Latent algorithm on unlabeled training data and fine-tuned with labeled training data to predict MacTel (self-supervised method). ResNet18 and ResNet50 models were also trained using all labeled data (supervised method). Main Outcomes and Measures: The ground truth yes vs no MacTel diagnosis is determined by retinal specialists based on spectral-domain OCT. The models' predictions were compared against human graders using accuracy, sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), area under precision recall curve (AUPRC), and area under the receiver operating characteristic curve (AUROC). Uniform manifold approximation and projection was performed for dimension reduction and GradCAM visualizations for supervised and self-supervised methods. Results: A total of 2636 OCT scans from 780 patients with MacTel and 131 patients without MacTel were included from the MacTel Project (mean [SD] age, 60.8 [11.7] years; 63.8% female), and another 2564 from 1769 patients without MacTel from the University of Washington (mean [SD] age, 61.2 [18.1] years; 53.4% female). 
The self-supervised approach fine-tuned on 100% of the labeled training data with ResNet50 as the feature extractor performed the best, achieving an AUPRC of 0.971 (95% CI, 0.969-0.972), an AUROC of 0.970 (95% CI, 0.970-0.973), accuracy of 0.898, sensitivity of 0.898, specificity of 0.949, PPV of 0.935, and NPV of 0.919. With only 419 OCT volumes (185 MacTel patients in 10% of labeled training dataset), the ResNet18 self-supervised model achieved comparable performance, with an AUPRC of 0.958 (95% CI, 0.957-0.960), an AUROC of 0.966 (95% CI, 0.964-0.967), and accuracy, sensitivity, specificity, PPV, and NPV of 90.2%, 0.884, 0.916, 0.896, and 0.906, respectively. The self-supervised models showed better agreement with the more experienced human expert graders. Conclusions and Relevance: The findings suggest that self-supervised learning may improve the accuracy of automated MacTel vs non-MacTel binary classification on OCT with limited labeled training data, and these approaches may be applicable to other rare diseases, although further research is warranted.
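For readers less familiar with the threshold-based metrics listed above, the following sketch shows how accuracy, sensitivity, specificity, PPV, and NPV follow from a binary confusion matrix. The counts are hypothetical toy values, not the study's data, and the function name `binary_metrics` is an illustrative choice.

```python
def binary_metrics(tp, fp, tn, fn):
    """Compute standard binary classification metrics from confusion counts."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),  # true positive rate (recall)
        "specificity": tn / (tn + fp),  # true negative rate
        "ppv": tp / (tp + fp),          # positive predictive value (precision)
        "npv": tn / (tn + fn),          # negative predictive value
    }


# Hypothetical counts for illustration only.
metrics = binary_metrics(tp=90, fp=5, tn=95, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Unlike sensitivity and specificity, PPV and NPV depend on the proportion of positive cases in the evaluated sample, so they do not transfer directly between datasets with different disease prevalence.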


Subjects
Deep Learning, Retinal Telangiectasis, Humans, Female, Middle Aged, Male, Tomography, Optical Coherence/methods, Retrospective Studies, Rare Diseases, Retinal Telangiectasis/diagnostic imaging, Supervised Machine Learning