Results 1 - 15 of 15
1.
Emerg Infect Dis ; 26(9): 2196-2200, 2020 09.
Article in English | MEDLINE | ID: mdl-32818406

ABSTRACT

We evaluated the performance of X-bar chart, exponentially weighted moving average, and C3 cumulative sums aberration detection algorithms for acute diarrheal disease syndromic surveillance at naval sites in Peru during 2007-2011. The 3 algorithms' detection sensitivity was 100%, specificity was 97%-99%, and positive predictive value was 27%-46%.
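The exponentially weighted moving average (EWMA) chart evaluated in this study can be sketched in a few lines. The snippet below is a minimal illustration of the general technique, not the study's implementation; the smoothing weight `lam`, threshold multiplier `L`, and seven-day baseline window are assumed values chosen for the example.

```python
import statistics

def ewma_alarms(counts, lam=0.4, L=3.0, baseline_days=7):
    """Flag aberrations in a daily case-count series with an EWMA control chart.

    The EWMA statistic z_t = lam*x_t + (1-lam)*z_{t-1} is compared against
    mu + L*sigma*sqrt(lam/(2-lam)), where mu and sigma are estimated from an
    initial baseline window.
    """
    baseline = counts[:baseline_days]
    mu = statistics.mean(baseline)
    sigma = statistics.pstdev(baseline) or 1.0  # avoid a zero-width limit
    limit = mu + L * sigma * (lam / (2 - lam)) ** 0.5
    z, alarms = mu, []
    for x in counts:
        z = lam * x + (1 - lam) * z
        alarms.append(z > limit)
    return alarms

# Quiet baseline week followed by a surge in daily case counts
daily_cases = [2, 3, 2, 3, 2, 3, 2, 2, 3, 20, 25, 30]
print(ewma_alarms(daily_cases))
```

Sensitivity, specificity, and positive predictive value of such a detector can then be estimated by comparing the flagged days against expert-labelled outbreak periods, as the study did.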


Subjects
Population Surveillance, Sentinel Surveillance, Algorithms, Disease Outbreaks, Electronics, Peru/epidemiology, Sensitivity and Specificity
2.
Article in English | MEDLINE | ID: mdl-38511501

ABSTRACT

OBJECTIVES: Large language models (LLMs) are poised to change care delivery, but their impact on health equity is unclear. While marginalized populations have been historically excluded from early technology developments, LLMs present an opportunity to change our approach to developing, evaluating, and implementing new technologies. In this perspective, we describe the role of LLMs in supporting health equity. MATERIALS AND METHODS: We apply the National Institute on Minority Health and Health Disparities (NIMHD) research framework to explore the use of LLMs for health equity. RESULTS: We present opportunities for how LLMs can improve health equity across individual, family and organizational, community, and population health. We describe emerging concerns including biased data, limited technology diffusion, and privacy. Finally, we highlight recommendations focused on prompt engineering, retrieval augmentation, digital inclusion, transparency, and bias mitigation. CONCLUSION: The potential of LLMs to support health equity depends on making health equity a focus from the start.

3.
Lancet Digit Health ; 6(1): e12-e22, 2024 Jan.
Article in English | MEDLINE | ID: mdl-38123252

ABSTRACT

BACKGROUND: Large language models (LLMs) such as GPT-4 hold great promise as transformative tools in health care, ranging from automating administrative tasks to augmenting clinical decision making. However, these models also pose a danger of perpetuating biases and delivering incorrect medical diagnoses, which can have a direct, harmful impact on medical care. We aimed to assess whether GPT-4 encodes racial and gender biases that impact its use in health care. METHODS: Using the Azure OpenAI application interface, this model evaluation study tested whether GPT-4 encodes racial and gender biases and examined the impact of such biases on four potential applications of LLMs in the clinical domain-namely, medical education, diagnostic reasoning, clinical plan generation, and subjective patient assessment. We conducted experiments with prompts designed to resemble typical use of GPT-4 within clinical and medical education applications. We used clinical vignettes from NEJM Healer and from published research on implicit bias in health care. GPT-4 estimates of the demographic distribution of medical conditions were compared with true US prevalence estimates. Differential diagnosis and treatment planning were evaluated across demographic groups using standard statistical tests for significance between groups. FINDINGS: We found that GPT-4 did not appropriately model the demographic diversity of medical conditions, consistently producing clinical vignettes that stereotype demographic presentations. The differential diagnoses created by GPT-4 for standardised clinical vignettes were more likely to include diagnoses that stereotype certain races, ethnicities, and genders. Assessment and plans created by the model showed significant association between demographic attributes and recommendations for more expensive procedures as well as differences in patient perception. 
INTERPRETATION: Our findings highlight the urgent need for comprehensive and transparent bias assessments of LLM tools such as GPT-4 for intended use cases before they are integrated into clinical care. We discuss the potential sources of these biases and potential mitigation strategies before clinical implementation. FUNDING: Priscilla Chan and Mark Zuckerberg.


Subjects
Medical Education, Health Facilities, Female, Humans, Male, Clinical Decision-Making, Differential Diagnosis, Delivery of Health Care
4.
medRxiv ; 2023 Jul 12.
Article in English | MEDLINE | ID: mdl-37502975

ABSTRACT

Objectives: Our primary objective was to develop a natural language processing approach that accurately predicts outpatient Evaluation and Management (E/M) level of service (LoS) codes using clinicians' notes from a health system electronic health record. A secondary objective was to investigate the impact of clinic note de-identification on document classification performance. Methods: We used retrospective outpatient office clinic notes from four medical and surgical specialties. Classification models were fine-tuned on the clinic notes datasets and stratified by subspecialty. The success criteria for the classification tasks were the classification accuracy and F1-scores on internal test data. For the secondary objective, the dataset was de-identified using Named Entity Recognition (NER) to remove protected health information (PHI), and models were retrained. Results: The models demonstrated similar predictive performance across different specialties, except for internal medicine, which had the lowest classification accuracy across all model architectures. The models trained on the entire note corpus achieved an E/M LoS CPT code classification accuracy of 74.8% (CI 95: 74.1-75.6). However, the de-identified note corpus showed a markedly lower classification accuracy of 48.2% (CI 95: 47.7-48.6) compared to the model trained on the identified notes. Conclusion: The study demonstrates the potential of NLP-based document classifiers to accurately predict E/M LoS CPT codes using clinical notes from various medical and procedural specialties. The models' performance suggests that the classification task's complexity merits further investigation. The de-identification experiment demonstrated that de-identification may negatively impact classifier performance. 
Further research is needed to validate the performance of our NLP classifiers in different healthcare settings and patient populations and to investigate the potential implications of de-identification on model performance.

5.
Nat Commun ; 14(1): 6403, 2023 10 12.
Article in English | MEDLINE | ID: mdl-37828001

ABSTRACT

Rare Mendelian disorders pose a major diagnostic challenge and collectively affect 300-400 million patients worldwide. Many automated tools aim to uncover causal genes in patients with suspected genetic disorders, but evaluation of these tools is limited due to the lack of comprehensive benchmark datasets that include previously unpublished conditions. Here, we present a computational pipeline that simulates realistic clinical datasets to address this deficit. Our framework jointly simulates complex phenotypes and challenging candidate genes and produces patients with novel genetic conditions. We demonstrate the similarity of our simulated patients to real patients from the Undiagnosed Diseases Network and evaluate common gene prioritization methods on the simulated cohort. These prioritization methods recover known gene-disease associations but perform poorly on diagnosing patients with novel genetic disorders. Our publicly-available dataset and codebase can be utilized by medical genetics researchers to evaluate, compare, and improve tools that aid in the diagnostic process.


Subjects
Patients, Rare Diseases, Humans, Computer Simulation, Phenotype, Rare Diseases/diagnosis, Rare Diseases/genetics
6.
medRxiv ; 2023 Jun 01.
Article in English | MEDLINE | ID: mdl-37398230

ABSTRACT

Many areas of medicine would benefit from deeper, more accurate phenotyping, but there are limited approaches for phenotyping using clinical notes without substantial annotated data. Large language models (LLMs) have demonstrated immense potential to adapt to novel tasks with no additional training by specifying task-specific instructions. We investigated the performance of a publicly available LLM, Flan-T5, in phenotyping patients with postpartum hemorrhage (PPH) using discharge notes from electronic health records (n = 271,081). The language model achieved strong performance in extracting 24 granular concepts associated with PPH. Identifying these granular concepts accurately allowed the development of interpretable, complex phenotypes and subtypes. The Flan-T5 model achieved high fidelity in phenotyping PPH (positive predictive value of 0.95), identifying 47% more patients with this complication compared to the current standard of using claims codes. This LLM pipeline can be used reliably for subtyping PPH and outperformed a claims-based approach on the three most common PPH subtypes associated with uterine atony, abnormal placentation, and obstetric trauma. The advantage of this approach to subtyping is its interpretability, as each concept contributing to the subtype determination can be evaluated. Moreover, as definitions may change over time due to new guidelines, using granular concepts to create complex phenotypes enables prompt and efficient updating of the algorithm. Using this language modelling approach enables rapid phenotyping without the need for any manually annotated training data across multiple clinical use cases.

7.
NPJ Digit Med ; 6(1): 212, 2023 Nov 30.
Article in English | MEDLINE | ID: mdl-38036723

ABSTRACT

Many areas of medicine would benefit from deeper, more accurate phenotyping, but there are limited approaches for phenotyping using clinical notes without substantial annotated data. Large language models (LLMs) have demonstrated immense potential to adapt to novel tasks with no additional training by specifying task-specific instructions. Here we report the performance of a publicly available LLM, Flan-T5, in phenotyping patients with postpartum hemorrhage (PPH) using discharge notes from electronic health records (n = 271,081). The language model achieves strong performance in extracting 24 granular concepts associated with PPH. Identifying these granular concepts accurately allows the development of interpretable, complex phenotypes and subtypes. The Flan-T5 model achieves high fidelity in phenotyping PPH (positive predictive value of 0.95), identifying 47% more patients with this complication compared to the current standard of using claims codes. This LLM pipeline can be used reliably for subtyping PPH and outperforms a claims-based approach on the three most common PPH subtypes associated with uterine atony, abnormal placentation, and obstetric trauma. The advantage of this approach to subtyping is its interpretability, as each concept contributing to the subtype determination can be evaluated. Moreover, as definitions may change over time due to new guidelines, using granular concepts to create complex phenotypes enables prompt and efficient updating of the algorithm. Using this language modelling approach enables rapid phenotyping without the need for any manually annotated training data across multiple clinical use cases.

8.
Lancet Digit Health ; 5(12): e882-e894, 2023 12.
Article in English | MEDLINE | ID: mdl-38000873

ABSTRACT

BACKGROUND: The evaluation and management of first-time seizure-like events in children can be difficult because these episodes are not always directly observed and might be epileptic seizures or other conditions (seizure mimics). We aimed to evaluate whether machine learning models using real-world data could predict seizure recurrence after an initial seizure-like event. METHODS: This retrospective cohort study compared models trained and evaluated on two separate datasets between Jan 1, 2010, and Jan 1, 2020: electronic medical records (EMRs) at Boston Children's Hospital and de-identified, patient-level, administrative claims data from the IBM MarketScan research database. The study population comprised patients with an initial diagnosis of either epilepsy or convulsions before the age of 21 years, based on International Classification of Diseases, Clinical Modification (ICD-CM) codes. We compared machine learning-based predictive modelling using structured data (logistic regression and XGBoost) with emerging techniques in natural language processing by use of large language models. FINDINGS: The primary cohort comprised 14 021 patients at Boston Children's Hospital matching inclusion criteria with an initial seizure-like event, and the comparison cohort comprised 15 062 patients within the IBM MarketScan research database. Seizure recurrence based on a composite expert-derived definition occurred in 57% of patients at Boston Children's Hospital and 63% of patients within IBM MarketScan. Large language models with additional domain-specific and location-specific pre-training on patients excluded from the study (F1-score 0·826 [95% CI 0·817-0·835], AUC 0·897 [95% CI 0·875-0·913]) performed best. All large language models, including the base model without additional pre-training (F1-score 0·739 [95% CI 0·738-0·741], AUC 0·846 [95% CI 0·826-0·861]), outperformed models trained with structured data.
With structured data only, XGBoost outperformed logistic regression, and models trained with the Boston Children's Hospital EMR (logistic regression: F1-score 0·650 [95% CI 0·643-0·657], AUC 0·694 [95% CI 0·685-0·705], XGBoost: F1-score 0·679 [0·676-0·683], AUC 0·725 [0·717-0·734]) performed similarly to models trained on the IBM MarketScan database (logistic regression: F1-score 0·596 [0·590-0·601], AUC 0·670 [0·664-0·675], XGBoost: F1-score 0·678 [0·668-0·687], AUC 0·710 [0·703-0·714]). INTERPRETATION: Physician's clinical notes about an initial seizure-like event include substantial signals for prediction of seizure recurrence, and additional domain-specific and location-specific pre-training can significantly improve the performance of clinical large language models, even for specialised cohorts. FUNDING: UCB, National Institute of Neurological Disorders and Stroke (US National Institutes of Health).
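The F1-scores reported above are the harmonic mean of precision and recall. A minimal sketch of that computation from confusion-matrix counts (the counts here are hypothetical, not the study's data):

```python
def f1_score(tp, fp, fn):
    """F1 = harmonic mean of precision and recall, from confusion-matrix counts."""
    precision = tp / (tp + fp)  # fraction of positive predictions that are correct
    recall = tp / (tp + fn)     # fraction of true positives that are found
    return 2 * precision * recall / (precision + recall)

# Hypothetical counts: 80 true positives, 20 false positives, 20 false negatives
print(round(f1_score(80, 20, 20), 3))  # 0.8
```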


Subjects
Epilepsy, Seizures, Child, Humans, Young Adult, Adult, Retrospective Studies, Seizures/diagnosis, Machine Learning, Electronic Health Records
9.
Commun Med (Lond) ; 2(1): 149, 2022 Nov 21.
Article in English | MEDLINE | ID: mdl-36414774

ABSTRACT

BACKGROUND: Prior research has shown that artificial intelligence (AI) systems often encode biases against minority subgroups. However, little work has focused on ways to mitigate the harm discriminatory algorithms can cause in high-stakes settings such as medicine. METHODS: In this study, we experimentally evaluated the impact biased AI recommendations have on emergency decisions, where participants respond to mental health crises by calling for either medical or police assistance. We recruited 438 clinicians and 516 non-experts to participate in our web-based experiment. We evaluated participant decision-making with and without advice from biased and unbiased AI systems. We also varied the style of the AI advice, framing it either as prescriptive recommendations or descriptive flags. RESULTS: Participant decisions are unbiased without AI advice. However, both clinicians and non-experts are influenced by prescriptive recommendations from a biased algorithm, choosing police help more often in emergencies involving African-American or Muslim men. Crucially, using descriptive flags rather than prescriptive recommendations allows respondents to retain their original, unbiased decision-making. CONCLUSIONS: Our work demonstrates the practical danger of using biased models in health contexts, and suggests that appropriately framing decision support can mitigate the effects of AI bias. These findings must be carefully considered in the many real-world clinical scenarios where inaccurate or biased models may be used to inform important decisions.


Artificial intelligence (AI) systems that make decisions based on historical data are increasingly common in health care settings. However, many AI models exhibit problematic biases, as data often reflect human prejudices against minority groups. In this study, we used a web-based experiment to evaluate the impact biased models can have when used to inform human decisions. We found that though participants were not inherently biased, they were strongly influenced by advice from a biased model if it was offered prescriptively (i.e., "you should do X"). This adherence led their decisions to be biased against African-American and Muslim individuals. However, framing the same advice descriptively (i.e., without recommending a specific action) allowed participants to remain fair. These results demonstrate that though discriminatory AI can lead to poor outcomes for minority groups, appropriately framing advice can help mitigate its effects.

10.
Proc Conf ; 2021: 4794-4811, 2021 Jun.
Article in English | MEDLINE | ID: mdl-34179900

ABSTRACT

Summarization of clinical narratives is a long-standing research problem. Here, we introduce the task of hospital-course summarization. Given the documentation authored throughout a patient's hospitalization, generate a paragraph that tells the story of the patient admission. We construct an English, text-to-text dataset of 109,000 hospitalizations (2M source notes) and their corresponding summary proxy: the clinician-authored "Brief Hospital Course" paragraph written as part of a discharge note. Exploratory analyses reveal that the BHC paragraphs are highly abstractive with some long extracted fragments; are concise yet comprehensive; differ in style and content organization from the source notes; exhibit minimal lexical cohesion; and represent silver-standard references. Our analysis identifies multiple implications for modeling this complex, multi-document summarization task.

11.
Pac Symp Biocomput ; 26: 55-66, 2021.
Article in English | MEDLINE | ID: mdl-33691004

ABSTRACT

Intimate partner violence (IPV) is an urgent, prevalent, and under-detected public health issue. We present machine learning models to assess patients for IPV and injury. We train the predictive algorithms on radiology reports with 1) IPV labels based on entry to a violence prevention program and 2) injury labels provided by emergency radiology fellowship-trained physicians. Our dataset includes 34,642 radiology reports from 1479 patients, comprising IPV victims and control patients. Our best model predicts IPV a median of 3.08 years before violence prevention program entry with a sensitivity of 64% and a specificity of 95%. We conduct error analysis to determine for which patients our model has especially high or low performance and discuss next steps for a deployed clinical risk model.
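The sensitivity and specificity figures reported above follow directly from confusion-matrix counts. A minimal sketch with hypothetical counts (not the study's data) chosen to reproduce similar values:

```python
def screening_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity, and positive predictive value
    from true/false positive and negative counts."""
    return {
        "sensitivity": tp / (tp + fn),  # recall among true cases
        "specificity": tn / (tn + fp),  # recall among controls
        "ppv": tp / (tp + fp),          # precision of a positive flag
    }

# Hypothetical counts: 64 of 100 cases flagged, 95 of 100 controls cleared
m = screening_metrics(tp=64, fp=5, fn=36, tn=95)
print(m["sensitivity"], m["specificity"])  # 0.64 0.95
```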


Subjects
Intimate Partner Violence, Radiology, Computational Biology, Humans
12.
Soc Sci Med ; 215: 92-97, 2018 10.
Article in English | MEDLINE | ID: mdl-30219749

ABSTRACT

RATIONALE: Persons who identify as lesbian, gay, bisexual, and transgender (LGBT) face health inequities due to unwarranted discrimination against their sexual orientation or identity. An important contributor to LGBT health disparities is the inequitable or substandard care that LGBT individuals receive from hospitals. OBJECTIVE: To investigate inequities in hospital care among LGBT patients using the popular social media platform Twitter. METHOD: This study examined a dataset of Twitter communications (tweets) collected from February 2015 to May 2017. The tweets mentioned Twitter handles for hospitals (i.e., usernames for hospitals) and LGBT related terms. The topics discussed were explored to develop an LGBT position index referring to whether the hospital appears supportive or not supportive of LGBT rights. Results for each hospital were then compared to the Healthcare Equality Index (HEI), an established index to evaluate equity of hospital care towards LGBT patients. RESULTS: In total, 1856 tweets mentioned LGBT terms representing 653 unique hospitals. Of these hospitals, 189 (28.9%) were identified as HEI leaders. Hospitals in the Northeast showed significantly greater support towards LGBT issues compared to hospitals in the Midwest. Hospitals deemed as HEI leaders had higher LGBT position scores compared to non-HEI leaders (p = 0.042), when controlling for hospital size and location. CONCLUSIONS: This exploratory study describes a novel approach to monitoring LGBT hospital care. While these initial findings should be interpreted cautiously, they can potentially inform practices to improve equity of care and efforts to address health disparities among gender minority groups.


Subjects
Healthcare Disparities/trends, Hospitals/standards, Sexual and Gender Minorities/psychology, Social Media/trends, Hospitals/statistics & numerical data, Humans, Sexual Behavior/statistics & numerical data, Sexual and Gender Minorities/statistics & numerical data, Social Media/instrumentation, Social Media/statistics & numerical data, United States
13.
AMIA Annu Symp Proc ; 2018: 740-749, 2018.
Article in English | MEDLINE | ID: mdl-30815116

ABSTRACT

Over 75 million Americans have multiple concurrent chronic conditions and medical decision making for these patients is mostly based on retrospective cohort studies. Current methods to generate cohorts of patients with comorbidities are neither scalable nor generalizable. We propose a supervised machine learning algorithm for learning comorbidity phenotypes without requiring manually created training sets. First, we generated myocardial infarction (MI) and type-2 diabetes (T2DM) patient cohorts using ICD9-based imperfectly labeled samples upon which LASSO logistic regression models were trained. Second, we assessed the effects of training sample size, inclusion of physician input, and inclusion of clinical text features on model performance. Using ICD9 codes as our labeling heuristic, we achieved comparable performance to models created using keywords as labeling heuristic. We found that expert input and higher training sample sizes could compensate for the lack of clinical text derived features. However, our best performing model included clinical text as features with a large training sample size.


Subjects
Comorbidity, Type 2 Diabetes Mellitus, Myocardial Infarction, Supervised Machine Learning, Chronic Disease, Type 2 Diabetes Mellitus/complications, Humans, International Classification of Diseases, Logistic Models, Myocardial Infarction/complications, Retrospective Studies
14.
ISME J ; 10(5): 1170-81, 2016 May.
Article in English | MEDLINE | ID: mdl-26574685

ABSTRACT

Endogenous intestinal microbiota have wide-ranging and largely uncharacterized effects on host physiology. Here, we used reverse-phase liquid chromatography-coupled tandem mass spectrometry to define the mouse intestinal proteome in the stomach, jejunum, ileum, cecum and proximal colon under three colonization states: germ-free (GF), monocolonized with Bacteroides thetaiotaomicron and conventionally raised (CR). Our analysis revealed distinct proteomic abundance profiles along the gastrointestinal (GI) tract. Unsupervised clustering showed that host protein abundance primarily depended on GI location rather than colonization state and specific proteins and functions that defined these locations were identified by random forest classifications. K-means clustering of protein abundance across locations revealed substantial differences in host protein production between CR mice relative to GF and monocolonized mice. Finally, comparison with fecal proteomic data sets suggested that the identities of stool proteins are not biased to any region of the GI tract, but are substantially impacted by the microbiota in the distal colon.


Subjects
Gastrointestinal Microbiome, Gastrointestinal Tract/microbiology, Proteome/metabolism, Animals, Cecum/microbiology, Cluster Analysis, Feces, Ileum/microbiology, Jejunum/microbiology, Mass Spectrometry, Mice, Stomach/microbiology
15.
Vaccine ; 32(28): 3469-72, 2014 Jun 12.
Article in English | MEDLINE | ID: mdl-24795227

ABSTRACT

The Brighton Collaboration is a global research network focused on vaccine safety. The Collaboration has created case definitions to determine diagnostic certainty for several adverse events. Currently nested within multi-page publications, these definitions can be cumbersome to use. We report the results of a randomized trial in which the case definition for anaphylaxis was converted into a user-friendly algorithm and compared the algorithm with the standard case definition. The primary outcomes were efficiency and accuracy. Forty medical students determined the Brighton Level of diagnostic certainty of a sample case of anaphylaxis using either the algorithm or the original case definition. Most participants in both groups selected the correct Brighton Level. Participants using the algorithm required significantly less time to review the case and determine the level of diagnostic certainty [mean difference = 107 s (95% CI: 13-200; p = 0.026)], supporting that the algorithm was more efficient without impacting accuracy.


Subjects
Algorithms, Anaphylaxis/diagnosis, Vaccination/adverse effects, Adverse Drug Reaction Reporting Systems, Anaphylaxis/chemically induced, Humans, Medical Students, Time Factors, Young Adult