Results 1 - 20 of 26
1.
Nat Commun ; 15(1): 7036, 2024 Aug 15.
Article in English | MEDLINE | ID: mdl-39147770

ABSTRACT

Methane emissions from the oil and gas sector are a large contributor to climate change. Robust emission quantification and source attribution are needed for mitigating methane emissions, requiring a transparent, comprehensive, and accurate geospatial database of oil and gas infrastructure. Realizing such a database is hindered by data gaps nationally and globally. To fill these gaps, we present a deep learning approach on freely available, high-resolution satellite imagery for automatically mapping well pads and storage tanks. We validate the results in the Permian and Denver-Julesburg basins, two high-producing basins in the United States. Our approach achieves high performance on expert-curated datasets of well pads (Precision = 0.955, Recall = 0.904) and storage tanks (Precision = 0.962, Recall = 0.968). When deployed across the entire basins, the approach captures a majority of well pads in existing datasets (79.5%) and detects a substantial number (>70,000) of well pads not present in those datasets. Furthermore, we detect storage tanks (>169,000) on well pads, which were not mapped in existing datasets. We identify remaining challenges with the approach, which, when solved, should enable a globally scalable and public framework for mapping well pads, storage tanks, and other oil and gas infrastructure.
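The precision and recall reported above come from matching model detections against expert-curated annotations. A minimal sketch of that bookkeeping, assuming axis-aligned bounding boxes and an illustrative IoU threshold of 0.5 (the matching criterion actually used in the paper is not stated in this abstract):

```python
def iou(a, b):
    # Boxes as (x1, y1, x2, y2); intersection-over-union.
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def precision_recall(detections, ground_truth, iou_threshold=0.5):
    # Greedy matching: each ground-truth box can be claimed by one detection.
    unmatched = list(ground_truth)
    tp = 0
    for det in detections:
        match = next((gt for gt in unmatched if iou(det, gt) >= iou_threshold), None)
        if match is not None:
            unmatched.remove(match)
            tp += 1
    precision = tp / len(detections) if detections else 0.0
    recall = tp / len(ground_truth) if ground_truth else 0.0
    return precision, recall
```

A detection counts as a true positive only if it claims a not-yet-matched ground-truth box, so duplicate detections of the same well pad are penalized as false positives.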

2.
Pac Symp Biocomput ; 29: 120-133, 2024.
Article in English | MEDLINE | ID: mdl-38160274

ABSTRACT

Lack of diagnosis coding is a barrier to leveraging veterinary notes for medical and public health research. Previous work has been limited to developing specialized rule-based or customized supervised learning models to predict diagnosis coding, which is tedious and not easily transferable. In this work, we show that open-source large language models (LLMs) pretrained on a general corpus can achieve reasonable performance in a zero-shot setting. Alpaca-7B can achieve a zero-shot F1 of 0.538 on CSU test data and 0.389 on PP test data, two standard benchmarks for coding from veterinary notes. Furthermore, with appropriate fine-tuning, the performance of LLMs can be substantially boosted, exceeding that of strong state-of-the-art supervised models. VetLLM, which is fine-tuned on Alpaca-7B using just 5000 veterinary notes, can achieve an F1 of 0.747 on CSU test data and 0.637 on PP test data. Notably, our fine-tuning is data-efficient: using 200 notes can outperform supervised models trained with more than 100,000 notes. The findings demonstrate the great potential of leveraging LLMs for language processing tasks in medicine, and we advocate this new paradigm for processing clinical text.


Subject(s)
Camelids, New World; Humans; Animals; Natural Language Processing; Computational Biology; Language
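The F1 scores quoted for Alpaca-7B and VetLLM are most naturally read as micro-averaged over the sets of diagnosis codes predicted per note; the exact averaging used on the CSU and PP benchmarks is an assumption here. A minimal sketch:

```python
def micro_f1(predicted, gold):
    """Micro-averaged F1 over per-note sets of diagnosis codes."""
    tp = fp = fn = 0
    for pred, ref in zip(predicted, gold):
        pred, ref = set(pred), set(ref)
        tp += len(pred & ref)   # codes correctly predicted
        fp += len(pred - ref)   # spurious codes
        fn += len(ref - pred)   # missed codes
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0
```

Micro-averaging pools counts across notes, so frequent codes dominate the score; macro-averaging per code would weight rare diagnoses more heavily.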
3.
Patterns (N Y) ; 4(9): 100802, 2023 Sep 08.
Article in English | MEDLINE | ID: mdl-37720336

ABSTRACT

Artificial intelligence (AI) models for automatic generation of narrative radiology reports from images have the potential to enhance efficiency and reduce the workload of radiologists. However, evaluating the correctness of these reports requires metrics that can capture clinically pertinent differences. In this study, we investigate the alignment between automated metrics and radiologists' scoring of errors in report generation. We address the limitations of existing metrics by proposing new metrics, RadGraph F1 and RadCliQ, which demonstrate stronger correlation with radiologists' evaluations. In addition, we analyze the failure modes of the metrics to understand their limitations and provide guidance for metric selection and interpretation. This study establishes RadGraph F1 and RadCliQ as meaningful metrics for guiding future research in radiology report generation.

4.
AMIA Annu Symp Proc ; 2023: 1007-1016, 2023.
Article in English | MEDLINE | ID: mdl-38222438

ABSTRACT

Low-yield repetitive laboratory diagnostics burden patients and inflate cost of care. In this study, we assess whether stability in repeated laboratory diagnostic measurements is predictable with uncertainty estimates using electronic health record data available before the diagnostic is ordered. We use probabilistic regression to predict a distribution of plausible values, allowing use-time customization for various definitions of "stability" given dynamic ranges and clinical scenarios. After converting distributions into "stability" scores, the models achieve a sensitivity of 29% for white blood cells, 60% for hemoglobin, 100% for platelets, 54% for potassium, 99% for albumin and 35% for creatinine for predicting stability at 90% precision, suggesting those fractions of repetitive tests could be reduced with low risk of missing important changes. The findings demonstrate the feasibility of using electronic health record data to identify low-yield repetitive tests and offer personalized guidance for better usage of testing while ensuring high quality care.


Subject(s)
Clinical Laboratory Techniques; Hemoglobins; Humans
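Converting a predicted distribution into a "stability" score, as described above, can be sketched as follows, assuming a Gaussian predictive distribution and a use-time band [low, high]. The function names and the 0.9 default threshold are illustrative, not the paper's:

```python
import math

def normal_cdf(x, mu, sigma):
    # CDF of N(mu, sigma^2) via the error function.
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def stability_score(mu, sigma, low, high):
    """Probability that the next measurement falls inside [low, high],
    given a Gaussian predictive distribution N(mu, sigma^2)."""
    return normal_cdf(high, mu, sigma) - normal_cdf(low, mu, sigma)

def flag_stable(mu, sigma, low, high, threshold=0.9):
    # A repeat test is flagged as skippable only when the model is
    # confident the value will stay in the clinically acceptable band.
    return stability_score(mu, sigma, low, high) >= threshold
```

Because the band and threshold are applied after prediction, the same fitted model supports different definitions of "stability" for different clinical scenarios, which is the use-time customization the abstract describes.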
5.
Nat Biomed Eng ; 6(12): 1399-1406, 2022 12.
Article in English | MEDLINE | ID: mdl-36109605

ABSTRACT

In tasks involving the interpretation of medical images, suitably trained machine-learning models often exceed the performance of medical experts. Yet such a high level of performance typically requires that the models be trained with relevant datasets that have been painstakingly annotated by experts. Here we show that a self-supervised model trained on chest X-ray images that lack explicit annotations performs pathology-classification tasks with accuracies comparable to those of radiologists. On an external validation dataset of chest X-rays, the self-supervised model outperformed a fully supervised model in the detection of three pathologies (out of eight), and the performance generalized to pathologies that were not explicitly annotated for model training, to multiple image-interpretation tasks and to datasets from multiple institutions.


Subject(s)
Machine Learning; Supervised Machine Learning; X-Rays
6.
J Am Med Inform Assoc ; 29(11): 1908-1918, 2022 10 07.
Article in English | MEDLINE | ID: mdl-35994003

ABSTRACT

OBJECTIVE: Chest pain is common, and current risk-stratification methods, requiring 12-lead electrocardiograms (ECGs) and serial biomarker assays, are static and restricted to highly resourced settings. Our objective was to predict myocardial injury using continuous single-lead ECG waveforms similar to those obtained from wearable devices and to evaluate the potential of transfer learning from labeled 12-lead ECGs to improve these predictions. METHODS: We studied 10 874 Emergency Department (ED) patients who received continuous ECG monitoring and troponin testing from 2020 to 2021. We defined myocardial injury as newly elevated troponin in patients with chest pain or shortness of breath. We developed deep learning models of myocardial injury using continuous lead II ECG from bedside monitors as well as conventional 12-lead ECGs from triage. We pretrained single-lead models on a pre-existing corpus of labeled 12-lead ECGs. We compared model predictions to those of ED physicians. RESULTS: A transfer learning strategy, whereby models for continuous single-lead ECGs were first pretrained on 12-lead ECGs from a separate cohort, predicted myocardial injury as accurately as models using patients' own 12-lead ECGs: area under the receiver operating characteristic curve 0.760 (95% confidence interval [CI], 0.721-0.799) and area under the precision-recall curve 0.321 (95% CI, 0.251-0.397). Models demonstrated a high negative predictive value for myocardial injury among patients with chest pain or shortness of breath, exceeding the predictive performance of ED physicians, while attending to known stigmata of myocardial injury. CONCLUSIONS: Deep learning models pretrained on labeled 12-lead ECGs can predict myocardial injury from noisy, continuous monitor data early in a patient's presentation. 
The utility of continuous single-lead ECG in the risk stratification of chest pain has implications for wearable devices and preclinical settings, where external validation of the approach is needed.


Subject(s)
Chest Pain; Electrocardiography; Biomarkers; Chest Pain/diagnosis; Chest Pain/etiology; Dyspnea/diagnosis; Dyspnea/etiology; Electrocardiography/methods; Emergency Service, Hospital; Humans; Machine Learning; Troponin
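The AUROC figures reported above can be computed directly from the Mann-Whitney U identity: the AUROC equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one, with ties counting half. A minimal O(n^2) sketch:

```python
def auroc(scores, labels):
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) identity.
    labels are 1 for positive (e.g., myocardial injury) and 0 for negative."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0    # positive ranked above negative
            elif p == n:
                wins += 0.5    # tie counts half
    return wins / (len(pos) * len(neg))
```

Production metric libraries use a sort-based O(n log n) formulation, but this pairwise form makes the probabilistic interpretation explicit.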
7.
Patterns (N Y) ; 3(1): 100400, 2022 Jan 14.
Article in English | MEDLINE | ID: mdl-35079716

ABSTRACT

Data labeling is often the limiting step in machine learning because it requires time from trained experts. To address the limitation on labeled data, contrastive learning, among other unsupervised learning methods, leverages unlabeled data to learn representations of data. Here, we propose a contrastive learning framework that utilizes metadata for selecting positive and negative pairs when training on unlabeled data. We demonstrate its application in the healthcare domain on heart and lung sound recordings, whose increasing availability due to the adoption of digital stethoscopes makes them a natural test bed for the method. Compared to contrastive learning with augmentations, our pair-selection approach exploits the clinical information associated with each recording, using shared patient-level context such as age, sex, weight, and location of sounds. We show improvement in downstream tasks for diagnosing heart and lung sounds when leveraging patient-specific representations in selecting positive and negative pairs. This study paves the way for medical applications of contrastive learning that leverage clinical information. We have made our code available here: https://github.com/stanfordmlgroup/selfsupervised-lungandheartsounds.
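Metadata-driven pair selection of the kind described above can be sketched as follows: recordings from the same patient become positive pairs, recordings from different patients become negatives. The function and field names are hypothetical, and the paper's actual sampling also draws on clinical attributes such as age, sex, and recording location:

```python
import itertools
import random

def make_pairs(recordings, num_negatives=1, seed=0):
    """Build (anchor_id, other_id, label) training pairs from metadata.
    Each recording is a dict with 'id' and 'patient' keys; assumes at
    least two distinct patients are present."""
    rng = random.Random(seed)
    pairs = []
    # Positives: every pair of recordings sharing a patient.
    for a, b in itertools.combinations(recordings, 2):
        if a["patient"] == b["patient"]:
            pairs.append((a["id"], b["id"], 1))
    # Negatives: sample recordings from a different patient.
    ids_by_patient = {}
    for r in recordings:
        ids_by_patient.setdefault(r["patient"], []).append(r["id"])
    patients = list(ids_by_patient)
    for r in recordings:
        for _ in range(num_negatives):
            other = rng.choice([p for p in patients if p != r["patient"]])
            pairs.append((r["id"], rng.choice(ids_by_patient[other]), 0))
    return pairs
```

The contrastive loss then pulls embeddings of label-1 pairs together and pushes label-0 pairs apart, so the learned representation encodes patient-level context without any diagnostic labels.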

8.
J Thorac Imaging ; 37(3): 162-167, 2022 May 01.
Article in English | MEDLINE | ID: mdl-34561377

ABSTRACT

PURPOSE: Patients with pneumonia often present to the emergency department (ED) and require prompt diagnosis and treatment. Clinical decision support systems for the diagnosis and management of pneumonia are commonly utilized in EDs to improve patient care. The purpose of this study is to investigate whether a deep learning model for detecting radiographic pneumonia and pleural effusions can improve functionality of a clinical decision support system (CDSS) for pneumonia management (ePNa) operating in 20 EDs. MATERIALS AND METHODS: In this retrospective cohort study, a dataset of 7434 prior chest radiographic studies from 6551 ED patients was used to develop and validate a deep learning model to identify radiographic pneumonia, pleural effusions, and evidence of multilobar pneumonia. Model performance was evaluated against 3 radiologists' adjudicated interpretation and compared with performance of the natural language processing of radiology reports used by ePNa. RESULTS: The deep learning model achieved an area under the receiver operating characteristic curve of 0.833 (95% confidence interval [CI]: 0.795, 0.868) for detecting radiographic pneumonia, 0.939 (95% CI: 0.911, 0.962) for detecting pleural effusions and 0.847 (95% CI: 0.800, 0.890) for identifying multilobar pneumonia. On all 3 tasks, the model achieved higher agreement with the adjudicated radiologist interpretation compared with ePNa. CONCLUSIONS: A deep learning model demonstrated higher agreement with radiologists than the ePNa CDSS in detecting radiographic pneumonia and related findings. Incorporating deep learning models into pneumonia CDSS could enhance diagnostic performance and improve pneumonia management.


Subject(s)
Decision Support Systems, Clinical; Deep Learning; Pleural Effusion; Pneumonia; Emergency Service, Hospital; Humans; Pleural Effusion/diagnostic imaging; Pneumonia/diagnostic imaging; Radiography, Thoracic; Retrospective Studies
9.
EBioMedicine ; 71: 103546, 2021 Sep.
Article in English | MEDLINE | ID: mdl-34419924

ABSTRACT

BACKGROUND: Respiratory virus infections are significant causes of morbidity and mortality, and may induce host metabolite alterations by infecting respiratory epithelial cells. We investigated the use of liquid chromatography quadrupole time-of-flight mass spectrometry (LC/Q-TOF) combined with machine learning for the diagnosis of influenza infection. METHODS: We analyzed nasopharyngeal swab samples by LC/Q-TOF to identify distinct metabolic signatures for diagnosis of acute illness. Machine learning models were trained for classification, followed by Shapley additive explanation (SHAP) analysis to assess feature importance and for biomarker discovery. FINDINGS: A total of 236 samples were tested in the discovery phase by LC/Q-TOF, including 118 positive samples (40 influenza A 2009 H1N1, 39 influenza H3 and 39 influenza B) as well as 118 age and sex-matched negative controls with acute respiratory illness. Analysis showed an area under the receiver operating characteristic curve (AUC) of 1.00 (95% confidence interval [95% CI] 0.99, 1.00), sensitivity of 1.00 (95% CI 0.86, 1.00) and specificity of 0.96 (95% CI 0.81, 0.99). The metabolite most strongly associated with differential classification was pyroglutamic acid. Independent validation of a biomarker signature based on the top 20 differentiating ion features was performed in a prospective cohort of 96 symptomatic individuals including 48 positive samples (24 influenza A 2009 H1N1, 5 influenza H3 and 19 influenza B) and 48 negative samples. Testing performed using a clinically applicable targeted approach, liquid chromatography triple quadrupole mass spectrometry, showed an AUC of 1.00 (95% CI 0.998, 1.00), sensitivity of 0.94 (95% CI 0.83, 0.98), and specificity of 1.00 (95% CI 0.93, 1.00). Limitations include the lack of sample suitability assessment and the need to validate these findings in additional patient populations.
INTERPRETATION: This metabolomic approach has potential for diagnostic applications in infectious diseases testing, including other respiratory viruses, and may eventually be adapted for point-of-care testing. FUNDING: None.


Subject(s)
Influenza, Human/diagnosis; Machine Learning; Metabolome; Molecular Diagnostic Techniques/methods; Adolescent; Adult; Child; Child, Preschool; Female; Gas Chromatography-Mass Spectrometry/methods; Humans; Influenza, Human/metabolism; Influenza, Human/virology; Male; Metabolomics/methods; Nasal Mucosa/metabolism; Nasal Mucosa/virology; Orthomyxoviridae/pathogenicity; Pyrrolidonecarboxylic Acid/analysis
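Ranking ion features by mean absolute SHAP value is the usual way to derive a "top 20 differentiating features" panel like the one validated above; the sketch below assumes per-sample attribution rows are already available (function and argument names are illustrative):

```python
def top_features(attributions, feature_names, k=20):
    """Rank features by mean |attribution| across samples, the standard
    global-importance summary for SHAP values, and return the k most
    important feature names."""
    n = len(attributions)
    importance = {
        name: sum(abs(row[j]) for row in attributions) / n
        for j, name in enumerate(feature_names)
    }
    return sorted(feature_names, key=lambda f: -importance[f])[:k]
```

Averaging absolute values rather than signed ones matters: a feature that pushes some predictions up and others down would otherwise cancel to near-zero importance.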
10.
JAMA Netw Open ; 4(7): e2117391, 2021 07 01.
Article in English | MEDLINE | ID: mdl-34297075

ABSTRACT

Importance: Physicians are required to work with rapidly growing amounts of medical data. Approximately 62% of time per patient is devoted to reviewing electronic health records (EHRs), with clinical data review being the most time-consuming portion. Objective: To determine whether an artificial intelligence (AI) system developed to organize and display new patient referral records would improve a clinician's ability to extract patient information compared with the current standard of care. Design, Setting, and Participants: In this prognostic study, an AI system was created to organize patient records and improve data retrieval. To evaluate the system on time and accuracy, a nonblinded, prospective study was conducted at a single academic medical center. Recruitment emails were sent to all physicians in the gastroenterology division, and 12 clinicians agreed to participate. Each of the clinicians participating in the study received 2 referral records: 1 AI-optimized patient record and 1 standard (non-AI-optimized) patient record. For each record, clinicians were asked 22 questions requiring them to search the assigned record for clinically relevant information. Clinicians reviewed records from June 1 to August 30, 2020. Main Outcomes and Measures: The time required to answer each question, along with accuracy, was measured for both records, with and without AI optimization. Participants were asked to assess overall satisfaction with the AI system, their preferred review method (AI-optimized vs standard), and other topics to assess clinical utility. Results: Twelve gastroenterology physicians/fellows completed the study. Compared with standard (non-AI-optimized) patient record review, the AI system saved first-time physician users 18% of the time used to answer the clinical questions (10.5 [95% CI, 8.5-12.6] vs 12.8 [95% CI, 9.4-16.2] minutes; P = .02). 
There was no significant decrease in accuracy when physicians retrieved important patient information (83.7% [95% CI, 79.3%-88.2%] with the AI-optimized vs 86.0% [95% CI, 81.8%-90.2%] without the AI-optimized record; P = .81). Survey responses from physicians were generally positive across all questions. Eleven of 12 physicians (92%) preferred the AI-optimized record review to standard review. Despite a learning curve noted by respondents, 11 of 12 physicians believed that the technology would save them time in assessing new patient records and were interested in using it in their clinics. Conclusions and Relevance: In this prognostic study, an AI system helped physicians extract relevant patient information in a shorter time while maintaining high accuracy. This finding is particularly germane to the ever-increasing amounts of medical data and increased stressors on clinicians. Increased user familiarity with the AI system, along with further enhancements in the system itself, hold promise to further improve physician data extraction from large quantities of patient health records.


Subject(s)
Artificial Intelligence; Information Storage and Retrieval/methods; Medical Records; Physicians/psychology; User-Centered Design; Academic Medical Centers; Adult; Female; Humans; Job Satisfaction; Male; Middle Aged; Prospective Studies; Referral and Consultation; Task Performance and Analysis; Time Factors; Workload/psychology
11.
NPJ Digit Med ; 4(1): 88, 2021 Jun 01.
Article in English | MEDLINE | ID: mdl-34075194

ABSTRACT

Coronary artery disease (CAD), the most common manifestation of cardiovascular disease, remains the most common cause of mortality in the United States. Risk assessment is key for primary prevention of coronary events, and coronary artery calcium (CAC) scoring using computed tomography (CT) is one such non-invasive tool. Despite the proven clinical value of CAC, its current implementation in clinical practice has limitations such as the lack of insurance coverage for the test, the need for capital-intensive CT machines, specialized imaging protocols, and accredited 3D imaging labs for analysis (including personnel and software). Perhaps the greatest gap is the millions of patients who undergo routine chest CT exams and demonstrate coronary artery calcification, yet this finding is often not reported or quantitation is not feasible. We present two deep learning models that automate CAC scoring, demonstrating advantages for both dedicated gated coronary CT exams and routine non-gated chest CTs performed for other reasons, allowing opportunistic screening. First, we trained a gated coronary CT model for CAC scoring that showed near perfect agreement (mean difference in scores = -2.86; Cohen's Kappa = 0.89, P < 0.0001) with current conventional manual scoring on a retrospective dataset of 79 patients and was found to perform the task faster (average time for automated CAC scoring using a graphics processing unit (GPU) was 3.5 ± 2.1 s vs. 261 s for manual scoring) in a prospective trial of 55 patients with little difference in scores compared to three technologists (mean difference in scores = 3.24, 5.12, and 5.48, respectively).
Then using CAC scores from paired gated coronary CT as a reference standard, we trained a deep learning model on our internal data and a cohort from the Multi-Ethnic Study of Atherosclerosis (MESA) study (total training n = 341, Stanford test n = 42, MESA test n = 46) to perform CAC scoring on routine non-gated chest CT exams with validation on external datasets (total n = 303) obtained from four geographically disparate health systems. On identifying patients with any CAC (i.e., CAC ≥ 1), sensitivity and PPV was high across all datasets (ranges: 80-100% and 87-100%, respectively). For CAC ≥ 100 on routine non-gated chest CTs, which is the latest recommended threshold to initiate statin therapy, our model showed sensitivities of 71-94% and positive predictive values in the range of 88-100% across all the sites. Adoption of this model could allow more patients to be screened with CAC scoring, potentially allowing opportunistic early preventive interventions.
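The conventional CAC score that the gated-CT model reproduces is the Agatston score: a sum over calcified lesions of lesion area times a density weight. A minimal sketch of that convention (thresholds follow the standard Agatston definition; the per-slice details of the paper's pipeline are not shown, and the lesion tuples here are an assumed input format):

```python
def agatston_weight(peak_hu):
    # Standard density weighting by peak attenuation:
    # 130-199 HU -> 1, 200-299 -> 2, 300-399 -> 3, >=400 -> 4.
    if peak_hu >= 400:
        return 4
    if peak_hu >= 300:
        return 3
    if peak_hu >= 200:
        return 2
    if peak_hu >= 130:
        return 1
    return 0

def agatston_score(lesions, min_area_mm2=1.0):
    """Sum of lesion area (mm^2) x density weight over calcified lesions.
    Each lesion is (area_mm2, peak_hu); lesions below 130 HU or below the
    minimum area contribute nothing."""
    return sum(
        area * agatston_weight(hu)
        for area, hu in lesions
        if area >= min_area_mm2 and hu >= 130
    )
```

The CAC >= 100 threshold mentioned above is applied to exactly this kind of summed score when deciding whether to recommend statin therapy.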

12.
Sci Data ; 8(1): 135, 2021 05 20.
Article in English | MEDLINE | ID: mdl-34017010

ABSTRACT

Diffuse Large B-Cell Lymphoma (DLBCL) is the most common non-Hodgkin lymphoma. Though histologically DLBCL shows varying morphologies, no morphologic features have been consistently demonstrated to correlate with prognosis. We present a morphologic analysis of histology sections from 209 DLBCL cases with associated clinical and cytogenetic data. Duplicate tissue core sections were arranged in tissue microarrays (TMAs), and replicate sections were stained with H&E and immunohistochemical stains for CD10, BCL6, MUM1, BCL2, and MYC. The TMAs are accompanied by pathologist-annotated regions-of-interest (ROIs) that identify areas of tissue representative of DLBCL. We used a deep learning model to segment all tumor nuclei in the ROIs, and computed several geometric features for each segmented nucleus. We fit a Cox proportional hazards model to demonstrate the utility of these geometric features in predicting survival outcome, and found that it achieved a C-index (95% CI) of 0.635 (0.574,0.691). Our finding suggests that geometric features computed from tumor nuclei are of prognostic importance, and should be validated in prospective studies.


Subject(s)
Deep Learning; Lymphoma, Large B-Cell, Diffuse/genetics; Lymphoma, Large B-Cell, Diffuse/pathology; Cell Nucleus/ultrastructure; Eosine Yellowish-(YS); Hematoxylin; Humans; Prognosis; Staining and Labeling; Tissue Array Analysis
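The C-index reported for the Cox model above is Harrell's concordance index: among comparable patient pairs, the fraction where the higher predicted risk corresponds to the shorter survival. A minimal sketch (1 in `events` marks an observed death, 0 a censored follow-up):

```python
def c_index(times, events, risk_scores):
    """Harrell's concordance index. A pair (i, j) is comparable when
    patient i's time is shorter and i's event was observed; it is
    concordant when i also has the higher risk score (ties count half)."""
    concordant = comparable = 0.0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if times[i] < times[j] and events[i]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 0.5 means the risk scores are no better than chance at ordering survival, so the reported 0.635 indicates a modest but real prognostic signal in the nuclear geometry features.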
13.
NPJ Digit Med ; 3: 115, 2020.
Article in English | MEDLINE | ID: mdl-32964138

ABSTRACT

Tuberculosis (TB) is the leading cause of preventable death in HIV-positive patients, and yet often remains undiagnosed and untreated. Chest x-ray is often used to assist in diagnosis, yet this presents additional challenges due to atypical radiographic presentation and radiologist shortages in regions where co-infection is most common. We developed a deep learning algorithm to diagnose TB using clinical information and chest x-ray images from 677 HIV-positive patients with suspected TB from two hospitals in South Africa. We then sought to determine whether the algorithm could assist clinicians in the diagnosis of TB in HIV-positive patients as a web-based diagnostic assistant. Use of the algorithm resulted in a modest but statistically significant improvement in clinician accuracy (p = 0.002), increasing the mean clinician accuracy from 0.60 (95% CI 0.57, 0.63) without assistance to 0.65 (95% CI 0.60, 0.70) with assistance. However, the accuracy of assisted clinicians was significantly lower (p < 0.001) than that of the stand-alone algorithm, which had an accuracy of 0.79 (95% CI 0.77, 0.82) on the same unseen test cases. These results suggest that deep learning assistance may improve clinician accuracy in TB diagnosis using chest x-rays, which would be valuable in settings with a high burden of HIV/TB co-infection. Moreover, the high accuracy of the stand-alone algorithm suggests a potential value particularly in settings with a scarcity of radiological expertise.

15.
NPJ Digit Med ; 3: 61, 2020.
Article in English | MEDLINE | ID: mdl-32352039

ABSTRACT

Pulmonary embolism (PE) is a life-threatening clinical problem, and computed tomography pulmonary angiography (CTPA) is the gold standard for diagnosis. Prompt diagnosis and immediate treatment are critical to avoid high morbidity and mortality rates, yet PE remains among the diagnoses most frequently missed or delayed. In this study, we developed a deep learning model, PENet, to automatically detect PE on volumetric CTPA scans as an end-to-end solution. PENet is a 77-layer 3D convolutional neural network (CNN) pretrained on the Kinetics-600 dataset and fine-tuned on a retrospective CTPA dataset collected from a single academic institution. PENet's performance in detecting PE was evaluated on data from two different institutions: a hold-out dataset from the same institution as the training data, and a second dataset collected from an external institution to evaluate model generalizability to an unrelated population. PENet achieved an AUROC of 0.84 [0.82-0.87] on the hold-out internal test set and 0.85 [0.81-0.88] on the external dataset, outperforming current state-of-the-art 3D CNN models. These results represent a successful application of an end-to-end 3D CNN model to the complex task of PE diagnosis without requiring computationally intensive and time-consuming preprocessing, and demonstrate sustained performance on data from an external institution. Our model could be applied as a triage tool to automatically identify clinically important PEs, allowing prioritization for diagnostic radiology interpretation and improved care pathways via more efficient diagnosis.

16.
BMC Public Health ; 20(1): 608, 2020 May 01.
Article in English | MEDLINE | ID: mdl-32357871

ABSTRACT

BACKGROUND: Risk adjustment models are employed to prevent adverse selection, anticipate budgetary reserve needs, and offer care management services to high-risk individuals. We aimed to address two unknowns about risk adjustment: whether machine learning (ML) and inclusion of social determinants of health (SDH) indicators improve prospective risk adjustment for health plan payments. METHODS: We employed a 2-by-2 factorial design comparing: (i) linear regression versus ML (gradient boosting) and (ii) demographics and diagnostic codes alone, versus additional ZIP code-level SDH indicators. Healthcare claims from privately-insured US adults (2016-2017), and Census data were used for analysis. Data from 1.02 million adults were used for derivation, and data from 0.26 million to assess performance. Model performance was measured using coefficient of determination (R2), discrimination (C-statistic), and mean absolute error (MAE) for the overall population, and predictive ratio and net compensation for vulnerable subgroups. We provide 95% confidence intervals (CI) around each performance measure. RESULTS: Linear regression without SDH indicators achieved moderate determination (R2 0.327, 95% CI: 0.300, 0.353), error ($6992; 95% CI: $6889, $7094), and discrimination (C-statistic 0.703; 95% CI: 0.701, 0.705). ML without SDH indicators improved all metrics (R2 0.388; 95% CI: 0.357, 0.420; error $6637; 95% CI: $6539, $6735; C-statistic 0.717; 95% CI: 0.715, 0.718), reducing misestimation of cost by $3.5 M per 10,000 members. Among people living in areas with high poverty, high wealth inequality, or high prevalence of uninsured, SDH indicators reduced underestimation of cost, improving the predictive ratio by 3% (~$200/person/year). CONCLUSIONS: ML improved risk adjustment models and the incorporation of SDH indicators reduced underpayment in several vulnerable populations.


Subject(s)
Health Promotion/economics; Health Promotion/statistics & numerical data; Insurance, Health/economics; Insurance, Health/statistics & numerical data; Machine Learning/economics; Machine Learning/statistics & numerical data; Social Determinants of Health/economics; Social Determinants of Health/statistics & numerical data; Adult; Cost-Benefit Analysis; Female; Humans; Male; Middle Aged; Prospective Studies; Risk Adjustment
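The predictive ratio used in the subgroup analysis above is total predicted spending over total actual spending for the group, with values below 1.0 indicating the model underpays for that group. A minimal sketch, together with the mean absolute error used for overall performance:

```python
def predictive_ratio(predicted, actual):
    """Ratio of total predicted to total actual spending for a subgroup;
    below 1.0 means systematic underpayment for the group."""
    return sum(predicted) / sum(actual)

def mean_absolute_error(predicted, actual):
    # Average absolute per-member misestimation of cost.
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)
```

Because the predictive ratio aggregates before dividing, per-member over- and under-predictions can cancel; that is why the study reports it alongside MAE rather than instead of it.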
17.
NPJ Digit Med ; 3: 23, 2020.
Article in English | MEDLINE | ID: mdl-32140566

ABSTRACT

Artificial intelligence (AI) algorithms continue to rival human performance on a variety of clinical tasks, while their actual impact on human diagnosticians, when incorporated into clinical workflows, remains relatively unexplored. In this study, we developed a deep learning-based assistant to help pathologists differentiate between two subtypes of primary liver cancer, hepatocellular carcinoma and cholangiocarcinoma, on hematoxylin and eosin-stained whole-slide images (WSI), and evaluated its effect on the diagnostic performance of 11 pathologists with varying levels of expertise. Our model achieved accuracies of 0.885 on a validation set of 26 WSI, and 0.842 on an independent test set of 80 WSI. Although use of the assistant did not change the mean accuracy of the 11 pathologists (p = 0.184, OR = 1.281), it significantly improved the accuracy (p = 0.045, OR = 1.499) of a subset of nine pathologists who fell within well-defined experience levels (GI subspecialists, non-GI subspecialists, and trainees). In the assisted state, model accuracy significantly impacted the diagnostic decisions of all 11 pathologists. As expected, when the model's prediction was correct, assistance significantly improved accuracy (p = 0.000, OR = 4.289), whereas when the model's prediction was incorrect, assistance significantly decreased accuracy (p = 0.000, OR = 0.253), with both effects holding across all pathologist experience levels and case difficulty levels. Our results highlight the challenges of translating AI models into the clinical setting, and emphasize the importance of taking into account potential unintended negative consequences of model assistance when designing and testing medical AI-assistance tools.
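The odds ratios above come from the study's regression models, which adjust for pathologist and case effects; the raw (unadjusted) odds ratio for a 2x2 accuracy table can nevertheless be sketched in a few lines, which is useful for sanity-checking such reports:

```python
def odds_ratio(correct_assisted, total_assisted, correct_unassisted, total_unassisted):
    """Raw odds ratio for a correct diagnosis with vs. without assistance.
    OR > 1 favors the assisted condition; assumes no zero cells."""
    a = correct_assisted                      # assisted, correct
    b = total_assisted - correct_assisted     # assisted, incorrect
    c = correct_unassisted                    # unassisted, correct
    d = total_unassisted - correct_unassisted # unassisted, incorrect
    return (a * d) / (b * c)
```

With a zero cell the raw OR is undefined; the usual workaround (adding 0.5 to each cell) is omitted here for clarity.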

18.
Sci Rep ; 10(1): 3958, 2020 03 03.
Article in English | MEDLINE | ID: mdl-32127625

ABSTRACT

The development of deep learning algorithms for complex tasks in digital medicine has relied on the availability of large labeled training datasets, usually containing hundreds of thousands of examples. The purpose of this study was to develop a 3D deep learning model, AppendiXNet, to detect appendicitis, one of the most common life-threatening abdominal emergencies, using a small training dataset of less than 500 training CT exams. We explored whether pretraining the model on a large collection of natural videos would improve the performance of the model over training the model from scratch. AppendiXNet was pretrained on a large collection of YouTube videos called Kinetics, consisting of approximately 500,000 video clips and annotated for one of 600 human action classes, and then fine-tuned on a small dataset of 438 CT scans annotated for appendicitis. We found that pretraining the 3D model on natural videos significantly improved the performance of the model from an AUC of 0.724 (95% CI 0.625, 0.823) to 0.810 (95% CI 0.725, 0.895). The application of deep learning to detect abnormalities on CT examinations using video pretraining could generalize effectively to other challenging cross-sectional medical imaging tasks when training data is limited.


Subject(s)
Algorithms; Appendicitis/diagnosis; Appendicitis/metabolism; Deep Learning; Adult; Cross-Sectional Studies; Female; Humans; Male; Middle Aged
19.
JAMA Netw Open ; 2(6): e195600, 2019 06 05.
Article in English | MEDLINE | ID: mdl-31173130

ABSTRACT

Importance: Deep learning has the potential to augment clinician performance in medical imaging interpretation and reduce time to diagnosis through automated segmentation. Few studies to date have explored this topic. Objective: To develop and apply a neural network segmentation model (the HeadXNet model) capable of generating precise voxel-by-voxel predictions of intracranial aneurysms on head computed tomographic angiography (CTA) imaging to augment clinicians' intracranial aneurysm diagnostic performance. Design, Setting, and Participants: In this diagnostic study, a 3-dimensional convolutional neural network architecture was developed using a training set of 611 head CTA examinations to generate aneurysm segmentations. Segmentation outputs from this support model on a test set of 115 examinations were provided to clinicians. Between August 13, 2018, and October 4, 2018, 8 clinicians diagnosed the presence of aneurysm on the test set, both with and without model augmentation, in a crossover design using randomized order and a 14-day washout period. Head and neck examinations performed between January 3, 2003, and May 31, 2017, at a single academic medical center were used to train, validate, and test the model. Examinations positive for aneurysm had at least 1 clinically significant, nonruptured intracranial aneurysm. Examinations with hemorrhage, ruptured aneurysm, posttraumatic or infectious pseudoaneurysm, arteriovenous malformation, surgical clips, coils, catheters, or other surgical hardware were excluded. All other CTA examinations were considered controls. Main Outcomes and Measures: Sensitivity, specificity, accuracy, time, and interrater agreement were measured. Metrics for clinician performance with and without model augmentation were compared. 
Results: The data set contained 818 examinations from 662 unique patients with 328 CTA examinations (40.1%) containing at least 1 intracranial aneurysm and 490 examinations (59.9%) without intracranial aneurysms. The 8 clinicians reading the test set ranged in experience from 2 to 12 years. Augmenting clinicians with artificial intelligence-produced segmentation predictions resulted in clinicians achieving statistically significant improvements in sensitivity, accuracy, and interrater agreement when compared with no augmentation. The clinicians' mean sensitivity increased by 0.059 (95% CI, 0.028-0.091; adjusted P = .01), mean accuracy increased by 0.038 (95% CI, 0.014-0.062; adjusted P = .02), and mean interrater agreement (Fleiss κ) increased by 0.060, from 0.799 to 0.859 (adjusted P = .05). There was no statistically significant change in mean specificity (0.016; 95% CI, -0.010 to 0.041; adjusted P = .16) or in time to diagnosis (5.71 seconds; 95% CI, -7.22 to 18.63 seconds; adjusted P = .19). Conclusions and Relevance: The deep learning model developed successfully detected clinically significant intracranial aneurysms on CTA. This suggests that integration of an artificial intelligence-assisted diagnostic model may augment clinician performance with dependable and accurate predictions and thereby optimize patient care.
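The interrater-agreement statistic reported above is Fleiss' κ. A minimal implementation, assuming the usual count-matrix input (one row per case, one column per diagnosis category, each entry counting how many of the readers chose that category), is:

```python
import numpy as np

def fleiss_kappa(counts):
    """Fleiss' kappa for a (n_cases, n_categories) matrix of rater counts.

    Every row must sum to the same number of raters n.
    """
    counts = np.asarray(counts, dtype=float)
    n = counts[0].sum()                          # raters per case
    p_j = counts.sum(axis=0) / counts.sum()      # overall category shares
    # per-case agreement: pairs of raters who agree / all rater pairs
    P_i = ((counts ** 2).sum(axis=1) - n) / (n * (n - 1))
    P_bar = P_i.mean()                           # observed agreement
    P_e = (p_j ** 2).sum()                       # chance agreement
    return (P_bar - P_e) / (1 - P_e)

# 8 readers, aneurysm-present vs aneurysm-absent, 4 hypothetical cases
k = fleiss_kappa([[8, 0], [0, 8], [7, 1], [2, 6]])
```

Perfect agreement on every case yields κ = 1, agreement no better than chance yields κ ≤ 0; values around 0.8, as in the study, indicate substantial agreement.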


Subject(s)
Deep Learning, Intracranial Aneurysm/diagnosis, Clinical Competence/standards, Computer Simulation, Cross-Over Studies, Diagnosis, Computer-Assisted/methods, Female, Humans, Male, Middle Aged, Neurologic Examination/methods, Neurologists/standards, Retrospective Studies
20.
Circ Cardiovasc Qual Outcomes ; 12(3): e005010, 2019 03.
Article in English | MEDLINE | ID: mdl-30857410

ABSTRACT

BACKGROUND: The absolute risk reduction (ARR) in cardiovascular events from therapy is generally assumed to be proportional to baseline risk-such that high-risk patients benefit most. Yet newer analyses have proposed using randomized trial data to develop models that estimate individual treatment effects. We tested 2 hypotheses: first, that models of individual treatment effects would reveal that benefit from intensive blood pressure therapy is proportional to baseline risk; and second, that a machine learning approach designed to predict heterogeneous treatment effects-the X-learner meta-algorithm-is equivalent to a conventional logistic regression approach. METHODS AND RESULTS: We compared conventional logistic regression to the X-learner approach for prediction of 3-year cardiovascular disease event risk reduction from intensive (target systolic blood pressure <120 mm Hg) versus standard (target <140 mm Hg) blood pressure treatment, using individual participant data from the SPRINT (Systolic Blood Pressure Intervention Trial; N=9361) and ACCORD BP (Action to Control Cardiovascular Risk in Diabetes Blood Pressure; N=4733) trials. Each model incorporated 17 covariates, an indicator for treatment arm, and interaction terms between covariates and treatment. Logistic regression had lower C statistic for benefit than the X-learner (0.51 [95% CI, 0.49-0.53] versus 0.60 [95% CI, 0.58-0.63], respectively). Following the logistic regression's recommendation for individualized therapy produced restricted mean time until cardiovascular disease event of 1065.47 days (95% CI, 1061.04-1069.35), while following the X-learner's recommendation improved mean time until cardiovascular disease event to 1068.71 days (95% CI, 1065.42-1072.08). 
Calibration was worse for logistic regression; it overestimated the ARR attributable to intensive treatment (slope between predicted and observed ARR of 0.73 [95% CI, 0.30-1.14] versus 1.06 [95% CI, 0.74-1.32] for the X-learner, compared with the ideal of 1). Predicted ARRs using logistic regression were generally proportional to baseline pretreatment cardiovascular risk, whereas the X-learner observed, correctly, that individual treatment effects were often not proportional to baseline risk. CONCLUSIONS: Predictions for individual treatment effects from trial data reveal that patients may experience ARRs not simply proportional to baseline cardiovascular disease risk. Machine learning methods may improve discrimination and calibration of individualized treatment effect estimates from clinical trial data. CLINICAL TRIAL REGISTRATION: URL: https://www.clinicaltrials.gov . Unique identifiers: NCT01206062; NCT00000620.
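The X-learner meta-algorithm compared above proceeds in three stages: fit outcome models on the treated and control arms separately, impute each participant's individual treatment effect from the opposite arm's model, then regress those imputed effects on the covariates and blend the two resulting effect models by the propensity score. The sketch below is a minimal numpy version with plain OLS base learners and synthetic trial data; the published analysis used richer base learners and real SPRINT/ACCORD BP data.

```python
import numpy as np

def ols_fit(X, y):
    Xb = np.column_stack([np.ones(len(X)), X])
    return np.linalg.lstsq(Xb, y, rcond=None)[0]

def ols_predict(beta, X):
    return np.column_stack([np.ones(len(X)), X]) @ beta

def x_learner(X, y, t, g=0.5):
    """X-learner with OLS base learners.

    t: 0/1 treatment indicator; g: propensity score (0.5 for a 1:1 RCT).
    """
    X1, y1 = X[t == 1], y[t == 1]
    X0, y0 = X[t == 0], y[t == 0]
    mu1, mu0 = ols_fit(X1, y1), ols_fit(X0, y0)   # stage 1: outcome models
    d1 = y1 - ols_predict(mu0, X1)                # stage 2: imputed effects
    d0 = ols_predict(mu1, X0) - y0
    tau1, tau0 = ols_fit(X1, d1), ols_fit(X0, d0) # stage 3: effect models
    return g * ols_predict(tau0, X) + (1 - g) * ols_predict(tau1, X)

# Synthetic RCT whose true effect 1 + 2*x0 is heterogeneous by design
rng = np.random.default_rng(0)
X = rng.normal(size=(4000, 3))
t = rng.integers(0, 2, size=4000)
tau_true = 1 + 2 * X[:, 0]
y = X @ np.array([0.5, -0.3, 0.2]) + t * tau_true + rng.normal(scale=0.1, size=4000)
tau_hat = x_learner(X, y, t)
```

Because the effect is modeled directly in stage 3 rather than as the difference of two outcome models, the X-learner can recover effects that vary with covariates even when they are not proportional to baseline risk.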


Subject(s)
Antihypertensive Agents/therapeutic use, Blood Pressure/drug effects, Data Mining, Hypertension/drug therapy, Machine Learning, Aged, Antihypertensive Agents/adverse effects, Female, Humans, Hypertension/diagnosis, Hypertension/physiopathology, Male, Middle Aged, Randomized Controlled Trials as Topic, Risk Assessment, Risk Factors, Treatment Outcome