ABSTRACT
RATIONALE: Estimating the causal effect of an intervention at individual level, also called individual treatment effect (ITE), may help in identifying response prior to the intervention. OBJECTIVES: We aimed to develop machine learning (ML) models which estimate ITE of an intervention using data from randomised controlled trials and illustrate this approach with prediction of ITE on annual chronic obstructive pulmonary disease (COPD) exacerbation rates. METHODS: We used data from 8151 patients with COPD of the Study to Understand Mortality and MorbidITy in COPD (SUMMIT) trial (NCT01313676) to address the ITE of fluticasone furoate/vilanterol (FF/VI) versus control (placebo) on exacerbation rate and developed a novel metric, Q-score, for assessing the power of causal inference models. We then validated the methodology on 5990 subjects from the InforMing the PAthway of COPD Treatment (IMPACT) trial (NCT02164513) to estimate the ITE of FF/umeclidinium/VI (FF/UMEC/VI) versus UMEC/VI on exacerbation rate. We used Causal Forest as the causal inference model. RESULTS: In SUMMIT, Causal Forest was optimised on the training set (n=5705) and tested on 2446 subjects (Q-score 0.61). In IMPACT, Causal Forest was optimised on 4193 subjects in the training set and tested on 1797 individuals (Q-score 0.21). In both trials, the quantiles of patients with the strongest ITE consistently demonstrated the largest reductions in observed exacerbation rates (0.54 and 0.53, p<0.001). Poor lung function and blood eosinophils, respectively, were the strongest predictors of ITE. CONCLUSIONS: This study shows that ML models for causal inference can be used to identify individual response to different COPD treatments and highlight treatment traits. Such models could become clinically useful tools for individual treatment decisions in COPD.
Subject(s)
Lung , Chronic Obstructive Pulmonary Disease , Humans , Inhalation Administration , Chronic Obstructive Pulmonary Disease/drug therapy , Androstadienes/therapeutic use , Androstadienes/pharmacology , Benzyl Alcohols/therapeutic use , Benzyl Alcohols/pharmacology , Chlorobenzenes/therapeutic use , Chlorobenzenes/pharmacology , Bronchodilator Agents/therapeutic use , Drug Combinations , Double-Blind Method , Treatment Outcome , Randomized Controlled Trials as Topic
ABSTRACT
BACKGROUND: Few studies have investigated the collaborative potential between artificial intelligence (AI) and pulmonologists for diagnosing pulmonary disease. We hypothesised that the collaboration between a pulmonologist and AI with explanations (explainable AI (XAI)) is superior in diagnostic interpretation of pulmonary function tests (PFTs) than the pulmonologist without support. METHODS: The study was conducted in two phases, a monocentre study (phase 1) and a multicentre intervention study (phase 2). Each phase utilised two different sets of 24 PFT reports of patients with a clinically validated gold standard diagnosis. Each PFT was interpreted without (control) and with XAI's suggestions (intervention). Pulmonologists provided a differential diagnosis consisting of a preferential diagnosis and optionally up to three additional diagnoses. The primary end-point compared accuracy of preferential and additional diagnoses between control and intervention. Secondary end-points were the number of diagnoses in differential diagnosis, diagnostic confidence and inter-rater agreement. We also analysed how XAI influenced pulmonologists' decisions. RESULTS: In phase 1 (n=16 pulmonologists), mean preferential and differential diagnostic accuracy significantly increased by 10.4% and 9.4%, respectively, between control and intervention (p<0.001). Improvements were somewhat lower but highly significant (p<0.0001) in phase 2 (5.4% and 8.7%, respectively; n=62 pulmonologists). In both phases, the number of diagnoses in the differential diagnosis did not reduce, but diagnostic confidence and inter-rater agreement significantly increased during intervention. Pulmonologists updated their decisions with XAI's feedback and consistently improved their baseline performance if AI provided correct predictions. CONCLUSION: A collaboration between a pulmonologist and XAI is better at interpreting PFTs than individual pulmonologists reading without XAI support or XAI alone.
Subject(s)
Artificial Intelligence , Lung Diseases , Humans , Pulmonologists , Respiratory Function Tests , Lung Diseases/diagnosis
ABSTRACT
BACKGROUND: Parameters from maximal expiratory flow-volume curves (MEFVC) have been linked to CT-based parameters of COPD. However, the association between MEFVC shape and phenotypes like emphysema, small airways disease (SAD) and bronchial wall thickening (BWT) has not been investigated. RESEARCH QUESTION: We analyzed if the shape of MEFVC can be linked to CT-determined emphysema, SAD and BWT in a large cohort of COPDGene participants. STUDY DESIGN AND METHODS: In the COPDGene cohort, we used principal component analysis (PCA) to extract patterns from MEFVC shape and performed multiple linear regression to assess the association of these patterns with CT parameters over the COPD spectrum, in mild and moderate-severe COPD. RESULTS: Over the entire spectrum, in mild and moderate-severe COPD, principal components of MEFVC were important predictors for the continuous CT parameters. Their contribution to the prediction of emphysema diminished when classical pulmonary function test parameters were added. For SAD, the components remained very strong predictors. The adjusted R2 was higher in moderate-severe COPD, while in mild COPD, the adjusted R2 for all CT outcomes was low; 0.28 for emphysema, 0.21 for SAD and 0.19 for BWT. INTERPRETATION: The shape of the maximal expiratory flow-volume curve as analyzed with PCA is not an appropriate screening tool for early disease phenotypes identified by CT scan. However, it contributes to assessing emphysema and SAD in moderate-severe COPD.
Subject(s)
Emphysema , Chronic Obstructive Pulmonary Disease , Pulmonary Emphysema , Humans , Principal Component Analysis , Smoking , Chronic Obstructive Pulmonary Disease/diagnosis , Chronic Obstructive Pulmonary Disease/genetics , Spirometry , Phenotype , Forced Expiratory Volume
ABSTRACT
The past 5 years have seen an explosion of interest in the use of artificial intelligence (AI) and machine learning techniques in medicine. This has been driven by the development of deep neural networks (DNNs)-complex networks residing in silico but loosely modelled on the human brain-that can process complex input data such as a chest radiograph image and output a classification such as 'normal' or 'abnormal'. DNNs are 'trained' using large banks of images or other input data that have been assigned the correct labels. DNNs have shown the potential to equal or even surpass the accuracy of human experts in pattern recognition tasks such as interpreting medical images or biosignals. Within respiratory medicine, the main applications of AI and machine learning thus far have been the interpretation of thoracic imaging, lung pathology slides and physiological data such as pulmonary function tests. This article surveys progress in this area over the past 5 years, as well as highlighting the current limitations of AI and machine learning and the potential for future developments.
Subject(s)
Artificial Intelligence , Machine Learning , Pulmonary Medicine , Humans
ABSTRACT
RATIONALE: While American Thoracic Society (ATS)/European Respiratory Society (ERS) quality control criteria for spirometry include several quantitative limits, they also require manual visual inspection. The current approach is time consuming and leads to high intertechnician variability. We propose a deep-learning approach, a convolutional neural network (CNN), to standardise spirometric manoeuvre acceptability and usability. METHODS: In 36 873 curves from the National Health and Nutritional Examination Survey USA 2011-2012, technicians labelled 54% of curves as meeting ATS/ERS 2005 acceptability criteria with satisfactory start and end of test, but identified 93% of curves with a usable forced expiratory volume in 1 s. We processed raw data into images of maximal expiratory flow-volume curve (MEFVC), calculated ATS/ERS quantifiable criteria and developed CNNs to determine manoeuvre acceptability and usability on 90% of the curves. The models were tested on the remaining 10% of curves. We calculated Shapley values to interpret the models. RESULTS: In the test set (n=3738), CNN showed an accuracy of 87% for acceptability and 92% for usability, with the latter demonstrating a high sensitivity (92%) and specificity (96%). They were significantly superior (p<0.0001) to ATS/ERS quantifiable rule-based models. Shapley interpretation revealed MEFVC<1 s (MEFVC pattern within first second of exhalation) and plateau in volume-time were most important in determining acceptability, while MEFVC<1 s entirely determined usability. CONCLUSION: The CNNs identified relevant attributes in spirometric curves to standardise ATS/ERS manoeuvre acceptability and usability recommendations, and further provide individual manoeuvre feedback. Our algorithm combines the visual experience of skilled technicians and ATS/ERS quantitative rules in automating the critical phase of spirometry quality control.
Subject(s)
Deep Learning , Algorithms , Exhalation , Forced Expiratory Volume , Humans , Spirometry , United States , Vital Capacity
ABSTRACT
The interpretation of pulmonary function tests (PFTs) to diagnose respiratory diseases is built on expert opinion that relies on the recognition of patterns and the clinical context for detection of specific diseases. In this study, we aimed to explore the accuracy and interrater variability of pulmonologists when interpreting PFTs compared with artificial intelligence (AI)-based software that was developed and validated in more than 1500 historical patient cases. 120 pulmonologists from 16 European hospitals evaluated 50 cases with PFT and clinical information, resulting in 6000 independent interpretations. The AI software examined the same data. American Thoracic Society/European Respiratory Society guidelines were used as the gold standard for PFT pattern interpretation. The gold standard for diagnosis was derived from clinical history, PFT and all additional tests. The pattern recognition of PFTs by pulmonologists (senior 73%, junior 27%) matched the guidelines in 74.4±5.9% of the cases (range 56-88%). The interrater variability of κ=0.67 pointed to a common agreement. Pulmonologists made correct diagnoses in 44.6±8.7% of the cases (range 24-62%) with a large interrater variability (κ=0.35). The AI-based software perfectly matched the PFT pattern interpretations (100%) and assigned a correct diagnosis in 82% of all cases (p<0.0001 for both measures). The interpretation of PFTs by pulmonologists leads to marked variations and errors. AI-based software provides more accurate interpretations and may serve as a powerful decision support tool to improve clinical practice.
Subject(s)
Artificial Intelligence , Pulmonary Medicine , Respiratory Function Tests , Adult , Aged , Aged 80 and Over , Female , Humans , Male , Middle Aged , Prospective Studies , Software
ABSTRACT
PURPOSE OF REVIEW: The application of artificial intelligence in the diagnosis of obstructive lung diseases is an exciting phenomenon. Artificial intelligence algorithms work by finding patterns in data obtained from diagnostic tests, which can be used to predict clinical outcomes or to detect obstructive phenotypes. The purpose of this review is to describe the latest trends and to discuss the future potential of artificial intelligence in the diagnosis of obstructive lung diseases. RECENT FINDINGS: Machine learning has been successfully used in automated interpretation of pulmonary function tests for differential diagnosis of obstructive lung diseases. Deep learning models such as convolutional neural networks are state-of-the-art for obstructive pattern recognition in computed tomography. Machine learning has also been applied in other diagnostic approaches such as forced oscillation test, breath analysis, lung sound analysis and telemedicine with promising results in small-scale studies. SUMMARY: Overall, the application of artificial intelligence has produced encouraging results in the diagnosis of obstructive lung diseases. However, large-scale studies are still required to validate current findings and to boost its adoption by the medical community.
Subject(s)
Artificial Intelligence , Obstructive Lung Diseases/diagnosis , Obstructive Lung Diseases/physiopathology , Algorithms , Breath Tests , Computer-Assisted Diagnosis , Humans , Machine Learning , Computer Neural Networks , Automated Pattern Recognition , Respiratory Function Tests , Respiratory Sounds , X-Ray Computed Tomography
Subject(s)
Lung , Humans , Cross-Sectional Studies , Reference Values , Spirometry , Vital Capacity , Forced Expiratory Volume
ABSTRACT
BACKGROUND: Specific resistance loops appear in different shapes influenced by different resistive properties of the airways, yet their descriptive ability is compressed to a single parameter - its slope. We aimed to develop new parameters reflecting the various shapes of the loop and to explore their potential in the characterisation of obstructive airways diseases. METHODS: Our study included 134 subjects: Healthy controls (N = 22), Asthma with non-obstructive lung function (N = 22) and COPD of all disease stages (N = 90). Different shapes were described by geometrical and second-order transfer function parameters. RESULTS: Our parameters demonstrated no difference between asthma and healthy controls groups, but were significantly different (p < 0.0001) from the patients with COPD. Grouping mild COPD subjects by an open or not-open shape of the resistance loop revealed significant differences of loop parameters and classical lung function parameters. Multiple logistic regression indicated RV/TLC as the only predictor of loop opening with OR = 1.157, 95% CI (1.064-1.267), p-value = 0.0006 and R2 = 0.35. Inducing airway narrowing in asthma gave equal shape measures as in COPD non-openers, but with a decreased slope (p < 0.0001). CONCLUSION: This study introduces new parameters calculated from the resistance loops which may correlate with different phenotypes of obstructive airways diseases.
Subject(s)
Airway Resistance , Asthma/pathology , Asthma/physiopathology , Biological Models , Chronic Obstructive Pulmonary Disease/pathology , Chronic Obstructive Pulmonary Disease/physiopathology , Adult , Aged , Computer Simulation , Female , Humans , Male , Middle Aged , Nonlinear Dynamics , Plethysmography/methods , Reproducibility of Results , Sensitivity and Specificity
ABSTRACT
BACKGROUND: The use of pulmonary function tests is primarily based on expert opinion and international guidelines. Current interpretation strategies are using predefined cutoffs for the description of a typical pattern. OBJECTIVES: We aimed to explore the predicted disease outcome based on the American Thoracic Society/European Respiratory Society (ATS/ERS) interpreting strategy. Subsequently, we investigated whether an unbiased machine learning framework integrating lung function with clinical variables may provide alternative decision trees resulting in a more accurate diagnosis. METHODS: Our study included data from 968 subjects admitted for the first time to a pulmonary practice. The final clinical diagnosis was based on the combination of complete pulmonary function with the investigations that were decided at the physician's discretion. Clinical diagnoses were separated into 10 different groups and validated by an expert panel. RESULTS: The ATS/ERS algorithm resulted in a correct diagnostic label in 38% of the subjects. Chronic obstructive pulmonary disease (COPD) was detected with an acceptable accuracy (74%), whereas all other diseases were poorly identified. The new data-based decision tree improved the general accuracy to 68% after 10-fold cross-validation when detecting the most common lung diseases, with a significantly higher positive predictive value and sensitivity for COPD, asthma, interstitial lung disease, and neuromuscular disorder (83/78, 66/82, 52/59, and 100/54%, respectively). CONCLUSIONS: Our data show that the current algorithms for lung function interpretation can be improved by a computer-based choice of lung function and clinical variables and their decision-making thresholds.
Subject(s)
Asthma/diagnosis , Automation , Interstitial Lung Diseases/diagnosis , Lung/physiopathology , Neuromuscular Diseases/diagnosis , Chronic Obstructive Pulmonary Disease/diagnosis , Respiratory Function Tests , Adult , Aged , Asthma/physiopathology , Case-Control Studies , Female , Forced Expiratory Volume , Humans , Interstitial Lung Diseases/physiopathology , Machine Learning , Male , Middle Aged , Neuromuscular Diseases/physiopathology , Pulmonary Diffusing Capacity , Chronic Obstructive Pulmonary Disease/physiopathology , Total Lung Capacity , Vital Capacity
ABSTRACT
BACKGROUND: Airway resistance (RAW) and specific airway conductance (sGAW) are measures that reflect the patency of airways. Little is known of the variability of these measures between different lung diseases. This study investigated the contribution of RAW and sGAW to a diagnosis of obstructive airways disease and their role in differentiating asthma from COPD. METHODS: 976 subjects admitted for the first time to a pulmonary practice in Belgium were included. Clinical diagnoses were based on complete pulmonary function tests and supported by investigations of physicians' discretion. 651 subjects had a final diagnosis of obstructive diseases, 168 had another respiratory disease and 157 subjects had no respiratory disease (healthy controls). RESULTS: RAW and sGAW were significantly different (p < 0.0001) between obstructive and other groups. Abnormal RAW and sGAW were found in 39 % and 18 % of the population, respectively, in which 81 % and 90 % had diagnosed airway obstruction. Multiple regression revealed sGAW to be a significant and independent predictor of an obstructive disorder. To differentiate asthma from COPD, RAW was found to be more relevant and statistically significant. In asthma patients with normal FEV1/FVC ratio, both RAW and sGAW were more specific than sensitive diagnostic tests in differentiating asthma from healthy subjects. CONCLUSIONS: RAW and sGAW are significant factors that contribute to the diagnosis and differentiation of obstructive airways diseases.
Subject(s)
Airway Obstruction/diagnosis , Airway Obstruction/physiopathology , Airway Resistance/physiology , Adult , Aged , Airway Obstruction/epidemiology , Belgium/epidemiology , Cohort Studies , Female , Humans , Male , Middle Aged , Prospective Studies , Respiratory Function Tests/methods
ABSTRACT
BACKGROUND AND OBJECTIVE: The definition of chronic obstructive pulmonary disease (COPD) based on a fixed forced expiratory volume in 1 s (FEV1)/forced vital capacity (FVC) ratio or on the lower limits of FEV1/FVC of a healthy reference population is the subject of continuous debate. We explored whether dynamics of forced expiratory flow decline on spirometry can identify subjects with and without COPD when the two key diagnostic criteria are discordant. METHODS: Four hundred twenty-three individuals with a history of ≥15 pack-years smoking had pulmonary function measurements conducted. A second-order input-output model was used to describe the dynamics of the forced expiration. The capability of the model parameters to predict presence of disease was explored with a support vector machine classifier. In the discordant individuals, newly classified subjects were validated by other pulmonary function tests. RESULTS: In the non-discordant subjects (n = 370), the second-order model was able to confirm a diagnosis of COPD in 95% of subjects (n = 351). In the discordant individuals (n = 53), the classification by dynamic flow analysis found 28 patients to be healthy whereas 25 patients were still classified as COPD. Hyperinflation, increased airways resistance and reduced dynamic volumes were observed in the newly identified COPD group of discordant subjects. When using non-spirometry-based pulmonary function criteria as a standard for correct diagnoses in the individual discordant subjects, the model allocated 68% (n = 36) of the discordant to a correct diagnosis. CONCLUSIONS: Expiratory flow dynamics can detect airflow limitation and indicate the presence of COPD. In discordant subjects, our methodology allows a better identification of subjects with or without characteristics of COPD.
Subject(s)
Chronic Obstructive Pulmonary Disease/diagnosis , Spirometry/methods , Aged , Female , Forced Expiratory Volume/physiology , Humans , Lung/physiopathology , Male , Middle Aged , Chronic Obstructive Pulmonary Disease/physiopathology , Respiratory Function Tests , Smoking/physiopathology , Vital Capacity/physiology
ABSTRACT
INTRODUCTION: Spirometry is a point-of-care lung function test that helps support the diagnosis and monitoring of chronic lung disease. The quality and interpretation accuracy of spirometry is variable in primary care. This study aims to evaluate whether artificial intelligence (AI) decision support software improves the performance of primary care clinicians in the interpretation of spirometry, against reference standard (expert interpretation). METHODS AND ANALYSIS: A parallel, two-group, statistician-blinded, randomised controlled trial of primary care clinicians in the UK, who refer for, or interpret, spirometry. People with specialist training in respiratory medicine to consultant level were excluded. A minimum target of 228 primary care clinician participants will be randomised with a 1:1 allocation to assess fifty de-identified, real-world patient spirometry sessions through an online platform either with (intervention group) or without (control group) AI decision support software report. Outcomes will cover primary care clinicians' spirometry interpretation performance including measures of technical quality assessment, spirometry pattern recognition and diagnostic prediction, compared with reference standard. Clinicians' self-rated confidence in spirometry interpretation will also be evaluated. The primary outcome is the proportion of the 50 spirometry sessions where the participant's preferred diagnosis matches the reference diagnosis. Unpaired t-tests and analysis of covariance will be used to estimate the difference in primary outcome between intervention and control groups. ETHICS AND DISSEMINATION: This study has been reviewed and given favourable opinion by Health Research Authority Wales (reference: 22/HRA/5023). Results will be submitted for publication in peer-reviewed journals, presented at relevant national and international conferences, disseminated through social media, patient and public routes and directly shared with stakeholders. 
TRIAL REGISTRATION NUMBER: NCT05933694.
Subject(s)
Artificial Intelligence , Primary Health Care , Spirometry , Humans , Clinical Decision Support Systems , Randomized Controlled Trials as Topic , Software , Spirometry/methods , United Kingdom
ABSTRACT
BACKGROUND: Spirometric parameters are the mainstay for diagnosis of COPD, but cannot distinguish airway obstruction from emphysema. We aimed to develop a computer model that quantifies airway collapse on forced expiratory flow-volume loops. We then explored and validated the relationship of airway collapse with computed tomography (CT) diagnosed emphysema in two large independent cohorts. METHODS: A computer model was developed in 513 Caucasian individuals with ≥15 pack-years who performed spirometry, diffusion capacity and CT scans to quantify emphysema presence. The model computed the two best fitting regression lines on the expiratory phase of the flow-volume loop and calculated the angle between them. The collapse was expressed as an Angle of collapse (AC) which was then correlated with the presence of emphysema. Findings were validated in an independent group of 340 individuals. RESULTS: AC in emphysema subjects (N = 251) was significantly lower (131° ± 14°) compared to AC in subjects without emphysema (N = 223), (152° ± 10°) (p < 0.0001). Multivariate regression analysis revealed AC as best indicator of visually scored emphysema (R2 = 0.505, p < 0.0001) with little significant contribution of KCO, %predicted and FEV1, %predicted to the total model (total R2 = 0.626, p < 0.0001). Similar associations were obtained when using CT-automated density scores for emphysema assessment. Receiver operating characteristic (ROC) curves pointed to 131° as the best cut-off for emphysema (95.5% positive predictive value, 97% specificity and 51% sensitivity). Validation in a second group confirmed the significant difference in mean AC between emphysema and non-emphysema subjects. When applying the 131° cut-off, a positive predictive value of 95.6%, a specificity of 96% and a sensitivity of 59% were demonstrated. CONCLUSIONS: Airway collapse on forced expiration quantified by a computer model correlates with emphysema. 
An AC below 131° can be considered as a specific cut-off for predicting the presence of emphysema in heavy smokers.
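The two-line fit described in the methods above can be illustrated with a minimal sketch. It assumes a brute-force search over breakpoints and ordinary least squares on each side; the abstract does not state how the authors chose the two lines or how the axes were scaled, so the function names, the `min_points` parameter and the synthetic data are hypothetical. The computed angle depends on the units used for flow and volume.

```python
import math

def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope*x + intercept; returns (slope, intercept, sse)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    sse = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    return slope, intercept, sse

def angle_of_collapse(volume, flow, min_points=3):
    """Try every breakpoint, fit one line per side, and return the interior
    angle (degrees) between the pair of lines with the lowest total error.
    Hypothetical helper: not the published model."""
    best = None
    for k in range(min_points, len(volume) - min_points + 1):
        s1, _, e1 = fit_line(volume[:k], flow[:k])
        s2, _, e2 = fit_line(volume[k:], flow[k:])
        if best is None or e1 + e2 < best[0]:
            best = (e1 + e2, s1, s2)
    _, s1, s2 = best
    # Change of direction at the kink; a straight descending limb gives 180 degrees.
    turn = abs(math.degrees(math.atan(s1) - math.atan(s2)))
    return 180.0 - turn

# Synthetic descending limb (illustrative values, not patient data):
# steep fall with slope -4 up to volume 1, then a shallow tail with slope -1.
volume = [0.0, 0.25, 0.5, 0.75, 1.0, 2.0, 3.0, 4.0, 5.0]
flow = [8 - 4 * v if v <= 1 else 5 - v for v in volume]
print(round(angle_of_collapse(volume, flow), 1))  # about 149.0 for these slopes
```

A sharper kink between the early and late expiratory phases gives a smaller angle, which matches the direction of the reported group difference (lower AC in emphysema).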
Subject(s)
Computer Simulation , Emphysema/diagnosis , Emphysema/physiopathology , Forced Expiratory Volume/physiology , Chronic Obstructive Pulmonary Disease/diagnosis , Chronic Obstructive Pulmonary Disease/physiopathology , X-Ray Computed Tomography , Aged , Algorithms , Cohort Studies , Differential Diagnosis , Female , Humans , Male , Middle Aged , Predictive Value of Tests , Regression Analysis , Reproducibility of Results , Sensitivity and Specificity , Smoking , Spirometry
ABSTRACT
Background and objective: Spirometry patterns can suggest that a patient has a restrictive ventilatory impairment; however, lung volume measurements such as total lung capacity (TLC) are required to confirm the diagnosis. The aim of the study was to train a supervised machine learning model that can accurately estimate TLC values from spirometry and subsequently identify which patients would most benefit from undergoing a complete pulmonary function test. Methods: We trained three tree-based machine learning models on 51,761 spirometry data points with corresponding TLC measurements. We then compared model performance using an independent test set consisting of 1,402 patients. The best-performing model was used to retrospectively identify restrictive ventilatory impairment in the same test set. The algorithm was compared against different spirometry patterns commonly used to predict restriction. Results: The prevalence of restrictive ventilatory impairment in the test set is 16.7% (234/1402). CatBoost was the best-performing machine learning model. It predicted TLC with a mean squared error (MSE) of 560.1 mL. The sensitivity, specificity, and F1-score of the optimal algorithm for predicting restrictive ventilatory impairment was 83, 92, and 75%, respectively. Conclusion: A machine learning model trained on spirometry data can estimate TLC to a high degree of accuracy. This approach could be used to develop future smart home-based spirometry solutions, which could aid decision making and self-monitoring in patients with restrictive lung diseases.
ABSTRACT
Rationale: Acquiring high-quality spirometry data in clinical trials is important, particularly when using forced expiratory volume in 1 s or forced vital capacity as primary end-points. In addition to quantitative criteria, the American Thoracic Society (ATS)/European Respiratory Society (ERS) standards include subjective evaluation which introduces inter-rater variability and potential mistakes. We explored the value of artificial intelligence (AI)-based software (ArtiQ.QC) to assess spirometry quality and compared it to traditional over-reading control. Methods: A random sample of 2000 sessions (8258 curves) was selected from Chiesi COPD and asthma trials (n=1000 per disease). Acceptability using the 2005 ATS/ERS standards was determined by over-reader review and by ArtiQ.QC. Additionally, three respiratory physicians jointly reviewed a subset of curves (n=150). Results: The majority of curves (n=7267, 88%) were of good quality. The AI agreed with over-readers in 91% of cases, with 97% sensitivity and 93% positive predictive value. Performance was significantly better in the asthma group. In the revised subset, n=50 curves were repeated to assess intra-rater reliability (κ=0.83, 0.86 and 0.80 for each of the three reviewers). All reviewers agreed on 63% of 100 unique tests (κ=0.5). When reviewers set the consensus (gold standard), individual agreement with it was 88%, 94% and 70%. The agreement between AI and "gold-standard" was 73%; over-reader agreement was 46%. Conclusion: AI-based software can be used to measure spirometry data quality with accuracy comparable to that of experts. The assessment is a subjective exercise, with intra- and inter-rater variability even when the criteria are defined very precisely and objectively. By providing consistent results and immediate feedback to the sites, AI may benefit clinical trial conduct and variability reduction.
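The κ values reported above are chance-corrected agreement statistics. As a minimal sketch, the function below computes Cohen's kappa for two sets of labels, for example one reviewer's first and repeat reading of the same curves. The abstract does not say which kappa variant was used, so this choice, the function name and the example labels are assumptions for illustration only.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: (observed agreement - chance agreement) / (1 - chance agreement)."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    # Chance agreement from each rater's marginal label frequencies.
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1.0 - expected)

# Illustrative labels only (A = acceptable, R = rejected); not study data.
first_read = ["A", "A", "A", "A", "A", "R", "R", "R", "R", "R"]
second_read = ["A", "A", "A", "A", "R", "R", "R", "R", "R", "A"]
print(cohens_kappa(first_read, second_read))
```

In this toy example the raw agreement is 80%, but half of that would be expected by chance, so kappa is about 0.6. This is why a kappa can look modest even when the percentage agreement is high.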
ABSTRACT
Background and aims: Pulmonary hypertension due to left heart disease (PH-LHD) is the most frequent form of PH. As differential diagnosis with pulmonary arterial hypertension (PAH) has therapeutic implications, it is important to accurately and noninvasively differentiate PH-LHD from PAH before referral to PH centres. The aim was to develop and validate a machine learning (ML) model to improve prediction of PH-LHD in a population of PAH and PH-LHD patients. Methods: Noninvasive PH-LHD predictors from 172 PAH and 172 PH-LHD patients from the PH centre database at the University Hospitals of Leuven (Leuven, Belgium) were used to develop an ML model. The Jacobs score was used as performance benchmark. The dataset was split into a training and test set (70:30) and the best model was selected after 10-fold cross-validation on the training dataset (n=240). The final model was externally validated using 165 patients (91 PAH, 74 PH-LHD) from Erasme Hospital (Brussels, Belgium). Results: In the internal test dataset (n=104), a random forest-based model correctly diagnosed 70% of PH-LHD patients (sensitivity: n=35/50), with 100% positive predictive value, 78% negative predictive value and 100% specificity. The model outperformed the Jacobs score, which identified 18% (n=9/50) of the patients with PH-LHD without false positives. In external validation, the model had 64% sensitivity at 100% specificity, while the Jacobs score had a sensitivity of 3% for no false positives. Conclusions: ML significantly improves the sensitivity of PH-LHD prediction at 100% specificity. Such a model may substantially reduce the number of patients referred for invasive diagnostics without missing PAH diagnoses.
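The internal test results above can be checked with simple confusion-matrix arithmetic. The sketch below reconstructs the counts from the abstract under stated assumptions: the test set of 104 contains 50 PH-LHD patients (from "n=35/50"), the remaining 54 are taken to be PAH, and 100% specificity implies no false positives. The function name is hypothetical.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Standard diagnostic accuracy measures from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),  # share of diseased patients detected
        "specificity": tn / (tn + fp),  # share of non-diseased correctly cleared
        "ppv": tp / (tp + fp),          # positive predictive value
        "npv": tn / (tn + fn),          # negative predictive value
    }

# Counts inferred from the abstract (assumption: 104 - 50 = 54 PAH patients).
internal = diagnostic_metrics(tp=35, fp=0, fn=15, tn=54)
for name, value in internal.items():
    print(f"{name}: {value:.2f}")
# sensitivity 0.70, specificity 1.00, ppv 1.00, npv 0.78
```

With these inferred counts the negative predictive value is 54/69, which rounds to the 78% reported in the results.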
ABSTRACT
BACKGROUND: Spirometry services to diagnose and monitor lung disease in primary care were identified as a priority in the NHS Long Term Plan, and are restarting post-COVID-19 pandemic in England; however, evidence regarding best practice is limited. AIM: To explore perspectives on spirometry provision in primary care, and the potential for artificial intelligence (AI) decision support software to aid quality and interpretation. DESIGN AND SETTING: Semi-structured interviews with stakeholders in spirometry services across England. METHOD: Participants were recruited by snowball sampling. Interviews explored the pre-pandemic delivery of spirometry, restarting of services, and perceptions of the role of AI. Transcripts were analysed thematically. RESULTS: In total, 28 participants (mean years' clinical experience = 21.6 [standard deviation 9.4, range 3-40]) were interviewed between April and June 2022. Participants included clinicians (n = 25) and commissioners (n = 3); eight held regional and/or national respiratory network advisory roles. Four themes were identified: 1) historical challenges in provision of spirometry services; 2) inequity in post-pandemic spirometry provision and challenges to restarting spirometry in primary care; 3) future delivery closer to patients' homes by appropriately trained staff; and 4) the potential for AI to have supportive roles in spirometry. CONCLUSION: Stakeholders highlighted historic challenges and the damaging effects of the pandemic contributing to inequity in provision of spirometry, which must be addressed. Overall, stakeholders were positive about the potential of AI to support clinicians in quality assessment and interpretation of spirometry. However, it was evident that validation of the software must be sufficiently robust for clinicians and healthcare commissioners to have trust in the process.