ABSTRACT
Policymakers must make management decisions despite incomplete knowledge and conflicting model projections. Little guidance exists for the rapid, representative, and unbiased collection of policy-relevant scientific input from independent modeling teams. Integrating approaches from decision analysis, expert judgment, and model aggregation, we convened multiple modeling teams to evaluate COVID-19 reopening strategies for a mid-sized United States county early in the pandemic. Projections from seventeen distinct models were inconsistent in magnitude but highly consistent in ranking interventions. The 6-mo-ahead aggregate projections were well in line with observed outbreaks in mid-sized US counties. The aggregate results showed that up to half the population could be infected with full workplace reopening, while workplace restrictions reduced median cumulative infections by 82%. Rankings of interventions were consistent across public health objectives, but there was a strong trade-off between public health outcomes and duration of workplace closures, and no win-win intermediate reopening strategies were identified. Between-model variation was high; the aggregate results thus provide valuable risk quantification for decision making. This approach can be applied to the evaluation of management interventions in any setting where models are used to inform decision making. This case study demonstrated the utility of our approach and was one of several multimodel efforts that laid the groundwork for the COVID-19 Scenario Modeling Hub, which has provided multiple rounds of real-time scenario projections for situational awareness and decision making to the Centers for Disease Control and Prevention since December 2020.
Subject(s)
COVID-19, Humans, COVID-19/epidemiology, COVID-19/prevention & control, Uncertainty, Disease Outbreaks/prevention & control, Public Health, Pandemics/prevention & control
ABSTRACT
The COVID-19 pandemic has created an urgent need for models that can project epidemic trends, explore intervention scenarios, and estimate resource needs. Here we describe the methodology of Covasim (COVID-19 Agent-based Simulator), an open-source model developed to help address these questions. Covasim includes country-specific demographic information on age structure and population size; realistic transmission networks in different social layers, including households, schools, workplaces, long-term care facilities, and communities; age-specific disease outcomes; and intrahost viral dynamics, including viral-load-based transmissibility. Covasim also supports an extensive set of interventions, including non-pharmaceutical interventions, such as physical distancing and protective equipment; pharmaceutical interventions, including vaccination; and testing interventions, such as symptomatic and asymptomatic testing, isolation, contact tracing, and quarantine. These interventions can incorporate the effects of delays, loss-to-follow-up, micro-targeting, and other factors. Implemented in pure Python, Covasim has been designed with equal emphasis on performance, ease of use, and flexibility: realistic and highly customized scenarios can be run on a standard laptop in under a minute. In collaboration with local health agencies and policymakers, Covasim has already been applied to examine epidemic dynamics and inform policy decisions in more than a dozen countries in Africa, Asia-Pacific, Europe, and North America.
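The agent-based approach Covasim implements can be illustrated with a deliberately minimal sketch; this is a toy susceptible-infectious-recovered simulation with made-up parameters, not Covasim's actual API or calibrated values:

```python
import random

random.seed(1)

N = 2000             # population size
BETA = 0.05          # per-contact transmission probability (illustrative)
CONTACTS = 10        # random contacts per agent per day
RECOVERY_DAYS = 8    # fixed infectious period, in days

# states: 0 = susceptible, 1 = infectious, 2 = recovered
state = [0] * N
days_infected = [0] * N
for seed_case in random.sample(range(N), 10):  # seed initial infections
    state[seed_case] = 1

for day in range(120):
    newly_infected = []
    for i in range(N):
        if state[i] != 1:
            continue
        # each infectious agent meets CONTACTS random others per day
        for j in random.choices(range(N), k=CONTACTS):
            if state[j] == 0 and random.random() < BETA:
                newly_infected.append(j)
        days_infected[i] += 1
        if days_infected[i] >= RECOVERY_DAYS:
            state[i] = 2
    for j in newly_infected:
        state[j] = 1

attack_rate = sum(s != 0 for s in state) / N
print(f"final attack rate: {attack_rate:.2f}")
```

Covasim layers far more structure on top of this core loop: contact networks split by household, school, workplace, and community; age-specific outcomes; and within-host viral dynamics.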
Subject(s)
COVID-19, Models, Biological, SARS-CoV-2, Systems Analysis, Basic Reproduction Number, COVID-19/etiology, COVID-19/prevention & control, COVID-19/transmission, COVID-19 Testing, COVID-19 Vaccines, Computational Biology, Computer Simulation, Contact Tracing, Disease Progression, Hand Disinfection, Host Microbial Interactions, Humans, Masks, Mathematical Concepts, Pandemics, Physical Distancing, Quarantine, Software
ABSTRACT
BACKGROUND: Unusually high snowfall in western Washington State in February 2019 led to widespread school and workplace closures. We assessed the impact of social distancing caused by this extreme weather event on the transmission of respiratory viruses. METHODS: Residual specimens from patients evaluated for acute respiratory illness at hospitals in the Seattle metropolitan area were screened for a panel of respiratory viruses. Transmission models were fit to each virus to estimate the magnitude of the reduction in transmission due to weather-related disruptions. Changes in contact rates and care-seeking were informed by data on local traffic volumes and hospital visits. RESULTS: Disruption in contact patterns reduced effective contact rates during the intervention period by 16 to 95%, and cumulative disease incidence through the remainder of the season by 3 to 9%. Incidence reductions were greatest for viruses that were peaking when the disruption occurred and least for viruses in an early epidemic phase. CONCLUSION: High-intensity, short-duration social distancing measures may substantially reduce total incidence in a respiratory virus epidemic if implemented near the epidemic peak. For SARS-CoV-2, this suggests that short-term disruptions timed near the epidemic peak can prevent COVID-19 deaths even when spread is otherwise out of control.
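The modeling logic described in the Methods, an effective contact rate that drops during the disruption window and thereby lowers cumulative incidence, can be sketched with a minimal deterministic SIR model; the parameter values below are illustrative, not those fitted to the Seattle data:

```python
def sir_cumulative(beta0, gamma, reduction, start, duration, days=200, i0=1e-4):
    """Integrate a simple SIR model (forward Euler) with a temporary
    contact-rate reduction and return cumulative incidence, i.e. the
    fraction of the population ever infected."""
    s, i = 1.0 - i0, i0
    dt = 0.1
    for step in range(int(days / dt)):
        t = step * dt
        # contact rate is reduced only during the disruption window
        beta = beta0 * (1 - reduction) if start <= t < start + duration else beta0
        new_inf = beta * s * i * dt
        s -= new_inf
        i += new_inf - gamma * i * dt
    return 1.0 - s

# Hypothetical epidemic with R0 = 2 (beta0/gamma), and a 2-week, 60%
# contact reduction starting near the epidemic peak.
baseline = sir_cumulative(0.5, 0.25, reduction=0.0, start=0, duration=0)
disrupted = sir_cumulative(0.5, 0.25, reduction=0.6, start=40, duration=14)
print(f"baseline: {baseline:.2f}, with 2-week disruption: {disrupted:.2f}")
```

As the abstract notes, the benefit depends strongly on timing: shifting the disruption window well before or after the peak shrinks the reduction in final size.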
Subject(s)
Epidemics/prevention & control, Physical Distancing, Respiratory Tract Infections/transmission, Respiratory Tract Infections/virology, Weather, COVID-19, Cities, Humans, Incidence, Models, Theoretical, Retrospective Studies, Washington
ABSTRACT
Hepatitis C virus (HCV) afflicts 170 million people and kills 700 000 annually. Vaccination offers the most realistic and cost-effective hope of controlling this epidemic, but despite 25 years of research, no vaccine is available. A major obstacle is HCV's extreme genetic variability and rapid mutational escape from immune pressure. Coupling maximum entropy inference with population dynamics simulations, we have employed a computational approach to translate HCV sequence databases into empirical landscapes of viral fitness and simulate the intrahost evolution of the viral quasispecies over these landscapes. We explicitly model the coupled host-pathogen dynamics by combining agent-based models of viral mutation with stochastically-integrated coupled ordinary differential equations for the host immune response. We validate our model in predicting the mutational evolution of the HCV RNA-dependent RNA polymerase (protein NS5B) within seven individuals for whom longitudinal sequencing data are available. We then use our approach to perform exhaustive in silico evaluation of putative immunogen candidates to rationally design tailored vaccines to simultaneously cripple viral fitness and block mutational escape within two selected individuals. By systematically identifying a small number of promising vaccine candidates, our empirical fitness landscapes and host-pathogen dynamics simulator can guide and accelerate experimental vaccine design efforts.
Subject(s)
Computer Simulation, Hepacivirus/immunology, Hepatitis C/immunology, Host-Pathogen Interactions, Models, Immunological, Viral Hepatitis Vaccines/immunology, Algorithms, CD8-Positive T-Lymphocytes/immunology, CD8-Positive T-Lymphocytes/virology, Evolution, Molecular, Hepacivirus/genetics, Hepacivirus/physiology, Hepatitis C/virology, Humans, Immunity, Mutation, Viral Nonstructural Proteins/genetics, Viral Nonstructural Proteins/immunology
ABSTRACT
Hepatitis C virus (HCV) afflicts 170 million people worldwide, 2%-3% of the global population, and kills 350 000 each year. Prophylactic vaccination offers the most realistic and cost-effective hope of controlling this epidemic in the developing world, where expensive drug therapies are not available. Despite 20 years of research, the high mutability of the virus and lack of knowledge of what constitutes effective immune responses have impeded development of an effective vaccine. Coupling data mining of sequence databases with spin glass models from statistical physics, we have developed a computational approach to translate clinical sequence databases into empirical fitness landscapes quantifying the replicative capacity of the virus as a function of its amino acid sequence. These landscapes explicitly connect viral genotype to phenotypic fitness, and reveal vulnerable immunological targets within the viral proteome that can be exploited to rationally design vaccine immunogens. We have recovered the empirical fitness landscape for the HCV RNA-dependent RNA polymerase (protein NS5B) responsible for viral genome replication, and validated the predictions of our model by demonstrating excellent accord with experimental measurements and clinical observations. We have used our landscapes to perform exhaustive in silico screening of 16.8 million T-cell immunogen candidates to identify 86 optimal formulations. By reducing the search space of immunogen candidates by over five orders of magnitude, our approach can offer valuable savings in time, expense, and labor for experimental vaccine development and accelerate the search for an HCV vaccine. ABBREVIATIONS: HCV-hepatitis C virus, HLA-human leukocyte antigen, CTL-cytotoxic T lymphocyte, NS5B-nonstructural protein 5B, MSA-multiple sequence alignment, PEG-IFN-pegylated interferon.
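The spin-glass construction can be made concrete with a toy example: a maximum-entropy model assigns each sequence an energy from inferred fields and couplings, and lower energy corresponds to higher predicted replicative capacity. The fields and couplings below are invented for illustration; real ones are fitted to multiple sequence alignments:

```python
import itertools
import math

# Hypothetical fields h_i and couplings J_ij for a toy 4-site binary
# sequence model (0 = consensus residue, 1 = mutant).
L = 4
h = [0.5, -0.2, 0.8, 0.1]
J = {(0, 1): -0.4, (1, 2): 0.3, (2, 3): -0.1}

def energy(seq):
    """Ising-like energy: sum of fields plus pairwise couplings."""
    e = sum(h[i] * seq[i] for i in range(L))
    e += sum(j * seq[a] * seq[b] for (a, b), j in J.items())
    return e

def fitness(seq):
    # lower energy -> higher predicted replicative capacity (up to a scale)
    return math.exp(-energy(seq))

# Exhaustively rank all 2^L sequences by predicted fitness
ranked = sorted(itertools.product([0, 1], repeat=L), key=fitness, reverse=True)
best = ranked[0]
print("fittest sequence:", best, "fitness:", round(fitness(best), 3))
```

The exhaustive ranking above is only feasible for toy alphabets; for real proteins the landscape is explored by sampling rather than enumeration.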
Subject(s)
Genetic Fitness, Hepacivirus/immunology, Viral Hepatitis Vaccines/immunology, Viral Proteins/immunology, Computational Biology, Computer Simulation, Genotype, Hepacivirus/genetics, Models, Chemical, Proteome, Sequence Analysis, Protein, Vaccines, Synthetic/genetics, Vaccines, Synthetic/immunology, Viral Hepatitis Vaccines/genetics, Viral Proteins/genetics
ABSTRACT
Survival and second malignancy prediction models can aid clinical decision making. Most commonly, survival analysis studies are performed using traditional proportional hazards models, which require strong assumptions and can lead to biased estimates if those assumptions are violated. Therefore, this study implements an alternative machine learning (ML) model for survival analysis: the Random Survival Forest (RSF). In this study, RSFs were built using the U.S. Surveillance, Epidemiology, and End Results (SEER) database to (1) predict 30-year survival in pediatric, adolescent, and young adult cancer survivors; and (2) predict risk and site of a second tumor within 30 years of the first tumor diagnosis in these age groups. The final RSF model for pediatric, adolescent, and young adult survival has an average concordance index (C-index) of 92.9%, 94.2%, and 94.4% and an average time-dependent area under the receiver operating characteristic curve (AUC) at 30 years since first diagnosis of 90.8%, 93.6%, and 96.1%, respectively. The final RSF model for pediatric, adolescent, and young adult second malignancy has an average C-index of 86.8%, 85.2%, and 88.6% and an average time-dependent AUC at 30 years since first diagnosis of 76.5%, 88.1%, and 99.0%, respectively. This study suggests the robustness and potential clinical value of ML models to alleviate physician burden by quickly identifying the highest-risk individuals.
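The concordance index used above has a direct pairwise definition: among all comparable pairs (one subject observed to fail before the other is known to survive past that time), it is the fraction in which the model assigns the earlier failure the higher risk. A minimal sketch for right-censored data:

```python
def c_index(times, events, risks):
    """Harrell's concordance index for right-censored data.
    times: observed times; events: 1 if the event occurred, 0 if censored;
    risks: model risk scores (higher = expected earlier event)."""
    concordant, comparable = 0.0, 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # pair is comparable only if subject i had the event
            # strictly before subject j's observed time
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5    # ties count half
    return concordant / comparable

# Toy data: risks perfectly anti-ordered with survival time
times = [2, 4, 6, 8, 10]
events = [1, 1, 0, 1, 0]
risks = [0.9, 0.7, 0.6, 0.4, 0.1]
print(c_index(times, events, risks))  # 1.0 for perfectly ranked risks
```

This quadratic-time version is fine for small cohorts; production implementations use sorted structures for large registries like SEER.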
Subject(s)
Cancer Survivors, Neoplasms, Second Primary, Neoplasms, Humans, Child, Adolescent, Young Adult, Neoplasms, Second Primary/epidemiology, Proportional Hazards Models, Survival Analysis, Neoplasms/diagnosis, Neoplasms/epidemiology
ABSTRACT
Despite large investment, cancer continues to be a major source of mortality and morbidity throughout the world. Traditional methods of detection and diagnosis, such as biopsy and imaging, tend to be expensive and carry risks of complications. As data become more abundant and machine learning continues advancing, it is natural to ask how these tools can help solve some of these problems. In this paper we show that, using a person's personal health data, it is possible to predict their risk for a wide variety of cancers. We dub this process a "statistical biopsy." Specifically, we train two neural networks, one predicting risk for 16 different cancer types in females and the other predicting risk for 15 different cancer types in males. The networks were trained as binary classifiers identifying individuals that were diagnosed with the different cancer types within 5 years of joining the PLCO trial. However, rather than use the binary output of the classifiers, we show that the continuous output can instead be used as a cancer risk score, allowing a holistic look at an individual's cancer risks. We tested our multi-cancer model on the UK Biobank dataset, showing that for most cancers the predictions generalized well and that assessing multiple cancer risks at once from personal health data is a possibility. While the statistical biopsy will not replace traditional biopsies for diagnosing cancers, we hope it can prompt a paradigm shift in how statistical models are used in cancer detection, moving toward something more powerful and more personalized than general population screening guidelines.
ABSTRACT
Initial COVID-19 containment in the United States focused on limiting mobility, including school and workplace closures. However, these interventions have had enormous societal and economic costs. Here, we demonstrate the feasibility of an alternative control strategy, test-trace-quarantine: routine testing of primarily symptomatic individuals, tracing and testing their known contacts, and placing their contacts in quarantine. We perform this analysis using Covasim, an open-source agent-based model, which has been calibrated to detailed demographic, mobility, and epidemiological data for the Seattle region from January through June 2020. With current levels of mask use and schools remaining closed, we find that high but achievable levels of testing and tracing are sufficient to maintain epidemic control even under a return to full workplace and community mobility and with low vaccine coverage. The easing of mobility restrictions in June 2020 and subsequent scale-up of testing and tracing programs through September provided real-world validation of our predictions. Although we show that test-trace-quarantine can control the epidemic in both theory and practice, its success is contingent on high testing and tracing rates, high quarantine compliance, relatively short testing and tracing delays, and moderate to high mask use. Thus, in order for test-trace-quarantine to control transmission with a return to high mobility, strong performance in all aspects of the program is required.
Subject(s)
COVID-19/prevention & control, COVID-19/transmission, Contact Tracing/methods, Quarantine/methods, Humans, SARS-CoV-2/isolation & purification, United States
ABSTRACT
The acceptance-rejection technique has been widely used in several Monte Carlo simulation packages for Rayleigh scattering of photons. However, the models implemented in these packages might fail to reproduce the corresponding experimental and theoretical results. The discrepancy is attributed to the fact that all current simulations implement an elastic scattering model for the angular distribution of photons without considering anomalous scattering effects. In this study, a novel Rayleigh scattering model using anomalous scattering factors based on the inverse-sampling technique is presented. Its performance was evaluated against other simulation algorithms in terms of simulation accuracy and computational efficiency. The computational efficiency was tested with a general-purpose Monte Carlo package named Particle Transport in Media (PTM). The evaluation showed that a Monte Carlo model using both atomic form factors and anomalous scattering factors for the angular distribution of photons (instead of the atomic form factors alone) produced Rayleigh scattering results in closer agreement with experimental data. The comparison and evaluation confirmed that the inverse-sampling technique using atomic form factors and anomalous scattering factors exhibited improved computational efficiency and performed best in reproducing experimental measurements and related scattering-matrix calculations. Furthermore, using this model to sample coherent scattering may provide scientific insight into more complex systems.
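Inverse sampling draws a uniform variate and maps it through the inverse cumulative distribution function, avoiding the rejection loop of acceptance-rejection sampling. For an analytically invertible CDF this is closed-form; for a distribution tabulated on a grid, as when the angular CDF is built from form factors, it becomes a binary search plus interpolation. A generic sketch, not the PTM implementation:

```python
import math
import random

random.seed(0)

def sample_exponential(lam):
    """Closed-form inverse transform sampling for an exponential law:
    CDF F(x) = 1 - exp(-lam*x)  =>  x = -ln(1 - u) / lam."""
    u = random.random()
    return -math.log(1.0 - u) / lam

def sample_tabulated(xs, cdf):
    """Inverse transform sampling from a CDF tabulated on a grid:
    bisection to locate the bracketing interval, then linear interpolation.
    cdf must be increasing from 0 to 1 over the grid xs."""
    u = random.random()
    lo, hi = 0, len(xs) - 1
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if cdf[mid] < u:
            lo = mid
        else:
            hi = mid
    frac = (u - cdf[lo]) / (cdf[hi] - cdf[lo])
    return xs[lo] + frac * (xs[hi] - xs[lo])

samples = [sample_exponential(2.0) for _ in range(100_000)]
print(f"sample mean: {sum(samples) / len(samples):.3f}  (expected 0.5)")
```

Because each draw costs a fixed number of operations, all GPU threads do the same work, which is exactly why inverse sampling sidesteps the thread divergence of rejection loops.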
ABSTRACT
While colorectal cancer (CRC) is third in prevalence and mortality among cancers in the United States, there is no effective method to screen the general public for CRC risk. In this study, to identify an effective mass screening method for CRC risk, we evaluated seven supervised machine learning algorithms: linear discriminant analysis, support vector machine, naive Bayes, decision tree, random forest, logistic regression, and artificial neural network. Models were trained and cross-tested with the National Health Interview Survey (NHIS) and the Prostate, Lung, Colorectal, and Ovarian (PLCO) Cancer Screening Trial datasets. Six imputation methods were used to handle missing data: mean, Gaussian, Lorentzian, one-hot encoding, Gaussian expectation-maximization, and listwise deletion. Among all of the model configuration and imputation method combinations, the artificial neural network with expectation-maximization imputation emerged as the best, having a concordance of 0.70 ± 0.02, sensitivity of 0.63 ± 0.06, and specificity of 0.82 ± 0.04. In stratifying CRC risk in the NHIS and PLCO datasets, only 2% of negative cases were misclassified as high risk and 6% of positive cases were misclassified as low risk. In modeling the CRC-free probability with Kaplan-Meier estimators, the low-, medium-, and high-risk groups show statistically significant separation. Our results indicate that the trained artificial neural network can be used as an effective screening tool for early intervention and prevention of CRC in large populations.
ABSTRACT
Incidence and mortality rates of endometrial cancer are increasing, leading to increased interest in endometrial cancer risk prediction and stratification to help in screening and prevention. Previous risk models have had moderate success with the area under the curve (AUC) ranging from 0.68 to 0.77. Here we demonstrate a population-based machine learning model for endometrial cancer screening that achieves a testing AUC of 0.96. We train seven machine learning algorithms based solely on personal health data, without any genomic, imaging, biomarkers, or invasive procedures. The data come from the Prostate, Lung, Colorectal, and Ovarian Cancer Screening Trial (PLCO). We further compare our machine learning model with 15 gynecologic oncologists and primary care physicians in the stratification of endometrial cancer risk for 100 women. We find a random forest model that achieves a testing AUC of 0.96 and a neural network model that achieves a testing AUC of 0.91. We test both models in risk stratification against 15 practicing physicians. Our random forest model is 2.5 times better at identifying above-average risk women with a 2-fold reduction in the false positive rate. Our neural network model is 2 times better at identifying above-average risk women with a 3-fold reduction in the false positive rate. Our machine learning models provide a non-invasive and cost-effective way to identify high-risk sub-populations who may benefit from early screening of endometrial cancer, prior to disease onset. Through statistical biopsy of personal health data, we have identified a new and effective approach for early cancer detection and prevention for individual patients.
ABSTRACT
The Monte Carlo (MC) method is widely used to solve various problems in radiotherapy. There has been an impetus to accelerate MC simulation on GPUs, but thread divergence remains a major issue for MC codes based on acceptance-rejection sampling. Inverse transform sampling has the potential to eliminate thread divergence, but it has so far been implemented only for photon transport. Here, we report an MC package, Particle Transport in Media (PTM), to demonstrate the implementation of coupled photon-electron transport simulation using inverse transform sampling. Rayleigh scattering, Compton scattering, the photoelectric effect, and pair production are considered in an analogous manner for photon transport. Electron transport is simulated in a class II condensed history scheme, i.e., catastrophic inelastic scattering and bremsstrahlung events are simulated explicitly while subthreshold interactions are subject to grouping. A random-hinge electron step correction algorithm and a modified PRESTA boundary-crossing algorithm are employed to improve simulation accuracy. Benchmark studies against both EGSnrc simulations and experimental measurements were performed for various beams, phantoms, and geometries. Gamma pass rates of the dose distributions are better than 99.6% for all tested scenarios under the 2%/2 mm criteria. These results demonstrate the successful implementation of inverse transform sampling in coupled photon-electron transport simulation.
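The gamma criterion used in these benchmarks combines a dose-difference tolerance with a distance-to-agreement tolerance. A one-dimensional sketch of a global gamma analysis under the 2%/2 mm criteria, applied to hypothetical depth-dose arrays:

```python
import math

def gamma_pass_rate(ref, evalu, spacing_mm, dose_crit=0.02, dist_mm=2.0):
    """1-D global gamma analysis: for each reference point, minimize the
    combined dose-difference / distance-to-agreement metric over the
    evaluated distribution; a point passes if its minimum gamma <= 1."""
    d_max = max(ref)                       # global normalization dose
    passed = 0
    for i, d_ref in enumerate(ref):
        best = math.inf
        for j, d_ev in enumerate(evalu):
            dd = (d_ev - d_ref) / (dose_crit * d_max)   # dose axis
            dx = (j - i) * spacing_mm / dist_mm          # distance axis
            best = min(best, math.sqrt(dd * dd + dx * dx))
        passed += best <= 1.0
    return passed / len(ref)

# Hypothetical depth-dose curves differing by small local deviations,
# sampled on a 1 mm grid
ref = [100, 98, 95, 90, 84, 77, 70, 63, 56, 50]
ev  = [100, 98, 94, 90, 85, 77, 69, 63, 57, 50]
print(f"gamma pass rate (2%/2 mm): {gamma_pass_rate(ref, ev, 1.0):.2%}")
```

Clinical gamma tools work on 2-D/3-D dose grids with sub-voxel interpolation and dose thresholds, but the pass/fail logic is the same.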
Subject(s)
Monte Carlo Method, Radiation Dosage, Radiotherapy Planning, Computer-Assisted/methods, Cobalt Radioisotopes/therapeutic use, Feasibility Studies, Particle Accelerators, Phantoms, Imaging, Radiotherapy Dosage, Water
ABSTRACT
Policymakers make decisions about COVID-19 management in the face of considerable uncertainty. We convened multiple modeling teams to evaluate reopening strategies for a mid-sized county in the United States, in a novel process designed to fully express scientific uncertainty while reducing linguistic uncertainty and cognitive biases. For the scenarios considered, the consensus from 17 distinct models was that a second outbreak would occur within 6 months of reopening, unless schools and non-essential workplaces remained closed. Up to half the population could be infected with full workplace reopening; non-essential business closures reduced median cumulative infections by 82%. No win-win intermediate reopening strategies were identified; there was a trade-off between public health outcomes and duration of workplace closures. Aggregate results captured twice the uncertainty of individual models, providing a more complete expression of risk for decision-making purposes.
ABSTRACT
Among women, breast cancer is a leading cause of death. Breast cancer risk predictions can inform screening and preventative actions. Previous works found that adding inputs to the widely used Gail model improved its ability to predict breast cancer risk. However, these models used simple statistical architectures and the additional inputs were derived from costly and/or invasive procedures. By contrast, we developed machine learning models that used highly accessible personal health data to predict five-year breast cancer risk. We created machine learning models using only the Gail model inputs and models using both the Gail model inputs and additional personal health data relevant to breast cancer risk. For both sets of inputs, six machine learning models were trained and evaluated on the Prostate, Lung, Colorectal, and Ovarian Cancer Screening Trial data set. The area under the receiver operating characteristic curve metric quantified each model's performance. Since this data set has a small percentage of positive breast cancer cases, we also reported sensitivity, specificity, and precision. We used DeLong tests (p < 0.05) to compare the testing data set performance of each machine learning model to that of the Breast Cancer Risk Prediction Tool (BCRAT), an implementation of the Gail model. None of the machine learning models with only BCRAT inputs were significantly stronger than the BCRAT. However, the logistic regression, linear discriminant analysis, and neural network models with the broader set of inputs were all significantly stronger than the BCRAT. These results suggest that, relative to the BCRAT, additional easy-to-obtain personal health inputs can improve five-year breast cancer risk prediction.
Our models could be used as non-invasive and cost-effective risk stratification tools to increase early breast cancer detection and prevention, motivating both immediate actions like screening and long-term preventative measures such as hormone replacement therapy and chemoprevention.
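The AUC reported in these comparisons has a simple probabilistic reading: it equals the probability that a randomly chosen positive case is scored above a randomly chosen negative case, which yields a direct, if quadratic-time, way to compute it:

```python
def auc(scores_pos, scores_neg):
    """AUC via the Mann-Whitney U statistic: the fraction of
    (positive, negative) pairs where the positive case is scored
    higher, with ties counting half."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

# Hypothetical model outputs for cancer (positive) and non-cancer cases
pos = [0.9, 0.8, 0.6, 0.55]
neg = [0.7, 0.5, 0.4, 0.3, 0.2]
print(f"AUC = {auc(pos, neg):.3f}")
```

The DeLong test mentioned above builds on this same pairwise statistic, estimating the covariance of two models' AUCs on shared data to decide whether their difference is significant.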
Subject(s)
Breast Neoplasms/epidemiology, Machine Learning, Aged, Breast Neoplasms/diagnosis, Female, Health Records, Personal, Humans, Middle Aged, ROC Curve, Risk Assessment
ABSTRACT
Purpose: Screening the general population for ovarian cancer is not recommended by any major medical or public health organization because the harms from screening outweigh the benefit it provides. To improve ovarian cancer detection and survival, many are looking at high-risk populations who would benefit from screening. Methods: We train a neural network on readily available personal health data to predict and stratify ovarian cancer risk. We use two different datasets to train our network: the National Health Interview Survey and the Prostate, Lung, Colorectal, and Ovarian Cancer Screening Trial. Results: Our model has an area under the receiver operating characteristic curve of 0.71. We further demonstrate how the model could be used to stratify patients into different risk categories. A simple 3-tier scheme classifies 23.8% of those with cancer and 1.0% of those without as high risk (similar to genetic testing), and 1.1% of those with cancer and 24.4% of those without as low risk. Conclusion: The developed neural network offers a cost-effective and non-invasive way to identify those who could benefit from targeted screening.
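A 3-tier scheme of this kind can be implemented by thresholding the network's continuous output at two cutoffs tuned on held-out data; the cutoff values and scores below are hypothetical, not those from the study:

```python
def stratify(score, low_cut=0.02, high_cut=0.15):
    """Map a continuous risk score to a tier; the cutoffs here are
    illustrative, not those fitted in the study."""
    if score < low_cut:
        return "low"
    return "high" if score >= high_cut else "medium"

# Hypothetical network outputs for five individuals
scores = [0.005, 0.03, 0.08, 0.2, 0.5]
tiers = [stratify(s) for s in scores]
print(tiers)  # ['low', 'medium', 'medium', 'high', 'high']

def high_risk_fraction(scores):
    """Share of a cohort flagged high risk, as reported separately
    for cancer and non-cancer respondents."""
    return sum(stratify(s) == "high" for s in scores) / len(scores)

print(high_risk_fraction(scores))  # 0.4
```

In practice the two cutoffs are chosen on a validation set to hit target fractions of cases and controls in each tier, which is how figures like "23.8% of cases flagged high risk" arise.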
ABSTRACT
Early detection of pancreatic cancer is challenging because cancer-specific symptoms occur only at an advanced stage, and a reliable screening tool to identify high-risk patients is lacking. To address this challenge, an artificial neural network (ANN) was developed, trained, and tested using the health data of 800,114 respondents captured in the National Health Interview Survey (NHIS) and the Prostate, Lung, Colorectal, and Ovarian (PLCO) Cancer Screening Trial datasets, together containing 898 patients diagnosed with pancreatic cancer. Prediction of pancreatic cancer risk was assessed at an individual level by incorporating 18 features into the neural network. The established ANN model achieved a sensitivity of 87.3% and 80.7%, a specificity of 80.8% and 80.7%, and an area under the receiver operating characteristic curve of 0.86 and 0.85 for the training and testing cohorts, respectively. These results indicate that our ANN can be used to predict pancreatic cancer risk with high discriminatory power and may provide a novel approach to identify patients at higher risk for pancreatic cancer who may benefit from more tailored screening and intervention.
ABSTRACT
Colorectal cancer (CRC) is third in prevalence and mortality among all cancers in the US. Currently, the United States Preventive Services Task Force (USPSTF) recommends that anyone ages 50-75 and/or with a family history be screened for CRC. To improve screening specificity and sensitivity, we have built an artificial neural network (ANN) trained on 12 to 14 categories of personal health data from the National Health Interview Survey (NHIS). Years 1997-2016 of the NHIS contain 583,770 respondents who had never received a diagnosis of any cancer and 1,409 who had received a diagnosis of CRC within 4 years of taking the survey. The trained ANN has a sensitivity of 0.57 ± 0.03, specificity of 0.89 ± 0.02, positive predictive value of 0.0075 ± 0.0003, negative predictive value of 0.999 ± 0.001, and concordance of 0.80 ± 0.05 per the guidelines of Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis (TRIPOD) level 2a, comparable to current risk-scoring methods. To demonstrate clinical applicability, both the USPSTF guidelines and the trained ANN are used to stratify respondents to the 2017 NHIS into low-, medium-, and high-risk categories (TRIPOD levels 4 and 2b, respectively). The proportion of CRC respondents misclassified as low risk decreases from 35% under the screening guidelines to 5% under the ANN (of 60 cases). The proportion of non-CRC respondents misclassified as high risk decreases from 53% under the screening guidelines to 6% under the ANN (of 25,457 cases). Our results demonstrate a robustly tested method of stratifying CRC risk that is non-invasive, cost-effective, and easy to implement publicly.
Subject(s)
Colorectal Neoplasms/diagnosis, Early Detection of Cancer/statistics & numerical data, Models, Statistical, Neural Networks, Computer, Self Report/statistics & numerical data, Aged, Cardiovascular Diseases/physiopathology, Colorectal Neoplasms/pathology, Diabetes Mellitus/physiopathology, Early Detection of Cancer/methods, Female, Health Status, Health Surveys/statistics & numerical data, Humans, Male, Medical History Taking/statistics & numerical data, Middle Aged, Practice Guidelines as Topic, Prognosis, Risk Factors, United States
ABSTRACT
The objective of this study is to train and validate a multi-parameterized artificial neural network (ANN) based on personal health information to predict lung cancer risk with high sensitivity and specificity. The 1997-2015 National Health Interview Survey adult data was used to train and validate our ANN, with inputs: gender, age, BMI, diabetes, smoking status, emphysema, asthma, race, Hispanic ethnicity, hypertension, heart diseases, vigorous exercise habits, and history of stroke. We identified 648 cancer and 488,418 non-cancer cases. For the training set the sensitivity was 79.8% (95% CI, 75.9%-83.6%), specificity was 79.9% (79.8%-80.1%), and AUC was 0.86 (0.85-0.88). For the validation set sensitivity was 75.3% (68.9%-81.6%), specificity was 80.6% (80.3%-80.8%), and AUC was 0.86 (0.84-0.89). Our results indicate that the use of an ANN based on personal health information gives high specificity and modest sensitivity for lung cancer detection, offering a cost-effective and non-invasive clinical tool for risk stratification.
Subject(s)
Early Detection of Cancer/methods, Lung Neoplasms/epidemiology, Risk Assessment/methods, Aged, Asthma/complications, Asthma/pathology, Female, Humans, Lung Neoplasms/etiology, Lung Neoplasms/pathology, Male, Middle Aged, Neural Networks, Computer, Risk Factors, Smoking/pathology, Stroke
ABSTRACT
PURPOSE: To develop and validate a multiparameterized artificial neural network (ANN) on the basis of personal health information for prostate cancer risk prediction and stratification. METHODS: The 1997 to 2015 National Health Interview Survey adult survey data were used to train and validate a multiparameterized ANN, with parameters including age, body mass index, diabetes status, smoking status, emphysema, asthma, race, ethnicity, hypertension, heart disease, exercise habits, and history of stroke. We developed a training set of patients ≥ 45 years of age with a first primary prostate cancer diagnosed within 4 years of the survey. After training, the sensitivity and specificity were obtained as functions of the cutoff values of the continuous output of the ANN. We also evaluated the ANN with the 2016 data set for cancer risk stratification. RESULTS: We identified 1,672 patients with prostate cancer and 100,033 respondents without cancer in the 1997 to 2015 data sets. The training set had a sensitivity of 21.5% (95% CI, 19.2% to 23.9%), specificity of 91% (95% CI, 90.8% to 91.2%), area under the curve of 0.73 (95% CI, 0.71 to 0.75), and positive predictive value of 28.5% (95% CI, 25.5% to 31.5%). The validation set had a sensitivity of 23.2% (95% CI, 19.5% to 26.9%), specificity of 89.4% (95% CI, 89% to 89.7%), area under the curve of 0.72 (95% CI, 0.70 to 0.75), and positive predictive value of 26.5% (95% CI, 22.4% to 30.6%). For the 2016 data set, the ANN classified all 13,031 patients into low-, medium-, and high-risk subgroups and identified 5% of the cancer population as high risk. CONCLUSION: A multiparameterized ANN that is based on personal health information could be used for prostate cancer risk prediction with high specificity and low sensitivity. The ANN can further stratify the population into three subgroups that may be helpful in refining prescreening estimates of cancer risk.
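Obtaining sensitivity and specificity as functions of the cutoff value on a continuous network output, as described in the Methods, amounts to sweeping a threshold across the scores; the scores and labels below are hypothetical:

```python
def sens_spec(scores, labels, cutoff):
    """Sensitivity and specificity when predicting positive for
    scores >= cutoff; labels are 1 (cancer) / 0 (no cancer)."""
    tp = sum(s >= cutoff and y == 1 for s, y in zip(scores, labels))
    fn = sum(s < cutoff and y == 1 for s, y in zip(scores, labels))
    tn = sum(s < cutoff and y == 0 for s, y in zip(scores, labels))
    fp = sum(s >= cutoff and y == 0 for s, y in zip(scores, labels))
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical continuous network outputs and true labels
scores = [0.1, 0.3, 0.35, 0.6, 0.7, 0.9]
labels = [0,   0,   1,    0,   1,   1]

for cutoff in (0.2, 0.5, 0.8):
    sens, spec = sens_spec(scores, labels, cutoff)
    print(f"cutoff {cutoff:.1f}: sensitivity {sens:.2f}, specificity {spec:.2f}")
```

Raising the cutoff trades sensitivity for specificity; a high-specificity operating point like the one reported here corresponds to a cutoff near the top of the sweep.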
Subject(s)
Health Information Systems, Prostatic Neoplasms/diagnosis, Aged, Area Under Curve, Humans, Male, Middle Aged, Neural Networks, Computer, Prostatic Neoplasms/pathology, Risk Assessment, Risk Factors, Sensitivity and Specificity, Surveys and Questionnaires
ABSTRACT
Lung cancer is the most common cause of cancer-related death globally. As a preventive measure, the United States Preventive Services Task Force (USPSTF) recommends annual screening of high-risk individuals with low-dose computed tomography (CT). The resulting volume of CT scans from millions of people will pose a significant challenge for radiologists to interpret. To fill this gap, computer-aided detection (CAD) algorithms may prove to be the most promising solution. A crucial first step in the analysis of lung cancer screening results using CAD is the detection of pulmonary nodules, which may represent early-stage lung cancer. The objective of this work is to develop and validate a reinforcement learning model based on deep artificial neural networks for early detection of lung nodules in thoracic CT images. Inspired by the AlphaGo system, our deep learning algorithm takes a raw CT image as input, views it as a collection of states, and outputs a classification of whether a nodule is present or not. The dataset used to train our model is the LIDC/IDRI database hosted by the lung nodule analysis (LUNA) challenge. In total, there are 888 CT scans with annotations based on agreement from at least three out of four radiologists; 590 individuals have one or more nodules, and 298 have none. Our training results yielded an overall accuracy of 99.1% [sensitivity 99.2%, specificity 99.1%, positive predictive value (PPV) 99.1%, negative predictive value (NPV) 99.2%]. On our test set, the results yielded an overall accuracy of 64.4% (sensitivity 58.9%, specificity 55.3%, PPV 54.2%, and NPV 60.0%). These early results show promise in addressing the major issue of false positives in CT screening for lung nodules, and may help avoid unnecessary follow-up tests and expenditures.
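The accuracy, sensitivity, specificity, PPV, and NPV quoted above all derive from the four confusion-matrix counts; the counts below are hypothetical, chosen only to illustrate the arithmetic:

```python
def metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion-matrix
    counts: true/false positives and true/false negatives."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),   # recall on actual positives
        "specificity": tn / (tn + fp),   # recall on actual negatives
        "ppv": tp / (tp + fp),           # positive predictive value
        "npv": tn / (tn + fn),           # negative predictive value
    }

# Hypothetical counts for a nodule classifier on a small test set
m = metrics(tp=53, fp=45, tn=55, fn=37)
print({k: round(v, 3) for k, v in m.items()})
```

Note that PPV and NPV, unlike sensitivity and specificity, depend on the prevalence of nodules in the evaluated cohort, which is why screening studies report them separately.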