ABSTRACT
Importance: Declining mortality in the field of pediatric critical care medicine has shifted practicing clinicians' attention to preserving patients' neurodevelopmental potential as a main objective. Earlier identification of critically ill children at risk for incurring neurologic morbidity would facilitate heightened surveillance that could lead to timelier clinical detection, earlier interventions, and preserved neurodevelopmental trajectory. Objective: Develop machine-learning models for identifying neurologic morbidity acquired while hospitalized with critical illness and assess correlation with contemporary serum-based, brain injury-derived biomarkers. Design: Retrospective cohort study. Setting: Two large, quaternary children's hospitals. Exposures: Critical illness. Main Outcomes and Measures: The outcome was neurologic morbidity, defined according to a computable, composite definition at the development site or an order for neurocritical care consultation at the validation site. Models were developed using varying time windows for temporal feature engineering and varying censored time horizons prior to identified neurologic morbidity. Optimal models were selected based on F1 scores, cohort sizes, calibration, and data availability for eventual deployment. A generalizable model created at the development site was assessed at an external validation site and optimized with spline recalibration. Correlation was assessed between development site model predictions and measurements of brain biomarkers from a convenience cohort. Results: After exclusions there were 14,222-25,171 encounters from 2010-2022 in the development site cohorts and 6,280-6,373 from 2018-2021 in the validation site cohort. At the development site, an extreme gradient boosted model (XGBoost) with a 12-hour time horizon and 48-hour feature engineering window had an F1-score of 0.54, area under the receiver operating characteristics curve (AUROC) of 0.82, and a number needed to alert (NNA) of 2.
A generalizable XGBoost model with a 24-hour time horizon and 48-hour feature engineering window demonstrated an F1-score of 0.37, AUROC of 0.81, area under the precision-recall curve (AUPRC) of 0.51, and NNA of 4 at the validation site. After recalibration at the validation site, the Brier score was 0.04. Serum levels of the brain injury biomarker glial fibrillary acidic protein correlated significantly with model output (rs=0.34; P=0.007). Conclusions and Relevance: We demonstrate a well-performing ensemble of models for predicting neurologic morbidity in children with biomolecular corroboration. Prospective assessment and refinement of biomarker-coupled risk models in pediatric critical illness is warranted.
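The reported F1-score and number needed to alert (NNA) both derive from a confusion matrix at a chosen alert threshold. A minimal sketch follows, assuming NNA is taken as the reciprocal of precision (a common convention; the abstract does not state its definition). The function name and the counts are hypothetical and are not data from the study.

```python
def alert_metrics(tp: int, fp: int, fn: int) -> dict:
    """Precision, recall, F1, and NNA from confusion-matrix counts.

    Assumption: NNA = 1 / precision, i.e. the number of alerts a clinician
    must review to find one true case.
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1, "nna": 1 / precision}


# Hypothetical counts chosen only for illustration.
metrics = alert_metrics(tp=50, fp=50, fn=35)
```

With these made-up counts, half of all alerts are true positives, so two alerts are needed per true case, and the F1-score is 100/185, roughly 0.54.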
ABSTRACT
RATIONALE: Young people with epilepsy of childbearing potential (YPWECP) are vulnerable to a variety of adverse health outcomes due to teratogenic antiseizure medications (ASMs) and drug-drug interactions between ASMs and contraceptives that can lead to breakthrough seizures and/or contraceptive failure. To better understand reproductive healthcare provision for YPWECP, we conducted a retrospective analysis of relevant prescription patterns. METHODS: We analyzed procedural and medication data for YPWECP ages 13-21 years (n = 1525) from 2011 through 2021 at a single tertiary-care pediatric medical center to investigate rates of (1) prescription of folic acid, (2) prescription of an enzyme-inducing ASM < 6 months before or after hormonal contraception initiation (or < 3 years after subdermal implant placement), (3) prescription of lamotrigine < 6 months before or after an estrogen-containing contraceptive that could affect lamotrigine serum concentrations, and (4) documentation of any contraceptive medication or device that overlaps initiation of a patient's first teratogenic ASM. We performed statistical analyses with sample proportion z-tests. We then used logistic regression and generalized estimating equations to evaluate for associations between patient characteristics and prescription patterns. RESULTS: Among 1525 YPWECP, fewer than half (41 %, n = 629) were prescribed folic acid during the study period (95 % CI 38.8-43.7 %). Of YPWECP prescribed an enzyme-inducing ASM, 24 % (186/766) were co-prescribed a hormonal contraceptive that adversely interacts with the ASM (95 % CI 21.2-27.3 %). Of those prescribed lamotrigine during the study period, 24 % (111/472) had documentation of an estrogen-containing medication that could affect lamotrigine serum concentrations < 6 months before or after that prescription (95 % CI 19.7-27.3 %).
Of those prescribed a teratogenic ASM, only 13 % (82/638) had documentation of contraception prior to (or within the same month as) starting their first teratogenic ASM (95 % CI 10.3-15.5 %). Older age was associated with increased odds of contraceptive coverage prior to initiation of the first teratogenic ASM and was also associated with increased odds of having contraceptives co-prescribed with ASMs that could interact. No significant associations were found between race/ethnicity and any outcomes. CONCLUSIONS: YPWECP experience low rates of folic acid prescription and low rates of contraceptive coverage while prescribed teratogenic ASMs. Many YPWECP, particularly older adolescents, are at increased risk for contraceptive failure and/or breakthrough seizures due to drug-drug interactions. Results demonstrate a need for increased focus on reproductive healthcare for YPWECP. Future studies should evaluate interventions aimed at improving these outcomes.
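The confidence intervals quoted above are consistent with a normal-approximation (Wald) interval for a sample proportion. The sketch below assumes that method, which the abstract does not name explicitly (it mentions only sample proportion z-tests). The function name is hypothetical; the counts 629 and 1525 are the folic acid figures from the abstract.

```python
import math


def wald_ci(successes: int, n: int, z: float = 1.96) -> tuple:
    """Normal-approximation 95% confidence interval for a proportion."""
    p = successes / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width


# Folic acid prescription: 629 of 1525 patients.
low, high = wald_ci(629, 1525)
```

Expressed as percentages and rounded to one decimal place, this gives 38.8 to 43.7, matching the interval reported for folic acid prescription.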
ABSTRACT
Combining predictions from multiple models into an ensemble is a widely used practice across many fields with demonstrated performance benefits. The R package hubEnsembles provides a flexible framework for ensembling various types of predictions, including point estimates and probabilistic predictions. A range of common methods for generating ensembles are supported, including weighted averages, quantile averages, and linear pools. The hubEnsembles package fits within a broader framework of open-source software and data tools called the "hubverse", which facilitates the development and management of collaborative modelling exercises.
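To illustrate how two of the ensemble methods named above differ, the following Python sketch compares a quantile average with a linear pool for two normal predictive distributions. It is not the hubEnsembles API (that package is written in R); the function names, the example distributions, and the bisection bounds are assumptions made for illustration.

```python
from statistics import NormalDist


def quantile_average(dists, weights, p):
    """Weighted average of the component quantiles at probability level p."""
    return sum(w * d.inv_cdf(p) for d, w in zip(dists, weights))


def linear_pool_quantile(dists, weights, p, lo=-1e3, hi=1e3, tol=1e-9):
    """Quantile of the mixture whose CDF is the weighted average of the
    component CDFs, found by bisection on the interval [lo, hi]."""
    def mixture_cdf(x):
        return sum(w * d.cdf(x) for d, w in zip(dists, weights))

    while hi - lo > tol:
        mid = (lo + hi) / 2
        if mixture_cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Two hypothetical models that disagree strongly about location.
models = [NormalDist(0, 1), NormalDist(10, 1)]
weights = [0.5, 0.5]
qa_upper = quantile_average(models, weights, 0.975)
lp_upper = linear_pool_quantile(models, weights, 0.975)
```

In this example both methods give a median of 5, but the linear pool's upper quantile lies further out than the quantile average's, because the pool keeps the disagreement between models as added spread.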
ABSTRACT
MOTIVATION: Software is vital for the advancement of biology and medicine. Impact evaluations of scientific software have primarily emphasized traditional citation metrics of associated papers, despite these metrics inadequately capturing the dynamic picture of impact and despite challenges with improper citation. RESULTS: To understand how software developers evaluate their tools, we conducted a survey of participants in the Informatics Technology for Cancer Research (ITCR) program funded by the National Cancer Institute (NCI). We found that although developers realize the value of more extensive metric collection, they are hindered by a lack of funding and time. We also investigated software among this community for how often infrastructure that supports more nontraditional metrics was implemented and how this impacted rates of papers describing usage of the software. We found that infrastructure such as social media presence, more in-depth documentation, the presence of software health metrics, and clear information on how to contact developers seemed to be associated with increased mention rates. Analysing more diverse metrics can enable developers to better understand user engagement, justify continued funding, identify novel use cases, pinpoint improvement areas, and ultimately amplify their software's impact. There are associated challenges, including distorted or misleading metrics, as well as ethical and security concerns. More attention to nuances involved in capturing impact across the spectrum of biomedical software is needed. For funders and developers, we outline guidance based on experience from our community. By considering how we evaluate software, we can empower developers to create tools that more effectively accelerate biological and medical research progress. AVAILABILITY AND IMPLEMENTATION: More information about the analysis, as well as access to data and code is available at https://github.com/fhdsl/ITCR_Metrics_manuscript_website.
Subjects
Biomedical Research, Software, Biomedical Research/methods, Humans, United States, Computational Biology/methods
ABSTRACT
Across many fields, scenario modeling has become an important tool for exploring long-term projections and how they might depend on potential interventions and critical uncertainties, with relevance to both decision makers and scientists. In the past decade, and especially during the COVID-19 pandemic, the field of epidemiology has seen substantial growth in the use of scenario projections. Multiple scenarios are often projected at the same time, allowing important comparisons that can guide the choice of intervention, the prioritization of research topics, or public communication. The design of the scenarios is central to their ability to inform important questions. In this paper, we draw on the fields of decision analysis and statistical design of experiments to propose a framework for scenario design in epidemiology, with relevance also to other fields. We identify six different fundamental purposes for scenario designs (decision making, sensitivity analysis, situational awareness, horizon scanning, forecasting, and value of information) and discuss how those purposes guide the structure of scenarios. We discuss other aspects of the content and process of scenario design, broadly for all settings and specifically for multi-model ensemble projections. As an illustrative case study, we examine the first 17 rounds of scenarios from the U.S. COVID-19 Scenario Modeling Hub, then reflect on future advancements that could improve the design of scenarios in epidemiological settings.
Subjects
COVID-19, Decision Support Techniques, Humans, COVID-19/epidemiology, COVID-19/prevention & control, COVID-19/transmission, Forecasting, SARS-CoV-2, Communicable Diseases/epidemiology, Pandemics/prevention & control, Decision Making, Research Design
ABSTRACT
BACKGROUND: The early identification of outbreaks of both known and novel influenza-like illnesses (ILIs) is an important public health problem. OBJECTIVE: This study aimed to describe the design and testing of a tool that detects and tracks outbreaks of both known and novel ILIs, such as the SARS-CoV-2 worldwide pandemic, accurately and early. METHODS: This paper describes the ILI Tracker algorithm, which first models the daily occurrence of a set of known ILIs in hospital emergency departments in a monitored region, based on findings extracted from patient care reports by natural language processing. We then show how the algorithm can be extended to detect and track the presence of an unmodeled disease that may represent a novel disease outbreak. RESULTS: We include results based on modeling diseases like influenza, respiratory syncytial virus, human metapneumovirus, and parainfluenza for 5 emergency departments in Allegheny County, Pennsylvania, from June 1, 2014, to May 31, 2015. We also include the results of detecting the outbreak of an unmodeled disease, which in retrospect was very likely an outbreak of the enterovirus D68 (EV-D68). CONCLUSIONS: The results reported in this paper provide support that ILI Tracker tracked the incidence of 4 modeled influenza-like diseases well over a 1-year period, relative to laboratory-confirmed cases, and that it was computationally efficient in doing so. The system was also able to detect a likely novel outbreak of EV-D68 early in an outbreak that occurred in Allegheny County in 2014, as well as to clinically characterize that outbreak disease accurately.
Subjects
Algorithms, Bayes Theorem, Disease Outbreaks, Human Influenza, Humans, Human Influenza/epidemiology, Pennsylvania/epidemiology, COVID-19/epidemiology, Hospital Emergency Service/statistics & numerical data
ABSTRACT
Between December 2020 and April 2023, the COVID-19 Scenario Modeling Hub (SMH) generated operational multi-month projections of COVID-19 burden in the US to guide pandemic planning and decision-making in the context of high uncertainty. This effort was born out of an attempt to coordinate, synthesize and effectively use the unprecedented amount of predictive modeling that emerged throughout the COVID-19 pandemic. Here we describe the history of this massive collective research effort, the process of convening and maintaining an open modeling hub active over multiple years, and attempt to provide a blueprint for future efforts. We detail the process of generating 17 rounds of scenarios and projections at different stages of the COVID-19 pandemic, and disseminating results to the public health community and lay public. We also highlight how SMH was expanded to generate influenza projections during the 2022-23 season. We identify key impacts of SMH results on public health and draw lessons to improve future collaborative modeling efforts, research on scenario projections, and the interface between models and policy.
Subjects
COVID-19, Human Influenza, Humans, COVID-19/epidemiology, Human Influenza/epidemiology, Pandemics, Policy, Public Health
ABSTRACT
PURPOSE: Manual extraction of case details from patient records for cancer surveillance is a resource-intensive task. Natural Language Processing (NLP) techniques have been proposed for automating the identification of key details in clinical notes. Our goal was to develop NLP application programming interfaces (APIs) for integration into cancer registry data abstraction tools in a computer-assisted abstraction setting. METHODS: We used cancer registry manual abstraction processes to guide the design of DeepPhe-CR, a web-based NLP service API. The coding of key variables was performed through NLP methods validated using established workflows. A container-based implementation of the NLP methods and the supporting infrastructure was developed. Existing registry data abstraction software was modified to include results from DeepPhe-CR. An initial usability study with data registrars provided early validation of the feasibility of the DeepPhe-CR tools. RESULTS: API calls support submission of single documents and summarization of cases across one or more documents. The container-based implementation uses a REST router to handle requests and support a graph database for storing results. NLP modules extract topography, histology, behavior, laterality, and grade at 0.79-1.00 F1 across multiple cancer types (breast, prostate, lung, colorectal, ovary, and pediatric brain) from data of two population-based cancer registries. Usability study participants were able to use the tool effectively and expressed interest in the tool. CONCLUSION: The DeepPhe-CR system provides an architecture for building cancer-specific NLP tools directly into registrar workflows in a computer-assisted abstraction setting. Improved user interactions in client tools may be needed to realize the potential of these approaches.
Subjects
Natural Language Processing, Neoplasms, Male, Female, Humans, Child, Software, Prostate, Registries, Neoplasms/diagnosis, Neoplasms/therapy
ABSTRACT
OBJECTIVES: This study aimed to enable clinical researchers without expertise in natural language processing (NLP) to extract and analyze information about sexual and reproductive health (SRH), or other sensitive health topics, from large sets of clinical notes. METHODS: (1) We retrieved text from the electronic health record as individual notes. (2) We segmented notes into sentences using one of scispaCy's NLP toolkits. (3) We exported sentences to the labeling application Watchful and annotated subsets of these as relevant or irrelevant to various SRH categories by applying a combination of regular expressions and manual annotation. (4) The labeled sentences served as training data to create machine learning models for classifying text; specifically, we used spaCy's default text classification ensemble, comprising a bag-of-words model and a neural network with attention. (5) We applied each model to unlabeled sentences to identify additional references to SRH with novel relevant vocabulary. We used this information and repeated steps 3 to 5 iteratively until the models identified no new relevant sentences for each topic. Finally, we aggregated the labeled data for analysis. RESULTS: This methodology was applied to 3,663 Child Neurology notes for 971 female patients. Our search focused on six SRH categories. We validated the approach using two subject matter experts, who independently labeled a sample of 400 sentences. Cohen's kappa values were calculated for each category between the reviewers (menstruation: 1, sexual activity: 0.9499, contraception: 0.9887, folic acid: 1, teratogens: 0.8864, pregnancy: 0.9499). After removing the sentences on which reviewers did not agree, we compared the reviewers' labels to those produced via our methodology, again using Cohen's kappa (menstruation: 1, sexual activity: 1, contraception: 0.9885, folic acid: 1, teratogens: 0.9841, pregnancy: 0.9871). 
CONCLUSION: Our methodology is reproducible, enables analysis of large amounts of text, and has produced results that are highly comparable to subject matter expert manual review.
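The agreement statistic used in the validation above, Cohen's kappa, compares observed agreement between two raters with the agreement expected by chance. A minimal sketch of the standard formula follows; the function name and the example labels are hypothetical and are not the study's data.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labeling the same items."""
    n = len(rater_a)
    # Observed agreement: fraction of items given the same label.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    labels = set(rater_a) | set(rater_b)
    expected = sum(counts_a[k] * counts_b[k] for k in labels) / n**2
    if expected == 1:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical relevance labels (1 = relevant, 0 = irrelevant) for 10 sentences.
reviewer_1 = [1, 1, 0, 0, 1, 0, 1, 0, 0, 0]
reviewer_2 = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]
kappa = cohens_kappa(reviewer_1, reviewer_2)
```

Here the reviewers agree on 9 of 10 sentences (observed agreement 0.90) against a chance agreement of 0.54, giving a kappa of about 0.78.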
Subjects
Natural Language Processing, Reproductive Health, Child, Humans, Female, Teratogens, Electronic Health Records, Sexual Behavior, Folic Acid
ABSTRACT
Our ability to forecast epidemics far into the future is constrained by the many complexities of disease systems. Realistic longer-term projections may, however, be possible under well-defined scenarios that specify the future state of critical epidemic drivers. Since December 2020, the U.S. COVID-19 Scenario Modeling Hub (SMH) has convened multiple modeling teams to make months ahead projections of SARS-CoV-2 burden, totaling nearly 1.8 million national and state-level projections. Here, we find SMH performance varied widely as a function of both scenario validity and model calibration. We show scenarios remained close to reality for 22 weeks on average before the arrival of unanticipated SARS-CoV-2 variants invalidated key assumptions. An ensemble of participating models that preserved variation between models (using the linear opinion pool method) was consistently more reliable than any single model in periods of valid scenario assumptions, while projection interval coverage was near target levels. SMH projections were used to guide pandemic response, illustrating the value of collaborative hubs for longer-term scenario projections.
Subjects
COVID-19, Humans, COVID-19/epidemiology, Pandemics/prevention & control, SARS-CoV-2, Uncertainty
ABSTRACT
Across many fields, scenario modeling has become an important tool for exploring long-term projections and how they might depend on potential interventions and critical uncertainties, with relevance to both decision makers and scientists. In the past decade, and especially during the COVID-19 pandemic, the field of epidemiology has seen substantial growth in the use of scenario projections. Multiple scenarios are often projected at the same time, allowing important comparisons that can guide the choice of intervention, the prioritization of research topics, or public communication. The design of the scenarios is central to their ability to inform important questions. In this paper, we draw on the fields of decision analysis and statistical design of experiments to propose a framework for scenario design in epidemiology, with relevance also to other fields. We identify six different fundamental purposes for scenario designs (decision making, sensitivity analysis, value of information, situational awareness, horizon scanning, and forecasting) and discuss how those purposes guide the structure of scenarios. We discuss other aspects of the content and process of scenario design, broadly for all settings and specifically for multi-model ensemble projections. As an illustrative case study, we examine the first 17 rounds of scenarios from the U.S. COVID-19 Scenario Modeling Hub, then reflect on future advancements that could improve the design of scenarios in epidemiological settings.
ABSTRACT
BACKGROUND: Infectious disease computational modeling studies have been widely published during the coronavirus disease 2019 (COVID-19) pandemic, yet they have limited reproducibility. Developed through an iterative testing process with multiple reviewers, the Infectious Disease Modeling Reproducibility Checklist (IDMRC) enumerates the minimal elements necessary to support reproducible infectious disease computational modeling publications. The primary objective of this study was to assess the reliability of the IDMRC and to identify which reproducibility elements were unreported in a sample of COVID-19 computational modeling publications. METHODS: Four reviewers used the IDMRC to assess 46 preprint and peer-reviewed COVID-19 modeling studies published between March 13th, 2020, and July 30th, 2020. The inter-rater reliability was evaluated by mean percent agreement and Fleiss' kappa coefficients (κ). Papers were ranked based on the average number of reported reproducibility elements, and the average proportion of papers that reported each checklist item was tabulated. RESULTS: Questions related to the computational environment (mean κ = 0.90, range = 0.90-0.90), analytical software (mean κ = 0.74, range = 0.68-0.82), model description (mean κ = 0.71, range = 0.58-0.84), model implementation (mean κ = 0.68, range = 0.39-0.86), and experimental protocol (mean κ = 0.63, range = 0.58-0.69) had moderate or greater (κ > 0.41) inter-rater reliability. Questions related to data had the lowest values (mean κ = 0.37, range = 0.23-0.59). Reviewers ranked similar papers in the upper and lower quartiles based on the proportion of reproducibility elements each paper reported. While over 70% of the publications provided data used in their models, less than 30% provided the model implementation. CONCLUSIONS: The IDMRC is the first comprehensive, quality-assessed tool for guiding researchers in reporting reproducible infectious disease computational modeling studies.
The inter-rater reliability assessment found that most scores were characterized by moderate or greater agreement. These results suggest that the IDMRC might be used to provide reliable assessments of the potential for reproducibility of published infectious disease modeling publications. Results of this evaluation identified opportunities for improvement to the model implementation and data questions that can further improve the reliability of the checklist.
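Fleiss' kappa, the reliability statistic reported above, extends chance-corrected agreement to more than two raters. The sketch below implements the standard formula, assuming every item is rated by the same number of raters. The function name and the example ratings (4 reviewers judging whether 5 papers report a checklist element) are hypothetical and are not the study's data.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa.

    counts[i][j] is the number of raters who assigned item i to category j;
    every row must sum to the same number of raters.
    """
    n_items = len(counts)
    n_raters = sum(counts[0])
    n_categories = len(counts[0])
    # Overall proportion of all assignments falling in each category.
    p_cat = [sum(row[j] for row in counts) / (n_items * n_raters)
             for j in range(n_categories)]
    # Per-item agreement: fraction of rater pairs that agree on the item.
    p_item = [(sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
              for row in counts]
    p_bar = sum(p_item) / n_items
    p_expected = sum(p * p for p in p_cat)
    return (p_bar - p_expected) / (1 - p_expected)


# Columns: [reported, not reported]; each row is one paper rated by 4 reviewers.
ratings = [[4, 0], [0, 4], [3, 1], [4, 0], [1, 3]]
kappa = fleiss_kappa(ratings)
```

For these made-up ratings the mean per-item agreement is 0.80 and the chance agreement is 0.52, so kappa is about 0.58, which would fall in the "moderate" band (above 0.41) used in the abstract.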
Subjects
COVID-19, Communicable Diseases, Humans, Reproducibility of Results, Checklist, Observer Variation, Computer Simulation
ABSTRACT
Our ability to forecast epidemics more than a few weeks into the future is constrained by the complexity of disease systems, our limited ability to measure the current state of an epidemic, and uncertainties in how human action will affect transmission. Realistic longer-term projections (spanning more than a few weeks) may, however, be possible under defined scenarios that specify the future state of critical epidemic drivers, with the additional benefit that such scenarios can be used to anticipate the comparative effect of control measures. Since December 2020, the U.S. COVID-19 Scenario Modeling Hub (SMH) has convened multiple modeling teams to make 6-month ahead projections of the number of SARS-CoV-2 cases, hospitalizations and deaths. The SMH released nearly 1.8 million national and state-level projections between February 2021 and November 2022. SMH performance varied widely as a function of both scenario validity and model calibration. Scenario assumptions were periodically invalidated by the arrival of unanticipated SARS-CoV-2 variants, but SMH still provided projections on average 22 weeks before changes in assumptions (such as virus transmissibility) invalidated scenarios and their corresponding projections. During these periods, before emergence of a novel variant, a linear opinion pool ensemble of contributed models was consistently more reliable than any single model, and projection interval coverage was near target levels for the most plausible scenarios (e.g., 79% coverage for 95% projection interval). SMH projections were used operationally to guide planning and policy at different stages of the pandemic, illustrating the value of the hub approach for long-term scenario projections.
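Projection interval coverage, as cited above, is the share of observations that fall inside the projected interval; a well-calibrated 95% interval should contain about 95% of them. A minimal sketch follows, with a hypothetical function name and made-up numbers that are not SMH data.

```python
def interval_coverage(observed, lower, upper):
    """Fraction of observations inside their [lower, upper] projection interval."""
    hits = sum(lo <= y <= hi for y, lo, hi in zip(observed, lower, upper))
    return hits / len(observed)


# Hypothetical weekly observations with projected interval bounds.
observed = [10, 12, 20, 8, 15]
lower = [8, 9, 10, 9, 11]
upper = [12, 14, 18, 13, 19]
coverage = interval_coverage(observed, lower, upper)
```

In this toy example three of five observations fall inside their intervals, for a coverage of 0.6. Comparing such a value with the nominal level is how a figure like 79% coverage for a 95% interval is read as under-coverage.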
ABSTRACT
Software is vital for the advancement of biology and medicine. Through analysis of usage and impact metrics of software, developers can help determine user and community engagement. These metrics can be used to justify additional funding, encourage additional use, and identify unanticipated use cases. Such analyses can help define improvement areas and assist with managing project resources. However, there are challenges associated with assessing usage and impact, many of which vary widely depending on the type of software being evaluated. These challenges involve issues of distorted, exaggerated, understated, or misleading metrics, as well as ethical and security concerns. More attention to the nuances, challenges, and considerations involved in capturing impact across the diverse spectrum of biological software is needed. Furthermore, some tools may be especially beneficial to a small audience, yet may not have comparatively compelling metrics of high usage. Although some principles are generally applicable, there is not a single perfect metric or approach to effectively evaluate a software tool's impact, as this depends on aspects unique to each tool, how it is used, and how one wishes to evaluate engagement. We propose more broadly applicable guidelines (such as infrastructure that supports the usage of software and the collection of metrics about usage), as well as strategies for various types of software and resources. We also highlight outstanding issues in the field regarding how communities measure or evaluate software impact. To gain a deeper understanding of the issues hindering software evaluations, as well as to determine what appears to be helpful, we performed a survey of participants involved with scientific software projects for the Informatics Technology for Cancer Research (ITCR) program funded by the National Cancer Institute (NCI). 
We also investigated software among this scientific community and others to assess how often infrastructure that supports such evaluations is implemented and how this impacts rates of papers describing usage of the software. We find that although developers recognize the utility of analyzing data related to the impact or usage of their software, they struggle to find the time or funding to support such analyses. We also find that infrastructure such as social media presence, more in-depth documentation, the presence of software health metrics, and clear information on how to contact developers seem to be associated with increased usage rates. Our findings can help scientific software developers make the most out of the evaluations of their software so that they can more fully benefit from such assessments.
ABSTRACT
It would be highly desirable to have a tool that detects the outbreak of a new influenza-like illness, such as COVID-19, accurately and early. This paper describes the ILI Tracker algorithm, which first models the daily occurrence of a set of known influenza-like illnesses in a hospital emergency department, based on findings extracted from patient-care reports by natural language processing. We include results based on modeling the diseases influenza, respiratory syncytial virus, human metapneumovirus, and parainfluenza for five emergency departments in Allegheny County, Pennsylvania, from June 1, 2010 through May 31, 2015. We then show how the algorithm can be extended to detect the presence of an unmodeled disease, which may represent a novel disease outbreak. We also include results for detecting an outbreak of an unmodeled disease during this time period, which in retrospect was very likely an outbreak of Enterovirus D68.
ABSTRACT
RATIONALE: The American Academy of Neurology (AAN) recommends annual sexual and reproductive health (SRH) counseling for all people with epilepsy of gestational capacity (PWEGC). Child neurologists report discussing SRH concerns infrequently with adolescents. Limited research exists regarding documentation of such counseling. METHODS: We retrospectively studied clinical notes using natural language processing to investigate child neurologists' documentation of SRH counseling for adolescent and young adult PWEGC. We segmented notes into sentences and evaluated for references to menstruation, sexual activity, contraception, folic acid, teratogens, and pregnancy. We developed training sets in a labeling application and used machine learning to identify additional counseling instances. We repeated this iteratively until we identified no new relevant sentences. We validated results using external reviewers; after removing sentences reviewers disagreed on (n = 13/400), we calculated Cohen's kappa values between the model and reviewers (>0.98 for all categories). We evaluated labeled texts for each patient per calendar year with descriptive statistics and logistic regression, adjusting for race/ethnicity, age, and teratogen use. RESULTS: Data comprised 971 PWEGC age 13-21 years with 2277 patient-years and 3663 outpatient child neurology notes. Nearly half of patient-years lacked SRH counseling documentation (49.1%). Among all patients, 38.0% never had SRH counseling documented. Documentation was present regarding menstruation in 26.7% of patient-years, folic acid in 25.0%, contraception in 21.9%, pregnancy in 3.5%, teratogens in 3.0%, and sexual activity in 1.8%. Documentation regarding menstruation and contraception was associated with prescription of antiseizure medications that have a higher risk of teratogenic effects (OR = 1.27, p = 0.020, 95% CI = [1.04,1.54]; OR = 1.27, p = 0.027, 95% CI = [1.03,1.58]).
Documentation regarding contraception, folic acid, and sexual activity was increased among older patients (OR = 1.28, p < 0.001, 95% CI = [1.21,1.35]; OR = 1.26, p < 0.001, 95% CI = [1.19,1.32]; OR = 1.26, p = 0.004, 95% CI = [1.08,1.47]). Documentation regarding sexual activity was decreased among patients identifying as White/Non-Hispanic (OR = 0.39, p = 0.007, 95% CI = [0.20,0.78]). CONCLUSION: Child neurologists counsel PWEGC on SRH less frequently than recommended by the AAN based on documentation.
Subjects
Epilepsy, Reproductive Health, Pregnancy, Female, Child, Adolescent, Humans, Young Adult, Adult, Retrospective Studies, Teratogens, Contraception, Epilepsy/psychology, Sexual Behavior, Counseling, Folic Acid
ABSTRACT
Objective: The manual extraction of case details from patient records for cancer surveillance efforts is a resource-intensive task. Natural Language Processing (NLP) techniques have been proposed for automating the identification of key details in clinical notes. Our goal was to develop NLP application programming interfaces (APIs) for integration into cancer registry data abstraction tools in a computer-assisted abstraction setting. Methods: We used cancer registry manual abstraction processes to guide the design of DeepPhe-CR, a web-based NLP service API. The coding of key variables was done through NLP methods validated using established workflows. A container-based implementation including the NLP was developed. Existing registry data abstraction software was modified to include results from DeepPhe-CR. An initial usability study with data registrars provided early validation of the feasibility of the DeepPhe-CR tools. Results: API calls support submission of single documents and summarization of cases across multiple documents. The container-based implementation uses a REST router to handle requests and support a graph database for storing results. NLP modules extract topography, histology, behavior, laterality, and grade at 0.79-1.00 F1 across common and rare cancer types (breast, prostate, lung, colorectal, ovary and pediatric brain) on data from two cancer registries. Usability study participants were able to use the tool effectively and expressed interest in adopting the tool. Discussion: Our DeepPhe-CR system provides a flexible architecture for building cancer-specific NLP tools directly into registrar workflows in a computer-assisted abstraction setting. Improving user interactions in client tools may be needed to realize the potential of these approaches. DeepPhe-CR: https://deepphe.github.io/.
ABSTRACT
Background: Infectious disease computational modeling studies have been widely published during the coronavirus disease 2019 (COVID-19) pandemic, yet they have limited reproducibility. Developed through an iterative testing process with multiple reviewers, the Infectious Disease Modeling Reproducibility Checklist (IDMRC) enumerates the minimal elements necessary to support reproducible infectious disease computational modeling publications. The primary objective of this study was to assess the reliability of the IDMRC and to identify which reproducibility elements were unreported in a sample of COVID-19 computational modeling publications. Methods: Four reviewers used the IDMRC to assess 46 preprint and peer-reviewed COVID-19 modeling studies published between March 13th, 2020, and July 31st, 2020. The inter-rater reliability was evaluated by mean percent agreement and Fleiss' kappa coefficients (κ). Papers were ranked based on the average number of reported reproducibility elements, and the average proportion of papers that reported each checklist item was tabulated. Results: Questions related to the computational environment (mean κ = 0.90, range = 0.90-0.90), analytical software (mean κ = 0.74, range = 0.68-0.82), model description (mean κ = 0.71, range = 0.58-0.84), model implementation (mean κ = 0.68, range = 0.39-0.86), and experimental protocol (mean κ = 0.63, range = 0.58-0.69) had moderate or greater (κ > 0.41) inter-rater reliability. Questions related to data had the lowest values (mean κ = 0.37, range = 0.23-0.59). Reviewers ranked similar papers in the upper and lower quartiles based on the proportion of reproducibility elements each paper reported. While over 70% of the publications provided data used in their models, less than 30% provided the model implementation. Conclusions: The IDMRC is the first comprehensive, quality-assessed tool for guiding researchers in reporting reproducible infectious disease computational modeling studies. The inter-rater reliability assessment found that most scores were characterized by moderate or greater agreement. These results suggest that the IDMRC might be used to provide reliable assessments of the potential for reproducibility of published infectious disease modeling publications. Results of this evaluation identified opportunities for improvement to the model implementation and data questions that can further improve the reliability of the checklist.
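For readers unfamiliar with the agreement statistic used above, the following is a minimal sketch of how Fleiss' kappa can be computed for a fixed number of raters. It is not the study's code: the function name `fleiss_kappa` and the toy rating tables are illustrative assumptions, and the example data are invented purely to show the arithmetic.

```python
from typing import Sequence


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Fleiss' kappa for a table where counts[i][j] is the number of
    raters who assigned item i to category j (hypothetical helper)."""
    n_items = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2:
        raise ValueError("at least two raters are required")
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("every item must be rated by the same number of raters")
    n_categories = len(counts[0])

    # Share of all ratings that fall in each category.
    p_j = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(n_categories)
    ]
    # Observed agreement for each item: fraction of rater pairs that agree.
    p_i = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_i) / n_items        # mean observed agreement
    p_e = sum(p * p for p in p_j)     # agreement expected by chance
    if p_e == 1.0:
        return float("nan")           # undefined when only one category is used
    return (p_bar - p_e) / (1.0 - p_e)


# Toy data (invented): 4 raters, 2 categories ("reported", "not reported").
perfect = [[4, 0], [0, 4], [4, 0], [0, 4]]   # raters always agree
split = [[2, 2], [2, 2]]                     # raters split evenly on every item
kappa_perfect = fleiss_kappa(perfect)
kappa_split = fleiss_kappa(split)
```

With the toy tables, complete agreement yields κ = 1, while an even split on every item yields a negative κ, meaning agreement below what chance alone would produce.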
ABSTRACT
PURPOSE: Real-world evidence for radiation therapy (RT) is limited because it is often documented only in the clinical narrative. We developed a natural language processing system for automated extraction of detailed RT events from text to support clinical phenotyping. METHODS AND MATERIALS: A multi-institutional data set of 96 clinician notes, 129 North American Association of Central Cancer Registries cancer abstracts, and 270 RT prescriptions from HemOnc.org was used and divided into train, development, and test sets. Documents were annotated for RT events and associated properties: dose, fraction frequency, fraction number, date, treatment site, and boost. Named entity recognition models for properties were developed by fine-tuning BioClinicalBERT and RoBERTa transformer models. A multiclass RoBERTa-based relation extraction model was developed to link each dose mention with each property in the same event. Models were combined with symbolic rules to create a hybrid end-to-end pipeline for comprehensive RT event extraction. RESULTS: Named entity recognition models were evaluated on the held-out test set with F1 results of 0.96, 0.88, 0.94, 0.88, 0.67, and 0.94 for dose, fraction frequency, fraction number, date, treatment site, and boost, respectively. The relation model achieved an average F1 of 0.86 when the input was gold-labeled entities. The end-to-end system F1 result was 0.81. The end-to-end system performed best on North American Association of Central Cancer Registries abstracts (average F1 0.90), which are mostly copy-paste content from clinician notes. CONCLUSIONS: We developed methods and a hybrid end-to-end system for RT event extraction, which is the first natural language processing system for this task. This system provides proof-of-concept for real-world RT data collection for research and is promising for the potential of natural language processing methods to support clinical care.
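The F1 figures reported above combine precision and recall. As a rough illustration only, the sketch below scores predicted entities against gold annotations using exact matching of (start, end, label) triples. The matching rule, the function name `precision_recall_f1`, and the sample spans are assumptions for demonstration; the abstract does not state the exact scoring scheme used.

```python
from typing import Set, Tuple

Entity = Tuple[int, int, str]  # (start offset, end offset, label)


def precision_recall_f1(gold: Set[Entity], predicted: Set[Entity]) -> Tuple[float, float, float]:
    """Exact-match precision, recall, and F1 over entity triples
    (hypothetical helper; assumes strict span and label matching)."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Invented example: three gold entities, four predictions, two exact matches.
gold = {(0, 5, "dose"), (10, 14, "site"), (20, 25, "date")}
predicted = {(0, 5, "dose"), (10, 14, "boost"), (20, 25, "date"), (30, 33, "dose")}
precision, recall, f1 = precision_recall_f1(gold, predicted)
```

Here the prediction with the right span but the wrong label counts as both a false positive and a false negative, which is why strict matching tends to give lower scores than relaxed (overlap-based) matching.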
Subjects
Natural Language Processing, Neoplasms, Humans, Neoplasms/radiotherapy, Electronic Health Records
ABSTRACT
Computational models of infectious diseases have become valuable tools for research and the public health response against epidemic threats. The reproducibility of computational models has been limited, undermining the scientific process and possibly trust in modeling results and related response strategies, such as vaccination. We translated published reproducibility guidelines from a wide range of scientific disciplines into an implementation framework for improving reproducibility of infectious disease computational models. The framework comprises 22 elements that should be described, grouped into 6 categories: computational environment, analytical software, model description, model implementation, data, and experimental protocol. The framework can be used by scientific communities to develop actionable tools for sharing computational models in a reproducible way.