1.
J Biomed Inform ; 149: 104576, 2024 01.
Article in English | MEDLINE | ID: mdl-38101690

ABSTRACT

INTRODUCTION: Machine learning algorithms are expected to work side-by-side with humans in decision-making pipelines. Thus, the ability of classifiers to make reliable decisions is of paramount importance. Deep neural networks (DNNs) represent the state-of-the-art models to address real-world classification. Although the strength of activation in DNNs is often correlated with the network's confidence, in-depth analyses are needed to establish whether they are well calibrated. METHOD: In this paper, we demonstrate the use of DNN-based classification tools to benefit cancer registries by automating information extraction of disease at diagnosis and at surgery from electronic text pathology reports from the US National Cancer Institute (NCI) Surveillance, Epidemiology, and End Results (SEER) population-based cancer registries. In particular, we introduce multiple methods for selective classification to achieve a target level of accuracy on multiple classification tasks while minimizing the rejection amount, that is, the number of electronic pathology reports for which the model's predictions are unreliable. We evaluate the proposed methods by comparing our approach with the current in-house deep learning-based abstaining classifier. RESULTS: Overall, all the proposed selective classification methods effectively allow for achieving the targeted level of accuracy or higher in a trade-off analysis aimed to minimize the rejection rate. On in-distribution validation and holdout test data, with all the proposed methods, we achieve on all tasks the required target level of accuracy with a lower rejection rate than the deep abstaining classifier (DAC). Interpreting the results for the out-of-distribution test data is more complex; nevertheless, in this case as well, the rejection rate from the best among the proposed methods achieving 97% accuracy or higher is lower than the rejection rate based on the DAC. CONCLUSIONS: We show that although both approaches can flag those samples that should be manually reviewed and labeled by human annotators, the newly proposed methods retain a larger fraction and do so without retraining, thus offering a reduced computational cost compared with the in-house deep learning-based abstaining classifier.
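As a rough illustration of the selective-classification idea in this abstract, the sketch below chooses a confidence threshold on validation data so that accuracy on the retained reports meets a target while as few reports as possible are rejected. It is not the authors' method: the function name `pick_threshold`, the use of a single per-report confidence score, and the post hoc thresholding itself are assumptions made for illustration.

```python
from typing import List, Optional, Tuple


def pick_threshold(
    confidences: List[float],
    correct: List[bool],
    target_accuracy: float,
) -> Optional[Tuple[float, float]]:
    """Return (threshold, rejection_rate) that retains the most samples
    while accuracy on the retained samples is >= target_accuracy.

    Samples with confidence >= threshold are retained; the rest are
    rejected (sent to manual review). Returns None if no non-empty
    retained set reaches the target.
    """
    # Rank samples from most to least confident.
    ranked = sorted(zip(confidences, correct), key=lambda p: p[0], reverse=True)
    n = len(ranked)
    best = None
    hits = 0
    for k, (conf, ok) in enumerate(ranked, start=1):
        hits += ok
        # Only cut between distinct confidence values so ties stay together.
        is_boundary = k == n or ranked[k][0] < conf
        if is_boundary and hits / k >= target_accuracy:
            # A later (larger) k overwrites earlier ones: fewer rejections.
            best = (conf, 1.0 - k / n)
    return best


if __name__ == "__main__":
    # Hypothetical validation scores and correctness flags.
    conf = [0.99, 0.95, 0.90, 0.80, 0.60, 0.55]
    ok = [True, True, True, False, True, False]
    print(pick_threshold(conf, ok, target_accuracy=0.97))
```

Because the threshold is chosen after training, a scheme of this kind needs no retraining, which matches the cost advantage the abstract reports over the abstaining classifier.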


Subject(s)
Deep Learning , Humans , Uncertainty , Neural Networks, Computer , Algorithms , Machine Learning
2.
Cancer ; 129(12): 1821-1835, 2023 06 15.
Article in English | MEDLINE | ID: mdl-37063057

ABSTRACT

BACKGROUND: Depression is common among breast cancer patients and can affect concordance with guideline-recommended treatment plans. Yet, the impact of depression on cancer treatment and survival is understudied, particularly in relation to the timing of the depression diagnosis. METHODS: Kentucky Cancer Registry data were used to identify female patients diagnosed with primary invasive breast cancer who were 20 years of age or older in 2007-2011. Patients were classified as having no depression, depression pre-cancer diagnosis only, depression post-cancer diagnosis only, or persistent depression. The impact of depression on receiving guideline-recommended treatment and survival was examined using multivariable logistic regression and Cox regression, respectively. RESULTS: Of 6054 eligible patients, 4.1%, 3.7%, and 6.2% of patients had persistent depression, depression pre-diagnosis only, and depression post-diagnosis only, respectively. A total of 1770 (29.2%) patients did not receive guideline-recommended cancer treatment. Compared to patients with no depression, the odds of receiving guideline-recommended treatment were decreased in patients with depression pre-diagnosis only (odds ratio [OR], 0.75; 95% confidence interval [CI], 0.54-1.04) but not in patients with post-diagnosis only or persistent depression. Depression post-diagnosis only (hazard ratio, 1.51; 95% CI, 1.24-1.83) and depression pre-diagnosis only (hazard ratio, 1.26; 95% CI, 0.99-1.59) were associated with worse survival. No significant difference in survival was found between patients with persistent depression and patients with no depression (p > .05). CONCLUSIONS: Neglecting depression management after a breast cancer diagnosis may result in poorer cancer treatment concordance and worse survival. Early detection and consistent management of depression are critical in improving patient survival.


Subject(s)
Breast Neoplasms , Humans , Female , Breast Neoplasms/complications , Breast Neoplasms/therapy , Breast Neoplasms/diagnosis , Kentucky/epidemiology , Proportional Hazards Models , Registries
3.
BMC Bioinformatics ; 23(Suppl 12): 386, 2022 Sep 23.
Article in English | MEDLINE | ID: mdl-36151511

ABSTRACT

BACKGROUND: Public Data Commons (PDC) have been highlighted in the scientific literature for their capacity to collect and harmonize big data. On the other hand, local data commons (LDC), located within an institution or organization, have been underrepresented in the scientific literature, even though they are a critical part of research infrastructure. Being closest to the sources of data, LDCs can collect and maintain the most up-to-date, high-quality data within an organization. As data providers, LDCs face many challenges in both collecting and standardizing data; moreover, as consumers of PDC, they face problems of data harmonization stemming from the monolithic harmonization pipeline designs commonly adopted by many PDCs. Unfortunately, existing guidelines and resources for building and maintaining data commons exclusively focus on PDC and provide very little information on LDC. RESULTS: This article focuses on four important observations. First, there are three different types of LDC service models that are defined based on their roles and requirements. These can be used as guidelines for building new LDC or enhancing the services of existing LDC. Second, the seven core services of LDC are discussed, including cohort identification and facilitation of genomic sequencing, the management of molecular reports and associated infrastructure, quality control, data harmonization, data integration, data sharing, and data access control. Third, instead of commonly developed monolithic systems, we propose a new data sharing method for data harmonization that combines both divide-and-conquer and bottom-up approaches. Finally, an end-to-end LDC implementation is introduced with real-world examples. CONCLUSIONS: Although LDCs are an optimal place to identify and address data quality issues, they have traditionally been relegated to the role of passive data provider for much larger PDC. Indeed, many LDCs limit their functions to only conducting routine data storage and transmission tasks due to a lack of information on how to design, develop, and improve their services using limited resources. We hope that this work will be the first small step in raising awareness among the LDCs of their expanded utility and in publicizing the importance of LDC to a wider audience.


Subject(s)
Big Data , Information Dissemination , Developing Countries , Humans
4.
J Biomed Inform ; 125: 103957, 2022 01.
Article in English | MEDLINE | ID: mdl-34823030

ABSTRACT

In the last decade, the widespread adoption of electronic health record documentation has created huge opportunities for information mining. Natural language processing (NLP) techniques using machine and deep learning are becoming increasingly widespread for information extraction tasks from unstructured clinical notes. Disparities in performance when deploying machine learning models in the real world have recently received considerable attention. In the clinical NLP domain, the robustness of convolutional neural networks (CNNs) for classifying cancer pathology reports under natural distribution shifts remains understudied. In this research, we aim to quantify and improve the performance of the CNN for text classification on out-of-distribution (OOD) datasets resulting from the natural evolution of clinical text in pathology reports. We identified class imbalance due to different prevalence of cancer types as one of the sources of performance drop and analyzed the impact of previous methods for addressing class imbalance when deploying models in real-world domains. Our results show that our novel class-specialized ensemble technique outperforms other methods for the classification of rare cancer types in terms of macro F1 scores. We also found that traditional ensemble methods perform better in top classes, leading to higher micro F1 scores. Based on our findings, we formulate a series of recommendations for other ML practitioners on how to build robust models with extremely imbalanced datasets in biomedical NLP applications.
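The contrast between macro and micro F1 in this abstract can be made concrete with a small sketch. Macro F1 averages the per-class F1 scores, so a rare cancer type counts as much as a common one; micro F1 pools the counts over all classes, so frequent classes dominate. The function name `f1_scores` and the toy labels are illustrative assumptions, not taken from the paper.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence


def f1_scores(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> Dict[str, float]:
    """Compute macro- and micro-averaged F1 for single-label classification."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative
    classes = set(y_true) | set(y_pred)

    def f1(tp_: int, fp_: int, fn_: int) -> float:
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    # Macro: unweighted mean of per-class F1.
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in classes) / len(classes)
    # Micro: F1 of the pooled counts.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return {"macro": macro, "micro": micro}


if __name__ == "__main__":
    # Toy imbalanced data: the classifier never predicts the rare class.
    y_true = ["common"] * 8 + ["rare"] * 2
    y_pred = ["common"] * 10
    print(f1_scores(y_true, y_pred))  # micro is 0.8, macro is about 0.44
```

In the toy data the classifier ignores the rare class entirely, yet micro F1 stays high; this is why a method that helps rare classes shows up mainly in macro F1.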


Subject(s)
Natural Language Processing , Neoplasms , Electronic Health Records , Humans , Machine Learning , Neural Networks, Computer
5.
BMC Bioinformatics ; 22(1): 113, 2021 Mar 09.
Article in English | MEDLINE | ID: mdl-33750288

ABSTRACT

BACKGROUND: Automated text classification has many important applications in the clinical setting; however, obtaining labelled data for training machine learning and deep learning models is often difficult and expensive. Active learning techniques may mitigate this challenge by reducing the amount of labelled data required to effectively train a model. In this study, we analyze the effectiveness of 11 active learning algorithms on classifying subsite and histology from cancer pathology reports using a Convolutional Neural Network as the text classification model. RESULTS: We compare the performance of each active learning strategy using two differently sized datasets and two different classification tasks. Our results show that on all tasks and dataset sizes, all active learning strategies except diversity-sampling strategies outperformed random sampling, i.e., no active learning. On our large dataset (15K initial labelled samples, adding 15K additional labelled samples each iteration of active learning), there was no clear winner between the different active learning strategies. On our small dataset (1K initial labelled samples, adding 1K additional labelled samples each iteration of active learning), marginal and ratio uncertainty sampling performed better than all other active learning techniques. We found that compared to random sampling, active learning strongly helps performance on rare classes by focusing on underrepresented classes. CONCLUSIONS: Active learning can save annotation cost by helping human annotators efficiently and intelligently select which samples to label. Our results show that a dataset constructed using effective active learning techniques requires less than half the amount of labelled data to achieve the same performance as a dataset constructed using random sampling.
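The two strategies that performed best on the small dataset, marginal and ratio uncertainty sampling, can be sketched as follows. The definitions used here (one minus the gap between the top two class probabilities, and the ratio of the second to the first) are common textbook forms and are an assumption; the paper's exact formulations may differ. All function names are hypothetical.

```python
from typing import Callable, List, Sequence


def margin_score(probs: Sequence[float]) -> float:
    """Margin uncertainty: 1 minus the gap between the two most likely classes.

    Values near 1 mean the model can barely separate its top two choices.
    """
    top1, top2 = sorted(probs, reverse=True)[:2]
    return 1.0 - (top1 - top2)


def ratio_score(probs: Sequence[float]) -> float:
    """Ratio uncertainty: second-highest probability divided by the highest.

    Values near 1 mean the top two classes are almost equally likely.
    """
    top1, top2 = sorted(probs, reverse=True)[:2]
    return top2 / top1


def select_for_labeling(
    pool_probs: Sequence[Sequence[float]],
    k: int,
    score: Callable[[Sequence[float]], float] = margin_score,
) -> List[int]:
    """Return indices of the k most uncertain unlabeled samples."""
    ranked = sorted(range(len(pool_probs)), key=lambda i: score(pool_probs[i]), reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    # Hypothetical softmax outputs for three unlabeled reports.
    pool = [[0.90, 0.05, 0.05], [0.40, 0.35, 0.25], [0.60, 0.30, 0.10]]
    print(select_for_labeling(pool, k=2))                     # [1, 2]
    print(select_for_labeling(pool, k=2, score=ratio_score))  # [1, 2]
```

In each active learning iteration, the selected indices would be sent to human annotators and the model retrained on the enlarged labelled set; random sampling corresponds to ignoring the scores.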


Subject(s)
Machine Learning , Neoplasms , Algorithms , Humans , Neoplasms/genetics , Neoplasms/pathology , Neural Networks, Computer
6.
Blood ; 131(26): 2943-2954, 2018 06 28.
Article in English | MEDLINE | ID: mdl-29695515

ABSTRACT

Prostate apoptosis response-4 (Par-4), a proapoptotic tumor suppressor protein, is downregulated in many cancers including renal cell carcinoma, glioblastoma, endometrial, and breast cancer. Par-4 induces apoptosis selectively in various types of cancer cells but not normal cells. We found that chronic lymphocytic leukemia (CLL) cells from human patients and from Eµ-Tcl1 mice constitutively express Par-4 in greater amounts than normal B-1 or B-2 cells. Interestingly, knockdown of Par-4 in human CLL-derived Mec-1 cells results in a robust increase in p21/WAF1 expression and decreased growth due to delayed G1-to-S cell-cycle transition. Lack of Par-4 also increased the expression of p21 and delayed CLL growth in Eµ-Tcl1 mice. Par-4 expression in CLL cells required constitutively active B-cell receptor (BCR) signaling, as inhibition of BCR signaling with US Food and Drug Administration (FDA)-approved drugs caused a decrease in Par-4 messenger RNA and protein, and an increase in apoptosis. In particular, activities of Lyn, a Src family kinase, spleen tyrosine kinase, and Bruton tyrosine kinase are required for Par-4 expression in CLL cells, suggesting a novel regulation of Par-4 through BCR signaling. Together, these results suggest that Par-4 may play a novel progrowth rather than proapoptotic role in CLL and could be targeted to enhance the therapeutic effects of BCR-signaling inhibitors.


Subject(s)
Apoptosis Regulatory Proteins/metabolism , Gene Expression Regulation, Leukemic , Leukemia, Lymphocytic, Chronic, B-Cell/metabolism , Animals , Apoptosis Regulatory Proteins/genetics , Cell Cycle , Cell Line, Tumor , Cyclin-Dependent Kinase Inhibitor p21/genetics , Cyclin-Dependent Kinase Inhibitor p21/metabolism , Gene Deletion , Humans , Leukemia, Lymphocytic, Chronic, B-Cell/genetics , Leukemia, Lymphocytic, Chronic, B-Cell/pathology , Mice, Inbred C57BL , Mice, Inbred NOD , Receptors, Antigen, B-Cell/metabolism , Signal Transduction , Up-Regulation
7.
J Biomed Inform ; 110: 103564, 2020 10.
Article in English | MEDLINE | ID: mdl-32919043

ABSTRACT

OBJECTIVE: In machine learning, it is evident that classification task performance increases if bootstrap aggregation (bagging) is applied. However, the bagging of deep neural networks takes tremendous amounts of computational resources and training time. The research question that we aimed to answer in this research is whether we could achieve higher task performance scores and accelerate the training by dividing a problem into sub-problems. MATERIALS AND METHODS: The data used in this study consist of free text from electronic cancer pathology reports. We applied bagging and partitioned data training using Multi-Task Convolutional Neural Network (MT-CNN) and Multi-Task Hierarchical Convolutional Attention Network (MT-HCAN) classifiers. We split a big problem into 20 sub-problems, resampled the training cases 2,000 times, and trained the deep learning model for each bootstrap sample and each sub-problem, thus generating up to 40,000 models. We performed the training of many models concurrently in a high-performance computing environment at Oak Ridge National Laboratory (ORNL). RESULTS: We demonstrated that aggregation of the models improves task performance compared with the single-model approach, which is consistent with other research studies; and we demonstrated that the two proposed partitioned bagging methods achieved higher classification accuracy scores on four tasks. Notably, the improvements were significant for the extraction of cancer histology data, which had more than 500 class labels in the task; these results show that data partition may alleviate the complexity of the task. On the contrary, the methods did not achieve superior scores for the tasks of site and subsite classification. Intrinsically, since data partitioning was based on the primary cancer site, the accuracy depended on the determination of the partitions, which needs further investigation and improvement. CONCLUSION: Results in this research demonstrate that (1) the data partitioning and bagging strategy achieved higher performance scores and (2) we achieved faster training by leveraging the high-performance Summit supercomputer at ORNL.
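A minimal sketch of partitioned bagging, under stated assumptions, may help fix the idea: split the training data into sub-problems, draw bootstrap samples within each, train one model per sample, and aggregate predictions by majority vote. The helper names (`partition_by`, `train_bagged`, `predict_bagged`) and the generic `train_fn` callback are hypothetical; the study itself trained MT-CNN and MT-HCAN models, which are not reproduced here.

```python
import random
from collections import Counter, defaultdict
from typing import Callable, Dict, Hashable, List, Sequence, Tuple

Example = Tuple[str, str]        # (report text, label); an assumed data layout
Model = Callable[[str], str]     # maps report text to a predicted label


def partition_by(data: Sequence[Example], key_fn: Callable[[Example], Hashable]) -> Dict[Hashable, List[Example]]:
    """Split data into sub-problems, e.g. by primary cancer site."""
    parts: Dict[Hashable, List[Example]] = defaultdict(list)
    for ex in data:
        parts[key_fn(ex)].append(ex)
    return dict(parts)


def bootstrap_sample(data: Sequence[Example], rng: random.Random) -> List[Example]:
    """Draw len(data) examples with replacement."""
    return [rng.choice(data) for _ in data]


def train_bagged(
    data: Sequence[Example],
    train_fn: Callable[[List[Example]], Model],
    n_models: int,
    seed: int = 0,
) -> List[Model]:
    """Train one model per bootstrap sample of a single partition."""
    rng = random.Random(seed)
    return [train_fn(bootstrap_sample(data, rng)) for _ in range(n_models)]


def predict_bagged(models: Sequence[Model], text: str) -> str:
    """Aggregate the ensemble's predictions by majority vote."""
    votes = Counter(m(text) for m in models)
    return votes.most_common(1)[0][0]


# Model count stated in the abstract: 20 sub-problems x 2,000 resamples.
N_PARTITIONS, N_BOOTSTRAP = 20, 2_000
assert N_PARTITIONS * N_BOOTSTRAP == 40_000
```

Each (partition, bootstrap sample) pair is an independent training job, which is what makes the scheme suited to concurrent execution on a high-performance computing system.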


Subject(s)
Neoplasms , Neural Networks, Computer , Computing Methodologies , Humans , Information Storage and Retrieval , Machine Learning
8.
BMC Med Inform Decis Mak ; 20(Suppl 10): 271, 2020 12 15.
Article in English | MEDLINE | ID: mdl-33319710

ABSTRACT

BACKGROUND: The Kentucky Cancer Registry (KCR) is a central cancer registry for the state of Kentucky that receives data about incident cancer cases from all healthcare facilities in the state within 6 months of diagnosis. Similar to all other U.S. and Canadian cancer registries, KCR uses a data dictionary provided by the North American Association of Central Cancer Registries (NAACCR) for standardized data entry. The NAACCR data dictionary is not an ontological system. Mapping between the NAACCR data dictionary and the National Cancer Institute (NCI) Thesaurus (NCIt) will facilitate the enrichment, dissemination and utilization of cancer registry data. We introduce a web-based system, called Interactive Mapping Interface (IMI), for creating mappings from data dictionaries to ontologies, in particular from NAACCR to NCIt. METHOD: IMI has been designed as a general approach with three components: (1) ontology library; (2) mapping interface; and (3) recommendation engine. The ontology library provides a list of ontologies as targets for building mappings. The mapping interface consists of six modules: project management, mapping dashboard, access control, logs and comments, hierarchical visualization, and result review and export. The built-in recommendation engine automatically identifies a list of candidate concepts to facilitate the mapping process. RESULTS: We report the architecture design and interface features of IMI. To validate our approach, we implemented an IMI prototype and pilot-tested features using the IMI interface to map a sample set of NAACCR data elements to NCIt concepts. 47 out of 301 NAACCR data elements have been mapped to NCIt concepts. Five branches of hierarchical tree have been identified from these mapped concepts for visual inspection. CONCLUSIONS: IMI provides an interactive, web-based interface for building mappings from data dictionaries to ontologies. 
Although our pilot-testing scope is limited, our results demonstrate the feasibility of using IMI for semantic enrichment of cancer registry data by mapping NAACCR data elements to NCIt concepts.


Subject(s)
Biological Ontologies , Neoplasms , Canada/epidemiology , Humans , Internet , Neoplasms/diagnosis , Neoplasms/epidemiology , Registries , Vocabulary, Controlled
9.
Cancer ; 125(21): 3729-3737, 2019 11 01.
Article in English | MEDLINE | ID: mdl-31381143

ABSTRACT

Population-based cancer registries have improved dramatically over the last 2 decades. These central cancer registries provide a critical framework that can elevate the science of cancer research. There have also been important technical and scientific advances that help to unlock the potential of population-based cancer registries. These advances include improvements in probabilistic record linkage, refinements in natural language processing, the ability to perform genomic sequencing on formalin-fixed, paraffin-embedded (FFPE) tissue, and improvements in the ability to identify activity levels of many different signaling molecules in FFPE tissue. This article describes how central cancer registries can provide a population-based sample frame that will lead to studies with strong external validity, how central cancer registries can link with public and private health insurance claims to obtain complete treatment information, how central cancer registries can use informatics techniques to provide population-based rapid case ascertainment, how central cancer registries can serve as a population-based virtual tissue repository, and how population-based cancer registries are essential for guiding the implementation of evidence-based interventions and measuring changes in the cancer burden after the implementation of these interventions.


Subject(s)
Neoplasms/diagnosis , Neoplasms/therapy , Population Surveillance/methods , Registries/statistics & numerical data , Biomedical Research/methods , Biomedical Research/statistics & numerical data , Fixatives/chemistry , Formaldehyde/chemistry , High-Throughput Nucleotide Sequencing/methods , Humans , Paraffin Embedding/methods , Tissue Fixation/methods
10.
Cancer Control ; 26(1): 1073274819845873, 2019.
Article in English | MEDLINE | ID: mdl-31014079

ABSTRACT

Recent metabolic and genetic research has demonstrated that risk for specific histological types of lung cancer varies in relation to cigarette smoking and obesity. This study investigated the spatial and temporal distribution of lung cancer histological types in Kentucky, a largely rural state with high rates of smoking and obesity, to discern population-level trends that might reflect variation in these and other risk factors. The Kentucky Cancer Registry provided residential geographic coordinates for lung cancer cases diagnosed from 1995 through 2014. We used multinomial and discrete Poisson spatiotemporal scan statistics, adjusted for age, gender, and race, to characterize risk for specific histological types (small cell, adenocarcinoma, squamous cell, and other types) throughout Kentucky and compared to maps of risk factors. Toward the end of the study period, adenocarcinoma was more common among all population subgroups in north-central Kentucky, where smoking and obesity are less prevalent. During the same time frame, squamous cell, small cell, and other types were more common in rural Appalachia, where smoking and obesity are more prevalent, and in some high poverty urban areas. Spatial and temporal patterns in the distribution of histological types of lung cancer are likely related to regional variation in multiple risk factors. High smoking and obesity rates in the Appalachian region, and likely in high poverty urban areas, appeared to coincide with high rates of squamous cell and small cell lung cancer. In north-central Kentucky, environmental exposures might have resulted in higher risk for adenocarcinoma specifically.


Subject(s)
Adenocarcinoma of Lung/epidemiology , Cigarette Smoking/epidemiology , Lung Neoplasms/epidemiology , Obesity/epidemiology , Small Cell Lung Carcinoma/epidemiology , Adenocarcinoma of Lung/pathology , Aged , Cluster Analysis , Female , Humans , Kentucky/epidemiology , Lung Neoplasms/pathology , Male , Middle Aged , Risk Factors , Small Cell Lung Carcinoma/pathology , Spatio-Temporal Analysis
11.
J Biomed Inform ; 97: 103267, 2019 09.
Article in English | MEDLINE | ID: mdl-31401235

ABSTRACT

OBJECTIVE: We study the performance of machine learning (ML) methods, including neural networks (NNs), to extract mutational test results from pathology reports collected by cancer registries. Given the lack of hand-labeled datasets for mutational test result extraction, we focus on the particular use-case of extracting Epidermal Growth Factor Receptor mutation results in non-small cell lung cancers. We explore the generalization of NNs across different registries where our goals are twofold: (1) to assess how well models trained on a registry's data port to test data from a different registry and (2) to assess whether and to what extent such models can be improved using state-of-the-art neural domain adaptation techniques under different assumptions about what is available (labeled vs unlabeled data) at the target registry site. MATERIALS AND METHODS: We collected data from two registries: the Kentucky Cancer Registry (KCR) and the Fred Hutchinson Cancer Research Center (FH) Cancer Surveillance System. We combine NNs with adversarial domain adaptation to improve cross-registry performance. We compare to other classifiers in the standard supervised classification, unsupervised domain adaptation, and supervised domain adaptation scenarios. RESULTS: The performance of ML methods varied between registries. To extract positive results, the basic convolutional neural network (CNN) had an F1 of 71.5% on the KCR dataset and 95.7% on the FH dataset. For the KCR dataset, the CNN F1 results were low when trained on FH data (Positive F1: 23%). Using our proposed adversarial CNN, without any labeled data, we match the F1 of the models trained directly on each target registry's data. The adversarial CNN F1 improved when trained on FH and applied to KCR dataset (Positive F1: 70.8%). We found similar performance improvements when we trained on KCR and tested on FH reports (Positive F1: 45% to 96%). 
CONCLUSION: Adversarial domain adaptation improves the performance of NNs applied to pathology reports. In the unsupervised domain adaptation setting, we match the performance of models that are trained directly on target registry's data by using source registry's labeled data and unlabeled examples from the target registry.


Subject(s)
Machine Learning , Mutation , Neoplasms/genetics , Neoplasms/pathology , Registries/statistics & numerical data , Carcinoma, Non-Small-Cell Lung/genetics , Carcinoma, Non-Small-Cell Lung/pathology , Computational Biology , Data Mining , Deep Learning , ErbB Receptors/genetics , Humans , Lung Neoplasms/genetics , Lung Neoplasms/pathology , Neural Networks, Computer
12.
J Neurooncol ; 132(3): 507-512, 2017 05.
Article in English | MEDLINE | ID: mdl-28285334

ABSTRACT

We sought to determine whether the risk of astrocytomas in Appalachian children is higher than the national average. We compared the incidence of pediatric brain tumors in Appalachia versus non-Appalachia regions, covering years 2000-2011. The North American Association of Central Cancer Registries (NAACCR) collects population-based data from 55 cancer registries throughout the U.S. and Canada. All invasive primary (i.e., non-metastatic) tumors with age at diagnosis 0-19 years were included. Nearly 27,000 and 2200 central nervous system (CNS) tumors from non-Appalachia and Appalachia, respectively, comprise the cohorts. Age-adjusted incidence rates of each main brain tumor subtype were compared. The incidence rate of pediatric CNS tumors was 8% higher in Appalachia, 3.31 [95% CI 3.17-3.45], versus non-Appalachia, 3.06 [95% CI 3.02-3.09], for the years 2001-2011; all rates are per 100,000 population. Astrocytomas accounted for the majority of this difference, with the rate being 16% higher in Appalachian children, 1.77 [95% CI 1.67-1.87], versus non-Appalachian children, 1.52 [95% CI 1.50-1.55]. Among astrocytomas, World Health Organization (WHO) grade I astrocytomas were 41% higher in Appalachia, 0.63 [95% CI 0.56-0.70], versus non-Appalachia, 0.44 [95% CI 0.43-0.46], for the years 2004-2011. This is the first study to demonstrate that Appalachian children are at greater risk of CNS neoplasms, and that much of this difference is in WHO grade I astrocytomas, which were 41% more common. The cause of this increased incidence is unknown, and we discuss its importance in relation to genetic and environmental findings in Appalachia.


Subject(s)
Brain Neoplasms/epidemiology , Adolescent , Appalachian Region/epidemiology , Child , Child, Preschool , Female , Humans , Incidence , Infant , Infant, Newborn , Male , Registries , Young Adult
13.
J Surg Res ; 214: 1-8, 2017 06 15.
Article in English | MEDLINE | ID: mdl-28624029

ABSTRACT

BACKGROUND: Although adjuvant therapy (AT) is a necessary component of multimodality therapy for pancreatic ductal adenocarcinoma (PDAC), its application can be hindered by post-pancreaticoduodenectomy (PD) complications. The primary aim of this study was to evaluate the impact of post-PD complications on AT utilization and overall survival (OS). METHODS: Patients undergoing PD without neoadjuvant therapy for stages I-III PDAC at a single institution (2007-2015) were evaluated. Ninety-day postoperative major complications (PMCs) were defined as grade ≥3. Records were linked to the Kentucky Cancer Registry for AT/OS data. Early AT was given <8 wk; late 8-16 wk. Initiation >16 wk was not considered to be AT. Complication effects on AT timing/utilization and OS were evaluated. RESULTS: Of 93 consecutive patients treated with surgery upfront with AT data, 64 (69%) received AT (41 [44%] early; 23 [25%] late). There were 32 patients (34%) with low-grade complications and 24 (26%) with PMC. With PMC, only six of 24 patients (25%) received early AT and 13 of 24 (54%) received any (early/late) AT versus 35 of 69 (51%) early AT and 51 of 69 (74%) any AT without PMC. PMCs were associated with worse median OS (7.1 versus 24.6 mo, without PMC, P < 0.001). Independent predictors of OS included AT (hazard ratio [HR]: 0.48), tumor >2 cm (HR: 3.39), node-positivity (HR: 2.16), and PMC (HR: 3.69, all P < 0.02). CONCLUSIONS: Independent of AT utilization and biologic factors, PMC negatively impacted OS in patients treated with surgery first. These data suggest that strategies to decrease PMC and treatment sequencing alternatives to increase multimodality therapy rates may improve oncologic outcomes for PDAC.


Subject(s)
Carcinoma, Pancreatic Ductal/therapy , Pancreatic Neoplasms/therapy , Pancreaticoduodenectomy , Postoperative Complications , Adult , Aged , Aged, 80 and over , Carcinoma, Pancreatic Ductal/mortality , Carcinoma, Pancreatic Ductal/surgery , Chemotherapy, Adjuvant , Female , Follow-Up Studies , Humans , Male , Middle Aged , Pancreatic Neoplasms/mortality , Pancreatic Neoplasms/surgery , Radiotherapy, Adjuvant , Retrospective Studies , Survival Analysis , Time Factors , Treatment Outcome
14.
J Surg Oncol ; 114(4): 451-5, 2016 Sep.
Article in English | MEDLINE | ID: mdl-27238300

ABSTRACT

BACKGROUND: Long-term results of the ESPAC-3 trial suggest that while completing adjuvant therapy (AT) is necessary after resection of pancreatic ductal adenocarcinoma (PDAC), early initiation (within 8 weeks) may not be associated with improved overall survival (OS). The primary aim of this study was to evaluate the OS impact of early versus late AT in a statewide analysis. METHODS: Patients with stages I-III PDAC in the Kentucky Cancer Registry (KCR) from 2004 to 2013, were evaluated. Those undergoing pancreatectomy were stratified into two groups ("early," <8 weeks, vs. "late," 8-16 weeks). RESULTS: Of 2,221 diagnosed patients with stages I-III, 831 (37.4%) underwent pancreatectomy upfront. Of these, only 420 (50.5%) received AT. Initiation date of AT was not associated with OS (median OS: early, 20.2 vs. late, 19.0 months, P = 0.97). On multivariate analysis, factors that affected OS included stage (II, HR-1.82, P = 0.017; III, HR-3.77, P < 0.001), node positivity (HR-1.51, P = 0.004), poorly/undifferentiated grade (HR-1.34; P = 0.011), but not AT initiation date. CONCLUSIONS: In this statewide analysis, there was no difference in OS between early and late AT initiation for resected PDAC. The ideal window for AT initiation remains unknown as tumor biology continues to trump regimens from the past decade. J. Surg. Oncol. 2016;114:451-455. © 2016 Wiley Periodicals, Inc.


Subject(s)
Adenocarcinoma/therapy , Carcinoma, Pancreatic Ductal/therapy , Pancreatectomy , Pancreatic Neoplasms/therapy , Adenocarcinoma/mortality , Adenocarcinoma/pathology , Adult , Aged , Aged, 80 and over , Carcinoma, Pancreatic Ductal/mortality , Carcinoma, Pancreatic Ductal/pathology , Combined Modality Therapy , Female , Humans , Male , Middle Aged , Neoplasm Staging , Pancreatic Neoplasms/mortality , Pancreatic Neoplasms/pathology , Registries , Time Factors
15.
JCO Oncol Pract ; 20(5): 631-642, 2024 May.
Article in English | MEDLINE | ID: mdl-38194612

ABSTRACT

PURPOSE: Database linkage between cancer registries and clinical trial consortia has the potential to elucidate referral patterns of children and adolescents with newly diagnosed cancer, including enrollment into cancer clinical trials. This study's primary objective was to assess the feasibility of this linkage approach. METHODS: Patients younger than 20 years diagnosed with incident cancer during 2012-2017 in the Kentucky Cancer Registry (KCR) were linked with patients enrolled in a Children's Oncology Group (COG) study. Matched patients between databases were described by sex, age, race and ethnicity, geographical location when diagnosed, and cancer type. Logistic regression modeling identified factors associated with COG study enrollment. Timeliness of patient identification by KCR was reported through the Centers for Disease Control and Prevention's Early Case Capture (ECC) program. RESULTS: Of 1,357 patients reported to KCR, 47% were determined by matching to be enrolled in a COG study. Patients had greater odds of enrollment if they were age 0-4 years (v 15-19 years), reported from a COG-affiliated institution, and had renal cancer, neuroblastoma, or leukemia. Patients had lower odds of enrollment if Hispanic (v non-Hispanic White) or had epithelial (eg, thyroid, melanoma) cancer. Most (59%) patients were reported to KCR within 10 days of pathologic diagnosis. CONCLUSION: Linkage of clinical trial data with cancer registries is a feasible approach for tracking patient referral and clinical trial enrollment patterns. Adolescents had lower enrollment compared with younger age groups, independent of cancer type. Population-based early case capture could guide interventions designed to increase cancer clinical trial enrollment.


Subject(s)
Clinical Trials as Topic , Neoplasms , Humans , Adolescent , Child , Female , Male , Neoplasms/therapy , Neoplasms/epidemiology , Child, Preschool , Infant , Infant, Newborn , Registries , Young Adult , Patient Selection , Information Storage and Retrieval
16.
Front Oncol ; 13: 1193487, 2023.
Article in English | MEDLINE | ID: mdl-37664066

ABSTRACT

Background: Appalachia is a region with significant cancer disparities in incidence and mortality compared to Kentucky and the United States. However, knowledge of the contribution of these cancer health disparities to subsequent primary cancers (SPCs) among survivors of adult-onset cancers is limited. This study aimed to quantify the overall and cancer type-specific risks of SPCs among adult-onset cancer survivors by first primary cancer (FPC) type, residence, and sex. Methods: This retrospective cohort study from the Kentucky Cancer Registry included 148,509 individuals aged 20-84 years who were diagnosed with FPCs from 2000 to 2014 (followed until December 31, 2019) and survived at least 5 years. Expected numbers of SPCs were derived from incidence rates in the Kentucky population; standardized incidence ratios (SIRs) compared observed SPC counts with those expected in the general Kentucky population. Results: Among 148,509 survivors (50.2% women, 27.9% Appalachian), 17,970 SPC cases occurred during 829,530 person-years of follow-up (mean, 5.6 years). Among men, the overall risk of developing any SPC was statistically significantly higher for 20 of the 30 FPC types, as compared with the general population. Among women, the overall risk of developing any SPC was statistically significantly higher for 20 of the 31 FPC types, as compared with the general population. The highest overall SIRs were estimated among oral cancer survivors (SIR, 2.14 [95% CI, 1.97-2.33]) among men, and among laryngeal cancer survivors (SIR, 3.62 [95% CI, 2.93-4.42]) among women. Appalachian survivors had a significantly increased risk of overall SPC and of different site-specific SPCs when compared to non-Appalachian survivors. The highest overall SIRs were estimated among laryngeal cancer survivors for both Appalachian and non-Appalachian residents (SIR, 2.50 [95% CI, 2.10-2.95] and SIR, 2.02 [95% CI, 1.77-2.03], respectively).
Conclusion: Among adult-onset cancer survivors in Kentucky, several FPC types were significantly associated with greater risk of developing an SPC, compared with the general population. Risk for Appalachian survivors was even higher when compared to non-Appalachian residents, but was not explained by higher risk of smoking-related cancers. Cancers associated with smoking comprised substantial proportions of overall SPC incidence among all survivors and highlight the importance of ongoing surveillance and efforts to prevent new cancers among survivors.
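
Note on the SIR: a standardized incidence ratio is the observed case count divided by the expected count, and its confidence interval is usually taken from a Poisson interval on the observed count. The sketch below is illustrative only. It assumes Byar's approximation for the Poisson limits (the abstract does not state which interval method the authors used), and the function name `sir_with_ci` and the counts are hypothetical, not taken from the study.

```python
import math


def sir_with_ci(observed, expected, z=1.96):
    """Return (SIR, lower, upper) for an observed and an expected case count.

    Uses Byar's approximation to the Poisson confidence limits of the
    observed count; this is an assumed method, not the one confirmed
    by the abstract.
    """
    sir = observed / expected
    # Lower Poisson limit for the observed count (Byar's approximation)
    lower_count = observed * (
        1 - 1 / (9 * observed) - z / (3 * math.sqrt(observed))
    ) ** 3
    # Upper limit uses observed + 1
    o1 = observed + 1
    upper_count = o1 * (1 - 1 / (9 * o1) + z / (3 * math.sqrt(o1))) ** 3
    return sir, lower_count / expected, upper_count / expected


# Hypothetical counts for illustration only (not from the study)
sir, lo, hi = sir_with_ci(observed=100, expected=50.0)
print(f"SIR {sir:.2f} (95% CI, {lo:.2f}-{hi:.2f})")
```

An interval that excludes 1.0, as in this hypothetical case, corresponds to what the abstract calls a statistically significantly higher risk.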

17.
medRxiv ; 2023 Oct 26.
Article in English | MEDLINE | ID: mdl-37205575

ABSTRACT

Objective: The manual extraction of case details from patient records for cancer surveillance efforts is a resource-intensive task. Natural Language Processing (NLP) techniques have been proposed for automating the identification of key details in clinical notes. Our goal was to develop NLP application programming interfaces (APIs) for integration into cancer registry data abstraction tools in a computer-assisted abstraction setting. Methods: We used cancer registry manual abstraction processes to guide the design of DeepPhe-CR, a web-based NLP service API. The coding of key variables was done through NLP methods validated using established workflows. A container-based implementation including the NLP was developed. Existing registry data abstraction software was modified to include results from DeepPhe-CR. An initial usability study with data registrars provided early validation of the feasibility of the DeepPhe-CR tools. Results: API calls support submission of single documents and summarization of cases across multiple documents. The container-based implementation uses a REST router to handle requests and support a graph database for storing results. NLP modules extract topography, histology, behavior, laterality, and grade at 0.79-1.00 F1 across common and rare cancer types (breast, prostate, lung, colorectal, ovary and pediatric brain) on data from two cancer registries. Usability study participants were able to use the tool effectively and expressed interest in adopting the tool. Discussion: Our DeepPhe-CR system provides a flexible architecture for building cancer-specific NLP tools directly into registrar workflows in a computer-assisted abstraction setting. Improving user interactions in client tools may be needed to realize the potential of these approaches. DeepPhe-CR: https://deepphe.github.io/.

18.
J Natl Cancer Inst ; 115(11): 1337-1354, 2023 11 08.
Article in English | MEDLINE | ID: mdl-37433078

ABSTRACT

BACKGROUND: Cancer is a leading cause of death by disease among children and adolescents in the United States. This study updates cancer incidence rates and trends using the most recent and comprehensive US cancer registry data available. METHODS: We used data from US Cancer Statistics to evaluate counts, age-adjusted incidence rates, and trends among children and adolescents younger than 20 years of age diagnosed with malignant tumors between 2003 and 2019. We calculated the average annual percent change (AAPC) and annual percent change (APC) using joinpoint regression. Rates and trends were stratified by demographic and geographic characteristics and by cancer type. RESULTS: With 248 749 cases reported between 2003 and 2019, the overall cancer incidence rate was 178.3 per 1 million; incidence rates were highest for leukemia (46.6), central nervous system neoplasms (30.8), and lymphoma (27.3). Rates were highest for males, children 0 to 4 years of age, Non-Hispanic White children and adolescents, those in the Northeast census region, the top 25% of counties by economic status, and metropolitan counties with a population of 1 million people or more. Although the overall incidence rate of pediatric cancer increased 0.5% per year on average between 2003 and 2019, the rate increased between 2003 and 2016 (APC = 1.1%), and then decreased between 2016 and 2019 (APC = -2.1%). Between 2003 and 2019, rates of leukemia, lymphoma, hepatic tumors, bone tumors, and thyroid carcinomas increased, while melanoma rates decreased. Rates of central nervous system neoplasms increased until 2017, and then decreased. Rates of other cancer types remained stable. CONCLUSIONS: Incidence of pediatric cancer increased overall, although increases were limited to certain cancer types. These findings may guide future public health and research priorities.
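
Note on the trend figures: the 0.5% average yearly increase is consistent with the two segment-level changes reported above. A minimal sketch, assuming the usual definition of the average annual percent change as a length-weighted average of the segments' log-linear slopes, is shown here. The function name `average_annual_percent_change` is hypothetical, and the inputs are the rounded values from the abstract, so the result is only approximate.

```python
import math


def average_annual_percent_change(segments):
    """Combine per-segment annual percent changes into one average.

    segments: list of (apc_percent, length_in_years) tuples.
    Each APC is converted back to a log-linear slope, the slopes are
    averaged with weights equal to segment length, and the average
    slope is converted back to a percent change.
    """
    total_years = sum(years for _, years in segments)
    weighted_slope = sum(
        years * math.log(1 + apc / 100) for apc, years in segments
    ) / total_years
    return (math.exp(weighted_slope) - 1) * 100


# Rounded segment values as reported in the abstract
segments = [(1.1, 2016 - 2003), (-2.1, 2019 - 2016)]
overall = average_annual_percent_change(segments)
print(f"Average annual percent change: {overall:.2f}% per year")
```

With these inputs the result is just under 0.5% per year, in line with the overall figure reported for 2003 to 2019.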


Subject(s)
Central Nervous System Neoplasms , Leukemia , Lymphoma , Melanoma , Child , Male , Adolescent , Humans , United States/epidemiology , Young Adult , Adult , Incidence , Lymphoma/epidemiology , Central Nervous System Neoplasms/epidemiology , Leukemia/epidemiology
19.
Int J Radiat Oncol Biol Phys ; 117(1): 262-273, 2023 09 01.
Article in English | MEDLINE | ID: mdl-36990288

ABSTRACT

PURPOSE: Real-world evidence for radiation therapy (RT) is limited because it is often documented only in the clinical narrative. We developed a natural language processing system for automated extraction of detailed RT events from text to support clinical phenotyping. METHODS AND MATERIALS: A multi-institutional data set of 96 clinician notes, 129 North American Association of Central Cancer Registries cancer abstracts, and 270 RT prescriptions from HemOnc.org was used and divided into train, development, and test sets. Documents were annotated for RT events and associated properties: dose, fraction frequency, fraction number, date, treatment site, and boost. Named entity recognition models for properties were developed by fine-tuning BioClinicalBERT and RoBERTa transformer models. A multiclass RoBERTa-based relation extraction model was developed to link each dose mention with each property in the same event. Models were combined with symbolic rules to create a hybrid end-to-end pipeline for comprehensive RT event extraction. RESULTS: Named entity recognition models were evaluated on the held-out test set with F1 results of 0.96, 0.88, 0.94, 0.88, 0.67, and 0.94 for dose, fraction frequency, fraction number, date, treatment site, and boost, respectively. The relation model achieved an average F1 of 0.86 when the input was gold-labeled entities. The end-to-end system F1 result was 0.81. The end-to-end system performed best on North American Association of Central Cancer Registries abstracts (average F1 0.90), which are mostly copy-paste content from clinician notes. CONCLUSIONS: We developed methods and a hybrid end-to-end system for RT event extraction, which is the first natural language processing system for this task. This system provides proof-of-concept for real-world RT data collection for research and is promising for the potential of natural language processing methods to support clinical care.


Subject(s)
Natural Language Processing , Neoplasms , Humans , Neoplasms/radiotherapy , Electronic Health Records
20.
JCO Clin Cancer Inform ; 7: e2300156, 2023 09.
Article in English | MEDLINE | ID: mdl-38113411

ABSTRACT

PURPOSE: Manual extraction of case details from patient records for cancer surveillance is a resource-intensive task. Natural Language Processing (NLP) techniques have been proposed for automating the identification of key details in clinical notes. Our goal was to develop NLP application programming interfaces (APIs) for integration into cancer registry data abstraction tools in a computer-assisted abstraction setting. METHODS: We used cancer registry manual abstraction processes to guide the design of DeepPhe-CR, a web-based NLP service API. The coding of key variables was performed through NLP methods validated using established workflows. A container-based implementation of the NLP methods and the supporting infrastructure was developed. Existing registry data abstraction software was modified to include results from DeepPhe-CR. An initial usability study with data registrars provided early validation of the feasibility of the DeepPhe-CR tools. RESULTS: API calls support submission of single documents and summarization of cases across one or more documents. The container-based implementation uses a REST router to handle requests and support a graph database for storing results. NLP modules extract topography, histology, behavior, laterality, and grade at 0.79-1.00 F1 across multiple cancer types (breast, prostate, lung, colorectal, ovary, and pediatric brain) from data of two population-based cancer registries. Usability study participants were able to use the tool effectively and expressed interest in the tool. CONCLUSION: The DeepPhe-CR system provides an architecture for building cancer-specific NLP tools directly into registrar workflows in a computer-assisted abstraction setting. Improved user interactions in client tools may be needed to realize the potential of these approaches.


Subject(s)
Natural Language Processing , Neoplasms , Male , Female , Humans , Child , Software , Prostate , Registries , Neoplasms/diagnosis , Neoplasms/therapy