1.
JCO Oncol Pract ; 20(5): 631-642, 2024 May.
Article in English | MEDLINE | ID: mdl-38194612

ABSTRACT

PURPOSE: Database linkage between cancer registries and clinical trial consortia has the potential to elucidate referral patterns of children and adolescents with newly diagnosed cancer, including enrollment into cancer clinical trials. This study's primary objective was to assess the feasibility of this linkage approach. METHODS: Patients younger than 20 years diagnosed with incident cancer during 2012-2017 in the Kentucky Cancer Registry (KCR) were linked with patients enrolled in a Children's Oncology Group (COG) study. Matched patients between databases were described by sex, age, race and ethnicity, geographical location when diagnosed, and cancer type. Logistic regression modeling identified factors associated with COG study enrollment. Timeliness of patient identification by KCR was reported through the Centers for Disease Control and Prevention's Early Case Capture (ECC) program. RESULTS: Of 1,357 patients reported to KCR, 47% were determined by matching to be enrolled in a COG study. Patients had greater odds of enrollment if they were age 0-4 years (v 15-19 years), reported from a COG-affiliated institution, and had renal cancer, neuroblastoma, or leukemia. Patients had lower odds of enrollment if Hispanic (v non-Hispanic White) or had epithelial (eg, thyroid, melanoma) cancer. Most (59%) patients were reported to KCR within 10 days of pathologic diagnosis. CONCLUSION: Linkage of clinical trial data with cancer registries is a feasible approach for tracking patient referral and clinical trial enrollment patterns. Adolescents had lower enrollment compared with younger age groups, independent of cancer type. Population-based early case capture could guide interventions designed to increase cancer clinical trial enrollment.


Subjects
Clinical Trials as Topic; Neoplasms; Humans; Adolescent; Child; Female; Male; Neoplasms/therapy; Neoplasms/epidemiology; Child, Preschool; Infant; Infant, Newborn; Registries; Young Adult; Patient Selection; Information Storage and Retrieval
2.
J Biomed Inform ; 149: 104576, 2024 01.
Article in English | MEDLINE | ID: mdl-38101690

ABSTRACT

INTRODUCTION: Machine learning algorithms are expected to work side-by-side with humans in decision-making pipelines. Thus, the ability of classifiers to make reliable decisions is of paramount importance. Deep neural networks (DNNs) represent the state-of-the-art models to address real-world classification. Although the strength of activation in DNNs is often correlated with the network's confidence, in-depth analyses are needed to establish whether they are well calibrated. METHOD: In this paper, we demonstrate the use of DNN-based classification tools to benefit cancer registries by automating information extraction of disease at diagnosis and at surgery from electronic text pathology reports from the US National Cancer Institute (NCI) Surveillance, Epidemiology, and End Results (SEER) population-based cancer registries. In particular, we introduce multiple methods for selective classification to achieve a target level of accuracy on multiple classification tasks while minimizing the rejection amount, that is, the number of electronic pathology reports for which the model's predictions are unreliable. We evaluate the proposed methods by comparing our approach with the current in-house deep learning-based abstaining classifier. RESULTS: Overall, all the proposed selective classification methods effectively allow for achieving the targeted level of accuracy or higher in a trade-off analysis aimed to minimize the rejection rate. On in-distribution validation and holdout test data, with all the proposed methods, we achieve on all tasks the required target level of accuracy with a lower rejection rate than the deep abstaining classifier (DAC). Interpreting the results for the out-of-distribution test data is more complex; nevertheless, in this case as well, the rejection rate from the best among the proposed methods achieving 97% accuracy or higher is lower than the rejection rate based on the DAC. CONCLUSIONS: We show that although both approaches can flag those samples that should be manually reviewed and labeled by human annotators, the newly proposed methods retain a larger fraction and do so without retraining, thus offering a reduced computational cost compared with the in-house deep learning-based abstaining classifier.
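
The trade-off described in this abstract (reach a target accuracy while rejecting as few reports as possible) can be illustrated with a minimal sketch. The abstract does not say which selective classification methods were used, so the confidence-threshold rule below is an assumption chosen for illustration, and the function name `pick_threshold` and the sample values are hypothetical.

```python
from typing import List, Optional, Tuple


def pick_threshold(
    confidences: List[float],
    correct: List[int],
    target_accuracy: float,
) -> Optional[Tuple[float, float, float]]:
    """Return (threshold, retained_accuracy, rejection_rate), or None.

    Keeps every prediction whose confidence is >= threshold and picks the
    lowest threshold whose retained set still meets the target accuracy,
    which is the choice that rejects the fewest samples.
    """
    # Visit validation samples from most to least confident.
    order = sorted(range(len(confidences)), key=lambda i: confidences[i], reverse=True)
    best = None
    hits = 0
    for kept, i in enumerate(order, start=1):
        hits += correct[i]
        # A threshold can only cut between distinct confidence values.
        if kept < len(order) and confidences[order[kept]] == confidences[i]:
            continue
        accuracy = hits / kept
        if accuracy >= target_accuracy:
            best = (confidences[i], accuracy, 1 - kept / len(order))
    return best


# Hypothetical validation data: model confidence and whether it was right.
conf = [0.99, 0.95, 0.90, 0.80, 0.60, 0.55]
right = [1, 1, 1, 0, 1, 0]
result = pick_threshold(conf, right, target_accuracy=0.8)
```

Because the threshold is chosen after training, a rule of this kind needs no retraining, which is the property the conclusion contrasts with the abstaining classifier.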


Subjects
Deep Learning; Humans; Uncertainty; Neural Networks, Computer; Algorithms; Machine Learning
3.
JCO Clin Cancer Inform ; 7: e2300156, 2023 Sep.
Article in English | MEDLINE | ID: mdl-38113411

ABSTRACT

PURPOSE: Manual extraction of case details from patient records for cancer surveillance is a resource-intensive task. Natural Language Processing (NLP) techniques have been proposed for automating the identification of key details in clinical notes. Our goal was to develop NLP application programming interfaces (APIs) for integration into cancer registry data abstraction tools in a computer-assisted abstraction setting. METHODS: We used cancer registry manual abstraction processes to guide the design of DeepPhe-CR, a web-based NLP service API. The coding of key variables was performed through NLP methods validated using established workflows. A container-based implementation of the NLP methods and the supporting infrastructure was developed. Existing registry data abstraction software was modified to include results from DeepPhe-CR. An initial usability study with data registrars provided early validation of the feasibility of the DeepPhe-CR tools. RESULTS: API calls support submission of single documents and summarization of cases across one or more documents. The container-based implementation uses a REST router to handle requests and support a graph database for storing results. NLP modules extract topography, histology, behavior, laterality, and grade at 0.79-1.00 F1 across multiple cancer types (breast, prostate, lung, colorectal, ovary, and pediatric brain) from data of two population-based cancer registries. Usability study participants were able to use the tool effectively and expressed interest in the tool. CONCLUSION: The DeepPhe-CR system provides an architecture for building cancer-specific NLP tools directly into registrar workflows in a computer-assisted abstraction setting. Improved user interactions in client tools may be needed to realize the potential of these approaches.


Subjects
Natural Language Processing; Neoplasms; Male; Female; Humans; Child; Software; Prostate; Registries; Neoplasms/diagnosis; Neoplasms/therapy
4.
Front Oncol ; 13: 1193487, 2023.
Article in English | MEDLINE | ID: mdl-37664066

ABSTRACT

Background: Appalachia is a region with significant cancer disparities in incidence and mortality compared to Kentucky and the United States. However, evidence on the contribution of these cancer health disparities to subsequent primary cancers (SPCs) among survivors of adult-onset cancers is limited. This study aimed to quantify the overall and cancer type-specific risks of SPCs among adult-onset cancer survivors by first primary cancer (FPC) types, residence and sex. Methods: This retrospective cohort study from the Kentucky Cancer Registry included 148,509 individuals aged 20-84 years who were diagnosed with FPCs from 2000-2014 (followed until December 31, 2019) and survived at least 5 years. Expected numbers of SPCs were derived from incidence rates in the Kentucky population; standardized incidence ratios (SIRs) compared the observed numbers with those expected in the general Kentucky population. Results: Among 148,509 survivors (50.2% women, 27.9% Appalachian), 17,970 SPC cases occurred during 829,530 person-years of follow-up (mean, 5.6 years). Among men, the overall risk of developing any SPCs was statistically significantly higher for 20 of the 30 FPC types, as compared with risks in the general population. Among women, the overall risk of developing any SPCs was statistically significantly higher for 20 of the 31 FPC types, as compared to the general population. The highest overall SIRs were estimated among oral cancer survivors (SIR, 2.14 [95% CI, 1.97-2.33]) among men, and among laryngeal cancer survivors (SIR, 3.62 [95% CI, 2.93-4.42]) among women. Appalachian survivors had significantly increased risk of overall SPC and of different site-specific SPCs when compared to non-Appalachian survivors. The highest overall SIRs were estimated among laryngeal cancer survivors for both Appalachian and non-Appalachian residents (SIR, 2.50 [95% CI, 2.10-2.95] and SIR, 2.02 [95% CI, 1.77-2.03], respectively). Conclusion: Among adult-onset cancer survivors in Kentucky, several FPC types were significantly associated with greater risk of developing an SPC, compared with the general population. Risk for Appalachian survivors was even higher when compared to non-Appalachian residents, but was not explained by higher risk of smoking-related cancers. Cancers associated with smoking comprised substantial proportions of overall SPC incidence among all survivors and highlight the importance of ongoing surveillance and efforts to prevent new cancers among survivors.
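
For readers unfamiliar with the measure, a standardized incidence ratio is the observed case count divided by the count expected from reference population rates. The sketch below is illustrative only: the abstract does not state how its confidence intervals were computed, so Byar's approximation to the Poisson interval is an assumption, and the function name `sir_with_ci` and the counts are hypothetical rather than taken from the study.

```python
import math
from statistics import NormalDist


def sir_with_ci(observed: int, expected: float, level: float = 0.95):
    """Return (SIR, lower, upper) using Byar's approximation.

    `observed` is the number of cases seen in the cohort; `expected` is the
    number predicted by applying reference incidence rates to the cohort's
    person-years.
    """
    if expected <= 0:
        raise ValueError("expected must be positive")
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    sir = observed / expected
    if observed > 0:
        lower = observed * (1 - 1 / (9 * observed) - z / (3 * math.sqrt(observed))) ** 3
    else:
        lower = 0.0
    upper_count = observed + 1
    upper = upper_count * (
        1 - 1 / (9 * upper_count) + z / (3 * math.sqrt(upper_count))
    ) ** 3
    return sir, lower / expected, upper / expected


# Hypothetical counts, not taken from the study.
sir, lo, hi = sir_with_ci(observed=100, expected=50.0)
```

An interval that excludes 1 corresponds to the "statistically significantly higher" wording used in the results.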

5.
JNCI Cancer Spectr ; 7(5)2023 08 31.
Article in English | MEDLINE | ID: mdl-37525535

ABSTRACT

BACKGROUND: Management of localized or recurrent prostate cancer since the 1990s has been based on risk stratification using clinicopathological variables, including Gleason score, T stage (based on digital rectal exam), and prostate-specific antigen (PSA). In this study a novel prognostic test, the Decipher Prostate Genomic Classifier (GC), was used to stratify risk of prostate cancer progression in a US national database of men with prostate cancer. METHODS: Records of prostate cancer cases from participating SEER (Surveillance, Epidemiology, and End Results) program registries, diagnosed during the period from 2010 through 2018, were linked to records of testing with the GC prognostic test. Multivariable analysis was used to quantify the association between GC scores or risk groups and use of definitive local therapy after diagnosis in the GC biopsy-tested cohort and postoperative radiotherapy in the GC-tested cohort as well as adverse pathological findings after prostatectomy. RESULTS: A total of 572 545 patients were included in the analysis, of whom 8927 patients underwent GC testing. GC biopsy-tested patients were more likely to undergo active surveillance or watchful waiting than untested patients (odds ratio [OR] = 2.21, 95% confidence interval [CI] = 2.04 to 2.38, P < .001). The highest use of active surveillance or watchful waiting was for patients with a low-risk GC classification (41%) compared with those with an intermediate- (27%) or high-risk (11%) GC classification (P < .001). Among National Comprehensive Cancer Network patients with low and favorable-intermediate risk, higher GC risk class was associated with greater use of local therapy (OR = 4.79, 95% CI = 3.51 to 6.55, P < .001). Within this subset of patients who were subsequently treated with prostatectomy, high GC risk was associated with harboring adverse pathological findings (OR = 2.94, 95% CI = 1.38 to 6.27, P = .005). Use of radiation after prostatectomy was statistically significantly associated with higher GC risk groups (OR = 2.69, 95% CI = 1.89 to 3.84). CONCLUSIONS: There is a strong association between use of the biopsy GC test and likelihood of conservative management. Higher genomic classifier scores are associated with higher rates of adverse pathology at time of surgery and greater use of postoperative radiotherapy. In this study the Decipher Prostate Genomic Classifier (GC) was used to analyze a US national database of men with prostate cancer. Use of the GC was associated with conservative management (ie, active surveillance). Among men who had high-risk GC scores and then had surgery, there was a 3-fold higher chance of having worrisome findings in surgical specimens.


Subjects
Prostatic Neoplasms; Male; Humans; United States/epidemiology; Risk Assessment/methods; Prostatic Neoplasms/epidemiology; Prostatic Neoplasms/genetics; Prostatic Neoplasms/therapy; Prostate-Specific Antigen; Prostate/surgery; Prostate/pathology; Genomics
7.
J Natl Cancer Inst ; 115(11): 1337-1354, 2023 11 08.
Article in English | MEDLINE | ID: mdl-37433078

ABSTRACT

BACKGROUND: Cancer is a leading cause of death by disease among children and adolescents in the United States. This study updates cancer incidence rates and trends using the most recent and comprehensive US cancer registry data available. METHODS: We used data from US Cancer Statistics to evaluate counts, age-adjusted incidence rates, and trends among children and adolescents younger than 20 years of age diagnosed with malignant tumors between 2003 and 2019. We calculated the average annual percent change (AAPC) and annual percent change (APC) using joinpoint regression. Rates and trends were stratified by demographic and geographic characteristics and by cancer type. RESULTS: With 248 749 cases reported between 2003 and 2019, the overall cancer incidence rate was 178.3 per 1 million; incidence rates were highest for leukemia (46.6), central nervous system neoplasms (30.8), and lymphoma (27.3). Rates were highest for males, children 0 to 4 years of age, Non-Hispanic White children and adolescents, those in the Northeast census region, the top 25% of counties by economic status, and metropolitan counties with a population of 1 million people or more. Although the overall incidence rate of pediatric cancer increased 0.5% per year on average between 2003 and 2019, the rate increased between 2003 and 2016 (APC = 1.1%), and then decreased between 2016 and 2019 (APC = -2.1%). Between 2003 and 2019, rates of leukemia, lymphoma, hepatic tumors, bone tumors, and thyroid carcinomas increased, while melanoma rates decreased. Rates of central nervous system neoplasms increased until 2017, and then decreased. Rates of other cancer types remained stable. CONCLUSIONS: Incidence of pediatric cancer increased overall, although increases were limited to certain cancer types. These findings may guide future public health and research priorities.
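
The relationship between the two segment-level percent changes and the overall average of 0.5% per year can be checked with a short sketch. It assumes the usual log-linear definition (percent change = (e^slope - 1) × 100) and a segment-length-weighted average of the log-scale slopes; it does not search for joinpoints, and the function names are hypothetical. The inputs are the rounded figures quoted in the abstract, so this is a consistency check rather than a reproduction of the analysis.

```python
import math


def apc_from_rates(years, rates):
    """Annual percent change from a least-squares fit of ln(rate) on year.

    Fits a single segment only; locating joinpoints is out of scope here.
    """
    n = len(years)
    logs = [math.log(r) for r in rates]
    mean_x = sum(years) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, logs))
    slope = sxy / sxx
    return (math.exp(slope) - 1) * 100


def aapc_from_segments(segments):
    """Average annual percent change from (apc_percent, length_in_years) pairs.

    Each segment's log-scale slope is weighted by the segment length.
    """
    total_years = sum(length for _, length in segments)
    weighted = sum(math.log(1 + apc / 100) * length for apc, length in segments)
    return (math.exp(weighted / total_years) - 1) * 100


# Segments quoted in the abstract: +1.1% for 2003-2016, -2.1% for 2016-2019.
aapc = aapc_from_segments([(1.1, 13), (-2.1, 3)])
```

With these inputs `aapc` comes out at roughly 0.49, which rounds to the 0.5% per year reported for 2003 to 2019.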


Subjects
Central Nervous System Neoplasms; Leukemia; Lymphoma; Melanoma; Child; Male; Adolescent; Humans; United States/epidemiology; Young Adult; Adult; Incidence; Lymphoma/epidemiology; Central Nervous System Neoplasms/epidemiology; Leukemia/epidemiology
8.
JCO Precis Oncol ; 7: e2300044, 2023 06.
Article in English | MEDLINE | ID: mdl-37384864

ABSTRACT

PURPOSE: The DecisionDx-Melanoma 31-gene expression profile (31-GEP) test is validated to classify cutaneous malignant melanoma (CM) patient risk of recurrence, metastasis, or death as low (class 1A), intermediate (class 1B/2A), or high (class 2B). This study aimed to examine the effect of 31-GEP testing on survival outcomes and confirm the prognostic ability of the 31-GEP at the population level. METHODS: Patients with stage I-III CM with a clinical 31-GEP result between 2016 and 2018 were linked to data from 17 SEER registries (n = 4,687) following registries' operation procedures for linkages. Melanoma-specific survival (MSS) and overall survival (OS) differences by 31-GEP risk category were examined using Kaplan-Meier analysis and the log-rank test. Crude and adjusted hazard ratios (HRs) were calculated using Cox regression model to evaluate variables associated with survival. 31-GEP tested patients were propensity score-matched to a cohort of non-31-GEP tested patients from the SEER database. Robustness of the effect of 31-GEP testing was assessed using resampling. RESULTS: Patients with a 31-GEP class 1A result had higher 3-year MSS and OS than patients with a class 1B/2A or class 2B result (MSS: 99.7% v 97.1% v 89.6%, P < .001; OS: 96.6% v 90.2% v 79.4%, P < .001). A class 2B result was an independent predictor of MSS (HR, 7.00; 95% CI, 2.70 to 18.00) and OS (HR, 2.39; 95% CI, 1.54 to 3.70). 31-GEP testing was associated with a 29% lower MSS mortality (HR, 0.71; 95% CI, 0.53 to 0.94) and 17% lower overall mortality (HR, 0.83; 95% CI, 0.70 to 0.99) relative to untested patients. CONCLUSION: In a population-based, clinically tested melanoma cohort, the 31-GEP stratified patients by their risk of dying from melanoma.
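
The Kaplan-Meier analysis mentioned in the methods estimates survival as a running product over event times. The sketch below shows the product-limit estimator on a tiny made-up sample; the function name `kaplan_meier` and the data are hypothetical, and nothing here reproduces the study's analysis.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate.

    `times` are follow-up durations; `events` is 1 for a death and 0 for a
    censored observation. Returns a list of (time, survival) pairs, one per
    distinct time at which at least one death occurred.
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e)
        leaving = sum(1 for ti in times if ti == t)
        if deaths:
            # Multiply by the conditional probability of surviving past t.
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Deaths and censored subjects both leave the risk set after t.
        at_risk -= leaving
    return curve


# Hypothetical follow-up in years; the second subject is censored.
curve = kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1])
```

Censored subjects lower the number at risk without lowering the survival estimate, which is why the step at year 3 is computed from two subjects rather than three.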


Subjects
Melanoma; Skin Neoplasms; Humans; Melanoma/genetics; Skin Neoplasms/genetics; Transcriptome; Kaplan-Meier Estimate; Melanoma, Cutaneous Malignant
9.
medRxiv ; 2023 Oct 26.
Article in English | MEDLINE | ID: mdl-37205575

ABSTRACT

Objective: The manual extraction of case details from patient records for cancer surveillance efforts is a resource-intensive task. Natural Language Processing (NLP) techniques have been proposed for automating the identification of key details in clinical notes. Our goal was to develop NLP application programming interfaces (APIs) for integration into cancer registry data abstraction tools in a computer-assisted abstraction setting. Methods: We used cancer registry manual abstraction processes to guide the design of DeepPhe-CR, a web-based NLP service API. The coding of key variables was done through NLP methods validated using established workflows. A container-based implementation including the NLP was developed. Existing registry data abstraction software was modified to include results from DeepPhe-CR. An initial usability study with data registrars provided early validation of the feasibility of the DeepPhe-CR tools. Results: API calls support submission of single documents and summarization of cases across multiple documents. The container-based implementation uses a REST router to handle requests and support a graph database for storing results. NLP modules extract topography, histology, behavior, laterality, and grade at 0.79-1.00 F1 across common and rare cancer types (breast, prostate, lung, colorectal, ovary and pediatric brain) on data from two cancer registries. Usability study participants were able to use the tool effectively and expressed interest in adopting the tool. Discussion: Our DeepPhe-CR system provides a flexible architecture for building cancer-specific NLP tools directly into registrar workflows in a computer-assisted abstraction setting. Improving user interactions in client tools may be needed to realize the potential of these approaches. DeepPhe-CR: https://deepphe.github.io/.

10.
Cancer ; 129(12): 1821-1835, 2023 06 15.
Article in English | MEDLINE | ID: mdl-37063057

ABSTRACT

BACKGROUND: Depression is common among breast cancer patients and can affect concordance with guideline-recommended treatment plans. Yet, the impact of depression on cancer treatment and survival is understudied, particularly in relation to the timing of the depression diagnosis. METHODS: Kentucky Cancer Registry data were used to identify female patients diagnosed with primary invasive breast cancer who were 20 years of age or older in 2007-2011. Patients were classified as having no depression, depression pre-cancer diagnosis only, depression post-cancer diagnosis only, or persistent depression. The impact of depression on receiving guideline-recommended treatment and survival was examined using multivariable logistic regression and Cox regression, respectively. RESULTS: Of 6054 eligible patients, 4.1%, 3.7%, and 6.2% had persistent depression, depression pre-diagnosis only, and depression post-diagnosis only, respectively. A total of 1770 (29.2%) patients did not receive guideline-recommended cancer treatment. Compared to patients with no depression, the odds of receiving guideline-recommended treatment were decreased in patients with depression pre-diagnosis only (odds ratio [OR], 0.75; 95% confidence interval [CI], 0.54-1.04) but not in patients with post-diagnosis only or persistent depression. Depression post-diagnosis only (hazard ratio, 1.51; 95% CI, 1.24-1.83) and depression pre-diagnosis only (hazard ratio, 1.26; 95% CI, 0.99-1.59) were associated with worse survival. No significant difference in survival was found between patients with persistent depression and patients with no depression (p > .05). CONCLUSIONS: Neglecting depression management after a breast cancer diagnosis may result in poorer cancer treatment concordance and worse survival. Early detection and consistent management of depression are critical in improving patient survival.


Subjects
Breast Neoplasms; Humans; Female; Breast Neoplasms/complications; Breast Neoplasms/therapy; Breast Neoplasms/diagnosis; Kentucky/epidemiology; Proportional Hazards Models; Registries
11.
Int J Radiat Oncol Biol Phys ; 117(1): 262-273, 2023 09 01.
Article in English | MEDLINE | ID: mdl-36990288

ABSTRACT

PURPOSE: Real-world evidence for radiation therapy (RT) is limited because it is often documented only in the clinical narrative. We developed a natural language processing system for automated extraction of detailed RT events from text to support clinical phenotyping. METHODS AND MATERIALS: A multi-institutional data set of 96 clinician notes, 129 North American Association of Central Cancer Registries cancer abstracts, and 270 RT prescriptions from HemOnc.org was used and divided into train, development, and test sets. Documents were annotated for RT events and associated properties: dose, fraction frequency, fraction number, date, treatment site, and boost. Named entity recognition models for properties were developed by fine-tuning BioClinicalBERT and RoBERTa transformer models. A multiclass RoBERTa-based relation extraction model was developed to link each dose mention with each property in the same event. Models were combined with symbolic rules to create a hybrid end-to-end pipeline for comprehensive RT event extraction. RESULTS: Named entity recognition models were evaluated on the held-out test set with F1 results of 0.96, 0.88, 0.94, 0.88, 0.67, and 0.94 for dose, fraction frequency, fraction number, date, treatment site, and boost, respectively. The relation model achieved an average F1 of 0.86 when the input was gold-labeled entities. The end-to-end system F1 result was 0.81. The end-to-end system performed best on North American Association of Central Cancer Registries abstracts (average F1 0.90), which are mostly copy-paste content from clinician notes. CONCLUSIONS: We developed methods and a hybrid end-to-end system for RT event extraction, which is the first natural language processing system for this task. This system provides proof-of-concept for real-world RT data collection for research and is promising for the potential of natural language processing methods to support clinical care.


Subjects
Natural Language Processing; Neoplasms; Humans; Neoplasms/radiotherapy; Electronic Health Records
12.
Cancers (Basel) ; 14(17)2022 Sep 05.
Article in English | MEDLINE | ID: mdl-36077869

ABSTRACT

BACKGROUND: Patients with sarcoma often require individualized treatment strategies and are likely to receive aggressive immunosuppressive therapies, which may place them at higher risk for severe COVID-19. We aimed to describe demographics, risk factors, and outcomes for patients with sarcoma and COVID-19. METHODS: We performed a retrospective cohort study of patients with sarcoma and COVID-19 reported to the COVID-19 and Cancer Consortium (CCC19) registry (NCT04354701) from 17 March 2020 to 30 September 2021. Demographics, sarcoma histologic type, treatments, and COVID-19 outcomes were analyzed. RESULTS: Of 281 patients, 49% (n = 139) were hospitalized, 33% (n = 93) received supplemental oxygen, 11% (n = 31) were admitted to the ICU, and 6% (n = 16) received mechanical ventilation. A total of 23 (8%) died within 30 days of COVID-19 diagnosis and 44 (16%) died overall at the time of analysis. When evaluated by sarcoma subtype, patients with bone sarcoma and COVID-19 had a higher mortality rate than patients from a matched SEER cohort (13.5% vs 4.4%). Older age, poor performance status, recent systemic anti-cancer therapy, and lung metastases all contributed to higher COVID-19 severity. CONCLUSIONS: Patients with sarcoma have high rates of severe COVID-19 and those with bone sarcoma may have the greatest risk of death.

13.
JAMIA Open ; 5(3): ooac075, 2022 Oct.
Article in English | MEDLINE | ID: mdl-36110150

ABSTRACT

Objective: We aim to reduce overfitting and model overconfidence by distilling the knowledge of an ensemble of deep learning models into a single model for the classification of cancer pathology reports. Materials and Methods: We consider the text classification problem that involves 5 individual tasks. The baseline model consists of a multitask convolutional neural network (MtCNN), and the implemented ensemble (teacher) consists of 1000 MtCNNs. We performed knowledge transfer by training a single model (student) with soft labels derived through the aggregation of ensemble predictions. We evaluate performance based on accuracy and abstention rates by using softmax thresholding. Results: The student model outperforms the baseline MtCNN in terms of abstention rates and accuracy, thereby allowing the model to be used with a larger volume of documents when deployed. The highest boost was observed for subsite and histology, for which the student model classified an additional 1.81% reports for subsite and 3.33% reports for histology. Discussion: Ensemble predictions provide a useful strategy for quantifying the uncertainty inherent in labeled data and thereby enable the construction of soft labels with estimated probabilities for multiple classes for a given document. Training models with the derived soft labels reduces model confidence in difficult-to-classify documents, thereby leading to a reduction in the number of highly confident wrong predictions. Conclusions: Ensemble model distillation is a simple tool to reduce model overconfidence in problems with extreme class imbalance and noisy datasets. These methods can facilitate the deployment of deep learning models in high-risk domains with low computational resources where minimizing inference time is required.
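
The idea of soft labels built from ensemble predictions can be shown in a few lines. The abstract says only that the labels come from "aggregation of ensemble predictions", so the plain averaging below is an assumption, as are the function names `soft_labels` and `cross_entropy` and the three-member toy ensemble.

```python
import math


def soft_labels(ensemble_probs):
    """Average per-class probabilities across ensemble members.

    `ensemble_probs` holds one probability vector per teacher model for a
    single document; the result is one soft label vector for that document.
    """
    members = len(ensemble_probs)
    return [sum(column) / members for column in zip(*ensemble_probs)]


def cross_entropy(target, predicted, eps=1e-12):
    """Cross-entropy of a student prediction against a (soft) target."""
    return -sum(
        t * math.log(max(p, eps)) for t, p in zip(target, predicted) if t > 0
    )


# Hypothetical teacher outputs for one document over two classes.
teachers = [[0.9, 0.1], [0.6, 0.4], [0.3, 0.7]]
labels = soft_labels(teachers)

# A student trained on `labels` is penalized for being overconfident:
confident_loss = cross_entropy(labels, [0.99, 0.01])
calibrated_loss = cross_entropy(labels, [0.6, 0.4])
```

When the teachers disagree, the soft target sits away from 0 and 1, so the loss is lower for a student that reports matching uncertainty than for one that is nearly certain.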

14.
BMC Bioinformatics ; 23(Suppl 12): 386, 2022 Sep 23.
Article in English | MEDLINE | ID: mdl-36151511

ABSTRACT

BACKGROUND: Public Data Commons (PDC) have been highlighted in the scientific literature for their capacity to collect and harmonize big data. On the other hand, local data commons (LDC), located within an institution or organization, have been underrepresented in the scientific literature, even though they are a critical part of research infrastructure. Being closest to the sources of data, LDCs provide the ability to collect and maintain the most up-to-date, high-quality data within an organization. As data providers, LDCs face many challenges in both collecting and standardizing data; moreover, as consumers of PDC, they face problems of data harmonization stemming from the monolithic harmonization pipeline designs commonly adopted by many PDCs. Unfortunately, existing guidelines and resources for building and maintaining data commons exclusively focus on PDC and provide very little information on LDC. RESULTS: This article focuses on four important observations. First, there are three different types of LDC service models that are defined based on their roles and requirements. These can be used as guidelines for building new LDC or enhancing the services of existing LDC. Second, the seven core services of LDC are discussed, including cohort identification and facilitation of genomic sequencing, the management of molecular reports and associated infrastructure, quality control, data harmonization, data integration, data sharing, and data access control. Third, instead of commonly developed monolithic systems, we propose a new data sharing method for data harmonization that combines both divide-and-conquer and bottom-up approaches. Finally, an end-to-end LDC implementation is introduced with real-world examples. CONCLUSIONS: Although LDCs are an optimal place to identify and address data quality issues, they have traditionally been relegated to the role of passive data provider for much larger PDC. Indeed, many LDCs limit their functions to only conducting routine data storage and transmission tasks due to a lack of information on how to design, develop, and improve their services using limited resources. We hope that this work will be the first small step in raising awareness among the LDCs of their expanded utility and in publicizing to a wider audience the importance of LDC.


Subjects
Big Data; Information Dissemination; Developing Countries; Humans
15.
JAMIA Open ; 5(2): ooac049, 2022 Jul.
Article in English | MEDLINE | ID: mdl-35721398

ABSTRACT

Objectives: The International Classification of Childhood Cancer (ICCC) facilitates the effective classification of a heterogeneous group of cancers in the important pediatric population. However, there has been no development of machine learning models for the ICCC classification. We developed deep learning-based information extraction models from cancer pathology reports based on the ICD-O-3 coding standard. In this article, we describe extending the models to perform ICCC classification. Materials and Methods: We developed 2 models, ICD-O-3 classification and ICCC recoding (Model 1) and direct ICCC classification (Model 2), and 4 scenarios subject to the training sample size. We evaluated these models with a corpus consisting of 29 206 reports with age at diagnosis between 0 and 19 from 6 state cancer registries. Results: Our findings suggest that the direct ICCC classification (Model 2) is substantially better than reusing the ICD-O-3 classification model (Model 1). Applying the uncertainty quantification mechanism to assess the confidence of the algorithm in assigning a code demonstrated that the model achieved a micro-F1 score of 0.987 while abstaining (not sufficiently confident to assign a code) on only 14.8% of ambiguous pathology reports. Conclusions: Our experimental results suggest that the machine learning-based automatic information extraction from childhood cancer pathology reports in the ICCC is a reliable means of supplementing human annotators at state cancer registries by reading and abstracting the majority of the childhood cancer pathology reports accurately and reliably.

16.
Cancer Biomark ; 33(2): 185-198, 2022.
Article in English | MEDLINE | ID: mdl-35213361

ABSTRACT

BACKGROUND: With the use of artificial intelligence and machine learning techniques for biomedical informatics, security and privacy concerns over the data and subject identities have also become an important issue and essential research topic. Without intentional safeguards, machine learning models may find patterns and features to improve task performance that are associated with private personal information. OBJECTIVE: The privacy vulnerability of deep learning models for information extraction from medical textual contents needs to be quantified since the models are exposed to private health information and personally identifiable information. The objective of the study is to quantify the privacy vulnerability of the deep learning models for natural language processing and explore a proper way of securing patients' information to mitigate confidentiality breaches. METHODS: The target model is the multitask convolutional neural network for information extraction from cancer pathology reports, where the data for training the model are from multiple state population-based cancer registries. This study proposes the following schemes to collect vocabularies from the cancer pathology reports: (a) words appearing in multiple registries, and (b) words that have higher mutual information. We performed membership inference attacks on the models in high-performance computing environments. RESULTS: The comparison outcomes suggest that the proposed vocabulary selection methods resulted in lower privacy vulnerability while maintaining the same level of clinical task performance.


Subjects
Confidentiality , Deep Learning , Information Storage and Retrieval/methods , Natural Language Processing , Neoplasms/epidemiology , Artificial Intelligence , Deep Learning/standards , Humans , Neoplasms/pathology , Registries
17.
J Pathol Inform ; 13: 5, 2022.
Article in English | MEDLINE | ID: mdl-35136672

ABSTRACT

BACKGROUND: Population-based state cancer registries are an authoritative source for cancer statistics in the United States. They routinely collect a variety of data, including patient demographics, primary tumor site, stage at diagnosis, first course of treatment, and survival, on every cancer case that is reported across all U.S. states and territories. The goal of our project is to enrich NCI's Surveillance, Epidemiology, and End Results (SEER) registry data with high-quality population-based biospecimen data in the form of digital pathology, machine-learning-based classifications, and quantitative histopathology imaging feature sets (referred to here as Pathomics features). MATERIALS AND METHODS: As part of the project, the underlying informatics infrastructure was designed, tested, and implemented through close collaboration with several participating SEER registries to ensure consistency with registry processes, computational scalability, and ability to support creation of population cohorts that span multiple sites. Utilizing computational imaging algorithms and methods to both generate indices and search for matches makes it possible to reduce inter- and intra-observer inconsistencies and to improve the objectivity with which large image repositories are interrogated. RESULTS: Our team has created and continues to expand a well-curated repository of high-quality digitized pathology images corresponding to subjects whose data are routinely collected by the collaborating registries. Our team has systematically deployed and tested key visual analytic methods to facilitate automated creation of population cohorts for epidemiological studies and tools to support visualization of feature clusters and evaluation of whole-slide images. As part of these efforts, we are developing and optimizing advanced search and matching algorithms to facilitate automated, content-based retrieval of digitized specimens based on their underlying image features and staining characteristics. CONCLUSION: To meet the challenges of this project, we established the analytic pipelines, methods, and workflows to support the expansion and management of a growing repository of high-quality digitized pathology and information-rich population cohorts containing objective imaging and clinical attributes to facilitate studies that seek to discriminate among different subtypes of disease, stratify patient populations, and perform comparisons of tumor characteristics within and across patient cohorts. We have also successfully developed a suite of tools based on a deep-learning method to perform quantitative characterizations of tumor regions, assess infiltrating lymphocyte distributions, and generate objective nuclear feature measurements. As part of these efforts, our team has implemented reliable methods that enable investigators to systematically search through large repositories to automatically retrieve digitized pathology specimens and correlated clinical data based on their computational signatures.
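
Content-based retrieval by computational signature can be pictured with a small sketch. The abstract does not describe the matching algorithm, so ranking by cosine similarity over fixed-length feature vectors is an assumption, and the slide identifiers, feature values, and function names below are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_similar(query, index, top_k=3):
    """Return the top_k (slide_id, score) pairs most similar to the query vector."""
    scored = [(cosine_similarity(query, vec), slide_id) for slide_id, vec in index.items()]
    # Highest similarity first; ties broken by slide id for a stable order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(slide_id, score) for score, slide_id in scored[:top_k]]


# Hypothetical three-dimensional signatures for three digitized slides.
toy_index = {
    "slide_001": [1.0, 0.0, 0.0],
    "slide_002": [0.9, 0.1, 0.0],
    "slide_003": [0.0, 1.0, 0.0],
}
matches = retrieve_similar([1.0, 0.0, 0.0], toy_index, top_k=2)
```

A production system over a large repository would likely replace the linear scan with an approximate nearest-neighbor index; the linear version is shown only because it is easy to verify.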

18.
IEEE J Biomed Health Inform ; 26(6): 2796-2803, 2022 Jun.
Article in English | MEDLINE | ID: mdl-35020599

ABSTRACT

Recent applications of deep learning have shown promising results for classifying unstructured text in the healthcare domain. However, the reliability of models in production settings has been hindered by imbalanced data sets in which a small subset of the classes dominates. In the absence of adequate training data, rare classes necessitate additional model constraints for robust performance. Here, we present a strategy for incorporating short sequences of text (i.e., keywords) into training to boost model accuracy on rare classes. In our approach, we assemble a set of keywords, including short phrases, associated with each class. The keywords are then used as additional data during each batch of model training, resulting in a training loss that has contributions from both raw data and keywords. We evaluate our approach on classification of cancer pathology reports, which shows a substantial increase in model performance for rare classes. Furthermore, we analyze the impact of keywords on model output probabilities for bigrams, providing a straightforward method to identify model difficulties for limited training data.
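
A training loss with contributions from both raw data and keywords can be sketched as a sum of two cross-entropy terms. The abstract does not give the exact form, so the additive combination and the `keyword_weight` factor are assumptions, and the function names and probabilities below are hypothetical.

```python
import math


def cross_entropy(prob_rows, targets):
    """Mean negative log-probability assigned to the true class index."""
    return -sum(math.log(row[t]) for row, t in zip(prob_rows, targets)) / len(targets)


def combined_loss(doc_probs, doc_targets, kw_probs, kw_targets, keyword_weight=1.0):
    """Batch loss = loss on full reports + weighted loss on class keywords."""
    report_term = cross_entropy(doc_probs, doc_targets)
    keyword_term = cross_entropy(kw_probs, kw_targets)
    return report_term + keyword_weight * keyword_term


# One report and one keyword, each scored over two classes by some model.
loss = combined_loss(
    doc_probs=[[0.5, 0.5]], doc_targets=[0],
    kw_probs=[[0.25, 0.75]], kw_targets=[1],
    keyword_weight=0.5,
)
```

Because every class contributes keywords regardless of how many reports it has, the keyword term gives rare classes a gradient signal in every batch.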


Subjects
Reproducibility of Results , Data Collection , Humans
19.
J Biomed Inform ; 125: 103957, 2022 Jan.
Article in English | MEDLINE | ID: mdl-34823030

ABSTRACT

In the last decade, the widespread adoption of electronic health record documentation has created huge opportunities for information mining. Natural language processing (NLP) techniques using machine and deep learning are becoming increasingly widespread for information extraction tasks from unstructured clinical notes. Disparities in performance when deploying machine learning models in the real world have recently received considerable attention. In the clinical NLP domain, the robustness of convolutional neural networks (CNNs) for classifying cancer pathology reports under natural distribution shifts remains understudied. In this research, we aim to quantify and improve the performance of the CNN for text classification on out-of-distribution (OOD) datasets resulting from the natural evolution of clinical text in pathology reports. We identified class imbalance due to different prevalence of cancer types as one of the sources of performance drop and analyzed the impact of previous methods for addressing class imbalance when deploying models in real-world domains. Our results show that our novel class-specialized ensemble technique outperforms other methods for the classification of rare cancer types in terms of macro F1 scores. We also found that traditional ensemble methods perform better in top classes, leading to higher micro F1 scores. Based on our findings, we formulate a series of recommendations for other ML practitioners on how to build robust models with extremely imbalanced datasets in biomedical NLP applications.
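
The difference between macro and micro F1 on imbalanced data, which underlies the results above, can be shown with a short worked example. The class names and counts are invented for illustration and do not come from the study.

```python
def f1_scores(y_true, y_pred):
    """Return (macro_f1, micro_f1) for single-label classification."""
    classes = sorted(set(y_true) | set(y_pred))
    tp = {c: 0 for c in classes}
    fp = {c: 0 for c in classes}
    fn = {c: 0 for c in classes}
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1
            fn[truth] += 1

    def f1(t, p, n):
        denom = 2 * t + p + n
        return 2 * t / denom if denom else 0.0

    # Macro: average of per-class F1, so each class counts equally.
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in classes) / len(classes)
    # Micro: pool the counts first, so each report counts equally.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return macro, micro


# Nine reports of a common cancer type and one of a rare type;
# the classifier predicts the common type every time.
toy_truth = ["common"] * 9 + ["rare"]
toy_pred = ["common"] * 10
macro_f1, micro_f1 = f1_scores(toy_truth, toy_pred)
```

Here micro F1 is 0.9 while macro F1 is below 0.5, because the rare class has an F1 of zero. This is why methods that help rare cancer types show up in macro F1, while methods that favor the top classes show up in micro F1.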


Subjects
Natural Language Processing , Neoplasms , Electronic Health Records , Humans , Machine Learning , Neural Networks, Computer
20.
J Registry Manag ; 49(4): 153-160, 2022.
Article in English | MEDLINE | ID: mdl-37260815

ABSTRACT

Cancer surveillance at the population level is a highly labor-intensive process, with certified tumor registrars (CTRs) manually reviewing medical charts of cancer patients and entering information into local databases that are centrally merged and curated at state and national levels. Registries face considerable challenges in terms of constrained budgets, staffing shortages, and keeping pace with the evolving national and international data standards that are essential to cancer registration. Advanced informatics methods are needed to increase automation, reduce manual effort, and help address some of these challenges. The Cancer Informatics Advisory Group (CIAG) to the North American Association of Central Cancer Registries (NAACCR) board was established in 2019 to advise on external informatics activities and initiatives for long-term strategic planning. Reviewed here by the CIAG are current informatics initiatives that were either born out of the cancer registry field or have implications for expansion to cancer surveillance programs in the future. Several areas of notable activity are presented, including an overview of informatics initiatives and descriptions of 12 specific informatics projects with implications for cancer registries. Recommendations are also provided to the registry community for the continued tracking and impact of the projects and initiatives.


Subjects
Neoplasms , Humans , Certification , Health Personnel , Information Systems , Neoplasms/epidemiology , Registries