1.
CA Cancer J Clin ; 72(3): 287-300, 2022 05.
Article in English | MEDLINE | ID: mdl-34964981

ABSTRACT

Generating evidence on the use, effectiveness, and safety of new cancer therapies is a priority for researchers, health care providers, payers, and regulators given the rapid pace of change in cancer diagnosis and treatments. The use of real-world data (RWD) is integral to understanding the utilization patterns and outcomes of these new treatments among patients with cancer who are treated in clinical practice and community settings. An initial step in the use of RWD is careful study design to assess the suitability of an RWD source. This pivotal process can be guided by using a conceptual model that encourages predesign conceptualization. The primary types of RWD included are electronic health records, administrative claims data, cancer registries, and specialty data providers and networks. Careful consideration of each data type is necessary because they are collected for a specific purpose, capturing a set of data elements within a certain population for that purpose, and they vary by population coverage and longitudinality. In this review, the authors provide a high-level assessment of the strengths and limitations of each data category to inform data source selection appropriate to the study question. Overall, the development and accessibility of RWD sources for cancer research are rapidly increasing, and the use of these data requires careful consideration of composition and utility to assess important questions in understanding the use and effectiveness of new therapies.


Subjects
Information Storage and Retrieval; Medical Oncology; Electronic Health Records; Humans; Registries; Research Design
2.
J Biomed Inform ; 149: 104576, 2024 01.
Article in English | MEDLINE | ID: mdl-38101690

ABSTRACT

INTRODUCTION: Machine learning algorithms are expected to work side-by-side with humans in decision-making pipelines. Thus, the ability of classifiers to make reliable decisions is of paramount importance. Deep neural networks (DNNs) represent the state-of-the-art models to address real-world classification. Although the strength of activation in DNNs is often correlated with the network's confidence, in-depth analyses are needed to establish whether they are well calibrated. METHOD: In this paper, we demonstrate the use of DNN-based classification tools to benefit cancer registries by automating information extraction of disease at diagnosis and at surgery from electronic text pathology reports from the US National Cancer Institute (NCI) Surveillance, Epidemiology, and End Results (SEER) population-based cancer registries. In particular, we introduce multiple methods for selective classification to achieve a target level of accuracy on multiple classification tasks while minimizing the rejection amount, that is, the number of electronic pathology reports for which the model's predictions are unreliable. We evaluate the proposed methods by comparing our approach with the current in-house deep learning-based abstaining classifier. RESULTS: Overall, all the proposed selective classification methods effectively allow for achieving the targeted level of accuracy or higher in a trade-off analysis aimed to minimize the rejection rate. On in-distribution validation and holdout test data, with all the proposed methods, we achieve on all tasks the required target level of accuracy with a lower rejection rate than the deep abstaining classifier (DAC). Interpreting the results for the out-of-distribution test data is more complex; nevertheless, in this case as well, the rejection rate from the best among the proposed methods achieving 97% accuracy or higher is lower than the rejection rate based on the DAC. CONCLUSIONS: We show that although both approaches can flag those samples that should be manually reviewed and labeled by human annotators, the newly proposed methods retain a larger fraction and do so without retraining, thus offering a reduced computational cost compared with the in-house deep learning-based abstaining classifier.
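
The trade-off described in this abstract (reach a target accuracy while rejecting as few reports as possible) can be illustrated with a generic confidence-threshold rule. The sketch below is not one of the paper's methods. It assumes each validation sample comes with a model confidence and a flag saying whether the prediction was correct, and the function name `pick_threshold` is hypothetical.

```python
def pick_threshold(confidences, correct, target_accuracy):
    """Find the lowest confidence cut-off whose retained set meets the target.

    confidences: model confidence per validation sample (assumed input)
    correct:     1 if the prediction was right, 0 otherwise
    Returns (threshold, rejection_rate), or None if no cut-off qualifies.
    """
    # Walk through samples from most to least confident.
    pairs = sorted(zip(confidences, correct), key=lambda p: p[0], reverse=True)
    total = len(pairs)
    best = None
    hits = 0
    for kept, (conf, ok) in enumerate(pairs, start=1):
        hits += ok
        # Only evaluate where the confidence value changes, so that
        # samples with equal confidence are kept or rejected together.
        at_boundary = kept == total or pairs[kept][0] < conf
        if at_boundary and hits / kept >= target_accuracy:
            best = (conf, 1 - kept / total)
    return best


# Toy validation data: six reports, four predicted correctly.
result = pick_threshold(
    confidences=[0.99, 0.95, 0.90, 0.80, 0.60, 0.55],
    correct=[1, 1, 1, 0, 1, 0],
    target_accuracy=0.75,
)
print(result)
```

Predictions at or above the returned threshold are kept. Anything below it would be routed to a human annotator.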


Subjects
Deep Learning; Humans; Uncertainty; Neural Networks, Computer; Algorithms; Machine Learning
3.
BMC Med Inform Decis Mak ; 24(Suppl 5): 262, 2024 Sep 17.
Article in English | MEDLINE | ID: mdl-39289714

ABSTRACT

BACKGROUND: Applying graph convolutional networks (GCN) to the classification of free-form natural language texts leveraged by graph-of-words features (TextGCN) was studied and confirmed to be an effective means of describing complex natural language texts. However, the text classification models based on the TextGCN possess weaknesses in terms of memory consumption and model dissemination and distribution. In this paper, we present a fast message passing network (FastMPN), implementing a GCN with message passing architecture that provides versatility and flexibility by allowing trainable node embedding and edge weights, helping the GCN model find the better solution. We applied the FastMPN model to the task of clinical information extraction from cancer pathology reports, extracting the following six properties: main site, subsite, laterality, histology, behavior, and grade. RESULTS: We evaluated the clinical task performance of the FastMPN models in terms of micro- and macro-averaged F1 scores. A comparison was performed with the multi-task convolutional neural network (MT-CNN) model. Results show that the FastMPN model is equivalent to or better than the MT-CNN. CONCLUSIONS: Our implementation revealed that our FastMPN model, which is based on the PyTorch platform, can train a large corpus (667,290 training samples) with 202,373 unique words in less than 3 minutes per epoch using one NVIDIA V100 hardware accelerator. Our experiments demonstrated that using this implementation, the clinical task performance scores of information extraction related to tumors from cancer pathology reports were highly competitive.


Subjects
Natural Language Processing; Neoplasms; Neural Networks, Computer; Humans; Neoplasms/classification; Data Mining
4.
Am J Epidemiol ; 191(12): 2075-2083, 2022 11 19.
Article in English | MEDLINE | ID: mdl-35872590

ABSTRACT

Follow-up of US cohort members for incident cancer is time-consuming, is costly, and often results in underascertainment when the traditional methods of self-reporting and/or medical record validation are used. We conducted one of the first large-scale investigations to assess the feasibility, methods, and benefits of linking participants in the US Radiologic Technologists (USRT) Study (n = 146,022) with the majority of US state or regional cancer registries. Follow-up of this cohort has relied primarily on questionnaires (mailed approximately every 10 years) and linkage with the National Death Index. We compared the level of agreement and completeness of questionnaire/death-certificate-based information with that of registry-based (43 registries) incident cancer follow-up in the USRT cohort. Using registry-identified first primary cancers from 1999-2012 as the gold standard, the overall sensitivity was 46.5% for self-reports only and 63.0% for both self-reports and death certificates. Among the 37.0% false-negative reports, 27.8% were due to dropout, while 9.2% were due to misreporting. The USRT cancer reporting patterns differed by cancer type. Our study indicates that linkage to state cancer registries would greatly improve completeness and accuracy of cancer follow-up in comparison with questionnaire self-reporting. These findings support ongoing development of a national US virtual pooled registry with which to streamline cohort linkages.
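
The sensitivity figures in this abstract follow the usual definition: true positives divided by all registry-confirmed cases. A minimal sketch of that arithmetic follows. The counts of 630 and 370 are hypothetical and chosen only to reproduce the reported 63.0%. Only the percentages come from the abstract.

```python
def sensitivity(true_positives, false_negatives):
    """Share of gold-standard cases that the comparison source also found."""
    return true_positives / (true_positives + false_negatives)


# Hypothetical counts per 1,000 registry-identified cancers.
combined = sensitivity(true_positives=630, false_negatives=370)

# The remaining share is the false-negative fraction, which the abstract
# splits into dropout (27.8%) and misreporting (9.2%).
false_negative_share = 1 - combined
print(round(combined, 3), round(false_negative_share, 3))
```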


Subjects
Death Certificates; Neoplasms; Humans; Cohort Studies; Self Report; Incidence; Neoplasms/epidemiology; Registries
5.
J Biomed Inform ; 125: 103957, 2022 01.
Article in English | MEDLINE | ID: mdl-34823030

ABSTRACT

In the last decade, the widespread adoption of electronic health record documentation has created huge opportunities for information mining. Natural language processing (NLP) techniques using machine and deep learning are becoming increasingly widespread for information extraction tasks from unstructured clinical notes. Disparities in performance when deploying machine learning models in the real world have recently received considerable attention. In the clinical NLP domain, the robustness of convolutional neural networks (CNNs) for classifying cancer pathology reports under natural distribution shifts remains understudied. In this research, we aim to quantify and improve the performance of the CNN for text classification on out-of-distribution (OOD) datasets resulting from the natural evolution of clinical text in pathology reports. We identified class imbalance due to different prevalence of cancer types as one of the sources of performance drop and analyzed the impact of previous methods for addressing class imbalance when deploying models in real-world domains. Our results show that our novel class-specialized ensemble technique outperforms other methods for the classification of rare cancer types in terms of macro F1 scores. We also found that traditional ensemble methods perform better in top classes, leading to higher micro F1 scores. Based on our findings, we formulate a series of recommendations for other ML practitioners on how to build robust models with extremely imbalanced datasets in biomedical NLP applications.
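
The contrast drawn here between macro and micro F1 is easiest to see on a small imbalanced example. The sketch below uses the standard definitions for single-label classification. The two-class data are invented for illustration and are not from the paper.

```python
from collections import Counter


def f1_scores(y_true, y_pred):
    """Return (micro_f1, macro_f1) for single-label classification."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    def f1(tp_n, fp_n, fn_n):
        denom = 2 * tp_n + fp_n + fn_n
        return 2 * tp_n / denom if denom else 0.0

    # Macro: average the per-class F1, so every class counts equally.
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    # Micro: pool all counts first, so frequent classes dominate.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return micro, macro


# Invented example: a model that always predicts the common class.
y_true = ["common"] * 8 + ["rare"] * 2
y_pred = ["common"] * 10
micro, macro = f1_scores(y_true, y_pred)
print(micro, macro)
```

A model that ignores the rare class still gets a high micro F1 while its macro F1 drops sharply. This is why gains on rare cancer types show up mainly in the macro score.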


Subjects
Natural Language Processing; Neoplasms; Electronic Health Records; Humans; Machine Learning; Neural Networks, Computer
6.
BMC Bioinformatics ; 22(1): 113, 2021 Mar 09.
Article in English | MEDLINE | ID: mdl-33750288

ABSTRACT

BACKGROUND: Automated text classification has many important applications in the clinical setting; however, obtaining labelled data for training machine learning and deep learning models is often difficult and expensive. Active learning techniques may mitigate this challenge by reducing the amount of labelled data required to effectively train a model. In this study, we analyze the effectiveness of 11 active learning algorithms on classifying subsite and histology from cancer pathology reports using a Convolutional Neural Network as the text classification model. RESULTS: We compare the performance of each active learning strategy using two differently sized datasets and two different classification tasks. Our results show that on all tasks and dataset sizes, all active learning strategies except diversity-sampling strategies outperformed random sampling, i.e., no active learning. On our large dataset (15K initial labelled samples, adding 15K additional labelled samples each iteration of active learning), there was no clear winner between the different active learning strategies. On our small dataset (1K initial labelled samples, adding 1K additional labelled samples each iteration of active learning), marginal and ratio uncertainty sampling performed better than all other active learning techniques. We found that compared to random sampling, active learning strongly helps performance on rare classes by focusing on underrepresented classes. CONCLUSIONS: Active learning can save annotation cost by helping human annotators efficiently and intelligently select which samples to label. Our results show that a dataset constructed using effective active learning techniques requires less than half the amount of labelled data to achieve the same performance as a dataset constructed using random sampling.
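
Marginal and ratio uncertainty sampling both rank unlabelled samples by how close the top two predicted class probabilities are. The sketch below uses the common textbook definitions, and it is an assumption that these match the paper's exact formulation. The function names and the toy probability vectors are hypothetical.

```python
def margin_uncertainty(probs):
    """1 minus the gap between the two highest class probabilities."""
    top, second = sorted(probs, reverse=True)[:2]
    return 1 - (top - second)


def ratio_uncertainty(probs):
    """Second-highest probability divided by the highest (near 1 = unsure)."""
    top, second = sorted(probs, reverse=True)[:2]
    return second / top


def select_for_labelling(pool_probs, k, uncertainty):
    """Return indices of the k most uncertain samples in the pool."""
    ranked = sorted(
        range(len(pool_probs)),
        key=lambda i: uncertainty(pool_probs[i]),
        reverse=True,
    )
    return ranked[:k]


# Toy predicted class probabilities for four unlabelled reports.
pool = [
    [0.90, 0.05, 0.05],  # confident
    [0.40, 0.35, 0.25],  # top two classes close
    [0.60, 0.30, 0.10],
    [0.34, 0.33, 0.33],  # almost uniform
]
print(select_for_labelling(pool, 2, margin_uncertainty))
print(select_for_labelling(pool, 2, ratio_uncertainty))
```

In each active learning iteration, the selected samples would be sent to annotators and added to the training set before the model is retrained.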


Subjects
Machine Learning; Neoplasms; Algorithms; Humans; Neoplasms/genetics; Neoplasms/pathology; Neural Networks, Computer
7.
Lancet Oncol ; 21(9): e444-e451, 2020 09.
Article in English | MEDLINE | ID: mdl-32888473

ABSTRACT

Population-based cancer registries (PBCRs) generate measures of cancer incidence and survival that are essential for cancer surveillance, research, and cancer control strategies. In 2014, the Toronto Paediatric Cancer Stage Guidelines were developed to standardise how PBCRs collect data on the stage at diagnosis for childhood cancer cases. These guidelines have been implemented in multiple jurisdictions worldwide to facilitate international comparative studies of incidence and outcome. Robust stratification by risk also requires data on key non-stage prognosticators (NSPs). Key experts and stakeholders used a modified Delphi approach to establish principles guiding paediatric cancer NSP data collection. With the use of these principles, recommendations were made on which NSPs should be collected for the major malignancies in children. The 2014 Toronto Stage Guidelines were also reviewed and updated where necessary. Wide adoption of the resultant Paediatric NSP Guidelines and updated Toronto Stage Guidelines will enhance the harmonisation and use of childhood cancer data provided by PBCRs.


Subjects
Guidelines as Topic/standards; Neoplasms/therapy; Pediatrics/trends; Prognosis; Child; Delivery of Health Care; Humans; Neoplasm Staging; Neoplasms/epidemiology; Registries
8.
J Biomed Inform ; 110: 103564, 2020 10.
Article in English | MEDLINE | ID: mdl-32919043

ABSTRACT

OBJECTIVE: In machine learning, it is well established that classification task performance improves when bootstrap aggregation (bagging) is applied. However, the bagging of deep neural networks takes tremendous amounts of computational resources and training time. The research question that we aimed to answer in this research is whether we could achieve higher task performance scores and accelerate the training by dividing a problem into sub-problems. MATERIALS AND METHODS: The data used in this study consist of free text from electronic cancer pathology reports. We applied bagging and partitioned data training using Multi-Task Convolutional Neural Network (MT-CNN) and Multi-Task Hierarchical Convolutional Attention Network (MT-HCAN) classifiers. We split a big problem into 20 sub-problems, resampled the training cases 2,000 times, and trained the deep learning model for each bootstrap sample and each sub-problem, thus generating up to 40,000 models. We performed the training of many models concurrently in a high-performance computing environment at Oak Ridge National Laboratory (ORNL). RESULTS: We demonstrated that aggregation of the models improves task performance compared with the single-model approach, which is consistent with other research studies; and we demonstrated that the two proposed partitioned bagging methods achieved higher classification accuracy scores on four tasks. Notably, the improvements were significant for the extraction of cancer histology data, which had more than 500 class labels in the task; these results show that data partition may alleviate the complexity of the task. On the contrary, the methods did not achieve superior scores for the tasks of site and subsite classification. Intrinsically, since data partitioning was based on the primary cancer site, the accuracy depended on the determination of the partitions, which needs further investigation and improvement. CONCLUSION: Results in this research demonstrate that (1) the data partitioning and bagging strategy achieved higher performance scores and (2) we achieved faster training leveraged by the high-performance Summit supercomputer at ORNL.
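
Plain bagging, the baseline this abstract builds on, resamples the training set with replacement, trains one model per resample, and aggregates predictions by vote. The sketch below shows only that baseline. The toy 1-nearest-neighbour learner and the one-dimensional data are stand-ins chosen for brevity, not the MT-CNN or MT-HCAN models, and the site-based partitioning step is omitted.

```python
import random
from collections import Counter


def bootstrap_sample(data, rng):
    """Draw len(data) training cases with replacement."""
    return [rng.choice(data) for _ in data]


def train_nearest_neighbour(sample):
    """Toy learner: predict the label of the closest training point."""
    return lambda x: min(sample, key=lambda point: abs(point[0] - x))[1]


def bagged_predict(models, x):
    """Aggregate the individual model predictions by majority vote."""
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]


# Invented (feature, label) pairs standing in for labelled reports.
data = [(0.0, "A"), (0.1, "A"), (0.2, "A"), (0.9, "B"), (1.0, "B"), (1.1, "B")]

rng = random.Random(0)  # fixed seed so the resamples are reproducible
models = [train_nearest_neighbour(bootstrap_sample(data, rng)) for _ in range(25)]

print(bagged_predict(models, 0.05), bagged_predict(models, 1.05))
```

Each bootstrap model is independent of the others, so training parallelizes naturally. That independence is what makes running tens of thousands of models concurrently on an HPC system feasible.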


Subjects
Neoplasms; Neural Networks, Computer; Computing Methodologies; Humans; Information Storage and Retrieval; Machine Learning
9.
Cancer ; 124(13): 2801-2814, 2018 07 01.
Article in English | MEDLINE | ID: mdl-29786851

ABSTRACT

BACKGROUND: Temporal trends in prostate cancer incidence and death rates have been attributed to changing patterns of screening and improved treatment (mortality only), among other factors. This study evaluated contemporary national-level trends and their relations with prostate-specific antigen (PSA) testing prevalence and explored trends in incidence according to disease characteristics with stage-specific, delay-adjusted rates. METHODS: Joinpoint regression was used to examine changes in delay-adjusted prostate cancer incidence rates from population-based US cancer registries from 2000 to 2014 by age categories, race, and disease characteristics, including stage, PSA, Gleason score, and clinical extension. In addition, the analysis included trends for prostate cancer mortality between 1975 and 2015 by race and the estimation of PSA testing prevalence between 1987 and 2005. The annual percent change was calculated for periods defined by significant trend change points. RESULTS: For all age groups, overall prostate cancer incidence rates declined approximately 6.5% per year from 2007. However, the incidence of distant-stage disease increased from 2010 to 2014. The incidence of disease according to higher PSA levels or Gleason scores at diagnosis did not increase. After years of significant decline (from 1993 to 2013), the overall prostate cancer mortality trend stabilized from 2013 to 2015. CONCLUSIONS: After a decline in PSA test usage, there has been an increased burden of late-stage disease, and the decline in prostate cancer mortality has leveled off. Cancer 2018;124:2801-2814. © 2018 American Cancer Society.
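
The annual percent change (APC) reported for each trend segment is conventionally derived from a log-linear fit of rate against calendar year: APC = (e^slope - 1) x 100. The sketch below computes it for a single segment by ordinary least squares. Joinpoint regression additionally searches for the change points between segments, which is not shown here. The rates are synthetic, constructed to decline 6.5% per year.

```python
import math


def annual_percent_change(years, rates):
    """APC from a least-squares fit of ln(rate) on calendar year."""
    n = len(years)
    logs = [math.log(r) for r in rates]
    mean_x = sum(years) / n
    mean_y = sum(logs) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, logs)) / sum(
        (x - mean_x) ** 2 for x in years
    )
    return (math.exp(slope) - 1) * 100


# Synthetic rates per 100,000 falling 6.5% per year from 2007 to 2014.
years = list(range(2007, 2015))
rates = [150.0 * 0.935 ** (year - 2007) for year in years]
print(round(annual_percent_change(years, rates), 2))
```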


Subjects
Cost of Illness; Mortality/trends; Prostatic Neoplasms/epidemiology; Advisory Committees/standards; Age Distribution; Aged; Early Detection of Cancer/standards; Early Detection of Cancer/statistics & numerical data; Humans; Incidence; Male; Mass Screening/standards; Mass Screening/statistics & numerical data; Middle Aged; Neoplasm Grading; Neoplasm Staging; Prevalence; Preventive Health Services/standards; Prostate-Specific Antigen/blood; Prostatic Neoplasms/blood; Prostatic Neoplasms/diagnosis; Prostatic Neoplasms/pathology; SEER Program/statistics & numerical data; United States/epidemiology
10.
Cancer Causes Control ; 29(4-5): 427-433, 2018 05.
Article in English | MEDLINE | ID: mdl-29497884

ABSTRACT

PURPOSE: This analysis describes the impact of hysterectomy on incidence rates and trends in endometrioid endometrial cancer in the United States among women of reproductive age. METHODS: Hysterectomy prevalence for states containing a Surveillance, Epidemiology, and End Results (SEER) registry was estimated using data from the Behavioral Risk Factor Surveillance System (BRFSS) between 1992 and 2010. The population was adjusted for age, race, and calendar year strata. Age-adjusted incidence rates and trends of endometrial cancer among women age 20-49 corrected for hysterectomy were estimated. RESULTS: Hysterectomy prevalence varied by age, race, and ethnicity. Increasing incidence trends were observed, and were attenuated after correcting for hysterectomy. Among all women, the incidence was increasing 1.6% annually (95% CI 0.9, 2.3) and this increase was no longer significant after correction for hysterectomy (+0.7; 95% CI -0.1, 1.5). Stage at diagnosis was similar with and without correction for hysterectomy. The largest increase in incidence over time was among Hispanic women; even after correction for hysterectomy, incidence was increasing (1.8%; 95% CI 0.2, 3.4) annually. CONCLUSION: Overall, endometrioid endometrial cancer incidence rates in the US remain stable among women of reproductive age. Routine reporting of endometrial cancer incidence does not accurately measure incidence among racial and ethnic minorities.
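
Correcting an incidence rate for hysterectomy is usually done by shrinking the denominator to the women still at risk, that is, those with an intact uterus. The sketch below assumes that approach. The abstract does not spell out the exact formula, and the case count, population, and prevalence are invented for illustration.

```python
def rate_per_100k(cases, population):
    """Crude incidence rate per 100,000 population."""
    return cases / population * 100_000


def hysterectomy_corrected_rate(cases, population, hysterectomy_prevalence):
    """Rate per 100,000 women at risk (assumed correction: drop women
    who have had a hysterectomy from the denominator)."""
    at_risk = population * (1 - hysterectomy_prevalence)
    return rate_per_100k(cases, at_risk)


# Invented stratum: 50 cases among 1,000,000 women, 20% with hysterectomy.
uncorrected = rate_per_100k(50, 1_000_000)
corrected = hysterectomy_corrected_rate(50, 1_000_000, 0.20)
print(round(uncorrected, 2), round(corrected, 2))
```

Hysterectomy prevalence differs by age, race, and calendar year, so the correction would be applied within each stratum before age adjustment.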


Subjects
Carcinoma, Endometrioid/epidemiology; Endometrial Neoplasms/epidemiology; Hysterectomy/statistics & numerical data; Adult; Ethnicity; Female; Hispanic or Latino/statistics & numerical data; Humans; Incidence; Middle Aged; Prevalence; Registries; SEER Program; United States; Young Adult
11.
Cancer ; 123(4): 697-703, 2017 02 15.
Article in English | MEDLINE | ID: mdl-27783399

ABSTRACT

BACKGROUND: Researchers have used prostate-specific antigen (PSA) values collected by central cancer registries to evaluate tumors for potential aggressive clinical disease. An independent study collecting PSA values suggested a high error rate (18%) related to implied decimal points. To evaluate the error rate in the Surveillance, Epidemiology, and End Results (SEER) program, a comprehensive review of PSA values recorded across all SEER registries was performed. METHODS: Consolidated PSA values for eligible prostate cancer cases in SEER registries were reviewed and compared with text documentation from abstracted records. Four types of classification errors were identified: implied decimal point errors, abstraction or coding implementation errors, nonsignificant errors, and changes related to "unknown" values. RESULTS: A total of 50,277 prostate cancer cases diagnosed in 2012 were reviewed. Approximately 94.15% of cases did not have meaningful changes (85.85% correct, 5.58% with a nonsignificant change of <1 ng/mL, and 2.80% with no clinical change). Approximately 5.70% of cases had meaningful changes (1.93% due to implied decimal point errors, 1.54% due to abstract or coding errors, and 2.23% due to errors related to unknown categories). Only 419 of the original 50,277 cases (0.83%) resulted in a change in disease stage due to a corrected PSA value. CONCLUSIONS: The implied decimal error rate was only 1.93% of all cases in the current validation study, with a meaningful error rate of 5.81%. The reasons for the lower error rate in SEER are likely due to ongoing and rigorous quality control and visual editing processes by the central registries. The SEER program currently is reviewing and correcting PSA values back to 2004 and will re-release these data in the public use research file. Cancer 2017;123:697-703. © 2016 American Cancer Society.


Subjects
Predictive Value of Tests; Prostate-Specific Antigen/blood; Prostatic Neoplasms/epidemiology; SEER Program; Humans; Male; Neoplasm Staging; Prostatic Neoplasms/blood; Prostatic Neoplasms/pathology
12.
Cancer ; 122(9): 1312-37, 2016 05 01.
Article in English | MEDLINE | ID: mdl-26959385

ABSTRACT

BACKGROUND: Annual updates on cancer occurrence and trends in the United States are provided through an ongoing collaboration among the American Cancer Society (ACS), the Centers for Disease Control and Prevention (CDC), the National Cancer Institute (NCI), and the North American Association of Central Cancer Registries (NAACCR). This annual report highlights the increasing burden of liver and intrahepatic bile duct (liver) cancers. METHODS: Cancer incidence data were obtained from the CDC, NCI, and NAACCR; data about cancer deaths were obtained from the CDC's National Center for Health Statistics (NCHS). Annual percent changes in incidence and death rates (age-adjusted to the 2000 US Standard Population) for all cancers combined and for the leading cancers among men and women were estimated by joinpoint analysis of long-term trends (incidence for 1992-2012 and mortality for 1975-2012) and short-term trends (2008-2012). In-depth analysis of liver cancer incidence included an age-period-cohort analysis and an incidence-based estimation of person-years of life lost because of the disease. By using NCHS multiple causes of death data, hepatitis C virus (HCV) and liver cancer-associated death rates were examined from 1999 through 2013. RESULTS: Among men and women of all major racial and ethnic groups, death rates continued to decline for all cancers combined and for most cancer sites; the overall cancer death rate (for both sexes combined) decreased by 1.5% per year from 2003 to 2012. Overall, incidence rates decreased among men and remained stable among women from 2003 to 2012. Among both men and women, deaths from liver cancer increased at the highest rate of all cancer sites, and liver cancer incidence rates increased sharply, second only to thyroid cancer. Men had more than twice the incidence rate of liver cancer than women, and rates increased with age for both sexes. Among non-Hispanic (NH) white, NH black, and Hispanic men and women, liver cancer incidence rates were higher for persons born after the 1938 to 1947 birth cohort. In contrast, there was a minimal birth cohort effect for NH Asian and Pacific Islanders (APIs). NH black men and Hispanic men had the lowest median age at death (60 and 62 years, respectively) and the highest average person-years of life lost per death (21 and 20 years, respectively) from liver cancer. HCV and liver cancer-associated death rates were highest among decedents who were born during 1945 through 1965. CONCLUSIONS: Overall, cancer incidence and mortality declined among men; and, although cancer incidence was stable among women, mortality declined. The burden of liver cancer is growing and is not equally distributed throughout the population. Efforts to vaccinate populations that are vulnerable to hepatitis B virus (HBV) infection and to identify and treat those living with HCV or HBV infection, metabolic conditions, alcoholic liver disease, or other causes of cirrhosis can be effective in reducing the incidence and mortality of liver cancer. Cancer 2016;122:1312-1337. © 2016 American Cancer Society.


Subjects
Neoplasms/epidemiology; Age Distribution; American Cancer Society; Cause of Death/trends; Centers for Disease Control and Prevention, U.S.; Ethnicity/statistics & numerical data; Female; Humans; Incidence; Liver Neoplasms/epidemiology; Liver Neoplasms/ethnology; Male; National Cancer Institute (U.S.); Neoplasms/ethnology; Racial Groups/statistics & numerical data; Registries/statistics & numerical data; Sex Distribution; Sex Factors; Time Factors; United States/epidemiology; United States/ethnology
13.
Cancer ; 122(10): 1579-87, 2016 May 15.
Article in English | MEDLINE | ID: mdl-26991915

ABSTRACT

BACKGROUND: This article presents a first look at rates and trends for cases in the Surveillance, Epidemiology, and End Results (SEER) program diagnosed through 2013 using the February 2015 submission, and a validation of rates and trends from the February 2014 submission using the subsequent November 2014 submission. To the authors' knowledge, this is the second time SEER has published trends based on the early February submission. Three new cancer sites were added: cervix, thyroid, and liver/intrahepatic bile duct. METHODS: A reporting delay model adjusted for the undercount of cases, which is substantially larger for the February than the subsequent November submission, was used. Joinpoint regression methodology was used to assess trends. Delay-adjusted rates and trends were checked to assess validity between the February and November 2014 submissions. RESULTS: The validation of rates and trends from the February and November 2014 submissions demonstrated even better agreement than the previously reported comparison between the February and November 2013 submissions, thereby affording additional confidence that the delay-adjusted February submission data can be used to produce valid estimates of incidence trends. Trends for cases diagnosed through 2013 revealed more rapid declines in female colon and rectal cancer and prostate cancer. A plateau in female melanoma trends and a slowing of the increases in thyroid cancer and male liver/intrahepatic bile duct cancer trends were observed. CONCLUSIONS: Analysis of early cancer data submissions can provide a preliminary indication of differences in incidence trends with an additional year of data. Although the delay adjustment correction adjusts for underreporting of cases, caution should be exercised when interpreting the results in this early submission. Cancer 2016;122:1579-87. © 2016 American Cancer Society.


Subjects
Neoplasms/epidemiology; Epidemiologic Methods; Female; Humans; Incidence; Male; Reproducibility of Results; SEER Program; Sex Factors; United States/epidemiology
15.
J Natl Cancer Inst Monogr ; 2024(65): 110-117, 2024 Aug 01.
Article in English | MEDLINE | ID: mdl-39102886

ABSTRACT

Although the Surveillance, Epidemiology, and End Results (SEER) Program has maintained high standards of quality and completeness, the traditional data captured through population-based cancer surveillance are no longer sufficient to understand the impact of cancer and its outcomes. Therefore, in recent years, the SEER Program has expanded the population it covers and enhanced the types of data that are being collected. Traditionally, surveillance systems collected data characterizing the patient and their cancer at the time of diagnosis, as well as limited information on the initial course of therapy. SEER performs active follow-up on cancer patients from diagnosis until death, ascertaining critical information on mortality and survival over time. With the growth of precision oncology and rapid development and dissemination of new diagnostics and treatments, the limited data that registries have traditionally captured around the time of diagnosis, although useful for characterizing the cancer, are insufficient for understanding why similar patients may have different outcomes. The molecular composition of the tumor and genetic factors such as BRCA status affect the patient's treatment response and outcomes. Capturing and stratifying by these critical risk factors are essential if we are to understand differences in outcomes among patients who may be demographically similar, have the same cancer, be diagnosed at the same stage, and receive the same treatment. In addition to the tumor characteristics, it is essential to understand all the therapies that a patient receives over time, not only for the initial treatment period but also if the cancer recurs or progresses. Capturing this subsequent therapy is critical not only for research but also to help patients understand their risk at the time of therapeutic decision making. This article serves as an introduction and foundation for a JNCI Monograph with specific articles focusing on innovative new methods and processes implemented or under development for the SEER Program. The following sections describe the need to evaluate the SEER Program and provide a summary or introduction of those key enhancements that have been or are in the process of being implemented for SEER.


Subjects
Neoplasms; SEER Program; Humans; SEER Program/statistics & numerical data; Neoplasms/therapy; Neoplasms/epidemiology; Neoplasms/diagnosis; United States/epidemiology; Population Surveillance
16.
J Natl Cancer Inst Monogr ; 2024(65): 145-151, 2024 Aug 01.
Article in English | MEDLINE | ID: mdl-39102883

ABSTRACT

The National Cancer Institute and the Department of Energy strategic partnership applies advanced computing and predictive machine learning and deep learning models to automate the capture of information from unstructured clinical text for inclusion in cancer registries. Applications include extraction of key data elements from pathology reports, determination of whether a pathology or radiology report is related to cancer, extraction of relevant biomarker information, and identification of recurrence. With the growing complexity of cancer diagnosis and treatment, capturing essential information with purely manual methods is increasingly difficult. These new methods for applying advanced computational capabilities to automate data extraction represent an opportunity to close critical information gaps and create a nimble, flexible platform on which new information sources, such as genomics, can be added. This will ultimately provide a deeper understanding of the drivers of cancer and outcomes in the population and increase the timeliness of reporting. These advances will enable better understanding of how real-world patients are treated and the outcomes associated with those treatments in the context of our complex medical and social environment.


Subjects
Deep Learning , Machine Learning , Neoplasms , Humans , Neoplasms/diagnosis , Neoplasms/epidemiology , United States/epidemiology , Registries , National Cancer Institute (U.S.)
17.
J Natl Cancer Inst Monogr ; 2024(65): 132-144, 2024 Aug 01.
Article in English | MEDLINE | ID: mdl-39102880

ABSTRACT

One of the challenges associated with understanding environmental impacts on cancer risk and outcomes is estimating potential exposures of individuals diagnosed with cancer to adverse environmental conditions over the life course. Historically, this has been partly due to the lack of reliable measures of cancer patients' potential environmental exposures before a cancer diagnosis. The emerging sources of cancer-related spatiotemporal environmental data and residential history information, coupled with novel technologies for data extraction and linkage, present an opportunity to integrate these data into the existing cancer surveillance data infrastructure, thereby facilitating more comprehensive assessment of cancer risk and outcomes. In this paper, we performed a landscape analysis of the available environmental data sources that could be linked to historical residential address information of cancer patients' records collected by the National Cancer Institute's Surveillance, Epidemiology, and End Results Program. The objective is to enable researchers to use these data to assess potential exposures at the time of cancer initiation through the time of diagnosis and even after diagnosis. The paper addresses the challenges associated with data collection and completeness at various spatial and temporal scales, as well as opportunities and directions for future research.


Subjects
Environmental Exposure , Neoplasms , SEER Program , Humans , SEER Program/statistics & numerical data , Neoplasms/epidemiology , Neoplasms/etiology , Environmental Exposure/adverse effects , United States/epidemiology , Databases, Factual , National Cancer Institute (U.S.) , Data Collection/methods , Information Sources
18.
J Natl Cancer Inst Monogr ; 2024(65): 191-197, 2024 Aug 01.
Article in English | MEDLINE | ID: mdl-39102879

ABSTRACT

BACKGROUND: The National Cancer Institute funds many large cohort studies that rely on self-reported cancer data requiring medical record validation. This is labor intensive, costly, and prone to underreporting or misreporting of cancer and disparity-related differential response. US population-based central cancer registries identify incident cancer within their catchment area, yielding all malignant neoplasms and benign brain and central nervous system tumors with standardized data fields. This manuscript describes the development, implementation, and features of a system to facilitate linkage between cohort studies and cancer registries and the release of cancer registry data for matched cohort participants. METHODS: The Virtual Pooled Registry-Cancer Linkage System (VPR-CLS) provides an online system to link cohorts with multiple state cancer registries by 1) securely transmitting a study file to registries, 2) providing an optimized linkage algorithm to generate preliminary match counts, and 3) providing a streamlined process and templated forms for submitting and tracking data requests for cohort participants who matched with registries. RESULTS: In 2022, the VPR-CLS launched with 45 registries, covering 95% of the US state populations and Puerto Rico. Registries have linked with 15 studies ranging in size from 14,273 to 10.9 million participants. Except in 1 study, linkage sensitivity ranged from 87.0% to 99.9%. Numerous registries have adopted the VPR-CLS templated institutional review board-registry application (n = 39), templated data use agreement (n = 25), and central institutional review board (n = 16). CONCLUSIONS: The VPR-CLS markedly improves ascertainment of cancer outcomes and is the preferred approach for determination of outcomes from cohort studies, postmarketing surveillance, and clinical trials.
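As a reading aid for the linkage sensitivity figures reported above, here is a minimal sketch of the standard definition: the share of true cohort-to-registry matches that the linkage actually finds. The abstract does not say how the authors computed sensitivity, so the function name and the counts below are hypothetical illustrations, not study data.

```python
def linkage_sensitivity(true_matches_found: int, true_matches_missed: int) -> float:
    """Return the fraction of true matches that the linkage identified.

    Both arguments are hypothetical counts; a real evaluation would take
    them from a validated reference set of known matches.
    """
    total_true_matches = true_matches_found + true_matches_missed
    if total_true_matches == 0:
        raise ValueError("at least one true match is required")
    return true_matches_found / total_true_matches


# Hypothetical example: 870 of 1,000 known matches were found by the linkage.
print(f"{linkage_sensitivity(870, 130):.1%}")
```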


Subjects
Medical Record Linkage , Neoplasms , Registries , Humans , Registries/statistics & numerical data , Neoplasms/epidemiology , Neoplasms/diagnosis , United States/epidemiology , Medical Record Linkage/methods , Cohort Studies , National Cancer Institute (U.S.)
19.
J Clin Oncol ; 42(9): 1001-1010, 2024 Mar 20.
Article in English | MEDLINE | ID: mdl-38320222

ABSTRACT

PURPOSE: This study assessed the prevalence of specific major adverse financial events (AFEs), namely bankruptcies, liens, and evictions, before a cancer diagnosis and their association with later-stage cancer at diagnosis. METHODS: Patients age 20-69 years diagnosed with cancer during 2014-2015 were identified from the Seattle, Louisiana, and Georgia SEER population-based cancer registries. Registry data were linked with LexisNexis consumer data to identify patients with a history of court-documented AFEs before cancer diagnosis. The association of AFEs and later-stage cancer diagnoses (stages III/IV) was assessed using separate sex-specific multivariable logistic regression. RESULTS: Among 101,649 patients with cancer linked to LexisNexis data, 36,791 (36.2%) had a major AFE reported before diagnosis. The mean and median timing of the AFE closest to diagnosis were 93 and 77 months, respectively. AFEs were most common among non-Hispanic Black, unmarried, and low-income patients. Individuals with previous AFEs were more likely to be diagnosed with later-stage cancer than individuals with no AFE (males: odds ratio [OR], 1.09 [95% CI, 1.03 to 1.14]; P < .001; females: OR, 1.18 [95% CI, 1.13 to 1.24]; P < .0001) after adjusting for age, race, marital status, income, registry, and cancer type. Associations between AFEs prediagnosis and later-stage disease did not vary by AFE timing. CONCLUSION: One third of newly diagnosed patients with cancer had a major AFE before their diagnosis. Patients with AFEs were more likely to have later-stage diagnosis, even accounting for traditional measures of socioeconomic status that influence the stage at diagnosis. The prevalence of prediagnosis AFEs underscores financial vulnerability of patients with cancer before their diagnosis, before any subsequent financial burden associated with cancer treatment.


Subjects
Black or African American , Neoplasms , Adult , Aged , Female , Humans , Male , Middle Aged , Young Adult , Georgia/epidemiology , Neoplasms/diagnosis , Neoplasms/epidemiology , Registries , United States/epidemiology
20.
J Natl Cancer Inst Monogr ; 2024(65): 180-190, 2024 Aug 01.
Article in English | MEDLINE | ID: mdl-39102878

ABSTRACT

BACKGROUND: The Surveillance, Epidemiology, and End Results (SEER) Program with the National Cancer Institute tested whether population-based cancer registries can serve as honest brokers to acquire tissue and data in the SEER-Linked Virtual Tissue Repository (VTR) Pilot. METHODS: We collected formalin-fixed, paraffin-embedded tissue and clinical data from patients with pancreatic ductal adenocarcinoma (PDAC) and breast cancer (BC) for two studies comparing cancer cases with highly unusual survival (≥5 years for PDAC and ≤30 months for BC) to pair-matched controls with usual survival (≤2 years for PDAC and ≥5 years for BC). Success was defined as the ability for registries to acquire tissue and data on cancer cases with highly unusual outcomes. RESULTS: Of 98 PDAC and 103 BC matched cases eligible for tissue collection, sources of attrition for tissue collection were tissue being unavailable, control paired with failed case, second control that was not requested, tumor necrosis ≥20%, and low tumor cellularity. In total, tissue meeting the study criteria was obtained for 70 (71%) PDAC and 74 (72%) BC matched cases. For patients with tissue received, clinical data completeness ranged from 59% for CA-19-9 after treatment to >95% for margin status, whether radiation therapy and chemotherapy were administered, and comorbidities. CONCLUSIONS: The VTR Pilot demonstrated the feasibility of using SEER cancer registries as honest brokers to provide tissue and clinical data for secondary use in research. Studies using this program should oversample by 45% to 50% to obtain sufficient sample size and targeted population representation and involve subspecialty matter expert pathologists for tissue selection.
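The tissue yields reported above (70 of 98 PDAC and 74 of 103 BC matched cases) can be turned into a rough planning calculation. The sketch below estimates how many matched cases to request in order to end up with a target number of usable ones, assuming a future study sees the same yield. The function name and the target of 100 are hypothetical. It models tissue attrition only, so it gives a lower figure than the 45% to 50% the authors recommend, which also covers targeted population representation.

```python
def cases_to_request(target_usable: int, eligible: int, obtained: int) -> int:
    """Estimate cases to request so that `target_usable` survive attrition.

    Assumes the future yield equals obtained / eligible from a prior study.
    Integer ceiling division avoids floating-point rounding surprises.
    """
    if obtained <= 0 or obtained > eligible:
        raise ValueError("require 0 < obtained <= eligible")
    # -(-a // b) is ceiling division for positive integers.
    return -(-target_usable * eligible // obtained)


# Yields taken from the abstract; the target of 100 usable cases is hypothetical.
pdac_requests = cases_to_request(100, eligible=98, obtained=70)
bc_requests = cases_to_request(100, eligible=103, obtained=74)
print(pdac_requests, bc_requests)
```

Under these assumptions both cancer types need about 40% more requests than the target, which is the floor implied by tissue attrition before any representation goals are considered.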


Subjects
Breast Neoplasms , Carcinoma, Pancreatic Ductal , Pancreatic Neoplasms , SEER Program , Humans , Female , Pilot Projects , Carcinoma, Pancreatic Ductal/therapy , Carcinoma, Pancreatic Ductal/pathology , United States/epidemiology , Male , Breast Neoplasms/therapy , Breast Neoplasms/pathology , Breast Neoplasms/epidemiology , Pancreatic Neoplasms/therapy , Pancreatic Neoplasms/pathology , Pancreatic Neoplasms/epidemiology , Middle Aged , Aged , National Cancer Institute (U.S.) , Tissue Banks , Registries , Adult , Case-Control Studies