Pesquisa | Portal Regional da BVS

Near real time monitoring and forecasting for COVID-19 situational awareness.

Stewart, Robert; Erwin, Samantha; Piburn, Jesse; Nagle, Nicholas; Kaufman, Jason; Peluso, Alina; Christian, J Blair; Grant, Joshua; Sorokine, Alexandre; Bhaduri, Budhendra.

Appl Geogr ; 146: 102759, 2022 Sep.

Artigo em Inglês | MEDLINE | ID: mdl-35945952

RESUMO

In the opening months of the pandemic, the need for situational awareness was urgent. Forecasting models such as the Susceptible-Infectious-Recovered (SIR) model were hampered by limited testing data and key information on mobility, contact tracing, and local policy variations would not be consistently available for months. New case counts from sources like John Hopkins University and the NY Times were systematically reliable. Using these data, we developed the novel COVID County Situational Awareness Tool (CCSAT) for reliable monitoring and decision support. In CCSAT, we developed a retrospective seven-day moving window semantic map of county-level disease magnitude and acceleration that smoothed noisy daily variations. We also developed a novel Bayesian model that reliably forecasted county-level magnitude and acceleration for the upcoming week based on population and new case count data. Together these formed a robust operational update including county-level maps of new case rate changes, estimates of new cases in the upcoming week, and measures of model reliability. We found CCSAT provided stable, reliable estimates across the seven-day time window, with the greatest errors occurring in cases of anomalous, single day spikes. In this paper, we provide CCSAT details and apply it to a single week in June 2020.

Inferring the spread of COVID-19: the role of time-varying reporting rate in epidemiological modelling.

Spannaus, Adam; Papamarkou, Theodore; Erwin, Samantha; Christian, J Blair.

Sci Rep ; 12(1): 10761, 2022 06 24.

Artigo em Inglês | MEDLINE | ID: mdl-35750796

RESUMO

The role of epidemiological models is crucial for informing public health officials during a public health emergency, such as the COVID-19 pandemic. However, traditional epidemiological models fail to capture the time-varying effects of mitigation strategies and do not account for under-reporting of active cases, thus introducing bias in the estimation of model parameters. To infer more accurate parameter estimates and to reduce the uncertainty of these estimates, we extend the SIR and SEIR epidemiological models with two time-varying parameters that capture the transmission rate and the rate at which active cases are reported to health officials. Using two real data sets of COVID-19 cases, we perform Bayesian inference via our SIR and SEIR models with time-varying transmission and reporting rates and via their standard counterparts with constant rates; our approach provides parameter estimates with more realistic interpretation, and 1-week ahead predictions with reduced uncertainty. Furthermore, we find consistent under-reporting in the number of active cases in the data that we consider, suggesting that the initial phase of the pandemic was more widespread than previously reported.

Assuntos

COVID-19 , Pandemias , Teorema de Bayes , COVID-19/epidemiologia , Humanos , Saúde Pública , Incerteza

Optimal vocabulary selection approaches for privacy-preserving deep NLP model training for information extraction and cancer epidemiology.

Yoon, Hong-Jun; Stanley, Christopher; Christian, J Blair; Klasky, Hilda B; Blanchard, Andrew E; Durbin, Eric B; Wu, Xiao-Cheng; Stroup, Antoinette; Doherty, Jennifer; Schwartz, Stephen M; Wiggins, Charles; Damesyn, Mark; Coyle, Linda; Tourassi, Georgia D.

Cancer Biomark ; 33(2): 185-198, 2022.

Artigo em Inglês | MEDLINE | ID: mdl-35213361

RESUMO

BACKGROUND: With the use of artificial intelligence and machine learning techniques for biomedical informatics, security and privacy concerns over the data and subject identities have also become an important issue and essential research topic. Without intentional safeguards, machine learning models may find patterns and features to improve task performance that are associated with private personal information. OBJECTIVE: The privacy vulnerability of deep learning models for information extraction from medical textural contents needs to be quantified since the models are exposed to private health information and personally identifiable information. The objective of the study is to quantify the privacy vulnerability of the deep learning models for natural language processing and explore a proper way of securing patients' information to mitigate confidentiality breaches. METHODS: The target model is the multitask convolutional neural network for information extraction from cancer pathology reports, where the data for training the model are from multiple state population-based cancer registries. This study proposes the following schemes to collect vocabularies from the cancer pathology reports; (a) words appearing in multiple registries, and (b) words that have higher mutual information. We performed membership inference attacks on the models in high-performance computing environments. RESULTS: The comparison outcomes suggest that the proposed vocabulary selection methods resulted in lower privacy vulnerability while maintaining the same level of clinical task performance.

Assuntos

Confidencialidade , Aprendizado Profundo , Armazenamento e Recuperação da Informação/métodos , Processamento de Linguagem Natural , Neoplasias/epidemiologia , Inteligência Artificial , Aprendizado Profundo/normas , Humanos , Neoplasias/patologia , Sistema de Registros

A Keyword-Enhanced Approach to Handle Class Imbalance in Clinical Text Classification.

Blanchard, Andrew E; Gao, Shang; Yoon, Hong-Jun; Christian, J Blair; Durbin, Eric B; Wu, Xiao-Cheng; Stroup, Antoinette; Doherty, Jennifer; Schwartz, Stephen M; Wiggins, Charles; Coyle, Linda; Penberthy, Lynne; Tourassi, Georgia D.

IEEE J Biomed Health Inform ; 26(6): 2796-2803, 2022 06.

Artigo em Inglês | MEDLINE | ID: mdl-35020599

RESUMO

Recent applications ofdeep learning have shown promising results for classifying unstructured text in the healthcare domain. However, the reliability of models in production settings has been hindered by imbalanced data sets in which a small subset of the classes dominate. In the absence of adequate training data, rare classes necessitate additional model constraints for robust performance. Here, we present a strategy for incorporating short sequences of text (i.e. keywords) into training to boost model accuracy on rare classes. In our approach, we assemble a set of keywords, including short phrases, associated with each class. The keywords are then used as additional data during each batch of model training, resulting in a training loss that has contributions from both raw data and keywords. We evaluate our approach on classification of cancer pathology reports, which shows a substantial increase in model performance for rare classes. Furthermore, we analyze the impact of keywords on model output probabilities for bigrams, providing a straightforward method to identify model difficulties for limited training data.

Assuntos

Reprodutibilidade dos Testes , Coleta de Dados , Humanos

A pre-training and self-training approach for biomedical named entity recognition.

Gao, Shang; Kotevska, Olivera; Sorokine, Alexandre; Christian, J Blair.

PLoS One ; 16(2): e0246310, 2021.

Artigo em Inglês | MEDLINE | ID: mdl-33561139

RESUMO

Named entity recognition (NER) is a key component of many scientific literature mining tasks, such as information retrieval, information extraction, and question answering; however, many modern approaches require large amounts of labeled training data in order to be effective. This severely limits the effectiveness of NER models in applications where expert annotations are difficult and expensive to obtain. In this work, we explore the effectiveness of transfer learning and semi-supervised self-training to improve the performance of NER models in biomedical settings with very limited labeled data (250-2000 labeled samples). We first pre-train a BiLSTM-CRF and a BERT model on a very large general biomedical NER corpus such as MedMentions or Semantic Medline, and then we fine-tune the model on a more specific target NER task that has very limited training data; finally, we apply semi-supervised self-training using unlabeled data to further boost model performance. We show that in NER tasks that focus on common biomedical entity types such as those in the Unified Medical Language System (UMLS), combining transfer learning with self-training enables a NER model such as a BiLSTM-CRF or BERT to obtain similar performance with the same model trained on 3x-8x the amount of labeled data. We further show that our approach can also boost performance in a low-resource application where entities types are more rare and not specifically covered in UMLS.

Assuntos

Inteligência Artificial , Reconhecimento Psicológico , Aprendizado de Máquina não Supervisionado , Humanos , Modelos Teóricos , Aprendizado de Máquina Supervisionado , Terminologia como Assunto , Transferência de Experiência , Unified Medical Language System

Knowledge Graph-Enabled Cancer Data Analytics.

Hasan, S M Shamimul; Rivera, Donna; Wu, Xiao-Cheng; Durbin, Eric B; Christian, J Blair; Tourassi, Georgia.

IEEE J Biomed Health Inform ; 24(7): 1952-1967, 2020 07.

Artigo em Inglês | MEDLINE | ID: mdl-32386166

RESUMO

Cancer registries collect unstructured and structured cancer data for surveillance purposes which provide important insights regarding cancer characteristics, treatments, and outcomes. Cancer registry data typically (1) categorize each reportable cancer case or tumor at the time of diagnosis, (2) contain demographic information about the patient such as age, gender, and location at time of diagnosis, (3) include planned and completed primary treatment information, and (4) may contain survival outcomes. As structured data is being extracted from various unstructured sources, such as pathology reports, radiology reports, medical records, and stored for reporting and other needs, the associated information representing a reportable cancer is constantly expanding and evolving. While some popular analytic approaches including SEER*Stat and SAS exist, we provide a knowledge graph approach to organizing cancer registry data. Our approach offers unique advantages for timely data analysis and presentation and visualization of valuable information. This knowledge graph approach semantically enriches the data, and easily enables linking with third-party data which can help explain variation in cancer incidence patterns, disparities, and outcomes. We developed a prototype knowledge graph based on the Louisiana Tumor Registry dataset. We present the advantages of the knowledge graph approach by examining: i) scenario-specific queries, ii) links with openly available external datasets, iii) schema evolution for iterative analysis, and iv) data visualization. Our results demonstrate that this graph based solution can perform complex queries, improve query run-time performance by up to 76%, and more easily conduct iterative analyses to enhance researchers' understanding of cancer registry data.

Assuntos

Bases de Conhecimento , Neoplasias , Sistema de Registros , Adulto , Idoso , Idoso de 80 Anos ou mais , Algoritmos , Bases de Dados Factuais , Feminino , Humanos , Incidência , Masculino , Pessoa de Meia-Idade , Neoplasias/diagnóstico , Neoplasias/epidemiologia , Neoplasias/fisiopatologia

Deep Transfer Learning Across Cancer Registries for Information Extraction from Pathology Reports.

Alawad, Mohammed; Gao, Shang; Qiu, John; Schaefferkoetter, Noah; Hinkle, Jacob D; Yoon, Hong-Jun; Christian, J Blair; Wu, Xiao-Cheng; Durbin, Eric B; Jeong, Jong Cheol; Hands, Isaac; Rust, David; Tourassi, Georgia.

IEEE EMBS Int Conf Biomed Health Inform ; 20192019 May.

Artigo em Inglês | MEDLINE | ID: mdl-36081613

RESUMO

Automated text information extraction from cancer pathology reports is an active area of research to support national cancer surveillance. A well-known challenge is how to develop information extraction tools with robust performance across cancer registries. In this study we investigated whether transfer learning (TL) with a convolutional neural network (CNN) can facilitate cross-registry knowledge sharing. Specifically, we performed a series of experiments to determine whether a CNN trained with single-registry data is capable of transferring knowledge to another registry or whether developing a cross-registry knowledge database produces a more effective and generalizable model. Using data from two cancer registries and primary tumor site and topography as the information extraction task of interest, our study showed that TL results in 6.90% and 17.22% improvement of classification macro F-score over the baseline single-registry models. Detailed analysis illustrated that the observed improvement is evident in the low prevalence classes.

RESUMO

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

ENVIAR RESULTADO:

SELEÇÃO DE REFERÊNCIAS

DETALHE DA PESQUISA