Your browser doesn't support javascript.
loading
Mostrar: 20 | 50 | 100
Resultados 1 - 5 de 5
Filtrar
Mais filtros











Base de dados
Intervalo de ano de publicação
1.
BMC Med Inform Decis Mak ; 24(Suppl 5): 262, 2024 Sep 17.
Artigo em Inglês | MEDLINE | ID: mdl-39289714

RESUMO

BACKGROUND: Applying graph convolutional networks (GCN) to the classification of free-form natural language texts leveraged by graph-of-words features (TextGCN) was studied and confirmed to be an effective means of describing complex natural language texts. However, the text classification models based on the TextGCN possess weaknesses in terms of memory consumption and model dissemination and distribution. In this paper, we present a fast message passing network (FastMPN), implementing a GCN with message passing architecture that provides versatility and flexibility by allowing trainable node embedding and edge weights, helping the GCN model find the better solution. We applied the FastMPN model to the task of clinical information extraction from cancer pathology reports, extracting the following six properties: main site, subsite, laterality, histology, behavior, and grade. RESULTS: We evaluated the clinical task performance of the FastMPN models in terms of micro- and macro-averaged F1 scores. A comparison was performed with the multi-task convolutional neural network (MT-CNN) model. Results show that the FastMPN model is equivalent to or better than the MT-CNN. CONCLUSIONS: Our implementation revealed that our FastMPN model, which is based on the PyTorch platform, can train a large corpus (667,290 training samples) with 202,373 unique words in less than 3 minutes per epoch using one NVIDIA V100 hardware accelerator. Our experiments demonstrated that using this implementation, the clinical task performance scores of information extraction related to tumors from cancer pathology reports were highly competitive.


Assuntos
Processamento de Linguagem Natural , Neoplasias , Redes Neurais de Computação , Humanos , Neoplasias/classificação , Mineração de Dados
2.
Cancer Biomark ; 33(2): 185-198, 2022.
Artigo em Inglês | MEDLINE | ID: mdl-35213361

RESUMO

BACKGROUND: With the use of artificial intelligence and machine learning techniques for biomedical informatics, security and privacy concerns over the data and subject identities have also become an important issue and essential research topic. Without intentional safeguards, machine learning models may find patterns and features to improve task performance that are associated with private personal information. OBJECTIVE: The privacy vulnerability of deep learning models for information extraction from medical textural contents needs to be quantified since the models are exposed to private health information and personally identifiable information. The objective of the study is to quantify the privacy vulnerability of the deep learning models for natural language processing and explore a proper way of securing patients' information to mitigate confidentiality breaches. METHODS: The target model is the multitask convolutional neural network for information extraction from cancer pathology reports, where the data for training the model are from multiple state population-based cancer registries. This study proposes the following schemes to collect vocabularies from the cancer pathology reports; (a) words appearing in multiple registries, and (b) words that have higher mutual information. We performed membership inference attacks on the models in high-performance computing environments. RESULTS: The comparison outcomes suggest that the proposed vocabulary selection methods resulted in lower privacy vulnerability while maintaining the same level of clinical task performance.


Assuntos
Confidencialidade , Aprendizado Profundo , Armazenamento e Recuperação da Informação/métodos , Processamento de Linguagem Natural , Neoplasias/epidemiologia , Inteligência Artificial , Aprendizado Profundo/normas , Humanos , Neoplasias/patologia , Sistema de Registros
3.
IEEE J Biomed Health Inform ; 26(6): 2796-2803, 2022 06.
Artigo em Inglês | MEDLINE | ID: mdl-35020599

RESUMO

Recent applications ofdeep learning have shown promising results for classifying unstructured text in the healthcare domain. However, the reliability of models in production settings has been hindered by imbalanced data sets in which a small subset of the classes dominate. In the absence of adequate training data, rare classes necessitate additional model constraints for robust performance. Here, we present a strategy for incorporating short sequences of text (i.e. keywords) into training to boost model accuracy on rare classes. In our approach, we assemble a set of keywords, including short phrases, associated with each class. The keywords are then used as additional data during each batch of model training, resulting in a training loss that has contributions from both raw data and keywords. We evaluate our approach on classification of cancer pathology reports, which shows a substantial increase in model performance for rare classes. Furthermore, we analyze the impact of keywords on model output probabilities for bigrams, providing a straightforward method to identify model difficulties for limited training data.


Assuntos
Reprodutibilidade dos Testes , Coleta de Dados , Humanos
4.
IEEE J Biomed Health Inform ; 24(7): 1952-1967, 2020 07.
Artigo em Inglês | MEDLINE | ID: mdl-32386166

RESUMO

Cancer registries collect unstructured and structured cancer data for surveillance purposes which provide important insights regarding cancer characteristics, treatments, and outcomes. Cancer registry data typically (1) categorize each reportable cancer case or tumor at the time of diagnosis, (2) contain demographic information about the patient such as age, gender, and location at time of diagnosis, (3) include planned and completed primary treatment information, and (4) may contain survival outcomes. As structured data is being extracted from various unstructured sources, such as pathology reports, radiology reports, medical records, and stored for reporting and other needs, the associated information representing a reportable cancer is constantly expanding and evolving. While some popular analytic approaches including SEER*Stat and SAS exist, we provide a knowledge graph approach to organizing cancer registry data. Our approach offers unique advantages for timely data analysis and presentation and visualization of valuable information. This knowledge graph approach semantically enriches the data, and easily enables linking with third-party data which can help explain variation in cancer incidence patterns, disparities, and outcomes. We developed a prototype knowledge graph based on the Louisiana Tumor Registry dataset. We present the advantages of the knowledge graph approach by examining: i) scenario-specific queries, ii) links with openly available external datasets, iii) schema evolution for iterative analysis, and iv) data visualization. Our results demonstrate that this graph based solution can perform complex queries, improve query run-time performance by up to 76%, and more easily conduct iterative analyses to enhance researchers' understanding of cancer registry data.


Assuntos
Bases de Conhecimento , Neoplasias , Sistema de Registros , Adulto , Idoso , Idoso de 80 Anos ou mais , Algoritmos , Bases de Dados Factuais , Feminino , Humanos , Incidência , Masculino , Pessoa de Meia-Idade , Neoplasias/diagnóstico , Neoplasias/epidemiologia , Neoplasias/fisiopatologia
5.
Artigo em Inglês | MEDLINE | ID: mdl-36081613

RESUMO

Automated text information extraction from cancer pathology reports is an active area of research to support national cancer surveillance. A well-known challenge is how to develop information extraction tools with robust performance across cancer registries. In this study we investigated whether transfer learning (TL) with a convolutional neural network (CNN) can facilitate cross-registry knowledge sharing. Specifically, we performed a series of experiments to determine whether a CNN trained with single-registry data is capable of transferring knowledge to another registry or whether developing a cross-registry knowledge database produces a more effective and generalizable model. Using data from two cancer registries and primary tumor site and topography as the information extraction task of interest, our study showed that TL results in 6.90% and 17.22% improvement of classification macro F-score over the baseline single-registry models. Detailed analysis illustrated that the observed improvement is evident in the low prevalence classes.

SELEÇÃO DE REFERÊNCIAS
DETALHE DA PESQUISA