Search | BVS IEC

1.

Deep learning uncertainty quantification for clinical text classification.

Peluso, Alina; Danciu, Ioana; Yoon, Hong-Jun; Yusof, Jamaludin Mohd; Bhattacharya, Tanmoy; Spannaus, Adam; Schaefferkoetter, Noah; Durbin, Eric B; Wu, Xiao-Cheng; Stroup, Antoinette; Doherty, Jennifer; Schwartz, Stephen; Wiggins, Charles; Coyle, Linda; Penberthy, Lynne; Tourassi, Georgia D; Gao, Shang.

J Biomed Inform ; 149: 104576, 2024 01.

Article in English | MEDLINE | ID: mdl-38101690

ABSTRACT

INTRODUCTION: Machine learning algorithms are expected to work side-by-side with humans in decision-making pipelines. Thus, the ability of classifiers to make reliable decisions is of paramount importance. Deep neural networks (DNNs) represent the state-of-the-art models for real-world classification. Although the strength of activation in DNNs is often correlated with the network's confidence, in-depth analyses are needed to establish whether they are well calibrated. METHOD: In this paper, we demonstrate the use of DNN-based classification tools to benefit cancer registries by automating the extraction of information on disease at diagnosis and at surgery from the text of electronic pathology reports from the US National Cancer Institute (NCI) Surveillance, Epidemiology, and End Results (SEER) population-based cancer registries. In particular, we introduce multiple methods for selective classification that achieve a target level of accuracy on multiple classification tasks while minimizing the rejection amount, that is, the number of electronic pathology reports for which the model's predictions are unreliable. We evaluate the proposed methods by comparing them with the current in-house deep learning-based abstaining classifier. RESULTS: Overall, all the proposed selective classification methods achieve the targeted level of accuracy or higher in a trade-off analysis aimed at minimizing the rejection rate. On in-distribution validation and holdout test data, all the proposed methods achieve the required target level of accuracy on all tasks with a lower rejection rate than the deep abstaining classifier (DAC). Interpreting the results for the out-of-distribution test data is more complex; nevertheless, in this case as well, the best of the proposed methods achieving 97% accuracy or higher has a lower rejection rate than the DAC.
CONCLUSIONS: We show that although both approaches can flag the samples that should be manually reviewed and labeled by human annotators, the newly proposed methods retain a larger fraction of reports and do so without retraining, thus offering a reduced computational cost compared with the in-house deep learning-based abstaining classifier.
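
A minimal sketch of the selective-classification trade-off described above, assuming the classifier emits one confidence score per report (for example, its maximum softmax probability) and that a labeled validation set is available. The helper `pick_threshold` and the 0.97 target are illustrative; this is a plain confidence-threshold rule, not the specific methods proposed in the paper.

```python
from typing import List, Optional, Tuple


def pick_threshold(confidences: List[float], correct: List[bool],
                   target_acc: float) -> Optional[Tuple[float, float]]:
    """Find the confidence threshold with the lowest rejection rate whose
    retained predictions still reach `target_acc` on labeled validation data.

    Reports with confidence >= threshold are kept; the rest are deferred to
    human annotators. Returns (threshold, rejection_rate), or None if no
    threshold reaches the target.
    """
    n = len(confidences)
    # Keeping the k most confident reports is the same as thresholding at the
    # k-th highest confidence, so scan k = 1..n in descending-confidence order.
    order = sorted(range(n), key=lambda i: confidences[i], reverse=True)
    best = None
    n_correct = 0
    for k, i in enumerate(order, start=1):
        n_correct += correct[i]
        # Only cut between distinct confidence values, so ties stay together.
        if k < n and confidences[order[k]] == confidences[i]:
            continue
        if n_correct / k >= target_acc:
            best = (confidences[i], 1 - k / n)  # later k means fewer rejections
    return best


conf = [0.99, 0.95, 0.90, 0.80, 0.60, 0.55]
correct = [True, True, True, False, True, False]
print(pick_threshold(conf, correct, target_acc=0.97))  # (0.9, 0.5)
```

Raising the threshold rejects more reports; the scan returns the lowest rejection rate that still meets the accuracy target on the validation data.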

Subjects

Deep Learning, Humans, Uncertainty, Neural Networks, Computer, Algorithms, Machine Learning

2.

Class imbalance in out-of-distribution datasets: Improving the robustness of the TextCNN for the classification of rare cancer types.

De Angeli, Kevin; Gao, Shang; Danciu, Ioana; Durbin, Eric B; Wu, Xiao-Cheng; Stroup, Antoinette; Doherty, Jennifer; Schwartz, Stephen; Wiggins, Charles; Damesyn, Mark; Coyle, Linda; Penberthy, Lynne; Tourassi, Georgia D; Yoon, Hong-Jun.

J Biomed Inform ; 125: 103957, 2022 01.

Article in English | MEDLINE | ID: mdl-34823030

ABSTRACT

In the last decade, the widespread adoption of electronic health record documentation has created huge opportunities for information mining. Natural language processing (NLP) techniques using machine and deep learning are becoming increasingly widespread for information extraction tasks from unstructured clinical notes. Disparities in performance when deploying machine learning models in the real world have recently received considerable attention. In the clinical NLP domain, the robustness of convolutional neural networks (CNNs) for classifying cancer pathology reports under natural distribution shifts remains understudied. In this research, we aim to quantify and improve the performance of the CNN for text classification on out-of-distribution (OOD) datasets resulting from the natural evolution of clinical text in pathology reports. We identified class imbalance due to different prevalence of cancer types as one of the sources of performance drop and analyzed the impact of previous methods for addressing class imbalance when deploying models in real-world domains. Our results show that our novel class-specialized ensemble technique outperforms other methods for the classification of rare cancer types in terms of macro F1 scores. We also found that traditional ensemble methods perform better in top classes, leading to higher micro F1 scores. Based on our findings, we formulate a series of recommendations for other ML practitioners on how to build robust models with extremely imbalanced datasets in biomedical NLP applications.
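
The abstract contrasts macro F1, where every cancer type counts equally, with micro F1, which pools all reports and is therefore dominated by common types. The sketch below, written in plain Python with hypothetical labels, shows why a model that ignores a rare class can still score well on micro F1; it is not the paper's evaluation code.

```python
from collections import Counter
from typing import Hashable, List, Tuple


def f1(tp: int, fp: int, fn: int) -> float:
    # F1 = 2TP / (2TP + FP + FN); defined as 0.0 when the denominator is 0.
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(y_true: List[Hashable], y_pred: List[Hashable]) -> Tuple[float, float]:
    """Return (micro F1, macro F1) for single-label multiclass predictions."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # the predicted class gets a false positive
            fn[t] += 1  # the true class gets a false negative
    labels = set(y_true) | set(y_pred)
    # Macro: every class counts equally, so rare classes weigh as much as common ones.
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    # Micro: counts are pooled over all reports, so frequent classes dominate.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return micro, macro


# Toy data: a model that never predicts the rare class.
y_true = ["common"] * 8 + ["rare"] * 2
y_pred = ["common"] * 10
micro, macro = micro_macro_f1(y_true, y_pred)
print(round(micro, 3), round(macro, 3))  # 0.8 0.444
```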

Subjects

Natural Language Processing, Neoplasms, Electronic Health Records, Humans, Machine Learning, Neural Networks, Computer

3.

Accelerated training of bootstrap aggregation-based deep information extraction systems from cancer pathology reports.

Yoon, Hong-Jun; Klasky, Hilda B; Gounley, John P; Alawad, Mohammed; Gao, Shang; Durbin, Eric B; Wu, Xiao-Cheng; Stroup, Antoinette; Doherty, Jennifer; Coyle, Linda; Penberthy, Lynne; Blair Christian, J; Tourassi, Georgia D.

J Biomed Inform ; 110: 103564, 2020 10.

Article in English | MEDLINE | ID: mdl-32919043

ABSTRACT

OBJECTIVE: In machine learning, it is well known that classification performance improves when bootstrap aggregation (bagging) is applied. However, bagging deep neural networks requires tremendous computational resources and training time. The research question we aimed to answer is whether we could achieve higher task performance scores and accelerate training by dividing a problem into sub-problems. MATERIALS AND METHODS: The data used in this study consist of free text from electronic cancer pathology reports. We applied bagging and partitioned-data training using Multi-Task Convolutional Neural Network (MT-CNN) and Multi-Task Hierarchical Convolutional Attention Network (MT-HCAN) classifiers. We split a big problem into 20 sub-problems, resampled the training cases 2,000 times, and trained the deep learning model for each bootstrap sample and each sub-problem, thus generating up to 40,000 models. We trained many models concurrently in a high-performance computing environment at Oak Ridge National Laboratory (ORNL). RESULTS: We demonstrated that aggregating the models improves task performance compared with the single-model approach, which is consistent with other research studies, and that the two proposed partitioned bagging methods achieved higher classification accuracy scores on four tasks. Notably, the improvements were significant for the extraction of cancer histology, a task with more than 500 class labels; these results show that data partitioning may alleviate the complexity of the task. However, the methods did not achieve superior scores for site and subsite classification. Because data partitioning was based on the primary cancer site, accuracy depended on how the partitions were determined, which needs further investigation and improvement. CONCLUSION: The results of this research demonstrate that (1) the data partitioning and bagging strategy achieved higher performance scores, and (2) training was accelerated by leveraging the high-performance Summit supercomputer at ORNL.
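
The partition-then-bag idea can be sketched as follows, assuming each report can be assigned to a partition (here a hypothetical primary-site key read from the text) both at training and at prediction time. The toy `majority_label_model` stands in for an MT-CNN or MT-HCAN, and bootstrapping within each partition is one possible reading of the procedure, not the authors' implementation. With 20 partitions and `n_boot=2000`, this structure would yield the 40,000 models mentioned above.

```python
import random
from collections import Counter, defaultdict
from typing import Callable, Dict, Hashable, List, Tuple

Example = Tuple[str, Hashable]      # (report text, label)
Model = Callable[[str], Hashable]   # text -> predicted label


def bootstrap(data: List[Example], rng: random.Random) -> List[Example]:
    # Draw len(data) examples uniformly with replacement.
    return [rng.choice(data) for _ in data]


def train_partitioned_bagging(data: List[Example],
                              partition_of: Callable[[Example], Hashable],
                              train_fn: Callable[[List[Example]], Model],
                              n_boot: int, seed: int = 0) -> Dict[Hashable, List[Model]]:
    """Split the data into sub-problems, then bag n_boot models per sub-problem."""
    rng = random.Random(seed)
    parts: Dict[Hashable, List[Example]] = defaultdict(list)
    for ex in data:
        parts[partition_of(ex)].append(ex)
    return {key: [train_fn(bootstrap(exs, rng)) for _ in range(n_boot)]
            for key, exs in parts.items()}


def predict(ensembles: Dict[Hashable, List[Model]], key: Hashable, text: str) -> Hashable:
    # Majority vote over the models trained for this partition.
    votes = Counter(model(text) for model in ensembles[key])
    return votes.most_common(1)[0][0]


def majority_label_model(sample: List[Example]) -> Model:
    # Toy stand-in for a deep model: always predicts the sample's most common label.
    label = Counter(y for _, y in sample).most_common(1)[0][0]
    return lambda text: label


def site(ex: Example) -> str:
    # Hypothetical partition key: the first word of the report names the site.
    return ex[0].split()[0]


data = ([("lung r1", "adenocarcinoma")] * 3 + [("lung r2", "squamous")]
        + [("breast r3", "ductal")] * 4)
ensembles = train_partitioned_bagging(data, site, majority_label_model, n_boot=25)
print(predict(ensembles, "breast", "new breast report"))  # ductal
```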

Subjects

Neoplasms, Neural Networks, Computer, Computing Methodologies, Humans, Information Storage and Retrieval, Machine Learning

4.

Scalable deep text comprehension for Cancer surveillance on high-performance computing.

Qiu, John X; Yoon, Hong-Jun; Srivastava, Kshitij; Watson, Thomas P; Blair Christian, J; Ramanathan, Arvind; Wu, Xiao C; Fearn, Paul A; Tourassi, Georgia D.

BMC Bioinformatics ; 19(Suppl 18): 488, 2018 Dec 21.

Article in English | MEDLINE | ID: mdl-30577743

ABSTRACT

BACKGROUND: Deep Learning (DL) has advanced state-of-the-art capabilities in bioinformatics applications, resulting in a trend toward increasingly sophisticated and computationally demanding models trained on ever-larger data sets. This vastly increased computational demand challenges the feasibility of conducting cutting-edge research. One solution is to distribute the computational workload across multiple computing cluster nodes with data parallelism algorithms. In this study, we used a High-Performance Computing environment and implemented the Downpour Stochastic Gradient Descent algorithm for data parallelism to train a Convolutional Neural Network (CNN) for the natural language processing task of information extraction from a massive dataset of cancer pathology reports. We evaluated the scalability improvements of data-parallel training on the Titan supercomputer at the Oak Ridge Leadership Computing Facility. To evaluate scalability, we used different numbers of worker nodes and performed a set of experiments comparing the effects of different training batch sizes and optimizer functions. RESULTS: We found that Adadelta consistently converged to a lower validation loss, though it required over twice as many training epochs as the fastest-converging optimizer, RMSProp. The Adam optimizer consistently achieved a close second-place minimum validation loss significantly faster; with batch sizes of 16 and 32, the network converged in only 4.5 training epochs. CONCLUSIONS: We demonstrated that the networked training process scales across multiple compute nodes communicating via the Message Passing Interface (MPI) while achieving higher classification accuracy than a traditional machine learning algorithm.
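
Downpour SGD trains model replicas asynchronously against a shared parameter server. The single-process simulation below is a sketch under simplifying assumptions: a two-parameter linear model instead of a CNN, round-robin workers instead of MPI processes, and illustrative values for the learning rate and the fetch/push intervals (`n_fetch`, `n_push`).

```python
import random
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y)


def downpour_sgd(shards: List[List[Point]], lr: float = 0.05, n_fetch: int = 2,
                 n_push: int = 2, steps: int = 2000, seed: int = 0) -> List[float]:
    """Single-process simulation of Downpour-style SGD fitting y = w*x + b.

    Each worker owns one data shard and a possibly stale local copy of the
    parameters; it refreshes that copy from the parameter server every n_fetch
    steps and sends its accumulated updates every n_push steps. Workers are
    interleaved round-robin here to stand in for real asynchrony.
    """
    rng = random.Random(seed)
    server = [0.0, 0.0]                      # [w, b] held by the parameter server
    local = [list(server) for _ in shards]
    pending = [[0.0, 0.0] for _ in shards]   # updates not yet pushed
    for step in range(steps):
        for k, shard in enumerate(shards):
            if step % n_fetch == 0:
                local[k] = list(server)      # fetch the latest shared parameters
            x, y = rng.choice(shard)
            err = local[k][0] * x + local[k][1] - y
            grad = (err * x, err)            # gradient of 0.5 * err**2
            for j in range(2):
                local[k][j] -= lr * grad[j]
                pending[k][j] += lr * grad[j]
            if (step + 1) % n_push == 0:
                for j in range(2):
                    server[j] -= pending[k][j]  # push accumulated updates
                pending[k] = [0.0, 0.0]
    return server


# Four workers, each holding a shard of noiseless data from y = 2x + 1.
xs = [i / 20 for i in range(20)]
shards = [[(x, 2 * x + 1) for x in xs[k::4]] for k in range(4)]
w, b = downpour_sgd(shards)
print(round(w, 2), round(b, 2))
```

Because workers push updates computed from slightly stale parameters, the step size has to stay small enough to tolerate that staleness; on this noiseless toy data the estimates should approach w = 2 and b = 1.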

Subjects

Computing Methodologies, Deep Learning/trends, Neoplasms/diagnosis, Comprehension, Humans, Neoplasms/pathology, Neural Networks, Computer

5.

Optimal vocabulary selection approaches for privacy-preserving deep NLP model training for information extraction and cancer epidemiology.

Yoon, Hong-Jun; Stanley, Christopher; Christian, J Blair; Klasky, Hilda B; Blanchard, Andrew E; Durbin, Eric B; Wu, Xiao-Cheng; Stroup, Antoinette; Doherty, Jennifer; Schwartz, Stephen M; Wiggins, Charles; Damesyn, Mark; Coyle, Linda; Tourassi, Georgia D.

Cancer Biomark ; 33(2): 185-198, 2022.

Article in English | MEDLINE | ID: mdl-35213361

ABSTRACT

BACKGROUND: With the use of artificial intelligence and machine learning techniques in biomedical informatics, security and privacy concerns over the data and subject identities have become an important issue and an essential research topic. Without intentional safeguards, machine learning models may exploit patterns and features associated with private personal information to improve task performance. OBJECTIVE: The privacy vulnerability of deep learning models for information extraction from medical textual content needs to be quantified, since the models are exposed to private health information and personally identifiable information. The objective of the study is to quantify the privacy vulnerability of deep learning models for natural language processing and to explore a proper way of securing patients' information to mitigate confidentiality breaches. METHODS: The target model is the multitask convolutional neural network for information extraction from cancer pathology reports, where the data for training the model are from multiple state population-based cancer registries. This study proposes the following schemes to collect vocabularies from the cancer pathology reports: (a) words appearing in multiple registries, and (b) words that have higher mutual information. We performed membership inference attacks on the models in high-performance computing environments. RESULTS: The comparison outcomes suggest that the proposed vocabulary selection methods resulted in lower privacy vulnerability while maintaining the same level of clinical task performance.
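
The two vocabulary-selection schemes can be sketched over a toy corpus in which each report is a set of words tagged with a class label and a registry. The registry IDs, labels, and the `min_registries` and `top_k` parameters are hypothetical, and mutual information is estimated here from simple word-presence counts; the paper's exact criteria may differ.

```python
import math
from collections import Counter, defaultdict
from typing import List, Set, Tuple

Doc = Tuple[Set[str], str, str]  # (words in a report, class label, registry id)


def shared_vocab(docs: List[Doc], min_registries: int = 2) -> Set[str]:
    """Scheme (a): keep words that occur in at least `min_registries` registries."""
    registries_of = defaultdict(set)
    for words, _, registry in docs:
        for w in words:
            registries_of[w].add(registry)
    return {w for w, regs in registries_of.items() if len(regs) >= min_registries}


def word_label_mi(docs: List[Doc], word: str) -> float:
    """Mutual information (nats) between the presence of `word` and the label."""
    n = len(docs)
    joint = Counter((word in words, label) for words, label, _ in docs)
    present = Counter(word in words for words, _, _ in docs)
    labels = Counter(label for _, label, _ in docs)
    # I(W;Y) = sum p(w,y) * log(p(w,y) / (p(w) p(y))), with counts substituted.
    return sum((c / n) * math.log(c * n / (present[w] * labels[y]))
               for (w, y), c in joint.items())


def top_mi_vocab(docs: List[Doc], top_k: int) -> List[str]:
    """Scheme (b): keep the top_k words with the highest mutual information."""
    vocab = set().union(*(words for words, _, _ in docs))
    # Sort by descending MI, breaking ties alphabetically for determinism.
    return sorted(vocab, key=lambda w: (-word_label_mi(docs, w), w))[:top_k]


docs = [
    ({"carcinoma", "lung"}, "C34", "KY"),
    ({"carcinoma", "breast"}, "C50", "LA"),
    ({"lung"}, "C34", "LA"),
    ({"breast", "smithville"}, "C50", "KY"),  # rare, registry-specific token
]
print(sorted(shared_vocab(docs)))   # ['breast', 'carcinoma', 'lung']
print(top_mi_vocab(docs, top_k=2))  # ['breast', 'lung']
```

In this toy example the registry-specific token `smithville` is excluded by scheme (a), which illustrates how restricting the vocabulary can drop strings that occur in only one registry.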

Subjects

Confidentiality, Deep Learning, Information Storage and Retrieval/methods, Natural Language Processing, Neoplasms/epidemiology, Artificial Intelligence, Deep Learning/standards, Humans, Neoplasms/pathology, Registries

6.

A Keyword-Enhanced Approach to Handle Class Imbalance in Clinical Text Classification.

Blanchard, Andrew E; Gao, Shang; Yoon, Hong-Jun; Christian, J Blair; Durbin, Eric B; Wu, Xiao-Cheng; Stroup, Antoinette; Doherty, Jennifer; Schwartz, Stephen M; Wiggins, Charles; Coyle, Linda; Penberthy, Lynne; Tourassi, Georgia D.

IEEE J Biomed Health Inform ; 26(6): 2796-2803, 2022 06.

Article in English | MEDLINE | ID: mdl-35020599

ABSTRACT

Recent applications of deep learning have shown promising results for classifying unstructured text in the healthcare domain. However, the reliability of models in production settings has been hindered by imbalanced data sets in which a small subset of the classes dominates. In the absence of adequate training data, rare classes necessitate additional model constraints for robust performance. Here, we present a strategy for incorporating short sequences of text (i.e., keywords) into training to boost model accuracy on rare classes. In our approach, we assemble a set of keywords, including short phrases, associated with each class. The keywords are then used as additional data during each batch of model training, resulting in a training loss that has contributions from both raw data and keywords. We evaluate our approach on the classification of cancer pathology reports and observe a substantial increase in model performance for rare classes. Furthermore, we analyze the impact of keywords on model output probabilities for bigrams, providing a straightforward method to identify model difficulties arising from limited training data.
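
A minimal sketch of a batch loss with contributions from both raw reports and keywords, assuming keywords are sampled with replacement into each batch and weighted by a hypothetical `keyword_weight`. The uniform toy model only makes the arithmetic easy to check: with two equiprobable classes every cross-entropy term equals ln 2, so the loss is 1.5 ln 2.

```python
import math
import random
from typing import Callable, Dict, List, Tuple

Model = Callable[[str], Dict[str, float]]  # text -> class probabilities


def cross_entropy(probs: Dict[str, float], label: str, eps: float = 1e-12) -> float:
    # Negative log-probability of the true class (clipped to avoid log(0)).
    return -math.log(max(probs.get(label, 0.0), eps))


def keyword_augmented_loss(model: Model, batch: List[Tuple[str, str]],
                           keywords: Dict[str, List[str]], n_keywords: int,
                           keyword_weight: float, rng: random.Random) -> float:
    """Batch loss = CE on raw reports + keyword_weight * CE on sampled keywords."""
    raw = sum(cross_entropy(model(text), y) for text, y in batch) / len(batch)
    # Each keyword (or short phrase) is treated as a tiny document of its class.
    pairs = [(kw, label) for label, kws in keywords.items() for kw in kws]
    sampled = [rng.choice(pairs) for _ in range(n_keywords)]
    kw = sum(cross_entropy(model(text), y) for text, y in sampled) / n_keywords
    return raw + keyword_weight * kw


def uniform_model(text: str) -> Dict[str, float]:
    # Untrained stand-in that assigns equal probability to two classes.
    return {"common": 0.5, "rare": 0.5}


keywords = {"rare": ["mesothelioma", "malignant mesothelioma"], "common": ["adenocarcinoma"]}
batch = [("report text a", "common"), ("report text b", "rare")]
loss = keyword_augmented_loss(uniform_model, batch, keywords, n_keywords=4,
                              keyword_weight=0.5, rng=random.Random(0))
print(round(loss, 4))  # 1.0397
```

In a real training loop both terms would be backpropagated together, so gradients from the keyword examples push the model toward rare classes even when few raw reports of those classes appear in the batch.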

Subjects

Reproducibility of Results, Data Collection, Humans

7.

Mutual information-based template matching scheme for detection of breast masses: from mammography to digital breast tomosynthesis.

Mazurowski, Maciej A; Lo, Joseph Y; Harrawood, Brian P; Tourassi, Georgia D.

J Biomed Inform ; 44(5): 815-23, 2011 Oct.

Article in English | MEDLINE | ID: mdl-21554985

ABSTRACT

Development of a computational decision aid for a new medical imaging modality is typically a long and complicated process. It consists of collecting data in the form of images and annotations, developing image processing and pattern recognition algorithms to analyze the new images, and finally testing the resulting system. Since new imaging modalities are being developed more rapidly than ever before, any effort to decrease the time and cost of this development process could maximize the benefit of a new modality to patients by making computer aids quickly available to the radiologists who interpret the images. In this paper, we take a step in this direction and investigate the possibility of translating knowledge about the detection problem from one imaging modality to another. Specifically, we present a computer-aided detection (CAD) system for mammographic masses that uses a mutual information-based template matching scheme with intelligently selected templates. We have previously presented the principles of template matching with mutual information for mammography; in this paper, we implement those principles in a complete computer-aided detection system. Through an automatic optimization process, the proposed system chooses the most useful templates (mammographic regions of interest) from a large database of previously collected and annotated mammograms. Through this process, knowledge about the task of detecting masses in mammograms is incorporated into the system. We then evaluate whether our system, developed for screen-film mammograms, can be successfully applied not only to other mammograms but also to digital breast tomosynthesis (DBT) reconstructed slices without adding any DBT cases for training.
Our rationale is that since mutual information is known to be a robust inter-modality image similarity measure, it has high potential for transferring knowledge between modalities in the context of the mass detection task. Experimental evaluation of the system on mammograms showed performance competitive with other mammography CAD systems recently published in the literature. When the system was applied "as-is" to DBT, its performance was notably worse than that for mammograms. However, with a simple additional preprocessing step, the performance of the system reached levels similar to those obtained for mammograms. In conclusion, the presented CAD system not only performed competitively on screen-film mammograms but also performed robustly on DBT, showing that direct transfer of knowledge across breast imaging modalities for mass detection is in fact possible.
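
Mutual information between two regions of interest can be estimated from a joint gray-level histogram as I(A;B) = H(A) + H(B) - H(A,B). The sketch below uses 16 equal-width bins and plain Python; the bin count and the best-match rule in `best_template` are assumptions, and the published system's estimator and normalization may differ.

```python
import math
from collections import Counter
from typing import List, Sequence

Patch = Sequence[Sequence[int]]  # 2D gray-level patch, values 0..255


def quantize(patch: Patch, bins: int) -> List[int]:
    # Flatten and map 8-bit gray levels into `bins` equal-width intensity bins.
    return [v * bins // 256 for row in patch for v in row]


def entropy(counts: Counter, n: int) -> float:
    # Shannon entropy in bits of a histogram with n total samples.
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def mutual_information(a: Patch, b: Patch, bins: int = 16) -> float:
    """I(A;B) = H(A) + H(B) - H(A,B), estimated from a joint gray-level histogram."""
    qa, qb = quantize(a, bins), quantize(b, bins)
    if len(qa) != len(qb):
        raise ValueError("patches must have the same number of pixels")
    n = len(qa)
    return (entropy(Counter(qa), n) + entropy(Counter(qb), n)
            - entropy(Counter(zip(qa, qb)), n))


def best_template(query: Patch, templates: List[Patch]) -> int:
    # Index of the stored template most similar to the query region.
    return max(range(len(templates)),
               key=lambda i: mutual_information(query, templates[i]))


roi = [[0, 64], [128, 255]]
inverted = [[255 - v for v in row] for row in roi]  # contrast-reversed copy
flat = [[10, 10], [10, 10]]
print(mutual_information(roi, inverted), mutual_information(roi, flat))  # 2.0 0.0
```

The contrast-reversed copy scores the maximal value because mutual information depends only on how predictably one image's intensities determine the other's, not on the intensities themselves; this is the property the abstract relies on when transferring templates across modalities.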

Subjects

Breast/pathology, Mammography/methods, Radiographic Image Interpretation, Computer-Assisted/methods, Algorithms, Breast Neoplasms/diagnosis, Breast Neoplasms/diagnostic imaging, Diagnosis, Computer-Assisted/methods, Female, Humans, Pattern Recognition, Automated

8.

Exploring the potential of context-sensitive CADe in screening mammography.

Tourassi, Georgia D; Mazurowski, Maciej A; Harrawood, Brian P; Krupinski, Elizabeth A.

Med Phys ; 37(11): 5728-36, 2010 Nov.

Article in English | MEDLINE | ID: mdl-21158284

ABSTRACT

PURPOSE: Conventional computer-assisted detection (CADe) systems in screening mammography provide the same decision support to all users. The aim of this study was to investigate the potential of a context-sensitive CADe system that provides decision support guided by each user's focus of attention during visual search and by their reporting patterns for a specific case. METHODS: An observer study for the detection of malignant masses in screening mammograms was conducted in which six radiologists evaluated 20 mammograms while wearing an eye-tracking device. Eye-position data and diagnostic decisions were collected for each radiologist and each case they reviewed. These cases were subsequently analyzed with an in-house knowledge-based CADe system operating in two different modes: a conventional mode with a globally fixed decision threshold and a context-sensitive mode with a location-variable decision threshold based on the radiologists' eye-dwelling data and reporting information. RESULTS: The CADe system operating in conventional mode had 85.7% per-image malignant mass sensitivity at 3.15 false positives per image (FPsI). The same system operating in context-sensitive mode provided personalized decision support at 85.7%-100% sensitivity and 0.35-0.40 FPsI to all six radiologists. Furthermore, the context-sensitive CADe system could improve the radiologists' sensitivity and reduce their performance gap more effectively than conventional CADe. CONCLUSIONS: Context-sensitive CADe support shows promise in delineating and reducing the radiologists' perceptual and cognitive errors in the diagnostic interpretation of screening mammograms more effectively than conventional CADe.

Subjects

Breast Neoplasms/diagnosis, Mammography/methods, Cognition, Decision Making, Decision Support Techniques, Diagnosis, Computer-Assisted, Diagnostic Errors/prevention & control, False Positive Reactions, Female, Humans, Observer Variation, Perception, Radiology/methods, Reproducibility of Results, Sensitivity and Specificity

9.

Individualized computer-aided education in mammography based on user modeling: concept and preliminary experiments.

Mazurowski, Maciej A; Baker, Jay A; Barnhart, Huiman X; Tourassi, Georgia D.

Med Phys ; 37(3): 1152-60, 2010 Mar.

Article in English | MEDLINE | ID: mdl-20384251

ABSTRACT

PURPOSE: The authors propose a framework for an individualized adaptive computer-aided educational system in mammography that is based on user modeling. The underlying hypothesis is that user models can be developed to capture the individual error-making patterns of radiologists-in-training. In this pilot study, the authors test this hypothesis for the task of breast cancer diagnosis in mammograms. METHODS: The concept of a user model was formalized as the function that relates image features to the likelihood/extent of the diagnostic error made by a radiologist-in-training, and therefore to the level of difficulty that a case will pose to the radiologist-in-training (or "user"). Machine learning algorithms were then implemented to build such user models. Specifically, the authors explored k-nearest neighbor, artificial neural networks, and multiple regression for building the model, using observer data collected from ten radiology residents at Duke University Medical Center for the problem of breast mass diagnosis in mammograms. For each resident, a user-specific model was constructed that predicts the user's expected level of difficulty for each presented case based on two BI-RADS image features. In the experiments, a leave-one-out data-handling scheme was applied to assign each case to a low-predicted-difficulty or a high-predicted-difficulty group for each resident based on each of the three user models. To evaluate whether the user model is useful in predicting difficulty, the authors performed statistical tests using the generalized estimating equations approach to determine whether the mean actual error differs between the low-predicted-difficulty group and the high-predicted-difficulty group.
RESULTS: When the results for all observers were pooled, the actual errors made by residents were statistically significantly higher for cases in the high-predicted-difficulty group than for cases in the low-predicted-difficulty group for all modeling algorithms (p ≤ 0.002 for all methods). This indicates that the user models were able to accurately predict the difficulty level of the analyzed cases. Furthermore, the authors determined that, of the two BI-RADS features used in this study, mass margin was the more useful in predicting individual user errors. CONCLUSIONS: The pilot study shows promise for developing individual user models that can accurately predict the level of difficulty that each case will pose to the radiologist-in-training. These models could allow for constructing adaptive computer-aided educational systems in mammography.
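
The leave-one-out grouping step can be sketched with a k-nearest-neighbor user model, assuming the two BI-RADS features are coded as numbers and that Euclidean distance between those codes is meaningful. The case data and `k=2` are hypothetical, and the generalized estimating equations test used in the study is not reproduced here.

```python
from statistics import median
from typing import List, Tuple

Case = Tuple[Tuple[float, float], float]  # ((feature 1, feature 2), observed error)


def loo_knn_difficulty(cases: List[Case], k: int) -> List[float]:
    """Predict each case's difficulty from its k nearest *other* cases (leave-one-out)."""
    preds = []
    for i, (x, _) in enumerate(cases):
        others = [c for j, c in enumerate(cases) if j != i]
        others.sort(key=lambda c: (c[0][0] - x[0]) ** 2 + (c[0][1] - x[1]) ** 2)
        preds.append(sum(err for _, err in others[:k]) / k)
    return preds


def split_by_predicted_difficulty(cases: List[Case], k: int):
    # Cases at or below the median prediction form the low-difficulty group.
    preds = loo_knn_difficulty(cases, k)
    cut = median(preds)
    low = [c for c, p in zip(cases, preds) if p <= cut]
    high = [c for c, p in zip(cases, preds) if p > cut]
    return low, high


def mean_error(group: List[Case]) -> float:
    return sum(err for _, err in group) / len(group)


# Hypothetical resident: errs rarely on feature pattern A, often on pattern B.
cases = [((1, 1), 0.1), ((1, 2), 0.1), ((2, 1), 0.1),
         ((5, 5), 0.9), ((5, 6), 0.9), ((6, 5), 0.9)]
low, high = split_by_predicted_difficulty(cases, k=2)
print(mean_error(low), mean_error(high))
```

The study then tested whether the mean actual error differs between the two groups; here the toy data make the high-predicted-difficulty group clearly worse.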

Subjects

Computer-Assisted Instruction/methods, Educational Measurement/methods, Mammography, Professional Competence, Radiology/education, User-Computer Interface, North Carolina, Pilot Projects

10.

An adaptive incremental approach to constructing ensemble classifiers: application in an information-theoretic computer-aided decision system for detection of masses in mammograms.

Mazurowski, Maciej A; Zurada, Jacek M; Tourassi, Georgia D.

Med Phys ; 36(7): 2976-84, 2009 Jul.

Article in English | MEDLINE | ID: mdl-19673196

ABSTRACT

Ensemble classifiers have been shown to be efficient in multiple applications. In this article, the authors explore the effectiveness of ensemble classifiers in a case-based computer-aided diagnosis system for detection of masses in mammograms. They evaluate two general ways of constructing subclassifiers by resampling the available development dataset: random division and random selection. Furthermore, they discuss the problem of selecting the ensemble size and propose two adaptive incremental techniques that automatically select the size for the problem at hand. All the techniques are evaluated with respect to a previously proposed information-theoretic CAD system (IT-CAD). The experimental results show that the examined ensemble techniques provide a statistically significant improvement in performance (AUC = 0.905 ± 0.024) compared with the original IT-CAD system (AUC = 0.865 ± 0.029). Some of the techniques allow for a notable reduction in the total number of examples stored in the case base (to 1.3% of the original size), which in turn results in lower storage requirements and a shorter system response time. Among the methods examined in this article, the two proposed adaptive techniques are by far the most effective for this purpose. Furthermore, the authors provide some discussion and guidance for choosing the ensemble parameters.

Subjects

Breast Neoplasms/diagnostic imaging, Diagnosis, Computer-Assisted, Image Interpretation, Computer-Assisted/methods, Mammography/methods, Algorithms, Artificial Intelligence, Automation, Databases, Factual, Female, Humans, Reproducibility of Results

11.

Methodology for generating a 3D computerized breast phantom from empirical data.

Li, Christina M; Segars, W Paul; Tourassi, Georgia D; Boone, John M; Dobbins, James T.

Med Phys ; 36(7): 3122-31, 2009 Jul.

Article in English | MEDLINE | ID: mdl-19673211

ABSTRACT

The initial process for creating a flexible three-dimensional computer-generated breast phantom based on empirical data is described. Dedicated breast computed tomography data were processed to suppress noise and scatter artifacts in the reconstructed image set. An automated algorithm was developed to classify the breast into its primary components. A preliminary phantom defined using subdivision surfaces was generated from the segmented data. To demonstrate potential applications of the phantom, simulated mammographic image data of the phantom were acquired using a simplistic compression model and an analytic projection algorithm applied directly to the surface model. The simulated image was generated using a model for a polyenergetic cone-beam projection of the compressed phantom. The methods used to create the breast phantom yield images that have a high level of tissue structure detail and appear similar to actual mammograms. Fractal dimension measurements of simulated images of the phantom are comparable to measurements from images of real human subjects. A realistic and geometrically defined breast phantom that can accurately simulate imaging data may have many applications in breast imaging research.

Subjects

Breast/anatomy & histology, Imaging, Three-Dimensional/methods, Models, Anatomic, Phantoms, Imaging, Algorithms, Computer Simulation, Female, Humans, Male, Mammography, Tomography, X-Ray Computed

12.

Automated breast mass detection in 3D reconstructed tomosynthesis volumes: a featureless approach.

Singh, Swatee; Tourassi, Georgia D; Baker, Jay A; Samei, Ehsan; Lo, Joseph Y.

Med Phys ; 35(8): 3626-36, 2008 Aug.

Article in English | MEDLINE | ID: mdl-18777923

ABSTRACT

The purpose of this study was to propose and implement a computer-aided detection (CADe) tool for breast tomosynthesis. This task was accomplished in two stages: a highly sensitive mass detector followed by a false-positive (FP) reduction stage. Breast tomosynthesis data from 100 human subject cases were used, of which 25 subjects had one or more mass lesions and the rest were normal. For stage 1, filter parameters were optimized via a grid search. The suspicious locations identified by the CADe were reconstructed to yield 3D CADe volumes of interest. The first stage yielded a maximum sensitivity of 93% with 7.7 FPs per breast volume. Unlike traditional CADe algorithms, in which second-stage FP reduction is performed via feature extraction and analysis, this study used information theory principles, with mutual information as a similarity metric. Three schemes were proposed, all using leave-one-case-out cross-validation sampling. The three schemes, A, B, and C, differed in the composition of their knowledge base of regions of interest (ROIs). Scheme A's knowledge base consisted of all the mass and FP ROIs generated by the first stage of the algorithm. Scheme B's knowledge base contained information from mass ROIs and randomly extracted normal ROIs. Scheme C used three sources of information: masses, FPs, and normal ROIs. Performance was also assessed as a function of the composition of the knowledge base, in terms of the number of FP or normal ROIs needed by the system to reach optimal performance. The results indicated that the knowledge base needed no more than 20 times as many FPs and 30 times as many normal ROIs as masses to attain maximal performance. The best overall system performance was 85% sensitivity with 2.4 FPs per breast volume for scheme A, 3.6 FPs per breast volume for scheme B, and 3 FPs per breast volume for scheme C.

Subjects

Breast/pathology, Mammography/methods, Pattern Recognition, Automated/methods, Radiographic Image Interpretation, Computer-Assisted/methods, Algorithms, Breast Neoplasms/diagnostic imaging, Breast Neoplasms/pathology, False Positive Reactions, Female, Humans, Sensitivity and Specificity

13.

Selection of examples in case-based computer-aided decision systems.

Mazurowski, Maciej A; Zurada, Jacek M; Tourassi, Georgia D.

Phys Med Biol ; 53(21): 6079-96, 2008 Nov 07.

Article in English | MEDLINE | ID: mdl-18854606

ABSTRACT

Case-based computer-aided decision (CB-CAD) systems rely on a database of previously stored, known examples when classifying new, incoming queries. Such systems can be particularly useful since they do not need retraining every time a new example is deposited in the case base. The adaptive nature of case-based systems is well suited to the current trend of continuously expanding digital databases in the medical domain. To maintain efficiency, however, such systems need sophisticated strategies to effectively manage the available evidence database. In this paper, we discuss the general problem of building an evidence database by selecting the most useful examples to store while satisfying existing storage requirements. We evaluate three intelligent techniques for this purpose: genetic algorithm-based selection, greedy selection and random mutation hill climbing. These techniques are compared to a random selection strategy used as the baseline. The study is performed with a previously presented CB-CAD system applied for false positive reduction in screening mammograms. The experimental evaluation shows that when the development goal is to maximize the system's diagnostic performance, the intelligent techniques are able to reduce the size of the evidence database to 37% of the original database by eliminating superfluous and/or detrimental examples while at the same time significantly improving the CAD system's performance. Furthermore, if the case-base size is a main concern, the total number of examples stored in the system can be reduced to only 2-4% of the original database without a decrease in the diagnostic performance. Comparison of the techniques shows that random mutation hill climbing provides the best balance between the diagnostic performance and computational efficiency when building the evidence database of the CB-CAD system.
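
Random mutation hill climbing over subsets of the case base can be sketched as below. The fitness function is a placeholder assumption: in the study it would reflect the CAD system's diagnostic performance (and possibly the storage budget), whereas here each case simply adds or subtracts a fixed, hypothetical value.

```python
import random
from typing import Callable, List

Mask = List[bool]  # mask[i] is True if case i is kept in the case base


def rmhc_select(n_cases: int, fitness: Callable[[Mask], float],
                iters: int = 1000, seed: int = 0) -> Mask:
    """Random mutation hill climbing over subsets of the case base."""
    rng = random.Random(seed)
    mask = [rng.random() < 0.5 for _ in range(n_cases)]  # random initial subset
    best = fitness(mask)
    for _ in range(iters):
        i = rng.randrange(n_cases)
        mask[i] = not mask[i]       # mutate: toggle one case in or out
        score = fitness(mask)
        if score >= best:
            best = score            # keep the mutation (ties allowed)
        else:
            mask[i] = not mask[i]   # revert a worsening mutation
    return mask


# Hypothetical fitness: cases 0-2 help the CAD system, the rest hurt it.
value = [1, 1, 1] + [-1] * 7


def toy_fitness(mask: Mask) -> float:
    return sum(v for v, keep in zip(value, mask) if keep)


selected = rmhc_select(len(value), toy_fitness)
print([i for i, keep in enumerate(selected) if keep])  # [0, 1, 2]
```

Each iteration costs one fitness evaluation, which is part of why a simple single-bit mutation scheme can balance search quality against computation when the fitness is an expensive CAD evaluation.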

Subjects

Decision Making, Computer-Assisted, Computer Storage Devices, Reproducibility of Results

14.

Decision optimization of case-based computer-aided decision systems using genetic algorithms with application to mammography.

Mazurowski, Maciej A; Habas, Piotr A; Zurada, Jacek M; Tourassi, Georgia D.

Phys Med Biol ; 53(4): 895-908, 2008 Feb 21.

Article in English | MEDLINE | ID: mdl-18263947

ABSTRACT

This paper presents an optimization framework for improving case-based computer-aided decision (CB-CAD) systems. The underlying hypothesis of the study is that each example in the knowledge database of a medical decision support system has a different importance in the decision-making process. A new decision algorithm incorporating an importance weight for each example is proposed to account for these differences. The search for the best set of importance weights is defined as an optimization problem, and a genetic algorithm is employed to solve it. The optimization process is tailored to maximize the system's performance according to clinically relevant evaluation criteria. The study was performed using a CAD system developed for classifying regions of interest (ROIs) in mammograms as depicting masses or normal tissue. The system was constructed and evaluated using a dataset of ROIs extracted from the Digital Database for Screening Mammography (DDSM). Experimental results show that, according to receiver operating characteristic (ROC) analysis, the proposed method significantly improves the overall performance of the CAD system as well as its average specificity at high breast mass detection rates.

Subjects

Algorithms , Decision Support Systems, Clinical , Diagnosis, Computer-Assisted/methods , Mammography/methods , Case-Control Studies , Databases, Factual , ROC Curve

15.

Neutron-stimulated emission computed tomography of a multi-element phantom.

Floyd, Carey E; Kapadia, Anuj J; Bender, Janelle E; Sharma, Amy C; Xia, Jessie Q; Harrawood, Brian P; Tourassi, Georgia D; Lo, Joseph Y; Crowell, Alexander S; Kiser, Mathew R; Howell, Calvin R.

Phys Med Biol ; 53(9): 2313-26, 2008 May 07.

Article in English | MEDLINE | ID: mdl-18421119

ABSTRACT

This paper describes the implementation of neutron-stimulated emission computed tomography (NSECT) for non-invasive imaging and reconstruction of a multi-element phantom. The experimental apparatus and process for acquisition of multi-spectral projection data are described along with the reconstruction algorithm and images of the two elements in the phantom. Independent tomographic reconstruction of each element of the multi-element phantom was performed successfully. This reconstruction result is the first of its kind and provides encouraging proof of concept for proposed subsequent spectroscopic tomography of biological samples using NSECT.

Subjects

Neutrons , Tomography, Emission-Computed/instrumentation , Tomography, Emission-Computed/methods , Algorithms , Diagnostic Imaging/methods , Equipment Design , Gamma Rays , Humans , Image Interpretation, Computer-Assisted/methods , Image Processing, Computer-Assisted/methods , Models, Statistical , Neoplasms/diagnosis , Phantoms, Imaging , Scattering, Radiation , Spectrophotometry/methods

16.

Evaluating the effect of image preprocessing on an information-theoretic CAD system in mammography.

Tourassi, Georgia D; Ike, Robert; Singh, Swatee; Harrawood, Brian.

Acad Radiol ; 15(5): 626-34, 2008 May.

Article in English | MEDLINE | ID: mdl-18423320

ABSTRACT

RATIONALE AND OBJECTIVES: In our earlier studies, we reported an evidence-based computer-assisted decision (CAD) system for location-specific interrogation of mammograms. A content-based image retrieval framework with information-theoretic (IT) similarity measures serves as the foundation for this system. Specifically, the normalized mutual information (NMI) was shown to be the most effective similarity measure for reducing the false-positive marks generated by other prescreening mass detection schemes. The objective of this work was to investigate the importance of image filtering as a possible preprocessing step in our IT-CAD system. MATERIALS AND METHODS: Different filters were applied, each one aiming to compensate for known limitations of the NMI similarity measure. The study was based on a region-of-interest database that included true masses and false-positive regions from digitized mammograms. RESULTS: Receiver operating characteristic (ROC) analysis showed that IT-CAD is only slightly affected by image filtering. A modest, yet statistically significant, performance gain was observed with median filtering (overall ROC area index A(z) improved from 0.78 to 0.82). Gabor filtering, in contrast, improved performance in the high-sensitivity portion of the ROC curve, where a typical false-positive reduction scheme should operate (partial ROC area index (0.90)A(z) improved from 0.33 to 0.37). Fusion of IT-CAD decisions from different filtering schemes markedly improved performance (A(z) = 0.90 and (0.90)A(z) = 0.55). At 95% sensitivity, the system's specificity improved by 36.6%. CONCLUSIONS: Additional improvement in false-positive reduction can be achieved by incorporating image filtering as a preprocessing step in our IT-CAD system.
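NMI-based retrieval with median-filter preprocessing can be sketched in a few lines of NumPy. The sketch normalizes mutual information as 2I(X;Y)/(H(X)+H(Y)), which is one common NMI definition. The IT-CAD system's exact normalization, histogram binning, and filter settings are assumptions here. The helper names are hypothetical.

```python
import numpy as np


def median3x3(img):
    """3x3 median filter with edge replication (assumed kernel size)."""
    padded = np.pad(img, 1, mode="edge")
    windows = np.lib.stride_tricks.sliding_window_view(padded, (3, 3))
    return np.median(windows, axis=(-2, -1))


def _entropy(p):
    """Shannon entropy in bits of a probability array (zeros ignored)."""
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))


def nmi(a, b, bins=32):
    """Normalized mutual information 2*I(A;B) / (H(A) + H(B)) from a joint histogram."""
    joint, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    pxy = joint / joint.sum()
    h_x = _entropy(pxy.sum(axis=1))
    h_y = _entropy(pxy.sum(axis=0))
    h_xy = _entropy(pxy.ravel())
    mi = h_x + h_y - h_xy
    return 2.0 * mi / (h_x + h_y) if h_x + h_y > 0 else 0.0


def rank_cases(query, case_rois, bins=32, preprocess=median3x3):
    """Rank stored ROIs by NMI similarity to the query, most similar first."""
    q = preprocess(query)
    scores = [nmi(q, preprocess(c), bins) for c in case_rois]
    order = sorted(range(len(case_rois)), key=lambda i: scores[i], reverse=True)
    return order, scores


# Synthetic ROIs: a slightly perturbed copy of the query and an unrelated image.
rng = np.random.default_rng(0)
query = rng.random((64, 64))
cases = [query + 0.05 * rng.random((64, 64)), rng.random((64, 64))]
order, scores = rank_cases(query, cases)
print(order, scores)
```

Passing a different `preprocess` callable would let the same retrieval loop run once per filtering scheme. The resulting decisions could then be fused, in the spirit of the fusion result reported above.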

Subjects

Breast Neoplasms/diagnostic imaging , Information Storage and Retrieval/methods , Mammography , Radiographic Image Interpretation, Computer-Assisted/methods , Algorithms , Artificial Intelligence , Humans , Information Theory , Pattern Recognition, Automated/methods , ROC Curve , Radiographic Image Enhancement , Reproducibility of Results , Sensitivity and Specificity , Subtraction Technique

17.

Training neural network classifiers for medical decision making: the effects of imbalanced datasets on classification performance.

Mazurowski, Maciej A; Habas, Piotr A; Zurada, Jacek M; Lo, Joseph Y; Baker, Jay A; Tourassi, Georgia D.

Neural Netw ; 21(2-3): 427-36, 2008.

Article in English | MEDLINE | ID: mdl-18272329

ABSTRACT

This study investigates the effect of class imbalance in training data when developing neural network classifiers for computer-aided medical diagnosis. The investigation is performed in the presence of other characteristics that are typical of medical data, namely small training sample size, a large number of features, and correlations between features. Two methods of neural network training are explored: classical backpropagation (BP) and particle swarm optimization (PSO) with clinically relevant training criteria. An experimental study is performed using simulated data, and the conclusions are further validated on real clinical data for breast cancer diagnosis. The results show that classifier performance deteriorates with even modest class imbalance in the training data. Further, BP is shown to be generally preferable to PSO for imbalanced training data, especially with small data samples and large numbers of features. Finally, there is no clear preference between oversampling and applying no compensation, and some guidance is provided for choosing between them.
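The simplest form of the oversampling compensation mentioned above duplicates randomly chosen minority-class examples until the classes are balanced. The abstract does not say which oversampling scheme was used, so this random-duplication variant is an assumption. `random_oversample` is a hypothetical helper.

```python
import numpy as np


def random_oversample(X, y, seed=0):
    """Duplicate random minority-class samples until every class matches the majority count."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    n_max = counts.max()
    idx = [np.arange(len(y))]  # keep every original sample once
    for c, n in zip(classes, counts):
        if n < n_max:
            members = np.flatnonzero(y == c)
            idx.append(rng.choice(members, size=n_max - n, replace=True))
    idx = np.concatenate(idx)
    rng.shuffle(idx)  # mix duplicates in with the originals
    return X[idx], y[idx]


# Simulated imbalanced training set: 90 negatives, 10 positives, 5 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = np.array([0] * 90 + [1] * 10)
Xb, yb = random_oversample(X, y)
print(np.bincount(yb))
```

Oversampling should be applied to the training set only. Duplicating examples before a train/test split would leak copies of test cases into training.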

Subjects

Artificial Intelligence , Decision Making , Feedback , Neural Networks, Computer , Algorithms , Breast Neoplasms/classification , Breast Neoplasms/diagnosis , Computer Simulation , Diagnosis, Computer-Assisted/methods , Electronic Data Processing , Humans , ROC Curve

18.

Near-Field High-Energy Spectroscopic Gamma Imaging Using a Rotation Modulation Collimator.

Sharma, Amy C; Turkington, Timothy G; Tourassi, Georgia D; Floyd, Carey E.

Nucl Instrum Methods Phys Res B ; 266(22): 4938-47, 2008 Nov.

Article in English | MEDLINE | ID: mdl-26523076

ABSTRACT

Certain trace elements are vital to the body, and elemental imbalances can be indicators of certain diseases, including cancer and liver diseases. Neutron Stimulated Emission Computed Tomography (NSECT) is being developed as a spectroscopic imaging technique to non-invasively and non-destructively measure and image elemental concentrations within the body. A region of interest is illuminated by a high-energy beam of neutrons that scatter inelastically with elemental nuclei within the body. The excited nuclei then relax by emitting characteristic gamma rays. Acquiring the gamma spectrum in a tomographic manner allows not only the identification of elements but also the formation of images representing the spatial distributions of specific elements. We are developing a high-energy position-sensitive gamma camera that allows full illumination of the entire region of interest. Because current scintillation-crystal-based position-sensitive gamma cameras operate in too low an energy range, we are adapting high-energy gamma imaging techniques used in space-based imaging. A High Purity Germanium (HPGe) detector provides high-resolution energy spectra, while a rotating modulation collimator (RMC) placed in front of the detector modulates the incoming signal to provide spatial information. The purpose of this manuscript is to describe the near-field RMC geometry, which differs greatly from that of infinite-focus space-based applications, and how it modulates the incident gamma flux. A simple geometric model is presented and then used to reconstruct two-dimensional planar images of both simulated point sources and extended sources.
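For contrast with the near-field geometry the paper models, the idealized far-field (parallel-ray) RMC used in space-based imaging has a simple triangular modulation. This sketch is that textbook far-field case, not the paper's near-field model. It assumes two identical grids whose slits are half the pitch wide. Under that assumption, transmission falls linearly from 0.5, when the slits line up, to 0, when the front grid's shadow is shifted half a pitch. The function and parameter names are hypothetical.

```python
import math


def rmc_transmission(theta, phi_src, alpha, pitch, separation):
    """Idealized far-field transmission of a two-grid rotating modulation collimator.

    theta: source off-axis angle (rad); phi_src: source azimuth (rad);
    alpha: current collimator rotation angle (rad);
    pitch: slit pitch of both grids; separation: distance between the grids.
    """
    # Shift of the front grid's shadow on the rear grid, measured across the slits.
    shift = separation * math.tan(theta) * math.cos(phi_src - alpha)
    # Fold the shift into one grid period centered on zero: [-pitch/2, pitch/2).
    folded = (shift + pitch / 2) % pitch - pitch / 2
    # Open-slit overlap fraction: 0.5 when aligned, 0 when offset by half a pitch.
    return (pitch / 2 - abs(folded)) / pitch


# Modulation profile of a point source 2 degrees off axis over one full rotation.
profile = [
    rmc_transmission(math.radians(2.0), 0.0, math.radians(a), pitch=1.0, separation=100.0)
    for a in range(0, 360, 5)
]
print(min(profile), max(profile))
```

An on-axis source produces no modulation, which is why RMC images encode source position in the shape of the rotation profile. The paper's point is that this parallel-ray picture breaks down at near-field distances.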

19.

Deep Learning for Automated Extraction of Primary Sites From Cancer Pathology Reports.

Qiu, John X; Yoon, Hong-Jun; Fearn, Paul A; Tourassi, Georgia D.

IEEE J Biomed Health Inform ; 22(1): 244-251, 2018 01.

Article in English | MEDLINE | ID: mdl-28475069

ABSTRACT

Pathology reports are a primary source of information for cancer registries, which process high volumes of free-text reports annually. Information extraction and coding is a manual, labor-intensive process. In this study, we investigated deep learning, specifically a convolutional neural network (CNN), for extracting ICD-O-3 topographic codes from a corpus of breast and lung cancer pathology reports. We performed two experiments, using a CNN and a more conventional term frequency vector approach, to assess the effects of class prevalence and inter-class transfer learning. The experiments were based on a set of 942 pathology reports with human expert annotations as the gold standard. CNN performance was compared against a more conventional term frequency vector space approach. We observed that the deep learning models consistently outperformed the conventional approaches in the class prevalence experiment, with micro- and macro-F score increases of up to 0.132 and 0.226, respectively, when class labels were well populated. Specifically, the best-performing CNN achieved a micro-F score of 0.722 over 12 ICD-O-3 topography codes. Transfer learning provided a consistent but modest performance boost for the deep learning methods, but the trends were contingent on the CNN method and cancer site. These encouraging results demonstrate the potential of deep learning for automated abstraction of pathology reports.
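The micro- and macro-F scores reported above weight classes differently. Micro-F pools true positives, false positives, and false negatives across all codes, so for single-label classification it equals accuracy. Macro-F averages per-code F scores, so a rare code counts as much as a common one. This helps explain why macro-F lags when class labels are sparsely populated. This sketch uses two illustrative ICD-O-3 topography codes and made-up predictions. `micro_macro_f1` is a hypothetical helper.

```python
from collections import Counter
from typing import Sequence, Tuple


def micro_macro_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> Tuple[float, float]:
    """Micro- and macro-averaged F1 for single-label multiclass predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted code p was wrong
            fn[t] += 1  # true code t was missed

    def f1(tp_: int, fp_: int, fn_: int) -> float:
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    return micro, macro


# Three breast reports (C50.9) and one lung report (C34.1); the model always says C50.9.
y_true = ["C50.9", "C50.9", "C50.9", "C34.1"]
y_pred = ["C50.9", "C50.9", "C50.9", "C50.9"]
micro, macro = micro_macro_f1(y_true, y_pred)
print(micro, macro)  # micro = 0.75, macro = 3/7 (about 0.429)
```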

Subjects

Artificial Intelligence , Diagnosis, Computer-Assisted/methods , Electronic Health Records , Neoplasms , Humans , Neoplasms/classification , Neoplasms/diagnosis , Neoplasms/pathology , Support Vector Machine

20.

Hierarchical attention networks for information extraction from cancer pathology reports.

Gao, Shang; Young, Michael T; Qiu, John X; Yoon, Hong-Jun; Christian, James B; Fearn, Paul A; Tourassi, Georgia D; Ramanathan, Arvind.

J Am Med Inform Assoc ; 25(3): 321-330, 2018 Mar 01.

Article in English | MEDLINE | ID: mdl-29155996

ABSTRACT

OBJECTIVE: We explored how a deep learning (DL) approach based on hierarchical attention networks (HANs) can improve model performance for multiple information extraction tasks from unstructured cancer pathology reports, compared to conventional methods that do not sufficiently capture syntactic and semantic contexts from free-text documents. MATERIALS AND METHODS: Data for our analyses were obtained from 942 deidentified pathology reports collected by the National Cancer Institute Surveillance, Epidemiology, and End Results program. The HAN was implemented for 2 information extraction tasks: (1) primary site, matched to 12 International Classification of Diseases for Oncology topography codes (7 breast, 5 lung primary sites), and (2) histological grade classification, matched to G1-G4. Model performance metrics were compared to conventional machine learning (ML) approaches, including naive Bayes, logistic regression, support vector machine, random forest, and extreme gradient boosting, and to other DL models, including a recurrent neural network (RNN), a recurrent neural network with attention (RNN w/A), and a convolutional neural network. RESULTS: Our results demonstrate that for both information extraction tasks, the HAN performed significantly better than the conventional ML and DL techniques. In particular, across the 2 tasks, the mean micro and macro F-scores for the HAN with pretraining were (0.852, 0.708), compared to naive Bayes (0.518, 0.213), logistic regression (0.682, 0.453), support vector machine (0.634, 0.434), random forest (0.698, 0.508), extreme gradient boosting (0.696, 0.522), RNN (0.505, 0.301), RNN w/A (0.637, 0.471), and convolutional neural network (0.714, 0.460). CONCLUSIONS: HAN-based DL models show promise in information abstraction tasks within unstructured clinical pathology reports.
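The hierarchical part of a HAN is two stacked attention-pooling steps. Word vectors are pooled into one vector per sentence, and sentence vectors are pooled into one document vector, which feeds a classifier such as the 12-way primary-site head. This is a minimal NumPy forward pass under simplifying assumptions. The bidirectional recurrent encoders that a full HAN applies before each attention layer are omitted. The weights are random and untrained. All names are hypothetical.

```python
import numpy as np


def attention_pool(h, W, b, u):
    """Additive attention pooling over rows of h (n_items, d); returns (d,) vector and weights."""
    scores = np.tanh(h @ W + b) @ u  # one relevance score per item
    scores = scores - scores.max()  # numerical stability before exponentiating
    alpha = np.exp(scores) / np.exp(scores).sum()  # softmax attention weights
    return alpha @ h, alpha


def han_document_vector(doc, params):
    """doc: list of sentences, each an (n_words, d) array of word vectors."""
    sent_vecs = np.stack([attention_pool(s, *params["word"])[0] for s in doc])
    doc_vec, sent_alpha = attention_pool(sent_vecs, *params["sent"])
    return doc_vec, sent_alpha


rng = np.random.default_rng(0)
d, n_classes = 8, 12  # 12 output classes, mirroring the primary-site task
params = {
    "word": (rng.normal(size=(d, d)), np.zeros(d), rng.normal(size=d)),
    "sent": (rng.normal(size=(d, d)), np.zeros(d), rng.normal(size=d)),
}
W_out = rng.normal(size=(d, n_classes))  # untrained linear classifier head
doc = [rng.normal(size=(n, d)) for n in (5, 9, 3)]  # a toy report with 3 sentences
doc_vec, sent_alpha = han_document_vector(doc, params)
pred = int(np.argmax(doc_vec @ W_out))
print(sent_alpha, pred)
```

The sentence-level weights `sent_alpha` show which sentences the model relied on. This interpretability is one practical appeal of attention for long pathology reports.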
