Results 1 - 20 of 44
1.
J Biomed Inform ; 149: 104576, 2024 01.
Article in English | MEDLINE | ID: mdl-38101690

ABSTRACT

INTRODUCTION: Machine learning algorithms are expected to work side-by-side with humans in decision-making pipelines. Thus, the ability of classifiers to make reliable decisions is of paramount importance. Deep neural networks (DNNs) represent the state-of-the-art models to address real-world classification. Although the strength of activation in DNNs is often correlated with the network's confidence, in-depth analyses are needed to establish whether they are well calibrated. METHOD: In this paper, we demonstrate the use of DNN-based classification tools to benefit cancer registries by automating information extraction of disease at diagnosis and at surgery from electronic text pathology reports from the US National Cancer Institute (NCI) Surveillance, Epidemiology, and End Results (SEER) population-based cancer registries. In particular, we introduce multiple methods for selective classification to achieve a target level of accuracy on multiple classification tasks while minimizing the rejection amount, that is, the number of electronic pathology reports for which the model's predictions are unreliable. We evaluate the proposed methods by comparing our approach with the current in-house deep learning-based abstaining classifier. RESULTS: Overall, all the proposed selective classification methods effectively allow for achieving the targeted level of accuracy or higher in a trade-off analysis aimed to minimize the rejection rate. On in-distribution validation and holdout test data, with all the proposed methods, we achieve on all tasks the required target level of accuracy with a lower rejection rate than the deep abstaining classifier (DAC). Interpreting the results for the out-of-distribution test data is more complex; nevertheless, in this case as well, the rejection rate from the best among the proposed methods achieving 97% accuracy or higher is lower than the rejection rate based on the DAC. CONCLUSIONS: We show that although both approaches can flag those samples that should be manually reviewed and labeled by human annotators, the newly proposed methods retain a larger fraction and do so without retraining, thus offering a reduced computational cost compared with the in-house deep learning-based abstaining classifier.
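
The accuracy versus rejection trade-off described in this abstract can be illustrated with a minimal sketch. It assumes the confidence score is something like the maximum softmax probability and that a single global threshold is tuned on validation data; the abstract does not say which selective classification methods were used, so `pick_threshold` and its inputs are hypothetical.

```python
def pick_threshold(confidences, correct, target_accuracy):
    """Find the lowest confidence threshold whose retained predictions
    still meet the target accuracy (illustrative, not the paper's method).

    confidences: one confidence score per prediction (assumed input).
    correct: 1 if the prediction matched the label, else 0.
    Returns (threshold, accuracy, rejection_rate), or None if no threshold works.
    """
    # Visit predictions from most to least confident.
    order = sorted(range(len(confidences)), key=lambda i: confidences[i], reverse=True)
    best = None
    n_correct = 0
    for kept, i in enumerate(order, start=1):
        n_correct += correct[i]
        # Tied confidences must be kept or rejected together, so only
        # evaluate at boundaries between distinct confidence values.
        at_boundary = kept == len(order) or confidences[order[kept]] < confidences[i]
        if at_boundary and n_correct / kept >= target_accuracy:
            # A later (lower) threshold that still qualifies rejects fewer reports.
            best = (confidences[i], n_correct / kept, 1 - kept / len(order))
    return best
```

Predictions whose confidence falls below the returned threshold would be routed to manual review. Because the threshold is chosen after training, a scheme of this kind needs no retraining, which is the cost advantage the abstract reports over the abstaining classifier.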


Subject(s)
Deep Learning; Humans; Uncertainty; Neural Networks, Computer; Algorithms; Machine Learning
2.
Cancer Biomark ; 33(2): 185-198, 2022.
Article in English | MEDLINE | ID: mdl-35213361

ABSTRACT

BACKGROUND: With the use of artificial intelligence and machine learning techniques for biomedical informatics, security and privacy concerns over the data and subject identities have also become an important issue and essential research topic. Without intentional safeguards, machine learning models may find patterns and features to improve task performance that are associated with private personal information. OBJECTIVE: The privacy vulnerability of deep learning models for information extraction from medical textual content needs to be quantified since the models are exposed to private health information and personally identifiable information. The objective of the study is to quantify the privacy vulnerability of the deep learning models for natural language processing and explore a proper way of securing patients' information to mitigate confidentiality breaches. METHODS: The target model is the multitask convolutional neural network for information extraction from cancer pathology reports, where the data for training the model are from multiple state population-based cancer registries. This study proposes the following schemes to collect vocabularies from the cancer pathology reports: (a) words appearing in multiple registries, and (b) words that have higher mutual information. We performed membership inference attacks on the models in high-performance computing environments. RESULTS: The comparison outcomes suggest that the proposed vocabulary selection methods resulted in lower privacy vulnerability while maintaining the same level of clinical task performance.
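
Vocabulary scheme (a) above, keeping only words that appear in multiple registries, might look roughly like the sketch below. Whitespace tokenization, lowercasing, the `min_registries` parameter, and the function name are assumptions made for illustration; scheme (b), based on mutual information, is not shown.

```python
from collections import Counter


def shared_vocabulary(registry_docs, min_registries=2):
    """Keep words that occur in at least `min_registries` registries.

    registry_docs: dict mapping a registry name to a list of report texts
    (a hypothetical input layout, not taken from the study).
    """
    registry_count = Counter()
    for docs in registry_docs.values():
        # Count each word once per registry, however often it occurs there.
        words = {word for doc in docs for word in doc.lower().split()}
        registry_count.update(words)
    return {word for word, n in registry_count.items() if n >= min_registries}
```

The intuition is that a token seen in only one registry, such as a rare surname, is more likely to identify an individual than a clinical term that recurs across registries.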


Subject(s)
Confidentiality; Deep Learning; Information Storage and Retrieval/methods; Natural Language Processing; Neoplasms/epidemiology; Artificial Intelligence; Deep Learning/standards; Humans; Neoplasms/pathology; Registries
3.
IEEE J Biomed Health Inform ; 26(6): 2796-2803, 2022 06.
Article in English | MEDLINE | ID: mdl-35020599

ABSTRACT

Recent applications of deep learning have shown promising results for classifying unstructured text in the healthcare domain. However, the reliability of models in production settings has been hindered by imbalanced data sets in which a small subset of the classes dominate. In the absence of adequate training data, rare classes necessitate additional model constraints for robust performance. Here, we present a strategy for incorporating short sequences of text (i.e., keywords) into training to boost model accuracy on rare classes. In our approach, we assemble a set of keywords, including short phrases, associated with each class. The keywords are then used as additional data during each batch of model training, resulting in a training loss that has contributions from both raw data and keywords. We evaluate our approach on classification of cancer pathology reports, which shows a substantial increase in model performance for rare classes. Furthermore, we analyze the impact of keywords on model output probabilities for bigrams, providing a straightforward method to identify model difficulties for limited training data.


Subject(s)
Reproducibility of Results; Data Collection; Humans
4.
J Biomed Inform ; 125: 103957, 2022 01.
Article in English | MEDLINE | ID: mdl-34823030

ABSTRACT

In the last decade, the widespread adoption of electronic health record documentation has created huge opportunities for information mining. Natural language processing (NLP) techniques using machine and deep learning are becoming increasingly widespread for information extraction tasks from unstructured clinical notes. Disparities in performance when deploying machine learning models in the real world have recently received considerable attention. In the clinical NLP domain, the robustness of convolutional neural networks (CNNs) for classifying cancer pathology reports under natural distribution shifts remains understudied. In this research, we aim to quantify and improve the performance of the CNN for text classification on out-of-distribution (OOD) datasets resulting from the natural evolution of clinical text in pathology reports. We identified class imbalance due to different prevalence of cancer types as one of the sources of performance drop and analyzed the impact of previous methods for addressing class imbalance when deploying models in real-world domains. Our results show that our novel class-specialized ensemble technique outperforms other methods for the classification of rare cancer types in terms of macro F1 scores. We also found that traditional ensemble methods perform better in top classes, leading to higher micro F1 scores. Based on our findings, we formulate a series of recommendations for other ML practitioners on how to build robust models with extremely imbalanced datasets in biomedical NLP applications.
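
The contrast this abstract draws between macro and micro F1 can be made concrete with a short sketch. The function below uses the standard definitions for single-label multi-class predictions; the function name and the toy labels in the note that follows are illustrative and not taken from the study.

```python
def micro_macro_f1(y_true, y_pred):
    """Return (micro_f1, macro_f1) for single-label multi-class predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    tp = dict.fromkeys(labels, 0)
    fp = dict.fromkeys(labels, 0)
    fn = dict.fromkeys(labels, 0)
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # the predicted class gains a false positive
            fn[truth] += 1  # the true class gains a false negative

    def f1(t, p, n):
        denom = 2 * t + p + n
        return 2 * t / denom if denom else 0.0

    # Micro F1 pools counts over all classes, so frequent classes dominate.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    # Macro F1 averages per-class scores, so every class weighs the same.
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    return micro, macro
```

With eight "common" and two "rare" cases all predicted as "common", micro F1 is 0.8 while macro F1 drops to about 0.44. This is why a method that helps rare cancer types shows up mainly in macro F1, whereas gains on the top classes show up in micro F1.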


Subject(s)
Natural Language Processing; Neoplasms; Electronic Health Records; Humans; Machine Learning; Neural Networks, Computer
5.
J Biomed Inform ; 110: 103564, 2020 10.
Article in English | MEDLINE | ID: mdl-32919043

ABSTRACT

OBJECTIVE: In machine learning, it is well established that classification task performance increases when bootstrap aggregation (bagging) is applied. However, the bagging of deep neural networks takes tremendous amounts of computational resources and training time. The research question that we aimed to answer in this research is whether we could achieve higher task performance scores and accelerate the training by dividing a problem into sub-problems. MATERIALS AND METHODS: The data used in this study consist of free text from electronic cancer pathology reports. We applied bagging and partitioned data training using Multi-Task Convolutional Neural Network (MT-CNN) and Multi-Task Hierarchical Convolutional Attention Network (MT-HCAN) classifiers. We split a big problem into 20 sub-problems, resampled the training cases 2,000 times, and trained the deep learning model for each bootstrap sample and each sub-problem, thus generating up to 40,000 models. We performed the training of many models concurrently in a high-performance computing environment at Oak Ridge National Laboratory (ORNL). RESULTS: We demonstrated that aggregation of the models improves task performance compared with the single-model approach, which is consistent with other research studies; and we demonstrated that the two proposed partitioned bagging methods achieved higher classification accuracy scores on four tasks. Notably, the improvements were significant for the extraction of cancer histology data, which had more than 500 class labels in the task; these results show that data partition may alleviate the complexity of the task. In contrast, the methods did not achieve superior scores for the tasks of site and subsite classification. Intrinsically, since data partitioning was based on the primary cancer site, the accuracy depended on the determination of the partitions, which needs further investigation and improvement. CONCLUSION: Results in this research demonstrate that (1) the data partitioning and bagging strategy achieved higher performance scores, and (2) we achieved faster training by leveraging the high-performance Summit supercomputer at ORNL.
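
The two generic ingredients of bagging mentioned in this abstract, bootstrap resampling and aggregation of model outputs, can be sketched as follows. Majority voting is assumed as the aggregation rule because the abstract does not state one, and both helper names are hypothetical; the partitioning into sub-problems and the neural network training are not shown.

```python
import random
from collections import Counter


def bootstrap_sample(cases, rng):
    """Draw len(cases) training cases with replacement."""
    return [rng.choice(cases) for _ in cases]


def majority_vote(predictions):
    """Aggregate labels from several models.

    predictions: one list of predicted labels per model, all aligned to
    the same test cases. Ties go to the label encountered first.
    """
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]
```

Each bootstrap sample would train one model, and `majority_vote` would combine their outputs at inference time.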


Subject(s)
Neoplasms; Neural Networks, Computer; Computing Methodologies; Humans; Information Storage and Retrieval; Machine Learning
7.
BMC Bioinformatics ; 19(Suppl 18): 488, 2018 Dec 21.
Article in English | MEDLINE | ID: mdl-30577743

ABSTRACT

BACKGROUND: Deep Learning (DL) has advanced the state-of-the-art capabilities in bioinformatics applications which has resulted in trends of increasingly sophisticated and computationally demanding models trained by larger and larger data sets. This vastly increased computational demand challenges the feasibility of conducting cutting-edge research. One solution is to distribute the vast computational workload across multiple computing cluster nodes with data parallelism algorithms. In this study, we used a High-Performance Computing environment and implemented the Downpour Stochastic Gradient Descent algorithm for data parallelism to train a Convolutional Neural Network (CNN) for the natural language processing task of information extraction from a massive dataset of cancer pathology reports. We evaluated the scalability improvements using data parallelism training and the Titan supercomputer at Oak Ridge Leadership Computing Facility. To evaluate scalability, we used different numbers of worker nodes and performed a set of experiments comparing the effects of different training batch sizes and optimizer functions. RESULTS: We found that Adadelta would consistently converge at a lower validation loss, though requiring over twice as many training epochs as the fastest converging optimizer, RMSProp. The Adam optimizer consistently achieved a close 2nd place minimum validation loss significantly faster; using a batch size of 16 and 32 allowed the network to converge in only 4.5 training epochs. CONCLUSIONS: We demonstrated that the networked training process is scalable across multiple compute nodes communicating with message passing interface while achieving higher classification accuracy compared to a traditional machine learning algorithm.


Subject(s)
Computing Methodologies; Deep Learning/trends; Neoplasms/diagnosis; Comprehension; Humans; Neoplasms/pathology; Neural Networks, Computer
8.
IEEE J Biomed Health Inform ; 22(1): 244-251, 2018 01.
Article in English | MEDLINE | ID: mdl-28475069

ABSTRACT

Pathology reports are a primary source of information for cancer registries, which process high volumes of free-text reports annually. Information extraction and coding is a manual, labor-intensive process. In this study, we investigated deep learning with a convolutional neural network (CNN) for extracting ICD-O-3 topographic codes from a corpus of breast and lung cancer pathology reports. We performed two experiments, using a CNN and a more conventional term frequency vector approach, to assess the effects of class prevalence and inter-class transfer learning. The experiments were based on a set of 942 pathology reports with human expert annotations as the gold standard. CNN performance was compared against a more conventional term frequency vector space approach. We observed that the deep learning models consistently outperformed the conventional approaches in the class prevalence experiment, resulting in micro- and macro-F score increases of up to 0.132 and 0.226, respectively, when class labels were well populated. Specifically, the best performing CNN achieved a micro-F score of 0.722 over 12 ICD-O-3 topography codes. Transfer learning provided a consistent but modest performance boost for the deep learning methods, but trends were contingent on the CNN method and cancer site. These encouraging results demonstrate the potential of deep learning for automated abstraction of pathology reports.


Subject(s)
Artificial Intelligence; Diagnosis, Computer-Assisted/methods; Electronic Health Records; Neoplasms; Humans; Neoplasms/classification; Neoplasms/diagnosis; Neoplasms/pathology; Support Vector Machine
9.
J Am Med Inform Assoc ; 25(3): 321-330, 2018 Mar 01.
Article in English | MEDLINE | ID: mdl-29155996

ABSTRACT

OBJECTIVE: We explored how a deep learning (DL) approach based on hierarchical attention networks (HANs) can improve model performance for multiple information extraction tasks from unstructured cancer pathology reports compared to conventional methods that do not sufficiently capture syntactic and semantic contexts from free-text documents. MATERIALS AND METHODS: Data for our analyses were obtained from 942 deidentified pathology reports collected by the National Cancer Institute Surveillance, Epidemiology, and End Results program. The HAN was implemented for 2 information extraction tasks: (1) primary site, matched to 12 International Classification of Diseases for Oncology topography codes (7 breast, 5 lung primary sites), and (2) histological grade classification, matched to G1-G4. Model performance metrics were compared to conventional machine learning (ML) approaches including naive Bayes, logistic regression, support vector machine, random forest, and extreme gradient boosting, and other DL models, including a recurrent neural network (RNN), a recurrent neural network with attention (RNN w/A), and a convolutional neural network. RESULTS: Our results demonstrate that for both information tasks, HAN performed significantly better compared to the conventional ML and DL techniques. In particular, across the 2 tasks, the mean micro and macro F-scores for the HAN with pretraining were (0.852, 0.708), compared to naive Bayes (0.518, 0.213), logistic regression (0.682, 0.453), support vector machine (0.634, 0.434), random forest (0.698, 0.508), extreme gradient boosting (0.696, 0.522), RNN (0.505, 0.301), RNN w/A (0.637, 0.471), and convolutional neural network (0.714, 0.460). CONCLUSIONS: HAN-based DL models show promise in information abstraction tasks within unstructured clinical pathology reports.

11.
Med Phys ; 44(3): 832-846, 2017 Mar.
Article in English | MEDLINE | ID: mdl-28079249

ABSTRACT

PURPOSE: The objective of this study was to assess the complexity of human visual search activity during mammographic screening using fractal analysis and to investigate its relationship with case and reader characteristics. METHODS: The study was performed for the task of mammographic screening with simultaneous viewing of four coordinated breast views as typically done in clinical practice. Eye-tracking data and diagnostic decisions collected for 100 mammographic cases (25 normal, 25 benign, 50 malignant) from 10 readers (three board certified radiologists and seven Radiology residents), formed the corpus for this study. The fractal dimension of the readers' visual scanning pattern was computed with the Minkowski-Bouligand box-counting method and used as a measure of gaze complexity. Individual factor and group-based interaction ANOVA analysis was performed to study the association between fractal dimension, case pathology, breast density, and reader experience level. The consistency of the observed trends depending on gaze data representation was also examined. RESULTS: Case pathology, breast density, reader experience level, and individual reader differences are all independent predictors of the complexity of visual scanning pattern when screening for breast cancer. No higher order effects were found to be significant. CONCLUSIONS: Fractal characterization of visual search behavior during mammographic screening is dependent on case properties and image reader characteristics.
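
The box-counting estimate named in the methods above can be sketched in a few lines. The sketch assumes the scanning pattern is given as a set of 2D points (the study's actual gaze data representation is not described here) and fits the slope of log N(s) against log(1/s) by ordinary least squares, where N(s) is the number of boxes of side s that contain at least one point.

```python
import math


def box_counting_dimension(points, box_sizes):
    """Estimate the box-counting dimension of a 2D point set.

    points: iterable of (x, y) coordinates (assumed representation).
    box_sizes: at least two distinct box side lengths.
    """
    points = list(points)
    log_inv_size, log_count = [], []
    for size in box_sizes:
        # A box is identified by the integer grid cell a point falls into.
        boxes = {(math.floor(x / size), math.floor(y / size)) for x, y in points}
        log_inv_size.append(math.log(1.0 / size))
        log_count.append(math.log(len(boxes)))
    # Least-squares slope of log N(s) versus log(1/s).
    mean_x = sum(log_inv_size) / len(log_inv_size)
    mean_y = sum(log_count) / len(log_count)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(log_inv_size, log_count))
    den = sum((x - mean_x) ** 2 for x in log_inv_size)
    return num / den
```

A straight line yields a dimension near 1 and a densely filled square near 2, so values in between indicate how thoroughly a scanning pattern covers the image plane.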


Subject(s)
Breast Neoplasms/diagnostic imaging; Eye Movements; Fractals; Mammography/methods; Radiographic Image Interpretation, Computer-Assisted/methods; Adult; Aged; Aged, 80 and over; Analysis of Variance; Breast Density; Diagnostic Errors; Eye Movement Measurements; Female; Humans; Internship and Residency; Middle Aged; Observer Variation; Professional Competence; Radiologists; Visual Perception
12.
J Med Imaging (Bellingham) ; 3(4): 044506, 2016 Oct.
Article in English | MEDLINE | ID: mdl-28018939

ABSTRACT

The purpose of this work is to describe the LUNGx Challenge for the computerized classification of lung nodules on diagnostic computed tomography (CT) scans as benign or malignant and report the performance of participants' computerized methods along with that of six radiologists who participated in an observer study performing the same Challenge task on the same dataset. The Challenge provided sets of calibration and testing scans, established a performance assessment process, and created an infrastructure for case dissemination and result submission. Ten groups applied their own methods to 73 lung nodules (37 benign and 36 malignant) that were selected to achieve approximate size matching between the two cohorts. Area under the receiver operating characteristic curve (AUC) values for these methods ranged from 0.50 to 0.68; only three methods performed statistically better than random guessing. The radiologists' AUC values ranged from 0.70 to 0.85; three radiologists performed statistically better than the best-performing computer method. The LUNGx Challenge compared the performance of computerized methods in the task of differentiating benign from malignant lung nodules on CT scans, placed in the context of the performance of radiologists on the same task. The continued public availability of the Challenge cases will provide a valuable resource for the medical imaging research community.
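
For readers unfamiliar with the AUC values reported above, the sketch below computes AUC as the probability that a randomly chosen malignant case receives a higher score than a randomly chosen benign one, counting ties as one half. This is the standard rank-based definition; the function name and the score lists are illustrative and are not Challenge data.

```python
def auc(scores_malignant, scores_benign):
    """Area under the ROC curve from two lists of classifier scores."""
    wins = 0.0
    for m in scores_malignant:
        for b in scores_benign:
            if m > b:
                wins += 1.0
            elif m == b:
                wins += 0.5  # ties count as half a win
    return wins / (len(scores_malignant) * len(scores_benign))
```

Under this definition an AUC of 0.50 corresponds to random guessing, which is the baseline the Challenge methods were tested against.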

14.
Med Phys ; 40(10): 101906, 2013 Oct.
Article in English | MEDLINE | ID: mdl-24089908

ABSTRACT

PURPOSE: The primary aim of the present study was to test the feasibility of predicting diagnostic errors in mammography by merging radiologists' gaze behavior and image characteristics. A secondary aim was to investigate group-based and personalized predictive models for radiologists of variable experience levels. METHODS: The study was performed for the clinical task of assessing the likelihood of malignancy of mammographic masses. Eye-tracking data and diagnostic decisions for 40 cases were acquired from four Radiology residents and two breast imaging experts as part of an IRB-approved pilot study. Gaze behavior features were extracted from the eye-tracking data. Computer-generated and BI-RADS image features were extracted from the images. Finally, machine learning algorithms were used to merge gaze and image features for predicting human error. Feature selection was thoroughly explored to determine the relative contribution of the various features. Group-based and personalized user modeling was also investigated. RESULTS: Machine learning can be used to predict diagnostic error by merging gaze behavior characteristics from the radiologist and textural characteristics from the image under review. Leveraging data collected from multiple readers produced a reasonable group model [area under the ROC curve (AUC) = 0.792 ± 0.030]. Personalized user modeling was far more accurate for the more experienced readers (AUC = 0.837 ± 0.029) than for the less experienced ones (AUC = 0.667 ± 0.099). The best performing group-based and personalized predictive models involved combinations of both gaze and image features. CONCLUSIONS: Diagnostic errors in mammography can be predicted to a good extent by leveraging the radiologists' gaze behavior and image content.


Subject(s)
Diagnostic Errors; Eye Movements; Image Processing, Computer-Assisted/methods; Mammography/methods; Radiology; Adult; Aged; Aged, 80 and over; Artificial Intelligence; Decision Support Techniques; Feasibility Studies; Humans; Internship, Nonmedical; Middle Aged; Observer Variation
15.
Acad Radiol ; 19(7): 865-71, 2012 Jul.
Article in English | MEDLINE | ID: mdl-22459643

ABSTRACT

RATIONALE AND OBJECTIVE: The objective of this study is to test the hypothesis that there are patterns in erroneous assessment of BI-RADS features among radiology trainees when interpreting mammographic masses and that these patterns can be captured in individualized statistical user models. Identifying these patterns could be useful in personalizing and adapting educational material to complement the individual weaknesses of each trainee during his or her mammography education. MATERIALS AND METHODS: Reading data of 33 mammographic cases containing masses was used. The cases were individually described by 10 radiology residents using four BI-RADS features: mass shape, mass margin, mass density and parenchyma density. For each resident, an individual model was automatically constructed that predicts likelihood (HIGH or LOW) of erroneously assigning each BI-RADS descriptor by the resident. Error was defined as deviation of the resident's assessment from the expert assessments. We evaluated the predictive performance of the models using leave-one-out cross-validation. RESULTS: The user models were able to predict which assessments have higher likelihood of error. The proportion of actual errors to the number of situations in which these errors could potentially occur was significantly higher (P < .05) when the user model assigned HIGH likelihood of error than when LOW likelihood of error was assigned for three of the four BI-RADS features. Overall, the difference between the HIGH and LOW likelihood of error groups was statistically significant (P < .0001) combining all four features. CONCLUSION: Error making in BI-RADS descriptor assessment appears to follow patterns that can be captured with statistical pattern recognition-based user models.
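
The leave-one-out scheme used to evaluate the user models can be sketched generically. The `fit` and `predict` callables are placeholders for whatever model is being evaluated; the majority-class baseline at the bottom is a toy stand-in and not the statistical user model from the study.

```python
def leave_one_out_accuracy(cases, fit, predict):
    """Hold out each case once, train on the rest, and score the held-out case.

    cases: list of (features, label) pairs.
    fit: callable taking a list of cases and returning a model.
    predict: callable taking (model, features) and returning a label.
    """
    hits = 0
    for i, (features, label) in enumerate(cases):
        model = fit(cases[:i] + cases[i + 1:])  # every case except case i
        hits += predict(model, features) == label
    return hits / len(cases)


# Toy stand-in model: always predict the most frequent training label.
def fit_majority(train):
    labels = [label for _, label in train]
    return max(sorted(set(labels)), key=labels.count)


def predict_majority(model, features):
    return model
```

With only 33 cases per resident, holding out one case at a time lets every case serve as test data without shrinking the training set by more than one case.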


Subject(s)
Breast Neoplasms/diagnostic imaging; Diagnostic Errors; Internship and Residency; Mammography; Radiology/education; Female; Humans; Models, Statistical
16.
Neural Netw ; 25(1): 141-5, 2012 Jan.
Article in English | MEDLINE | ID: mdl-21820273

ABSTRACT

Case selection is a useful approach for increasing the efficiency and performance of case-based classifiers. Multiple techniques have been designed to perform case selection. This paper empirically investigates how class imbalance in the available set of training cases can impact the performance of the resulting classifier as well as properties of the selected set. In this study, the experiments are performed using a dataset for the problem of detecting breast masses in screening mammograms. The classification problem was binary and we used a k-nearest neighbor classifier. The classifier's performance was evaluated using the receiver operating characteristic (ROC) area under the curve (AUC) measure. The experimental results indicate that although class imbalance reduces the performance of the derived classifier and the effectiveness of selection at improving overall classifier performance, case selection can still be beneficial, regardless of the level of class imbalance.
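
A k-nearest neighbor classifier of the kind used in this study stores its training cases and classifies a query by a vote among the closest ones, which is why the choice of stored cases matters. The sketch below assumes Euclidean distance and an unweighted majority vote; the feature vectors and labels in the note that follows are invented for illustration.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest stored cases.

    train: list of (feature_vector, label) pairs.
    Euclidean distance is an assumption made for this sketch.
    """
    neighbors = sorted(train, key=lambda case: math.dist(case[0], query))[:k]
    return Counter(label for _, label in neighbors).most_common(1)[0][0]
```

If the stored set held, say, three "normal" cases and only one "mass" case, a query right next to the lone "mass" case would still be outvoted at k = 3, which illustrates how class imbalance in the stored cases can degrade such a classifier.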


Subject(s)
Data Interpretation, Statistical; Decision Making, Computer-Assisted; Mammography/classification; Female; Humans; Mammography/statistics & numerical data
17.
J Biomed Inform ; 44(5): 815-23, 2011 Oct.
Article in English | MEDLINE | ID: mdl-21554985

ABSTRACT

Development of a computational decision aid for a new medical imaging modality typically is a long and complicated process. It consists of collecting data in the form of images and annotations, development of image processing and pattern recognition algorithms for analysis of the new images and finally testing of the resulting system. Since new imaging modalities are developed more rapidly than ever before, any effort for decreasing the time and cost of this development process could result in maximizing the benefit of the new imaging modality to patients by making the computer aids quickly available to radiologists that interpret the images. In this paper, we make a step in this direction and investigate the possibility of translating the knowledge about the detection problem from one imaging modality to another. Specifically, we present a computer-aided detection (CAD) system for mammographic masses that uses a mutual information-based template matching scheme with intelligently selected templates. We have previously presented the principles of template matching with mutual information for mammography. In this paper, we present an implementation of those principles in a complete computer-aided detection system. The proposed system, through an automatic optimization process, chooses the most useful templates (mammographic regions of interest) using a large database of previously collected and annotated mammograms. Through this process, the knowledge about the task of detecting masses in mammograms is incorporated in the system. Then, we evaluate whether our system developed for screen-film mammograms can be successfully applied not only to other mammograms but also to digital breast tomosynthesis (DBT) reconstructed slices without adding any DBT cases for training. Our rationale is that since mutual information is known to be a robust inter-modality image similarity measure, it has high potential of transferring knowledge between modalities in the context of the mass detection task. Experimental evaluation of the system on mammograms showed competitive performance compared to other mammography CAD systems recently published in the literature. When the system was applied "as-is" to DBT, its performance was notably worse than that for mammograms. However, with a simple additional preprocessing step, the performance of the system reached levels similar to that obtained for mammograms. In conclusion, the presented CAD system not only performed competitively on screen-film mammograms but it also performed robustly on DBT showing that direct transfer of knowledge across breast imaging modalities for mass detection is in fact possible.
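
The mutual information measure behind the template matching can be sketched for two equally sized regions whose pixels have already been quantized into discrete intensity bins. The flattening of regions into sequences and the prior binning are assumptions of this sketch; the system's actual binning and search strategy are not described in the abstract.

```python
import math
from collections import Counter


def mutual_information(region_a, region_b):
    """Mutual information (in bits) between two aligned pixel sequences.

    region_a, region_b: equal-length sequences of discrete intensity bins,
    for example two flattened regions of interest (assumed representation).
    """
    n = len(region_a)
    joint = Counter(zip(region_a, region_b))
    count_a = Counter(region_a)
    count_b = Counter(region_b)
    mi = 0.0
    for (a, b), c in joint.items():
        # p(a,b) * log2( p(a,b) / (p(a) * p(b)) ), written with raw counts.
        mi += (c / n) * math.log2(c * n / (count_a[a] * count_b[b]))
    return mi
```

Because the measure depends only on how consistently intensities co-occur and not on their actual values, a region and its contrast-inverted copy still score the maximum. This property is what makes mutual information attractive for comparing images across modalities.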


Subject(s)
Breast/pathology; Mammography/methods; Radiographic Image Interpretation, Computer-Assisted/methods; Algorithms; Breast Neoplasms/diagnosis; Breast Neoplasms/diagnostic imaging; Diagnosis, Computer-Assisted/methods; Female; Humans; Pattern Recognition, Automated
18.
Phys Med Biol ; 56(2): 473-89, 2011 Jan 21.
Article in English | MEDLINE | ID: mdl-21191152

ABSTRACT

When constructing a pattern classifier, it is important to make best use of the instances (a.k.a. cases, examples, patterns or prototypes) available for its development. In this paper we present an extensive comparative analysis of algorithms that, given a pool of previously acquired instances, attempt to select those that will be the most effective to construct an instance-based classifier in terms of classification performance, time efficiency and storage requirements. We evaluate seven previously proposed instance selection algorithms and compare their performance to simple random selection of instances. We perform the evaluation using k-nearest neighbor classifier and three classification problems: one with simulated Gaussian data and two based on clinical databases for breast cancer detection and diagnosis, respectively. Finally, we evaluate the impact of the number of instances available for selection on the performance of the selection algorithms and conduct initial analysis of the selected instances. The experiments show that for all investigated classification problems, it was possible to reduce the size of the original development dataset to less than 3% of its initial size while maintaining or improving the classification performance. Random mutation hill climbing emerges as the superior selection algorithm. Furthermore, we show that some previously proposed algorithms perform worse than random selection. Regarding the impact of the number of instances available for the classifier development on the performance of the selection algorithms, we confirm that the selection algorithms are generally more effective as the pool of available instances increases. In conclusion, instance selection is generally beneficial for instance-based classifiers as it can improve their performance, reduce their storage requirements and improve their response time. However, choosing the right selection algorithm is crucial.
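
Random mutation hill climbing, the algorithm this abstract singles out, can be sketched as a search over fixed-size subsets of instance indices. The mutation rule (swap one selected instance for an unselected one), the acceptance rule (keep the candidate if it is at least as good), and all parameter names are assumptions of this sketch. In practice `fitness` would measure classifier performance, for example of a k-nearest neighbor classifier built from the selected instances.

```python
import random


def rmhc_select(n_instances, subset_size, fitness, iterations=200, seed=0):
    """Search for a high-fitness subset of instance indices.

    fitness: callable scoring a list of indices (higher is better).
    Requires subset_size < n_instances so that a swap is always possible.
    Returns (sorted_indices, score).
    """
    rng = random.Random(seed)
    current = rng.sample(range(n_instances), subset_size)
    best_score = fitness(current)
    for _ in range(iterations):
        candidate = list(current)
        # Mutate: replace one selected instance with an unselected one.
        unused = [i for i in range(n_instances) if i not in current]
        candidate[rng.randrange(subset_size)] = rng.choice(unused)
        score = fitness(candidate)
        if score >= best_score:  # accept ties to keep exploring plateaus
            current, best_score = candidate, score
    return sorted(current), best_score
```

The score never decreases from one accepted subset to the next, so the returned subset is at least as good as the random starting point under the chosen fitness function.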


Subject(s)
Breast Neoplasms/diagnosis; Decision Support Systems, Clinical; Mammography/methods; Pattern Recognition, Automated/methods; Radiographic Image Interpretation, Computer-Assisted/methods; Algorithms; Databases, Factual; Humans
19.
Med Phys ; 37(11): 5728-36, 2010 Nov.
Article in English | MEDLINE | ID: mdl-21158284

ABSTRACT

PURPOSE: Conventional computer-assisted detection (CADe) systems in screening mammography provide the same decision support to all users. The aim of this study was to investigate the potential of a context-sensitive CADe system which provides decision support guided by each user's focus of attention during visual search and reporting patterns for a specific case. METHODS: An observer study for the detection of malignant masses in screening mammograms was conducted in which six radiologists evaluated 20 mammograms while wearing an eye-tracking device. Eye-position data and diagnostic decisions were collected for each radiologist and case they reviewed. These cases were subsequently analyzed with an in-house knowledge-based CADe system using two different modes: Conventional mode with a globally fixed decision threshold and context-sensitive mode with a location-variable decision threshold based on the radiologists' eye dwelling data and reporting information. RESULTS: The CADe system operating in conventional mode had 85.7% per-image malignant mass sensitivity at 3.15 false positives per image (FPsI). The same system operating in context-sensitive mode provided personalized decision support at 85.7%-100% sensitivity and 0.35-0.40 FPsI to all six radiologists. Furthermore, context-sensitive CADe system could improve the radiologists' sensitivity and reduce their performance gap more effectively than conventional CADe. CONCLUSIONS: Context-sensitive CADe support shows promise in delineating and reducing the radiologists' perceptual and cognitive errors in the diagnostic interpretation of screening mammograms more effectively than conventional CADe.


Subject(s)
Breast Neoplasms/diagnosis; Mammography/methods; Cognition; Decision Making; Decision Support Techniques; Diagnosis, Computer-Assisted; Diagnostic Errors/prevention & control; False Positive Reactions; Female; Humans; Observer Variation; Perception; Radiology/methods; Reproducibility of Results; Sensitivity and Specificity
20.
Med Phys ; 37(3): 1152-60, 2010 Mar.
Article in English | MEDLINE | ID: mdl-20384251

ABSTRACT

PURPOSE: The authors propose the framework for an individualized adaptive computer-aided educational system in mammography that is based on user modeling. The underlying hypothesis is that user models can be developed to capture the individual error making patterns of radiologists-in-training. In this pilot study, the authors test the above hypothesis for the task of breast cancer diagnosis in mammograms. METHODS: The concept of a user model was formalized as the function that relates image features to the likelihood/extent of the diagnostic error made by a radiologist-in-training and therefore to the level of difficulty that a case will pose to the radiologist-in-training (or "user"). Then, machine learning algorithms were implemented to build such user models. Specifically, the authors explored k-nearest neighbor, artificial neural networks, and multiple regression for the task of building the model using observer data collected from ten Radiology residents at Duke University Medical Center for the problem of breast mass diagnosis in mammograms. For each resident, a user-specific model was constructed that predicts the user's expected level of difficulty for each presented case based on two BI-RADS image features. In the experiments, a leave-one-out data handling scheme was applied to assign each case to a low-predicted-difficulty or a high-predicted-difficulty group for each resident based on each of the three user models. To evaluate whether the user model is useful in predicting difficulty, the authors performed statistical tests using the generalized estimating equations approach to determine whether the mean actual error is the same or not between the low-predicted-difficulty group and the high-predicted-difficulty group. RESULTS: When the results for all observers were pooled together, the actual errors made by residents were statistically significantly higher for cases in the high-predicted-difficulty group than for cases in the low-predicted-difficulty group for all modeling algorithms (p ≤ 0.002 for all methods). This indicates that the user models were able to accurately predict difficulty level of the analyzed cases. Furthermore, the authors determined that among the two BI-RADS features that were used in this study, mass margin was the most useful in predicting individual user errors. CONCLUSIONS: The pilot study shows promise for developing individual user models that can accurately predict the level of difficulty that each case will pose to the radiologist-in-training. These models could allow for constructing adaptive computer-aided educational systems in mammography.


Subject(s)
Computer-Assisted Instruction/methods; Educational Measurement/methods; Mammography; Professional Competence; Radiology/education; User-Computer Interface; North Carolina; Pilot Projects
...