Results 1 - 20 of 23
1.
Pac Symp Biocomput ; 29: 108-119, 2024.
Article in English | MEDLINE | ID: mdl-38160273

ABSTRACT

Classical machine learning and deep learning models for Computer-Aided Diagnosis (CAD) commonly focus on overall classification performance, treating misclassification errors (false negatives and false positives) equally during training. This uniform treatment overlooks the distinct costs associated with each type of error, leading to suboptimal decision-making, particularly in the medical domain, where it is important to improve prediction sensitivity without significantly compromising overall accuracy. This study introduces a novel deep learning-based CAD system that incorporates a cost-sensitive parameter into the activation function. Applied to two medical imaging datasets, the proposed system achieves statistically significant sensitivity increases of 3.84% on the Lung Image Database Consortium (LIDC) dataset and 5.4% on the Breast Cancer Histological Database (BreakHis), while maintaining overall accuracy. Our findings underscore the significance of integrating cost-sensitive parameters into future CAD systems to optimize performance and ultimately reduce costs and improve patient outcomes.
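The abstract does not give the exact form of the cost-sensitive activation. As a hedged illustration of the trade-off it targets, the sketch below uses a hypothetical `cost_sensitive_sigmoid` that adds log(c_fn / c_fp) to the logit. With a fixed 0.5 cut-off this is equivalent to thresholding the ordinary sigmoid at c_fp / (c_fp + c_fn), the Bayes decision threshold for those misclassification costs. The cost values and toy logits are assumptions, not the authors' formulation or data.

```python
import math


def cost_sensitive_sigmoid(z: float, c_fn: float, c_fp: float) -> float:
    """Hypothetical cost-shifted sigmoid (not the paper's formulation).

    Adding log(c_fn / c_fp) to the logit z means the output is >= 0.5 exactly when
    sigmoid(z) >= c_fp / (c_fp + c_fn), so a larger false-negative cost c_fn
    lowers the effective decision threshold and favors sensitivity.
    """
    return 1.0 / (1.0 + math.exp(-(z + math.log(c_fn / c_fp))))


def sensitivity_and_accuracy(y_true, logits, c_fn=1.0, c_fp=1.0):
    """Return (sensitivity, accuracy) for binary labels (1 = positive class)."""
    preds = [1 if cost_sensitive_sigmoid(z, c_fn, c_fp) >= 0.5 else 0 for z in logits]
    true_pos = sum(1 for y, p in zip(y_true, preds) if y == 1 and p == 1)
    correct = sum(1 for y, p in zip(y_true, preds) if y == p)
    return true_pos / sum(y_true), correct / len(y_true)


# Toy example with made-up logits: raising c_fn trades specificity for sensitivity.
y_true = [1, 1, 1, 0, 0, 0]
logits = [2.0, -0.5, -1.5, -2.0, -1.0, 0.5]
print(sensitivity_and_accuracy(y_true, logits))            # equal costs
print(sensitivity_and_accuracy(y_true, logits, c_fn=3.0))  # missed positives cost 3x
```

On these toy numbers, sensitivity rises from 1/3 to 2/3 while accuracy stays at 0.5, which mirrors the kind of trade-off the abstract describes.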


Subject(s)
Deep Learning , Humans , Computational Biology , Diagnosis, Computer-Assisted/methods , Lung , Computers
2.
JMIR Hum Factors ; 10: e46120, 2023 09 08.
Article in English | MEDLINE | ID: mdl-37682590

ABSTRACT

BACKGROUND: Understanding the communication between physicians and patients can reveal areas for improvement and help build stronger relationships. This can lead to better patient outcomes, including increased engagement, better adherence to treatment plans, and greater trust. OBJECTIVE: This study investigates the eye gaze directions of physicians, patients, and computers in naturalistic medical encounters at Federally Qualified Health Centers to understand communication patterns across patients' diverse backgrounds. The aim is to support the design and development of health information technologies that facilitate improved patient outcomes. METHODS: Data were obtained from 77 videotaped medical encounters recorded in 2014 at 3 Federally Qualified Health Centers in Chicago, Illinois, involving 11 physicians and 77 patients. Self-reported surveys were collected from physicians and patients. A systematic analysis approach was used to examine the data. The dynamics of eye gaze during interactions between physicians, patients, and computers were evaluated using lag sequential analysis. The objective was to identify significant behavior patterns among the 6 predefined patterns initiated by physicians and the 6 initiated by patients. The association between eye gaze patterns was examined using the Pearson chi-square test and Yule's Q. RESULTS: The lag sequential analysis showed that 3 of 6 doctor-initiated gaze patterns were followed by patient-response gaze patterns, and 4 of 6 patient-initiated patterns were significantly followed by doctor-response gaze patterns. Unlike the findings of previous studies, doctor-initiated eye gaze behavior patterns did not lead patients' eye gaze. Moreover, patient-initiated eye gaze behavior patterns were significant in certain circumstances, particularly when interacting with physicians.
CONCLUSIONS: This study examined several physician-patient-computer interaction patterns in naturalistic settings using lag sequential analysis. The data indicated a significant influence of the patients' gazes on physicians. The findings revealed that physicians demonstrated a higher tendency to engage with patients by reciprocating the patient's eye gaze when the patient looked at them. However, the reverse pattern was not observed, suggesting a lack of reciprocal gaze from patients toward physicians and a tendency not to direct their gaze toward a specific object. Furthermore, patients exhibited a preference for the computer when physicians directed their eye gaze toward it.
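The abstract names lag sequential analysis with Pearson's chi-square and Yule's Q but not the coding scheme. A minimal sketch, assuming gaze events have already been coded into one time-ordered sequence with hypothetical labels such as `"P:doc"` (patient looks at physician) and `"D:pat"` (physician looks at patient), builds the lag-1 2x2 table for one given-to-target transition and computes both statistics:

```python
def lag1_table(events, given, target):
    """Count lag-1 transitions as a 2x2 table (a, b, c, d).

    a: given followed by target      b: given followed by anything else
    c: other followed by target      d: other followed by anything else
    """
    a = b = c = d = 0
    for prev, nxt in zip(events, events[1:]):
        if prev == given:
            if nxt == target:
                a += 1
            else:
                b += 1
        elif nxt == target:
            c += 1
        else:
            d += 1
    return a, b, c, d


def yules_q(a, b, c, d):
    """Yule's Q in [-1, 1]; positive values mean target tends to follow given."""
    denom = a * d + b * c
    return (a * d - b * c) / denom if denom else 0.0


def pearson_chi_square(a, b, c, d):
    """Pearson chi-square (1 df, no continuity correction) for a 2x2 table."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0


# Hypothetical coded encounter: does the physician return the patient's gaze?
events = ["P:doc", "D:pat", "P:comp", "D:pat", "D:comp", "P:doc",
          "D:comp", "P:doc", "D:pat", "D:comp", "P:doc", "D:pat"]
table = lag1_table(events, given="P:doc", target="D:pat")
print(table, round(yules_q(*table), 3), round(pearson_chi_square(*table), 3))
# (3, 1, 1, 6) 0.895 4.055
```

The toy chi-square exceeds 3.841, the 1-df critical value at alpha = 0.05, but with expected cell counts this small the chi-square approximation is unreliable; real encounter data would supply far more transitions.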


Subject(s)
Fixation, Ocular , Physicians , Humans , Chicago , Communication , Computers
3.
Front Big Data ; 6: 1173038, 2023.
Article in English | MEDLINE | ID: mdl-37139170

ABSTRACT

Data integration is a well-motivated problem in the clinical data science domain. The availability of patient data, reference clinical cases, and research datasets has the potential to advance the healthcare industry. However, the unstructured (text, audio, or video) and heterogeneous nature of the data, the variety of data standards and formats, and patient privacy constraints make data interoperability and integration a challenge. Clinical text is further categorized into different semantic groups and may be stored in different files and formats. Even the same organization may store cases in different data structures, making data integration more challenging. Given this inherent complexity, domain experts and domain knowledge are often necessary to perform data integration, yet expert human labor is time- and cost-prohibitive. To overcome the variability in the structure, format, and content of the different data sources, we map the text into common categories and compute similarity within each category. In this paper, we present a method to categorize and merge clinical data by considering the semantics underlying the cases and using reference information about the cases to perform data integration. Evaluation shows that we were able to merge 88% of clinical data from five different sources.
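The idea of computing similarity only within shared categories can be sketched as follows. This is an illustration under stated assumptions, not the authors' method: each case is assumed to be already split into hypothetical category names (such as `"diagnosis"`), similarity is plain bag-of-words cosine, and the 0.5 merge threshold is arbitrary.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def should_merge(case_a: dict, case_b: dict, threshold: float = 0.5) -> bool:
    """Merge two cases if their text is similar within the categories they share.

    Each case is a dict mapping a semantic category (e.g. "diagnosis") to free text.
    Similarity is computed only inside a common category, then averaged.
    """
    shared = set(case_a) & set(case_b)
    if not shared:
        return False
    sims = [cosine(Counter(case_a[k].lower().split()), Counter(case_b[k].lower().split()))
            for k in shared]
    return sum(sims) / len(sims) >= threshold


case_a = {"diagnosis": "Pulmonary embolism", "history": "recent long-haul flight"}
case_b = {"diagnosis": "pulmonary embolism", "findings": "filling defect in right pulmonary artery"}
case_c = {"diagnosis": "community acquired pneumonia"}
print(should_merge(case_a, case_b), should_merge(case_a, case_c))  # True False
```

Restricting comparison to a shared category keeps unrelated sections (for example, history versus findings) from diluting or inflating the score.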

4.
Annu Int Conf IEEE Eng Med Biol Soc ; 2020: 1254-1257, 2020 07.
Article in English | MEDLINE | ID: mdl-33018215

ABSTRACT

Computer-aided Diagnosis (CAD) systems have long aimed to be used in clinical practice to help doctors make decisions by providing a second opinion. However, most machine-learning-based CAD systems make predictions without explicitly showing how those predictions were generated. Since the cognitive process of diagnostic image interpretation involves various visual characteristics of the region of interest, the explainability of the results should leverage those characteristics. We encode visual characteristics of the region of interest based on pairs of similar images rather than on the image content by itself. Using a Siamese convolutional neural network (SCNN), we first learn the similarity among nodules, then encode image content using the SCNN similarity-based feature representation, and lastly apply the K-nearest neighbor (KNN) approach to make diagnostic characterizations using the Siamese-based image features. We demonstrate the feasibility of our approach on spiculation, a visual characteristic that radiologists consider when interpreting the degree of cancer malignancy, using the NIH/NCI Lung Image Database Consortium (LIDC) dataset, which contains both spiculation and malignancy characteristics for lung nodules. Clinical Relevance: This establishes that spiculation can be quantified to automate the diagnostic characterization of lung nodules in Computed Tomography images.
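The two stages above (learn pairwise similarity, then classify with KNN on the learned features) can be sketched without the network itself. The loss below is the standard contrastive loss often used to train Siamese networks; the abstract does not say which loss the authors used, so treat it as an assumption. The 2-D embeddings and binary spiculation labels are hypothetical stand-ins for SCNN outputs and LIDC ratings.

```python
import math
from collections import Counter


def contrastive_loss(distance: float, similar: bool, margin: float = 1.0) -> float:
    """Standard contrastive loss for one image pair, given the embedding distance.

    Similar pairs are pulled together; dissimilar pairs are pushed at least
    `margin` apart. A Siamese network is trained by minimizing this over pairs.
    """
    if similar:
        return 0.5 * distance ** 2
    return 0.5 * max(0.0, margin - distance) ** 2


def knn_predict(train, query, k=3):
    """Majority label among the k training embeddings closest to `query`.

    `train` is a list of (embedding, label) pairs; embeddings are assumed to be
    Siamese-network feature vectors.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


# Hypothetical 2-D embeddings with binary spiculation labels.
train = [((0.10, 0.20), "spiculated"), ((0.20, 0.10), "spiculated"),
         ((0.15, 0.15), "spiculated"), ((0.90, 0.80), "not spiculated"),
         ((1.00, 0.90), "not spiculated")]
print(knn_predict(train, (0.12, 0.18)))  # spiculated
```

Because the KNN decision rests on concrete neighboring nodules, the retrieved neighbors themselves can serve as the explanation that the abstract argues CAD systems usually lack.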


Subject(s)
Lung Neoplasms , Radiographic Image Interpretation, Computer-Assisted , Humans , Lung , Lung Neoplasms/diagnostic imaging , Neural Networks, Computer , Tomography, X-Ray Computed
5.
J Digit Imaging ; 33(3): 797-813, 2020 06.
Article in English | MEDLINE | ID: mdl-32253657

ABSTRACT

Radiology teaching file repositories contain a large amount of information about patient health and radiologist interpretation of medical findings. Although valuable for radiology education, the use of teaching file repositories has been hindered by the difficulty of performing advanced searches on them, given the unstructured format of the data and the sparseness of the different repositories. Our term coverage analysis of two major medical ontologies, Radiology Lexicon (RadLex) and Unified Medical Language System (UMLS) Systematized Nomenclature of Medicine Clinical Terms (SNOMED CT), and two teaching file repositories, Medical Imaging Resource Community (MIRC) and MyPacs, showed that the two ontologies combined cover 56.3% of terms in the MIRC and only 17.9% of terms in MyPacs. Furthermore, the overlap between the two ontologies (i.e., terms included in both RadLex and UMLS SNOMED CT) was a mere 5.6% for the MIRC and 2% for the RadLex. Clustering the content of the teaching file repositories showed that they focus on different diagnostic areas within radiology. The MIRC teaching file covers mostly pediatric cases; a few cases involve female patients with heart-, chest-, and bone-related diseases. MyPacs contains a range of different diseases with no focus on a particular disease category, gender, or age group; it also provides a wide variety of cases related to the neck, face, heart, chest, and breast. These findings provide valuable insights into what new cases should be added or how existing cases may be integrated to provide more comprehensive data repositories. Similarly, the low term coverage by the ontologies shows the need to expand them with new terminology, such as terms learned from these teaching file repositories and validated by experts. While our methodology to organize and index data using clustering approaches and medical ontologies is applied here to teaching file repositories, it can be applied to other clinical data.
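The coverage and overlap figures above are fractions of repository terms found in one or both ontologies. A minimal sketch, assuming exact case-insensitive string matching (the paper may normalize or stem terms differently) and using made-up term lists:

```python
def coverage(repo_terms, *ontologies):
    """Fraction of distinct repository terms found in ANY of the given ontologies."""
    repo = {t.lower() for t in repo_terms}
    known = set().union(*({t.lower() for t in onto} for onto in ontologies))
    return len(repo & known) / len(repo)


def overlap(repo_terms, onto_a, onto_b):
    """Fraction of distinct repository terms found in BOTH ontologies."""
    repo = {t.lower() for t in repo_terms}
    both = {t.lower() for t in onto_a} & {t.lower() for t in onto_b}
    return len(repo & both) / len(repo)


# Hypothetical term lists standing in for a teaching file, RadLex, and SNOMED CT.
repo = ["pneumothorax", "nodule", "ground glass", "tree-in-bud"]
radlex = ["nodule", "ground glass"]
snomed = ["Pneumothorax", "nodule"]
print(coverage(repo, radlex), coverage(repo, radlex, snomed), overlap(repo, radlex, snomed))
# 0.5 0.75 0.25
```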


Subject(s)
Computer-Assisted Instruction , Radiology Information Systems , Radiology , Child , Female , Humans , Radiography , Radiology/education , Systematized Nomenclature of Medicine
7.
Biomed Opt Express ; 10(2): 914-931, 2019 Feb 01.
Article in English | MEDLINE | ID: mdl-30800523

ABSTRACT

Age-related macular degeneration (AMD) is a degenerative aging disorder, which can lead to irreversible vision loss in older individuals. The emergence of clinical applications of retinal hyper-spectral imaging provides a unique opportunity to capture important spectral signatures, with the potential to enhance the molecular diagnosis of retinal diseases. In this study, we use a machine learning classification approach to explore whether hyper-spectral images offer an improved outcome compared to standard RGB images. Our results show that the classifier performs better on hyper-spectral images with improved accuracy and sensitivity for drusen classification compared to standard imaging. By examining the most important features in the classification task, our data suggest that drusen are highly heterogeneous. Our work provides further evidence that hyper-spectral retinal image data are uniquely suited for computer-aided diagnosis and detection techniques.

8.
BMC Bioinformatics ; 19(Suppl 8): 211, 2018 06 13.
Article in English | MEDLINE | ID: mdl-29897319

ABSTRACT

BACKGROUND: Suicide is an alarming public health problem accounting for a considerable number of deaths each year worldwide. Many more individuals contemplate suicide. Understanding the attributes, characteristics, and exposures correlated with suicide remains an urgent and significant problem. As social networking sites have become more common, users have adopted these sites to talk about intensely personal topics, among them their thoughts about suicide. Such data have previously been evaluated by analyzing the language features of social media posts and using factors derived by domain experts to identify at-risk users. RESULTS: In this work, we automatically extract informal latent recurring topics of suicidal ideation found in social media posts. Our evaluation demonstrates that we are able to automatically reproduce many of the expert-determined risk factors for suicide. Moreover, we identify many informal latent topics related to suicidal ideation, such as concerns over health, work, self-image, and financial issues. CONCLUSIONS: These informal topics can be more specific or more general than the expert-defined risk factors. Some of our topics express meaningful ideas not contained in the risk factors, and some risk factors do not have complementary latent topics. In short, our analysis of the latent topics extracted from social media posts containing suicidal ideation suggests that users of these systems express ideas that are complementary to the topics defined by experts but differ in their scope, focus, and precision of language.


Subject(s)
Information Storage and Retrieval , Internet , Social Media , Suicidal Ideation , Adolescent , Algorithms , Automation , Female , Humans , Language , Male , Middle Aged , Risk Factors
9.
Surgery ; 164(3): 379-386, 2018 09.
Article in English | MEDLINE | ID: mdl-29801732

ABSTRACT

BACKGROUND: This study aimed to determine whether publicized hospital rankings can be used to predict surgical outcomes. METHODS: Patients undergoing one of nine surgical procedures were identified using the Healthcare Cost and Utilization Project State Inpatient Database for Florida and New York (2011-2013) and merged with hospital data from the American Hospital Association Annual Survey. Nine quality designations were analyzed as possible predictors of inpatient mortality and postoperative complications using logistic regression, decision trees, and support vector machines. RESULTS: We identified 229,657 patients within 177 hospitals. Decision trees were the highest-performing machine learning algorithm for predicting inpatient mortality and postoperative complications (accuracy 0.83, P<.001). The top 3 variables associated with low surgical mortality (relative impact) were Hospital Compare (42), total procedure volume (16), and Joint Commission (12). When analyzed separately for each individual procedure, hospital quality awards were not predictors of postoperative complications for 7 of the 9 studied procedures. However, when procedures with a volume-outcome relationship were grouped together, hospital ranking became a significant predictor of postoperative complications. CONCLUSION: Hospital quality rankings are not a reliable indicator of quality for all surgical procedures. Hospital and provider quality must be evaluated with an emphasis on creating consistent, reliable, and accurate measures of quality that translate to improved patient outcomes.


Subject(s)
Awards and Prizes , Hospitals , Quality of Health Care , Surgical Procedures, Operative/statistics & numerical data , Florida , Hospital Mortality , Hospitalization/statistics & numerical data , Humans , Machine Learning , New York , Postoperative Complications/epidemiology , Sensitivity and Specificity , Surgical Procedures, Operative/adverse effects , Surgical Procedures, Operative/mortality
10.
Surgery ; 160(4): 839-849, 2016 10.
Article in English | MEDLINE | ID: mdl-27524432

ABSTRACT

BACKGROUND: Our objective was to determine the hospital resources required for low-volume, high-quality care at high-volume cancer resection centers. METHODS: Patients who underwent esophageal, pancreatic, and rectal resection for malignancy were identified using Healthcare Cost and Utilization Project State Inpatient Database (Florida and California) between 2007 and 2011. Annual case volume by procedure was used to identify high- and low-volume centers. Hospital data were obtained from the American Hospital Association Annual Survey Database. Procedure risk-adjusted mortality was calculated for each hospital using multilevel, mixed-effects models. RESULTS: A total of 24,784 patients from 302 hospitals met the inclusion criteria. Of these, 13 hospitals were classified as having a high-volume, oncologic resection ecosystem by being a high-volume hospital for ≥2 studied procedures. A total of 11 of 31 studied hospital factors were strongly associated with hospitals that performed a high volume of cancer resections and were used to develop the High Volume Ecosystem for Oncologic Resections (HIVE-OR) score. At low-volume centers, increasing HIVE-OR score resulted in decreased mortality for rectal cancer resection (P = .038). HIVE-OR was not related to risk-adjusted mortality for esophagectomy (P = .421) or pancreatectomy (P = .413) at low-volume centers. CONCLUSION: Our study found that in some settings, low-volume, high-quality cancer surgical care can be explained by having a high-volume ecosystem.


Subject(s)
Colectomy/mortality , Esophagectomy/mortality , Hospital Mortality/trends , Hospitals, High-Volume , Pancreatectomy/mortality , Quality of Health Care , Aged , Colectomy/methods , Databases, Factual , Ecosystem , Esophagectomy/methods , Female , Health Care Surveys , Humans , Length of Stay/statistics & numerical data , Male , Middle Aged , Outcome Assessment, Health Care , Pancreatectomy/methods , Role , Survival Analysis , United States
11.
Comput Math Methods Med ; 2016: 3516089, 2016.
Article in English | MEDLINE | ID: mdl-27462364

ABSTRACT

The nematode Caenorhabditis elegans explores the environment using a combination of different movement patterns, which include straight movement, reversal, and turns. We propose to quantify C. elegans movement behavior using a computer vision approach based on run-length encoding of step-length data. In this approach, the path of C. elegans is encoded as a string of characters, where each character represents a path segment of a specific type of movement. With these encoded string data, we perform k-means cluster analysis to distinguish movement behaviors resulting from different genotypes and food availability. We found that shallow and sharp turns are the most critical factors in distinguishing the differences among the movement behaviors. To validate our approach, we examined the movement behavior of tph-1 mutants that lack an enzyme responsible for serotonin biosynthesis. A k-means cluster analysis with the path string-encoded data showed that tph-1 movement behavior on food is similar to that of wild-type animals off food. We suggest that this run-length encoding approach is applicable to trajectory data from animal or human mobility studies.
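A minimal sketch of the encoding step, under assumptions not stated in the abstract: each step is classified by its turning angle (the 30 and 90 degree thresholds and the symbols F, S, and T are hypothetical), and consecutive identical symbols are then collapsed by run-length encoding. Run-length features derived this way could then be passed to any k-means implementation.

```python
import math
from itertools import groupby


def turning_angles(points):
    """Absolute heading change (degrees, 0-180) at each interior point of a path."""
    angles = []
    for (x0, y0), (x1, y1), (x2, y2) in zip(points, points[1:], points[2:]):
        h1 = math.atan2(y1 - y0, x1 - x0)
        h2 = math.atan2(y2 - y1, x2 - x1)
        delta = (math.degrees(h2 - h1) + 180.0) % 360.0 - 180.0  # wrap to [-180, 180)
        angles.append(abs(delta))
    return angles


def encode_path(angles, shallow=30.0, sharp=90.0):
    """Map each step to a symbol: F = forward, S = shallow turn, T = sharp turn."""
    return "".join("F" if a < shallow else "S" if a < sharp else "T" for a in angles)


def run_length_encode(path_string):
    """Collapse repeated symbols into (symbol, run length) pairs."""
    return [(symbol, len(list(run))) for symbol, run in groupby(path_string)]


track = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 1), (5, 2), (6, 3), (5, 3)]
path = encode_path(turning_angles(track))
print(path, run_length_encode(path))  # FFSFFT [('F', 2), ('S', 1), ('F', 2), ('T', 1)]
```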


Subject(s)
Appetitive Behavior , Behavior, Animal , Caenorhabditis elegans/physiology , Algorithms , Animals , Cluster Analysis , Computational Biology/methods , Feeding Behavior , Genotype , Machine Learning , Movement , Pattern Recognition, Automated , Software
12.
PLoS One ; 10(12): e0145870, 2015.
Article in English | MEDLINE | ID: mdl-26713869

ABSTRACT

The nematode Caenorhabditis elegans provides a unique opportunity to interrogate the neural basis of behavior at single neuron resolution. In C. elegans, neural circuits that control behaviors can be formulated based on its complete neural connection map, and easily assessed by applying advanced genetic tools that allow for modulation in the activity of specific neurons. Importantly, C. elegans exhibits several elaborate behaviors that can be empirically quantified and analyzed, thus providing a means to assess the contribution of specific neural circuits to behavioral output. In particular, locomotory behavior can be recorded and analyzed with computational and mathematical tools. Here, we describe a robust single worm-tracking system, which is based on the open-source Python programming language, and an analysis system, which implements path-related algorithms. Our tracking system was designed to accommodate worms that explore a large area with frequent turns and reversals at high speeds. As a proof of principle, we used our tracker to record the movements of wild-type animals that were freshly removed from abundant bacterial food, and determined how wild-type animals change locomotory behavior over a long period of time. Consistent with previous findings, we observed that wild-type animals show a transition from area-restricted local search to global search over time. Intriguingly, we found that wild-type animals initially exhibit short, random movements interrupted by infrequent long trajectories. This movement pattern often coincides with local/global search behavior, and visually resembles Lévy flight search, a search behavior conserved across species. Our mathematical analysis showed that while most of the animals exhibited Brownian walks, approximately 20% of the animals exhibited Lévy flights, indicating that C. elegans can use Lévy flights for efficient food search.
In summary, our tracker and analysis software will help analyze the neural basis of the alteration and transition of C. elegans locomotory behavior in a food-deprived condition.
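The abstract does not say how Lévy flights were distinguished from Brownian walks. One common approach, sketched here as an assumption rather than the authors' method, fits a power law (Lévy-like) and an exponential (Brownian-like) distribution to step lengths at or above a cut-off `xmin` by maximum likelihood and compares the log-likelihoods; both models have one free parameter, so the comparison is like-for-like. The step lengths below are made up.

```python
import math


def power_law_loglik(steps, xmin):
    """MLE exponent and log-likelihood of a continuous power law p(x) ~ x^-mu, x >= xmin."""
    xs = [x for x in steps if x >= xmin]
    log_ratio_sum = sum(math.log(x / xmin) for x in xs)  # must be > 0
    n = len(xs)
    mu = 1.0 + n / log_ratio_sum
    loglik = n * math.log(mu - 1.0) - n * math.log(xmin) - mu * log_ratio_sum
    return mu, loglik


def exponential_loglik(steps, xmin):
    """MLE rate and log-likelihood of a shifted exponential, x >= xmin."""
    xs = [x for x in steps if x >= xmin]
    excess = sum(x - xmin for x in xs)  # must be > 0
    lam = len(xs) / excess
    loglik = len(xs) * math.log(lam) - lam * excess
    return lam, loglik


def classify_walk(steps, xmin):
    """Label a track by whichever step-length model fits better."""
    _, ll_pl = power_law_loglik(steps, xmin)
    _, ll_exp = exponential_loglik(steps, xmin)
    return "Levy-like" if ll_pl > ll_exp else "Brownian-like"


print(classify_walk([1, 1, 1, 1, 2, 2, 3, 5, 10, 50], xmin=1))  # Levy-like
print(classify_walk([1.0, 1.5, 2.0, 2.5, 3.0], xmin=1.0))       # Brownian-like
```

A stricter analysis would estimate `xmin` from the data and test whether the likelihood difference is significant (for example, with a likelihood-ratio test) before labeling an animal.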


Subject(s)
Behavior, Animal , Caenorhabditis elegans/physiology , Locomotion , Programming Languages , Algorithms , Animals , Caenorhabditis elegans/genetics , Caenorhabditis elegans Proteins/genetics , Food , Mutation
13.
BMC Neurosci ; 16: 26, 2015 Apr 24.
Article in English | MEDLINE | ID: mdl-25907097

ABSTRACT

BACKGROUND: Large conductance, calcium-activated BK channels regulate many important physiological processes, including smooth muscle excitation, hormone release and synaptic transmission. The biological roles of these channels hinge on their unique ability to respond synergistically to both voltage and cytosolic calcium elevations. Because calcium influx is meticulously regulated both spatially and temporally, the localization of BK channels near calcium channels is critical for their proper function. However, the mechanism underlying BK channel localization near calcium channels is not fully understood. RESULTS: We show here that in C. elegans the localization of SLO-1/BK channels to presynaptic terminals, where UNC-2/CaV2 calcium channels regulate neurotransmitter release, is controlled by the hierarchical organization of CTN-1/α-catulin and DYB-1/dystrobrevin, two proteins that interact with cortical cytoskeletal proteins. CTN-1 organizes a macromolecular SLO-1 channel complex at presynaptic terminals by direct physical interaction. DYB-1 contributes to the maintenance or stabilization of the complex at presynaptic terminals by interacting with CTN-1. We also show that SLO-1 channels are functionally coupled with UNC-2 calcium channels, and that normal localization of SLO-1 to presynaptic terminals requires UNC-2. In the absence of UNC-2, SLO-1 clusters lose the localization specificity, thus accumulating inside and outside of presynaptic terminals. Moreover, CTN-1 is also similarly localized in unc-2 mutants, consistent with the direct interaction between CTN-1 and SLO-1. However, localization of UNC-2 at the presynaptic terminals is not dependent on either CTN-1 or SLO-1. Taken together, our data strongly suggest that the absence of UNC-2 indirectly influences SLO-1 localization via the reorganization of cytoskeletal proteins. 
CONCLUSION: CTN-1 and DYB-1, which interact with cortical cytoskeletal proteins, are required for the presynaptic punctate localization of SLO-1 in a hierarchical manner. In addition, UNC-2 calcium channels indirectly control the fidelity of SLO-1 puncta localization at presynaptic terminals. We suggest that the absence of UNC-2 leads to the reorganization of the cytoskeletal structure that includes CTN-1, which in turn influences SLO-1 puncta localization.


Subject(s)
Caenorhabditis elegans Proteins/metabolism , Large-Conductance Calcium-Activated Potassium Channels/metabolism , Membrane Proteins/metabolism , Nerve Tissue Proteins/metabolism , Presynaptic Terminals/metabolism , alpha Catenin/metabolism , Animals , Animals, Genetically Modified , Caenorhabditis elegans , Caenorhabditis elegans Proteins/genetics , Large-Conductance Calcium-Activated Potassium Channels/genetics , Locomotion/physiology , Membrane Proteins/genetics , Microscopy, Fluorescence , Mutation
14.
J Digit Imaging ; 28(6): 704-17, 2015 Dec.
Article in English | MEDLINE | ID: mdl-25708891

ABSTRACT

We analyze the importance of shape features for predicting spiculation ratings assigned by radiologists to lung nodules in computed tomography (CT) scans. Using the Lung Image Database Consortium (LIDC) data and classification models based on decision trees, we demonstrate that the importance of several shape features increases disproportionately relative to other image features with increasing nodule size. Our shape-based classification results show an area under the receiver operating characteristic (ROC) curve of 0.65 when classifying spiculation for small nodules and 0.91 for large nodules, a difference of 0.26 (26 percentage points) in classification performance using shape features. An analysis of the results shows that this change in performance is driven by features that measure boundary complexity, which perform well for large nodules but perform relatively poorly, and no better than other features, for small nodules. For large nodules, the roughness of the segmented boundary maps well to the semantic concept of spiculation. For small nodules, directly measuring the complexity of hard segmentations does not yield good results for predicting spiculation due to limits imposed by spatial resolution and the uncertainty in boundary location. Therefore, a wider range of features, including shape, texture, and intensity features, is needed to predict spiculation ratings for small nodules. A further implication is that the efficacy of shape features for a particular classifier used to create computer-aided diagnosis systems depends on the distribution of nodule sizes in the training and testing sets, which may not be consistent across different research studies.
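Boundary-complexity features of the kind discussed above can be illustrated with compactness, perimeter squared over 4 times pi times area, which equals 1 for a circle and grows as the outline becomes rougher. This is one representative measure chosen for illustration; the abstract does not list the exact shape features used. The square and star outlines are hypothetical stand-ins for smooth and spiculated contours.

```python
import math


def polygon_area(vertices):
    """Polygon area via the shoelace formula (vertices in order)."""
    n = len(vertices)
    twice_area = sum(vertices[i][0] * vertices[(i + 1) % n][1]
                     - vertices[(i + 1) % n][0] * vertices[i][1] for i in range(n))
    return abs(twice_area) / 2.0


def polygon_perimeter(vertices):
    """Sum of edge lengths around the closed polygon."""
    n = len(vertices)
    return sum(math.dist(vertices[i], vertices[(i + 1) % n]) for i in range(n))


def compactness(vertices):
    """Perimeter^2 / (4*pi*area): 1.0 for a circle, larger for rougher boundaries."""
    return polygon_perimeter(vertices) ** 2 / (4.0 * math.pi * polygon_area(vertices))


square = [(0, 0), (1, 0), (1, 1), (0, 1)]
# Star-like outline alternating outer radius 2 and inner radius 0.5 (a crude "spiculated" shape).
star = [((2.0 if i % 2 == 0 else 0.5) * math.cos(i * math.pi / 4),
         (2.0 if i % 2 == 0 else 0.5) * math.sin(i * math.pi / 4)) for i in range(8)]
print(round(compactness(square), 3), round(compactness(star), 3))
```

On a small nodule spanning only a few pixels, a segmented contour is dominated by pixel staircase edges, so a measure like this tends to reflect resolution more than true spiculation, consistent with the limitation the abstract describes.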


Subject(s)
Lung Neoplasms/diagnostic imaging , Radiographic Image Interpretation, Computer-Assisted/methods , Solitary Pulmonary Nodule/diagnostic imaging , Tomography, X-Ray Computed/methods , Humans , Lung/diagnostic imaging , ROC Curve , Reproducibility of Results , Sensitivity and Specificity
15.
Comput Biol Med ; 62: 294-305, 2015 Jul.
Article in English | MEDLINE | ID: mdl-25712071

ABSTRACT

Computer-aided diagnosis systems can play an important role in lowering the workload of clinical radiologists and reducing costs by automatically analyzing vast amounts of image data and providing meaningful and timely insights during the decision-making process. In this paper, we present strategies on how to better manage the limited time of clinical radiologists in conjunction with predictive model diagnosis. We first introduce a metric for discriminating between the different categories of diagnostic complexity (such as easy versus hard) encountered when interpreting CT scans. Second, we propose to learn the diagnostic complexity using a classification approach based on low-level image features automatically extracted from pixel data. We then show how this classification can be used to decide how best to allocate additional radiologists to interpret a case based on its diagnosis category. Using a lung nodule image dataset, we determined that, by a simple division of cases into hard and easy to diagnose, the number of interpretations can be distributed to significantly lower the cost with limited loss in prediction accuracy. Furthermore, we show that with just a few low-level image features (18% of the original set) we are able to distinguish easy from hard cases for a significant subset (66%) of the lung nodule image data.
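The allocation idea can be sketched with a hypothetical difficulty rule: a case is "easy" when all readers' ratings fall within one point of each other, and easy cases receive one reader while hard cases receive four. Both the rule and the reader counts are assumptions for illustration; in the paper, the difficulty label is predicted from low-level image features rather than read off known ratings as done here for brevity.

```python
def difficulty(ratings, max_spread=1):
    """Hypothetical rule: a case is 'easy' if all readers' ratings lie within max_spread."""
    return "easy" if max(ratings) - min(ratings) <= max_spread else "hard"


def reads_needed(labels, easy_reads=1, hard_reads=4):
    """Total radiologist interpretations if easy cases get fewer readers."""
    return sum(easy_reads if label == "easy" else hard_reads for label in labels)


# Malignancy ratings (1-5) from four readers for six hypothetical nodules.
cases = [[1, 1, 2, 1], [5, 5, 4, 5], [2, 2, 2, 2], [1, 4, 2, 5], [3, 3, 3, 4], [2, 5, 3, 1]]
labels = [difficulty(r) for r in cases]
print(labels, reads_needed(labels), "vs", 4 * len(cases))  # 12 reads instead of 24
```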


Subject(s)
Diagnosis, Computer-Assisted/methods , Image Processing, Computer-Assisted/methods , Lung Neoplasms/diagnostic imaging , Diagnosis, Computer-Assisted/economics , Female , Humans , Image Processing, Computer-Assisted/economics , Male , Radiography
16.
Int J Comput Assist Radiol Surg ; 7(2): 323-9, 2012 Mar.
Article in English | MEDLINE | ID: mdl-21671095

ABSTRACT

PURPOSE: Classification of a suspicious mass (region of interest, ROI) in a mammogram as malignant or benign may be achieved using mass shape features. An ensemble system was built for this purpose and tested. METHODS: Multiple contours were generated from a single ROI using various parameter settings of the image enhancement functions for the segmentation. For each segmented contour, the mass shape features were computed. For classification, the dataset was partitioned into four subsets based on patient age (young/old) and ROI size (large/small). We built an ensemble learning system consisting of four single classifiers, each a specialist trained specifically for one of the subsets. Each specialist was also the best-performing classifier for its subset, selected from several candidate classifiers in a preliminary experiment. In this scheme, the final diagnosis (malignant or benign) of an instance is the classification produced by the classifier trained for the subset to which the instance belongs. RESULTS: The Digital Database for Screening Mammography (DDSM) from the University of South Florida was used to test the ensemble system for classification of masses, which achieved 72% overall accuracy. This ensemble of specialist classifiers achieved better performance than a single classifier (56%). CONCLUSION: An ensemble classifier for mammography-detected masses may provide superior performance to any single classifier in distinguishing benign from malignant cases.
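The routing scheme above, where each instance goes to the specialist for its (age, ROI size) subset, can be sketched as follows. The cut-offs, the `irregularity` feature, and the threshold rules standing in for trained specialists are all hypothetical; units for `roi_size` are left unspecified.

```python
from typing import Callable, Dict, Tuple

Case = Dict[str, float]
Classifier = Callable[[Case], str]


def make_specialist_ensemble(specialists: Dict[Tuple[str, str], Classifier],
                             age_cutoff: float, size_cutoff: float) -> Classifier:
    """Route each case to the specialist trained for its (age group, ROI size) subset."""
    def predict(case: Case) -> str:
        key = ("old" if case["age"] >= age_cutoff else "young",
               "large" if case["roi_size"] >= size_cutoff else "small")
        return specialists[key](case)
    return predict


# Hypothetical specialists: simple thresholds on a shape-irregularity score.
specialists = {
    ("young", "small"): lambda c: "malignant" if c["irregularity"] > 0.8 else "benign",
    ("young", "large"): lambda c: "malignant" if c["irregularity"] > 0.6 else "benign",
    ("old", "small"): lambda c: "malignant" if c["irregularity"] > 0.5 else "benign",
    ("old", "large"): lambda c: "malignant" if c["irregularity"] > 0.4 else "benign",
}
ensemble = make_specialist_ensemble(specialists, age_cutoff=50, size_cutoff=10.0)
print(ensemble({"age": 62, "roi_size": 14.0, "irregularity": 0.45}))  # malignant
print(ensemble({"age": 38, "roi_size": 6.0, "irregularity": 0.45}))   # benign
```

The same feature value yields different diagnoses depending on the subset, which is the point of training one specialist per subset rather than a single global model.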


Subject(s)
Breast Neoplasms/diagnostic imaging , Breast Neoplasms/pathology , Mammography/methods , Radiographic Image Interpretation, Computer-Assisted , Adult , Age Factors , Breast Diseases/diagnostic imaging , Breast Diseases/pathology , Breast Neoplasms/diagnosis , Computer-Aided Design , Diagnosis, Differential , Female , Humans , Mammography/instrumentation , Middle Aged , Reproducibility of Results , Systems Analysis
17.
J Digit Imaging ; 25(3): 423-36, 2012 Jun.
Article in English | MEDLINE | ID: mdl-22193755

ABSTRACT

Traditionally, image studies evaluating the effectiveness of computer-aided diagnosis (CAD) compare a single label from a medical expert with a single label produced by CAD. The purpose of this research is to present a CAD system based on the Belief Decision Tree classification algorithm, capable of learning from probabilistic input (based on intra-reader variability) and providing probabilistic output. We compared our approach against a traditional decision tree approach with respect to a traditional performance metric (accuracy) and a probabilistic one (area under the distance-threshold curve, AuC(dt)). The probabilistic classification technique showed notable performance improvement in comparison with the traditional one with respect to both evaluation metrics. Specifically, when applying cross-validation on the training subset of instances, improvements of 28.26% and 30.28% were noted for the probabilistic approach with respect to accuracy and AuC(dt), respectively. Furthermore, on the validation subset of instances, improvements of 20.64% and 23.21% were again noted for the probabilistic approach with respect to the same two metrics. In addition, we compared our CAD system results with diagnostic data available for a small subset of the Lung Image Database Consortium database. We discovered that when our CAD system errs, it generally does so with low confidence. Predictions produced by the system also agree with diagnoses of truly benign nodules more often than radiologists do, offering the possibility of reducing false positives.


Subject(s)
Algorithms , Decision Trees , Diagnosis, Computer-Assisted/methods , Lung Neoplasms/diagnostic imaging , Radiographic Image Interpretation, Computer-Assisted/methods , Tomography, X-Ray Computed , Area Under Curve , Artificial Intelligence , Diagnosis, Differential , Humans , Markov Chains , Probability , ROC Curve
18.
J Digit Imaging ; 24(2): 256-70, 2011 Apr.
Article in English | MEDLINE | ID: mdl-20390436

ABSTRACT

Ideally, an image should be reported and interpreted in the same way (e.g., with the same perceived likelihood of malignancy), or at least similarly, by any two radiologists; however, as much research has demonstrated, this is often not the case. Various efforts have attempted to reduce the variability in radiologists' interpretations of images. The Lung Image Database Consortium (LIDC) has provided a database of lung nodule images and associated radiologist ratings to aid in the analysis of computer-aided tools. Likewise, the Radiological Society of North America has developed a radiological lexicon called RadLex. The goal of this paper is therefore to investigate the feasibility of associating LIDC characteristics and terminology with RadLex terminology. If matches between LIDC characteristics and RadLex terms are found, probabilistic models based on image features may be used as decision-based rules to predict whether an image or lung nodule could be characterized or classified as an associated RadLex term. This study found RadLex matches for 25 of the 34 (74%) LIDC terms. This suggests that LIDC characteristics and associated rating terminology may be better conceptualized or reduced to produce even more matches with RadLex. Ultimately, the goal is to identify and establish a more standardized rating system and terminology to reduce the subjective variability between radiologist annotations. A standardized rating system can then be utilized by future researchers to develop automatic annotation models and tools for computer-aided decision systems.


Subject(s)
Databases, Factual , Lung Neoplasms/diagnostic imaging , Radiographic Image Interpretation, Computer-Assisted/methods , Radiology Information Systems , Terminology as Topic , Tomography, X-Ray Computed/methods , Feasibility Studies , Humans , Lung/diagnostic imaging , Lung Neoplasms/classification , North America , Societies, Medical
19.
Article in English | MEDLINE | ID: mdl-22255337

ABSTRACT

In reading Computed Tomography (CT) scans with potentially malignant lung nodules, radiologists make use of high-level information (semantic characteristics) in their analysis. Computer-Aided Diagnostic Characterization (CADc) systems can assist radiologists by offering a "second opinion" by predicting these semantic characteristics for lung nodules. In this work, we propose a way of predicting the distribution of radiologists' opinions using a multiple-label classification algorithm based on belief decision trees, applied to the National Cancer Institute (NCI) Lung Image Database Consortium (LIDC) dataset, which includes semantic annotations by up to four human radiologists for each of the 914 nodules. Furthermore, we evaluate our multiple-label results using a novel distance-threshold curve technique; measuring the area under this curve, we obtain 69% performance on the validation subset. We conclude that multiple-label classification algorithms are an appropriate method of representing the diagnoses of multiple radiologists on lung CT scans when ground truth is unavailable.
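The abstract does not define the distance-threshold curve, so the following is only a minimal sketch of one plausible reading. It assumes (not stated by the authors) that each nodule's predicted rating distribution and its radiologists' observed rating distribution are compared with L1 distance; for each threshold t the curve gives the fraction of nodules whose distance is at most t, and the area under the curve is normalized by the threshold range so a perfect prediction scores 1. All function names and example data are hypothetical.

```python
# Hypothetical sketch of a distance-threshold curve and its normalized area.
# ASSUMPTIONS: predictions and radiologist ratings are discrete probability
# distributions over the same rating levels, compared with L1 distance.
from typing import List, Sequence


def l1_distance(p: Sequence[float], q: Sequence[float]) -> float:
    """L1 distance between two discrete distributions (ranges from 0 to 2)."""
    return sum(abs(a - b) for a, b in zip(p, q))


def distance_threshold_curve(predicted, observed, thresholds) -> List[float]:
    """For each threshold t, return the fraction of cases with distance <= t."""
    distances = [l1_distance(p, q) for p, q in zip(predicted, observed)]
    n = len(distances)
    return [sum(d <= t for d in distances) / n for t in thresholds]


def area_under_curve(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Trapezoidal area divided by the threshold range (needs >= 2 thresholds)."""
    area = 0.0
    for i in range(1, len(xs)):
        area += (xs[i] - xs[i - 1]) * (ys[i] + ys[i - 1]) / 2
    return area / (xs[-1] - xs[0])


if __name__ == "__main__":
    # Two hypothetical nodules rated on a 3-level scale.
    predicted = [[0.5, 0.5, 0.0], [0.0, 1.0, 0.0]]
    observed = [[0.5, 0.5, 0.0], [0.0, 0.0, 1.0]]
    ts = [0.0, 0.5, 1.0, 1.5, 2.0]
    curve = distance_threshold_curve(predicted, observed, ts)
    print(curve)                        # [0.5, 0.5, 0.5, 0.5, 1.0]
    print(area_under_curve(ts, curve))  # 0.5625
```

Under this reading, a higher normalized area means more nodules have predicted distributions close to the radiologists' distribution at small thresholds; the choice of distance measure would change the numbers.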


Subject(s)
Decision Trees , Lung Neoplasms/classification , Probability , Humans , Lung Neoplasms/diagnostic imaging , Tomography, X-Ray Computed
20.
Article in English | MEDLINE | ID: mdl-19965054

ABSTRACT

Clinical narratives, such as radiology and pathology reports, are commonly available in electronic form. However, they are also commonly entered and stored as free text. Knowledge of the structure of clinical narratives is necessary for enhancing the productivity of healthcare departments and facilitating research. This study attempts to automatically segment medical reports into semantic sections. Our goal is to develop a robust and scalable medical report segmentation system requiring minimal user input for efficient retrieval and extraction of information from free-text clinical narratives. Hand-crafted rules were used to automatically identify a high-confidence training set. This automatically created training dataset was later used to develop metrics and an algorithm that determines the semantic structure of the medical reports. A word-vector cosine similarity metric combined with several heuristics was used to classify each report sentence into one of several predefined semantic sections. This baseline algorithm achieved 79% accuracy. A Support Vector Machine (SVM) classifier trained on additional formatting and contextual features achieved 90% accuracy. Plans for future work include developing a configurable system that could accommodate various medical report formatting and content standards.
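A minimal sketch of the baseline idea (assigning each report sentence to a section by word-vector cosine similarity) may help make the approach concrete. It assumes plain bag-of-words count vectors and one summed centroid per section; the section labels, training sentences, and function names are hypothetical, and neither the authors' heuristics nor the SVM stage is reproduced.

```python
# Hypothetical sketch: classify report sentences into sections by cosine
# similarity between bag-of-words vectors and per-section centroids.
# ASSUMPTIONS: count vectors, invented labels and sentences (not from the paper).
import math
import re
from collections import Counter

TRAINING = [
    ("Clinical history: cough and fever for two weeks.", "HISTORY"),
    ("Patient with history of smoking.", "HISTORY"),
    ("There is a 5 mm nodule in the right upper lobe.", "FINDINGS"),
    ("No pleural effusion is seen.", "FINDINGS"),
    ("Impression: small pulmonary nodule, recommend follow-up CT.", "IMPRESSION"),
]


def vectorize(text: str) -> Counter:
    """Lowercased bag-of-words term counts (alphabetic tokens only)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse count vectors; 0.0 if either is empty."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def build_centroids(training):
    """Sum the term counts of all training sentences for each section label."""
    centroids = {}
    for sentence, section in training:
        centroids.setdefault(section, Counter()).update(vectorize(sentence))
    return centroids


def classify(sentence: str, centroids) -> str:
    """Return the section whose centroid is most cosine-similar to the sentence."""
    v = vectorize(sentence)
    return max(centroids, key=lambda s: cosine(v, centroids[s]))


if __name__ == "__main__":
    centroids = build_centroids(TRAINING)
    print(classify("A small nodule is seen in the left lower lobe.", centroids))  # FINDINGS
    print(classify("Impression: nodule, recommend follow-up.", centroids))  # IMPRESSION
```

In the setting the abstract describes, the training pairs would come from the rule-identified high-confidence set rather than being written by hand.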


Subject(s)
Algorithms , Artificial Intelligence , Documentation/methods , Information Storage and Retrieval/methods , Medical Records , Natural Language Processing , Pattern Recognition, Automated/methods , Semantics