Results 1 - 6 of 6
1.
J Imaging Inform Med ; 2024 Apr 23.
Article in English | MEDLINE | ID: mdl-38653911

ABSTRACT

In this paper, we focus on indexing mechanisms for unstructured clinical big integrated data repository systems. Clinical data is unstructured and heterogeneous and comes in different files and formats, so accessing it efficiently and effectively is a critical challenge. Traditional indexing mechanisms are difficult to apply to unstructured data, especially when correlation information between clinical data elements must be identified. In this work, we developed a correlation-aware, relevance-based index that retrieves clinical data by fetching the most relevant cases efficiently. In our previous work, we designed a methodology that categorizes medical data based on the semantics of data elements and merges them into an integrated repository, and we developed a data integration system for medical data sources that combines heterogeneous medical data and gives different users access to knowledge-based database repositories. Here, we designed an indexing system that uses semantic tags extracted from clinical data sources and medical ontologies to retrieve relevant data from database repositories and speed up data retrieval. Our objective is to provide an integrated biomedical database repository that can be used by radiologists as a reference, for patient care, or by researchers. We focus on designing a technique that performs data processing for data integration, learns the semantic properties of data elements, and builds a correlation-aware topic index that facilitates efficient data retrieval. We generated semantic tags by identifying key elements from integrated clinical cases using topic modeling techniques, and we investigated a technique that identifies tags for merged categories and provides an index to fetch data from an integrated database repository. We also developed a topic coherence matrix that shows how well a topic is supported by the corpus of clinical cases and medical ontologies. Using the annotation index on the integrated database repository, we found more relevant results and observed a 61% increase in recall. We evaluated the results with the help of experts and compared them with a naive index (an index built from all terms in the corpus). Our approach improved data retrieval quality by returning the most relevant results and reduced retrieval time because the correlation-aware index is applied to an integrated data repository. The topic indexing approach proposed in this work identifies tags based on correlations between different data elements, improves data retrieval time, and returns the most relevant cases.
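As a rough illustration of the tagging step described above, the sketch below shows how topic-based semantic tags and a coherence score could be computed with gensim; the toy corpus, topic count, and variable names are illustrative assumptions and do not reproduce the paper's integrated repository or annotation index.

```python
# Minimal sketch: derive topic-based semantic tags and a coherence score with
# gensim. The toy corpus and topic count are illustrative assumptions only.
from gensim.corpora import Dictionary
from gensim.models import LdaModel, CoherenceModel

# Hypothetical pre-tokenized clinical case descriptions.
cases = [
    ["lung", "nodule", "spiculated", "margin", "ct"],
    ["fracture", "femur", "pediatric", "radiograph"],
    ["lung", "mass", "malignancy", "ct", "biopsy"],
]

dictionary = Dictionary(cases)
corpus = [dictionary.doc2bow(doc) for doc in cases]

# Learn a small number of topics; each topic's top words act as semantic tags.
lda = LdaModel(corpus=corpus, id2word=dictionary, num_topics=2, passes=10, random_state=0)
tags = {t: [w for w, _ in lda.show_topic(t, topn=3)] for t in range(lda.num_topics)}

# Topic coherence (u_mass) indicates how well each topic is supported by the corpus.
coherence = CoherenceModel(model=lda, corpus=corpus, dictionary=dictionary,
                           coherence="u_mass").get_coherence()
print(tags, coherence)
```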

2.
Front Big Data ; 6: 1173038, 2023.
Article in English | MEDLINE | ID: mdl-37139170

ABSTRACT

Data integration is a well-motivated problem in the clinical data science domain. The availability of patient data, reference clinical cases, and research datasets has the potential to advance the healthcare industry. However, the unstructured (text, audio, or video) and heterogeneous nature of the data, the variety of data standards and formats, and patient privacy constraints make data interoperability and integration a challenge. Clinical text is further categorized into different semantic groups and may be stored in different files and formats; even the same organization may store cases in different data structures, making integration more challenging. With such inherent complexity, domain experts and domain knowledge are often necessary to perform data integration, but expert human labor is time- and cost-prohibitive. To overcome the variability in the structure, format, and content of the different data sources, we map the text into common categories and compute similarity within those categories. In this paper, we present a method that categorizes and merges clinical data by considering the underlying semantics of the cases and uses reference information about the cases to perform data integration. Evaluation shows that we were able to merge 88% of clinical data from five different sources.
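The category-wise similarity idea can be illustrated with a small scikit-learn sketch; the category name, sample text, and similarity threshold below are assumptions for illustration, not the paper's actual pipeline.

```python
# Minimal sketch of category-wise semantic matching for merging clinical cases.
# The category, sample text, and 0.5 threshold are illustrative assumptions.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical cases from two sources, already mapped to a common category.
source_a = {"findings": "spiculated nodule in the right upper lobe"}
source_b = {"findings": "right upper lobe nodule with spiculated margins"}

def similar_within_category(a, b, category, threshold=0.5):
    """Return True if the two cases look mergeable for the given category."""
    vec = TfidfVectorizer()
    tfidf = vec.fit_transform([a[category], b[category]])
    score = cosine_similarity(tfidf[0], tfidf[1])[0, 0]
    return score >= threshold

print(similar_within_category(source_a, source_b, "findings"))
```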

3.
Annu Int Conf IEEE Eng Med Biol Soc ; 2020: 1254-1257, 2020 07.
Article in English | MEDLINE | ID: mdl-33018215

ABSTRACT

Computer-aided diagnosis (CAD) systems have long aimed to support clinical practice by providing a second opinion that helps doctors make decisions. However, most machine-learning-based CAD systems make predictions without explicitly showing how those predictions were generated. Since the cognitive process of diagnostic image interpretation involves various visual characteristics of the region of interest, explanations of the results should leverage those characteristics. We encode the visual characteristics of the region of interest based on pairs of similar images rather than on the image content by itself. Using a Siamese convolutional neural network (SCNN), we first learn the similarity among nodules, then encode image content using the SCNN similarity-based feature representation, and finally apply the k-nearest neighbor (KNN) approach to make diagnostic characterizations from the Siamese-based image features. We demonstrate the feasibility of our approach on spiculation, a visual characteristic that radiologists consider when interpreting the degree of cancer malignancy, using the NIH/NCI Lung Image Database Consortium (LIDC) dataset, which contains both spiculation and malignancy characteristics for lung nodules. Clinical Relevance - This establishes that spiculation can be quantified to automate the diagnostic characterization of lung nodules in computed tomography images.
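A minimal PyTorch sketch of the Siamese-embedding-plus-KNN idea follows; the network architecture, contrastive-loss margin, and randomly generated patches are illustrative assumptions rather than the authors' model.

```python
# Minimal sketch: a small CNN maps nodule patches to embeddings, a contrastive
# loss pulls similar pairs together, and a KNN classifier labels characteristics
# (e.g., spiculation) in embedding space. All data here is synthetic.
import torch
import torch.nn as nn
import torch.nn.functional as F
from sklearn.neighbors import KNeighborsClassifier

class EmbeddingNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(8, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.fc = nn.Linear(16 * 8 * 8, 32)  # assumes 32x32 input patches

    def forward(self, x):
        return self.fc(self.conv(x).flatten(1))

def contrastive_loss(z1, z2, same, margin=1.0):
    d = F.pairwise_distance(z1, z2)
    return (same * d.pow(2) + (1 - same) * F.relu(margin - d).pow(2)).mean()

net = EmbeddingNet()
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

# One illustrative training step on random "nodule" pairs.
x1, x2 = torch.randn(16, 1, 32, 32), torch.randn(16, 1, 32, 32)
same = torch.randint(0, 2, (16,)).float()  # 1 = similar pair, 0 = dissimilar
loss = contrastive_loss(net(x1), net(x2), same)
opt.zero_grad(); loss.backward(); opt.step()

# KNN over the learned embeddings to characterize, e.g., spiculation.
with torch.no_grad():
    emb = net(torch.randn(20, 1, 32, 32)).numpy()
labels = [i % 2 for i in range(20)]  # placeholder spiculation labels
knn = KNeighborsClassifier(n_neighbors=3).fit(emb, labels)
print(knn.predict(emb[:2]))
```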


Subject(s)
Lung Neoplasms , Radiographic Image Interpretation, Computer-Assisted , Humans , Lung , Lung Neoplasms/diagnostic imaging , Neural Networks, Computer , Tomography, X-Ray Computed
4.
J Digit Imaging ; 33(3): 797-813, 2020 06.
Article in English | MEDLINE | ID: mdl-32253657

ABSTRACT

Radiology teaching file repositories contain a large amount of information about patient health and radiologists' interpretations of medical findings. Although valuable for radiology education, the use of teaching file repositories has been hindered by the difficulty of performing advanced searches on them, given the unstructured format of the data and the sparseness of the different repositories. Our term coverage analysis of two major medical ontologies, the Radiology Lexicon (RadLex) and the Unified Medical Language System (UMLS) Systematized Nomenclature of Medicine Clinical Terms (SNOMED CT), and two teaching file repositories, the Medical Imaging Resource Community (MIRC) and MyPacs, showed that both ontologies combined cover 56.3% of the terms in MIRC and only 17.9% of the terms in MyPacs. Furthermore, the overlap between the two ontologies (i.e., terms included in both RadLex and UMLS SNOMED CT) was a mere 5.6% for MIRC and 2% for MyPacs. Clustering the content of the teaching file repositories showed that they focus on different diagnostic areas within radiology. The MIRC teaching file covers mostly pediatric cases; a few cases are female patients with heart-, chest-, and bone-related diseases. MyPacs contains a range of diseases with no focus on a particular disease category, gender, or age group, and provides a wide variety of cases related to the neck, face, heart, chest, and breast. These findings provide valuable insights into which new cases should be added and how existing cases may be integrated to build more comprehensive data repositories. Similarly, the low term coverage by the ontologies shows the need to expand them with new terminology, such as terms learned from these teaching file repositories and validated by experts. While our methodology for organizing and indexing data using clustering approaches and medical ontologies is applied here to teaching file repositories, it can be applied to any other medical clinical data.
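The term-coverage computation can be illustrated with a short sketch: coverage is the fraction of a repository's terms found in an ontology's vocabulary, and overlap is the fraction found in both ontologies. The toy term sets below are assumptions and do not reproduce the MIRC, MyPacs, RadLex, or SNOMED CT figures.

```python
# Minimal sketch of set-based term coverage and overlap between a teaching-file
# repository and two ontologies. All term sets are toy examples.
def coverage(repo_terms, onto_terms):
    return len(repo_terms & onto_terms) / len(repo_terms)

repo = {"nodule", "fracture", "spiculation", "pneumothorax"}   # teaching-file terms
radlex = {"nodule", "spiculation"}                             # hypothetical subsets
snomed = {"nodule", "fracture", "pneumothorax"}

print("combined coverage:", coverage(repo, radlex | snomed))   # terms in either ontology
print("overlap coverage:", coverage(repo, radlex & snomed))    # terms in both ontologies
```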


Subject(s)
Computer-Assisted Instruction , Radiology Information Systems , Radiology , Child , Female , Humans , Radiography , Radiology/education , Systematized Nomenclature of Medicine
5.
JMIR Ment Health ; 4(1): e2, 2017 Jan 10.
Article in English | MEDLINE | ID: mdl-28073737

ABSTRACT

BACKGROUND: College can be stressful for many freshmen as they cope with a variety of stressors, and excess stress can negatively affect both psychological and physical health. There is therefore a need for innovative and cost-effective strategies to identify students experiencing high levels of stress so that they can receive appropriate treatment. Social media use has been growing rapidly, and recent studies have reported that data from these technologies can be used for public health surveillance. To date, no studies have examined whether Twitter data can be used to monitor stress level and emotional state among college students. OBJECTIVE: The primary objective of our study was to investigate whether students' perceived levels of stress were associated with the sentiment and emotions of their tweets. The secondary objective was to explore whether students' emotional state was associated with the sentiment and emotions of their tweets. METHODS: We recruited 181 first-year freshman students aged 18-20 years at the University of California, Los Angeles. All participants completed a questionnaire that assessed their demographic characteristics, levels of stress, and emotional state over the previous 7 days; all questionnaires were completed within a 48-hour period. All tweets posted by the participants during that week (November 2 to 8, 2015) were mined and manually categorized by the sentiment (positive, negative, neutral) and emotion (anger, fear, love, happiness) they expressed. Ordinal regressions were used to assess whether weekly levels of stress and emotional states were associated with the percentage of positive, neutral, negative, anger, fear, love, or happiness tweets. RESULTS: A total of 121 participants completed the survey and were included in our analysis, and a total of 1879 tweets were analyzed. A higher level of weekly stress was significantly associated with a greater percentage of negative sentiment tweets (beta=1.7, SE 0.7; P=.02) and tweets containing emotions of fear (beta=2.4, SE 0.9; P=.01) and love (beta=3.6, SE 1.4; P=.01). A greater level of anger was negatively associated with the percentage of positive sentiment tweets (beta=-1.6, SE 0.8; P=.05) and tweets related to the emotion of happiness (beta=-2.2, SE 0.9; P=.02). A greater level of fear was positively associated with the percentage of negative sentiment tweets (beta=1.67, SE 0.7; P=.01), particularly a greater proportion of tweets related to the emotion of fear (beta=2.4, SE 0.8; P=.01). Participants who reported a greater level of love showed a smaller percentage of negative sentiment tweets (beta=-1.3, SE 0.7; P=.05). Emotions of happiness were positively associated with the percentage of tweets related to the emotion of happiness (beta=-1.8, SE 0.8; P=.02) and negatively associated with the percentage of negative sentiment tweets (beta=-1.7, SE 0.7; P=.02) and tweets related to the emotion of fear (beta=-2.8, SE 0.8; P=.01). CONCLUSIONS: Sentiment and emotions expressed in tweets have the potential to provide real-time monitoring of stress level and emotional well-being in college students.
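A minimal sketch of the ordinal-regression setup, assuming statsmodels' OrderedModel and synthetic data in place of the study's survey responses and tweet percentages:

```python
# Minimal sketch: ordinal regression of an ordered weekly stress level on the
# percentage of negative-sentiment tweets. Data here is synthetic, not the study's.
import numpy as np
import pandas as pd
from statsmodels.miscmodels.ordinal_model import OrderedModel

rng = np.random.default_rng(0)
n = 120
pct_negative = rng.uniform(0, 1, n)                      # share of negative tweets per student
latent = 3 * pct_negative + rng.normal(0, 1, n)          # latent stress driver
stress_level = np.digitize(latent, [1, 2, 3])            # 0..3, higher = more stress
stress = pd.Series(pd.Categorical(stress_level, categories=[0, 1, 2, 3], ordered=True))

model = OrderedModel(stress, pct_negative.reshape(-1, 1), distr="logit")
result = model.fit(method="bfgs", disp=False)
print(result.summary())
```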

6.
Comput Biol Med ; 62: 294-305, 2015 Jul.
Article in English | MEDLINE | ID: mdl-25712071

ABSTRACT

Computer-aided diagnosis systems can play an important role in lowering the workload of clinical radiologists and reducing costs by automatically analyzing vast amounts of image data and providing meaningful and timely insights during the decision-making process. In this paper, we present strategies for better managing the limited time of clinical radiologists in conjunction with predictive model diagnosis. We first introduce a metric for discriminating between the different categories of diagnostic complexity (such as easy versus hard) encountered when interpreting CT scans. Second, we propose to learn the diagnostic complexity using a classification approach based on low-level image features automatically extracted from pixel data. We then show how this classification can be used to decide how best to allocate additional radiologists to interpret a case based on its diagnostic category. Using a lung nodule image dataset, we determined that, by simply dividing cases into hard and easy to diagnose, interpretations can be distributed so as to significantly lower cost with limited loss in prediction accuracy. Furthermore, we show that with just a few low-level image features (18% of the original set) we are able to separate the easy from the hard cases for a significant subset (66%) of the lung nodule image data.
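A minimal sketch of the triage idea, classifying easy versus hard cases from low-level features and allocating extra reads to the hard ones; the synthetic features, labels, and per-case reader budget are illustrative assumptions, not the paper's metric or dataset.

```python
# Minimal sketch: predict diagnostic complexity (easy vs. hard) from low-level
# image features, then assign additional radiologist reads only to hard cases.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 12))                                  # low-level features per case
y_hard = (X[:, 0] + rng.normal(0, 0.5, 200) > 0).astype(int)    # 1 = hard to diagnose

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X[:150], y_hard[:150])
is_hard = clf.predict(X[150:])

# Simple allocation rule: hard cases get 3 readers, easy cases get 1.
reads_per_case = np.where(is_hard == 1, 3, 1)
print("total reads needed:", reads_per_case.sum(), "for", len(is_hard), "cases")
```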


Subject(s)
Diagnosis, Computer-Assisted/methods , Image Processing, Computer-Assisted/methods , Lung Neoplasms/diagnostic imaging , Diagnosis, Computer-Assisted/economics , Female , Humans , Image Processing, Computer-Assisted/economics , Male , Radiography