Search | VHL Regional Portal

1.

Effect of semantic distance on learning structured query language: An empirical study.

Shin, Shin-Shing.

Front Psychol ; 13: 996363, 2022.

Article in English | MEDLINE | ID: mdl-36438342

ABSTRACT

Students of database courses usually encounter difficulties in learning structured query language (SQL). Numerous studies have been conducted to improve how students learn SQL. However, learning SQL remains difficult. This study analyzed the difficulties in learning SQL from the viewpoint of semantic distance by using semantic network theory. An experiment involving a database course was performed to assess the influence of semantic distance on learners' understanding of SQL. The participants were asked to perform a query-writing task at the end of the course to investigate their understanding of SQL. The data analysis results indicated that the participants developed a better understanding of the formulation-to-planning transformation than of the planning-to-coding transformation. This implies that the semantic distance of the planning-to-coding transformation is greater than that of the formulation-to-planning transformation. The semantic distance of the planning-to-coding transformation is attributable to the semantic transformation from natural language to SQL, which are two essentially different languages that belong to different knowledge categories. Accordingly, this study concludes that SQL learning difficulties can mainly be ascribed to the planning-to-coding transformation because of its large semantic distance. The findings suggest that SQL instruction should emphasize the semantic mapping of the planning-to-coding transformation by incorporating materials related to the transformation, and should shorten the semantic distance involved in learning SQL. These two principles can be used to evaluate the effectiveness of SQL teaching methods in assisting SQL learning and can motivate researchers to develop more effective teaching methods from the viewpoint of semantic distance.
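To make the abstract's stages concrete, the sketch walks one question through formulation, planning, and coding. The stage names come from the abstract; the `student`/`enrolment` schema, the rows, and the question are hypothetical and are not taken from the study's materials. The comments mark where the planning-to-coding step has to map plain-language steps onto SQL clauses.

```python
import sqlite3

# Hypothetical schema and data, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT, dept TEXT);
    CREATE TABLE enrolment (student_id INTEGER, course TEXT);
    INSERT INTO student VALUES (1, 'Ana', 'CS'), (2, 'Ben', 'EE'), (3, 'Chen', 'CS');
    INSERT INTO enrolment VALUES (1, 'Databases'), (2, 'Databases'), (3, 'Networks');
""")

# Formulation (natural language):
#   "Which CS students are enrolled in Databases?"
# Planning (structured, but still natural language):
#   1. take students whose department is CS
#   2. keep those with an enrolment row for the Databases course
#   3. return their names
# Coding (SQL): each planning step is mapped onto SQL constructs:
#   the link in step 2 becomes a JOIN, the filters become WHERE, step 3 becomes SELECT.
query = """
    SELECT s.name
    FROM student AS s
    JOIN enrolment AS e ON e.student_id = s.id
    WHERE s.dept = 'CS' AND e.course = 'Databases'
"""
names = [row[0] for row in conn.execute(query)]
print(names)  # ['Ana']
```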

2.

Use of automatic SQL generation interface to enhance transparency and validity of health-data analysis.

Wagholikar, Kavishwar B; Zelle, David; Ainsworth, Layne; Chaney, Kira; Blood, Alexander J; Miller, Angela; Chulyadyo, Rupendra; Oates, Michael; Gordon, William J; Aronson, Samuel J; Scirica, Benjamin M; Murphy, Shawn N.

Inform Med Unlocked ; 31, 2022.

Article in English | MEDLINE | ID: mdl-35874460

ABSTRACT

Analysis of health data typically requires a data analyst to develop queries in structured query language (SQL). Because the SQL queries are created manually, they are prone to errors. In addition, accurate implementation of the queries depends on effective communication with clinical experts, which further makes the analysis error-prone. As a potential resolution, we explore an alternative approach in which a graphical interface that automatically generates the SQL queries is used to perform the analysis. The interface allows clinical experts to perform complex queries on the data directly, despite their unfamiliarity with SQL syntax. It also provides an intuitive understanding of the query logic, which makes the analysis transparent and comprehensible to the clinical study staff, thereby enhancing the transparency and validity of the analysis. This study demonstrates the feasibility of using a user-friendly interface that automatically generates SQL for analysis of health data. It outlines challenges that will be useful for designing user-friendly tools to improve the transparency and reproducibility of data analysis.
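A minimal sketch of the general idea, assuming a single hypothetical `patients` table: criteria picked in a form become a parameterized SQL string that can be shown to clinical staff before it runs. This is not the interface described in the study; the table, its columns, and the `build_query` helper are hypothetical.

```python
import sqlite3

# Hypothetical whitelist: identifiers cannot be bound as SQL parameters,
# so only known columns and operators are accepted.
ALLOWED_COLUMNS = {"age", "sex", "diagnosis"}
ALLOWED_OPS = {"=", "<", ">", "<=", ">="}

def build_query(criteria):
    """Turn (column, operator, value) triples from a form into a parameterized SQL count."""
    clauses, params = [], []
    for column, op, value in criteria:
        if column not in ALLOWED_COLUMNS or op not in ALLOWED_OPS:
            raise ValueError(f"unsupported criterion: {column} {op}")
        clauses.append(f"{column} {op} ?")
        params.append(value)
    where = " AND ".join(clauses) if clauses else "1 = 1"
    return f"SELECT COUNT(*) FROM patients WHERE {where}", params

# Small in-memory table standing in for a clinical data warehouse.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE patients (id INTEGER, age INTEGER, sex TEXT, diagnosis TEXT)")
conn.executemany(
    "INSERT INTO patients VALUES (?, ?, ?, ?)",
    [(1, 54, "F", "heart failure"), (2, 67, "M", "heart failure"), (3, 45, "M", "asthma")],
)

sql, params = build_query([("diagnosis", "=", "heart failure"), ("age", ">=", 60)])
print(sql)  # SELECT COUNT(*) FROM patients WHERE diagnosis = ? AND age >= ?
print(conn.execute(sql, params).fetchone()[0])  # 1
```

Whitelisting column names and operators, and binding values as parameters, keeps user-selected criteria from being spliced into the SQL text directly, while the generated string remains readable for review.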

3.

Neural Network Assisted Pathology Case Identification.

Cheng, Jerome.

J Pathol Inform ; 13: 100008, 2022.

Article in English | MEDLINE | ID: mdl-35242447

ABSTRACT

BACKGROUND: Traditionally, cases for cohort selection and quality assurance purposes are identified through structured query language (SQL) searches matching specific keywords. Recently, several neural network-based natural language processing (NLP) pipelines have emerged as an accurate alternative or complementary method for case retrieval. METHODS: The diagnosis sections of 1000 pathology reports containing the terms "colon" and "carcinoma" were retrieved from our laboratory information system through a SQL query. Each report was labeled as either positive or negative; a case was considered positive if it was a primary adenocarcinoma of the colon. Negative cases comprised adenocarcinomas from other sites, metastatic adenocarcinomas, benign conditions, rectal cancers, and other cases that did not fit the primary colonic adenocarcinoma category. The 1000 cases were randomly split into training, validation, and holdout sets. A convolutional neural network (CNN) model built using Keras (a neural network library) was trained to identify positive cases, and the model was applied to the holdout set to predict the category of each case. RESULTS: The CNN model correctly classified 141 of 149 primary colonic adenocarcinoma cases and 43 of 51 negative cases, achieving an accuracy of 92% and an area under the ROC curve (AUC) of 0.957. CONCLUSION: Trained convolutional neural network models, by themselves or as an adjunct to keyword and pattern-based text extraction methods, may be used to search for pathology cases of interest with high accuracy.
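The reported holdout counts can be checked with a few lines of arithmetic. Accuracy follows directly from the abstract; sensitivity and specificity are derived here from the same counts and are not stated in the abstract. The AUC cannot be recomputed from counts alone, and the `holdout_metrics` helper is a hypothetical name.

```python
def holdout_metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity and specificity from confusion-matrix counts."""
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }

# Counts from the abstract: 141 of 149 positives and 43 of 51 negatives
# were classified correctly, so 8 of each were misclassified.
m = holdout_metrics(tp=141, fn=149 - 141, tn=43, fp=51 - 43)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
# accuracy: 0.920
# sensitivity: 0.946
# specificity: 0.843
```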

4.

Development of a Structured Query Language and Natural Language Processing Algorithm to Identify Lung Nodules in a Cancer Centre.

Hunter, Benjamin; Reis, Sara; Campbell, Des; Matharu, Sheila; Ratnakumar, Prashanthi; Mercuri, Luca; Hindocha, Sumeet; Kalsi, Hardeep; Mayer, Erik; Glampson, Ben; Robinson, Emily J; Al-Lazikani, Bisan; Scerri, Lisa; Bloch, Susannah; Lee, Richard.

Front Med (Lausanne) ; 8: 748168, 2021.

Article in English | MEDLINE | ID: mdl-34805217

ABSTRACT

Importance: The stratification of indeterminate lung nodules is a growing problem, but the burden of lung nodules on healthcare services is not well-described. Manual service evaluation and research cohort curation can be time-consuming and could potentially be improved by automation. Objective: To automate lung nodule identification in a tertiary cancer centre. Methods: This retrospective cohort study used Electronic Healthcare Records to identify CT reports generated between 31st October 2011 and 24th July 2020. A structured query language/natural language processing tool was developed to classify reports according to lung nodule status. Performance was externally validated. Sentences were used to train machine-learning classifiers to predict concerning nodule features in 2,000 patients. Results: 14,586 patients with lung nodules were identified. The cancer types most commonly associated with lung nodules were lung (39%), neuro-endocrine (38%), skin (35%), colorectal (33%) and sarcoma (33%). Lung nodule patients had a greater proportion of metastatic diagnoses (45 vs. 23%, p < 0.001), a higher mean post-baseline scan number (6.56 vs. 1.93, p < 0.001), and a shorter mean scan interval (4.1 vs. 5.9 months, p < 0.001) than those without nodules. Inter-observer agreement for sentence classification was 0.94 internally and 0.98 externally. Sensitivity and specificity for nodule identification were 93 and 99% internally, and 100 and 100% at external validation, respectively. A linear support vector machine model predicted concerning sentence features with 94% accuracy. Conclusion: We have developed and validated an accurate tool for automated lung nodule identification that is valuable for service evaluation and research data acquisition.

5.

Validation of rule-based algorithms to determine colorectal, breast, and cervical cancer screening status using electronic health record data from an urban healthcare system in New York City.

Leder Macek, Aleeza J; Kirschenbaum, Joshua D; Ricklan, Sarah J; Schreiber-Stainthorp, William; Omene, Britney C; Conderino, Sarah.

Prev Med Rep ; 24: 101599, 2021 Dec.

Article in English | MEDLINE | ID: mdl-34976656

ABSTRACT

Although cancer screening has greatly reduced colorectal, breast, and cervical cancer morbidity and mortality over the last few decades, adherence to cancer screening guidelines remains inconsistent, particularly among certain demographic groups. This study aims to validate a rule-based algorithm to determine adherence to cancer screening. A novel screening algorithm was applied to electronic health record (EHR) data from an urban healthcare system in New York City to automatically determine adherence to national cancer screening guidelines for patients deemed eligible for screening. First, a subset of patients was randomly selected from the EHR, and their data were exported in de-identified form for manual review of screening adherence by two teams of human reviewers. Interrater reliability for manual review was calculated using Cohen's kappa and was high in all instances. The sensitivity and specificity of the algorithm were calculated by comparing its output with the final manual dataset. When assessing cancer screening adherence, the algorithm achieved high sensitivity (79%, 70%, and 80%) and specificity (92%, 99%, and 97%) for colorectal, breast, and cervical cancer screening, respectively. This study validates an algorithm that can effectively determine patient adherence to colorectal, breast, and cervical cancer screening guidelines. This design improves on previous methods of algorithm validation by using computerized extraction of essential components of patients' EHRs and by using de-identified data for manual review. Use of the described algorithm could allow more precise and efficient allocation of public health resources to improve cancer screening rates.
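Two of the quantities in this abstract, Cohen's kappa for interrater agreement, $\kappa = (p_o - p_e)/(1 - p_e)$, and sensitivity/specificity against the manual reference, can be sketched as follows. The labels are hypothetical (True meaning "adherent") and do not come from the study; `team_2` simply stands in for the final manual dataset.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """kappa = (p_o - p_e) / (1 - p_e) for two raters labelling the same items."""
    n = len(rater_a)
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n  # observed agreement
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: product of the two raters' label counts, summed over labels.
    p_e = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)

def sensitivity_specificity(predicted, reference):
    """Compare algorithm output with the manual reference (True = adherent)."""
    pairs = list(zip(predicted, reference))
    tp = sum(p and r for p, r in pairs)
    fn = sum((not p) and r for p, r in pairs)
    tn = sum((not p) and (not r) for p, r in pairs)
    fp = sum(p and (not r) for p, r in pairs)
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical labels for eight patients.
team_1 = [True, True, False, True, False, False, True, True]
team_2 = [True, True, False, False, False, False, True, True]
algorithm = [True, False, False, False, True, False, True, True]

kappa = cohens_kappa(team_1, team_2)
sens, spec = sensitivity_specificity(algorithm, team_2)
print(kappa)       # 0.75
print(sens, spec)  # 0.75 0.75
```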

6.

Making big data small.

Fan, Wenfei.

Proc Math Phys Eng Sci ; 475(2225): 20190034, 2019 May.

Article in English | MEDLINE | ID: mdl-31236056

ABSTRACT

Big data analytics is often prohibitively costly and is typically conducted by parallel processing on a cluster of machines. Is big data analytics beyond the reach of small companies that can afford only limited resources? This paper tackles this question by presenting Boundedly EvAluable SQL (BEAS), a system for querying big relations with constrained resources. The idea is to make big data small. To answer a query posed on a dataset, it often suffices to access a small fraction of the data, no matter how big the dataset is. In light of this, BEAS answers queries on big data by identifying and fetching the small set of data needed. Within the available resources, it computes exact answers whenever possible and otherwise returns approximate answers with accuracy guarantees. BEAS is built on principled approaches to bounded evaluation and data-driven approximation, which are the focus of this paper.
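A toy illustration of the bounded-evaluation idea, not BEAS itself: assume an access constraint under which each person has at most 5 friends and the `friend` relation is indexed by person. Answering "friends of person 0 who live in NYC" then touches a fixed number of tuples however large the dataset grows. The relations, the constraint, and the helper names are hypothetical, and the approximation side of BEAS is not shown.

```python
from collections import defaultdict

class IndexedRelation:
    """A relation indexed on one attribute, counting how many tuples lookups fetch."""
    def __init__(self, rows, key):
        self.index = defaultdict(list)
        for row in rows:
            self.index[row[key]].append(row)
        self.fetched = 0

    def lookup(self, value):
        rows = self.index.get(value, [])
        self.fetched += len(rows)
        return rows

def build_dataset(n, fanout=5):
    # person(pid, city); friend(pid, fid) with exactly `fanout` friends per person,
    # so the assumed access constraint "pid -> at most 5 fids" holds.
    person = [(pid, "NYC" if pid % 3 == 0 else "LA") for pid in range(n)]
    friend = [(pid, (pid + k) % n) for pid in range(n) for k in range(1, fanout + 1)]
    return IndexedRelation(person, key=0), IndexedRelation(friend, key=0)

def friends_in_city(pid, city, person_rel, friend_rel):
    # Bounded plan: at most 5 friend tuples, then one person tuple per friend.
    result = []
    for _, fid in friend_rel.lookup(pid):
        for _, fcity in person_rel.lookup(fid):
            if fcity == city:
                result.append(fid)
    return result

for n in (1_000, 100_000):
    person_rel, friend_rel = build_dataset(n)
    answer = friends_in_city(0, "NYC", person_rel, friend_rel)
    total_fetched = person_rel.fetched + friend_rel.fetched
    print(n, answer, total_fetched)
# 1000 [3] 10
# 100000 [3] 10
```

The dataset grows a hundredfold, but the query still reads 10 tuples, because the index and the assumed bound on friends per person cap what has to be fetched.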

7.

Automated identification of an aspirin-exacerbated respiratory disease cohort.

Cahill, Katherine N; Johns, Christina B; Cui, Jing; Wickner, Paige; Bates, David W; Laidlaw, Tanya M; Beeler, Patrick E.

J Allergy Clin Immunol ; 139(3): 819-825.e6, 2017 Mar.

Article in English | MEDLINE | ID: mdl-27567328

ABSTRACT

BACKGROUND: Aspirin-exacerbated respiratory disease (AERD) is characterized by 3 clinical features: asthma, nasal polyposis, and respiratory reactions to cyclooxygenase-1 inhibitors (nonsteroidal anti-inflammatory drugs). Electronic health records (EHRs) contain information on each feature of this triad. OBJECTIVE: We sought to determine whether an informatics algorithm applied to the EHR could electronically identify patients with AERD. METHODS: We developed an informatics algorithm to search the EHRs of patients aged 18 years and older from the Partners Healthcare system over a 10-year period (2004-2014). Charts containing search terms for asthma, nasal polyps, and a record of respiratory (cohort A) or unspecified (cohort B) reactions to nonsteroidal anti-inflammatory drugs were identified as "possible AERD." Two clinical experts reviewed all charts to confirm a diagnosis of "clinical AERD" and classify cases as "diagnosed AERD" or "undiagnosed AERD" on the basis of physician-documented AERD-specific terms in patient notes. RESULTS: Our algorithm identified 731 "possible AERD" cases, of which 638 were not in our AERD patient registry. Chart review of cohorts A (n = 511) and B (n = 127) demonstrated a positive predictive value of 78.4% for "clinical AERD," which rose to 88.7% when unspecified reactions were excluded. Of those with clinical AERD, 12.4% had no mention of AERD by any treating caregiver and were classified as "undiagnosed AERD." "Undiagnosed AERD" cases were less likely than "diagnosed AERD" cases to have been seen by an allergist/immunologist (38.7% vs 93.2%; P < .0001). CONCLUSIONS: An informatics algorithm can successfully identify both known and previously undiagnosed cases of AERD with a high positive predictive value. Involvement of an allergist/immunologist significantly increases the likelihood of an AERD diagnosis.

Subject(s)

Algorithms , Asthma, Aspirin-Induced/diagnosis , Cyclooxygenase Inhibitors/adverse effects , Nasal Polyps/diagnosis , Adult , Aged , Computational Biology , Female , Humans , Male , Middle Aged
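The cohort rule in the METHODS section above can be sketched as a simple function over a chart's extracted terms. The term labels and the `classify_chart` helper are hypothetical; the study's actual search terms are not given in the abstract. A chart needs asthma and nasal polyps plus a documented NSAID reaction; respiratory reactions go to cohort A and unspecified reactions to cohort B.

```python
def classify_chart(terms):
    """Return 'A', 'B', or None for a chart, given a set of hypothetical term labels."""
    if not {"asthma", "nasal polyps"} <= terms:
        return None  # both asthma and nasal polyps are required
    if "nsaid respiratory reaction" in terms:
        return "A"   # possible AERD, respiratory reaction documented
    if "nsaid reaction unspecified" in terms:
        return "B"   # possible AERD, reaction type unspecified
    return None

charts = {
    "p1": {"asthma", "nasal polyps", "nsaid respiratory reaction"},
    "p2": {"asthma", "nasal polyps", "nsaid reaction unspecified"},
    "p3": {"asthma", "nsaid respiratory reaction"},
}
labels = {pid: classify_chart(terms) for pid, terms in charts.items()}
print(labels)  # {'p1': 'A', 'p2': 'B', 'p3': None}
```

Checking the respiratory reaction first means a chart documenting both reaction types lands in cohort A, a choice made for this sketch only.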

8.

Glaucoma database.

K, Rangachari; M, Dhivya; Pj, Eswari Pandaranayaka; N, Prasanthi; P, Sundaresan; Sr, Krishnadas; S, Krishnaswamy.

Bioinformation ; 5(9): 398-9, 2011 Feb 07.

Article in English | MEDLINE | ID: mdl-21383909

ABSTRACT

UNLABELLED: Glaucoma, a complex heterogeneous disease, is the leading cause of optic nerve-related blindness worldwide. Primary open-angle glaucoma (POAG) is the most common subtype, and it is estimated that approximately 60 million people will be affected by the year 2020. MYOC, OPTN, CYP1B1 and WDR36 are the main candidate genes. Nearly 4% of glaucoma patients have a mutation in one of these genes. Mutations in any of these genes cause disease either directly or indirectly, and the severity of the disease varies according to the position of the genes. We compiled all the related mutations and SNPs in these genes and developed a database that provides access to statistical and clinical information on particular mutations. The database, constructed using SQL, contains data on the SNPs and mutations in these genes and relevant study data. AVAILABILITY: The database is available for free at http://bicmku.in:8081/glaucoma.

9.

Development and Evaluation of Classification Codes and Retrieval Program of the Interpretation of Nuclear Medicine Imaging Studies / Journal of Korean Society of Medical Informatics

Hyung-Jae LEE; Hee-Seung BOM; Seong-Young KWON; Young-Soon SEO; Jung-Jun MIN; Ho-Chun SONG.

Journal of Korean Society of Medical Informatics ; : 383-390, 2005.

Article in Korean | WPRIM (Western Pacific) | ID: wpr-91267

ABSTRACT

OBJECTIVE: To evaluate the usefulness of classification codes and a retrieval program for the interpretation of nuclear medicine imaging studies. METHODS: We retrieved specific results from the interpretations of 3,613 nuclear medicine imaging studies stored on the database server of Chonnam National University Hwasun Hospital, either with the classification code retrieval program or by searching narrative phrases with structured query language (SQL). The accuracy of the retrieved results and the retrieval time were compared between the two groups. RESULTS: Results retrieved with SQL were less accurate than those retrieved with classification codes. Neither the retrieval program nor SQL caused a response delay or network traffic overload. CONCLUSION: Retrieving specific results from the database of nuclear medicine imaging study interpretations with classification codes and the retrieval program was more accurate and convenient than searching narrative phrases with SQL.

Subject(s)

Classification , Nuclear Medicine
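The difference between the two retrieval approaches in this study can be illustrated with hypothetical reports: a `LIKE` search on a narrative phrase picks up a negated finding and misses a reworded one, while a structured code field does neither. The table, the codes, and the report texts are invented and are not the hospital's classification codes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE reports (id INTEGER PRIMARY KEY, code TEXT, text TEXT)")
conn.executemany(
    "INSERT INTO reports (code, text) VALUES (?, ?)",
    [
        ("BONE-MET", "Multiple foci suggesting bone metastasis."),
        ("BONE-NORMAL", "No evidence of bone metastasis."),
        ("BONE-MET", "Increased uptake in the spine, consistent with metastatic disease."),
    ],
)

# Narrative phrase search: matches the negated report 2 and misses the reworded report 3.
phrase_ids = [row[0] for row in conn.execute(
    "SELECT id FROM reports WHERE text LIKE ? ORDER BY id", ("%metastasis%",))]
# Classification-code search: returns exactly the reports coded as metastasis.
code_ids = [row[0] for row in conn.execute(
    "SELECT id FROM reports WHERE code = ? ORDER BY id", ("BONE-MET",))]
print(phrase_ids)  # [1, 2]
print(code_ids)    # [1, 3]
```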
