Results 1 - 20 of 968
1.
Gigascience ; 13, 2024 Jan 02.
Article in English | MEDLINE | ID: mdl-39101783

ABSTRACT

BACKGROUND: Visualization is an indispensable facet of genomic data analysis. Despite the abundance of specialized visualization tools, there remains a distinct need for tailored solutions. However, their implementation typically requires extensive programming expertise from bioinformaticians and software developers, especially when building interactive applications. Toolkits based on visualization grammars offer a more accessible, declarative way to author new visualizations. Yet, current grammar-based solutions fall short in adequately supporting the interactive analysis of large datasets with extensive sample collections, a pivotal task often encountered in cancer research. FINDINGS: We present GenomeSpy, a grammar-based toolkit for authoring tailored, interactive visualizations for genomic data analysis. By using combinatorial building blocks and a declarative language, users can implement new visualization designs easily and embed them in web pages or end-user-oriented applications. A distinctive element of GenomeSpy's architecture is its effective use of the graphics processing unit in all rendering, enabling a high frame rate and smoothly animated interactions, such as navigation within a genome. We demonstrate the utility of GenomeSpy by characterizing the genomic landscape of 753 ovarian cancer samples from patients in the DECIDER clinical trial. Our results expand the understanding of the genomic architecture in ovarian cancer, particularly the diversity of chromosomal instability. CONCLUSIONS: GenomeSpy is a visualization toolkit applicable to a wide range of tasks pertinent to genome analysis. It offers high flexibility and exceptional performance in interactive analysis. The toolkit is open source with an MIT license, implemented in JavaScript, and available at https://genomespy.app/.


Subject(s)
Genomics , Software , Humans , Genomics/methods , Computer Graphics , Neoplasms/genetics , Ovarian Neoplasms/genetics , Genome, Human , User-Computer Interface , Female , Computational Biology/methods
2.
Vaccine ; 42(21): 126179, 2024 Aug 07.
Article in English | MEDLINE | ID: mdl-39116485

ABSTRACT

BACKGROUND: The Advisory Committee on Immunization Practices (ACIP) recommends early childhood vaccinations, but little is known about the magnitude and timing of vaccine delay for each recommended dose at the population level. We sought to characterize longitudinal patient-level patterns of early childhood vaccination schedule adherence. METHODS: Using the Merative MarketScan Commercial Database (2009-2019), we identified commercially insured infants who received at least one timely dose of a 2-month recommended vaccine. We categorized the number of recommended vaccines administered on the same date at 2, 4, 6, and 12-15 months of age (grace period: -7, +21 days). A Sankey diagram illustrated the number of vaccines received concomitantly during each age window and depicted transitions between states over time (e.g., from no vaccine delay to vaccine delay). For each vaccine dose, we estimated the cumulative incidence of receipt. RESULTS: Among 1,239,364 eligible children, 28% of infants aged 4 months and 38% of infants aged 6 months did not receive timely, concomitant administration of all recommended vaccines. The number of timely vaccines received concomitantly and the age at receipt varied most for doses recommended during the second year of life. Children with a previously delayed (versus timely) dose consistently experienced a longer time to the subsequent dose. CONCLUSIONS: National coverage improved over time for all recommended vaccine doses under study, most notably for measles, mumps, and rubella. However, many children do not receive vaccines on schedule. Interventions to maintain adherence to the recommended schedule are needed early in life.
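A minimal Python sketch of the two calculations described above, under stated assumptions: a dose counts as timely if it falls within the -7/+21-day grace window around a target age (target ages in days are supplied by the caller, since the abstract does not give them), and the cumulative incidence of receipt is estimated as 1 minus the Kaplan-Meier survival curve, treating loss of insurance enrollment as censoring and ignoring competing risks. The abstract does not name its estimator, so this is an illustration rather than the authors' method, and the function names are hypothetical.

```python
from typing import List, Tuple


def is_timely(age_days: int, target_age_days: int, early: int = 7, late: int = 21) -> bool:
    """True if a dose given at age_days lies in [target - early, target + late]."""
    return target_age_days - early <= age_days <= target_age_days + late


def cumulative_incidence(times: List[float], received: List[bool]) -> List[Tuple[float, float]]:
    """Cumulative incidence of dose receipt as 1 - Kaplan-Meier survival (assumed estimator).

    times[i] is the age (days) at which child i received the dose if received[i]
    is True; otherwise it is the age at censoring (e.g. end of enrollment).
    Returns one (age, cumulative incidence) step per distinct receipt time.
    """
    if len(times) != len(received):
        raise ValueError("times and received must have the same length")
    order = sorted(range(len(times)), key=lambda i: times[i])
    at_risk = len(times)
    survival = 1.0
    steps: List[Tuple[float, float]] = []
    i = 0
    while i < len(order):
        t = times[order[i]]
        events = censored = 0
        # Group children sharing this time; those censored at t still count
        # as at risk at t, following the usual Kaplan-Meier convention.
        while i < len(order) and times[order[i]] == t:
            if received[order[i]]:
                events += 1
            else:
                censored += 1
            i += 1
        if events:
            survival *= 1 - events / at_risk
            steps.append((t, 1 - survival))
        at_risk -= events + censored
    return steps
```

For example, `cumulative_incidence([60, 62, 70, 90], [True, True, False, True])` yields steps at ages 60, 62, and 90; the child censored at day 70 leaves the risk set without counting as a receipt.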

3.
Curr Protoc ; 4(8): e1120, 2024 Aug.
Article in English | MEDLINE | ID: mdl-39126338

ABSTRACT

JBrowse 2 is a modular genome browser that can visualize many common genomic file formats. While JBrowse 2 supports a variety of different usages, it is particularly suited for deployment on websites, such as model organism databases or other web-based genomic data resources. This protocol provides detailed instructions for setting up JBrowse 2 on an Ubuntu Linux web server, loading a reference genome from a FASTA format file, and adding a gene annotation track from a GFF3 format file. By the end of the protocol, users will have a working JBrowse 2 instance that is accessible via the web. © 2024 The Author(s). Current Protocols published by Wiley Periodicals LLC. Basic Protocol: Setting up JBrowse 2 on your web server.


Subject(s)
Genomics , Genomics/methods , Software , Web Browser , Databases, Genetic , Internet , Genome/genetics , Humans , User-Computer Interface
4.
Mol Ecol Resour ; : e13996, 2024 Aug 04.
Article in English | MEDLINE | ID: mdl-39099161

ABSTRACT

The analysis of meta-omics data requires several bioinformatics tools and proficiency in informatics. The integration of multiple meta-omics data types is even more challenging, and the outputs of existing bioinformatics solutions are not always easy to interpret. Here, we present a meta-omics bioinformatics pipeline, Meta-Omics Software for Community Analysis (MOSCA), which aims to overcome these limitations. MOSCA was initially developed for analysing metagenomics (MG) and metatranscriptomics (MT) data. It now also performs integrated MG and metaproteomics (MP) analysis, and the MG/MT analysis was upgraded with an additional iterative binning step, metabolic pathway mapping, and several improvements to functional annotation and data visualization. MOSCA handles raw sequencing data and mass spectra and performs pre-processing, assembly, annotation, binning and differential gene/protein expression analysis. MOSCA presents taxonomic and functional analysis results in large tables, performs metabolic pathway mapping, generates Krona plots and shows gene/protein expression results in heatmaps, improving omics data visualization. MOSCA is easily run from a single command while also providing a web interface (MOSGUITO). Relevant features include an extensive set of customization options, allowing tailored analyses to suit specific research objectives, and the ability to restart the pipeline from intermediate checkpoints using alternative configurations. Two case studies showcased MOSCA results, giving a complete view of the anaerobic microbial communities from anaerobic digesters and insights into the role of specific microorganisms. MOSCA represents a pivotal advancement in meta-omics research, offering an intuitive, comprehensive, and versatile solution for researchers seeking to unravel the intricate tapestry of microbial communities.

5.
BMC Public Health ; 24(1): 2103, 2024 Aug 05.
Article in English | MEDLINE | ID: mdl-39098915

ABSTRACT

BACKGROUND: Black individuals in the U.S. face increasing racial disparities in drug overdose related to social determinants of health, including place-based features. Mobile outreach efforts work to mitigate social determinants by serving geographic areas with low access to drug treatment and overdose prevention, but are often limited by convenience-based targets. Geographic information systems (GIS) are often used to characterize and visualize the overdose crisis and could be translated to community settings to guide mobile outreach services. The current study examines the initial acceptability and appropriateness of GIS to facilitate data-driven outreach for reducing overdose inequities facing Black individuals. METHODS: We convened a focus group of stakeholders (N = 8) in leadership roles at organizations conducting mobile outreach in predominantly Black neighborhoods of St. Louis, MO. The organizations represented provided adult mental health and substance use treatment or harm reduction services. Participants were prompted to discuss current outreach strategies and provided feedback on preliminary GIS-derived maps displaying regional overdose epidemiology. A reflexive approach to thematic analysis was used to extract themes. RESULTS: Four themes were identified that contextualize the acceptability and utility of an overdose visualization tool to mobile service providers in Black communities. They were: 1) importance of considering broader community context; 2) potential for awareness, engagement, and community collaboration; 3) ensuring data relevance to the affected community; and 4) data manipulation and validity concerns. CONCLUSIONS: Mobile providers perceived several benefits of using GIS to map overdose in Black communities that are overburdened by the overdose crisis but under-resourced. 
Perceived potential benefits included informing location-based targets for services as well as improving awareness of the overdose crisis and facilitating collaboration, advocacy, and resource allocation. However, as GIS-enabled visualization of drug overdose grows in science, public health, and community settings, stakeholders must consider concerns undermining community trust and benefits, particularly for Black communities facing historical inequities and ongoing disparities.


Subject(s)
Black or African American , Drug Overdose , Focus Groups , Geographic Information Systems , Humans , Drug Overdose/epidemiology , Drug Overdose/prevention & control , Drug Overdose/ethnology , Black or African American/statistics & numerical data , Community-Institutional Relations , Male , Female , Adult , Health Status Disparities , Stakeholder Participation
6.
Front Toxicol ; 6: 1437884, 2024.
Article in English | MEDLINE | ID: mdl-39104826

ABSTRACT

In environmental health, the specific molecular mechanisms connecting a chemical exposure to an adverse endpoint are often unknown, reflecting knowledge gaps. At the public Comparative Toxicogenomics Database (CTD; https://ctdbase.org/), we integrate manually curated, literature-based interactions from CTD to compute four-unit blocks of information organized as a potential step-wise molecular mechanism, known as "CGPD-tetramers," wherein a chemical interacts with a gene product to trigger a phenotype which can be linked to a disease. These computationally derived datasets can be used to fill the gaps and offer testable mechanistic information. Users can generate CGPD-tetramers for any combination of chemical, gene, phenotype, and/or disease of interest at CTD; however, such queries typically generate thousands of CGPD-tetramers. Here, we describe a novel approach to transform these large datasets into user-friendly chord diagrams using R. This visualization process is straightforward, simple to implement, and accessible even to users who have never used R. Combining CGPD-tetramers into a single chord diagram helps identify potential key chemicals, genes, phenotypes, and diseases. This visualization allows users to more readily analyze computational datasets that can fill the exposure knowledge gaps in the environmental health continuum.
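The authors build their chord diagrams in R. As a language-neutral illustration, the Python sketch below shows one plausible aggregation step, assuming each tetramer is available as a simple four-field record (the `Tetramer` type and all function names are hypothetical, not CTD's export format): each chemical-gene-phenotype-disease chain contributes three links, and identical links are counted so that thousands of tetramers collapse into a weighted edge list. A from/to/value edge list of this shape is a common input format for chord-diagram tools.

```python
from collections import Counter
from typing import Iterable, List, NamedTuple, Tuple


class Tetramer(NamedTuple):
    """One chemical-gene-phenotype-disease chain (hypothetical field names)."""
    chemical: str
    gene: str
    phenotype: str
    disease: str


def chord_links(tetramers: Iterable[Tetramer]) -> Counter:
    """Count adjacent links across all tetramers, keyed by (source, target)."""
    links: Counter = Counter()
    for t in tetramers:
        # Prefix each name with its entity type so it becomes its own sector.
        chem = f"chemical:{t.chemical}"
        gene = f"gene:{t.gene}"
        phen = f"phenotype:{t.phenotype}"
        dis = f"disease:{t.disease}"
        links[(chem, gene)] += 1
        links[(gene, phen)] += 1
        links[(phen, dis)] += 1
    return links


def edge_list(links: Counter) -> List[Tuple[str, str, int]]:
    """Flatten to (from, to, value) rows, heaviest links first."""
    return [(src, dst, n) for (src, dst), n in links.most_common()]
```

Prefixing names with their entity type keeps, for example, a gene and a disease that happen to share a label in separate sectors of the diagram.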

7.
Article in English | MEDLINE | ID: mdl-39003519

ABSTRACT

OBJECTIVES: To understand healthcare providers' experiences of using GlucoGuide, a mockup tool that integrates visual data analysis with algorithmic insights to support clinicians' use of patient-generated data from Type 1 diabetes devices. MATERIALS AND METHODS: This qualitative study was conducted in three phases. In Phase 1, 11 clinicians reviewed data using commercial diabetes platforms in a think-aloud data walkthrough activity followed by semistructured interviews. In Phase 2, GlucoGuide was developed. In Phase 3, the same clinicians reviewed data using GlucoGuide in a think-aloud activity followed by semistructured interviews. Inductive thematic analysis was used to analyze transcripts of the Phase 1 and Phase 3 think-aloud activities and interviews. RESULTS: Three high-level tasks, 8 subtasks, and 4 challenges were identified in Phase 1. In Phase 2, 3 requirements for GlucoGuide were identified. Phase 3 results suggested that clinicians found GlucoGuide easier to use and experienced a lower cognitive burden compared with the commercial diabetes data reports used in Phase 1. Additionally, GlucoGuide addressed the challenges experienced in Phase 1. DISCUSSION: The study suggests that applying knowledge of analytical tasks and task-specific visualization strategies when implementing features of data interfaces can result in tools that lower the perceived burden of engaging with data. Additionally, supporting clinicians in contextualizing algorithmic insights through visual analysis of relevant data can positively influence clinicians' willingness to leverage algorithmic support. CONCLUSION: Task-aligned tools that combine multiple data-driven approaches, such as visualization strategies and algorithmic insights, can improve clinicians' experience in reviewing device data.

8.
Stud Health Technol Inform ; 315: 37-42, 2024 Jul 24.
Article in English | MEDLINE | ID: mdl-39049222

ABSTRACT

This pilot study explores how data visualization influences patient comprehension and engagement in understanding hyperlipidemia test results across diverse patient groups. Drawing on Gestalt theory and the Relational Information Display (RID) framework, intuitive visual tools were developed using Google Sheets, QlikView®, and Microsoft® Excel®. The patient survey used a Likert scale to evaluate six different line and bar graphs, each presenting the same LDL cholesterol data. The study emphasized the creation of graphs that were easily interpretable. The survey aimed to assess preferences for various data visualization formats. The survey results indicated that patients preferred stacked area charts, while healthcare providers favored line charts. The results highlight the importance of user-centric design and the effective application of theoretical frameworks in creating visualizations that enhance patient engagement and comprehension. The study underscores the role of tailored data visualizations in healthcare and the need for such tools in user-centered health technology.


Subject(s)
Comprehension , Data Visualization , Humans , Pilot Projects , User-Computer Interface , Hyperlipidemias , Female , Male , Middle Aged
9.
Stud Health Technol Inform ; 315: 92-97, 2024 Jul 24.
Article in English | MEDLINE | ID: mdl-39049232

ABSTRACT

High cholesterol levels significantly contribute to the risk of atherosclerotic cardiovascular disease (ACVD), with a notable portion of ischemic heart disease cases linked to elevated cholesterol levels. Effective graphical displays of lipid panel tests and other cardiac risk factors are crucial for quick and accurate data interpretation, enabling early intervention for individuals with hyperlipidemia. Applying design theories such as Gestalt and distributed cognitive theories is essential for creating user-centered graphical data displays in the context of cardiovascular (CV) risk factors. The proposed dashboard informed by these theories is expected to help healthcare providers better address cardiovascular disease (CVD), enhancing diagnosis, treatment, and prevention. Moreover, this approach may help alleviate clinical provider burnout, improve patient outcomes, and reduce provider stress, thus contributing to safer and more effective healthcare systems.


Subject(s)
Atherosclerosis , Humans , User-Computer Interface , Data Visualization , Risk Factors , Heart Disease Risk Factors , Risk Assessment
10.
BMC Health Serv Res ; 24(1): 851, 2024 Jul 26.
Article in English | MEDLINE | ID: mdl-39061040

ABSTRACT

BACKGROUND: The effective management of surgical and anesthesia care relies on quality data and its ready availability for both patient-centered decision-making and facility-level improvement efforts. Recognizing this critical need, the Strengthening Systems for Improved Surgical Outcomes (SSISO) project addressed surgical care data management and information use practices across 23 health facilities from October 2019 to September 2022. This study aimed to evaluate the effectiveness of SSISO interventions in enhancing practices related to surgical data capture, reporting, analysis, and visualization. METHODS: This study employed a mixed-methods, pre-post intervention evaluation design to assess changes in data management and utilization practices at intervention facilities. The intervention packages included capacity-building trainings, monthly mentorship visits facilitated by a hub-and-spoke approach, provision of data capture tools, and reinforcement of performance review teams. Data collection occurred at baseline (February - April 2020) and endline (April - June 2022). The evaluation focused on the availability and appropriate use of data capture tools, as well as changes in performance review practices. Appropriate use of registers was defined as entering all the necessary data into the registers, and this was verified by the completeness of selected key data elements in the registers. RESULTS: The proportion of health facilities with Operating Room (OR) scheduling, referral, and surgical site infection registers significantly increased by 34.8%, 56.5% and 87%, respectively, at project endline compared to baseline. Availability of OR and Anesthesia registers remained high throughout the project, at 91.3% and 95.6%, respectively. Furthermore, the appropriate use of these registers improved, with statistically significant increases observed for OR scheduling registers (34.8% increase). 
Increases were also noted for the OR register (9.5% increase) and the anesthesia register (4.5% increase), although they were not statistically significant. Based on reports from the prior three months, report submission to the Ministry of Health/Regional Health Bureau (MOH/RHB) rose from 85% to 100%, reflecting complete reporting at the endline period. Additionally, the proportion of surgical teams analyzing and displaying data for informed decision-making significantly increased from 30.4% at baseline to 60.8% at the endline period. CONCLUSION: The implemented interventions positively impacted surgical data management and utilization practices at intervention facilities. These positive changes were likely attributable to capacity-building trainings and regular mentorship visits via the hub-and-spoke approach. Hence, we recommend further investigation into the effectiveness of similar intervention packages in improving surgical data management, data analysis and visualization practices in low- and middle-income country settings.


Subject(s)
Quality Improvement , Humans , Ethiopia , Health Facilities/standards , Health Facilities/statistics & numerical data , Surgical Procedures, Operative/statistics & numerical data , Surgical Procedures, Operative/standards , Capacity Building , Data Management , Operating Rooms/organization & administration , Operating Rooms/standards , Operating Rooms/statistics & numerical data
11.
Heliyon ; 10(13): e32972, 2024 Jul 15.
Article in English | MEDLINE | ID: mdl-39040365

ABSTRACT

To address issues such as inaccurate education resource positioning and inefficient resource utilization, this study optimizes the Educational Resource Management System (ERMS) by combining image data visualization techniques with convolutional neural networks (CNNs) from deep learning. First, the crucial role of ERMS in education and teaching is analyzed. Second, the application of image data visualization techniques and CNNs in the system is explained, along with the associated challenges. Finally, the CNN model and system architecture are optimized and validated with experimental data, confirming the rationality of the proposed model. Experimental results indicate a significant improvement in various performance metrics compared to traditional models. The recognition accuracy on the MNIST dataset reaches 98.1%, and notably, on the CIFAR-10 dataset, the optimized model achieves an accuracy close to 98.3% with runtime reduced to 640.4 s. Additionally, systematic simulation experiments show that the designed system fully meets the requirements for system functionality set out earlier, validating the feasibility and rationality of the model and system in this study. Therefore, this study holds high practical value for optimizing ERMS and provides meaningful insights into image data visualization techniques and CNN optimization.

12.
Sensors (Basel) ; 24(13), 2024 Jul 05.
Article in English | MEDLINE | ID: mdl-39001148

ABSTRACT

With advances in information and communication technology, modern society has come to rely on various computing systems in areas closely related to human life. However, cyberattacks are also becoming more diverse and intelligent, threatening personal information and human lives. The moving target defense (MTD) strategy was designed to protect mission-critical systems from cyberattacks. The MTD strategy shifted the paradigm from passive to active system defense. However, there is a lack of indicators that can be used as a reference when deriving general system components, making it difficult to configure a systematic MTD strategy. Additionally, even when system components have been selected, a method is needed to confirm whether they can respond to actual cyberattacks. Therefore, in this study, we surveyed and analyzed existing cyberattack information and MTD strategy research results to configure a component dataset. Next, we identified the correlation between the cyberattack information and MTD strategy component datasets and used it to design and implement the MTD-Diorama data visualization engine to configure a systematic MTD strategy. With this engine, researchers can conveniently identify the attack surface contained in cyberattack information and the MTD strategies that can respond to each attack surface. Furthermore, it will allow researchers to configure more systematic MTD strategies that can be used universally without being limited to specific computing systems.


Subject(s)
Computer Security , Humans , Algorithms
13.
Am J Sports Med ; 52(8): 1915-1917, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38946456
14.
J Multidiscip Healthc ; 17: 3193-3211, 2024.
Article in English | MEDLINE | ID: mdl-39006873

ABSTRACT

Purpose: Over the past 24 years, significant advancements have been made in applying artificial intelligence (AI) to musculoskeletal (MSK) diseases. However, there is a lack of analytical and descriptive investigations on the trajectory, essential research directions, current research scenario, pivotal focuses, and future perspectives. This research aims to provide a thorough update on the progress in AI for MSK diseases over the last 24 years. Methods: Data from the Web of Science database, covering January 1, 2000, to March 1, 2024, was analyzed. Using advanced analytical tools, we conducted comprehensive scientometric and visual analyses. Results: The findings highlight the predominant influence of the USA, which accounts for 28.53% of the total publications and plays a key role in shaping research in this field. Notable productivity was seen at institutions such as the University of California, San Francisco, Harvard Medical School, and Seoul National University. Valentina Pedoia is identified as the most prolific contributor. Scientific Reports had the highest number of publications in this area. The five most significant diseases are joint diseases, bone fractures, bone tumors, cartilage diseases, and spondylitis. Conclusion: This comprehensive scientometric assessment benefits both experienced researchers and newcomers, providing quick access to essential information and fostering the development of innovative concepts in this field.

15.
J STEM Outreach ; 7(2), 2024 Feb.
Article in English | MEDLINE | ID: mdl-39006760

ABSTRACT

As federal strategic plans prioritize increasing diversity within the biomedical workforce, and STEM training and outreach programs seek to recruit and retain students from historically underrepresented populations, there is a need for interrogation of traditional demographic descriptors and careful consideration of best practices for obtaining demographic data. To accelerate this work, equity-focused researchers and leaders from STEM programs convened to examine approaches for measuring demographic variables. Gender, race/ethnicity, disability, and disadvantaged background were prioritized given their focus by federal funding agencies. Categories of sex minority, sexual (orientation) minority, and gender minority (SSGM) should be included in demographic measures collected by STEM programs, consistent with recommendations from White House Executive Orders and federal reports. Our manuscript offers operationalized phrasing for demographic questions and recommendations for use across student-serving programs. Inclusive demographics permit the identification of individuals who are being excluded, marginalized, or improperly aggregated, increasing capacity to address inequities in biomedical research training. As trainees do not enter training programs with equal access, accommodations, or preparation, inclusive demographic measures can welcome trainees and inform a nuanced set of program outcomes that facilitate research on intersectionality to support the recruitment and retention of underrepresented students in biomedical research.

16.
bioRxiv ; 2024 Jul 02.
Article in English | MEDLINE | ID: mdl-39005315

ABSTRACT

Spatial transcriptomics (ST) is a powerful tool for understanding tissue biology and disease mechanisms. However, its potential is often underutilized due to the advanced data analysis and programming skills required. To address this, we present spatialGE, a web application that simplifies the analysis of ST data. The application spatialGE provides a user-friendly interface that guides users without programming expertise through various analysis pipelines, including quality control, normalization, domain detection, phenotyping, and multiple spatial analyses. It also enables comparative analysis among samples and supports various ST technologies. We demonstrate the utility of spatialGE through its application in studying the tumor microenvironment of melanoma brain metastasis and Merkel cell carcinoma. Our results highlight the ability of spatialGE to identify spatial gene expression patterns and enrichments, providing valuable insights into the tumor microenvironment and its utility in democratizing ST data analysis for the wider scientific community.

17.
Cureus ; 16(6): e63348, 2024 Jun.
Article in English | MEDLINE | ID: mdl-39077282

ABSTRACT

Clear aligner treatment (CAT) has been evolving over the past two decades. This study aims to conduct a comprehensive and up-to-date bibliometric analysis of publications related to CAT, presenting the research trends, landscapes, and hot spots in this field. All publications were retrieved from the Web of Science Core Collection from 2003 to 2023. In addition to a general analysis of research landscapes, the following items were analyzed: countries, institutions, authors, journals, publications, and keywords. A total of 1031 relevant publications were included in this study. From 2003 to the present, the number of publications and citations in this field showed an increasing trend. Italy led in publication counts, and Sichuan University in China had the highest publication count among institutions. In total, 33 scholars had published at least 10 articles, and collaborations among them were mostly within each country. The American Journal of Orthodontics and Dentofacial Orthopedics published the most relevant publications. "Predictability of tooth movements," "influencing factors for clinical efficacy," "biomechanics," and "patients' perception and periodontal health" stood out as the core research foci in CAT. Our study identified the most influential countries, institutions and authors, and their cooperative relationships, and detected hot research topics on CAT, calling for more high-quality international collaborative research in the future.

18.
Bioinformatics ; 2024 Jul 25.
Article in English | MEDLINE | ID: mdl-39052868

ABSTRACT

SUMMARY: One of the first steps in single-cell omics data analysis is visualization, which allows researchers to see how well separated cell types are from each other. When visualizing multiple datasets at once, data integration/batch correction methods are used to merge the datasets. While needed for downstream analyses, these methods modify feature space (e.g., gene expression) or PCA space in order to mix cell types between batches as well as possible. This obscures sample-specific features and breaks down local embedding structures that can be seen when a sample is embedded alone. Therefore, to improve visual comparisons between large numbers of samples (e.g., multiple patients, omic modalities, different time points), we introduce Compound-SNE, which performs what we term a soft alignment of samples in embedding space. We show that Compound-SNE is able to align cell types in embedding space across samples, while preserving the local embedding structures seen when samples are embedded independently. AVAILABILITY AND IMPLEMENTATION: Python code for Compound-SNE is available for download at https://github.com/HaghverdiLab/Compound-SNE. SUPPLEMENTARY INFORMATION: Available online. Provides algorithmic details and additional tests.

19.
BMJ Health Care Inform ; 31(1), 2024 Jul 29.
Article in English | MEDLINE | ID: mdl-39074912

ABSTRACT

BACKGROUND: Despite the increasing availability of electronic healthcare record (EHR) data and the wide availability of plug-and-play machine learning (ML) Application Programming Interfaces, the adoption of data-driven decision-making within routine hospital workflows has thus far remained limited. Through the lens of deriving clusters of diagnoses by age, this study investigated the type of ML analysis that can be performed using EHR data and how results could be communicated to lay stakeholders. METHODS: Observational EHR data from a tertiary paediatric hospital, containing 61 522 unique patients and 3315 unique ICD-10 diagnosis codes after preprocessing, were used. K-means clustering was applied to identify age distributions of patient diagnoses. The final model was selected using quantitative metrics and expert assessment of the clinical validity of the clusters. Additionally, uncertainty over preprocessing decisions was analysed. FINDINGS: Four age clusters of diseases were identified, broadly aligning to ages between: 0 and 1; 1 and 5; 5 and 13; 13 and 18. Diagnoses within the clusters aligned with existing knowledge regarding the propensity of presentation at different ages, and sequential clusters presented known disease progressions. The results validated similar methodologies within the literature. The impact of uncertainty induced by preprocessing decisions was large at the level of individual diagnoses but not at the population level. Strategies for mitigating, or communicating, this uncertainty were successfully demonstrated. CONCLUSION: Unsupervised ML applied to EHR data identifies clinically relevant age distributions of diagnoses which can augment existing decision-making. However, biases within healthcare datasets dramatically impact results if not appropriately mitigated or communicated.


Subject(s)
Electronic Health Records , Unsupervised Machine Learning , Humans , Child , Child, Preschool , Infant , Adolescent , Cluster Analysis , Infant, Newborn , Male , Female , Age Factors
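A minimal sketch of one way to cluster diagnoses by age, assuming each ICD-10 code is represented by the share of its patients in each whole-year age bin (0 to 17) and grouped with plain k-means (Lloyd's algorithm). The binning, the deterministic farthest-first seeding, and all names are assumptions for illustration; the study selected its final four-cluster model using quantitative metrics plus expert review of clinical validity, which this sketch does not reproduce.

```python
from typing import List, Sequence


def age_profile(ages: Sequence[int], n_bins: int = 18) -> List[float]:
    """Share of a diagnosis's patients in each whole-year age bin 0..n_bins-1."""
    counts = [0.0] * n_bins
    for age in ages:
        counts[min(max(age, 0), n_bins - 1)] += 1  # clamp out-of-range ages
    total = sum(counts)
    return [c / total for c in counts] if total else counts


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[List[float]], k: int, max_iter: int = 100) -> List[int]:
    """Lloyd's k-means with deterministic farthest-first seeding; returns labels."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    # Seeding: start from the first point, then repeatedly add the point
    # farthest from all centroids chosen so far.
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: nearest centroid by squared Euclidean distance.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # assignments unchanged: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels
```

In use, one would build `[age_profile(ages) for ages in patient_ages_by_code.values()]` (a hypothetical mapping from code to patient ages) and call `kmeans(profiles, 4)`. Normalizing counts to shares means common and rare diagnoses are compared on the shape of their age distribution rather than on volume.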
20.
JMIR Med Inform ; 12: e49865, 2024 Jul 24.
Article in English | MEDLINE | ID: mdl-39046780

ABSTRACT

BACKGROUND: Interpretability and intuitive visualization facilitate medical knowledge generation through big data. In addition, robustness to high-dimensional and missing data is a requirement for statistical approaches in the medical domain. A method tailored to the needs of physicians must meet all the abovementioned criteria. OBJECTIVE: This study aims to develop an accessible tool for visual data exploration without the need for programming knowledge, adjusting complex parameterizations, or handling missing data. We sought to base the statistical analysis on the setting of disease and control cohorts familiar to clinical researchers. We aimed to guide the user by identifying and highlighting data patterns associated with disease and to reveal relations between attributes within the data set. METHODS: We introduce the attribute association graph, a novel graph structure designed for visual data exploration using robust statistical metrics. The nodes capture frequencies of participant attributes in disease and control cohorts as well as deviations between groups. The edges represent conditional relations between attributes. The graph is visualized using the Neo4j (Neo4j, Inc) data platform and can be interactively explored without the need for technical knowledge. Nodes with high deviations between cohorts and edges with noticeable conditional relationships are highlighted to guide the user during the exploration. The graph is accompanied by a dashboard visualizing variable distributions. For evaluation, we applied the graph and dashboard to the Hamburg City Health Study data set, a large cohort study conducted in the city of Hamburg, Germany. All data structures can be accessed freely by researchers, physicians, and patients. In addition, we developed a user test conducted with physicians incorporating the System Usability Scale, individual questions, and user tasks. 
RESULTS: We evaluated the attribute association graph and dashboard through an exemplary data analysis of participants with a general cardiovascular disease in the Hamburg City Health Study data set. All results extracted from the graph structure and dashboard are in accordance with findings from the literature, except for unusually low cholesterol levels in participants with cardiovascular disease, which could be induced by medication. In addition, 95% CIs of Pearson correlation coefficients were calculated for all associations identified during the data analysis, confirming the results. In addition, a user test with 10 physicians assessing the usability of the proposed methods was conducted. A System Usability Scale score of 70.5% and average successful task completion of 81.4% were reported. CONCLUSIONS: The proposed attribute association graph and dashboard enable intuitive visual data exploration. They are robust to high-dimensional as well as missing data and require no parameterization. The usability for clinicians was confirmed via a user test, and the validity of the statistical results was confirmed by associations known from literature and standard statistical inference.
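A minimal Python sketch of the attribute association graph idea, under assumptions the abstract does not fix: attributes are binary, a node's deviation is the disease-cohort frequency minus the control-cohort frequency, an edge from A to B carries the conditional probability P(B | A) over all participants, and missing values are skipped pairwise rather than imputed. The highlight threshold of 0.2 and all names are hypothetical, and the sketch builds plain Python dictionaries rather than using the Neo4j API.

```python
from itertools import permutations
from typing import Dict, List, Optional, Tuple

# One participant: attribute name -> True (present), False (absent) or None (missing).
Participant = Dict[str, Optional[bool]]


def frequency(cohort: List[Participant], attr: str) -> Optional[float]:
    """Share of participants with attr present, among those where attr is observed."""
    observed = [p[attr] for p in cohort if p.get(attr) is not None]
    return sum(observed) / len(observed) if observed else None


def build_graph(
    disease: List[Participant],
    control: List[Participant],
    attrs: List[str],
    threshold: float = 0.2,
) -> Tuple[Dict[str, dict], Dict[Tuple[str, str], float]]:
    """Return (nodes, edges): per-attribute cohort frequencies and P(b | a) edges."""
    nodes: Dict[str, dict] = {}
    for a in attrs:
        fd, fc = frequency(disease, a), frequency(control, a)
        dev = fd - fc if fd is not None and fc is not None else None
        nodes[a] = {
            "disease": fd,
            "control": fc,
            "deviation": dev,
            # Flag nodes whose cohort frequencies differ noticeably.
            "highlight": dev is not None and abs(dev) >= threshold,
        }
    edges: Dict[Tuple[str, str], float] = {}
    everyone = disease + control
    for a, b in permutations(attrs, 2):
        # P(b | a), using only participants with a present and b observed.
        given_a = [p[b] for p in everyone if p.get(a) is True and p.get(b) is not None]
        if given_a:
            edges[(a, b)] = sum(given_a) / len(given_a)
    return nodes, edges
```

Skipping missing values per attribute (or per attribute pair, for edges) lets every participant contribute wherever they have data, which is one simple way to obtain the robustness to missing data that the abstract emphasizes without any imputation parameters.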
