Results 1 - 20 of 995
1.
Appl Spectrosc ; : 37028241280669, 2024 Sep 28.
Article in English | MEDLINE | ID: mdl-39340333

ABSTRACT

Modern developments in autonomous chemometric machine learning technology strive to eliminate the need for human intervention. However, such algorithms developed and used in chemometric multivariate calibration and classification applications exclude crucial expert insight when difficult and safety-critical analysis situations arise, e.g., spectral-based medical decisions such as noninvasively determining if a biopsy is cancerous. The prediction accuracy and interpolation capabilities of autonomous methods for new samples depend on the quality and scope of their training (calibration) data. Specifically, analysis patterns within target data not captured by the training data will produce undesirable outcomes. Alternatively, using an immersive analytic approach allows insertion of human expert judgment at key machine learning algorithm junctures, forming a sensemaking process performed in cooperation with a computer. The capacity of immersive virtual reality (IVR) environments to render human-comprehensible three-dimensional space simulating real-world encounters suggests their suitability as a hybrid immersive human-computer interface for data analysis tasks. Using IVR maximizes human senses to capitalize on our instinctual perception of the physical environment, thereby leveraging our innate ability to recognize patterns and visualize thresholds crucial to reducing erroneous outcomes. In this first use of IVR as an immersive analytic tool for spectral data, we examine an integrated IVR real-time model selection algorithm for a recent model updating method that adapts a model from the original calibration domain to predict samples from shifted target domains. Using near-infrared data, analyte prediction errors from IVR-selected models are reduced compared to errors using an established autonomous model selection approach. Results demonstrate the viability of IVR as a human data analysis interface for spectral data analysis including classification problems.

2.
Bioinformatics ; 2024 Sep 30.
Article in English | MEDLINE | ID: mdl-39348165

ABSTRACT

SUMMARY: Computational metabolomics workflows have revolutionized the untargeted metabolomics field. However, the organization and prioritization of metabolite features remains a laborious process. Organizing metabolomics data is often done through mass fragmentation-based spectral similarity grouping, resulting in feature sets that also represent an intuitive and scientifically meaningful first stage of analysis in untargeted metabolomics. Exploiting such feature sets, feature-set testing has emerged as an approach that is widely used in genomics and targeted metabolomics pathway enrichment analyses. It allows for formally combining groupings with statistical testing into more meaningful pathway enrichment conclusions. Here, we present msFeaST (mass spectral Feature Set Testing), a feature-set testing and visualization workflow for LC-MS/MS untargeted metabolomics data. Feature-set testing involves statistically assessing differential abundance patterns for groups of features across experimental conditions. We developed msFeaST to make use of spectral similarity-based feature groupings generated using k-medoids clustering, where the resulting clusters serve as a proxy for grouping structurally similar features with potential biosynthesis pathway relationships. Spectral clustering done in this way allows for feature group-wise statistical testing using the globaltest package, which provides high power to detect small concordant effects via joint modeling and reduced multiplicity adjustment penalties. Hence, msFeaST provides interactive integration of the semi-quantitative experimental information with mass-spectral structural similarity information, enhancing the prioritization of features and feature sets during exploratory data analysis. AVAILABILITY AND IMPLEMENTATION: The msFeaST workflow is freely available through https://github.com/kevinmildau/msFeaST and built to work on MacOS and Linux systems. 
SUPPLEMENTARY INFORMATION: Supplementary information is available at Bioinformatics online.
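
The clustering step described in this abstract can be illustrated with a minimal sketch. It is not the msFeaST implementation: the function names, the toy similarity matrix, the farthest-first initialization, and the simple alternating update are all assumptions made for illustration. The only idea taken from the abstract is that k-medoids is run on pairwise spectral similarities, so that each cluster is represented by one actual feature (its medoid).

```python
import numpy as np

def farthest_first_init(dist, k):
    """Deterministic start: most central point first, then the farthest remaining points."""
    medoids = [int(np.argmin(dist.sum(axis=1)))]
    while len(medoids) < k:
        # Distance from every point to its nearest already-chosen medoid.
        nearest = dist[:, medoids].min(axis=1)
        medoids.append(int(np.argmax(nearest)))
    return np.array(medoids)

def k_medoids(dist, k, n_iter=100):
    """Alternating k-medoids on a precomputed distance matrix (illustrative only)."""
    medoids = farthest_first_init(dist, k)
    for _ in range(n_iter):
        # Assign every feature to its closest medoid.
        labels = np.argmin(dist[:, medoids], axis=1)
        new_medoids = medoids.copy()
        for c in range(k):
            members = np.flatnonzero(labels == c)
            if members.size == 0:
                continue  # keep the old medoid if a cluster is empty
            # New medoid: the member with the smallest total distance to its cluster.
            within = dist[np.ix_(members, members)].sum(axis=1)
            new_medoids[c] = members[np.argmin(within)]
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    labels = np.argmin(dist[:, medoids], axis=1)
    return medoids, labels

# Hypothetical pairwise spectral similarities (1.0 = identical spectra).
similarity = np.array([
    [1.00, 0.90, 0.80, 0.10, 0.20],
    [0.90, 1.00, 0.85, 0.15, 0.10],
    [0.80, 0.85, 1.00, 0.20, 0.10],
    [0.10, 0.15, 0.20, 1.00, 0.95],
    [0.20, 0.10, 0.10, 0.95, 1.00],
])
distance = 1.0 - similarity
medoids, labels = k_medoids(distance, k=2)
print(medoids, labels)
```

In this toy example the first three features end up in one cluster and the last two in another. In a workflow like the one described, such cluster labels would define the feature sets passed on to group-wise statistical testing.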

3.
J Med Internet Res ; 26: e56804, 2024 Sep 17.
Article in English | MEDLINE | ID: mdl-39288409

ABSTRACT

BACKGROUND: Data dashboards have become more widely used for the public communication of health-related data, including in maternal health. OBJECTIVE: We aimed to evaluate the content and features of existing publicly available maternal health dashboards in the United States. METHODS: Through systematic searches, we identified 80 publicly available, interactive dashboards presenting US maternal health data. We abstracted and descriptively analyzed the technical features and content of identified dashboards across four areas: (1) scope and origins, (2) technical capabilities, (3) data sources and indicators, and (4) disaggregation capabilities. Where present, we abstracted and qualitatively analyzed dashboard text describing the purpose and intended audience. RESULTS: Most reviewed dashboards reported state-level data (58/80, 72%) and were hosted on a state health department website (48/80, 60%). Most dashboards reported data from only 1 (33/80, 41%) or 2 (23/80, 29%) data sources. Key indicators, such as the maternal mortality rate (10/80, 12%) and severe maternal morbidity rate (12/80, 15%), were absent from most dashboards. Included dashboards used a range of data visualizations, and most allowed some disaggregation by time (65/80, 81%), geography (65/80, 81%), and race or ethnicity (55/80, 69%). Among dashboards that identified their audience (30/80, 38%), legislators or policy makers and public health agencies or organizations were the most common audiences. CONCLUSIONS: While maternal health dashboards have proliferated, their designs and features are not standard. This assessment of maternal health dashboards in the United States found substantial variation among dashboards, including inconsistent data sources, health indicators, and disaggregation capabilities. Opportunities to strengthen dashboards include integrating a greater number of data sources, increasing disaggregation capabilities, and considering end-user needs in dashboard design.
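
The counts and percentages quoted in the results can be cross-checked with a few lines of arithmetic. The sketch below copies the count and percentage pairs from the abstract; the labels and the rounding rule are assumptions, since the abstract does not say how percentages were rounded. It uses exact rational arithmetic, where `round()` sends exact halves to the nearest even integer.

```python
from fractions import Fraction

TOTAL = 80  # dashboards reviewed, as stated in the abstract

# label: (count, reported percentage), copied from the abstract.
reported = {
    "state-level data": (58, 72),
    "hosted by a state health department": (48, 60),
    "one data source": (33, 41),
    "two data sources": (23, 29),
    "maternal mortality rate": (10, 12),
    "severe maternal morbidity rate": (12, 15),
    "disaggregation by time": (65, 81),
    "disaggregation by geography": (65, 81),
    "disaggregation by race or ethnicity": (55, 69),
    "audience identified": (30, 38),
}

def percent(count, total=TOTAL):
    # round() on a Fraction rounds exact halves to the even integer (72.5 -> 72, 37.5 -> 38).
    return round(Fraction(count * 100, total))

recomputed = {label: percent(count) for label, (count, _) in reported.items()}
mismatches = {
    label: (recomputed[label], pct)
    for label, (_, pct) in reported.items()
    if recomputed[label] != pct
}
print(mismatches)
```

Under this rounding assumption every recomputed value agrees with the reported one, so `mismatches` is empty.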


Subject(s)
Maternal Health , United States , Humans , Maternal Health/statistics & numerical data , Female , Public Health , Pregnancy , Dashboard Systems
4.
Health Informatics J ; 30(3): 14604582241279720, 2024.
Article in English | MEDLINE | ID: mdl-39224960

ABSTRACT

The analysis of large sets of spatio-temporal data is a fundamental challenge in epidemiological research. As the quantity and the complexity of such kind of data increases, automatic analysis approaches, such as statistics, data mining, machine learning, etc., can be used to extract useful information. While these approaches have proven effective, they require a priori knowledge of the information being sought, and some interesting insights into the data may be missed. To bridge this gap, information visualization offers a set of techniques for not only presenting known information, but also exploring data without having a hypothesis formulated beforehand. In this paper, we introduce Epid Data Explorer (EDE), a visualization tool that enables exploration of spatio-temporal epidemiological data. EDE allows easy comparisons of indicators and trends across different geographical areas and times. It facilitates this exploration through ready-to-use pre-loaded datasets as well as user-chosen datasets. The tool also provides a secure architecture for easily importing new datasets while ensuring confidentiality. In two use cases using data associated with the COVID-19 epidemic, we demonstrate the substantial impact of implemented lockdown measures on mobility and how EDE allows assessing correlations between the spread of COVID-19 and weather conditions.


Subject(s)
COVID-19 , Spatio-Temporal Analysis , Humans , COVID-19/epidemiology , Data Mining/methods , Data Visualization , SARS-CoV-2 , Software
5.
Imaging Neurosci (Camb) ; 2: 1-39, 2024 Aug 01.
Article in English | MEDLINE | ID: mdl-39257641

ABSTRACT

Quality control (QC) assessment is a vital part of FMRI processing and analysis, and a typically underdiscussed aspect of reproducibility. This includes checking datasets at their very earliest stages (acquisition and conversion) through their processing steps (e.g., alignment and motion correction) to regression modeling (correct stimuli, no collinearity, valid fits, enough degrees of freedom, etc.) for each subject. There are a wide variety of features to verify throughout any single-subject processing pipeline, both quantitatively and qualitatively. We present several FMRI preprocessing QC features available in the AFNI toolbox, many of which are automatically generated by the pipeline-creation tool, afni_proc.py. These items include a modular HTML document that covers full single-subject processing from the raw data through statistical modeling, several review scripts in the results directory of processed data, and command line tools for identifying subjects with one or more quantitative properties across a group (such as triaging warnings, making exclusion criteria, or creating informational tables). The HTML itself contains several buttons that efficiently facilitate interactive investigations into the data, when deeper checks are needed beyond the systematic images. The pages are linkable, so that users can evaluate individual items across a group, for increased sensitivity to differences (e.g., in alignment or regression modeling images). Finally, the QC document contains rating buttons for each "QC block," as well as comment fields for each, to facilitate both saving and sharing the evaluations. This increases the specificity of QC, as well as its shareability, as these files can be shared with others and potentially uploaded into repositories, promoting transparency and open science. We describe the features and applications of these QC tools for FMRI.

6.
Life (Basel) ; 14(9), 2024 Sep 21.
Article in English | MEDLINE | ID: mdl-39337979

ABSTRACT

Type 2 diabetes, prediabetes, and insulin resistance (IR) are widespread yet often undetected in their early stages, contributing to a silent epidemic. Metabolic Syndrome (MetS) is also highly prevalent, increasing the chronic disease burden. Annual check-ups are inadequate for early detection due to conventional result formats that lack specific markers and comprehensive visualization. The aim of this study was to evaluate low-budget biochemical and hematological parameters, with data visualization, for identifying IR and MetS in a community-based laboratory. In a cross-sectional study with 1870 participants in Patras, Greece, blood samples were analyzed for key cardiovascular and inflammatory markers. IR diagnostic markers (TyG-Index, TyG-BMI, Triglycerides/HDL ratio, NLR) were compared with HOMA-IR. Innovative data visualization techniques were used to present metabolic profiles. Notable differences in parameters of cardiovascular risk and inflammation were observed between normal-weight and obese people, highlighting BMI as a significant risk factor. Also, the inflammation marker NHR (Neutrophils to HDL-Cholesterol Ratio) Index was successful at distinguishing the obese individuals and those with MetS from normal individuals. Additionally, a new diagnostic index of IR, combining BMI (Body Mass Index) and NHR Index, demonstrated better performance than other well-known indices. Lastly, data visualization significantly helped individuals understand their metabolic health patterns more clearly. BMI and NHR Index could play an essential role in assessing metabolic health patterns. Integrating specific markers and data visualization in routine check-ups enhances the early detection of IR and MetS, aiding in better patient awareness and adherence.
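
For orientation, the established indices named in this abstract can be computed as in the sketch below. The formulas are the commonly cited definitions (fasting values in mg/dL, insulin in µU/mL), and it is an assumption that the study used exactly these forms and units. The input values are hypothetical. The new combined BMI and NHR index is not defined in the abstract, so it is deliberately left out.

```python
import math

def tyg_index(triglycerides_mg_dl, glucose_mg_dl):
    # Triglyceride-glucose index: ln(TG [mg/dL] * glucose [mg/dL] / 2).
    return math.log(triglycerides_mg_dl * glucose_mg_dl / 2)

def tyg_bmi(triglycerides_mg_dl, glucose_mg_dl, bmi):
    # TyG index scaled by body mass index (kg/m^2).
    return tyg_index(triglycerides_mg_dl, glucose_mg_dl) * bmi

def tg_hdl_ratio(triglycerides_mg_dl, hdl_mg_dl):
    return triglycerides_mg_dl / hdl_mg_dl

def nlr(neutrophils, lymphocytes):
    # Neutrophil-to-lymphocyte ratio (both counts in the same unit).
    return neutrophils / lymphocytes

def nhr(neutrophils, hdl):
    # Neutrophil-to-HDL-cholesterol ratio; units vary between reports.
    return neutrophils / hdl

def homa_ir(glucose_mg_dl, insulin_uU_ml):
    # HOMA-IR with glucose in mg/dL: glucose * insulin / 405.
    return glucose_mg_dl * insulin_uU_ml / 405

# Hypothetical fasting values for one participant.
print(round(tyg_index(150, 100), 3))
print(round(tyg_bmi(150, 100, 27.0), 1))
print(homa_ir(90, 9))
```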

7.
J Clin Transl Sci ; 8(1): e121, 2024.
Article in English | MEDLINE | ID: mdl-39345710

ABSTRACT

Multisector stakeholders, including community-based organizations, health systems, researchers, policymakers, and commerce, increasingly seek to address health inequities that persist due to structural racism. They require accessible tools to visualize and quantify the prevalence of social drivers of health (SDOH) and correlate them with health to facilitate dialog and action. We developed and deployed a web-based data visualization platform to make health and SDOH data available to the community. We conducted interviews and focus groups among end users of the platform to establish needs and desired platform functionality. The platform displays curated SDOH and de-identified and aggregated local electronic health record data. The resulting Social, Environmental, and Equity Drivers (SEED) Health Atlas integrates SDOH data across multiple constructs, including socioeconomic status, environmental pollution, and built environment. Aggregated health prevalence data on multiple conditions can be visualized in interactive maps. Data can be visualized and downloaded without coding knowledge. Visualizations facilitate an understanding of community health priorities and local health inequities. SEED could facilitate future discussions on improving community health and health equity. SEED provides a promising tool that members of the community and researchers may use in their efforts to improve health equity.

8.
Article in English | MEDLINE | ID: mdl-39348270

ABSTRACT

OBJECTIVES: This article describes the design and evaluation of MS Pattern Explorer, a novel visual tool that uses interactive machine learning to analyze fitness wearables' data. Applied to a clinical study of multiple sclerosis (MS) patients, the tool addresses key challenges: managing activity signals, accelerating insight generation, and rapidly contextualizing identified patterns. By analyzing sensor measurements, it aims to enhance understanding of MS symptomatology and improve the broader problem of clinical exploratory sensor data analysis. MATERIALS AND METHODS: Following a user-centered design approach, we learned that clinicians have 3 priorities for generating insights for the Barka-MS study data: exploration and search for, and contextualization of, sequences and patterns in patient sleep and activity. We compute meaningful sequences for patients using clustering and proximity search, displaying these with an interactive visual interface composed of coordinated views. Our evaluation posed both closed and open-ended tasks to participants, utilizing a scoring system to gauge the tool's usability, and effectiveness in supporting insight generation across 15 clinicians, data scientists, and non-experts. RESULTS AND DISCUSSION: We present MS Pattern Explorer, a visual analytics system that helps clinicians better address complex data-centric challenges by facilitating the understanding of activity patterns. It enables innovative analysis that leads to rapid insight generation and contextualization of temporal activity data, both within and between patients of a cohort. Our evaluation results indicate consistent performance across participant groups and effective support for insight generation in MS patient fitness tracker data. Our implementation offers broad applicability in clinical research, allowing for potential expansion into cohort-wide comparisons or studies of other chronic conditions. 
CONCLUSION: MS Pattern Explorer successfully reduces the signal overload clinicians currently experience with activity data, introducing novel opportunities for data exploration, sense-making, and hypothesis generation.

9.
J Am Soc Mass Spectrom ; 35(10): 2315-2323, 2024 Oct 02.
Article in English | MEDLINE | ID: mdl-39221961

ABSTRACT

Mass spectrometry imaging (MSI) provides information about the spatial localization of molecules in complex samples with high sensitivity and molecular selectivity. Although point-wise data acquisition, in which mass spectra are acquired at predefined points in a grid pattern, is common in MSI, several MSI techniques use line-wise data acquisition. In line-wise mode, the imaged surface is continuously sampled along consecutive parallel lines and MSI data are acquired as a collection of line scans across the sample. Furthermore, aside from the standard imaging mode in which full mass spectra are acquired, other acquisition modes have been developed to enhance molecular specificity, enable separation of isobaric and isomeric species, and improve sensitivity to facilitate the imaging of low abundance species. These methods, including MS/MS-MSI in both MS2 and MS3 modes, multiple-reaction monitoring (MRM)-MSI, and ion mobility spectrometry (IMS)-MSI have all demonstrated their capabilities, but their broader implementation is limited by the existing MSI analysis software. Here, we present MSIGen, an open-source Python package for the visualization of MSI experiments performed in line-wise acquisition mode containing MS1, MS2, MRM, and IMS data, which is available at https://github.com/LabLaskin/MSIGen. The package supports multiple vendor-specific and open-source data formats and contains tools for targeted extraction of ion images, normalization, and exportation as images, arrays, or publication-style images. MSIGen offers multiple interfaces, allowing for accessibility and easy integration with other workflows. Considering its support for a wide variety of MSI imaging modes and vendor formats, MSIGen is a valuable tool for the visualization and analysis of MSI data.
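
The idea of building an ion image from line-wise acquisitions can be sketched as follows. This does not use the MSIGen API. The function names, the toy spectra, the m/z tolerance, and the assumption that spectra are evenly spaced along each line are all hypothetical. Each line scan may contain a different number of spectra, so each extracted intensity trace is resampled onto a common pixel grid before the lines are stacked into an image.

```python
import numpy as np

def spectrum(peaks):
    """Build (mz, intensity) arrays from a list of (mz, intensity) pairs."""
    mz, intensity = zip(*peaks)
    return np.array(mz, dtype=float), np.array(intensity, dtype=float)

def extract_trace(line, target_mz, tol):
    """Sum the intensity within target_mz +/- tol for every spectrum of one line scan."""
    trace = []
    for mz, intensity in line:
        mask = np.abs(mz - target_mz) <= tol
        trace.append(intensity[mask].sum())
    return np.array(trace, dtype=float)

def ion_image(lines, target_mz, tol=0.01, width=4):
    """Stack resampled traces into a 2D image: one row per line scan."""
    grid = np.linspace(0.0, 1.0, width)
    rows = []
    for line in lines:
        trace = extract_trace(line, target_mz, tol)
        # Assumes spectra are evenly spaced along the line (a simplification).
        position = np.linspace(0.0, 1.0, len(trace))
        rows.append(np.interp(grid, position, trace))
    return np.vstack(rows)

# Two hypothetical line scans with different numbers of spectra.
line_a = [
    spectrum([(100.0, 5.0), (200.0, 1.0)]),
    spectrum([(100.005, 2.0), (150.0, 9.0)]),
    spectrum([(150.0, 4.0), (250.0, 1.0)]),
    spectrum([(99.995, 1.0), (100.2, 7.0)]),
]
line_b = [
    spectrum([(100.0, 0.0), (300.0, 2.0)]),
    spectrum([(100.0, 3.0), (300.0, 1.0)]),
]
image = ion_image([line_a, line_b], target_mz=100.0)
print(image)
```

Real line scans carry acquisition times or scan positions, which would replace the evenly spaced `position` vector used here.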

10.
Am J Biol Anthropol ; : e25020, 2024 Sep 02.
Article in English | MEDLINE | ID: mdl-39222382

ABSTRACT

A central goal of biological anthropology is connecting environmental variation to differences in host physiology, biology, health, and evolution. The microbiome represents a valuable pathway for studying how variation in host environments impacts health outcomes. While there are many resources for learning about methods related to microbiome sample collection, laboratory analyses, and genetic sequencing, there are fewer dedicated to helping researchers navigate the dense portfolio of bioinformatics and statistical approaches for analyzing microbiome data. Those that do exist are rarely related to questions in biological anthropology and instead are often focused on human biomedicine. To address this gap, we expand on existing tutorials and provide a "road map" to aid biological anthropologists in understanding, selecting, and deploying the data analysis and visualization methods that are most appropriate for their specific research questions. Leveraging an existing dataset of fecal samples and survey data collected from wild geladas living in Simien Mountains National Park in Ethiopia (Baniel et al., 2021), this paper guides researchers toward answering three questions related to variation in the gut microbiome across host and environmental factors. By providing explanations, examples, and a reproducible workflow for different analytic methods, we move beyond the theoretical benefits of considering the microbiome within anthropological research and instead present researchers with a guide for applying microbiome science to their work. This paper makes microbiome science more accessible to biological anthropologists and paves the way for continued research into the microbiome's role in the ecology, evolution, and health of human and non-human primates.

11.
Stud Health Technol Inform ; 317: 314-323, 2024 Aug 30.
Article in English | MEDLINE | ID: mdl-39234736

ABSTRACT

INTRODUCTION: User-centered data visualizations can reduce physician cognitive load and support clinical decision making. To facilitate the selection of appropriate visualizations for single patient health data summaries, this scoping review provides a literature overview of possible visualization techniques and the corresponding reported user-centered design phases. METHODS: The publication databases PubMed, Web of Science, IEEE Xplore and ACM Digital Library were searched for relevant articles from 2017 to 2022. RESULTS: Of the 777 articles screened, 78 articles were included in the final analysis. The most commonly used visualization techniques are table, scatterplot-line timeline, text and event timelines, with 24 other visualization techniques identified. The testing phase of the user centered design process is reported most frequently. CONCLUSION: This scoping review can support developers in the selection of suitable visualizations for single patient health data by revealing the design space of possible visualization techniques.


Subject(s)
Decision Support Systems, Clinical , Humans , Data Visualization , Clinical Decision-Making , Electronic Health Records , User-Computer Interface , User-Centered Design
12.
Front Bioinform ; 4: 1349205, 2024.
Article in English | MEDLINE | ID: mdl-39286643

ABSTRACT

Rvisdiff is an R/Bioconductor package that generates an interactive interface for the interpretation of differential expression results. It creates a local web page that enables the exploration of statistical analysis results through the generation of auto-analytical visualizations. Users can explore the differential expression results and the source expression data interactively in the same view. As input, the package supports the results of popular differential expression packages such as DESeq2, edgeR, and limma. As output, the package generates a local HTML page that can be easily viewed in a web browser. Rvisdiff is freely available at https://bioconductor.org/packages/Rvisdiff/.

13.
Heliyon ; 10(18): e37439, 2024 Sep 30.
Article in English | MEDLINE | ID: mdl-39315188

ABSTRACT

The emergence of artificial intelligence (AI) technology has presented new challenges and opportunities for Traditional Chinese Medicine (TCM), aiming to provide objective assessments and improve clinical effectiveness. However, there is a lack of comprehensive analyses on the research trajectory, key directions, current trends, and future perspectives in this field. This research aims to comprehensively update the progress of AI in TCM over the past 24 years, based on data from the Web of Science database covering January 1, 2000, to March 1, 2024. Using advanced analytical tools, we conducted detailed bibliometric and visual analyses. The results highlight China's predominant influence, contributing 54.35 % of the total publications and playing a key role in shaping research in this field. Significant productivity was observed at institutions such as the China Academy of Chinese Medical Sciences, Beijing University of Chinese Medicine, and Shanghai University of Traditional Chinese Medicine, with Wang Yu being the most prolific contributor. The journal Molecules contributed the most publications in this field. This study identified hepatocellular carcinoma, chemical and drug-induced liver injury, Papillon-Lefèvre disease, Parkinson's disease, and anorexia as the most significant disorders researched. This comprehensive bibliometric assessment benefits both seasoned researchers and newcomers, offering quick access to essential information and fostering the generation of innovative ideas in this field.

14.
Heliyon ; 10(16): e36127, 2024 Aug 30.
Article in English | MEDLINE | ID: mdl-39224260

ABSTRACT

Extensive research has made significant progress in exploring the potential application of extracellular vesicles (EV) in the diagnosis and treatment of osteoarthritis (OA). However, there is currently a lack of bibliometric studies on this topic. In this study, we completed a novel bibliometric analysis of EV research in OA over the past two decades. Specifically, we identified a total of 354 relevant publications published between January 1, 2003 and December 31, 2022. We also provided a description of the distribution information regarding the countries or regions of publication, institutions involved, journals, authors, citations, and keywords. The primary research focuses encompassed the role of extracellular vesicles in the diagnosis of OA, delivery of active ingredients, treatment strategies, and cartilage repair. These findings highlight the latest research frontiers and emerging areas, providing valuable insights for further investigations on the application of extracellular vesicles in the context of osteoarthritis.

15.
Heliyon ; 10(16): e35979, 2024 Aug 30.
Article in English | MEDLINE | ID: mdl-39247267

ABSTRACT

We analyze leading journals in behavioral finance to identify the most-used keywords in the area and how they have evolved. Using keyword analysis of data between 2000 and 2020 as well as data mapping and visualization tools, a dynamic map of the discipline was constructed. This study assesses the state-of-the-art of the field, main topics of discussion, relationships that arise between the concepts discussed, and emerging issues of interest. The sample comprises 3876 pieces, including 15859 keywords from journals responsible for the growth of the discipline, namely the Journal of Behavioral and Experimental Economics, Journal of Behavioral and Experimental Finance, Journal of Economic Psychology, Journal of Behavioral Finance, and Review of Behavioral Finance. During the period analyzed, our results depict a lively area and highlight the prominent role that experiments play in the field. Two related but different streams of behavioral finance research are revealed.
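
A keyword analysis of this kind usually starts from two simple tallies: how often each keyword appears, and how often two keywords appear on the same article. The sketch below shows both, together with a per-period count for following keywords over time. The article records, the normalization rule, and the function names are hypothetical, and the abstract does not state which tools the authors used.

```python
from collections import Counter
from itertools import combinations

# Hypothetical article records (the study's sample has 3876 pieces).
articles = [
    {"year": 2004, "keywords": ["Overconfidence", "experiment", "risk aversion"]},
    {"year": 2012, "keywords": ["experiment", "herding"]},
    {"year": 2019, "keywords": ["experiment", "overconfidence", "nudging"]},
]

def normalize(keyword):
    # Assumed cleaning rule: trim whitespace and ignore case.
    return keyword.strip().lower()

def article_keywords(article):
    # Each keyword counts once per article, in a stable sorted order.
    return sorted({normalize(k) for k in article["keywords"]})

def keyword_counts(articles, start=None, end=None):
    """Count articles per keyword, optionally restricted to start <= year <= end."""
    counts = Counter()
    for article in articles:
        if start is not None and article["year"] < start:
            continue
        if end is not None and article["year"] > end:
            continue
        counts.update(article_keywords(article))
    return counts

def cooccurrence(articles):
    """Count how often each unordered keyword pair shares an article."""
    pairs = Counter()
    for article in articles:
        pairs.update(combinations(article_keywords(article), 2))
    return pairs

totals = keyword_counts(articles)
early = keyword_counts(articles, start=2000, end=2010)
pairs = cooccurrence(articles)
print(totals.most_common(2))
print(pairs[("experiment", "overconfidence")])
```

The pair counts form the weighted edges of a keyword map, and the per-period counts show how a keyword's use changes over time.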

16.
Curr Protoc ; 4(8): e1120, 2024 Aug.
Article in English | MEDLINE | ID: mdl-39126338

ABSTRACT

JBrowse 2 is a modular genome browser that can visualize many common genomic file formats. While JBrowse 2 supports a variety of different usages, it is particularly suited for deployment on websites, such as model organism databases or other web-based genomic data resources. This protocol provides detailed instructions for setting up JBrowse 2 on an Ubuntu Linux web server, loading a reference genome from a FASTA format file, and adding a gene annotation track from a GFF3 format file. By the end of the protocol, users will have a working JBrowse 2 instance that is accessible via the web. © 2024 The Author(s). Current Protocols published by Wiley Periodicals LLC. Basic Protocol: Setting up JBrowse 2 on your web server.


Subject(s)
Genomics , Genomics/methods , Software , Web Browser , Databases, Genetic , Internet , Genome/genetics , Humans , User-Computer Interface
17.
Front Toxicol ; 6: 1437884, 2024.
Article in English | MEDLINE | ID: mdl-39104826

ABSTRACT

In environmental health, the specific molecular mechanisms connecting a chemical exposure to an adverse endpoint are often unknown, reflecting knowledge gaps. At the public Comparative Toxicogenomics Database (CTD; https://ctdbase.org/), we integrate manually curated, literature-based interactions from CTD to compute four-unit blocks of information organized as a potential step-wise molecular mechanism, known as "CGPD-tetramers," wherein a chemical interacts with a gene product to trigger a phenotype which can be linked to a disease. These computationally derived datasets can be used to fill the gaps and offer testable mechanistic information. Users can generate CGPD-tetramers for any combination of chemical, gene, phenotype, and/or disease of interest at CTD; however, such queries typically result in the generation of thousands of CGPD-tetramers. Here, we describe a novel approach to transform these large datasets into user-friendly chord diagrams using R. This visualization process is straightforward, simple to implement, and accessible to inexperienced users that have never used R before. Combining CGPD-tetramers into a single chord diagram helps identify potential key chemicals, genes, phenotypes, and diseases. This visualization allows users to more readily analyze computational datasets that can fill the exposure knowledge gaps in the environmental health continuum.
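
The authors build their chord diagrams in R. The sketch below, written in Python only for illustration, shows one plausible way to reduce a large set of CGPD-tetramers to the weighted links a chord diagram needs. The tetramer records are placeholders, and the choice to link only adjacent elements (chemical to gene, gene to phenotype, phenotype to disease) is an assumption, not the procedure described in the paper.

```python
from collections import Counter

# Hypothetical CGPD-tetramers: (chemical, gene, phenotype, disease).
tetramers = [
    ("chemical_A", "GENE1", "phenotype_X", "disease_1"),
    ("chemical_A", "GENE2", "phenotype_X", "disease_1"),
    ("chemical_B", "GENE1", "phenotype_Y", "disease_1"),
    ("chemical_A", "GENE1", "phenotype_Y", "disease_2"),
]

def chord_links(tetramers):
    """Count how many tetramers support each adjacent pair of elements."""
    links = Counter()
    for chemical, gene, phenotype, disease in tetramers:
        links[(chemical, gene)] += 1
        links[(gene, phenotype)] += 1
        links[(phenotype, disease)] += 1
    return links

def node_weights(links):
    """Total link weight per node; heavily connected nodes are candidate key players."""
    weights = Counter()
    for (source, target), count in links.items():
        weights[source] += count
        weights[target] += count
    return weights

links = chord_links(tetramers)
weights = node_weights(links)
top_nodes = weights.most_common(3)
print(top_nodes)
```

The `links` counter corresponds to the edge list a chord-diagram routine would draw, and the node weights give a rough ranking of the most connected chemicals, genes, phenotypes, and diseases.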

18.
Gigascience ; 13, 2024 Jan 02.
Article in English | MEDLINE | ID: mdl-39172544

ABSTRACT

BACKGROUND: As single-cell sequencing technologies continue to advance, the growing volume and complexity of the ensuing data present new analytical challenges. Large cellular populations from single-cell atlases are more difficult to visualize and require extensive processing to identify biologically relevant subpopulations. Managing these workflows is also laborious for technical users and unintuitive for nontechnical users. RESULTS: We present TooManyCellsInteractive (TMCI), a browser-based JavaScript application for interactive exploration of cell populations. TMCI provides an intuitive interface to visualize and manipulate a radial tree representation of hierarchical cell subpopulations and allows users to easily overlay, filter, and compare biological features at multiple resolutions. Here we describe the software architecture and demonstrate how we used TMCI in a pan-cancer analysis to identify unique survival pathways among drug-tolerant persister cells. CONCLUSIONS: TMCI will facilitate exploration and visualization of large-scale sequencing data in a user-friendly way. TMCI is freely available at https://github.com/schwartzlab-methods/too-many-cells-interactive. An example tree from data within this article is available at https://tmci.schwartzlab.ca/.


Subject(s)
Single-Cell Analysis , Software , Single-Cell Analysis/methods , Humans , Computational Biology/methods , Neoplasms/genetics , Neoplasms/pathology
19.
Mol Ecol Resour ; : e13996, 2024 Aug 04.
Article in English | MEDLINE | ID: mdl-39099161

ABSTRACT

The analysis of meta-omics data requires the utilization of several bioinformatics tools and proficiency in informatics. The integration of multiple meta-omics data is even more challenging, and the outputs of existing bioinformatics solutions are not always easy to interpret. Here, we present a meta-omics bioinformatics pipeline, Meta-Omics Software for Community Analysis (MOSCA), which aims to overcome these limitations. MOSCA was initially developed for analysing metagenomics (MG) and metatranscriptomics (MT) data. Now, it also performs MG and metaproteomics (MP) integrated analysis, and MG/MT analysis was upgraded with an additional iterative binning step, metabolic pathways mapping, and several improvements regarding functional annotation and data visualization. MOSCA handles raw sequencing data and mass spectra and performs pre-processing, assembly, annotation, binning and differential gene/protein expression analysis. MOSCA shows taxonomic and functional analysis in large tables, performs metabolic pathways mapping, generates Krona plots and shows gene/protein expression results in heatmaps, improving omics data visualization. MOSCA is easily run from a single command while also providing a web interface (MOSGUITO). Relevant features include an extensive set of customization options, allowing tailored analyses to suit specific research objectives, and the ability to restart the pipeline from intermediary checkpoints using alternative configurations. Two case studies showcased MOSCA results, giving a complete view of the anaerobic microbial communities from anaerobic digesters and insights on the role of specific microorganisms. MOSCA represents a pivotal advancement in meta-omics research, offering an intuitive, comprehensive, and versatile solution for researchers seeking to unravel the intricate tapestry of microbial communities.

20.
J Cheminform ; 16(1): 101, 2024 Aug 16.
Article in English | MEDLINE | ID: mdl-39152469

ABSTRACT

With the increased availability of chemical data in public databases, innovative techniques and algorithms have emerged for the analysis, exploration, visualization, and extraction of information from these data. One such technique is chemical grouping, where chemicals with common characteristics are categorized into distinct groups based on physicochemical properties, use, biological activity, or a combination. However, existing tools for chemical grouping often require specialized programming skills or the use of commercial software packages. To address these challenges, we developed a user-friendly chemical grouping workflow implemented in KNIME, a free, open-source, low/no-code, data analytics platform. The workflow serves as an all-encompassing tool, expertly incorporating a range of processes such as molecular descriptor calculation, feature selection, dimensionality reduction, hyperparameter search, and supervised and unsupervised machine learning methods, enabling effective chemical grouping and visualization of results. Furthermore, we implemented tools for interpretation, identifying key molecular descriptors for the chemical groups, and using natural language summaries to clarify the rationale behind these groupings. The workflow was designed to run seamlessly in both the KNIME local desktop version and KNIME Server WebPortal as a web application. It incorporates interactive interfaces and guides to assist users in a step-by-step manner. We demonstrate the utility of this workflow through a case study using an eye irritation and corrosion dataset. SCIENTIFIC CONTRIBUTIONS: This work presents a novel, comprehensive chemical grouping workflow in KNIME, enhancing accessibility by integrating a user-friendly graphical interface that eliminates the need for extensive programming skills. 
This workflow uniquely combines several features such as automated molecular descriptor calculation, feature selection, dimensionality reduction, and machine learning algorithms (both supervised and unsupervised), with hyperparameter optimization to refine chemical grouping accuracy. Moreover, we have introduced an innovative interpretative step and natural language summaries to elucidate the underlying reasons for chemical groupings, significantly advancing the usability of the tool and interpretability of the results.
