Results 1 - 20 of 151
1.
Nat Methods ; 21(2): 182-194, 2024 Feb.
Article in English | MEDLINE | ID: mdl-38347140

ABSTRACT

Validation metrics are key for tracking scientific progress and bridging the current chasm between artificial intelligence research and its translation into practice. However, increasing evidence shows that, particularly in image analysis, metrics are often chosen inadequately. Although taking into account the individual strengths, weaknesses and limitations of validation metrics is a critical prerequisite to making educated choices, the relevant knowledge is currently scattered and poorly accessible to individual researchers. Based on a multistage Delphi process conducted by a multidisciplinary expert consortium as well as extensive community feedback, the present work provides a reliable and comprehensive common point of access to information on pitfalls related to validation metrics in image analysis. Although focused on biomedical image analysis, the addressed pitfalls generalize across application domains and are categorized according to a newly created, domain-agnostic taxonomy. The work serves to enhance global comprehension of a key topic in image analysis validation.
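
As a small, self-contained illustration of one classic pitfall covered by work of this kind (not an example taken from the paper), the sketch below shows how overall accuracy can look excellent on a heavily imbalanced segmentation mask while the Dice score for the rare foreground class exposes complete failure; the arrays and sizes are invented for the example.

```python
import numpy as np

# Ground truth: a 100x100 mask where only a 5x5 patch is foreground (rare class).
gt = np.zeros((100, 100), dtype=int)
gt[40:45, 40:45] = 1

# A degenerate "model" that predicts background everywhere.
pred = np.zeros_like(gt)

accuracy = (pred == gt).mean()                            # dominated by the background class
intersection = np.logical_and(pred == 1, gt == 1).sum()
dice = 2 * intersection / (pred.sum() + gt.sum() + 1e-8)  # 0 for an empty prediction

print(f"accuracy = {accuracy:.4f}")  # ~0.9975, looks excellent
print(f"dice     = {dice:.4f}")      # 0.0, exposes the failure
```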


Subject(s)
Artificial Intelligence
2.
Radiology ; 310(2): e231319, 2024 Feb.
Article in English | MEDLINE | ID: mdl-38319168

ABSTRACT

Filters are commonly used to enhance specific structures and patterns in images, such as vessels or peritumoral regions, to enable clinical insights beyond the visible image using radiomics. However, their lack of standardization restricts reproducibility and clinical translation of radiomics decision support tools. In this special report, teams of researchers who developed radiomics software participated in a three-phase study (September 2020 to December 2022) to establish a standardized set of filters. The first two phases focused on finding reference filtered images and reference feature values for commonly used convolutional filters: mean, Laplacian of Gaussian, Laws and Gabor kernels, separable and nonseparable wavelets (including decomposed forms), and Riesz transformations. In the first phase, 15 teams used digital phantoms to establish 33 reference filtered images of 36 filter configurations. In phase 2, 11 teams used a chest CT image to derive reference values for 323 of 396 features computed from filtered images using 22 filter and image processing configurations. Reference filtered images and feature values for Riesz transformations were not established. Reproducibility of standardized convolutional filters was validated on a public data set of multimodal imaging (CT, fluorodeoxyglucose PET, and T1-weighted MRI) in 51 patients with soft-tissue sarcoma. At validation, reproducibility of 486 features computed from filtered images using nine configurations × three imaging modalities was assessed using the lower bounds of 95% CIs of intraclass correlation coefficients. Out of 486 features, 458 were found to be reproducible across nine teams with lower bounds of 95% CIs of intraclass correlation coefficients greater than 0.75. In conclusion, eight filter types were standardized with reference filtered images and reference feature values for verifying and calibrating radiomics software packages. A web-based tool is available for compliance checking.
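
As a rough illustration of the kind of convolutional filters being standardized here (not the reference implementation from the study), the sketch below applies a mean filter and a Laplacian of Gaussian to a synthetic 2D image with SciPy; the kernel size and Gaussian scale are arbitrary choices for the example.

```python
import numpy as np
from scipy import ndimage

# Synthetic 2D "slice": a bright disk on a noisy background.
rng = np.random.default_rng(0)
image = rng.normal(0.0, 0.1, size=(128, 128))
yy, xx = np.mgrid[:128, :128]
image[(yy - 64) ** 2 + (xx - 64) ** 2 < 20 ** 2] += 1.0

# Mean filter: local average over a 5x5 neighbourhood.
mean_filtered = ndimage.uniform_filter(image, size=5)

# Laplacian of Gaussian: enhances blob-like structures at scale sigma.
log_filtered = ndimage.gaussian_laplace(image, sigma=2.0)

print(mean_filtered.shape, log_filtered.shape)
```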


Subject(s)
Image Processing, Computer-Assisted , Radiomics , Humans , Reproducibility of Results , Biomarkers , Multimodal Imaging
3.
BMC Ophthalmol ; 23(1): 220, 2023 May 17.
Article in English | MEDLINE | ID: mdl-37198558

ABSTRACT

BACKGROUND: Amblyopia is the most common developmental vision disorder in children. The initial treatment consists of refractive correction. When this is insufficient, occlusion therapy may further improve visual acuity. However, the challenges and compliance issues associated with occlusion therapy may result in treatment failure and residual amblyopia. Virtual reality (VR) games developed to improve visual function have shown positive preliminary results. The aim of this study is to determine the efficacy of these games in improving vision, attention, and motor skills in patients with residual amblyopia, and to identify brain-related changes. We hypothesize that VR-based training with the suggested ingredients (3D cues and rich feedback), combined with increasing difficulty levels and the use of various games in a home-based environment, is crucial for the efficacy of vision recovery and may be particularly effective in children. METHODS: The AMBER study is a randomized, cross-over, controlled trial designed to assess the effect of binocular stimulation (VR-based stereoptic serious games), compared to refractive correction, on vision, selective attention and motor control skills in individuals with residual amblyopia (n = 30, 6-35 years of age). Additionally, participants will be compared to a control group of age-matched healthy individuals (n = 30) to account for the unique benefit of VR-based serious games. All participants will play serious games 30 min per day, 5 days per week, for 8 weeks. The games are delivered with the Vivid Vision Home software. The amblyopic cohort will receive both treatments in a randomized order according to the type of amblyopia, while the control group will only receive the VR-based stereoscopic serious games. The primary outcome is visual acuity in the amblyopic eye. Secondary outcomes include stereoacuity, functional vision, cortical visual responses, selective attention, and motor control. The outcomes will be measured before and after each treatment, with an 8-week follow-up. DISCUSSION: The VR-based games used in this study have been conceived to deliver binocular visual stimulation tailored to the individual visual needs of the patient, which will potentially result in improved basic and functional vision skills as well as visual attention and motor control skills. TRIAL REGISTRATION: This protocol is registered on ClinicalTrials.gov (identifier: NCT05114252) and in the Swiss National Clinical Trials Portal (identifier: SNCTP000005024).


Subject(s)
Amblyopia , Video Games , Child , Humans , Amblyopia/therapy , Vision, Binocular/physiology , Visual Acuity , Treatment Outcome , Randomized Controlled Trials as Topic
4.
BMC Med Imaging ; 21(1): 77, 2021 05 08.
Article in English | MEDLINE | ID: mdl-33964886

ABSTRACT

BACKGROUND: One challenge in training deep convolutional neural network (CNN) models with whole slide images (WSIs) is providing the required large number of costly, manually annotated image regions. Strategies to alleviate the scarcity of annotated data include using transfer learning, data augmentation and training the models with less expensive image-level annotations (weakly-supervised learning). However, it is not clear how to use transfer learning in a CNN model when different data sources are available for training, or how to leverage the combination of large amounts of weakly annotated images with a set of local region annotations. This paper aims to evaluate CNN training strategies based on transfer learning to leverage the combination of weak and strong annotations in heterogeneous data sources. The trade-off between classification performance and annotation effort is explored by evaluating a CNN that learns from strong labels (region annotations) and is later fine-tuned on a dataset with less expensive weak (image-level) labels. RESULTS: As expected, the model performance on strongly annotated data steadily increases as the percentage of strong annotations that are used increases, reaching a performance comparable to pathologists ([Formula: see text]). Nevertheless, the performance sharply decreases when applied to the WSI classification scenario with [Formula: see text]. Moreover, performance remains lower regardless of the number of annotations used. The model performance increases when fine-tuning the model for the task of Gleason scoring with the weak WSI labels [Formula: see text]. CONCLUSION: Combining weak and strong supervision improves upon strong supervision alone in the classification of Gleason patterns using tissue microarrays (TMA) and WSI regions. Our results provide effective strategies for training CNN models that combine limited annotated data and heterogeneous data sources. The performance increases in the controlled TMA scenario with the number of annotations used to train the model. Nevertheless, the performance is hindered when the trained TMA model is applied directly to the more challenging WSI classification problem. This demonstrates that a good pre-trained model for prostate cancer TMA image classification may lead to the best downstream model if fine-tuned on the WSI target dataset. The source code repository for reproducing the experiments in the paper is available at: https://github.com/ilmaro8/Digital_Pathology_Transfer_Learning.
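
The general recipe described here, pre-training a CNN on strongly annotated regions and then fine-tuning it on cheaper image-level labels, can be sketched as below. This is a generic PyTorch mock-up, not the authors' code (which is at the linked repository); the tiny network, tensor shapes and label counts are placeholders.

```python
import torch
import torch.nn as nn

class SmallCNN(nn.Module):
    """Toy backbone standing in for the patch classifier."""
    def __init__(self, n_classes):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(32, n_classes)

    def forward(self, x):
        return self.head(self.features(x).flatten(1))

def train(model, x, y, epochs=3, lr=1e-3):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        opt.step()

# Stage 1: strong supervision on region-level (e.g. TMA patch) labels.
x_strong = torch.randn(32, 3, 64, 64)
y_strong = torch.randint(0, 4, (32,))        # e.g. benign + Gleason patterns
model = SmallCNN(n_classes=4)
train(model, x_strong, y_strong)

# Stage 2: reuse the backbone, fine-tune a new head on weak image-level labels.
x_weak = torch.randn(32, 3, 64, 64)
y_weak = torch.randint(0, 3, (32,))          # e.g. coarse slide-level score
model.head = nn.Linear(32, 3)                # replace the classification head
train(model, x_weak, y_weak, lr=1e-4)        # smaller learning rate for fine-tuning
```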


Subject(s)
Neoplasm Grading/methods , Neural Networks, Computer , Prostatic Neoplasms/pathology , Supervised Machine Learning , Datasets as Topic , Diagnosis, Computer-Assisted/methods , Humans , Male , Neoplasm Grading/classification , Prostate/pathology , Prostatectomy/methods , Prostatic Neoplasms/surgery , Tissue Array Analysis
5.
Sensors (Basel) ; 21(22)2021 Nov 11.
Article in English | MEDLINE | ID: mdl-34833573

ABSTRACT

One major challenge limiting the use of dexterous robotic hand prostheses controlled via electromyography and pattern recognition relates to the substantial effort required to train complex models from scratch. To overcome this problem, several studies in recent years have proposed using transfer learning, combining pre-trained models (obtained from prior subjects) with training sessions performed on a specific user. Although a few promising results were reported in the past, it was recently shown that the use of conventional transfer learning algorithms does not increase performance if proper hyperparameter optimization is performed on the standard approach that does not exploit transfer learning. The objective of this paper is to introduce novel analyses on this topic by using a random forest classifier without hyperparameter optimization and to extend them with experiments performed on data recorded from the same patient, but in different data acquisition sessions. Two domain adaptation techniques were tested on the random forest classifier, allowing us to conduct experiments on healthy subjects and amputees. In contrast to several previous papers, our results show that there are no appreciable improvements in terms of accuracy, regardless of the transfer learning techniques tested. The lack of adaptive learning is also demonstrated for the first time in an intra-subject experimental setting when using, as a source, ten data acquisitions recorded from the same subject over five different days.
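
A minimal sketch of the kind of cross-session evaluation described here, assuming EMG feature windows have already been extracted into fixed-length vectors (the array shapes and session split are invented): a random forest is trained on earlier acquisition sessions and tested on a later one, with no hyperparameter optimization.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)

# Placeholder EMG feature matrices: (windows, features) per acquisition session.
sessions_X = [rng.normal(size=(200, 48)) for _ in range(5)]    # 5 recording days
sessions_y = [rng.integers(0, 8, size=200) for _ in range(5)]  # 8 hand movements

# Train on the first four sessions ("source"), test on the last one ("target").
X_train = np.vstack(sessions_X[:4])
y_train = np.concatenate(sessions_y[:4])
X_test, y_test = sessions_X[4], sessions_y[4]

clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X_train, y_train)
print("inter-session accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```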


Subject(s)
Amputees , Artificial Limbs , Algorithms , Electromyography , Hand , Humans , Pattern Recognition, Automated
6.
Radiology ; 295(2): 328-338, 2020 05.
Article in English | MEDLINE | ID: mdl-32154773

ABSTRACT

Background Radiomic features may quantify characteristics present in medical imaging. However, the lack of standardized definitions and validated reference values has hampered clinical use. Purpose To standardize a set of 174 radiomic features. Materials and Methods Radiomic features were assessed in three phases. In phase I, 487 features were derived from the basic set of 174 features. Twenty-five research teams with unique radiomics software implementations computed feature values directly from a digital phantom, without any additional image processing. In phase II, 15 teams computed values for 1347 derived features using a CT image of a patient with lung cancer and predefined image processing configurations. In both phases, consensus among the teams on the validity of tentative reference values was measured through the frequency of the modal value and classified as follows: less than three matches, weak; three to five matches, moderate; six to nine matches, strong; 10 or more matches, very strong. In the final phase (phase III), a public data set of multimodality images (CT, fluorine 18 fluorodeoxyglucose PET, and T1-weighted MRI) from 51 patients with soft-tissue sarcoma was used to prospectively assess reproducibility of standardized features. Results Consensus on reference values was initially weak for 232 of 302 features (76.8%) at phase I and 703 of 1075 features (65.4%) at phase II. At the final iteration, weak consensus remained for only two of 487 features (0.4%) at phase I and 19 of 1347 features (1.4%) at phase II. Strong or better consensus was achieved for 463 of 487 features (95.1%) at phase I and 1220 of 1347 features (90.6%) at phase II. Overall, 169 of 174 features were standardized in the first two phases. In the final validation phase (phase III), most of the 169 standardized features could be excellently reproduced (166 with CT; 164 with PET; and 164 with MRI). Conclusion A set of 169 radiomics features was standardized, which enabled verification and calibration of different radiomics software. © RSNA, 2020. Online supplemental material is available for this article. See also the editorial by Kuhl and Truhn in this issue.
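
The consensus rule reported here (frequency of the modal value across teams mapped to weak/moderate/strong/very strong) is simple enough to express directly; the sketch below is one illustrative reading of that rule, with made-up submitted values and naive exact matching of rounded values.

```python
from collections import Counter

def consensus_level(values):
    """Classify consensus from the frequency of the modal value across teams."""
    counts = Counter(round(v, 6) for v in values)   # naive exact matching for the example
    matches = counts.most_common(1)[0][1]
    if matches < 3:
        return "weak"
    if matches <= 5:
        return "moderate"
    if matches <= 9:
        return "strong"
    return "very strong"

# Example: 12 teams submit a value for one feature; 10 of them agree.
submitted = [4.27] * 10 + [4.31, 3.98]
print(consensus_level(submitted))  # "very strong"
```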


Subject(s)
Biomarkers/analysis , Image Processing, Computer-Assisted/standards , Software , Calibration , Fluorodeoxyglucose F18 , Humans , Lung Neoplasms/diagnostic imaging , Magnetic Resonance Imaging , Phantoms, Imaging , Phenotype , Positron-Emission Tomography , Radiopharmaceuticals , Reproducibility of Results , Sarcoma/diagnostic imaging , Tomography, X-Ray Computed
7.
Sensors (Basel) ; 20(15)2020 Aug 01.
Article in English | MEDLINE | ID: mdl-32752155

ABSTRACT

BACKGROUND: Muscle synergy analysis is an approach to understand the neurophysiological mechanisms behind the hypothesized ability of the Central Nervous System (CNS) to reduce the dimensionality of muscle control. The muscle synergy approach is also used to evaluate motor recovery and the evolution of the patients' motor performance both in single-session and longitudinal studies. Synergy-based assessments are subject to various sources of variability: natural trial-by-trial variability of performed movements, intrinsic characteristics of subjects that change over time (e.g., recovery, adaptation, exercise, etc.), as well as experimental factors such as different electrode positioning. These sources of variability need to be quantified in order to resolve challenges for the application of muscle synergies in clinical environments. The objective of this study is to analyze the stability and similarity of extracted muscle synergies under the effect of factors that may induce variability, including inter- and intra-session variability within subjects and inter-subject variability differentiation. The analysis was performed using the comprehensive, publicly available hand grasp NinaPro Database, featuring surface electromyography (EMG) measures from two EMG electrode bracelets. METHODS: Intra-session, inter-session, and inter-subject synergy stability was analyzed using the following measures: variance accounted for (VAF) and number of synergies (NoS) as measures of reconstruction stability quality and cosine similarity for comparison of spatial composition of extracted synergies. Moreover, an approach based on virtual electrode repositioning was applied to shed light on the influence of electrode position on inter-session synergy similarity. RESULTS: Inter-session synergy similarity was significantly lower with respect to intra-session similarity, both considering coefficient of variation of VAF (approximately 0.2-15% for inter vs. approximately 0.1% to 2.5% for intra, depending on NoS) and coefficient of variation of NoS (approximately 6.5-14.5% for inter vs. approximately 3-3.5% for intra, depending on VAF) as well as synergy similarity (approximately 74-77% for inter vs. approximately 88-94% for intra, depending on the selected VAF). Virtual electrode repositioning revealed that a slightly different electrode position can lower similarity of synergies from the same session and can increase similarity between sessions. Finally, the similarity of inter-subject synergies has no significant difference from the similarity of inter-session synergies (both on average approximately 84-90% depending on selected VAF). CONCLUSION: Synergy similarity was lower in inter-session conditions with respect to intra-session. This finding should be considered when interpreting results from multi-session assessments. Lastly, electrode positioning might play an important role in the lower similarity of synergies over different sessions.
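
A compact sketch, not the study's pipeline, of the three quantities used here: synergies extracted with non-negative matrix factorization, variance accounted for (VAF) of the reconstruction, and cosine similarity between spatial synergy vectors from two sessions. The EMG envelope matrices below are random placeholders.

```python
import numpy as np
from sklearn.decomposition import NMF
from sklearn.metrics.pairwise import cosine_similarity

def extract_synergies(emg, n_synergies):
    """emg: (samples, muscles) non-negative envelope matrix."""
    model = NMF(n_components=n_synergies, init="nndsvda", max_iter=1000, random_state=0)
    activations = model.fit_transform(emg)      # temporal coefficients (samples, k)
    synergies = model.components_               # spatial weights (k, muscles)
    vaf = 1.0 - np.sum((emg - activations @ synergies) ** 2) / np.sum(emg ** 2)
    return synergies, activations, vaf

rng = np.random.default_rng(0)
emg_day1 = rng.random((2000, 12))   # e.g. 12 channels from one electrode bracelet
emg_day2 = rng.random((2000, 12))

syn1, _, vaf1 = extract_synergies(emg_day1, n_synergies=4)
syn2, _, vaf2 = extract_synergies(emg_day2, n_synergies=4)

# Pair each synergy of day 1 with its most similar counterpart of day 2.
sim = cosine_similarity(syn1, syn2)
print("VAF day1/day2:", round(vaf1, 3), round(vaf2, 3))
print("best-match similarities:", sim.max(axis=1).round(3))
```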


Subject(s)
Hand Strength , Muscle, Skeletal , Activities of Daily Living , Adult , Biomechanical Phenomena , Electromyography , Female , Hand , Humans , Male , Young Adult
8.
J Neuroeng Rehabil ; 16(1): 63, 2019 05 28.
Article in English | MEDLINE | ID: mdl-31138257

ABSTRACT

BACKGROUND: Hand grasp patterns require complex coordination. Reducing kinematic dimensionality is a key step in studying the patterns underlying hand usage and grasping. It allows the definition of metrics for motor assessment and rehabilitation, and the development of assistive devices and prosthesis control methods. Several studies have been presented in this field, but most targeted a limited number of subjects, focused on postures rather than entire grasping movements, and did not perform separate analyses for tasks and subjects, which can limit the impact on rehabilitation and assistive applications. This paper provides a comprehensive mapping of synergies from hand grasps targeting activities of daily living. It clarifies several current limits of the field and fosters the development of applications in rehabilitation and assistive robotics. METHODS: In this work, hand kinematic data of 77 subjects, performing up to 20 hand grasps, were acquired with a 22-sensor CyberGlove II data glove and analyzed. Principal Component Analysis (PCA) and hierarchical cluster analysis were used to extract and group kinematic synergies that summarize the coordination patterns available for hand grasps. RESULTS: Twelve synergies were found to account for > 80% of the overall variation. The first three synergies accounted for more than 50% of the total amount of variance and consisted of the flexion and adduction of the metacarpophalangeal (MCP) joints of fingers 3 to 5 (synergy #1), palmar arching and flexion of the wrist (synergy #2), and opposition of the thumb (synergy #3). Further synergies refine movements and have higher variability among subjects. CONCLUSION: Kinematic synergies are extracted from a large number of subjects (77) and grasps related to activities of daily living (20). The number of motor modules required to perform the motor tasks is higher than previously described. Twelve synergies are responsible for most of the variation in hand grasping. The first three are used as primary synergies, while the remaining ones target finer movements (e.g. independence of thumb and index finger). The results generalize the description of hand kinematics, better clarifying several limits of the field and fostering the development of applications in rehabilitation and assistive robotics.
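
The dimensionality-reduction step described here can be sketched with scikit-learn's PCA: choose the number of kinematic synergies as the smallest number of components whose cumulative explained variance exceeds 80%. The joint-angle matrix below is a random stand-in for real data-glove recordings.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Placeholder joint-angle matrix: (samples across grasps, 22 glove sensors).
angles = rng.normal(size=(5000, 22)) @ rng.normal(size=(22, 22))

pca = PCA()
pca.fit(angles)
cumvar = np.cumsum(pca.explained_variance_ratio_)
n_synergies = int(np.argmax(cumvar >= 0.80)) + 1

print("synergies needed for >80% variance:", n_synergies)
print("cumulative variance of first three components:", cumvar[2].round(3))
```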


Subject(s)
Activities of Daily Living , Hand Strength/physiology , Motor Activity/physiology , Biomechanical Phenomena , Datasets as Topic , Female , Humans , Male , Principal Component Analysis
9.
J Neuroeng Rehabil ; 16(1): 28, 2019 02 15.
Article in English | MEDLINE | ID: mdl-30770759

ABSTRACT

BACKGROUND: Proper modeling of human grasping and hand movements is fundamental for robotics, prosthetics, physiology and rehabilitation. The taxonomies of hand grasps that have been proposed in the scientific literature so far are based on qualitative analyses of the movements and are thus usually not quantitatively justified. METHODS: This paper presents, to the best of our knowledge, the first quantitative taxonomy of hand grasps based on biomedical data measurements. The taxonomy is based on electromyography and kinematic data recorded from 40 healthy subjects performing 20 unique hand grasps. For each subject, a set of hierarchical trees is computed for several signal features. Afterwards, the trees are combined, first into modality-specific (i.e. muscular and kinematic) taxonomies of hand grasps and then into a general quantitative taxonomy of hand movements. The modality-specific taxonomies provide similar results despite describing different parameters of hand movements, one being muscular and the other kinematic. RESULTS: The general taxonomy merges the kinematic and muscular descriptions into a comprehensive hierarchical structure. The obtained results clarify what has been proposed in the literature so far and partially confirm the qualitative parameters used to create previous taxonomies of hand grasps. According to the results, hand movements can be divided into five movement categories defined based on the overall grasp shape, finger positioning and muscular activation. Part of the results is qualitatively in accordance with previous findings describing kinematic hand grasping synergies. CONCLUSIONS: The taxonomy of hand grasps proposed in this paper clarifies with quantitative measurements what has been proposed in the field on a qualitative basis, thus having a potential impact on several scientific fields.
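
A minimal sketch of the clustering machinery behind such a taxonomy (not the authors' procedure): average-linkage hierarchical clustering over per-grasp feature vectors yields a tree that can be cut into movement categories. The feature values, distance metric and number of grasps are placeholders.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)
# One feature vector per hand grasp, e.g. averaged EMG and joint-angle features.
grasp_features = rng.normal(size=(20, 30))      # 20 grasps x 30 features

distances = pdist(grasp_features, metric="correlation")
tree = linkage(distances, method="average")     # hierarchical taxonomy of grasps

# Cut the tree into five movement categories, matching the reported structure.
categories = fcluster(tree, t=5, criterion="maxclust")
print(categories)
```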


Subject(s)
Hand Strength/physiology , Hand/physiology , Adult , Algorithms , Biomechanical Phenomena , Classification , Electromyography , Female , Fingers , Hand/anatomy & histology , Healthy Volunteers , Humans , Male , Movement , Reference Values , Signal Processing, Computer-Assisted
10.
J Biomed Inform ; 56: 57-64, 2015 Aug.
Article in English | MEDLINE | ID: mdl-26002820

ABSTRACT

Information search has changed the way we manage knowledge, and the ubiquity of information access has made search a frequent activity, whether via Internet search engines or increasingly via mobile devices. Medical information search is in this respect no different, and much research has been devoted to analyzing the way in which physicians aim to access information. Medical image search is a much smaller domain but has gained much attention as it has different characteristics than search for text documents. While web search log files have been analysed many times to better understand user behaviour, the log files of hospital internal systems for search in a PACS/RIS (Picture Archival and Communication System, Radiology Information System) have rarely been analysed. Such a comparison between a hospital PACS/RIS search and a web system for searching images of the biomedical literature is the goal of this paper. The objectives are to identify similarities and differences in the search behaviour of the two systems, which could then be used to optimize existing systems and build new search engines. Log files of the ARRS GoldMiner medical image search engine (freely accessible on the Internet) containing 222,005 queries, and log files of Stanford's internal PACS/RIS search called radTF containing 18,068 queries were analysed. Each query was preprocessed and all query terms were mapped to the RadLex (Radiology Lexicon) terminology, a comprehensive lexicon of radiology terms created and maintained by the Radiological Society of North America, so the semantic content in the queries and the links between terms could be analysed, and synonyms for the same concept could be detected. RadLex was mainly created for use in radiology reports, to aid structured reporting and the preparation of educational material (Langlotz, 2006) [1]. In standard medical vocabularies such as MeSH (Medical Subject Headings) and UMLS (Unified Medical Language System), specific radiology terms are often underrepresented; therefore, RadLex was considered the best option for this task. The results show a surprising similarity between the usage behaviour in the two systems, but several subtle differences can also be noted. The average number of terms per query is 2.21 for GoldMiner and 2.07 for radTF, and the RadLex axes used (anatomy, pathology, findings, …) have almost the same distribution, with clinical findings being the most frequent and the anatomical entity the second; combinations of RadLex axes are also extremely similar between the two systems. Differences include longer sessions in radTF than in GoldMiner (3.4 versus 1.9 queries per session on average). Several frequent search terms overlap, but some strong differences exist in the details. In radTF the term "normal" is frequent, whereas in GoldMiner it is not. This makes intuitive sense, as in the literature normal cases are rarely described, whereas in clinical work the comparison with normal cases is often a first step. The general similarity in many points is likely due to the fact that users of the two systems are influenced by their daily behaviour in using standard web search engines and follow this behaviour in their professional search. This means that many results and insights gained from standard web search can likely be transferred to more specialized search systems. Still, specialized log files can be used to find out more about reformulations and the detailed strategies users employ to find the right content.


Subject(s)
Medical Informatics/instrumentation , Radiographic Image Interpretation, Computer-Assisted/instrumentation , Radiology Information Systems , Radiology/instrumentation , Algorithms , Computer Graphics , Hospitals , Information Storage and Retrieval , Internet , Medical Informatics/methods , Natural Language Processing , Radiographic Image Interpretation, Computer-Assisted/methods , Search Engine , Semantics , User-Computer Interface
11.
J Digit Imaging ; 28(5): 537-46, 2015 Oct.
Article in English | MEDLINE | ID: mdl-25810317

ABSTRACT

Log files of information retrieval systems that record user behavior have been used to improve the outcomes of retrieval systems, understand user behavior, and predict events. In this article, a log file of the ARRS GoldMiner search engine containing 222,005 consecutive queries is analyzed. Time stamps are available for each query, as well as masked IP addresses, which makes it possible to identify queries from the same person. This article describes the ways in which physicians (or Internet searchers interested in medical images) search and proposes potential improvements by suggesting query modifications. For example, many queries contain only a few terms and therefore are not specific; others contain spelling mistakes or non-medical terms that likely lead to poor or empty results. One of the goals of this report is to predict the number of results a query will return, since such a model allows search engines to automatically propose query modifications in order to avoid result lists that are empty or too large. This prediction is made based on characteristics of the query terms themselves. Prediction of empty results has an accuracy above 88%, and thus can be used to automatically modify the query to avoid empty result sets for a user. The semantic analysis and the data on reformulations made by users in the past can aid the development of better search systems, particularly to improve results for novice users. Therefore, this paper provides important insights into how people search and how to use this knowledge to improve the performance of specialized medical search engines.
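
A hedged sketch of the prediction idea described here: represent each query with simple surface features and train a classifier to flag queries likely to return no results. The feature set (query length and fraction of terms found in a tiny stand-in vocabulary), the toy log entries and the vocabulary are all assumptions for illustration, not the paper's model.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Tiny stand-in vocabulary of indexed terms (in practice: the search index / RadLex).
VOCAB = {"pulmonary", "embolism", "ct", "mri", "flair", "glioblastoma",
         "pneumothorax", "knee", "xray", "lesion", "liver", "hemangioma"}

def query_features(query):
    terms = query.lower().split()
    in_vocab = sum(t in VOCAB for t in terms)
    return [len(terms),                      # query length in terms
            in_vocab / max(len(terms), 1)]   # fraction of known terms

# Toy log entries: (query, 1 if the result list was empty)
log = [("pulmonary embolism ct", 0), ("glioblastoma mri flair", 0),
       ("pneumothorax", 0), ("liver hemangioma mri", 0),
       ("xzqw lesion", 1), ("ostearthritis knee xray", 1), ("asdfgh", 1)]

X = np.array([query_features(q) for q, _ in log])
y = np.array([label for _, label in log])

clf = LogisticRegression().fit(X, y)
print(clf.predict([query_features("knee mri"), query_features("qwertz spine")]))
```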


Subject(s)
Information Storage and Retrieval/methods , Internet , Radiology Information Systems , Semantics , User-Computer Interface , Humans
12.
Comput Methods Programs Biomed ; 250: 108187, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38657383

ABSTRACT

BACKGROUND AND OBJECTIVE: The automatic registration of differently stained whole slide images (WSIs) is crucial for improving diagnosis and prognosis by fusing complementary information emerging from different visible structures. It is also useful to quickly transfer annotations between consecutive or restained slides, thus significantly reducing the annotation time and associated costs. Nevertheless, the slide preparation is different for each stain and the tissue undergoes complex and large deformations. Therefore, a robust, efficient, and accurate registration method is highly desired by the scientific community and hospitals specializing in digital pathology. METHODS: We propose a two-step hybrid method consisting of (i) deep learning- and feature-based initial alignment algorithm, and (ii) intensity-based nonrigid registration using the instance optimization. The proposed method does not require any fine-tuning to a particular dataset and can be used directly for any desired tissue type and stain. The registration time is low, allowing one to perform efficient registration even for large datasets. The method was proposed for the ACROBAT 2023 challenge organized during the MICCAI 2023 conference and scored 1st place. The method is released as open-source software. RESULTS: The proposed method is evaluated using three open datasets: (i) Automatic Nonrigid Histological Image Registration Dataset (ANHIR), (ii) Automatic Registration of Breast Cancer Tissue Dataset (ACROBAT), and (iii) Hybrid Restained and Consecutive Histological Serial Sections Dataset (HyReCo). The target registration error (TRE) is used as the evaluation metric. We compare the proposed algorithm to other state-of-the-art solutions, showing considerable improvement. Additionally, we perform several ablation studies concerning the resolution used for registration and the initial alignment robustness and stability. The method achieves the most accurate results for the ACROBAT dataset, the cell-level registration accuracy for the restained slides from the HyReCo dataset, and is among the best methods evaluated on the ANHIR dataset. CONCLUSIONS: The article presents an automatic and robust registration method that outperforms other state-of-the-art solutions. The method does not require any fine-tuning to a particular dataset and can be used out-of-the-box for numerous types of microscopic images. The method is incorporated into the DeeperHistReg framework, allowing others to directly use it to register, transform, and save the WSIs at any desired pyramid level (resolution up to 220k x 220k). We provide free access to the software. The results are fully and easily reproducible. The proposed method is a significant contribution to improving the WSI registration quality, thus advancing the field of digital pathology.
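
The two-step idea (coarse initial alignment followed by nonrigid refinement) can be illustrated generically with scikit-image; this is not the DeeperHistReg implementation, just a sketch on downsampled grayscale previews, using phase correlation for the initial translation and TV-L1 optical flow as a stand-in for the intensity-based nonrigid step.

```python
import numpy as np
from scipy import ndimage
from skimage.registration import phase_cross_correlation, optical_flow_tvl1
from skimage.transform import warp

rng = np.random.default_rng(0)
fixed = rng.random((256, 256))                   # downsampled grayscale preview
moving = ndimage.shift(fixed, shift=(7, -4))     # simulate a misaligned restain

# Step 1: coarse initial alignment (here: translation from phase correlation).
shift, _, _ = phase_cross_correlation(fixed, moving)
moving_coarse = ndimage.shift(moving, shift=shift)

# Step 2: nonrigid refinement with a dense displacement field.
flow = optical_flow_tvl1(fixed, moving_coarse)   # (2, H, W) displacement field
rows, cols = np.meshgrid(np.arange(256), np.arange(256), indexing="ij")
registered = warp(moving_coarse, np.array([rows + flow[0], cols + flow[1]]))

print("residual error:", float(np.abs(fixed - registered).mean()))
```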


Subject(s)
Algorithms , Deep Learning , Image Processing, Computer-Assisted , Humans , Image Processing, Computer-Assisted/methods , Software , Image Interpretation, Computer-Assisted/methods , Breast Neoplasms/diagnostic imaging , Breast Neoplasms/pathology , Female , Staining and Labeling
13.
Med Image Anal ; 95: 103191, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38728903

ABSTRACT

Prostate cancer is the second most frequent cancer in men worldwide after lung cancer. Its diagnosis is based on the identification of the Gleason score, which evaluates the abnormality of cells in glands through the analysis of the different Gleason patterns within tissue samples. Recent advancements in computational pathology, a domain aimed at developing algorithms to automatically analyze digitized histopathology images, have led to a large variety and availability of datasets and algorithms for Gleason grading and scoring. However, there is no clear consensus on which methods are best suited for each problem in relation to the characteristics of data and labels. This paper provides a systematic comparison, on nine datasets, of state-of-the-art training approaches for deep neural networks (including fully-supervised learning, weakly-supervised learning, semi-supervised learning, Additive-MIL, Attention-Based MIL, Dual-Stream MIL, TransMIL and CLAM) applied to Gleason grading and scoring tasks. The nine datasets are collected from pathology institutes and openly accessible repositories. The results show that the best methods for the Gleason grading and Gleason scoring tasks are fully supervised learning and CLAM, respectively, guiding researchers to the best practice to adopt depending on the task to be solved and the labels that are available.
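
For readers unfamiliar with the multiple-instance learning (MIL) families compared here, the sketch below implements the core of attention-based MIL pooling in the spirit of that approach (not the exact models benchmarked): instance embeddings from a bag of patches are weighted by learned attention scores and summed into a single slide-level representation. Dimensions and the number of patches are placeholders.

```python
import torch
import torch.nn as nn

class AttentionMILPooling(nn.Module):
    """Weighted sum of instance embeddings with learned attention scores."""
    def __init__(self, dim=512, hidden=128):
        super().__init__()
        self.attention = nn.Sequential(
            nn.Linear(dim, hidden), nn.Tanh(), nn.Linear(hidden, 1)
        )

    def forward(self, instances):                # (n_patches, dim)
        scores = self.attention(instances)       # (n_patches, 1)
        weights = torch.softmax(scores, dim=0)
        bag = (weights * instances).sum(dim=0)   # (dim,) slide-level embedding
        return bag, weights.squeeze(1)

bag_of_patches = torch.randn(200, 512)           # patch embeddings from one WSI
pool = AttentionMILPooling()
slide_embedding, attn = pool(bag_of_patches)
print(slide_embedding.shape, attn.shape)         # torch.Size([512]) torch.Size([200])
```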


Subject(s)
Deep Learning , Neoplasm Grading , Prostatic Neoplasms , Humans , Prostatic Neoplasms/pathology , Prostatic Neoplasms/diagnostic imaging , Male , Algorithms , Image Interpretation, Computer-Assisted/methods
14.
Sci Data ; 11(1): 688, 2024 Jun 26.
Article in English | MEDLINE | ID: mdl-38926396

ABSTRACT

Automated medical image analysis systems often require large amounts of training data with high-quality labels, which are difficult and time-consuming to generate. This paper introduces Radiology Object in COntext version 2 (ROCOv2), a multimodal dataset consisting of radiological images and associated medical concepts and captions extracted from the PMC Open Access subset. It is an updated version of the ROCO dataset published in 2018, and includes 35,705 new images added to PMC since 2018. It further provides manually curated concepts for imaging modalities, with additional anatomical and directional concepts for X-rays. The dataset consists of 79,789 images and has been used, with minor modifications, in the concept detection and caption prediction tasks of ImageCLEFmedical Caption 2023. The dataset is suitable for training image annotation models based on image-caption pairs, or for multi-label image classification using the Unified Medical Language System (UMLS) concepts provided with each image. In addition, it can serve for pre-training medical domain models and for evaluating deep learning models in multi-task learning.


Subject(s)
Multimodal Imaging , Radiology , Humans , Image Processing, Computer-Assisted , Unified Medical Language System
15.
Insights Imaging ; 15(1): 8, 2024 Jan 17.
Article in English | MEDLINE | ID: mdl-38228979

ABSTRACT

PURPOSE: To propose a new quality scoring tool, METhodological RadiomICs Score (METRICS), to assess and improve the research quality of radiomics studies. METHODS: We conducted an online modified Delphi study with a group of international experts. It was performed in three consecutive stages: Stage#1, item preparation; Stage#2, panel discussion among EuSoMII Auditing Group members to identify the items to be voted on; and Stage#3, four rounds of the modified Delphi exercise by panelists to determine the items eligible for the METRICS and their weights. The consensus threshold was 75%. Based on the median ranks derived from expert panel opinion and their rank-sum based conversion to importance scores, the category and item weights were calculated. RESULTS: In total, 59 panelists from 19 countries participated in the selection and ranking of the items and categories. The final METRICS tool included 30 items within 9 categories. According to their weights, the categories were in descending order of importance: study design, imaging data, image processing and feature extraction, metrics and comparison, testing, feature processing, preparation for modeling, segmentation, and open science. A web application and a repository were developed to streamline the calculation of the METRICS score and to collect feedback from the radiomics community. CONCLUSION: In this work, we developed a scoring tool for assessing the methodological quality of radiomics research, with a large international panel and a modified Delphi protocol. With its conditional format to cover methodological variations, it provides a well-constructed framework for the key methodological concepts to assess the quality of radiomic research papers. CRITICAL RELEVANCE STATEMENT: A quality assessment tool, METhodological RadiomICs Score (METRICS), is made available by a large group of international domain experts, with transparent methodology, aiming at evaluating and improving research quality in radiomics and machine learning. KEY POINTS: • A methodological scoring tool, METRICS, was developed for assessing the quality of radiomics research, with a large international expert panel and a modified Delphi protocol. • The proposed scoring tool presents expert opinion-based importance weights of categories and items with a transparent methodology for the first time. • METRICS accounts for varying use cases, from handcrafted radiomics to entirely deep learning-based pipelines. • A web application has been developed to help with the calculation of the METRICS score (https://metricsscore.github.io/metrics/METRICS.html) and a repository created to collect feedback from the radiomics community (https://github.com/metricsscore/metrics).

16.
JAMIA Open ; 6(1): ooac107, 2023 Apr.
Article in English | MEDLINE | ID: mdl-36632329

ABSTRACT

Objective: The aim of this study was to test the feasibility of PICO (participants, interventions, comparators, outcomes) entity extraction using weak supervision and natural language processing. Methodology: We re-purpose more than 127 medical and nonmedical ontologies and expert-generated rules to obtain multiple noisy labels for PICO entities in the evidence-based medicine (EBM)-PICO corpus. These noisy labels are aggregated using simple majority voting and generative modeling to obtain consensus labels. The resulting probabilistic labels are used as weak signals to train a weakly supervised (WS) discriminative model and observe performance changes. We explore mistakes in the EBM-PICO corpus that could have led to inaccurate evaluation of previous automation methods. Results: In total, 4081 randomized clinical trials were weakly labeled to train the WS models and compared against full supervision. The models were separately trained for PICO entities and evaluated on the EBM-PICO test set. A WS approach combining ontologies and expert-generated rules outperformed full supervision for the participant entity by 1.71% macro-F1. Error analysis on the EBM-PICO subset revealed 18-23% erroneous token classifications. Discussion: Automatic PICO entity extraction accelerates the writing of clinical systematic reviews that commonly use PICO information to filter health evidence. However, PICO extends to more entities: PICOS (S, study type and design), PICOC (C, context), and PICOT (T, timeframe), for which labelled datasets are unavailable. In such cases, the ability to use weak supervision overcomes the expensive annotation bottleneck. Conclusions: We show the feasibility of WS PICO entity extraction using freely available ontologies and heuristics without manually annotated data. Weak supervision has encouraging performance compared to full supervision but requires careful design to outperform it.
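
The simplest aggregation strategy mentioned here, majority voting over noisy labels from many labelling sources, can be sketched in a few lines; the label matrix below is invented (rows = tokens, columns = labelling sources, -1 = abstain), and a generative label model would replace the vote in the more elaborate setting.

```python
import numpy as np

# Noisy votes for 6 tokens from 4 labelling sources (ontologies / rules).
# Classes: 0 = not-PICO, 1 = Participant, 2 = Intervention, -1 = abstain.
votes = np.array([
    [1, 1, -1, 1],
    [0, -1, 0, 0],
    [2, 1, 2, -1],
    [-1, -1, -1, -1],   # all sources abstain
    [1, 0, 1, 1],
    [2, 2, -1, 2],
])

def majority_vote(row, n_classes=3, abstain=-1):
    counts = np.bincount(row[row != abstain], minlength=n_classes)
    return int(counts.argmax()) if counts.sum() > 0 else abstain

consensus = np.array([majority_vote(r) for r in votes])
print(consensus)   # [ 1  0  2 -1  1  2]
```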

17.
BMJ Open ; 13(12): e076865, 2023 12 09.
Article in English | MEDLINE | ID: mdl-38070902

ABSTRACT

INTRODUCTION: Radiological imaging is one of the most frequently performed diagnostic tests worldwide. The free text contained in radiology reports is currently only rarely used for secondary purposes, including research and predictive analysis. However, this data might be made available by means of information extraction (IE), based on natural language processing (NLP). Recently, a new approach to NLP, large language models (LLMs), has gained momentum and continues to improve the performance of IE-related tasks. The objective of this scoping review is to show the state of research regarding IE from free-text radiology reports based on LLMs, to investigate the methods applied and to guide future research by showing open challenges and limitations of current approaches. To our knowledge, no systematic or scoping review of IE from radiology reports based on LLMs has been published. Existing publications are outdated and do not cover LLM-based methods. METHODS AND ANALYSIS: This protocol is designed based on the JBI Manual for Evidence Synthesis, chapter 11.2: 'Development of a scoping review protocol'. Inclusion criteria and a search strategy comprising four databases (PubMed, IEEE Xplore, Web of Science Core Collection and ACM Digital Library) are defined. Furthermore, we describe the screening process, data charting, analysis and presentation of extracted data. ETHICS AND DISSEMINATION: This protocol describes the methodology of a scoping literature review and does not involve research on or with humans, animals or their data. Therefore, no ethical approval is required. After the publication of this protocol and the conduct of the review, its results will be published in an open-access journal dedicated to biomedical informatics/digital health.


Subject(s)
Radiology , Research Design , Humans , Information Storage and Retrieval , Radiography , Language , Review Literature as Topic
18.
Neuroscience ; 514: 100-122, 2023 03 15.
Article in English | MEDLINE | ID: mdl-36708799

ABSTRACT

Muscle synergy analysis investigates the neurophysiological mechanisms that the central nervous system employs to coordinate muscles. Several models have been developed to decompose electromyographic (EMG) signals into spatial and temporal synergies. However, using multiple approaches can complicate the interpretation of results. Spatial synergies represent invariant muscle weights modulated with variant temporal coefficients; temporal synergies are invariant temporal profiles that coordinate variant muscle weights. While non-negative matrix factorization allows the extraction of both spatial and temporal synergies, the comparison between the two approaches has rarely been investigated on a large set of multi-joint upper-limb movements. Spatial and temporal synergies were extracted from two datasets with proximal (16 subjects, 10M, 6F) and distal upper-limb movements (30 subjects, 21M, 9F), focusing on their differences in reconstruction accuracy and inter-individual variability. We showed the existence of both spatial and temporal structure in the EMG data, comparing synergies with those from a surrogate dataset in which the phases were shuffled while preserving the frequency content of the original data. The two models provide a compact characterization of motor coordination at the spatial or temporal level, respectively. However, fewer temporal synergies are needed to achieve the same reconstruction R2: spatial and temporal synergies may capture different hierarchical levels of motor control and are dual approaches to the characterization of the low-dimensional coordination of the upper limb. Lastly, a detailed characterization of the structure of the temporal synergies suggested that they can be related to intermittent control of the movement, allowing high flexibility and dexterity. These results improve the understanding of neurophysiology in several fields, such as motor control, rehabilitation, and prosthetics.
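
The surrogate-data control described here (shuffling phases while preserving the frequency content) can be sketched with Fourier-based phase randomization; this is a generic implementation for a single real-valued signal, not the authors' code, and the signal itself is synthetic.

```python
import numpy as np

def phase_randomize(signal, rng):
    """Return a surrogate with the same amplitude spectrum but randomized phases."""
    spectrum = np.fft.rfft(signal)
    random_phases = rng.uniform(0, 2 * np.pi, size=spectrum.shape)
    random_phases[0] = 0.0                      # keep the DC bin real
    if len(signal) % 2 == 0:
        random_phases[-1] = 0.0                 # keep the Nyquist bin real
    surrogate = np.fft.irfft(np.abs(spectrum) * np.exp(1j * random_phases),
                             n=len(signal))
    return surrogate

rng = np.random.default_rng(0)
t = np.linspace(0, 2, 2000)
emg_envelope = np.abs(np.sin(2 * np.pi * 3 * t)) + 0.1 * rng.normal(size=t.size)

surrogate = phase_randomize(emg_envelope, rng)
# Amplitude spectra match; the temporal structure does not.
print(np.allclose(np.abs(np.fft.rfft(emg_envelope)),
                  np.abs(np.fft.rfft(surrogate))))
```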


Subject(s)
Muscle, Skeletal , Temporal Muscle , Humans , Muscle, Skeletal/physiology , Electromyography , Movement/physiology , Upper Extremity/physiology
19.
Article in English | MEDLINE | ID: mdl-38082977

ABSTRACT

The acquisition of whole slide images is prone to artifacts that can require human control and re-scanning, both in clinical workflows and in research-oriented settings. Quality control algorithms are a first step to overcome this challenge, as they limit the use of low-quality images. Developing quality control systems in histopathology is not straightforward, in part due to the limited availability of data related to this topic. We address the problem by proposing a tool to augment data with artifacts. The proposed method seamlessly generates and blends artifacts from an external library into a given histopathology dataset. The datasets augmented by the blended artifacts are then used to train an artifact detection network in a supervised way. We use the YOLOv5 model for artifact detection, with a slightly modified training pipeline. The proposed tool can be extended into a complete framework for the quality assessment of whole slide images. Clinical relevance: The proposed method may be useful for the initial quality screening of whole slide images. Each year, millions of whole slide images are acquired and digitized worldwide. Many of them contain artifacts that affect subsequent AI-oriented analysis. Therefore, a tool operating at the acquisition phase and improving the initial quality assessment is crucial to increase the performance of digital pathology algorithms, e.g., for early cancer diagnosis.
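
The core augmentation step described here, blending an artifact from an external library into a clean patch, can be sketched as a simple alpha blend; this is an illustrative mock-up with synthetic arrays, not the authors' generation pipeline, and the YOLOv5 training step is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

# Clean histopathology patch and an "artifact" crop with its alpha mask.
patch = rng.integers(180, 255, size=(256, 256, 3)).astype(np.float32)  # pale tissue
artifact = np.zeros((64, 64, 3), dtype=np.float32)                     # dark blob
alpha = np.zeros((64, 64, 1), dtype=np.float32)
yy, xx = np.mgrid[:64, :64]
alpha[((yy - 32) ** 2 + (xx - 32) ** 2) < 28 ** 2] = 0.8               # soft round mask

# Blend the artifact at a random location and keep the bounding box as a label.
top, left = rng.integers(0, 256 - 64, size=2)
region = patch[top:top + 64, left:left + 64]
patch[top:top + 64, left:left + 64] = alpha * artifact + (1 - alpha) * region
bbox = (left, top, left + 64, top + 64)        # (x_min, y_min, x_max, y_max)

print("blended artifact at", bbox)
```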


Subject(s)
Artifacts , Neoplasms , Humans , Image Processing, Computer-Assisted/methods , Algorithms
20.
Eur J Radiol ; 169: 111159, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37976760

ABSTRACT

PURPOSE: To review eXplainable Artificial Intelligence (XAI) methods available for medical imaging (MI). METHOD: A scoping review was conducted following the Joanna Briggs Institute's methodology. The search was performed on PubMed, Embase, CINAHL, Web of Science, BioRxiv, MedRxiv, and Google Scholar. Studies published in French and English after 2017 were included. Keyword combinations and descriptors related to explainability and MI modalities were employed. Two independent reviewers screened abstracts, titles and full text, resolving differences through discussion. RESULTS: 228 studies met the criteria. XAI publications are increasing, targeting MRI (n = 73), radiography (n = 47), and CT (n = 46). Lung (n = 82) and brain (n = 74) pathologies, Covid-19 (n = 48), Alzheimer's disease (n = 25), and brain tumors (n = 15) are the main pathologies explained. Explanations are presented in visual (n = 186), numerical (n = 67), rule-based (n = 11), textual (n = 11), and example-based (n = 6) form. Commonly explained tasks include classification (n = 89), prediction (n = 47), diagnosis (n = 39), detection (n = 29), segmentation (n = 13), and image quality improvement (n = 6). The most frequently provided explanations were local (78.1 %), 5.7 % were global, and 16.2 % combined both local and global approaches. Post-hoc approaches were predominantly employed. The terminology used varied, with explainable (n = 207), interpretable (n = 187), understandable (n = 112), transparent (n = 61), reliable (n = 31), and intelligible (n = 3) sometimes used interchangeably. CONCLUSION: The number of XAI publications in medical imaging is increasing, primarily focusing on applying XAI techniques to MRI, CT, and radiography for classifying and predicting lung and brain pathologies. Visual and numerical output formats are predominantly used. Terminology standardisation remains a challenge, as terms like "explainable" and "interpretable" are sometimes used interchangeably. Future XAI development should consider user needs and perspectives.


Subject(s)
Alzheimer Disease , Brain Neoplasms , Humans , Artificial Intelligence , Radiography , Brain/diagnostic imaging