ABSTRACT
Validation metrics are key for tracking scientific progress and bridging the current chasm between artificial intelligence research and its translation into practice. However, increasing evidence shows that, particularly in image analysis, metrics are often chosen inadequately. Although taking into account the individual strengths, weaknesses and limitations of validation metrics is a critical prerequisite to making educated choices, the relevant knowledge is currently scattered and poorly accessible to individual researchers. Based on a multistage Delphi process conducted by a multidisciplinary expert consortium as well as extensive community feedback, the present work provides a reliable and comprehensive common point of access to information on pitfalls related to validation metrics in image analysis. Although focused on biomedical image analysis, the addressed pitfalls generalize across application domains and are categorized according to a newly created, domain-agnostic taxonomy. The work serves to enhance global comprehension of a key topic in image analysis validation.
Subjects
Artificial Intelligence
ABSTRACT
Filters are commonly used to enhance specific structures and patterns in images, such as vessels or peritumoral regions, to enable clinical insights beyond the visible image using radiomics. However, their lack of standardization restricts the reproducibility and clinical translation of radiomics decision support tools. In this special report, teams of researchers who developed radiomics software participated in a three-phase study (September 2020 to December 2022) to establish a standardized set of filters. The first two phases focused on finding reference filtered images and reference feature values for commonly used convolutional filters: mean, Laplacian of Gaussian, Laws and Gabor kernels, separable and nonseparable wavelets (including decomposed forms), and Riesz transformations. In the first phase, 15 teams used digital phantoms to establish reference filtered images for 33 of 36 filter configurations. In phase 2, 11 teams used a chest CT image to derive reference values for 323 of 396 features computed from filtered images using 22 filter and image processing configurations. Reference filtered images and feature values for Riesz transformations were not established. Reproducibility of the standardized convolutional filters was validated on a public data set of multimodal imaging (CT, fluorodeoxyglucose PET, and T1-weighted MRI) in 51 patients with soft-tissue sarcoma. At validation, reproducibility of 486 features computed from filtered images using nine configurations × three imaging modalities was assessed using the lower bounds of 95% CIs of intraclass correlation coefficients. Out of 486 features, 458 were found to be reproducible across nine teams, with lower bounds of 95% CIs of intraclass correlation coefficients greater than 0.75. In conclusion, eight filter types were standardized with reference filtered images and reference feature values for verifying and calibrating radiomics software packages. A web-based tool is available for compliance checking.
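As a toy illustration of the simplest filter type covered by the standardization effort (the mean filter), a minimal NumPy sketch is shown below; the reference definitions additionally fix boundary handling, kernel support, and 3D behaviour, none of which are reproduced here:

```python
import numpy as np

def mean_filter_2d(image, size=3):
    """2D mean (box) filter with zero padding -- a toy version of the
    simplest standardized filter; the reference definitions also fix
    boundary handling and 3D behaviour, which are not reproduced here."""
    pad = size // 2
    padded = np.pad(image, pad, mode="constant")
    out = np.zeros(image.shape)
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            out[i, j] = padded[i:i + size, j:j + size].mean()
    return out

phantom = np.zeros((5, 5))
phantom[2, 2] = 9.0                    # a single bright voxel
smoothed = mean_filter_2d(phantom)
print(smoothed[2, 2])                  # 1.0: the 3x3 mean spreads the intensity
```

Comparing such filtered images voxel by voxel against reference outputs is what the phase 1 consensus procedure amounts to for each filter configuration.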
Subjects
Image Processing, Computer-Assisted; Radiomics; Humans; Reproducibility of Results; Biomarkers; Multimodal Imaging
ABSTRACT
BACKGROUND: Amblyopia is the most common developmental vision disorder in children. The initial treatment consists of refractive correction. When this is insufficient, occlusion therapy may further improve visual acuity. However, the challenges and compliance issues associated with occlusion therapy may result in treatment failure and residual amblyopia. Virtual reality (VR) games developed to improve visual function have shown positive preliminary results. The aim of this study is to determine the efficacy of these games in improving vision, attention, and motor skills in patients with residual amblyopia and to identify brain-related changes. We hypothesize that VR-based training combining the suggested ingredients (3D cues and rich feedback) with increasing difficulty levels and a variety of games in a home-based environment is crucial for effective vision recovery, and may be particularly effective in children. METHODS: The AMBER study is a randomized, cross-over, controlled trial designed to assess the effect of binocular stimulation (VR-based stereoptic serious games) in individuals with residual amblyopia (n = 30, 6-35 years of age), compared to refractive correction, on vision, selective attention, and motor control skills. Additionally, participants will be compared to a control group of age-matched healthy individuals (n = 30) to account for the unique benefit of VR-based serious games. All participants will play serious games 30 min per day, 5 days per week, for 8 weeks. The games are delivered with the Vivid Vision Home software. The amblyopic cohort will receive both treatments in a randomized order according to the type of amblyopia, while the control group will only receive the VR-based stereoscopic serious games. The primary outcome is visual acuity in the amblyopic eye. Secondary outcomes include stereoacuity, functional vision, cortical visual responses, selective attention, and motor control.
The outcomes will be measured before and after each treatment, with an 8-week follow-up. DISCUSSION: The VR-based games used in this study have been designed to deliver binocular visual stimulation tailored to the individual visual needs of the patient, which will potentially result in improved basic and functional vision skills as well as visual attention and motor control skills. TRIAL REGISTRATION: This protocol is registered on ClinicalTrials.gov (identifier: NCT05114252) and in the Swiss National Clinical Trials Portal (identifier: SNCTP000005024).
Subjects
Amblyopia; Video Games; Child; Humans; Amblyopia/therapy; Vision, Binocular/physiology; Visual Acuity; Treatment Outcome; Randomized Controlled Trials as Topic
ABSTRACT
BACKGROUND: One challenge in training deep convolutional neural network (CNN) models with whole slide images (WSIs) is providing the required large number of costly, manually annotated image regions. Strategies to alleviate the scarcity of annotated data include using transfer learning, data augmentation, and training the models with less expensive image-level annotations (weakly-supervised learning). However, it is not clear how to combine transfer learning in a CNN model when different data sources are available for training, or how to leverage the combination of large amounts of weakly annotated images with a set of local region annotations. This paper aims to evaluate CNN training strategies based on transfer learning to leverage the combination of weak and strong annotations in heterogeneous data sources. The trade-off between classification performance and annotation effort is explored by evaluating a CNN that learns from strong labels (region annotations) and is later fine-tuned on a dataset with less expensive weak (image-level) labels. RESULTS: As expected, the model performance on strongly annotated data steadily increases as the percentage of strong annotations used increases, reaching a performance comparable to pathologists ([Formula: see text]). Nevertheless, the performance sharply decreases when applied to the WSI classification scenario with [Formula: see text], and remains lower regardless of the number of annotations used. The model performance increases when fine-tuning the model for the task of Gleason scoring with the weak WSI labels [Formula: see text]. CONCLUSION: Combining weak and strong supervision improves over strong supervision alone in the classification of Gleason patterns using tissue microarrays (TMA) and WSI regions. Our results suggest effective strategies for training CNN models that combine small amounts of annotated data with heterogeneous data sources.
The performance increases in the controlled TMA scenario with the number of annotations used to train the model. Nevertheless, the performance is hindered when the trained TMA model is applied directly to the more challenging WSI classification problem. This demonstrates that a good pre-trained model for prostate cancer TMA image classification may lead to the best downstream model if fine-tuned on the WSI target dataset. We have made available the source code repository for reproducing the experiments in the paper: https://github.com/ilmaro8/Digital_Pathology_Transfer_Learning.
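The strong-then-weak training strategy evaluated above can be sketched, in heavily simplified form, with a plain logistic-regression "model" in NumPy; the data, dimensions, and learning schedule below are illustrative stand-ins for the actual CNN pipeline, not the authors' method:

```python
import numpy as np

rng = np.random.default_rng(0)

def train_logreg(X, y, w=None, lr=0.5, epochs=200):
    """Gradient-descent logistic regression; passing `w` continues from
    pretrained weights, which is the fine-tuning step."""
    if w is None:
        w = np.zeros(X.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        w = w - lr * X.T @ (p - y) / len(y)
    return w

# Stand-ins: "strong" region-level data, and "weak" image-level data whose
# decision boundary is slightly different (both separable through the origin).
X_strong = rng.normal(size=(200, 2))
y_strong = (X_strong[:, 0] > 0).astype(float)
X_weak = rng.normal(size=(200, 2))
y_weak = (0.8 * X_weak[:, 0] + 0.2 * X_weak[:, 1] > 0).astype(float)

w_pre = train_logreg(X_strong, y_strong)        # pretrain on strong labels
w_ft = train_logreg(X_weak, y_weak, w=w_pre)    # fine-tune on weak labels

acc = ((X_weak @ w_ft > 0) == y_weak.astype(bool)).mean()
print(f"accuracy after fine-tuning on the weakly labelled set: {acc:.2f}")
```

The same two-call pattern (pretrain, then continue optimization on the target labels) is what fine-tuning a pretrained TMA model on WSI-level labels amounts to at a high level.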
Subjects
Neoplasm Grading/methods; Neural Networks, Computer; Prostatic Neoplasms/pathology; Supervised Machine Learning; Datasets as Topic; Diagnosis, Computer-Assisted/methods; Humans; Male; Neoplasm Grading/classification; Prostate/pathology; Prostatectomy/methods; Prostatic Neoplasms/surgery; Tissue Array Analysis
ABSTRACT
One major challenge limiting the use of dexterous robotic hand prostheses controlled via electromyography and pattern recognition is the substantial effort required to train complex models from scratch. To overcome this problem, several studies in recent years proposed to use transfer learning, combining pre-trained models (obtained from prior subjects) with training sessions performed on a specific user. Although a few promising results were reported in the past, it was recently shown that the use of conventional transfer learning algorithms does not increase performance if proper hyperparameter optimization is performed on the standard approach that does not exploit transfer learning. The objective of this paper is to introduce novel analyses on this topic by using a random forest classifier without hyperparameter optimization and to extend them with experiments performed on data recorded from the same patient, but in different data acquisition sessions. Two domain adaptation techniques were tested on the random forest classifier, allowing us to conduct experiments on healthy subjects and amputees. Unlike several previous papers, our results show that there are no appreciable improvements in terms of accuracy, regardless of the transfer learning techniques tested. The lack of benefit from adaptive learning is also demonstrated for the first time in an intra-subject experimental setting, using ten data acquisitions recorded from the same subject across five different days as the source.
Subjects
Amputees; Artificial Limbs; Algorithms; Electromyography; Hand; Humans; Pattern Recognition, Automated
ABSTRACT
Background Radiomic features may quantify characteristics present in medical imaging. However, the lack of standardized definitions and validated reference values has hampered clinical use. Purpose To standardize a set of 174 radiomic features. Materials and Methods Radiomic features were assessed in three phases. In phase I, 487 features were derived from the basic set of 174 features. Twenty-five research teams with unique radiomics software implementations computed feature values directly from a digital phantom, without any additional image processing. In phase II, 15 teams computed values for 1347 derived features using a CT image of a patient with lung cancer and predefined image processing configurations. In both phases, consensus among the teams on the validity of tentative reference values was measured through the frequency of the modal value and classified as follows: less than three matches, weak; three to five matches, moderate; six to nine matches, strong; 10 or more matches, very strong. In the final phase (phase III), a public data set of multimodality images (CT, fluorine 18 fluorodeoxyglucose PET, and T1-weighted MRI) from 51 patients with soft-tissue sarcoma was used to prospectively assess reproducibility of standardized features. Results Consensus on reference values was initially weak for 232 of 302 features (76.8%) at phase I and 703 of 1075 features (65.4%) at phase II. At the final iteration, weak consensus remained for only two of 487 features (0.4%) at phase I and 19 of 1347 features (1.4%) at phase II. Strong or better consensus was achieved for 463 of 487 features (95.1%) at phase I and 1220 of 1347 features (90.6%) at phase II. Overall, 169 of 174 features were standardized in the first two phases. In the final validation phase (phase III), most of the 169 standardized features could be excellently reproduced (166 with CT; 164 with PET; and 164 with MRI).
Conclusion A set of 169 radiomics features was standardized, which enabled verification and calibration of different radiomics software. © RSNA, 2020. Online supplemental material is available for this article. See also the editorial by Kuhl and Truhn in this issue.
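The consensus categories used in the two phases (frequency of the modal value: fewer than three matches, weak; three to five, moderate; six to nine, strong; ten or more, very strong) can be expressed directly in code; the tolerance used below for grouping near-equal feature values is an assumption of this sketch:

```python
from collections import Counter

def consensus_strength(team_values, tol=1e-6):
    """Consensus category from the frequency of the modal value among
    teams: <3 matches weak, 3-5 moderate, 6-9 strong, >=10 very strong.
    Grouping near-equal values via `tol` is an assumption made here."""
    matches = Counter(round(v / tol) for v in team_values).most_common(1)[0][1]
    if matches < 3:
        return "weak"
    if matches <= 5:
        return "moderate"
    if matches <= 9:
        return "strong"
    return "very strong"

print(consensus_strength([1.0, 1.0, 1.0, 1.0, 2.0]))  # moderate (4 matches)
print(consensus_strength([1.0] * 11))                 # very strong
```

Running this over each tentative reference value per feature reproduces the classification scheme described in the Materials and Methods.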
Subjects
Biomarkers/analysis; Image Processing, Computer-Assisted/standards; Software; Calibration; Fluorodeoxyglucose F18; Humans; Lung Neoplasms/diagnostic imaging; Magnetic Resonance Imaging; Phantoms, Imaging; Phenotype; Positron-Emission Tomography; Radiopharmaceuticals; Reproducibility of Results; Sarcoma/diagnostic imaging; Tomography, X-Ray Computed
ABSTRACT
BACKGROUND: Muscle synergy analysis is an approach to understand the neurophysiological mechanisms behind the hypothesized ability of the Central Nervous System (CNS) to reduce the dimensionality of muscle control. The muscle synergy approach is also used to evaluate motor recovery and the evolution of patients' motor performance in both single-session and longitudinal studies. Synergy-based assessments are subject to various sources of variability: natural trial-by-trial variability of performed movements, intrinsic characteristics of subjects that change over time (e.g., recovery, adaptation, exercise), as well as experimental factors such as different electrode positioning. These sources of variability need to be quantified in order to resolve challenges for the application of muscle synergies in clinical environments. The objective of this study is to analyze the stability and similarity of extracted muscle synergies under factors that may induce variability, including inter- and intra-session variability within subjects and inter-subject variability. The analysis was performed using the comprehensive, publicly available hand grasp NinaPro Database, featuring surface electromyography (EMG) measures from two EMG electrode bracelets. METHODS: Intra-session, inter-session, and inter-subject synergy stability was analyzed using the following measures: variance accounted for (VAF) and number of synergies (NoS) as measures of reconstruction stability and quality, and cosine similarity for comparison of the spatial composition of extracted synergies. Moreover, an approach based on virtual electrode repositioning was applied to shed light on the influence of electrode position on inter-session synergy similarity. RESULTS: Inter-session synergy similarity was significantly lower than intra-session similarity, both considering the coefficient of variation of VAF (approximately 0.2-15% for inter vs.
approximately 0.1% to 2.5% for intra, depending on NoS) and coefficient of variation of NoS (approximately 6.5-14.5% for inter vs. approximately 3-3.5% for intra, depending on VAF) as well as synergy similarity (approximately 74-77% for inter vs. approximately 88-94% for intra, depending on the selected VAF). Virtual electrode repositioning revealed that a slightly different electrode position can lower similarity of synergies from the same session and can increase similarity between sessions. Finally, the similarity of inter-subject synergies has no significant difference from the similarity of inter-session synergies (both on average approximately 84-90% depending on selected VAF). CONCLUSION: Synergy similarity was lower in inter-session conditions with respect to intra-session. This finding should be considered when interpreting results from multi-session assessments. Lastly, electrode positioning might play an important role in the lower similarity of synergies over different sessions.
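Cosine similarity, used above to compare the spatial composition of extracted synergies, is shown in a minimal sketch with fabricated muscle-weighting vectors; the numbers are purely illustrative:

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two muscle-weighting vectors, the measure
    used to compare the spatial composition of synergies."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Fabricated synergy vectors over 8 muscles (illustrative values only):
s_ref = np.array([0.9, 0.8, 0.1, 0.0, 0.2, 0.1, 0.0, 0.1])
s_intra = s_ref + 0.05            # small trial-by-trial deviation
s_inter = np.roll(s_ref, 1)       # an electrode shift scrambles the weights

print(round(cosine_similarity(s_ref, s_intra), 3))  # close to 1
print(round(cosine_similarity(s_ref, s_inter), 3))  # noticeably lower
```

The `np.roll` line mimics, very crudely, why a shifted electrode bracelet lowers inter-session similarity: the same underlying synergy is expressed across a permuted set of channels.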
Subjects
Hand Strength; Muscle, Skeletal; Activities of Daily Living; Adult; Biomechanical Phenomena; Electromyography; Female; Hand; Humans; Male; Young Adult
ABSTRACT
BACKGROUND: Hand grasp patterns require complex coordination. The reduction of kinematic dimensionality is a key process in studying the patterns underlying hand usage and grasping. It makes it possible to define metrics for motor assessment and rehabilitation and to develop assistive devices and prosthesis control methods. Several studies have been presented in this field, but most targeted a limited number of subjects, focused on postures rather than entire grasping movements, and did not perform separate analyses for tasks and subjects, which limits their impact on rehabilitation and assistive applications. This paper provides a comprehensive mapping of synergies from hand grasps targeting activities of daily living. It clarifies several current limits of the field and fosters the development of applications in rehabilitation and assistive robotics. METHODS: In this work, hand kinematic data of 77 subjects, performing up to 20 hand grasps, were acquired with a 22-sensor CyberGlove II data glove and analyzed. Principal Component Analysis (PCA) and hierarchical cluster analysis were used to extract and group kinematic synergies that summarize the coordination patterns available for hand grasps. RESULTS: Twelve synergies were found to account for > 80% of the overall variation. The first three synergies accounted for more than 50% of the total variance and consisted of: flexion and adduction of the metacarpophalangeal (MCP) joints of fingers 3 to 5 (synergy #1), palmar arching and flexion of the wrist (synergy #2), and opposition of the thumb (synergy #3). Further synergies refine movements and have higher variability among subjects. CONCLUSION: Kinematic synergies were extracted from a large number of subjects (77) and grasps related to activities of daily living (20). The number of motor modules required to perform the motor tasks is higher than previously described.
Twelve synergies are responsible for most of the variation in hand grasping. The first three are used as primary synergies, while the remaining ones target finer movements (e.g. independence of thumb and index finger). The results generalize the description of hand kinematics, better clarifying several limits of the field and fostering the development of applications in rehabilitation and assistive robotics.
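The PCA-based selection of kinematic synergies (the number of components accounting for > 80% of variance) can be sketched with NumPy on synthetic data; the latent structure and noise level below are assumptions made purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

def n_components_for_variance(X, threshold=0.8):
    """Smallest number of principal components whose cumulative explained
    variance reaches `threshold` -- the criterion used to count synergies."""
    Xc = X - X.mean(axis=0)
    s = np.linalg.svd(Xc, compute_uv=False)
    explained = s**2 / np.sum(s**2)
    return int(np.searchsorted(np.cumsum(explained), threshold) + 1)

# Synthetic "joint angle" data: 22 sensor channels driven by 3 latent
# coordination patterns plus a little noise (all values illustrative).
latent = rng.normal(size=(300, 3))
mixing = rng.normal(size=(3, 22))
X = latent @ mixing + 0.05 * rng.normal(size=(300, 22))
print(n_components_for_variance(X, 0.8))  # at most 3 for this rank-3 signal
```

On real glove recordings the same criterion yields the twelve synergies reported above, because genuine hand kinematics are far less low-rank than this toy signal.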
Subjects
Activities of Daily Living; Hand Strength/physiology; Motor Activity/physiology; Biomechanical Phenomena; Datasets as Topic; Female; Humans; Male; Principal Component Analysis
ABSTRACT
BACKGROUND: Proper modeling of human grasping and hand movements is fundamental for robotics, prosthetics, physiology, and rehabilitation. The taxonomies of hand grasps proposed in the scientific literature so far are based on qualitative analyses of the movements and are thus usually not quantitatively justified. METHODS: This paper presents, to the best of our knowledge, the first quantitative taxonomy of hand grasps based on biomedical data measurements. The taxonomy is based on electromyography and kinematic data recorded from 40 healthy subjects performing 20 unique hand grasps. For each subject, a set of hierarchical trees is computed for several signal features. Afterwards, the trees are combined, first into modality-specific (i.e., muscular and kinematic) taxonomies of hand grasps and then into a general quantitative taxonomy of hand movements. The modality-specific taxonomies provide similar results despite describing different parameters of hand movements, one muscular and the other kinematic. RESULTS: The general taxonomy merges the kinematic and muscular descriptions into a comprehensive hierarchical structure. The obtained results clarify what has been proposed in the literature so far and partially confirm the qualitative parameters used to create previous taxonomies of hand grasps. According to the results, hand movements can be divided into five movement categories defined by overall grasp shape, finger positioning, and muscular activation. Part of the results appears qualitatively in accordance with previous results describing kinematic hand grasping synergies. CONCLUSIONS: The taxonomy of hand grasps proposed in this paper uses quantitative measurements to clarify what has so far been proposed in the field on a qualitative basis, and thus has a potential impact on several scientific fields.
Subjects
Hand Strength/physiology; Hand/physiology; Adult; Algorithms; Biomechanical Phenomena; Classification; Electromyography; Female; Fingers; Hand/anatomy & histology; Healthy Volunteers; Humans; Male; Movement; Reference Values; Signal Processing, Computer-Assisted
ABSTRACT
Information search has changed the way we manage knowledge, and the ubiquity of information access has made search a frequent activity, whether via Internet search engines or, increasingly, via mobile devices. Medical information search is in this respect no different, and much research has been devoted to analyzing the way in which physicians access information. Medical image search is a much smaller domain but has gained attention because it has different characteristics than search for text documents. While web search log files have been analysed many times to better understand user behaviour, the log files of hospital-internal systems for search in a PACS/RIS (Picture Archival and Communication System, Radiology Information System) have rarely been analysed. Such a comparison between a hospital PACS/RIS search and a web system for searching images of the biomedical literature is the goal of this paper. The objectives are to identify similarities and differences in search behaviour between the two systems, which could then be used to optimize existing systems and build new search engines. Log files of the ARRS GoldMiner medical image search engine (freely accessible on the Internet) containing 222,005 queries, and log files of Stanford's internal PACS/RIS search tool, radTF, containing 18,068 queries were analysed. Each query was preprocessed and all query terms were mapped to the RadLex (Radiology Lexicon) terminology, a comprehensive lexicon of radiology terms created and maintained by the Radiological Society of North America, so that the semantic content of the queries and the links between terms could be analysed and synonyms for the same concept could be detected. RadLex was mainly created for use in radiology reports, to aid structured reporting and the preparation of educational material (Langlotz, 2006) [1].
In standard medical vocabularies such as MeSH (Medical Subject Headings) and UMLS (Unified Medical Language System), specific radiology terms are often underrepresented; therefore, RadLex was considered the best option for this task. The results show a surprising similarity between usage behaviour in the two systems, but several subtle differences can also be noted. The average number of terms per query is 2.21 for GoldMiner and 2.07 for radTF; the RadLex axes used (anatomy, pathology, findings, etc.) have almost the same distribution, with clinical findings the most frequent and anatomical entities second; combinations of RadLex axes are also extremely similar between the two systems. Differences include longer sessions in radTF than in GoldMiner (3.4 vs. 1.9 queries per session on average). Several frequent search terms overlap, but some strong differences exist in the details. In radTF the term "normal" is frequent, whereas in GoldMiner it is not. This makes intuitive sense, as normal cases are rarely described in the literature, whereas in clinical work the comparison with normal cases is often a first step. The general similarity in many respects is likely due to the fact that users of the two systems are influenced by their daily use of standard web search engines and follow this behaviour in their professional search. This means that many results and insights gained from standard web search can likely be transferred to more specialized search systems. Still, specialized log files can be used to find out more about reformulations and the detailed strategies users employ to find the right content.
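The mapping of query terms to a terminology such as RadLex, which lets synonyms of the same concept be detected and queries be grouped by axis, can be sketched with a toy lookup table (all entries below are hypothetical, and real RadLex mapping involves tokenization and fuzzy matching not shown here):

```python
# Hypothetical miniature terminology in the spirit of RadLex: each surface
# form (including synonyms) maps to a canonical concept and its axis.
TERMINOLOGY = {
    "heart": ("heart", "anatomical entity"),
    "cardiac": ("heart", "anatomical entity"),
    "pneumonia": ("pneumonia", "clinical finding"),
    "lung infection": ("pneumonia", "clinical finding"),
}

def map_query(query):
    """Map comma-separated query terms to (concept, axis) pairs; unmapped
    terms are kept as-is so they can be counted separately."""
    mapped = []
    for term in query.lower().split(","):
        term = term.strip()
        mapped.append(TERMINOLOGY.get(term, (term, "unmapped")))
    return mapped

queries = ["cardiac, pneumonia", "heart", "lung infection, xyz"]
mapped = [map_query(q) for q in queries]
avg_terms = sum(len(m) for m in mapped) / len(queries)
print(f"average terms per query: {avg_terms:.2f}")  # 1.67 here
```

Aggregating the axis labels over all mapped queries is how distributions such as "clinical findings first, anatomical entities second" are obtained from the logs.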
Subjects
Medical Informatics/instrumentation; Radiographic Image Interpretation, Computer-Assisted/instrumentation; Radiology Information Systems; Radiology/instrumentation; Algorithms; Computer Graphics; Hospitals; Information Storage and Retrieval; Internet; Medical Informatics/methods; Natural Language Processing; Radiographic Image Interpretation, Computer-Assisted/methods; Search Engine; Semantics; User-Computer Interface
ABSTRACT
Log files of information retrieval systems that record user behavior have been used to improve the outcomes of retrieval systems, understand user behavior, and predict events. In this article, a log file of the ARRS GoldMiner search engine containing 222,005 consecutive queries is analyzed. Time stamps are available for each query, as well as masked IP addresses, which makes it possible to identify queries from the same person. This article describes the ways in which physicians (or Internet searchers interested in medical images) search and proposes potential improvements by suggesting query modifications. For example, many queries contain only a few terms and are therefore not specific; others contain spelling mistakes or non-medical terms that likely lead to poor or empty results. One of the goals of this report is to predict the number of results a query will return, since such a model allows search engines to automatically propose query modifications in order to avoid result lists that are empty or too large. This prediction is made based on characteristics of the query terms themselves. Prediction of empty results has an accuracy above 88% and thus can be used to automatically modify the query to avoid empty result sets for a user. The semantic analysis and the data on reformulations made by users in the past can aid the development of better search systems, particularly to improve results for novice users. This paper therefore offers insights into how people search and how this knowledge can be used to improve the performance of specialized medical search engines.
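A simple stand-in for the empty-result prediction described above is a rule over query characteristics (term count and out-of-vocabulary terms); the vocabulary and the rule below are invented for illustration and are not the model from the study:

```python
# Illustrative stand-in for the empty-result predictor: a rule over query
# characteristics (term count, out-of-vocabulary terms). The vocabulary and
# the rule are invented; the study trains a model from query-term features.
VOCABULARY = {"fracture", "femur", "ct", "pneumothorax", "chest"}

def predict_empty(query):
    """Flag queries likely to return no results: at least two terms, at
    least one of which the index does not know (misspelling, non-medical)."""
    terms = query.lower().split()
    unknown = [t for t in terms if t not in VOCABULARY]
    return len(terms) >= 2 and len(unknown) > 0

print(predict_empty("femur fracture"))        # False: all terms known
print(predict_empty("femer fracture xray"))   # True: unknown terms present
```

A search engine could use such a predictor to trigger query relaxation (dropping or correcting the unknown terms) before returning an empty result list.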
Subjects
Information Storage and Retrieval/methods; Internet; Radiology Information Systems; Semantics; User-Computer Interface; Humans
ABSTRACT
Radiological imaging is a globally prevalent diagnostic method, yet the free text contained in radiology reports is rarely used for secondary purposes. Natural Language Processing can provide structured data retrieved from these reports. This paper summarizes the current state of research on Large Language Model (LLM) based approaches for information extraction (IE) from radiology reports. We conducted a scoping review following the PRISMA-ScR guideline. Five databases were queried on August 1, 2023. Among the 34 studies that met the inclusion criteria, only pre-transformer and encoder-based models are described. External validation shows a general performance decrease, although LLMs might improve the generalizability of IE approaches. Reports related to CT and MRI examinations, as well as thoracic reports, prevail. The most commonly reported challenges are missing validation on external data and augmentation of the described methods. Differing reporting granularities affect the comparability and transparency of approaches.
ABSTRACT
Hand-labelling clinical corpora can be costly and inflexible, requiring re-annotation every time new classes need to be extracted. PICO (Participant, Intervention, Comparator, Outcome) information extraction can expedite conducting systematic reviews to answer clinical questions. However, PICO frequently extends to other entities such as Study type and design, trial context, and timeframe, requiring manual re-annotation of existing corpora. In this paper, we adapt Snorkel's weak supervision methodology to extend clinical corpora to new entities without extensive hand labelling. Specifically, we enrich the EBM-PICO corpus with new entities through an example of "Study type and design" extraction. Using weak supervision, we obtain programmatic labels on 4,081 EBM-PICO documents, achieving an F1-score of 85.02% on the test set.
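Snorkel-style weak supervision builds on labelling functions whose noisy votes are combined into programmatic labels; a minimal majority-vote sketch is shown below (the labelling functions are invented examples, and Snorkel's actual label model weights functions by estimated accuracy rather than simple majority):

```python
ABSTAIN, OTHER, RCT = -1, 0, 1

# Invented labelling functions for a "Study type and design" entity, in the
# spirit of Snorkel-style weak supervision (not the functions from the study).
def lf_randomised(text):
    return RCT if "randomised" in text or "randomized" in text else ABSTAIN

def lf_double_blind(text):
    return RCT if "double-blind" in text else ABSTAIN

def lf_case_report(text):
    return OTHER if "case report" in text else ABSTAIN

LFS = [lf_randomised, lf_double_blind, lf_case_report]

def majority_label(text):
    """Combine the noisy votes by simple majority; Snorkel's label model
    instead weights each function by its estimated accuracy."""
    votes = [v for v in (lf(text) for lf in LFS) if v != ABSTAIN]
    if not votes:
        return ABSTAIN
    return max(set(votes), key=votes.count)

print(majority_label("a randomized double-blind trial"))  # 1 (RCT)
print(majority_label("a case report of rare pneumonia"))  # 0 (OTHER)
```

Writing a handful of such functions and denoising their votes is what makes it possible to extend an annotated corpus like EBM-PICO to a new entity without hand-relabelling every document.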
Subjects
Information Storage and Retrieval; Systematic Reviews as Topic; Humans; Data Mining/methods; Information Storage and Retrieval/methods; Natural Language Processing
ABSTRACT
BACKGROUND AND OBJECTIVE: The automatic registration of differently stained whole slide images (WSIs) is crucial for improving diagnosis and prognosis by fusing complementary information emerging from different visible structures. It is also useful for quickly transferring annotations between consecutive or restained slides, significantly reducing annotation time and associated costs. Nevertheless, slide preparation differs for each stain, and the tissue undergoes complex and large deformations. Therefore, a robust, efficient, and accurate registration method is highly desired by the scientific community and by hospitals specializing in digital pathology. METHODS: We propose a two-step hybrid method consisting of (i) a deep learning- and feature-based initial alignment algorithm, and (ii) an intensity-based nonrigid registration using instance optimization. The proposed method does not require any fine-tuning to a particular dataset and can be used directly for any desired tissue type and stain. The registration time is low, allowing efficient registration even for large datasets. The method was proposed for the ACROBAT 2023 challenge organized during the MICCAI 2023 conference and won 1st place. The method is released as open-source software. RESULTS: The proposed method is evaluated using three open datasets: (i) the Automatic Nonrigid Histological Image Registration Dataset (ANHIR), (ii) the Automatic Registration of Breast Cancer Tissue Dataset (ACROBAT), and (iii) the Hybrid Restained and Consecutive Histological Serial Sections Dataset (HyReCo). The target registration error (TRE) is used as the evaluation metric. We compare the proposed algorithm to other state-of-the-art solutions, showing considerable improvement. Additionally, we perform several ablation studies concerning the resolution used for registration and the robustness and stability of the initial alignment.
The method achieves the most accurate results on the ACROBAT dataset and cell-level registration accuracy for the restained slides from the HyReCo dataset, and it is among the best methods evaluated on the ANHIR dataset. CONCLUSIONS: The article presents an automatic and robust registration method that outperforms other state-of-the-art solutions. The method does not require any fine-tuning to a particular dataset and can be used out-of-the-box for numerous types of microscopic images. The method is incorporated into the DeeperHistReg framework, allowing others to directly use it to register, transform, and save WSIs at any desired pyramid level (resolution up to 220k x 220k). We provide free access to the software. The results are fully and easily reproducible. The proposed method is a significant contribution to improving WSI registration quality, thus advancing the field of digital pathology.
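The target registration error (TRE) used as the evaluation metric is the mean distance between corresponding landmarks after registration; a minimal sketch (coordinates fabricated and unitless, whereas the challenge reports TRE in physical units):

```python
import numpy as np

def target_registration_error(landmarks_fixed, landmarks_warped):
    """Mean Euclidean distance between corresponding landmarks after
    registration -- the TRE metric used for evaluating WSI registration."""
    d = np.linalg.norm(landmarks_fixed - landmarks_warped, axis=1)
    return float(d.mean())

fixed = np.array([[10.0, 10.0], [50.0, 40.0], [80.0, 20.0]])
warped = fixed + np.array([[3.0, 4.0], [0.0, 0.0], [0.0, 0.0]])
print(target_registration_error(fixed, warped))  # (5 + 0 + 0) / 3
```

A lower TRE over expert-placed landmark pairs indicates a better alignment, which is how the method's accuracy is compared across the three datasets.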
Subjects
Algorithms; Deep Learning; Image Processing, Computer-Assisted; Humans; Image Processing, Computer-Assisted/methods; Software; Image Interpretation, Computer-Assisted/methods; Breast Neoplasms/diagnostic imaging; Breast Neoplasms/pathology; Female; Staining and Labeling
ABSTRACT
In ophthalmology, Optical Coherence Tomography (OCT) has become an everyday tool in the diagnosis and therapeutic planning of various diseases. Publicly available datasets play a crucial role in advancing research by providing access to diverse imaging data for algorithm development. However, accessibility, data format, annotations, and metadata are not consistent across OCT datasets, making it challenging to use the available resources efficiently. This article provides a comprehensive analysis of different OCT datasets, with particular attention to dataset properties, disease representation, and accessibility, and aims to create a catalog of all publicly available OCT datasets. The goal is to improve access to OCT data, increase transparency about its availability, and offer important new perspectives on the state of OCT imaging resources. Our findings reveal the need for improved data-sharing practices and standardized documentation.
Subjects
Tomography, Optical Coherence, Humans, Retinal Diseases/diagnostic imaging, Databases, Factual, Retina/diagnostic imaging, Information Dissemination
ABSTRACT
Artificial intelligence has transformed medical diagnostic capabilities, particularly through medical image analysis. AI algorithms detect abnormalities with strong performance, enabling computer-aided diagnosis by analyzing extensive amounts of patient data. These data serve as the foundation upon which algorithms learn and make predictions. Thus, the importance of data cannot be overstated, and datasets corresponding to clinical practice are required. Many researchers face a lack of medical data due to limited access, privacy concerns, or the absence of available annotations. One of the most widely used diagnostic tools in ophthalmology is Optical Coherence Tomography (OCT). Addressing the data availability issue is crucial for enhancing AI applications in the field of OCT diagnostics. This review aims to provide a comprehensive analysis of all publicly accessible retinal OCT datasets. Our main objective is to compile a list of OCT datasets and their properties, which can serve as an accessible reference, facilitating data curation for medical image analysis tasks. For this review, we searched the Zenodo repository, the Mendeley Data repository, the MEDLINE database, and the Google Dataset search engine. We systematically evaluated all identified datasets and found 23 open-access datasets containing OCT images, which vary significantly in size, scope, and ground-truth labels. Our findings indicate the need for improvement in data-sharing practices and standardized documentation. Enhancing the availability and quality of OCT datasets will support the development of AI algorithms and ultimately improve diagnostic capabilities in ophthalmology. By providing a comprehensive list of accessible OCT datasets, this review aims to facilitate better utilization and development of AI in medical image analysis.
ABSTRACT
The problem of artifacts in whole slide image acquisition, prevalent in both clinical workflows and research-oriented settings, necessitates human intervention and re-scanning. Overcoming this challenge requires developing quality control algorithms, an effort hindered by the limited availability of relevant annotated data in histopathology. The manual annotation of ground truth for artifact detection methods is expensive and time-consuming. This work addresses the issue by proposing a method dedicated to augmenting whole slide images with artifacts. The tool seamlessly generates and blends artifacts from an external library into a given histopathology dataset. The augmented datasets are then utilized to train artifact classification methods. The evaluation shows their usefulness for artifact classification, with AUROC improvements ranging from 0.01 to 0.10 depending on the artifact type. The framework, model, weights, and ground-truth annotations are freely released to facilitate open science and reproducible research.
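The core of such an augmentation tool is blending an artifact patch into a clean tile. A minimal sketch using plain alpha compositing is shown below; all array shapes and values are illustrative assumptions, not the released tool's actual API:

```python
import numpy as np

def blend_artifact(tile, artifact, alpha_mask):
    """Alpha-blend an artifact patch into a histopathology tile.

    tile, artifact: (H, W, 3) float arrays in [0, 1]
    alpha_mask:     (H, W) float array in [0, 1]; 1 = pure artifact
    """
    alpha = alpha_mask[..., None]  # broadcast mask over RGB channels
    return (1.0 - alpha) * tile + alpha * artifact

# A zero mask leaves the tile untouched; a fully opaque mask region
# is replaced by the artifact (e.g., a dark dust speck).
tile = np.full((4, 4, 3), 0.8)        # light tissue background
artifact = np.zeros((4, 4, 3))        # dark artifact patch
mask = np.zeros((4, 4))
mask[1:3, 1:3] = 1.0                  # opaque artifact region
augmented = blend_artifact(tile, artifact, mask)
```

Smooth mask edges (e.g., a Gaussian-feathered alpha) would make the blend seamless, which is the property the abstract emphasizes.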
Subjects
Algorithms, Artifacts, Image Processing, Computer-Assisted, Quality Control, Humans, Image Processing, Computer-Assisted/methods
ABSTRACT
Prostate cancer is the second most frequent cancer in men worldwide after lung cancer. Its diagnosis is based on the Gleason score, which evaluates the abnormality of cells in glands through the analysis of the different Gleason patterns within tissue samples. Recent advances in computational pathology, a field aimed at developing algorithms to automatically analyze digitized histopathology images, have led to a large variety and availability of datasets and algorithms for Gleason grading and scoring. However, there is no clear consensus on which methods are best suited for each problem in relation to the characteristics of data and labels. This paper provides a systematic comparison on nine datasets of state-of-the-art training approaches for deep neural networks (including fully-supervised learning, weakly-supervised learning, semi-supervised learning, Additive-MIL, Attention-Based MIL, Dual-Stream MIL, TransMIL and CLAM) applied to Gleason grading and scoring tasks. The nine datasets are collected from pathology institutes and openly accessible repositories. The results show that the best methods for the Gleason grading and Gleason scoring tasks are fully-supervised learning and CLAM, respectively, guiding researchers toward the best practice to adopt depending on the task to solve and the labels that are available.
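Several of the compared weakly-supervised approaches rely on attention-based multiple-instance pooling: patch embeddings from a slide are scored, softmaxed, and aggregated into a single slide-level embedding. A simplified sketch follows; the random embeddings and the untrained projection vector are purely illustrative, not any of the cited models:

```python
import numpy as np

def attention_mil_pool(instance_embeddings, w):
    """Simplified attention-based MIL pooling: score each patch
    embedding, softmax the scores into attention weights, and
    return the weighted average as the slide-level embedding."""
    scores = instance_embeddings @ w                  # (N,) one score per patch
    weights = np.exp(scores - scores.max())           # numerically stable softmax
    weights = weights / weights.sum()
    return weights @ instance_embeddings              # (D,) bag embedding

rng = np.random.default_rng(0)
patches = rng.normal(size=(16, 8))   # 16 patch embeddings of dimension 8
w = rng.normal(size=8)               # attention projection (untrained here)
slide_embedding = attention_mil_pool(patches, w)
```

In trained models the attention scores come from a small learned network rather than a fixed projection, but the pooling step itself has this form.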
Subjects
Deep Learning, Neoplasm Grading, Prostatic Neoplasms, Humans, Prostatic Neoplasms/pathology, Prostatic Neoplasms/diagnostic imaging, Male, Algorithms, Image Interpretation, Computer-Assisted/methods
ABSTRACT
Automated medical image analysis systems often require large amounts of training data with high-quality labels, which are difficult and time-consuming to generate. This paper introduces Radiology Objects in COntext version 2 (ROCOv2), a multimodal dataset consisting of radiological images and associated medical concepts and captions extracted from the PMC Open Access subset. It is an updated version of the ROCO dataset published in 2018 and adds 35,705 images published in PMC since 2018. It further provides manually curated concepts for imaging modalities, with additional anatomical and directional concepts for X-rays. The dataset consists of 79,789 images and has been used, with minor modifications, in the concept detection and caption prediction tasks of ImageCLEFmedical Caption 2023. The dataset is suitable for training image annotation models based on image-caption pairs, or for multi-label image classification using the Unified Medical Language System (UMLS) concepts provided with each image. In addition, it can serve for pre-training medical domain models and for evaluating deep learning models in multi-task learning.
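The multi-label classification setup described above amounts to multi-hot-encoding each image's UMLS concept list against the dataset's concept vocabulary. A minimal sketch is shown below; the concept unique identifiers (CUIs) are placeholders, not actual ROCOv2 annotations:

```python
def multi_hot(concepts, vocabulary):
    """Encode an image's UMLS concept list as a multi-hot target
    vector over the dataset's concept vocabulary."""
    index = {cui: i for i, cui in enumerate(vocabulary)}
    vector = [0] * len(vocabulary)
    for cui in concepts:
        vector[index[cui]] = 1
    return vector

vocab = ["C0000001", "C0000002", "C0000003"]      # illustrative CUIs
target = multi_hot(["C0000003", "C0000001"], vocab)
```

Such target vectors pair directly with a sigmoid output layer and a binary cross-entropy loss for multi-label training.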
Subjects
Multimodal Imaging, Radiology, Humans, Image Processing, Computer-Assisted, Unified Medical Language System
ABSTRACT
The increasing availability of biomedical data creates valuable resources for developing new deep learning algorithms to support experts, especially in domains where collecting large volumes of annotated data is not trivial. Biomedical data include several modalities containing complementary information, such as medical images and reports: images are often large and encode low-level information, while reports include a summarized high-level description of the findings identified within the data, often concerning only a small part of the image. However, only a few methods can effectively link the visual content of images with the textual content of reports, preventing medical specialists from properly benefiting from the recent opportunities offered by deep learning models. This paper introduces a multimodal architecture that creates a robust biomedical data representation by encoding fine-grained text representations within image embeddings. The architecture aims to tackle data scarcity (combining supervised and self-supervised learning) and to create multimodal biomedical ontologies. The architecture is trained on over 6,000 colon whole slide images (WSIs), each paired with the corresponding report, collected from two digital pathology workflows. The evaluation of the multimodal architecture involves three tasks: WSI classification (on data from the pathology workflows and from public repositories), multimodal data retrieval, and linking between textual and visual concepts. Notably, the latter two tasks are available by architectural design without further training, showing that the multimodal architecture can be adopted as a backbone to solve particular tasks. The multimodal data representation outperforms the unimodal one on the classification of colon WSIs and halves the data needed to reach accurate performance, reducing the computational power required and thus the carbon footprint.
The combination of images and reports through self-supervised algorithms makes it possible to mine databases and extract new information without requiring new expert annotations. In particular, the multimodal visual ontology, linking semantic concepts to images, may pave the way for advances in medicine and biomedical analysis beyond histopathology.
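At inference time, the multimodal data retrieval task reduces to ranking embeddings of one modality by similarity to a query from the other (e.g., retrieving the report matching a WSI embedding). A minimal cosine-similarity sketch with synthetic embeddings follows; it illustrates the retrieval step only, not the paper's trained model:

```python
import numpy as np

def retrieve(query_embedding, candidate_embeddings):
    """Rank candidates (e.g., report embeddings) by cosine
    similarity to a query (e.g., a WSI embedding)."""
    q = query_embedding / np.linalg.norm(query_embedding)
    c = candidate_embeddings / np.linalg.norm(
        candidate_embeddings, axis=1, keepdims=True)
    scores = c @ q
    return np.argsort(-scores)       # indices, best match first

# The candidate pointing in (nearly) the same direction as the
# query is retrieved first.
query = np.array([1.0, 0.0, 0.0])
candidates = np.array([[0.0, 1.0, 0.0],
                       [0.9, 0.1, 0.0],
                       [0.0, 0.0, 1.0]])
ranking = retrieve(query, candidates)
```

Because both modalities are embedded in the same space by the architecture, the same routine works in either direction (image-to-text or text-to-image).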