ABSTRACT
Increasing evidence shows that flaws in machine learning (ML) algorithm validation are an underestimated global problem. In biomedical image analysis, chosen performance metrics often do not reflect the domain interest, and thus fail to adequately measure scientific progress and hinder translation of ML techniques into practice. To overcome this, we created Metrics Reloaded, a comprehensive framework guiding researchers in the problem-aware selection of metrics. Developed by a large international consortium in a multistage Delphi process, it is based on the novel concept of a problem fingerprint: a structured representation of the given problem that captures all aspects that are relevant for metric selection, from the domain interest to the properties of the target structure(s), dataset and algorithm output. On the basis of the problem fingerprint, users are guided through the process of choosing and applying appropriate validation metrics while being made aware of potential pitfalls. Metrics Reloaded targets image analysis problems that can be interpreted as classification tasks at image, object or pixel level, namely image-level classification, object detection, semantic segmentation and instance segmentation tasks. To improve the user experience, we implemented the framework in the Metrics Reloaded online tool. Following the convergence of ML methodology across application domains, Metrics Reloaded fosters the convergence of validation methodology. Its applicability is demonstrated for various biomedical use cases.
Subjects
Algorithms; Image Processing, Computer-Assisted; Machine Learning; Semantics
ABSTRACT
Validation metrics are key for tracking scientific progress and bridging the current chasm between artificial intelligence research and its translation into practice. However, increasing evidence shows that, particularly in image analysis, metrics are often chosen inadequately. Although taking into account the individual strengths, weaknesses and limitations of validation metrics is a critical prerequisite to making educated choices, the relevant knowledge is currently scattered and poorly accessible to individual researchers. Based on a multistage Delphi process conducted by a multidisciplinary expert consortium as well as extensive community feedback, the present work provides a reliable and comprehensive common point of access to information on pitfalls related to validation metrics in image analysis. Although focused on biomedical image analysis, the addressed pitfalls generalize across application domains and are categorized according to a newly created, domain-agnostic taxonomy. The work serves to enhance global comprehension of a key topic in image analysis validation.
Subjects
Artificial Intelligence
ABSTRACT
Failure is an integral part of life and by extension academia. At the same time, failure is often ignored, with potentially negative consequences both for the science and the scientists involved. This article provides several strategies for learning from and dealing with failure instead of ignoring it. Hopefully, our recommendations are widely applicable, while still taking into account individual differences between academics. These simple rules allow academics to further develop their own strategies for failing successfully in academia.
ABSTRACT
The Danish Reproducibility Network (DKRN) is a grassroots initiative for establishing a peer-supportive reproducibility-focused academic network in Denmark. We modelled our approach on already existing national Reproducibility Networks. We consulted with researchers and research support professionals to identify the needs of the research community. Three themes emerged around policy implementation, training and the appropriate application of reproducible practices. The network aims to address these three themes in a strategic plan, which harnesses the benefits of grassroots initiatives. The mission of the DKRN is therefore to facilitate communication, peer-support, and the exchange of ideas through a network of topic and geographical nodes. The network is open to researchers and research support professionals from all career stages and disciplines. It aligns with broader international initiatives, and national institutions, positioning itself as a contributor to the Danish research ecosystem.
Subjects
Peer Group; Denmark; Humans; Reproducibility of Results; Research Personnel
ABSTRACT
Validation metrics are key for the reliable tracking of scientific progress and for bridging the current chasm between artificial intelligence (AI) research and its translation into practice. However, increasing evidence shows that particularly in image analysis, metrics are often chosen inadequately in relation to the underlying research problem. This could be attributed to a lack of accessibility of metric-related knowledge: While taking into account the individual strengths, weaknesses, and limitations of validation metrics is a critical prerequisite to making educated choices, the relevant knowledge is currently scattered and poorly accessible to individual researchers. Based on a multi-stage Delphi process conducted by a multidisciplinary expert consortium as well as extensive community feedback, the present work provides the first reliable and comprehensive common point of access to information on pitfalls related to validation metrics in image analysis. Focusing on biomedical image analysis but with the potential of transfer to other fields, the addressed pitfalls generalize across application domains and are categorized according to a newly created, domain-agnostic taxonomy. To facilitate comprehension, illustrations and specific examples accompany each pitfall. As a structured body of information accessible to researchers of all levels of expertise, this work enhances global comprehension of a key topic in image analysis validation.
ABSTRACT
Research in computer analysis of medical images holds great promise for improving patients' health. However, a number of systematic challenges are slowing down the progress of the field, from limitations of the data, such as biases, to research incentives, such as optimizing for publication. In this paper, we review roadblocks to developing and assessing methods. Building our analysis on evidence from the literature and data challenges, we show that at every step, potential biases can creep in. On a positive note, we also discuss ongoing efforts to counteract these problems. Finally, we provide recommendations on how to further address these problems in the future.
ABSTRACT
Measuring airways in chest computed tomography (CT) scans is important for characterizing diseases such as cystic fibrosis, yet very time-consuming to perform manually. Machine learning algorithms offer an alternative, but need large sets of annotated scans for good performance. We investigate whether crowdsourcing can be used to gather airway annotations. We generate image slices at known locations of airways in 24 subjects and request the crowd workers to outline the airway lumen and airway wall. After combining multiple crowd workers, we compare the measurements to those made by the experts in the original scans. Similar to our preliminary study, a large portion of the annotations were excluded, possibly due to workers misunderstanding the instructions. After excluding such annotations, moderate to strong correlations with the expert can be observed, although these correlations are slightly lower than inter-expert correlations. Furthermore, the results across subjects in this study are quite variable. Although the crowd has potential in annotating airways, further development is needed for it to be robust enough for gathering annotations in practice. For reproducibility, data and code are available online: http://github.com/adriapr/crowdairway.git.
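The comparison step described above can be illustrated with a small sketch. It assumes, hypothetically, that the measurements from several workers for each airway are combined with a median (the abstract does not state the combination rule) and that agreement with the expert is summarised by a Pearson correlation. The function names and all numbers are invented for illustration and are not taken from the crowdairway repository.

```python
import math
from statistics import median
from typing import List, Sequence


def combine_workers(measurements_per_airway: Sequence[Sequence[float]]) -> List[float]:
    """Combine several worker measurements of one airway into a single value.

    Assumption: the median is used because it is robust to a single
    outlying annotation; the study may use a different rule.
    """
    return [median(values) for values in measurements_per_airway]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


if __name__ == "__main__":
    # Toy lumen areas: three airways, three workers each (made-up values).
    crowd = [[10, 12, 11], [20, 19, 40], [31, 29, 30]]
    expert = [11.5, 20.5, 29.0]  # made-up expert measurements
    combined = combine_workers(crowd)
    print(combined)
    print(round(pearson(combined, expert), 3))
```

In this toy input, the outlying value 40 for the second airway does not affect the combined measurement, which is the reason a median is a plausible choice when some workers misunderstand the instructions.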
Subjects
Algorithms; Crowdsourcing/statistics & numerical data; Crowdsourcing/standards; Lung/diagnostic imaging; Machine Learning; Radiography, Thoracic/methods; Tomography, X-Ray Computed/methods; Humans
ABSTRACT
Early career researchers (ECRs) are faced with a range of competing pressures in academia, making self-management key to building a successful career. The Organization for Human Brain Mapping undertook a group effort to gather helpful advice for ECRs in self-management.
Subjects
Biological Science Disciplines/education; Career Choice; Natural Science Disciplines/education; Research Personnel; Self-Management; Brain Mapping; Humans; Life Style; Mentors; Work/psychology
ABSTRACT
Machine learning (ML) algorithms have made a tremendous impact in the field of medical imaging. While medical imaging datasets have been growing in size, a frequently mentioned challenge for supervised ML algorithms is the lack of annotated data. As a result, various methods that can learn with less supervision, or with other types of supervision, have been proposed. We give an overview of semi-supervised, multiple instance, and transfer learning in medical imaging, in both diagnosis and segmentation tasks. We also discuss connections between these learning scenarios, and opportunities for future research. A dataset with the details of the surveyed papers is available via https://figshare.com/articles/Database_of_surveyed_literature_in_Not-so-supervised_a_survey_of_semi-supervised_multi-instance_and_transfer_learning_in_medical_image_analysis_/7479416.
Subjects
Diagnostic Imaging; Image Processing, Computer-Assisted/methods; Supervised Machine Learning; Humans
ABSTRACT
[This corrects the article DOI: 10.1371/journal.pone.0205397.].
ABSTRACT
Chronic obstructive pulmonary disease (COPD) is a lung disease that can be quantified using chest computed tomography scans. Recent studies have shown that COPD can be automatically diagnosed using weakly supervised learning of intensity and texture distributions. However, until now such classifiers have only been evaluated on scans from a single domain, and it is unclear whether they would generalize across domains, such as different scanners or scanning protocols. To address this problem, we investigate classification of COPD in a multicenter dataset with a total of 803 scans from three different centers and four different scanners, with heterogeneous subject distributions. Our method is based on Gaussian texture features and a weighted logistic classifier, which increases the weights of samples similar to the test data. We show that Gaussian texture features outperform intensity features previously used in multicenter classification tasks. We also show that a weighting strategy based on a classifier that is trained to discriminate between scans from different domains can further improve the results. To encourage further research into transfer learning methods for the classification of COPD, upon acceptance of this paper we will release two feature datasets used in this study at http://bigr.nl/research/projects/copd.
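The weighting idea described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: a logistic "domain classifier" is trained to separate training-domain from test-domain samples, each training sample receives the weight p(test | x) / p(train | x), and a second logistic classifier is then fitted with those weights. All function names, the plain gradient-descent optimiser, and the one-dimensional toy features are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple

Model = Tuple[List[float], float]  # (feature weights, bias)


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(model: Model, x: Sequence[float]) -> float:
    """Probability of the positive class for one sample."""
    w, b = model
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def fit_logistic(X: Sequence[Sequence[float]], y: Sequence[int],
                 sample_weight: Optional[Sequence[float]] = None,
                 lr: float = 0.1, epochs: int = 2000) -> Model:
    """Weighted logistic regression fitted by batch gradient descent."""
    n, d = len(X), len(X[0])
    sw = list(sample_weight) if sample_weight is not None else [1.0] * n
    total = sum(sw)
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi, si in zip(X, y, sw):
            err = si * (predict_proba((w, b), xi) - yi)  # weighted residual
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / total for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / total
    return w, b


def domain_weights(X_train: Sequence[Sequence[float]],
                   X_test: Sequence[Sequence[float]]) -> List[float]:
    """Weight each training sample by how test-like the domain classifier finds it."""
    X = list(X_train) + list(X_test)
    domain = [0] * len(X_train) + [1] * len(X_test)  # 0 = train, 1 = test
    clf = fit_logistic(X, domain)
    weights = []
    for x in X_train:
        p = predict_proba(clf, x)
        weights.append(p / max(1.0 - p, 1e-6))  # odds of belonging to the test domain
    return weights


if __name__ == "__main__":
    X_train = [[0.0], [1.0], [2.0], [3.0]]  # toy features from the training domain
    y_train = [0, 0, 1, 1]                  # toy class labels
    X_test = [[2.5], [3.0], [3.5]]          # unlabeled toy features from the test domain
    weights = domain_weights(X_train, X_test)
    model = fit_logistic(X_train, y_train, sample_weight=weights)
    print([round(v, 3) for v in weights])
    print(round(predict_proba(model, [3.0]), 3))
```

Only unlabeled test-domain features are needed to compute the weights, which is what makes this kind of reweighting usable when the new scanner or center provides no diagnoses.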
Subjects
Image Interpretation, Computer-Assisted/methods; Pulmonary Disease, Chronic Obstructive/diagnostic imaging; Tomography, X-Ray Computed/methods; Adult; Aged; Area Under Curve; Humans; Machine Learning; Middle Aged
ABSTRACT
PURPOSE: A method for automatically quantifying emphysema regions using High-Resolution Computed Tomography (HRCT) scans of patients with chronic obstructive pulmonary disease (COPD) that does not require manually annotated scans for training is presented. METHODS: HRCT scans of controls and of COPD patients with diverse disease severity are acquired at two different centers. Textural features from co-occurrence matrices and Gaussian filter banks are used to characterize the lung parenchyma in the scans. Two robust versions of multiple instance learning (MIL) classifiers that can handle weakly labeled data, miSVM and MILES, are investigated. Weak labels give information about the emphysema without indicating the location of the lesions. The classifiers are trained with the weak labels extracted from the forced expiratory volume in one second (FEV1) and diffusing capacity of the lungs for carbon monoxide (DLCO). At test time, the classifiers output a patient label indicating overall COPD diagnosis and local labels indicating the presence of emphysema. The classifier performance is compared with manual annotations made by two radiologists, a classical density based method, and pulmonary function tests (PFTs). RESULTS: The miSVM classifier performed better than MILES on both patient and emphysema classification. The classifier has a stronger correlation with PFT than the density based method, the percentage of emphysema in the intersection of annotations from both radiologists, and the percentage of emphysema annotated by one of the radiologists. The correlation between the classifier and the PFT is only outperformed by the second radiologist. CONCLUSIONS: The presented method uses MIL classifiers to automatically identify emphysema regions in HRCT scans. Furthermore, this approach has been demonstrated to correlate better with DLCO, which is known to be affected in emphysema, than a classical density based method or a radiologist. Therefore, it is relevant to facilitate assessment of emphysema and to reduce inter-observer variability.
Subjects
Image Interpretation, Computer-Assisted/methods; Lung/diagnostic imaging; Pulmonary Emphysema/diagnosis; Tomography, X-Ray Computed; Humans; Normal Distribution; Pulmonary Emphysema/diagnostic imaging; Respiratory Function Tests
ABSTRACT
In multiple instance learning, objects are sets (bags) of feature vectors (instances) rather than individual feature vectors. In this paper, we address the problem of how these bags can best be represented. Two standard approaches are to use (dis)similarities between bags and prototype bags, or between bags and prototype instances. The first approach results in a relatively low-dimensional representation, determined by the number of training bags, whereas the second approach results in a relatively high-dimensional representation, determined by the total number of instances in the training set. However, an advantage of the latter representation is that the informativeness of the prototype instances can be inferred. In this paper, a third, intermediate approach is proposed, which links the two approaches and combines their strengths. Our classifier is inspired by a random subspace ensemble, and considers subspaces of the dissimilarity space, defined by subsets of instances, as prototypes. We provide insight into the structure of some popular multiple instance problems and show state-of-the-art performances on these data sets.
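The two standard representations, and the idea of drawing subspaces from the instance-based one, can be sketched as follows. This is an illustration under stated assumptions rather than the paper's method: Euclidean distance between instances, the minimum distance for a bag-to-instance dissimilarity, and the mean of minimum distances for a bag-to-bag dissimilarity are choices made here for concreteness, and all function names are hypothetical.

```python
import math
import random
from typing import List, Sequence

Instance = Sequence[float]
Bag = Sequence[Instance]


def instance_distance(a: Instance, b: Instance) -> float:
    """Euclidean distance between two feature vectors (an assumed choice)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def bag_to_instance(bag: Bag, prototype: Instance) -> float:
    """Dissimilarity of a bag to a prototype instance: distance of its closest instance."""
    return min(instance_distance(inst, prototype) for inst in bag)


def bag_to_bag(bag: Bag, prototype_bag: Bag) -> float:
    """Dissimilarity of a bag to a prototype bag: mean of minimum instance distances."""
    return sum(bag_to_instance(prototype_bag, inst) for inst in bag) / len(bag)


def bag_representation(bag: Bag, training_bags: Sequence[Bag]) -> List[float]:
    """One feature per training bag (low-dimensional)."""
    return [bag_to_bag(bag, proto) for proto in training_bags]


def instance_representation(bag: Bag, training_bags: Sequence[Bag]) -> List[float]:
    """One feature per training instance (high-dimensional)."""
    prototypes = [inst for proto in training_bags for inst in proto]
    return [bag_to_instance(bag, proto) for proto in prototypes]


def random_subspaces(n_prototypes: int, subspace_size: int,
                     n_subspaces: int, seed: int = 0) -> List[List[int]]:
    """Random subsets of prototype-instance indices, one per ensemble member."""
    rng = random.Random(seed)
    return [sorted(rng.sample(range(n_prototypes), subspace_size))
            for _ in range(n_subspaces)]


if __name__ == "__main__":
    training_bags = [[(0, 0), (1, 0)], [(3, 4)], [(0, 1), (5, 5), (6, 6)]]
    bag = [(0, 0), (3, 4)]
    print(len(bag_representation(bag, training_bags)))       # number of training bags
    full = instance_representation(bag, training_bags)
    print(len(full))                                         # number of training instances
    for indices in random_subspaces(len(full), subspace_size=3, n_subspaces=2):
        print(indices, [full[i] for i in indices])           # features seen by one member
```

Each index subset selects a few prototype instances, so a classifier trained on that subset sees a representation whose dimensionality lies between the two extremes, and the subsets that yield good classifiers indicate which instances are informative.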