Search | VHL Search Portal

1.

DECONbench: a benchmarking platform dedicated to deconvolution methods for tumor heterogeneity quantification.

Decamps, Clémentine; Arnaud, Alexis; Petitprez, Florent; Ayadi, Mira; Baurès, Aurélia; Armenoult, Lucile; Escalera, Sergio; Guyon, Isabelle; Nicolle, Rémy; Tomasini, Richard; de Reyniès, Aurélien; Cros, Jérôme; Blum, Yuna; Richard, Magali.

BMC Bioinformatics ; 22(1): 473, 2021 Oct 02.

Article in English | MEDLINE | ID: mdl-34600479

ABSTRACT

BACKGROUND: Quantification of tumor heterogeneity is essential to better understand cancer progression and to adapt therapeutic treatments to patient specificities. Bioinformatic tools to assess the different cell populations from single-omic datasets as bulk transcriptome or methylome samples have been recently developed, including reference-based and reference-free methods. Improved methods using multi-omic datasets are yet to be developed in the future and the community would need systematic tools to perform a comparative evaluation of these algorithms on controlled data. RESULTS: We present DECONbench, a standardized unbiased benchmarking resource, applied to the evaluation of computational methods quantifying cell-type heterogeneity in cancer. DECONbench includes gold standard simulated benchmark datasets, consisting of transcriptome and methylome profiles mimicking pancreatic adenocarcinoma molecular heterogeneity, and a set of baseline deconvolution methods (reference-free algorithms inferring cell-type proportions). DECONbench performs a systematic performance evaluation of each new methodological contribution and provides the possibility to publicly share source code and scoring. CONCLUSION: DECONbench allows continuous submission of new methods in a user-friendly fashion, each novel contribution being automatically compared to the reference baseline methods, which enables crowdsourced benchmarking. DECONbench is designed to serve as a reference platform for the benchmarking of deconvolution methods in the evaluation of cancer heterogeneity. We believe it will contribute to leverage the benchmarking practices in the biomedical and life science communities. DECONbench is hosted on the open source Codalab competition platform. It is freely available at: https://competitions.codalab.org/competitions/27453 .

Subject(s)

Adenocarcinoma , Pancreatic Neoplasms , Algorithms , Benchmarking , Computational Biology , Humans , Pancreatic Neoplasms/genetics

2.

Statistical Machine Learning for Human Behaviour Analysis.

Moeslund, Thomas B; Escalera, Sergio; Anbarjafari, Gholamreza; Nasrollahi, Kamal; Wan, Jun.

Entropy (Basel) ; 22(5)2020 May 07.

Article in English | MEDLINE | ID: mdl-33286302

ABSTRACT

Human behaviour analysis has introduced several challenges in various fields, such as applied information theory, affective computing, robotics, biometrics and pattern recognition [...].

3.

Action Recognition Using Single-Pixel Time-of-Flight Detection.

Ofodile, Ikechukwu; Helmi, Ahmed; Clapés, Albert; Avots, Egils; Peensoo, Kerttu Maria; Valdma, Sandhra-Mirella; Valdmann, Andreas; Valtna-Lukner, Heli; Omelkov, Sergey; Escalera, Sergio; Ozcinar, Cagri; Anbarjafari, Gholamreza.

Entropy (Basel) ; 21(4)2019 Apr 18.

Article in English | MEDLINE | ID: mdl-33267128

ABSTRACT

Action recognition is a challenging task that plays an important role in many robotic systems, which highly depend on visual input feeds. However, due to privacy concerns, it is important to find a method which can recognise actions without using visual feed. In this paper, we propose a concept for detecting actions while preserving the test subject's privacy. Our proposed method relies only on recording the temporal evolution of light pulses scattered back from the scene. Such data trace to record one action contains a sequence of one-dimensional arrays of voltage values acquired by a single-pixel detector at 1 GHz repetition rate. Information about both the distance to the object and its shape are embedded in the traces. We apply machine learning in the form of recurrent neural networks for data analysis and demonstrate successful action recognition. The experimental results show that our proposed method could achieve on average 96.47 % accuracy on the actions walking forward, walking backwards, sitting down, standing up and waving hand, using recurrent neural network.

4.

Organ Segmentation in Poultry Viscera Using RGB-D.

Philipsen, Mark Philip; Dueholm, Jacob Velling; Jørgensen, Anders; Escalera, Sergio; Moeslund, Thomas Baltzer.

Sensors (Basel) ; 18(1)2018 Jan 03.

Article in English | MEDLINE | ID: mdl-29301337

ABSTRACT

We present a pattern recognition framework for semantic segmentation of visual structures, that is, multi-class labelling at pixel level, and apply it to the task of segmenting organs in the eviscerated viscera from slaughtered poultry in RGB-D images. This is a step towards replacing the current strenuous manual inspection at poultry processing plants. Features are extracted from feature maps such as activation maps from a convolutional neural network (CNN). A random forest classifier assigns class probabilities, which are further refined by utilizing context in a conditional random field. The presented method is compatible with both 2D and 3D features, which allows us to explore the value of adding 3D and CNN-derived features. The dataset consists of 604 RGB-D images showing 151 unique sets of eviscerated viscera from four different perspectives. A mean Jaccard index of 78.11 % is achieved across the four classes of organs by using features derived from 2D, 3D and a CNN, compared to 74.28 % using only basic 2D image features.

5.

Multi-Modal Deep Hand Sign Language Recognition in Still Images Using Restricted Boltzmann Machine.

Rastgoo, Razieh; Kiani, Kourosh; Escalera, Sergio.

Entropy (Basel) ; 20(11)2018 Oct 23.

Article in English | MEDLINE | ID: mdl-33266533

ABSTRACT

In this paper, a deep learning approach, Restricted Boltzmann Machine (RBM), is used to perform automatic hand sign language recognition from visual data. We evaluate how RBM, as a deep generative model, is capable of generating the distribution of the input data for an enhanced recognition of unseen data. Two modalities, RGB and Depth, are considered in the model input in three forms: original image, cropped image, and noisy cropped image. Five crops of the input image are used and the hand of these cropped images are detected using Convolutional Neural Network (CNN). After that, three types of the detected hand images are generated for each modality and input to RBMs. The outputs of the RBMs for two modalities are fused in another RBM in order to recognize the output sign label of the input image. The proposed multi-modal model is trained on all and part of the American alphabet and digits of four publicly available datasets. We also evaluate the robustness of the proposal against noise. Experimental results show that the proposed multi-modal model, using crops and the RBM fusing methodology, achieves state-of-the-art results on Massey University Gesture Dataset 2012, American Sign Language (ASL). and Fingerspelling Dataset from the University of Surrey's Center for Vision, Speech and Signal Processing, NYU, and ASL Fingerspelling A datasets.

6.

A survey on model based approaches for 2D and 3D visual human pose recovery.

Perez-Sala, Xavier; Escalera, Sergio; Angulo, Cecilio; Gonzàlez, Jordi.

Sensors (Basel) ; 14(3): 4189-210, 2014 Mar 03.

Article in English | MEDLINE | ID: mdl-24594613

ABSTRACT

Human Pose Recovery has been studied in the field of Computer Vision for the last 40 years. Several approaches have been reported, and significant improvements have been obtained in both data representation and model design. However, the problem of Human Pose Recovery in uncontrolled environments is far from being solved. In this paper, we define a general taxonomy to group model based approaches for Human Pose Recovery, which is composed of five main modules: appearance, viewpoint, spatial relations, temporal consistence, and behavior. Subsequently, a methodological comparison is performed following the proposed taxonomy, evaluating current SoA approaches in the aforementioned five group categories. As a result of this comparison, we discuss the main advantages and drawbacks of the reviewed literature.

Subject(s)

Data Collection , Imaging, Three-Dimensional , Models, Theoretical , Posture/physiology , Visual Perception/physiology , Biomechanical Phenomena , Humans

7.

Learning to Recognize Actions on Objects in Egocentric Video With Attention Dictionaries.

Sudhakaran, Swathikiran; Escalera, Sergio; Lanz, Oswald.

IEEE Trans Pattern Anal Mach Intell ; 45(6): 6674-6687, 2023 Jun.

Article in English | MEDLINE | ID: mdl-33571086

ABSTRACT

We present EgoACO, a deep neural architecture for video action recognition that learns to pool action-context-object descriptors from frame level features by leveraging the verb-noun structure of action labels in egocentric video datasets. The core component is class activation pooling (CAP), a differentiable pooling layer that combines ideas from bilinear pooling for fine-grained recognition and from feature learning for discriminative localization. CAP uses self-attention with a dictionary of learnable weights to pool from the most relevant feature regions. Through CAP, EgoACO learns to decode object and scene context descriptors from video frame features. For temporal modeling we design a recurrent version of class activation pooling termed Long Short-Term Attention (LSTA). LSTA extends convolutional gated LSTM with built-in spatial attention and a re-designed output gate. Action, object and context descriptors are fused by a multi-head prediction that accounts for the inter-dependencies between noun-verb-action structured labels in egocentric video datasets. EgoACO features built-in visual explanations, helping learning and interpretation of discriminative information in video. Results on the two largest egocentric action recognition datasets currently available, EPIC-KITCHENS and EGTEA Gaze+, show that by decoding action-context-object descriptors, the model achieves state-of-the-art recognition performance.

8.

Gate-Shift-Fuse for Video Action Recognition.

Sudhakaran, Swathikiran; Escalera, Sergio; Lanz, Oswald.

IEEE Trans Pattern Anal Mach Intell ; 45(9): 10913-10928, 2023 Sep.

Article in English | MEDLINE | ID: mdl-37074899

ABSTRACT

Convolutional Neural Networks are the de facto models for image recognition. However 3D CNNs, the straight forward extension of 2D CNNs for video recognition, have not achieved the same success on standard action recognition benchmarks. One of the main reasons for this reduced performance of 3D CNNs is the increased computational complexity requiring large scale annotated datasets to train them in scale. 3D kernel factorization approaches have been proposed to reduce the complexity of 3D CNNs. Existing kernel factorization approaches follow hand-designed and hard-wired techniques. In this paper we propose Gate-Shift-Fuse (GSF), a novel spatio-temporal feature extraction module which controls interactions in spatio-temporal decomposition and learns to adaptively route features through time and combine them in a data dependent manner. GSF leverages grouped spatial gating to decompose input tensor and channel weighting to fuse the decomposed tensors. GSF can be inserted into existing 2D CNNs to convert them into an efficient and high performing spatio-temporal feature extractor, with negligible parameter and compute overhead. We perform an extensive analysis of GSF using two popular 2D CNN families and achieve state-of-the-art or competitive performance on five standard action recognition benchmarks.

9.

Video Transformers: A Survey.

Selva, Javier; Johansen, Anders S; Escalera, Sergio; Nasrollahi, Kamal; Moeslund, Thomas B; Clapes, Albert.

IEEE Trans Pattern Anal Mach Intell ; 45(11): 12922-12943, 2023 Nov.

Article in English | MEDLINE | ID: mdl-37022830

ABSTRACT

Transformer models have shown great success handling long-range interactions, making them a promising tool for modeling video. However, they lack inductive biases and scale quadratically with input length. These limitations are further exacerbated when dealing with the high dimensionality introduced by the temporal dimension. While there are surveys analyzing the advances of Transformers for vision, none focus on an in-depth analysis of video-specific designs. In this survey, we analyze the main contributions and trends of works leveraging Transformers to model video. Specifically, we delve into how videos are handled at the input level first. Then, we study the architectural changes made to deal with video more efficiently, reduce redundancy, re-introduce useful inductive biases, and capture long-term temporal dynamics. In addition, we provide an overview of different training regimes and explore effective self-supervised learning strategies for video. Finally, we conduct a performance comparison on the most common benchmark for Video Transformers (i.e., action classification), finding them to outperform 3D ConvNets even with less computational complexity.

10.

Radiomics analysis enhances the diagnostic performance of CMR stress perfusion: a proof-of-concept study using the Dan-NICAD dataset.

Raisi-Estabragh, Zahra; Martin-Isla, Carlos; Nissen, Louise; Szabo, Liliana; Campello, Victor M; Escalera, Sergio; Winther, Simon; Bøttcher, Morten; Lekadir, Karim; Petersen, Steffen E.

Front Cardiovasc Med ; 10: 1141026, 2023.

Article in English | MEDLINE | ID: mdl-37781298

ABSTRACT

Objectives: To assess the feasibility of extracting radiomics signal intensity based features from the myocardium using cardiovascular magnetic resonance (CMR) imaging stress perfusion sequences. Furthermore, to compare the diagnostic performance of radiomics models against standard-of-care qualitative visual assessment of stress perfusion images, with the ground truth stenosis label being defined by invasive Fractional Flow Reserve (FFR) and quantitative coronary angiography. Methods: We used the Dan-NICAD 1 dataset, a multi-centre study with coronary computed tomography angiography, 1,5âT CMR stress perfusion, and invasive FFR available for a subset of 148 patients with suspected coronary artery disease. Image segmentation was performed by two independent readers. We used the Pyradiomics platform to extract radiomics first-order (n = 14) and texture (n = 75) features from the LV myocardium (basal, mid, apical) in rest and stress perfusion images. Results: Overall, 92 patients (mean age 62 years, 56 men) were included in the study, 39 with positive FFR. We double-cross validated the model and, in each inner fold, we trained and validated a per territory model. The conventional analysis results reported sensitivity of 41% and specificity of 84%. Our final radiomics model demonstrated an improvement on these results with an average sensitivity of 53% and specificity of 86%. Conclusion: In this proof-of-concept study from the Dan-NICAD dataset, we demonstrate the feasibility of radiomics analysis applied to CMR perfusion images with a suggestion of superior diagnostic performance of radiomics models over conventional visual analysis of perfusion images in picking up perfusion defects defined by invasive coronary angiography.

11.

CrossMoDA 2021 challenge: Benchmark of cross-modality domain adaptation techniques for vestibular schwannoma and cochlea segmentation.

Dorent, Reuben; Kujawa, Aaron; Ivory, Marina; Bakas, Spyridon; Rieke, Nicola; Joutard, Samuel; Glocker, Ben; Cardoso, Jorge; Modat, Marc; Batmanghelich, Kayhan; Belkov, Arseniy; Calisto, Maria Baldeon; Choi, Jae Won; Dawant, Benoit M; Dong, Hexin; Escalera, Sergio; Fan, Yubo; Hansen, Lasse; Heinrich, Mattias P; Joshi, Smriti; Kashtanova, Victoriya; Kim, Hyeon Gyu; Kondo, Satoshi; Kruse, Christian N; Lai-Yuen, Susana K; Li, Hao; Liu, Han; Ly, Buntheng; Oguz, Ipek; Shin, Hyungseob; Shirokikh, Boris; Su, Zixian; Wang, Guotai; Wu, Jianghao; Xu, Yanwu; Yao, Kai; Zhang, Li; Ourselin, Sébastien; Shapey, Jonathan; Vercauteren, Tom.

Med Image Anal ; 83: 102628, 2023 01.

Article in English | MEDLINE | ID: mdl-36283200

ABSTRACT

Domain Adaptation (DA) has recently been of strong interest in the medical imaging community. While a large variety of DA techniques have been proposed for image segmentation, most of these techniques have been validated either on private datasets or on small publicly available datasets. Moreover, these datasets mostly addressed single-class problems. To tackle these limitations, the Cross-Modality Domain Adaptation (crossMoDA) challenge was organised in conjunction with the 24th International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI 2021). CrossMoDA is the first large and multi-class benchmark for unsupervised cross-modality Domain Adaptation. The goal of the challenge is to segment two key brain structures involved in the follow-up and treatment planning of vestibular schwannoma (VS): the VS and the cochleas. Currently, the diagnosis and surveillance in patients with VS are commonly performed using contrast-enhanced T1 (ceT1) MR imaging. However, there is growing interest in using non-contrast imaging sequences such as high-resolution T2 (hrT2) imaging. For this reason, we established an unsupervised cross-modality segmentation benchmark. The training dataset provides annotated ceT1 scans (N=105) and unpaired non-annotated hrT2 scans (N=105). The aim was to automatically perform unilateral VS and bilateral cochlea segmentation on hrT2 scans as provided in the testing set (N=137). This problem is particularly challenging given the large intensity distribution gap across the modalities and the small volume of the structures. A total of 55 teams from 16 countries submitted predictions to the validation leaderboard. Among them, 16 teams from 9 different countries submitted their algorithm for the evaluation phase. The level of performance reached by the top-performing teams is strikingly high (best median Dice score - VS: 88.4%; Cochleas: 85.7%) and close to full supervision (median Dice score - VS: 92.5%; Cochleas: 87.7%). All top-performing methods made use of an image-to-image translation approach to transform the source-domain images into pseudo-target-domain images. A segmentation network was then trained using these generated images and the manual annotations provided for the source image.

Subject(s)

Neuroma, Acoustic , Humans , Neuroma, Acoustic/diagnostic imaging

12.

MyoPS: A benchmark of myocardial pathology segmentation combining three-sequence cardiac magnetic resonance images.

Li, Lei; Wu, Fuping; Wang, Sihan; Luo, Xinzhe; Martín-Isla, Carlos; Zhai, Shuwei; Zhang, Jianpeng; Liu, Yanfei; Zhang, Zhen; Ankenbrand, Markus J; Jiang, Haochuan; Zhang, Xiaoran; Wang, Linhong; Arega, Tewodros Weldebirhan; Altunok, Elif; Zhao, Zhou; Li, Feiyan; Ma, Jun; Yang, Xiaoping; Puybareau, Elodie; Oksuz, Ilkay; Bricq, Stephanie; Li, Weisheng; Punithakumar, Kumaradevan; Tsaftaris, Sotirios A; Schreiber, Laura M; Yang, Mingjing; Liu, Guocai; Xia, Yong; Wang, Guotai; Escalera, Sergio; Zhuang, Xiahai.

Med Image Anal ; 87: 102808, 2023 07.

Article in English | MEDLINE | ID: mdl-37087838

ABSTRACT

Assessment of myocardial viability is essential in diagnosis and treatment management of patients suffering from myocardial infarction, and classification of pathology on the myocardium is the key to this assessment. This work defines a new task of medical image analysis, i.e., to perform myocardial pathology segmentation (MyoPS) combining three-sequence cardiac magnetic resonance (CMR) images, which was first proposed in the MyoPS challenge, in conjunction with MICCAI 2020. Note that MyoPS refers to both myocardial pathology segmentation and the challenge in this paper. The challenge provided 45 paired and pre-aligned CMR images, allowing algorithms to combine the complementary information from the three CMR sequences for pathology segmentation. In this article, we provide details of the challenge, survey the works from fifteen participants and interpret their methods according to five aspects, i.e., preprocessing, data augmentation, learning strategy, model architecture and post-processing. In addition, we analyze the results with respect to different factors, in order to examine the key obstacles and explore the potential of solutions, as well as to provide a benchmark for future research. The average Dice scores of submitted algorithms were 0.614±0.231 and 0.644±0.153 for myocardial scars and edema, respectively. We conclude that while promising results have been reported, the research is still in the early stage, and more in-depth exploration is needed before a successful application to the clinics. MyoPS data and evaluation tool continue to be publicly available upon registration via its homepage (www.sdspeople.fudan.edu.cn/zhuangxiahai/0/myops20/).

Subject(s)

Benchmarking , Image Processing, Computer-Assisted , Humans , Image Processing, Computer-Assisted/methods , Heart/diagnostic imaging , Myocardium/pathology , Magnetic Resonance Imaging/methods

13.

Deep Learning Segmentation of the Right Ventricle in Cardiac MRI: The M&Ms Challenge.

Martin-Isla, Carlos; Campello, Victor M; Izquierdo, Cristian; Kushibar, Kaisar; Sendra-Balcells, Carla; Gkontra, Polyxeni; Sojoudi, Alireza; Fulton, Mitchell J; Arega, Tewodros Weldebirhan; Punithakumar, Kumaradevan; Li, Lei; Sun, Xiaowu; Al Khalil, Yasmina; Liu, Di; Jabbar, Sana; Queiros, Sandro; Galati, Francesco; Mazher, Moona; Gao, Zheyao; Beetz, Marcel; Tautz, Lennart; Galazis, Christoforos; Varela, Marta; Hullebrand, Markus; Grau, Vicente; Zhuang, Xiahai; Puig, Domenec; Zuluaga, Maria A; Mohy-Ud-Din, Hassan; Metaxas, Dimitris; Breeuwer, Marcel; van der Geest, Rob J; Noga, Michelle; Bricq, Stephanie; Rentschler, Mark E; Guala, Andrea; Petersen, Steffen E; Escalera, Sergio; Palomares, Jose F Rodriguez; Lekadir, Karim.

IEEE J Biomed Health Inform ; 27(7): 3302-3313, 2023 Jul.

Article in English | MEDLINE | ID: mdl-37067963

ABSTRACT

In recent years, several deep learning models have been proposed to accurately quantify and diagnose cardiac pathologies. These automated tools heavily rely on the accurate segmentation of cardiac structures in MRI images. However, segmentation of the right ventricle is challenging due to its highly complex shape and ill-defined borders. Hence, there is a need for new methods to handle such structure's geometrical and textural complexities, notably in the presence of pathologies such as Dilated Right Ventricle, Tricuspid Regurgitation, Arrhythmogenesis, Tetralogy of Fallot, and Inter-atrial Communication. The last MICCAI challenge on right ventricle segmentation was held in 2012 and included only 48 cases from a single clinical center. As part of the 12th Workshop on Statistical Atlases and Computational Models of the Heart (STACOM 2021), the M&Ms-2 challenge was organized to promote the interest of the research community around right ventricle segmentation in multi-disease, multi-view, and multi-center cardiac MRI. Three hundred sixty CMR cases, including short-axis and long-axis 4-chamber views, were collected from three Spanish hospitals using nine different scanners from three different vendors, and included a diverse set of right and left ventricle pathologies. The solutions provided by the participants show that nnU-Net achieved the best results overall. However, multi-view approaches were able to capture additional information, highlighting the need to integrate multiple cardiac diseases, views, scanners, and acquisition protocols to produce reliable automatic cardiac segmentation algorithms.

Subject(s)

Deep Learning , Heart Ventricles , Humans , Heart Ventricles/diagnostic imaging , Magnetic Resonance Imaging/methods , Algorithms , Heart Atria

14.

GrabCut-based human segmentation in video sequences.

Hernández-Vela, Antonio; Reyes, Miguel; Ponce, Víctor; Escalera, Sergio.

Sensors (Basel) ; 12(11): 15376-93, 2012 Nov 09.

Article in English | MEDLINE | ID: mdl-23202215

ABSTRACT

In this paper, we present a fully-automatic Spatio-Temporal GrabCut human segmentation methodology that combines tracking and segmentation. GrabCut initialization is performed by a HOG-based subject detection, face detection, and skin color model. Spatial information is included by Mean Shift clustering whereas temporal coherence is considered by the historical of Gaussian Mixture Models. Moreover, full face and pose recovery is obtained by combining human segmentation with Active Appearance Models and Conditional Random Fields. Results over public datasets and in a new Human Limb dataset show a robust segmentation and recovery of both face and pose using the presented methodology.

Subject(s)

Face , Pattern Recognition, Automated , Video Recording/methods , Automation , Extremities , Humans , Models, Theoretical , Skin Pigmentation

15.

Social network extraction and analysis based on multimodal dyadic interaction.

Escalera, Sergio; Baró, Xavier; Vitrià, Jordi; Radeva, Petia; Raducanu, Bogdan.

Sensors (Basel) ; 12(2): 1702-19, 2012.

Article in English | MEDLINE | ID: mdl-22438733

ABSTRACT

Social interactions are a very important component in people's lives. Social network analysis has become a common technique used to model and quantify the properties of social interactions. In this paper, we propose an integrated framework to explore the characteristics of a social network extracted from multimodal dyadic interactions. For our study, we used a set of videos belonging to New York Times' Blogging Heads opinion blog. The Social Network is represented as an oriented graph, whose directed links are determined by the Influence Model. The links' weights are a measure of the "influence" a person has over the other. The states of the Influence Model encode automatically extracted audio/visual features from our videos using state-of-the art algorithms. Our results are reported in terms of accuracy of audio/visual data fusion for speaker segmentation and centrality measures used to characterize the extracted social network.

Subject(s)

Algorithms , Interpersonal Relations , Multimedia , Social Support , User-Computer Interface

16.

Deep learning with self-supervision and uncertainty regularization to count fish in underwater images.

Tarling, Penny; Cantor, Mauricio; Clapés, Albert; Escalera, Sergio.

PLoS One ; 17(5): e0267759, 2022.

Article in English | MEDLINE | ID: mdl-35507631

ABSTRACT

Effective conservation actions require effective population monitoring. However, accurately counting animals in the wild to inform conservation decision-making is difficult. Monitoring populations through image sampling has made data collection cheaper, wide-reaching and less intrusive but created a need to process and analyse this data efficiently. Counting animals from such data is challenging, particularly when densely packed in noisy images. Attempting this manually is slow and expensive, while traditional computer vision methods are limited in their generalisability. Deep learning is the state-of-the-art method for many computer vision tasks, but it has yet to be properly explored to count animals. To this end, we employ deep learning, with a density-based regression approach, to count fish in low-resolution sonar images. We introduce a large dataset of sonar videos, deployed to record wild Lebranche mullet schools (Mugil liza), with a subset of 500 labelled images. We utilise abundant unlabelled data in a self-supervised task to improve the supervised counting task. For the first time in this context, by introducing uncertainty quantification, we improve model training and provide an accompanying measure of prediction uncertainty for more informed biological decision-making. Finally, we demonstrate the generalisability of our proposed counting framework through testing it on a recent benchmark dataset of high-resolution annotated underwater images from varying habitats (DeepFish). From experiments on both contrasting datasets, we demonstrate our network outperforms the few other deep learning models implemented for solving this task. By providing an open-source framework along with training data, our study puts forth an efficient deep learning template for crowd counting aquatic animals thereby contributing effective methods to assess natural populations from the ever-increasing visual data.

Subject(s)

Deep Learning , Animals , Benchmarking , Ecosystem , Fishes , Uncertainty

17.

ChaLearn Looking at People: IsoGD and ConGD Large-Scale RGB-D Gesture Recognition.

Wan, Jun; Lin, Chi; Wen, Longyin; Li, Yunan; Miao, Qiguang; Escalera, Sergio; Anbarjafari, Gholamreza; Guyon, Isabelle; Guo, Guodong; Li, Stan Z.

IEEE Trans Cybern ; 52(5): 3422-3433, 2022 May.

Article in English | MEDLINE | ID: mdl-32816685

ABSTRACT

The ChaLearn large-scale gesture recognition challenge has run twice in two workshops in conjunction with the International Conference on Pattern Recognition (ICPR) 2016 and International Conference on Computer Vision (ICCV) 2017, attracting more than 200 teams around the world. This challenge has two tracks, focusing on isolated and continuous gesture recognition, respectively. It describes the creation of both benchmark datasets and analyzes the advances in large-scale gesture recognition based on these two datasets. In this article, we discuss the challenges of collecting large-scale ground-truth annotations of gesture recognition and provide a detailed analysis of the current methods for large-scale isolated and continuous gesture recognition. In addition to the recognition rate and mean Jaccard index (MJI) as evaluation metrics used in previous challenges, we introduce the corrected segmentation rate (CSR) metric to evaluate the performance of temporal segmentation for continuous gesture recognition. Furthermore, we propose a bidirectional long short-term memory (Bi-LSTM) method, determining video division points based on skeleton points. Experiments show that the proposed Bi-LSTM outperforms state-of-the-art methods with an absolute improvement of 8.1% (from 0.8917 to 0.9639) of CSR.

Subject(s)

Gestures , Pattern Recognition, Automated , Algorithms , Humans , Pattern Recognition, Automated/methods

18.

Codabench: Flexible, easy-to-use, and reproducible meta-benchmark platform.

Xu, Zhen; Escalera, Sergio; Pavão, Adrien; Richard, Magali; Tu, Wei-Wei; Yao, Quanming; Zhao, Huan; Guyon, Isabelle.

Patterns (N Y) ; 3(7): 100543, 2022 Jul 08.

Article in English | MEDLINE | ID: mdl-35845844

ABSTRACT

Obtaining a standardized benchmark of computational methods is a major issue in data-science communities. Dedicated frameworks enabling fair benchmarking in a unified environment are yet to be developed. Here, we introduce Codabench, a meta-benchmark platform that is open sourced and community driven for benchmarking algorithms or software agents versus datasets or tasks. A public instance of Codabench is open to everyone free of charge and allows benchmark organizers to fairly compare submissions under the same setting (software, hardware, data, algorithms), with custom protocols and data formats. Codabench has unique features facilitating easy organization of flexible and reproducible benchmarks, such as the possibility of reusing templates of benchmarks and supplying compute resources on demand. Codabench has been used internally and externally on various applications, receiving more than 130 users and 2,500 submissions. As illustrative use cases, we introduce four diverse benchmarks covering graph machine learning, cancer heterogeneity, clinical diagnosis, and reinforcement learning.

19.

Minimising multi-centre radiomics variability through image normalisation: a pilot study.

Campello, Víctor M; Martín-Isla, Carlos; Izquierdo, Cristian; Guala, Andrea; Palomares, José F Rodríguez; Viladés, David; Descalzo, Martín L; Karakas, Mahir; Çavus, Ersin; Raisi-Estabragh, Zahra; Petersen, Steffen E; Escalera, Sergio; Seguí, Santi; Lekadir, Karim.

Sci Rep ; 12(1): 12532, 2022 07 22.

Article in English | MEDLINE | ID: mdl-35869125

ABSTRACT

Radiomics is an emerging technique for the quantification of imaging data that has recently shown great promise for deeper phenotyping of cardiovascular disease. Thus far, the technique has been mostly applied in single-centre studies. However, one of the main difficulties in multi-centre imaging studies is the inherent variability of image characteristics due to centre differences. In this paper, a comprehensive analysis of radiomics variability under several image- and feature-based normalisation techniques was conducted using a multi-centre cardiovascular magnetic resonance dataset. 218 subjects divided into healthy (n = 112) and hypertrophic cardiomyopathy (n = 106, HCM) groups from five different centres were considered. First and second order texture radiomic features were extracted from three regions of interest, namely the left and right ventricular cavities and the left ventricular myocardium. Two methods were used to assess features' variability. First, feature distributions were compared across centres to obtain a distribution similarity index. Second, two classification tasks were proposed to assess: (1) the amount of centre-related information encoded in normalised features (centre identification) and (2) the generalisation ability for a classification model when trained on these features (healthy versus HCM classification). The results showed that the feature-based harmonisation technique ComBat is able to remove the variability introduced by centre information from radiomic features, at the expense of slightly degrading classification performance. Piecewise linear histogram matching normalisation gave features with greater generalisation ability for classification ( balanced accuracy in between 0.78 ± 0.08 and 0.79 ± 0.09). Models trained with features from images without normalisation showed the worst performance overall ( balanced accuracy in between 0.45 ± 0.28 and 0.60 ± 0.22). In conclusion, centre-related information removal did not imply good generalisation ability for classification.

Subject(s)

Cardiomyopathy, Hypertrophic , Magnetic Resonance Imaging , Cardiomyopathy, Hypertrophic/diagnostic imaging , Humans , Magnetic Resonance Imaging/methods , Pilot Projects

20.

A fully-automatic caudate nucleus segmentation of brain MRI: application in volumetric analysis of pediatric attention-deficit/hyperactivity disorder.

Igual, Laura; Soliva, Joan Carles; Hernández-Vela, Antonio; Escalera, Sergio; Jiménez, Xavier; Vilarroya, Oscar; Radeva, Petia.

Biomed Eng Online ; 10: 105, 2011 Dec 05.

Article in English | MEDLINE | ID: mdl-22141926

ABSTRACT

BACKGROUND: Accurate automatic segmentation of the caudate nucleus in magnetic resonance images (MRI) of the brain is of great interest in the analysis of developmental disorders. Segmentation methods based on a single atlas or on multiple atlases have been shown to suitably localize caudate structure. However, the atlas prior information may not represent the structure of interest correctly. It may therefore be useful to introduce a more flexible technique for accurate segmentations. METHOD: We present CaudateCut: a new fully-automatic method of segmenting the caudate nucleus in MRI. CaudateCut combines an atlas-based segmentation strategy with the Graph Cut energy-minimization framework. We adapt the Graph Cut model to make it suitable for segmenting small, low-contrast structures, such as the caudate nucleus, by defining new energy function data and boundary potentials. In particular, we exploit information concerning the intensity and geometry, and we add supervised energies based on contextual brain structures. Furthermore, we reinforce boundary detection using a new multi-scale edgeness measure. RESULTS: We apply the novel CaudateCut method to the segmentation of the caudate nucleus to a new set of 39 pediatric attention-deficit/hyperactivity disorder (ADHD) patients and 40 control children, as well as to a public database of 18 subjects. We evaluate the quality of the segmentation using several volumetric and voxel by voxel measures. Our results show improved performance in terms of segmentation compared to state-of-the-art approaches, obtaining a mean overlap of 80.75%. Moreover, we present a quantitative volumetric analysis of caudate abnormalities in pediatric ADHD, the results of which show strong correlation with expert manual analysis. CONCLUSION: CaudateCut generates segmentation results that are comparable to gold-standard segmentations and which are reliable in the analysis of differentiating neuroanatomical abnormalities between healthy controls and pediatric ADHD.

Subject(s)

Attention Deficit Disorder with Hyperactivity/pathology , Brain/pathology , Caudate Nucleus/pathology , Magnetic Resonance Imaging/methods , Adolescent , Attention Deficit Disorder with Hyperactivity/diagnosis , Brain/anatomy & histology , Child , Female , Humans , Image Interpretation, Computer-Assisted/methods , Male , Models, Theoretical , Reproducibility of Results

ABSTRACT

Subject(s)

ABSTRACT

ABSTRACT

ABSTRACT

ABSTRACT

ABSTRACT

Subject(s)

ABSTRACT

ABSTRACT

ABSTRACT

ABSTRACT

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

SEND TO:

SELECTION OF CITATIONS

SEARCH DETAIL