Results 1 - 20 of 37
1.
Front Cardiovasc Med ; 10: 1141026, 2023.
Article in English | MEDLINE | ID: mdl-37781298

ABSTRACT

Objectives: To assess the feasibility of extracting radiomics signal-intensity-based features from the myocardium using cardiovascular magnetic resonance (CMR) stress perfusion sequences, and to compare the diagnostic performance of radiomics models against standard-of-care qualitative visual assessment of stress perfusion images, with the ground-truth stenosis label defined by invasive fractional flow reserve (FFR) and quantitative coronary angiography. Methods: We used the Dan-NICAD 1 dataset, a multi-centre study with coronary computed tomography angiography, 1.5 T CMR stress perfusion, and invasive FFR available for a subset of 148 patients with suspected coronary artery disease. Image segmentation was performed by two independent readers. We used the Pyradiomics platform to extract radiomics first-order (n = 14) and texture (n = 75) features from the LV myocardium (basal, mid, apical) in rest and stress perfusion images. Results: Overall, 92 patients (mean age 62 years, 56 men) were included in the study, 39 with positive FFR. We double cross-validated the model and, in each inner fold, trained and validated a per-territory model. Conventional visual analysis yielded a sensitivity of 41% and a specificity of 84%. Our final radiomics model improved on these results, with an average sensitivity of 53% and a specificity of 86%. Conclusion: In this proof-of-concept study on the Dan-NICAD dataset, we demonstrate the feasibility of radiomics analysis applied to CMR perfusion images, with a suggestion of superior diagnostic performance of radiomics models over conventional visual analysis in detecting perfusion defects defined by invasive coronary angiography.
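The first-order features mentioned above are simple statistics of the masked intensity distribution. The sketch below is not the Pyradiomics pipeline used in the study; it is a minimal NumPy illustration of a few representative first-order statistics (mean, standard deviation, histogram entropy, skewness) over a masked myocardial region, with all names and the synthetic data being our own.

```python
import numpy as np

def first_order_features(image, mask):
    """Illustrative first-order statistics over a masked region (not the PyRadiomics definitions)."""
    voxels = image[mask > 0].astype(float)
    mean = voxels.mean()
    std = voxels.std()
    hist, _ = np.histogram(voxels, bins=32)
    p = hist / hist.sum()
    p = p[p > 0]
    entropy = -np.sum(p * np.log2(p))        # Shannon entropy of the discretised histogram
    skewness = float(np.mean(((voxels - mean) / std) ** 3)) if std > 0 else 0.0
    return {"mean": mean, "std": std, "entropy": entropy, "skewness": skewness}

# Toy example: a synthetic slice with a circular "myocardium" mask
rng = np.random.default_rng(0)
img = rng.normal(100.0, 10.0, size=(64, 64))
yy, xx = np.mgrid[:64, :64]
myo = (yy - 32) ** 2 + (xx - 32) ** 2 < 200
feats = first_order_features(img, myo)
```

In the study, such per-segment feature vectors (basal, mid, apical; rest and stress) feed the downstream classification model.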

2.
IEEE Trans Pattern Anal Mach Intell ; 45(11): 12922-12943, 2023 Nov.
Article in English | MEDLINE | ID: mdl-37022830

ABSTRACT

Transformer models have shown great success handling long-range interactions, making them a promising tool for modeling video. However, they lack inductive biases and scale quadratically with input length. These limitations are further exacerbated when dealing with the high dimensionality introduced by the temporal dimension. While there are surveys analyzing the advances of Transformers for vision, none focus on an in-depth analysis of video-specific designs. In this survey, we analyze the main contributions and trends of works leveraging Transformers to model video. Specifically, we first examine how videos are handled at the input level. Then, we study the architectural changes made to deal with video more efficiently, reduce redundancy, re-introduce useful inductive biases, and capture long-term temporal dynamics. In addition, we provide an overview of different training regimes and explore effective self-supervised learning strategies for video. Finally, we conduct a performance comparison on the most common benchmark for Video Transformers (i.e., action classification), finding that they outperform 3D ConvNets even at lower computational cost.
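The quadratic scaling mentioned above is easy to see in a minimal sketch of vanilla scaled dot-product self-attention (single head, no learned projections, NumPy only; the token counts are illustrative, not taken from any surveyed model):

```python
import numpy as np

def self_attention(x):
    """Vanilla scaled dot-product self-attention: the (N, N) score matrix scales quadratically in N."""
    n, d = x.shape
    scores = x @ x.T / np.sqrt(d)                        # (N, N) pairwise token interactions
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)                   # row-wise softmax
    return w @ x

# A short clip tokenised as 8 frames x 14 x 14 patches = 1568 tokens:
# the attention map alone is 1568 x 1568, which is why video inputs are costly.
tokens = np.random.default_rng(1).normal(size=(8 * 14 * 14, 16))
out = self_attention(tokens)
```

Doubling the clip length doubles the token count and quadruples the size of the score matrix, which motivates the efficiency-oriented designs surveyed in the paper.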

3.
Med Image Anal ; 87: 102808, 2023 07.
Article in English | MEDLINE | ID: mdl-37087838

ABSTRACT

Assessment of myocardial viability is essential in the diagnosis and treatment management of patients suffering from myocardial infarction, and classification of pathology on the myocardium is key to this assessment. This work defines a new medical image analysis task: myocardial pathology segmentation (MyoPS) combining three-sequence cardiac magnetic resonance (CMR) images, first proposed in the MyoPS challenge held in conjunction with MICCAI 2020. In this paper, MyoPS refers to both the myocardial pathology segmentation task and the challenge. The challenge provided 45 paired and pre-aligned CMR images, allowing algorithms to combine the complementary information from the three CMR sequences for pathology segmentation. In this article, we provide details of the challenge, survey the works of the fifteen participants and interpret their methods according to five aspects: preprocessing, data augmentation, learning strategy, model architecture and post-processing. In addition, we analyze the results with respect to different factors, in order to examine the key obstacles, explore the potential of solutions, and provide a benchmark for future research. The average Dice scores of submitted algorithms were 0.614±0.231 and 0.644±0.153 for myocardial scars and edema, respectively. We conclude that while promising results have been reported, the research is still at an early stage, and more in-depth exploration is needed before successful clinical application. The MyoPS data and evaluation tool remain publicly available upon registration via the challenge homepage (www.sdspeople.fudan.edu.cn/zhuangxiahai/0/myops20/).
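The Dice scores reported above can be computed with a short sketch (our own minimal implementation, not the challenge's official evaluation tool):

```python
import numpy as np

def dice(pred, gt):
    """Dice similarity coefficient between two binary masks: 2|A∩B| / (|A|+|B|)."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    denom = pred.sum() + gt.sum()
    return 2.0 * inter / denom if denom else 1.0

# Toy 2x3 masks: 2 overlapping pixels, 3 foreground pixels in each mask
pred = np.array([[1, 1, 0], [0, 1, 0]])
gt = np.array([[1, 0, 0], [0, 1, 1]])
score = dice(pred, gt)   # 2*2 / (3+3) = 0.667
```

The challenge computes this per structure (scar, edema) and averages across cases.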


Subject(s)
Benchmarking , Image Processing, Computer-Assisted , Humans , Image Processing, Computer-Assisted/methods , Heart/diagnostic imaging , Myocardium/pathology , Magnetic Resonance Imaging/methods
4.
IEEE J Biomed Health Inform ; 27(7): 3302-3313, 2023 Jul.
Article in English | MEDLINE | ID: mdl-37067963

ABSTRACT

In recent years, several deep learning models have been proposed to accurately quantify and diagnose cardiac pathologies. These automated tools rely heavily on the accurate segmentation of cardiac structures in MRI images. However, segmentation of the right ventricle is challenging due to its highly complex shape and ill-defined borders. Hence, there is a need for new methods to handle this structure's geometric and textural complexity, notably in the presence of pathologies such as dilated right ventricle, tricuspid regurgitation, arrhythmogenesis, tetralogy of Fallot, and inter-atrial communication. The last MICCAI challenge on right ventricle segmentation was held in 2012 and included only 48 cases from a single clinical center. As part of the 12th Workshop on Statistical Atlases and Computational Models of the Heart (STACOM 2021), the M&Ms-2 challenge was organized to promote the interest of the research community in right ventricle segmentation in multi-disease, multi-view, and multi-center cardiac MRI. Three hundred sixty CMR cases, including short-axis and long-axis 4-chamber views, were collected from three Spanish hospitals using nine different scanners from three different vendors, and included a diverse set of right and left ventricle pathologies. The solutions provided by the participants show that nnU-Net achieved the best results overall. However, multi-view approaches were able to capture additional information, highlighting the need to integrate multiple cardiac diseases, views, scanners, and acquisition protocols to produce reliable automatic cardiac segmentation algorithms.


Subject(s)
Deep Learning , Heart Ventricles , Humans , Heart Ventricles/diagnostic imaging , Magnetic Resonance Imaging/methods , Algorithms , Heart Atria
5.
IEEE Trans Pattern Anal Mach Intell ; 45(9): 10913-10928, 2023 Sep.
Article in English | MEDLINE | ID: mdl-37074899

ABSTRACT

Convolutional Neural Networks are the de facto models for image recognition. However, 3D CNNs, the straightforward extension of 2D CNNs to video recognition, have not achieved the same success on standard action recognition benchmarks. One of the main reasons for this reduced performance is the increased computational complexity of 3D CNNs, which requires large-scale annotated datasets for training. 3D kernel factorization approaches have been proposed to reduce this complexity, but existing approaches follow hand-designed and hard-wired techniques. In this paper, we propose Gate-Shift-Fuse (GSF), a novel spatio-temporal feature extraction module which controls interactions in spatio-temporal decomposition and learns to adaptively route features through time and combine them in a data-dependent manner. GSF leverages grouped spatial gating to decompose the input tensor and channel weighting to fuse the decomposed tensors. GSF can be inserted into existing 2D CNNs to convert them into efficient, high-performing spatio-temporal feature extractors with negligible parameter and compute overhead. We perform an extensive analysis of GSF using two popular 2D CNN families and achieve state-of-the-art or competitive performance on five standard action recognition benchmarks.
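GSF itself learns its routing; as a simpler fixed-routing point of reference, the hand-designed shift-based factorization it improves on can be sketched as a temporal channel shift (a TSM-style operation, not GSF; all shapes and the fold fraction are illustrative):

```python
import numpy as np

def temporal_shift(x, fold_div=8):
    """Shift a fraction of channels one step forward/backward in time (zero-padded at the ends)."""
    t, c, h, w = x.shape
    fold = c // fold_div
    out = np.zeros_like(x)
    out[1:, :fold] = x[:-1, :fold]                   # these channels look one frame back
    out[:-1, fold:2 * fold] = x[1:, fold:2 * fold]   # these channels look one frame ahead
    out[:, 2 * fold:] = x[:, 2 * fold:]              # the rest stay purely spatial
    return out

clip = np.random.default_rng(2).normal(size=(8, 16, 4, 4))   # (T, C, H, W) feature map
shifted = temporal_shift(clip)
```

Where this hard-wires which channels move and by how much, GSF replaces the fixed assignment with learned spatial gating and a channel-weighted fusion of the decomposed tensors.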

6.
IEEE Trans Pattern Anal Mach Intell ; 45(6): 6674-6687, 2023 Jun.
Article in English | MEDLINE | ID: mdl-33571086

ABSTRACT

We present EgoACO, a deep neural architecture for video action recognition that learns to pool action-context-object descriptors from frame level features by leveraging the verb-noun structure of action labels in egocentric video datasets. The core component is class activation pooling (CAP), a differentiable pooling layer that combines ideas from bilinear pooling for fine-grained recognition and from feature learning for discriminative localization. CAP uses self-attention with a dictionary of learnable weights to pool from the most relevant feature regions. Through CAP, EgoACO learns to decode object and scene context descriptors from video frame features. For temporal modeling we design a recurrent version of class activation pooling termed Long Short-Term Attention (LSTA). LSTA extends convolutional gated LSTM with built-in spatial attention and a re-designed output gate. Action, object and context descriptors are fused by a multi-head prediction that accounts for the inter-dependencies between noun-verb-action structured labels in egocentric video datasets. EgoACO features built-in visual explanations, helping learning and interpretation of discriminative information in video. Results on the two largest egocentric action recognition datasets currently available, EPIC-KITCHENS and EGTEA Gaze+, show that by decoding action-context-object descriptors, the model achieves state-of-the-art recognition performance.
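The attention-based pooling at the core of CAP can be illustrated with a stripped-down sketch: softmax attention pooling of spatial features against a single query vector. CAP uses a dictionary of learnable weights and produces multiple descriptors; this single-query NumPy version is only a simplified analogue.

```python
import numpy as np

def attention_pool(features, query):
    """Pool (N, D) spatial features into one (D,) descriptor via softmax attention to a query."""
    scores = features @ query          # relevance of each spatial location to the query
    w = np.exp(scores - scores.max())
    w /= w.sum()                       # attention weights over locations sum to 1
    return w @ features                # weighted average of the feature vectors

rng = np.random.default_rng(3)
feats = rng.normal(size=(49, 32))      # e.g. a 7x7 frame-level feature map, flattened
query = rng.normal(size=32)            # stands in for one learnable dictionary entry
descriptor = attention_pool(feats, query)
```

In EgoACO, separate pooled descriptors of this kind (action, object, scene context) are fused by the multi-head prediction layer.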

7.
Med Image Anal ; 83: 102628, 2023 01.
Article in English | MEDLINE | ID: mdl-36283200

ABSTRACT

Domain Adaptation (DA) has recently been of strong interest in the medical imaging community. While a large variety of DA techniques have been proposed for image segmentation, most of these techniques have been validated either on private datasets or on small publicly available datasets. Moreover, these datasets mostly addressed single-class problems. To tackle these limitations, the Cross-Modality Domain Adaptation (crossMoDA) challenge was organised in conjunction with the 24th International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI 2021). CrossMoDA is the first large and multi-class benchmark for unsupervised cross-modality Domain Adaptation. The goal of the challenge is to segment two key brain structures involved in the follow-up and treatment planning of vestibular schwannoma (VS): the VS and the cochleas. Currently, the diagnosis and surveillance in patients with VS are commonly performed using contrast-enhanced T1 (ceT1) MR imaging. However, there is growing interest in using non-contrast imaging sequences such as high-resolution T2 (hrT2) imaging. For this reason, we established an unsupervised cross-modality segmentation benchmark. The training dataset provides annotated ceT1 scans (N=105) and unpaired non-annotated hrT2 scans (N=105). The aim was to automatically perform unilateral VS and bilateral cochlea segmentation on hrT2 scans as provided in the testing set (N=137). This problem is particularly challenging given the large intensity distribution gap across the modalities and the small volume of the structures. A total of 55 teams from 16 countries submitted predictions to the validation leaderboard. Among them, 16 teams from 9 different countries submitted their algorithm for the evaluation phase. The level of performance reached by the top-performing teams is strikingly high (best median Dice score - VS: 88.4%; Cochleas: 85.7%) and close to full supervision (median Dice score - VS: 92.5%; Cochleas: 87.7%). 
All top-performing methods made use of an image-to-image translation approach to transform the source-domain images into pseudo-target-domain images. A segmentation network was then trained using these generated images and the manual annotations provided for the source image.


Subject(s)
Neuroma, Acoustic , Humans , Neuroma, Acoustic/diagnostic imaging
8.
Sci Rep ; 12(1): 12532, 2022 07 22.
Article in English | MEDLINE | ID: mdl-35869125

ABSTRACT

Radiomics is an emerging technique for the quantification of imaging data that has recently shown great promise for deeper phenotyping of cardiovascular disease. Thus far, the technique has mostly been applied in single-centre studies. However, one of the main difficulties in multi-centre imaging studies is the inherent variability of image characteristics due to centre differences. In this paper, a comprehensive analysis of radiomics variability under several image- and feature-based normalisation techniques was conducted using a multi-centre cardiovascular magnetic resonance dataset. 218 subjects divided into healthy (n = 112) and hypertrophic cardiomyopathy (n = 106, HCM) groups from five different centres were considered. First- and second-order texture radiomic features were extracted from three regions of interest, namely the left and right ventricular cavities and the left ventricular myocardium. Two methods were used to assess feature variability. First, feature distributions were compared across centres to obtain a distribution similarity index. Second, two classification tasks were proposed to assess: (1) the amount of centre-related information encoded in normalised features (centre identification) and (2) the generalisation ability of a classification model trained on these features (healthy versus HCM classification). The results showed that the feature-based harmonisation technique ComBat is able to remove the variability introduced by centre information from radiomic features, at the expense of slightly degrading classification performance. Piecewise linear histogram matching normalisation gave features with greater generalisation ability for classification (balanced accuracy between 0.78 ± 0.08 and 0.79 ± 0.09). Models trained with features from images without normalisation showed the worst performance overall (balanced accuracy between 0.45 ± 0.28 and 0.60 ± 0.22). In conclusion, removing centre-related information did not in itself guarantee good generalisation ability for classification.
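The piecewise linear histogram matching normalisation discussed above can be sketched as percentile-to-percentile interpolation (a minimal NumPy version; the number of anchor points and the synthetic intensity distributions are our own choices, not the study's settings):

```python
import numpy as np

def piecewise_histogram_match(source, reference, n_points=11):
    """Map source intensities so their percentiles align with the reference's (piecewise linear)."""
    qs = np.linspace(0, 100, n_points)
    src_p = np.percentile(source, qs)    # anchor points in the source distribution
    ref_p = np.percentile(reference, qs) # corresponding anchors in the reference
    return np.interp(source, src_p, ref_p)

rng = np.random.default_rng(4)
centre_a = rng.normal(50.0, 5.0, size=10_000)     # intensities from one centre
centre_b = rng.normal(100.0, 20.0, size=10_000)   # reference intensity distribution
matched = piecewise_histogram_match(centre_a, centre_b)
```

After mapping, radiomic features are extracted from the intensity-aligned images rather than the raw ones.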


Subject(s)
Cardiomyopathy, Hypertrophic , Magnetic Resonance Imaging , Cardiomyopathy, Hypertrophic/diagnostic imaging , Humans , Magnetic Resonance Imaging/methods , Pilot Projects
9.
Patterns (N Y) ; 3(7): 100543, 2022 Jul 08.
Article in English | MEDLINE | ID: mdl-35845844

ABSTRACT

Obtaining a standardized benchmark of computational methods is a major issue in data-science communities. Dedicated frameworks enabling fair benchmarking in a unified environment are yet to be developed. Here, we introduce Codabench, an open-sourced, community-driven meta-benchmark platform for benchmarking algorithms or software agents versus datasets or tasks. A public instance of Codabench is open to everyone free of charge and allows benchmark organizers to fairly compare submissions under the same setting (software, hardware, data, algorithms), with custom protocols and data formats. Codabench has unique features facilitating the easy organization of flexible and reproducible benchmarks, such as the possibility of reusing benchmark templates and supplying compute resources on demand. Codabench has been used internally and externally on various applications, serving more than 130 users and receiving more than 2,500 submissions. As illustrative use cases, we introduce four diverse benchmarks covering graph machine learning, cancer heterogeneity, clinical diagnosis, and reinforcement learning.

10.
PLoS One ; 17(5): e0267759, 2022.
Article in English | MEDLINE | ID: mdl-35507631

ABSTRACT

Effective conservation actions require effective population monitoring. However, accurately counting animals in the wild to inform conservation decision-making is difficult. Monitoring populations through image sampling has made data collection cheaper, wide-reaching and less intrusive, but has created a need to process and analyse this data efficiently. Counting animals from such data is challenging, particularly when they are densely packed in noisy images. Attempting this manually is slow and expensive, while traditional computer vision methods are limited in their generalisability. Deep learning is the state-of-the-art method for many computer vision tasks, but it has yet to be properly explored for counting animals. To this end, we employ deep learning, with a density-based regression approach, to count fish in low-resolution sonar images. We introduce a large dataset of sonar videos, deployed to record wild Lebranche mullet schools (Mugil liza), with a subset of 500 labelled images. We utilise abundant unlabelled data in a self-supervised task to improve the supervised counting task. For the first time in this context, by introducing uncertainty quantification, we improve model training and provide an accompanying measure of prediction uncertainty for more informed biological decision-making. Finally, we demonstrate the generalisability of our proposed counting framework by testing it on a recent benchmark dataset of high-resolution annotated underwater images from varying habitats (DeepFish). From experiments on both contrasting datasets, we demonstrate that our network outperforms the few other deep learning models implemented for solving this task. By providing an open-source framework along with training data, our study puts forth an efficient deep learning template for crowd counting aquatic animals, thereby contributing effective methods to assess natural populations from the ever-increasing visual data.
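The density-based regression approach rests on a simple identity: the target density map integrates to the animal count. A sketch of target-map construction (hypothetical point annotations; the map size and sigma are illustrative, and this is not the paper's exact pipeline):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def density_target(points, shape, sigma=3.0):
    """Gaussian density map whose integral equals the number of annotated animals."""
    density = np.zeros(shape, dtype=float)
    for y, x in points:
        density[y, x] += 1.0                       # one unit of mass per annotation
    return gaussian_filter(density, sigma=sigma, mode='constant')

points = [(20, 30), (40, 45), (25, 50)]            # hypothetical fish annotations
density = density_target(points, (64, 80))
count = density.sum()                              # integrates back to ~3
```

A network trained to regress such maps yields a count by summing its output, which is what makes the approach robust to densely packed, overlapping animals.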


Subject(s)
Deep Learning , Animals , Benchmarking , Ecosystem , Fishes , Uncertainty
11.
IEEE Trans Cybern ; 52(5): 3422-3433, 2022 May.
Article in English | MEDLINE | ID: mdl-32816685

ABSTRACT

The ChaLearn large-scale gesture recognition challenge has run twice, in two workshops held in conjunction with the International Conference on Pattern Recognition (ICPR) 2016 and the International Conference on Computer Vision (ICCV) 2017, attracting more than 200 teams around the world. The challenge has two tracks, focusing on isolated and continuous gesture recognition, respectively. This article describes the creation of both benchmark datasets and analyzes the advances in large-scale gesture recognition based on these two datasets. We discuss the challenges of collecting large-scale ground-truth annotations for gesture recognition and provide a detailed analysis of the current methods for large-scale isolated and continuous gesture recognition. In addition to the recognition rate and mean Jaccard index (MJI) used as evaluation metrics in previous challenges, we introduce the corrected segmentation rate (CSR) metric to evaluate the performance of temporal segmentation for continuous gesture recognition. Furthermore, we propose a bidirectional long short-term memory (Bi-LSTM) method that determines video division points based on skeleton points. Experiments show that the proposed Bi-LSTM outperforms state-of-the-art methods, with an absolute improvement in CSR of 8.1% (from 0.8917 to 0.9639).
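The Jaccard-style overlap underlying the MJI can be illustrated for a single predicted gesture interval (a simplified per-interval version; the challenge metric aggregates over all gesture instances and classes):

```python
def interval_jaccard(pred, gt):
    """Jaccard index of two temporal intervals given as (start, end) frame pairs."""
    inter = max(0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union else 0.0

# Predicted gesture spans frames 10-30, ground truth spans 20-40:
# overlap is 10 frames, union is 30 frames.
score = interval_jaccard((10, 30), (20, 40))   # 1/3
```

A segmentation-then-recognition pipeline, as in the continuous track, is scored by how well its predicted intervals overlap the annotated ones.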


Subject(s)
Gestures , Pattern Recognition, Automated , Algorithms , Humans , Pattern Recognition, Automated/methods
12.
BMC Bioinformatics ; 22(1): 473, 2021 Oct 02.
Article in English | MEDLINE | ID: mdl-34600479

ABSTRACT

BACKGROUND: Quantification of tumor heterogeneity is essential to better understand cancer progression and to adapt therapeutic treatments to patient specificities. Bioinformatic tools to assess the different cell populations from single-omic datasets, such as bulk transcriptome or methylome samples, have recently been developed, including reference-based and reference-free methods. Improved methods using multi-omic datasets are yet to be developed, and the community needs systematic tools to perform a comparative evaluation of these algorithms on controlled data. RESULTS: We present DECONbench, a standardized, unbiased benchmarking resource applied to the evaluation of computational methods quantifying cell-type heterogeneity in cancer. DECONbench includes gold-standard simulated benchmark datasets, consisting of transcriptome and methylome profiles mimicking pancreatic adenocarcinoma molecular heterogeneity, and a set of baseline deconvolution methods (reference-free algorithms inferring cell-type proportions). DECONbench performs a systematic performance evaluation of each new methodological contribution and provides the possibility to publicly share source code and scoring. CONCLUSION: DECONbench allows continuous submission of new methods in a user-friendly fashion, with each novel contribution automatically compared to the reference baseline methods, enabling crowdsourced benchmarking. DECONbench is designed to serve as a reference platform for benchmarking deconvolution methods in the evaluation of cancer heterogeneity. We believe it will help improve benchmarking practices in the biomedical and life science communities. DECONbench is hosted on the open-source Codalab competition platform and is freely available at: https://competitions.codalab.org/competitions/27453.


Subject(s)
Adenocarcinoma , Pancreatic Neoplasms , Algorithms , Benchmarking , Computational Biology , Humans , Pancreatic Neoplasms/genetics
13.
IEEE Trans Med Imaging ; 40(12): 3543-3554, 2021 12.
Article in English | MEDLINE | ID: mdl-34138702

ABSTRACT

The emergence of deep learning has considerably advanced the state-of-the-art in cardiac magnetic resonance (CMR) segmentation. Many techniques have been proposed over the last few years, bringing the accuracy of automated segmentation close to human performance. However, these models have been all too often trained and validated using cardiac imaging samples from single clinical centres or homogeneous imaging protocols. This has prevented the development and validation of models that are generalizable across different clinical centres, imaging conditions or scanner vendors. To promote further research and scientific benchmarking in the field of generalizable deep learning for cardiac segmentation, this paper presents the results of the Multi-Centre, Multi-Vendor and Multi-Disease Cardiac Segmentation (M&Ms) Challenge, which was recently organized as part of the MICCAI 2020 Conference. A total of 14 teams submitted different solutions to the problem, combining various baseline models, data augmentation strategies, and domain adaptation techniques. The obtained results indicate the importance of intensity-driven data augmentation, as well as the need for further research to improve generalizability towards unseen scanner vendors or new imaging protocols. Furthermore, we present a new resource of 375 heterogeneous CMR datasets acquired by using four different scanner vendors in six hospitals and three different countries (Spain, Canada and Germany), which we provide as open-access for the community to enable future research in the field.
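The importance of intensity-driven data augmentation noted above can be sketched with a minimal example (our own illustration of random gamma and contrast jitter, not any participating team's exact pipeline):

```python
import numpy as np

def intensity_augment(img, rng):
    """Random gamma plus contrast jitter, applied within the image's own intensity range."""
    lo, hi = img.min(), img.max()
    norm = (img - lo) / (hi - lo + 1e-8)      # map to [0, 1]
    norm = norm ** rng.uniform(0.7, 1.5)      # random gamma
    norm = norm * rng.uniform(0.9, 1.1)       # random contrast scaling
    return norm * (hi - lo) + lo              # map back to the original range

rng = np.random.default_rng(5)
slice_ = rng.normal(300.0, 60.0, size=(128, 128))   # a synthetic CMR slice
augmented = intensity_augment(slice_, rng)
```

Perturbations of this kind mimic vendor- and protocol-dependent intensity differences at training time, which is one plausible reason they helped generalization to unseen scanners in the challenge.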


Subject(s)
Heart , Magnetic Resonance Imaging , Cardiac Imaging Techniques , Heart/diagnostic imaging , Humans
14.
IEEE Trans Pattern Anal Mach Intell ; 43(9): 3108-3125, 2021 09.
Article in English | MEDLINE | ID: mdl-33891549

ABSTRACT

This paper reports the results and post-challenge analyses of ChaLearn's AutoDL challenge series, which helped sort out a profusion of AutoML solutions for Deep Learning (DL) that had been introduced in a variety of settings but lacked fair comparisons. All input data modalities (time series, images, videos, text, tabular) were formatted as tensors and all tasks were multi-label classification problems. Code submissions were executed on hidden tasks, with limited time and computational resources, pushing solutions that get results quickly. In this setting, DL methods dominated, though popular Neural Architecture Search (NAS) was impractical. Solutions relied on fine-tuned pre-trained networks, with architectures matched to the data modality. Post-challenge tests did not reveal improvements beyond the imposed time limit. While no component is particularly original or novel, a high-level modular organization emerged, featuring a "meta-learner", "data ingestor", "model selector", "model/learner", and "evaluator". This modularity enabled ablation studies, which revealed the importance of (off-platform) meta-learning, ensembling, and efficient data management. Experiments on heterogeneous module combinations further confirm the (local) optimality of the winning solutions. Our challenge legacy includes an everlasting benchmark (http://autodl.chalearn.org), the open-sourced code of the winners, and a free "AutoDL self-service."

15.
Entropy (Basel) ; 22(5)2020 May 07.
Article in English | MEDLINE | ID: mdl-33286302

ABSTRACT

Human behaviour analysis has introduced several challenges in various fields, such as applied information theory, affective computing, robotics, biometrics and pattern recognition [...].

16.
Entropy (Basel) ; 21(4)2019 Apr 18.
Article in English | MEDLINE | ID: mdl-33267128

ABSTRACT

Action recognition is a challenging task that plays an important role in many robotic systems, which highly depend on visual input feeds. However, due to privacy concerns, it is important to find a method that can recognise actions without using a visual feed. In this paper, we propose a concept for detecting actions while preserving the test subject's privacy. Our proposed method relies only on recording the temporal evolution of light pulses scattered back from the scene. The data trace recording one action contains a sequence of one-dimensional arrays of voltage values acquired by a single-pixel detector at a 1 GHz repetition rate. Information about both the distance to the object and its shape is embedded in the traces. We apply machine learning in the form of recurrent neural networks for data analysis and demonstrate successful action recognition. The experimental results show that our proposed method achieves an average accuracy of 96.47% on the actions walking forward, walking backwards, sitting down, standing up, and hand waving, using a recurrent neural network.

17.
Inf. psiquiátr ; (232): 47-60, abr.-jun. 2018. tab
Article in Spanish | IBECS | ID: ibc-180754

ABSTRACT



The project "Good treatment of the elderly and of people in situations of fragility with emotional suffering: towards healthy ageing" is set in the Baix Llobregat region and was commissioned by its Regional Council, in collaboration with the Departament de Treball, Afers Socials i Famílies of the Generalitat de Catalunya and the Diputació de Barcelona. The project has been developed in three phases. The first phase aimed to produce a diagnosis of the current situation of the elderly and of those in situations of fragility; to this end, the opinions of professionals and expert technicians, family members, and people in situations of fragility were collected and analyzed. For the second phase, a framework document was prepared that is intended to serve as the reference for the Baix Llobregat region and that covers the aspects of good treatment of the elderly and of those in situations of fragility. In the third phase, the project will be implemented; to this end, specific awareness-raising and training actions aimed at various groups of professionals are being carried out. Finally, a fourth phase of evaluation and improvement is envisaged.


Subject(s)
Humans , Aged , Aged, 80 and over , Elder Abuse/psychology , Frail Elderly/psychology , Aging/psychology , Health of the Elderly , Projects , Geriatric Psychiatry , Qualitative Research , Psychosocial Deprivation
18.
Sensors (Basel) ; 18(1)2018 Jan 03.
Article in English | MEDLINE | ID: mdl-29301337

ABSTRACT

We present a pattern recognition framework for semantic segmentation of visual structures, that is, multi-class labelling at pixel level, and apply it to the task of segmenting organs in the eviscerated viscera from slaughtered poultry in RGB-D images. This is a step towards replacing the current strenuous manual inspection at poultry processing plants. Features are extracted from feature maps such as activation maps from a convolutional neural network (CNN). A random forest classifier assigns class probabilities, which are further refined by utilizing context in a conditional random field. The presented method is compatible with both 2D and 3D features, which allows us to explore the value of adding 3D and CNN-derived features. The dataset consists of 604 RGB-D images showing 151 unique sets of eviscerated viscera from four different perspectives. A mean Jaccard index of 78.11% is achieved across the four classes of organs by using features derived from 2D, 3D and a CNN, compared to 74.28% using only basic 2D image features.

19.
Entropy (Basel) ; 20(11)2018 Oct 23.
Article in English | MEDLINE | ID: mdl-33266533

ABSTRACT

In this paper, a deep learning approach, the Restricted Boltzmann Machine (RBM), is used to perform automatic hand sign language recognition from visual data. We evaluate how the RBM, as a deep generative model, is capable of generating the distribution of the input data for enhanced recognition of unseen data. Two modalities, RGB and Depth, are considered as model input in three forms: original image, cropped image, and noisy cropped image. Five crops of the input image are used, and the hands in these cropped images are detected using a Convolutional Neural Network (CNN). After that, three types of detected hand images are generated for each modality and input to RBMs. The outputs of the RBMs for the two modalities are fused in another RBM in order to recognize the output sign label of the input image. The proposed multi-modal model is trained on all or part of the American alphabet and digits from four publicly available datasets. We also evaluate the robustness of the proposed model to noise. Experimental results show that the proposed multi-modal model, using crops and the RBM fusing methodology, achieves state-of-the-art results on the Massey University Gesture Dataset 2012, the American Sign Language (ASL) Fingerspelling Dataset from the University of Surrey's Center for Vision, Speech and Signal Processing, the NYU dataset, and the ASL Fingerspelling A dataset.
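The generative training behind an RBM can be sketched with one step of contrastive divergence (CD-1). This is a generic textbook update for a Bernoulli RBM, not the paper's multi-modal fusion architecture; all sizes below are toy values.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(v0, W, b_h, b_v, rng, lr=0.1):
    """One contrastive-divergence (CD-1) step for a Bernoulli RBM."""
    ph0 = sigmoid(v0 @ W + b_h)                        # hidden probabilities given data
    h0 = (rng.random(ph0.shape) < ph0).astype(float)   # sample hidden units
    pv1 = sigmoid(h0 @ W.T + b_v)                      # reconstruct visible units
    ph1 = sigmoid(pv1 @ W + b_h)                       # hidden given the reconstruction
    n = v0.shape[0]
    W += lr * (v0.T @ ph0 - pv1.T @ ph1) / n           # positive minus negative phase
    b_h += lr * (ph0 - ph1).mean(axis=0)
    b_v += lr * (v0 - pv1).mean(axis=0)
    return W, b_h, b_v

rng = np.random.default_rng(6)
v = (rng.random((32, 64)) < 0.5).astype(float)         # a toy batch of binary inputs
W = 0.01 * rng.normal(size=(64, 16))
b_h, b_v = np.zeros(16), np.zeros(64)
W, b_h, b_v = cd1_update(v, W, b_h, b_v, rng)
```

In the paper, RBMs trained along these lines model each modality's hand crops, and a further RBM fuses their outputs into the sign label.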

20.
IEEE Trans Pattern Anal Mach Intell ; 40(10): 2388-2401, 2018 10.
Article in English | MEDLINE | ID: mdl-29035211

ABSTRACT

Error Correcting Output Codes (ECOC) is a successful technique in multi-class classification, which is a core problem in Pattern Recognition and Machine Learning. A major advantage of ECOC over other methods is that the multi-class problem is decoupled into a set of binary problems that are solved independently. However, the literature defines a general error-correcting capability for ECOCs without analyzing how it distributes among classes, hindering a deeper analysis of pair-wise error correction. To address these limitations, this paper proposes an Error-Correcting Factorization (ECF) method. Our contribution is four-fold: (I) We propose a novel representation of the error-correction capability, called the design matrix, that enables us to build an ECOC on the basis of allocating correction to pairs of classes. (II) We derive the optimal code length of an ECOC using rank properties of the design matrix. (III) ECF is formulated as a discrete optimization problem, and a relaxed solution is found using an efficient constrained block coordinate descent approach. (IV) Enabled by the flexibility introduced with the design matrix, we propose to allocate the error-correction to classes that are prone to confusion. Experimental results on several databases show that, when allocating the error-correction to confusable classes, ECF outperforms state-of-the-art approaches.
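The ECOC decoding step can be sketched as nearest-codeword assignment in Hamming distance (a generic illustration, not ECF itself; the codebook below is our own toy design with pairwise Hamming distance at least 3, so a single flipped dichotomy is corrected):

```python
import numpy as np

def ecoc_decode(outputs, codebook):
    """Assign each sample to the class whose codeword is nearest in Hamming distance."""
    # outputs: (n_samples, n_dichotomies), codebook: (n_classes, n_dichotomies), both in {-1, +1}
    dists = np.sum(outputs[:, None, :] != codebook[None, :, :], axis=2)
    return dists.argmin(axis=1)

codebook = np.array([[ 1,  1,  1,  1,  1],
                     [-1, -1, -1,  1,  1],
                     [ 1, -1,  1, -1, -1]])   # pairwise Hamming distance >= 3
outputs = np.array([[ 1,  1,  1,  1,  1],     # class 0, no binary-classifier errors
                    [-1,  1, -1,  1,  1]])    # class 1 with one flipped dichotomy
preds = ecoc_decode(outputs, codebook)        # -> [0, 1]: the single flip is corrected
```

ECF's contribution is upstream of this step: it designs the codebook so that the correction capacity is concentrated on the class pairs most prone to confusion, rather than spread uniformly.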


Subject(s)
Algorithms , Machine Learning , Pattern Recognition, Automated/methods , Electronic Data Processing , Escherichia coli , Face/anatomy & histology , Humans , Image Processing, Computer-Assisted/methods , Models, Theoretical , Yeasts