Results 1 - 20 of 29
1.
IEEE Trans Image Process ; 33: 2158-2170, 2024.
Article in English | MEDLINE | ID: mdl-38470575

ABSTRACT

Depth information opens up new opportunities for video object segmentation (VOS) to be more accurate and robust in complex scenes. However, the RGBD VOS task is largely unexplored due to the expensive collection of RGBD data and time-consuming annotation of segmentation. In this work, we first introduce a new benchmark for RGBD VOS, named DepthVOS, which contains 350 videos (over 55k frames in total) annotated with masks and bounding boxes. We further propose a novel, strong baseline model - Fused Color-Depth Network (FusedCDNet), which can be trained solely under the supervision of bounding boxes, and at inference generates masks from only bounding-box guidance in the first frame. Thereby, the model possesses three major advantages: a weakly-supervised training strategy to overcome the high-cost annotation, a cross-modal fusion module to handle complex scenes, and weakly-supervised inference to promote ease of use. Extensive experiments demonstrate that our proposed method performs on par with top fully-supervised algorithms. We will open-source our project at https://github.com/yjybuaa/depthvos/ to facilitate the development of RGBD VOS.

2.
IEEE Trans Pattern Anal Mach Intell ; 46(7): 5131-5148, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38300783

ABSTRACT

One fundamental problem in deep learning is understanding the excellent performance of deep Neural Networks (NNs) in practice. An explanation for the superiority of NNs is that they can realize a large family of complicated functions, i.e., they have powerful expressivity. The expressivity of a Neural Network with Piecewise Linear activations (PLNN) can be quantified by the maximal number of linear regions it can separate its input space into. In this paper, we provide several mathematical results needed for studying the linear regions of Convolutional Neural Networks with Piecewise Linear activations (PLCNNs), and use them to derive the maximal and average numbers of linear regions for one-layer PLCNNs. Furthermore, we obtain upper and lower bounds for the number of linear regions of multi-layer PLCNNs. Our results suggest that deeper PLCNNs have more powerful expressivity than shallow PLCNNs, while PLCNNs have more expressivity than fully-connected PLNNs per parameter, in terms of the number of linear regions.
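To make the notion of a linear region concrete, here is a small illustrative sketch (not the paper's derivation): a one-layer ReLU network on 2-D inputs partitions the plane by hyperplanes, and each distinct activation pattern corresponds to one linear region. With 4 generic hyperplanes in the plane the maximum is 1 + 4 + C(4,2) = 11 regions; sampling activation patterns gives an empirical lower bound on the count.

```python
import numpy as np

# Illustrative sketch (not the paper's construction): count the distinct
# ReLU activation patterns of a one-layer network on 2-D inputs.  Each
# pattern corresponds to one linear region; 4 generic hyperplanes split
# the plane into at most 1 + 4 + C(4,2) = 11 regions.
rng = np.random.default_rng(0)
W = rng.standard_normal((4, 2))                 # 4 hidden ReLU units, 2-D input
b = rng.standard_normal(4)

xs = rng.uniform(-3.0, 3.0, size=(20000, 2))    # sample the input square
patterns = (xs @ W.T + b > 0)                   # activation pattern per point
n_regions = len({tuple(p) for p in patterns})   # distinct patterns observed
```

Sampling can only undercount, so `n_regions` is at most the theoretical maximum of 11.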

3.
IEEE Trans Cybern ; 54(4): 2592-2605, 2024 Apr.
Article in English | MEDLINE | ID: mdl-37729576

ABSTRACT

Appearance-based gaze estimation has been widely studied recently with promising performance. The majority of appearance-based gaze estimation methods are developed under deterministic frameworks. However, deterministic gaze estimation methods suffer a large performance drop on challenging eye images with low resolution, darkness, partial occlusions, etc. To alleviate this problem, in this article, we instead reformulate the appearance-based gaze estimation problem under a generative framework. Specifically, we propose a variational inference model, that is, the variational gaze estimation network (VGE-Net), to generate multiple gaze maps as complementary candidates, simultaneously supervised by the ground-truth gaze map. To achieve robust estimation, we adaptively fuse the gaze directions predicted on these candidate gaze maps by a regression network through a simple attention mechanism. Experiments on three benchmarks, that is, MPIIGaze, EYEDIAP, and Columbia, demonstrate that our VGE-Net outperforms state-of-the-art gaze estimation methods, especially on challenging cases. Comprehensive ablation studies also validate the effectiveness of our contributions. The code will be publicly released.
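The fusion step can be sketched in a few lines. This is a hedged illustration of attention-weighted fusion in general (the function and variable names are hypothetical, not VGE-Net's code): gaze directions predicted from several candidate maps are combined by softmax attention over per-candidate scores.

```python
import numpy as np

# Hedged sketch of attention-based fusion only (names are hypothetical,
# not VGE-Net's implementation): fuse K candidate gaze predictions with
# softmax attention over per-candidate scores.
def attention_fuse(gazes, scores):
    """gazes: (K, 2) candidate (yaw, pitch) predictions; scores: (K,) logits."""
    w = np.exp(scores - scores.max())
    w = w / w.sum()                    # softmax attention weights
    return w @ gazes                   # attention-weighted gaze direction

gazes = np.array([[0.10, -0.20], [0.12, -0.18], [0.50, 0.40]])
scores = np.array([2.0, 2.0, -3.0])   # a low score downweights the outlier
fused = attention_fuse(gazes, scores)
```

The outlier candidate barely moves the fused estimate because its attention weight is near zero.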

4.
Med Image Anal ; 82: 102603, 2022 11.
Article in English | MEDLINE | ID: mdl-36116297

ABSTRACT

Automating report generation for medical imaging promises to minimize labor and aid diagnosis in clinical practice. Deep learning algorithms have recently been shown to be capable of captioning natural photos. However, doing the same for medical data is difficult due to the variety of reports written by different radiologists with varying levels of knowledge and experience. Current methods for automatic report generation tend to merely copy one of the training samples into the created report. To tackle this issue, we propose variational topic inference, a probabilistic approach for automatic chest X-ray report generation. Specifically, we introduce a probabilistic latent variable model where a latent variable defines a single topic. The topics are inferred in a conditional variational inference framework by aligning vision and language modalities in a latent space, with each topic governing the generation of one sentence in the report. We further adopt a visual attention module that enables the model to attend to different locations in the image while generating the descriptions. We conduct extensive experiments on two benchmarks, namely Indiana U. Chest X-rays and MIMIC-CXR. The results demonstrate that our proposed variational topic inference method can generate reports with novel sentence structure, rather than mere copies of reports used in training, while still achieving comparable performance to state-of-the-art methods in terms of standard language generation criteria.


Subject(s)
Algorithms , Models, Theoretical , Humans , X-Rays , Uncertainty
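Conditional variational inference of this kind rests on the reparameterization trick for sampling the latent topic. The sketch below shows the generic step only (not the paper's implementation; the latent size is illustrative): a topic z is drawn from an inferred Gaussian q(z | x) in a way that remains differentiable with respect to the distribution's parameters.

```python
import numpy as np

# Hedged sketch of the generic reparameterization step behind conditional
# variational inference (not the paper's implementation): draw a latent
# topic z from an inferred Gaussian q(z | x) = N(mu, sigma^2) in a way
# that stays differentiable w.r.t. mu and log_var.
rng = np.random.default_rng(1)

def sample_topic(mu, log_var):
    eps = rng.standard_normal(mu.shape)       # noise, independent of parameters
    return mu + np.exp(0.5 * log_var) * eps   # z = mu + sigma * eps

mu = np.zeros(8)                # an 8-dim topic latent (size is illustrative)
log_var = np.full(8, -2.0)      # sigma = exp(-1), about 0.37
z = sample_topic(mu, log_var)
```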
5.
Med Image Anal ; 80: 102512, 2022 08.
Article in English | MEDLINE | ID: mdl-35709559

ABSTRACT

Dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI) is an MRI technique for quantifying perfusion that can be used in clinical applications for classification of tumours and other types of diseases. Conventionally, the non-linear least squares (NLLS) method is used for tracer-kinetic modelling of DCE data. However, despite promising results, NLLS suffers from long processing times (minutes-hours) and noisy parameter maps due to the non-convexity of the cost function. In this work, we investigated physics-informed deep neural networks for estimating physiological parameters from DCE-MRI signal-curves. Three voxel-wise temporal frameworks (FCN, LSTM, GRU) and two spatio-temporal frameworks (CNN, U-Net) were investigated. The accuracy and precision of parameter estimation by the temporal frameworks were evaluated in simulations. All networks showed higher precision than the NLLS. Specifically, the GRU decreased the random error on ve by a factor of 4.8 with respect to the NLLS for noise (SD) of 1/20. The accuracy was better for the prediction of the ve parameter in all networks compared to the NLLS. The GRU and LSTM worked with arbitrary acquisition lengths. The GRU was selected for in vivo evaluation and compared to the spatio-temporal frameworks in 28 patients with pancreatic cancer. All neural network approaches showed less noisy parameter maps than the NLLS. The GRU had better test-retest repeatability than the NLLS for all three parameters and was able to detect one additional patient with significant changes in DCE parameters post chemo-radiotherapy. Although the U-Net and CNN had even better test-retest characteristics than the GRU, and were able to detect even more responders, they also showed potential systematic errors in the parameter maps. Therefore, we advise using our GRU framework for analysing DCE data.


Subject(s)
Deep Learning , Pancreatic Neoplasms , Algorithms , Contrast Media , Humans , Magnetic Resonance Imaging/methods , Pancreatic Neoplasms/diagnostic imaging
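The conventional baseline being compared against is a least-squares fit of a signal model per voxel. The sketch below is only illustrative: real DCE-MRI fits tracer-kinetic models such as extended Tofts, whereas this toy uses a mono-exponential uptake curve, and an exhaustive grid search stands in for the iterative NLLS solver.

```python
import numpy as np

# Hedged sketch of the least-squares baseline on a toy signal model
# (real DCE-MRI fits tracer-kinetic models such as extended Tofts; the
# mono-exponential curve below is only illustrative).  A coarse grid
# search stands in for an iterative NLLS solver.
def model(t, amp, rate):
    return amp * (1.0 - np.exp(-rate * t))

rng = np.random.default_rng(0)
t = np.linspace(0.0, 5.0, 50)
signal = model(t, 2.0, 1.5) + rng.normal(0.0, 0.05, t.shape)  # noisy voxel curve

amps = np.linspace(0.5, 3.0, 51)
rates = np.linspace(0.5, 3.0, 51)
_, amp_hat, rate_hat = min(                     # least-squares over the grid
    (np.sum((model(t, a, r) - signal) ** 2), a, r)
    for a in amps for r in rates
)
```

The non-convexity the abstract mentions is exactly why such fits need either a good initial guess or, as here, brute force.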
6.
Article in English | MEDLINE | ID: mdl-35226600

ABSTRACT

Few-shot learning deals with the fundamental and challenging problem of learning from a few annotated samples, while being able to generalize well on new tasks. The crux of few-shot learning is to extract prior knowledge from related tasks to enable fast adaptation to a new task with a limited amount of data. In this paper, we propose meta-learning kernels with random Fourier features for few-shot learning, which we call MetaKernel. Specifically, we propose learning variational random features in a data-driven manner to obtain task-specific kernels by leveraging the shared knowledge provided by related tasks in a meta-learning setting. We treat the random feature basis as the latent variable, which is estimated by variational inference. The shared knowledge from related tasks is incorporated into a context inference of the posterior, which we achieve via a long short-term memory module. To establish more expressive kernels, we deploy conditional normalizing flows based on coupling layers to achieve a richer posterior distribution over random Fourier bases. The resultant kernels are more informative and discriminative, which further improves few-shot learning. We conduct experiments on both few-shot image classification and regression tasks. The results on fourteen datasets demonstrate that MetaKernel consistently achieves better performance than state-of-the-art alternatives.
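The random Fourier feature machinery the paper builds on can be shown in a few lines. This is a hedged sketch of the classic construction only: MetaKernel *learns* the feature distribution variationally, whereas here the bases are drawn from the fixed Gaussian that recovers an RBF kernel.

```python
import numpy as np

# Hedged sketch of plain random Fourier features (MetaKernel learns the
# base distribution; here it is the fixed Gaussian whose features
# approximate an RBF kernel).
rng = np.random.default_rng(0)
d, D, gamma = 5, 2000, 0.5                       # input dim, #features, kernel width

W = rng.normal(0.0, np.sqrt(2 * gamma), (D, d))  # spectral samples ~ N(0, 2*gamma*I)
b = rng.uniform(0.0, 2 * np.pi, D)

def phi(x):
    return np.sqrt(2.0 / D) * np.cos(W @ x + b)  # random feature map

x, y = rng.standard_normal(d), rng.standard_normal(d)
exact = np.exp(-gamma * np.sum((x - y) ** 2))    # RBF kernel k(x, y)
approx = phi(x) @ phi(y)                         # <phi(x), phi(y)> approximates k(x, y)
```

The inner product of the random feature maps concentrates around the exact kernel value as D grows.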

7.
IEEE Trans Image Process ; 31: 275-286, 2022.
Article in English | MEDLINE | ID: mdl-34855598

ABSTRACT

Abnormal crowd behavior detection has recently attracted increasing attention due to its wide applications in computer vision research areas. However, it is still an extremely challenging task due to the great variability of abnormal behavior coupled with huge ambiguity and uncertainty of video contents. To tackle these challenges, we propose a new probabilistic framework named variational abnormal behavior detection (VABD), which can detect abnormal crowd behavior in video sequences. We make three major contributions: (1) We develop a new probabilistic latent variable model that combines the strengths of the U-Net and conditional variational auto-encoder, which also are the backbone of our model; (2) We propose a motion loss based on an optical flow network to impose the motion consistency of generated video frames and input video frames; (3) We embed a Wasserstein generative adversarial network at the end of the backbone network to enhance the framework performance. VABD can accurately discriminate abnormal video frames from video sequences. Experimental results on UCSD, CUHK Avenue, IITB-Corridor, and ShanghaiTech datasets show that VABD outperforms the state-of-the-art algorithms on abnormal crowd behavior detection. Without data augmentation, our VABD achieves 72.24% in terms of AUC on IITB-Corridor, which surpasses the state-of-the-art methods by nearly 5%.

8.
IEEE Trans Neural Netw Learn Syst ; 33(9): 4800-4814, 2022 09.
Article in English | MEDLINE | ID: mdl-33720834

ABSTRACT

Skeleton-based action recognition has been extensively studied, but it remains an unsolved problem because of the complex variations of skeleton joints in 3-D spatiotemporal space. To handle this issue, we propose a new temporal-then-spatial recalibration method named memory attention networks (MANs) and deploy MANs using the temporal attention recalibration module (TARM) and spatiotemporal convolution module (STCM). In the TARM, a novel temporal attention mechanism is built based on residual learning to recalibrate frames of skeleton data temporally. In the STCM, the recalibrated sequence is transformed or encoded as the input of CNNs to further model the spatiotemporal information of the skeleton sequence. Based on MANs, a new collaborative memory fusion module (CMFM) is proposed to further improve the efficiency, leading to the collaborative MANs (C-MANs), trained with two streams of base MANs. TARM, STCM, and CMFM form a single network seamlessly and enable the whole network to be trained in an end-to-end fashion. Compared with the state-of-the-art methods, MANs and C-MANs improve the performance significantly and achieve the best results on six data sets for action recognition. The source code has been made publicly available at https://github.com/memory-attention-networks.


Subject(s)
Neural Networks, Computer , Skeleton
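Residual temporal recalibration can be sketched concisely. This is an illustration in the spirit of the TARM, not the paper's network: each frame of a toy skeleton sequence gets a softmax attention weight, and a residual connection ensures informative frames are emphasised without zeroing any frame out.

```python
import numpy as np

# Hedged sketch in the spirit of the TARM (illustrative, not the paper's
# network): score frames, softmax the scores into temporal attention,
# and recalibrate with a residual connection.
rng = np.random.default_rng(0)
T, J = 6, 10                            # frames, joint features per frame
X = rng.standard_normal((T, J))         # a toy skeleton sequence

scores = X @ rng.standard_normal(J)     # per-frame logits (random projection here)
att = np.exp(scores - scores.max())
att = att / att.sum()                   # softmax over the T frames

X_recal = X + att[:, None] * X          # residual recalibration: X * (1 + att)
```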
9.
Article in English | MEDLINE | ID: mdl-32365032

ABSTRACT

Image deraining is an important yet challenging image processing task. Though deterministic image deraining methods have been developed with encouraging performance, they cannot learn flexible representations for probabilistic inference and diverse predictions. Besides, rain intensity varies both across spatial locations and across color channels, making this task more difficult. In this paper, we propose a Conditional Variational Image Deraining (CVID) network for better deraining performance, leveraging the exclusive generative ability of the Conditional Variational Auto-Encoder (CVAE) to provide diverse predictions for the rainy image. To perform spatially adaptive deraining, we propose a spatial density estimation (SDE) module to estimate a rain density map for each image. Since rain density varies across different color channels, we also propose a channel-wise (CW) deraining scheme. Experiments on synthesized and real-world datasets show that the proposed CVID network achieves much better performance than previous deterministic methods on image deraining. Extensive ablation studies validate the effectiveness of the proposed SDE module and CW scheme in our CVID network. The code is available at https://github.com/Yingjun-Du/VID.

10.
Article in English | MEDLINE | ID: mdl-31765313

ABSTRACT

Representation learning is a fundamental but challenging problem, especially when the distribution of data is unknown. In this paper, we propose a new representation learning method, named the Structure Transfer Machine (STM), which enables the feature learning process to converge to the representation expectation in a probabilistic way. We theoretically show that such an expected value of the representation (mean) is achievable if the manifold structure can be transferred from the data space to the feature space. The resulting structure regularization term, named the manifold loss, is incorporated into the loss function of the typical deep learning pipeline. The STM architecture is constructed to enforce the learned deep representation to satisfy the intrinsic manifold structure of the data, which results in robust features that suit various application scenarios, such as digit recognition, image classification and object tracking. Compared with state-of-the-art CNN architectures, we achieve better results on several commonly used public benchmarks.
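A manifold-style structure loss of the general kind described above can be sketched as follows. This is a hedged toy, not STM's exact term: it penalises feature sets whose pairwise-distance structure disagrees with that of the data space, so the loss is zero exactly when the structure transfers unchanged.

```python
import numpy as np

# Hedged sketch of a manifold-style structure loss (not STM's exact
# term): compare pairwise-distance matrices of data and features after
# normalising each for scale.
def manifold_loss(X, Z):
    """X: (n, d) data; Z: (n, k) features."""
    dx = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
    dz = np.linalg.norm(Z[:, None] - Z[None, :], axis=-1)
    return np.mean((dx / dx.max() - dz / dz.max()) ** 2)

rng = np.random.default_rng(0)
X = rng.standard_normal((20, 5))
loss_same = manifold_loss(X, X)                        # structure preserved
loss_rand = manifold_loss(X, rng.standard_normal((20, 3)))
```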

11.
IEEE Trans Neural Netw Learn Syst ; 30(2): 553-565, 2019 02.
Article in English | MEDLINE | ID: mdl-29994406

ABSTRACT

Video classification has been extensively researched in computer vision due to its widespread applications. However, it remains an outstanding task because of the great challenges in effective spatial-temporal feature extraction and efficient classification with high-dimensional video representations. To address these challenges, in this paper, we propose an end-to-end learning framework called deep ensemble machine (DEM) for video classification. Specifically, to establish effective spatio-temporal features, we propose using two deep convolutional neural networks (CNNs), i.e., VGG and C3D, to extract heterogeneous spatial and temporal features for complementary representations. To achieve efficient classification, we propose ensemble learning based on random projections, aiming to transform high-dimensional features into a set of lower-dimensional compact features in subspaces; an ensemble of classifiers is trained on the subspaces and combined with a weighting layer during backpropagation. To further enhance the performance, we introduce rectified linear encoding (RLE), inspired by error-correcting output coding, to encode the initial outputs of classifiers, followed by a softmax layer to produce the final classification results. DEM combines the strengths of deep CNNs and ensemble learning, which establishes a new end-to-end learning architecture for more accurate and efficient video classification. We show the great effectiveness of DEM by extensive experiments on four data sets for diverse video classification tasks including action recognition and dynamic scene classification. Results have shown that DEM achieves high performance on all tasks, with an improvement of up to 13% on the CIFAR10 data set over the baseline model.
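The random-projection ensemble idea can be illustrated with toy classifiers. This is a hedged sketch only (hand-rolled nearest-centroid classifiers, not DEM's networks or weighting layer): features are projected into several low-dimensional subspaces, a cheap classifier is fit in each, and predictions are combined by majority vote.

```python
import numpy as np

# Hedged sketch of the random-projection ensemble idea (toy
# nearest-centroid classifiers, not DEM's networks): classify in several
# random subspaces, then take a majority vote.
rng = np.random.default_rng(0)
n, d, k, n_proj = 200, 50, 20, 5

X = rng.standard_normal((n, d))
X[:, 0] *= 6.0                          # one strongly informative dimension
y = (X[:, 0] > 0).astype(int)           # toy labels

votes = np.zeros(n)
for _ in range(n_proj):
    P = rng.standard_normal((d, k)) / np.sqrt(k)     # random projection
    Xp = X @ P
    c0, c1 = Xp[y == 0].mean(0), Xp[y == 1].mean(0)  # class centroids
    votes += (np.linalg.norm(Xp - c1, axis=1)
              < np.linalg.norm(Xp - c0, axis=1)).astype(int)

pred = (votes > n_proj / 2).astype(int)  # majority vote over subspaces
acc = (pred == y).mean()
```

Each subspace classifier is weak on its own; the vote recovers most of the signal, which is the point of ensembling over projections.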

12.
IEEE Trans Image Process ; 28(1): 205-215, 2019 Jan.
Article in English | MEDLINE | ID: mdl-30136940

ABSTRACT

Action recognition has been extensively researched in computer vision due to its potential applications in a broad range of areas. The key to action recognition lies in modeling actions and measuring their similarity, which however poses great challenges. In this paper, we propose learning match kernels between actions on the Grassmann manifold for action recognition. Specifically, we propose modeling actions as linear subspaces on the Grassmann manifold; each subspace is a set of convolutional neural network (CNN) feature vectors pooled temporally over frames in semantic video clips, which simultaneously captures local discriminant patterns and temporal dynamics of motion. To measure the similarity between actions, we propose Grassmann match kernels (GMK) based on canonical correlations of linear subspaces to directly match videos for action recognition; GMK is learned in a supervised way via kernel target alignment, which endows it with a great discriminative ability to distinguish actions from different classes. The proposed approach leverages the strengths of CNNs for feature extraction and kernels for measuring similarity, which accomplishes a general learning framework of match kernels for action recognition. We have conducted extensive experiments on five challenging realistic data sets including Youtube, UCF50, UCF101, Penn Action, and HMDB51. The proposed approach achieves high performance and surpasses state-of-the-art algorithms by large margins, which demonstrates its great effectiveness for action recognition.
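The geometric ingredient here is standard and easy to sketch. The snippet below computes only the canonical correlations between two linear subspaces (the paper additionally *learns* the kernel via target alignment): for orthonormal bases U1 and U2, the canonical correlations are the singular values of U1^T U2, all equal to 1 exactly when the subspaces coincide.

```python
import numpy as np

# Hedged sketch of the geometric ingredient only (GMK itself is learned):
# canonical correlations between subspaces on the Grassmann manifold are
# the singular values of U1^T U2 for orthonormal bases U1, U2.
def canonical_correlations(A, B):
    U1, _ = np.linalg.qr(A)             # orthonormal basis of span(A)
    U2, _ = np.linalg.qr(B)
    return np.linalg.svd(U1.T @ U2, compute_uv=False)

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 3))        # a 3-dim subspace of R^50
same = canonical_correlations(A, A @ rng.standard_normal((3, 3)))  # same span
rand = canonical_correlations(A, rng.standard_normal((50, 3)))     # unrelated
```

Right-multiplying A by an invertible matrix changes the basis but not the span, so all correlations stay 1; random subspaces of R^50 are nearly orthogonal.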

13.
Neuroinformatics ; 16(3-4): 285-294, 2018 10.
Article in English | MEDLINE | ID: mdl-29802511

ABSTRACT

Accurate and automatic prediction of cognitive assessment from multiple neuroimaging biomarkers is crucial for early detection of Alzheimer's disease. The major challenges arise from the nonlinear relationship between biomarkers and assessment scores and the inter-correlation among them, which have not yet been well addressed. In this paper, we propose multi-layer multi-target regression (MMR), which enables simultaneously modeling intrinsic inter-target correlations and nonlinear input-output relationships in a general compositional framework. Specifically, by kernelized dictionary learning, the MMR can effectively handle highly nonlinear relationships between biomarkers and assessment scores; by robust low-rank linear learning via matrix elastic nets, the MMR can explicitly encode inter-correlations among multiple assessment scores; moreover, the MMR is flexible and can work with the non-smooth ℓ2,1-norm loss function, which enables calibration of multiple targets with disparate noise levels for more robust parameter estimation. The MMR can be efficiently solved by an alternating optimization algorithm via gradient descent with guaranteed convergence. The MMR has been evaluated by extensive experiments on the ADNI database with MRI data, and produced high accuracy surpassing previous regression models, which demonstrates its great effectiveness as a new multi-target regression model for clinical multivariate prediction.


Subject(s)
Alzheimer Disease/diagnostic imaging , Cognition/physiology , Databases, Factual , Magnetic Resonance Imaging , Models, Neurological , Alzheimer Disease/psychology , Databases, Factual/statistics & numerical data , Humans , Magnetic Resonance Imaging/statistics & numerical data , Multivariate Analysis , Predictive Value of Tests , Regression Analysis
14.
IEEE Trans Neural Netw Learn Syst ; 29(5): 1575-1586, 2018 05.
Article in English | MEDLINE | ID: mdl-28328512

ABSTRACT

Multitarget regression has recently attracted intense interest due to its ability to simultaneously solve multiple regression tasks with improved performance, while great challenges stem from jointly exploring inter-target correlations and input-output relationships. In this paper, we propose multitarget sparse latent regression (MSLR) to simultaneously model intrinsic intertarget correlations and complex nonlinear input-output relationships in one single framework. By deploying a structure matrix, the MSLR builds a latent variable model which is able to explicitly encode intertarget correlations via norm-based sparse learning; the MSLR naturally admits a representer theorem for kernel extension, which enables it to flexibly handle highly complex nonlinear input-output relationships; the MSLR can be solved efficiently by an alternating optimization algorithm with guaranteed convergence, which ensures efficient multitarget regression. Extensive experimental evaluation on both synthetic data and six greatly diverse real-world data sets shows that the proposed MSLR consistently outperforms the state-of-the-art algorithms, which demonstrates its great effectiveness for multivariate prediction.

15.
IEEE Trans Pattern Anal Mach Intell ; 40(2): 497-504, 2018 02.
Article in English | MEDLINE | ID: mdl-28368816

ABSTRACT

Multi-target regression has recently regained great popularity due to its capability of simultaneously learning multiple relevant regression tasks and its wide applications in data mining, computer vision and medical image analysis, while great challenges arise from jointly handling inter-target correlations and input-output relationships. In this paper, we propose Multi-layer Multi-target Regression (MMR) which enables simultaneously modeling intrinsic inter-target correlations and nonlinear input-output relationships in a general framework via robust low-rank learning. Specifically, the MMR can explicitly encode inter-target correlations in a structure matrix by matrix elastic nets (MEN); the MMR can work in conjunction with the kernel trick to effectively disentangle highly complex nonlinear input-output relationships; the MMR can be efficiently solved by a new alternating optimization algorithm with guaranteed convergence. The MMR leverages the strength of kernel methods for nonlinear feature learning and the structural advantage of multi-layer learning architectures for inter-target correlation modeling. More importantly, it offers a new multi-layer learning paradigm for multi-target regression which is endowed with high generality, flexibility and expressive ability. Extensive experimental evaluation on 18 diverse real-world datasets demonstrates that our MMR can achieve consistently high performance and outperforms representative state-of-the-art algorithms, which shows its great effectiveness and generality for multivariate prediction.
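A standard low-rank ingredient behind this style of structured regression can be sketched in isolation. The snippet below is illustrative only, not MMR's solver: singular-value soft-thresholding, the proximal operator of the nuclear norm, is the usual primitive for shrinking a structure matrix toward low rank in elastic-net style objectives.

```python
import numpy as np

# Hedged sketch of a low-rank primitive (illustrative, not MMR's solver):
# singular-value soft-thresholding, the proximal operator of the nuclear
# norm, shrinks a matrix toward low rank.
def svt(M, tau):
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

rng = np.random.default_rng(0)
M = rng.standard_normal((6, 4)) @ rng.standard_normal((4, 5))  # rank <= 4
M_low = svt(M, tau=2.0)
```

Thresholding can only shrink singular values, so the result never has higher rank than the input.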

16.
IEEE Trans Neural Netw Learn Syst ; 28(9): 2035-2047, 2017 09.
Article in English | MEDLINE | ID: mdl-27295694

ABSTRACT

Multioutput regression has recently shown great ability to solve challenging problems in both computer vision and medical image analysis. However, due to the huge image variability and ambiguity, it is fundamentally challenging to handle the highly complex input-target relationship of multioutput regression, especially with indiscriminate high-dimensional representations. In this paper, we propose a novel supervised descriptor learning (SDL) algorithm for multioutput regression, which can establish discriminative and compact feature representations to improve the multivariate estimation performance. The SDL is formulated as generalized low-rank approximations of matrices with a supervised manifold regularization. The SDL is able to simultaneously extract discriminative features closely related to multivariate targets and remove irrelevant and redundant information by transforming raw features into a new low-dimensional space aligned to targets. The resulting discriminative yet compact descriptor largely reduces the variability and ambiguity for multioutput regression, which enables more accurate and efficient multivariate estimation. We conduct extensive evaluation of the proposed SDL on both synthetic data and real-world multioutput regression tasks for both computer vision and medical image analysis. Experimental results have shown that the proposed SDL can achieve high multivariate estimation accuracy on all tasks and largely outperforms state-of-the-art algorithms. Our method establishes a novel SDL framework for multioutput regression, which can be widely used to boost the performance in different applications.

17.
Med Image Anal ; 36: 184-196, 2017 02.
Article in English | MEDLINE | ID: mdl-27940226

ABSTRACT

Cardiac four-chamber volume estimation plays a fundamental and crucial role in clinical quantitative analysis of whole heart functions. It is a challenging task due to the huge complexity of the four chambers, including great appearance variations, huge shape deformation and interference between chambers. Direct estimation has recently emerged as an effective and convenient tool for cardiac ventricular volume estimation. However, existing direct estimation methods were specifically developed for one single ventricle, i.e., the left ventricle (LV), or bi-ventricles; they cannot be directly used for four-chamber volume estimation due to the great combinatorial variability and highly complex anatomical interdependency of the four chambers. In this paper, we propose a new, general framework for direct and simultaneous four-chamber volume estimation. We have addressed two key issues, i.e., cardiac image representation and simultaneous four-chamber volume estimation, which enable accurate and efficient four-chamber volume estimation. We generate compact and discriminative image representations by supervised descriptor learning (SDL), which can remove irrelevant information and extract discriminative features. We propose direct and simultaneous four-chamber volume estimation by multioutput sparse latent regression (MSLR), which enables jointly modeling nonlinear input-output relationships and capturing four-chamber interdependence. The proposed method is highly general and independent of imaging modalities, providing a regression framework that can be extensively used for clinical data prediction to achieve automated diagnosis. Experiments on both MR and CT images show that our method achieves high performance, with a correlation coefficient of up to 0.921 with ground truth obtained manually by human experts, which is clinically significant and enables more accurate, convenient and comprehensive assessment of cardiac functions.


Subject(s)
Algorithms , Heart Ventricles/diagnostic imaging , Humans , Regression Analysis , Supervised Machine Learning
18.
Med Image Anal ; 30: 120-129, 2016 May.
Article in English | MEDLINE | ID: mdl-26919699

ABSTRACT

Direct estimation of cardiac ventricular volumes has become increasingly popular and important in cardiac function analysis due to its effectiveness and efficiency by avoiding an intermediate segmentation step. However, existing methods rely on either intensive user inputs or problematic assumptions. To realize the full capacities of direct estimation, this paper presents a general, fully learning-based framework for direct bi-ventricular volume estimation, which removes user inputs and unreliable assumptions. We formulate bi-ventricular volume estimation as a general regression framework which consists of two main full learning stages: unsupervised cardiac image representation learning by multi-scale deep networks and direct bi-ventricular volume estimation by random forests. By leveraging strengths of generative and discriminant learning, the proposed method produces high correlations of around 0.92 with ground truth by human experts for both the left and right ventricles using a leave-one-subject-out cross validation, and largely outperforms existing direct methods on a larger dataset of 100 subjects including both healthy and diseased cases with twice the number of subjects used in previous methods. More importantly, the proposed method can not only be practically used in clinical cardiac function analysis but also be easily extended to other organ volume estimation tasks.


Subject(s)
Heart Ventricles/diagnostic imaging , Image Interpretation, Computer-Assisted/methods , Machine Learning , Neural Networks, Computer , Stroke Volume , Ventricular Dysfunction/diagnostic imaging , Algorithms , Humans , Image Enhancement/methods , Imaging, Three-Dimensional/methods , Organ Size , Pattern Recognition, Automated/methods , Regression Analysis , Reproducibility of Results , Sensitivity and Specificity
19.
IEEE Trans Pattern Anal Mach Intell ; 38(9): 1908-14, 2016 09.
Article in English | MEDLINE | ID: mdl-26552074

ABSTRACT

In this paper, we propose a novel subspace learning algorithm called Local Feature Discriminant Projection (LFDP) for supervised dimensionality reduction of local features. LFDP is able to efficiently seek a subspace that improves the discriminability of local features for classification. We make three novel contributions. First, the proposed LFDP is a general supervised subspace learning algorithm which provides an efficient way of dimensionality reduction for large-scale local feature descriptors. Second, we introduce the Differential Scatter Discriminant Criterion (DSDC) to the subspace learning of local feature descriptors, which avoids the matrix singularity problem. Third, we propose imposing a generalized orthogonalization on the projections, leading to a more compact and less redundant subspace. Extensive experimental validation on three benchmark datasets including UIUC-Sports, Scene-15 and MIT Indoor demonstrates that the proposed LFDP outperforms other dimensionality reduction methods and achieves state-of-the-art performance for image classification.
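The effect of orthogonalizing projection directions is easy to demonstrate. The sketch below shows the idea only (LFDP's generalized scheme is more involved): a QR factorization orthonormalises a set of learned projection directions, removing redundancy between them, which is visible in the Gram matrix becoming the identity.

```python
import numpy as np

# Hedged sketch of the orthogonalization idea only (LFDP's generalized
# scheme is more involved): QR orthonormalises projection directions,
# removing redundancy between them.
rng = np.random.default_rng(0)
P = rng.standard_normal((64, 10))   # 10 learned projection directions (toy)
Q, _ = np.linalg.qr(P)              # orthonormalised projection

gram = Q.T @ Q                      # identity iff the directions are orthonormal
```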
