Results 1-15 of 15
1.
Med Image Anal ; 97: 103230, 2024 Jun 05.
Article in English | MEDLINE | ID: mdl-38875741

ABSTRACT

Challenges drive the state of the art of automated medical image analysis. However, the limited quantity of public training data they provide can restrict the performance of their solutions, and public access to the training methodology behind these solutions remains absent. This study implements the Type Three (T3) challenge format, which allows for training solutions on private data and guarantees reusable training methodologies. With T3, challenge organizers train a codebase provided by the participants on sequestered training data. T3 was implemented in the STOIC2021 challenge, with the goal of predicting from a computed tomography (CT) scan whether subjects had a severe COVID-19 infection, defined as intubation or death within one month. STOIC2021 consisted of a Qualification phase, in which participants developed challenge solutions using 2000 publicly available CT scans, and a Final phase, in which participants submitted their training methodologies, with which solutions were trained on CT scans of 9724 subjects. The organizers successfully trained six of the eight Final phase submissions. The submitted codebases for training and running inference were released publicly. The winning solution obtained an area under the receiver operating characteristic curve of 0.815 for discerning between severe and non-severe COVID-19. The Final phase solutions of all finalists improved upon their Qualification phase solutions.

2.
IEEE J Biomed Health Inform ; 28(6): 3781-3792, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38483802

ABSTRACT

Disease forecasting is a longstanding problem for the research community, which aims at informing and improving decisions with the best available evidence. Specifically, interest in respiratory disease forecasting has increased dramatically since the beginning of the coronavirus pandemic, rendering the accurate prediction of influenza-like illness (ILI) a critical task. Although methods for short-term ILI forecasting and nowcasting have achieved good accuracy, their performance worsens for long-term ILI forecasts. Machine learning models have outperformed conventional forecasting approaches by enabling the use of diverse exogenous data sources, such as social media, internet users' search query logs, and climate data. However, the most recent deep learning ILI forecasting models achieve state-of-the-art results using only historical occurrence data. Inspired by recent deep neural network architectures in time series forecasting, this work proposes the Regional Influenza-Like-Illness Forecasting (ReILIF) method for regional long-term ILI prediction. The proposed architecture takes advantage of diverse exogenous data, namely meteorological and population data, introducing an efficient intermediate fusion mechanism to combine the different types of information with the aim of capturing the variations of ILI from various views. The efficacy of the proposed approach compared to state-of-the-art ILI forecasting methods is confirmed by an extensive experimental study following standard evaluation measures.
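The abstract's key architectural idea is intermediate fusion: each exogenous source is encoded separately, and the latent features, rather than the raw inputs or the final predictions, are combined. A minimal plain-Python sketch of that idea; the encoder sizes, weights, and concatenation-based fusion are illustrative assumptions, not the actual ReILIF architecture:

```python
# Sketch of intermediate fusion for multi-source forecasting.
# Encoders, dimensions, and fusion-by-concatenation are illustrative.

def linear(x, weights, bias):
    """Dense layer: y_j = sum_i x_i * W[i][j] + b[j]."""
    return [sum(xi * wij for xi, wij in zip(x, col)) + b
            for col, b in zip(zip(*weights), bias)]

def relu(x):
    return [max(0.0, v) for v in x]

def forecast(ili_history, weather, population, params):
    # Encode each source separately (per-source encoders).
    h_ili = relu(linear(ili_history, *params["ili"]))
    h_met = relu(linear(weather, *params["met"]))
    h_pop = relu(linear(population, *params["pop"]))
    # Intermediate fusion: combine latent features, not raw inputs
    # or per-source predictions.
    fused = h_ili + h_met + h_pop          # list concatenation
    return linear(fused, *params["head"])[0]
```

In practice the encoders would be trained jointly with the fusion head, so the gradient of the forecast error shapes all per-source representations at once; that joint training is what distinguishes intermediate fusion from simply averaging separate models.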


Subjects
Forecasting; Influenza, Human; Humans; Influenza, Human/epidemiology; Forecasting/methods; Deep Learning; COVID-19/epidemiology; Neural Networks, Computer
3.
Med Image Anal ; 94: 103107, 2024 May.
Article in English | MEDLINE | ID: mdl-38401269

ABSTRACT

We propose a novel semi-supervised learning method to leverage unlabeled data alongside minimal annotated data and improve medical imaging classification performance in realistic scenarios where limited labeling budgets restrict data annotation. Our method introduces distance correlation to minimize correlations between feature representations obtained from different views of the same image encoded with non-coupled deep neural network architectures. In addition, it incorporates a data-driven graph-attention-based regularization strategy to model affinities among images within the unlabeled data by exploiting their inherent relational information in the feature space. We conduct extensive experiments on four medical imaging benchmark data sets involving X-ray, dermoscopic, magnetic resonance, and computed tomography imaging in single- and multi-label medical imaging classification scenarios. Our experiments demonstrate the effectiveness of our method in achieving very competitive performance and outperforming several state-of-the-art semi-supervised learning methods. Furthermore, they confirm the suitability of distance correlation as a versatile dependence measure and the benefits of the proposed graph-attention-based regularization for semi-supervised learning in medical imaging analysis.
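Distance correlation, the dependence measure this abstract builds on, has a fully specified sample formula (doubly centered pairwise distance matrices), so it can be sketched directly. A plain-Python version for 1-D samples; the paper applies the measure to high-dimensional feature representations, but the computation is the same:

```python
import math

def _centered(xs):
    """Doubly centered pairwise distance matrix of a 1-D sample."""
    n = len(xs)
    d = [[abs(a - b) for b in xs] for a in xs]
    row = [sum(r) / n for r in d]
    grand = sum(row) / n
    return [[d[j][k] - row[j] - row[k] + grand for k in range(n)]
            for j in range(n)]

def distance_correlation(xs, ys):
    """Sample distance correlation in [0, 1]; 0 iff empirically independent."""
    n = len(xs)
    A, B = _centered(xs), _centered(ys)
    dcov2 = sum(A[j][k] * B[j][k] for j in range(n) for k in range(n)) / n**2
    dvarx = sum(a * a for r in A for a in r) / n**2
    dvary = sum(b * b for r in B for b in r) / n**2
    if dvarx * dvary == 0:
        return 0.0
    return math.sqrt(dcov2 / math.sqrt(dvarx * dvary))
```

Unlike Pearson correlation, the statistic vanishes only under independence and equals 1 for exact linear relations, which is what makes minimizing it a sensible way to decorrelate two feature views of the same image.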


Subjects
Benchmarking; Neural Networks, Computer; Humans; Supervised Machine Learning
4.
IEEE Trans Image Process ; 33: 108-122, 2024.
Article in English | MEDLINE | ID: mdl-38039164

ABSTRACT

We present two deep unfolding neural networks for the simultaneous tasks of background subtraction and foreground detection in video. Unlike conventional neural networks based on deep feature extraction, we incorporate domain-knowledge models by considering a masked variation of the robust principal component analysis (RPCA) problem. With this approach, we separate video clips into low-rank and sparse components, respectively corresponding to the backgrounds and the foreground masks indicating the presence of moving objects. Our models, coined ROMAN-S and ROMAN-R, map the iterations of two alternating direction method of multipliers (ADMM) algorithms to trainable convolutional layers, and the proximal operators are mapped to non-linear activation functions with trainable thresholds. This approach leads to lightweight networks with enhanced interpretability that can be trained on limited data. In ROMAN-S, the correlation in time of successive binary masks is controlled with side information based on l1-l1 minimization. ROMAN-R enhances the foreground detection by learning a dictionary of atoms to represent the moving foreground in a high-dimensional feature space and by using reweighted l1-l1 minimization. Experiments are conducted on both synthetic and real video datasets, for which we also include an analysis of the generalization to unseen clips. Comparisons are made with existing deep unfolding RPCA neural networks, which do not use a mask formulation for the foreground, and with a 3D U-Net baseline. Results show that our proposed models outperform other deep unfolding networks, as well as the untrained optimization algorithms. ROMAN-R, in particular, is competitive with the U-Net baseline for foreground detection, with the additional advantage of providing video backgrounds and requiring substantially fewer training parameters and smaller training sets.
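The mapping of proximal operators to activation functions with trainable thresholds, mentioned above, boils down to soft-thresholding (the proximal operator of the l1 norm). A simplified sketch of one unrolled update of a sparse (foreground) estimate; the scalar step size and threshold stand in for the trainable per-layer parameters, and the full masked-RPCA ADMM iterations of ROMAN-S/ROMAN-R are not reproduced here:

```python
def soft_threshold(x, theta):
    """Proximal operator of the l1 norm. In an unfolded network this
    plays the role of an activation function whose threshold theta is
    a trainable parameter rather than a fixed constant."""
    if x > theta:
        return x - theta
    if x < -theta:
        return x + theta
    return 0.0

def unfolded_sparse_step(residual, z_prev, step, theta):
    """One unrolled iteration of a sparse-coding update: a gradient step
    on the data-fit term followed by element-wise soft-thresholding."""
    return [soft_threshold(z + step * r, theta)
            for z, r in zip(z_prev, residual)]
```

Stacking a fixed number of such steps, with the step sizes and thresholds learned from data, is precisely what turns an iterative solver into a lightweight, interpretable network.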

5.
Article in English | MEDLINE | ID: mdl-38157461

ABSTRACT

Contrastive learning has revolutionized the field of computer vision, learning rich representations from unlabeled data that generalize well to diverse vision tasks. Consequently, it has become increasingly important to explain these approaches and understand their inner workings. Given that contrastive models are trained with interdependent and interacting inputs and aim to learn invariance through data augmentation, the existing methods for explaining single-image systems (e.g., image classification models) are inadequate, as they fail to account for these factors and typically assume independent inputs. Additionally, there is a lack of evaluation metrics designed to assess pairs of explanations, and no analytical studies have been conducted to investigate the effectiveness of the different techniques used to explain contrastive learning. In this work, we design visual explanation methods that contribute towards understanding similarity learning tasks from pairs of images. We further adapt existing metrics, used to evaluate visual explanations of image classification systems, to suit pairs of explanations, and evaluate our proposed methods with these metrics. Finally, we present a thorough analysis of visual explainability methods for contrastive learning, establish their correlation with downstream tasks, and demonstrate the potential of our approaches to investigate their merits and drawbacks.

6.
Sensors (Basel) ; 24(1)2023 Dec 21.
Article in English | MEDLINE | ID: mdl-38202910

ABSTRACT

This paper studies an advanced machine learning method, specifically few-shot classification with meta-learning, applied to distributed acoustic sensing (DAS) data. The study makes two key contributions: (i) an investigation of different pre-processing methods for DAS data and (ii) the implementation of a neural network model based on meta-learning to learn a representation of the processed data. In the context of urban infrastructure monitoring, we develop a few-shot classification framework that classifies query samples with only a limited number of support samples. The model consists of an embedding network trained on a meta dataset for feature extraction, followed by a classifier that performs few-shot classification. This research thoroughly explores three types of data pre-processing, namely decomposed phase, power spectral density, and frequency energy band, as inputs to the neural network. Experimental results show the efficient learning capabilities of the embedding model when working with various pre-processed data, offering a range of pre-processing options. Furthermore, the results demonstrate outstanding few-shot classification performance across a large number of event classes, highlighting the framework's potential for urban infrastructure monitoring applications.
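The described pipeline, an embedding network followed by a few-shot classifier, is commonly realized with a nearest-prototype rule. A sketch of that classification stage, assuming the embedding network has already mapped DAS samples to feature vectors; the 2-D embeddings and event-class names used here are toy assumptions, not the paper's setup:

```python
import math

def euclid(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def prototypes(support):
    """One prototype per class: the mean embedding of its few
    labelled support samples."""
    protos = {}
    for label, vecs in support.items():
        dim = len(vecs[0])
        protos[label] = [sum(v[i] for v in vecs) / len(vecs)
                         for i in range(dim)]
    return protos

def classify(query, protos):
    """Assign a query embedding to the class of the nearest prototype."""
    return min(protos, key=lambda label: euclid(query, protos[label]))
```

Because only the prototypes change when new event classes appear, the embedding network itself never needs retraining, which is what makes the few-shot setting attractive for monitoring deployments.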

7.
IEEE Trans Neural Netw Learn Syst ; 33(12): 7330-7344, 2022 12.
Article in English | MEDLINE | ID: mdl-34111008

ABSTRACT

Training deep neural networks on large datasets containing high-dimensional data requires a large amount of computation. A solution to this problem is data-parallel distributed training, where a model is replicated into several computational nodes that have access to different chunks of the data. This approach, however, entails high communication rates and latency because the computed gradients need to be shared among nodes at every iteration. The problem becomes more pronounced when there is wireless communication between the nodes (e.g., due to limited network bandwidth). To address this problem, various compression methods have been proposed, including sparsification, quantization, and entropy encoding of the gradients. Existing methods leverage the intra-node information redundancy, that is, they compress gradients at each node independently. In contrast, we advocate that the gradients across the nodes are correlated and propose methods to leverage this inter-node redundancy to improve compression efficiency. Depending on the node communication protocol (parameter server or ring-allreduce), we propose two instances of the gradient compression that we coin Learned Gradient Compression (LGC). Our methods exploit an autoencoder (trained during the first stages of the distributed training) to capture the common information that exists in the gradients of the distributed nodes. To constrain the nodes' computational complexity, the autoencoder is realized with a lightweight neural network. We have tested our LGC methods on the image classification and semantic segmentation tasks using different convolutional neural networks (CNNs) [ResNet50, ResNet101, and pyramid scene parsing network (PSPNet)] and multiple datasets (ImageNet, Cifar10, and CamVid). The ResNet101 model trained for image classification on Cifar10 achieved an accuracy of 93.57% under significant gradient compression, only 0.18% lower than baseline distributed training with uncompressed gradients. The gradient rate is reduced by 8095× and 8× compared with the baseline and the state-of-the-art deep gradient compression (DGC) method, respectively.
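For contrast with the intra-node compression baselines the abstract mentions, top-k sparsification (the core of DGC-style methods) is simple to sketch. The `common_component` helper below is only a toy stand-in for the shared gradient information that LGC's autoencoder learns to capture; it is not the LGC method itself:

```python
def topk_sparsify(grad, k):
    """Keep the k largest-magnitude gradient entries as (index, value)
    pairs: intra-node sparsification, as in DGC-style compression."""
    idx = sorted(range(len(grad)), key=lambda i: -abs(grad[i]))[:k]
    return sorted((i, grad[i]) for i in idx)

def densify(pairs, n):
    """Reconstruct a dense gradient from the transmitted sparse pairs."""
    out = [0.0] * n
    for i, v in pairs:
        out[i] = v
    return out

def common_component(grads):
    """Toy stand-in for the inter-node redundancy LGC exploits: the
    element-wise mean, which each node could subtract before
    compressing only its (smaller) residual."""
    n = len(grads[0])
    return [sum(g[i] for g in grads) / len(grads) for i in range(n)]
```

The point of the contrast: top-k looks at one node's gradient in isolation, whereas LGC's learned encoder targets exactly the part of the signal that is shared across nodes and therefore need not be transmitted by everyone.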


Subjects
Data Compression; Deep Learning; Neural Networks, Computer; Entropy
8.
IEEE Trans Image Process ; 30: 4099-4113, 2021.
Article in English | MEDLINE | ID: mdl-33798083

ABSTRACT

Deep unfolding methods design deep neural networks as learned variations of optimization algorithms through the unrolling of their iterations. These networks have been shown to achieve faster convergence and higher accuracy than the original optimization methods. In this line of research, this paper presents novel interpretable deep recurrent neural networks (RNNs), designed by unfolding iterative algorithms that solve the task of sequential signal reconstruction (in particular, video reconstruction). The proposed networks are designed by accounting for the fact that patches of video frames have sparse representations and that the temporal difference between consecutive representations is also sparse. Specifically, we design an interpretable deep RNN (coined reweighted-RNN) by unrolling the iterations of a proximal method that solves a reweighted version of the l1-l1 minimization problem. Due to the underlying minimization model, our reweighted-RNN has a different thresholding function (that is, a different activation function) for each hidden unit in each layer. In this way, it has higher network expressivity than existing deep unfolding RNN models. We also present the derived l1-l1-RNN model, which is obtained by unfolding a proximal method for the l1-l1 minimization problem. We apply the proposed interpretable RNNs to the task of video frame reconstruction from low-dimensional measurements, that is, sequential video frame reconstruction. The experimental results on various datasets demonstrate that the proposed deep RNNs outperform various RNN models.


Subjects
Image Processing, Computer-Assisted/methods; Neural Networks, Computer; Video Recording; Algorithms; Deep Learning; Humans; Pedestrians
9.
Article in English | MEDLINE | ID: mdl-32784140

ABSTRACT

The reconstruction of a high-resolution image given a low-resolution observation is an ill-posed inverse problem in imaging. Deep learning methods rely on training data to learn an end-to-end mapping from a low-resolution input to a high-resolution output. Unlike existing deep multimodal models that do not incorporate domain knowledge about the problem, we propose a multimodal deep learning design that incorporates sparse priors and allows the effective integration of information from another image modality into the network architecture. Our solution relies on a novel deep unfolding operator, performing steps similar to an iterative algorithm for convolutional sparse coding with side information; therefore, the proposed neural network is interpretable by design. The deep unfolding architecture is used as a core component of a multimodal framework for guided image super-resolution. An alternative multimodal design is investigated by employing residual learning to improve the training efficiency. The presented multimodal approach is applied to super-resolution of near-infrared and multi-spectral images as well as depth upsampling using RGB images as side information. Experimental results show that our model outperforms state-of-the-art methods.

10.
IEEE Trans Neural Netw Learn Syst ; 31(9): 3579-3593, 2020 Sep.
Article in English | MEDLINE | ID: mdl-31689219

ABSTRACT

The problem of completing high-dimensional matrices from a limited set of observations arises in many big data applications, especially recommender systems. The existing matrix completion models generally follow either a memory- or a model-based approach, whereas geometric matrix completion (GMC) models combine the best from both approaches. Existing deep-learning-based geometric models yield good performance, but, in order to operate, they require a fixed structure graph capturing the relationships among the users and items. This graph is typically constructed by evaluating a pre-defined similarity metric on the available observations or by using side information, e.g., user profiles. In contrast, Markov-random-fields-based models do not require a fixed structure graph but rely on handcrafted features to make predictions. When no side information is available and the number of available observations becomes very low, existing solutions are pushed to their limits. In this article, we propose a GMC approach that addresses these challenges. We consider matrix completion as a structured prediction problem in a conditional random field (CRF), which is characterized by a maximum a posteriori (MAP) inference, and we propose a deep model that predicts the missing entries by solving the MAP inference problem. The proposed model simultaneously learns the similarities among matrix entries, computes the CRF potentials, and solves the inference problem. Its training is performed in an end-to-end manner, with a method to supervise the learning of entry similarities. Comprehensive experiments demonstrate the superior performance of the proposed model compared to various state-of-the-art models on popular benchmark data sets and underline its superior capacity to deal with highly incomplete matrices.

11.
Sensors (Basel) ; 19(22)2019 Nov 15.
Article in English | MEDLINE | ID: mdl-31731824

ABSTRACT

Extrinsic camera calibration is essential for any computer vision task in a camera network. Typically, researchers place a calibration object in the scene to calibrate all the cameras in a camera network. However, when installing cameras in the field, this approach can be costly and impractical, especially when recalibration is needed. This paper proposes a novel, accurate and fully automatic extrinsic calibration framework for camera networks with partially overlapping views. The proposed method considers the pedestrians in the observed scene as the calibration objects and analyzes the pedestrian tracks to obtain extrinsic parameters. Compared to the state of the art, the new method is fully automatic and robust in various environments. Our method detects human poses in the camera images and then models walking persons as vertical sticks. We apply a brute-force method to determine the correspondence between persons in multiple camera images. This information, along with the estimated 3D locations of the top and the bottom of the pedestrians, is then used to compute the extrinsic calibration matrices. We also propose a novel method to calibrate the camera network using only the top and centerline of the person when the bottom of the person is not visible in heavily occluded scenes. We verified the robustness of the method in different camera setups and for both single and multiple walking people. The results show that a triangulation error of a few centimeters can be obtained. Typically, less than one minute of observing walking people is required to reach this accuracy in controlled environments, and only a few minutes are needed to collect enough data for calibration in uncontrolled environments. Our proposed method performs well in various situations, such as multiple persons, occlusions, or even real intersections on the street.


Subjects
Calibration/standards; Image Interpretation, Computer-Assisted/methods; Imaging, Three-Dimensional/methods; Pattern Recognition, Automated/methods; Algorithms; Artificial Intelligence; Humans; Pattern Recognition, Automated/standards; Pedestrians
12.
Sensors (Basel) ; 18(6)2018 May 31.
Article in English | MEDLINE | ID: mdl-29857543

ABSTRACT

This work explores an innovative strategy for increasing the efficiency of compressed sensing applied to mm-wave SAR sensing using multiple weighted side information. The approach is tested on synthetic and real non-destructive testing measurements performed on a 3D-printed object with defects, while taking advantage of multiple previous SAR images of the object with different degrees of similarity. The tested algorithm autonomously assigns weights to the side information at two levels: (1) among the components inside each side information image and (2) among the different side information images. The reconstruction is thereby almost immune to poor-quality side information while exploiting the relevant components hidden inside the added side information. The presented results prove that, in contrast to common compressed sensing, good SAR image reconstruction is achieved at subsampling rates far below the Nyquist rate. Moreover, the algorithm is shown to be much more robust to low-quality side information compared to coherent background subtraction.

13.
IEEE Trans Image Process ; 27(9): 4314-4329, 2018 Sep.
Article in English | MEDLINE | ID: mdl-29870350

ABSTRACT

This paper considers online robust principal component analysis (RPCA) in time-varying decomposition problems such as video foreground-background separation. We propose a compressive online RPCA algorithm that recursively decomposes a sequence of data vectors (e.g., frames) into sparse and low-rank components. Different from conventional batch RPCA, which processes all the data directly, our approach considers a small set of measurements taken per data vector (frame). Moreover, our algorithm can incorporate multiple prior information from previously decomposed vectors via a proposed n-l1 minimization method. At each time instance, the algorithm recovers the sparse vector by solving the n-l1 minimization problem, which promotes not only the sparsity of the vector but also its correlation with multiple previously recovered sparse vectors, and subsequently updates the low-rank component using incremental singular value decomposition. We also establish theoretical bounds on the number of measurements required to guarantee successful compressive separation under the assumptions of static or slowly changing low-rank components. We evaluate the proposed algorithm using numerical experiments and online video foreground-background separation experiments. The experimental results show that the proposed method outperforms the existing methods.

14.
IEEE Trans Image Process ; 26(2): 751-764, 2017 Feb.
Article in English | MEDLINE | ID: mdl-27831873

ABSTRACT

In support of art investigation, we propose a new source separation method that unmixes a single X-ray scan acquired from double-sided paintings. In this problem, the X-ray signals to be separated have similar morphological characteristics, which brings previous source separation methods to their limits. Our solution is to use photographs taken from the front and back sides of the panel to drive the separation process. The crux of our approach relies on the coupling of the two imaging modalities (photographs and X-rays) using a novel coupled dictionary learning framework able to capture both common and disparate features across the modalities using parsimonious representations; the common component captures features shared by the multi-modal images, whereas the innovation component captures modality-specific information. As such, our model enables the formulation of appropriately regularized convex optimization procedures that lead to the accurate separation of the X-rays. Our dictionary learning framework can be tailored both to a single-scale and a multi-scale framework, with the latter leading to a significant performance improvement. Moreover, to further improve the visual quality of the separated images, we propose to train coupled dictionaries that ignore certain parts of the painting corresponding to craquelure. Experimentation on synthetic and real data, taken from a digital acquisition of the Ghent Altarpiece (1432), confirms the superiority of our method over the state-of-the-art morphological component analysis technique that uses either fixed or trained dictionaries to perform image separation.

15.
IEEE Trans Image Process ; 21(4): 1934-49, 2012 Apr.
Article in English | MEDLINE | ID: mdl-22203710

ABSTRACT

In the context of low-cost video encoding, distributed video coding (DVC) has recently emerged as a potential candidate for uplink-oriented applications. This paper builds on a concept of correlation channel (CC) modeling, which expresses the correlation noise as being statistically dependent on the side information (SI). Compared with classical side-information-independent (SII) noise modeling adopted in current DVC solutions, it is theoretically proven that side-information-dependent (SID) modeling improves the Wyner-Ziv coding performance. Anchored in this finding, this paper proposes a novel algorithm for online estimation of the SID CC parameters based on already decoded information. The proposed algorithm enables bit-plane-by-bit-plane successive refinement of the channel estimation leading to progressively improved accuracy. Additionally, the proposed algorithm is included in a novel DVC architecture that employs a competitive hash-based motion estimation technique to generate high-quality SI at the decoder. Experimental results corroborate our theoretical gains and validate the accuracy of the channel estimation algorithm. The performance assessment of the proposed architecture shows remarkable and consistent coding gains over a germane group of state-of-the-art distributed and standard video codecs, even under strenuous conditions, i.e., large groups of pictures and highly irregular motion content.
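The SID-versus-SII distinction above can be made concrete with a toy estimator: model the correlation noise x - y as Laplacian and let its scale depend on the side information y by binning the SI values. The binning scheme here is an illustrative assumption; the paper's actual algorithm refines its estimates online, bit plane by bit plane, from already decoded information:

```python
def sid_laplacian_scales(source, side_info, bin_edges):
    """Side-information-dependent (SID) noise model: group residuals
    x - y by the SI value y and estimate one Laplacian scale per bin.
    The per-bin mean absolute residual is the maximum-likelihood
    estimate of a Laplacian scale (an SII model would instead use a
    single global scale for all y)."""
    bins = [[] for _ in range(len(bin_edges) + 1)]
    for x, y in zip(source, side_info):
        i = sum(y >= e for e in bin_edges)   # index of y's bin
        bins[i].append(abs(x - y))
    return [sum(b) / len(b) if b else None for b in bins]
```

When the noise level genuinely varies with the SI, as the paper proves, conditioning the channel parameters on y in this way yields a better rate than any single SII scale can.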


Subjects
Algorithms; Data Compression/methods; Image Enhancement/methods; Image Interpretation, Computer-Assisted/methods; Photography/methods; Signal Processing, Computer-Assisted; Video Recording/methods; Computer Graphics; Numerical Analysis, Computer-Assisted; Reproducibility of Results; Sensitivity and Specificity; Statistics as Topic