Results 1 - 20 of 25
1.
Sensors (Basel) ; 24(8), 2024 Apr 09.
Article in English | MEDLINE | ID: mdl-38676016

ABSTRACT

With the widespread adoption of modern RGB cameras, an abundance of RGB images is available everywhere. Multi-view stereo (MVS) 3D reconstruction, which involves multi-view depth estimation and stereo matching algorithms, has therefore been applied extensively across various fields because of its cost-effectiveness and accessibility. However, MVS tasks face noise challenges arising from natural multiplicative noise and negative gain in the algorithms, which reduce the quality and accuracy of the generated models and depth maps. Traditional MVS methods often struggle with noise, relying on assumptions that do not always hold under real-world conditions, while deep learning-based MVS approaches tend to be highly sensitive to noise. To overcome these challenges, we introduce LNMVSNet, a deep learning network designed to enhance local feature attention and fuse features across different scales, aiming for low-noise, high-precision MVS 3D reconstruction. Extensive evaluation on multiple benchmark datasets demonstrates the superior performance of LNMVSNet, showing improved reconstruction accuracy and completeness, especially in the recovery of fine details and clear feature delineation. This advancement promises to broaden the application of MVS, from precise industrial part inspection to the creation of immersive virtual environments.
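
As a rough illustration of the kind of local-attention, multi-scale fusion described above, the sketch below gates features at each scale with a lightweight channel-attention map and fuses the upsampled results; the module name, channel widths, and the specific attention form are assumptions for illustration, not the published LNMVSNet design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LocalAttentionFusion(nn.Module):
    def __init__(self, channels=(8, 16, 32), out_ch=32):
        super().__init__()
        # 1x1 projections bring every scale to a common channel width.
        self.proj = nn.ModuleList(nn.Conv2d(c, out_ch, 1) for c in channels)
        # Lightweight channel attention acts as the local feature attention gate.
        self.gate = nn.Sequential(nn.AdaptiveAvgPool2d(1),
                                  nn.Conv2d(out_ch, out_ch, 1), nn.Sigmoid())
        self.fuse = nn.Conv2d(out_ch * len(channels), out_ch, 3, padding=1)

    def forward(self, feats):
        # feats: list of tensors [B, C_i, H_i, W_i], finest scale first.
        target = feats[0].shape[-2:]
        aligned = []
        for f, proj in zip(feats, self.proj):
            f = proj(f)
            f = f * self.gate(f)                    # attend before fusing
            aligned.append(F.interpolate(f, size=target, mode="bilinear",
                                         align_corners=False))
        return self.fuse(torch.cat(aligned, dim=1)) # fused multi-scale feature map

feats = [torch.randn(1, 8, 64, 64), torch.randn(1, 16, 32, 32), torch.randn(1, 32, 16, 16)]
print(LocalAttentionFusion()(feats).shape)          # torch.Size([1, 32, 64, 64])
```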

2.
IEEE Trans Image Process ; 32: 3383-3396, 2023.
Article in English | MEDLINE | ID: mdl-37307185

ABSTRACT

Blind image super-resolution (blind SR) aims to generate high-resolution (HR) images from low-resolution (LR) input images with unknown degradations. To enhance the performance of SR, the majority of blind SR methods introduce an explicit degradation estimator, which helps the SR model adjust to unknown degradation scenarios. Unfortunately, it is impractical to provide concrete labels for the many possible combinations of degradations (e.g., blurring, noise, or JPEG compression) to guide the training of the degradation estimator. Moreover, special designs for certain degradations hinder the models from generalizing to other degradations. Thus, it is imperative to devise an implicit degradation estimator that can extract discriminative degradation representations for all types of degradations without requiring the supervision of degradation ground truth. To this end, we propose a Meta-Learning based Region Degradation Aware SR Network (MRDA), including a Meta-Learning Network (MLN), a Degradation Extraction Network (DEN), and a Region Degradation Aware SR Network (RDAN). To handle the lack of ground-truth degradation, we use the MLN to rapidly adapt to the specific complex degradation after several iterations and extract implicit degradation information. Subsequently, a teacher network MRDAT is designed to further utilize the degradation information extracted by the MLN for SR. However, the MLN requires iterating on paired LR and HR images, which are unavailable in the inference phase. Therefore, we adopt knowledge distillation (KD) to make the student network learn to directly extract the same implicit degradation representation (IDR) as the teacher from LR images. Furthermore, we introduce an RDAN module that is capable of discerning regional degradations, allowing the IDR to adaptively influence various texture patterns. Extensive experiments under classic and real-world degradation settings show that MRDA achieves SOTA performance and can generalize to various degradation processes.
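
The distillation step described above can be pictured with the short sketch below: a student encoder seeing only the LR image is trained to reproduce the implicit degradation representation produced by a teacher that also sees the (resized) HR image. The tiny encoders and the L1 distillation loss are illustrative assumptions, not the MRDA networks themselves.

```python
import torch
import torch.nn as nn

def small_encoder(in_ch):
    return nn.Sequential(nn.Conv2d(in_ch, 32, 3, padding=1), nn.ReLU(),
                         nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 16))

teacher = small_encoder(in_ch=6)    # sees the LR image and the HR image resized to LR size
student = small_encoder(in_ch=3)    # sees only the LR image, as at inference time

lr = torch.randn(4, 3, 48, 48)
hr_down = torch.randn(4, 3, 48, 48)

with torch.no_grad():
    idr_teacher = teacher(torch.cat([lr, hr_down], dim=1))   # teacher's degradation code
idr_student = student(lr)

kd_loss = nn.functional.l1_loss(idr_student, idr_teacher)    # pull student IDR to teacher's
kd_loss.backward()
```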

3.
Opt Express ; 31(8): 13328-13341, 2023 Apr 10.
Article in English | MEDLINE | ID: mdl-37157472

ABSTRACT

Multipath in 3D imaging happens when one pixel receives light from multiple reflections, which causes errors in the measured point cloud. In this paper, we propose the soft epipolar 3D (SEpi-3D) method to eliminate multipath in temporal space with an event camera and a laser projector. Specifically, we align the projector and event camera rows onto the same epipolar plane with stereo rectification; we capture event flow synchronized with the projector frame to construct a mapping relationship between event timestamp and projector pixel; and we develop a multipath-eliminating method that utilizes the temporal information from the event data together with the epipolar geometry. Experiments show that the RMSE decreases by 6.55 mm on average in the tested multipath scenes, and the percentage of error points decreases by 7.04%.

4.
Article in English | MEDLINE | ID: mdl-37027773

ABSTRACT

The target of space-time video super-resolution (STVSR) is to increase the spatial-temporal resolution of low-resolution (LR) and low-frame-rate (LFR) videos. Recent approaches based on deep learning have made significant improvements, but most of them only use two adjacent frames, that is, short-term features, to synthesize the missing frame embedding, which cannot fully explore the information flow of consecutive input LR frames. In addition, existing STVSR models hardly exploit the temporal contexts explicitly to assist high-resolution (HR) frame reconstruction. To address these issues, in this article, we propose a deformable attention network called STDAN for STVSR. First, we devise a long short-term feature interpolation (LSTFI) module that is capable of excavating abundant content from more neighboring input frames for the interpolation process through a bidirectional recurrent neural network (RNN) structure. Second, we put forward a spatial-temporal deformable feature aggregation (STDFA) module, in which spatial and temporal contexts in dynamic video frames are adaptively captured and aggregated to enhance SR reconstruction. Experimental results on several datasets demonstrate that our approach outperforms state-of-the-art STVSR methods. The code is available at https://github.com/littlewhitesea/STDAN.
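
The long short-term interpolation idea, aggregating several neighboring frames through a bidirectional recurrent structure to synthesize the missing frame embedding, can be sketched as below; using a plain bidirectional GRU over pooled frame features is a simplification assumed for illustration, not the paper's convolutional recurrent design.

```python
import torch
import torch.nn as nn

class FrameInterpolator(nn.Module):
    def __init__(self, feat_dim=64, hidden=64):
        super().__init__()
        self.rnn = nn.GRU(feat_dim, hidden, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, feat_dim)

    def forward(self, frame_feats):
        # frame_feats: [B, T, feat_dim] features of consecutive LR input frames.
        states, _ = self.rnn(frame_feats)       # forward + backward context for every frame
        t = frame_feats.shape[1] // 2
        # Predict the missing frame's embedding from the context around the temporal midpoint.
        return self.head(states[:, t])

feats = torch.randn(2, 6, 64)                   # six neighboring frames
print(FrameInterpolator()(feats).shape)         # torch.Size([2, 64])
```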

5.
IEEE Trans Med Imaging ; 42(1): 304-316, 2023 01.
Article in English | MEDLINE | ID: mdl-36155433

ABSTRACT

Polarization imaging is sensitive to sub-wavelength microstructures of various cancer tissues, providing abundant optical characteristics and microstructure information of complex pathological specimens. However, how to reasonably utilize polarization information to strengthen pathological diagnosis ability remains a challenging issue. In order to take full advantage of pathological image information and the polarization features of samples, we propose a dual polarization modality fusion network (DPMFNet), which consists of a multi-stream CNN structure and a switched attention fusion module for complementarily aggregating the features from different modality images. Our proposed switched attention mechanism obtains joint feature embeddings by switching the attention maps of different modality images to improve their semantic relatedness. By including a dual-polarization contrastive training scheme, our method can synthesize and align the interaction and representation of the two polarization features. Experimental evaluations on three cancer datasets show the superiority of our method in assisting pathological diagnosis, especially for small datasets and low imaging resolutions. Grad-CAM visualizations of the important regions of the pathological and polarization images indicate that the two modalities play different roles, allowing us to give insightful explanations and analysis of the cancer diagnoses made by DPMFNet. This technique has the potential to improve pathology-aided diagnosis and to broaden the current boundary of digital pathology based on pathological image features.
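
A minimal sketch of the switched-attention idea follows: each modality's spatial attention map re-weights the other modality's features before fusion, which is one way to couple the semantics of the two streams. Layer sizes and the sigmoid spatial attention are illustrative assumptions rather than the published DPMFNet module.

```python
import torch
import torch.nn as nn

class SwitchedAttentionFusion(nn.Module):
    def __init__(self, ch=32):
        super().__init__()
        self.att_a = nn.Sequential(nn.Conv2d(ch, 1, 1), nn.Sigmoid())
        self.att_b = nn.Sequential(nn.Conv2d(ch, 1, 1), nn.Sigmoid())
        self.fuse = nn.Conv2d(2 * ch, ch, 3, padding=1)

    def forward(self, feat_pathology, feat_polarization):
        a = self.att_a(feat_pathology)      # attention map from the pathology-image stream
        b = self.att_b(feat_polarization)   # attention map from the polarization stream
        # Switch: each stream is re-weighted by the *other* stream's attention map.
        return self.fuse(torch.cat([feat_pathology * b, feat_polarization * a], dim=1))

x, y = torch.randn(1, 32, 56, 56), torch.randn(1, 32, 56, 56)
print(SwitchedAttentionFusion()(x, y).shape)    # torch.Size([1, 32, 56, 56])
```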


Subjects
Image Processing, Computer-Assisted; Semantics
6.
IEEE Trans Image Process ; 31: 6455-6470, 2022.
Article in English | MEDLINE | ID: mdl-36219663

ABSTRACT

With the help of convolutional neural networks (CNNs), deep learning-based methods have achieved remarkable performance in face super-resolution (FSR) task. Despite their success, most of the existing methods neglect non-local correlations of face images, leaving much room for improvement. In this paper, we introduce a novel end-to-end trainable attention-driven graph neural network (AD-GNN) for more discriminative feature extraction and feature relation modeling. This is achieved by two major components. The first component is a cross-scale dynamic graph (CDG) block. The CDG block considers cross-scale relationships of patches in distant areas and employs two dynamic graphs to construct enhanced features. The second component is a series of channel attention and spatial dynamic graph (CASDG) blocks. A CASDG block has a channel-wise attention unit and a spatial-aware dynamic graph (SDG) unit. The SDG unit extracts informative features by exploring spatial non-local self-similarity information of the patches using dynamic graph convolution. Using these two components, facial details can be effectively reconstructed with the help of information supplemented by similar but spatially remote patches and structural information of faces. Extensive experiments on two public benchmarks demonstrate the superiority of AD-GNN over the state-of-the-art FSR methods.
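
The sketch below illustrates dynamic graph convolution over patch features, the mechanism that lets spatially remote but similar patches supplement facial details: a k-NN graph is rebuilt from the current features and neighbors are aggregated EdgeConv-style. It is an assumed stand-in for the SDG unit, not the authors' implementation.

```python
import torch
import torch.nn as nn

def dynamic_graph_conv(x, mlp, k=8):
    # x: [B, N, C] patch features; rebuild a k-NN graph in feature space on the fly.
    dist = torch.cdist(x, x)                                   # [B, N, N] pairwise distances
    idx = dist.topk(k + 1, largest=False).indices[..., 1:]     # nearest neighbors, minus self
    b, n, c = x.shape
    neighbors = torch.gather(x.unsqueeze(1).expand(b, n, n, c), 2,
                             idx.unsqueeze(-1).expand(b, n, k, c))   # [B, N, k, C]
    center = x.unsqueeze(2).expand_as(neighbors)
    edge = torch.cat([center, neighbors - center], dim=-1)     # EdgeConv-style edge feature
    return mlp(edge).max(dim=2).values                         # aggregate over neighbors

mlp = nn.Sequential(nn.Linear(2 * 32, 32), nn.ReLU(), nn.Linear(32, 32))
patches = torch.randn(2, 49, 32)                               # 49 patches, 32-dim features
print(dynamic_graph_conv(patches, mlp).shape)                  # torch.Size([2, 49, 32])
```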


Subjects
Algorithms; Image Processing, Computer-Assisted; Image Processing, Computer-Assisted/methods; Neural Networks, Computer; Benchmarking; Attention
7.
Annu Int Conf IEEE Eng Med Biol Soc ; 2022: 2062-2065, 2022 07.
Article in English | MEDLINE | ID: mdl-36085646

ABSTRACT

With the rapid development of the world economy and rising living standards, the number of diabetic patients has been growing quickly. Meanwhile, the complications of diabetes, especially retinopathy, seriously affect patients' daily lives. The only way to prevent the disease from worsening, and even leading to blindness, is to diagnose it as early as possible. However, it is infeasible for professionals to diagnose all patients from their fundus images, which makes automatic systems an attractive solution. We therefore present a novel network that learns the features of diabetic retinopathy (DR) and its complication, diabetic macular edema (DME), together with the relationship between them, focuses on vital areas in the images, and outputs the grades of the two diseases simultaneously. Experimental results further demonstrate the effectiveness of the proposed module compared with the previous joint grading network.


Subjects
Diabetes Mellitus; Diabetic Retinopathy; Macular Edema; Blindness; Diabetic Retinopathy/diagnosis; Fundus Oculi; Humans; Learning; Macular Edema/diagnosis; Macular Edema/etiology
8.
Appl Opt ; 61(32): 9666-9673, 2022 Nov 10.
Article in English | MEDLINE | ID: mdl-36606907

ABSTRACT

Soft joint shape measurement is challenging because, in most cases, it relies solely on internal sensors. Existing shape estimation methods commonly take measurements at discrete points and utilize curve-fitting schemes, which are inefficient for complex joint shapes that require continuous measurements. Therefore, joint shape measurement sensors rely on the fiber Bragg grating (FBG) owing to its sensitivity, immunity to electromagnetic interference, and flexibility. Nevertheless, FBG demodulation remains an open research problem. Hence, we propose a shape measurement device suited to FBG-based continuous measurements, employing a sensor with only three FBGs inserted into the soft joint to measure its 3D shape. Moreover, we develop a simple demodulation system that exploits the overlapping properties of the FBG filters and design a calibration process for the FBG signals. Soft joint shape measurement experiments highlight the effectiveness of our method, providing a relative error within 0.7%. Further tests on continuum robot measurement show that the achieved precision is on the same level as a motion-capture system.
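
For readers unfamiliar with FBG shape sensing, the sketch below shows one standard way three gratings can yield a section's curvature and bending direction: with fibers spaced 120° around the joint at radius r, the bending strains follow roughly eps_i ≈ kappa·r·cos(phi − theta_i) plus a common axial term, which can be solved per section and integrated into a shape under a piecewise-constant-curvature assumption. The geometry, sign convention, and numbers are illustrative, not the paper's calibration.

```python
import numpy as np

def curvature_from_strains(eps, r=0.5e-3):
    # eps: strains of the three FBGs in one section; r: fiber offset from the neutral axis.
    theta = np.deg2rad([0.0, 120.0, 240.0])
    # Model eps_i = a*cos(theta_i) + b*sin(theta_i) + c and solve for (a, b, c).
    A = np.column_stack([np.cos(theta), np.sin(theta), np.ones(3)])
    a, b, _ = np.linalg.lstsq(A, eps, rcond=None)[0]
    kappa = np.hypot(a, b) / r        # curvature magnitude of this section
    phi = np.arctan2(b, a)            # direction of the bending plane
    return kappa, phi

eps = np.array([120e-6, -40e-6, -80e-6])   # example micro-strain readings for one section
kappa, phi = curvature_from_strains(eps)
print(f"curvature {kappa:.2f} 1/m, bending direction {np.rad2deg(phi):.1f} deg")
```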

9.
IEEE Trans Neural Netw Learn Syst ; 33(8): 3448-3460, 2022 Aug.
Article in English | MEDLINE | ID: mdl-33523819

ABSTRACT

Object detection has made enormous progress and has been widely used in many applications. However, it performs poorly when only limited training data is available for novel classes that the model has never seen before. Most existing approaches solve few-shot detection tasks implicitly without directly modeling the detectors for novel classes. In this article, we propose GenDet, a new meta-learning-based framework that can effectively generate object detectors for novel classes from few shots and, thus, conducts few-shot detection tasks explicitly. The detector generator is trained by numerous few-shot detection tasks sampled from base classes each with sufficient samples, and thus, it is expected to generalize well on novel classes. An adaptive pooling module is further introduced to suppress distracting samples and aggregate the detectors generated from multiple shots. Moreover, we propose to train a reference detector for each base class in the conventional way, with which to guide the training of the detector generator. The reference detectors and the detector generator can be trained simultaneously. Finally, the generated detectors of different classes are encouraged to be orthogonal to each other for better generalization. The proposed approach is extensively evaluated on the ImageNet, VOC, and COCO data sets under various few-shot detection settings, and it achieves new state-of-the-art results.
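
The sketch below illustrates the two ingredients named above in simplified form: a generator that turns a few support features into normalized per-class detector weights, and an orthogonality penalty that pushes different classes' detectors apart. The averaging generator and cosine penalty are assumptions standing in for GenDet's generator and adaptive pooling module.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DetectorGenerator(nn.Module):
    def __init__(self, feat_dim=128):
        super().__init__()
        self.gen = nn.Sequential(nn.Linear(feat_dim, feat_dim), nn.ReLU(),
                                 nn.Linear(feat_dim, feat_dim))

    def forward(self, support_feats):
        # support_feats: [num_classes, num_shots, feat_dim]
        w = self.gen(support_feats).mean(dim=1)     # aggregate the shots into one detector
        return F.normalize(w, dim=-1)               # unit-norm classifier weights per class

def orthogonality_loss(weights):
    # Penalize similarity between the detectors generated for different classes.
    gram = weights @ weights.t()
    off_diag = gram - torch.diag(torch.diag(gram))
    return off_diag.pow(2).mean()

gen = DetectorGenerator()
detectors = gen(torch.randn(5, 3, 128))             # 5 novel classes, 3 shots each
scores = torch.randn(10, 128) @ detectors.t()       # score 10 region features
print(scores.shape, orthogonality_loss(detectors).item())
```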

10.
IEEE J Biomed Health Inform ; 26(4): 1628-1639, 2022 04.
Article in English | MEDLINE | ID: mdl-34543208

ABSTRACT

Brain stroke lesion segmentation is of great importance for stroke rehabilitation neuroimaging analysis. Due to the large variance of stroke lesion shapes and the similarity of tissue intensity distributions, it remains a challenging task. To help detect abnormalities, the anatomical symmetries of brain magnetic resonance (MR) images have been widely used as visual cues in clinical practice. However, most methods for brain image segmentation do not fully utilize structural symmetry information. This paper presents a novel mirror difference aware network (MDAN) for stroke lesion segmentation. The network uses an encoder-decoder architecture, aiming at holistically exploiting the symmetries of image features. Specifically, a differential feature augmentation (DFA) module is developed in the encoding path to highlight the semantically pathological asymmetries of features in abnormalities. In the DFA module, a Siamese contrastive supervised loss is designed to enhance discriminative features, and a mirror position-based difference augmentation (MDA) module is used to further magnify the discrepancy. Moreover, mirror feature fusion (MFF) modules are applied to efficiently fuse and transfer the information of both the original input and the horizontally flipped features to the decoding path. Extensive experiments on the Anatomical Tracings of Lesions After Stroke (ATLAS) dataset show that the proposed MDAN outperforms the state-of-the-art methods.
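
The mirror-difference cue can be sketched as below: features of the image and of its horizontally flipped copy are compared position-by-position, so the difference is large wherever the two hemispheres disagree, e.g., at a lesion. The tiny encoder is an illustrative placeholder rather than the DFA module.

```python
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
                        nn.Conv2d(16, 16, 3, padding=1))

def mirror_difference(mr_slice):
    # mr_slice: [B, 1, H, W] axial brain MR slice, roughly midline-aligned.
    flipped = torch.flip(mr_slice, dims=[-1])       # mirror across the midsagittal axis
    f_orig, f_mirror = encoder(mr_slice), encoder(flipped)
    # At each position f_mirror holds the features of the contralateral location, so the
    # absolute difference is large wherever the two hemispheres disagree.
    return torch.abs(f_orig - f_mirror)

x = torch.randn(2, 1, 128, 128)
print(mirror_difference(x).shape)                   # torch.Size([2, 16, 128, 128])
```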


Subjects
Image Processing, Computer-Assisted; Stroke; Brain/diagnostic imaging; Humans; Image Processing, Computer-Assisted/methods; Magnetic Resonance Imaging; Neural Networks, Computer; Stroke/diagnostic imaging
11.
IEEE Trans Image Process ; 31: 216-226, 2022.
Article in English | MEDLINE | ID: mdl-34793301

ABSTRACT

Unlike object motion blur, defocus blur is caused by the limited depth of field of the camera. The defocus amount can be characterized by the parameter of the point spread function and thus forms a defocus map. In this paper, we propose a new network architecture called Defocus Image Deblurring Auxiliary Learning Net (DID-ANet), which is specifically designed for single-image defocus deblurring by using defocus map estimation as an auxiliary task to improve the deblurring result. To facilitate the training of the network, we build a novel large-scale dataset for single-image defocus deblurring, which contains the defocus images, the defocus maps and the all-sharp images. To the best of our knowledge, this is the first large-scale defocus deblurring dataset for training deep networks. Moreover, the experimental results demonstrate that the proposed DID-ANet outperforms the state-of-the-art methods for both defocus image deblurring and defocus map estimation, both quantitatively and qualitatively. The dataset, code, and model are available on GitHub: https://github.com/xytmhy/DID-ANet-Defocus-Deblurring.
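
The auxiliary-learning setup can be sketched as a shared encoder with two heads, one for the sharp image and one for the defocus map, trained with a summed loss; the layer sizes and the 0.5 auxiliary weight below are illustrative assumptions, not the DID-ANet architecture.

```python
import torch
import torch.nn as nn

class DeblurWithAuxiliaryDefocus(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
                                     nn.Conv2d(32, 32, 3, padding=1), nn.ReLU())
        self.deblur_head = nn.Conv2d(32, 3, 3, padding=1)    # main task: sharp image
        self.defocus_head = nn.Conv2d(32, 1, 3, padding=1)   # auxiliary task: defocus map

    def forward(self, x):
        f = self.encoder(x)
        return self.deblur_head(f), self.defocus_head(f)

model = DeblurWithAuxiliaryDefocus()
blurry = torch.randn(2, 3, 64, 64)
sharp_gt, defocus_gt = torch.randn(2, 3, 64, 64), torch.randn(2, 1, 64, 64)
pred_sharp, pred_map = model(blurry)
loss = nn.functional.l1_loss(pred_sharp, sharp_gt) \
     + 0.5 * nn.functional.l1_loss(pred_map, defocus_gt)     # auxiliary supervision
loss.backward()
```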

12.
Appl Opt ; 60(22): 6682-6694, 2021 Aug 01.
Article in English | MEDLINE | ID: mdl-34612912

ABSTRACT

Different from conventional microimaging techniques, polarization imaging can generate multiple polarization images in a single perspective by changing the polarization angle. However, how to efficiently fuse the information in these multiple polarization images with a convolutional neural network (CNN) is still a challenging problem. In this paper, we propose a hybrid 3D-2D convolutional neural network called MuellerNet to classify biological cells with Mueller matrix images (MMIs). MuellerNet includes a normal stream and a polarimetric stream, in which the first Mueller matrix image is taken as the input of the normal stream, and the remaining MMIs are stacked to form the input of the polarimetric stream. The normal stream is mainly constructed with a backbone network and, in the polarimetric stream, an attention mechanism is used to adaptively assign weights to different convolutional maps. To improve the network's discrimination, a loss function is introduced to simultaneously optimize the parameters of the two streams. Two Mueller matrix image datasets are built, covering four types of breast cancer cells and three types of algal cells, respectively. Experiments are conducted on these two datasets with many well-known and recent networks. Results show that the proposed network efficiently improves classification accuracy and helps to find discriminative features in MMIs.
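
A hedged sketch of the hybrid 3D-2D two-stream layout follows: the first MMI goes through a 2D "normal" stream, the remaining MMIs are stacked as a volume for a 3D "polarimetric" stream, and the pooled features are concatenated for classification. Channel counts and the plain concatenation fusion are assumptions, not the published MuellerNet.

```python
import torch
import torch.nn as nn

class TwoStreamMueller(nn.Module):
    def __init__(self, num_classes=4):
        super().__init__()
        self.normal = nn.Sequential(nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
                                    nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.polar = nn.Sequential(nn.Conv3d(1, 16, 3, padding=1), nn.ReLU(),
                                   nn.AdaptiveAvgPool3d(1), nn.Flatten())
        self.classifier = nn.Linear(16 + 16, num_classes)

    def forward(self, mmis):
        # mmis: [B, 16, H, W] -- the 16 Mueller matrix images of one sample.
        first = mmis[:, :1]                 # m11 image -> 2D normal stream
        rest = mmis[:, 1:].unsqueeze(1)     # remaining 15 MMIs as a volume -> [B, 1, 15, H, W]
        feat = torch.cat([self.normal(first), self.polar(rest)], dim=1)
        return self.classifier(feat)

print(TwoStreamMueller()(torch.randn(2, 16, 64, 64)).shape)   # torch.Size([2, 4])
```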

13.
IEEE Trans Neural Netw Learn Syst ; 32(10): 4742-4747, 2021 10.
Article in English | MEDLINE | ID: mdl-32857706

ABSTRACT

In deep face recognition, the commonly used softmax loss and its newly proposed variations are not yet sufficiently effective to handle the class imbalance and softmax saturation issues during the training process while extracting discriminative features. In this brief, to address both issues, we propose a class-variant margin (CVM) normalized softmax loss, by introducing a true-class margin and a false-class margin into the cosine space of the angle between the feature vector and the class-weight vector. The true-class margin alleviates the class imbalance problem, and the false-class margin postpones the early individual saturation of softmax. With negligible computational complexity increment during training, the new loss function is easy to implement in the common deep learning frameworks. Comprehensive experiments on the LFW, YTF, and MegaFace protocols demonstrate the effectiveness of the proposed CVM loss function.
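
The margin construction can be illustrated with the sketch below: logits are cosine similarities between normalized features and class weights, a per-class (class-variant) margin is subtracted from the target cosine, and a small margin is added to the non-target cosines before a scaled softmax cross-entropy. How the per-class margins are derived from class frequency, and the scale value, are assumptions here; the paper defines its own formulation.

```python
import torch
import torch.nn.functional as F

def cvm_softmax_loss(features, weights, labels, true_margin, false_margin=0.05, scale=30.0):
    # features: [B, D]; weights: [C, D]; true_margin: [C] per-class (class-variant) margin.
    cos = F.normalize(features, dim=1) @ F.normalize(weights, dim=1).t()   # [B, C] cosines
    one_hot = F.one_hot(labels, weights.shape[0]).float()
    margin_t = true_margin[labels].unsqueeze(1)        # margin of each sample's true class
    # Subtract the true-class margin, add a small false-class margin elsewhere.
    logits = cos - one_hot * margin_t + (1.0 - one_hot) * false_margin
    return F.cross_entropy(scale * logits, labels)

feats = torch.randn(8, 128)
w = torch.randn(100, 128, requires_grad=True)
labels = torch.randint(0, 100, (8,))
margins = torch.full((100,), 0.35)                     # e.g. larger margins for rarer classes
cvm_softmax_loss(feats, w, labels, margins).backward()
```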


Subjects
Automated Facial Recognition/trends; Deep Learning/trends; Neural Networks, Computer; Automated Facial Recognition/methods; Humans
14.
Annu Int Conf IEEE Eng Med Biol Soc ; 2020: 2129-2132, 2020 07.
Article in English | MEDLINE | ID: mdl-33018427

ABSTRACT

Cardiovascular diseases are the biggest threat to human health worldwide, and carotid atherosclerotic plaque is the leading cause of ischemic cardiovascular disease. To determine the location and shape of the plaque, detecting the intima-media (IM) is of great significance. In this paper, a new IM detection method based on a convolutional neural network (IMD-CNN) is proposed for detecting the IM of blood vessels in longitudinal ultrasound images. In IMD-CNN, the region of interest (ROI) is first extracted automatically by morphological processing, then patch-wise training data are constructed, and finally a simple CNN is trained to detect the IM. Experimental results obtained on 23 images show that the test accuracy of IMD-CNN is over 86%, and its effectiveness is also confirmed visually.
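
The patch-wise pipeline can be sketched as below: small windows cut from the ultrasound ROI are classified by a simple CNN as intima-media versus background. The patch size and the tiny architecture are illustrative assumptions.

```python
import torch
import torch.nn as nn

class PatchCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(8, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Flatten(), nn.Linear(16 * 8 * 8, 2))     # two classes: IM vs. background

    def forward(self, patch):                           # patch: [B, 1, 32, 32]
        return self.net(patch)

def extract_patches(roi, size=32, stride=16):
    # roi: [1, H, W] grayscale ROI; slide a window over it to build patch-wise samples.
    return roi.unfold(1, size, stride).unfold(2, size, stride).reshape(-1, 1, size, size)

roi = torch.rand(1, 128, 256)
patches = extract_patches(roi)
print(PatchCNN()(patches).shape)                        # [num_patches, 2]
```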


Subjects
Communications Media; Plaque, Atherosclerotic; Carotid Intima-Media Thickness; Humans; Neural Networks, Computer; Plaque, Atherosclerotic/diagnostic imaging; Ultrasonography
15.
Article in English | MEDLINE | ID: mdl-32845840

ABSTRACT

Capturing an all-in-focus image with a single camera is difficult since the depth of field of the camera is usually limited. An alternative method to obtain the all-in-focus image is to fuse several images that are focused at different depths. However, existing multi-focus image fusion methods cannot obtain clear results for areas near the focused/defocused boundary (FDB). In this paper, a novel α-matte boundary defocus model is proposed to generate realistic training data with the defocus spread effect precisely modeled, especially for areas near the FDB. Based on this α-matte defocus model and the generated data, a cascaded boundary-aware convolutional network termed MMF-Net is proposed and trained, aiming to achieve clearer fusion results around the FDB. Specifically, the MMF-Net consists of two cascaded subnets for initial fusion and boundary fusion. These two subnets are designed to first obtain a guidance map of FDB and then refine the fusion near the FDB. Experiments demonstrate that with the help of the new α-matte boundary defocus model, the proposed MMF-Net outperforms the state-of-the-art methods both qualitatively and quantitatively.
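
The α-matte idea for synthesizing training data can be sketched as a soft blend of a sharp and a defocused image around the boundary, with α obtained by feathering the focus mask to imitate the defocus spread effect; the Gaussian feathering below is an assumed stand-in for the paper's exact matte model.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def synthesize_pair(sharp, defocused, focus_mask, spread_sigma=3.0):
    # sharp, defocused: [H, W] images; focus_mask: 1 where source A is in focus.
    alpha = gaussian_filter(focus_mask.astype(float), sigma=spread_sigma)   # soft matte
    source_a = alpha * sharp + (1.0 - alpha) * defocused     # in focus where alpha is high
    source_b = (1.0 - alpha) * sharp + alpha * defocused     # complementary focus
    return source_a, source_b, alpha

sharp = np.random.rand(64, 64)
defocused = gaussian_filter(sharp, sigma=2.0)
mask = np.zeros((64, 64))
mask[:, :32] = 1.0                                           # left half in focus in source A
a, b, alpha = synthesize_pair(sharp, defocused, mask)
print(a.shape, float(alpha.min()), float(alpha.max()))
```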

16.
Article in English | MEDLINE | ID: mdl-31567083

ABSTRACT

In this paper, we develop a concise but efficient network architecture called the linear compressing based skip-connecting network (LCSCNet) for image super-resolution. Compared with two representative network architectures with skip connections, ResNet and DenseNet, a linear compressing layer is designed in LCSCNet for the skip connection, which carries former feature maps forward while distinguishing them from newly explored feature maps. In this way, the proposed LCSCNet enjoys both the distinctive feature treatment of DenseNet and the parameter-economic form of ResNet. Moreover, to better exploit hierarchical information from both low and high levels of various receptive fields in deep models, inspired by the gate units in LSTM, we also propose an adaptive element-wise fusion strategy with multi-supervised training. Experimental results in comparison with state-of-the-art algorithms validate the effectiveness of LCSCNet.
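
A linear compressing skip connection of this kind might look like the sketch below: previously produced feature maps are compressed by a 1×1 (linear) convolution and carried forward alongside newly explored features, keeping the block's input and output widths equal so blocks can be stacked. Channel numbers are assumptions for the example.

```python
import torch
import torch.nn as nn

class LCSCBlock(nn.Module):
    def __init__(self, in_ch=64, new_ch=32):
        super().__init__()
        self.compress = nn.Conv2d(in_ch, in_ch - new_ch, 1)   # linear compression of old maps
        self.explore = nn.Sequential(nn.Conv2d(in_ch, new_ch, 3, padding=1), nn.ReLU())

    def forward(self, x):
        skipped = self.compress(x)         # compressed summary of the former feature maps
        new = self.explore(x)              # newly explored feature maps
        return torch.cat([skipped, new], dim=1)    # same width in and out

x = torch.randn(1, 64, 48, 48)
block = LCSCBlock()
print(block(block(x)).shape)               # torch.Size([1, 64, 48, 48]) -- blocks stack
```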

17.
Annu Int Conf IEEE Eng Med Biol Soc ; 2019: 2576-2579, 2019 Jul.
Article in English | MEDLINE | ID: mdl-31946423

ABSTRACT

Sleep Apnea-Hypopnea Syndrome (SAHS) is a sleep-related breathing disorder involving a reduction of airflow in the breathing airway while patients sleep. However, a large proportion of patients remain undiagnosed and untreated, which can be life-threatening. In this paper, we propose an automatic SAHS event detection method based on a Long Short-Term Memory (LSTM) network using nasal airway pressure and temperature signals from a clinical polysomnography (PSG) dataset. Focusing on the temporal location of the events, we first segment the two signal channels into a series of sequences by feature extraction. Second, an LSTM network is built and these sequences are fed into it for SAHS event classification. Experimental results on both our clinical PSG dataset and the public MIT-BIH PSG database show that our method is promising in terms of recall.
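
The detection stage can be sketched as an LSTM over the per-segment feature sequences, with the final hidden state classified into event classes; the sequence length, feature count, and three-class head below are illustrative assumptions.

```python
import torch
import torch.nn as nn

class SahsLstm(nn.Module):
    def __init__(self, n_features=8, hidden=32, n_classes=3):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, seq):                 # seq: [B, T, n_features] per-segment features
        out, _ = self.lstm(seq)
        return self.head(out[:, -1])        # classify the whole segment from the last step

segments = torch.randn(4, 30, 8)            # 4 segments, 30 time steps, 8 features each
print(SahsLstm()(segments).shape)           # torch.Size([4, 3]) -- e.g. normal/apnea/hypopnea
```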


Subjects
Respiratory Rate; Sleep Apnea, Obstructive/diagnosis; Algorithms; Humans; Polysomnography; Signal Processing, Computer-Assisted; Sleep
18.
Annu Int Conf IEEE Eng Med Biol Soc ; 2019: 4741-4744, 2019 Jul.
Article in English | MEDLINE | ID: mdl-31946921

ABSTRACT

Dynamic Optical Breast Imaging is a promising breast cancer diagnosis technology based on detecting tumor angiogenesis or vascular change, which generally causes increased blood volume in the tumor. By applying sustained pressure to breast tissue under red light, tissue with abnormal vascularization exhibits different dynamic behavior of its optical properties compared with normal breast tissue. In this paper, we explore a comprehensive classification method to discriminate malignant from benign lesions based on Dynamic Optical Breast Imaging. First, we propose an automatic Point of Reference (POR) and Point of Interest (POI) selection algorithm from the input images for the subsequent comparison procedures. Second, an automatic Margin Shape Pattern (MSP) recognition algorithm for the Region of Interest (ROI) is explored. Furthermore, we define a new temporal and contextual feature named the Temporal Curves Similarity Index (TCSI), with the aim of better characterizing and quantifying differences within the same breast. Finally, a Support Vector Machine (SVM) is used for the comprehensive classification. Experimental results on our clinical database, with a sensitivity of 91% and a specificity of 71%, verify the effectiveness of the proposed method.
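
The final classification stage can be pictured with the toy sketch below, where a single TCSI-like feature (here just the Pearson correlation between the POI and POR intensity curves) feeds an SVM; the synthetic curves, feature choice, and SVM settings are assumptions and do not reproduce the clinical feature set.

```python
import numpy as np
from sklearn.svm import SVC

def curve_similarity(poi_curve, por_curve):
    # Low similarity between the lesion (POI) and reference (POR) intensity curves suggests
    # abnormal vascular dynamics under the applied pressure.
    return float(np.corrcoef(poi_curve, por_curve)[0, 1])

rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 50)
por = np.exp(-t)                                        # toy reference-point intensity curve
samples, labels = [], []
for malignant in rng.integers(0, 2, 40):
    poi = np.exp(-t) + (0.8 * np.sin(4 * np.pi * t) if malignant else 0.0)
    poi = poi + 0.05 * rng.standard_normal(t.size)
    samples.append([curve_similarity(poi, por)])        # single TCSI-like feature
    labels.append(int(malignant))

clf = SVC(kernel="rbf").fit(samples, labels)
print(clf.score(samples, labels))                       # training accuracy of the toy model
```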


Subjects
Algorithms; Breast Neoplasms/diagnostic imaging; Pattern Recognition, Automated; Support Vector Machine; Breast/diagnostic imaging; Female; Humans; Sensitivity and Specificity
19.
IEEE Trans Image Process ; 27(9): 4232-4244, 2018 Sep.
Article in English | MEDLINE | ID: mdl-29870344

ABSTRACT

Full-reference image quality assessment algorithms usually perform comparisons of features extracted from square patches. These patches do not have any visual meanings. On the contrary, a superpixel is a set of image pixels that share similar visual characteristics and is thus perceptually meaningful. Features from superpixels may improve the performance of image quality assessment. Inspired by this, we propose a new superpixel-based similarity index by extracting perceptually meaningful features and revising similarity measures. The proposed method evaluates image quality on the basis of three measurements, namely, superpixel luminance similarity, superpixel chrominance similarity, and pixel gradient similarity. The first two measurements assess the overall visual impression on local images. The third measurement quantifies structural variations. The impact of superpixel-based regional gradient consistency on image quality is also analyzed. Distorted images showing high regional gradient consistency with the corresponding reference images are visually appreciated. Therefore, the three measurements are further revised by incorporating the regional gradient consistency into their computations. A weighting function that indicates superpixel-based texture complexity is utilized in the pooling stage to obtain the final quality score. Experiments on several benchmark databases demonstrate that the proposed method is competitive with the state-of-the-art metrics.
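
The similarity measurements build on the standard similarity form s = (2·f1·f2 + c)/(f1² + f2² + c); the toy sketch below applies it to per-superpixel luminance and a crude gradient term and pools the scores, omitting the actual superpixel segmentation and texture-complexity weighting, so it is an illustration of the formula rather than the proposed index.

```python
import numpy as np

def similarity(f1, f2, c=1e-4):
    return (2.0 * f1 * f2 + c) / (f1 ** 2 + f2 ** 2 + c)

def toy_quality_score(ref, dist, labels):
    # labels: superpixel index per pixel (e.g. from a SLIC segmentation, assumed given here).
    scores = []
    for sp in np.unique(labels):
        m = labels == sp
        lum = similarity(ref[m].mean(), dist[m].mean())   # superpixel luminance similarity
        grad = similarity(np.abs(np.gradient(ref)[0][m]).mean(),
                          np.abs(np.gradient(dist)[0][m]).mean())  # crude gradient term
        scores.append(lum * grad)
    return float(np.mean(scores))          # unweighted pooling stands in for texture weighting

ref = np.random.rand(32, 32)
dist = ref + 0.05 * np.random.randn(32, 32)
labels = np.arange(32 * 32).reshape(32, 32) // 256       # four dummy "superpixels"
print(toy_quality_score(ref, dist, labels))
```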

20.
Sensors (Basel) ; 17(6), 2017 Jun 20.
Article in English | MEDLINE | ID: mdl-28632166

ABSTRACT

In speech separation tasks, many separation methods require the microphones to be closely spaced and cannot cope with phase wrap-around. In this paper, we present a novel two-microphone speech separation scheme that does not have this restriction. The technique utilizes the estimation of interaural time difference (ITD) statistics and a binary time-frequency mask for the separation of mixed speech sources. The novelties of the paper are: (1) the extended application of delay-and-sum beamforming (DSB) and a cosine function for ITD calculation; and (2) the clarification of the connection between the ideal binary mask and the DSB amplitude ratio. Our objective quality evaluation experiments demonstrate the effectiveness of the proposed method.
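
The ITD-plus-binary-mask pipeline can be sketched as below: each time-frequency bin's ITD is estimated from the phase difference of the two microphone spectrograms, and bins are assigned to the candidate source with the nearest ITD. The frame sizes and the nearest-ITD rule are illustrative assumptions; the paper's DSB-based statistics and cosine-function calculation are not reproduced here.

```python
import numpy as np
from scipy.signal import stft, istft

def itd_binary_mask(mic1, mic2, source_itds, fs=16000, nfft=512):
    f, _, X1 = stft(mic1, fs=fs, nperseg=nfft)
    _, _, X2 = stft(mic2, fs=fs, nperseg=nfft)
    phase_diff = np.angle(X1 * np.conj(X2))                 # per-bin inter-channel phase
    with np.errstate(divide="ignore", invalid="ignore"):
        itd = phase_diff / (2.0 * np.pi * f[:, None])       # seconds; undefined at f = 0
    itd[0, :] = 0.0
    # Assign each bin to the candidate source with the nearest ITD -> binary masks.
    dist = np.abs(itd[None, :, :] - np.array(source_itds)[:, None, None])
    masks = (dist == dist.min(axis=0, keepdims=True)).astype(float)
    return [istft(m * X1, fs=fs, nperseg=nfft)[1] for m in masks]

t = np.arange(16000) / 16000.0
mic1 = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 1000 * t)
mic2 = np.roll(mic1, 3)                                     # crude 3-sample inter-mic delay
separated = itd_binary_mask(mic1, mic2, source_itds=[0.0, 3 / 16000])
print(len(separated), separated[0].shape)
```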


Subjects
Speech Perception; Humans; Speech