Results 1 - 20 of 29
1.
IEEE Trans Image Process; 33: 3722-3734, 2024.
Article in English | MEDLINE | ID: mdl-38857135

ABSTRACT

Novel view synthesis aims at rendering images from arbitrary viewpoints given sparse observations of a scene. Recently, neural radiance fields (NeRF) have demonstrated their effectiveness in synthesizing novel views of a bounded scene. However, most existing methods cannot be directly extended to 360° unbounded scenes, where camera orientations and scene depths are unconstrained and vary widely. In this paper, we present a spherical radiance field (SRF) for efficient novel view synthesis in 360° unbounded scenes. Specifically, we represent a 3D scene as multiple concentric spheres with different radii. Each sphere encodes its corresponding scene layer into an implicit representation and is parameterized with an equirectangular projection image. A shallow multi-layer perceptron (MLP) then infers density and color from these sphere representations for volume rendering. Moreover, an occupancy grid is introduced to cache the density field and guide ray sampling, which accelerates training and rendering by reducing the number of samples along each ray. Experiments show that our method fits 360° unbounded scenes well and produces state-of-the-art results on three benchmark datasets with less than 30 minutes of training time on a 3090 GPU, surpassing Mip-NeRF 360 with a 400× speedup. In addition, our method achieves competitive accuracy and efficiency on a bounded dataset. Project page: https://minglin-chen.github.io/SphericalRF.
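
As a rough illustration of the sphere parameterization described above, the sketch below (Python, not the authors' code) maps a sample point along a ray to the nearest concentric sphere and an equirectangular (u, v) texel; the shell radii and the helper name are assumptions.

import numpy as np

def point_to_sphere_coords(p, radii):
    """Map a 3D point to (sphere index, equirectangular u, v).

    Illustrative only: the sphere index is the nearest shell radius, and
    (u, v) come from the longitude/latitude of the point's direction.
    """
    r = np.linalg.norm(p)
    idx = int(np.argmin(np.abs(np.asarray(radii) - r)))   # nearest concentric sphere
    theta = np.arctan2(p[1], p[0])                         # longitude in [-pi, pi]
    phi = np.arcsin(np.clip(p[2] / max(r, 1e-8), -1, 1))   # latitude in [-pi/2, pi/2]
    u = (theta + np.pi) / (2 * np.pi)                      # equirectangular u in [0, 1]
    v = (phi + np.pi / 2) / np.pi                          # equirectangular v in [0, 1]
    return idx, u, v

radii = [1.0, 2.0, 4.0, 8.0]                               # hypothetical shell radii
print(point_to_sphere_coords(np.array([1.5, 0.5, 0.3]), radii))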

2.
Article in English | MEDLINE | ID: mdl-38557631

ABSTRACT

Recent years have witnessed great advances in deep neural networks (DNNs) for light field (LF) image super-resolution (SR). However, existing DNN-based LF image SR methods are developed for a single fixed degradation (e.g., bicubic downsampling) and thus cannot be applied to super-resolve real LF images with diverse degradations. In this article, we propose a simple yet effective method for real-world LF image SR. In our method, a practical LF degradation model is developed to formulate the degradation process of real LF images. A convolutional neural network is then designed to incorporate the degradation prior into the SR process. By training on LF images degraded with our formulated model, our network learns to modulate different degradations while incorporating both spatial and angular information in LF images. Extensive experiments on both synthetically degraded and real-world LF images demonstrate the effectiveness of our method. Compared with existing state-of-the-art single and LF image SR methods, our method achieves superior SR performance under a wide range of degradations and generalizes better to real LF images. Codes and models are available at https://yingqianwang.github.io/LF-DMnet/.
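
The degradation model is the core idea here. The following minimal sketch shows a typical blur-downsample-noise pipeline applied to one sub-aperture view; the parameter values are placeholders, not the paper's calibrated LF degradation model.

import numpy as np
from scipy.ndimage import gaussian_filter, zoom

def degrade_sub_aperture(img, blur_sigma=1.5, scale=0.25, noise_std=0.01, seed=0):
    """Toy degradation for one light-field sub-aperture view:
    isotropic Gaussian blur -> downsampling -> additive Gaussian noise.
    """
    rng = np.random.default_rng(seed)
    blurred = gaussian_filter(img, sigma=blur_sigma)
    low_res = zoom(blurred, scale, order=3)          # cubic-spline resampling
    noisy = low_res + rng.normal(0, noise_std, low_res.shape)
    return np.clip(noisy, 0.0, 1.0)

hr_view = np.random.rand(128, 128)                   # hypothetical HR sub-aperture image
lr_view = degrade_sub_aperture(hr_view)
print(hr_view.shape, "->", lr_view.shape)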

3.
Article in English | MEDLINE | ID: mdl-38478434

ABSTRACT

Visual speech, referring to the visual domain of speech, has attracted increasing attention due to its wide applications, such as public security, medical treatment, military defense, and film entertainment. Deep learning techniques have greatly advanced the development of visual speech learning. Over the past five years, numerous deep learning based methods have been proposed to address various problems in this area, especially automatic visual speech recognition and generation. To push forward future research on visual speech, this paper presents a comprehensive review of recent progress in deep learning methods for visual speech analysis. We cover different aspects of visual speech, including fundamental problems, challenges, benchmark datasets, a taxonomy of existing methods, and state-of-the-art performance. We also identify gaps in current research and discuss promising directions for future work.

4.
Article in English | MEDLINE | ID: mdl-38315589

ABSTRACT

Recently, memory-based networks have achieved promising performance for video object segmentation (VOS). However, existing methods still suffer from unsatisfactory segmentation accuracy and inferior efficiency. The reasons are mainly twofold: 1) during memory construction, the inflexible memory storage mechanism results in a weak discriminative ability for similar appearances in complex scenarios, leading to video-level temporal redundancy, and 2) during memory reading, matching robustness and memory retrieval accuracy decrease as the number of video frames increases. To address these challenges, we propose an adaptive sparse memory network (ASM) that efficiently and effectively performs VOS by sparsely leveraging previous guidance while attending to key information. Specifically, we design an adaptive sparse memory constructor (ASMC) to adaptively memorize informative past frames according to dynamic temporal changes in video frames. Furthermore, we introduce an attentive local memory reader (ALMR) to quickly retrieve relevant information using a subset of memory, thereby reducing frame-level redundant computation and noise in a simpler and more convenient manner. To prevent key features from being discarded by the subset of memory, we further propose a novel attentive local feature aggregation (ALFA) module, which preserves useful cues by selectively aggregating discriminative spatial dependence from adjacent frames, thereby effectively increasing the receptive field of each memory frame. Extensive experiments demonstrate that our model achieves state-of-the-art performance with real-time speed on six popular VOS benchmarks. Furthermore, our ASM can be applied to existing memory-based methods as generic plugins to achieve significant performance improvements. More importantly, our method exhibits robustness in handling sparse videos with low frame rates.
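
To make the adaptive memory construction concrete, here is a toy sketch of selecting which past frames to memorize based on how much their features change; the threshold, distance metric, and function names are assumptions rather than the paper's ASMC design.

import numpy as np

def select_memory_frames(frame_feats, change_thresh=0.3):
    """Toy adaptive memory construction: keep a frame only when its feature
    differs enough from the last memorized frame.
    """
    memory = [0]                                  # always memorize the annotated first frame
    for t in range(1, len(frame_feats)):
        last = frame_feats[memory[-1]]
        change = np.linalg.norm(frame_feats[t] - last) / (np.linalg.norm(last) + 1e-8)
        if change > change_thresh:
            memory.append(t)
    return memory

feats = [np.random.rand(256) for _ in range(30)]   # hypothetical per-frame features
print(select_memory_frames(feats))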

5.
Nutr Rev; 82(5): 654-663, 2024 Apr 12.
Article in English | MEDLINE | ID: mdl-37587082

ABSTRACT

Studies have shown that exposure to fine particulate matter (PM2.5) affects various cells, systems, and organs in vivo and in vitro. PM2.5 adversely affects human health through mechanisms such as oxidative stress, inflammatory response, autophagy, ferroptosis, and endoplasmic reticulum stress. Phytochemicals are of interest for their broad range of physiological activities and few side effects, and in recent years they have been widely used to mitigate the adverse effects caused by PM2.5 exposure. This review summarizes the roles of various phytochemicals, including polyphenols, carotenoids, organic sulfur compounds, and saponins, in mitigating PM2.5-induced adverse reactions through different molecular mechanisms, including anti-inflammatory and antioxidant activity, inhibition of endoplasmic reticulum stress and ferroptosis, and regulation of autophagy. These findings provide a scientific basis for the prevention and treatment of diseases caused by PM2.5.


Subjects
Oxidative Stress, Particulate Matter, Humans, Particulate Matter/toxicity, Antioxidants/pharmacology, Autophagy/physiology
6.
Article in English | MEDLINE | ID: mdl-37976190

ABSTRACT

Infrared small target (IRST) detection aims at separating targets from cluttered backgrounds. Although many deep learning-based single-frame IRST (SIRST) detection methods have achieved promising detection performance, they cannot handle extremely dim targets while suppressing clutter, since such targets are spatially indistinctive. Multiframe IRST (MIRST) detection can handle this problem well by fusing the temporal information of moving targets. However, extracting motion information is challenging since ordinary convolution is insensitive to motion direction. In this article, we propose a simple yet effective direction-coded temporal U-shape module (DTUM) for MIRST detection. Specifically, we build a motion-to-data mapping that distinguishes the motion of targets from that of clutter by indexing different directions. Based on this mapping, we further design a direction-coded convolution block (DCCB) to encode motion direction into features and extract the motion information of targets. Our DTUM can be attached to most single-frame networks to achieve MIRST detection. Moreover, in view of the lack of MIRST datasets containing dim targets, we build a multiframe infrared small and dim target dataset (namely, NUDT-MIRSDT) and propose several evaluation metrics. Experimental results on the NUDT-MIRSDT dataset demonstrate the effectiveness of our method, which achieves state-of-the-art performance in detecting infrared small and dim targets and suppressing false alarms. Our codes will be available at https://github.com/TinaLRJ/Multi-frame-infrared-small-target-detection-DTUM.
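
The motion-to-data mapping can be illustrated with a small sketch: shifting the previous frame along a few candidate directions and differencing against the current frame makes a moving target's direction visible in the channel whose shift matches its motion (that channel nearly cancels). The 4-direction coding and helper names below are assumptions, not the paper's DCCB.

import numpy as np

DIRECTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]     # assumed 4-direction coding

def direction_coded_features(prev_frame, curr_frame):
    """Toy direction coding: difference between the current frame and the
    previous frame shifted along each direction."""
    channels = []
    for dy, dx in DIRECTIONS:
        shifted = np.roll(np.roll(prev_frame, dy, axis=0), dx, axis=1)
        channels.append(curr_frame - shifted)
    return np.stack(channels, axis=0)               # (4, H, W) direction-coded map

prev = np.zeros((64, 64))
prev[30, 30] = 1.0                                  # dim point target
curr = np.zeros((64, 64))
curr[30, 31] = 1.0                                  # moved one pixel to the right
resp = direction_coded_features(prev, curr)
print([float(np.abs(c).sum()) for c in resp])       # right-shift channel has the smallest residual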

7.
Foods; 12(22), 2023 Nov 09.
Article in English | MEDLINE | ID: mdl-38002125

ABSTRACT

Today, with the globalization of the food trade progressing, food safety continues to warrant widespread attention. Foodborne diseases caused by contaminated food, including foodborne pathogens, seriously threaten public health and the economy. This has led to the development of more sensitive and accurate methods for detecting pathogenic bacteria. Many signal amplification techniques have been used to improve the sensitivity of foodborne pathogen detection. Among them, hybridization chain reaction (HCR), an isothermal nucleic acid hybridization signal amplification technique, has received increasing attention due to its enzyme-free and isothermal characteristics, and pathogenic bacteria detection methods using HCR for signal amplification have experienced rapid development in the last five years. In this review, we first describe the development of detection technologies for food contaminants represented by pathogens and introduce the fundamental principles, classifications, and characteristics of HCR. Furthermore, we highlight the application of various biosensors based on HCR nucleic acid amplification technology in detecting foodborne pathogens. Lastly, we summarize and offer insights into the prospects of HCR technology and its application in pathogen detection.

8.
IEEE Trans Image Process; 32: 3924-3938, 2023.
Article in English | MEDLINE | ID: mdl-37432823

ABSTRACT

Recently, memory-based methods have achieved remarkable progress in video object segmentation. However, segmentation performance is still limited by error accumulation and redundant memory, primarily because of 1) the semantic gap caused by similarity matching and memory reading via heterogeneous key-value encoding, and 2) the continuously growing and inaccurate memory that results from directly storing the unreliable predictions of all previous frames. To address these issues, we propose an efficient, effective, and robust segmentation method based on Isogenous Memory Sampling and Frame-Relation mining (IMSFR). Specifically, using an isogenous memory sampling module, IMSFR conducts memory matching and reading between sampled historical frames and the current frame in an isogenous space, minimizing the semantic gap while speeding up the model through efficient random sampling. Furthermore, to avoid losing key information during sampling, we design a frame-relation temporal memory module to mine inter-frame relations, thereby effectively preserving contextual information from the video sequence and alleviating error accumulation. Extensive experiments demonstrate the effectiveness and efficiency of the proposed IMSFR method. In particular, IMSFR achieves state-of-the-art performance on six commonly used benchmarks in terms of region similarity, contour accuracy, and speed. Our model also exhibits strong robustness against frame sampling due to its large receptive field.

9.
Analyst; 148(15): 3452-3459, 2023 Jul 26.
Article in English | MEDLINE | ID: mdl-37366080

ABSTRACT

ATP-based bioluminescence has advanced alongside new technologies for rapid, high-throughput bacterial detection. Because live bacteria contain ATP, bacterial counts correlate with ATP levels under certain conditions, so the luciferase-catalyzed luminescent reaction of luciferin with ATP is widely used to detect bacteria. This method is easy to operate, has a short detection cycle, requires little labor, and is suitable for long-term continuous monitoring. Currently, other methods are being explored in combination with bioluminescence for more accurate, portable, and efficient detection. This paper introduces the principle, development, and application of ATP-based bioluminescent bacterial detection and compares recent combinations of bioluminescence with other bacterial detection methods. In addition, it examines the development prospects and future directions of bioluminescence in bacterial detection, aiming to provide new ideas for the application of ATP-based bioluminescence.


Subjects
Adenosine Triphosphate, Luminescent Measurements, Humans, Luminescent Measurements/methods, Bacteria, Technology, Luciferins
10.
IEEE Trans Pattern Anal Mach Intell; 45(8): 9806-9821, 2023 Aug.
Article in English | MEDLINE | ID: mdl-37030771

ABSTRACT

We study the problem of extracting accurate correspondences for point cloud registration. Recent keypoint-free methods have shown great potential by bypassing the detection of repeatable keypoints, which is especially difficult in low-overlap scenarios. They seek correspondences over downsampled superpoints, which are then propagated to dense points. Superpoints are matched based on whether their neighboring patches overlap. Such sparse and loose matching requires contextual features that capture the geometric structure of the point clouds. We propose Geometric Transformer, or GeoTransformer for short, to learn geometric features for robust superpoint matching. It encodes pair-wise distances and triplet-wise angles, making it invariant to rigid transformation and robust in low-overlap cases. This simple design attains surprisingly high matching accuracy, so no RANSAC is required to estimate the alignment transformation, leading to a 100-fold acceleration. Extensive experiments on rich benchmarks encompassing indoor, outdoor, synthetic, multiway, and non-rigid settings demonstrate the efficacy of GeoTransformer. Notably, our method improves the inlier ratio by 18-31 percentage points and the registration recall by over 7 points on the challenging 3DLoMatch benchmark.
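
A minimal sketch of the two rotation-invariant cues mentioned above, pair-wise distances and triplet-wise angles between superpoints, is given below; the neighbour count k and the function names are assumptions, and this is only the geometric input, not GeoTransformer's learned embedding.

import numpy as np

def pairwise_distances(pts):
    """Rotation-invariant pairwise distances between superpoints, shape (N, N)."""
    diff = pts[:, None, :] - pts[None, :, :]
    return np.linalg.norm(diff, axis=-1)

def triplet_angles(pts, k=3):
    """For each pair (i, j), angles at i between (pts[j] - pts[i]) and the
    directions to i's k nearest neighbours; also invariant to rigid transforms.
    """
    d = pairwise_distances(pts)
    nn = np.argsort(d, axis=1)[:, 1:k + 1]                   # skip self
    n = len(pts)
    ang = np.zeros((n, n, k))
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            u = pts[j] - pts[i]
            for a, m in enumerate(nn[i]):
                v = pts[m] - pts[i]
                cosang = u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-8)
                ang[i, j, a] = np.arccos(np.clip(cosang, -1.0, 1.0))
    return ang

pts = np.random.rand(8, 3)                                    # hypothetical superpoints
print(pairwise_distances(pts).shape, triplet_angles(pts).shape)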


Subjects
Acceleration, Algorithms, Benchmarking, Learning
11.
IEEE Trans Pattern Anal Mach Intell; 45(8): 10376-10393, 2023 Aug.
Article in English | MEDLINE | ID: mdl-37022868

ABSTRACT

We present RoReg, a novel point cloud registration framework that fully exploits oriented descriptors and estimated local rotations in the whole registration pipeline. Previous methods mainly focus on extracting rotation-invariant descriptors for registration but unanimously neglect the orientations of descriptors. In this paper, we show that the oriented descriptors and the estimated local rotations are very useful in the whole registration pipeline, including feature description, feature detection, feature matching, and transformation estimation. Consequently, we design a novel oriented descriptor RoReg-Desc and apply RoReg-Desc to estimate the local rotations. Such estimated local rotations enable us to develop a rotation-guided detector, a rotation coherence matcher, and a one-shot-estimation RANSAC, all of which greatly improve the registration performance. Extensive experiments demonstrate that RoReg achieves state-of-the-art performance on the widely-used 3DMatch and 3DLoMatch datasets, and also generalizes well to the outdoor ETH dataset. In particular, we also provide in-depth analysis on each component of RoReg, validating the improvements brought by oriented descriptors and the estimated local rotations. Source code and supplementary material are available at https://github.com/HpWang-whu/RoReg.


Subjects
Algorithms, Software, Automated Pattern Recognition/methods
12.
IEEE Trans Pattern Anal Mach Intell; 45(1): 425-443, 2023 Jan.
Article in English | MEDLINE | ID: mdl-35180076

ABSTRACT

Light field (LF) cameras record both the intensity and directions of light rays, encoding 3D scenes into 4D LF images. Recently, many convolutional neural networks (CNNs) have been proposed for various LF image processing tasks. However, it is challenging for CNNs to effectively process LF images since the spatial and angular information is highly intertwined with varying disparities. In this paper, we propose a generic mechanism to disentangle this coupled information for LF image processing. Specifically, we first design a class of domain-specific convolutions to disentangle LFs along different dimensions, and then leverage these disentangled features in task-specific modules. Our disentangling mechanism incorporates the LF structure prior well and handles 4D LF data effectively. Based on the proposed mechanism, we develop three networks (i.e., DistgSSR, DistgASR, and DistgDisp) for spatial super-resolution, angular super-resolution, and disparity estimation. Experimental results show that our networks achieve state-of-the-art performance on all three tasks, demonstrating the effectiveness, efficiency, and generality of our disentangling mechanism. Project page: https://yingqianwang.github.io/DistgLF/.
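
To illustrate the disentangling idea, the sketch below applies a purely spatial convolution within each sub-aperture view and a purely angular convolution across views at each pixel; the kernel sizes and random weights are placeholders, not the DistgLF layers.

import torch
import torch.nn.functional as F

def disentangled_convs(lf):
    """Toy spatial/angular disentangling on a light field of shape (U, V, H, W).
    The spatial conv mixes pixels within each view; the angular conv mixes views
    at each pixel.
    """
    U, V, H, W = lf.shape
    spa_w = torch.randn(1, 1, 3, 3)                       # placeholder spatial kernel
    ang_w = torch.randn(1, 1, U, V)                       # placeholder angular kernel

    spatial = F.conv2d(lf.reshape(U * V, 1, H, W), spa_w, padding=1)
    spatial = spatial.reshape(U, V, H, W)

    angular = lf.permute(2, 3, 0, 1).reshape(H * W, 1, U, V)
    angular = F.conv2d(angular, ang_w)                    # collapses the angular plane
    angular = angular.reshape(H, W)                       # one angular response per pixel
    return spatial, angular

lf = torch.randn(5, 5, 32, 32)                            # hypothetical 5x5 light field
spa, ang = disentangled_convs(lf)
print(spa.shape, ang.shape)                               # (5, 5, 32, 32), (32, 32)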

13.
IEEE Trans Pattern Anal Mach Intell; 45(3): 3949-3967, 2023 Mar.
Article in English | MEDLINE | ID: mdl-35679385

ABSTRACT

Extracting distinctive, robust, and general 3D local features is essential to downstream tasks such as point cloud registration. However, existing methods either rely on noise-sensitive handcrafted features, or depend on rotation-variant neural architectures. It remains challenging to learn robust and general local feature descriptors for surface matching. In this paper, we propose a new, simple yet effective neural network, termed SpinNet, to extract local surface descriptors which are rotation-invariant whilst sufficiently distinctive and general. A Spatial Point Transformer is first introduced to embed the input local surface into an elaborate cylindrical representation (SO(2) rotation-equivariant), further enabling end-to-end optimization of the entire framework. A Neural Feature Extractor, composed of point-based and 3D cylindrical convolutional layers, is then presented to learn representative and general geometric patterns. An invariant layer is finally used to generate rotation-invariant feature descriptors. Extensive experiments on both indoor and outdoor datasets demonstrate that SpinNet outperforms existing state-of-the-art techniques by a large margin. More critically, it has the best generalization ability across unseen scenarios with different sensor modalities.
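
The cylindrical representation can be sketched as follows: align a local reference axis with the patch's PCA normal and express neighbours as (rho, phi, z), so a rotation about that axis only shifts phi. This is an illustration of the coordinate idea, not SpinNet's Spatial Point Transformer.

import numpy as np

def to_cylindrical(patch, center):
    """Toy cylindrical mapping of a local patch around a reference axis."""
    q = patch - center
    # local reference axis = direction of least variance (PCA normal)
    _, _, vt = np.linalg.svd(q - q.mean(0), full_matrices=False)
    z_axis = vt[-1]
    # build an orthonormal frame (x_axis, y_axis, z_axis)
    x_axis = np.cross(z_axis, [0.0, 0.0, 1.0])
    if np.linalg.norm(x_axis) < 1e-6:
        x_axis = np.array([1.0, 0.0, 0.0])
    x_axis /= np.linalg.norm(x_axis)
    y_axis = np.cross(z_axis, x_axis)
    local = q @ np.stack([x_axis, y_axis, z_axis], axis=1)   # rotate into the frame
    rho = np.linalg.norm(local[:, :2], axis=1)
    phi = np.arctan2(local[:, 1], local[:, 0])
    return np.stack([rho, phi, local[:, 2]], axis=1)

patch = np.random.rand(64, 3)                                # hypothetical neighbourhood
print(to_cylindrical(patch, patch.mean(0)).shape)            # (64, 3)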

14.
IEEE Trans Pattern Anal Mach Intell; 45(4): 4474-4493, 2023 Apr.
Article in English | MEDLINE | ID: mdl-35881599

ABSTRACT

Neural networks contain considerable redundant computation, which drags down inference efficiency and hinders deployment on resource-limited devices. In this paper, we study sparsity in convolutional neural networks and propose a generic sparse mask mechanism to improve the inference efficiency of networks. Specifically, sparse masks are learned in both the data and channel dimensions to dynamically localize and skip redundant computation at a fine-grained level. Based on our sparse mask mechanism, we develop SMPointSeg, SMSR, and SMStereo for point cloud semantic segmentation, single image super-resolution, and stereo matching, respectively. Our sparse masks are shown to be compatible with different model components and network architectures, accurately localizing redundant computation and significantly reducing computational cost for practical speedup. Extensive experiments show that SMPointSeg, SMSR, and SMStereo achieve state-of-the-art performance on benchmark datasets in terms of both accuracy and efficiency.
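
A toy version of the sparse-mask bookkeeping is sketched below: binary spatial and channel masks (derived here from a simple magnitude criterion rather than learned gates) decide which positions and channels a 3x3 convolution would actually compute, and the kept fraction approximates the FLOP saving; the keep ratios are assumptions.

import torch

def masked_conv_flops(feat, spatial_keep=0.4, channel_keep=0.5):
    """Toy sparse-mask accounting on a feature map of shape (C, H, W)."""
    C, H, W = feat.shape
    spatial_score = feat.abs().mean(dim=0)                        # (H, W)
    k_pos = int(spatial_keep * H * W)
    thresh = spatial_score.flatten().kthvalue(H * W - k_pos).values
    spatial_mask = (spatial_score > thresh).float()               # keep top positions
    channel_score = feat.abs().mean(dim=(1, 2))                   # (C,)
    k_ch = int(channel_keep * C)
    channel_mask = torch.zeros(C)
    channel_mask[channel_score.topk(k_ch).indices] = 1.0

    dense_flops = C * H * W * 9                                   # per output channel, toy count
    sparse_flops = int(channel_mask.sum() * spatial_mask.sum() * 9)
    sparse_feat = feat * channel_mask[:, None, None] * spatial_mask
    return sparse_feat, dense_flops, sparse_flops

feat = torch.randn(64, 32, 32)                                    # hypothetical feature map
_, dense, sparse = masked_conv_flops(feat)
print(f"kept {sparse / dense:.1%} of the dense FLOPs")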

15.
IEEE Trans Image Process; 32: 1745-1758, 2023.
Article in English | MEDLINE | ID: mdl-35994532

ABSTRACT

Single-frame infrared small target (SIRST) detection aims at separating small targets from cluttered backgrounds. With the advances of deep learning, CNN-based methods have yielded promising results in generic object detection due to their powerful modeling capability. However, existing CNN-based methods cannot be directly applied to infrared small targets, since the pooling layers in their networks could lead to the loss of targets in deep layers. To handle this problem, we propose a dense nested attention network (DNA-Net) in this paper. Specifically, we design a dense nested interactive module (DNIM) to achieve progressive interaction among high-level and low-level features. With the repetitive interaction in DNIM, the information of infrared small targets in deep layers can be maintained. Based on DNIM, we further propose a cascaded channel and spatial attention module (CSAM) to adaptively enhance multi-level features. With our DNA-Net, contextual information of small targets can be well incorporated and fully exploited through repetitive fusion and enhancement. Moreover, we develop an infrared small target dataset (namely, NUDT-SIRST) and propose a set of evaluation metrics for comprehensive performance evaluation. Experiments on both public and our self-developed datasets demonstrate the effectiveness of our method. Compared with other state-of-the-art methods, our method achieves better performance in terms of probability of detection (Pd), false-alarm rate (Fa), and intersection over union (IoU).

16.
Int J Comput Vis; 130(9): 2321-2336, 2022.
Article in English | MEDLINE | ID: mdl-35968252

ABSTRACT

We present 3DPointCaps++ for learning robust, flexible, and generalizable 3D object representations without requiring heavy annotation efforts or supervision. Unlike conventional 3D generative models, our algorithm aims to build a structured latent space where certain factors of shape variation, such as object parts, can be disentangled into independent sub-spaces. Our novel decoder then acts on these individual latent sub-spaces (i.e., capsules) using deconvolution operators to reconstruct 3D points in a self-supervised manner. We further introduce a cluster loss ensuring that the points reconstructed by a single capsule remain local and do not spread across the object uncontrollably. These contributions allow our network to tackle the challenging tasks of part segmentation, part interpolation/replacement, and correspondence estimation across rigid/non-rigid shapes and across/within categories. Our extensive evaluations on ShapeNet objects and human scans demonstrate that our network can learn generic representations that are robust and useful in many applications.
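
One plausible reading of the cluster loss is a penalty on how far each capsule's reconstructed points spread from that capsule's centroid; the sketch below implements that interpretation, which may differ from the paper's exact formulation.

import torch

def cluster_loss(points_per_capsule):
    """Toy 'stay local' penalty: mean squared distance of each capsule's
    reconstructed points to that capsule's centroid.
    points_per_capsule: tensor of shape (K, M, 3) = K capsules, M points each.
    """
    centroids = points_per_capsule.mean(dim=1, keepdim=True)      # (K, 1, 3)
    return ((points_per_capsule - centroids) ** 2).sum(dim=-1).mean()

recon = torch.randn(16, 128, 3, requires_grad=True)               # hypothetical capsule outputs
loss = cluster_loss(recon)
loss.backward()
print(float(loss))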

17.
IEEE Trans Image Process; 31: 2094-2105, 2022.
Article in English | MEDLINE | ID: mdl-35196234

ABSTRACT

The goal of ground-to-aerial image geo-localization is to determine the location of a ground query image by matching it against a reference database of aerial/satellite images. This task is highly challenging due to the large appearance differences caused by extreme changes in viewpoint and orientation. In this work, we show that training difficulty is an important cue that can be leveraged to improve metric learning on cross-view images. More specifically, we propose a new Soft Exemplar Highlighting (SEH) loss to achieve online soft selection of exemplars. Adaptive weights are generated for exemplars by measuring their associated training difficulty using distance-rectified logistic regression. These weights are then constrained to remove simple exemplars from training and to truncate the large weights of extremely hard exemplars, avoiding entrapment in poor local optima. We further use the proposed SEH loss to train two mainstream convolutional neural networks for ground-to-aerial image-based geo-localization. Experimental results on two benchmark cross-view image datasets demonstrate that the proposed method achieves significant improvements in feature discriminativeness and outperforms state-of-the-art image-based geo-localization methods.
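
A hedged sketch of the difficulty-based weighting follows: a logistic function of the positive-negative distance gap assigns each exemplar a weight, easy exemplars are removed, and extremely hard ones are truncated. The constants and the exact rectification are assumptions, not the published SEH loss.

import numpy as np

def soft_exemplar_weights(d_pos, d_neg, alpha=10.0, margin=0.1, w_max=0.9):
    """Toy exemplar weighting in the spirit of soft exemplar highlighting:
    difficulty grows with (d_pos - d_neg + margin); a logistic function maps it
    to a weight, easy exemplars get weight 0, extremely hard ones are capped.
    """
    difficulty = d_pos - d_neg + margin
    w = 1.0 / (1.0 + np.exp(-alpha * difficulty))    # logistic difficulty -> weight
    w = np.where(difficulty <= 0.0, 0.0, w)          # remove simple exemplars
    return np.minimum(w, w_max)                      # truncate extremely hard ones

d_pos = np.array([0.2, 0.5, 0.9])                    # distances to matching aerial images
d_neg = np.array([0.8, 0.6, 0.3])                    # distances to hardest non-matches
print(soft_exemplar_weights(d_pos, d_neg))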

18.
IEEE Trans Pattern Anal Mach Intell; 44(4): 2108-2125, 2022 Apr.
Article in English | MEDLINE | ID: mdl-32976095

ABSTRACT

Stereo image pairs encode 3D scene cues into stereo correspondences between the left and right images. To exploit 3D cues within stereo images, recent CNN-based methods commonly use cost volume techniques to capture stereo correspondence over large disparities. However, since disparities can vary significantly for stereo cameras with different baselines, focal lengths, and resolutions, the fixed maximum disparity used in cost volume techniques prevents them from handling stereo image pairs with large disparity variations. In this paper, we propose a generic parallax-attention mechanism (PAM) to capture stereo correspondence regardless of disparity variations. Our PAM integrates epipolar constraints with an attention mechanism to calculate feature similarities along the epipolar line and capture stereo correspondence. Based on our PAM, we propose a parallax-attention stereo matching network (PASMnet) and a parallax-attention stereo image super-resolution network (PASSRnet) for the stereo matching and stereo image super-resolution tasks. Moreover, we introduce a new large-scale dataset named Flickr1024 for stereo image super-resolution. Experimental results show that our PAM is generic and can effectively learn stereo correspondence under large disparity variations in an unsupervised manner. Comparative results show that our PASMnet and PASSRnet achieve state-of-the-art performance.
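
The key mechanism, attending along the epipolar line instead of building a fixed-disparity cost volume, can be sketched as a row-wise attention between left and right feature maps; the sketch below shows the idea only and is not the PASMnet/PASSRnet layer.

import torch
import torch.nn.functional as F

def parallax_attention(feat_left, feat_right):
    """Toy parallax attention for rectified stereo features of shape (C, H, W):
    every left pixel attends to all right pixels on the same row, so no maximum
    disparity has to be fixed.
    """
    C, H, W = feat_left.shape
    q = feat_left.permute(1, 2, 0)                       # (H, W, C)
    k = feat_right.permute(1, 0, 2)                      # (H, C, W)
    scores = torch.bmm(q, k) / C ** 0.5                  # (H, W, W): left col vs right col
    attn = F.softmax(scores, dim=-1)                     # correspondence distribution per row
    v = feat_right.permute(1, 2, 0)                      # (H, W, C)
    warped = torch.bmm(attn, v).permute(2, 0, 1)         # right features warped to the left view
    return warped, attn

fl, fr = torch.randn(32, 24, 48), torch.randn(32, 24, 48)
warped, attn = parallax_attention(fl, fr)
print(warped.shape, attn.shape)                          # (32, 24, 48), (24, 48, 48)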

19.
IEEE Trans Pattern Anal Mach Intell; 44(11): 8338-8354, 2022 Nov.
Article in English | MEDLINE | ID: mdl-34033533

ABSTRACT

We study the problem of efficient semantic segmentation of large-scale 3D point clouds. Because they rely on expensive sampling techniques or computationally heavy pre/post-processing steps, most existing approaches can only be trained on and applied to small-scale point clouds. In this paper, we introduce RandLA-Net, an efficient and lightweight neural architecture that directly infers per-point semantics for large-scale point clouds. The key to our approach is to use random point sampling instead of more complex point selection approaches. Although remarkably computation- and memory-efficient, random sampling can discard key features by chance. To overcome this, we introduce a novel local feature aggregation module that progressively increases the receptive field of each 3D point, thereby effectively preserving geometric details. Comparative experiments show that our RandLA-Net can process 1 million points in a single pass up to 200× faster than existing approaches. Moreover, extensive experiments on five large-scale point cloud datasets, including Semantic3D, SemanticKITTI, Toronto3D, NPM3D, and S3DIS, demonstrate the state-of-the-art semantic segmentation performance of our RandLA-Net.
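
The two ingredients, cheap random sampling plus a receptive-field-enlarging local aggregation, can be sketched as below; the k-nearest-neighbour max-pooling is a stand-in for RandLA-Net's attentive pooling, and the sampling ratio and k are assumptions.

import numpy as np

def random_sample(points, feats, ratio=0.25, seed=0):
    """Random point sampling: O(1) per point, unlike farthest-point sampling."""
    rng = np.random.default_rng(seed)
    keep = rng.choice(len(points), int(len(points) * ratio), replace=False)
    return points[keep], feats[keep]

def local_aggregation(points, feats, k=8):
    """Toy local feature aggregation: max-pool each point's k-nearest-neighbour
    features to enlarge its receptive field.
    """
    d = np.linalg.norm(points[:, None] - points[None, :], axis=-1)
    nn = np.argsort(d, axis=1)[:, :k]
    return feats[nn].max(axis=1)                         # (N, C)

pts, fts = np.random.rand(1024, 3), np.random.rand(1024, 32)
pts_s, fts_s = random_sample(pts, fts)
print(local_aggregation(pts_s, fts_s).shape)             # (256, 32)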

20.
Front Med (Lausanne); 8: 758690, 2021.
Article in English | MEDLINE | ID: mdl-34912820

ABSTRACT

Background: It is often difficult to diagnose pituitary microadenoma (PM) by MRI alone, due to its relatively small size, variable anatomical structure, and the complex clinical symptoms and signs among individuals. We developed and validated a deep learning-based system to diagnose PM from MRI. Methods: A total of 11,935 infertility participants were initially recruited for this project. After applying the exclusion criteria, 1,520 participants (556 PM patients and 964 control subjects) were included and further stratified into three non-overlapping cohorts. The training set was derived from a retrospective study, while prospective temporal and geographical validation sets were adopted for validation. In total, 780 participants were used for training, 195 for testing, and 545 to validate diagnostic performance. The PM computer-aided diagnosis (PM-CAD) system consists of two parts: pituitary region detection and PM diagnosis. Its diagnostic performance was measured using the receiver operating characteristic (ROC) curve and area under the ROC curve (AUC), calibration curve, accuracy, sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and F1-score. Results: The PM-CAD system showed 94.36% diagnostic accuracy and a 98.13% AUC score on the testing dataset. Confirming the robustness and generalization of the system, diagnostic accuracy was 96.50% on the internal dataset and 92.26% and 92.36% on the external datasets, with AUCs of 95.5%, 94.7%, and 93.7%, respectively. In a human-computer competition, the diagnostic performance of our PM-CAD system was comparable to that of radiologists with >10 years of professional expertise (diagnostic accuracy of 94.0% vs. 95.0%, AUC of 95.6% vs. 95.0%). For cases misdiagnosed by radiologists, our system showed 100% accurate diagnosis. A browser-based software tool was designed to assist PM diagnosis. Conclusions: This is the first report showing that the PM-CAD system is a viable tool for detecting PM. Our results suggest that the PM-CAD system is applicable in radiology departments, especially in primary health care institutions.
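
For reference, the reported sensitivity, specificity, PPV, NPV, accuracy, and F1-score are all derived from a binary confusion matrix; the helper below (illustrative only, not the PM-CAD evaluation code) shows how they are computed.

import numpy as np

def diagnostic_metrics(y_true, y_pred):
    """Standard binary diagnostic metrics from a confusion matrix."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = int(((y_true == 1) & (y_pred == 1)).sum())
    tn = int(((y_true == 0) & (y_pred == 0)).sum())
    fp = int(((y_true == 0) & (y_pred == 1)).sum())
    fn = int(((y_true == 1) & (y_pred == 0)).sum())
    sens = tp / (tp + fn)                    # sensitivity (recall)
    spec = tn / (tn + fp)                    # specificity
    ppv = tp / (tp + fp)                     # positive predictive value (precision)
    npv = tn / (tn + fn)                     # negative predictive value
    acc = (tp + tn) / len(y_true)
    f1 = 2 * ppv * sens / (ppv + sens)
    return dict(accuracy=acc, sensitivity=sens, specificity=spec, ppv=ppv, npv=npv, f1=f1)

print(diagnostic_metrics([1, 1, 0, 0, 1, 0], [1, 0, 0, 0, 1, 1]))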
