Results 1 - 20 of 36
1.
Article in English | MEDLINE | ID: mdl-38875098

ABSTRACT

Deep neural networks have exhibited remarkable performance in image super-resolution (SR) tasks by learning a mapping from low-resolution (LR) images to high-resolution (HR) images. However, SR is typically an ill-posed problem, and existing methods come with several limitations. First, the space of possible SR mappings can be extremely large, since many different HR images can be super-resolved from the same LR image. As a result, it is hard to directly learn a promising SR mapping from such a large space. Second, it is often inevitable to develop very large models with extremely high computational cost to yield promising SR performance. In practice, one can use model compression techniques to obtain compact models by reducing model redundancy. Nevertheless, it is hard for existing model compression methods to accurately identify the redundant components due to the extremely large SR mapping space. To alleviate the first challenge, we propose a dual regression learning scheme to reduce the space of possible SR mappings. Specifically, in addition to the mapping from LR to HR images, we learn an additional dual regression mapping to estimate the downsampling kernel and reconstruct LR images. In this way, the dual mapping acts as a constraint that reduces the space of possible mappings. To address the second challenge, we propose a dual regression compression (DRC) method to reduce model redundancy at both the layer level and the channel level based on channel pruning. Specifically, we first develop a channel number search method that minimizes the dual regression loss to determine the redundancy of each layer. Given the searched channel numbers, we further exploit the dual regression manner to evaluate the importance of channels and prune the redundant ones. Extensive experiments show the effectiveness of our method in obtaining accurate and efficient SR models.
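As a rough illustration of the dual regression idea described in this abstract, the following PyTorch-style sketch combines a primal LR→HR loss with a dual loss that maps the super-resolved output back to LR; the function and network names are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def dual_regression_loss(primal_net, dual_net, lr, hr, lam=0.1):
    """Primal loss (LR -> HR) plus a dual loss that maps the super-resolved
    image back to LR, constraining the space of possible SR mappings."""
    sr = primal_net(lr)                   # super-resolved estimate
    primal = F.l1_loss(sr, hr)            # fit the HR target
    dual = F.l1_loss(dual_net(sr), lr)    # reconstructed LR should match the input
    return primal + lam * dual            # lam weights the dual constraint
```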

2.
Article in English | MEDLINE | ID: mdl-38809744

ABSTRACT

We study multi-sensor fusion for 3D semantic segmentation, which is important to scene understanding in many applications such as autonomous driving and robotics. For example, for autonomous cars equipped with RGB cameras and LiDAR, it is crucial to fuse complementary information from the different sensors for robust and accurate segmentation. Existing fusion-based methods, however, may not achieve promising performance due to the vast difference between the two modalities. In this work, we investigate a collaborative fusion scheme called perception-aware multi-sensor fusion (PMF) to effectively exploit perceptual information from two modalities, namely, appearance information from RGB images and spatio-depth information from point clouds. To this end, we first project point clouds to the camera coordinate system using perspective projection. In this way, we can process both the LiDAR and camera inputs in 2D space while preventing the information loss of RGB images. Then, we propose a two-stream network that consists of a LiDAR stream and a camera stream to extract features from the two modalities separately. The extracted features are fused by effective residual-based fusion modules. Moreover, we introduce additional perception-aware losses to measure the perceptual difference between the two modalities. Last, we propose an improved version of PMF, i.e., EPMF, which is more efficient and effective by optimizing data pre-processing and network architecture under perspective projection. Specifically, we propose cross-modal alignment and cropping to obtain tight inputs and reduce unnecessary computational costs. We then explore more efficient contextual modules under perspective projection and fuse the LiDAR features into the camera stream to boost the performance of the two-stream network. Extensive experiments on benchmark data sets show the superiority of our method. For example, on the nuScenes test set, our EPMF outperforms the state-of-the-art method RangeFormer by 0.9% in mIoU. Compared to PMF, EPMF also achieves a 2.06× acceleration with a 2.0% improvement in mIoU. Our source code is available at https://github.com/ICEORY/PMF.
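The perspective-projection pre-processing step mentioned above can be sketched roughly as follows; this is a generic LiDAR-to-image projection assuming a 4×4 extrinsic matrix and a 3×3 intrinsic matrix, not the authors' code.

```python
import numpy as np

def project_points_to_image(points_xyz, K, T_cam_from_lidar):
    """Project LiDAR points into the camera image plane (perspective projection).
    points_xyz: (n, 3); K: (3, 3) intrinsics; T_cam_from_lidar: (4, 4) extrinsics."""
    n = points_xyz.shape[0]
    homog = np.hstack([points_xyz, np.ones((n, 1))])      # (n, 4) homogeneous coords
    cam = (T_cam_from_lidar @ homog.T).T[:, :3]           # LiDAR frame -> camera frame
    in_front = cam[:, 2] > 0                              # keep points ahead of the camera
    cam = cam[in_front]
    uv = (K @ cam.T).T                                    # apply intrinsics
    uv = uv[:, :2] / uv[:, 2:3]                           # perspective divide
    return uv, cam[:, 2]                                  # pixel coordinates and depths
```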

3.
Neural Netw ; 175: 106275, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38653078

ABSTRACT

Face Anti-Spoofing (FAS) seeks to protect face recognition systems from spoofing attacks and is applied extensively in scenarios such as access control, electronic payment, and security surveillance systems. Face anti-spoofing requires the integration of local details and global semantic information. Existing CNN-based methods rely on small-stride or image patch-based feature extraction structures, which struggle to capture spatial and cross-layer feature correlations effectively. Meanwhile, Transformer-based methods have limitations in extracting discriminative detailed features. To address these issues, we introduce a multi-stage CNN-Transformer framework, which extracts local features through convolutional layers and long-distance feature relationships via self-attention. On this basis, we propose a cross-attention multi-stage feature fusion module, employing semantically rich high-stage features to query task-relevant features in low-stage features for further cross-stage feature fusion. To enhance the discrimination of local features for subtle differences, we design pixel-wise material classification supervision and add an auxiliary branch in the intermediate layers of the model. Moreover, to address the limited acquisition environments and scarcity of acquisition devices in existing Near-Infrared datasets, we create a large-scale Near-Infrared Face Anti-Spoofing dataset with 380k images of 1,040 identities. The proposed method achieves state-of-the-art performance on OULU-NPU and our proposed Near-Infrared dataset with only 1.3 GFLOPs and 3.2M parameters, which demonstrates the effectiveness of the proposed method.
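A minimal sketch of cross-stage fusion in which high-stage features query low-stage features via cross-attention, as the abstract describes; it assumes PyTorch's nn.MultiheadAttention and illustrative tensor shapes rather than the paper's exact module.

```python
import torch.nn as nn

class CrossStageFusion(nn.Module):
    """High-stage features act as queries over low-stage features,
    i.e., cross-attention used for cross-stage feature fusion."""
    def __init__(self, dim, num_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, high_feats, low_feats):
        # high_feats: (B, Nq, dim) queries; low_feats: (B, Nk, dim) keys/values
        fused, _ = self.attn(high_feats, low_feats, low_feats)
        return fused + high_feats          # residual fusion with the high-stage input
```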


Subjects
Neural Networks, Computer; Humans; Automated Facial Recognition/methods; Image Processing, Computer-Assisted/methods; Face; Computer Security; Algorithms
4.
IEEE Trans Med Imaging ; PP, 2024 Apr 26.
Article in English | MEDLINE | ID: mdl-38669168

ABSTRACT

Many tissues/lesions in medical images are ambiguous. Therefore, medical segmentation is typically annotated by a group of clinical experts to mitigate personal bias. A common solution for fusing the different annotations is majority vote, e.g., taking the average of multiple labels. However, such a strategy ignores differences in grader expertise. Inspired by the observation that medical image segmentation is usually used to assist disease diagnosis in clinical practice, we propose the diagnosis-first principle, which takes disease diagnosis as the criterion to calibrate inter-observer segmentation uncertainty. Following this idea, a framework named Diagnosis-First segmentation Framework (DiFF) is proposed. Specifically, DiFF first learns to fuse the multi-rater segmentation labels into a single ground truth that maximizes disease diagnosis performance. We dub the fused ground truth Diagnosis-First Ground-truth (DF-GT). Then, a Take and Give Model (T&G Model) is proposed to segment DF-GT from the raw image. With the T&G Model, DiFF can learn the segmentation with calibrated uncertainty that facilitates disease diagnosis. We verify the effectiveness of DiFF on three different medical segmentation tasks: optic-disc/optic-cup (OD/OC) segmentation on fundus images, thyroid nodule segmentation on ultrasound images, and skin lesion segmentation on dermoscopic images. Experimental results show that the proposed DiFF effectively calibrates segmentation uncertainty and thus significantly facilitates the corresponding disease diagnosis, outperforming previous state-of-the-art multi-rater learning methods.

5.
IEEE Trans Pattern Anal Mach Intell ; 46(2): 764-779, 2024 Feb.
Article in English | MEDLINE | ID: mdl-37930907

ABSTRACT

Image captioning is a core challenge in computer vision, attracting significant attention. Traditional methods prioritize caption quality, often overlooking style control. Our research enhances method controllability, enabling descriptions of varying detail. By integrating a length-level embedding into current models, they can produce detailed or concise captions, increasing diversity. We introduce a length-level reranking transformer to correlate image and text complexity, optimizing caption length for informativeness without redundancy. Additionally, as caption length increases, computational complexity grows due to the autoregressive (AR) design of existing methods. To address this, our non-autoregressive (NAR) model maintains constant complexity regardless of caption length. We develop a training approach that includes refinement sequence training and sequence-level knowledge distillation to close the performance gap between NAR and AR models. In testing, our models set new standards for caption quality on the MS COCO dataset and offer enhanced controllability and diversity. Our NAR model excels over AR models in these aspects and shows greater efficiency with longer captions. With advanced training techniques, our NAR model's caption quality rivals that of leading AR models.
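The length-level conditioning can be illustrated with a small sketch in which a length-level embedding is simply added to the token embeddings; the class name and the choice of four length levels are assumptions for illustration only.

```python
import torch.nn as nn

class LengthControlledEmbedding(nn.Module):
    """Token embedding plus a caption length-level embedding, so the decoder
    can be conditioned on a desired level of detail."""
    def __init__(self, vocab_size, d_model, num_length_levels=4):
        super().__init__()
        self.tok = nn.Embedding(vocab_size, d_model)
        self.length_level = nn.Embedding(num_length_levels, d_model)

    def forward(self, token_ids, level_id):
        # token_ids: (B, T); level_id: (B,) integer choosing short ... detailed captions
        return self.tok(token_ids) + self.length_level(level_id).unsqueeze(1)
```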

6.
IEEE J Biomed Health Inform ; 27(12): 5904-5913, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37682645

ABSTRACT

The videofluoroscopic swallowing study (VFSS) visualizes swallowing movement using X-ray fluoroscopy and is the most widely used method for dysphagia examination. Temporal parameters are among the most important indicators for swallowing assessment. However, most of this information is acquired through hand-crafted, elaborate measurement, which is time-consuming and makes objectivity and accuracy difficult to ensure. In this article, we propose to formulate this task as a temporal action localization task and solve it using deep neural networks. However, actions in VFSS have characteristics such as small motion targets, small action amplitudes, large sample variances, short durations, and variations in duration. Furthermore, existing methods are designed for everyday behaviors, which makes locating and recognizing such micro-actions more challenging. To address these issues, and given the lack of benchmarks, we first collect and annotate a VFSS micro-action dataset, which includes 847 VFSS recordings from 71 subjects. We then introduce a coarse-to-fine mechanism to handle the short and repeated nature of micro-actions, which significantly enhances micro-action localization accuracy. Moreover, we propose a Variable-Size Window Generator method, which improves the model's characterization performance and addresses the issue of different action timings, leading to further improvements in localization accuracy. The results of our experiments demonstrate the superiority of our method, with significantly improved performance (46.10% vs. 37.70%).


Subjects
Deglutition Disorders; Deglutition; Humans; Fluoroscopy/methods; Deglutition Disorders/diagnostic imaging; Neural Networks, Computer; Time Factors
7.
IEEE Trans Pattern Anal Mach Intell ; 45(10): 12459-12473, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37167046

ABSTRACT

Network pruning and quantization are proven to be effective ways for deep model compression. To obtain a highly compact model, most methods first perform network pruning and then conduct quantization based on the pruned model. However, this strategy may ignore the fact that pruning and quantization affect each other, and performing them separately may lead to sub-optimal performance. To address this, performing pruning and quantization jointly is essential. Nevertheless, how to make a trade-off between pruning and quantization is non-trivial. Moreover, existing compression methods often rely on pre-defined compression configurations (i.e., pruning rates or bitwidths). Some attempts have been made to search for optimal configurations, which, however, may incur prohibitive optimization cost. To address these issues, we devise a simple yet effective method named Single-path Bit Sharing (SBS) for automatic loss-aware model compression. To this end, we consider network pruning as a special case of quantization and provide a unified view of model pruning and quantization. We then introduce a single-path model to encode all candidate compression configurations, where a high bitwidth value is decomposed into the sum of the lowest bitwidth value and a series of re-assignment offsets. Relying on the single-path model, we introduce learnable binary gates to encode the choice of configurations and learn the binary gates and model parameters jointly. More importantly, the configuration search problem can be transformed into a subset selection problem, which helps to significantly reduce the optimization difficulty and computation cost. In this way, the compression configurations of each layer and the trade-off between pruning and quantization can be automatically determined. Extensive experiments on CIFAR-100 and ImageNet show that SBS significantly reduces computation cost while achieving promising performance. For example, our SBS-compressed MobileNetV2 achieves a 22.6× Bit-Operation (BOP) reduction with only a 0.1% drop in Top-1 accuracy.
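A toy sketch of the single-path bit-sharing idea, where the effective bitwidth is the lowest candidate bitwidth plus a sum of gated offsets; the candidate bitwidths, the sigmoid gating, and the straight-through trick are illustrative choices, not the exact SBS formulation.

```python
import torch
import torch.nn as nn

class SinglePathBitwidth(nn.Module):
    """Effective bitwidth = lowest bitwidth + sum of gated offsets, so a
    single path encodes all candidate precisions (here 2, 4, 6, 8 bits)."""
    def __init__(self, lowest_bit=2, offsets=(2, 2, 2)):
        super().__init__()
        self.lowest_bit = lowest_bit
        self.offsets = offsets
        self.gate_logits = nn.Parameter(torch.zeros(len(offsets)))

    def forward(self):
        soft = torch.sigmoid(self.gate_logits)
        hard = (soft > 0.5).float()
        gates = hard + soft - soft.detach()   # straight-through: hard forward, soft backward
        return self.lowest_bit + sum(g * o for g, o in zip(gates, self.offsets))
```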

8.
Neural Netw ; 164: 177-185, 2023 Jul.
Article in English | MEDLINE | ID: mdl-37149918

ABSTRACT

Deep neural networks (DNNs) are vulnerable to adversarial examples with small perturbations. Adversarial defense has thus become an important means of improving the robustness of DNNs against adversarial examples. Existing defense methods focus on specific types of adversarial examples and may fail to defend well in real-world applications. In practice, we may face many types of attacks, and the exact type of adversarial examples encountered in real-world applications can even be unknown. In this paper, motivated by the observation that adversarial examples are more likely to appear near the classification boundary and are vulnerable to some transformations, we study adversarial examples from a new perspective: whether we can defend against them by pulling them back to the original clean distribution. We empirically verify the existence of defense affine transformations that restore adversarial examples. Relying on this, we learn defense transformations to counterattack adversarial examples by parameterizing the affine transformations and exploiting the boundary information of DNNs. Extensive experiments on both toy and real-world data sets demonstrate the effectiveness and generalization of our defense method. The code is available at https://github.com/SCUTjinchengli/DefenseTransformer.
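One simple way to parameterize a learnable defense transformation is a 2D affine warp of the input image, sketched below; the paper's actual affine parameterization and training objective may differ, so treat this purely as an illustrative assumption.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DefenseAffine(nn.Module):
    """A learnable 2D affine warp applied to the input image, one simple
    way to parameterize a defense transformation."""
    def __init__(self):
        super().__init__()
        # initialize at the identity transform
        self.theta = nn.Parameter(torch.tensor([[1., 0., 0.], [0., 1., 0.]]))

    def forward(self, x):                                    # x: (B, C, H, W)
        theta = self.theta.unsqueeze(0).repeat(x.size(0), 1, 1)
        grid = F.affine_grid(theta, list(x.size()), align_corners=False)
        return F.grid_sample(x, grid, align_corners=False)

# Training idea: with a frozen classifier f, minimize the cross-entropy of
# f(defense(x_adv)) against the clean labels, so the learned warp pulls
# adversarial inputs back toward the clean distribution.
```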


Subjects
Generalization, Psychological; Learning; Neural Networks, Computer
9.
Br J Ophthalmol ; 107(5): 650-656, 2023 05.
Article in English | MEDLINE | ID: mdl-34893473

ABSTRACT

AIMS: To characterise the influence of primary open-angle glaucoma (POAG) and high myopia (HM) on macular and choroidal capillary density (CD). METHODS: Two hundred and seven eyes were enrolled, including 80 POAG without HM, 50 POAG with HM, 31 HM without POAG and 46 normal controls. A fovea-centred 6×6 mm optical coherence tomography angiography scan was performed to obtain the CD of the superficial capillary plexus (SCP), deep capillary plexus (DCP) and choriocapillaris. Macular and choroidal CDs were compared among the groups, and the association of CDs with visual field mean deviation (MD) was determined using linear regression models. RESULTS: Compared with normal eyes, SCP CD was decreased in the POAG without HM group (p<0.05), while DCP CD was significantly decreased in the HM without POAG group (p<0.05). Both SCP and DCP CDs were significantly decreased in the POAG with HM group (p<0.05). CD reduction occurred mainly in the outer rather than the inner ring of the 6×6 mm scan. In multivariate regression analysis, worse MD was associated with lower CD in the outer ring of the SCP in all the HM eyes (p<0.05). CONCLUSIONS: POAG and HM reduced macular CD in different layers of the retinal capillary plexus, in both cases particularly in the outer ring of the 6×6 mm scans. Furthermore, assessment of the CD in the outer ring of the SCP may facilitate the diagnosis of glaucoma in eyes with HM.


Subjects
Glaucoma, Open-Angle; Myopia; Humans; Glaucoma, Open-Angle/diagnosis; Retina; Choroid/blood supply; Microvessels; Tomography, Optical Coherence/methods; Retinal Vessels; Fluorescein Angiography/methods
10.
IEEE Trans Image Process ; 31: 1870-1881, 2022.
Article in English | MEDLINE | ID: mdl-35139015

ABSTRACT

OCT fluid segmentation is a crucial task for diagnosis and therapy in ophthalmology. Current convolutional neural networks (CNNs) supervised by pixel-wise annotated masks achieve great success in OCT fluid segmentation. However, annotating pixel-wise masks for OCT images is time-consuming, expensive, and requires expertise. This paper proposes an Intra- and inter-Slice Contrastive Learning Network (ISCLNet) for OCT fluid segmentation with only point supervision. Our ISCLNet learns visual representations by designing contrastive tasks that exploit the inherent similarity or dissimilarity in unlabeled OCT data. Specifically, we propose an intra-slice contrastive learning strategy to leverage the fluid-background similarity and the retinal layer-background dissimilarity. Moreover, we construct an inter-slice contrastive learning architecture to learn the similarity of adjacent OCT slices from one OCT volume. Finally, an end-to-end model combining the intra- and inter-slice contrastive learning processes learns to segment fluid under point supervision. Experimental results on two public OCT fluid segmentation datasets (i.e., AI Challenger and RETOUCH) demonstrate that ISCLNet bridges the gap between fully-supervised and weakly-supervised OCT fluid segmentation and outperforms other well-known point-supervised segmentation methods.
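Point-supervised contrastive schemes of this kind typically build on an InfoNCE-style loss; the generic sketch below shows such a loss over anchor/positive/negative embeddings and is not the paper's specific intra- or inter-slice loss.

```python
import torch
import torch.nn.functional as F

def info_nce(anchor, positive, negatives, temperature=0.1):
    """Generic InfoNCE-style contrastive loss: pull anchor/positive
    embeddings together and push negatives away.
    anchor, positive: (B, d); negatives: (N, d)."""
    anchor = F.normalize(anchor, dim=-1)
    positive = F.normalize(positive, dim=-1)
    negatives = F.normalize(negatives, dim=-1)
    pos = (anchor * positive).sum(-1, keepdim=True) / temperature   # (B, 1)
    neg = anchor @ negatives.T / temperature                        # (B, N)
    logits = torch.cat([pos, neg], dim=1)
    labels = torch.zeros(anchor.size(0), dtype=torch.long)          # positive is index 0
    return F.cross_entropy(logits, labels)
```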


Subjects
Image Processing, Computer-Assisted; Neural Networks, Computer; Image Processing, Computer-Assisted/methods; Retina; Supervised Machine Learning
11.
IEEE Trans Pattern Anal Mach Intell ; 44(10): 6501-6516, 2022 10.
Article in English | MEDLINE | ID: mdl-34097606

ABSTRACT

Designing effective architectures is one of the key factors behind the success of deep neural networks. Existing deep architectures are either manually designed or automatically searched by some Neural Architecture Search (NAS) methods. However, even a well-designed/searched architecture may still contain many nonsignificant or redundant modules/operations (e.g., some intermediate convolution or pooling layers). Such redundancy may not only incur substantial memory consumption and computational cost but also deteriorate the performance. Thus, it is necessary to optimize the operations inside an architecture to improve the performance without introducing extra computational cost. To this end, we have proposed a Neural Architecture Transformer (NAT) method which casts the optimization problem into a Markov Decision Process (MDP) and seeks to replace the redundant operations with more efficient operations, such as skip or null connection. Note that NAT only considers a small number of possible replacements/transitions and thus comes with a limited search space. As a result, such a small search space may hamper the performance of architecture optimization. To address this issue, we propose a Neural Architecture Transformer++ (NAT++) method which further enlarges the set of candidate transitions to improve the performance of architecture optimization. Specifically, we present a two-level transition rule to obtain valid transitions, i.e., allowing operations to have more efficient types (e.g., convolution → separable convolution) or smaller kernel sizes (e.g., 5×5 → 3×3). Note that different operations may have different valid transitions. We further propose a Binary-Masked Softmax (BMSoftmax) layer to omit the possible invalid transitions. Last, based on the MDP formulation, we apply policy gradient to learn an optimal policy, which will be used to infer the optimized architectures. Extensive experiments show that the transformed architectures significantly outperform both their original counterparts and the architectures optimized by existing methods.
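The Binary-Masked Softmax idea, i.e., assigning zero probability to invalid operation transitions, can be sketched as follows; the tensor shapes in the example are made up for illustration.

```python
import torch
import torch.nn.functional as F

def binary_masked_softmax(logits, valid_mask):
    """Softmax over candidate operation transitions in which invalid
    transitions (mask == 0) receive zero probability."""
    logits = logits.masked_fill(valid_mask == 0, float('-inf'))
    return F.softmax(logits, dim=-1)

# Example: 4 candidate transitions for one operation, the last one invalid.
probs = binary_masked_softmax(torch.randn(4), torch.tensor([1, 1, 1, 0]))
```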


Subjects
Algorithms; Neural Networks, Computer
12.
IEEE Trans Pattern Anal Mach Intell ; 44(10): 6209-6223, 2022 Oct.
Article in English | MEDLINE | ID: mdl-34138701

ABSTRACT

Temporal action localization, which requires a machine to recognize the location as well as the category of action instances in videos, has long been researched in computer vision. The main challenge of temporal action localization lies in that videos are usually long and untrimmed with diverse action contents involved. Existing state-of-the-art action localization methods divide each video into multiple action units (i.e., proposals in two-stage methods and segments in one-stage methods) and then perform action recognition/regression on each of them individually, without explicitly exploiting their relations during learning. In this paper, we claim that the relations between action units play an important role in action localization, and a more powerful action detector should not only capture the local content of each action unit but also allow a wider field of view on the context related to it. To this end, we propose a general graph convolutional module (GCM) that can be easily plugged into existing action localization methods, including two-stage and one-stage paradigms. To be specific, we first construct a graph, where each action unit is represented as a node and the relation between two action units as an edge. Here, we use two types of relations, one for capturing the temporal connections between different action units, and the other for characterizing their semantic relationship. Particularly for the temporal connections in two-stage methods, we further explore two different kinds of edges, one connecting overlapping action units and the other connecting surrounding but disjoint units. On the constructed graph, we then apply graph convolutional networks (GCNs) to model the relations among different action units, which is able to learn more informative representations to enhance action localization. Experimental results show that our GCM consistently improves the performance of existing action localization methods, including two-stage methods (e.g., CBR [15] and R-C3D [47]) and one-stage methods (e.g., D-SSAD [22]), verifying the generality and effectiveness of our GCM. Moreover, with the aid of GCM, our approach significantly outperforms the state-of-the-art on THUMOS14 (50.9 percent versus 42.8 percent). Augmentation experiments on ActivityNet also verify the efficacy of modeling the relationships between action units. The source code and the pre-trained models are available at https://github.com/Alvin-Zeng/GCM.
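A rough sketch of the graph construction and one graph-convolution step over action-unit features; using IoU for temporal overlap and cosine similarity for the semantic relation is an assumption about plausible edge definitions, not the exact GCM design.

```python
import torch
import torch.nn as nn

def build_adjacency(iou, feat_sim, iou_thr=0.1, sim_thr=0.8):
    """Edges connect temporally overlapping units (IoU) or semantically
    similar ones (cosine similarity); rows are then normalized.
    iou, feat_sim: (num_units, num_units) tensors."""
    adj = ((iou > iou_thr) | (feat_sim > sim_thr)).float()
    adj = adj + torch.eye(adj.size(0))                 # add self-loops
    return adj / adj.sum(dim=1, keepdim=True)          # row-normalize

class SimpleGCNLayer(nn.Module):
    """One graph-convolution step over action-unit features."""
    def __init__(self, dim):
        super().__init__()
        self.proj = nn.Linear(dim, dim)

    def forward(self, x, adj):
        # x: (num_units, dim); adj: (num_units, num_units), row-normalized
        return torch.relu(self.proj(adj @ x))
```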

13.
IEEE Trans Pattern Anal Mach Intell ; 44(3): 1670-1684, 2022 03.
Article in English | MEDLINE | ID: mdl-32956036

ABSTRACT

Visual grounding (VG) aims to locate the most relevant object or region in an image, based on a natural language query. Generally, it requires the machine to first understand the query, identify the key concepts in the image, and then locate the target object by specifying its bounding box. However, in many real-world visual grounding applications, we have to face ambiguous queries and images with complicated scene structures. Identifying the target based on highly redundant and correlated information can be very challenging, often leading to unsatisfactory performance. To tackle this, in this paper, we exploit an attention module for each kind of information to reduce internal redundancies. We then propose an accumulated attention (A-ATT) mechanism to reason over all the attention modules jointly. In this way, the relations among the different kinds of information can be explicitly captured. Moreover, to improve the performance and robustness of our VG models, we additionally introduce some noise into the training procedure to bridge the distribution gap between the human-labeled training data and poor-quality real-world data. With this "noised" training strategy, we can further learn a bounding box regressor, which can be used to refine the bounding box of the target object. We evaluate the proposed methods on four popular datasets (namely ReferCOCO, ReferCOCO+, ReferCOCOg, and GuessWhat?!). The experimental results show that our methods significantly outperform all previous works on every dataset in terms of accuracy.


Subjects
Algorithms; Attention; Humans
14.
IEEE Trans Pattern Anal Mach Intell ; 44(1): 211-227, 2022 Jan.
Article in English | MEDLINE | ID: mdl-32750833

ABSTRACT

Generative adversarial networks (GANs) have shown remarkable success in generating realistic data from some predefined prior distribution (e.g., Gaussian noise). However, such a prior distribution is often independent of the real data and thus may lose semantic information (e.g., geometric structure or content in images) of the data. In practice, the semantic information might be represented by some latent distribution learned from data. However, such a latent distribution may incur difficulties in data sampling for GAN methods. In this paper, rather than sampling from the predefined prior distribution, we propose a GAN model with local coordinate coding (LCC), termed LCCGAN, to improve the performance of image generation. First, we propose an LCC sampling method in LCCGAN to sample meaningful points from the latent manifold. With the LCC sampling method, we can explicitly exploit the local information on the latent manifold and thus produce new data with promising quality. Second, we propose an improved version, namely LCCGAN++, by introducing a higher-order term in the generator approximation. This term is able to achieve better approximation and thus further improve the performance. More critically, we derive the generalization bound for both LCCGAN and LCCGAN++ and prove that a low-dimensional input is sufficient to achieve good generalization performance. Extensive experiments on several benchmark datasets demonstrate the superiority of the proposed method over existing GAN methods.
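LCC-style sampling can be illustrated as drawing a new latent point from a sparse convex combination of nearby learned bases; the NumPy sketch below is a rough approximation under that assumption, not the LCCGAN sampling procedure.

```python
import numpy as np

def lcc_sample(bases, anchor, num_neighbors=5):
    """Local-coordinate-coding style sampling: represent a new latent point
    as a sparse convex combination of bases near an anchor point.
    bases: (num_bases, latent_dim); anchor: (latent_dim,)."""
    dists = np.linalg.norm(bases - anchor, axis=1)
    nearest = np.argsort(dists)[:num_neighbors]            # local bases around the anchor
    w = np.random.dirichlet(np.ones(num_neighbors))        # random convex weights
    return (w[:, None] * bases[nearest]).sum(axis=0)       # point on the local manifold

# The sampled vector would then be fed to the generator in place of a Gaussian draw.
```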

15.
IEEE Trans Pattern Anal Mach Intell ; 44(8): 4035-4051, 2022 08.
Article in English | MEDLINE | ID: mdl-33755553

ABSTRACT

We study network pruning which aims to remove redundant channels/kernels and hence speed up the inference of deep networks. Existing pruning methods either train from scratch with sparsity constraints or minimize the reconstruction error between the feature maps of the pre-trained models and the compressed ones. Both strategies suffer from some limitations: the former kind is computationally expensive and difficult to converge, while the latter kind optimizes the reconstruction error but ignores the discriminative power of channels. In this paper, we propose a simple-yet-effective method called discrimination-aware channel pruning (DCP) to choose the channels that actually contribute to the discriminative power. To this end, we first introduce additional discrimination-aware losses into the network to increase the discriminative power of the intermediate layers. Next, we select the most discriminative channels for each layer by considering the discrimination-aware loss and the reconstruction error, simultaneously. We then formulate channel pruning as a sparsity-inducing optimization problem with a convex objective and propose a greedy algorithm to solve the resultant problem. Note that a channel (3D tensor) often consists of a set of kernels (each with a 2D matrix). Besides the redundancy in channels, some kernels in a channel may also be redundant and fail to contribute to the discriminative power of the network, resulting in kernel level redundancy. To solve this issue, we propose a discrimination-aware kernel pruning (DKP) method to further compress deep networks by removing redundant kernels. To avoid manually determining the pruning rate for each layer, we propose two adaptive stopping conditions to automatically determine the number of selected channels/kernels. The proposed adaptive stopping conditions tend to yield more efficient models with better performance in practice. Extensive experiments on both image classification and face recognition demonstrate the effectiveness of our methods. For example, on ILSVRC-12, the resultant ResNet-50 model with 30 percent reduction of channels even outperforms the baseline model by 0.36 percent in terms of Top-1 accuracy. We also deploy the pruned models on a smartphone (equipped with a Qualcomm Snapdragon 845 processor). The pruned MobileNetV1 and MobileNetV2 achieve 1.93× and 1.42× inference acceleration on the mobile device, respectively, with negligible performance degradation. The source code and the pre-trained models are available at https://github.com/SCUT-AILab/DCP.
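A greedy selection step of the kind described here can be sketched as ranking channels by a combined importance score and keeping the top fraction; the scoring itself (discrimination-aware loss plus reconstruction error) is assumed to be precomputed, so this is only a schematic proxy for DCP.

```python
import torch

def greedy_channel_selection(channel_scores, keep_ratio=0.7):
    """Schematic proxy for discrimination-aware selection: rank channels by a
    precomputed importance score (e.g., discrimination-aware loss contribution
    plus reconstruction error) and keep the top fraction."""
    num_keep = max(1, int(keep_ratio * channel_scores.numel()))
    keep_idx = torch.topk(channel_scores, num_keep).indices
    mask = torch.zeros_like(channel_scores, dtype=torch.bool)
    mask[keep_idx] = True
    return mask   # True = keep channel, False = prune
```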


Subjects
Algorithms; Data Compression; Data Compression/methods; Pressure
16.
Ophthalmology ; 129(1): 45-53, 2022 01.
Article in English | MEDLINE | ID: mdl-34619247

ABSTRACT

PURPOSE: To develop and evaluate the performance of a 3-dimensional (3D) deep-learning-based automated digital gonioscopy system (DGS) in detecting 2 major characteristics in eyes with suspected primary angle-closure glaucoma (PACG): (1) narrow iridocorneal angles (static gonioscopy, Task I) and (2) peripheral anterior synechiae (PAS) (dynamic gonioscopy, Task II) on OCT scans. DESIGN: International, cross-sectional, multicenter study. PARTICIPANTS: A total of 1.112 million images of 8694 volume scans (2294 patients) from 3 centers were included in this study (Task I, training/internal validation/external testing: 4515, 1101, and 2222 volume scans, respectively; Task II, training/internal validation/external testing: 378, 376, and 102 volume scans, respectively). METHODS: For Task I, a narrow angle was defined as an eye in which the posterior pigmented trabecular meshwork was not visible in more than 180° without indentation in the primary position captured in the dark room from the scans. For Task II, PAS was defined as the adhesion of the iris to the trabecular meshwork. The diagnostic performance of the 3D DGS was evaluated in both tasks with gonioscopic records as reference. MAIN OUTCOME MEASURES: The area under the curve (AUC), sensitivity, and specificity of the 3D DGS were calculated. RESULTS: In Task I, 29.4% of patients had a narrow angle. The AUC, sensitivity, and specificity of 3D DGS on the external testing datasets were 0.943 (0.933-0.953), 0.867 (0.838-0.895), and 0.878 (0.859-0.896), respectively. For Task II, 13.8% of patients had PAS. The AUC, sensitivity, and specificity of 3D DGS were 0.902 (0.818-0.985), 0.900 (0.714-1.000), and 0.890 (0.841-0.938), respectively, on the external testing set at quadrant level following normal clinical practice; and 0.885 (0.836-0.933), 0.912 (0.816-1.000), and 0.700 (0.660-0.741), respectively, on the external testing set at clock-hour level. CONCLUSIONS: The 3D DGS is effective in detecting eyes with suspected PACG. It has the potential to be used widely in the primary eye care community for screening of subjects at high risk of developing PACG.


Subjects
Cornea/pathology; Glaucoma, Angle-Closure/diagnosis; Gonioscopy/methods; Imaging, Three-Dimensional/methods; Iris/pathology; Tomography, Optical Coherence/methods; Trabecular Meshwork/pathology; Adult; Aged; Area Under Curve; Cornea/diagnostic imaging; Cross-Sectional Studies; Diagnosis, Computer-Assisted; Female; Humans; Intraocular Pressure; Iris/diagnostic imaging; Male; Middle Aged; Sensitivity and Specificity
17.
IEEE Trans Pattern Anal Mach Intell ; 44(10): 6454-6471, 2022 10.
Article in English | MEDLINE | ID: mdl-34101584

ABSTRACT

This paper focuses on the challenging task of learning 3D object surface reconstructions from RGB images. Existing methods achieve varying degrees of success by using different surface representations. However, they all have their own drawbacks, and cannot properly reconstruct the surface shapes of complex topologies, arguably due to a lack of constraints on the topological structures in their learning frameworks. To this end, we propose to learn and use the topology-preserved, skeletal shape representation to assist the downstream task of object surface reconstruction from RGB images. Technically, we propose the novel SkeletonNet design that learns a volumetric representation of a skeleton via a bridged learning of a skeletal point set, where we use parallel decoders each responsible for the learning of points on 1D skeletal curves and 2D skeletal sheets, as well as an efficient module of globally guided subvolume synthesis for a refined, high-resolution skeletal volume; we present a differentiable Point2Voxel layer to make SkeletonNet end-to-end trainable. With the learned skeletal volumes, we propose two models, the Skeleton-Based Graph Convolutional Neural Network (SkeGCNN) and the Skeleton-Regularized Deep Implicit Surface Network (SkeDISN), which respectively build upon and improve over the existing frameworks of explicit mesh deformation and implicit field learning for the downstream surface reconstruction task. We conduct thorough experiments that verify the efficacy of our proposed SkeletonNet. SkeGCNN and SkeDISN outperform existing methods as well, and they have their own merits when measured by different metrics. Additional results in generalized task settings further demonstrate the usefulness of our proposed methods. We have made our implementation code publicly available at https://github.com/tangjiapeng/SkeletonNet.


Subjects
Algorithms; Learning; Machine Learning; Neural Networks, Computer
18.
IEEE Trans Pattern Anal Mach Intell ; 44(10): 6649-6666, 2022 10.
Article in English | MEDLINE | ID: mdl-34181534

ABSTRACT

Person re-identification (Re-ID) via gait features within 3D skeleton sequences is a newly emerging topic with several advantages. Existing solutions rely either on hand-crafted descriptors or on supervised gait representation learning. This paper proposes a self-supervised gait encoding approach that can leverage unlabeled skeleton data to learn gait representations for person Re-ID. Specifically, we first create self-supervision by learning to reconstruct unlabeled skeleton sequences reversely, which involves richer high-level semantics and yields better gait representations. Other pretext tasks are also explored to further improve self-supervised learning. Second, inspired by the fact that motion continuity endows adjacent skeletons in one skeleton sequence and temporally consecutive skeleton sequences with higher correlations (referred to as locality in 3D skeleton data), we propose a locality-aware attention mechanism and a locality-aware contrastive learning scheme, which aim to preserve locality-awareness at the intra-sequence and inter-sequence levels, respectively, during self-supervised learning. Last, with the context vectors learned by our locality-aware attention mechanism and contrastive learning scheme, a novel feature named Contrastive Attention-based Gait Encodings (CAGEs) is designed to represent gait effectively. Empirical evaluations show that our approach significantly outperforms skeleton-based counterparts by 15-40 percent in Rank-1 accuracy, and it even achieves superior performance to numerous multi-modal methods that use extra RGB or depth information. Our codes are available at https://github.com/Kali-Hac/Locality-Awareness-SGE.


Subjects
Algorithms; Gait; Humans; Skeleton
19.
IEEE Trans Pattern Anal Mach Intell ; 44(10): 6140-6152, 2022 10.
Article in English | MEDLINE | ID: mdl-34125669

ABSTRACT

This paper tackles the problem of training a deep convolutional neural network of both low-bitwidth weights and activations. Optimizing a low-precision network is very challenging due to the non-differentiability of the quantizer, which may result in substantial accuracy loss. To address this, we propose three practical approaches, including (i) progressive quantization; (ii) stochastic precision; and (iii) joint knowledge distillation to improve the network training. First, for progressive quantization, we propose two schemes to progressively find good local minima. Specifically, we propose to first optimize a network with quantized weights and subsequently quantize activations. This is in contrast to the traditional methods which optimize them simultaneously. Furthermore, we propose a second progressive quantization scheme which gradually decreases the bitwidth from high-precision to low-precision during training. Second, to alleviate the excessive training burden due to the multi-round training stages, we further propose a one-stage stochastic precision strategy to randomly sample and quantize sub-networks while keeping other parts in full-precision. Finally, we adopt a novel learning scheme to jointly train a full-precision model alongside the low-precision one. By doing so, the full-precision model provides hints to guide the low-precision model training and significantly improves the performance of the low-precision network. Extensive experiments on various datasets (e.g., CIFAR-100, ImageNet) show the effectiveness of the proposed methods.
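Low-bitwidth training schemes such as this one typically rely on a uniform quantizer with a straight-through estimator; the sketch below shows that generic building block plus a comment on a progressive bitwidth schedule, and is not the paper's exact procedure.

```python
import torch

def quantize_ste(x, num_bits):
    """Uniform quantizer with a straight-through estimator so gradients pass
    through the non-differentiable rounding step. Assumes inputs normalized
    to [0, 1] (e.g., activations after a clipped ReLU)."""
    qmax = 2 ** num_bits - 1
    x = x.clamp(0, 1)
    q = torch.round(x * qmax) / qmax
    return x + (q - x).detach()          # forward: quantized value, backward: identity

# Progressive-quantization sketch: train at high precision first and gradually
# lower the bitwidth over epochs, e.g. 8 -> 6 -> 4 -> 2, re-using the same quantizer.
```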


Subjects
Algorithms; Neural Networks, Computer
20.
Invest Ophthalmol Vis Sci ; 62(15): 1, 2021 12 01.
Article in English | MEDLINE | ID: mdl-34851376

ABSTRACT

Purpose: The purpose of this study was to determine the longitudinal changes in macular retinal and choroidal microvasculature in normal healthy and highly myopic eyes. Methods: Seventy-one eyes, including 32 eyes with high myopia and 39 healthy control eyes, followed for at least 12 months and examined using optical coherence tomography angiography imaging in at least 3 visits, were included in this study. Fovea-centered 6 × 6 mm scans were performed to measure capillary density (CD) of the superficial capillary plexus (SCP), deep capillary plexus (DCP), and choriocapillaris (CC). The rates of CD changes in both groups were estimated using a linear mixed model. Results: Over a mean 14-month follow-up period, highly myopic eyes exhibited a faster rate of whole image CD (wiCD) loss (-1.44%/year vs. -0.11%/year, P = 0.001) and CD loss in the outer ring of the DCP (-1.67%/year vs. -0.14%/year, P < 0.001) than healthy eyes. In multivariate regression analysis, baseline axial length (AL) was negatively correlated with the rate of wiCD loss (estimate = -0.27, 95% confidence interval [CI] = -0.48 to -0.06, P = 0.012) and CD loss in the outer ring (estimate = -0.33, 95% CI = -0.56 to -0.11, P = 0.005), of the DCP. The CD reduction rates in the SCP and CC were comparable in both groups (all P values > 0.05). Conclusions: The rate of CD loss in the DCP is significantly faster in highly myopic eyes than in healthy eyes and is related to baseline AL. The CD in the outer ring reduces faster in eyes with longer baseline AL.


Subjects
Choroid/blood supply; Myopia, Degenerative/physiopathology; Retinal Vessels/physiopathology; Adult; Capillaries/diagnostic imaging; Capillaries/physiopathology; Choroid/diagnostic imaging; Female; Fluorescein Angiography; Follow-Up Studies; Healthy Volunteers; Humans; Intraocular Pressure/physiology; Longitudinal Studies; Male; Middle Aged; Myopia, Degenerative/diagnostic imaging; Prospective Studies; Retinal Vessels/diagnostic imaging; Tomography, Optical Coherence; Visual Acuity/physiology