Results 1 - 20 of 90
1.
Opt Lett ; 46(10): 2344-2347, 2021 May 15.
Article in English | MEDLINE | ID: mdl-33988579

ABSTRACT

Rapid screening of red blood cells for active infection of COVID-19 is presented using a compact and field-portable, 3D-printed shearing digital holographic microscope. Video holograms of thin blood smears are recorded, individual red blood cells are segmented for feature extraction, then a bi-directional long short-term memory network is used to classify between healthy and COVID positive red blood cells based on their spatiotemporal behavior. Individuals are then classified based on the simple majority of their cells' classifications. The proposed system may be beneficial for under-resourced healthcare systems. To the best of our knowledge, this is the first report of digital holographic microscopy for rapid screening of COVID-19.
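
The per-individual decision described in the abstract (a simple majority over the per-cell classifications) can be sketched as follows. This is only an illustration: the function name, the label strings, and the choice to return `None` on a tie are assumptions, not details taken from the paper.

```python
from collections import Counter


def classify_individual(cell_labels):
    """Return the label held by more than half of the cells, or None on a tie."""
    if not cell_labels:
        raise ValueError("at least one cell label is required")
    counts = Counter(cell_labels)
    label, votes = counts.most_common(1)[0]
    # A simple majority needs strictly more than half of all votes.
    # Returning None when no label reaches that is an assumed tie policy.
    return label if 2 * votes > len(cell_labels) else None
```

For example, three cells labeled positive, positive, and healthy would yield a positive call for the individual under this sketch.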


Subjects
COVID-19 Testing/methods, COVID-19/blood, Deep Learning, Erythrocytes/pathology, Holography/instrumentation, SARS-CoV-2, COVID-19/classification, Humans, Image Enhancement/instrumentation, Microscopy/instrumentation, Reproducibility of Results, Sensitivity and Specificity
2.
Sensors (Basel) ; 21(8)2021 Apr 09.
Article in English | MEDLINE | ID: mdl-33918952

ABSTRACT

Recent one-stage 3D detection methods generate anchor boxes with various sizes and orientations in the ground plane, then determine whether these anchor boxes contain any region of interest and adjust their edges to obtain accurate object bounding boxes. The anchor-based algorithm calculates the classification and regression label for each anchor box during the training process, which is inefficient and complicated. We propose a one-stage, anchor-free 3D vehicle detection algorithm based on LiDAR point clouds. The object position is encoded as a set of keypoints in the bird's-eye view (BEV) of point clouds. We apply the voxel/pillar feature extractor and convolutional blocks to map an unstructured point cloud to a single-channel 2D heatmap. The vehicle's Z-axis position, dimension, and orientation angle are regressed as additional attributes of the keypoints. Our method combines SmoothL1 loss and IoU (Intersection over Union) loss, and we apply (cosθ, sinθ) as angle regression labels, which achieve high average orientation similarity (AOS) without any direction classification tricks. During the target assignment and bounding box decoding process, our framework completely avoids any calculations related to anchor boxes. Our framework is trained end to end and performs at the same level as other one-stage anchor-based detectors.
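
A minimal sketch of the (cosθ, sinθ) angle labels mentioned in the abstract, assuming the angle is recovered at decoding time with a two-argument arctangent. The function names are hypothetical and the decoding step is an assumption, since the abstract does not describe it.

```python
import math


def encode_angle(theta):
    """Turn an orientation angle in radians into a (cos, sin) regression label."""
    return math.cos(theta), math.sin(theta)


def decode_angle(cos_pred, sin_pred):
    """Recover an angle in (-pi, pi] from a predicted (cos, sin) pair.

    atan2 depends only on the direction of the pair, so the prediction
    does not have to lie exactly on the unit circle.
    """
    return math.atan2(sin_pred, cos_pred)
```

One reason such a label is convenient: angles just below π and just above -π describe nearly the same heading, and their (cos, sin) labels are also close, whereas the raw angle values differ by almost 2π.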

3.
Purinergic Signal ; 16(1): 61-72, 2020 03.
Article in English | MEDLINE | ID: mdl-31989534

ABSTRACT

Accumulating evidence supports a therapeutic role of purinergic signaling in cardiac diseases. Previously, efficacy of systemically infused MRS2339, a charged methanocarba derivative of 2-Cl-adenosine monophosphate, was demonstrated in animal models of heart failure. We now test the hypothesis that an uncharged adenine nucleoside phosphonate, suitable as an oral agent with a hydrolysis-resistant phospho moiety, can prevent the development of cardiac dysfunction in a post-infarction ischemic or pressure overload-induced heart failure model in mice. The diester-masked uncharged phosphonate MRS2978 was efficacious in preventing cardiac dysfunction with improved left ventricular (LV) fractional shortening when administered orally at the onset of ischemic or pressure overload-induced heart failure. MRS2925, the charged, unmasked MRS2978 analog, prevented heart dysfunction when infused subcutaneously but not by oral gavage. When administered orally or systemically, MRS2978 but not MRS2925 could also rescue established cardiac dysfunction in both ischemic and pressure overload heart failure models. The diester-masked phosphate MRS4074 was highly efficacious at preventing the development of dysfunction as well as in rescuing pressure overload-induced and ischemic heart failure. MRS2978 was orally bioavailable (57-75%) giving rise to MRS2925 as a minor metabolite in vivo, tested in rats. The data are consistent with a novel therapeutic role of adenine nucleoside phosphonates in systolic heart failure.


Subjects
Adenosine Monophosphate/pharmacology, Heart Failure, Purinergic P2X Receptor Agonists/pharmacology, Adenosine Monophosphate/chemical synthesis, Adenosine Monophosphate/chemistry, Animals, Mice, Purinergic P2X Receptor Agonists/chemical synthesis, Purinergic P2X Receptor Agonists/chemistry
4.
Am J Physiol Heart Circ Physiol ; 307(10): H1469-77, 2014 Nov 15.
Article in English | MEDLINE | ID: mdl-25239801

ABSTRACT

P2X4 receptors (P2X4Rs) are ligand-gated ion channels capable of conducting cations such as Na(+). Endogenous cardiac P2X4R can mediate ATP-activated current in adult murine cardiomyocytes. In the present study, we tested the hypothesis that cardiac P2X receptors can induce Na(+) entry and modulate Na(+) handling. We further determined whether P2X receptor-induced stimulation of the Na(+)/Ca(2+) exchanger (NCX) has a role in modulating the cardiac contractile state. Changes in Na(+)-K(+)-ATPase current (Ip) and NCX current (INCX) after agonist stimulation were measured in ventricular myocytes of P2X4 transgenic mice using whole cell patch-clamp techniques. The agonist 2-methylthio-ATP (2-meSATP) increased peak Ip from a basal level of 0.52 ± 0.02 to 0.58 ± 0.03 pA/pF. 2-meSATP also increased the Ca(2+) entry mode of INCX (0.55 ± 0.09 pA/pF under control conditions vs. 0.82 ± 0.14 pA/pF with 2-meSATP) at a membrane potential of +50 mV. 2-meSATP shifted the reversal potential of INCX from -14 ± 2.3 to -25 ± 4.1 mV, causing an estimated intracellular Na(+) concentration increase of 1.28 ± 0.42 mM. These experimental results were closely mimicked by mathematical simulations based on previously established models. KB-R7943 or a structurally different agent preferentially opposing the Ca(2+) entry mode of NCX, YM-244769, could inhibit the 2-meSATP-induced increase in cell shortening in transgenic myocytes. Thus, the Ca(2+) entry mode of INCX participates in P2X agonist-stimulated contractions. In ventricular myocytes from wild-type mice, the P2X agonist could increase INCX, and KB-R7943 was able to inhibit the contractile effect of endogenous P2X4Rs, indicating a physiological role of these receptors in wild-type cells. The data demonstrate a novel Na(+) entry pathway through ligand-gated P2X4Rs in cardiomyocytes.
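
The link between a shift in the NCX reversal potential and a rise in intracellular Na(+) can be illustrated with a textbook Nernst-type calculation. This is a sketch under stated assumptions, not the authors' estimation procedure: it assumes a 3 Na(+) : 1 Ca(2+) exchange stoichiometry (so E_NCX = 3·E_Na − 2·E_Ca), an unchanged Ca(2+) gradient and extracellular Na(+), and a temperature of 37 °C. The basal Na(+) concentration used below is hypothetical, since the abstract does not report one.

```python
import math

R = 8.314462618  # gas constant, J/(mol*K)
F = 96485.33212  # Faraday constant, C/mol


def na_ratio_from_reversal_shift(shift_mv, temperature_k=310.15):
    """Fold change in intracellular Na+ implied by a shift in E_NCX (in mV).

    Assumes E_NCX = 3*E_Na - 2*E_Ca with E_Na = (RT/F) * ln(Na_o / Na_i),
    and that E_Ca and Na_o do not change, so that
    shift = -3 * (RT/F) * ln(Na_i_after / Na_i_before).
    """
    rt_over_f_mv = 1000.0 * R * temperature_k / F
    return math.exp(-shift_mv / (3.0 * rt_over_f_mv))


# Reversal potential shift reported in the abstract: from -14 mV to -25 mV.
ratio = na_ratio_from_reversal_shift(-25.0 - (-14.0))

# Hypothetical basal intracellular Na+ (mM); not a value from the abstract.
basal_na_mm = 8.0
increase_mm = basal_na_mm * (ratio - 1.0)
```

Under these assumptions a negative shift in the reversal potential always corresponds to a ratio above 1, that is, a rise in intracellular Na(+), which matches the direction of the change reported in the abstract. The size of the increase depends directly on the assumed basal concentration.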


Subjects
Cardiac Myocytes/metabolism, Purinergic P2X4 Receptors/metabolism, Sodium-Calcium Exchanger/metabolism, Sodium/metabolism, Action Potentials, Adenosine Triphosphate/analogs & derivatives, Adenosine Triphosphate/pharmacology, Animals, Computer Simulation, Ligands, Transgenic Mice, Cardiovascular Models, Myocardial Contraction, Cardiac Myocytes/drug effects, Niacinamide/analogs & derivatives, Niacinamide/pharmacology, Purinergic P2X Receptor Agonists/pharmacology, Purinergic P2X4 Receptors/drug effects, Purinergic P2X4 Receptors/genetics, Signal Transduction, Sodium-Calcium Exchanger/antagonists & inhibitors, Sodium-Potassium-Exchanging ATPase/metabolism, Thionucleotides/pharmacology, Thiourea/analogs & derivatives, Thiourea/pharmacology
5.
IEEE Trans Neural Netw Learn Syst ; 35(4): 5435-5446, 2024 Apr.
Article in English | MEDLINE | ID: mdl-37267139

ABSTRACT

Few-shot object detection (FSOD), which detects novel objects with only a few training instances, has recently attracted more attention. Previous works focus on making the most use of label information of objects. Still, they fail to consider the structural and semantic information of the image itself and to efficiently resolve the misclassification between data-abundant base classes and data-scarce novel classes. In this article, we propose the FSOD with Self-Supervising and Cooperative Classifier ([Formula: see text]) approach to deal with those concerns. Specifically, we analyze the underlying performance degradation of novel classes in FSOD and discover that false-positive samples are the main reason. By looking into these false-positive samples, we further notice that misclassifying novel classes as base classes is the main cause. Thus, we introduce double RoI heads into the existing Fast-RCNN to learn more specific features for novel classes. We also consider using self-supervised learning (SSL) to learn more structural and semantic information. Finally, we propose a cooperative classifier (CC) with the base-novel regularization to maximize the interclass variance between base and novel classes. In the experiment, [Formula: see text] outperforms all the latest baselines in most cases on PASCAL VOC and COCO.

6.
Article in English | MEDLINE | ID: mdl-38743545

ABSTRACT

Fusing features from different sources is a critical aspect of many computer vision tasks. Existing approaches can be roughly categorized as parameter-free or learnable operations. However, parameter-free modules are limited in their ability to benefit from offline learning, leading to poor performance in some challenging situations. Learnable fusing methods are often space-consuming and time-consuming, particularly when fusing features with different shapes. To address these shortcomings, we conducted an in-depth analysis of the limitations associated with both fusion methods. Based on our findings, we propose a generalized module named Asymmetric Convolution Module (ACM). This module can learn to encode effective priors during offline training and efficiently fuse feature maps with different shapes in specific tasks. Specifically, we propose a mathematically equivalent method for replacing costly convolutions on concatenated features. This method can be widely applied to fuse feature maps across different shapes. Furthermore, unlike parameter-free operations that can only fuse two features of the same type, our ACM is general, flexible, and can fuse multiple features of different types. To demonstrate the generality and efficiency of ACM, we integrate it into several state-of-the-art models on three representative vision tasks: visual object tracking, referring video object segmentation, and monocular 3D object detection. Extensive experimental results on three tasks and several datasets demonstrate that our new module can bring significant improvements and noteworthy efficiency.

7.
IEEE Trans Image Process ; 32: 2889-2900, 2023.
Article in English | MEDLINE | ID: mdl-36240039

ABSTRACT

Point cloud semantic segmentation (PCSS), for the purpose of labeling a set of points stored in irregular and unordered structures, is an important yet challenging task. It is vital for the task of learning a good representation for each 3D data point, which encodes rich context knowledge and hierarchically structural information. However, although existing PCSS methods have achieved great success, they are limited in their ability to make full use of important context information and rich hierarchical features for representation learning. In this paper, we propose to build 'hyperpoint' representations for 3D data points via a nested network architecture, which is able to explicitly exploit multi-scale, pyramidally hierarchical features and construct powerful representations for PCSS. In particular, we introduce a PCSS nested architecture search (PCSS-NAS) algorithm to automatically design the model's side-output branches at different levels as well as its skip-layer structures, enabling the resulting model to best deal with the scale-space problem. Our searched architecture, named Auto-NestedNet, is evaluated on four well-known benchmarks: S3DIS, ScanNet, Semantic3D and Paris-Lille-3D. Experimental results show that the proposed Auto-NestedNet achieves state-of-the-art performance. Our source code is available at https://github.com/fanyang587/NestedNet.

8.
IEEE Trans Pattern Anal Mach Intell ; 45(8): 9822-9835, 2023 Aug.
Article in English | MEDLINE | ID: mdl-34752380

ABSTRACT

Previous works for LiDAR-based 3D object detection mainly focus on the single-frame paradigm. In this paper, we propose to detect 3D objects by exploiting temporal information in multiple frames, i.e., point cloud videos. We empirically categorize the temporal information into short-term and long-term patterns. To encode the short-term data, we present a Grid Message Passing Network (GMPNet), which considers each grid (i.e., the grouped points) as a node and constructs a k-NN graph with the neighbor grids. To update features for a grid, GMPNet iteratively collects information from its neighbors, thus mining the motion cues in grids from nearby frames. To further aggregate long-term frames, we propose an Attentive Spatiotemporal Transformer GRU (AST-GRU), which contains a Spatial Transformer Attention (STA) module and a Temporal Transformer Attention (TTA) module. STA and TTA enhance the vanilla GRU to focus on small objects and better align moving objects. Our overall framework supports both online and offline video object detection in point clouds. We implement our algorithm based on prevalent anchor-based and anchor-free detectors. Evaluation results on the challenging nuScenes benchmark show superior performance of our method, achieving first on the leaderboard (at the time of paper submission) without any "bells and whistles." Our source code is available at https://github.com/shenjianbing/GMP3D.
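
The graph construction described in the abstract (each grid is a node linked to its k nearest neighbor grids, and features are updated iteratively from neighbors) can be sketched as below. This is an illustration only: the function names are hypothetical, grids are reduced to their center coordinates, and the plain averaging update is a stand-in for the learned update in GMPNet, which the abstract does not specify.

```python
import math


def knn_graph(centers, k):
    """Map each grid index to the indices of its k nearest neighbor grids."""
    graph = {}
    for i, ci in enumerate(centers):
        # Distance from grid i to every other grid, closest first.
        dists = sorted(
            (math.dist(ci, cj), j) for j, cj in enumerate(centers) if j != i
        )
        graph[i] = [j for _, j in dists[:k]]
    return graph


def message_passing_step(features, graph):
    """One update: average each node feature with the mean of its neighbors.

    The averaging rule is an assumption used for illustration, not the
    update function from the paper.
    """
    updated = []
    for i, feat in enumerate(features):
        neighbors = graph[i]
        if not neighbors:
            updated.append(list(feat))
            continue
        mean = [
            sum(features[j][d] for j in neighbors) / len(neighbors)
            for d in range(len(feat))
        ]
        updated.append([(a + b) / 2.0 for a, b in zip(feat, mean)])
    return updated
```

Calling `message_passing_step` repeatedly lets information travel beyond the immediate neighbors, which is the general idea behind iterative collection of information from nearby grids.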


Subjects
Algorithms, Computer Neural Networks, Benchmarking, Cues, Motion
9.
IEEE Trans Vis Comput Graph ; 29(6): 2926-2939, 2023 Jun.
Article in English | MEDLINE | ID: mdl-35044917

ABSTRACT

Mesh Schelling points explain how humans focus on specific regions of a 3D object. They have a large number of important applications in computer graphics and provide valuable information for perceptual psychology studies. However, detecting mesh Schelling points is time-consuming and expensive since the existing techniques are mostly based on participant observation studies. To overcome these limitations, we propose to employ powerful deep learning techniques to detect mesh Schelling points in an automatic manner, free from participant observation studies. Specifically, we utilize the mesh convolution and pooling operations to extract informative features from mesh objects, and then predict the 3D heat map of Schelling points in an end-to-end manner. In addition, we propose a Deep Schelling Network (DS-Net) to automatically detect the Schelling points, including a multi-scale fusion component and a novel region-specific loss function to improve our network for a better regression of heat maps. To the best of our knowledge, DS-Net is the first deep neural network for detecting Schelling points from 3D meshes. We evaluate DS-Net on a mesh Schelling point dataset obtained from participant observation studies. The experimental results demonstrate that DS-Net is capable of detecting mesh Schelling points effectively and outperforms various state-of-the-art mesh saliency methods and deep learning models, both qualitatively and quantitatively.

10.
IEEE Trans Image Process ; 32: 3163-3175, 2023.
Article in English | MEDLINE | ID: mdl-37115829

ABSTRACT

Current video semantic segmentation tasks involve two main challenges: how to take full advantage of multi-frame context information, and how to improve computational efficiency. To tackle the two challenges simultaneously, we present a novel Multi-Granularity Context Network (MGCNet) by aggregating context information at multiple granularities in a more effective and efficient way. Our method first converts image features into semantic prototypes, and then conducts a non-local operation to aggregate the per-frame and short-term contexts jointly. An additional long-term context module is introduced to capture the video-level semantic information during training. By aggregating both local and global semantic information, a strong feature representation is obtained. The proposed pixel-to-prototype non-local operation requires less computational cost than traditional non-local ones, and is video-friendly since it reuses the semantic prototypes of previous frames. Moreover, we propose an uncertainty-aware and structural knowledge distillation strategy to boost the performance of our method. Experiments on Cityscapes and CamVid datasets with multiple backbones demonstrate that the proposed MGCNet outperforms other state-of-the-art methods with high speed and low latency.

11.
IEEE Trans Pattern Anal Mach Intell ; 45(7): 8049-8062, 2023 Jul.
Article in English | MEDLINE | ID: mdl-37015606

ABSTRACT

In this article, we provide an intuitive view that simplifies Siamese-based trackers by converting the tracking task to a classification task. Under this view, we perform an in-depth analysis of these trackers through visual simulations and real tracking examples, and find that the failure cases in some challenging situations can be regarded as the issue of missing decisive samples in offline training. Since the samples in the initial (first) frame contain rich sequence-specific information, we can regard them as the decisive samples to represent the whole sequence. To quickly adapt the base model to new scenes, a compact latent network is presented that makes full use of these decisive samples. Specifically, we present a statistics-based compact latent feature for fast adjustment by efficiently extracting the sequence-specific information. Furthermore, a new diverse sample mining strategy is designed for training to further improve the discrimination ability of the proposed compact latent network. Finally, a conditional updating strategy is proposed to efficiently update the basic models to handle scene variation during the tracking phase. To evaluate the generalization ability and effectiveness of our method, we apply it to adjust three classical Siamese-based trackers, namely SiamRPN++, SiamFC, and SiamBAN. Extensive experimental results on six recent datasets demonstrate that all three adjusted trackers obtain superior accuracy while maintaining high running speed.

12.
Article in English | MEDLINE | ID: mdl-37022854

ABSTRACT

This article presents a new adaptive metric distillation approach that can significantly improve the student networks' backbone features, along with better classification results. Previous knowledge distillation (KD) methods usually focus on transferring the knowledge across the classifier logits or feature structure, ignoring the excessive sample relations in the feature space. We demonstrated that such a design greatly limits performance, especially for the retrieval task. The proposed collaborative adaptive metric distillation (CAMD) has three main advantages: 1) the optimization focuses on optimizing the relationship between key pairs by introducing the hard mining strategy into the distillation framework; 2) it provides an adaptive metric distillation that can explicitly optimize the student feature embeddings by applying the relation in the teacher embeddings as supervision; and 3) it employs a collaborative scheme for effective knowledge aggregation. Extensive experiments demonstrated that our approach sets a new state-of-the-art in both the classification and retrieval tasks, outperforming other cutting-edge distillers under various settings.

13.
IEEE Trans Pattern Anal Mach Intell ; 45(1): 197-210, 2023 01.
Article in English | MEDLINE | ID: mdl-35104213

ABSTRACT

Subspace clustering is a classical technique that has been widely used for human motion segmentation and other related tasks. However, existing segmentation methods often cluster data without guidance from prior knowledge, resulting in unsatisfactory segmentation results. To this end, we propose a novel Consistency and Diversity induced human Motion Segmentation (CDMS) algorithm. Specifically, our model factorizes the source and target data into distinct multi-layer feature spaces, in which transfer subspace learning is conducted on different layers to capture multi-level information. A multi-mutual consistency learning strategy is carried out to reduce the domain gap between the source and target data. In this way, the domain-specific knowledge and domain-invariant properties can be explored simultaneously. Besides, a novel constraint based on the Hilbert Schmidt Independence Criterion (HSIC) is introduced to ensure the diversity of multi-level subspace representations, which enables the complementarity of multi-level representations to be explored to boost the transfer learning performance. Moreover, to preserve the temporal correlations, an enhanced graph regularizer is imposed on the learned representation coefficients and the multi-level representations of the source data. The proposed model can be efficiently solved using the Alternating Direction Method of Multipliers (ADMM) algorithm. Extensive experimental results on public human motion datasets demonstrate the effectiveness of our method against several state-of-the-art approaches.
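
For readers unfamiliar with the Hilbert Schmidt Independence Criterion mentioned in the abstract, the standard biased empirical estimator is tr(KHLH)/(n-1)^2, where K and L are kernel matrices of the two sample sets and H is the centering matrix. The sketch below computes it in plain Python with a linear kernel. The kernel choice and function names are assumptions for illustration, and the code is not the constraint as formulated in the CDMS model.

```python
def linear_kernel(samples):
    """Gram matrix of inner products between all pairs of sample vectors."""
    return [[sum(a * b for a, b in zip(u, v)) for v in samples] for u in samples]


def center(gram):
    """Return H @ gram @ H, where H = I - (1/n) * ones is the centering matrix."""
    n = len(gram)
    row_mean = [sum(row) / n for row in gram]
    col_mean = [sum(gram[i][j] for i in range(n)) / n for j in range(n)]
    total_mean = sum(row_mean) / n
    return [
        [gram[i][j] - row_mean[i] - col_mean[j] + total_mean for j in range(n)]
        for i in range(n)
    ]


def hsic(xs, ys):
    """Biased empirical HSIC between paired samples xs and ys."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("xs and ys must be paired and contain at least 2 samples")
    k_centered = center(linear_kernel(xs))
    l_gram = linear_kernel(ys)
    # tr(K H L H) equals tr((H K H) L) by the cyclic property of the trace.
    trace = sum(k_centered[i][j] * l_gram[j][i] for i in range(n) for j in range(n))
    return trace / (n - 1) ** 2
```

A value near zero indicates little statistical dependence between the two representations under the chosen kernel, which is why a penalty on this quantity can be used to encourage diversity between representations.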


Subjects
Algorithms, Humans, Cluster Analysis
14.
IEEE Trans Image Process ; 32: 6543-6557, 2023.
Article in English | MEDLINE | ID: mdl-37922168

ABSTRACT

Self-supervised space-time correspondence learning utilizing unlabeled videos holds great potential in computer vision. Most existing methods rely on contrastive learning with mining negative samples or adapting reconstruction from the image domain, which requires dense affinity across multiple frames or optical flow constraints. Moreover, video correspondence prediction models need to uncover more inherent properties of the video, such as structural information. In this work, we propose HiGraph+, a sophisticated space-time correspondence framework based on learnable graph kernels. By treating videos as a spatial-temporal graph, the learning objective of HiGraph+ is issued in a self-supervised manner, predicting the unobserved hidden graph via graph kernel methods. First, we learn the structural consistency of sub-graphs in graph-level correspondence learning. Furthermore, we introduce a spatio-temporal hidden graph loss through contrastive learning that facilitates learning temporal coherence across frames of sub-graphs and spatial diversity within the same frame. Therefore, we can predict long-term correspondences and drive the hidden graph to acquire distinct local structural representations. Then, we learn a refined representation across frames on the node-level via a dense graph kernel. The structural and temporal consistency of the graph forms the self-supervision of model training. HiGraph+ achieves excellent performance and demonstrates robustness in benchmark tests involving object, semantic part, keypoint, and instance labeling propagation tasks. Our algorithm implementations have been made publicly available at https://github.com/zyqin19/HiGraph.

15.
Vaccines (Basel) ; 11(2)2023 Feb 04.
Article in English | MEDLINE | ID: mdl-36851235

ABSTRACT

Since the emergence of SARS-CoV-2, maintaining healthcare worker (HCW) health and safety has been fundamental to responding to the global pandemic. Vaccination with mRNA-based vaccines targeting the SARS-CoV-2 spike protein has emerged as a key strategy in reducing HCW susceptibility to SARS-CoV-2; however, neutralizing antibody responses subside with time and may be influenced by many variables. We sought to understand the dynamics between vaccine products, prior clinical illness from SARS-CoV-2, and incidence of vaccine-associated adverse reactions on antibody decay over time in HCWs at a university medical center. A cohort of 296 HCWs received standard two-dose vaccination with either BNT162b2 (Pfizer/BioNTech) or mRNA-1273 (Moderna) and were evaluated after two, six, and nine months. Subjects were grouped by antibody decay curve into steep antibody decliners and gentle decliners. Vaccination with mRNA-1273 led to more sustained antibody responses compared to BNT162b2. Subjects experiencing vaccine-associated symptoms were more likely to experience a more prolonged neutralizing antibody response. Subjects with clinical SARS-CoV-2 infection prior to vaccination were more likely to experience vaccination-associated symptoms after first vaccination and were more likely to have a more blunted antibody decay. Understanding factors associated with vaccine efficacy may assist clinicians in determining appropriate vaccine strategies in HCWs.

16.
Article in English | MEDLINE | ID: mdl-35130171

ABSTRACT

Answering semantically complicated questions according to an image is challenging in a visual question answering (VQA) task. Although the image can be well represented by deep learning, the question is always simply embedded and cannot well indicate its meaning. Besides, there is a gap between the visual and textual features because they come from different modalities, which makes it difficult to align and utilize the cross-modality information. In this article, we focus on these two problems and propose a graph matching attention (GMA) network. First, it not only builds a graph for the image but also constructs a graph for the question in terms of both syntactic and embedding information. Next, we explore the intramodality relationships by a dual-stage graph encoder and then present a bilateral cross-modality GMA to infer the relationships between the image and the question. The updated cross-modality features are then sent into the answer prediction module for final answer prediction. Experiments demonstrate that our network achieves state-of-the-art performance on the GQA dataset and the VQA 2.0 dataset. The ablation studies verify the effectiveness of each module in our GMA network.

17.
IEEE Trans Pattern Anal Mach Intell ; 44(4): 2228-2242, 2022 Apr.
Article in English | MEDLINE | ID: mdl-33232224

ABSTRACT

We introduce a novel network, called CO-attention siamese network (COSNet), to address the zero-shot video object segmentation task in a holistic fashion. We exploit the inherent correlation among video frames and incorporate a global co-attention mechanism to further improve the state-of-the-art deep learning based solutions that primarily focus on learning discriminative foreground representations over appearance and motion in short-term temporal segments. The co-attention layers in COSNet provide efficient and competent stages for capturing global correlations and scene context by jointly computing and appending co-attention responses into a joint feature space. COSNet is a unified and end-to-end trainable framework where different co-attention variants can be derived for capturing diverse properties of the learned joint feature space. We train COSNet with pairs (or groups) of video frames, and this naturally augments training data and allows increased learning capacity. During the segmentation stage, the co-attention model encodes useful information by processing multiple reference frames together, which is leveraged to infer the frequently reappearing and salient foreground objects better. Our extensive experiments over three large benchmarks demonstrate that COSNet outperforms the current alternatives by a large margin. Our implementations are available at https://github.com/carrierlxk/COSNet.

18.
IEEE Trans Pattern Anal Mach Intell ; 44(6): 2827-2840, 2022 06.
Article in English | MEDLINE | ID: mdl-33400648

ABSTRACT

This paper addresses the task of detecting and recognizing human-object interactions (HOI) in images. Considering the intrinsic complexity and structural nature of the task, we introduce a cascaded parsing network (CP-HOI) for a multi-stage, structured HOI understanding. At each cascade stage, an instance detection module progressively refines HOI proposals and feeds them into a structured interaction reasoning module. Each of the two modules is also connected to its predecessor in the previous stage, enabling efficient cross-stage information propagation. The structured interaction reasoning module is built upon a graph parsing neural network (GPNN), which efficiently models potential HOI structures as graphs and mines rich context for comprehensive relation understanding. In particular, GPNN infers a parse graph that i) interprets meaningful HOI structures by a learnable adjacency matrix, and ii) predicts action (edge) labels. Within an end-to-end, message-passing framework, GPNN blends learning and inference, iteratively parsing HOI structures and reasoning HOI representations (i.e., instance and relation features). Further beyond relation detection at a bounding-box level, we make our framework flexible to perform fine-grained pixel-wise relation segmentation; this provides a new glimpse into better relation modeling. A preliminary version of our CP-HOI model reached 1st place in the ICCV2019 Person in Context Challenge, on both relation detection and segmentation. In addition, our CP-HOI shows promising results on two popular HOI recognition benchmarks, i.e., V-COCO and HICO-DET.


Subjects
Algorithms, Computer Neural Networks, Humans, Learning, Visual Perception
19.
IEEE Trans Pattern Anal Mach Intell ; 44(7): 3508-3522, 2022 07.
Article in English | MEDLINE | ID: mdl-33513100

ABSTRACT

Modeling the human structure is central for human parsing that extracts pixel-wise semantic information from images. We start with analyzing three types of inference processes over the hierarchical structure of human bodies: direct inference (directly predicting human semantic parts using image information), bottom-up inference (assembling knowledge from constituent parts), and top-down inference (leveraging context from parent nodes). We then formulate the problem as a compositional neural information fusion (CNIF) framework, which assembles the information from the three inference processes in a conditional manner, i.e., considering the confidence of the sources. Based on CNIF, we further present a part-relation-aware human parser (PRHP), which precisely describes three kinds of human part relations, i.e., decomposition, composition, and dependency, by three distinct relation networks. Expressive relation information can be captured by constraining the parameters in the relation networks to satisfy specific geometric characteristics of different relations. By assimilating generic message-passing networks with their edge-typed, convolutional counterparts, PRHP performs iterative reasoning over the human body hierarchy. With these efforts, PRHP provides a more general and powerful form of CNIF, and lays the foundation for more sophisticated and flexible human relation patterns of reasoning. Experiments on five datasets demonstrate that our two human parsers outperform the state of the art in all cases.


Subjects
Algorithms, Semantics, Humans, Software
20.
IEEE Trans Image Process ; 31: 585-597, 2022.
Article in English | MEDLINE | ID: mdl-34310301

ABSTRACT

Separating the dominant person from the complex background is significant to the human-related research and photo-editing based applications. Existing segmentation algorithms are either too general to separate the person region accurately, or not capable of achieving real-time speed. In this paper, we introduce the multi-domain learning framework into a novel baseline model to construct the Multi-domain TriSeNet Networks for the real-time single person image segmentation. We first divide training data into different subdomains based on the characteristics of single person images, then apply a multi-branch Feature Fusion Module (FFM) to decouple the networks into the domain-independent and the domain-specific layers. To further enhance the accuracy, a self-supervised learning strategy is proposed to dig out domain relations during training. It helps transfer domain-specific knowledge by improving predictive consistency among different FFM branches. Moreover, we create a large-scale single person image segmentation dataset named MSSP20k, which consists of 22,100 pixel-level annotated images in the real world. The MSSP20k dataset is more complex and challenging than existing public ones in terms of scalability and variety. Experiments show that our Multi-domain TriSeNet outperforms state-of-the-art approaches on both public and the newly built datasets with real-time speed.


Subjects
Algorithms, Computer-Assisted Image Processing, Humans