ABSTRACT
Cooperation in asymmetric populations has garnered significant attention in evolutionary games. This paper explores the evolution of cooperation in populations of weak and strong players, using a game model in which players choose between cooperation and defection. Asymmetry stems from the different benefits that strong and weak cooperators provide, and their benefit ratio quantifies the degree of asymmetry. Varied rankings of the parameters, including the degree of asymmetry, the cost of cooperation, and the benefit brought by weak players, give rise to scenarios including the prisoner's dilemma game (PDG) for both player types, the snowdrift game (SDG) for both, and mixed PDG-SDG interactions. Our results indicate that in an infinite well-mixed population, defection remains the dominant strategy when strong players engage in the prisoner's dilemma game. However, if strong players play snowdrift games, global cooperation increases with the proportion of strong players. In this scenario, strong cooperators can prevail over strong defectors when the proportion of strong players is low, but the prevalence of cooperation among strong players decreases as their proportion increases. In contrast, on a square lattice, global cooperation peaks at intermediate proportions of strong players under moderate degrees of asymmetry. Additionally, weak players protect cooperative clusters from exploitation by strong defectors. This study highlights the complex dynamics of cooperation under asymmetric interactions and contributes to the theory of cooperation in asymmetric games.
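For intuition, the sketch below iterates replicator dynamics for a well-mixed population split into weak and strong types under a hypothetical donation-style payoff: a cooperator pays cost c and delivers a type-dependent benefit b to the co-player, so the ratio b['s']/b['w'] plays the role of the asymmetry degree. All parameter values (c, b, rho) are illustrative assumptions, not the paper's exact payoff matrix; in this donation form the interaction is a PDG and defection dominates, consistent with the well-mixed result above, while a different cost rule (e.g., splitting c under mutual cooperation) would shift the game toward the SDG.

```python
import numpy as np

c = 0.6                        # cooperation cost (assumed)
b = {'w': 1.0, 's': 1.8}       # benefit delivered by a weak/strong cooperator
rho = 0.3                      # fixed proportion of strong players

def payoff(i_cooperate, opp_type, opp_cooperates):
    # donation-style game: the gain depends only on the co-player's type
    gain = b[opp_type] if opp_cooperates else 0.0
    return gain - (c if i_cooperate else 0.0)

def avg_payoff(i_cooperate, xw, xs):
    # expected payoff against a co-player drawn from the mixed population
    total = 0.0
    for opp_type, share, xc in (('w', 1 - rho, xw), ('s', rho, xs)):
        total += share * (xc * payoff(i_cooperate, opp_type, True)
                          + (1 - xc) * payoff(i_cooperate, opp_type, False))
    return total

xw, xs, dt = 0.5, 0.5, 0.01    # cooperator fractions among weak/strong players
for _ in range(20000):         # replicator dynamics within each type
    fc, fd = avg_payoff(True, xw, xs), avg_payoff(False, xw, xs)
    xw += dt * xw * (1 - xw) * (fc - fd)
    xs += dt * xs * (1 - xs) * (fc - fd)
print(f"cooperator share: weak={xw:.3f}, strong={xs:.3f}")
```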
Subjects
Cooperative Behavior , Game Theory , Prisoner Dilemma , Population Dynamics , Biological Evolution

ABSTRACT
Multi-human parsing is an image segmentation task necessitating both instance-level and fine-grained category-level information. However, prior research has typically processed these two types of information through distinct branch types and output formats, leading to inefficient and redundant frameworks. This paper introduces UniParser, which integrates instance-level and category-level representations in three key aspects: 1) we propose a unified correlation representation learning approach, allowing our network to learn instance and category features within the cosine space; 2) we unify the output form of each module as pixel-level results, while supervising instance and category features using a homogeneous label accompanied by an auxiliary loss; and 3) we design a joint optimization procedure to fuse instance and category representations. By unifying instance-level and category-level outputs, UniParser circumvents manually designed post-processing techniques and surpasses state-of-the-art methods, achieving 49.3% AP on MHPv2.0 and 60.4% AP on CIHP. We have released our source code, pretrained models, and demos to facilitate future studies at https://github.com/cjm-sfw/Uniparser.
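As a rough illustration of correlation in cosine space, the snippet below matches L2-normalized instance kernels and category kernels against per-pixel features, so both heads emit pixel-level maps of the same form. All tensor names and shapes (feats, inst_kernels, cat_kernels) are hypothetical, not UniParser's actual implementation.

```python
import torch
import torch.nn.functional as F

B, C, H, W = 2, 64, 32, 32
N_INST, N_CAT = 8, 20

feats = F.normalize(torch.randn(B, C, H, W), dim=1)           # pixel embeddings
inst_kernels = F.normalize(torch.randn(B, N_INST, C), dim=2)  # per-image instances
cat_kernels = F.normalize(torch.randn(N_CAT, C), dim=1)       # shared categories

# Cosine correlation = dot product of unit vectors; both heads produce
# pixel-level maps that can be supervised with homogeneous labels.
inst_maps = torch.einsum('bnc,bchw->bnhw', inst_kernels, feats)
cat_maps = torch.einsum('nc,bchw->bnhw', cat_kernels, feats)
print(inst_maps.shape, cat_maps.shape)  # (2, 8, 32, 32) (2, 20, 32, 32)
```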
ABSTRACT
Owing to the unremitting efforts of a few institutes, researchers have recently made significant progress in designing superhuman artificial intelligence (AI) for no-limit Texas hold'em (NLTH), the primary testbed for large-scale imperfect-information game research. However, it remains challenging for new researchers to study this problem because there are no standard benchmarks for comparison with existing methods, which hinders further development in this research area. This work presents OpenHoldem, an integrated benchmark for large-scale imperfect-information game research using NLTH. OpenHoldem makes three main contributions to this research direction: 1) a standardized evaluation protocol for thoroughly evaluating different NLTH AIs; 2) four publicly available strong baselines for NLTH AI; and 3) an online testing platform with easy-to-use APIs for public NLTH AI evaluation. We will publicly release OpenHoldem and hope it facilitates further studies of the unsolved theoretical and computational issues in this area and stimulates research on crucial problems such as opponent modeling and human-computer interactive learning.
ABSTRACT
Cooperative AI has shown its effectiveness in resolving the conundrum of cooperation. Understanding how cooperation emerges in human-agent hybrid populations is a topic of significant interest, particularly in evolutionary game theory. In this article, we scrutinize how cooperative and defective Autonomous Agents (AAs) influence human cooperation in one-shot social dilemma games. Focusing on well-mixed populations, we find that cooperative AAs have a limited impact in the prisoner's dilemma games but facilitate cooperation in the stag hunt games. Surprisingly, defective AAs can promote the complete dominance of cooperation in the snowdrift games. As the proportion of AAs increases, however, both cooperative and defective AAs can cause human cooperation to disappear. We then extend our investigation to the pairwise comparison rule and complex networks, elucidating that imitation strength and population structure are critical for the emergence of human cooperation in human-agent hybrid populations.
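The pairwise comparison rule mentioned above can be sketched as follows: a human player i imitates a random co-player j's strategy with the Fermi probability 1/(1 + exp(-beta (pj - pi))), where beta is the imitation strength, while AAs never update their strategy. The payoff values, population size, and payoff-sampling details below are illustrative assumptions, not the article's exact setup.

```python
import numpy as np

rng = np.random.default_rng(0)

R, S, T, P = 1.0, -0.2, 1.2, 0.0    # assumed prisoner's dilemma payoffs
beta = 1.0                           # imitation strength
N, n_aa = 200, 40                    # population size, number of AAs
AA_STRATEGY = 1                      # 1 = cooperative AAs, 0 = defective AAs

strat = rng.integers(0, 2, N)        # 1 = cooperate, 0 = defect
is_aa = np.zeros(N, bool)
is_aa[:n_aa] = True
strat[is_aa] = AA_STRATEGY

def payoff(s, s_opp):
    return [[P, T], [S, R]][s][s_opp]

def avg_payoff(i):
    opps = rng.integers(0, N, 20)    # sample co-players (well-mixed population)
    return np.mean([payoff(strat[i], strat[j]) for j in opps if j != i])

for _ in range(20000):
    i, j = rng.integers(0, N, 2)
    if is_aa[i] or i == j:
        continue                      # only human players update strategies
    pi, pj = avg_payoff(i), avg_payoff(j)
    # Fermi rule: adopt j's strategy with prob. 1 / (1 + exp(-beta (pj - pi)))
    if rng.random() < 1.0 / (1.0 + np.exp(-beta * (pj - pi))):
        strat[i] = strat[j]

print("human cooperation level:", strat[~is_aa].mean())
```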
ABSTRACT
Regression-based face alignment involves learning a series of mapping functions to predict the true landmark positions from an initial estimate of the alignment. Most existing approaches focus on learning efficacious mapping functions from feature representations to improve performance. The issues related to the initial alignment estimation and the final learning objective, however, receive less attention. This work proposes a deep regression architecture with progressive reinitialization and a new error-driven learning loss function to explicitly address these two issues. Given an image with a rough face detection result, the full face region is first mapped by a supervised spatial transformer network to a normalized form, from which coarse landmark positions are regressed. Then, different face parts are respectively reinitialized to their own normalized states, followed by another regression subnetwork that refines the landmark positions. To deal with the inconsistent annotations in existing training datasets, we further propose an adaptive landmark-weighted loss function. It dynamically adjusts the importance of different landmarks according to their learning errors during training, without depending on any hyper-parameters set manually by trial and error. A high level of robustness to annotation inconsistencies is thus achieved. The whole deep architecture is trainable end to end, and extensive experimental analyses and comparisons demonstrate its effectiveness and efficiency. The source code, trained models, and experimental results are available at https://github.com/shaoxiaohu/Face_Alignment_DPR.git.
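One plausible instantiation of an error-driven, adaptively weighted landmark loss is sketched below: per-landmark weights are derived from the current batch errors, so no manually tuned hyper-parameter is needed. This is a hedged reading of the idea (softmax-free normalization over mean per-landmark errors), not necessarily the paper's exact formula.

```python
import torch

def adaptive_landmark_loss(pred, target):
    """pred, target: (batch, n_landmarks, 2) coordinates."""
    per_lmk_err = (pred - target).norm(dim=2)        # (B, L) Euclidean errors
    with torch.no_grad():
        # dynamic weights: landmarks with larger current error count more
        w = per_lmk_err.mean(dim=0)
        w = w / w.sum().clamp_min(1e-8) * w.numel()  # mean weight stays 1
    return (w * per_lmk_err).mean()

pred = torch.randn(4, 68, 2, requires_grad=True)
target = torch.randn(4, 68, 2)
loss = adaptive_landmark_loss(pred, target)
loss.backward()
print(float(loss))
```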
Subjects
Algorithms , Deep Learning

ABSTRACT
This paper presents a new Gaussian processes (GPs)-based particle filter tracking framework. The framework non-trivially extends Gaussian process regression (GPR) to transfer learning and, following a tracking-by-fusion strategy, closely integrates two tracking components: a GPs component and a correlation filters (CFs) component. First, the GPs component analyzes and models the probability distribution of the object appearance by exploiting GPs. It categorizes the labeled samples into auxiliary and target ones and explores unlabeled samples via transfer learning. The GPs component thus captures rich appearance information over object samples across time. On the other hand, to sample an initial particle set in regions of high likelihood through the direct simulation method in particle filtering, the powerful yet efficient correlation filters are integrated, leading to the CFs component. In fact, the CFs component not only boosts the sampling quality, but also benefits from the GPs component, which provides re-weighted knowledge as latent variables for determining the impact of each correlation filter template derived from the auxiliary samples. In this way, the transfer-learning-based fusion enables effective interactions between the two components. Superior performance on four object tracking benchmarks (OTB-2015, Temple-Color, and VOT2015/2016), in comparison with baselines and recent state-of-the-art trackers, clearly demonstrates the effectiveness of the proposed framework.
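The GPs component's role of scoring candidates by appearance can be illustrated with plain Gaussian process regression: labeled samples with +1/-1 targets define a posterior mean that re-weights particles. The RBF kernel, feature dimensions, and labelling below are assumptions; the paper's transfer-learning formulation over auxiliary, target, and unlabeled samples is richer than this sketch.

```python
import numpy as np

def rbf(A, B, ell=1.0):
    # squared-exponential kernel between two sets of feature vectors
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ell**2)

rng = np.random.default_rng(1)
X_train = rng.normal(size=(60, 16))                 # labeled sample features
y_train = np.where(np.arange(60) < 30, 1.0, -1.0)   # object vs. background
X_particles = rng.normal(size=(200, 16))            # candidate particle features

K = rbf(X_train, X_train) + 1e-3 * np.eye(60)       # kernel matrix + noise
alpha = np.linalg.solve(K, y_train)
scores = rbf(X_particles, X_train) @ alpha          # GP posterior mean

weights = np.exp(scores)                            # unnormalized likelihoods
weights /= weights.sum()
print("best particle:", int(scores.argmax()), "weight:", weights.max())
```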
ABSTRACT
There are two key components that can be leveraged for visual tracking: (a) object appearances and (b) object motions. Many existing techniques have recently employed deep learning to enhance visual tracking due to its superior representation power and strong learning ability; most of them exploit object appearances, but few exploit object motions. In this work, a deep spatial and temporal network (DSTN) is developed for visual tracking by explicitly exploiting both the object representations from each frame and their dynamics across multiple frames of a video, such that it can seamlessly integrate object appearances with their motions to produce compact object appearances and capture their temporal variations effectively. Our DSTN method, which is deployed into a tracking pipeline in a coarse-to-fine form, can perceive subtle differences in the spatial and temporal variations of the target (the object being tracked), and thus it benefits from both offline training and online fine-tuning. We have conducted experiments on four of the largest tracking benchmarks, including OTB-2013, OTB-2015, VOT2015, and VOT2017, and our results demonstrate that DSTN achieves competitive performance compared with state-of-the-art techniques. The source code, trained models, and all experimental results of this work will be made publicly available to facilitate further studies on this problem.
ABSTRACT
Skeleton-based human action recognition has recently attracted increasing attention thanks to the accessibility and popularity of 3D skeleton data. One of the key challenges in action recognition lies in the large variations of action representations when they are captured from different viewpoints. To alleviate the effects of view variations, this paper introduces a novel view adaptation scheme, which automatically determines the virtual observation viewpoints over the course of an action in a learning-based, data-driven manner. Instead of re-positioning the skeletons using a fixed, human-defined prior criterion, we design two view-adaptive neural networks, VA-RNN and VA-CNN, built respectively on the recurrent neural network (RNN) with Long Short-Term Memory (LSTM) and on the convolutional neural network (CNN). For each network, a novel view adaptation module learns and determines the most suitable observation viewpoints and transforms the skeletons to those viewpoints for end-to-end recognition with a main classification network. Ablation studies find that the proposed view-adaptive models are capable of transforming skeletons of various views to much more consistent virtual viewpoints. The models thus largely eliminate the influence of viewpoint, enabling the networks to focus on learning action-specific features and resulting in superior performance. In addition, we design a two-stream scheme (referred to as VA-fusion) that fuses the scores of the two networks to provide the final prediction, obtaining enhanced performance. Moreover, random rotation of skeleton sequences is employed to improve the robustness of the view adaptation models and alleviate overfitting during training. Extensive experimental evaluations on five challenging benchmarks demonstrate the effectiveness of the proposed view-adaptive networks and their superior performance over state-of-the-art approaches.
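A minimal sketch of the view-adaptation idea, under assumed layer sizes: a small subnetwork regresses a viewpoint (rotation angles and a translation) from the skeleton itself, and the skeleton is re-observed from that learned viewpoint before classification. For brevity the sketch rotates about a single axis; the actual VA-RNN/VA-CNN modules differ in detail.

```python
import torch
import torch.nn as nn

class ViewAdapt(nn.Module):
    def __init__(self, n_joints=25):
        super().__init__()
        self.fc = nn.Linear(n_joints * 3, 6)   # 3 Euler angles + 3 translation

    def forward(self, x):                      # x: (batch, n_joints, 3)
        p = self.fc(x.flatten(1))
        ang, t = p[:, :3], p[:, 3:]
        cx, sx = ang[:, 0].cos(), ang[:, 0].sin()
        # rotation about the x-axis only, for brevity; a full module would
        # compose rotations about all three axes
        zeros, ones = torch.zeros_like(cx), torch.ones_like(cx)
        R = torch.stack([ones, zeros, zeros,
                         zeros, cx, -sx,
                         zeros, sx, cx], dim=1).view(-1, 3, 3)
        return (x - t[:, None, :]) @ R.transpose(1, 2)

skel = torch.randn(8, 25, 3)                  # one frame of 25 3D joints
adapted = ViewAdapt()(skel)                   # feed this to the classifier
print(adapted.shape)                          # torch.Size([8, 25, 3])
```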
ABSTRACT
Synthesizing realistic profile faces is beneficial for more efficiently training deep pose-invariant models for large-scale unconstrained face recognition, by augmenting the number of samples with extreme poses and avoiding costly annotation work. However, learning from synthetic faces may not achieve the desired performance due to the discrepancy between the distributions of synthetic and real face images. To narrow this gap, we propose a Dual-Agent Generative Adversarial Network (DA-GAN) model, which can improve the realism of a face simulator's output using unlabeled real faces while preserving identity information during realism refinement. The dual agents are specially designed to distinguish real versus fake and identities simultaneously. In particular, we employ an off-the-shelf 3D face model as a simulator to generate profile face images with varying poses. DA-GAN leverages a fully convolutional network as the generator to produce high-resolution images and an auto-encoder as the discriminator with the dual agents. Besides the novel architecture, we make several key modifications to the standard GAN to preserve pose, texture, and identity, and to stabilize the training process: (i) a pose perception loss; (ii) an identity perception loss; and (iii) an adversarial loss with a boundary equilibrium regularization term. Experimental results show that DA-GAN not only achieves outstanding perceptual results but also significantly outperforms state-of-the-art methods on the large-scale and challenging NIST IJB-A and CFP unconstrained face recognition benchmarks. In addition, the proposed DA-GAN is a promising new approach to solving generic transfer learning problems more effectively. DA-GAN is the foundation of our winning entry to the NIST IJB-A face recognition competition, in which we secured first place on both the verification and identification tracks.
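The three training signals could be combined for the generator roughly as below, where disc, id_net, and pose_net are placeholders for the auto-encoder discriminator and pretrained identity/pose perception networks. The loss forms and weights are assumptions, with the adversarial term written in the BEGAN-style reconstruction form that boundary equilibrium regularization builds on; this is a hedged sketch, not DA-GAN's actual implementation.

```python
import torch
import torch.nn.functional as F

def generator_loss(refined, synthetic, disc, id_net, pose_net,
                   w_adv=1.0, w_id=0.1, w_pose=0.1):
    # BEGAN-style adversarial term: the auto-encoder discriminator should
    # reconstruct the refined image well (hypothetical form)
    adv = (disc(refined) - refined).abs().mean()
    # identity perception: keep the identity features of the synthetic input
    ident = F.l1_loss(id_net(refined), id_net(synthetic).detach())
    # pose perception: the refinement must not alter the rendered pose
    pose = F.l1_loss(pose_net(refined), pose_net(synthetic).detach())
    return w_adv * adv + w_id * ident + w_pose * pose

# smoke test with identity functions standing in for the real networks
dummy = lambda x: x
img = torch.rand(2, 3, 64, 64)
print(float(generator_loss(img, img.clone(), dummy, dummy, dummy)))
```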
Subjects
Biometric Identification/methods , Deep Learning , Face/diagnostic imaging , Imaging, Three-Dimensional/methods , Databases, Factual , Humans

ABSTRACT
Human action analytics has attracted much attention for decades in computer vision. It is important to extract discriminative spatio-temporal features to model the spatial and temporal evolution of different actions. In this paper, we propose a spatial and temporal attention model to explore discriminative features for human action recognition and detection from skeleton data. We build our networks on recurrent neural networks with long short-term memory units. The learned model is capable of selectively focusing on discriminative joints of the skeleton within each input frame and paying different levels of attention to the outputs of different frames. To ensure effective training of the network for action recognition, we propose a regularized cross-entropy loss to drive the learning process and develop a joint training strategy accordingly. Moreover, based on temporal attention, we develop a method to generate temporal action proposals for action detection. We evaluate the proposed method on the SBU Kinect Interaction dataset, the NTU RGB+D dataset, and the PKU-MMD dataset. Experimental results demonstrate the effectiveness of our proposed model for both action recognition and action detection.
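The two attention mechanisms can be sketched compactly: a spatial gate re-weights joints within each frame before the LSTM, and a temporal gate re-weights frame outputs before pooling into class scores. The layer sizes and softmax gating below are illustrative assumptions; the paper's attention subnetworks are themselves LSTM-based.

```python
import torch
import torch.nn as nn

class STAttention(nn.Module):
    def __init__(self, n_joints=25, hidden=128, n_classes=60):
        super().__init__()
        self.spatial = nn.Linear(n_joints * 3, n_joints)   # joint gate
        self.lstm = nn.LSTM(n_joints * 3, hidden, batch_first=True)
        self.temporal = nn.Linear(hidden, 1)               # frame gate
        self.cls = nn.Linear(hidden, n_classes)

    def forward(self, x):                # x: (batch, frames, joints, 3)
        flat = x.flatten(2)                                   # (B, T, J*3)
        alpha = torch.softmax(self.spatial(flat), dim=-1)     # joint attention
        gated = (x * alpha.unsqueeze(-1)).flatten(2)
        h, _ = self.lstm(gated)                               # (B, T, hidden)
        beta = torch.softmax(self.temporal(h), dim=1)         # frame attention
        return self.cls((beta * h).sum(dim=1))                # class scores

logits = STAttention()(torch.randn(4, 30, 25, 3))
print(logits.shape)                                           # torch.Size([4, 60])
```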
ABSTRACT
We construct a new, efficient near-duplicate image detection method using a hierarchical hash code learning neural network and load-balanced locality-sensitive hashing (LSH) indexing. We propose a deep constrained siamese hash coding neural network combined with deep feature learning. Our neural network is able to extract effective features for near-duplicate image detection. The extracted features are used to construct an LSH-based index. We propose a load-balanced LSH method that produces load-balanced buckets during the hashing process, which significantly reduces query time. Based on the proposed load-balanced LSH, we design an effective and feasible algorithm for near-duplicate image detection. Extensive experiments on three benchmark datasets demonstrate the effectiveness of our deep siamese hash encoding network and load-balanced LSH.
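The standard random-hyperplane LSH underlying such an index can be sketched as below; the paper's load-balanced variant additionally equalizes bucket occupancy, which this sketch only surfaces through the bucket-load report. The codes and data here are synthetic.

```python
import numpy as np
from collections import defaultdict

rng = np.random.default_rng(2)
codes = rng.normal(size=(10000, 48))   # stand-ins for learned deep features
planes = rng.normal(size=(48, 12))     # 12 random hyperplanes -> 4096 buckets

def bucket_of(x):
    # sign pattern against the hyperplanes gives the bucket id
    bits = (x @ planes > 0).astype(int)
    return int("".join(map(str, bits)), 2)

index = defaultdict(list)
for i, x in enumerate(codes):
    index[bucket_of(x)].append(i)

query = codes[0] + 0.01 * rng.normal(size=48)   # near-duplicate query
candidates = index[bucket_of(query)]            # only one bucket is scanned
sizes = np.array([len(v) for v in index.values()])
print(len(candidates), "candidates; bucket load: mean=%.1f max=%d"
      % (sizes.mean(), sizes.max()))
```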
ABSTRACT
Predicting human pose in the wild is a challenging problem due to the high flexibility of joints and possible occlusion. Existing approaches generally tackle these difficulties either by holistic prediction or by multi-stage processing, which suffer from poor performance in locating challenging joints or from high computational cost, respectively. In this paper, we propose a new Hierarchical Contextual Refinement Network (HCRN) to robustly predict human poses in an efficient manner, where body joints of different complexities are processed at different layers in a context hierarchy. Different from existing approaches, our proposed model predicts the positions of joints from easy to difficult in a single stage by effectively exploiting the informative context provided by the previous layer. This approach offers two appealing advantages over the state of the art: (1) it is more accurate than predicting all the joints together, and (2) it is more efficient than multi-stage processing methods. We design a Contextual Refinement Unit (CRU) to implement the proposed model, which enables auto-diffusion of joint detection results to effectively transfer informative context from easy joints to difficult ones. In this way, difficult joints can be reliably detected even in the presence of occlusion or severe distracting factors. Multiple CRUs are organized into a tree-structured hierarchy that is end-to-end trainable and does not require processing joints over multiple iterations. Comprehensive experiments demonstrate that the proposed HCRN model improves well-established baselines and achieves new state-of-the-art results on multiple human pose estimation benchmarks.
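The easy-to-difficult, context-passing scheme can be sketched as a cascade in which each level predicts its group of joints from backbone features concatenated with the heatmaps of previously predicted, easier joints. The joint grouping and layer shapes below are assumptions, not the exact CRU design.

```python
import torch
import torch.nn as nn

class Level(nn.Module):
    """One level of the hierarchy: predicts heatmaps for its joint group."""
    def __init__(self, in_ch, n_joints):
        super().__init__()
        self.head = nn.Conv2d(in_ch, n_joints, 3, padding=1)

    def forward(self, feats, context):
        # context = heatmaps from easier levels, concatenated as extra channels
        return self.head(torch.cat([feats] + context, dim=1))

feats = torch.randn(2, 64, 64, 64)           # backbone feature map
easy = Level(64, 6)(feats, [])               # e.g. head/torso joints
mid = Level(64 + 6, 5)(feats, [easy])        # context from easy joints
hard = Level(64 + 11, 5)(feats, [easy, mid]) # wrists/ankles get most context
print(easy.shape, mid.shape, hard.shape)
```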
ABSTRACT
Face alignment is an important task in computer vision. Regression-based methods currently dominate this problem; they generally employ a series of mapping functions from the face appearance to iteratively update the face shape hypothesis. One key point is thus how to perform the regression procedure. In this work, we formulate this regression procedure as a sparse coding problem. We learn two relational dictionaries, one for the face appearance and one for the face shape, with coupled reconstruction coefficients to capture their underlying relationships. To deploy this model for face alignment, we derive the relational dictionaries in a stage-wise manner to perform closed-loop refinement of each other; that is, the face appearance dictionary is first learned from the face shape dictionary and then used to update the face shape hypothesis, and the face shape dictionary updated from the shape hypothesis is in turn used to refine the face appearance dictionary. To improve model accuracy, we extend this model hierarchically from the whole face shape to face part shapes, so that both the global and local view variations of a face are captured. To locate facial landmarks under occlusion, we further introduce an occlusion dictionary into the face appearance dictionary to recover the face shape from a partially occluded face appearance. The occlusion dictionary is learned in a data-driven manner from background images to represent a set of elemental occlusion patterns, sparse combinations of which model various practical partial face occlusions. By integrating all these technical innovations, we obtain a robust and accurate approach to locating facial landmarks under different face views and possibly severe occlusions for face images in the wild. Extensive experimental analyses and evaluations on different benchmark datasets, as well as on two new datasets built by ourselves, demonstrate the robustness and accuracy of our proposed model, especially for face images with large view variations and/or severe occlusions.
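The coupled-dictionary idea can be sketched as follows: the appearance and shape dictionaries share one coefficient vector, so a code inferred from the observed appearance also reconstructs a shape update. Ridge regression stands in for a true sparse (L1) solver here, and all dimensions are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
n_atoms, d_app, d_shape = 64, 128, 136         # 68 landmarks * 2 coordinates
D_app = rng.normal(size=(d_app, n_atoms))      # appearance dictionary
D_shape = rng.normal(size=(d_shape, n_atoms))  # shape dictionary

def infer_code(appearance, lam=0.1):
    # coupled coefficients from appearance (ridge in place of an L1 solver)
    A = D_app.T @ D_app + lam * np.eye(n_atoms)
    return np.linalg.solve(A, D_app.T @ appearance)

appearance = rng.normal(size=d_app)            # features at the current shape
code = infer_code(appearance)
shape_update = D_shape @ code                  # regressed landmark offsets
print(shape_update.shape)                      # (136,)
```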
ABSTRACT
An appearance model adaptable to changes in object appearance is critical in visual object tracking. In this paper, we treat an image patch as a second-order tensor, which preserves the original image structure. We design two graphs to characterize the intrinsic local geometrical structure of the tensor samples of the object and the background. Graph embedding is used to reduce the dimensionality of the tensors while preserving the structure of the graphs, and a discriminant embedding space is then constructed. We prove two propositions for finding the transformation matrices that map the original tensor samples to the tensor-based graph embedding space. To encode more discriminant information in the embedding space, we propose a transfer-learning-based semi-supervised strategy that iteratively adjusts the embedding space, into which discriminative information obtained at earlier times is transferred. We apply the proposed semi-supervised tensor-based graph embedding learning algorithm to visual tracking. The new tracking algorithm captures an object's appearance characteristics during tracking and uses a particle filter to estimate the optimal object state. Experimental results on the CVPR 2013 benchmark dataset demonstrate the effectiveness of the proposed tracking algorithm.
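A minimal sketch of discriminative graph embedding, with samples flattened to vectors for brevity (the tracker instead keeps patches as second-order tensors with mode-wise projections): an intrinsic graph links same-class samples, a penalty graph links samples across classes, and the projection solves the resulting generalized eigenproblem so that between-class spread is maximized relative to within-class spread.

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(4)
X = rng.normal(size=(100, 256))                 # 50 object + 50 background samples
y = np.repeat([0, 1], 50)

def laplacian(W):
    return np.diag(W.sum(1)) - W

W_intr = (y[:, None] == y[None, :]).astype(float)   # same-class (intrinsic) graph
W_pen = 1.0 - W_intr                                 # between-class (penalty) graph
L_intr, L_pen = laplacian(W_intr), laplacian(W_pen)

# generalized eigenproblem: X^T L_pen X v = lam X^T L_intr X v
A = X.T @ L_pen @ X
B = X.T @ L_intr @ X + 1e-6 * np.eye(256)            # ridge keeps B positive definite
vals, vecs = eigh(A, B)
P = vecs[:, -10:]                                    # top 10 discriminant directions
print((X @ P).shape)                                 # (100, 10) embedded samples
```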
ABSTRACT
Tracking multiple persons is a challenging task when persons move in groups and occlude each other. Existing group-based methods have extensively investigated how to make group division more accurate in a tracking-by-detection framework; however, few of them quantify the group dynamics from the perspective of the targets' spatial topology or consider the group from a dynamic view. Inspired by the sociological properties of pedestrians, we propose a novel socio-topology model with a topology-energy function to factor the group dynamics of moving persons and groups. In this model, minimizing the topology-energy variance in a two-level energy form is expected to produce smooth topology transitions, stable group tracking, and accurate target association. To search for the strong minimum in energy variation, we design discrete group-tracklet jump moves embedded in the gradient descent method, which ensures that the moves reduce the energy variation of groups and trajectories alternately in the varying topology dimension. Experimental results on both RGB and RGB-D datasets show the superiority of our proposed model for multiple person tracking in crowded scenes.
ABSTRACT
Multiple object tracking (MOT) is a very challenging task, yet one of fundamental importance for many practical applications. In this paper, we focus on the problem of tracking multiple players in sports video, which is even more difficult due to the abrupt movements of players and their complex interactions. To handle these difficulties, we present a new MOT algorithm that contributes at both the observation modeling level and the tracking strategy level. For observation modeling, we develop a progressive observation modeling process that provides strong tracking observations and greatly facilitates the tracking task. For the tracking strategy, we propose a dual-mode, two-way Bayesian inference approach that dynamically switches between an offline general model and an online dedicated model to handle single isolated object tracking and multiple occluded object tracking integrally, by forward filtering and backward smoothing. Extensive experiments on different kinds of sports videos, including football, basketball, and hockey, demonstrate the effectiveness and efficiency of the proposed method.