Results 1 - 20 of 37
1.
Neural Netw ; 176: 106340, 2024 Apr 25.
Article in English | MEDLINE | ID: mdl-38713967

ABSTRACT

Vision transformers have achieved remarkable success in computer vision tasks by using multi-head self-attention modules to capture long-range dependencies within images. However, their high inference cost poses a new challenge. Several methods have been proposed to address this problem, mainly by slimming patches: in the inference stage, they classify patches into two classes, one to keep and the other to discard, at multiple layers. This results in additional computation at every layer where patches are discarded, which hinders inference acceleration. In this study, we tackle the patch slimming problem from a different perspective by proposing a life regression module that determines the lifespan of each image patch in one go. During inference, a patch is discarded once the current layer index exceeds its life. Our proposed method avoids additional computation and parameters at multiple layers, enhancing inference speed while maintaining competitive performance. Additionally, our approach requires fewer training epochs than other patch slimming methods.
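A minimal PyTorch sketch of the lifespan idea (illustrative, not the authors' code; the one-linear-layer head and the batch-of-one masking are simplifying assumptions):

```python
import torch
import torch.nn as nn

class LifeRegressor(nn.Module):
    """Predicts, in one shot, how many layers each image patch survives."""
    def __init__(self, dim: int, num_layers: int):
        super().__init__()
        self.head = nn.Linear(dim, 1)
        self.num_layers = num_layers

    def forward(self, tokens):                    # tokens: (B, N, dim)
        # Sigmoid maps scores into [0, num_layers]: the per-patch lifespans.
        return torch.sigmoid(self.head(tokens)).squeeze(-1) * self.num_layers

def forward_with_lifespans(blocks, tokens, life):
    """Run transformer blocks, dropping patches whose life has expired."""
    for layer_idx, block in enumerate(blocks):
        keep = life >= layer_idx                  # (B, N) boolean mask
        tokens = block(tokens[:, keep[0]])        # assumes batch size 1
        life = life[:, keep[0]]
    return tokens
```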

2.
Article in English | MEDLINE | ID: mdl-38739512

ABSTRACT

Deep cooperative multi-agent reinforcement learning has demonstrated remarkable success over a wide spectrum of complex control tasks. However, recent advances in multi-agent learning mainly focus on value decomposition while leaving entity interactions intertwined, which easily leads to over-fitting on noisy interactions between entities. In this work, we introduce a novel interactiOn Pattern disenTangling (OPT) method that disentangles entity interactions into interaction prototypes, each of which represents an underlying interaction pattern within a subgroup of the entities. OPT facilitates filtering out noisy interactions between irrelevant entities and thus significantly improves generalizability as well as interpretability. Specifically, OPT introduces a sparse disagreement mechanism to encourage sparsity and diversity among the discovered interaction prototypes. The model then selectively restructures these prototypes into a compact interaction pattern via an aggregator with learnable weights. To alleviate the training instability caused by partial observability, we propose maximizing the mutual information between the aggregation weights and the historical behaviors of each agent. Experiments on single-task, multi-task, and zero-shot benchmarks demonstrate that the proposed method yields results superior to state-of-the-art counterparts. Our code is available at https://github.com/liushunyu/OPT.
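The aggregation step lends itself to a short sketch. Below is a hedged illustration of combining K interaction prototypes with learnable, input-conditioned weights; the scorer design is an assumption, and the sparsity mechanism and mutual-information objective are omitted:

```python
import torch
import torch.nn as nn

class PrototypeAggregator(nn.Module):
    """Restructures K interaction prototypes into one compact pattern."""
    def __init__(self, dim: int, num_prototypes: int):
        super().__init__()
        self.scorer = nn.Linear(dim, num_prototypes)

    def forward(self, obs_embed, prototypes):
        # obs_embed: (B, dim); prototypes: (B, K, dim)
        weights = self.scorer(obs_embed).softmax(dim=-1)       # (B, K)
        pattern = torch.einsum("bk,bkd->bd", weights, prototypes)
        return pattern, weights   # weights would also feed the MI objective
```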

3.
Commun Chem ; 7(1): 85, 2024 Apr 17.
Article in English | MEDLINE | ID: mdl-38632308

ABSTRACT

Effective transfer learning for molecular property prediction has shown considerable strength in addressing the scarcity of labeled molecules. Many existing methods either disregard the quantitative relationship between source and target properties, risking negative transfer, or require intensive training on target tasks. To quantify transferability in terms of task-relatedness, we propose Principal Gradient-based Measurement (PGM) for transferring molecular property prediction ability. First, we design an optimization-free scheme to calculate a principal gradient that approximates the direction of model optimization on a molecular property prediction dataset, and we analyze the close connection between the principal gradient and model optimization through mathematical proof. PGM measures transferability as the distance between the principal gradient obtained from the source dataset and that derived from the target dataset. Then, we apply PGM to various molecular property prediction datasets to build a quantitative transferability map for source dataset selection. Finally, we evaluate PGM on multiple combinations of transfer learning tasks across 12 benchmark molecular property prediction datasets and demonstrate that it can serve as fast and effective guidance for improving the performance of a target task. This work contributes to more efficient discovery of drugs, materials, and catalysts by quantifying task-relatedness prior to transfer learning and by clarifying the relationships among chemical properties.
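A hedged sketch of the measurement: average the loss gradient over each dataset to approximate its principal gradient, then compare the two. The function names and the cosine-distance choice are illustrative assumptions, not the paper's exact formulation:

```python
import torch
import torch.nn.functional as F

def principal_gradient(model, loss_fn, loader):
    """Average loss gradient over a dataset, flattened to one vector."""
    total, batches = None, 0
    for x, y in loader:
        model.zero_grad()
        loss_fn(model(x), y).backward()
        flat = torch.cat([p.grad.reshape(-1) for p in model.parameters()
                          if p.grad is not None])
        total = flat if total is None else total + flat
        batches += 1
    return total / batches

def transferability(model, loss_fn, src_loader, tgt_loader):
    """Smaller distance between principal gradients => easier transfer."""
    g_src = principal_gradient(model, loss_fn, src_loader)
    g_tgt = principal_gradient(model, loss_fn, tgt_loader)
    return 1.0 - F.cosine_similarity(g_src, g_tgt, dim=0)
```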

4.
Cancer Med ; 13(5): e7104, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38488408

ABSTRACT

BACKGROUND: Microvascular invasion (MVI) is an independent prognostic factor associated with early recurrence and poor survival after resection of hepatocellular carcinoma (HCC). However, the traditional pathology approach to diagnosing MVI is relatively subjective, time-consuming, and heterogeneous. The aim of this study was to develop a deep-learning model that could significantly improve the efficiency and accuracy of MVI diagnosis. MATERIALS AND METHODS: We collected H&E-stained slides from 753 patients with HCC at the First Affiliated Hospital of Zhejiang University. An external validation set of 358 patients was selected from The Cancer Genome Atlas database. The deep-learning model was trained by simulating the method used by pathologists to diagnose MVI. Model performance was evaluated with accuracy, precision, recall, F1 score, and the area under the receiver operating characteristic curve. RESULTS: We successfully developed an MVI artificial intelligence diagnostic model (MVI-AIDM), which achieved an accuracy of 94.25% in the independent external validation set. The MVI-positive detection rate of MVI-AIDM was significantly higher than that of pathologists. Visualization results demonstrated recognition of micro MVIs that were difficult to differentiate with traditional pathology. Additionally, the model provided automatic quantification of the number of cancer cells and spatial information regarding MVI. CONCLUSIONS: We developed a deep learning diagnostic model that performed well and improved the efficiency and accuracy of MVI diagnosis. The model provided spatial information on MVI that is essential to accurately predict HCC recurrence after surgery.
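The evaluation protocol maps directly onto standard scikit-learn metrics; a minimal sketch (variable names are placeholders):

```python
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score)

def evaluate_mvi_model(y_true, y_pred, y_score):
    """y_pred: hard 0/1 labels; y_score: positive-class probabilities."""
    return {
        "accuracy":  accuracy_score(y_true, y_pred),
        "precision": precision_score(y_true, y_pred),
        "recall":    recall_score(y_true, y_pred),
        "f1":        f1_score(y_true, y_pred),
        "auc":       roc_auc_score(y_true, y_score),
    }
```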


Subject(s)
Carcinoma, Hepatocellular; Deep Learning; Liver Neoplasms; Humans; Carcinoma, Hepatocellular/pathology; Liver Neoplasms/pathology; Artificial Intelligence; Retrospective Studies; Neoplasm Invasiveness
5.
IEEE Trans Image Process ; 32: 6183-6194, 2023.
Article in English | MEDLINE | ID: mdl-37022902

ABSTRACT

Pseudo supervision is regarded as the core idea in semi-supervised learning for semantic segmentation, and there is always a tradeoff between using only the high-quality pseudo labels and leveraging all of them. To address this, we propose a novel learning approach, called Conservative-Progressive Collaborative Learning (CPCL), in which two predictive networks are trained in parallel and pseudo supervision is implemented based on both the agreement and the disagreement of their predictions. One network seeks common ground via intersection supervision and is supervised by the high-quality labels to ensure more reliable supervision, while the other network reserves differences via union supervision and is supervised by all the pseudo labels to keep exploring with curiosity. The collaboration of conservative evolution and progressive exploration is thus achieved. To reduce the influence of suspicious pseudo labels, the loss is dynamically re-weighted according to the prediction confidence. Extensive experiments demonstrate that CPCL achieves state-of-the-art performance for semi-supervised semantic segmentation.
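A minimal sketch of the conservative/progressive split (illustrative, not the released implementation): the conservative branch trains only where the two networks agree, while the progressive branch trains on all pseudo labels, re-weighted by prediction confidence:

```python
import torch
import torch.nn.functional as F

def cpcl_losses(logits_a, logits_b, unlabeled_weight=1.0):
    """logits_*: (B, C, H, W) predictions of the two parallel networks."""
    prob_a, prob_b = logits_a.softmax(1), logits_b.softmax(1)
    pseudo_a, pseudo_b = prob_a.argmax(1), prob_b.argmax(1)
    agree = pseudo_a == pseudo_b                        # (B, H, W)

    # Conservative branch: intersection supervision on agreed pixels only
    # (assumes at least one agreed pixel per batch).
    loss_cons = F.cross_entropy(logits_a, pseudo_b,
                                reduction="none")[agree].mean()

    # Progressive branch: union supervision on every pixel, dynamically
    # re-weighted by the peer network's prediction confidence.
    conf = prob_a.max(1).values                         # (B, H, W)
    loss_prog = (conf * F.cross_entropy(logits_b, pseudo_a,
                                        reduction="none")).mean()
    return loss_cons + unlabeled_weight * loss_prog
```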

6.
IEEE Trans Image Process ; 32: 2093-2106, 2023.
Article in English | MEDLINE | ID: mdl-37023145

ABSTRACT

Knowledge amalgamation (KA) is a novel deep model reusing task that aims to transfer knowledge from several well-trained teachers to a multi-talented and compact student. Most existing approaches are tailored for convolutional neural networks (CNNs). However, Transformers, with a completely different architecture, are beginning to challenge the dominance of CNNs in many computer vision tasks, and directly applying previous KA methods to Transformers leads to severe performance degradation. In this work, we explore a more effective KA scheme for Transformer-based object detection models. Specifically, considering the architectural characteristics of Transformers, we propose to dissolve KA into two aspects: sequence-level amalgamation (SA) and task-level amalgamation (TA). In sequence-level amalgamation, a hint is generated by concatenating the teacher sequences instead of redundantly aggregating them to a fixed-size one as previous KA approaches do. In task-level amalgamation, the student efficiently learns heterogeneous detection tasks through soft targets. Extensive experiments on PASCAL VOC and COCO show that sequence-level amalgamation significantly boosts the performance of students, whereas previous methods impair them. Moreover, Transformer-based students excel at learning amalgamated knowledge: they master heterogeneous detection tasks rapidly and achieve performance superior or at least comparable to that of the teachers in their specializations.
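A rough sketch of the sequence-level hint under stated assumptions: teacher sequences are concatenated along the token axis, and the student is aligned to the concatenation with a learned map over its sequence dimension (this alignment layer is an assumption; the paper's exact mechanism may differ):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SequenceHint(nn.Module):
    """Concatenates teacher token sequences into one hint and lets the
    student imitate it via a learned map along the sequence axis."""
    def __init__(self, student_len: int, hint_len: int):
        super().__init__()
        self.expand = nn.Linear(student_len, hint_len)

    def forward(self, teacher_seqs, student_seq):
        # teacher_seqs: list of (B, N_i, D); student_seq: (B, N_s, D)
        hint = torch.cat(teacher_seqs, dim=1)           # (B, sum N_i, D)
        pred = self.expand(student_seq.transpose(1, 2)).transpose(1, 2)
        return F.mse_loss(pred, hint)
```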

7.
J Pathol Inform ; 14: 100302, 2023.
Article in English | MEDLINE | ID: mdl-36923447

ABSTRACT

Background and objective: Training a robust cancer diagnostic or prognostic artificial intelligence model on histology images requires a large number of representative cases with labels or annotations, which are difficult to obtain. The histology snapshots available in published papers or case reports can be used to enrich the training dataset. However, the magnifications of these invaluable snapshots are generally unknown, which limits their usage. A robust magnification predictor is therefore required to exploit such diverse snapshot repositories spanning different diseases. This paper presents a magnification prediction model named Hagnifinder for H&E-stained histological images. Methods: Hagnifinder is a regression model based on a modified convolutional neural network (CNN) that contains three modules: a Feature Extraction Module, a Regression Module, and an Adaptive Scaling Module (ASM). In the training phase, the Feature Extraction Module first extracts image features. Second, the ASM addresses the problem of unevenly distributed learned feature values. Finally, the Regression Module estimates the mapping between the regularized extracted features and the magnifications. For training a robust model, we construct a new dataset named Hagni40, consisting of 94,643 H&E-stained histology image patches at 40 different magnifications across 13 cancer types, based on The Cancer Genome Atlas. To verify the performance of Hagnifinder, we measure prediction accuracy under maximum allowable differences (0.5, 1, and 5) between the predicted and the actual magnification. We compare Hagnifinder with state-of-the-art methods on the public BreakHis dataset and on Hagni40. Results: Hagnifinder provides consistent prediction accuracy, with a mean accuracy of 98.9%, across 40 different magnifications and 13 different cancer types when ResNet50 is used as the feature extractor. Compared with state-of-the-art methods focusing on 4-5 magnification classification levels, Hagnifinder achieves the best or comparable performance on the BreakHis and Hagni40 datasets. Conclusions: The experimental results suggest that Hagnifinder can be a valuable tool for predicting the magnification of any given histology image.
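The accuracy criterion is a simple tolerance check; a sketch:

```python
import numpy as np

def tolerance_accuracy(pred_mag, true_mag, tol):
    """Fraction of predictions within `tol` of the true magnification."""
    pred_mag, true_mag = np.asarray(pred_mag), np.asarray(true_mag)
    return float(np.mean(np.abs(pred_mag - true_mag) <= tol))

# e.g. tolerance_accuracy(preds, labels, tol=0.5)
```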

8.
Article in English | MEDLINE | ID: mdl-36399591

ABSTRACT

Researchers of temporal networks (e.g., social networks and transaction networks) have been interested in mining the dynamic patterns of nodes from their diverse interactions. Inspired by recently powerful graph mining methods such as skip-gram models and graph neural networks (GNNs), existing approaches generate temporal node embeddings sequentially from nodes' sequential interactions. However, such sequential modeling cannot handle the transition structure between nodes' neighbors with limited memorization capacity: an effective method is needed that both adaptively models nodes' personalized patterns and captures node dynamics accordingly. In this article, we propose transition propagation graph neural networks (TIP-GNN) to tackle the challenge of encoding nodes' transition structures. TIP-GNN focuses on the bilevel graph structure in temporal networks: besides the explicit interaction graph, a node's sequential interactions can also be constructed as a transition graph. Based on the bilevel graph, TIP-GNN encodes transition structures by multistep transition propagation and distills information from neighborhoods by a bilevel graph convolution. Experimental results over various temporal networks reveal the efficiency of TIP-GNN, with improvements of up to 7.2% in accuracy on temporal link prediction. Extensive ablation studies further verify the effectiveness and limitations of the transition propagation module. Our code is available at https://github.com/doujiang-zheng/TIP-GNN.
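A hedged sketch of the transition-graph side of the bilevel structure (not the released code): from a node's time-ordered interactions, edges connect consecutively visited neighbors:

```python
from collections import defaultdict

def build_transition_graph(interactions):
    """interactions: time-sorted list of (timestamp, neighbor_id) for one
    node. Returns counts of transitions between consecutive neighbors."""
    transitions = defaultdict(int)
    for (_, prev), (_, nxt) in zip(interactions, interactions[1:]):
        transitions[(prev, nxt)] += 1
    return transitions
```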

9.
Chem Sci ; 13(31): 9023-9034, 2022 Aug 10.
Article in English | MEDLINE | ID: mdl-36091202

ABSTRACT

Chemical reaction prediction, involving forward synthesis and retrosynthesis prediction, is a fundamental problem in organic synthesis. A popular computational paradigm formulates synthesis prediction as a sequence-to-sequence translation problem, where the typical SMILES representation is adopted for molecules. However, general-purpose SMILES neglects the characteristics of chemical reactions, where the molecular graph topology is largely unaltered from reactants to products; as a result, SMILES performs suboptimally when applied straightforwardly. In this article, we propose root-aligned SMILES (R-SMILES), which specifies a tightly aligned one-to-one mapping between the product and the reactant SMILES for more efficient synthesis prediction. Owing to the strict one-to-one mapping and reduced edit distance, the computational model is largely relieved of learning the complex syntax and can be dedicated to learning the chemical knowledge of reactions. We compare the proposed R-SMILES with various state-of-the-art baselines and show that it significantly outperforms them all, demonstrating the superiority of the proposed method.
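Root alignment can be approximated with RDKit, which can serialize a molecule starting from a chosen root atom; the paper's full procedure for picking matching roots on the product and reactant sides is more involved:

```python
from rdkit import Chem

def rooted_smiles(smiles: str, root_atom_idx: int) -> str:
    """Re-serialize a molecule so its SMILES starts at a chosen atom."""
    mol = Chem.MolFromSmiles(smiles)
    return Chem.MolToSmiles(mol, rootedAtAtom=root_atom_idx, canonical=False)

# e.g. rooted_smiles("CCO", 2) rewrites ethanol starting from the oxygen.
```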

10.
IEEE Trans Image Process ; 31: 3359-3370, 2022.
Article in English | MEDLINE | ID: mdl-35503832

ABSTRACT

Knowledge distillation (KD) has become a well-established paradigm for compressing deep neural networks. The typical way of conducting knowledge distillation is to train the student network under the supervision of the teacher network, harnessing the knowledge at one or multiple spots (i.e., layers) in the teacher network. The distillation spots, once specified, do not change for any training sample throughout the whole distillation process. In this work, we argue that distillation spots should be adaptive to training samples and distillation epochs. We thus propose a new distillation strategy, termed spot-adaptive KD (SAKD), which adaptively determines the distillation spots in the teacher network per sample, at every training iteration during the whole distillation period. Because SAKD focuses on "where to distill" instead of "what to distill", which is widely investigated by most existing works, it can be seamlessly integrated into existing distillation methods to further improve their performance. Extensive experiments with 10 state-of-the-art distillers demonstrate the effectiveness of SAKD for improving distillation performance, under both homogeneous and heterogeneous distillation settings. Code is available at https://github.com/zju-vipa/spot-adaptive-pytorch.
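A sketch of the "where to distill" idea: a lightweight router scores each candidate spot per sample, and the per-spot distillation losses are weighted accordingly (the sigmoid router is an illustrative assumption; the paper's routing mechanism may differ):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpotRouter(nn.Module):
    """Scores each candidate distillation spot for each sample."""
    def __init__(self, feat_dim: int, num_spots: int):
        super().__init__()
        self.gate = nn.Linear(feat_dim, num_spots)

    def forward(self, pooled_feat):                  # (B, feat_dim)
        return torch.sigmoid(self.gate(pooled_feat)) # (B, num_spots)

def sakd_loss(router, pooled_feat, student_feats, teacher_feats):
    """Per-sample, per-spot weighting of feature-matching losses."""
    w = router(pooled_feat)                          # (B, num_spots)
    per_spot = torch.stack(
        [F.mse_loss(s, t, reduction="none").flatten(1).mean(1)
         for s, t in zip(student_feats, teacher_feats)], dim=1)
    return (w * per_spot).mean()
```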

11.
Sci Rep ; 12(1): 1855, 2022 02 03.
Article in English | MEDLINE | ID: mdl-35115624

ABSTRACT

We aimed to develop an explainable and reliable deep-learning method to diagnose cysts and tumors of the jaw using massive panoramic radiographs of healthy people, since collecting and labeling massive lesion samples is time-consuming and existing deep learning-based methods lack explainability. Based on 872 collected lesion samples and 10,000 healthy samples, a two-branch network was proposed for classifying cysts and tumors of the jaw. The two-branch network is first pretrained on massive panoramic radiographs of healthy people, and then trained to classify the sample categories and segment the lesion area. In total, 200 healthy samples and 87 lesion samples were included in the testing stage. The average accuracy, precision, sensitivity, specificity, and F1 score of classification were 88.72%, 65.81%, 66.56%, 92.66%, and 66.14%, respectively, and reach 90.66%, 85.23%, 84.27%, 93.50%, and 84.74% when only distinguishing lesion samples from healthy samples. The proposed method showed encouraging performance in the diagnosis of cysts and tumors of the jaw. The classified categories and segmented lesion areas serve as a diagnostic basis for further diagnosis, providing a reliable tool for diagnosing jaw tumors and cysts.
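A schematic of a two-branch design consistent with the description above (layer sizes and pooling are illustrative assumptions): a shared backbone, pretrained on healthy radiographs, feeds a classification head and a lesion-segmentation head:

```python
import torch
import torch.nn as nn

class TwoBranchNet(nn.Module):
    """Shared backbone (pretrained on healthy radiographs) feeding a
    classification head and a lesion-segmentation head."""
    def __init__(self, backbone, feat_dim: int, num_classes: int):
        super().__init__()
        self.backbone = backbone                 # returns (B, feat_dim, H, W)
        self.cls_head = nn.Linear(feat_dim, num_classes)
        self.seg_head = nn.Conv2d(feat_dim, 1, kernel_size=1)

    def forward(self, x):
        fmap = self.backbone(x)
        pooled = fmap.mean(dim=(2, 3))           # global average pooling
        return self.cls_head(pooled), self.seg_head(fmap)
```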


Subject(s)
Deep Learning; Jaw Cysts/diagnostic imaging; Jaw Neoplasms/diagnostic imaging; Radiographic Image Interpretation, Computer-Assisted; Radiography, Panoramic; Case-Control Studies; Humans; Predictive Value of Tests; Reproducibility of Results
12.
IEEE Trans Pattern Anal Mach Intell ; 44(11): 7871-7884, 2022 Nov.
Article in English | MEDLINE | ID: mdl-34550880

ABSTRACT

The goal of image steganography is to hide a full-sized image, termed the secret, inside another, termed the cover. Prior image steganography algorithms can conceal only one secret within one cover. In this paper, we propose an adaptive local image steganography (AdaSteg) system that allows for scale- and location-adaptive image steganography. By adaptively hiding the secret on a local scale, the proposed system makes the steganography more secure and further enables multi-secret steganography within one single cover. Specifically, this is achieved via two stages: an adaptive patch selection stage and a secret encryption stage. Given a pair of secret and cover, the optimal local patch for concealment is first determined adaptively by exploiting deep reinforcement learning with the proposed steganography quality function and policy network. The secret image is then converted into a patch of encrypted noises, resembling the process of generating adversarial examples, which is further encoded into a local region of the cover to realize more secure steganography. Furthermore, we propose a novel criterion for assessing local steganography and collect a challenging dataset specialized for the task of image steganography, contributing to a standardized benchmark for the area. Experimental results demonstrate that the proposed model yields results superior to the state of the art in both security and capacity.
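As a rough stand-in for the learned policy, the patch-selection stage can be pictured as scoring candidate locations with the steganography quality function and keeping the best one (this greedy scan replaces the paper's reinforcement-learning policy purely for illustration):

```python
def select_patch(cover, secret, quality_fn, patch_size, stride):
    """Scan candidate locations on the cover and keep the one the quality
    function scores highest. cover: (C, H, W) tensor or array."""
    best, best_score = None, float("-inf")
    _, H, W = cover.shape
    for y in range(0, H - patch_size + 1, stride):
        for x in range(0, W - patch_size + 1, stride):
            patch = cover[:, y:y + patch_size, x:x + patch_size]
            score = quality_fn(patch, secret)
            if score > best_score:
                best, best_score = (y, x), score
    return best
```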

13.
Front Oncol ; 11: 762733, 2021.
Article in English | MEDLINE | ID: mdl-34926264

ABSTRACT

BACKGROUND: An accurate pathological diagnosis of hepatocellular carcinoma (HCC), one of the malignant tumors with the highest mortality rate, is time-consuming and heavily reliant on the experience of a pathologist. In this report, we propose a deep learning model for HCC diagnosis and classification that requires minimal noise reduction or manual annotation by an experienced pathologist. METHODS: We collected whole-slide images of hematoxylin and eosin-stained pathological slides from 592 HCC patients at the First Affiliated Hospital, College of Medicine, Zhejiang University between 2015 and 2020. We propose a noise-specific deep learning model, trained initially with 137 cases cropped into multiple-scaled datasets. Patch screening and dynamic label smoothing strategies are adopted to handle histopathological liver images with noisy annotations from the perspectives of input and output. The model was then tested in an independent cohort of 455 cases with comparable tumor types and differentiations. RESULTS: Exhaustive experiments demonstrated that our two-step method achieved 87.81% pixel-level accuracy and 98.77% slide-level accuracy on the test dataset. Furthermore, the generalization performance of our model was verified on The Cancer Genome Atlas dataset, which contains 157 HCC pathological slides, achieving an accuracy of 87.90%. CONCLUSIONS: The noise-specific histopathological classification model of HCC based on deep learning is effective for datasets with noisy annotations, and it significantly improved the pixel-level accuracy over a regular convolutional neural network (CNN) model. Moreover, the model also has an advantage in detecting well-differentiated HCC and microvascular invasion.
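A sketch of label smoothing with an epoch-dependent factor, in the spirit of the dynamic label smoothing described above (the linear decay schedule is an assumption):

```python
import torch
import torch.nn.functional as F

def smoothed_ce(logits, target, epsilon):
    """Cross-entropy against a target softened by factor epsilon."""
    n_classes = logits.size(1)
    log_prob = F.log_softmax(logits, dim=1)
    one_hot = F.one_hot(target, n_classes).float()
    soft = one_hot * (1 - epsilon) + epsilon / n_classes
    return -(soft * log_prob).sum(dim=1).mean()

def epsilon_schedule(epoch, max_epochs, eps_max=0.2):
    """Trust noisy annotations less early on; decay the smoothing later."""
    return eps_max * (1 - epoch / max_epochs)
```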

14.
IEEE Trans Vis Comput Graph ; 26(11): 3365-3385, 2020 11.
Article in English | MEDLINE | ID: mdl-31180860

ABSTRACT

The seminal work of Gatys et al. demonstrated the power of Convolutional Neural Networks (CNNs) in creating artistic imagery by separating and recombining image content and style. This process of using CNNs to render a content image in different styles is referred to as Neural Style Transfer (NST). Since then, NST has become a trending topic in both academic literature and industrial applications, receiving increasing attention, and a variety of approaches have been proposed to either improve or extend the original NST algorithm. In this paper, we aim to provide a comprehensive overview of the current progress in NST. We first propose a taxonomy of current algorithms in the field. Then, we present several evaluation methods and compare different NST algorithms both qualitatively and quantitatively. The review concludes with a discussion of various applications of NST and open problems for future research. A list of the papers discussed in this review, corresponding codes, pre-trained models, and more comparison results are publicly available at https://osf.io/f8tu4/.
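The content/style separation that the review surveys can be written down compactly: content is matched on raw CNN features, style on their Gram matrices. A minimal sketch of the original Gatys et al. formulation (layer choices and loss weights are illustrative):

```python
import torch
import torch.nn.functional as F

def gram_matrix(feat):                            # feat: (B, C, H, W)
    b, c, h, w = feat.shape
    f = feat.reshape(b, c, h * w)
    return f @ f.transpose(1, 2) / (c * h * w)    # channel co-occurrences

def nst_loss(gen_feats, content_feats, style_feats, alpha=1.0, beta=1e3):
    """gen/content/style_feats: lists of CNN feature maps, shallow to deep."""
    content = F.mse_loss(gen_feats[-1], content_feats[-1])
    style = sum(F.mse_loss(gram_matrix(g), gram_matrix(s))
                for g, s in zip(gen_feats, style_feats))
    return alpha * content + beta * style
```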

15.
Article in English | MEDLINE | ID: mdl-31502968

ABSTRACT

Human parsing and matting play important roles in various applications, such as dress collocation, clothing recommendation, and image editing. In this paper, we propose a lightweight hybrid model that unifies a fully-supervised hierarchical-granularity parsing task with an unsupervised matting one. Our model comprises two parts: an extensible CNN-based hierarchical semantic segmentation block and a matting module composed of guided filters. Given a human image, stage-1 of the segmentation block first obtains a primitive segmentation map separating the human from the background. The primitive segmentation is then fed into stage-2 together with the original image to give a rough segmentation of the human body. This procedure is repeated in stage-3 to acquire a refined segmentation. The matting module takes the estimated segmentation maps as input and produces the matting map in a fully unsupervised manner. The obtained matting map is in turn fed back to the CNN in the first block to refine the semantic segmentation results.
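The matting module's use of guided filters can be sketched with OpenCV's implementation (requires the opencv-contrib-python package; radius and eps values are illustrative):

```python
import cv2
import numpy as np

def refine_matte(image, coarse_mask, radius=8, eps=1e-3):
    """Turn a hard segmentation into a soft matte with a guided filter,
    using the original image as guidance."""
    guide = image.astype(np.float32) / 255.0
    src = coarse_mask.astype(np.float32)
    return cv2.ximgproc.guidedFilter(guide, src, radius, eps)
```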

16.
Aesthetic Plast Surg ; 42(6): 1664-1671, 2018 Dec.
Article in English | MEDLINE | ID: mdl-30206648

ABSTRACT

OBJECTIVE: To evaluate aesthetic outcomes in patients with bilateral trapezius hypertrophy treated by botulinum toxin type A (BTxA) injection for aesthetic reconstruction of the upper trapezius. METHODS: From May 2015 to May 2016, 30 women with a short neck shape resulting from bilateral trapezius hypertrophy were treated with BTxA injection at the most affected area of the upper trapezius. Pre- and postoperative values of SACDF (the irregularly shaped area bounded by the four points A, C, D, and F) and SACDE (the irregularly shaped area bounded by the four points A, C, D, and E), responses to patients' and doctors' Global Aesthetic Improvement Scale (GAIS) questionnaires for neck aesthetic assessment, and reported adverse events were recorded and analyzed. RESULTS: Duration of follow-up ranged from 4 to 12 months. Subjects experienced only non-severe adverse events and recovered completely after a single BTxA injection. In patients' GAIS questionnaires, "very much improved" accounted for 53%, "much improved" for 13%, and "improved" for 27%. In doctors' GAIS questionnaires, "very much improved" accounted for 27%, "much improved" for 33%, "improved" for 33%, and "no change" for 7%. The overall degree of improvement was high. A statistically significant difference was observed in the "very much improved" response to the GAIS questionnaires between patients and doctors (P = 0.035). CONCLUSION: A single injection of BTxA for aesthetic reconstruction of the upper trapezius is safe and effective in patients with bilateral trapezius hypertrophy. LEVEL OF EVIDENCE IV: This journal requires that authors assign a level of evidence to each article. For a full description of these Evidence-Based Medicine ratings, please refer to the Table of Contents or the online Instructions to Authors at www.springer.com/00266.


Subject(s)
Botulinum Toxins, Type A/administration & dosage; Hypertrophy/drug therapy; Patient Satisfaction/statistics & numerical data; Superficial Back Muscles/drug effects; Superficial Back Muscles/pathology; Surveys and Questionnaires; Adult; Cohort Studies; Esthetics; Female; Follow-Up Studies; Humans; Hypertrophy/pathology; Injections, Intralesional; Middle Aged; Muscle Relaxation/drug effects; Retrospective Studies; Statistics, Nonparametric; Treatment Outcome
17.
IEEE Trans Image Process ; 26(7): 3331-3343, 2017 Jul.
Article in English | MEDLINE | ID: mdl-28358685

ABSTRACT

In recent years, taking photos and capturing videos with mobile devices has become increasingly popular, and emerging applications based on depth reconstruction, such as Google's lens blur, have been developed. However, depth reconstruction is difficult due to occlusions, non-diffuse surfaces, repetitive patterns, and textureless surfaces, and it becomes harder still with the unstable image quality and uncontrolled scene conditions of the mobile setting. In this paper, we present a novel hierarchical framework with multi-view confidence-based matching for robust, efficient depth reconstruction in uncontrolled scenes. In particular, the proposed framework combines local cost aggregation with global cost optimization in a complementary manner that increases efficiency and accuracy. A depth map is efficiently obtained in a coarse-to-fine manner by using an image pyramid. Moreover, confidence maps are computed to robustly fuse multi-view matching cues and to constrain the stereo matching at finer scales. The proposed framework has been evaluated on challenging indoor and outdoor scenes and achieves robust and efficient depth reconstruction.
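A schematic of the coarse-to-fine strategy (the matching function, which internally fuses the multi-view confidence cues, is a placeholder for the paper's components):

```python
import cv2

def build_pyramid(image, levels):
    pyr = [image]
    for _ in range(levels - 1):
        pyr.append(cv2.pyrDown(pyr[-1]))
    return pyr[::-1]                                 # coarsest first

def coarse_to_fine_depth(views, match_fn, levels=4):
    """views: list of images; match_fn(level_views, init_depth) returns a
    refined depth map, fusing multi-view confidence cues internally."""
    pyramids = [build_pyramid(v, levels) for v in views]
    depth = None
    for level in range(levels):                      # coarsest to finest
        level_views = [p[level] for p in pyramids]
        if depth is not None:
            h, w = level_views[0].shape[:2]
            depth = cv2.resize(depth, (w, h))        # warm-start finer level
        depth = match_fn(level_views, depth)
    return depth
```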

18.
IEEE Trans Pattern Anal Mach Intell ; 39(2): 227-241, 2017 02.
Article in English | MEDLINE | ID: mdl-27019472

ABSTRACT

Often, tasks are collected for multi-task learning (MTL) because they share similar feature structures. Based on this observation, in this paper we present novel algorithm-dependent generalization bounds for MTL by exploiting the notion of algorithmic stability. We analyze the generalization ability of a common parameter shared in MTL, focusing on both the performance of one particular task and the average performance over multiple tasks. When focusing on one particular task, with the help of a mild assumption on the feature structures, we interpret the function of the other tasks as a regularizer that produces a specific inductive bias. The algorithm for learning the common parameter, as well as the predictor, is thereby uniformly stable with respect to the domain of the particular task and has a generalization bound with a fast convergence rate of order O(1/n), where n is the sample size of the particular task. When focusing on the average performance over multiple tasks, we prove that a similar inductive bias exists under certain conditions on the feature structures. Thus, the corresponding algorithm for learning the common parameter is also uniformly stable with respect to the domains of the multiple tasks, and its generalization bound is of order O(1/T), where T is the number of tasks. These theoretical analyses show that the similarity of feature structures in MTL leads to specific regularizations for prediction, which enable the learning algorithms to generalize quickly and correctly from a few examples.
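For context, the classical uniform-stability bound of Bousquet and Elisseeff has the following form; the paper's contribution is to sharpen the rate to O(1/n) under its feature-structure assumptions:

```latex
% For a \beta-uniformly stable algorithm A trained on a sample S of size n,
% with a loss bounded by M, with probability at least 1 - \delta:
R(A_S) \;\le\; \widehat{R}(A_S) \;+\; 2\beta
       \;+\; \bigl(4n\beta + M\bigr)\sqrt{\frac{\ln(1/\delta)}{2n}}
```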

19.
IEEE Trans Neural Netw Learn Syst ; 27(6): 1122-34, 2016 06.
Article in English | MEDLINE | ID: mdl-26277008

ABSTRACT

Saliency detection identifies the most important and informative area in a scene, and it is widely used in various vision tasks, including image quality assessment, image matching, and object recognition. Manifold ranking (MR) has been used to great effect for saliency detection, since it not only incorporates local spatial information but also utilizes the labeling information from background queries. However, MR completely ignores the feature information extracted from each superpixel. In this paper, we propose an MR-based matrix factorization (MRMF) method to overcome this limitation. MRMF models the ranking problem in the matrix factorization framework and embeds query sample labels in the coefficients. By incorporating spatial information and embedding labels, MRMF enforces similar saliency values on neighboring superpixels and ranks superpixels according to the learned coefficients. We prove that MRMF has good generalizability, and we develop an efficient optimization algorithm based on the Nesterov method. Experiments on popular benchmark data sets illustrate the promise of MRMF compared with other state-of-the-art saliency detection methods.

20.
IEEE Trans Cybern ; 46(4): 890-901, 2016 Apr.
Article in English | MEDLINE | ID: mdl-25872222

ABSTRACT

The desire to reconstruct 3-D face models with expressions from 2-D face images fosters increasing interest in the problem of face modeling, a task that is important and challenging in the field of computer animation. Facial contours and wrinkles are essential to generating a face with a certain expression; however, these details are generally ignored or not seriously considered in previous studies on face model reconstruction. We therefore employ coupled radial basis function networks to derive an intermediate 3-D face model from a single 2-D face image. To further optimize the 3-D face model through landmarks, a coupled dictionary relating 3-D face models to their corresponding 3-D landmarks is learned from the given training set through local coordinate coding. Another coupled dictionary is then constructed to bridge the 2-D and 3-D landmarks for transferring vertices on the face model. As a result, the final 3-D face can be generated with the appropriate expression. In the testing phase, the 2-D input faces are converted into 3-D models that display different expressions. Experimental results indicate that the proposed approach to facial expression synthesis captures model details more effectively than previous methods.
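The landmark-driven deformation at the heart of such pipelines can be sketched with SciPy's RBF interpolator (a schematic of RBF-based deformation, not the paper's coupled-network formulation):

```python
from scipy.interpolate import RBFInterpolator

def deform_mesh(vertices, landmarks_src, landmarks_dst):
    """Propagate landmark displacements to all mesh vertices via RBF
    interpolation. vertices: (V, 3); landmarks_*: (L, 3) NumPy arrays."""
    displacement = landmarks_dst - landmarks_src     # (L, 3)
    rbf = RBFInterpolator(landmarks_src, displacement)
    return vertices + rbf(vertices)
```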


Subject(s)
Face; Facial Expression; Imaging, Three-Dimensional/methods; Machine Learning; Algorithms; Face/anatomy & histology; Face/diagnostic imaging; Face/physiology; Humans