Results 1 - 20 of 20
1.
IEEE Trans Pattern Anal Mach Intell ; 44(9): 5026-5041, 2022 Sep.
Article in English | MEDLINE | ID: mdl-34061735

ABSTRACT

We present DistillFlow, a knowledge distillation approach to learning optical flow. DistillFlow trains multiple teacher models and a student model, where challenging transformations are applied to the input of the student model to generate hallucinated occlusions as well as less confident predictions. A self-supervised learning framework is then constructed: confident predictions from the teacher models serve as annotations to guide the student model to learn optical flow for those less confident predictions. The self-supervised learning framework enables us to effectively learn optical flow from unlabeled data, not only for non-occluded pixels but also for occluded pixels. DistillFlow achieves state-of-the-art unsupervised learning performance on both the KITTI and Sintel datasets. Our self-supervised pre-trained model also provides an excellent initialization for supervised fine-tuning, suggesting an alternative training paradigm to current supervised learning methods that rely heavily on pre-training on synthetic data. At the time of writing, our fine-tuned models ranked 1st among all monocular methods on the KITTI 2015 benchmark and outperformed all published methods on the Sintel Final benchmark. More importantly, we demonstrate the generalization capability of DistillFlow in three aspects: framework generalization, correspondence generalization and cross-dataset generalization. Our code and models will be available at https://github.com/ppliuboy/DistillFlow.
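The core self-supervision step described above can be sketched as a confidence-masked distillation loss: teacher predictions supervise the student only where the teacher is confident. A minimal NumPy illustration (the function name, the threshold, and the Charbonnier-style penalty are assumptions for this sketch, not the authors' exact loss):

```python
import numpy as np

def distillation_loss(teacher_flow, student_flow, confidence, threshold=0.5):
    """Distillation loss averaged over pixels where the teacher is
    confident; low-confidence pixels are excluded from supervision.
    teacher_flow, student_flow: (H, W, 2); confidence: (H, W)."""
    mask = confidence > threshold                  # confident-teacher pixels
    if not mask.any():
        return 0.0
    diff = teacher_flow[mask] - student_flow[mask]  # (N, 2) flow residuals
    # Charbonnier-style robust penalty on the endpoint error
    return float(np.mean(np.sqrt(np.sum(diff ** 2, axis=-1) + 1e-6)))
```

In the full framework, the student input is the transformed (occluded) image pair while the teacher sees the original pair, so the masked loss transfers knowledge exactly where the student's view is hardest.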

2.
Neural Netw ; 141: 385-394, 2021 Sep.
Article in English | MEDLINE | ID: mdl-33992974

ABSTRACT

Code retrieval is a common practice for programmers to reuse existing code snippets in the open-source repositories. Given a user query (i.e., a natural language description), code retrieval aims at searching the most relevant ones from a set of code snippets. The main challenge of effective code retrieval lies in mitigating the semantic gap between natural language descriptions and code snippets. With the ever-increasing amount of available open-source code, recent studies resort to neural networks to learn the semantic matching relationships between the two sources. The statement-level dependency information, which highlights the dependency relations among the program statements during the execution, reflects the structural importance of one statement in the code, which is favorable for accurately capturing the code semantics but has never been explored for the code retrieval task. In this paper, we propose CRaDLe, a novel approach for Code Retrieval based on statement-level semantic Dependency Learning. Specifically, CRaDLe distills code representations through fusing both the dependency and semantic information at the statement level, and then learns a unified vector representation for each code and description pair for modeling the matching relationship. Comprehensive experiments and analysis on real-world datasets show that the proposed approach can accurately retrieve code snippets for a given query and significantly outperform the state-of-the-art approaches on the task.
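Once unified vectors have been learned for descriptions and code snippets, the retrieval stage reduces to nearest-neighbor ranking in the shared embedding space. A minimal sketch of that final matching step (cosine similarity is an assumption here; the paper learns the matching relationship rather than fixing it):

```python
import numpy as np

def rank_snippets(query_vec, code_vecs, k=3):
    """Return indices of the k code snippets whose embeddings have the
    highest cosine similarity with the query embedding."""
    q = query_vec / np.linalg.norm(query_vec)
    C = code_vecs / np.linalg.norm(code_vecs, axis=1, keepdims=True)
    return np.argsort(-(C @ q))[:k]
```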


Subjects
Semantics, Humans, Machine Learning, Natural Language Processing, Neural Networks (Computer), Software
3.
IEEE Trans Neural Netw Learn Syst ; 31(7): 2441-2454, 2020 Jul.
Article in English | MEDLINE | ID: mdl-31425056

ABSTRACT

Estimating covariance matrix from massive high-dimensional and distributed data is significant for various real-world applications. In this paper, we propose a data-aware weighted sampling-based covariance matrix estimator, namely DACE, which can provide an unbiased covariance matrix estimation and attain more accurate estimation under the same compression ratio. Moreover, we extend our proposed DACE to tackle multiclass classification problems with theoretical justification and conduct extensive experiments on both synthetic and real-world data sets to demonstrate the superior performance of our DACE.
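The key idea behind data-aware sampling can be illustrated with a generic norm-proportional row sampler: rows are drawn with probability proportional to their squared norm and each sampled outer product is reweighted by the inverse of its sampling probability, which makes the estimate unbiased. This is a sketch of the principle, not the authors' exact DACE scheme:

```python
import numpy as np

def sampled_covariance(X, m, rng=None):
    """Unbiased estimate of (1/n) X^T X from m rows sampled with
    probability proportional to squared row norm."""
    if rng is None:
        rng = np.random.default_rng(0)
    n, d = X.shape
    p = np.sum(X ** 2, axis=1)
    p = p / p.sum()                       # data-aware sampling weights
    idx = rng.choice(n, size=m, p=p)
    est = np.zeros((d, d))
    for i in idx:
        # inverse-probability reweighting: E[outer / (n p_i)] = (1/n) X^T X
        est += np.outer(X[i], X[i]) / (n * p[i])
    return est / m
```

Because E[x_i x_i^T / (n p_i)] summed over the sampling distribution equals (1/n) X^T X, the estimator is unbiased for any valid weighting; the data-aware choice of p reduces its variance.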

4.
IEEE Trans Neural Netw Learn Syst ; 29(4): 882-895, 2018 04.
Article in English | MEDLINE | ID: mdl-28141529

ABSTRACT

Classifying binary imbalanced streaming data is a significant task in both machine learning and data mining. Previously, online maximization of the area under the receiver operating characteristic (ROC) curve (AUC) has been proposed to seek a linear classifier. However, it is not well suited for handling nonlinearity and heterogeneity of the data. In this paper, we propose the kernelized online imbalanced learning (KOIL) algorithm, which produces a nonlinear classifier for the data by maximizing the AUC score while minimizing a functional regularizer. We address four major challenges that arise from our approach. First, to control the number of support vectors without sacrificing the model performance, we introduce two buffers with fixed budgets to capture the global information on the decision boundary by storing the corresponding learned support vectors. Second, to restrict the fluctuation of the learned decision function and achieve smooth updating, we confine the influence of a new support vector to its k-nearest opposite support vectors. Third, to avoid information loss, we propose an effective compensation scheme that is applied after a replacement is conducted when either buffer is full. With such a compensation scheme, the performance of the learned model is comparable to one learned with infinite budgets. Fourth, to determine good kernels for data similarity representation, we exploit the multiple kernel learning framework to automatically learn a set of kernels. Extensive experiments on both synthetic and real-world benchmark data sets demonstrate the efficacy of our proposed approach.
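Online AUC maximization optimizes a pairwise surrogate: every (positive, negative) score pair whose margin falls below 1 incurs a hinge penalty, so driving the surrogate down pushes positives above negatives. A minimal sketch of that surrogate (the kernelized updates and buffer maintenance of KOIL are omitted):

```python
def pairwise_auc_hinge(scores_pos, scores_neg):
    """Pairwise hinge surrogate of 1 - AUC: a penalty is incurred
    whenever a negative scores within margin 1 of a positive."""
    total = 0.0
    for sp in scores_pos:
        for sn in scores_neg:
            total += max(0.0, 1.0 - (sp - sn))
    return total / (len(scores_pos) * len(scores_neg))
```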

5.
Neural Netw ; 71: 214-24, 2015 Nov.
Article in English | MEDLINE | ID: mdl-26433049

ABSTRACT

Feature selection is an important problem in machine learning and data mining. We consider the problem of selecting features under a budget constraint on the feature subset size. Traditional feature selection methods suffer from the "monotonic" property: if a feature is selected when the specified number of features is set to some value, it will always be selected when the specified number is larger than that setting. This restriction sacrifices the effectiveness that non-monotonic feature selection can offer. Hence, in this paper, we develop an algorithm for non-monotonic feature selection that approximates the related combinatorial optimization problem by a Multiple Kernel Learning (MKL) problem. We justify a performance guarantee for the derived solution relative to the global optimum of the related combinatorial optimization problem. Finally, we conduct a series of empirical evaluations on both synthetic and real-world benchmark datasets for classification and regression tasks to demonstrate the promising performance of the proposed framework compared with baseline feature selection approaches.


Subjects
Data Mining, Machine Learning, Algorithms, Artificial Intelligence, Benchmarking, Computer Simulation, Factual Databases, Fires, Housing/statistics & numerical data
6.
Neural Netw ; 70: 90-102, 2015 Oct.
Article in English | MEDLINE | ID: mdl-26264172

ABSTRACT

Semi-supervised learning (SSL) is a typical learning paradigm that trains a model from both labeled and unlabeled data. Traditional SSL models usually assume that the unlabeled data are relevant to the labeled data, i.e., that they follow the same distribution as the targeted labeled data. In this paper, we address a different, yet formidable, scenario in semi-supervised classification, where the unlabeled data may contain data irrelevant to the labeled data. To tackle this problem, we develop a maximum margin model, named the tri-class support vector machine (3C-SVM), to utilize all the available training data while seeking a hyperplane that separates the targeted data well. Our 3C-SVM exhibits several characteristics and advantages. First, it needs no prior knowledge or explicit assumption on the data relatedness. On the contrary, it can relieve the effect of irrelevant unlabeled data based on the logistic principle and the maximum entropy principle. That is, 3C-SVM approaches an ideal classifier: it relies heavily on the labeled data, is confident on the relevant data lying far away from the decision hyperplane, and maximally ignores the irrelevant data, which can hardly be distinguished. Second, theoretical analysis is provided to establish under what conditions the irrelevant data can help to seek the hyperplane. Third, 3C-SVM is a generalized model that unifies several popular maximum margin models as special cases, including standard SVMs, semi-supervised SVMs (S(3)VMs), and SVMs learned from the universum (U-SVMs). More importantly, we deploy a concave-convex procedure to solve the proposed 3C-SVM, transforming the original mixed integer programming problem into a semi-definite programming relaxation, and finally into a sequence of quadratic programming subproblems, which yields the same worst-case time complexity as that of S(3)VMs. Finally, we demonstrate the effectiveness and efficiency of our proposed 3C-SVM through systematic experimental comparisons.


Subjects
Supervised Machine Learning, Algorithms, Artificial Intelligence, Statistical Data Interpretation, Entropy, Computer-Assisted Image Processing, Neurological Models, Statistical Models, Support Vector Machine
7.
IEEE J Biomed Health Inform ; 19(5): 1648-59, 2015 Sep.
Article in English | MEDLINE | ID: mdl-25248205

ABSTRACT

This paper presents compact yet comprehensive feature representations of the electroencephalogram (EEG) signal to achieve efficient epileptic seizure prediction. The initial EEG feature vectors are formed by acquiring the dominant amplitude and frequency components on an epoch-by-epoch basis from the EEG signals. These extracted parameters can reveal the intrinsic EEG signal changes as well as the underlying stage transitions. To improve the efficacy of feature extraction, an elimination-based feature selection method is applied to the initial feature vectors. This diminishes redundant and noisy points, providing each patient with a lower-dimensional and independent final feature form. This distinguishes our study from currently prevailing approaches, which usually adopt feature extraction processes that employ time-consuming, high-dimensional parameter sets. State-of-the-art machine learning approaches are employed to build patient-specific binary classifiers that divide the extracted feature parameters into preictal and interictal groups. Through out-of-sample evaluation on the intracranial EEG recordings provided by the publicly available Freiburg dataset, promising prediction performance has been attained. Specifically, we achieve 98.8% sensitivity on the 19 patients included in our experiment, where only one of 83 seizures across all patients was not predicted. To make this investigation more comprehensive, we conduct extensive comparative studies with other recently published competing approaches, in which the advantages of our method are highlighted.
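Extracting a dominant amplitude and frequency per epoch can be sketched with a windowed magnitude spectrum: split the trace into fixed-length epochs, take the FFT of each, and keep the peak bin. A minimal illustration (the function name, epoch handling, and peak-picking rule are assumptions; the paper's parameter estimation may differ):

```python
import numpy as np

def dominant_components(signal, fs, epoch_len):
    """Per-epoch (dominant frequency in Hz, amplitude) pairs from a
    1-D trace, using the magnitude spectrum of each epoch."""
    n = int(fs * epoch_len)                  # samples per epoch
    feats = []
    for start in range(0, len(signal) - n + 1, n):
        epoch = signal[start:start + n]
        spec = np.abs(np.fft.rfft(epoch))
        freqs = np.fft.rfftfreq(n, d=1.0 / fs)
        k = np.argmax(spec[1:]) + 1          # skip the DC bin
        feats.append((freqs[k], 2.0 * spec[k] / n))
    return np.array(feats)
```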


Subjects
Electroencephalography/methods, Epilepsy/diagnosis, Epilepsy/physiopathology, Computer-Assisted Signal Processing, Adolescent, Adult, Child, Female, Humans, Male, Middle Aged, Sensitivity and Specificity
8.
IEEE Trans Syst Man Cybern B Cybern ; 42(1): 93-106, 2012 Feb.
Article in English | MEDLINE | ID: mdl-21827974

ABSTRACT

Expertise retrieval, whose task is to suggest people with relevant expertise on the topic of interest, has received increasing interest in recent years. One of the issues is that previous algorithms mainly consider the documents associated with the experts while ignoring the community information that is affiliated with the documents and the experts. Motivated by the observation that communities could provide valuable insight and distinctive information, we investigate and develop two community-aware strategies to enhance expertise retrieval. We first propose a new smoothing method using the community context for statistical language modeling, which is employed to identify the most relevant documents so as to boost the performance of expertise retrieval in the document-based model. Furthermore, we propose a query-sensitive AuthorRank to model the authors' authorities based on the community coauthorship networks and develop an adaptive ranking refinement method to enhance expertise retrieval. Experimental results demonstrate the effectiveness and robustness of both community-aware strategies. Moreover, the improvements made in the enhanced models are significant and consistent.
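Community-context smoothing for a statistical language model can be sketched as two-level linear interpolation: the document model is smoothed first by its community's language model, then by the collection model. This is a schematic Jelinek-Mercer-style form under assumed mixing weights, not the paper's exact estimator:

```python
from collections import Counter

def community_smoothed_prob(word, doc, community, collection,
                            lam_d=0.5, lam_c=0.3):
    """P(word | doc) interpolated with the community and collection
    language models; doc/community/collection are token lists."""
    def ml(w, text):
        # maximum-likelihood unigram estimate
        return Counter(text)[w] / len(text) if text else 0.0
    return (lam_d * ml(word, doc)
            + lam_c * ml(word, community)
            + (1 - lam_d - lam_c) * ml(word, collection))
```

The community term lets a document borrow probability mass for topical words that its own text mentions rarely but its community discusses often, which is the intuition behind the smoothing strategy above.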


Subjects
Algorithms, Artificial Intelligence, Data Mining/methods, Intelligent Systems, Theoretical Models, Automated Pattern Recognition/methods, Referral and Consultation, Computer Simulation, Decision Support Techniques
9.
IEEE Trans Neural Netw ; 22(3): 433-46, 2011 Mar.
Article in English | MEDLINE | ID: mdl-21257374

ABSTRACT

Kernel methods have been successfully applied in various applications. To succeed in these applications, it is crucial to learn a good kernel representation, whose objective is to reveal the data similarity precisely. In this paper, we address the problem of multiple kernel learning (MKL), searching for the optimal kernel combination weights through maximizing a generalized performance measure. Most MKL methods employ the L(1)-norm simplex constraint on the kernel combination weights, which therefore yields a sparse but non-smooth solution for the kernel weights. Despite their efficiency, these methods tend to discard informative complementary or orthogonal base kernels and yield degenerated generalization performance. Alternatively, imposing an L(p)-norm (p > 1) constraint on the kernel weights keeps all the information in the base kernels. This leads to non-sparse solutions and brings the risk of being sensitive to noise and incorporating redundant information. To tackle these problems, we propose a generalized MKL (GMKL) model by introducing an elastic-net-type constraint on the kernel weights. More specifically, it is an MKL model with a constraint on a linear combination of the L(1)-norm and the squared L(2)-norm of the kernel weights. Therefore, previous MKL problems based on the L(1)-norm or the L(2)-norm constraints can be regarded as special cases. Furthermore, our GMKL enjoys the favorable sparsity property of the solution and also facilitates the grouping effect. Moreover, the optimization of our GMKL is a convex optimization problem, where a local solution is the global optimum. We further derive a level method to efficiently solve the optimization problem. A series of experiments on both synthetic and real-world datasets have been conducted to show the effectiveness and efficiency of our GMKL.
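The elastic-net-type constraint can be written explicitly. With kernel weights θ and a generalized performance measure J, the feasible set interpolates between the L1 simplex and the L2 ball (the parameterization below is the standard elastic-net form; the paper's exact scaling may differ):

```latex
\min_{\theta \ge 0} \; J(\theta)
\quad \text{s.t.} \quad
\lambda \,\|\theta\|_{1} + (1-\lambda)\,\|\theta\|_{2}^{2} \le 1,
\qquad 0 \le \lambda \le 1 .
```

Setting λ = 1 recovers the sparse L1-constrained MKL, λ = 0 the non-sparse L2 variant; intermediate λ yields sparse solutions that still exhibit the grouping effect among correlated base kernels.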


Subjects
Algorithms, Artificial Intelligence, Neurological Models, Neural Networks (Computer), Automated Pattern Recognition/methods, Teaching
10.
Neural Comput ; 21(2): 560-82, 2009 Feb.
Article in English | MEDLINE | ID: mdl-19431269

ABSTRACT

Support vector machines (SVMs) are state-of-the-art classifiers. Typically the L2-norm or the L1-norm is adopted as a regularization term in SVMs, while other norm-based SVMs, for example, the L0-norm SVM or even the L(infinity)-norm SVM, are rarely seen in the literature. The major reason is that the L0-norm describes a discontinuous and nonconvex term, leading to a combinatorially NP-hard optimization problem. In this letter, motivated by Bayesian learning, we propose a novel framework that can implement arbitrary norm-based SVMs in polynomial time. One significant feature of this framework is that only a sequence of sequential minimal optimization problems needs to be solved, making it practical in many real applications. The proposed framework is important in the sense that Bayesian priors can be efficiently plugged into most learning methods without knowing their explicit form, which builds a connection between Bayesian learning and kernel machines. We derive the theoretical framework, demonstrate how our approach works on the L0-norm SVM as a typical example, and perform a series of experiments to validate its advantages. Experimental results on nine benchmark data sets are very encouraging. The implemented L0-norm SVM is competitive with or even better than the standard L2-norm SVM in terms of accuracy, but with a reduced number of support vectors, roughly 9.46% of the number on average. When compared with another sparse model, the relevance vector machine, our proposed algorithm also demonstrates better sparsity with a training speed over seven times faster.


Subjects
Algorithms, Artificial Intelligence, Bayes Theorem, Neural Networks (Computer), Factual Databases/statistics & numerical data, Humans, Learning/physiology, Reference Values
11.
IEEE Trans Pattern Anal Mach Intell ; 31(7): 1210-24, 2009 Jul.
Article in English | MEDLINE | ID: mdl-19443920

ABSTRACT

In this paper, we present a fusion approach to solve the nonrigid shape recovery problem, which takes advantage of both the appearance information and the local features. We have two major contributions. First, we propose a novel progressive finite Newton optimization scheme for the feature-based nonrigid surface detection problem, which is reduced to only solving a set of linear equations. The key is to formulate the nonrigid surface detection as an unconstrained quadratic optimization problem that has a closed-form solution for a given set of observations. Second, we propose a deformable Lucas-Kanade algorithm that triangulates the template image into small patches and constrains the deformation through the second-order derivatives of the mesh vertices. We formulate it into a sparse regularized least squares problem, which is able to reduce the computational cost and the memory requirement. The inverse compositional algorithm is applied to efficiently solve the optimization problem. We have conducted extensive experiments for performance evaluation on various environments, whose promising results show that the proposed algorithm is both efficient and effective.


Assuntos
Algoritmos , Inteligência Artificial , Aumento da Imagem/métodos , Interpretação de Imagem Assistida por Computador/métodos , Armazenamento e Recuperação da Informação/métodos , Reconhecimento Automatizado de Padrão/métodos , Técnica de Subtração , Modelos Biológicos , Reprodutibilidade dos Testes , Sensibilidade e Especificidade
12.
Neural Netw ; 22(7): 977-87, 2009 Sep.
Article in English | MEDLINE | ID: mdl-19167865

ABSTRACT

Kernel methods have been widely used in pattern recognition. Many kernel classifiers, such as Support Vector Machines (SVM), assume that data can be separated by a hyperplane in the kernel-induced feature space. These methods do not consider the data distribution and cannot readily output probabilities or confidences for classification. This paper proposes a novel Kernel-based Maximum A Posteriori (KMAP) classification method, which makes a Gaussian distribution assumption instead of a linear separability assumption in the feature space. Robust methods are further proposed to estimate the probability densities, and the kernel trick is utilized to calculate our model. The model is theoretically and empirically important in the sense that: (1) it presents a more generalized classification model than other kernel-based algorithms, e.g., Kernel Fisher Discriminant Analysis (KFDA); (2) it can output probability or confidence for classification, therefore providing potential for reasoning under uncertainty; and (3) multi-way classification is as straightforward as binary classification in this model, because only probability calculation is involved and no one-against-one or one-against-others voting is needed. Moreover, we conduct an extensive experimental comparison with state-of-the-art classification methods, such as SVM and KFDA, on eight UCI benchmark data sets and three face data sets. The results demonstrate that KMAP achieves very promising performance compared with other models.
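The MAP decision rule under per-class Gaussian densities can be sketched in input space (the paper applies the same rule in the kernel-induced feature space via the kernel trick; this input-space analogue only illustrates points (2) and (3) above):

```python
import numpy as np

def map_classify(x, means, covs, priors):
    """MAP decision under per-class Gaussian densities: returns the
    predicted class index and the posterior probabilities."""
    scores = []
    for mu, cov, pi in zip(means, covs, priors):
        d = x - mu
        _, logdet = np.linalg.slogdet(cov)
        # log N(x; mu, cov) up to the shared normalizer
        loglik = -0.5 * (d @ np.linalg.solve(cov, d) + logdet
                         + len(x) * np.log(2 * np.pi))
        scores.append(np.log(pi) + loglik)
    scores = np.array(scores)
    post = np.exp(scores - scores.max())   # stable softmax over classes
    post /= post.sum()
    return int(np.argmax(post)), post
```

Multi-way classification falls out for free: the argmax over per-class posteriors replaces any one-against-one or one-against-others voting.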


Subjects
Algorithms, Artificial Intelligence, Information Storage and Retrieval, Automated Pattern Recognition, Biometry, Discriminant Analysis, Face, Humans, Computer-Assisted Image Interpretation, Nonlinear Dynamics, Normal Distribution
13.
IEEE Trans Syst Man Cybern B Cybern ; 39(2): 417-30, 2009 Apr.
Article in English | MEDLINE | ID: mdl-19095552

ABSTRACT

Heat-diffusion models have been successfully applied to various domains such as classification and dimensionality-reduction tasks in manifold learning. One critical local approximation technique is employed to weigh the edges in the graph constructed from data points. This approximation technique is based on an implicit assumption that the data are distributed evenly. However, this assumption is not valid in most cases, so the approximation is not accurate in these cases. To solve this challenging problem, we propose a volume-based heat-diffusion model (VHDM). In VHDM, the volume is theoretically justified by handling the input data that are unevenly distributed on an unknown manifold. We also propose a novel volume-based heat-diffusion classifier (VHDC) based on VHDM. One of the advantages of VHDC is that the computational complexity is linear on the number of edges given a constructed graph. Moreover, we give an analysis on the stability of VHDC with respect to its three free parameters, and we demonstrate the connection between VHDC and some other classifiers. Experiments show that VHDC performs better than Parzen window approach, K nearest neighbor, and the HDC without volumes in prediction accuracy and outperforms some recently proposed transductive-learning algorithms. The enhanced performance of VHDC shows the validity of introducing the volume. The experiments also confirm the stability of VHDC with respect to its three free parameters.

14.
IEEE Trans Syst Man Cybern B Cybern ; 36(4): 913-23, 2006 Aug.
Article in English | MEDLINE | ID: mdl-16903374

ABSTRACT

Imbalanced learning is a challenging task in machine learning. In this context, the data associated with one class are far fewer than those associated with the other class. Traditional machine learning methods seeking classification accuracy over a full range of instances are not suitable for this problem, since they tend to classify all the data into the majority class, usually the less important class. In this correspondence, the authors describe a new approach named the biased minimax probability machine (BMPM) to deal with the problem of imbalanced learning. The BMPM model is demonstrated to provide an elegant and systematic way of performing imbalanced learning. More specifically, by controlling the accuracy of the majority class under all possible choices of class-conditional densities with a given mean and covariance matrix, this model can quantitatively and systematically incorporate a bias for the minority class. By establishing an explicit connection between the classification accuracy and the bias, this approach distinguishes itself from many current imbalanced-learning methods, which often impose a certain bias on the minority data by adapting intermediate factors via trial-and-error procedures. The authors detail the theoretical foundation, prove its solvability, propose an efficient optimization algorithm, and perform a series of experiments to evaluate the novel model. Comparison with other competitive methods demonstrates the effectiveness of the new model.
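The worst-case accuracy control described above rests on the standard minimax probability bound (due to Lanckriet et al., building on the Marshall-Olkin inequality), which BMPM biases asymmetrically between the two classes: for any distribution with mean μ and covariance Σ,

```latex
\inf_{\mathbf{x} \sim (\boldsymbol{\mu}, \boldsymbol{\Sigma})}
\Pr\{\mathbf{a}^{\top}\mathbf{x} \ge b\}
= \frac{\kappa^{2}}{1+\kappa^{2}},
\qquad
\kappa = \frac{\mathbf{a}^{\top}\boldsymbol{\mu} - b}
             {\sqrt{\mathbf{a}^{\top}\boldsymbol{\Sigma}\,\mathbf{a}}},
\qquad \mathbf{a}^{\top}\boldsymbol{\mu} \ge b .
```

BMPM maximizes this worst-case accuracy for the minority class while constraining the majority-class counterpart to stay above a chosen level, which is what makes the bias quantitative rather than trial-and-error.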


Subjects
Algorithms, Artificial Intelligence, Statistical Models, Automated Pattern Recognition/methods, Computer Simulation
15.
IEEE Trans Biomed Eng ; 53(5): 821-31, 2006 May.
Article in English | MEDLINE | ID: mdl-16686404

ABSTRACT

The challenging task of medical diagnosis based on machine learning techniques requires an inherent bias, i.e., the diagnosis should favor the "ill" class over the "healthy" class, since misdiagnosing a patient as a healthy person may delay the therapy and aggravate the illness. Therefore, the objective in this task is not to improve the overall accuracy of the classification, but to focus on improving the sensitivity (the accuracy of the "ill" class) while maintaining an acceptable specificity (the accuracy of the "healthy" class). Some current methods adopt roundabout ways to impose a certain bias toward the important class, i.e., they try to utilize some intermediate factors to influence the classification. However, it remains uncertain whether these methods can improve the classification performance systematically. In this paper, by engaging a novel learning tool, the biased minimax probability machine (BMPM), we deal with the issue in a more elegant way and directly achieve the objective of appropriate medical diagnosis. More specifically, the BMPM directly controls the worst case accuracies to incorporate a bias toward the "ill" class. Moreover, in a distribution-free way, the BMPM derives the decision rule in such a way as to maximize the worst case sensitivity while maintaining an acceptable worst case specificity. By directly controlling the accuracies, the BMPM provides a more rigorous way to handle medical diagnosis; by deriving a distribution-free decision rule, the BMPM distinguishes itself from a large family of classifiers, namely, the generative classifiers, where an assumption on the data distribution is necessary. We evaluate the performance of the model and compare it with three traditional classifiers: the k-nearest neighbor, the naive Bayesian, and the C4.5. The test results on two medical datasets, the breast-cancer dataset and the heart disease dataset, show that the BMPM outperforms the other three models.


Subjects
Algorithms, Breast Neoplasms/diagnosis, Clinical Decision Support Systems, Decision Support Techniques, Computer-Assisted Diagnosis/methods, Heart Diseases/diagnosis, Computer Simulation, Humans, Statistical Models, Reproducibility of Results, Sensitivity and Specificity
16.
Protein Pept Lett ; 13(4): 391-6, 2006.
Article in English | MEDLINE | ID: mdl-16712516

ABSTRACT

In this paper, the tertiary structures of protein chains of heterocomplexes were mapped to 2D networks; based on the mapping approach, statistical properties of these networks were systematically studied. Firstly, our experimental results confirmed that the networks derived from protein structures possess small-world properties. Secondly, an interesting relationship between network average degree and the network size was discovered, which was quantified as an empirical function enabling us to estimate the number of residue contacts of the protein chains accurately. Thirdly, by analyzing the average clustering coefficient for nodes having the same degree in the network, it was found that the architectures of the networks and protein structures analyzed are hierarchically organized. Finally, network motifs were detected in the networks which are believed to determine the family or superfamily the networks belong to. The study of protein structures with the new perspective might shed some light on understanding the underlying laws of evolution, function and structures of proteins, and therefore would be complementary to other currently existing methods.


Subjects
Protein Tertiary Structure, Proteins/chemistry, Molecular Models, Theoretical Models
17.
IEEE Trans Syst Man Cybern B Cybern ; 36(2): 300-11, 2006 Apr.
Article in English | MEDLINE | ID: mdl-16602591

ABSTRACT

The extraction of component line segments and circular arcs from freehand strokes along with their relations is a prerequisite for sketch understanding. Existing approaches usually take three stages to segment a stroke: first identifying segmentation points, then classifying the substroke between each pair of adjacent segmentation points, and, finally, obtaining graphical representations of substrokes by fitting graphical primitives to them. Since a stroke inevitably contains noises, the first stage may produce wrong or inaccurate segmentation points, resulting in the wrong substroke classification in the second stage and inaccurately fitted parameters in the third stage. To overcome the noise sensitivity of the three-stage method, the segmental homogeneity feature is emphasized in this paper. We propose a novel approach, which first extracts graphical primitives from a stroke by a connected segment growing from a seed-segment and then utilizes relationships between the primitives to refine their control parameters. We have conducted experiments using real-life strokes and compared the proposed approach with others. Experimental results demonstrate that the proposed approach is effective and robust.


Subjects
Algorithms, Artificial Intelligence, Computer-Assisted Image Interpretation/methods, Information Storage and Retrieval/methods, Paintings, Automated Pattern Recognition/methods
18.
FEBS Lett ; 580(2): 380-4, 2006 Jan 23.
Article in English | MEDLINE | ID: mdl-16376878

ABSTRACT

This paper proposes a novel method that can predict protein interaction sites in heterocomplexes using residue spatial sequence profile and evolution rate approaches. The former represents the information of multiple sequence alignments while the latter corresponds to a residue's evolutionary conservation score based on a phylogenetic tree. Three predictors using a support vector machine algorithm are constructed to predict whether a surface residue is part of a protein-protein interface. The efficiency and effectiveness of our proposed approach are verified by its better prediction performance compared with other models. The study is based on a non-redundant data set of heterodimers consisting of 69 protein chains.


Subjects
Algorithms, Protein Conformation, Protein Interaction Mapping, Proteins/chemistry, Amino Acid Sequence, Molecular Evolution, Macromolecular Substances, Molecular Models, Molecular Sequence Data, Proteins/genetics, Proteins/metabolism, Protein Sequence Analysis
19.
IEEE Trans Syst Man Cybern B Cybern ; 35(1): 2-11, 2005 Feb.
Article in English | MEDLINE | ID: mdl-15719928

ABSTRACT

Merged characters are a major cause of recognition errors. We classify the merging relationship between the two involved characters into three types: "linear," "nonlinear," and "overlapped." Most segmentation methods handle the first type well; however, their capabilities for handling the other two types are limited. The weakness in handling the nonlinear and overlapped types results from the linear, usually vertical, cuts assumed for character segmentation in these methods. This paper proposes a novel merged character segmentation and recognition method based on forepart prediction, necessity-sufficiency matching, and character-adaptive masking. This method utilizes the information obtained from the forepart of merged characters to predict candidates for the leftmost character, and then applies character-adaptive masking and character recognition to verify the prediction. Therefore, the arbitrarily shaped cutting path follows the true shape of the leftmost character so as to preserve the shape of the next character. This method handles the first two types well and greatly improves the segmentation accuracy for the overlapped type. The experimental results and performance comparisons with other methods demonstrate the effectiveness of the proposed method.


Subjects
Algorithms, Artificial Intelligence, Electronic Data Processing/methods, Computer-Assisted Image Interpretation/methods, Information Storage and Retrieval/methods, Automated Pattern Recognition/methods, Reading, Computer Graphics, Documentation/methods, Image Enhancement/methods, Computer-Assisted Numerical Analysis, Printing, Reproducibility of Results, Sensitivity and Specificity, Computer-Assisted Signal Processing
20.
IEEE Trans Pattern Anal Mach Intell ; 26(11): 1491-506, 2004 Nov.
Article in English | MEDLINE | ID: mdl-15521496

ABSTRACT

Arc segmentation plays an important role in the process of graphics recognition from scanned images. The GREC arc segmentation contest shows there is a lot of room for improvement in this area. This paper proposes a multiresolution arc segmentation (MAS) method based on our previous seeded circular tracking algorithm, which largely depends on the OOPSV model. The newly introduced multiresolution paradigm handles arcs/circles with large radii well. We describe new approaches for arc seed detection, arc localization, and arc verification, making the proposed method self-contained and more efficient. Moreover, this paper also brings major improvements to the dynamic adjustment algorithm of circular tracking to make it more robust. A systematic performance evaluation of the proposed method has been conducted using the third-party evaluation tool and test images obtained from the GREC arc segmentation contests. The overall performance over various arc angles, arc lengths, line thicknesses, noise levels, arc-arc intersections, and arc-line intersections has been measured. The experimental results and time complexity analyses on real scanned images are also reported and compared with other approaches. The evaluation demonstrates the stable performance of the MAS method and its significant improvement in processing large arcs/circles.


Subjects
Algorithms, Artificial Intelligence, Computer Graphics, Computer-Assisted Image Interpretation/methods, Automated Pattern Recognition, Computer-Assisted Signal Processing, Subtraction Technique, Cluster Analysis, Computer Simulation, Image Enhancement/methods, Information Storage and Retrieval/methods, Computer-Assisted Numerical Analysis, Reproducibility of Results, Sensitivity and Specificity, User-Computer Interface