Results 1 - 20 of 37
1.
Bioinformatics ; 34(3): 477-484, 2018 02 01.
Article in English | MEDLINE | ID: mdl-29028926

ABSTRACT

Motivation: Protein-peptide interactions are one of the most important biological interactions and play a crucial role in many diseases including cancer. Therefore, knowledge of these interactions provides invaluable insights into all cellular processes, functional mechanisms, and drug discovery. Protein-peptide interactions can be analyzed by studying the structures of protein-peptide complexes. However, only a small portion has known complex structures and experimental determination of protein-peptide interaction is costly and inefficient. Thus, predicting peptide-binding sites computationally will be useful to improve efficiency and cost effectiveness of experimental studies. Here, we established a machine learning method called SPRINT-Str (Structure-based prediction of protein-Peptide Residue-level Interaction) to use structural information for predicting protein-peptide binding residues. These predicted binding residues are then employed to infer the peptide-binding site by a clustering algorithm. Results: SPRINT-Str achieves robust and consistent results for prediction of protein-peptide binding regions in terms of residues and sites. Matthews' Correlation Coefficient (MCC) for 10-fold cross validation and independent test set are 0.27 and 0.293, respectively, as well as 0.775 and 0.782, respectively for area under the curve. The prediction outperforms other state-of-the-art methods, including our previously developed sequence-based method. A further spatial neighbor clustering of predicted binding residues leads to prediction of binding sites at 20-116% higher coverage than the next best method at all precision levels in the test set. The application of SPRINT-Str to protein binding with DNA, RNA and carbohydrate confirms the method's capability of separating peptide-binding sites from other functional sites.
More importantly, similar performance in prediction of binding residues and sites is obtained when experimentally determined structures are replaced by unbound structures or quality model structures built from homologs, indicating its wide applicability. Availability and implementation: http://sparks-lab.org/server/SPRINT-Str. Contact: yangyd25@mail.sysu.edu.cn. Supplementary information: Supplementary data are available at Bioinformatics online.
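
The final step described above, grouping predicted binding residues into a site by spatial neighbor clustering, can be illustrated with a minimal sketch. This is not the SPRINT-Str implementation: the 6.0 Å cutoff, the single-linkage rule, the ranking by cluster size, and the name `cluster_residues` are all assumptions made for illustration.

```python
import math
from collections import deque

def cluster_residues(coords, cutoff=6.0):
    """Group residues into clusters; two residues are linked when within `cutoff`.

    coords: dict mapping a residue id to an (x, y, z) tuple, e.g. a C-alpha position.
    The cutoff value is a hypothetical choice, not taken from the paper.
    """
    unvisited = set(coords)
    clusters = []
    while unvisited:
        seed = unvisited.pop()
        cluster, queue = [seed], deque([seed])
        while queue:
            current = queue.popleft()
            near = [r for r in unvisited
                    if math.dist(coords[current], coords[r]) <= cutoff]
            for r in near:
                unvisited.remove(r)
                cluster.append(r)
                queue.append(r)
        clusters.append(sorted(cluster))
    # Largest cluster first: a simple stand-in for "the predicted binding site".
    return sorted(clusters, key=len, reverse=True)

# Hypothetical predicted binding residues with made-up coordinates.
predicted = {10: (0.0, 0.0, 0.0), 11: (3.8, 0.0, 0.0),
             12: (7.6, 0.0, 0.0), 57: (30.0, 30.0, 30.0)}
sites = cluster_residues(predicted)
```

Residues 10 and 12 are farther apart than the cutoff but still end up in one cluster through residue 11, which is the chaining behaviour of single linkage; the isolated residue 57 forms its own cluster.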


Subjects
Machine Learning; Peptides/metabolism; Proteins/metabolism; Sequence Analysis, Protein/methods; Computational Biology/methods; Humans; Peptides/chemistry; Protein Binding; Protein Domains; Protein Tyrosine Phosphatase, Non-Receptor Type 4/metabolism; Proteins/chemistry
2.
J Comput Chem ; 39(22): 1757-1763, 2018 08 15.
Article in English | MEDLINE | ID: mdl-29761520

ABSTRACT

Malonylation is a recently discovered post-translational modification (PTM) in which a malonyl group attaches to a lysine (K) amino acid residue of a protein. In this work, a novel machine learning model, SPRINT-Mal, is developed to predict malonylation sites by employing sequence and predicted structural features. Evolutionary information and physicochemical properties are found to be the two most discriminative features whereas a structural feature called half-sphere exposure provides additional improvement to the prediction performance. SPRINT-Mal trained on mouse data yields robust performance for 10-fold cross validation and independent test set with Area Under the Curve (AUC) values of 0.74 and 0.76 and Matthews' Correlation Coefficient (MCC) of 0.213 and 0.20, respectively. Moreover, SPRINT-Mal achieved comparable performance when tested on H. sapiens proteins without species-specific training, but not on proteins from the bacterium S. erythraea. This suggests similar underlying physicochemical mechanisms between mouse and human but not between mouse and bacterium. SPRINT-Mal is freely available as an online server at: http://sparks-lab.org/server/SPRINT-Mal/. © 2018 Wiley Periodicals, Inc.


Subjects
Bacterial Proteins/chemistry; Lysine/chemistry; Machine Learning; Malonates/chemistry; Animals; Bacterial Proteins/metabolism; Hominidae/metabolism; Humans; Lysine/metabolism; Malonates/metabolism; Mice; Molecular Structure; Protein Processing, Post-Translational; Saccharopolyspora/chemistry; Saccharopolyspora/metabolism
3.
J Comput Chem ; 37(13): 1223-9, 2016 05 15.
Article in English | MEDLINE | ID: mdl-26833816

ABSTRACT

Protein-peptide interactions are essential for all cellular processes including DNA repair, replication, gene-expression, and metabolism. As most protein-peptide interactions are uncharacterized, it is cost effective to investigate them computationally as the first step. All existing approaches for predicting protein-peptide binding sites, however, are based on protein structures despite the fact that the structures for most proteins are not yet solved. This article proposes the first machine-learning method called SPRINT to make Sequence-based prediction of Protein-peptide Residue-level Interactions. SPRINT yields a robust and consistent performance for 10-fold cross validations and independent test. The most important feature is evolution-generated sequence profiles. For the test set (1056 binding and non-binding residues), it yields a Matthews' Correlation Coefficient of 0.326 with a sensitivity of 64% and a specificity of 68%. This sequence-based technique is comparable to or more accurate than structure-based methods for peptide-binding site prediction. SPRINT is available as an online server at: http://sparks-lab.org/. © 2016 Wiley Periodicals, Inc.
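
The three metrics quoted above are all functions of the same confusion matrix, which a short sketch makes concrete. The counts below are hypothetical: they are chosen only so that sensitivity and specificity match the reported 64% and 68%, and they assume equal numbers of binding and non-binding residues, which the abstract does not state. The resulting MCC therefore need not equal the published 0.326.

```python
import math

def binary_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, MCC) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)          # fraction of true binders recovered
    specificity = tn / (tn + fp)          # fraction of non-binders rejected
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sensitivity, specificity, mcc

# Hypothetical balanced counts (100 binding, 100 non-binding residues).
sens, spec, mcc = binary_metrics(tp=64, fn=36, tn=68, fp=32)
```

Because MCC depends on the class ratio while sensitivity and specificity do not, the same 64%/68% pair yields a different MCC on an imbalanced test set.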


Subjects
Peptides/chemistry; Peptides/metabolism; Proteins/chemistry; Proteins/metabolism; Support Vector Machine; Binding Sites
4.
Bioinformatics ; 31(3): 390-6, 2015 Feb 01.
Article in English | MEDLINE | ID: mdl-25304779

ABSTRACT

MOTIVATION: Cellular interactions of kinesin-1, an adenosine triphosphate (ATP)-driven motor protein capable of undergoing multiple steps on a microtubule (MT), affect its mechanical processivity, the number of steps taken per encounter with MT. Even though the processivity of kinesin has been widely studied, a detailed study of the factors that affect the stepping of the motor along MT is still lacking. RESULTS: We model the cellular interactions of kinesin as a probabilistic timed automaton and use the model to simulate the mechanical processivity of the motor. Theoretical analysis suggests: (i) backward stepping tends to be powered by ATP hydrolysis, rather than ATP synthesis, (ii) backward stepping powered by ATP synthesis is more likely to happen with limiting ATP concentration ([ATP]) at high loads and (iii) with increasing load the frequency of backward stepping powered by ATP hydrolysis at high [ATP] is greater than that powered by ATP synthesis at limiting [ATP]. Together, the higher frequency of backward stepping powered by ATP hydrolysis than by ATP synthesis is found to be a reason for the more dramatic falling of kinesin processivity with rising load at high [ATP] compared with that at low [ATP]. Simulation results further show that the processivity of kinesin can be determined by the number of ATP hydrolysis and synthesis kinetic cycles taken by the motor before becoming inactive. It is also found that the duration of a backward stepping cycle at high loads is more likely to be less than that of a forward stepping cycle. CONTACT: h.r.khataee@griffithuni.edu.au or a.liew@griffith.edu.au.
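
Processivity as defined above, the number of kinetic cycles completed before the motor becomes inactive, can be explored with a very small Monte Carlo sketch. This is a simplified stand-in, not the probabilistic timed automaton of the paper: timing is omitted, each cycle is reduced to a three-way random choice, and all probabilities and function names are hypothetical.

```python
import random

def simulate_run(p_forward, p_backward, p_detach, rng):
    """Return (net_steps, cycles) for one motor-microtubule encounter.

    Each kinetic cycle ends in a forward step, a backward step, or detachment.
    The three probabilities are illustrative and must sum to 1.
    """
    assert abs(p_forward + p_backward + p_detach - 1.0) < 1e-9
    position = cycles = 0
    while True:
        draw = rng.random()
        if draw < p_detach:
            return position, cycles
        cycles += 1
        position += 1 if draw < p_detach + p_forward else -1

def mean_cycles(p_forward, p_backward, p_detach, runs=5000, seed=1):
    """Average number of completed cycles per encounter over many runs."""
    rng = random.Random(seed)
    total = sum(simulate_run(p_forward, p_backward, p_detach, rng)[1]
                for _ in range(runs))
    return total / runs

# Hypothetical parameter sets standing in for low and high load.
low_load = mean_cycles(0.98, 0.01, 0.01)
high_load = mean_cycles(0.60, 0.30, 0.10)
```

In this toy model the expected number of cycles is (1 - p_detach) / p_detach, so raising the detachment and backward-step probabilities shortens runs, in the same qualitative direction as the load dependence the abstract reports.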


Subjects
Adenosine Triphosphate/metabolism; Computer Simulation; Kinesins/metabolism; Microtubules/metabolism; Models, Biological; Stochastic Processes; Algorithms; Humans; Hydrolysis; Kinetics
5.
J Chem Inf Model ; 56(10): 2115-2122, 2016 10 24.
Article in English | MEDLINE | ID: mdl-27623166

ABSTRACT

Carbohydrate-binding proteins play significant roles in many diseases including cancer. Here, we established a machine-learning-based method (called sequence-based prediction of residue-level interaction sites of carbohydrates, SPRINT-CBH) to predict carbohydrate-binding sites in proteins using support vector machines (SVMs). We found that integrating evolution-derived sequence profiles with additional information on sequence and predicted solvent accessible surface area leads to a reasonably accurate, robust, and predictive method, with area under receiver operating characteristic curve (AUC) of 0.78 and 0.77 and Matthews' correlation coefficient of 0.34 and 0.29, respectively for 10-fold cross validation and independent test without balancing binding and nonbinding residues. The quality of the method is further demonstrated by having statistically significantly more binding residues predicted for carbohydrate-binding proteins than presumptive nonbinding proteins in the human proteome, and by the bias of rare alleles toward predicted carbohydrate-binding sites for nonsynonymous mutations from the 1000 Genomes Project. SPRINT-CBH is available as an online server at http://sparks-lab.org/server/SPRINT-CBH.
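
The AUC reported above can be read as the probability that a randomly chosen binding residue receives a higher score than a randomly chosen nonbinding residue. The sketch below computes it directly from that definition; the scores are made up for illustration and `roc_auc` is a hypothetical helper, not part of SPRINT-CBH.

```python
def roc_auc(scores_pos, scores_neg):
    """AUC as the probability that a positive outscores a negative (ties count 0.5)."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

# Made-up classifier scores: 3 binding residues, 4 nonbinding residues.
auc = roc_auc([0.9, 0.7, 0.4], [0.8, 0.3, 0.2, 0.1])
```

Because this is a rank statistic over positive-negative pairs, it does not change when the ratio of binding to nonbinding residues changes, which is one reason AUC is convenient for unbalanced test sets like the one described.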


Subjects
Carbohydrate Metabolism; Proteins/metabolism; Support Vector Machine; Binding Sites; Carbohydrates/chemistry; Databases, Protein; Humans; Molecular Docking Simulation; Protein Binding; Proteins/chemistry; ROC Curve
6.
J Chem Inf Model ; 54(12): 3439-45, 2014 Dec 22.
Article in English | MEDLINE | ID: mdl-25400227

ABSTRACT

Kinesin is a walking motor protein that shuttles cellular cargoes along microtubules (MTs). This protein is considered as an information processor capable of sensing cellular inputs and transforming them into mechanical steps. Here, we propose a computational model to describe the mechanochemical kinetics underlying forward and backward stepping behavior of kinesin motor as a digital circuit designed based on an adenosine triphosphate (ATP)-driven finite state machine. Kinetic analysis suggests that the backward stepping of kinesin is mainly driven by ATP hydrolysis, whereas ATP synthesis increases the duration of this stepping. It is shown that kinesin pausing due to waiting for ATP binding at limiting ATP concentration ([ATP]) and low backward loads could be longer than that caused by low rate of ATP synthesis under high backward loads. These findings indicate that the pausing duration of kinesin in MT-bound (M·K) kinetic state is affected by [ATP], which in turn affects its velocity at fixed loads. We show that the proposed computational model accurately simulates the forward and backward stepping behavior of kinesin motor under different [ATP] and loads.


Subjects
Kinesins/metabolism; Models, Biological; Adenosine Triphosphate/metabolism; Kinetics; Microtubules/metabolism; Stochastic Processes
7.
Article in English | MEDLINE | ID: mdl-38917281

ABSTRACT

For incomplete data classification, missing attribute values are often estimated by imputation methods before building classifiers. The estimated attribute values are not actual attribute values. Thus, the distributions of data will be changed after imputing, and this phenomenon often results in degradation of classification performance. Here, we propose a new framework called integration of multikinds imputation with covariance adaptation (MICA) based on evidence theory (ET) to effectively deal with the classification problem with incomplete training data and complete test data. In MICA, we first employ different kinds of imputation methods to obtain multiple imputed training datasets. In general, the distributions of each imputed training dataset and test dataset will be different. A covariance adaptation module (CAM) is then developed to reduce the distribution difference of each imputed training dataset and test dataset. Then, multiple classifiers can be learned on the multiple imputed training datasets, and they are complementary to each other. For a test pattern, we can combine the multiple pieces of soft classification results yielded by these classifiers based on ET to obtain better classification performance. However, the reliabilities/weights of different imputed training datasets are usually different, so the soft classification results cannot be treated equally during fusion. We propose to use covariance difference across datasets and accuracy of imputed training data to estimate the weights. Finally, the soft classification results discounted by the estimated weights are combined by ET to make the final class decision. MICA was compared with a variety of related methods on several datasets, and the experimental results demonstrate that this new method can significantly improve the classification performance.
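
The last two steps above, discounting each soft classification result by a weight and then fusing the results with evidence theory, can be sketched with the classical Shafer discounting operation and Dempster's rule of combination. The abstract does not say which combination rule MICA uses, so treating it as Dempster's rule is an assumption, and the class names, masses, and weights below are hypothetical.

```python
from itertools import product

def discount(mass, weight, frame):
    """Shafer discounting: scale each mass by `weight`, move the rest to the whole frame."""
    out = {focal: weight * value for focal, value in mass.items()}
    out[frame] = out.get(frame, 0.0) + (1.0 - weight)
    return out

def dempster(m1, m2):
    """Dempster's rule: multiply masses, intersect focal sets, renormalise away conflict."""
    combined, conflict = {}, 0.0
    for (a, va), (b, vb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + va * vb
        else:
            conflict += va * vb
    if conflict >= 1.0:
        raise ValueError("total conflict: sources cannot be combined")
    return {focal: value / (1.0 - conflict) for focal, value in combined.items()}

# Two hypothetical classifiers, each trained on a differently imputed dataset.
frame = frozenset({"c1", "c2"})
c1, c2 = frozenset({"c1"}), frozenset({"c2"})
clf_a = discount({c1: 0.8, frame: 0.2}, weight=0.9, frame=frame)  # more reliable
clf_b = discount({c2: 0.6, frame: 0.4}, weight=0.5, frame=frame)  # less reliable
fused = dempster(clf_a, clf_b)
decision = max((c1, c2), key=lambda focal: fused.get(focal, 0.0))
```

Discounting pushes the less reliable classifier's belief toward the whole frame (ignorance), so its vote for `c2` carries less weight in the fused result than the raw masses would suggest.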

8.
Article in English | MEDLINE | ID: mdl-38743540

ABSTRACT

Conversational recommender systems (CRSs) utilize natural language interactions and dialog history to infer user preferences and provide accurate recommendations. Due to the limited conversation context and background knowledge, existing CRSs rely on external sources such as knowledge graphs (KGs) to enrich the context and model entities based on their interrelations. However, these methods ignore the rich intrinsic information within entities. To address this, we introduce the knowledge-enhanced entity representation learning (KERL) framework, which leverages both the KG and a pretrained language model (PLM) to improve the semantic understanding of entities for CRS. In our KERL framework, entity textual descriptions are encoded via a PLM, while a KG helps reinforce the representation of these entities. We also employ positional encoding to effectively capture the temporal information of entities in a conversation. The enhanced entity representation is then used to develop a recommender component that fuses both entity and contextual representations for more informed recommendations, as well as a dialog component that generates informative entity-related information in the response text. A high-quality KG with aligned entity descriptions is constructed to facilitate this study, namely, the Wiki Movie Knowledge Graph (WikiMKG). The experimental results show that KERL achieves state-of-the-art results in both recommendation and response generation tasks. Our code is publicly available at the link: https://github.com/icedpanda/KERL.
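
The abstract mentions positional encoding for capturing the order of entities in a conversation but does not say which variant KERL uses. As one common possibility, the sketch below implements the fixed sinusoidal encoding; the choice of this variant, the base of 10000, and the function name are assumptions and may differ from the released code.

```python
import math

def positional_encoding(position, dim, base=10000.0):
    """Sinusoidal encoding: even indices use sine, odd indices use cosine."""
    encoding = []
    for i in range(dim):
        # Each sine/cosine pair shares one frequency, set by the pair index i // 2.
        angle = position / (base ** (2 * (i // 2) / dim))
        encoding.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
    return encoding

# Encode the order in which three (hypothetical) entities were mentioned.
encodings = [positional_encoding(pos, dim=4) for pos in range(3)]
```

Such a vector would typically be added to the entity embedding so that the same entity mentioned early or late in the dialog gets a distinguishable representation.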

9.
Brief Bioinform ; 12(5): 498-513, 2011 Sep.
Article in English | MEDLINE | ID: mdl-21156727

ABSTRACT

Microarray gene expression data generally suffers from the missing value problem due to a variety of experimental reasons. Since the missing data points can adversely affect downstream analysis, many algorithms have been proposed to impute missing values. In this survey, we provide a comprehensive review of existing missing value imputation algorithms, focusing on their underlying algorithmic techniques and how they utilize local or global information from within the data, or their use of domain knowledge during imputation. In addition, we describe how the imputation results can be validated and the different ways to assess the performance of different imputation algorithms, and we discuss some possible future research directions. It is hoped that this review will give the readers a good understanding of the current development in this field and inspire them to come up with the next generation of imputation algorithms.
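
To make the idea of imputation from local information concrete, the sketch below fills each missing entry with the mean of that column in the k most similar rows (genes). It is a generic nearest-neighbour illustration written for this listing, not an algorithm taken from the survey; the distance definition, `k`, and the toy matrix are assumptions.

```python
import math

def knn_impute(matrix, k=2):
    """Fill None entries with the mean of the k nearest rows' values in that column.

    Distance is the root mean squared difference over columns observed in both rows.
    """
    filled = [row[:] for row in matrix]
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for other_idx, other in enumerate(matrix):
                if other_idx == i or other[j] is None:
                    continue
                shared = [(a, b) for a, b in zip(row, other)
                          if a is not None and b is not None]
                if not shared:
                    continue
                dist = math.sqrt(sum((a - b) ** 2 for a, b in shared) / len(shared))
                candidates.append((dist, other[j]))
            candidates.sort(key=lambda pair: pair[0])
            nearest = [v for _, v in candidates[:k]]
            if nearest:  # leave the gap if no usable neighbour exists
                filled[i][j] = sum(nearest) / len(nearest)
    return filled

# Toy expression matrix: rows are genes, columns are conditions.
expression = [
    [1.0, 2.0, None],
    [1.0, 2.0, 3.0],
    [1.1, 2.1, 5.0],
    [9.0, 9.0, 100.0],
]
completed = knn_impute(expression, k=2)
```

The dissimilar fourth gene is ignored, so the gap is filled from the two closest profiles only; a global method would instead draw on the whole matrix.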


Subjects
Gene Expression Profiling/methods; Gene Expression; Algorithms; Animals; Databases, Factual; Humans; Microarray Analysis/methods
10.
Article in English | MEDLINE | ID: mdl-37227909

ABSTRACT

For cross-domain pattern classification, the supervised information (i.e., labeled patterns) in the source domain is often employed to help classify the unlabeled target domain patterns. In practice, multiple target domains are usually available. The unlabeled patterns (in different target domains) that have high-confidence predictions can also provide some pseudo-supervised information for the downstream classification task. The performance in each target domain would be further improved if the pseudo-supervised information in different target domains can be effectively used. To this end, we propose an evidential multi-target domain adaptation (EMDA) method to take full advantage of the useful information in the single-source and multiple target domains. In EMDA, we first align distributions of the source and target domains by reducing maximum mean discrepancy (MMD) and covariance difference across domains. After that, we use the classifier learned by the labeled source domain data to classify query patterns in the target domains. The query patterns with high-confidence predictions are then selected to train a new classifier for yielding an extra piece of soft classification results of query patterns. The two pieces of soft classification results are then combined by evidence theory. In practice, their reliabilities/weights are usually diverse, and an equal treatment of them often yields an unreliable combination result. Thus, we propose to use the distribution discrepancy across domains to estimate their weighting factors, and discount them before fusing. The evidential combination of the two pieces of discounted soft classification results is employed to make the final class decision. The effectiveness of EMDA was verified by comparing with many advanced domain adaptation methods on several cross-domain pattern classification benchmark datasets.
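
The maximum mean discrepancy that EMDA reduces can be estimated from two samples with a kernel. The sketch below uses the standard biased estimator of squared MMD with an RBF kernel; the kernel choice, `gamma`, and the toy points are assumptions, and the code only measures the discrepancy rather than performing the alignment described in the abstract.

```python
import math

def rbf_kernel(x, y, gamma):
    """Gaussian (RBF) kernel between two points given as tuples."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def mmd_squared(source, target, gamma=0.5):
    """Biased estimate of squared MMD between two samples under an RBF kernel."""
    def mean_kernel(xs, ys):
        return sum(rbf_kernel(x, y, gamma) for x in xs for y in ys) / (len(xs) * len(ys))
    return (mean_kernel(source, source) + mean_kernel(target, target)
            - 2.0 * mean_kernel(source, target))

# Toy source domain and a shifted copy standing in for a target domain.
source = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
shifted = [(x + 3.0, y + 3.0) for x, y in source]
same = mmd_squared(source, source)
apart = mmd_squared(source, shifted)
```

A value near zero means the two samples look alike under the kernel; a larger value, as for the shifted copy, is the kind of cross-domain discrepancy that could also feed a reliability weight.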

11.
Quant Imaging Med Surg ; 13(9): 5713-5726, 2023 Sep 01.
Article in English | MEDLINE | ID: mdl-37711804

ABSTRACT

Background: Thyroid cancer is the most common malignancy in the endocrine system, with its early manifestation being the presence of thyroid nodules. With the advantages of convenience, noninvasiveness, and a lack of radiation, ultrasound is currently the first-line screening tool for the clinical diagnosis of thyroid nodules. The use of artificial intelligence to assist diagnosis is an emerging technology. This paper proposes the use of optical neural networks for potential application in the auxiliary diagnosis of thyroid nodules. Methods: Ultrasound images obtained from January 2013 to December 2018 at the Institute and Hospital of Oncology, Tianjin Medical University, were included in a dataset. Patients who consecutively underwent thyroid ultrasound diagnosis and follow-up procedures were included. We developed an all-optical diffraction neural network to assist in the diagnosis of thyroid nodules. The network is composed of 5 diffraction layers and 1 detection plane. The input image is placed 10 mm away from the first diffraction layer. The input of the diffractive neural network is light at a wavelength of 632.8 nm, and the output of this network is determined by the amplitude and light intensity obtained from the detection region. Results: The all-optical neural network was used to assist in the diagnosis of thyroid nodules. In the classification task of benign and malignant thyroid nodules, the accuracy of classification on the test set was 97.79%, with an area under the curve value of 99.8%. In the task of detecting thyroid nodules, we first trained the model to determine whether any nodules were present and achieved an accuracy of 84.92% on the test set. Conclusions: Our study demonstrates the potential of all-optical neural networks in the field of medical image processing. The performance of the models based on optical neural networks is comparable to other widely used network models in the field of image classification.

12.
IEEE Trans Pattern Anal Mach Intell ; 44(1): 32-49, 2022 Jan.
Article in English | MEDLINE | ID: mdl-32750824

ABSTRACT

Many image processing and pattern recognition problems can be formulated as binary quadratic programming (BQP) problems. However, solving a large BQP problem with a good quality solution and low computational time is still a challenging unsolved problem. Current methodologies either adopt an independent random search in a semi-definite space or perform search in a relaxed biconvex space. However, the independent search has great computation cost as many different trials are needed to get a good solution. The biconvex search only searches the solution in a local convex ball, which can be a local optimal solution. In this paper, we propose a BQP solver that alternatingly applies a deterministic search and a stochastic neighborhood search. The deterministic search iteratively improves the solution quality until it satisfies the KKT optimality conditions. The stochastic search performs bootstrapping sampling to the objective function constructed from the potential solution to find a stochastic neighborhood vector. These two steps are repeated until the obtained solution is better than many of its stochastic neighborhood vectors. We compare the proposed solver with several state-of-the-art methods for a range of image processing and pattern recognition problems. Experimental results showed that the proposed solver not only outperformed them in solution quality but also had the lowest computational complexity.
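
The alternation between a deterministic improvement phase and a stochastic neighbourhood phase can be illustrated on a toy BQP, minimising x^T Q x over x in {0, 1}^n. The sketch below is a deliberately simplified stand-in: the deterministic phase is plain single-bit-flip descent rather than a KKT-based search, and the stochastic phase flips a few random bits rather than bootstrapping the objective. All names and parameters are hypothetical.

```python
import random

def objective(Q, x):
    """Quadratic form x^T Q x for a binary vector x."""
    n = len(x)
    return sum(Q[i][j] * x[i] * x[j] for i in range(n) for j in range(n))

def flip_descent(Q, x):
    """Deterministic phase: flip single bits while any flip lowers the objective."""
    x = x[:]
    best = objective(Q, x)
    improved = True
    while improved:
        improved = False
        for i in range(len(x)):
            x[i] ^= 1
            value = objective(Q, x)
            if value < best:
                best, improved = value, True
            else:
                x[i] ^= 1  # undo a flip that did not help
    return x, best

def solve_bqp(Q, rounds=20, flips=2, seed=0):
    """Alternate descent with random perturbations; keep the best solution seen."""
    rng = random.Random(seed)
    n = len(Q)
    x, best = flip_descent(Q, [0] * n)
    for _ in range(rounds):
        candidate = x[:]
        for i in rng.sample(range(n), min(flips, n)):  # stochastic neighbour
            candidate[i] ^= 1
        candidate, value = flip_descent(Q, candidate)
        if value < best:
            x, best = candidate, value
    return x, best

Q = [[-1, 2],
     [2, -1]]
solution, value = solve_bqp(Q)
```

On this two-variable example the minimum is reached by switching on exactly one variable; on larger problems a heuristic of this kind offers no optimality guarantee.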

13.
IEEE Trans Image Process ; 31: 499-512, 2022.
Article in English | MEDLINE | ID: mdl-34874859

ABSTRACT

In this paper, we propose a novel method for boundary detection in close-range hyperspectral images. This method can effectively predict the boundaries of objects of similar colour but different materials. To effectively extract the material information in the image, the spatial distribution of the spectral responses of different materials or endmembers is first estimated by hyperspectral unmixing. The resulting abundance map represents the fraction of each endmember spectrum at each pixel. The abundance map is used as a supportive feature such that the spectral signature and the abundance vector for each pixel are fused to form a new spectral feature vector. Then different spectral similarity measures are adopted to construct a sparse spectral-spatial affinity matrix that characterizes the similarity between the spectral feature vectors of neighbouring pixels within a local neighbourhood. After that, a spectral clustering method is adopted to produce eigenimages. Finally, the boundary map is constructed from the most informative eigenimages. We created a new HSI dataset and used it to compare the proposed method with four alternative methods, one for hyperspectral images and three for RGB images. The results show that our method outperforms the alternatives and can cope with several scenarios that methods based on colour images cannot handle.

14.
Comput Methods Programs Biomed ; 207: 106127, 2021 Aug.
Article in English | MEDLINE | ID: mdl-34051412

ABSTRACT

BACKGROUND AND OBJECTIVE: Cerebral microbleeds (CMB) are important biomarkers of cerebrovascular diseases and cognitive dysfunctions. Susceptibility weighted imaging (SWI) is a common MRI sequence where CMB appear as small hypointense blobs. The prevalence of CMB in the population and in each scan is low, resulting in tedious and time-consuming visual assessment. Automated detection methods would be of value but are challenged by the low prevalence of CMB, the presence of mimics such as blood vessels, and the difficulty of obtaining sufficient ground truth for training and testing. In this paper, synthetic CMB (sCMB) generation using an analytical model is proposed for training and testing machine learning methods. The main aim is to create perfect synthetic ground truth, as similar as possible to real CMB, in high number, with a high diversity of shape, volume, intensity, and location, to improve training of supervised methods. METHOD: sCMB were modelled with a random Gaussian shape and added to healthy brain locations. We compared training on our synthetic data to standard augmentation techniques. We performed a validation experiment using sCMB and report results for whole-brain detection using a 10-fold cross validation design with an ensemble of 10 neural networks. RESULTS: Performance was close to state of the art (~9 false positives per scan) when a random forest was trained on synthetic data only and tested on real lesions. Other experiments showed that top detection performance could be achieved when training on synthetic CMB only. Our dataset is made available, including a version with 37,000 synthetic lesions, that could be used for benchmarking and training. CONCLUSION: Our proposed synthetic microbleeds model is a powerful data augmentation approach for CMB classification and should be considered for training automated lesion detection systems from MRI SWI.
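
The idea of modelling a synthetic microbleed as a Gaussian-shaped hypointense blob placed in healthy tissue can be sketched on a small 3D grid. This is an illustration only: the isotropic Gaussian, the multiplicative signal drop, the parameter ranges, and the function name are assumptions, and the paper's analytical model may differ.

```python
import math
import random

def add_synthetic_lesion(volume, center, sigma, depth):
    """Return a copy of `volume` with a Gaussian-shaped hypointense blob at `center`.

    volume: nested lists indexed [z][y][x]; depth in (0, 1] is the fractional
    signal drop at the blob centre. All parameters are illustrative.
    """
    cz, cy, cx = center
    out = [[row[:] for row in plane] for plane in volume]
    for z, plane in enumerate(out):
        for y, row in enumerate(plane):
            for x, value in enumerate(row):
                dist2 = (z - cz) ** 2 + (y - cy) ** 2 + (x - cx) ** 2
                row[x] = value * (1.0 - depth * math.exp(-dist2 / (2.0 * sigma ** 2)))
    return out

rng = random.Random(0)
healthy = [[[100.0] * 9 for _ in range(9)] for _ in range(9)]  # uniform toy "tissue"
lesioned = add_synthetic_lesion(
    healthy,
    center=(4, 4, 4),
    sigma=rng.uniform(0.8, 1.5),   # random size, hypothetical range
    depth=rng.uniform(0.5, 0.9),   # random intensity drop, hypothetical range
)
```

Because the blob's position, size, and depth are chosen by the generator, the ground-truth label for every synthetic lesion is known exactly, which is what makes such data usable for supervised training.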


Subjects
Cerebral Hemorrhage; Magnetic Resonance Imaging; Brain; Cerebral Hemorrhage/diagnostic imaging; Humans; Machine Learning; Neural Networks, Computer
15.
Front Neurosci ; 15: 778767, 2021.
Article in English | MEDLINE | ID: mdl-34975381

ABSTRACT

Cerebral microbleeds (CMB) are increasingly present with aging and can reveal vascular pathologies associated with neurodegeneration. Deep learning-based classifiers can detect and quantify CMB from MRI, such as susceptibility imaging, but are challenging to train because of the limited availability of ground truth and many confounding imaging features, such as vessels or infarcts. In this study, we present a novel generative adversarial network (GAN) that has been trained to generate three-dimensional lesions, conditioned by volume and location. This allows one to investigate CMB characteristics and create large training datasets for deep learning-based detectors. We demonstrate the benefit of this approach by achieving state-of-the-art detection of real CMB using a convolutional neural network classifier trained on synthetic CMB. Moreover, we showed that our proposed 3D lesion GAN model can be applied to unseen datasets, with different MRI parameters and diseases, to generate synthetic lesions with high diversity and without needing laboriously marked ground truth.

16.
Redox Biol ; 47: 102136, 2021 11.
Article in English | MEDLINE | ID: mdl-34653841

ABSTRACT

Autonomously spiking dopaminergic neurons of the substantia nigra pars compacta (SNpc) are exquisitely specialized and suffer toxic iron-loading in Parkinson's disease (PD). However, the molecular mechanism involved remains unclear and is critical to decipher for designing new PD therapeutics. The long-lasting (L-type) CaV1.3 voltage-gated calcium channel is expressed at high levels amongst nigral neurons of the SNpc, and due to its role in calcium and iron influx, could play a role in the pathogenesis of PD. Neuronal iron uptake via this route could be unregulated under the pathological setting of PD and potentiate cellular stress due to its redox activity. This Commentary will focus on the role of the CaV1.3 channels in calcium and iron uptake in the context of pharmacological targeting. Prospectively, the audacious use of artificial intelligence to design innovative CaV1.3 channel inhibitors could lead to breakthrough pharmaceuticals that attenuate calcium and iron entry to ameliorate PD pathology.


Subjects
Parkinson Disease; Artificial Intelligence; Calcium/metabolism; Calcium Channels; Humans; Iron; Oxidation-Reduction; Parkinson Disease/drug therapy
17.
BMC Bioinformatics ; 11: 164, 2010 Mar 31.
Article in English | MEDLINE | ID: mdl-20356386

ABSTRACT

BACKGROUND: Gene clustering for annotating gene functions is one of the fundamental issues in bioinformatics. The best clustering solution is often regularized by multiple constraints such as gene expressions, Gene Ontology (GO) annotations and gene network structures. How to integrate multiple constraints for an optimal clustering solution still remains an unsolved problem. RESULTS: We propose a novel multiconstrained gene clustering (MGC) method within the generalized projection onto convex sets (POCS) framework used widely in image reconstruction. Each constraint is formulated as a corresponding set. The generalized projector iteratively projects the clustering solution onto these sets in order to find a consistent solution included in the intersection set that satisfies all constraints. Compared with previous MGC methods, POCS can integrate multiple constraints of different natures without distorting the original constraints. To evaluate the clustering solution, we also propose a new performance measure referred to as Gene Log Likelihood (GLL) that considers genes having more than one function and hence in more than one cluster. Comparative experimental results show that our POCS-based gene clustering method outperforms current state-of-the-art MGC methods. CONCLUSIONS: The POCS-based MGC method can successfully combine multiple constraints of different natures for gene clustering. Also, the proposed GLL is an effective performance measure for the soft clustering solutions.
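
The core POCS idea, repeatedly projecting a candidate solution onto each constraint set until it lands in their intersection, can be shown with two toy convex sets. The sketch below is generic POCS, not the generalized projector or the gene-clustering constraint sets of the paper: the box and hyperplane constraints, the iteration count, and the function names are assumptions chosen because their projections have closed forms.

```python
def project_box(x, low=0.0, high=1.0):
    """Projection onto the box low <= x_i <= high (clamp each coordinate)."""
    return [min(max(v, low), high) for v in x]

def project_sum(x, total=1.0):
    """Projection onto the hyperplane sum(x) == total (shift all coordinates equally)."""
    shift = (sum(x) - total) / len(x)
    return [v - shift for v in x]

def pocs(x, iterations=200):
    """Alternate the two projections; the iterate approaches the intersection."""
    for _ in range(iterations):
        x = project_box(project_sum(x))
    return x

# Start from a point that violates both constraints.
solution = pocs([2.0, -1.0, 0.5])
```

The result can be read as a soft membership vector, with entries between 0 and 1 that sum to 1, which is the kind of object a soft clustering solution assigns to each gene.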


Subjects
Gene Expression Profiling/methods; Algorithms; Cluster Analysis; Databases, Genetic; Likelihood Functions; Multigene Family; Pattern Recognition, Automated/methods
18.
IEEE Trans Cybern ; 49(6): 2168-2177, 2019 Jun.
Article in English | MEDLINE | ID: mdl-29993920

ABSTRACT

In this paper, we introduced a new approach of combining multiple classifiers in a heterogeneous ensemble system. Instead of using numerical membership values when combining, we constructed interval membership values for each class prediction from the meta-data of observation by using the concept of information granule. In the proposed method, the uncertainty (diversity) of the predictions produced by the base classifiers is quantified by the interval-based information granules. The decision model is then generated by considering both bound and length of the intervals. Extensive experimentation using the UCI datasets has demonstrated the superior performance of our algorithm over other algorithms including six fixed combining methods, one trainable combining method, AdaBoost, bagging, and random subspace.

19.
BMC Bioinformatics ; 9: 209, 2008 Apr 23.
Article in English | MEDLINE | ID: mdl-18433477

ABSTRACT

BACKGROUND: In DNA microarray experiments, discovering groups of genes that share similar transcriptional characteristics is instrumental in functional annotation, tissue classification and motif identification. However, in many situations a subset of genes exhibits a consistent pattern only over a subset of conditions. Conventional clustering algorithms that deal with the entire row or column in an expression matrix would therefore fail to detect these useful patterns in the data. Recently, biclustering has been proposed to detect a subset of genes exhibiting a consistent pattern over a subset of conditions. However, most existing biclustering algorithms are based on searching for sub-matrices within a data matrix by optimizing certain heuristically defined merit functions. Moreover, most of these algorithms can only detect a restricted set of bicluster patterns. RESULTS: In this paper, we present a novel geometric perspective for the biclustering problem. The biclustering process is interpreted as the detection of linear geometries in a high dimensional data space. Such a new perspective views biclusters with different patterns as hyperplanes in a high dimensional space, and allows us to handle different types of linear patterns simultaneously by matching a specific set of linear geometries. This geometric viewpoint also inspires us to propose a generic bicluster pattern, i.e. the linear coherent model that unifies the seemingly incompatible additive and multiplicative bicluster models. As a particular realization of our framework, we have implemented a Hough transform-based hyperplane detection algorithm. The experimental results on a human lymphoma gene expression dataset show that our algorithm can find biologically significant subsets of genes. CONCLUSION: We have proposed a novel geometric interpretation of the biclustering problem.
We have shown that many common types of bicluster are just different spatial arrangements of hyperplanes in a high dimensional data space. An implementation of the geometric framework using the Fast Hough transform for hyperplane detection can be used to discover biologically significant subsets of genes under subsets of conditions for microarray data analysis.
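
One way to see how a linear model can unify additive and multiplicative biclusters is to compare two conditions across the genes of a bicluster. Under the standard additive model the two columns differ by a constant (slope 1), under the multiplicative model they differ by a constant factor (intercept 0), and a general line y = k·x + c covers both. The sketch below checks this with an ordinary least-squares fit; it is an illustration of that reading, not the Hough transform algorithm of the paper, and the tolerance and names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of ys = slope * xs + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def describe_pattern(xs, ys, tol=1e-6):
    """Label the relation between two conditions measured over the same genes."""
    slope, intercept = fit_line(xs, ys)
    residual = max(abs(y - (slope * x + intercept)) for x, y in zip(xs, ys))
    if residual > tol:
        return "no linear pattern"
    if abs(slope - 1.0) <= tol:
        return "additive"
    if abs(intercept) <= tol:
        return "multiplicative"
    return "general linear"

# Expression of four hypothetical genes under one condition...
cond_a = [1.0, 2.0, 4.0, 7.0]
# ...and under a second condition following different patterns.
labels = [
    describe_pattern(cond_a, [x + 3.0 for x in cond_a]),
    describe_pattern(cond_a, [2.0 * x for x in cond_a]),
    describe_pattern(cond_a, [2.0 * x + 3.0 for x in cond_a]),
    describe_pattern(cond_a, [x * x for x in cond_a]),
]
```

With more than two conditions the same relation defines a line or hyperplane in the condition space, which is why a hyperplane detector can find all of these patterns at once.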


Subjects
Cluster Analysis; Computational Biology/methods; Oligonucleotide Array Sequence Analysis/methods; Oligonucleotide Array Sequence Analysis/statistics & numerical data; Pattern Recognition, Automated/methods; Algorithms; Data Interpretation, Statistical; Decision Theory; Gene Expression; Gene Expression Profiling/methods; Gene Expression Profiling/statistics & numerical data; Humans; Information Storage and Retrieval/methods; Linear Models; Lymphoma/genetics; Pattern Recognition, Automated/statistics & numerical data
20.
BMC Bioinformatics ; 9: 210, 2008 Apr 23.
Article in English | MEDLINE | ID: mdl-18433478

ABSTRACT

BACKGROUND: The DNA microarray technology allows the measurement of expression levels of thousands of genes under tens/hundreds of different conditions. In microarray data, genes with similar functions usually co-express under certain conditions only [1]. Thus, biclustering which clusters genes and conditions simultaneously is preferred over the traditional clustering technique in discovering these coherent genes. Various biclustering algorithms have been developed using different bicluster formulations. Unfortunately, many useful formulations result in NP-complete problems. In this article, we investigate an efficient method for identifying a popular type of bicluster called the additive model. Furthermore, parallel coordinate (PC) plots are used for bicluster visualization and analysis. RESULTS: We develop a novel and efficient biclustering algorithm which can be regarded as a greedy version of an existing algorithm known as pCluster algorithm. By relaxing the constraint in homogeneity, the proposed algorithm has polynomial-time complexity in the worst case instead of exponential-time complexity as in the pCluster algorithm. Experiments on artificial datasets verify that our algorithm can identify both additive-related and multiplicative-related biclusters in the presence of overlap and noise. Biologically significant biclusters have been validated on the yeast cell-cycle expression dataset using Gene Ontology annotations. Comparative study shows that the proposed approach outperforms several existing biclustering algorithms. We also provide an interactive exploratory tool based on PC plot visualization for determining the parameters of our biclustering algorithm. CONCLUSION: We have proposed a novel biclustering algorithm which works with PC plots for an interactive exploratory analysis of gene expression data. Experiments show that the biclustering algorithm is efficient and is capable of detecting co-regulated genes.
The interactive analysis enables an optimum parameter determination in the biclustering algorithm so as to achieve the best result. In the future, we will modify the proposed algorithm for other bicluster models such as the coherent evolution model.
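
The homogeneity constraint that the greedy algorithm relaxes can be made concrete with the pScore used in the pCluster formulation, as commonly stated: for any two genes and two conditions, the difference of differences must stay within a threshold δ. The sketch below only verifies a candidate bicluster against that constraint; it does not reproduce the greedy search of the paper, and the toy matrix and δ are hypothetical.

```python
from itertools import combinations

def p_score(a, b, c, d):
    """pScore of the 2x2 block [[a, b], [c, d]]: |(a - b) - (c - d)|."""
    return abs((a - b) - (c - d))

def is_delta_pcluster(matrix, rows, cols, delta):
    """True when every 2x2 sub-block of the selected rows/columns has pScore <= delta."""
    for r1, r2 in combinations(rows, 2):
        for c1, c2 in combinations(cols, 2):
            score = p_score(matrix[r1][c1], matrix[r1][c2],
                            matrix[r2][c1], matrix[r2][c2])
            if score > delta:
                return False
    return True

# Toy matrix: the first three columns follow an additive pattern up to small noise.
expression = [
    [1.0, 3.0, 2.0, 9.0],
    [4.0, 6.0, 5.0, 0.0],
    [2.0, 4.0, 3.1, 7.0],
]
coherent = is_delta_pcluster(expression, rows=[0, 1, 2], cols=[0, 1, 2], delta=0.2)
broken = is_delta_pcluster(expression, rows=[0, 1, 2], cols=[0, 1, 2, 3], delta=0.2)
```

Checking one candidate is cheap; the expensive part is searching over subsets of genes and conditions, which is where a relaxed, greedy strategy can reduce the worst-case cost.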


Subjects
Algorithms; Cluster Analysis; Computer Graphics; Gene Expression Profiling/methods; Oligonucleotide Array Sequence Analysis/methods; Pattern Recognition, Automated/methods; User-Computer Interface; Artificial Intelligence; Programming Languages