1.
Sci Rep ; 14(1): 6296, 2024 03 15.
Article in English | MEDLINE | ID: mdl-38491261

ABSTRACT

Protein residues within binding pockets play a critical role in determining the range of ligands that can interact with a protein, influencing its structure and function. Identifying structural similarities in proteins offers valuable insights into their function and activation mechanisms, aiding in predicting protein-ligand interactions, anticipating off-target effects, and facilitating the development of therapeutic agents. Numerous computational methods assessing global or local similarity in protein cavities have emerged, but their utilization is impeded by complexity, impractical automation for amino acid pattern searches, and an inability to evaluate the dynamics of scrutinized protein-ligand systems. Here, we present a general, automatic and unbiased computational pipeline, named VirtuousPocketome, aimed at screening huge databases of proteins for similar binding pockets starting from a protein-ligand complex of interest. We demonstrate the pipeline's potential by exploring a recently solved human bitter taste receptor, TAS2R46, complexed with strychnine. We pinpointed 145 proteins with binding sites similar to that of the analysed bitter taste receptor, and the enrichment analysis highlighted the related biological processes, molecular functions and cellular components. This work represents the foundation for future studies aimed at understanding the effective role of tastants outside the gustatory system: this could pave the way towards the rationalization of the diet as a supplement to standard pharmacological treatments and the design of novel tastant-inspired compounds to target other proteins involved in specific diseases or disorders. The proposed pipeline is publicly accessible, can be applied to any protein-ligand complex, and could be expanded to screen any database of protein structures.


Subjects
Proteins , Taste Buds , Humans , Ligands , Binding Sites , Proteins/metabolism , Taste , Taste Buds/metabolism , Protein Binding
2.
Heliyon ; 9(11): e21165, 2023 Nov.
Article in English | MEDLINE | ID: mdl-38027840

ABSTRACT

Background and objective: The development of machine learning-based models that can be used for the prediction of severe diseases has been one of the main concerns of the scientific community. The current study seeks to extend a highly sophisticated tool, the Convolutional Neural Network, making it applicable to multidimensional omics data classification problems, and tests the newly introduced method on publicly available transcriptomics and proteomics data. Methods: In this study, we introduce Omics-CNN, a Convolutional Neural Network-based pipeline, which couples Convolutional Neural Networks with dimensionality reduction, preprocessing, clustering, and explainability techniques to make them suitable for building highly accurate and interpretable classification models from high-throughput omics data. The developed tool has the potential to classify patients depending on the expression of genetic and clinical factors and to identify features that can act as diagnostic biomarkers. Regarding dimensionality reduction, univariate and multivariate techniques were explored and compared. Gradient Weighted Class Activation Mapping analysis was performed to determine the most important features in the classification of the samples after training the model. Results: The newly introduced pipeline was applied to one transcriptomics and one proteomics dataset for the identification of diagnostic models and biosignatures for Ischemic Stroke (IS) and COVID-19 infection, reporting highly accurate biosignatures with accuracies of 96% and 95.41%, respectively. Meanwhile, classification models based solely on a small subset of attributes provided lower predictive accuracy, but identified a compact transcript biosignature (KRT15, VPRBP, TNFRSF4, GORASP2) for Ischemic Stroke and a compact protein biosignature (ADGRB3, VNN2, AGER, CIAPIN1) for COVID-19 infection diagnosis. Conclusions: Omics-CNN overcame the inherent problems of applying Convolutional Neural Networks to train diagnostic models with quantitative omics data, outperforming previous machine learning models developed using the same datasets for Ischemic Stroke and COVID-19 infection diagnosis and determining the most contributing biomarkers for both diseases.

3.
Bioinformatics ; 39(7)2023 07 01.
Article in English | MEDLINE | ID: mdl-37326976

ABSTRACT

MOTIVATION: Biomarker discovery is one of the most frequent pursuits in bioinformatics and is crucial for precision medicine, disease prognosis, and drug discovery. A common challenge of biomarker discovery applications is the low ratio of samples to features, which hampers the selection of a reliable non-redundant subset of features; despite the development of efficient tree-based classification methods, such as extreme gradient boosting (XGBoost), this limitation is still relevant. Moreover, existing approaches for optimizing XGBoost do not deal effectively with the class-imbalanced nature of biomarker discovery problems or with the presence of multiple conflicting objectives, since they focus on the training of a single-objective model. In the current work, we introduce MEvA-X, a novel hybrid ensemble for feature selection (FS) and classification, combining a niche-based multiobjective evolutionary algorithm (EA) with the XGBoost classifier. MEvA-X deploys a multiobjective EA to optimize the hyperparameters of the classifier and perform FS, identifying a set of Pareto-optimal solutions and optimizing multiple objectives, including classification and model simplicity metrics. RESULTS: The performance of the MEvA-X tool was benchmarked using one omics dataset from a microarray gene expression experiment and one clinical questionnaire-based dataset combined with demographic information. The MEvA-X tool outperformed the state-of-the-art methods in the balanced categorization of classes, creating multiple low-complexity models and identifying important nonredundant biomarkers. The best-performing run of MEvA-X for the prediction of weight loss using gene expression data yields a small set of blood circulatory markers which are sufficient for this precision nutrition application but need further validation. AVAILABILITY AND IMPLEMENTATION: https://github.com/PanKonstantinos/MEvA-X.


Subjects
Tool Use Behavior , Algorithms , Biomarkers , Computational Biology
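Note on record 3: the abstract refers to a set of Pareto-optimal solutions over several objectives. The sketch below only illustrates what Pareto optimality means for two objectives that are both maximized; it is not the MEvA-X implementation, and the function names and candidate scores are hypothetical.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if a dominates b: a is at least as good on every
    objective and strictly better on at least one (all maximized)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(solutions: List[Sequence[float]]) -> List[Sequence[float]]:
    """Keep the solutions that no other solution dominates."""
    return [s for s in solutions
            if not any(dominates(other, s) for other in solutions)]


# Hypothetical candidate models: (balanced accuracy, model simplicity).
candidates = [(0.90, 0.20), (0.85, 0.60), (0.80, 0.50), (0.70, 0.90), (0.90, 0.10)]
front = pareto_front(candidates)
print(front)
```

Every model on the front is a different trade-off between the two objectives, which is why a multiobjective search returns several models rather than a single best one.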
4.
Sci Rep ; 12(1): 21735, 2022 12 16.
Article in English | MEDLINE | ID: mdl-36526644

ABSTRACT

The umami taste is one of the five basic taste modalities normally linked to the protein content in food. The implementation of fast and cost-effective tools for the prediction of the umami taste of a molecule remains of great interest for understanding the molecular basis of this taste and for effectively rationalising the production and consumption of specific foods and ingredients. However, the only examples of umami predictors available in the literature rely on the amino acid sequence of the analysed peptides, limiting the applicability of the models. In the present study, we developed a novel ML-based algorithm, named VirtuousUmami, able to predict the umami taste of a query compound starting from its SMILES representation, thus opening up the possibility of using such a model on any database through a standard and more general molecular description. Herein, we have tested our model on five databases related to foods or natural compounds. The proposed tool will pave the way toward the rationalisation of the molecular features underlying the umami taste and toward the design of peptide-inspired compounds with specific taste properties.


Subjects
Taste Perception , Taste , Peptides/chemistry , Food , Machine Learning
5.
Eur Food Res Technol ; 248(9): 2215-2235, 2022.
Article in English | MEDLINE | ID: mdl-35637881

ABSTRACT

Taste is a sensory modality crucial for nutrition and survival, since it allows the discrimination between healthy foods and toxic substances thanks to five tastes, i.e., sweet, bitter, umami, salty, and sour, associated with distinct nutritional or physiological needs. Today, taste prediction plays a key role in several fields, e.g., medical, industrial, or pharmaceutical, but the complexity of the taste perception process, its multidisciplinary nature, and the high number of potentially relevant players and features at the basis of the taste sensation make taste prediction a very complex task. In this context, the emerging capabilities of machine learning have provided fruitful insights in this field of research, making it possible to consider and integrate a very large number of variables and to identify hidden correlations underlying the perception of a particular taste. This review aims at summarizing the latest advances in taste prediction, analyzing available food-related databases and taste prediction tools developed in recent years. Supplementary Information: The online version contains supplementary material available at 10.1007/s00217-022-04044-5.

6.
Pharmacogenomics J ; 21(6): 638-648, 2021 12.
Article in English | MEDLINE | ID: mdl-34145402

ABSTRACT

Retinoids are widely used in diseases spanning from dermatological lesions to cancer, but exhibit severe adverse effects. A novel all-trans-Retinoic Acid (atRA)-spermine conjugate (termed RASP) has previously shown optimal in vitro and in vivo anti-inflammatory and anticancer efficacy, with undetectable teratogenic and toxic side-effects. To gain insight, we treated HaCaT cells, which resemble human epidermis, with the IC50 concentration of RASP and analyzed their miRNA expression profile. Gene ontology analysis of the predicted miRNA targets indicated dynamic networks involved in cell proliferation, signal transduction and apoptosis. Furthermore, DNA microarray analysis verified that RASP affects the expression of the same categories of genes. A protein-protein interaction map produced using the most significant common genes revealed hub genes of nodal functions. We conclude that RASP is a synthetic retinoid derivative with improved properties, which possesses the beneficial effects of retinoids without exhibiting side-effects and with potential beneficial effects against skin diseases including skin cancer.


Subjects
Keratinocytes/drug effects , MicroRNAs/metabolism , Spermine/analogs & derivatives , Transcriptome , Tretinoin/analogs & derivatives , Apoptosis/drug effects , Apoptosis/genetics , Cell Proliferation/drug effects , Cell Proliferation/genetics , Dose-Response Relationship, Drug , Gene Regulatory Networks , HaCaT Cells , Humans , Inhibitory Concentration 50 , Keratinocytes/metabolism , Keratinocytes/pathology , MicroRNAs/genetics , Protein Interaction Maps , Signal Transduction/drug effects , Signal Transduction/genetics , Spermine/pharmacology , Spermine/toxicity , Tretinoin/pharmacology , Tretinoin/toxicity
7.
J Pain Res ; 13: 1255-1266, 2020.
Article in English | MEDLINE | ID: mdl-32547186

ABSTRACT

PURPOSE: Chronic pain is a life-changing condition, and non-opioid treatments have lately been introduced to overcome the addictive nature of opioid therapies and their side effects. In the present study, we explore the potential of machine learning methods to separate chronic pain patients into those who will benefit from such a treatment and those who will not, aiming to personalize their treatment. PATIENTS AND METHODS: Data from the OPERA study were used, with 631 chronic pain patients answering the validated Brief Pain Inventory (BPI) questionnaire along with supplemental questions before and after a follow-up period. A novel machine learning approach combining multi-objective optimization and support vector regression was used to build models that predict, from baseline responses, the four outcomes of the study: total drugs change, total interference change, total severity change, and total complaints change. Data were split into training (504 patients) and testing (127 patients) sets, and all results are measured on the independent test set. RESULTS: The machine learning models extracted in the present study significantly outperformed other state-of-the-art machine learning methods that were deployed for comparison. The experimental results indicated that the models can predict the outcomes of this study with considerably high accuracy (AUC 73.8-87.2%), which allowed their incorporation into a decision support system for selecting the treatment of chronic pain patients. CONCLUSION: The results of this study revealed the potential of machine learning for an individualized medicine application for chronic pain therapies. Topical analgesic treatment was shown to be beneficial in general, but careful selection with the suggested individualized medicine decision support system decreased by approximately 10% the number of patients who would have been prescribed topical analgesics without benefiting from them.

8.
BMC Med Genomics ; 12(1): 118, 2019 08 07.
Article in English | MEDLINE | ID: mdl-31391037

ABSTRACT

BACKGROUND: Identifying molecular biomarkers characteristic of ischemic stroke has the potential to aid in distinguishing stroke cases from stroke-mimicking symptoms, as well as advancing the understanding of the physiological changes that underlie the body's response to stroke. This study uses machine learning-based analysis of gene co-expression to identify transcription patterns characteristic of patients with acute ischemic stroke. METHODS: Mutual information values for the expression levels among 13,243 quantified transcripts were computed for blood samples from 82 stroke patients and 68 controls to construct separate gene co-expression networks for stroke and control samples. PageRank centrality scores were computed for every gene; a gene's significance was assessed according to the difference in its PageRank centrality between the stroke and control networks. A hybrid genetic algorithm - support vector machine learning tool was used to classify samples based on gene centrality in order to identify an optimal set of predictor genes for stroke while minimizing the number of genes in the model. RESULTS: A predictive model with 89.6% accuracy was identified using 6 network-central and differentially expressed genes (ID3, MBTPS1, NOG, SFXN2, BMX, SLC22A1), characterized by large differences in association network connectivity between stroke and control samples. In contrast, classification models based solely on individual genes identified by significant fold-changes in expression level provided lower predictive accuracies: < 71% for any single gene, and even models with larger numbers (10-25) of gene transcript biomarkers gave lower predictive accuracies (≤ 82%) than the 6-gene network-based signature. miRNA:mRNA target prediction computational analysis revealed 8 differentially expressed micro-RNAs (miRNAs) that are significantly associated with at least 2 of the 6 network-central genes. CONCLUSIONS: Network-based models have the potential to identify a more statistically robust pattern of gene expression typical of acute ischemic stroke and to generate hypotheses about possible interactions among functionally relevant genes, leading to the identification of more informative biomarkers.


Subjects
Biomarkers/blood , Brain Ischemia/blood , Brain Ischemia/genetics , Gene Expression Regulation , Gene Regulatory Networks , Models, Genetic , Stroke/blood , Stroke/genetics , Databases, Genetic , Gene Expression Profiling , Humans , MicroRNAs/genetics , MicroRNAs/metabolism , Molecular Sequence Annotation
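Note on record 8: the abstract ranks genes by how much their PageRank centrality differs between a stroke network and a control network. A minimal sketch of that comparison follows, assuming a plain power-iteration PageRank on small undirected weighted graphs; the gene labels, edge weights and helper names are hypothetical, and the study's own networks were built from mutual information values that are not reproduced here.

```python
from typing import Dict, Iterable, Tuple

Graph = Dict[str, Dict[str, float]]  # node -> {neighbor: edge weight}


def undirected(edges: Iterable[Tuple[str, str, float]]) -> Graph:
    """Build a symmetric adjacency map from (node, node, weight) triples."""
    graph: Graph = {}
    for a, b, w in edges:
        graph.setdefault(a, {})[b] = w
        graph.setdefault(b, {})[a] = w
    return graph


def pagerank(graph: Graph, damping: float = 0.85, iterations: int = 100) -> Dict[str, float]:
    """Weighted PageRank by power iteration; every neighbor must be a key of graph."""
    nodes = list(graph)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    out_weight = {v: sum(graph[v].values()) for v in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing weight is spread uniformly.
        dangling = sum(rank[v] for v in nodes if out_weight[v] == 0)
        new_rank = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            if out_weight[v] == 0:
                continue
            for u, w in graph[v].items():
                new_rank[u] += damping * rank[v] * w / out_weight[v]
        rank = new_rank
    return rank


# Hypothetical toy networks over the same four genes.
stroke = undirected([("G1", "G2", 1.0), ("G1", "G3", 1.0), ("G1", "G4", 1.0)])
control = undirected([("G1", "G2", 1.0), ("G2", "G3", 1.0), ("G3", "G4", 1.0)])

stroke_rank = pagerank(stroke)
control_rank = pagerank(control)
diff = {g: stroke_rank[g] - control_rank[g] for g in stroke}
most_changed = max(diff, key=lambda g: abs(diff[g]))
print(most_changed, round(diff[most_changed], 3))
```

In this toy case G1 is a hub only in the stroke network, so it shows the largest centrality difference; genes ranked this way could then be passed to a classifier as candidate features.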
9.
J Proteome Res ; 17(6): 2165-2173, 2018 06 01.
Article in English | MEDLINE | ID: mdl-29695160

ABSTRACT

Isobaric tagging is the method of choice in mass-spectrometry-based proteomics for comparing several conditions at a time. Despite its multiplexing capabilities, some drawbacks appear when multiple experiments are merged for comparison in large sample-size studies due to the presence of missing values, which result from the stochastic nature of the data-dependent acquisition mode. Another indirect cause of data incompleteness might derive from the proteomic-typical data-processing workflow that first identifies proteins in individual experiments and then only quantifies those identified proteins, leaving a large number of unmatched spectra with quantitative information unexploited. Inspired by untargeted metabolomic and label-free proteomic workflows, we developed a quantification-driven bioinformatic pipeline (Quantify then Identify (QtI)) that optimizes the processing of isobaric tandem mass tag (TMT) data from large-scale studies. This pipeline includes innovative features, such as peak filtering with a self-adaptive preprocessing pipeline optimization method, Peptide Match Rescue, and Optimized Post-Translational Modification. QtI outperforms a classical benchmark workflow in terms of quantification and identification rates, significantly reducing missing data while preserving unmatched features for quantitative comparison. The number of unexploited tandem mass spectra was reduced by 77 and 62% for two human cerebrospinal fluid and plasma data sets, respectively.


Subjects
Proteomics/methods , Staining and Labeling/methods , Tandem Mass Spectrometry/methods , Workflow , Algorithms , Cerebrospinal Fluid/chemistry , Computational Biology , Datasets as Topic , Humans , Plasma/chemistry , Protein Processing, Post-Translational
10.
Artif Intell Med ; 71: 62-9, 2016 07.
Article in English | MEDLINE | ID: mdl-27506132

ABSTRACT

OBJECTIVE: Proteins are vital biological molecules driving many fundamental cellular processes. They rarely act alone, but form interacting groups called protein complexes. The study of protein complexes is a key goal in systems biology. Recently, large protein-protein interaction (PPI) datasets have been published and a plethora of computational methods that provide new ideas for the prediction of protein complexes have been implemented. However, most of the methods suffer from two major limitations: first, they do not account for proteins participating in multiple functions, and second, they are unable to handle weighted PPI graphs. Moreover, the problem remains open, as existing algorithms and tools are insufficient in terms of predictive metrics. METHOD: In the present paper, we propose gradually expanding neighborhoods with adjustment (GENA), a new algorithm that gradually expands neighborhoods in a graph starting from highly informative "seed" nodes. GENA considers proteins as multifunctional molecules, allowing them to participate in more than one protein complex. In addition, GENA accepts weighted PPI graphs by using a weighted evaluation function for each cluster. RESULTS: In experiments with datasets from Saccharomyces cerevisiae and human, GENA outperformed Markov clustering, restricted neighborhood search and clustering with overlapping neighborhood expansion, three state-of-the-art methods for computationally predicting protein complexes. Seven PPI networks and seven evaluation datasets were used in total. GENA outperformed existing methods in 16 out of 18 experiments, achieving an average improvement of 5.5% when the maximum matching ratio metric was used. Our method was able to discover functionally homogeneous protein clusters and uncover important network modules in a Parkinson expression dataset. When used on the human networks, around 47% of the detected clusters were enriched in gene ontology (GO) terms with depth higher than five in the GO hierarchy. CONCLUSIONS: In the present manuscript, we introduce a new method for the computational prediction of protein complexes by making the realistic assumption that proteins participate in multiple protein complexes and cellular functions. Our method can detect accurate and functionally homogeneous clusters.


Subjects
Algorithms , Computational Biology , Cluster Analysis , Humans , Protein Interaction Mapping , Protein Interaction Maps , Saccharomyces cerevisiae
11.
Article in English | MEDLINE | ID: mdl-26451829

ABSTRACT

MicroRNAs (miRNAs) are small non-coding RNAs, which play a significant role in gene regulation. Predicting miRNA genes is a challenging bioinformatics problem and existing experimental and computational methods fail to deal with it effectively. We developed YamiPred, an embedded classification method that combines the efficiency and robustness of support vector machines (SVM) with genetic algorithms (GA) for feature selection and parameter optimization. YamiPred was tested on a new and realistic human dataset and was compared with state-of-the-art computational intelligence approaches and the prevalent SVM-based tools for miRNA prediction. Experimental results indicate that YamiPred outperforms existing approaches in terms of accuracy and of the geometric mean of sensitivity and specificity. The embedded feature selection component selects a compact feature subset that contributes to the performance optimization. Further experimentation with this minimal feature subset achieved very high classification performance and revealed the minimum number of samples required for developing a robust predictor. YamiPred also confirmed the important role of commonly used features such as entropy and enthalpy, and uncovered the significance of newly introduced features, such as %A-U aggregate nucleotide frequency and positional entropy. The best model trained on human data successfully predicted pre-miRNAs in other organisms, including viruses.


Subjects
Algorithms , MicroRNAs/genetics , Pattern Recognition, Automated/methods , Sequence Alignment/methods , Sequence Analysis, RNA/methods , Software , Base Sequence , Computer Simulation , Evolution, Molecular , High-Throughput Nucleotide Sequencing/methods , Humans , Models, Genetic , Molecular Sequence Data , Support Vector Machine
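Note on record 11: the abstract evaluates classifiers by the geometric mean of sensitivity and specificity. A minimal sketch of that metric is given below; the function name and the confusion-matrix counts are hypothetical and are not taken from the study.

```python
import math


def geometric_mean_score(tp: int, fn: int, tn: int, fp: int) -> float:
    """Geometric mean of sensitivity (TP rate) and specificity (TN rate)."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return math.sqrt(sensitivity * specificity)


# Hypothetical counts: 100 positives and 100 negatives.
tp, fn, tn, fp = 90, 10, 50, 50
accuracy = (tp + tn) / (tp + fn + tn + fp)
g_mean = geometric_mean_score(tp, fn, tn, fp)
print(accuracy, g_mean)
```

Because the two rates are multiplied, a classifier that performs poorly on either class is penalized, which makes the metric a common companion to plain accuracy when classes are imbalanced.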
13.
Artif Intell Med ; 63(3): 181-9, 2015 Mar.
Article in English | MEDLINE | ID: mdl-25765008

ABSTRACT

OBJECTIVE: Proteins are considered to be the most important individual components of biological systems and they combine to form physical protein complexes which are responsible for certain molecular functions. Despite the large availability of protein-protein interaction (PPI) information, not much information is available about protein complexes. Experimental methods are limited in terms of time, efficiency, cost and performance constraints. Existing computational methods have provided encouraging preliminary results, but they face certain disadvantages: they require parameter tuning, some of them cannot handle weighted PPI data and others do not allow a protein to participate in more than one protein complex. In the present paper, we propose a new fully unsupervised methodology for predicting protein complexes from weighted PPI graphs. METHODS AND MATERIALS: The proposed methodology is called evolutionary enhanced Markov clustering (EE-MC) and it is a hybrid combination of an adaptive evolutionary algorithm and a state-of-the-art clustering algorithm named enhanced Markov clustering. EE-MC was compared with state-of-the-art methodologies when applied to datasets from human and from the yeast Saccharomyces cerevisiae. RESULTS: Using publicly available datasets, EE-MC outperformed existing methodologies (in some datasets the separation metric was increased by 10-20%). Moreover, when applied to new human datasets, its performance was encouraging in the prediction of protein complexes which consist of proteins with high functional similarity. Specifically, 5737 protein complexes were predicted and 72.58% of them are enriched for at least one gene ontology (GO) function term. CONCLUSIONS: EE-MC is by design able to overcome intrinsic limitations of existing methodologies, such as their inability to handle weighted PPI networks, their constraint of assigning every protein to exactly one cluster and the difficulties they face concerning parameter tuning. This was experimentally validated; moreover, new potentially true human protein complexes were suggested as candidates for further validation using experimental techniques.


Subjects
Cluster Analysis , Markov Chains , Protein Interaction Mapping/methods , Algorithms , Computational Biology/methods , Databases, Protein , Humans , Protein Interaction Maps/physiology , Saccharomyces cerevisiae
14.
Bioinformatics ; 30(16): 2324-33, 2014 Aug 15.
Article in English | MEDLINE | ID: mdl-24771561

ABSTRACT

MOTIVATION: Single nucleotide polymorphisms (SNPs) are considered the most frequently occurring DNA sequence variations. Several computational methods have been proposed for the classification of missense SNPs as neutral or disease-associated. However, existing computational approaches fail to select relevant features, choosing them arbitrarily without sufficient documentation. Moreover, they are limited by the problem of missing values and by imbalance between the learning datasets, and most of them do not support their predictions with confidence scores. RESULTS: To overcome these limitations, a novel ensemble computational methodology is proposed. EnsembleGASVR implements a two-step algorithm, which in its first step applies a novel evolutionary embedded algorithm to locate close to optimal Support Vector Regression models. In its second step, these models are combined to extract a universal predictor, which is less prone to overfitting issues, systematizes the rebalancing of the learning sets and uses an internal approach for solving the missing values problem without loss of information. Confidence scores support all the predictions and the model becomes tunable by modifying the classification thresholds. An extensive study was performed for collecting the most relevant features for the problem of classifying SNPs, and a superset of 88 features was constructed. Experimental results show that the proposed framework outperforms well-known algorithms in terms of classification performance in the examined datasets. Finally, the proposed algorithmic framework was able to uncover the significant role of certain features such as the solvent accessibility feature, and the top-scored predictions were further validated by linking them with disease phenotypes. AVAILABILITY AND IMPLEMENTATION: Datasets and codes are freely available on the Web at http://prlab.ceid.upatras.gr/EnsembleGASVR/dataset-codes.zip. All the required information about the article is available through http://prlab.ceid.upatras.gr/EnsembleGASVR/site.html.


Subjects
Algorithms , Mutation, Missense , Polymorphism, Single Nucleotide , Amino Acid Substitution , Humans
15.
J Biomed Inform ; 46(3): 563-73, 2013 Jun.
Article in English | MEDLINE | ID: mdl-23501016

ABSTRACT

Traditional biology was forced to restate some of its principles when the microRNA (miRNA) genes and their regulatory role were first discovered. Typically, miRNAs are small non-coding RNA molecules which have the ability to bind to the 3' untranslated region (UTR) of their mRNA target genes for cleavage or translational repression. Existing experimental techniques for their identification and the prediction of the target genes share some important limitations such as low coverage, time-consuming experiments and high-cost reagents. Hence, many computational methods have been proposed for these tasks to overcome these limitations. Recently, many researchers have emphasized the development of computational approaches to predict the participation of miRNA genes in regulatory networks and to analyze their transcription mechanisms. All these approaches have certain advantages and disadvantages, which are described in the present survey. Our work differs from existing review papers by updating the list of methodologies and emphasizing the computational issues that arise from miRNA data analysis. Furthermore, in the present survey, the various miRNA data analysis steps are treated as an integrated procedure whose aim is to uncover the regulatory role and mechanisms of the miRNA genes. This integrated view of the miRNA data analysis steps may be extremely useful for all researchers, even if they work on just a single step.


Subjects
Computational Biology , MicroRNAs/genetics , Gene Regulatory Networks , Support Vector Machine
16.
Comp Funct Genomics ; 5(8): 596-616, 2004.
Article in English | MEDLINE | ID: mdl-18629176

ABSTRACT

Gene expression datasets are large and complex, having many variables and unknown internal structure. We apply independent component analysis (ICA) to derive a less redundant representation of the expression data. The decomposition produces components with minimal statistical dependence and reveals biologically relevant information. We then apply cluster analysis (an important and popular analysis tool for obtaining an initial understanding of the data, usually employed for class discovery) to the transformed data. The proposed self-organizing map (SOM)-based clustering algorithm automatically determines the number of 'natural' subgroups of the data, aided in this task by the available prior knowledge of the functional categories of genes. An entropy criterion allows each gene to be assigned to multiple classes, which is closer to the biological representation. These features, however, are not achieved at the cost of the simplicity of the algorithm, since the map grows on a simple grid structure and the learning algorithm remains the same as Kohonen's.

17.
Bioinformatics ; 18(11): 1446-53, 2002 Nov.
Article in English | MEDLINE | ID: mdl-12424115

ABSTRACT

MOTIVATION: Currently the most popular approach to analyzing genome-wide expression data is clustering. One of the major drawbacks of most of the existing clustering methods is that the number of clusters has to be specified a priori. Furthermore, by using purely unsupervised algorithms, prior biological knowledge is totally ignored. Moreover, most current tools lack an effective framework for tight integration of unsupervised and supervised learning for the analysis of high-dimensional expression data, and only very few multi-class supervised approaches are designed with the provision for effectively utilizing multiple functional class labeling. RESULTS: The paper adapts a novel Self-Organizing map called supervised Network Self-Organized Map (sNet-SOM) to the peculiarities of multi-labeled gene expression data. The sNet-SOM adaptively determines the number of clusters with a dynamic extension process. This process is driven by an inhomogeneous measure that tries to balance unsupervised, supervised and model complexity criteria. Nodes within a rectangular grid are grown at the boundary nodes, weights are rippled from the internal nodes towards the outer nodes of the grid, and whole columns are inserted within the map. The appropriate level of expansion is determined automatically. Multiple sNet-SOM models are constructed dynamically, each for a different unsupervised/supervised balance, and model selection criteria are used to select the optimal one. The results indicate that sNet-SOM yields competitive performance compared with other recently proposed approaches for supervised classification at a significantly reduced computational cost and that it provides extensive exploratory analysis potential within the analysis framework. Furthermore, it explores simple design decisions that are easier to comprehend and computationally efficient.


Subjects
DNA/classification , DNA/genetics , Gene Expression Profiling/methods , Saccharomyces cerevisiae/genetics , Sequence Alignment/methods , Sequence Analysis, DNA/methods , Algorithms , Databases, Nucleic Acid , Feedback , Gene Expression Regulation/genetics , Genome, Fungal , Neural Networks, Computer , Oligonucleotide Array Sequence Analysis/methods , Pattern Recognition, Automated , Sensitivity and Specificity