Results 1 - 20 of 23
1.
Comput Biol Chem ; 99: 107717, 2022 Aug.
Article in English | MEDLINE | ID: mdl-35802991

ABSTRACT

Profiles are used to model protein families and domains. They are built from multiple sequence alignments, obtained by mapping a query sequence against a database, with the profile generated from a substitution scoring matrix. Profile applications therefore depend heavily on the alignment algorithm and the scoring system for amino acid substitution. However, sometimes the database contains no sequences similar to the query under the scoring scheme, and in these cases no profile can be built. This paper proposes a method named PA_SPP, based on the pre-trained ProtAlbert transformer, to predict the profile for a single protein sequence without alignment. Transformers perform impressively on natural languages, and because protein sequences can be viewed as a language, they can benefit from these models. We analyze the attention heads in different layers of ProtAlbert to show that the transformer can capture five essential protein characteristics of a single sequence. This assessment shows that ProtAlbert considers some protein properties when suggesting amino acids for each position in the sequence. In other words, transformers can be considered an appropriate alternative to alignment and scoring schemes for predicting a profile. We evaluate PA_SPP on the CASP13 dataset, which includes 55 proteins. In addition, one thermophilic and two mesophilic proteins are used as case studies. The results show high similarity between the predicted profiles and HSSP profiles.


Subject(s)
Algorithms; Proteins; Amino Acid Sequence; Databases, Factual; Proteins/chemistry; Sequence Alignment
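For readers unfamiliar with the alignment-based profiles that abstract 1 takes as its starting point, the following is a minimal sketch of a per-column residue frequency profile computed from a multiple sequence alignment. It is not PA_SPP and not the HSSP procedure; the function name `frequency_profile` and the toy alignment are assumptions for illustration, and real profiles usually add sequence weighting and substitution-matrix scoring.

```python
from collections import Counter


def frequency_profile(alignment):
    """Return one {residue: frequency} dict per alignment column.

    `alignment` is a list of equal-length strings; '-' marks a gap
    and is ignored when computing the frequencies of a column.
    """
    length = len(alignment[0])
    profile = []
    for col in range(length):
        residues = [seq[col] for seq in alignment if seq[col] != "-"]
        total = len(residues)
        counts = Counter(residues)
        profile.append({aa: n / total for aa, n in counts.items()} if total else {})
    return profile


# Hypothetical three-sequence alignment used only as an example.
toy_alignment = ["MKV", "MRV", "M-L"]
toy_profile = frequency_profile(toy_alignment)
```

A method that predicts a profile from a single sequence would have to produce a comparable per-position distribution without access to the aligned rows.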
2.
PLoS One ; 16(8): e0255718, 2021.
Article in English | MEDLINE | ID: mdl-34370784

ABSTRACT

Despite extensive work on community discovery algorithms, community detection remains an open and challenging subject in network science. Recognizing communities in a multilayer network, where there are several layers (types) of connections, is even more complicated. Here, we concentrate on a specific type of community, called seed-centric local communities, in the multilayer setting and develop a novel method based on the information cascade concept, called PLCDM. Our simulations on three datasets (real and artificial) indicate that the proposed method outperforms two earlier seed-centric local methods. Additionally, we compared it with other global multilayer and single-layer methods. Finally, we applied our method to a biological two-layer network of colon adenocarcinoma (COAD), reconstructed from transcriptomic and post-transcriptomic datasets, and assessed the output modules. The functional enrichment results suggest that the modules of interest contain biomolecules involved in pathways associated with carcinogenesis.


Subject(s)
Adenocarcinoma/genetics; Algorithms; Colonic Neoplasms/genetics; Protein Interaction Maps/genetics; Transcriptome/genetics; Adenocarcinoma/metabolism; Carcinogenesis/genetics; Colonic Neoplasms/metabolism; Humans
3.
Sci Rep ; 10(1): 22035, 2020 12 16.
Article in English | MEDLINE | ID: mdl-33328499

ABSTRACT

Controlling a network structure has many potential applications in many fields. For effective network control, not only is finding good driver nodes important, but finding the optimal time to apply the external control signals to network nodes also plays a critical role. If the signals are applied at an appropriate time, one might be able to control a network with smaller control signals, and thus less energy. In this manuscript, we show that there is a relationship between the strength of the internal fluxes and the effectiveness of the external control signal. To be more effective, external control signals should be applied when the strength of the internal states is the smallest. We validate this claim on synthetic networks as well as a number of real networks. Our results may have important implications in systems medicine for finding the most appropriate time to inject drugs as a signal to control diseases.

4.
Genomics ; 112(6): 4938-4944, 2020 11.
Article in English | MEDLINE | ID: mdl-32905831

ABSTRACT

Controllability of a complex network system is related to finding a minimum set of nodes, known as drivers, whose control allows full control over the dynamics of the network. For some applications, only a portion of the network needs to be controlled, for which target control has been proposed. Often, along the control route from driver nodes to target nodes, some mediators (intermediate nodes) are also unintentionally controlled, which might cause various side effects. When controlling cancerous cells, unintentionally controlling healthy cells might weaken them, thus affecting the immune system's response to cancer. This manuscript proposes a candidate solution to the problem of finding the minimum number of driver nodes under minimal mediators. Although many others have attempted to develop algorithms to find the minimum number of drivers for target control, the newly proposed algorithm is the first that achieves this goal while keeping the number of mediators to a minimum. The proposed controllability condition, based on path lengths between node pairs, meets Kalman's controllability rank condition and can be applied to directed networks. Our results show that path length is a major determinant of the properties of target control under minimal mediators. As the average path length becomes larger, the ratio of drivers to target nodes decreases and the ratio of mediators to targets increases. The proposed methodology has potential applications in biological networks. The source code of the algorithm and the networks that have been used are available from the following link: https://github.com/LBBSoft/Target-Control-with-Minimal-Mediators.git.


Subject(s)
Algorithms; Models, Biological; Animals; Caenorhabditis elegans/physiology; Gene Regulatory Networks; Nerve Net/physiology
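Abstract 4 refers to Kalman's controllability rank condition. As a hedged illustration of that standard condition only (not of the authors' path-length criterion or their mediator-minimizing algorithm): a linear system x' = Ax + Bu with n states is controllable when the matrix [B, AB, ..., A^(n-1)B] has rank n. The sketch below checks this with exact arithmetic; the function names and the three-node chain are assumptions for illustration.

```python
from fractions import Fraction


def mat_mul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def rank(m):
    """Matrix rank by Gaussian elimination over exact fractions."""
    rows = [[Fraction(x) for x in row] for row in m]
    r = 0
    for col in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][col] != 0:
                factor = rows[i][col] / rows[r][col]
                rows[i] = [x - factor * y for x, y in zip(rows[i], rows[r])]
        r += 1
        if r == len(rows):
            break
    return r


def is_controllable(a, b):
    """Kalman test: rank [B, AB, ..., A^(n-1)B] == n."""
    n = len(a)
    blocks = [b]
    for _ in range(n - 1):
        blocks.append(mat_mul(a, blocks[-1]))
    # Concatenate the blocks side by side, row by row.
    c = [sum((blk[i] for blk in blocks), []) for i in range(n)]
    return rank(c) == n


# Hypothetical directed chain 1 -> 2 -> 3; a[i][j] = 1 means node j+1 drives node i+1.
chain = [[0, 0, 0],
         [1, 0, 0],
         [0, 1, 0]]
drive_first = [[1], [0], [0]]   # control signal injected at node 1
drive_last = [[0], [0], [1]]    # control signal injected at node 3
```

Under these assumptions, driving the head of the chain reaches every node, while driving the tail reaches only the tail itself, which is why the choice of driver nodes matters.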
5.
PLoS One ; 15(7): e0236519, 2020.
Article in English | MEDLINE | ID: mdl-32730297

ABSTRACT

Stem cells, with their capacity to self-renew and to differentiate into more specialized cell types, play a key role in maintaining homeostasis in adult tissues. To investigate how, in the dynamic stochastic environment of a tissue, non-genetic diversity and the precise balance between proliferation and differentiation are achieved, it is necessary to understand the molecular mechanisms of the stem cell decision-making process. By focusing on the impact of stochasticity, we propose a computational model describing the regulatory circuitry as a tri-stable dynamical system to reveal the mechanism that orchestrates this balance. Our model explains how the distribution of noise in genes, linked to the cell regulatory networks, affects cell decision-making to maintain the homeostatic state. The noise effect on tissue homeostasis is achieved by regulating the probability of differentiation and self-renewal through symmetric and/or asymmetric cell divisions. Given that mutations arising from DNA replication during stem cell division are inevitable, our model reveals how such mutations contribute either to gradual aging or to the development of cancer in a short period of time. Furthermore, our model sheds some light on the impact of more complex regulatory networks on the robustness of the system against perturbations.


Subject(s)
Models, Biological; Stem Cells/metabolism; Animals; Cell Differentiation; Cell Self Renewal; Humans; Stem Cells/cytology
6.
Mol Med ; 25(1): 36, 2019 08 01.
Article in English | MEDLINE | ID: mdl-31370801

ABSTRACT

BACKGROUND: Acute lymphoblastic leukemia (ALL) is the most common type of cancer diagnosed in children, and glucocorticoids (GCs) form an essential component of the standard chemotherapy in most treatment regimens. The category of infant ALL patients carrying a translocation involving the mixed lineage leukemia (MLL) gene (gene KMT2A) is characterized by resistance to GCs and poor clinical outcome. Although some studies have examined GC resistance in infant ALL patients, the understanding of this phenomenon remains limited, which impedes efforts to improve prognosis. METHODS: This study integrates differential co-expression (DC) and protein-protein interaction (PPI) networks to find active protein modules associated with GC resistance in MLL-rearranged infant ALL patients. A network was constructed by linking differentially co-expressed gene pairs between GC-resistant and GC-sensitive samples and was then integrated with PPI networks by keeping the links that are also present in the PPI network. The resulting network was decomposed into two sub-networks, one specific to each phenotype. Finally, both sub-networks were clustered into modules using weighted gene co-expression network analysis (WGCNA) and further analyzed with functional enrichment analysis. RESULTS: Through the integration of DC analysis and the PPI network, four protein modules were found to be active under the GC-resistant phenotype but not under the GC-sensitive phenotype. Functional enrichment analysis revealed that these modules are related to proteasome, electron transport chain, tRNA-aminoacyl biosynthesis, and peroxisome signaling pathways. These findings are in accordance with previous findings related to GC resistance in other hematological malignancies such as pediatric ALL. CONCLUSIONS: Differential co-expression analysis is a promising approach to incorporate the dynamic context of gene expression profiles into the well-documented protein interaction networks. The approach allows the detection of relevant protein modules that are highly enriched with DC gene pairs. Functional enrichment analysis of detected protein modules generates new biological hypotheses and may help in explaining GC resistance in MLL-rearranged infant ALL patients.


Subject(s)
Glucocorticoids/therapeutic use; Precursor Cell Lymphoblastic Leukemia-Lymphoma/drug therapy; Precursor Cell Lymphoblastic Leukemia-Lymphoma/metabolism; Databases, Protein; Drug Resistance, Neoplasm/genetics; Drug Resistance, Neoplasm/physiology; Gene Expression Profiling; Gene Regulatory Networks/genetics; Gene Regulatory Networks/physiology; Histone-Lysine N-Methyltransferase/genetics; Histone-Lysine N-Methyltransferase/metabolism; Humans; Myeloid-Lymphoid Leukemia Protein/genetics; Myeloid-Lymphoid Leukemia Protein/metabolism; Signal Transduction/genetics; Signal Transduction/physiology
7.
Front Neurosci ; 13: 625, 2019.
Article in English | MEDLINE | ID: mdl-31354403

ABSTRACT

Application of deep convolutional spiking neural networks (SNNs) to artificial intelligence (AI) tasks has recently gained a lot of interest since SNNs are hardware-friendly and energy-efficient. Unlike the non-spiking counterparts, most of the existing SNN simulation frameworks are not practically efficient enough for large-scale AI tasks. In this paper, we introduce SpykeTorch, an open-source high-speed simulation framework based on PyTorch. This framework simulates convolutional SNNs with at most one spike per neuron and the rank-order encoding scheme. In terms of learning rules, both spike-timing-dependent plasticity (STDP) and reward-modulated STDP (R-STDP) are implemented, but other rules could be implemented easily. Apart from the aforementioned properties, SpykeTorch is highly generic and capable of reproducing the results of various studies. Computations in the proposed framework are tensor-based and totally done by PyTorch functions, which in turn brings the ability of just-in-time optimization for running on CPUs, GPUs, or Multi-GPU platforms.

8.
IEEE Trans Neural Netw Learn Syst ; 29(12): 6178-6190, 2018 12.
Article in English | MEDLINE | ID: mdl-29993898

ABSTRACT

Reinforcement learning (RL) has recently regained popularity with major achievements such as beating the European game of Go champion. Here, for the first time, we show that RL can be used efficiently to train a spiking neural network (SNN) to perform object recognition in natural images without using an external classifier. We used a feedforward convolutional SNN and a temporal coding scheme where the most strongly activated neurons fire first, while less activated ones fire later, or not at all. In the highest layers, each neuron was assigned to an object category, and it was assumed that the stimulus category was the category of the first neuron to fire. If this assumption was correct, the neuron was rewarded, i.e., spike-timing-dependent plasticity (STDP) was applied, which reinforced the neuron's selectivity. Otherwise, anti-STDP was applied, which encouraged the neuron to learn something else. As demonstrated on various image data sets (Caltech, ETH-80, and NORB), this reward-modulated STDP (R-STDP) approach has extracted particularly discriminative visual features, whereas classic unsupervised STDP extracts any feature that consistently repeats. As a result, R-STDP has outperformed STDP on these data sets. Furthermore, R-STDP is suitable for online learning and can adapt to drastic changes such as label permutations. Finally, it is worth mentioning that both feature extraction and classification were done with spikes, using at most one spike per neuron. Thus, the network is hardware friendly and energy efficient.


Subject(s)
Models, Neurological; Neuronal Plasticity/physiology; Neurons/physiology; Reward; Visual Perception/physiology; Animals; Computer Simulation; Humans; Nerve Net
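The decision and learning rule described in abstract 8 (the first output neuron to fire gives the predicted category; a correct prediction triggers STDP, an incorrect one anti-STDP) can be sketched roughly as follows. This is a simplified illustration, not the published implementation: the learning rates `a_plus` and `a_minus`, the `w * (1 - w)` soft-bound factor, the plain sign flip used for anti-STDP, and all function names are assumptions that do not come from the abstract.

```python
def first_to_fire(spike_times):
    """Index of the neuron with the earliest spike; None entries never fired."""
    fired = [(t, i) for i, t in enumerate(spike_times) if t is not None]
    return min(fired)[1] if fired else None


def r_stdp_update(weights, pre_times, post_time, rewarded,
                  a_plus=0.01, a_minus=0.005):
    """Return updated synaptic weights for one output neuron.

    Inputs that spiked no later than the output neuron count as causal.
    With reward, causal synapses are strengthened and the rest weakened
    (STDP); without reward the signs are flipped (anti-STDP).
    """
    sign = 1.0 if rewarded else -1.0
    updated = []
    for w, t_pre in zip(weights, pre_times):
        causal = t_pre is not None and t_pre <= post_time
        delta = a_plus if causal else -a_minus
        # w * (1 - w) keeps weights inside (0, 1); assumed stabilizer.
        updated.append(w + sign * delta * w * (1.0 - w))
    return updated


# Hypothetical example: two input synapses, the second input never spikes.
winner = first_to_fire([5.0, 2.0, None])
after_reward = r_stdp_update([0.5, 0.5], [1.0, None], 2.0, rewarded=True)
after_punish = r_stdp_update([0.5, 0.5], [1.0, None], 2.0, rewarded=False)
```

In a full training loop, `rewarded` would be set by comparing the category assigned to the winning neuron with the image label.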
9.
BMC Bioinformatics ; 18(1): 370, 2017 Aug 16.
Article in English | MEDLINE | ID: mdl-28814324

ABSTRACT

BACKGROUND: Discriminating driver mutations from the ones that play no role in cancer is a severe bottleneck in elucidating molecular mechanisms underlying cancer development. Since protein domains are representatives of functional regions within proteins, mutations on them may disturb the protein functionality. Therefore, studying mutations at domain level may point researchers to more accurate assessment of the functional impact of the mutations. RESULTS: This article presents a comprehensive study to map mutations from 29 cancer types to both sequence- and structure-based domains. Statistical analysis was performed to identify candidate domains in which mutations occur with high statistical significance. For each cancer type, the corresponding type-specific domains were distinguished among all candidate domains. Subsequently, cancer type-specific domains facilitated the identification of specific proteins for each cancer type. Besides, performing interactome analysis on specific proteins of each cancer type showed high levels of interconnectivity among them, which implies their functional relationship. To evaluate the role of mitochondrial genes, stem cell-specific genes and DNA repair genes in cancer development, their mutation frequency was determined via further analysis. CONCLUSIONS: This study has provided researchers with a publicly available data repository for studying both CATH and Pfam domain regions on protein-coding genes. Moreover, the associations between different groups of genes/domains and various cancer types have been clarified. The work is available at http://www.cancerouspdomains.ir .


Subject(s)
Neoplasms/genetics; Proteins/genetics; DNA Repair/genetics; Databases, Genetic; Humans; Internet; Mitochondria/genetics; Mutation; Neoplasms/metabolism; Neoplasms/pathology; Neoplastic Stem Cells/metabolism; Protein Interaction Maps/genetics; Proteins/metabolism; User-Computer Interface
10.
BMC Bioinformatics ; 18(1): 10, 2017 Jan 03.
Article in English | MEDLINE | ID: mdl-28049415

ABSTRACT

BACKGROUND: Although different protein-protein physical interaction (PPI) datasets exist for Escherichia coli, no common methodology exists to integrate these datasets and extract reliable modules reflecting the existing biological processes and protein complexes. The naïve Bayesian formula is a widely accepted method for integrating different PPI datasets into a single weighted PPI network, but determining proper weights in such a network is still a major problem. RESULTS: In this paper, we propose a new methodology to integrate various physical PPI datasets into a single weighted PPI network such that the modules detected in the PPI network exhibit the highest similarity to available functional modules. We used co-expression modules as functional modules, and we showed that direct functional modules detected from Gene Ontology terms could be used as an alternative dataset. After running this integration methodology over six different physical PPI datasets, orthologous high-confidence interactions from a related organism and two AP-MS PPI datasets gained high weights in the integrated networks, while the weights for one AP-MS PPI dataset and two other datasets derived from public databases converged to zero. The majority of detected modules are shaped around one or a few hub protein(s). Still, a large number of highly interacting protein modules were detected that are functionally relevant and are likely to form protein complexes. CONCLUSIONS: We provide a new high-confidence protein complex prediction method supported by functional studies and literature mining.


Subject(s)
Bacterial Proteins/metabolism; Escherichia coli/metabolism; Protein Interaction Mapping/methods; Algorithms; Bacterial Proteins/chemistry; Bayes Theorem; Chromatography, Affinity; Mass Spectrometry; Protein Interaction Maps
11.
Cell Oncol (Dordr) ; 40(1): 33-45, 2017 Feb.
Article in English | MEDLINE | ID: mdl-27798768

ABSTRACT

PURPOSE: Despite vast improvements that have been made in the treatment of children with acute lymphoblastic leukemia (ALL), the majority of infant ALL patients (~80 %, < 1 year of age) that carry a chromosomal translocation involving the mixed lineage leukemia (MLL) gene shows a poor response to chemotherapeutic drugs, especially glucocorticoids (GCs), which are essential components of all current treatment regimens. Although addressed in several studies, the mechanism(s) underlying this phenomenon have remained largely unknown. A major drawback of most previous studies is their primary focus on individual genes, thereby neglecting the putative significance of inter-gene correlations. Here, we aimed at studying GC resistance in MLL-rearranged infant ALL patients by inferring an associated module of genes using co-expression network analysis. The implications of newly identified candidate genes with associations to other well-known relevant genes from the same module, or with associations to known transcription factor or microRNA interactions, were substantiated using literature data. METHODS: A weighted gene co-expression network was constructed to identify gene modules associated with GC resistance in MLL-rearranged infant ALL patients. Significant gene ontology (GO) terms and signaling pathways enriched in relevant modules were used to provide guidance towards which module(s) consisted of promising candidates suitable for further analysis. RESULTS: Through gene co-expression network analysis a novel set of genes (module) related to GC-resistance was identified. The presence in this module of the S100 and ANXA genes, both well-known biomarkers for GC resistance in MLL-rearranged infant ALL, supports its validity. 
Subsequent gene set net correlation analyses of the novel module provided further support for its validity by showing that the S100 and ANXA genes act as 'hub' genes with potentially major regulatory roles in GC sensitivity, but having lost this role in the GC resistant phenotype. The detected module implicates new genes as being candidates for further analysis through associations with known GC resistance-related genes. CONCLUSIONS: From our data we conclude that available systems biology approaches can be employed to detect new candidate genes that may provide further insights into drug resistance of MLL-rearranged infant ALL cases. Such approaches complement conventional gene-wise approaches by taking putative functional interactions between genes into account.


Subject(s)
Antineoplastic Agents/therapeutic use; Drug Resistance, Neoplasm/genetics; Gene Expression Profiling/methods; Glucocorticoids/therapeutic use; Precursor Cell Lymphoblastic Leukemia-Lymphoma/drug therapy; Precursor Cell Lymphoblastic Leukemia-Lymphoma/genetics; Gene Regulatory Networks; Histone-Lysine N-Methyltransferase/genetics; Humans; Infant; Myeloid-Lymphoid Leukemia Protein/genetics
12.
Comput Math Methods Med ; 2014: 393908, 2014.
Article in English | MEDLINE | ID: mdl-25435900

ABSTRACT

To date, few tools for aligning protein-protein interaction networks have been suggested. These tools typically find conserved interaction patterns using various local or global alignment algorithms. However, improving the speed, scalability, simplicity, and accuracy of network alignment tools remains a target of ongoing research. In this paper, we introduce Pin-Align, a new tool for local alignment of protein-protein interaction networks. The accuracy of Pin-Align is tested on protein interaction networks from IntAct, DIP, and the Stanford Network Database, and the results are compared with other well-known algorithms. It is shown that Pin-Align has higher sensitivity and specificity in terms of KEGG Ortholog groups.


Subject(s)
Computational Biology/methods; Protein Interaction Mapping/methods; Proteins/chemistry; Algorithms; Animals; Cluster Analysis; Databases, Protein; Humans; Mice; Probability; Protein Interaction Maps; Reproducibility of Results; Sequence Alignment; Software
13.
PLoS One ; 9(1): e84341, 2014.
Article in English | MEDLINE | ID: mdl-24392125

ABSTRACT

Nowadays, brain signals are employed in various scientific and practical fields such as Medical Science, Cognitive Science, Neuroscience, and Brain Computer Interfaces. Hence, the need for robust signal analysis methods with adequate accuracy and generalizability is inevitable. The brain signal analysis is faced with complex challenges including small sample size, high dimensionality and noisy signals. Moreover, because of the non-stationarity of brain signals and the impacts of mental states on brain function, the brain signals are associated with an inherent uncertainty. In this paper, an evidence-based combining classifiers method is proposed for brain signal analysis. This method exploits the power of combining classifiers for solving complex problems and the ability of evidence theory to model as well as to reduce the existing uncertainty. The proposed method models the uncertainty in the labels of training samples in each feature space by assigning soft and crisp labels to them. Then, some classifiers are employed to approximate the belief function corresponding to each feature space. By combining the evidence raised from each classifier through the evidence theory, more confident decisions about testing samples can be made. The obtained results by the proposed method compared to some other evidence-based and fixed rule combining methods on artificial and real datasets exhibit the ability of the proposed method in dealing with complex and uncertain classification problems.


Subject(s)
Artificial Intelligence; Brain/physiology; Electroencephalography; Models, Theoretical; Signal Processing, Computer-Assisted; Algorithms; Humans; Reproducibility of Results
14.
Int J Bioinform Res Appl ; 9(6): 584-94, 2013.
Article in English | MEDLINE | ID: mdl-24084239

ABSTRACT

One of the fundamental problems in computational biology is the construction of physical maps of chromosomes from hybridisation experiments between unique probes and clones of chromosome fragments. Before the introduction of the shotgun sequencing method, the Partial Digest Problem (PDP) was an intractable problem arising in the construction of physical maps of DNA sequences in molecular biology. In this paper, we develop a novel Genetic Algorithm (GA) for solving the PDP. This algorithm is implemented and compared with well-known existing algorithms on different types of random and real instance data, and the obtained results show the efficiency of our algorithm. Our GA is also adapted to handle erroneous data, and its efficiency is presented for large instances of this problem.


Subject(s)
Algorithms; Base Sequence; Genomics/methods; Chromosome Mapping/methods
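To make the Partial Digest Problem of abstract 14 concrete: given the multiset of all pairwise distances between restriction sites, the task is to recover site positions that produce it. The sketch below is a classic exhaustive backtracking search, shown only to illustrate the problem; it is not the genetic algorithm proposed in the paper, and the function names and the example positions are assumptions.

```python
from collections import Counter
from itertools import combinations


def pairwise_distances(points):
    """Multiset of all pairwise distances between site positions."""
    return Counter(abs(a - b) for a, b in combinations(points, 2))


def partial_digest(distances):
    """Return one sorted list of positions (starting at 0) whose pairwise
    distances equal `distances`, or None if no such placement exists."""
    remaining = Counter(distances)
    width = max(remaining)
    remaining[width] -= 1
    placed = [0, width]

    def try_place(y):
        # Distances from candidate y to every site placed so far.
        needed = Counter(abs(y - x) for x in placed)
        if all(remaining[d] >= n for d, n in needed.items()):
            remaining.subtract(needed)
            placed.append(y)
            return needed
        return None

    def solve():
        left = +remaining  # drop exhausted distances
        if not left:
            return sorted(placed)
        d = max(left)
        # The largest unexplained distance must touch one of the two ends.
        for y in (d, width - d):
            needed = try_place(y)
            if needed is not None:
                result = solve()
                if result is not None:
                    return result
                placed.pop()
                remaining.update(needed)
        return None

    return solve()


# Hypothetical instance: five sites on a fragment of length 10.
sites = [0, 2, 4, 7, 10]
solution = partial_digest(pairwise_distances(sites))
```

The search may return either the original positions or their mirror image, since both generate the same distance multiset. Heuristics such as a genetic algorithm become attractive when the input contains measurement errors, which this exact search does not tolerate.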
15.
PLoS One ; 8(7): e67552, 2013.
Article in English | MEDLINE | ID: mdl-23874428

ABSTRACT

The goal of this study was to reconstruct a "genome-scale co-expression network" and find important modules in lung adenocarcinoma so that we could identify the genes involved in lung adenocarcinoma. We integrated gene mutation, GWAS, CGH, array-CGH and SNP array data in order to identify important genes and loci at genome scale. Afterwards, on the basis of the identified genes, a co-expression network was reconstructed from the co-expression data. The reconstructed network was named the "genome-scale co-expression network". As the next step, 23 key modules were identified through clustering. By analyzing the modules, this study identified a number of genes implicated in lung adenocarcinoma for the first time. The genes EGFR, PIK3CA, TAF15, XIAP, VAPB, Appl1, Rab5a, ARF4, CLPTM1L, SP4, ZNF124, LPP, FOXP1, SOX18, MSX2, NFE2L2, SMARCC1, TRA2B, CBX3, PRPF6, ATP6V1C1, MYBBP1A, MACF1, GRM2, TBXA2R, PRKAR2A, PTK2, PGF and MYO10 are among the genes that belong to modules 1 and 22. All these genes, being implicated in at least one of cell survival, proliferation and metastasis, have an over-expression pattern similar to that of EGFR. A few modules contain genes such as CCNA2 (Cyclin A2), CCNB2 (Cyclin B2), CDK1, CDK5, CDC27, CDCA5, CDCA8, ASPM, BUB1, KIF15, KIF2C, NEK2, NUSAP1, PRC1, SMC4, SYCE2, TFDP1, CDC42 and ARHGEF9, which play a crucial role in cell cycle progression. In addition to the mentioned genes, there are some other genes (e.g. DLGAP5, BIRC5, PSMD2, Src, TTK, SENP2, DOK2 and FUS) in the modules.


Subject(s)
Adenocarcinoma/genetics; Gene Regulatory Networks; Genome, Human; Lung Neoplasms/genetics; Mutation; Adenocarcinoma/pathology; Adenocarcinoma of Lung; Cell Cycle/genetics; Cell Growth Processes/genetics; Cell Survival/genetics; Cluster Analysis; Humans; Lung Neoplasms/pathology; Neoplasm Metastasis; Polymorphism, Single Nucleotide
16.
Genes Genet Syst ; 88(5): 301-9, 2013.
Article in English | MEDLINE | ID: mdl-24694393

ABSTRACT

Gene expression is a highly regulated biological process that is fundamental to the phenotypes of any living organism. The regulatory relations are usually modeled as a network: every gene is modeled as a node, and relations are shown as edges between two related genes. This paper presents a novel method for inferring correlation networks (networks constructed by connecting co-expressed genes) by predicting co-expression levels from gene promoter sequences. According to the results, this method works well on biological data, and its outcome is comparable to that of methods that use microarray data as input. The method is implemented in C++ and is available upon request from the corresponding author.


Subject(s)
Gene Regulatory Networks; Saccharomyces cerevisiae Proteins/genetics; Saccharomyces cerevisiae/genetics; Software; Transcription Factors/genetics; Algorithms; Binding Sites; Gene Expression; Neural Networks, Computer; Promoter Regions, Genetic; Protein Binding; Saccharomyces cerevisiae/metabolism; Saccharomyces cerevisiae Proteins/metabolism; Transcription Factors/metabolism
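Abstract 16 defines a correlation network as one built by connecting co-expressed genes. As a rough illustration of that definition in its conventional expression-based form (the kind of microarray-driven baseline the abstract compares against, not the promoter-sequence predictor it proposes), the sketch below links two genes when the absolute Pearson correlation of their expression vectors reaches a threshold. The gene names, expression values, threshold and function names are all hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences.

    Assumes neither sequence is constant (non-zero variance).
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def correlation_network(expression, threshold=0.9):
    """Return the set of gene pairs whose |correlation| >= threshold."""
    return {(g, h) for g, h in combinations(sorted(expression), 2)
            if abs(pearson(expression[g], expression[h])) >= threshold}


# Hypothetical expression levels across four conditions.
expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],
    "geneC": [5.0, 1.0, 4.0, 2.0],
}
edges = correlation_network(expression)
```

A sequence-based method would replace the measured expression vectors with co-expression levels predicted from promoter sequences, leaving the thresholding step unchanged.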
17.
J Biomol Struct Dyn ; 29(6): 623-33, 2012.
Article in English | MEDLINE | ID: mdl-22545993

ABSTRACT

Protein structural class prediction is one of the challenging problems in bioinformatics. Previous methods based directly on the similarity of amino acid (AA) sequences have been shown to be insufficient for low-similarity protein data-sets. To improve the prediction accuracy for such low-similarity proteins, different methods have recently been proposed that explore novel feature sets based on predicted secondary structure propensities. In this paper, we focus on protein structural class prediction using combinations of novel features, including secondary structure propensities as well as functional domain (FD) features extracted from the InterPro signature database. Our comprehensive experimental results on several benchmark data-sets show that the integration of new FD features substantially improves the accuracy of structural class prediction for low-similarity proteins, as they capture meaningful relationships among AA residues that are far apart in the protein sequence. The proposed prediction method has also been tested on predicting structural classes for partially disordered proteins, with reasonable prediction accuracy; this is a more difficult problem than structural class prediction for commonly used benchmark data-sets and, to the best of our knowledge, has never been done before. In addition, to avoid overfitting with a large number of features, feature selection is applied to select discriminating features that contribute to achieving high prediction accuracy. The selected features have been shown to achieve stable prediction performance across different benchmark data-sets.


Subject(s)
Algorithms; Protein Structure, Secondary; Proteins/chemistry; Proteins/metabolism; Databases, Protein; Proteins/classification; Proteomics; Sequence Alignment; Sequence Analysis, Protein
18.
J Theor Biol ; 304: 96-102, 2012 Jul 07.
Article in English | MEDLINE | ID: mdl-22504445

ABSTRACT

Gene expression is the main cause of the existence of various phenotypes. Through this process, the information stored in DNA gives rise to the phenotype. Essentially, gene expression depends on the successful binding of transcription factors (TFs), a specific type of protein, to specific positions upstream of a gene, known as TF binding sites (TFBSs). Unfortunately, finding these TFBSs experimentally is costly and laborious; therefore, discovering TFBSs computationally is a significant problem that many researchers endeavor to solve. In this paper, a new TFBS discovery method is presented that considers known biological facts about TFBSs. The input to this method is a set of sequences of arbitrary lengths, and the output comprises positions that are likely to be TFBSs. By applying this method and previous methods to biological and simulated datasets, it is shown that this method achieves higher accuracy in discovering TFBSs.


Subject(s)
Sequence Alignment/methods; Transcription Factors/metabolism; Algorithms; Animals; Binding Sites/genetics; Computational Biology/methods; Escherichia coli/genetics; Gene Expression Regulation; Humans; Protein Binding/genetics
19.
J Biomed Inform ; 43(5): 800-4, 2010 Oct.
Article in English | MEDLINE | ID: mdl-20546935

ABSTRACT

Single Nucleotide Polymorphisms (SNPs) provide valuable information on human evolutionary history and may lead us to identify genetic variants responsible for complex human diseases. Unfortunately, molecular haplotyping methods are costly, laborious, and time consuming; therefore, algorithms that construct full haplotype patterns from a small amount of available data by computational means, known as the Tag SNP selection problem, are convenient and attractive. This problem has been proved to be NP-hard, so heuristic methods may be useful. In this paper, we present a heuristic method based on a genetic algorithm to find a reasonable solution within acceptable time. The algorithm was tested on a variety of simulated and experimental data. In comparison with the exact algorithm, based on a brute-force approach, the results show that our method obtains optimal solutions in almost all cases and runs much faster than the exact algorithm when the number of SNP sites is large. Our software is available upon request from the corresponding author.


Subject(s)
Algorithms; Computational Biology/methods; Models, Genetic; Polymorphism, Single Nucleotide; Software; Computer Simulation; Disease/genetics; Genetic Predisposition to Disease/epidemiology; Genetic Predisposition to Disease/genetics; Haplotypes; Humans
20.
BMC Bioinformatics ; 10: 318, 2009 Oct 04.
Article in English | MEDLINE | ID: mdl-19799800

ABSTRACT

BACKGROUND: Complex networks are studied across many fields of science and are particularly important for understanding biological processes. Motifs in networks are small connected sub-graphs that occur at significantly higher frequencies than in random networks. They have recently gathered much attention as a useful concept for uncovering structural design principles of complex networks. Existing algorithms for finding network motifs are extremely costly in CPU time and memory consumption and have practical restrictions on the size of motifs. RESULTS: We present a new algorithm (Kavosh) for finding k-size network motifs with less memory and CPU time than other existing algorithms. Our algorithm is based on counting all k-size sub-graphs of a given graph (directed or undirected). We evaluated our algorithm on biological networks of E. coli and S. cerevisiae, and also on non-biological networks: a social and an electronic network. CONCLUSION: The efficiency of our algorithm is demonstrated by comparing the obtained results with three well-known motif finding tools. For comparison, the CPU time, memory usage and the similarities of obtained motifs are considered. In addition, Kavosh can be employed for finding motifs of size greater than eight, while most of the other algorithms cannot handle motifs of size greater than eight. The Kavosh source code and help files are freely available at: http://Lbb.ut.ac.ir/Download/LBBsoft/Kavosh/.


Subject(s)
Algorithms; Computational Biology/methods; Software; Escherichia coli/genetics; Neural Networks, Computer; Saccharomyces cerevisiae/genetics
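The counting step that abstract 20 describes (tallying all connected k-size sub-graphs of a graph) can be illustrated with a deliberately naive sketch. This is not the Kavosh enumeration scheme: it simply tests every k-node subset of an undirected graph for connectivity and groups the hits by edge count, which only separates isomorphism classes reliably for k = 3 (open path versus triangle). The function names and the example graph are assumptions; for larger k a canonical labelling step would be needed, and motif detection additionally requires comparing the counts with those from randomized networks.

```python
from itertools import combinations


def is_connected(nodes, adjacency):
    """Depth-first check that `nodes` induce a connected sub-graph."""
    nodes = set(nodes)
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for v in adjacency[u] & nodes:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen == nodes


def count_connected_subgraphs(edges, k):
    """Count connected induced k-node sub-graphs, keyed by number of edges."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    counts = {}
    for nodes in combinations(sorted(adjacency), k):
        if is_connected(nodes, adjacency):
            n_edges = sum(1 for a, b in combinations(nodes, 2) if b in adjacency[a])
            counts[n_edges] = counts.get(n_edges, 0) + 1
    return counts


# Hypothetical graph: a triangle 1-2-3 with a pendant node 4 attached to 3.
example_edges = [(1, 2), (2, 3), (1, 3), (3, 4)]
triad_counts = count_connected_subgraphs(example_edges, 3)
```

The brute-force subset scan grows combinatorially with graph size, which is the cost that dedicated enumeration algorithms aim to reduce.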