1.
BMC Bioinformatics ; 21(1): 584, 2020 Dec 17.
Article in English | MEDLINE | ID: mdl-33334319

ABSTRACT

BACKGROUND: Predicting physical interaction between proteins is one of the greatest challenges in computational biology. There is a considerable variety of protein interactions and a huge number of protein sequences and synthetic peptides with unknown interacting counterparts. Most co-evolutionary methods discover a combination of physical interplays and functional associations. However, there are only a handful of approaches which specifically infer physical interactions. Hybrid co-evolutionary methods exploit inter-protein residue coevolution to unravel specific physical interacting proteins. In this study, we introduce a hybrid co-evolutionary-based approach to predict physical interplays between pairs of protein families, starting from protein sequences only. RESULTS: In the present analysis, pairs of multiple sequence alignments are constructed for each dimer and the covariation between residues in those pairs is calculated by CCMpred (Contacts from Correlated Mutations predicted) and three mutual information based approaches for ten accessible surface area threshold groups. Then, all residue couplings between the proteins of each dimer are unified into a single Frobenius norm value. Norms of residue contact matrices of all dimers in different accessible surface area thresholds are fed into a support vector machine as single or multiple feature models. The results of training the classifiers on single features show no apparent difference in accuracy between methods across accessible surface area thresholds. Nevertheless, the mutual information product and context likelihood of relatedness procedures may have roughly higher and lower overall performance, respectively, than the other two methods across accessible surface area cut-offs. The results also demonstrate that training the support vector machine with multiple norm features for several accessible surface area thresholds leads to a considerable improvement in prediction performance.
In this context, CCMpred achieves a roughly better overall performance than the mutual information based approaches. The best accuracy, sensitivity, specificity, precision and negative predictive value for that method are 0.98, 1, 0.962, 0.96, and 0.962, respectively. CONCLUSIONS: In this paper, by feeding norm values of protein dimers into support vector machines in different accessible surface area thresholds, we demonstrate that even a small number of proteins in pairs of multiple alignments can allow one to accurately discriminate between positive and negative dimers.
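
The step that collapses all residue couplings of a dimer into one Frobenius norm can be sketched as follows. This is only an illustration: the function name and the coupling values are hypothetical, and the abstract does not say how the scores are normalized or filtered before the norm is taken.

```python
import math


def frobenius_norm(matrix):
    """Return the Frobenius norm: the square root of the sum of squared entries."""
    return math.sqrt(sum(value * value for row in matrix for value in row))


# Hypothetical inter-protein coupling scores (rows: residues of protein A,
# columns: residues of protein B). The numbers are made up for illustration.
couplings = [
    [3.0, 0.0],
    [0.0, 4.0],
]

feature = frobenius_norm(couplings)
print(feature)  # 5.0
```

One such scalar per accessible surface area threshold would then form the feature vector passed to a classifier.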


Subjects
Proteins/chemistry , Support Vector Machine , Protein Databases , Dimerization , Molecular Evolution , Protein Interaction Maps , Proteins/metabolism
2.
BMC Genomics ; 18(1): 964, 2017 Dec 12.
Article in English | MEDLINE | ID: mdl-29233090

ABSTRACT

BACKGROUND: DNA methylation at promoters is largely correlated with inhibition of gene expression. However, the role of DNA methylation at enhancers is not fully understood, although a crosstalk with chromatin marks is expected. Actually, there exist contradictory reports about positive and negative correlations between DNA methylation and H3K4me1, a chromatin hallmark of enhancers. RESULTS: We investigated the relationship between DNA methylation and active chromatin marks through genome-wide correlations, and found anti-correlation between H3K4me1 and H3K4me3 enrichment at low and intermediate DNA methylation loci. We hypothesized "seesaw" dynamics between H3K4me1 and H3K4me3 in the low and intermediate DNA methylation range, in which DNA methylation discriminates between enhancers and promoters, marked by H3K4me1 and H3K4me3, respectively. Low methylated regions are H3K4me3 enriched, while those with intermediate DNA methylation levels are progressively H3K4me1 enriched. Additionally, the enrichment of H3K27ac, distinguishing active from primed enhancers, follows a plateau in the lower range of the intermediate DNA methylation level, corresponding to active enhancers, and decreases linearly in the higher range of the intermediate DNA methylation. Thus, the decrease of the DNA methylation switches smoothly the state of the enhancers from a primed to an active state. We summarize these observations into a rule of thumb of one-out-of-three methylation marks: "In each genomic region only one out of these three methylation marks {DNA methylation, H3K4me1, H3K4me3} is high. If it is the DNA methylation, the region is inactive. If it is H3K4me1, the region is an enhancer, and if it is H3K4me3, the region is a promoter". To test our model, we used available genome-wide datasets of H3K4 methyltransferases knockouts. 
Our analysis suggests that CXXC proteins, as readers of non-methylated CpGs, would regulate the "seesaw" mechanism that focuses H3K4me3 to unmethylated sites, while being repulsed from H3K4me1 decorated enhancers and CpG island shores. CONCLUSIONS: Our results show that DNA methylation discriminates promoters from enhancers through the H3K4me1-H3K4me3 seesaw mechanism, and suggest its possible function in the inheritance of chromatin marks after cell division. Our analyses suggest aberrant formation of promoter-like regions and ectopic transcription of hypomethylated regions of DNA. Such a mechanism can have important implications for biological processes in which abnormal DNA methylation status has been reported, such as cancer and aging.


Subjects
DNA Methylation , Genetic Enhancer Elements , Histone Code , Genetic Promoter Regions , Animals , Cytosine/metabolism , DNA-Binding Proteins/chemistry , DNA-Binding Proteins/metabolism , Gene Expression , Histones/metabolism , Mice , Protein Domains
3.
Neuroimage ; 159: 289-301, 2017 10 01.
Article in English | MEDLINE | ID: mdl-28782679

ABSTRACT

In free visual exploration, eye-movement is immediately followed by dynamic reconfiguration of brain functional connectivity. We studied the task-dependency of this process in a combined visual search-change detection experiment. Participants viewed two (nearly) identical displays in succession. The first time they had to find and remember multiple targets among distractors, so the ongoing task involved memory encoding. The second time they had to determine if a target had changed in orientation, so the ongoing task involved memory retrieval. From multichannel EEG recorded during 200 ms intervals time-locked to fixation onsets, we estimated the functional connectivity using a weighted phase lag index at the frequencies of theta, alpha, and beta bands, and derived global and local measures of the functional connectivity graphs. We found differences between the two memory task conditions for several network measures, such as mean path length, radius, diameter, closeness and eccentricity, mainly in the alpha band. Both the local and the global measures indicated that encoding involved a more segregated mode of operation than retrieval. These differences arose immediately after fixation onset and persisted for the entire duration of the lambda complex, an evoked potential commonly associated with early visual perception. We concluded that encoding and retrieval differentially shape network configurations involved in early visual perception, affecting the way the visual input is processed at each fixation. These findings demonstrate that task requirements dynamically control the functional connectivity networks involved in early visual perception.
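
As a rough illustration of the connectivity estimate mentioned above, the weighted phase lag index is commonly defined as the absolute mean of the imaginary part of the cross-spectrum divided by the mean of its absolute imaginary part. The sketch below assumes the per-trial cross-spectral values for one channel pair and one frequency are already available; the function name and the sample values are hypothetical, and the abstract does not give the estimator settings used in the study.

```python
def weighted_phase_lag_index(cross_spectra):
    """Compute |sum(Im S)| / sum(|Im S|) over per-trial cross-spectral values."""
    imaginary_parts = [value.imag for value in cross_spectra]
    denominator = sum(abs(part) for part in imaginary_parts)
    if denominator == 0:
        # No imaginary component at all: treat the index as zero.
        return 0.0
    return abs(sum(imaginary_parts)) / denominator


# Hypothetical cross-spectral values for three fixation-locked epochs.
trials = [1 + 2j, 3 + 1j, 2 - 1j]
print(weighted_phase_lag_index(trials))  # 0.5
```

Values near 1 indicate a consistent phase lead or lag between the two channels, while values near 0 indicate no consistent lag.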


Subjects
Memory/physiology , Neural Pathways/physiology , Visual Perception/physiology , Adolescent , Adult , Behavior , Electroencephalography , Eye Movements/physiology , Female , Humans , Male , Nerve Net/physiology , Photic Stimulation , Young Adult
4.
BMC Bioinformatics ; 18(1): 30, 2016 Nov 03.
Article in English | MEDLINE | ID: mdl-27809781

ABSTRACT

BACKGROUND: Copy Number Variation (CNV) is envisaged to be a major source of large structural variations in the human genome. In recent years, many studies have applied Next Generation Sequencing (NGS) data to CNV detection. However, more accurate computational tools are still needed. RESULTS: In this study, mate pair NGS data are used for CNV detection in a Hidden Markov Model (HMM). The proposed HMM has position-specific emission probabilities, i.e. a Gaussian mixture distribution. Each component in the Gaussian mixture distribution captures a different type of aberration that is observed in the mate pairs after they are mapped to the reference genome. These aberrations may include any increase (decrease) in the insertion size or change in the direction of mate pairs that are mapped to the reference genome. This HMM with Position-Specific Emission probabilities (PSE-HMM) is utilized for the genome-wide detection of deletions and tandem duplications. The performance of PSE-HMM is evaluated on a simulated dataset and also on real data from a Yoruban HapMap individual, NA18507. CONCLUSIONS: PSE-HMM is effective in taking observation dependencies into account and reaches a high accuracy in detecting genome-wide CNVs. MATLAB programs are available at http://bs.ipm.ir/softwares/PSE-HMM/ .


Subjects
Algorithms , DNA Copy Number Variations , Human Genome , High-Throughput Nucleotide Sequencing/methods , DNA Sequence Analysis/methods , Data Accuracy , Genomics/methods , Humans , Probability
5.
Proteins ; 82(9): 1937-46, 2014 Sep.
Article in English | MEDLINE | ID: mdl-24596179

ABSTRACT

Decomposition of structural domains is an essential task in classifying protein structures, predicting protein function, and many other proteomics problems. As the number of known protein structures in PDB grows exponentially, the need for accurate automatic domain decomposition methods becomes more essential. In this article, we introduce a bottom-up algorithm for assigning protein domains using a graph theoretical approach. This algorithm is based on a center-based clustering approach. For constructing initial clusters, members of an independent dominating set for the graph representation of a protein are considered as the centers. A distance matrix is then defined for these clusters. To obtain final domains, these clusters are merged using the compactness principle of domains and a method similar to the neighbor-joining algorithm considering some thresholds. The thresholds are computed using a training set consisting of 50 protein chains. The algorithm is implemented using C++ language and is named ProDomAs. To assess the performance of ProDomAs, its results are compared with seven automatic methods, against five publicly available benchmarks. The results show that ProDomAs outperforms other methods applied on the mentioned benchmarks. The performance of ProDomAs is also evaluated against 6342 chains obtained from ASTRAL SCOP 1.71. ProDomAs is freely available at http://www.bioinf.cs.ipm.ir/software/prodomas.


Subjects
Tertiary Protein Structure , Proteins/chemistry , Proteins/classification , Algorithms , Amino Acid Sequence , Cluster Analysis , Computational Biology , Proteomics , Protein Sequence Analysis
6.
J Biopharm Stat ; 24(4): 715-31, 2014.
Article in English | MEDLINE | ID: mdl-24697665

ABSTRACT

In this article, we discuss an optimization approach to the sample size question, founded on maximizing the value of information in comparison studies with binary responses. The expected value of perfect information (EVPI) is calculated and the optimal sample size is obtained by maximizing the expected net gain of sampling (ENGS), the difference between the expected value of sample information (EVSI) and the cost of conducting the trial. The data are assumed to come from two independent binomial distributions, while the parameter of interest is the difference between the two success probabilities, [Formula: see text]. To formulate our prior knowledge on the parameters, a Dirichlet prior is used. Monte Carlo integration is used in the computation and optimization of ENGS. We also compare the results of this approach with existing Bayesian methods and show how the new approach reduces the computational complexity considerably.
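
The relationship stated in the abstract can be written compactly as below. The symbols are chosen here for illustration only: n is the sample size, C(n) is an assumed notation for the cost of conducting a trial of that size, and n* is the optimal sample size.

```latex
\mathrm{ENGS}(n) = \mathrm{EVSI}(n) - C(n),
\qquad
n^{*} = \arg\max_{n} \, \mathrm{ENGS}(n)
```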


Subjects
Bayes Theorem , Monte Carlo Method , Sample Size , Humans
7.
Genomics ; 102(5-6): 507-14, 2013.
Article in English | MEDLINE | ID: mdl-24161398

ABSTRACT

Recent advances in the sequencing technologies have provided a handful of RNA-seq datasets for transcriptome analysis. However, reconstruction of full-length isoforms and estimation of the expression level of transcripts with a low cost are challenging tasks. We propose a novel de novo method named SSP that incorporates interval integer linear programming to resolve alternatively spliced isoforms and reconstruct the whole transcriptome from short reads. Experimental results show that SSP is fast and precise in determining different alternatively spliced isoforms along with the estimation of reconstructed transcript abundances. The SSP software package is available at http://www.bioinf.cs.ipm.ir/software/ssp.


Subjects
Linear Programming , RNA Isoforms/analysis , RNA Sequence Analysis/methods , Alternative Splicing , Gene Expression Profiling/methods , Linear Programming/economics , RNA Sequence Analysis/economics , Software , Transcriptome
8.
J Autism Dev Disord ; 53(5): 2050-2061, 2023 May.
Article in English | MEDLINE | ID: mdl-35220523

ABSTRACT

Autism spectrum disorders (ASD) are strikingly more prevalent in males, but the molecular mechanisms responsible for ASD sex-differential risk are poorly understood. Abnormally shorter telomeres have been associated with autism. Examination of relative telomere lengths (RTL) among non-syndromic male (N = 14) and female (N = 10) children with autism revealed that only autistic male children had significantly shorter RTL than typically-developing controls (N = 24) and paired siblings (N = 10). While average RTL of autistic girls did not differ significantly from controls, it was substantially longer than autistic boys. Our findings indicate a sexually-dimorphic pattern of RTL in childhood autism and could have important implications for RTL as a potential biomarker and the role/s of telomeres in the molecular mechanisms responsible for ASD sex-biased prevalence and etiology.


Subjects
Autism Spectrum Disorder , Autistic Disorder , Child , Humans , Male , Female , Autistic Disorder/genetics , Autism Spectrum Disorder/genetics , Sex Characteristics , Biomarkers , Telomere
9.
BMC Bioinformatics ; 11: 16, 2010 Jan 09.
Article in English | MEDLINE | ID: mdl-20064218

ABSTRACT

BACKGROUND: An energy function that can distinguish a correct protein fold from incorrect ones is very important for protein structure prediction and protein folding. Knowledge-based mean force potentials are certainly the most popular type of interaction function for protein threading. They are derived from statistical analyses of interacting groups in experimentally determined protein structures. These potentials are developed at the atom or the amino acid level. Based on orientation-dependent contact area, a new type of knowledge-based mean force potential has been developed. RESULTS: We developed a new approach to calculate a knowledge-based potential of mean force, using pairwise residue contact area. To test the performance of our approach, we applied it to several decoy sets to measure its ability to discriminate native structures from decoys. This potential was able to distinguish native structures from the decoys in most cases. Further, the calculated Z-scores were quite high for all protein datasets. CONCLUSIONS: This knowledge-based potential of mean force can be used in protein structure prediction, fold recognition, comparative modelling and molecular recognition. The program is available at http://www.bioinf.cs.ipm.ac.ir/softwares/surfield.


Subjects
Computational Biology/methods , Protein Conformation , Proteins/chemistry , Binding Sites , Protein Databases , Protein Folding
10.
IEEE/ACM Trans Comput Biol Bioinform ; 17(5): 1555-1562, 2020.
Article in English | MEDLINE | ID: mdl-30990436

ABSTRACT

The joint graphical lasso (JGL) approach is a Gaussian graphical model to estimate multiple graphical models corresponding to distinct but related groups. Molecular apocrine (MA) breast cancer tumors have similar characteristics to the luminal and basal subtypes. Due to the relationship between MA tumors and the two other subtypes, this paper investigates the similarities and differences between the MA gene association network and the ones corresponding to other tumors by taking advantage of JGL properties. Two distinct JGL graphical models are applied to two sub-datasets including the gene expression information of the MA and the luminal tumors and also the MA and the basal tumors. Then, topological comparisons between the networks, such as finding the shared edges, are applied. In addition, several support vector machine (SVM) classification models are built to assess the power of some critical nodes in the networks, like hub nodes, to discriminate the tumor samples. Applying the JGL approach provides an appropriate tool to observe the networks of the MA tumor and other subtypes in one map. The results obtained by comparing the networks could be helpful to generate new insight about MA tumors for future studies.


Subjects
Tumor Biomarkers/genetics , Breast Neoplasms , Transcriptome/genetics , Breast Neoplasms/classification , Breast Neoplasms/genetics , Breast Neoplasms/pathology , Computational Biology , Genetic Databases , Female , Humans , Support Vector Machine , Transcription Factors/genetics
11.
Sci Rep ; 10(1): 8384, 2020 05 20.
Article in English | MEDLINE | ID: mdl-32433480

ABSTRACT

Since the world population is ageing, dementia is going to be a growing concern. Alzheimer's disease is the most common form of dementia. The pathogenesis of Alzheimer's disease is extensively studied, yet remains unknown. Therefore, we aimed to extract new knowledge from existing data. We analysed about 2700 upregulated genes and 2200 downregulated genes from three studies on the CA1 of the hippocampus of brains with Alzheimer's disease. We found that only the calcium signalling pathway, enriched by 48 downregulated genes, was consistent between all three studies. We predicted miR-129 to target nine out of 48 genes. Then, we validated miR-129 to regulate six out of nine genes in HEK cells. We noticed that four out of six genes play a role in synaptic plasticity. Finally, we confirmed the upregulation of miR-129 in the hippocampus of brains of rats with scopolamine-induced amnesia as a model of Alzheimer's disease. We suggest that future research should investigate the possible role of miR-129 in synaptic plasticity and Alzheimer's disease. This paper presents a novel framework to gain insight into potential biomarkers and targets for diagnosis and treatment of diseases.


Subjects
Alzheimer Disease/metabolism , Alzheimer Disease/physiopathology , Brain/metabolism , Brain/physiopathology , Hippocampus/physiology , Neuronal Plasticity/physiology , Animals , Male , Microarray Analysis , Rats
12.
BMC Bioinformatics ; 10: 269, 2009 Aug 27.
Article in English | MEDLINE | ID: mdl-19712447

ABSTRACT

BACKGROUND: Global partitioning based on pairwise associations of SNPs has not previously been used to define haplotype blocks within genomes. Here, we define an association index based on LD between SNP pairs. We use the Fisher's exact test to assess the statistical significance of the LD estimator. By this test, each SNP pair is characterized as associated, independent, or not-statistically-significant. We set limits on the maximum acceptable proportion of independent pairs within all blocks and search for the partitioning with maximal proportion of associated SNP pairs. Essentially, this model is reduced to a constrained optimization problem, the solution of which is obtained by iterating a dynamic programming algorithm. RESULTS: In comparison with other methods, our algorithm reports blocks of larger average size. Nevertheless, the haplotype diversity within the blocks is captured by a small number of tagSNPs. Resampling HapMap haplotypes under a block-based model of recombination showed that our algorithm is robust in reproducing the same partitioning for recombinant samples. Our algorithm performed better than previously reported models in a case-control association study aimed at mapping a single locus trait, based on simulation results that were evaluated by a block-based statistical test. Compared to methods of haplotype block partitioning, we performed best on detection of recombination hotspots. CONCLUSION: Our proposed method divides chromosomes into the regions within which allelic associations of SNP pairs are maximized. This approach presents a native design for dimension reduction in genome-wide association studies. Our results show that the pairwise allelic association of SNPs can describe various features of genomic variation, in particular recombination hotspots.
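
To make the significance step concrete, here is a minimal two-sided Fisher's exact test on a 2x2 table of haplotype counts for one SNP pair. The function name, the example counts and the small tolerance are assumptions made for illustration. The abstract does not specify the LD estimator or the cut-offs used to label a pair as associated, independent, or not statistically significant, so those steps are not shown.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2

    def table_probability(x):
        # Hypergeometric probability of seeing x in the top-left cell
        # when all row and column totals are held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    observed = table_probability(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum the probabilities of all tables at least as extreme as the observed one.
    return sum(
        table_probability(x)
        for x in range(lowest, highest + 1)
        if table_probability(x) <= observed * (1 + 1e-9)
    )


# Hypothetical haplotype counts for two biallelic SNPs: AB, Ab, aB, ab.
p_value = fisher_exact_two_sided(3, 0, 0, 3)
print(p_value)
```

Exact integer binomial coefficients keep the sketch free of external dependencies. For large sample sizes a library routine would be the more practical choice.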


Subjects
Algorithms , Computational Biology/methods , Haplotypes , Single Nucleotide Polymorphism , Alleles , Genetic Variation , Genome-Wide Association Study/methods
13.
Proteins ; 77(2): 454-63, 2009 Nov 01.
Article in English | MEDLINE | ID: mdl-19452553

ABSTRACT

The purpose of this article is to introduce a novel model for discriminating correctly folded proteins from well designed decoy structures using mechanical interatomic forces. In our model, we consider a protein as a collection of springs and the force imposed to each atom is calculated. A potential function is obtained from statistical contact preferences within known protein structures. Combining this function with the spring equation, the interatomic forces are calculated. Finally, we consider a structure and define a score function on the 3D structure of a protein. We compare the force imposed to each atom of a protein with the corresponding atom in the other structures. We then assign larger scores to those atoms with lower forces. The total score is the sum of partial scores of atoms. The optimal structure is assumed to be the one with the highest score in the data set. To evaluate the performance of our model, we apply it on several decoy sets.


Subjects
Computational Biology , Proteins/chemistry , Computer Simulation , Protein Conformation , Protein Folding , Nonparametric Statistics
14.
Stat Methods Med Res ; 18(2): 183-94, 2009 Apr.
Article in English | MEDLINE | ID: mdl-18445695

ABSTRACT

Sample size computations are largely based on frequentist or classical methods. In the Bayesian approach the prior information on the unknown parameters is taken into account. In this work we consider a fully Bayesian approach to the sample size determination problem which was introduced by Grundy et al. and developed by Lindley. This approach treats the problem as a decision problem and employs a utility function to find the optimal sample size of a trial. Furthermore, we assume that a regulatory authority, which is deciding on whether or not to grant a licence to a new treatment, uses a frequentist approach. We then find the optimal sample size for the trial by maximising the expected net benefit, which is the expected benefit of subsequent use of the new treatment minus the cost of the trial.


Subjects
Bayes Theorem , Clinical Trials as Topic/statistics & numerical data , Sample Size , Biometry , Clinical Trials as Topic/economics , Clinical Trials as Topic/legislation & jurisprudence , Commerce , Humans , Licensure , Statistical Models , Public Health
15.
Math Biosci ; 217(2): 145-50, 2009 Feb.
Article in English | MEDLINE | ID: mdl-19046975

ABSTRACT

Prediction of protein secondary structure is an important step towards elucidating its three dimensional structure and its function. This is a challenging problem in bioinformatics. Segmental semi Markov models (SSMMs) are one of the best studied methods in this field. However, incorporating evolutionary information to these methods is somewhat difficult. On the other hand, the systems of multiple neural networks (NNs) are powerful tools for multi-class pattern classification which can easily be applied to take these sorts of information into account. To overcome the weakness of SSMMs in prediction, in this work we consider a SSMM as a decision function on outputs of three NNs that uses multiple sequence alignment profiles. We consider four types of observations for outputs of a neural network. Then profile table related to each sequence is reduced to a sequence of four observations. In order to predict secondary structure of each amino acid we need to consider a decision function. We use an SSMM on outputs of three neural networks. The proposed SSMM has discriminative power and weights over different dependency models for outputs of neural networks. The results show that the accuracy of our model in predictions, particularly for strands, is considerably increased.


Subjects
Markov Chains , Neural Networks (Computer) , Secondary Protein Structure , Proteins/chemistry , Sequence Alignment
16.
BMC Bioinformatics ; 9: 357, 2008 Aug 31.
Article in English | MEDLINE | ID: mdl-18759992

ABSTRACT

BACKGROUND: The problem of accurate prediction of protein secondary structure continues to be one of the challenging problems in Bioinformatics. It has been previously suggested that amino acid relative solvent accessibility (RSA) might be an effective factor for increasing the accuracy of protein secondary structure prediction. Previous studies have either used a single constant threshold to classify residues into discrete classes (buried vs. exposed), or used the real-value predicted RSAs in their prediction method. RESULTS: We studied the effect of applying different RSA threshold types (namely, fixed thresholds vs. residue-dependent thresholds) on a variety of secondary structure prediction methods. Using DSSP-assigned RSA values, we found that improvement in the accuracy of prediction strictly depends on the selected threshold(s). Furthermore, we showed that choosing a single threshold for all amino acids is not the best possible parameter. We therefore used residue-dependent thresholds, and most residues showed improvement in prediction. Next, we considered predicted RSA values, since in the real-world problem the protein sequence is the only available information. We first predicted the RSA classes with the RVP-net program and then used these data in our method. Using this approach, improvement in prediction was also obtained. CONCLUSION: The success of applying the RSA information to different secondary structure prediction methods suggests that prediction accuracy can be improved independent of prediction approaches. Thus, solvent accessibility can be considered a rich source of information to help improve these methods.


Subjects
Amino Acids/chemistry , Chemical Models , Molecular Models , Proteins/chemistry , Proteins/ultrastructure , Protein Sequence Analysis/methods , Amino Acid Sequence , Computer Simulation , Molecular Sequence Data , Secondary Protein Structure , Surface Properties
17.
BMC Bioinformatics ; 9: 274, 2008 Jun 11.
Article in English | MEDLINE | ID: mdl-18547401

ABSTRACT

BACKGROUND: It has been previously shown that palindromic sequences are frequently observed in proteins. However, our knowledge about their evolutionary origin and their possible importance is incomplete. RESULTS: In this work, we tried to revisit this relatively neglected phenomenon. Several questions are addressed in this work. (1) It is known that there is a large chance of finding a palindrome in low complexity sequences (i.e. sequences with extreme amino acid usage bias). What is the role of sequence complexity in the evolution of palindromic sequences in proteins? (2) Do palindromes coincide with conserved protein sequences? If yes, what are the functions of these conserved segments? (3) In case of conserved palindromes, is it always the case that the whole conserved pattern is also symmetrical? (4) Do palindromic protein sequences form regular secondary structures? (5) Does sequence similarity of the two "sides" of a palindrome imply structural similarity? For the first question, we showed that the complexity of palindromic peptides is significantly lower than randomly generated palindromes. Therefore, one can say that palindromes occur frequently in low complexity protein segments, without necessarily having a defined function or forming a special structure. Nevertheless, this does not rule out the possibility of finding palindromes which play some roles in protein structure and function. In fact, we found several palindromes that overlap with conserved protein Blocks of different functions. However, in many cases we failed to find any symmetry in the conserved regions of corresponding Blocks. Furthermore, to answer the last two questions, the structural characteristics of palindromes were studied. It is shown that palindromes may have a great propensity to form alpha-helical structures. Finally, we demonstrated that the two sides of a palindrome generally do not show significant structural similarities. 
CONCLUSION: We suggest that the puzzling abundance of palindromic sequences in proteins is mainly due to their frequent concurrence with low-complexity protein regions, rather than a global role in the protein function. In addition, palindromic sequences show a relatively high tendency to form helices, which might play an important role in the evolution of proteins that contain palindromes. Moreover, reverse similarity in peptides does not necessarily imply significant structural similarity. This observation rules out the importance of palindromes for forming symmetrical structures. Although palindromes frequently overlap with conserved Blocks, we suggest that palindromes overlap with Blocks only by coincidence, rather than being involved with a certain structural fold or protein domain.
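
For readers unfamiliar with the term, a palindromic protein segment reads the same in both directions at the amino acid level. The brute-force scan below is only a sketch: the function name, the minimum length of 5 and the example sequence are assumptions, not parameters taken from the study.

```python
def find_palindromes(sequence, min_length=5):
    """Return (start_index, segment) for every palindromic substring."""
    hits = []
    length = len(sequence)
    for start in range(length):
        for end in range(start + min_length, length + 1):
            segment = sequence[start:end]
            # A palindrome equals its own reverse.
            if segment == segment[::-1]:
                hits.append((start, segment))
    return hits


# Hypothetical peptide; one-letter amino acid codes.
print(find_palindromes("MKAEAKLGG"))  # [(1, 'KAEAK')]
```

The scan tests every substring, which is adequate for single proteins but would need a faster algorithm for database-scale searches.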


Subjects
Amino Acid Sequence/physiology , Computational Biology/methods , Proteins/analysis , Amino Acids/analysis , Binding Sites/genetics , Conserved Sequence/physiology , Protein Databases , Molecular Evolution , Automated Pattern Recognition , Secondary Protein Structure/physiology , Proteins/ultrastructure , Sequence Alignment , Amino Acid Sequence Homology , Nonparametric Statistics , Structure-Activity Relationship
18.
Comput Biol Chem ; 32(6): 406-11, 2008 Dec.
Article in English | MEDLINE | ID: mdl-18789769

ABSTRACT

The automatic assignment of secondary structure from three-dimensional atomic coordinates of proteins is an essential step for the analysis and modeling of protein structures. Accordingly, various methods based on different criteria have been designed to perform this task. We introduce a new method for protein secondary structure assignment based solely on C(alpha) coordinates. We introduce four relations between the C(alpha) three-dimensional coordinates of consecutive residues, each of which applies to one of the four regular secondary structure categories: alpha-helix, 3(10)-helix, pi-helix and beta-strand. In our approach, the deviation of the C(alpha) coordinates of each residue from each relation is calculated. Based on these deviation values, secondary structures are assigned to all residues of a protein. We show that our method agrees well with popular methods such as DSSP and STRIDE and with the assignments in PDB files. It is shown that our method gives more information about helix geometry, leading to more accurate secondary structure assignment.


Subjects
Proteins/chemistry , Secondary Protein Structure
19.
J Theor Biol ; 251(2): 380-7, 2008 Mar 21.
Article in English | MEDLINE | ID: mdl-18177672

ABSTRACT

With large amounts of experimental data, modern molecular biology needs appropriate methods to deal with biological sequences. In this work, we apply a statistical method (Pearson's chi-square test) to recognize the signals that appear in the whole genome of Escherichia coli. To show the effectiveness of the method, we compare Pearson's chi-square test with linguistic complexity on the complete genome of E. coli. The results suggest that Pearson's chi-square test is an efficient method for distinguishing genes (coding regions) from pseudogenes (noncoding regions). On the other hand, the performance of linguistic complexity is much lower than that of the chi-square test method. We also use the Pearson's chi-square test method to determine which parts of the Open Reading Frame (ORF) have a significant effect on discriminating genes from pseudogenes. Moreover, different complexity measures and Pearson's chi-square test were applied to the genes with a high value of Pearson's chi-square statistic. We also computed the measures on homologs of these genes. The results illustrate that there is a region near the start codon with a high value of the chi-square statistic and low complexity that is conserved between homologous genes.
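
Pearson's chi-square statistic itself is simple to compute: it sums the squared differences between observed and expected counts, each divided by the expected count. The sketch below uses hypothetical nucleotide counts in a sequence window against a uniform expectation. The abstract does not state which categories or expected frequencies the study used, so these are placeholders.

```python
def pearson_chi_square(observed, expected):
    """Return the sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts of A, C, G, T in a 100-base window,
# compared with a uniform expectation of 25 per base.
observed_counts = [30, 20, 25, 25]
expected_counts = [25, 25, 25, 25]

print(pearson_chi_square(observed_counts, expected_counts))  # 2.0
```

Larger values indicate a stronger departure of the window's composition from the expected distribution.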


Subjects
Escherichia coli/genetics , Bacterial Genome , Open Reading Frames , Base Sequence , Chi-Square Distribution , Computational Biology , Conserved Sequence , Molecular Sequence Data , Pseudogenes , Sequence Homology
20.
Sci Rep ; 8(1): 4009, 2018 03 05.
Article in English | MEDLINE | ID: mdl-29507384

ABSTRACT

Currently, a few tools are capable of detecting genome-wide Copy Number Variations (CNVs) based on sequencing of multiple samples. Although aberrations in mate pair insertion sizes provide additional hints for CNV detection based on multiple samples, the majority of the current tools rely only on the depth of coverage. Here, we propose a new algorithm (MSeq-CNV) which allows detecting common CNVs across multiple samples. MSeq-CNV applies a mixture density for modeling aberrations in depth of coverage and abnormalities in the mate pair insertion sizes. Each component in this mixture density applies a Binomial distribution for modeling the number of mate pairs with aberration in the insertion size and also a Poisson distribution for emitting the read counts, in each genomic position. MSeq-CNV is applied to simulated data and also to real data of six HapMap individuals with high-coverage sequencing in the 1000 Genomes Project. These individuals include a CEU trio of European ancestry and a YRI trio of Nigerian ethnicity. Ancestry of these individuals is studied by clustering the identified CNVs. MSeq-CNV is also applied for detecting CNVs in two samples with low-coverage sequencing in the 1000 Genomes Project and six samples from the Simons Genome Diversity Project.
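
A mixture density of the kind described, where each component multiplies a Binomial term for aberrant mate pairs by a Poisson term for read counts, might be evaluated at one genomic position as sketched below. All function names, the component tuple layout and the parameter values are hypothetical. The abstract does not describe how MSeq-CNV parameterizes or fits its components, so this is not a reproduction of that algorithm.

```python
from math import comb, exp, factorial


def binomial_pmf(successes, trials, probability):
    """Probability of `successes` out of `trials` with per-trial `probability`."""
    return (
        comb(trials, successes)
        * probability ** successes
        * (1 - probability) ** (trials - successes)
    )


def poisson_pmf(count, rate):
    """Probability of observing `count` events under a Poisson with mean `rate`."""
    return rate ** count * exp(-rate) / factorial(count)


def mixture_density(read_count, aberrant_pairs, total_pairs, components):
    """Sum over components of weight * Binomial(aberrant pairs) * Poisson(reads).

    Each component is an assumed (weight, aberration_probability, read_rate) tuple.
    """
    return sum(
        weight
        * binomial_pmf(aberrant_pairs, total_pairs, aberration_probability)
        * poisson_pmf(read_count, read_rate)
        for weight, aberration_probability, read_rate in components
    )


# Hypothetical two-component mixture: a "normal" state and a "deletion-like" state.
example_components = [(0.9, 0.05, 30.0), (0.1, 0.6, 15.0)]
print(mixture_density(read_count=18, aberrant_pairs=7, total_pairs=12,
                      components=example_components))
```

In this made-up example the low read count and the high share of aberrant pairs are both better explained by the second component, which is the kind of joint evidence the abstract describes.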


Subjects
DNA Copy Number Variations , DNA Sequence Analysis/standards , Algorithms , Gene Deletion , Human Genome , HapMap Project , Heterozygote , Homozygote , Humans , Poisson Distribution , DNA Sequence Analysis/methods