Results 1 - 20 of 24
1.
Data Brief ; 10: 315-324, 2017 Feb.
Article in English | MEDLINE | ID: mdl-28004021

ABSTRACT

We present data on the evolution of intrinsically disordered regions (IDRs) across the entire human protein kinome. The evolutionary data of the IDRs are reported with respect to the kinase domains (KDs) and to the kinases as whole proteins (WP). We also report the post-translational modifications of FAK1 IDRs and their contribution to cytoskeletal remodeling, as well as the data used to build a protein-protein interaction (PPI) network of primary and secondary FAK1-interacting hybrid proteins. Detailed analysis of the data and its effect on FAK1-related functions is described in "Structural pliability adjacent to the kinase domain highlights contribution of FAK1 IDRs to cytoskeletal remodeling" (Kathiriya et al., 2016) [1].

2.
Biochim Biophys Acta Proteins Proteom ; 1865(1): 43-54, 2017 Jan.
Article in English | MEDLINE | ID: mdl-27718363

ABSTRACT

Therapeutic protein kinase inhibitors are designed on the basis of kinase structures. Here, we define intrinsically disordered regions (IDRs) in structurally hybrid kinases. We reveal that 65% of kinases have an IDR adjacent to their kinase domain (KD). These IDRs are evolutionarily more conserved than IDRs distant from KDs. Strikingly, 36 kinases have adjacent IDRs extending into their KDs, defining a unique structural and functional subset of the kinome. Functional network analysis of this subset uncovered FAK1 as the topologically most connected hub kinase. We find that the KD-flanking IDR of FAK1 is more conserved and undergoes more post-translational modifications than other IDRs. It preferentially interacts with proteins regulating scaffolding and kinase activity, which contribute to cytoskeletal remodeling. In summary, spatially and evolutionarily conserved IDRs in kinases may influence their functions, which can be exploited for targeted therapies in diseases, including those that involve aberrant cytoskeletal remodeling.


Subject(s)
Cytoskeleton/metabolism , Focal Adhesion Kinase 1/chemistry , Cytoskeleton/enzymology , Focal Adhesion Kinase 1/metabolism , Intrinsically Disordered Proteins/chemistry , Intrinsically Disordered Proteins/metabolism , Protein Conformation , Protein Processing, Post-Translational
3.
Data Brief ; 6: 715-21, 2016 Mar.
Article in English | MEDLINE | ID: mdl-26870755

ABSTRACT

Our analysis examines the conservation of multiprotein complexes among metazoa through high-resolution biochemical fractionation and precision mass spectrometry applied to soluble cell extracts from five representative model organisms: Caenorhabditis elegans, Drosophila melanogaster, Mus musculus, Strongylocentrotus purpuratus, and Homo sapiens. The interaction network obtained from the data was validated globally in four distant species (Xenopus laevis, Nematostella vectensis, Dictyostelium discoideum, Saccharomyces cerevisiae) and locally by targeted affinity-purification experiments. Here we provide details of our extensive supporting data: the biochemical fractionation data are available via ProteomeXchange (PXD002319-PXD002328), the PPIs via BioGRID (185267), and the interaction network projections at http://metazoa.med.utoronto.ca, all made fully accessible to allow further exploration. The datasets are related to the research article on metazoan macromolecular complexes in Nature [1].

4.
Nature ; 525(7569): 339-44, 2015 Sep 17.
Article in English | MEDLINE | ID: mdl-26344197

ABSTRACT

Macromolecular complexes are essential to conserved biological processes, but their prevalence across animals is unclear. By combining extensive biochemical fractionation with quantitative mass spectrometry, we directly examined the composition of soluble multiprotein complexes among diverse metazoan models. Using an integrative approach, we generated a draft conservation map, consisting of more than one million putative high-confidence co-complex interactions for species with fully sequenced genomes, that encompasses functional modules present broadly across all extant animals. Clustering reveals a spectrum of conservation, ranging from ancient eukaryotic assemblies that have probably served cellular housekeeping roles for at least one billion years, through ancestral complexes that have accrued contemporary components, to rarer metazoan innovations linked to multicellularity. We validated these projections by independent co-fractionation experiments in evolutionarily distant species, affinity purification and functional analyses. The comprehensiveness, centrality and modularity of these reconstructed interactomes reflect their fundamental mechanistic importance and adaptive value to animal cell systems.


Subject(s)
Evolution, Molecular , Multiprotein Complexes/chemistry , Multiprotein Complexes/metabolism , Protein Interaction Maps , Animals , Datasets as Topic , Humans , Protein Interaction Mapping , Reproducibility of Results , Systems Biology , Tandem Mass Spectrometry
5.
BMC Bioinformatics ; 15: 157, 2014 May 22.
Article in English | MEDLINE | ID: mdl-24886131

ABSTRACT

BACKGROUND: Several methods are available for the detection of covarying positions from a multiple sequence alignment (MSA). If the MSA contains a large number of sequences, information about the proximities between residues derived from covariation maps can be sufficient to predict a protein fold. However, in many cases the structure is already known, and information on the covarying positions can be valuable for understanding the protein's mechanism and dynamic properties. RESULTS: In this study we sought to determine whether a multivariate (multidimensional) extension of traditional mutual information (MI) can be an additional tool to study covariation. The performance of two multidimensional MI (mdMI) methods, designed to remove the effect of ternary/quaternary interdependencies, was tested with a set of 9 MSAs, each containing fewer than 400 sequences, and was shown to be comparable to that of the newest methods based on maximum entropy/pseudolikelihood statistical models of protein sequences. However, while all the methods tested detected a similar number of covarying pairs among the residues separated by < 8 Å in the reference X-ray structures, there was on average less than 65% overlap between the top-scoring pairs detected by methods based on different principles. CONCLUSIONS: Given the large variety of structures and evolutionary histories of different proteins, it is possible that no single best method to detect covariation in all proteins exists, and that for each protein family the best information can be derived by merging/comparing results obtained with different methods. This approach may be particularly valuable when the MSA is small or the quality of the alignment is low, conditions that lead to significant differences in the pairs detected by different methods.


Subject(s)
Sequence Alignment/methods , Sequence Analysis, Protein , Models, Statistical , Protein Structure, Secondary , Proteins/chemistry , Proteins/classification
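
The mdMI methods in the entry above extend traditional mutual information (MI) between alignment columns. As a minimal sketch of that baseline only (not of the multidimensional methods themselves), pairwise MI can be estimated from raw residue frequencies. The function names and the list-of-strings MSA format are illustrative assumptions, and no gap handling, pseudocounts or sequence weighting are applied.

```python
import math
from collections import Counter
from itertools import combinations


def column(msa, j):
    """Return column j of an MSA given as a list of equal-length strings."""
    return [seq[j] for seq in msa]


def mutual_information(col_a, col_b):
    """Plain pairwise MI (natural log) between two alignment columns.

    MI = sum over (x, y) of p(x,y) * log(p(x,y) / (p(x) * p(y))),
    with probabilities estimated as raw frequencies.
    """
    n = len(col_a)
    if n == 0 or n != len(col_b):
        raise ValueError("columns must be non-empty and of equal length")
    count_a, count_b = Counter(col_a), Counter(col_b)
    count_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (x, y), c in count_ab.items():
        pxy = c / n
        mi += pxy * math.log(pxy / ((count_a[x] / n) * (count_b[y] / n)))
    return mi


def top_pairs(msa, k=5):
    """Score every column pair and return the k highest-MI pairs."""
    length = len(msa[0])
    scores = [((i, j), mutual_information(column(msa, i), column(msa, j)))
              for i, j in combinations(range(length), 2)]
    scores.sort(key=lambda item: item[1], reverse=True)
    return scores[:k]


# Columns 0 and 1 covary perfectly; column 2 is invariant.
print(top_pairs(["AKL", "AKL", "CEL", "CEL"], k=1))
```

Raw-frequency MI is known to be inflated by phylogenetic and small-sample effects, which is part of the motivation for the corrected and multidimensional variants compared in the paper.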
6.
PLoS Genet ; 9(2): e1003280, 2013.
Article in English | MEDLINE | ID: mdl-23468640

ABSTRACT

Expansions of trinucleotide CAG/CTG repeats in somatic tissues are thought to contribute to ongoing disease progression throughout the life of an individual affected by Huntington's disease or myotonic dystrophy. Repeat instability varies widely between individuals with expanded repeats, suggesting the existence of modifiers of repeat instability. Mice with expanded CAG/CTG repeats show variable levels of instability depending upon mouse strain. However, to date the genetic modifiers underlying these differences have not been identified. We show that in liver and striatum the R6/1 Huntington's disease (HD) (CAG)∼100 transgene, when present in a congenic C57BL/6J (B6) background, incurred expansion-biased repeat mutations, whereas the repeat was stable in a congenic BALB/cByJ (CBy) background. Reciprocal congenic mice revealed the Msh3 gene as the determinant of the differences in repeat instability. Expansion bias was observed in congenic mice homozygous for the B6 Msh3 gene on a CBy background, while the CAG tract was stabilized in congenics homozygous for the CBy Msh3 gene on a B6 background. The CAG stabilization was as dramatic as that caused by genetic deficiency of Msh2. The B6 and CBy Msh3 genes had identical promoters but differed in coding regions and showed strikingly different protein levels. The B6 MSH3 variant protein is highly expressed and associated with CAG expansions, while the CBy MSH3 variant protein is expressed at barely detectable levels and associated with CAG stability. The DHFR protein, which is divergently transcribed from a promoter shared with the Msh3 gene, did not differ in level between mouse strains. Thus, naturally occurring MSH3 protein polymorphisms are modifiers of CAG repeat instability, likely through variable MSH3 protein stability. Since evidence supports somatic CAG instability as a modifier and predictor of disease, our data are consistent with the hypothesis that variable levels of CAG instability associated with polymorphisms of DNA repair genes may have prognostic implications for various repeat-associated diseases.


Subject(s)
Huntington Disease/genetics , Proteins/genetics , Trinucleotide Repeat Expansion/genetics , Trinucleotide Repeats/genetics , Animals , Corpus Striatum/metabolism , Disease Models, Animal , Genomic Instability , Humans , Mice , MutS Homolog 3 Protein , Myotonic Dystrophy/genetics , Myotonic Dystrophy/metabolism , Neostriatum/metabolism , Nerve Tissue Proteins/genetics , Nerve Tissue Proteins/metabolism , Polymorphism, Genetic , Protein Stability
7.
Mol Biol Evol ; 30(2): 332-46, 2013 Feb.
Article in English | MEDLINE | ID: mdl-22977115

ABSTRACT

Protein interaction networks play central roles in biological systems, from simple metabolic pathways through complex programs permitting the development of organisms. Multicellularity could only have arisen from a careful orchestration of cellular and molecular roles and responsibilities, all properly controlled and regulated. Disease reflects a breakdown of this organismal homeostasis. To better understand the evolution of interactions whose dysfunction may be contributing factors to disease, we derived the human protein coevolution network using our MatrixMatchMaker algorithm, with the Orthologous MAtrix project (OMA) database as the source of protein orthologs from 103 eukaryotic genomes. We annotated the coevolution network using protein-protein interaction data and many other functional data sources, and we explored the evolutionary rates and dates of emergence of the proteins in our data set. Strikingly, clustering based only on the topology of the coevolution network partitions it into two subnetworks, one generally representing ancient eukaryotic functions and the other representing functions acquired more recently during animal evolution. The latter subnetwork is enriched for proteins with roles in cell-cell communication, the control of cell division, and related multicellular functions. Further annotation using data from genetic disease databases and cancer genome sequences strongly implicates these proteins in both ciliopathies and cancer. The enrichment for such disease markers in the animal network suggests a functional link between these coevolving proteins. Genetic validation corroborates the recruitment of ancient cilia in the evolution of multicellularity.


Subject(s)
Biological Evolution , Cell Communication/physiology , Proteins/genetics , Proteins/metabolism , Animals , Ciliary Motility Disorders/genetics , Ciliary Motility Disorders/metabolism , Cluster Analysis , Databases, Protein , Female , Gene Expression , Humans , Male , Mutation , Neoplasms/genetics , Neoplasms/metabolism , Protein Binding , Protein Interaction Mapping , Protein Interaction Maps
8.
PLoS One ; 7(10): e47108, 2012.
Article in English | MEDLINE | ID: mdl-23091608

ABSTRACT

BACKGROUND: While the conserved positions of a multiple sequence alignment (MSA) are clearly of interest, non-conserved positions can also be important because, for example, destabilizing effects at one position can be compensated by stabilizing effects at another position. Different methods have been developed to recognize the evolutionary relationship between amino acid sites, and to disentangle functional/structural dependencies from historical/phylogenetic ones. METHODOLOGY/PRINCIPAL FINDINGS: We have used two complementary approaches to test the efficacy of these methods. In the first approach, we used a new program, MSAvolve, for the in silico evolution of MSAs; it records a detailed history of all covarying positions and builds a global coevolution matrix as the accumulated sum of individual matrices for the positions forced to covary, the recombinant coevolution, and the stochastic coevolution. We simulated over 1600 MSAs for 8 protein families, which reflect sequences of different sizes and proteins with widely different functions. The calculated coevolution matrices were compared with the coevolution matrices obtained for the same evolved MSAs with different coevolution detection methods. In the second approach, we evaluated the capacity of the different methods to predict close contacts in the representative X-ray structures of an additional 150 protein families using only experimental MSAs. CONCLUSIONS/SIGNIFICANCE: Methods based on the identification of global correlations between pairs were found to be generally superior to methods based only on local correlations in their capacity to identify coevolving residues using either simulated or experimental MSAs. However, the significant variability in the performance of different methods with different proteins suggests that the simulation of MSAs that replicate the statistical properties of the experimental MSA can be a valuable tool to identify the coevolution detection method that is most effective in each case.


Subject(s)
Computational Biology/methods , Evolution, Molecular , Proteins/chemistry , Proteins/genetics , Sequence Alignment , Amino Acid Sequence , Computer Simulation , Molecular Sequence Data , Reproducibility of Results
9.
Cell ; 150(5): 1068-81, 2012 Aug 31.
Article in English | MEDLINE | ID: mdl-22939629

ABSTRACT

Cellular processes often depend on stable physical associations between proteins. Despite recent progress, knowledge of the composition of human protein complexes remains limited. To close this gap, we applied an integrative global proteomic profiling approach, based on chromatographic separation of cultured human cell extracts into more than one thousand biochemical fractions that were subsequently analyzed by quantitative tandem mass spectrometry, to systematically identify a network of 13,993 high-confidence physical interactions among 3,006 stably associated soluble human proteins. Most of the 622 putative protein complexes we report are linked to core biological processes and encompass both candidate disease genes and unannotated proteins to inform on mechanism. Strikingly, whereas larger multiprotein assemblies tend to be more extensively annotated and evolutionarily conserved, human protein complexes with five or fewer subunits are far more likely to be functionally unannotated or restricted to vertebrates, suggesting more recent functional innovations.


Subject(s)
Multiprotein Complexes/analysis , Protein Interaction Maps , Proteins/chemistry , Proteomics/methods , Humans , Tandem Mass Spectrometry
10.
Methods Mol Biol ; 781: 237-56, 2011.
Article in English | MEDLINE | ID: mdl-21877284

ABSTRACT

Bioinformatic methods to predict protein-protein interactions (PPI) via coevolutionary analysis have positioned themselves to compete alongside established in vitro methods, despite a lack of understanding of the underlying molecular mechanisms of the coevolutionary process. Investigating the alignment of coevolutionary predictions of PPI with experimental data can focus the effective scope of prediction and lead to better accuracies. A new rate-based coevolutionary method, MMM, preferentially finds obligate interacting proteins that form complexes, conforming to results from studies based on coimmunoprecipitation coupled with mass spectrometry. Using gold-standard databases as a benchmark for accuracy, MMM surpasses methods based on abundance ratios, suggesting that correlated evolutionary rates may yet be better than coexpression at predicting interacting proteins. At the level of protein domains, coevolution is difficult to detect, even with MMM, except when considering small-scale experimental data involving proteins with multiple domains. Overall, these findings confirm that coevolutionary methods can be confidently used in predicting PPI, either independently or as drivers of coimmunoprecipitation experiments.


Subject(s)
Biological Evolution , Computational Biology , Protein Interaction Mapping/methods , Proteins/chemistry , Proteins/metabolism , Algorithms , Immunoprecipitation , Phylogeny , Protein Binding
11.
Appl Environ Microbiol ; 77(15): 5361-9, 2011 Aug.
Article in English | MEDLINE | ID: mdl-21666017

ABSTRACT

Dehalococcoides spp. are an industrially relevant group of Chloroflexi bacteria capable of reductively dechlorinating contaminants in groundwater environments. Existing Dehalococcoides genomes revealed a high level of sequence identity within this group, including 98 to 100% 16S rRNA sequence identity between strains with diverse substrate specificities. Common molecular techniques for identifying microbial populations often cannot distinguish between Dehalococcoides strains. Here we describe an oligonucleotide microarray probe set designed from clustered Dehalococcoides genes from five different sources (the genomes of strains DET195, CBDB1, BAV1, and VS, and the KB-1 metagenome). This "pangenome" probe set provides coverage of core Dehalococcoides genes as well as strain-specific genes, while optimizing the potential for hybridization to closely related, previously unknown Dehalococcoides strains. The pangenome probe set was compared to probe sets designed independently for each of the five Dehalococcoides strains. The pangenome probe set demonstrated better predictability and higher detection of Dehalococcoides genes than strain-specific probe sets on nontarget strains with <99% average nucleotide identity. An in silico analysis of the expected probe hybridization against the recently released Dehalococcoides strain GT genome and additional KB-1 metagenome sequence data indicated that the pangenome probe set performs more robustly than the combined strain-specific probe sets in the detection of genes not included in the original design. The pangenome probe set represents a highly specific, universal tool for the detection and characterization of Dehalococcoides from contaminated sites. It has the potential to become a common platform for Dehalococcoides-focused research, allowing meaningful comparisons between microarray experiments regardless of the strain examined.


Subject(s)
Bacterial Typing Techniques/methods , Chloroflexi/genetics , Oligonucleotide Array Sequence Analysis/methods , Base Sequence , DNA, Bacterial/analysis , DNA, Bacterial/genetics , Multigene Family , Nucleic Acid Hybridization/genetics , Oligonucleotide Probes/genetics , Proteomics/methods , RNA, Ribosomal, 16S/analysis , RNA, Ribosomal, 16S/genetics , Sequence Alignment , Sequence Analysis, DNA
12.
Biochem Cell Biol ; 88(2): 185-94, 2010 Apr.
Article in English | MEDLINE | ID: mdl-20453921

ABSTRACT

GroEL is a chaperone thought to be essential for bacterial life. However, some species of Mollicutes lack GroEL. We use phylogenetic analysis to show that the presence of GroEL is polyphyletic among the Mollicutes, and that there is evidence for lateral gene transfer of GroEL to Mycoplasma penetrans from the Proteobacteria. Furthermore, we propose that the presence of GroEL in Mycoplasma may be required for invasion of host tissue, suggesting that GroEL may act as an adhesin-invasin.


Subject(s)
Chaperonin 60/genetics , Chaperonin 60/metabolism , Tenericutes/genetics , Tenericutes/metabolism , Chaperonin 60/chemistry , Phylogeny , Tenericutes/chemistry
13.
Microb Biotechnol ; 3(6): 677-90, 2010 Nov.
Article in English | MEDLINE | ID: mdl-21255363

ABSTRACT

One hundred and seventy-one genes encoding potential esterases from 11 bacterial genomes were cloned and overexpressed in Escherichia coli; 74 of the clones produced soluble proteins. All 74 soluble proteins were purified and screened for esterase activity; 36 proteins showed carboxyl esterase activity on short-chain esters, 17 demonstrated arylesterase activity, while 38 proteins did not exhibit any activity towards the test substrates. Esterases from Rhodopseudomonas palustris (RpEST-1, RpEST-2 and RpEST-3), Pseudomonas putida (PpEST-1, PpEST-2 and PpEST-3), Pseudomonas aeruginosa (PaEST-1) and Streptomyces avermitilis (SavEST-1) were selected for detailed biochemical characterization. All of the enzymes showed optimal activity at neutral or alkaline pH, and the half-life of each enzyme at 50°C ranged from < 5 min to over 5 h. PpEST-3, RpEST-1 and RpEST-2 demonstrated the highest specific activity with pNP-esters; these enzymes were also among the most stable at 50°C and in the presence of detergents, polar and non-polar organic solvents, and imidazolium ionic liquids. Accordingly, these enzymes are particularly interesting targets for subsequent application trials. Finally, biochemical and bioinformatic analyses were compared to reveal sequence features that could be correlated to enzymes with arylesterase activity, facilitating subsequent searches for new esterases in microbial genome sequences.


Subject(s)
Bacteria/enzymology , Bacterial Proteins/genetics , Bacterial Proteins/metabolism , Carboxylic Ester Hydrolases/genetics , Carboxylic Ester Hydrolases/metabolism , Genome, Bacterial , Bacterial Proteins/chemistry , Bacterial Proteins/isolation & purification , Carboxylic Ester Hydrolases/chemistry , Carboxylic Ester Hydrolases/isolation & purification , Computational Biology , Enzyme Stability , Hydrogen-Ion Concentration , Substrate Specificity , Temperature
14.
Proteins ; 78(3): 548-58, 2010 Feb 15.
Article in English | MEDLINE | ID: mdl-19768681

ABSTRACT

Correlated mutation analysis (CMA) is an effective approach for predicting functional and structural residue interactions from multiple sequence alignments (MSAs) of proteins. As nearby residues may also play a role in a given functional interaction, we asked whether covarying sites are clustered, and whether this clustering could be used to enhance the predictive power of CMA. A large-scale search for coevolving regions within protein domains revealed that if two sites in an MSA covary, then neighboring sites in the alignment also typically covary, resulting in clusters of covarying residues. The program PatchD (http://www.uhnres.utoronto.ca/labs/tillier/) was developed to measure the covariation between disconnected sequence clusters to reveal patch covariation. Patches that exhibit strong covariation identify multiple residues that are generally near each other in the protein structure, suggesting that the detection of covarying patches can be used in conjunction with traditional CMA approaches to reveal functional interaction partners.


Subject(s)
DNA Mutational Analysis/methods , Models, Genetic , Proteins/chemistry , Proteins/genetics , Amino Acid Sequence , Binding Sites , Cluster Analysis , Conserved Sequence , Genetic Variation , Models, Molecular , Phylogeny , Proteins/metabolism , Sequence Alignment
15.
Genome Res ; 19(10): 1861-71, 2009 Oct.
Article in English | MEDLINE | ID: mdl-19696150

ABSTRACT

Coevolution maintains interactions between phenotypic traits through the process of reciprocal natural selection. Detecting molecular coevolution can expose functional interactions between molecules in the cell, generating insights into biological processes, pathways, and the networks of interactions important for cellular function. Prediction of interaction partners from different protein families exploits the property that interacting proteins can follow similar patterns and relative rates of evolution. However, current methods for detecting coevolution based on the similarity of phylogenetic trees or evolutionary distance matrices have been limited by requiring coevolution over the entire evolutionary history considered, and they are inaccurate in the presence of paralogous copies. We present a novel method for determining coevolving protein partners by finding the largest common submatrix in a given pair of distance matrices, with the size of the largest common submatrix measuring the strength of coevolution. This approach permits us to consider matrices of different size and scale, to find lineage-specific coevolution, and to predict multiple interaction partners. We used MatrixMatchMaker to predict protein-protein interactions in the human genome. We show that proteins that are known to interact physically are more strongly coevolving than proteins that simply belong to the same biochemical pathway. The human coevolution network is highly connected, suggesting many more protein-protein interactions than are currently known from high-throughput and other experimental evidence. The most strongly coevolving proteins suggest interactions that have been maintained over long periods of evolutionary time and are thus likely to be of fundamental importance to cellular function.


Subject(s)
Evolution, Molecular , Gene Regulatory Networks/genetics , Proteins/genetics , Calibration , Computational Biology/methods , Databases, Protein , Forecasting , Genetic Variation , Humans , Metabolic Networks and Pathways/genetics , Phylogeny , Protein Binding/genetics , Protein Interaction Domains and Motifs/genetics , Proteins/metabolism , Sensitivity and Specificity , Sequence Analysis, Protein/methods , Sequence Analysis, Protein/standards , Software/standards
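
The abstract above scores coevolution by the size of the largest common submatrix of two distance matrices. A toy exhaustive search can illustrate the idea; it is not MatrixMatchMaker itself. It assumes both matrices are on the same scale (the published method also handles differences in scale), uses a hypothetical tolerance `tol` to decide when two distances agree, and takes exponential time, so it is only meant for a handful of rows.

```python
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def largest_common_submatrix(a: Matrix, b: Matrix, tol: float = 0.0) -> List[Tuple[int, int]]:
    """Exhaustively find the largest one-to-one mapping of rows i of `a` to
    rows j of `b` such that |a[i][i2] - b[j][j2]| <= tol for every pair of
    mapped rows. Returns the mapping as a list of (i, j) pairs."""
    n, m = len(a), len(b)
    best: List[Tuple[int, int]] = []

    def consistent(mapping, i, j):
        # Row i of `a` may map to row j of `b` only if all distances to the
        # rows mapped so far agree within the tolerance.
        return all(abs(a[i][i2] - b[j][j2]) <= tol for i2, j2 in mapping)

    def extend(i, mapping, used_j):
        nonlocal best
        # Prune: even mapping every remaining row could not beat `best`.
        if len(mapping) + (n - i) <= len(best):
            return
        if i == n:
            best = list(mapping)
            return
        for j in range(m):
            if j not in used_j and consistent(mapping, i, j):
                mapping.append((i, j))
                used_j.add(j)
                extend(i + 1, mapping, used_j)
                mapping.pop()
                used_j.remove(j)
        # Also consider leaving row i unmapped.
        extend(i + 1, mapping, used_j)

    extend(0, [], set())
    return best


# `b` is `a` with its rows and columns permuted, so all three rows match.
a = [[0, 1, 2], [1, 0, 3], [2, 3, 0]]
b = [[0, 2, 3], [2, 0, 1], [3, 1, 0]]
print(largest_common_submatrix(a, b))
```

Under this sketch, the length of the returned mapping plays the role of the coevolution strength described in the abstract, and rows left unmapped correspond to lineages that do not share the coevolutionary pattern.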
16.
Biomol Eng ; 24(3): 321-6, 2007 Sep.
Article in English | MEDLINE | ID: mdl-17502167

ABSTRACT

RNA sequences can form structures that are conserved throughout evolution, and the problem of aligning two RNA secondary structures has been extensively studied. Most previous alignment algorithms require gap-opening and gap-extension penalty parameters as input. The choice of appropriate parameter values is controversial, as there is little biological information to guide their assignment. In this paper, we present an algorithm that circumvents this problem. Instead of finding an optimal alignment with a predefined gap-opening penalty, the algorithm finds the optimal alignment with an exact number of aligned blocks.


Subject(s)
Algorithms , RNA/chemistry , RNA/genetics , Sequence Alignment/methods , Sequence Analysis, RNA/methods , Base Sequence , Molecular Sequence Data , Nucleic Acid Conformation , Sequence Homology, Nucleic Acid
17.
Bioinformatics ; 23(10): 1195-202, 2007 May 15.
Article in English | MEDLINE | ID: mdl-17392329

ABSTRACT

MOTIVATION: With hundreds of completely sequenced microbial genomes available, and advancements in DNA microarray technology, the detection of genes in microbial communities consisting of hundreds of thousands of sequences may be possible. The existing strategies developed for DNA probe design, geared toward identifying specific sequences, are not suitable because they lack the coverage, flexibility and efficiency necessary for applications in metagenomics. METHODS: ProDesign is a tool developed for the selection of oligonucleotide probes to detect members of gene families present in environmental samples. Gene family-specific probe sequences are generated based on specific and shared words, which are found with the spaced seed hashing algorithm. To detect more sequences, sequences sharing some common words are re-clustered into new families, and probes specific to the new families are then generated. RESULTS: The program is very flexible in that it can be used for designing probes for detecting many gene families simultaneously and specifically in one or more genomes. Neither the length nor the melting temperature of the probes needs to be predefined. We have found that ProDesign provides more flexibility, coverage and speed than other software programs used in the selection of probes for genomic and gene family arrays. AVAILABILITY: ProDesign is licensed free of charge to academic users. ProDesign and Supplementary Material can be obtained by contacting the authors. A web server for ProDesign is available at http://www.uhnresearch.ca/labs/tillier/ProDesign/ProDesign.html. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Algorithms , Computational Biology/methods , Multigene Family , Oligonucleotide Probes/genetics , Bacteria/genetics , Genome, Bacterial , Microarray Analysis , Oligonucleotide Array Sequence Analysis , Software
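
ProDesign is described as finding family-specific and shared words with spaced seed hashing. A simplified sketch of that word-finding step follows, assuming a seed string in which '1' marks a kept position and '0' a don't-care position. The seed, the helper names and the rule "present in every member of a family, absent from every other family" are assumptions for illustration, not ProDesign's actual criteria or probe-selection logic.

```python
from typing import Dict, List, Set


def spaced_words(seq: str, seed: str) -> Set[str]:
    """All words read through a spaced seed ('1' = kept, '0' = ignored)."""
    # Seeds conventionally start and end with a kept position.
    if not seed or seed[0] != "1" or seed[-1] != "1":
        raise ValueError("seed should start and end with '1'")
    keep = [i for i, c in enumerate(seed) if c == "1"]
    span = len(seed)
    return {"".join(seq[start + i] for i in keep)
            for start in range(len(seq) - span + 1)}


def family_specific_words(families: Dict[str, List[str]], seed: str) -> Dict[str, Set[str]]:
    """Words shared by every member of a family and absent from all other families."""
    shared: Dict[str, Set[str]] = {}
    present: Dict[str, Set[str]] = {}
    for fam, seqs in families.items():
        word_sets = [spaced_words(s, seed) for s in seqs]
        shared[fam] = set.intersection(*word_sets) if word_sets else set()
        present[fam] = set.union(*word_sets) if word_sets else set()
    result: Dict[str, Set[str]] = {}
    for fam in families:
        # Python sets are hash tables, so these lookups are hash-based.
        others = set().union(*(present[f] for f in families if f != fam))
        result[fam] = shared[fam] - others
    return result


print(family_specific_words({"f1": ["ACGTA", "ACGTT"], "f2": ["TTTTT"]}, "101"))
```

The don't-care positions let a word tolerate a mismatch at those offsets, which is the usual reason spaced seeds are preferred over contiguous k-mers for sensitivity to diverged sequences.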
18.
Evol Bioinform Online ; 2: 77-90, 2007 Jan 14.
Article in English | MEDLINE | ID: mdl-19455203

ABSTRACT

In comparative genomic studies, syntenic groups of homologous sequences in the same order have been used as supplementary information to help determine the orthology of the compared sequences. The assumption is that orthologous gene copies are more likely to share the same genome positions and the same gene neighbors. In this study we defined positional homologs as homologs that also have homologous neighboring genes, and we investigated the usefulness of this distinction for bacterial comparative genomics. We considered the identification of positionally homologous gene pairs in bacterial genomes using protein and DNA sequence level alignments and found that positional homologs had, on average, lower rates of substitution at the DNA level (synonymous substitutions) than duplicate homologs in different genomic locations, regardless of the level of protein sequence divergence (measured with the non-synonymous substitution rate). Since gene order conservation can indicate the accuracy of orthology assignments, we also considered the effect of imposing certain alignment quality requirements on the sensitivity and specificity of identification of protein pairs by BLAST and FASTA when neighboring information is not available and in comparisons where gene order is not conserved. We found that adding a stringency filter based on the second-best hits was an efficient way to remove dubious ortholog identifications in BLAST and FASTA analyses. Gene order conservation and DNA sequence homology are useful to consider in comparative genomic studies, as they may indicate different orthology assignments than protein sequence homology alone.
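
The abstract does not spell out the exact form of the second-best-hit stringency filter. One plausible version, sketched here under that assumption, accepts a query's best hit only when its score exceeds the runner-up's by a hypothetical ratio `min_ratio`. The input format (subject id, bit score) and the default threshold are also assumptions.

```python
from typing import Dict, List, Optional, Tuple

Hit = Tuple[str, float]  # (subject id, bit score); higher is better


def best_hit_with_margin(hits: List[Hit], min_ratio: float = 1.2) -> Optional[str]:
    """Return the best-scoring subject if best / second >= min_ratio, else None."""
    if not hits:
        return None
    ranked = sorted(hits, key=lambda h: h[1], reverse=True)
    if len(ranked) == 1:
        return ranked[0][0]  # no competitor, so nothing to be ambiguous with
    (best_id, best), (_, second) = ranked[0], ranked[1]
    if second <= 0 or best / second >= min_ratio:
        return best_id
    return None  # best and second-best are too close: dubious assignment


def filter_assignments(all_hits: Dict[str, List[Hit]], min_ratio: float = 1.2) -> Dict[str, str]:
    """Keep only queries whose best hit passes the second-best-hit filter."""
    kept: Dict[str, str] = {}
    for query, hits in all_hits.items():
        subject = best_hit_with_margin(hits, min_ratio)
        if subject is not None:
            kept[query] = subject
    return kept


hits = {
    "q1": [("s1", 200.0), ("s2", 100.0)],  # clear winner: kept
    "q2": [("s3", 150.0), ("s4", 140.0)],  # near tie: dropped
    "q3": [("s5", 50.0)],                  # single hit: kept
}
print(filter_assignments(hits))
```

A ratio on bit scores is only one choice; a difference in bit scores or a ratio of E-values would serve the same purpose of discarding assignments where a paralog scores almost as well as the putative ortholog.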

19.
BMC Bioinformatics ; 7: 471, 2006 Oct 24.
Article in English | MEDLINE | ID: mdl-17062146

ABSTRACT

BACKGROUND: There have been many algorithms and software programs implemented for the inference of multiple sequence alignments of protein and DNA sequences. The "true" alignment is usually unknown due to the incomplete knowledge of the evolutionary history of the sequences, making it difficult to gauge the relative accuracy of the programs. RESULTS: We tested nine of the most often used protein alignment programs and compared their results using sequences generated with the simulation software Simprot which creates known alignments under realistic and controlled evolutionary scenarios. We have simulated more than 30,000 alignment sets using various evolutionary histories in order to define strengths and weaknesses of each program tested. We found that alignment accuracy is extremely dependent on the number of insertions and deletions in the sequences, and that indel size has a weaker effect. We also considered benchmark alignments from the latest version of BAliBASE and the results relative to BAliBASE- and Simprot-generated data sets were consistent in most cases. CONCLUSION: Our results indicate that employing Simprot's simulated sequences allows the creation of a more flexible and broader range of alignment classes than the usual methods for alignment accuracy assessment. Simprot also allows for a quick and efficient analysis of a wider range of possible evolutionary histories that might not be present in currently available alignment sets. Among the nine programs tested, the iterative approach available in Mafft (L-INS-i) and ProbCons were consistently the most accurate, with Mafft being the faster of the two.


Subject(s)
Amino Acid Sequence , Proteins/chemistry , Sequence Alignment/methods , Software , Computational Biology , Computer Simulation , Databases, Protein , Gene Deletion , Mutation , Protein Conformation , Proteins/genetics , Sequence Alignment/standards
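
With simulated data such as Simprot's, the true alignment is known, so a program's output can be scored against it directly. A common choice, and an assumption here since the abstract does not name the measure used, is the sum-of-pairs (SP) score: the fraction of residue pairs aligned in the reference that the test alignment also aligns. This sketch expects both alignments as equal-length gapped strings listing the same sequences in the same order, with '-' as the gap character.

```python
from itertools import combinations
from typing import List, Set, Tuple

# (sequence a, residue index in a, sequence b, residue index in b)
Pair = Tuple[int, int, int, int]


def aligned_pairs(alignment: List[str], gap: str = "-") -> Set[Pair]:
    """All residue pairs placed in the same column, indexed by ungapped position."""
    counters = [0] * len(alignment)
    pairs: Set[Pair] = set()
    for col in range(len(alignment[0])):
        residues = []
        for s, row in enumerate(alignment):
            if row[col] != gap:
                residues.append((s, counters[s]))
                counters[s] += 1
        for (s1, p1), (s2, p2) in combinations(residues, 2):
            pairs.add((s1, p1, s2, p2))
    return pairs


def sum_of_pairs_score(test: List[str], reference: List[str]) -> float:
    """Fraction of reference residue pairs recovered by the test alignment."""
    ref = aligned_pairs(reference)
    if not ref:
        return 1.0
    return len(aligned_pairs(test) & ref) / len(ref)


reference = ["AC-GT", "A-CGT"]
print(sum_of_pairs_score(["ACGT", "ACGT"], reference))
print(sum_of_pairs_score(["ACGT-", "-ACGT"], reference))
```

Because residues are identified by their ungapped positions, two alignments of different lengths can still be compared, which is what allows a program's output to be scored against the simulated true alignment.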
20.
Proteins ; 63(4): 822-31, 2006 Jun 01.
Article in English | MEDLINE | ID: mdl-16634043

ABSTRACT

Approaches for determining interacting partners from different protein families (such as ligands and their receptors) have made use of the property that interacting proteins follow similar patterns and relative rates of evolution. Interacting protein partners can then be predicted from the similarity of their phylogenetic trees or evolutionary distance matrices. We present a novel method, called Codep, for determining interacting protein partners by maximizing coevolutionary signals. The order of sequences in the multiple sequence alignments from two protein families is determined so as to maximize the similarity of substitution patterns at amino acid sites in the two alignments and, thus, phylogenetic congruency. This is achieved by maximizing the total number of interdependencies of amino acid sites between the alignments. Once ordered, the corresponding sequences in the two alignments indicate the predicted interacting partners. We demonstrate the efficacy of this approach with computer simulations and in analyses of several protein families. A program implementing our method, Codep, is freely available to academic users from our website: http://www.uhnresearch.ca/labs/tillier/.


Subject(s)
Evolution, Molecular , Proteins/genetics , Proteins/metabolism , Computer Simulation , Phylogeny , Protein Binding , Proteins/chemistry , Software
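
The background approach mentioned in the abstract above predicts partners from the similarity of evolutionary distance matrices. A minimal mirror-tree-style sketch of that similarity (not of Codep's alignment-reordering method) is the Pearson correlation between the upper triangles of two distance matrices whose rows refer to the same species in the same order. The function names are hypothetical.

```python
import math
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def upper_triangle(d: Matrix) -> List[float]:
    """Off-diagonal entries above the diagonal, row by row."""
    n = len(d)
    return [d[i][j] for i in range(n) for j in range(i + 1, n)]


def matrix_correlation(d1: Matrix, d2: Matrix) -> float:
    """Pearson correlation between the upper triangles of two distance
    matrices built over the same species in the same order."""
    x, y = upper_triangle(d1), upper_triangle(d2)
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two same-size matrices with at least 3 rows")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("constant distances: correlation undefined")
    return sxy / math.sqrt(sxx * syy)


family_a = [[0, 1, 2], [1, 0, 3], [2, 3, 0]]
family_b = [[0, 2, 4], [2, 0, 6], [4, 6, 0]]  # same pattern, faster rate
print(matrix_correlation(family_a, family_b))
```

The correlation is insensitive to a uniform difference in evolutionary rate between the two families, which is why proportional distance matrices score as perfectly similar; the abstract's point is that this whole-matrix comparison requires the species order to be fixed in advance, whereas Codep searches for the ordering.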