1.
Plant Cell Physiol ; 2024 Jul 22.
Article in English | MEDLINE | ID: mdl-39034452

ABSTRACT

Phycobilisomes play a crucial role in the light-harvesting mechanisms of cyanobacteria, red algae, and glaucophytes, but the molecular mechanism of their regulation is largely unknown. In the cyanobacterium Synechocystis sp. PCC 6803, we identified a gene, slr0244, as a phycobilisome-related gene using phylogenetic profiling analysis, a method to predict gene function based on comparative genomics. To investigate the physiological function of the slr0244 gene, we characterized slr0244 mutants spectroscopically. The disruption of the slr0244 gene impaired state transition, a process by which the distribution of light energy absorbed by the phycobilisomes between the two photosystems is regulated in response to changes in light conditions. The Slr0244 protein seems to act at or downstream of the step that senses the redox state of the plastoquinone pool in the process of state transition. These findings, together with the past report of the interaction of this gene product with thioredoxin or glutaredoxin, suggest that the slr0244 gene is a novel state-transition regulator that integrates the redox signal of the plastoquinone pool with that of the photosystem I reducing side. The protein has two USP (universal stress protein) motifs in tandem. The second motif has two conserved cysteine residues found in USPs of other cyanobacteria and land plants. These redox-type USPs with conserved cysteines may function as redox regulators in various photosynthetic organisms. Our study also showed the efficacy of phylogenetic profiling analysis in predicting the function of cyanobacterial genes that have not been annotated so far.

2.
aBIOTECH ; 4(4): 291-302, 2023 Dec.
Article in English | MEDLINE | ID: mdl-38106430

ABSTRACT

With the increasing number of sequenced species, phylogenetic profiling (PP) has become a powerful method to predict functional genes based on co-evolutionary information. However, its potential in plant genomics has not yet been fully explored. In this context, we combined the power of machine learning and PP to identify salt stress-related genes in a halophytic grass, Spartina alterniflora, using evolutionary information generated from 365 plant species. Our results showed that the genes highly co-evolved with known salt stress-related genes are enriched in biological processes of ion transport, detoxification and metabolic pathways. For ion transport, five identified genes coding two sodium and three potassium transporters were validated to be able to uptake Na+. In addition, we identified two orthologs of trichome-related AtR3-MYB genes, SaCPC1 and SaCPC2, which may be involved in salinity responses. Genes co-evolved with SaCPCs were enriched in functions related to the circadian rhythm and abiotic stress responses. Overall, this work demonstrates the feasibility of mining salt stress-related genes using evolutionary information, highlighting the potential of PP as a valuable tool for plant functional genomics. Supplementary Information: The online version contains supplementary material available at 10.1007/s42994-023-00125-5.

3.
J Mol Evol ; 91(4): 471-481, 2023 08.
Article in English | MEDLINE | ID: mdl-37039856

ABSTRACT

Selenium-binding proteins represent a ubiquitous protein family, and recently SBP1 was described as a new stress response regulator in plants. SBP1 has been characterized as a methanethiol oxidase; however, its exact role remains unclear. Moreover, in mammals, it is involved in the regulation of anti-carcinogenic growth and progression as well as reduction/oxidation modulation and detoxification. In this work, we delineate the functional potential of certain motifs of SBP in the context of evolutionary relationships. The phylogenetic profiling approach revealed the absence of SBP in the fungi phylum as well as in most non-eukaryotic organisms. The phylogenetic tree also indicates the differentiation and evolution of characteristic SBP motifs. The main evolutionary events concern the CSSC motif, for which Acidobacteria, Fungi and Archaea carry modifications. Moreover, the CC motif is harbored by some bacteria and remains conserved in Plants, while it is modified to CxxC in Animals. Thus, the characteristic sequence motifs of SBPs mainly appeared in Archaea and Bacteria and were retained in Animals and Plants. Our results demonstrate the emergence of SBP from bacteria, most likely as a methanethiol oxidase.


Subject(s)
Proteins , Selenium-Binding Proteins , Animals , Selenium-Binding Proteins/genetics , Selenium-Binding Proteins/metabolism , Phylogeny , Bacteria/genetics , Bacteria/metabolism , Archaea/genetics , Archaea/metabolism , Plants , Oxidoreductases/genetics , Mammals/metabolism
4.
Microbiol Spectr ; 11(1): e0387122, 2023 02 14.
Article in English | MEDLINE | ID: mdl-36602356

ABSTRACT

Identification of microbial functional association networks allows interpretation of biological phenomena and a greater understanding of the molecular basis of pathogenicity and also underpins the formulation of control measures. Here, we describe PPNet, a tool that uses genome information and analysis of phylogenetic profiles with binary similarity and distance measures to derive large-scale bacterial gene association networks of a single species. As an exemplar, we have derived a functional association network in the pig pathogen Streptococcus suis using 81 binary similarity and dissimilarity measures, which demonstrates excellent performance based on the area under the receiver operating characteristic curve (AUROC), the area under the precision-recall curve (AUPR), and a derived overall scoring method. Selected network associations were validated experimentally by using bacterial two-hybrid experiments. We conclude that PPNet, a publicly available tool (https://github.com/liyangjie/PPNet), can be used to construct microbial association networks from easily acquired genome-scale data. IMPORTANCE This study developed PPNet, the first tool that can be used to infer large-scale bacterial functional association networks of a single species. PPNet includes a method for assigning the uniqueness of a bacterial strain using the average nucleotide identity and the average nucleotide coverage. PPNet collected 81 binary similarity and distance measures for phylogenetic profiling and then evaluated and divided them into four groups. PPNet can effectively capture gene networks that are functionally related to phenotype from publicly available prokaryotic genomes, as well as provide valuable results for downstream analysis and experimental testing.


Subject(s)
Genes, Bacterial , Prokaryotic Cells , Animals , Swine , Phylogeny , Bacteria/genetics , Gene Regulatory Networks
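The abstract above compares genes through binary similarity and distance measures on phylogenetic profiles. The following is a minimal sketch of three common measures of that kind (Jaccard similarity, Hamming distance, simple matching coefficient). It is not PPNet's code: the function names and the toy profiles are hypothetical, and which of the 81 measures PPNet actually implements is not stated in the abstract.

```python
from typing import Sequence, Tuple


def contingency(p: Sequence[int], q: Sequence[int]) -> Tuple[int, int, int, int]:
    """Count genomes where both genes are present (a), only p (b),
    only q (c), and neither (d)."""
    if len(p) != len(q):
        raise ValueError("profiles must cover the same genomes")
    a = sum(1 for x, y in zip(p, q) if x and y)
    b = sum(1 for x, y in zip(p, q) if x and not y)
    c = sum(1 for x, y in zip(p, q) if not x and y)
    d = len(p) - a - b - c
    return a, b, c, d


def jaccard_similarity(p: Sequence[int], q: Sequence[int]) -> float:
    """Shared presences divided by genomes where at least one gene is present."""
    a, b, c, _ = contingency(p, q)
    return a / (a + b + c) if (a + b + c) else 0.0


def hamming_distance(p: Sequence[int], q: Sequence[int]) -> int:
    """Number of genomes where the two presence/absence calls disagree."""
    _, b, c, _ = contingency(p, q)
    return b + c


def simple_matching(p: Sequence[int], q: Sequence[int]) -> float:
    """Fraction of genomes where the two calls agree (presence or absence)."""
    a, b, c, d = contingency(p, q)
    return (a + d) / (a + b + c + d)


# Hypothetical presence (1) / absence (0) calls across eight genomes.
gene_x = [1, 1, 0, 1, 0, 0, 1, 1]
gene_y = [1, 1, 0, 1, 0, 1, 1, 0]
gene_z = [0, 0, 1, 0, 1, 1, 0, 0]

for name, other in (("gene_y", gene_y), ("gene_z", gene_z)):
    print(name,
          round(jaccard_similarity(gene_x, other), 3),
          hamming_distance(gene_x, other),
          simple_matching(gene_x, other))
```

Gene pairs with high similarity (or low distance) would be candidate edges in an association network. Ranking those scores against known associations is what metrics such as AUROC and AUPR evaluate.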
5.
Elife ; 11, 2022 Oct 06.
Article in English | MEDLINE | ID: mdl-36200752

ABSTRACT

Heme can serve as iron source in many environments, including the iron-poor animal host environment. The fungal pathobiont Candida albicans expresses a family of extracellular CFEM hemophores that capture heme from host proteins and transfer it across the cell wall to the cell membrane, to be endocytosed and utilized as heme or iron source. Here, we identified Frp1 and Frp2, two ferric reductase (FRE)-related proteins that lack an extracellular N-terminal substrate-binding domain, as being required for hemoglobin heme utilization and for sensitivity to toxic heme analogs. Frp1 and Frp2 redistribute to the plasma membrane in the presence of hemin, consistent with a direct role in heme trafficking. Expression of Frp1 with the CFEM hemophore Pga7 can promote heme utilization in Saccharomyces cerevisiae as well, confirming the functional interaction between these proteins. Sequence and structure comparison reveals that the CFEM hemophores are related to the FRE substrate-binding domain that is missing in Frp1/2. We conclude that Frp1/2 and the CFEM hemophores form a functional complex that evolved from FREs to enable extracellular heme uptake.


Hosts and disease-causing fungi are often locked into a battle over resources. The host will attempt to withhold molecules that the fungus needs to survive, while the pathogen will try to find alternative routes to obtain them. Candida albicans, for example, can go after the atoms of iron embedded in the proteins of the organism it infects. To do so it releases molecules known as hemophores, which scavenge the iron-containing heme molecule that equips oxygen-carrying proteins in the blood. Once captured, the heme is carried across the wall that protects C. albicans from the environment and brought to the membrane of the cell. It is then taken in and trafficked inside vesicles to its destination. However, the identity of the molecular actors which help to bridge the internal and external segments of the heme journey remain unclear. Previous studies have shown that the hemophore Pga7 is involved, but this protein is attached to the outside of the cell membrane, where it cannot directly interact with the import machinery. Roy et al. set out to discover this missing link. Examining the genomes of fungal species related to C. albicans highlighted two membrane proteins, Frp1 and Frp2, which could participate in heme uptake. Protein sequence comparison revealed that Frp1 and Frp2 were closely related to ferric reductases, a group of membrane enzymes which can chemically alter extracellular iron prior to uptake. Deleting the genes for Frp1 and Frp2 rendered C. albicans cells incapable of taking in heme. Conversely, a fungal species which cannot normally uptake heme could efficiently internalise these complexes when artificially equipped with Frp1 and Pga7, suggesting that the two proteins work closely together. Finally, protein structure comparisons highlighted that an extracellular domain present in ferric reductases but absent in Frp1 and Frp2 is, in fact, related to Pga7 and other hemophores. 
This implies that the iron and heme uptake systems may share a common evolutionary origin. Overall, the work by Roy et al. reveals a new family of proteins which allow disease-causing fungi to steal iron from their hosts. This knowledge may be useful to design better anti-fungal treatments.


Subject(s)
Candida albicans , FMN Reductase , Animals , FMN Reductase/metabolism , Candida albicans/genetics , Candida albicans/metabolism , Heme/metabolism , Iron/metabolism , Fungal Proteins/genetics , Fungal Proteins/metabolism
6.
Front Med (Lausanne) ; 9: 824622, 2022.
Article in English | MEDLINE | ID: mdl-35178414

ABSTRACT

SARS-CoV-2 is the causative agent of a new type of coronavirus infection, COVID-19, which has rapidly spread worldwide. The overall genome sequence homology between SARS-CoV-2 and SARS-CoV is 79%. However, the homology of the ORF8 protein between these two coronaviruses is low, at ~26%. Previously, it has been suggested that infection by the ORF8-deleted variant of SARS-CoV-2 results in less severe symptoms than in the case of wild-type SARS-CoV-2. Although we found that ORF8 is involved in the proteasome autoimmunity system, the precise role of ORF8 in infection and pathology has not been fully clarified. In this study, we determined a new network of ORF8-interacting proteins by performing in silico analysis of the binding proteins against the previously described 47 ORF8-binding proteins. We used as a dataset 431 human protein candidates from Uniprot that physically interacted with 47 ORF8-binding proteins, as identified using STRING. Homology and phylogenetic profile analyses of the protein dataset were performed on 446 eukaryotic species whose genome sequences were available in KEGG OC. Based on the phylogenetic profile results, clustering analysis was performed using Ward's method. Our phylogenetic profiling showed that the interactors of the ORF8-interacting proteins were clustered into three classes that were conserved across chordates (Class 1: 152 proteins), metazoans (Class 2: 163 proteins), and eukaryotes (Class 3: 114 proteins). 
Following the KEGG pathway analysis, classification of cellular localization, tissue-specific expression analysis, and a literature study on each class of the phylogenetic profiling cluster tree, we predicted the following: protein members in Class 1 could contribute to COVID-19 pathogenesis via complement and coagulation cascades and could promote sarcoidosis; the members of Classes 1 and 2, together, may contribute to the downregulation of Interferon-β; and Class 3 proteins are associated with endoplasmic reticulum stress and the degradation of human leukocyte antigen.
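
The abstract above clusters phylogenetic profiles with Ward's method. As a rough illustration, here is a pure-Python sketch of agglomerative clustering under Ward's minimum-variance criterion, applied to toy presence/absence profiles. The protein names, the nine-species profiles, and the function name `ward_clusters` are hypothetical. The study itself used 446 eukaryotic species, and its software is not named in the abstract.

```python
from typing import Dict, List, Sequence


def ward_clusters(profiles: Dict[str, Sequence[int]], k: int) -> List[List[str]]:
    """Merge clusters greedily until k remain, always choosing the pair whose
    merge adds the least within-cluster sum of squares (Ward's criterion):
    cost = n_i * n_j / (n_i + n_j) * ||centroid_i - centroid_j||^2."""
    # Each cluster is (member names, centroid vector).
    clusters = [([name], [float(v) for v in vec]) for name, vec in profiles.items()]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                (mi, ci), (mj, cj) = clusters[i], clusters[j]
                ni, nj = len(mi), len(mj)
                sq_dist = sum((x - y) ** 2 for x, y in zip(ci, cj))
                cost = ni * nj / (ni + nj) * sq_dist
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        _, i, j = best
        (mi, ci), (mj, cj) = clusters[i], clusters[j]
        ni, nj = len(mi), len(mj)
        # Size-weighted mean of the two centroids is the merged centroid.
        merged = (mi + mj, [(ni * x + nj * y) / (ni + nj) for x, y in zip(ci, cj)])
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)] + [merged]
    return [sorted(members) for members, _ in clusters]


# Hypothetical profiles over nine species: columns 0-2 stand for chordates,
# 3-5 for other metazoans, 6-8 for other eukaryotes.
toy_profiles = {
    "prot_a1": [1, 1, 1, 0, 0, 0, 0, 0, 0],
    "prot_a2": [1, 1, 0, 0, 0, 0, 0, 0, 0],
    "prot_b1": [1, 1, 1, 1, 1, 1, 0, 0, 0],
    "prot_b2": [1, 1, 1, 1, 1, 0, 0, 0, 0],
    "prot_c1": [1, 1, 1, 1, 1, 1, 1, 1, 1],
    "prot_c2": [1, 1, 1, 1, 1, 1, 1, 1, 0],
}

for cluster in ward_clusters(toy_profiles, k=3):
    print(cluster)
```

In this toy case the three clusters correspond to proteins restricted to chordates, to metazoans, and to eukaryotes in general, mirroring the three-class outcome the abstract describes.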

7.
Elife ; 10, 2021 08 06.
Article in English | MEDLINE | ID: mdl-34355696

ABSTRACT

Inactivating mutations in the Methyl-CpG Binding Protein 2 (MECP2) gene are the main cause of Rett syndrome (RTT). Despite extensive research into MECP2 function, no treatments for RTT are currently available. Here, we used an evolutionary genomics approach to construct an unbiased MECP2 gene network, using 1028 eukaryotic genomes to prioritize proteins with strong co-evolutionary signatures with MECP2. Focusing on proteins targeted by FDA-approved drugs led to three promising targets, two of which were previously linked to MECP2 function (IRAK, KEAP1) and one that was not (EPOR). The drugs targeting these three proteins (Pacritinib, DMF, and EPO) were able to rescue different phenotypes of MECP2 inactivation in cultured human neural cell types, and appeared to converge on Nuclear Factor Kappa B (NF-κB) signaling in inflammation. This study highlights the potential of comparative genomics to accelerate drug discovery, and yields potential new avenues for the treatment of RTT.


Subject(s)
Methyl-CpG-Binding Protein 2/therapeutic use , Rett Syndrome/therapy , Genomics , Humans , Rett Syndrome/genetics
8.
Evol Bioinform Online ; 17: 11769343211003079, 2021.
Article in English | MEDLINE | ID: mdl-33795929

ABSTRACT

ORF8 is a highly variable genomic region of SARS-CoV-2. Although non-essential and the precise functions are unknown, it has been suggested that this protein assists in SARS-CoV-2 replication in the early secretory pathway and in immune evasion. We utilized the binding partners of SARS-CoV-2 proteins in human HEK293T cells and performed genome-wide phylogenetic profiling and clustering analyses in 446 eukaryotic species to predict and discover ORF8 binding partners that share associated functional mechanisms based on co-evolution. Results classified 47 ORF8 binding partner proteins into 3 clusters (groups 1-3), which were conserved in vertebrates (group 1), metazoan (group 2), and eukaryotes (group 3). Gene ontology analysis indicated that group 1 had no significant associated biological processes, while groups 2 and 3 were associated with glycoprotein biosynthesis process and ubiquitin-dependent endoplasmic reticulum-associated degradation pathways, respectively. Collectively, our results classified potential genes that might be associated with SARS-CoV-2 viral pathogenesis, specifically related to acute respiratory distress syndrome, and the secretory pathway. Here, we discuss the possible role of ORF8 in viral pathogenesis and in assisting viral replication and immune evasion via secretory pathway, as well as the possible factors associated with the rapid evolution of ORF8.

9.
Mol Biol Evol ; 38(8): 3033-3045, 2021 07 29.
Article in English | MEDLINE | ID: mdl-33822172

ABSTRACT

Accurate determination of the evolutionary relationships between genes is a foundational challenge in biology. Homology (evolutionary relatedness) is in many cases readily determined based on sequence similarity analysis. By contrast, whether or not two genes directly descended from a common ancestor by a speciation event (orthologs) or duplication event (paralogs) is more challenging, yet provides critical information on the history of a gene. Since 2009, this task has been the focus of the Quest for Orthologs (QFO) Consortium. The sixth QFO meeting took place in Okazaki, Japan in conjunction with the 67th National Institute for Basic Biology conference. Here, we report recent advances, applications, and oncoming challenges that were discussed during the conference. Steady progress has been made toward standardization and scalability of new and existing tools. A feature of the conference was the presentation of a panel of accessible tools for phylogenetic profiling and several developments to bring orthology beyond the gene unit, from domains to networks. This meeting brought to light several challenges to come: leveraging orthology computations to get the most out of the incoming avalanche of genomic data, integrating orthology from domain to biological network levels, building better gene models, and adapting orthology approaches to the broad evolutionary and genomic diversity recognized in different forms of life and viruses.


Subject(s)
Genetic Speciation , Genomics/trends , Phylogeny , Genome, Viral , Genomics/methods
10.
Microb Genom ; 6(11), 2020 11.
Article in English | MEDLINE | ID: mdl-32924924

ABSTRACT

As genome sequencing efforts are unveiling the genetic diversity of the biosphere with an unprecedented speed, there is a need to accurately describe the structural and functional properties of groups of extant species whose genomes have been sequenced, as well as their inferred ancestors, at any given taxonomic level of their phylogeny. Elaborate approaches for the reconstruction of ancestral states at the sequence level have been developed, subsequently augmented by methods based on gene content. While these approaches of sequence or gene-content reconstruction have been successfully deployed, there has been less progress on the explicit inference of functional properties of ancestral genomes, in terms of metabolic pathways and other cellular processes. Herein, we describe PathTrace, an efficient algorithm for parsimony-based reconstructions of the evolutionary history of individual metabolic pathways, pivotal representations of key functional modules of cellular function. The algorithm is implemented as a five-step process through which pathways are represented as fuzzy vectors, where each enzyme is associated with a taxonomic conservation value derived from the phylogenetic profile of its protein sequence. The method is evaluated with a selected benchmark set of pathways against collections of genome sequences from key data resources. By deploying a pangenome-driven approach for pathway sets, we demonstrate that the inferred patterns are largely insensitive to noise, as opposed to gene-content reconstruction methods. In addition, the resulting reconstructions are closely correlated with the evolutionary distance of the taxa under study, suggesting that a diligent selection of target pangenomes is essential for maintaining cohesiveness of the method and consistency of the inference, serving as an internal control for an arbitrary selection of queries. 
The PathTrace method is a first step towards the large-scale analysis of metabolic pathway evolution and our deeper understanding of functional relationships reflected in emerging pangenome collections.


Subject(s)
Algorithms , Bacteria/genetics , Bacteria/metabolism , Evolution, Molecular , Genome/genetics , Metabolic Networks and Pathways/genetics , Amino Acid Sequence , Base Sequence , Phylogeny , Software
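The PathTrace abstract above represents a pathway as a fuzzy vector in which each enzyme carries a taxonomic conservation value derived from its phylogenetic profile. The abstract does not give the formula, so the sketch below assumes one plausible reading: the conservation value is the fraction of genomes in a taxon that contain the enzyme. All names and data are hypothetical, and this is not the PathTrace implementation.

```python
from typing import Dict, List, Sequence


def conservation_value(profile: Dict[str, int], taxon_genomes: Sequence[str]) -> float:
    """ASSUMED definition: fraction of the taxon's genomes in which the
    enzyme is present (profile maps genome name -> 0 or 1)."""
    if not taxon_genomes:
        raise ValueError("taxon must contain at least one genome")
    return sum(profile.get(g, 0) for g in taxon_genomes) / len(taxon_genomes)


def pathway_fuzzy_vector(
    pathway_enzymes: Sequence[str],
    profiles: Dict[str, Dict[str, int]],
    taxon_genomes: Sequence[str],
) -> List[float]:
    """One membership value in [0, 1] per enzyme, in pathway order."""
    return [conservation_value(profiles[e], taxon_genomes) for e in pathway_enzymes]


# Hypothetical taxon with four genomes and a three-enzyme pathway.
genomes = ["g1", "g2", "g3", "g4"]
enzyme_profiles = {
    "enzyme_1": {"g1": 1, "g2": 1, "g3": 1, "g4": 1},
    "enzyme_2": {"g1": 1, "g2": 0, "g3": 1, "g4": 0},
    "enzyme_3": {"g1": 0, "g2": 0, "g3": 0, "g4": 1},
}

vector = pathway_fuzzy_vector(["enzyme_1", "enzyme_2", "enzyme_3"], enzyme_profiles, genomes)
print(vector)
```

Under this assumption, a value near 1 marks an enzyme that is nearly universal in the taxon, and a value near 0 marks one that is rare there. A parsimony step would then operate on such vectors across the taxonomy.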
11.
BMC Mol Cell Biol ; 21(1): 18, 2020 Mar 23.
Article in English | MEDLINE | ID: mdl-32293259

ABSTRACT

BACKGROUND: Congenital dyserythropoietic anemia type I (CDA I) is an autosomal recessive disease with macrocytic anemia in which erythroid precursors in the bone marrow exhibit pathognomonic abnormalities including spongy heterochromatin and chromatin bridges. We have shown previously that the gene mutated in CDA I encodes Codanin-1, a ubiquitously expressed and evolutionarily conserved large protein. Recently, an additional etiologic factor for CDA I was reported, C15Orf41, a predicted nuclease. Mutations in both the CDAN1 and C15Orf41 genes result in very similar erythroid phenotypes. However, the possible relationship between these two etiologic factors is not clear. RESULTS: We demonstrate here that Codanin-1 and C15Orf41 bind to each other, and that Codanin-1 stabilizes C15Orf41. C15Orf41 protein is mainly nuclear and Codanin-1 overexpression shifts it to the cytoplasm. Phylogenetic analyses demonstrated that even though Codanin-1 is an essential protein in mammals, it was lost from several diverse and unrelated animal taxa. Interestingly, C15Orf41 was eliminated in the exact same animal taxa. This is an extreme case of the Phylogenetic Profiling phenomenon, which strongly suggests common pathways for these two proteins. Lastly, as the 3D structure is more conserved through evolution than the protein sequence, we have used the Phyre2 alignment program to find structurally homologous proteins. We found that Codanin-1 is highly similar to CNOT1, a conserved protein which serves as a scaffold for proteins involved in mRNA stability and transcriptional control. CONCLUSIONS: The physical interaction and the stabilization of C15Orf41 by Codanin-1, combined with the phylogenetic co-existence and co-loss of these two proteins during evolution, suggest that the major function of the presumptive scaffold protein, Codanin-1, is to regulate C15Orf41 activities. The similarity between Codanin-1 and CNOT1 suggests that Codanin-1 is involved in RNA metabolism and activity, and opens up a new avenue for the study of the molecular pathways affected in CDA I.


Subject(s)
Anemia, Dyserythropoietic, Congenital , Deoxyribonucleases/genetics , Glycoproteins/genetics , Nuclear Proteins/genetics , Anemia, Dyserythropoietic, Congenital/etiology , Anemia, Dyserythropoietic, Congenital/genetics , Deoxyribonucleases/metabolism , Glycoproteins/metabolism , HeLa Cells , Humans , Mutation , Nuclear Proteins/metabolism , Phylogeny , Protein Binding , Transcription Factors/genetics , Transcription Factors/metabolism
12.
Int J Mol Sci ; 20(24), 2019 Dec 13.
Article in English | MEDLINE | ID: mdl-31847093

ABSTRACT

Glycans are involved in various metabolic processes via the functions of glycosyltransferases and glycoside hydrolases. Analysing the evolution of these enzymes is essential for improving the understanding of glycan metabolism and function. Based on our previous study of glycosyltransferases, we performed a genome-wide analysis of whole human glycoside hydrolases using the UniProt, BRENDA, CAZy and KEGG databases. Using cluster analysis, 319 human glycoside hydrolases were classified into four clusters based on their similarity to enzymes conserved in chordates or metazoans (Class 1), metazoans (Class 2), metazoans and plants (Class 3) and eukaryotes (Class 4). The eukaryote and metazoan clusters included N- and O-glycoside hydrolases, respectively. The significant abundance of disordered regions within the most conserved cluster indicated a role for disordered regions in the evolution of glycoside hydrolases. These results suggest that the biological diversity of multicellular organisms is related to the acquisition of N- and O-linked glycans.


Subject(s)
Computer Simulation , Databases, Genetic , Glycoside Hydrolases/genetics , Genome-Wide Association Study , Glycoside Hydrolases/classification , Humans
13.
Curr Neurol Neurosci Rep ; 19(10): 70, 2019 08 23.
Article in English | MEDLINE | ID: mdl-31440850

ABSTRACT

PURPOSE OF REVIEW: Until recently, the gene associated with the recessive form of familial brain calcification (PFBC, Fahr disease) was unknown. MYORG, a gene that causes recessive PFBC was only recently discovered and is currently the only gene associated with a recessive form of this disease. Here, we review the radiological and clinical findings in adult MYORG mutation homozygous and heterozygous individuals. RECENT FINDINGS: MYORG was shown to be the cause of a large fraction of recessive cases of PFBC in patients of different ethnic populations. Pathogenic mutations include inframe insertions and deletions in addition to nonsense and missense mutations that are distributed throughout the entire MYORG coding region. Homozygotes have extensive brain calcification in all known cases, whereas in some carriers of heterozygous mutation, punctuated calcification of the globus pallidus is demonstrated. The clinical spectrum in homozygotes ranges from the lack of neurological symptoms to severe progressive neurological syndrome with bulbar and cerebellar signs, parkinsonism and other movement disorders, and cognitive impairments. Heterozygotes are clinically asymptomatic. MYORG is a transmembrane protein localized to the endoplasmic reticulum and is mainly expressed in astrocytes. While the biochemical pathways of the protein are still unknown, information from its evolution profile across hundreds of species (phylogenetic profiling) suggests a role for MYORG in regulating ion homeostasis via its glycosidase domain. MYORG mutations are a major cause for recessive PFBC in different world populations. Future studies are required in order to reveal the cellular role of the MYORG protein.


Subject(s)
Brain Diseases/genetics , Brain/pathology , Adult , Basal Ganglia Diseases , Calcinosis , Glycoside Hydrolases , Heterozygote , Humans , Male , Mutation , Neurodegenerative Diseases , Pedigree , Phylogeny
14.
Front Plant Sci ; 8: 1831, 2017.
Article in English | MEDLINE | ID: mdl-29163570

ABSTRACT

Despite many developed experimental and computational approaches, functional gene annotation remains challenging. With the rapidly growing number of sequenced genomes, the concept of phylogenetic profiling, which predicts functional links between genes that share a common co-occurrence pattern across different genomes, has gained renewed attention as it promises to annotate gene functions based on presence/absence calls alone. We applied phylogenetic profiling to the problem of metabolic pathway assignments of plant genes with a particular focus on secondary metabolism pathways. We determined phylogenetic profiles for 40,960 metabolic pathway enzyme genes with assigned EC numbers from 24 plant species based on sequence and pathway annotation data from KEGG and Ensembl Plants. For gene sequence family assignments, needed to determine the presence or absence of particular gene functions in the given plant species, we included data of all 39 species available at the Ensembl Plants database and established gene families based on pairwise sequence identities and annotation information. Aside from performing profiling comparisons, we used machine learning approaches to predict pathway associations from phylogenetic profiles alone. Selected metabolic pathways were indeed found to be composed of gene families of greater than expected phylogenetic profile similarity. This was particularly evident for primary metabolism pathways, whereas for secondary pathways, both the available annotation in different species as well as the abstraction of functional association via distinct pathways proved limiting. While phylogenetic profile similarity was generally not found to correlate with gene co-expression, direct physical interactions of proteins were reflected by a significantly increased profile similarity suggesting an application of phylogenetic profiling methods as a filtering step in the identification of protein-protein interactions. 
This feasibility study highlights the potential and challenges associated with phylogenetic profiling methods for the detection of functional relationships between genes as well as the need to enlarge the set of plant genes with proven secondary metabolism involvement as well as the limitations of distinct pathways as abstractions of relationships between genes.

15.
BMC Bioinformatics ; 18(1): 396, 2017 Sep 05.
Article in English | MEDLINE | ID: mdl-28870256

ABSTRACT

BACKGROUND: Elaboration of powerful methods to predict functional and/or physical protein-protein interactions from genome sequence is one of the main tasks in the post-genomic era. Phylogenetic profiling allows the prediction of protein-protein interactions at a whole genome level in both Prokaryotes and Eukaryotes. For this reason it is considered one of the most promising methods. RESULTS: Here, we propose an improvement of phylogenetic profiling that enables handling of large genomic datasets and infer global protein-protein interactions. This method uses the distance correlation as a new measure of phylogenetic profile similarity. We constructed robust reference sets and developed Phylo-dCor, a parallelized version of the algorithm for calculating the distance correlation that makes it applicable to large genomic data. Using Saccharomyces cerevisiae and Escherichia coli genome datasets, we showed that Phylo-dCor outperforms phylogenetic profiling methods previously described based on the mutual information and Pearson's correlation as measures of profile similarity. CONCLUSIONS: In this work, we constructed and assessed robust reference sets and propose the distance correlation as a measure for comparing phylogenetic profiles. To make it applicable to large genomic data, we developed Phylo-dCor, a parallelized version of the algorithm for calculating the distance correlation. Two R scripts that can be run on a wide range of machines are available upon request.


Subject(s)
Algorithms , Area Under Curve , Databases, Genetic , Escherichia coli/genetics , Escherichia coli/metabolism , Escherichia coli Proteins/chemistry , Escherichia coli Proteins/metabolism , Protein Interaction Maps , ROC Curve , Saccharomyces cerevisiae/genetics , Saccharomyces cerevisiae/metabolism , Saccharomyces cerevisiae Proteins/chemistry , Saccharomyces cerevisiae Proteins/metabolism
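The Phylo-dCor abstract above proposes distance correlation as the measure of phylogenetic profile similarity. Below is a minimal pure-Python sketch of the standard sample distance correlation (pairwise distance matrices, double centering, normalized distance covariance). It is not the authors' parallelized R implementation, and the example profiles are hypothetical.

```python
import math
from typing import List, Sequence


def _centered_distances(v: Sequence[float]) -> List[List[float]]:
    """Pairwise |v_i - v_j| matrix, double-centered by row, column and grand means."""
    n = len(v)
    d = [[abs(v[i] - v[j]) for j in range(n)] for i in range(n)]
    row_mean = [sum(r) / n for r in d]  # equals column means, since d is symmetric
    grand_mean = sum(row_mean) / n
    return [[d[i][j] - row_mean[i] - row_mean[j] + grand_mean for j in range(n)]
            for i in range(n)]


def distance_correlation(x: Sequence[float], y: Sequence[float]) -> float:
    """Sample distance correlation in [0, 1]; returns 0.0 if either profile is constant."""
    if len(x) != len(y) or not x:
        raise ValueError("profiles must be non-empty and of equal length")
    n = len(x)
    a, b = _centered_distances(x), _centered_distances(y)
    dcov2 = sum(a[i][j] * b[i][j] for i in range(n) for j in range(n)) / n ** 2
    dvar_x = sum(a[i][j] ** 2 for i in range(n) for j in range(n)) / n ** 2
    dvar_y = sum(b[i][j] ** 2 for i in range(n) for j in range(n)) / n ** 2
    denom = math.sqrt(dvar_x * dvar_y)
    if denom == 0.0:
        return 0.0
    return math.sqrt(max(dcov2, 0.0) / denom)


# Hypothetical presence/absence profiles across eight genomes.
profile_p = [1, 1, 0, 1, 0, 0, 1, 1]
profile_q = [1, 1, 0, 1, 0, 1, 1, 0]
profile_anti = [1 - v for v in profile_p]  # exact complement of profile_p

print(distance_correlation(profile_p, profile_q))
print(distance_correlation(profile_p, profile_anti))
```

A complementary profile has the same pairwise distances as the original, so it scores as fully dependent. Unlike Pearson's correlation, distance correlation is unsigned and also responds to non-linear dependence. The double loop is quadratic in the number of genomes, which fits the abstract's point that a parallelized version was needed for large genomic data.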
16.
Mol Biol Evol ; 34(8): 2016-2034, 2017 08 01.
Article in English | MEDLINE | ID: mdl-28460059

ABSTRACT

Cilia (flagella) are important eukaryotic organelles, present in the Last Eukaryotic Common Ancestor, and are involved in cell motility and integration of extracellular signals. Ciliary dysfunction causes a class of genetic diseases, known as ciliopathies, however current knowledge of the underlying mechanisms is still limited and a better characterization of genes is needed. As cilia have been lost independently several times during evolution and they are subject to important functional variation between species, ciliary genes can be investigated through comparative genomics. We performed phylogenetic profiling by predicting orthologs of human protein-coding genes in 100 eukaryotic species. The analysis integrated three independent methods to predict a consensus set of 274 ciliary genes, including 87 new promising candidates. A fine-grained analysis of the phylogenetic profiles allowed a partitioning of ciliary genes into modules with distinct evolutionary histories and ciliary functions (assembly, movement, centriole, etc.) and thus propagation of potential annotations to previously undocumented genes. The cilia/basal body localization was experimentally confirmed for five of these previously unannotated proteins (LRRC23, LRRC34, TEX9, WDR27, and BIVM), validating the relevance of our approach. Furthermore, our multi-level analysis sheds light on the core gene sets retained in gamete-only flagellates or Ecdysozoa for instance. By combining gene-centric and species-oriented analyses, this work reveals new ciliary and ciliopathy gene candidates and provides clues about the evolution of ciliary processes in the eukaryotic domain. Additionally, the positive and negative reference gene sets and the phylogenetic profile of human genes constructed during this study can be exploited in future work.


Subject(s)
Cilia/genetics , Ciliopathies/genetics , Animals , Cell Movement/genetics , Cilia/metabolism , Ciliopathies/metabolism , Databases, Nucleic Acid , Eukaryota , Eukaryotic Cells , Evolution, Molecular , Flagella/genetics , Flagella/metabolism , Genomics , Humans , Phylogeny , Sequence Analysis, DNA/methods
17.
Methods ; 129: 8-17, 2017 10 01.
Article in English | MEDLINE | ID: mdl-28454776

ABSTRACT

Recent years have witnessed an unprecedented accumulation of DNA sequences, and therefore of protein sequences (predicted from DNA sequences), due to advances in sequencing technology. One of the major sources of hypothetical proteins is metagenomics research. Current annotation of metagenomes (collections of short metagenomic sequences or assemblies) relies on similarity searches against known gene/protein families, based on which functional profiles of microbial communities can be built. This practice, however, leaves out the hypothetical proteins, which may outnumber the known proteins for many microbial communities. On the other hand, we may ask: what can we gain from the large number of metagenomes made available by metagenomic studies, for the annotation of metagenomic sequences as well as for functional annotation of hypothetical proteins in general? Here we propose a community profiling approach for predicting functional associations between proteins: two proteins are predicted to be associated if they share similar presence and absence profiles (called community profiles) across microbial communities. Community profiling is conceptually similar to the phylogenetic profiling approach to functional prediction, but with fundamental differences. We tested different profile construction methods, selections of reference metagenomes, and correlation metrics, among others, to optimize the performance of this new approach. We demonstrated that the community profiling approach alone slightly outperforms the phylogenetic profiling approach for associating proteins in species that are well represented by sequenced genomes, and that combining phylogenetic and community profiling further improves (though only marginally) the prediction of functional association. Furthermore, we showed that the community profiling method significantly outperforms phylogenetic profiling, revealing more functional associations, when applied to a more recently sequenced bacterial genome.
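The core idea in the abstract above, predicting an association when two proteins share similar presence/absence profiles across communities, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the protein names, the toy profiles, the choice of Pearson correlation as the similarity metric, and the 0.8 threshold (the abstract states only that several correlation metrics were tested).

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length numeric profiles."""
    n = len(x)
    if n == 0 or n != len(y):
        raise ValueError("profiles must be non-empty and of equal length")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # A protein present (or absent) everywhere carries no signal.
        return 0.0
    return cov / sqrt(var_x * var_y)


def predict_associations(profiles, threshold=0.8):
    """Return (protein_a, protein_b, r) for pairs with r >= threshold.

    `threshold` is a hypothetical cutoff chosen for this example.
    """
    names = sorted(profiles)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(profiles[a], profiles[b])
            if r >= threshold:
                pairs.append((a, b, r))
    return pairs


# Hypothetical toy data: 1 = detected in that metagenome, 0 = not detected.
COMMUNITY_PROFILES = {
    "protA": [1, 1, 0, 0, 1, 0],
    "protB": [1, 1, 0, 0, 1, 0],
    "protC": [0, 0, 1, 1, 0, 1],
    "protD": [1, 0, 1, 0, 1, 0],
}

if __name__ == "__main__":
    for a, b, r in predict_associations(COMMUNITY_PROFILES):
        print(a, b, round(r, 3))
```

In this toy data set only protA and protB co-occur across all six communities, so they are the only pair reported. A real analysis would also need to handle abundance values rather than binary calls and correct for the number of pairs tested.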


Subject(s)
Metagenomics , Microbial Consortia/genetics , Sequence Analysis, DNA/methods , Software , Algorithms , Computational Biology/methods , Databases, Genetic , Genome, Bacterial , Phylogeny
18.
Methods Mol Biol ; 1526: 87-98, 2017.
Article in English | MEDLINE | ID: mdl-27896737

ABSTRACT

Genes linked by functional constraints display similar patterns of gain or loss during speciation. Similar phylogenetic profiles, therefore, can be an indication of a functional association between genes. The phylogenetic profiling method has been applied successfully to the reconstruction of gene pathways and the inference of unknown gene functions. Because it requires only sequence data to generate phylogenetic profiles, the method has the potential to take advantage of the recent explosion in available sequence data to reveal a significant number of functional associations between genes. Since the initial development of phylogenetic profiling, many modifications to improve this method have been proposed, including improvements in the measurement of profile similarity and the selection of reference species. Here, we describe the existing methods of phylogenetic profiling for the inference of functional associations and discuss their technical limitations and caveats.
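As a rough illustration of how a phylogenetic profile is built from sequence data and then compared, the sketch below converts per-species homology scores into a binary presence/absence vector and measures similarity with the Jaccard index. The score values, the 0.5 cutoff, and the function names are hypothetical; the chapter surveys several similarity measures, and Jaccard is used here only as one simple example.

```python
def binarize(scores, cutoff):
    """Turn per-species homology scores into a presence/absence profile.

    A species counts as having the gene when its score reaches `cutoff`
    (a hypothetical threshold; in practice it depends on the search tool).
    """
    return tuple(1 if s >= cutoff else 0 for s in scores)


def jaccard(p, q):
    """Jaccard similarity of two binary profiles over the same species."""
    if len(p) != len(q):
        raise ValueError("profiles must cover the same reference species")
    both = sum(1 for a, b in zip(p, q) if a and b)
    either = sum(1 for a, b in zip(p, q) if a or b)
    # Two genes absent from every reference species share no evidence.
    return both / either if either else 0.0


if __name__ == "__main__":
    # Hypothetical normalized scores for two genes across four species.
    gene_x = binarize([0.9, 0.1, 0.7, 0.0], cutoff=0.5)
    gene_y = binarize([0.8, 0.6, 0.9, 0.2], cutoff=0.5)
    print(gene_x, gene_y, jaccard(gene_x, gene_y))
```

Jaccard ignores species in which both genes are absent, which makes it less sensitive than a simple match count to reference sets dominated by distantly related genomes. This is one reason the choice of similarity measure and of reference species matters.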


Subject(s)
Computational Biology/methods , Gene Regulatory Networks/genetics , Gene Expression Profiling , Phylogeny
19.
J Plant Physiol ; 208: 94-101, 2017 Jan.
Article in English | MEDLINE | ID: mdl-27898332

ABSTRACT

Relatively little is known about why odd-numbered fatty acids (OFAs) can be synthesized only by some plant species. We aimed to determine whether there is a relationship between the effects of Cd-induced oxidative stress on unsaturated fatty acids (USFAs) and their degradation products, especially OFAs. Plants with different abilities to accumulate Cd, namely Noccaea praecox from Mezica, Slovenia (Me) and two ecotypes of Noccaea caerulescens from Ganges, France (Ga) and Redlschlag, Austria (Re), were cultivated in pot experiments. Only Me plants contained OFA 13:0, while all plants contained OFAs 15:0, 17:0 and 23:0 but in different proportions. Mutual correlations showed a significant effect of Cd contamination on the content of OFAs and USFAs in Me, a less pronounced effect in Re and the lowest one in Ga plants. The most significant correlation between the contents of USFAs and OFAs was also calculated for Me plants. The correlations between OFAs and USFAs indicate an active participation of OFAs in FA metabolism. The increased efficiency with which Me plants, in contrast to Re and Ga, utilize assimilated carbon via OFA metabolism is also reflected in the increased tolerance of Me plants to Cd toxicity in plant cells.


Subject(s)
Brassicaceae/physiology , Cadmium/toxicity , Fatty Acids/metabolism , Adaptation, Physiological , Austria , Biodegradation, Environmental , Brassicaceae/drug effects , Cadmium/metabolism , Ecotype , Fatty Acids/analysis , Fatty Acids, Unsaturated/analysis , Fatty Acids, Unsaturated/metabolism , France , Oxidative Stress/drug effects , Slovenia , Stress, Physiological
20.
Genomics ; 103(1): 65-75, 2014 Jan.
Article in English | MEDLINE | ID: mdl-24530517

ABSTRACT

Gene fusion and fission events are important for evolutionary studies and for predicting protein-protein interactions. Previous studies have shown that fusion events always predominate over fission events and that most of them represent singular events throughout evolution. In this project, the role of fusion and fission events in the genome evolution of 104 human bacterial pathogens was studied. In total, 141 protein pairs were identified as being involved in gene fusion or fission events. Surprisingly, we find that, in the species analyzed, gene fissions prevail over fusions. Moreover, while most events appear to have occurred only once in evolution, 23% of the gene fusion and fission events identified are deduced to have occurred independently multiple times. Comparison of the analyzed bacteria with non-pathogenic close relatives indicates that this impressive result is associated with the recent evolutionary history of the human bacterial pathogens, and thus is probably caused by their pathogenic lifestyle.


Subject(s)
Bacteria/genetics , Bacterial Proteins/genetics , Gene Frequency , Gene Fusion , Genome, Bacterial , Evolution, Molecular , Gene Expression Profiling , Humans , Phylogeny , Recombination, Genetic