Results 1 - 20 of 74
1.
Mol Biol Evol ; 41(9)2024 Sep 04.
Article in English | MEDLINE | ID: mdl-39158305

ABSTRACT

Profile mixture models capture distinct biochemical constraints on the amino acid substitution process at different sites in proteins. These models feature a mixture of time-reversible models with a common matrix of exchangeabilities and distinct sets of equilibrium amino acid frequencies known as profiles. Combining the exchangeability matrix with each profile generates the matrix of instantaneous rates of amino acid exchange for that profile. Currently, empirically estimated exchangeability matrices (e.g. the LG matrix) are widely used for phylogenetic inference under profile mixture models. However, these were estimated using a single profile and are unlikely optimal for profile mixture models. Here, we describe the GTRpmix model that allows maximum likelihood estimation of a common exchangeability matrix under any profile mixture model. We show that exchangeability matrices estimated under profile mixture models differ from the LG matrix, dramatically improving model fit and topological estimation accuracy for empirical test cases. Because the GTRpmix model is computationally expensive, we provide two exchangeability matrices estimated from large concatenated phylogenomic-supermatrices to be used for phylogenetic analyses. One, called Eukaryotic Linked Mixture (ELM), is designed for phylogenetic analysis of proteins encoded by nuclear genomes of eukaryotes, and the other, Eukaryotic and Archaeal Linked mixture (EAL), for reconstructing relationships between eukaryotes and Archaea. These matrices, combined with profile mixture models, fit data better and have improved topology estimation relative to the LG matrix combined with the same mixture models. Starting with version 2.3.1, IQ-TREE2 allows users to estimate linked exchangeabilities (i.e. amino acid exchange rates) under profile mixture models.
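As a hedged illustration of how one exchangeability matrix combines with each profile, the sketch below uses the standard time-reversible construction: the off-diagonal rate from state i to state j is the exchangeability s_ij times the target frequency pi_j, and each diagonal entry makes its row sum to zero. This is not the GTRpmix or IQ-TREE implementation. The function name `rate_matrix`, the three-state toy alphabet, and every numeric value are assumptions chosen for illustration (real profiles hold 20 amino acid frequencies).

```python
def rate_matrix(exch, profile):
    """Build a time-reversible rate matrix from symmetric exchangeabilities
    and one profile of equilibrium frequencies (illustrative sketch only)."""
    n = len(profile)
    q = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                # Off-diagonal rate: exchangeability times target frequency.
                q[i][j] = exch[i][j] * profile[j]
        # The diagonal is still 0.0 here, so this is minus the off-diagonal sum.
        q[i][i] = -sum(q[i])
    # Rescale so the expected substitution rate at equilibrium equals 1.
    scale = -sum(profile[i] * q[i][i] for i in range(n))
    return [[value / scale for value in row] for row in q]


# Hypothetical toy inputs: one shared exchangeability matrix, two profiles.
EXCH = [
    [0.0, 1.0, 2.0],
    [1.0, 0.0, 3.0],
    [2.0, 3.0, 0.0],
]
PROFILES = [
    [0.5, 0.3, 0.2],
    [0.1, 0.1, 0.8],
]

# One rate matrix per profile, all sharing the same exchangeabilities.
RATE_MATRICES = [rate_matrix(EXCH, profile) for profile in PROFILES]
```

Because the exchangeabilities are shared, the two matrices differ only through their profiles, which is the sense in which the exchangeabilities are "linked" across mixture components.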


Subject(s)
Models, Genetic , Phylogeny , Archaea/genetics , Likelihood Functions , Amino Acid Substitution , Evolution, Molecular , Eukaryota/genetics
2.
Mol Biol Evol ; 41(7)2024 Jul 03.
Article in English | MEDLINE | ID: mdl-38934791

ABSTRACT

We have recently introduced MAPLE (MAximum Parsimonious Likelihood Estimation), a new pandemic-scale phylogenetic inference method exclusively designed for genomic epidemiology. In response to the need for enhancing MAPLE's performance and scalability, here we present two key components: (i) CMAPLE software, a highly optimized C++ reimplementation of MAPLE with many new features and advancements, and (ii) CMAPLE library, a suite of application programming interfaces to facilitate the integration of the CMAPLE algorithm into existing phylogenetic inference packages. Notably, we have successfully integrated CMAPLE into the widely used IQ-TREE 2 software, enabling its rapid adoption in the scientific community. These advancements serve as a vital step toward better preparedness for future pandemics, offering researchers powerful tools for large-scale pathogen genomic analysis.


Subject(s)
Phylogeny , Software , Algorithms , Pandemics , Likelihood Functions , Humans
3.
Genome ; 67(9): 316-326, 2024 Sep 01.
Article in English | MEDLINE | ID: mdl-38722238

ABSTRACT

Animals encounter diverse microbial communities throughout their lifetime, which exert varying selection pressures. Antimicrobial peptides (AMPs), which lyse or inhibit microbial growth, are a first line of defense against some of these microbes. Here we examine how developmental variation in microbial exposure has affected the evolution of expression and amino acid sequences of Defensins (an ancient class of AMPs) in the house fly (Musca domestica). The house fly is a well-suited model for this work because it trophically associates with varying microbial communities throughout its life history and its genome contains expanded families of AMPs, including Defensins. We identified two subsets of house fly Defensins: one expressed in larvae or pupae, and the other expressed in adults. The amino acid sequences of these two Defensin subsets form distinct monophyletic clades, and they are located in separate gene clusters in the genome. The adult-expressed Defensins evolve faster than larval/pupal Defensins, consistent with different selection pressures across developmental stages. Our results therefore suggest that varied microbial communities encountered across life history can shape the evolutionary trajectories of immune genes.


Subject(s)
Defensins , Houseflies , Animals , Defensins/genetics , Houseflies/genetics , Evolution, Molecular , Phylogeny , Larva/genetics , Immune System , Amino Acid Sequence , Multigene Family
4.
Mol Biol Evol ; 41(2)2024 Feb 01.
Article in English | MEDLINE | ID: mdl-38301272

ABSTRACT

The transcription factor and cell cycle regulator p53 is marked for degradation by the ubiquitin ligase MDM2. The interaction between these 2 proteins is mediated by a conserved binding motif in the disordered p53 transactivation domain (p53TAD) and the folded SWIB domain in MDM2. The conserved motif in p53TAD from zebrafish displays a 20-fold weaker interaction with MDM2, compared to the interaction in human and chicken. To investigate this apparent difference, we tracked the molecular evolution of the p53TAD/MDM2 interaction among ray-finned fishes (Actinopterygii), the largest vertebrate clade. Intriguingly, phylogenetic analyses, ancestral sequence reconstructions, and binding experiments showed that different loss-of-affinity changes in the canonical binding motif within p53TAD have occurred repeatedly and convergently in different fish lineages, resulting in relatively low extant affinities (KD = 0.5 to 5 µM). However, for 11 different fish p53TAD/MDM2 interactions, nonconserved regions flanking the canonical motif increased the affinity 4- to 73-fold to be on par with the human interaction. Our findings suggest that compensating changes at conserved and nonconserved positions within the motif, as well as in flanking regions of low conservation, underlie a stabilizing selection of "functional affinity" in the p53TAD/MDM2 interaction. Such interplay complicates bioinformatic prediction of binding and calls for experimental validation. Motif-mediated protein-protein interactions involving short binding motifs and folded interaction domains are very common across multicellular life. It is likely that the evolution of affinity in motif-mediated interactions often involves an interplay between specific interactions made by conserved motif residues and nonspecific interactions by nonconserved disordered regions.
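To put the fold differences in affinity on an energy scale, a fold change in KD can be converted to a change in binding free energy with the standard relation ΔΔG = RT ln(fold). The sketch below is an illustration, not a calculation from the article: the function name `ddg_from_fold_change` is hypothetical, and the temperature of 298.15 K is an assumption, since the abstract does not state the experimental temperature.

```python
import math

# Gas constant in kcal / (mol * K).
GAS_CONSTANT_KCAL = 0.0019872


def ddg_from_fold_change(fold, temperature_k=298.15):
    """Return the free-energy difference (kcal/mol) implied by a fold
    change in the dissociation constant KD.

    A positive value means the weaker binder loses that much binding
    free energy relative to the stronger one.
    """
    if fold <= 0:
        raise ValueError("fold change must be positive")
    return GAS_CONSTANT_KCAL * temperature_k * math.log(fold)


# Fold changes quoted in the abstract, evaluated at the assumed temperature.
DDG_20_FOLD = ddg_from_fold_change(20)
DDG_73_FOLD = ddg_from_fold_change(73)
```

Under these assumptions a 20-fold difference in KD corresponds to a little under 2 kcal/mol, which shows how modest energetic contributions from flanking regions can shift affinity substantially.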


Subject(s)
Tumor Suppressor Protein p53 , Zebrafish , Animals , Humans , Tumor Suppressor Protein p53/genetics , Tumor Suppressor Protein p53/chemistry , Tumor Suppressor Protein p53/metabolism , Phylogeny , Protein Structure, Tertiary , Protein Binding , Proto-Oncogene Proteins c-mdm2/genetics , Proto-Oncogene Proteins c-mdm2/chemistry , Proto-Oncogene Proteins c-mdm2/metabolism
5.
Proc Natl Acad Sci U S A ; 121(6): e2308895121, 2024 Feb 06.
Article in English | MEDLINE | ID: mdl-38285950

ABSTRACT

Computational models of evolution are valuable for understanding the dynamics of sequence variation, to infer phylogenetic relationships or potential evolutionary pathways and for biomedical and industrial applications. Despite these benefits, few have validated their propensities to generate outputs with in vivo functionality, which would enhance their value as accurate and interpretable evolutionary algorithms. We demonstrate the power of epistasis inferred from natural protein families to evolve sequence variants in an algorithm we developed called sequence evolution with epistatic contributions (SEEC). Utilizing the Hamiltonian of the joint probability of sequences in the family as a fitness metric, we sampled and experimentally tested for in vivo β-lactamase activity in Escherichia coli TEM-1 variants. These evolved proteins can have dozens of mutations dispersed across the structure while preserving sites essential for both catalysis and interactions. Remarkably, these variants retain family-like functionality while being more active than their wild-type predecessor. We found that depending on the inference method used to generate the epistatic constraints, different parameters simulate diverse selection strengths. Under weaker selection, local Hamiltonian fluctuations reliably predict relative changes to variant fitness, recapitulating neutral evolution. SEEC has the potential to explore the dynamics of neofunctionalization, characterize viral fitness landscapes, and facilitate vaccine development.


Subject(s)
Epistasis, Genetic , Proteins , Phylogeny , Proteins/genetics , Mutation , Phenotype , Evolution, Molecular , Genetic Fitness , Models, Genetic
6.
Genome Biol Evol ; 15(11)2023 Nov 01.
Article in English | MEDLINE | ID: mdl-37931036

ABSTRACT

The nonrecombining female-limited W chromosome is predicted to experience unique evolutionary processes. Difficulties in assembling W chromosome sequences have hindered the identification of duck W-linked sequences and their evolutionary footprint. To address this, we conducted three initial contig-level genome assemblies and developed a rigorous pipeline by which to successfully expand the W-linked data set, including 11 known genes and 24 newly identified genes. Our results indicate that the W chromosome expression may not be subject to female-specific selection; a significant convergent pattern of upregulation associated with increased female-specific selection was not detected. The genetic stability of the W chromosome is also reflected in the strong evolutionary correlation between it and the mitochondria; the complete consistency of the cladogram topology constructed from their gene sequences proves the shared maternal coevolution. By detecting the evolutionary trajectories of W-linked sequences, we have found that recombination suppression started in four distinct strata, of which three were conserved across Neognathae. Taken together, our results have revealed a unique evolutionary pattern and an independent stratum evolutionary pattern for sex chromosomes.


Subject(s)
Ducks , Evolution, Molecular , Animals , Female , Ducks/genetics , Sex Chromosomes , Birds/genetics , Inheritance Patterns
7.
Genetica ; 151(6): 325-338, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37817002

ABSTRACT

Identifying homologs is an important process in the analysis of genetic patterns underlying traits and evolutionary relationships among species. Analysis of gene families is often used to form and support hypotheses on genetic patterns such as gene presence, absence, or functional divergence which underlie traits examined in functional studies. These analyses often require precise identification of all members in a targeted gene family. Manual pipelines where homology search and orthology assignment tools are used separately are the most common approach for identifying small gene families where accurate identification of all members is important. The ability to curate sequences between steps in manual pipelines allows for simple and precise identification of all possible gene family members. However, the validity of such manual pipeline analyses is often decreased by inappropriate approaches to homology searches including too relaxed or stringent statistical thresholds, inappropriate query sequences, homology classification based on sequence similarity alone, and low-quality proteome or genome sequences. In this article, we propose several approaches to mitigate these issues and allow for precise identification of gene family members and support for hypotheses linking genetic patterns to functional traits.


Subject(s)
Genome , Software , Biological Evolution
8.
Viruses ; 15(6)2023 May 24.
Article in English | MEDLINE | ID: mdl-37376527

ABSTRACT

The improvement of our knowledge of the virosphere, which includes unknown viruses, is a key area in virology. Metagenomics tools, which perform taxonomic assignation from high throughput sequencing datasets, are generally evaluated with datasets derived from biological samples or in silico spiked samples containing known viral sequences present in public databases, resulting in the inability to evaluate the capacity of these tools to detect novel or distant viruses. Simulating realistic evolutionary directions is therefore key to benchmark and improve these tools. Additionally, expanding current databases with realistic simulated sequences can improve the capacity of alignment-based searching strategies for finding distant viruses, which could lead to a better characterization of the "dark matter" of metagenomics data. Here, we present Virus Pop, a novel pipeline for simulating realistic protein sequences and adding new branches to a protein phylogenetic tree. The tool generates simulated sequences with substitution rate variations that are dependent on protein domains and inferred from the input dataset, allowing for a realistic representation of protein evolution. The pipeline also infers ancestral sequences corresponding to multiple internal nodes of the input data phylogenetic tree, enabling new sequences to be inserted at various points of interest in the group studied. We demonstrated that Virus Pop produces simulated sequences that closely match the structural and functional characteristics of real protein sequences, taking as an example the spike protein of sarbecoviruses. Virus Pop also succeeded at creating sequences that resemble real sequences not included in the databases, which facilitated the identification of a novel pathogenic human circovirus not included in the input database. In conclusion, Virus Pop is helpful for challenging taxonomic assignation tools and could help improve databases to better detect distant viruses.


Subject(s)
Computational Biology , Viruses , Humans , Phylogeny , Computational Biology/methods , Computer Simulation , Databases, Factual , Viruses/genetics , Metagenomics/methods
9.
R Soc Open Sci ; 10(4): 221313, 2023 Apr.
Article in English | MEDLINE | ID: mdl-37035296

ABSTRACT

Genes with sex-biased expression are thought to underlie sexually dimorphic phenotypes and are therefore subject to different selection pressures in males and females. Many authors have proposed that sexual conflict leads to the evolution of sex-biased expression, which allows males and females to reach separate phenotypic and fitness optima. The selection pressures associated with domestication may cause changes in population architectures and mating systems, which in turn can alter their direction and strength. We compared sex-biased expression and genetic signatures in wild and domestic ducks (Anas platyrhynchos), and observed changes of sexual selection and identified the genomic divergence affected by selection forces. The extent of sex-biased expression in both sexes is positively correlated with the level of both dN/dS and nucleotide diversity. This observed changing pattern may mainly be owing to relaxed genetic constraints. We also demonstrate a clear link between domestication and sex-biased evolutionary rate in a comparative framework. Decreased polymorphism and evolutionary rate in domesticated populations generally matched life-history phenotypes known to experience artificial selection. Taken together, our work suggests the important implications of domestication in sex-biased evolution and the roles of artificial selection and sexual selection for shaping the diversity and evolutionary rate of the genome.

10.
BMC Plant Biol ; 23(1): 156, 2023 Mar 22.
Article in English | MEDLINE | ID: mdl-36944988

ABSTRACT

BACKGROUND: Plant organelle genomes are a valuable resource for evolutionary biology research, yet their genome architectures, evolutionary patterns and environmental adaptations are poorly understood in many lineages. Rhodiola species are plants of high medicinal value, mainly distributed in highland habitats. Here, we assembled the organelle genomes of three Rhodiola species (R. wallichiana, R. crenulata and R. sacra) collected from the Qinghai-Tibet plateau (QTP), and compared their genome structure, gene content, structural rearrangements, sequence transfer and sequence evolution rates.
RESULTS: The results demonstrated the contrasting evolutionary pattern between plastomes and mitogenomes in three Rhodiola species, with the former possessing more conserved genome structure but faster evolutionary rates of sequence, while the latter exhibited structural diversity but slower rates of sequence evolution. Some lineage-specific features were observed in Rhodiola mitogenomes, including chromosome fission, gene loss and structural rearrangement. Repeat element analysis shows that the repeats occurring between the two chromosomes may mediate the formation of multichromosomal structure in the mitogenomes of Rhodiola, and this multichromosomal structure may have recently formed. The identification of homologous sequences between plastomes and mitogenomes reveals several unidirectional protein-coding gene transfer events from chloroplasts to mitochondria. Moreover, we found that their organelle genomes contained multiple fragments of nuclear transposable elements (TEs) and exhibited different preferences for TE insertion type. Genome-wide scans of positive selection identified one gene, matR, from the mitogenome. Since matR is crucial for plant growth and development, as well as for respiration and stress responses, our findings suggest that matR may participate in the adaptive response of Rhodiola species to the environmental stress of the QTP.
CONCLUSION: The study analyzed the organelle genomes of three Rhodiola species and demonstrated the contrasting evolutionary pattern between plastomes and mitogenomes. Signals of positive selection were detected in the matR gene of Rhodiola mitogenomes, suggesting the potential role of this gene in Rhodiola adaptation to QTP. Together, the study is expected to enrich the genomic resources and provide valuable insights into the structural dynamics and sequence divergences of Rhodiola species.


Subject(s)
Genome, Mitochondrial , Genome, Plastid , Rhodiola , Rhodiola/genetics , Phylogeny , Tibet , Mitochondria/genetics , Genome, Mitochondrial/genetics , Evolution, Molecular
11.
Biology (Basel) ; 12(2)2023 Feb 10.
Article in English | MEDLINE | ID: mdl-36829559

ABSTRACT

The factors that determine the relative rates of amino acid substitution during protein evolution are complex and known to vary among taxa. We estimated relative exchangeabilities for pairs of amino acids from clades spread across the tree of life and assessed the historical signal in the distances among these clade-specific models. We separately trained these models on collections of arbitrarily selected protein alignments and on ribosomal protein alignments. In both cases, we found a clear separation between the models trained using multiple sequence alignments from bacterial clades and the models trained on archaeal and eukaryotic data. We assessed the predictive power of our novel clade-specific models of sequence evolution by asking whether fit to the models could be used to identify the source of multiple sequence alignments. Model fit was generally able to correctly classify protein alignments at the level of domain (bacterial versus archaeal), but the accuracy of classification at finer scales was much lower. The only exceptions to this were the relatively high classification accuracy for two archaeal lineages: Halobacteriaceae and Thermoprotei. Genomic GC content had a modest impact on relative exchangeabilities despite having a large impact on amino acid frequencies. Relative exchangeabilities involving aromatic residues exhibited the largest differences among models. There were a small number of exchangeabilities that exhibited large differences in comparisons among major clades and between generalized models and ribosomal protein models. Taken as a whole, these results reveal that a small number of relative exchangeabilities are responsible for much of the structure of the "model space" for protein sequence evolution. The clade-specific models we generated may be useful tools for protein phylogenetics, and the structure of evolutionary model space that they revealed has implications for phylogenomic inference across the tree of life.

12.
Mol Biol Evol ; 39(10)2022 Oct 07.
Article in English | MEDLINE | ID: mdl-36181434

ABSTRACT

Our understanding of the genetic architecture of phenotypic traits has experienced drastic growth over the last years. Nevertheless, the majority of studies associating genotypes and phenotypes have been conducted at the ontogenetic level. Thus, we still have an elusive knowledge of how these genetic-developmental architectures evolve themselves and how their evolution is mirrored in the phenotypic change across evolutionary time. We tackle this gap by reconstructing the evolution of male genital size, one of the most complex traits in insects, together with its underlying genetic architecture. Using the order Hemiptera as a model, spanning over 350 million years of evolution, we estimate the correlation between genitalia and three features: development rate, body size, and rates of DNA substitution in 68 genes associated with genital development. We demonstrate that genital size macro-evolution has been largely dependent on body size and weakly influenced by development rate and phylogenetic history. We further revealed significant correlations between mutation rates and genital size for 19 genes. Interestingly, these genes have diverse functions and participate in distinct signaling pathways, suggesting that genital size is a complex trait whose fast evolution has been enabled by molecular changes associated with diverse morphogenetic processes. Our data further demonstrate that the majority of DNA evolution correlated with the genitalia has been shaped by negative selection or neutral evolution. Thus, in terms of sequence evolution, changes in genital size are predominantly facilitated by relaxation of constraints rather than positive selection, possibly due to the high pleiotropic nature of the morphogenetic genes.


Subject(s)
Biological Evolution , Evolution, Molecular , Animals , Phylogeny , Genitalia, Male , Genitalia
13.
G3 (Bethesda) ; 12(11)2022 Nov 04.
Article in English | MEDLINE | ID: mdl-36073932

ABSTRACT

The evolutionary speed of a protein sequence is constrained by its expression level, with highly expressed proteins evolving relatively slowly. This negative correlation between expression levels and evolutionary rates (known as the E-R anticorrelation) has already been widely observed in past macroevolution between species from bacteria to animals. However, it remains unclear whether this seemingly general law also governs recent evolution, including past and de novo, within a species. However, the advent of genomic sequencing and high-throughput phenotyping, particularly for bacteria, has revealed fundamental gaps between the 2 evolutionary processes and has provided empirical data opposing the possible underlying mechanisms which are widely believed. These conflicts raise questions about the generalization of the E-R anticorrelation and the relevance of plausible mechanisms. To explore the ubiquitous impact of expression levels on molecular evolution and test the relevance of the possible underlying mechanisms, we analyzed the genome sequences of 99 strains of Escherichia coli for evolution within species in nature. We also analyzed genomic mutations accumulated under laboratory conditions as a model of de novo evolution within species. Here, we show that E-R anticorrelation is significant in both past and de novo evolution within species in E. coli. Our data also confirmed ongoing purifying selection on highly expressed genes. Ongoing selection included codon-level purifying selection, supporting the relevance of the underlying mechanisms. However, the impact of codon-level purifying selection on the constraints in evolution within species might be smaller than previously expected from evolution between species.
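One common way to quantify an anticorrelation of this kind is a rank correlation between per-gene expression level and per-gene evolutionary rate. The sketch below computes Spearman's rho in plain Python. The abstract does not say which statistic the authors used, so the choice of Spearman's rho is an assumption, and the function names and the five-gene data set are invented purely for illustration.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the block while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical values for five genes (not data from the study).
EXPRESSION = [5.0, 120.0, 40.0, 900.0, 15.0]
EVOLUTIONARY_RATE = [0.20, 0.05, 0.12, 0.01, 0.30]

RHO = spearman_rho(EXPRESSION, EVOLUTIONARY_RATE)
```

A negative rho on real data would be the signature of the E-R anticorrelation; a rank-based statistic is used here because expression levels typically span orders of magnitude.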


Subject(s)
Escherichia coli , Evolution, Molecular , Animals , Escherichia coli/genetics , Codon , Proteins/genetics , Mutation , Selection, Genetic
14.
Metab Eng ; 74: 49-60, 2022 Nov.
Article in English | MEDLINE | ID: mdl-36113751

ABSTRACT

The utility of engineering enzyme activity is expanding with the development of biotechnology. Conventional methods have limited applicability as they require high-throughput screening or three-dimensional structures to direct target residues of activity control. An alternative method uses sequence evolution of natural selection. A repertoire of mutations was selected for fine-tuning enzyme activities to adapt to varying environments during evolution. Here, we devised a strategy called sequence co-evolutionary analysis to control the efficiency of enzyme reactions (SCANEER), which scans the evolution of protein sequences and directs mutation strategy to improve enzyme activity. We hypothesized that amino acid pairs for various enzyme activities were encoded in the evolutionary history of protein sequences, whereas loss-of-function mutations were avoided since those are depleted during evolution. SCANEER successfully predicted the enzyme activities of beta-lactamase and aminoglycoside 3'-phosphotransferase. SCANEER was further experimentally validated to control the activities of three different enzymes of great interest in chemical production: cis-aconitate decarboxylase, α-ketoglutaric semialdehyde dehydrogenase, and inositol oxygenase. Activity-enhancing mutations that improve substrate-binding affinity or turnover rate were found at sites distal from known active sites or ligand-binding pockets. We provide SCANEER to control desired enzyme activity through a user-friendly webserver.


Subject(s)
Protein Engineering , Mutation , Protein Engineering/methods
15.
Genome Biol Evol ; 14(9)2022 Sep 06.
Article in English | MEDLINE | ID: mdl-35946263

ABSTRACT

Over the last decade, molecular systematics has undergone a change of paradigm as high-throughput sequencing now makes it possible to reconstruct evolutionary relationships using genome-scale datasets. The advent of "big data" molecular phylogenetics provided a battery of new tools for biologists but simultaneously brought new methodological challenges. The increase in analytical complexity comes at the price of highly specific training in computational biology and molecular phylogenetics, resulting very often in a polarized accumulation of knowledge (technical on one side and biological on the other). Interpreting the robustness of genome-scale phylogenetic studies is not straightforward, particularly as new methodological developments have consistently shown that the general belief of "more genes, more robustness" often does not apply, and because there is a range of systematic errors that plague phylogenomic investigations. This is particularly problematic because phylogenomic studies are highly heterogeneous in their methodology, and best practices are often not clearly defined. The main aim of this article is to present what I consider as the ten most important points to take into consideration when planning a well-thought-out phylogenomic study and while evaluating the quality of published papers. The goal is to provide a practical step-by-step guide that can be easily followed by nonexperts and phylogenomic novices in order to assess the technical robustness of phylogenomic studies or improve the experimental design of a project.


Subject(s)
Biological Evolution , High-Throughput Nucleotide Sequencing , Computational Biology , Genome , High-Throughput Nucleotide Sequencing/methods , Phylogeny
16.
Proc Natl Acad Sci U S A ; 119(24): e2203176119, 2022 Jun 14.
Article in English | MEDLINE | ID: mdl-35648808

ABSTRACT

Bacterial signal transduction systems sense changes in the environment and transmit these signals to control cellular responses. The simplest one-component signal transduction systems include an input sensor domain and an output response domain encoded in a single protein chain. Alternatively, two-component signal transduction systems transmit signals by phosphorelay between input and output domains from separate proteins. The membrane-tethered periplasmic bile acid sensor that activates the Vibrio parahaemolyticus type III secretion system adopts an obligate heterodimer of two proteins encoded by partially overlapping VtrA and VtrC genes. This co-component signal transduction system binds bile acid using a lipocalin-like domain in VtrC and transmits the signal through the membrane to a cytoplasmic DNA-binding transcription factor in VtrA. Using the domain and operon organization of VtrA/VtrC, we identify a fast-evolving superfamily of co-component systems in enteric bacteria. Accurate machine learning-based fold predictions for the candidate co-components support their homology in the twilight zone of rapidly evolving sequences and provide mechanistic hypotheses about previously unrecognized lipid-sensing functions.


Subject(s)
Bacterial Proteins , Gene Expression Regulation, Bacterial , Genomic Islands , Membrane Proteins , Type III Secretion Systems , Vibrio parahaemolyticus , Bacterial Proteins/genetics , Bacterial Proteins/metabolism , Bile Acids and Salts/metabolism , DNA-Binding Proteins/metabolism , Membrane Proteins/genetics , Membrane Proteins/metabolism , Protein Multimerization , Signal Transduction , Transcription Factors/metabolism , Type III Secretion Systems/genetics , Vibrio parahaemolyticus/genetics , Vibrio parahaemolyticus/pathogenicity , Virulence/genetics
17.
Methods Mol Biol ; 2449: 149-167, 2022.
Article in English | MEDLINE | ID: mdl-35507261

ABSTRACT

Sequence-based approaches are fundamental to guide experimental investigations in obtaining structural and/or functional insights into uncharacterized protein families. Powerful profile-based sequence search methods rely on a sequence space continuum to identify non-trivial relationships through homology detection. The computational design of protein-like sequences that serve as "artificial linkers" is useful in identifying relationships between distant members of a structural fold. Such sequences act as intermediates and guide homology searches between distantly related proteins. Here, we describe an approach that represents natural intermediate sequences and designed protein-like sequences as HMM (Hidden Markov Models) profiles, to improve the sensitivity of existing search methods. Searches made within the "Profile database" were shown to recognize the parent structural fold for 90% of the search queries at query coverage better than 60%. For 1040 protein families with no available structure, fold associations were made through searches in the database of natural and designed sequence profiles. Most of the associations were made with the Alpha-alpha superhelix, Transmembrane beta-barrels, TIM barrel, and Immunoglobulin-like beta-sandwich folds. For 11 domain families of unknown functions, we provide confident fold associations using the profiles of designed sequences and a consensus from other fold recognition methods. For two DUFs (Domain families of Unknown Functions), we performed detailed functional annotation through comparisons with characterized templates of families of known function.


Subject(s)
Computational Biology , Proteins , Amino Acid Sequence , Computational Biology/methods , Databases, Protein , Proteins/chemistry , Proteins/genetics
18.
Genome Biol Evol ; 14(4)2022 Apr 10.
Article in English | MEDLINE | ID: mdl-35420669

ABSTRACT

Members of the Peronosporaceae (Oomycota, Chromista), which currently consists of 25 genera and approximately 1,000 recognized species, are responsible for disease on a wide range of plant hosts. Molecular phylogenetic analyses over the last two decades have improved our understanding of evolutionary relationships within Peronosporaceae. To date, 16 numbered and three named clades have been recognized; it is clear from these studies that the current taxonomy does not reflect evolutionary relationships. Whole organelle genome sequences are an increasingly important source of phylogenetic information, and in this study, we present comparative and phylogenetic analyses of mitogenome sequences from 15 of the 19 currently recognized clades of Peronosporaceae, including 44 newly assembled sequences. Our analyses suggest strong conservation of mitogenome size and gene content across Peronosporaceae but, as previous studies have suggested, limited conservation of synteny. Specifically, we identified 28 distinct syntenies amongst the 71 examined isolates. Moreover, 19 of the isolates contained inverted or direct repeats, suggesting repeated sequences may be more common than previously thought. In terms of phylogenetic relationships, our analyses of 34 concatenated mitochondrial gene sequences resulted in a topology that was broadly consistent with previous studies. However, unlike previous studies, concatenated mitochondrial sequences provided strong support for higher-level relationships within the family.


Subject(s)
Genome, Mitochondrial , Oomycetes , Evolution, Molecular , Genes, Mitochondrial , Oomycetes/genetics , Phylogeny , Synteny
19.
Int J Toxicol ; 41(2): 132-142, 2022.
Article in English | MEDLINE | ID: mdl-35311363

ABSTRACT

From a micro to macro scale of biological organization, macromolecular diversity and biological heterogeneity are fundamental properties of biological systems. Heterogeneity may result from genetic, epigenetic, and non-genetic characteristics (e.g., tissue microenvironment). Macromolecular diversity and biological heterogeneity are tolerated as long as the sustenance and propagation of life are not disrupted. They also provide the raw materials for microevolutionary changes that may help organisms adapt to new selection pressures arising from the environment. Sequence evolution, functional divergence, and positive selection of gene and promoter dosage play a major role in the evolution of life's diversity including complex metabolic networks, which is ultimately reflected in changes in the allele frequency over time. Robustness in evolvable biological systems is conferred by functional redundancy that is often created by macromolecular diversity and biological heterogeneity. The ability to investigate biological macromolecules at an increasingly finer level has uncovered a wealth of information in this regard. Therefore, the dynamics of biological complexity should be taken into consideration in biomedical research.

20.
Front Immunol ; 13: 1014439, 2022.
Article in English | MEDLINE | ID: mdl-36618367

ABSTRACT

Affinity maturation (AM) of B cells through somatic hypermutations (SHMs) enables the immune system to evolve to recognize diverse pathogens. The accumulation of SHMs leads to the formation of clonal lineages of antibody-secreting B cells that have evolved from a common naïve B cell. Advances in high-throughput sequencing have enabled deep scans of B cell receptor repertoires, paving the way for reconstructing clonal trees. However, it is not clear if clonal trees, which capture microevolutionary time scales, can be reconstructed using traditional phylogenetic reconstruction methods with adequate accuracy. In fact, several clonal tree reconstruction methods have been developed to fix supposed shortcomings of phylogenetic methods. Nevertheless, no consensus has been reached regarding the relative accuracy of these methods, partially because evaluation is challenging. Benchmarking the performance of existing methods and developing better methods would both benefit from realistic models of clonal lineage evolution specifically designed for emulating B cell evolution. In this paper, we propose a model of B cell clonal lineage evolution and use this model to benchmark several existing clonal tree reconstruction methods. Our model, designed to be extensible, has several features: by evolving the clonal tree and sequences simultaneously, it allows modeling selective pressure due to changes in affinity binding; it enables scalable simulations of large numbers of cells; it enables several rounds of infection by an evolving pathogen; and it models the building of memory. We also suggest a set of metrics for comparing clonal trees and measuring their properties. Our results show that while maximum likelihood phylogenetic reconstruction methods can fail to capture key features of clonal tree expansion if applied naively, a simple post-processing of their results, where short branches are contracted, leads to inferences that are better than those of alternative methods.


Subject(s)
Antibodies , Benchmarking , Phylogeny , B-Lymphocytes