Results 1 - 20 of 24
1.
Ecotoxicol Environ Saf ; 283: 116795, 2024 Jul 30.
Article in English | MEDLINE | ID: mdl-39083868

ABSTRACT

The rapid growth of cyanobacteria, particularly Microcystis aeruginosa, poses a significant threat to global water security. The proliferation of toxic Microcystis aeruginosa raises concerns due to its potential harm to human health and socioeconomic impacts. Dense blooms contribute to spatiotemporal inorganic carbon depletion, prompting interest in the roles of carbon-concentrating mechanisms (CCMs) for competitive carbon uptake. Despite the importance of HCO3- transporters, genetic evaluations and functional predictions in M. aeruginosa remain insufficient. In this study, we explored the diversity of HCO3- transporters in the genomes of 46 strains of M. aeruginosa, assessing positive selection for each. Intriguingly, although the Microcystis BicA transporter became a partial gene in 23 out of 46 genomic strains, we observed significant positive sites. Structural analyses, including predicted 2D and 3D models, confirmed the structural conservation of the Microcystis BicA transporter. Our findings suggest that the Microcystis BicA transporter likely plays a crucial role in competitive carbon uptake, emphasizing its ecological significance. The ecological function of the Microcystis BicA transporter in competitive growth during cyanobacterial blooms raises important questions. Future studies require experimental confirmation to better understand the role of the Microcystis BicA transporter in cyanobacterial bloom dynamics.

2.
Semin Cancer Biol ; 92: 84-101, 2023 07.
Article in English | MEDLINE | ID: mdl-37003397

ABSTRACT

Acute myeloid leukemia (AML) is a heterogeneous disease with a genetic, epigenetic, and transcriptional etiology mainly presenting somatic and germline abnormalities. AML incidence rises with age but can also occur during childhood. Pediatric AML (pAML) accounts for 15-20% of all pediatric leukemias and differs considerably from adult AML. Next-generation sequencing technologies have enabled the research community to "paint" the genomic and epigenomic landscape in order to identify pathology-associated mutations and other prognostic biomarkers in pAML. Although current treatments have improved the prognosis for pAML, chemoresistance, recurrence, and refractory disease remain major challenges. In particular, pAML relapse is commonly caused by leukemia stem cells that resist therapy. Marked patient-to-patient heterogeneity is likely the primary reason why the same treatment is successful for some patients but, at best, only partially effective for others. Accumulating evidence indicates that patient-specific clonal composition impinges significantly on cellular processes, such as gene regulation and metabolism. Although our understanding of metabolism in pAML is still in its infancy, greater insights into these processes and their (epigenetic) modulation may pave the way toward novel treatment options. In this review, we summarize current knowledge on the function of genetic and epigenetic (mis)regulation in pAML, including metabolic features observed in the disease. Specifically, we describe how (epi)genetic machinery can affect chromatin status during hematopoiesis, leading to an altered metabolic profile, and focus on the potential value of targeting epigenetic abnormalities in precision and combination therapy for pAML. We also discuss the possibility of using alternative epidrug-based therapeutic approaches that are already in clinical practice, either alone as adjuvant treatments and/or in combination with other drugs.


Subject(s)
Epigenomics , Leukemia, Myeloid, Acute , Humans , Child , Leukemia, Myeloid, Acute/genetics , Leukemia, Myeloid, Acute/therapy , Prognosis , Mutation
3.
Mol Biol Evol ; 40(4)2023 04 04.
Article in English | MEDLINE | ID: mdl-37096789

ABSTRACT

The CODEML program in the PAML package has been widely used to analyze protein-coding gene sequences to estimate the synonymous and nonsynonymous rates (dS and dN) and to detect positive Darwinian selection driving protein evolution. For users not familiar with molecular evolutionary analysis, the program is known to have a steep learning curve. Here, we provide a step-by-step protocol to illustrate the commonly used tests available in the program, including the branch models, the site models, and the branch-site models, which can be used to detect positive selection driving adaptive protein evolution affecting particular lineages of the species phylogeny, affecting a subset of amino acid residues in the protein, and affecting a subset of sites along prespecified lineages, respectively. A data set of the myxovirus (Mx) genes from ten mammal and two bird species is used as an example. We discuss a new feature in CODEML that allows users to perform positive selection tests for multiple genes for the same set of taxa, as is common in modern genome-sequencing projects. The PAML package is distributed at https://github.com/abacus-gene/paml under the GNU license, with support provided at its discussion site (https://groups.google.com/g/pamlsoftware). Data files used in this protocol are available at https://github.com/abacus-gene/paml-tutorial.
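The model comparisons described in this protocol are usually judged with a likelihood ratio test (LRT) between a null and an alternative model. The following is a minimal sketch of that arithmetic, not CODEML's own code: the log-likelihood values are hypothetical, the function names are illustrative, and the chi-square tail probability is computed only for df = 1 or even df so that the example needs nothing beyond the Python standard library.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable (df = 1 or even df only)."""
    if x <= 0:
        return 1.0
    if df == 1:
        # For one degree of freedom the tail reduces to the complementary error function.
        return math.erfc(math.sqrt(x / 2.0))
    if df % 2 == 0:
        # Closed form for even df: exp(-x/2) * sum_{k < df/2} (x/2)^k / k!
        half = x / 2.0
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    raise ValueError("this sketch supports df = 1 or even df only")


def likelihood_ratio_test(lnl_null, lnl_alt, df):
    """Return the LRT statistic 2*(lnL_alt - lnL_null) and its chi-square p-value."""
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))
    return stat, chi2_sf(stat, df)


# Hypothetical log-likelihoods for a null site model and its alternative,
# compared with two degrees of freedom (an assumption for this example).
stat, p = likelihood_ratio_test(-1000.0, -995.0, df=2)
print(f"2*deltaLnL = {stat:.2f}, p = {p:.4f}")
```

The degrees of freedom depend on the pair of models being compared, so the value passed as `df` should be taken from the protocol or the PAML documentation rather than from this sketch.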


Subject(s)
Evolution, Molecular , Software , Animals , Codon , Base Sequence , Selection, Genetic , Phylogeny , Mammals/genetics
4.
Mol Phylogenet Evol ; 182: 107755, 2023 05.
Article in English | MEDLINE | ID: mdl-36906194

ABSTRACT

The genus Rumex L. (Polygonaceae) provides a unique system for investigating the evolutionary development of sex determination and molecular rate evolution. Historically, Rumex has been divided, both taxonomically and colloquially into two groups: 'docks' and 'sorrels'. A well-resolved phylogeny can help evaluate a genetic basis for this division. Here we present a plastome phylogeny for 34 species of Rumex, inferred using maximum likelihood criteria. The historical 'docks' (Rumex subgenus Rumex) were resolved as monophyletic. The historical 'sorrels' (Rumex subgenera Acetosa and Acetosella) were resolved together, though not monophyletic due to the inclusion of R. bucephalophorus (Rumex subgenus Platypodium). Emex is supported as its own subgenus within Rumex, instead of resolved as sister taxa. We found remarkably low nucleotide diversity among the docks, consistent with recent diversification in that group, especially as compared to the sorrels. Fossil calibration of the phylogeny suggested that the common ancestor for Rumex (including Emex) has origins in the lower Miocene (22.13 MYA). The sorrels appear to have subsequently diversified at a relatively constant rate. The origin of the docks, however, was placed in the upper Miocene, but with most speciation occurring in the Plio-Pleistocene.


Subject(s)
Polygonaceae , Rumex , Phylogeny , Rumex/genetics , Biological Evolution , Evolution, Molecular
5.
Mitochondrion ; 69: 36-42, 2023 03.
Article in English | MEDLINE | ID: mdl-36690316

ABSTRACT

The two species of the Old World Camelini tribe, dromedary and Bactrian camels, show superior adaptability to the different environmental conditions they populate, e.g. desert, mountains and coastal areas, which might be associated with adaptive variations on their mitochondrial DNA. Here, we investigate signatures of natural selection in the 13 mitochondrial protein-coding genes of different dromedary camel populations from the Arabian Peninsula, Africa and southwest Asia. The full mitogenome sequences of 42 dromedaries, 38 domestic Bactrian, 29 wild Bactrian camels and 31 samples representing the New World Lamini tribe reveal species-wise genetic distinction among Camelidae family species, with no evidence of geographic distinction among dromedary camels. We observe gene-wide signals of adaptive divergence between the Old World and New World camels, with evidence of purifying selection among Old World camel species. Upon comparing the different Camelidae tribes, 27 amino acid substitutions across ten mtDNA protein-coding genes were found to be under positive selection, of which 24 codons were defined to be under positive adaptive divergence between Old World and New World camels. Seven codons belonging to three genes demonstrated positive selection in dromedary lineage. A total of 89 codons were found to be under positive selection in Camelidae family based on investigating the impact of amino acid replacement on the physicochemical properties of proteins, including equilibrium constant and surrounding hydrophobicity. These mtDNA variants under positive selection in the Camelidae family might be associated with their adaptation to their contrasting environments.


Subject(s)
Camelus , DNA, Mitochondrial , Animals , Camelus/genetics , DNA, Mitochondrial/genetics , DNA, Mitochondrial/chemistry , Mitochondria/genetics
6.
Genes (Basel) ; 13(6)2022 06 18.
Article in English | MEDLINE | ID: mdl-35741852

ABSTRACT

Evolution is change over time. Whereas neutral changes promoted by drift effects are most reliable for phylogenetic reconstructions, selection-relevant changes are of only limited use to reconstruct phylogenies. On the other hand, comparative analyses of neutral and selected changes of protein-coding DNA sequences (CDS) retrospectively tell us about episodic constrained, relaxed, and adaptive incidences. The ratio of sites with nonsynonymous (amino acid altering) versus synonymous (not altering) mutations directly measures selection pressure and can be analysed by using the Phylogenetic Analysis by Maximum Likelihood (PAML) software package. We developed a CDS extractor for compiling protein-coding sequences (CDS-extractor) and parallel PAML (paPAML) to simplify, amplify, and accelerate selection analyses via parallel processing, including detection of negatively selected sites. paPAML compiles results of site, branch-site, and branch models and detects site-specific negative selection with the output of a codon list labelling significance values. The tool simplifies selection analyses for casual and inexperienced users and accelerates computing speeds up to the number of allocated computer threads. We then applied paPAML to examine the evolutionary impact on a new GINS Complex Subunit 3 exon, and neutrophil-associated as well as lysin and apolipoprotein genes. Compared with codeml (PAML version 4.9j) and HyPhy (HyPhy FEL version 2.5.26), all paPAML test runs performed with 10 computing threads led to identical selection pressure results, whereas the total selection analysis via paPAML, including all model comparisons, was about 3 to 5 times faster than the longest running codeml model and about 7 to 15 times faster than the entire processing time of these codeml runs.
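To make the distinction between nonsynonymous and synonymous changes concrete, the sketch below classifies codon differences between two aligned coding sequences using the standard genetic code. It is only an illustration of the raw classification: it is not the maximum-likelihood estimation performed by codeml or paPAML, it does not normalise by the number of synonymous and nonsynonymous sites, and the function name and example sequences are made up for this example.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def count_codon_differences(seq1, seq2):
    """Count codons that differ synonymously and nonsynonymously between two aligned CDS."""
    if len(seq1) != len(seq2) or len(seq1) % 3 != 0:
        raise ValueError("sequences must be aligned, equal in length, and in frame")
    synonymous = nonsynonymous = 0
    for i in range(0, len(seq1), 3):
        codon1 = seq1[i:i + 3].upper()
        codon2 = seq2[i:i + 3].upper()
        # Identical codons carry no change; gapped or ambiguous codons are skipped.
        if codon1 == codon2 or codon1 not in CODON_TABLE or codon2 not in CODON_TABLE:
            continue
        if CODON_TABLE[codon1] == CODON_TABLE[codon2]:
            synonymous += 1
        else:
            nonsynonymous += 1
    return synonymous, nonsynonymous


# Made-up example: TTT -> TTC keeps phenylalanine, AAA -> AGA swaps lysine for arginine.
print(count_codon_differences("ATGTTTAAA", "ATGTTCAGA"))
```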


Subject(s)
Software , Codon , Open Reading Frames , Phylogeny , Retrospective Studies
7.
PeerJ ; 9: e11677, 2021.
Article in English | MEDLINE | ID: mdl-34221740

ABSTRACT

The suppressor of the cytokine signaling (SOCS) family of proteins plays an essential role in inhibiting cytokine receptor signaling by regulating immune signal pathways. Although SOCS gene functions have been examined extensively, no comprehensive study has been performed on this gene family's molecular evolution in reptiles. In this study, we identified eight canonical SOCS genes using recently-published reptilian genomes. We used phylogenetic analysis to determine that the SOCS genes had highly conserved evolutionary dynamics that we classified into two types. We identified positive SOCS4 selection signals in whole reptile lineages and SOCS2 selection signals in the crocodilian lineage. Selective pressure analyses using the branch model and Z-test revealed that these genes were under different negative selection pressures compared to reptile lineages. We also concluded that the nature of selection pressure varies across different reptile lineages on SOCS3, and the crocodilian lineage has experienced rapid evolution. Our results may provide a theoretical foundation for further analyses of reptilian SOCS genes' functional and molecular mechanisms, as well as their roles in reptile growth and development.

8.
Genome Biol Evol ; 13(8)2021 08 03.
Article in English | MEDLINE | ID: mdl-34289036

ABSTRACT

Tests based on the dN/dS statistic are used to identify positive selection of nonsynonymous polymorphisms. Using these tests on alignments of all orthologs from related species can provide insights into which gene categories have been most frequently positively selected. However, longer alignments have more power to detect positive selection, creating a detection bias that could create misleading results from functional enrichment tests. Most studies of positive selection in plant pathogens focus on genes with specific virulence functions, with little emphasis on broader molecular processes. Furthermore, no studies in plant pathogens have accounted for detection bias due to alignment length when performing functional enrichment tests. To address these research gaps, we analyze 12 genomes of the phytopathogenic fungal genus Botrytis, including two sequenced in this study. To establish a temporal context, we estimated fossil-calibrated divergence times for the genus. We find that Botrytis likely originated 16-18 Ma in the Miocene and underwent continuous radiation ending in the Pliocene. An untargeted scan of Botrytis single-copy orthologs for positive selection with three different statistical tests uncovered evidence for positive selection among proteases, signaling proteins, CAZymes, and secreted proteins. There was also a strong overrepresentation of transcription factors among positively selected genes. This overrepresentation was still apparent after two complementary controls for detection bias due to sequence length. Positively selected sites were depleted within DNA-binding domains, suggesting changes in transcriptional responses to internal and external cues or protein-protein interactions have undergone positive selection more frequently than changes in promoter fidelity.


Subject(s)
Evolution, Molecular , Selection, Genetic , Botrytis/genetics , Phylogeny , Transcription Factors/genetics
9.
PeerJ ; 7: e7613, 2019.
Article in English | MEDLINE | ID: mdl-31531274

ABSTRACT

The hedgehog signaling pathway plays a vital role in human and animal patterning and cell proliferation during the developmental process. The hedgehog gene family of vertebrate species includes three genes, Shh, Dhh, and Ihh, which possess different functions and expression patterns. Despite the importance of hedgehog genes, genomic evidence of this gene family in reptiles is lacking. In this study, the available genomes of a number of representative reptile species were explored by utilizing adaptive evolutionary analysis methods to characterize the evolutionary patterns of the hedgehog gene family. Altogether, 33 sonic hedgehog (Shh), 25 desert hedgehog (Dhh), and 20 Indian hedgehog (Ihh) genes were obtained from reptiles, and six avian and five mammalian sequences were added to the analysis. The phylogenetic maximum likelihood (ML) tree of the Shh, Dhh, and Ihh genes revealed a similar topology, which is approximately consistent with the traditional taxonomic group. No shared positive selection site was identified by the PAML site model or the three methods in the Data Monkey Server. Branch model and Clade model C analyses revealed that the Dhh and Ihh genes experienced different evolutionary forces in reptiles and other vertebrates, while the Shh gene was not significantly different in terms of selection pressure. The different evolutionary rates of the Dhh and Ihh genes suggest that these genes may be potential contributors to the discrepant sperm and body development of different clades. The different adaptive evolutionary history of the Shh, Dhh, and Ihh genes among reptiles may be due to their different functions in regulating cellular events of development from the embryonic stages to adulthood. Overall, this study has provided meaningful information regarding the evolution of the hedgehog gene family in reptiles and a theoretical foundation for further analyses on the functional and molecular mechanisms that have shaped the reptilian hedgehog genes.

10.
Methods Mol Biol ; 1910: 747-766, 2019.
Article in English | MEDLINE | ID: mdl-31278684

ABSTRACT

Open-source software encourages computer programmers to reuse software components written by others. In evolutionary bioinformatics, open-source software comes in a broad range of programming languages, including C/C++, Perl, Python, Ruby, Java, and R. To avoid writing the same functionality multiple times for different languages, it is possible to share components by bridging computer languages and Bio* projects, such as BioPerl, Biopython, BioRuby, BioJava, and R/Bioconductor. In this chapter, we compare the three principal approaches for sharing software between different programming languages: by remote procedure call (RPC), by sharing a local "call stack," and by calling from program to program. RPC provides a language-independent protocol over a network interface; examples are SOAP and Rserve. The local call stack provides a between-language mapping, not over the network interface but directly in computer memory; examples are R bindings, RPy, and languages sharing the Java virtual machine stack. This functionality provides strategies for sharing of software between Bio* projects, which can be exploited more often. Here, we present cross-language examples for sequence translation and measure throughput of the different options. We compare calling into R through native R, RSOAP, Rserve, and RPy interfaces, with the performance of native BioPerl, Biopython, BioJava, and BioRuby implementations and with call stack bindings to BioJava and the European Molecular Biology Open Software Suite (EMBOSS). In general, call stack approaches outperform native Bio* implementations, and these, in turn, outperform "RPC"-based approaches. To test and compare strategies, we provide a downloadable Docker container with all examples, tools, and libraries included.
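Of the three approaches compared, the program-to-program route is the simplest to sketch. The example below is an assumption-laden illustration and is not taken from the chapter: a parent process hands a DNA string to a separate child program over standard input and reads the translated protein back from standard output. For the sake of a self-contained example the child is a small Python script run through the current interpreter; in practice it would stand in for a tool written in any other language.

```python
import subprocess
import sys

# Source of the stand-in external program: reads DNA on stdin, writes protein on stdout.
CHILD_PROGRAM = r'''
import sys
bases = "TCAG"
amino_acids = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
table = {a + b + c: amino_acids[16 * i + 4 * j + k]
         for i, a in enumerate(bases)
         for j, b in enumerate(bases)
         for k, c in enumerate(bases)}
dna = sys.stdin.read().strip().upper()
print("".join(table[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3)))
'''


def translate_via_subprocess(dna):
    """Translate DNA by calling a separate program and capturing its output."""
    completed = subprocess.run(
        [sys.executable, "-c", CHILD_PROGRAM],
        input=dna,
        capture_output=True,
        text=True,
        check=True,  # raise if the child program exits with an error
    )
    return completed.stdout.strip()


if __name__ == "__main__":
    print(translate_via_subprocess("ATGGCCTAA"))
```

Each call pays the cost of starting a new process, so this route is convenient to set up but carries a per-call overhead that in-memory bindings avoid.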


Subject(s)
Computational Biology , Programming Languages , Software , Computational Biology/methods , User-Computer Interface , Web Browser
11.
Front Plant Sci ; 9: 1395, 2018.
Article in English | MEDLINE | ID: mdl-30283490

ABSTRACT

Auxin response factors (ARFs) are important transcription factors involved in both the auxin signaling pathway and the regulatory development of various plant organs. In this study, 23 TaARF members encoded by a total of 68 homeoalleles were isolated from 18 wheat chromosomes (excluding chromosome 4). The TaARFs, including their conserved domains, exon/intron structures, related microRNAs, and alternative splicing (AS) variants, were then characterized. Phylogenetic analysis revealed that members of the TaARF family share close homology with ARFs in other grass species. qRT-PCR analyses revealed that 20 TaARF members were expressed in different organs and tissues and that the expression of some members significantly differed in the roots, stems, and leaves of wheat seedlings in response to exogenous auxin treatment. Moreover, protein network analyses and co-expression results showed that TaTIR1-TaARF15/18/19-TaIAA13 may interact at both the protein and genetic levels. The results of subsequent evolutionary analyses showed that three transcripts of TaARF15 in the A subgenome of wheat exhibited high evolutionary rate and underwent positive selection. Transgenic analyses indicated that TaARF15-A.1 promoted the growth of roots and leaves of Arabidopsis thaliana and was upregulated in the overexpression plants after auxin treatment. Our results will provide reference information for subsequent research and utilization of the TaARF gene family.

12.
BMC Bioinformatics ; 19(Suppl 11): 364, 2018 Oct 22.
Article in English | MEDLINE | ID: mdl-30343671

ABSTRACT

BACKGROUND: Determining patterns of nucleotide and amino acid substitution is the first step during sequence evolution analysis. However, it is not easy to visualize the different phylogenetic signatures imprinted in aligned nucleotide and amino acid sequences. RESULTS: Here we present PoSE (Pattern of Sequence Evolution), a reliable resource for unveiling the evolutionary history of sequence alignments and for graphically displaying their contents. Substitutions are displayed by category (transitions and transversions), codon position, and phenotypic effect (synonymous and nonsynonymous). Visualization is accomplished using MATLAB scripts wrapped around PAML (Phylogenetic Analysis by Maximum Likelihood), implemented in an easy-to-use graphical user interface. The application displays inferred substitutions estimated by baseml or codeml, two programs included in the PAML software package. PoSE organizes patterns of substitution in eleven plots, including estimated non-synonymous/synonymous ratios (dN/dS) along the sequence alignment. In addition, PoSE provides visualization and annotation of patterns of amino acid substitutions along groups of related sequences that can be graphically inspected in a phylogenetic tree window. CONCLUSIONS: PoSE is a useful tool to help determine major patterns during sequence evolution of protein-coding sequences, hypervariable regions, or changes in dN/dS ratios. PoSE is publicly available at https://github.com/CDCgov/PoSE.
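The substitution categories that PoSE displays can be illustrated with a short sketch. The code below is not PoSE's MATLAB implementation and does not use baseml or codeml; it simply compares two aligned, in-frame sequences directly and tallies transitions and transversions by codon position. The function name and the example sequences are hypothetical.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def tally_substitutions(seq1, seq2):
    """Tally transitions (ts) and transversions (tv) per codon position (1, 2, 3)."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned and equal in length")
    counts = {position: {"ts": 0, "tv": 0} for position in (1, 2, 3)}
    for index, (a, b) in enumerate(zip(seq1.upper(), seq2.upper())):
        # Skip identical sites and anything that is not an unambiguous base.
        if a == b or a not in PURINES | PYRIMIDINES or b not in PURINES | PYRIMIDINES:
            continue
        same_class = ({a, b} <= PURINES) or ({a, b} <= PYRIMIDINES)
        kind = "ts" if same_class else "tv"
        counts[index % 3 + 1][kind] += 1
    return counts


# Made-up example with a T/C and an A/G transition and an A/C transversion.
print(tally_substitutions("ATGTTTAAA", "ATGTTCAGC"))
```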


Subject(s)
Evolution, Molecular , Software , Base Pairing/genetics , Base Sequence , Codon/genetics , Phylogeny , Poliovirus/genetics
13.
Int J Biol Macromol ; 109: 698-703, 2018 Apr 01.
Article in English | MEDLINE | ID: mdl-29292152

ABSTRACT

Toll-like receptors (TLRs) encoded by the TLR multigene family play an important role in initial pathogen recognition in vertebrates. Among the TLRs, TLR2 and TLR4 may be of particular importance to reptiles. In order to study the evolutionary patterns and structural characteristics of TLRs, we explored the available genomes of several representative members of reptiles. 25 TLR2 genes and 19 TLR4 genes from reptiles were obtained in this study. Phylogenetic results showed that the TLR2 gene duplication occurred in several species. Evolutionary analysis by at least two methods identified 30 and 13 common positively selected codons in TLR2 and TLR4, respectively. Most positively selected sites of TLR2 and TLR4 were located in the leucine-rich repeats (LRRs). Branch model analysis showed that TLR2 genes were under different evolutionary forces in reptiles, while the TLR4 genes showed no significant selection pressure. The different evolutionary adaptation of TLR2 and TLR4 among the reptiles might be due to their different function in recognizing bacteria. Overall, we explored the structure and evolution of TLR2 and TLR4 genes in reptiles for the first time. Our study revealed valuable information regarding TLR2 and TLR4 in reptiles, and provided novel insights into the conservation concern of natural populations.


Subject(s)
Evolution, Molecular , Gene Duplication , Genomics , Reptiles/genetics , Selection, Genetic , Toll-Like Receptor 2/genetics , Toll-Like Receptor 4/genetics , Animals , Genomics/methods , Phylogeny , Reptiles/immunology , Sequence Analysis, DNA , Toll-Like Receptor 2/chemistry , Toll-Like Receptor 4/chemistry
14.
Methods Mol Biol ; 1525: 315-347, 2017.
Article in English | MEDLINE | ID: mdl-27896727

ABSTRACT

In this chapter, I review the basic algorithm underlying the CODEML model implemented in the software package PAML. This is intended as a companion to the software's manual, and a primer to the extensive literature available on CODEML. At the end of this chapter, I hope that you will be able to understand enough of how CODEML operates to plan your own analyses.


Subject(s)
Selection, Genetic/genetics , Algorithms , Codon/genetics , Evolution, Molecular , Software
15.
BMC Bioinformatics ; 17(1): 354, 2016 Sep 06.
Article in English | MEDLINE | ID: mdl-27597435

ABSTRACT

BACKGROUND: Uncovering how phenotypic diversity arises and is maintained in nature has long been a major interest of evolutionary biologists. Recent advances in genome sequencing technologies have remarkably increased the efficiency to pinpoint genes involved in the adaptive evolution of phenotypes. Reliability of such findings is most often examined with statistical and computational methods using Maximum Likelihood codon-based models (i.e., site, branch, branch-site and clade models), such as those available in codeml from the Phylogenetic Analysis by Maximum Likelihood (PAML) package. While these models represent a well-defined workflow for documenting adaptive evolution, in practice they can be challenging for researchers having a vast amount of data, as multiple types of relevant codon-based datasets are generated, making the overall process hard and tedious to handle, error-prone and time-consuming. RESULTS: We introduce LMAP (Lightweight Multigene Analyses in PAML), a user-friendly command-line and interactive package, designed to handle the codeml workflow, namely: directory organization, execution, results gathering and organization for Likelihood Ratio Test estimations with minimal manual user intervention. LMAP was developed for the workstation multi-core environment and provides a unique advantage for processing one, or more, if not all codeml codon-based models for multiple datasets at a time. Our software proved efficient throughout the codeml workflow, including, but not limited to, simultaneously handling more than 20 datasets. CONCLUSIONS: We have developed a simple and versatile LMAP package, with outstanding performance, enabling researchers to analyze multiple different codon-based datasets in a high-throughput fashion. At minimum, two file types are required within a single input directory: one for the multiple sequence alignment and another for the phylogenetic tree. To our knowledge, no other software combines all codeml codon substitution models of adaptive evolution. LMAP has been developed as an open-source package, allowing its integration into more complex open-source bioinformatics pipelines. LMAP package is released under GPLv3 license and is freely available at http://lmapaml.sourceforge.net/ .


Subject(s)
Computational Biology/methods , Proteins/genetics , Sequence Alignment/methods , Codon/chemistry , Codon/genetics , Internet , Phylogeny , Proteins/chemistry , Software
16.
J Exp Bot ; 66(22): 7347-58, 2015 Dec.
Article in English | MEDLINE | ID: mdl-26417023

ABSTRACT

The two carboxylation reactions performed by phosphoenolpyruvate carboxylase (PEPC) and ribulose-1,5-bisphosphate carboxylase/oxygenase (Rubisco) are vital in the fixation of inorganic carbon for C4 plants. The abundance of PEPC is substantially elevated in C4 leaves, while the location of Rubisco is restricted to one of two chloroplast types. These differences compared with C3 leaves have been shown to result in convergent enzyme optimization in some C4 species. Investigation into the kinetic properties of PEPC and Rubisco from Kranz C4, single cell C4, and C3 species in Chenopodiaceae s. s. subfamily Suaedoideae showed that these major carboxylases in C4 Suaedoideae species lack the same mutations found in other C4 systems which have been examined; but still have similar convergent kinetic properties. Positive selection analysis on the N-terminus of PEPC identified residues 364 and 368 to be under positive selection with a posterior probability >0.99 using Bayes empirical Bayes. Compared with previous analyses on other C4 species, PEPC from C4 Suaedoideae species have different convergent amino acids that result in a higher K_m for PEP and malate tolerance compared with C3 species. Kinetic analysis of Rubisco showed that C4 species have a higher catalytic efficiency of Rubisco (k_catc in mol CO2 mol^-1 Rubisco active sites s^-1), despite lacking convergent substitutions in the rbcL gene. The importance of kinetic changes to the two-carboxylation reactions in C4 leaves related to amino acid selection is discussed.


Subject(s)
Carbon Cycle , Chenopodiaceae/metabolism , Phosphoenolpyruvate Carboxylase/metabolism , Photosynthesis , Ribulose-Bisphosphate Carboxylase/metabolism , Amino Acid Substitution , Biological Evolution , Carbon/chemistry , Carbon/metabolism , Kinetics , Species Specificity
17.
Evol Med Public Health ; 2015(1): 88-105, 2015 Mar 18.
Article in English | MEDLINE | ID: mdl-25788149

ABSTRACT

Helicobacter pylori is a bacterium that lives in the human stomach and is a major risk factor for gastric cancer and ulcers. H. pylori is host dependent and has been carried with human populations around the world after their departure from Africa. We wished to investigate how H. pylori has coevolved with its host during that time, focusing on strains from Japanese and European populations, given that gastric cancer incidence is high in Japanese populations, while low in European. A positive selection analysis of eight H. pylori genomes was conducted, using maximum likelihood based pairwise comparisons in order to maximize the number of strain-specific genes included in the study. Using the genic Ka/Ks ratio, comparisons of four Japanese H. pylori genomes suggest 25-34 genes under positive selection, while four European H. pylori genomes suggest 16-21 genes; few of the genes identified were in common between lineages. Of the identified genes which were annotated, 38% possessed homologs associated with pathogenicity and/or host adaptation, consistent with their involvement in a coevolutionary 'arms race' with the host. Given the efficacy of identifying host interaction factors de novo, in the absence of functionally annotated homologs, our evolutionary approach may have value in identifying novel genes which H. pylori employs to interact with the human gut environment. In addition, the larger number of genes inferred as being under positive selection in Japanese strains compared to European implies a stronger overall adaptive pressure, potentially resulting from an elevated immune response which may be linked to increased inflammation, an initial stage in the development of gastric cancer.

18.
Evol Bioinform Online ; 11(Suppl 2): 11-7, 2015.
Article in English | MEDLINE | ID: mdl-26819542

ABSTRACT

The branch-site test of positive selection is a standard approach to detect past episodic positive selection in a priori-specified branches of a gene phylogeny. Here, we ask if differences in the topology of the gene tree have any influence on the ability to infer positively selected sites. Using simulated sequences, we compare the results obtained for true and rearranged topologies. We find a strong relationship between "conflicting branch length," which occurs when the set of sequences that experiences selection for a given topology and foreground is changed, and the ability to predict positively selected sites. Moreover, by reanalyzing a previously published data set, we show that the choice of a gene tree also affects the results obtained for real-world sequences. This is the first study to demonstrate that tree topology has a clear effect on the inference of positive selection. We conclude that the choice of a gene tree is an important factor for the branch-site analysis of positive selection.

19.
Evol Bioinform Online ; 10: 197-204, 2014.
Article in English | MEDLINE | ID: mdl-25525323

ABSTRACT

With the greater availability of genetic data, large genome-wide scans for positive selection increasingly incorporate data from a range of sources. These data sets may be derived from different sequencing methods, each of which has potential sources of error. Sequencing errors, compounded by alignment errors, greatly increase the number of false positives in tests for adaptive evolution. Genome-wide analyses often fail to fully address these issues or to provide sufficient detail on postalignment masking/filtering. Here, we introduce a Sliding Window Alignment Masker for Phylogenetic Analysis by Maximum Likelihood (SWAMP) that scans multiple-sequence alignments for short regions enriched with unreasonably high rates of nonsynonymous substitutions caused, for example, by sequence or alignment errors. SWAMP prevents their inclusion in downstream evolutionary analyses and therefore increases the reliability of downstream analyses. It is able to effectively mask short stretches of erroneous sequence, particularly prevalent in low-coverage genomes, which may not be detected by existing methods based on filtering by sitewise conservation or alignment confidence. SWAMP offers a flexible masking approach, and the user can apply different masking regimens to specific branches or sequences in the phylogeny allowing the stringency of masking to vary according to branch length, expected divergence levels, or assembly quality. We exemplify SWAMP's effectiveness on a dataset of 6,379 protein-coding genes from primate species, including data of variable quality. Full reporting of the software parameters will further improve the reproducibility of genome-wide analyses, as well as reduce false-positive rates.
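The sliding-window idea can be sketched in a few lines. This is an illustration of the general technique, not SWAMP's code: it assumes that a per-codon count of inferred nonsynonymous substitutions on one branch is already available, and the window size, threshold, function names, and the choice of "NNN" as the masking string are all assumptions made for the example.

```python
def codons_to_mask(nonsyn_per_codon, window, threshold):
    """Return indices of codons inside any window whose nonsynonymous count reaches the threshold."""
    masked = set()
    for start in range(0, len(nonsyn_per_codon) - window + 1):
        if sum(nonsyn_per_codon[start:start + window]) >= threshold:
            masked.update(range(start, start + window))
    return masked


def mask_sequence(sequence, masked_codons, mask="NNN"):
    """Replace the flagged codons of an in-frame sequence with the masking string."""
    codons = [sequence[i:i + 3] for i in range(0, len(sequence), 3)]
    return "".join(mask if index in masked_codons else codon
                   for index, codon in enumerate(codons))


# Hypothetical branch with three consecutive nonsynonymous changes in codons 2-4.
counts = [0, 0, 1, 1, 1, 0, 0, 0]
flagged = codons_to_mask(counts, window=3, threshold=3)
print(sorted(flagged))
print(mask_sequence("ATG" * 8, flagged))
```

Lowering the threshold or widening the window makes the masking more aggressive, which is the kind of per-branch tuning the abstract describes.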

20.
Genome Biol Evol ; 6(9): 2368-79, 2014 Sep 04.
Article in English | MEDLINE | ID: mdl-25193312

ABSTRACT

Natural selection leaves imprints on DNA, offering the opportunity to identify functionally important regions of the genome. Identifying the genomic regions affected by natural selection within pathogens can aid in the pursuit of effective strategies to control diseases. In this study, we analyzed genome-wide patterns of selection acting on different classes of sequences in a worldwide sample of eight strains of the model plant-pathogenic fungus Colletotrichum graminicola. We found evidence of selective sweeps, balancing selection, and positive selection affecting both protein-coding and noncoding DNA of pathogenicity-related sequences. Genes encoding putative effector proteins and secondary metabolite biosynthetic enzymes show evidence of positive selection acting on the coding sequence, consistent with an Arms Race model of evolution. The 5' untranslated regions (UTRs) of genes coding for effector proteins and genes upregulated during infection show an excess of high-frequency polymorphisms likely the consequence of balancing selection and consistent with the Red Queen hypothesis of evolution acting on these putative regulatory sequences. Based on the findings of this work, we propose that even though adaptive substitutions on coding sequences are important for proteins that interact directly with the host, polymorphisms in the regulatory sequences may confer flexibility of gene expression in the virulence processes of this important plant pathogen.


Subject(s)
Colletotrichum/genetics , Fungi/genetics , Plant Diseases/microbiology , Selection, Genetic , Virulence Factors/genetics , 5' Untranslated Regions , Base Sequence , Codon , Colletotrichum/pathogenicity , Evolution, Molecular , Fungi/pathogenicity , Molecular Sequence Data , Virulence