Results 1 - 20 of 236
1.
Brief Bioinform ; 25(3)2024 Mar 27.
Article in English | MEDLINE | ID: mdl-38695119

ABSTRACT

Sequence similarity is of paramount importance in biology, as similar sequences tend to have similar function and share common ancestry. Scoring matrices, such as PAM or BLOSUM, play a crucial role in all bioinformatics algorithms for identifying similarities, but have the drawback that they are fixed, independent of context. We propose a new scoring method for amino acid similarity that remedies this weakness, being contextually dependent. It relies on recent advances in deep learning architectures that employ self-supervised learning in order to leverage the power of enormous amounts of unlabelled data to generate contextual embeddings, which are vector representations for words. These ideas have been applied to protein sequences, producing embedding vectors for protein residues. We propose the E-score between two residues as the cosine similarity between their embedding vector representations. Thorough testing on a wide variety of reference multiple sequence alignments indicates that the alignments produced using the new E-score method, especially ProtT5-score, are significantly better than those obtained using BLOSUM matrices. The new method proposes to change the way alignments are computed, with far-reaching implications in all areas of textual data that use sequence similarity. The program to compute alignments based on various E-scores is available as a web server at e-score.csd.uwo.ca. The source code is freely available for download from github.com/lucian-ilie/E-score.


Subjects
Algorithms; Computational Biology; Sequence Alignment; Sequence Alignment/methods; Computational Biology/methods; Software; Sequence Analysis, Protein/methods; Amino Acid Sequence; Proteins/chemistry; Proteins/genetics; Deep Learning; Databases, Protein
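
The E-score defined in the abstract above (cosine similarity between two residue embedding vectors) can be illustrated with a minimal sketch. The function names `e_score` and `e_score_matrix` are hypothetical, and the embeddings are assumed to be plain lists of floats already produced by some protein language model; this is not the authors' implementation.

```python
import math


def e_score(u, v):
    """Cosine similarity between two residue embedding vectors (hypothetical helper)."""
    if len(u) != len(v):
        raise ValueError("embedding vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def e_score_matrix(emb_a, emb_b):
    """Score every residue of sequence A against every residue of sequence B.

    emb_a and emb_b are lists of per-residue embedding vectors (assumed input).
    The result plays the role a fixed substitution matrix would play in alignment.
    """
    return [[e_score(u, v) for v in emb_b] for u in emb_a]
```

Because the score depends on the embedding of each residue in its sequence context, the same pair of amino acid types can receive different scores at different positions, which is the contrast with a fixed matrix that the abstract draws.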
2.
Brief Bioinform ; 24(3)2023 05 19.
Article in English | MEDLINE | ID: mdl-36946414

ABSTRACT

In the era of constantly increasing amounts of the available protein data, a relevant and interpretable visualization becomes crucial, especially for tasks requiring human expertise. Poincaré disk projection has previously demonstrated its important efficiency for visualization of biological data such as single-cell RNAseq data. Here, we develop a new method PoincaréMSA for visual representation of complex relationships between protein sequences based on Poincaré maps embedding. We demonstrate its efficiency and potential for visualization of protein family topology as well as evolutionary and functional annotation of uncharacterized sequences. PoincaréMSA is implemented in open source Python code with available interactive Google Colab notebooks as described at https://www.dsimb.inserm.fr/POINCARE_MSA.


Subjects
Proteins; Software; Humans; Amino Acid Sequence; Biological Evolution
3.
Subcell Biochem ; 104: 33-47, 2024.
Article in English | MEDLINE | ID: mdl-38963482

ABSTRACT

Catalases are essential enzymes for removal of hydrogen peroxide, enabling aerobic and anaerobic metabolism in an oxygenated atmosphere. Monofunctional heme catalases, catalase-peroxidases, and manganese catalases, evolved independently more than two billion years ago, constituting a classic example of convergent evolution. Herein, the diversity of catalase sequences is analyzed through sequence similarity networks, providing the context for sequence distribution of major catalase families, and showing that many divergent catalase families remain to be experimentally studied.


Subjects
Catalase; Evolution, Molecular; Catalase/chemistry; Catalase/genetics; Catalase/metabolism; Humans; Animals; Hydrogen Peroxide/metabolism; Hydrogen Peroxide/chemistry; Heme/chemistry; Heme/metabolism
4.
J Bacteriol ; 206(4): e0045223, 2024 04 18.
Article in English | MEDLINE | ID: mdl-38551342

ABSTRACT

The wobble bases of tRNAs that decode split codons are often heavily modified. In bacteria, tRNA(Glu, Gln, Asp) contains a variety of xnm5s2U derivatives. The synthesis pathway for these modifications is complex and fully elucidated only in a handful of organisms, including the Gram-negative Escherichia coli K12 model. Despite the ubiquitous presence of mnm5s2U modification, genomic analysis shows the absence of mnmC orthologous genes, suggesting the occurrence of alternate biosynthetic schemes for the conversion of cmnm5s2U to mnm5s2U. Using a combination of comparative genomics and genetic studies, a member of the YtqA subgroup of the radical SAM superfamily was found to be involved in the synthesis of mnm5s2U in both Bacillus subtilis and Streptococcus mutans. This protein, renamed MnmL, is encoded in an operon with the recently discovered MnmM methylase involved in the methylation of the pathway intermediate nm5s2U into mnm5s2U in B. subtilis. Analysis of tRNA modifications of both S. mutans and Streptococcus pneumoniae shows that growth conditions and genetic backgrounds influence the ratios of pathway intermediates owing to regulatory loops that are not yet understood. The MnmLM pathway is widespread along the bacterial tree, with some phyla, such as Bacilli, relying exclusively on these two enzymes. Although mechanistic details of these newly discovered components are not fully resolved, the occurrence of fusion proteins, alternate arrangements of biosynthetic components, and loss of biosynthetic branches provide examples of biosynthetic diversity to retain a conserved tRNA modification in Nature. IMPORTANCE: The xnm5s2U modifications found in several tRNAs at the wobble base position are widespread in bacteria where they have an important role in decoding efficiency and accuracy. This work identifies a novel enzyme (MnmL) that is a member of a subgroup of the very versatile radical SAM superfamily and is involved in the synthesis of mnm5s2U in several Gram-positive bacteria, including human pathogens. This is another novel example of a non-orthologous displacement in the field of tRNA modification synthesis, showing how different solutions evolve to retain U34 tRNA modifications.


Subjects
Escherichia coli K12; RNA, Transfer; Humans; RNA, Transfer/genetics; Escherichia coli K12/genetics; Bacteria/genetics; Methylation; Gram-Positive Bacteria/genetics
5.
Proteins ; 92(6): 776-794, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38258321

ABSTRACT

Three-dimensional (3D) structure information, now available at the proteome scale, may facilitate the detection of remote evolutionary relationships in protein superfamilies. Here, we illustrate this with the identification of a novel family of protein domains related to the ferredoxin-like superfold, by combining (i) transitive sequence similarity searches, (ii) clustering approaches, and (iii) the use of AlphaFold2 3D structure models. Domains of this family were initially identified in relation with the intracellular biomineralization of calcium carbonates by Cyanobacteria. They are part of the large heavy-metal-associated (HMA) superfamily, departing from the latter by specific sequence and structural features. In particular, most of them share conserved basic amino acids  (hence their name CoBaHMA for Conserved Basic residues HMA), forming a positively charged surface, which is likely to interact with anionic partners. CoBaHMA domains are found in diverse modular organizations in bacteria, existing in the form of monodomain proteins or as part of larger proteins, some of which are membrane proteins involved in transport or lipid metabolism. This suggests that the CoBaHMA domains may exert a regulatory function, involving interactions with anionic lipids. This hypothesis might have a particular resonance in the context of the compartmentalization observed for cyanobacterial intracellular calcium carbonates.


Subjects
Amino Acid Sequence; Bacterial Proteins; Metals, Heavy; Models, Molecular; Bacterial Proteins/chemistry; Bacterial Proteins/metabolism; Bacterial Proteins/genetics; Metals, Heavy/chemistry; Metals, Heavy/metabolism; Protein Domains; Cyanobacteria/metabolism; Cyanobacteria/chemistry; Cyanobacteria/genetics; Ferredoxins/chemistry; Ferredoxins/metabolism; Protein Folding
6.
Chembiochem ; : e202400443, 2024 Jul 11.
Article in English | MEDLINE | ID: mdl-38991205

ABSTRACT

Baeyer-Villiger monooxygenases (BVMOs) are NAD(P)H-dependent flavoproteins that convert ketones to esters and lactones. While these enzymes offer an appealing alternative to traditional Baeyer-Villiger oxidations, these proteins tend to be either too unstable or exhibit too narrow a substrate scope for implementation as industrial biocatalysts. Here, sequence similarity networks were used to search for novel BVMOs that are both stable and promiscuous. Our genome mining led to the identification of an enzyme from Chloroflexota bacterium (strain G233) dubbed ssnBVMO that exhibits i) the highest melting temperature of any naturally sourced BVMO (62.5 °C), ii) a remarkable kinetic stability across a wide range of conditions, similar to those of PAMO and PockeMO, iii) optimal catalysis at 50 °C, and iv) a broad substrate scope that includes linear aliphatic, aromatic, and sterically bulky ketones. Subsequent quantitative assays using propiophenone demonstrated >95% conversion. Several fusions were also constructed that linked ssnBVMO to a thermostable phosphite dehydrogenase. These fusions can recycle NADPH and catalyze oxidations with sub-stoichiometric quantities of this expensive cofactor. Characterization of these fusions permitted identification of PTDH-L1-ssnBVMO as the most promising protein that could have utility as a seed sequence for enzyme engineering campaigns aiming to develop biocatalysts for Baeyer-Villiger oxidations.

7.
Chembiochem ; : e202400680, 2024 Sep 24.
Article in English | MEDLINE | ID: mdl-39317170

ABSTRACT

An increasingly effective strategy to identify synthetically useful enzymes is to sample the diversity already present in Nature. Here, we construct and assay a panel of phylogenetically diverse aromatic prenyltransferases (PTs). These enzymes catalyze a variety of C-C bond forming reactions in natural product biosynthesis and are emerging as tools for synthetic chemistry and biology. Homolog screening was further empowered through substrate-multiplexed screening, which provides direct information on enzyme specificity. We perform a head-to-head assessment of the model members of the PT family and further identify homologs with divergent sequences that rival these superb enzymes. This effort revealed the first bacterial O-Tyr PT and, together, provides valuable benchmarking for future synthetic applications of PTs.

8.
Brief Bioinform ; 23(5)2022 09 20.
Article in English | MEDLINE | ID: mdl-35914952

ABSTRACT

Low complexity regions are fragments of protein sequences composed of only a few types of amino acids. These regions frequently occur in proteins and can play an important role in their functions. However, scientists are mainly focused on regions characterized by high diversity of amino acid composition. Similarity between regions of protein sequences frequently reflects functional similarity between them. In this article, we discuss strengths and weaknesses of the similarity analysis of low complexity regions using BLAST, HHblits and CD-HIT. These methods are considered to be the gold standard in protein similarity analysis and were designed for comparison of high complexity regions. However, we lack specialized methods that could be used to compare the similarity of low complexity regions. Therefore, we investigated the existing methods in order to understand how they can be applied to compare such regions. Our results are supported by an exploratory study and a discussion of the amino acid composition and biological roles of selected examples. We show that existing methods need improvements to efficiently search for similar low complexity regions. We suggest features that have to be re-designed specifically for comparing low complexity regions: scoring matrix, multiple sequence alignment, e-value, local alignment and clustering based on a set of representative sequences. Results of this analysis can either be used to improve existing methods or to create new methods for the similarity analysis of low complexity regions.


Subjects
Amino Acids; Proteins; Algorithms; Amino Acid Sequence; Amino Acids/genetics; Cluster Analysis; Proteins/chemistry; Proteins/genetics; Sequence Alignment; Sequence Analysis, Protein/methods
9.
Proc Natl Acad Sci U S A ; 118(4)2021 01 26.
Article in English | MEDLINE | ID: mdl-33472976

ABSTRACT

The monotopic phosphoglycosyl transferase (monoPGT) superfamily comprises over 38,000 nonredundant sequences represented in bacterial and archaeal domains of life. Members of the superfamily catalyze the first membrane-committed step in en bloc oligosaccharide biosynthetic pathways, transferring a phosphosugar from a soluble nucleoside diphosphosugar to a membrane-resident polyprenol phosphate. The singularity of the monoPGT fold and its employment in the pivotal first membrane-committed step allows confident assignment of both protein and corresponding pathway. The diversity of the family is revealed by the generation and analysis of a sequence similarity network for the superfamily, with fusion of monoPGTs with other pathway members being the most frequent and extensive elaboration. Three common fusions were identified: sugar-modifying enzymes, glycosyl transferases, and regulatory domains. Additionally, unexpected fusions of the monoPGT with members of the polytopic PGT superfamily were discovered, implying a possible evolutionary link through the shared polyprenol phosphate substrate. Notably, a phylogenetic reconstruction of the monoPGT superfamily shows a radial burst of functionalization, with a minority of members comprising only the minimal PGT catalytic domain. The commonality and identity of the fusion partners in the monoPGT superfamily is consistent with advantageous colocalization of pathway members at membrane interfaces.


Subjects
Bacterial Proteins/chemistry; Glycoconjugates/chemistry; Glycosyltransferases/chemistry; Gram-Negative Bacteria/enzymology; Gram-Positive Bacteria/enzymology; Polysaccharides/chemistry; Amino Acid Sequence; Bacterial Proteins/genetics; Bacterial Proteins/metabolism; Binding Sites; Cytoplasm/enzymology; Cytoplasm/genetics; Evolution, Molecular; Gene Expression; Gene Regulatory Networks; Glycoconjugates/metabolism; Glycosyltransferases/genetics; Glycosyltransferases/metabolism; Gram-Negative Bacteria/classification; Gram-Negative Bacteria/genetics; Gram-Positive Bacteria/classification; Gram-Positive Bacteria/genetics; Metabolic Networks and Pathways/genetics; Models, Molecular; Periplasm/enzymology; Periplasm/genetics; Phylogeny; Polysaccharides/metabolism; Protein Binding; Protein Conformation, alpha-Helical; Protein Conformation, beta-Strand; Protein Interaction Domains and Motifs; Sequence Alignment; Sequence Homology, Amino Acid; Substrate Specificity
10.
J Biol Chem ; 298(5): 101881, 2022 05.
Article in English | MEDLINE | ID: mdl-35367210

ABSTRACT

Peptide-derived natural products are a large class of bioactive molecules that often contain chemically challenging modifications. In the biosynthesis of ribosomally synthesized and posttranslationally modified peptides (RiPPs), radical-SAM (rSAM) enzymes have been shown to catalyze the formation of ether, thioether, and carbon-carbon bonds on the precursor peptide. The installation of these bonds typically establishes the skeleton of the mature RiPP. To facilitate the search for unexplored rSAM-dependent RiPPs for the community, we employed a bioinformatic strategy to screen a subfamily of peptide-modifying rSAM enzymes which are known to bind up to three [4Fe-4S] clusters. A sequence similarity network was used to partition related families of rSAM enzymes into >250 clusters. Using representative sequences, genome neighborhood diagrams were generated using the Genome Neighborhood Tool. Manual inspection of bacterial genomes yielded numerous putative rSAM-dependent RiPP pathways with unique features. From this analysis, we identified and experimentally characterized the rSAM enzyme, TvgB, from the tvg gene cluster from Halomonas anticariensis. In the tvg gene cluster, the precursor peptide, TvgA, is comprised of a repeating TVGG motif. Structural characterization of the TvgB product revealed the repeated formation of cyclopropylglycine, where a new bond is formed between the γ-carbons on the precursor valine. This novel RiPP modification broadens the functional potential of rSAM enzymes and validates the proposed bioinformatic approach as a practical broad search tool for the discovery of new RiPP topologies.


Subjects
Computational Biology; S-Adenosylmethionine; Amino Acid Sequence; Carbon/metabolism; Peptides/chemistry; Protein Processing, Post-Translational; S-Adenosylmethionine/metabolism
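
Several entries in this list use a sequence similarity network to partition a protein family into clusters. The general idea can be sketched as follows, under stated assumptions: sequences are nodes, an edge joins two sequences whose pairwise score meets a chosen threshold, and clusters are the connected components. The function name `ssn_clusters`, the score dictionary, and the threshold values are hypothetical; the tools and cutoffs actually used in these studies are not given in the abstracts.

```python
from collections import defaultdict


def ssn_clusters(nodes, pair_scores, threshold):
    """Partition sequences into clusters of a thresholded similarity network.

    nodes       -- list of sequence identifiers
    pair_scores -- dict mapping (id_a, id_b) to a similarity score
                   (assumed to come from an all-vs-all comparison)
    threshold   -- minimum score for an edge to be drawn
    """
    adjacency = defaultdict(set)
    for (a, b), score in pair_scores.items():
        if score >= threshold:
            adjacency[a].add(b)
            adjacency[b].add(a)

    seen = set()
    clusters = []
    for start in nodes:
        if start in seen:
            continue
        # Depth-first traversal collects one connected component.
        stack = [start]
        seen.add(start)
        component = []
        while stack:
            current = stack.pop()
            component.append(current)
            for neighbour in adjacency[current]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        clusters.append(sorted(component))
    return clusters
```

Raising the threshold removes weaker edges, so large clusters split into smaller ones; choosing that cutoff is a judgment call that depends on the family being studied.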
11.
Proteins ; 91(12): 1712-1723, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37485822

ABSTRACT

The human predictor team PEZYFoldings got first place with the assessor's formulae (3rd place with Global Distance Test Total Score [GDT-TS]) in the single-domain category and 10th place in the multimer category in Critical Assessment of Structure Prediction 15. In this paper, I describe the exact method used by PEZYFoldings in the competition. As AlphaFold2 and AlphaFold-Multimer, developed by DeepMind, were state-of-the-art structure prediction tools, it was assumed that enhancing the input and output of the tools was an effective strategy to obtain the highest accuracy for structure prediction. Therefore, I used additional tools and databases to collect evolutionarily related sequences and introduced a deep-learning-based model in the refinement step. In addition to these modifications, manual interventions were performed to address various tasks. Detailed analyses were performed after the competition to identify the main contributors to performance. Comparing the number of evolutionarily related sequences I used with those of the other teams that provided AlphaFold2's baseline predictions revealed that an extensive sequence similarity search was one of the main contributors. Nonetheless, there were specific targets for which I could not identify any evolutionarily related sequences, resulting in my inability to construct accurate structures for these targets. Notably, I noticed that I had gained large Z-scores with the subunits of H1137, for which I performed manual domain parsing considering the interfaces between the subunits. This finding implies that the manual intervention contributed to my performance. The influence of the refinement model on the accuracy of structure prediction was minimal. I could have predicted structures with a similar level of accuracy without employing the refinement model. However, from the perspective of accuracy self-estimate, many structures demonstrated improvement after refinement. 
This improvement likely had a substantial influence on improving my position in the assessor's formulae rankings. These results highlight the opportunities for improvement in (1) multimer prediction, (2) building of larger and more diverse databases, and (3) developing tools to predict structures from primary sequences alone. In addition, transferring the manual intervention process to automation is a future concern.


Subjects
Deep Learning; Humans; Models, Molecular; Computational Biology/methods; Proteins/chemistry; Protein Conformation
12.
Brief Bioinform ; 22(3)2021 05 20.
Article in English | MEDLINE | ID: mdl-32893299

ABSTRACT

Given a group of genomes, represented as the sets of genes that belong to them, the discovery of the pangenomic content is based on the search of genetic homology among the genes for clustering them into families. Thus, pangenomic analyses investigate the membership of the families to the given genomes. This approach is referred to as the gene-oriented approach in contrast to other definitions of the problem that take into account different genomic features. In the past years, several tools have been developed to discover and analyse pangenomic contents. Because of the hardness of the problem, each tool applies a different strategy for discovering the pangenomic content. This results in a differentiation of the performance of each tool that depends on the composition of the input genomes. This review reports the main analysis instruments provided by the current state of the art tools for the discovery of pangenomic contents. Moreover, unlike previous works, the presented study compares pangenomic tools from a methodological perspective, analysing the causes that lead a given methodology to outperform other tools. The analysis is performed by taking into account different bacterial populations, which are synthetically generated by changing evolutionary parameters. The benchmarks used to compare the pangenomic tools, in addition to the computational pipeline developed for this purpose, are available at https://github.com/InfOmics/pangenes-review. Contact: V. Bonnici, R. Giugno. Supplementary information: Supplementary data are available at Briefings in Bioinformatics online.


Subjects
Algorithms; Computational Biology/methods; Genome, Bacterial/genetics; Genome/genetics; Genomics/methods; Bacteria/classification; Bacteria/genetics; Biological Evolution; Mycoplasma/classification; Mycoplasma/genetics; Phylogeny; Software
13.
Appl Environ Microbiol ; 89(6): e0036623, 2023 06 28.
Article in English | MEDLINE | ID: mdl-37255440

ABSTRACT

Ketone bodies, including acetoacetate, 3-hydroxybutyrate, and acetone, are produced in the liver of animals during glucose starvation. Enzymes for the metabolism of (R)-3-hydroxybutyrate have been extensively studied, but little is known about the metabolism of its enantiomer (S)-3-hydroxybutyrate. Here, we report the characterization of a novel pathway for the degradation of (S)-3-hydroxybutyrate in anaerobic bacteria. We identify and characterize a stereospecific (S)-3-hydroxybutyrate dehydrogenase (3SHBDH) from Desulfotomaculum ruminis, which catalyzes the reversible NAD(P)H-dependent reduction of acetoacetate to form (S)-3-hydroxybutyrate. 3SHBDH also catalyzes oxidation of d-threonine (2R, 3S) and l-allo-threonine (2S, 3S), consistent with its specificity for β-(3S)-hydroxy acids. Isothermal calorimetry experiments support a sequential mechanism involving binding of NADH prior to (S)-3-hydroxybutyrate. Homologs of 3SHBDH are present in anaerobic fermenting and sulfite-reducing bacteria, and experiments with Clostridium pasteurianum showed that 3SHBDH, acetate CoA-transferase (YdiF), and (S)-3-hydroxybutyryl-CoA dehydrogenase (Hbd) are involved together in the degradation of (S)-3-hydroxybutyrate as a carbon and energy source for growth. (S)-3-hydroxybutyrate is a human metabolic marker and a chiral precursor for chemical synthesis, suggesting potential applications of 3SHBDH in diagnostics or the chemicals industry. IMPORTANCE: (R)-3-hydroxybutyrate is well studied as a component of ketone bodies produced by the liver and of bacterial polyesters. However, the biochemistry of its enantiomer (S)-3-hydroxybutyrate is poorly understood. This study describes the identification and characterization of a stereospecific (S)-3-hydroxybutyrate dehydrogenase and its function in a metabolic pathway for the degradation of (S)-3-hydroxybutyrate as a carbon and energy source in anaerobic bacteria. (S)-3-hydroxybutyrate is a mammalian metabolic marker and a precursor for chemical synthesis and bioplastics, suggesting potential applications of these enzymes in diagnostics and biotechnology.


Subjects
Acetoacetates; Bacteria, Anaerobic; Animals; Humans; 3-Hydroxybutyric Acid; Bacteria, Anaerobic/metabolism; Hydroxybutyrate Dehydrogenase/metabolism; Hydroxybutyrates/metabolism; Ketone Bodies/metabolism; 3-Hydroxyacyl-CoA Dehydrogenase; Bacteria/metabolism; Carbon; Threonine; Mammals
14.
Genetica ; 151(6): 325-338, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37817002

ABSTRACT

Identifying homologs is an important process in the analysis of genetic patterns underlying traits and evolutionary relationships among species. Analysis of gene families is often used to form and support hypotheses on genetic patterns such as gene presence, absence, or functional divergence which underlie traits examined in functional studies. These analyses often require precise identification of all members in a targeted gene family. Manual pipelines where homology search and orthology assignment tools are used separately are the most common approach for identifying small gene families where accurate identification of all members is important. The ability to curate sequences between steps in manual pipelines allows for simple and precise identification of all possible gene family members. However, the validity of such manual pipeline analyses is often decreased by inappropriate approaches to homology searches including too relaxed or stringent statistical thresholds, inappropriate query sequences, homology classification based on sequence similarity alone, and low-quality proteome or genome sequences. In this article, we propose several approaches to mitigate these issues and allow for precise identification of gene family members and support for hypotheses linking genetic patterns to functional traits.


Subjects
Genome; Software; Biological Evolution
15.
J Muscle Res Cell Motil ; 44(4): 255-270, 2023 12.
Article in English | MEDLINE | ID: mdl-37258982

ABSTRACT

The thick filament-associated A-band region of titin is a highly repetitive component of the titin chain with important scaffolding properties that support thick filament assembly. It also has a demonstrated link to human disease. Despite its functional significance, it remains a largely uncharacterized part of the titin protein. Here, we have performed an analysis of sequence and structure conservation of A-band titin, with emphasis on poly-FnIII tandem components. Specifically, we have applied multi-dimensional sequence pairwise similarity analysis to FnIII domains and complemented this with the crystallographic elucidation of the 3D-structure of the FnIII-triplet A84-A86 from the fourth long super-repeat in the C-zone (C4). Structural models serve here as templates to map sequence conservation onto super-repeat C4, which we show is a prototypical representative of titin's C-zone. This templating identifies positionally conserved residue clusters in C super-repeats with the potential of mediating interactions to thick-filament components. Conservation localizes to two super-repeat positions: Ig domains in position 1 and FnIII domains in position 7. The analysis also allows conclusions to be drawn on the conserved architecture of titin's A-band, as well as revisiting and expanding the evolutionary model of titin's A-band.


Subjects
Muscle Proteins; Sarcomeres; Humans; Connectin/metabolism; Muscle Proteins/metabolism; Sarcomeres/metabolism
16.
Arch Microbiol ; 205(4): 155, 2023 Mar 31.
Article in English | MEDLINE | ID: mdl-37000297

ABSTRACT

Levoglucosan is produced in the pyrolysis of cellulose and starch, including from bushfires or the burning of biofuels, and is deposited from the atmosphere across the surface of the earth. We describe two levoglucosan degrading Paenarthrobacter spp. (Paenarthrobacter nitrojuajacolis LG01 and Paenarthrobacter histidinolovorans LG02) that were isolated from soil by metabolic enrichment using levoglucosan as the sole carbon source. Genome sequencing and proteomics analysis revealed the expression of a series of genes encoding known levoglucosan degrading enzymes, levoglucosan dehydrogenase (LGDH, LgdA), 3-keto-levoglucosan β-eliminase (LgdB1) and glucose 3-dehydrogenase (LgdC), along with an ABC transporter cassette and an associated solute binding protein. However, no homologues of 3-ketoglucose dehydratase (LgdB2) were evident, while the expressed genes contained a range of putative sugar phosphate isomerases/xylose isomerases with weak similarity to LgdB2. Sequence similarity network analysis of genome neighbours of LgdA revealed that homologues of LgdB1 and LgdC are generally conserved in a range of bacteria in the phyla Firmicutes, Actinobacteria and Proteobacteria. One group of sugar phosphate isomerase/xylose isomerase homologues (named LgdB3) was identified with limited distribution that is mutually exclusive with LgdB2, and we propose that they may fulfil a similar function. LgdB1, LgdB2 and LgdB3 adopt similar predicted 3D folds, suggesting overlapping function in processing intermediates in LG metabolism. Our findings highlight diversity within the LGDH pathway, through which bacteria utilize levoglucosan as a nutrient source.


Subjects
Actinobacteria; Sugar Phosphates; Bacteria/genetics; Bacteria/metabolism; Actinobacteria/metabolism; Glucose/metabolism
17.
Int J Mol Sci ; 24(18)2023 Sep 15.
Article in English | MEDLINE | ID: mdl-37762466

ABSTRACT

In flowering plants, C4 photosynthesis is superior to C3 type in carbon fixation efficiency and adaptation to extreme environmental conditions, but the mechanisms behind the assembly of C4 machinery remain elusive. This study attempts to dissect the evolutionary divergence from C3 to C4 photosynthesis in five photosynthetic model plants from the grass family, using a combined comparative transcriptomics and deep learning technology. By examining and comparing gene expression levels in bundle sheath and mesophyll cells of five model plants, we identified 16 differentially expressed signature genes showing cell-specific expression patterns in C3 and C4 plants. Among them, two showed distinctively opposite cell-specific expression patterns in C3 vs. C4 plants (named as FOGs). The in silico physicochemical analysis of the two FOGs illustrated that C3 homologous proteins of LHCA6 had low and stable pI values of ~6, while the pI values of LHCA6 homologs increased drastically in C4 plants Setaria viridis (7), Zea mays (8), and Sorghum bicolor (over 9), suggesting this protein may have different functions in C3 and C4 plants. Interestingly, based on pairwise protein sequence/structure similarities between each homologous FOG protein, one FOG PGRL1A showed local inconsistency between sequence similarity and structure similarity. To find more examples of the evolutionary characteristics of FOG proteins, we investigated the protein sequence/structure similarities of other FOGs (transcription factors) and found that FOG proteins have diversified incompatibility between sequence and structure similarities during grass family evolution. This raised an interesting question as to whether the sequence similarity is related to structure similarity during C4 photosynthesis evolution.


Subjects
Magnoliopsida; Setaria Plant; Sorghum; Zea mays/genetics; Photosynthesis/genetics
18.
BMC Bioinformatics ; 23(1): 347, 2022 Aug 19.
Article in English | MEDLINE | ID: mdl-35986255

ABSTRACT

BACKGROUND: Amino acid property-aware phylogenetic analysis (APPA) refers to the phylogenetic analysis method based on amino acid property encoding, which is used for understanding and inferring evolutionary relationships between species from the molecular perspective. Fast Fourier transform (FFT) and Higuchi's fractal dimension (HFD) have excellent performance in describing sequences' structural and complexity information for APPA. However, with the exponential growth of protein sequence data, it is very important to develop a reliable APPA method for protein sequence analysis. RESULTS: Consequently, we propose a new method named FFP, which combines FFT and HFD. Firstly, FFP encodes protein sequences on the basis of an important physicochemical property of amino acids, the dissociation constant, which determines the acidity and basicity of protein molecules. Secondly, FFT and HFD are used to generate the feature vectors of the encoded sequences, after which the distance matrix is calculated from the cosine function, which describes the degree of similarity between species. The smaller the distance between them, the more similar they are. Finally, the phylogenetic tree is constructed. When FFP is tested for phylogenetic analysis on four groups of protein sequences, the results are clearly better than those of the compared methods, with the highest accuracy exceeding 97%. CONCLUSION: FFP has higher accuracy in APPA and multi-sequence alignment. It can also measure protein sequence similarity effectively, and it is hoped that it will play a role in APPA-related research.


Subjects
Amino Acids , Fractals , Algorithms , Fourier Analysis , Phylogeny , Proteins/chemistry
19.
J Biol Chem ; 297(1): 100820, 2021 Jul.
Article in English | MEDLINE | ID: mdl-34029589

ABSTRACT

CYTH proteins make up a large superfamily that is conserved in all three domains of life. These enzymes have a triphosphate tunnel metalloenzyme (TTM) fold, which typically results in phosphatase functions, e.g., RNA triphosphatase, inorganic polyphosphatase, or thiamine triphosphatase. Some CYTH orthologs cyclize nucleotide triphosphates to 3',5'-cyclic nucleotides. So far, archaeal CYTH proteins have been annotated as adenylyl cyclases, although experimental evidence to support these annotations is lacking. To address this gap, we characterized a CYTH ortholog, SaTTM, from the crenarchaeote Sulfolobus acidocaldarius. Our in silico studies derived ten major subclasses within the CYTH family, implying a close relationship between these archaeal CYTH enzymes and class IV adenylyl cyclases. However, initial biochemical characterization reveals the inability of SaTTM to produce any cyclic nucleotides. Instead, our structural and functional analyses show classical TTM behavior, i.e., triphosphatase activity, where pyrophosphate causes product inhibition. The Ca2+-inhibited Michaelis complex indicates a two-metal-ion reaction mechanism analogous to that of other TTMs. Cocrystal structures of SaTTM further reveal conformational dynamics that suggest feedback inhibition in TTMs due to tunnel closure in the product state. These structural insights, combined with further sequence similarity network-based in silico analyses, provide a firm molecular basis for distinguishing CYTH orthologs with phosphatase activities from class IV adenylyl cyclases.


Subjects
Archaea/enzymology , Archaeal Proteins/chemistry , Archaeal Proteins/metabolism , Multigene Family , Polyphosphates/metabolism , Adenylyl Cyclases/metabolism , Amino Acid Motifs , Amino Acid Sequence , Biocatalysis , Ions , Models, Molecular , Protein Multimerization , Substrate Specificity , Sulfolobus acidocaldarius/enzymology , Water
20.
Brief Bioinform ; 21(5): 1596-1608, 2020 Sep 25.
Article in English | MEDLINE | ID: mdl-32978619

ABSTRACT

Bacterial proteins dubbed virulence factors (VFs) are a highly diverse group of sequences, whose only obvious commonality is the very property of being, more or less directly, involved in virulence. It is therefore tempting to speculate whether their prediction, based on direct sequence similarity (seqsim) to known VFs, could be enhanced or even replaced by using machine-learning methods. Specifically, when trained on a large and diverse set of VFs, such methods may be able to detect putative, non-trivial characteristics shared by otherwise unrelated VF families and therefore better predict novel VFs with insignificant similarity to each individual family. We therefore first reassess the performance of dimer-based Support Vector Machines, as employed in the widely used MP3 method, in light of seqsim-only and seqsim/dimer-hybrid classifiers. We then repeat the analysis with a novel, considerably more diverse data set, also addressing the important problem of negative data selection. Finally, we move on to the real-world use case of proteome-wide VF prediction, outlining different approaches to estimating specificity in this scenario. We find that direct seqsim is of unparalleled importance and therefore should always be exploited. Further, we observe strikingly low correlations between different feature and classifier types when ranking proteins by VF likeness. We therefore propose a 'best of each world' approach to prioritize proteins for experimental testing, focusing on the top predictions of each classifier. Further, classifiers for individual VF families should be developed.
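
The 'best of each world' prioritization mentioned in this abstract can be pictured with a short Python sketch. It shows one plausible reading (take the top-ranked proteins from each classifier separately, then pool them) and is not the authors' code. The function name, the `top_k` parameter, and the toy classifier names and scores are hypothetical.

```python
def best_of_each_world(scores_by_classifier, top_k):
    """Pool the top_k highest-scoring proteins of every classifier.

    scores_by_classifier maps a classifier name to a dict of
    {protein_id: VF-likeness score}. The result maps each selected
    protein to the list of classifiers that nominated it.
    """
    selected = {}
    for name, scores in scores_by_classifier.items():
        # Rank proteins by descending score for this classifier alone.
        ranked = sorted(scores, key=scores.get, reverse=True)[:top_k]
        for protein in ranked:
            selected.setdefault(protein, []).append(name)
    return selected


# Toy, made-up scores for two hypothetical classifiers.
toy_scores = {
    "seqsim": {"P1": 0.9, "P2": 0.1, "P3": 0.5},
    "svm": {"P1": 0.2, "P2": 0.8, "P3": 0.7},
}
candidates = best_of_each_world(toy_scores, top_k=1)
```

Because each classifier contributes its own top list, weakly correlated rankings yield a broader candidate pool than a single averaged score would.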


Subjects
Bacteria/pathogenicity , Bacterial Proteins/metabolism , Support Vector Machine , Virulence Factors/metabolism , Algorithms , Amino Acid Sequence , Bacterial Proteins/chemistry , Datasets as Topic , Dimerization , Proteome , Virulence Factors/chemistry