Results 1 - 20 of 46
1.
J Exp Bot ; 2024 Apr 30.
Article in English | MEDLINE | ID: mdl-38686677

ABSTRACT

During germination plants rely entirely on their seed storage compounds to provide energy and precursors for the synthesis of macromolecular structures until the seedling has emerged from the soil and photosynthesis can be established. Lupin seeds use proteins as their major storage compounds, accounting for up to 40% of the seed dry weight. Lupins are therefore a valuable complement to soy as a source of plant protein for human and animal nutrition. The aim of this study was to elucidate how storage protein metabolism is coordinated with other metabolic processes to meet the requirements of the growing seedling. In a quantitative approach, we analyzed seedling growth, as well as alterations in biomass composition, the proteome, and metabolite profiles during germination and seedling establishment in Lupinus albus. The reallocation of nitrogen resources from seed storage proteins to functional seed proteins was mapped based on a manually curated functional protein annotation database. Although classified as a protein crop, Lupinus albus does not use amino acids as a primary substrate for energy metabolism during germination. However, fatty acid and amino acid metabolism may be integrated at the level of malate synthase to combine stored carbon from lipids and proteins into gluconeogenesis.

2.
Brief Bioinform ; 25(2), 2024 Jan 22.
Article in English | MEDLINE | ID: mdl-38446740

ABSTRACT

Protein annotation has long been a challenging task in computational biology. Gene Ontology (GO) has become one of the most popular frameworks to describe protein functions and their relationships. Prediction of a protein annotation with proper GO terms demands high-quality GO term representation learning, which aims to learn a low-dimensional dense vector representation with accompanying semantic meaning for each functional label, also known as embedding. However, existing GO term embedding methods, which mainly take into account ancestral co-occurrence information, have yet to capture the full topological information in the GO-directed acyclic graph (DAG). In this study, we propose a novel GO term representation learning method, PO2Vec, to utilize the partial order relationships to improve the GO term representations. Extensive evaluations show that PO2Vec achieves better outcomes than existing embedding methods in a variety of downstream biological tasks. Based on PO2Vec, we further developed a new protein function prediction method PO2GO, which demonstrates superior performance measured in multiple metrics and annotation specificity as well as few-shot prediction capability in the benchmarks. These results suggest that the high-quality representation of GO structure is critical for diverse biological tasks including computational protein annotation.
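
The partial order mentioned in this abstract can be illustrated with a small sketch. It is not the PO2Vec method itself, only a toy showing what "partial order relationships" in a directed acyclic graph mean: one term precedes another when it is reachable by following parent links, and some pairs of terms are simply incomparable. The term identifiers and the `toy_parents` mapping are hypothetical and do not come from the Gene Ontology.

```python
def ancestors(term, parents):
    """Return every term reachable from `term` by following parent links."""
    seen = set()
    stack = list(parents.get(term, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


def precedes(general, specific, parents):
    """True if `general` equals `specific` or is one of its ancestors."""
    return general == specific or general in ancestors(specific, parents)


# Hypothetical toy DAG: each term maps to its direct parents.
toy_parents = {
    "T:root": [],
    "T:binding": ["T:root"],
    "T:catalysis": ["T:root"],
    "T:ion_binding": ["T:binding"],
    "T:metalloenzyme": ["T:ion_binding", "T:catalysis"],
}

# Comparable pair: "T:binding" is an ancestor of "T:metalloenzyme".
comparable = precedes("T:binding", "T:metalloenzyme", toy_parents)

# Incomparable pair: neither term is an ancestor of the other.
incomparable = not (
    precedes("T:catalysis", "T:ion_binding", toy_parents)
    or precedes("T:ion_binding", "T:catalysis", toy_parents)
)
```

Because a term may have several parents, ancestor sets are collected with a graph traversal rather than by walking a single chain, which is the property that distinguishes a DAG from a tree.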


Subjects
Benchmarking , Computational Biology , Gene Ontology , Learning , Molecular Sequence Annotation
3.
Microorganisms ; 12(2), 2024 Jan 24.
Article in English | MEDLINE | ID: mdl-38399651

ABSTRACT

BACKGROUND: Eukaryotes' whole-genome sequencing is crucial for species identification, gene detection, and protein annotation. Oxford Nanopore Technology (ONT) is an affordable and rapid platform for sequencing eukaryotes; however, the relatively higher error rates require computational and bioinformatic efforts to produce more accurate genome assemblies. Here, we evaluated the effect of read correction tools on eukaryote genome completeness, gene detection and protein annotation. METHODS: Reads generated by ONT of four eukaryotes, C. albicans, C. gattii, S. cerevisiae, and P. falciparum, were assembled using minimap2 and underwent three rounds of read correction using flye, medaka and racon. The generated consensus FASTA files were compared for total length (bp), genome completeness, gene detection, and protein annotation by QUAST, BUSCO, BRAKER1 and InterProScan, respectively. RESULTS: Genome completeness was dependent on the assembly method rather than on the read correction tool; however, medaka performed better than flye and racon. Racon performed significantly better than flye and medaka in gene detection, while both racon and medaka performed significantly better than flye in protein annotation. CONCLUSION: We show that three rounds of read correction significantly affect gene detection and protein annotation, which are dependent on assembly quality in preference to assembly completeness.

4.
Biology (Basel) ; 12(6), 2023 May 31.
Article in English | MEDLINE | ID: mdl-37372080

ABSTRACT

The number of unannotated protein sequences is explosively increasing due to genome sequence technology. A more comprehensive understanding of protein functions for protein annotation requires the discovery of new features that cannot be captured from conventional methods. Deep learning can extract important features from input data and predict protein functions based on the features. Here, protein feature vectors generated by 3 deep learning models are analyzed using Integrated Gradients to explore important features of amino acid sites. As a case study, prediction and feature extraction models for UbiD enzymes were built using these models. The important amino acid residues extracted from the models were different from secondary structures, conserved regions and active sites of known UbiD information. Interestingly, the different amino acid residues within UbiD sequences were regarded as important factors depending on the type of models and sequences. The Transformer models focused on more specific regions than the other models. These results suggest that each deep learning model understands protein features with different aspects from existing knowledge and has the potential to discover new laws of protein functions. This study will help to extract new protein features for the other protein annotations.

5.
Genome Biol Evol ; 2023 May 22.
Article in English | MEDLINE | ID: mdl-37217837

ABSTRACT

Interpreting protein function from sequence data is a fundamental goal of bioinformatics. However, our current understanding of protein diversity is bottlenecked by the fact that most proteins have only been functionally validated in model organisms, limiting our understanding of how function varies with gene sequence diversity. Thus, accuracy of inferences in clades without model representatives is questionable. Unsupervised learning may help to ameliorate this bias by identifying highly complex patterns and structure from large datasets without external labels. Here we present DeepSeqProt, an unsupervised deep learning program for exploring large protein sequence datasets. DeepSeqProt is a clustering tool capable of distinguishing between broad classes of proteins while learning local and global structure of functional space. DeepSeqProt is capable of learning salient biological features from unaligned, unannotated sequences. DeepSeqProt is more likely to capture complete protein families and statistically significant shared ontologies within proteomes than other clustering methods. We hope this framework will prove of use to researchers and provide a preliminary step in further developing unsupervised deep learning in molecular biology.

6.
Comput Struct Biotechnol J ; 21: 2696-2704, 2023.
Article in English | MEDLINE | ID: mdl-37143762

ABSTRACT

Major advances in genomic and associated technologies have demanded reliable bioinformatic tools and workflows for the annotation of genes and their products via comparative analyses using well-curated reference data sets, accessible in public repositories. However, the accurate in silico annotation of molecules (proteins) encoded in organisms (e.g., multicellular parasites) which are evolutionarily distant from those for which these extensive reference data sets are available, including invertebrate model organisms (e.g., Caenorhabditis elegans - free-living nematode, and Drosophila melanogaster - the vinegar fly) and vertebrate species (e.g., Homo sapiens and Mus musculus), remains a major challenge. Here, we constructed an informatic workflow for the enhanced annotation of biologically-important, excretory/secretory (ES) proteins ("secretome") encoded in the genome of a parasitic roundworm, called Haemonchus contortus (commonly known as the barber's pole worm). We critically evaluated the performance of five distinct methods, refined some of them, and then combined the use of all five methods to comprehensively annotate ES proteins, according to gene ontology, biological pathways and/or metabolic (enzymatic) processes. Then, using optimised parameter settings, we applied this workflow to comprehensively annotate 2591 of all 3353 proteins (77.3%) in the secretome of H. contortus. This result is a substantial improvement (10-25%) over previous annotations using individual, "off-the-shelf" algorithms and default settings, indicating the ready applicability of the present, refined workflow to gene/protein sequence data sets from a wide range of organisms in the Tree-of-Life.

7.
Viruses ; 15(4), 2023 04 19.
Article in English | MEDLINE | ID: mdl-37112988

ABSTRACT

Recent years have seen major changes in the classification criteria and taxonomy of viruses. The current classification scheme, also called "megataxonomy of viruses", recognizes six different viral realms, defined based on the presence of viral hallmark genes (VHGs). Within the realms, viruses are classified into hierarchical taxa, ideally defined by the phylogeny of their shared genes. To enable the detection of shared genes, viruses have first to be clustered, and there is currently a need for tools to assist with virus clustering and classification. Here, VirClust is presented. It is a novel, reference-free tool capable of performing: (i) protein clustering, based on BLASTp and Hidden Markov Models (HMMs) similarities; (ii) hierarchical clustering of viruses based on intergenomic distances calculated from their shared protein content; (iii) identification of core proteins and (iv) annotation of viral proteins. VirClust has flexible parameters both for protein clustering and for splitting the viral genome tree into smaller genome clusters, corresponding to different taxonomic levels. Benchmarking on a phage dataset showed that the genome trees produced by VirClust match the current ICTV classification at family, sub-family and genus levels. VirClust is freely available, as a web-service and stand-alone tool.
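
The idea of an intergenomic distance derived from shared protein content can be made concrete with a small sketch. The abstract does not state which distance VirClust uses, so the Jaccard distance below is an assumption chosen only for illustration, and the genome names and protein-cluster identifiers are hypothetical.

```python
def jaccard_distance(clusters_a, clusters_b):
    """1 minus the fraction of protein clusters shared by two genomes."""
    a, b = set(clusters_a), set(clusters_b)
    union = a | b
    if not union:
        # Two genomes with no protein clusters are treated as identical.
        return 0.0
    return 1.0 - len(a & b) / len(union)


# Hypothetical genomes, each described by the protein clusters it encodes.
genomes = {
    "phage_A": {"PC1", "PC2", "PC3"},
    "phage_B": {"PC2", "PC3", "PC4"},
    "phage_C": {"PC9"},
}

# Pairwise distances for every unordered pair of genomes.
names = sorted(genomes)
distances = {
    (x, y): jaccard_distance(genomes[x], genomes[y])
    for i, x in enumerate(names)
    for y in names[i + 1:]
}
```

A matrix of such pairwise distances is the kind of input a hierarchical clustering routine would take to build a genome tree.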


Subjects
Bacteriophages , Viruses , Viruses/genetics , Bacteriophages/genetics , Viral Genes , Viral Genome , Phylogeny , Cluster Analysis
8.
Int J Mol Sci ; 24(4), 2023 Feb 14.
Article in English | MEDLINE | ID: mdl-36835188

ABSTRACT

Derived from the natural language processing (NLP) algorithms, protein language models enable the encoding of protein sequences, which are widely diverse in length and amino acid composition, in fixed-size numerical vectors (embeddings). We surveyed representative embedding models such as Esm, Esm1b, ProtT5, and SeqVec, along with their derivatives (GoPredSim and PLAST), to conduct the following tasks in computational biology: embedding the Saccharomyces cerevisiae proteome, gene ontology (GO) annotation of the uncharacterized proteins of this organism, relating variants of human proteins to disease status, correlating mutants of beta-lactamase TEM-1 from Escherichia coli with experimentally measured antimicrobial resistance, and analyzing diverse fungal mating factors. We discuss the advances and shortcomings, differences, and concordance of the models. Of note, all of the models revealed that the uncharacterized proteins in yeast tend to be less than 200 amino acids long, contain fewer aspartates and glutamates, and are enriched for cysteine. Less than half of these proteins can be annotated with GO terms with high confidence. The distribution of the cosine similarity scores of benign and pathogenic mutations to the reference human proteins shows a statistically significant difference. The differences in embeddings of the reference TEM-1 and mutants have low to no correlation with minimal inhibitory concentrations (MIC).
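
The cosine similarity score used above to compare variant and reference embeddings can be written out in a few lines. This is a minimal sketch in plain Python, not the authors' code; the two toy vectors are hypothetical stand-ins for real embeddings, which typically have hundreds or thousands of dimensions.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length numeric vectors."""
    if len(u) != len(v):
        raise ValueError("embeddings must have the same dimensionality")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Hypothetical 4-dimensional vectors standing in for protein embeddings.
reference = [1.0, 0.0, 2.0, 0.0]
variant = [1.0, 0.0, 2.0, 0.5]

score = cosine_similarity(reference, variant)
```

A score of 1 means the two vectors point in the same direction, and 0 means they are orthogonal. Because embeddings have a fixed size regardless of sequence length, the score can be computed for any pair of proteins.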


Subjects
Algorithms , Proteins , Humans , Amino Acid Sequence , Amino Acids , Computational Biology , Proteins/chemistry , Saccharomyces cerevisiae/genetics , Proteomics
9.
Protein Sci ; 32(1): e4524, 2023 01.
Article in English | MEDLINE | ID: mdl-36454227

ABSTRACT

The availability of accurate and fast artificial intelligence (AI) solutions predicting aspects of proteins is revolutionizing experimental and computational molecular biology. The webserver LambdaPP aspires to supersede PredictProtein, the first internet server making AI protein predictions available in 1992. Given a protein sequence as input, LambdaPP provides easily accessible visualizations of protein 3D structure, along with predictions at the protein level (GeneOntology, subcellular location), and the residue level (binding to metal ions, small molecules, and nucleotides; conservation; intrinsic disorder; secondary structure; alpha-helical and beta-barrel transmembrane segments; signal-peptides; variant effect) in seconds. The structure prediction provided by LambdaPP (leveraging ColabFold and computed in minutes) is based on MMseqs2 multiple sequence alignments. All other feature prediction methods are based on the pLM ProtT5. Queried by a protein sequence, LambdaPP computes protein and residue predictions almost instantly for various phenotypes, including 3D structure and aspects of protein function. LambdaPP is freely available for everyone to use under embed.predictprotein.org; the interactive results for the case study can be found under https://embed.predictprotein.org/o/Q9NZC2. The frontend of LambdaPP can be found on GitHub (github.com/sacdallago/embed.predictprotein.org), and can be freely used and distributed under the academic free use license (AFL-2). For high-throughput applications, all methods can be executed locally via the bio-embeddings (bioembeddings.com) python package, or docker image at ghcr.io/bioembeddings/bio_embeddings, which also includes the backend of LambdaPP.


Subjects
Artificial Intelligence , Proteins , Proteins/chemistry , Amino Acid Sequence , Protein Secondary Structure , Sequence Alignment , Software
10.
Cell ; 185(21): 4023-4037.e18, 2022 10 13.
Article in English | MEDLINE | ID: mdl-36174579

ABSTRACT

High-throughput RNA sequencing offers broad opportunities to explore the Earth RNA virome. Mining 5,150 diverse metatranscriptomes uncovered >2.5 million RNA virus contigs. Analysis of >330,000 RNA-dependent RNA polymerases (RdRPs) shows that this expansion corresponds to a 5-fold increase of the known RNA virus diversity. Gene content analysis revealed multiple protein domains previously not found in RNA viruses and implicated in virus-host interactions. Extended RdRP phylogeny supports the monophyly of the five established phyla and reveals two putative additional bacteriophage phyla and numerous putative additional classes and orders. The dramatically expanded phylum Lenarviricota, consisting of bacterial and related eukaryotic viruses, now accounts for a third of the RNA virome. Identification of CRISPR spacer matches and bacteriolytic proteins suggests that subsets of picobirnaviruses and partitiviruses, previously associated with eukaryotes, infect prokaryotic hosts.


Subjects
Bacteriophages , RNA Viruses , Bacteriophages/genetics , DNA-Directed RNA Polymerases/genetics , Viral Genome , Phylogeny , RNA , RNA Viruses/genetics , RNA-Dependent RNA Polymerase/genetics , Virome
11.
Front Genet ; 13: 935351, 2022.
Article in English | MEDLINE | ID: mdl-35938008

ABSTRACT

Small proteins, encoded by small open reading frames, are only beginning to emerge with the current advancement of omics technology and bioinformatics. There is increasing evidence that small proteins play roles in diverse critical biological functions, such as adjusting cellular metabolism, regulating other protein activities, controlling cell cycles, and affecting disease physiology. In prokaryotes such as bacteria, the small proteins are largely unexplored for their sequence space and functional groups. For most bacterial species from a natural community, the sample cannot be easily isolated or cultured, and the bacterial peptides must be better characterized in a metagenomic manner. The bacterial peptides identified from metagenomic samples can not only enrich the pool of small proteins but can also reveal the community-specific microbe ecology information from a small protein perspective. In this study, metaBP (Bacterial Peptides for metagenomic sample) has been developed as a comprehensive toolkit to explore the small protein universe from metagenomic samples. It takes raw sequencing reads as input, performs protein-level meta-assembly, and computes bacterial peptide homolog groups with sample-specific mutations. The metaBP also integrates general protein annotation tools as well as our small protein-specific machine learning module metaBP-ML to construct a full landscape for bacterial peptides. The metaBP-ML shows advantages for discovering functions of bacterial peptides in a microbial community and increases the yields of annotations by up to fivefold. The metaBP toolkit demonstrates its novelty in adopting the protein-level assembly to discover small proteins, integrating a protein-clustering tool in a new and flexible environment of RBiotools, and presenting, for the first time, a small protein landscape by metaBP-ML.
Taken together, metaBP (and metaBP-ML) can profile functional bacterial peptides from metagenomic samples with potential diverse mutations, in order to depict a unique landscape of small proteins from a microbial community.

12.
Gigascience ; 11, 2022 08 11.
Article in English | MEDLINE | ID: mdl-35950840

ABSTRACT

BACKGROUND: Many biological properties of phages are determined by phage virion proteins (PVPs), and the poor annotation of PVPs is a bottleneck for many areas of viral research, such as viral phylogenetic analysis, viral host identification, and antibacterial drug design. Because of the high diversity of PVP sequences, the PVP annotation of a phage genome remains a particularly challenging bioinformatic task. FINDINGS: Based on deep learning, we developed DeePVP. The main module of DeePVP aims to discriminate PVPs from non-PVPs within a phage genome, while the extended module of DeePVP can further classify predicted PVPs into the 10 major classes of PVPs. Compared with the present state-of-the-art tools, the main module of DeePVP performs better, with a 9.05% higher F1-score in the PVP identification task. Moreover, the overall accuracy of the extended module of DeePVP in the PVP classification task is approximately 3.72% higher than that of PhANNs. Two application cases show that the predictions of DeePVP are more reliable and can better reveal the compact PVP-enriched region than the current state-of-the-art tools. Particularly, in the Escherichia phage phiEC1 genome, a novel PVP-enriched region that is conserved in many other Escherichia phage genomes was identified, indicating that DeePVP will be a useful tool for the analysis of phage genomic structures. CONCLUSIONS: DeePVP outperforms state-of-the-art tools. The program is optimized in both a virtual machine with graphical user interface and a docker so that the tool can be easily run by noncomputer professionals. DeePVP is freely available at https://github.com/fangzcbio/DeePVP/.
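
For readers unfamiliar with the F1-score reported above, the following sketch shows the standard definition as the harmonic mean of precision and recall. It is a generic illustration, not code from DeePVP, and the counts in the example are hypothetical rather than taken from the study.

```python
def f1_score(true_positives, false_positives, false_negatives):
    """Harmonic mean of precision and recall for a binary classifier."""
    predicted_positive = true_positives + false_positives
    actual_positive = true_positives + false_negatives
    precision = true_positives / predicted_positive if predicted_positive else 0.0
    recall = true_positives / actual_positive if actual_positive else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


# Hypothetical counts: 80 PVPs found, 20 non-PVPs wrongly called PVPs,
# and 20 true PVPs missed by the classifier.
example_f1 = f1_score(true_positives=80, false_positives=20, false_negatives=20)
```

Unlike plain accuracy, the F1-score ignores true negatives, which makes it a common choice when the positive class (here, virion proteins) is a minority of all proteins in a genome.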


Subjects
Bacteriophages , Deep Learning , Bacteriophages/genetics , Computational Biology , Viral Genome , Phylogeny , Virion/genetics
13.
Biochim Biophys Acta Proteins Proteom ; 1870(1): 140721, 2022 01.
Article in English | MEDLINE | ID: mdl-34624539

ABSTRACT

The Seq2Enz method is a new way to identify whether a query protein sequence is an enzyme and to assign an enzyme class to the protein sequence. The method is based on mask BLAST fortified with some novel structural-chemical properties (NCL) of the building blocks of proteins. All available reviewed enzyme sequences (267,276 in number) in Uniprot/SwissProt and the most recent depositions (7062) not used for training in ECPred, a state-of-the-art software for enzyme class prediction, are taken for assessment, and the results are compared with those from conventional BLAST and ECPred, respectively. Seq2Enz is seen to perform consistently better for all the enzyme classes at all four levels. The Seq2Enz methodology is converted into an easy-to-use web server and made freely accessible at http://www.scfbio-iitd.res.in Seq2Enz/.


Subjects
Catalytic Domain , Protein Sequence Analysis/methods , Software , Animals , Enzymes/chemistry , Enzymes/metabolism , Humans
14.
Trends Biotechnol ; 40(2): 240-254, 2022 02.
Article in English | MEDLINE | ID: mdl-34304905

ABSTRACT

Advances in technological and bioinformatics approaches have led to the generation of a plethora of human gut metagenomic datasets. Metabolomics has also provided substantial data regarding the small metabolites produced and modified by the microbiota. Comparatively, the microbial enzymes mediating the transformation of metabolites have not been intensively investigated. Here, we discuss the recent efforts and technologies used for discovering and mining enzymes from the human gut microbiota. The wealth of knowledge on metabolites, reactions, genome sequences, and structures of proteins, may drive the development of strategies for enzyme mining. Ongoing efforts to annotate gut microbiota enzymes will explain catalytic mechanisms that may guide the clinical applications of the gut microbiome for diagnostic and therapeutic purposes.


Subjects
Gastrointestinal Microbiome , Microbiota , Computational Biology , Gastrointestinal Microbiome/genetics , Humans , Metabolomics
16.
Arch Microbiol ; 203(8): 5257-5265, 2021 Oct.
Article in English | MEDLINE | ID: mdl-34213598

ABSTRACT

The placement of Corynespora olivacea within the large genus Corynespora (Pleosporales) is controversial, because the species is distantly related to other congeners, including the type species C. cassiicola. Corynespora cassiicola is a polyphagous, cosmopolitan plant pathogen. Successful colonization of plant tissues requires the pathogen's effector repertoire to modulate host cell physiology and facilitate the infection process. We sequenced and performed functional annotations on the genomes of C. cassiicola CC_29 (genome size about 44.8 Mb; isolated from soybean leaves) and C. olivacea CBS 114450 (32.3 Mb). Our phylogenomic approach showed that C. cassiicola is distantly related to C. olivacea, which clustered among the Massarinaceae family members, supporting a hypothesis that C. olivacea was originally misclassified. The predicted sizes for the proteome and secretome of C. cassiicola (18,487 and 1327, respectively) were larger than those of C. olivacea (13,501 and 920, respectively). Corynespora cassiicola had a richer repertoire of effector proteins (CAZymes, proteases, lipases, and effectors) and genes associated with secondary metabolism than did C. olivacea.


Subjects
Ascomycota , Ascomycota/genetics , Computer Simulation , Phylogeny
17.
Curr Protoc ; 1(5): e113, 2021 May.
Article in English | MEDLINE | ID: mdl-33961736

ABSTRACT

Models from machine learning (ML) or artificial intelligence (AI) increasingly assist in guiding experimental design and decision making in molecular biology and medicine. Recently, Language Models (LMs) have been adapted from Natural Language Processing (NLP) to encode the implicit language written in protein sequences. Protein LMs show enormous potential in generating descriptive representations (embeddings) for proteins from just their sequences, in a fraction of the time with respect to previous approaches, yet with comparable or improved predictive ability. Researchers have trained a variety of protein LMs that are likely to illuminate different angles of the protein language. By leveraging the bio_embeddings pipeline and modules, simple and reproducible workflows can be laid out to generate protein embeddings and rich visualizations. Embeddings can then be leveraged as input features through machine learning libraries to develop methods predicting particular aspects of protein function and structure. Beyond the workflows included here, embeddings have been leveraged as proxies to traditional homology-based inference and even to align similar protein sequences. A wealth of possibilities remain for researchers to harness through the tools provided in the following protocols. © 2021 The Authors. Current Protocols published by Wiley Periodicals LLC. 
The following protocols are included in this manuscript:
Basic Protocol 1: Generic use of the bio_embeddings pipeline to plot protein sequences and annotations
Basic Protocol 2: Generate embeddings from protein sequences using the bio_embeddings pipeline
Basic Protocol 3: Overlay sequence annotations onto a protein space visualization
Basic Protocol 4: Train a machine learning classifier on protein embeddings
Alternate Protocol 1: Generate 3D instead of 2D visualizations
Alternate Protocol 2: Visualize protein solubility instead of protein subcellular localization
Support Protocol: Join embedding generation and sequence space visualization in a pipeline


Subjects
Artificial Intelligence , Deep Learning , Machine Learning , Natural Language Processing , Proteins
18.
BMC Bioinformatics ; 22(1): 11, 2021 Jan 06.
Article in English | MEDLINE | ID: mdl-33407081

ABSTRACT

BACKGROUND: High-throughput sequencing has increased the number of available microbial genomes recovered from isolates, single cells, and metagenomes. Accordingly, fast and comprehensive functional gene annotation pipelines are needed to analyze and compare these genomes. Although several approaches exist for genome annotation, these are typically not designed for easy incorporation into analysis pipelines, do not combine results from different annotation databases or offer easy-to-use summaries of metabolic reconstructions, and typically require large amounts of computing power for high-throughput analysis not available to the average user. RESULTS: Here, we introduce MicrobeAnnotator, a fully automated, easy-to-use pipeline for the comprehensive functional annotation of microbial genomes that combines results from several reference protein databases and returns the matching annotations together with key metadata such as the interlinked identifiers of matching reference proteins from multiple databases [KEGG Orthology (KO), Enzyme Commission (E.C.), Gene Ontology (GO), Pfam, and InterPro]. Further, the functional annotations are summarized into Kyoto Encyclopedia of Genes and Genomes (KEGG) modules as part of a graphical output (heatmap) that allows the user to quickly detect differences among (multiple) query genomes and cluster the genomes based on their metabolic similarity. MicrobeAnnotator is implemented in Python 3 and is freely available under an open-source Artistic License 2.0 from https://github.com/cruizperez/MicrobeAnnotator . CONCLUSIONS: We demonstrated the capabilities of MicrobeAnnotator by annotating 100 Escherichia coli and 78 environmental Candidate Phyla Radiation (CPR) bacterial genomes and comparing the results to those of other popular tools. 
We showed that the use of multiple annotation databases allows MicrobeAnnotator to recover more annotations per genome compared to faster tools that use reduced databases and is computationally efficient for use in personal computers. The output of MicrobeAnnotator can be easily incorporated into other analysis pipelines while the results of other annotation tools can be seamlessly incorporated into MicrobeAnnotator to generate summary plots.


Subjects
Microbial Genome/genetics , Genomics/methods , Molecular Sequence Annotation/methods , Software , Escherichia coli/genetics
19.
Front Bioinform ; 1: 749008, 2021.
Article in English | MEDLINE | ID: mdl-36303767

ABSTRACT

Advances in genome sequencing have accelerated the growth of sequenced genomes but at a cost in the quality of genome annotation. At the same time, computational analysis is widely used for protein annotation, but a dearth of experimental verification has contributed to inaccurate annotation as well as to annotation error propagation. Thus, a tool to help life scientists with accurate protein annotation would be useful. In this work we describe a website we have developed, the Protein Annotation Surveillance Site (PASS), which provides such a tool. This website consists of three major components: a database of homologous clusters of more than eight million protein sequences deduced from the representative genomes of bacteria, archaea, eukarya, and viruses, together with sequence information; a machine-learning software tool which periodically queries the UniProtKB database to determine whether protein function has been experimentally verified; and a queryable webpage where the FASTA headers of sequences from the cluster best matching an input sequence are returned. The user can choose from these sequences to create a sequence similarity network to assist in annotation or else use their expert knowledge to choose an annotation from the cluster sequences. Illustrations demonstrating use of this website are presented.

20.
BMC Microbiol ; 20(1): 342, 2020 11 11.
Article in English | MEDLINE | ID: mdl-33176679

ABSTRACT

BACKGROUND: Members of the genus Aspergillus display a variety of lifestyles, ranging from saprobic to pathogenic on plants and/or animals. Increased genome sequencing of economically important members of the genus permits effective use of "-omics" comparisons between closely related species and strains to identify candidate genes that may contribute to phenotypes of interest, especially relating to pathogenicity. Protein-coding genes were predicted from 216 genomes of 12 Aspergillus species, and the frequencies of various structural aspects (exon count and length, intron count and length, GC content, and codon usage) and functional annotations (InterPro, Gene Ontology, and Kyoto Encyclopedia of Genes and Genomes terms) were compared. RESULTS: Using principal component analyses, the three sets of functional annotations for each strain were clustered by species. The species clusters appeared to separate by pathogenicity on plants along the first dimensions, which accounted for over 20% of the variance. More annotations for genes encoding pectinases and secondary metabolite biosynthetic enzymes were assigned to phytopathogenic strains from species such as Aspergillus flavus. In contrast, Aspergillus fumigatus strains, which are pathogenic to animals but not plants, were assigned relatively more terms related to phosphate transferases, and carbohydrate and amino-sugar metabolism. Analyses of publicly available RNA-Seq data indicated that one A. fumigatus protein among 17 amino-sugar processing candidates, a hexokinase, was up-regulated during co-culturing with human immune system cells. CONCLUSION: Genes encoding hexokinases and other proteins of interest may be subject to future manipulations to further refine understanding of Aspergillus pathogenicity factors.


Subjects
Aspergillus/genetics , Virulence Factors/genetics , Animals , Aspergillus/classification , Aspergillus/pathogenicity , Fungal Genes/genetics , Fungal Genome/genetics , Hexokinase/genetics , Humans , Molecular Sequence Annotation , Plant Diseases/microbiology