ABSTRACT
The gut metabolome acts as an intermediary between the gut microbiota and host, and has tremendous diagnostic and therapeutic potential. Several studies have utilized bioinformatic tools to predict metabolites based on the different aspects of the gut microbiome. Although these tools have contributed to a better understanding of the relationship between the gut microbiota and various diseases, most of them have focused on the impact of microbial genes on the metabolites and the relationship between microbial genes. In contrast, relatively little is known regarding the effect of metabolites on the microbial genes or the relationship between these metabolites. In this study, we constructed a computational framework of Microbe-Metabolite INteractions-based metabolic profiles Predictor (MMINP), based on the Two-Way Orthogonal Partial Least Squares (O2-PLS) algorithm to predict the metabolic profiles associated with gut microbiota. We demonstrated the predictive value of MMINP relative to that of similar methods. Additionally, we identified the features that would profoundly impact the prediction performance of data-driven methods (O2-PLS, MMINP, MelonnPan, and ENVIM), including the training sample size, host disease state, and the upstream data processing methods of the different technical platforms. We suggest that when using data-driven methods, similar host disease states and preprocessing methods, and a sufficient number of training samples are necessary to achieve accurate prediction.
MMINP fully considers internal and mutual correlations in metabolites and microbial genes and infers metabolite information through their real joint parts. The feasibility of predicting metabolic profiles using gut microbiome data should be based on the premise of similar host disease states, similar preprocessing methods, and a sufficient number of training samples. Although the accuracy of predicted specific metabolites is affected by multiple factors, the systematic conclusions presented for predicted metabolites at higher levels (e.g., class level) are accurate, allowing metabolite prediction to be applied to the discovery of potential metabolite markers.
Subject(s)
Gastrointestinal Microbiome, Least-Squares Analysis, Algorithms, Computational Biology, Metabolome
ABSTRACT
The data output from microbiome research is growing at an accelerating rate, yet mining the data quickly and efficiently remains difficult. There is still a lack of an effective data structure to represent and manage data, as well as flexible and composable analysis methods. In response to these two issues, we designed and developed the MicrobiotaProcess package. It provides a comprehensive data structure, MPSE, to better integrate the primary and intermediate data, which improves the integration and exploration of the downstream data. Around this data structure, the downstream analysis tasks are decomposed and a set of functions are designed under a tidy framework. These functions independently perform simple tasks and can be combined to perform complex tasks. This gives users the ability to explore data, conduct personalized analyses, and develop analysis workflows. Moreover, MicrobiotaProcess can interoperate with other packages in the R community, which further expands its analytical capabilities. This article demonstrates the MicrobiotaProcess for analyzing microbiome data as well as other ecological data through several examples. It connects upstream data, provides flexible downstream analysis components, and provides visualization methods to assist in presenting and interpreting results.
ABSTRACT
The toxin-antitoxin (TA) system is a widely distributed group of genetic modules that play important roles in the life of prokaryotes, with mobile genetic elements (MGEs) contributing to the dissemination of antibiotic resistance genes (ARGs). The diversity and richness of TA systems in Pseudomonas aeruginosa, one of the bacterial species carrying ARGs, have not yet been fully characterized. In this study, we explored TA systems in public genomic sequencing data and genome sequences. Genomic sequencing data from a small set of 281 isolates were selected from the NCBI SRA database, and reassembling the genomes of these isolates revealed abundant TA homologs. Furthermore, remapping the identified TA modules onto 5,437 genomes/draft genomes uncovered a great diversity of TA modules in P. aeruginosa. Moreover, manual inspection revealed several TA systems not previously reported in P. aeruginosa, including hok-sok, cptA-cptB, cbeA-cbtA, tomB-hha, and ryeA-sdsR. Additional annotation revealed that a large number of MGEs were located close to TA modules. Also, 16% of ARGs were located relatively close to TA modules. Our work confirmed a wealth of TA genes in the unexplored P. aeruginosa pan-genomes, expanded the knowledge on P. aeruginosa, and provided methodological tips on large-scale data mining for future studies. The co-occurrence of MGEs, ARGs, and TA modules may indicate a potential interaction in their dissemination.
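The proximity statistic reported above (the share of ARGs located close to a TA module) can be illustrated with a minimal sketch. The coordinates, the 10 kb window, and the function name are hypothetical and chosen only for illustration; the abstract does not state the distance threshold or the procedure the authors actually used.

```python
from bisect import bisect_left


def fraction_near(arg_positions, ta_positions, window):
    """Return the fraction of ARG positions within `window` bp of any TA position.

    Simplifying assumption: every feature is a single coordinate on the
    same contig, so strand and feature length are ignored.
    """
    if not arg_positions:
        return 0.0
    ta_sorted = sorted(ta_positions)
    near = 0
    for pos in arg_positions:
        i = bisect_left(ta_sorted, pos)
        # Only the closest TA on each side of `pos` can be the nearest one.
        neighbours = ta_sorted[max(i - 1, 0):i + 1]
        if any(abs(pos - ta) <= window for ta in neighbours):
            near += 1
    return near / len(arg_positions)


# Hypothetical coordinates in bp, not taken from the study.
args = [1_000, 50_000, 120_000, 300_000]
tas = [4_000, 118_500, 500_000]
print(fraction_near(args, tas, window=10_000))
```

With real annotations, the same calculation would be run per contig, and the result depends strongly on the chosen window.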
ABSTRACT
The identification of conserved and variable regions in a multiple sequence alignment (MSA) is critical to accelerating the process of understanding the function of genes. MSA visualizations allow us to transform sequence features into understandable visual representations. As the sequence-structure-function relationship gains increasing attention in molecular biology studies, the simple display of nucleotide or protein sequence alignments is no longer sufficient. A more scalable visualization is required to broaden the scope of sequence investigation. Here we present ggmsa, an R package for mining comprehensive sequence features and integrating the associated data of MSA by a variety of display methods. To uncover sequence conservation patterns, variations, and recombination at the site level, sequence bundles, sequence logos, stacked sequence alignments, and comparative plots are implemented. ggmsa supports integrating the correlation of MSA sequences and their phenotypes, as well as other traits such as ancestral sequences, molecular structures, molecular functions, and expression levels. We also designed a new visualization method for genome alignments in multiple alignment format to explore the pattern of within- and between-species variation. Combining these visual representations with prior knowledge, ggmsa assists researchers in exploring MSAs and making decisions. The ggmsa package is open-source software released under the Artistic-2.0 license, and it is freely available on Bioconductor (https://bioconductor.org/packages/ggmsa) and GitHub (https://github.com/YuLab-SMU/ggmsa).
Subject(s)
Genome, Software, Amino Acid Sequence, Position-Specific Scoring Matrices, Sequence Alignment
ABSTRACT
Background: Huanglongbing (HLB, yellow shoot disease) is a highly destructive citrus disease associated with a nonculturable bacterium, "Candidatus Liberibacter asiaticus" (CLas), which is transmitted by the Asian citrus psyllid (ACP, Diaphorina citri). In Mexico, HLB was first reported in Tizimin, Yucatán, in 2009 and is now endemic in 351 municipalities of 25 states. Understanding the population diversity of CLas is critical for HLB management. Current CLas diversity research is exclusively based on analysis of the bacterial genome, which comprises two regions: chromosome (> 1,000 genes) and prophage (about 40 genes). Methods and results: In this study, 40 CLas-infected ACP samples from 20 states in Mexico were collected. CLas was detected and confirmed by PCR assays. A prophage gene (terL)-based typing system (TTS) divided the Mexican CLas strains into two groups: Term-G, which included four strains from Yucatán and Chiapas, as well as strain psy62 from Florida, USA, and Term-A, which included all other 36 Mexican strains, as well as strain AHCA1 from California, USA. CLas diversity was further evaluated to include all chromosomal and prophage genes, assisted by machine learning (ML) tools to resolve multidimensional data handling issues. A Term-G strain (YTMX) and a Term-A strain (BCSMX) were sequenced and analyzed. The two Mexican genome sequences, along with the CLas genome sequences available in GenBank, were studied. Unsupervised ML was implemented through principal component analysis (PCA) on average nucleotide identities (ANIs) of CLas whole genome sequences, and supervised ML was implemented through sparse partial least squares discriminant analysis (sPLS-DA) on single nucleotide polymorphisms (SNPs) of coding genes of CLas guided by the TTS. Two CLas Geno-groups, Geno-group 1 that extended Term-A and Geno-group 2 that extended Term-G, were established.
Conclusions: This study concluded that: 1) there were at least two different introductions of CLas into Mexico; 2) CLas strains from Mexico and the USA are closely related; and 3) the two Geno-groups provide the basis for future CLas subspecies research.
ABSTRACT
While phylogenetic trees and associated data have been getting easier to generate, it has been difficult to reuse, combine, and synthesize the information they provide, because published trees are often only available as image files and associated data are often stored in incompatible formats. To increase the reproducibility and reusability of phylogenetic data, the ggtree object was designed for storing a phylogenetic tree and its associated data, as well as visualization directives. The ggtree object itself is a graphic object and can be rendered as a static image. More importantly, the input tree and associated data that are used in visualization can be extracted from the graphic object, making it an ideal data structure for publishing trees (image, tree, and data in one single object) and thus enhancing data reuse and analytical reproducibility, as well as facilitating integrative and comparative studies. The ggtree package is freely available at https://www.bioconductor.org/packages/ggtree.
ABSTRACT
Functional enrichment analysis is pivotal for interpreting high-throughput omics data in life science. It is crucial for this type of tool to use the latest annotation databases for as many organisms as possible. To meet these requirements, we present here an updated version of our popular Bioconductor package, clusterProfiler 4.0. This package has been enhanced considerably compared with its original version published 9 years ago. The new version provides a universal interface for functional enrichment analysis in thousands of organisms based on internally supported ontologies and pathways as well as annotation data provided by users or derived from online databases. It also extends the dplyr and ggplot2 packages to offer tidy interfaces for data operation and visualization. Other new features include gene set enrichment analysis and comparison of enrichment results from multiple gene lists. We anticipate that clusterProfiler 4.0 will be applied to a wide range of scenarios across diverse organisms.
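Over-representation analysis, one common form of functional enrichment analysis, is usually framed as a one-sided hypergeometric test. The following is a minimal sketch of that test only; it is not clusterProfiler's interface, and the function name and the example counts are hypothetical.

```python
from math import comb


def enrichment_p_value(k, n, K, N):
    """Upper-tail hypergeometric probability P(X >= k).

    N: genes in the background
    K: background genes annotated to the term of interest
    n: genes in the user's list
    k: genes in the user's list annotated to the term
    """
    upper = min(n, K)
    total = comb(N, n)
    # Sum the probability of observing k, k+1, ..., upper annotated genes.
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / total


# Hypothetical counts: 20 background genes, 5 annotated to the term,
# a list of 4 genes of which 2 carry the annotation.
print(enrichment_p_value(k=2, n=4, K=5, N=20))
```

In practice such p-values are computed for many terms at once and then adjusted for multiple testing.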
ABSTRACT
Citrus Huanglongbing (HLB; yellow shoot disease) is associated with an unculturable α-proteobacterium, "Candidatus Liberibacter asiaticus" (CLas). HLB was found in southern California in 2012, and the current management strategy is based on suppression of the Asian citrus psyllid (Diaphorina citri) that transmits CLas and removal of confirmed CLas-positive trees. Little is known about Asian citrus psyllid-associated bacteria and citrus-associated bacteria in the HLB system. Such information is important in HLB management, particularly for accurate detection of CLas. Recent advancements in next-generation sequencing technology provide new opportunities to study HLB through genomic DNA sequence analyses (metagenomics). In this study, HLB-related bacteria in Asian citrus psyllid and citrus (represented by leaf midrib tissues) samples from southern California were analyzed. A metagenomic pipeline was developed to serve as a prototype for future bacteriomic research. This pipeline included next-generation sequencing on the Illumina platform, de novo assembly of Illumina reads, sequence classification using the Kaiju tool, acquisition of bacterial draft genome sequences, and taxonomic validation and diversity evaluation using average nucleotide identity. The bacteria identified in Asian citrus psyllids and citrus together included Bradyrhizobium, Buchnera, Burkholderia, "Candidatus Profftella armatura," "Candidatus Carsonella ruddii," CLas, Mesorhizobium, Paraburkholderia, Pseudomonas, and Wolbachia. The whole genome of a CLas strain recently found in San Bernardino County was sequenced and classified into prophage typing group 1 (PTG-1), one of the five known CLas groups in California. Based on sequence similarity, Bradyrhizobium and Mesorhizobium were identified as possible sources of interference with CLas detection using the 16S rRNA gene-based PCR commonly used for HLB diagnosis, particularly when the CLas titer is low or zero.
ABSTRACT
We present the ggtreeExtra package for visualizing heterogeneous data with a phylogenetic tree in a circular or rectangular layout (https://www.bioconductor.org/packages/ggtreeExtra). The package supports more data types and visualization methods than other tools. It supports using the grammar of graphics syntax to present data on a tree with richly annotated layers and allows evolutionary statistics inferred by commonly used software to be integrated and visualized with external data. ggtreeExtra is a universal tool for tree data visualization. It extends the applications of the phylogenetic tree in different disciplines by making more domain-specific data available for visualization and interpretation in an evolutionary context.
Subject(s)
Phylogeny, Software
ABSTRACT
Phylogenetic trees and data are often stored in incompatible and inconsistent formats. The outputs of software tools that contain trees with analysis findings are often not compatible with each other, making it hard to integrate the results of different analyses in a comparative study. The treeio package is designed to connect phylogenetic tree input and output. It supports extracting phylogenetic trees as well as the outputs of commonly used analytical software. It can link external data to phylogenies and merge tree data obtained from different sources, enabling analyses of phylogeny-associated data from different disciplines in an evolutionary context. Treeio also supports export of a phylogenetic tree with heterogeneous-associated data to a single tree file, including BEAST compatible NEXUS and jtree formats; these facilitate data sharing as well as file format conversion for downstream analysis. The treeio package is designed to work with the tidytree and ggtree packages. Tree data can be processed using the tidy interface with tidytree and visualized by ggtree. The treeio package is released within the Bioconductor and rOpenSci projects. It is available at https://www.bioconductor.org/packages/treeio/.
Subject(s)
Computational Biology/methods, Data Mining/methods, Internet, Phylogeny, Software
ABSTRACT
The genome of "Candidatus Sulcia muelleri" strain KPTW1 from Kolla paulula, a vector of Xylella fastidiosa that causes Pierce's disease (PD) of grapevine in Taiwan, was sequenced. The strain has a genome size of 253,942 bp, GC content of 22.7%, 237 predicted protein-coding genes, and 34 RNA genes.
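GC content, one of the summary statistics reported above, is the percentage of G and C bases among the unambiguous bases of a sequence. A minimal sketch follows; the function name and the toy sequence are hypothetical and are not taken from the KPTW1 genome.

```python
def gc_content(sequence):
    """Return GC content as a percentage of the A/C/G/T bases in `sequence`.

    Ambiguous symbols such as N are ignored rather than counted.
    """
    counted = [base for base in sequence.upper() if base in "ACGT"]
    if not counted:
        return 0.0
    gc = sum(1 for base in counted if base in "GC")
    return 100.0 * gc / len(counted)


# Toy sequence for illustration only.
print(gc_content("ATGCATATTA"))
```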
ABSTRACT
Plasmids are important genetic elements contributing to bacterial evolution and environmental adaptation. Xylella fastidiosa is a nutritionally fastidious Gram-negative bacterium causing economically devastating diseases such as Pierce's disease (PD) of grapevine. In this study, the plasmid status of a highly virulent PD strain, Stag's Leap, originally isolated from Napa Valley, CA, was studied using sequencing and bioinformatics tools. DNA samples extracted from a pure culture in periwinkle wilt medium (in vitro DNA) and a PD-symptomatic grapevine artificially inoculated in the greenhouse (in planta DNA) were subjected to next-generation sequencing (NGS) analyses (Illumina MiSeq or HiSeq). Sequence analyses and polymerase chain reaction experiments revealed the presence of a circular plasmid, pXFSL21, of 21,665 bp. This plasmid existed as a single copy per bacterial genome under both in vitro and in planta conditions. Two toxin-antitoxin (T-A) systems (ydcD-ydcE and higB-higA) were detected in pXFSL21, a possible mechanism for the long-term survival of this single-copy plasmid in the bacterial population. BLAST searches against the GenBank database (version 222) detected homologs of the two T-A systems in chromosomes or plasmids of some X. fastidiosa strains. However, double T-A systems were found only in pXFSL21. pXFSL21 was not found in other known PD strains and, therefore, could serve as a molecular marker for monitoring and tracking strain Stag's Leap. The NGS-based technique outlined in this article provides an effective tool for identifying single- or low-copy-number plasmids in fastidious prokaryotes.
Subject(s)
Plant Diseases/microbiology, Plasmids/genetics, Toxin-Antitoxin Systems, Vitis, Xylella, High-Throughput Nucleotide Sequencing, Plasmids/chemistry
ABSTRACT
A range of leaf symptoms, including blotchy mottle, yellowing, and small, upright leaves with a variety of chlorotic patterns resembling those induced by zinc deficiencies, are associated with huanglongbing (HLB, yellow shoot disease), a destructive citrus disease found worldwide. HLB is presumably caused by the phloem-limited fastidious prokaryotic α-proteobacterium 'Candidatus Liberibacter spp.' Previous studies focused on the proteome and transcriptome analyses of citrus 5 to 35 weeks after 'Ca. L. spp.' inoculation. In this study, gene expression profiles were analyzed from leaves of mandarin Citrus reticulata Blanco cv. jiaogan after a 2-year infection with 'Ca. L. asiaticus'. The Affymetrix microarray analysis identified 2,017 differentially expressed genes. Of the 1,364 genes with known functions, 938 (46.5%) were up-regulated. Genes related to photosynthesis, carbohydrate metabolism, and structure were mostly down-regulated, with rates of 92.7%, 61.0%, and 80.2%, respectively. Genes associated with oxidation-reduction and transport were mostly up-regulated, with rates of 75.0% and 64.6%, respectively. Our data analyses implied that infection with 'Ca. L. asiaticus' could alter hormone crosstalk, inducing the jasmonic acid pathway and depressing the ethylene and salicylic acid pathways in the citrus host. This study provides enhanced insight into the host response of citrus to 'Ca. L. asiaticus' infection at a two-year infection stage.
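A quick arithmetic check of the up-regulated proportion, using only the counts given in the abstract. The abstract does not say which denominator the reported percentage refers to, so both candidates are computed; the helper name is hypothetical.

```python
def percent(part, whole):
    """Return `part` as a percentage of `whole`."""
    return 100.0 * part / whole


total_deg = 2017        # differentially expressed genes
known_function = 1364   # genes with known functions
up_regulated = 938      # up-regulated genes

# Share of up-regulated genes among all differentially expressed genes.
print(round(percent(up_regulated, total_deg), 1))
# Share of up-regulated genes among genes with known functions.
print(round(percent(up_regulated, known_function), 1))
```

The first value rounds to 46.5 and the second to 68.8, so the reported 46.5% matches the denominator of 2,017 differentially expressed genes.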