ABSTRACT
Massive metagenomic sequencing combined with gene prediction methods has previously been used to compile gene catalogues of ocean and host-associated microbes. Global expeditions conducted over the past 15 years have sampled the ocean to build a catalogue of genes from pelagic microbes. Here we undertook a large sequencing effort on a perturbed Red Sea plankton community and found that the rate of gene discovery increases continuously with sequencing effort, with no indication that the 2.83 million non-redundant (complete) genes predicted from the experiment represented a nearly complete inventory of the genes present in the sampled community (i.e., no evidence of saturation). The underlying reason is the Pareto-like distribution of gene abundances in the plankton community, which results in a very long tail of millions of genes present at remarkably low abundances that can only be retrieved through massive sequencing. Microbial metagenomic projects retrieve a variable number of unique genes per terabase pair (Tbp) sequenced, with a median of 14.7 million unique genes per Tbp across projects. The increase in the rate of gene discovery in microbial metagenomes with sequencing effort implies that there is ample room for new gene discovery in further ocean and holobiont sequencing studies.
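The saturation argument above can be illustrated with a small simulation. This is a minimal sketch, not the authors' pipeline: it assumes a hypothetical community whose gene abundances follow a Pareto distribution, treats each read as a single draw of one gene, and counts how many distinct genes have been seen at several sequencing depths. The function name `gene_accumulation` and all parameter values are illustrative assumptions.

```python
import random


def gene_accumulation(n_genes, alpha, depths, seed=0):
    """Simulate how many distinct genes are observed as sequencing depth grows.

    Assumption: gene abundances follow a Pareto distribution (shape `alpha`),
    so a few genes are very common and most sit in a long, rare tail.
    Each read is modelled as one draw of a gene, proportional to abundance.
    """
    rng = random.Random(seed)
    # Heavy-tailed (Pareto) abundance weight for every gene in the community.
    weights = [rng.paretovariate(alpha) for _ in range(n_genes)]
    targets = sorted(set(depths))
    # Draw all reads at once; each element is the gene a read came from.
    reads = rng.choices(range(n_genes), weights=weights, k=targets[-1])

    seen = set()
    curve = []
    next_idx = 0
    for i, gene in enumerate(reads, start=1):
        seen.add(gene)
        if i == targets[next_idx]:
            # Record (depth, number of distinct genes seen so far).
            curve.append((i, len(seen)))
            next_idx += 1
            if next_idx == len(targets):
                break
    return curve


if __name__ == "__main__":
    for depth, found in gene_accumulation(100_000, 1.0, [1_000, 10_000, 100_000]):
        print(f"{depth:>7} reads -> {found:>6} distinct genes")
```

With a heavy tail (small `alpha`), each tenfold increase in depth keeps adding many previously unseen genes, which mirrors the lack of saturation described in the abstract.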
Subjects
Aquatic Organisms/genetics, Genome, Bacterial/genetics, Metagenome/genetics, Plankton/genetics, Alphaproteobacteria/genetics, Aquatic Organisms/microbiology, Diatoms/genetics, Flavobacteriaceae/genetics, Gammaproteobacteria/genetics, Genetic Association Studies, High-Throughput Nucleotide Sequencing, Indian Ocean, Metagenomics/methods, Plankton/microbiology, Water Microbiology
ABSTRACT
The OM43 clade within the family Methylophilaceae of Betaproteobacteria represents a group of methylotrophs that play important roles in the metabolism of C1 compounds in marine and other aquatic environments around the globe. Using dilution-to-extinction cultivation techniques, we isolated a novel species of this clade (here designated MBRS-H7) from the ultraoligotrophic open-ocean waters of the central Red Sea. Phylogenomic analyses indicate that MBRS-H7 is a novel species that forms a distinct cluster together with isolate KB13 from Hawaii (Hawaii-Red Sea [H-RS] cluster), separate from the cluster represented by strain HTCC2181 (from the Oregon coast). Phylogenetic analyses using the robust 16S-23S internal transcribed spacer revealed a potential ecotype separation of the marine OM43 clade members, which was further confirmed by metagenomic fragment recruitment analyses: the H-RS cluster showed trends of higher abundance in low-chlorophyll and/or high-temperature provinces, whereas the HTCC2181 cluster preferred colder, highly productive waters. This potential environmentally driven niche differentiation is also reflected in the metabolic gene inventories, which in the case of the H-RS cluster include genes conferring resistance to high levels of UV irradiation, temperature, and salinity. Interestingly, we also found different energy conservation modules in these OM43 subclades, namely, the NADH:quinone oxidoreductase complex I (NUO) system in the H-RS cluster and the nonhomologous NADH:quinone oxidoreductase (NQR) system in the HTCC2181 cluster, which might have implications for their overall energetic yields.
Subjects
Ecotype, Methylophilaceae/classification, Methylophilaceae/genetics, Phylogeny, Seawater/microbiology, Cluster Analysis, DNA, Bacterial/chemistry, DNA, Bacterial/genetics, DNA, Ribosomal Spacer/chemistry, DNA, Ribosomal Spacer/genetics, Genomics, Indian Ocean, Molecular Sequence Data, Sequence Analysis, DNA
ABSTRACT
SUMMARY: In higher eukaryotes, the identification of translation initiation sites (TISs) has focused on finding these signals in cDNA or mRNA sequences. Using Arabidopsis thaliana (A.t.) data, we developed a tool that predicts signals within plant genomic sequences that correspond to TISs. Our tool requires only the genome sequence, not expressed sequences. Its sensitivity/specificity is 90.75%/92.2% for A.t., 66.8%/94.4% for Vitis vinifera and 81.6%/94.4% for Populus trichocarpa, which suggests that the tool can be used to annotate different plant genomes. We provide a list of the features used in our model; further study of these features may improve our understanding of the mechanisms of translation initiation. AVAILABILITY AND IMPLEMENTATION: Our tool is implemented as an artificial neural network. It is available as a web-based tool and, together with the source code, the list of features, and the data used for model development, is accessible at http://cbrc.kaust.edu.sa/dts.
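For readers less familiar with the two reported metrics, the sketch below computes them from confusion-matrix counts using the standard definitions (sensitivity = TP/(TP+FN), specificity = TN/(TN+FP)). The counts in the example are hypothetical, chosen only to reproduce the A.t. percentages; the abstract does not report the underlying counts, and the original work may define specificity differently.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-matrix counts.

    sensitivity = TP / (TP + FN): fraction of true TIS signals recovered.
    specificity = TN / (TN + FP): fraction of non-TIS signals correctly rejected.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity


if __name__ == "__main__":
    # Hypothetical counts: 1,000 true TISs and 1,000 non-TIS candidate signals.
    sn, sp = sensitivity_specificity(tp=9075, fn=925, tn=922, fp=78)
    print(f"sensitivity={sn:.2%} specificity={sp:.2%}")
```

Reporting both numbers matters because a predictor can trivially maximize one at the expense of the other.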
Subjects
Arabidopsis/genetics, Peptide Chain Initiation, Translational, Software, Genome, Plant, Genomics, Internet, Neural Networks, Computer, Nucleotide Motifs, Sensitivity and Specificity, Sequence Analysis, DNA
ABSTRACT
MOTIVATION: Recognition of poly(A) signals in mRNA is relatively straightforward owing to the presence of an easily recognizable polyadenylic acid tail. However, identifying the poly(A) motifs in the primary genomic DNA sequence that correspond to poly(A) signals in mRNA is a far more challenging problem. Recognition of poly(A) signals is important for better gene annotation and for understanding gene regulation mechanisms. In this work, we present a poly(A) motif prediction method based on properties of the human genomic DNA sequence surrounding a poly(A) motif. These properties include thermodynamic, physico-chemical and statistical characteristics. For prediction, we developed artificial neural network and random forest models, trained to recognize the 12 most common poly(A) motifs in human DNA. Our predictors are available as a free web-based tool at http://cbrc.kaust.edu.sa/dps. Compared with other reported predictors, our models achieve higher sensitivity and specificity and, furthermore, provide a consistent level of accuracy across all 12 poly(A) motif variants. CONTACT: vladimir.bajic@kaust.edu.sa SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
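The first step such a predictor implies, locating candidate poly(A) motifs in genomic DNA before scoring their surroundings, can be sketched as follows. This is a hypothetical illustration: it searches only the plus strand for the two canonical hexamers AATAAA and ATTAAA (the abstract does not list its 12 variants) and returns a flanking window from which thermodynamic, physico-chemical or statistical features could then be computed. The window size `flank` is an arbitrary assumption.

```python
import re

# Assumption: only the two canonical hexamers are listed here; the tool
# described above covers 12 motif variants that the abstract does not enumerate.
CANONICAL_MOTIFS = ("AATAAA", "ATTAAA")


def candidate_sites(seq, motifs=CANONICAL_MOTIFS, flank=100):
    """Yield (position, motif, window) for each motif occurrence on the + strand.

    `window` is the motif plus up to `flank` bases on each side, i.e. the
    region from which a classifier could extract its features.
    """
    seq = seq.upper()
    for motif in motifs:
        # The zero-width lookahead also reports overlapping matches.
        for m in re.finditer(f"(?=({motif}))", seq):
            start = m.start()
            lo = max(0, start - flank)
            hi = min(len(seq), start + len(motif) + flank)
            yield start, motif, seq[lo:hi]


if __name__ == "__main__":
    for pos, motif, window in candidate_sites("ccgAATAAAtaaagattaaaGC", flank=3):
        print(pos, motif, window)
```

Each yielded window would then be passed to the trained model, which decides whether the motif occurrence is a true poly(A) signal.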
Subjects
Algorithms, Neural Networks, Computer, Poly A/analysis, Genome, Human, Humans, Internet, Poly A/genetics, Sensitivity and Specificity, Software
ABSTRACT
The exponential rise of metagenomic sequencing is delivering massive amounts of functional environmental genomics data. However, it also creates a procedural bottleneck for ongoing re-analysis: as reference databases grow and methods improve, analyses need to be updated for consistency, which requires access to increasingly demanding bioinformatic and computational resources. Here, we present the KAUST Metagenomic Analysis Platform (KMAP), a new integrated, open, web-based tool for the comprehensive exploration of shotgun metagenomic data. We illustrate the capabilities of KMAP through the re-assembly of ~27,000 public metagenomic samples captured in ~450 studies sampled across ~77 diverse habitats. In this pilot study, a small subset of these metagenomic assemblies is grouped into 36 new habitat-specific gene catalogs, all based on full-length (complete) genes. Extensive taxonomic and gene annotations are stored in Gene Information Tables (GITs), a simple, tractable data integration format useful for command-line analysis or database management. The KMAP pilot study enables the exploration and comparison of microbial GITs across different habitats, covering over 275 million genes. KMAP data and analyses are available at https://www.cbrc.kaust.edu.sa/aamg/kmap.start.
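Because GITs are described as a simple tabular format usable from the command line, a typical query is a per-habitat gene count. The sketch below assumes a hypothetical tab-separated layout with a header row containing `gene_id`, `habitat` and `annotation` columns; the actual GIT schema is not given in this abstract and may differ.

```python
import csv
import io
from collections import Counter


def genes_per_habitat(handle):
    """Count genes per habitat in a tab-separated gene table.

    Assumption: the table has a header row with a 'habitat' column
    (hypothetical schema, used only for illustration).
    """
    reader = csv.DictReader(handle, delimiter="\t")
    return Counter(row["habitat"] for row in reader)


if __name__ == "__main__":
    # Hypothetical rows standing in for a real GIT file.
    example = io.StringIO(
        "gene_id\thabitat\tannotation\n"
        "g1\tmarine\tnitrogenase\n"
        "g2\tsoil\tcatalase\n"
        "g3\tmarine\tproteorhodopsin\n"
    )
    print(genes_per_habitat(example).most_common())
```

Keeping one gene per row with plain columns is what makes this kind of aggregation possible with standard tools, whether in Python or in a shell pipeline.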
Subjects
Computational Biology, Metagenome, Metagenomics, Molecular Sequence Annotation, Software
ABSTRACT
The deep sea, the largest compartment of the ocean, drives planetary-scale biogeochemical cycling. Yet the functional exploration of its microbial communities lags far behind that of other environments. Here we analyze 58 metagenomes from tropical and subtropical deep oceans to generate the Malaspina Gene Database. Free-living or particle-attached lifestyles drive functional differences in bathypelagic prokaryotic communities, regardless of their biogeography. Ammonia and CO oxidation pathways are enriched in the free-living microbial communities, and dissimilatory nitrate reduction to ammonium and H2 oxidation pathways in the particle-attached communities, while the Calvin-Benson-Bassham cycle is the most prevalent inorganic carbon fixation pathway in both size fractions. Reconstruction of the Malaspina Deep Metagenome-Assembled Genomes reveals unique non-cyanobacterial diazotrophic bacteria and chemolithoautotrophic prokaryotes. The widespread potential to grow both autotrophically and heterotrophically suggests that mixotrophy is an ecologically relevant trait in the deep ocean. These results expand our understanding of the functional microbial structure and metabolic capabilities of the largest aquatic ecosystem on Earth.
Subjects
Bacteria/genetics, Bacteria/metabolism, Carbon Cycle, DNA, Bacterial/genetics, Metagenome, Photosynthesis, Seawater/microbiology, Bacteria/classification, Bacteria/isolation & purification, DNA, Bacterial/analysis
ABSTRACT
The spread of the novel coronavirus (SARS-CoV-2) has triggered a global emergency that demands urgent solutions for detection and therapy to prevent escalating health, social, and economic impacts. The spike protein (S) of this virus enables binding to the human receptor ACE2 and hence presents a prime target for vaccines preventing viral entry into host cells. The S proteins of SARS and SARS-CoV-2 are similar, but structural differences in the receptor binding domain (RBD) preclude the use of SARS-specific neutralizing antibodies to inhibit SARS-CoV-2. Here we used comparative pangenomic analysis of all sequenced reference Betacoronaviruses, complemented with functional and structural analyses. This analysis reveals that, among all core gene clusters present in these viruses, the envelope protein E shows a variant cluster shared by SARS and SARS-CoV-2 with two completely conserved key functional features, namely an ion channel and a PDZ-binding motif (PBM). These features play a key role in the activation of the inflammasome, which causes acute respiratory distress syndrome, the leading cause of death in SARS and SARS-CoV-2 infections. Based on the functional pangenomic analysis, mutation tracking, and previous evidence of the E protein as a determinant of pathogenicity in SARS, we suggest the E protein as an alternative therapeutic target to be considered in further studies aimed at reducing complications of SARS-CoV-2 infection in COVID-19.
Subjects
Betacoronavirus/chemistry, Viral Envelope Proteins/chemistry, Viral Envelope Proteins/genetics, COVID-19, Coronavirus Envelope Proteins, Coronavirus Infections/virology, Genes, Essential, Genes, Viral, Genome, Viral, Humans, Middle East Respiratory Syndrome Coronavirus/chemistry, Middle East Respiratory Syndrome Coronavirus/genetics, Mutation, Open Reading Frames, PDZ Domains, Pandemics, Pneumonia, Viral/virology, Protein Domains, Severe acute respiratory syndrome-related coronavirus/chemistry, SARS-CoV-2, Viroporin Proteins
ABSTRACT
BACKGROUND: Inborn errors of metabolism (IEM) represent a subclass of rare inherited diseases caused by a wide range of defects in metabolic enzymes or their regulation. Of over a thousand characterized IEMs, only about half are understood at the molecular level, and overall the development of treatment and management strategies has proved challenging. An overview of the changing landscape of therapeutic approaches is helpful in assessing strategic patterns in the approach to therapy, but the information is scattered throughout the literature and public data resources. RESULTS: We gathered data on therapeutic strategies for 300 diseases into the Drug Database for Inborn Errors of Metabolism (DDIEM). Therapeutic approaches, including both successful and ineffective treatments, were manually classified by their mechanisms of action using a new ontology. CONCLUSIONS: We present a manually curated, ontologically formalized knowledgebase of drugs, therapeutic procedures, and mitigated phenotypes. DDIEM is freely available through a web interface and for download at http://ddiem.phenomebrowser.net.
Subjects
Databases, Pharmaceutical, Metabolism, Inborn Errors, Humans, Phenotype, Rare Diseases/drug therapy
ABSTRACT
With antimicrobial resistance on the rise, the discovery of new compounds with novel structural scaffolds exhibiting antimicrobial properties has become an important area of research. Such compounds can serve as starting points for the development of new antimicrobials. In this report, we present the draft genome sequence of Zooshikella ganghwensis strain VG4, isolated from Red Sea sediments, which produces metabolites with antimicrobial properties. Genomic analysis reveals that it carries at least five gene clusters with the potential to direct the biosynthesis of bioactive secondary metabolites such as polyketides and nonribosomal peptides. Using in silico approaches, we predict the structures of these metabolites.
ABSTRACT
Solanum pimpinellifolium, a wild relative of cultivated tomato, offers a wealth of breeding potential for desirable traits such as tolerance to abiotic and biotic stresses. Here, we report the genome assembly and annotation of S. pimpinellifolium 'LA0480.' Moreover, we present phenotypic data from one field experiment that demonstrate greater salinity tolerance for fruit- and yield-related traits in S. pimpinellifolium compared with cultivated tomato. The 'LA0480' genome assembly size (811 Mb) and the number of annotated genes (25,970) are within the range observed for other sequenced tomato species. We developed and used the Dragon Eukaryotic Analyses Platform (DEAP) to functionally annotate the 'LA0480' protein-coding genes. Additionally, we used DEAP to compare protein function between S. pimpinellifolium and cultivated tomato. Our data suggest an enrichment of genes involved in biotic and abiotic stress responses in S. pimpinellifolium. To understand the genomic basis for these differences between S. pimpinellifolium and S. lycopersicum, we analyzed 15 genes that have previously been shown to mediate salinity tolerance in plants. We show that S. pimpinellifolium has a higher copy number of the inositol-3-phosphate synthase and phosphatase genes, both of which encode key enzymes in the production of inositol and its derivatives. Moreover, our analysis indicates that changes in the inositol phosphate pathway may contribute to the observed higher salinity tolerance of 'LA0480.' Altogether, our work provides essential resources to understand and unlock the genetic and breeding potential of S. pimpinellifolium and to discover the genomic basis underlying its environmental robustness.
ABSTRACT
The candidate division MSBL1 (Mediterranean Sea Brine Lakes 1) comprises a monophyletic group of uncultured archaea found in different hypersaline environments. Previous studies proposed methanogenesis as their main metabolism. Here, we describe a metabolic reconstruction of MSBL1 based on 32 single-cell amplified genomes from brine pools of the Red Sea (Atlantis II, Discovery, Nereus, Erba and Kebrit). Phylogeny based on rRNA genes as well as conserved single-copy genes delineates the group as a putative novel lineage of archaea. Our analysis shows that MSBL1 may ferment glucose via the Embden-Meyerhof-Parnas pathway. However, in the absence of organic carbon, carbon dioxide may be fixed via ribulose bisphosphate carboxylase, the Wood-Ljungdahl pathway or the reductive TCA cycle. Therefore, based on the occurrence of genes for glycolysis, the absence of the core genes found in the genomes of all sequenced methanogens, and the phylogenetic position, we hypothesize that MSBL1 are not methanogens but probably sugar-fermenting organisms capable of autotrophic growth. Such a mixotrophic lifestyle would confer a survival advantage (or possibly provide a unique narrow niche) when glucose and other fermentable sugars are not available.
Subjects
Archaea/genetics, Archaea/metabolism, Energy Metabolism, Quantitative Trait, Heritable, Salts, Archaea/classification, Biological Transport, Carbohydrate Metabolism, Genome, Archaeal, Genomics/methods, Gluconeogenesis, Glycolysis, Indian Ocean, Phylogeny, RNA, Ribosomal, 16S/genetics, Sequence Analysis, DNA, Stress, Physiological
ABSTRACT
Using dilution-to-extinction cultivation, we isolated a strain affiliated with the PS1 clade from surface waters of the Red Sea. Strain RS24 is the second isolate of this group of marine Alphaproteobacteria, after IMCC14465, which was isolated from the East (Japan) Sea. The PS1 clade is a sister group to the OCS116 clade; together they form a putatively novel order closely related to Rhizobiales. Although most genomic features and most of the genetic content are conserved between RS24 and IMCC14465, their average nucleotide identity (ANI) is < 81%, suggesting two distinct species of the PS1 clade. Besides encoding two different proteorhodopsin variants, the two strains harbor several unique genomic islands, containing genes related to the degradation of aromatic compounds in IMCC14465 and to polymer degradation in RS24, possibly reflecting physicochemical differences between the environments from which they were isolated. Fragment recruitment analyses using different metagenomic datasets revealed no clear differences in the abundance of the genomic content of either strain; both genomes were detectable, albeit as a minor part of the communities. The comparative genomic analysis of both PS1 clade isolates and the fragment recruitment analysis provide first insights into the ecology of this group.
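The < 81% ANI value is easiest to interpret against the commonly used species boundary of roughly 95-96% ANI. The sketch below follows a common BLAST-based (ANIb-style) recipe: one genome is cut into ~1,020 bp fragments, each fragment's best hit in the other genome is kept if it has more than 30% identity over more than 70% of the fragment length, and ANI is the mean identity of the kept hits. The hit list is hypothetical input, the 95% cutoff is a convention rather than a value from this abstract, and the tool actually used for RS24 and IMCC14465 is not stated here.

```python
def ani_from_hits(hits, fragment_len=1020, min_identity=30.0, min_coverage=0.7):
    """Average nucleotide identity from per-fragment best hits (ANIb-style).

    hits: iterable of (percent_identity, alignment_length), one best hit
    per query fragment. Hits below the identity or coverage cutoffs are
    discarded; returns None if nothing passes.
    """
    kept = [
        pid
        for pid, aln_len in hits
        if pid > min_identity and aln_len > min_coverage * fragment_len
    ]
    if not kept:
        return None
    return sum(kept) / len(kept)


# Assumption: conventional species-delineation cutoff, not from the source.
SPECIES_CUTOFF = 95.0

if __name__ == "__main__":
    # Hypothetical best hits: (identity %, aligned length in bp).
    hits = [(80.0, 1000), (82.0, 900), (25.0, 1020), (99.0, 100)]
    ani = ani_from_hits(hits)
    verdict = "same species" if ani >= SPECIES_CUTOFF else "distinct species"
    print(f"ANI = {ani:.1f}% -> {verdict}")
```

The coverage filter is what prevents short, highly conserved alignments (such as the 99% hit over 100 bp above) from inflating the estimate.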
Subjects
Alphaproteobacteria/genetics, Genome, Bacterial, Seawater/microbiology, Water Microbiology, Alphaproteobacteria/isolation & purification, Ecosystem, Indian Ocean, Molecular Sequence Annotation, Phylogeny, Sequence Analysis, DNA
ABSTRACT
BACKGROUND: Next-generation sequencing technologies have substantially increased the throughput of microbial genome sequencing. To functionally annotate newly sequenced microbial genomes, a variety of experimental and computational methods are used. Integrating information from different sources is a powerful approach to enhance such annotation. Functional analysis of microbial genomes, necessary for downstream experiments, crucially depends on this annotation, but it is hampered by the current lack of suitable information integration and exploration systems for microbial genomes. RESULTS: We developed a data warehouse system (INDIGO) that enables the integration of annotations for the exploration and analysis of newly sequenced microbial genomes. INDIGO makes it possible to construct complex queries and to combine annotations from multiple sources, from the genomic sequence to the protein domain, gene ontology and pathway levels. This data warehouse is intended to be populated with information from genomes of pure cultures and uncultured single cells of Red Sea bacteria and archaea. Currently, INDIGO contains information from Salinisphaera shabanensis, Haloplasma contractile, and Halorhabdus tiamatea, extremophiles isolated from deep-sea anoxic brine lakes of the Red Sea. We provide examples of using the system to gain new insights into specific aspects of the unique lifestyle and adaptations of these organisms to extreme environments. CONCLUSIONS: We developed a data warehouse system, INDIGO, which enables comprehensive integration of information from various resources for the annotation, exploration and analysis of microbial genomes. It will be regularly updated and extended with new genomes, and it is intended to serve as a resource dedicated to Red Sea microbes. In addition, through INDIGO, we provide our Automatic Annotation of Microbial Genomes (AAMG) pipeline. The INDIGO web server is freely available at http://www.cbrc.kaust.edu.sa/indigo.
Subjects
Archaea/genetics, Bacteria/genetics, Databases, Genetic, Genome, Microbial/genetics, Benzoates/metabolism, Biodegradation, Environmental, Genome, Bacterial, Indian Ocean, Molecular Sequence Annotation, Search Engine, Software, User-Computer Interface
ABSTRACT
The Dragon Exploration System for Toxicants and Fertility (DESTAF) is a publicly available resource that enables researchers to efficiently explore both known and potentially novel information and associations in the field of reproductive toxicology. To create DESTAF, we used data from the literature (including over 10,500 PubMed abstracts), several publicly available biomedical repositories, and specialized, curated dictionaries. DESTAF has an interface designed to facilitate rapid assessment of the key associations between relevant concepts, allowing a more in-depth exploration of information from gene/protein-, enzyme/metabolite-, toxin/chemical-, disease- or anatomy-centric perspectives. As a special feature, DESTAF allows the creation and initial testing of potentially new association hypotheses that suggest links between biological entities identified through the database. DESTAF, along with a PDF manual, can be found at http://cbrc.kaust.edu.sa/destaf. It is free to academic and non-commercial users and will be updated quarterly.