Search | VHL Regional Portal

1.

Variability and bias in microbiome metagenomic sequencing: an interlaboratory study comparing experimental protocols.

Forry, Samuel P; Servetas, Stephanie L; Kralj, Jason G; Soh, Keng; Hadjithomas, Michalis; Cano, Raul; Carlin, Martha; Amorim, Maria G de; Auch, Benjamin; Bakker, Matthew G; Bartelli, Thais F; Bustamante, Juan P; Cassol, Ignacio; Chalita, Mauricio; Dias-Neto, Emmanuel; Duca, Aaron Del; Gohl, Daryl M; Kazantseva, Jekaterina; Haruna, Muyideen T; Menzel, Peter; Moda, Bruno S; Neuberger-Castillo, Lorieza; Nunes, Diana N; Patel, Isha R; Peralta, Rodrigo D; Saliou, Adrien; Schwarzer, Rolf; Sevilla, Samantha; Takenaka, Isabella K T M; Wang, Jeremy R; Knight, Rob; Gevers, Dirk; Jackson, Scott A.

Sci Rep ; 14(1): 9785, 2024 04 29.

Article in English | MEDLINE | ID: mdl-38684791

ABSTRACT

Several studies have documented the significant impact of methodological choices in microbiome analyses. The myriad of methodological options available complicate the replication of results and generally limit the comparability of findings between independent studies that use differing techniques and measurement pipelines. Here we describe the Mosaic Standards Challenge (MSC), an international interlaboratory study designed to assess the impact of methodological variables on the results. The MSC did not prescribe methods but rather asked participating labs to analyze 7 shared reference samples (5 × human stool samples and 2 × mock communities) using their standard laboratory methods. To capture the array of methodological variables, each participating lab completed a metadata reporting sheet that included 100 different questions regarding the details of their protocol. The goal of this study was to survey the methodological landscape for microbiome metagenomic sequencing (MGS) analyses and the impact of methodological decisions on metagenomic sequencing results. A total of 44 labs participated in the MSC by submitting results (16S or WGS) along with accompanying metadata; thirty 16S rRNA gene amplicon datasets and 14 WGS datasets were collected. The inclusion of two types of reference materials (human stool and mock communities) enabled analysis of both MGS measurement variability between different protocols using the biologically-relevant stool samples, and MGS bias with respect to ground truth values using the DNA mixtures. Owing to the compositional nature of MGS measurements, analyses were conducted on the ratio of Firmicutes: Bacteroidetes allowing us to directly apply common statistical methods. 
The resulting analysis demonstrated that protocol choices have significant effects, including both bias of the MGS measurement associated with particular methodological choices, as well as effects on measurement robustness as observed through the spread of results between labs making similar methodological choices. In the analysis of the DNA mock communities, MGS measurement bias was observed even when there was general consensus among the participating laboratories. This study was the result of a collaborative effort that included academic, commercial, and government labs. In addition to highlighting the impact of different methodological decisions on MGS result comparability, this work also provides insights for consideration in future microbiome measurement study design.
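The abstract above reports that analyses were run on the Firmicutes:Bacteroidetes ratio because MGS measurements are compositional. The following is a minimal sketch of why a ratio is convenient, not the study's actual pipeline: the read counts, the equal-abundance mock community, and the function names `log_ratio` and `log_ratio_bias` are all assumptions made for illustration.

```python
import math

def log_ratio(counts, numerator="Firmicutes", denominator="Bacteroidetes"):
    """Return log10(numerator / denominator) for one sample's taxon counts."""
    return math.log10(counts[numerator] / counts[denominator])

def log_ratio_bias(measured, expected):
    """Bias of a measurement relative to a known (ground truth) composition."""
    return log_ratio(measured) - log_ratio(expected)

# Hypothetical read counts for one sample at two sequencing depths.
shallow = {"Firmicutes": 600, "Bacteroidetes": 300, "Other": 100}
deep = {taxon: 10 * n for taxon, n in shallow.items()}

# Hypothetical mock community designed with equal amounts of both phyla.
expected = {"Firmicutes": 500, "Bacteroidetes": 500}

# The ratio is unchanged by sequencing depth and by the abundance of other taxa.
same_at_both_depths = math.isclose(log_ratio(shallow), log_ratio(deep))

# Positive bias means Firmicutes are over-represented relative to ground truth.
bias = log_ratio_bias(shallow, expected)
```

Because the ratio cancels the total read count, values from labs with different sequencing depths can be compared directly, and the difference from a known mixture gives a signed bias on a log scale.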

Subject(s)

Feces , Metagenomics , Microbiota , RNA, Ribosomal, 16S , Humans , Metagenomics/methods , Metagenomics/standards , RNA, Ribosomal, 16S/genetics , Feces/microbiology , Microbiota/genetics , Bias , Metagenome , Gastrointestinal Microbiome/genetics , Sequence Analysis, DNA/methods , Bacteria/genetics , Bacteria/classification , Bacteria/isolation & purification , High-Throughput Nucleotide Sequencing/methods

2.

Development of platforms for functional characterization and production of phenazines using a multi-chassis approach via CRAGE.

Ke, Jing; Zhao, Zhiying; Coates, Cameron R; Hadjithomas, Michalis; Kuftin, Andrea; Louie, Katherine; Weller, David; Thomashow, Linda; Mouncey, Nigel J; Northen, Trent R; Yoshikuni, Yasuo.

Metab Eng ; 69: 188-197, 2022 01.

Article in English | MEDLINE | ID: mdl-34890798

ABSTRACT

Phenazines (Phzs), a family of chemicals with a phenazine backbone, are secondary metabolites with diverse properties such as antibacterial, anti-fungal, or anticancer activity. The core derivatives of phenazine, phenazine-1-carboxylic acid (PCA) and phenazine-1,6-dicarboxylic acid (PDC), are themselves precursors for various other derivatives. Recent advances in genome mining tools have enabled researchers to identify many biosynthetic gene clusters (BGCs) that might produce novel Phzs. To characterize the function of these BGCs efficiently, we performed modular construct assembly and subsequent multi-chassis heterologous expression using chassis-independent recombinase-assisted genome engineering (CRAGE). CRAGE allowed rapid integration of a PCA BGC into 23 diverse γ-proteobacteria species and allowed us to identify top PCA producers. We then used the top five chassis hosts to express four partially refactored PDC BGCs. A few of these platforms produced high levels of PDC. Specifically, Xenorhabdus doucetiae and Pseudomonas simiae produced PDC at a titer of 293 mg/L and 373 mg/L, respectively, in minimal media. These titers are significantly higher than those previously reported. Furthermore, selectivity toward PDC production over PCA production was improved by up to 9-fold. The results show that these strains are promising chassis for production of PCA, PDC, and their derivatives, as well as for function characterization of Phz BGCs identified via bioinformatics mining.

Subject(s)

Phenazines , Recombinases , Multigene Family , Phenazines/metabolism , Recombinases/genetics

3.

Expansion of RiPP biosynthetic space through integration of pan-genomics and machine learning uncovers a novel class of lanthipeptides.

Kloosterman, Alexander M; Cimermancic, Peter; Elsayed, Somayah S; Du, Chao; Hadjithomas, Michalis; Donia, Mohamed S; Fischbach, Michael A; van Wezel, Gilles P; Medema, Marnix H.

PLoS Biol ; 18(12): e3001026, 2020 12.

Article in English | MEDLINE | ID: mdl-33351797

ABSTRACT

Microbial natural products constitute a wide variety of chemical compounds, many of which can have antibiotic, antiviral, or anticancer properties that make them interesting for clinical purposes. Natural product classes include polyketides (PKs), nonribosomal peptides (NRPs), and ribosomally synthesized and post-translationally modified peptides (RiPPs). While variants of biosynthetic gene clusters (BGCs) for known classes of natural products are easy to identify in genome sequences, BGCs for new compound classes escape attention. In particular, evidence is accumulating that for RiPPs, subclasses known thus far may only represent the tip of an iceberg. Here, we present decRiPPter (Data-driven Exploratory Class-independent RiPP TrackER), a RiPP genome mining algorithm aimed at the discovery of novel RiPP classes. DecRiPPter combines a Support Vector Machine (SVM) that identifies candidate RiPP precursors with pan-genomic analyses to identify which of these are encoded within operon-like structures that are part of the accessory genome of a genus. Subsequently, it prioritizes such regions based on the presence of new enzymology and based on patterns of gene cluster and precursor peptide conservation across species. We then applied decRiPPter to mine 1,295 Streptomyces genomes, which led to the identification of 42 new candidate RiPP families that could not be found by existing programs. One of these was studied further and elucidated as a representative of a novel subfamily of lanthipeptides, which we designate class V. The 2D structure of the new RiPP, which we name pristinin A3 (1), was solved using nuclear magnetic resonance (NMR), tandem mass spectrometry (MS/MS) data, and chemical labeling. Two previously unidentified modifying enzymes are proposed to create the hallmark lanthionine bridges. 
Taken together, our work highlights how novel natural product families can be discovered by methods going beyond sequence similarity searches to integrate multiple pathway discovery criteria.

Subject(s)

Bacteriocins/genetics , Genomics/methods , Protein Processing, Post-Translational/genetics , Algorithms , Bacteriocins/metabolism , Biological Products/analysis , Biological Products/metabolism , Computational Biology/methods , Genome/genetics , Machine Learning , Multigene Family/genetics , Peptides/genetics , Protein Processing, Post-Translational/physiology , Ribosomes/metabolism

4.

Hidden genomic evolution in a morphospecies-The landscape of rapidly evolving genes in Tetrahymena.

Xiong, Jie; Yang, Wentao; Chen, Kai; Jiang, Chuanqi; Ma, Yang; Chai, Xiaocui; Yan, Guanxiong; Wang, Guangying; Yuan, Dongxia; Liu, Yifan; Bidwell, Shelby L; Zafar, Nikhat; Hadjithomas, Michalis; Krishnakumar, Vivek; Coyne, Robert S; Orias, Eduardo; Miao, Wei.

PLoS Biol ; 17(6): e3000294, 2019 06.

Article in English | MEDLINE | ID: mdl-31158217

ABSTRACT

A morphospecies is defined as a taxonomic species based wholly on morphology, but often morphospecies consist of clusters of cryptic species that can be identified genetically or molecularly. The nature of the evolutionary novelty that accompanies speciation in a morphospecies is an intriguing question. Morphospecies are particularly common among ciliates, a group of unicellular eukaryotes that separates 2 kinds of nuclei-the silenced germline nucleus (micronucleus [MIC]) and the actively expressed somatic nucleus (macronucleus [MAC])-within a common cytoplasm. Because of their very similar morphologies, members of the Tetrahymena genus are considered a morphospecies. We explored the hidden genomic evolution within this genus by performing a comprehensive comparative analysis of the somatic genomes of 10 species and the germline genomes of 2 species of Tetrahymena. These species show high genetic divergence; phylogenomic analysis suggests that the genus originated about 300 million years ago (Mya). Seven universal protein domains are preferentially included among the species-specific (i.e., the youngest) Tetrahymena genes. In particular, leucine-rich repeat (LRR) genes make the largest contribution to the high level of genome divergence of the 10 species. LRR genes can be sorted into 3 different age groups. Parallel evolutionary trajectories have independently occurred among LRR genes in the different Tetrahymena species. Thousands of young LRR genes contain tandem arrays of exactly 90-bp exons. The introns separating these exons show a unique, extreme phase 2 bias, suggesting a clonal origin and successive expansions of 90-bp-exon LRR genes. Identifying LRR gene age groups allowed us to document a Tetrahymena intron length cycle. The youngest 90-bp exon LRR genes in T. thermophila are concentrated in pericentromeric and subtelomeric regions of the 5 micronuclear chromosomes, suggesting that these regions act as genome innovation centers. 
Copies of a Tetrahymena Long interspersed element (LINE)-like retrotransposon are very frequently found physically adjacent to 90-bp exon/intron repeat units of the youngest LRR genes. We propose that Tetrahymena species have used a massive exon-shuffling mechanism, involving unequal crossing over possibly in concert with retrotransposition, to create the unique 90-bp exon array LRR genes.
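The abstract above links tandem arrays of exactly 90-bp exons to an extreme phase 2 bias in the separating introns. The arithmetic behind that link can be sketched as follows, using the standard definition of intron phase as the length of coding sequence upstream of the intron, modulo 3. The exon lengths below, including the 92-bp first exon, are hypothetical values chosen for illustration and are not taken from the paper.

```python
def intron_phases(exon_lengths):
    """Return the phase (0, 1 or 2) of each intron between consecutive exons.

    Phase is the number of coding bases upstream of the intron, modulo 3.
    """
    phases = []
    coding_so_far = 0
    for length in exon_lengths[:-1]:  # no intron follows the last exon
        coding_so_far += length
        phases.append(coding_so_far % 3)
    return phases

# Hypothetical gene: a 92-bp first exon followed by a tandem array of 90-bp exons.
exons = [92, 90, 90, 90, 90]
phases = intron_phases(exons)  # every intron in the array shares one phase
```

Since 90 is a multiple of 3, each added 90-bp exon leaves the phase unchanged, so duplicating an exon/intron unit within the array copies its phase along with it. This is consistent with the shared phase across thousands of such genes that the authors interpret as evidence of a clonal origin.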

Subject(s)

Genomics/methods , Species Specificity , Tetrahymena/genetics , Biological Evolution , Evolution, Molecular , Exons , Genome, Protozoan , Introns , Leucine-Rich Repeat Proteins , Phylogeny , Proteins/genetics , Tetrahymena/metabolism

5.

Cultivation and sequencing of rumen microbiome members from the Hungate1000 Collection.

Seshadri, Rekha; Leahy, Sinead C; Attwood, Graeme T; Teh, Koon Hoong; Lambie, Suzanne C; Cookson, Adrian L; Eloe-Fadrosh, Emiley A; Pavlopoulos, Georgios A; Hadjithomas, Michalis; Varghese, Neha J; Paez-Espino, David; Perry, Rechelle; Henderson, Gemma; Creevey, Christopher J; Terrapon, Nicolas; Lapebie, Pascal; Drula, Elodie; Lombard, Vincent; Rubin, Edward; Kyrpides, Nikos C; Henrissat, Bernard; Woyke, Tanja; Ivanova, Natalia N; Kelly, William J.

Nat Biotechnol ; 36(4): 359-367, 2018 04.

Article in English | MEDLINE | ID: mdl-29553575

ABSTRACT

Productivity of ruminant livestock depends on the rumen microbiota, which ferment indigestible plant polysaccharides into nutrients used for growth. Understanding the functions carried out by the rumen microbiota is important for reducing greenhouse gas production by ruminants and for developing biofuels from lignocellulose. We present 410 cultured bacteria and archaea, together with their reference genomes, representing every cultivated rumen-associated archaeal and bacterial family. We evaluate polysaccharide degradation, short-chain fatty acid production and methanogenesis pathways, and assign specific taxa to functions. A total of 336 organisms were present in available rumen metagenomic data sets, and 134 were present in human gut microbiome data sets. Comparison with the human microbiome revealed rumen-specific enrichment for genes encoding de novo synthesis of vitamin B12, ongoing evolution by gene loss and potential vertical inheritance of the rumen microbiome based on underrepresentation of markers of environmental stress. We estimate that our Hungate genome resource represents ~75% of the genus-level bacterial and archaeal taxa present in the rumen.

Subject(s)

Archaea/genetics , Bacteria/genetics , Gastrointestinal Microbiome/genetics , Rumen/microbiology , Animals , Archaea/classification , Archaea/metabolism , Bacteria/classification , Bacteria/metabolism , Biofuels , Humans , Lignin/chemistry , Lignin/genetics , Microbiota/genetics

6.

An integrated workflow for phenazine-modifying enzyme characterization.

Coates, R Cameron; Bowen, Benjamin P; Oberortner, Ernst; Thomashow, Linda; Hadjithomas, Michalis; Zhao, Zhiying; Ke, Jing; Silva, Leslie; Louie, Katherine; Wang, Gaoyan; Robinson, David; Tarver, Angela; Hamilton, Matthew; Lubbe, Andrea; Feltcher, Meghan; Dangl, Jeffery L; Pati, Amrita; Weller, David; Northen, Trent R; Cheng, Jan-Fang; Mouncey, Nigel J; Deutsch, Samuel; Yoshikuni, Yasuo.

J Ind Microbiol Biotechnol ; 45(7): 567-577, 2018 Jul.

Article in English | MEDLINE | ID: mdl-29546662

ABSTRACT

Increasing availability of new genomes and putative biosynthetic gene clusters (BGCs) has extended the opportunity to access novel chemical diversity for agriculture, medicine, environmental and industrial purposes. However, functional characterization of BGCs through heterologous expression is limited because expression may require complex regulatory mechanisms, specific folding or activation. We developed an integrated workflow for BGC characterization that integrates pathway identification, modular design, DNA synthesis, assembly and characterization. This workflow was applied to characterize multiple phenazine-modifying enzymes. Phenazine pathways are useful for this workflow because all phenazines are derived from a core scaffold for modification by diverse modifying enzymes (PhzM, PhzS, PhzH, and PhzO) that produce characterized compounds. We expressed refactored synthetic modules of previously uncharacterized phenazine BGCs heterologously in Escherichia coli and were able to identify metabolic intermediates they produced, including a previously unidentified metabolite. These results demonstrate how this approach can accelerate functional characterization of BGCs.

Subject(s)

Bacterial Proteins/genetics , Multigene Family , Phenazines/metabolism , Biosynthetic Pathways/genetics , Escherichia coli/genetics , Escherichia coli/metabolism

7.

1,003 reference genomes of bacterial and archaeal isolates expand coverage of the tree of life.

Mukherjee, Supratim; Seshadri, Rekha; Varghese, Neha J; Eloe-Fadrosh, Emiley A; Meier-Kolthoff, Jan P; Göker, Markus; Coates, R Cameron; Hadjithomas, Michalis; Pavlopoulos, Georgios A; Paez-Espino, David; Yoshikuni, Yasuo; Visel, Axel; Whitman, William B; Garrity, George M; Eisen, Jonathan A; Hugenholtz, Philip; Pati, Amrita; Ivanova, Natalia N; Woyke, Tanja; Klenk, Hans-Peter; Kyrpides, Nikos C.

Nat Biotechnol ; 35(7): 676-683, 2017 Jul.

Article in English | MEDLINE | ID: mdl-28604660

ABSTRACT

We present 1,003 reference genomes that were sequenced as part of the Genomic Encyclopedia of Bacteria and Archaea (GEBA) initiative, selected to maximize sequence coverage of phylogenetic space. These genomes double the number of existing type strains and expand their overall phylogenetic diversity by 25%. Comparative analyses with previously available finished and draft genomes reveal a 10.5% increase in novel protein families as a function of phylogenetic diversity. The GEBA genomes recruit 25 million previously unassigned metagenomic proteins from 4,650 samples, improving their phylogenetic and functional interpretation. We identify numerous biosynthetic clusters and experimentally validate a divergent phenazine cluster with potential new chemical structure and antimicrobial activity. This Resource is the largest single release of reference genomes to date. Bacterial and archaeal isolate sequence space is still far from saturated, and future endeavors in this direction will continue to be a valuable resource for scientific discovery.

Subject(s)

Chromosome Mapping/standards , Databases, Genetic , Genome, Archaeal/genetics , Genome, Bacterial/genetics , High-Throughput Nucleotide Sequencing/standards , Knowledge Bases , Database Management Systems , Datasets as Topic , Encyclopedias as Topic , Reference Values

8.

Complete genome sequence of Jiangella gansuensis strain YIM 002^T (DSM 44835^T), the type species of the genus Jiangella and source of new antibiotic compounds.

Jiao, Jian-Yu; Carro, Lorena; Liu, Lan; Gao, Xiao-Yang; Zhang, Xiao-Tong; Hozzein, Wael N; Lapidus, Alla; Huntemann, Marcel; Reddy, T B K; Varghese, Neha; Hadjithomas, Michalis; Ivanova, Natalia N; Göker, Markus; Pillay, Manoj; Eisen, Jonathan A; Woyke, Tanja; Klenk, Hans-Peter; Kyrpides, Nikos C; Li, Wen-Jun.

Stand Genomic Sci ; 12: 21, 2017.

Article in English | MEDLINE | ID: mdl-28174619

ABSTRACT

Jiangella gansuensis strain YIM 002^T is the type strain of the type species of the genus Jiangella, which is at the present time composed of five species, and was isolated from a desert soil sample in Gansu Province (China). The five strains of this genus are clustered in a monophyletic group when closer actinobacterial genera are used to infer a 16S rRNA gene sequence phylogeny. The study of this genome is part of the Genomic Encyclopedia of Bacteria and Archaea project, and here we describe the complete genome sequence and annotation of this taxon. The genome of J. gansuensis strain YIM 002^T contains a single scaffold of size 5,585,780 bp, which involves 149 pseudogenes, 4905 protein-coding genes and 50 RNA genes, including 2520 hypothetical proteins and 4 rRNA genes. From the investigation of genome sizes of Jiangella species, J. gansuensis shows a smaller size, which indicates this strain might have discarded too much genetic information to adapt to the desert environment. Seven new compounds from this bacterium have recently been described; however, its potential should be higher, as secondary metabolite gene cluster analysis predicted 60 gene clusters, including the potential to produce pristinamycin.

9.

IMG/M: integrated genome and metagenome comparative data analysis system.

Chen, I-Min A; Markowitz, Victor M; Chu, Ken; Palaniappan, Krishna; Szeto, Ernest; Pillay, Manoj; Ratner, Anna; Huang, Jinghua; Andersen, Evan; Huntemann, Marcel; Varghese, Neha; Hadjithomas, Michalis; Tennessen, Kristin; Nielsen, Torben; Ivanova, Natalia N; Kyrpides, Nikos C.

Nucleic Acids Res ; 45(D1): D507-D516, 2017 01 04.

Article in English | MEDLINE | ID: mdl-27738135

ABSTRACT

The Integrated Microbial Genomes with Microbiome Samples (IMG/M: https://img.jgi.doe.gov/m/) system contains annotated DNA and RNA sequence data of (i) archaeal, bacterial, eukaryotic and viral genomes from cultured organisms, (ii) single cell genomes (SCG) and genomes from metagenomes (GFM) from uncultured archaea, bacteria and viruses and (iii) metagenomes from environmental, host associated and engineered microbiome samples. Sequence data are generated by DOE's Joint Genome Institute (JGI), submitted by individual scientists, or collected from public sequence data archives. Structural and functional annotation is carried out by JGI's genome and metagenome annotation pipelines. A variety of analytical and visualization tools provide support for examining and comparing IMG/M's datasets. IMG/M allows open access interactive analysis of publicly available datasets, while manual curation, submission and access to private datasets and computationally intensive workspace-based analysis require login/password access to its expert review (ER) companion system (IMG/M ER: https://img.jgi.doe.gov/mer/). Since the last report published in the 2014 NAR Database Issue, IMG/M's dataset content has tripled in terms of number of datasets and overall protein coding genes, while its analysis tools have been extended to cope with the rapid growth in the number and size of datasets handled by the system.

Subject(s)

Computational Biology/methods , Metagenome , Metagenomics/methods , Microbiota/genetics , Software , Web Browser

10.

IMG-ABC: new features for bacterial secondary metabolism analysis and targeted biosynthetic gene cluster discovery in thousands of microbial genomes.

Hadjithomas, Michalis; Chen, I-Min A; Chu, Ken; Huang, Jinghua; Ratner, Anna; Palaniappan, Krishna; Andersen, Evan; Markowitz, Victor; Kyrpides, Nikos C; Ivanova, Natalia N.

Nucleic Acids Res ; 45(D1): D560-D565, 2017 01 04.

Article in English | MEDLINE | ID: mdl-27903896

ABSTRACT

Secondary metabolites produced by microbes have diverse biological functions, which makes them a great potential source of biotechnologically relevant compounds with antimicrobial, anti-cancer and other activities. The proteins needed to synthesize these natural products are often encoded by clusters of co-located genes called biosynthetic gene clusters (BCs). In order to advance the exploration of microbial secondary metabolism, we developed the largest publicly available database of experimentally verified and predicted BCs, the Integrated Microbial Genomes Atlas of Biosynthetic gene Clusters (IMG-ABC) (https://img.jgi.doe.gov/abc/). Here, we describe an update of IMG-ABC, which includes ClusterScout, a tool for targeted identification of custom biosynthetic gene clusters across 40 000 isolate microbial genomes, and a new search capability to query more than 700 000 BCs from isolate genomes for clusters with similar Pfam composition. Additional features enable fast exploration and analysis of BCs through two new interactive visualization features, a BC function heatmap and a BC similarity network graph. These new tools and features add to the value of IMG-ABC's vast body of BC data, facilitating their in-depth analysis and accelerating secondary metabolite discovery.
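The abstract above mentions searching for clusters with similar Pfam composition but does not state how similarity is scored. One common way to compare two sets of domain annotations is the Jaccard index, sketched below as an assumption rather than as IMG-ABC's actual method. The cluster names, the Pfam accessions (placeholders, not real annotations), and the functions `jaccard` and `rank_clusters` are hypothetical.

```python
def jaccard(a, b):
    """Jaccard index of two collections of Pfam accessions (0.0 to 1.0)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def rank_clusters(query, clusters):
    """Return cluster names ordered from most to least similar to the query."""
    scored = [(jaccard(query, pfams), name) for name, pfams in clusters.items()]
    # Sort by descending score, breaking ties alphabetically by name.
    return [name for score, name in sorted(scored, key=lambda s: (-s[0], s[1]))]

# Hypothetical clusters, each described by the set of Pfam domains it contains.
clusters = {
    "bc_a": {"PF00001", "PF00002", "PF00003"},
    "bc_b": {"PF00001", "PF00099"},
    "bc_c": {"PF00050"},
}
query = {"PF00001", "PF00002"}
ranking = rank_clusters(query, clusters)
```

A set-based score ignores domain order and copy number, which keeps the comparison cheap across hundreds of thousands of clusters; a production system might weight domains or account for gene order instead.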

Subject(s)

Bacteria/genetics , Bacteria/metabolism , Genome, Bacterial , Genomics/methods , Metabolomics/methods , Computational Biology/methods , Software , Web Browser

11.

Structure of the germline genome of Tetrahymena thermophila and relationship to the massively rearranged somatic genome.

Hamilton, Eileen P; Kapusta, Aurélie; Huvos, Piroska E; Bidwell, Shelby L; Zafar, Nikhat; Tang, Haibao; Hadjithomas, Michalis; Krishnakumar, Vivek; Badger, Jonathan H; Caler, Elisabet V; Russ, Carsten; Zeng, Qiandong; Fan, Lin; Levin, Joshua Z; Shea, Terrance; Young, Sarah K; Hegarty, Ryan; Daza, Riza; Gujja, Sharvari; Wortman, Jennifer R; Birren, Bruce W; Nusbaum, Chad; Thomas, Jainy; Carey, Clayton M; Pritham, Ellen J; Feschotte, Cédric; Noto, Tomoko; Mochizuki, Kazufumi; Papazyan, Romeo; Taverna, Sean D; Dear, Paul H; Cassidy-Hanley, Donna M; Xiong, Jie; Miao, Wei; Orias, Eduardo; Coyne, Robert S.

Elife ; 5, 2016 11 28.

Article in English | MEDLINE | ID: mdl-27892853

ABSTRACT

The germline genome of the binucleated ciliate Tetrahymena thermophila undergoes programmed chromosome breakage and massive DNA elimination to generate the somatic genome. Here, we present a complete sequence assembly of the germline genome and analyze multiple features of its structure and its relationship to the somatic genome, shedding light on the mechanisms of genome rearrangement as well as the evolutionary history of this remarkable germline/soma differentiation. Our results strengthen the notion that a complex, dynamic, and ongoing interplay between mobile DNA elements and the host genome has shaped Tetrahymena chromosome structure, locally and globally. Non-standard outcomes of rearrangement events, including the generation of short-lived somatic chromosomes and excision of DNA interrupting protein-coding regions, may represent novel forms of developmental gene regulation. We also compare Tetrahymena's germline/soma differentiation to that of other characterized ciliates, illustrating the wide diversity of adaptations that have occurred within this phylum.

Subject(s)

Gene Rearrangement , Genome, Protozoan , Tetrahymena thermophila/genetics , Sequence Analysis, DNA

12.

Supporting community annotation and user collaboration in the integrated microbial genomes (IMG) system.

Chen, I-Min A; Markowitz, Victor M; Palaniappan, Krishna; Szeto, Ernest; Chu, Ken; Huang, Jinghua; Ratner, Anna; Pillay, Manoj; Hadjithomas, Michalis; Huntemann, Marcel; Mikhailova, Natalia; Ovchinnikova, Galina; Ivanova, Natalia N; Kyrpides, Nikos C.

BMC Genomics ; 17: 307, 2016 Apr 26.

Article in English | MEDLINE | ID: mdl-27118214

ABSTRACT

BACKGROUND: The exponential growth of genomic data from next generation technologies renders traditional manual expert curation effort unsustainable. Many genomic systems have included community annotation tools to address the problem. Most of these systems adopted a "Wiki-based" approach to take advantage of existing wiki technologies, but encountered obstacles in issues such as usability, authorship recognition, information reliability and incentive for community participation. RESULTS: Here, we present a different approach, relying on a tightly integrated method rather than a "Wiki-based" method, to support community annotation and user collaboration in the Integrated Microbial Genomes (IMG) system. The IMG approach allows users to use the existing IMG data warehouse and analysis tools to add gene, pathway and biosynthetic cluster annotations, to analyze/reorganize contigs, genes and functions using workspace datasets, and to share private user annotations and workspace datasets with collaborators. We show that the annotation effort using IMG can be part of the research process to overcome the user incentive and authorship recognition problems, thus fostering collaboration among domain experts. The usability and reliability issues are addressed by the integration of curated information and analysis tools in IMG, together with DOE Joint Genome Institute (JGI) expert review. CONCLUSION: By incorporating annotation operations into IMG, we provide an integrated environment for users to perform deeper and extended data analysis and annotation in a single system that can lead to publications and community knowledge sharing as shown in the case studies.

Subject(s)

Computational Biology/methods , Genome, Microbial , Genomics/methods , Molecular Sequence Annotation/methods , Software , Cooperative Behavior , Data Accuracy , Information Dissemination , Internet , User-Computer Interface

13.

Local admixture of amplified and diversified secreted pathogenesis determinants shapes mosaic Toxoplasma gondii genomes.

Lorenzi, Hernan; Khan, Asis; Behnke, Michael S; Namasivayam, Sivaranjani; Swapna, Lakshmipuram S; Hadjithomas, Michalis; Karamycheva, Svetlana; Pinney, Deborah; Brunk, Brian P; Ajioka, James W; Ajzenberg, Daniel; Boothroyd, John C; Boyle, Jon P; Dardé, Marie L; Diaz-Miranda, Maria A; Dubey, Jitender P; Fritz, Heather M; Gennari, Solange M; Gregory, Brian D; Kim, Kami; Saeij, Jeroen P J; Su, Chunlei; White, Michael W; Zhu, Xing-Quan; Howe, Daniel K; Rosenthal, Benjamin M; Grigg, Michael E; Parkinson, John; Liu, Liang; Kissinger, Jessica C; Roos, David S; Sibley, L David.

Nat Commun ; 7: 10147, 2016 Jan 07.

Article in English | MEDLINE | ID: mdl-26738725

ABSTRACT

Toxoplasma gondii is among the most prevalent parasites worldwide, infecting many wild and domestic animals and causing zoonotic infections in humans. T. gondii differs substantially in its broad distribution from closely related parasites that typically have narrow, specialized host ranges. To elucidate the genetic basis for these differences, we compared the genomes of 62 globally distributed T. gondii isolates to several closely related coccidian parasites. Our findings reveal that tandem amplification and diversification of secretory pathogenesis determinants is the primary feature that distinguishes the closely related genomes of these biologically diverse parasites. We further show that the unusual population structure of T. gondii is characterized by clade-specific inheritance of large conserved haploblocks that are significantly enriched in tandemly clustered secretory pathogenesis determinants. The shared inheritance of these conserved haploblocks, which show a different ancestry than the genome as a whole, may thus influence transmission, host range and pathogenicity.

Subject(s)

Genome, Protozoan , Toxoplasma/genetics , Toxoplasma/pathogenicity , Conserved Sequence , DNA, Protozoan/genetics , Gene Expression Regulation/physiology , Phylogeny , Polymorphism, Single Nucleotide , Protozoan Proteins/genetics , Protozoan Proteins/metabolism , RNA, Messenger/genetics , RNA, Messenger/metabolism , Synteny , Virulence

14.

Correction: Selecting One of Several Mating Types through Gene Segment Joining and Deletion in Tetrahymena thermophila.

Cervantes, Marcella D; Hamilton, Eileen P; Xiong, Jie; Lawson, Michael J; Yuan, Dongxia; Hadjithomas, Michalis; Miao, Wei; Orias, Eduardo.

PLoS Biol ; 13(10): e1002284, 2015 Oct.

Article in English | MEDLINE | ID: mdl-26488167

15.

Minimum Information about a Biosynthetic Gene cluster.

Medema, Marnix H; Kottmann, Renzo; Yilmaz, Pelin; Cummings, Matthew; Biggins, John B; Blin, Kai; de Bruijn, Irene; Chooi, Yit Heng; Claesen, Jan; Coates, R Cameron; Cruz-Morales, Pablo; Duddela, Srikanth; Düsterhus, Stephanie; Edwards, Daniel J; Fewer, David P; Garg, Neha; Geiger, Christoph; Gomez-Escribano, Juan Pablo; Greule, Anja; Hadjithomas, Michalis; Haines, Anthony S; Helfrich, Eric J N; Hillwig, Matthew L; Ishida, Keishi; Jones, Adam C; Jones, Carla S; Jungmann, Katrin; Kegler, Carsten; Kim, Hyun Uk; Kötter, Peter; Krug, Daniel; Masschelein, Joleen; Melnik, Alexey V; Mantovani, Simone M; Monroe, Emily A; Moore, Marcus; Moss, Nathan; Nützmann, Hans-Wilhelm; Pan, Guohui; Pati, Amrita; Petras, Daniel; Reen, F Jerry; Rosconi, Federico; Rui, Zhe; Tian, Zhenhua; Tobias, Nicholas J; Tsunematsu, Yuta; Wiemann, Philipp; Wyckoff, Elizabeth; Yan, Xiaohui.

Nat Chem Biol ; 11(9): 625-31, 2015 Sep.

Article in English | MEDLINE | ID: mdl-26284661

Subject(s)

Bacteria/genetics , Computational Biology/standards , Fungi/genetics , Multigene Family , Plants/genetics , Protein Biosynthesis , Alkaloids/biosynthesis , Bacteria/metabolism , Databases, Genetic , Fungi/metabolism , Genetic Markers , International Cooperation , Metagenome , Peptide Biosynthesis, Nucleic Acid-Independent , Peptides/metabolism , Plants/metabolism , Polyketides/metabolism , Polysaccharides/biosynthesis , Terminology as Topic , Terpenes/metabolism

16.

IMG-ABC: A Knowledge Base To Fuel Discovery of Biosynthetic Gene Clusters and Novel Secondary Metabolites.

Hadjithomas, Michalis; Chen, I-Min Amy; Chu, Ken; Ratner, Anna; Palaniappan, Krishna; Szeto, Ernest; Huang, Jinghua; Reddy, T B K; Cimermancic, Peter; Fischbach, Michael A; Ivanova, Natalia N; Markowitz, Victor M; Kyrpides, Nikos C; Pati, Amrita.

mBio ; 6(4): e00932, 2015 Jul 14.

Article in English | MEDLINE | ID: mdl-26173699

ABSTRACT

In the discovery of secondary metabolites, analysis of sequence data is a promising exploration path that remains largely underutilized due to the lack of computational platforms that enable such a systematic approach on a large scale. In this work, we present IMG-ABC (https://img.jgi.doe.gov/abc), an atlas of biosynthetic gene clusters within the Integrated Microbial Genomes (IMG) system, which is aimed at harnessing the power of "big" genomic data for discovering small molecules. IMG-ABC relies on IMG's comprehensive integrated structural and functional genomic data for the analysis of biosynthetic gene clusters (BCs) and associated secondary metabolites (SMs). SMs and BCs serve as the two main classes of objects in IMG-ABC, each with a rich collection of attributes. A unique feature of IMG-ABC is the incorporation of both experimentally validated and computationally predicted BCs in genomes as well as metagenomes, thus identifying BCs in uncultured populations and rare taxa. We demonstrate the strength of IMG-ABC's focused integrated analysis tools in enabling the exploration of microbial secondary metabolism on a global scale, through the discovery of phenazine-producing clusters for the first time in Alphaproteobacteria. IMG-ABC strives to fill the long-existent void of resources for computational exploration of the secondary metabolism universe; its underlying scalable framework enables traversal of uncovered phylogenetic and chemical structure space, serving as a doorway to a new era in the discovery of novel molecules.

IMPORTANCE: IMG-ABC is the largest publicly available database of predicted and experimental biosynthetic gene clusters and the secondary metabolites they produce. The system also includes powerful search and analysis tools that are integrated with IMG's extensive genomic/metagenomic data and analysis tool kits. As new research on biosynthetic gene clusters and secondary metabolites is published and more genomes are sequenced, IMG-ABC will continue to expand, with the goal of becoming an essential component of any bioinformatic exploration of the secondary metabolism world.

Subject(s)

Biosynthetic Pathways/genetics , Computational Biology/methods , Knowledge Bases , Multigene Family , Secondary Metabolism/genetics

17.

Total synthesis of a functional designer eukaryotic chromosome.

Annaluru, Narayana; Muller, Héloïse; Mitchell, Leslie A; Ramalingam, Sivaprakash; Stracquadanio, Giovanni; Richardson, Sarah M; Dymond, Jessica S; Kuang, Zheng; Scheifele, Lisa Z; Cooper, Eric M; Cai, Yizhi; Zeller, Karen; Agmon, Neta; Han, Jeffrey S; Hadjithomas, Michalis; Tullman, Jennifer; Caravelli, Katrina; Cirelli, Kimberly; Guo, Zheyuan; London, Viktoriya; Yeluru, Apurva; Murugan, Sindurathy; Kandavelou, Karthikeyan; Agier, Nicolas; Fischer, Gilles; Yang, Kun; Martin, J Andrew; Bilgel, Murat; Bohutski, Pavlo; Boulier, Kristin M; Capaldo, Brian J; Chang, Joy; Charoen, Kristie; Choi, Woo Jin; Deng, Peter; DiCarlo, James E; Doong, Judy; Dunn, Jessilyn; Feinberg, Jason I; Fernandez, Christopher; Floria, Charlotte E; Gladowski, David; Hadidi, Pasha; Ishizuka, Isabel; Jabbari, Javaneh; Lau, Calvin Y L; Lee, Pablo A; Li, Sean; Lin, Denise; Linder, Matthias E.

Science ; 344(6179): 55-8, 2014 04 04.

Article in English | MEDLINE | ID: mdl-24674868

ABSTRACT

Rapid advances in DNA synthesis techniques have made it possible to engineer viruses, biochemical pathways and assemble bacterial genomes. Here, we report the synthesis of a functional 272,871-base pair designer eukaryotic chromosome, synIII, which is based on the 316,617-base pair native Saccharomyces cerevisiae chromosome III. Changes to synIII include TAG/TAA stop-codon replacements, deletion of subtelomeric regions, introns, transfer RNAs, transposons, and silent mating loci as well as insertion of loxPsym sites to enable genome scrambling. SynIII is functional in S. cerevisiae. Scrambling of the chromosome in a heterozygous diploid reveals a large increase in a-mater derivatives resulting from loss of the MATα allele on synIII. The complete design and synthesis of synIII establishes S. cerevisiae as the basis for designer eukaryotic genome biology.

Subject(s)

Chromosomes, Fungal , Saccharomyces cerevisiae/genetics , Synthetic Biology/methods , Base Sequence , Chromosomes, Fungal/genetics , Chromosomes, Fungal/metabolism , DNA, Fungal/genetics , Genes, Fungal , Genetic Fitness , Genome, Fungal , Genomic Instability , Introns , Molecular Sequence Data , Mutation , Polymerase Chain Reaction , RNA, Fungal/genetics , RNA, Transfer/genetics , Saccharomyces cerevisiae/cytology , Saccharomyces cerevisiae/physiology , Sequence Analysis, DNA , Sequence Deletion , Transformation, Genetic

18.

Selecting one of several mating types through gene segment joining and deletion in Tetrahymena thermophila.

Cervantes, Marcella D; Hamilton, Eileen P; Xiong, Jie; Lawson, Michael J; Yuan, Dongxia; Hadjithomas, Michalis; Miao, Wei; Orias, Eduardo.

PLoS Biol ; 11(3): e1001518, 2013.

Article in English | MEDLINE | ID: mdl-23555191

ABSTRACT

The unicellular eukaryote Tetrahymena thermophila has seven mating types. Cells can mate only when they recognize cells of a different mating type as non-self. As a ciliate, Tetrahymena separates its germline and soma into two nuclei. During growth the somatic nucleus is responsible for all gene transcription while the germline nucleus remains silent. During mating, a new somatic nucleus is differentiated from a germline nucleus and mating type is decided by a stochastic process. We report here that the somatic mating type locus contains a pair of genes arranged head-to-head. Each gene encodes a mating type-specific segment and a transmembrane domain that is shared by all mating types. Somatic gene knockouts showed both genes are required for efficient non-self recognition and successful mating, as assessed by pair formation and progeny production. The germline mating type locus consists of a tandem array of incomplete gene pairs representing each potential mating type. During mating, a complete new gene pair is assembled at the somatic mating type locus; the incomplete genes of one gene pair are completed by joining to gene segments at each end of germline array. All other germline gene pairs are deleted in the process. These programmed DNA rearrangements make this a fascinating system of mating type determination.

Subject(s)

Reproduction/physiology , Tetrahymena thermophila/physiology , Reproduction/genetics , Tetrahymena thermophila/genetics

19.

Experimental evidence for the role of domain swapping in the evolution of the histone fold.

Hadjithomas, Michalis; Moudrianakis, Evangelos N.

Proc Natl Acad Sci U S A ; 108(33): 13462-7, 2011 Aug 16.

Article in English | MEDLINE | ID: mdl-21813758

ABSTRACT

The histone fold forms the fundamental endoskeleton of the protein core of the nucleosome and is also found in several transcription factors. We have investigated the evolutionary origins of this ubiquitous protein motif, which is found soluble exclusively as an antiparallel (handshake motif) dimer. We introduced a three amino acid insertion into the middle of a homodimeric archaeal histone fold motif. The engineered molecule was found to be a soluble and stable monomer with properties consistent with a four-helix-bundle protein. The experimental evidence presented here supports the hypothesis that the handshake association motif characteristic of present-day histone dimers is the evolutionary product of domain swapping between two four-helix bundle domains, each of which derived from the tandem duplication of a primitive helix-strand-helix unit.

Subject(s)

Archaeal Proteins/genetics , Evolution, Molecular , Histones/chemistry , Protein Multimerization , Archaeal Proteins/chemistry , Euryarchaeota/chemistry , Euryarchaeota/genetics , Histones/genetics , Mutagenesis, Insertional , Protein Structure, Tertiary

20.

Structural characterization of the Rous sarcoma virus RNA stability element.

Weil, Jason E; Hadjithomas, Michalis; Beemon, Karen L.

J Virol ; 83(5): 2119-29, 2009 Mar.

Article in English | MEDLINE | ID: mdl-19091866

ABSTRACT

In eukaryotic cells, an mRNA bearing a premature termination codon (PTC) or an abnormally long 3' untranslated region (UTR) is often degraded by the nonsense-mediated mRNA decay (NMD) pathway. Despite the presence of a 5- to 7-kb 3' UTR, unspliced retroviral RNA escapes this degradation. We previously identified the Rous sarcoma virus (RSV) stability element (RSE), an RNA element downstream of the gag natural translation termination codon that prevents degradation of the unspliced viral RNA. Insertion of this element downstream of a PTC in the RSV gag gene also inhibits NMD. Using partial RNase digestion and selective 2'-hydroxyl acylation analyzed by primer extension (SHAPE) chemistry, we determined the secondary structure of this element. Incorporating RNase and SHAPE data into structural prediction programs definitively shows that the RSE contains an AU-rich stretch of about 30 single-stranded nucleotides near the 5' end and two substantial stem-loop structures. The overall secondary structure of the RSE appears to be conserved among 20 different avian retroviruses. The structural aspects of this element will serve as a tool in the future design of cis mutants in addressing the mechanism of stabilization.

Subject(s)

Nucleic Acid Conformation , RNA Stability/genetics , RNA, Viral/genetics , Rous sarcoma virus/genetics , Base Sequence , Molecular Sequence Data , RNA, Messenger/genetics , Sequence Alignment , Sequence Analysis, RNA
