Results 1 - 20 of 27
1.
Article in English | MEDLINE | ID: mdl-37022754

ABSTRACT

A strictly anaerobic hyperthermophilic archaeon, designated strain IOH2T, was isolated from a deep-sea hydrothermal vent (Onnuri vent field) area on the Central Indian Ocean Ridge. Strain IOH2T showed high 16S rRNA gene sequence similarity to Thermococcus sibiricus MM 739T (99.42 %), Thermococcus alcaliphilus DSM 10322T (99.28 %), Thermococcus aegaeus P5T (99.21 %), Thermococcus litoralis DSM 5473T (99.13 %), 'Thermococcus bergensis' T7324T (99.13 %), Thermococcus aggregans TYT (98.92 %) and Thermococcus prieurii Bio-pl-0405IT2T (98.01 %), with all other strains showing lower than 98 % similarity. The average nucleotide identity and in silico DNA-DNA hybridization values were highest between strain IOH2T and T. sibiricus MM 739T (79.33 and 15.00 %, respectively); these values are much lower than the species delineation cut-offs. Cells of strain IOH2T were coccoid, 1.0-1.2 µm in diameter and had no flagella. Growth ranges were 60-85 °C (optimum at 80 °C), pH 4.5-8.5 (optimum at pH 6.3) and 2.0-6.0 % (optimum at 4.0 %) NaCl. Growth of strain IOH2T was enhanced by starch, glucose, maltodextrin and pyruvate as a carbon source, and elemental sulphur as an electron acceptor. Through genome analysis of strain IOH2T, arginine biosynthesis related genes were predicted, and growth of strain IOH2T without arginine was confirmed. The genome of strain IOH2T was assembled as a circular chromosome of 1 946 249 bp and predicted 2096 genes. The DNA G+C content was 39.44 mol%. Based on the results of physiological and phylogenetic analyses, Thermococcus argininiproducens sp. nov. is proposed with type strain IOH2T (=MCCC 4K00089T=KCTC 25190T).


Subject(s)
Thermococcus , Thermococcus/genetics , Seawater , Base Composition , Phylogeny , RNA, Ribosomal, 16S/genetics , Indian Ocean , DNA, Bacterial/genetics , Fatty Acids/chemistry , Sequence Analysis, DNA , Bacterial Typing Techniques
2.
BMC Genomics ; 22(1): 830, 2021 Nov 17.
Article in English | MEDLINE | ID: mdl-34789157

ABSTRACT

BACKGROUND: Trichoderma is a genus of fungi in the family Hypocreaceae and includes species known to produce enzymes with commercial use. They are largely found in soil and terrestrial plants. Recently, Trichoderma simmonsii isolated from decaying bark and decorticated wood was newly identified in the Harzianum clade of Trichoderma. Due to a wide range of applications in agriculture and other industries, genomes of at least 12 Trichoderma spp. have been studied. Moreover, antifungal and enzymatic activities have been extensively characterized in Trichoderma spp. However, the genomic information and bioactivities of T. simmonsii from a particular marine-derived isolate remain largely unknown. While we screened for asparaginase-producing fungi, we observed that T. simmonsii GH-Sj1 strain isolated from edible kelp produced asparaginase. In this study, we report a draft genome of T. simmonsii GH-Sj1 using Illumina and Oxford Nanopore technologies. Furthermore, to facilitate biotechnological applications of this species, RNA-sequencing was performed to elucidate the transcriptional profile of T. simmonsii GH-Sj1 in response to asparaginase-rich conditions. RESULTS: We generated ~14 Gb of sequencing data assembled in a ~40 Mb genome. The T. simmonsii GH-Sj1 genome consisted of seven telomere-to-telomere scaffolds with no sequencing gaps, where the N50 length was 6.4 Mb. The total number of protein-coding genes was 13,120, constituting ~99% of the genome. The genome harbored 176 tRNAs, which encode a full set of 20 amino acids. In addition, it had an rRNA repeat region consisting of seven repeats of the 18S-ITS1-5.8S-ITS2-26S cluster. The T. simmonsii genome also harbored 7 putative asparaginase-encoding genes with potential medical applications. Using RNA-sequencing analysis, we found that 3 genes among the 7 putative genes were significantly upregulated under asparaginase-rich conditions. CONCLUSIONS: The genome and transcriptome of T. simmonsii GH-Sj1 established in the current work represent valuable resources for future comparative studies on fungal genomes and asparaginase production.
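
The abstract above reports an N50 length without defining it. As a minimal sketch of the usual definition (the length of the shortest scaffold in the smallest set of longest scaffolds that together cover at least half of the assembly), the following may help. The function name `n50` and the scaffold lengths are illustrative assumptions, not the published scaffold sizes.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths (in bp)."""
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("no sequence lengths were given")


# Hypothetical scaffold lengths summing to 40 Mb; NOT the real GH-Sj1 scaffolds.
example_scaffolds = [8_000_000, 7_000_000, 6_400_000, 6_000_000,
                     5_000_000, 4_000_000, 3_600_000]
print(n50(example_scaffolds))
```

With only seven scaffolds, the N50 is simply one of the scaffold lengths, which is why a telomere-to-telomere assembly tends to report an N50 close to a typical chromosome size.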


Subject(s)
Trichoderma , Asparaginase , Genome , Hypocreales , Telomere , Trichoderma/genetics
3.
BMC Bioinformatics ; 20(Suppl 11): 276, 2019 Jun 06.
Article in English | MEDLINE | ID: mdl-31167633

ABSTRACT

BACKGROUND: A crucial task in metagenomic analysis is to annotate the function and taxonomy of the sequencing reads generated from a microbiome sample. In general, the reads can either be assembled into contigs and searched against reference databases, or individually searched without assembly. The first approach may suffer from fragmentary and incomplete assembly, while the second is hampered by the reduced functional signal contained in the short reads. To tackle these issues, we have previously developed GRASP (Guided Reference-based Assembly of Short Peptides), which accepts a reference protein sequence as input and aims to assemble its homologs from a database containing fragmentary protein sequences. In addition to a gene-centric assembly tool, GRASP also serves as a homolog search tool when using the assembled protein sequences as templates to recruit reads. GRASP has significantly improved recall rate (60-80% vs. 30-40%) compared to other homolog search tools such as BLAST. However, GRASP is both time- and space-consuming. Subsequently, we developed GRASPx, which is 30X faster than GRASP. Here, we present a completely redesigned algorithm, GRASP2, for this computational problem. RESULTS: GRASP2 utilizes Burrows-Wheeler Transformation (BWT) and FM-index to perform assembly graph generation, and reduces the search space by employing a fast ungapped alignment strategy as a filter. GRASP2 also explicitly generates candidate paths prior to alignment, which effectively uncouples the iterative access of the assembly graph and alignment matrix. This strategy makes the execution of the program more efficient under current computer architecture, and contributes to GRASP2's speedup. GRASP2 is 8-fold faster than GRASPx (and 250-fold faster than GRASP) and uses 8-fold less memory while maintaining the original high recall rate of GRASP. GRASP2 reaches ~ 80% recall rate compared to that of ~ 40% generated by BLAST, both at a high precision level (> 95%). 
With such a high performance, GRASP2 is only ~3X slower than BLASTP. CONCLUSION: GRASP2 is a high-performance gene-centric and homolog search tool with significant speedup compared to its predecessors, which makes GRASP2 a useful tool for metagenomics data analysis. GRASP2 is implemented in C++ and is freely available from http://www.sourceforge.net/projects/grasp2 .
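
The abstract above names the Burrows-Wheeler Transformation and the FM-index as the basis for GRASP2's graph generation. The sketch below is a toy illustration of those two structures and of exact-match backward search only; it is not GRASP2's code, and the function names `build_fm_index` and `find` as well as the example peptide are assumptions made for illustration. The naive suffix sorting used here would not scale to real read sets.

```python
from collections import Counter


def build_fm_index(text):
    """Build a toy FM-index (suffix array, C table, occurrence table)."""
    text += "$"  # sentinel; sorts before every upper-case letter
    # Naive suffix array: fine for a demo, far too slow for real data.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # BWT = the character preceding each sorted suffix.
    bwt = "".join(text[i - 1] for i in sa)
    # c_table[c] = number of characters in text strictly smaller than c.
    counts = Counter(text)
    c_table, total = {}, 0
    for ch in sorted(counts):
        c_table[ch] = total
        total += counts[ch]
    # occ[c][i] = number of occurrences of c in bwt[:i].
    occ = {ch: [0] * (len(bwt) + 1) for ch in counts}
    for i, b in enumerate(bwt):
        for ch in occ:
            occ[ch][i + 1] = occ[ch][i] + (b == ch)
    return sa, c_table, occ


def find(pattern, index):
    """Return sorted start positions of exact matches via backward search."""
    sa, c_table, occ = index
    lo, hi = 0, len(sa)  # half-open interval of matching suffixes
    for ch in reversed(pattern):
        if ch not in c_table:
            return []
        lo = c_table[ch] + occ[ch][lo]
        hi = c_table[ch] + occ[ch][hi]
        if lo >= hi:
            return []
    return sorted(sa[lo:hi])


index = build_fm_index("MKVLMKV")  # hypothetical peptide string
print(find("MKV", index))
```

Each pattern character narrows the suffix interval with two table lookups, so a query costs time proportional to the pattern length rather than to the size of the indexed text.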


Subject(s)
Genes , Metagenomics/methods , Sequence Analysis, DNA/methods , Sequence Homology, Nucleic Acid , Software , Algorithms , Aquatic Organisms/genetics , Microbiota/genetics , ROC Curve , Time Factors
4.
Proc Natl Acad Sci U S A ; 112(24): 7569-74, 2015 Jun 16.
Article in English | MEDLINE | ID: mdl-26034276

ABSTRACT

One major challenge to studying human microbiome and its associated diseases is the lack of effective tools to achieve targeted modulation of individual species and study its ecological function within multispecies communities. Here, we show that C16G2, a specifically targeted antimicrobial peptide, was able to selectively kill cariogenic pathogen Streptococcus mutans with high efficacy within a human saliva-derived in vitro oral multispecies community. Importantly, a significant shift in the overall microbial structure of the C16G2-treated community was revealed after a 24-h recovery period: several bacterial species with metabolic dependency or physical interactions with S. mutans suffered drastic reduction in their abundance, whereas S. mutans' natural competitors, including health-associated Streptococci, became dominant. This study demonstrates the use of targeted antimicrobials to modulate the microbiome structure allowing insights into the key community role of specific bacterial species and also indicates the therapeutic potential of C16G2 to achieve a healthy oral microbiome.


Subject(s)
Antimicrobial Cationic Peptides/pharmacology , Microbiota/drug effects , Streptococcus mutans/drug effects , Streptococcus mutans/physiology , Adult , Anti-Bacterial Agents/pharmacology , Biofilms/drug effects , Biofilms/growth & development , Dental Caries/microbiology , Humans , Microbial Sensitivity Tests , Mouth/microbiology , Saliva/microbiology , Streptococcus mutans/pathogenicity
5.
Proc Natl Acad Sci U S A ; 112(4): 1173-8, 2015 Jan 27.
Article in English | MEDLINE | ID: mdl-25587132

ABSTRACT

Thaumarchaeota are among the most abundant microbial cells in the ocean, but difficulty in cultivating marine Thaumarchaeota has hindered investigation into the physiological and evolutionary basis of their success. We report here a closed genome assembled from a highly enriched culture of the ammonia-oxidizing pelagic thaumarchaeon CN25, originating from the open ocean. The CN25 genome exhibits strong evidence of genome streamlining, including a 1.23-Mbp genome, a high coding density, and a low number of paralogous genes. Proteomic analysis recovered nearly 70% of the predicted proteins encoded by the genome, demonstrating that a high fraction of the genome is translated. In contrast to other minimal marine microbes that acquire, rather than synthesize, cofactors, CN25 encodes and expresses near-complete biosynthetic pathways for multiple vitamins. Metagenomic fragment recruitment indicated the presence of DNA sequences >90% identical to the CN25 genome throughout the oligotrophic ocean. We propose the provisional name "Candidatus Nitrosopelagicus brevis" str. CN25 for this minimalist marine thaumarchaeon and suggest it as a potential model system for understanding archaeal adaptation to the open ocean.


Subject(s)
Archaea , Archaeal Proteins , Gene Expression Regulation, Archaeal/physiology , Proteome , Proteomics , Water Microbiology , Amino Acid Sequence , Archaea/classification , Archaea/genetics , Archaea/metabolism , Archaeal Proteins/biosynthesis , Archaeal Proteins/genetics , Metagenomics , Molecular Sequence Data , Oceans and Seas , Proteome/biosynthesis , Proteome/genetics
6.
PLoS Comput Biol ; 12(7): e1004991, 2016 07.
Article in English | MEDLINE | ID: mdl-27400380

ABSTRACT

Analyses of metagenome data (MG) and metatranscriptome data (MT) are often challenged by a paucity of complete reference genome sequences and the uneven/low sequencing depth of the constituent organisms in the microbial community, which respectively limit the power of reference-based alignment and de novo sequence assembly. These limitations make accurate protein family classification and abundance estimation challenging, which in turn hamper downstream analyses such as abundance profiling of metabolic pathways, identification of differentially encoded/expressed genes, and de novo reconstruction of complete gene and protein sequences from the protein family of interest. The profile hidden Markov model (HMM) framework enables the construction of very useful probabilistic models for protein families that allow for accurate modeling of position specific matches, insertions, and deletions. We present a novel homology detection algorithm that integrates banded Viterbi algorithm for profile HMM parsing with an iterative simultaneous alignment and assembly computational framework. The algorithm searches a given profile HMM of a protein family against a database of fragmentary MG/MT sequencing data and simultaneously assembles complete or near-complete gene and protein sequences of the protein family. The resulting program, HMM-GRASPx, demonstrates superior performance in aligning and assembling homologs when benchmarked on both simulated marine MG and real human saliva MG datasets. On real supragingival plaque and stool MG datasets that were generated from healthy individuals, HMM-GRASPx accurately estimates the abundances of the antimicrobial resistance (AMR) gene families and enables accurate characterization of the resistome profiles of these microbial communities. For real human oral microbiome MT datasets, using the HMM-GRASPx estimated transcript abundances significantly improves detection of differentially expressed (DE) genes. 
Finally, HMM-GRASPx was used to reconstruct comprehensive sets of complete or near-complete protein and nucleotide sequences for the query protein families. HMM-GRASPx is freely available online from http://sourceforge.net/projects/hmm-graspx.


Subject(s)
Computational Biology/methods , Gene Expression Profiling/methods , Metagenomics/methods , Proteins/analysis , Proteins/genetics , Algorithms , Anti-Bacterial Agents/pharmacology , Bacteria/drug effects , Bacteria/genetics , Bacteria/metabolism , Computer Simulation , Databases, Genetic , Drug Resistance, Bacterial/genetics , Humans , Metagenome/genetics , Models, Theoretical , Proteins/metabolism , Saliva/chemistry , Saliva/metabolism , Transcriptome/genetics
7.
Nucleic Acids Res ; 43(3): e18, 2015 Feb 18.
Article in English | MEDLINE | ID: mdl-25414351

ABSTRACT

Protein sequences predicted from metagenomic datasets are annotated by identifying their homologs via sequence comparisons with reference or curated proteins. However, a majority of metagenomic protein sequences are partial-length, arising as a result of identifying genes on sequencing reads or on assembled nucleotide contigs, which themselves are often very fragmented. The fragmented nature of metagenomic protein predictions adversely impacts homology detection and, therefore, the quality of the overall annotation of the dataset. Here we present a novel algorithm called GRASP that accurately identifies the homologs of a given reference protein sequence from a database consisting of partial-length metagenomic proteins. Our homology detection strategy is guided by the reference sequence, and involves the simultaneous search and assembly of overlapping database sequences. GRASP was compared to three commonly used protein sequence search programs (BLASTP, PSI-BLAST and FASTM). Our evaluations using several simulated and real datasets show that GRASP has a significantly higher sensitivity than these programs while maintaining a very high specificity. GRASP can be a very useful program for detecting and quantifying taxonomic and protein family abundances in metagenomic datasets. GRASP is implemented in GNU C++, and is freely available at http://sourceforge.net/projects/grasp-release.


Subject(s)
Peptides/chemistry , Algorithms , Databases, Protein , Metagenome , Peptides/genetics
8.
BMC Bioinformatics ; 17 Suppl 8: 283, 2016 Aug 31.
Article in English | MEDLINE | ID: mdl-27585568

ABSTRACT

BACKGROUND: Metagenomics is a cultivation-independent approach that enables the study of the genomic composition of microbes present in an environment. Metagenomic samples are routinely sequenced using next-generation sequencing technologies that generate short nucleotide reads. Proteins identified from these reads are mostly of partial length. On the other hand, de novo assembly of a large metagenomic dataset is computationally demanding and the assembled contigs are often fragmented, resulting in the identification of protein sequences that are also of partial length and incomplete. Annotation of an incomplete protein sequence often proceeds by identifying its homologs in a database of reference sequences. Identifying the homologs of incomplete sequences is a challenge and can result in substandard annotation of proteins from metagenomic datasets. To address this problem, we recently developed a homology detection algorithm named GRASP (Guided Reference-based Assembly of Short Peptides) that identifies the homologs of a given reference protein sequence in a database of short peptide metagenomic sequences. GRASP was developed to implement a simultaneous alignment and assembly algorithm for annotation of short peptides identified on metagenomic reads. The program achieves significantly improved recall rate at the cost of computational efficiency. In this article, we adopted three techniques to speed up the original version of GRASP, including the pre-construction of extension links, local assembly of individual seeds, and the implementation of query-level parallelism. RESULTS: The resulting new program, GRASPx, achieves >30X speedup compared to its predecessor GRASP. At the same time, we show that the performance of GRASPx is consistent with that of GRASP, and that both of them significantly outperform other popular homology-search tools including the BLAST and FASTA suites. 
GRASPx was also applied to a human saliva metagenome dataset and shows superior performance for both recall and precision rates. CONCLUSIONS: In this article we present GRASPx, a fast and accurate homology-search program implementing a simultaneous alignment and assembly framework. GRASPx can be used for more comprehensive and accurate annotation of short peptides. GRASPx is freely available at http://graspx.sourceforge.net/ .


Subject(s)
Algorithms , Databases, Protein , Metagenome , Metagenomics/methods , Peptides/chemistry , Sequence Alignment/methods , Sequence Homology, Amino Acid , Amino Acid Sequence , Computer Simulation , Humans
9.
Bioinformatics ; 31(11): 1833-5, 2015 Jun 01.
Article in English | MEDLINE | ID: mdl-25637561

ABSTRACT

The determination of protein sequences from a metagenomic dataset enables the study of metabolism and functional roles of the organisms that are present in the sampled microbial community. We had previously introduced an algorithm and software for the accurate reconstruction of protein sequences from short peptides identified on nucleotide reads in a metagenomic dataset. Here, we present significant computational improvements to the short peptide assembly algorithm that make it practical to reconstruct proteins from large metagenomic datasets containing several hundred million reads, while maintaining accuracy. The improved computational efficiency is achieved using a suffix array data structure that allows for fast querying during the assembly process, and a significant redesign of assembly steps that enables multi-threaded execution. AVAILABILITY AND IMPLEMENTATION: The program is available under the GPLv3 license from sourceforge.net/projects/spa-assembler.
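
The abstract above credits a suffix array with making queries fast during assembly. As a rough sketch of that idea, and not the SPA implementation, the code below indexes every suffix of a small peptide set and uses binary search to list the peptides that contain a query string. The names `build_suffix_array` and `peptides_containing` and the example peptides are hypothetical.

```python
def build_suffix_array(peptides):
    """Sort every (peptide index, offset) pair by the suffix it points to."""
    entries = [(p, i) for p, seq in enumerate(peptides) for i in range(len(seq))]
    entries.sort(key=lambda e: peptides[e[0]][e[1]:])
    return entries


def peptides_containing(query, peptides, sa):
    """Return sorted indices of the peptides that contain `query`."""
    def prefix(k):
        # First len(query) characters of the k-th sorted suffix.
        p, i = sa[k]
        return peptides[p][i:i + len(query)]

    # Binary search for the first suffix whose prefix is >= query.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) < query:
            lo = mid + 1
        else:
            hi = mid
    # All matching suffixes form one contiguous block; walk through it.
    hits = set()
    while lo < len(sa) and prefix(lo) == query:
        hits.add(sa[lo][0])
        lo += 1
    return sorted(hits)


reads = ["MKVLA", "KVLAG", "GGMKV"]  # hypothetical short peptides
sa = build_suffix_array(reads)
print(peptides_containing("KVL", reads, sa))
```

Because the index is a plain sorted array that is only read after construction, lookups of this kind can be issued from several threads at once, which fits the multi-threaded redesign the abstract mentions.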


Subject(s)
Metagenomics/methods , Peptides/chemistry , Sequence Analysis, Protein/methods , Software , Algorithms
10.
Proc Natl Acad Sci U S A ; 110(26): E2390-9, 2013 Jun 25.
Article in English | MEDLINE | ID: mdl-23754396

ABSTRACT

The "dark matter of life" describes microbes and even entire divisions of bacterial phyla that have evaded cultivation and have yet to be sequenced. We present a genome from the globally distributed but elusive candidate phylum TM6 and uncover its metabolic potential. TM6 was detected in a biofilm from a sink drain within a hospital restroom by analyzing cells using a highly automated single-cell genomics platform. We developed an approach for increasing throughput and effectively improving the likelihood of sampling rare events based on forming small random pools of single-flow-sorted cells, amplifying their DNA by multiple displacement amplification and sequencing all cells in the pool, creating a "mini-metagenome." A recently developed single-cell assembler, SPAdes, in combination with contig binning methods, allowed the reconstruction of genomes from these mini-metagenomes. A total of 1.07 Mb was recovered in seven contigs for this member of TM6 (JCVI TM6SC1), estimated to represent 90% of its genome. High nucleotide identity between a total of three TM6 genome drafts generated from pools that were independently captured, amplified, and assembled provided strong confirmation of a correct genomic sequence. TM6 is likely a Gram-negative organism and possibly a symbiont of an unknown host (nonfree living) in part based on its small genome, low-GC content, and lack of biosynthesis pathways for most amino acids and vitamins. Phylogenomic analysis of conserved single-copy genes confirms that TM6SC1 is a deeply branching phylum.


Subject(s)
Biofilms , Hospitals , Metagenome , Sanitary Engineering , Water Microbiology , Bacteria/classification , Bacteria/genetics , Bacteria/isolation & purification , DNA, Bacterial/genetics , DNA, Bacterial/isolation & purification , DNA, Bacterial/metabolism , Evolution, Molecular , Genome, Bacterial , Humans , Metabolic Networks and Pathways , Metagenomics/methods , Molecular Sequence Data , Phylogeny , Water Supply
11.
Nucleic Acids Res ; 41(8): e91, 2013 Apr.
Article in English | MEDLINE | ID: mdl-23435317

ABSTRACT

The metagenomic paradigm allows for an understanding of the metabolic and functional potential of microbes in a community via a study of their proteins. The substrate for protein identification is either the set of individual nucleotide reads generated from metagenomic samples or the set of contig sequences produced by assembling these reads. However, a read-based strategy using reads generated by next-generation sequencing (NGS) technologies, results in an overwhelming majority of partial-length protein predictions. A nucleotide assembly-based strategy does not fare much better, as metagenomic assemblies are typically fragmented and also leave a large fraction of reads unassembled. Here, we present a method for reconstructing complete protein sequences directly from NGS metagenomic data. Our framework is based on a novel short peptide assembler (SPA) that assembles protein sequences from their constituent peptide fragments identified on short reads. The SPA algorithm is based on informed traversals of a de Bruijn graph, defined on an amino acid alphabet, to identify probable paths that correspond to proteins. Using large simulated and real metagenomic data sets, we show that our method outperforms the alternate approach of identifying genes on nucleotide sequence assemblies and generates longer protein sequences that can be more effectively analysed.
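
The abstract above describes a de Bruijn graph defined on an amino acid alphabet. A minimal sketch of such a graph follows, assuming nodes are (k-1)-mers and edges are k-mers observed in the peptide fragments. The greedy walk shown here only follows unambiguous edges and stands in for the "informed traversals" of the paper, whose scoring is not described in the abstract. The function names and the example peptides are hypothetical.

```python
from collections import defaultdict


def build_de_bruijn(peptides, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it."""
    graph = defaultdict(set)
    for seq in peptides:
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def extend_unambiguous(graph, start):
    """Extend `start` while the current node has exactly one successor."""
    path, node, seen = start, start, {start}
    while len(graph.get(node, ())) == 1:
        (nxt,) = graph[node]
        if nxt in seen:  # stop rather than loop forever on a cycle
            break
        path += nxt[-1]
        seen.add(nxt)
        node = nxt
    return path


# Hypothetical overlapping peptide fragments from short reads.
fragments = ["MKVLAG", "VLAGTR", "AGTRWE"]
graph = build_de_bruijn(fragments, k=4)
print(extend_unambiguous(graph, "MKV"))
```

In real metagenomic data most nodes have several successors, so the choice of which branch to follow is where the informed traversal of the actual method would matter.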


Subject(s)
Algorithms , Metagenomics/methods , Sequence Analysis, Protein/methods , High-Throughput Nucleotide Sequencing , Peptides/chemistry , Sensitivity and Specificity
12.
NAR Genom Bioinform ; 5(1): lqad023, 2023 Mar.
Article in English | MEDLINE | ID: mdl-36915411

ABSTRACT

Metagenomics is the study of all genomic content contained in given microbial communities. Metagenomic functional analysis aims to quantify protein families and reconstruct metabolic pathways from the metagenome. It plays a central role in understanding the interaction between the microbial community and its host or environment. De novo functional analysis, which allows the discovery of novel protein families, remains challenging for high-complexity communities. There are currently three main approaches for recovering novel genes or proteins: de novo nucleotide assembly, gene calling and peptide assembly. Unfortunately, their information dependency has been overlooked, and each has been formulated as an independent problem. In this work, we develop a sophisticated workflow called integrated Metagenomic Protein Predictor (iMPP), which leverages the information dependencies for better de novo functional analysis. iMPP contains three novel modules: a hybrid assembly graph generation module, a graph-based gene calling module, and a peptide assembly-based refinement module. iMPP significantly improved the existing gene calling sensitivity on unassembled metagenomic reads, achieving a 92-97% recall rate at a high precision level (>85%). iMPP further allowed for more sensitive and accurate peptide assembly, recovering more reference proteins and delivering more hypothetical protein sequences. The high performance of iMPP can provide a more comprehensive and unbiased view of the microbial communities under investigation. iMPP is freely available from https://github.com/Sirisha-t/iMPP.

13.
BMC Bioinformatics ; 13 Suppl 3: S15, 2012 Mar 21.
Article in English | MEDLINE | ID: mdl-22536899

ABSTRACT

BACKGROUND: DNA methylation is essential for normal development and differentiation and plays a crucial role in the development of nearly all types of cancer. Aberrant DNA methylation patterns, including genome-wide hypomethylation and region-specific hypermethylation, are frequently observed and contribute to the malignant phenotype. A number of studies have recently identified distinct features of genomic sequences that can be used for modeling specific DNA sequences that may be susceptible to aberrant CpG methylation in both cancer and normal cells. Although it is now possible, using next generation sequencing technologies, to assess human methylomes at base resolution, no reports currently exist on modeling cell type-specific DNA methylation susceptibility. Thus, we conducted a comprehensive modeling study of cell type-specific DNA methylation susceptibility at three different resolutions: CpG dinucleotides, CpG segments, and individual gene promoter regions. RESULTS: Using a k-mer mixture logistic regression model, we effectively modeled DNA methylation susceptibility across five different cell types. Further, at the segment level, we achieved up to 0.75 in AUC prediction accuracy in a 10-fold cross validation study using a mixture of k-mers. CONCLUSIONS: The significance of these results is three fold: 1) this is the first report to indicate that CpG methylation susceptible "segments" exist; 2) our model demonstrates the significance of certain k-mers for the mixture model, potentially highlighting DNA sequence features (k-mers) of differentially methylated, promoter CpG island sequences across different tissue types; 3) as only 3 or 4 bp patterns had previously been used for modeling DNA methylation susceptibility, ours is the first demonstration that 6-mer modeling can be performed without loss of accuracy.
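
The abstract above models methylation susceptibility with a k-mer mixture logistic regression. The sketch below shows only the scoring side of such a model, under the assumption that "mixture" means pooling counts of k-mers of several lengths into one feature vector. The weights, bias, function names, and sequence are made up for illustration; no fitted coefficients from the study are reproduced, and training is omitted.

```python
import math
from collections import Counter


def kmer_features(seq, ks=(3, 4)):
    """Count every k-mer of each length in `ks` (pooled into one Counter)."""
    feats = Counter()
    for k in ks:
        for i in range(len(seq) - k + 1):
            feats[seq[i:i + k]] += 1
    return feats


def predict(seq, weights, bias=0.0, ks=(3, 4)):
    """Logistic-regression probability from pooled k-mer counts."""
    feats = kmer_features(seq, ks)
    # Linear score: bias plus weight times count for each known k-mer.
    z = bias + sum(weights.get(kmer, 0.0) * n for kmer, n in feats.items())
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical weights for two k-mers; every other k-mer has weight 0.
toy_weights = {"CG": 1.0, "AT": -0.5}
print(predict("ACGCG", toy_weights, ks=(2,)))
```

In practice the weights would be fitted on labelled CpG sites or segments and evaluated by cross-validation, as the abstract reports with its AUC figure.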


Subject(s)
DNA Methylation , Logistic Models , Models, Genetic , Promoter Regions, Genetic , Chromosomes, Human, Pair 21 , CpG Islands , Down Syndrome/genetics , Humans , Neoplasms/genetics
14.
Mitochondrial DNA B Resour ; 7(4): 640-641, 2022.
Article in English | MEDLINE | ID: mdl-35425856

ABSTRACT

Fungal species in the genus Trichoderma are widely used for industrial enzyme production and as biocontrol agents. In this study, we report the complete mitochondrial genome of a marine-derived Trichoderma simmonsii strain GH-Sj1, which belongs to the Harzianum clade of Trichoderma. GH-Sj1 was isolated from an edible sea alga Saccharina japonica collected from the southern coast of Korea. This newly assembled circular molecule is 28,668 bp in length and consists of 15 protein-coding genes, 26 transfer RNA genes, and two ribosomal RNA genes. Phylogenetic analysis using the maximum likelihood method shows that T. simmonsii GH-Sj1 is closely related to Trichoderma harzianum and Trichoderma lixii. To the best of our knowledge, this is the first characterization of a marine-derived mitogenome within the genus Trichoderma.

15.
J Microbiol ; 60(9): 916-927, 2022 Sep.
Article in English | MEDLINE | ID: mdl-35913594

ABSTRACT

Siboglinid tubeworms thrive in hydrothermal vent and seep habitats via a symbiotic relationship with chemosynthetic bacteria. Difficulties in culturing tubeworms and their symbionts in a laboratory setting have hindered the study of host-microbe interactions. As a result, released symbiont genomes are fragmented, which limits the genomic data available for subsequent analyses. Here, we present a complete genome of a gammaproteobacterial endosymbiont from the tubeworm Lamellibrachia satsuma collected from a seep in Kagoshima Bay, assembled using a hybrid approach that combines sequences generated from the Illumina and Oxford Nanopore platforms. The genome consists of a single circular chromosome with an assembly size of 4,323,754 bp and a GC content of 53.9% with 3,624 protein-coding genes. The genome is of high quality and contains no assembly gaps, while the completeness and contamination are 99.33% and 2.73%, respectively. Comparative genome analysis revealed a total of 1,724 gene clusters shared in the vent and seep tubeworm symbionts, while 294 genes were found exclusively in L. satsuma symbionts, such as transposons and genes for defense mechanisms and inorganic ion transport. The addition of this complete endosymbiont genome assembly would be valuable for comparative studies, particularly with tubeworm symbiont genomes as well as with other chemosynthetic microbial communities.


Subject(s)
Hydrothermal Vents , Microbiota , Polychaeta , Animals , Bacteria/genetics , Hydrothermal Vents/microbiology , Polychaeta/genetics , Polychaeta/microbiology , Symbiosis
16.
Gigascience ; 11(1)2022 01 12.
Article in English | MEDLINE | ID: mdl-35022698

ABSTRACT

BACKGROUND: The shuttles hoppfish, Periophthalmus modestus, is a mudskipper. Mudskippers are the largest group of amphibious teleost fishes and are uniquely adapted to live on mudflats. Because mudskippers can survive on land for extended periods by breathing through their skin and through the lining of the mouth and throat, they were evaluated as a model for the evolutionary sea-land transition of Devonian protoamphibians, ancestors of all present tetrapods. RESULTS: A total of 39.6, 80.2, 52.9, and 33.3 Gb of Illumina, Pacific Biosciences, 10X linked, and Hi-C data, respectively, was assembled into 1,419 scaffolds with an N50 length of 33 Mb and BUSCO score of 96.6%. The assembly covered 117% of the estimated genome size (729 Mb) and included 23 pseudo-chromosomes anchored by a Hi-C contact map, which corresponded to the top 23 longest scaffolds above 20 Mb and close to the estimated one. Of the genome, 43.8% were various repetitive elements such as DNAs, tandem repeats, long interspersed nuclear elements, and simple repeats. Ab initio and homology-based gene prediction identified 30,505 genes, of which 94% had homology to the 14 Actinopterygii transcriptomes and 89% and 85% to Pfam families and InterPro domains, respectively. Comparative genomics with 15 Actinopterygii species identified 59,448 gene families of which 12% were only in P. modestus. CONCLUSIONS: We present the high quality of the first genome assembly and gene annotation of the shuttles hoppfish. It will provide a valuable resource for further studies on sea-land transition, bimodal respiration, nitrogen excretion, osmoregulation, thermoregulation, vision, and mechanoreception.


Subject(s)
Chromosomes , Genome , Animals , Chromosomes/genetics , Genomics , Molecular Sequence Annotation , Repetitive Sequences, Nucleic Acid
17.
Bioinformatics ; 26(1): 22-9, 2010 Jan 01.
Article in English | MEDLINE | ID: mdl-19855104

ABSTRACT

MOTIVATION: Massively parallel sequencing technology can be used by small research labs to generate genome sequences of their research interest. However, annotation of genomes still relies on a manual process, which becomes a serious bottleneck for high-throughput genome projects. Automatic annotation methods have recently become more accurate, but several issues remain. One important challenge in using automatic annotation methods is to distinguish the annotation quality of ORFs or genes. The availability of such annotation quality scores for genes can reduce the human labor cost dramatically since manual inspection can focus only on genes with low annotation quality scores. RESULTS: In this article, we propose a novel annotation quality or confidence scoring scheme, called Annotation Confidence Score (ACS), using a genome comparison approach. The score is computed by combining sequence and textual annotation similarity using a modified version of a logistic curve. The most important feature of the proposed scoring scheme is that it generates a score that reflects the annotation quality of genes by automatically adjusting the number of genomes used to compute the score and their phylogenetic distance. Extensive experiments with bacterial genomes showed that the proposed scoring scheme generated scores that tracked the quality of annotation regardless of the number of reference genomes and their phylogenetic distance. AVAILABILITY: http://microbial.informatics.indiana.edu/acs


Subject(s)
Algorithms , Chromosome Mapping/methods , Genome/genetics , Sequence Alignment/methods , Sequence Analysis, DNA/methods , Base Sequence , Confidence Intervals , Data Interpretation, Statistical , Molecular Sequence Data
18.
BMC Genomics ; 11: 703, 2010 Dec 14.
Article in English | MEDLINE | ID: mdl-21156066

ABSTRACT

BACKGROUND: Horned beetles, in particular in the genus Onthophagus, are important models for studies on sexual selection, biological radiations, the origin of novel traits, developmental plasticity, biocontrol, conservation, and forensic biology. Despite their growing prominence as models for studying both basic and applied questions in biology, little genomic or transcriptomic data are available for this genus. We used massively parallel pyrosequencing (Roche 454-FLX platform) to produce a comprehensive EST dataset for the horned beetle Onthophagus taurus. To maximize sequence diversity, we pooled RNA extracted from a normalized library encompassing diverse developmental stages and both sexes. RESULTS: We used 454 pyrosequencing to sequence ESTs from all post-embryonic stages of O. taurus. Approximately 1.36 million reads assembled into 50,080 non-redundant sequences encompassing a total of 26.5 Mbp. The non-redundant sequences match over half of the genes in Tribolium castaneum, the most closely related species with a sequenced genome. Analyses of Gene Ontology annotations and biochemical pathways indicate that the O. taurus sequences reflect a wide and representative sampling of biological functions and biochemical processes. An analysis of sequence polymorphisms revealed that SNP frequency was negatively related to overall expression level and the number of tissue types in which a given gene is expressed. The most variable genes were enriched for a limited number of GO annotations whereas the least variable genes were enriched for a wide range of GO terms directly related to fitness. CONCLUSIONS: This study provides the first large-scale EST database for horned beetles, a much-needed resource for advancing the study of these organisms. Furthermore, we identified instances of gene duplications and alternative splicing, useful for future study of gene regulation, and a large number of SNP markers that could be used in population-genetic studies of O. taurus and possibly other horned beetles.


Subject(s)
Coleoptera/anatomy & histology , Coleoptera/genetics , Genes, Insect/genetics , Horns , Alternative Splicing/genetics , Animals , Base Sequence , Cluster Analysis , Databases, Genetic , Databases, Protein , Metabolic Networks and Pathways/genetics , Molecular Sequence Annotation , Phylogeny , Polymorphism, Single Nucleotide/genetics , Repetitive Sequences, Nucleic Acid/genetics , Sequence Analysis, DNA
19.
BMC Genomics ; 11: 694, 2010 Dec 07.
Article in English | MEDLINE | ID: mdl-21138572

ABSTRACT

BACKGROUND: The reptiles, characterized by both diversity and unique evolutionary adaptations, provide a comprehensive system for comparative studies of metabolism, physiology, and development. However, molecular resources for ectothermic reptiles are severely limited, hampering our ability to study the genetic basis for many evolutionarily important traits such as metabolic plasticity, extreme longevity, limblessness, venom, and freeze tolerance. Here we use massively parallel sequencing (454 GS-FLX Titanium) to generate a transcriptome of the western terrestrial garter snake (Thamnophis elegans) with two goals in mind. First, we develop a molecular resource for an ectothermic reptile; and second, we use these sex-specific transcriptomes to identify differences in the presence of expressed transcripts and potential genes of evolutionary interest. RESULTS: Using sex-specific pools of RNA (one pool for females, one pool for males) representing 7 tissue types and 35 diverse individuals, we produced 1.24 million sequence reads, which averaged 366 bp in length after cleaning. Assembly of the cleaned reads from both sexes with NEWBLER and MIRA resulted in 96,379 contigs containing 87% of the cleaned reads. Over 34% of these contigs and 13% of the singletons were annotated based on homology to previously identified proteins. From these homology assignments, additional clustering, and ORF predictions, we estimate that this transcriptome contains ~13,000 unique genes that were previously identified in other species and over 66,000 transcripts from unidentified protein-coding genes. Furthermore, we use a graph-clustering method to identify contigs linked by NEWBLER-split reads that represent divergent alleles, gene duplications, and alternatively spliced transcripts. Beyond gene identification, we identified 95,295 SNPs and 31,651 INDELs. From these sex-specific transcriptomes, we identified 190 genes that were only present in the mRNA sequenced from one of the sexes (84 female-specific, 106 male-specific), and many highly variable genes of evolutionary interest. CONCLUSIONS: This is the first large-scale, multi-organ transcriptome for an ectothermic reptile. This resource provides the most comprehensive set of EST sequences available for an individual ectothermic reptile species, increasing the number of snake ESTs 50-fold. We have identified genes that appear to be under evolutionary selection and those that are sex-specific. This resource will assist studies on gene expression and comparative genomics, and will facilitate the study of evolutionarily important traits at the molecular level.


Subject(s)
Colubridae/genetics , Gene Expression Profiling , High-Throughput Nucleotide Sequencing/methods , Sex Characteristics , Animals , Base Sequence , Cluster Analysis , Female , Gene Expression Regulation , Genome/genetics , Lizards/genetics , Major Histocompatibility Complex/genetics , Male , Molecular Sequence Annotation , Mutation/genetics , Phylogeny , RNA, Messenger/genetics , RNA, Messenger/metabolism , Sequence Analysis, DNA , Sequence Homology, Nucleic Acid , Titanium
20.
Sci Data ; 7(1): 85, 2020 03 09.
Article in English | MEDLINE | ID: mdl-32152293

ABSTRACT

Crustacean amphipods are important trophic links between primary producers and higher consumers. Although most amphipods occur in or around aquatic environments, the family Talitridae is the only family found in terrestrial and semi-terrestrial habitats. The sand-hopper Trinorchestia longiramus is a talitrid species often found in the sandy beaches of South Korea. In this study, we present the first draft genome assembly and annotation of this species. We generated ~380.3 Gb of sequencing data assembled in a 0.89 Gb draft genome. Annotation analysis estimated 26,080 protein-coding genes, with 89.9% genome completeness. Comparison with other amphipods showed that T. longiramus has 327 unique orthologous gene clusters, many of which are expanded gene families responsible for cellular transport of toxic substances, homeostatic processes, and ionic and osmotic stress tolerance. This first talitrid genome will be useful for further understanding the mechanisms of adaptation in terrestrial environments, the effects of heavy metal toxicity, as well as for studies of comparative genomic variation across amphipods.


Subject(s)
Amphipoda/genetics , Genome , Animals , Ecosystem , Genomics , Molecular Sequence Annotation , Multigene Family