Results 1 - 20 of 39
1.
Sci Total Environ ; 941: 173737, 2024 Sep 01.
Article in English | MEDLINE | ID: mdl-38844214

ABSTRACT

Bacterial communities in soil and the rhizosphere maintain a large collection of antibiotic resistance genes (ARGs). However, few of these ARGs and antibiotic-resistant bacteria (ARB) are well characterized under traditional farming practices. Here we compared the ARG profiles of the maize rhizosphere and bulk soil using metagenomic analysis to identify ARG dissemination, and explored the potential impact of chemical fertilization on ARB. Results showed a relatively lower abundance but higher diversity of ARGs under fertilization than under straw return. Moreover, the abundance and diversity of mobile genetic elements (MGEs) were significantly promoted by chemical fertilizer inputs in the rhizosphere compared to bulk soil. Machine learning and bipartite networks identified three bacterial genera (Pseudomonas, Bacillus and Streptomyces) as biomarkers for ARG accumulation. We therefore cultured 509 isolates belonging to these three genera from the rhizosphere and tested their antimicrobial susceptibility, and found that multi-resistance was frequently observed among Pseudomonas isolates. Assembly-based tracking showed that ARGs and four class I integrons (LR134330, LS998783, CP065848, LT883143) co-occurred among contigs from Pseudomonas sp. Chemical fertilizers may shape the resistomes of the maize rhizosphere, and the rhizosphere carried multidrug-resistant Pseudomonas isolates, which may pose a risk to animal and human health. This study adds knowledge of the effect of long-term chemical fertilization on ARG dissemination in farmland systems and provides information for decision-making in agricultural production and monitoring.


Subject(s)
Agriculture , Fertilizers , Rhizosphere , Soil Microbiology , Zea mays , Zea mays/microbiology , Agriculture/methods , Bacteria , Microbial Drug Resistance/genetics , Soil/chemistry , Bacterial Genes
2.
Microbiol Resour Announc ; 13(6): e0019824, 2024 Jun 11.
Article in English | MEDLINE | ID: mdl-38752760

ABSTRACT

We examined the dynamics of soil microbiomes under heat press disturbance from an underground coal mine fire in Centralia, PA. Here, we present metagenomic sequencing and assembly data from soil microbiomes across seven consecutive years at repeatedly sampled fire-affected sites along with unaffected reference sites.

3.
BMC Bioinformatics ; 25(1): 54, 2024 Feb 01.
Article in English | MEDLINE | ID: mdl-38302873

ABSTRACT

BACKGROUND: Transcriptome assembly from RNA-sequencing data in species without a reliable reference genome has to be performed de novo, but studies have shown that de novo methods often have inadequate ability to reconstruct transcript isoforms. We address this issue by constructing an assembly pipeline whose main purpose is to produce a comprehensive set of transcript isoforms. RESULTS: We present the de novo transcript isoform assembler ClusTrast, which takes short read RNA-seq data as input, assembles a primary assembly, clusters a set of guiding contigs, aligns the short reads to the guiding contigs, assembles each clustered set of short reads individually, and merges the primary and clusterwise assemblies into the final assembly. We tested ClusTrast on real datasets from six eukaryotic species, and showed that ClusTrast reconstructed more expressed known isoforms than any of the other tested de novo assemblers, at a moderate reduction in precision. For recall, ClusTrast was on top in the lower end of expression levels (<15th percentile) for all tested datasets, and over the entire range for almost all datasets. Reference transcripts were often (35-69% for the six datasets) reconstructed to at least 95% of their length by ClusTrast, and more than half of reference transcripts (58-81%) were reconstructed with contigs that exhibited polymorphism, measured on a subset of reliably predicted contigs. ClusTrast recall increased when using a union of assembled transcripts from more than one assembly tool as primary assembly. CONCLUSION: We suggest that ClusTrast can be a useful tool for studying isoforms in species without a reliable reference genome, in particular when the goal is to produce a comprehensive transcriptome set with polymorphic variants.


Subject(s)
High-Throughput Nucleotide Sequencing , Transcriptome , Sequence Analysis , RNA-Seq , RNA Sequence Analysis , Protein Isoforms/genetics , High-Throughput Nucleotide Sequencing/methods
4.
Mol Ecol Resour ; 24(3): e13920, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38153158

ABSTRACT

Many applications in molecular ecology require the ability to match specific DNA sequences from single- or mixed-species samples with a diagnostic reference library. Widely used methods for DNA barcoding and metabarcoding employ PCR and amplicon sequencing to identify taxa based on target sequences, but the target-specific enrichment capabilities of CRISPR-Cas systems may offer advantages in some applications. We identified 54,837 CRISPR-Cas guide RNAs that may be useful for enriching chloroplast DNA across phylogenetically diverse plant species. We tested a subset of 17 guide RNAs in vitro to enrich plant DNA strands ranging in size from diagnostic DNA barcodes of 1,428 bp to entire chloroplast genomes of 121,284 bp. We used an Oxford Nanopore sequencer to evaluate sequencing success based on both single- and mixed-species samples, which yielded mean chloroplast sequence lengths of 2,530-11,367 bp, depending on the experiment. In comparison to mixed-species experiments, single-species experiments yielded more on-target sequence reads and greater mean pairwise identity between contigs and the plant species' reference genomes. Nevertheless, the mixed-species experiments yielded sufficient data to provide a ≥48-fold increase in sequence length and better estimates of relative abundance for a commercially prepared mixture of plant species compared to DNA metabarcoding based on the chloroplast trnL-P6 marker. Prior work developed CRISPR-based enrichment protocols for long-read sequencing, and our experiments pioneered their use for plant DNA barcoding and chloroplast assemblies, which may have advantages over workflows that require PCR and short-read sequencing. Future work would benefit from continuing to develop in vitro and in silico methods for CRISPR-based analyses of mixed-species samples, especially when the appropriate reference genomes for contig assembly cannot be known a priori.


Subject(s)
Biodiversity , CRISPR-Cas Systems Guide RNA , DNA Sequence Analysis/methods , Taxonomic DNA Barcoding/methods , Plant DNA , High-Throughput Nucleotide Sequencing/methods
5.
Biology (Basel) ; 12(8), 2023 Aug 05.
Article in English | MEDLINE | ID: mdl-37626980

ABSTRACT

Viromes of Chinese narcissus flowers were explored using transcriptome data from 20 samples collected at different flower development stages. Quality-controlled raw data underwent de novo assembly, resulting in 5893 viral contigs that matched seven virus species. The most abundant viruses were narcissus common latent virus (NCLV), narcissus yellow stripe virus (NYSV), and narcissus mottling-associated virus (NMaV). As flower development advanced, white tepal plants showed an increase in the proportion of viral reads, while the variation in viral proportion among yellow tepal plants was relatively small. Narcissus degeneration virus (NDV) dominated the white tepal samples, whereas NDV and NYSV prevailed in the yellow tepal samples. Potyviruses, particularly NDV, are the primary infectious viruses. De novo assembly generated viral contigs for five viruses, yielding complete genomes for NCLV, NDV, narcissus late season yellow virus (NLSYV), and NYSV. Phylogenetic analysis revealed genetic diversity, with distinct NCLV, NMaV, NDV, NLSYV, and NYSV groups. This study provides valuable insights into the viromes and genetic diversity of viruses in Chinese narcissus flowers.

6.
Microbiome ; 11(1): 186, 2023 Aug 19.
Article in English | MEDLINE | ID: mdl-37596696

ABSTRACT

BACKGROUND: Exploring metagenomic contigs and "binning" them into metagenome-assembled genomes (MAGs) are essential for the delineation of functional and evolutionary guilds within microbial communities. Despite the advances in automated binning algorithms, their capabilities in recovering MAGs with accuracy and biological relevance are so far limited. Researchers often find that human involvement is necessary to achieve representative binning results. This manual process however is expertise demanding and labor intensive, and it deserves to be supported by software infrastructure. RESULTS: We present BinaRena, a comprehensive and versatile graphic interface dedicated to aiding human operators to explore metagenome assemblies via customizable visualization and to associate contigs with bins. Contigs are rendered as an interactive scatter plot based on various data types, including sequence metrics, coverage profiles, taxonomic assignments, and functional annotations. Various contig-level operations are permitted, such as selection, masking, highlighting, focusing, and searching. Binning plans can be conveniently edited, inspected, and compared visually or using metrics including silhouette coefficient and adjusted Rand index. Completeness and contamination of user-selected contigs can be calculated in real time. In demonstration of BinaRena's usability, we show that it facilitated biological pattern discovery, hypothesis generation, and bin refinement in a complex tropical peatland metagenome. It enabled isolation of pathogenic genomes within closely related populations from the gut microbiota of diarrheal human subjects. It significantly improved overall binning quality after curating results of automated binners using a simulated marine dataset. 
CONCLUSIONS: BinaRena is an installation-free, dependency-free, client-end web application that operates directly in any modern web browser, facilitating ease of deployment and accessibility for researchers of all skill levels. The program is hosted at https://github.com/qiyunlab/binarena, together with documentation, tutorials, example data, and a live demo. It effectively supports human researchers in intuitive interpretation and fine tuning of metagenomic data.


Subject(s)
Metagenome , Microbiota , Humans , Metagenome/genetics , Microbiota/genetics , Algorithms , Biological Evolution , Diarrhea
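
The BinaRena abstract mentions comparing binning plans with the adjusted Rand index. As a rough illustration of that metric only, and not of BinaRena's own code, the following Python sketch computes it for two assignments of the same contigs to bins. The function name and the list-of-labels input are assumptions made for this example.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two bin assignments of the same contigs.

    labels_a[i] and labels_b[i] are the bin labels given to contig i by two
    binning plans (hypothetical input format chosen for this sketch).
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both binning plans must label the same contigs")
    total_pairs = comb(len(labels_a), 2)
    if total_pairs == 0:
        return 1.0  # zero or one contig: the plans trivially agree

    # Pairs of contigs placed together in both plans, in plan A, and in plan B.
    together_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    together_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    together_b = sum(comb(c, 2) for c in Counter(labels_b).values())

    expected = together_a * together_b / total_pairs  # agreement expected by chance
    maximum = (together_a + together_b) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. every contig in its own bin in both plans
    return (together_both - expected) / (maximum - expected)
```

A value of 1 means the two plans group contigs identically (bin names do not matter), values near 0 mean agreement no better than chance, and negative values mean less agreement than chance.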
7.
Microbiol Resour Announc ; 12(9): e0047923, 2023 Sep 19.
Article in English | MEDLINE | ID: mdl-37526435

ABSTRACT

The genome of Pseudomonas monsensis strain SARCC-3054 was sequenced after the strain was confirmed as a potential plant growth-promoting rhizobacterium in both in vitro and in vivo assays. The 6.3 Mb genome has a GC content of 60.2% and is divided into 59 contigs that contain several plant-beneficial genes and proteins.
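
The GC content quoted above is a simple summary statistic of an assembly. A minimal Python sketch of how such a value can be computed over a set of contigs follows; the function name and the choice to ignore ambiguous bases such as N are assumptions of this example, not details taken from the announcement.

```python
def gc_content(contigs):
    """Percent G+C across all contigs, ignoring ambiguous bases such as N."""
    gc = at = 0
    for seq in contigs:
        s = seq.upper()
        gc += s.count("G") + s.count("C")
        at += s.count("A") + s.count("T")
    if gc + at == 0:
        raise ValueError("no unambiguous bases found")
    return 100.0 * gc / (gc + at)
```

Pooling the counts over all contigs before dividing weights each contig by its length, which is what a genome-wide figure normally reports.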

8.
Mol Ecol Resour ; 2023 Aug 01.
Article in English | MEDLINE | ID: mdl-37526650

ABSTRACT

Identifying sex-linked markers in genomic datasets is important because their presence in supposedly neutral autosomal datasets can result in incorrect estimates of genetic diversity, population structure and parentage. However, detecting sex-linked loci can be challenging, and available scripts neglect some categories of sex-linked variation. Here, we present new R functions to (1) identify and separate sex-linked loci in ZW and XY sex determination systems and (2) infer the genetic sex of individuals based on these loci. We tested these functions on genomic data for two bird and one mammal species and compared the biological inferences made before and after removing sex-linked loci using our function. We found that our function identified autosomal loci with ≥98.8% accuracy and sex-linked loci with an average accuracy of 87.8%. We showed that standard filters, such as low read depth and call rate, failed to remove up to 54.7% of sex-linked loci. This led to (i) overestimation of population FIS by up to 24%, and the number of private alleles by up to 8%; (ii) wrongly inferring significant sex differences in heterozygosity; (iii) obscuring genetic population structure and (iv) inferring ~11% fewer correct parentages. We discuss how failure to remove sex-linked markers can lead to incorrect biological inferences (e.g. sex-biased dispersal and cryptic population structure) and misleading management recommendations. For reduced-representation datasets with at least 15 known-sex individuals of each sex, our functions offer convenient resources to remove sex-linked loci and to sex the remaining individuals (freely available at https://github.com/drobledoruiz/conservation_genomics).

9.
Int J Mol Sci ; 24(8), 2023 Apr 15.
Article in English | MEDLINE | ID: mdl-37108472

ABSTRACT

Root-lesion nematodes (genus Pratylenchus) belong to a diverse group of plant-parasitic nematodes (PPN) with a worldwide distribution. Despite being an economically important PPN group of more than 100 species, genome information related to Pratylenchus genus is scarcely available. Here, we report the draft genome assembly of Pratylenchus scribneri generated on the PacBio Sequel IIe System using the ultra-low DNA input HiFi sequencing workflow. The final assembly created using 500 nematodes consisted of 276 decontaminated contigs, with an average contig N50 of 1.72 Mb and an assembled draft genome size of 227.24 Mb consisting of 51,146 predicted protein sequences. The benchmarking universal single-copy ortholog (BUSCO) analysis with 3131 nematode BUSCO groups indicated that 65.4% of the BUSCOs were complete, whereas 24.0%, 41.4%, and 1.8% were single-copy, duplicated, and fragmented, respectively, and 32.8% were missing. The outputs from GenomeScope2 and Smudgeplots converged towards a diploid genome for P. scribneri. The data provided here will facilitate future studies on host plant-nematode interactions and crop protection at the molecular level.


Subject(s)
Parasites , Tylenchoidea , Animals , Molecular Sequence Annotation , DNA Sequence Analysis , Genome , Base Sequence , Tylenchoidea/genetics , Parasites/genetics
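
The contig N50 reported in the Pratylenchus scribneri abstract is a standard contiguity statistic: the length of the contig at which the running total of contig lengths, taken from longest to shortest, first reaches half of the assembly size. The Python sketch below illustrates the definition; the function name is an assumption, and the input lengths in the usage note are made-up numbers, not data from the paper.

```python
def n50(contig_lengths):
    """Length L such that contigs of length >= L cover at least half the assembly."""
    if not contig_lengths:
        raise ValueError("at least one contig length is required")
    ordered = sorted(contig_lengths, reverse=True)
    half_assembly = sum(ordered) / 2
    running_total = 0
    for length in ordered:
        running_total += length
        if running_total >= half_assembly:
            return length
```

For example, for hypothetical contig lengths 80, 70, 50, 40, 30 and 20, the assembly size is 290, and the two longest contigs already cover 150, so the N50 is 70.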
10.
BioData Min ; 16(1): 13, 2023 Mar 27.
Article in English | MEDLINE | ID: mdl-36973746

ABSTRACT

MOTIVATION: Clustering of genetic sequences is one of the key parts of bioinformatics analyses. The resulting phylogenetic trees help answer many research questions, including tracing the history of species, studying past migration, or tracing the source of a virus outbreak. At the same time, biologists increasingly provide data as raw reads or only as contig-level assemblies. Therefore, tools that can process such data without supervision need to be developed. RESULTS: In this paper, we present a tool for reference-free phylogeny capable of handling data for which no mature-level assembly is available. The tool allows distance calculation for raw reads, contigs, and a combination of the two. The tool provides an estimate of the Levenshtein distance between the sequences, which in turn estimates the number of mutations between the organisms. Compared to previous research, the novelty of the method lies in a newly proposed combination of the read and contig measures, a new method for read-contig mapping, and an efficient embedding of contigs.
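
The tool described above estimates the Levenshtein distance between sequences. For orientation, the quantity being estimated can be computed exactly with the classic dynamic-programming recurrence sketched below in Python. This is the textbook algorithm, not the estimation method of the paper, and it is only practical for short sequences because it takes time proportional to the product of the two lengths.

```python
def levenshtein(a, b):
    """Exact edit distance (insertions, deletions, substitutions) between a and b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter sequence in the inner loop to save memory
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution = previous[j - 1] + (0 if char_a == char_b else 1)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]
```

The quadratic cost of this exact computation is one reason why estimation or embedding approaches are attractive for genome-scale data.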

11.
Genome Biol ; 23(1): 242, 2022 Nov 14.
Article in English | MEDLINE | ID: mdl-36376928

ABSTRACT

Evaluating the quality of metagenomic assemblies is important for constructing reliable metagenome-assembled genomes and downstream analyses. Here, we present metaMIC (https://github.com/ZhaoXM-Lab/metaMIC), a machine learning-based tool for identifying and correcting misassemblies in metagenomic assemblies. Benchmarking results on both simulated and real datasets demonstrate that metaMIC outperforms existing tools when identifying misassembled contigs. Furthermore, metaMIC is able to localize the misassembly breakpoints, and the correction of misassemblies by splitting at misassembly breakpoints can improve downstream scaffolding and binning results.


Subject(s)
Metagenome , Metagenomics , DNA Sequence Analysis/methods , Metagenomics/methods , Machine Learning , Benchmarking , Software , Algorithms
12.
J Comput Biol ; 29(12): 1357-1376, 2022 Dec.
Article in English | MEDLINE | ID: mdl-36367700

ABSTRACT

Metagenomics enables the recovery of various genetic materials from different species, thus providing valuable insights into microbial communities. Metagenomic binning groups sequences that belong to different organisms, which is an important step in the early stages of metagenomic analysis pipelines. The classic pipeline followed in metagenomic binning is to assemble short reads into longer contigs and then bin the resulting contigs into groups representing different taxonomic groups in the metagenomic sample. Most of the currently available binning tools are designed to bin metagenomic contigs, but they do not make use of the assembly graphs that produce such assemblies. In this study, we propose MetaCoAG, a metagenomic binning tool that uses assembly graphs with the composition and coverage information of contigs. MetaCoAG estimates the number of initial bins using single-copy marker genes, assigns contigs to bins iteratively, and adjusts the number of bins dynamically throughout the binning process. We show that MetaCoAG significantly outperforms state-of-the-art binning tools by producing similar or more high-quality bins than the second-best binning tool on both simulated and real datasets. To the best of our knowledge, MetaCoAG is the first stand-alone contig-binning tool that directly makes use of the assembly graph information along with other features of the contigs.


Subject(s)
Metagenomics , Microbiota , Metagenomics/methods , Metagenome/genetics , Microbiota/genetics , Algorithms , DNA Sequence Analysis/methods
13.
Biosci Biotechnol Biochem ; 86(6): 693-703, 2022 May 24.
Article in English | MEDLINE | ID: mdl-35425950

ABSTRACT

These days, for bacterial genome sequence determination, ultralong reads with homopolymer errors are used in combination with short reads, resulting in genomic sequences with possibly incorrect uniformity of repeat sequences. We have been determining complete bacterial genomic sequences based on NGS short reads and Newbler assembly by utilizing functions implemented in three software tools (GenoFinisher, AceFileViewer, and ShortReadManager) without conducting additional experiments for gap closing, proving the concept that NGS short reads contain enough information to determine complete genome sequences. Although some manual in silico tasks are still required, they will ultimately be handled in a single pipeline. In this review, we describe the tools and implemented ideas that have enabled complete sequence determination solely based on short reads, which would be useful for establishing the basis for the future development of a short-read-based assembler that enables complete and accurate genome sequence determination at a lower cost.


Subject(s)
Bacterial Genome , High-Throughput Nucleotide Sequencing , Genomics/methods , High-Throughput Nucleotide Sequencing/methods , DNA Sequence Analysis/methods , Software
14.
Funct Integr Genomics ; 22(2): 171-178, 2022 Apr.
Article in English | MEDLINE | ID: mdl-34997394

ABSTRACT

Genome-wide oil biosynthesis was explored by de novo sequencing of two cultivated olive tree (Olea europaea) varieties (cv. Ayvalik and Picual). This is the first report of sequencing of the former variety. As outgroups, raw reads of cv. Leccino and a scaffold-level assembly of cv. Farga were also retrieved. Each of these four cultivars was assembled at chromosome scale into 23 pseudochromosomes, with sizes of 1.31 Gbp (Farga), 0.93 Gbp (Ayvalik), 0.7 Gbp (Picual), and 0.54 Gbp (Leccino). Ab initio gene finding was performed on these assemblies, using wild olive tree (oleaster)-trained parameters. High numbers of gene models were predicted and anchored to the pseudochromosomes: 69,028 (Ayvalik), 55,073 (Picual), 63,785 (Farga), and 40,449 (Leccino). Using previously reported oil biosynthesis genes from the wild olive tree genome project, the following numbers of homologous sequences were identified: 1,355 (Ayvalik), 1,269 (Farga), 812 (Leccino), and 774 (Picual). Of these, 358 sequences were shared by all cultivars. In addition, some sequences were unique to a cultivar: Ayvalik (126), Farga (118), Leccino (46), and Picual (52). These putative sequences were assigned to various GO terms, ranging from lipid metabolism to stress tolerance, signal transduction, development, and many others, implying that oil biosynthesis is synergistically regulated with the involvement of various other pathways.


Subject(s)
Olea , Olea/genetics
15.
BMC Bioinformatics ; 22(Suppl 12): 315, 2022 Jan 20.
Article in English | MEDLINE | ID: mdl-35045830

ABSTRACT

BACKGROUND: Metagenomics technology can directly extract microbial genetic material from environmental samples to obtain sequencing reads, which can be further assembled into contigs by assembly tools. Contig clustering methods are subsequently applied to recover complete genomes from environmental samples. The main problem with current clustering methods is that they cannot recover more high-quality genes from complex environments. First, there are multiple strains of the same species, resulting in the assembly of chimeras. Second, different strains of the same species are difficult to classify. Third, it is difficult to determine the number of strains during the clustering process. RESULTS: In view of the shortcomings of current clustering methods, we propose an unsupervised clustering method that can improve the ability to recover genes from complex environments, and a new method for selecting the number of strains in a sample during the clustering process. Sequence composition characteristics (tetranucleotide frequency) and co-abundance are combined to train the probability model for clustering. A new recursive method that can continuously reduce the complexity of the samples is proposed to improve the ability to recover genes from complex environments. The new clustering method was tested on both simulated and real metagenomic datasets, and compared with five state-of-the-art methods: CONCOCT, Maxbin2.0, MetaBAT, MyCC and COCACOLA. In terms of the number and quality of genes recovered from metagenomic datasets, the results show that our proposed method is more effective. CONCLUSIONS: A new contig clustering method is proposed, which can recover more high-quality genes from complex environmental samples.


Subject(s)
Algorithms , Metagenomics , Cluster Analysis , Metagenome , DNA Sequence Analysis
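
The abstract above names tetranucleotide frequency as the sequence composition feature used for clustering. The Python sketch below shows one simple way to turn a contig into such a feature vector. It is an illustration under assumptions: the function name is hypothetical, windows containing ambiguous bases are skipped, and reverse-complement tetramers are not merged, although many binning tools do merge them.

```python
from collections import Counter
from itertools import product


def tetranucleotide_frequencies(contig, k=4):
    """Relative frequency of every k-mer (k=4 by default) over A, C, G, T in a contig."""
    seq = contig.upper()
    valid = set("ACGT")
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= valid:  # skip windows containing N or other ambiguity codes
            counts[kmer] += 1
    all_kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts.values())
    if total == 0:
        return {kmer: 0.0 for kmer in all_kmers}
    return {kmer: counts[kmer] / total for kmer in all_kmers}
```

Each contig then becomes a 256-dimensional vector that can be combined with per-sample coverage (co-abundance) values before clustering.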
16.
BMC Bioinformatics ; 22(1): 304, 2021 Jun 05.
Article in English | MEDLINE | ID: mdl-34090332

ABSTRACT

BACKGROUND: The detection of genome variants, including point mutations, indels and structural variants, is a fundamental and challenging computational problem. We address here the problem of variant detection between two deep-sequencing (DNA-seq) samples, such as two human samples from an individual patient, or two samples from distinct bacterial strains. The preferred strategy in such a case is to align each sample to a common reference genome, collect all variants and compare these variants between samples. Such mapping-based protocols have several limitations. DNA sequences with large indels, aggregated mutations and structural variants are hard to map to the reference. Furthermore, DNA sequences cannot be mapped reliably to genomic low complexity regions and repeats. RESULTS: We introduce 2-kupl, a k-mer based, mapping-free protocol to detect variants between two DNA-seq samples. On simulated and actual data, 2-kupl achieves higher accuracy than other mapping-free protocols. Applying 2-kupl to prostate cancer whole exome sequencing data, we identify a number of candidate variants in hard-to-map regions and propose potential novel recurrent variants in this disease. CONCLUSIONS: We developed a mapping-free protocol for variant calling between matched DNA-seq samples. Our protocol is suitable for variant detection in unmappable genome regions or in the absence of a reference genome.


Subject(s)
Genomics , High-Throughput Nucleotide Sequencing , DNA , Human Genome , Humans , DNA Sequence Analysis
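
The core idea behind a k-mer based, mapping-free comparison such as the one described for 2-kupl is that a variant creates k-mers that occur in one sample but not in the other. The Python sketch below illustrates only that idea; it is not the 2-kupl algorithm. The function names, the tiny default k, and the `min_count` threshold are all assumptions for this example, and real tools typically use larger k, canonical k-mers, and further steps to assemble the specific k-mers into contigs.

```python
from collections import Counter


def kmer_counts(reads, k):
    """Count every k-mer occurring in a collection of reads."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def sample_specific_kmers(case_reads, control_reads, k=5, min_count=2):
    """k-mers seen at least min_count times in the case sample and never in the control."""
    case = kmer_counts(case_reads, k)
    control = kmer_counts(control_reads, k)
    return {kmer for kmer, n in case.items() if n >= min_count and kmer not in control}
```

A single substitution in the case sample produces up to k new overlapping k-mers, which is why case-specific k-mers cluster around variant sites. The `min_count` filter is a crude guard against k-mers created by sequencing errors.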
17.
Front Microbiol ; 12: 664560, 2021.
Article in English | MEDLINE | ID: mdl-34093479

ABSTRACT

Metagenomes can be considered as mixtures of viral, bacterial, and eukaryotic DNA sequences. Mining viral sequences from metagenomes could provide insight into virus-host relationships and expand viral databases. Current alignment-based methods are unsuitable for identifying viral sequences from metagenome sequences because most assembled metagenomic contigs are short and possess few or no predicted genes, and most metagenomic viral genes are dissimilar to known viral genes. In this study, I developed a Markov model-based method, VirMC, to identify viral sequences from metagenomic data. VirMC uses Markov chains to model sequence signatures and constructs a scoring model using a likelihood test to distinguish viral and bacterial sequences. Compared with two other state-of-the-art viral sequence-prediction methods, VirFinder and PPR-Meta, my proposed method outperformed VirFinder and performed similarly to PPR-Meta for short contigs with length less than 400 bp. VirMC outperformed VirFinder and PPR-Meta for identifying viral sequences in metagenomic samples contaminated with eukaryotic sequences. VirMC showed better performance in assembling viral-genome sequences from metagenomic data (based on filtering potential bacterial reads). Applying VirMC to human gut metagenomes from healthy subjects and patients with type-2 diabetes (T2D) revealed that viral contigs could help classify healthy and diseased statuses. This alignment-free method complements gene-based alignment approaches and will significantly improve the precision of viral sequence identification.
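
To make the idea of a Markov chain scoring model with a likelihood test more concrete, here is a generic Python sketch of a log-likelihood ratio between two fixed-order Markov models, one trained on viral and one on bacterial sequences. It illustrates the general technique only. The abstract does not give VirMC's actual model order, smoothing, or decision rule, so the function names, `order=2`, and the add-one pseudocount are assumptions of this example.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"


def train_markov(sequences, order=2, pseudocount=1.0):
    """Return a function giving log P(next base | previous `order` bases)."""
    counts = defaultdict(lambda: defaultdict(float))
    valid = set(ALPHABET)
    for seq in sequences:
        seq = seq.upper()
        for i in range(order, len(seq)):
            context, nxt = seq[i - order:i], seq[i]
            if set(context + nxt) <= valid:
                counts[context][nxt] += 1

    def log_prob(context, nxt):
        row = counts.get(context, {})
        total = sum(row.values()) + pseudocount * len(ALPHABET)
        return math.log((row.get(nxt, 0.0) + pseudocount) / total)

    return log_prob


def log_likelihood_ratio(seq, viral_model, bacterial_model, order=2):
    """Positive scores favour the viral model, negative scores the bacterial model."""
    seq = seq.upper()
    valid = set(ALPHABET)
    score = 0.0
    for i in range(order, len(seq)):
        context, nxt = seq[i - order:i], seq[i]
        if set(context + nxt) <= valid:
            score += viral_model(context, nxt) - bacterial_model(context, nxt)
    return score
```

In practice the two models would be trained on large reference collections, and a score threshold would be chosen on held-out data; the `order` passed to `log_likelihood_ratio` must match the one used for training.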

18.
Algorithms Mol Biol ; 16(1): 3, 2021 May 04.
Article in English | MEDLINE | ID: mdl-33947431

ABSTRACT

BACKGROUND: Metagenomic sequencing allows us to study the structure, diversity and ecology of microbial communities without the necessity of obtaining pure cultures. In many metagenomics studies, the reads obtained from metagenomic sequencing are first assembled into longer contigs, and these contigs are then binned into clusters in which all contigs are expected to come from the same species. As different species may share common sequences in their genomes, one assembled contig may belong to multiple species. However, existing tools for binning contigs only support non-overlapped binning, i.e., each contig is assigned to at most one bin (species). RESULTS: In this paper, we introduce GraphBin2, which refines the binning results obtained from existing tools and, more importantly, is able to assign contigs to multiple bins. GraphBin2 uses the connectivity and coverage information from assembly graphs to adjust existing binning results on contigs and to infer contigs shared by multiple species. Experimental results on both simulated and real datasets demonstrate that GraphBin2 not only improves the binning results of existing tools but also supports assigning contigs to multiple bins. CONCLUSION: GraphBin2 incorporates the coverage information into the assembly graph to refine the binning results obtained from existing binning tools. GraphBin2 also enables the detection of contigs that may belong to multiple species. We show that GraphBin2 outperforms its predecessor GraphBin on both simulated and real datasets. GraphBin2 is freely available at https://github.com/Vini2/GraphBin2.

19.
Front Microbiol ; 11: 567769, 2020.
Article in English | MEDLINE | ID: mdl-33304326

ABSTRACT

Phages are viruses that infect bacteria. Phages can be classified into two categories based on their lifestyles: temperate and lytic. Metavirome sequencing can now generate a large number of fragments of viral genomic sequences from an entire environmental community, which makes it impossible to determine their lifestyles through experiments. Thus, there is a need to develop computational methods for annotating phage contigs and predicting their lifestyles. Alignment-based methods for classifying phage lifestyle are limited by incompletely assembled genomes and nucleotide databases. Alignment-free methods based on the frequencies of k-mers have been widely used for genome and metagenome comparison and do not rely on the completeness of genomes or nucleotide databases. To mimic fragmented metagenomic sequences, the genomic sequences of temperate and lytic phages were split into non-overlapping fragments of different lengths; I then comprehensively compared nine alignment-free dissimilarity measures with a wide range of choices of k-mer length and Markov order for predicting the lifestyles of these phage contigs. The dissimilarity measure d2S performed better than the other dissimilarity measures for classifying the lifestyles of phages. Thus, I propose that the alignment-free method d2S can be used for predicting the lifestyles of phages derived from metagenomic data.
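
For readers unfamiliar with the dissimilarity measure named in this abstract, the block below gives the form in which it is usually stated in the alignment-free sequence comparison literature. It is supplied here as background and is not taken from the abstract itself. The notation is an assumption of this note: X_w and Y_w are the counts of k-mer w in two sequences of lengths n and m, and p^X_w and p^Y_w are the probabilities of w under a background Markov model fitted to each sequence.

```latex
\[
\tilde{X}_w = X_w - (n - k + 1)\,p^{X}_w, \qquad
\tilde{Y}_w = Y_w - (m - k + 1)\,p^{Y}_w
\]
\[
D_2^{S} = \sum_{w \in \mathcal{A}^k}
  \frac{\tilde{X}_w \tilde{Y}_w}{\sqrt{\tilde{X}_w^{2} + \tilde{Y}_w^{2}}}
\]
\[
d_2^{S} = \frac{1}{2}\left( 1 -
  \frac{D_2^{S}}
       {\sqrt{\sum_{w} \tilde{X}_w^{2} \big/ \sqrt{\tilde{X}_w^{2} + \tilde{Y}_w^{2}}}\;
        \sqrt{\sum_{w} \tilde{Y}_w^{2} \big/ \sqrt{\tilde{X}_w^{2} + \tilde{Y}_w^{2}}}}
\right)
\]
```

The measure ranges from 0 to 1, and the Markov order mentioned in the abstract enters through the background probabilities p_w.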

20.
Microbiome ; 8(1): 48, 2020 Apr 03.
Article in English | MEDLINE | ID: mdl-32245390

ABSTRACT

BACKGROUND: Metagenomics is revolutionizing the study of microorganisms and their involvement in biological, biomedical, and geochemical processes, allowing us to investigate by direct sequencing a tremendous diversity of organisms without the need for prior cultivation. Unicellular eukaryotes play essential roles in most microbial communities as chief predators, decomposers, phototrophs, bacterial hosts, symbionts, and parasites to plants and animals. Investigating their roles is therefore of great interest to ecology, biotechnology, human health, and evolution. However, the generally lower sequencing coverage, their more complex gene and genome architectures, and a lack of eukaryote-specific experimental and computational procedures have kept them on the sidelines of metagenomics. RESULTS: MetaEuk is a toolkit for high-throughput, reference-based discovery, and annotation of protein-coding genes in eukaryotic metagenomic contigs. It performs fast searches with 6-frame-translated fragments covering all possible exons and optimally combines matches into multi-exon proteins. We used a benchmark of seven diverse, annotated genomes to show that MetaEuk is highly sensitive even under conditions of low sequence similarity to the reference database. To demonstrate MetaEuk's power to discover novel eukaryotic proteins in large-scale metagenomic data, we assembled contigs from 912 samples of the Tara Oceans project. MetaEuk predicted >12,000,000 protein-coding genes in 8 days on ten 16-core servers. Most of the discovered proteins are highly diverged from known proteins and originate from very sparsely sampled eukaryotic supergroups. CONCLUSION: The open-source (GPLv3) MetaEuk software (https://github.com/soedinglab/metaeuk) enables large-scale eukaryotic metagenomics through reference-based, sensitive taxonomic and functional annotation.


Subject(s)
Algorithms , Eukaryota/genetics , Metagenomics/methods , Microbiota , Molecular Sequence Annotation/methods , Computational Biology/methods , Genetic Databases , High-Throughput Screening Assays , Metagenome , Metagenomics/instrumentation , DNA Sequence Analysis/methods
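
The MetaEuk abstract refers to searches with 6-frame-translated fragments. As a minimal illustration of what six-frame translation means, and not of MetaEuk's implementation, the Python sketch below translates a DNA sequence in the three forward and three reverse-complement reading frames using the standard genetic code. The function names and the frame labels (`+1` to `-3`) are assumptions for this example.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence (unknown characters are left as-is)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons only; codons with ambiguous bases become X."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def six_frame_translations(seq):
    """Map frame labels (+1, +2, +3, -1, -2, -3) to translated protein strings."""
    seq = seq.upper()
    frames = {}
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(strand_seq[offset:])
    return frames
```

Stop codons are rendered as `*`, so splitting each translated frame on `*` yields the stop-free fragments that can serve as candidate exon pieces in a protein-level search.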