Results 1 - 8 of 8
1.
NAR Genom Bioinform ; 2(2): lqaa044, 2020 Jun.
Article in English | MEDLINE | ID: mdl-32626849

ABSTRACT

Metagenomic sequencing has greatly enhanced the discovery of viral genomic sequences; however, it remains challenging to identify the host(s) of these new viruses. We developed VirHostMatcher-Net, a flexible, network-based, Markov random field framework for predicting virus-prokaryote interactions using multiple, integrated features: CRISPR sequences and alignment-free similarity measures ([Formula: see text] and WIsH). Evaluation of this method on a benchmark set of 1462 known virus-prokaryote pairs yielded host prediction accuracy of 59% and 86% at the genus and phylum levels, representing 16-27% and 6-10% improvement, respectively, over previous single-feature prediction approaches. We applied our host prediction tool to crAssphage, a human gut phage, and two metagenomic virus datasets: marine viruses and viral contigs recovered from globally distributed, diverse habitats. Host predictions were frequently consistent with those of previous studies, but more importantly, this new tool made many more confident predictions than previous tools, up to nearly 3-fold more (n > 27 000), greatly expanding the diversity of known virus-host interactions.

2.
Genome Biol ; 20(1): 266, 2019 12 04.
Article in English | MEDLINE | ID: mdl-31801606

ABSTRACT

Alignment-free methods, more time and memory efficient than alignment-based methods, have been widely used for comparing genome sequences or raw sequencing samples without assembly. However, in this study, we show that alignment-free dissimilarity calculated based on sequencing samples can be overestimated compared with the dissimilarity calculated based on their genomes, and this bias can significantly decrease the performance of the alignment-free analysis. Here, we introduce a new alignment-free tool, Alignment-Free methods Adjusted by Neural Network (Afann) that successfully adjusts this bias and achieves excellent performance on various independent datasets. Afann is freely available at https://github.com/GeniusTang/Afann.


Subject(s)
Genomics/methods , High-Throughput Nucleotide Sequencing , Neural Networks, Computer , Software , Animals , Primates/genetics , Quercus/genetics , Regression Analysis
3.
Genome Biol ; 20(1): 144, 2019 07 25.
Article in English | MEDLINE | ID: mdl-31345254

ABSTRACT

BACKGROUND: Alignment-free (AF) sequence comparison is attracting persistent interest driven by data-intensive applications. Hence, many AF procedures have been proposed in recent years, but a lack of a clearly defined benchmarking consensus hampers their performance assessment. RESULTS: Here, we present a community resource (http://afproject.org) to establish standards for comparing alignment-free approaches across different areas of sequence-based research. We characterize 74 AF methods available in 24 software tools for five research applications, namely, protein sequence classification, gene tree inference, regulatory element detection, genome-based phylogenetic inference, and reconstruction of species trees under horizontal gene transfer and recombination events. CONCLUSION: The interactive web service allows researchers to explore the performance of alignment-free tools relevant to their data types and analytical goals. It also allows method developers to assess their own algorithms and compare them with current state-of-the-art tools, accelerating the development of new, more accurate AF solutions.


Subject(s)
Sequence Analysis , Benchmarking , Gene Transfer, Horizontal , Internet , Phylogeny , Regulatory Sequences, Nucleic Acid , Sequence Alignment , Sequence Analysis, Protein , Software
4.
BMC Genomics ; 19(1): 896, 2018 Dec 10.
Article in English | MEDLINE | ID: mdl-30526482

ABSTRACT

BACKGROUND: The application of genomic data and bioinformatics for the identification of restricted or illegally sourced natural products is urgently needed. The taxonomic identity and geographic provenance of raw and processed materials have implications for sustainable-use commercial practices and are relevant to the enforcement of laws that regulate or restrict illegally harvested materials, such as timber. Improvements in genomics make it possible to capture and sequence partial-to-complete genomes from challenging tissues, such as wood and wood products. RESULTS: In this paper, we report the success of an alignment-free genome comparison method, [Formula: see text], that differentiates geographic sources of white oak (Quercus) species with a high level of accuracy using a very small amount of genomic data. The method is robust to sequencing errors and to differences between sequencing laboratories and sequencing platforms. CONCLUSIONS: This method offers an approach based on genome-scale data, rather than panels of pre-selected markers for specific taxa. The method provides a generalizable platform for the identification and sourcing of materials using a unified next generation sequencing and analysis framework.


Subject(s)
DNA, Plant/genetics , Genome, Plant , Geography , Quercus/genetics , Sequence Alignment/methods , Algorithms , High-Throughput Nucleotide Sequencing , Principal Component Analysis
5.
Front Microbiol ; 9: 711, 2018.
Article in English | MEDLINE | ID: mdl-29713314

ABSTRACT

Horizontal gene transfer (HGT) plays an important role in the evolution of microbial organisms including bacteria. Alignment-free methods based on single genome compositional information have been used to detect HGT. Currently, Manhattan and Euclidean distances based on tetranucleotide frequencies are the most commonly used alignment-free dissimilarity measures to detect HGT. By testing on simulated bacterial sequences and real data sets with known horizontally transferred genomic regions, we found that more advanced alignment-free dissimilarity measures such as CVTree and [Formula: see text] that take into account the background Markov sequences can solve HGT detection problems with significantly improved performance. We also studied the influence of different factors such as evolutionary distance between host and donor sequences, size of sliding window, and host genome composition on the performance of alignment-free methods to detect HGT. Our study showed that alignment-free methods can predict HGT accurately when host and donor genomes are in different order levels. Among all methods, CVTree with word length of 3, [Formula: see text] with word length 3, Markov order 1 and [Formula: see text] with word length 4, Markov order 1 outperform others in terms of their highest F1-score and their robustness under the influence of different factors.
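
The baseline approach named in this abstract (Manhattan or Euclidean distance between the tetranucleotide frequencies of a sliding window and those of the whole genome) can be sketched as follows. This is only an illustration under stated assumptions: the function names, the window and step sizes, and the use of the whole genome as the background are all hypothetical choices, not the authors' pipeline, and the more advanced measures the abstract favours are not implemented here.

```python
from collections import Counter
from itertools import product


def kmer_freqs(seq, k=4):
    """Relative frequency of every ACGT k-mer in seq (other words are ignored)."""
    words = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[w] for w in words)
    return [counts[w] / total if total else 0.0 for w in words]


def manhattan(p, q):
    """Sum of absolute differences between two frequency vectors."""
    return sum(abs(a - b) for a, b in zip(p, q))


def euclidean(p, q):
    """Square root of the summed squared differences."""
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5


def window_scores(genome, window=1000, step=500, k=4, dist=manhattan):
    """Score each window by its distance to the genome-wide k-mer profile.

    Windows with unusually high scores are candidates for foreign origin.
    Assumption: the whole genome serves as the background composition.
    """
    background = kmer_freqs(genome, k)
    scores = []
    for start in range(0, len(genome) - window + 1, step):
        profile = kmer_freqs(genome[start:start + window], k)
        scores.append((start, dist(profile, background)))
    return scores
```

Calling `window_scores(genome)` returns `(start, score)` pairs; passing `dist=euclidean` switches the measure. How a score threshold should be chosen is left open here, since the abstract does not state one.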

6.
Annu Rev Biomed Data Sci ; 1: 93-114, 2018 Jul.
Article in English | MEDLINE | ID: mdl-31828235

ABSTRACT

Genome and metagenome comparisons based on large amounts of next generation sequencing (NGS) data pose significant challenges for alignment-based approaches due to the huge data size and the relatively short length of the reads. Alignment-free approaches based on the counts of word patterns in NGS data do not depend on the complete genome and are generally computationally efficient. Thus, they contribute significantly to genome and metagenome comparison. Recently, novel statistical approaches have been developed for the comparison of both long and shotgun sequences. These approaches have been applied to many problems including the comparison of gene regulatory regions, genome sequences, metagenomes, binning contigs in metagenomic data, identification of virus-host interactions, and detection of horizontal gene transfers. We provide an updated review of these applications and other related developments of word-count based approaches for alignment-free sequence analysis.

7.
BMC Genomics ; 18(Suppl 6): 732, 2017 Oct 03.
Article in English | MEDLINE | ID: mdl-28984181

ABSTRACT

BACKGROUND: Alignment-free sequence comparison using counts of word patterns (grams, k-tuples) has become an active research topic due to the large amount of sequence data from the new sequencing technologies. Genome sequences are frequently modelled by Markov chains and the likelihood ratio test or the corresponding approximate $\chi^2$-statistic has been suggested to compare two sequences. However, it is not known how to best choose the word length k in such studies. RESULTS: We develop an optimal strategy to choose k by maximizing the statistical power of detecting differences between two sequences. Let the orders of the Markov chains for the two sequences be $r_1$ and $r_2$, respectively. We show through both simulations and theoretical studies that the optimal $k = \max(r_1, r_2) + 1$ for both long sequences and next generation sequencing (NGS) read data. The orders of the Markov chains may be unknown and several methods have been developed to estimate the orders of Markov chains based on both long sequences and NGS reads. We study the power loss of the statistics when the estimated orders are used. It is shown that the power loss is minimal for some of the estimators of the orders of Markov chains. CONCLUSION: Our studies provide guidelines on choosing the optimal word length for the comparison of Markov sequences.


Subject(s)
High-Throughput Nucleotide Sequencing/methods , Markov Chains , Algorithms , Genomics , Lipoprotein Lipase/genetics
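
The word-length rule stated in this abstract, k = max(r1, r2) + 1, can be applied once the two Markov orders are estimated. The sketch below assumes the Bayesian information criterion (BIC) as the order estimator; the abstract does not say which estimators were studied, so BIC is a stand-in, and the function names and the `max_order` cap are hypothetical.

```python
import math
from collections import Counter


def markov_bic(seq, r, max_order):
    """BIC of an order-r Markov chain fitted to seq (alphabet assumed to be ACGT).

    Every order is scored on the same positions (from max_order onward)
    so that the likelihoods are comparable across orders.
    """
    pair_counts = Counter()     # (context, next base) -> count
    context_counts = Counter()  # context -> count
    for i in range(max_order, len(seq)):
        context = seq[i - r:i]
        pair_counts[(context, seq[i])] += 1
        context_counts[context] += 1
    n = len(seq) - max_order
    log_lik = sum(c * math.log(c / context_counts[ctx])
                  for (ctx, _), c in pair_counts.items())
    n_params = 3 * 4 ** r  # three free probabilities per context
    return -2.0 * log_lik + n_params * math.log(n)


def estimate_order(seq, max_order=4):
    """Order in 0..max_order with the smallest BIC (an assumed estimator)."""
    return min(range(max_order + 1),
               key=lambda r: markov_bic(seq, r, max_order))


def optimal_word_length(r1, r2):
    """Word length from the abstract: k = max(r1, r2) + 1."""
    return max(r1, r2) + 1
```

For two sequences `x` and `y`, `optimal_word_length(estimate_order(x), estimate_order(y))` gives the suggested k. The estimate is only as good as the order estimator, which is the power-loss question the abstract raises.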
8.
Nucleic Acids Res ; 45(W1): W554-W559, 2017 07 03.
Article in English | MEDLINE | ID: mdl-28472388

ABSTRACT

Alignment-free genome and metagenome comparisons are increasingly important with the development of next generation sequencing (NGS) technologies. Recently developed state-of-the-art k-mer based alignment-free dissimilarity measures including CVTree, $d_2^*$ and $d_2^S$ are more computationally expensive than measures based solely on the k-mer frequencies. Here, we report a standalone software package, aCcelerated Alignment-FrEe sequence analysis (CAFE), for efficient calculation of 28 alignment-free dissimilarity measures. CAFE accepts both assembled genome sequences and unassembled NGS shotgun reads as input, and writes its output in standard PHYLIP format. In downstream analyses, CAFE can also be used to visualize the pairwise dissimilarity measures as dendrograms, heatmaps, principal coordinate analysis plots and network displays. CAFE serves as a general k-mer based alignment-free analysis platform for studying the relationships among genomes and metagenomes, and is freely available at https://github.com/younglululu/CAFE.


Subject(s)
Genomics/methods , High-Throughput Nucleotide Sequencing/methods , Software , Animals , Genome, Microbial , Internet , Metagenomics , Primates/genetics , Sequence Alignment , Vertebrates/genetics
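
To make the idea of a background-adjusted k-mer dissimilarity and a PHYLIP distance matrix concrete, here is a small sketch of a $d_2^S$-style measure. It is not CAFE's code or interface: it assumes the commonly cited form of the statistic, an order-0 (i.i.d.) background estimated from each sequence, input restricted to A, C, G and T, and hypothetical function names.

```python
import math
from collections import Counter
from itertools import product


def centered_counts(seq, k):
    """Observed minus expected k-mer counts under an order-0 background."""
    n_words = len(seq) - k + 1
    base_freq = {b: seq.count(b) / len(seq) for b in "ACGT"}
    counts = Counter(seq[i:i + k] for i in range(n_words))
    centered = {}
    for word in map("".join, product("ACGT", repeat=k)):
        expected = n_words * math.prod(base_freq[b] for b in word)
        centered[word] = counts[word] - expected
    return centered


def d2s(seq_x, seq_y, k=3):
    """d2S-style dissimilarity in [0, 1]; 0 means identical centered profiles."""
    x, y = centered_counts(seq_x, k), centered_counts(seq_y, k)
    num = norm_x = norm_y = 0.0
    for word in x:
        scale = math.hypot(x[word], y[word])
        if scale == 0.0:
            continue  # word carries no signal in either sequence
        num += x[word] * y[word] / scale
        norm_x += x[word] ** 2 / scale
        norm_y += y[word] ** 2 / scale
    if norm_x == 0.0 or norm_y == 0.0:
        return 0.5  # assumed convention for a degenerate profile
    value = 0.5 * (1.0 - num / math.sqrt(norm_x * norm_y))
    return min(1.0, max(0.0, value))  # clamp floating-point overshoot


def phylip_matrix(names, seqs, k=3):
    """Square distance matrix in PHYLIP layout (names padded to 10 characters)."""
    lines = [str(len(names))]
    for name, s in zip(names, seqs):
        row = " ".join(f"{d2s(s, t, k):.6f}" for t in seqs)
        lines.append(f"{name[:10]:<10} {row}")
    return "\n".join(lines)
```

`print(phylip_matrix(["seqA", "seqB"], [a, b]))` writes a matrix that tree-building programs reading PHYLIP distance files should accept. A higher-order Markov background, as discussed in the abstracts above, would change only `centered_counts`.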