Results 1 - 20 of 24
2.
BMC Bioinformatics ; 25(1): 15, 2024 Jan 11.
Article in English | MEDLINE | ID: mdl-38212694

ABSTRACT

BACKGROUND: Long reads have gained popularity in the analysis of metagenomics data. Therefore, we comprehensively assessed metagenomics classification tools on the species taxonomic level. We analysed kmer-based tools, mapping-based tools and two general-purpose long reads mappers. We evaluated more than 20 pipelines which use either nucleotide or protein databases and selected 13 for an extensive benchmark. We prepared seven synthetic datasets to test various scenarios, including the presence of a host, unknown species and related species. Moreover, we used available sequencing data from three well-defined mock communities, including a dataset with abundance varying from 0.0001 to 20% and six real gut microbiomes. RESULTS: General-purpose mappers Minimap2 and Ram achieved similar or better accuracy on most testing metrics than best-performing classification tools. They were up to ten times slower than the fastest kmer-based tools requiring up to four times less RAM. All tested tools were prone to report organisms not present in datasets, except CLARK-S, and they underperformed in the case of the high presence of the host's genetic material. Tools which use a protein database performed worse than those based on a nucleotide database. Longer read lengths made classification easier, but due to the difference in read length distributions among species, the usage of only the longest reads reduced the accuracy. The comparison of real gut microbiome datasets shows similar abundance profiles for the same type of tools but discordance in the number of reported organisms and abundances between types. Most assessments showed the influence of database completeness on the reports. CONCLUSION: The findings indicate that kmer-based tools are well-suited for rapid analysis of long reads data. However, when heightened accuracy is essential, mappers demonstrate slightly superior performance, albeit at a considerably slower pace. Nevertheless, a combination of diverse categories of tools and databases will likely be necessary to analyse complex samples. Discrepancies observed among tools when applied to real gut datasets, as well as a reduced performance in cases where unknown species or a significant proportion of the host genome is present in the sample, highlight the need for continuous improvement of existing tools. Additionally, regular updates and curation of databases are important to ensure their effectiveness.


Subject(s)
High-Throughput Nucleotide Sequencing , Metagenome , DNA Sequence Analysis , Metagenomics , Protein Databases , Nucleotides
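
The k-mer-based classification idea mentioned in the abstract above can be illustrated with a toy sketch. This is not the method of any benchmarked tool; the reference sequences, the labels `species_A` and `species_B`, and the function names are all hypothetical, and a real classifier would use far larger databases and taxonomy-aware voting.

```python
from collections import Counter


def kmers(seq, k):
    """Yield every overlapping k-mer of seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_index(references, k):
    """Map each k-mer to the set of reference labels that contain it."""
    index = {}
    for label, seq in references.items():
        for kmer in kmers(seq, k):
            index.setdefault(kmer, set()).add(label)
    return index


def classify(read, index, k, min_hits=1):
    """Return the label sharing the most k-mers with the read, or None."""
    votes = Counter()
    for kmer in kmers(read, k):
        for label in index.get(kmer, ()):
            votes[label] += 1
    if not votes:
        return None  # no k-mer matched: the read stays unclassified
    label, hits = votes.most_common(1)[0]
    return label if hits >= min_hits else None


# Hypothetical toy references, invented for illustration only.
references = {
    "species_A": "ACGTACGTTAGCCGATAGGCTA",
    "species_B": "TTGACCATGGCAATCGGATTCA",
}
index = build_index(references, k=5)
print(classify("CGTTAGCCGATA", index, k=5))  # species_A
print(classify("GGGGGGGGGGGG", index, k=5))  # None
```

The second read shares no k-mer with the index and is left unclassified, which loosely mirrors the "unknown species" scenario and the dependence on database completeness described in the abstract.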
3.
Nat Methods ; 20(4): 491-492, 2023 04.
Article in English | MEDLINE | ID: mdl-36959321
4.
Commun Biol ; 5(1): 967, 2022 09 15.
Article in English | MEDLINE | ID: mdl-36109650

ABSTRACT

Singapore's National Flower, Papilionanthe (Ple.) Miss Joaquim 'Agnes' (PMJ), is highly prized as a horticultural flower from the Orchidaceae family. A combination of short-read sequencing, single-molecule long-read sequencing and chromatin contact mapping was used to assemble the PMJ genome, spanning 2.5 Gb and 19 pseudo-chromosomal scaffolds. Genomic resources and chemical profiling provided insights towards identifying, understanding and elucidating various classes of secondary metabolite compounds synthesized by the flower. For example, the presence of the anthocyanin pigments detected by chemical profiling coincides with the expression of ANTHOCYANIN SYNTHASE (ANS), an enzyme responsible for the synthesis of the former. Similarly, the presence of vandaterosides (a unique class of glycosylated organic acids with the potential to slow skin aging) discovered using chemical profiling revealed the involvement of candidate glycosyltransferase family enzymes in vandateroside biosynthesis. Interestingly, despite the unnoticeable scent of the flower, genes involved in the biosynthesis of volatile compounds and chemical profiling revealed the combination of oxygenated hydrocarbons, including traces of linalool, beta-ionone and vanillin, forming the scent profile of PMJ. In summary, by combining genomics and biochemistry, the findings expand the known biodiversity repertoire of the Orchidaceae family and provide insights into the genome and secondary metabolite processes of PMJ.


Subject(s)
Anthocyanins , Orchidaceae , Chromatin/metabolism , Flowers/genetics , Flowers/metabolism , Plant Gene Expression Regulation , Glycosyltransferases/genetics , Metabolic Networks and Pathways , Orchidaceae/genetics , Singapore
5.
Nat Methods ; 19(7): 833-844, 2022 07.
Article in English | MEDLINE | ID: mdl-35697834

ABSTRACT

Inosine is a prevalent RNA modification in animals and is formed when an adenosine is deaminated by the ADAR family of enzymes. Traditionally, inosines are identified indirectly as variants from Illumina RNA-sequencing data because they are interpreted as guanosines by cellular machineries. However, this indirect method performs poorly in protein-coding regions where exons are typically short, in non-model organisms with sparsely annotated single-nucleotide polymorphisms, or in disease contexts where unknown DNA mutations are pervasive. Here, we show that Oxford Nanopore direct RNA sequencing can be used to identify inosine-containing sites in native transcriptomes with high accuracy. We trained convolutional neural network models to distinguish inosine from adenosine and guanosine, and to estimate the modification rate at each editing site. Furthermore, we demonstrated their utility on the transcriptomes of human, mouse and Xenopus. Our approach expands the toolkit for studying adenosine-to-inosine editing and can be further extended to investigate other RNA modifications.


Subject(s)
Nanopores , RNA , Adenosine/genetics , Animals , Inosine/genetics , Mice , RNA/genetics , RNA/metabolism , RNA Editing , RNA Sequence Analysis
6.
Nat Comput Sci ; 1(5): 332-336, 2021 May.
Article in English | MEDLINE | ID: mdl-38217213

ABSTRACT

Whole genome sequencing technologies are unable to invariably read DNA molecules intact, a shortcoming that assemblers try to resolve by stitching the obtained fragments back together. Here, we present methods for the improvement of de novo genome assembly from erroneous long reads incorporated into a tool called Raven. Raven maintains similar performance for various genomes and has accuracy on par with other assemblers that support third-generation sequencing data. It is one of the fastest options while having the lowest memory consumption on the majority of benchmarked datasets.

7.
Nat Biotechnol ; 37(8): 937-944, 2019 08.
Article in English | MEDLINE | ID: mdl-31359005

ABSTRACT

Characterization of microbiomes has been enabled by high-throughput metagenomic sequencing. However, existing methods are not designed to combine reads from short- and long-read technologies. We present a hybrid metagenomic assembler named OPERA-MS that integrates assembly-based metagenome clustering with repeat-aware, exact scaffolding to accurately assemble complex communities. Evaluation using defined in vitro and virtual gut microbiomes revealed that OPERA-MS assembles metagenomes with greater base pair accuracy than long-read (>5×; Canu), higher contiguity than short-read (~10× NGA50; MEGAHIT, IDBA-UD, metaSPAdes) and fewer assembly errors than non-metagenomic hybrid assemblers (2×; hybridSPAdes). OPERA-MS provides strain-resolved assembly in the presence of multiple genomes of the same species, high-quality reference genomes for rare species (<1%) with ~9× long-read coverage and near-complete genomes with higher coverage. We used OPERA-MS to assemble 28 gut metagenomes of antibiotic-treated patients, and showed that the inclusion of long nanopore reads produces more contiguous assemblies (200× improvement over short-read assemblies), including more than 80 closed plasmid or phage sequences and a new 263 kbp jumbo phage. High-quality hybrid assemblies enable an exquisitely detailed view of the gut resistome in human patients.


Subject(s)
Bacteria/drug effects , Bacteria/genetics , Metagenomics/methods , Microbiota/drug effects , DNA Sequence Analysis/methods , Anti-Bacterial Agents/pharmacology , Bacterial Drug Resistance , Feces/microbiology , High-Throughput Nucleotide Sequencing/methods , Humans , Metagenome , Nanopores , Software
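
The contiguity comparison in the abstract above is reported as NGA50. As a rough aid for readers, the sketch below computes the simpler N50 statistic from a list of contig lengths. It is an illustration only: NGA50 is normally computed from blocks aligned to a reference and relative to the reference genome length, which this sketch does not attempt, and the contig lengths are invented.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly.

    N50 is the length of the contig at which the running total,
    taken over contigs sorted from longest to shortest, first
    reaches half of the total assembly length.
    """
    lengths = sorted(contig_lengths, reverse=True)
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:
            return length
    return 0  # empty assembly


# Invented contig lengths: same total size, different fragmentation.
contiguous = [100, 80, 50, 30, 20, 20]
fragmented = [50, 50, 50, 50, 50, 50]
print(n50(contiguous))  # 80
print(n50(fragmented))  # 50
```

Both toy assemblies total 300 bases, yet the less fragmented one has the higher N50, which is the sense in which hybrid or long-read assemblies are described as more contiguous.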
8.
Bioinformatics ; 34(5): 748-754, 2018 03 01.
Article in English | MEDLINE | ID: mdl-29069314

ABSTRACT

Motivation: High-throughput sequencing has transformed the study of gene expression levels through RNA-seq, a technique that is now routinely used by various fields, such as genetic research or diagnostics. The advent of third generation sequencing technologies providing significantly longer reads opens up new possibilities. However, the high error rates common to these technologies set new bioinformatics challenges for the gapped alignment of reads to their genomic origin. In this study, we have explored how currently available RNA-seq splice-aware alignment tools cope with increased read lengths and error rates. All tested tools were initially developed for short NGS reads, but some have claimed support for long Pacific Biosciences (PacBio) or even Oxford Nanopore Technologies (ONT) MinION reads. Results: The tools were tested on synthetic and real datasets from two technologies (PacBio and ONT MinION). Alignment quality and resource usage were compared across different aligners. The effect of error correction of long reads was explored, both using self-correction and correction with an external short reads dataset. A tool was developed for evaluating RNA-seq alignment results. This tool can be used to compare the alignment of simulated reads to their genomic origin, or to compare the alignment of real reads to a set of annotated transcripts. Our tests show that while some RNA-seq aligners were unable to cope with long error-prone reads, others produced overall good results. We further show that alignment accuracy can be improved using error-corrected reads. Availability and implementation: https://github.com/kkrizanovic/RNAseqEval, https://figshare.com/projects/RNAseq_benchmark/24391. Contact: mile.sikic@fer.hr. Supplementary information: Supplementary data are available at Bioinformatics online.


Subject(s)
Gene Expression Profiling/methods , High-Throughput Nucleotide Sequencing/methods , DNA Sequence Analysis/methods , Software , Animals , Drosophila melanogaster/genetics , Humans , Saccharomyces cerevisiae/genetics
9.
Bioinformatics ; 33(9): 1394-1395, 2017 05 01.
Article in English | MEDLINE | ID: mdl-28453688

ABSTRACT

Summary: We present Edlib, an open-source C/C++ library for exact pairwise sequence alignment using edit distance. We compare Edlib to other libraries and show that it is the fastest while not lacking in functionality and can also easily handle very large sequences. Being easy to use, flexible, fast and low on memory usage, we expect it to be easily adopted as a building block for future bioinformatics tools. Availability and Implementation: Source code, installation instructions and test data are freely available for download at https://github.com/Martinsos/edlib, under the MIT licence. Edlib is implemented in C/C++ and supported on Linux, MS Windows, and Mac OS. Contact: mile.sikic@fer.hr. Supplementary information: Supplementary data are available at Bioinformatics online.


Subject(s)
DNA Sequence Analysis/methods , Software , Algorithms
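
For readers unfamiliar with the quantity that the Edlib abstract above refers to, the sketch below computes the edit (Levenshtein) distance with the textbook dynamic-programming recurrence. It is a plain-Python illustration and not Edlib's implementation or API; the function name `edit_distance` is an assumption made for this example.

```python
def edit_distance(a, b):
    """Levenshtein distance between a and b.

    Uses two rolling rows, so memory is proportional to the
    shorter sequence while time stays proportional to
    len(a) * len(b).
    """
    if len(a) < len(b):
        a, b = b, a  # keep the rolling row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # deletion from a
                current[j - 1] + 1,      # insertion into a
                previous[j - 1] + cost,  # match or substitution
            ))
        previous = current
    return previous[-1]


print(edit_distance("kitten", "sitting"))  # 3
print(edit_distance("ACGT", "ACGT"))       # 0
```

The quadratic running time of this recurrence is the cost that optimized libraries aim to reduce for very long sequences.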
10.
Genome Res ; 27(5): 737-746, 2017 05.
Article in English | MEDLINE | ID: mdl-28100585

ABSTRACT

The assembly of long reads from Pacific Biosciences and Oxford Nanopore Technologies typically requires resource-intensive error-correction and consensus-generation steps to obtain high-quality assemblies. We show that the error-correction step can be omitted and that high-quality consensus sequences can be generated efficiently with a SIMD-accelerated, partial-order alignment-based, stand-alone consensus module called Racon. Based on tests with PacBio and Oxford Nanopore data sets, we show that Racon coupled with miniasm enables consensus genomes with similar or better quality than state-of-the-art methods while being an order of magnitude faster.


Subject(s)
Algorithms , Contig Mapping/methods , Genomics/methods , Sequence Alignment/methods , DNA Sequence Analysis/methods , Contig Mapping/standards , Genomics/standards , Sequence Alignment/standards , DNA Sequence Analysis/standards
11.
Bioinformatics ; 32(17): i680-i684, 2016 09 01.
Article in English | MEDLINE | ID: mdl-27587689

ABSTRACT

MOTIVATION: Protein database search is one of the fundamental problems in bioinformatics. For decades, it has been explored and solved using different exact and heuristic approaches. However, exponential growth of data in recent years has brought significant challenges in improving already existing algorithms. BLAST has been the most successful tool for protein database search, but is also becoming a bottleneck in many applications. Due to that, many different approaches have been developed to complement or replace it. In this article, we present SWORD, an efficient protein database search implementation that runs 8-16 times faster than BLAST in the sensitive mode and up to 68 times faster in the fast and less accurate mode. It is designed to be used in nearly all database search environments, but is especially suitable for large databases. Its sensitivity exceeds that of BLAST for the majority of input datasets, and it provides guaranteed optimal alignments. AVAILABILITY AND IMPLEMENTATION: Sword is freely available for download from https://github.com/rvaser/sword. CONTACT: robert.vaser@fer.hr and mile.sikic@fer.hr. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Protein Databases , Search Engine , Sequence Alignment , Algorithms , Software
12.
Bioinformatics ; 32(17): 2582-9, 2016 09 01.
Article in English | MEDLINE | ID: mdl-27162186

ABSTRACT

MOTIVATION: The recent emergence of nanopore sequencing technology has set a challenge for established assembly methods. In this work, we assessed how existing hybrid and non-hybrid de novo assembly methods perform on long and error-prone nanopore reads. RESULTS: We benchmarked five non-hybrid (in terms of both error correction and scaffolding) assembly pipelines as well as two hybrid assemblers which use third generation sequencing data to scaffold Illumina assemblies. Tests were performed on several publicly available MinION and Illumina datasets of Escherichia coli K-12, using several sequencing coverages of nanopore data (20×, 30×, 40× and 50×). We attempted to assess the assembly quality at each of these coverages, in order to estimate the requirements for closed bacterial genome assembly. For the purpose of the benchmark, an extensible genome assembly benchmarking framework was developed. Results show that hybrid methods are highly dependent on the quality of NGS data, but much less on the quality and coverage of nanopore data and perform relatively well on lower nanopore coverages. All non-hybrid methods correctly assemble the E. coli genome when coverage is above 40×, even the non-hybrid method tailored for Pacific Biosciences reads. While it requires higher coverage compared to a method designed particularly for nanopore reads, its running time is significantly lower. AVAILABILITY AND IMPLEMENTATION: https://github.com/kkrizanovic/NanoMark. CONTACT: mile.sikic@fer.hr. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Nanopores , DNA Sequence Analysis , Escherichia coli , Escherichia coli K12 , Bacterial Genome , High-Throughput Nucleotide Sequencing
13.
Nat Commun ; 7: 11307, 2016 Apr 15.
Article in English | MEDLINE | ID: mdl-27079541

ABSTRACT

Realizing the democratic promise of nanopore sequencing requires the development of new bioinformatics approaches to deal with its specific error characteristics. Here we present GraphMap, a mapping algorithm designed to analyse nanopore sequencing reads, which progressively refines candidate alignments to robustly handle potentially high error rates and uses a fast graph traversal to align long reads with speed and high precision (>95%). Evaluation on MinION sequencing data sets against short- and long-read mappers indicates that GraphMap increases mapping sensitivity by 10-80% and maps >95% of bases. GraphMap alignments enabled single-nucleotide variant calling on the human genome with increased sensitivity (15%) over the next best mapper, precise detection of structural variants from length 100 bp to 4 kbp, and species and strain-specific identification of pathogens using MinION reads. GraphMap is available open source under the MIT license at https://github.com/isovic/graphmap.


Subject(s)
Algorithms , Computational Biology/methods , Human Genome/genetics , High-Throughput Nucleotide Sequencing/methods , Genomics/methods , Humans , Nanopores , Single Nucleotide Polymorphism , Reproducibility of Results , Sequence Alignment/methods
14.
Nat Protoc ; 11(1): 1-9, 2016 Jan.
Article in English | MEDLINE | ID: mdl-26633127

ABSTRACT

The SIFT (sorting intolerant from tolerant) algorithm helps bridge the gap between mutations and phenotypic variations by predicting whether an amino acid substitution is deleterious. SIFT has been used in disease, mutation and genetic studies, and a protocol for its use has been previously published with Nature Protocols. This updated protocol describes SIFT 4G (SIFT for genomes), which is a faster version of SIFT that enables practical computations on reference genomes. Users can get predictions for single-nucleotide variants from their organism of interest using the SIFT 4G annotator with SIFT 4G's precomputed databases. The scope of genomic predictions is expanded, with predictions available for more than 200 organisms. Users can also run the SIFT 4G algorithm themselves. SIFT predictions can be retrieved for 6.7 million variants in 4 min once the database has been downloaded. If precomputed predictions are not available, the SIFT 4G algorithm can compute predictions at a rate of 2.6 s per protein sequence. SIFT 4G is available from http://sift-dna.org/sift4g.


Subject(s)
Algorithms , Genomics/methods , Missense Mutation/genetics , Protein Databases , Genomics/standards , Humans , Molecular Sequence Annotation , Phenotype , Reference Standards
15.
Phys Rev Lett ; 114(24): 248701, 2015 Jun 19.
Article in English | MEDLINE | ID: mdl-26197016

ABSTRACT

Detection of patient zero can give new insights to epidemiologists about the nature of first transmissions into a population. In this Letter, we study the statistical inference problem of detecting the source of epidemics from a snapshot of spreading on an arbitrary network structure. By using exact analytic calculations and Monte Carlo estimators, we demonstrate the detectability limits for the susceptible-infected-recovered model, which primarily depend on the spreading process characteristics. Finally, we demonstrate the applicability of the approach in a case of a simulated sexually transmitted infection spreading over an empirical temporal network of sexual interactions.


Subject(s)
Contact Tracing/methods , Statistical Models , Sexually Transmitted Diseases/epidemiology , Computer Simulation , Epidemiologic Methods , Humans , Monte Carlo Method , Sexually Transmitted Diseases/transmission
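
The source-detection problem in the abstract above can be made concrete with a deliberately naive sketch: simulate a discrete-time susceptible-infected-recovered (SIR) process from each candidate source and count how often the simulation reproduces the observed snapshot exactly. The abstract does not describe the Letter's estimators, so the exact-match rule, the parameter names `p` (infection probability per contact per step) and `q` (recovery probability per step), and the toy path network are all assumptions made for illustration.

```python
import random


def simulate_sir(neighbors, source, p, q, steps, rng):
    """Run a discrete-time SIR process; return (infected, recovered) sets."""
    infected, recovered = {source}, set()
    for _ in range(steps):
        newly_infected = set()
        for node in infected:
            for other in neighbors[node]:
                susceptible = other not in infected and other not in recovered
                if susceptible and rng.random() < p:
                    newly_infected.add(other)
        newly_recovered = {node for node in infected if rng.random() < q}
        infected = (infected | newly_infected) - newly_recovered
        recovered |= newly_recovered
    return infected, recovered


def source_scores(neighbors, snapshot, p, q, steps, runs=200, seed=0):
    """Fraction of simulations from each candidate that match the snapshot."""
    rng = random.Random(seed)
    scores = {}
    for candidate in neighbors:
        hits = sum(
            simulate_sir(neighbors, candidate, p, q, steps, rng) == snapshot
            for _ in range(runs)
        )
        scores[candidate] = hits / runs
    return scores


# Hypothetical path network 0-1-2-3-4 and an observed snapshot after one step.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
observed = ({1, 2, 3}, set())  # (infected nodes, recovered nodes)
scores = source_scores(path, observed, p=1.0, q=0.0, steps=1)
print(max(scores, key=scores.get))  # 2
```

With `p=1.0` and `q=0.0` the process is deterministic, so only node 2 reproduces the snapshot. With intermediate probabilities several candidates receive non-zero scores, which gives an informal feel for why detectability depends on the characteristics of the spreading process.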
16.
PLoS One ; 10(12): e0145857, 2015.
Article in English | MEDLINE | ID: mdl-26719890

ABSTRACT

In recent years we have witnessed a growth in sequencing yield, in the number of samples sequenced and, as a result, in publicly maintained sequence databases. The increase of data present all around has put high requirements on protein similarity search algorithms with two ever-opposite goals: keeping the running times acceptable while maintaining a high-enough level of sensitivity. The most time-consuming step of similarity search is the local alignment between query and database sequences. This step is usually performed using exact local alignment algorithms such as Smith-Waterman. Due to its quadratic time complexity, alignments of a query to the whole database are usually too slow. Therefore, the majority of the protein similarity search methods apply heuristics, prior to doing the exact local alignment, to reduce the number of possible candidate sequences in the database. However, there is still a need for the alignment of a query sequence to a reduced database. In this paper we present the SW#db tool and a library for fast exact similarity search. Although its running times, as a standalone tool, are comparable to the running times of BLAST, it is primarily intended to be used for the exact local alignment phase in which the database of sequences has already been reduced. It uses both GPU and CPU parallelization and was 4-5 times faster than SSEARCH, 6-25 times faster than CUDASW++ and more than 20 times faster than SSW at the time of writing, using multiple queries on Swiss-Prot and UniRef90 databases.


Subject(s)
Computational Biology/methods , Sequence Alignment/methods , Software , Algorithms , Nucleic Acid Databases , Web Browser
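
The quadratic cost of exact local alignment mentioned in the abstract above comes from filling a score matrix with one cell per pair of query and target positions. The sketch below shows the score-only Smith-Waterman recurrence under simplifying assumptions that are not taken from the paper: a flat match/mismatch score instead of a protein substitution matrix, a linear gap penalty instead of affine gaps, and no GPU or CPU parallelization.

```python
def smith_waterman_score(query, target, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between query and target.

    Every cell of the len(query) x len(target) matrix is visited
    once, which is where the quadratic running time comes from.
    Only two rows are kept because the score alone is returned.
    """
    previous = [0] * (len(target) + 1)
    best = 0
    for q in query:
        current = [0]
        for j, t in enumerate(target, start=1):
            diagonal = previous[j - 1] + (match if q == t else mismatch)
            score = max(
                0,                     # start a new local alignment
                diagonal,              # align q with t
                previous[j] + gap,     # gap in the target
                current[j - 1] + gap,  # gap in the query
            )
            current.append(score)
            best = max(best, score)
        previous = current
    return best


print(smith_waterman_score("GGACGTGG", "CCACGTCC"))  # 8
print(smith_waterman_score("AAAA", "TTTT"))          # 0
```

In a search setting this function would be called once per candidate sequence, so reducing the candidate set beforehand with a heuristic directly reduces the number of quadratic computations.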
17.
Nucleic Acids Res ; 42(Database issue): D879-81, 2014 Jan.
Article in English | MEDLINE | ID: mdl-24271393

ABSTRACT

ExoLocator (http://exolocator.eopsf.org) collects in a single place information needed for comparative analysis of protein-coding exons from vertebrate species. The main source of data--the genomic sequences, and the existing exon and homology annotation--is the ENSEMBL database of completed vertebrate genomes. To these, ExoLocator adds the search for ostensibly missing exons in orthologous protein pairs across species, using an extensive computational pipeline to narrow down the search region for the candidate exons and find a suitable template in the other species, as well as state-of-the-art implementations of pairwise alignment algorithms. The resulting complements of exons are organized in a way currently unique to ExoLocator: multiple sequence alignments, both on the nucleotide and on the peptide levels, clearly indicating the exon boundaries. The alignments can be inspected in the web-embedded viewer, downloaded or used on the spot to produce an estimate of conservation within orthologous sets, or functional divergence across paralogues.


Subject(s)
Protein Databases , Exons , Proteins/genetics , Animals , Human Genome , Humans , Internet , Vertebrates/genetics
18.
Bioinformatics ; 29(19): 2494-5, 2013 Oct 01.
Article in English | MEDLINE | ID: mdl-23864730

ABSTRACT

SUMMARY: We propose SW#, a new CUDA graphical processor unit-enabled and memory-efficient implementation of a dynamic programming algorithm for local alignment. It can be used as either a stand-alone application or a library. Although there are other graphical processor unit implementations of the Smith-Waterman algorithm, SW# is the only one publicly available that can produce sequence alignments on a genome-wide scale. For long sequences, it is at least a few hundred times faster than a CPU version of the same algorithm. AVAILABILITY: Source code and installation instructions are freely available for download at http://complex.zesoi.fer.hr/SW.html.


Subject(s)
Algorithms , Genome , Base Sequence , Internet , Sequence Alignment , Software
19.
Cell Rep ; 2(5): 1207-19, 2012 Nov 29.
Article in English | MEDLINE | ID: mdl-23103170

ABSTRACT

Chromatin interactions play important roles in transcription regulation. To better understand the underlying evolutionary and functional constraints of these interactions, we implemented a systems approach to examine RNA polymerase-II-associated chromatin interactions in human cells. We found that 40% of the total genomic elements involved in chromatin interactions converged to a giant, scale-free-like, hierarchical network organized into chromatin communities. The communities were enriched in specific functions and were syntenic through evolution. Disease-associated SNPs from genome-wide association studies were enriched among the nodes with fewer interactions, implying their selection against deleterious interactions by limiting the total number of interactions, a model that we further reconciled using somatic and germline cancer mutation data. The hubs lacked disease-associated SNPs, constituted a nonrandomly interconnected core of key cellular functions, and exhibited lethality in mouse mutants, supporting an evolutionary selection that favored the nonrandom spatial clustering of the least-evolving key genomic domains against random genetic or transcriptional errors in the genome. Altogether, our analyses reveal a systems-level evolutionary framework that shapes functionally compartmentalized and error-tolerant transcriptional regulation of the human genome in three dimensions.


Subject(s)
Chromatin/metabolism , Animals , Biological Evolution , Gene Regulatory Networks , Genome , Human Genome , Genome-Wide Association Study , Humans , K562 Cells , MCF-7 Cells , Mice , Single Nucleotide Polymorphism , Genetic Promoter Regions , RNA Polymerase II/metabolism , Genetic Transcription
20.
Nucleic Acids Res ; 40(Web Server issue): W352-7, 2012 Jul.
Article in English | MEDLINE | ID: mdl-22693222

ABSTRACT

In this article, we introduce BioMe (biologically relevant metals), a web-based platform for calculation of various statistical properties of metal-binding sites. Users can obtain the following statistical properties: presence of selected ligands in metal coordination sphere, distribution of coordination numbers, percentage of metal ions coordinated by the combination of selected ligands, distribution of monodentate and bidentate metal-carboxyl bindings for ASP and GLU, percentage of particular binuclear metal centers, distribution of coordination geometry, descriptive statistics for a metal ion-donor distance and percentage of the selected metal ions coordinated by each of the selected ligands. Statistics are presented in numerical and graphical forms. The underlying database contains information about all contacts within the range of 3 Å from a metal ion found in the asymmetric crystal unit. The stored information for each metal ion includes Protein Data Bank code, structure determination method, types of metal-binding chains [protein, ribonucleic acid (RNA), deoxyribonucleic acid (DNA), water and other] and names of the bound ligands (amino acid residue, RNA nucleotide, DNA nucleotide, water and other) and the coordination number, the coordination geometry and, if applicable, other metal(s). BioMe is on a regular weekly update schedule. It is accessible at http://metals.zesoi.fer.hr.


Subject(s)
Metals/chemistry , Software , Binding Sites , DNA/chemistry , Statistical Data Interpretation , Internet , Ligands , Metalloproteins/chemistry , RNA/chemistry , User-Computer Interface