Search | BVS Infectious and Parasitic Diseases

Proficiency Testing of Virus Diagnostics Based on Bioinformatics Analysis of Simulated In Silico High-Throughput Sequencing Data Sets.

Brinkmann, Annika; Andrusch, Andreas; Belka, Ariane; Wylezich, Claudia; Höper, Dirk; Pohlmann, Anne; Nordahl Petersen, Thomas; Lucas, Pierrick; Blanchard, Yannick; Papa, Anna; Melidou, Angeliki; Oude Munnink, Bas B; Matthijnssens, Jelle; Deboutte, Ward; Ellis, Richard J; Hansmann, Florian; Baumgärtner, Wolfgang; van der Vries, Erhard; Osterhaus, Albert; Camma, Cesare; Mangone, Iolanda; Lorusso, Alessio; Marcacci, Maurilia; Nunes, Alexandra; Pinto, Miguel; Borges, Vítor; Kroneman, Annelies; Schmitz, Dennis; Corman, Victor Max; Drosten, Christian; Jones, Terry C; Hendriksen, Rene S; Aarestrup, Frank M; Koopmans, Marion; Beer, Martin; Nitsche, Andreas.

J Clin Microbiol ; 57(8)2019 08.

Article in English | MEDLINE | ID: mdl-31167846

ABSTRACT

Quality management and independent assessment of high-throughput sequencing-based virus diagnostics have not yet been established as a mandatory approach for ensuring comparable results. The sensitivity and specificity of viral high-throughput sequence data analysis are highly affected by bioinformatics processing using publicly available and custom tools and databases and thus differ widely between individuals and institutions. Here we present the results of the COMPARE [Collaborative Management Platform for Detection and Analyses of (Re-)emerging and Foodborne Outbreaks in Europe] in silico virus proficiency test. An artificial, simulated in silico data set of Illumina HiSeq sequences was provided to 13 different European institutes for bioinformatics analysis to identify viral pathogens in high-throughput sequence data. Comparison of the participants' analyses shows that the use of different tools, programs, and databases for bioinformatics analyses can impact the correct identification of viral sequences from a simple data set. The identification of slightly mutated and highly divergent virus genomes has been shown to be most challenging. Furthermore, the interpretation of the results, together with a fictitious case report, by the participants showed that in addition to the bioinformatics analysis, the virological evaluation of the results can be important in clinical settings. External quality assessment and proficiency testing should become an important part of validating high-throughput sequencing-based virus diagnostics and could improve the harmonization, comparability, and reproducibility of results. There is a need for the establishment of international proficiency testing, like that established for conventional laboratory tests such as PCR, for bioinformatics pipelines and the interpretation of such results.

Subjects

Computational Biology/methods , Computer Simulation , High-Throughput Nucleotide Sequencing/standards , Laboratory Proficiency Testing/statistics & numerical data , Sequence Analysis, DNA/standards , Viruses/genetics , Data Analysis , Europe , Genome, Viral , High-Throughput Nucleotide Sequencing/methods , Humans , Intersectoral Collaboration , Laboratory Proficiency Testing/organization & administration , Reproducibility of Results , Sequence Analysis, DNA/statistics & numerical data , Viruses/pathogenicity

PAIPline: pathogen identification in metagenomic and clinical next generation sequencing samples.

Andrusch, Andreas; Dabrowski, Piotr W; Klenner, Jeanette; Tausch, Simon H; Kohl, Claudia; Osman, Abdalla A; Renard, Bernhard Y; Nitsche, Andreas.

Bioinformatics ; 34(17): i715-i721, 2018 09 01.

Article in English | MEDLINE | ID: mdl-30423069

ABSTRACT

Motivation: Next generation sequencing (NGS) has provided researchers with a powerful tool to characterize metagenomic and clinical samples in research and diagnostic settings. NGS allows an open view into samples useful for pathogen detection in an unbiased fashion and without a prior hypothesis about possible causative agents. However, NGS datasets for pathogen detection come with different obstacles, such as a very unfavorable ratio of pathogen to host reads. Besides frequent false positives and irrelevant organisms, such as contaminants, tools are often challenged by samples with low pathogen loads and might not report organisms present below a certain threshold. Furthermore, some metagenomic profiling tools are only focused on one particular set of pathogens, for example bacteria. Results: We present PAIPline, a bioinformatics pipeline specifically designed to address problems associated with detecting pathogens in diagnostic samples. PAIPline particularly focuses on user-friendliness and encapsulates all necessary steps from preprocessing to resolution of ambiguous reads and filtering up to visualization in a single tool. In contrast to existing tools, PAIPline is more specific while maintaining sensitivity. This is shown in a comparative evaluation where PAIPline was benchmarked alongside other well-known metagenomic profiling tools on previously published well-characterized datasets. Additionally, as part of an international cooperation project, PAIPline was applied to an outbreak sample of hemorrhagic fevers of then unknown etiology. The presented results show that PAIPline can serve as a robust, reliable, user-friendly, adaptable and generalizable stand-alone software for diagnostics from NGS samples and as a stepping stone for further downstream analyses. Availability and implementation: PAIPline is freely available at https://gitlab.com/rki_bioinformatics/paipline.

Subjects

High-Throughput Nucleotide Sequencing/methods , Metagenomics/methods , Bacteria/genetics , Computational Biology/methods , Humans , Software

DREAM-Yara: an exact read mapper for very large databases with short update time.

Dadi, Temesgen Hailemariam; Siragusa, Enrico; Piro, Vitor C; Andrusch, Andreas; Seiler, Enrico; Renard, Bernhard Y; Reinert, Knut.

Bioinformatics ; 34(17): i766-i772, 2018 09 01.

Article in English | MEDLINE | ID: mdl-30423080

ABSTRACT

Motivation: Mapping-based approaches have become limited in their application to very large sets of references since computing an FM-index for very large databases (e.g. >10 GB) has become a bottleneck. This affects many analyses that need such an index as an essential step for approximate matching of the NGS reads to reference databases. For instance, in typical metagenomics analysis, the reference sequences have become too large to compute a single full-text index on standard machines. Even on large memory machines, computing such an index takes about 1 day of computing time. As a result, updates of indices are rarely performed. Hence, it is desirable to create an alternative way of indexing while preserving fast search times. Results: To solve the index construction and update problem we propose the DREAM (Dynamic seaRchablE pArallel coMpressed index) framework and provide an implementation. The first main contribution is the introduction of an approximate search distributor via a novel use of Bloom filters. We combine several Bloom filters to form an interleaved Bloom filter and use this new data structure to quickly exclude reads for parts of the databases where they cannot match. This allows us to keep the databases in several indices which can be easily rebuilt if parts are updated while maintaining a fast search time. The second main contribution is an implementation of DREAM-Yara, a distributed version of a fully sensitive read mapper under the DREAM framework. Availability and implementation: https://gitlab.com/pirovc/dream_yara/.

Subjects

Databases, Factual , Software , Humans , Time Factors
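
The interleaved Bloom filter described in the DREAM-Yara abstract can be illustrated with a minimal Python sketch. This is not the DREAM-Yara implementation: the class name `InterleavedBloomFilter`, the SHA-256 based hashing, the k-mer length, the filter size, and the `min_hits` threshold are all assumptions chosen for illustration. The sketch keeps one Bloom filter per database bin, stores them bit-interleaved so that a single lookup answers "which bins may contain this k-mer", and then excludes bins in which too few k-mers of a read are found.

```python
import hashlib


class InterleavedBloomFilter:
    """Toy interleaved Bloom filter: one Bloom filter per bin, stored row-wise.

    rows[i] is an integer used as a bit mask; bit b of rows[i] is set when
    bin b has position i set in its own Bloom filter.
    """

    def __init__(self, num_bins, bits_per_bin, num_hashes, k):
        self.num_bins = num_bins
        self.bits_per_bin = bits_per_bin
        self.num_hashes = num_hashes
        self.k = k
        self.rows = [0] * bits_per_bin

    def _positions(self, kmer):
        # Hypothetical hashing scheme: SHA-256 of "seed:kmer", reduced modulo the filter size.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{kmer}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.bits_per_bin

    def _kmers(self, seq):
        return (seq[i:i + self.k] for i in range(len(seq) - self.k + 1))

    def add_reference(self, bin_id, seq):
        """Insert every k-mer of a reference sequence into the filter of one bin."""
        for kmer in self._kmers(seq):
            for pos in self._positions(kmer):
                self.rows[pos] |= 1 << bin_id

    def bins_for_kmer(self, kmer):
        """Return a bit mask of the bins that may contain the k-mer."""
        mask = (1 << self.num_bins) - 1
        for pos in self._positions(kmer):
            mask &= self.rows[pos]
        return mask

    def candidate_bins(self, read, min_hits):
        """Return the bins in which at least min_hits k-mers of the read were found."""
        counts = [0] * self.num_bins
        for kmer in self._kmers(read):
            mask = self.bins_for_kmer(kmer)
            for b in range(self.num_bins):
                if (mask >> b) & 1:
                    counts[b] += 1
        return [b for b, c in enumerate(counts) if c >= min_hits]


# Made-up reference sequences, one per bin.
references = [
    "ACGTACGTTAGCCGATAGGCTTAACGGATC",
    "TTTTGGGGCCCCAAAATTGGCCAATTGGCC",
    "GATTACAGATTACACATTAGGATTTACAGA",
]
ibf = InterleavedBloomFilter(num_bins=3, bits_per_bin=4096, num_hashes=3, k=8)
for bin_id, seq in enumerate(references):
    ibf.add_reference(bin_id, seq)

read = references[0][7:23]  # error-free read taken from bin 0
candidates = ibf.candidate_bins(read, min_hits=len(read) - ibf.k + 1)
print(candidates)
```

Because a Bloom filter has no false negatives, the bin a read originates from is always reported; other bins may occasionally appear as false positives and would be rejected by the subsequent exact mapping step. A real distributor would derive the hit threshold from the number of allowed errors rather than requiring every k-mer to match.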

LiveKraken--real-time metagenomic classification of Illumina data.

Tausch, Simon H; Strauch, Benjamin; Andrusch, Andreas; Loka, Tobias P; Lindner, Martin S; Nitsche, Andreas; Renard, Bernhard Y.

Bioinformatics ; 34(21): 3750-3752, 2018 11 01.

Article in English | MEDLINE | ID: mdl-29868852

ABSTRACT

Motivation: In metagenomics, Kraken is one of the most widely used tools due to its robustness and speed. Yet, the overall turnaround time of metagenomic analysis is hampered by the sequential paradigm of wet and dry lab. In urgent experiments, it can be crucial to gain a timely insight into a dataset. Results: Here, we present LiveKraken, a real-time read classification tool based on the core algorithm of Kraken. LiveKraken uses streams of raw data from Illumina sequencers to classify reads taxonomically. This way, we are able to produce results identical to those of Kraken the moment the sequencer finishes. We are furthermore able to provide comparable results in early stages of a sequencing run, which can save up to a week of sequencing time on an Illumina HiSeq in High Throughput Mode. While the number of classified reads grows over time, false classifications appear in negligible numbers and proportions of identified taxa are only affected to a minor extent. Availability and implementation: LiveKraken is available at https://gitlab.com/rki_bioinformatics/LiveKraken. Supplementary information: Supplementary data are available at Bioinformatics online.

Subjects

Algorithms , Metagenomics , Sequence Analysis, DNA/methods , Software , Computational Biology , High-Throughput Nucleotide Sequencing
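
The idea of classifying reads while the sequencer is still running, as described in the LiveKraken abstract, can be sketched in Python as follows. This is a strongly simplified illustration and not LiveKraken's or Kraken's algorithm: k-mers shared between taxa are simply ignored (Kraken assigns them to the lowest common ancestor), a read is labeled by a plain majority vote over its k-mer hits, and the function names, sequences, and checkpoint cycles are made up for the example.

```python
from collections import Counter


def build_kmer_index(references, k):
    """Map each k-mer to its taxon; k-mers shared by several taxa are marked None."""
    index = {}
    for taxon, seq in references.items():
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if index.get(kmer, taxon) != taxon:
                index[kmer] = None  # ambiguous; Kraken would store the LCA instead
            else:
                index[kmer] = taxon
    return index


def classify_prefix(read, cycles, index, k):
    """Classify the part of a read available after the given number of cycles."""
    prefix = read[:cycles]
    votes = Counter()
    for i in range(len(prefix) - k + 1):
        taxon = index.get(prefix[i:i + k])
        if taxon is not None:
            votes[taxon] += 1
    return votes.most_common(1)[0][0] if votes else None


def live_classification(read, index, k, checkpoints):
    """Return the classification of one read at several sequencing cycles."""
    return {c: classify_prefix(read, c, index, k) for c in checkpoints}


# Made-up reference sequences and a read taken from virus_A.
references = {
    "virus_A": "ACGTTGCAAGGCTTAC",
    "virus_B": "TTTTCCCCGGGGAAAA",
}
k = 5
index = build_kmer_index(references, k)
read = references["virus_A"][2:12]
timeline = live_classification(read, index, k, checkpoints=[3, 5, 10])
print(timeline)
```

Before the first k bases have been sequenced no k-mer is available and the read stays unclassified; as more cycles arrive, the classification becomes available and is re-evaluated at each checkpoint.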

PathoLive-Real-Time Pathogen Identification from Metagenomic Illumina Datasets.

Tausch, Simon H; Loka, Tobias P; Schulze, Jakob M; Andrusch, Andreas; Klenner, Jeanette; Dabrowski, Piotr Wojciech; Lindner, Martin S; Nitsche, Andreas; Renard, Bernhard Y.

Life (Basel) ; 12(9)2022 Aug 30.

Article in English | MEDLINE | ID: mdl-36143382

ABSTRACT

Over the past years, NGS has become a crucial workhorse for open-view pathogen diagnostics. Yet, long turnaround times result from using massively parallel high-throughput technologies as the analysis can only be performed after sequencing has finished. The interpretation of results can further be challenged by contaminations, clinically irrelevant sequences, and the sheer amount and complexity of the data. We implemented PathoLive, a real-time diagnostics pipeline for the detection of pathogens from clinical samples hours before sequencing has finished. Based on real-time alignment with HiLive2, mappings are scored with respect to common contaminations, low-entropy areas, and sequences of widespread, non-pathogenic organisms. The results are visualized using an interactive taxonomic tree that provides an easily interpretable overview of the relevance of hits. For a human plasma sample that was spiked in vitro with six pathogenic viruses, all agents were clearly detected after only 40 of 200 sequencing cycles. For a real-world sample from Sudan, the results correctly indicated the presence of Crimean-Congo hemorrhagic fever virus. In a second real-world dataset from the 2019 SARS-CoV-2 outbreak in Wuhan, we found the presence of a SARS coronavirus as the most relevant hit without the novel virus reference genome being included in the database. For all samples, clinically irrelevant hits were correctly de-emphasized. Our approach is valuable to obtain fast and accurate NGS-based pathogen identifications and correctly prioritize and visualize them based on their clinical significance. PathoLive is open source and available on GitLab and BioConda.
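
The PathoLive abstract mentions that mappings are scored with respect to low-entropy areas but does not say how. One common way to flag low-complexity sequence is the Shannon entropy of its base composition, sketched below in Python. The function names and the cutoff of 1.5 bits are assumptions for illustration only and are not taken from PathoLive.

```python
import math
from collections import Counter


def shannon_entropy(seq):
    """Shannon entropy (in bits) of the base composition of a sequence."""
    counts = Counter(seq)
    total = len(seq)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def is_low_complexity(seq, min_entropy=1.5):
    """Flag a sequence whose entropy is below a (hypothetical) cutoff."""
    return shannon_entropy(seq) < min_entropy


for seq in ("AAAAAAAAAA", "ATATATATAT", "ACGTACGTAC"):
    print(seq, round(shannon_entropy(seq), 3), is_low_complexity(seq))
```

A homopolymer has an entropy of 0 bits and a sequence with equal proportions of all four bases reaches the maximum of 2 bits, so hits that fall mainly in low-entropy regions could be down-weighted when ranking candidate pathogens.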

Whole Genome Characterization of Orthopoxvirus (OPV) Abatino, a Zoonotic Virus Representing a Putative Novel Clade of Old World Orthopoxviruses.

Gruber, Cesare E M; Giombini, Emanuela; Selleri, Marina; Tausch, Simon H; Andrusch, Andreas; Tyshaieva, Alona; Cardeti, Giusy; Lorenzetti, Raniero; De Marco, Lorenzo; Carletti, Fabrizio; Nitsche, Andreas; Capobianchi, Maria R; Ippolito, Giuseppe; Autorino, Gian Luca; Castilletti, Concetta.

Viruses ; 10(10)2018 10 06.

Article in English | MEDLINE | ID: mdl-30301229

ABSTRACT

Orthopoxviruses (OPVs) are distributed across the entire Eurasian continent, but previously described strains are mostly from northern Europe, and few infections have been reported from Italy. Here we present the extended genomic characterization of OPV Abatino, a novel OPV with zoonotic potential isolated in Italy from an infected Tonkean macaque. Phylogenetic analysis based on 102 conserved OPV genes (core gene set) showed that OPV Abatino is most closely related to the Ectromelia virus species (ECTV), although placed on a separate branch of the phylogenetic tree, bringing substantial support to the hypothesis that this strain may be part of a novel OPV clade. Extending the analysis to the entire set of genes (coding sequences, CDS) further substantiated this hypothesis. In fact, the genome of OPV Abatino included more CDS than ECTV; most of the extra genes (mainly located in the terminal genome regions) showed the highest similarity with cowpox virus (CPXV); however, vaccinia virus (VACV) and monkeypox virus (MPXV) were the closest OPVs for certain CDS. These findings suggest that OPV Abatino could be the result of complex evolutionary events, diverging from any other previously described OPV, and may indicate that previously reported cases in Italy could represent the tip of the iceberg yet to be explored.

Subjects

Cercopithecidae/virology , Genome, Viral/genetics , Orthopoxvirus/classification , Orthopoxvirus/genetics , Phylogeny , Animals , DNA, Viral/genetics , Genes, Viral/genetics , Sequence Analysis, DNA , Sequence Homology, Nucleic Acid
