Search | Virtual Health Library (Biblioteca Virtual em Saúde)

Adaptive RAxML-NG: Accelerating Phylogenetic Inference under Maximum Likelihood using Dataset Difficulty.

Togkousidis, Anastasis; Kozlov, Oleksiy M; Haag, Julia; Höhler, Dimitri; Stamatakis, Alexandros.

Mol Biol Evol; 40(10), 2023 Oct 04.

Article in English | MEDLINE | ID: mdl-37804116

ABSTRACT

Phylogenetic inferences under the maximum likelihood criterion deploy heuristic tree search strategies to explore the vast search space. Depending on the input dataset, searches from different starting trees might all converge to a single tree topology. Often, though, distinct searches infer multiple topologies with large log-likelihood score differences or yield topologically highly distinct, yet almost equally likely, trees. Recently, Haag et al. introduced an approach to quantify, and implemented machine learning methods to predict, the dataset difficulty with respect to phylogenetic inference. Easy multiple sequence alignments (MSAs) exhibit a single likelihood peak on their likelihood surface, associated with a single tree topology to which most, if not all, independent searches rapidly converge. As difficulty increases, multiple locally optimal likelihood peaks emerge, yet from highly distinct topologies. To make use of this information, we introduce and implement an adaptive tree search heuristic in RAxML-NG, which modifies the thoroughness of the tree search strategy as a function of the predicted difficulty. Our adaptive strategy is based upon three observations. First, on easy datasets, searches converge rapidly and can hence be terminated at an earlier stage. Second, overanalyzing difficult datasets is hopeless, and thus it suffices to quickly infer only one of the numerous almost equally likely topologies to reduce overall execution time. Third, more extensive searches are justified and required on datasets with intermediate difficulty. While the likelihood surface exhibits multiple locally optimal peaks in this case, a small proportion of them is significantly better. Our experimental results for the adaptive heuristic on 9,515 empirical and 5,000 simulated datasets with varying difficulty exhibit substantial speedups, especially on easy and difficult datasets (53% of total MSAs), where we observe average speedups of more than 10×. Further, approximately 94% of the inferred trees using the adaptive strategy are statistically indistinguishable from the trees inferred under the standard strategy (RAxML-NG).
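
The three observations in the abstract amount to a mapping from predicted difficulty to search effort. A minimal Python sketch of that idea follows. It assumes the difficulty is a score in [0, 1]; the cut-offs `EASY_MAX` and `HARD_MIN` and the names `SearchMode` and `choose_search_mode` are hypothetical and are not taken from the paper or from RAxML-NG.

```python
from enum import Enum


class SearchMode(Enum):
    FAST = "fast"          # stop early, or settle for one of many similar trees
    THOROUGH = "thorough"  # spend more effort on the search


# Hypothetical cut-offs on a difficulty score in [0, 1] (assumed, not from the paper).
EASY_MAX = 0.3
HARD_MIN = 0.7


def choose_search_mode(difficulty: float) -> SearchMode:
    """Pick a search effort level from a predicted difficulty score."""
    if not 0.0 <= difficulty <= 1.0:
        raise ValueError("difficulty must lie in [0, 1]")
    # Easy datasets converge quickly; on hard ones extra effort is not rewarded.
    if difficulty < EASY_MAX or difficulty > HARD_MIN:
        return SearchMode.FAST
    # Intermediate datasets have a few clearly better peaks worth searching for.
    return SearchMode.THOROUGH
```

With these illustrative cut-offs, only scores in the middle band trigger the extensive search, which mirrors the abstract's claim that the largest speedups occur on easy and difficult datasets.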

Subjects

Algorithms, Phylogeny, Likelihood Functions, Sequence Alignment

Detecting SARS-CoV-2 lineages and mutational load in municipal wastewater and a use-case in the metropolitan area of Thessaloniki, Greece.

Pechlivanis, Nikolaos; Tsagiopoulou, Maria; Maniou, Maria Christina; Togkousidis, Anastasis; Mouchtaropoulou, Evangelia; Chassalevris, Taxiarchis; Chaintoutis, Serafeim C; Petala, Maria; Kostoglou, Margaritis; Karapantsios, Thodoris; Laidou, Stamatia; Vlachonikola, Elisavet; Chatzidimitriou, Anastasia; Papadopoulos, Agis; Papaioannou, Nikolaos; Dovas, Chrysostomos I; Argiriou, Anagnostis; Psomopoulos, Fotis.

Sci Rep; 12(1): 2659, 2022 Feb 17.

Article in English | MEDLINE | ID: mdl-35177697

ABSTRACT

The COVID-19 pandemic represents an unprecedented global crisis necessitating novel approaches for, amongst others, early detection of emerging variants relating to the evolution and spread of the virus. Recently, the detection of SARS-CoV-2 RNA in wastewater has emerged as a useful tool to monitor the prevalence of the virus in the community. Here, we propose a novel methodology, called lineagespot, for the monitoring of mutations and the detection of SARS-CoV-2 lineages in wastewater samples using next-generation sequencing (NGS). Our proposed method was tested and evaluated using NGS data produced by the sequencing of 14 wastewater samples from the municipality of Thessaloniki, Greece, covering a 6-month period. The results showed the presence of SARS-CoV-2 variants in wastewater data. lineagespot was able to record the evolution and rapid domination of the Alpha variant (B.1.1.7) in the community, and allowed the correlation between the mutations evident through our approach and the mutations observed in patients from the same area and time periods. lineagespot is an open-source tool, implemented in R, and is freely available on GitHub and registered on bio.tools.
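
One way to picture lineage detection from wastewater mutations is to score each lineage by how many of its characteristic mutations appear in a sample. The Python sketch below illustrates that general idea only; it is not lineagespot's actual algorithm (the tool itself is written in R). The function name `lineage_support`, the `min_af` parameter, and the shortened mutation list used in the usage note are assumptions made for illustration.

```python
def lineage_support(
    observed: dict[str, float],
    definitions: dict[str, set[str]],
    min_af: float = 0.0,
) -> dict[str, tuple[float, float]]:
    """Score lineages against one sample.

    observed    maps a mutation label to its allele frequency in the sample.
    definitions maps a lineage name to its set of characteristic mutations.
    Returns, per lineage, (fraction of its mutations found, mean allele
    frequency of the mutations that were found).
    """
    scores = {}
    for lineage, mutations in definitions.items():
        # A mutation counts as found when its allele frequency exceeds min_af.
        found = [m for m in mutations if observed.get(m, 0.0) > min_af]
        fraction = len(found) / len(mutations) if mutations else 0.0
        mean_af = sum(observed[m] for m in found) / len(found) if found else 0.0
        scores[lineage] = (fraction, mean_af)
    return scores
```

For example, given a toy definition `{"B.1.1.7": {"S:N501Y", "S:A570D", "S:P681H", "S:T716I"}}` (an illustrative subset of spike mutations) and a sample in which only `S:N501Y` and `S:A570D` are observed, the lineage would receive a found-fraction of 0.5. Tracking such scores across sampling dates is one simple way to follow the rise of a variant over time.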

Subjects

Mutation, SARS-CoV-2/genetics, SARS-CoV-2/isolation & purification, Software, Wastewater/virology, Humans

UMIc: A Preprocessing Method for UMI Deduplication and Reads Correction.

Tsagiopoulou, Maria; Maniou, Maria Christina; Pechlivanis, Nikolaos; Togkousidis, Anastasis; Kotrová, Michaela; Hutzenlaub, Tobias; Kappas, Ilias; Chatzidimitriou, Anastasia; Psomopoulos, Fotis.

Front Genet; 12: 660366, 2021.

Article in English | MEDLINE | ID: mdl-34122513

ABSTRACT

A recent refinement in high-throughput sequencing involves the incorporation of unique molecular identifiers (UMIs), which are random oligonucleotide barcodes, in the library preparation steps. A UMI adds a unique identity to different DNA/RNA input molecules through polymerase chain reaction (PCR) amplification, thus reducing the bias of this step. Here, we propose an alignment-free framework, called UMIc, that serves as a preprocessing step for fastq files and performs deduplication and correction of reads by building consensus sequences from each UMI. Our approach takes into account the frequency and the Phred quality of nucleotides and the distances between the UMIs and the actual sequences. We have tested the tool using different scenarios of UMI-tagged library data, with wide applicability in mind. UMIc is an open-source tool implemented in R and is freely available from https://github.com/BiodataAnalysisGroup/UMIc.
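
The idea of building one consensus read per UMI from nucleotide frequency and Phred quality can be sketched as follows. This Python example is a simplified illustration, not UMIc's implementation (which is in R): it groups reads by exact UMI match, assumes equal-length reads with Phred+33 quality strings, and ignores the UMI and sequence distance checks mentioned in the abstract. The function names are hypothetical.

```python
from collections import defaultdict


def consensus(reads: list[tuple[str, str]]) -> str:
    """Build a consensus from (sequence, quality) pairs of equal length.

    At each position the base with the highest summed Phred score wins, so
    both how often a base occurs and how well it was called contribute.
    """
    if not reads:
        raise ValueError("at least one read is required")
    length = len(reads[0][0])
    if any(len(seq) != length or len(qual) != length for seq, qual in reads):
        raise ValueError("all sequences and quality strings must share one length")
    bases = []
    for pos in range(length):
        weight: dict[str, int] = defaultdict(int)
        for seq, qual in reads:
            weight[seq[pos]] += ord(qual[pos]) - 33  # Phred+33 encoding assumed
        # Ties are broken in favour of the alphabetically first base.
        bases.append(max(sorted(weight), key=weight.get))
    return "".join(bases)


def consensus_by_umi(records: list[tuple[str, str, str]]) -> dict[str, str]:
    """Group (umi, sequence, quality) records by exact UMI and collapse each group."""
    groups: dict[str, list[tuple[str, str]]] = defaultdict(list)
    for umi, seq, qual in records:
        groups[umi].append((seq, qual))
    return {umi: consensus(reads) for umi, reads in groups.items()}
```

Summing Phred scores rather than counting bases means that a single high-quality call can outweigh several low-quality ones at the same position, which is one simple way to combine the two signals.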

miRkit: R framework analyzing miRNA PCR array data.

Tsagiopoulou, Maria; Togkousidis, Anastasis; Pechlivanis, Nikolaos; Maniou, Maria Christina; Batsali, Aristea; Matheakakis, Angelos; Pontikoglou, Charalampos; Psomopoulos, Fotis.

BMC Res Notes; 14(1): 376, 2021 Sep 26.

Article in English | MEDLINE | ID: mdl-34565441

ABSTRACT

OBJECTIVE: The characterization of microRNAs (miRNAs) in recent years has been an important advance in the field of gene regulation. To this end, several approaches for miRNA expression analysis and various bioinformatics tools have been developed over the last few years. It is common practice to analyze miRNA PCR Array data using the commercially available software, mostly due to its convenience and ease of use. RESULTS: In this work we present miRkit, an open-source framework written in R that allows for the comprehensive analysis of RT-PCR data, from the processing of raw data to a functional analysis of the produced results. The main goal of the proposed tool is to provide an assessment of the samples' quality, perform data normalization by endogenous and exogenous miRNAs, and facilitate differential and functional enrichment analysis. The tool offers fast execution times with low memory usage, and is freely available under an MIT license from https://bio.tools/mirkit. Overall, miRkit offers the full analysis from the raw RT-PCR data to functional analysis of targeted genes, and is specifically designed to support the popular miScript miRNA PCR Array (Qiagen) technology.
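
Normalization of RT-PCR data by endogenous controls is commonly done with the 2^-ΔΔCt calculation. The Python sketch below shows that standard calculation as an illustration; whether miRkit (an R framework) uses exactly this formula is an assumption, and the function names and the `ref1`/`ref2` control identifiers in the usage note are hypothetical.

```python
from statistics import mean


def delta_ct(ct: dict[str, float], reference_ids: set[str]) -> dict[str, float]:
    """Subtract the mean Ct of the reference (control) assays from every target Ct."""
    reference = mean(ct[r] for r in reference_ids)
    return {name: value - reference
            for name, value in ct.items() if name not in reference_ids}


def fold_change(case: dict[str, float], control: dict[str, float],
                reference_ids: set[str]) -> dict[str, float]:
    """Relative expression of case versus control as 2 ** -(dCt_case - dCt_control)."""
    d_case = delta_ct(case, reference_ids)
    d_control = delta_ct(control, reference_ids)
    return {name: 2 ** -(d_case[name] - d_control[name])
            for name in d_case if name in d_control}
```

As a usage example with toy numbers: if a target miRNA has Ct 24 in the case sample and Ct 26 in the control sample, while the two reference assays `ref1` and `ref2` read 20 and 22 in both samples, then ΔΔCt is -2 and the fold change is 4, meaning the target is about four times more abundant in the case sample.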

Subjects

MicroRNAs, Computational Biology, Gene Expression Profiling, Gene Expression Regulation, MicroRNAs/genetics, Polymerase Chain Reaction, Software
