ABSTRACT
Background: Building metagenome-assembled genomes (MAGs) from highly complex metagenomic datasets involves a series of steps, from cleaning the sequences and assembling them to grouping the assembled contigs into bins. Along the way, multiple tools aimed at assessing the quality and integrity of each MAG are applied. Nonetheless, even when these tools are incorporated into end-to-end pipelines, their outputs must be visualized and analyzed manually because they are not integrated into a single framework. Methods: We developed a Nextflow pipeline (MAGFlow) that estimates MAG quality through a wide variety of approaches (BUSCO, CheckM2, GUNC and QUAST) and taxonomically annotates the metagenomes using GTDB-Tk2. MAGFlow is coupled to a Python-Dash application (BIgMAG) that displays the concatenated outputs of the tools run by MAGFlow, highlighting the most important metrics in a single interactive environment together with a comparison/clustering of the input data. Results: With MAGFlow/BIgMAG, users can benchmark MAGs obtained through different workflows or assess the quality of MAGs from different samples following a divide-and-rule strategy. Conclusions: MAGFlow/BIgMAG is a unique tool that integrates state-of-the-art software to evaluate different quality metrics and visually extract as much information as possible from a wide range of genome features.
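As an illustration only (the abstract does not document the command-line interface), a launch of a Nextflow quality-control pipeline followed by a Dash dashboard might look like the sketch below; the repository path, parameter names, and BIgMAG entry point are placeholders, not the documented interface.

```bash
# Hypothetical sketch: repository path, parameter names, and file names are placeholders;
# consult the MAGFlow/BIgMAG documentation for the real interface.
nextflow run your-github-user/MAGFlow \
    -profile docker \
    --bins 'results/bins/*.fa' \
    --outdir magflow_results

# Launch the BIgMAG dashboard on MAGFlow's concatenated output (hypothetical file name).
python BIgMAG.py --input magflow_results/final_df.tsv --port 8050
```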
Subjects
Metagenome, Software, Metagenomics/methods, Molecular Sequence Annotation/methods
ABSTRACT
Microbial pangenome analysis identifies the genes that are present or absent across prokaryotic genomes. However, current tools are limited when analyzing species with high sequence diversity or higher taxonomic ranks such as genera or families. The Roary ILP Bacterial core Annotation Pipeline (RIBAP) uses an integer linear programming approach to refine the gene clusters predicted by Roary and thereby identify core genes. RIBAP successfully handles the complexity and diversity of Chlamydia, Klebsiella, Brucella, and Enterococcus genomes, outperforming other established and recent pangenome tools at identifying all-encompassing core genes at the genus level. RIBAP is a freely available Nextflow pipeline at github.com/hoelzer-lab/ribap and zenodo.org/doi/10.5281/zenodo.10890871.
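Because RIBAP is distributed as a Nextflow pipeline on GitHub, it can be launched directly from the repository; the sketch below assumes a `--fasta` input parameter and a Docker profile, which are assumptions to verify against the RIBAP documentation.

```bash
# Sketch of a RIBAP run on a folder of genome FASTA files.
# Parameter names and the profile are assumptions; see github.com/hoelzer-lab/ribap.
nextflow run hoelzer-lab/ribap -r main \
    --fasta 'genomes/*.fasta' \
    --outdir ribap_results \
    -profile docker
```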
Subjects
Bacterial Genome, Molecular Sequence Annotation, Software, Brucella/genetics, Brucella/classification, Bacteria/genetics, Bacteria/classification, Chlamydia/genetics, Enterococcus/genetics, Klebsiella/genetics
ABSTRACT
Ribosome profiling is a powerful technique to study translation at a transcriptome-wide level. However, ensuring good data quality is paramount for accurate interpretation, as is ensuring that the analyses are reproducible. We introduce a new Nextflow DSL2 pipeline, riboseq-flow, designed for processing and comprehensive quality control of ribosome profiling experiments. Riboseq-flow is user-friendly, versatile and upholds high standards in reproducibility, scalability, portability, version control and continuous integration. It enables users to efficiently analyse multiple samples in parallel and helps them evaluate the quality and utility of their data based on the detailed metrics and visualisations that are automatically generated. Riboseq-flow is available at https://github.com/iraiosub/riboseq-flow.
Ribosome profiling is a cutting-edge method that provides a detailed view of protein synthesis across the entire set of RNA molecules within cells. To ensure the reliability of such studies, high-quality data and the ability to replicate analyses are crucial. To address this, we present riboseq-flow, a new tool built with Nextflow DSL2, tailored for analysing data from ribosome profiling experiments. This pipeline stands out for its ease of use, flexibility, and commitment to high reproducibility standards. It is designed to handle multiple samples simultaneously, ensuring efficient analysis for large-scale studies. Moreover, riboseq-flow automatically generates detailed reports and visual representations to assess data quality, enhancing researchers' understanding of their experiments and guiding future decisions. This valuable resource is freely accessible at https://github.com/iraiosub/riboseq-flow.
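A minimal launch of the pipeline straight from GitHub might look as follows; the sample-sheet parameter and profile follow common Nextflow DSL2 conventions and are assumptions rather than the pipeline's documented interface.

```bash
# Hypothetical riboseq-flow launch; check https://github.com/iraiosub/riboseq-flow
# for the real parameter names and profiles.
nextflow run iraiosub/riboseq-flow -r main \
    --input samplesheet.csv \
    --outdir results \
    -profile docker
```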
ABSTRACT
This study describes the development of a resource module that is part of a learning platform named "NIGMS Sandbox for Cloud-based Learning" (https://github.com/NIGMS/NIGMS-Sandbox). The overall genesis of the Sandbox is described in the editorial NIGMS Sandbox at the beginning of this Supplement. This module delivers learning materials on de novo transcriptome assembly using Nextflow in an interactive format that uses appropriate cloud resources for data access and analysis. Cloud computing is a powerful new means by which biomedical researchers can access resources and capacity that were previously either unattainable or prohibitively expensive. To take advantage of these resources, however, the biomedical research community needs new skills and knowledge. We present here a cloud-based training module, developed in conjunction with Google Cloud, Deloitte Consulting, and the NIH STRIDES Program, that uses the biological problem of de novo transcriptome assembly to demonstrate and teach the concepts of computational workflows (using Nextflow) and cost- and resource-efficient use of Cloud services (using Google Cloud Platform). Our work highlights the reduced necessity of on-site computing resources and the accessibility of cloud-based infrastructure for bioinformatics applications.
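For readers unfamiliar with how Nextflow targets Google Cloud, the sketch below shows one way a workflow can be pointed at the Google Cloud Batch executor; the project ID, bucket, and workflow name are placeholders, and the training module itself may use a different setup.

```bash
# Minimal sketch (placeholder project ID, bucket, and workflow): run Nextflow tasks on
# Google Cloud Batch with the work directory kept in Cloud Storage.
cat > nextflow.config <<'EOF'
process.executor = 'google-batch'
google.project   = 'my-gcp-project'         // placeholder
google.location  = 'us-central1'
workDir          = 'gs://my-bucket/nf-work' // placeholder
EOF

# The workflow name is a placeholder for the module's transcriptome-assembly pipeline.
nextflow run my-org/transcriptome-assembly -profile test
```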
Subjects
Cloud Computing, Transcriptome, Computational Biology/methods, Computational Biology/education, Software, Humans, Gene Expression Profiling/methods, Internet
ABSTRACT
BACKGROUND: Variable number tandem repeats (VNTRs) are highly polymorphic DNA regions harboring many potentially disease-causing variants. However, VNTRs often remain unresolved ("dark") in variation databases because of their repetitive nature. One particularly complex and medically relevant VNTR is the KIV-2 VNTR in the cardiovascular disease gene LPA, which encompasses up to 70% of the coding sequence. RESULTS: Using the highly complex LPA gene as a model, we develop a computational approach to resolve intra-repeat variation in VNTRs from widely available short-read sequencing data. We apply the approach to six protein-coding VNTRs in 2504 samples from the 1000 Genomes Project and develop an optimized method for the LPA KIV-2 VNTR that discriminates the confounding KIV-2 subtypes upfront. This yields an F1-score improvement of up to 2.1-fold compared to previously published strategies. Finally, we analyze the LPA VNTR in > 199,000 UK Biobank samples, detecting > 700 KIV-2 mutations. This approach reveals new, strong Lp(a)-lowering effects of KIV-2 variants, with a protective effect against coronary artery disease, and also validates previous findings based on tagging SNPs. CONCLUSIONS: Our approach paves the way for reliable variant detection in VNTRs at scale, and we show that it is transferable to other dark regions, which will help unlock the medical information hidden in VNTRs.
Subjects
Cardiovascular Diseases, Minisatellite Repeats, Humans, Cardiovascular Diseases/genetics, Genetic Variation, DNA Sequence Analysis/methods, Lipoprotein(a)/genetics, Genetic Predisposition to Disease
ABSTRACT
We designed a Nextflow DSL2-based pipeline, Spatial Transcriptomics Quantification (STQ), for simultaneous processing of 10x Genomics Visium spatial transcriptomics data and a matched hematoxylin and eosin (H&E)-stained whole-slide image (WSI), optimized for patient-derived xenograft (PDX) cancer specimens. Our pipeline enables the classification of sequenced transcripts for deconvolving the mouse and human species and mapping the transcripts to reference transcriptomes. We align the H&E WSI with the spatial layout of the Visium slide and generate imaging and quantitative morphology features for each Visium spot. The pipeline design enables multiple analysis workflows, including single or dual reference genome input and stand-alone image analysis. We show the utility of our pipeline on a dataset from Visium profiling of four melanoma PDX samples. The clustering of Visium spots and clustering of H&E imaging features reveal similar patterns arising from the two data modalities.
Subjects
Heterografts, Humans, Animals, Mice, Gene Expression Profiling/methods, Eosine Yellowish-(YS), Hematoxylin, Transcriptome, Computer-Assisted Image Processing/methods, Xenograft Model Antitumor Assays
ABSTRACT
Cellular barcoding is a lineage-tracing methodology that couples heritable synthetic barcodes to high-throughput sequencing, enabling the accurate tracing of cell lineages across a range of biological contexts. Recent studies have extended these methods by incorporating lineage information into single-cell or spatial transcriptomics readouts. Leveraging the rich biological information within these datasets requires dedicated computational tools for dataset pre-processing and analysis. Here, we present BARtab, a portable and scalable Nextflow pipeline, and bartools, an open-source R package, designed to provide an integrated end-to-end cellular barcoding analysis toolkit. BARtab and bartools contain methods to simplify the extraction, quality control, analysis, and visualization of lineage barcodes from population-level, single-cell, and spatial transcriptomics experiments. We showcase the utility of our integrated BARtab and bartools workflow via the analysis of exemplar bulk, single-cell, and spatial transcriptomics experiments containing cellular barcoding information.
Subjects
High-Throughput Nucleotide Sequencing, Single-Cell Analysis, Transcriptome, Single-Cell Analysis/methods, Humans, Software, Taxonomic DNA Barcoding/methods, Genome/genetics, Cell Lineage/genetics, Gene Expression Profiling/methods, Computational Biology/methods, Animals
ABSTRACT
Single-cell multiplexing techniques (cell hashing and genetic multiplexing) combine multiple samples, optimizing sample processing and reducing costs. Cell hashing conjugates antibody tags or chemical oligonucleotides to cell membranes, while genetic multiplexing allows genetically diverse samples to be mixed and relies on the aggregation of RNA reads at known genomic coordinates. We develop hadge (hashing deconvolution combined with genotype information), a Nextflow pipeline that combines 12 methods to perform both hashing- and genotype-based deconvolution. We propose a joint deconvolution strategy combining the best-performing methods and demonstrate how this approach leads to the recovery of previously discarded cells in a nuclei-hashing experiment on fresh-frozen brain tissue.
Subjects
Single-Cell Analysis, Single-Cell Analysis/methods, Humans, Brain/metabolism, Brain/cytology, Software, Genotype
ABSTRACT
Population genomic analyses such as inferring population structure and identifying signatures of selection usually involve the application of a plethora of tools. Installing tools and their dependencies, transforming data, or running a series of preprocessing steps in a particular order can make these analyses challenging. While container-based technologies have largely resolved the problems associated with installing tools and their dependencies, population genomic analyses that require multistep pipelines or complex data transformations can be greatly facilitated by workflow management systems such as Nextflow and Snakemake. Here, we present scalepopgen, a collection of fully automated workflows that carry out widely used population genomic analyses on biallelic single-nucleotide polymorphism data stored in either variant call format (VCF) files or PLINK-generated binary files. scalepopgen is developed in Nextflow and can be run locally or on high-performance computing systems using either Conda, Singularity, or Docker. The automated workflows include procedures such as (i) filtering of individuals and genotypes; (ii) principal component analysis and admixture analysis with identification of optimal K values; (iii) TreeMix analysis with or without bootstrapping and migration edges, followed by identification of the optimal number of migration edges; and (iv) single-population and pairwise population-comparison procedures to identify genomic signatures of selection. The pipeline uses various open-source tools; additionally, several Python and R scripts are provided to collect and visualize the results. The tool is freely available at https://github.com/Popgen48/scalepopgen.
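Since the abstract notes that scalepopgen runs locally or on HPC systems with Conda, Singularity, or Docker, a launch following standard Nextflow conventions might look like the sketch below; the input parameter names are assumptions to be checked against the repository documentation.

```bash
# Hypothetical scalepopgen launch; parameter names are assumptions,
# see https://github.com/Popgen48/scalepopgen for the documented interface.
nextflow run Popgen48/scalepopgen -r main \
    --input samples.csv \
    --outdir popgen_results \
    -profile singularity
```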
Subjects
Metagenomics, Software, Humans, Workflow, Genomics/methods, Computational Biology/methods
ABSTRACT
In recent years, data-independent acquisition (DIA) has emerged as a powerful analysis method in biological mass spectrometry (MS). Compared to the previously predominant data-dependent acquisition (DDA), it offers greater reproducibility, sensitivity, and dynamic range in MS measurements. To make DIA accessible to non-expert users, a multifunctional, automated high-throughput pipeline, DIAproteomics, was implemented in the computational workflow framework Nextflow (https://nextflow.io). This allows high-throughput processing of proteomics and peptidomics DIA datasets on diverse computing infrastructures. This chapter provides a short summary and usage protocol guide for the most important modes of operation of this pipeline for the analysis of peptidomics datasets using the command line. In brief, DIAproteomics is a wrapper around the OpenSwathWorkflow and relies on either existing spectral libraries or libraries generated ad hoc from matching DDA runs. The OpenSwathWorkflow extracts chromatograms from the DIA runs and performs chromatographic peak-picking. Further downstream in the pipeline, these peaks are scored, aligned, and statistically evaluated for qualitative and quantitative differences across conditions, depending on the user's interest. DIAproteomics is open-source and available under a permissive license. We encourage the scientific community to use or modify the pipeline to meet their specific requirements.
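To make the command-line mode of operation concrete, a launch with an existing spectral library might look like the sketch below; the repository path (nf-core/diaproteomics) and parameter names are assumptions to verify against the pipeline documentation.

```bash
# Hypothetical DIAproteomics launch with an existing spectral library;
# repository path and parameter names are assumptions, check the pipeline docs.
nextflow run nf-core/diaproteomics \
    --input samples.tsv \
    --input_spectral_library library.pqp \
    --outdir dia_results \
    -profile docker
```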
Subjects
Proteome, Proteomics, Reproducibility of Results, Proteomics/methods, Mass Spectrometry/methods, Liquid Chromatography/methods, Workflow, Proteome/analysis, Software
ABSTRACT
Understanding the link between the human gut virome and disease has garnered significant interest in the research community. Extracting virus-related information from metagenomic sequencing data is crucial for unravelling virus composition, host interactions, and disease associations. However, current metagenomic analysis workflows for viral genomes vary in effectiveness, posing challenges for researchers seeking the most up-to-date tools. To address this, we present ViromeFlowX, a user-friendly Nextflow workflow that automates viral genome assembly, identification, classification, and annotation. This streamlined workflow integrates cutting-edge tools that take raw sequencing data through taxonomic annotation and functional analysis. Application to a dataset of 200 metagenomic samples yielded high-quality viral genomes. ViromeFlowX enables efficient mining of viral genomic data, offering a valuable resource for investigating the gut virome's role in virus-host interactions and virus-related diseases.
Subjects
Viral Genome, Metagenome, Humans, Workflow, Host Microbial Interactions, Metagenomics
ABSTRACT
Background: Advancements in DNA sequencing technology have transformed the field of bacterial genomics, allowing for faster and more cost-effective chromosome-level assemblies than a decade ago. However, transforming raw reads into a complete genome model remains a significant computational challenge because of the varying quality and quantity of data obtained from different sequencing instruments, as well as the intrinsic characteristics of the genome and the desired analyses. To address this issue, we have developed a set of container-based pipelines using Nextflow that offer both common workflows for inexperienced users and high levels of customization for experienced ones. Their processing strategies are adaptable to the sequencing data type, and their modularity enables the incorporation of new components to address the community's evolving needs. Methods: These pipelines consist of three parts: quality control, de novo genome assembly, and bacterial genome annotation. In particular, the genome annotation pipeline provides a comprehensive overview of the genome, including standard gene prediction and functional inference, as well as predictions relevant to clinical applications such as virulence and resistance gene annotation, secondary metabolite detection, prophage and plasmid prediction, and more. Results: The annotation results are presented in reports, genome browsers, and a web-based application that enables users to explore and interact with the genome annotation results. Conclusions: Overall, our user-friendly pipelines offer a seamless integration of computational tools to facilitate routine bacterial genomics research. Their effectiveness is illustrated by examining sequencing data from a clinical sample of Klebsiella pneumoniae.
Subjects
Bacterial Genome, Software, DNA Sequence Analysis/methods, Molecular Sequence Annotation, Base Sequence
ABSTRACT
Crosslinking and immunoprecipitation (CLIP) technologies have become a central component of the molecular biologists' toolkit for studying protein-RNA interactions and thus for uncovering core principles of RNA biology. There has been a proliferation of CLIP-based experimental protocols, as well as computational tools, especially for peak-calling. Consequently, there is an urgent need for a well-documented bioinformatic pipeline that enshrines the principles of robustness, reproducibility, scalability, portability and flexibility while embracing the diversity of experimental and computational CLIP tools. To address this, we present nf-core/clipseq, a robust Nextflow pipeline for quality control and analysis of CLIP sequencing data. It is part of the international nf-core community effort to develop and curate a best-practice, gold-standard set of pipelines for data analysis. The standards enabled by Nextflow and nf-core, including workflow management, version control, continuous integration and containerisation, ensure that these key needs are met. Furthermore, multiple tools are implemented (e.g. for peak-calling), alongside visualisation of quality-control metrics, to empower users to make their own informed decisions based on their data. nf-core/clipseq remains under active development, with plans to incorporate newly released tools to ensure that the pipeline remains up to date and relevant for the community. Engagement with users and developers is encouraged through the nf-core GitHub repository and Slack channel to promote collaboration. The pipeline is available at https://nf-co.re/clipseq.
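Like other nf-core pipelines, nf-core/clipseq is launched with a sample sheet and a container profile; the parameters in the sketch below follow common nf-core conventions and are assumptions to double-check at https://nf-co.re/clipseq.

```bash
# Typical nf-core-style launch; exact parameter names are assumptions, see https://nf-co.re/clipseq.
nextflow run nf-core/clipseq \
    --input samplesheet.csv \
    --fasta genome.fa \
    --outdir clipseq_results \
    -profile docker
```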
ABSTRACT
Premise: The HybPiper pipeline has become one of the most widely used tools for the assembly of target capture data for phylogenomic analysis. After the production of locus sequences and before phylogenetic analysis, the identification of paralogs is a critical step for ensuring the accurate inference of evolutionary relationships. Algorithmic approaches using gene tree topologies for the inference of ortholog groups are computationally efficient and broadly applicable to non-model organisms, especially in the absence of a known species tree. Methods and Results: We containerized and expanded the functionality of both HybPiper and a pipeline for the inference of ortholog groups, providing novel options for the treatment of target capture sequence data, and allowing seamless use of the outputs of the former as inputs for the latter. The Singularity container presented here includes all dependencies, and the corresponding pipelines (hybpiper-nf and paragone-nf, respectively) are implemented via two Nextflow scripts for easier deployment and to vastly reduce the number of commands required for their use. Conclusions: The hybpiper-nf and paragone-nf pipelines are easily installed and provide a user-friendly experience and robust results to the phylogenetic community. They are used by the Australian Angiosperm Tree of Life project. The pipelines are available at https://github.com/chrisjackson-pellicle/hybpiper-nf and https://github.com/chrisjackson-pellicle/paragone-nf.
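As a rough illustration of the reduced command count, a hybpiper-nf run via the provided Singularity container might look like the sketch below; the script name and parameter names are assumptions and should be verified against the repository documentation.

```bash
# Hypothetical hybpiper-nf launch; the script name and parameters are assumptions,
# see https://github.com/chrisjackson-pellicle/hybpiper-nf for the documented options.
git clone https://github.com/chrisjackson-pellicle/hybpiper-nf.git
cd hybpiper-nf
nextflow run hybpiper.nf \
    --illumina_reads_directory reads/ \
    --target_file target_genes.fasta \
    -profile singularity
```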
ABSTRACT
Microbes commonly organize into communities consisting of hundreds of species involved in complex interactions with each other. 16S ribosomal RNA (16S rRNA) amplicon profiling provides snapshots that reveal the phylogenies and abundance profiles of these microbial communities. These snapshots, when collected from multiple samples, can reveal the co-occurrence of microbes, providing a glimpse into the network of associations in these communities. However, the inference of networks from 16S data involves numerous steps, each requiring specific tools and parameter choices. Moreover, the extent to which these steps affect the final network is still unclear. In this study, we perform a meticulous analysis of each step of a pipeline that converts 16S sequencing data into a network of microbial associations. Through this process, we map how different choices of algorithms and parameters affect the co-occurrence network and identify the steps that contribute substantially to the variance. We further determine the tools and parameters that generate robust co-occurrence networks and develop consensus network algorithms based on benchmarks with mock and synthetic data sets. The Microbial Co-occurrence Network Explorer, or MiCoNE (available at https://github.com/segrelab/MiCoNE), follows these default tools and parameters and can help explore how these combinations of choices affect the inferred networks. We envisage that this pipeline could be used for integrating multiple data sets and generating comparative analyses and consensus networks that can guide our understanding of microbial community assembly in different biomes.
IMPORTANCE: Mapping the interrelationships between different species in a microbial community is important for understanding and controlling their structure and function. The surge in the high-throughput sequencing of microbial communities has led to the creation of thousands of data sets containing information about microbial abundances. These abundances can be transformed into co-occurrence networks, providing a glimpse into the associations within microbiomes. However, processing these data sets to obtain co-occurrence information relies on several complex steps, each of which involves numerous choices of tools and corresponding parameters. These multiple options pose questions about the robustness and uniqueness of the inferred networks. In this study, we address this workflow and provide a systematic analysis of how these choices of tools affect the final network, along with guidelines on appropriate tool selection for a particular data set. We also develop a consensus network algorithm that helps generate more robust co-occurrence networks based on benchmark synthetic data sets.
Subjects
Microbial Consortia, Microbiota, 16S Ribosomal RNA/genetics, Microbiota/genetics, Algorithms, High-Throughput Nucleotide Sequencing
ABSTRACT
BACKGROUND: A largely unexplored area of research is the identification and characterization of circular RNAs (circRNAs) in cystic fibrosis (CF). This study is the first to identify and characterize alterations in circRNA expression in cells lacking CFTR function. The circRNA expression profiles in whole-blood transcriptomes from CF patients homozygous for the pathogenic variant F508del-CFTR are compared to those of healthy controls. METHODS: We developed a circRNA pipeline called circRNAFlow using Nextflow. Whole-blood transcriptomes from CF patients homozygous for the F508del-CFTR variant and from healthy controls were used as input to circRNAFlow to discover dysregulated circRNA expression in CF samples compared to wild-type controls. Pathway enrichment analyses were performed to investigate potential functions of the dysregulated circRNAs. RESULTS: A total of 118 dysregulated circRNAs were discovered in whole-blood transcriptomes from CF patients homozygous for the F508del-CFTR variant compared to healthy controls: 33 circRNAs were upregulated, whilst 85 were downregulated. The overrepresented pathways of the host genes harboring dysregulated circRNAs in CF samples compared to controls include positive regulation of responses to endoplasmic reticulum stress, intracellular transport, protein serine/threonine kinase activity, the phospholipid-translocating ATPase complex, ferroptosis and cellular senescence. These enriched pathways corroborate the role of dysregulated cellular senescence in CF. CONCLUSION: This study highlights the underexplored roles of circRNAs in CF, with a perspective to provide a more complete molecular characterization of CF.
Subjects
Cystic Fibrosis, MicroRNAs, Humans, Circular RNA/genetics, RNA/genetics, Transcriptome, Cystic Fibrosis/genetics, Cellular Senescence, MicroRNAs/metabolism
ABSTRACT
This chapter describes MasterOfPores v.2 (MoP2), an open-source suite of pipelines for processing and analyzing direct RNA Oxford Nanopore sequencing data. MoP2 relies on the Nextflow DSL2 framework and Linux containers, thus enabling reproducible data analysis in transcriptomic and epitranscriptomic studies. We introduce the key concepts of MoP2 and provide a step-by-step, fully reproducible and complete example of how to use the workflow for the analysis of S. cerevisiae total RNA samples sequenced on MinION flow cells. The workflow starts with the pre-processing of raw FAST5 files, which includes basecalling, read quality control, demultiplexing, filtering, mapping, estimation of per-gene/transcript abundances, and transcriptome assembly, with support for GPU computing in the basecalling and read demultiplexing steps. The secondary analyses of the workflow focus on the estimation of RNA poly(A) tail lengths and the identification of RNA modifications. The MoP2 code is available at https://github.com/biocorecrg/MOP2 and is distributed under the MIT license.
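To make the step-by-step nature of the workflow concrete, a preprocessing run might be launched as in the sketch below; the module directory, script name, and parameter-file name are assumptions drawn from a typical repository layout and should be checked against https://github.com/biocorecrg/MOP2.

```bash
# Hypothetical MoP2 preprocessing run; module/script and file names are assumptions,
# see https://github.com/biocorecrg/MOP2 for the documented setup.
git clone --recurse-submodules https://github.com/biocorecrg/MOP2.git
cd MOP2/mop_preprocess
nextflow run mop_preprocess.nf -params-file params.yaml
```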
Subjects
Nanopore Sequencing, Nanopores, Software, RNA/genetics, Saccharomyces cerevisiae/genetics, High-Throughput Nucleotide Sequencing, RNA Sequence Analysis
ABSTRACT
Long-read sequencing has revolutionized genome assembly, yielding highly contiguous, chromosome-level contigs. However, assemblies from some third-generation long-read technologies, such as Pacific Biosciences (PacBio) continuous long reads (CLR), have a high error rate. Such errors can be corrected with short reads through a process called polishing. Although best practices for polishing non-model de novo genome assemblies were recently described by the Vertebrate Genome Project (VGP) assembly community, there is a need for a publicly available, reproducible workflow that can be easily implemented and run in a conventional high-performance computing environment. Here, we describe polishCLR (https://github.com/isugifNF/polishCLR), a reproducible Nextflow workflow that implements best practices for polishing assemblies made from CLR data. PolishCLR can be initiated from several input options that extend best practices to suboptimal cases. It also provides re-entry points at several key stages, including identification of duplicate haplotypes with purge_dups, an optional break for scaffolding if data are available, and multiple rounds of polishing and evaluation with Arrow and FreeBayes. PolishCLR is containerized and publicly available to the greater assembly community as a tool for completing assemblies from existing, error-prone long-read data.
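A typical entry into the workflow might look like the sketch below; the parameter names for the assembly and read inputs are assumptions and should be verified against the repository README at https://github.com/isugifNF/polishCLR.

```bash
# Hypothetical polishCLR launch; parameter names are assumptions, see the repository README.
nextflow run isugifNF/polishCLR -r main \
    --primary_assembly primary_contigs.fasta \
    --illumina_reads 'illumina/*_{R1,R2}.fastq.gz' \
    --pacbio_reads 'pacbio/*.subreads.bam' \
    --outdir polishclr_results
```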
Subjects
High-Throughput Nucleotide Sequencing, DNA Sequence Analysis, Workflow, Haplotypes
ABSTRACT
BACKGROUND: Circular RNAs (circRNAs) are a class of covalently closed non-coding RNAs that have garnered increased attention from the research community due to their stability, tissue-specific expression and role as transcriptional modulators via sequestration of miRNAs. Currently, multiple quantification tools capable of detecting circRNAs exist, yet none delineates circRNA-miRNA interactions, and only one employs differential expression analysis. Efforts have been made to bridge this gap by way of circRNA workflows; however, these workflows are limited by both the types of analyses available and the computational skills required to run them. RESULTS: We present nf-core/circrna, a multi-functional, automated high-throughput pipeline implemented in Nextflow that allows users to characterise the role of circRNAs in RNA sequencing datasets via three analysis modules: (1) circRNA quantification, robust filtering and annotation; (2) miRNA target prediction of the mature spliced sequence; and (3) differential expression analysis. nf-core/circrna has been developed within the nf-core framework, ensuring robust portability across computing environments via containerisation, parallel deployment on cluster/cloud-based infrastructures, comprehensive documentation and maintenance support. CONCLUSION: nf-core/circrna reduces the barrier to entry for researchers by providing an easy-to-use, platform-independent and scalable workflow for circRNA analyses. Source code, documentation and installation instructions are freely available at https://nf-co.re/circrna and https://github.com/nf-core/circrna.
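Being part of the nf-core framework, nf-core/circrna follows the framework's standard launch pattern; the sketch below uses common nf-core parameters, and the reference selection shown is an assumption to check against https://nf-co.re/circrna.

```bash
# Typical nf-core-style launch; reference and input parameters are assumptions,
# see https://nf-co.re/circrna for the documented options.
nextflow run nf-core/circrna \
    --input samplesheet.csv \
    --genome GRCh38 \
    --outdir circrna_results \
    -profile docker
```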
Subjects
MicroRNAs, MicroRNAs/genetics, MicroRNAs/metabolism, Circular RNA, Workflow, Software, RNA Sequence Analysis
ABSTRACT
Background: Accurate genome sequences form the basis of genomic surveillance programs, whose added value was impressively demonstrated during the COVID-19 pandemic by tracing transmission chains, discovering new viral lineages and mutations, and assessing them for infectiousness and resistance to available treatments. Amplicon strategies employing Illumina sequencing have become widely established for variant detection and reference-based reconstruction of SARS-CoV-2 genomes, and their analysis is now a routine bioinformatics task. Yet, specific challenges arise when analyzing amplicon data, for example, when crucial and even lineage-determining mutations occur near primer sites. Methods: We present CoVpipe2, a bioinformatics workflow developed at the Public Health Institute of Germany to accurately reconstruct SARS-CoV-2 genomes from short-read sequencing data. The decisive factor here is the reliable, accurate, and rapid reconstruction of genomes, taking into account the specifics of the sequencing protocol used. Besides fundamental tasks like quality control, mapping, variant calling, and consensus generation, we also implemented additional features to ease the detection of mixed samples and recombinants. Results: We highlight common pitfalls in primer clipping, detection of heterozygous variants, and handling of low-coverage regions and deletions. We introduce CoVpipe2 to address these challenges and have compared and successfully validated the pipeline against selected publicly available benchmark datasets. CoVpipe2 features high usability, reproducibility, and a modular design that specifically addresses the characteristics of short-read amplicon protocols but can also be used for whole-genome short-read sequencing data. Conclusions: CoVpipe2 has seen multiple improvement cycles and is continuously maintained alongside frequently updated primer schemes and new developments in the scientific community. Our pipeline is easy to set up and use and, thanks to its flexibility and modularity, can serve as a blueprint for other pathogens in the future, providing a long-term perspective for continuous support. CoVpipe2 is written in Nextflow and is freely accessible from https://github.com/rki-mf1/CoVpipe2 under the GPL3 license.
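A run on paired-end amplicon data might look like the sketch below; the FASTQ and output parameter names and the profile combination are assumptions and should be confirmed in the repository documentation.

```bash
# Hypothetical CoVpipe2 launch on paired-end Illumina amplicon reads; parameter names
# are assumptions, see https://github.com/rki-mf1/CoVpipe2 for the documented interface.
nextflow run rki-mf1/CoVpipe2 -r main \
    --fastq 'reads/*_R{1,2}.fastq.gz' \
    --output covpipe2_results \
    -profile local,docker
```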