ABSTRACT
The Angiosperms353 gene set (AGS) comprises 353 universal low-copy nuclear genes that were selected by examining more than 600 angiosperm species. These genes can be used for phylogenetic studies and population genetics at multiple taxonomic scales. However, current pipelines cannot recover Angiosperms353 genes efficiently and accurately from high-throughput sequences. Here, we developed Easy353, a reference-guided assembly tool that recovers the AGS from high-throughput sequencing (HTS) data, including genome skimming, RNA-seq, and target enrichment. Easy353 is an open-source, user-friendly assembler for diverse types of high-throughput data. It provides both a graphical user interface and a command-line interface, and it is compatible with all widely used computer systems. Evaluations based on both simulated and empirical data suggest that Easy353 yields low rates of assembly errors.
Subjects
High-Throughput Nucleotide Sequencing, Software, Phylogeny, Genome
ABSTRACT
The advancement of next-generation sequencing (NGS) technologies has been revolutionary for the field of evolutionary biology. These technologies have produced an abundance of genomes and transcriptomes that researchers can mine for the molecular markers that are vital to phylogenetic, evolutionary and ecological studies. Numerous tools have been developed to extract these molecular markers from NGS data. However, because well-annotated reference genomes are lacking for most non-model organisms, it remains challenging to obtain these markers accurately and efficiently. Here, we present GeneMiner, an improved and expanded version of our previous tool, Easy353. GeneMiner combines reference-guided de Bruijn graph assembly with seed self-discovery and greedy extension. Additionally, it includes a verification step that uses a parameter-bootstrap method to reduce the pitfalls associated with using a relatively distant reference. Our results, using both experimental and simulated data, show that GeneMiner can accurately acquire phylogenetic molecular markers for plants from transcriptomic, genomic and other NGS data. GeneMiner is designed to be user-friendly, fast and memory-efficient. Further, it is compatible with Linux, Windows and macOS. All source code is publicly available on GitHub (https://github.com/sculab/GeneMiner) and Gitee (https://gitee.com/sculab/GeneMiner) for easy accessibility and transparency.
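The general idea of seeding from a reference and then extending greedily over read k-mers can be illustrated with a minimal Python sketch. This is not GeneMiner's implementation: the function names (`count_kmers`, `pick_seed`, `extend`, `greedy_assemble`) are hypothetical, and the sketch assumes error-free reads on the same strand as the reference. It also omits the verification step described above.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer that occurs in the reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def pick_seed(kmer_counts, reference, k):
    """Return the most abundant read k-mer that also occurs in the reference."""
    ref_kmers = (reference[i:i + k] for i in range(len(reference) - k + 1))
    shared = [km for km in ref_kmers if kmer_counts.get(km, 0) > 0]
    return max(shared, key=lambda km: kmer_counts[km], default=None)


def extend(contig, kmer_counts, k, direction, max_len=100_000):
    """Greedily add the base whose resulting k-mer is best supported by the reads."""
    seen = set()  # (k-1)-mer anchors already used, to stop on cycles
    while len(contig) < max_len:
        if direction == "right":
            anchor = contig[-(k - 1):]
            options = [(kmer_counts.get(anchor + b, 0), b) for b in "ACGT"]
        else:
            anchor = contig[:k - 1]
            options = [(kmer_counts.get(b + anchor, 0), b) for b in "ACGT"]
        count, base = max(options)
        if count == 0 or anchor in seen:
            break  # no supporting k-mer, or the path has looped back
        seen.add(anchor)
        contig = contig + base if direction == "right" else base + contig
    return contig


def greedy_assemble(reads, reference, k=8):
    """Seed from a k-mer shared with the reference, then extend in both directions."""
    counts = count_kmers(reads, k)
    seed = pick_seed(counts, reference, k)
    if seed is None:
        return ""  # the reads share no k-mer with the reference
    contig = extend(seed, counts, k, "right")
    return extend(contig, counts, k, "left")
```

In this sketch the reference only supplies the starting k-mer, and every extension step is decided by read support alone. That is one simple way to limit the influence of a divergent reference on the assembled sequence.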