ABSTRACT
The pig gut virome plays a vital role in the gut microbial ecosystem of pigs. However, a comprehensive understanding of its diversity and a reference database for the virome are currently lacking. To address this gap, we established a Pig Virome Database (PVD) comprising 5,566,804 viral contig sequences from 4,650 publicly available gut metagenomic samples, using a pipeline designated "metav". By clustering sequences, we identified 48,299 viral operational taxonomic unit (vOTU) genomes of at least medium quality, 92.83% of which were not found in existing major databases. The majority of vOTUs were identified as Caudoviricetes (72.21%). The PVD contained a total of 2,362,631 protein-coding genes across these vOTU genomes of at least medium quality, which can be used to explore the functional potential of the pig gut virome. These findings highlight the extensive diversity of viruses in the pig gut and provide a pivotal reference dataset for future research on the pig gut virome.
Subject(s)
Gastrointestinal Microbiome , Viral Genome , Metagenomics , Virome , Viruses , Animals , Swine , Virome/genetics , Metagenomics/methods , Viruses/genetics , Viruses/classification , Viruses/isolation & purification , Data Mining , Metagenome , Phylogeny
ABSTRACT
BACKGROUND: The genetics and molecular biology of sesame have only recently begun to be studied, even though sesame is an important oilseed crop. A high-density genetic map for sesame has not yet been published because of a lack of sufficient molecular markers. Specific length amplified fragment sequencing (SLAF-seq) is a recently developed high-resolution strategy for large-scale de novo SNP discovery and genotyping. SLAF-seq was employed in this study to obtain sufficient markers to construct a high-density genetic map for sesame. RESULTS: In total, 28.21 Gb of data containing 201,488,285 paired-end reads were obtained after sequencing. The average coverage for each SLAF marker was 23.48-fold in the male parent, 23.38-fold in the female parent, and 14.46-fold on average in each F2 individual. In total, 71,793 high-quality SLAFs were detected, of which 3,673 were polymorphic, and 1,272 of the polymorphic markers met the requirements for use in the construction of a genetic map. The final map included 1,233 markers on 15 linkage groups (LGs) and was 1,474.87 cM in length, with an average distance of 1.20 cM between adjacent markers. To our knowledge, this map is the densest genetic linkage map to date for sesame. 'SNP_only' markers accounted for 87.51% of the markers on the map. A total of 205 markers on the map showed significant (P < 0.05) segregation distortion. CONCLUSIONS: We report here the first high-density genetic map for sesame. The map was constructed using an F2 population and the SLAF-seq approach, which allowed the efficient development of a large number of polymorphic markers in a short time. The results of this study will not only provide a platform for gene/QTL fine mapping, map-based gene isolation, and molecular breeding for sesame, but will also serve as a reference for positioning sequence scaffolds on a physical map, assisting the assembly of the sesame genome sequence.
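The segregation-distortion count reported above rests on a goodness-of-fit test per marker. The abstract does not state which test was used, so the following is only a minimal sketch, assuming a codominant marker in an F2 population with the expected 1:2:1 genotype ratio. The function name and the example counts are hypothetical. With 2 degrees of freedom, the chi-square p-value has the closed form exp(-x/2), so no statistics library is needed.

```python
import math


def f2_segregation_test(n_aa, n_ab, n_bb):
    """Chi-square goodness-of-fit test against the 1:2:1 ratio expected
    for a codominant marker in an F2 population.

    Returns (chi2, p_value). The p-value uses the closed-form survival
    function of the chi-square distribution with 2 degrees of freedom.
    """
    total = n_aa + n_ab + n_bb
    if total == 0:
        raise ValueError("at least one genotyped individual is required")
    observed = (n_aa, n_ab, n_bb)
    expected = (0.25 * total, 0.50 * total, 0.25 * total)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    p_value = math.exp(-chi2 / 2.0)  # valid only for df = 2
    return chi2, p_value


# Hypothetical genotype counts for one marker (not data from the study).
chi2, p = f2_segregation_test(40, 40, 20)
distorted = p < 0.05  # same threshold as quoted in the abstract
```

A marker scored as dominant (3:1 expectation) would need a different expected ratio and 1 degree of freedom, for which this closed form does not apply.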
Subject(s)
Sesamum/genetics , Chromosome Mapping , Genotype , Genetic Polymorphism/genetics , Quantitative Trait Loci
ABSTRACT
Polymerase chain reaction (PCR) variants requiring specific primer types are widely used in various PCR experiments, including generic PCR, inverse PCR, anchored PCR, and ARMS PCR. Few tools can be adapted for multiple PCR variants, and many tools select primers by filtering on the given parameters, which results in frequent design failures. Here we introduce PrimerScore2, a robust high-throughput primer design tool that can design primers in one click for multiple PCR variants. It scores primers using a piecewise logistic model and selects the highest-scoring primers, which avoids design failure and the need to loosen parameters and redesign. It also evaluates specificity by predicting the efficiencies of all target and non-target products. To assess the prediction accuracy of the scores and efficiencies, two next-generation sequencing (NGS) libraries were constructed (a 12-plex and a 57-plex). The results showed that 17 out of 19 (89.5%) low-scoring pairs had a poor depth, 18 out of 19 (94.7%) high-scoring pairs had a high depth, and the depth ratios of the products were linearly correlated with the predicted efficiencies, with a slope of 1.025 and a coefficient of determination (R²) of 0.935. The 116-plex and 114-plex anchored PCR panels designed by PrimerScore2 were applied to 26 maternal plasma samples with male fetuses; the results showed that the predicted fetal DNA fractions were concordant with the fractions measured by the gold-standard method (Y fractions). PrimerScore2 was also used to design 77 monoplex Sanger sequencing primers, and the sequencing results indicated that all the primers were effective.
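The abstract does not give the functional form of the piecewise logistic model, so the following is only an illustrative sketch of the general idea of scoring and ranking instead of filtering. It is not PrimerScore2's implementation. All names, features, ranges, steepness values, and weights below are hypothetical. Each feature scores 1.0 inside an assumed optimal range and falls off along a logistic-shaped curve outside it; because every candidate receives a score, the best one can always be chosen.

```python
import math


def piecewise_logistic_score(value, opt_low, opt_high, steepness):
    """Score in (0, 1]: 1.0 inside [opt_low, opt_high], then a smooth
    logistic-shaped decay with distance from the nearest range edge."""
    if opt_low <= value <= opt_high:
        return 1.0
    distance = opt_low - value if value < opt_low else value - opt_high
    decay = math.exp(-steepness * distance)  # in (0, 1), never overflows
    return 2.0 * decay / (1.0 + decay)       # equals 1.0 at distance 0


def score_primer(features, specs):
    """Weighted mean of the per-feature scores."""
    total_weight = sum(spec["weight"] for spec in specs.values())
    weighted = sum(
        spec["weight"]
        * piecewise_logistic_score(
            features[name], spec["low"], spec["high"], spec["steepness"]
        )
        for name, spec in specs.items()
    )
    return weighted / total_weight


def pick_best(candidates, specs):
    """Return the highest-scoring candidate; nothing is filtered out."""
    return max(candidates, key=lambda c: score_primer(c["features"], specs))


# Hypothetical feature specifications and candidates, for illustration only.
SPECS = {
    "tm": {"low": 58.0, "high": 62.0, "steepness": 0.8, "weight": 2.0},
    "gc": {"low": 0.40, "high": 0.60, "steepness": 20.0, "weight": 1.0},
}
CANDIDATES = [
    {"name": "p1", "features": {"tm": 66.0, "gc": 0.55}},
    {"name": "p2", "features": {"tm": 61.0, "gc": 0.62}},
    {"name": "p3", "features": {"tm": 54.0, "gc": 0.30}},
]
best = pick_best(CANDIDATES, SPECS)
```

In this toy set, no candidate sits inside both assumed ranges, so a hard filter would return nothing, whereas ranking still yields a usable choice.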
Subject(s)
Logistic Models , Male , Humans , Polymerase Chain Reaction
ABSTRACT
The precisionFDA Truth Challenge V2 aimed to assess the state of the art of variant calling in challenging genomic regions. Starting with FASTQs, 20 challenge participants applied their variant-calling pipelines and submitted 64 variant call sets for one or more sequencing technologies (Illumina, PacBio HiFi, and Oxford Nanopore Technologies). Submissions were evaluated following best practices for benchmarking small variants with updated Genome in a Bottle benchmark sets and genome stratifications. Challenge submissions included numerous innovative methods, with graph-based and machine learning methods scoring best for short-read and long-read datasets, respectively. With machine learning approaches, combining multiple sequencing technologies performed particularly well. Recent developments in sequencing and variant calling have enabled benchmarking variants in challenging genomic regions, paving the way for the identification of previously unknown clinically relevant variants.
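Small-variant benchmarking of this kind is usually summarized as precision, recall, and F1 computed from true-positive, false-positive, and false-negative counts, often per genome stratification. The abstract does not list the metrics used, so the following is a minimal sketch under that assumption. The function name, stratification names, and counts are hypothetical and are not results from the challenge.

```python
def benchmark_metrics(tp, fp, fn):
    """Return (precision, recall, f1) from variant comparison counts.

    tp: calls matching the benchmark set
    fp: calls absent from the benchmark set
    fn: benchmark variants that were not called
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2.0 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical per-stratification counts: (tp, fp, fn).
counts_by_stratum = {
    "all_benchmark_regions": (9000, 100, 200),
    "difficult_regions": (90, 10, 30),
}
metrics_by_stratum = {
    name: benchmark_metrics(*counts) for name, counts in counts_by_stratum.items()
}
```

Reporting the metrics per stratum rather than only genome-wide is what makes performance differences in difficult regions visible.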
ABSTRACT
Noninvasive prenatal testing of common aneuploidies has become routine over the past decade, but testing of monogenic disorders remains a challenge in clinical implementation. Most recent studies have inherent limitations, such as complicated procedures, a lack of versatility, and the need for prior knowledge of parental genotypes or haplotypes. To overcome these limitations, a robust and versatile next-generation sequencing-based cell-free DNA (cfDNA) allelic molecule counting system termed cfDNA barcode-enabled single-molecule test (cfBEST) is developed for the noninvasive prenatal diagnosis (NIPD) of monogenic disorders. The accuracy of cfBEST is found to be comparable to that of droplet digital polymerase chain reaction (ddPCR) in detecting low-abundance mutations in cfDNA. The analytical validity of cfBEST is evidenced by a β-thalassemia assay, in which a blind validation study of 143 at-risk pregnancies reveals a sensitivity of 99.19% and a specificity of 99.92% on allele detection. Because the validated cfBEST method can be used to detect maternal-fetal genotype combinations in cfDNA precisely and quantitatively, it holds the potential for the NIPD of human monogenic disorders.
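The general idea behind barcode-enabled allelic molecule counting can be sketched as follows. This is not the cfBEST pipeline, whose details the abstract does not describe. It assumes reads arrive as (barcode, allele) pairs, that reads sharing a barcode are copies of one original molecule, and that a simple majority vote gives the molecule's allele. The function names and allele labels are hypothetical.

```python
from collections import Counter, defaultdict


def count_allelic_molecules(reads):
    """Collapse reads to molecules by barcode and count molecules per allele.

    reads: iterable of (barcode, allele) pairs.
    A barcode whose top two alleles are tied is discarded as ambiguous.
    """
    alleles_by_barcode = defaultdict(Counter)
    for barcode, allele in reads:
        alleles_by_barcode[barcode][allele] += 1

    molecules = Counter()
    for allele_counts in alleles_by_barcode.values():
        ranked = allele_counts.most_common(2)
        if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
            continue  # tie between alleles: no reliable consensus
        molecules[ranked[0][0]] += 1
    return molecules


def mutant_fraction(molecules, mutant="MUT", wild_type="WT"):
    """Fraction of counted molecules that carry the mutant allele."""
    total = molecules[mutant] + molecules[wild_type]
    return molecules[mutant] / total if total else 0.0


# Hypothetical reads, for illustration only.
reads = [
    ("b1", "WT"), ("b1", "WT"), ("b1", "MUT"),  # consensus WT
    ("b2", "MUT"),                               # consensus MUT
    ("b3", "WT"),                                # consensus WT
    ("b4", "WT"), ("b4", "MUT"),                 # tie, discarded
]
molecules = count_allelic_molecules(reads)
fraction = mutant_fraction(molecules)
```

Counting molecules rather than reads keeps amplification duplicates from inflating either allele, which matters when the mutant allele is at low abundance.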
ABSTRACT
Linkage maps enable the study of important biological questions. The construction of high-density linkage maps has become more feasible since the advent of next-generation sequencing (NGS), which eases SNP discovery and high-throughput genotyping of large populations. However, the explosion in marker number and the genotyping errors in NGS data challenge the computational efficiency of linkage analysis methods and the quality of the resulting maps. Here we report the HighMap method for constructing high-density linkage maps from NGS data. HighMap employs an iterative ordering and error correction strategy based on a k-nearest neighbor algorithm and a Monte Carlo multipoint maximum likelihood algorithm. A simulation study shows that HighMap can create a linkage map with three times as many markers as ordering-only methods while offering more accurate marker orders and stable genetic distances. Using HighMap, we constructed a common carp linkage map with 10,004 markers. The singleton rate was less than one-ninth of that generated by JoinMap4.1. Its total map distance was 5,908 cM, consistent with reports on low-density maps. HighMap is an efficient method for constructing high-density, high-quality linkage maps from high-throughput population NGS data. It will facilitate genome assembly, comparative genomic analysis, and QTL studies. HighMap is available at http://highmap.biomarker.com.cn/.
Subject(s)
Chromosome Mapping , Genomics/methods , High-Throughput Nucleotide Sequencing , Algorithms , Animals , Carps/genetics , Genetic Markers/genetics , Genotyping Techniques
ABSTRACT
Large-scale genotyping plays an important role in genetic association studies. It has provided new opportunities for gene discovery, especially when combined with high-throughput sequencing technologies. Here, we report an efficient solution for large-scale genotyping, which we call specific-locus amplified fragment sequencing (SLAF-seq). SLAF-seq technology has several distinguishing characteristics: i) deep sequencing to ensure genotyping accuracy; ii) a reduced representation strategy to reduce sequencing costs; iii) a pre-designed reduced representation scheme to optimize marker efficiency; and iv) a double barcode system for large populations. In this study, we tested the efficiency of SLAF-seq on rice and soybean data. Both sets of results showed strong consistency between predicted and practical SLAFs and considerable genotyping accuracy. We also report the highest-density genetic map yet created for any organism without a reference genome sequence, common carp in this case, using SLAF-seq data. We detected 50,530 high-quality SLAFs with 13,291 SNPs genotyped in 211 individual carp. The genetic map contained 5,885 markers with 0.68 cM intervals on average. A comparative genomics study between the common carp genetic map and the zebrafish genome sequence map showed high-quality SLAF-seq genotyping results. SLAF-seq provides a high-resolution strategy for large-scale genotyping and can be generally applicable to various species and populations.
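A pre-designed reduced representation scheme is typically planned by digesting a sequence in silico and counting the fragments that fall in a target size window. The abstract does not name the enzymes or size ranges used, so the following is only a minimal sketch under assumptions: a single-enzyme digest on one strand, with the EcoRI recognition site (GAATTC, cut after the G) chosen purely as a familiar example. Function names, the toy sequence, and the size window are hypothetical.

```python
def digest(sequence, site, cut_offset):
    """Cut `sequence` at every occurrence of `site`.

    cut_offset is the position inside the site where the strand is cut,
    e.g. 1 for G^AATTC. Returns the fragments in order.
    """
    fragments = []
    start = 0
    pos = sequence.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(sequence[start:cut])
        start = cut
        pos = sequence.find(site, pos + 1)
    fragments.append(sequence[start:])
    return [fragment for fragment in fragments if fragment]


def select_fragments(fragments, min_len, max_len):
    """Keep fragments whose length lies in [min_len, max_len]."""
    return [f for f in fragments if min_len <= len(f) <= max_len]


# Toy sequence and size window, for illustration only.
toy_sequence = "AAAGAATTCCCCCGAATTCTT"
fragments = digest(toy_sequence, "GAATTC", 1)
selected = select_fragments(fragments, 5, 10)
```

Running the same count for several candidate enzymes and size windows on a reference or related genome gives a predicted marker number that can later be compared with the fragments actually recovered.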