Búsqueda | Portal Regional de la BVS

An integrated SNP mining and utilization (ISMU) pipeline for next generation sequencing data.

Azam, Sarwar; Rathore, Abhishek; Shah, Trushar M; Telluri, Mohan; Amindala, BhanuPrakash; Ruperao, Pradeep; Katta, Mohan A V S K; Varshney, Rajeev K.

PLoS One ; 9(7): e101754, 2014.

Artículo en Inglés | MEDLINE | ID: mdl-25003610

RESUMEN

Open source single nucleotide polymorphism (SNP) discovery pipelines for next generation sequencing data commonly requires working knowledge of command line interface, massive computational resources and expertise which is a daunting task for biologists. Further, the SNP information generated may not be readily used for downstream processes such as genotyping. Hence, a comprehensive pipeline has been developed by integrating several open source next generation sequencing (NGS) tools along with a graphical user interface called Integrated SNP Mining and Utilization (ISMU) for SNP discovery and their utilization by developing genotyping assays. The pipeline features functionalities such as pre-processing of raw data, integration of open source alignment tools (Bowtie2, BWA, Maq, NovoAlign and SOAP2), SNP prediction (SAMtools/SOAPsnp/CNS2snp and CbCC) methods and interfaces for developing genotyping assays. The pipeline outputs a list of high quality SNPs between all pairwise combinations of genotypes analyzed, in addition to the reference genome/sequence. Visualization tools (Tablet and Flapjack) integrated into the pipeline enable inspection of the alignment and errors, if any. The pipeline also provides a confidence score or polymorphism information content value with flanking sequences for identified SNPs in standard format required for developing marker genotyping (KASP and Golden Gate) assays. The pipeline enables users to process a range of NGS datasets such as whole genome re-sequencing, restriction site associated DNA sequencing and transcriptome sequencing data at a fast speed. The pipeline is very useful for plant genetics and breeding community with no computational expertise in order to discover SNPs and utilize in genomics, genetics and breeding studies. The pipeline has been parallelized to process huge datasets of next generation sequencing. It has been developed in Java language and is available at http://hpc.icrisat.cgiar.org/ISMU as a standalone free software.

Asunto(s)

Minería de Datos/métodos , Secuenciación de Nucleótidos de Alto Rendimiento , Polimorfismo de Nucleótido Simple , Programas Informáticos , Cruzamiento , Biología Computacional/métodos , Genoma de Planta , Genómica/métodos , Internet , Plantas/genética

Coverage-based consensus calling (CbCC) of short sequence reads and comparison of CbCC results to identify SNPs in chickpea (Cicer arietinum; Fabaceae), a crop species without a reference genome.

Azam, Sarwar; Thakur, Vivek; Ruperao, Pradeep; Shah, Trushar; Balaji, Jayashree; Amindala, BhanuPrakash; Farmer, Andrew D; Studholme, David J; May, Gregory D; Edwards, David; Jones, Jonathan D G; Varshney, Rajeev K.

Am J Bot ; 99(2): 186-92, 2012 Feb.

Artículo en Inglés | MEDLINE | ID: mdl-22301893

RESUMEN

PREMISE OF THE STUDY: Next-generation sequencing (NGS) technologies are frequently used for resequencing and mining of single nucleotide polymorphisms (SNPs) by comparison to a reference genome. In crop species such as chickpea (Cicer arietinum) that lack a reference genome sequence, NGS-based SNP discovery is a challenge. Therefore, unlike probability-based statistical approaches for consensus calling and by comparison with a reference sequence, a coverage-based consensus calling (CbCC) approach was applied and two genotypes were compared for SNP identification. METHODS: A CbCC approach is used in this study with four commonly used short read alignment tools (Maq, Bowtie, Novoalign, and SOAP2) and 15.7 and 22.1 million Illumina reads for chickpea genotypes ICC4958 and ICC1882, together with the chickpea trancriptome assembly (CaTA). KEY RESULTS: A nonredundant set of 4543 SNPs was identified between two chickpea genotypes. Experimental validation of 224 randomly selected SNPs showed superiority of Maq among individual tools, as 50.0% of SNPs predicted by Maq were true SNPs. For combinations of two tools, greatest accuracy (55.7%) was reported for Maq and Bowtie, with a combination of Bowtie, Maq, and Novoalign identifying 61.5% true SNPs. SNP prediction accuracy generally increased with increasing reads depth. CONCLUSIONS: This study provides a benchmark comparison of tools as well as read depths for four commonly used tools for NGS SNP discovery in a crop species without a reference genome sequence. In addition, a large number of SNPs have been identified in chickpea that would be useful for molecular breeding.

Asunto(s)

Cicer/genética , Secuencia de Consenso , Productos Agrícolas/genética , Genoma de Planta , Polimorfismo de Nucleótido Simple , Análisis de Secuencia de ADN/métodos , Secuencia de Bases , Mapeo Cromosómico/métodos , Biología Computacional/métodos , ADN de Plantas/genética , Variación Genética , Genotipo , Estándares de Referencia , Reproducibilidad de los Resultados , Alineación de Secuencia/métodos , Transcriptoma

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

ENVIAR RESULTADO:

SELECCIÓN DE REFERENCIAS

DETALLE DE LA BÚSQUEDA