Search | BVS IEC

SASI-Seq: sample assurance Spike-Ins, and highly differentiating 384 barcoding for Illumina sequencing.

Quail, Michael A; Smith, Miriam; Jackson, David; Leonard, Steven; Skelly, Thomas; Swerdlow, Harold P; Gu, Yong; Ellis, Peter.

BMC Genomics ; 15: 110, 2014 Feb 07.

Article in English | MEDLINE | ID: mdl-24507442

ABSTRACT

BACKGROUND: A minor but significant fraction of samples subjected to next-generation sequencing methods are either mixed up or cross-contaminated. These events can lead to false or inconclusive results. We have therefore developed SASI-Seq, a process whereby a set of uniquely barcoded DNA fragments is added to samples destined for sequencing. From the final sequencing data, one can verify that all the reads derive from the original sample(s) and not from contaminants or other samples. RESULTS: By adding a mixture of three uniquely barcoded amplicons, of different sizes spanning the range of insert sizes one would normally use for Illumina sequencing, at a spike-in level of approximately 0.1%, we demonstrate that these fragments remain intimately associated with the sample. They can be detected following even the tightest size selection regimes or exome enrichment and can report the occurrence of sample mix-ups and cross-contamination. As a consequence of this work, we have designed a set of 384 eleven-base Illumina barcode sequences that are at least 5 changes apart from each other, allowing for single-error correction and very low levels of barcode misallocation due to sequencing error. CONCLUSION: SASI-Seq is a simple, inexpensive and flexible tool that enables sample assurance, allows deconvolution of sample mix-ups and reports levels of cross-contamination between samples throughout NGS workflows.
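The barcode property described in the abstract (every pair at least 5 changes apart, with single-error correction) can be illustrated with a minimal sketch. It assumes "changes" means substitutions, so Hamming distance is used; the toy barcodes and the function names `hamming`, `min_pairwise_distance` and `assign_barcode` are hypothetical and are not taken from the published 384-barcode set or the authors' software.

```python
from itertools import combinations

def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length barcodes differ."""
    if len(a) != len(b):
        raise ValueError("barcodes must have equal length")
    return sum(x != y for x, y in zip(a, b))

def min_pairwise_distance(barcodes):
    """Smallest Hamming distance between any two barcodes in the set."""
    return min(hamming(a, b) for a, b in combinations(barcodes, 2))

def assign_barcode(read_tag, barcodes, max_errors=1):
    """Return the single barcode within max_errors mismatches, else None."""
    hits = [b for b in barcodes if hamming(read_tag, b) <= max_errors]
    return hits[0] if len(hits) == 1 else None

# Hypothetical 11-base toy barcodes (assumption, not the published set).
TOY_BARCODES = ["AAAAAAAAAAA", "CCCCCAAAAAA", "AAAAAACCCCC", "GGGGGGGGGGG"]

# Every pair differs at 5 or more positions.
print(min_pairwise_distance(TOY_BARCODES))

# One sequencing error is corrected back to the intended barcode.
print(assign_barcode("AAAAAAAAAAT", TOY_BARCODES))

# Three errors match no barcode within one mismatch, so the read is rejected.
print(assign_barcode("AAAAAAAATTT", TOY_BARCODES))
```

With a minimum distance of 5 and correction limited to one mismatch, a read would need at least 4 substitution errors to be assigned to the wrong barcode, which is consistent with the low misallocation the abstract reports.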

Subjects

Sequence Analysis, DNA/methods , DNA/chemistry , DNA/metabolism , DNA Contamination , Gene Library , High-Throughput Nucleotide Sequencing , Humans

Optimal enzymes for amplifying sequencing libraries.

Quail, Michael A; Otto, Thomas D; Gu, Yong; Harris, Simon R; Skelly, Thomas F; McQuillan, Jacqueline A; Swerdlow, Harold P; Oyola, Samuel O.

Nat Methods ; 9(1): 10-1, 2011 Dec 28.

Article in English | MEDLINE | ID: mdl-22205512

Subjects

DNA-Directed DNA Polymerase/metabolism , Genomic Library , Nucleic Acid Amplification Techniques/methods , Polymerase Chain Reaction/methods , Bordetella pertussis/genetics , Genome, Bacterial/genetics , Humans , Plasmodium falciparum/genetics , Salmonella/genetics , Staphylococcus aureus/genetics

Distinguishing highly similar gene isoforms with a clustering-based bioinformatics analysis of PacBio single-molecule long reads.

Liang, Ma; Raley, Castle; Zheng, Xin; Kutty, Geetha; Gogineni, Emile; Sherman, Brad T; Sun, Qiang; Chen, Xiongfong; Skelly, Thomas; Jones, Kristine; Stephens, Robert; Zhou, Bin; Lau, William; Johnson, Calvin; Imamichi, Tomozumi; Jiang, Minkang; Dewar, Robin; Lempicki, Richard A; Tran, Bao; Kovacs, Joseph A; Huang, Da Wei.

BioData Min ; 9: 13, 2016.

Article in English | MEDLINE | ID: mdl-27051465

ABSTRACT

BACKGROUND: Gene isoforms are commonly found in both prokaryotes and eukaryotes. Since each isoform may perform a specific function in response to changing environmental conditions, studying the dynamics of gene isoforms is important in understanding biological processes and disease conditions. However, genome-wide identification of gene isoforms is technically challenging due to the high degree of sequence identity among isoforms. The traditional targeted sequencing approach, involving Sanger sequencing of plasmid-cloned PCR products, has low throughput and is tedious and time-consuming. Next-generation sequencing technologies such as Illumina and 454 achieve high throughput, but their short read lengths are a critical barrier to accurate assembly of highly similar gene isoforms and may result in ambiguities and false joining during sequence assembly. More recently, third-generation sequencing, represented by the PacBio platform, offers sufficient throughput and long reads covering the full length of typical genes, thus offering the potential to reliably profile gene isoforms. However, the PacBio long reads are error-prone and cannot be effectively analyzed by traditional assembly programs. RESULTS: We present a clustering-based analysis pipeline integrated with PacBio sequencing data for profiling highly similar gene isoforms. This approach was first evaluated in comparison to de novo assembly of 454 reads using a benchmark admixture containing 10 known, cloned msg genes encoding the major surface glycoprotein of Pneumocystis jirovecii. All 10 msg isoforms were successfully reconstructed with the expected length (~1.5 kb) and correct sequence by the new approach, while 454 reads could not be correctly assembled using various assembly programs. When using an additional benchmark admixture containing 22 known P. jirovecii msg isoforms, this approach accurately reconstructed all but 4 of these isoforms at full length (~3 kb); these 4 isoforms were present in low concentrations in the admixture. Finally, when applied to the original clinical sample from which the 22 known msg isoforms were cloned, this approach accurately identified not only all known isoforms (~3 kb each) but also 48 novel isoforms. CONCLUSIONS: PacBio sequencing integrated with the clustering-based analysis pipeline achieves high-throughput and high-resolution discrimination of highly similar sequences, and can serve as a new approach for genome-wide characterization of gene isoforms and other highly repetitive sequences.

Towards Better Precision Medicine: PacBio Single-Molecule Long Reads Resolve the Interpretation of HIV Drug Resistant Mutation Profiles at Explicit Quasispecies (Haplotype) Level.

Huang, Da Wei; Raley, Castle; Jiang, Min Kang; Zheng, Xin; Liang, Dun; Rehman, M Tauseef; Highbarger, Helene C; Jiao, Xiaoli; Sherman, Brad; Ma, Liang; Chen, Xiaofeng; Skelly, Thomas; Troyer, Jennifer; Stephens, Robert; Imamichi, Tomozumi; Pau, Alice; Lempicki, Richard A; Tran, Bao; Nissley, Dwight; Lane, H Clifford; Dewar, Robin L.

J Data Mining Genomics Proteomics ; 7(1)2016 Jan.

Article in English | MEDLINE | ID: mdl-26949565

ABSTRACT

Development of HIV-1 drug resistance mutations (HDRMs) is one of the major reasons for the clinical failure of antiretroviral therapy. Treatment success rates can be improved by applying personalized anti-HIV regimens based on a patient's HDRM profile. However, the sensitivity and specificity of the HDRM profile are limited by the methods used for detection. Sanger-based sequencing technology has traditionally been used for determining HDRM profiles at the single nucleotide variant (SNV) level, but with a sensitivity of only ≥ 20% in the HIV population of a patient. Next Generation Sequencing (NGS) technologies offer greater detection sensitivity (~ 1%) and larger scope (hundreds of samples per run). However, NGS technologies produce reads that are too short to enable the detection of the physical linkages of individual SNVs across the haplotype of each HIV strain present. In this article, we demonstrate that the single-molecule long reads generated using the Third Generation Sequencer (TGS), PacBio RS II, along with the appropriate bioinformatics analysis method, can resolve the HDRM profile at a more advanced quasispecies level. The case studies on patients' HIV samples showed that the quasispecies view produced using the PacBio method offered greater detection sensitivity and was more comprehensive for understanding HDRM situations, making it complementary to both Sanger and NGS technologies. In conclusion, the PacBio method, providing a promising new quasispecies level of HDRM profiling, may effect an important change in the field of HIV drug resistance research.
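The distinction the abstract draws between SNV-level and quasispecies (haplotype) level profiles can be sketched with a toy example. This is not the authors' bioinformatics method: it assumes error-free full-length reads whose mutations have already been called, and the labels `mutA` and `mutB` as well as both function names are hypothetical.

```python
from collections import Counter

def snv_frequencies(reads):
    """Per-mutation frequency, ignoring which mutations share a read."""
    n = len(reads)
    counts = Counter(m for r in reads for m in r)
    return {m: c / n for m, c in counts.items()}

def haplotype_frequencies(reads):
    """Frequency of each full combination of mutations seen on one read."""
    n = len(reads)
    counts = Counter(frozenset(r) for r in reads)
    return {h: c / n for h, c in counts.items()}

# Hypothetical populations: each read is the set of mutations it carries.
linked = [{"mutA", "mutB"}] * 5 + [set()] * 5   # both mutations on one strain
unlinked = [{"mutA"}] * 5 + [{"mutB"}] * 5      # mutations on separate strains

# The SNV-level view cannot tell the two populations apart.
print(snv_frequencies(linked) == snv_frequencies(unlinked))

# The haplotype-level view separates them.
print(haplotype_frequencies(linked) != haplotype_frequencies(unlinked))
```

Both populations show each mutation at 50%, yet only the first contains a strain carrying both mutations; recovering that linkage requires reads long enough to span both sites.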
