ABSTRACT
Tissues used in pathology laboratories are typically stored in the form of formalin-fixed, paraffin-embedded (FFPE) samples. One important consideration in repurposing FFPE material for next-generation sequencing (NGS) analysis is the presence of sequencing artifacts arising from the extensive damage to nucleic acids caused by formalin treatment, storage at room temperature, and extraction. One such class of artifacts consists of chimeric reads that appear to be derived from non-contiguous portions of the genome. Here, we show that a major proportion of such chimeric reads align to both the 'Watson' and 'Crick' strands of the reference genome. We refer to these as strand-split artifact reads (SSARs). This study provides a conceptual framework, with supporting experimental evidence, for the mechanisms that generate SSARs and other chimeric artifacts; this framework has led to approaches for reducing the levels of such artifacts. We demonstrate that one of these approaches, S1 nuclease-mediated removal of single-stranded fragments and overhangs, also reduces sequence bias, base error rates, and false-positive detection of copy number and single-nucleotide variants. Finally, we describe an analytical approach for quantifying SSARs from NGS data.
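The abstract does not spell out how SSARs are quantified. The sketch below uses a simplified criterion, which is an assumption and not the authors' method: a chimeric read is counted as a candidate SSAR when its primary alignment and at least one supplementary alignment (listed in the SAM `SA` tag, format `rname,pos,strand,CIGAR,mapQ,NM;`) fall on the same reference sequence, on opposite strands, within a hypothetical `max_distance` window.

```python
# Simplified SSAR criterion (an assumption for illustration, not the
# published algorithm): primary and supplementary alignments on the same
# reference, opposite strands, within a hypothetical distance window.

def parse_sa_tag(sa_tag):
    """Parse a SAM SA tag string into (rname, pos, strand) tuples."""
    alignments = []
    for entry in sa_tag.strip(";").split(";"):
        if not entry:
            continue
        rname, pos, strand = entry.split(",")[:3]
        alignments.append((rname, int(pos), strand))
    return alignments


def is_candidate_ssar(ref_name, ref_pos, is_reverse, sa_tag, max_distance=1000):
    """True if any supplementary alignment lies nearby on the opposite strand."""
    primary_strand = "-" if is_reverse else "+"
    for rname, pos, strand in parse_sa_tag(sa_tag):
        if (rname == ref_name
                and strand != primary_strand
                and abs(pos - ref_pos) <= max_distance):
            return True
    return False


def ssar_fraction(reads):
    """Fraction of chimeric reads (those with an SA tag) flagged as SSARs.

    `reads` holds (ref_name, ref_pos, is_reverse, sa_tag) tuples; sa_tag is
    None for reads without supplementary alignments.
    """
    chimeric = [r for r in reads if r[3]]
    if not chimeric:
        return 0.0
    flagged = sum(is_candidate_ssar(*r) for r in chimeric)
    return flagged / len(chimeric)
```

With pysam, the tuple fields could come from `read.reference_name`, `read.reference_start`, `read.is_reverse` and `read.get_tag("SA")` when `read.has_tag("SA")`, skipping records where `read.is_supplementary` is true to avoid counting a chimera twice. Note that `reference_start` is 0-based while `SA` positions are 1-based, which is negligible for a distance window.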
Subjects
Artifacts, Fixatives, Formaldehyde, High-Throughput Nucleotide Sequencing, Sequence Analysis, DNA, Animals, Genomic Library, Genomics, Hot Temperature, Mice, Inbred C57BL, Paraffin Embedding
ABSTRACT
BACKGROUND: RNA sequencing (RNA-seq) is now commonly used to reveal quantitative spatiotemporal snapshots of the transcriptome, the structures of transcripts (splice variants and fusions), and landscapes of expressed mutations. However, standard approaches for library construction typically require relatively high amounts of input RNA and are labor-intensive and time-consuming. METHODS: Here, we report the outcome of a systematic effort to optimize and streamline the steps of strand-specific RNA-seq library construction. RESULTS: This work identified an optimized messenger RNA isolation protocol, a potent reverse transcriptase for cDNA synthesis, and an efficient chemistry with a simplified formulation of library construction reagents. We also present an optimization of bead-based purification and size selection designed to maximize the recovery of cDNA fragments. CONCLUSIONS: These developments have allowed us to assemble a rapid, high-throughput pipeline that produces high-quality data from as little as 25 ng of total RNA. While this study focuses on RNA-seq sample preparation, some of these developments are also relevant to other next-generation sequencing library types.
Subjects
Gene Library, RNA, Messenger, Sequence Analysis, RNA/methods, Specimen Handling/standards, HL-60 Cells, Humans
ABSTRACT
Plant mitochondrial genomes vary widely in size. Although many plant mitochondrial genomes have been sequenced and assembled, the vast majority are from angiosperms and few are from gymnosperms. Most plant mitochondrial genomes are smaller than a megabase, with a few notable exceptions. We have sequenced and assembled the complete 5.5-Mb mitochondrial genome of Sitka spruce (Picea sitchensis), which is, to date, one of the largest known gymnosperm mitochondrial genomes. We sequenced the whole genome using the Oxford Nanopore MinION and then identified contigs of mitochondrial origin, assembled from these long reads, based on sequence homology to the white spruce mitochondrial genome. The assembly graph shows a multipartite genome structure composed of a smaller, 168-kb circular segment of DNA and a larger, 5.4-Mb single component with a branching structure. The assembly graph gives insight into a putatively complex physical genome structure, and its branching points may represent active sites of recombination.
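Homology-based selection of mitochondrial contigs can be sketched as a coverage filter over alignment hits. This assumes hits are in BLAST+ tabular format (`-outfmt 6`, columns `qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore`) with assembled contigs as queries and a reference mitochondrial genome as the subject; the `min_coverage` and `min_identity` thresholds are hypothetical, since the source does not state its criteria.

```python
# Sketch of homology-based contig selection. Thresholds are hypothetical
# placeholders, not values used in the study.

def merged_length(intervals):
    """Total length covered by closed (start, end) intervals, overlaps merged."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end + 1:
            if cur_end is not None:
                total += cur_end - cur_start + 1
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start + 1
    return total


def mitochondrial_contigs(blast_lines, contig_lengths,
                          min_coverage=0.5, min_identity=90.0):
    """Return contigs whose fraction covered by hits meets min_coverage."""
    hits = {}
    for line in blast_lines:
        fields = line.rstrip("\n").split("\t")
        qseqid, pident = fields[0], float(fields[2])
        qstart, qend = int(fields[6]), int(fields[7])
        if pident < min_identity:
            continue  # ignore weak hits
        hits.setdefault(qseqid, []).append((min(qstart, qend), max(qstart, qend)))
    selected = []
    for contig, intervals in hits.items():
        coverage = merged_length(intervals) / contig_lengths[contig]
        if coverage >= min_coverage:
            selected.append(contig)
    return sorted(selected)
```

Merging overlapping hit intervals before dividing by contig length prevents overlapping local alignments from inflating coverage above 100%.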
Subjects
Genome, Mitochondrial, Genome, Plant, Picea/genetics, Molecular Structure
ABSTRACT
Analysis of cell-free circulating tumor DNA (ctDNA) potentially offers a less invasive and more dynamic assessment of cancer progression and treatment response than characterization of solid tumor biopsies. Standard isolation methods require separation of plasma by centrifugation, a time-consuming step that complicates automation. To address these limitations, we present an automatable, magnetic bead-based ctDNA isolation method that eliminates centrifugation and purifies ctDNA directly from peripheral blood (PB). To develop and test the method, we purified ctDNA from both PB and plasma of cancer patients. Allelic fractions of somatic single-nucleotide variants in target gene capture libraries were comparable between the two sources, indicating that PB ctDNA purification may be a suitable replacement for the plasma-based protocols currently in use.
Subjects
Cell-Free Nucleic Acids/blood, Circulating Tumor DNA/blood, High-Throughput Screening Assays/methods, Neoplasms/blood, Biomarkers, Tumor/blood, Biomarkers, Tumor/isolation & purification, Cell-Free Nucleic Acids/isolation & purification, Circulating Tumor DNA/isolation & purification, High-Throughput Nucleotide Sequencing, Humans, Mutation, Neoplasms/genetics
ABSTRACT
We compared the clinical validity of two non-invasive prenatal screening (NIPS) methods for fetal trisomies 13, 18, and 21 and monosomy X. We prospectively recruited 2203 women at high risk of fetal aneuploidy and 1807 at baseline risk. Three hundred and twenty-nine euploid samples were randomly removed. The remaining 1933 high-risk and 1660 baseline-risk plasma aliquots were randomly assigned among four laboratories and tested with two index NIPS tests, blind to maternal variables and pregnancy outcomes. The two index tests used massively parallel shotgun sequencing (semiconductor-based and optical-based). The reference standard for all fetuses was invasive cytogenetic analysis or clinical examination at birth and postnatal follow-up. For each chromosome of interest, chromosomal ratios were calculated (number of reads for the chromosome/total number of reads). The mean coefficients of variation of the chromosomal ratios in euploid samples were 0.48 (T21), 0.34 (T18), and 0.31 (T13). According to the reference standard, there were 155 cases of T21, 49 of T18, 8 of T13, and 22 of 45,X. Using a fetal fraction ≥4% to call results and a chromosomal ratio z-score ≥3 to report a positive result, detection rates (DR) and false-positive rates (FPR) did not differ significantly between platforms: mean DR 99% (T21), 100% (T18, T13), and 79% (45,X); FPR <0.3% for T21, T18, and T13 and <0.6% for 45,X. Both methods' negative predictive values in high-risk pregnancies were >99.8%, except for 45,X (>99.6%). Threshold analysis in high-risk pregnancies with different fetal fractions and z-score cut-offs suggested that raising the z-score cut-off for positive results to 3.5 improved test accuracy. Both sequencing platforms showed equivalent and excellent clinical validity.
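The calling rule described above (chromosomal ratio, z-score against euploid samples, fetal fraction ≥4% to call, z ≥3 to report positive) can be sketched as follows. The function names, reference ratios, and read counts are hypothetical, and production pipelines may add corrections (for example for GC content) that are omitted here.

```python
# Sketch of the chromosomal-ratio z-score calling rule; thresholds follow
# the text, everything else is illustrative.
from statistics import mean, stdev


def chromosomal_ratio(chrom_reads, total_reads):
    """Reads mapped to the chromosome divided by total reads."""
    return chrom_reads / total_reads


def z_score(ratio, euploid_ratios):
    """Standardize a ratio against euploid reference samples."""
    return (ratio - mean(euploid_ratios)) / stdev(euploid_ratios)


def call_sample(chrom_reads, total_reads, euploid_ratios, fetal_fraction,
                min_ff=0.04, z_cutoff=3.0):
    """Return (call, z); no call if fetal fraction is below min_ff."""
    if fetal_fraction < min_ff:
        return "no call", None
    z = z_score(chromosomal_ratio(chrom_reads, total_reads), euploid_ratios)
    return ("positive" if z >= z_cutoff else "negative"), z
```

Passing `z_cutoff=3.5` reproduces the stricter threshold that the threshold analysis suggested.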
Subjects
Aneuploidy, Cell-Free Nucleic Acids, Fetus, High-Throughput Screening Assays/methods, Ikaros Transcription Factor/genetics, Adolescent, Adult, Down Syndrome, Female, Humans, Male, Middle Aged, Molecular Diagnostic Techniques, Pregnancy, Trisomy 13 Syndrome, Trisomy 18 Syndrome, Turner Syndrome, Young Adult
ABSTRACT
The Steller sea lion is the largest member of the Otariidae family and is found in the coastal waters of the northern Pacific Rim. Here, we present the Steller sea lion genome, determined through DNA sequencing approaches that used microfluidic partitioning library construction as well as nanopore technologies. These methods produced a highly contiguous assembly with a scaffold N50 length of over 14 megabases, a contig N50 length of over 242 kilobases, and a total length of 2.404 gigabases. As a measure of completeness, 95.1% of 4104 highly conserved mammalian genes were found to be complete within the assembly. Further annotation identified 19,668 protein-coding genes. The assembled genome sequence and underlying sequence data are available at the National Center for Biotechnology Information (NCBI) under BioProject accession number PRJNA475770.
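The scaffold and contig N50 values above follow the standard definition: N50 is the length L such that sequences of length ≥ L together contain at least half of the total assembly length. A minimal sketch of that computation (the input lengths in any example are illustrative):

```python
# N50: sort lengths descending and return the length at which the running
# total first reaches half of the assembly size. Feed scaffold lengths for
# scaffold N50 or contig lengths for contig N50.

def n50(lengths):
    if not lengths:
        raise ValueError("no sequence lengths given")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:  # integer-safe test for running >= total / 2
            return length
```

For example, `n50([2, 3, 4, 5, 6, 7, 8, 9, 10])` returns 8, because 10 + 9 + 8 = 27 is exactly half of the 54-unit total.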
Subjects
Genome, Sea Lions/genetics, Animals, Genomic Library, Microfluidics/methods, Nanopores, Whole Genome Sequencing
ABSTRACT
Next-generation RNA sequencing (RNA-seq) is a flexible approach with a range of applications, including global quantification of transcript expression, characterization of RNA structure such as splicing patterns, and profiling of expressed mutations. Many RNA-seq protocols require up to microgram quantities of total RNA input to generate high-quality data and thus remain impractical for the small amounts of starting material typically obtained from rare cell populations, such as those from early developmental stages or laser-microdissected clinical samples. Here, we present an assessment of contemporary ribosomal RNA depletion-based protocols and identify those that are suitable for inputs as low as 1-10 ng of intact total RNA and 100-500 ng of partially degraded RNA from formalin-fixed, paraffin-embedded tissues.
Subjects
High-Throughput Nucleotide Sequencing/methods, RNA, Ribosomal/genetics, Sequence Analysis, RNA/methods, Animals, Base Sequence/genetics, Gene Expression Profiling/methods, Humans, Mammals/genetics, RNA/genetics, RNA, Messenger/genetics, Tissue Fixation/methods, Transcriptome/genetics
ABSTRACT
OBJECTIVES: Non-invasive prenatal aneuploidy testing (NIPT) by next-generation sequencing of circulating cell-free DNA in maternal plasma relies on chromosomal ratio (chrratio) measurements to detect aneuploid values that depart from euploid ratios. Diagnostic performance is known to depend on the fetal DNA fraction (FF) in maternal plasma, but how this translates into specific quantitative changes in specificity and positive predictive value, and which other variables might also be important, is not well understood. DESIGN & METHODS: To explore this issue, theoretical relationships between FF and various measures of diagnostic performance were assessed over a range of parameter values. Empirical data from three NIPT assays were then used to validate the theoretical calculations. RESULTS: For a given positivity threshold, specificity and positive predictive value (PPV) changed dramatically as a function of both FF and the coefficient of variation (CV) of the chrratio measurement. Theoretically predicted and observed chrratio z-scores agreed closely, confirming the decisive impact of small changes in both FF and chrratio CV. CONCLUSIONS: Evaluating NIPT assay performance therefore requires knowledge of the FF distribution in the population in which the test is intended to be used and, in particular, of the precise value of the assay chrratio CV for each chromosome or genomic region of interest. Laboratories offering NIPT should carefully measure these parameters to ensure test reliability and clinical usefulness when interpreting individual patients' results.
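How FF and chrratio CV drive the expected z-score and PPV can be illustrated with a common simplified model, which is an assumption here and not necessarily the model used in this study. In a fetal trisomy the fetal share of DNA carries three copies instead of two, so the chromosomal ratio rises by a relative amount of about FF/2, and the expected z-score is (FF/2)/CV with both values as fractions. z-scores are assumed to be unit-variance normal in euploid and aneuploid pregnancies. In this simplified form the false-positive rate depends only on the z cut-off, so the model captures the FF and CV effects through sensitivity and PPV rather than reproducing the full analysis.

```python
# Simplified theoretical model (assumption): expected z = (FF / 2) / CV,
# unit-variance normal z-scores, fixed z cut-off for positivity.
from math import erfc, sqrt


def upper_tail(x):
    """P(Z >= x) for a standard normal Z."""
    return 0.5 * erfc(x / sqrt(2))


def expected_z(ff, cv):
    """Expected trisomy z-score given fetal fraction and chrratio CV."""
    return (ff / 2) / cv


def performance(ff, cv, prevalence, z_cutoff=3.0):
    """Sensitivity, specificity and PPV at a given prevalence."""
    sensitivity = upper_tail(z_cutoff - expected_z(ff, cv))
    false_positive_rate = upper_tail(z_cutoff)
    tp = sensitivity * prevalence
    fp = false_positive_rate * (1 - prevalence)
    return {
        "expected_z": expected_z(ff, cv),
        "sensitivity": sensitivity,
        "specificity": 1 - false_positive_rate,
        "ppv": tp / (tp + fp),
    }
```

Under this model, doubling the CV from 0.4% to 0.8% at FF = 4% halves the expected z-score from 5 to 2.5, which sharply lowers sensitivity and therefore PPV at a fixed cut-off of 3.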
Subjects
Aneuploidy, Prenatal Diagnosis/methods, Adult, Chromosome Disorders/blood, Female, Genetic Testing/methods, High-Throughput Nucleotide Sequencing, Humans, Models, Theoretical, Predictive Value of Tests, Pregnancy, Prenatal Diagnosis/statistics & numerical data, Reproducibility of Results
ABSTRACT
Curation and storage of formalin-fixed, paraffin-embedded (FFPE) samples are standard procedures in hospital pathology laboratories around the world. Many thousands of such samples exist and could be used for next-generation sequencing analysis. Retrospective analyses of such samples are important for identifying molecular correlates of carcinogenesis, treatment history, and disease outcomes. Two major hurdles in using FFPE material for sequencing are the damaged nature of the nucleic acids and the labor-intensive nature of nucleic acid purification. Here we address these limitations and a number of other issues spanning multiple steps, from nucleic acid purification to library construction. We optimized and automated a 96-well magnetic bead-based extraction protocol that scales to large cohorts. Using sets of 32 and 91 individual FFPE samples, respectively, we generated libraries from 100 ng starting amounts of total RNA and DNA with a 95-100% success rate. We also demonstrated the use of the resulting RNA for microRNA sequencing. In addition to offering scalability and rapid throughput, these methods give usable yields from lower inputs, which makes them applicable to clinical samples in which tissue is limited.