ABSTRACT
The expression of genes encompasses their transcription into mRNA followed by translation into protein. In recent years, next-generation sequencing and mass spectrometry methods have profiled DNA, RNA and protein abundance in cells. However, there are currently no reference standards that are compatible across these genomic, transcriptomic and proteomic methods, and provide an integrated measure of gene expression. Here, we use synthetic biology principles to engineer a multi-omics control, termed pREF, that can act as a universal molecular standard for next-generation sequencing and mass spectrometry methods. The pREF sequence encodes 21 synthetic genes that can be in vitro transcribed into spike-in mRNA controls, and in vitro translated to generate matched protein controls. The synthetic genes provide qualitative controls that can measure sensitivity and quantitative accuracy of DNA, RNA and peptide detection. We demonstrate the use of pREF in metagenome DNA sequencing and RNA sequencing experiments and evaluate the quantification of proteins using mass spectrometry. Unlike previous spike-in controls, pREF can be independently propagated and the synthetic mRNA and protein controls can be sustainably prepared by recipient laboratories using common molecular biology techniques. Together, this provides a universal synthetic standard able to integrate genomic, transcriptomic and proteomic methods.
Subjects
DNA, Proteomics, Messenger RNA/genetics, Messenger RNA/metabolism, DNA/genetics, Genomics, RNA
ABSTRACT
BACKGROUND: Next-generation sequencing (NGS) can identify mutations in the human genome that cause disease and has been widely adopted in clinical diagnosis. However, the human genome contains many polymorphic, low-complexity, and repetitive regions that are difficult to sequence and analyze. Despite their difficulty, these regions include many clinically important sequences that can inform the treatment of human diseases and improve the diagnostic yield of NGS. RESULTS: To evaluate the accuracy by which these difficult regions are analyzed with NGS, we built an in silico decoy chromosome, along with corresponding synthetic DNA reference controls, that encode difficult and clinically important human genome regions, including repeats, microsatellites, HLA genes, and immune receptors. These controls provide a known ground-truth reference against which to measure the performance of diverse sequencing technologies, reagents, and bioinformatic tools. Using this approach, we provide a comprehensive evaluation of short- and long-read sequencing instruments, library preparation methods, and software tools and identify the errors and systematic bias that confound our resolution of these remaining difficult regions. CONCLUSIONS: This study provides an analytical validation of diagnosis using NGS in difficult regions of the human genome and highlights the challenges that remain to resolve these difficult regions.
Subjects
Human Genome, High-Throughput Nucleotide Sequencing, Chromosomes, High-Throughput Nucleotide Sequencing/methods, Humans, Microsatellite Repeats, DNA Sequence Analysis/methods, Software
ABSTRACT
Library adaptors are short oligonucleotides that are attached to RNA and DNA samples in preparation for next-generation sequencing (NGS). Adaptors can also include additional functional elements, such as sample indexes and unique molecular identifiers, to improve library analysis. Here, we describe Control Library Adaptors, termed CAPTORs, that measure the accuracy and reliability of NGS. CAPTORs can be integrated within the library preparation of RNA and DNA samples, and their encoded information is retrieved during sequencing. We show how CAPTORs can measure the accuracy of nanopore sequencing, evaluate the quantitative performance of metagenomic and RNA sequencing, and improve normalisation between samples. CAPTORs can also be customised for clinical diagnoses, correcting systematic sequencing errors and improving the diagnosis of pathogenic BRCA1/2 variants in breast cancer. CAPTORs are a simple and effective method to increase the accuracy and reliability of NGS, enabling comparisons between samples, reagents and laboratories, and supporting the use of nanopore sequencing for clinical diagnosis.
Subjects
Nanopore Sequencing, Reproducibility of Results, Gene Library, High-Throughput Nucleotide Sequencing/methods, RNA
ABSTRACT
Leukemia stem cells (LSCs) are linked to relapse in acute myeloid leukemia (AML). The LSC17 gene expression score robustly captures LSC stemness properties in AML and can be used to predict survival outcomes and response to therapy, enabling risk-adapted, upfront treatment approaches. The LSC17 score was developed and validated in a research setting. To enable widespread use of the LSC17 score in clinical decision making, we established a laboratory-developed test (LDT) for the LSC17 score that can be deployed broadly in clinical molecular diagnostic laboratories. We extensively validated the LSC17 LDT in a College of American Pathologists/Clinical Laboratory Improvement Amendments (CAP/CLIA)-certified laboratory, determining specimen requirements, a synthetic control, and performance parameters for the assay. Importantly, we correlated values from the LSC17 LDT to clinical outcome in a reference cohort of patients with AML, establishing a median assay value that can be used for clinical risk stratification of individual patients with newly diagnosed AML. The assay was established in a second independent CAP/CLIA-certified laboratory, and its technical performance was validated using an independent cohort of patient samples, demonstrating that the LSC17 LDT can be readily implemented in other settings. This study enables the clinical use of the LSC17 score for upfront risk-adapted management of patients with AML.
Subjects
Clinical Laboratories, Acute Myeloid Leukemia, Cohort Studies, Humans, Acute Myeloid Leukemia/drug therapy, Neoplastic Stem Cells/metabolism, Risk Assessment
ABSTRACT
DNA synthesis in vitro has enabled the rapid production of reference standards. These are used as controls, and allow measurement and improvement of the accuracy and quality of diagnostic tests. Current reference standards typically represent target genetic material, and act only as positive controls to assess test sensitivity. However, negative controls are also required to evaluate test specificity. Here, a pair of chimeric A/B RNA standards was used to incorporate both positive and negative controls into diagnostic testing for Severe Acute Respiratory Syndrome Coronavirus-2 (SARS-CoV-2). The chimeric standards comprise target regions for RT-PCR primer/probe sets that are joined in tandem across two separate synthetic molecules. Accordingly, a target region that is present in standard A provides a positive control, whilst its absence from standard B provides a negative control. This design enables cross-validation of positive and negative controls between the paired standards in the same reaction, with identical conditions. This allows control failures and test failures to be distinguished, increasing confidence in the accuracy of results. The chimeric A/B standards were assessed using the US Centers for Disease Control and Prevention real-time RT-PCR protocol, and showed results congruent with other commercial controls in detecting SARS-CoV-2 in patient samples. This chimeric reference standard design offers extensive flexibility, allowing representation of diverse genetic features and distantly related sequences, even from different organisms.
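The paired-control logic described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' software: the target names (`N1`, `N2`), their assignment to standards A and B, and the shape of the results table are all hypothetical, chosen only to show how a target that amplifies from its own standard but not from the partner standard passes both checks.

```python
# Hypothetical design: each RT-PCR target is carried by exactly one standard.
# The names and assignments below are illustrative, not taken from the paper.
DESIGN = {"N1": "A", "N2": "B"}


def check_controls(design, amplified):
    """Classify each target from the standards in which it amplified.

    design    -- maps target name to the standard ("A" or "B") that carries it
    amplified -- maps standard name to the set of targets that amplified
    """
    report = {}
    for target, home in design.items():
        other = "B" if home == "A" else "A"
        positive_ok = target in amplified[home]       # should amplify
        negative_ok = target not in amplified[other]  # should stay silent
        if positive_ok and negative_ok:
            report[target] = "pass"
        elif not positive_ok and not negative_ok:
            report[target] = "both controls failed"
        elif not positive_ok:
            report[target] = "positive control failed"
        else:
            report[target] = "negative control failed"
    return report


clean_run = check_controls(DESIGN, {"A": {"N1"}, "B": {"N2"}})
contaminated_run = check_controls(DESIGN, {"A": {"N1", "N2"}, "B": {"N2"}})
```

In the second call the `N2` target also amplifies from standard A, where it should be absent, so only its negative control is flagged while `N1` still passes.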
Subjects
Chimera, Amino Acid Sequence, COVID-19/diagnosis, COVID-19/virology, Humans, Viral RNA/standards, Reference Standards, Reproducibility of Results, SARS-CoV-2/chemistry, SARS-CoV-2/genetics, SARS-CoV-2/isolation & purification, Sensitivity and Specificity
ABSTRACT
Circulating tumor DNA (ctDNA) sequencing is being rapidly adopted in precision oncology, but the accuracy, sensitivity and reproducibility of ctDNA assays are poorly understood. Here we report the findings of a multi-site, cross-platform evaluation of the analytical performance of five industry-leading ctDNA assays. We evaluated each stage of the ctDNA sequencing workflow with simulations, synthetic DNA spike-in experiments and proficiency testing on standardized, cell-line-derived reference samples. Above 0.5% variant allele frequency, ctDNA mutations were detected with high sensitivity, precision and reproducibility by all five assays, whereas, below this limit, detection became unreliable and varied widely between assays, especially when input material was limited. Missed mutations (false negatives) were more common than erroneous candidates (false positives), indicating that the reliable sampling of rare ctDNA fragments is the key challenge for ctDNA assays. This comprehensive evaluation of the analytical performance of ctDNA assays serves to inform best practice guidelines and provides a resource for precision oncology.
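The sampling problem highlighted in this abstract can be made concrete with a simple binomial model. The sketch below is an illustration under stated assumptions, not the evaluation method used in the study: it assumes fragments covering a locus are sampled independently, that each carries the mutation with probability equal to the variant allele frequency, and that a call requires some minimum number of mutant fragments. The fragment counts and the threshold are hypothetical.

```python
from math import comb


def prob_at_least(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p).

    k -- minimum number of mutant fragments needed to call a variant (assumed)
    n -- number of fragments sampled at the locus (assumed)
    p -- variant allele frequency, as a fraction
    """
    below = sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k))
    return 1.0 - below


# Hypothetical comparison: the same 0.1% variant with generous and limited input.
generous_input = prob_at_least(3, 5000, 0.001)
limited_input = prob_at_least(3, 500, 0.001)
```

Under this model, lowering either the allele frequency or the number of sampled fragments reduces the chance that enough mutant fragments are present at all, which is consistent with false negatives dominating at low input.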
Subjects
Circulating Tumor DNA/genetics, Medical Oncology, Neoplasms/genetics, Precision Medicine, DNA Sequence Analysis/standards, High-Throughput Nucleotide Sequencing/methods, Humans, Limit of Detection, Practice Guidelines as Topic, Reproducibility of Results
ABSTRACT
Clinical applications of precision oncology require accurate tests that can distinguish true cancer-specific mutations from errors introduced at each step of next-generation sequencing (NGS). To date, no bulk sequencing study has addressed the effects of cross-site reproducibility, nor the biological, technical and computational factors that influence variant identification. Here we report a systematic interrogation of somatic mutations in paired tumor-normal cell lines to identify factors affecting detection reproducibility and accuracy at six different centers. Using whole-genome sequencing (WGS) and whole-exome sequencing (WES), we evaluated the reproducibility of different sample types with varying input amount and tumor purity, and multiple library construction protocols, followed by processing with nine bioinformatics pipelines. We found that read coverage and callers affected both WGS and WES reproducibility, but WES performance was influenced by insert fragment size, genomic copy content and the global imbalance score (GIV; G > T/C > A). Finally, taking into account library preparation protocol, tumor content, read coverage and bioinformatics processes concomitantly, we recommend actionable practices to improve the reproducibility and accuracy of NGS experiments for cancer mutation detection.
Subjects
Benchmarking, Exome Sequencing/standards, Neoplasms/genetics, DNA Sequence Analysis/standards, Whole Genome Sequencing/standards, Cell Line, Tumor Cell Line, High-Throughput Nucleotide Sequencing/methods, Humans, Mutation, Neoplasms/pathology, Reproducibility of Results
ABSTRACT
Standard units of measurement are required for the quantitative description of nature; however, few standard units have been established for genomics to date. Here, we have developed a synthetic DNA ladder that defines a quantitative standard unit that can measure DNA sequence abundance within a next-generation sequencing library. The ladder can be spiked into a DNA sample, and act as an internal scale that measures quantitative genetic features. Unlike previous spike-ins, the ladder is encoded within a single molecule, and can be equivalently and independently synthesized by different laboratories. We show how the ladder can measure diverse quantitative features, including human genetic variation and microbial abundance, and also estimate uncertainty due to technical variation and improve normalization between libraries. This ladder provides an independent quantitative unit that can be used with any organism, application or technology, thereby providing a common metric by which genomes can be measured.
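One way to picture the ladder acting as an internal scale is a calibration line fitted to its rungs. The following Python sketch is an assumption-laden illustration rather than the published analysis: the rung copy numbers, the coverage values and the log-log least-squares fit are all hypothetical choices made for the example.

```python
from math import log2


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical ladder rungs: known copy number and observed read coverage.
ladder_copies = [1, 2, 4, 8]
ladder_coverage = [10.0, 20.0, 40.0, 80.0]  # made-up values for illustration

slope, intercept = fit_line(
    [log2(c) for c in ladder_copies],
    [log2(v) for v in ladder_coverage],
)


def estimate_copies(coverage):
    """Convert an observed coverage into ladder units using the fitted line."""
    return 2 ** ((log2(coverage) - intercept) / slope)
```

With these made-up values the fitted slope is 1 on the log-log scale, so a feature observed at a coverage of 30 falls between the second and third rungs; a slope far from 1 in real data would point to non-linear quantification.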
Subjects
DNA/analysis, DNA/chemical synthesis, Base Sequence, DNA/genetics, Gene Dosage, Gene Library, Genomics, Humans
ABSTRACT
Chirality is a property describing any object that is inequivalent to its mirror image. Due to its 5'-3' directionality, a DNA sequence is distinct from a mirrored sequence arranged in reverse nucleotide-order, and is therefore chiral. A given sequence and its opposing chiral partner sequence share many properties, such as nucleotide composition and sequence entropy. Here we demonstrate that chiral DNA sequence pairs also perform equivalently during molecular and bioinformatic techniques that underpin genetic analysis, including PCR amplification, hybridization, whole-genome, target-enriched and nanopore sequencing, sequence alignment and variant detection. Given these shared properties, synthetic DNA sequences mirroring clinically relevant or analytically challenging regions of the human genome are ideal controls for clinical genomics. The addition of synthetic chiral sequences (sequins) to patient tumor samples can prevent false-positive and false-negative mutation detection to improve diagnosis. Accordingly, we propose that sequins can fulfill the need for commutable internal controls in precision medicine.
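The mirroring operation described in this abstract is a plain reversal of nucleotide order, without complementing. A minimal Python sketch shows why a sequence and its mirror share nucleotide composition and Shannon entropy; the example sequence is arbitrary and the function names are illustrative.

```python
from collections import Counter
from math import log2


def mirror(seq):
    """Return the sequence in reverse nucleotide order (no complementing)."""
    return seq[::-1]


def composition(seq):
    """Count how often each nucleotide occurs."""
    return Counter(seq)


def shannon_entropy(seq):
    """Shannon entropy (bits per base) of the nucleotide frequencies."""
    n = len(seq)
    # Sorting the counts fixes the summation order for the floating-point sum.
    return -sum((c / n) * log2(c / n) for c in sorted(Counter(seq).values()))


original = "ATGGCGTACCTGA"  # arbitrary example sequence
mirrored = mirror(original)
```

Because reversal only reorders the bases, `composition` and `shannon_entropy` return identical values for `original` and `mirrored`, even though the two strings differ.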
Subjects
DNA/genetics, Genomics, Base Sequence, High-Throughput Nucleotide Sequencing, Humans, Microsatellite Repeats/genetics, Mutation/genetics, Nanopores, Neoplasms/genetics, Polymerase Chain Reaction, Sequence Alignment
ABSTRACT
Next-generation sequencing (NGS) has been widely adopted to identify genetic variants and investigate their association with disease. However, the analysis of sequencing data remains challenging because of the complexity of human genetic variation and confounding errors introduced during library preparation, sequencing and analysis. We have developed a set of synthetic DNA spike-ins, termed 'sequins' (sequencing spike-ins), that are directly added to DNA samples before library preparation. Sequins can be used to measure technical biases and to act as internal quantitative and qualitative controls throughout the sequencing workflow. This step-by-step protocol explains the use of sequins for both whole-genome and targeted sequencing of the human genome. This includes instructions regarding the dilution and addition of sequins to human DNA samples, followed by the bioinformatic steps required to separate sequin- and sample-derived sequencing reads and to evaluate the diagnostic performance of the assay. These practical guidelines are accompanied by a broader discussion of the conceptual and statistical principles that underpin the design of sequin standards. This protocol is suitable for users with standard laboratory and bioinformatic experience. The laboratory steps require ~1-4 d and the bioinformatic steps (which can be performed with the provided example data files) take an additional day.
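The read-separation step mentioned in this protocol can be sketched in a few lines of Python. This is a simplified illustration and not the protocol's own tooling: it assumes alignments are already available as (read ID, reference name) pairs and that sequin reads align to a decoy contig whose name starts with a known prefix. The `chrQ` prefix and the read names are hypothetical.

```python
def split_reads(alignments, decoy_prefix="chrQ"):
    """Partition read IDs by whether they aligned to the decoy contig.

    alignments   -- iterable of (read_id, reference_name) pairs
    decoy_prefix -- assumed naming prefix of the synthetic decoy contig
    """
    sequin_reads, sample_reads = [], []
    for read_id, reference in alignments:
        if reference.startswith(decoy_prefix):
            sequin_reads.append(read_id)
        else:
            sample_reads.append(read_id)
    return sequin_reads, sample_reads


# Hypothetical alignments for illustration only.
example = [("r1", "chr1"), ("r2", "chrQ_mirror"), ("r3", "chr17")]
sequin_reads, sample_reads = split_reads(example)

# Fraction of reads derived from the spike-in, a rough check on the dilution.
sequin_fraction = len(sequin_reads) / len(example)
```

The resulting fraction gives a quick sanity check that the spike-in was added at roughly the intended dilution before any downstream evaluation.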