ABSTRACT
Tissues used in pathology laboratories are typically stored in the form of formalin-fixed, paraffin-embedded (FFPE) samples. One important consideration in repurposing FFPE material for next generation sequencing (NGS) analysis is the sequencing artifacts that can arise from the significant damage to nucleic acids due to treatment with formalin, storage at room temperature and extraction. One such class of artifacts consists of chimeric reads that appear to be derived from non-contiguous portions of the genome. Here, we show that a major proportion of such chimeric reads align to both the 'Watson' and 'Crick' strands of the reference genome. We refer to these as strand-split artifact reads (SSARs). This study provides a conceptual framework for the mechanistic basis of the genesis of SSARs and other chimeric artifacts along with supporting experimental evidence, which have led to approaches to reduce the levels of such artifacts. We demonstrate that one of these approaches, involving S1 nuclease-mediated removal of single-stranded fragments and overhangs, also reduces sequence bias, base error rates, and false positive detection of copy number and single nucleotide variants. Finally, we describe an analytical approach for quantifying SSARs from NGS data.
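The abstract above does not give the details of the analytical approach for quantifying SSARs, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes each read has already been aligned and broken into segments, and it counts a read as strand-split when its segments map to both the '+' and '-' strands. The `Segment` record and the function names are hypothetical; real data would also need filters (for example, for genuine inversions) that are not modeled here.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Segment:
    """One aligned piece of a read (hypothetical minimal record)."""
    read_id: str
    chrom: str
    start: int   # 0-based leftmost reference position
    strand: str  # "+" (Watson) or "-" (Crick)


def is_strand_split(segments: List[Segment]) -> bool:
    """True if a read has segments aligned to both strands."""
    strands = {seg.strand for seg in segments}
    return len(segments) > 1 and strands == {"+", "-"}


def ssar_fraction(segments: Iterable[Segment]) -> float:
    """Fraction of reads whose segments map to both strands."""
    by_read: Dict[str, List[Segment]] = {}
    for seg in segments:
        by_read.setdefault(seg.read_id, []).append(seg)
    if not by_read:
        return 0.0
    n_split = sum(is_strand_split(segs) for segs in by_read.values())
    return n_split / len(by_read)


# Toy input: r1 is strand-split; r3 is chimeric but on one strand only.
example = [
    Segment("r1", "chr1", 100, "+"),
    Segment("r1", "chr1", 150, "-"),
    Segment("r2", "chr1", 200, "+"),
    Segment("r3", "chr2", 10, "+"),
    Segment("r3", "chr5", 900, "+"),
    Segment("r4", "chr3", 5, "-"),
]
fraction = ssar_fraction(example)
```

In this toy input only one of the four reads has segments on both strands, so the sketch reports that read alone as strand-split; the same-strand chimera `r3` is deliberately not counted.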
Subjects
Artifacts; Fixatives; Formaldehyde; High-Throughput Nucleotide Sequencing; Sequence Analysis, DNA; Animals; Genomic Library; Genomics; Hot Temperature; Mice, Inbred C57BL; Paraffin Embedding
ABSTRACT
Background: Chronic fatigue syndrome (CFS) remains poorly understood. Although infections are speculated to trigger the syndrome, a specific infectious agent and underlying pathophysiological mechanism remain elusive. In a previous study, we described similar clinical phenotypes in CFS patients and alternatively diagnosed chronic Lyme syndrome (ADCLS) patients (individuals diagnosed with Lyme disease by testing from private Lyme specialty laboratories but who test negative by reference 2-tiered serologic analysis). Methods: Here, we performed blinded RNA-seq analysis of whole blood collected from 25 adults diagnosed with CFS and 13 ADCLS patients, comparing these cases to 25 matched controls and 11 patients with well-controlled systemic lupus erythematosus (SLE). Samples were collected at patient enrollment and not during acute symptom flares. RNA-seq data were used to study host gene expression, B-cell/T-cell receptor profiles (BCR/TCR), and potential viral infections. Results: No differentially expressed genes (DEGs) were found to be significant when CFS or ADCLS cases were compared to controls. Forty-two DEGs were found when SLE cases were compared to controls, consistent with activation of interferon signaling pathways associated with SLE disease. BCR/TCR repertoire analysis did not show significant differences between CFS and controls or ADCLS and controls. Finally, viral sequences corresponding to anelloviruses, human pegivirus 1, herpesviruses, and papillomaviruses were detected in RNA-seq data, but proportions were similar (P = .73) across all genus-level taxonomic categories. Conclusions: Our observations do not support a theory of transcriptionally mediated immune cell dysregulation in CFS and ADCLS, at least outside of periods of acute symptom flares.
Subjects
Fatigue Syndrome, Chronic/etiology; Gene Expression; Host-Pathogen Interactions/genetics; Lyme Disease/etiology; Receptors, Antigen, B-Cell/genetics; Receptors, Antigen, T-Cell/genetics; Virus Diseases/complications; Amino Acid Motifs; Amino Acid Sequence; B-Lymphocytes/immunology; B-Lymphocytes/metabolism; Case-Control Studies; Chronic Disease; Disease Susceptibility; Female; Gene Expression Profiling; Genetic Predisposition to Disease; High-Throughput Nucleotide Sequencing; Host-Pathogen Interactions/immunology; Humans; Male; Metagenome; Metagenomics/methods; Phenotype; Receptors, Antigen, B-Cell/chemistry; Receptors, Antigen, T-Cell/chemistry; T-Lymphocytes/immunology; T-Lymphocytes/metabolism; Virus Diseases/virology
ABSTRACT
BACKGROUND: RNA-Sequencing (RNA-seq) is now commonly used to reveal quantitative spatiotemporal snapshots of the transcriptome, the structures of transcripts (splice variants and fusions) and landscapes of expressed mutations. However, standard approaches for library construction typically require relatively high amounts of input RNA, are labor intensive, and are time consuming. METHODS: Here, we report the outcome of a systematic effort to optimize and streamline steps in strand-specific RNA-seq library construction. RESULTS: This work has resulted in the identification of an optimized messenger RNA isolation protocol, a potent reverse transcriptase for cDNA synthesis, and an efficient chemistry and a simplified formulation of library construction reagents. We also present an optimization of bead-based purification and size selection designed to maximize the recovery of cDNA fragments. CONCLUSIONS: These developments have allowed us to assemble a rapid high throughput pipeline that produces high quality data from amounts of total RNA as low as 25 ng. While the focus of this study is on RNA-seq sample preparation, some of these developments are also relevant to other next-generation sequencing library types.
Subjects
Gene Library; RNA, Messenger; Sequence Analysis, RNA/methods; Specimen Handling/standards; HL-60 Cells; Humans
ABSTRACT
Follicular lymphoma (FL) and diffuse large B-cell lymphoma (DLBCL) are the two most common non-Hodgkin lymphomas (NHLs). Here we sequenced tumour and matched normal DNA from 13 DLBCL cases and one FL case to identify genes with mutations in B-cell NHL. We analysed RNA-seq data from these and another 113 NHLs to identify genes with candidate mutations, and then re-sequenced tumour and matched normal DNA from these cases to confirm 109 genes with multiple somatic mutations. Genes with roles in histone modification were frequent targets of somatic mutation. For example, 32% of DLBCL and 89% of FL cases had somatic mutations in MLL2, which encodes a histone methyltransferase, and 11.4% and 13.4% of DLBCL and FL cases, respectively, had mutations in MEF2B, a calcium-regulated gene that cooperates with CREBBP and EP300 in acetylating histones. Our analysis suggests a previously unappreciated disruption of chromatin biology in lymphomagenesis.
Subjects
Histones/metabolism; Lymphoma, Non-Hodgkin/genetics; Mutation/genetics; Chromatin/genetics; Chromatin/metabolism; DNA-Binding Proteins/genetics; DNA-Binding Proteins/metabolism; Genome, Human/genetics; Histone Acetyltransferases/genetics; Histone Acetyltransferases/metabolism; Histone Methyltransferases; Histone-Lysine N-Methyltransferase/genetics; Histone-Lysine N-Methyltransferase/metabolism; Humans; Loss of Heterozygosity/genetics; Lymphoma, Follicular/enzymology; Lymphoma, Follicular/genetics; Lymphoma, Large B-Cell, Diffuse/enzymology; Lymphoma, Large B-Cell, Diffuse/genetics; Lymphoma, Non-Hodgkin/enzymology; MADS Domain Proteins/genetics; MADS Domain Proteins/metabolism; MEF2 Transcription Factors; Myogenic Regulatory Factors/genetics; Myogenic Regulatory Factors/metabolism; Neoplasm Proteins/genetics; Neoplasm Proteins/metabolism
ABSTRACT
Cilia and flagella play important roles in many physiological processes, including cell and fluid movement, sensory perception, and development. The biogenesis and maintenance of cilia depend on intraflagellar transport (IFT), a motility process that operates bidirectionally along the ciliary axoneme. Disruption in IFT and cilia function causes several human disorders, including polycystic kidneys, retinal dystrophy, neurosensory impairment, and Bardet-Biedl syndrome (BBS). To uncover new ciliary components, including IFT proteins, we compared C. elegans ciliated neuronal and nonciliated cells through serial analysis of gene expression (SAGE) and screened for genes potentially regulated by the ciliogenic transcription factor, DAF-19. Using these complementary approaches, we identified numerous candidate ciliary genes and confirmed the ciliated-cell-specific expression of 14 novel genes. One of these, C27H5.7a, encodes a ciliary protein that undergoes IFT. As with other IFT proteins, its ciliary localization and transport are disrupted by mutations in IFT and bbs genes. Furthermore, we demonstrate that the ciliary structural defect of C. elegans dyf-13(mn396) mutants is caused by a mutation in C27H5.7a. Together, our findings help define a ciliary transcriptome and suggest that DYF-13, an evolutionarily conserved protein, is a novel core IFT component required for cilia function.
Subjects
Caenorhabditis elegans/genetics; Cilia/genetics; Gene Expression Profiling; Neurons/metabolism; Animals; Base Sequence; Caenorhabditis elegans Proteins/metabolism; Cilia/metabolism; Computational Biology; Genomics/methods; Green Fluorescent Proteins; Mutation/genetics; Protein Transport/physiology; Sequence Analysis, DNA; Transcription Factors/metabolism
ABSTRACT
Curation and storage of formalin-fixed, paraffin-embedded (FFPE) samples are standard procedures in hospital pathology laboratories around the world. Many thousands of such samples exist and could be used for next generation sequencing analysis. Retrospective analyses of such samples are important for identifying molecular correlates of carcinogenesis, treatment history and disease outcomes. Two major hurdles in using FFPE material for sequencing are the damaged nature of the nucleic acids and the labor-intensive nature of nucleic acid purification. These limitations and a number of other issues that span multiple steps from nucleic acid purification to library construction are addressed here. We optimized and automated a 96-well magnetic bead-based extraction protocol that can be scaled to large cohorts and is compatible with automation. Using sets of 32 and 91 individual FFPE samples respectively, we generated libraries from 100 ng of total RNA and DNA starting amounts with 95-100% success rate. The use of the resulting RNA in micro-RNA sequencing was also demonstrated. In addition to offering the potential of scalability and rapid throughput, the yield obtained with lower input requirements makes these methods applicable to clinical samples where tissue abundance is limiting.
Subjects
Automation; DNA/isolation & purification; Formaldehyde/chemistry; High-Throughput Nucleotide Sequencing; Paraffin Embedding; RNA/isolation & purification; Tissue Fixation/methods; DNA/genetics; RNA/genetics
Subjects
Biomarkers, Tumor/genetics; Gene Expression Profiling/methods; Phenotype; Precursor T-Cell Lymphoblastic Leukemia-Lymphoma/classification; Precursor T-Cell Lymphoblastic Leukemia-Lymphoma/pathology; RNA-Seq/methods; Sequence Analysis, RNA/methods; Humans; Precursor T-Cell Lymphoblastic Leukemia-Lymphoma/genetics; Single-Cell Analysis/methods
ABSTRACT
PURPOSE: Cancers accumulate mutations over time, each of which brings the potential for recognition by the immune system. We evaluated T-cell recognition of the tumor mutanome in patients with ovarian cancer undergoing standard treatment. EXPERIMENTAL DESIGN: Tumor-associated T cells from 3 patients with ovarian cancer were assessed by ELISPOT for recognition of nonsynonymous mutations identified by whole exome sequencing of autologous tumor. The relative levels of mutations and responding T cells were monitored in serial tumor samples collected at primary surgery and first and second recurrence. RESULTS: The vast majority of mutations (78/79) were not recognized by tumor-associated T cells; however, a highly specific CD8(+) T-cell response to the mutation hydroxysteroid dehydrogenase-like protein 1 (HSDL1)(L25V) was detected in one patient. In the primary tumor, the HSDL1(L25V) mutation had low prevalence and expression, and a corresponding T-cell response was undetectable. At first recurrence, there was a striking increase in the abundance of the mutation and corresponding MHC class I epitope, and this was accompanied by the emergence of the HSDL1(L25V)-specific CD8(+) T-cell response. At second recurrence, the HSDL1(L25V) mutation and epitope continued to be expressed; however, the corresponding T-cell response was no longer detectable. CONCLUSION: The immune system can respond to the evolving ovarian cancer genome. However, the T-cell response detected here was rare, was transient, and ultimately failed to prevent disease progression. These findings reveal the limitations of spontaneous tumor immunity in the setting of standard treatments and suggest a high degree of ignorance of tumor mutations that could potentially be reversed by immunotherapy.
Subjects
Immunologic Surveillance; Mutation; Ovarian Neoplasms/genetics; Ovarian Neoplasms/immunology; T-Lymphocytes/immunology; CD8-Positive T-Lymphocytes/immunology; Disease Progression; Epitopes, T-Lymphocyte/immunology; Female; HLA Antigens/immunology; Humans; Hydroxysteroid Dehydrogenases/genetics; Immunohistochemistry; Lymphocytes, Tumor-Infiltrating/immunology; Neoplasm Grading; Ovarian Neoplasms/pathology; Recurrence
ABSTRACT
Individuals who inherit mutations in BRCA1 or BRCA2 are predisposed to breast and ovarian cancers. However, identifying mutations in these large genes by conventional dideoxy sequencing in a clinical testing laboratory is both time consuming and costly, and similar challenges exist for other large genes, or sets of genes, with relevance in the clinical setting. Second-generation sequencing technologies have the potential to improve the efficiency and throughput of clinical diagnostic sequencing, once clinically validated methods become available. We have developed a method for detection of variants based on automated small-amplicon PCR followed by sample pooling and sequencing with a second-generation instrument. To demonstrate the suitability of this method for clinical diagnostic sequencing, we analyzed the coding exons and the intron-exon boundaries of BRCA1 and BRCA2 in 91 hereditary breast cancer patient samples. Our method generated high-quality sequence coverage across all targeted regions, with median coverage greater than 4000-fold for each sample in pools of 24. Sensitive and specific automated variant detection, without false-positive or false-negative results, was accomplished with a standard software pipeline using bwa for sequence alignment and samtools for variant detection. We experimentally derived a minimum threshold of 100-fold sequence depth for confident variant detection. The results demonstrate that this method is suitable for sensitive, automatable, high-throughput sequence variant detection in the clinical laboratory.
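As a rough illustration of how the 100-fold minimum depth reported above could be applied, the sketch below flags targeted positions whose depth falls under the threshold. Only the threshold value comes from the abstract; the per-position depth dictionary, the function names, and the toy numbers are assumptions made for this example, and the abstract does not describe how the authors' pipeline enforces the cutoff.

```python
from typing import Dict, List

# Minimum depth for confident variant detection, as stated in the abstract.
MIN_DEPTH = 100


def low_depth_positions(depths: Dict[int, int],
                        min_depth: int = MIN_DEPTH) -> List[int]:
    """Return positions (sorted) whose depth is below the threshold."""
    return [pos for pos, depth in sorted(depths.items()) if depth < min_depth]


def callable_fraction(depths: Dict[int, int],
                      min_depth: int = MIN_DEPTH) -> float:
    """Fraction of targeted positions meeting the depth threshold."""
    if not depths:
        return 0.0
    passing = sum(1 for depth in depths.values() if depth >= min_depth)
    return passing / len(depths)


# Hypothetical per-position depths for one sample (position -> read depth).
sample_depths = {101: 4200, 102: 95, 103: 100, 104: 0}
flagged = low_depth_positions(sample_depths)
fraction_ok = callable_fraction(sample_depths)
```

A position with depth exactly equal to the threshold is treated as passing here, which is one reading of "minimum threshold"; a stricter interpretation would change the comparison.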