ABSTRACT
Golden Gate cloning enables the modular assembly of DNA parts into desired synthetic genetic constructs. The "one-pot" nature of Golden Gate reactions makes them particularly amenable to high-throughput automation, facilitating the generation of thousands of constructs in a massively parallel manner. One potential bottleneck in this process is the design of these constructs. There are multiple parameters that must be considered during the design of an assembly process, and the final design should also be checked and verified before implementation. Doing this by hand for large numbers of constructs is neither practical nor feasible and increases the likelihood of introducing potentially costly errors. In this chapter we describe a design workflow that utilizes bespoke computational tools to automate the key phases of the construct design process and perform sequence editing in batches.
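The abstract does not name the parameters that are checked during design. As an illustration only, the sketch below shows two checks commonly applied to Golden Gate designs: screening parts for internal Type IIS recognition sites and verifying that the set of fusion overhangs is unambiguous. The enzyme (BsaI, recognition site GGTCTC), the function names, and the batch structure are assumptions made for this example and are not taken from the chapter's tools.

```python
# Hypothetical batch design check; not the chapter's software.
# Assumption: BsaI (GGTCTC) is the Type IIS enzyme used for assembly.
BSAI_SITE = "GGTCTC"
_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an unambiguous DNA sequence."""
    return "".join(_COMPLEMENT[base] for base in reversed(seq.upper()))


def has_internal_site(part: str, site: str = BSAI_SITE) -> bool:
    """True if the recognition site occurs on either strand of the part."""
    part = part.upper()
    return site in part or reverse_complement(site) in part


def overhangs_are_valid(overhangs: list[str]) -> bool:
    """Overhangs must be unique, non-palindromic, and must not match the
    reverse complement of any other overhang in the same assembly."""
    seen: set[str] = set()
    for overhang in (o.upper() for o in overhangs):
        rc = reverse_complement(overhang)
        if overhang == rc or overhang in seen or rc in seen:
            return False
        seen.add(overhang)
    return True


def check_batch(parts: dict[str, str]) -> dict[str, bool]:
    """Map each part name to True when it is free of internal sites."""
    return {name: not has_internal_site(seq) for name, seq in parts.items()}
```

Running `check_batch` over a dictionary of part sequences flags every part that would need domestication before assembly, which is the kind of repetitive verification that becomes error-prone when done by hand.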
Subjects
Molecular Cloning, DNA, Gene Editing, DNA/genetics, DNA/chemistry, Gene Editing/methods, Molecular Cloning/methods, CRISPR-Cas Systems, Software, Synthetic Biology/methods, Computational Biology/methods, High-Throughput Nucleotide Sequencing/methods
ABSTRACT
Golden Gate cloning has revolutionized synthetic biology. Its concept of modular, highly characterized libraries of parts that can be combined into higher order assemblies allows engineering principles to be applied to biological systems. The basic parts, typically stored in Level 0 plasmids, are sequence validated by the method of choice and can be combined into higher order assemblies on demand. Higher order assemblies are typically transcriptional units, and multiple transcriptional units can be assembled into multi-gene constructs. Higher order Golden Gate assembly based on defined and validated parts usually does not introduce sequence changes. Therefore, simple validation of the assemblies, e.g., by colony polymerase chain reaction (PCR) or restriction digest pattern analysis is sufficient. However, in many experimental setups, researchers do not use defined parts, but rather part libraries, resulting in assemblies of high combinatorial complexity where sequencing again becomes mandatory. Here, we present a detailed protocol for the use of a highly multiplexed dual barcode amplicon sequencing using the Nanopore sequencing platform for in-house sequence validation. The workflow, called DuBA.flow, is a start-to-finish procedure that provides all necessary steps from a single colony to the final easy-to-interpret sequencing report.
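The idea behind dual barcoding is that each amplicon carries one barcode at each end, so the pair of barcodes identifies the colony it came from. The sketch below illustrates that principle with exact prefix and suffix matching. It is not the DuBA.flow implementation: the function name, the barcode names, and the barcode sequences are all hypothetical, and real Nanopore reads would need error-tolerant matching and handling of both strand orientations.

```python
# Minimal illustration of dual-barcode demultiplexing; not DuBA.flow itself.
# Assumption: barcodes sit at the very start and end of each read and are
# matched exactly, which is a simplification for noisy long reads.
def demultiplex(reads, fwd_barcodes, rev_barcodes):
    """Group reads by the (forward, reverse) barcode pair they carry.

    reads: iterable of read sequences
    fwd_barcodes / rev_barcodes: dicts mapping barcode name -> sequence
    Reads lacking a recognizable barcode at either end are dropped.
    """
    bins = {}
    for read in reads:
        fwd = next((name for name, seq in fwd_barcodes.items()
                    if read.startswith(seq)), None)
        rev = next((name for name, seq in rev_barcodes.items()
                    if read.endswith(seq)), None)
        if fwd is None or rev is None:
            continue  # unassigned read
        bins.setdefault((fwd, rev), []).append(read)
    return bins
```

With n forward and m reverse barcodes, n × m samples can be distinguished, which is what makes the dual scheme attractive for highly multiplexed colony validation.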
Subjects
Nanopore Sequencing, Synthetic Biology, Nanopore Sequencing/methods, Synthetic Biology/methods, Molecular Cloning/methods, Gene Library, High-Throughput Nucleotide Sequencing/methods, DNA Sequence Analysis/methods, Polymerase Chain Reaction/methods, Nanopores, Workflow
ABSTRACT
Genome-wide screens are a powerful technique to dissect the complex network of genes regulating diverse cellular phenotypes. The recent adaptation of the CRISPR-Cas9 system for genome engineering has revolutionized functional genomic screening. Here, we present protocols used to introduce Cas9 into human lymphoma cell lines, produce high-titer lentivirus of a genome-wide sgRNA library, transduce and culture cells during the screen, select cells with a specified phenotype, isolate genomic DNA, and prepare a custom library for next-generation sequencing. These protocols were tailored for loss-of-function CRISPR screens in human B-cell lymphoma cell lines but can readily be adapted to other experimental purposes.
Subjects
CRISPR-Cas Systems, Phenotype, Humans, Tumor Cell Line, Lymphoma/genetics, Genotype, High-Throughput Nucleotide Sequencing/methods, Lentivirus/genetics, CRISPR-Cas Systems Guide RNA/genetics, Clustered Regularly Interspaced Short Palindromic Repeats/genetics, Genomics/methods
ABSTRACT
Recent developments in single cell sequencing technologies enable researchers to examine the heterogeneity of cell types and subclusters in ever greater depth. The first assays were available only for transcriptome analysis of up to 10,000 cells, but nowadays 60,000 cells or more can be analyzed. Whereas initially only mRNA expression could be analyzed, single cell methods have since multiplied, with assays extended to the examination of surface molecule expression, DNA accessibility (ATAC-seq), antigen specificity, and B or T cell receptor repertoires. Spatial transcriptomics and CRISPR screenings, which augment classical CRISPR/Cas9 screens by combining them with transcriptomic data at the single cell level, can also be evaluated. The composition of B and T cell clones (of malignant cells in lymphomas and leukemia, as well as of infiltrating B or T cell clones in other types of cancer) is especially important in tumor research, as these clones may give valuable hints about tumor development and control. This chapter presents detailed methods for the implementation and analysis of single cell B and/or T cell receptor repertoire sequencing on the Chromium system from 10× Genomics and the Rhapsody™ system from BD Biosciences.
Subjects
B-Lymphocytes, Single-Cell Analysis, T-Lymphocytes, Humans, Single-Cell Analysis/methods, T-Lymphocytes/immunology, T-Lymphocytes/metabolism, B-Lymphocytes/metabolism, B-Lymphocytes/immunology, High-Throughput Nucleotide Sequencing/methods, T-Cell Antigen Receptors/genetics, V(D)J Recombination, DNA Sequence Analysis/methods
ABSTRACT
In the past decade, single-cell (sc) transcriptomics has overcome the limitations of bulk analysis by measuring gene expression in individual cells, not just a population average. This can identify diverse cell types and states within a sample with high resolution, even without prior purification. Various technologies exist, each with its own capture, barcoding, and library preparation methods. This chapter focuses on the analysis of normal and malignant mature B cells using the 10× Genomics 5' sc-gene expression in parallel with B cell immune repertoire profiling. By integrating the gene expression data from similar cells, the complete transcriptome for each population can be reconstructed, while the identification of the expressed immunoglobulin genes allows investigating clonotype evolution and the detection of tumor clones that share the same clonally rearranged B cell receptor sequence. Researchers are guided through both the experimental protocols and data analysis with a comprehensive, step-by-step walkthrough of how to use some of the more popular single-cell software tools.
Subjects
B-Lymphocytes, Gene Expression Profiling, Single-Cell Analysis, Transcriptome, Single-Cell Analysis/methods, Humans, Gene Expression Profiling/methods, B-Lymphocytes/metabolism, B-Lymphocytes/immunology, Software, Computational Biology/methods, High-Throughput Nucleotide Sequencing/methods
ABSTRACT
Liquid biopsy has become an opportunity in lymphoma diagnostics, since plasma circulating tumor DNA (ctDNA) is an easily accessible source of tumor DNA that can complement tissue biopsy. Through ctDNA, lymphomas can be molecularly characterized, tumor clonal evolution can be tracked, and minimal residual disease can be monitored during and after therapy. Here, we describe the methodology of cancer personalized profiling by deep sequencing (CAPP-seq) that we use for ctDNA qualification and quantification.
Subjects
Tumor Biomarkers, Circulating Tumor DNA, High-Throughput Nucleotide Sequencing, Lymphoma, Humans, Circulating Tumor DNA/blood, Circulating Tumor DNA/genetics, Lymphoma/blood, Lymphoma/genetics, Lymphoma/diagnosis, Liquid Biopsy/methods, High-Throughput Nucleotide Sequencing/methods, Tumor Biomarkers/blood, Tumor Biomarkers/genetics, Precision Medicine/methods
ABSTRACT
Clonal hematopoiesis (CH) is the age-related expansion of hematopoietic stem cell clones resulting from the acquisition of somatic point mutations or mosaic chromosomal alterations (mCAs). It is linked to adverse systemic effects, including hematologic malignancies, cardiovascular diseases, metabolic disorders, as well as liver and kidney ailments, ultimately contributing to elevated overall mortality. Given its diverse biological and clinical implications, the identification of clonal hematopoiesis holds significance in various contexts. While traditionally centered on mutations associated with myeloid malignancies, stem/progenitor cell involvement has been documented for various lymphoid malignancies, including T-cell lymphoma, chronic lymphocytic leukemia (CLL), and follicular lymphoma (FL). Lymphoid CH (L-CH) involves a broader spectrum of genes and occurs at a lower prevalence, resulting in reduced mutation prevalences per gene. This characteristic poses challenges for efficient CH detection. The major strategies to identify CH are whole exome sequencing (WES), whole genome sequencing (WGS), or targeted sequencing. Targeted sequencing allows for much higher sequencing depth compared to WES and WGS because of the focus on genes known to be associated with CH and therefore allows detecting potential variants at low frequencies with high precision. Here, we describe an error-corrected targeted sequencing approach for detection of CH in bone marrow (BM) or peripheral blood (PB) samples, which we have successfully established and used in various cohorts. This protocol includes the process of DNA isolation from PB and BM samples, library preparation with molecular tags including quality control steps and computational analysis including variant filtering.
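Error correction with molecular tags generally works by grouping reads that share the same tag and collapsing each group into a consensus, so that errors present in only a minority of copies are voted out. The sketch below illustrates this principle with a per-position majority vote. It is a simplified assumption-laden example, not the protocol's pipeline: the function name, the `min_family_size` threshold, and the input layout are hypothetical, and reads are assumed to be already aligned to the same start position.

```python
from collections import Counter, defaultdict


def umi_consensus(tagged_reads, min_family_size=3):
    """Collapse reads sharing a molecular tag into one consensus sequence.

    tagged_reads: iterable of (tag, sequence) tuples
    min_family_size: hypothetical threshold; smaller families are discarded
    because a reliable majority vote is not possible.
    """
    families = defaultdict(list)
    for tag, seq in tagged_reads:
        families[tag].append(seq)

    consensus = {}
    for tag, seqs in families.items():
        if len(seqs) < min_family_size:
            continue
        length = min(len(s) for s in seqs)  # compare only shared positions
        consensus[tag] = "".join(
            Counter(s[i] for s in seqs).most_common(1)[0][0]
            for i in range(length)
        )
    return consensus
```

Because a true low-frequency variant appears in every copy of its original molecule while a sequencing error does not, consensus building of this kind is what lets targeted panels report variants at low allele frequencies with high precision.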
Subjects
Clonal Hematopoiesis, Humans, Clonal Hematopoiesis/genetics, Exome Sequencing/methods, Hematopoietic Stem Cells/metabolism, Mutation, Whole Genome Sequencing/methods, High-Throughput Nucleotide Sequencing/methods
ABSTRACT
Proteogenomics is a growing "multi-omics" research area that combines mass spectrometry-based proteomics and high-throughput nucleotide sequencing technologies. Proteogenomics has helped in genomic annotation for organisms whose complete genome sequences became available through high-throughput DNA sequencing technologies. Apart from genome annotation, this multi-omics approach has also helped researchers confirm expression of variant proteins belonging to unique proteoforms that could have resulted from single-nucleotide polymorphisms (SNPs), insertions and deletions (indels), splice isoforms, or other genome or transcriptome variations. A proteogenomic study depends on a multistep informatics workflow, requiring different software at each step. These integrated steps include creating an appropriate protein sequence database, matching spectral data against these sequences, and finally identifying peptide sequences corresponding to novel proteoforms followed by variant classification and functional analysis. The disparate software required for a proteogenomic study is difficult for most researchers to access and use, especially those lacking computational expertise. Furthermore, using the tools disjointedly can be error-prone, as it requires setting up individual parameters for each one. Consequently, reproducibility suffers. Managing output files from each tool is an additional challenge. One solution for these challenges in proteogenomics is the open-source Web-based computational platform Galaxy. Its capability to create and manage workflows composed of disparate software while recording and saving all important parameters promotes both usability and reproducibility. Here, we describe a workflow that can perform proteogenomic analysis on a Galaxy-based platform. This Galaxy workflow facilitates matching of spectral data with a customized protein sequence database, identifying novel protein variants, assessing quality of results, and classifying variants along with visualization against the genome.
Subjects
Computational Biology, Proteogenomics, Software, Workflow, Proteogenomics/methods, Computational Biology/methods, Protein Databases, Humans, High-Throughput Nucleotide Sequencing/methods, Proteomics/methods, Single Nucleotide Polymorphism
ABSTRACT
Proteogenomics is a multi-omics approach that combines mass spectrometry with next-generation sequencing (NGS) technologies (genomics and/or transcriptomics), with the main aims of improving genome annotation and facilitating the characterization of proteo-isoforms. However, proteogenomics is challenging because it generates multi-omics data that must be integrated before the results can be interpreted for their biological or clinical implications. There is an urgent need for protocols for integrated proteogenomic approaches. Genome resequencing yields massive amounts of data on missense single-nucleotide polymorphisms (SNPs), yet the pathogenic nature of these SNPs has not been fully covered by proteogenomic approaches. In this chapter, we present such a protocol for dealing with pathogenic missense SNPs using an integrated proteogenomics pipeline that combines several steps: DNA-Seq, RNA-Seq, mass spectrometry (MS), building customized databases from the resulting datasets, and screening and filtering for useful MS spectra. The protocol also provides tips and tricks for modifying it to suit the requirements of a project.
Subjects
Missense Mutation, Single Nucleotide Polymorphism, Proteogenomics, Software, Proteogenomics/methods, Humans, High-Throughput Nucleotide Sequencing/methods, Mass Spectrometry/methods, Computational Biology/methods, Genomics/methods
ABSTRACT
During the last three decades, technological advancements in high-throughput next-generation sequencing have resulted in an increased understanding of combined proteomic and genomic data, a field aptly termed proteogenomics. Efforts to develop such approaches have been limited and have focused mainly on protein identification and subcellular localization. These approaches have, however, also been explored to understand more broadly how genomics/transcriptomics data inform areas such as gene expression regulation, signal cascading, and diseasome studies. In this review, we discuss methods and tools developed through sequence-centric integrative modeling of proteogenomic approaches.
Subjects
High-Throughput Nucleotide Sequencing, Proteogenomics, Proteogenomics/methods, Humans, High-Throughput Nucleotide Sequencing/methods, Proteomics/methods, Software, Computational Biology/methods, Genomics/methods, Proteome/genetics, Animals
ABSTRACT
Microbial sample analysis has received growing attention within the last decade, driven by important findings in microbiome research and promising applications in the biotechnological field. Modern mass spectrometry-based methodology has been established in this context, providing sufficient sensitivity, resolution, dynamic range, and throughput to analyze the so-called metaproteome of complex microbial mixtures from clinical or environmental samples. While proteomic analyses were previously restricted to common model organisms, next-generation sequencing technologies nowadays allow for the rapid and cost-efficient characterization of whole metagenomes of microbial consortia and specific genomes from non-model organisms to which microbes contribute by significant amounts. This proteogenomic approach, meaning the combined application of genomic and proteomic methods, enables researchers to create a protein database that presents a tailored blueprint of the microbial sample under investigation. This contribution provides an overview of the computational challenges and opportunities in proteogenomics and metaproteomics as of January 2018. For practical application, we first showcase an integrative proteogenomic method that circumvents existing reference databases by creating sample-specific transcripts. The underlying algorithm uses a graph network approach that combines RNA-Seq and peptide information. As a second example, we provide a tutorial for a simulation tool that estimates the computational limits of detecting microbial non-model organisms. This method evaluates the potential influence of error-tolerant searches and proteogenomic approaches on databases of interest. Finally, we discuss recommendations for developing future strategies that may help overcome present limitations by combining the strengths of genome- and proteome-based methods and moving toward an integrated metaproteogenomics approach.
Subjects
Microbiota, Proteogenomics, Proteogenomics/methods, Microbiota/genetics, Computational Biology/methods, Proteomics/methods, Software, Protein Databases, Metagenomics/methods, Algorithms, Metagenome, Humans, High-Throughput Nucleotide Sequencing/methods, Mass Spectrometry/methods, Proteome/genetics
ABSTRACT
Foodborne pathogens remain a serious health issue in developed and developing countries. The safety of food products has been assured for years with culture-based microbiological methods; however, these have several limitations, such as long turnaround times and extensive hands-on work, which have typically been addressed with DNA-based methods such as real-time PCR (qPCR). These and similar techniques are targeted assays, meaning that each is directed at the detection of one specific microbe. Although reliable, this approach has an important limitation: unless specific assays are designed for every pathogen potentially present, foods may erroneously be considered safe. To address this problem, next-generation sequencing (NGS) can be used, as it is a nontargeted method and thus has the capacity to detect every potential threat present. In this chapter, a protocol for the simultaneous detection and preliminary serotyping of Salmonella enterica serovar Enteritidis, Salmonella enterica serovar Typhimurium, Listeria monocytogenes, and Escherichia coli O157:H7 is described.
Subjects
Food Microbiology, Foodborne Diseases, High-Throughput Nucleotide Sequencing, Listeria monocytogenes, Food Microbiology/methods, High-Throughput Nucleotide Sequencing/methods, Foodborne Diseases/microbiology, Foodborne Diseases/diagnosis, Listeria monocytogenes/isolation & purification, Listeria monocytogenes/genetics, Escherichia coli O157/isolation & purification, Escherichia coli O157/genetics, Humans, Serotyping/methods, Bacterial DNA/genetics, Bacterial DNA/analysis, Salmonella typhimurium/isolation & purification, Salmonella typhimurium/genetics
ABSTRACT
Unveiling the strategies of bacterial adaptation to stress constitutes a challenging area of research. Understanding the mechanisms that govern the emergence of resistance to antimicrobials is of particular importance given the increasing threat that antibiotic resistance poses to public health worldwide. In recent decades, the rapid democratization of sequencing technologies, along with the development of dedicated bioinformatics tools to process the data, has offered new opportunities to characterize the genomic variations underlying bacterial adaptation. Research teams can now investigate bacterial adaptive mechanisms in greater depth by identifying specific genetic targets that mediate survival under stress. In this chapter, we propose a step-by-step bioinformatics pipeline for identifying mutational events underlying adaptation to biocidal stress associated with the development of antimicrobial resistance, using Escherichia marmotae as an illustrative model.
Subjects
Computational Biology, Bacterial Genome, Genomics, Mutation, Genomics/methods, Computational Biology/methods, Bacteria/genetics, Bacteria/drug effects, Bacterial Drug Resistance/genetics, Anti-Bacterial Agents/pharmacology, Software, High-Throughput Nucleotide Sequencing/methods
ABSTRACT
Next-generation sequencing has revolutionized food safety management in recent years, providing access to a huge quantity of valuable data to identify, characterize, and monitor bacterial pathogens along the food chain. Shotgun metagenomics has emerged as a particularly promising approach, as it enables in-depth taxonomic profiling and functional investigation of food microbial communities. In this chapter, we provide a comprehensive step-by-step bioinformatics workflow to characterize bacterial ecology and resistome composition from metagenomic short reads obtained by shotgun sequencing.
Subjects
Bacteria, Computational Biology, Food Microbiology, High-Throughput Nucleotide Sequencing, Metagenomics, Metagenomics/methods, Computational Biology/methods, Food Microbiology/methods, Bacteria/genetics, High-Throughput Nucleotide Sequencing/methods, Metagenome, Microbiota/genetics
ABSTRACT
The standardization of microbiome sequencing of poultry rinsates is essential for generating comparable microbial composition data among poultry processing facilities if this technology is to be adopted by the industry. Samples must first be acquired, DNA must be extracted, and libraries must be constructed. To proceed to library sequencing, the samples should meet quality control standards. Finally, the data must be analyzed using bioinformatics pipelines. These data can subsequently be incorporated into more advanced computer algorithms for risk assessment. Ultimately, a uniform sequencing pipeline will enable both government regulatory agencies and the poultry industry to identify potential weaknesses in food safety. This chapter presents the different steps for monitoring the population dynamics of the microbiome in poultry processing using 16S rDNA sequencing.
Subjects
Gene Library, High-Throughput Nucleotide Sequencing, Microbiota, Poultry, 16S Ribosomal RNA, Animals, 16S Ribosomal RNA/genetics, Poultry/microbiology, Microbiota/genetics, High-Throughput Nucleotide Sequencing/methods, DNA Sequence Analysis/methods, Computational Biology/methods, Bacterial DNA/genetics
ABSTRACT
Recent analyses have revealed the essential function of chromatin structure in maintaining and regulating genomic information. Advancements in microscopy, nuclear structure observation techniques, and the development of methods utilizing next-generation sequencers (NGSs) have significantly advanced these discoveries. Methods utilizing NGS enable genome-wide analysis, which is challenging with microscopy, and have elucidated concepts of important chromatin structures such as loop structures, domain structures called topologically associating domains (TADs), and compartments. In this chapter, I introduce chromatin interaction techniques using NGS and outline the principles and features of each method.
Subjects
Chromatin, High-Throughput Nucleotide Sequencing, Chromatin/genetics, Chromatin/metabolism, Chromatin/chemistry, Humans, High-Throughput Nucleotide Sequencing/methods, Genomics/methods, Genome-Wide Association Study/methods, Animals
ABSTRACT
Three-dimensional (3D) chromosome structures are closely related to various chromosomal functions, and deep analysis of the structures is crucial for the elucidation of the functions. In recent years, chromosome conformation capture (3C) techniques combined with next-generation sequencing analysis have been developed to comprehensively reveal 3D chromosome structures. Micro-C is one such method that can detect the structures at nucleosome resolution. In this chapter, I provide a basic method for Micro-C analysis. I present and discuss a series of data analyses ranging from mapping to basic downstream analyses, including loop detection.
Subjects
High-Throughput Nucleotide Sequencing, Software, Workflow, High-Throughput Nucleotide Sequencing/methods, Humans, Chromosomes/genetics, Computational Biology/methods, Chromosome Mapping/methods, Nucleosomes/chemistry, Nucleosomes/genetics, Nucleosomes/metabolism
ABSTRACT
Hi-C and 3C-seq are powerful tools to study the 3D genomes of bacteria and archaea, whose small cell sizes and growth conditions are often intractable to detailed microscopic analysis. However, the circularity of prokaryotic genomes requires a number of tricks for Hi-C/3C-seq data analysis. Here, I provide a practical guide to use the HiC-Pro pipeline for Hi-C/3C-seq data obtained from prokaryotes.
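One consequence of genome circularity is that the genomic separation between two loci cannot be taken as a simple coordinate difference, because loci near opposite ends of the linear reference are in fact neighbors. The sketch below shows the usual correction. It is a general illustration rather than a step of the HiC-Pro pipeline, and the function name is an assumption introduced here.

```python
def circular_distance(pos_a: int, pos_b: int, genome_length: int) -> int:
    """Shortest separation between two loci on a circular chromosome.

    The linear coordinate difference is compared with the path that
    wraps around the origin, and the shorter of the two is returned.
    """
    linear = abs(pos_a - pos_b) % genome_length
    return min(linear, genome_length - linear)
```

For a 100 bp circle, positions 5 and 95 are 10 bp apart rather than 90 bp, which matters whenever contact frequency is analyzed as a function of distance.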
Subjects
Bacterial Genome, Software, Genomics/methods, High-Throughput Nucleotide Sequencing/methods, Prokaryotic Cells/metabolism, Archaeal Genome, Archaea/genetics, Bacteria/genetics, Computational Biology/methods, Data Analysis
ABSTRACT
Hi-C is a popular ligation-based technique to detect 3D physical chromosome structure within the nucleus using cross-linking and next-generation sequencing. As an unbiased genome-wide assay based on chromosome conformation capture, it provides rich insights into chromosome structure, dynamic chromosome folding and interactions, and the regulatory state of a cell. Bioinformatics analyses of Hi-C data require dedicated protocols as most genome alignment tools assume that both paired-end reads will map to the same chromosome, resulting in large two-dimensional matrices as processed data. Here, we outline the necessary steps to generate high-quality aligned Hi-C data by separately mapping each read while correcting for biases from restriction enzyme digests. We introduce our own custom open-source pipeline, which enables users to select an aligner of their choosing with high accuracy and performance. This enables users to generate high-resolution datasets with fast turnaround and fewer unmapped reads. Finally, we discuss recent innovations in experimental techniques, bioinformatics techniques, and their applications in clinical testing for diagnostics.
Subjects
Chromosome Mapping, Computational Biology, High-Throughput Nucleotide Sequencing, Software, High-Throughput Nucleotide Sequencing/methods, Computational Biology/methods, Humans, Chromosome Mapping/methods, Chromosomes/genetics, Genomics/methods, Chromatin/genetics, Chromatin/chemistry
ABSTRACT
Hi-C and Micro-C are the three-dimensional (3D) genome assays that use high-throughput sequencing. In the analysis, the sequenced paired-end reads are mapped to a reference genome to generate a two-dimensional contact matrix for identifying topologically associating domains (TADs), chromatin loops, and chromosomal compartments. On the other hand, the distance distribution of the paired-end mapped reads also provides insight into the 3D genome structure by highlighting global contact frequency patterns at distances indicative of loops, TADs, and compartments. This chapter presents a basic workflow for visualizing and analyzing contact distance distributions from Hi-C data. The workflow can be run on Google Colaboratory, which provides a ready-to-use Python environment accessible through a web browser. The notebook that demonstrates the workflow is available in the GitHub repository at https://github.com/rnakato/Springer_contact_distance_plot.
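A contact distance distribution can be sketched as a histogram of the genomic separation between the two mapped ends of each intra-chromosomal read pair, usually with logarithmically spaced bins. The example below is an independent, minimal illustration of that idea and is not the code from the linked notebook: the function name, the tuple layout of `pairs`, and the parameters `bins_per_decade` and `min_distance` are assumptions made for this sketch.

```python
import math
from collections import Counter


def contact_distance_histogram(pairs, bins_per_decade=2, min_distance=1000):
    """Fraction of intra-chromosomal contacts per log-spaced distance bin.

    pairs: iterable of (chrom1, pos1, chrom2, pos2) for mapped read pairs
    Returns a dict mapping each bin's lower edge (bp) to the fraction of
    retained contacts falling in that bin, ordered by increasing distance.
    """
    counts = Counter()
    for chrom1, pos1, chrom2, pos2 in pairs:
        if chrom1 != chrom2:
            continue  # inter-chromosomal pairs have no genomic distance
        distance = abs(pos2 - pos1)
        if distance < min_distance:
            continue  # very short pairs are dominated by ligation artifacts
        index = math.floor(math.log10(distance / min_distance) * bins_per_decade)
        counts[index] += 1

    total = sum(counts.values())
    if total == 0:
        return {}
    return {
        min_distance * 10 ** (index / bins_per_decade): count / total
        for index, count in sorted(counts.items())
    }
```

Plotting these fractions against the bin edges on log-log axes gives the familiar decay curve, in which changes of slope at particular distances hint at loops, TADs, or compartments.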