ABSTRACT
Accurate measurement of clonal genotypes, mutational processes, and replication states from individual tumor-cell genomes will facilitate improved understanding of tumor evolution. We have developed DLP+, a scalable single-cell whole-genome sequencing platform implemented using commodity instruments, image-based object recognition, and open source computational methods. Using DLP+, we have generated a resource of 51,926 single-cell genomes and matched cell images from diverse cell types including cell lines, xenografts, and diagnostic samples with limited material. From this resource we have defined variation in mitotic mis-segregation rates across tissue types and genotypes. Analysis of matched genomic and image measurements revealed correlations between cellular morphology and genome ploidy states. Aggregation of cells sharing copy number profiles allowed for calculation of single-nucleotide resolution clonal genotypes and inference of clonal phylogenies and avoided the limitations of bulk deconvolution. Finally, joint analysis over the above features defined clone-specific chromosomal aneuploidy in polyclonal populations.
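Pooling cells that share a copy number profile turns sparse per-cell reads into usable pseudo-bulk depth at each SNV site. The sketch below groups cells by an identical copy-number profile and sums (ref, alt) read counts per site. The `Cell` record, the function names and the exact-match grouping are illustrative assumptions, not the DLP+ pipeline; in practice similar (not identical) profiles would usually be clustered first.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Cell:
    """Hypothetical per-cell record: integer copy-number state per genomic bin,
    plus (ref, alt) read counts at candidate SNV sites."""
    cell_id: str
    cn_profile: tuple
    snv_counts: dict = field(default_factory=dict)


def aggregate_clones(cells):
    """Group cells with identical copy-number profiles and pool their SNV
    read counts into one pseudo-bulk genotype per group."""
    clones = defaultdict(lambda: {"cells": [], "counts": defaultdict(lambda: [0, 0])})
    for cell in cells:
        clone = clones[cell.cn_profile]
        clone["cells"].append(cell.cell_id)
        for site, (ref, alt) in cell.snv_counts.items():
            clone["counts"][site][0] += ref
            clone["counts"][site][1] += alt
    return dict(clones)


def clone_vaf(clone, site):
    """Variant allele fraction of a site in the pooled clone (None if no reads)."""
    ref, alt = clone["counts"].get(site, (0, 0))
    total = ref + alt
    return alt / total if total else None


cells = [
    Cell("c1", (2, 2, 3), {"chr1:100": (1, 1)}),
    Cell("c2", (2, 2, 3), {"chr1:100": (0, 2)}),
    Cell("c3", (2, 1, 2), {"chr1:100": (3, 0)}),
]
clones = aggregate_clones(cells)
print(clone_vaf(clones[(2, 2, 3)], "chr1:100"))  # 0.75 (3 alt of 4 pooled reads)
```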
Subjects
DNA Replication/genetics , Genome, Human , High-Throughput Nucleotide Sequencing , Single-Cell Analysis , Aneuploidy , Animals , Cell Cycle/genetics , Cell Line, Tumor , Cell Shape , Cell Survival , Chromosomes, Human/genetics , Clone Cells , DNA Transposable Elements/genetics , Diploidy , Female , Genotype , Humans , Male , Mice , Mutation/genetics , Phylogeny , Polymorphism, Single Nucleotide/genetics
ABSTRACT
The practical application of genome-scale technologies to precision oncology research requires flexible tissue processing strategies that can be used to differentially select both tumour and normal cell populations from formalin-fixed, paraffin-embedded tissues. As tumour sequencing scales towards clinical implementation, practical difficulties in scheduling and obtaining fresh tissue biopsies at scale, including blood samples as surrogates for matched 'normal' DNA, have focused attention on the use of formalin-preserved clinical samples collected routinely for diagnostic purposes. In practice, such samples often contain both tumour and normal cells which, if correctly partitioned, could be used to profile both tumour and normal genomes, thus identifying somatic alterations. Here we report a semi-automated method for laser microdissecting entire slide-mounted tissue sections to enrich for cells of interest with sufficient yield for whole genome and transcriptome sequencing. Using this method, we demonstrated enrichment of tumour material from mixed tumour-normal samples by up to 67%. Leveraging new methods that allow for the extraction of high-quality nucleic acids from small amounts of formalin-fixed tissues, we further showed that the method was successful in yielding sequence data of sufficient quality for use in BC Cancer's Personalized OncoGenomics (POG) program. © 2020 The Pathological Society of Great Britain and Ireland. Published by John Wiley & Sons, Ltd.
Subjects
Laser Capture Microdissection , Neoplasms/pathology , Precision Medicine , Animals , Formaldehyde , Humans , Liver/pathology , Mice , Mice, Inbred C57BL , Tissue Fixation
ABSTRACT
Summary: Reliably identifying genomic rearrangements and interpreting their impact is a key step in understanding their role in human cancers and inherited genetic diseases. Many short-read algorithmic approaches exist, but all have appreciable false-negative rates. A common approach is to take the union of calls from multiple tools to increase sensitivity, followed by filtering to retain specificity. Here we describe an application framework for the rapid generation of structural variant consensus, unique in its ability to visualize genetic impact and context and to process both genome and transcriptome data. Availability and implementation: http://mavis.bcgsc.ca. Supplementary information: Supplementary data are available at Bioinformatics online.
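The union-then-filter strategy can be sketched as follows: calls from several tools are pooled, calls of the same type and chromosome pair whose breakpoints lie within a fixed window are merged, and merged events are kept only if at least a minimum number of distinct tools reported them. The `SVCall` record, the 100-bp window and the `min_tools` threshold are illustrative assumptions; this is not MAVIS's actual clustering algorithm, and the greedy pass can miss merges a full pairwise comparison would find.

```python
from dataclasses import dataclass
from operator import attrgetter


@dataclass(frozen=True)
class SVCall:
    """Hypothetical breakpoint-pair record from one SV caller."""
    tool: str
    chrom1: str
    pos1: int
    chrom2: str
    pos2: int
    svtype: str


def consensus(calls, window=100, min_tools=2):
    """Greedily cluster calls of the same type and chromosome pair whose
    breakpoints lie within `window` bp of the cluster's first call, then keep
    clusters reported by at least `min_tools` distinct tools."""
    order = attrgetter("chrom1", "chrom2", "svtype", "pos1", "pos2")
    clusters = []
    for call in sorted(calls, key=order):
        if clusters:
            seed = clusters[-1][0]
            same_class = (seed.chrom1, seed.chrom2, seed.svtype) == (
                call.chrom1, call.chrom2, call.svtype)
            if (same_class
                    and abs(seed.pos1 - call.pos1) <= window
                    and abs(seed.pos2 - call.pos2) <= window):
                clusters[-1].append(call)
                continue
        clusters.append([call])
    return [c for c in clusters if len({x.tool for x in c}) >= min_tools]


calls = [
    SVCall("toolA", "chr5", 1000, "chr5", 5000, "DEL"),
    SVCall("toolB", "chr5", 1040, "chr5", 4980, "DEL"),
    SVCall("toolC", "chr7", 200, "chr12", 900, "TRA"),
]
kept = consensus(calls)  # one DEL cluster supported by toolA and toolB
```

Raising `min_tools` trades sensitivity for specificity, which is the balance the union-then-filter approach is meant to tune.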
Subjects
Genomics , Neoplasms/genetics , Software , Computational Biology , Humans , Transcriptome
ABSTRACT
PURPOSE: Structural variants (SVs) may be an underestimated cause of hereditary cancer syndromes given the current limitations of short-read next-generation sequencing. Here we investigated the utility of long-read sequencing in resolving germline SVs in cancer susceptibility genes detected through short-read genome sequencing. METHODS: Known or suspected deleterious germline SVs were identified using Illumina genome sequencing across a cohort of 669 advanced cancer patients with paired tumor genome and transcriptome sequencing. Candidate SVs were subsequently assessed by Oxford Nanopore long-read sequencing. RESULTS: Nanopore sequencing confirmed eight simple pathogenic or likely pathogenic SVs, resolving three additional variants whose impact could not be fully elucidated through short-read sequencing. A recurrent sequencing artifact on chromosome 16p13 and one complex rearrangement on chromosome 5q35 were subsequently classified as likely benign, obviating the need for further clinical assessment. Variant configuration was further resolved in one case with a complex pathogenic rearrangement affecting TSC2. CONCLUSION: Our findings demonstrate that long-read sequencing can improve the validation, resolution, and classification of germline SVs. This has important implications for return of results, cascade carrier testing, cancer screening, and prophylactic interventions.
Subjects
Genetic Predisposition to Disease , Neoplasms , Base Sequence , Genome , High-Throughput Nucleotide Sequencing , Humans
ABSTRACT
Chickens, pigs, and cattle are key reservoirs of Salmonella enterica, a foodborne pathogen of worldwide importance. Though a decade has elapsed since publication of the first Salmonella genome, thousands of genes remain of hypothetical or unknown function, and the basis of colonization of reservoir hosts is ill-defined. Moreover, previous surveys of the role of Salmonella genes in vivo have focused on systemic virulence in murine typhoid models, and the genetic basis of intestinal persistence, and thus of zoonotic transmission, has received little study. We therefore screened pools of random insertion mutants of S. enterica serovar Typhimurium in chickens, pigs, and cattle by transposon-directed insertion-site sequencing (TraDIS). The identity and relative fitness of 7,702 mutants in each host were assigned simultaneously by massively parallel sequencing of transposon-flanking regions. Phenotypes were assigned to 2,715 different genes, providing a phenotype-genotype map of unprecedented resolution. The data are self-consistent in that multiple independent mutations in a given gene or pathway were observed to exert a similar fitness cost. Phenotypes were further validated by screening defined null mutants in chickens. Our data indicate that a core set of genes is required for infection of all three host species, and smaller sets of genes may mediate persistence in specific hosts. By assigning roles to thousands of Salmonella genes in key reservoir hosts, our data facilitate systems approaches to understand pathogenesis and the rational design of novel cross-protective vaccines and inhibitors. Moreover, by simultaneously assigning the genotype and phenotype of over 90% of mutants screened in complex pools, our data establish TraDIS as a powerful tool to apply rich functional annotation to microbial genomes with minimal animal use.
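Relative fitness in a TraDIS-style screen comes from comparing how often each mutant's transposon junction is read in the input pool and in the pool recovered from the host. A common way to express this, assumed here rather than taken from the study, is a log2 fold change of library-size-normalised counts with a pseudocount, summarised per gene across independent insertions. The function names, the pseudocount of 1 and the hypothetical "gene::insertion" mutant IDs are illustrative.

```python
import math
from collections import defaultdict
from statistics import median


def log2_fitness(input_counts, output_counts, pseudocount=1.0):
    """Per-mutant log2 fold change of normalised read counts (output vs input).
    Negative values mean the mutant was depleted after passage through the host."""
    in_total = sum(input_counts.values())
    out_total = sum(output_counts.values())
    scores = {}
    for mutant, n_in in input_counts.items():
        n_out = output_counts.get(mutant, 0)
        # Reads per million, so libraries sequenced to different depths compare.
        in_rpm = n_in / in_total * 1e6
        out_rpm = n_out / out_total * 1e6
        scores[mutant] = math.log2((out_rpm + pseudocount) / (in_rpm + pseudocount))
    return scores


def gene_fitness(scores):
    """Median score over independent insertions in the same gene; mutant IDs
    are assumed (hypothetically) to be formatted as 'gene::insertion'."""
    by_gene = defaultdict(list)
    for mutant, score in scores.items():
        by_gene[mutant.split("::")[0]].append(score)
    return {gene: median(values) for gene, values in by_gene.items()}


input_counts = {"geneA::ins1": 500, "geneB::ins7": 250, "geneB::ins9": 250}
output_counts = {"geneA::ins1": 990, "geneB::ins7": 5, "geneB::ins9": 5}
fitness = gene_fitness(log2_fitness(input_counts, output_counts))
print(fitness)  # geneB is strongly depleted (large negative score)
```

Summarising over several insertions per gene mirrors the self-consistency check described above, where independent mutations in one gene should show a similar fitness cost.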
Subjects
Salmonella Infections, Animal , Salmonella typhimurium , Animals , Chickens , Intestines , Salmonella enterica/genetics , Salmonella typhimurium/genetics , Virulence
ABSTRACT
UNLABELLED: White spruce (Picea glauca) is a dominant conifer of the boreal forests of North America, and providing genomics resources for this commercially valuable tree will help improve forest management and conservation efforts. Sequencing and assembling the large and highly repetitive spruce genome, however, pushes the boundaries of current technology. Here, we describe a whole-genome shotgun sequencing strategy using two Illumina sequencing platforms and an assembly approach using the ABySS software. We report a 20.8-gigabase-pair draft genome in 4.9 million scaffolds, with a scaffold N50 of 20,356 bp. We demonstrate how recent improvements in sequencing technology, especially longer read lengths and paired-end reads from longer fragments, have a major impact on assembly contiguity. We also note that scalable bioinformatics tools are instrumental in providing rapid draft assemblies. AVAILABILITY: The Picea glauca genome sequencing and assembly data are available through NCBI (Accession#: ALWZ0100000000 PID: PRJNA83435). http://www.ncbi.nlm.nih.gov/bioproject/83435.
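The scaffold N50 quoted above is a length-weighted median: half of all assembled bases lie in scaffolds at least that long. A minimal sketch of the standard calculation (the function name is illustrative):

```python
def n50(lengths):
    """Smallest length L such that scaffolds of length >= L together contain
    at least half of all assembled bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input


print(n50([100, 80, 50, 30, 20, 10]))  # 80: 100 + 80 = 180 >= 290 / 2
```

Replacing `total` with an estimated genome size instead of the assembly length gives the related NG50 statistic.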
Subjects
Genome, Plant , Genomics/methods , Picea/genetics , Base Sequence , Molecular Sequence Data , Sequence Analysis, DNA , Software
ABSTRACT
We demonstrate a method for tissue microdissection using scanning laser ablation that is approximately two orders of magnitude faster than conventional laser capture microdissection. Our novel approach uses scanning laser optics and a slide coating under the tissue that can be excited by the laser to selectively eject regions of tissue for further processing. Tissue was dissected at 0.117 s/mm² without reduction in yield, sequencing insert size or base quality compared with undissected tissue. From eight cases, 58-416 mm² of tissue was obtained from one to four slides in 7-48 seconds total dissection time per case. These samples underwent exome sequencing and we found the variant allelic fraction increased in regions enriched for tumour as expected. This suggests that our ablation technique may be useful as a tool in both clinical and research labs.
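As a rough consistency check, assuming dissection time scales linearly with tissue area at the reported rate (an assumption that ignores any per-slide overhead), the reported area range implies:

```latex
\begin{aligned}
t &\approx r \times A, \qquad r = 0.117\ \text{s/mm}^2 \\
t_{58} &\approx 0.117 \times 58 \approx 6.8\ \text{s} \\
t_{416} &\approx 0.117 \times 416 \approx 48.7\ \text{s}
\end{aligned}
```

This is close to the reported 7-48 s of total dissection time per case.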
Subjects
Laser Capture Microdissection , Humans , Laser Capture Microdissection/methods , Laser Therapy/methods , Microdissection/methods , Exome Sequencing , Time Factors
ABSTRACT
During the COVID-19 pandemic, clinical laboratories have faced massive increases in testing, resulting in shortages of sample collection systems, reagents, and staff. We utilized self-collected saline gargle samples to optimize high-throughput SARS-CoV-2 multiplex polymerase chain reaction (PCR) testing in order to minimize cost and technologist time. This was achieved through elimination of nucleic acid extraction and automation of sample handling on a widely available robotic liquid handler, the Hamilton STARlet. A customized barcode scanning script for reading the sample ID by the Hamilton STARlet's software system was developed to allow primary tube sampling. Use of pre-frozen SARS-CoV-2 assay reaction mixtures reduced assay setup time. In both validation and live testing, the assay produced no false positive or false negative results. Of the 1060 samples tested during validation, 3.6% (39/1060) required retesting because they were single-gene positive, had an internal control failure, or had a liquid aspiration error. Although the overall turnaround time was only slightly faster in the automated workflow (185 min vs 200 min), there was a 76% reduction in hands-on time, potentially reducing staff fatigue and burnout. The described process, from sample self-collection to automated direct PCR testing, significantly reduces the total burden on healthcare systems in terms of human resources and reagent requirements.
Subjects
COVID-19 , SARS-CoV-2 , Humans , SARS-CoV-2/genetics , COVID-19/diagnosis , Pandemics , COVID-19 Testing , Specimen Handling , Multiplex Polymerase Chain Reaction , Sensitivity and Specificity , RNA, Viral/analysis
ABSTRACT
BACKGROUND: To support the implementation of high-throughput pipelines suitable for SARS-CoV-2 sequencing and analysis in a clinical laboratory, we developed an automated sample preparation and analysis workflow. METHODS: We used the established ARTIC protocol with approximately 400 bp amplicons sequenced on Oxford Nanopore's MinION. Sequences were analyzed using Nextclade, assigning both a clade and quality score to each sample. RESULTS: A total of 2179 samples on twenty-five 96-well plates were sequenced. Plates of purified RNA were processed within 12 h, sequencing required up to 24 h, and analysis of each pooled plate required 1 h. The use of samples with known threshold cycle (Ct) values enabled normalization, acted as a quality control check, and revealed a strong correlation between sample Ct values and successful analysis, with 85% of samples with Ct < 30 achieving a "good" Nextclade score. Less abundant samples responded to enrichment, with the fraction of Ct > 30 samples achieving a "good" classification rising by 60% after addition of a post-ARTIC PCR normalization. Serial dilutions of 3 variant of concern samples, diluted from approximately Ct = 16 to approximately Ct = 50, demonstrated successful sequencing to Ct = 37. The sample set contained a median of 24 mutations per sample and a total of 1281 unique mutations, with reduced sequence read coverage noted in some regions of some samples. A total of 10 separate strains were observed in the sample set, including 3 variants of concern prevalent in British Columbia in the spring of 2021. CONCLUSIONS: We demonstrated a robust automated sequencing pipeline that takes advantage of input Ct values to improve reliability.
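Known Ct values can be turned into normalisation factors because, under the textbook assumption of roughly 100% PCR efficiency, each additional cycle corresponds to about half as much starting template. The sketch below computes a relative input volume per sample so that each contributes approximately equal template, and flags samples above Ct 30. The reference Ct, the volume cap and the function names are illustrative assumptions; this is not the normalisation procedure used in the study.

```python
def relative_input_volumes(ct_values, ref_ct=20.0, ref_volume_ul=1.0,
                           max_volume_ul=10.0):
    """Relative volume per sample to roughly equalise template input.

    Assumes ~100% PCR efficiency, so template abundance ~ 2**(-Ct): a sample
    one cycle later than ref_ct gets twice the reference volume. Volumes are
    capped at max_volume_ul (e.g. a reaction volume limit).
    """
    return {
        sample: min(ref_volume_ul * 2 ** (ct - ref_ct), max_volume_ul)
        for sample, ct in ct_values.items()
    }


def flag_low_abundance(ct_values, threshold=30.0):
    """Samples above the Ct threshold, which were less likely to reach a
    'good' Nextclade score without additional enrichment."""
    return sorted(sample for sample, ct in ct_values.items() if ct > threshold)


cts = {"s1": 20.0, "s2": 22.0, "s3": 35.0}
volumes = relative_input_volumes(cts)  # s1: 1.0, s2: 4.0, s3 capped at 10.0
low = flag_low_abundance(cts)          # ["s3"]
```

The cap matters for high-Ct samples: the ideal volume grows exponentially with Ct, so such samples hit the limit and remain under-represented, which is one reason additional enrichment helps them.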
Subjects
COVID-19 , Nanopore Sequencing , Nanopores , COVID-19/diagnosis , COVID-19/epidemiology , Humans , Reproducibility of Results , SARS-CoV-2/genetics
ABSTRACT
The COVID-19 pandemic has highlighted the need for generic reagents and flexible systems in diagnostic testing. Magnetic bead-based nucleic acid extraction protocols using 96-well plates on open liquid handlers are readily amenable to meet this need. Here, one such approach is rigorously optimized to minimize cross-well contamination while maintaining sensitivity.
Subjects
COVID-19 , Nucleic Acids , COVID-19 Testing , Humans , Indicators and Reagents , Magnetic Phenomena , Pandemics , RNA, Viral/genetics , SARS-CoV-2 , Sensitivity and Specificity
ABSTRACT
Genes required for infection of mice by Salmonella Typhimurium can be identified by the interrogation of random transposon mutant libraries for mutants that cannot survive in vivo. Inactivation of such genes produces attenuated S. Typhimurium strains that have potential for use as live attenuated vaccines. A quantitative screen, Transposon Mediated Differential Hybridisation (TMDH), has been developed that identifies those members of a large library of transposon mutants that are attenuated. TMDH employs custom transposons with outward-facing T7 and SP6 promoters. Fluorescently-labelled transcripts from the promoters are hybridised to whole-genome tiling microarrays, to allow the position of the transposon insertions to be determined. Comparison of microarray data from the mutant library grown in vitro (input) with equivalent data produced after passage of the library through mice (output) enables an attenuation score to be determined for each transposon mutant. These scores are significantly correlated with bacterial counts obtained during infection of mice using mutants with individual defined deletions of the same genes. Defined deletion mutants of several novel targets identified in the TMDH screen are effective live vaccines.
Subjects
DNA Transposable Elements , Salmonella Infections, Animal/microbiology , Salmonella enterica/genetics , Animals , Cloning, Molecular , Databases, Genetic , Disease Models, Animal , Gene Library , Genes, Bacterial , Mice , Mice, Inbred BALB C , Nucleic Acid Hybridization , Oligonucleotide Array Sequence Analysis , Reproducibility of Results , Salmonella enterica/pathogenicity , Sequence Deletion , Virulence/genetics
ABSTRACT
RNA sequencing (RNAseq) has been widely used to generate bulk gene expression measurements collected from pools of cells. Only relatively recently have single-cell RNAseq (scRNAseq) methods provided opportunities for gene expression analyses at the single-cell level, allowing researchers to study heterogeneous mixtures of cells at unprecedented resolution. Tumors tend to be composed of heterogeneous cellular mixtures and are frequently the subjects of such analyses. Extensive method developments have led to several protocols for scRNAseq but, owing to the small amounts of RNA in single cells, technical constraints have required compromises. For example, the majority of scRNAseq methods are limited to sequencing only the 3' or 5' termini of transcripts. Other protocols that facilitate full-length transcript profiling tend to capture only polyadenylated mRNAs and are generally limited to processing only 96 cells at a time. Here, we address these limitations and present a novel protocol that allows for the high-throughput sequencing of full-length, total RNA at single-cell resolution. We demonstrate that our method produced strand-specific sequencing data for both polyadenylated and non-polyadenylated transcripts, enabled the profiling of transcript regions beyond only transcript termini, and yielded data rich enough to allow identification of cell types from heterogeneous biological samples.
ABSTRACT
Plant mitochondrial genomes vary widely in size. Although many plant mitochondrial genomes have been sequenced and assembled, the vast majority are of angiosperms, and few are of gymnosperms. Most plant mitochondrial genomes are smaller than a megabase, with a few notable exceptions. We have sequenced and assembled the complete 5.5-Mb mitochondrial genome of Sitka spruce (Picea sitchensis), to date, one of the largest mitochondrial genomes of a gymnosperm. We sequenced the whole genome using Oxford Nanopore MinION, and then identified contigs of mitochondrial origin assembled from these long reads based on sequence homology to the white spruce mitochondrial genome. The assembly graph shows a multipartite genome structure, composed of one smaller 168-kb circular segment of DNA, and a larger 5.4-Mb single component with a branching structure. The assembly graph gives insight into a putative complex physical genome structure, and its branching points may represent active sites of recombination.
Subjects
Genome, Mitochondrial , Genome, Plant , Picea/genetics , Molecular Structure
ABSTRACT
BACKGROUND: In recent years there has been an increasing problem with Staphylococcus aureus strains that are resistant to treatment with existing antibiotics. An important starting point for the development of new antimicrobial drugs is the identification of "essential" genes that are important for bacterial survival and growth. RESULTS: We have developed a robust microarray and PCR-based method, Transposon-Mediated Differential Hybridisation (TMDH), that uses novel bioinformatics to identify transposon inserts in genome-wide libraries. Following a microarray-based screen, genes lacking transposon inserts are re-tested using a PCR and sequencing-based approach. We carried out a TMDH analysis of the S. aureus genome using a large random mariner transposon library of around a million mutants, and identified a total of 351 S. aureus genes important for survival and growth in culture. A comparison with the essential gene list experimentally derived for Bacillus subtilis highlighted interesting differences in both pathways and individual genes. CONCLUSION: We have determined the first comprehensive list of S. aureus essential genes. This should act as a useful starting point for the identification of potential targets for novel antimicrobial compounds. The TMDH methodology we have developed is generic and could be applied to identify essential genes in other bacterial pathogens.
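Once insertion sites have been mapped to genome coordinates, flagging genes that lack transposon inserts for re-testing reduces to an interval lookup. The sketch below uses binary search over sorted insertion positions; it assumes linear coordinates on a single chromosome (genes spanning the origin are ignored), and the names and toy coordinates are hypothetical. Real analyses usually also weigh gene length and insertion density, because short genes can lack inserts by chance.

```python
from bisect import bisect_left


def genes_without_inserts(genes, insert_positions):
    """Names of genes with no mapped transposon insertion in [start, end].

    genes: iterable of (name, start, end) on one linear coordinate system;
    insert_positions: iterable of integer insertion coordinates.
    """
    positions = sorted(insert_positions)
    candidates = []
    for name, start, end in genes:
        i = bisect_left(positions, start)  # first insertion at or after start
        if i == len(positions) or positions[i] > end:
            candidates.append(name)
    return candidates


genes = [("geneW", 1, 1000), ("geneX", 1500, 2500), ("geneY", 3000, 3600)]
inserts = [1800, 2100, 3700, 1200]
print(genes_without_inserts(genes, inserts))  # ['geneW', 'geneY']
```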
Subjects
DNA Transposable Elements , Genes, Essential , Sequence Analysis, DNA/methods , Staphylococcus aureus/genetics , Computational Biology , DNA, Bacterial/genetics , Gene Library , Genes, Bacterial , Genome, Bacterial , Oligonucleotide Array Sequence Analysis/methods , Oligonucleotide Probes , Polymerase Chain Reaction/methods , Software , Staphylococcus aureus/classification
ABSTRACT
Mycobacterium chimaera, a nontuberculous mycobacterium (NTM) belonging to the Mycobacterium avium complex (MAC), is an opportunistic pathogen that can cause respiratory and disseminated disease. We report the complete genome sequence of a strain, SJ42, isolated from an immunocompromised male presenting with MAC pneumonia, assembled from Illumina and Oxford Nanopore data.
ABSTRACT
Frogs play important ecological roles, and several species are important model organisms for scientific research. The globally distributed Ranidae (true frogs) are the largest frog family, and have substantial evolutionary distance from the model laboratory Xenopus frog species. Unfortunately, there are currently no genomic resources for this important group of amphibians. More widely applicable amphibian genomic data are urgently needed, as more than two-thirds of known species are currently threatened or are undergoing population declines. We report a 5.8 Gbp (NG50 = 69 kbp) genome assembly of a representative North American bullfrog (Rana [Lithobates] catesbeiana). The genome contains over 22,000 predicted protein-coding genes and 6,223 candidate long noncoding RNAs (lncRNAs). RNA-Seq experiments show that thyroid hormone causes widespread transcriptional change among protein-coding and putative lncRNA genes. This initial bullfrog draft genome will serve as a key resource with broad utility, including amphibian research, developmental biology, and environmental research.
Subjects
Genome , RNA, Long Noncoding/genetics , Rana catesbeiana/genetics , Animals , Computational Biology , Genome, Mitochondrial , Male , Molecular Sequence Annotation , North America , Phylogeny , RNA, Long Noncoding/metabolism , Rana catesbeiana/metabolism , Thyroid Hormones/metabolism
ABSTRACT
The linked-read sequencing library preparation platform by 10X Genomics produces barcoded sequencing libraries, which are subsequently sequenced using Illumina short-read sequencing technology. In this approach, long fragments of DNA are partitioned into separate micro-reactions, where the same index sequence is incorporated into each of the sequencing fragment inserts derived from a given long fragment. In this study, we exploited this property by using reads from index sequences associated with a large number of reads to assemble the chloroplast genome of the Sitka spruce tree (Picea sitchensis). Here we report on the first Sitka spruce chloroplast genome assembled exclusively from P. sitchensis genomic libraries prepared using the 10X Genomics protocol. We show that the resulting 124,049 base pair long genome shares high sequence similarity with the related white spruce and Norway spruce chloroplast genomes, but diverges substantially from a previously published P. sitchensis-P. thunbergii chimeric genome. The use of reads from high-frequency indices enabled separation of the nuclear genome reads from those of the chloroplast, which simplified the de Bruijn graphs used at the various stages of assembly.
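The selection step can be sketched as counting reads per barcode and keeping only reads whose barcode exceeds a read-count threshold, since abundant chloroplast DNA is expected to dominate the high-frequency indices. The in-memory (barcode, sequence) representation, the function names and the threshold value are assumptions for illustration; the study's actual cutoff is not given here.

```python
from collections import Counter


def high_frequency_barcodes(read_barcodes, min_reads):
    """Barcodes associated with at least min_reads reads."""
    counts = Counter(read_barcodes)
    return {bc for bc, n in counts.items() if n >= min_reads}


def select_reads(reads, min_reads):
    """Keep (barcode, sequence) reads whose barcode is high-frequency."""
    keep = high_frequency_barcodes((bc for bc, _ in reads), min_reads)
    return [(bc, seq) for bc, seq in reads if bc in keep]


reads = [("AAA", "ACGT")] * 5 + [("CCC", "TTTT")] * 2 + [("GGG", "GGGG")]
enriched = select_reads(reads, min_reads=3)  # the five reads tagged "AAA"
```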
Subjects
Chloroplasts/genetics , Genome, Plant , Picea/genetics , Phylogeny , Picea/classification
ABSTRACT
The genome sequences of the plastid and mitochondrion of white spruce (Picea glauca) were assembled from whole-genome shotgun sequencing data using ABySS. The sequencing data contained reads from both the nuclear and organellar genomes, and reads of the organellar genomes were abundant in the data because each cell harbors hundreds of mitochondria and plastids. Hence, assembly of the 123-kb plastid and 5.9-Mb mitochondrial genomes was accomplished by analyzing data sets primarily representing low coverage of the nuclear genome. The assembled organellar genomes were annotated for their coding genes, ribosomal RNA, and transfer RNA. Transcript abundances of the mitochondrial genes were quantified in three developmental tissues and five mature tissues using data from RNA-seq experiments. C-to-U RNA editing was observed in the majority of mitochondrial genes, and in four genes, editing events were noted to modify ACG codons to create cryptic AUG start codons. The informatics methodology presented in this study should prove useful to assemble organellar genomes of other plant species using whole-genome shotgun sequencing data.
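C-to-U editing creates an AUG start codon when it changes the middle C of an ACG codon. A toy check, assuming aligned genomic and transcript (cDNA, where U is written as T) sequences of equal length in the same reading frame; the function name is hypothetical and real editing detection works from read alignments rather than a single pair of sequences.

```python
def edited_start_codons(genomic, transcript):
    """Codon start indices where genomic ACG reads as ATG in the transcript,
    i.e. C-to-U editing at the middle position (U appears as T in cDNA)."""
    if len(genomic) != len(transcript):
        raise ValueError("sequences must be aligned and of equal length")
    hits = []
    for i in range(0, len(genomic) - 2, 3):
        if genomic[i:i + 3] == "ACG" and transcript[i:i + 3] == "ATG":
            hits.append(i)
    return hits


print(edited_start_codons("ACGGCTACG", "ATGGCTACG"))  # [0]
```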
Subjects
Genome, Chloroplast , Genome, Mitochondrial , Genome, Plant , Picea/genetics , Base Sequence , Contig Mapping , Molecular Sequence Annotation , Molecular Sequence Data
ABSTRACT
BACKGROUND: Numerous cancers have been linked to microorganisms. Given that colorectal cancer is a leading cause of cancer deaths and the colon is continuously exposed to a high diversity of microbes, the relationship between gut mucosal microbiome and colorectal cancer needs to be explored. Metagenomic studies have shown an association between Fusobacterium species and colorectal carcinoma. Here, we have extended these studies with deeper sequencing of a much larger number (n = 130) of colorectal carcinoma and matched normal control tissues. We analyzed these data using co-occurrence networks in order to identify microbe-microbe and host-microbe associations specific to tumors. RESULTS: We confirmed tumor over-representation of Fusobacterium species and observed significant co-occurrence within individual tumors of Fusobacterium, Leptotrichia and Campylobacter species. This polymicrobial signature was associated with over-expression of numerous host genes, including the gene encoding the pro-inflammatory chemokine Interleukin-8. The tumor-associated bacteria we have identified are all Gram-negative anaerobes, recognized previously as constituents of the oral microbiome, which are capable of causing infection. We isolated a novel strain of Campylobacter showae from a colorectal tumor specimen. This strain is substantially diverged from a previously sequenced oral Campylobacter showae isolate, carries potential virulence genes, and aggregates with a previously isolated tumor strain of Fusobacterium nucleatum. CONCLUSIONS: A polymicrobial signature of Gram-negative anaerobic bacteria is associated with colorectal carcinoma tissue.
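The co-occurrence networks are not specified in detail here. As an illustrative stand-in, a simple presence/absence network can link taxa whose tumour-level occurrences overlap strongly, measured with the Jaccard index. The threshold, the choice of Jaccard (rather than a correlation or a formal statistical test) and the toy taxon names are assumptions, not the study's method or data.

```python
from itertools import combinations


def cooccurrence(samples, min_jaccard=0.5):
    """Edges of a co-occurrence network from per-sample sets of detected taxa.
    Returns {(taxon_a, taxon_b): jaccard} for pairs at or above min_jaccard."""
    present_in = {}
    for sample_id, taxa in samples.items():
        for taxon in taxa:
            present_in.setdefault(taxon, set()).add(sample_id)
    edges = {}
    for a, b in combinations(sorted(present_in), 2):
        shared = present_in[a] & present_in[b]
        union = present_in[a] | present_in[b]
        jaccard = len(shared) / len(union)
        if jaccard >= min_jaccard:
            edges[(a, b)] = jaccard
    return edges


samples = {  # toy data
    "t1": {"taxon_A", "taxon_B", "taxon_C"},
    "t2": {"taxon_A", "taxon_C"},
    "t3": {"taxon_D"},
}
print(cooccurrence(samples, min_jaccard=0.6))  # {('taxon_A', 'taxon_C'): 1.0}
```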