ABSTRACT
Complex chromosomal rearrangements (CCRs) are rearrangements involving more than two chromosomes or more than two breakpoints. Whole genome sequencing (WGS) allows outstanding high-resolution characterization of such rearrangements at the nucleotide level in unique sequences, but mapping breakpoints in repetitive regions of the genome, which are known to be prone to rearrangements, remains problematic. Hence, multiple complementary WGS experiments are sometimes needed to solve the structures of CCRs. We have studied three individuals with CCRs: Case 1 and Case 2 presented with de novo, karyotypically balanced, complex interchromosomal rearrangements (46,XX,t(2;8;15)(q35;q24.1;q22) and 46,XY,t(1;10;5)(q32;p12;q31)), and Case 3 presented with a de novo, extremely complex intrachromosomal rearrangement on chromosome 1. Molecular cytogenetic investigation revealed cryptic deletions at the breakpoints on chromosomes 2 and 8 in Case 1 and on chromosome 10 in Case 2, explaining their clinical symptoms. In Case 3, 26 breakpoints were identified using WGS, disrupting five known disease genes. All rearrangements were subsequently analyzed using optical maps, linked-read WGS, and short-read WGS. In conclusion, we present a case series of three unique de novo CCRs in which, by combining the results from the different technologies, we fully solved the structure of each rearrangement. This demonstrates the power of combining short-read WGS with long-molecule sequencing or optical mapping for these unique de novo CCRs in a clinical setting.
Subjects
Chromosomes/genetics, Gene Rearrangement/genetics, Genomic Structural Variation/genetics, Chromosome Mapping/methods, Female, Humans, Male, Whole Genome Sequencing/methods
ABSTRACT
OBJECTIVE: This study sought to determine whether 18F-fluorodeoxyglucose-positron emission tomography/computed tomography could be applied to a murine model of advanced atherosclerotic plaque vulnerability to detect response to therapeutic intervention and changes in lesion stability. APPROACH AND RESULTS: To analyze plaques susceptible to rupture, we fed ApoE-/- mice a high-fat diet and induced vulnerable lesions by cast placement over the carotid artery. After 9 weeks of treatment with orthogonal therapeutic agents (including lipid-lowering and proefferocytic therapies), we assessed vascular inflammation and several features of plaque vulnerability by 18F-fluorodeoxyglucose-positron emission tomography/computed tomography and histopathology, respectively. We observed that 18F-fluorodeoxyglucose-positron emission tomography/computed tomography had the capacity to resolve histopathologically proven changes in plaque stability after treatment. Moreover, mean target-to-background ratios correlated with multiple characteristics of lesion instability, including the corrected vulnerability index. CONCLUSIONS: These results suggest that the application of noninvasive 18F-fluorodeoxyglucose-positron emission tomography/computed tomography to a murine model can allow for the identification of vulnerable atherosclerotic plaques and their response to therapeutic intervention. This approach may prove useful as a drug discovery and prioritization method.
Subjects
Carotid Artery Diseases/diagnostic imaging, Carotid Artery, Common/diagnostic imaging, Fluorodeoxyglucose F18/administration & dosage, Plaque, Atherosclerotic, Positron Emission Tomography Computed Tomography, Radiopharmaceuticals/administration & dosage, Animals, Antibodies, Blocking/pharmacology, Atorvastatin/pharmacology, CD47 Antigen/antagonists & inhibitors, Carotid Artery Diseases/drug therapy, Carotid Artery Diseases/pathology, Carotid Artery, Common/drug effects, Carotid Artery, Common/pathology, Disease Models, Animal, Hydroxymethylglutaryl-CoA Reductase Inhibitors/pharmacology, Male, Mice, Inbred C57BL, Mice, Knockout, ApoE, Predictive Value of Tests, Rupture, Spontaneous
ABSTRACT
Clustered copy number variants (CNVs) as detected by chromosomal microarray analysis (CMA) are often reported as germline chromothripsis. However, such cases may need further investigation by massively parallel whole genome sequencing (WGS) in order to accurately define the underlying complex rearrangement, predict the mechanisms of formation and identify additional complexities. Here, we used WGS to delineate the rearrangement structure of 21 clustered CNV carriers first investigated by CMA and identified a total of 83 breakpoint junctions (BPJs). The rearrangements were further subclassified according to the patterns observed: (I) cases with only deletions (n = 8) often had additional structural rearrangements, such as insertions and inversions typical of chromothripsis; (II) cases with only duplications (n = 7) and (III) cases with combinations of deletions and duplications (n = 6) mostly showed interspersed duplications and BPJs enriched with microhomology. In two cases, the rearrangement mutational signatures indicated both a breakage-fusion-bridge cycle process and the halted formation of a ring chromosome. Finally, we observed two cases with Alu- and LINE-mediated rearrangements, as well as two unrelated individuals with seemingly identical clustered CNVs on 2p25.3, possibly a rare European founder rearrangement. In conclusion, through detailed characterization of the derivative chromosomes, we show that multiple mechanisms are likely involved in the formation of clustered CNVs, and we add further evidence for chromoanagenesis mechanisms in both "simple" and highly complex chromosomal rearrangements. Moreover, WGS characterization adds positional information, which is important for correct clinical interpretation and for deciphering the mechanisms involved in the formation of these rearrangements.
Subjects
DNA Copy Number Variations, DNA Replication/genetics, Alu Elements, Chromosome Breakpoints, Chromothripsis, Gene Rearrangement, Genome, Human, Humans, Long Interspersed Nucleotide Elements, Oligonucleotide Array Sequence Analysis, Whole Genome Sequencing
ABSTRACT
Conifers have dominated forests for more than 200 million years and are of huge ecological and economic importance. Here we present the draft assembly of the 20-gigabase genome of Norway spruce (Picea abies), the first available for any gymnosperm. The number of well-supported genes (28,354) is similar to that of the >100 times smaller genome of Arabidopsis thaliana, and there is no evidence of a recent whole-genome duplication in the gymnosperm lineage. Instead, the large genome size seems to result from the slow and steady accumulation of a diverse set of long-terminal repeat transposable elements, possibly owing to the lack of an efficient elimination mechanism. Comparative sequencing of Pinus sylvestris, Abies sibirica, Juniperus communis, Taxus baccata and Gnetum gnemon reveals that the transposable element diversity is shared among extant conifers. Expression of 24-nucleotide small RNAs, previously implicated in transposable element silencing, is tissue-specific and much lower than in other plants. We further identify numerous long (>10,000 base pairs) introns, gene-like fragments, uncharacterized long non-coding RNAs and short RNAs. This opens up new genomic avenues for conifer forestry and breeding.
Subjects
Evolution, Molecular, Genome, Plant/genetics, Picea/genetics, Conserved Sequence/genetics, DNA Transposable Elements/genetics, Gene Silencing, Genes, Plant/genetics, Genomics, Internet, Introns/genetics, Phenotype, RNA, Untranslated/genetics, Sequence Analysis, DNA, Terminal Repeat Sequences/genetics, Transcription, Genetic/genetics
ABSTRACT
Data produced with short-read sequencing technologies result in ambiguous haplotyping and a limited capacity to investigate the full repertoire of biologically relevant forms of genetic variation. Haplotype-resolved sequencing data have recently gained traction as a way to reduce this ambiguity and to enable exploration of forms of genetic variation beyond single-nucleotide polymorphisms, such as compound heterozygosity and structural variation. Here we describe Droplet Barcode Sequencing, a novel approach for creating linked-read sequencing libraries by uniquely barcoding the information within single DNA molecules in emulsion droplets, without the aid of specialty reagents or microfluidic devices. Barcode generation and template amplification are performed simultaneously in a single enzymatic reaction, greatly simplifying the workflow and minimizing assay costs compared with alternative approaches. The method was applied to phase multiple loci targeting all exons of the highly variable Human Leukocyte Antigen A (HLA-A) gene, with DNA from eight individuals present in the same assay. Barcode-based clustering of sequencing reads confirmed the analysis of over 2,000 independently assayed template molecules, with an average of 753 reads in support of called polymorphisms. Our results show unequivocal characterization of all alleles present, validated against confirmed HLA database entries and haplotyping results from previous studies.
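Barcode-based clustering, as described above, amounts to grouping reads that share a droplet barcode and then summarizing each group. The Python sketch below is a minimal illustration under its own assumptions. Reads arrive as hypothetical `(barcode, sequence)` pairs with the barcodes already extracted. Reads within a cluster are also assumed to be equal-length and pre-aligned, so that a per-position majority vote gives a consensus. It is not the published Droplet Barcode Sequencing pipeline, which the abstract does not describe.

```python
from collections import Counter, defaultdict


def cluster_by_barcode(reads):
    """Group (barcode, sequence) pairs into clusters keyed by barcode."""
    clusters = defaultdict(list)
    for barcode, seq in reads:
        clusters[barcode].append(seq)
    return dict(clusters)


def consensus(seqs):
    """Per-position majority vote; assumes equal-length, pre-aligned reads (a simplification)."""
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("this sketch assumes equal-length, pre-aligned reads")
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*seqs))


# Hypothetical reads: barcodes and sequences are invented for illustration.
reads = [
    ("AACGT", "ACGTTG"),
    ("AACGT", "ACGTTG"),
    ("AACGT", "ACCTTG"),  # simulated sequencing error at position 2
    ("GGTCA", "ACGATG"),
    ("GGTCA", "ACGATG"),
]
clusters = cluster_by_barcode(reads)
for barcode, seqs in clusters.items():
    print(barcode, len(seqs), consensus(seqs))
```

With majority voting, a single erroneous read inside a barcode cluster is outvoted by the other reads from the same template molecule.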
Subjects
DNA Barcoding, Taxonomic/methods, Haplotypes, Alleles, Gene Library, HLA-A Antigens/genetics, High-Throughput Nucleotide Sequencing, Humans, Polymerase Chain Reaction, Sequence Analysis, DNA
ABSTRACT
To improve the epigenomic analysis of tissues rich in 5-hydroxymethylcytosine (hmC), we developed a novel protocol called TAB-Methyl-SEQ, which allows for single base resolution profiling of both hmC and 5-methylcytosine by targeted next-generation sequencing. TAB-Methyl-SEQ data were extensively validated against a set of five methodologically different protocols. Importantly, these cross-comparisons revealed that protocols based on Tet1-assisted bisulfite conversion provided more precise hmC values than TrueMethyl-based methods. A total of 109,454 CpG sites were analyzed by TAB-Methyl-SEQ for mC and hmC in 188 genes from 20 different adult human livers. We describe three types of variability of hepatic hmC profiles: (i) sample-specific variability at 40.8% of CpG sites analyzed, where the local hmC values correlate with the global hmC content of livers (measured by LC-MS), (ii) gene-specific variability, where hmC levels in the coding regions positively correlate with expression of the respective gene, and (iii) site-specific variability, where prominent hmC peaks span only 1 to 3 neighboring CpG sites. Our data suggest that both the gene- and site-specific components of hmC variability might contribute to the epigenetic control of hepatic genes. The protocol described here should be useful for targeted DNA analysis in a variety of applications.
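The abstract does not spell out how mC and hmC values are separated. A common scheme, assumed here, relies on two readouts. Conventional bisulfite sequencing reads both mC and hmC as C, while Tet-assisted bisulfite (TAB) conversion reads only hmC as C, so mC can be estimated by subtraction. The hypothetical `methylation_levels` helper below sketches that arithmetic for one CpG site from read counts. It is not the TAB-Methyl-SEQ analysis code.

```python
def methylation_levels(bs_c, bs_t, tab_c, tab_t):
    """Estimate (mC, hmC) fractions at one CpG site from C/T read counts.

    Assumption: conventional bisulfite (BS) reads mC + hmC as C, and
    Tet-assisted bisulfite (TAB) reads only hmC as C.
    """
    bs_total = bs_c + bs_t
    tab_total = tab_c + tab_t
    if bs_total == 0 or tab_total == 0:
        raise ValueError("both libraries need coverage at this site")
    modified = bs_c / bs_total     # mC + hmC
    hmc = tab_c / tab_total        # hmC only
    mc = max(modified - hmc, 0.0)  # clip small negatives caused by sampling noise
    return mc, hmc


# Hypothetical read counts for a single CpG site.
mc, hmc = methylation_levels(bs_c=80, bs_t=20, tab_c=30, tab_t=70)
print(f"mC = {mc:.2f}, hmC = {hmc:.2f}")
```

The two fractions come from independent libraries, so the subtraction compounds their sampling error. Per-site estimates therefore benefit from deep targeted coverage.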
Subjects
5-Methylcytosine/analogs & derivatives, Base Pairing, Gene Expression Regulation, Genes, Liver/metabolism, 5-Methylcytosine/metabolism, Adult, Base Sequence, Chromatography, Liquid, CpG Islands/genetics, DNA/metabolism, Humans, Mass Spectrometry, Reproducibility of Results, Sequence Analysis, DNA, Sulfites/metabolism
ABSTRACT
Most balanced translocations are thought to result mechanistically from nonhomologous end joining or, in rare cases of recurrent events, from nonallelic homologous recombination. Here, we use low-coverage mate-pair whole-genome sequencing to fine-map rearrangement breakpoint junctions in both phenotypically normal and affected translocation carriers. In total, 46 junctions from 22 carriers of balanced translocations were characterized. Genes were disrupted at 48% of the breakpoints: recessive genes in four normal carriers and known dominant intellectual disability genes in three affected carriers. Finally, seven candidate disease genes were disrupted in five carriers with neurocognitive disabilities (SVOPL, SUSD1, TOX, NCALD, SLC4A10) and in one XX male carrier with Tourette syndrome (LYPD6, GPC5). Breakpoint junction analyses revealed microhomology and small templated insertions in a substantial fraction of the analyzed translocations (17.4%; n = 4), an observation that was substantiated by reanalysis of 37 previously published translocation junctions. Microhomology associated with templated insertions is characteristic of the breakpoint junctions of rearrangements mediated by error-prone, replication-based repair mechanisms. Our data suggest that a mechanism involving template switching might contribute to the formation of at least 15% of interchromosomal translocation events.
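Microhomology at a breakpoint junction is the stretch of bases that could be assigned to either partner chromosome. The Python sketch below assumes the breakpoint has been called at its leftmost possible position. It compares two sequences: the chromosome A reference just beyond its breakpoint, and the chromosome B sequence retained from its breakpoint onward. Their shared prefix is the microhomology. The function name and example sequences are hypothetical, and templated insertions are not handled.

```python
def microhomology(a_downstream: str, b_retained: str) -> str:
    """Return the bases shared by both partners at a left-aligned junction.

    a_downstream: reference of chromosome A immediately after its breakpoint (lost from the derivative).
    b_retained:   reference of chromosome B from its breakpoint onward (present in the derivative).
    Assumption: the breakpoint call is left-aligned; templated insertions are ignored.
    """
    shared = []
    for a_base, b_base in zip(a_downstream, b_retained):
        if a_base != b_base:
            break
        shared.append(a_base)
    return "".join(shared)


# Hypothetical sequences around one junction.
a_retained, a_downstream = "GATTACA", "CTGAAA"
b_retained = "CTGTTC"
junction = a_retained + b_retained  # derivative sequence as observed in the reads
mh = microhomology(a_downstream, b_retained)
print(junction, mh, len(mh))
```

In this example `CTG` could equally be the last bases of A or the first bases of B, so the junction position is ambiguous by three bases. A blunt junction returns an empty string.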
Subjects
Chromosome Mapping, Translocation, Genetic, Whole Genome Sequencing, Base Sequence, Chromosome Breakage, Comparative Genomic Hybridization, DNA Copy Number Variations, Female, Genetic Association Studies, Genomics/methods, Genotype, Homologous Recombination, Humans, In Situ Hybridization, Fluorescence, Karyotype, Male, Phenotype
ABSTRACT
MOTIVATION: Fast and accurate quality control is essential for studies involving next-generation sequencing data. Whilst numerous tools exist to quantify QC metrics, there is no common approach to flexibly integrate these across tools and large sample sets. Assessing analysis results across an entire project can be time-consuming and error-prone; batch effects and outlier samples can easily be missed in the early stages of analysis. RESULTS: We present MultiQC, a tool to create a single report visualising output from multiple tools across many samples, enabling global trends and biases to be quickly identified. MultiQC can plot data from many common bioinformatics tools and is built to allow easy extension and customization. AVAILABILITY AND IMPLEMENTATION: MultiQC is available under the GNU GPLv3 license on GitHub, the Python Package Index and Bioconda. Documentation and example reports are available at http://multiqc.info. CONTACT: phil.ewels@scilifelab.se.
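The core idea of aggregating QC output across tools and samples can be sketched independently of MultiQC's own code. The Python example below is a hypothetical illustration, not MultiQC's implementation or API. It flattens per-tool, per-sample metrics into one sample-by-metric table and flags outliers in a chosen column with a robust, median-based z-score. The 0.6745 scaling and 3.5 cutoff follow the common modified z-score convention. Tool names, metric names and values are invented.

```python
from statistics import median


def merge_metrics(per_tool):
    """Flatten {tool: {sample: {metric: value}}} into {sample: {"tool.metric": value}}."""
    table = {}
    for tool, samples in per_tool.items():
        for sample, metrics in samples.items():
            row = table.setdefault(sample, {})
            for name, value in metrics.items():
                row[f"{tool}.{name}"] = value
    return table


def flag_outliers(table, column, threshold=3.5):
    """Return samples whose modified z-score (median/MAD based) in `column` exceeds threshold."""
    values = {sample: row[column] for sample, row in table.items() if column in row}
    med = median(values.values())
    mad = median(abs(v - med) for v in values.values())
    if mad == 0:
        return []  # no spread to judge against
    return sorted(s for s, v in values.items() if 0.6745 * abs(v - med) / mad > threshold)


# Hypothetical per-tool metrics for five samples.
per_tool = {
    "fastqc": {s: {"gc_pct": v} for s, v in [("s1", 41.0), ("s2", 42.0), ("s3", 40.0), ("s4", 41.5), ("s5", 60.0)]},
    "samtools": {s: {"mapped_pct": v} for s, v in [("s1", 98.1), ("s2", 97.8), ("s3", 98.4), ("s4", 97.9), ("s5", 71.2)]},
}
table = merge_metrics(per_tool)
print(flag_outliers(table, "fastqc.gc_pct"), flag_outliers(table, "samtools.mapped_pct"))
```

A median/MAD score is used instead of mean and standard deviation so that the outlier itself does not inflate the spread it is judged against.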
Subjects
High-Throughput Nucleotide Sequencing, Quality Control, Computational Biology, Software
ABSTRACT
Arsenic, a carcinogen with immunotoxic effects, is a common contaminant of drinking water and certain foods worldwide. We hypothesized that chronic arsenic exposure alters gene expression, potentially by altering DNA methylation of genes encoding central components of the immune system. We therefore analyzed the transcriptomes (by RNA sequencing) and methylomes (by target-enrichment next-generation sequencing) of primary CD4-positive T cells from matched groups of four women each in the Argentinean Andes, with fivefold differences in urinary arsenic concentrations (median concentrations of urinary arsenic in the lower- and high-arsenic groups: 65 and 276 µg/l, respectively). Arsenic exposure was associated with genome-wide alterations of gene expression; principal component analysis indicated that the exposure explained 53% of the variance in gene expression among the top variable genes, and 19% of 28,351 genes were differentially expressed (false discovery rate <0.05) between the exposure groups. Key genes regulating the immune system, such as tumor necrosis factor alpha and interferon gamma, as well as genes related to the NF-κB complex, were significantly downregulated in the high-arsenic group. Arsenic exposure was also associated with genome-wide changes in DNA methylation; the high-arsenic group had 3 percentage points higher genome-wide full methylation (>80% methylation) than the lower-arsenic group. Differentially methylated regions that were hypermethylated in the high-arsenic group showed enrichment for immune-related gene ontologies that constitute the basic functions of CD4-positive T cells, such as isotype switching and lymphocyte activation and differentiation. In conclusion, chronic arsenic exposure from drinking water was related to changes in the transcriptome and methylome of CD4-positive T cells, both genome-wide and in specific genes, supporting the hypothesis that arsenic causes immunotoxicity by interfering with gene expression and regulation.
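The abstract reports differential expression at a false discovery rate below 0.05 but does not name the multiple-testing correction. Assuming the widely used Benjamini-Hochberg procedure, the adjustment can be sketched in Python as follows. The p-values are invented for illustration.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices from smallest to largest p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping adjusted values monotone in rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


pvals = [0.001, 0.02, 0.8, 0.004]  # hypothetical per-gene p-values
qvals = benjamini_hochberg(pvals)
significant = [i for i, q in enumerate(qvals) if q < 0.05]
print(qvals, significant)
```

Genes are then called differentially expressed when their adjusted value falls below the chosen FDR (0.05 in the abstract).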
Subjects
Arsenic/toxicity, CD4-Positive T-Lymphocytes/drug effects, DNA Methylation/drug effects, Environmental Exposure/adverse effects, Gene Expression Regulation/drug effects, Adult, Argentina, CD4-Positive T-Lymphocytes/physiology, CpG Islands, Female, Gene Expression Profiling, High-Throughput Nucleotide Sequencing, Humans, Middle Aged, Promoter Regions, Genetic
ABSTRACT
BACKGROUND: Bisulfite treatment of DNA followed by sequencing (BS-seq) has become a standard technique in epigenetic studies, providing researchers with tools for generating single-base resolution maps of whole methylomes. Aligning bisulfite-treated reads, however, is a computationally difficult task: bisulfite treatment decreases the (lexical) complexity of low-methylated genomic regions, and C-to-T mismatches may reflect unmethylated cytosines rather than SNPs or sequencing errors. Further challenges arise both during and after the alignment phase: the data structures used by the aligner should be fast and should fit into main memory, and the methylation caller's output should be compressed because of its considerable size. METHODS: The data structures proposed in the literature for aligning bisulfite-treated reads can be roughly grouped into two main categories: those storing pointers at each text position (e.g. hash tables, suffix trees/arrays), and those using the information-theoretic minimum number of bits (e.g. FM indexes and compressed suffix arrays). The former are fast but memory-consuming; the latter are lightweight but much slower. In this paper, we try to close this gap by proposing a data structure for aligning bisulfite-treated reads that is at the same time fast, light, and very accurate. We reach this objective by combining a recent theoretical result on succinct hashing with a bisulfite-aware hash function. Furthermore, the new versions of the tools implementing our ideas (the aligner ERNE-BS5 2 and the caller ERNE-METH 2) have been extended with increased downstream compatibility (EPP/Bismark cov output formats), output compression, and support for target enrichment protocols.
RESULTS: Experimental results on public and simulated WGBS libraries show that our algorithmic solution is a competitive tradeoff between hash-based and BWT-based indexes, being as fast and accurate as the former and as memory-efficient as the latter. CONCLUSIONS: The new functionalities of our bisulfite aligner and caller make it a fast and memory-efficient tool, useful for analyzing big datasets with limited computational resources, for easily processing target enrichment data, and for producing statistics such as protocol efficiency and coverage as a function of the distance from target regions.
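A common way to make seed lookup insensitive to methylation is to apply the same in silico C-to-T conversion to the reference and to each read before hashing k-mers. This is shown here as an assumption, not as ERNE-BS5's actual succinct hash. The Python sketch uses an ordinary dictionary rather than a succinct structure. It also handles only the forward C-to-T strand, whereas real aligners also index the G-to-A converted strand. All names and sequences are hypothetical.

```python
from collections import defaultdict


def bisulfite_convert(seq: str) -> str:
    """In silico C-to-T conversion (forward strand only) so methylation state no longer affects matching."""
    return seq.upper().replace("C", "T")


def build_index(reference: str, k: int):
    """Map every k-mer of the converted reference to its start positions (plain dict, not succinct)."""
    converted = bisulfite_convert(reference)
    index = defaultdict(list)
    for pos in range(len(converted) - k + 1):
        index[converted[pos:pos + k]].append(pos)
    return index


def seed_hits(read: str, index, k: int):
    """Candidate reference offsets for a read, seeded by its first converted k-mer."""
    seed = bisulfite_convert(read)[:k]
    return list(index.get(seed, []))


# Hypothetical example: the read starts at reference offset 1. Two CpG cytosines were
# methylated (kept as C); the other cytosines were converted to T by bisulfite.
reference = "ACGTTCGACCGT"
read = "CGTTTGATCGT"
index = build_index(reference, k=5)
print(seed_hits(read, index, k=5))
```

The read's raw first 5-mer does not occur in the unconverted reference, because one cytosine was converted. After conversion of both sides, the seed lands at the correct offset. Candidate hits would still need full alignment to score C/T mismatches.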
Subjects
DNA Methylation, DNA/chemistry, Epigenomics, Sequence Analysis, DNA/methods, Software, Sulfites/chemistry, CpG Islands, Data Compression, Genome, Human, Genomics/methods, High-Throughput Nucleotide Sequencing, Humans
ABSTRACT
In pulmonary sarcoidosis, CD4(+) T-cells expressing T-cell receptor Vα2.3 accumulate in the lungs of HLA-DRB1*03(+) patients. To investigate T-cell receptor-HLA-DRB1*03 interactions underlying recognition of hitherto unknown antigens, we performed detailed analyses of T-cell receptor expression on bronchoalveolar lavage fluid CD4(+) T-cells from sarcoidosis patients. Pulmonary sarcoidosis patients (n=43) underwent bronchoscopy with bronchoalveolar lavage. T-cell receptor α and β chains of CD4(+) T-cells were analysed by flow cytometry and DNA-sequenced, and three-dimensional molecular models of T-cell receptor-HLA-DRB1*03 complexes were generated. Simultaneous expression of Vα2.3 with the Vβ22 chain was identified in the lungs of all HLA-DRB1*03(+) patients. Accumulated Vα2.3/Vβ22-expressing T-cells were highly clonal, with identical or near-identical Vα2.3 chain sequences and inter-patient similarities in Vβ22 chain amino acid distribution. Molecular modelling revealed specific T-cell receptor-HLA-DRB1*03-peptide interactions, with a previously identified, sarcoidosis-associated vimentin peptide, (Vim)429-443 DSLPLVDTHSKRTLL, perfectly matching both the HLA peptide-binding cleft and distinct T-cell receptor features. We demonstrate, for the first time, the accumulation of large clonal populations of specific Vα2.3/Vβ22 T-cell receptor-expressing CD4(+) T-cells in the lungs of HLA-DRB1*03(+) sarcoidosis patients. Several distinct contact points between Vα2.3/Vβ22 receptors and HLA-DRB1*03 molecules suggest presentation of prototypic vimentin-derived peptides.
Subjects
CD4-Positive T-Lymphocytes/immunology, HLA-DRB1 Chains/metabolism, Receptors, Antigen, T-Cell/immunology, Sarcoidosis, Pulmonary/immunology, Adult, Bronchoalveolar Lavage Fluid, Bronchoscopy, Female, Flow Cytometry, Humans, Lung/immunology, Male, Middle Aged, Models, Molecular, Sweden
ABSTRACT
Dictyostelium intermediate repeat sequence 1 (DIRS-1) is the founding member of a poorly characterized class of retrotransposable elements that contain inverted long terminal repeats and a tyrosine recombinase instead of a DDE-type integrase. In Dictyostelium discoideum, DIRS-1 forms clusters that adopt the function of centromeres, making tight retrotransposition control critical to maintaining chromosome integrity. We report that in deletion strains of the RNA-dependent RNA polymerase RrpC, full-length and shorter DIRS-1 messenger RNAs are strongly enriched. Shorter versions of a hitherto unknown long non-coding RNA in DIRS-1 antisense orientation are also enriched in rrpC- strains. Concurrent with the accumulation of long transcripts, the vast majority of small (21-mer) DIRS-1 RNAs vanish in rrpC- strains. RNA-seq reveals an asymmetric distribution of the DIRS-1 small RNAs, both along DIRS-1 and with respect to sense and antisense orientation. We show that RrpC is required for post-transcriptional DIRS-1 silencing and also for spreading of RNA silencing signals. Finally, DIRS-1 misregulation in the absence of RrpC leads to retrotransposon mobilization. In summary, our data reveal RrpC as a key player in the silencing of the centromeric retrotransposon DIRS-1. RrpC acts at the post-transcriptional level and is involved in spreading of RNA silencing signals in both the 5' and 3' directions.
Subjects
Dictyostelium/genetics, RNA Interference, RNA-Dependent RNA Polymerase/physiology, Retroelements, Cell Nucleus/genetics, Dictyostelium/enzymology, Genome, Promoter Regions, Genetic, RNA, Antisense/metabolism, RNA, Messenger/metabolism, RNA, Small Untranslated/metabolism, RNA-Dependent RNA Polymerase/genetics, Terminal Repeat Sequences
ABSTRACT
BACKGROUND: The majority of published gene-expression studies have used RNA isolated from whole cells, overlooking the potential impact of including the nuclear transcriptome in the analyses. In this study, mRNA fractions from the cytoplasm and from whole cells (total RNA) were prepared from three human cell lines and sequenced using massively parallel sequencing. RESULTS: For all three cell lines, of about 15,000 detected genes, approximately 400 to 1,400 were detected at different levels in the cytoplasmic and total RNA fractions. Transcripts detected at higher levels in the total RNA fraction had longer coding sequences and a higher number of miRNA target sites. Transcripts detected at higher levels in the cytoplasmic fraction were shorter or contained shorter untranslated regions. Nuclear retention of transcripts and mRNA degradation via the miRNA pathway might contribute to this differential detection of genes. The consequence of the differential detection was further investigated by comparison with proteomics data. Interestingly, the expression profiles of cytoplasmic and total RNA correlated equally well with protein abundance levels, indicating regulation at a higher level. CONCLUSIONS: We conclude that expression levels derived from the total RNA fraction can be regarded as an appropriate estimate of the amount of mRNAs present in a given cell population, independent of the coding sequence length or UTRs.
Subjects
Cell Nucleus/genetics, Cytoplasm/genetics, MicroRNAs/genetics, RNA, Messenger/genetics, Cell Line, Tumor, Gene Expression Regulation, Humans, Sequence Analysis, RNA
ABSTRACT
MOTIVATION: Next-generation sequencing technologies producing increasingly complex datasets demand new, efficient and specialized sequence analysis algorithms. Often, only the 'novel' sequences in a complex dataset are of interest, and the superfluous sequences need to be removed. RESULTS: A novel algorithm, fast and accurate classification of sequences (FACS), is introduced that can accurately and rapidly classify sequences as belonging or not belonging to a reference sequence. FACS was first optimized and validated using a synthetic metagenome dataset. An experimental metagenome dataset was then used to show that FACS achieves accuracy comparable to BLAT and SSAHA2 but is at least 21 times faster in classifying sequences. AVAILABILITY: Source code for FACS, Bloom filters and the MetaSim dataset used is available at http://facs.biotech.kth.se. The Bloom::Faster 1.6 Perl module can be downloaded from CPAN at http://search.cpan.org/~palvaro/Bloom-Faster-1.6/. CONTACTS: henrik.stranneheim@biotech.kth.se; joakiml@biotech.kth.se. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
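FACS classifies reads using Bloom filters built from a reference. The minimal Python sketch below illustrates the general idea under assumptions of its own. Reference k-mers are inserted into a Bloom filter, and a read is called as belonging to the reference when at least a hypothetical fraction `min_fraction` of its k-mers test positive. The salted-SHA-256 hashing, filter size, k and threshold are illustrative choices, not FACS's published parameters or scoring scheme.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: a bit array probed at num_hashes positions per item."""

    def __init__(self, num_bits: int, num_hashes: int):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str):
        # Derive several positions by salting SHA-256 (illustrative; simple rather than fast).
        for salt in range(self.num_hashes):
            digest = hashlib.sha256(f"{salt}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "little") % self.num_bits

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item: str) -> bool:
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))


def kmers(seq: str, k: int):
    return (seq[i:i + k] for i in range(len(seq) - k + 1))


def build_filter(reference: str, k: int, num_bits: int = 1 << 16, num_hashes: int = 4) -> BloomFilter:
    bf = BloomFilter(num_bits, num_hashes)
    for kmer in kmers(reference, k):
        bf.add(kmer)
    return bf


def classify(read: str, bf: BloomFilter, k: int, min_fraction: float = 0.8) -> bool:
    """True if at least min_fraction (hypothetical threshold) of the read's k-mers are probably in the reference."""
    total = len(read) - k + 1
    if total <= 0:
        return False
    hits = sum(kmer in bf for kmer in kmers(read, k))
    return hits / total >= min_fraction


# Hypothetical reference and reads.
reference = "ATGCGTACGTTAGCCGATCGATGCTAGCTAGGCTAACGT"
bf = build_filter(reference, k=11)
print(classify(reference[5:30], bf, k=11), classify("T" * 25, bf, k=11))
```

A Bloom filter never gives false negatives, so reads truly drawn from the reference always have every k-mer reported present. False positives are possible but become rare when the bit array is large relative to the number of inserted k-mers.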
Subjects
Algorithms, Metagenome, Sequence Analysis, DNA, Genome, Mitochondrial, Humans, Respiratory System/microbiology, Sensitivity and Specificity
ABSTRACT
Whole-genome sequencing (WGS) is a fundamental technology for research to advance precision medicine, but the limited availability of portable and user-friendly workflows for WGS analyses poses a major challenge for many research groups and hampers scientific progress. Here we present Sarek, an open-source workflow to detect germline variants and somatic mutations based on sequencing data from WGS, whole-exome sequencing (WES), or gene panels. Sarek features (i) easy installation, (ii) robust portability across different computer environments, (iii) comprehensive documentation, (iv) transparent and easy-to-read code, and (v) extensive quality metrics reporting. Sarek is implemented in the Nextflow workflow language and supports both Docker and Singularity containers as well as Conda environments, making it ideal for easy deployment on any POSIX-compatible computer and in cloud compute environments. Sarek follows the GATK best-practice recommendations for read alignment and pre-processing, and includes a wide range of software for the identification and annotation of germline and somatic single-nucleotide variants, insertion and deletion variants, structural variants, tumour sample purity, and variations in ploidy and copy number. Sarek offers easy, efficient, and reproducible WGS analyses, and can readily be used both as a production workflow at sequencing facilities and as a powerful stand-alone tool for individual research groups. The Sarek source code, documentation and installation instructions are freely available at https://github.com/nf-core/sarek and at https://nf-co.re/sarek/.
Subjects
Germ Cells, Software, Whole Genome Sequencing/methods, Workflow, Humans
ABSTRACT
We report on the incorporation of the Visual DNA concept into a genotyping assay as a simple and straightforward detection tool. The principle of trapping micrometer-sized, streptavidin-coated superparamagnetic beads to visualize genetic variants is used for PrASE-based detection of a panel of mutations in cystic fibrosis, a severe and common genetic disorder. The method allows genotypes to be read out by the naked eye, and the output is easily documented using a regular hand-held device with an integrated digital camera. A number of samples were run through the assay, showing rapid and accurate detection using superparamagnetic beads and an off-the-shelf neodymium magnet. The assay emphasizes the power of Visual DNA and demonstrates the potential value of the method in future point-of-care tests.
Subjects
DNA/analysis, Molecular Diagnostic Techniques/methods, Oligonucleotide Array Sequence Analysis/methods, Polymorphism, Genetic/genetics, Sequence Analysis, DNA/methods, Alleles, Cystic Fibrosis Transmembrane Conductance Regulator/genetics, Magnetics, Microspheres, Peptide Hydrolases, Polymerase Chain Reaction
ABSTRACT
The future of human genomics is one that seeks to resolve the entirety of genetic variation through sequencing. The prospect of utilizing genomics for medical purposes requires cost-efficient and accurate base calling, long-range haplotyping capability, and reliable calling of structural variants. Short-read sequencing has led the development towards such a future but has struggled to meet the latter two of these needs. To address this limitation, we developed a technology that preserves the molecular origin of short sequencing reads with a negligible increase in sequencing costs. We demonstrate a novel library preparation method for high-throughput barcoding of short reads, in which millions of random barcodes can be used to reconstruct megabase-scale phase blocks.
Subjects
Genomics/methods, High-Throughput Nucleotide Sequencing/methods, DNA Barcoding, Taxonomic, Data Visualization, Gene Library, Genome, Human, Haplotypes, Humans
ABSTRACT
Here, we present the genome of the industrial ethanol production strain Brettanomyces bruxellensis CBS 11270. The nuclear genome was found to be diploid, containing four chromosomes ranging in size from 2.2 to 4.0 Mbp. A 75 kbp mitochondrial genome was also identified. Comparing the homologous chromosomes, we found that 0.32% of nucleotides were polymorphic, i.e. formed single nucleotide polymorphisms (SNPs); 40.6% of these SNPs were found in coding regions (i.e. 0.13% of all nucleotides formed SNPs and were in coding regions). In addition, 8,538 indels were found. The total number of protein-coding genes was 4,897; of these, 4,284 were annotated on chromosomes, and the mitochondrial genome contained 18 protein-coding genes. Additionally, 595 annotated genes were located on contigs not associated with chromosomes. A number of genes were duplicated, most of them as tandem repeats, including a six-gene cluster located on chromosome 3. There were also examples of interchromosomal gene duplications, including a duplication of a six-gene cluster, which was found on both chromosomes 1 and 4. Gene copy number analysis suggested loss of heterozygosity for 372 genes, which may reflect adaptation to the relatively harsh but constant conditions of continuous fermentation. Analysis of gene topology showed that most of these losses occurred in clusters of more than one gene, the largest cluster comprising 33 genes. Comparative analysis against the wine isolate CBS 2499 revealed 88,534 SNPs and 8,133 indels. Moreover, when the scaffolds of the CBS 2499 genome assembly were aligned against the chromosomes of CBS 11270, many of them aligned completely, some had segments aligned to different chromosomes, and some were in fact rearranged. Our findings indicate a highly dynamic genome within the species B. bruxellensis and a tendency towards reduction of gene number in long-term continuous cultivation.
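Two of the figures above can be cross-checked with simple arithmetic. The share of all nucleotides that are coding SNPs follows from multiplying the two reported percentages. The chromosomal, mitochondrial and unplaced-contig gene counts appear to add up to the reported total of protein-coding genes. This reading of the total as a sum is an interpretation, not stated in the abstract. A short Python check using only numbers from the abstract:

```python
snp_fraction = 0.0032   # 0.32% of nucleotides are SNPs
coding_share = 0.406    # 40.6% of those SNPs lie in coding regions
coding_snp_fraction = snp_fraction * coding_share
print(f"coding SNPs: {coding_snp_fraction:.2%} of all nucleotides")

# Interpretation (assumption): the reported total is the sum of these three groups.
gene_counts = {"chromosomes": 4284, "mitochondrial genome": 18, "unplaced contigs": 595}
print("protein-coding genes:", sum(gene_counts.values()))
```

Both results agree with the abstract: 0.13% of nucleotides, and 4,897 protein-coding genes.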
Subjects
Brettanomyces/metabolism, Chromosomes, Fungal/genetics, Ethanol/metabolism, Mitochondria/genetics, Brettanomyces/genetics, Contig Mapping, Evolution, Molecular, Gene Dosage, Genetic Variation, Genome Size, Molecular Sequence Annotation, Phylogeny, Whole Genome Sequencing/methods
ABSTRACT
Urban sewer systems consist of wastewater and stormwater sewers, of which only wastewater is treated before being discharged. Occasionally, misconnections or damage in the network occur, resulting in untreated wastewater entering natural water bodies via the stormwater system. Cultivation of faecal indicator bacteria (e.g. Escherichia coli; E. coli) is the current standard for tracing wastewater contamination. This method is cheap but has limited specificity and mobility. Here, we compared the E. coli culturing approach with two sequencing-based methodologies (Illumina MiSeq 16S rRNA gene amplicon sequencing and Oxford Nanopore MinION shotgun metagenomic sequencing), analysing 73 stormwater samples collected in Stockholm. High correlations were obtained between E. coli culture counts and frequencies of human gut microbiome amplicon sequences, indicating that E. coli is indeed a good indicator of faecal contamination. However, the amplicon data additionally hold information on the contamination source or, alternatively, on how much time has elapsed since the faecal matter entered the system. Shotgun metagenomic sequencing of a subset of the samples using a portable real-time sequencer, the MinION, correlated well with the amplicon sequencing data. This study demonstrates the use of DNA sequencing to detect human faecal contamination in stormwater systems and the potential for tracing faecal contamination directly in the field.
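The abstract reports high correlations between culture counts and human gut amplicon frequencies but does not name the coefficient. One plausible choice for skewed count data, assumed here, is Spearman's rank correlation. It is sketched in Python below with invented sample values, not data from the study.

```python
import math


def rank(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for idx in order[i:j + 1]:
            ranks[idx] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (sx * sy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed data."""
    return pearson(rank(x), rank(y))


ecoli_cfu = [10, 250, 40, 3000, 0, 800]               # hypothetical culture counts
gut_fraction = [0.001, 0.02, 0.004, 0.15, 0.0, 0.05]  # hypothetical human-gut read fractions
print(round(spearman(ecoli_cfu, gut_fraction), 3))
```

Ranking first makes the coefficient depend only on ordering, so a few heavily contaminated samples cannot dominate it the way they can with Pearson's r on raw counts.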
Subjects
Bacteria/isolation & purification, Feces/microbiology, Sequence Analysis, DNA/methods, Sewage/microbiology, Waste Water/microbiology, Water Microbiology, Bacteria/classification, Bacteria/genetics, Environmental Monitoring/methods, Escherichia coli/genetics, Escherichia coli/isolation & purification, Humans, RNA, Ribosomal, 16S/genetics, Water Pollution/prevention & control, Water Quality/standards
ABSTRACT
Here, we present a novel method for SNP genotyping based on protease-mediated allele-specific primer extension (PrASE), in which the two allele-specific extension primers differ only at their 3' ends. As reported previously [Ahmadian,A., Gharizadeh,B., O'Meara,D., Odeberg,J. and Lundeberg,J. (2001), Nucleic Acids Res., 29, e121], perfectly matched primers are extended faster than mismatched primers. In this study, we have utilized this difference in kinetics by adding a protease, a protein-degrading enzyme, to discriminate between the extension reactions. The competition between polymerase activity and enzymatic degradation yields extension of the perfectly matched primer, while the slower extension of the mismatched primer is eliminated. To allow multiplex and simultaneous detection of the investigated single nucleotide polymorphisms (SNPs), each extension primer was given a unique signature tag sequence at its 5' end, complementary to a tag on a generic array. A multiplex nested PCR with 13 SNPs was performed on a total of 36 individuals, and their alleles were scored. To demonstrate the improvement in SNP scoring achieved by PrASE, we also genotyped the individuals without including protease in the extension. We conclude that the developed assay is highly allele-specific, with excellent multiplex SNP capabilities.
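The competition between primer extension and protease digestion can be illustrated with a deliberately simplified kinetic model. The model is an assumption for illustration and is not taken from the study. Extension is taken to be first-order in the amount of active polymerase, and the protease inactivates polymerase exponentially at rate `k_deg`. Integrating dU/dt = -k_ext·e^(-k_deg·t)·U over all time leaves an unextended fraction U = exp(-k_ext/k_deg). Without protease, the extended fraction after time t is 1 - exp(-k_ext·t). All rate constants below are hypothetical.

```python
import math


def extended_with_protease(k_ext: float, k_deg: float) -> float:
    """Fraction extended as t -> infinity when active polymerase decays as exp(-k_deg * t) (toy model)."""
    return 1.0 - math.exp(-k_ext / k_deg)


def extended_without_protease(k_ext: float, t: float) -> float:
    """Fraction extended after time t with a constant amount of active polymerase (toy model)."""
    return 1.0 - math.exp(-k_ext * t)


k_match, k_mismatch = 1.0, 0.01  # hypothetical extension rate constants (per min)
k_deg, t = 0.1, 60.0             # hypothetical protease inactivation rate (per min) and reaction time (min)

for label, k in [("matched", k_match), ("mismatched", k_mismatch)]:
    print(label,
          round(extended_with_protease(k, k_deg), 3),
          round(extended_without_protease(k, t), 3))
```

In this toy model the mismatched primer is strongly suppressed rather than fully eliminated. The protease helps because it caps the effective extension time: slow reactions lose most of their window, while fast reactions still go to near completion.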