ABSTRACT
Complex chromosomal rearrangements (CCRs) are rearrangements involving more than two chromosomes or more than two breakpoints. Whole genome sequencing (WGS) allows nucleotide-level characterization of such rearrangements in unique sequences, but problems remain for mapping breakpoints in repetitive regions of the genome, which are known to be prone to rearrangements. Hence, multiple complementary WGS experiments are sometimes needed to solve the structures of CCRs. We have studied three individuals with CCRs: Case 1 and Case 2 presented with de novo karyotypically balanced, complex interchromosomal rearrangements (46,XX,t(2;8;15)(q35;q24.1;q22) and 46,XY,t(1;10;5)(q32;p12;q31)), and Case 3 presented with a de novo, extremely complex intrachromosomal rearrangement on chromosome 1. Molecular cytogenetic investigation revealed cryptic deletions in the breakpoints of chromosomes 2 and 8 in Case 1, and on chromosome 10 in Case 2, explaining their clinical symptoms. In Case 3, 26 breakpoints were identified using WGS, disrupting five known disease genes. All rearrangements were subsequently analyzed using optical maps, linked-read WGS, and short-read WGS. In conclusion, we present a case series of three unique de novo CCRs in which combining the results from the different technologies allowed us to fully solve the structure of each rearrangement. The power of combining short-read WGS with long-molecule sequencing or optical mapping in these unique de novo CCRs in a clinical setting is demonstrated.
Subject(s)
Chromosomes/genetics , Gene Rearrangement/genetics , Genomic Structural Variation/genetics , Chromosome Mapping/methods , Female , Humans , Male , Whole Genome Sequencing/methods
ABSTRACT
OBJECTIVE: This study sought to determine whether 18F-fluorodeoxyglucose-positron emission tomography/computed tomography could be applied to a murine model of advanced atherosclerotic plaque vulnerability to detect response to therapeutic intervention and changes in lesion stability. Approach and Results: To analyze plaques susceptible to rupture, we fed ApoE-/- mice a high-fat diet and induced vulnerable lesions by cast placement over the carotid artery. After 9 weeks of treatment with orthogonal therapeutic agents (including lipid-lowering and proefferocytic therapies), we assessed vascular inflammation and several features of plaque vulnerability by 18F-fluorodeoxyglucose-positron emission tomography/computed tomography and histopathology, respectively. We observed that 18F-fluorodeoxyglucose-positron emission tomography/computed tomography had the capacity to resolve histopathologically proven changes in plaque stability after treatment. Moreover, mean target-to-background ratios correlated with multiple characteristics of lesion instability, including the corrected vulnerability index. CONCLUSIONS: These results suggest that the application of noninvasive 18F-fluorodeoxyglucose-positron emission tomography/computed tomography to a murine model can allow for the identification of vulnerable atherosclerotic plaques and their response to therapeutic intervention. This approach may prove useful as a drug discovery and prioritization method.
Subject(s)
Carotid Artery Diseases/diagnostic imaging , Common Carotid Artery/diagnostic imaging , Fluorodeoxyglucose F18/administration & dosage , Atherosclerotic Plaque , Positron Emission Tomography Computed Tomography , Radiopharmaceuticals/administration & dosage , Animals , Blocking Antibodies/pharmacology , Atorvastatin/pharmacology , CD47 Antigen/antagonists & inhibitors , Carotid Artery Diseases/drug therapy , Carotid Artery Diseases/pathology , Common Carotid Artery/drug effects , Common Carotid Artery/pathology , Animal Disease Models , Hydroxymethylglutaryl-CoA Reductase Inhibitors/pharmacology , Male , Inbred C57BL Mice , ApoE Knockout Mice , Predictive Value of Tests , Spontaneous Rupture
ABSTRACT
Clustered copy number variants (CNVs) as detected by chromosomal microarray analysis (CMA) are often reported as germline chromothripsis. However, such cases might need further investigation by massively parallel whole genome sequencing (WGS) in order to accurately define the underlying complex rearrangement, predict the occurrence mechanisms and identify additional complexities. Here, we utilized WGS to delineate the rearrangement structure of 21 clustered CNV carriers first investigated by CMA and identified a total of 83 breakpoint junctions (BPJs). The rearrangements were further sub-classified depending on the patterns observed: I) cases with only deletions (n = 8) often had additional structural rearrangements, such as insertions and inversions typical of chromothripsis; II) cases with only duplications (n = 7) or III) combinations of deletions and duplications (n = 6) demonstrated mostly interspersed duplications and BPJs enriched with microhomology. In two cases the rearrangement mutational signatures indicated both a breakage-fusion-bridge cycle process and halted formation of a ring chromosome. Finally, we observed two cases with Alu- and LINE-mediated rearrangements as well as two unrelated individuals with seemingly identical clustered CNVs on 2p25.3, possibly a rare European founder rearrangement. In conclusion, through detailed characterization of the derivative chromosomes we show that multiple mechanisms are likely involved in the formation of clustered CNVs and add further evidence for chromoanagenesis mechanisms in both "simple" and highly complex chromosomal rearrangements. Finally, WGS characterization adds positional information, important for a correct clinical interpretation and for deciphering the mechanisms involved in the formation of these rearrangements.
Subject(s)
DNA Copy Number Variations , DNA Replication/genetics , Alu Elements , Chromosome Breakpoints , Chromothripsis , Gene Rearrangement , Human Genome , Humans , Long Interspersed Nucleotide Elements , Oligonucleotide Array Sequence Analysis , Whole Genome Sequencing
ABSTRACT
Conifers have dominated forests for more than 200 million years and are of huge ecological and economic importance. Here we present the draft assembly of the 20-gigabase genome of Norway spruce (Picea abies), the first available for any gymnosperm. The number of well-supported genes (28,354) is similar to the >100 times smaller genome of Arabidopsis thaliana, and there is no evidence of a recent whole-genome duplication in the gymnosperm lineage. Instead, the large genome size seems to result from the slow and steady accumulation of a diverse set of long-terminal repeat transposable elements, possibly owing to the lack of an efficient elimination mechanism. Comparative sequencing of Pinus sylvestris, Abies sibirica, Juniperus communis, Taxus baccata and Gnetum gnemon reveals that the transposable element diversity is shared among extant conifers. Expression of 24-nucleotide small RNAs, previously implicated in transposable element silencing, is tissue-specific and much lower than in other plants. We further identify numerous long (>10,000 base pairs) introns, gene-like fragments, uncharacterized long non-coding RNAs and short RNAs. This opens up new genomic avenues for conifer forestry and breeding.
Subject(s)
Molecular Evolution , Plant Genome/genetics , Picea/genetics , Conserved Sequence/genetics , DNA Transposable Elements/genetics , Gene Silencing , Plant Genes/genetics , Genomics , Internet , Introns/genetics , Phenotype , Untranslated RNA/genetics , DNA Sequence Analysis , Terminal Repeat Sequences/genetics , Genetic Transcription/genetics
ABSTRACT
Data produced with short-read sequencing technologies result in ambiguous haplotyping and a limited capacity to investigate the full repertoire of biologically relevant forms of genetic variation. The notion of haplotype-resolved sequencing data has recently gained traction as a way to reduce this unwanted ambiguity and to enable exploration of forms of genetic variation beyond nucleotide polymorphisms alone, such as compound heterozygosity and structural variations. Here we describe Droplet Barcode Sequencing, a novel approach for creating linked-read sequencing libraries by uniquely barcoding the information within single DNA molecules in emulsion droplets, without the aid of specialty reagents or microfluidic devices. Barcode generation and template amplification are performed simultaneously in a single enzymatic reaction, greatly simplifying the workflow and minimizing assay costs compared to alternative approaches. The method has been applied to phase multiple loci targeting all exons of the highly variable Human Leukocyte Antigen A (HLA-A) gene, with DNA from eight individuals present in the same assay. Barcode-based clustering of sequencing reads confirmed analysis of over 2000 independently assayed template molecules, with an average of 753 reads in support of called polymorphisms. Our results show unequivocal characterization of all alleles present, validated by correspondence against confirmed HLA database entries and haplotyping results from previous studies.
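The barcode-based clustering step mentioned in this abstract can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that each read carries a fixed-length barcode at its start; the function name, the barcode length and the read layout are hypothetical and are not taken from the Droplet Barcode Sequencing protocol.

```python
from collections import defaultdict


def cluster_reads_by_barcode(reads, barcode_length=20):
    """Group reads by their leading barcode.

    Assumption (hypothetical layout): the first `barcode_length` bases of
    each read are the droplet barcode and the rest is template sequence.
    Returns a dict mapping barcode -> list of template sequences.
    """
    clusters = defaultdict(list)
    for read in reads:
        if len(read) <= barcode_length:
            continue  # too short to hold a barcode plus any template sequence
        barcode = read[:barcode_length]
        template = read[barcode_length:]
        clusters[barcode].append(template)
    return dict(clusters)


# Toy example with a 4-base barcode; the last read is too short and is skipped.
example = cluster_reads_by_barcode(["AAAACGT", "AAAAGGT", "CCCCTTT", "AC"], barcode_length=4)
print(example)
```

Reads sharing a barcode are then treated as copies of one original molecule, which is what lets variants on the same molecule be phased together.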
Subject(s)
Taxonomic DNA Barcoding/methods , Haplotypes , Alleles , Gene Library , HLA-A Antigens/genetics , High-Throughput Nucleotide Sequencing , Humans , Polymerase Chain Reaction , DNA Sequence Analysis
ABSTRACT
To improve the epigenomic analysis of tissues rich in 5-hydroxymethylcytosine (hmC), we developed a novel protocol called TAB-Methyl-SEQ, which allows for single base resolution profiling of both hmC and 5-methylcytosine by targeted next-generation sequencing. TAB-Methyl-SEQ data were extensively validated by a set of five methodologically different protocols. Importantly, these extensive cross-comparisons revealed that protocols based on Tet1-assisted bisulfite conversion provided more precise hmC values than TrueMethyl-based methods. A total of 109 454 CpG sites were analyzed by TAB-Methyl-SEQ for mC and hmC in 188 genes from 20 different adult human livers. We describe three types of variability of hepatic hmC profiles: (i) sample-specific variability at 40.8% of CpG sites analyzed, where the local hmC values correlate to the global hmC content of livers (measured by LC-MS), (ii) gene-specific variability, where hmC levels in the coding regions positively correlate to expression of the respective gene and (iii) site-specific variability, where prominent hmC peaks span only 1 to 3 neighboring CpG sites. Our data suggest that both the gene- and site-specific components of hmC variability might contribute to the epigenetic control of hepatic genes. The protocol described here should be useful for targeted DNA analysis in a variety of applications.
Subject(s)
5-Methylcytosine/analogs & derivatives , Base Pairing , Gene Expression Regulation , Genes , Liver/metabolism , 5-Methylcytosine/metabolism , Adult , Base Sequence , Liquid Chromatography , CpG Islands/genetics , DNA/metabolism , Humans , Mass Spectrometry , Reproducibility of Results , DNA Sequence Analysis , Sulfites/metabolism
ABSTRACT
Most balanced translocations are thought to result mechanistically from nonhomologous end joining or, in rare cases of recurrent events, from nonallelic homologous recombination. Here, we use low-coverage mate pair whole-genome sequencing to fine map rearrangement breakpoint junctions in both phenotypically normal and affected translocation carriers. In total, 46 junctions from 22 carriers of balanced translocations were characterized. Genes were disrupted in 48% of the breakpoints; recessive genes in four normal carriers and known dominant intellectual disability genes in three affected carriers. Finally, seven candidate disease genes were disrupted in five carriers with neurocognitive disabilities (SVOPL, SUSD1, TOX, NCALD, SLC4A10) and one XX-male carrier with Tourette syndrome (LYPD6, GPC5). Breakpoint junction analyses revealed microhomology and small templated insertions in a substantive fraction of the analyzed translocations (17.4%; n = 4), an observation that was substantiated by reanalysis of 37 previously published translocation junctions. Microhomology associated with templated insertions is a characteristic seen in the breakpoint junctions of rearrangements mediated by error-prone replication-based repair mechanisms. Our data imply that a mechanism involving template switching might contribute to the formation of at least 15% of the interchromosomal translocation events.
Subject(s)
Chromosome Mapping , Genetic Translocation , Whole Genome Sequencing , Base Sequence , Chromosome Breakage , Comparative Genomic Hybridization , DNA Copy Number Variations , Female , Genetic Association Studies , Genomics/methods , Genotype , Homologous Recombination , Humans , Fluorescence In Situ Hybridization , Karyotype , Male , Phenotype
ABSTRACT
MOTIVATION: Fast and accurate quality control is essential for studies involving next-generation sequencing data. Whilst numerous tools exist to quantify QC metrics, there is no common approach to flexibly integrate these across tools and large sample sets. Assessing analysis results across an entire project can be time consuming and error prone; batch effects and outlier samples can easily be missed in the early stages of analysis. RESULTS: We present MultiQC, a tool to create a single report visualising output from multiple tools across many samples, enabling global trends and biases to be quickly identified. MultiQC can plot data from many common bioinformatics tools and is built to allow easy extension and customization. AVAILABILITY AND IMPLEMENTATION: MultiQC is available with a GNU GPLv3 license on GitHub, the Python Package Index and Bioconda. Documentation and example reports are available at http://multiqc.info. CONTACT: phil.ewels@scilifelab.se.
Subject(s)
High-Throughput Nucleotide Sequencing , Quality Control , Computational Biology , Software
ABSTRACT
Arsenic, a carcinogen with immunotoxic effects, is a common contaminant of drinking water and certain foods worldwide. We hypothesized that chronic arsenic exposure alters gene expression, potentially by altering DNA methylation of genes encoding central components of the immune system. We therefore analyzed the transcriptomes (by RNA sequencing) and methylomes (by target-enrichment next-generation sequencing) of primary CD4-positive T cells from matched groups of four women each in the Argentinean Andes, with fivefold differences in urinary arsenic concentrations (median concentrations of urinary arsenic in the lower- and high-arsenic groups: 65 and 276 µg/l, respectively). Arsenic exposure was associated with genome-wide alterations of gene expression; principal component analysis indicated that the exposure explained 53% of the variance in gene expression among the top variable genes and 19% of 28,351 genes were differentially expressed (false discovery rate <0.05) between the exposure groups. Key genes regulating the immune system, such as tumor necrosis factor alpha and interferon gamma, as well as genes related to the NF-kappa-B complex, were significantly downregulated in the high-arsenic group. Arsenic exposure was associated with genome-wide DNA methylation; the high-arsenic group had 3 percentage points higher genome-wide full methylation (>80% methylation) than the lower-arsenic group. Differentially methylated regions that were hyper-methylated in the high-arsenic group showed enrichment for immune-related gene ontologies that constitute the basic functions of CD4-positive T cells, such as isotype switching and lymphocyte activation and differentiation. In conclusion, chronic arsenic exposure from drinking water was related to changes in the transcriptome and methylome of CD4-positive T cells, both genome wide and in specific genes, supporting the hypothesis that arsenic causes immunotoxicity by interfering with gene expression and regulation.
Subject(s)
Arsenic/toxicity , CD4-Positive T-Lymphocytes/drug effects , DNA Methylation/drug effects , Environmental Exposure/adverse effects , Gene Expression Regulation/drug effects , Adult , Argentina , CD4-Positive T-Lymphocytes/physiology , CpG Islands , Female , Gene Expression Profiling , High-Throughput Nucleotide Sequencing , Humans , Middle Aged , Genetic Promoter Regions
ABSTRACT
BACKGROUND: Bisulfite treatment of DNA followed by sequencing (BS-seq) has become a standard technique in epigenetic studies, providing researchers with tools for generating single-base resolution maps of whole methylomes. Aligning bisulfite-treated reads, however, is a computationally difficult task: bisulfite treatment decreases the (lexical) complexity of low-methylated genomic regions, and C-to-T mismatches may reflect cytosine unmethylation rather than SNPs or sequencing errors. Further challenges arise both during and after the alignment phase: data structures used by the aligner should be fast and should fit into main memory, and the methylation-caller output should be compressed, due to its significant size. METHODS: As far as data structures employed to align bisulfite-treated reads are concerned, solutions proposed in the literature can be roughly grouped into two main categories: those storing pointers at each text position (e.g. hash tables, suffix trees/arrays), and those using the information-theoretic minimum number of bits (e.g. FM indexes and compressed suffix arrays). The former are fast but memory-consuming. The latter are much slower but lightweight. In this paper, we try to close this gap by proposing a data structure for aligning bisulfite-treated reads which is at the same time fast, light, and very accurate. We reach this objective by combining a recent theoretical result on succinct hashing with a bisulfite-aware hash function. Furthermore, the new versions of the tools implementing our ideas (the aligner ERNE-BS5 2 and the caller ERNE-METH 2) have been extended with increased downstream compatibility (EPP/Bismark cov output formats), output compression, and support for target enrichment protocols.
RESULTS: Experimental results on public and simulated WGBS libraries show that our algorithmic solution is a competitive tradeoff between hash-based and BWT-based indexes, being as fast and accurate as the former, and as memory-efficient as the latter. CONCLUSIONS: The new functionalities of our bisulfite aligner and caller make it a fast and memory efficient tool, useful to analyze big datasets with little computational resources, to easily process target enrichment data, and produce statistics such as protocol efficiency and coverage as a function of the distance from target regions.
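The idea of a bisulfite-aware hash function can be illustrated with a minimal Python sketch. This is not the ERNE-BS5 2 implementation: the function names, the polynomial hash and the table size are assumptions chosen for illustration, and only forward-strand C-to-T conversion is modelled (the reverse-strand G-to-A case is left out).

```python
def bisulfite_collapse(seq):
    """Replace every C with T, mimicking full bisulfite conversion of the forward strand."""
    return seq.upper().replace("C", "T")


def bisulfite_aware_hash(kmer, table_size=1_000_003):
    """Hash a k-mer over the reduced alphabet {A, G, T}.

    Because C is collapsed onto T before hashing, a genomic k-mer and its
    bisulfite-converted read (in which unmethylated C appears as T) receive
    the same hash value. Only the bases A, C, G and T are supported here.
    `table_size` is an arbitrary illustrative prime.
    """
    value = 0
    for base in bisulfite_collapse(kmer):
        value = (value * 3 + "AGT".index(base)) % table_size
    return value


# A reference k-mer and a read in which the C was converted to T collide by design.
print(bisulfite_aware_hash("ACGT") == bisulfite_aware_hash("ATGT"))
```

Genuine mismatches at other positions still change the hash, so lookup remains selective while tolerating the C/T ambiguity that bisulfite treatment introduces.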
Subject(s)
DNA Methylation , DNA/chemistry , Epigenomics , DNA Sequence Analysis/methods , Software , Sulfites/chemistry , CpG Islands , Data Compression , Human Genome , Genomics/methods , High-Throughput Nucleotide Sequencing , Humans
ABSTRACT
In pulmonary sarcoidosis, CD4(+) T-cells expressing T-cell receptor Vα2.3 accumulate in the lungs of HLA-DRB1*03(+) patients. To investigate T-cell receptor-HLA-DRB1*03 interactions underlying recognition of hitherto unknown antigens, we performed detailed analyses of T-cell receptor expression on bronchoalveolar lavage fluid CD4(+) T-cells from sarcoidosis patients. Pulmonary sarcoidosis patients (n=43) underwent bronchoscopy with bronchoalveolar lavage. T-cell receptor α and β chains of CD4(+) T-cells were analysed by flow cytometry, DNA-sequenced, and three-dimensional molecular models of T-cell receptor-HLA-DRB1*03 complexes generated. Simultaneous expression of Vα2.3 with the Vβ22 chain was identified in the lungs of all HLA-DRB1*03(+) patients. Accumulated Vα2.3/Vβ22-expressing T-cells were highly clonal, with identical or near-identical Vα2.3 chain sequences and inter-patient similarities in Vβ22 chain amino acid distribution. Molecular modelling revealed specific T-cell receptor-HLA-DRB1*03-peptide interactions, with a previously identified, sarcoidosis-associated vimentin peptide, (Vim)429-443 DSLPLVDTHSKRTLL, matching both the HLA peptide-binding cleft and distinct T-cell receptor features perfectly. We demonstrate, for the first time, the accumulation of large clonal populations of specific Vα2.3/Vβ22 T-cell receptor-expressing CD4(+) T-cells in the lungs of HLA-DRB1*03(+) sarcoidosis patients. Several distinct contact points between Vα2.3/Vβ22 receptors and HLA-DRB1*03 molecules suggest presentation of prototypic vimentin-derived peptides.
Subject(s)
CD4-Positive T-Lymphocytes/immunology , HLA-DRB1 Chains/metabolism , T-Cell Antigen Receptors/immunology , Pulmonary Sarcoidosis/immunology , Adult , Bronchoalveolar Lavage Fluid , Bronchoscopy , Female , Flow Cytometry , Humans , Lung/immunology , Male , Middle Aged , Molecular Models , Sweden
ABSTRACT
Dictyostelium intermediate repeat sequence 1 (DIRS-1) is the founding member of a poorly characterized class of retrotransposable elements that contain inverse long terminal repeats and tyrosine recombinase instead of DDE-type integrase enzymes. In Dictyostelium discoideum, DIRS-1 forms clusters that adopt the function of centromeres, rendering tight retrotransposition control critical to maintaining chromosome integrity. We report that in deletion strains of the RNA-dependent RNA polymerase RrpC, full-length and shorter DIRS-1 messenger RNAs are strongly enriched. Shorter versions of a hitherto unknown long non-coding RNA in DIRS-1 antisense orientation are also enriched in rrpC- strains. Concurrent with the accumulation of long transcripts, the vast majority of small (21 mer) DIRS-1 RNAs vanish in rrpC- strains. RNASeq reveals an asymmetric distribution of the DIRS-1 small RNAs, both along DIRS-1 and with respect to sense and antisense orientation. We show that RrpC is required for post-transcriptional DIRS-1 silencing and also for spreading of RNA silencing signals. Finally, DIRS-1 mis-regulation in the absence of RrpC leads to retrotransposon mobilization. In summary, our data reveal RrpC as a key player in the silencing of centromeric retrotransposon DIRS-1. RrpC acts at the post-transcriptional level and is involved in spreading of RNA silencing signals, both in the 5' and 3' directions.
Subject(s)
Dictyostelium/genetics , RNA Interference , RNA-Dependent RNA Polymerase/physiology , Retroelements , Cell Nucleus/genetics , Dictyostelium/enzymology , Genome , Genetic Promoter Regions , Antisense RNA/metabolism , Messenger RNA/metabolism , Small Untranslated RNA/metabolism , RNA-Dependent RNA Polymerase/genetics , Terminal Repeat Sequences
ABSTRACT
BACKGROUND: The majority of published gene-expression studies have used RNA isolated from whole cells, overlooking the potential impact of including the nuclear transcriptome in the analyses. In this study, mRNA fractions from the cytoplasm and from whole cells (total RNA) were prepared from three human cell lines and sequenced using massively parallel sequencing. RESULTS: For all three cell lines, of about 15000 detected genes, approximately 400 to 1400 genes were detected in different amounts in the cytoplasmic and total RNA fractions. Transcripts detected at higher levels in the total RNA fraction had longer coding sequences and a higher number of miRNA target sites. Transcripts detected at higher levels in the cytoplasmic fraction were shorter or contained shorter untranslated regions. Nuclear retention of transcripts and mRNA degradation via the miRNA pathway might contribute to this differential detection of genes. The consequence of the differential detection was further investigated by comparison to proteomics data. Interestingly, the expression profiles of cytoplasmic and total RNA correlated equally well with protein abundance levels, indicating regulation at a higher level. CONCLUSIONS: We conclude that expression levels derived from the total RNA fraction can be regarded as an appropriate estimate of the amount of mRNAs present in a given cell population, independent of the coding sequence length or UTRs.
Subject(s)
Cell Nucleus/genetics , Cytoplasm/genetics , MicroRNAs/genetics , Messenger RNA/genetics , Tumor Cell Line , Gene Expression Regulation , Humans , RNA Sequence Analysis
ABSTRACT
MOTIVATION: New generation sequencing technologies producing increasingly complex datasets demand new efficient and specialized sequence analysis algorithms. Often, it is only the 'novel' sequences in a complex dataset that are of interest and the superfluous sequences need to be removed. RESULTS: A novel algorithm, fast and accurate classification of sequences (FACS), is introduced that can accurately and rapidly classify sequences as belonging or not belonging to a reference sequence. FACS was first optimized and validated using a synthetic metagenome dataset. An experimental metagenome dataset was then used to show that FACS achieves accuracy comparable to that of BLAT and SSAHA2 but is at least 21 times faster in classifying sequences. AVAILABILITY: Source code for FACS, Bloom filters and the MetaSim dataset used is available at http://facs.biotech.kth.se. The Bloom::Faster 1.6 Perl module can be downloaded from CPAN at http://search.cpan.org/~palvaro/Bloom-Faster-1.6/ CONTACTS: henrik.stranneheim@biotech.kth.se; joakiml@biotech.kth.se SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
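The abstract states that FACS relies on Bloom filters to decide whether a sequence belongs to a reference. The Python sketch below illustrates that general technique only; it is not the FACS source code. The k-mer decomposition, the word length `k`, the filter size, the SHA-256-based hashing and the `min_fraction` threshold are all assumptions made for illustration.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: no false negatives, rare false positives."""

    def __init__(self, num_bits=1 << 20, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, item):
        # Derive num_hashes bit positions by salting SHA-256 with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


def kmers(seq, k):
    """Yield all overlapping words of length k in seq."""
    return (seq[i:i + k] for i in range(len(seq) - k + 1))


def build_reference_filter(reference, k=8):
    """Insert every k-mer of the reference into a Bloom filter."""
    bloom = BloomFilter()
    for word in kmers(reference, k):
        bloom.add(word)
    return bloom


def matches_reference(read, bloom, k=8, min_fraction=0.8):
    """Classify a read as 'belonging' if enough of its k-mers are in the filter."""
    words = list(kmers(read, k))
    if not words:
        return False  # read shorter than k cannot be classified
    hits = sum(1 for word in words if word in bloom)
    return hits / len(words) >= min_fraction


reference = "ACGTTGCAAGCTTAGGCTAACGATCG"
bloom = build_reference_filter(reference)
print(matches_reference("GCAAGCTTAGGCTA", bloom))
```

Membership tests against a Bloom filter take constant time per word and need no alignment, which is the kind of trade-off that makes such a classifier faster than alignment-based tools at the cost of a small false-positive rate.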
Subject(s)
Algorithms , Metagenome , DNA Sequence Analysis , Mitochondrial Genome , Humans , Respiratory System/microbiology , Sensitivity and Specificity
ABSTRACT
Whole-genome sequencing (WGS) is a fundamental technology for research to advance precision medicine, but the limited availability of portable and user-friendly workflows for WGS analyses poses a major challenge for many research groups and hampers scientific progress. Here we present Sarek, an open-source workflow to detect germline variants and somatic mutations based on sequencing data from WGS, whole-exome sequencing (WES), or gene panels. Sarek features (i) easy installation, (ii) robust portability across different computer environments, (iii) comprehensive documentation, (iv) transparent and easy-to-read code, and (v) extensive quality metrics reporting. Sarek is implemented in the Nextflow workflow language and supports both Docker and Singularity containers as well as Conda environments, making it ideal for easy deployment on any POSIX-compatible computers and cloud compute environments. Sarek follows the GATK best-practice recommendations for read alignment and pre-processing, and includes a wide range of software for the identification and annotation of germline and somatic single-nucleotide variants, insertion and deletion variants, structural variants, tumour sample purity, and variations in ploidy and copy number. Sarek offers easy, efficient, and reproducible WGS analyses, and can readily be used both as a production workflow at sequencing facilities and as a powerful stand-alone tool for individual research groups. The Sarek source code, documentation and installation instructions are freely available at https://github.com/nf-core/sarek and at https://nf-co.re/sarek/.
Subject(s)
Germ Cells , Software , Whole Genome Sequencing/methods , Workflow , Humans
ABSTRACT
We report on the incorporation of the Visual DNA concept in a genotyping assay as a simple and straightforward detection tool. The principle of trapping streptavidin-coated superparamagnetic beads of micrometer size for visualization of genetic variances is used for PrASE-based detection of a panel of mutations in the severe and common genetic disorder of cystic fibrosis. The method allows a final investigation of genotypes by the naked eye and the output is easily documented using a regular hand-held device with an integrated digital camera. A number of samples were run through the assay, showing rapid and accurate detection using superparamagnetic beads and an off-the-shelf neodymium magnet. The assay emphasizes the power of Visual DNA and demonstrates the potential value of the method in future point-of-care tests.
Subject(s)
DNA/analysis , Molecular Diagnostic Techniques/methods , Oligonucleotide Array Sequence Analysis/methods , Genetic Polymorphism/genetics , DNA Sequence Analysis/methods , Alleles , Cystic Fibrosis Transmembrane Conductance Regulator/genetics , Magnetics , Microspheres , Peptide Hydrolases , Polymerase Chain Reaction
ABSTRACT
The future of human genomics is one that seeks to resolve the entirety of genetic variation through sequencing. The prospect of utilizing genomics for medical purposes requires cost-efficient and accurate base calling, long-range haplotyping capability, and reliable calling of structural variants. Short-read sequencing has led the development towards such a future but has struggled to meet the latter two of these needs. To address this limitation, we developed a technology that preserves the molecular origin of short sequencing reads, with an insignificant increase to sequencing costs. We demonstrate a novel library preparation method for high throughput barcoding of short reads where millions of random barcodes can be used to reconstruct megabase-scale phase blocks.
Subject(s)
Genomics/methods , High-Throughput Nucleotide Sequencing/methods , Taxonomic DNA Barcoding , Data Visualization , Gene Library , Human Genome , Haplotypes , Humans
ABSTRACT
Here, we present the genome of the industrial ethanol production strain Brettanomyces bruxellensis CBS 11270. The nuclear genome was found to be diploid, containing four chromosomes with sizes ranging from 2.2 to 4.0 Mbp. A 75 Kbp mitochondrial genome was also identified. Comparing the homologous chromosomes, we detected that 0.32% of nucleotides were polymorphic, i.e. formed single nucleotide polymorphisms (SNPs); 40.6% of these were found in coding regions (i.e. 0.13% of all nucleotides formed SNPs and were in coding regions). In addition, 8,538 indels were found. The total number of protein-coding genes was 4,897; of these, 4,284 were annotated on chromosomes, and the mitochondrial genome contained 18 protein-coding genes. Additionally, 595 annotated genes were on contigs not associated with chromosomes. A number of genes were duplicated, most of them as tandem repeats, including a six-gene cluster located on chromosome 3. There were also examples of interchromosomal gene duplications, including a duplication of a six-gene cluster, which was found on both chromosomes 1 and 4. Gene copy number analysis suggested loss of heterozygosity for 372 genes. This may reflect adaptation to relatively harsh but constant conditions of continuous fermentation. Analysis of gene topology showed that most of these losses occurred in clusters of more than one gene, the largest cluster comprising 33 genes. Comparative analysis against the wine isolate CBS 2499 revealed 88,534 SNPs and 8,133 indels. Moreover, when the scaffolds of the CBS 2499 genome assembly were aligned against the chromosomes of CBS 11270, many of them aligned completely, some had chunks aligned to different chromosomes, and some were in fact rearranged. Our findings indicate a highly dynamic genome within the species B. bruxellensis and a tendency towards reduction of gene number in long-term continuous cultivation.
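The gene counts and SNP percentages quoted in this abstract are mutually consistent, which a short Python check makes explicit. The variable names are illustrative only; every number is taken from the abstract.

```python
# Gene counts reported in the abstract
chromosomal_genes = 4284       # annotated on chromosomes
mitochondrial_genes = 18       # in the mitochondrial genome
unplaced_contig_genes = 595    # on contigs not associated with chromosomes
total_genes = chromosomal_genes + mitochondrial_genes + unplaced_contig_genes

# SNP percentages reported in the abstract
polymorphic_fraction = 0.32 / 100   # fraction of nucleotides forming SNPs
coding_share = 40.6 / 100           # share of those SNPs located in coding regions
coding_snp_percent = polymorphic_fraction * coding_share * 100

print(total_genes)                    # 4897, the reported total
print(round(coding_snp_percent, 2))   # 0.13, the reported percentage
```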
Subject(s)
Brettanomyces/metabolism , Fungal Chromosomes/genetics , Ethanol/metabolism , Mitochondria/genetics , Brettanomyces/genetics , Contig Mapping , Molecular Evolution , Gene Dosage , Genetic Variation , Genome Size , Molecular Sequence Annotation , Phylogeny , Whole Genome Sequencing/methods
ABSTRACT
Urban sewer systems consist of wastewater and stormwater sewers, of which only wastewater is processed before being discharged. Occasionally, misconnections or damages in the network occur, resulting in untreated wastewater entering natural water bodies via the stormwater system. Cultivation of faecal indicator bacteria (e.g. Escherichia coli; E. coli) is the current standard for tracing wastewater contamination. This method is cheap but has limited specificity and mobility. Here, we compared the E. coli culturing approach with two sequencing-based methodologies (Illumina MiSeq 16S rRNA gene amplicon sequencing and Oxford Nanopore MinION shotgun metagenomic sequencing), analysing 73 stormwater samples collected in Stockholm. High correlations were obtained between E. coli culturing counts and frequencies of human gut microbiome amplicon sequences, indicating that E. coli is indeed a good indicator of faecal contamination. However, the amplicon data additionally hold information on the contamination source or, alternatively, on how much time has elapsed since the faecal matter entered the system. Shotgun metagenomic sequencing on a subset of the samples using a portable real-time sequencer, MinION, correlated well with the amplicon sequencing data. This study demonstrates the use of DNA sequencing to detect human faecal contamination in stormwater systems and the potential of tracing faecal contamination directly in the field.
Subject(s)
Bacteria/isolation & purification , Feces/microbiology , DNA Sequence Analysis/methods , Sewage/microbiology , Wastewater/microbiology , Water Microbiology , Bacteria/classification , Bacteria/genetics , Environmental Monitoring/methods , Escherichia coli/genetics , Escherichia coli/isolation & purification , Humans , 16S Ribosomal RNA/genetics , Water Pollution/prevention & control , Water Quality/standards
ABSTRACT
Here, we present a novel method for SNP genotyping based on protease-mediated allele-specific primer extension (PrASE), where the two allele-specific extension primers only differ in their 3'-positions. As reported previously [Ahmadian,A., Gharizadeh,B., O'Meara,D., Odeberg,J. and Lundeberg,J. (2001), Nucleic Acids Res., 29, e121], the kinetics of perfectly matched primer extension is faster than that of mismatched primer extension. In this study, we have utilized this difference in kinetics by adding protease, a protein-degrading enzyme, to discriminate between the extension reactions. The competition between the polymerase activity and the enzymatic degradation yields extension of the perfectly matched primer, while the slower extension of the mismatched primer is eliminated. To allow multiplex and simultaneous detection of the investigated single nucleotide polymorphisms (SNPs), each extension primer was given a unique signature tag sequence on its 5' end, complementary to a tag on a generic array. A multiplex nested PCR with 13 SNPs was performed in a total of 36 individuals and their alleles were scored. To demonstrate the improvements in scoring SNPs by PrASE, we also genotyped the individuals without inclusion of protease in the extension. We conclude that the developed assay is highly allele-specific, with excellent multiplex SNP capabilities.