1.
Clin Chim Acta ; 512: 40-48, 2021 Jan.
Article in English | MEDLINE | ID: mdl-33227269

ABSTRACT

The aim of this study was to evaluate the performance of a novel NGS-based assay for monitoring mixed chimerism (MC) and to compare its technical capacity with established techniques for chimerism analysis. Artificial and clinical samples with increasing amounts of patient DNA were compared using real-time PCR detection of indels and SNPs, fragment analysis of short tandem repeats (STRs), and NGS analysis of indels. Real-time PCR displayed excellent sensitivity (>0.01%) but poor accuracy (>20 CV% at MC > 20%), while fragment analysis exhibited good accuracy (<5 CV% at MC > 20%) with limited sensitivity (>2.5%). In contrast, NGS chimerism demonstrated a sensitivity (>0.1%) equal to real-time PCR and an accuracy equal to or better than STR analysis throughout an extensive range of mixed chimerism (0.1-100%). To evaluate the performance of the separate techniques for chimerism determination, 75 retrospective patient monitoring samples (3-7 weeks post-HSCT) with low (<5%), intermediate (5-20%) or high (>20%) mixed chimerism were analyzed. The between-run precision of the NGS assay varied from 0.72% (MC > 20%) to 7.38% (MC < 5%). In conclusion, NGS displayed a combination of high sensitivity and good accuracy in both artificial and clinical chimerism samples.


Subject(s)
Chimerism , Hematopoietic Stem Cell Transplantation , DNA , High-Throughput Nucleotide Sequencing , Humans , Microsatellite Repeats , Retrospective Studies , Transplantation Chimera
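As a hedged illustration of how a mixed-chimerism percentage can be derived from sequencing counts, the Python sketch below assumes that, at each informative indel marker, reads can be classified as carrying either a recipient-specific or a donor-specific allele. Per-marker percentages are averaged, and the between-run CV% is the sample standard deviation divided by the mean. The marker counts, the averaging rule, and the function names are hypothetical; the abstract does not describe the assay's actual calculation.

```python
from statistics import mean, stdev


def recipient_fraction(recipient_reads: int, donor_reads: int) -> float:
    """Percentage of reads at one informative marker carrying the recipient-specific allele."""
    total = recipient_reads + donor_reads
    if total == 0:
        raise ValueError("marker has no reads")
    return 100.0 * recipient_reads / total


def mixed_chimerism(markers: list[tuple[int, int]]) -> float:
    """Average recipient percentage over all informative markers (hypothetical aggregation rule)."""
    return mean(recipient_fraction(r, d) for r, d in markers)


def cv_percent(values: list[float]) -> float:
    """Coefficient of variation across replicate runs: sample SD divided by mean, in percent."""
    return 100.0 * stdev(values) / mean(values)


# Hypothetical (recipient, donor) read counts at three informative markers.
markers = [(52, 9948), (48, 9952), (50, 9950)]
print(round(mixed_chimerism(markers), 3))
# Hypothetical MC estimates (%) from three runs of the same sample.
print(round(cv_percent([10.0, 11.0, 9.0]), 1))
```

Averaging over several markers dampens the effect of a single poorly covered marker, which matters most at low MC where only a handful of recipient reads are observed.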
2.
Mol Ecol Resour ; 20(5): 1171-1181, 2020 Sep.
Article in English | MEDLINE | ID: mdl-30848092

ABSTRACT

The high-throughput capacities of the Illumina sequencing platforms and the possibility to label samples individually have encouraged wide use of sample multiplexing. However, this practice results in read misassignment (usually <1%) across samples sequenced on the same lane. Alarmingly high rates of read misassignment of up to 10% were reported for Illumina sequencing machines using exclusion amplification chemistry. This may make the use of these platforms prohibitive, particularly in studies that rely on low-quantity and low-quality samples, such as historical and archaeological specimens. Here, we use barcodes, short sequences that are ligated to both ends of the DNA insert, to directly quantify the rate of index hopping in 100-year-old museum-preserved gorilla (Gorilla beringei) samples. Correcting for multiple sources of noise, we identify on average 0.470% of reads containing a hopped index. We show that the sample-specific quantity of misassigned reads depends on the number of reads that any given sample contributes to the total sequencing pool, so that samples with few sequenced reads receive the greatest proportion of misassigned reads. This particularly affects ancient DNA samples, as these frequently differ in their DNA quantity and endogenous content. Through simulations we show that even low rates of index hopping, as reported here, can lead to biases in ancient DNA studies when multiplexing samples with vastly different quantities of endogenous material.


Subject(s)
DNA, Ancient , High-Throughput Nucleotide Sequencing , Sequence Analysis, DNA , Animals , DNA , DNA Barcoding, Taxonomic , Gorilla gorilla/genetics , High-Throughput Nucleotide Sequencing/methods , Sequence Analysis, DNA/methods
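The dependence of misassignment on a sample's share of the pool can be reproduced with a toy model. The sketch below assumes that every sample loses the same fraction of its reads (0.47%, the average rate reported above) to hopped indices and that those reads are spread evenly across the other samples in the lane. Both the even-spread rule and the read counts are illustrative assumptions, not the authors' model or data.

```python
def misassigned_fractions(reads: list[int], hop_rate: float) -> list[float]:
    """
    Toy model: each sample loses hop_rate of its reads to hopped indices, and those
    reads are spread evenly over the other samples in the pool (an assumption).
    Returns, per sample, the fraction of its assigned reads that are foreign.
    """
    k = len(reads)
    if k < 2:
        raise ValueError("need at least two multiplexed samples")
    hopped_out = [hop_rate * n for n in reads]
    total_hopped = sum(hopped_out)
    fractions = []
    for n, out in zip(reads, hopped_out):
        received = (total_hopped - out) / (k - 1)  # foreign reads landing on this index
        own = n - out                               # this sample's reads that kept its index
        fractions.append(received / (own + received))
    return fractions


# Hypothetical pool: two well-sequenced samples and one low-yield (e.g. ancient) sample.
pool = [1_000_000, 1_000_000, 10_000]
for n, f in zip(pool, misassigned_fractions(pool, 0.0047)):
    print(f"{n:>9} reads -> {100 * f:.2f}% misassigned")
```

With one sample contributing 100 times fewer reads than the others, its foreign-read fraction is far higher than theirs even though the hopping rate is identical for all samples.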
3.
PLoS Genet ; 15(2): e1007858, 2019 02.
Article in English | MEDLINE | ID: mdl-30735495

ABSTRACT

Complex chromosomal rearrangements (CCRs) are rearrangements involving more than two chromosomes or more than two breakpoints. Whole genome sequencing (WGS) allows for outstanding high-resolution characterization at the nucleotide level in unique sequences of such rearrangements, but problems remain for mapping breakpoints in repetitive regions of the genome, which are known to be prone to rearrangements. Hence, multiple complementary WGS experiments are sometimes needed to solve the structures of CCRs. We have studied three individuals with CCRs: Case 1 and Case 2 presented with de novo karyotypically balanced, complex interchromosomal rearrangements (46,XX,t(2;8;15)(q35;q24.1;q22) and 46,XY,t(1;10;5)(q32;p12;q31)), and Case 3 presented with a de novo, extremely complex intrachromosomal rearrangement on chromosome 1. Molecular cytogenetic investigation revealed cryptic deletions at the breakpoints on chromosomes 2 and 8 in Case 1, and on chromosome 10 in Case 2, explaining their clinical symptoms. In Case 3, 26 breakpoints were identified using WGS, disrupting five known disease genes. All rearrangements were subsequently analyzed using optical maps, linked-read WGS, and short-read WGS. In conclusion, we present a case series of three unique de novo CCRs in which, by combining the results from the different technologies, we fully solved the structure of each rearrangement. These cases demonstrate the power of combining short-read WGS with long-molecule sequencing or optical mapping to resolve unique de novo CCRs in a clinical setting.


Subject(s)
Chromosomes/genetics , Gene Rearrangement/genetics , Genomic Structural Variation/genetics , Chromosome Mapping/methods , Female , Humans , Male , Whole Genome Sequencing/methods
4.
PLoS Genet ; 14(11): e1007780, 2018 11.
Article in English | MEDLINE | ID: mdl-30419018

ABSTRACT

Clustered copy number variants (CNVs) as detected by chromosomal microarray analysis (CMA) are often reported as germline chromothripsis. However, such cases might need further investigation by massively parallel whole genome sequencing (WGS) in order to accurately define the underlying complex rearrangement, predict the mechanisms of occurrence and identify additional complexities. Here, we utilized WGS to delineate the rearrangement structure of 21 clustered CNV carriers first investigated by CMA and identified a total of 83 breakpoint junctions (BPJs). The rearrangements were further sub-classified depending on the patterns observed: I) cases with only deletions (n = 8) often had additional structural rearrangements, such as insertions and inversions typical of chromothripsis; II) cases with only duplications (n = 7) or III) combinations of deletions and duplications (n = 6) demonstrated mostly interspersed duplications and BPJs enriched with microhomology. In two cases the rearrangement mutational signatures indicated both a breakage-fusion-bridge cycle process and halted formation of a ring chromosome. Finally, we observed two cases with Alu- and LINE-mediated rearrangements as well as two unrelated individuals with seemingly identical clustered CNVs on 2p25.3, possibly a rare European founder rearrangement. In conclusion, through detailed characterization of the derivative chromosomes we show that multiple mechanisms are likely involved in the formation of clustered CNVs and add further evidence for chromoanagenesis mechanisms in both "simple" and highly complex chromosomal rearrangements. Furthermore, WGS characterization adds positional information, which is important for correct clinical interpretation and for deciphering the mechanisms involved in the formation of these rearrangements.


Subject(s)
DNA Copy Number Variations , DNA Replication/genetics , Alu Elements , Chromosome Breakpoints , Chromothripsis , Gene Rearrangement , Genome, Human , Humans , Long Interspersed Nucleotide Elements , Oligonucleotide Array Sequence Analysis , Whole Genome Sequencing
5.
Genes (Basel) ; 9(10)2018 Oct 09.
Article in English | MEDLINE | ID: mdl-30304863

ABSTRACT

The current human reference sequence (GRCh38) is a foundation for large-scale sequencing projects. However, recent studies have suggested that GRCh38 may be incomplete and give a suboptimal representation of specific population groups. Here, we performed a de novo assembly of two Swedish genomes that revealed over 10 Mb of sequences absent from the human GRCh38 reference in each individual. Around 6 Mb of these novel sequences (NS) are shared with a Chinese personal genome. The NS are highly repetitive, have an elevated GC-content, and are primarily located in centromeric or telomeric regions. Up to 1 Mb of NS can be assigned to chromosome Y, and large segments are also missing from GRCh38 at chromosomes 14, 17, and 21. Inclusion of NS into the GRCh38 reference radically improves the alignment and variant calling from short-read whole-genome sequencing data at several genomic loci. A re-analysis of a Swedish population-scale sequencing project yields > 75,000 putative novel single nucleotide variants (SNVs) and removes > 10,000 false positive SNV calls per individual, some of which are located in protein coding regions. Our results highlight that the GRCh38 reference is not yet complete and demonstrate that personal genome assemblies from local populations can improve the analysis of short-read whole-genome sequencing data.

6.
PLoS One ; 13(3): e0193928, 2018.
Article in English | MEDLINE | ID: mdl-29529047

ABSTRACT

The detection of recurrent somatic chromosomal rearrangements is standard of care for most leukemia types. Even though karyotype analysis (a low-resolution, genome-wide chromosome analysis) is still the gold standard, it often needs to be complemented with other methods to increase resolution. To evaluate the feasibility and applicability of mate pair whole genome sequencing (MP-WGS) for detecting structural chromosomal rearrangements in the diagnostic setting, we sequenced ten bone marrow samples from leukemia patients with recurrent rearrangements. Samples were selected based on cytogenetic and FISH results at leukemia diagnosis to include common rearrangements of prognostic relevance. Using MP-WGS and in-house bioinformatic analysis, all sought rearrangements were successfully detected. In addition, unexpected complexity or additional, previously undetected rearrangements were uncovered in three samples. Finally, the MP-WGS analysis pinpointed the location of chromosome junctions at high resolution: we were able to identify the exact exons involved in the resulting fusion genes in all samples and the specific junction at the nucleotide level in half of the samples. The results show that our approach combines the screening character of karyotype analysis with the specificity and resolution of cytogenetic and molecular methods. Given the straightforward analysis and high-resolution detection of clinically relevant rearrangements, we conclude that MP-WGS is a feasible method for routine leukemia diagnostics of structural chromosomal rearrangements.


Subject(s)
Chromosome Aberrations , Leukemia/genetics , Whole Genome Sequencing/methods , Bone Marrow , Computational Biology , Early Detection of Cancer , Exons , Feasibility Studies , Humans , In Situ Hybridization, Fluorescence , Leukemia/pathology
7.
F1000Res ; 6: 664, 2017.
Article in English | MEDLINE | ID: mdl-28781756

ABSTRACT

Reliable detection of large structural variation (>1000 bp) is important in both rare and common genetic disorders. Whole genome sequencing (WGS) is a technology that may be used to identify a large proportion of the genomic structural variants (SVs) in an individual in a single experiment. Even though SV callers have been extensively used in research to detect mutations, their potential use within routine clinical diagnostics is hindered by high computational costs, non-standard output formats, and limited support for the various sequencing platforms and libraries. Another well-known but poorly addressed problem is the large number of benign variants and reference errors present in the human genome, which further complicates analysis. Here we present TIDDIT, a time-efficient variant caller that uses discordant read pairs as well as depth of coverage and split reads to detect and classify a large spectrum of SVs. As part of the software suite, TIDDIT also includes a database functionality that enables filtering for rare variants and reduces the number of false positive calls and background noise. Benchmarked against five state-of-the-art SV callers, TIDDIT performs at an equal or superior level while using only 2 CPU hours per sample. Thanks to its speed, sensitivity, flexibility and ability to easily detect variants on a wide range of WGS library types, TIDDIT solves many of the problems that currently hinder the use of WGS for SV calling in clinical settings.
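TIDDIT combines discordant read pairs with coverage and split-read signals. As a minimal, hypothetical sketch of the discordant-pair signal alone (not TIDDIT's actual rules), the Python below flags a pair when its mates map to different chromosomes, are not in the forward-reverse orientation expected of a standard paired-end library, or lie further apart than a chosen number of standard deviations from the library's mean insert size. The distance between mapping positions stands in for insert size, and the thresholds and the `ReadPair` type are assumptions.

```python
from dataclasses import dataclass


@dataclass
class ReadPair:
    """Hypothetical minimal record of where the two mates of a pair aligned."""
    chrom1: str
    pos1: int
    reverse1: bool
    chrom2: str
    pos2: int
    reverse2: bool


def is_discordant(p: ReadPair, mean_insert: float, sd_insert: float, n_sd: float = 3.0) -> bool:
    """Flag pairs that are interchromosomal, wrongly oriented, or with an outlier insert size."""
    if p.chrom1 != p.chrom2:
        return True  # mates on different chromosomes: candidate translocation
    # Order mates by position so the leftmost mate comes first.
    (lpos, lrev), (rpos, rrev) = sorted([(p.pos1, p.reverse1), (p.pos2, p.reverse2)])
    if lrev or not rrev:
        return True  # expected orientation: left mate forward, right mate reverse
    insert = rpos - lpos  # mapping distance used as a proxy for insert size
    return abs(insert - mean_insert) > n_sd * sd_insert


pairs = [
    ReadPair("1", 1000, False, "1", 1350, True),  # concordant
    ReadPair("1", 1000, False, "1", 9000, True),  # too far apart: deletion-like
    ReadPair("1", 1000, False, "5", 2000, True),  # different chromosomes
    ReadPair("1", 1000, True, "1", 1350, False),  # reversed orientation
]
print([is_discordant(p, mean_insert=350, sd_insert=50) for p in pairs])
```

A real caller would then cluster discordant pairs that point to the same pair of regions and require a minimum number of supporting pairs before reporting an SV.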

8.
Eur J Hum Genet ; 25(11): 1253-1260, 2017 11.
Article in English | MEDLINE | ID: mdl-28832569

ABSTRACT

Here we describe the SweGen data set, a comprehensive map of genetic variation in the Swedish population. These data represent a basic resource for clinical genetics laboratories as well as for sequencing-based association studies by providing information on genetic variant frequencies in a cohort that is well matched to national patient cohorts. To select samples for this study, we first examined the genetic structure of the Swedish population using high-density SNP-array data from a nationwide cohort of over 10,000 Swedish-born individuals included in the Swedish Twin Registry. A total of 1000 individuals, reflecting a cross-section of the population and capturing the main genetic structure, were selected for whole-genome sequencing. Analysis pipelines were developed for automated alignment, variant calling and quality control of the sequencing data. This resulted in a genome-wide collection of aggregated variant frequencies in the Swedish population that we have made available to the scientific community through the website https://swefreq.nbis.se. A total of 29.2 million single-nucleotide variants and 3.8 million indels were detected in the 1000 samples, with 9.9 million of these variants not present in current databases. Each sample contributed an average of 7199 individual-specific variants. In addition, an average of 8645 larger structural variants (SVs) were detected per individual, and we demonstrate that the population frequencies of these SVs can be used for efficient filtering analyses. Finally, our results show that the genetic diversity within Sweden is substantial compared with the diversity among continental European populations, underscoring the relevance of establishing a local reference data set.


Subject(s)
Genome, Human , Polymorphism, Single Nucleotide , Registries , Datasets as Topic , Genome-Wide Association Study , Humans , Sweden , Twins/genetics
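Filtering against population frequencies, as described above, amounts to discarding candidate variants that are common in the reference cohort. The sketch below assumes diploid loci (allele frequency = allele count / 2N), variant keys of the form (chromosome, position, ref, alt), and a hypothetical 1% cutoff; it does not reproduce the SweGen pipeline or the SweFreq interface, and the counts are invented for illustration.

```python
Variant = tuple[str, int, str, str]  # (chromosome, position, ref, alt): hypothetical key format


def allele_frequency(allele_count: int, n_individuals: int) -> float:
    """Allele frequency in a diploid cohort: observed alternate alleles / (2 * individuals)."""
    return allele_count / (2 * n_individuals)


def rare_variants(candidates: list[Variant], cohort_counts: dict[Variant, int],
                  n_individuals: int, max_af: float = 0.01) -> list[Variant]:
    """Keep candidates whose cohort allele frequency is at most max_af.
    Variants absent from the cohort table are treated as frequency 0 and kept."""
    return [v for v in candidates
            if allele_frequency(cohort_counts.get(v, 0), n_individuals) <= max_af]


# Hypothetical allele counts in a 1000-individual cohort.
cohort = {("1", 12345, "A", "G"): 400, ("2", 500, "C", "T"): 3}
patient = [("1", 12345, "A", "G"), ("2", 500, "C", "T"), ("3", 42, "G", "A")]
print(rare_variants(patient, cohort, n_individuals=1000))
```

The same idea extends to SVs, except that matching a patient SV to a cohort SV usually requires overlap or breakpoint-distance criteria rather than an exact key lookup.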
9.
Arch Toxicol ; 91(5): 2067-2078, 2017 May.
Article in English | MEDLINE | ID: mdl-27838757

ABSTRACT

Arsenic, a carcinogen with immunotoxic effects, is a common contaminant of drinking water and certain foods worldwide. We hypothesized that chronic arsenic exposure alters gene expression, potentially by altering DNA methylation of genes encoding central components of the immune system. We therefore analyzed the transcriptomes (by RNA sequencing) and methylomes (by target-enrichment next-generation sequencing) of primary CD4-positive T cells from matched groups of four women each in the Argentinean Andes, with fivefold differences in urinary arsenic concentrations (median concentrations of urinary arsenic in the lower- and high-arsenic groups: 65 and 276 µg/l, respectively). Arsenic exposure was associated with genome-wide alterations of gene expression; principal component analysis indicated that the exposure explained 53% of the variance in gene expression among the top variable genes, and 19% of 28,351 genes were differentially expressed (false discovery rate <0.05) between the exposure groups. Key genes regulating the immune system, such as tumor necrosis factor alpha and interferon gamma, as well as genes related to the NF-kappa-B complex, were significantly downregulated in the high-arsenic group. Arsenic exposure was also associated with changes in genome-wide DNA methylation; the high-arsenic group had 3 percentage points higher genome-wide full methylation (>80% methylation) than the lower-arsenic group. Differentially methylated regions that were hypermethylated in the high-arsenic group showed enrichment for immune-related gene ontologies that constitute the basic functions of CD4-positive T cells, such as isotype switching and lymphocyte activation and differentiation. In conclusion, chronic arsenic exposure from drinking water was related to changes in the transcriptome and methylome of CD4-positive T cells, both genome-wide and in specific genes, supporting the hypothesis that arsenic causes immunotoxicity by interfering with gene expression and regulation.


Subject(s)
Arsenic/toxicity , CD4-Positive T-Lymphocytes/drug effects , DNA Methylation/drug effects , Environmental Exposure/adverse effects , Gene Expression Regulation/drug effects , Adult , Argentina , CD4-Positive T-Lymphocytes/physiology , CpG Islands , Female , Gene Expression Profiling , High-Throughput Nucleotide Sequencing , Humans , Middle Aged , Promoter Regions, Genetic
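The abstract reports differential expression at a false discovery rate below 0.05 but does not name the correction used. The Benjamini-Hochberg step-up procedure, shown here as a plain Python sketch, is one standard way to control the FDR across thousands of per-gene tests; treat it as an assumed example, not the authors' method. The p-values in the usage line are invented.

```python
def benjamini_hochberg(pvalues: list[float], alpha: float = 0.05) -> list[bool]:
    """Return one flag per test: True if rejected at FDR level alpha (BH step-up procedure)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # test indices by ascending p-value
    # Find the largest rank k (1-based) with p_(k) <= (k / m) * alpha.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * alpha:
            cutoff_rank = rank
    # Reject every hypothesis ranked at or below that cutoff, in original input order.
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected


pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
print(benjamini_hochberg(pvals))
```

Because the procedure is step-up, a p-value that misses its own threshold can still be rejected when a larger p-value further down the ranking passes; for example, with inputs 0.03 and 0.04 both are rejected at alpha = 0.05.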
10.
Hum Mutat ; 38(2): 180-192, 2017 02.
Article in English | MEDLINE | ID: mdl-27862604

ABSTRACT

Most balanced translocations are thought to result mechanistically from nonhomologous end joining or, in rare cases of recurrent events, from nonallelic homologous recombination. Here, we use low-coverage mate pair whole-genome sequencing to fine-map rearrangement breakpoint junctions in both phenotypically normal and affected translocation carriers. In total, 46 junctions from 22 carriers of balanced translocations were characterized. Genes were disrupted at 48% of the breakpoints: recessive genes in four phenotypically normal carriers and known dominant intellectual disability genes in three affected carriers. Finally, seven candidate disease genes were disrupted in five carriers with neurocognitive disabilities (SVOPL, SUSD1, TOX, NCALD, SLC4A10) and one XX-male carrier with Tourette syndrome (LYPD6, GPC5). Breakpoint junction analyses revealed microhomology and small templated insertions in a substantial fraction of the analyzed translocations (17.4%; n = 4), an observation that was substantiated by reanalysis of 37 previously published translocation junctions. Microhomology associated with templated insertions is characteristic of the breakpoint junctions of rearrangements mediated by error-prone replication-based repair mechanisms. Our data imply that a mechanism involving template switching might contribute to the formation of at least 15% of interchromosomal translocation events.


Subject(s)
Chromosome Mapping , Translocation, Genetic , Whole Genome Sequencing , Base Sequence , Chromosome Breakage , Comparative Genomic Hybridization , DNA Copy Number Variations , Female , Genetic Association Studies , Genomics/methods , Genotype , Homologous Recombination , Humans , In Situ Hybridization, Fluorescence , Karyotype , Male , Phenotype
11.
Gigascience ; 5: 26, 2016 06 07.
Article in English | MEDLINE | ID: mdl-27267963

ABSTRACT

With ever-increasing amounts of data being produced by next-generation sequencing (NGS) experiments, the requirements placed on supporting e-infrastructures have grown. In this work, we provide recommendations based on the collective experiences from participants in the EU COST Action SeqAhead for the tasks of data preprocessing, upstream processing, data delivery, and downstream analysis, as well as long-term storage and archiving. We cover demands on computational and storage resources, networks, software stacks, automation of analysis, education, and also discuss emerging trends in the field. E-infrastructures for NGS require substantial effort to set up and maintain over time, and with sequencing technologies and best practices for data analysis evolving rapidly it is important to prioritize both processing capacity and e-infrastructure flexibility when making strategic decisions to support the data analysis demands of tomorrow. Due to increasingly demanding technical requirements we recommend that e-infrastructure development and maintenance be handled by a professional service unit, be it internal or external to the organization, and emphasis should be placed on collaboration between researchers and IT professionals.


Subject(s)
High-Throughput Nucleotide Sequencing/methods , Sequence Analysis, DNA/methods , Computational Biology/methods , Humans , Information Storage and Retrieval , Internet , Software
12.
BMC Bioinformatics ; 17 Suppl 4: 69, 2016 Mar 02.
Article in English | MEDLINE | ID: mdl-26961371

ABSTRACT

BACKGROUND: Bisulfite treatment of DNA followed by sequencing (BS-seq) has become a standard technique in epigenetic studies, providing researchers with tools for generating single-base resolution maps of whole methylomes. Aligning bisulfite-treated reads, however, is a computationally difficult task: bisulfite treatment decreases the (lexical) complexity of low-methylated genomic regions, and C-to-T mismatches may reflect cytosine unmethylation rather than SNPs or sequencing errors. Further challenges arise both during and after the alignment phase: data structures used by the aligner should be fast and should fit into main memory, and the methylation-caller output should be compressed because of its considerable size. METHODS: As far as data structures employed to align bisulfite-treated reads are concerned, solutions proposed in the literature can be roughly grouped into two main categories: those storing pointers at each text position (e.g. hash tables, suffix trees/arrays), and those using the information-theoretic minimum number of bits (e.g. FM indexes and compressed suffix arrays). The former are fast but memory-consuming; the latter are much slower but lightweight. In this paper, we try to close this gap by proposing a data structure for aligning bisulfite-treated reads that is at the same time fast, light, and very accurate. We reach this objective by combining a recent theoretical result on succinct hashing with a bisulfite-aware hash function. Furthermore, the new versions of the tools implementing our ideas (the aligner ERNE-BS5 2 and the caller ERNE-METH 2) have been extended with increased downstream compatibility (EPP/Bismark cov output formats), output compression, and support for target enrichment protocols.
RESULTS: Experimental results on public and simulated WGBS libraries show that our algorithmic solution is a competitive tradeoff between hash-based and BWT-based indexes, being as fast and accurate as the former, and as memory-efficient as the latter. CONCLUSIONS: The new functionalities of our bisulfite aligner and caller make it a fast and memory-efficient tool, useful for analyzing large datasets with limited computational resources, easily processing target enrichment data, and producing statistics such as protocol efficiency and coverage as a function of the distance from target regions.


Subject(s)
DNA Methylation , DNA/chemistry , Epigenomics , Sequence Analysis, DNA/methods , Software , Sulfites/chemistry , CpG Islands , Data Compression , Genome, Human , Genomics/methods , High-Throughput Nucleotide Sequencing , Humans
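The core difficulty described above is that a methylated C stays C while an unmethylated C reads as T. One common way to make lookups methylation-agnostic is to convert every C to T in both the reference and the read before hashing; the sketch below does this with an ordinary Python dictionary of k-mers. It covers only the forward (C-to-T) strand, uses only the read's first k-mer as a seed, and is not ERNE-BS5's succinct, bisulfite-aware hash; the function names and sequences are hypothetical.

```python
from collections import defaultdict


def bisulfite_convert(seq: str) -> str:
    """In silico C-to-T conversion (forward strand only): methylated and unmethylated C both become T."""
    return seq.upper().replace("C", "T")


def build_index(reference: str, k: int) -> dict[str, list[int]]:
    """Map every k-mer of the C-to-T converted reference to its start positions."""
    converted = bisulfite_convert(reference)
    index: defaultdict[str, list[int]] = defaultdict(list)
    for i in range(len(converted) - k + 1):
        index[converted[i:i + k]].append(i)
    return dict(index)


def seed_hits(read: str, index: dict[str, list[int]], k: int) -> list[int]:
    """Candidate reference start positions for a read, seeded by its first k-mer."""
    if len(read) < k:
        return []
    return index.get(bisulfite_convert(read[:k]), [])


reference = "ACGTTCGATCCGA"
idx = build_index(reference, k=8)
# Same locus (reference[1:9] = "CGTTCGAT"): second CpG unmethylated (C read as T) vs. fully methylated.
print(seed_hits("CGTTTGAT", idx, k=8), seed_hits("CGTTCGAT", idx, k=8))
```

Both reads seed to the same position, which is the point of the conversion; the cost is the reduced sequence complexity mentioned in the abstract, since the converted genome uses essentially three letters and produces more ambiguous seeds.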
13.
BMC Evol Biol ; 16: 59, 2016 Mar 08.
Article in English | MEDLINE | ID: mdl-26956800

ABSTRACT

BACKGROUND: Although most insect species are specialized on one or a few groups of plants, there are phytophagous insects that seem to use virtually any kind of plant as food. Understanding the nature of this ability to feed on a wide repertoire of plants is crucial for the control of pest species and for the elucidation of the macroevolutionary mechanisms of speciation and diversification of insect herbivores. Here we studied Vanessa cardui, the species with the widest diet breadth among butterflies and a potential insect pest, by comparing tissue-specific transcriptomes from caterpillars that were reared on different host plants. We tested whether the similarities of gene-expression response reflect the evolutionary history of adaptation to these plants in the Vanessa and related genera, against the null hypothesis of transcriptional profiles reflecting plant phylogenetic relatedness. RESULT: Using both unsupervised and supervised methods of data analysis, we found that the tissue-specific patterns of caterpillar gene expression are better explained by the evolutionary history of adaptation of the insects to the plants than by plant phylogeny. CONCLUSION: Our findings suggest that V. cardui may use two sets of expressed genes to achieve polyphagy, one associated with the ancestral capability to consume Rosids and Asterids, and another allowing the caterpillar to incorporate a wide range of novel host plants.


Subject(s)
Biological Evolution , Butterflies/genetics , Animals , Butterflies/growth & development , Butterflies/physiology , Herbivory , Larva/physiology , Magnoliopsida/genetics , Magnoliopsida/physiology , Oviposition , Phylogeny , Transcriptome
14.
Eur Respir J ; 47(3): 898-909, 2016 Mar.
Article in English | MEDLINE | ID: mdl-26585430

ABSTRACT

In pulmonary sarcoidosis, CD4(+) T-cells expressing T-cell receptor Vα2.3 accumulate in the lungs of HLA-DRB1*03(+) patients. To investigate T-cell receptor-HLA-DRB1*03 interactions underlying recognition of hitherto unknown antigens, we performed detailed analyses of T-cell receptor expression on bronchoalveolar lavage fluid CD4(+) T-cells from sarcoidosis patients. Pulmonary sarcoidosis patients (n=43) underwent bronchoscopy with bronchoalveolar lavage. T-cell receptor α and β chains of CD4(+) T-cells were analysed by flow cytometry and DNA sequencing, and three-dimensional molecular models of T-cell receptor-HLA-DRB1*03 complexes were generated. Simultaneous expression of Vα2.3 with the Vβ22 chain was identified in the lungs of all HLA-DRB1*03(+) patients. Accumulated Vα2.3/Vβ22-expressing T-cells were highly clonal, with identical or near-identical Vα2.3 chain sequences and inter-patient similarities in Vβ22 chain amino acid distribution. Molecular modelling revealed specific T-cell receptor-HLA-DRB1*03-peptide interactions, with a previously identified, sarcoidosis-associated vimentin peptide, (Vim)429-443 DSLPLVDTHSKRTLL, matching both the HLA peptide-binding cleft and distinct T-cell receptor features perfectly. We demonstrate, for the first time, the accumulation of large clonal populations of specific Vα2.3/Vβ22 T-cell receptor-expressing CD4(+) T-cells in the lungs of HLA-DRB1*03(+) sarcoidosis patients. Several distinct contact points between Vα2.3/Vβ22 receptors and HLA-DRB1*03 molecules suggest presentation of prototypic vimentin-derived peptides.


Subject(s)
CD4-Positive T-Lymphocytes/immunology , HLA-DRB1 Chains/metabolism , Receptors, Antigen, T-Cell/immunology , Sarcoidosis, Pulmonary/immunology , Adult , Bronchoalveolar Lavage Fluid , Bronchoscopy , Female , Flow Cytometry , Humans , Lung/immunology , Male , Middle Aged , Models, Molecular , Sweden
15.
Gigascience ; 4: 56, 2015.
Article in English | MEDLINE | ID: mdl-26617983

ABSTRACT

BACKGROUND: It remains a challenge to perform de novo assembly using next-generation sequencing (NGS). Despite the availability of multiple sequencing technologies and tools (e.g., assemblers), it is still difficult to assemble new genomes at chromosome resolution (i.e., one sequence per chromosome). Obtaining high-quality draft assemblies is extremely important in the case of yeast genomes to better characterise major events in their evolutionary history. The aim of this work is two-fold: on the one hand, we want to show how combining different and somewhat complementary technologies is key to improving assembly quality and correctness; on the other hand, we present a de novo assembly pipeline that we believe will benefit core facility bioinformaticians. To demonstrate both the effectiveness of combining technologies and the simplicity of the pipeline, we present the results obtained for the Dekkera bruxellensis genome. METHODS: In this work we combined short-read Illumina data and long-read PacBio data with the extreme long-range information from OpGen optical maps for de novo genome assembly and finishing. Moreover, we developed NouGAT, a semi-automated pipeline for read preprocessing, de novo assembly and assembly evaluation, which was instrumental for this work. RESULTS: We obtained a high-quality draft assembly of a yeast genome, resolved at the chromosomal level. Furthermore, this assembly was corrected for mis-assembly errors, as demonstrated by the resolution of a large collapsed repeat and by higher scores from assembly evaluation tools. With the inclusion of PacBio data, we were able to fill about 5% of the optically mapped genome not covered by the Illumina data.


Subject(s)
Computational Biology/methods , Dekkera/genetics , Genome, Fungal , Chromosome Mapping/methods , High-Throughput Nucleotide Sequencing/methods , Sequence Analysis, DNA/methods , Software
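Assembly contiguity is commonly summarized by N50: the contig length L such that contigs of length L or longer contain at least half of the assembled bases. The sketch below computes it; it is offered as a generic example of an assembly evaluation metric, and the abstract does not state that NouGAT reports N50 in this form. The contig lengths are invented.

```python
def n50(contig_lengths: list[int]) -> int:
    """Length L such that contigs of length >= L hold at least half of all assembled bases."""
    if not contig_lengths:
        raise ValueError("empty assembly")
    lengths = sorted(contig_lengths, reverse=True)  # longest contigs first
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:
            return length
    return lengths[-1]  # not reached for a non-empty list; keeps the return type total


print(n50([100, 80, 50, 30, 20, 10]))
```

N50 rewards contiguity but not correctness: joining two contigs in the wrong order raises N50, which is why the abstract pairs contiguity with evidence such as resolved collapsed repeats and evaluation-tool scores.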
16.
J Med Genet ; 52(2): 111-22, 2015 Feb.
Article in English | MEDLINE | ID: mdl-25473103

ABSTRACT

BACKGROUND: Cytogenetically visible chromosomal translocations are highly informative as they can pinpoint strong-effect genes even in complex genetic disorders. METHODS AND RESULTS: Here, we report a mother and daughter, both with borderline intelligence and learning problems within the dyslexia spectrum, and two apparently balanced reciprocal translocations: t(1;8)(p22;q24) and t(5;18)(p15;q11). By low-coverage mate-pair whole-genome sequencing, we were able to pinpoint the genomic breakpoints to 2 kb intervals. By direct sequencing, we then located the chromosome 5p breakpoint to intron 9 of CTNND2. An additional case with a 163 kb microdeletion exclusively involving CTNND2 was identified with genome-wide array comparative genomic hybridisation. This microdeletion at 5p15.2 is also present in mosaic state in the patient's mother but absent from the healthy siblings. We then investigated the effect of CTNND2 polymorphisms on normal variability and identified a polymorphism (rs2561622) with a significant effect on phonological ability and white matter volume in the left frontal lobe, close to cortical regions previously associated with phonological processing. Finally, given the potential role of CTNND2 in neuron motility, we used morpholino knockdown in zebrafish embryos to assess its effects on neuronal migration in vivo. Analysis of the zebrafish forebrain revealed a subpopulation of neurons misplaced between the diencephalon and telencephalon. CONCLUSIONS: Taken together, our human genetic and in vivo data suggest that defective migration of subpopulations of neuronal cells due to haploinsufficiency of CTNND2 contributes to the cognitive dysfunction in our patients.


Subject(s)
Catenins/genetics , Genetic Association Studies , Genetic Predisposition to Disease , Intellectual Disability/genetics , Reading , Adolescent , Adult , Base Sequence , Child , Chromosome Breakpoints , Cognition , Exons/genetics , Female , Genetic Loci , Green Fluorescent Proteins/metabolism , Humans , Introns/genetics , Male , Molecular Sequence Data , Mutation/genetics , Pedigree , Polymorphism, Single Nucleotide/genetics , Sequence Analysis, DNA , Translocation, Genetic , White Matter/pathology , Young Adult , Zebrafish Proteins/genetics , Catenin delta
17.
BMC Bioinformatics ; 15: 281, 2014 Aug 15.
Article in English | MEDLINE | ID: mdl-25128196

ABSTRACT

BACKGROUND: The use of short reads from High Throughput Sequencing (HTS) techniques is now commonplace in de novo assembly. Yet, obtaining contiguous assemblies from short reads is challenging, thus making scaffolding an important step in the assembly pipeline. Different algorithms have been proposed, but many of them use the number of read pairs supporting a link between two contigs as an indicator of reliability. This reasoning is intuitive but fails to account for variation in link count due to contig features. We have also noted that published scaffolders are only evaluated on small datasets using output from only one assembler. Two issues arise from this. First, some of the available tools are not well suited for complex genomes. Second, these evaluations provide little support for inferring a tool's general performance. RESULTS: We propose a new algorithm, implemented in a tool called BESST, which can scaffold genomes of all sizes and complexities and was used to scaffold the genome of P. abies (20 Gbp). We performed a comprehensive comparison of BESST against the most popular stand-alone scaffolders on a large variety of datasets. Our results confirm that some of the popular scaffolders are not practical to run on complex datasets. Furthermore, no single stand-alone scaffolder outperforms the others on all datasets. However, BESST compares favorably with the other tested scaffolders on GAGE datasets and, moreover, outperforms the other methods when the library insert size distribution is wide. CONCLUSION: We conclude from our results that information sources other than the number of links, which is what is commonly used, can provide useful information about genome structure when scaffolding.


Subject(s)
Algorithms , Genomics/methods , High-Throughput Nucleotide Sequencing/methods , Sequence Analysis, DNA/methods , Software , Gene Library , Humans , Reproducibility of Results
18.
BMC Genomics ; 15: 439, 2014 Jun 06.
Article in English | MEDLINE | ID: mdl-24906298

ABSTRACT

BACKGROUND: Sampling genomes with Fosmid vectors and sequencing pooled Fosmid libraries on the Illumina platform for massively parallel sequencing is a novel and promising approach to optimizing the trade-off between sequencing costs and assembly quality. RESULTS: To sequence the genome of Norway spruce, which is of great size and complexity, we developed and applied a new technology based on the massive production, sequencing, and assembly of Fosmid pools (FP). The spruce chromosomes were sampled with ~40,000 bp Fosmid inserts to obtain around two-fold genome coverage, in parallel with traditional whole genome shotgun sequencing (WGS) of haploid and diploid genomes. Compared to the WGS results, the contiguity and quality of the FP assemblies were high, and they allowed us to fill WGS gaps resulting from repeats, low coverage, and allelic differences. The FP contig sets were further merged with WGS data using a new software package, GAM-NGS. CONCLUSIONS: Using FP technology, the first published conifer genome assembly was produced entirely from massively parallel sequencing. Here we provide a comprehensive report on the different features of the approach and the optimization of the process. We have made public the input data (FASTQ format) for the set of pools used in this study: ftp://congenie.org/congenie/Nystedt_2013/Assembly/ProcessedData/FosmidPools/ (alternatively accessible via http://congenie.org/downloads). The software used for running the assembly process is available at http://research.scilifelab.se/andrej_alexeyenko/downloads/fpools/.
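The sampling design described here implies some simple coverage arithmetic. A minimal sketch, assuming a genome of about 20 Gbp (the P. abies size quoted in the BESST abstract), the ~40 kb inserts and ~two-fold coverage stated here, and the standard Lander-Waterman/Poisson approximation with uniformly random clone placement, which the abstract itself does not state. Function names are illustrative only.

```python
import math


def fosmids_needed(genome_bp: float, insert_bp: float, coverage: float) -> int:
    """Clones needed so that clones * insert length >= coverage * genome size."""
    return math.ceil(coverage * genome_bp / insert_bp)


def fraction_covered(coverage: float) -> float:
    """Poisson (Lander-Waterman) estimate of the genome fraction covered by at
    least one clone, assuming clones land uniformly at random."""
    return 1.0 - math.exp(-coverage)


GENOME_BP = 20e9     # assumed ~20 Gbp Norway spruce genome
INSERT_BP = 40_000   # ~40 kb Fosmid inserts
COVERAGE = 2.0       # ~two-fold genome coverage

n = fosmids_needed(GENOME_BP, INSERT_BP, COVERAGE)
covered = fraction_covered(COVERAGE)
print(f"{n:,} Fosmids, ~{covered:.1%} of the genome covered at least once")
```

Under these assumptions the design calls for roughly one million Fosmid clones, and about 86% of the genome would be sampled at least once, a reminder that two-fold sampling alone leaves part of the genome unsampled.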


Subject(s)
Genetic Vectors , High-Throughput Nucleotide Sequencing/methods , Picea/genetics , Cloning, Molecular , Genome, Plant , High-Throughput Nucleotide Sequencing/economics , Software
19.
Nat Genet ; 46(5): 474-7, 2014 May.
Article in English | MEDLINE | ID: mdl-24658000

ABSTRACT

Glutamate receptors are well-known actors in the central and peripheral nervous systems, and altered glutamate signaling is implicated in several neurological and psychiatric disorders. It is increasingly recognized that such receptors may also have a role in tumor growth. Here we provide direct evidence of aberrant glutamate signaling in the development of a locally aggressive bone tumor, chondromyxoid fibroma (CMF). We subjected a series of CMFs to whole-genome mate-pair sequencing and RNA sequencing and found that the glutamate receptor gene GRM1 recombines with several partner genes through promoter swapping and gene fusion events. The GRM1 coding region remains intact, and 18 of 20 CMFs (90%) showed a more than 100-fold and up to 1,400-fold increase in GRM1 expression levels compared to control tissues. Our findings unequivocally demonstrate that direct targeting of GRM1 is a necessary and highly specific driver event for CMF development.


Subject(s)
Bone Neoplasms/genetics , Fibroma/genetics , Gene Expression Regulation, Neoplastic/genetics , Receptors, Metabotropic Glutamate/genetics , Signal Transduction/genetics , Base Sequence , Chromosome Banding , DNA Copy Number Variations , Gene Fusion/genetics , Humans , In Situ Hybridization, Fluorescence , Molecular Sequence Data , Netherlands , Promoter Regions, Genetic/genetics , Real-Time Polymerase Chain Reaction , Receptors, Metabotropic Glutamate/metabolism , Reverse Transcriptase Polymerase Chain Reaction , Sequence Analysis, RNA
20.
BMC Bioinformatics ; 14 Suppl 7: S6, 2013.
Article in English | MEDLINE | ID: mdl-23815503

ABSTRACT

BACKGROUND: In recent years more than 20 assemblers have been proposed to tackle the hard task of assembling NGS data. A common heuristic when assembling a genome is to use several assemblers and then select the best assembly according to some criteria. However, recent results clearly show that some assemblers lead to better statistics than others on specific regions but are outperformed on other regions or on different evaluation measures. To limit these problems we developed GAM-NGS (Genomic Assemblies Merger for Next Generation Sequencing), whose primary goal is to merge two or more assemblies in order to enhance contiguity and correctness. GAM-NGS does not rely on global alignment: regions of the two assemblies representing the same genomic locus (called blocks) are identified through read alignments and stored in a weighted graph. The merging phase is carried out with the help of this weighted graph, which allows an optimal resolution of local problematic regions. RESULTS: GAM-NGS has been tested on six different datasets and compared to other assembly reconciliation tools. The availability of a reference sequence for three of them allowed us to show that GAM-NGS outputs an improved, reliable set of sequences. GAM-NGS is also very efficient, merging assemblies with substantially fewer computational resources than comparable tools. To achieve this, GAM-NGS avoids global alignment between contigs, making its strategy unique among assembly reconciliation tools. CONCLUSIONS: The difficulty of obtaining correct and reliable assemblies with a single assembler is forcing the introduction of new algorithms able to enhance de novo assemblies. GAM-NGS is a tool able to merge two or more assemblies in order to improve contiguity and correctness. It can be used on all NGS-based assembly projects, and it shows its full potential with multi-library Illumina-based projects.
With more than 20 available assemblers, it is hard to select the best tool. In this context we propose a tool that improves assemblies (and, as a by-product, perhaps even assemblers) by merging them and selecting the alternative that is most likely to be correct.
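The block idea, matching regions of two assemblies through shared read alignments instead of contig-to-contig alignment and storing the matches in a weighted graph, can be sketched in simplified form. This is not GAM-NGS's implementation: the input format (one best alignment per read, given as a `(contig, position)` pair), the `min_reads` threshold, and the choice of contigs as graph nodes with shared-read counts as edge weights are all hypothetical assumptions for illustration.

```python
from collections import defaultdict

Alignment = tuple[str, int]  # hypothetical format: (contig id, 0-based position)


def find_blocks(aln_a: dict[str, Alignment], aln_b: dict[str, Alignment],
                min_reads: int = 2) -> dict[tuple[str, str], dict]:
    """Group reads aligned in both assemblies by (contig in A, contig in B)."""
    groups: dict[tuple[str, str], list[tuple[int, int]]] = defaultdict(list)
    for read, (ctg_a, pos_a) in aln_a.items():
        if read in aln_b:                      # read placed in both assemblies
            ctg_b, pos_b = aln_b[read]
            groups[(ctg_a, ctg_b)].append((pos_a, pos_b))
    blocks = {}
    for key, positions in groups.items():
        if len(positions) >= min_reads:        # drop weakly supported matches
            pa = [p for p, _ in positions]
            pb = [p for _, p in positions]
            blocks[key] = {"weight": len(positions),
                           "span_a": (min(pa), max(pa)),
                           "span_b": (min(pb), max(pb))}
    return blocks


def block_graph(blocks: dict) -> dict[str, dict[str, int]]:
    """Undirected weighted graph: contigs as nodes, shared reads as weights."""
    graph: dict[str, dict[str, int]] = defaultdict(dict)
    for (ctg_a, ctg_b), info in blocks.items():
        a, b = "A:" + ctg_a, "B:" + ctg_b
        graph[a][b] = info["weight"]
        graph[b][a] = info["weight"]
    return dict(graph)


aln_a = {"r1": ("a1", 100), "r2": ("a1", 180), "r3": ("a2", 50), "r4": ("a2", 90)}
aln_b = {"r1": ("b1", 5000), "r2": ("b1", 5080), "r3": ("b1", 9000),
         "r4": ("b1", 9040), "r5": ("b2", 10)}
blocks = find_blocks(aln_a, aln_b)
graph = block_graph(blocks)
print(graph["B:b1"])
```

In this toy input, contig `b1` of assembly B shares reads with both `a1` and `a2` of assembly A, the kind of pattern a merger could use to join `a1` and `a2` into a longer sequence.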


Subject(s)
High-Throughput Nucleotide Sequencing , Sequence Analysis, DNA/methods , Algorithms , Chromosomes/genetics , Genome, Bacterial , Genome, Human , Humans , Rhodobacter sphaeroides/genetics , Software , Staphylococcus aureus/genetics