1.
PLoS Genet ; 15(2): e1007858, 2019 Feb.
Article in English | MEDLINE | ID: mdl-30735495

ABSTRACT

Complex chromosomal rearrangements (CCRs) are rearrangements involving more than two chromosomes or more than two breakpoints. Whole genome sequencing (WGS) allows high-resolution characterization of such rearrangements at the nucleotide level in unique sequences, but mapping breakpoints in repetitive regions of the genome, which are known to be prone to rearrangements, remains problematic. Hence, multiple complementary WGS experiments are sometimes needed to solve the structures of CCRs. We have studied three individuals with CCRs: Case 1 and Case 2 presented with de novo karyotypically balanced, complex interchromosomal rearrangements (46,XX,t(2;8;15)(q35;q24.1;q22) and 46,XY,t(1;10;5)(q32;p12;q31)), and Case 3 presented with a de novo, extremely complex intrachromosomal rearrangement on chromosome 1. Molecular cytogenetic investigation revealed cryptic deletions at the breakpoints on chromosomes 2 and 8 in Case 1, and on chromosome 10 in Case 2, explaining their clinical symptoms. In Case 3, 26 breakpoints were identified using WGS, disrupting five known disease genes. All rearrangements were subsequently analyzed using optical maps, linked-read WGS, and short-read WGS. In conclusion, we present a case series of three unique de novo CCRs in which combining the results from the different technologies fully resolved the structure of each rearrangement. This demonstrates the power of combining short-read WGS with long-molecule sequencing or optical mapping for such unique de novo CCRs in a clinical setting.


Subject(s)
Chromosomes/genetics , Gene Rearrangement/genetics , Genomic Structural Variation/genetics , Chromosome Mapping/methods , Female , Humans , Male , Whole Genome Sequencing/methods
2.
PLoS Genet ; 14(11): e1007780, 2018 Nov.
Article in English | MEDLINE | ID: mdl-30419018

ABSTRACT

Clustered copy number variants (CNVs) as detected by chromosomal microarray analysis (CMA) are often reported as germline chromothripsis. However, such cases might need further investigation by massively parallel whole genome sequencing (WGS) in order to accurately define the underlying complex rearrangement, predict the occurrence mechanisms and identify additional complexities. Here, we utilized WGS to delineate the rearrangement structure of 21 clustered CNV carriers first investigated by CMA and identified a total of 83 breakpoint junctions (BPJs). The rearrangements were further sub-classified depending on the patterns observed: I) cases with only deletions (n = 8) often had additional structural rearrangements, such as insertions and inversions typical of chromothripsis; II) cases with only duplications (n = 7) or III) combinations of deletions and duplications (n = 6) demonstrated mostly interspersed duplications and BPJs enriched with microhomology. In two cases the rearrangement mutational signatures indicated both a breakage-fusion-bridge cycle process and haltered formation of a ring chromosome. Finally, we observed two cases with Alu- and LINE-mediated rearrangements as well as two unrelated individuals with seemingly identical clustered CNVs on 2p25.3, possibly a rare European founder rearrangement. In conclusion, through detailed characterization of the derivative chromosomes we show that multiple mechanisms are likely involved in the formation of clustered CNVs and add further evidence for chromoanagenesis mechanisms in both "simple" and highly complex chromosomal rearrangements. Finally, WGS characterization adds positional information, important for a correct clinical interpretation and for deciphering the mechanisms involved in the formation of these rearrangements.


Subject(s)
DNA Copy Number Variations , DNA Replication/genetics , Alu Elements , Chromosome Breakpoints , Chromothripsis , Gene Rearrangement , Genome, Human , Humans , Long Interspersed Nucleotide Elements , Oligonucleotide Array Sequence Analysis , Whole Genome Sequencing
3.
Nature ; 497(7451): 579-84, 2013 May 30.
Article in English | MEDLINE | ID: mdl-23698360

ABSTRACT

Conifers have dominated forests for more than 200 million years and are of huge ecological and economic importance. Here we present the draft assembly of the 20-gigabase genome of Norway spruce (Picea abies), the first available for any gymnosperm. The number of well-supported genes (28,354) is similar to the >100 times smaller genome of Arabidopsis thaliana, and there is no evidence of a recent whole-genome duplication in the gymnosperm lineage. Instead, the large genome size seems to result from the slow and steady accumulation of a diverse set of long-terminal repeat transposable elements, possibly owing to the lack of an efficient elimination mechanism. Comparative sequencing of Pinus sylvestris, Abies sibirica, Juniperus communis, Taxus baccata and Gnetum gnemon reveals that the transposable element diversity is shared among extant conifers. Expression of 24-nucleotide small RNAs, previously implicated in transposable element silencing, is tissue-specific and much lower than in other plants. We further identify numerous long (>10,000 base pairs) introns, gene-like fragments, uncharacterized long non-coding RNAs and short RNAs. This opens up new genomic avenues for conifer forestry and breeding.


Subject(s)
Evolution, Molecular , Genome, Plant/genetics , Picea/genetics , Conserved Sequence/genetics , DNA Transposable Elements/genetics , Gene Silencing , Genes, Plant/genetics , Genomics , Internet , Introns/genetics , Phenotype , RNA, Untranslated/genetics , Sequence Analysis, DNA , Terminal Repeat Sequences/genetics , Transcription, Genetic/genetics
4.
Hum Mutat ; 38(2): 180-192, 2017 Feb.
Article in English | MEDLINE | ID: mdl-27862604

ABSTRACT

Most balanced translocations are thought to result mechanistically from nonhomologous end joining or, in rare cases of recurrent events, from nonallelic homologous recombination. Here, we use low-coverage mate pair whole-genome sequencing to fine map rearrangement breakpoint junctions in both phenotypically normal and affected translocation carriers. In total, 46 junctions from 22 carriers of balanced translocations were characterized. Genes were disrupted in 48% of the breakpoints: recessive genes in four normal carriers and known dominant intellectual disability genes in three affected carriers. Finally, seven candidate disease genes were disrupted in five carriers with neurocognitive disabilities (SVOPL, SUSD1, TOX, NCALD, SLC4A10) and one XX-male carrier with Tourette syndrome (LYPD6, GPC5). Breakpoint junction analyses revealed microhomology and small templated insertions in a substantial fraction of the analyzed translocations (17.4%; n = 4), an observation that was substantiated by reanalysis of 37 previously published translocation junctions. Microhomology associated with templated insertions is a characteristic seen in the breakpoint junctions of rearrangements mediated by error-prone replication-based repair mechanisms. Our data suggest that a mechanism involving template switching might contribute to the formation of at least 15% of the interchromosomal translocation events.


Subject(s)
Chromosome Mapping , Translocation, Genetic , Whole Genome Sequencing , Base Sequence , Chromosome Breakage , Comparative Genomic Hybridization , DNA Copy Number Variations , Female , Genetic Association Studies , Genomics/methods , Genotype , Homologous Recombination , Humans , In Situ Hybridization, Fluorescence , Karyotype , Male , Phenotype
5.
Arch Toxicol ; 91(5): 2067-2078, 2017 May.
Article in English | MEDLINE | ID: mdl-27838757

ABSTRACT

Arsenic, a carcinogen with immunotoxic effects, is a common contaminant of drinking water and certain foods worldwide. We hypothesized that chronic arsenic exposure alters gene expression, potentially by altering DNA methylation of genes encoding central components of the immune system. We therefore analyzed the transcriptomes (by RNA sequencing) and methylomes (by target-enrichment next-generation sequencing) of primary CD4-positive T cells from matched groups of four women each in the Argentinean Andes, with fivefold differences in urinary arsenic concentrations (median concentrations of urinary arsenic in the lower- and high-arsenic groups: 65 and 276 µg/l, respectively). Arsenic exposure was associated with genome-wide alterations of gene expression; principal component analysis indicated that the exposure explained 53% of the variance in gene expression among the top variable genes, and 19% of 28,351 genes were differentially expressed (false discovery rate <0.05) between the exposure groups. Key genes regulating the immune system, such as tumor necrosis factor alpha and interferon gamma, as well as genes related to the NF-kappa-beta complex, were significantly downregulated in the high-arsenic group. Arsenic exposure was also associated with genome-wide DNA methylation; the high-arsenic group had 3 percentage points higher genome-wide full methylation (>80% methylation) than the lower-arsenic group. Differentially methylated regions that were hyper-methylated in the high-arsenic group showed enrichment for immune-related gene ontologies that constitute the basic functions of CD4-positive T cells, such as isotype switching and lymphocyte activation and differentiation. In conclusion, chronic arsenic exposure from drinking water was related to changes in the transcriptome and methylome of CD4-positive T cells, both genome wide and in specific genes, supporting the hypothesis that arsenic causes immunotoxicity by interfering with gene expression and regulation.


Subject(s)
Arsenic/toxicity , CD4-Positive T-Lymphocytes/drug effects , DNA Methylation/drug effects , Environmental Exposure/adverse effects , Gene Expression Regulation/drug effects , Adult , Argentina , CD4-Positive T-Lymphocytes/physiology , CpG Islands , Female , Gene Expression Profiling , High-Throughput Nucleotide Sequencing , Humans , Middle Aged , Promoter Regions, Genetic
6.
BMC Bioinformatics ; 17 Suppl 4: 69, 2016 Mar 02.
Article in English | MEDLINE | ID: mdl-26961371

ABSTRACT

BACKGROUND: Bisulfite treatment of DNA followed by sequencing (BS-seq) has become a standard technique in epigenetic studies, providing researchers with tools for generating single-base resolution maps of whole methylomes. Aligning bisulfite-treated reads, however, is a computationally difficult task: bisulfite treatment decreases the (lexical) complexity of low-methylated genomic regions, and C-to-T mismatches may reflect cytosine unmethylation rather than SNPs or sequencing errors. Further challenges arise both during and after the alignment phase: data structures used by the aligner should be fast and should fit into main memory, and the methylation-caller output should be compressed, due to its significant size. METHODS: As far as data structures employed to align bisulfite-treated reads are concerned, solutions proposed in the literature can be roughly grouped into two main categories: those storing pointers at each text position (e.g. hash tables, suffix trees/arrays), and those using the information-theoretic minimum number of bits (e.g. FM indexes and compressed suffix arrays). The former are fast but memory-consuming; the latter are much slower but lightweight. In this paper, we try to close this gap by proposing a data structure for aligning bisulfite-treated reads which is at the same time fast, light, and very accurate. We reach this objective by combining a recent theoretical result on succinct hashing with a bisulfite-aware hash function. Furthermore, the new versions of the tools implementing our ideas (the aligner ERNE-BS5 2 and the caller ERNE-METH 2) have been extended with increased downstream compatibility (EPP/Bismark cov output formats), output compression, and support for target enrichment protocols.
RESULTS: Experimental results on public and simulated WGBS libraries show that our algorithmic solution is a competitive tradeoff between hash-based and BWT-based indexes, being as fast and accurate as the former, and as memory-efficient as the latter. CONCLUSIONS: The new functionalities of our bisulfite aligner and caller make it a fast and memory efficient tool, useful to analyze big datasets with little computational resources, to easily process target enrichment data, and produce statistics such as protocol efficiency and coverage as a function of the distance from target regions.


Subject(s)
DNA Methylation , DNA/chemistry , Epigenomics , Sequence Analysis, DNA/methods , Software , Sulfites/chemistry , CpG Islands , Data Compression , Genome, Human , Genomics/methods , High-Throughput Nucleotide Sequencing , Humans
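
The "bisulfite-aware hash function" mentioned in the abstract above is not specified there. As a rough illustration of the general idea only, the following sketch assumes the simplest possible scheme: every C is collapsed to T before hashing, so that a converted read and the unconverted reference fall into the same hash bucket. The function names and the base-3 encoding are hypothetical and are not taken from ERNE-BS5 2; reverse-strand (G-to-A) conversion and succinct storage are left out.

```python
def bisulfite_collapse(seq: str) -> str:
    """Map every C to T so converted and unconverted cytosines look alike."""
    return seq.upper().replace("C", "T")


def bisulfite_aware_hash(kmer: str) -> int:
    """Hash a k-mer over the reduced alphabet {A, G, T} (base-3 encoding)."""
    code = {"A": 0, "G": 1, "T": 2}
    value = 0
    for base in bisulfite_collapse(kmer):
        value = value * 3 + code[base]
    return value


def build_index(reference: str, k: int) -> dict[int, list[int]]:
    """Store each reference position under the hash of its collapsed k-mer."""
    index: dict[int, list[int]] = {}
    for pos in range(len(reference) - k + 1):
        key = bisulfite_aware_hash(reference[pos:pos + k])
        index.setdefault(key, []).append(pos)
    return index


# Toy reference (assumption: forward strand only).
reference = "ACGTTCGGA"
index = build_index(reference, k=4)

# Read sampled from position 4 ("TCGG") whose unmethylated C became T.
converted_read = "TTGG"
candidates = index.get(bisulfite_aware_hash(converted_read), [])
```

Candidate positions found this way would still need to be verified against the original reference, since collapsing C to T also makes genuinely different k-mers collide.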
7.
BMC Evol Biol ; 16: 59, 2016 Mar 08.
Article in English | MEDLINE | ID: mdl-26956800

ABSTRACT

BACKGROUND: Although most insect species are specialized on one or few groups of plants, there are phytophagous insects that seem to use virtually any kind of plant as food. Understanding the nature of this ability to feed on a wide repertoire of plants is crucial for the control of pest species and for the elucidation of the macroevolutionary mechanisms of speciation and diversification of insect herbivores. Here we studied Vanessa cardui, the species with the widest diet breadth among butterflies and a potential insect pest, by comparing tissue-specific transcriptomes from caterpillars that were reared on different host plants. We tested whether the similarities of gene-expression response reflect the evolutionary history of adaptation to these plants in the Vanessa and related genera, against the null hypothesis of transcriptional profiles reflecting plant phylogenetic relatedness. RESULT: Using both unsupervised and supervised methods of data analysis, we found that the tissue-specific patterns of caterpillar gene expression are better explained by the evolutionary history of adaptation of the insects to the plants than by plant phylogeny. CONCLUSION: Our findings suggest that V. cardui may use two sets of expressed genes to achieve polyphagy, one associated with the ancestral capability to consume Rosids and Asterids, and another allowing the caterpillar to incorporate a wide range of novel host-plants.


Subject(s)
Biological Evolution , Butterflies/genetics , Animals , Butterflies/growth & development , Butterflies/physiology , Herbivory , Larva/physiology , Magnoliopsida/genetics , Magnoliopsida/physiology , Oviposition , Phylogeny , Transcriptome
8.
Eur Respir J ; 47(3): 898-909, 2016 Mar.
Article in English | MEDLINE | ID: mdl-26585430

ABSTRACT

In pulmonary sarcoidosis, CD4(+) T-cells expressing T-cell receptor Vα2.3 accumulate in the lungs of HLA-DRB1*03(+) patients. To investigate T-cell receptor-HLA-DRB1*03 interactions underlying recognition of hitherto unknown antigens, we performed detailed analyses of T-cell receptor expression on bronchoalveolar lavage fluid CD4(+) T-cells from sarcoidosis patients. Pulmonary sarcoidosis patients (n=43) underwent bronchoscopy with bronchoalveolar lavage. T-cell receptor α and β chains of CD4(+) T-cells were analysed by flow cytometry and DNA-sequenced, and three-dimensional molecular models of T-cell receptor-HLA-DRB1*03 complexes were generated. Simultaneous expression of Vα2.3 with the Vβ22 chain was identified in the lungs of all HLA-DRB1*03(+) patients. Accumulated Vα2.3/Vβ22-expressing T-cells were highly clonal, with identical or near-identical Vα2.3 chain sequences and inter-patient similarities in Vβ22 chain amino acid distribution. Molecular modelling revealed specific T-cell receptor-HLA-DRB1*03-peptide interactions, with a previously identified, sarcoidosis-associated vimentin peptide, (Vim)429-443 DSLPLVDTHSKRTLL, matching both the HLA peptide-binding cleft and distinct T-cell receptor features perfectly. We demonstrate, for the first time, the accumulation of large clonal populations of specific Vα2.3/Vβ22 T-cell receptor-expressing CD4(+) T-cells in the lungs of HLA-DRB1*03(+) sarcoidosis patients. Several distinct contact points between Vα2.3/Vβ22 receptors and HLA-DRB1*03 molecules suggest presentation of prototypic vimentin-derived peptides.


Subject(s)
CD4-Positive T-Lymphocytes/immunology , HLA-DRB1 Chains/metabolism , Receptors, Antigen, T-Cell/immunology , Sarcoidosis, Pulmonary/immunology , Adult , Bronchoalveolar Lavage Fluid , Bronchoscopy , Female , Flow Cytometry , Humans , Lung/immunology , Male , Middle Aged , Models, Molecular , Sweden
9.
J Med Genet ; 52(2): 111-22, 2015 Feb.
Article in English | MEDLINE | ID: mdl-25473103

ABSTRACT

BACKGROUND: Cytogenetically visible chromosomal translocations are highly informative as they can pinpoint strong effect genes even in complex genetic disorders. METHODS AND RESULTS: Here, we report a mother and daughter, both with borderline intelligence and learning problems within the dyslexia spectrum, and two apparently balanced reciprocal translocations: t(1;8)(p22;q24) and t(5;18)(p15;q11). By low coverage mate-pair whole-genome sequencing, we were able to pinpoint the genomic breakpoints to 2 kb intervals. By direct sequencing, we then located the chromosome 5p breakpoint to intron 9 of CTNND2. An additional case with a 163 kb microdeletion exclusively involving CTNND2 was identified with genome-wide array comparative genomic hybridisation. This microdeletion at 5p15.2 is also present in mosaic state in the patient's mother but absent from the healthy siblings. We then investigated the effect of CTNND2 polymorphisms on normal variability and identified a polymorphism (rs2561622) with a significant effect on phonological ability and white matter volume in the left frontal lobe, close to cortical regions previously associated with phonological processing. Finally, given the potential role of CTNND2 in neuron motility, we used morpholino knockdown in zebrafish embryos to assess its effects on neuronal migration in vivo. Analysis of the zebrafish forebrain revealed a subpopulation of neurons misplaced between the diencephalon and telencephalon. CONCLUSIONS: Taken together, our human genetic and in vivo data suggest that defective migration of subpopulations of neuronal cells due to haploinsufficiency of CTNND2 contributes to the cognitive dysfunction in our patients.


Subject(s)
Catenins/genetics , Genetic Association Studies , Genetic Predisposition to Disease , Intellectual Disability/genetics , Reading , Adolescent , Adult , Base Sequence , Child , Chromosome Breakpoints , Cognition , Exons/genetics , Female , Genetic Loci , Green Fluorescent Proteins/metabolism , Humans , Introns/genetics , Male , Molecular Sequence Data , Mutation/genetics , Pedigree , Polymorphism, Single Nucleotide/genetics , Sequence Analysis, DNA , Translocation, Genetic , White Matter/pathology , Young Adult , Zebrafish Proteins/genetics , Delta Catenin
10.
BMC Bioinformatics ; 15: 281, 2014 Aug 15.
Article in English | MEDLINE | ID: mdl-25128196

ABSTRACT

BACKGROUND: The use of short reads from High Throughput Sequencing (HTS) techniques is now commonplace in de novo assembly. Yet, obtaining contiguous assemblies from short reads is challenging, thus making scaffolding an important step in the assembly pipeline. Different algorithms have been proposed but many of them use the number of read pairs supporting a link between two contigs as an indicator of reliability. This reasoning is intuitive, but fails to account for variation in link count due to contig features. We have also noted that published scaffolders are only evaluated on small datasets using output from only one assembler. Two issues arise from this. Firstly, some of the available tools are not well suited for complex genomes. Secondly, these evaluations provide little support for inferring a software's general performance. RESULTS: We propose a new algorithm, implemented in a tool called BESST, which can scaffold genomes of all sizes and complexities and was used to scaffold the genome of P. abies (20 Gbp). We performed a comprehensive comparison of BESST against the most popular stand-alone scaffolders on a large variety of datasets. Our results confirm that some of the popular scaffolders are not practical to run on complex datasets. Furthermore, no single stand-alone scaffolder outperforms the others on all datasets. However, BESST compares favorably with the other tested scaffolders on GAGE datasets and, moreover, outperforms the other methods when the library insert size distribution is wide. CONCLUSION: We conclude from our results that information sources other than the quantity of links, as is commonly used, can provide useful information about genome structure when scaffolding.


Subject(s)
Algorithms , Genomics/methods , High-Throughput Nucleotide Sequencing/methods , Sequence Analysis, DNA/methods , Software , Gene Library , Humans , Reproducibility of Results
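
The point in the abstract above that raw link counts vary with contig features can be made concrete with a small calculation. The sketch below is not the BESST model; it assumes a deliberately simplified library (fixed insert size, fixed read length, uniform coverage) and counts how many fragment start positions can place one read entirely on contig A and its mate entirely on contig B. All names and parameters are hypothetical.

```python
def spanning_positions(len_a: int, len_b: int, gap: int,
                       insert_size: int, read_len: int) -> int:
    """Count fragment start positions x on contig A (0-based) such that the
    left read [x, x + read_len) lies on A and the right read, which ends at
    x + insert_size, lies on contig B placed `gap` bases after A."""
    lo = max(0, len_a + gap - insert_size + read_len)
    hi = min(len_a - read_len, len_a + gap + len_b - insert_size)
    return max(0, hi - lo + 1)


def expected_links(len_a: int, len_b: int, gap: int, insert_size: int,
                   read_len: int, pairs_per_position: float) -> float:
    """Expected number of linking pairs under uniform coverage."""
    positions = spanning_positions(len_a, len_b, gap, insert_size, read_len)
    return positions * pairs_per_position


# Same true adjacency, same library, different length of the second contig.
long_neighbour = spanning_positions(1000, 1000, 100, 500, 100)
short_neighbour = spanning_positions(1000, 150, 100, 500, 100)
```

Under these assumptions a correct join to a short contig is supported by far fewer links than a correct join to a long one, which is why a fixed link-count threshold can be misleading.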
11.
BMC Genomics ; 15: 439, 2014 Jun 06.
Article in English | MEDLINE | ID: mdl-24906298

ABSTRACT

BACKGROUND: Sampling genomes with Fosmid vectors and sequencing of pooled Fosmid libraries on the Illumina platform for massively parallel sequencing is a novel and promising approach to optimizing the trade-off between sequencing costs and assembly quality. RESULTS: In order to sequence the genome of Norway spruce, which is of great size and complexity, we developed and applied a new technology based on the massive production, sequencing, and assembly of Fosmid pools (FP). The spruce chromosomes were sampled with ~40,000 bp Fosmid inserts to obtain around two-fold genome coverage, in parallel with traditional whole genome shotgun sequencing (WGS) of haploid and diploid genomes. Compared to the WGS results, the contiguity and quality of the FP assemblies were high, and they allowed us to fill WGS gaps resulting from repeats, low coverage, and allelic differences. The FP contig sets were further merged with WGS data using a novel software package GAM-NGS. CONCLUSIONS: By exploiting FP technology, the first published assembly of a conifer genome was sequenced entirely with massively parallel sequencing. Here we provide a comprehensive report on the different features of the approach and the optimization of the process. We have made public the input data (FASTQ format) for the set of pools used in this study: ftp://congenie.org/congenie/Nystedt_2013/Assembly/ProcessedData/FosmidPools/ (alternatively accessible via http://congenie.org/downloads). The software used for running the assembly process is available at http://research.scilifelab.se/andrej_alexeyenko/downloads/fpools/.


Subject(s)
Genetic Vectors , High-Throughput Nucleotide Sequencing/methods , Picea/genetics , Cloning, Molecular , Genome, Plant , High-Throughput Nucleotide Sequencing/economics , Software
12.
BMC Bioinformatics ; 14 Suppl 7: S6, 2013.
Article in English | MEDLINE | ID: mdl-23815503

ABSTRACT

BACKGROUND: In recent years more than 20 assemblers have been proposed to tackle the hard task of assembling NGS data. A common heuristic when assembling a genome is to use several assemblers and then select the best assembly according to some criteria. However, recent results clearly show that some assemblers lead to better statistics than others on specific regions but are outperformed on other regions or on different evaluation measures. To limit these problems we developed GAM-NGS (Genomic Assemblies Merger for Next Generation Sequencing), whose primary goal is to merge two or more assemblies in order to enhance contiguity and correctness of both. GAM-NGS does not rely on global alignment: regions of the two assemblies representing the same genomic locus (called blocks) are identified through reads' alignments and stored in a weighted graph. The merging phase is carried out with the help of this weighted graph that allows an optimal resolution of local problematic regions. RESULTS: GAM-NGS has been tested on six different datasets and compared to other assembly reconciliation tools. The availability of a reference sequence for three of them allowed us to show how GAM-NGS is a tool able to output an improved reliable set of sequences. GAM-NGS is also a very efficient tool able to merge assemblies using substantially less computational resources than comparable tools. In order to achieve such goals, GAM-NGS avoids global alignment between contigs, making its strategy unique among other assembly reconciliation tools. CONCLUSIONS: The difficulty to obtain correct and reliable assemblies using a single assembler is forcing the introduction of new algorithms able to enhance de novo assemblies. GAM-NGS is a tool able to merge two or more assemblies in order to improve contiguity and correctness. It can be used on all NGS-based assembly projects and it shows its full potential with multi-library Illumina-based projects. 
With more than 20 available assemblers it is hard to select the best tool. In this context we propose a tool that improves assemblies (and, as a by-product, perhaps even assemblers) by merging them and selecting the generating that is most likely to be correct.


Subject(s)
High-Throughput Nucleotide Sequencing , Sequence Analysis, DNA/methods , Algorithms , Chromosomes/genetics , Genome, Bacterial , Genome, Human , Humans , Rhodobacter sphaeroides/genetics , Software , Staphylococcus aureus/genetics
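
The abstract above describes identifying corresponding regions of two assemblies through shared read alignments and storing them in a weighted graph. A minimal sketch of that idea follows, under a strong simplification: each read maps to exactly one contig per assembly, and a "block" is reduced to a contig pair weighted by the number of reads the two contigs share. The data layout, the function name and the `min_reads` threshold are assumptions for illustration, not the GAM-NGS implementation.

```python
from collections import Counter


def shared_read_graph(aln_a: dict[str, str], aln_b: dict[str, str],
                      min_reads: int = 2) -> dict[tuple[str, str], int]:
    """Build weighted edges (contig in assembly A, contig in assembly B),
    where the weight is the number of reads aligned to both contigs."""
    weights: Counter = Counter()
    for read_id, contig_a in aln_a.items():
        contig_b = aln_b.get(read_id)
        if contig_b is not None:
            weights[(contig_a, contig_b)] += 1
    # Keep only edges supported by enough reads.
    return {edge: w for edge, w in weights.items() if w >= min_reads}


# Toy alignments: read id -> contig it maps to in each assembly.
aln_a = {"r1": "A1", "r2": "A1", "r3": "A1", "r4": "A2", "r5": "A2"}
aln_b = {"r1": "B1", "r2": "B1", "r3": "B2", "r4": "B2", "r5": "B2"}
graph = shared_read_graph(aln_a, aln_b)
```

Because the graph is derived from read placements alone, no contig-to-contig global alignment is needed, which matches the strategy the abstract emphasizes.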
13.
Bioinformatics ; 28(1): 123-4, 2012 Jan 01.
Article in English | MEDLINE | ID: mdl-22084252

ABSTRACT

SUMMARY: The advent of high-throughput sequencers (HTS) introduced the need for new tools to analyse the large amount of data that those machines are able to produce. The mandatory first step for a wide range of analyses is the alignment of the sequences against a reference genome. We present a major update to our rNA (randomized Numerical Aligner) tool. The main feature of rNA is that it achieves an accuracy greater than the majority of other tools in a feasible amount of time. rNA executables and source code are freely downloadable at http://iga-rna.sourceforge.net/. CONTACT: vezzi@appliedgenomics.org; delfabbro@appliedgenomics.org SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
High-Throughput Nucleotide Sequencing/methods , Sequence Alignment/methods , Sequence Analysis, DNA/methods , Software , Humans
14.
BMC Bioinformatics ; 13 Suppl 14: S8, 2012.
Article in English | MEDLINE | ID: mdl-23095524

ABSTRACT

BACKGROUND: Next Generation Sequencing technologies are able to provide high genome coverage at a relatively low cost. However, due to limited read length (from 30 bp up to 200 bp), specific bioinformatics problems have become even more difficult to solve. De novo assembly with short reads, for example, is more complicated for at least two reasons: first, the overall amount of "noisy" data to cope with has increased and, second, as read length decreases the number of unsolvable repeats grows. The aim of our work is to go to the root of the problem by providing a pre-processing tool capable of producing (in silico) longer and highly accurate sequences from a collection of Next Generation Sequencing reads. RESULTS: In this paper a seed-and-extend local assembler is presented. The kernel algorithm is a loop that, starting from a read used as seed, keeps extending it using heuristics whose main goal is to produce a collection of error-free and longer sequences. In particular, GapFiller carefully detects reliable overlaps and clusters similar reads in order to reconstruct the missing part between the two ends of the same insert. Our tool's output has been validated on 24 experiments using both simulated and real paired-read datasets. The output sequences are declared correct when the seed-mate is found. In the experiments performed, GapFiller was able to extend high percentages of the processed seeds and find their mates, with a false-positive rate that turned out to be nearly negligible. CONCLUSIONS: GapFiller, starting from a sufficiently high short-read coverage, is able to produce high coverage of accurate longer sequences (from 300 bp up to 3500 bp). The procedure to perform safe extensions, together with the mate-found check, turned out to be a powerful criterion to guarantee contig correctness.
GapFiller has further potential, as it could be applied in a number of different scenarios, including the post-processing validation of insertions/deletions detection pipelines, pre-processing routines on datasets for de novo assembly pipelines, or in any hierarchical approach designed to assemble, analyse or validate pools of sequences.


Subject(s)
Algorithms , High-Throughput Nucleotide Sequencing , Sequence Analysis, DNA/methods , Contig Mapping , Genome , Genome, Human , Humans , Rhodobacter sphaeroides/genetics , Software , Staphylococcus aureus/genetics
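
The seed-and-extend loop with a mate-found check described in the abstract above can be sketched in a few lines. This is a toy greedy version under stated assumptions, not GapFiller's heuristics: reads are error-free, the mate is given in the same orientation as the seed, and the read with the longest suffix-prefix overlap is always chosen. The names `overlap` and `extend_seed` and the parameters are hypothetical.

```python
def overlap(left: str, right: str, min_overlap: int) -> int:
    """Length of the longest suffix of `left` that is a prefix of `right`,
    or 0 if it is shorter than `min_overlap`."""
    for size in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def extend_seed(seed: str, mate: str, reads: list[str],
                min_overlap: int = 3, max_steps: int = 100) -> tuple[str, bool]:
    """Greedily extend `seed` to the right until the contig ends with `mate`.
    Returns the contig and whether the mate was found."""
    contig = seed
    for _ in range(max_steps):
        if contig.endswith(mate):
            return contig, True
        best_read, best_size = None, 0
        for read in reads:
            size = overlap(contig, read, min_overlap)
            # The read must overlap enough and add at least one new base.
            if best_size < size < len(read):
                best_read, best_size = read, size
        if best_read is None:
            return contig, False
        contig += best_read[best_size:]
    return contig, contig.endswith(mate)


# Toy insert "ACGTTGCAAGGCTA" sampled as 6 bp reads.
reads = ["GTTGCA", "TGCAAG", "CAAGGC", "AGGCTA"]
result = extend_seed("ACGTTG", "AGGCTA", reads)
```

Returning the flag alongside the contig mirrors the idea of accepting an extension only when the seed's mate is reached; a contig returned with `False` would be discarded or flagged.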
15.
Clin Chim Acta ; 512: 40-48, 2021 Jan.
Article in English | MEDLINE | ID: mdl-33227269

ABSTRACT

The aim of this study was to evaluate the performance of a novel NGS-based assay to monitor mixed chimerism (MC) and compare its technical capacity to established techniques for chimerism analysis. Artificial and clinical samples with increasing amounts of patient DNA were compared using real-time PCR detection of indels and SNPs, fragment analysis of short-tandem repeats (STR) and NGS analysis of indels. Real-time PCR displayed excellent sensitivity (>0.01%) but poor accuracy (>20 CV% at MC > 20%), while fragment analysis exhibited good accuracy (<5 CV% at MC > 20%) with limited sensitivity (>2.5%). In contrast, NGS chimerism demonstrated a sensitivity (>0.1%) equal to real-time PCR and an accuracy equal to or better than STR analysis throughout an extensive range of mixed chimerism (0.1 - 100%). To evaluate performance of the separate techniques for chimerism determination, 75 retrospective patient monitoring samples (3-7 weeks post-HSCT) with low (<5%), intermediate (5-20%) or high mixed chimerism (>20%) were analyzed. The between-run precision for the NGS assay varied from 0.72% (>20% MC) to 7.38% (MC < 5%). In conclusion, NGS displayed a combination of high sensitivity with good accuracy in both artificial and clinical chimerism samples.


Subject(s)
Chimerism , Hematopoietic Stem Cell Transplantation , DNA , High-Throughput Nucleotide Sequencing , Humans , Microsatellite Repeats , Retrospective Studies , Transplantation Chimera
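
The abstract above reports accuracy as a coefficient of variation (CV%) and chimerism as a percentage. How such figures could be derived from read counts is sketched below, assuming an informative indel marker whose patient-specific allele is homozygous in the patient and absent from the donor, so that the patient fraction equals the allele read fraction. That assumption, and both function names, are illustrative and are not taken from the assay described.

```python
from statistics import mean, stdev


def mixed_chimerism_percent(patient_allele_reads: int, total_reads: int) -> float:
    """Patient DNA fraction in percent at one informative marker
    (assumes the marker is homozygous in the patient, absent in the donor)."""
    return 100.0 * patient_allele_reads / total_reads


def cv_percent(values: list[float]) -> float:
    """Coefficient of variation in percent: sample standard deviation / mean."""
    return 100.0 * stdev(values) / mean(values)


# One marker: 50 patient-specific reads out of 10,000 at that locus.
mc = mixed_chimerism_percent(50, 10_000)

# Between-run precision from repeated measurements of the same sample.
runs = [9.0, 10.0, 11.0]
precision = cv_percent(runs)
```

In practice several informative markers would be averaged per sample, and heterozygous markers would need a factor of two; both refinements are omitted here.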
16.
Mol Ecol Resour ; 20(5): 1171-1181, 2020 Sep.
Article in English | MEDLINE | ID: mdl-30848092

ABSTRACT

The high-throughput capacities of the Illumina sequencing platforms and the possibility to label samples individually have encouraged wide use of sample multiplexing. However, this practice results in read misassignment (usually <1%) across samples sequenced on the same lane. Alarmingly high rates of read misassignment of up to 10% were reported for Illumina sequencing machines with exclusion amplification chemistry. This may make use of these platforms prohibitive, particularly in studies that rely on low-quantity and low-quality samples, such as historical and archaeological specimens. Here, we use barcodes, short sequences that are ligated to both ends of the DNA insert, to directly quantify the rate of index hopping in 100-year-old museum-preserved gorilla (Gorilla beringei) samples. Correcting for multiple sources of noise, we identify on average 0.470% of reads containing a hopped index. We show that the sample-specific quantity of misassigned reads depends on the number of reads that any given sample contributes to the total sequencing pool, so that samples with few sequenced reads receive the greatest proportion of misassigned reads. This particularly affects ancient DNA samples, as these frequently differ in their DNA quantity and endogenous content. Through simulations we show that even low rates of index hopping, as reported here, can lead to biases in ancient DNA studies when multiplexing samples with vastly different quantities of endogenous material.


Subject(s)
Ancient DNA , High-Throughput Nucleotide Sequencing , DNA Sequence Analysis , Animals , DNA , Taxonomic DNA Barcoding , Gorilla gorilla/genetics , High-Throughput Nucleotide Sequencing/methods , DNA Sequence Analysis/methods
17.
Genes (Basel) ; 9(10)2018 Oct 09.
Article in English | MEDLINE | ID: mdl-30304863

ABSTRACT

The current human reference sequence (GRCh38) is a foundation for large-scale sequencing projects. However, recent studies have suggested that GRCh38 may be incomplete and give a suboptimal representation of specific population groups. Here, we performed a de novo assembly of two Swedish genomes that revealed over 10 Mb of sequences absent from the human GRCh38 reference in each individual. Around 6 Mb of these novel sequences (NS) are shared with a Chinese personal genome. The NS are highly repetitive, have an elevated GC-content, and are primarily located in centromeric or telomeric regions. Up to 1 Mb of NS can be assigned to chromosome Y, and large segments are also missing from GRCh38 at chromosomes 14, 17, and 21. Inclusion of NS into the GRCh38 reference radically improves the alignment and variant calling from short-read whole-genome sequencing data at several genomic loci. A re-analysis of a Swedish population-scale sequencing project yields > 75,000 putative novel single nucleotide variants (SNVs) and removes > 10,000 false positive SNV calls per individual, some of which are located in protein coding regions. Our results highlight that the GRCh38 reference is not yet complete and demonstrate that personal genome assemblies from local populations can improve the analysis of short-read whole-genome sequencing data.

18.
PLoS One ; 13(3): e0193928, 2018.
Article in English | MEDLINE | ID: mdl-29529047

ABSTRACT

The detection of recurrent somatic chromosomal rearrangements is standard of care for most leukemia types. Even though karyotype analysis, a low-resolution genome-wide chromosome analysis, is still the gold standard, it often needs to be complemented with other methods to increase resolution. To evaluate the feasibility and applicability of mate pair whole genome sequencing (MP-WGS) to detect structural chromosomal rearrangements in the diagnostic setting, we sequenced ten bone marrow samples from leukemia patients with recurrent rearrangements. Samples were selected based on cytogenetic and FISH results at leukemia diagnosis to include common rearrangements of prognostic relevance. Using MP-WGS and in-house bioinformatic analysis, all sought rearrangements were successfully detected. In addition, unexpected complexity or additional, previously undetected rearrangements were unraveled in three samples. Finally, the MP-WGS analysis pinpointed the location of chromosome junctions at high resolution, and we were able to identify the exact exons involved in the resulting fusion genes in all samples and the specific junction at the nucleotide level in half of the samples. The results show that our approach combines the screening character of karyotype analysis with the specificity and resolution of cytogenetic and molecular methods. As a result of the straightforward analysis and high-resolution detection of clinically relevant rearrangements, we conclude that MP-WGS is a feasible method for routine leukemia diagnostics of structural chromosomal rearrangements.


Subject(s)
Chromosome Aberrations , Leukemia/genetics , Whole Genome Sequencing/methods , Bone Marrow , Computational Biology , Early Detection of Cancer , Exons , Feasibility Studies , Humans , Fluorescence In Situ Hybridization , Leukemia/pathology
19.
F1000Res ; 6: 664, 2017.
Article in English | MEDLINE | ID: mdl-28781756

ABSTRACT

Reliable detection of large structural variation (> 1000 bp) is important in both rare and common genetic disorders. Whole genome sequencing (WGS) is a technology that may be used to identify a large proportion of the genomic structural variants (SVs) in an individual in a single experiment. Even though SV callers have been extensively used in research to detect mutations, the potential usage of SV callers within routine clinical diagnostics is hindered by high computational costs, usage of non-standard output formats, and limited support for the various sequencing platforms and libraries. Another well-known but not well-addressed problem is the large number of benign variants and reference errors present in the human genome, which further complicates analysis. Here we present TIDDIT, a time-efficient variant caller that uses discordant read pairs as well as the depth of coverage and split reads to detect and classify a large spectrum of SVs. As part of the software suite, TIDDIT also includes a database functionality that enables filtering for rare variants and reduces the number of false positive calls and background noise. Benchmarked against five state-of-the-art SV callers, TIDDIT performs at an equal or superior level while using only 2 CPU hours per sample. Thanks to its speed, sensitivity, flexibility and ability to easily detect variants on a wide range of WGS library types, TIDDIT solves many of the problems that are currently hindering the utilization of WGS for SV calling in clinical settings.

20.
Eur J Hum Genet ; 25(11): 1253-1260, 2017 11.
Article in English | MEDLINE | ID: mdl-28832569

ABSTRACT

Here we describe the SweGen data set, a comprehensive map of genetic variation in the Swedish population. These data represent a basic resource for clinical genetics laboratories as well as for sequencing-based association studies by providing information on genetic variant frequencies in a cohort that is well matched to national patient cohorts. To select samples for this study, we first examined the genetic structure of the Swedish population using high-density SNP-array data from a nationwide cohort of over 10 000 Swedish-born individuals included in the Swedish Twin Registry. A total of 1000 individuals, reflecting a cross-section of the population and capturing the main genetic structure, were selected for whole-genome sequencing. Analysis pipelines were developed for automated alignment, variant calling and quality control of the sequencing data. This resulted in a genome-wide collection of aggregated variant frequencies in the Swedish population that we have made available to the scientific community through the website https://swefreq.nbis.se. A total of 29.2 million single-nucleotide variants and 3.8 million indels were detected in the 1000 samples, with 9.9 million of these variants not present in current databases. Each sample contributed an average of 7199 individual-specific variants. In addition, an average of 8645 larger structural variants (SVs) were detected per individual, and we demonstrate that the population frequencies of these SVs can be used for efficient filtering analyses. Finally, our results show that the genetic diversity within Sweden is substantial compared with the diversity among continental European populations, underscoring the relevance of establishing a local reference data set.
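
The idea of using population frequencies to filter structural variants can be sketched in a few lines of Python. This is an illustrative assumption and not the SweGen pipeline. Calls are matched to the frequency table by exact (chromosome, start, end, type) keys, whereas real SV matching generally needs some tolerance around breakpoints. The function name `filter_rare_svs`, the 1% threshold and all coordinates are hypothetical.

```python
def filter_rare_svs(calls, population_freq, max_freq=0.01):
    """Keep calls whose population frequency is at most max_freq.

    Calls missing from the frequency table are treated as frequency 0.0,
    i.e. as unseen in the reference cohort.
    """
    rare = []
    for call in calls:
        freq = population_freq.get(call, 0.0)
        if freq <= max_freq:
            rare.append(call)
    return rare


# Hypothetical cohort frequencies keyed by (chromosome, start, end, SV type).
population_freq = {
    ("1", 1_000_000, 1_005_000, "DEL"): 0.35,     # common, likely benign
    ("7", 55_000_000, 55_020_000, "DUP"): 0.002,  # rare in the cohort
}

# Hypothetical calls from one patient; the last one is absent from the cohort.
patient_calls = [
    ("1", 1_000_000, 1_005_000, "DEL"),
    ("7", 55_000_000, 55_020_000, "DUP"),
    ("X", 31_000_000, 31_200_000, "DEL"),
]

rare_calls = filter_rare_svs(patient_calls, population_freq)
print(rare_calls)
```

Filtering against a well-matched local cohort removes common variants before manual review, which is the practical benefit the abstract attributes to population SV frequencies.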


Subject(s)
Human Genome , Single Nucleotide Polymorphism , Registries , Datasets as Topic , Genome-Wide Association Study , Humans , Sweden , Twins/genetics