Your browser doesn't support javascript.
loading
Mostrar: 20 | 50 | 100
Resultados 1 - 20 de 27
Filtrar
Más filtros










Base de datos
Intervalo de año de publicación
1.
Biotechniques ; 76(5): 203-215, 2024 May.
Artículo en Inglés | MEDLINE | ID: mdl-38573592

RESUMEN

In the absence of a DNA template, the ab initio production of long double-stranded DNA molecules of predefined sequences is particularly challenging. The DNA synthesis step remains a bottleneck for many applications such as functional assessment of ancestral genes, analysis of alternative splicing or DNA-based data storage. In this report we propose a fully in vitro protocol to generate very long double-stranded DNA molecules starting from commercially available short DNA blocks in less than 3 days using Golden Gate assembly. This innovative application allowed us to streamline the process to produce a 24 kb-long DNA molecule storing part of the Declaration of the Rights of Man and of the Citizen of 1789 . The DNA molecule produced can be readily cloned into a suitable host/vector system for amplification and selection.


Asunto(s)
ADN , ADN/genética , ADN/química , Almacenamiento y Recuperación de la Información/métodos , Humanos , Secuencia de Bases/genética , Clonación Molecular/métodos
2.
Bioinform Adv ; 2(1): vbac068, 2022.
Artículo en Inglés | MEDLINE | ID: mdl-36699389

RESUMEN

Motivation: Recently introduced, linked-read technologies, such as the 10× chromium system, use microfluidics to tag multiple short reads from the same long fragment (50-200 kb) with a small sequence, called a barcode. They are inexpensive and easy to prepare, combining the accuracy of short-read sequencing with the long-range information of barcodes. The same barcode can be used for several different fragments, which complicates the analyses. Results: We present QuickDeconvolution (QD), a new software for deconvolving a set of reads sharing a barcode, i.e. separating the reads from the different fragments. QD only takes sequencing data as input, without the need for a reference genome. We show that QD outperforms existing software in terms of accuracy, speed and scalability, making it capable of deconvolving previously inaccessible data sets. In particular, we demonstrate here the first example in the literature of a successfully deconvoluted animal sequencing dataset, a 33-Gb Drosophila melanogaster dataset. We show that the taxonomic assignment of linked reads can be improved by deconvoluting reads with QD before taxonomic classification. Availability and implementation: Code and instructions are available on https://github.com/RolandFaure/QuickDeconvolution. Supplementary information: Supplementary data are available at Bioinformatics Advances online.

3.
Bioinformatics ; 36(17): 4568-4575, 2020 11 01.
Artículo en Inglés | MEDLINE | ID: mdl-32437523

RESUMEN

MOTIVATION: Studies on structural variants (SVs) are expanding rapidly. As a result, and thanks to third generation sequencing technologies, the number of discovered SVs is increasing, especially in the human genome. At the same time, for several applications such as clinical diagnoses, it is important to genotype newly sequenced individuals on well-defined and characterized SVs. Whereas several SV genotypers have been developed for short read data, there is a lack of such dedicated tool to assess whether known SVs are present or not in a new long read sequenced sample, such as the one produced by Pacific Biosciences or Oxford Nanopore Technologies. RESULTS: We present a novel method to genotype known SVs from long read sequencing data. The method is based on the generation of a set of representative allele sequences that represent the two alleles of each structural variant. Long reads are aligned to these allele sequences. Alignments are then analyzed and filtered out to keep only informative ones, to quantify and estimate the presence of each SV allele and the allele frequencies. We provide an implementation of the method, SVJedi, to genotype SVs with long reads. The tool has been applied to both simulated and real human datasets and achieves high genotyping accuracy. We show that SVJedi obtains better performances than other existing long read genotyping tools and we also demonstrate that SV genotyping is considerably improved with SVJedi compared to other approaches, namely SV discovery and short read SV genotyping approaches. AVAILABILITY AND IMPLEMENTATION: https://github.com/llecompte/SVJedi.git. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Asunto(s)
Genoma Humano , Programas Informáticos , Variación Estructural del Genoma , Genotipo , Secuenciación de Nucleótidos de Alto Rendimiento , Humanos , Análisis de Secuencia de ADN
4.
5.
Microbiol Resour Announc ; 9(11)2020 Mar 12.
Artículo en Inglés | MEDLINE | ID: mdl-32165385

RESUMEN

The frequency of infections due to Streptococcus pyogenes M/emm89 strains is increasing, presumably due to the emergence of a genetically distinct clone. We sequenced two emm89 strains isolated in Brittany, France, in 2009 and 2010 from invasive and noninvasive infections, respectively. Both strains belong to a newly emerged emm89 clade 3 clone.

6.
J Bioinform Comput Biol ; 17(3): 1950014, 2019 06.
Artículo en Inglés | MEDLINE | ID: mdl-31288643

RESUMEN

This paper focuses on the last two stages of genome assembly, namely, scaffolding and gap-filling, and shows that they can be solved as part of a single optimization problem. Our approach is based on modeling genome assembly as a problem of finding a simple path in a specific graph that satisfies as many distance constraints as possible encoding the insert-size information. We formulate it as a mixed-integer linear programming (MILP) problem and apply an optimization solver to find the exact solutions on a benchmark of chloroplasts. We show that the presence of repetitions in the set of unitigs is the main reason for the existence of multiple equivalent solutions that are associated to alternative subpaths. We also describe two sufficient conditions and we design efficient algorithms for identifying these subpaths. Comparisons of the results achieved by our tool with the ones obtained with recent assemblers are presented.


Asunto(s)
Algoritmos , Genoma del Cloroplasto , Mapeo Contig/métodos , Genoma de Planta , Modelos Genéticos
7.
Sci Rep ; 8(1): 8511, 2018 05 31.
Artículo en Inglés | MEDLINE | ID: mdl-29855493

RESUMEN

A broad panel of tens of thousands of microsatellite loci is unveiled for an endangered piracema (i.e. migratory) South American fish, Brycon orbignyanus. Once one of the main fisheries resources in the Platine Basin, it is now almost extinct in nature and focus of intense aquaculture activity. A total of 178.2 million paired-end reads (90 bases long) were obtained through the use of sequencing-by-synthesis (from a primary genomic library of 500 bp DNA fragments) and is made available through NCBI's Sequence Read Archive, SRA accession SRX3350440. Short reads were assembled de novo and screening for perfect microsatellite motifs revealed more than 81 thousands unique microsatellite loci, for which primer pairs were proposed. A total of 29 polymorphic microsatellite markers were already previously validated for this panel. A partial genomic assembly is hereby presented and these genomic resources are publicly made available. These data will foster the rapid development of hundreds of new DNA markers for genetic diversity studies, conservation initiatives and management practices for this important and depleted species. The availability of such preliminary genomic data will also be of use in the areas of bioinformatics, ecology, genetics and evolution.


Asunto(s)
Characiformes/genética , Especies en Peligro de Extinción , Repeticiones de Microsatélite , Animales , ADN/genética , Biblioteca de Genes , Marcadores Genéticos/genética , Genómica
8.
Nat Methods ; 14(11): 1063-1071, 2017 Nov.
Artículo en Inglés | MEDLINE | ID: mdl-28967888

RESUMEN

Methods for assembly, taxonomic profiling and binning are key to interpreting metagenome data, but a lack of consensus about benchmarking complicates performance assessment. The Critical Assessment of Metagenome Interpretation (CAMI) challenge has engaged the global developer community to benchmark their programs on highly complex and realistic data sets, generated from ∼700 newly sequenced microorganisms and ∼600 novel viruses and plasmids and representing common experimental setups. Assembly and genome binning programs performed well for species represented by individual genomes but were substantially affected by the presence of related strains. Taxonomic profiling and binning programs were proficient at high taxonomic ranks, with a notable performance decrease below family level. Parameter settings markedly affected performance, underscoring their importance for program reproducibility. The CAMI results highlight current challenges but also provide a roadmap for software selection to answer specific research questions.


Asunto(s)
Metagenómica , Programas Informáticos , Algoritmos , Benchmarking , Análisis de Secuencia de ADN
9.
Genome Announc ; 5(39)2017 Sep 28.
Artículo en Inglés | MEDLINE | ID: mdl-28963207

RESUMEN

While the incidence and invasiveness of type emm75 group A Streptococcus (GAS) infections increased in French Brittany during 2013, we sequenced and analyzed the genomes of three independent strains isolated in 2009, 2012, and 2014, respectively. In this short-term evolution, genomic analysis evidenced mainly the integration of new phages encoding virulence factors.

11.
Genome Biol ; 18(1): 27, 2017 02 13.
Artículo en Inglés | MEDLINE | ID: mdl-28190401

RESUMEN

BACKGROUND: The prevailing paradigm of host-parasite evolution is that arms races lead to increasing specialisation via genetic adaptation. Insect herbivores are no exception and the majority have evolved to colonise a small number of closely related host species. Remarkably, the green peach aphid, Myzus persicae, colonises plant species across 40 families and single M. persicae clonal lineages can colonise distantly related plants. This remarkable ability makes M. persicae a highly destructive pest of many important crop species. RESULTS: To investigate the exceptional phenotypic plasticity of M. persicae, we sequenced the M. persicae genome and assessed how one clonal lineage responds to host plant species of different families. We show that genetically identical individuals are able to colonise distantly related host species through the differential regulation of genes belonging to aphid-expanded gene families. Multigene clusters collectively upregulate in single aphids within two days upon host switch. Furthermore, we demonstrate the functional significance of this rapid transcriptional change using RNA interference (RNAi)-mediated knock-down of genes belonging to the cathepsin B gene family. Knock-down of cathepsin B genes reduced aphid fitness, but only on the host that induced upregulation of these genes. CONCLUSIONS: Previous research has focused on the role of genetic adaptation of parasites to their hosts. Here we show that the generalist aphid pest M. persicae is able to colonise diverse host plant species in the absence of genetic specialisation. This is achieved through rapid transcriptional plasticity of genes that have duplicated during aphid evolution.

12.
Genome Announc ; 4(4)2016 Jul 21.
Artículo en Inglés | MEDLINE | ID: mdl-27445380

RESUMEN

Here, we announce the complete annotated genome sequence of the invasive Streptococcus pyogenes strain M/emm66, isolated in 2013 from a subcutaneous abscess in new clustered cases in French Brittany.

13.
BMC Bioinformatics ; 16: 288, 2015 Sep 14.
Artículo en Inglés | MEDLINE | ID: mdl-26370285

RESUMEN

BACKGROUND: Data volumes generated by next-generation sequencing (NGS) technologies is now a major concern for both data storage and transmission. This triggered the need for more efficient methods than general purpose compression tools, such as the widely used gzip method. RESULTS: We present a novel reference-free method meant to compress data issued from high throughput sequencing technologies. Our approach, implemented in the software LEON, employs techniques derived from existing assembly principles. The method is based on a reference probabilistic de Bruijn Graph, built de novo from the set of reads and stored in a Bloom filter. Each read is encoded as a path in this graph, by memorizing an anchoring kmer and a list of bifurcations. The same probabilistic de Bruijn Graph is used to perform a lossy transformation of the quality scores, which allows to obtain higher compression rates without losing pertinent information for downstream analyses. CONCLUSIONS: LEON was run on various real sequencing datasets (whole genome, exome, RNA-seq or metagenomics). In all cases, LEON showed higher overall compression ratios than state-of-the-art compression software. On a C. elegans whole genome sequencing dataset, LEON divided the original file size by more than 20. LEON is an open source software, distributed under GNU affero GPL License, available for download at http://gatb.inria.fr/software/leon/.


Asunto(s)
Algoritmos , Proteínas de Caenorhabditis elegans/genética , Caenorhabditis elegans/genética , Gráficos por Computador , Compresión de Datos/métodos , Secuenciación de Nucleótidos de Alto Rendimiento/métodos , Programas Informáticos , Animales , Biología Computacional/métodos , Simulación por Computador , Metagenómica , Probabilidad
14.
Bioinformatics ; 30(20): 2959-61, 2014 Oct 15.
Artículo en Inglés | MEDLINE | ID: mdl-24990603

RESUMEN

MOTIVATION: Efficient and fast next-generation sequencing (NGS) algorithms are essential to analyze the terabytes of data generated by the NGS machines. A serious bottleneck can be the design of such algorithms, as they require sophisticated data structures and advanced hardware implementation. RESULTS: We propose an open-source library dedicated to genome assembly and analysis to fasten the process of developing efficient software. The library is based on a recent optimized de-Bruijn graph implementation allowing complex genomes to be processed on desktop computers using fast algorithms with low memory footprints. AVAILABILITY AND IMPLEMENTATION: The GATB library is written in C++ and is available at the following Web site http://gatb.inria.fr under the A-GPL license. CONTACT: lavenier@irisa.fr SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Asunto(s)
Bioestadística/métodos , Genómica/métodos , Secuenciación de Nucleótidos de Alto Rendimiento/métodos , Programas Informáticos , Algoritmos , Gráficos por Computador , Genoma Humano/genética , Humanos
15.
Genome Announc ; 2(1)2014 Jan 16.
Artículo en Inglés | MEDLINE | ID: mdl-24435871

RESUMEN

Lactococcus lactis is widely used in the dairy industry. We report the draft genome sequence of L. lactis subsp. lactis bv. diacetylactis LD61, an industrial and extensively studied strain. In contrast to the closely related and plasmidless strain IL1403, LD61 contains 6 plasmids, and the genome sequence provides additional information related to adaptation to the dairy environment.

16.
J Comput Biol ; 20(9): 672-86, 2013 Sep.
Artículo en Inglés | MEDLINE | ID: mdl-24000926

RESUMEN

Mapping quantitative trait loci (QTL) using genetic marker information is a time-consuming analysis that has interested the mapping community in recent decades. The increasing amount of genetic marker data allows one to consider ever more precise QTL analyses while increasing the demand for computation. Part of the difficulty of detecting QTLs resides in finding appropriate critical values or threshold values, above which a QTL effect is considered significant. Different approaches exist to determine these thresholds, using either empirical methods or algebraic approximations. In this article, we present a new implementation of existing software, QTLMap, which takes advantage of the data parallel nature of the problem by offsetting heavy computations to a graphics processing unit (GPU). Developments on the GPU were implemented using Cuda technology. This new implementation performs up to 75 times faster than the previous multicore implementation, while maintaining the same results and level of precision (Double Precision) and computing both QTL values and thresholds. This speedup allows one to perform more complex analyses, such as linkage disequilibrium linkage analyses (LDLA) and multiQTL analyses, in a reasonable time frame.


Asunto(s)
Desequilibrio de Ligamiento/fisiología , Tipificación de Secuencias Multilocus/métodos , Sitios de Carácter Cuantitativo/fisiología , Programas Informáticos , Marcadores Genéticos/fisiología
17.
Gigascience ; 2(1): 10, 2013 Jul 22.
Artículo en Inglés | MEDLINE | ID: mdl-23870653

RESUMEN

BACKGROUND: The process of generating raw genome sequence data continues to become cheaper, faster, and more accurate. However, assembly of such data into high-quality, finished genome sequences remains challenging. Many genome assembly tools are available, but they differ greatly in terms of their performance (speed, scalability, hardware requirements, acceptance of newer read technologies) and in their final output (composition of assembled sequence). More importantly, it remains largely unclear how to best assess the quality of assembled genome sequences. The Assemblathon competitions are intended to assess current state-of-the-art methods in genome assembly. RESULTS: In Assemblathon 2, we provided a variety of sequence data to be assembled for three vertebrate species (a bird, a fish, and snake). This resulted in a total of 43 submitted assemblies from 21 participating teams. We evaluated these assemblies using a combination of optical map data, Fosmid sequences, and several statistical methods. From over 100 different metrics, we chose ten key measures by which to assess the overall quality of the assemblies. CONCLUSIONS: Many current genome assemblers produced useful assemblies, containing a significant representation of their genes and overall genome structure. However, the high degree of variability between the entries suggests that there is still much room for improvement in the field of genome assembly and that approaches which work well in assembling the genome of one species may not necessarily work well for another.

18.
Bioinformatics ; 29(5): 652-3, 2013 Mar 01.
Artículo en Inglés | MEDLINE | ID: mdl-23325618

RESUMEN

SUMMARY: Counting all the k-mers (substrings of length k) in DNA/RNA sequencing reads is the preliminary step of many bioinformatics applications. However, state of the art k-mer counting methods require that a large data structure resides in memory. Such structure typically grows with the number of distinct k-mers to count. We present a new streaming algorithm for k-mer counting, called DSK (disk streaming of k-mers), which only requires a fixed user-defined amount of memory and disk space. This approach realizes a memory, time and disk trade-off. The multi-set of all k-mers present in the reads is partitioned, and partitions are saved to disk. Then, each partition is separately loaded in memory in a temporary hash table. The k-mer counts are returned by traversing each hash table. Low-abundance k-mers are optionally filtered. DSK is the first approach that is able to count all the 27-mers of a human genome dataset using only 4.0 GB of memory and moderate disk space (160 GB), in 17.9 h. DSK can replace a popular k-mer counting software (Jellyfish) on small-memory servers. AVAILABILITY: http://minia.genouest.org/dsk


Asunto(s)
Análisis de Secuencia de ADN/métodos , Análisis de Secuencia de ARN/métodos , Programas Informáticos , Algoritmos , Genoma Humano , Humanos
19.
BMC Bioinformatics ; 13 Suppl 19: S10, 2012.
Artículo en Inglés | MEDLINE | ID: mdl-23282463

RESUMEN

BACKGROUND: Nowadays, metagenomic sample analyses are mainly achieved by comparing them with a priori knowledge stored in data banks. While powerful, such approaches do not allow to exploit unknown and/or "unculturable" species, for instance estimated at 99% for Bacteria. METHODS: This work introduces Compareads, a de novo comparative metagenomic approach that returns the reads that are similar between two possibly metagenomic datasets generated by High Throughput Sequencers. One originality of this work consists in its ability to deal with huge datasets. The second main contribution presented in this paper is the design of a probabilistic data structure based on Bloom filters enabling to index millions of reads with a limited memory footprint and a controlled error rate. RESULTS: We show that Compareads enables to retrieve biological information while being able to scale to huge datasets. Its time and memory features make Compareads usable on read sets each composed of more than 100 million Illumina reads in a few hours and consuming 4 GB of memory, and thus usable on today's personal computers. CONCLUSION: Using a new data structure, Compareads is a practical solution for comparing de novo huge metagenomic samples. Compareads is released under the CeCILL license and can be freely downloaded from http://alcovna.genouest.org/compareads/.


Asunto(s)
Almacenamiento y Recuperación de la Información/métodos , Metagenómica/métodos , Algoritmos
20.
Genome Res ; 21(12): 2224-41, 2011 Dec.
Artículo en Inglés | MEDLINE | ID: mdl-21926179

RESUMEN

Low-cost short read sequencing technology has revolutionized genomics, though it is only just becoming practical for the high-quality de novo assembly of a novel large genome. We describe the Assemblathon 1 competition, which aimed to comprehensively assess the state of the art in de novo assembly methods when applied to current sequencing technologies. In a collaborative effort, teams were asked to assemble a simulated Illumina HiSeq data set of an unknown, simulated diploid genome. A total of 41 assemblies from 17 different groups were received. Novel haplotype aware assessments of coverage, contiguity, structure, base calling, and copy number were made. We establish that within this benchmark: (1) It is possible to assemble the genome to a high level of coverage and accuracy, and that (2) large differences exist between the assemblies, suggesting room for further improvements in current methods. The simulated benchmark, including the correct answer, the assemblies, and the code that was used to evaluate the assemblies is now public and freely available from http://www.assemblathon.org/.


Asunto(s)
Genoma/fisiología , Genómica/métodos , Análisis de Secuencia de ADN/métodos
SELECCIÓN DE REFERENCIAS
DETALLE DE LA BÚSQUEDA
...