Results 1 - 18 of 18
2.
BMC Bioinformatics ; 23(1): 448, 2022 Oct 28.
Article in English | MEDLINE | ID: mdl-36307762

ABSTRACT

BACKGROUND: Internal tandem duplications in the FLT3 gene, termed FLT3-ITDs, are useful molecular markers in acute myeloid leukemia (AML) for patient risk stratification and follow-up. FLT3-ITDs are increasingly screened through high-throughput sequencing (HTS), raising the need for robust and efficient algorithms. We developed a new algorithm, which performs no alignment and uses few resources, to identify and quantify FLT3-ITDs in HTS data. RESULTS: Our algorithm (FiLT3r) focuses on the k-mers from reads covering FLT3 exons 14 and 15. We show that those k-mers carry enough information to accurately detect FLT3-ITDs, determine their length, and quantify them. We compare the performance of FiLT3r to state-of-the-art alternatives and to fragment analysis, the gold standard method, on a cohort of 185 AML patients sequenced with capture-based HTS. On this dataset, FiLT3r is more precise (no false positives or false negatives) than the other software evaluated. We also assess the software on public RNA-Seq data, which confirms the previous results and shows that FiLT3r requires few resources compared to other software. CONCLUSION: FiLT3r is free software available at https://gitlab.univ-lille.fr/filt3r/filt3r . The repository also contains a Snakefile to reproduce our experiments. We show that FiLT3r detects FLT3-ITDs better than other software while using less memory and time.
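
As a rough illustration of alignment-free read selection by k-mers, the general idea the abstract describes, the sketch below keeps the reads that share enough k-mers with a reference region and then counts their k-mers. It is not FiLT3r's algorithm: the function names, the `min_fraction` threshold, and the toy sequences are assumptions made for this example, and reverse complements are ignored.

```python
from collections import Counter

def kmers(seq, k):
    """Yield every k-mer (substring of length k) of seq, left to right."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]

def select_reads(reads, reference, k, min_fraction=0.5):
    """Keep reads whose k-mers mostly occur in the reference region.

    min_fraction is a hypothetical threshold chosen for illustration.
    """
    ref_kmers = set(kmers(reference, k))
    kept = []
    for read in reads:
        read_kmers = list(kmers(read, k))
        if not read_kmers:          # read shorter than k: nothing to compare
            continue
        hits = sum(1 for km in read_kmers if km in ref_kmers)
        if hits / len(read_kmers) >= min_fraction:
            kept.append(read)
    return kept

def kmer_counts(reads, k):
    """Count how often each k-mer occurs across the given reads."""
    counts = Counter()
    for read in reads:
        counts.update(kmers(read, k))
    return counts

# Toy data, not real FLT3 sequence.
reference = "ACGTACGGTCA"
reads = ["ACGTACG", "TTTTTTT", "GGTCA"]
kept = select_reads(reads, reference, k=4)
counts = kmer_counts(kept, k=4)
```

No alignment is computed at any point; membership in a set of reference k-mers is the only test, which is why such filters tend to be cheap in time and memory.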


Subject(s)
Leukemia, Myeloid, Acute ; Tandem Repeat Sequences ; Humans ; Tandem Repeat Sequences/genetics ; Leukemia, Myeloid, Acute/genetics ; High-Throughput Nucleotide Sequencing ; Exons ; Base Sequence ; fms-Like Tyrosine Kinase 3/genetics ; Mutation
3.
Methods Mol Biol ; 2453: 43-59, 2022.
Article in English | MEDLINE | ID: mdl-35622319

ABSTRACT

Within the EuroClonality-NGS group, immune repertoire analysis for target identification in lymphoid malignancies was initially developed using two-stage amplicon approaches, essentially as a progressive modification of preceding methods developed for Sanger sequencing. This approach has, however, limitations with respect to sample handling, adaptation to automation, and risk of contamination by amplicon products. We therefore developed one-step PCR amplicon methods with individual barcoding for batched analysis for IGH, IGK, TRD, TRG, and TRB rearrangements, followed by Vidjil-based data analysis.


Subject(s)
Genes, T-Cell Receptor ; High-Throughput Nucleotide Sequencing ; Immunoglobulins ; Precursor Cell Lymphoblastic Leukemia-Lymphoma ; Recombination, Genetic ; Genes, T-Cell Receptor/genetics ; Genes, T-Cell Receptor/immunology ; High-Throughput Nucleotide Sequencing/methods ; Humans ; Immunoglobulins/genetics ; Immunoglobulins/immunology ; Neoplasm, Residual/diagnosis ; Neoplasm, Residual/genetics ; Precursor Cell Lymphoblastic Leukemia-Lymphoma/diagnosis ; Precursor Cell Lymphoblastic Leukemia-Lymphoma/genetics ; Precursor Cell Lymphoblastic Leukemia-Lymphoma/immunology ; Recombination, Genetic/genetics ; Recombination, Genetic/immunology
4.
Methods Mol Biol ; 2453: 153-167, 2022.
Article in English | MEDLINE | ID: mdl-35622326

ABSTRACT

B cell receptor (BcR) immunoglobulins (IG) display a tremendous diversity due to complex DNA rearrangements, the V(D)J recombination, further enhanced by the somatic hypermutation process. In chronic lymphocytic leukemia (CLL), the mutational load of the clonal BcR IG expressed by the leukemic cells constitutes an important prognostic and predictive biomarker. Here, we provide a reliable methodology capable of determining the mutational status of IG genes in CLL using high-throughput sequencing, starting from leukemic cell DNA or RNA.


Subject(s)
Leukemia, Lymphocytic, Chronic, B-Cell ; Genes, Immunoglobulin ; High-Throughput Nucleotide Sequencing ; Humans ; Immunoglobulins/genetics ; Leukemia, Lymphocytic, Chronic, B-Cell/genetics ; Receptors, Antigen, B-Cell/genetics
5.
Genome Res ; 31(1): 1-12, 2021 01.
Article in English | MEDLINE | ID: mdl-33328168

ABSTRACT

High-throughput sequencing data sets are usually deposited in public repositories (e.g., the European Nucleotide Archive) to ensure reproducibility. As the amount of data has reached petabyte scale, repositories do not allow one to perform online sequence searches, yet such a feature would be highly useful to investigators. Toward this goal, in the last few years several computational approaches have been introduced to index and query large collections of data sets. Here, we propose an accessible survey of these approaches, which are generally based on representing data sets as sets of k-mers. We review their properties, introduce a classification, and present their general intuition. We summarize their performance and highlight their current strengths and limitations.


Subject(s)
Algorithms ; Software ; High-Throughput Nucleotide Sequencing ; Reproducibility of Results
6.
Bioinformatics ; 36(Suppl_1): i177-i185, 2020 07 01.
Article in English | MEDLINE | ID: mdl-32657392

ABSTRACT

MOTIVATION: In this work we present REINDEER, a novel computational method that performs indexing of sequences and records their abundances across a collection of datasets. To the best of our knowledge, other indexing methods have so far been unable to record abundances efficiently across large datasets. RESULTS: We used REINDEER to index the abundances of sequences within 2585 human RNA-seq experiments in 45 h using only 56 GB of RAM. This makes REINDEER the first method able to record abundances at the scale of ∼4 billion distinct k-mers across 2585 datasets. REINDEER also supports exact presence/absence queries of k-mers. Briefly, REINDEER constructs the compacted de Bruijn graph of each dataset, then conceptually merges those de Bruijn graphs into a single global one. Then, REINDEER constructs and indexes monotigs, which in a nutshell are groups of k-mers of similar abundances. AVAILABILITY AND IMPLEMENTATION: https://github.com/kamimrcht/REINDEER. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
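
To make the notion of "recording abundances across a collection of datasets" concrete, here is a minimal sketch that maps each k-mer to a vector of counts, one entry per dataset. It is a plain dictionary, not REINDEER's data structure: it builds no de Bruijn graph and no monotigs, and the function names and toy reads are assumptions made for this example.

```python
from collections import defaultdict

def build_abundance_index(datasets, k):
    """Map each k-mer to a list of counts, one entry per dataset."""
    n = len(datasets)
    index = defaultdict(lambda: [0] * n)
    for d, reads in enumerate(datasets):
        for read in reads:
            for i in range(len(read) - k + 1):
                index[read[i:i + k]][d] += 1
    return dict(index)  # freeze into a plain dict so lookups never add keys

def query(index, kmer, n_datasets):
    """Return the abundance vector of kmer (all zeros if it was never seen)."""
    return index.get(kmer, [0] * n_datasets)

# Two toy "datasets", each a list of reads.
datasets = [["ACGTA", "ACGT"], ["CGTAC"]]
index = build_abundance_index(datasets, k=4)
```

A presence/absence query falls out of the same structure: a k-mer is present in dataset `d` when entry `d` of its vector is non-zero. Storing one full vector per k-mer is exactly the cost that a structure grouping k-mers of similar abundance aims to avoid.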


Subject(s)
Sequence Analysis, DNA ; Software ; Algorithms ; Humans ; Sequence Analysis, RNA
7.
Leukemia ; 33(9): 2241-2253, 2019 09.
Article in English | MEDLINE | ID: mdl-31243313

ABSTRACT

Amplicon-based next-generation sequencing (NGS) of immunoglobulin (IG) and T-cell receptor (TR) gene rearrangements for clonality assessment, marker identification and quantification of minimal residual disease (MRD) in lymphoid neoplasms has been the focus of intense research, development and application. However, standardization and validation in a scientifically controlled multicentre setting is still lacking. Therefore, IG/TR assay development and design, including bioinformatics, was performed within the EuroClonality-NGS working group and validated for MRD marker identification in acute lymphoblastic leukaemia (ALL). Five EuroMRD ALL reference laboratories performed IG/TR NGS in 50 diagnostic ALL samples, and compared results with those generated through routine IG/TR Sanger sequencing. A central polytarget quality control (cPT-QC) was used to monitor primer performance, and a central in-tube quality control (cIT-QC) was spiked into each sample as a library-specific quality control and calibrator. NGS identified 259 (average 5.2/sample, range 0-14) clonal sequences vs. Sanger-sequencing 248 (average 5.0/sample, range 0-14). NGS primers covered possible IG/TR rearrangement types more completely compared with local multiplex PCR sets and enabled sequencing of bi-allelic rearrangements and weak PCR products. The cPT-QC showed high reproducibility across all laboratories. These validated and reproducible quality-controlled EuroClonality-NGS assays can be used for standardized NGS-based identification of IG/TR markers in lymphoid malignancies.


Subject(s)
Gene Rearrangement, T-Lymphocyte/genetics ; Genes, T-Cell Receptor/genetics ; Genetic Markers/genetics ; Immunoglobulins/genetics ; Neoplasm, Residual/genetics ; Precursor Cell Lymphoblastic Leukemia-Lymphoma/genetics ; Computational Biology/methods ; Genes, Immunoglobulin/genetics ; High-Throughput Nucleotide Sequencing/methods ; Humans ; Receptors, Antigen, T-Cell/genetics ; Recombination, Genetic/genetics ; Reference Standards ; Reproducibility of Results
8.
PeerJ Comput Sci ; 4: e148, 2018.
Article in English | MEDLINE | ID: mdl-33816803

ABSTRACT

BACKGROUND: Labels are a way to add information to a text, such as functional annotations (for example, genes) on a DNA sequence. V(D)J recombinations are DNA recombinations involving two or three short genes in lymphocytes. Sequencing this short region (500 bp or less) produces labeled sequences and brings insight into the lymphocyte repertoire for onco-hematology or immunology studies. METHODS: We present two indexes for a text with non-overlapping labels. They store the text in a Burrows-Wheeler transform (BWT) and a compressed label sequence in a Wavelet Tree. The label sequence is taken in the order of the text (TL-index) or in the order of the BWT (TLBW-index). Both indexes need space related to the entropy of the labeled text. RESULTS: These indexes allow efficient text-label queries to count and find labeled patterns. The TLBW-index has an overhead on simple label queries but is very efficient on combined pattern-label queries. We implemented the indexes in C++ and compared them against a baseline solution on pseudo-random as well as on V(D)J labeled texts. DISCUSSION: New indexes such as the ones we proposed improve the way we index and query labeled texts such as, for instance, lymphocyte repertoires for hematological and immunological studies.
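
For readers unfamiliar with the Burrows-Wheeler transform mentioned above, the following sketch builds a BWT by sorting rotations and counts pattern occurrences with the standard backward search. It covers only the text part: the label sequence and the Wavelet Tree of the TL- and TLBW-indexes are not modeled, and the naive construction and rank computation are simplifications assumed for this example.

```python
def bwt(text):
    """Burrows-Wheeler transform via sorted rotations (quadratic; demo only)."""
    s = text + "$"  # "$" is a sentinel assumed smaller than every text symbol
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)

def count_occurrences(bwt_text, pattern):
    """Count occurrences of pattern in the original text by backward search."""
    # C[c] = number of symbols in the text that are strictly smaller than c.
    C, total = {}, 0
    for c in sorted(set(bwt_text)):
        C[c] = total
        total += bwt_text.count(c)

    def rank(c, i):
        """Occurrences of c in bwt_text[:i] (naive; real indexes do this fast)."""
        return bwt_text[:i].count(c)

    lo, hi = 0, len(bwt_text)  # current range of sorted rotations, [lo, hi)
    for c in reversed(pattern):
        if c not in C:
            return 0
        lo = C[c] + rank(c, lo)
        hi = C[c] + rank(c, hi)
        if lo >= hi:
            return 0
    return hi - lo

transformed = bwt("banana")
```

The pattern is consumed from its last symbol to its first, and each step narrows a range of sorted rotations; the final range width is the occurrence count.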

9.
Genome Biol ; 18(1): 243, 2017 12 28.
Article in English | MEDLINE | ID: mdl-29284518

ABSTRACT

We introduce a k-mer-based computational protocol, DE-kupl, for capturing local RNA variation in a set of RNA-seq libraries, independently of a reference genome or transcriptome. DE-kupl extracts all k-mers with differential abundance directly from the raw data files. This enables the retrieval of virtually all variation present in an RNA-seq data set. This variation is subsequently assigned to biological events or entities such as differential long non-coding RNAs, splice and polyadenylation variants, introns, repeats, editing or mutation events, and exogenous RNA. Applying DE-kupl to human RNA-seq data sets identified multiple types of novel events, reproducibly across independent RNA-seq experiments.


Subject(s)
Computational Biology/methods ; Genetic Variation ; RNA/genetics ; Software ; Alleles ; Gene Expression Profiling ; Gene Expression Regulation ; High-Throughput Nucleotide Sequencing ; Humans ; Polyadenylation ; RNA Splicing ; RNA, Antisense ; RNA, Long Noncoding/genetics ; RNA, Messenger/genetics ; Reproducibility of Results ; Sequence Analysis, RNA ; Transcriptome
10.
BMC Bioinformatics ; 18(1): 428, 2017 Sep 29.
Article in English | MEDLINE | ID: mdl-28969586

ABSTRACT

BACKGROUND: The evolution of next-generation sequencing (NGS) technologies has led to increased focus on RNA-Seq. Many bioinformatic tools have been developed for RNA-Seq analysis, each with unique performance characteristics and configuration parameters. Users face an increasingly complex task in understanding which bioinformatic tools are best for their specific needs and how they should be configured. In order to provide some answers to these questions, we investigate the performance of leading bioinformatic tools designed for RNA-Seq analysis and propose a methodology for systematic evaluation and comparison of performance to help users make well-informed choices. RESULTS: To evaluate RNA-Seq pipelines, we developed a suite of two benchmarking tools. SimCT generates simulated datasets that get as close as possible to specific real biological conditions, accompanied by the list of genomic incidents and mutations that have been inserted. BenchCT then compares the output of any bioinformatics pipeline that has been run against a SimCT dataset with the simulated genomic and transcriptional variations it contains, to give an accurate performance evaluation in addressing a specific biological question. We used these tools to simulate a real-world genomic medicine question involving the comparison of healthy and cancerous cells. Results revealed that performance in addressing a particular biological context varied significantly depending on the choice of tools and settings used. We also found that by combining the output of certain pipelines, substantial performance improvements could be achieved. CONCLUSION: Our research emphasizes the importance of selecting and configuring bioinformatic tools for the specific biological question being investigated to obtain optimal results. Pipeline designers, developers and users should include benchmarking in the context of their biological question as part of their design and quality control process. Our SimBA suite of benchmarking tools provides a reliable basis for comparing the performance of RNA-Seq bioinformatics pipelines in addressing a specific biological question. We would like to see the creation of a reference corpus of data-sets that would allow accurate comparison between benchmarks performed by different groups and the publication of more benchmarks based on this public corpus. SimBA software and data-set are available at http://cractools.gforge.inria.fr/softwares/simba/ .


Subject(s)
Computational Biology/methods ; Computer Simulation ; Sequence Analysis, RNA/methods ; Software ; Gene Fusion ; Genome, Human ; High-Throughput Nucleotide Sequencing/methods ; Humans ; INDEL Mutation/genetics ; Polymorphism, Single Nucleotide/genetics
11.
PLoS One ; 12(2): e0172249, 2017.
Article in English | MEDLINE | ID: mdl-28182777

ABSTRACT

[This corrects the article DOI: 10.1371/journal.pone.0166126.].

12.
Leuk Res ; 53: 1-7, 2017 02.
Article in English | MEDLINE | ID: mdl-27930944

ABSTRACT

Minimal residual disease (MRD) is known to be an independent prognostic factor in patients with acute lymphoblastic leukemia (ALL). High-throughput sequencing (HTS) is currently used in routine practice for the diagnosis and follow-up of patients with hematological neoplasms. In this retrospective study, we examined the role of immunoglobulin/T-cell receptor-based MRD in patients with ALL by HTS analysis of immunoglobulin H and/or T-cell receptor gamma chain loci in bone marrow samples from 11 patients with ALL, at diagnosis and during follow-up. We assessed the clinical feasibility of using combined HTS and bioinformatics analysis with interactive visualization using Vidjil software. We discuss the advantages and drawbacks of HTS for monitoring MRD. HTS gives a more complete insight into the leukemic population than conventional real-time quantitative PCR (qPCR), and allows identification of new emerging clones at each time point of the monitoring. Thus, HTS monitoring of Ig/TR-based MRD is expected to improve the management of patients with ALL.


Subject(s)
High-Throughput Nucleotide Sequencing/methods ; Neoplasm, Residual/diagnosis ; Precursor Cell Lymphoblastic Leukemia-Lymphoma/diagnosis ; Bone Marrow ; Clone Cells/pathology ; Follow-Up Studies ; Genes, T-Cell Receptor gamma ; Humans ; Immunoglobulin Heavy Chains/genetics ; Monitoring, Immunologic ; Neoplasm, Residual/genetics ; Precursor Cell Lymphoblastic Leukemia-Lymphoma/genetics ; Retrospective Studies ; Software
13.
PLoS One ; 11(11): e0166126, 2016.
Article in English | MEDLINE | ID: mdl-27835690

ABSTRACT

BACKGROUND: The B and T lymphocytes are white blood cells playing a key role in adaptive immunity. A part of their DNA, called the V(D)J recombinations, is specific to each lymphocyte, and enables recognition of specific antigens. Today, with new sequencing techniques, one can get billions of DNA sequences from these regions. With dedicated Repertoire Sequencing (RepSeq) methods, it is now possible to picture the population of lymphocytes, and to monitor more accurately the immune response as well as pathologies such as leukemia. METHODS AND RESULTS: Vidjil is an open-source platform for the interactive analysis of high-throughput sequencing data from lymphocyte recombinations. It contains an algorithm gathering reads into clonotypes according to their V(D)J junctions, a web application made of a sample, experiment and patient database, and a visualization for the analysis of clonotypes over time. Vidjil is implemented in C++, Python and Javascript and licensed under the GPLv3 open-source license. Source code, binaries and a public web server are available at http://www.vidjil.org and at http://bioinfo.lille.inria.fr/vidjil. Using the Vidjil web application consists of four steps: 1. uploading a raw sequence file (typically a FASTQ); 2. running RepSeq analysis software; 3. visualizing the results; 4. annotating the results and saving them for future use. For the end-user, the Vidjil web application needs no specific installation and just requires a connection and a modern web browser. Vidjil is used by labs in hematology or immunology for research and clinical applications.


Subject(s)
Computational Biology/methods ; High-Throughput Nucleotide Sequencing/methods ; V(D)J Recombination/genetics ; Web Browser ; Algorithms ; Base Sequence ; Humans ; Internet ; Lymphocytes/immunology ; Lymphocytes/metabolism ; Reproducibility of Results ; Sequence Homology, Nucleic Acid
14.
Br J Haematol ; 173(3): 413-20, 2016 05.
Article in English | MEDLINE | ID: mdl-26898266

ABSTRACT

High-throughput sequencing (HTS) is considered a technical revolution that has improved our knowledge of lymphoid and autoimmune diseases, changing our approach to leukaemia both at diagnosis and during follow-up. As part of an immunoglobulin/T cell receptor-based minimal residual disease (MRD) assessment of acute lymphoblastic leukaemia patients, we assessed the performance and feasibility of replacing the first steps of the approach, based on DNA isolation and Sanger sequencing, with an HTS protocol combined with bioinformatics analysis and visualization using the Vidjil software. We prospectively analysed the diagnostic and relapse samples of 34 paediatric patients, thus identifying 125 leukaemic clones with recombinations on multiple loci (TRG, TRD, IGH and IGK), including Dd2/Dd3 and Intron/KDE rearrangements. Sequencing failures were halved (14% vs. 34%, P = 0.0007), enabling more patients to be monitored. Furthermore, more markers per patient could be monitored, reducing the probability of false negative MRD results. The whole analysis, from sample receipt to clinical validation, was shorter than our current diagnostic protocol, with equal resources. V(D)J recombination was successfully assigned by the software, even for unusual recombinations. This study emphasizes the progress that HTS with adapted bioinformatics tools can bring to the diagnosis of leukaemia patients.


Subject(s)
Computational Biology/methods ; High-Throughput Nucleotide Sequencing/methods ; Precursor Cell Lymphoblastic Leukemia-Lymphoma/diagnosis ; Adolescent ; Adult ; Child ; Child, Preschool ; Clone Cells ; Diagnostic Errors/prevention & control ; Gene Rearrangement, T-Lymphocyte ; High-Throughput Nucleotide Sequencing/standards ; Humans ; Infant ; Infant, Newborn ; Neoplasm, Residual/diagnosis ; Prospective Studies ; Software ; V(D)J Recombination/genetics ; Young Adult
16.
BMC Genomics ; 15: 409, 2014 May 28.
Article in English | MEDLINE | ID: mdl-24885090

ABSTRACT

BACKGROUND: V(D)J recombinations in lymphocytes are essential for immunological diversity. They are also useful markers of pathologies. In leukemia, they are used to quantify the minimal residual disease during patient follow-up. However, the full breadth of lymphocyte diversity is not fully understood. RESULTS: We propose new algorithms that process high-throughput sequencing (HTS) data to extract unnamed V(D)J junctions and gather them into clones for quantification. This analysis is based on a seed heuristic and is fast and scalable because in the first phase, no alignment is performed with germline database sequences. The algorithms were applied to TR γ HTS data from a patient with acute lymphoblastic leukemia, and also on data simulating hypermutations. Our methods identified the main clone, as well as additional clones that were not identified with standard protocols. CONCLUSIONS: The proposed algorithms provide new insight into the analysis of high-throughput sequencing data for leukemia, and also to the quantitative assessment of any immunological profile. The methods described here are implemented in a C++ open-source program called Vidjil.


Subject(s)
Algorithms ; High-Throughput Nucleotide Sequencing/methods ; Precursor Cell Lymphoblastic Leukemia-Lymphoma/diagnosis ; Sequence Analysis, DNA/methods ; V(D)J Recombination ; Humans ; Neoplasm, Residual/diagnosis ; Precursor Cell Lymphoblastic Leukemia-Lymphoma/genetics ; Software
17.
Genome Biol ; 14(3): R30, 2013 Mar 28.
Article in English | MEDLINE | ID: mdl-23537109

ABSTRACT

A large number of RNA-sequencing studies set out to predict mutations, splice junctions or fusion RNAs. We propose a method, CRAC, that integrates genomic locations and local coverage to enable such predictions to be made directly from RNA-seq read analysis. A k-mer profiling approach detects candidate mutations, indels and splice or chimeric junctions in each single read. CRAC increases precision compared with existing tools, reaching 99.5% for splice junctions, without losing sensitivity. Importantly, CRAC predictions improve with read length. In cancer libraries, CRAC recovered 74% of validated fusion RNAs and predicted novel recurrent chimeric junctions. CRAC is available at http://crac.gforge.inria.fr.


Subject(s)
Algorithms ; Sequence Analysis, RNA/methods ; Breast Neoplasms/genetics ; Computer Simulation ; Female ; Gene Library ; Genome ; Humans ; RNA Splice Sites/genetics
18.
BMC Bioinformatics ; 12: 242, 2011 Jun 17.
Article in English | MEDLINE | ID: mdl-21682852

ABSTRACT

BACKGROUND: High Throughput Sequencing (HTS) is now heavily exploited for genome (re-) sequencing, metagenomics, epigenomics, and transcriptomics and requires different, but computationally intensive, bioinformatic analyses. When a reference genome is available, mapping reads on it is the first step of this analysis. Read mapping programs owe their efficiency to the use of involved genome indexing data structures, like the Burrows-Wheeler transform. Recent solutions index both the genome and the k-mers of the reads using hash tables to further increase efficiency and accuracy. In various contexts (e.g. assembly or transcriptome analysis), read processing requires determining the sub-collection of reads that are related to a given sequence, which is done by searching for some k-mers in the reads. Currently, many developments have focused on genome indexing structures for read mapping, but the question of read indexing remains broadly unexplored. However, the increase in sequence throughput calls for new algorithmic solutions to query large read collections efficiently. RESULTS: Here, we present a solution, named Gk arrays, to index large collections of reads, an algorithm to build the structure, and procedures to query it. Once constructed, the index structure is kept in main memory and is repeatedly accessed to answer queries like "given a k-mer, get the reads containing this k-mer (once/at least once)". We compared our structure to other solutions that adapt uncompressed indexing structures designed for long texts and show that it processes queries fast, while requiring much less memory. Our structure can thus handle larger read collections. We provide examples where such queries are adapted to different types of read analysis (SNP detection, assembly, RNA-Seq). CONCLUSIONS: Gk arrays constitute a versatile data structure that enables fast and more accurate read analysis in various contexts. The Gk arrays provide a flexible brick to design innovative programs that efficiently mine genomics, epigenomics, metagenomics, or transcriptomics reads. The Gk arrays library is available under Cecill (GPL compliant) license from http://www.atgc-montpellier.fr/ngs/.
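
The query quoted in the abstract, "given a k-mer, get the reads containing this k-mer (once/at least once)", can be illustrated with a small hash-based inverted index. This is a sketch of the query semantics only, not of the Gk arrays structure, which is array-based and designed to use far less memory; the function names and toy reads are assumptions made for this example.

```python
from collections import defaultdict

def build_read_index(reads, k):
    """Map each k-mer to {read_id: number of occurrences in that read}."""
    index = defaultdict(dict)
    for read_id, read in enumerate(reads):
        for i in range(len(read) - k + 1):
            occurrences = index[read[i:i + k]]
            occurrences[read_id] = occurrences.get(read_id, 0) + 1
    return index

def reads_containing(index, kmer, exactly_once=False):
    """Ids of reads containing kmer at least once, or exactly once if asked."""
    occurrences = index.get(kmer, {})  # .get avoids inserting missing k-mers
    return sorted(
        read_id
        for read_id, n in occurrences.items()
        if not exactly_once or n == 1
    )

# Toy reads; ids are positions in this list.
reads = ["ACGACG", "TTACG", "GGGG"]
index = build_read_index(reads, k=3)
```

Keeping a per-read occurrence count is what lets one index answer both variants of the query without rescanning the reads.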


Subject(s)
Algorithms ; Computational Biology/methods ; Computers ; High-Throughput Nucleotide Sequencing ; Humans ; Software