Search | VHL Regional Portal

1.

Arthur, Rudy; Schulz-Trieglaff, Ole; Cox, Anthony J; O'Connell, Jared.

Bioinformatics ; 33(1): 142-144, 2017 Jan 01.

Article in English | MEDLINE | ID: mdl-27634946

ABSTRACT

MOTIVATION: Ancestry and Kinship Toolkit (AKT) is a statistical genetics tool for analysing large cohorts of whole-genome sequenced samples. It can rapidly detect related samples, characterize sample ancestry, calculate correlation between variants, check Mendel consistency and perform data clustering. AKT brings together the functionality of many state-of-the-art methods, with a focus on speed and a unified interface. We believe it will be an invaluable tool for the curation of large WGS datasets. AVAILABILITY AND IMPLEMENTATION: The source code is available at https://illumina.github.io/akt CONTACTS: joconnell@illumina.com or rudy.d.arthur@gmail.com SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subject(s)

Human Genome, Pedigree, DNA Sequence Analysis/methods, Software, Cluster Analysis, Family, Female, Humans, Male, Phylogeny

2.

Rapid genotype refinement for whole-genome sequencing data using multi-variate normal distributions.

Arthur, Rudy; O'Connell, Jared; Schulz-Trieglaff, Ole; Cox, Anthony J.

Bioinformatics ; 32(15): 2306-12, 2016 Aug 01.

Article in English | MEDLINE | ID: mdl-27153730

ABSTRACT

MOTIVATION: Whole-genome low-coverage sequencing has been combined with linkage-disequilibrium (LD)-based genotype refinement to accurately and cost-effectively infer genotypes in large cohorts of individuals. Most genotype refinement methods are based on hidden Markov models, which are accurate but computationally expensive. We introduce an algorithm that models LD using a simple multivariate Gaussian distribution. The key feature of our algorithm is its speed. RESULTS: Our method is hundreds of times faster than other methods on the same data set and its scaling behaviour is linear in the number of samples. We demonstrate the performance of the method on both low- and high-coverage samples. AVAILABILITY AND IMPLEMENTATION: The source code is available at https://github.com/illumina/marvin CONTACT: rarthur@illumina.com SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
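Illustrative note: the abstract above says LD is modelled with a multivariate Gaussian distribution. One way to picture how such a model can refine a noisy genotype is the textbook conditional mean of a Gaussian, E[t | o] = mu_t + cov_to · cov_oo⁻¹ · (o − mu_o). The sketch below is not the marvin implementation; the function names, the dosage values and the covariance numbers are all assumptions chosen for illustration.

```python
from typing import List


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the caller's matrices are untouched.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def conditional_mean(mu_t: float, mu_o: List[float], cov_to: List[float],
                     cov_oo: List[List[float]], observed: List[float]) -> float:
    """Expected dosage at a target variant given dosages at neighbouring variants.

    mu_t, mu_o : mean dosages at the target and at the observed variants
    cov_to     : covariances between the target and each observed variant
    cov_oo     : covariance matrix of the observed variants (must be invertible)
    """
    diff = [o - m for o, m in zip(observed, mu_o)]
    weights = solve(cov_oo, diff)  # cov_oo^{-1} (observed - mu_o)
    return mu_t + sum(c * w for c, w in zip(cov_to, weights))


# Hypothetical numbers: two neighbouring variants in LD with the target.
refined = conditional_mean(1.0, [1.0, 1.0], [0.4, 0.2],
                           [[0.5, 0.1], [0.1, 0.5]], [2.0, 1.0])
print(round(refined, 4))
```

The appeal of a closed-form update like this is that it needs only linear algebra on small matrices, which fits the abstract's emphasis on speed; how the published method estimates the means and covariances is not stated in the abstract.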

Subject(s)

Genotype, Linkage Disequilibrium, Software, Algorithms, Humans, Normal Distribution

3.

Manta: rapid detection of structural variants and indels for germline and cancer sequencing applications.

Chen, Xiaoyu; Schulz-Trieglaff, Ole; Shaw, Richard; Barnes, Bret; Schlesinger, Felix; Källberg, Morten; Cox, Anthony J; Kruglyak, Semyon; Saunders, Christopher T.

Bioinformatics ; 32(8): 1220-2, 2016 Apr 15.

Article in English | MEDLINE | ID: mdl-26647377

ABSTRACT

UNLABELLED: We describe Manta, a method to discover structural variants and indels from next generation sequencing data. Manta is optimized for rapid germline and somatic analysis, calling structural variants, medium-sized indels and large insertions on standard compute hardware in less than a tenth of the time that comparable methods require to identify only subsets of these variant types: for example NA12878 at 50× genomic coverage is analyzed in less than 20 min. Manta can discover and score variants based on supporting paired and split-read evidence, with scoring models optimized for germline analysis of diploid individuals and somatic analysis of tumor-normal sample pairs. Call quality is similar to or better than comparable methods, as determined by pedigree consistency of germline calls and comparison of somatic calls to COSMIC database variants. Manta consistently assembles a higher fraction of its calls to base-pair resolution, allowing for improved downstream annotation and analysis of clinical significance. We provide Manta as a community resource to facilitate practical and routine structural variant analysis in clinical and research sequencing scenarios. AVAILABILITY AND IMPLEMENTATION: Manta is released under the open-source GPLv3 license. Source code, documentation and Linux binaries are available from https://github.com/Illumina/manta. CONTACT: csaunders@illumina.com SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subject(s)

High-Throughput Nucleotide Sequencing, INDEL Mutation, Neoplasms/genetics, Neoplasm DNA, Genome, Genomics, Humans, Software

4.

NxRepair: error correction in de novo sequence assembly using Nextera mate pairs.

Murphy, Rebecca R; O'Connell, Jared; Cox, Anthony J; Schulz-Trieglaff, Ole.

PeerJ ; 3: e996, 2015.

Article in English | MEDLINE | ID: mdl-26056623

ABSTRACT

Scaffolding errors and incorrect repeat disambiguation during de novo assembly can result in large scale misassemblies in draft genomes. Nextera mate pair sequencing data provide additional information to resolve assembly ambiguities during scaffolding. Here, we introduce NxRepair, an open source toolkit for error correction in de novo assemblies that uses Nextera mate pair libraries to identify and correct large-scale errors. We show that NxRepair can identify and correct large scaffolding errors, without use of a reference sequence, resulting in quantitative improvements in the assembly quality. NxRepair can be downloaded from GitHub or PyPI, the Python Package Index; a tutorial and user documentation are also available.

5.

NxTrim: optimized trimming of Illumina mate pair reads.

O'Connell, Jared; Schulz-Trieglaff, Ole; Carlson, Emma; Hims, Matthew M; Gormley, Niall A; Cox, Anthony J.

Bioinformatics ; 31(12): 2035-7, 2015 Jun 15.

Article in English | MEDLINE | ID: mdl-25661542

ABSTRACT

MOTIVATION: Mate pair protocols add to the utility of paired-end sequencing by boosting the genomic distance spanned by each pair of reads, potentially allowing larger repeats to be bridged and resolved. The Illumina Nextera Mate Pair (NMP) protocol uses a circularization-based strategy that leaves behind 38-bp adapter sequences, which must be computationally removed from the data. While 'adapter trimming' is a well-studied area of bioinformatics, existing tools do not fully exploit the particular properties of NMP data and discard more data than is necessary. RESULTS: We present NxTrim, a tool that strives to discard as little sequence as possible from NMP reads. NxTrim makes full use of the sequence on both sides of the adapter site to build 'virtual libraries' of mate pairs, paired-end reads and single-ended reads. For bacterial data, we show that aggregating these datasets allows a single NMP library to yield an assembly whose quality compares favourably to that obtained from regular paired-end reads. AVAILABILITY AND IMPLEMENTATION: The source code is available at https://github.com/sequencing/NxTrim
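Illustrative note: the abstract above describes keeping the sequence on both sides of the adapter site rather than discarding it. A minimal sketch of that idea follows, assuming an exact adapter match and a hypothetical minimum segment length; the function name, the toy adapter string and the length threshold are assumptions, and NxTrim's actual matching rules and its assignment of segments to virtual libraries are not given in the abstract.

```python
from typing import Tuple


def split_at_adapter(read: str, adapter: str, min_len: int = 3) -> Tuple[str, str]:
    """Split a read at the first exact occurrence of the adapter.

    Returns (left, right): the sequence before and after the adapter.
    A segment shorter than min_len is returned as an empty string.
    If the adapter is absent, the whole read is returned as `left`.
    """
    pos = read.find(adapter)
    if pos == -1:
        return read, ""
    left = read[:pos]
    right = read[pos + len(adapter):]
    if len(left) < min_len:
        left = ""
    if len(right) < min_len:
        right = ""
    return left, right


# Toy adapter, NOT the real 38-bp Nextera junction sequence.
TOY_ADAPTER = "GGGGCCCC"
print(split_at_adapter("ACGTACGGGGCCCCTTAGC", TOY_ADAPTER))
```

Exact matching is the main simplification here: real reads contain sequencing errors and partial adapters at read ends, so a practical trimmer would need approximate matching.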

Subject(s)

Bacteria/genetics, Computational Biology/methods, Bacterial Genome, High-Throughput Nucleotide Sequencing/methods, DNA Sequence Analysis/methods, Software, Gene Library

6.

BEETL-fastq: a searchable compressed archive for DNA reads.

Janin, Lilian; Schulz-Trieglaff, Ole; Cox, Anthony J.

Bioinformatics ; 30(19): 2796-801, 2014 Oct.

Article in English | MEDLINE | ID: mdl-24950811

ABSTRACT

MOTIVATION: FASTQ is a standard file format for DNA sequencing data, which stores both nucleotides and quality scores. A typical sequencing study can easily generate hundreds of gigabytes of FASTQ files, while public archives such as ENA and NCBI and large international collaborations such as the Cancer Genome Atlas can accumulate many terabytes of data in this format. Compression tools such as gzip are often used to reduce the storage burden but have the disadvantage that the data must be decompressed before they can be used. Here, we present BEETL-fastq, a tool that not only compresses FASTQ-formatted DNA reads more compactly than gzip but also permits rapid search for k-mer queries within the archived sequences. Importantly, the full FASTQ record of each matching read or read pair is returned, allowing the search results to be piped directly to any of the many standard tools that accept FASTQ data as input. RESULTS: We show that 6.6 terabytes of human reads in FASTQ format can be transformed into 1.7 terabytes of indexed files, from where we can search for 1, 10, 100, 1000 and a million of 30-mers in 3, 8, 14, 45 and 567 s, respectively, plus 20 ms per output read. Useful applications of the search capability are highlighted, including the genotyping of structural variant breakpoints and 'in silico pull-down' experiments in which only the reads that cover a region of interest are selectively extracted for the purposes of variant calling or visualization. AVAILABILITY AND IMPLEMENTATION: BEETL-fastq is part of the BEETL library, available as a github repository at github.com/BEETL/BEETL.
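Illustrative note: the abstract above describes searching for k-mers directly in a compressed archive. BEETL is built around the Burrows-Wheeler transform, and the textbook way to count pattern occurrences in a BWT is backward search. The sketch below shows that technique on a single short string; it is an assumption-laden toy (naive rotation sort, linear-time occurrence counts, hypothetical function names) and does not reflect BEETL-fastq's on-disk index or its retrieval of full FASTQ records.

```python
def bwt(text: str) -> str:
    """Burrows-Wheeler transform via sorted rotations ('$' marks the end)."""
    text += "$"
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rot[-1] for rot in rotations)


def count_occurrences(bwt_str: str, pattern: str) -> int:
    """Count occurrences of pattern in the original text by backward search."""
    # c_table[ch]: number of characters in the text strictly smaller than ch.
    counts = {}
    for ch in bwt_str:
        counts[ch] = counts.get(ch, 0) + 1
    c_table = {}
    total = 0
    for ch in sorted(counts):
        c_table[ch] = total
        total += counts[ch]

    def occ(ch: str, i: int) -> int:
        # Occurrences of ch in bwt_str[:i]; a real index would precompute this.
        return bwt_str.count(ch, 0, i)

    lo, hi = 0, len(bwt_str)  # half-open range of matching sorted rotations
    for ch in reversed(pattern):
        if ch not in c_table:
            return 0
        lo = c_table[ch] + occ(ch, lo)
        hi = c_table[ch] + occ(ch, hi)
        if lo >= hi:
            return 0
    return hi - lo


index = bwt("ACGTACGA")
print(count_occurrences(index, "ACG"))
```

Backward search touches the index once per pattern character, independent of text length, which is consistent with the sub-minute query times the abstract reports for large batches of 30-mers.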

Subject(s)

Data Compression/methods, Neoplasms/genetics, DNA Sequence Analysis/methods, Algorithms, Computer Simulation, DNA, Genome, Human Genome, Genotype, High-Throughput Nucleotide Sequencing, Humans, Software

7.

metaBEETL: high-throughput analysis of heterogeneous microbial populations from shotgun DNA sequences.

Ander, Christina; Schulz-Trieglaff, Ole B; Stoye, Jens; Cox, Anthony J.

BMC Bioinformatics ; 14 Suppl 5: S2, 2013.

Article in English | MEDLINE | ID: mdl-23734710

ABSTRACT

Environmental shotgun sequencing (ESS) has potential to give greater insight into microbial communities than targeted sequencing of 16S regions, but requires much higher sequence coverage. The advent of next-generation sequencing has made it feasible for the Human Microbiome Project and other initiatives to generate ESS data on a large scale, but computationally efficient methods for analysing such data sets are needed. Here we present metaBEETL, a fast taxonomic classifier for environmental shotgun sequences. It uses a Burrows-Wheeler Transform (BWT) index of the sequencing reads and an indexed database of microbial reference sequences. Unlike other BWT-based tools, our method has no upper limit on the number or the total size of the reference sequences in its database. By capturing sequence relationships between strains, our reference index also allows us to classify reads which are not unique to an individual strain but are nevertheless specific to some higher phylogenetic order. Tested on datasets with known taxonomic composition, metaBEETL gave results that are competitive with existing similarity-based tools: due to normalization steps which other classifiers lack, the taxonomic profile computed by metaBEETL closely matched the true environmental profile. At the same time, its moderate running time and low memory footprint allow metaBEETL to scale well to large data sets. Code to construct the BWT indexed database and for the taxonomic classification is part of the BEETL library, available as a github repository at git@github.com:BEETL/BEETL.git.
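Illustrative note: the abstract above mentions classifying reads that are not unique to one strain but are specific to a higher taxonomic level. A common way to express that idea is to assign the read to the deepest taxon shared by every reference it matches. Whether metaBEETL does exactly this is not stated in the abstract, so the sketch below is an assumption; the function name and the lineages are hypothetical.

```python
from typing import List, Optional


def deepest_shared_taxon(lineages: List[List[str]]) -> Optional[str]:
    """Return the deepest taxon common to all root-to-leaf lineages.

    Each lineage lists taxa from the root downwards. Returns None when the
    input is empty or the lineages do not even share a root.
    """
    shared = None
    for ranks in zip(*lineages):
        if all(r == ranks[0] for r in ranks):
            shared = ranks[0]
        else:
            break
    return shared


# Hypothetical lineages for the references a single read matched.
hits = [
    ["Bacteria", "Proteobacteria", "Escherichia", "Escherichia coli", "K-12"],
    ["Bacteria", "Proteobacteria", "Escherichia", "Escherichia coli", "O157:H7"],
]
print(deepest_shared_taxon(hits))
```

A read matching only one strain is assigned to that strain, while a read matching distant organisms falls back to a high-level taxon, so ambiguous reads still contribute to the profile at the level they can support.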

Subject(s)

High-Throughput Nucleotide Sequencing/methods, Metagenomics/methods, Microbiota, DNA Sequence Analysis/methods, Algorithms, Environmental Microbiology, Humans, Phylogeny

8.

Genomic variation among contemporary Pseudomonas aeruginosa isolates from chronically infected cystic fibrosis patients.

Chung, Jade C S; Becq, Jennifer; Fraser, Louise; Schulz-Trieglaff, Ole; Bond, Nicholas J; Foweraker, Juliet; Bruce, Kenneth D; Smith, Geoffrey P; Welch, Martin.

J Bacteriol ; 194(18): 4857-66, 2012 Sep.

Article in English | MEDLINE | ID: mdl-22753054

ABSTRACT

The airways of individuals with cystic fibrosis (CF) often become chronically infected with unique strains of the opportunistic pathogen Pseudomonas aeruginosa. Several lines of evidence suggest that the infecting P. aeruginosa lineage diversifies in the CF lung niche, yet so far this contemporary diversity has not been investigated at a genomic level. In this work, we sequenced the genomes of pairs of randomly selected contemporary isolates sampled from the expectorated sputum of three chronically infected adult CF patients. Each patient was infected by a distinct strain of P. aeruginosa. Single nucleotide polymorphisms (SNPs) and insertions/deletions (indels) were identified in the DNA common to the paired isolates from different patients. The paired isolates from one patient differed due to just 1 SNP and 8 indels. The paired isolates from a second patient differed due to 54 SNPs and 38 indels. The pair of isolates from the third patient both contained a mutS mutation, which conferred a hypermutator phenotype; these isolates cumulatively differed due to 344 SNPs and 93 indels. In two of the pairs of isolates, a different accessory genome composition, specifically integrated prophage, was identified in one but not the other isolate of each pair. We conclude that contemporary isolates from a single sputum sample can differ at the SNP, indel, and accessory genome levels and that the cross-sectional genomic variation among coeval pairs of P. aeruginosa CF isolates can be comparable to the variation previously reported to differentiate between paired longitudinally sampled isolates.

Subject(s)

Cystic Fibrosis/complications, Genetic Variation, Pseudomonas Infections/microbiology, Pseudomonas aeruginosa/classification, Pseudomonas aeruginosa/genetics, Chronic Disease, Bacterial DNA/chemistry, Bacterial DNA/genetics, Humans, INDEL Mutation, Point Mutation, Prophages/genetics, Pseudomonas aeruginosa/isolation & purification, DNA Sequence Analysis, Sputum/microbiology

9.

Rapid whole-genome sequencing for investigation of a neonatal MRSA outbreak.

Köser, Claudio U; Holden, Matthew T G; Ellington, Matthew J; Cartwright, Edward J P; Brown, Nicholas M; Ogilvy-Stuart, Amanda L; Hsu, Li Yang; Chewapreecha, Claire; Croucher, Nicholas J; Harris, Simon R; Sanders, Mandy; Enright, Mark C; Dougan, Gordon; Bentley, Stephen D; Parkhill, Julian; Fraser, Louise J; Betley, Jason R; Schulz-Trieglaff, Ole B; Smith, Geoffrey P; Peacock, Sharon J.

N Engl J Med ; 366(24): 2267-75, 2012 Jun 14.

Article in English | MEDLINE | ID: mdl-22693998

ABSTRACT

BACKGROUND: Isolates of methicillin-resistant Staphylococcus aureus (MRSA) belonging to a single lineage are often indistinguishable by means of current typing techniques. Whole-genome sequencing may provide improved resolution to define transmission pathways and characterize outbreaks. METHODS: We investigated a putative MRSA outbreak in a neonatal intensive care unit. By using rapid high-throughput sequencing technology with a clinically relevant turnaround time, we retrospectively sequenced the DNA from seven isolates associated with the outbreak and another seven MRSA isolates associated with carriage of MRSA or bacteremia in the same hospital. RESULTS: We constructed a phylogenetic tree by comparing single-nucleotide polymorphisms (SNPs) in the core genome to a reference genome (an epidemic MRSA clone, EMRSA-15 [sequence type 22]). This revealed a distinct cluster of outbreak isolates and clear separation between these and the nonoutbreak isolates. A previously missed transmission event was detected between two patients with bacteremia who were not part of the outbreak. We created an artificial "resistome" of antibiotic-resistance genes and demonstrated concordance between it and the results of phenotypic susceptibility testing; we also created a "toxome" consisting of toxin genes. One outbreak isolate had a hypermutator phenotype with a higher number of SNPs than the other outbreak isolates, highlighting the difficulty of imposing a simple threshold for the number of SNPs between isolates to decide whether they are part of a recent transmission chain. CONCLUSIONS: Whole-genome sequencing can provide clinically relevant data within a time frame that can influence patient care. The need for automated data interpretation and the provision of clinically meaningful reports represent hurdles to clinical implementation. (Funded by the U.K. Clinical Research Collaboration Translational Infection Research Initiative and others.).

Subject(s)

Bacteremia/microbiology, Disease Outbreaks, Bacterial Genome, Methicillin-Resistant Staphylococcus aureus/genetics, Staphylococcal Infections/epidemiology, Bacteremia/epidemiology, Cross Infection/epidemiology, Cross Infection/microbiology, Bacterial DNA/analysis, Humans, Newborn Infant, Neonatal Intensive Care Units, Methicillin Resistance/genetics, Methicillin-Resistant Staphylococcus aureus/isolation & purification, Microbial Sensitivity Tests, Phenotype, Phylogeny, Single Nucleotide Polymorphism, Retrospective Studies, DNA Sequence Analysis/methods, Staphylococcal Infections/microbiology

10.

Genome sequencing and analysis of the Tasmanian devil and its transmissible cancer.

Murchison, Elizabeth P; Schulz-Trieglaff, Ole B; Ning, Zemin; Alexandrov, Ludmil B; Bauer, Markus J; Fu, Beiyuan; Hims, Matthew; Ding, Zhihao; Ivakhno, Sergii; Stewart, Caitlin; Ng, Bee Ling; Wong, Wendy; Aken, Bronwen; White, Simon; Alsop, Amber; Becq, Jennifer; Bignell, Graham R; Cheetham, R Keira; Cheng, William; Connor, Thomas R; Cox, Anthony J; Feng, Zhi-Ping; Gu, Yong; Grocock, Russell J; Harris, Simon R; Khrebtukova, Irina; Kingsbury, Zoya; Kowarsky, Mark; Kreiss, Alexandre; Luo, Shujun; Marshall, John; McBride, David J; Murray, Lisa; Pearse, Anne-Maree; Raine, Keiran; Rasolonjatovo, Isabelle; Shaw, Richard; Tedder, Philip; Tregidgo, Carolyn; Vilella, Albert J; Wedge, David C; Woods, Gregory M; Gormley, Niall; Humphray, Sean; Schroth, Gary; Smith, Geoffrey; Hall, Kevin; Searle, Stephen M J; Carter, Nigel P; Papenfuss, Anthony T.

Cell ; 148(4): 780-91, 2012 Feb 17.

Article in English | MEDLINE | ID: mdl-22341448

ABSTRACT

The Tasmanian devil (Sarcophilus harrisii), the largest marsupial carnivore, is endangered due to a transmissible facial cancer spread by direct transfer of living cancer cells through biting. Here we describe the sequencing, assembly, and annotation of the Tasmanian devil genome and whole-genome sequences for two geographically distant subclones of the cancer. Genomic analysis suggests that the cancer first arose from a female Tasmanian devil and that the clone has subsequently genetically diverged during its spread across Tasmania. The devil cancer genome contains more than 17,000 somatic base substitution mutations and bears the imprint of a distinct mutational process. Genotyping of somatic mutations in 104 geographically and temporally distributed Tasmanian devil tumors reveals the pattern of evolution and spread of this parasitic clonal lineage, with evidence of a selective sweep in one geographical area and persistence of parallel lineages in other populations.

Subject(s)

Facial Neoplasms/veterinary, Genomic Instability, Marsupialia/genetics, Mutation, Animals, Clonal Evolution, Endangered Species, Facial Neoplasms/epidemiology, Facial Neoplasms/genetics, Facial Neoplasms/pathology, Female, Genome-Wide Association Study, Male, Molecular Sequence Data, Tasmania/epidemiology

11.

Efficient de novo assembly of single-cell bacterial genomes from short-read data sets.

Chitsaz, Hamidreza; Yee-Greenbaum, Joyclyn L; Tesler, Glenn; Lombardo, Mary-Jane; Dupont, Christopher L; Badger, Jonathan H; Novotny, Mark; Rusch, Douglas B; Fraser, Louise J; Gormley, Niall A; Schulz-Trieglaff, Ole; Smith, Geoffrey P; Evers, Dirk J; Pevzner, Pavel A; Lasken, Roger S.

Nat Biotechnol ; 29(10): 915-21, 2011 Sep 18.

Article in English | MEDLINE | ID: mdl-21926975

ABSTRACT

Whole genome amplification by the multiple displacement amplification (MDA) method allows sequencing of DNA from single cells of bacteria that cannot be cultured. Assembling a genome is challenging, however, because MDA generates highly nonuniform coverage of the genome. Here we describe an algorithm tailored for short-read data from single cells that improves assembly through the use of a progressively increasing coverage cutoff. Assembly of reads from single Escherichia coli and Staphylococcus aureus cells captures >91% of genes within contigs, approaching the 95% captured from an assembly based on many E. coli cells. We apply this method to assemble a genome from a single cell of an uncultivated SAR324 clade of Deltaproteobacteria, a cosmopolitan bacterial lineage in the global ocean. Metabolic reconstruction suggests that SAR324 is aerobic, motile and chemotaxic. Our approach enables acquisition of genome assemblies for individual uncultivated bacteria using only short reads, providing cell-specific genetic information absent from metagenomic studies.

Subject(s)

Bacteria/cytology, Bacteria/genetics, Nucleic Acid Databases, Bacterial Genome/genetics, DNA Sequence Analysis/methods, Single-Cell Analysis/methods, Algorithms, Base Sequence, Contig Mapping, Deltaproteobacteria/cytology, Deltaproteobacteria/genetics, Escherichia coli/cytology, Escherichia coli/genetics, Likelihood Functions, Staphylococcus aureus/cytology, Staphylococcus aureus/genetics

12.

A comprehensive catalogue of somatic mutations from a human cancer genome.

Pleasance, Erin D; Cheetham, R Keira; Stephens, Philip J; McBride, David J; Humphray, Sean J; Greenman, Chris D; Varela, Ignacio; Lin, Meng-Lay; Ordóñez, Gonzalo R; Bignell, Graham R; Ye, Kai; Alipaz, Julie; Bauer, Markus J; Beare, David; Butler, Adam; Carter, Richard J; Chen, Lina; Cox, Anthony J; Edkins, Sarah; Kokko-Gonzales, Paula I; Gormley, Niall A; Grocock, Russell J; Haudenschild, Christian D; Hims, Matthew M; James, Terena; Jia, Mingming; Kingsbury, Zoya; Leroy, Catherine; Marshall, John; Menzies, Andrew; Mudie, Laura J; Ning, Zemin; Royce, Tom; Schulz-Trieglaff, Ole B; Spiridou, Anastassia; Stebbings, Lucy A; Szajkowski, Lukasz; Teague, Jon; Williamson, David; Chin, Lynda; Ross, Mark T; Campbell, Peter J; Bentley, David R; Futreal, P Andrew; Stratton, Michael R.

Nature ; 463(7278): 191-6, 2010 Jan 14.

Article in English | MEDLINE | ID: mdl-20016485

ABSTRACT

All cancers carry somatic mutations. A subset of these somatic alterations, termed driver mutations, confer selective growth advantage and are implicated in cancer development, whereas the remainder are passengers. Here we have sequenced the genomes of a malignant melanoma and a lymphoblastoid cell line from the same person, providing the first comprehensive catalogue of somatic mutations from an individual cancer. The catalogue provides remarkable insights into the forces that have shaped this cancer genome. The dominant mutational signature reflects DNA damage due to ultraviolet light exposure, a known risk factor for malignant melanoma, whereas the uneven distribution of mutations across the genome, with a lower prevalence in gene footprints, indicates that DNA repair has been preferentially deployed towards transcribed regions. The results illustrate the power of a cancer genome sequence to reveal traces of the DNA damage, repair, mutation and selection processes that were operative years before the cancer became symptomatic.

Subject(s)

Neoplasm Genes/genetics, Human Genome/genetics, Mutation/genetics, Neoplasms/genetics, Adult, Tumor Cell Line, DNA Damage/genetics, DNA Mutational Analysis, DNA Repair/genetics, Gene Dosage/genetics, Humans, Loss of Heterozygosity/genetics, Male, Melanoma/etiology, Melanoma/genetics, MicroRNAs/genetics, Insertional Mutagenesis/genetics, Neoplasms/etiology, Single Nucleotide Polymorphism/genetics, Precision Medicine, Sequence Deletion/genetics, Ultraviolet Rays

13.

Statistical quality assessment and outlier detection for liquid chromatography-mass spectrometry experiments.

Schulz-Trieglaff, Ole; Machtejevas, Egidijus; Reinert, Knut; Schlüter, Hartmut; Thiemann, Joachim; Unger, Klaus.

BioData Min ; 2(1): 4, 2009 Apr 07.

Article in English | MEDLINE | ID: mdl-19351414

ABSTRACT

BACKGROUND: Quality assessment methods that are commonplace in engineering and industrial production are not widespread in large-scale proteomics experiments. But modern technologies such as Multi-Dimensional Liquid Chromatography coupled to Mass Spectrometry (LC-MS) produce large quantities of proteomic data. These data are prone to measurement errors and reproducibility problems, so automatic quality assessment and control are becoming increasingly important. RESULTS: We propose a methodology to assess the quality and reproducibility of data generated in quantitative LC-MS experiments. We introduce quality descriptors that capture different aspects of the quality and reproducibility of LC-MS data sets. Our method is based on the Mahalanobis distance and a robust Principal Component Analysis. CONCLUSION: We evaluate our approach on several data sets of different complexities and show that we are able to precisely detect LC-MS runs of poor signal quality in large-scale studies.
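Illustrative note: the abstract above names the Mahalanobis distance as the basis for flagging poor-quality runs. The sketch below computes it for two quality descriptors per run, using the ordinary sample mean and covariance. That is a simplification: the paper combines the distance with a robust PCA, which is not reproduced here. The function name, the descriptors and their values are assumptions.

```python
import math
from typing import List, Tuple


def mahalanobis_2d(points: List[Tuple[float, float]]) -> List[float]:
    """Mahalanobis distance of each 2-D point from the sample mean.

    Uses the (non-robust) sample covariance with an n-1 denominator.
    """
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    det = sxx * syy - sxy ** 2
    if det <= 0:
        raise ValueError("covariance matrix is singular")
    distances = []
    for x, y in points:
        dx, dy = x - mx, y - my
        # d^2 = v^T S^{-1} v with S^{-1} = (1/det) [[syy, -sxy], [-sxy, sxx]]
        d2 = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
        distances.append(math.sqrt(d2))
    return distances


# Hypothetical descriptors per LC-MS run; the last run is deliberately odd.
runs = [(1.0, 2.0), (1.2, 1.9), (0.9, 2.1), (1.1, 2.0), (3.0, 0.5)]
dists = mahalanobis_2d(runs)
print(dists.index(max(dists)))
```

Because the outlier itself inflates the sample covariance, its distance is smaller than it would be under a robust estimate, which is one reason robust estimators are preferred for this task.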

14.

LC-MSsim--a simulation software for liquid chromatography mass spectrometry data.

Schulz-Trieglaff, Ole; Pfeifer, Nico; Gröpl, Clemens; Kohlbacher, Oliver; Reinert, Knut.

BMC Bioinformatics ; 9: 423, 2008 Oct 08.

Article in English | MEDLINE | ID: mdl-18842122

ABSTRACT

BACKGROUND: Mass Spectrometry coupled to Liquid Chromatography (LC-MS) is commonly used to analyze the protein content of biological samples in large scale studies. The data resulting from an LC-MS experiment is huge, highly complex and noisy. Accordingly, it has sparked new developments in Bioinformatics, especially in the fields of algorithm development, statistics and software engineering. In a quantitative label-free mass spectrometry experiment, crucial steps are the detection of peptide features in the mass spectra and the alignment of samples by correcting for shifts in retention time. At the moment, it is difficult to compare the plethora of algorithms for these tasks. So far, curated benchmark data exists only for peptide identification algorithms but no data that represents a ground truth for the evaluation of feature detection, alignment and filtering algorithms. RESULTS: We present LC-MSsim, a simulation software for LC-ESI-MS experiments. It simulates ESI spectra on the MS level. It reads a list of proteins from a FASTA file and digests the protein mixture using a user-defined enzyme. The software creates an LC-MS data set using a predictor for the retention time of the peptides and a model for peak shapes and elution profiles of the mass spectral peaks. Our software also offers the possibility to add contaminants, to change the background noise level and includes a model for the detectability of peptides in mass spectra. After the simulation, LC-MSsim writes the simulated data to mzData, a public XML format. The software also stores the positions (monoisotopic m/z and retention time) and ion counts of the simulated ions in separate files. CONCLUSION: LC-MSsim generates simulated LC-MS data sets and incorporates models for peak shapes and contaminations. Algorithm developers can match the results of feature detection and alignment algorithms against the simulated ion lists and meaningful error rates can be computed. We anticipate that LC-MSsim will be useful to the wider community to perform benchmark studies and comparisons between computational tools.
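Illustrative note: the abstract above says the simulator digests proteins with a user-defined enzyme. As a sketch of what such an in silico digestion step looks like, the code below applies the usual trypsin rule (cleave after K or R unless the next residue is P). Trypsin is chosen here only as a familiar example; the function name, the missed-cleavage parameter and the toy sequence are assumptions, and this is not LC-MSsim's code.

```python
from typing import List


def digest_trypsin(protein: str, missed_cleavages: int = 0) -> List[str]:
    """In silico tryptic digest: cleave after K or R, but not before P.

    missed_cleavages is the maximum number of skipped cleavage sites
    allowed within one peptide.
    """
    if not protein:
        return []
    # First cut the sequence at every cleavage site.
    fragments = []
    start = 0
    for i, aa in enumerate(protein):
        is_last = i == len(protein) - 1
        if aa in "KR" and not is_last and protein[i + 1] != "P":
            fragments.append(protein[start:i + 1])
            start = i + 1
    fragments.append(protein[start:])
    # Then join up to `missed_cleavages` extra neighbouring fragments.
    peptides = []
    for i in range(len(fragments)):
        last = min(i + missed_cleavages, len(fragments) - 1)
        for j in range(i, last + 1):
            peptides.append("".join(fragments[i:j + 1]))
    return peptides


print(digest_trypsin("AAKRPGGRCCK"))
```

In the toy sequence the R at position 4 is followed by P, so no cut is made there; allowing one missed cleavage additionally yields the two joined peptides.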

Subject(s)

Artifacts, Liquid Chromatography/standards, Mass Spectrometry/standards, Software, Algorithms, Computational Biology, Computer Simulation, Data Compression/methods, Humans, Ions/analysis, Peptides/analysis, Quality Control, Reference Values, Electrospray Ionization Mass Spectrometry

15.

Computational quantification of peptides from LC-MS data.

Schulz-Trieglaff, Ole; Hussong, Rene; Gröpl, Clemens; Leinenbach, Andreas; Hildebrandt, Andreas; Huber, Christian; Reinert, Knut.

J Comput Biol ; 15(7): 685-704, 2008 Sep.

Article in English | MEDLINE | ID: mdl-18707556

ABSTRACT

Liquid chromatography coupled to mass spectrometry (LC-MS) has become a major tool for the study of biological processes. High-throughput LC-MS experiments are frequently conducted in modern laboratories, generating an enormous amount of data per day. A manual inspection is therefore no longer a feasible task. Consequently, there is a need for computational tools that can rapidly provide information about mass, elution time, and abundance of the compounds in a LC-MS sample. We present an algorithm for the detection and quantification of peptides in LC-MS data. Our approach is flexible and independent of the MS technology in use. It is based on a combination of the sweep line paradigm with a novel wavelet function tailored to detect isotopic patterns of peptides. We propose a simple voting schema to use the redundant information in consecutive scans for an accurate determination of monoisotopic masses and charge states. By explicitly modeling the instrument inaccuracy, we are also able to cope with data sets of different quality and resolution. We evaluate our technique on data from different instruments and show that we can rapidly estimate mass, centroid of retention time, and abundance of peptides in a sound algorithmic framework. Finally, we compare the performance of our method to several other techniques on three data sets of varying complexity.
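Illustrative note: the abstract above mentions a voting scheme across consecutive scans to determine charge states and monoisotopic masses. The sketch below shows the underlying arithmetic under simple assumptions: isotope peaks of a peptide with charge z are spaced roughly 1.00335/z apart in m/z, each scan casts one vote, and the majority wins. The paper's wavelet-based detection and its actual voting rules are not reproduced; all names and peak values are hypothetical.

```python
from collections import Counter
from typing import List

ISOTOPE_SPACING = 1.00335  # approx. mass difference between 13C and 12C (Da)
PROTON_MASS = 1.00728      # approx. mass of a proton (Da)


def charge_from_spacing(mz_peaks: List[float]) -> int:
    """Estimate charge from the mean spacing of consecutive isotope peaks.

    Expects at least two m/z values in ascending order.
    """
    spacings = [b - a for a, b in zip(mz_peaks, mz_peaks[1:])]
    mean_spacing = sum(spacings) / len(spacings)
    return round(ISOTOPE_SPACING / mean_spacing)


def vote_charge(scans: List[List[float]]) -> int:
    """Each scan votes for a charge state; return the most frequent one."""
    votes = Counter(charge_from_spacing(peaks) for peaks in scans)
    return votes.most_common(1)[0][0]


def neutral_mass(mono_mz: float, charge: int) -> float:
    """Neutral monoisotopic mass of a positively charged (protonated) ion."""
    return (mono_mz - PROTON_MASS) * charge


# Hypothetical isotope patterns of one peptide in three consecutive scans;
# the third scan is noisy and suggests the wrong charge.
scans = [
    [500.00, 500.50, 501.00],
    [500.01, 500.51, 501.02],
    [500.00, 500.34, 500.67],
]
z = vote_charge(scans)
print(z, round(neutral_mass(500.00, z), 3))
```

Pooling evidence over scans makes the estimate tolerant of a single distorted spectrum, which is the benefit of redundancy that the abstract alludes to.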

Subject(s)

Algorithms, Liquid Chromatography/methods, Mass Spectrometry/methods, Peptides/analysis, Animals, Halobacterium/chemistry, Humans, Myoglobin/chemistry, Regression Analysis, Software

16.

OpenMS - an open-source software framework for mass spectrometry.

Sturm, Marc; Bertsch, Andreas; Gröpl, Clemens; Hildebrandt, Andreas; Hussong, Rene; Lange, Eva; Pfeifer, Nico; Schulz-Trieglaff, Ole; Zerck, Alexandra; Reinert, Knut; Kohlbacher, Oliver.

BMC Bioinformatics ; 9: 163, 2008 Mar 26.

Article in English | MEDLINE | ID: mdl-18366760

ABSTRACT

BACKGROUND: Mass spectrometry is an essential analytical technique for high-throughput analysis in proteomics and metabolomics. The development of new separation techniques, precise mass analyzers and experimental protocols is a very active field of research. This leads to more complex experimental setups yielding ever increasing amounts of data. Consequently, analysis of the data is currently often the bottleneck for experimental studies. Although software tools for many data analysis tasks are available today, they are often hard to combine with each other or not flexible enough to allow for rapid prototyping of a new analysis workflow. RESULTS: We present OpenMS, a software framework for rapid application development in mass spectrometry. OpenMS has been designed to be portable, easy-to-use and robust while offering a rich functionality ranging from basic data structures to sophisticated algorithms for data analysis. This has already been demonstrated in several studies. CONCLUSION: OpenMS is available under the Lesser GNU Public License (LGPL) from the project website at http://www.openms.de.

Subject(s)

Algorithms, Mass Spectrometry/methods, Programming Languages, Software

17.

A geometric approach for the alignment of liquid chromatography-mass spectrometry data.

Lange, Eva; Gröpl, Clemens; Schulz-Trieglaff, Ole; Leinenbach, Andreas; Huber, Christian; Reinert, Knut.

Bioinformatics ; 23(13): i273-81, 2007 Jul 01.

Article in English | MEDLINE | ID: mdl-17646306

ABSTRACT

MOTIVATION: Liquid chromatography coupled to mass spectrometry (LC-MS) and combined with tandem mass spectrometry (LC-MS/MS) have become a prominent tool for the analysis of complex proteomic samples. An important step in a typical workflow is the combination of results from multiple LC-MS experiments to improve confidence in the obtained measurements or to compare results from different samples. To do so, a suitable mapping or alignment between the data sets needs to be estimated. The alignment has to correct for variations in mass and elution time which are present in all mass spectrometry experiments. RESULTS: We propose a novel algorithm to align LC-MS samples and to match corresponding ion species across samples. Our algorithm matches landmark signals between two data sets using a geometric technique based on pose clustering. Variations in mass and retention time are corrected by an affine dewarping function estimated from matched landmarks. We use the pairwise dewarping in an algorithm for aligning multiple samples. We show that our pose clustering approach is fast and reliable as compared to previous approaches. It is robust in the presence of noise and able to accurately align samples with only a few common ion species. In addition, we can easily handle different kinds of LC-MS data and adapt our algorithm to new mass spectrometry technologies. AVAILABILITY: This algorithm is implemented as part of the OpenMS software library for shotgun proteomics and available under the Lesser GNU Public License (LGPL) at www.openms.de.
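Illustrative note: the abstract above describes correcting retention time with an affine dewarping function estimated from matched landmarks. Assuming the landmark pairs are already matched (the pose clustering step is not shown), one simple way to estimate such a function is an ordinary least-squares fit of rt_reference ≈ scale · rt_sample + shift. The sketch below does that; the function names and the landmark values are hypothetical, and the OpenMS implementation may estimate the transformation differently.

```python
from typing import List, Tuple


def fit_affine(pairs: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Least-squares fit of y = scale * x + shift to matched landmark pairs.

    Each pair is (rt in the sample map, rt in the reference map).
    """
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    if sxx == 0:
        raise ValueError("need landmarks at two or more distinct retention times")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    scale = sxy / sxx
    shift = mean_y - scale * mean_x
    return scale, shift


def dewarp(rt: float, scale: float, shift: float) -> float:
    """Map a retention time from the sample map onto the reference map."""
    return scale * rt + shift


# Hypothetical matched landmarks lying on rt_ref = 1.1 * rt_sample + 5.
landmarks = [(10.0, 16.0), (20.0, 27.0), (30.0, 38.0)]
scale, shift = fit_affine(landmarks)
print(round(scale, 3), round(shift, 3), round(dewarp(40.0, scale, shift), 3))
```

The same fit applies to the mass dimension with m/z values in place of retention times. A plain least-squares fit is sensitive to mismatched landmarks, so the quality of the matching step matters.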

Subject(s)

Algorithms , Chromatography, Liquid/methods , Mass Spectrometry/methods , Peptide Mapping/methods , Proteome/chemistry , Sequence Alignment/methods , Sequence Analysis, Protein/methods , Amino Acid Sequence

18.

TOPP--the OpenMS proteomics pipeline.

Kohlbacher, Oliver; Reinert, Knut; Gröpl, Clemens; Lange, Eva; Pfeifer, Nico; Schulz-Trieglaff, Ole; Sturm, Marc.

Bioinformatics ; 23(2): e191-7, 2007 Jan 15.

Article in English | MEDLINE | ID: mdl-17237091

ABSTRACT

MOTIVATION: Experimental techniques in proteomics have seen rapid development over the last few years. Volume and complexity of the data have both been growing at a similar rate. Accordingly, data management and analysis are among the major challenges in proteomics. Flexible algorithms are required to handle changing experimental setups and to assist in developing and validating new methods. In order to facilitate these studies, it would be desirable to have a flexible 'toolbox' of versatile and user-friendly applications allowing for rapid construction of computational workflows in proteomics. RESULTS: We describe a set of tools for proteomics data analysis: TOPP, The OpenMS Proteomics Pipeline. TOPP provides a set of computational tools which can be easily combined into analysis pipelines even by non-experts and can be used in proteomics workflows. These applications range from useful utilities (file format conversion, peak picking) through wrapper applications for known applications (e.g. Mascot) to completely new algorithmic techniques for data reduction and data analysis. We anticipate that TOPP will greatly facilitate rapid prototyping of proteomics data evaluation pipelines. As such, we describe the basic concepts and the current abilities of TOPP and illustrate these concepts in the context of two example applications: the identification of peptides from a raw dataset through database search and the complex analysis of a standard addition experiment for the absolute quantitation of biomarkers. The latter example demonstrates TOPP's ability to construct flexible analysis pipelines in support of complex experimental setups. AVAILABILITY: The TOPP components are available as open-source software under the lesser GNU public license (LGPL). Source code is available from the project website at www.OpenMS.de

Subject(s)

Database Management Systems , Databases, Protein , Information Storage and Retrieval/methods , Peptide Mapping/methods , Proteome/chemistry , Proteomics/methods , Software , Algorithms , Computer Graphics , Programming Languages , User-Computer Interface
