1.
Proteomics ; 24(8): e2300084, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38380501

ABSTRACT

Assigning statistical confidence estimates to discoveries produced by a tandem mass spectrometry proteomics experiment is critical to enabling principled interpretation of the results and assessing the cost/benefit ratio of experimental follow-up. The most common technique for computing such estimates is to use target-decoy competition (TDC), in which observed spectra are searched against a database of real (target) peptides and a database of shuffled or reversed (decoy) peptides. TDC procedures for estimating the false discovery rate (FDR) at a given score threshold have been developed for application at the level of spectra, peptides, or proteins. Although these techniques are relatively straightforward to implement, it is common in the literature to skip over the implementation details or even to make mistakes in how the TDC procedures are applied in practice. Here we present Crema, an open-source Python tool that implements several TDC methods of spectrum-, peptide- and protein-level FDR estimation. Crema is compatible with a variety of existing database search tools and provides a straightforward way to obtain robust FDR estimates.
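The TDC estimate described in the abstract above can be sketched in a few lines of Python. This is a minimal illustration and does not use Crema's API; the function name `tdc_qvalues`, the input layout, and the `+1` correction in the numerator are assumptions made for the example. Each input item is the winner of one target-versus-decoy competition.

```python
from typing import List, Tuple


def tdc_qvalues(winners: List[Tuple[float, bool]]) -> List[Tuple[float, bool, float]]:
    """Estimate q-values from target-decoy competition winners.

    winners: (score, is_target) pairs, one per competition. Hypothetical
    layout chosen for this sketch; ties in score are not handled specially.
    """
    ranked = sorted(winners, key=lambda w: w[0], reverse=True)
    targets = decoys = 0
    fdrs = []
    for _, is_target in ranked:
        if is_target:
            targets += 1
        else:
            decoys += 1
        # Estimated FDR among targets scoring at or above this rank.
        fdrs.append(min(1.0, (decoys + 1) / max(targets, 1)))
    # q-value: the smallest estimated FDR at this rank or any lower-scoring rank.
    qvalues = fdrs[:]
    for i in range(len(qvalues) - 2, -1, -1):
        qvalues[i] = min(qvalues[i], qvalues[i + 1])
    return [(score, is_target, q) for (score, is_target), q in zip(ranked, qvalues)]
```

A discovery list at a chosen threshold is then the set of targets whose q-value does not exceed it.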


Subject(s)
Algorithms; Peptides; Databases, Protein; Peptides/chemistry; Proteins/analysis; Proteomics/methods
2.
J Proteome Res ; 23(6): 1894-1906, 2024 Jun 07.
Article in English | MEDLINE | ID: mdl-38652578

ABSTRACT

Searching tandem mass spectrometry proteomics data against a database is a well-established method for assigning peptide sequences to observed spectra but typically cannot identify peptides harboring unexpected post-translational modifications (PTMs). Open modification searching aims to address this problem by allowing a spectrum to match a peptide even if the spectrum's precursor mass differs from the peptide mass. However, expanding the search space in this way can lead to a loss of statistical power to detect peptides. We therefore developed a method, called CONGA (combining open and narrow searches with group-wise analysis), that takes into account results from both types of searches (a traditional "narrow window" search and an open modification search) while carrying out rigorous false discovery rate control. The result is an algorithm that provides the best of both worlds: the ability to detect unexpected PTMs without a concomitant loss of power to detect unmodified peptides.


Subject(s)
Algorithms; Databases, Protein; Protein Processing, Post-Translational; Proteomics; Tandem Mass Spectrometry; Tandem Mass Spectrometry/methods; Proteomics/methods; Peptides/analysis; Peptides/chemistry; Humans; Software; Amino Acid Sequence
3.
J Proteome Res ; 23(6): 1907-1914, 2024 Jun 07.
Article in English | MEDLINE | ID: mdl-38687997

ABSTRACT

Traditional database search methods for the analysis of bottom-up proteomics tandem mass spectrometry (MS/MS) data are limited in their ability to detect peptides with post-translational modifications (PTMs). Recently, "open modification" database search strategies, in which the requirement that the mass of the database peptide closely matches the observed precursor mass is relaxed, have become popular as ways to find a wider variety of types of PTMs. Indeed, in one study, Kong et al. reported that the open modification search tool MSFragger can achieve higher statistical power to detect peptides than a traditional "narrow window" database search. We investigated this claim empirically and, in the process, uncovered a potential general problem with false discovery rate (FDR) control in the machine learning postprocessors Percolator and PeptideProphet. This problem might have contributed to Kong et al.'s report that their empirical results suggest that FDR control in the narrow window setting might generally be compromised. Indeed, reanalyzing the same data while using a more standard form of target-decoy competition-based FDR control, we found that, after accounting for chimeric spectra as well as for the inherent difference in the number of candidates in open and narrow searches, the data do not provide sufficient evidence that FDR control in proteomics MS/MS database search is inherently problematic.


Subject(s)
Databases, Protein; Protein Processing, Post-Translational; Proteomics; Tandem Mass Spectrometry; Tandem Mass Spectrometry/methods; Proteomics/methods; Peptides/analysis; Peptides/chemistry; Machine Learning; Humans; Algorithms; Software
4.
J Proteome Res ; 22(7): 2172-2178, 2023 Jul 07.
Article in English | MEDLINE | ID: mdl-37261867

ABSTRACT

Controlling the false discovery rate (FDR) among discoveries from a tandem mass spectrometry proteomics experiment using target decoy competition (TDC) controls only the proportion of false discoveries in an average sense. Thus, for any particular analysis, even with a valid FDR control procedure, the proportion of false discoveries (the FDP) may be higher than the specified FDR threshold. We demonstrate this phenomenon using real data and describe two recently developed methods that help bridge the gap between controlling the expected or average rate of false discoveries and the empirical rate (FDP). The FDP Stepdown method controls the FDP at any desired confidence level, and the TDC Uniform Band provides a confidence, or upper prediction bound, on the FDP in TDC's list of discoveries.
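The gap between average-case FDR control and the FDP of a single analysis can be illustrated with a small simulation. The sketch below is not the authors' experiment: the Gaussian score model, the parameter values, and the function names `simulate_fdp` and `summarize` are all assumptions chosen for illustration. Each simulated spectrum has one target and one decoy score, and the TDC cutoff is the lowest-ranked position at which `(decoys + 1) / targets` is at most `alpha`.

```python
import random


def simulate_fdp(n_spectra=500, frac_correct=0.5, shift=3.0, alpha=0.1, rng=None):
    """Run one simulated TDC analysis and return its false discovery proportion."""
    rng = rng or random.Random()
    winners = []  # (score, is_target, is_true_discovery)
    for _ in range(n_spectra):
        correct = rng.random() < frac_correct
        # Assumed model: correct target matches score higher on average.
        target = rng.gauss(shift if correct else 0.0, 1.0)
        decoy = rng.gauss(0.0, 1.0)
        if target > decoy:
            winners.append((target, True, correct))
        else:
            winners.append((decoy, False, False))
    winners.sort(key=lambda w: w[0], reverse=True)

    targets = decoys = false_targets = 0
    accepted, false_accepted = 0, 0
    for _, is_target, correct in winners:
        if is_target:
            targets += 1
            false_targets += 0 if correct else 1
        else:
            decoys += 1
        # Remember the lowest-ranked cutoff whose estimated FDR is within alpha.
        if targets > 0 and (decoys + 1) / targets <= alpha:
            accepted, false_accepted = targets, false_targets
    return false_accepted / accepted if accepted else 0.0


def summarize(n_runs=200, alpha=0.1, seed=0):
    """Return (mean FDP, fraction of runs whose FDP exceeds alpha)."""
    rng = random.Random(seed)
    fdps = [simulate_fdp(alpha=alpha, rng=rng) for _ in range(n_runs)]
    mean_fdp = sum(fdps) / n_runs
    frac_over = sum(f > alpha for f in fdps) / n_runs
    return mean_fdp, frac_over
```

Under these assumptions the mean FDP across runs should stay near or below `alpha`, while a noticeable fraction of individual runs exceeds it, which is the phenomenon the abstract describes.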


Subject(s)
Algorithms; Proteomics; Databases, Protein; Proteomics/methods; Tandem Mass Spectrometry
5.
Bioinformatics ; 38(Suppl_2): ii82-ii88, 2022 Sep 16.
Article in English | MEDLINE | ID: mdl-36124786

ABSTRACT

MOTIVATION: Target-decoy competition (TDC) is a commonly used method for false discovery rate (FDR) control in the analysis of tandem mass spectrometry data. This type of competition-based FDR control has recently gained significant popularity in other fields after Barber and Candès laid its theoretical foundation in a more general setting that included the feature selection problem. In both cases, the competition is based on a head-to-head comparison between an (observed) target score and a corresponding decoy (knockoff) score. However, the effectiveness of TDC depends on whether the data are homogeneous, which is often not the case: in many settings, the data consist of groups with different score profiles or different proportions of true nulls. In such cases, applying TDC while ignoring the group structure often yields imbalanced lists of discoveries, where some groups might include relatively many false discoveries and other groups relatively few. On the other hand, as we show, the alternative approach of applying TDC separately to each group does not rigorously control the FDR. RESULTS: We developed Group-walk, a procedure that controls the FDR in the target-decoy/knockoff setting while taking into account a given group structure. Group-walk is derived from the recently developed AdaPT, a general framework for controlling the FDR with side-information. We show using simulated and real datasets that when the data naturally divide into groups with different characteristics, Group-walk can deliver consistent power gains that in some cases are substantial. These groupings include the precursor charge state (4% more discovered peptides at 1% FDR threshold), the peptide length (3.6% increase) and the mass difference due to modifications (26% increase). AVAILABILITY AND IMPLEMENTATION: Group-walk is available at https://cran.r-project.org/web/packages/groupwalk/index.html. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Proteomics; Tandem Mass Spectrometry; Peptides/chemistry; Proteomics/methods; Tandem Mass Spectrometry/methods
6.
Biometrics ; 79(4): 3472-3484, 2023 Dec.
Article in English | MEDLINE | ID: mdl-36652258

ABSTRACT

Recently, Barber and Candès laid the theoretical foundation for a general framework for false discovery rate (FDR) control based on the notion of "knockoffs." A closely related FDR control methodology has long been employed in the analysis of mass spectrometry data, referred to there as "target-decoy competition" (TDC). However, any approach that aims to control the FDR, which is defined as the expected value of the false discovery proportion (FDP), suffers from a problem. Specifically, even when successfully controlling the FDR at level α, the FDP in the list of discoveries can significantly exceed α. We offer FDP-SD, a new procedure that rigorously controls the FDP in the knockoff/TDC competition setup by guaranteeing that the FDP is bounded by α at a desired confidence level. Compared with the recently published framework of Katsevich and Ramdas, FDP-SD generally delivers more power and often substantially so in simulated and real data.
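The distinction drawn in the abstract above can be written out explicitly. The symbols V (number of false discoveries), R (total number of discoveries), and γ (one minus the desired confidence level) are notation assumed here for illustration; the abstract itself only names α.

```latex
\begin{align*}
\mathrm{FDP} &= \frac{V}{\max(R, 1)}, &
\mathrm{FDR} &= \mathbb{E}[\mathrm{FDP}], \\
\text{FDR control at level } \alpha &: \quad \mathbb{E}[\mathrm{FDP}] \le \alpha, \\
\text{FDP control at confidence } 1 - \gamma &: \quad \Pr(\mathrm{FDP} \le \alpha) \ge 1 - \gamma .
\end{align*}
```

The first guarantee constrains only the average over hypothetical repetitions of the experiment, so the FDP of any single discovery list may still exceed α; the second bounds the probability of that event.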


Subject(s)
Algorithms; Mass Spectrometry; False Positive Reactions
7.
J Proteome Res ; 21(10): 2412-2420, 2022 Oct 07.
Article in English | MEDLINE | ID: mdl-36166314

ABSTRACT

The analysis of shotgun proteomics data often involves generating lists of inferred peptide-spectrum matches (PSMs) and/or of peptides. The canonical approach for generating these discovery lists is by controlling the false discovery rate (FDR), most commonly through target-decoy competition (TDC). At the PSM level, TDC is implemented by competing each spectrum's best-scoring target (real) peptide match with its best match against a decoy database. This PSM-level procedure can be adapted to the peptide level by selecting the top-scoring PSM per peptide prior to FDR estimation. Here, we first highlight and empirically augment a little-known previous work by He et al., which showed that TDC-based PSM-level FDR estimates can be liberally biased. We thus propose that researchers instead focus on peptide-level analysis. We then investigate three ways to carry out peptide-level TDC and show that the most common method ("PSM-only") offers the lowest statistical power in practice. An alternative approach that carries out a double competition, first at the PSM and then at the peptide level ("PSM-and-peptide"), is the most powerful method, yielding an average increase of 17% in discovered peptides at a 1% FDR threshold relative to the PSM-only method.
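The double competition named in the abstract above might look roughly like the following sketch. The abstract does not give implementation details, so several things here are assumptions: that every decoy peptide is paired with exactly one target peptide (supplied as the hypothetical mapping `decoy_to_target`), that ties are broken by input order, and the function name `psm_and_peptide` itself.

```python
def psm_and_peptide(psms, decoy_to_target):
    """Two-stage competition: per spectrum first, then per target/decoy peptide pair.

    psms: iterable of (spectrum_id, peptide, score) candidate matches.
    decoy_to_target: dict mapping each decoy peptide to its paired target
        (hypothetical pairing, e.g. a reversed sequence and its original).
    Returns winning peptides as (peptide, score, is_target), best score first.
    """
    # Stage 1: PSM-level competition keeps the best match for each spectrum.
    best_per_spectrum = {}
    for spectrum_id, peptide, score in psms:
        current = best_per_spectrum.get(spectrum_id)
        if current is None or score > current[1]:
            best_per_spectrum[spectrum_id] = (peptide, score)

    # Collapse to one score per peptide: its top-scoring surviving PSM.
    best_per_peptide = {}
    for peptide, score in best_per_spectrum.values():
        if score > best_per_peptide.get(peptide, float("-inf")):
            best_per_peptide[peptide] = score

    # Stage 2: peptide-level competition within each target/decoy pair.
    winners = {}
    for peptide, score in best_per_peptide.items():
        is_target = peptide not in decoy_to_target
        pair_key = peptide if is_target else decoy_to_target[peptide]
        current = winners.get(pair_key)
        if current is None or score > current[1]:
            winners[pair_key] = (peptide, score, is_target)

    return sorted(winners.values(), key=lambda w: w[1], reverse=True)
```

The returned list has the shape needed for a standard decoy-counting FDR estimate: walk down the ranking and compare the number of decoy winners to the number of target winners.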


Subject(s)
Algorithms; Tandem Mass Spectrometry; Databases, Protein; Peptides/analysis; Proteomics/methods; Tandem Mass Spectrometry/methods
8.
J Proteome Res ; 21(6): 1382-1391, 2022 Jun 03.
Article in English | MEDLINE | ID: mdl-35549345

ABSTRACT

Advances in library-based methods for peptide detection from data-independent acquisition (DIA) mass spectrometry have made it possible to detect and quantify tens of thousands of peptides in a single mass spectrometry run. However, many of these methods rely on a comprehensive, high-quality spectral library containing information about the expected retention time and fragmentation patterns of peptides in the sample. Empirical spectral libraries are often generated through data-dependent acquisition and may suffer from biases as a result. Spectral libraries can be generated in silico, but these models are not trained to handle all possible post-translational modifications. Here, we propose a false discovery rate-controlled spectrum-centric search workflow to generate spectral libraries directly from gas-phase fractionated DIA tandem mass spectrometry data. We demonstrate that this strategy is able to detect phosphorylated peptides and can be used to generate a spectral library for accurate peptide detection and quantitation in wide-window DIA data. We compare the results of this search workflow to other library-free approaches and demonstrate that our search is competitive in terms of accuracy and sensitivity. These results demonstrate that the proposed workflow has the capacity to generate spectral libraries while avoiding the limitations of other methods.


Subject(s)
Peptides; Tandem Mass Spectrometry; Peptide Library; Peptides/analysis; Protein Processing, Post-Translational; Proteome/analysis; Tandem Mass Spectrometry/methods; Workflow
9.
J Proteome Res ; 20(8): 4153-4164, 2021 Aug 06.
Article in English | MEDLINE | ID: mdl-34236864

ABSTRACT

The standard proteomics database search strategy involves searching spectra against a peptide database and estimating the false discovery rate (FDR) of the resulting set of peptide-spectrum matches. One assumption of this protocol is that all the peptides in the database are relevant to the hypothesis being investigated. However, in settings where researchers are interested in a subset of peptides, alternative search and FDR control strategies are needed. Recently, two methods were proposed to address this problem: subset-search and all-sub. We show that both methods fail to control the FDR. For subset-search, this failure is due to the presence of "neighbor" peptides, which are defined as irrelevant peptides whose precursor mass and fragmentation spectrum are similar to those of a relevant peptide. Not considering neighbors compromises the FDR estimate because a spectrum generated by an irrelevant peptide can incorrectly match well to a relevant peptide. Therefore, we have developed a new method, "subset-neighbor search" (SNS), that accounts for neighbor peptides. We show evidence that SNS controls the FDR when neighbors are present and that SNS outperforms group-FDR, the only other method that appears to control the FDR relative to a subset of relevant peptides.


Subject(s)
Algorithms; Tandem Mass Spectrometry; Databases, Protein; Humans; Peptides; Proteomics
10.
J Proteome Res ; 18(2): 585-593, 2019 Feb 01.
Article in English | MEDLINE | ID: mdl-30560673

ABSTRACT

Decoy database search with target-decoy competition (TDC) provides an intuitive, easy-to-implement method for estimating the false discovery rate (FDR) associated with spectrum identifications from shotgun proteomics data. However, the procedure can yield different results for a fixed data set analyzed with different decoy databases, and this decoy-induced variability is particularly problematic for smaller FDR thresholds, data sets, or databases. The average TDC (aTDC) protocol combats this problem by exploiting multiple independently shuffled decoy databases to provide an FDR estimate with reduced variability. We provide a tutorial introduction to aTDC, describe an improved variant of the protocol that offers increased statistical power, and discuss how to deploy aTDC in practice using the Crux software toolkit.
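The idea of reducing decoy-induced variability by using several decoy databases can be illustrated with a deliberately simplified sketch. It is not the published aTDC protocol and does not use Crux; it only averages ordinary TDC estimates obtained with different decoy score sets at one fixed threshold. The function names and the input layout (one target score and one decoy score per spectrum and decoy set) are assumptions.

```python
def tdc_fdr_at(threshold, target_scores, decoy_scores):
    """Plain TDC estimate of the FDR among target wins scoring >= threshold."""
    targets = decoys = 0
    for target, decoy in zip(target_scores, decoy_scores):
        if target > decoy:
            if target >= threshold:
                targets += 1
        elif decoy >= threshold:
            decoys += 1
    return min(1.0, (decoys + 1) / max(targets, 1))


def averaged_fdr_at(threshold, target_scores, decoy_score_sets):
    """Average the TDC estimate over several independently generated decoy sets.

    Simplification for illustration only; the actual aTDC procedure differs.
    """
    estimates = [tdc_fdr_at(threshold, target_scores, decoys)
                 for decoys in decoy_score_sets]
    return sum(estimates) / len(estimates)
```

Even in this toy form, the spread between the per-decoy-set estimates shows why a single decoy database can give an unstable answer on small data sets.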


Subject(s)
Databases, Protein/standards; Proteomics/methods; Software; Datasets as Topic; Humans; Models, Statistical; Reproducibility of Results
11.
Genome Res ; 23(4): 698-704, 2013 Apr.
Article in English | MEDLINE | ID: mdl-23241746

ABSTRACT

DNA replication origins are necessary for the duplication of genomes. In addition, plasmid-based expression systems require DNA replication origins to maintain plasmids efficiently. The yeast autonomously replicating sequence (ARS) assay has been a valuable tool in dissecting replication origin structure and function. However, the dearth of information on origins in diverse yeasts limits the availability of efficient replication origin modules to only a handful of species and restricts our understanding of origin function and evolution. To enable rapid study of origins, we have developed a sequencing-based suite of methods for comprehensively mapping and characterizing ARSs within a yeast genome. Our approach finely maps genomic inserts capable of supporting plasmid replication and uses massively parallel deep mutational scanning to define molecular determinants of ARS function with single-nucleotide resolution. In addition to providing unprecedented detail into origin structure, our data have allowed us to design short, synthetic DNA sequences that retain maximal ARS function. These methods can be readily applied to understand and modulate ARS function in diverse systems.


Subject(s)
Chromosome Mapping; DNA Replication; Yeasts/genetics; Chromosomes, Fungal; Computational Biology/methods; Gene Library; Genes, Fungal; Genomics; High-Throughput Nucleotide Sequencing; Mutation; Replication Origin; Saccharomyces cerevisiae/genetics; Saccharomyces cerevisiae/metabolism
12.
J Proteome Res ; 14(2): 1147-60, 2015 Feb 06.
Article in English | MEDLINE | ID: mdl-25482958

ABSTRACT

Identifying the peptide responsible for generating an observed fragmentation spectrum requires scoring a collection of candidate peptides and then identifying the peptide that achieves the highest score. However, analysis of a large collection of such spectra requires that the score assigned to one spectrum be well-calibrated with respect to the scores assigned to other spectra. In this work, we define the notion of calibration in the context of shotgun proteomics spectrum identification, and we introduce a simple, albeit computationally intensive, technique to calibrate an arbitrary score function. We demonstrate that this calibration procedure yields an increased number of identified spectra at a fixed false discovery rate (FDR) threshold. We also show that proper calibration of scores has a surprising effect on a previously described FDR estimation procedure, making the procedure less conservative. Finally, we provide empirical results suggesting that even partial calibration, which is much less computationally demanding, can yield significant increases in spectrum identification. Overall, we argue that accurate shotgun proteomics analysis requires careful attention to score calibration.
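The abstract above does not spell out the calibration technique. One generic way to put scores from different spectra on a common scale, shown here purely as an assumed illustration, is to replace each raw score with an empirical p-value computed against a null sample of scores for the same spectrum (for example, its scores against many decoy peptides). The function name and inputs are hypothetical.

```python
import math


def calibrated_score(observed, null_scores):
    """Convert a raw score to -log10 of a spectrum-specific empirical p-value.

    null_scores: scores of the same spectrum against null (e.g. decoy)
    peptides. The +1 terms keep the p-value strictly positive.
    """
    n_as_good = sum(1 for s in null_scores if s >= observed)
    p_value = (n_as_good + 1) / (len(null_scores) + 1)
    return -math.log10(p_value)
```

With this transformation the same raw score can map to very different calibrated values depending on how the spectrum scores against null peptides in general; the cost is that a large null sample must be scored for every spectrum.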


Subject(s)
Proteomics; Calibration; Saccharomyces cerevisiae Proteins/chemistry
13.
J Proteome Res ; 14(8): 3148-61, 2015 Aug 07.
Article in English | MEDLINE | ID: mdl-26152888

ABSTRACT

Interpreting the potentially vast number of hypotheses generated by a shotgun proteomics experiment requires a valid and accurate procedure for assigning statistical confidence estimates to identified tandem mass spectra. Despite the crucial role such procedures play in most high-throughput proteomics experiments, the scientific literature has not reached a consensus about the best confidence estimation methodology. In this work, we evaluate, using theoretical and empirical analysis, four previously proposed protocols for estimating the false discovery rate (FDR) associated with a set of identified tandem mass spectra: two variants of the target-decoy competition protocol (TDC) of Elias and Gygi and two variants of the separate target-decoy search protocol of Käll et al. Our analysis reveals significant biases in the two separate target-decoy search protocols. Moreover, the one TDC protocol that provides an unbiased FDR estimate among the target PSMs does so at the cost of forfeiting a random subset of high-scoring spectrum identifications. We therefore propose the mix-max procedure to provide unbiased, accurate FDR estimates in the presence of well-calibrated scores. The method avoids biases associated with the two separate target-decoy search protocols and also avoids the propensity for target-decoy competition to discard a random subset of high-scoring target identifications.


Subject(s)
Algorithms; Computational Biology; Peptides/metabolism; Proteomics/methods; Tandem Mass Spectrometry/methods; Amino Acid Sequence; Animals; Caenorhabditis elegans Proteins/metabolism; Computational Biology/methods; Databases, Protein; False Positive Reactions; Plasmodium falciparum/metabolism; Proteomics/standards; Protozoan Proteins/metabolism; Reproducibility of Results; Saccharomyces cerevisiae Proteins/metabolism; Tandem Mass Spectrometry/standards
14.
J Proteome Res ; 14(8): 3027-38, 2015 Aug 07.
Article in English | MEDLINE | ID: mdl-26084232

ABSTRACT

Accurate assignment of peptide sequences to observed fragmentation spectra is hindered by the large number of hypotheses that must be considered for each observed spectrum. A high score assigned to a particular peptide-spectrum match (PSM) may not end up being statistically significant after multiple testing correction. Researchers can mitigate this problem by controlling the hypothesis space in various ways: considering only peptides resulting from enzymatic cleavages, ignoring possible post-translational modifications or single nucleotide variants, etc. However, these strategies sacrifice identifications of spectra generated by rarer types of peptides. In this work, we introduce a statistical testing framework, cascade search, that directly addresses this problem. The method requires that the user specify a priori a statistical confidence threshold as well as a series of peptide databases. For instance, such a cascade of databases could include fully tryptic, semitryptic, and nonenzymatic peptides or peptides with increasing numbers of modifications. Cascade search then gradually expands the list of candidate peptides from more likely peptides toward rare peptides, sequestering at each stage any spectrum that is identified with a specified statistical confidence. We compare cascade search to a standard procedure that lumps all of the peptides into a single database, as well as to a previously described group FDR procedure that computes the FDR separately within each database. We demonstrate, using simulated and real data, that cascade search identifies more spectra at a fixed FDR threshold than either the ungrouped or grouped approach. Cascade search thus provides a general method for maximizing the number of identified spectra in a statistically rigorous fashion.
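The staged control flow described in the abstract above can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: the per-stage search results are supplied precomputed as dictionaries rather than produced by a real search engine, the per-stage acceptance rule is a plain decoy-counting FDR estimate, and the names `accept_at_fdr` and `cascade_search` are hypothetical.

```python
def accept_at_fdr(winners, alpha):
    """Return ids of target spectra accepted at estimated FDR <= alpha.

    winners: dict spectrum_id -> (score, is_target), the outcome of a
    target-decoy competition for each spectrum in one database stage.
    """
    ranked = sorted(winners.items(), key=lambda item: item[1][0], reverse=True)
    targets = decoys = 0
    n_keep = 0
    for rank, (_, (_, is_target)) in enumerate(ranked, start=1):
        if is_target:
            targets += 1
        else:
            decoys += 1
        if targets and (decoys + 1) / targets <= alpha:
            n_keep = rank
    return {sid for sid, (_, is_target) in ranked[:n_keep] if is_target}


def cascade_search(stages, alpha):
    """Process database stages in order, sequestering accepted spectra.

    stages: list of per-stage winner dicts, ordered from the most likely
    peptides to the rarest. Returns dict spectrum_id -> accepting stage index.
    """
    identified = {}
    for stage_index, winners in enumerate(stages):
        # Spectra accepted at an earlier stage are not reconsidered.
        remaining = {sid: w for sid, w in winners.items() if sid not in identified}
        for sid in accept_at_fdr(remaining, alpha):
            identified[sid] = stage_index
    return identified
```

In a real pipeline each stage would re-search only the still-unidentified spectra against the next database; passing precomputed results keeps the sketch self-contained.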


Subject(s)
Algorithms; Peptides/analysis; Proteomics/methods; Tandem Mass Spectrometry/methods; Cell Line; Computer Simulation; Databases, Protein; Humans; Peptides/metabolism; Protein Isoforms/analysis; Protein Isoforms/metabolism; Protein Processing, Post-Translational; Reproducibility of Results; Saccharomyces cerevisiae Proteins/analysis; Saccharomyces cerevisiae Proteins/metabolism
16.
Bioinformatics ; 30(14): 1965-73, 2014 Jul 15.
Article in English | MEDLINE | ID: mdl-24665130

ABSTRACT

MOTIVATION: With over 9000 unique users recorded in the first half of 2013, MEME is one of the most popular motif-finding tools available. Reliable estimates of the statistical significance of motifs can greatly increase the usefulness of any motif finder. By analogy, it is difficult to imagine evaluating a BLAST result without its accompanying E-value. Currently MEME evaluates its EM-generated candidate motifs using an extension of BLAST's E-value to the motif-finding context. Although we previously indicated the drawbacks of MEME's current significance evaluation, we did not offer a practical substitute suited for its needs, especially because MEME also relies on the E-value internally to rank competing candidate motifs. RESULTS: Here we offer a two-tiered significance analysis that can replace the E-value in selecting the best candidate motif and in evaluating its overall statistical significance. We show that our new approach could substantially improve MEME's motif-finding performance and would also provide the user with a reliable significance analysis. In addition, for large input sets, our new approach is in fact faster than the currently implemented E-value analysis.


Subject(s)
Nucleotide Motifs; Software; Algorithms; Data Interpretation, Statistical; Sequence Analysis, DNA
17.
bioRxiv ; 2024 Jun 04.
Article in English | MEDLINE | ID: mdl-38895431

ABSTRACT

A pressing statistical challenge in the field of mass spectrometry proteomics is how to assess whether a given software tool provides accurate error control. Each software tool for searching such data uses its own internally implemented methodology for reporting and controlling the error. Many of these software tools are closed source, with incompletely documented methodology, and the strategies for validating the error are inconsistent across tools. In this work, we identify three different methods for validating false discovery rate (FDR) control in use in the field, one of which is invalid, one of which can only provide a lower bound rather than an upper bound, and one of which is valid but under-powered. The result is that the field has a very poor understanding of how well we are doing with respect to FDR control, particularly for the analysis of data-independent acquisition (DIA) data. We therefore propose a new, more powerful method for evaluating FDR control in this setting, and we then employ that method, along with an existing lower bounding technique, to characterize a variety of popular search tools. We find that the search tools for analysis of data-dependent acquisition (DDA) data generally seem to control the FDR at the peptide level, whereas none of the DIA search tools consistently controls the FDR at the peptide level across all the datasets we investigated. Furthermore, this problem becomes much worse when the latter tools are evaluated at the protein level. These results may have significant implications for various downstream analyses, since proper FDR control has the potential to reduce noise in discovery lists and thereby boost statistical power.

18.
PLoS Genet ; 6(5): e1000946, 2010 May 13.
Article in English | MEDLINE | ID: mdl-20485513

ABSTRACT

Eukaryotic chromosomes initiate DNA synthesis from multiple replication origins. The machinery that initiates DNA synthesis is highly conserved, but the sites where the replication initiation proteins bind have diverged significantly. Functional comparative genomics is an obvious approach to study the evolution of replication origins. However, to date, the Saccharomyces cerevisiae replication origin map is the only genome map available. Using an iterative approach that combines computational prediction and functional validation, we have generated a high-resolution genome-wide map of DNA replication origins in Kluyveromyces lactis. Unlike other yeasts or metazoans, K. lactis autonomously replicating sequences (KlARSs) contain a 50 bp consensus motif suggestive of a dimeric structure. This motif is necessary and largely sufficient for initiation and was used to dependably identify 145 of the up to 156 non-repetitive intergenic ARSs projected for the K. lactis genome. Though similar in genome sizes, K. lactis has half as many ARSs as its distant relative S. cerevisiae. Comparative genomic analysis shows that ARSs in K. lactis and S. cerevisiae preferentially localize to non-syntenic intergenic regions, linking ARSs with loci of accelerated evolutionary change.


Subject(s)
Genome, Fungal; Base Sequence; DNA, Fungal; Kluyveromyces/genetics; Molecular Sequence Data; Replication Origin; Saccharomyces cerevisiae/genetics
19.
Methods Mol Biol ; 2426: 25-34, 2023.
Article in English | MEDLINE | ID: mdl-36308683

ABSTRACT

Target-decoy competition has been commonly used for over a decade to control the false discovery rate when analyzing tandem mass spectrometry (MS/MS) data. We recently developed a framework that uses multiple decoys to increase the number of detected peptides in MS/MS data. Here, we present a pipeline of Apache licensed, open-source software that allows the user to readily take advantage of our framework.


Subject(s)
Proteomics; Tandem Mass Spectrometry; Tandem Mass Spectrometry/methods; Proteomics/methods; Peptides/chemistry; Software; Databases, Protein; Algorithms
20.
Bioinformatics ; 27(12): 1603-9, 2011 Jun 15.
Article in English | MEDLINE | ID: mdl-21543443

ABSTRACT

MOTIVATION: A question that often comes up after applying a motif finder to a set of co-regulated DNA sequences is whether the reported putative motif is similar to any known motif. While several tools have been designed for this task, Habib et al. pointed out that the scores that are commonly used for measuring similarity between motifs do not distinguish between a good alignment of two informative columns (say, all-A) and one of two uninformative columns. This observation explains why tools such as Tomtom occasionally return an alignment of uninformative columns which is clearly spurious. To address this problem, Habib et al. suggested a new score [Bayesian Likelihood 2-Component (BLiC)] which uses a Bayesian information criterion to penalize matches that are also similar to the background distribution. RESULTS: We show that the BLiC score exhibits other, highly undesirable properties, and we offer instead a general approach to adjust any motif similarity score so as to reduce the number of reported spurious alignments of uninformative columns. We implement our method in Tomtom and show that, without significantly compromising Tomtom's retrieval accuracy or its runtime, we can drastically reduce the number of uninformative alignments. AVAILABILITY AND IMPLEMENTATION: The modified Tomtom is available as part of the MEME Suite at http://meme.nbcr.net.


Subject(s)
Sequence Analysis, DNA; Algorithms; Bayes Theorem; Sequence Alignment/methods; Software