Repeat-Preserving Decoy Database for False Discovery Rate Estimation in Peptide Identification.

Moosa, Johra Muhammad; Guan, Shenheng; Moran, Michael F; Ma, Bin

Affiliation

Moosa JM; David R. Cheriton School of Computer Science, University of Waterloo, Waterloo N2L 3G1, Canada.
Guan S; David R. Cheriton School of Computer Science, University of Waterloo, Waterloo N2L 3G1, Canada.
Moran MF; Program in Cell Biology and SPARC BioCentre, Hospital for Sick Children, 686 Bay St, Toronto, Ontario M5G 0A4, Canada.
Ma B; Program in Cell Biology and SPARC BioCentre, Hospital for Sick Children, 686 Bay St, Toronto, Ontario M5G 0A4, Canada.

J Proteome Res; 19(3): 1029-1036, 2020 03 06.

Article in English | MEDLINE | ID: mdl-32009416

ABSTRACT

Sequence database searching is widely used in proteomics for peptide identification. To control the false discovery rate (FDR) of the search results, the target-decoy method generates a decoy database and searches it together with the target database. A known problem is that the target protein sequence database may contain numerous repeated peptides, and most existing decoy generation algorithms do not preserve the structures of these repeats. Previous studies suggest that such a discrepancy between the target and decoy databases may lead to inaccurate FDR estimation. Based on the de Bruijn graph model, we propose a new repeat-preserving algorithm to generate decoy databases. We prove that this algorithm preserves the structures of the repeats in the target database to a great extent. The de Bruijn method was compared with several other commonly used methods and demonstrated superior FDR estimation accuracy and a larger number of peptide identifications.
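As a rough illustration of the target-decoy idea summarized above, the following Python sketch estimates FDR from a combined target and decoy search. It assumes one common convention (estimated FDR = decoy hits / target hits at or above a score threshold); the abstract does not state which estimator the authors use, and the function names `estimate_fdr` and `score_threshold_at_fdr` and the example scores are hypothetical. It does not implement the de Bruijn decoy generation itself.

```python
from typing import List, Optional, Tuple

# Each peptide-spectrum match (PSM) is (score, is_decoy).
# Hypothetical representation chosen for this sketch only.
PSM = Tuple[float, bool]


def estimate_fdr(psms: List[PSM], threshold: float) -> float:
    """Estimate FDR at a score threshold as decoy hits / target hits.

    Assumption: a concatenated target+decoy search in which every decoy
    hit stands in for one expected false target hit.
    """
    targets = sum(1 for score, is_decoy in psms
                  if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in psms
                 if score >= threshold and is_decoy)
    if targets == 0:
        return 0.0  # no accepted target hits, so nothing to be false
    return decoys / targets


def score_threshold_at_fdr(psms: List[PSM], max_fdr: float) -> Optional[float]:
    """Return the lowest observed score whose estimated FDR is <= max_fdr.

    Returns None if no observed score satisfies the bound.
    """
    best = None
    for t in sorted({score for score, _ in psms}, reverse=True):
        if estimate_fdr(psms, t) <= max_fdr:
            best = t
    return best


# Made-up scores for illustration only.
example_psms: List[PSM] = [
    (9.1, False), (8.7, False), (8.2, False), (7.9, True),
    (7.5, False), (6.8, False), (6.1, True), (5.4, False),
]
```

With the made-up scores above, `estimate_fdr(example_psms, 7.0)` counts one decoy against four targets. If the decoy database fails to mirror the repeat structure of the target database, the decoy count in this ratio may no longer reflect the number of false target hits, which is the discrepancy the abstract refers to.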

Subject(s)

Peptides; Tandem Mass Spectrometry; Algorithms; Databases, Protein; Proteomics

Keywords

de Bruijn graph; false discovery rate; mass spectrometry; peptide identification; proteomics; target-decoy validation

Full text: 1 | Collection: 01-international | Database: MEDLINE | Main subject: Peptides / Tandem Mass Spectrometry | Type of study: Diagnostic_studies | Language: En | Journal: J Proteome Res | Journal subject: BIOCHEMISTRY | Year: 2020 | Document type: Article | Country of affiliation: Canada