1.
Bioinformatics ; 32(12): i322-i331, 2016 06 15.
Article in English | MEDLINE | ID: mdl-27307634

ABSTRACT

Tandem mass spectrometry (MS/MS) is the dominant high-throughput technology for identifying and quantifying proteins in complex biological samples. Analysis of the tens of thousands of fragmentation spectra produced by an MS/MS experiment begins by assigning to each observed spectrum the peptide that is hypothesized to be responsible for generating the spectrum. This assignment is typically done by searching each spectrum against a database of peptides. To our knowledge, all existing MS/MS search engines compute scores individually between a given observed spectrum and each possible candidate peptide from the database. In this work, we use a trellis, a data structure capable of jointly representing a large set of candidate peptides, to avoid redundantly recomputing common sub-computations among different candidates. We show how trellises may be used to significantly speed up existing scoring algorithms, and we theoretically quantify the expected speedup afforded by trellises. Furthermore, we demonstrate that compact trellis representations of whole sets of peptides enable efficient discriminative learning of a dynamic Bayesian network for spectrum identification, leading to greatly improved spectrum identification accuracy. CONTACT: bilmes@uw.edu or william-noble@uw.edu SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Tandem Mass Spectrometry , Algorithms , Bayes Theorem , Protein Databases , Peptides , Proteins , Proteomics
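
The redundancy described in this abstract can be illustrated with a prefix tree. This is a minimal sketch under stated assumptions, not the authors' trellis or scoring code: a prefix tree is used as a simpler stand-in for the trellis, and the names `TrieNode`, `build_prefix_trie`, and `naive_prefix_count` are hypothetical. It only counts how many prefix-mass computations are needed when candidate peptides share prefixes.

```python
from typing import Dict, List, Tuple

# Monoisotopic residue masses in daltons for a handful of amino acids
# (a small subset, enough for the toy candidates below).
RESIDUE_MASS: Dict[str, float] = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "L": 113.08406, "K": 128.09496, "E": 129.04259,
}


class TrieNode:
    """One shared peptide prefix and its cumulative residue mass."""

    def __init__(self, prefix_mass: float) -> None:
        self.prefix_mass = prefix_mass
        self.children: Dict[str, "TrieNode"] = {}


def build_prefix_trie(peptides: List[str]) -> Tuple[TrieNode, int]:
    """Insert all peptides; return the root and the number of
    prefix masses that actually had to be computed."""
    root = TrieNode(0.0)
    computed = 0
    for peptide in peptides:
        node = root
        for residue in peptide:
            child = node.children.get(residue)
            if child is None:
                # A new prefix: compute its mass once and store it.
                child = TrieNode(node.prefix_mass + RESIDUE_MASS[residue])
                node.children[residue] = child
                computed += 1
            node = child
    return root, computed


def naive_prefix_count(peptides: List[str]) -> int:
    """Prefix masses computed when every candidate is handled separately."""
    return sum(len(peptide) for peptide in peptides)


candidates = ["PEAK", "PEAL", "PEGS"]
root, shared = build_prefix_trie(candidates)
naive = naive_prefix_count(candidates)
print(f"shared: {shared} prefix masses, naive: {naive}")
```

For these three toy candidates the shared representation computes 7 prefix masses instead of 12; the saving grows as more candidates share longer prefixes.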
2.
J Proteome Res ; 15(8): 2749-59, 2016 08 05.
Article in English | MEDLINE | ID: mdl-27397138

ABSTRACT

A central problem in mass spectrometry analysis involves identifying, for each observed tandem mass spectrum, the corresponding generating peptide. We present a dynamic Bayesian network (DBN) toolkit that addresses this problem by using a machine learning approach. At the heart of this toolkit is a DBN for Rapid Identification (DRIP), which can be trained from collections of high-confidence peptide-spectrum matches (PSMs). DRIP's score function considers fragment ion matches using Gaussians rather than fixed fragment-ion tolerances and also finds the optimal alignment between the theoretical and observed spectrum by considering all possible alignments, up to a threshold that is controlled using a beam-pruning algorithm. This function not only yields state-of-the-art database search accuracy but also can be used to generate features that significantly boost the performance of the Percolator postprocessor. The DRIP software is built upon a general-purpose DBN toolkit (GMTK), thereby allowing a wide variety of options for user-specific inference tasks as well as facilitating easy modifications to the DRIP model in future work. DRIP is implemented in Python and C++ and is available under the Apache license at http://melodi-lab.github.io/dripToolkit .


Subjects
Machine Learning , Peptides/analysis , Proteomics/methods , Bayes Theorem , Protein Databases , Software , Tandem Mass Spectrometry
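
The contrast this abstract draws between Gaussian fragment-ion matching and a fixed tolerance can be sketched as follows. This is an illustrative sketch only, not the DRIP score function: the function names, the tolerance, and the standard deviation are assumptions, and DRIP's learned parameters and alignment search are not reproduced.

```python
import math


def tolerance_match(observed_mz: float, theoretical_mz: float,
                    tolerance: float) -> bool:
    """Fixed-tolerance matching: a peak either matches or it does not."""
    return abs(observed_mz - theoretical_mz) <= tolerance


def gaussian_log_score(observed_mz: float, theoretical_mz: float,
                       sigma: float) -> float:
    """Log-density of a Gaussian centered on the theoretical m/z.
    The score decreases smoothly as the observed peak moves away."""
    z = (observed_mz - theoretical_mz) / sigma
    return -0.5 * z * z - math.log(sigma * math.sqrt(2.0 * math.pi))


theoretical = 500.0
for observed in (500.05, 500.30, 500.60):
    print(observed,
          tolerance_match(observed, theoretical, tolerance=0.5),
          round(gaussian_log_score(observed, theoretical, sigma=0.1), 3))
```

With a fixed window, the peaks at 500.05 and 500.30 are treated identically, while the Gaussian score ranks the closer peak higher.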
3.
Nat Methods ; 9(5): 473-6, 2012 Mar 18.
Article in English | MEDLINE | ID: mdl-22426492

ABSTRACT

We trained Segway, a dynamic Bayesian network method, simultaneously on chromatin data from multiple experiments, including positions of histone modifications, transcription-factor binding and open chromatin, all derived from a human chronic myeloid leukemia cell line. In an unsupervised fashion, we identified patterns associated with transcription start sites, gene ends, enhancers, transcriptional regulator CTCF-binding regions and repressed regions. Software and genome browser tracks are at http://noble.gs.washington.edu/proj/segway/.


Subjects
Chromatin/physiology , Human Genome , Histones/physiology , Transcription Initiation Site , Bayes Theorem , Chromatin/genetics , Histones/genetics , Humans , K562 Cells , Molecular Sequence Data , Genetic Promoter Regions , Nucleic Acid Regulatory Sequences , Transcription Factors/genetics , Transcription Factors/physiology
4.
bioRxiv ; 2023 Jul 28.
Article in English | MEDLINE | ID: mdl-37546906

ABSTRACT

The identification of cell-type-specific 3D chromatin interactions between regulatory elements can help to decipher gene regulation and to interpret the function of disease-associated non-coding variants. However, current chromosome conformation capture (3C) technologies are unable to resolve interactions at this resolution when only small numbers of cells are available as input. We therefore present ChromaFold, a deep learning model that predicts 3D contact maps and regulatory interactions from single-cell ATAC sequencing (scATAC-seq) data alone. ChromaFold uses pseudobulk chromatin accessibility, co-accessibility profiles across metacells, and predicted CTCF motif tracks as input features and employs a lightweight architecture to enable training on standard GPUs. Once trained on paired scATAC-seq and Hi-C data in human cell lines and tissues, ChromaFold can accurately predict both the 3D contact map and peak-level interactions across diverse human and mouse test cell types. In benchmarking against a recent deep learning method that uses bulk ATAC-seq, DNA sequence, and CTCF ChIP-seq to make cell-type-specific predictions, ChromaFold yields superior prediction performance when including CTCF ChIP-seq data as an input and comparable performance without. Finally, fine-tuning ChromaFold on paired scATAC-seq and Hi-C in a complex tissue enables deconvolution of chromatin interactions across cell subpopulations. ChromaFold thus achieves state-of-the-art prediction of 3D contact maps and regulatory interactions using scATAC-seq alone as input data, enabling accurate inference of cell-type-specific interactions in settings where 3C-based assays are infeasible.

5.
Bioinformatics ; 26(12): i334-42, 2010 Jun 15.
Article in English | MEDLINE | ID: mdl-20529925

ABSTRACT

MOTIVATION: A global map of transcription factor binding sites (TFBSs) is critical to understanding gene regulation and genome function. DNaseI digestion of chromatin coupled with massively parallel sequencing (digital genomic footprinting) enables the identification of protein-binding footprints with high resolution on a genome-wide scale. However, accurately inferring the locations of these footprints remains a challenging computational problem. RESULTS: We present a dynamic Bayesian network-based approach for the identification and assignment of statistical confidence estimates to protein-binding footprints from digital genomic footprinting data. The method, DBFP, allows footprints to be identified in a probabilistic framework and outperforms our previously described algorithm in terms of precision at a fixed recall. Applied to a digital footprinting data set from Saccharomyces cerevisiae, DBFP identifies 4679 statistically significant footprints within intergenic regions. These footprints are mainly located near transcription start sites and are strongly enriched for known TFBSs. Footprints containing no known motif are preferentially located proximal to other footprints, consistent with cooperative binding of these footprints. DBFP also identifies a set of statistically significant footprints in the yeast coding regions. Many of these footprints coincide with the boundaries of antisense transcripts, and the most significant footprints are enriched for binding sites of the chromatin-associated factors Abf1 and Rap1. SUPPLEMENTARY INFORMATION: Supplementary material is available at Bioinformatics online.


Subjects
Protein Footprinting/methods , Algorithms , Bayes Theorem , Binding Sites , Genome , Molecular Sequence Data , Protein Interaction Mapping/methods , Saccharomyces cerevisiae/genetics , Transcription Factors/chemistry , Transcription Initiation Site
6.
PLoS Comput Biol ; 6(7): e1000834, 2010 Jul 08.
Article in English | MEDLINE | ID: mdl-20628623

ABSTRACT

DNA in eukaryotes is packaged into a chromatin complex, the most basic element of which is the nucleosome. The precise positioning of the nucleosome cores allows for selective access to the DNA, and the mechanisms that control this positioning are important pieces of the gene expression puzzle. We describe a large-scale nucleosome pattern that jointly characterizes the nucleosome core and the adjacent linkers and is predominantly characterized by long-range oscillations in the mono-, di- and tri-nucleotide content of the DNA sequence, and we show that this pattern can be used to predict nucleosome positions in both Homo sapiens and Saccharomyces cerevisiae more accurately than previously published methods. Surprisingly, in both H. sapiens and S. cerevisiae, the most informative individual features are the mono-nucleotide patterns, although the inclusion of di- and tri-nucleotide features results in improved performance. Our approach combines a much longer pattern than has been previously used to predict nucleosome positioning from sequence (301 base pairs, centered at the position to be scored) with a novel discriminative classification approach that selectively weights the contributions from each of the input features. The resulting scores are relatively insensitive to local AT-content and can be used to accurately discriminate putative dyad positions from adjacent linker regions without requiring an additional dynamic programming step and without the attendant edge effects and assumptions about linker length modeling and overall nucleosome density. Our approach produces the best dyad-linker classification results published to date in H. sapiens, and outperforms two recently published models on a large set of S. cerevisiae nucleosome positions. Our results suggest that in both genomes, a comparable and relatively small fraction of nucleosomes are well-positioned and that these positions are predictable based on sequence alone. We believe that the bulk of the remaining nucleosomes follow a statistical positioning model.


Subjects
DNA/chemistry , Nucleic Acid Conformation , Nucleosomes/genetics , DNA Sequence Analysis , Alu Elements/genetics , Base Composition/genetics , Base Sequence/genetics , CCCTC-Binding Factor , Fungal DNA/chemistry , Humans , ROC Curve , Repressor Proteins/genetics , Reproducibility of Results , Saccharomyces cerevisiae/genetics , Sequence Alignment
7.
Bioinformatics ; 24(13): i348-56, 2008 Jul 01.
Article in English | MEDLINE | ID: mdl-18586734

ABSTRACT

MOTIVATION: Tandem mass spectrometry (MS/MS) is an indispensable technology for identification of proteins from complex mixtures. Proteins are digested to peptides that are then identified by their fragmentation patterns in the mass spectrometer. Thus, at its core, MS/MS protein identification relies on the relative predictability of peptide fragmentation. Unfortunately, peptide fragmentation is complex and not fully understood, and what is understood is not always exploited by peptide identification algorithms. RESULTS: We use a hybrid dynamic Bayesian network (DBN)/support vector machine (SVM) approach to address these two problems. We train a set of DBNs on high-confidence peptide-spectrum matches. These DBNs, known collectively as Riptide, comprise a probabilistic model of peptide fragmentation chemistry. Examination of the distributions learned by Riptide allows identification of new trends, such as prevalent a-ion fragmentation at peptide cleavage sites C-term to hydrophobic residues. In addition, Riptide can be used to produce likelihood scores that indicate whether a given peptide-spectrum match is correct. A vector of such scores is evaluated by an SVM, which produces a final score to be used in peptide identification. Using Riptide in this way yields improved discrimination when compared to other state-of-the-art MS/MS identification algorithms, increasing the number of positive identifications by as much as 12% at a 1% false discovery rate. AVAILABILITY: Python and C source code are available upon request from the authors. The curated training sets are available at http://noble.gs.washington.edu/proj/intense/. The Graphical Model Tool Kit (GMTK) is freely available at http://ssli.ee.washington.edu/bilmes/gmtk.


Subjects
Algorithms , Artificial Intelligence , Mass Spectrometry/methods , Automated Pattern Recognition/methods , Peptide Mapping/methods , Protein Sequence Analysis/methods , Amino Acid Sequence , Bayes Theorem , Molecular Sequence Data
8.
PLoS Comput Biol ; 4(11): e1000213, 2008 Nov.
Article in English | MEDLINE | ID: mdl-18989393

ABSTRACT

Hidden Markov models (HMMs) have been successfully applied to the tasks of transmembrane protein topology prediction and signal peptide prediction. In this paper we expand upon this work by making use of the more powerful class of dynamic Bayesian networks (DBNs). Our model, Philius, is inspired by a previously published HMM, Phobius, and combines a signal peptide submodel with a transmembrane submodel. We introduce a two-stage DBN decoder that combines the power of posterior decoding with the grammar constraints of Viterbi-style decoding. Philius also provides protein type, segment, and topology confidence metrics to aid in the interpretation of the predictions. We report a relative improvement of 13% over Phobius in full-topology prediction accuracy on transmembrane proteins, and a sensitivity and specificity of 0.96 in detecting signal peptides. We also show that our confidence metrics correlate well with the observed precision. In addition, we have made predictions on all 6.3 million proteins in the Yeast Resource Center (YRC) database. This large-scale study provides an overall picture of the relative numbers of proteins that include a signal-peptide and/or one or more transmembrane segments as well as a valuable resource for the scientific community. All DBNs are implemented using the Graphical Models Toolkit. Source code for the models described here is available at http://noble.gs.washington.edu/proj/philius. A Philius Web server is available at http://www.yeastrc.org/philius, and the predictions on the YRC database are available at http://www.yeastrc.org/pdr.


Subjects
Bayes Theorem , Computational Biology/methods , Membrane Proteins/ultrastructure , Molecular Models , Protein Sorting Signals/physiology , Artificial Intelligence , Fungal Proteins/ultrastructure , Markov Chains , Neural Networks (Computer) , Protein Conformation , Reproducibility of Results , Yeasts/ultrastructure
9.
Adv Neural Inf Process Syst ; 2018: 7989-7999, 2018 Dec.
Article in English | MEDLINE | ID: mdl-30705579

ABSTRACT

We study the problem of maximizing deep submodular functions (DSFs) [13, 3] subject to a matroid constraint. DSFs are an expressive class of submodular functions that include, as strict subfamilies, the facility location, weighted coverage, and sums of concave composed with modular functions. We use a strategy similar to the continuous greedy approach [6], but we show that the multilinear extension of any DSF has a natural and computationally attainable concave relaxation that we can optimize using gradient ascent. Our results show a guarantee of $\max_{0 < \delta < 1} \left(1 - \epsilon - \delta - e^{-\delta^2 \Omega(k)}\right)$ with a running time of $O(n^2/\epsilon^2)$ plus time for pipage rounding [6] to recover a discrete solution, where $k$ is the rank of the matroid constraint. This bound is often better than the standard $1 - 1/e$ guarantee of the continuous greedy algorithm, but runs much faster. Our bound also holds even for fully curved ($c = 1$) functions where the guarantee of $1 - c/e$ degenerates to $1 - 1/e$, where $c$ is the curvature of $f$ [37]. We perform computational experiments that support our theoretical results.
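
One of the subfamilies named in this abstract, sums of concave functions composed with modular functions, can be made concrete with a small example. This is a hedged sketch, not the paper's gradient-ascent method: the square root as the concave function, the weight table `WEIGHTS`, and the function names are assumptions chosen for illustration, and the brute-force check is only practical for tiny ground sets.

```python
import math
from itertools import combinations
from typing import Callable, Dict, FrozenSet, Iterable, List

# Hypothetical non-negative feature weights for four items.
WEIGHTS: Dict[str, List[float]] = {
    "a": [1.0, 0.0], "b": [0.5, 2.0], "c": [0.0, 1.5], "d": [3.0, 0.2],
}


def concave_over_modular(selected: Iterable[str]) -> float:
    """f(A) = sum_j sqrt(sum_{a in A} w_j(a)): concave over modular."""
    chosen = list(selected)
    n_features = len(next(iter(WEIGHTS.values())))
    return sum(
        math.sqrt(sum(WEIGHTS[item][j] for item in chosen))
        for j in range(n_features)
    )


def is_submodular(ground_set: Iterable[str],
                  f: Callable[[FrozenSet[str]], float],
                  tol: float = 1e-12) -> bool:
    """Brute-force check of diminishing returns: for all A ⊆ B and
    v not in B, f(A + v) - f(A) >= f(B + v) - f(B)."""
    items = list(ground_set)
    subsets = [frozenset(c) for r in range(len(items) + 1)
               for c in combinations(items, r)]
    for small in subsets:
        for large in subsets:
            if not small <= large:
                continue
            for v in items:
                if v in large:
                    continue
                gain_small = f(small | {v}) - f(small)
                gain_large = f(large | {v}) - f(large)
                if gain_small < gain_large - tol:
                    return False
    return True


print(is_submodular(WEIGHTS.keys(), concave_over_modular))
```

The check enumerates every nested pair of subsets, so it serves only to confirm the diminishing-returns property on a toy instance.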

10.
Uncertain Artif Intell ; 30: 320-329, 2014.
Article in English | MEDLINE | ID: mdl-25298752

ABSTRACT

We present a peptide-spectrum alignment strategy that employs a dynamic Bayesian network (DBN) for the identification of spectra produced by tandem mass spectrometry (MS/MS). Our method is fundamentally generative in that it models peptide fragmentation in MS/MS as a physical process. The model traverses an observed MS/MS spectrum and a peptide-based theoretical spectrum to calculate the best alignment between the two spectra. Unlike all existing state-of-the-art methods for spectrum identification that we are aware of, our method can learn alignment probabilities given a dataset of high-quality peptide-spectrum pairs. The method, moreover, accounts for noise peaks and absent theoretical peaks in the observed spectrum. We demonstrate that our method outperforms, on a majority of datasets, several widely used, state-of-the-art database search tools for spectrum identification. Furthermore, the proposed approach provides an extensible framework for MS/MS analysis and provides useful information that is not produced by other methods, thanks to its generative structure.

11.
Uncertain Artif Intell ; 28: 775-785, 2012 Aug.
Article in English | MEDLINE | ID: mdl-25383048

ABSTRACT

Shotgun proteomics is a high-throughput technology used to identify unknown proteins in a complex mixture. At the heart of this process is a prediction task, the spectrum identification problem, in which each fragmentation spectrum produced by a shotgun proteomics experiment must be mapped to the peptide (protein subsequence) that generated the spectrum. We propose a new algorithm for spectrum identification, based on dynamic Bayesian networks, which significantly outperforms the de facto standard tools for this task: SEQUEST and Mascot.

12.
Disabil Rehabil Assist Technol ; 3(1): 22-34, 2008 Jan.
Article in English | MEDLINE | ID: mdl-18416516

ABSTRACT

PURPOSE: Mouse control has become a crucial aspect of many modern-day computer interactions. This poses a challenge for individuals with motor impairments or those whose use of hands is restricted due to situational constraints. We present a system called the Vocal Joystick, which allows the user to continuously control the mouse cursor by varying vocal parameters such as vowel quality, loudness and pitch. METHOD: Evaluations were conducted to characterize expert performance capability of the Vocal Joystick, and to compare novice user performance and preference for the Vocal Joystick and two other existing speech-based cursor control methods. RESULTS: Our results show that Fitts' law, a well-adopted model of human motor performance for movement tasks, is a good predictor of the speed-accuracy tradeoff for the Vocal Joystick, and suggest that the optimal performance of the Vocal Joystick may be comparable to that of a conventional hand-operated joystick. Novice user evaluations show that the Vocal Joystick can be used by people without extensive training, and that it presents a viable alternative to existing speech-based cursor control methods. CONCLUSIONS: The Vocal Joystick, with its ease of use, minimal setup requirement, and controllability, offers promise for providing an efficient method for cursor control and other forms of continuous input for individuals with motor impairments.


Subjects
Computer Graphics/instrumentation , Computer Peripherals , Computer Terminals , Assistive Technology , Adolescent , Adult , Persons with Disabilities , Female , Humans , Linear Models , Male , Task Performance and Analysis
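
Fitts' law, which this abstract uses as its performance model, is commonly written in the Shannon formulation as MT = a + b · log2(D/W + 1), where D is the distance to the target and W is its width. The sketch below assumes that formulation, which the abstract does not specify; the intercept and slope values are placeholders, not coefficients from the study, and the function names are hypothetical.

```python
import math


def index_of_difficulty(distance: float, width: float) -> float:
    """Shannon formulation of the index of difficulty, in bits."""
    if distance < 0 or width <= 0:
        raise ValueError("distance must be >= 0 and width must be > 0")
    return math.log2(distance / width + 1.0)


def predicted_movement_time(distance: float, width: float,
                            intercept: float, slope: float) -> float:
    """Movement time predicted by MT = intercept + slope * ID."""
    return intercept + slope * index_of_difficulty(distance, width)


# Placeholder coefficients; a real study fits them by linear regression
# of measured movement times against the index of difficulty.
print(index_of_difficulty(7.0, 1.0))
print(predicted_movement_time(7.0, 1.0, intercept=0.1, slope=0.2))
```

Comparing the fitted slope across input devices is one common way to compare their throughput on pointing tasks.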