Search | Secretaria de Estado da Saúde

Proteogenomic strategies for identification of aberrant cancer peptides using large-scale next-generation sequencing data.

Woo, Sunghee; Cha, Seong Won; Na, Seungjin; Guest, Clark; Liu, Tao; Smith, Richard D; Rodland, Karin D; Payne, Samuel; Bafna, Vineet.

Proteomics ; 14(23-24): 2719-30, 2014 Dec.

Article in English | MEDLINE | ID: mdl-25263569

ABSTRACT

Cancer is driven by the acquisition of somatic DNA lesions. Distinguishing the early driver mutations from subsequent passenger mutations is key to molecular subtyping of cancers, understanding cancer progression, and the discovery of novel biomarkers. Advances in genomics technologies (whole-genome, exome, and transcript sequencing, collectively referred to as next-generation sequencing, or NGS) have fueled recent studies on somatic mutation discovery. However, the vision is challenged by the complexity, redundancy, and errors in genomic data, and by the difficulty of investigating the proteome-translated portion of aberrant genes using only genomic approaches. Combinations of proteomic and genomic technologies are increasingly being employed, and various strategies have been used to make large-scale NGS data usable in conventional MS/MS searches. This paper discusses different strategies for large database search and FDR (false discovery rate)-based error control, and their implications for cancer proteogenomics. Moreover, it extends and develops the idea of a unified genomic variant database that can be searched by any MS sample. A total of 879 BAM files downloaded from the TCGA repository were used to create a 4.34 GB unified FASTA database that contained 2,787,062 novel splice junctions, 38,464 deletions, 1,105 insertions, and 182,302 substitutions. Proteomic data from a single ovarian carcinoma sample (439,858 spectra) were searched against the database. By applying the most conservative FDR measure, we identified 524 novel peptides and 65,578 known peptides at a 1% FDR threshold. The novel peptides include interesting examples of doubly mutated peptides, frame-shifts, and nonsample-recruited mutations, which emphasize the strength of our approach.

Subjects

High-Throughput Nucleotide Sequencing/methods; Neoplasms/metabolism; Proteomics/methods; Databases, Protein; Humans; Neoplasms/genetics; Peptides/genetics
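
The abstract above refers to FDR-based error control at a 1% threshold but does not say how the FDR is estimated. As an illustration only, the sketch below assumes the common target-decoy estimate (decoy hits divided by target hits above a score cutoff). The function name, the `(score, is_decoy)` input layout, and the convention that higher scores are better are all assumptions, not details taken from the paper.

```python
from typing import List, Optional, Tuple


def score_threshold_at_fdr(
    psms: List[Tuple[float, bool]], max_fdr: float = 0.01
) -> Optional[float]:
    """Return the lowest score cutoff whose estimated FDR is <= max_fdr.

    psms holds hypothetical (score, is_decoy) pairs, one per
    peptide-spectrum match; a higher score is assumed to be better.
    The FDR at a cutoff is estimated as decoys / targets among all
    matches scoring at or above it. Returns None if no cutoff qualifies.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    targets = 0
    decoys = 0
    best = None
    for score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Keep the most permissive cutoff that still meets the bound.
        if targets > 0 and decoys / targets <= max_fdr:
            best = score
    return best


example = [(10.0, False), (9.0, False), (8.0, True), (7.0, False)]
print(score_threshold_at_fdr(example, max_fdr=0.01))
```

Stricter variants (for example, estimating the FDR separately for novel and known peptides) would change only how the matches are grouped before calling such a function.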

Proteogenomic database construction driven from large scale RNA-seq data.

Woo, Sunghee; Cha, Seong Won; Merrihew, Gennifer; He, Yupeng; Castellana, Natalie; Guest, Clark; MacCoss, Michael; Bafna, Vineet.

J Proteome Res ; 13(1): 21-8, 2014 Jan 03.

Article in English | MEDLINE | ID: mdl-23802565

ABSTRACT

The advent of inexpensive RNA-seq and other deep sequencing technologies for RNA promises to radically improve genomic annotation, providing information on transcribed regions and splicing events in a variety of cellular conditions. Using MS-based proteogenomics, many of these events can be confirmed directly at the protein level. However, integrating large amounts of redundant RNA-seq data with mass spectrometry data is a challenging problem. Our paper addresses this by constructing a compact database that contains all useful information expressed in RNA-seq reads. Applying our method to cumulative C. elegans data reduced 496.2 GB of aligned RNA-seq SAM files to a 410 MB splice graph database written in FASTA format. This corresponds to 1000× compression of data size, without loss of sensitivity. We performed a proteogenomics study on the custom data set with a completely automated pipeline and identified a total of 4044 novel events, including 215 novel genes, 808 novel exons, 12 alternative splicing events, 618 gene-boundary corrections, 245 exon-boundary changes, 938 frame shifts, 1166 reverse strands, and 42 translated UTRs. Our results highlight the usefulness of transcript + proteomic integration for improved genome annotations.

Subjects

Caenorhabditis elegans/metabolism; Databases, Genetic; Databases, Protein; Genome; Proteome; Sequence Analysis, RNA; Amino Acid Sequence; Animals; Automation; Caenorhabditis elegans/genetics; Helminth Proteins/chemistry; Helminth Proteins/genetics; Helminth Proteins/metabolism; Molecular Sequence Data
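
The figures quoted in the abstract above can be checked with a few lines of arithmetic. The snippet below is only a sanity check, assuming decimal units (1 GB = 1000 MB); the variable names are illustrative. Under that assumption the size ratio comes out slightly above the rounded 1000× figure, and the eight event categories sum to the reported total.

```python
# Sizes as quoted in the abstract; decimal units are an assumption.
sam_size_gb = 496.2
fasta_size_mb = 410.0

compression_ratio = sam_size_gb * 1000.0 / fasta_size_mb

# Novel event counts by category, as listed in the abstract.
novel_events = {
    "novel genes": 215,
    "novel exons": 808,
    "alternative splicing events": 12,
    "gene-boundary corrections": 618,
    "exon-boundary changes": 245,
    "frame shifts": 938,
    "reverse strands": 1166,
    "translated UTRs": 42,
}
total_events = sum(novel_events.values())

print(f"compression ratio: about {compression_ratio:.0f}x")
print(f"total novel events: {total_events}")
```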

High Accuracy Monocular SFM and Scale Correction for Autonomous Driving.

Song, Shiyu; Chandraker, Manmohan; Guest, Clark C.

IEEE Trans Pattern Anal Mach Intell ; 38(4): 730-43, 2016 Apr.

Article in English | MEDLINE | ID: mdl-26513777

ABSTRACT

We present a real-time monocular visual odometry system that achieves high accuracy in real-world autonomous driving applications. First, we demonstrate robust monocular SFM that exploits multithreading to handle driving scenes with large motions and rapidly changing imagery. To correct for scale drift, we use the known height of the camera above the ground plane. Our second contribution is a novel data-driven mechanism for cue combination that allows highly accurate ground plane estimation by adapting the observation covariances of multiple cues, such as sparse feature matching and dense inter-frame stereo, based on their relative confidences inferred from visual data on a per-frame basis. Finally, we demonstrate extensive benchmark performance and comparisons on the challenging KITTI dataset, achieving accuracy comparable to stereo and exceeding prior monocular systems. Our SFM system is optimized to output pose within 50 ms in the worst case, while average-case operation is over 30 fps. Our framework also significantly boosts the accuracy of applications like object localization that rely on the ground plane.
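
The scale-correction idea mentioned in the abstract above (using the known camera height above the ground plane) can be illustrated with a minimal sketch. It assumes that the ground plane estimated in the SFM's arbitrary units yields a camera height, and that translations are rescaled by the ratio of the known height to that estimate. The function name and inputs are hypothetical; the paper's cue-combination machinery for estimating the ground plane is not reproduced here.

```python
from typing import List


def correct_scale(
    translation: List[float], estimated_height: float, known_height: float
) -> List[float]:
    """Rescale an SFM translation using the known camera height.

    estimated_height is the camera height implied by the estimated
    ground plane, in the same arbitrary units as translation;
    known_height is the measured height in metric units.
    """
    if estimated_height <= 0:
        raise ValueError("estimated_height must be positive")
    scale = known_height / estimated_height
    return [scale * t for t in translation]


# Example: the estimate says the camera sits 0.5 units above the ground,
# while the real mounting height is 1.5 m, so translations are tripled.
print(correct_scale([1.0, 0.0, 2.0], estimated_height=0.5, known_height=1.5))
```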
