1.
ACS Omega ; 7(19): 16454-16467, 2022 May 17.
Article in English | MEDLINE | ID: mdl-35601313

ABSTRACT

Adopting a proteogenomics approach to validate single nucleotide variation events by identifying the corresponding single amino acid variant peptides from mass spectrometry (MS)-based proteomics data facilitates translational and clinical research. Although variant peptides are usually identified from MS data with a stringent false discovery rate (FDR), FDR control can fail to eliminate dubious results caused by several issues; thus, postexamination to eliminate dubious results is required. However, comprehensive postexaminations of identification results are still lacking. Therefore, we propose a framework of three bottom-up levels (peptide-spectrum match (PSM), peptide, and variant event) that consists of rigorous 11-aspect examinations from the MS perspective to further confirm the reliability of variant events. As a proof of concept and to show feasibility, we demonstrate the 11 examinations on the variant peptides identified from an HEK293 cell line data set, where various database search strategies were applied to maximize the number of identified variant PSMs with an FDR <1% for postexaminations. The results showed that the FDR criterion alone is insufficient to validate identified variant peptides and that the 11 postexaminations can reveal low-confidence variant events detected by shotgun proteomics experiments. Therefore, we suggest that postexaminations of identified variant events based on the proposed framework are necessary for proteogenomics studies.

2.
Sci Rep ; 12(1): 2045, 2022 02 07.
Article in English | MEDLINE | ID: mdl-35132134

ABSTRACT

For identifying peptides and proteins from mass spectrometry (MS) data, spectral library searching has emerged as a complementary approach to conventional database searching. However, for the spectrum-centric analysis of data-independent acquisition (DIA) data, spectral library searching has not been widely exploited because existing spectral library search tools are mainly designed and optimized for the analysis of data-dependent acquisition (DDA) data. We present Calibr, a spectral library search tool for spectrum-centric DIA data analysis. Calibr optimizes spectrum preprocessing for pseudo MS2 spectra, generating an 8.11% increase in spectrum-spectrum match (SSM) number and a 7.49% increase in peptide number over the traditional preprocessing approach. When searching against the DDA-based spectral library, Calibr improves SSM number by 17.6-26.65% and peptide number by 18.45-37.31% over two state-of-the-art tools on three different data sets. Searching against the public spectral library from MassIVE, Calibr improves on state-of-the-art tools in SSM and peptide numbers by more than 31.49% and 25.24%, respectively, for two data sets. Our analyses indicate that the higher sensitivity of Calibr results from the use of various spectral similarity measures and statistical scores, coupled with machine learning-based statistical validation for FDR control. Calibr executable files including a graphical user-interface application are available at https://ms.iis.sinica.edu.tw/COmics/Software_CalibrWizard.html and https://sourceforge.net/projects/comics-calibr .
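Spectral library searching scores a query spectrum against library spectra with a similarity measure. The abstract does not say which measures Calibr uses, so the following is only a minimal sketch of one common choice, a cosine (normalized dot product) over binned peaks. The function names, the peak representation as (m/z, intensity) pairs, and the 1.0 bin width are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def bin_spectrum(peaks, bin_width=1.0):
    """Sum peak intensities into fixed-width m/z bins (bin_width is an assumed value)."""
    binned = defaultdict(float)
    for mz, intensity in peaks:
        binned[int(mz / bin_width)] += intensity
    return binned


def cosine_similarity(peaks_a, peaks_b, bin_width=1.0):
    """Cosine similarity between two spectra given as lists of (m/z, intensity)."""
    a = bin_spectrum(peaks_a, bin_width)
    b = bin_spectrum(peaks_b, bin_width)
    # Only bins present in both spectra contribute to the dot product.
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

A score near 1 means the two spectra share most of their intensity in the same bins; a score of 0 means no shared bins.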


Subjects
Mass Spectrometry/methods , Peptide Library , Peptides/chemistry , Peptides/genetics , Proteins/chemistry , Proteins/genetics , Proteomics/methods , Databases as Topic , Datasets as Topic
3.
iScience ; 24(6): 102522, 2021 Jun 25.
Article in English | MEDLINE | ID: mdl-34142036

ABSTRACT

Lung adenocarcinoma (LUAD) patients in East Asia predominantly harbor oncogenic EGFR mutations. However, there remains a limited understanding of the biological characteristics and therapeutic vulnerabilities of the concurrent mutations of EGFR and other genes in LUAD. Here, we performed comprehensive bioinformatics analyses on 88 treatment-naïve East Asian LUAD patients. Based on somatic mutation clustering, we identified three somatic mutation subtypes: EGFR + TP53 co-mutation, EGFR mutation, and multiple-gene mutation. A proteogenomic analysis among subtypes revealed varying degrees of dysregulation in cell-cycle-related and immune-related processes. An immune-characteristic analysis revealed higher PDL1 protein expression in the EGFR + TP53 co-mutation subtype than in the EGFR mutation subtype, which may affect the therapeutic efficacy of anti-PD-L1 therapy. Moreover, integrating known and potential therapeutic target analysis reveals therapeutic vulnerabilities of specific subtypes and nominates candidate biomarkers for therapeutic intervention. This study provides new biological insight and therapeutic opportunities with respect to EGFR-mutant LUAD subtypes.

4.
Nat Commun ; 12(1): 2539, 2021 05 05.
Article in English | MEDLINE | ID: mdl-33953186

ABSTRACT

Phosphoproteomics can provide insights into cellular signaling dynamics. To achieve deep and robust quantitative phosphoproteomics profiling for minute amounts of sample, we here develop a global phosphoproteomics strategy based on data-independent acquisition (DIA) mass spectrometry and hybrid spectral libraries derived from data-dependent acquisition (DDA) and DIA data. Benchmarking the method using 166 synthetic phosphopeptides shows high sensitivity (<0.1 ng), accurate site localization and reproducible quantification (~5% median coefficient of variation). As a proof-of-concept, we use lung cancer cell lines and patient-derived tissue to construct a hybrid phosphoproteome spectral library covering 159,524 phosphopeptides (88,107 phosphosites). Based on this library, our single-shot streamlined DIA workflow quantifies 36,350 phosphosites (19,755 class 1) in cell line samples within two hours. Application to drug-resistant cells and patient-derived lung cancer tissues delineates site-specific phosphorylation events associated with resistance and tumor progression, showing that our workflow enables the characterization of phosphorylation signaling with deep coverage, high sensitivity and low between-run missing values.


Subjects
Phosphopeptides/metabolism , Proteome/analysis , Proteomics , Cell Line, Tumor , Humans , Lung Neoplasms/metabolism , Phosphorylation , Proteins/metabolism , Tandem Mass Spectrometry/methods , Workflow
5.
Sci Rep ; 11(1): 2233, 2021 01 26.
Article in English | MEDLINE | ID: mdl-33500498

ABSTRACT

Mass spectrometry-based proteomics using isobaric labeling for multiplex quantitation has become a popular approach for proteomic studies. We present Multi-Q 2, an isobaric-labeling quantitation tool which can yield the largest quantitation coverage and improved quantitation accuracy compared to three state-of-the-art methods. Multi-Q 2 supports identification results from several popular proteomic data analysis platforms for quantitation, offering up to 12% improvement in quantitation coverage for accepting identification results from multiple search engines when compared with MaxQuant and PatternLab. It is equipped with various quantitation algorithms, including a ratio compression correction algorithm, and results in up to 336 algorithmic combinations. Systematic evaluation shows different algorithmic combinations have different strengths and are suitable for different situations. We also demonstrate that the flexibility of Multi-Q 2 in customizing algorithmic combination can lead to improved quantitation accuracy over existing tools. Moreover, the use of complementary algorithmic combinations can be an effective strategy to enhance sensitivity when searching for biomarkers from differentially expressed proteins in proteomic experiments. Multi-Q 2 provides interactive graphical interfaces to process quantitation and to display ratios at protein, peptide, and spectrum levels. It also supports a heatmap module, enabling users to cluster proteins based on their abundance ratios and to visualize the clustering results. Multi-Q 2 executable files, sample data sets, and user manual are freely available at http://ms.iis.sinica.edu.tw/COmics/Software_Multi-Q2.html .

6.
J Proteomics ; 231: 104021, 2021 01 16.
Article in English | MEDLINE | ID: mdl-33148401

ABSTRACT

Concatenated target-decoy database searches are commonly used in proteogenomic research for variant peptide identification. Currently, protein-based and peptide-based sequence databases are applied to store variant sequences for database searches. The protein-based database records a full-length wild-type protein sequence but using the given variant events to replace the original amino acids, whereas the peptide-based database retains only the in silico digested peptides containing the variants. However, the performance of applying various decoy generation methods on the peptide-based variant sequence database is still unclear, compared to the protein-based database. In this paper, we conduct a thorough comparison on target-decoy databases constructed by the above two types of databases coupled with various decoy generation methods for proteogenomic analyses. The results show that for the protein-based variant sequence database, using the reverse or the pseudo reverse method achieves similar performance for variant peptide identification. Furthermore, for the peptide-based database, the pseudo reverse method is more suitable than the widely used reverse method, as shown by identifying 6% more variant PSMs in a HEK293 cell line data set. SIGNIFICANCE: In our survey of publications on proteogenomic studies, 57% of the studies adopt the peptide-based variant sequence database coupled with the reverse method for decoy generation to construct a target-decoy database for searches. However, our results show that when using the peptide-based variant sequence database, it is better to adopt the pseudo reverse method for generating decoy sequences, to avoid leading to fewer variant peptides being identified.
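The difference between the two decoy generation methods compared above can be sketched in a few lines. This assumes the commonly used definitions: the reverse method reverses the whole sequence, while the pseudo reverse method reverses a peptide but keeps its C-terminal residue in place. Whether the paper uses exactly these definitions is an assumption, and the function names and the example peptide are illustrative only.

```python
def reverse_decoy(sequence):
    """Reverse method: reverse the entire sequence."""
    return sequence[::-1]


def pseudo_reverse_decoy(peptide):
    """Pseudo reverse method (assumed definition): reverse every residue
    except the C-terminal one, which stays at the C-terminus."""
    if len(peptide) < 2:
        return peptide
    return peptide[:-1][::-1] + peptide[-1]


target = "LVNELTEFAK"                     # illustrative tryptic peptide ending in K
print(reverse_decoy(target))              # KAFETLENVL
print(pseudo_reverse_decoy(target))       # AFETLENVLK
```

For a tryptic peptide, the pseudo-reversed decoy still ends in K or R, whereas the fully reversed decoy starts with that residue. This is one plausible reason the choice matters more for a peptide-based database than for a protein-based one, though the abstract itself does not state the cause.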


Subjects
Proteogenomics , Algorithms , Databases, Protein , HEK293 Cells , Humans , Peptides/genetics , Proteins
7.
J Proteomics ; 223: 103819, 2020 07 15.
Article in English | MEDLINE | ID: mdl-32407886

ABSTRACT

Identifying single-amino-acid variants (SAVs) from mass spectrometry-based experiments is critical for validating single-nucleotide variants (SNVs) at the protein level to facilitate biomedical research. Currently, two approaches are usually applied to convert SNV annotations into SAV-harboring protein sequences. One approach generates one sequence containing exactly one SAV, and the other all SAVs. However, they may neglect the possibility of SAV combinations, e.g., haplotypes, existing in bio-samples. Therefore, it is necessary to consider all SAV combinations of a protein when generating SAV-harboring protein sequences. In this paper, we propose MinProtMaxVP, a novel approach which selects a minimized number of SAV-harboring protein sequences generated from the exhaustive approach, while still accommodating all possible variant peptides, by solving a classic set covering problem. Our study on known haplotype variations of TAS2R38 justifies the necessity for MinProtMaxVP to consider all combinations of SAVs. The performance of MinProtMaxVP is demonstrated by an in silico study on OR2T27 with five SAVs and real experimental data of the HEK293 cell line. Furthermore, assuming simulated somatic and germline variants of OR2T27 in tumor and normal tissues demonstrates that when adopting the appropriate somatic and germline SAV integration strategy, MinProtMaxVP is adaptable to labeling and label-free mass spectrometry-based experiments. SIGNIFICANCE: We present MinProtMaxVP, a novel approach to generate SAV-harboring protein sequences for constructing a customized protein sequence database, which is used in database searching for variant peptide identification. This approach outperforms the existing approaches in generating all possible variant peptides to be included in protein sequences and possibly leading to identification of more variant peptides in proteogenomic analysis.
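The set covering formulation mentioned above can be illustrated with a toy example: each candidate SAV-harboring sequence yields a set of variant peptides, and the goal is to pick few sequences whose peptides together cover all of them. The sketch below uses the textbook greedy approximation, which is swapped in here for illustration and is not necessarily the solver MinProtMaxVP uses. The sequence identifiers and peptide labels are hypothetical.

```python
def greedy_set_cover(candidates):
    """Pick candidate sequences until every variant peptide is covered.

    candidates: dict mapping a sequence id to the set of variant peptides
    that sequence yields. Greedy choice: always take the sequence covering
    the most still-uncovered peptides (ties broken by sorted id).
    """
    uncovered = set()
    for peptides in candidates.values():
        uncovered |= peptides
    chosen = []
    while uncovered:
        best = max(sorted(candidates),
                   key=lambda name: len(candidates[name] & uncovered))
        chosen.append(best)
        uncovered -= candidates[best]
    return chosen


# Hypothetical toy input: four candidate sequences, five variant peptides.
toy = {
    "seq_A": {"p1", "p2", "p3"},
    "seq_B": {"p3", "p4"},
    "seq_C": {"p4", "p5"},
    "seq_D": {"p1", "p5"},
}
print(greedy_set_cover(toy))  # ['seq_A', 'seq_C']
```

Greedy selection does not guarantee the true minimum in general; an exact answer would need an integer programming or exhaustive approach.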


Subjects
Proteogenomics , Amino Acid Sequence , Databases, Protein , HEK293 Cells , Humans , Peptides/genetics
8.
Sci Rep ; 9(1): 15975, 2019 11 04.
Article in English | MEDLINE | ID: mdl-31685900

ABSTRACT

N-linked glycosylation is one of the predominant post-translational modifications involved in a number of biological functions. Since experimental characterization of glycosites is challenging, glycosite prediction is crucial. Several predictors have been made available and report high performance. Most of them evaluate their performance at every asparagine in protein sequences, not confined to asparagine in the N-X-S/T sequon. In this paper, we present N-GlyDE, a two-stage prediction tool trained on rigorously constructed non-redundant datasets to predict N-linked glycosites in the human proteome. The first stage uses a protein similarity voting algorithm trained on both glycoproteins and non-glycoproteins to predict a score for a protein to improve glycosite prediction. The second stage uses a support vector machine to predict N-linked glycosites by utilizing features of gapped dipeptides, pattern-based predicted surface accessibility, and predicted secondary structure. N-GlyDE's final predictions are derived from a weight adjustment of the second-stage prediction results based on the first-stage prediction score. Evaluated on N-X-S/T sequons of an independent dataset comprising 53 glycoproteins and 33 non-glycoproteins, N-GlyDE achieves an accuracy and MCC of 0.740 and 0.499, respectively, outperforming the compared tools. The N-GlyDE web server is available at http://bioapp.iis.sinica.edu.tw/N-GlyDE/ .
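Two ideas in this abstract lend themselves to a short sketch: locating candidate N-X-S/T sequons (with X being any residue except proline, the standard definition) and computing the Matthews correlation coefficient (MCC) from a confusion matrix. This is not N-GlyDE's code; the function names and the example sequence are assumptions for illustration.

```python
import math
import re

# N followed by any residue except P, then S or T. The lookahead lets
# overlapping sequons (e.g. "NNST") be reported separately.
SEQUON = re.compile(r"N(?=[^P][ST])")


def find_sequons(protein):
    """Return 1-based positions of asparagines in N-X-S/T sequons."""
    return [m.start() + 1 for m in SEQUON.finditer(protein)]


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 when undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


print(find_sequons("MNGTANPSNNST"))  # [2, 9, 10]
```

Restricting evaluation to these positions, rather than to every asparagine, avoids counting the many trivially negative asparagines outside sequons.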

9.
J Proteome Res ; 18(12): 4124-4132, 2019 12 06.
Article in English | MEDLINE | ID: mdl-31429573

ABSTRACT

When conducting proteomics experiments to detect missing proteins and protein isoforms in the human proteome, it is desirable to use a protease that can yield more unique peptides with properties amenable for mass spectrometry analysis. Though trypsin is currently the most widely used protease, some proteins can yield only a limited number of unique peptides by trypsin digestion. Other proteases and multiple proteases have been applied in reported studies to increase the number of identified proteins and protein sequence coverage. To facilitate the selection of proteases, we developed a web-based resource, called in silico Human Proteome Digestion Map (iHPDM), which contains a comprehensive proteolytic peptide database constructed from human proteins, including isoforms, in neXtProt digested by 15 protease combinations of one or two proteases. iHPDM provides convenient functions and graphical visualizations for users to examine and compare the digestion results of different proteases. Notably, it also supports users to input filtering criteria on digested peptides, e.g., peptide length and uniqueness, to select suitable proteases. iHPDM can facilitate protease selection for shotgun proteomics experiments to identify missing proteins, protein isoforms, and single amino acid variant peptides.
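In silico digestion followed by filtering on peptide length and uniqueness, as described above, can be sketched as follows. This is a simplified illustration and not iHPDM's implementation: it models only trypsin with the usual rule (cleave after K or R unless the next residue is P), allows no missed cleavages, and the default length window of 7 to 40 residues, the function names, and the toy sequences are assumptions.

```python
def digest(protein, cleave_after="KR", no_cleave_before="P",
           min_len=7, max_len=40):
    """Fully digest a protein and keep peptides within the length window."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        is_last = i == len(protein) - 1
        if is_last or (aa in cleave_after
                       and protein[i + 1] not in no_cleave_before):
            peptides.append(protein[start:i + 1])
            start = i + 1
    return [p for p in peptides if min_len <= len(p) <= max_len]


def unique_peptides(proteome, **digest_options):
    """Map each peptide found in exactly one protein to that protein's id."""
    owners = {}
    for name, sequence in proteome.items():
        for peptide in set(digest(sequence, **digest_options)):
            owners.setdefault(peptide, set()).add(name)
    return {pep: next(iter(names))
            for pep, names in owners.items() if len(names) == 1}


# Toy sequences (hypothetical); min_len=1 so the short peptides are kept.
toy = {"P1": "AGCKDEFRPGHKLM", "P2": "AGCKWWYK"}
print(digest(toy["P1"], min_len=1))   # ['AGCK', 'DEFRPGHK', 'LM']
print(unique_peptides(toy, min_len=1))
```

Comparing proteases would amount to calling `digest` with different cleavage rules and counting how many unique peptides each protein retains.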


Subjects
Peptide Hydrolases/metabolism , Peptide Mapping/methods , Proteome/metabolism , Computer Graphics , Computer Simulation , Data Visualization , Databases, Factual , ErbB Receptors/metabolism , Humans , Internet , MAP Kinase Kinase 1/metabolism , N-Acetylhexosaminyltransferases/metabolism , Protein Isoforms/metabolism , Proteomics/methods , Receptors, Odorant/metabolism , User-Computer Interface , gamma-Glutamyltransferase/metabolism
10.
Anal Chem ; 91(15): 9403-9406, 2019 08 06.
Article in English | MEDLINE | ID: mdl-31305071

ABSTRACT

Protein and peptide identification and quantitation are essential tasks in proteomics research and involve a series of steps in analyzing mass spectrometry data. Trans-Proteomic Pipeline (TPP) provides a wide range of useful tools through its web interfaces for analyses such as sequence database search, statistical validation, and quantitation. To utilize the powerful functionality of TPP without the need for manual intervention to launch each step, we developed a software tool, called WinProphet, to create and automatically execute a pipeline for proteomic analyses. It seamlessly integrates with TPP and other external command-line programs, supporting various functionalities, including database search for protein and peptide identification, spectral library construction and search, data-independent acquisition (DIA) data analysis, and isobaric labeling and label-free quantitation. WinProphet is a standalone, installation-free tool with graphical interfaces for users to configure, manage, and automatically execute pipelines. The constructed pipelines can be exported as XML files with all of the parameter settings for reusability and portability. The executable files, user manual, and sample data sets of WinProphet are freely available at http://ms.iis.sinica.edu.tw/COmics/Software_WinProphet.html .


Subjects
Data Analysis , Proteomics/methods , Software , User-Computer Interface , Workflow
11.
J Proteome Res ; 17(12): 4138-4151, 2018 12 07.
Article in English | MEDLINE | ID: mdl-30203655

ABSTRACT

Human embryonic stem cells (hESCs) have the capacity for self-renewal and multilineage differentiation, which are of clinical importance for regenerative medicine. Despite significant progress in hESC research, the complete hESC proteome atlas, especially the surface protein composition, awaits delineation. According to the latest release of the neXtProt database (January 17, 2018; 19 658 PE1, 2, 3, and 4 human proteins), membrane proteins represent the major category (1047; 48%) among all 2186 missing proteins (MPs). We conducted a deep subcellular proteomics analysis of hESCs to identify the nuclear, cytoplasmic, and membrane proteins in hESCs and to mine missing membrane proteins in the very early cell status. To our knowledge, our study achieved the largest data set with confident identification of 11 970 unique proteins (1% false discovery rate at peptide, protein, and PSM levels), including the most comprehensive description of 6 138 annotated membrane proteins in hESCs. Following the HPP guideline, we identified 26 gold (neXtProt PE2, 3, and 4 MPs) and 87 silver (potential MP candidates with a single unique peptide detected) MPs, of which 69 were membrane proteins, and the expression of 21 gold MPs was further verified either by multiple reaction monitoring mass spectrometry or by matching synthetic peptides in the Peptide Atlas database. Functional analysis of the MPs revealed their potential roles in the pluripotency-related pathways and the lineage- and tissue-specific differentiation processes. Our proteome map of hESCs may provide a rich resource not only for the identification of MPs in the human proteome but also for the investigation of self-renewal and differentiation of hESCs. All mass spectrometry data were deposited in ProteomeXchange via jPOST with identifier PXD009840.


Subjects
Human Embryonic Stem Cells/chemistry , Membrane Proteins/analysis , Proteome/analysis , Cell Differentiation , Cell Lineage , Humans , Intracellular Membranes/chemistry , Proteomics/methods
12.
J Proteome Res ; 17(9): 2937-2952, 2018 09 07.
Article in English | MEDLINE | ID: mdl-30088773

ABSTRACT

In proteogenomic studies, many genome-annotated events, for example, single amino acid variation (SAAV) and short INDEL, are often unobserved in shotgun proteomics. Therefore, we propose an analysis pipeline called LeTE-fusion (Le, peptide length; T, theoretical values; E, experimental data) to first investigate whether peptides with certain lengths are observed more often in mass spectrometry (MS)-based proteomics, which may hinder peptide identification causing difficulty in detecting genome-annotated events. By applying LeTE-fusion on different MS-based proteome data sets, we found peptides within 7-20 amino acids are more frequently identified, possibly attributed to MS-related factors instead of proteases. We then further extended the usage of LeTE-fusion on four variant-containing-sequence data sets (SAAV-only) with various sample complexity up to the whole human proteome scale, which yields theoretically ∼70% variants observable in an ideal shotgun proteomics. However, only ∼40% of variants might be detectable in real shotgun proteomic experiments when LeTE-fusion utilizes the experimentally observed variant-site-containing wild-type peptides in PeptideAtlas to estimate the expected observable coverage of variants. Finally, we conducted a case study on HEK293 cell line with variants reported at genomic level that were also identified in shotgun proteomics to demonstrate the efficacy of LeTE-fusion on estimating expected observable coverage of variants. To the best of our knowledge, this is the first study to systematically investigate the detection limits of genome-annotated events via shotgun proteomics using such analysis pipeline.
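The idea of estimating how many variant sites are theoretically observable can be illustrated with a heavily simplified sketch. It is not the LeTE-fusion pipeline: it assumes fully tryptic digestion of the wild-type sequence with no missed cleavages, treats a site as observable when its enclosing peptide is 7 to 20 residues long (the range reported above), and ignores variants that create or destroy a cleavage site. The function names and the toy sequence are hypothetical.

```python
def tryptic_spans(protein):
    """Return half-open (start, end) index pairs of fully tryptic peptides."""
    spans, start = [], 0
    for i, aa in enumerate(protein):
        is_last = i == len(protein) - 1
        if is_last or (aa in "KR" and protein[i + 1] != "P"):
            spans.append((start, i + 1))
            start = i + 1
    return spans


def observable_fraction(protein, variant_sites, min_len=7, max_len=20):
    """Fraction of 1-based variant sites whose enclosing tryptic peptide
    has a length within [min_len, max_len]."""
    if not variant_sites:
        return 0.0
    spans = tryptic_spans(protein)
    hits = 0
    for site in variant_sites:
        index = site - 1
        for start, end in spans:
            if start <= index < end:
                if min_len <= end - start <= max_len:
                    hits += 1
                break
    return hits / len(variant_sites)


# Toy sequence: peptides of length 4, 8 and 2; only site 6 is "observable".
print(observable_fraction("AGCKDEFRPGHKLM", [2, 6, 13]))
```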


Subjects
Genome, Human , Peptides/analysis , Proteogenomics/methods , Proteome/analysis , Amino Acid Sequence , Databases, Protein , Datasets as Topic , HEK293 Cells , Humans , Kidney/chemistry , Kidney/metabolism , Peptides/chemistry , Proteolysis , Proteome/genetics , Proteome/metabolism
13.
J Proteome Res ; 16(12): 4415-4424, 2017 12 01.
Article in English | MEDLINE | ID: mdl-28929764

ABSTRACT

To confirm the existence of missing proteins, we need to identify at least two unique peptides with length of 9-40 amino acids of a missing protein in bottom-up mass-spectrometry-based proteomic experiments. However, an identified unique peptide of the missing protein, even identified with high level of confidence, could possibly coincide with a peptide of a commonly observed protein due to isobaric substitutions, mass modifications, alternative splice isoforms, or single amino acid variants (SAAVs). Besides unique peptides of missing proteins, identified variant peptides (SAAV-containing peptides) could also alternatively map to peptides of other proteins due to the aforementioned issues. Therefore, we conducted a thorough comparative analysis on data sets in PeptideAtlas Tiered Human Integrated Search Proteome (THISP, 2017-03 release), including neXtProt (2017-01 release), to systematically investigate the possibility of unique peptides in missing proteins (PE2-4), unique peptides in dubious proteins, and variant peptides affected by isobaric substitutions, causing doubtful identification results. In this study, we considered 11 isobaric substitutions. From our analysis, we found <5% of the unique peptides of missing proteins and >6% of variant peptides became shared with peptides of PE1 proteins after isobaric substitutions.
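An isobaric substitution turns one peptide into a different sequence with the same (or nearly the same) mass, which is why a supposedly unique peptide can coincide with a peptide of another protein. The abstract does not list the 11 substitutions considered, so the sketch below only checks well-known cases (I/L, and N versus GG) using standard monoisotopic residue masses. The function names, the 0.001 Da tolerance, and the example peptides are assumptions.

```python
# Standard monoisotopic residue masses in daltons.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def residue_mass(peptide):
    """Sum of residue masses (terminal water omitted; it cancels in comparisons)."""
    return sum(MONO[aa] for aa in peptide)


def is_isobaric(peptide_a, peptide_b, tol_da=0.001):
    """True if two different sequences have masses within tol_da daltons."""
    if peptide_a == peptide_b:
        return False
    return abs(residue_mass(peptide_a) - residue_mass(peptide_b)) <= tol_da


print(is_isobaric("PEPTIDE", "PEPTLDE"))  # True  (I and L have identical mass)
print(is_isobaric("AANK", "AAGGK"))       # True  (N and GG differ by ~0.00001 Da)
print(is_isobaric("AAQK", "AAKK"))        # False (Q and K differ by ~0.036 Da)
```

With a looser tolerance, near-isobaric pairs such as Q and K would also be flagged, so the tolerance should reflect the mass accuracy of the instrument.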


Subjects
Peptides/analysis , Proteome/analysis , Amino Acid Sequence , Databases, Protein , Humans , Protein Isoforms , Tandem Mass Spectrometry
14.
Sci Rep ; 7: 44021, 2017 03 14.
Article in English | MEDLINE | ID: mdl-28290473

ABSTRACT

Although EGFR tyrosine kinase inhibitors (TKIs) have demonstrated good efficacy in non-small-cell lung cancer (NSCLC) patients harboring EGFR mutations, most patients develop intrinsic and acquired resistance. We quantitatively profiled the phosphoproteome and proteome of drug-sensitive and drug-resistant NSCLC cells under gefitinib treatment. The construction of a dose-dependent responsive kinase-substrate network of 1548 phosphoproteins and 3834 proteins revealed CK2-centric modules as the dominant core network for the potential gefitinib resistance-associated proteins. CK2 knockdown decreased cell survival in gefitinib-resistant NSCLCs. Using motif analysis to identify the CK2 core sub-network, we verified that the elevated phosphorylation level of a CK2 substrate, HMGA1, was a critical node contributing to EGFR-TKI resistance in NSCLC cells. Either HMGA1 knockdown or mutation of the CK2 phosphorylation site, S102, of HMGA1 reinforced the efficacy of gefitinib in resistant NSCLC cells through reactivation of the downstream signaling of EGFR. Our results delineate the TKI resistance-associated kinase-substrate network, suggesting a potential therapeutic strategy for overcoming TKI-induced resistance in NSCLC.


Subjects
Antineoplastic Agents/pharmacology , Carcinoma, Non-Small-Cell Lung/drug therapy , Drug Resistance, Neoplasm , HMGA1a Protein/metabolism , Lung Neoplasms/drug therapy , Protein Kinase Inhibitors/pharmacology , Quinazolines/pharmacology , Apoptosis , Carcinoma, Non-Small-Cell Lung/metabolism , Cell Line, Tumor , ErbB Receptors/metabolism , Gefitinib , Humans , Lung Neoplasms/metabolism , Phosphorylation , Protein Interaction Maps , Proteomics
15.
Nucleic Acids Res ; 44(W1): W575-80, 2016 Jul 08.
Article in English | MEDLINE | ID: mdl-27084943

ABSTRACT

MAGIC-web is the first web server, to the best of our knowledge, that performs both untargeted and targeted analyses of mass spectrometry-based glycoproteomics data for site-specific N-linked glycoprotein identification. The first two modules, MAGIC and MAGIC+, are designed for untargeted and targeted analysis, respectively. MAGIC is implemented with our previously proposed novel Y1-ion pattern matching method, which adequately detects Y1- and Y0-ion without prior information of proteins and glycans, and then generates in silico MS(2) spectra that serve as input to a database search engine (e.g. Mascot) to search against a large-scale protein sequence database. On top of that, the newly implemented MAGIC+ allows users to determine glycopeptide sequences using their own protein sequence file. The third module, Reports Integrator, provides the service of combining protein identification results from Mascot and glycan-related information from MAGIC-web to generate a complete site-specific protein-glycan summary report. The last module, Glycan Search, is designed for the users who are interested in finding possible glycan structures with specific numbers and types of monosaccharides. The results from MAGIC, MAGIC+ and Reports Integrator can be downloaded via provided links whereas the annotated spectra and glycan structures can be visualized in the browser. MAGIC-web is accessible from http://ms.iis.sinica.edu.tw/MAGIC-web/index.html.


Subjects
Glycoproteins/analysis , Glycoproteins/chemistry , Internet , Polysaccharides/analysis , Polysaccharides/chemistry , Software , Computer Simulation , Databases, Protein , Glycopeptides/analysis , Glycopeptides/chemistry , Humans , Mass Spectrometry , Proteomics , Search Engine , User-Computer Interface , Web Browser
16.
J Proteome Res ; 14(12): 5396-407, 2015 Dec 04.
Article in English | MEDLINE | ID: mdl-26549055

ABSTRACT

Experimental evidence at the protein level from mass spectrometry (MS) and antibody experiments is essential to characterize the human proteome. neXtProt (2014-09 release) reported 20 055 human proteins, including 16 491 proteins identified at protein level and 3564 proteins unidentified. Excluding 616 proteins at uncertain level, 2948 proteins were regarded as missing proteins. Missing proteins were unidentified partially due to MS limitations and intrinsic properties of proteins, for example, only appearing in specific diseases or tissues. Despite such reasons, it is desirable to explore issues affecting validation of missing proteins from an "ideal" shotgun analysis of the human proteome. We thus performed in silico digestions on the human proteins to generate all in silico fully digested peptides. With these presumed peptides, we investigated the identification of proteins without any unique peptide, the effect of sequence variants on protein identification, difficulties in identifying olfactory receptors, and highly similar proteins. Among all proteins with evidence at transcript level, G protein-coupled receptors and olfactory receptors, based on InterPro classification, were the largest families of proteins and exhibited more frequent variants. To identify missing proteins, the above analyses suggested including sequence variants in protein FASTA for database searching. Furthermore, evidence of unique peptides identified from MS experiments would be crucial for experimentally validating missing proteins.


Subjects
Proteomics/methods , Amino Acid Sequence , Annexins/chemistry , Annexins/genetics , Computational Biology/methods , Computer Simulation , Databases, Protein , Genetic Variation , Humans , Hydrophobic and Hydrophilic Interactions , Mass Spectrometry , Molecular Sequence Annotation , Molecular Sequence Data , Peptide Fragments/chemistry , Peptide Fragments/genetics , Peptide Fragments/isolation & purification , Proteolysis , Proteome/chemistry , Proteome/genetics , Proteome/isolation & purification , Proteomics/statistics & numerical data , Receptors, Odorant/chemistry , Receptors, Odorant/genetics , Receptors, Odorant/isolation & purification
17.
Anal Chem ; 87(24): 12016-23, 2015 Dec 15.
Article in English | MEDLINE | ID: mdl-26554430

ABSTRACT

Membrane proteins are crucial targets for cancer biomarker discovery and drug development. However, in addition to the inherent challenges of hydrophobicity and low abundance, complete membrane proteome coverage of clinical specimen is usually hindered by the requirement of large amount of starting materials. Toward comprehensive membrane proteomic profiling for small amounts of samples (10 µg), we developed high-pH reverse phase (Hp-RP) combined with stop-and-go extraction tip (StageTip) technique, as a fast (∼15 min.), sensitive, reproducible, high-resolution and multiplexed fractionation method suitable for accurate quantification of the membrane proteome. This approach provided almost 2-fold enhanced detection of peptides encompassing transmembrane helix (TMH) domain, as compared with strong anion exchange (SAX) and strong cation exchange (SCX) StageTip techniques. Almost 5000 proteins (∼60% membrane proteins) can be identified in only 10 µg of membrane protein digests, showing the superior sensitivity of the Hp-RP StageTip approach. The method allowed up to 9- and 6-fold increase in the identification of unique hydrophobic and hydrophilic peptides, respectively. The Hp-RP StageTip method enabled in-depth membrane proteome profiling of 11 lung cancer cell lines harboring different EGFR mutation status, which resulted in the identification of 3983 annotated membrane proteins. This provides the largest collection of reference peptide spectral data for lung cancer membrane subproteome. Finally, relative quantification of membrane proteins between Gefitinib-resistant and -sensitive lung cancer cell lines revealed several up-regulated membrane proteins with key roles in lung cancer progression.


Subjects
Membrane Proteins/analysis , Membrane Proteins/isolation & purification , Proteomics/methods , Cell Line, Tumor , Humans , Limit of Detection , Lung Neoplasms/physiopathology , Membrane Proteins/chemistry , Membrane Proteins/genetics , Models, Biological , Mutation , Time Factors
18.
J Proteome Res ; 14(9): 3658-69, 2015 Sep 04.
Article in English | MEDLINE | ID: mdl-26202522

ABSTRACT

Despite significant efforts in the past decade toward complete mapping of the human proteome, 3564 proteins (neXtProt, 09-2014) are still "missing proteins". Over one-third of these missing proteins are annotated as membrane proteins, owing to their relatively challenging accessibility with standard shotgun proteomics. Using non-small cell lung cancer (NSCLC) as a model study, we aim to mine missing proteins from the disease-associated membrane proteome, which may be still largely under-represented. To increase identification coverage, we employed Hp-RP StageTip prefractionation of membrane-enriched samples from 11 NSCLC cell lines. Analysis of membrane samples from 20 pairs of tumor and adjacent normal lung tissue was incorporated to include physiologically expressed membrane proteins. Using multiple search engines (X!Tandem, Comet, and Mascot) and stringent evaluation of FDR (MAYU and PeptideShaker), we identified 7702 proteins (66% membrane proteins) and 178 missing proteins (74 membrane proteins) with PSM-, peptide-, and protein-level FDR of 1%. Through multiple reaction monitoring using synthetic peptides, we provided additional evidence of eight missing proteins including seven with transmembrane helix domains. This study demonstrates that mining missing proteins focused on the cancer membrane subproteome can greatly contribute to mapping the whole human proteome. All data were deposited into ProteomeXchange with the identifier PXD002224.


Subjects
Membrane Proteins/chemistry , Tandem Mass Spectrometry/methods , Amino Acid Sequence , Cell Line, Tumor , Chromatography, Liquid/methods , Humans , Hydrogen-Ion Concentration , Molecular Sequence Data , Proteome
19.
Anal Chem ; 86(1): 685-93, 2014 Jan 07.
Article in English | MEDLINE | ID: mdl-24313913

ABSTRACT

Methodologies to enrich heterogeneous types of phosphopeptides are critical for comprehensive mapping of the under-explored phosphoproteome. Taking advantage of the distinct binding affinities of Ga(3+) and Fe(3+) for phosphopeptides, we designed a metal-directed immobilized metal ion affinity chromatography strategy for the sequential enrichment of phosphopeptides. In Raji B cells, the sequential Ga(3+)-Fe(3+)-immobilized metal affinity chromatography (IMAC) strategy displayed 1.5-3.5-fold superior phosphoproteomic coverage compared to single IMAC (Fe(3+), Ti(4+), Ga(3+), and Al(3+)). In addition, up to 92% of the 6283 phosphopeptides were uniquely enriched in either the first Ga(3+)-IMAC (41%) or the second Fe(3+)-IMAC (51%). The complementary properties of Ga(3+) and Fe(3+) were further demonstrated through the exclusive enrichment of almost all of the 1214 multiply phosphorylated peptides (99.4%) in the Ga(3+)-IMAC, whereas only 10% of the 5069 monophosphorylated phosphopeptides were commonly enriched in both fractions. The application of sequential Ga(3+)-Fe(3+)-IMAC to human lung cancer tissue allowed the identification of 2560 unique phosphopeptides with only 8% overlap. In addition to the above-mentioned mono- and multiply phosphorylated peptides, this fractionation ability was also demonstrated on basic and acidic phosphopeptides: acidophilic phosphorylation sites were predominantly enriched in the first Ga(3+)-IMAC (72%), while Pro-directed (85%) and basophilic (79%) phosphorylation sites were enriched in the second Fe(3+)-IMAC. This strategy provided complementary mapping of different kinase substrates in multiple cellular pathways related to invasion and metastasis of lung cancer. Given the fractionation ability and ease of tip preparation of this Ga(3+)-Fe(3+)-IMAC, we propose that this strategy allows more comprehensive characterization of the phosphoproteome both in vitro and in vivo.


Subjects
Chromatography, Affinity/methods , Metals/chemistry , Proteomics/methods , Spectrometry, Mass, Matrix-Assisted Laser Desorption-Ionization/methods , Cell Line, Tumor , Cells, Immobilized , Humans
20.
J Proteome Res ; 12(1): 33-44, 2013 Jan 04.
Article in English | MEDLINE | ID: mdl-23256888

ABSTRACT

Chromosome 4 is the fourth largest chromosome, containing approximately 191 megabases (~6.4% of the human genome) with 757 protein-coding genes. A number of marker genes for many diseases have been found on this chromosome, including genes related to genetic diseases (e.g., hepatocellular carcinoma) and to biomedical research areas (cardiac system, aging, metabolic disorders, immune system, cancer, and stem cells), such as oncogenes and growth factors. As a pilot study for the chromosome 4-centric human proteome project (Chr 4-HPP), we present here a systematic analysis of the disease association, protein isoforms, and coding single nucleotide polymorphisms of these 757 protein-coding genes and their experimental evidence at the protein level. We also describe how the findings from the chromosome 4 project might be used to drive biomarker discovery and validation studies in disease-oriented projects, using the examples of secretomic and membrane proteomic approaches in cancer research. By integrating cancer cell secretomes and several other existing databases in the public domain, we identified 141 chromosome 4-encoded proteins as cancer cell-secretable/sheddable proteins. Additionally, we identified 54 chromosome 4-encoded proteins that have been classified as cancer-associated proteins with successfully developed selected or multiple reaction monitoring (SRM/MRM) assays. From literature annotation and topology analysis, 271 proteins were recognized as membrane proteins, while 27.9% of the 757 proteins do not have any experimental evidence at the protein level. In summary, the analysis revealed that chromosome 4 is a rich resource of cancer-associated proteins for biomarker verification projects and for drug target discovery projects.


Subjects
Chromosomes, Human, Pair 4 , Disease , Proteins , Chromosomes, Human, Pair 4/classification , Chromosomes, Human, Pair 4/genetics , Computational Biology , Databases, Protein , Disease/classification , Disease/genetics , Genome, Human , Human Genome Project , Humans , Pilot Projects , Proteins/classification , Proteins/genetics , Proteins/metabolism , Proteomics