Results 1 - 17 of 17
1.
Molecules ; 26(11), 2021 Jun 02.
Article in English | MEDLINE | ID: mdl-34199411

ABSTRACT

The human testis and epididymis play critical roles in male fertility, including the spermatogenesis process, sperm storage, and maturation. However, the unique functions of the two organs had not been systematically studied. Herein, we provide a systematic and comprehensive multi-omics study between testis and epididymis. RNA-Seq profiling detected and quantified 19,653 in the testis and 18,407 in the epididymis. Proteomic profiling resulted in the identification of a total of 11,024 and 10,386 proteins in the testis and epididymis, respectively, including 110 proteins that previously have been classified as MPs (missing proteins). Furthermore, five MPs expressed in testis were validated by the MRM method. Subsequently, multi-omics comparisons between testis and epididymis were performed, including biological functions and pathways of DEGs (Differentially Expressed Genes) in each group, revealing that those differences were related to spermatogenesis, male gamete generation, as well as reproduction. In conclusion, this study can help us find the expression regularity of missing proteins and help related scientists understand the physiological functions of testis and epididymis more deeply.


Subject(s)
Epididymis/chemistry , Gene Expression Profiling/methods , Protein Interaction Maps , Proteomics/methods , Testis/chemistry , Chromatography, Liquid , Gene Expression Regulation , Gene Regulatory Networks , High-Throughput Nucleotide Sequencing , Humans , Male , Mutation , Organ Specificity , Sequence Analysis, RNA , Spermatogenesis , Tandem Mass Spectrometry
2.
J Proteome Res ; 18(12): 4167-4179, 2019 12 06.
Article in English | MEDLINE | ID: mdl-31601107

ABSTRACT

With 2129 proteins still classified by the Human Proteome Organisation Human Proteome Project (HPP) as "missing" without compelling evidence of protein existence (PE) in humans, we hypothesized that in-depth proteomic characterization of tissues that are technically challenging to access and extract would yield evidence for tissue-specific missing proteins. Paradoxically, although the skeleton is the most massive tissue system in humans, as one of the poorest characterized by proteomics, bone falls under the HPP umbrella term as a "rare tissue". Therefore, we aimed to optimize mineralized tissue protein extraction methodology and workflows for proteomic and data analyses of small quantities of healthy young adult human alveolar bone. Osteoid was solubilized by GuHCl extraction, with hydroxyapatite-bound proteins then released by ethylenediaminetetraacetic acid demineralization. A subsequent GuHCl solubilization extraction was followed by solid-phase digestion of the remaining insoluble cross-linked protein using trypsin and then 6 M urea dissolution incorporating LysC digestion. Bone extracts were digested in parallel using trypsin, LysargiNase, AspN, or GluC prior to liquid chromatography-mass spectrometry analysis. Terminal Amine Isotopic Labeling of Substrates was used to purify semitryptic peptides, identifying natural and proteolytic-cleaved neo N-termini of bone proteins. Our strategy enabled complete solubilization of the organic bone matrix leading to extensive categorization of bone proteins in different bone matrix extracts, and hence matrix compartments, for the first time. Moreover, this led to the high confidence identification of pannexin-3, a "missing protein", found only in the insoluble collagenous matrix and revealed for the first time by trypsin solid-phase digestion. We also found a singleton proteotypic peptide of another missing protein, meiosis inhibitor protein 1. 
We also identified 17 proteins classified in neXtprot as PE1 based on evidence other than from MS, termed non-MS PE1 proteins, including ≥9-mer proteotypic peptides of four proteins.


Subject(s)
Alveolar Process/chemistry , Proteins/isolation & purification , Proteomics/methods , Adolescent , Chemical Fractionation , Connexins/analysis , Connexins/isolation & purification , Databases, Protein , Durapatite/chemistry , Edetic Acid/chemistry , Female , Humans , Isotope Labeling , Mass Spectrometry , Peptide Mapping , Proteins/metabolism , Solubility , Trypsin/chemistry , Young Adult
3.
Proteomics ; 18(8): e1700386, 2018 04.
Article in English | MEDLINE | ID: mdl-29474001

ABSTRACT

Chromosome-centric Human Proteome Project aims at identifying and characterizing protein products encoded from all human protein-coding genes. As of early 2017, 19 837 protein-coding genes have been annotated in the neXtProt database including 2691 missing proteins that have never been identified by mass spectrometry. Missing proteins may be low abundant in many cell types or expressed only in a few cell types in human body such as sperms in testis. In this study, we performed expression proteomics of two near-haploid cell types such as HAP1 and KBM-7 to hunt for missing proteins. Proteomes from the two haploid cell lines were analyzed on an LTQ Orbitrap Velos, producing a total of 200 raw mass spectrometry files. After applying 1% false discovery rates at both levels of peptide-spectrum matches and proteins, more than 10 000 proteins were identified from HAP1 and KBM-7, resulting in the identification of nine missing proteins. Next, unmatched spectra were searched against protein databases translated in three frames from noncoding RNAs derived from RNA-Seq data, resulting in six novel protein-coding regions after careful manual inspection. This study demonstrates that expression proteomics coupled to proteogenomic analysis can be employed to identify many annotated and unannotated missing proteins.
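
The three-frame translation step mentioned in this abstract can be illustrated with a minimal Python sketch. It assumes the standard genetic code and forward-strand frames only; the function names `translate` and `three_frame_translation` are hypothetical and are not taken from the study, whose actual software is not described here.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(seq):
    """Translate a nucleotide string codon by codon; '*' marks a stop codon."""
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA input
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X")  # 'X' for codons with ambiguous bases
        for i in range(0, len(seq) - 2, 3)
    )


def three_frame_translation(seq):
    """Return the translations of the three forward reading frames."""
    return [translate(seq[frame:]) for frame in range(3)]


if __name__ == "__main__":
    for frame, protein in enumerate(three_frame_translation("AUGGCCUAA")):
        print(frame, protein)
```

In a proteogenomic search, the translated frames would typically be split at stop codons and added to the protein database used for matching spectra; that step is omitted here.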


Subject(s)
Haploidy , Proteogenomics/methods , Proteome/genetics , Transcriptome , Amino Acid Sequence , Cell Line , Humans , Proteome/analysis , RNA, Untranslated/genetics , Sequence Analysis, RNA/methods , Tandem Mass Spectrometry/methods
4.
J Proteome Res ; 17(12): 4042-4050, 2018 12 07.
Article in English | MEDLINE | ID: mdl-30269496

ABSTRACT

An important goal of the Human Proteome Organization (HUPO) Chromosome-centric Human Proteome Project (C-HPP) is to correctly define the number of canonical proteins encoded by their cognate open reading frames on each chromosome in the human genome. When identified with high confidence of protein evidence (PE), such proteins are termed PE1 proteins in the online database resource, neXtProt. However, proteins that have not been identified unequivocally at the protein level but that have other evidence suggestive of their existence (PE2-4) are termed missing proteins (MPs). The number of MPs has been reduced from 5511 in 2012 to 2186 in 2018 (neXtProt 2018-01-17 release). Although the annotation of the human proteome has made significant progress, the "parts list" alone does not inform function. Indeed, 1937 proteins representing ∼10% of the human proteome have no function either annotated from experimental characterization or predicted by homology to other proteins. Specifically, these 1937 "dark proteins" of the so-called dark proteome are composed of 1260 functionally uncharacterized but identified PE1 proteins, designated as uPE1, plus 677 MPs from categories PE2-PE4, which also have no known or predicted function and are termed uMPs. At the HUPO-2017 Annual Meeting, the C-HPP officially adopted the uPE1 pilot initiative, with 14 participating international teams later committing to demonstrate the feasibility of the functional characterization of large numbers of dark proteins (CP), starting first with 50 uPE1 proteins, in a stepwise chromosome-centric organizational manner. The second aim of the feasibility phase to characterize protein (CP) functions of 50 uPE1 proteins, termed the neXt-CP50 initiative, is to utilize a variety of approaches and workflows according to individual team expertise, interest, and resources so as to enable the C-HPP to recommend experimentally proven workflows to the proteome community within 3 years. 
The results from this pilot will not only be the cornerstone of a larger characterization initiative but also enhance understanding of the human proteome and integrated cellular networks for the discovery of new mechanisms of pathology, mechanistically informative biomarkers, and rational drug targets.


Subject(s)
Chromosomes, Human/genetics , Databases, Protein , Proteome/analysis , Genome, Human , Humans , Mass Spectrometry , Molecular Sequence Annotation , Open Reading Frames , Pilot Projects , Proteome/genetics
5.
J Proteome Res ; 16(12): 4455-4467, 2017 12 01.
Article in English | MEDLINE | ID: mdl-28960081

ABSTRACT

One of the major goals of the Chromosome-Centric Human Proteome Project (C-HPP) is to fill the knowledge gaps between human genomic information and the corresponding proteomic information. These gaps are due to "missing" proteins (MPs)-predicted proteins with insufficient evidence from mass spectrometry (MS), biochemical, structural, or antibody analyses-that currently account for 2579 of the 19587 predicted human proteins (neXtProt, 2017-01). We address some of the lessons learned from the inconsistent annotations of missing proteins in databases (DB) and demonstrate a systematic proteogenomic approach designed to explore a potential new function of a known protein. To illustrate a cautious and strategic approach for characterization of novel function in vitro and in vivo, we present the case of Na(+)/H(+) exchange regulatory cofactor 1 (NHERF1/SLC9A3R1, located at chromosome 17q25.1; hereafter NHERF1), which was mistakenly labeled as an MP in one DB (Global Proteome Machine Database; GPMDB, 2011-09 release) but was well known in another public DB and in the literature. As a first step, NHERF1 was determined by MS and immunoblotting for its molecular identity. We next investigated the potential new function of NHERF1 by carrying out the quantitative MS profiling of placental trophoblasts (PXD004723) and functional study of cytotrophoblast JEG-3 cells. We found that NHERF1 was associated with trophoblast differentiation and motility. To validate this newly found cellular function of NHERF1, we used the Caenorhabditis elegans mutant of nrfl-1 (a nematode ortholog of NHERF1), which exhibits a protruding vulva (Pvl) and egg-laying-defective phenotype, and performed genetic complementation work. The nrfl-1 mutant was almost fully rescued by the transfection of the recombinant transgenic construct that contained human NHERF1. These results suggest that NHERF1 could have a previously unknown function in pregnancy and in the development of human embryos. 
Our study outlines a stepwise experimental platform to explore new functions of ambiguously denoted candidate proteins and scrutinizes the mandated DB search for the selection of MPs to study in the future.


Subject(s)
Phosphoproteins/physiology , Proteogenomics/methods , Sodium-Hydrogen Exchangers/physiology , Animals , Caenorhabditis elegans/genetics , Cell Differentiation , Cell Movement , Databases, Protein , Female , Humans , Immunoblotting , Mass Spectrometry , Reproduction , Transgenes , Trophoblasts/cytology
6.
J Proteome Res ; 15(11): 4082-4090, 2016 11 04.
Article in English | MEDLINE | ID: mdl-27537616

ABSTRACT

In the Chromosome-Centric Human Proteome Project (C-HPP), false-positive identification by peptide spectrum matches (PSMs) after database searches is a major issue for proteogenomic studies using liquid-chromatography and mass-spectrometry-based large proteomic profiling. Here we developed a simple strategy for protein identification, with a controlled false discovery rate (FDR) at the protein level, using an integrated proteomic pipeline (IPP) that consists of four engrailed steps as follows. First, using three different search engines, SEQUEST, MASCOT, and MS-GF+, individual proteomic searches were performed against the neXtProt database. Second, the search results from the PSMs were combined using statistical evaluation tools including DTASelect and Percolator. Third, the peptide search scores were converted into E-scores normalized using an in-house program. Last, ProteinInferencer was used to filter the proteins containing two or more peptides with a controlled FDR of 1.0% at the protein level. Finally, we compared the performance of the IPP to a conventional proteomic pipeline (CPP) for protein identification using a controlled FDR of <1% at the protein level. Using the IPP, a total of 5756 proteins (vs 4453 using the CPP) including 477 alternative splicing variants (vs 182 using the CPP) were identified from human hippocampal tissue. In addition, a total of 10 missing proteins (vs 7 using the CPP) were identified with two or more unique peptides, and their tryptic peptides were validated using MS/MS spectral pattern from a repository database or their corresponding synthetic peptides. This study shows that the IPP effectively improved the identification of proteins, including alternative splicing variants and missing proteins, in human hippocampal tissues for the C-HPP. All RAW files used in this study were deposited in ProteomeXchange (PXD000395).
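
The final filtering step described above (proteins with two or more peptides, FDR controlled at the protein level) can be sketched as follows. This is only an illustration under stated assumptions: it uses a simple target-decoy estimate (decoys divided by targets above a score cutoff), which the abstract does not confirm as the method behind ProteinInferencer, and the record layout (`id`, `score`, `n_peptides`, `is_decoy`) and function name are hypothetical.

```python
def filter_proteins(proteins, max_fdr=0.01, min_peptides=2):
    """Return target protein ids passing a peptide-count and protein-level FDR filter.

    proteins: list of dicts with hypothetical keys
              'id', 'score' (higher is better), 'n_peptides', 'is_decoy'.
    """
    # Keep only proteins supported by enough peptides.
    candidates = [p for p in proteins if p["n_peptides"] >= min_peptides]
    ranked = sorted(candidates, key=lambda p: p["score"], reverse=True)

    targets = decoys = 0
    cutoff = 0  # number of top-ranked entries to accept
    for rank, protein in enumerate(ranked, start=1):
        if protein["is_decoy"]:
            decoys += 1
        else:
            targets += 1
        # Estimated FDR among entries ranked at or above this position.
        if targets and decoys / targets <= max_fdr:
            cutoff = rank

    return [p["id"] for p in ranked[:cutoff] if not p["is_decoy"]]


if __name__ == "__main__":
    demo = [
        {"id": "A", "score": 10, "n_peptides": 3, "is_decoy": False},
        {"id": "B", "score": 9, "n_peptides": 2, "is_decoy": False},
        {"id": "D", "score": 8, "n_peptides": 2, "is_decoy": True},
        {"id": "C", "score": 7, "n_peptides": 1, "is_decoy": False},
    ]
    print(filter_proteins(demo, max_fdr=0.01))
```

The cutoff is taken at the lowest-scoring rank where the estimated FDR is still within the limit, so a decoy early in the list does not necessarily end the accepted set.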


Subject(s)
Hippocampus/chemistry , Proteogenomics/methods , Proteomics/methods , Search Engine , Alternative Splicing , Computational Biology/methods , Databases, Protein , False Positive Reactions , Humans , Mass Spectrometry/methods
7.
J Proteome Res ; 14(9): 3680-92, 2015 Sep 04.
Article in English | MEDLINE | ID: mdl-26144840

ABSTRACT

As part of the Chromosome-Centric Human Proteome Project (C-HPP) mission, laboratories all over the world have tried to map the entire missing proteins (MPs) since 2012. On the basis of the first and second Chinese Chromosome Proteome Database (CCPD 1.0 and 2.0) studies, we developed systematic enrichment strategies to identify MPs that fell into four classes: (1) low molecular weight (LMW) proteins, (2) membrane proteins, (3) proteins that contained various post-translational modifications (PTMs), and (4) nucleic acid-associated proteins. Of 8845 proteins identified in 7 data sets, 79 proteins were classified as MPs. Among data sets derived from different enrichment strategies, data sets for LMW and PTM yielded the most novel MPs. In addition, we found that some MPs were identified in multiple-data sets, which implied that tandem enrichments methods might improve the ability to identify MPs. Moreover, low expression at the transcription level was the major cause of the "missing" of these MPs; however, MPs with higher expression level also evaded identification, most likely due to other characteristics such as LMW, high hydrophobicity and PTM. By combining a stringent manual check of the MS2 spectra with peptides synthesis verification, we confirmed 30 MPs (neXtProt PE2 ∼ PE4) and 6 potential MPs (neXtProt PE5) with authentic MS evidence. By integrating our large-scale data sets of CCPD 2.0, the number of identified proteins has increased considerably beyond simulation saturation. Here, we show that special enrichment strategies can break through the data saturation bottleneck, which could increase the efficiency of MP identification in future C-HPP studies. All 7 data sets have been uploaded to ProteomeXchange with the identifier PXD002255.


Subject(s)
Proteins/chemistry , Proteome , Adult , Aged , Aged, 80 and over , Cell Line , Female , Humans , Male , Middle Aged , Tandem Mass Spectrometry
8.
J Proteome Res ; 14(12): 4985-94, 2015 Dec 04.
Article in English | MEDLINE | ID: mdl-26561870

ABSTRACT

Although the "missing protein" is a temporary concept in C-HPP, the biological information for their "missing" could be an important clue in evolutionary studies. Here we classified missing-protein-encoding genes into two groups, the genes encoding PE2 proteins (with transcript evidence) and the genes encoding PE3/4 proteins (with no transcript evidence). These missing-protein-encoding genes distribute unevenly among different chromosomes, chromosomal regions, or gene clusters. In the view of evolutionary features, PE3/4 genes tend to be young, spreading at the nonhomology chromosomal regions and evolving at higher rates. Interestingly, there is a higher proportion of singletons in PE3/4 genes than the proportion of singletons in all genes (background) and OTCSGs (organ, tissue, cell type-specific genes). More importantly, most of the paralogous PE3/4 genes belong to the newly duplicated members of the paralogous gene groups, which mainly contribute to special biological functions, such as "smell perception". These functions are heavily restricted into specific type of cells, tissues, or specific developmental stages, acting as the new functional requirements that facilitated the emergence of the missing-protein-encoding genes during evolution. In addition, the criteria for the extremely special physical-chemical proteins were first set up based on the properties of PE2 proteins, and the evolutionary characteristics of those proteins were explored. Overall, the evolutionary analyses of missing-protein-encoding genes are expected to be highly instructive for proteomics and functional studies in the future.


Subject(s)
Chromosomes, Human , Proteins/physiology , Evolution, Molecular , Gene Duplication , Humans , Proteins/chemistry , Proteins/genetics
9.
J Proteome Res ; 14(12): 5007-16, 2015 Dec 04.
Article in English | MEDLINE | ID: mdl-26584007

ABSTRACT

This is a report of a human proteome project (HPP) related to chromosome 9 (Chr 9). To reveal missing proteins and undiscovered features in proteogenomes, both LC-MS/MS analysis and next-generation RNA sequencing (RNA-seq)-based identification and characterization were conducted on five pairs of lung adenocarcinoma tumors and adjacent nontumor tissues. Before our previous Chromosome-Centric Human Proteome Project (C-HPP) special issue, there were 170 remaining missing proteins on Chr 9 (neXtProt 2013.09.26 rel.); 133 remain at present (neXtProt 2015.04.28 rel.). In the proteomics study, we found two missing protein candidates that require follow-up work and one unrevealed protein across all chromosomes. RNA-seq analysis detected RNA expression for four nonsynonymous (NS) single nucleotide polymorphisms (SNPs) (in CDH17, HIST1H1T, SAPCD2, and ZNF695) and three synonymous SNPs (in CDH17, CST1, and HNF1A) in all five tumor tissues but not in any of the adjacent normal tissues. By constructing a cancer patient sample-specific protein database based on individual RNA-seq data and by searching the proteomics data from the same sample, we identified four missense mutations in four genes (LTF, HDLBP, TF, and HBD). Two of these mutations were found in tumor samples but not in paired normal tissues. In summary, our proteogenomic study of human primary lung tumor tissues detected additional and revealed novel missense mutations and synonymous SNP signatures, some of which are specific to lung cancers. Data from mass spectrometry have been deposited in the ProteomeXchange with the identifier PXD002523.


Subject(s)
Adenocarcinoma/genetics , Adenocarcinoma/metabolism , Chromosomes, Human, Pair 9 , Lung Neoplasms/genetics , Lung Neoplasms/metabolism , Mutation, Missense , Polymorphism, Single Nucleotide , Adenocarcinoma of Lung , Adult , Aged , Cadherins/genetics , Cadherins/metabolism , Female , Gene Expression Profiling , Humans , Male , Middle Aged , Peptides/analysis , Peptides/genetics , Proteome/genetics , Pseudogenes , RNA, Long Noncoding , Sequence Analysis, RNA , Tandem Mass Spectrometry
10.
J Proteome Res ; 14(9): 3710-9, 2015 Sep 04.
Article in English | MEDLINE | ID: mdl-26272709

ABSTRACT

Since the launch of the Chromosome-centric Human Proteome Project (C-HPP) in 2012, the number of "missing" proteins has fallen to 2932, down from ∼5932 since the number was first counted in 2011. We compared the characteristics of missing proteins with those of already annotated proteins with respect to transcriptional expression pattern and the time periods in which newly identified proteins were annotated. We learned that missing proteins commonly exhibit lower levels of transcriptional expression and less tissue-specific expression compared with already annotated proteins. This makes it more difficult to identify missing proteins as time goes on. One of the C-HPP goals is to identify alternative spliced product of proteins (ASPs), which are usually difficult to find by shot-gun proteomic methods due to their sequence similarities with the representative proteins. To resolve this problem, it may be necessary to use a targeted proteomics approach (e.g., selected and multiple reaction monitoring [S/MRM] assays) and an innovative bioinformatics platform that enables the selection of target peptides for rarely expressed missing proteins or ASPs. Given that the success of efforts to identify missing proteins may rely on more informative public databases, it was necessary to upgrade the available integrative databases. To this end, we attempted to improve the features and utility of GenomewidePDB by integrating transcriptomic information (e.g., alternatively spliced transcripts), annotated peptide information, and an advanced search interface that can find proteins of interest when applying a targeted proteomics strategy. This upgraded version of the database, GenomewidePDB 2.0, may not only expedite identification of the remaining missing proteins but also enhance the exchange of information among the proteome community. GenomewidePDB 2.0 is available publicly at http://genomewidepdb.proteomix.org/.


Subject(s)
Chromosome Mapping , Databases, Genetic , Databases, Protein , Genome, Human , Proteome , Amino Acid Sequence , Humans , Molecular Sequence Data , Sequence Homology, Amino Acid
11.
J Proteome Res ; 14(12): 4959-66, 2015 Dec 04.
Article in English | MEDLINE | ID: mdl-26330117

ABSTRACT

Approximately 2.9 billion long base-pair human reference genome sequences are known to encode some 20 000 representative proteins. However, 3000 proteins, that is, ~15% of all proteins, have no or very weak proteomic evidence and are still missing. Missing proteins may be present in rare samples in very low abundance or be only temporarily expressed, causing problems in their detection and protein profiling. In particular, some technical limitations cause missing proteins to remain unassigned. For example, current mass spectrometry techniques have high limits and error rates for the detection of complex biological samples. An insufficient proteome coverage in a reference sequence database and spectral library also raises major issues. Thus, the development of a better strategy that results in greater sensitivity and accuracy in the search for missing proteins is necessary. To this end, we used a new strategy, which combines a reference spectral library search and a simulated spectral library search, to identify missing proteins. We built the human iRefSPL, which contains the original human reference spectral library and additional peptide sequence-spectrum match entries from other species. We also constructed the human simSPL, which contains the simulated spectra of 173 907 human tryptic peptides determined by MassAnalyzer (version 2.3.1). To prove the enhanced analytical performance of the combination of the human iRefSPL and simSPL methods for the identification of missing proteins, we attempted to reanalyze the placental tissue data set (PXD000754). The data from each experiment were analyzed using PeptideProphet, and the results were combined using iProphet. For the quality control, we applied the class-specific false-discovery rate filtering method. All of the results were filtered at a false-discovery rate of <1% at the peptide and protein levels. The quality-controlled results were then cross-checked with the neXtProt DB (2014-09-19 release). 
The two spectral libraries, iRefSPL and simSPL, were designed to ensure no overlap of the proteome coverage. They were shown to be complementary to spectral library searching and significantly increased the number of matches. From this trial, 12 new missing proteins were identified that passed the following criterion: at least 2 peptides of 7 or more amino acids in length or one of 9 or more amino acids in length with one or more unique sequences. Thus, the iRefSPL and simSPL combination can be used to help identify peptides that have not been detected by conventional sequence database searches with improved sensitivity and a low error rate.
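
The acceptance criterion quoted in this abstract (at least two peptides of 7 or more amino acids, or one of 9 or more) is easy to express as a predicate. The sketch below assumes the input already contains only peptides that map uniquely to the candidate protein; the function name `passes_criterion` is hypothetical.

```python
def passes_criterion(unique_peptides):
    """Check the peptide-evidence rule stated in the abstract.

    unique_peptides: peptide sequences assumed to map uniquely to one protein.
    Returns True if there are at least two distinct peptides of length >= 7,
    or at least one peptide of length >= 9.
    """
    lengths = [len(peptide) for peptide in set(unique_peptides)]  # ignore repeats
    two_of_seven = sum(length >= 7 for length in lengths) >= 2
    one_of_nine = any(length >= 9 for length in lengths)
    return two_of_seven or one_of_nine


if __name__ == "__main__":
    print(passes_criterion(["AEFVEVTK", "YLYEIAR"]))  # two peptides of >= 7 residues
    print(passes_criterion(["YLYEIAR"]))              # a single 7-residue peptide
```

Repeated observations of the same sequence are collapsed before counting, since the rule concerns distinct peptides rather than spectrum counts.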


Subject(s)
Chromosomes, Human , Databases, Protein , Proteome , Proteomics/methods , Amino Acid Sequence , Animals , Computational Biology/methods , Genome, Human , Humans , Mass Spectrometry , Molecular Sequence Data , Peptides/analysis , Proteins/genetics , Proteins/metabolism
12.
J Proteome Res ; 14(12): 4976-84, 2015 Dec 04.
Article in English | MEDLINE | ID: mdl-26500078

ABSTRACT

Considering the technical limitations of mass spectrometry in protein identification, the mRNAs bound to ribosomes (RNC-mRNA) are assumed to reflect the mRNAs participating in the translational process. The RNC-mRNA data are reasoned to be useful for appraising the missing proteins. A set of multiomics data including free-mRNAs, RNC-mRNAs, and proteomes was acquired from three liver cancer cell lines. On the basis of the missing proteins in neXtProt (release 2014-09-19), the bioinformatics analysis was carried out in three phases: (1) finding how many neXtProt missing proteins have or do not have RNA-seq and/or MS/MS evidence, (2) analyzing specific physicochemical and biological properties of the missing proteins that lack both RNA-seq and MS/MS evidence, and (3) analyzing the combined properties of these missing proteins. A total of 1501 missing proteins were found by neither RNC-mRNA nor MS/MS in the three liver cancer cell lines. Of these missing proteins, some are expected to show higher hydrophobicity, unsuitability for detection, or sensory functions as properties at the protein level, while some are predicted to have nonexpressing chromatin structures at the corresponding gene level. With further integrated analysis, we could attribute 93% of them (1391/1501) to these causal factors, which result in expression products that are scarcely detected by RNA-seq or MS/MS.


Subject(s)
Computational Biology/methods , Proteins/analysis , Proteomics/methods , RNA, Messenger/metabolism , Ribosomes/metabolism , Cell Line, Tumor , Deoxyribonuclease I/metabolism , Gene Ontology , Humans , Hydrophobic and Hydrophilic Interactions , Liver Neoplasms/genetics , Liver Neoplasms/metabolism , Protein Biosynthesis , Proteins/chemistry , Proteins/genetics , Ribosomes/genetics , Sequence Analysis, RNA , Tandem Mass Spectrometry
13.
J Bioinform Comput Biol ; 17(2): 1950013, 2019 04.
Article in English | MEDLINE | ID: mdl-31057071

ABSTRACT

Functional Class Scoring (FCS) is a network-based approach previously demonstrated to be powerful in missing protein prediction (MPP). We updated its performance evaluation using data derived from new proteomics technology (SWATH) and also checked for reproducibility using two independent datasets profiling the kidney tissue proteome. We also evaluated the objectivity of the FCS p-value, and followed up on the value of MPP from predicted complexes. Our results suggest that (1) FCS p-values are non-objective, and are confounded strongly by complex size, (2) the best recovery performance does not necessarily lie at standard p-value cutoffs, (3) while predicted complexes may be used for augmenting MPP, they are inferior to real complexes, and are further confounded by issues relating to network coverage and quality, and (4) moderate-sized complexes of size 5 to 10 still exhibit considerable instability; we find that FCS works best with big complexes. While FCS is a powerful approach, blind reliance on its non-objective p-value is ill-advised.


Subject(s)
Computational Biology/methods , Proteomics/methods , Algorithms , Databases, Protein , Humans , Kidney/metabolism , Kidney Neoplasms/metabolism , Multiprotein Complexes , Protein Interaction Maps , Proteomics/statistics & numerical data , Reproducibility of Results
14.
Sheng Wu Gong Cheng Xue Bao ; 34(11): 1860-1869, 2018 Nov 25.
Article in Chinese | MEDLINE | ID: mdl-30499281

ABSTRACT

Small proteins (SPs) are defined as peptides of 100 amino acids or less encoded by short open reading frames (sORFs). SPs participate in a wide range of functions in cells, including gene regulating, cell signaling and metabolism. However, most annotated SPs in all living organisms are currently lacking expression evidence at the protein level and regarded as missing proteins (MPs). High efficient SPs identification is the prerequisite for their functional study and contribution to MPs searching. In this study, we identified 72 SPs and successfully validated 9 MPs from Saccharomyces cerevisiae based on SPs enrichment strategy. In-depth analysis showed that the missing factors of MPs were low molecular weight, low abundant, hydrophobicity, lower codon usage bias and unstable. The small protein-based enrichment can be used as MPs searching strategy, which might provide the foundation for their further function research.


Subject(s)
Saccharomyces cerevisiae Proteins/analysis , Saccharomyces cerevisiae , Codon , Open Reading Frames , Peptides
15.
Oncotarget ; 9(1): 442-452, 2018 Jan 02.
Article in English | MEDLINE | ID: mdl-29416626

ABSTRACT

Glycine N-methyltransferase is a tumor suppressor gene for hepatocellular carcinoma, which can activate DNA methylation by inducing the conversion of S-adenosylmethionine to S-adenosylhomocysteine. Previous studies have indicated that the expression of Glycine N-methyltransferase is inhibited in hepatocellular carcinoma. To confirm and identify missing proteins, the pathologic analysis of the tumor-bearing mice will provide critical histologic information. Such a mouse model is applied as a screening tool for hepatocellular carcinoma as well as a strategy for missing protein discovery. In this study we designed an analysis platform using the human proteome atlas to compare the possible missing proteins to human whole chromosomes. This will integrate the information from animal studies to establish an optimal technique in missing protein biomarker discovery.

16.
J Proteomics ; 149: 7-14, 2016 10 21.
Article in English | MEDLINE | ID: mdl-27535355

ABSTRACT

NeXtProt is a web-based protein knowledge platform that supports research on human proteins. NeXtProt (release 2015-04-28) lists 20,060 proteins; among them, 3373 canonical proteins (16.8%) lack credible experimental evidence at the protein level (PE2:PE5). Therefore, they are considered "missing proteins". A comprehensive bioinformatic workflow has been proposed to analyze these "missing" proteins. The aims of the current study were to analyze the physicochemical properties, the existence and distribution of tryptic cleavage sites, and to pinpoint the signature peptides of the missing proteins. Our findings showed that 23.7% of missing proteins were hydrophobic proteins possessing transmembrane domains (TMD). Also, forty missing entries generated tryptic peptides that were either out of the mass detection range (>30 aa) or mapped to different proteins (<9 aa). Additionally, 21% of missing entries did not generate any unique tryptic peptides. An in silico endopeptidase combination strategy increased the possibility of identifying missing proteins. Coherently, using both a mature protein database and a signal peptidome database could be a promising option to identify some missing proteins by targeting their unique N-terminal tryptic peptide from the mature protein database and/or C-terminal tryptic peptide from the signal peptidome database. In conclusion, identification of missing proteins requires additional consideration during sample preparation, extraction, digestion and data analysis to increase its incidence of identification.
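
The in silico tryptic analysis summarized above can be sketched in a few lines of Python. The cleavage rule used here (cut after K or R unless the next residue is P) is a common convention and an assumption on our part, since the abstract does not state the rule the authors applied; the length thresholds of 9 and 30 residues come from the abstract, and the function names are hypothetical.

```python
def tryptic_digest(sequence):
    """Cleave after K or R, except when the next residue is P (assumed rule)."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):  # C-terminal peptide without a trailing K/R
        peptides.append(sequence[start:])
    return peptides


def classify_by_length(peptides, min_len=9, max_len=30):
    """Group peptides by the length limits quoted in the abstract."""
    return {
        "too_short": [p for p in peptides if len(p) < min_len],
        "in_range": [p for p in peptides if min_len <= len(p) <= max_len],
        "too_long": [p for p in peptides if len(p) > max_len],
    }


if __name__ == "__main__":
    peptides = tryptic_digest("MAKGGRPLLKDE")
    print(peptides)
    print(classify_by_length(peptides))
```

A protein whose peptides all fall into `too_short` or `too_long` would be a candidate for digestion with a different or additional endopeptidase, which is the kind of combination strategy the abstract refers to.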


Subject(s)
Computational Biology/methods , Peptide Mapping , Peptides/chemistry , Proteome/chemistry , Amino Acid Sequence , Computer Simulation , Databases, Protein , Datasets as Topic , Humans , Hydrophobic and Hydrophilic Interactions , Peptides/classification , Proteome/classification , Trypsin/chemistry
17.
BMC Syst Biol ; 10(Suppl 4): 113, 2016 Dec 23.
Article in English | MEDLINE | ID: mdl-28155671

ABSTRACT

BACKGROUND: With the rapid development of high-throughput sequencing technology, proteomics research has become a trendy field in the post-genomics era. It is necessary to identify all the native-encoding protein sequences for further function and pathway analysis. Toward that end, the Human Proteome Organization launched the Human Proteome Project in 2011. However, many proteins are hard to detect by experimental methods, which has become one of the bottlenecks in the Human Proteome Project. In consideration of the complexity of detecting these missing proteins by wet-experiment approaches, here we use a bioinformatics method to pre-filter the missing proteins. RESULTS: Since there is an analogy between biological sequences and natural language, n-gram models from the Natural Language Processing field have been used to filter the missing proteins. The dataset used in this study contains 616 missing proteins from the "uncertain" category of the neXtProt database. There are 102 proteins deduced by the n-gram model that have a high probability of being native human proteins. We perform a detailed analysis of the predicted structure and function of these missing proteins and also compare the high-probability proteins with other mass spectrum datasets. The evaluation shows that the results reported here are in good agreement with those obtained by other well-established databases. CONCLUSION: The analysis shows that 102 proteins may be native gene-coding proteins and that some of the missing proteins are membrane or natively disordered proteins, which are hard to detect by experimental methods.
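
To make the n-gram idea concrete, the following Python sketch trains a bigram model on known protein sequences and scores a candidate by its average log-probability. The choice of n = 2, add-one smoothing, and all names are assumptions made for illustration; the abstract does not specify the order of the model, the smoothing scheme, or the decision threshold used in the study.

```python
import math
from collections import Counter

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def train_bigram(sequences):
    """Count residue pairs and their left-hand contexts in the training set."""
    pair_counts, context_counts = Counter(), Counter()
    for seq in sequences:
        for left, right in zip(seq, seq[1:]):
            pair_counts[left, right] += 1
            context_counts[left] += 1
    return pair_counts, context_counts


def avg_log_prob(seq, model):
    """Average log P(next residue | current residue) with add-one smoothing."""
    pair_counts, context_counts = model
    vocab = len(ALPHABET)
    total = 0.0
    for left, right in zip(seq, seq[1:]):
        prob = (pair_counts[left, right] + 1) / (context_counts[left] + vocab)
        total += math.log(prob)
    return total / max(len(seq) - 1, 1)  # normalize so lengths are comparable


if __name__ == "__main__":
    model = train_bigram(["MKKLLA", "MKKLLG"])  # toy training data
    print(avg_log_prob("MKKLL", model))
    print(avg_log_prob("WWWWW", model))
```

Candidates whose score is close to that of confirmed proteins would be kept for follow-up; in practice the model would be trained on a large set of experimentally confirmed sequences rather than the toy data shown here.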


Subject(s)
Natural Language Processing , Proteome/metabolism , Proteomics/methods , Databases, Protein , Gene Ontology , Humans , Intracellular Space/metabolism , Models, Theoretical , Probability , Protein Transport , Proteome/chemistry , Proteome/genetics