ABSTRACT
To improve the flavor of mulberry wine through semi-artificial inoculation fermentation with Saccharomyces cerevisiae, we studied the dynamic changes in the microbiota, the physicochemical properties, and the metabolite profiles, as well as their interactions, during fermentation. The abundance of lactic acid bacteria (Weissella, Lactobacillus, Fructobacillus, and Pediococcus) increased significantly during fermentation, while yeasts gradually established dominance. Inter-kingdom network analysis of the dominant genera further identified the following as core microbiota: Alternaria, Botrytis, Kazachstania, Acremonium, Mycosphaerella, Pediococcus, Gardnerella, and Schizothecium. Additionally, pH, alcohol, and total acid were significantly affected by microbiota variation. Fourteen of the identified volatile compounds were screened as key differential aroma compounds using PCA, OPLS-DA, and rOAV. A correlation network linking the core microbiota to these key aroma compounds revealed that Kazachstania and Pediococcus had stronger correlations with 1-butanol, 3-methyl-, propanoic acid, and 2-methyl-ethyl ester.
ABSTRACT
Accurate genome annotation, the foundation of life science research in the genome era, is hampered by limited known gene models, nonstandard start codons, and the limited homology of annotated genes in other organisms. LysargiNase mirrors trypsin at the cleavage sites, providing the opportunity to identify peptides other than tryptic peptides. In this study, we used an in-house developed acetylated LysargiNase (Ac-LysargiNase), which has higher activity and stability, to supplement the widely used trypsin in a proteomic study of non-pathogenic Mycolicibacterium smegmatis MC2 155. We identified 27,582 peptides from 3844 annotated proteins and 332 novel genome search-specific peptides (GSSPs). Among these GSSPs, 88 peptides were annotated in another M. smegmatis genome database, and 41 were verified as novel peptides by predicted theoretical spectra and their corresponding 15N-labeling spectra. Further analysis revealed that 17 verified GSSPs corrected the N-termini of 13 annotated genes. The other 24 verified GSSPs helped identify 17 novel open reading frames (ORFs) missed in previously annotated M. smegmatis genomes. Among these novel ORFs, four encoded relatively small proteins of fewer than 100 amino acid residues, and three were precisely identified with C-terminal peptides. Ac-LysargiNase helps with genome reannotation by identifying new genes and events in proteogenomic studies. SIGNIFICANCE: Correct genomic annotation is vital in the field of life sciences. Nonstandard start codons seriously affect the confirmation of the translation initiation sites (TISs) of an open reading frame (ORF), and unknown structural genes are easily missed in automated gene prediction. Although proteogenomics presents new avenues for validating gene expression and refining gene structure based on conventional tryptic peptides, determining the TISs and potential encoding genes is complicated. Thus, validation of TISs and encoding ORFs is crucial and urgent. Therefore, we recommend Ac-LysargiNase, a mirror enzyme of trypsin that can identify additional novel peptides for N-terminal correction and ORF identification.
Subjects
Peptides, Proteomics, Initiator Codon, Open Reading Frames, Peptides/metabolism, Proteins, Trypsin/chemistry
ABSTRACT
Although members of the Mycobacterium tuberculosis complex (MTBC) exhibit high similarity, they differ in virulence, immune response, and transmissibility. To understand the virulence of these bacteria and identify potential novel therapeutic targets, we systematically investigated the total cell protein contents of the virulent H37Rv, attenuated H37Ra, and avirulent M. bovis BCG vaccine strains at the log and stationary phases, based on tandem mass tag (TMT) quantitative proteomics. Data analysis showed that we achieved deep-coverage protein identification and quantification. Although 272 genetic variations have been reported between H37Ra and H37Rv, these showed very little expression difference in the log and stationary phases. Quantitative comparison revealed that H37Ra and H37Rv had significantly more dysregulated proteins in the log phase (227) than in the stationary phase (61), whereas BCG versus H37Rv and BCG versus H37Ra showed notably more differences in the stationary phase (1171 and 1124, respectively) than in the log phase (381 and 414, respectively). In the log phase, similar patterns of protein abundance were observed between H37Ra and BCG, whereas a more similar expression pattern was observed between H37Rv and H37Ra in the stationary phase. Bioinformatic analysis revealed that the upregulated proteins detected for H37Rv and H37Ra in the log phase were virulence-related factors. In both the log and stationary phases, the dysregulated proteins detected for BCG have also been identified as M. tuberculosis response proteins under dormancy conditions. We accordingly describe the proteomic profiles of H37Rv, H37Ra, and BCG, which we believe will provide a better understanding of H37Rv pathogenesis, H37Ra attenuation, and BCG immunoprotection.
Subjects
Mycobacterium bovis, Mycobacterium tuberculosis, Tuberculosis, BCG Vaccine, Humans, Mycobacterium bovis/genetics, Mycobacterium tuberculosis/metabolism, Proteomics/methods, Tuberculosis/microbiology, Virulence/genetics, Virulence Factors/metabolism
ABSTRACT
The Chromosome-Centric Human Proteome Project (C-HPP) was launched in 2012 to perfect the annotation of human protein existence by identifying stronger evidence of the expression of missing proteins (MPs) at the protein level. After an 8-year worldwide effort, the number of MPs in the neXtProt database decreased significantly from 5511 (2012-02-24) to 1899 (2020-01-17). It is now more difficult to provide confident evidence of the remaining MPs because of their specific characteristics, including low abundance, low molecular weight, unexpected modifications, transmembrane structure, and tissue-expression specificity. A higher resolution mass spectrometry (MS) interpretation engine, combined with multi-tissue large-scale proteomics, might provide an opportunity to identify these buried MPs in complex samples. In this study, Open-pFind was used to mine MPs from the data set of 20 pairs of healthy human tissues published by Wang et al. (Mol. Syst. Biol. 2019, 15 (2), e8503), combined with our large-scale testis data set digested by three enzymes (Glu-C, Lys-C, and trypsin) with specificity for different amino acid residues (J. Proteome Res. 2019, 18 (12), 4189-4196). A total of 1 535 536 peptides with 17 283 477 peptide-spectrum matches (PSMs) were mapped to 14 279 protein entries at a false discovery rate of <1% at the PSM, peptide, and protein levels. A total of 103 MP candidates were identified, among which 86 candidates had more unique peptides than in our single testis tissue data set. After rigorous screening, manual checks, peptide synthesis, and matching with documented peptides from PeptideAtlas, we validated four MPs, P0C7T8 (duodenum and small intestine), Q8WWZ4 (stomach and rectum), Q8IV35 (fallopian tube), and O14921 (tonsil), at the protein level. All MS raw files have been deposited to the ProteomeXchange with identifier PXD021391.
Subjects
Proteome, Proteomics, Female, Humans, Male, Mass Spectrometry, Molecular Weight, Peptides
ABSTRACT
Mass spectrometry (MS)-based identification of ubiquitinated sites requires trypsin digestion prior to MS analysis, which produces a signature peptide with a diglycine residue attached to the ubiquitinated lysine (K-ε-GG peptide). However, the missed cleavage of modified lysines by trypsin results in modified peptides with increased length and charge, whose detection by MS analysis is suppressed by the vast majority of internally unmodified peptides. LysargiNase, the mirror protease of trypsin, is reported to cleave before lysine and arginine residues and to be favorable for the identification of methylation and phosphorylation, but its digestive characteristics related to ubiquitination are unclear. Herein, we tested the capacity of the in-house developed acetylated LysargiNase (Ac-LysargiNase), which has high activity and stability, for cleaving ubiquitinated sites in both the seven types of ubiquitin chains and their corresponding K-ε-GG peptides. Interestingly, Ac-LysargiNase could efficiently cleave the K63-linked chain but had little effect on the other types of chains. Additionally, Ac-LysargiNase had higher exopeptidase activity than trypsin. Utilizing these features of the paired mirror proteases, a workflow of trypsin and Ac-LysargiNase tandem digestion was developed for the identification of ubiquitinated proteins. Through this method, the charge states and ionization capacity of the unmodified peptides were efficiently reduced, and the identification of modified sites consequently increased by 30% to 50%. Strikingly, approximately 15% of the modified sites were cleaved by Ac-LysargiNase, resulting in shorter K-ε-GG peptides for better identification. Ac-LysargiNase is expected to serve as an option for increasing the efficiency of modified site identification in ubiquitome research.
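The "mirror" relationship between the two proteases can be illustrated with a minimal in-silico digestion sketch. It assumes idealized rules only: trypsin cleaves after every lysine (K) or arginine (R), and LysargiNase cleaves before every K or R. Missed cleavages, the proline exception usually applied to trypsin, and the ubiquitin-specific behavior described in the abstract are deliberately ignored. The function names and the example sequence are hypothetical and do not come from the study.

```python
def digest_trypsin(seq: str) -> list[str]:
    """Idealized trypsin: cut C-terminal to every K or R (no proline rule)."""
    peptides, start = [], 0
    for i, aa in enumerate(seq):
        if aa in "KR":
            peptides.append(seq[start:i + 1])  # K/R ends the peptide
            start = i + 1
    if start < len(seq):
        peptides.append(seq[start:])  # trailing peptide without K/R
    return peptides


def digest_lysarginase(seq: str) -> list[str]:
    """Idealized LysargiNase: cut N-terminal to every K or R."""
    peptides, start = [], 0
    for i, aa in enumerate(seq):
        if aa in "KR" and i > start:
            peptides.append(seq[start:i])  # next peptide begins with K/R
            start = i
    if start < len(seq):
        peptides.append(seq[start:])
    return peptides


if __name__ == "__main__":
    example = "MAGKTLLRSEK"  # hypothetical sequence, for illustration only
    print(digest_trypsin(example))      # peptides ending in K/R
    print(digest_lysarginase(example))  # peptides starting with K/R
```

Under these simplified rules, tryptic peptides carry the basic residue at the C-terminus while LysargiNase peptides carry it at the N-terminus, which is why the two digests yield complementary peptide sets.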
Subjects
Lysine/analysis, Peptides/metabolism, Tandem Mass Spectrometry, Trypsin/metabolism, Amino Acid Sequence, High-Pressure Liquid Chromatography, Exopeptidases/metabolism, Lysine/metabolism, Peptides/chemistry, Ubiquitination
ABSTRACT
In recent years, high-throughput technologies have contributed to the development of a more precise picture of the human proteome. However, 2129 proteins remain listed as missing proteins (MPs) in the newest neXtProt release (2019-02). The main reasons that proteins remain missing include low abundance, low molecular weight, unexpected modifications, and membrane characteristics. Moreover, >50% of the MS/MS data have not been successfully identified in shotgun proteomics. Open-pFind, an efficient open search engine recently released by the pFind group in China, might provide an opportunity to identify these buried MPs in complex samples. In this study, proteins and potential MPs were identified using Open-pFind and three other search engines to compare their performance and efficiency on three large-scale data sets digested by three enzymes (Glu-C, Lys-C, and trypsin) with specificity for different amino acid (AA) residues. Our results demonstrated that Open-pFind identified 44.7-93.1% more peptide-spectrum matches and 21.3-61.6% more peptide sequences than the second-best search engine. As a result, Open-pFind detected 53.1% more MP candidates than MaxQuant and 8.8% more MP candidates than Proteome Discoverer. In total, 5 (PE2) of the 124 MP candidates identified by Open-pFind were verified with 2 or 3 unique peptides containing more than 9 AAs by using spectrum theoretical prediction with pDeep and synthesized peptide matching with pBuild after spectrum quality analysis, isobaric post-translational modification, and single amino acid variant filtering. These five verified MPs can be saved as PE1 proteins. In addition, three other MP candidates were verified with two unique peptides (one peptide containing more than 9 AAs and the other containing only 8 AAs), which fell slightly short of the criteria listed by C-HPP and therefore require additional verification information. More importantly, unexpected modifications were detected in these MPs. All MS data sets have been deposited into ProteomeXchange with the identifier PXD015759.
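The peptide-evidence rule mentioned above (at least two unique peptides of sufficient length) can be sketched as a simple filter. This is a minimal illustration, not the authors' pipeline: the default threshold of nine residues and the function name are assumptions, the abstract's wording "more than 9 AAs" may imply a stricter cutoff, and the peptide sequences in the usage example are made up.

```python
def meets_peptide_criterion(peptides, min_len=9, min_count=2):
    """Return True if at least `min_count` distinct peptides have
    `min_len` or more residues (thresholds are assumed, adjustable)."""
    unique = {p.upper() for p in peptides}          # collapse duplicates
    long_enough = [p for p in unique if len(p) >= min_len]
    return len(long_enough) >= min_count


if __name__ == "__main__":
    # Hypothetical peptides, for illustration only.
    print(meets_peptide_criterion(["ACDEFGHIK", "LMNPQRSTVWY"]))  # two long peptides
    print(meets_peptide_criterion(["ACDEFGHIK", "LMNPQRST"]))     # one has only 8 residues
```

A candidate supported by one long peptide and one 8-residue peptide fails this check, mirroring the three candidates that the abstract reports as needing additional verification.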
Subjects
Protein Databases, Software, Testis/chemistry, Humans, Male, Mass Spectrometry, Post-Translational Protein Processing, Proteins/analysis, Proteins/genetics, Proteins/metabolism, Proteomics/methods, Search Engine
ABSTRACT
Rv2742 is a novel gene identified in Mycobacterium tuberculosis H37Rv by a proteogenomics strategy. The aim of this study was to establish a system for the soluble expression and purification of the missing protein Rv2742 of M. tuberculosis H37Rv, to provide a reference for further research on the biological function of Rv2742. Soluble protein was not successfully induced with the prokaryotic expression vectors pGEX-4T-2-Rv2742, pET-32a-Rv2742, pET-28a-Rv2742, and pMAL-c2X-Rv2742. After the codons of the novel gene Rv2742 were optimized according to E. coli codon usage frequency, only the recombinant strain containing the plasmid pMAL-c2X-Rv2742 could produce soluble products of the Rv2742 encoding gene. In addition, the expression of the desired fusion protein was analyzed under different conditions, including hosts, culture temperatures, and IPTG concentrations. The optimum expression conditions were as follows: Rosetta (DE3) host, 16 °C culture temperature, and 0.5 mmol/L IPTG. After purification by affinity chromatography with amylose resin, the fusion protein sequence was confirmed by LC-MS/MS. These results indicated that the product of the novel gene Rv2742 could be successfully induced and expressed in a soluble form by the pMAL-c2X expression system with an MBP tag. Our findings provide a reference for studies on potential interactions and immunogenicity.
Subjects
Mycobacterium tuberculosis, Liquid Chromatography, Molecular Cloning, Escherichia coli, Mycobacterium tuberculosis/genetics, Recombinant Fusion Proteins, Tandem Mass Spectrometry
ABSTRACT
In 2012, the Chromosome-centric Human Proteome Project (C-HPP) launched an investigation of missing proteins (MPs) to complete the Human Proteome Project (HPP). The majority of the MPs were distributed in low-molecular-weight (LMW) ranges, especially from 0 to 40 kDa. LMW protein identification is challenging, owing to their short length, low abundance, and hydrophobicity. Furthermore, many sequences from trypsin digestion are unlikely to yield detectable peptides or an MS2 spectrum of reasonable quality. Therefore, we focused on small MPs by combining LMW protein enrichment with a complementary protease-pair strategy using trypsin and LysargiNase for human testis samples. In-depth testis LMW protein profiling resulted in the identification of 4063 proteins, of which 2565 were LMW proteins and 1130 had pairs of peptides generated from both trypsin and LysargiNase. This provided additional mass spectral evidence for further verification of small MPs. Finally, two MPs were verified from the seven MP candidates. One of them, Q8N688, was verified with two series of continuous and complementary b/y-product ions from the pairs of spectra for tryptic and LysargiNase-digested peptides after "mirror spectrum" matching. This enabled confident identification of the representative peptides for the target MPs. In contrast, the two verified peptides for Q86WR6 were identified with the same strategy from the gel-separation and gel-elution samples, respectively. Although the other five MP candidates showed high-quality spectra, they could not be sufficiently distinguished as PE1s and require further verification. All MS data sets have been deposited in the ProteomeXchange with identifier PXD010093.
Subjects
Peptides/analysis, Testis/chemistry, Humans, Male, Mass Spectrometry/methods, Molecular Weight, Peptide Hydrolases/metabolism
ABSTRACT
Subsequent to conducting the Chromosome-Centric Human Proteome Project, we have focused on human testis-enriched missing proteins (MPs) since 2015. To enhance protein coverage, a multiprotease strategy was used with separation of samples by 10% SDS-PAGE. To improve separation efficiency, a high-pH reverse phase (RP) separation strategy was applied to fractionate complex samples in this study. A total of 11,558 proteins were identified, which is the largest proteome data set for a single human tissue sample so far. On the basis of this large-scale data set, we verified 14 MPs (PE2) in neXtProt (2018-01) after spectrum quality analysis, isobaric post-translational modification and single amino acid variant filtering, and synthesized peptide matching. Tissue expression analysis showed that 3 of the 14 MPs were testis-specific proteins. Functional analysis showed that 10 of the 14 MPs were closely related to liver tumor, liver carcinoma, and hepatocellular carcinoma. Another 100 MPs were listed as candidates but required additional verification information. All MS data sets have been deposited into the ProteomeXchange with the identifier PXD009737.
Subjects
Proteome/analysis, Testis/chemistry, Polyacrylamide Gel Electrophoresis, Genetic Variation, Humans, Liver Neoplasms/chemistry, Male, Mass Spectrometry, Peptide Hydrolases/metabolism, Post-Translational Protein Processing, Proteomics/methods
ABSTRACT
Cucurbitaceae plants are of considerable biological and economic importance, and the genomes of cucumber, watermelon, and melon have been sequenced. However, a comparative genomics exploration of their genome structures and evolution has not been available. Here, we aimed to perform a hierarchical inference of the genomic homology resulting from recursive paleopolyploidizations. Unexpectedly, we found that, shortly after a core-eudicot-common hexaploidy, a cucurbit-common tetraploidization (CCT) occurred, which had been overlooked in previous reports. Moreover, we characterized gene loss (and retention) after these respective events, which was significantly unbalanced between inferred subgenomes and between plants after their split. The inference of a dominant subgenome and a sensitive one suggested an allotetraploid nature of the CCT. In addition, we found divergent evolutionary rates among cucurbits and, after rate correction, dated the CCT to 90-102 Ma, likely common to all Cucurbitaceae plants, showing its important role in the establishment of the plant family.
Subjects
Cucurbitaceae/genetics, DNA Sequence Analysis/methods, Base Sequence/genetics, Chromosome Mapping/methods, Molecular Evolution, Genetic Variation/genetics, Plant Genome/genetics, Genomics/methods, Mutation Rate, Phylogeny, Polyploidy, Tetraploidy
ABSTRACT
Mainly due to their economic importance, the genomes of 10 legumes, including soybean (Glycine max), wild peanut (Arachis duranensis and Arachis ipaensis), and barrel medic (Medicago truncatula), have been sequenced. However, a family-level comparative genomics analysis has been unavailable. With grape (Vitis vinifera) and selected legume genomes as outgroups, we performed a hierarchical and event-related alignment of these genomes and deconvoluted layers of homologous regions produced by ancestral polyploidizations or speciations. Consequently, we illustrated genomic fractionation characterized by widespread gene losses after the polyploidizations. Notably, high similarity in gene retention between recently duplicated chromosomes in soybean supported the likely autopolyploid nature of its tetraploid ancestor. Moreover, although most gene losses were nearly random, largely but not fully described by a geometric distribution, we showed that polyploidization contributed divergently to the copy number variation of important gene families. In addition, we showed significantly divergent evolutionary levels among legumes and, by correcting the synonymous nucleotide substitutions at synonymous sites, redated major evolutionary events during their expansion. This effort laid a solid foundation for further genomics exploration in the legume research community and beyond. We describe only a tiny fraction of the legume comparative genomics analysis that we performed; more information is stored in the newly constructed Legume Comparative Genomics Research Platform (www.legumegrp.org).