Search | VHL Regional Portal

Detection and editing of the updated Arabidopsis plastid- and mitochondrial-encoded proteomes through PeptideAtlas.

van Wijk, Klaas J; Bentolila, Stephane; Leppert, Tami; Sun, Qi; Sun, Zhi; Mendoza, Luis; Li, Margaret; Deutsch, Eric W.

Plant Physiol ; 194(3): 1411-1430, 2024 Feb 29.

Article in English | MEDLINE | ID: mdl-37879112

ABSTRACT

Arabidopsis (Arabidopsis thaliana) ecotype Col-0 has plastid and mitochondrial genomes encoding over 100 proteins. Public databases (e.g. Araport11) have redundancy and discrepancies in gene identifiers for these organelle-encoded proteins. RNA editing results in changes to specific amino acid residues or creation of start and stop codons for many of these proteins, but the impact of RNA editing at the protein level is largely unexplored due to the complexities of detection. Here, we assembled the nonredundant set of identifiers, their correct protein sequences, and 452 predicted nonsynonymous editing sites of which 56 are edited at lower frequency. We then determined accumulation of edited and/or unedited proteoforms by searching ~259 million raw tandem MS spectra from ProteomeXchange, which is part of PeptideAtlas (www.peptideatlas.org/builds/arabidopsis/). We identified all mitochondrial proteins and all except 3 plastid-encoded proteins (NdhG/Ndh6, PsbM, and Rps16), but no proteins predicted from the 4 ORFs were identified. We suggest that Rps16 and 3 of the ORFs are pseudogenes. Detection frequencies for each edit site and type of edit (e.g. S to L/F) were determined at the protein level, cross-referenced against the metadata (e.g. tissue), and evaluated for technical detection challenges. We detected 167 predicted edit sites at the proteome level. Minor frequency sites were edited at low frequency at the protein level except for cytochrome C biogenesis 382 at residue 124 (Ccb382-124). Major frequency sites (>50% editing of RNA) only accumulated in edited form (>98% to 100% edited) at the protein level, with the exception of Rpl5-22. We conclude that RNA editing for major editing sites is required for stable protein accumulation.

Subjects

Arabidopsis Proteins , Arabidopsis , Arabidopsis/genetics , Arabidopsis/metabolism , Proteome/genetics , Proteome/metabolism , Plastids/genetics , Plastids/metabolism , Arabidopsis Proteins/genetics , Arabidopsis Proteins/metabolism , Mitochondria/genetics , Mitochondria/metabolism

Detection of the Arabidopsis Proteome and Its Post-translational Modifications and the Nature of the Unobserved (Dark) Proteome in PeptideAtlas.

van Wijk, Klaas J; Leppert, Tami; Sun, Zhi; Kearly, Alyssa; Li, Margaret; Mendoza, Luis; Guzchenko, Isabell; Debley, Erica; Sauermann, Georgia; Routray, Pratyush; Malhotra, Sagunya; Nelson, Andrew; Sun, Qi; Deutsch, Eric W.

J Proteome Res ; 23(1): 185-214, 2024 01 05.

Article in English | MEDLINE | ID: mdl-38104260

ABSTRACT

This study describes a new release of the Arabidopsis thaliana PeptideAtlas proteomics resource (build 2023-10) providing protein sequence coverage, matched mass spectrometry (MS) spectra, selected post-translational modifications (PTMs), and metadata. 70 million MS/MS spectra were matched to the Araport11 annotation, identifying ~0.6 million unique peptides and 18,267 proteins at the highest confidence level and 3396 lower confidence proteins, together representing 78.6% of the predicted proteome. Additional identified proteins not predicted in Araport11 should be considered for the next Arabidopsis genome annotation. This release identified 5198 phosphorylated proteins, 668 ubiquitinated proteins, 3050 N-terminally acetylated proteins, and 864 lysine-acetylated proteins and mapped their PTM sites. MS support was lacking for 21.4% (5896 proteins) of the predicted Araport11 proteome: the "dark" proteome. This dark proteome is highly enriched for E3 ligases, transcription factors, and for certain (e.g., CLE, IDA, PSY) but not other (e.g., THIONIN, CAP) signaling peptide families. A machine learning model trained on RNA expression data and protein properties predicts the probability that proteins will be detected. The model aids in discovery of proteins with short half-life (e.g., SIG1,3 and ERF-VII TFs) and for developing strategies to identify the missing proteins. PeptideAtlas is linked to TAIR, tracks in JBrowse, and several other community proteomics resources.
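The coverage figures quoted in this abstract can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: it assumes that the predicted proteome equals the observed proteins (highest plus lower confidence) plus the unobserved "dark" proteins, and the function name `proteome_fractions` is hypothetical rather than part of PeptideAtlas.

```python
def proteome_fractions(high_conf: int, lower_conf: int, unobserved: int) -> tuple[float, float]:
    """Return (observed %, unobserved %) of the predicted proteome.

    Assumes predicted proteome = observed + unobserved proteins.
    """
    observed = high_conf + lower_conf
    total = observed + unobserved
    return 100 * observed / total, 100 * unobserved / total


# Counts taken from the abstract: 18,267 highest-confidence proteins,
# 3396 lower-confidence proteins, 5896 proteins without MS support.
observed_pct, dark_pct = proteome_fractions(18267, 3396, 5896)
print(f"{observed_pct:.1f}% observed, {dark_pct:.1f}% dark")
```

Under that assumption the three counts reproduce the reported 78.6% and 21.4% shares.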

Subjects

Arabidopsis , Humans , Arabidopsis/genetics , Arabidopsis/metabolism , Proteome/analysis , Tandem Mass Spectrometry/methods , Protein Processing, Post-Translational , Peptides/analysis , Databases, Protein

Mapping the Arabidopsis thaliana proteome in PeptideAtlas and the nature of the unobserved (dark) proteome; strategies towards a complete proteome.

bioRxiv ; 2023 Jun 05.

Article in English | MEDLINE | ID: mdl-37333403

ABSTRACT

This study describes a new release of the Arabidopsis thaliana PeptideAtlas proteomics resource providing protein sequence coverage, matched mass spectrometry (MS) spectra, selected PTMs, and metadata. 70 million MS/MS spectra were matched to the Araport11 annotation, identifying ~0.6 million unique peptides and 18267 proteins at the highest confidence level and 3396 lower confidence proteins, together representing 78.6% of the predicted proteome. Additional identified proteins not predicted in Araport11 should be considered for building the next Arabidopsis genome annotation. This release identified 5198 phosphorylated proteins, 668 ubiquitinated proteins, 3050 N-terminally acetylated proteins and 864 lysine-acetylated proteins and mapped their PTM sites. MS support was lacking for 21.4% (5896 proteins) of the predicted Araport11 proteome - the 'dark' proteome. This dark proteome is highly enriched for certain (e.g. CLE, CEP, IDA, PSY) but not other (e.g. THIONIN, CAP) signaling peptide families, E3 ligases, TFs, and other proteins with unfavorable physicochemical properties. A machine learning model trained on RNA expression data and protein properties predicts the probability for proteins to be detected. The model aids in discovery of proteins with short half-life (e.g. SIG1,3 and ERF-VII TFs) and completing the proteome. PeptideAtlas is linked to TAIR, JBrowse, PPDB, SUBA, UniProtKB and Plant PTM Viewer.

Does the Ubiquitination Degradation Pathway Really Reach inside of the Chloroplast? A Re-Evaluation of Mass Spectrometry-Based Assignments of Ubiquitination.

van Wijk, Klaas J; Leppert, Tami; Sun, Zhi; Deutsch, Eric W.

J Proteome Res ; 22(6): 2079-2091, 2023 06 02.

Article in English | MEDLINE | ID: mdl-37092802

ABSTRACT

A recent paper in Science Advances by Sun et al. claims that intra-chloroplast proteins in the model plant Arabidopsis can be polyubiquitinated and then extracted into the cytosol for subsequent degradation by the proteasome. Most of this conclusion hinges on several sets of mass spectrometry (MS) data. If the proposed results and conclusion are true, this would be a major change in the proteolysis/proteostasis field, breaking the long-standing dogma that there are no polyubiquitination mechanisms within chloroplast organelles (nor in mitochondria). Given its importance, we reanalyzed their raw MS data using both open and closed sequence database searches and encountered many issues not only with the results but also discrepancies between stated methods (e.g., use of alkylating agent iodoacetamide (IAA)) and observed mass modifications. Although there is likely enrichment of ubiquitination signatures in a subset of the data (probably from ubiquitination in the cytosol), we show that runaway alkylation with IAA caused extensive artifactual modifications of N termini and lysines to the point that a large fraction of the desired ubiquitination signatures is indistinguishable from artifactual acetamide signatures, and thus, no intra-chloroplast polyubiquitination conclusions can be drawn from these data. We provide recommendations on how to avoid such perils in future work.
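The abstract does not spell out why acetamide artifacts can masquerade as ubiquitination. One commonly cited reason, offered here as background rather than as the authors' own analysis, is that two iodoacetamide-derived carbamidomethyl groups (C2H3NO each) have the same elemental composition as the diglycine remnant (C4H6N2O2) that trypsin leaves on a ubiquitinated lysine. The sketch below computes both monoisotopic mass shifts from standard atomic masses; the helper name `mono_mass` is hypothetical.

```python
# Monoisotopic atomic masses in daltons (standard values, rounded).
MONO = {"C": 12.0, "H": 1.00782503, "N": 14.00307401, "O": 15.99491462}


def mono_mass(formula: dict[str, int]) -> float:
    """Sum monoisotopic masses for an elemental composition."""
    return sum(MONO[element] * count for element, count in formula.items())


carbamidomethyl = {"C": 2, "H": 3, "N": 1, "O": 1}  # one alkylation adduct
diglycine = {"C": 4, "H": 6, "N": 2, "O": 2}        # GlyGly remnant on lysine

double_alk = 2 * mono_mass(carbamidomethyl)  # two adducts on one residue
gg = mono_mass(diglycine)

print(f"double alkylation: {double_alk:.4f} Da, diglycine: {gg:.4f} Da")
```

Both shifts come to about 114.043 Da, so a mass-based search alone cannot separate over-alkylation from a genuine diglycine signature.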

Subjects

Arabidopsis , Chloroplasts , Ubiquitination , Proteolysis , Chloroplasts/metabolism , Proteasome Endopeptidase Complex/metabolism , Arabidopsis/metabolism , Mass Spectrometry

The Arabidopsis PeptideAtlas: Harnessing worldwide proteomics data to create a comprehensive community proteomics resource.

van Wijk, Klaas J; Leppert, Tami; Sun, Qi; Boguraev, Sascha S; Sun, Zhi; Mendoza, Luis; Deutsch, Eric W.

Plant Cell ; 33(11): 3421-3453, 2021 11 04.

Article in English | MEDLINE | ID: mdl-34411258

ABSTRACT

We developed a resource, the Arabidopsis PeptideAtlas (www.peptideatlas.org/builds/arabidopsis/), to solve central questions about the Arabidopsis thaliana proteome, such as the significance of protein splice forms and post-translational modifications (PTMs), or simply to obtain reliable information about specific proteins. PeptideAtlas is based on published mass spectrometry (MS) data collected through ProteomeXchange and reanalyzed through a uniform processing and metadata annotation pipeline. All matched MS-derived peptide data are linked to spectral, technical, and biological metadata. Nearly 40 million out of ~143 million MS/MS (tandem MS) spectra were matched to the reference genome Araport11, identifying ~0.5 million unique peptides and 17,858 uniquely identified proteins (only one isoform per gene) at the highest confidence level (false discovery rate 0.0004; 2 non-nested peptides ≥9 amino acids each), assigned canonical proteins, and 3,543 lower-confidence proteins. Physicochemical protein properties were evaluated for targeted identification of unobserved proteins. Additional proteins and isoforms currently not in Araport11 were identified that were generated from pseudogenes, alternative starts, stops, and/or splice variants, and small Open Reading Frames; these features should be considered when updating the Arabidopsis genome. Phosphorylation can be inspected through a sophisticated PTM viewer. PeptideAtlas is integrated with community resources including TAIR, tracks in JBrowse, PPDB, and UniProtKB. Subsequent PeptideAtlas builds will incorporate millions more MS/MS data.

Subjects

Arabidopsis/genetics , Peptides/analysis , Plant Proteins/analysis , Proteomics

Demonstration of Protein-Based Human Identification Using the Hair Shaft Proteome.

Parker, Glendon J; Leppert, Tami; Anex, Deon S; Hilmer, Jonathan K; Matsunami, Nori; Baird, Lisa; Stevens, Jeffery; Parsawar, Krishna; Durbin-Johnson, Blythe P; Rocke, David M; Nelson, Chad; Fairbanks, Daniel J; Wilson, Andrew S; Rice, Robert H; Woodward, Scott R; Bothner, Brian; Hart, Bradley R; Leppert, Mark.

PLoS One ; 11(9): e0160653, 2016.

Article in English | MEDLINE | ID: mdl-27603779

ABSTRACT

Human identification from biological material is largely dependent on the ability to characterize genetic polymorphisms in DNA. Unfortunately, DNA can degrade in the environment, sometimes below the level at which it can be amplified by PCR. Protein however is chemically more robust than DNA and can persist for longer periods. Protein also contains genetic variation in the form of single amino acid polymorphisms. These can be used to infer the status of non-synonymous single nucleotide polymorphism alleles. To demonstrate this, we used mass spectrometry-based shotgun proteomics to characterize hair shaft proteins in 66 European-American subjects. A total of 596 single nucleotide polymorphism alleles were correctly imputed in 32 loci from 22 genes of subjects' DNA and directly validated using Sanger sequencing. Estimates of the probability of resulting individual non-synonymous single nucleotide polymorphism allelic profiles in the European population, using the product rule, resulted in a maximum power of discrimination of 1 in 12,500. Imputed non-synonymous single nucleotide polymorphism profiles from European-American subjects were considerably less frequent in the African population (maximum likelihood ratio = 11,000). The converse was true for hair shafts collected from an additional 10 subjects with African ancestry, where some profiles were more frequent in the African population. Genetically variant peptides were also identified in hair shaft datasets from six archaeological skeletal remains (up to 260 years old). This study demonstrates that quantifiable measures of identity discrimination and biogeographic background can be obtained from detecting genetically variant peptides in hair shaft protein, including hair from bioarchaeological contexts.

Subjects

Forensic Anthropology/methods , Hair/chemistry , Polymerase Chain Reaction , Proteomics , Alleles , Black People/genetics , Genotype , Humans , Polymorphism, Single Nucleotide/genetics , White People/genetics

Identification of rare DNA sequence variants in high-risk autism families and their prevalence in a large case/control population.

Matsunami, Nori; Hensel, Charles H; Baird, Lisa; Stevens, Jeff; Otterud, Brith; Leppert, Tami; Varvil, Tena; Hadley, Dexter; Glessner, Joseph T; Pellegrino, Renata; Kim, Cecilia; Thomas, Kelly; Wang, Fengxiang; Otieno, Frederick G; Ho, Karen; Christensen, Gerald B; Li, Dongying; Prekeris, Rytis; Lambert, Christophe G; Hakonarson, Hakon; Leppert, Mark F.

Mol Autism ; 5(1): 5, 2014 Jan 27.

Article in English | MEDLINE | ID: mdl-24467814

ABSTRACT

BACKGROUND: Genetics clearly plays a major role in the etiology of autism spectrum disorders (ASDs), but studies to date are only beginning to characterize the causal genetic variants responsible. Until recently, studies using multiple extended multi-generation families to identify ASD risk genes had not been undertaken. METHODS: We identified haplotypes shared among individuals with ASDs in large multiplex families, followed by targeted DNA capture and sequencing to identify potential causal variants. We also assayed the prevalence of the identified variants in a large ASD case/control population. RESULTS: We identified 584 non-conservative missense, nonsense, frameshift and splice site variants that might predispose to autism in our high-risk families. Eleven of these variants were observed to have odds ratios greater than 1.5 in a set of 1,541 unrelated children with autism and 5,785 controls. Three variants, in the RAB11FIP5, ABP1, and JMJD7-PLA2G4B genes, each were observed in a single case and not in any controls. These variants also were not seen in public sequence databases, suggesting that they may be rare causal ASD variants. Twenty-eight additional rare variants were observed only in high-risk ASD families. Collectively, these 39 variants identify 36 genes as ASD risk genes. Segregation of sequence variants and of copy number variants previously detected in these families reveals a complex pattern, with only a RAB11FIP5 variant segregating to all affected individuals in one two-generation pedigree. Some affected individuals were found to have multiple potential risk alleles, including sequence variants and copy number variants (CNVs), suggesting that the high incidence of autism in these families could be best explained by variants at multiple loci. CONCLUSIONS: Our study is the first to use haplotype sharing to identify familial ASD risk loci. In total, we identified 39 variants in 36 genes that may confer a genetic risk of developing autism. The observation of 11 of these variants in unrelated ASD cases further supports their role as ASD risk variants.
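For readers unfamiliar with the odds-ratio threshold used in this abstract, the following is a minimal sketch of how an odds ratio is computed from carrier counts in a case/control set. The cohort sizes (1,541 cases and 5,785 controls) come from the abstract; the carrier counts in the first call are hypothetical, and the optional 0.5 continuity correction (Haldane-Anscombe) is an assumption added here so that a variant absent from controls still yields a finite value. The abstract does not say how such variants were handled.

```python
def odds_ratio(case_carriers: int, n_cases: int,
               control_carriers: int, n_controls: int,
               correction: float = 0.0) -> float:
    """Odds ratio from a 2x2 carrier table.

    `correction` is added to every cell; use 0.5 when a cell is zero.
    """
    a = case_carriers + correction                  # cases carrying the variant
    b = n_cases - case_carriers + correction        # cases without it
    c = control_carriers + correction               # controls carrying it
    d = n_controls - control_carriers + correction  # controls without it
    return (a * d) / (b * c)


# Hypothetical counts: 4 carriers among cases, 6 among controls.
or_example = odds_ratio(4, 1541, 6, 5785)

# A variant seen in one case and no controls needs the correction.
or_single = odds_ratio(1, 1541, 0, 5785, correction=0.5)

print(f"example OR = {or_example:.2f}, single-case OR = {or_single:.2f}")
```

With a single carrier the estimate is very imprecise, which is consistent with the abstract describing such variants only as possible rare causal variants.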

Identification of rare recurrent copy number variants in high-risk autism families and their prevalence in a large ASD population.

Matsunami, Nori; Hadley, Dexter; Hensel, Charles H; Christensen, G Bryce; Kim, Cecilia; Frackelton, Edward; Thomas, Kelly; da Silva, Renata Pellegrino; Stevens, Jeff; Baird, Lisa; Otterud, Brith; Ho, Karen; Varvil, Tena; Leppert, Tami; Lambert, Christophe G; Leppert, Mark; Hakonarson, Hakon.

PLoS One ; 8(1): e52239, 2013.

Article in English | MEDLINE | ID: mdl-23341896

ABSTRACT

Structural variation is thought to play a major etiological role in the development of autism spectrum disorders (ASDs), and numerous studies documenting the relevance of copy number variants (CNVs) in ASD have been published since 2006. To determine if large ASD families harbor high-impact CNVs that may have broader impact in the general ASD population, we used the Affymetrix genome-wide human SNP array 6.0 to identify 153 putative autism-specific CNVs present in 55 individuals with ASD from 9 multiplex ASD pedigrees. To evaluate the actual prevalence of these CNVs as well as 185 CNVs reportedly associated with ASD from published studies, many of which are insufficiently powered, we designed a custom Illumina array and used it to interrogate these CNVs in 3,000 ASD cases and 6,000 controls. Additional single nucleotide variants (SNVs) on the array identified 25 CNVs that we did not detect in our family studies at the standard SNP array resolution. After molecular validation, our results demonstrated that 15 CNVs identified in high-risk ASD families also were found in two or more ASD cases with odds ratios greater than 2.0, strengthening their support as ASD risk variants. In addition, of the 25 CNVs identified using SNV probes on our custom array, 9 also had odds ratios greater than 2.0, suggesting that these CNVs also are ASD risk variants. Eighteen of the validated CNVs have not been reported previously in individuals with ASD and three have only been observed once. Finally, we confirmed the association of 31 of 185 published ASD-associated CNVs in our dataset with odds ratios greater than 2.0, suggesting they may be of clinical relevance in the evaluation of children with ASDs. Taken together, these data provide strong support for the existence and application of high-impact CNVs in the clinical genetic evaluation of children with ASD.

Subjects

Autistic Disorder/genetics , DNA Copy Number Variations/genetics , Autistic Disorder/epidemiology , Case-Control Studies , Child , Chromosomes, Human, Pair 15/genetics , Family , Female , Gene Regulatory Networks/genetics , Genetic Loci/genetics , Genome, Human/genetics , Humans , Male , Pedigree , Polymerase Chain Reaction , Polymorphism, Single Nucleotide , Prevalence , Reproducibility of Results , Risk Factors , Utah/epidemiology

A family-based paradigm to identify candidate chromosomal regions for isolated congenital diaphragmatic hernia.

Arrington, Cammon B; Bleyl, Steven B; Matsunami, Nori; Bowles, Neil E; Leppert, Tami I; Demarest, Bradley L; Osborne, Karen; Yoder, Bradley A; Byrne, Janice L; Schiffman, Joshua D; Null, Donald M; DiGeronimo, Robert; Rollins, Michael; Faix, Roger; Comstock, Jessica; Camp, Nicola J; Leppert, Mark F; Yost, H Joseph; Brunelli, Luca.

Am J Med Genet A ; 158A(12): 3137-47, 2012 Dec.

Article in English | MEDLINE | ID: mdl-23165927

ABSTRACT

Congenital diaphragmatic hernia (CDH) is a developmental defect of the diaphragm that causes high newborn mortality. Isolated or non-syndromic CDH is considered a multifactorial disease, with strong evidence implicating genetic factors. As low heritability has been reported in isolated CDH, family-based genetic methods have yet to identify the genetic factors associated with the defect. Using the Utah Population Database, we identified distantly related patients from several extended families with a high incidence of isolated CDH. Using high-density genotyping, seven patients were analyzed by homozygosity exclusion rare allele mapping (HERAM) and phased haplotype sharing (HapShare), two methods we developed to map shared chromosome regions. Our patient cohort shared three regions not previously associated with CDH, that is, 2q11.2-q12.1, 4p13 and 7q11.2, and two regions previously involved in CDH, that is, 8p23.1 and 15q26.2. The latter regions contain GATA4 and NR2F2, two genes implicated in diaphragm formation in mice. Interestingly, three patients shared the 8p23.1 locus and one of them also harbored the 15q26.2 segment. No coding variants were identified in GATA4 or NR2F2, but a rare shared variant was found in intron 1 of GATA4. This work shows the role of heritability in isolated CDH. Our family-based strategy uncovers new chromosomal regions possibly associated with disease, and suggests that non-coding variants of GATA4 and NR2F2 may contribute to the development of isolated CDH. This approach could speed up the discovery of the genes and regulatory elements causing multifactorial diseases, such as isolated CDH.

Subjects

Chromosomes, Human , Hernias, Diaphragmatic, Congenital , Adult , COUP Transcription Factor II/genetics , Case-Control Studies , Child , Cohort Studies , DNA/blood , DNA/genetics , Diaphragm/abnormalities , Family Health , Female , GATA4 Transcription Factor/genetics , Gene Dosage , Genetic Predisposition to Disease , Genotype , Hernia, Diaphragmatic/blood , Hernia, Diaphragmatic/genetics , Humans , Male , Pedigree , Polymorphism, Single Nucleotide
