Results 1 - 20 of 62
1.
bioRxiv ; 2024 Sep 09.
Article in English | MEDLINE | ID: mdl-39314370

ABSTRACT

A major scientific drive is to characterize the protein-coding genome as it provides the primary basis for the study of human health. But the fundamental question remains: what has been missed in prior genomic analyses? Over the past decade, the translation of non-canonical open reading frames (ncORFs) has been observed across human cell types and disease states, with major implications for proteomics, genomics, and clinical science. However, the impact of ncORFs has been limited by the absence of a large-scale understanding of their contribution to the human proteome. Here, we report the collaborative efforts of stakeholders in proteomics, immunopeptidomics, Ribo-seq ORF discovery, and gene annotation, to produce a consensus landscape of protein-level evidence for ncORFs. We show that at least 25% of a set of 7,264 ncORFs give rise to translated gene products, yielding over 3,000 peptides in a pan-proteome analysis encompassing 3.8 billion mass spectra from 95,520 experiments. With these data, we developed an annotation framework for ncORFs and created public tools for researchers through GENCODE and PeptideAtlas. This work will provide a platform to advance ncORF-derived proteins in biomedical discovery and, beyond humans, diverse animals and plants where ncORFs are similarly observed.

2.
bioRxiv ; 2024 Mar 25.
Article in English | MEDLINE | ID: mdl-38585907

ABSTRACT

The biological process of RNA translation is fundamental to cellular life and has wide-ranging implications for human disease. Yet, accurately delineating the variation in RNA translation represents a significant challenge. Here, we develop RiboTIE, a transformer model-based approach to map global RNA translation. We find that RiboTIE offers unparalleled precision and sensitivity for ribosome profiling data. Application of RiboTIE to normal brain and medulloblastoma cancer samples enables high-resolution insights into disease regulation of RNA translation.

3.
NAR Genom Bioinform ; 5(1): lqad021, 2023 Mar.
Article in English | MEDLINE | ID: mdl-36879896

ABSTRACT

The correct mapping of the proteome is an important step towards advancing our understanding of biological systems and cellular mechanisms. Methods that provide better mappings can fuel important processes such as drug discovery and disease understanding. Currently, true determination of translation initiation sites is primarily achieved by in vivo experiments. Here, we propose TIS Transformer, a deep learning model for the determination of translation start sites solely utilizing the information embedded in the transcript nucleotide sequence. The method is built upon deep learning techniques first designed for natural language processing. We prove this approach to be best suited for learning the semantics of translation, outperforming previous approaches by a large margin. We demonstrate that limitations in the model performance are primarily due to the presence of low-quality annotations against which the model is evaluated. Advantages of the method are its ability to detect key features of the translation process and multiple coding sequences on a transcript. These include micropeptides encoded by short Open Reading Frames, either alongside a canonical coding sequence or within long non-coding RNAs. To demonstrate the use of our methods, we applied TIS Transformer to remap the full human proteome.
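
To make concrete what "multiple coding sequences on a transcript" means, the following is a minimal sketch that lists every ATG-initiated open reading frame on a transcript. It is a naive sequence scan, not the TIS Transformer model: a learned predictor would assign a score to candidate start sites rather than accept every ATG. The function name `find_orfs`, the `min_codons` threshold, and the toy sequence are assumptions made for illustration.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(transcript, min_codons=10):
    """List ATG-initiated ORFs on the forward strand of a transcript.

    Returns (start, end, n_codons) tuples, where start is the 0-based index
    of the A in ATG, end is the index just past the stop codon, and n_codons
    counts the coding codons (start codon included, stop codon excluded).
    """
    seq = transcript.upper().replace("U", "T")
    orfs = []
    for start in range(len(seq) - 2):
        if seq[start:start + 3] != "ATG":
            continue
        # Walk downstream in frame until the first stop codon.
        for pos in range(start + 3, len(seq) - 2, 3):
            if seq[pos:pos + 3] in STOP_CODONS:
                n_codons = (pos - start) // 3
                if n_codons >= min_codons:
                    orfs.append((start, pos + 3, n_codons))
                break
    return orfs


# Toy transcript (invented): one short ORF of four codons.
toy = "CCATGGCTGCTGCTTAAG"
short_orfs = find_orfs(toy, min_codons=1)
```

Because every ATG is treated as a candidate, in-frame ATGs inside a longer ORF are reported as separate, nested ORFs that share a stop codon. ORFs without an in-frame stop before the transcript end are skipped.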

6.
Bioinformatics ; 38(3): 597-603, 2022 01 12.
Article in English | MEDLINE | ID: mdl-34718418

ABSTRACT

MOTIVATION: The adoption of current single-cell DNA methylation sequencing protocols is hindered by incomplete coverage, outlining the need for effective imputation techniques. The task of imputing single-cell (methylation) data requires models to build an understanding of underlying biological processes. RESULTS: We adapt the transformer neural network architecture to operate on methylation matrices through combining axial attention with sliding window self-attention. The obtained CpG Transformer displays state-of-the-art performances on a wide range of scBS-seq and scRRBS-seq datasets. Furthermore, we demonstrate the interpretability of CpG Transformer and illustrate its rapid transfer learning properties, allowing practitioners to train models on new datasets with a limited computational and time budget. AVAILABILITY AND IMPLEMENTATION: CpG Transformer is freely available at https://github.com/gdewael/cpg-transformer. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
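
The sliding-window self-attention named in the abstract can be illustrated generically: each position attends only to neighbours within a fixed distance, and the attention weights are a softmax restricted to those neighbours. The sketch below shows that general idea in plain Python. It is not taken from the CpG Transformer code base, and the function names, window size, and scores are assumptions for illustration.

```python
import math


def sliding_window_mask(n_sites, window):
    """mask[i][j] is True when site j lies within `window` positions of site i."""
    return [[abs(i - j) <= window for j in range(n_sites)] for i in range(n_sites)]


def masked_attention_weights(scores, mask_row):
    """Softmax over permitted positions only; masked positions get weight 0."""
    allowed = [s for s, ok in zip(scores, mask_row) if ok]
    peak = max(allowed)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) if ok else 0.0 for s, ok in zip(scores, mask_row)]
    total = sum(exps)
    return [e / total for e in exps]


# Five sites, each attending to itself and one neighbour on either side.
mask = sliding_window_mask(5, window=1)
weights = masked_attention_weights([0.0, 0.0, 0.0, 0.0, 0.0], mask[2])
```

With equal scores, the middle site spreads its attention evenly over the three permitted positions. Restricting attention this way keeps the cost per site proportional to the window size rather than to the full row length.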


Subject(s)
DNA Methylation; Epigenome; Base Sequence; Sequence Analysis, DNA/methods; Neural Networks, Computer
7.
Front Genet ; 12: 728900, 2021.
Article in English | MEDLINE | ID: mdl-34759956

ABSTRACT

Transcriptome and ribosome sequencing have revealed the existence of many non-canonical transcripts, mainly containing splice variants, ncRNA, sORFs and altORFs. However, identification and characterization of products that may be translated out of these remains a challenge. Addressing this, we here report on 552 non-canonical proteins and splice variants in the model organism C. elegans using tandem mass spectrometry. Aided by sequencing-based prediction, we generated a custom proteome database tailored to search for non-canonical translation products of C. elegans. Using this database, we mined available mass spectrometric resources of C. elegans, from which 51 novel, non-canonical proteins could be identified. Furthermore, we utilized diverse proteomic and peptidomic strategies to detect 40 novel non-canonical proteins in C. elegans by LC-TIMS-MS/MS, of which 6 were common with our meta-analysis of existing resources. Together, this permits us to provide a resource with detailed annotation of 467 splice variants and 85 novel proteins mapped onto UTRs, non-coding regions and alternative open reading frames of the C. elegans genome.

8.
Front Cell Dev Biol ; 9: 720570, 2021.
Article in English | MEDLINE | ID: mdl-34604223

ABSTRACT

Bioactive peptides exhibit key roles in a wide variety of complex processes, such as regulation of body weight, learning, aging, and innate immune response. Next to the classical bioactive peptides, emerging from larger precursor proteins by specific proteolytic processing, a new class of peptides originating from small open reading frames (sORFs) have been recognized as important biological regulators. But their intrinsic properties, specific expression pattern and location on presumed non-coding regions have hindered the full characterization of the repertoire of bioactive peptides, despite their predominant role in various pathways. Although the development of peptidomics has offered the opportunity to study these peptides in vivo, it remains challenging to identify the full peptidome as the lack of cleavage enzyme specification and large search space complicates conventional database search approaches. In this study, we introduce a proteogenomics methodology using a new type of mass spectrometry instrument and the implementation of machine learning tools toward improved identification of potential bioactive peptides in the mouse brain. The application of trapped ion mobility spectrometry (tims) coupled to a time-of-flight mass analyzer (TOF) offers improved sensitivity, an enhanced peptide coverage, reduction in chemical noise and the reduced occurrence of chimeric spectra. Subsequent machine learning tools MS2PIP, predicting fragment ion intensities and DeepLC, predicting retention times, improve the database searching based on a large and comprehensive custom database containing both sORFs and alternative ORFs. Finally, the identification of peptides is further enhanced by applying the post-processing semi-supervised learning tool Percolator. 
Applying this workflow, the first peptidomics workflow combined with spectral intensity and retention time predictions, we identified a total of 167 predicted sORF-encoded peptides, of which 48 originate from presumed non-coding locations, next to 401 peptides from known neuropeptide precursors, linked to 66 annotated bioactive neuropeptides from within 22 different families. Additional PEAKS analysis expanded the pool of SEPs on presumed non-coding locations to 84, while an additional 204 peptides completed the list of peptides from neuropeptide precursors. Altogether, this study provides insights into a new robust pipeline that fuses technological advancements from different fields, ensuring an improved coverage of the neuropeptidome in the mouse brain.

9.
Brief Bioinform ; 22(5)2021 09 02.
Article in English | MEDLINE | ID: mdl-33834200

ABSTRACT

The effectiveness of deep learning methods can be largely attributed to the automated extraction of relevant features from raw data. In the field of functional genomics, this generally concerns the automatic selection of relevant nucleotide motifs from DNA sequences. To benefit from automated learning methods, new strategies are required that unveil the decision-making process of trained models. In this paper, we present a new approach that has been successful in gathering insights on the transcription process in Escherichia coli. This work builds upon a transformer-based neural network framework designed for prokaryotic genome annotation purposes. We find that the majority of subunits (attention heads) of the model are specialized towards identifying transcription factors and are able to successfully characterize both their binding sites and consensus sequences, uncovering both well-known and potentially novel elements involved in the initiation of the transcription process. With the specialization of the attention heads occurring automatically, we believe transformer models to be of high interest towards the creation of explainable neural networks in this field.


Subject(s)
Deep Learning; Escherichia coli/genetics; Genome, Bacterial; Genomics/methods; Transcription Initiation Site; Base Sequence; Binding Sites; DNA, Bacterial/genetics; DNA, Bacterial/metabolism; Escherichia coli/metabolism; Promoter Regions, Genetic/genetics; Transcription Factors/genetics; Transcription Factors/metabolism
10.
Mol Cell Proteomics ; 20: 100076, 2021.
Article in English | MEDLINE | ID: mdl-33823297

ABSTRACT

Proteogenomics approaches often struggle with the distinction between true and false peptide-to-spectrum matches as the database size enlarges. However, features extracted from tandem mass spectrometry intensity predictors can enhance the peptide identification rate and can provide extra confidence for peptide-to-spectrum matching in a proteogenomics context. To that end, features from the spectral intensity pattern predictors MS2PIP and Prosit were combined with the canonical scores from MaxQuant in the Percolator postprocessing tool for protein sequence databases constructed out of ribosome profiling and nanopore RNA-Seq analyses. The presented results provide evidence that this approach enhances both the identification rate as well as the validation stringency in a proteogenomic setting.


Subject(s)
Proteogenomics/methods; Databases, Protein; HCT116 Cells; Humans; Machine Learning; RNA-Seq; Ribosomes
11.
Nat Cancer ; 2(6): 611-628, 2021 06.
Article in English | MEDLINE | ID: mdl-35121941

ABSTRACT

Post-transcriptional modifications of RNA constitute an emerging regulatory layer of gene expression. The demethylase fat mass- and obesity-associated protein (FTO), an eraser of N6-methyladenosine (m6A), has been shown to play a role in cancer, but its contribution to tumor progression and the underlying mechanisms remain unclear. Here, we report widespread FTO downregulation in epithelial cancers associated with increased invasion, metastasis and worse clinical outcome. Both in vitro and in vivo, FTO silencing promotes cancer growth, cell motility and invasion. In human-derived tumor xenografts (PDXs), FTO pharmacological inhibition favors tumorigenesis. Mechanistically, we demonstrate that FTO depletion elicits an epithelial-to-mesenchymal transition (EMT) program through increased m6A and altered 3'-end processing of key mRNAs along the Wnt signaling cascade. Accordingly, FTO knockdown acts via EMT to sensitize mouse xenografts to Wnt inhibition. We thus identify FTO as a key regulator, across epithelial cancers, of Wnt-triggered EMT and tumor progression and reveal a therapeutically exploitable vulnerability of FTO-low tumors.


Subject(s)
Neoplasms, Glandular and Epithelial; RNA; Alpha-Ketoglutarate-Dependent Dioxygenase FTO/genetics; Animals; Down-Regulation/genetics; Epithelial-Mesenchymal Transition/genetics; Humans; Mice
12.
Nat Commun ; 11(1): 4956, 2020 10 02.
Article in English | MEDLINE | ID: mdl-33009383

ABSTRACT

Tet-enzyme-mediated 5-hydroxymethylation of cytosines in DNA plays a crucial role in mouse embryonic stem cells (ESCs). In RNA also, 5-hydroxymethylcytosine (5hmC) has recently been evidenced, but its physiological roles are still largely unknown. Here we show the contribution and function of this mark in mouse ESCs and differentiating embryoid bodies. Transcriptome-wide mapping in ESCs reveals hundreds of messenger RNAs marked by 5hmC at sites characterized by a defined unique consensus sequence and particular features. During differentiation a large number of transcripts, including many encoding key pluripotency-related factors (such as Eed and Jarid2), show decreased cytosine hydroxymethylation. Using Tet-knockout ESCs, we find Tet enzymes to be partly responsible for deposition of 5hmC in mRNA. A transcriptome-wide search further reveals mRNA targets to which Tet1 and Tet2 bind, at sites showing a topology similar to that of 5hmC sites. Tet-mediated RNA hydroxymethylation is found to reduce the stability of crucial pluripotency-promoting transcripts. We propose that RNA cytosine 5-hydroxymethylation by Tets is a mark of transcriptome flexibility, inextricably linked to the balance between pluripotency and lineage commitment.


Subject(s)
5-Methylcytosine/analogs & derivatives; Cell Differentiation; DNA-Binding Proteins/metabolism; Mouse Embryonic Stem Cells/cytology; Mouse Embryonic Stem Cells/metabolism; Proto-Oncogene Proteins/metabolism; RNA/metabolism; 5-Methylcytosine/metabolism; Animals; Antibody Specificity/immunology; Base Sequence; Dioxygenases; Embryoid Bodies/metabolism; Mice; Models, Biological; Pluripotent Stem Cells/metabolism; Protein Binding; RNA Stability/genetics; RNA, Messenger/genetics; RNA, Messenger/metabolism; Transcriptome/genetics
13.
Exp Cell Res ; 391(1): 111923, 2020 06 01.
Article in English | MEDLINE | ID: mdl-32135166

ABSTRACT

Growing evidence illustrates the shortcomings in the current understanding of the full complexity of the proteome. Previously overlooked small open reading frames (sORFs) and their encoded microproteins have filled important gaps, exerting their function as biologically relevant regulators. The characterization of the full small proteome has potential applications in many fields. Continuous development of techniques and tools has led to improved sORF discovery, with candidates originating from bioinformatics analyses, sequencing routines or proteomics approaches. In this mini review, we discuss the ongoing trends in the three fields and suggest some strategies for further characterization of high-potential candidates.


Subject(s)
Computational Biology/statistics & numerical data; Neural Networks, Computer; Open Reading Frames; Protein Biosynthesis; Proteome/genetics; Ribosomes/genetics; Animals; Computational Biology/methods; High-Throughput Nucleotide Sequencing; Humans; Plants/genetics; Protein Sorting Signals/genetics; Proteome/classification; Proteome/metabolism; Ribosomes/classification; Ribosomes/metabolism; Software
14.
Nat Commun ; 11(1): 1312, 2020 03 11.
Article in English | MEDLINE | ID: mdl-32161263

ABSTRACT

The emergence of small open reading frame (sORF)-encoded peptides (SEPs) is rapidly expanding the known proteome at the lower end of the size distribution. Here, we show that the mitochondrial proteome, particularly the respiratory chain, is enriched for small proteins. Using a prediction and validation pipeline for SEPs, we report the discovery of 16 endogenous nuclear encoded, mitochondrial-localized SEPs (mito-SEPs). Through functional prediction, proteomics, metabolomics and metabolic flux modeling, we demonstrate that BRAWNIN, a 71 a.a. peptide encoded by C12orf73, is essential for respiratory chain complex III (CIII) assembly. In human cells, BRAWNIN is induced by the energy-sensing AMPK pathway, and its depletion impairs mitochondrial ATP production. In zebrafish, Brawnin deletion causes complete CIII loss, resulting in severe growth retardation, lactic acidosis and early death. Our findings demonstrate that BRAWNIN is essential for vertebrate oxidative phosphorylation. We propose that mito-SEPs are an untapped resource for essential regulators of oxidative metabolism.


Subject(s)
Electron Transport Complex III/metabolism; Mitochondria/metabolism; Mitochondrial Proteins/metabolism; Oxidative Phosphorylation; Peptides/metabolism; Zebrafish Proteins/metabolism; Acidosis, Lactic/genetics; Animals; Animals, Genetically Modified; Disease Models, Animal; Female; Gene Knockdown Techniques; Growth Disorders/genetics; Humans; Male; Metabolomics; Mitochondrial Proteins/genetics; Models, Animal; Models, Biological; Open Reading Frames/genetics; Peptides/genetics; Proteomics; Zebrafish/genetics; Zebrafish/growth & development; Zebrafish Proteins/genetics
15.
Neurosci Res ; 151: 31-37, 2020 Feb.
Article in English | MEDLINE | ID: mdl-30862443

ABSTRACT

Brain-derived peptides function as signaling molecules in the brain and regulate various physiological and behavioral processes. The low abundance and atypical fragmentation of these brain-derived peptides make detection using traditional proteomic methods challenging. In this study, we introduce and validate a new methodology for the discovery of novel peptides derived from mammalian brain. This methodology combines ribosome profiling and mass spectrometry-based peptidomics. Using this framework, we have identified a novel peptide in mouse whole brain whose expression is highest in the basal ganglia, hypothalamus and amygdala. Although its functional role is unknown, it has been previously detected in peripheral tissue as a component of the mRNA decapping complex. Continued discovery and studies of novel regulating peptides in mammalian brain may also provide insight into brain disorders.


Subject(s)
Neuropeptides/isolation & purification; Proteomics/methods; Animals; Brain/metabolism; Male; Mass Spectrometry; Mice; Mice, Inbred C57BL; Neuropeptides/analysis; Peptides; Ribosomes; Sequence Analysis, Protein
16.
Genes (Basel) ; 10(9)2019 09 05.
Article in English | MEDLINE | ID: mdl-31492022

ABSTRACT

The increasing availability of high throughput proteomics data provides us with opportunities as well as posing new ethical challenges regarding data privacy and re-identifiability of participants. Moreover, the fact that proteomics represents a level between the genotype and the phenotype further exacerbates the situation, introducing dilemmas related to publicly available data, anonymization, ownership of information and incidental findings. In this paper, we try to differentiate proteomics from genomics data and cover the ethical challenges related to proteomics data sharing. Finally, we give an overview of the proposed solutions and the outlook for future studies.


Subject(s)
Genetic Privacy/standards; Precision Medicine/ethics; Proteomics/ethics; Humans; Informed Consent/standards; Precision Medicine/methods; Proteomics/methods
17.
PLoS One ; 14(9): e0215185, 2019.
Article in English | MEDLINE | ID: mdl-31545805

ABSTRACT

Neuropeptides are a class of bioactive peptides shown to be involved in various physiological processes, including metabolism, development, and reproduction. Although neuropeptide candidates have been predicted from genomic and transcriptomic data, comprehensive characterization of neuropeptide repertoires remains a challenge owing to their small size and variable sequences. De novo prediction of neuropeptides from genome or transcriptome data is difficult and usually only efficient for those peptides that have identified orthologs in other animal species. Recent peptidomics technology has enabled systematic structural identification of neuropeptides by using the combination of liquid chromatography and tandem mass spectrometry. However, reliable identification of naturally occurring peptides using a conventional tandem mass spectrometry approach, scanning spectra against a protein database, remains difficult because a large search space must be scanned due to the absence of a cleavage enzyme specification. We developed a pipeline consisting of in silico prediction of candidate neuropeptides followed by peptide-spectrum matching. This approach enables highly sensitive and reliable neuropeptide identification, as the search space for peptide-spectrum matching is highly reduced. Nematostella vectensis is a basal eumetazoan with one of the most ancient nervous systems. We scanned the Nematostella protein database for sequences displaying structural hallmarks typical of eumetazoan neuropeptide precursors, including amino- and carboxyterminal motifs and associated modifications. Peptide-spectrum matching was performed against a dataset of peptides that are cleaved in silico from these putative peptide precursors. 
The dozens of newly identified neuropeptides display structural similarities to bilaterian neuropeptides including tachykinin, myoinhibitory peptide, and neuromedin-U/pyrokinin, suggesting these neuropeptides occurred in the eumetazoan ancestor of all animal species.
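
The in silico cleavage step described in this abstract can be sketched as follows. The abstract does not list the motifs used, so the rules here are assumptions for illustration only: cleavage at dibasic sites (KR, RR, KK, RK) and a C-terminal glycine read as an amidation signal. The precursor sequence is invented, and the function name `cleave_precursor` is hypothetical.

```python
import re

# Illustrative assumption: dibasic cleavage sites. The study's actual motifs
# are not given in the abstract and may differ.
DIBASIC_SITE = re.compile(r"(?:KR|RR|KK|RK)")


def cleave_precursor(precursor, min_length=3):
    """Split a precursor at dibasic sites into candidate peptides.

    Returns (sequence, amidated) tuples. A fragment ending in G is reported
    without the glycine and flagged as C-terminally amidated.
    """
    fragments = [f for f in DIBASIC_SITE.split(precursor.upper()) if len(f) >= min_length]
    peptides = []
    for fragment in fragments:
        if fragment.endswith("G"):
            peptides.append((fragment[:-1], True))
        else:
            peptides.append((fragment, False))
    return peptides


# Invented precursor, for illustration only.
candidates = cleave_precursor("AAAGPRWGKRSSLEPFGRRMMM")
```

Matching spectra against such a list of predicted peptides, rather than against every possible substring of every protein, is what shrinks the search space in the approach the abstract describes.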


Subject(s)
Evolution, Molecular; Neuropeptides/genetics; Sea Anemones/chemistry; Sea Anemones/genetics; Tandem Mass Spectrometry; Amino Acid Sequence; Animals; Computational Biology/methods; Conserved Sequence; Databases, Genetic; Gene Expression; Neuropeptides/chemistry; Spectrometry, Mass, Matrix-Assisted Laser Desorption-Ionization
18.
Article in English | MEDLINE | ID: mdl-31238262

ABSTRACT

On average, a human cell type expresses around 10,000 different protein-coding genes synthesizing all the different molecular forms of the protein product (proteoforms) found in a cell. In a typical shotgun bottom-up proteomic approach, the proteins are enzymatically cleaved, producing several hundred thousand different peptides that are analyzed with liquid chromatography-tandem mass spectrometry (LC-MS/MS). One of the major consequences of this high sample complexity is that coelution of peptides cannot be avoided. Moreover, low-abundance peptides are difficult to identify as they have a lower chance of being selected for fragmentation due to ion-suppression effects and the semi-stochastic nature of the precursor selection in data-dependent shotgun proteomic analysis, where peptides are selected for fragmentation analysis one-by-one as they elute from the column. In the current study we explore a simple novel approach that has the potential to counter some of the effects of coelution of peptides and improve the number of peptide identifications in a bottom-up proteomic analysis. In this method, peptides from a HeLa cell digest were eluted from the reversed-phase column using three different elution solvents (acetonitrile, methanol and acetone) in three replicate reversed-phase LC-MS/MS shotgun proteomic analyses. Results were compared with three technical replicates using the same solvent, which is common practice in proteomic analysis. In total, we see an increase of up to 10% in unique protein and up to 30% in unique peptide identifications from the combined analysis using different elution solvents when compared to the combined identifications from the three replicates of the same solvent. In addition, the overlap of unique peptide identifications common in all three LC-MS analyses in our approach is only 23% compared to 50% in the replicates using the same solvent. 
The method presented here thus provides an easy to implement method to significantly reduce the effects of coelution and ion suppression of peptides and improve protein coverage in shotgun proteomics. Data are available via ProteomeXchange with identifier PXD011908.
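
The overlap figures quoted in this abstract are a set computation over the peptide identifications of each run. The sketch below shows one way to compute it, assuming the percentage is taken as identifications common to all runs divided by the combined (union) identifications. The abstract does not state the denominator, so that choice is an assumption, and the peptide labels are invented.

```python
def identification_overlap(runs):
    """Return (n_combined, n_common, percent_common) for a list of peptide-ID sets.

    percent_common is the share of the combined identifications that were
    found in every run.
    """
    combined = set().union(*runs)
    common = set.intersection(*runs)
    percent = 100.0 * len(common) / len(combined) if combined else 0.0
    return len(combined), len(common), percent


# Invented identifications for three runs (e.g. one per elution solvent).
runs = [
    {"PEPA", "PEPB", "PEPC"},
    {"PEPB", "PEPC", "PEPD"},
    {"PEPC", "PEPE"},
]
n_combined, n_common, percent_common = identification_overlap(runs)
```

A lower shared percentage across runs, together with a larger combined total, is what indicates that the runs complement one another rather than re-identify the same peptides.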


Subject(s)
Chromatography, Liquid/methods; Proteome/chemistry; Proteomics/methods; Tandem Mass Spectrometry/methods; HeLa Cells; Humans; Peptides/chemistry
19.
Mol Cell Proteomics ; 18(8 suppl 1): S126-S140, 2019 08 09.
Article in English | MEDLINE | ID: mdl-31040227

ABSTRACT

PROTEOFORMER is a pipeline that enables the automated processing of data derived from ribosome profiling (RIBO-seq, i.e. the sequencing of ribosome-protected mRNA fragments). As such, genome-wide ribosome occupancies lead to the delineation of data-specific translation product candidates and these can improve the mass spectrometry-based identification. Since its first publication, different upgrades, new features and extensions have been added to the PROTEOFORMER pipeline. Some of the most important upgrades include P-site offset calculation during mapping, comprehensive data pre-exploration, the introduction of two alternative proteoform calling strategies and extended pipeline output features. These novelties are illustrated by analyzing ribosome profiling data of human HCT116 and Jurkat data. The different proteoform calling strategies are used alongside one another and in the end combined together with reference sequences from UniProt. Matching mass spectrometry data are searched against this extended search space with MaxQuant. Overall, besides annotated proteoforms, this pipeline leads to the identification and validation of different categories of new proteoforms, including translation products of up- and downstream open reading frames, 5' and 3' extended and truncated proteoforms, single amino acid variants, splice variants and translation products of so-called noncoding regions. Further, proof-of-concept is reported for the improvement of spectrum matching by including Prosit, a deep neural network strategy that adds extra fragmentation spectrum intensity features to the analysis. In the light of ribosome profiling-driven proteogenomics, it is shown that this allows validating the spectrum matches of newly identified proteoforms with elevated stringency. These updates and novel conclusions provide new insights and lessons for the ribosome profiling-based proteogenomic research field. 
More practical information on the pipeline, raw code, the user manual (README) and explanations on the different modes of availability can be found at the GitHub repository of PROTEOFORMER: https://github.com/Biobix/proteoformer.
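
P-site offset calculation, listed among the upgrades above, refers to a common convention in ribosome profiling: each ribosome-protected fragment is assigned to a single position by shifting from the read's 5' end by an offset that depends on read length. The sketch below illustrates that convention only. It is not PROTEOFORMER's code, the offset table holds placeholder values (real offsets are estimated from the data), and the function name is hypothetical.

```python
# Placeholder offsets per read length; in practice these are estimated
# from the data set being analysed.
EXAMPLE_OFFSETS = {28: 12, 29: 12, 30: 12, 31: 13}


def p_site_position(start, end, strand, offsets=EXAMPLE_OFFSETS):
    """Assign an unspliced read to a P-site position.

    start and end are 0-based, half-open coordinates of the aligned read.
    Returns the 0-based P-site coordinate, or None when the read length has
    no offset in the table.
    """
    offset = offsets.get(end - start)
    if offset is None:
        return None
    if strand == "+":
        return start + offset
    if strand == "-":
        # On the minus strand the 5' end of the read is its last aligned base.
        return end - 1 - offset
    raise ValueError("strand must be '+' or '-'")


plus_site = p_site_position(100, 129, "+")
minus_site = p_site_position(100, 129, "-")
```

The sketch ignores spliced alignments, where the offset has to be counted along the aligned blocks rather than along the genome.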


Subject(s)
Proteogenomics/methods; Ribosomes/metabolism; Chromatography, Liquid; HCT116 Cells; Humans; Jurkat Cells; Tandem Mass Spectrometry
20.
J Proteome Res ; 18(6): 2686-2692, 2019 06 07.
Article in English | MEDLINE | ID: mdl-31081335

ABSTRACT

Mass-spectrometry-based proteomics enables the high-throughput identification and quantification of proteins, including sequence variants and post-translational modifications (PTMs) in biological samples. However, most workflows require that such variations be included in the search space used to analyze the data, and doing so remains challenging with most analysis tools. In order to facilitate the search for known sequence variants and PTMs, the Proteomics Standards Initiative (PSI) has designed and implemented the PSI extended FASTA format (PEFF). PEFF is based on the very popular FASTA format but adds a uniform mechanism for encoding substantially more metadata about the sequence collection as well as individual entries, including support for encoding known sequence variants, PTMs, and proteoforms. The format is very nearly backward compatible, and as such, existing FASTA parsers will require little or no change to be able to read PEFF files as FASTA files, although without supporting any of the extra capabilities of PEFF. PEFF is defined by a full specification document, controlled vocabulary terms, a set of example files, software libraries, and a file validator. Popular software and resources are starting to support PEFF, including the sequence search engine Comet and the knowledge bases neXtProt and UniProtKB. Widespread implementation of PEFF is expected to further enable proteogenomics and top-down proteomics applications by providing a standardized mechanism for encoding protein sequences and their known variations. All the related documentation, including the detailed file format specification and example files, is available at http://www.psidev.info/peff.
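
The near backward compatibility described above can be illustrated with a small reader. The sketch assumes a layout in which file-level metadata sits on lines beginning with `#` and per-entry metadata follows the identifier on the `>` line as backslash-prefixed `Key=Value` pairs. That layout, the key names, and the example records are assumptions for illustration; the specification at the URL above is the authority on the actual syntax. The function name `parse_peff` is hypothetical.

```python
def parse_peff(lines):
    """Read PEFF-like text into a list of entry dictionaries.

    Lines starting with '#' (assumed file-level header) are skipped, so the
    same loop also reads plain FASTA. Values that themselves contain a
    backslash are not handled by this sketch.
    """
    entries = []
    current = None
    for raw in lines:
        line = raw.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        if line.startswith(">"):
            identifier, _, rest = line[1:].partition(" ")
            attributes = {}
            for chunk in rest.split("\\"):
                key, sep, value = chunk.partition("=")
                if sep:
                    attributes[key.strip()] = value.strip()
            current = {"id": identifier, "attributes": attributes, "sequence": ""}
            entries.append(current)
        elif current is not None:
            current["sequence"] += line.strip()
    return entries


# Invented example records, for illustration only.
example = [
    "# PEFF 1.0",
    "# DbName=ExampleDb",
    "# //",
    ">xx:P00001 \\PName=Example protein \\Length=10",
    "MKTAY",
    "IAKQR",
]
records = parse_peff(example)
```

A plain FASTA parser that skips `#` lines would still recover the identifier and sequence from such a file. It would simply treat the attribute text as a free-form description, which matches the abstract's remark that the extra capabilities go unused.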


Subject(s)
Proteomics/standards; Humans; Information Storage and Retrieval; Mass Spectrometry; Software