Results 1 - 20 of 81
1.
Anal Chem ; 2024 Sep 25.
Article in English | MEDLINE | ID: mdl-39322219

ABSTRACT

Mass-spectrometry-based proteomics has advanced with the integration of experimental and predicted spectral libraries, which have significantly improved peptide identification in complex search spaces. However, challenges persist in distinguishing some peptides with close retention times and nearly identical fragmentation patterns. In this study, we conducted a theoretical assessment to quantify the prevalence of indistinguishable peptides within the human canonical proteome and immunopeptidome using state-of-the-art retention time and spectrum prediction models. By quantifying the proportion of peptides posing challenges to unequivocal identification, we set the theoretical nonaccessible portion within a given proteome, and underscore the effectiveness of contemporary analytical methodologies in resolving the complexity of the human proteome and immunopeptidome via mass spectrometry.

3.
Nucleic Acids Res ; 52(17): 10144-10160, 2024 Sep 23.
Article in English | MEDLINE | ID: mdl-39175109

ABSTRACT

Most heritable diseases are polygenic. To comprehend the underlying genetic architecture, it is crucial to discover the clinically relevant epistatic interactions (EIs) between genomic single nucleotide polymorphisms (SNPs) (1-3). Existing statistical computational methods for EI detection are mostly limited to pairs of SNPs due to the combinatorial explosion of higher-order EIs. With NeEDL (network-based epistasis detection via local search), we leverage network medicine to inform the selection of EIs that are an order of magnitude more statistically significant compared to existing tools and consist, on average, of five SNPs. We further show that this computationally demanding task can be substantially accelerated once quantum computing hardware becomes available. We apply NeEDL to eight different diseases and discover genes (affected by EIs of SNPs) that are partly known to affect the disease; additionally, these results are reproducible across independent cohorts. EIs for these eight diseases can be interactively explored in the Epistasis Disease Atlas (https://epistasis-disease-atlas.com). In summary, NeEDL demonstrates the potential of seamlessly integrated quantum computing techniques to accelerate biomedical research. Our network medicine approach detects higher-order EIs with unprecedented statistical and biological evidence, yielding unique insights into polygenic diseases and providing a basis for the development of improved risk scores and combination therapies.


Subject(s)
Epistasis, Genetic; Polymorphism, Single Nucleotide; Humans; Quantum Theory; Multifactorial Inheritance/genetics; Disease/genetics; Computational Biology/methods; Algorithms; Genetic Predisposition to Disease
4.
J Proteomics ; 305: 105246, 2024 Aug 15.
Article in English | MEDLINE | ID: mdl-38964537

ABSTRACT

The 2023 European Bioinformatics Community for Mass Spectrometry (EuBIC-MS) Developers Meeting was held from January 15th to January 20th, 2023, at Congressi Stefano Franscini at Monte Verità in Ticino, Switzerland. The participants were scientists and developers working in computational mass spectrometry (MS), metabolomics, and proteomics. The 5-day program was split between introductory keynote lectures and parallel hackathon sessions focusing on "Artificial Intelligence in proteomics" to stimulate future directions in the MS-driven omics areas. During the latter, the participants developed bioinformatics tools and resources addressing outstanding needs in the community. The hackathons allowed less experienced participants to learn from more advanced computational MS experts and actively contribute to highly relevant research projects. We successfully produced several new tools applicable to the proteomics community by improving data analysis and facilitating future research.


Subject(s)
Mass Spectrometry; Proteomics; Proteomics/methods; Humans; Mass Spectrometry/methods; Computational Biology/methods; Metabolomics/methods; Artificial Intelligence
5.
Methods Mol Biol ; 2836: 157-181, 2024.
Article in English | MEDLINE | ID: mdl-38995541

ABSTRACT

Proteomics, the study of proteins within biological systems, has seen remarkable advancements in recent years, with protein isoform detection emerging as one of the next major frontiers. One of the primary challenges is achieving the peptide and protein coverage needed to confidently differentiate isoforms, which is complicated by the protein inference problem and the challenge of estimating protein false discovery rates in large datasets. In this chapter, we describe the application of artificial intelligence-assisted peptide property prediction for database search engine rescoring by Oktoberfest, an approach that has proven effective, particularly for complex samples and extensive search spaces, and that can greatly increase peptide coverage. Further, we illustrate a method for increasing isoform coverage with the PickedGroupFDR approach, which is designed to excel when applied to large datasets. Real-world examples are provided to illustrate the utility of the tools in the context of rescoring, protein grouping, and false discovery rate estimation. By implementing these techniques, researchers can achieve a substantial increase in both peptide and isoform coverage, thus unlocking the potential of protein isoform detection in their studies and shedding light on the roles and functions of isoforms in biological processes.


Subject(s)
Artificial Intelligence; Databases, Protein; Protein Isoforms; Proteomics; Software; Protein Isoforms/analysis; Proteomics/methods; Humans; Computational Biology/methods; Search Engine; Peptides/chemistry; Peptides/analysis; Algorithms; Proteins/chemistry; Proteins/analysis
6.
Article in English | MEDLINE | ID: mdl-39012054

ABSTRACT

Alternative splicing is a major contributor to transcriptomic complexity, but the extent to which transcript isoforms are translated into stable, functional protein isoforms is unclear. Furthermore, detection of relatively scarce isoform-specific peptides is challenging, with many protein isoforms remaining uncharted due to technical limitations. Recently, a family of advanced targeted MS strategies, termed internal standard parallel reaction monitoring (IS-PRM), has demonstrated multiplexed, sensitive detection of predefined peptides of interest. Such approaches have not yet been used to confirm the existence of novel peptides. Here, we present a targeted proteogenomic approach that leverages sample-matched long-read RNA sequencing (lrRNA-seq) data to predict potential protein isoforms with prior transcript evidence. Predicted tryptic isoform-specific peptides, which are specific to individual gene product isoforms, serve as "triggers" and "targets" in the IS-PRM method, Tomahto. Using the model human stem cell line WTC11, lrRNA-seq data were generated and used to inform the generation of synthetic standards for 192 isoform-specific peptides (114 isoforms from 55 genes). These synthetic "trigger" peptides were labeled with super heavy tandem mass tags (TMT) and spiked into TMT-labeled WTC11 tryptic digest, predicted to contain corresponding endogenous "target" peptides. Compared to DDA mode, Tomahto increased detectability of isoforms by 3.6-fold, resulting in the identification of five previously unannotated isoforms. Our method detected protein isoform expression for 43 out of 55 genes corresponding to 54 resolved isoforms. This lrRNA-seq-informed Tomahto targeted approach is a new modality for generating protein-level evidence of alternative isoforms, a critical first step in designing functional studies and eventually clinical assays.

7.
Sci Rep ; 14(1): 17214, 2024 Jul 26.
Article in English | MEDLINE | ID: mdl-39060396

ABSTRACT

Backstroke has been thoroughly investigated in the context of sports science. However, the nationalities of the fastest age group backstroke swimmers are unknown. Therefore, the present study investigated the nationalities of the fastest backstroke swimmers. For all World Masters Championships held between 1986 and 2024, the year of competition, first and last name, age, age group, stroke, and distance were recorded for each swimmer. Descriptive data were presented using mean, standard deviation, maximum and minimum values, and confidence intervals. The top ten race times for each swimming distance and sex were identified for descriptive purposes. Nationalities were then grouped into six categories: the top five nationalities with the most appearances in the backstroke swimming top ten times by distance each year and one group consisting of all other nationalities. The Kruskal-Wallis test compared nationality differences, followed by Bonferroni-adjusted pairwise comparisons to identify specific distinctions. Between 1986 and 2024, the largest share of age group backstroke swimmers (39.6%) competed in the 50 m event (n = 11,964; 6,206 women and 5,758 men), followed by the 100 m event (32.3%, n = 9,764; 5,157 women and 4,607 men) and the 200 m event (28.1%, n = 8,483; 4,511 women and 3,972 men). Germany had the highest number of top ten female swimmers in the 50 m backstroke distance. Brazil had the highest number of top ten male swimmers in the same distance. The USA had the highest number of female and male swimmers among the top ten in the 100 m and 200 m backstroke distances. Germany and Great Britain were the only countries with swimmers in the top ten for all female backstroke distances. Brazil, the USA, Italy, and Germany were the countries that had swimmers in the top ten for all male backstroke distances. In summary, the fastest backstroke age group swimmers originated from Germany, Brazil, the USA, Great Britain, and Italy, with differences between the sexes and race distances.


Subject(s)
Swimming; Humans; Male; Female; Adult; Athletic Performance/statistics & numerical data; Athletic Performance/physiology; Middle Aged; Young Adult; Athletes/statistics & numerical data; Age Factors; Aged; Adolescent
8.
BMC Genomics ; 25(1): 619, 2024 Jun 19.
Article in English | MEDLINE | ID: mdl-38898442

ABSTRACT

Plant genomics plays a pivotal role in enhancing global food security and sustainability by offering innovative solutions for improving crop yield, disease resistance, and stress tolerance. As the number of sequenced genomes grows and the accuracy and contiguity of genome assemblies improve, structural annotation of plant genomes continues to be a significant challenge due to their large size, polyploidy, and rich repeat content. In this paper, we present an overview of the current landscape in crop genomics research, highlighting the diversity of genomic characteristics across various crop species. We also assessed the accuracy of popular gene prediction tools in identifying genes within crop genomes and examined the factors that impact their performance. Our findings highlight the strengths and limitations of BRAKER2 and Helixer as leading structural genome annotation tools and underscore the impact of genome complexity, fragmentation, and repeat content on their performance. Furthermore, we evaluated the suitability of the predicted proteins as a reliable search space in proteomics studies using mass spectrometry data. Our results provide valuable insights for future efforts to refine and advance the field of structural genome annotation.


Subject(s)
Crops, Agricultural; Genome, Plant; Molecular Sequence Annotation; Proteomics; Crops, Agricultural/genetics; Proteomics/methods; Genomics/methods; Plant Proteins/genetics; Plant Proteins/metabolism
9.
Mol Cell Proteomics ; 23(7): 100798, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38871251

ABSTRACT

Rescoring of peptide spectrum matches originating from database search engines enabled by peptide property predictors is exceeding the performance of peptide identification from traditional database search engines. In contrast to the peptide spectrum match scores calculated by traditional database search engines, rescoring peptide spectrum matches generates scores based on comparing observed and predicted peptide properties, such as fragment ion intensities and retention times. These newly generated scores enable a more efficient discrimination between correct and incorrect peptide spectrum matches. This approach was shown to lead to substantial improvements in the number of confidently identified peptides, facilitating the analysis of challenging datasets in various fields such as immunopeptidomics, metaproteomics, proteogenomics, and single-cell proteomics. In this review, we summarize the key elements leading up to the recent introduction of multiple data-driven rescoring pipelines. We provide an overview of relevant post-processing rescoring tools, introduce prominent data-driven rescoring pipelines for various applications, and highlight limitations, opportunities, and future perspectives of this approach and its impact on mass spectrometry-based proteomics.
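
The comparison of observed and predicted fragment ion intensities described above can be illustrated with a small sketch. The abstract does not name a similarity metric; the normalized spectral angle used here is an assumption (one commonly used choice), and the function name and the example vectors are hypothetical.

```python
import math


def spectral_angle(observed, predicted):
    """Normalized spectral angle between two aligned intensity vectors.

    Returns 1.0 for identical direction and 0.0 for orthogonal vectors.
    Assumes both vectors list intensities for the same fragment ions in
    the same order (an assumption of this sketch).
    """
    if len(observed) != len(predicted):
        raise ValueError("intensity vectors must have the same length")
    dot = sum(o * p for o, p in zip(observed, predicted))
    norm_o = math.sqrt(sum(o * o for o in observed))
    norm_p = math.sqrt(sum(p * p for p in predicted))
    if norm_o == 0 or norm_p == 0:
        return 0.0  # no signal on one side: treat as no similarity
    # Clamp to [-1, 1] to guard against floating-point overshoot.
    cosine = max(-1.0, min(1.0, dot / (norm_o * norm_p)))
    return 1.0 - 2.0 * math.acos(cosine) / math.pi
```

A score like this could serve as one feature among several when separating correct from incorrect peptide spectrum matches; how the rescoring pipelines in the review combine such features is not specified in the abstract.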


Subject(s)
Peptides; Proteomics; Proteomics/methods; Peptides/metabolism; Peptides/chemistry; Humans; Databases, Protein; Mass Spectrometry/methods; Search Engine
10.
bioRxiv ; 2024 Jun 03.
Article in English | MEDLINE | ID: mdl-38895358

ABSTRACT

Recent developments in machine learning (ML) and deep learning (DL) have immense potential for applications in proteomics, such as generating spectral libraries, improving peptide identification, and optimizing targeted acquisition modes. Although new ML/DL models for various applications and peptide properties are frequently published, the rate at which these models are adopted by the community is slow, which is mostly due to technical challenges. We believe that, for the community to make better use of state-of-the-art models, more attention should be paid to making models easy to use and accessible to the community. To facilitate this, we developed Koina, an open-source, containerized, decentralized, and online-accessible high-performance prediction service that enables ML/DL model usage in any pipeline. Using the widely used FragPipe computational platform as an example, we show how Koina can be easily integrated with existing proteomics software tools and how these integrations improve data analysis.

11.
Nat Commun ; 15(1): 3956, 2024 May 10.
Article in English | MEDLINE | ID: mdl-38730277

ABSTRACT

Immunopeptidomics is crucial for immunotherapy and vaccine development. Because the generation of immunopeptides from their parent proteins does not adhere to clear-cut rules, known digestion patterns cannot be used; instead, every possible protein subsequence within human leukocyte antigen (HLA) class-specific length restrictions needs to be considered during sequence database searching. This leads to an inflation of the search space and results in lower spectrum annotation rates. Peptide-spectrum match (PSM) rescoring is a powerful enhancement of standard searching that boosts the spectrum annotation performance. We analyze 302,105 unique synthesized non-tryptic peptides from the ProteomeTools project on a timsTOF-Pro to generate a ground-truth dataset containing 93,227 MS/MS spectra of 74,847 unique peptides, which is used to fine-tune the deep learning-based fragment ion intensity prediction model Prosit. We demonstrate up to 3-fold improvement in the identification of immunopeptides, as well as increased detection of immunopeptides from low input samples.
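
The search-space inflation mentioned above can be made concrete with a short sketch that compares unspecific subsequence enumeration with a simple tryptic digest. The 8 to 11 residue length window, the cleavage rule (after K or R, not before P, no missed cleavages), and both function names are assumptions for illustration; the abstract gives no specific lengths or digestion settings.

```python
import re


def unspecific_peptides(protein, min_len=8, max_len=11):
    """All distinct subsequences within the length window (no cleavage rule)."""
    return {
        protein[start:start + length]
        for length in range(min_len, max_len + 1)
        for start in range(len(protein) - length + 1)
    }


def tryptic_peptides(protein, min_len=8, max_len=11):
    """Fully tryptic peptides: cleave after K or R unless followed by P.

    No missed cleavages are considered in this simplified sketch.
    """
    pieces = re.split(r"(?<=[KR])(?!P)", protein)
    return {p for p in pieces if min_len <= len(p) <= max_len}
```

Even for a single short sequence, the unspecific set grows roughly with sequence length times the number of allowed peptide lengths, whereas the tryptic set is bounded by the number of cleavage sites.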


Subject(s)
Deep Learning; Peptides; Tandem Mass Spectrometry; Humans; Peptides/chemistry; Peptides/immunology; Tandem Mass Spectrometry/methods; Databases, Protein; Proteomics/methods; HLA Antigens/immunology; HLA Antigens/genetics; Software; Ions
12.
bioRxiv ; 2024 Apr 01.
Article in English | MEDLINE | ID: mdl-38617311

ABSTRACT

Alternative splicing is a major contributor to transcriptomic complexity, but the extent to which transcript isoforms are translated into stable, functional protein isoforms is unclear. Furthermore, detection of relatively scarce isoform-specific peptides is challenging, with many protein isoforms remaining uncharted due to technical limitations. Recently, a family of advanced targeted MS strategies, termed internal standard parallel reaction monitoring (IS-PRM), has demonstrated multiplexed, sensitive detection of pre-defined peptides of interest. Such approaches have not yet been used to confirm the existence of novel peptides. Here, we present a targeted proteogenomic approach that leverages sample-matched long-read RNA sequencing (LR RNAseq) data to predict potential protein isoforms with prior transcript evidence. Predicted tryptic isoform-specific peptides, which are specific to individual gene product isoforms, serve as "triggers" and "targets" in the IS-PRM method, Tomahto. Using the model human stem cell line WTC11, LR RNAseq data were generated and used to inform the generation of synthetic standards for 192 isoform-specific peptides (114 isoforms from 55 genes). These synthetic "trigger" peptides were labeled with super heavy tandem mass tags (TMT) and spiked into TMT-labeled WTC11 tryptic digest, predicted to contain corresponding endogenous "target" peptides. Compared to DDA mode, Tomahto increased detectability of isoforms by 3.6-fold, resulting in the identification of five previously unannotated isoforms. Our method detected protein isoform expression for 43 out of 55 genes corresponding to 54 resolved isoforms. This LR RNAseq-informed Tomahto targeted approach, called LRP-IS-PRM, is a new modality for generating protein-level evidence of alternative isoforms, a critical first step in designing functional studies and eventually clinical assays.

14.
Methods Mol Biol ; 2758: 457-483, 2024.
Article in English | MEDLINE | ID: mdl-38549030

ABSTRACT

Liquid chromatography-coupled mass spectrometry (LC-MS/MS) is the primary method to obtain direct evidence for the presentation of disease- or patient-specific human leukocyte antigen (HLA) peptides. However, compared to the analysis of tryptic peptides in proteomics, the analysis of HLA peptides still poses computational and statistical challenges. Recently, fragment ion intensity-based matching scores assessing the similarity between predicted and observed spectra were shown to substantially increase the number of confidently identified peptides, particularly in use cases where non-tryptic peptides are analyzed. In this chapter, we describe in detail three procedures for using state-of-the-art deep learning models to analyze and validate single spectra, single measurements, and multiple measurements in mass spectrometry-based immunopeptidomics. For this, we explain how to use the Universal Spectrum Explorer (USE), online Oktoberfest, and offline Oktoberfest. For intensity-based scoring, Oktoberfest uses fragment ion intensity and retention time predictions from the deep learning framework Prosit, a deep neural network trained on a very large number of synthetic peptides and tandem mass spectra generated within the ProteomeTools project. The examples shown highlight how deep learning-assisted analysis can increase the number of identified HLA peptides, facilitate the discovery of confidently identified neo-epitopes, or assist in assessing the presence of cryptic peptides, such as spliced peptides.


Subject(s)
Deep Learning; Humans; Chromatography, Liquid; Tandem Mass Spectrometry/methods; Peptides/analysis; Histocompatibility Antigens Class I; HLA Antigens
15.
Nat Commun ; 15(1): 151, 2024 Jan 02.
Article in English | MEDLINE | ID: mdl-38167372

ABSTRACT

Unlike for DNA and RNA, accurate and high-throughput sequencing methods for proteins are lacking, hindering the utility of proteomics in applications where the sequences are unknown, including variant calling, neoepitope identification, and metaproteomics. We introduce Spectralis, a de novo peptide sequencing method for tandem mass spectrometry. Spectralis leverages several innovations, including a convolutional neural network layer connecting peaks in spectra spaced by amino acid masses, the proposal of fragment ion series classification as a pivotal task for de novo peptide sequencing, and a peptide-spectrum confidence score. On spectra for which database search provided a ground truth, Spectralis surpassed 40% sensitivity at 90% precision, nearly doubling state-of-the-art sensitivity. Application to unidentified spectra confirmed its superiority and showcased its applicability to variant calling. Altogether, these algorithmic innovations and the substantial sensitivity increase in the high-precision range constitute an important step toward broadly applicable peptide sequencing.
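
The notion of peaks "spaced by amino acid masses" can be illustrated with a plain lookup, shown below. This is not the Spectralis network layer, only a sketch of the underlying observation that consecutive fragments in an ion series differ by one residue mass. The function name, the tolerance, the singly charged assumption, and the restriction to five residues are all choices made for this example.

```python
# Monoisotopic residue masses in Da; a deliberately small subset for brevity.
RESIDUE_MASSES = {
    "G": 57.02146,
    "A": 71.03711,
    "S": 87.03203,
    "P": 97.05276,
    "V": 99.06841,
}


def residue_spaced_pairs(mz_values, tolerance=0.02):
    """Find peak pairs whose m/z difference matches a residue mass.

    Assumes singly charged fragments, so m/z differences equal mass
    differences. Returns (lower_mz, higher_mz, residue) tuples.
    """
    peaks = sorted(mz_values)
    pairs = []
    for i, low in enumerate(peaks):
        for high in peaks[i + 1:]:
            delta = high - low
            for residue, mass in RESIDUE_MASSES.items():
                if abs(delta - mass) <= tolerance:
                    pairs.append((low, high, residue))
    return pairs
```

Chains of such pairs hint at a partial sequence; a learned model can weigh this evidence against noise peaks, which a fixed tolerance lookup cannot do.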


Subject(s)
Deep Learning; Algorithms; Sequence Analysis, Protein/methods; Peptides/chemistry; Amino Acid Sequence
16.
Proteomics ; 24(8): e2300112, 2024 Apr.
Article in English | MEDLINE | ID: mdl-37672792

ABSTRACT

Machine learning (ML) and deep learning (DL) models for peptide property prediction such as Prosit have enabled the creation of high quality in silico reference libraries. These libraries are used in various applications, ranging from data-independent acquisition (DIA) data analysis to data-driven rescoring of search engine results. Here, we present Oktoberfest, an open source Python package of our spectral library generation and rescoring pipeline originally only available online via ProteomicsDB. Oktoberfest is largely search engine agnostic and provides access to online peptide property predictions, promoting the adoption of state-of-the-art ML/DL models in proteomics analysis pipelines. We demonstrate its ability to reproduce and even improve our results from previously published rescoring analyses on two distinct use cases. Oktoberfest is freely available on GitHub (https://github.com/wilhelm-lab/oktoberfest) and can easily be installed locally through the cross-platform PyPI Python package.


Subject(s)
Proteomics; Software; Proteomics/methods; Peptides; Algorithms
17.
medRxiv ; 2023 Nov 09.
Article in English | MEDLINE | ID: mdl-38076997

ABSTRACT

Most heritable diseases are polygenic. To comprehend the underlying genetic architecture, it is crucial to discover the clinically relevant epistatic interactions (EIs) between genomic single nucleotide polymorphisms (SNPs) (1-3). Existing statistical computational methods for EI detection are mostly limited to pairs of SNPs due to the combinatorial explosion of higher-order EIs. With NeEDL (network-based epistasis detection via local search), we leverage network medicine to inform the selection of EIs that are an order of magnitude more statistically significant compared to existing tools and consist, on average, of five SNPs. We further show that this computationally demanding task can be substantially accelerated once quantum computing hardware becomes available. We apply NeEDL to eight different diseases and discover genes (affected by EIs of SNPs) that are partly known to affect the disease; additionally, these results are reproducible across independent cohorts. EIs for these eight diseases can be interactively explored in the Epistasis Disease Atlas (https://epistasis-disease-atlas.com). In summary, NeEDL is the first application that demonstrates the potential of seamlessly integrated quantum computing techniques to accelerate biomedical research. Our network medicine approach detects higher-order EIs with unprecedented statistical and biological evidence, yielding unique insights into polygenic diseases and providing a basis for the development of improved risk scores and combination therapies.

18.
Nat Chem Biol ; 2023 Oct 30.
Article in English | MEDLINE | ID: mdl-37904048

ABSTRACT

Medicinal chemistry has discovered thousands of potent protein and lipid kinase inhibitors. These may be developed into therapeutic drugs or chemical probes to study kinase biology. Because of polypharmacology, a large part of the human kinome currently lacks selective chemical probes. To discover such probes, we profiled 1,183 compounds from drug discovery projects in lysates of cancer cell lines using Kinobeads. The resulting 500,000 compound-target interactions are available in ProteomicsDB and we exemplify how this molecular resource may be used. For instance, the data revealed several hundred reasonably selective compounds for 72 kinases. Cellular assays validated GSK986310C as a candidate SYK (spleen tyrosine kinase) probe and X-ray crystallography uncovered the structural basis for the observed selectivity of the CK2 inhibitor GW869516X. Compounds targeting PKN3 were discovered and phosphoproteomics identified substrates that indicate target engagement in cells. We anticipate that this molecular resource will aid research in drug discovery and chemical biology.

19.
Anal Chem ; 95(37): 13746-13749, 2023 Sep 19.
Article in English | MEDLINE | ID: mdl-37676919

ABSTRACT

Mass spectrometry coupled to liquid chromatography is one of the most powerful technologies for proteome quantification in biomedical samples. In peptide-centric workflows, protein mixtures are enzymatically digested to peptides prior to their analysis. However, proteome-wide quantification studies rarely identify all potential peptides for any given protein, and targeted proteomics experiments focus on a set of peptides for the proteins of interest. Consequently, proteomics relies on the use of a limited subset of all possible peptides as proxies for protein quantitation. In this work, we evaluated the stability of the human proteotypic peptides over 21 days and trained a deep learning model to predict peptide stability directly from tryptic sequences, which together constitute a resource of broad interest to prioritize and select peptides in proteome quantification experiments.


Subject(s)
Proteome; Proteomics; Humans; Peptides; Chromatography, Liquid; Mass Spectrometry
20.
Nat Commun ; 14(1): 4632, 2023 Aug 02.
Article in English | MEDLINE | ID: mdl-37532709

ABSTRACT

Systemic pan-tumor analyses may reveal the significance of common features implicated in cancer immunogenicity and patient survival. Here, we provide a comprehensive multi-omics data set for 32 patients across 25 tumor types for proteogenomic-based discovery of neoantigens. By using an optimized computational approach, we discover a large number of tumor-specific and tumor-associated antigens. To create a pipeline for the identification of neoantigens in our cohort, we combine DNA and RNA sequencing with MS-based immunopeptidomics of tumor specimens, followed by the assessment of their immunogenicity and an in-depth validation process. We detect a broad variety of non-canonical HLA-binding peptides in the majority of patients, some of which demonstrate immunogenicity. Our validation process allows for the selection of 32 potential neoantigen candidates. The majority of neoantigen candidates originate from variants identified in the RNA data set, illustrating the relevance of RNA as a still understudied source of cancer antigens. This study underlines the importance of RNA-centered variant detection for the identification of shared biomarkers and potentially relevant neoantigen candidates.


Subject(s)
Neoplasms; Proteogenomics; Humans; Neoplasms/genetics; Antigens, Neoplasm/genetics; Peptides