Results 1 - 20 of 413
1.
J Proteome Res ; 23(6): 1983-1999, 2024 Jun 07.
Article in English | MEDLINE | ID: mdl-38728051

ABSTRACT

In recent years, several deep learning-based methods have been proposed for predicting peptide fragment intensities. This study aims to provide a comprehensive assessment of six such methods, namely Prosit, DeepMass:Prism, pDeep3, AlphaPeptDeep, Prosit Transformer, and the method proposed by Guan et al. To this end, we evaluated the accuracy of the predicted intensity profiles for close to 1.7 million precursors (including both tryptic and HLA peptides) corresponding to more than 18 million experimental spectra procured from 40 independent submissions to the PRIDE repository that were acquired for different species using a variety of instruments and different dissociation types/energies. Specifically, for each method, distributions of similarity (measured by Pearson's correlation and normalized angle) between the predicted and the corresponding experimental b and y fragment intensities were generated. These distributions were used to ascertain the prediction accuracy and rank the prediction methods for particular types of experimental conditions. The effect of variables like precursor charge, length, and collision energy on the prediction accuracy was also investigated. In addition to prediction accuracy, the methods were evaluated in terms of prediction speed. The systematic assessment of these six methods may help in choosing the right method for MS/MS spectra prediction for particular needs.
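The two similarity measures named in this abstract can be sketched as follows. This is a minimal illustration, not the study's code: the function names and the intensity values are hypothetical, and the normalized angle is assumed to follow the common definition 1 - 2·arccos(cosine similarity)/π, which the abstract does not spell out.

```python
import math


def pearson_correlation(predicted, observed):
    """Pearson's correlation between two equal-length intensity vectors."""
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    return cov / math.sqrt(var_p * var_o)


def normalized_spectral_angle(predicted, observed):
    """Assumed definition: 1 - 2*arccos(cosine similarity)/pi.

    For non-negative intensities the result lies in [0, 1]; 1 means the
    two intensity profiles point in the same direction.
    """
    dot = sum(p * o for p, o in zip(predicted, observed))
    norm_p = math.sqrt(sum(p * p for p in predicted))
    norm_o = math.sqrt(sum(o * o for o in observed))
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    cosine = max(-1.0, min(1.0, dot / (norm_p * norm_o)))
    return 1.0 - 2.0 * math.acos(cosine) / math.pi


# Hypothetical b/y fragment intensities for one precursor (illustrative only).
predicted = [0.10, 0.45, 1.00, 0.30, 0.05]
observed = [0.12, 0.40, 1.00, 0.35, 0.00]
print(pearson_correlation(predicted, observed))
print(normalized_spectral_angle(predicted, observed))
```

Computing both scores per precursor and collecting them into distributions would mirror the kind of comparison the abstract describes, although the study's exact preprocessing is not given here.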


Subject(s)
Deep Learning , Humans , Peptide Fragments/chemistry , Peptide Fragments/analysis , Tandem Mass Spectrometry/methods , Tandem Mass Spectrometry/statistics & numerical data , Proteomics/methods , Proteomics/statistics & numerical data
2.
J Proteome Res ; 23(6): 2078-2089, 2024 Jun 07.
Article in English | MEDLINE | ID: mdl-38666436

ABSTRACT

Data-independent acquisition (DIA) has become a well-established method for MS-based proteomics. However, the list of options to analyze this type of data is quite extensive, and the use of spectral libraries has become an important factor in DIA data analysis. More specifically, the use of in silico predicted libraries is gaining interest. By working with a differential spike-in of human standard proteins (UPS2) in a constant yeast tryptic digest background, we evaluated the sensitivity, precision, and accuracy of the use of in silico predicted libraries in DIA data analysis workflows compared to more established workflows. Three commonly used DIA software tools, DIA-NN, EncyclopeDIA, and Spectronaut, were each tested in spectral library mode and spectral library-free mode. In spectral library mode, we used independent spectral library prediction tools PROSIT and MS2PIP together with DeepLC, next to classical data-dependent acquisition (DDA)-based spectral libraries. In total, we benchmarked 12 computational workflows for DIA. Our comparison showed that DIA-NN reached the highest sensitivity while maintaining a good compromise on the reproducibility and accuracy levels in either library-free mode or using in silico predicted libraries, pointing to a general benefit in using in silico predicted libraries.


Subject(s)
Computer Simulation , Proteomics , Software , Workflow , Proteomics/methods , Proteomics/statistics & numerical data , Humans , Reproducibility of Results , Data Analysis , Peptide Library
3.
PLoS Biol ; 18(11): e3000999, 2020 11.
Article in English | MEDLINE | ID: mdl-33253151

ABSTRACT

How do we scale biological science to the demand of next generation biology and medicine to keep track of the facts, predictions, and hypotheses? These days, enormous amounts of DNA sequence and other omics data are generated. Since these data contain the blueprint for life, it is imperative that we interpret them accurately. The abundance of DNA is only one part of the challenge. Artificial Intelligence (AI) and network methods routinely build on large screens, single cell technologies, proteomics, and other modalities to infer or predict biological functions and phenotypes associated with proteins, pathways, and organisms. As a first step, how do we systematically trace the provenance of knowledge from experimental ground truth to gene function predictions and annotations? Here, we review the main challenges in tracking the evolution of biological knowledge and propose several specific solutions to provenance and computational tracing of evidence in functional linkage networks.


Subject(s)
Big Data , Gene Regulatory Networks , Genomics/statistics & numerical data , Algorithms , Artificial Intelligence , Computational Biology , Genetic Linkage , High-Throughput Nucleotide Sequencing , Humans , Models, Genetic , Proteomics/statistics & numerical data , Synthetic Biology , Systems Biology
4.
Mol Cell ; 59(5): 867-81, 2015 Sep 03.
Article in English | MEDLINE | ID: mdl-26051181

ABSTRACT

Execution of the DNA damage response (DDR) relies upon a dynamic array of protein modifications. Using quantitative proteomics, we have globally profiled ubiquitination, acetylation, and phosphorylation in response to UV and ionizing radiation. To improve acetylation site profiling, we developed the strategy FACET-IP. Our datasets of 33,500 ubiquitination and 16,740 acetylation sites provide valuable insight into DDR remodeling of the proteome. We find that K6- and K33-linked polyubiquitination undergo bulk increases in response to DNA damage, raising the possibility that these linkages are largely dedicated to DDR function. We also show that Cullin-RING ligases mediate 10% of DNA damage-induced ubiquitination events and that EXO1 is an SCF-Cyclin F substrate in the response to UV radiation. Our extensive datasets uncover additional regulated sites on known DDR players such as PCNA and identify previously unknown DDR targets such as CENPs, underscoring the broad impact of the DDR on cellular physiology.


Subject(s)
DNA Damage , Proteomics/methods , Acetylation/radiation effects , Cullin Proteins/metabolism , DNA Repair , DNA Repair Enzymes/metabolism , Databases, Protein , Exodeoxyribonucleases/metabolism , HeLa Cells , Humans , Phosphorylation/radiation effects , Proteasome Endopeptidase Complex/metabolism , Protein Array Analysis/statistics & numerical data , Proteome/metabolism , Proteome/radiation effects , Proteomics/statistics & numerical data , Spindle Apparatus/metabolism , Ubiquitination/radiation effects
5.
PLoS Comput Biol ; 17(11): e1009161, 2021 11.
Article in English | MEDLINE | ID: mdl-34762640

ABSTRACT

Network propagation refers to a class of algorithms that integrate information from input data across connected nodes in a given network. These algorithms have wide applications in systems biology, protein function prediction, inferring condition-specifically altered sub-networks, and prioritizing disease genes. Despite the popularity of network propagation, there is a lack of comparative analyses of different algorithms on real data and little guidance on how to select and parameterize the various algorithms. Here, we address this problem by analyzing different combinations of network normalization and propagation methods and by demonstrating schemes for the identification of optimal parameter settings on real proteome and transcriptome data. Our work highlights the risk of a 'topology bias' caused by the incorrect use of network normalization approaches. Capitalizing on the fact that network propagation is a regularization approach, we show that minimizing the bias-variance tradeoff can be utilized for selecting optimal parameters. The application to real multi-omics data demonstrated that optimal parameters could also be obtained by either maximizing the agreement between different omics layers (e.g. proteome and transcriptome) or by maximizing the consistency between biological replicates. Furthermore, we exemplified the utility and robustness of network propagation on multi-omics datasets for identifying ageing-associated genes in brain and liver tissues of rats and for elucidating molecular mechanisms underlying prostate cancer progression. Overall, this work compares different network propagation approaches and it presents strategies for how to use network propagation algorithms to optimally address a specific research question at hand.
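One widely used member of this class of algorithms, random walk with restart, can be sketched as below. This is an illustrative assumption rather than the paper's implementation: the function name, the restart weighting via `alpha`, and the degree-based column normalization are choices made for this example, and the abstract itself cautions that the choice of normalization can introduce a topology bias.

```python
def propagate(adjacency, seeds, alpha=0.5, tol=1e-12, max_iter=1000):
    """Random walk with restart on an undirected network.

    adjacency: symmetric matrix (list of lists) of non-negative edge weights.
    seeds:     initial node scores, e.g. 1.0 for nodes of interest, else 0.0.
    alpha:     share of each node's score taken from its neighbours; the
               remaining (1 - alpha) is reset to the seed distribution.
    """
    n = len(adjacency)
    degree = [sum(row) for row in adjacency]
    total = sum(seeds)
    p0 = [s / total for s in seeds]  # seed scores scaled to sum to 1
    p = list(p0)
    for _ in range(max_iter):
        new_p = []
        for i in range(n):
            # Each neighbour j spreads its score evenly over its own edges
            # (column normalization by degree); isolated nodes are skipped.
            incoming = sum(
                adjacency[i][j] * p[j] / degree[j]
                for j in range(n)
                if degree[j] > 0
            )
            new_p.append(alpha * incoming + (1.0 - alpha) * p0[i])
        change = sum(abs(a - b) for a, b in zip(new_p, p))
        p = new_p
        if change < tol:
            break
    return p


# Hypothetical three-node path network 0 - 1 - 2 with node 0 as the seed.
path = [[0, 1, 0],
        [1, 0, 1],
        [0, 1, 0]]
print(propagate(path, [1.0, 0.0, 0.0]))
```

Here `alpha` plays the role of the smoothing parameter whose selection the abstract discusses; larger values spread the seed signal further across the network.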


Subject(s)
Algorithms , Computational Biology/methods , Aging/genetics , Aging/metabolism , Animals , Bias , Brain/metabolism , Computational Biology/statistics & numerical data , Data Interpretation, Statistical , Disease Progression , Gene Expression Profiling/statistics & numerical data , Gene Regulatory Networks , Genomics/statistics & numerical data , Humans , Liver/metabolism , Male , Prostatic Neoplasms/etiology , Prostatic Neoplasms/genetics , Prostatic Neoplasms/metabolism , Protein Interaction Maps , Proteomics/statistics & numerical data , RNA, Messenger/genetics , RNA, Messenger/metabolism , Rats , Systems Biology
6.
PLoS Comput Biol ; 17(2): e1008101, 2021 02.
Article in English | MEDLINE | ID: mdl-33617527

ABSTRACT

Proteases are an important class of enzymes, whose activity is central to many physiologic and pathologic processes. Detailed knowledge of protease specificity is key to understanding their function. Although many methods have been developed to profile specificities of proteases, few have the diversity and quantitative grasp necessary to fully define specificity of a protease, both in terms of substrate numbers and their catalytic efficiencies. We have developed the concept of the "selectome": the set of substrate amino acid sequences that uniquely represent the specificity of a protease. We applied it to two closely related members of the matrixin family, MMP-2 and MMP-9, by using substrate phage display coupled with Next Generation Sequencing and information theory-based data analysis. We have also derived a quantitative measure of substrate specificity, which accounts for both the number of substrates and their relative catalytic efficiencies. Using these advances greatly facilitates elucidation of substrate selectivity between closely related members of a protease family. The study also provides insight into the degree to which the catalytic cleft defines substrate recognition, thus providing a basis for overcoming two of the major challenges in the field of proteolysis: 1) development of highly selective activity probes for studying proteases with overlapping specificities, and 2) distinguishing targeted proteolysis from bystander proteolytic events.


Subject(s)
Models, Biological , Peptide Hydrolases/genetics , Peptide Hydrolases/metabolism , Amino Acid Sequence , Catalytic Domain/genetics , Computational Biology , High-Throughput Nucleotide Sequencing , Information Theory , Matrix Metalloproteinase 2/chemistry , Matrix Metalloproteinase 2/genetics , Matrix Metalloproteinase 2/metabolism , Matrix Metalloproteinase 9/chemistry , Matrix Metalloproteinase 9/genetics , Matrix Metalloproteinase 9/metabolism , Models, Molecular , Peptide Hydrolases/classification , Peptide Library , Protein Folding , Proteolysis , Proteomics/methods , Proteomics/statistics & numerical data , Substrate Specificity/genetics , Substrate Specificity/physiology
7.
J Proteome Res ; 20(3): 1457-1463, 2021 03 05.
Article in English | MEDLINE | ID: mdl-33617253

ABSTRACT

Since the outset of COVID-19, the pandemic has prompted immediate global efforts to sequence SARS-CoV-2, and over 450 000 complete genomes have been publicly deposited over the course of 12 months. Despite this, comparative nucleotide and amino acid sequence analyses often fall short in answering key questions in vaccine design. For example, the binding affinity between different ACE2 receptors and SARS-CoV-2 spike protein cannot be fully explained by amino acid similarity at ACE2 contact sites because protein structure similarities are not fully reflected by amino acid sequence similarities. To comprehensively compare protein homology, secondary structure (SS) analysis is required. While protein structure is slow and difficult to obtain, SS predictions can be made rapidly, and a well-predicted SS may serve as a viable proxy to gain biological insight. Here we review algorithms and information used in predicting protein SS to highlight its potential application in pandemics research. We also show examples of how SS predictions can be used to compare ACE2 proteins and to evaluate the zoonotic origins of viruses. As computational tools are much faster than wet-lab experiments, these applications can be important for research, especially in times when quickly obtained biological insights can help in speeding up response to pandemics.


Subject(s)
COVID-19/virology , SARS-CoV-2/chemistry , SARS-CoV-2/genetics , Spike Glycoprotein, Coronavirus/chemistry , Spike Glycoprotein, Coronavirus/genetics , Algorithms , Angiotensin-Converting Enzyme 2/chemistry , Angiotensin-Converting Enzyme 2/genetics , Animals , COVID-19/genetics , Genome, Viral , Host Microbial Interactions/genetics , Humans , Models, Molecular , Pandemics , Protein Interaction Domains and Motifs , Protein Structure, Secondary , Proteomics/statistics & numerical data , Receptors, Virus/chemistry , Receptors, Virus/genetics , SARS-CoV-2/pathogenicity , Sequence Alignment
8.
J Proteome Res ; 20(3): 1464-1475, 2021 03 05.
Article in English | MEDLINE | ID: mdl-33605735

ABSTRACT

The SARS-CoV-2 virus is the causative agent of the 2020 pandemic leading to the COVID-19 respiratory disease. With many scientific and humanitarian efforts ongoing to develop diagnostic tests, vaccines, and treatments for COVID-19, and to prevent the spread of SARS-CoV-2, mass spectrometry research, including proteomics, is playing a role in determining the biology of this viral infection. Proteomics studies are starting to lead to an understanding of the roles of viral and host proteins during SARS-CoV-2 infection, their protein-protein interactions, and post-translational modifications. This is beginning to provide insights into potential therapeutic targets or diagnostic strategies that can be used to reduce the long-term burden of the pandemic. However, the extraordinary situation caused by the global pandemic is also highlighting the need to improve mass spectrometry data and workflow sharing. We therefore describe freely available data and computational resources that can facilitate and assist the mass spectrometry-based analysis of SARS-CoV-2. We exemplify this by reanalyzing a virus-host interactome data set to detect protein-protein interactions and identify host proteins that could potentially be used as targets for drug repurposing.


Subject(s)
COVID-19/virology , Information Dissemination/methods , Mass Spectrometry/methods , SARS-CoV-2/chemistry , COVID-19/epidemiology , COVID-19 Testing/methods , COVID-19 Testing/statistics & numerical data , Computational Biology , Databases, Protein/statistics & numerical data , Drug Repositioning , Host Microbial Interactions/physiology , Humans , Mass Spectrometry/statistics & numerical data , Pandemics , Protein Interaction Domains and Motifs , Protein Interaction Maps , Protein Processing, Post-Translational , Proteomics/methods , Proteomics/statistics & numerical data , SARS-CoV-2/pathogenicity , SARS-CoV-2/physiology , Viral Proteins/chemistry , Viral Proteins/physiology , COVID-19 Drug Treatment
9.
Trends Genet ; 34(10): 790-805, 2018 10.
Article in English | MEDLINE | ID: mdl-30143323

ABSTRACT

Omics data contain signals from the molecular, physical, and kinetic inter- and intracellular interactions that control biological systems. Matrix factorization (MF) techniques can reveal low-dimensional structure from high-dimensional data that reflect these interactions. These techniques can uncover new biological knowledge from diverse high-throughput omics data in applications ranging from pathway discovery to timecourse analysis. We review exemplary applications of MF for systems-level analyses. We discuss appropriate applications of these methods, their limitations, and focus on the analysis of results to facilitate optimal biological interpretation. The inference of biologically relevant features with MF enables discovery from high-throughput data beyond the limits of current biological knowledge - answering questions from high-dimensional data that we have not yet thought to ask.


Subject(s)
Data Interpretation, Statistical , Genomics/statistics & numerical data , Proteomics/statistics & numerical data , Algorithms , Humans , Systems Biology/statistics & numerical data
10.
J Hepatol ; 75(6): 1377-1386, 2021 12.
Article in English | MEDLINE | ID: mdl-34329660

ABSTRACT

BACKGROUND & AIMS: The microenvironment of intrahepatic cholangiocarcinoma (iCCA) is hypovascularized, with an extensive lymphatic network. This leads to rapid cancer spread into regional lymph nodes and the liver parenchyma, precluding curative treatments. Herein, we investigated which factors released in the iCCA stroma drive the inhibition of angiogenesis and promote lymphangiogenesis. METHODS: Quantitative proteomics was performed on extracellular fluid (ECF) proteins extracted both from cancerous and non-cancerous tissues (NCT) of patients with iCCA. Computational biology was applied on a proteomic dataset to identify proteins involved in the regulation of vessel formation. Endothelial cells incubated with ECF from either iCCA or NCT specimens were used to assess the role of candidate proteins in 3D vascular assembly, cell migration, proliferation and viability. Angiogenesis and lymphangiogenesis were further investigated in vivo by a heterotopic transplantation of bone marrow stromal cells, along with endothelial cells in SCID/beige mice. RESULTS: Functional analysis of upregulated proteins in iCCA unveils a soluble angio-inhibitory milieu made up of thrombospondin (THBS)1, THBS2 and pigment epithelium-derived factor (PEDF). iCCA ECF was able to inhibit in vitro vessel morphogenesis and viability. Antibodies blocking THBS1, THBS2 and PEDF restored tube formation and endothelial cell viability to levels observed in NCT ECF. Moreover, in transplanted mice, the inhibition of blood vessel formation, the de novo generation of the lymphatic network and the dissemination of iCCA cells in lymph nodes were shown to depend on THBS1, THBS2 and PEDF expression. CONCLUSIONS: THBS1, THBS2 and PEDF reduce blood vessel formation and promote tumor-associated lymphangiogenesis in iCCA. Our results identify new potential targets for interventions to counteract the dissemination process in iCCA. 
LAY SUMMARY: Intrahepatic cholangiocarcinoma is a highly aggressive cancer arising from epithelial cells lining the biliary tree, characterized by dissemination into the liver parenchyma via lymphatic vessels. Herein, we show that the proteins THBS1, THBS2 and PEDF, once released in the tumor microenvironment, inhibit vascular growth, while promoting cancer-associated lymphangiogenesis. Therefore, targeting THBS1, THBS2 and PEDF may be a promising strategy to reduce cancer-associated lymphangiogenesis and counteract the invasiveness of intrahepatic cholangiocarcinoma.


Subject(s)
Angiogenesis Inducing Agents/metabolism , Cholangiocarcinoma/etiology , Lymphangiogenesis/drug effects , Thrombospondin 1/pharmacology , Thrombospondins/pharmacology , Angiogenesis Inhibitors/pharmacology , Angiogenesis Inhibitors/therapeutic use , Animals , Cholangiocarcinoma/physiopathology , Disease Models, Animal , Mice , Proteomics/methods , Proteomics/statistics & numerical data , Thrombospondin 1/administration & dosage , Thrombospondins/administration & dosage , Tumor Microenvironment/drug effects
11.
Brief Bioinform ; 20(1): 356-359, 2019 01 18.
Article in English | MEDLINE | ID: mdl-28981583

ABSTRACT

Major scientific challenges that are beyond the capability of individuals need to be addressed by multi-disciplinary and multi-institutional consortia. Examples of these endeavours include the Human Genome Project, and more recently, the Structural Genomics (SG) initiative. The SG initiative pursues the expansion of structural coverage to include at least one structural representative for each protein family to derive the remaining structures using homology modelling. However, biological function is inherently connected with protein dynamics that can be studied by knowing different structures of the same protein. This ensemble of structures provides snapshots of protein conformational diversity under native conditions. Thus, sequence redundancy in the Protein Data Bank (PDB) (i.e. crystallization of the same protein under different conditions) is an essential input contributing to experimentally based studies of protein dynamics and providing insights into protein function. In this work, we show that sequence redundancy, a key concept for exploring protein dynamics, is highly biased and fundamentally incomplete in the PDB. Additionally, our results show that dynamical behaviour of proteins cannot be inferred using homologous proteins. Minor to moderate changes in sequence can produce great differences in dynamical behaviour. Nonetheless, the structural and dynamical incompleteness of the PDB are apparently unrelated concepts in SG. While the first could be reversed by promoting the extension of the structural coverage, we would like to emphasize that further focused efforts will be needed to amend the incompleteness of the PDB in terms of dynamical information content, essential to fully understand protein function.


Subject(s)
Databases, Protein/statistics & numerical data , Computational Biology/methods , Computational Biology/statistics & numerical data , Crystallography, X-Ray , Genomics/statistics & numerical data , Humans , Molecular Dynamics Simulation , Protein Conformation , Proteins/chemistry , Proteins/genetics , Proteomics/statistics & numerical data , Sequence Homology, Amino Acid , Structural Homology, Protein
12.
Brief Bioinform ; 20(4): 1269-1279, 2019 07 19.
Article in English | MEDLINE | ID: mdl-29272335

ABSTRACT

With the recent developments in the field of multi-omics integration, the interest in factors such as data preprocessing, choice of the integration method and the number of different omics considered has increased. In this work, the impact of these factors is explored when solving the problem of sample classification, by comparing the performances of five unsupervised algorithms: Multiple Canonical Correlation Analysis, Multiple Co-Inertia Analysis, Multiple Factor Analysis, Joint and Individual Variation Explained and Similarity Network Fusion. These methods were applied to three real data sets taken from literature and several ad hoc simulated scenarios to discuss classification performance in different conditions of noise and signal strength across the data types. The impact of experimental design, feature selection and parameter training has also been evaluated to unravel important conditions that can affect the accuracy of the result.


Subject(s)
Computational Biology/methods , Systems Integration , Unsupervised Machine Learning , Algorithms , Animals , Cluster Analysis , Computer Simulation , Databases, Factual , Factor Analysis, Statistical , Genomics/statistics & numerical data , Humans , Metabolomics/statistics & numerical data , Mice , Models, Biological , Multivariate Analysis , Proteomics/statistics & numerical data , Systems Biology , Unsupervised Machine Learning/statistics & numerical data
13.
Brief Bioinform ; 20(1): 347-355, 2019 01 18.
Article in English | MEDLINE | ID: mdl-30657890

ABSTRACT

Mass spectrometry (MS)-based proteomics has undergone rapid advancements in recent years, creating challenging problems for bioinformatics. We focus on four aspects where bioinformatics plays a crucial role (and proteomics is needed for clinical application): peptide-spectrum matching (PSM) based on the new data-independent acquisition (DIA) paradigm, resolving missing proteins (MPs), dealing with biological and technical heterogeneity in data and statistical feature selection (SFS). DIA is a brute-force strategy that provides greater width and depth but, because it indiscriminately captures spectra such that signal from multiple peptides is mixed, getting good PSMs is difficult. We consider two strategies: simplification of DIA spectra to pseudo-data-dependent acquisition spectra or, alternatively, brute-force search of each DIA spectrum against known reference libraries. The MP problem arises when proteins are never (or inconsistently) detected by MS. When observed in at least one sample, imputation methods can be used to guess the approximate protein expression level. If never observed at all, network/protein complex-based contextualization provides an independent prediction platform. Data heterogeneity is a difficult problem with two dimensions: technical (batch effects), which should be removed, and biological (including demography and disease subpopulations), which should be retained. Simple normalization is seldom sufficient, while batch effect-correction algorithms may create errors. Batch effect-resistant normalization methods are a viable alternative. Finally, SFS is vital for practical applications. While many methods exist, there is no best method, and both upstream (e.g. normalization) and downstream processing (e.g. multiple-testing correction) are performance confounders. We also discuss signal detection when class effects are weak.


Subject(s)
Computational Biology/methods , Proteomics/statistics & numerical data , Algorithms , Computational Biology/statistics & numerical data , Databases, Protein/statistics & numerical data , Humans , Peptides/chemistry , Proteins/chemistry , Software , Tandem Mass Spectrometry/statistics & numerical data
14.
J Hum Genet ; 66(1): 93-102, 2021 Jan.
Article in English | MEDLINE | ID: mdl-32385339

ABSTRACT

Omics studies attempt to extract meaningful messages from large-scale and high-dimensional data sets by treating the data sets as a whole. The concept of treating data sets as a whole is important in every step of the data-handling procedures: the pre-processing step of data records, the step of statistical analyses and machine learning, translation of the outputs into human natural perceptions, and acceptance of the messages with uncertainty. For the pre-processing, methods for controlling data quality and batch effects are discussed. For the main analyses, the approaches are divided into two types and their basic concepts are discussed. The first type is the evaluation of many items individually, followed by interpretation of individual items in the context of multiple testing and combination. The second type is the extraction of fewer important aspects from the whole data records. The outputs of the main analyses are translated into natural languages with techniques such as annotation and ontology. The other technique for making the outputs perceptible is visualization. At the end of this review, one of the most important issues in the interpretation of omics data analyses is discussed. Omics studies have a large amount of information in their data sets, and every approach reveals only a very restricted aspect of the whole data sets. The understandable messages from these studies have unavoidable uncertainty.


Subject(s)
Epigenomics/statistics & numerical data , Gene Expression Profiling/statistics & numerical data , Genomics/statistics & numerical data , Metabolomics/statistics & numerical data , Proteomics/statistics & numerical data , Data Interpretation, Statistical , Epigenomics/methods , Epigenomics/standards , Gas Chromatography-Mass Spectrometry/methods , Gas Chromatography-Mass Spectrometry/standards , Gas Chromatography-Mass Spectrometry/statistics & numerical data , Gene Expression Profiling/methods , Gene Expression Profiling/standards , Genomics/methods , Genomics/standards , High-Throughput Nucleotide Sequencing/methods , High-Throughput Nucleotide Sequencing/standards , High-Throughput Nucleotide Sequencing/statistics & numerical data , Humans , Metabolomics/methods , Metabolomics/standards , Proteomics/methods , Proteomics/standards , Quality Control
15.
Biochemistry (Mosc) ; 86(3): 338-349, 2021 Mar.
Article in English | MEDLINE | ID: mdl-33838633

ABSTRACT

One of the main goals of quantitative proteomics is molecular profiling of cellular response to stress at the protein level. To perform this profiling, statistical analysis of experimental data involves multiple testing of a hypothesis about the equality of protein concentrations between the cells under normal and stress conditions. This analysis is therefore subject to the multiple testing problem, that is, the increased chance of obtaining false positive results. A number of solutions to this problem are known, yet they may lead to the loss of potentially important biological information when applied with commonly accepted thresholds of statistical significance. Using the proteomic data obtained earlier for the yeast samples containing proteins at known concentrations and the biological models of early and late cellular responses to stress, we analyzed dependences of distributions of false positive and false negative rates on the protein fold changes and thresholds of statistical significance. Based on the analysis of the density of data points in volcano plots, the Benjamini-Hochberg method, and gene ontology analysis, a visual approach for optimizing the statistical threshold and selecting differentially regulated proteins is suggested, which could be useful for researchers working in the field of quantitative proteomics.
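The Benjamini-Hochberg method mentioned in this abstract can be sketched as a step-up adjustment of p-values. This is a minimal, generic illustration: the function name and the example p-values are hypothetical and not taken from the study.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values stay
    # monotone: each is the minimum of p * m / rank over all higher ranks.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical per-protein p-values from a differential abundance test.
pvals = [0.01, 0.04, 0.03, 0.005]
adjusted = benjamini_hochberg(pvals)
significant = [q <= 0.05 for q in adjusted]
print(adjusted, significant)
```

The 0.05 cut-off used in the example is the kind of commonly accepted threshold whose choice the abstract questions; the function itself is independent of any threshold.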


Subject(s)
Astrocytes/physiology , Proteomics/standards , Saccharomyces cerevisiae/physiology , Stress, Physiological , Astrocytes/metabolism , False Positive Reactions , Humans , Proteomics/statistics & numerical data , Saccharomyces cerevisiae/metabolism
16.
Int J Mol Sci ; 22(17)2021 Sep 06.
Article in English | MEDLINE | ID: mdl-34502557

ABSTRACT

Analysis of differential abundance in proteomics data sets requires careful application of missing value imputation. Missing abundance values vary widely when performing comparisons across different sample treatments. For example, one would expect a consistent rate of "missing at random" (MAR) across batches of samples and varying rates of "missing not at random" (MNAR) depending on the inherent difference in sample treatments within the study. The missing value imputation strategy selected must therefore be the one that best accounts for both MAR and MNAR simultaneously. Several important issues must be considered when deciding the appropriate missing value imputation strategy: (1) when it is appropriate to impute data and (2) how to choose a method that reflects the combinatorial manner of MAR and MNAR that occurs in an experiment. This paper provides an evaluation of missing value imputation strategies used in proteomics and presents a case for the use of hybrid left-censored missing value imputation approaches that can handle the MNAR problem common to proteomics data.
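One way to picture a hybrid strategy of this kind is the sketch below. It is an assumption-laden illustration, not the method evaluated in the paper: the function name, the 50% missingness cut-off used to label a group as MNAR, the fixed `floor` value standing in for a left-censored estimate, and the group-mean fill for MAR values are all hypothetical choices.

```python
def hybrid_impute(values, floor, mnar_fraction=0.5):
    """Impute one protein's abundances within one treatment group.

    values:        abundances across replicates, with None for missing.
    floor:         low value assumed to represent abundances below the
                   detection limit (left-censored, MNAR-style fill).
    mnar_fraction: if more than this share of replicates is missing, the
                   gaps are treated as MNAR; otherwise as MAR.
    """
    observed = [v for v in values if v is not None]
    missing_share = 1.0 - len(observed) / len(values)
    if not observed or missing_share > mnar_fraction:
        fill = floor  # mostly absent: assume below detection limit
    else:
        fill = sum(observed) / len(observed)  # sporadic gaps: group mean
    return [fill if v is None else v for v in values]


# Hypothetical log-scale abundances for two proteins in one treatment group.
print(hybrid_impute([10.0, None, 12.0], floor=5.0))
print(hybrid_impute([None, None, 8.0], floor=5.0))
```

In practice the floor would usually be estimated from the data (for example from the low end of the observed intensity distribution) rather than passed in as a constant.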


Subject(s)
Data Accuracy , Databases, Protein/statistics & numerical data , Mass Spectrometry/methods , Proteomics/statistics & numerical data , Breast Neoplasms/metabolism , Breast Neoplasms/pathology , Cell Line, Tumor , Glucose/metabolism , Humans , Proteomics/methods , Proteomics/standards
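
As a rough illustration of the hybrid idea in the abstract above (treating sporadic gaps as MAR and mostly-missing groups as left-censored MNAR), the sketch below imputes one protein's replicate values within one condition. It is a simplified assumption-laden example, not the strategy evaluated in the paper: the function name `hybrid_impute`, the `mnar_fraction` cutoff, and the use of a single `global_min` fill value are all hypothetical choices.

```python
from statistics import mean

def hybrid_impute(values, global_min, mnar_fraction=0.5):
    """Impute missing entries (None) in one protein's replicate values
    for a single condition.

    values        -- abundances (e.g. log intensities), None where missing
    global_min    -- low value standing in for "below detection limit"
    mnar_fraction -- hypothetical cutoff: above this share of missing
                     replicates the gap is treated as MNAR
    """
    observed = [v for v in values if v is not None]
    n_missing = len(values) - len(observed)
    if n_missing == 0:
        return list(values)
    if not observed or n_missing / len(values) > mnar_fraction:
        # Mostly or entirely missing: assume left-censoring (MNAR)
        # and fill with the low value.
        fill = global_min
    else:
        # Sporadic gaps: assume MAR and fill with the mean of the
        # observed replicates.
        fill = mean(observed)
    return [fill if v is None else v for v in values]
```

Real left-censored methods usually draw the low fill values from a distribution rather than using one constant; the constant here only keeps the example short.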
17.
J Proteome Res ; 19(1): 477-492, 2020 01 03.
Article in English | MEDLINE | ID: mdl-31664839

ABSTRACT

Targeted analysis of sequential window acquisition of all theoretical mass spectra (SWATH-MS) requires a spectral library, which can be generated by shotgun mass spectrometry (MS) or from pseudo-spectra files obtained directly from SWATH-MS data. The external library generated by shotgun MS is employed in most SWATH-MS research. However, the performance of the internal library, which is constructed from pseudo-spectra files, in the targeted analysis of SWATH-MS has not been systematically evaluated. Here, we show that up to 40% of the peptides detected by the internal library did not overlap with those detected by the external library for most SWATH-MS data sets. However, the internal library did not identify extra phosphopeptides compared with the external library for phosphoproteomic SWATH-MS data. Therefore, the internal library should be incorporated into the external library for targeted analysis of nonphosphoproteomic SWATH-MS, given that it can significantly increase the number of peptides identified by SWATH-MS without requiring additional instrument measurement time.


Subject(s)
Mass Spectrometry/methods , Peptides/analysis , Proteomics/methods , Animals , Blood Proteins/analysis , Cell Line , HeLa Cells , Humans , Mass Spectrometry/statistics & numerical data , Mice , Peptide Library , Phosphoproteins/analysis , Proteomics/statistics & numerical data , Workflow
18.
J Proteome Res ; 19(1): 248-259, 2020 01 03.
Article in English | MEDLINE | ID: mdl-31697504

ABSTRACT

High-density lipoprotein (HDL) is a diverse group of particles with multiple cardioprotective functions. The HDL proteome mirrors this particle complexity. Many proteins have been described in HDL, but consistent quantification of HDL protein cargo is still a challenge. To address this issue, this work aimed to compare data-independent acquisition (DIA) and parallel reaction monitoring (PRM) methodologies in their ability to differentiate HDL subclasses through their proteomes. To this end, we first evaluated the analytical performance of DIA and PRM using labeled peptides in pooled digested HDL as a biological matrix. Next, we compared the quantification capabilities of the two methodologies for 24 proteins found in HDL2 and HDL3 from 19 apparently healthy subjects. DIA and PRM exhibited comparable linearity, accuracy, and precision. Moreover, both methodologies worked equally well, differentiating HDL subclasses' proteomes with high precision. Our findings may help to understand HDL functional diversity.


Subject(s)
Lipoproteins, HDL/blood , Proteomics/methods , Adult , Aged , Calibration , Chromatography, High Pressure Liquid/methods , Humans , Limit of Detection , Lipoproteins, HDL2/blood , Lipoproteins, HDL3/blood , Middle Aged , Proteomics/statistics & numerical data , Quality Control , Tandem Mass Spectrometry/methods , Workflow , Young Adult
19.
Proteins ; 88(11): 1413-1422, 2020 11.
Article in English | MEDLINE | ID: mdl-32519388

ABSTRACT

The Nest is a concave-shaped structural motif in proteins formed by consecutive enantiomeric left-handed (L) and right-handed (R) helical conformations of the backbone. This important motif subsumes many turn and helix capping structures and binds electron-rich ligands. Simple Nests are either RL or LR. Larger Nests (>2 residues long) may be RLR, LRL, RLRL, and so forth, and are considered to be composed of overlapping simple Nests. The larger Nests remain under-explored despite their widely known contributions to protein function. In our study, we address whether the recurrence of enantiomeric geometry in the larger Nests constrains the peptide backbone such that distinct compositional and conformational preferences are seen compared to simple Nests. Our analysis reveals the critical role of the L helical torsion angle in the formation of larger Nests. This can be observed through the higher propensity of residue or secondary structure combinations in LR and LRL backbone conformations in comparison to RL or RLR, although LR/LRL occur considerably less often. We also find that the most abundant doublets and triplets in Nests have a propensity for particular secondary structures, suggesting a strong sequence-structure relationship in the larger Nests. Overall, our analysis corroborates distinct features of simple and larger Nests. Such insights would be helpful for in vitro design of peptides and for peptidomimetic studies.


Subject(s)
Amino Acid Motifs , Bacteria/chemistry , Models, Molecular , Proteomics/statistics & numerical data , Bacteria/genetics , Databases, Protein , Datasets as Topic , Humans , Hydrogen Bonding , Protein Structure, Secondary , Stereoisomerism
20.
Brief Bioinform ; 19(5): 946-953, 2018 09 28.
Article in English | MEDLINE | ID: mdl-28369202

ABSTRACT

Biomedical researchers are often interested in computing the correlation between RNA and protein abundance. However, correlations can be computed between rows of a data matrix or between columns, and the results are not the same. The belief that these two types of correlation are estimating the same phenomenon is a special case of a well-known logical error called the ecological fallacy. In this article, we review different uses of correlation found in the literature, explain the differences between row and column correlations and argue that one of them has an undesirable interpretation in most applications. Through simulation studies and theoretical derivations, we show that the commonly used Pearson's coefficient, computed from protein and transcript data from a single sample, is only loosely related to the biological correlation that most researchers will be interested in studying. Beyond our basic exploration of the ecological fallacy, we examine how correlations are affected by relative quantification proteomics data and common normalization procedures, finding that double normalization is capable of completely masking true correlative relationships. We conclude with guidelines for properly identifying and computing consistent correlation coefficients.


Subject(s)
Proteins/genetics , Proteins/metabolism , Proteomics/statistics & numerical data , RNA/genetics , RNA/metabolism , Bias , Computational Biology/methods , Computer Simulation , Data Interpretation, Statistical , Humans , Models, Biological , Models, Statistical , Transcription, Genetic
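
The distinction between row and column correlations drawn in the abstract above can be made concrete with a toy example. The data below are invented purely for illustration (three hypothetical genes measured in three samples) and are not from the article; the helper `pearson` is a plain implementation of Pearson's coefficient.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Invented data: rows are genes, columns are samples.
rna = [[1, 2, 3], [10, 11, 12], [100, 101, 102]]
protein = [[3, 2, 1], [12, 11, 10], [102, 101, 100]]

# Row correlation: for each gene, RNA vs protein across samples.
per_gene = [pearson(r, p) for r, p in zip(rna, protein)]

# Column correlation: for each sample, RNA vs protein across genes.
per_sample = [pearson(rc, pc) for rc, pc in zip(zip(*rna), zip(*protein))]
```

In this toy data every gene's protein level falls as its RNA level rises across samples, so each per-gene correlation is strongly negative, yet within any single sample the across-gene correlation is strongly positive because the genes differ so much in overall abundance. The two quantities therefore answer different questions.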