Results 1 - 20 of 73
1.
Brief Bioinform ; 24(2)2023 03 19.
Article in English | MEDLINE | ID: mdl-36907650

ABSTRACT

Proteomic studies characterize the protein composition of complex biological samples. Despite recent advancements in mass spectrometry instrumentation and computational tools, low proteome coverage and interpretability remain challenges. To address this, we developed Proteome Support Vector Enrichment (PROSE), a fast, scalable and lightweight pipeline for scoring proteins based on orthogonal gene co-expression network matrices. PROSE utilizes simple protein lists as input, generating a standard enrichment score for all proteins, including undetected ones. In our benchmark with 7 other candidate prioritization techniques, PROSE shows high accuracy in missing protein prediction, with scores correlating strongly to corresponding gene expression data. As a further proof-of-concept, we applied PROSE to a reanalysis of the Cancer Cell Line Encyclopedia proteomics dataset, where it captures key phenotypic features, including gene dependency. We lastly demonstrated its applicability on a breast cancer clinical dataset, showing clustering by annotated molecular subtype and identification of putative drivers of triple-negative breast cancer. PROSE is available as a user-friendly Python module from https://github.com/bwbio/PROSE.


Subjects
Proteome , Proteomics , Proteomics/methods , Proteome/analysis
2.
Brief Bioinform ; 24(4)2023 07 20.
Article in English | MEDLINE | ID: mdl-37419612

ABSTRACT

Missing values (MVs) can adversely impact data analysis and machine-learning model development. We propose a novel mixed-model method for missing value imputation (MVI). This method, ProJect (short for Protein inJection), is a powerful and meaningful improvement over existing MVI methods such as Bayesian principal component analysis (PCA), probabilistic PCA, local least squares and quantile regression imputation of left-censored data. We rigorously tested ProJect on various high-throughput data types, including genomics and mass spectrometry (MS)-based proteomics. Specifically, we utilized renal cancer (RC) data acquired using DIA-SWATH, ovarian cancer (OC) data acquired using DIA-MS, and bladder (BladderBatch) and glioblastoma (GBM) microarray gene expression datasets. Our results demonstrate that ProJect consistently performs better than other referenced MVI methods. It achieves the lowest normalized root mean square error (on average, scoring 45.92% less error in RC_C, 27.37% in RC_full, 29.22% in OC, 23.65% in BladderBatch and 20.20% in GBM relative to the closest competing method) and the lowest Procrustes sum of squared error (Procrustes SS) (exhibiting 79.71% less error in RC_C, 38.36% in RC_full, 18.13% in OC, 74.74% in BladderBatch and 30.79% in GBM compared to the next best method). ProJect also leads with the highest correlation coefficient among all types of MV combinations (0.64% higher in RC_C, 0.24% in RC_full, 0.55% in OC, 0.39% in BladderBatch and 0.27% in GBM versus the second-best performing method). ProJect's key strength is its ability to handle different types of MVs commonly found in real-world data. Unlike most MVI methods that are designed to handle only one type of MV, ProJect employs a decision-making algorithm that first determines if an MV is missing at random or missing not at random. It then employs targeted imputation strategies for each MV type, resulting in more accurate and reliable imputation outcomes.
An R implementation of ProJect is available at https://github.com/miaomiao6606/ProJect.
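
The normalized root mean square error mentioned in this abstract can be illustrated with a short sketch. The normalisation used here (RMSE divided by the standard deviation of the held-out true values) is one common convention and is an assumption; the abstract does not say which definition the ProJect evaluation uses. The function name and the toy intensities are hypothetical.

```python
import math
from statistics import pvariance


def nrmse(true_values, imputed_values):
    """RMSE over held-out entries, divided by the std. dev. of the true values.

    The normalisation is an assumed convention, not taken from the paper.
    """
    if len(true_values) != len(imputed_values) or len(true_values) < 2:
        raise ValueError("need two equally long sequences with at least 2 values")
    variance = pvariance(true_values)
    if variance == 0:
        raise ValueError("true values are constant; NRMSE is undefined")
    mse = sum((t - i) ** 2 for t, i in zip(true_values, imputed_values)) / len(true_values)
    return math.sqrt(mse / variance)


# Hypothetical intensities that were masked on purpose and then imputed.
held_out = [10.0, 12.0, 14.0, 16.0]
close_imputed = [10.5, 11.5, 14.5, 15.5]   # a method that tracks the truth
mean_imputed = [13.0, 13.0, 13.0, 13.0]    # plain mean imputation

print("close:", nrmse(held_out, close_imputed))
print("mean :", nrmse(held_out, mean_imputed))
```

Under this convention, mean imputation scores exactly 1, so values below 1 indicate an improvement over replacing every missing entry with the mean.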


Subjects
Algorithms , Genomics , Bayes Theorem , Oligonucleotide Array Sequence Analysis/methods , Mass Spectrometry/methods
3.
Bioinformatics ; 40(6)2024 Jun 03.
Article in English | MEDLINE | ID: mdl-38889277

ABSTRACT

MOTIVATION: Deep graph learning (DGL) has been widely employed in the realm of ligand-based virtual screening. Within this field, a key hurdle is the existence of activity cliffs (ACs), where minor chemical alterations can lead to significant changes in bioactivity. In response, several DGL models have been developed to enhance ligand bioactivity prediction in the presence of ACs. Yet, there remains a largely unexplored opportunity within ACs for optimizing ligand bioactivity, making it an area ripe for further investigation. RESULTS: We present a novel approach to simultaneously predict and optimize ligand bioactivities through DGL and ACs (OLB-AC). OLB-AC possesses the capability to optimize ligand molecules located near ACs, providing a direct reference for optimizing ligand bioactivities with the matching of original ligands. To accomplish this, a novel attentive graph reconstruction neural network and ligand optimization scheme are proposed. The attentive graph reconstruction neural network reconstructs original ligands and optimizes them through adversarial representations derived from their bioactivity prediction process. Experimental results on nine drug targets reveal that out of the 667 molecules generated through OLB-AC optimization on datasets comprising 974 low-activity, noninhibitor, or highly toxic ligands, 49 are recognized as known highly active, inhibitor, or nontoxic ligands beyond the datasets' scope. Of the 49 matched molecular pairs generated by OLB-AC, 27 reveal novel transformations not present in their training sets. The adversarial representations employed for ligand optimization originate from the gradients of bioactivity predictions. Therefore, we also assess OLB-AC's prediction accuracy across 33 different bioactivity datasets. Results show that OLB-AC achieves the best Pearson correlation coefficient (r²) on 27/33 datasets, with an average improvement of 7.2%-22.9% against the state-of-the-art bioactivity prediction methods.
AVAILABILITY AND IMPLEMENTATION: The code and dataset developed in this work are available at github.com/Yueming-Yin/OLB-AC.


Subjects
Deep Learning , Ligands , Neural Networks, Computer , Drug Discovery/methods
4.
Proteomics ; 24(1-2): e2200332, 2024 Jan.
Article in English | MEDLINE | ID: mdl-37876146

ABSTRACT

This article summarizes the PROTREC method and investigates the impact that the different hyper-parameters have on the task of missing protein prediction using PROTREC. We evaluate missing protein recovery rates using different PROTREC score selection approaches (MAX, MIN, MEDIAN, and MEAN), different PROTREC score thresholds, as well as different complex size thresholds. In addition, we included two additional cancer datasets in our analysis and introduced a new validation method to check both the robustness of the PROTREC method as well as the correctness of our analysis. Our analysis showed that the missing protein recovery rate can be improved by adopting PROTREC score selection operations of MIN, MEDIAN, and MEAN instead of the default MAX. However, this may come at a cost of reduced numbers of proteins predicted and validated. Users should therefore choose their hyper-parameters carefully to find a balance in the accuracy-quantity trade-off. We also explored the possibility of combining PROTREC with a p-value-based method (FCS) and demonstrated that PROTREC is able to perform well independently without any help from a p-value-based method. Furthermore, we conducted a downstream enrichment analysis to understand the biological pathways and protein networks within the cancerous tissues using the recovered proteins. The missing protein recovery rate using PROTREC can be improved by selecting a different PROTREC score selection method. Different PROTREC score selection methods and other hyper-parameters, such as the PROTREC score threshold and complex size threshold, introduce an accuracy-quantity trade-off. PROTREC is able to perform well independently of any filtering using a p-value-based method. The PROTREC method was verified on additional cancer datasets. Downstream enrichment analysis was used to understand the biological pathways and protein networks in cancerous tissues.
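
The accuracy-quantity trade-off described in this abstract can be illustrated with a simplified sketch. It is not the PROTREC implementation: it only assumes that each protein complex already has a score, that a protein inherits the scores of every complex it belongs to, and that one of MAX, MIN, MEDIAN or MEAN collapses them into a single value before thresholding. All names, scores and thresholds below are hypothetical.

```python
from statistics import mean, median

# Score selection operations named in the abstract.
AGGREGATORS = {"MAX": max, "MIN": min, "MEDIAN": median, "MEAN": mean}


def select_proteins(complexes, complex_scores, mode="MAX",
                    score_threshold=0.9, min_complex_size=3):
    """Return {protein: score} for proteins whose aggregated score passes the threshold.

    complexes: {complex_name: set of member proteins}
    complex_scores: {complex_name: precomputed score (assumed given)}
    """
    scores_per_protein = {}
    for name, members in complexes.items():
        if len(members) < min_complex_size:
            continue  # complex size threshold: ignore small complexes
        for protein in members:
            scores_per_protein.setdefault(protein, []).append(complex_scores[name])

    aggregate = AGGREGATORS[mode]
    selected = {}
    for protein, scores in scores_per_protein.items():
        value = aggregate(scores)
        if value >= score_threshold:
            selected[protein] = value
    return selected


# Toy data: P1 sits in one high-scoring and one low-scoring complex.
complexes = {
    "C1": {"P1", "P2", "P3"},
    "C2": {"P1", "P4", "P5"},
    "C3": {"P6", "P7"},          # dropped by the size threshold
}
complex_scores = {"C1": 0.95, "C2": 0.60, "C3": 0.99}

by_max = select_proteins(complexes, complex_scores, mode="MAX")
by_min = select_proteins(complexes, complex_scores, mode="MIN")
print(sorted(by_max), sorted(by_min))
```

In this toy case MAX keeps P1 because of its best complex, while MIN drops it because of its worst one, which mirrors the stricter but smaller prediction sets the abstract reports for the non-default operations.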


Subjects
Algorithms , Neoplasms , Humans
5.
PLoS Comput Biol ; 19(3): e1010961, 2023 03.
Article in English | MEDLINE | ID: mdl-36930671

ABSTRACT

In mass spectrometry (MS)-based proteomics, protein inference from identified peptides (protein fragments) is a critical step. We present ProInfer (Protein Inference), a novel protein assembly method that takes advantage of information in biological networks. ProInfer assists recovery of proteins supported only by ambiguous peptides (peptides that map to more than one candidate protein) and enhances the statistical confidence for proteins supported by both unique and ambiguous peptides. Consequently, ProInfer rescues weakly supported proteins, thereby improving proteome coverage. Evaluated across THP1 cell line, lung cancer and RAW267.4 datasets, ProInfer always infers the largest number of true positives in comparison with the mainstream protein inference tools Fido, EPIFANY and PIA. ProInfer is also adept at retrieving differentially expressed proteins, signifying its usefulness for functional analysis and phenotype profiling. The source code of ProInfer is available at https://github.com/PennHui2016/ProInfer.
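
The distinction between unique and ambiguous peptides can be made concrete with a small bookkeeping sketch. It does not reproduce ProInfer's network-based scoring; it only counts, for each candidate protein, how many supporting peptides are unique and how many are shared. The peptide and protein identifiers are hypothetical.

```python
def support_summary(peptide_to_proteins):
    """Count unique and ambiguous supporting peptides per candidate protein.

    peptide_to_proteins: {peptide: list of proteins the peptide maps to}
    A peptide is 'unique' if it maps to exactly one protein, else 'ambiguous'.
    """
    summary = {}
    for peptide, proteins in peptide_to_proteins.items():
        kind = "unique" if len(set(proteins)) == 1 else "ambiguous"
        for protein in set(proteins):
            counts = summary.setdefault(protein, {"unique": 0, "ambiguous": 0})
            counts[kind] += 1
    return summary


# Toy mapping: PEPB and PEPC are shared between candidate proteins.
peptide_to_proteins = {
    "PEPA": ["P1"],
    "PEPB": ["P1", "P2"],
    "PEPC": ["P2", "P3"],
}

summary = support_summary(peptide_to_proteins)

# Proteins with no unique evidence are the ones the abstract calls
# "supported only by ambiguous peptides".
only_ambiguous = sorted(p for p, c in summary.items() if c["unique"] == 0)
print(summary)
print(only_ambiguous)
```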


Subjects
Algorithms , Peptides , Peptides/chemistry , Proteome/analysis , Mass Spectrometry , Proteomics/methods , Databases, Protein , Software
6.
Bioinformatics ; 38(23): 5307-5314, 2022 11 30.
Article in English | MEDLINE | ID: mdl-36264128

ABSTRACT

MOTIVATION: Differentiating 12 stages of the mouse seminiferous epithelial cycle is vital towards understanding the dynamic spermatogenesis process. However, it is challenging since two adjacent spermatogenic stages are morphologically similar. Distinguishing Stages I-III from Stages IV-V is important for histologists to understand sperm development in wildtype mice and spermatogenic defects in infertile mice. To achieve this, we propose a novel pipeline for computerized spermatogenesis staging (CSS). RESULTS: The CSS pipeline comprises four parts: (i) A seminiferous tubule segmentation model is developed to extract every single tubule; (ii) A multi-scale learning (MSL) model is developed to integrate local and global information of a seminiferous tubule to distinguish Stages I-V from Stages VI-XII; (iii) a multi-task learning (MTL) model is developed to segment the multiple testicular cells for Stages I-V without an exhaustive requirement for manual annotation; (iv) A set of 204D image-derived features is developed to discriminate Stages I-III from Stages IV-V by capturing cell-level and image-level representation. Experimental results suggest that the proposed MSL and MTL models outperform classic single-scale and single-task models when manual annotation is limited. In addition, the proposed image-derived features are discriminative between Stages I-III and Stages IV-V. In conclusion, the CSS pipeline can not only provide histologists with a solution to facilitate quantitative analysis for spermatogenesis stage identification but also help them to uncover novel computerized image-derived biomarkers. AVAILABILITY AND IMPLEMENTATION: https://github.com/jydada/CSS. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Semen , Spermatogenesis , Mice , Male , Animals , Seminiferous Tubules , Testis/anatomy & histology
7.
BMC Biol ; 20(1): 222, 2022 10 05.
Article in English | MEDLINE | ID: mdl-36199058

ABSTRACT

BACKGROUND: Progesterone receptor (PGR) is a master regulator of uterine function through antagonistic and synergistic interplays with oestrogen receptors. PGR action is primarily mediated by activation functions AF1 and AF2, but their physiological significance is unknown. RESULTS: We report the first study of AF1 function in mice. The AF1 mutant mice are infertile with impaired implantation and decidualization. This is associated with a delay in the cessation of epithelial proliferation and in the initiation of stromal proliferation at preimplantation. Despite tissue selective effect on PGR target genes, AF1 mutations caused global loss of the antioestrogenic activity of progesterone in both pregnant and ovariectomized models. Importantly, the study provides evidence that PGR can exert an antioestrogenic effect by genomic inhibition of Esr1 and Greb1 expression. ChIP-Seq data mining reveals intermingled PGR and ESR1 binding on Esr1 and Greb1 gene enhancers. Chromatin conformation analysis shows reduced interactions in these genes' loci in the mutant, coinciding with their upregulations. CONCLUSION: AF1 mediates genomic inhibition of ESR1 action globally whilst it also has tissue-selective effect on PGR target genes.


Subjects
Progesterone , Receptors, Progesterone , Animals , Chromatin/metabolism , Endometrium/metabolism , Estrogens/metabolism , Estrogens/pharmacology , Female , Furylfuramide/metabolism , Furylfuramide/pharmacology , Mice , Pregnancy , Progesterone/metabolism , Progesterone/pharmacology , Receptors, Estrogen/genetics , Receptors, Estrogen/metabolism , Receptors, Progesterone/genetics , Receptors, Progesterone/metabolism , Uterus/metabolism
8.
Proteomics ; 22(23-24): e2200092, 2022 12.
Article in English | MEDLINE | ID: mdl-36349819

ABSTRACT

Proteomics data are often plagued with missingness issues. These missing values (MVs) threaten the integrity of subsequent statistical analyses by reduction of statistical power, introduction of bias, and failure to represent the true sample. Over the years, several categories of missing value imputation (MVI) methods have been developed and adapted for proteomics data. These MVI methods perform their tasks based on different prior assumptions (e.g., data is normally or independently distributed) and operating principles (e.g., the algorithm is built to address random missingness only), resulting in varying levels of performance even when dealing with the same dataset. Thus, to achieve a satisfactory outcome, a suitable MVI method must be selected. To guide decision making on a suitable MVI method, we provide a decision chart which facilitates strategic considerations on datasets presenting different characteristics. We also bring attention to other issues that can impact proper MVI, such as the presence of confounders (e.g., batch effects), which can influence MVI performance. Thus, these too should be considered during or before MVI.


Subjects
Algorithms , Proteomics
9.
Brief Bioinform ; 20(1): 347-355, 2019 01 18.
Article in English | MEDLINE | ID: mdl-30657890

ABSTRACT

Mass spectrometry (MS)-based proteomics has undergone rapid advancements in recent years, creating challenging problems for bioinformatics. We focus on four aspects where bioinformatics plays a crucial role (and proteomics is needed for clinical application): peptide-spectra matching (PSM) based on the new data-independent acquisition (DIA) paradigm, resolving missing proteins (MPs), dealing with biological and technical heterogeneity in data and statistical feature selection (SFS). DIA is a brute-force strategy that provides greater width and depth but, because it indiscriminately captures spectra such that signal from multiple peptides is mixed, getting good PSMs is difficult. We consider two strategies: simplification of DIA spectra to pseudo-data-dependent acquisition spectra or, alternatively, brute-force search of each DIA spectrum against known reference libraries. The MP problem arises when proteins are never (or inconsistently) detected by MS. When a protein is observed in at least one sample, imputation methods can be used to guess the approximate protein expression level. If it is never observed at all, network/protein complex-based contextualization provides an independent prediction platform. Data heterogeneity is a difficult problem with two dimensions: technical (batch effects), which should be removed, and biological (including demography and disease subpopulations), which should be retained. Simple normalization is seldom sufficient, while batch effect-correction algorithms may create errors. Batch effect-resistant normalization methods are a viable alternative. Finally, SFS is vital for practical applications. While many methods exist, there is no best method, and both upstream (e.g. normalization) and downstream processing (e.g. multiple-testing correction) are performance confounders. We also discuss signal detection when class effects are weak.


Subjects
Computational Biology/methods , Proteomics/statistics & numerical data , Algorithms , Computational Biology/statistics & numerical data , Databases, Protein/statistics & numerical data , Humans , Peptides/chemistry , Proteins/chemistry , Software , Tandem Mass Spectrometry/statistics & numerical data
11.
Mol Cancer ; 17(1): 152, 2018 10 20.
Article in English | MEDLINE | ID: mdl-30342537

ABSTRACT

Overcoming multidrug resistance (MDR) has always been a major challenge in cancer treatment. Recent evidence suggested that epithelial-mesenchymal transition (EMT) plays a role in MDR, but the mechanism behind this link remains unclear. We found that the expression of multiple ABC transporters was elevated in concordance with an increased drug efflux in cancer cells during EMT. The metastasis-related angiopoietin-like 4 (ANGPTL4) elevates cellular ATP to transcriptionally upregulate ABC transporter expression via the Myc and NF-κB signaling pathways. ANGPTL4 deficiency reduced the IC50 of anti-tumor drugs and enhanced apoptosis of cancer cells. In vivo suppression of ANGPTL4 led to higher accumulation of cisplatin-DNA adducts in primary and metastasized tumors, and a reduced metastatic tumor load. ANGPTL4 conferred metabolic flexibility on cancer cells during EMT, securing ample cellular energy that fuels multiple ABC transporters to confer EMT-mediated chemoresistance. This suggests that metabolic strategies aimed at suppressing ABC transporters, along with energy deprivation of EMT cancer cells, may overcome drug resistance.


Subjects
Angiopoietin-Like Protein 4/antagonists & inhibitors , Angiopoietin-Like Protein 4/metabolism , Antineoplastic Agents/pharmacology , Drug Resistance, Neoplasm , Energy Metabolism/drug effects , Neoplasms/metabolism , ATP-Binding Cassette Transporters/genetics , ATP-Binding Cassette Transporters/metabolism , Adenosine Triphosphate/metabolism , Angiopoietin-Like Protein 4/genetics , Animals , Cell Line, Tumor , Epithelial-Mesenchymal Transition/drug effects , Epithelial-Mesenchymal Transition/genetics , Humans , Mice , Neoplasms/drug therapy , Neoplasms/genetics
12.
Proteomics ; 17(10): e1700093, 2017 May.
Article in English | MEDLINE | ID: mdl-28390171

ABSTRACT

Identifying reproducible yet relevant protein features in proteomics data is a major challenge. Analysis at the level of protein complexes can resolve this issue and we have developed a suite of feature-selection methods collectively referred to as Rank-Based Network Analysis (RBNA). RBNAs differ in their individual statistical test setup but are similar in the sense that they deploy rank-defined weights among proteins per sample. This procedure is known as gene fuzzy scoring. Currently, no RBNA exists for paired-sample scenarios where both control and test tissues originate from the same source (e.g. same patient). It is expected that paired tests, when used appropriately, are more powerful than approaches intended for unpaired samples. We report that the class-paired RBNA, PPFSNET, dominates in both simulated and real data scenarios. Moreover, for the first time, we explicitly incorporate batch-effect resistance as an additional evaluation criterion for feature-selection approaches. Batch effects are class irrelevant variations arising from different handlers or processing times, and can obfuscate analysis. We demonstrate that PPFSNET and an earlier RBNA, PFSNET, are particularly resistant against batch effects, and only select features strongly correlated with class but not batch.
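
The rank-defined weighting mentioned in this abstract can be sketched as follows. The sketch assumes a commonly described fuzzy scheme: proteins in the top band of a sample's abundance ranking receive weight 1, a middle band is linearly interpolated towards 0, and everything else receives 0. The two cut-offs, the function name and the toy abundances are illustrative assumptions, not values taken from the abstract.

```python
def fuzzy_rank_weights(abundances, top_fraction=0.2, cutoff_fraction=0.5):
    """Assign rank-based weights in [0, 1] to the proteins of one sample.

    abundances: {protein: abundance} for a single sample.
    top_fraction / cutoff_fraction: assumed cut-offs on the rank fraction.
    """
    if not 0 < top_fraction < cutoff_fraction <= 1:
        raise ValueError("require 0 < top_fraction < cutoff_fraction <= 1")

    ranked = sorted(abundances, key=abundances.get, reverse=True)
    total = len(ranked)
    weights = {}
    for position, protein in enumerate(ranked, start=1):
        fraction = position / total  # 1/total for the most abundant protein
        if fraction <= top_fraction:
            weights[protein] = 1.0
        elif fraction <= cutoff_fraction:
            # Linear interpolation from 1 (at top_fraction) down to 0 (at cutoff).
            weights[protein] = (cutoff_fraction - fraction) / (cutoff_fraction - top_fraction)
        else:
            weights[protein] = 0.0
    return weights


# Toy sample: P1 is the most abundant protein, P10 the least.
sample = {f"P{i}": 110 - 10 * i for i in range(1, 11)}
weights = fuzzy_rank_weights(sample)
print(weights)
```

Because the weights depend only on within-sample ranks, a shift or scaling that preserves the ordering of proteins in a sample leaves them unchanged, which is one plausible reading of why such methods are described as resistant to batch effects.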

13.
J Proteome Res ; 16(8): 3102-3112, 2017 08 04.
Article in English | MEDLINE | ID: mdl-28664733

ABSTRACT

Protein complex-based feature selection (PCBFS) provides unparalleled reproducibility with high phenotypic relevance on proteomics data. Currently, there are five PCBFS paradigms, but not all representative methods have been implemented or made readily available. To allow general users to take advantage of these methods, we developed the R-package NetProt, which provides implementations of representative feature-selection methods. NetProt also provides methods for generating simulated differential data and generating pseudocomplexes for complex-based performance benchmarking. The NetProt open source R package is available for download from https://github.com/gohwils/NetProt/releases/ , and online documentation is available at http://rpubs.com/gohwils/204259 .


Subjects
Multiprotein Complexes/analysis , Proteomics/methods , Benchmarking , Computational Biology/methods , Humans , Methods , Reproducibility of Results , Software
14.
BMC Genomics ; 18(Suppl 2): 142, 2017 03 14.
Article in English | MEDLINE | ID: mdl-28361693

ABSTRACT

BACKGROUND: In proteomics, batch effects are technical sources of variation that confound proper analysis, preventing effective deployment in clinical and translational research. RESULTS: Using simulated and real data, we demonstrate that existing batch effect-correction methods do not always eradicate all batch effects. Worse still, they may alter data integrity and introduce false positives. Moreover, although principal component analysis (PCA) is commonly used for detecting batch effects, the principal components (PCs) themselves may be used as differential features, from which relevant differential proteins may be effectively traced. Batch effects are removable by identifying PCs highly correlated with batch but not class effect. However, neither PC-based nor existing batch effect-correction methods adequately address subtle batch effects, which are difficult to eradicate, and both involve data transformation and/or projection, which is error-prone. To address this, we introduce the concept of batch-effect resistant methods and demonstrate how such methods incorporating protein complexes are particularly resistant to batch effect without compromising data integrity. CONCLUSIONS: Protein complex-based analyses are powerful, offering unparalleled differential protein-selection reproducibility and high prediction accuracy. We demonstrate for the first time their innate resistance against batch effects, even subtle ones. As complex-based analyses require no prior data transformation (e.g. batch-effect correction), data integrity is protected. Individual checks on top-ranked protein complexes confirm strong association with phenotype classes and not batch. Therefore, the constituent proteins of these complexes are more likely to be clinically relevant.
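
The idea of identifying PCs that are highly correlated with batch but not with class can be sketched as below. The sketch assumes the per-sample PC scores have already been computed by some PCA routine, that batch and class are binary labels, and that absolute Pearson correlation with hypothetical thresholds is an acceptable way to flag a component. None of these choices is taken from the paper.

```python
import math


def pearson(x, y):
    """Plain Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0
    return cov / (norm_x * norm_y)


def batch_dominated_pcs(pc_scores, batch, class_labels,
                        batch_min=0.8, class_max=0.3):
    """Return indices of PCs with |r(batch)| >= batch_min and |r(class)| <= class_max.

    pc_scores: list of PCs, each a list with one score per sample.
    The two thresholds are illustrative assumptions.
    """
    flagged = []
    for index, scores in enumerate(pc_scores):
        r_batch = abs(pearson(scores, batch))
        r_class = abs(pearson(scores, class_labels))
        if r_batch >= batch_min and r_class <= class_max:
            flagged.append(index)
    return flagged


# Toy design: 8 samples, batch and class are balanced and crossed.
batch = [0, 0, 0, 0, 1, 1, 1, 1]
class_labels = [0, 1, 0, 1, 0, 1, 0, 1]
pc1 = [-2.1, -1.9, -2.0, -2.2, 2.0, 2.1, 1.9, 2.2]   # separates the batches
pc2 = [-1.0, 1.1, -0.9, 1.0, -1.1, 0.9, -1.0, 1.0]   # separates the classes

flagged = batch_dominated_pcs([pc1, pc2], batch, class_labels)
print(flagged)
```

A check like this only works when batch and class are not confounded in the study design; if every sample of one class came from one batch, no correlation threshold could separate the two effects.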


Subjects
Kidney Neoplasms/chemistry , Neoplasm Proteins/chemistry , Principal Component Analysis , Proteomics/statistics & numerical data , Cluster Analysis , Humans , Protein Binding , Protein Multimerization , Proteomics/methods , Reproducibility of Results , Specimen Handling/standards
15.
J Neurochem ; 140(4): 613-628, 2017 02.
Article in English | MEDLINE | ID: mdl-27935040

ABSTRACT

The brain adapts to dynamic environmental conditions by altering its epigenetic state, thereby influencing neuronal transcriptional programs. An example of an epigenetic modification is protein methylation, catalyzed by protein arginine methyltransferases (PRMT). One member, Prmt8, is selectively expressed in the central nervous system during a crucial phase of early development, but little else is known regarding its function. We hypothesize Prmt8 plays a role in synaptic maturation during development. To evaluate this, we used a proteome-wide approach to characterize the synaptic proteome of Prmt8 knockout versus wild-type mice. Through comparative network-based analyses, proteins and functional clusters related to neurite development were identified to be differentially regulated between the two genotypes. One interesting protein that was differentially regulated was tenascin-R (TNR). Chromatin immunoprecipitation demonstrated binding of PRMT8 to the tenascin-r (Tnr) promoter. TNR, a component of perineuronal nets, preserves structural integrity of synaptic connections within neuronal networks during the development of visual-somatosensory cortices. On closer inspection, Prmt8 removal increased net formation and decreased inhibitory parvalbumin-positive (PV+) puncta on pyramidal neurons, thereby hindering the maturation of circuits. Consequently, visual acuity of the knockout mice was reduced. Our results demonstrated Prmt8's involvement in synaptic maturation and its prospect as an epigenetic modulator of developmental neuroplasticity by regulating structural elements such as the perineuronal nets.


Subjects
Epigenesis, Genetic/physiology , Nerve Net/physiology , Protein-Arginine N-Methyltransferases/deficiency , Proteome/biosynthesis , Synapses/metabolism , Animals , Discrimination Learning/physiology , Female , Gene Regulatory Networks/physiology , Male , Mice , Mice, Inbred C57BL , Mice, Knockout , Mice, Transgenic , Protein-Arginine N-Methyltransferases/genetics , Proteome/genetics , Synapses/genetics , Visual Cortex/cytology , Visual Cortex/physiology
16.
J Proteome Res ; 15(9): 3167-79, 2016 09 02.
Article in English | MEDLINE | ID: mdl-27454466

ABSTRACT

Despite advances in proteomic technologies, idiosyncratic data issues, for example, incomplete coverage and inconsistency, resulting in large data holes, persist. Moreover, because of naïve reliance on statistical testing and its accompanying p values, differential protein signatures identified from such proteomics data have little diagnostic power. Thus, deploying conventional analytics on proteomics data is insufficient for identifying novel drug targets or precise yet sensitive biomarkers. Complex-based analysis is a new analytical approach that has potential to resolve these issues but requires formalization. We categorize complex-based analysis into five method classes or paradigms and propose an even-handed yet comprehensive evaluation rubric based on both simulated and real data. The first four paradigms are well represented in the literature. The fifth and newest paradigm, the network-paired (NP) paradigm, represented by a method called Extremely Small SubNET (ESSNET), dominates in precision-recall and reproducibility, maintains strong performance in small sample sizes, and sensitively detects low-abundance complexes. In contrast, the commonly used over-representation analysis (ORA) and direct-group (DG) test paradigms maintain good overall precision but have severe reproducibility issues. The other two paradigms considered here are the hit-rate and rank-based network analysis paradigms; both of these have good precision-recall and reproducibility, but they do not consider low-abundance complexes. Therefore, given its strong performance, NP/ESSNET may prove to be a useful approach for improving the analytical resolution of proteomics data. Additionally, given its stability, it may also be a powerful new approach toward functional enrichment tests, much like its ORA and DG counterparts.


Subjects
Metabolic Networks and Pathways , Proteomics/methods , Computational Biology/methods , Humans , Protein Interaction Maps , Proteomics/standards , Reproducibility of Results , Sample Size
17.
Nat Commun ; 15(1): 3922, 2024 May 09.
Article in English | MEDLINE | ID: mdl-38724498

ABSTRACT

Identification of differentially expressed proteins in a proteomics workflow typically encompasses five key steps: raw data quantification, expression matrix construction, matrix normalization, missing value imputation (MVI), and differential expression analysis. The plethora of options in each step makes it challenging to identify optimal workflows that maximize the identification of differentially expressed proteins. To identify optimal workflows and their common properties, we conduct an extensive study involving 34,576 combinatoric experiments on 24 gold standard spike-in datasets. Applying frequent pattern mining techniques to top-ranked workflows, we uncover high-performing rules that demonstrate optimality has conserved properties. Via machine learning, we confirm optimal workflows are indeed predictable, with average cross-validation F1 scores and Matthews correlation coefficients surpassing 0.84. We introduce an ensemble inference to integrate results from individual top-performing workflows for expanding differential proteome coverage and resolving inconsistencies. Ensemble inference provides gains in pAUC (up to 4.61%) and G-mean (up to 11.14%) and facilitates effective aggregation of information across varied quantification approaches such as topN, directLFQ, MaxLFQ intensities, and spectral counts. However, further development and evaluation are needed to establish acceptable frameworks for conducting ensemble inference on multiple proteomics workflows.
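
Three of the metrics named in this abstract (F1 score, Matthews correlation coefficient and G-mean) can be computed from a confusion matrix as sketched below. The formulas are the standard textbook definitions, with G-mean taken as the geometric mean of sensitivity and specificity; that the study uses exactly these definitions is an assumption, and the counts are hypothetical.

```python
import math


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall."""
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


def matthews_cc(tp, fp, fn, tn):
    """Matthews correlation coefficient; 0.0 when any marginal is empty."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denominator if denominator else 0.0


def g_mean(tp, fp, fn, tn):
    """Geometric mean of sensitivity and specificity (assumed definition)."""
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)


# Hypothetical spike-in benchmark: 50 truly differential proteins out of 1000.
tp, fp, fn, tn = 40, 10, 10, 940

print("F1    :", f1_score(tp, fp, fn))
print("MCC   :", matthews_cc(tp, fp, fn, tn))
print("G-mean:", g_mean(tp, fp, fn, tn))
```

Unlike plain accuracy, all three stay informative when differential proteins are a small minority of the proteome, which is the usual situation in spike-in benchmarks.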


Subjects
Proteomics , Proteomics/methods , Workflow , Machine Learning , Proteome/metabolism , Humans , Algorithms , Databases, Protein
18.
J Proteome Res ; 12(5): 2116-27, 2013 May 03.
Article in English | MEDLINE | ID: mdl-23557376

ABSTRACT

Despite its prominence for characterization of complex mixtures, LC-MS/MS frequently fails to identify many proteins. Network-based analysis methods, based on protein-protein interaction networks (PPINs), biological pathways, and protein complexes, are useful for recovering non-detected proteins, thereby enhancing analytical resolution. However, network-based analysis methods do come in varied flavors for which the respective efficacies are largely unknown. We compare the recovery performance and functional insights from three distinct instances of PPIN-based approaches, viz., Proteomics Expansion Pipeline (PEP), Functional Class Scoring (FCS), and Maxlink, in a test scenario of valproic acid (VPA)-treated mice. We find that the most comprehensive functional insights, as well as best non-detected protein recovery performance, are derived from FCS utilizing real biological complexes. This outstrips other network-based methods such as Maxlink or Proteomics Expansion Pipeline (PEP). From FCS, we identified known biological complexes involved in epigenetic modifications, neuronal system development, and cytoskeletal rearrangements. This is congruent with the observed phenotype where adult mice showed an increase in dendritic branching to allow the rewiring of visual cortical circuitry and an improvement in their visual acuity when tested behaviorally. In addition, PEP also identified a novel complex, comprising YWHAB, NR1, NR2B, ACTB, and TJP1, which is functionally related to the observed phenotype. Although our results suggest different network analysis methods can produce different results, on the whole, the findings are mutually supportive. More critically, the non-overlapping information each provides can provide greater holistic understanding of complex phenotypes.


Subjects
Anticonvulsants/pharmacology , Protein Interaction Maps , Proteome/metabolism , Valproic Acid/pharmacology , Visual Cortex/metabolism , Animals , Cluster Analysis , Female , Male , Mice , Mice, Inbred C57BL , Molecular Sequence Annotation , Multiprotein Complexes/genetics , Multiprotein Complexes/metabolism , Protein Interaction Mapping/methods , Proteome/genetics , Proteomics , Transcriptome , Visual Cortex/drug effects
19.
J Proteome Res ; 12(6): 2933-45, 2013 Jun 07.
Article in English | MEDLINE | ID: mdl-23659346

ABSTRACT

Troglitazone, a first-generation thiazolidinedione with antihyperglycaemic properties, was withdrawn from the market due to unacceptable idiosyncratic hepatotoxicity. Despite intensive research, the underlying mechanism of troglitazone-induced liver toxicity remains unknown. Here we report the use of the Sod2(+/-) mouse model of silent mitochondrial oxidative stress, together with quantitative mass spectrometry-based proteomics, to track the mitochondrial proteome changes induced by physiologically relevant troglitazone doses. By quantitative untargeted proteomics, we first globally profiled the Sod2(+/-) hepatic mitochondrial proteome and found perturbations, including in GSH metabolism, that enhanced the toxicity of the normally nontoxic troglitazone. Short- and long-term troglitazone administration in Sod2(+/-) mice led to a mitochondrial proteome shift from an early compensatory response to an eventual phase of intolerable oxidative stress, due to decreased mitochondrial glutathione (mGSH) import protein, decreased dicarboxylate ion carrier (DIC), and the specific activation of ASK1-JNK and FOXO3a with prolonged troglitazone exposure. Furthermore, mapping of the detected proteins onto mouse-specific protein-centered networks revealed lipid-associated proteins as contributors to overt mitochondrial and liver injury under prolonged exposure to the lipid-normalizing troglitazone. By integrative toxicoproteomics, we demonstrated a powerful systems approach for identifying the collapse of specific fragile nodes and the activation of crucial proteome reconfiguration regulators when targeted by an exogenous toxicant.


Subjects
Chromans/toxicity , Glutathione/antagonists & inhibitors , Hypoglycemic Agents/toxicity , Mitochondria/drug effects , Mitochondrial Proteins/genetics , Proteomics , Thiazolidinediones/toxicity , Animals , Dicarboxylic Acid Transporters/antagonists & inhibitors , Dicarboxylic Acid Transporters/genetics , Dicarboxylic Acid Transporters/metabolism , Female , Forkhead Box Protein O3 , Forkhead Transcription Factors/agonists , Forkhead Transcription Factors/genetics , Forkhead Transcription Factors/metabolism , Gene Expression Regulation/drug effects , Glutathione/metabolism , Humans , Ion Transport/drug effects , MAP Kinase Kinase 4/genetics , MAP Kinase Kinase 4/metabolism , MAP Kinase Kinase Kinase 5/genetics , MAP Kinase Kinase Kinase 5/metabolism , Male , Mice , Mice, Knockout , Mitochondria/genetics , Mitochondria/metabolism , Mitochondrial Proteins/metabolism , Oxidative Stress/drug effects , Signal Transduction , Superoxide Dismutase/deficiency , Superoxide Dismutase/genetics , Troglitazone
20.
BMC Genomics ; 14: 35, 2013 Jan 16.
Article in English | MEDLINE | ID: mdl-23324392

ABSTRACT

BACKGROUND: Proteomics Signature Profiling (PSP) is a novel hit-rate based method that proved useful in resolving consistency and coverage issues in proteomics. As a follow-up study, several points need to be addressed: 1/ PSP's generalisability to pathways, 2/ understanding the biological interplay between significant complexes and pathway subnets co-located on the same pathways in our liver cancer dataset, 3/ understanding PSP's false positive rate and 4/ demonstrating that PSP works on other suitable proteomics datasets as well as expanding PSP's analytical resolution via the use of specialised ontologies. RESULTS: 1/ PSP performs well with Pathway-Derived Subnets (PDSs). Comparing the performance of PDSs derived from various pathway databases, we find that an integrative approach is best for optimising analytical resolution. Feature selection also confirms that significant PDSs are closely connected to the cancer phenotype. 2/ In liver cancer, correlation studies of significant PSP complexes and PDSs co-localised on the same pathways revealed an interesting relationship between the purine metabolism pathway and two other complexes involved in DNA repair. Our work suggests progression to poor stage requires additional mutations that disrupt DNA repair enzymes. 3/ False positive analysis reveals that PSP, applied to both complexes and PDSs, is powerful and precise. 4/ Via an expert-curated lipid ontology, we uncovered several interesting lipid-associated complexes that could be associated with cancer progression. Of particular interest is the HMGB1-HMGB2-HSC70-ERP60-GAPDH complex, which is also involved in DNA repair. We also demonstrated the generalisability of PSP using a non-small-cell lung carcinoma data set. CONCLUSIONS: PSP is a powerful and precise technique, capable of identifying biologically coherent features. It works with biological complexes, network-predicted clusters as well as PDSs. Here, we demonstrate an instance of the interplay between significant PDSs and complexes that may be significantly involved in liver cancer progression but is not yet well understood. We also demonstrate the enhancement of PSP's analytical resolution using specialised ontologies.


Subjects
Proteomics/methods , Carcinoma, Non-Small-Cell Lung/genetics , Databases, Genetic , Liver Neoplasms/metabolism , Metabolic Networks and Pathways/genetics