Results 1 - 20 of 40
1.
Bioinformatics ; 39(6)2023 06 01.
Article in English | MEDLINE | ID: mdl-37314966

ABSTRACT

MOTIVATION: Recent advances in multimodal single-cell omics technologies enable multiple modalities of molecular attributes, such as gene expression, chromatin accessibility, and protein abundance, to be profiled simultaneously at a global level in individual cells. While the increasing availability of multiple data modalities is expected to provide a more accurate clustering and characterization of cells, the development of computational methods that are capable of extracting information embedded across data modalities is still in its infancy. RESULTS: We propose SnapCCESS for clustering cells by integrating data modalities in multimodal single-cell omics data using an unsupervised ensemble deep learning framework. By creating snapshots of embeddings of multimodality using variational autoencoders, SnapCCESS can be coupled with various clustering algorithms for generating consensus clustering of cells. We applied SnapCCESS with several clustering algorithms to various datasets generated from popular multimodal single-cell omics technologies. Our results demonstrate that SnapCCESS is effective and more efficient than conventional ensemble deep learning-based clustering methods and outperforms other state-of-the-art multimodal embedding generation methods in integrating data modalities for clustering cells. The improved clustering of cells from SnapCCESS will pave the way for more accurate characterization of cell identity and types, an essential step for various downstream analyses of multimodal single-cell omics data. AVAILABILITY AND IMPLEMENTATION: SnapCCESS is implemented as a Python package and is freely available from https://github.com/PYangLab/SnapCCESS under the open-source license of GPL-3. The data used in this study are publicly available (see section 'Data availability').


Subject(s)
Deep Learning , Algorithms , Cluster Analysis , Chromatin , Single-Cell Analysis
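
The consensus step described in the abstract of record 1 (combining several clusterings of the same cells into one) can be illustrated with a co-association matrix. This is a minimal sketch under stated assumptions, not the SnapCCESS implementation: the function names, the 0.5 threshold, and the toy label vectors are all hypothetical, and the linking rule (connected components over pairs that co-cluster in more than half of the runs) is one simple choice among many.

```python
# Toy consensus clustering via a co-association matrix.
# Assumption: each "labeling" is one clustering run over the same cells
# (for example, one per embedding snapshot). Names here are hypothetical.

def co_association(labelings):
    """Fraction of runs in which each pair of cells shares a cluster."""
    n = len(labelings[0])
    m = len(labelings)
    return [
        [sum(run[i] == run[j] for run in labelings) / m for j in range(n)]
        for i in range(n)
    ]

def consensus_labels(labelings, threshold=0.5):
    """Link cells that co-cluster in more than `threshold` of the runs,
    then label the connected components of the resulting graph."""
    co = co_association(labelings)
    n = len(co)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and co[i][j] > threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels

# Three runs over six cells; cluster IDs are arbitrary per run,
# and the third run disagrees about the cell at index 2.
runs = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],
    [0, 0, 1, 1, 1, 1],
]
result = consensus_labels(runs)
print(result)
```

Because the matrix only records whether two cells share a cluster, runs with permuted cluster IDs can be combined without first aligning their labels.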
2.
Nat Methods ; 17(8): 799-806, 2020 08.
Article in English | MEDLINE | ID: mdl-32661426

ABSTRACT

Single-cell genomics has transformed our ability to examine cell fate choice. Examining cells along a computationally ordered 'pseudotime' offers the potential to unpick subtle changes in variability and covariation among key genes. We describe an approach, scHOT (single-cell higher-order testing), which provides a flexible and statistically robust framework for identifying changes in higher-order interactions among genes. scHOT can be applied for cells along a continuous trajectory or across space and accommodates various higher-order measurements including variability or correlation. We demonstrate the use of scHOT by studying coordinated changes in higher-order interactions during embryonic development of the mouse liver. Additionally, scHOT identifies subtle changes in gene-gene correlations across space using spatially resolved transcriptomics data from the mouse olfactory bulb. scHOT meaningfully adds to first-order differential expression testing and provides a framework for interrogating higher-order interactions using single-cell data.


Subject(s)
Liver/embryology , Single-Cell Analysis/methods , Animals , Computational Biology , Databases, Nucleic Acid , Hepatocytes/physiology , Liver/cytology , Mice , Oligonucleotide Array Sequence Analysis , Sequence Analysis, RNA , Software
3.
Bioinformatics ; 38(20): 4745-4753, 2022 10 14.
Article in English | MEDLINE | ID: mdl-36040148

ABSTRACT

MOTIVATION: With the recent surge of large-cohort scale single cell research, it is of critical importance that analytical methods can fully utilize the comprehensive characterization of cellular systems that single cell technologies produce to provide insights into samples from individuals. Currently, there is little consensus on the best ways to compress information from the complex data structures of these technologies to summary statistics that represent each sample (e.g. individuals). RESULTS: Here, we present scFeatures, an approach that creates interpretable cellular and molecular representations of single-cell and spatial data at the sample level. We demonstrate that summarizing a broad collection of features at the sample level is important both for understanding underlying disease mechanisms in different experimental studies and for accurately classifying disease status of individuals. AVAILABILITY AND IMPLEMENTATION: scFeatures is publicly available as an R package at https://github.com/SydneyBioX/scFeatures. All data used in this study are publicly available with accession ID reported in Section 2. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Software , Humans
4.
Transpl Int ; 36: 11338, 2023.
Article in English | MEDLINE | ID: mdl-37767525

ABSTRACT

Accurate prediction of allograft survival after kidney transplantation allows early identification of at-risk recipients for adverse outcomes and initiation of preventive interventions to optimize post-transplant care. Many prediction algorithms do not model cohort heterogeneity and may lead to inaccurate assessment of longer-term graft outcomes among minority groups. Using data from a national Australian kidney transplant cohort (2008-2017) as the derivation set, we developed P-Cube, a multi-step precision prediction pathway model for predicting overall graft survival in three ethnic subgroups: European Australians, Asian Australians and Aboriginal and Torres Strait Islander Peoples. The concordance indices for the European Australians, Asian Australians, and Aboriginal and Torres Strait Islander Peoples subpopulations were 0.99 (0.98-0.99), 0.93 (0.92-0.94) and 0.92 (0.91-0.93), respectively. Similar findings were observed when validating P-Cube using an external dataset [Scientific Registry of Transplant Recipient Registry (2006-2020)]. Six sub-categories of recipients with distinct risk factor profiles were identified. Some factors such as blood group compatibility were considered important across the entire transplant population. Other factors such as human leukocyte antigen (HLA)-DR mismatches were unique to older recipients. The P-Cube model identifies allograft survival-specific risk factors within a heterogeneous population and offers personalized survival predictions in a diverse cohort.


Subject(s)
Kidney Transplantation , Humans , Kidney Transplantation/adverse effects , Transplant Recipients , Australia/epidemiology , Transplantation, Homologous , Allografts
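
The concordance index reported in the abstract of record 4 measures how often a model ranks patients correctly by risk. As a minimal sketch, assuming the standard Harrell's C definition for right-censored data (the abstract does not say which variant was used), it can be computed as below. The function name and the four-patient toy data are hypothetical and are not drawn from the study; tied event times are simply skipped in this simplification.

```python
# Harrell's concordance index for right-censored survival data (sketch).
# A pair (i, j) is comparable when patient i had an observed event and
# failed strictly earlier than patient j was last observed.

def concordance_index(times, events, risk_scores):
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0   # earlier failure had higher risk
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5   # tied predictions count as half
    return concordant / comparable

# Hypothetical toy data: follow-up in years, event flag (1 = graft loss,
# 0 = censored), and a model's predicted risk score.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
risk = [0.9, 0.4, 0.5, 0.1]

c_index = concordance_index(times, events, risk)
print(c_index)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which gives some context for the 0.92 to 0.99 range quoted in the abstract.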
5.
Proc Natl Acad Sci U S A ; 116(20): 9775-9784, 2019 05 14.
Article in English | MEDLINE | ID: mdl-31028141

ABSTRACT

Concerted examination of multiple collections of single-cell RNA sequencing (RNA-seq) data promises further biological insights that cannot be uncovered with individual datasets. Here we present scMerge, an algorithm that integrates multiple single-cell RNA-seq datasets using factor analysis of stably expressed genes and pseudoreplicates across datasets. Using a large collection of public datasets, we benchmark scMerge against published methods and demonstrate that it consistently provides improved cell type separation by removing unwanted factors; scMerge can also enhance biological discovery through robust data integration, which we show through the inference of development trajectory in a liver dataset collection.


Subject(s)
Meta-Analysis as Topic , Sequence Analysis, RNA , Single-Cell Analysis , Software , Algorithms , Animals , Embryonic Development , Factor Analysis, Statistical , Gene Expression , Humans , Mice
6.
Kidney Int ; 99(4): 817-823, 2021 04.
Article in English | MEDLINE | ID: mdl-32916179

ABSTRACT

Kidney transplant recipients and transplant physicians face important clinical questions where machine learning methods may help improve the decision-making process. This mini-review explores potential applications of machine learning methods to key stages of a kidney transplant recipient's journey, from initial waitlisting and donor selection, to personalization of immunosuppression and prediction of post-transplantation events. Both unsupervised and supervised machine learning methods are presented, including k-means clustering, principal components analysis, k-nearest neighbors, and random forests. The various challenges of these approaches are also discussed.


Subject(s)
Kidney Transplantation , Machine Learning , Humans , Kidney Transplantation/adverse effects , Transplant Recipients
7.
Brief Bioinform ; 20(6): 2316-2326, 2019 11 27.
Article in English | MEDLINE | ID: mdl-30137247

ABSTRACT

Advances in high-throughput sequencing on single-cell gene expressions [single-cell RNA sequencing (scRNA-seq)] have enabled transcriptome profiling on individual cells from complex samples. A common goal in scRNA-seq data analysis is to discover and characterise cell types, typically through clustering methods. The quality of the clustering therefore plays a critical role in biological discovery. While numerous clustering algorithms have been proposed for scRNA-seq data, fundamentally they all rely on a similarity metric for categorising individual cells. Although several studies have compared the performance of various clustering algorithms for scRNA-seq data, currently there is no benchmark of different similarity metrics and their influence on scRNA-seq data clustering. Here, we compared a panel of similarity metrics on clustering a collection of annotated scRNA-seq datasets. Within each dataset, a stratified subsampling procedure was applied and an array of evaluation measures was employed to assess the similarity metrics. This produced a highly reliable and reproducible consensus on their performance assessment. Overall, we found that correlation-based metrics (e.g. Pearson's correlation) outperformed distance-based metrics (e.g. Euclidean distance). To test if the use of correlation-based metrics can benefit the recently published clustering techniques for scRNA-seq data, we modified a state-of-the-art kernel-based clustering algorithm (SIMLR) using Pearson's correlation as a similarity measure and found significant performance improvement over Euclidean distance on scRNA-seq data clustering. These findings demonstrate the importance of similarity metrics in clustering scRNA-seq data and highlight Pearson's correlation as a favourable choice. Further comparison on different scRNA-seq library preparation protocols suggests that they may also affect clustering performance. Finally, the benchmarking framework is available at http://www.maths.usyd.edu.au/u/SMS/bioinformatics/software.html.


Subject(s)
Sequence Analysis, RNA , Algorithms , Cluster Analysis , Humans
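
One intuition for the result in record 7, that correlation-based metrics can outperform distance-based ones, is that Pearson's correlation ignores overall scale while Euclidean distance does not. The sketch below is an illustration only, not the paper's benchmark: the three expression vectors are hypothetical, and the idea that two cells of the same type may differ mainly in sequencing depth is an assumption made for the example.

```python
import math

def pearson(x, y):
    """Pearson's correlation coefficient between two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def euclidean(x, y):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

# Hypothetical expression values over four genes.
cell_a = [1.0, 4.0, 2.0, 8.0]
cell_b = [3.0, 12.0, 6.0, 24.0]  # same profile as cell_a at 3x depth
cell_c = [4.0, 1.0, 8.0, 2.0]    # different profile, similar magnitude

r_ab, r_ac = pearson(cell_a, cell_b), pearson(cell_a, cell_c)
d_ab, d_ac = euclidean(cell_a, cell_b), euclidean(cell_a, cell_c)

print(f"Pearson:   a~b {r_ab:.3f}, a~c {r_ac:.3f}")
print(f"Euclidean: a-b {d_ab:.3f}, a-c {d_ac:.3f}")
```

In this toy case Pearson's correlation pairs cell_a with cell_b (the same profile at a different depth), whereas Euclidean distance places cell_a closer to cell_c.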
8.
Bioinformatics ; 36(14): 4137-4143, 2020 08 15.
Article in English | MEDLINE | ID: mdl-32353146

ABSTRACT

MOTIVATION: Multi-modal profiling of single cells represents one of the latest technological advancements in molecular biology. Among various single-cell multi-modal strategies, cellular indexing of transcriptomes and epitopes by sequencing (CITE-seq) allows simultaneous quantification of two distinct species: RNA and cell-surface proteins. Here, we introduce CiteFuse, a streamlined package consisting of a suite of tools for doublet detection, modality integration, clustering, differential RNA and protein expression analysis, antibody-derived tag evaluation, ligand-receptor interaction analysis and interactive web-based visualization of CITE-seq data. RESULTS: We demonstrate the capacity of CiteFuse to integrate the two data modalities and its relative advantage against data generated from single-modality profiling using both simulations and real-world CITE-seq data. Furthermore, we illustrate a novel doublet detection method based on a combined index of cell hashing and transcriptome data. Finally, we demonstrate CiteFuse for predicting ligand-receptor interactions by using multi-modal CITE-seq data. Collectively, we demonstrate the utility and effectiveness of CiteFuse for the integrative analysis of transcriptome and epitope profiles from CITE-seq data. AVAILABILITY AND IMPLEMENTATION: CiteFuse is freely available at http://shiny.maths.usyd.edu.au/CiteFuse/ as an online web service and at https://github.com/SydneyBioX/CiteFuse/ as an R package. CONTACT: pengyi.yang@sydney.edu.au. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Software , Transcriptome , Epitopes , Gene Expression Profiling , RNA , Sequence Analysis, RNA , Single-Cell Analysis
9.
Mol Syst Biol ; 16(6): e9389, 2020 06.
Article in English | MEDLINE | ID: mdl-32567229

ABSTRACT

Automated cell type identification is a key computational challenge in single-cell RNA-sequencing (scRNA-seq) data. To capitalise on the large collection of well-annotated scRNA-seq datasets, we developed scClassify, a multiscale classification framework based on ensemble learning and cell type hierarchies constructed from single or multiple annotated datasets as references. scClassify enables the estimation of sample size required for accurate classification of cell types in a cell type hierarchy and allows joint classification of cells when multiple references are available. We show that scClassify consistently performs better than other supervised cell type classification methods across 114 pairs of reference and testing data, representing a diverse combination of sizes, technologies and levels of complexity, and further demonstrate the unique components of scClassify through simulations and compendia of experimental datasets. Finally, we demonstrate the scalability of scClassify on large single-cell atlases and highlight a novel application of identifying subpopulations of cells from the Tabula Muris data that were unidentified in the original publication. Together, scClassify represents state-of-the-art methodology in automated cell type identification from scRNA-seq data.


Subject(s)
Cells/metabolism , Animals , Cluster Analysis , Databases as Topic , Humans , Leukocytes, Mononuclear/metabolism , Machine Learning , Mice , Pancreas/metabolism , Sample Size , Software
10.
BMC Pregnancy Childbirth ; 21(1): 277, 2021 Apr 06.
Article in English | MEDLINE | ID: mdl-33823838

ABSTRACT

BACKGROUND: There is increasing awareness that perinatal psychosocial adversity experienced by mothers, children, and their families, may influence health and well-being across the life course. To maximise the impact of population-based interventions for optimising perinatal wellbeing, health services can utilise empirical methods to identify subgroups at highest risk of poor outcomes relative to the overall population. METHODS: This study sought to identify sub-groups using latent class analysis within a population of mothers in Sydney, Australia, based on their differing experience of self-reported indicators of psychosocial adversity. Subgroup differences in antenatal and postnatal depressive symptoms were assessed using the Edinburgh Postnatal Depression Scale. RESULTS: Latent class analysis identified four distinct subgroups within the cohort, who were distinguished empirically on the basis of their native language, current smoking status, previous involvement with Family-and-Community Services (FaCS), history of child abuse, presence of a supportive partner, and a history of intimate partner psychological violence. One group consisted of socially supported 'local' women who speak English as their primary language (Group L), another of socially supported 'migrant' women who speak a language other than English as their primary language (Group M), another of socially stressed 'local' women who speak English as their primary language (Group Ls), and another of socially stressed 'migrant' women who speak a language other than English as their primary language (Group Ms). Compared to local and not socially stressed residents (Group L), the odds of antenatal depression were nearly three times higher for the socially stressed local group (Ls OR: 2.87, 95%CI 2.10-3.94) and nearly nine times higher in the Ms group (Ms OR: 8.78, 95%CI 5.13-15.03). Antenatal symptoms of depression were also higher in the not socially stressed migrant group (M OR: 1.70, 95%CI 1.47-1.97) compared to non-migrants. In the postnatal period, Group M was 1.5 times more likely, while Group Ms was over five times more likely, to experience suboptimal mental health compared to Group L (OR 1.50, 95%CI 1.22-1.84; and OR 5.28, 95%CI 2.63-10.63, for M and Ms respectively). CONCLUSIONS: The application of empirical subgrouping analysis permits an informed approach to targeted interventions and resource allocation for optimising perinatal maternal wellbeing.


Subject(s)
Depression, Postpartum/prevention & control , Mass Screening/organization & administration , Maternal Health/statistics & numerical data , Mental Health/statistics & numerical data , Adult , Australia/epidemiology , Depression, Postpartum/diagnosis , Depression, Postpartum/epidemiology , Depression, Postpartum/psychology , Electronic Health Records/statistics & numerical data , Female , Health Care Rationing , Humans , Infant, Newborn , Latent Class Analysis , Mass Screening/methods , Perinatal Care/methods , Perinatal Care/organization & administration , Pregnancy , Psychiatric Status Rating Scales/statistics & numerical data , Retrospective Studies , Risk Assessment/methods , Self Report/statistics & numerical data , Social Determinants of Health/statistics & numerical data , Young Adult
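
The odds ratios and confidence intervals quoted in the abstract of record 10 can be sanity-checked with a short calculation. The sketch below assumes, which the abstract does not state, that the intervals are Wald intervals computed on the log-odds scale; under that assumption the point estimate is the geometric mean of the two bounds and the standard error of the log odds ratio follows from the interval width. Only the figures printed in the abstract are used; the variable and function names are hypothetical.

```python
import math

# (odds ratio, lower 95% bound, upper 95% bound) as printed in the abstract.
reported = {
    "Ls antenatal": (2.87, 2.10, 3.94),
    "Ms antenatal": (8.78, 5.13, 15.03),
    "M antenatal": (1.70, 1.47, 1.97),
    "M postnatal": (1.50, 1.22, 1.84),
    "Ms postnatal": (5.28, 2.63, 10.63),
}

def implied_estimate(lower, upper, z=1.96):
    """Point estimate and log-scale standard error implied by a Wald CI."""
    point = math.sqrt(lower * upper)  # geometric mean of the bounds
    se_log = (math.log(upper) - math.log(lower)) / (2 * z)
    return point, se_log

checks = {}
for group, (odds_ratio, lower, upper) in reported.items():
    point, se_log = implied_estimate(lower, upper)
    checks[group] = (point, se_log)
    print(f"{group}: reported OR {odds_ratio:.2f}, "
          f"implied OR {point:.2f}, SE(log OR) {se_log:.3f}")
```

Each implied estimate agrees with the reported odds ratio to within rounding, which is consistent with (though not proof of) log-scale Wald intervals.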
11.
Proteomics ; 19(13): e1900068, 2019 07.
Article in English | MEDLINE | ID: mdl-31099962

ABSTRACT

The increasing role played by liquid chromatography-mass spectrometry (LC-MS)-based proteomics in biological discovery has led to a growing need for quality control (QC) on the LC-MS systems. While numerous quality control tools have been developed to track the performance of LC-MS systems based on a pre-defined set of performance factors (e.g., mass error, retention time), the precise influence and contribution of the performance factors and their generalization property to different biological samples are not as well characterized. Here, a web-based application (QCMAP) is developed for interactive diagnosis and prediction of the performance of LC-MS systems across different biological sample types. Leveraging on a standardized HeLa cell sample run as QC within a multi-user facility, predictive models are trained on a panel of commonly used performance factors to pinpoint the precise conditions to a (un)satisfactory performance in three LC-MS systems. It is demonstrated that the learned model can be applied to predict LC-MS system performance for brain samples generated from an independent study. By compiling these predictive models into our web-application, QCMAP allows users to benchmark the performance of their LC-MS systems using their own samples and identify key factors for instrument optimization. QCMAP is freely available from: http://shiny.maths.usyd.edu.au/QCMAP/.


Subject(s)
Chromatography, Liquid/methods , Proteomics/methods , Quality Control , Tandem Mass Spectrometry/methods , Cell Line, Tumor , HeLa Cells , Humans , Internet
12.
BMC Genomics ; 20(Suppl 9): 913, 2019 Dec 24.
Article in English | MEDLINE | ID: mdl-31874628

ABSTRACT

BACKGROUND: Single-cell RNA-sequencing (scRNA-seq) is a fast emerging technology allowing global transcriptome profiling on the single cell level. Cell type identification from scRNA-seq data is a critical task in a variety of research areas such as developmental biology, cell reprogramming, and cancers. Typically, cell type identification relies on human inspection using a combination of prior biological knowledge (e.g. marker genes and morphology) and computational techniques (e.g. PCA and clustering). Due to the incompleteness of our current knowledge and the subjectivity involved in this process, a small number of cells may be subject to mislabelling. RESULTS: Here, we propose a semi-supervised learning framework, named scReClassify, for 'post hoc' cell type identification from scRNA-seq datasets. Starting from an initial cell type annotation with potentially mislabelled cells, scReClassify first performs dimension reduction using PCA and next applies a semi-supervised learning method to learn and subsequently reclassify cells that are likely mislabelled initially to the most probable cell types. By using both simulated and real-world experimental datasets that profiled various tissues and biological systems, we demonstrate that scReClassify is able to accurately identify and reclassify misclassified cells to their correct cell types. CONCLUSIONS: scReClassify can be used for scRNA-seq data as a post hoc cell type classification tool to fine-tune cell type annotations generated by any cell type classification procedure. It is implemented as an R package and is freely available from https://github.com/SydneyBioX/scReClassify.


Subject(s)
RNA-Seq/methods , Animals , Humans , Machine Learning , Mice , Single-Cell Analysis/methods , Software
13.
Bioinformatics ; 33(13): 1916-1920, 2017 Jul 01.
Article in English | MEDLINE | ID: mdl-28203701

ABSTRACT

MOTIVATION: DNA binding proteins such as chromatin remodellers, transcription factors (TFs), histone modifiers and co-factors often bind cooperatively to activate or repress their target genes in a cell type-specific manner. Nonetheless, the precise role of cooperative binding in defining cell-type identity is still largely uncharacterized. RESULTS: Here, we collected and analyzed 214 public datasets representing chromatin immunoprecipitation followed by sequencing (ChIP-Seq) of 104 DNA binding proteins in embryonic stem cell (ESC) lines. We classified their binding sites into those proximal to gene promoters and those in distal regions, and developed a web resource called Proximal And Distal (PAD) clustering to identify their co-localization at these respective regions. Using this extensive dataset, we discovered an extensive co-localization of BRG1 and CHD7 at distal but not proximal regions. The comparison of co-localization sites to those bound by either BRG1 or CHD7 alone showed an enrichment of ESC master TFs binding and active chromatin architecture at co-localization sites. Most notably, our analysis reveals the co-dependency of BRG1 and CHD7 at distal regions on regulating expression of their common target genes in ESC. This work sheds light on cooperative binding of TF binding proteins in regulating gene expression in ESC, and demonstrates the utility of integrative analysis of a manually curated compendium of genome-wide protein binding profiles in our online resource PAD. AVAILABILITY AND IMPLEMENTATION: PAD is freely available at http://pad.victorchang.edu.au/ and its source code is available via an open source GPL 3.0 license at https://github.com/VCCRI/PAD/. CONTACT: pengyi.yang@sydney.edu.au or j.ho@victorchang.edu.au. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
DNA Helicases/genetics , DNA-Binding Proteins/genetics , Embryonic Stem Cells/metabolism , Gene Expression Regulation, Developmental , Nuclear Proteins/genetics , Sequence Analysis, DNA/methods , Software , Transcription Factors/genetics , Animals , Cell Line , Chromatin Immunoprecipitation/methods , Mice
14.
BMC Dev Biol ; 17(1): 4, 2017 02 13.
Article in English | MEDLINE | ID: mdl-28193178

ABSTRACT

BACKGROUND: The molecular mechanisms underlying the development of the unusual echinoderm pentameral body plan and their likeness to mechanisms underlying the development of the bilateral plans of other deuterostomes are of interest in tracing body plan evolution. In this first study of the spatial expression of genes associated with Nodal and BMP2/4 signalling during the transition to pentamery in sea urchins, we investigate Heliocidaris erythrogramma, a species that provides access to the developing adult rudiment within days of fertilization. RESULTS: BMP2/4, and the putative downstream genes, Six1/2, Eya, Tbx2/3 and Msx were expressed in the earliest morphological manifestation of pentamery during development, the five hydrocoele lobes. The formation of the vestibular ectoderm, the specialized region overlying the left coelom that forms adult ectoderm, involved the expression of putative Nodal target genes Chordin, Gsc and BMP2/4 and putative BMP2/4 target genes Dlx, Msx and Tbx. The expression of Nodal, Lefty and Pitx2 in the right ectoderm, and Pitx2 in the right coelom, was as previously observed in other sea urchins. CONCLUSION: That genes associated with Nodal and BMP2/4 signalling are expressed in the hydrocoele lobes, indicates that they have a role in the developmental transition to pentamery, contributing to our understanding of how the most unusual body plan in the Bilateria may have evolved. We suggest that the Nodal and BMP2/4 signalling cascades might have been duplicated or split during the evolution to pentamery.


Subject(s)
Anthocidaris/growth & development , Anthocidaris/genetics , Body Patterning/genetics , Bone Morphogenetic Proteins/genetics , Gene Expression Regulation, Developmental , Nodal Protein/genetics , Animals , Bone Morphogenetic Proteins/metabolism , Ectoderm/metabolism , Nodal Protein/metabolism , Signal Transduction
15.
Proteomics ; 16(13): 1868-71, 2016 07.
Article in English | MEDLINE | ID: mdl-27145998

ABSTRACT

Mass spectrometry (MS)-based quantitative phosphoproteomics has become a key approach for proteome-wide profiling of phosphorylation in tissues and cells. Traditional experimental design often compares a single treatment with a control, whereas increasingly more experiments are designed to compare multiple treatments with respect to a control. To this end, the development of bioinformatic tools that can integrate multiple treatments and visualise kinases and substrates under combinatorial perturbations is vital for dissecting concordant and/or independent effects of each treatment. Here, we propose a hypothesis driven kinase perturbation analysis (KinasePA) to annotate and visualise kinases and their substrates that are perturbed by various combinatorial effects of treatments in phosphoproteomics experiments. We demonstrate the utility of KinasePA through its application to two large-scale phosphoproteomics datasets and show its effectiveness in dissecting kinases and substrates within signalling pathways driven by unique combinations of cellular stimuli and inhibitors. We implemented and incorporated KinasePA as part of the "directPA" R package available from the comprehensive R archive network (CRAN). Furthermore, KinasePA also has an interactive web interface that can be readily applied to annotate user provided phosphoproteomics data (http://kinasepa.pengyiyang.org).


Subject(s)
Protein Kinases/metabolism , Proteomics/methods , Cell Line , Chromones/pharmacology , Databases, Protein , Heterocyclic Compounds, 3-Ring/pharmacology , Humans , Insulin/metabolism , Morpholines/pharmacology , Naphthyridines/pharmacology , Phosphorylation , Protein Kinase Inhibitors/pharmacology , Signal Transduction/drug effects , Sirolimus/pharmacology , TOR Serine-Threonine Kinases/antagonists & inhibitors , TOR Serine-Threonine Kinases/metabolism
16.
PLoS Comput Biol ; 11(8): e1004403, 2015 Aug.
Article in English | MEDLINE | ID: mdl-26252020

ABSTRACT

Cell signaling underlies transcription/epigenetic control of a vast majority of cell-fate decisions. A key goal in cell signaling studies is to identify the set of kinases that underlie key signaling events. In a typical phosphoproteomics study, phosphorylation sites (substrates) of active kinases are quantified proteome-wide. By analyzing the activities of phosphorylation sites over a time-course, the temporal dynamics of signaling cascades can be elucidated. Since many substrates of a given kinase have similar temporal kinetics, clustering phosphorylation sites into distinctive clusters can facilitate identification of their respective kinases. Here we present a knowledge-based CLUster Evaluation (CLUE) approach for identifying the most informative partitioning of a given temporal phosphoproteomics data. Our approach utilizes prior knowledge, annotated kinase-substrate relationships mined from literature and curated databases, to first generate biologically meaningful partitioning of the phosphorylation sites and then determine key kinases associated with each cluster. We demonstrate the utility of the proposed approach on two time-series phosphoproteomics datasets and identify key kinases associated with human embryonic stem cell differentiation and insulin signaling pathway. The proposed approach will be a valuable resource in the identification and characterizing of signaling networks from phosphoproteomics data.


Subject(s)
Cell Communication/physiology , Knowledge Bases , Phosphoproteins/metabolism , Proteome/metabolism , Proteomics/methods , Signal Transduction/physiology , Cell Differentiation/physiology , Cell Line , Databases, Protein , Embryonic Stem Cells , Humans
17.
BMC Genomics ; 16: 617, 2015 Aug 19.
Article in English | MEDLINE | ID: mdl-26283093

ABSTRACT

BACKGROUND: Most biological processes are influenced by protein post-translational modifications (PTMs). Identifying novel PTM sites in different organisms, including humans and model organisms, has expedited our understanding of key signal transduction mechanisms. However, with increasing availability of deep, quantitative datasets in diverse species, there is a growing need for tools to facilitate cross-species comparison of PTM data. This is particularly important because functionally important modification sites are more likely to be evolutionarily conserved; yet cross-species comparison of PTMs is difficult since they often lie in structurally disordered protein domains. Current tools that address this can only map known PTMs between species based on known orthologous phosphosites, and do not enable the cross-species mapping of newly identified modification sites. Here, we addressed this by developing a web-based software tool, PhosphOrtholog (www.phosphortholog.com), that accurately maps protein modification sites between different species. This facilitates the comparison of datasets derived from multiple species, and should be a valuable tool for the proteomics community. RESULTS: Here we describe PhosphOrtholog, a web-based application for mapping known and novel orthologous PTM sites from experimental data obtained from different species. PhosphOrtholog is the only generic and automated tool that enables cross-species comparison of large-scale PTM datasets without relying on existing PTM databases. This is achieved through pairwise sequence alignment of orthologous protein residues. To demonstrate its utility we apply it to two sets of human and rat muscle phosphoproteomes generated following insulin and exercise stimulation, respectively, and one publicly available mouse phosphoproteome following cellular stress, revealing high mapping and coverage efficiency. Although coverage statistics are dataset dependent, PhosphOrtholog more than doubled the number of cross-species mapped sites in all our example data sets when compared to those recovered using existing resources such as PhosphoSitePlus. CONCLUSIONS: PhosphOrtholog is the first tool that enables mapping of thousands of novel and known protein phosphorylation sites across species, accessible through an easy-to-use web interface. Identification of conserved PTMs across species from large-scale experimental data increases our knowledgebase of functional PTM sites. Moreover, PhosphOrtholog is generic, being applicable to other PTM datasets such as acetylation, ubiquitination and methylation.


Subject(s)
Protein Processing, Post-Translational , Proteome/chemistry , Proteome/metabolism , Sequence Analysis, Protein/methods , Animals , Databases, Protein , Humans , Internet , Mice , Phosphorylation , Rats , Software
18.
Int J Cancer ; 136(4): 863-74, 2015 Feb 15.
Article in English | MEDLINE | ID: mdl-24975271

ABSTRACT

In patients with metastatic melanoma, the identification and validation of accurate prognostic biomarkers will assist rational treatment planning. Studies based on "-omics" technologies have focussed on a single high-throughput data type such as gene or microRNA transcripts. Occasionally, these features have been evaluated in conjunction with limited clinico-pathologic data. With the increased availability of multiple data types, there is a pressing need to tease apart which of these sources contain the most valuable prognostic information. We evaluated and integrated several data types derived from the same tumor specimens in AJCC stage III melanoma patients (gene, protein, and microRNA expression as well as clinical, pathologic and mutation information) to determine their relative impact on prognosis. We used classification frameworks based on pre-validation and bootstrap multiple imputation to compare the prognostic power of each data source, both individually and in combination. We found that the prognostic utility of clinico-pathologic information was not out-performed by any of the various "-omics" platforms. Rather, a combination of clinico-pathologic variables and mRNA expression data performed best. Furthermore, a patient-based classification analysis revealed that the prognostic accuracy of various data types was not the same for different patients. This indicates that ongoing development in the individualized evaluation of melanoma patients must take account of the value of both traditional and novel "-omics" measurements.


Subject(s)
Biomarkers, Tumor/metabolism , Melanoma/genetics , MicroRNAs/metabolism , RNA, Messenger/metabolism , Skin Neoplasms/genetics , Biomarkers, Tumor/genetics , Cohort Studies , DNA Mutational Analysis , Humans , Melanoma/metabolism , Melanoma/secondary , MicroRNAs/genetics , Prognosis , Proteome/genetics , Proteome/metabolism , RNA, Messenger/genetics , Skin Neoplasms/metabolism , Skin Neoplasms/pathology
19.
Sci Rep ; 14(1): 4248, 2024 02 21.
Article in English | MEDLINE | ID: mdl-38378802

ABSTRACT

In the enduring challenge against disease, advancements in medical technology have empowered clinicians with novel diagnostic platforms. Whilst in some cases, a single test may provide a confident diagnosis, often additional tests are required. However, to strike a balance between diagnostic accuracy and cost-effectiveness, one must rigorously construct the clinical pathways. Here, we developed a framework to build multi-platform precision pathways in an automated, unbiased way, recommending the key steps a clinician would take to reach a diagnosis. We achieve this by developing a confidence score, used to simulate a clinical scenario, where at each stage, either a confident diagnosis is made, or another test is performed. Our framework provides a range of tools to interpret, visualize and compare the pathways, improving communication and enabling their evaluation on accuracy and cost, specific to different contexts. This framework will guide the development of novel diagnostic pathways for different diseases, accelerating the implementation of precision medicine into clinical practice.
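
The staged decision described above (at each stage either a confident diagnosis is made or another test is performed) can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the framework's implementation: the abstract does not define how the confidence score is computed, so the `StageResult` fields, the fixed threshold, the fall-back to the last test when no stage is confident, and the example platforms and costs are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class StageResult:
    platform: str      # name of the diagnostic test (hypothetical labels)
    prediction: str    # diagnosis suggested by this test
    confidence: float  # assumed to lie in [0, 1]; its definition is not given in the abstract
    cost: float        # cost of running this test, in arbitrary units


def walk_pathway(stages, threshold=0.8):
    """Run tests in order and stop at the first confident prediction.

    Returns (prediction, platforms_used, total_cost). If no stage reaches
    the threshold, the last stage's prediction is returned (an assumed policy).
    """
    if not stages:
        raise ValueError("at least one stage is required")
    used, total = [], 0.0
    for stage in stages:
        used.append(stage.platform)
        total += stage.cost
        if stage.confidence >= threshold:
            return stage.prediction, used, total
    return stages[-1].prediction, used, total


# Hypothetical patient: the cheap first test is not confident enough,
# so the pathway escalates to the second platform and stops there.
stages = [
    StageResult("clinical", "benign", 0.60, 10.0),
    StageResult("gene expression", "malignant", 0.90, 200.0),
    StageResult("proteomics", "malignant", 0.95, 300.0),
]
print(walk_pathway(stages))
```

Tracking the platforms used and the accumulated cost alongside the prediction is what allows alternative test orderings to be compared on both accuracy and cost.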


Subject(s)
Communication , Precision Medicine , Mental Processes
20.
Bioinformatics ; 28(10): 1404-5, 2012 May 15.
Article in English | MEDLINE | ID: mdl-22467906

ABSTRACT

MOTIVATION: Mass spectrometry-based iTRAQ protein quantification is a high-throughput assay for determining relative protein expressions and identifying disease biomarkers. Processing and analysis of these large and complex data involve a number of distinct components, and it is desirable to have a pipeline to integrate them efficiently. To date, there are few publicly available comprehensive analysis pipelines for iTRAQ data, and many of these existing pipelines have limited visualization tools and no convenient interfaces with downstream analyses. We have developed a new open source comprehensive iTRAQ analysis pipeline, OCAP, integrating a wavelet-based preprocessing algorithm which provides better peak picking, a new quantification algorithm and a suite of visualization tools. OCAP is mainly developed in C++ and is provided as a standalone version (OCAP_standalone) as well as an R package. The R package (OCAP) provides the necessary interfaces with downstream statistical analysis.


Subject(s)
Algorithms , Proteins/genetics , Proteomics/methods , Software , Disease/genetics , Humans , Mass Spectrometry , Proteins/analysis , Proteomics/instrumentation