Search | VHL CLAP/WR-PAHO/WHO

1.

Data integration and inference of gene regulation using single-cell temporal multimodal data with scTIE.

Lin, Yingxin; Wu, Tung-Yu; Chen, Xi; Wan, Sheng; Chao, Brian; Xin, Jingxue; Yang, Jean Y H; Wong, Wing H; Wang, Y X Rachel.

Genome Res ; 34(1): 119-133, 2024 02 07.

Article in English | MEDLINE | ID: mdl-38190633

ABSTRACT

Single-cell technologies offer unprecedented opportunities to dissect gene regulatory mechanisms in context-specific ways. Although there are computational methods for extracting gene regulatory relationships from scRNA-seq and scATAC-seq data, the data integration problem, essential for accurate cell type identification, has mostly been treated as a standalone challenge. Here we present scTIE, a unified method that integrates temporal multimodal data and infers regulatory relationships predictive of cellular state changes. scTIE uses an autoencoder to embed cells from all time points into a common space by using iterative optimal transport, followed by extracting interpretable information to predict cell trajectories. Using a variety of synthetic and real temporal multimodal data sets, we show that scTIE achieves effective data integration while preserving more biological signals than existing methods, particularly in the presence of batch effects and noise. Furthermore, on the exemplar multiome data set we generated from differentiating mouse embryonic stem cells over time, we show that scTIE captures regulatory elements highly predictive of cell transition probabilities, offering new potential for understanding the regulatory landscape driving developmental processes.
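The abstract says scTIE aligns cells across time points using iterative optimal transport. As a rough illustration of what an optimal-transport coupling between two time points looks like, the sketch below runs standard Sinkhorn iterations for entropy-regularized OT on toy 2-D embeddings. The function name `sinkhorn`, the regularization `reg`, the normalized squared-Euclidean cost and the random embeddings are all assumptions for illustration; this is not scTIE's implementation.

```python
import numpy as np


def sinkhorn(a, b, cost, reg=0.1, n_iter=1000):
    """Entropy-regularized optimal transport plan between weights a and b.

    Hypothetical helper: alternately rescales the rows and columns of the
    Gibbs kernel so the plan's marginals match a (rows) and b (columns).
    """
    K = np.exp(-cost / reg)  # Gibbs kernel, strictly positive
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iter):
        u = a / (K @ v)    # fit row sums to a
        v = b / (K.T @ u)  # fit column sums to b
    return u[:, None] * K * v[None, :]


rng = np.random.default_rng(0)
cells_t0 = rng.normal(size=(6, 2))                    # toy embeddings at time t
cells_t1 = cells_t0 + 0.1 * rng.normal(size=(6, 2))   # toy embeddings at time t+1
cost = ((cells_t0[:, None, :] - cells_t1[None, :, :]) ** 2).sum(axis=-1)
cost = cost / cost.max()                              # scale costs to [0, 1]
a = np.full(6, 1 / 6)
b = np.full(6, 1 / 6)
plan = sinkhorn(a, b, cost)
# Row-normalizing the plan gives a hypothetical per-cell "transition probability".
transition = plan / plan.sum(axis=1, keepdims=True)
```

Smaller values of `reg` give a sharper, closer-to-one-to-one coupling at the cost of slower convergence.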

Subject(s)

Gene Expression Profiling , Single-Cell Analysis , Animals , Mice , Gene Expression Profiling/methods , Single-Cell Analysis/methods , Gene Expression Regulation

2.

scSTAR reveals hidden heterogeneity with a real-virtual cell pair structure across conditions in single-cell RNA sequencing data.

Hao, Jie; Zou, Jiawei; Zhang, Jiaqiang; Chen, Ke; Wu, Duojiao; Cao, Wei; Shang, Guoguo; Yang, Jean Y H; Wong-Lin, KongFatt; Sun, Hourong; Zhang, Zhen; Wang, Xiangdong; Chen, Wantao; Zou, Xin.

Brief Bioinform ; 24(2), 2023 03 19.

Article in English | MEDLINE | ID: mdl-36813563

ABSTRACT

Cell-state transition can reveal additional information from single-cell ribonucleic acid (RNA)-sequencing data in time-resolved biological phenomena. However, most of the current methods are based on the time derivative of the gene expression state, which restricts them to the short-term evolution of cell states. Here, we present single-cell State Transition Across-samples of RNA-seq data (scSTAR), which overcomes this limitation by constructing a paired-cell projection between biological conditions with an arbitrary time span by maximizing the covariance between two feature spaces using partial least squares and minimum squared error methods. In mouse ageing data, the response to stress in CD4+ memory T cell subtypes was found to be associated with ageing. A novel Treg subtype characterized by mTORC activation was identified to be associated with antitumour immune suppression, which was confirmed by immunofluorescence microscopy and survival analysis in 11 cancers from The Cancer Genome Atlas Program. On melanoma data, scSTAR improved immunotherapy-response prediction accuracy from 0.8 to 0.96.
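scSTAR is described as maximizing the covariance between two feature spaces with partial least squares. The sketch below shows only the generic first step of PLS in its SVD form: the unit-norm weight vectors that maximize the covariance between projections of two centred matrices are the leading singular vectors of their cross-product. It assumes X and Y share the same rows (paired observations); the function name and toy data are hypothetical, and scSTAR's paired-cell construction and minimum-squared-error step are not reproduced.

```python
import numpy as np


def first_pls_pair(X, Y):
    """Leading PLS weight pair (w_x, w_y) maximizing cov(X w_x, Y w_y).

    Hypothetical helper; rows of X and Y must be paired observations.
    Returns the weights, the two score vectors and the top singular value.
    """
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    C = Xc.T @ Yc                          # cross-product matrix (p x q)
    U, s, Vt = np.linalg.svd(C, full_matrices=False)
    w_x, w_y = U[:, 0], Vt[0]              # unit-norm weight vectors
    return w_x, w_y, Xc @ w_x, Yc @ w_y, s[0]


rng = np.random.default_rng(1)
latent = rng.normal(size=(50, 1))          # shared hidden signal
X = latent @ rng.normal(size=(1, 8)) + 0.3 * rng.normal(size=(50, 8))
Y = latent @ rng.normal(size=(1, 5)) + 0.3 * rng.normal(size=(50, 5))
w_x, w_y, t_x, t_y, top = first_pls_pair(X, Y)
```

The inner product of the two score vectors equals the top singular value, which no other pair of unit-norm weights can exceed.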

Subject(s)

Gene Expression Profiling , RNA , Animals , Mice , RNA/genetics , Gene Expression Profiling/methods , Sequence Analysis, RNA/methods , Genome

3.

A field guide to cultivating computational biology.

Way, Gregory P; Greene, Casey S; Carninci, Piero; Carvalho, Benilton S; de Hoon, Michiel; Finley, Stacey D; Gosline, Sara J C; Lê Cao, Kim-Anh; Lee, Jerry S H; Marchionni, Luigi; Robine, Nicolas; Sindi, Suzanne S; Theis, Fabian J; Yang, Jean Y H; Carpenter, Anne E; Fertig, Elana J.

PLoS Biol ; 19(10): e3001419, 2021 10.

Article in English | MEDLINE | ID: mdl-34618807

ABSTRACT

Evolving in sync with the computation revolution over the past 30 years, computational biology has emerged as a mature scientific field. While the field has made major contributions toward improving scientific knowledge and human health, individual computational biology practitioners at various institutions often languish in career development. As optimistic biologists passionate about the future of our field, we propose solutions for both eager and reluctant individual scientists, institutions, publishers, funding agencies, and educators to fully embrace computational biology. We believe that in order to pave the way for the next generation of discoveries, we need to improve recognition for computational biologists and better align pathways of career success with pathways of scientific progress. With 10 outlined steps, we call on all adjacent fields to move away from the traditional individual, single-discipline investigator research model and embrace multidisciplinary, data-driven, team science.

Subject(s)

Computational Biology , Budgets , Cooperative Behavior , Humans , Interdisciplinary Research , Mentoring , Motivation , Publications , Reward , Software

4.

Scalable workflow for characterization of cell-cell communication in COVID-19 patients.

Lin, Yingxin; Loo, Lipin; Tran, Andy; Lin, David M; Moreno, Cesar; Hesselson, Daniel; Neely, G Gregory; Yang, Jean Y H.

PLoS Comput Biol ; 18(10): e1010495, 2022 10.

Article in English | MEDLINE | ID: mdl-36197936

ABSTRACT

COVID-19 patients display a wide range of disease severity, ranging from asymptomatic to critical symptoms with high mortality risk. Our ability to understand the interaction of SARS-CoV-2 infected cells within the lung, and of protective or dysfunctional immune responses to the virus, is critical to effectively treat these patients. Currently, our understanding of cell-cell interactions across different disease states, and how such interactions may drive pathogenic outcomes, is incomplete. Here, we developed a generalizable and scalable workflow for identifying cells that are differentially interacting across COVID-19 patients with distinct disease outcomes and used it to examine eight public single-cell RNA-seq datasets (six from peripheral blood mononuclear cells, one from bronchoalveolar lavage and one from nasopharyngeal samples), with a total of 211 individual samples. By characterizing the cell-cell interaction patterns across epithelial and immune cells in lung tissues for patients with varying disease severity, we illustrate diverse communication patterns across individuals and discover heterogeneous communication patterns among moderate and severe patients. We further show that patterns derived from cell-cell interactions are potential signatures for discriminating between moderate and severe patients. Overall, this workflow can be generalized and scaled to combine multiple scRNA-seq datasets to uncover cell-cell interactions.

Subject(s)

COVID-19 , Cell Communication , Humans , Leukocytes, Mononuclear , SARS-CoV-2 , Workflow

5.

Whole genome duplication in oral squamous cell carcinoma in patients younger than 50 years: implications for prognosis and adverse clinicopathological factors.

Satgunaseelan, Laveniya; Strbenac, Dario; Willet, Cali; Chew, Tracy; Sadsad, Rosemarie; Wykes, James; Low, Tsu-Hui Hubert; Cooper, Wendy A; Lee, C Soon; Palme, Carsten E; Yang, Jean Y H; Clark, Jonathan R; Gupta, Ruta.

Genes Chromosomes Cancer ; 61(9): 561-571, 2022 09.

Article in English | MEDLINE | ID: mdl-35670448

ABSTRACT

INTRODUCTION: Oral squamous cell carcinoma (OSCC) in the young (<50 years), without known carcinogenic risk factors, is on the rise globally. Whole genome duplication (WGD) has been shown to occur at higher rates in cancers without an identifiable carcinogenic agent. We aimed to evaluate the prevalence of WGD in a cohort of OSCC patients under the age of 50 years. METHODS: Whole genome sequencing (WGS) was performed on 28 OSCC patients from the Sydney Head and Neck Cancer Institute (SHNCI) biobank. An additional nine cases were obtained from The Cancer Genome Atlas (TCGA). RESULTS: WGD was seen in 27 of 37 (73%) cases. Non-synonymous, somatic TP53 mutations occurred in 25 of 27 (93%) cases of WGD and were predicted to precede WGD in 21 (77%). WGD was significantly associated with larger tumor size (p = 0.01) and was frequent in patients with recurrences (87%, p = 0.36). Overall survival was significantly worse in those with WGD (p = 0.05). CONCLUSIONS: Our data, based on one of the largest WGS datasets of young patients with OSCC, demonstrate a high frequency of WGD and its association with adverse pathologic characteristics and clinical outcomes. TP53 mutations also preceded WGD, as has been described in other tumors without a clear mutagenic driver.

Subject(s)

Carcinoma, Squamous Cell , Head and Neck Neoplasms , Mouth Neoplasms , Carcinoma, Squamous Cell/genetics , Gene Duplication , Head and Neck Neoplasms/genetics , Humans , Middle Aged , Mouth Neoplasms/genetics , Squamous Cell Carcinoma of Head and Neck/genetics

6.

LC-N2G: a local consistency approach for nutrigenomics data analysis.

Xu, Xiangnan; Solon-Biet, Samantha M; Senior, Alistair; Raubenheimer, David; Simpson, Stephen J; Fontana, Luigi; Mueller, Samuel; Yang, Jean Y H.

BMC Bioinformatics ; 21(1): 530, 2020 Nov 17.

Article in English | MEDLINE | ID: mdl-33203358

ABSTRACT

BACKGROUND: Nutrigenomics aims to understand the interaction between nutrition and gene information. Due to the complex interactions of nutrients and genes, their relationship exhibits non-linearity. One of the most effective and efficient methods to explore their relationship is the nutritional geometry framework, which fits a response surface for gene expression over two prespecified nutrition variables. However, when the number of nutrients involved is large, it is challenging to find combinations of informative nutrients with respect to a certain gene and to test whether the relationship is stronger than chance. Methods for identifying informative combinations are essential to understanding the relationship between nutrients and genes. RESULTS: We introduce Local Consistency Nutrition to Graphics (LC-N2G), a novel approach for ranking and identifying combinations of nutrients associated with gene expression. In LC-N2G, we first propose a model-free quantity, the Local Consistency (LC) statistic, to measure whether there is a non-random relationship between combinations of nutrients and gene expression measurements based on (1) the similarity between samples in the nutrient space and (2) their difference in gene expression. Then combinations with small LC are selected and a permutation test is performed to evaluate their significance. Finally, response surfaces are generated for the subset of significant relationships. Evaluation on simulated and real data shows that LC-N2G can accurately find combinations that are correlated with gene expression. CONCLUSION: LC-N2G is practically powerful for identifying the informative nutrition variables correlated with gene expression. Therefore, LC-N2G is important in nutrigenomics for understanding the relationship between nutrition and gene expression information.
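The abstract describes LC-N2G as scoring a nutrient combination by comparing how similar samples are in nutrient space with how different their gene expression is, then assessing small scores with a permutation test. The sketch below implements one statistic of that kind, assuming a k-nearest-neighbour definition of similarity: the mean squared expression difference between each sample and its k nearest neighbours in nutrient space, divided by twice the expression variance, so values near 1 look random and values near 0 look locally consistent. The exact LC formula in the paper may differ; `knn_indices`, `local_consistency`, `k` and the toy data are hypothetical.

```python
import numpy as np


def knn_indices(nutrients, k=3):
    """Indices of each sample's k nearest neighbours in nutrient space."""
    d = np.linalg.norm(nutrients[:, None, :] - nutrients[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)  # a sample is not its own neighbour
    return np.argsort(d, axis=1)[:, :k]


def local_consistency(expr, nbrs):
    """Hypothetical LC-style score: neighbour differences relative to variance."""
    local = ((expr[:, None] - expr[nbrs]) ** 2).mean()
    return local / (2 * expr.var())


def permutation_pvalue(nutrients, expr, k=3, n_perm=999, seed=0):
    """One-sided permutation test: small scores suggest non-random structure."""
    nbrs = knn_indices(nutrients, k)
    rng = np.random.default_rng(seed)
    observed = local_consistency(expr, nbrs)
    null = np.array([local_consistency(rng.permutation(expr), nbrs)
                     for _ in range(n_perm)])
    p = (1 + np.count_nonzero(null <= observed)) / (n_perm + 1)
    return observed, p


rng = np.random.default_rng(42)
nutrients = rng.uniform(size=(60, 2))      # e.g. two intake variables per sample
structured = nutrients.sum(axis=1) + 0.05 * rng.normal(size=60)
unrelated = rng.normal(size=60)
lc_s, p_s = permutation_pvalue(nutrients, structured)
lc_u, p_u = permutation_pvalue(nutrients, unrelated)
```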

Subject(s)

Algorithms , Data Analysis , Nutrigenomics , Animal Nutritional Physiological Phenomena , Animals , Computer Simulation , Gene Expression Regulation , Mice , Nonlinear Dynamics

7.

Mutational and transcriptomic landscapes of a rare human prostate basal cell carcinoma.

Su, Xianbin; Long, Qi; Bo, Juanjie; Shi, Yi; Zhao, Li-Nan; Lin, Yingxin; Luo, Qing; Ghazanfar, Shila; Zhang, Chao; Liu, Qiang; Wang, Lan; He, Kunyan; He, Jian; Cui, Xiaofang; Yang, Jean Y H; Han, Ze-Guang; Yang, Guoliang; Sha, Jian-Jun.

Prostate ; 80(6): 508-517, 2020 05.

Article in English | MEDLINE | ID: mdl-32119131

ABSTRACT

BACKGROUND: As a rare subtype of prostate carcinoma, basal cell carcinoma (BCC) has not been studied extensively and thus lacks systematic molecular characterization. METHODS: Here, we applied single-cell genomic amplification and RNA-Seq to a specimen of human prostate BCC (CK34βE12+/P63+/PAP-/PSA-). The mutational landscape was obtained via whole exome sequencing of the amplification mixture of 49 single cells, and the transcriptomes of 69 single cells were also obtained. RESULTS: The five putative driver genes mutated in BCC are CASC5, NUTM1, PTPRC, KMT2C, and TBX3, and the top three nucleotide substitutions are C>T, T>C, and C>A, similar to common prostate cancer. The distribution of the variant allele frequency values indicated that these single cells are from the same tumor clone. The 69 single cells were clustered into tumor, stromal, and immune cells based on their global transcriptomic profiles. The tumor cells specifically express basal cell markers such as KRT5, KRT14, and KRT23 and epithelial markers EPCAM, CDH1, and CD24. Transcription factor covariance network analysis showed that the BCC tumor cells have distinct regulatory networks. By comparison with current prostate cancer datasets, we found that some of the bulk samples exhibit basal cell signatures. Interestingly, at single-cell resolution, the gene expression patterns of prostate BCC tumor cells are distinct from those of common prostate cancer-derived circulating tumor cells. CONCLUSIONS: This study, for the first time, describes the comprehensive mutational and transcriptomic landscapes of prostate BCC, which lays a foundation for understanding its tumorigenesis mechanism and provides new insights into prostate cancers in general.

Subject(s)

Carcinoma, Basal Cell/genetics , Prostatic Neoplasms/genetics , Biopsy, Needle , Carcinoma, Basal Cell/pathology , Gene Amplification , Gene Expression Profiling/methods , Gene Frequency , Humans , Immunohistochemistry , Male , Middle Aged , Mutation , Prostatic Neoplasms/pathology , Single-Cell Analysis/methods , Stromal Cells/pathology , Transcriptome , Exome Sequencing

8.

3D reconstruction of spatial expression.

Lin, Yingxin; Yang, Jean Y H.

Nat Methods ; 19(5): 526-527, 2022 05.

Article in English | MEDLINE | ID: mdl-35577956

Subject(s)

Image Processing, Computer-Assisted , Imaging, Three-Dimensional , Algorithms

9.

DCARS: differential correlation across ranked samples.

Ghazanfar, Shila; Strbenac, Dario; Ormerod, John T; Yang, Jean Y H; Patrick, Ellis.

Bioinformatics ; 35(5): 823-829, 2019 03 01.

Article in English | MEDLINE | ID: mdl-30102408

ABSTRACT

MOTIVATION: Genes act as a system and not in isolation. Thus, it is important to consider coordinated changes of gene expression rather than single genes when investigating biological phenomena such as the aetiology of cancer. We have developed an approach, called Differential Correlation across Ranked Samples (DCARS), for quantifying how changes in the association between pairs of genes may inform the outcome of interest. Modelling gene correlation across a continuous sample ranking does not require the dichotomisation of samples into two distinct classes and can identify differences in gene correlation across early, mid or late stages of the outcome of interest. RESULTS: When we evaluated DCARS on real TCGA data against the typical Fisher Z-transformation test for differential correlation, as well as a typical approach testing for interaction within a linear model, DCARS ranked gene pairs containing known cancer genes significantly higher across several cancers. Similar results were found in our simulation study. DCARS was applied to 13 cancer datasets in TCGA, revealing several distinct relationships for which survival ranking was found to be associated with a change in correlation between genes. Furthermore, we demonstrated that DCARS can be used in conjunction with network analysis techniques to extract biological meaning from multi-layered and complex data. AVAILABILITY AND IMPLEMENTATION: The DCARS R package, sample data and supplementary files are available at https://github.com/shazanfar/DCARS. Publicly available data from The Cancer Genome Atlas (TCGA) were obtained using the TCGAbiolinks R package. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
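DCARS models gene-gene correlation along a continuous sample ranking rather than between two classes. A rough way to see the idea: sort samples by the ranking variable (for example survival), compute the correlation of the gene pair in a sliding window, and ask whether that local correlation varies more than it would under a random ranking. The sketch below does this with hard windows and a permutation test on the standard deviation of the local correlations; it is an approximation in the spirit of the description, not the DCARS package's weighting scheme or API, and `local_correlations`, `dcars_like_test`, `window` and the toy data are hypothetical.

```python
import numpy as np


def local_correlations(x, y, ranking, window=20):
    """Pearson correlation of x and y in sliding windows along the ranking."""
    order = np.argsort(ranking)
    xs, ys = x[order], y[order]
    return np.array([
        np.corrcoef(xs[i:i + window], ys[i:i + window])[0, 1]
        for i in range(len(xs) - window + 1)
    ])


def dcars_like_test(x, y, ranking, window=20, n_perm=199, seed=0):
    """Permutation test on the spread (sd) of the local correlations.

    A large sd means the gene-gene correlation changes along the ranking.
    """
    rng = np.random.default_rng(seed)
    observed = local_correlations(x, y, ranking, window).std()
    null = np.array([
        local_correlations(x, y, rng.permutation(ranking), window).std()
        for _ in range(n_perm)
    ])
    p = (1 + np.count_nonzero(null >= observed)) / (n_perm + 1)
    return observed, p


rng = np.random.default_rng(7)
n = 200
ranking = np.arange(n)                  # e.g. samples ordered by survival time
x = rng.normal(size=n)
slope = np.linspace(-1, 1, n)           # correlation flips sign along the ranking
y_changing = slope * x + 0.5 * rng.normal(size=n)
stat, p = dcars_like_test(x, y_changing, ranking)
```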

Subject(s)

Neoplasms , Genome , Humans , Software

10.

bcGST: an interactive bias-correction method to identify over-represented gene-sets in boutique arrays.

Wang, Kevin Y X; Menzies, Alexander M; Silva, Ines P; Wilmott, James S; Yan, Yibing; Wongchenko, Matthew; Kefford, Richard F; Scolyer, Richard A; Long, Georgina V; Tarr, Garth; Mueller, Samuel; Yang, Jean Y H.

Bioinformatics ; 35(8): 1350-1357, 2019 04 15.

Article in English | MEDLINE | ID: mdl-30215668

ABSTRACT

MOTIVATION: Gene annotation and pathway databases such as Gene Ontology and Kyoto Encyclopaedia of Genes and Genomes are important tools in Gene-Set Test (GST) that describe gene biological functions and associated pathways. GST aims to establish an association between a gene-set of interest and an annotation. Importantly, GST tests for over-representation of genes in an annotation term. One implicit assumption of GST is that the gene expression platform captures the complete genome or a very large proportion of it. However, this assumption is satisfied neither by the increasingly popular boutique arrays nor by custom-designed gene expression profiling platforms. Specifically, conventional GST is no longer appropriate due to the gene-set selection bias induced during the construction of these platforms. RESULTS: We propose bcGST, a bias-corrected GST that introduces bias-correction terms into the contingency table used to calculate Fisher's exact test. The adjustment works by estimating the proportion of genes captured on the array with respect to the genome in order to assist filtration of annotation terms that would otherwise be falsely included or excluded. We illustrate the practicality of bcGST and its stability through multiple differential gene expression analyses in melanoma and The Cancer Genome Atlas cancer studies. AVAILABILITY AND IMPLEMENTATION: The bcGST method is available as a Shiny web application at http://shiny.maths.usyd.edu.au/bcGST/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
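Over-representation in a conventional GST is usually assessed with a one-sided Fisher's exact test, which is the upper tail of a hypergeometric distribution. The sketch below computes that tail with the standard library and runs the same hypothetical gene set against two different backgrounds, a whole genome and a small boutique array, to show how strongly the choice of universe drives the p-value. It does not reproduce bcGST's bias-correction terms; the function name and all counts are hypothetical.

```python
from math import comb


def overrepresentation_p(hits, n_selected, n_term, n_universe):
    """One-sided Fisher's exact (hypergeometric) p-value P(X >= hits).

    X counts annotated genes among n_selected genes drawn without
    replacement from n_universe genes, n_term of which carry the annotation.
    """
    upper = min(n_selected, n_term)
    tail = sum(
        comb(n_term, i) * comb(n_universe - n_term, n_selected - i)
        for i in range(hits, upper + 1)
    )
    return tail / comb(n_universe, n_selected)


# Hypothetical example: 10 of 50 differentially expressed genes carry a term.
p_genome = overrepresentation_p(10, 50, n_term=200, n_universe=20000)
p_array = overrepresentation_p(10, 50, n_term=100, n_universe=800)
```

With the whole genome as background only 0.5 annotated hits are expected, so 10 looks extreme; on the array, where the term's genes were enriched at design time, about 6 are expected and the same 10 hits are far less surprising.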

Subject(s)

Gene Expression Profiling , Software , Computational Biology , Gene Ontology , Genome , Molecular Sequence Annotation

11.

One-Time Fecal Immunochemical Screening for Advanced Colorectal Neoplasia in Patients with CKD (DETECT Study).

Wong, Germaine; Hope, Richard L; Howard, Kirsten; Chapman, Jeremy R; Castells, Antoni; Roger, Simon D; Bourke, Michael J; Macaskill, Petra; Turner, Robin; Williams, Gabrielle; Lim, Wai Hon; Lok, Charmaine E; Diekmann, Fritz; Cross, Nicholas B; Sen, Shaundeep; Allen, Richard D M; Chadban, Steven J; Pollock, Carol A; Tong, Allison; Teixeira-Pinto, Armando; Yang, Jean Y H; Williams, Narelle; Au, Eric Hoi Kit; Kieu, Anh; James, Laura; Craig, Jonathan C.

J Am Soc Nephrol ; 30(6): 1061-1072, 2019 06.

Article in English | MEDLINE | ID: mdl-31040191

ABSTRACT

BACKGROUND: In patients with CKD, the risk of developing colorectal cancer is high and outcomes are poor. Screening using fecal immunochemical testing (FIT) is effective in reducing mortality from colorectal cancer, but the performance characteristics of FIT in CKD are unknown. METHODS: To determine the detection rates and performance characteristics of FIT for advanced colorectal neoplasia (ACN) in patients with CKD, we used FIT to prospectively screen patients aged 35-74 years with CKD (stages 3-5 CKD, dialysis, and renal transplant) from 11 sites in Australia, New Zealand, Canada, and Spain. All participants received clinical follow-up at 2 years. We used a two-step reference standard approach to estimate disease status. RESULTS: Overall, 369 of the 1706 patients who completed FIT (21.6%) tested positive; 323 of these (87.5%) underwent colonoscopy. A total of 1553 (91.0%) completed follow-up; 82 (4.8%) had died and 71 (4.2%) were lost to follow-up. The detection rate of ACN using FIT was 6.0% (5.6%, 7.4%, and 5.6% for stages 3-5 CKD, dialysis, and transplant, respectively). Sensitivity, specificity, and positive and negative predictive values of FIT for ACN were 0.90, 0.83, 0.30, and 0.99, respectively. Of participants who underwent colonoscopy, five (1.5%) experienced major colonoscopy-related complications, including bowel perforation and major bleeding. CONCLUSIONS: FIT appears to be an accurate screening test for patients with CKD, such that a negative test may rule out the diagnosis of colorectal cancer within 2 years. However, the risk of major complications from work-up colonoscopy is at least tenfold higher than in the general population.
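The four accuracy measures quoted in the abstract come from a 2x2 table of test result against disease status. The sketch below spells out those definitions; the counts are hypothetical round numbers, not DETECT data, and the study's two-step reference standard is not modelled. It also shows why a low prevalence of disease can leave the positive predictive value modest while the negative predictive value stays high.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Accuracy measures of a binary screening test from a 2x2 table."""
    return {
        "sensitivity": tp / (tp + fn),  # diseased who test positive
        "specificity": tn / (tn + fp),  # disease-free who test negative
        "ppv": tp / (tp + fp),          # positives who truly have disease
        "npv": tn / (tn + fn),          # negatives who are truly disease-free
    }


# Hypothetical counts: 100 people with ACN and 1000 without.
m = diagnostic_metrics(tp=90, fp=200, fn=10, tn=800)
# The positivity rate quoted in the abstract: 369 positive of 1706 tests.
positivity = 369 / 1706
```

Here sensitivity is 0.90 and specificity 0.80, yet only about 31% of positives have disease because the 200 false positives outnumber the 90 true positives.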

Subject(s)

Cause of Death , Colorectal Neoplasms/epidemiology , Colorectal Neoplasms/pathology , Early Detection of Cancer/methods , Renal Insufficiency, Chronic/epidemiology , Renal Insufficiency, Chronic/therapy , Adult , Aged , Australia , Canada , Cohort Studies , Colonoscopy/methods , Colorectal Neoplasms/diagnosis , Comorbidity , Female , Humans , Immunohistochemistry , Internationality , Male , Mass Screening/methods , Middle Aged , New Zealand , Occult Blood , Prevalence , Renal Insufficiency, Chronic/diagnosis , Retrospective Studies , Risk Assessment , Spain , Survival Analysis

12.

Autoencoder-based cluster ensembles for single-cell RNA-seq data analysis.

Geddes, Thomas A; Kim, Taiyun; Nan, Lihao; Burchfield, James G; Yang, Jean Y H; Tao, Dacheng; Yang, Pengyi.

BMC Bioinformatics ; 20(Suppl 19): 660, 2019 Dec 24.

Article in English | MEDLINE | ID: mdl-31870278

ABSTRACT

BACKGROUND: Single-cell RNA-sequencing (scRNA-seq) is a transformative technology, allowing global transcriptomes of individual cells to be profiled with high accuracy. An essential task in scRNA-seq data analysis is the identification of cell types from complex samples or tissues profiled in an experiment. To this end, clustering has become a key computational technique for grouping cells based on their transcriptome profiles, enabling subsequent cell type identification from each cluster of cells. Due to the high feature-dimensionality of the transcriptome (i.e. the large number of measured genes in each cell) and because only a small fraction of genes are cell type-specific and therefore informative for generating cell type-specific clusters, clustering directly on the original feature/gene dimension may lead to uninformative clusters and hinder correct cell type identification. RESULTS: Here, we propose an autoencoder-based cluster ensemble framework in which we first take random subspace projections from the data, then compress each random projection to a low-dimensional space using an autoencoder artificial neural network, and finally apply ensemble clustering across all encoded datasets to generate clusters of cells. We employ four evaluation metrics to benchmark clustering performance and our experiments demonstrate that the proposed autoencoder-based cluster ensemble can lead to substantially improved cell type-specific clusters when applied with both the standard k-means clustering algorithm and a state-of-the-art kernel-based clustering algorithm (SIMLR) designed specifically for scRNA-seq data. Compared to directly using these clustering algorithms on the original datasets, the performance improvement in some cases is up to 100%, depending on the evaluation metric used. CONCLUSIONS: Our results suggest that the proposed framework can facilitate more accurate cell type identification as well as other downstream analyses. The code for creating the proposed autoencoder-based cluster ensemble framework is freely available from https://github.com/gedcom/scCCESS.
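The final step described above, ensemble clustering across all encoded datasets, can be illustrated with a co-association (consensus) matrix: for every pair of cells, the fraction of base clusterings that put them together. The sketch below assumes the base labelings already exist (in the paper they would come from k-means or SIMLR on each autoencoder-compressed random projection, which is not reproduced here) and merges cells linked by a co-association above a hypothetical threshold of 0.5. The consensus function used by scCCESS may differ; `co_association` and `consensus_clusters` are hypothetical names.

```python
import numpy as np


def co_association(labelings):
    """Fraction of base clusterings in which each pair of cells co-clusters."""
    labelings = np.asarray(labelings)              # shape (n_clusterings, n_cells)
    same = labelings[:, :, None] == labelings[:, None, :]
    return same.mean(axis=0)                       # shape (n_cells, n_cells)


def consensus_clusters(labelings, threshold=0.5):
    """Connected components of the graph linking cells with co-association > threshold."""
    A = co_association(labelings)
    n = A.shape[0]
    labels = -np.ones(n, dtype=int)
    current = 0
    for start in range(n):
        if labels[start] >= 0:
            continue
        labels[start] = current
        stack = [start]
        while stack:                               # depth-first traversal
            i = stack.pop()
            for j in np.flatnonzero(A[i] > threshold):
                if labels[j] < 0:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


base_labelings = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],   # same partition, different label names
    [0, 0, 1, 1, 1, 1],   # one cell assigned differently
]
consensus = consensus_clusters(base_labelings)
```

Because co-association ignores label names, the second labeling agrees fully with the first even though its cluster IDs are swapped.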

Subject(s)

Sequence Analysis, RNA , Algorithms , Cluster Analysis , Data Analysis , Humans , Neural Networks, Computer , RNA-Seq , Single-Cell Analysis , Transcriptome

13.

scDC: single cell differential composition analysis.

Cao, Yue; Lin, Yingxin; Ormerod, John T; Yang, Pengyi; Yang, Jean Y H; Lo, Kitty K.

BMC Bioinformatics ; 20(Suppl 19): 721, 2019 Dec 24.

Article in English | MEDLINE | ID: mdl-31870280

ABSTRACT

BACKGROUND: Differences in cell-type composition across subjects and conditions often carry biological significance. Recent advancements in single cell sequencing technologies enable cell types to be identified at the single cell level, and as a result, cell-type composition of tissues can now be studied in exquisite detail. However, a number of challenges remain with cell-type composition analysis: none of the existing methods can identify cell types perfectly, and variability related to cell sampling exists in any single cell experiment. This necessitates the development of methods for estimating uncertainty in cell-type composition. RESULTS: We developed a novel single cell differential composition (scDC) analysis method that performs differential cell-type composition analysis via bootstrap resampling. scDC captures the uncertainty associated with the cell-type proportions of each subject via bias-corrected and accelerated bootstrap confidence intervals. We assessed the performance of our method using a number of simulated datasets and synthetic datasets curated from publicly available single cell datasets. In simulated datasets, scDC correctly recovered the true cell-type proportions. In synthetic datasets, the cell-type compositions returned by scDC were highly concordant with reference cell-type compositions from the original data. Since the majority of datasets tested in this study have only 2 to 5 subjects per condition, the addition of confidence intervals enabled better comparisons of compositional differences between subjects and across conditions. CONCLUSIONS: scDC is a novel statistical method for performing differential cell-type composition analysis for scRNA-seq data. It uses bootstrap resampling to estimate the standard errors associated with cell-type proportion estimates and performs significance testing through GLM and GLMM models. We have made this method available to the scientific community as part of the scdney (Single Cell Data Integrative Analysis) R package, available from https://github.com/SydneyBioX/scdney.
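The core of the uncertainty estimate is resampling cells with replacement and recomputing cell-type proportions. The sketch below works on one subject with already-assigned labels and uses simple percentile intervals instead of the bias-corrected and accelerated (BCa) intervals that scDC reports; it also omits scDC's GLM/GLMM testing. The function name, `n_boot` and the toy labels are hypothetical.

```python
import random
from collections import Counter


def bootstrap_proportions(cell_types, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CIs for the cell-type proportions of one subject.

    Returns {cell_type: (estimate, lower, upper)}.
    """
    rng = random.Random(seed)
    n = len(cell_types)
    types = sorted(set(cell_types))
    draws = {t: [] for t in types}
    for _ in range(n_boot):
        counts = Counter(rng.choices(cell_types, k=n))  # resample cells with replacement
        for t in types:
            draws[t].append(counts[t] / n)
    observed = Counter(cell_types)
    result = {}
    for t in types:
        d = sorted(draws[t])
        lower = d[round((alpha / 2) * (n_boot - 1))]
        upper = d[round((1 - alpha / 2) * (n_boot - 1))]
        result[t] = (observed[t] / n, lower, upper)
    return result


cells = ["T"] * 60 + ["B"] * 30 + ["NK"] * 10   # hypothetical labels for one subject
ci = bootstrap_proportions(cells)
```

With only 100 cells the interval for the 60% T-cell proportion spans roughly plus or minus ten percentage points, which is the kind of sampling uncertainty the abstract argues should accompany composition comparisons.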

Subject(s)

Single-Cell Analysis/methods , Humans

14.

Differential distribution improves gene selection stability and has competitive classification performance for patient survival.

Strbenac, Dario; Mann, Graham J; Yang, Jean Y H; Ormerod, John T.

Nucleic Acids Res ; 44(13): e119, 2016 07 27.

Article in English | MEDLINE | ID: mdl-27190235

ABSTRACT

A consistent difference in average expression level, often referred to as differential expression (DE), has long been used to identify genes useful for classification. However, recent cancer studies have shown that when transcription factors or epigenetic signals become deregulated, a change in expression variability (DV) of target genes is frequently observed. This suggests that assessing the importance of genes by either differential expression or variability alone potentially misses sets of important biomarkers that could lead to improved predictions and treatments. Here, we describe a new approach for assessing the importance of genes based on differential distribution (DD), which combines information from differential expression and differential variability into a unified metric. We show that feature ranking and selection stability based on DD can be two to three times better than with DE or DV alone, and that DD yields error rates equivalent to those of DE and DV. Finally, assessing genes via differential distribution produces a set of selected genes complementary to those from DE and DV, potentially opening up new categories of biomarkers.
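One concrete way to fold a mean shift (DE) and a variance change (DV) into a single number is a divergence between the two groups' fitted distributions. The sketch below, assuming each group is summarised by a normal distribution, uses the symmetric Kullback-Leibler divergence; it is an illustrative choice of DD score, not necessarily the metric defined in the paper, and the function names and data are hypothetical.

```python
import math
import statistics


def gaussian_kl(m1, s1, m2, s2):
    """KL divergence KL(N(m1, s1^2) || N(m2, s2^2))."""
    return math.log(s2 / s1) + (s1**2 + (m1 - m2) ** 2) / (2 * s2**2) - 0.5


def dd_score(group_a, group_b):
    """Symmetric KL between normals fitted to two groups' expression values.

    Grows with a difference in means (DE), in spreads (DV), or both.
    """
    m1, s1 = statistics.mean(group_a), statistics.stdev(group_a)
    m2, s2 = statistics.mean(group_b), statistics.stdev(group_b)
    return gaussian_kl(m1, s1, m2, s2) + gaussian_kl(m2, s2, m1, s1)


base = [4.0, 5.0, 6.0, 5.0, 4.5, 5.5]
shifted = [x + 2 for x in base]            # same spread, higher mean (DE only)
spread = [5 + 3 * (x - 5) for x in base]   # same mean, wider spread (DV only)
```

A pure mean test would miss the `spread` gene and a pure variance test would miss the `shifted` gene, whereas this score is positive for both.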

Subject(s)

Biomarkers, Tumor/genetics , Gene Expression Regulation, Neoplastic/genetics , Melanoma/genetics , Oligonucleotide Array Sequence Analysis/methods , Adenocarcinoma/genetics , Adenocarcinoma/pathology , Adenocarcinoma of Lung , Algorithms , Biomarkers, Tumor/biosynthesis , Female , Gene Expression Profiling/methods , Humans , Lung Neoplasms/genetics , Lung Neoplasms/pathology , Melanoma/pathology , Ovarian Neoplasms/genetics , Ovarian Neoplasms/pathology

15.

Quantitative Performance Evaluator for Proteomics (QPEP): Web-based Application for Reproducible Evaluation of Proteomics Preprocessing Methods.

Strbenac, Dario; Zhong, Ling; Raftery, Mark J; Wang, Penghao; Wilson, Susan R; Armstrong, Nicola J; Yang, Jean Y H.

J Proteome Res ; 16(7): 2359-2369, 2017 07 07.

Article in English | MEDLINE | ID: mdl-28580786

ABSTRACT

Tandem mass spectrometry is one of the most popular techniques for quantitation of proteomes. There exists a large variety of options in each stage of data preprocessing that impact the bias and variance of the summarized protein-level values. Using a newly released data set satisfying a replicated Latin squares design, a diverse set of performance metrics has been developed and implemented in a web-based application, Quantitative Performance Evaluator for Proteomics (QPEP). QPEP has the flexibility to allow users to apply their own method to preprocess this data set and share the results, allowing direct and straightforward comparison of new methodologies. Application of these new metrics to three case studies highlights that (i) the summarization of peptides to proteins is robust to the choice of peptide summary used, (ii) the differences between iTRAQ labels are stronger than the differences between experimental runs, and (iii) the commercial software ProteinPilot performs between-sample normalization as well as more complicated methods developed by academics. Importantly, finding (ii) underscores the benefits of using the principles of randomization and blocking to avoid the experimental measurements being confounded by technical factors. Data are available via ProteomeXchange with identifier PXD003608.

Subject(s)

Peptides/analysis , Proteome/analysis , Proteomics/statistics & numerical data , Saccharomyces cerevisiae Proteins/isolation & purification , Software , Tandem Mass Spectrometry/standards , Benchmarking , Internet , Reproducibility of Results , Saccharomyces cerevisiae/chemistry

16.

Single-cell RNA-Seq analysis reveals dynamic trajectories during mouse liver development.

Su, Xianbin; Shi, Yi; Zou, Xin; Lu, Zhao-Ning; Xie, Gangcai; Yang, Jean Y H; Wu, Chong-Chao; Cui, Xiao-Fang; He, Kun-Yan; Luo, Qing; Qu, Yu-Lan; Wang, Na; Wang, Lan; Han, Ze-Guang.

BMC Genomics ; 18(1): 946, 2017 Dec 04.

Article in English | MEDLINE | ID: mdl-29202695

ABSTRACT

BACKGROUND: The differentiation and maturation trajectories of fetal liver stem/progenitor cells (LSPCs) are not fully understood at single-cell resolution, and a priori knowledge of limited biomarkers could restrict trajectory tracking. RESULTS: We employed marker-free single-cell RNA-Seq to characterize comprehensive transcriptional profiles of 507 cells randomly selected from seven stages between embryonic day 11.5 and postnatal day 2.5 during mouse liver development, and also 52 Epcam-positive cholangiocytes from postnatal day 3.25 mouse livers. LSPCs in developing mouse livers were identified via marker-free transcriptomic profiling. Single-cell resolution dynamic developmental trajectories of LSPCs exhibited contiguous but discrete genetic control through transcription factors and signaling pathways. The gene expression profiles of cholangiocytes were closer to those of embryonic day 11.5 LSPCs than to those of later-stage LSPCs, hinting at the stage at which LSPC fate is decided. Our marker-free approach also allows systematic assessment and prediction of isolation biomarkers for LSPCs. CONCLUSIONS: Our data provide not only a valuable resource but also novel insights into the fate decision and transcriptional control of self-renewal, differentiation and maturation of LSPCs.

Subject(s)

Embryonic Stem Cells/metabolism , Gene Expression Profiling/methods , Gene Expression Regulation, Developmental , Liver/metabolism , Sequence Analysis, RNA/methods , Single-Cell Analysis/methods , Animals , Biomarkers/metabolism , Cells, Cultured , Embryonic Stem Cells/cytology , Liver/embryology , Mice , Mice, Inbred C57BL

17.

Inferring data-specific micro-RNA function through the joint ranking of micro-RNA and pathways from matched micro-RNA and gene expression data.

Patrick, Ellis; Buckley, Michael; Müller, Samuel; Lin, David M; Yang, Jean Y H.

Bioinformatics ; 31(17): 2822-8, 2015 Sep 01.

Article in English | MEDLINE | ID: mdl-25910695

ABSTRACT

MOTIVATION: In practice, identifying and interpreting the functional impacts of the regulatory relationships between micro-RNA and messenger-RNA is non-trivial. The sheer scale of possible micro-RNA and messenger-RNA interactions can make the interpretation of results difficult. RESULTS: We propose a supervised framework, pMim, built upon concepts of significance combination, for jointly ranking regulatory micro-RNA and their potential functional impacts with respect to a condition of interest. Here, pMim directly tests whether a micro-RNA is differentially expressed and whether its predicted targets, which lie in a common biological pathway, have changed in the opposite direction. We leverage the information within existing micro-RNA target and pathway databases to stabilize the estimation and annotation of micro-RNA regulation, making our approach suitable for datasets with small sample sizes. In addition to outputting meaningful and interpretable results, we demonstrate in a variety of datasets that the micro-RNA identified by pMim, in comparison to simpler existing approaches, are also more concordant with what is described in the literature. AVAILABILITY AND IMPLEMENTATION: This framework is implemented as an R function, pMim, in the package sydSeq available from http://www.ellispatrick.com/r-packages. CONTACT: jean.yang@sydney.edu.au SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subject(s)

Algorithms , Computational Biology/methods , Gene Regulatory Networks , MicroRNAs/metabolism , RNA, Messenger/metabolism , Software , Databases, Factual , Gene Expression Profiling/methods , Gene Expression Regulation , Humans , MicroRNAs/genetics , RNA, Messenger/genetics

18.

ClassifyR: an R package for performance assessment of classification with applications to transcriptomics.

Strbenac, Dario; Mann, Graham J; Ormerod, John T; Yang, Jean Y H.

Bioinformatics ; 31(11): 1851-3, 2015 Jun 01.

Article in English | MEDLINE | ID: mdl-25644269

ABSTRACT

UNLABELLED: Although a large collection of classification software packages exists in R, a new generic framework for linking custom classification functions with classification performance measures is needed. A generic classification framework has been designed and implemented as an R package in an object-oriented style. Its design places emphasis on parallel processing, reproducibility and extensibility. Finally, a comprehensive set of performance measures is available to ease post-processing. Taken together, these important characteristics enable rapid and reproducible benchmarking of alternative classifiers. AVAILABILITY AND IMPLEMENTATION: ClassifyR is implemented in R and can be obtained from the Bioconductor project: http://bioconductor.org/packages/release/bioc/html/ClassifyR.html.

Subject(s)

Gene Expression Profiling , Software , Classification/methods , Humans

19.

Authors' Reply.

Wong, Germaine; Hope, Richard L; Howard, Kirsten; Chapman, Jeremy R; Castells, Antoni; Roger, Simon D; Bourke, Michael J; Macaskill, Petra; Turner, Robin; Williams, Gabrielle; Lim, Wai H; Lok, Charmaine E; Diekman, Fritz; Cross, Nicholas; Sen, Shaundeep; Allen, Richard D M; Chadban, Steven J; Pollock, Carol A; Tong, Allison; Teixeira-Pinto, Armando; Yang, Jean Y H; Williams, Narelle; Au, Eric; Kieu, Anh; James, Laura; Craig, Jonathan C.

J Am Soc Nephrol ; 30(11): 2276-2277, 2019 11.

Article in English | MEDLINE | ID: mdl-31597717

Subject(s)

Colorectal Neoplasms , Kidney Failure, Chronic , Kidney Transplantation , Humans , Mass Screening

20.

BIDCell: Biologically-informed self-supervised learning for segmentation of subcellular spatial transcriptomics data.

Fu, Xiaohang; Lin, Yingxin; Lin, David M; Mechtersheimer, Daniel; Wang, Chuhan; Ameen, Farhan; Ghazanfar, Shila; Patrick, Ellis; Kim, Jinman; Yang, Jean Y H.

Nat Commun ; 15(1): 509, 2024 Jan 13.

Article in English | MEDLINE | ID: mdl-38218939

ABSTRACT

Recent advances in subcellular imaging transcriptomics platforms have enabled high-resolution spatial mapping of gene expression, while also introducing significant analytical challenges in accurately identifying cells and assigning transcripts. Existing methods grapple with cell segmentation, frequently leading to fragmented cells or oversized cells that capture contaminated expression. To this end, we present BIDCell, a self-supervised deep learning-based framework with biologically-informed loss functions that learn relationships between spatially resolved gene expression and cell morphology. BIDCell incorporates cell-type data, including single-cell transcriptomics data from public repositories, with cell morphology information. Using a comprehensive evaluation framework consisting of metrics in five complementary categories for cell segmentation performance, we demonstrate that BIDCell outperforms other state-of-the-art methods according to many metrics across a variety of tissue types and technology platforms. Our findings underscore the potential of BIDCell to significantly enhance single-cell spatial expression analyses, with great potential for biological discovery.

Subject(s)

Benchmarking , Gene Expression Profiling , Erythrocytes, Abnormal , Histocompatibility Testing , Supervised Machine Learning
