Search | VHL Search Portal

1.

Mian: interactive web-based microbiome data table visualization and machine learning platform.

Jin, Boyang Tom; Xu, Feng; Ng, Raymond T; Hogg, James C.

Bioinformatics ; 38(4): 1176-1178, 2022 01 27.

Article in English | MEDLINE | ID: mdl-34788784

ABSTRACT

SUMMARY: Mian is a web application to interactively visualize, run statistical tools and train machine learning models on operational taxonomic unit (OTU) or amplicon sequence variant (ASV) datasets to identify key taxonomic groups, diversity trends or taxonomic composition shifts in the context of provided categorical or numerical sample metadata. Tools, including Fisher's exact test, Boruta feature selection, alpha and beta diversity, and random forest and deep neural network classifiers, facilitate open-ended data exploration and hypothesis generation on microbial datasets. AVAILABILITY: Mian is freely available at: miandata.org. Mian is an open-source platform licensed under the MIT license with source code available at github.com/tbj128/mian. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subject(s)

Microbiota , Software , Data Visualization , Machine Learning , Internet

2.

Microbial dysbiosis and the host airway epithelial response: insights into HIV-associated COPD using multi'omics profiling.

Jude, Marcia Smiti; Yang, Chen Xi; Filho, Fernando Studart Leitao; Hernandez Cordero, Ana I; Yang, Julia; Shaipanich, Tawimas; Li, Xuan; Lin, David; MacIsaac, Julie; Kobor, Michael S; Sinha, Sunita; Nislow, Corey; Singh, Amrit; Lam, Wan; Lam, Stephen; Guillemi, Silvia; Harris, Marianne; Montaner, Julio; Ng, Raymond T; Carlsten, Christopher; Paul Man, S F; Sin, Don D; Leung, Janice M.

Respir Res ; 24(1): 124, 2023 May 04.

Article in English | MEDLINE | ID: mdl-37143066

ABSTRACT

BACKGROUND: People living with HIV (PLWH) are at increased risk of developing Chronic Obstructive Pulmonary Disease (COPD) independent of cigarette smoking. We hypothesized that dysbiosis in PLWH is associated with epigenetic and transcriptomic disruptions in the airway epithelium. METHODS: Airway epithelial brushings were collected from 18 COPD+HIV+, 16 COPD-HIV+, 22 COPD+HIV- and 20 COPD-HIV- subjects. The microbiome, methylome, and transcriptome were profiled using 16S sequencing, Illumina Infinium Methylation EPIC chip, and RNA sequencing, respectively. Multi-'omic integration was performed using Data Integration Analysis for Biomarker discovery using Latent cOmponents. A correlation > 0.7 was used to identify key interactions between the 'omes. RESULTS: The COPD+HIV-, COPD-HIV+, and COPD+HIV+ groups had reduced Shannon Diversity (p = 0.004, p = 0.023, and p = 5.5e-06, respectively) compared to individuals with neither COPD nor HIV, with the COPD+HIV+ group demonstrating the most reduced diversity. Microbial communities were significantly different between the four groups (p = 0.001). Multi-'omic integration identified correlations between Bacteroidetes Prevotella, genes FUZ, FASTKD3, and ACVR1B, and epigenetic features CpG-FUZ and CpG-PHLDB3. CONCLUSION: PLWH with COPD manifest decreased diversity and altered microbial communities in their airway epithelial microbiome. The reduction in Prevotella in this group was linked with epigenetic and transcriptomic disruptions in host genes including FUZ, FASTKD3, and ACVR1B.
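
The Shannon diversity reported in this abstract is commonly computed as H = -sum(p_i * ln p_i) over the relative abundances p_i of the taxa in a sample. The following is a minimal sketch of that standard formula, not the authors' pipeline; the function name and the example counts are hypothetical.

```python
import math

def shannon_diversity(counts):
    """Shannon index H = -sum(p_i * ln p_i) over taxa with non-zero counts."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must contain at least one positive value")
    # Zero counts contribute nothing to the index, so they are skipped.
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

# Hypothetical per-taxon read counts for two samples (illustrative only).
even_sample = [25, 25, 25, 25]    # four equally abundant taxa
skewed_sample = [97, 1, 1, 1]     # one dominant taxon

print(shannon_diversity(even_sample))
print(shannon_diversity(skewed_sample))
```

A sample dominated by one taxon scores lower than an evenly distributed one, which is the sense in which the abstract describes "reduced" diversity.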

Subject(s)

HIV Infections , Pulmonary Disease, Chronic Obstructive , Humans , Dysbiosis/genetics , Pulmonary Disease, Chronic Obstructive/epidemiology , Pulmonary Disease, Chronic Obstructive/genetics , Gene Expression Profiling , Epithelium , HIV Infections/epidemiology , HIV Infections/genetics

3.

The molecular and cellular mechanisms associated with the destruction of terminal bronchioles in COPD.

Xu, Feng; Vasilescu, Dragos M; Kinose, Daisuke; Tanabe, Naoya; Ng, Kevin W; Coxson, Harvey O; Cooper, Joel D; Hackett, Tillie-Louise; Verleden, Stijn E; Vanaudenaerde, Bart M; Stevenson, Christopher S; Lenburg, Marc E; Spira, Avrum; Tan, Wan C; Sin, Don D; Ng, Raymond T; Hogg, James C.

Eur Respir J ; 59(5)2022 05.

Article in English | MEDLINE | ID: mdl-34675046

ABSTRACT

RATIONALE: Peripheral airway obstruction is a key feature of chronic obstructive pulmonary disease (COPD), but the mechanisms of airway loss are unknown. This study aims to identify the molecular and cellular mechanisms associated with peripheral airway obstruction in COPD. METHODS: Ten explanted lung specimens donated by patients with very severe COPD treated by lung transplantation and five unused donor control lungs were sampled using systematic uniform random sampling (SURS), resulting in 240 samples. These samples were further examined by micro-computed tomography (CT), quantitative histology and gene expression profiling. RESULTS: Micro-CT analysis showed that the loss of terminal bronchioles in COPD occurs in regions of microscopic emphysematous destruction with an average airspace size of ≥500 and <1000 µm, which we have termed a "hot spot". Based on microarray gene expression profiling, the hot spot was associated with an 11-gene signature, with upregulation of pro-inflammatory genes and downregulation of inhibitory immune checkpoint genes, indicating immune response activation. Results from both quantitative histology and the bioinformatics computational tool CIBERSORT, which predicts the percentage of immune cells in tissues from transcriptomic data, showed that the hot spot regions were associated with increased infiltration of CD4 and CD8 T-cell and B-cell lymphocytes. INTERPRETATION: The reduction in terminal bronchioles observed in lungs from patients with COPD occurs in a hot spot of microscopic emphysema, where there is upregulation of IFNG signalling, co-stimulatory immune checkpoint genes and genes related to the inflammasome pathway, and increased infiltration of immune cells. These could be potential targets for therapeutic interventions in COPD.

Subject(s)

Airway Obstruction , Emphysema , Pulmonary Disease, Chronic Obstructive , Pulmonary Emphysema , Bronchioles/pathology , Emphysema/complications , Humans , Pulmonary Disease, Chronic Obstructive/complications , X-Ray Microtomography

4.

Automated medical chart review for breast cancer outcomes research: a novel natural language processing extraction system.

Chen, Yifu; Hao, Lucy; Zou, Vito Z; Hollander, Zsuzsanna; Ng, Raymond T; Isaac, Kathryn V.

BMC Med Res Methodol ; 22(1): 136, 2022 05 12.

Article in English | MEDLINE | ID: mdl-35549854

ABSTRACT

BACKGROUND: Manually extracted data points from health records are collated on an institutional, provincial, and national level to facilitate clinical research. However, the labour-intensive clinical chart review process puts an increasing burden on healthcare system budgets. Therefore, an automated information extraction system is needed to ensure the timeliness and scalability of research data. METHODS: We used a dataset of 100 synoptic operative and 100 pathology reports, evenly split into 50 reports in training and test sets for each report type. The training set guided our development of a Natural Language Processing (NLP) extraction pipeline system, which accepts scanned images of operative and pathology reports. The system uses a combination of rule-based and transfer learning methods to extract numeric encodings from text. We also developed visualization tools to compare the manual and automated extractions. The code for this paper was made available on GitHub. RESULTS: A test set of 50 operative and 50 pathology reports were used to evaluate the extraction accuracies of the NLP pipeline. Gold standard, defined as manual extraction by expert reviewers, yielded accuracies of 90.5% for operative reports and 96.0% for pathology reports, while the NLP system achieved overall 91.9% (operative) and 95.4% (pathology) accuracy. The pipeline successfully extracted outcomes data pertinent to breast cancer tumor characteristics (e.g. presence of invasive carcinoma, size, histologic type), prognostic factors (e.g. number of lymph nodes with micro-metastases and macro-metastases, pathologic stage), and treatment-related variables (e.g. margins, neo-adjuvant treatment, surgical indication) with high accuracy. Out of the 48 variables across operative and pathology codebooks, NLP yielded 43 variables with F-scores of at least 0.90; in comparison, a trained human annotator yielded 44 variables with F-scores of at least 0.90. 
CONCLUSIONS: The NLP system achieves near-human-level accuracy in both operative and pathology reports using a minimal curated dataset. This system uniquely provides a robust solution for transparent, adaptable, and scalable automation of data extraction from patient health records. It may serve to advance breast cancer clinical research by facilitating collection of vast amounts of valuable health data at a population level.

Subject(s)

Breast Neoplasms , Natural Language Processing , Breast Neoplasms/surgery , Electronic Health Records , Female , Humans , Information Storage and Retrieval , Outcome Assessment, Health Care , Research Report

5.

ProbeRating: a recommender system to infer binding profiles for nucleic acid-binding proteins.

Yang, Shu; Liu, Xiaoxi; Ng, Raymond T.

Bioinformatics ; 36(18): 4797-4804, 2020 09 15.

Article in English | MEDLINE | ID: mdl-32573679

ABSTRACT

MOTIVATION: The interaction between proteins and nucleic acids plays a crucial role in gene regulation and cell function. Determining the binding preferences of nucleic acid-binding proteins (NBPs), namely RNA-binding proteins (RBPs) and transcription factors (TFs), is the key to decipher the protein-nucleic acids interaction code. Today, available NBP binding data from in vivo or in vitro experiments are still limited, which leaves a large portion of NBPs uncovered. Unfortunately, existing computational methods that model the NBP binding preferences are mostly protein specific: they need the experimental data for a specific protein in interest, and thus only focus on experimentally characterized NBPs. The binding preferences of experimentally unexplored NBPs remain largely unknown. RESULTS: Here, we introduce ProbeRating, a nucleic acid recommender system that utilizes techniques from deep learning and word embeddings of natural language processing. ProbeRating is developed to predict binding profiles for unexplored or poorly studied NBPs by exploiting their homologs NBPs which currently have available binding data. Requiring only sequence information as input, ProbeRating adapts FastText from Facebook AI Research to extract biological features. It then builds a neural network-based recommender system. We evaluate the performance of ProbeRating on two different tasks: one for RBP and one for TF. As a result, ProbeRating outperforms previous methods on both tasks. The results show that ProbeRating can be a useful tool to study the binding mechanism for the many NBPs that lack direct experimental evidence. AVAILABILITY AND IMPLEMENTATION: The source code is freely available at . SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subject(s)

Nucleic Acids , RNA-Binding Proteins , Binding Sites , Neural Networks, Computer , Protein Binding , RNA-Binding Proteins/metabolism , Software

6.

Analytical Validation of HEARTBiT: A Blood-Based Multiplex Gene Expression Profiling Assay for Exclusionary Diagnosis of Acute Cellular Rejection in Heart Transplant Patients.

Kim, Ji-Young V; Lee, Brandon; Koitsopoulos, Pavlos; Shannon, Casey P; Chen, Virginia; Hollander, Zsuzsanna; Assadian, Sara; Lam, Karen; Ritchie, Gordon; McManus, Janet; McMaster, W Robert; Ng, Raymond T; McManus, Bruce M; Tebbutt, Scott J.

Clin Chem ; 66(8): 1063-1071, 2020 08 01.

Article in English | MEDLINE | ID: mdl-32705124

ABSTRACT

BACKGROUND: HEARTBiT is a whole blood-based gene profiling assay using the nucleic acid counting NanoString technology for the exclusionary diagnosis of acute cellular rejection in heart transplant patients. The HEARTBiT score measures the risk of acute cellular rejection in the first year following heart transplant, distinguishing patients with stable grafts from those at risk for acute cellular rejection. Here, we provide the analytical performance characteristics of the HEARTBiT assay and the results on pilot clinical validation. METHODS: We used purified RNA collected from PAXgene blood samples to evaluate the characteristics of a 12-gene panel HEARTBiT assay, for its linearity range, quantitative bias, precision, and reproducibility. These parameters were estimated either from serial dilutions of individual samples or from repeated runs on pooled samples. RESULTS: We found that all 12 genes showed linear behavior within the recommended assay input range of 125 ng to 500 ng of purified RNA, with most genes showing 3% or lower quantitative bias and around 5% coefficient of variation. Total variation resulting from unique operators, reagent lots, and runs was less than 0.02 units standard deviation (SD). The performance of the analytically validated assay (AUC = 0.75) was equivalent to what we observed in the signature development dataset. CONCLUSION: The analytical performance of the assay within the specification input range demonstrated reliable quantification of the HEARTBiT score within 0.02 SD units, measured on a 0 to 1 unit scale. This assay may therefore be of high utility in clinical validation of HEARTBiT in future biomarker observational trials.

Subject(s)

Gene Expression Profiling/methods , Graft Rejection/diagnosis , Heart Transplantation/adverse effects , RNA/blood , Adult , Biomarkers/blood , Female , Humans , Limit of Detection , Male , Middle Aged , Pilot Projects , Prognosis , Reproducibility of Results

7.

Development and Validation of Apolipoprotein AI-Associated Lipoprotein Proteome Panel for the Prediction of Cholesterol Efflux Capacity and Coronary Artery Disease.

Jin, Zhicheng; Collier, Timothy S; Dai, Darlene L Y; Chen, Virginia; Hollander, Zsuzsanna; Ng, Raymond T; McManus, Bruce M; Balshaw, Robert; Apostolidou, Sophia; Penn, Marc S; Bystrom, Cory.

Clin Chem ; 65(2): 282-290, 2019 02.

Article in English | MEDLINE | ID: mdl-30463841

ABSTRACT

BACKGROUND: Cholesterol efflux capacity (CEC) is a measure of HDL function that, in cell-based studies, has demonstrated an inverse association with cardiovascular disease. The cell-based measure of CEC is complex and low-throughput. We hypothesized that assessment of the lipoprotein proteome would allow for precise, high-throughput CEC prediction. METHODS: After isolating lipoprotein particles from serum, we used LC-MS/MS to quantify 21 lipoprotein-associated proteins. A bioinformatic pipeline was used to identify proteins with univariate correlation to cell-based CEC measurements and generate a multivariate algorithm for CEC prediction (pCE). Using logistic regression, protein coefficients in the pCE model were reweighted to yield a new algorithm predicting coronary artery disease (pCAD). RESULTS: Discovery using targeted LC-MS/MS analysis of 105 training and test samples yielded a pCE model comprising 5 proteins (Spearman r = 0.86). Evaluation of pCE in a case-control study of 231 specimens from healthy individuals and patients with coronary artery disease revealed lower pCE in cases (P = 0.03). Derived within this same study, the pCAD model significantly improved classification (P < 0.0001). Following analytical validation of the multiplexed proteomic method, we conducted a case-control study of myocardial infarction in 137 postmenopausal women that confirmed significant separation of specimen cohorts in both the pCE (P = 0.015) and pCAD (P = 0.001) models. CONCLUSIONS: Development of a proteomic pCE provides a reproducible high-throughput alternative to traditional cell-based CEC assays. The pCAD model improves stratification of case and control cohorts and, with further studies to establish clinical validity, presents a new opportunity for the assessment of cardiovascular health.

Subject(s)

Apolipoprotein A-I/blood , Cholesterol/metabolism , Coronary Artery Disease/pathology , Lipoproteins/blood , Proteome/analysis , Tandem Mass Spectrometry/methods , Area Under Curve , Case-Control Studies , Chromatography, High Pressure Liquid , Coronary Artery Disease/blood , Female , Humans , Limit of Detection , Male , Middle Aged , Myocardial Infarction/blood , Myocardial Infarction/pathology , ROC Curve , Validation Studies as Topic

8.

Effect of short-term oral prednisone therapy on blood gene expression: a randomised controlled clinical trial.

Takiguchi, Hiroto; Chen, Virginia; Obeidat, Ma'en; Hollander, Zsuzsanna; FitzGerald, J Mark; McManus, Bruce M; Ng, Raymond T; Sin, Don D.

Respir Res ; 20(1): 176, 2019 Aug 05.

Article in English | MEDLINE | ID: mdl-31382977

ABSTRACT

BACKGROUND: Effects of systemic corticosteroids on blood gene expression are largely unknown. This study determined gene expression signature associated with short-term oral prednisone therapy in patients with chronic obstructive pulmonary disease (COPD) and its relationship to 1-year mortality following an acute exacerbation of COPD (AECOPD). METHODS: Gene expression in whole blood was profiled using the Affymetrix Human Gene 1.1 ST microarray chips from two cohorts: 1) a prednisone cohort with 37 stable COPD patients randomly assigned to prednisone 30 mg/d + standard therapy for 4 days or standard therapy alone and 2) the Rapid Transition Program (RTP) cohort with 218 COPD patients who experienced AECOPD and were treated with systemic corticosteroids. All gene expression data were adjusted for the total number of white blood cells and their differential cell counts. RESULTS: In the prednisone cohort, 51 genes were differentially expressed between prednisone and standard therapy group at a false discovery rate of < 0.05. The top 3 genes with the largest fold-changes were KLRF1, GZMH and ADGRG1; and 21 genes were significantly enriched in immune system pathways including the natural killer cell mediated cytotoxicity. In the RTP cohort, 27 patients (12.4%) died within 1 year after hospitalisation of AECOPD; 32 of 51 genes differentially expressed in the prednisone cohort significantly changed from AECOPD to the convalescent state and were enriched in similar cellular immune pathways to that in the prednisone cohort. Of these, 10 genes including CX3CR1, KLRD1, S1PR5 and PRF1 were significantly associated with 1-year mortality. CONCLUSIONS: Short-term daily prednisone therapy produces a distinct blood gene signature that may be used to determine and monitor treatment responses to prednisone in COPD patients during AECOPD. 
TRIAL REGISTRATION: The prednisone cohort was registered at ClinicalTrials.gov (NCT02534402) and the RTP cohort was registered at ClinicalTrials.gov (NCT02050022).
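
The abstract reports genes selected at a false discovery rate below 0.05 but does not name the adjustment procedure. As an illustration only, the sketch below implements the widely used Benjamini-Hochberg step-up adjustment; assuming this procedure is a guess on our part, and the function name and p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical raw p-values for four genes (illustrative only).
raw = [0.01, 0.04, 0.03, 0.005]
adjusted = benjamini_hochberg(raw)
significant = [q < 0.05 for q in adjusted]
print(adjusted, significant)
```

Genes whose adjusted value falls below the chosen threshold would be called differentially expressed under this assumed procedure.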

Subject(s)

Glucocorticoids/administration & dosage , Prednisone/administration & dosage , Pulmonary Disease, Chronic Obstructive/blood , Pulmonary Disease, Chronic Obstructive/genetics , Administration, Oral , Aged , Aged, 80 and over , Drug Administration Schedule , Female , Gene Expression , Humans , Male , Middle Aged , Pulmonary Disease, Chronic Obstructive/drug therapy

9.

Inferring RNA sequence preferences for poorly studied RNA-binding proteins based on co-evolution.

Yang, Shu; Wang, Junwen; Ng, Raymond T.

BMC Bioinformatics ; 19(1): 96, 2018 03 12.

Article in English | MEDLINE | ID: mdl-29529991

ABSTRACT

BACKGROUND: Characterizing the binding preference of RNA-binding proteins (RBP) is essential for us to understand the interaction between an RBP and its RNA targets, and to decipher the mechanism of post-transcriptional regulation. Experimental methods have been used to generate protein-RNA binding data for a number of RBPs in vivo and in vitro. Utilizing the binding data, a couple of computational methods have been developed to detect the RNA sequence or structure preferences of the RBPs. However, the majority of RBPs have not yet been experimentally characterized and lack RNA binding data. For these poorly studied RBPs, the identification of their binding preferences cannot be performed by most existing computational methods because the experimental binding data are prerequisite to these methods. RESULTS: Here we propose a new method based on co-evolution to predict the sequence preferences for the poorly studied RBPs, waiving the requirement of their binding data. First, we demonstrate the co-evolutionary relationship between RBPs and their RNA partners. We then present a K-nearest neighbors (KNN) based algorithm to infer the sequence preference of an RBP using only the preference information from its homologous RBPs. By benchmarking against several in vitro and in vivo datasets, our proposed method outperforms the existing alternative which uses the closest neighbor's preference on all the datasets. Moreover, it shows comparable performance with two state-of-the-art methods that require the presence of the experimental binding data. Finally, we demonstrate the usage of this method to infer sequence preferences for novel proteins which have no binding preference information available. CONCLUSION: For a poorly studied RBP, the current methods used to determine its binding preference need experimental data, which is expensive and time consuming. Therefore, determining RBP's preference is not practical in many situations. 
This study provides an economic solution to infer the sequence preference of such protein based on the co-evolution. The source codes and related datasets are available at https://github.com/syang11/KNN.
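
The abstract describes inferring an RBP's sequence preference from the preferences of its homologous RBPs with a K-nearest-neighbors approach. The sketch below shows one generic way such an inference could work: take the K most similar characterized proteins and average their preference vectors, weighted by similarity. The similarity weighting, the function name, and the toy data are assumptions for illustration and are not taken from the paper or its repository.

```python
def knn_infer_preference(similarities, preferences, k=3):
    """Similarity-weighted average of the K nearest neighbors' preference vectors.

    similarities: dict mapping a characterized protein to its similarity with the query
    preferences:  dict mapping the same proteins to equal-length score vectors
    """
    # Pick the K characterized proteins most similar to the query.
    neighbors = sorted(similarities, key=similarities.get, reverse=True)[:k]
    total = sum(similarities[n] for n in neighbors)
    if total <= 0:
        raise ValueError("neighbor similarities must sum to a positive value")
    length = len(preferences[neighbors[0]])
    return [
        sum(similarities[n] * preferences[n][j] for n in neighbors) / total
        for j in range(length)
    ]

# Hypothetical similarities and 2-element preference vectors (illustrative only).
sims = {"A": 0.9, "B": 0.1, "C": 0.05}
prefs = {"A": [1.0, 0.0], "B": [0.0, 1.0], "C": [0.0, 0.0]}
print(knn_infer_preference(sims, prefs, k=2))
```

With k=1 this reduces to copying the closest neighbor's preference, the baseline the abstract says the proposed method outperforms.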

Subject(s)

Algorithms , Evolution, Molecular , RNA-Binding Proteins/chemistry , RNA-Binding Proteins/metabolism , RNA/chemistry , RNA/metabolism , Binding Sites

10.

Integrative Genomics of Emphysema-Associated Genes Reveals Potential Disease Biomarkers.

Obeidat, Ma'en; Nie, Yunlong; Fishbane, Nick; Li, Xuan; Bossé, Yohan; Joubert, Philippe; Nickle, David C; Hao, Ke; Postma, Dirkje S; Timens, Wim; Sze, Marc A; Shannon, Casey P; Hollander, Zsuzsanna; Ng, Raymond T; McManus, Bruce; Miller, Bruce E; Rennard, Stephen; Spira, Avrum; Hackett, Tillie-Louise; Lam, Wan; Lam, Stephen; Faner, Rosa; Agusti, Alvar; Hogg, James C; Sin, Don D; Paré, Peter D.

Am J Respir Cell Mol Biol ; 57(4): 411-418, 2017 10.

Article in English | MEDLINE | ID: mdl-28459279

ABSTRACT

Chronic obstructive pulmonary disease is the third leading cause of death worldwide. Gene expression profiling across multiple regions of the same lung identified genes significantly related to emphysema. We sought to determine whether the lung and epithelial expression of 127 emphysema-related genes was also related to lung function in independent cohorts, and whether any of these genes could be used as biomarkers in the peripheral blood of patients with chronic obstructive pulmonary disease. To that end, we examined whether the expression levels of these genes were under genetic control in lung tissue (n = 1,111). We then determined whether the mRNA levels of these genes in lung tissue (n = 727), small airway epithelial cells (n = 238), and peripheral blood (n = 620) were significantly related to lung function measurements. The expression of 63 of the 127 genes (50%) was under genetic control in lung tissue. The lung and epithelial mRNA expression of a subset of the emphysema-associated genes, including ASRGL1, LPHN2, and EDNRB, was strongly associated with lung function. In peripheral blood, the expression of 40 genes was significantly associated with lung function. Twenty-nine of these genes (73%) were also associated with lung function in lung tissue, but with the opposite direction of effect for 24 of the 29 genes, including those involved in hypoxia and B cell-related responses. The integrative genomics approach uncovered a significant overlap of emphysema genes associations with lung function between lung and blood with opposite directions between the two. These results support the use of peripheral blood to detect disease biomarkers.

Subject(s)

Gene Expression Profiling , Gene Expression Regulation , Genomics , Lung/metabolism , Pulmonary Emphysema/metabolism , RNA, Messenger/biosynthesis , B-Lymphocytes/metabolism , B-Lymphocytes/pathology , Biomarkers/metabolism , Cell Hypoxia , Female , Humans , Lung/pathology , Male , Pulmonary Emphysema/genetics , Pulmonary Emphysema/pathology , RNA, Messenger/genetics

11.

Enumerateblood - an R package to estimate the cellular composition of whole blood from Affymetrix Gene ST gene expression profiles.

Shannon, Casey P; Balshaw, Robert; Chen, Virginia; Hollander, Zsuzsanna; Toma, Mustafa; McManus, Bruce M; FitzGerald, J Mark; Sin, Don D; Ng, Raymond T; Tebbutt, Scott J.

BMC Genomics ; 18(1): 43, 2017 01 06.

Article in English | MEDLINE | ID: mdl-28061752

ABSTRACT

BACKGROUND: Measuring genome-wide changes in transcript abundance in circulating peripheral whole blood is a useful way to study disease pathobiology and may help elucidate the molecular mechanisms of disease, or discovery of useful disease biomarkers. The sensitivity and interpretability of analyses carried out in this complex tissue, however, are significantly affected by its dynamic cellular heterogeneity. It is therefore desirable to quantify this heterogeneity, either to account for it or to better model interactions that may be present between the abundance of certain transcripts, specific cell types and the indication under study. Accurate enumeration of the many component cell types that make up peripheral whole blood can further complicate the sample collection process, however, and result in additional costs. Many approaches have been developed to infer the composition of a sample from high-dimensional transcriptomic and, more recently, epigenetic data. These approaches rely on the availability of isolated expression profiles for the cell types to be enumerated. These profiles are platform-specific, suitable datasets are rare, and generating them is expensive. No such dataset exists on the Affymetrix Gene ST platform. RESULTS: We present 'Enumerateblood', a freely-available and open source R package that exposes a multi-response Gaussian model capable of accurately predicting the composition of peripheral whole blood samples from Affymetrix Gene ST expression profiles, outperforming other current methods when applied to Gene ST data. CONCLUSIONS: 'Enumerateblood' significantly improves our ability to study disease pathobiology from whole blood gene expression assayed on the popular Affymetrix Gene ST platform by allowing a more complete study of the various components of this complex tissue without the need for additional data collection. 
Future use of the model may allow for novel insights to be generated from the ~400 Affymetrix Gene ST blood gene expression datasets currently available on the Gene Expression Omnibus (GEO) website.

Subject(s)

Blood Cells/cytology , Blood Cells/metabolism , Gene Expression Profiling , Genomics/methods , Machine Learning , Humans , Models, Statistical

12.

Network-based analysis reveals novel gene signatures in peripheral blood of patients with chronic obstructive pulmonary disease.

Obeidat, Ma'en; Nie, Yunlong; Chen, Virginia; Shannon, Casey P; Andiappan, Anand Kumar; Lee, Bernett; Rotzschke, Olaf; Castaldi, Peter J; Hersh, Craig P; Fishbane, Nick; Ng, Raymond T; McManus, Bruce; Miller, Bruce E; Rennard, Stephen; Paré, Peter D; Sin, Don D.

Respir Res ; 18(1): 72, 2017 04 24.

Article in English | MEDLINE | ID: mdl-28438154

ABSTRACT

BACKGROUND: Chronic obstructive pulmonary disease (COPD) is currently the third leading cause of death and there is a huge unmet clinical need to identify disease biomarkers in peripheral blood. Compared to gene level differential expression approaches to identify gene signatures, network analyses provide a biologically intuitive approach which leverages the co-expression patterns in the transcriptome to identify modules of co-expressed genes. METHODS: A weighted gene co-expression network analysis (WGCNA) was applied to peripheral blood transcriptome from 238 COPD subjects to discover co-expressed gene modules. We then determined the relationship between these modules and forced expiratory volume in 1 s (FEV1). In a second, independent cohort of 381 subjects, we determined the preservation of these modules and their relationship with FEV1. For those modules that were significantly related to FEV1, we determined the biological processes as well as the blood cell-specific gene expression that were over-represented using additional external datasets. RESULTS: Using WGCNA, we identified 17 modules of co-expressed genes in the discovery cohort. Three of these modules were significantly correlated with FEV1 (FDR < 0.1). In the replication cohort, these modules were highly preserved and their FEV1 associations were reproducible (P < 0.05). Two of the three modules were negatively related to FEV1 and were enriched in IL8 and IL10 pathways and correlated with neutrophil-specific gene expression. The positively related module, on the other hand, was enriched in DNA transcription and translation and was strongly correlated to CD4+, CD8+ T cell-specific gene expression. CONCLUSIONS: Network based approaches are promising tools to identify potential biomarkers for COPD. TRIAL REGISTRATION: The ECLIPSE study was funded by GlaxoSmithKline, under ClinicalTrials.gov identifier NCT00292552 and GSK No. SCO104960.

Subject(s)

Cytokines/blood , Cytokines/genetics , Gene Expression Profiling/methods , Metabolic Networks and Pathways/genetics , Models, Genetic , Pulmonary Disease, Chronic Obstructive/blood , Pulmonary Disease, Chronic Obstructive/genetics , Adult , Aged , Biomarkers/blood , Computer Simulation , Female , Humans , Male , Middle Aged , Pulmonary Disease, Chronic Obstructive/diagnosis , Reproducibility of Results , Sensitivity and Specificity

13.

SABRE: a method for assessing the stability of gene modules in complex tissues and subject populations.

Shannon, Casey P; Chen, Virginia; Takhar, Mandeep; Hollander, Zsuzsanna; Balshaw, Robert; McManus, Bruce M; Tebbutt, Scott J; Sin, Don D; Ng, Raymond T.

BMC Bioinformatics ; 17(1): 460, 2016 Nov 14.

Article in English | MEDLINE | ID: mdl-27842512

ABSTRACT

BACKGROUND: Gene network inference (GNI) algorithms can be used to identify sets of coordinately expressed genes, termed network modules from whole transcriptome gene expression data. The identification of such modules has become a popular approach to systems biology, with important applications in translational research. Although diverse computational and statistical approaches have been devised to identify such modules, their performance behavior is still not fully understood, particularly in complex human tissues. Given human heterogeneity, one important question is how the outputs of these computational methods are sensitive to the input sample set, or stability. A related question is how this sensitivity depends on the size of the sample set. We describe here the SABRE (Similarity Across Bootstrap RE-sampling) procedure for assessing the stability of gene network modules using a re-sampling strategy, introduce a novel criterion for identifying stable modules, and demonstrate the utility of this approach in a clinically-relevant cohort, using two different gene network module discovery algorithms. RESULTS: The stability of modules increased as sample size increased and stable modules were more likely to be replicated in larger sets of samples. Random modules derived from permutated gene expression data were consistently unstable, as assessed by SABRE, and provide a useful baseline value for our proposed stability criterion. Gene module sets identified by different algorithms varied with respect to their stability, as assessed by SABRE. Finally, stable modules were more readily annotated in various curated gene set databases. CONCLUSIONS: The SABRE procedure and proposed stability criterion may provide guidance when designing systems biology studies in complex human disease and tissues.
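
The re-sampling idea behind SABRE can be sketched generically: rerun module discovery on bootstrap resamples of the subjects and score how well each original module is recovered. The abstract does not specify the similarity measure or the stability criterion, so the Jaccard index, the best-match scoring, and all names below are assumptions for illustration rather than the published procedure.

```python
import random

def jaccard(a, b):
    """Jaccard similarity between two gene sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def bootstrap_stability(samples, find_modules, n_boot=100, seed=0):
    """Mean best-match Jaccard similarity of each reference module across resamples.

    find_modules is a hypothetical callable: list of samples -> list of gene sets.
    """
    rng = random.Random(seed)
    reference = find_modules(samples)
    totals = [0.0] * len(reference)
    for _ in range(n_boot):
        # Resample subjects with replacement, keeping the original sample size.
        resampled = [rng.choice(samples) for _ in samples]
        boot_modules = find_modules(resampled)
        for i, module in enumerate(reference):
            totals[i] += max((jaccard(module, b) for b in boot_modules), default=0.0)
    return [t / n_boot for t in totals]
```

A module with a score near 1 is recovered almost unchanged in every resample, while a score near 0 suggests it depends heavily on which subjects happened to be included.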

Subject(s)

Computational Biology/methods , Gene Regulatory Networks , Algorithms , Gene Expression Profiling , Humans , Software , Systems Biology , Transcriptome
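
The re-sampling idea described in the abstract above can be illustrated with a minimal sketch. It is not the published SABRE procedure: the use of Jaccard similarity, the best-match averaging, and the names `jaccard`, `bootstrap_stability` and `find_modules` are all assumptions made for illustration, and the abstract does not state the stability criterion the authors propose.

```python
import random


def jaccard(a, b):
    """Jaccard similarity between two collections of gene identifiers."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def bootstrap_stability(samples, find_modules, reference, n_boot=100, seed=0):
    """Average best-match similarity of each reference module across bootstrap re-samples.

    samples      -- list of sample records (any objects)
    find_modules -- hypothetical callable: list of samples -> list of gene sets
    reference    -- modules found on the full sample set
    """
    rng = random.Random(seed)
    totals = [0.0] * len(reference)
    for _ in range(n_boot):
        # Draw a bootstrap re-sample: same size as the input, with replacement.
        resample = [rng.choice(samples) for _ in samples]
        modules = find_modules(resample)
        for i, ref in enumerate(reference):
            # Score each reference module by its closest match in this re-sample.
            totals[i] += max((jaccard(ref, m) for m in modules), default=0.0)
    return [t / n_boot for t in totals]
```

Under these assumptions, a module whose score stays near 1 is recovered in almost every re-sample, while modules derived from permuted data would be expected to score much lower and could serve as a baseline.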

14.

Discovery of novel plasma protein biomarkers to predict imminent cystic fibrosis pulmonary exacerbations using multiple reaction monitoring mass spectrometry.

Quon, Bradley S; Dai, Darlene L Y; Hollander, Zsuzsanna; Ng, Raymond T; Tebbutt, Scott J; Man, S F Paul; Wilcox, Pearce G; Sin, Don D.

Thorax ; 71(3): 216-22, 2016 Mar.

Article in English | MEDLINE | ID: mdl-25777587

ABSTRACT

BACKGROUND: Despite the significant morbidity and mortality related to pulmonary exacerbations in cystic fibrosis (CF), there remains no reliable predictor of imminent exacerbation. OBJECTIVE: To identify blood-based biomarkers to predict imminent (<4 months from stable blood draw) CF pulmonary exacerbations using targeted proteomics. METHODS: 104 subjects provided plasma samples when clinically stable and were randomly split into discovery (n=70) and replication (n=34) cohorts. Multiple reaction monitoring mass spectrometry (MRM-MS) was used to measure 117 peptides (79 proteins) from plasma. Plasma proteins with differential abundance between subjects who did versus did not develop an imminent exacerbation were analysed and proteins with fold difference >1.5 between the groups were included in an MRM-MS classifier model to predict imminent exacerbations. Performance characteristics were compared with clinical predictors and candidate plasma protein biomarkers. RESULTS: Six proteins were included in the final MRM-MS protein panel. The area under the curve (AUC) for the prediction of imminent exacerbations was highest for the MRM-MS protein panel (AUC 0.74) in comparison to FEV1% predicted (AUC 0.55) and the top candidate plasma protein biomarkers, including C-reactive protein (AUC 0.61) and interleukin-6 (AUC 0.60). The MRM-MS protein panel performed similarly in the replication cohort (AUC 0.73). CONCLUSIONS: Using MRM-MS, a six-protein panel measured from plasma can distinguish individuals with versus without an imminent exacerbation. With further replication and assay development, this biomarker panel may be clinically applicable for prediction of exacerbations in individuals with CF.

Subject(s)

Biomarkers/blood , Blood Proteins/analysis , Cystic Fibrosis/blood , Mass Spectrometry/methods , Monitoring, Physiologic/methods , Proteomics/methods , Adult , Disease Progression , Female , Follow-Up Studies , Humans , Male , Retrospective Studies , Time Factors
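
The AUC values quoted in the abstract above can be read as the probability that a randomly chosen subject with an imminent exacerbation receives a higher score than a randomly chosen subject without one. The sketch below computes AUC that way on toy data; the function name `roc_auc` and the example scores are illustrative assumptions, not the study's classifier or data.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    # Compare every positive score with every negative score.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example: 1 = imminent exacerbation, 0 = none (made-up scores).
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(labels, scores))  # 3 of 4 pairs ranked correctly -> 0.75
```

On this reading, an AUC of 0.5 corresponds to chance-level ranking, which is why the FEV1% predicted value of 0.55 indicates little discriminative ability.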

15.

The Effect of Different Case Definitions of Current Smoking on the Discovery of Smoking-Related Blood Gene Expression Signatures in Chronic Obstructive Pulmonary Disease.

Obeidat, Ma'en; Ding, Xiaoting; Fishbane, Nick; Hollander, Zsuzsanna; Ng, Raymond T; McManus, Bruce; Tebbutt, Scott J; Miller, Bruce E; Rennard, Stephen; Paré, Peter D; Sin, Don D.

Nicotine Tob Res ; 18(9): 1903-9, 2016 09.

Article in English | MEDLINE | ID: mdl-27154971

ABSTRACT

INTRODUCTION: Smoking is the number one modifiable environmental risk factor for chronic obstructive pulmonary disease (COPD). Clinical, epidemiological and increasingly "omics" studies assess or adjust for current smoking status using only self-report, which may be inaccurate. Objective measures such as exhaled carbon monoxide (eCO) may also be problematic owing to limitations in the measurements and the relatively short half-life of the molecule. In this study, we determined the impact of different case definitions of current cigarette smoking on gene expression in peripheral blood of patients with COPD. METHODS: Peripheral blood gene expression from 573 former and current smokers with COPD in the ECLIPSE study was used to find genes whose expression was associated with smoking status. Current smoking was defined using self-report, eCO concentrations, or both. Linear regression was used to determine the association of current smoking status with gene expression adjusting for age, sex and propensity score. Pathway enrichment analyses were performed on genes with P < .001. RESULT: Using self-report or eCO, only two genes were differentially expressed between current and ex-smokers, with no enrichment in biological processes. When current smoking was defined using both eCO and self-report, four genes were differentially expressed (LRRN3, PID1, FUCA1, GPR15) with enrichment in 40 biological pathways related to metabolic processes, response to hypoxia and hormonal stimulus. Additionally, the combined definition provided better distributions of test statistics for differential gene expression. CONCLUSION: A combined phenotype of eCO and self-report allows for better discovery of genes and pathways related to current smoking. IMPLICATIONS: Studies relying only on self-report of smoking status to assess or adjust for the impact of smoking may not fully capture its effect and will lead to residual confounding of results.

Subject(s)

Pulmonary Disease, Chronic Obstructive/etiology , Self Report , Smoking/genetics , Adult , Aged , Carbon Monoxide/analysis , Carrier Proteins/genetics , Female , Gene Expression , Humans , Male , Membrane Glycoproteins , Membrane Proteins/genetics , Middle Aged , Neoplasm Proteins/genetics , Phenotype , Receptors, G-Protein-Coupled/genetics , Receptors, Peptide/genetics , Risk Factors , Smoking/adverse effects , Smoking/blood , Transcriptome , alpha-L-Fucosidase/genetics

16.

Biomarker Development for Chronic Obstructive Pulmonary Disease. From Discovery to Clinical Implementation.

Sin, Don D; Hollander, Zsuzsanna; DeMarco, Mari L; McManus, Bruce M; Ng, Raymond T.

Am J Respir Crit Care Med ; 192(10): 1162-70, 2015 Nov 15.

Article in English | MEDLINE | ID: mdl-26176936

ABSTRACT

Chronic obstructive pulmonary disease (COPD) is one of the major causes of morbidity and mortality in the world. Regrettably, there are no biomarkers to objectively diagnose COPD exacerbations, which are the major drivers of hospitalization and deaths from COPD. Moreover, there are no biomarkers to guide therapeutic choices or to risk stratify patients for imminent exacerbations and no objective biomarkers of disease activity or disease progression. Although there has been a tremendous investment in COPD biomarker discovery over the past 2 decades, clinical translation and implementation have not matched these efforts. In this article, we outline the challenges of biomarker development in COPD and provide an overview of a developmental pipeline that may be able to surmount these challenges and bring novel biomarker solutions to accelerate therapeutic discoveries and to improve the care and outcomes of the millions of individuals worldwide with COPD.

Subject(s)

Genetic Markers , Precision Medicine/methods , Pulmonary Disease, Chronic Obstructive/genetics , Disease Progression , Gene Expression Profiling , Humans , Metabolomics/methods , Prognosis , Proteomics/methods , Pulmonary Disease, Chronic Obstructive/drug therapy , Pulmonary Disease, Chronic Obstructive/physiopathology , Risk Assessment/methods

17.

Computational biomarker pipeline from discovery to clinical implementation: plasma proteomic biomarkers for cardiac transplantation.

Cohen Freue, Gabriela V; Meredith, Anna; Smith, Derek; Bergman, Axel; Sasaki, Mayu; Lam, Karen K Y; Hollander, Zsuzsanna; Opushneva, Nina; Takhar, Mandeep; Lin, David; Wilson-McManus, Janet; Balshaw, Robert; Keown, Paul A; Borchers, Christoph H; McManus, Bruce; Ng, Raymond T; McMaster, W Robert.

PLoS Comput Biol ; 9(4): e1002963, 2013 Apr.

Article in English | MEDLINE | ID: mdl-23592955

ABSTRACT

Recent technical advances in the field of quantitative proteomics have stimulated a large number of biomarker discovery studies of various diseases, providing avenues for new treatments and diagnostics. However, inherent challenges have limited the successful translation of candidate biomarkers into clinical use, thus highlighting the need for a robust analytical methodology to transition from biomarker discovery to clinical implementation. We have developed an end-to-end computational proteomic pipeline for biomarker studies. At the discovery stage, the pipeline emphasizes different aspects of experimental design, appropriate statistical methodologies, and quality assessment of results. At the validation stage, the pipeline focuses on the migration of the results to a platform appropriate for external validation, and the development of a classifier score based on corroborated protein biomarkers. At the last stage towards clinical implementation, the main aims are to develop and validate an assay suitable for clinical deployment, and to calibrate the biomarker classifier using the developed assay. The proposed pipeline was applied to a biomarker study in cardiac transplantation aimed at developing a minimally invasive clinical test to monitor acute rejection. Starting with an untargeted screening of the human plasma proteome, five candidate biomarker proteins were identified. Rejection-regulated proteins reflect cellular and humoral immune responses, acute phase inflammatory pathways, and lipid metabolism biological processes. A multiplex multiple reaction monitoring mass-spectrometry (MRM-MS) assay was developed for the five candidate biomarkers and validated by enzyme-linked immunosorbent assay (ELISA) and immunonephelometric assays (INA). A classifier score based on corroborated proteins demonstrated that the developed MRM-MS assay provides an appropriate methodology for an external validation, which is still in progress. Plasma proteomic biomarkers of acute cardiac rejection may offer a relevant post-transplant monitoring tool to effectively guide clinical care. The proposed computational pipeline is highly applicable to a wide range of biomarker proteomic studies.

Subject(s)

Biomarkers/analysis , Blood Proteins/analysis , Computational Biology/methods , Heart Transplantation , Proteomics/methods , Calibration , Cohort Studies , Enzyme-Linked Immunosorbent Assay , Graft Rejection , Heart Failure/therapy , Humans , Inflammation , Mass Spectrometry , Proteome/analysis

18.

Predicting which patients with cancer will see a psychiatrist or counsellor from their initial oncology consultation document using natural language processing.

Nunez, John-Jose; Leung, Bonnie; Ho, Cheryl; Ng, Raymond T; Bates, Alan T.

Commun Med (Lond) ; 4(1): 69, 2024 Apr 08.

Article in English | MEDLINE | ID: mdl-38589545

ABSTRACT

BACKGROUND: Patients with cancer often have unmet psychosocial needs. Early detection of who requires referral to a counsellor or psychiatrist may improve their care. This work used natural language processing to predict which patients will see a counsellor or psychiatrist from a patient's initial oncology consultation document. We believe this is the first use of artificial intelligence to predict psychiatric outcomes from non-psychiatric medical documents. METHODS: This retrospective prognostic study used data from 47,625 patients at BC Cancer. We analyzed initial oncology consultation documents using traditional and neural language models to predict whether patients would see a counsellor or psychiatrist in the 12 months following their initial oncology consultation. RESULTS: Here, we show our best models achieved a balanced accuracy (receiver-operating-characteristic area-under-curve) of 73.1% (0.824) for predicting seeing a psychiatrist, and 71.0% (0.784) for seeing a counsellor. Different words and phrases are important for predicting each outcome. CONCLUSION: These results suggest natural language processing can be used to predict psychosocial needs of patients with cancer from their initial oncology consultation document. Future research could extend this work to predict the psychosocial needs of medical patients in other settings.

Patients with cancer often need support for their mental health. Early detection of who requires referral to a counsellor or psychiatrist may improve their care. This study trained a type of artificial intelligence (AI) called natural language processing to read the consultation report an oncologist writes after they first see a patient to predict which patients will see a counsellor or psychiatrist. The AI predicted this with performance similar to other uses of AI in mental health, and used different words and phrases to predict who would see a psychiatrist compared to seeing a counsellor. We believe this is the first use of AI to predict mental health outcomes from medical documents written by clinicians outside of mental health. This study suggests this type of AI can predict the mental health needs of patients with cancer from this widely-available document.

19.

Pairwise network mechanisms in the host signaling response to coxsackievirus B3 infection.

Garmaroudi, Farshid S; Marchant, David; Si, Xiaoning; Khalili, Abbas; Bashashati, Ali; Wong, Brian W; Tabet, Aline; Ng, Raymond T; Murphy, Kevin; Luo, Honglin; Janes, Kevin A; McManus, Bruce M.

Proc Natl Acad Sci U S A ; 107(39): 17053-8, 2010 Sep 28.

Article in English | MEDLINE | ID: mdl-20833815

ABSTRACT

Signal transduction networks can be perturbed biochemically, genetically, and pharmacologically to unravel their functions. But at the systems level, it is not clear how such perturbations are best implemented to extract molecular mechanisms that underlie network function. Here, we combined pairwise perturbations with multiparameter phosphorylation measurements to reveal causal mechanisms within the signaling network response of cardiomyocytes to coxsackievirus B3 (CVB3) infection. Using all possible pairs of six kinase inhibitors, we assembled a dynamic nine-protein phosphorylation signature of perturbed CVB3 infectivity. Cluster analysis of the resulting dataset showed repeatedly that paired inhibitor data were required for accurate data-driven predictions of kinase substrate links in the host network. With pairwise data, we also derived a high-confidence network based on partial correlations, which identified phospho-IκBα as a central "hub" in the measured phosphorylation signature. The reconstructed network helped to connect phospho-IκBα with an autocrine feedback circuit in host cells involving the proinflammatory cytokines, TNF and IL-1. Autocrine blockade substantially inhibited CVB3 progeny release and improved host cell viability, implicating TNF and IL-1 as cell autonomous components of CVB3-induced myocardial damage. We conclude that pairwise perturbations, when combined with network-level intracellular measurements, enrich for mechanisms that would be overlooked by single perturbants.

Subject(s)

Enterovirus B, Human , Enterovirus Infections/metabolism , Host-Pathogen Interactions , Metabolic Networks and Pathways , Myocytes, Cardiac/virology , Cell Line , Humans , Interleukin-1/metabolism , Myocytes, Cardiac/drug effects , Myocytes, Cardiac/metabolism , Phosphorylation , Protein Kinase Inhibitors/pharmacology , Signal Transduction , Tumor Necrosis Factor-alpha/metabolism
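
The abstract above mentions a network derived from partial correlations. As a minimal sketch of that general idea (not the authors' method or data), the first-order partial correlation between two variables, controlling for a third, can be computed from the pairwise Pearson correlations. The function names and toy values are illustrative assumptions.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(x, y, z):
    """Correlation of x and y after removing the linear effect of z from both."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


# Toy data: x and y are each z plus noise, and the two noise terms are unrelated.
z = [0, 1, 2, 3]
x = [1, 0, 1, 4]
y = [1, -2, 5, 2]
print(pearson(x, y))          # positive, because both follow z
print(partial_corr(x, y, z))  # close to zero once z is accounted for
```

In this toy case the raw correlation between x and y arises entirely from their shared dependence on z, so an edge between them would be dropped from a partial-correlation network while the edges to z would remain.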

20.

Predicting the Survival of Patients With Cancer From Their Initial Oncology Consultation Document Using Natural Language Processing.

Nunez, John-Jose; Leung, Bonnie; Ho, Cheryl; Bates, Alan T; Ng, Raymond T.

JAMA Netw Open ; 6(2): e230813, 2023 02 01.

Article in English | MEDLINE | ID: mdl-36848085

ABSTRACT

Importance: Predicting short- and long-term survival of patients with cancer may improve their care. Prior predictive models either use data with limited availability or predict the outcome of only 1 type of cancer. Objective: To investigate whether natural language processing can predict survival of patients with general cancer from a patient's initial oncologist consultation document. Design, Setting, and Participants: This retrospective prognostic study used data from 47,625 of 59,800 patients who started cancer care at any of the 6 BC Cancer sites located in the province of British Columbia between April 1, 2011, and December 31, 2016. Mortality data were updated until April 6, 2022, and data were analyzed from update until September 30, 2022. All patients with a medical or radiation oncologist consultation document generated within 180 days of diagnosis were included; patients seen for multiple cancers were excluded. Exposures: Initial oncologist consultation documents were analyzed using traditional and neural language models. Main Outcomes and Measures: The primary outcome was the performance of the predictive models, including balanced accuracy and receiver operating characteristics area under the curve (AUC). The secondary outcome was investigating what words the models used. Results: Of the 47,625 patients in the sample, 25,428 (53.4%) were female and 22,197 (46.6%) were male, with a mean (SD) age of 64.9 (13.7) years. A total of 41,447 patients (87.0%) survived 6 months, 31,143 (65.4%) survived 36 months, and 27,880 (58.5%) survived 60 months, calculated from their initial oncologist consultation. The best models achieved a balanced accuracy of 0.856 (AUC, 0.928) for predicting 6-month survival, 0.842 (AUC, 0.918) for 36-month survival, and 0.837 (AUC, 0.918) for 60-month survival, on a holdout test set. Differences in what words were important for predicting 6- vs 60-month survival were found. Conclusions and Relevance: These findings suggest that models performed comparably with or better than previous models predicting cancer survival and that they may be able to predict survival using readily available data without focusing on 1 cancer type.

Subject(s)

Natural Language Processing , Neoplasms , Humans , Female , Male , Middle Aged , Aged , Retrospective Studies , Neoplasms/therapy , Medical Oncology , Referral and Consultation
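
Balanced accuracy, the primary metric reported in the abstract above, is the mean of sensitivity and specificity. It is informative when classes are imbalanced, as with the 87.0% of patients who survived 6 months. The sketch below is a generic illustration with made-up labels; the function name `balanced_accuracy` is an assumption, and this is not the study's evaluation code.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity (recall on 1s) and specificity (recall on 0s)."""
    pos = sum(1 for t in y_true if t == 1)
    neg = len(y_true) - pos
    if pos == 0 or neg == 0:
        raise ValueError("both classes must be present in y_true")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return 0.5 * (tp / pos + tn / neg)


# Toy cohort: 9 survivors (1) and 1 non-survivor (0), with a model that
# always predicts survival. Plain accuracy is 0.9; balanced accuracy is 0.5.
y_true = [1] * 9 + [0]
y_pred = [1] * 10
print(balanced_accuracy(y_true, y_pred))  # 0.5
```

A model that always predicts the majority class therefore scores no better than chance on this metric, which makes the reported values easier to compare across the 6-, 36- and 60-month outcomes.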
