Search | VHL Regional Portal

1.

Uniformly shaped harmonization combines human transcriptomic data from different platforms while retaining their biological properties and differential gene expression patterns.

Borisov, Nicolas; Tkachev, Victor; Simonov, Alexander; Sorokin, Maxim; Kim, Ella; Kuzmin, Denis; Karademir-Yilmaz, Betul; Buzdin, Anton.

Front Mol Biosci ; 10: 1237129, 2023.

Article in English | MEDLINE | ID: mdl-37745690

ABSTRACT

Introduction: Co-normalization of RNA profiles obtained using different experimental platforms and protocols opens an avenue for comprehensive comparison of relevant features, such as differentially expressed genes associated with disease. Currently, most bioinformatic tools perform normalization in a flexible format that depends on the individual datasets under analysis, so the outputs of such normalizations are poorly compatible with each other. Recently, we proposed a new approach to gene expression data normalization, termed Shambhala, which returns harmonized data in a uniform shape, where every expression profile is transformed into a pre-defined universal format. We previously showed that after shambhalization of human RNA profiles, overall tissue-specific clustering features are strongly retained while platform-specific clustering is dramatically reduced. Methods: Here, we tested how well Shambhala retains fold-change gene expression features and other functional characteristics of gene clusters, such as pathway activation levels and predicted cancer drug activity scores. Results: Using 6,793 cancer and 11,135 normal tissue gene expression profiles from the literature and experimental datasets, we applied twelve performance criteria to different versions of Shambhala and to other methods of transcriptomic harmonization with a flexible output data format. These criteria covered biological type classifiers, hierarchical clustering, correlation/regression properties, stability of drug efficiency scores, and data quality for machine learning classifiers. Discussion: The Shambhala-2 harmonizer demonstrated the best results, with correlation and linear regression coefficients close to 1 for the comparison of training vs validation datasets, and less than half the instability of other methods in calculating drug efficiency scores.

2.

Transcriptomic Harmonization as the Way for Suppressing Cross-Platform Bias and Batch Effect.

Borisov, Nicolas; Buzdin, Anton.

Biomedicines ; 10(9)2022 Sep 18.

Article in English | MEDLINE | ID: mdl-36140419

ABSTRACT

(1) Background: The emergence of methods interrogating gene expression at high throughput gave birth to quantitative transcriptomics, but also raised the question of how to inter-compare expression profiles obtained using different equipment and protocols and/or in different series of experiments. Addressing this issue is challenging, because all of the above variables can dramatically influence gene expression signals and, therefore, cause a plethora of peculiar features in the transcriptomic profiles. Millions of transcriptomic profiles have been obtained and deposited in public databases, but their usefulness is strongly limited by these inter-comparison issues; (2) Methods: Dozens of methods and software packages, which can generally be classified as either flexible or predefined format harmonizers, have been proposed, but none has yet become the gold standard for unifying this type of Big Data; (3) Results: However, recent developments show that platform/protocol/batch bias can be efficiently reduced not only for comparisons of limited transcriptomic datasets: instruments have also been proposed for transforming gene expression profiles into a universal, uniformly shaped format that can support multiple inter-comparisons at reasonable calculation costs. This forms a basis for universal indexing of all or most types of RNA sequencing and microarray hybridization profiles; (4) Conclusions: In this paper, we overview the landscape of modern approaches and methods in transcriptomic harmonization and focus on the practical aspects of their application.

3.

The Role of the Metabolism of Zinc and Manganese Ions in Human Cancerogenesis.

Rozenberg, Julian Markovich; Kamynina, Margarita; Sorokin, Maksim; Zolotovskaia, Marianna; Koroleva, Elena; Kremenchutckaya, Kristina; Gudkov, Alexander; Buzdin, Anton; Borisov, Nicolas.

Biomedicines ; 10(5)2022 May 05.

Article in English | MEDLINE | ID: mdl-35625809

ABSTRACT

Metal ion homeostasis is fundamental for life. Specifically, transition metals iron, manganese and zinc play a pivotal role in mitochondrial metabolism and energy generation, anti-oxidation defense, transcriptional regulation and the immune response. The misregulation of expression or mutations in ion carriers and the corresponding changes in Mn2+ and Zn2+ levels suggest that these ions play a pivotal role in cancer progression. Moreover, coordinated changes in Mn2+ and Zn2+ ion carriers have been detected, suggesting that particular mechanisms influenced by both ions might be required for the growth of cancer cells, metastasis and immune evasion. Here, we present a review of zinc and manganese pathophysiology suggesting that these ions might cooperatively regulate cancerogenesis. Zn and Mn effects converge on mitochondria-induced apoptosis, transcriptional regulation and the cGAS-STING signaling pathway, mediating the immune response. Both Zn and Mn influence cancer progression and impact treatment efficacy in animal models and clinical trials. We predict that novel strategies targeting the regulation of both Zn and Mn in cancer will complement current therapeutic strategies.

4.

OncoboxPD: human 51 672 molecular pathways database with tools for activity calculating and visualization.

Zolotovskaia, Marianna A; Tkachev, Victor S; Guryanova, Anastasia A; Simonov, Alexander M; Raevskiy, Mikhail M; Efimov, Victor V; Wang, Ye; Sekacheva, Marina I; Garazha, Andrew V; Borisov, Nicolas M; Kuzmin, Denis V; Sorokin, Maxim I; Buzdin, Anton A.

Comput Struct Biotechnol J ; 20: 2280-2291, 2022.

Article in English | MEDLINE | ID: mdl-35615022

ABSTRACT

OncoboxPD (Oncobox pathway databank), available at https://open.oncobox.com, is a collection of 51 672 uniformly processed human molecular pathways. Superposition of all pathways formed an interactome graph of protein-protein interactions and metabolic reactions containing 361 654 interactions and 64 095 molecular participants. Pathways are uniformly classified by biological processes, and each pathway node is algorithmically annotated with a specific functional activator/repressor role. This enables online calculation of statistically supported pathway activation levels (PALs) from custom RNA/protein expression profiles with the built-in bioinformatic tool. Each pathway can be visualized as a static or dynamic graph, where vertices are molecules participating in the pathway and edges are interactions or reactions between them. Differentially expressed nodes in a pathway can be visualized in two-color mode with a user-defined color scale. For every comparison, OncoboxPD also generates a graph summarizing the top up- and downregulated pathways.
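The PAL calculation mentioned above can be sketched with the formula commonly described in Oncobox publications, PAL = 100 * sum_n(ARR_n * lg(CNR_n)) / sum_n(|ARR_n|), where CNR_n is the case-to-normal expression ratio of pathway node n and ARR_n is its activator/repressor role. This is a minimal sketch assuming that formula and ARR values of 1 (activator), -1 (repressor), fractional values for probable roles, and 0 for unknown roles; it omits the statistical testing performed by the online tool, and the gene names are hypothetical.

```python
import math


def pathway_activation_level(cnr, arr):
    """Sketch of an Oncobox-style pathway activation level (PAL).

    cnr: dict mapping gene -> case-to-normal expression ratio (must be > 0)
    arr: dict mapping gene -> activator/repressor role (assumed values:
         1 activator, -1 repressor, fractional for probable roles, 0 unknown)
    """
    # Only nodes that have both a role and a measured expression ratio count.
    shared = [gene for gene in arr if gene in cnr]
    denominator = sum(abs(arr[g]) for g in shared)
    if denominator == 0:
        return 0.0  # no informative nodes: treat the pathway as unchanged
    # Activators push PAL up when overexpressed, repressors push it down.
    numerator = sum(arr[g] * math.log10(cnr[g]) for g in shared)
    return 100.0 * numerator / denominator


# Hypothetical pathway with three nodes.
pal = pathway_activation_level(
    {"A": 10.0, "B": 0.1, "C": 1.0},
    {"A": 1, "B": -1, "C": 1},
)
print(round(pal, 2))
```

Here the upregulated activator A and the downregulated repressor B both raise the score. The unchanged node C contributes nothing to the numerator and only enlarges the denominator.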

5.

Shambhala-2: A Protocol for Uniformly Shaped Harmonization of Gene Expression Profiles of Various Formats.

Borisov, Nicolas; Sorokin, Maksim; Zolotovskaya, Marianna; Borisov, Constantin; Buzdin, Anton.

Curr Protoc ; 2(5): e444, 2022 May.

Article in English | MEDLINE | ID: mdl-35617464

ABSTRACT

Uniformly shaped harmonization of gene expression profiles is central to the simultaneous comparison of multiple gene expression datasets. It is expected to operate with gene expression data obtained using various experimental methods and equipment, and to return harmonized profiles in a uniform shape. Such uniformly shaped expression profiles from different initial datasets can then be compared directly. However, current harmonization techniques have strong limitations that prevent their broad use in bioinformatic applications: they can either operate with only up to two datasets/platforms or return data in a dynamic format that differs for every comparison under analysis. This also does not allow new data to be added to previously harmonized dataset(s), which complicates the analysis and increases calculation costs. We propose here a new method termed Shambhala-2 that can transform multi-platform expression data into a universal format that is identical for all harmonizations made using this technique. Shambhala-2 is based on sample-by-sample cubic conversion of the initial expression dataset into a preselected shape of the reference definitive dataset. Using 8390 samples of 12 healthy human tissue types and 4086 samples of colorectal, kidney, and lung cancer tissues, we verified Shambhala-2's capacity to restore tissue-specific expression patterns for seven microarray and three RNA sequencing platforms. Shambhala-2 performed well for all tested combinations of RNAseq and microarray profiles and retained gene expression ranks, as evidenced by high correlations between different single or aggregated gene expression metrics in pre- and post-Shambhalized samples, including preservation of cancer-specific gene expression and pathway activation features. © 2022 Wiley Periodicals LLC.
Basic Protocol: Shambhala-2 harmonizer
Alternate Protocol 1: Linear Shambhala/Shambhala-1
Alternate Protocol 2: Alternative (flexible-format and uniformly shaped) normalization methods
Support Protocol 1: Watermelon multisection (WM)
Support Protocol 2: Calculation of cancer-to-normal log-fold-change (LFC) and pathway activation level (PAL)

Subject(s)

Neoplasms , Transcriptome , Gene Expression Profiling/methods , Humans , Microarray Analysis , Sequence Analysis, RNA
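The Shambhala-2 abstract describes a sample-by-sample cubic conversion of each expression profile into the preselected shape of a reference definitive dataset. The sketch below is one simplified reading of that idea, not the published protocol. For a single sample, it fits a cubic polynomial by ordinary least squares that maps the sample's sorted values onto the sorted values of a reference distribution of the same length. It then applies that polynomial to every gene. The equal-length reference vector, the standardization step, and both function names are assumptions made for illustration.

```python
def solve_linear_system(a, b):
    """Solve a small dense system a.x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[pivot][col] == 0:
            raise ValueError("singular system")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def cubic_to_reference_shape(sample, reference):
    """Hypothetical sketch: map one profile onto the shape of a reference distribution."""
    if len(sample) != len(reference):
        raise ValueError("sample and reference must have the same length")
    if len(set(sample)) < 4:
        raise ValueError("need at least 4 distinct values to fit a cubic")
    n = len(sample)
    mean = sum(sample) / n
    sd = (sum((v - mean) ** 2 for v in sample) / n) ** 0.5
    # Standardize inputs so the normal equations stay well conditioned.
    zs = sorted((v - mean) / sd for v in sample)
    ys = sorted(reference)
    # Least-squares normal equations (A^T A) c = A^T y for the basis 1, z, z^2, z^3.
    ata = [[sum(z ** (i + j) for z in zs) for j in range(4)] for i in range(4)]
    aty = [sum(y * z ** i for z, y in zip(zs, ys)) for i in range(4)]
    coef = solve_linear_system(ata, aty)

    def transform(v):
        z = (v - mean) / sd
        return sum(c * z ** k for k, c in enumerate(coef))

    # Apply the fitted cubic gene by gene, keeping the original gene order.
    return [transform(v) for v in sample]
```

A cubic fitted this way is not guaranteed to be monotonic. In practice, one would therefore check that gene ranks are preserved; the abstract reports rank preservation as a property of the real method.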

6.

Gene Expression-Based Signature Can Predict Sorafenib Response in Kidney Cancer.

Gudkov, Alexander; Shirokorad, Valery; Kashintsev, Kirill; Sokov, Dmitriy; Nikitin, Daniil; Anisenko, Andrey; Borisov, Nicolas; Sekacheva, Marina; Gaifullin, Nurshat; Garazha, Andrew; Suntsova, Maria; Koroleva, Elena; Buzdin, Anton; Sorokin, Maksim.

Front Mol Biosci ; 9: 753318, 2022.

Article in English | MEDLINE | ID: mdl-35359606

ABSTRACT

Sorafenib is a tyrosine kinase inhibitor with multiple molecular specificities that is approved for clinical use in second-line treatment of metastatic and advanced renal cell carcinomas (RCCs). However, only 10-40% of RCC patients respond to sorafenib-containing therapies, and personalizing its prescription may help find an adequate balance of clinical efficiency, cost-effectiveness, and side effects. We investigated whether expression levels of known molecular targets of sorafenib in RCC can serve as a prognostic biomarker of treatment response. We used Illumina microarrays to profile RNA expression in pre-treatment formalin-fixed paraffin-embedded (FFPE) samples of 22 metastatic or advanced RCC cases with known responses to next-line sorafenib monotherapy. Among them, nine patients showed partial response (PR), three showed stable disease (SD), and 10 showed progressive disease (PD) according to the Response Evaluation Criteria In Solid Tumors (RECIST). We then classified PR + SD patients as "responders" and PD patients as "poor responders". We found that a gene signature including eight sorafenib target genes was congruent with the drug response characteristics and enabled high-quality separation of responders and poor responders [area under the receiver operating characteristic curve (AUC) 0.89]. We validated these findings on another set of 13 annotated experimental FFPE RCC samples (from 2 PR, 1 SD, and 10 PD patients) that were profiled by RNA sequencing and observed an AUC of 0.97 for the 8-gene signature as the response classifier. We further validated these results in a series of qRT-PCR experiments on a third experimental set of 12 annotated RCC biosamples (from 4 PR, 3 SD, and 5 PD patients), where the 8-gene signature showed an AUC of 0.83.
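The AUC values quoted above measure how well a per-patient signature score separates responders (PR + SD) from poor responders (PD). This is a minimal sketch that assumes each patient already has one numeric signature score; how the 8-gene score is derived is not specified here. The ROC AUC equals the probability that a randomly chosen responder scores higher than a randomly chosen poor responder, with ties counted as one half.

```python
def roc_auc(scores, labels):
    """ROC AUC via the pairwise (Mann-Whitney) formulation.

    scores: per-patient signature scores (higher = predicted responder)
    labels: 1 for responders (PR + SD), 0 for poor responders (PD)
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one responder and one poor responder")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # responder correctly ranked above poor responder
            elif p == n:
                wins += 0.5   # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))


# Hypothetical scores for four patients: perfect separation gives AUC = 1.0.
print(roc_auc([0.9, 0.8, 0.3, 0.2], [1, 1, 0, 0]))
```

The pairwise loop is quadratic, which is negligible at cohort sizes like 12 to 22 patients. A rank-based formula computes the same value faster on large datasets.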

7.

Better Agreement of Human Transcriptomic and Proteomic Cancer Expression Data at the Molecular Pathway Activation Level.

Raevskiy, Mikhail; Sorokin, Maxim; Zakharova, Galina; Tkachev, Victor; Borisov, Nicolas; Kuzmin, Denis; Kremenchutckaya, Kristina; Gudkov, Alexander; Kamashev, Dmitry; Buzdin, Anton.

Int J Mol Sci ; 23(5)2022 Feb 26.

Article in English | MEDLINE | ID: mdl-35269755

ABSTRACT

Previously, we have shown that aggregating RNA-level gene expression profiles into quantitative molecular pathway activation metrics reduces batch effects and improves agreement between different experimental platforms. Here, we investigate whether pathway-level data analysis provides any advantage when comparing transcriptomic and proteomic data. We compare paired proteomic and transcriptomic gene expression and pathway activation profiles obtained for the same human cancer biosamples in The Cancer Genome Atlas (TCGA) and the NCI Clinical Proteomic Tumor Analysis Consortium (CPTAC) projects, for a total of 755 samples of glioblastoma, breast, liver, lung, ovarian, pancreatic, and uterine cancers. In the CPTAC assay, expression levels of 15,112 protein-coding genes were profiled using the Thermo QE series of mass spectrometers. In TCGA, RNA expression levels of the same genes were obtained for the same biosamples using the Illumina HiSeq 4000 engine. At the gene level, absolute gene expression values are compared, whereas pathway-level comparisons are made between the pathway activation levels (PALs) calculated using average sample-normalized transcriptomic and proteomic profiles. We observed remarkably different average correlations between the primary RNA and protein expression data for different cancer types: Spearman Rho between 0.017 (p = 1.7 × 10^−13) and 0.27 (p < 2.2 × 10^−16). However, at the pathway level we detected overall statistically significantly higher correlations: averaged Rho between 0.022 (p < 2.2 × 10^−16) and 0.56 (p < 2.2 × 10^−16). Thus, we conclude that data analysis at the PAL level yields more similar results when comparing high-throughput RNA and protein expression profiles.

Subject(s)

Neoplasms , Transcriptome , Gene Expression Profiling/methods , Humans , Mass Spectrometry , Neoplasms/genetics , Neoplasms/metabolism , Proteomics , RNA
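Spearman's Rho, used above to compare paired RNA and protein profiles, is the Pearson correlation of rank-transformed values. This is a minimal stdlib sketch that assumes two equal-length vectors of paired measurements for the same genes, or for the same pathways at the PAL level. Tied values receive average ranks, and p-values are not computed.

```python
def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / (vx * vy) ** 0.5


def spearman_rho(x, y):
    """Spearman correlation between paired measurements (e.g. RNA vs protein)."""
    if len(x) != len(y):
        raise ValueError("vectors must be paired")
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical paired profiles: any monotone relation gives Rho = 1.
print(spearman_rho([1, 2, 3, 4, 5], [1, 4, 9, 16, 25]))
```

Because only ranks enter the calculation, Rho is insensitive to the very different value scales of mass spectrometry and sequencing readouts.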

8.

FNC: An Advanced Anticancer Therapeutic or Just an Underdog?

Fayzullina, Daria; Kharwar, Rajesh Kumar; Acharya, Arbind; Buzdin, Anton; Borisov, Nicolas; Timashev, Peter; Ulasov, Ilya; Kapomba, Byron.

Front Oncol ; 12: 820647, 2022.

Article in English | MEDLINE | ID: mdl-35223502

ABSTRACT

Azvudine (FNC) is a novel cytidine analogue that has both antiviral and anticancer activities. This minireview focuses on the molecular mechanisms by which it suppresses the viral life cycle and cancer cell growth, and discusses applications of this nucleoside drug for advanced therapy of tumors and malignant blood diseases. FNC inhibits positive-strand RNA viruses such as HCV, EV, and SARS-CoV-2, as well as HBV and retroviruses, including HIV, by suppressing their RNA-dependent polymerase enzymes. It may also inhibit the same type of enzyme (reverse transcriptase) encoded by human retrotransposons, including human endogenous retroviruses (HERVs). Because retrotransposon activation can be a major factor in ongoing cancer genome instability and, consequently, in higher tumor aggressiveness, FNC has the potential to increase the efficacy of multiple anticancer therapies. Furthermore, FNC showed other aspects of anticancer activity by inhibiting adhesion, migration, invasion, and proliferation of malignant cells. It was also reported to be involved in cell cycle arrest and apoptosis, thereby inhibiting cancer progression through different pathways. To date, the basis of FNC effects on cancer cells is not fully understood, and additional studies are needed to better understand the molecular mechanisms of its anticancer activities and to support its medical use in oncology.

9.

Using proteomic and transcriptomic data to assess activation of intracellular molecular pathways.

Buzdin, Anton; Tkachev, Victor; Zolotovskaia, Marianna; Garazha, Andrew; Moshkovskii, Sergey; Borisov, Nicolas; Gaifullin, Nurshat; Sorokin, Maksim; Suntsova, Maria.

Adv Protein Chem Struct Biol ; 127: 1-53, 2021.

Article in English | MEDLINE | ID: mdl-34340765

ABSTRACT

Analysis of molecular pathway activation is a recent instrument that helps quantify the activities of various intracellular signaling, structural, DNA synthesis and repair, and biochemical processes. This may have a deep impact on fundamental research, bioindustry, and medicine. Unlike gene ontology analyses and numerous qualitative methods that can establish whether a pathway is affected in principle, the quantitative approach has the advantage of measuring the exact extent of pathway up/downregulation. This has given rise to a new generation of molecular biomarkers, pathway activation levels, which reflect concentration changes of all measurable pathway components. The input data can be high-throughput proteomic or transcriptomic profiles, and the output numbers take both positive and negative values, with higher values reflecting stronger overall pathway activation. Due to their nature, pathway activation levels are more robust biomarkers than individual gene product/protein levels. Here, we review current knowledge of quantitative gene expression interrogation methods and their applications for molecular pathway quantification. We consider the relevant bioinformatic algorithms and their applications for solving real-world problems. Besides a plethora of applications in basic life sciences, quantitative pathway analysis can improve molecular design and clinical investigations in the pharmaceutical industry, can help find new active biotechnological components, and can significantly contribute to the progressive evolution of personalized medicine. In addition to the theoretical principles and concepts, we also propose publicly available software for using large-scale protein/RNA expression data to assess human pathway activation levels.

Subject(s)

Algorithms , Gene Expression Profiling , Precision Medicine , Proteomics , Animals , Humans

10.

Machine Learning Applicability for Classification of PAD/VCD Chemotherapy Response Using 53 Multiple Myeloma RNA Sequencing Profiles.

Borisov, Nicolas; Sergeeva, Anna; Suntsova, Maria; Raevskiy, Mikhail; Gaifullin, Nurshat; Mendeleeva, Larisa; Gudkov, Alexander; Nareiko, Maria; Garazha, Andrew; Tkachev, Victor; Li, Xinmin; Sorokin, Maxim; Surin, Vadim; Buzdin, Anton.

Front Oncol ; 11: 652063, 2021.

Article in English | MEDLINE | ID: mdl-33937058

ABSTRACT

Multiple myeloma (MM) affects ~500,000 people and results in ~100,000 deaths annually, and is currently considered treatable but incurable. Of the existing MM chemotherapy treatment regimens, eleven include bortezomib, a proteasome-targeted drug. MM patients respond differently to bortezomib, and new prognostic biomarkers are needed to personalize treatments. However, there is a shortage of clinically annotated MM molecular data that could be used to establish novel molecular diagnostics. We report new RNA sequencing profiles for 53 MM patients annotated with responses to two similar chemotherapy regimens, bortezomib, doxorubicin, dexamethasone (PAD) and bortezomib, cyclophosphamide, dexamethasone (VCD), or to their combinations. Fourteen patients received both PAD and VCD, six received only PAD, and 33 received only VCD. We compared profiles of good and poor responders and found five genes commonly regulated in this and previous datasets for other bortezomib regimens (all upregulated in good responders): FGFR3, MAF, IGHA2, IGHV1-69, and GRB14. Four of these genes are linked with known immunoglobulin locus rearrangements. We then used five machine learning (ML) methods to build a classifier distinguishing good and poor responders for two cohorts: PAD + VCD (53 patients) and, separately, VCD (47 patients). We showed that applying FloWPS dynamic data trimming was beneficial for all ML methods tested in both cohorts, and also in the previous MM bortezomib datasets. However, the ML models built for the different datasets could not be transferred between datasets, possibly due to different treatment regimens, experimental profiling methods, and MM heterogeneity.

11.

Algorithmic Annotation of Functional Roles for Components of 3,044 Human Molecular Pathways.

Sorokin, Maxim; Borisov, Nicolas; Kuzmin, Denis; Gudkov, Alexander; Zolotovskaia, Marianna; Garazha, Andrew; Buzdin, Anton.

Front Genet ; 12: 617059, 2021.

Article in English | MEDLINE | ID: mdl-33633781

ABSTRACT

Current methods of high-throughput molecular and genomic analysis have enabled the reconstruction of thousands of human molecular pathways. Knowledge of pathway structure and architecture, taken together with gene expression data, can help interrogate pathway activation levels (PALs) using different bioinformatic algorithms. In turn, pathway activation profiles can characterize differentially regulated molecular processes and give numeric characteristics of the extent of their activation or inhibition. However, different pathway nodes may have different functions in overall pathway regulation, and calculating PAL requires knowing the molecular function of every node in the pathway in terms of its activator or inhibitor role. Thus, high-throughput annotation of the functional roles of pathway nodes is required for comprehensive analysis of pathway activation profiles. We proposed an algorithm that identifies functional roles of pathway components and applied it to annotate 3,044 human molecular pathways extracted from the Biocarta, Reactome, KEGG, Qiagen Pathway Central, NCI, and HumanCYC databases, including 9,022 gene products. The resulting knowledgebase can be applied for direct calculation of PALs and for establishing large-scale profiles of signaling, metabolic, and DNA repair pathway regulation using high-throughput gene expression data. We also provide a bioinformatic tool for PAL calculations using the current pathway knowledgebase.

12.

Cancer gene expression profiles associated with clinical outcomes to chemotherapy treatments.

Borisov, Nicolas; Sorokin, Maxim; Tkachev, Victor; Garazha, Andrew; Buzdin, Anton.

BMC Med Genomics ; 13(Suppl 8): 111, 2020 09 18.

Article in English | MEDLINE | ID: mdl-32948183

ABSTRACT

BACKGROUND: Machine learning (ML) methods still have limited applicability in personalized oncology due to the low number of available clinically annotated molecular profiles. This does not allow sufficient training of ML classifiers that could be used for improving molecular diagnostics. METHODS: We reviewed published datasets of high-throughput gene expression profiles corresponding to cancer patients with known responses to chemotherapy treatments. We browsed the Gene Expression Omnibus (GEO), The Cancer Genome Atlas (TCGA) and Tumor Alterations Relevant for GEnomics-driven Therapy (TARGET) repositories. RESULTS: We identified data collections suitable for building ML models that predict responses to certain chemotherapeutic schemes. We identified 26 datasets, ranging from 41 to 508 cases per dataset. All the datasets identified were checked for ML applicability and robustness with leave-one-out cross validation. Twenty-three datasets, which had balanced numbers of treatment responder and non-responder cases, were found suitable for ML. CONCLUSIONS: We collected a database of gene expression profiles associated with clinical responses to chemotherapy for 2786 individual cancer cases. Among them, seven datasets included RNA sequencing data (for 645 cases) and the others included microarray expression profiles. The cases represented breast cancer, lung cancer, low-grade glioma, endothelial carcinoma, multiple myeloma, adult leukemia, pediatric leukemia and kidney tumors. Chemotherapeutics included taxanes, bortezomib, vincristine, trastuzumab, letrozole, tipifarnib, temozolomide, busulfan and cyclophosphamide.

Subject(s)

Gene Expression Profiling , Machine Learning , Neoplasms/drug therapy , Antineoplastic Agents/therapeutic use , Humans , Neoplasms/genetics , Progression-Free Survival , Treatment Outcome
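Leave-one-out cross validation, used above to check ML applicability, trains on all cases except one and tests on the held-out case, repeating this for every case. The sketch below uses a nearest-centroid classifier purely for illustration. The abstract does not say which classifier was used at this step, and the feature vectors and function names are hypothetical.

```python
def nearest_centroid_predict(train_x, train_y, x):
    """Assign x to the class whose mean feature vector is closest (squared Euclidean)."""
    centroids = {}
    for label in set(train_y):
        rows = [r for r, y in zip(train_x, train_y) if y == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]

    def dist2(c):
        return sum((a - b) ** 2 for a, b in zip(c, x))

    return min(centroids, key=lambda lab: dist2(centroids[lab]))


def leave_one_out_accuracy(xs, ys):
    """Fraction of cases predicted correctly when each one is held out in turn."""
    correct = 0
    for i in range(len(xs)):
        # Train on every case except case i, then predict case i.
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        if nearest_centroid_predict(train_x, train_y, xs[i]) == ys[i]:
            correct += 1
    return correct / len(xs)


# Hypothetical two-feature profiles: 1 = responder, 0 = non-responder.
xs = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
ys = [0, 0, 0, 1, 1, 1]
print(leave_one_out_accuracy(xs, ys))
```

Leave-one-out uses every case for testing exactly once. This suits the small clinically annotated cohorts described above, at the cost of refitting the model once per case.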

13.

Disparity between Inter-Patient Molecular Heterogeneity and Repertoires of Target Drugs Used for Different Types of Cancer in Clinical Oncology.

Zolotovskaia, Marianna A; Sorokin, Maxim I; Petrov, Ivan V; Poddubskaya, Elena V; Moiseev, Alexey A; Sekacheva, Marina I; Borisov, Nicolas M; Tkachev, Victor S; Garazha, Andrew V; Kaprin, Andrey D; Shegay, Peter V; Giese, Alf; Kim, Ella; Roumiantsev, Sergey A; Buzdin, Anton A.

Int J Mol Sci ; 21(5)2020 Feb 26.

Article in English | MEDLINE | ID: mdl-32111026

ABSTRACT

Inter-patient molecular heterogeneity is the major declared driver of the expanding variety of anticancer drugs and of the personalization of their prescription. Here, we compared inter-patient molecular heterogeneities of tumors with the repertoires of drugs or their molecular targets currently in use in clinical oncology. We estimated molecular heterogeneity using genomic (whole exome sequencing) and transcriptomic (RNA sequencing) data for 4890 tumors taken from The Cancer Genome Atlas database. For thirteen major cancer types, we compared heterogeneities at the levels of mutations and gene expression with the repertoires of targeted therapeutics and their molecular targets accepted by current oncology guidelines. In total, 85 drugs were investigated, collectively covering 82 individual molecular targets. For the first time, we showed that the repertoires of molecular targets of accepted drugs did not correlate with the molecular heterogeneities of different cancer types. On the other hand, we found that clinical recommendations for the available cancer drugs were strongly congruent with gene expression but not gene mutation patterns. We detected the best match between drug usage recommendations and molecular patterns for kidney, stomach, bladder, ovarian and endometrial cancers. In contrast, brain tumors, prostate and colorectal cancers showed the lowest match. These findings provide a theoretical basis for reconsidering the usage of targeted therapeutics and intensifying drug repurposing efforts.

Subject(s)

Drug Delivery Systems , Genetic Heterogeneity , Medical Oncology/methods , Molecular Targeted Therapy/methods , Neoplasms/drug therapy , Neoplasms/genetics , Antineoplastic Agents/therapeutic use , Cluster Analysis , Drug Therapy , Genomics , Humans , Mutation , Pathology, Molecular , Precision Medicine/methods , Transcriptome , Exome Sequencing

14.

Flexible Data Trimming Improves Performance of Global Machine Learning Methods in Omics-Based Personalized Oncology.

Tkachev, Victor; Sorokin, Maxim; Borisov, Constantin; Garazha, Andrew; Buzdin, Anton; Borisov, Nicolas.

Int J Mol Sci ; 21(3)2020 Jan 22.

Article in English | MEDLINE | ID: mdl-31979006

ABSTRACT

(1) Background: Machine learning (ML) methods are rarely used for omics-based prescription of cancer drugs, due to a shortage of case histories with clinical outcomes supplemented by high-throughput molecular data. This causes overtraining and high vulnerability of most ML methods. Recently, we proposed a hybrid global-local approach to ML termed floating window projective separator (FloWPS) that avoids extrapolation in the feature space. Its core property is data trimming, i.e., sample-specific removal of irrelevant features. (2) Methods: Here, we applied FloWPS to seven popular ML methods: linear SVM, k nearest neighbors (kNN), random forest (RF), Tikhonov (ridge) regression (RR), binomial naïve Bayes (BNB), adaptive boosting (ADA) and multi-layer perceptron (MLP). (3) Results: We performed computational experiments on 21 high-throughput gene expression datasets (41-235 samples per dataset) representing a total of 1778 cancer patients with known responses to chemotherapy treatments. FloWPS substantially improved classifier quality for all global ML methods (SVM, RF, BNB, ADA, MLP): the area under the receiver operating characteristic curve (ROC AUC) for the treatment response classifiers increased from the 0.61-0.88 range to 0.70-0.94. We tested FloWPS-empowered methods for overtraining by interrogating the importance of different features for different ML methods in the same model datasets. (4) Conclusions: We showed that FloWPS increases the correlation of feature importance between different ML methods, which indicates its robustness to overtraining. For all the datasets tested, the best performance of FloWPS data trimming was observed for the BNB method, which can be valuable for further building of ML classifiers in personalized oncology.

Subject(s)

Medical Oncology/methods , Precision Medicine/methods , Antineoplastic Agents/therapeutic use , High-Throughput Screening Assays/methods , Humans , Machine Learning , Neoplasms/drug therapy
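The core FloWPS idea described above is to avoid extrapolation by removing features sample by sample. It can be illustrated with a deliberately simplified rule: for a given test sample, keep only those features for which at least `m` training values lie strictly below the test value and at least `m` lie strictly above it. This sketch covers the trimming idea only and rests on that assumed rule. The published FloWPS procedure includes further steps that are not reproduced here, and the function name and `m` parameter are hypothetical.

```python
def trim_features(train_x, test_x, m=1):
    """Indices of features on which test_x lies inside the training data.

    A feature is kept only if at least m training values are strictly below
    and at least m strictly above the test value, so a classifier restricted
    to the kept features never extrapolates along them.
    """
    kept = []
    for j, value in enumerate(test_x):
        column = [row[j] for row in train_x]
        below = sum(1 for v in column if v < value)
        above = sum(1 for v in column if v > value)
        if below >= m and above >= m:
            kept.append(j)
    return kept


# Hypothetical data: feature 1 of the test sample lies outside the training range.
train = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
test = [2.0, 35.0]
kept = trim_features(train, test)
print(kept, [test[j] for j in kept])
```

Trimming is computed separately for each test sample. Different patients can therefore be classified on different feature subsets, which is what makes the removal sample-specific.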

15.

System, Method and Software for Calculation of a Cannabis Drug Efficiency Index for the Reduction of Inflammation.

Borisov, Nicolas; Ilnytskyy, Yaroslav; Byeon, Boseon; Kovalchuk, Olga; Kovalchuk, Igor.

Int J Mol Sci ; 22(1)2020 Dec 31.

Article in English | MEDLINE | ID: mdl-33396562

ABSTRACT

There are many varieties of Cannabis sativa that differ from each other in their composition of cannabinoids, terpenes and other molecules. The medicinal properties of these cultivars often differ greatly, with some being more efficient than others. This report describes the development of a method and software for analyzing the anti-inflammatory efficiency of various cannabis extracts. The method uses high-throughput gene expression profiling data but can potentially use other omics data as well. Following the signaling pathway topology, the gene expression profiles are converted into signaling pathway activities using the signaling pathway impact analysis (SPIA) method. The method was tested by inducing inflammation in human 3D epithelial tissues, including intestinal, oral and skin tissues, exposing these tissues to various extracts, and then performing transcriptome analysis. The analysis showed that the extracts differed in how efficiently they restored the transcriptome changes to the pre-inflammation state, which allowed a cannabis drug efficiency index (CDEI) to be calculated for each extract.

Subject(s)

Cannabinoids/pharmacology , Cannabis/chemistry , Drug Monitoring/methods , Inflammation/drug therapy , Plant Extracts/pharmacology , Software , Transcriptome/drug effects , Biomarkers/analysis , Cells, Cultured , Gene Expression Profiling , Humans , Inflammation/metabolism , Intestinal Mucosa/drug effects , Intestinal Mucosa/metabolism , Mouth Mucosa/drug effects , Mouth Mucosa/metabolism , Skin/drug effects , Skin/metabolism

16.

Quantitation of Molecular Pathway Activation Using RNA Sequencing Data.

Borisov, Nicolas; Sorokin, Maxim; Garazha, Andrew; Buzdin, Anton.

Methods Mol Biol ; 2063: 189-206, 2020.

Article in English | MEDLINE | ID: mdl-31667772

ABSTRACT

Intracellular molecular pathways (IMPs) control all major events in the living cell. IMPs are considered hotspots in the biomedical sciences, and thousands of IMPs have been discovered for humans and model organisms. Knowledge of IMP activation is essential for understanding biological functions and differences between biological objects at the molecular level. Here we describe the Oncobox system for accurate quantitative scoring of the activities of up to several thousand molecular pathways based on high-throughput molecular data. Although initially designed for gene expression data, mainly from RNA sequencing, Oncobox is now also applicable to quantitative proteomics, microRNA and transcription factor binding site mapping data. The Oncobox system includes modules for gene expression data harmonization, aggregation and comparison, and a recursive algorithm for automatic annotation of molecular pathways. The universal rationale of Oncobox enables scoring of signaling, metabolic, cytoskeleton, immunity, DNA repair, and other pathways in a multitude of biological objects. The Oncobox system can be helpful to all those working in genetics, biochemistry, interactomics, and big data analytics in molecular biomedicine.

Subject(s)

Gene Expression Profiling/methods , Molecular Sequence Annotation/methods , Systems Biology/methods , Transcriptome/genetics , Algorithms , Base Sequence , Enzyme Activation/genetics , Humans , Machine Learning , MicroRNAs/genetics , Sequence Analysis, RNA , Signal Transduction/genetics , Exome Sequencing/methods
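The abstract does not state how pathway activity is scored. The Python sketch below computes a pathway activation level (PAL) in the form reported in earlier Oncobox publications; treat the exact formula as an assumption here. Each gene's case-to-normal expression ratio (CNR) is log10-transformed and weighted by its activator/repressor role (ARR: +1 for activators, -1 for repressors), and the sum is divided by the total absolute ARR and scaled by 100. The function name and the dictionary-based inputs are illustrative.

```python
import math
from typing import Mapping


def pathway_activation_level(case: Mapping[str, float],
                             normal_mean: Mapping[str, float],
                             arr: Mapping[str, float]) -> float:
    """PAL-style pathway score; the formula is an assumption (see lead-in).

    case        -- expression of each pathway gene in the sample of interest
    normal_mean -- mean expression of each gene across normal (control) samples
    arr         -- activator/repressor role of each pathway gene:
                   +1 activator, -1 repressor, 0 no defined role
    """
    numerator = 0.0
    denominator = 0.0
    for gene, role in arr.items():
        cnr = case[gene] / normal_mean[gene]  # case-to-normal ratio (must be > 0)
        numerator += role * math.log10(cnr)
        denominator += abs(role)
    if denominator == 0:
        raise ValueError("pathway has no genes with an activator/repressor role")
    return 100.0 * numerator / denominator


# Usage: an activator up 10-fold and an unchanged repressor.
print(pathway_activation_level(
    case={"GENE_A": 10.0, "GENE_B": 1.0},
    normal_mean={"GENE_A": 1.0, "GENE_B": 1.0},
    arr={"GENE_A": 1, "GENE_B": -1},
))  # 50.0
```

Positive values indicate a pathway shifted toward activation relative to the normal samples, negative values toward inhibition.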

17.

Oncobox Method for Scoring Efficiencies of Anticancer Drugs Based on Gene Expression Data.

Tkachev, Victor; Sorokin, Maxim; Garazha, Andrew; Borisov, Nicolas; Buzdin, Anton.

Methods Mol Biol ; 2063: 235-255, 2020.

Article in English | MEDLINE | ID: mdl-31667774

ABSTRACT

We describe here the Oncobox method for scoring the efficiencies of anticancer target drugs (ATDs) using high-throughput gene expression data. The rationale, design, and validation of the method are given along with examples of its practical applications in biomedicine. The method is based on analyzing the activation of intracellular molecular pathways and measuring the expression of the molecular target genes of every ATD under consideration. Using the Oncobox method requires a collection of normal (control) expression profiles and annotated databases of molecular pathways and drug target genes. Both microarray and RNA sequencing profiles are acceptable, although RNA sequencing data prevail in the most recent applications of this technique.

Subject(s)

Antineoplastic Agents/pharmacology , Computational Biology/methods , Drug Discovery/methods , Molecular Targeted Therapy/methods , Neoplasms/drug therapy , Aged , Algorithms , Biomarkers, Tumor/genetics , Clinical Trials as Topic , Drug Delivery Systems , Female , Gene Expression Profiling , Gene Expression Regulation, Neoplastic , Gene Regulatory Networks , Humans , Mutation/genetics , Mutation Rate , Neoplasms/genetics , Protein Kinase Inhibitors/pharmacology , Treatment Outcome
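To illustrate the target-gene part of the rationale described in this abstract, the Python sketch below ranks drugs by the mean log2 case-to-normal expression ratio of their target genes, so that drugs whose targets are overexpressed in the tumor rank higher. This is a hypothetical simplification, not the Oncobox drug scoring formula, which also draws on pathway activation; the function names, gene identifiers and drug labels are placeholders.

```python
import math
from typing import Dict, List, Mapping, Sequence, Tuple


def target_expression_score(case: Mapping[str, float],
                            normal_mean: Mapping[str, float],
                            targets: Sequence[str]) -> float:
    """Mean log2 case-to-normal ratio over a drug's target genes (hypothetical)."""
    if not targets:
        raise ValueError("drug has no annotated target genes")
    logs = [math.log2(case[g] / normal_mean[g]) for g in targets]
    return sum(logs) / len(logs)


def rank_drugs(case: Mapping[str, float],
               normal_mean: Mapping[str, float],
               drug_targets: Mapping[str, Sequence[str]]) -> List[Tuple[str, float]]:
    """Return (drug, score) pairs, highest score (most overexpressed targets) first."""
    scores: Dict[str, float] = {
        drug: target_expression_score(case, normal_mean, targets)
        for drug, targets in drug_targets.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Usage with placeholder genes and drugs.
case = {"G1": 8.0, "G2": 1.0, "G3": 0.5}
normal = {"G1": 1.0, "G2": 1.0, "G3": 1.0}
print(rank_drugs(case, normal, {"drug_a": ["G1"], "drug_b": ["G2", "G3"]}))
# [('drug_a', 3.0), ('drug_b', -0.5)]
```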

18.

New Paradigm of Machine Learning (ML) in Personalized Oncology: Data Trimming for Squeezing More Biomarkers From Clinical Datasets.

Borisov, Nicolas; Buzdin, Anton.

Front Oncol ; 9: 658, 2019.

Article in English | MEDLINE | ID: mdl-31380288

19.

High-Throughput Mutation Data Now Complement Transcriptomic Profiling: Advances in Molecular Pathway Activation Analysis Approach in Cancer Biology.

Buzdin, Anton; Sorokin, Maxim; Poddubskaya, Elena; Borisov, Nicolas.

Cancer Inform ; 18: 1176935119838844, 2019.

Article in English | MEDLINE | ID: mdl-30936679

ABSTRACT

We recently reviewed the current progress in the use of high-throughput molecular "omics" data for the quantitative analysis of molecular pathway activation. These quantitative metrics may be used in many ways, and we focused on their application as tumor biomarkers. Here, we provide an update on the most recent conceptual findings related to pathway analysis in tumor biology that were not included in the previous review. The major novelties include a method enabling calculation of pathway-scale tumor mutation burden, termed "Pathway Instability", and its application to scoring anticancer target drugs. A new technique termed Shambhala has emerged that enables accurate common harmonization of any number of gene expression profiles obtained using any number of experimental platforms. This may be helpful for merging various gene expression data sets and for comparing their pathway activation characteristics. Another recent bioinformatics method, termed FLOating-Window Projective Separator (FloWPS), has the potential to significantly enhance the value of pathway activation profiles as biomarkers of cancer response to treatments. It reduces the minimum number of training samples needed to construct a machine-learning-based classifier. Finally, several documented clinical cases have recently been published in which gene-expression-based pathway analysis was successfully used for personalized off-label prescription of target drugs to metastatic cancer patients.
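Tumor mutation burden is commonly expressed as mutations per megabase of sequence. A hypothetical pathway-scale analogue, which counts mutations only in a pathway's genes and divides by their combined coding length, can be sketched in Python as below. It only illustrates the notion of a pathway-level mutation burden mentioned in this abstract and is not the published Pathway Instability formula; the inputs and names are assumptions.

```python
from typing import Iterable, Mapping


def pathway_mutation_burden(mutation_counts: Mapping[str, int],
                            gene_lengths_bp: Mapping[str, int],
                            pathway_genes: Iterable[str]) -> float:
    """Mutations per megabase over a pathway's genes (hypothetical measure).

    mutation_counts -- somatic mutations observed per gene in one tumor;
                       genes absent from the mapping count as unmutated
    gene_lengths_bp -- coding length of each gene in base pairs
    pathway_genes   -- genes assigned to the pathway
    """
    genes = list(pathway_genes)
    total_mutations = sum(mutation_counts.get(g, 0) for g in genes)
    total_length_mb = sum(gene_lengths_bp[g] for g in genes) / 1_000_000
    if total_length_mb == 0:
        raise ValueError("pathway genes have zero total length")
    return total_mutations / total_length_mb


# Usage with made-up numbers: 3 mutations over 1 Mb of pathway sequence.
print(pathway_mutation_burden({"G1": 2, "G2": 1},
                              {"G1": 400_000, "G2": 600_000},
                              ["G1", "G2"]))  # 3.0
```

Normalizing by length keeps pathways made of long genes from scoring high simply because they offer more sequence to mutate.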

20.

Shambhala: a platform-agnostic data harmonizer for gene expression data.

Borisov, Nicolas; Shabalina, Irina; Tkachev, Victor; Sorokin, Maxim; Garazha, Andrew; Pulin, Andrey; Eremin, Ilya I; Buzdin, Anton.

BMC Bioinformatics ; 20(1): 66, 2019 Feb 06.

Article in English | MEDLINE | ID: mdl-30727942

ABSTRACT

BACKGROUND: Harmonization techniques make different gene expression profiles and their sets compatible and ready for comparison. Here we present a new bioinformatic tool termed Shambhala for harmonization of multiple human gene expression datasets obtained using different experimental methods and platforms of microarray hybridization and RNA sequencing. RESULTS: Unlike previously published methods, which enable good-quality harmonization of only two datasets, Shambhala allows conversion of multiple datasets into a universal form suitable for further comparison. Shambhala harmonization is based on the calibration of gene expression profiles using an auxiliary standardization dataset. Each profile is transformed to make it similar to the output of the microarray hybridization platform Affymetrix Human Gene. This platform was chosen because it has the largest number of human gene expression profiles deposited in public databases. We evaluated Shambhala's ability to retain biologically important features after harmonization. The same four biological samples taken in multiple replicates were profiled independently using three and four different experimental platforms, respectively, then Shambhala-harmonized and investigated by hierarchical clustering. CONCLUSION: Our results showed that, unlike other frequently used methods (quantile normalization and DESeq/DESeq2 normalization), Shambhala harmonization was the only method supporting sample-specific and platform-independent biologically meaningful clustering of data obtained from multiple experimental platforms.

Subject(s)

Software , Cluster Analysis , Gene Expression Profiling , Gene Expression Regulation , Humans , Reproducibility of Results
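Quantile normalization, one of the methods Shambhala was compared against in this study, forces every sample onto a shared value distribution computed from the samples themselves. A minimal standard-library Python sketch is shown below; it breaks ties by original order, whereas some implementations average tied values. It illustrates the comparison method only, not Shambhala's calibration against an auxiliary standardization dataset.

```python
from typing import List, Sequence


def quantile_normalize(samples: Sequence[Sequence[float]]) -> List[List[float]]:
    """Quantile-normalize expression profiles onto a shared distribution.

    samples -- one sequence per sample, all listing the same genes in the same
               order. Ties are broken by original order (a simplification).
    """
    n_genes = len(samples[0])
    if any(len(s) != n_genes for s in samples):
        raise ValueError("all samples must contain the same genes")
    # For each sample, gene indices ordered from lowest to highest value.
    orders = [sorted(range(n_genes), key=s.__getitem__) for s in samples]
    # Reference distribution: mean of the k-th smallest value across samples.
    reference = [
        sum(s[order[k]] for s, order in zip(samples, orders)) / len(samples)
        for k in range(n_genes)
    ]
    # Give each gene the reference value for its rank within its sample.
    normalized = []
    for order in orders:
        out = [0.0] * n_genes
        for rank, gene in enumerate(order):
            out[gene] = reference[rank]
        normalized.append(out)
    return normalized


# Usage: two samples, three genes each.
print(quantile_normalize([[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]))
# [[5.5, 1.5, 3.5], [3.5, 1.5, 5.5]]
```

Because the reference distribution is the mean of the input samples, adding or removing a sample changes every output value. Shambhala instead transforms each profile toward a fixed target, the Affymetrix Human Gene platform output.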
