Results 1 - 20 of 176
1.
Biophys Chem ; 311: 107253, 2024 Apr 30.
Article in English | MEDLINE | ID: mdl-38768531

ABSTRACT

The prediction of binding affinity changes caused by missense mutations can elucidate antigen-antibody interactions. A few accessible structure-based online computational tools have been proposed. However, selecting suitable software for a particular study is challenging, especially for research on the SARS-CoV-2 spike protein and its antibodies. Therefore, benchmarking on mutation-diverse SARS-CoV-2 datasets is critical. Here, we collected datasets comprising 1216 variants with measured changes in antigen binding affinity, drawn from 22 complexes of SARS-CoV-2 S proteins with 22 monoclonal antibodies, and applied them to evaluate the performance of seven binding affinity prediction tools. The Pearson correlations between the tools' predicted and measured changes in binding affinity ranged from -0.158 to 0.657, while accuracy in classifying increased versus decreased affinity ranged from 0.444 to 0.834. These tools performed relatively better at predicting single mutations, especially at epitope sites, but performed poorly on variants with extremely decreased affinity. The tested tools were relatively insensitive to the experimental techniques used to obtain the structures of the complexes. In summary, we constructed a list of datasets and evaluated a range of structure-based online prediction tools; this benchmark should help explain the relevant processes of antigen-antibody interactions and enhance the computational design of therapeutic monoclonal antibodies.
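A minimal sketch of the two headline metrics in this abstract: the Pearson correlation between predicted and measured binding-affinity changes, and accuracy on the binary task of calling an increase versus a decrease. The function names, the toy values and the sign convention (positive values meaning increased affinity) are assumptions for illustration, not taken from the paper.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def direction_accuracy(pred, meas):
    """Share of variants whose predicted and measured changes agree in direction.

    Assumes positive values mean increased affinity; zero counts as a decrease.
    """
    hits = sum((p > 0) == (m > 0) for p, m in zip(pred, meas))
    return hits / len(meas)


# Hypothetical predicted and measured affinity changes for four variants.
predicted = [0.8, -0.1, -0.4, -0.2]
measured = [1.2, -0.5, -2.0, 0.3]
print(pearson(predicted, measured), direction_accuracy(predicted, measured))
```

Under these assumptions, each tool in the benchmark would contribute one value of each metric computed over its variant set.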

2.
Bioinformatics ; 40(4), 2024 Mar 29.
Article in English | MEDLINE | ID: mdl-38603616

ABSTRACT

MOTIVATION: Clustering analysis for single-cell RNA sequencing (scRNA-seq) data is an important step in revealing cellular heterogeneity. Many clustering methods have been proposed to discover heterogeneous cell types from scRNA-seq data. However, adaptive clustering with an accurate cluster number that reflects the intrinsic biological nature of large-scale scRNA-seq data remains quite challenging. RESULTS: Here, we propose a single-cell Deep Adaptive Clustering (scDAC) model by coupling the Autoencoder (AE) and the Dirichlet Process Mixture Model (DPMM). By jointly optimizing the model parameters of AE and DPMM, scDAC achieves adaptive clustering with accurate cluster numbers on scRNA-seq data. We verify the performance of scDAC on five subsampled datasets with different numbers of cell types and compare it with 15 widely used clustering methods across nine scRNA-seq datasets. Our results demonstrate that scDAC can adaptively find accurate numbers of cell types or subtypes and outperforms other methods. Moreover, the performance of scDAC is robust to hyperparameter changes. AVAILABILITY AND IMPLEMENTATION: scDAC is implemented in Python. The source code is available at https://github.com/labomics/scDAC.
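In a DPMM, the number of clusters can adapt to the data because the model places a prior over an unbounded sequence of mixture weights. The sketch below shows the generic truncated stick-breaking construction of those weights. It is a textbook illustration, not scDAC's implementation, and the truncation level and concentration parameter `alpha` are arbitrary choices.

```python
import random


def stick_breaking_weights(alpha, truncation, rng=None):
    """Draw mixture weights from a truncated Dirichlet process prior.

    Each step breaks off a Beta(1, alpha) fraction of the stick that remains;
    the final component absorbs whatever is left so the weights sum to 1.
    """
    rng = rng or random.Random()
    weights, remaining = [], 1.0
    for _ in range(truncation - 1):
        v = rng.betavariate(1.0, alpha)
        weights.append(remaining * v)
        remaining *= 1.0 - v
    weights.append(remaining)
    return weights


w = stick_breaking_weights(alpha=1.0, truncation=20, rng=random.Random(0))
print([round(x, 3) for x in w[:5]])
```

Smaller values of `alpha` tend to put most of the mass on the first few components, so fewer clusters end up occupied; in a full model these weights are combined with per-cluster likelihoods over the latent codes.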


Subject(s)
Single-Cell Analysis , Transcriptome , Single-Cell Analysis/methods , Cluster Analysis , Transcriptome/genetics , Humans , Algorithms , Sequence Analysis, RNA/methods , Gene Expression Profiling/methods , Software
3.
Database (Oxford) ; 2024, 2024 Feb 12.
Article in English | MEDLINE | ID: mdl-38345567

ABSTRACT

Detecting changes in the dynamics of secreted proteins in serum has been a challenge for proteomics. Here we present the secreted protein database (SEPDB), an integrated secretory proteomics database offering human, mouse and rat secretory proteomics datasets collected from serum, exosomes and cell culture media. SEPDB compiles secreted protein information from the secreted protein database, UniProt and Human Protein Atlas databases to annotate secreted proteomics data based on protein subcellular localization and disease markers. SEPDB integrates the latest predictive modeling techniques to measure deviations in the distribution of signal peptide structures of secreted proteins, extends signal peptide sequence prediction by excluding transmembrane structural domain proteins and updates the validation analysis pipeline for secreted proteins. To establish tissue-specific profiles, we have also created secreted proteomics datasets associated with different human tissues. In addition, we provide information on heterogeneous receptor network organizational relationships, reflecting the complex functional information inherent in the molecular structures of secreted proteins that serve as ligands. Users can take advantage of the refreshed Search, Analyze, Browse and Download functions of SEPDB, which is available online at https://sysomics.com/SEPDB/. Database URL: https://sysomics.com/SEPDB/.


Subject(s)
Proteins , Proteomics , Animals , Mice , Rats , Humans , Databases, Protein , Proteins/chemistry , Proteomics/methods , Protein Sorting Signals
4.
Cell Death Dis ; 15(1): 9, 2024 01 05.
Article in English | MEDLINE | ID: mdl-38182571

ABSTRACT

Chromatin accessibility plays important roles in revealing the regulatory networks of gene expression, while its application in bladder cancer is yet to be fully elucidated. Chloride intracellular channel 3 (CLIC3) protein has been reported to be associated with the progression of some tumors, whereas the specific mechanism of CLIC3 in tumors remains unclear. Here, we screened for key genes in bladder cancer by identifying transcription factor binding site clustered regions (TFCRs) on the basis of chromatin accessibility and TF motifs. CLIC3 was identified by jointly profiling chromatin accessibility data with the TCGA database. Clinically, CLIC3 expression was significantly elevated in bladder cancer and was negatively correlated with patient survival. CLIC3 promoted the proliferation of bladder cancer cells by reducing p21 expression in vitro and in vivo. Mechanistically, CLIC3 interacted with NAT10 and inhibited its function, resulting in the downregulation of ac4C modification and reduced stability of p21 mRNA. Overall, these findings uncover a novel mechanism of mRNA ac4C modification, and CLIC3 may act as a potential therapeutic target for bladder cancer.


Subject(s)
Urinary Bladder Neoplasms , Humans , Chloride Channels/genetics , Chromatin , N-Terminal Acetyltransferases , RNA, Messenger/genetics , Urinary Bladder , Urinary Bladder Neoplasms/genetics
5.
Nat Biotechnol ; 2024 Jan 23.
Article in English | MEDLINE | ID: mdl-38263515

ABSTRACT

Integrating single-cell datasets produced by multiple omics technologies is essential for defining cellular heterogeneity. Mosaic integration, in which different datasets share only some of the measured modalities, poses major challenges, particularly regarding modality alignment and batch effect removal. Here, we present a deep probabilistic framework for the mosaic integration and knowledge transfer (MIDAS) of single-cell multimodal data. MIDAS simultaneously achieves dimensionality reduction, imputation and batch correction of mosaic data by using self-supervised modality alignment and information-theoretic latent disentanglement. We demonstrate its superiority to 19 other methods and reliability by evaluating its performance in trimodal and mosaic integration tasks. We also constructed a single-cell trimodal atlas of human peripheral blood mononuclear cells and tailored transfer learning and reciprocal reference mapping schemes to enable flexible and accurate knowledge transfer from the atlas to new data. Applications in mosaic integration, pseudotime analysis and cross-tissue knowledge transfer on bone marrow mosaic datasets demonstrate the versatility and superiority of MIDAS. MIDAS is available at https://github.com/labomics/midas .

6.
Cancer Gene Ther ; 31(3): 439-453, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38146007

ABSTRACT

Recurrence and extraocular metastasis in advanced intraocular retinoblastoma (RB) remain major obstacles to the successful treatment of Chinese children. Tuberous sclerosis complex (TSC) is a very rare, multisystemic genetic disorder characterized by hamartomatous growth. In this study, we aimed to compare the genomic and epigenomic profiles of patients with RB or TSC using recently developed nanopore sequencing, and to identify disease-associated variations or genes. Peripheral blood samples were collected from RB or RB/TSC patients and their unaffected siblings, followed by nanopore sequencing and identification of disease-specific structural variations (SVs) and differentially methylated regions (DMRs) using a systems biology strategy named the multiomics-based joint screening framework. In total, 316 RB- and 1295 TSC-unique SVs were identified, as well as 1072 RB- and 1114 TSC-associated DMRs. We eventually identified 6 key genes for RB for further functional validation. Knockdown of CDK19 with specific siRNAs significantly inhibited Y79 cellular proliferation and increased sensitivity to carboplatin, whereas downregulation of AHNAK2 promoted cell growth as well as drug resistance. These two genes might serve as potential diagnostic markers or therapeutic targets for RB. The systems biology strategy combined with functional validation might be an effective approach for rare pediatric malignancies with limited samples and challenging collection processes.


Subject(s)
Nanopore Sequencing , Retinal Neoplasms , Retinoblastoma , Tuberous Sclerosis , Child , Humans , Retinoblastoma/genetics , Tuberous Sclerosis/diagnosis , Tuberous Sclerosis/genetics , Epigenomics , Genomics , Retinal Neoplasms/genetics , Retinal Neoplasms/pathology , Cyclin-Dependent Kinases
7.
Nat Commun ; 14(1): 8282, 2023 Dec 13.
Article in English | MEDLINE | ID: mdl-38092772

ABSTRACT

Structural variants (SVs), accounting for a larger fraction of the genome than SNPs/InDels, are an important pool of genetic variation, enabling environmental adaptations. Here, we generate long-read sequencing data for 320 Tibetan and Han samples and show that SVs are highly involved in high-altitude adaptation. We expand the landscape of global SVs, apply robust models of selection and population differentiation combining SVs, SNPs and InDels, and use epigenomic analyses to predict enhancers, target genes and biological functions. We reveal diverse Tibetan-specific SVs affecting the regulatory circuitry of biological functions, including the hypoxia response, energy metabolism and pulmonary function. Using enhancer reporter, cellular knock-out and DNA pull-down assays, we find that a Tibetan-specific deletion disrupts a super-enhancer and downregulates EPAS1. Our study expands the global SV landscape, reveals the role of gene-regulatory circuitry rewiring in human adaptation, and illustrates the diverse functional roles of SVs in human biology.


Subject(s)
Altitude , Genome , Humans , Hypoxia/genetics , Sequence Analysis, DNA , Adaptation, Physiological/genetics
8.
J Adv Res ; 2023 Dec 02.
Article in English | MEDLINE | ID: mdl-38043609

ABSTRACT

INTRODUCTION: Synthetic lethality (SL) provides an opportunity to leverage different genetic interactions when designing synergistic combination therapies. To further explore SL-based combination therapies for cancer treatment, it is important to identify and mechanistically characterize more SL interactions. Artificial intelligence (AI) methods have recently been proposed for SL prediction, but the results of these models are often not interpretable such that deriving the underlying mechanism can be challenging. OBJECTIVES: This study aims to develop an interpretable AI framework for SL prediction and subsequently utilize it to design SL-based synergistic combination therapies. METHODS: We propose a knowledge and data dual-driven AI framework for SL prediction (KDDSL). Specifically, we use gene knowledge related to the SL mechanism to guide the construction of the model and develop a method to identify the most relevant gene knowledge for the predicted results. RESULTS: Experimental and literature-based validation confirmed a good balance between predictive and interpretable ability when using KDDSL. Moreover, we demonstrated that KDDSL could help to discover promising drug combinations and clarify associated biological processes, such as the combination of MDM2 and CDK9 inhibitors, which exhibited significant anti-cancer effects in vitro and in vivo. CONCLUSION: These data underscore the potential of KDDSL to guide SL-based combination therapy design. There is a need for biomedicine-focused AI strategies to combine rational biological knowledge with developed models.

9.
Bioinformatics ; 39(12), 2023 12 01.
Article in English | MEDLINE | ID: mdl-37995293

ABSTRACT

SUMMARY: A variety of computational methods have been developed to identify functionally related gene modules from genome-wide gene expression profiles. Integrating the results of these methods to identify consensus modules is a promising approach to produce more accurate and robust results. In this application note, we introduce COMMO, the first web server to identify and analyze consensus functionally related gene modules from different module detection methods. First, COMMO implements eight state-of-the-art module detection methods and two consensus clustering algorithms. Second, COMMO provides users with mRNA and protein expression data for 33 cancer types from three public databases. Users can also upload their own data for module detection. Third, users can perform functional enrichment and two types of survival analyses on the identified gene modules. Finally, COMMO provides interactive, customizable visualizations and exportable results. With its extensive analysis and interactive capabilities, COMMO offers a user-friendly solution for conducting module-based precision medicine research. AVAILABILITY AND IMPLEMENTATION: The COMMO web server is available at https://commo.ncpsb.org.cn/, with the source code available on GitHub: https://github.com/Song-xinyu/COMMO/tree/master.
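The abstract does not name COMMO's two consensus clustering algorithms. One common approach, evidence accumulation, counts how often each pair of genes lands in the same module across methods and then links pairs whose co-association exceeds a threshold. The sketch below assumes that approach and a threshold of 0.5; the function names are hypothetical and it is not COMMO's code.

```python
def co_association(labelings):
    """Fraction of input clusterings in which each pair of genes shares a module."""
    n, m = len(labelings[0]), len(labelings)
    C = [[0.0] * n for _ in range(n)]
    for labels in labelings:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    C[i][j] += 1.0 / m
    return C


def consensus_modules(labelings, threshold=0.5):
    """Merge genes whose co-association exceeds `threshold` (connected components)."""
    C = co_association(labelings)
    n = len(C)
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if C[i][j] > threshold:
                parent[find(i)] = find(j)
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Module labels for four genes from three hypothetical detection methods.
print(consensus_modules([[0, 0, 1, 1], [0, 0, 0, 1], [1, 1, 2, 2]]))
```

Label values only need to be consistent within one method, so methods that number their modules differently can be combined directly.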


Subject(s)
Gene Regulatory Networks , Software , Consensus , Algorithms , Computers
10.
Commun Biol ; 6(1): 989, 2023 09 27.
Article in English | MEDLINE | ID: mdl-37758874

ABSTRACT

Cellular transitions hold great promise in translational medicine research. However, therapeutic applications are limited by the low efficiency and safety concerns of using transcription factors. Small molecules provide a temporal and highly tunable approach to overcome these issues. Here, we present PC3T, a computational framework to enrich molecules that induce desired cellular transitions; PC3T consistently enriched small molecules that had been experimentally validated in both bulk and single-cell datasets. We then predicted small molecules for reprogramming fibroblasts into hepatic progenitor-like cells (HPLCs). The converted cells exhibited epithelial cell-like morphology and HPLC-like gene expression patterns. Hepatic functions such as glycogen storage and lipid accumulation were also observed. Finally, we collected and manually curated a cell state transition resource containing 224 time-course gene expression datasets and 153 cell types. Our framework, together with the data resource, is freely available at http://pc3t.idrug.net.cn/ . We believe that PC3T is a powerful tool to promote chemical-induced cell state transitions.


Subject(s)
Cellular Reprogramming , Fibroblasts , Fibroblasts/metabolism , Stem Cells/metabolism , Transcription Factors/metabolism , Epithelial Cells/metabolism
11.
Commun Biol ; 6(1): 901, 2023 09 02.
Article in English | MEDLINE | ID: mdl-37660148

ABSTRACT

Early embryonic development is a dynamic process that relies on proper cell-cell communication to form a correctly patterned embryo. Early embryo development-related ligand-receptor pairs (eLRs) have been shown to guide cell fate decisions and morphogenesis. However, the scope of eLRs and their influence on early embryo development remain elusive. Here, we developed a computational framework named TimeTalk, built on integrated public time-course mouse scRNA-seq datasets, to decipher eLRs. Extensive validations and analyses were performed to confirm the involvement of the identified eLRs in early embryo development. Process analysis showed that eLRs could be divided into six temporal windows corresponding to sequential events in early embryo development. With its interpolation strategy, TimeTalk is powerful in revealing paracrine settings and studying cell-cell communication during early embryo development. Furthermore, by applying TimeTalk to the blastocyst and blastoid models, we found that the blastoid models share the core communication pathways with the epiblast and primitive endoderm lineages in the blastocysts. This result suggests that TimeTalk is transferable to other bio-dynamic processes. We also curated the eLRs recognized by TimeTalk, which may provide valuable clues for understanding early embryo development and relevant disorders.


Subject(s)
Cell Communication , Single-Cell Gene Expression Analysis , Female , Pregnancy , Animals , Mice , Cell Communication/genetics , Embryonic Development/genetics , Morphogenesis , Blastocyst
12.
iScience ; 26(8): 107378, 2023 Aug 18.
Article in English | MEDLINE | ID: mdl-37559907

ABSTRACT

Cancer is an extremely complex disease, and each type of cancer usually has several different subtypes. Multi-omics data can provide more comprehensive biological information for identifying and discovering cancer subtypes. However, existing unsupervised cancer subtyping methods cannot effectively learn the comprehensive shared and specific information in multi-omics data. Therefore, a novel method is proposed based on shared and specific representation learning. For each omics data type, two autoencoders are applied to extract shared and specific information, respectively. To reduce redundancy and mutual interference, an orthogonality constraint is introduced to separate shared and specific information. In addition, contrastive learning is applied to align the shared information and strengthen its consistency. Finally, the shared and specific information obtained for all samples is used for clustering to achieve cancer subtyping. Experimental results demonstrate that the proposed method can effectively capture shared and specific information from multi-omics data and outperforms other state-of-the-art methods on cancer subtyping.
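The abstract does not give the form of its orthogonality constraint. A common choice, assumed here rather than taken from the paper, penalizes the squared Frobenius norm of the product of the transposed shared representation matrix with the specific one (samples as rows); the penalty is zero exactly when every shared latent dimension is orthogonal, across samples, to every specific latent dimension. The function name is hypothetical.

```python
def orthogonality_penalty(shared, specific):
    """Squared Frobenius norm of shared^T @ specific.

    `shared` and `specific` are lists of rows, one row per sample, holding
    the shared and specific latent codes of the same omics type.
    """
    if len(shared) != len(specific):
        raise ValueError("both representations must cover the same samples")
    total = 0.0
    for a in range(len(shared[0])):
        for b in range(len(specific[0])):
            # Inner product over samples of shared dimension a and specific dimension b.
            dot = sum(s_row[a] * p_row[b] for s_row, p_row in zip(shared, specific))
            total += dot ** 2
    return total


print(orthogonality_penalty([[1.0, 0.0], [0.0, 1.0]], [[0.5], [0.5]]))
```

During training, a term such as `lam * orthogonality_penalty(S, P)` would typically be added to the reconstruction and contrastive losses; the weight `lam` is hypothetical.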

13.
Lancet Reg Health West Pac ; 36: 100779, 2023 Jul.
Article in English | MEDLINE | ID: mdl-37547044

ABSTRACT

Background: Stroke ranks second worldwide and first in China as a leading cause of death and disability. It has a polygenic architecture and is influenced by environmental and lifestyle factors. However, it remains unknown whether, and to what extent, the genetic predisposition to stroke is associated with disease burden. Methods: Allele frequencies from the whole genome sequencing data of 141,418 individuals in the Chinese Millionome Database and trait-specific polygenic risk score models were applied to estimate the provincial genetic predisposition to stroke, stroke-related risk factors and stroke-related drug response. Disease burden, including mortality, disability-adjusted life years (DALYs), years of life lost (YLLs), years lived with disability (YLDs) and prevalence in China, was collected from the Global Burden of Disease study. The association between stroke genetic predisposition and the epidemiological burden was assessed and then quantified in both regression-based and machine learning-based models at provincial resolution. Findings: Among the 30 administrative divisions in China, the genetic predisposition to stroke followed a north-higher-than-south gradient (p < 0.0001). Genetic predispositions to stroke, blood pressure, body mass index, and alcohol use were strongly intercorrelated (rho > 0.6; p < 0.05 after Bonferroni correction for each comparison). Genetic risk imposed an independent effect of approximately 1-6% on mortality, DALYs and YLLs. Interpretation: The distribution of stroke genetic predisposition varies at a macroscopic level, and it subtly but significantly impacts the epidemiological burden. Further research is warranted to identify the detailed aetiology and its potential translation into public health measures.
Funding: Beijing Municipal Science and Technology Commission (Z191100006619106), CAMS Innovation Fund for Medical Sciences (CAMS-I2M, 2023-I2M-1-001), the National High Level Hospital Clinical Research Funding (2022-GSP-GG-17), National Natural Science Foundation of China (32000398, 32171441 to X.J.), Natural Science Foundation of Guangdong Province, China (2017A030306026 to X.J.), and National Key R&D Program of China (2022YFC2502402).
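The Methods apply polygenic risk score (PRS) models to population allele frequencies rather than to individual genotypes. For an additive PRS, the expected dosage of an effect allele with frequency p is 2p, so the population-mean score reduces to a weighted sum over variants. The sketch assumes such an additive model with per-allele effect sizes; the function name and all numbers are hypothetical, and the paper's exact estimation procedure may differ.

```python
def expected_prs(effect_sizes, allele_freqs):
    """Population-mean additive PRS: sum over variants of beta_i * 2 * p_i.

    beta_i is the per-allele effect size of variant i and p_i is the
    frequency of its effect allele in the population; 2 * p_i is the
    expected number of effect alleles carried by a diploid individual.
    """
    if len(effect_sizes) != len(allele_freqs):
        raise ValueError("need one allele frequency per effect size")
    return sum(b * 2.0 * p for b, p in zip(effect_sizes, allele_freqs))


# Hypothetical effect sizes and effect-allele frequencies for two regions.
betas = [0.12, 0.05, -0.08]
region_a = [0.30, 0.55, 0.20]
region_b = [0.22, 0.50, 0.25]
print(expected_prs(betas, region_a) - expected_prs(betas, region_b))
```

Comparing such regional means is one way a geographic gradient in genetic predisposition could be expressed before relating it to burden measures such as DALYs.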

14.
BMC Bioinformatics ; 24(1): 325, 2023 Aug 29.
Article in English | MEDLINE | ID: mdl-37644423

ABSTRACT

INTRODUCTION: There are countless possible drug combinations, which makes it expensive and time-consuming to rely solely on clinical trials to determine the effect of each one. To identify the most effective drug combinations more quickly, researchers began to apply machine learning to drug combination prediction. However, most of these models have low interpretability. Consequently, even when they produce high prediction accuracy, experts in the medical and biological fields still cannot fully rely on their judgments because the decision-making process is opaque. RELATED WORK: Decision trees and their ensemble algorithms are considered suitable for pharmaceutical applications due to their excellent performance and good interpretability. We review existing decision tree and decision tree ensemble algorithms in the medical field and point out their shortcomings. METHOD: This study proposes a decision stump (DS)-based solution to extract interpretable knowledge from data sets. In this method, a set of DSs is first generated and then selectively combined to form a decision tree (DST). Unlike a traditional decision tree, our algorithm not only enables a partial exchange of information between base classifiers by introducing a stump exchange method, but also uses a modified Gini index to evaluate stump performance, so that the generation of each node is evaluated from a global view to maintain high generalization ability. Furthermore, these trees are combined to construct an ensemble of DSTs (EDST). EXPERIMENT: Two-drug combination data sets collected from two cell lines with three classes (additive, antagonistic and synergistic effects) are used to test our method. Experimental results show that both DST and EDST perform better than other methods. In addition, the rules generated by our methods are more compact and more accurate than those of other rule-based algorithms. Finally, we analyze the knowledge extracted by the model from a bioinformatics perspective. CONCLUSION: The novel decision tree ensemble model can effectively predict the effects of drug combinations while keeping its decision-making process easy to inspect.
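A decision stump is a one-level tree that splits the samples on a single feature threshold. The sketch picks the split that minimizes the standard weighted Gini impurity. The paper uses a modified Gini index and a stump exchange step that the abstract does not specify, so neither is reproduced here; the function names and toy data are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Standard Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_stump(X, y):
    """Return (feature, threshold, weighted_gini) for the best single split x[f] <= t."""
    best = None
    n = len(y)
    for f in range(len(X[0])):
        for t in sorted(set(row[f] for row in X)):
            left = [y[i] for i in range(n) if X[i][f] <= t]
            right = [y[i] for i in range(n) if X[i][f] > t]
            if not left or not right:
                continue  # a split must send samples to both sides
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[2]:
                best = (f, t, score)
    return best


# Hypothetical two-feature drug-pair descriptors with effect classes.
X = [[1, 5], [2, 5], [3, 1], [4, 1]]
y = ["synergistic", "synergistic", "antagonistic", "antagonistic"]
print(best_stump(X, y))
```

Each stump is directly readable as a rule such as "feature 0 <= 2 implies synergistic", which is the kind of interpretability the abstract emphasizes.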


Subject(s)
Algorithms , Computational Biology , Cell Line , Drug Combinations , Knowledge
15.
Genome Res ; 33(8): 1381-1394, 2023 08.
Article in English | MEDLINE | ID: mdl-37524436

ABSTRACT

Accurately measuring biological age is crucial for improving healthcare for the elderly population. However, the complexity of aging biology poses challenges in how to robustly estimate aging and interpret the biological significance of the traits used for estimation. Here we present SCALE, a statistical pipeline that quantifies biological aging in different tissues using explainable features learned from literature and single-cell transcriptomic data. Applying SCALE to the "Mouse Aging Cell Atlas" (Tabula Muris Senis) data, we identified tissue-level transcriptomic aging programs for more than 20 murine tissues and created a multitissue resource of mouse quantitative aging-associated genes. We observe that SCALE correlates well with other age indicators, such as the accumulation of somatic mutations, and can distinguish subtle differences in aging even in cells of the same chronological age. We further compared SCALE with other transcriptomic and methylation "clocks" in data from aging muscle stem cells, Alzheimer's disease, and heterochronic parabiosis. Our results confirm that SCALE is more generalizable and reliable in assessing biological aging in aging-related diseases and rejuvenating interventions. Overall, SCALE represents a valuable advancement in our ability to measure aging accurately, robustly, and interpretably in single cells.


Subject(s)
Aging , Transcriptome , Animals , Mice , Aging/genetics , Gene Expression Profiling , Phenotype , Models, Biological
16.
J Chem Inf Model ; 63(12): 3941-3954, 2023 06 26.
Article in English | MEDLINE | ID: mdl-37303117

ABSTRACT

Combination therapy is a promising clinical treatment strategy for cancer and other complex diseases. Multiple drugs can target multiple proteins and pathways, greatly improving the therapeutic effect and slowing the development of drug resistance. To narrow the search space of synergistic drug combinations, many prediction models have been developed. However, drug combination datasets are typically class-imbalanced: synergistic drug combinations receive the most attention in clinical application but are few in number. To predict synergistic drug combinations in different cancer cell lines, in this study, we propose a genetic algorithm-based ensemble learning framework, GA-DRUG, to address the problems of class imbalance and high dimensionality of the input data. GA-DRUG is trained on cell-line-specific gene expression profiles under drug perturbations and combines imbalanced data processing with a search for globally optimal solutions. Compared to 11 state-of-the-art algorithms, GA-DRUG achieves the best performance and significantly improves prediction performance on the minority class (Synergy). The ensemble framework can effectively correct the classification results of a single classifier. In addition, cellular proliferation experiments on several previously unexplored drug combinations further confirm the predictive ability of GA-DRUG.
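The abstract does not say what GA-DRUG's genetic algorithm encodes. As a generic illustration only, the sketch evolves binary feature masks, one way a genetic algorithm can address high-dimensional input, using truncation selection, one-point crossover and bit-flip mutation. The function name, its defaults and the `fitness` callable are hypothetical; in practice the fitness might be, for example, a minority-class score of a classifier trained on the selected features.

```python
import random


def genetic_feature_selection(fitness, n_features, pop_size=20, generations=30,
                              mutation_rate=0.05, seed=0):
    """Evolve boolean feature masks that maximize `fitness(mask)`.

    Requires n_features >= 2 (for crossover) and pop_size >= 4 (for parent sampling).
    """
    rng = random.Random(seed)
    pop = [[rng.random() < 0.5 for _ in range(n_features)] for _ in range(pop_size)]
    for _ in range(generations):
        # Truncation selection: the better half survives unchanged (elitism).
        parents = sorted(pop, key=fitness, reverse=True)[: pop_size // 2]
        children = []
        while len(children) < pop_size - len(parents):
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_features)  # one-point crossover position
            child = a[:cut] + b[cut:]
            # Bit-flip mutation: toggle each gene with probability mutation_rate.
            child = [(not g) if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)


# Toy fitness: agreement with a hypothetical "ideal" mask of 10 features.
target = [True, False] * 5
best = genetic_feature_selection(lambda m: sum(x == t for x, t in zip(m, target)), 10)
print(best)
```

Because the best half of each generation is carried over unchanged, the best fitness found never decreases from one generation to the next.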


Subject(s)
Algorithms , Neoplasms , Humans , Drug Combinations , Neoplasms/drug therapy , Proteins , Machine Learning
17.
Nat Commun ; 14(1): 2631, 2023 05 06.
Article in English | MEDLINE | ID: mdl-37149708

ABSTRACT

Although long-read single-cell RNA isoform sequencing (scISO-Seq) can reveal alternative RNA splicing in individual cells, it suffers from a low read throughput. Here, we introduce HIT-scISOseq, a method that removes most artifact cDNAs and concatenates multiple cDNAs for PacBio circular consensus sequencing (CCS) to achieve high-throughput and high-accuracy single-cell RNA isoform sequencing. HIT-scISOseq can yield >10 million high-accuracy long-reads in a single PacBio Sequel II SMRT Cell 8M. We also report the development of scISA-Tools that demultiplex HIT-scISOseq concatenated reads into single-cell cDNA reads with >99.99% accuracy and specificity. We apply HIT-scISOseq to characterize the transcriptomes of 3375 corneal limbus cells and reveal cell-type-specific isoform expression in them. HIT-scISOseq is a high-throughput, high-accuracy, technically accessible method and it can accelerate the burgeoning field of long-read single-cell transcriptomics.


Subject(s)
RNA Isoforms , RNA , RNA Isoforms/genetics , High-Throughput Nucleotide Sequencing/methods , Consensus , Protein Isoforms/genetics , Sequence Analysis, DNA/methods , Sequence Analysis, RNA
18.
Genome Biol ; 24(1): 90, 2023 04 24.
Article in English | MEDLINE | ID: mdl-37095580

ABSTRACT

BACKGROUND: DNA double-strand breaks (DSBs) are among the most deleterious DNA lesions, and they can cause cancer if improperly repaired. Recent chromosome conformation capture techniques, such as Hi-C, have enabled the identification of relationships between the 3D chromatin structure and DSBs, but little is known about how to explain these relationships, especially from global contact maps, or about their contributions to DSB formation. RESULTS: Here, we propose a framework that integrates a graph neural network (GNN) to unravel the relationship between 3D chromatin structure and DSBs using the interpretability technique GNNExplainer. We identify a new chromatin structural unit named the DNA fragility-associated chromatin interaction network (FaCIN). FaCIN is a bottleneck-like structure, and it helps reveal a universal pattern of how the fragility of a piece of DNA might be affected by the whole genome through chromatin interactions. Moreover, we demonstrate that neck interactions in FaCIN can serve as chromatin structural determinants of DSB formation. CONCLUSIONS: Our study provides a more systematic and refined view, enabling a better understanding of the mechanisms of DSB formation in the context of the 3D genome.


Subject(s)
Chromatin , DNA Repair , DNA , DNA Breaks, Double-Stranded , DNA-Binding Proteins/metabolism
19.
Comput Struct Biotechnol J ; 21: 1807-1819, 2023.
Article in English | MEDLINE | ID: mdl-36923471

ABSTRACT

Established taxonomy systems based on disease symptoms and tissue characteristics have provided an important basis for physicians to correctly identify diseases and treat them successfully. However, these classifications tend to be based on phenotypic observations and lack a molecular biological foundation. Therefore, there is an urgent need to integrate multi-dimensional molecular biological information or multi-omics data to redefine disease classification and provide a powerful perspective for understanding the molecular structure of diseases. Here, we offer a flexible disease classification that integrates the biological processes, gene expression, and symptom phenotypes of diseases, and propose a disease-disease association network based on multi-view fusion. We applied the fusion approach to 223 diseases and divided them into 24 disease clusters. The contributions of the internal and external edges of the disease clusters were analyzed. The results of the fusion model were compared with Medical Subject Headings, a traditional and commonly used disease taxonomy. A comparison of model performance shows that our approach performs better than other integration methods. The obtained clusters also revealed more interesting and novel disease-disease associations. This multi-view human disease association network describes relationships between diseases at multiple molecular levels, thus overcoming the limitations of disease classification systems based on tissues and organs. This approach, which motivates clinicians and researchers to reconsider their understanding of diseases and to explore diagnosis and therapy strategies, extends the existing disease taxonomy. Availability of data and materials: The preprocessed dataset and source code supporting the conclusions of this article are available at the GitHub repository https://github.com/yangxiaoxi89/mvHDN.

20.
Cell Rep Methods ; 3(2): 100411, 2023 02 27.
Article in English | MEDLINE | ID: mdl-36936075

ABSTRACT

Combination therapy is a promising approach to treating multiple complex diseases. However, the large search space of possible drug combinations makes experimental screening challenging. To predict synergistic drug combinations in different cancer cell lines, we propose an improved deep forest-based method, ForSyn, and design two forest types embedded in ForSyn. ForSyn handles imbalanced and high-dimensional data in medium- and small-scale datasets, which are inherent characteristics of drug combination datasets. Compared with 12 state-of-the-art methods, ForSyn ranks first on four metrics for eight datasets with different feature combinations. We conduct a systematic analysis to identify the most appropriate configuration parameters. We validate the predictive value of ForSyn with cell-based experiments on several previously unexplored drug combinations. Finally, a systematic analysis of feature importance is performed on the top contributing features extracted by ForSyn. The identified key genes may play important roles in the corresponding cancers.


Subject(s)
Computational Biology , Neoplasms , Humans , Computational Biology/methods , Neoplasms/drug therapy , Drug Combinations , Cell Line