1.
Trends Genet ; 39(1): 1-4, 2023 01.
Article in English | MEDLINE | ID: mdl-35934594

ABSTRACT

Ionizing radiation (IR)-induced DNA damage and repair are complex and occur at hierarchical chromatin structures; radiobiology needs to be studied from a 3D-genomic perspective. Differences in IR damage and repair throughout the 3D genome may help to explain differences in radiosensitivity.


Subject(s)
DNA Damage , DNA Repair , DNA Repair/genetics , DNA Damage/genetics , Radiation, Ionizing , Radiation Tolerance/genetics , Genomics
2.
Genome Res ; 33(8): 1381-1394, 2023 08.
Article in English | MEDLINE | ID: mdl-37524436

ABSTRACT

Accurately measuring biological age is crucial for improving healthcare for the elderly population. However, the complexity of aging biology poses challenges in how to robustly estimate aging and interpret the biological significance of the traits used for estimation. Here we present SCALE, a statistical pipeline that quantifies biological aging in different tissues using explainable features learned from literature and single-cell transcriptomic data. Applying SCALE to the "Mouse Aging Cell Atlas" (Tabula Muris Senis) data, we identified tissue-level transcriptomic aging programs for more than 20 murine tissues and created a multitissue resource of mouse quantitative aging-associated genes. We observe that SCALE correlates well with other age indicators, such as the accumulation of somatic mutations, and can distinguish subtle differences in aging even in cells of the same chronological age. We further compared SCALE with other transcriptomic and methylation "clocks" in data from aging muscle stem cells, Alzheimer's disease, and heterochronic parabiosis. Our results confirm that SCALE is more generalizable and reliable in assessing biological aging in aging-related diseases and rejuvenating interventions. Overall, SCALE represents a valuable advancement in our ability to measure aging accurately, robustly, and interpretably in single cells.


Subject(s)
Aging , Transcriptome , Animals , Mice , Aging/genetics , Gene Expression Profiling , Phenotype , Models, Biological
3.
Brief Bioinform ; 25(4)2024 May 23.
Article in English | MEDLINE | ID: mdl-38935071

ABSTRACT

Advances in chromatin mapping have exposed the complex chromatin hierarchical organization in mammals, including topologically associating domains (TADs) and their substructures, yet the functional implications of this hierarchy in gene regulation and disease progression are not fully elucidated. Our study delves into the phenomenon of shared TAD boundaries, which are pivotal in maintaining the hierarchical chromatin structure and regulating gene activity. By integrating high-resolution Hi-C data, chromatin accessibility, and DNA double-strand breaks (DSBs) data from various cell lines, we systematically explore the complex regulatory landscape at high-level TAD boundaries. Our findings indicate that these boundaries are not only key architectural elements but also vibrant hubs, enriched with functionally crucial genes and complex transcription factor binding site-clustered regions. Moreover, they exhibit a pronounced enrichment of DSBs, suggesting a nuanced interplay between transcriptional regulation and genomic stability. Our research provides novel insights into the intricate relationship between the 3D genome structure, gene regulation, and DNA repair mechanisms, highlighting the role of shared TAD boundaries in maintaining genomic integrity and resilience against perturbations. The implications of our findings extend to understanding the complexities of genomic diseases and open new avenues for therapeutic interventions targeting the structural and functional integrity of TAD boundaries.


Subject(s)
Chromatin , DNA Breaks, Double-Stranded , DNA Repair , Gene Expression Regulation , Humans , Chromatin/metabolism , Chromatin/genetics , Transcription Factors/metabolism , Transcription Factors/genetics , Animals , Genomics/methods , Genomic Instability , Chromatin Assembly and Disassembly
4.
Mol Cell ; 71(2): 306-318.e7, 2018 07 19.
Article in English | MEDLINE | ID: mdl-30017583

ABSTRACT

DNA N6-methyladenine (6mA) modification is the most prevalent DNA modification in prokaryotes, but whether it exists in human cells and whether it plays a role in human diseases remain enigmatic. Here, we showed that 6mA is extensively present in the human genome, and we cataloged 881,240 6mA sites accounting for ∼0.051% of the total adenines. [G/C]AGG[C/T] was the most significantly associated motif with 6mA modification. 6mA sites were enriched in the coding regions and mark actively transcribed genes in human cells. DNA 6mA and N6-demethyladenine modification in the human genome were mediated by methyltransferase N6AMT1 and demethylase ALKBH1, respectively. The abundance of 6mA was significantly lower in cancers, accompanied by decreased N6AMT1 and increased ALKBH1 levels, and downregulation of 6mA modification levels promoted tumorigenesis. Collectively, our results demonstrate that DNA 6mA modification is extensively present in human cells and the decrease of genomic DNA 6mA promotes human tumorigenesis.
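The motif reported above, [G/C]AGG[C/T], translates directly into a regular expression. The following is a minimal sketch of how candidate motif positions could be located on one strand of a sequence. The function name `find_motif_sites` and the example sequence are illustrative assumptions and do not come from the study, and a motif match alone does not imply that a site is actually methylated.

```python
import re

# [G/C]AGG[C/T] from the abstract, written as a regex character-class pattern.
# The lookahead lets overlapping occurrences be reported as separate hits.
MOTIF = re.compile(r"(?=([GC]AGG[CT]))")


def find_motif_sites(seq):
    """Return (start, matched_text) for each motif hit on the given strand.

    Hypothetical helper for illustration; positions are 0-based and the
    reverse-complement strand is not scanned.
    """
    seq = seq.upper()
    return [(m.start(), m.group(1)) for m in MOTIF.finditer(seq)]


# Made-up example sequence, not real genomic data.
hits = find_motif_sites("TTGAGGCAACAGGTGG")
print(hits)
```

Scanning the reverse strand would require running the same function on the reverse complement of the sequence.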


Subject(s)
Adenine/analogs & derivatives , AlkB Homolog 1, Histone H2a Dioxygenase/metabolism , Genome, Human , Site-Specific DNA-Methyltransferase (Adenine-Specific)/metabolism , Adenine/metabolism , AlkB Homolog 1, Histone H2a Dioxygenase/genetics , Animals , Carcinogenesis/genetics , DNA/genetics , DNA Methylation , Heterografts , Humans , Mice , Mice, Nude , Site-Specific DNA-Methyltransferase (Adenine-Specific)/genetics
5.
Nucleic Acids Res ; 2024 May 30.
Article in English | MEDLINE | ID: mdl-38813828

ABSTRACT

Gene expression is temporally and spatially regulated by the interaction of transcription factors (TFs) and cis-regulatory elements (CREs). The uneven distribution of TF binding sites across the genome poses challenges in understanding how this distribution evolves to regulate spatio-temporal gene expression and consequent heritable phenotypic variation. In this study, chromatin accessibility profiles and gene expression profiles were collected from several species including mammals (human, mouse, bovine), fish (zebrafish and medaka), and chicken. Transcription factor binding sites clustered regions (TFCRs) at different embryonic stages were characterized to investigate regulatory evolution. The study revealed dynamic changes in TFCR distribution during embryonic development and species evolution. The synchronization between TFCR complexity and gene expression was assessed across species using RegulatoryScore. Additionally, an explainable machine learning model highlighted the importance of the distance between TFCR and promoter in the coordinated regulation of TFCRs on gene expression. Our results revealed the developmental and evolutionary dynamics of TFCRs during embryonic development from fish, chicken to mammals. These data provide valuable resources for exploring the relationship between transcriptional regulation and phenotypic differences during embryonic development.

6.
Genome Res ; 2022 Aug 17.
Article in English | MEDLINE | ID: mdl-35977841

ABSTRACT

During early mammalian embryo development, different epigenetic marks undergo reprogramming and play crucial roles in the mediation of gene expression. Currently, several databases provide multi-omics information on early embryos. However, how interconnected epigenetic markers function together to coordinate the expression of the genetic code in a spatiotemporal manner remains difficult to analyze, markedly limiting scientific and clinical research. Here, we present dbEmbryo, an integrated and interactive multi-omics database for human and mouse early embryos. dbEmbryo integrates data on gene expression, DNA methylation, histone modifications, chromatin accessibility, and higher-order chromatin structure profiles for human and mouse early embryos. It incorporates customized analysis tools, such as "multi-omics visualization," "Gene&Peak annotation," "ZGA gene cluster," "cis-regulation," "synergistic regulation," "promoter signal enrichment," and "3D genome." Users can retrieve gene expression and epigenetic profile patterns to analyze synergistic changes across different early embryo developmental stages. We showed the uniqueness of dbEmbryo among extant databases containing data on early embryo development and provided an overview. Using dbEmbryo, we obtained a phase-separated model of transcriptional control during early embryo development. dbEmbryo offers web-based analytical tools and a comprehensive resource for biologists and clinicians to decipher molecular regulatory mechanisms of human and mouse early embryo development.

7.
Brief Bioinform ; 24(1)2023 01 19.
Article in English | MEDLINE | ID: mdl-36460622

ABSTRACT

Drug response prediction in cancer cell lines is of great significance in personalized medicine. In this study, we propose GADRP, a cancer drug response prediction model based on graph convolutional networks (GCNs) and autoencoders (AEs). We first use a stacked deep AE to extract low-dimensional representations from cell line features, and then construct a sparse drug cell line pair (DCP) network incorporating drug, cell line, and DCP similarity information. Next, an initial residual and layer attention-based GCN (ILGCN), which can alleviate the over-smoothing problem, is used to learn DCP features. Finally, a fully connected network is employed to make predictions. Benchmarking results demonstrate that GADRP can significantly improve prediction performance on all metrics compared with baselines on five datasets. In particular, experiments predicting unknown DCP responses, drug-cancer tissue associations, and drug-pathway associations illustrate the predictive power of GADRP. All results highlight the effectiveness of GADRP in predicting drug responses, and its potential value in guiding anti-cancer drug selection.


Subject(s)
Antineoplastic Agents , Neoplasms , Humans , Neoplasms/drug therapy , Antineoplastic Agents/pharmacology , Antineoplastic Agents/therapeutic use , Benchmarking , Cell Line , Learning
8.
Bioinformatics ; 40(4)2024 Mar 29.
Article in English | MEDLINE | ID: mdl-38603616

ABSTRACT

MOTIVATION: Clustering analysis for single-cell RNA sequencing (scRNA-seq) data is an important step in revealing cellular heterogeneity. Many clustering methods have been proposed to discover heterogeneous cell types from scRNA-seq data. However, adaptive clustering with an accurate cluster number that reflects the intrinsic biology of large-scale scRNA-seq data remains quite challenging. RESULTS: Here, we propose a single-cell Deep Adaptive Clustering (scDAC) model by coupling the Autoencoder (AE) and the Dirichlet Process Mixture Model (DPMM). By jointly optimizing the model parameters of AE and DPMM, scDAC achieves adaptive clustering with accurate cluster numbers on scRNA-seq data. We verify the performance of scDAC on five subsampled datasets with different numbers of cell types and compare it with 15 widely used clustering methods across nine scRNA-seq datasets. Our results demonstrate that scDAC can adaptively find accurate numbers of cell types or subtypes and outperforms other methods. Moreover, the performance of scDAC is robust to hyperparameter changes. AVAILABILITY AND IMPLEMENTATION: The scDAC is implemented in Python. The source code is available at https://github.com/labomics/scDAC.


Subject(s)
Single-Cell Analysis , Transcriptome , Single-Cell Analysis/methods , Cluster Analysis , Transcriptome/genetics , Humans , Algorithms , Sequence Analysis, RNA/methods , Gene Expression Profiling/methods , Software
9.
Nucleic Acids Res ; 51(D1): D1432-D1445, 2023 01 06.
Article in English | MEDLINE | ID: mdl-36400569

ABSTRACT

The toxic effects of compounds on the environment, humans, and other organisms have been a major focus of many research areas, including drug discovery and ecological research. Identifying the potential toxicity in the early stage of compound/drug discovery is critical. The rapid development of computational methods for evaluating various toxicity categories has increased the need for comprehensive and system-level collection of toxicological data, associated attributes, and benchmarks. To contribute toward this goal, we proposed TOXRIC (https://toxric.bioinforai.tech/), a database with comprehensive toxicological data, standardized attribute data, practical benchmarks, informative visualization of molecular representations, and an intuitive function interface. The data stored in TOXRIC contain 113 372 compounds, 13 toxicity categories, 1474 toxicity endpoints covering in vivo/in vitro endpoints, and 39 feature types covering structural, target, transcriptome, metabolic data, and other descriptors. All the curated datasets of endpoints and features can be retrieved, downloaded and directly used as output or input to Machine Learning (ML)-based prediction models. In addition to serving as a data repository, TOXRIC also provides visualization of benchmarks and molecular representations for all endpoint datasets. Based on these results, researchers can better understand and select optimal feature types, molecular representations, and baseline algorithms for each endpoint prediction task. We believe that the rich information on compound toxicology, ML-ready datasets, benchmarks and molecular representation distribution can greatly facilitate toxicological investigations, interpretation of toxicological mechanisms, compound/drug discovery and the development of computational methods.


Subject(s)
Databases, Factual , Toxicology , Humans , Benchmarking , Toxicology/methods , Software
10.
Brief Bioinform ; 23(3)2022 05 13.
Article in English | MEDLINE | ID: mdl-35380622

ABSTRACT

Drug-target interaction (DTI) prediction plays an important role in drug repositioning, drug discovery and drug design. However, due to the large size of the chemical and genomic spaces and the complex interactions between drugs and targets, experimental identification of DTIs is costly and time-consuming. In recent years, the emerging graph neural network (GNN) has been applied to DTI prediction because DTIs can be represented effectively using graphs. However, some of these methods are only based on homogeneous graphs, and some consist of two decoupled steps that cannot be trained jointly. To further explore GNN-based DTI prediction by integrating heterogeneous graph information, this study regards DTI prediction as a link prediction problem and proposes an end-to-end model based on HETerogeneous graph with Attention mechanism (DTI-HETA). In this model, a heterogeneous graph is first constructed based on the drug-drug and target-target similarity matrices and the DTI matrix. Then, the graph convolutional neural network is utilized to obtain the embedded representation of the drugs and targets. To highlight the contribution of different neighborhood nodes to the central node in aggregating the graph convolution information, a graph attention mechanism is introduced into the node embedding process. Afterward, an inner product decoder is applied to predict DTIs. To evaluate the performance of DTI-HETA, experiments are conducted on two datasets. The experimental results show that our model is superior to the state-of-the-art methods. Also, the identification of novel DTIs indicates that DTI-HETA can serve as a powerful tool for integrating heterogeneous graph information to predict DTIs.
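The final step described above, an inner product decoder over learned drug and target embeddings, can be sketched in a few lines. This is a minimal illustration rather than the DTI-HETA implementation: the function name, the toy embeddings, and the sigmoid that maps each inner product to a score between 0 and 1 are assumptions commonly made in link prediction and are not stated in the abstract.

```python
import math


def inner_product_decoder(drug_emb, target_emb):
    """Score every drug-target pair as sigmoid(dot(drug, target)).

    drug_emb and target_emb are lists of equal-length embedding vectors.
    The sigmoid squashing is an assumption, not taken from the abstract.
    """
    scores = []
    for d in drug_emb:
        row = []
        for t in target_emb:
            dot = sum(a * b for a, b in zip(d, t))
            row.append(1.0 / (1.0 + math.exp(-dot)))
        scores.append(row)
    return scores


# Toy embeddings (hypothetical values): two drugs, one target.
drugs = [[1.0, 0.0], [0.0, 0.0]]
targets = [[2.0, 0.0]]
print(inner_product_decoder(drugs, targets))
```

A decoder of this kind has no trainable parameters of its own, so all of the learning happens in the embedding layers that precede it.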


Subject(s)
Drug Development , Neural Networks, Computer , Drug Development/methods , Drug Interactions , Drug Repositioning , Polymers
11.
Brief Bioinform ; 23(2)2022 03 10.
Article in English | MEDLINE | ID: mdl-35062018

ABSTRACT

Combination therapy has shown an obvious curative effect on complex diseases, whereas the search space of drug combinations is too large to be validated experimentally even with high-throughput screens. With the increase of the number of drugs, artificial intelligence techniques, especially machine learning methods, have become applicable for the discovery of synergistic drug combinations to significantly reduce the experimental workload. In this study, in order to predict novel synergistic drug combinations in various cancer cell lines, the cell line-specific drug-induced gene expression profile (GP) is added as a new feature type to capture the cellular response of drugs and reveal the biological mechanism of synergistic effect. Then, an enhanced cascade-based deep forest regressor (EC-DFR) is presented and applied to the new small-scale drug combination dataset involving chemical, physical and biological (GP) properties of drugs and cells. Verified by the dataset, EC-DFR outperforms two state-of-the-art deep neural network-based methods and several advanced classical machine learning algorithms. Biological experimental validation performed subsequently on a set of previously untested drug combinations further confirms the performance of EC-DFR. More notably, EC-DFR can identify the most important features, which makes it more interpretable. Evaluating the contribution of each feature type shows that the GP feature contributes 82.40%, suggesting that the cellular responses of drugs may play crucial roles in synergism prediction. The analysis based on the top contributing genes in GP further demonstrates some potential relationships between the transcriptomic levels of key genes under drug regulation and the synergism of drug combinations.


Subject(s)
Artificial Intelligence , Computational Biology , Computational Biology/methods , Drug Combinations , Machine Learning , Neural Networks, Computer
12.
Brief Bioinform ; 23(3)2022 05 13.
Article in English | MEDLINE | ID: mdl-35238349

ABSTRACT

Inhibition of host protein functions using established drugs produces a promising antiviral effect with excellent safety profiles, decreased incidence of resistant variants and favorable balance of costs and risks. Genomic methods have produced a large number of robust host factors, providing candidates for identification of antiviral drug targets. However, there is a lack of global perspectives and systematic prioritization of known virus-targeted host proteins (VTHPs) and drug targets. There is also a need for host-directed repositioned antivirals. Here, we integrated 6140 VTHPs and grouped viral infection modes from a new perspective of enriched pathways of VTHPs. Clarifying the superiority of nonessential membrane and hub VTHPs as potential ideal targets for repositioned antivirals, we proposed 543 candidate VTHPs. We then presented a large-scale drug-virus network (DVN) based on matching these VTHPs and drug targets. We predicted possible indications for 703 approved drugs against 35 viruses and explored their potential as broad-spectrum antivirals. In vitro and in vivo tests validated the efficacy of bosutinib, maraviroc and dextromethorphan against human herpesvirus 1 (HHV-1), hepatitis B virus (HBV) and influenza A virus (IAV). Their drug synergy with clinically used antivirals was evaluated and confirmed. The results proved that low-dose dextromethorphan is better than high-dose in both single and combined treatments. This study provides a comprehensive landscape and optimization strategy for druggable VTHPs, constructing an innovative and potent pipeline to discover novel antiviral host proteins and repositioned drugs, which may facilitate their delivery to clinical application in translational medicine to combat fatal and spreading viral infections.


Subject(s)
Antiviral Agents , Influenza A virus , Antiviral Agents/pharmacology , Antiviral Agents/therapeutic use , Dextromethorphan , Humans , Influenza A virus/genetics
13.
Brief Bioinform ; 23(1)2022 01 17.
Article in English | MEDLINE | ID: mdl-34477201

ABSTRACT

Combination therapy has shown an obvious efficacy on complex diseases and can greatly reduce the development of drug resistance. However, even with high-throughput screens, experimental methods are insufficient to explore novel drug combinations. In order to reduce the search space of drug combinations, there is an urgent need to develop more efficient computational methods to predict novel drug combinations. In recent decades, more and more machine learning (ML) algorithms have been applied to improve the predictive performance. The object of this study is to introduce and discuss the recent applications of ML methods and the widely used databases in drug combination prediction. In this study, we first describe the concept and controversy of synergism between drug combinations. Then, we investigate various publicly available data resources and tools for prediction tasks. Next, ML methods including classic ML and deep learning methods applied in drug combination prediction are introduced. Finally, we summarize the challenges to ML methods in prediction tasks and provide a discussion on future work.


Subject(s)
Algorithms , Machine Learning , Databases, Factual , Drug Combinations , Drug Interactions
14.
Brief Bioinform ; 23(3)2022 05 13.
Article in English | MEDLINE | ID: mdl-35352098

ABSTRACT

Synthetic lethality (SL) occurs between two genes when the inactivation of either gene alone has no effect on cell survival but the inactivation of both genes results in cell death. SL-based therapy has become one of the most promising targeted cancer therapies in the last decade as PARP inhibitors achieve great success in the clinic. The key point to exploiting SL-based cancer therapy is the identification of robust SL pairs. Although many wet-lab-based methods have been developed to screen SL pairs, known SL pairs account for less than 0.1% of all potential pairs due to the large number of human gene combinations. Computational prediction methods complement wet-lab-based methods to effectively reduce the search space of SL pairs. In this paper, we review the recent applications of computational methods and commonly used databases for SL prediction. First, we introduce the concept of SL and its screening methods. Second, various SL-related data resources are summarized. Then, computational methods including statistical-based methods, network-based methods, classical machine learning methods and deep learning methods for SL prediction are summarized. In particular, we elaborate on the negative sampling methods applied in these models. Next, representative tools for SL prediction are introduced. Finally, the challenges and future work for SL prediction are discussed.
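To give a sense of the scale behind the "less than 0.1% of all potential pairs" statement, the number of unordered gene pairs can be computed directly. The sketch below assumes roughly 20,000 human protein-coding genes. That round figure is an assumption for illustration and is not given in the abstract, so the resulting numbers are order-of-magnitude estimates only.

```python
import math

# Assumed gene count (illustrative round number, not from the abstract).
N_GENES = 20_000

# Number of unordered pairs of distinct genes: C(n, 2) = n * (n - 1) / 2.
n_pairs = math.comb(N_GENES, 2)

# 0.1% of that space, i.e. the upper bound on known SL pairs implied
# by the abstract under the assumed gene count.
upper_bound_known = n_pairs // 1000

print(f"candidate gene pairs: {n_pairs:,}")
print(f"0.1% of candidate pairs: {upper_bound_known:,}")
```

Under this assumption the search space is on the order of 2 × 10^8 pairs, which illustrates why exhaustive wet-lab screening is impractical.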


Subject(s)
Neoplasms , Synthetic Lethal Mutations , Databases, Factual , Humans , Machine Learning , Neoplasms/genetics
15.
Bioinformatics ; 39(12)2023 12 01.
Article in English | MEDLINE | ID: mdl-37995293

ABSTRACT

SUMMARY: A variety of computational methods have been developed to identify functionally related gene modules from genome-wide gene expression profiles. Integrating the results of these methods to identify consensus modules is a promising approach to produce more accurate and robust results. In this application note, we introduce COMMO, the first web server to identify and analyze consensus functionally related gene modules from different module detection methods. First, COMMO implements eight state-of-the-art module detection methods and two consensus clustering algorithms. Second, COMMO provides users with mRNA and protein expression data for 33 cancer types from three public databases. Users can also upload their own data for module detection. Third, users can perform functional enrichment and two types of survival analyses on the observed gene modules. Finally, COMMO provides interactive, customizable visualizations and exportable results. With its extensive analysis and interactive capabilities, COMMO offers a user-friendly solution for conducting module-based precision medicine research. AVAILABILITY AND IMPLEMENTATION: COMMO web is available at https://commo.ncpsb.org.cn/, with the source code available on GitHub: https://github.com/Song-xinyu/COMMO/tree/master.


Subject(s)
Gene Regulatory Networks , Software , Consensus , Algorithms , Computers
16.
Nucleic Acids Res ; 50(W1): W312-W321, 2022 07 05.
Article in English | MEDLINE | ID: mdl-35639516

ABSTRACT

In the era of life-omics, huge amounts of multi-omics data have been generated and widely used in biomedical research. It is challenging for biologists with limited programming skills to obtain biological insights from multi-omics data. Thus, a biologist-oriented platform containing visualization functions is needed to make complex omics data digestible. Here, we propose an easy-to-use, interactive web server named ExpressVis. In ExpressVis, users can prepare datasets; perform differential expression analysis, clustering analysis, and survival analysis; and integrate expression data with protein-protein interaction networks and pathway maps. These analyses are organized into six modules. Users can use each module independently or use several modules interactively. ExpressVis displays analysis results in interactive figures and tables, and provides comprehensive interactive operations in each figure and table, between figures or tables in each module, and among different modules. It is freely accessible at https://omicsmining.ncpsb.org.cn/ExpressVis and does not require login. To test the performance of ExpressVis for multi-omics studies of clinical cohorts, we re-analyzed a published hepatocellular carcinoma dataset and reproduced their main findings, suggesting that ExpressVis is convenient enough to analyze multi-omics data. Based on its complete analysis processes and unique interactive operations, ExpressVis provides an easy-to-use solution for exploring multi-omics data.


Subject(s)
Multiomics , Software , Computers , Protein Interaction Maps , Internet
17.
BMC Bioinformatics ; 24(1): 325, 2023 Aug 29.
Article in English | MEDLINE | ID: mdl-37644423

ABSTRACT

INTRODUCTION: There are countless possibilities for drug combinations, which makes it expensive and time-consuming to rely solely on clinical trials to determine the effects of each possible drug combination. In order to screen out the most effective drug combinations more quickly, scholars began to apply machine learning to drug combination prediction. However, most of these methods have low interpretability. Consequently, even though they can sometimes produce high prediction accuracy, experts in the medical and biological fields still cannot fully rely on their judgments because of the lack of knowledge about the decision-making process. RELATED WORK: Decision trees and their ensemble algorithms are considered to be suitable methods for pharmaceutical applications due to their excellent performance and good interpretability. We review existing decision trees or decision tree ensemble algorithms in the medical field and point out their shortcomings. METHOD: This study proposes a decision stump (DS)-based solution to extract interpretable knowledge from data sets. In this method, a set of DSs is first generated to selectively form a decision tree (DST). Different from the traditional decision tree, our algorithm not only enables a partial exchange of information between base classifiers by introducing a stump exchange method but also uses a modified Gini index to evaluate stump performance so that the generation of each node is evaluated by a global view to maintain high generalization ability. Furthermore, these trees are combined to construct an ensemble of DST (EDST). EXPERIMENT: The two-drug combination data sets are collected from two cell lines with three classes (additive, antagonistic and synergistic effects) to test our method. Experimental results show that both our DST and EDST perform better than other methods. Besides, the rules generated by our methods are more compact and more accurate than other rule-based algorithms. Finally, we also analyze the knowledge extracted by the model in the field of bioinformatics. CONCLUSION: The novel decision tree ensemble model can effectively predict the effect of drug combination datasets and easily obtain the decision-making process.
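The abstract above evaluates decision stumps with a modified Gini index whose exact form is not given here. As a point of reference only, the sketch below uses the standard Gini impurity to pick the best threshold for a single-feature stump. The function names and toy data are illustrative assumptions, and this is the textbook criterion rather than the authors' modified version.

```python
from collections import Counter


def gini(labels):
    """Standard Gini impurity: 1 - sum(p_k ** 2) over the class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_stump(values, labels):
    """Return (threshold, weighted_gini) for the best split `x <= threshold`.

    Considers one numeric feature; returns (None, inf) if no split exists.
    """
    best = (None, float("inf"))
    n = len(values)
    # The largest value is skipped because it would leave the right side empty.
    for thr in sorted(set(values))[:-1]:
        left = [y for x, y in zip(values, labels) if x <= thr]
        right = [y for x, y in zip(values, labels) if x > thr]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (thr, score)
    return best


# Toy data (hypothetical): one feature, two of the three effect classes.
x = [1, 2, 3, 4]
y = ["additive", "additive", "synergistic", "synergistic"]
print(best_stump(x, y))
```

A single stump of this kind is a one-rule classifier, which is what makes the resulting trees easy to read as explicit decision rules.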


Subject(s)
Algorithms , Computational Biology , Cell Line , Drug Combinations , Knowledge
18.
Brief Bioinform ; 22(5)2021 09 02.
Article in English | MEDLINE | ID: mdl-33454752

ABSTRACT

The exploration of three-dimensional chromatin interaction and organization provides insight into mechanisms underlying gene regulation, cell differentiation and disease development. Advances in chromosome conformation capture technologies, such as high-throughput chromosome conformation capture (Hi-C) and chromatin interaction analysis by paired-end tag (ChIA-PET), have enabled the exploration of chromatin interaction and organization. However, high-resolution Hi-C and ChIA-PET data are only available for a limited number of cell lines, and their acquisition is costly, time consuming, laborious and affected by theoretical limitations. Increasing evidence shows that DNA sequence and epigenomic features are informative predictors of regulatory interaction and chromatin architecture. Based on these features, numerous computational methods have been developed for the prediction of chromatin interaction and organization, but they are not yet extensively applied in biomedical studies. A systematic study to summarize and evaluate such methods is still needed to facilitate their application. Here, we summarize 48 computational methods for the prediction of chromatin interaction and organization using sequence and epigenomic profiles, categorize them and compare their performance. In addition, we provide a comprehensive guideline for the selection of suitable methods to predict chromatin interaction and organization based on the available data and the biological question of interest.


Subject(s)
Chromatin/chemistry , Epigenesis, Genetic , Supervised Machine Learning , Unsupervised Machine Learning , Base Sequence , Chromatin/metabolism , Chromatin Assembly and Disassembly , Genome, Human , Humans , Sequence Analysis, DNA
19.
Brief Bioinform ; 22(3)2021 05 20.
Article in English | MEDLINE | ID: mdl-32987404

ABSTRACT

Topologically associated domains (TADs) are spatial and functional units of metazoan chromatin structure. Interpretation of the interplay between regulatory factors and chromatin structure within TADs is crucial to understand the spatial and temporal regulation of gene expression. However, a computational metric for the sensitive characterization of TAD regulatory landscape is lacking. Here, we present the spatial density of open chromatin (SDOC) metric as a quantitative measurement of intra-TAD chromatin state and structure. SDOC sensitively reflects epigenetic properties and gene transcriptional activity in TADs. During mouse T-cell development, we found that TADs with decreased SDOC are enriched in repressed developmental genes, and the joint effect of SDOC-decreasing and TAD clustering corresponds to the highest level of gene repression. In addition, we revealed a pervasive preference for TADs with similar SDOC to interact with each other, which may reflect the principle of chromatin organization.


Subject(s)
Algorithms , Chromatin Assembly and Disassembly/genetics , Chromatin/genetics , Computational Biology/methods , Gene Expression Profiling/methods , Genome/genetics , Animals , Cell Differentiation/genetics , Cell Line , Chromatin/metabolism , Cluster Analysis , Epigenomics/methods , Humans , K562 Cells , RNA-Seq/methods , Reproducibility of Results , T-Lymphocytes/classification , T-Lymphocytes/cytology , T-Lymphocytes/metabolism
20.
J Chem Inf Model ; 63(12): 3941-3954, 2023 06 26.
Article in English | MEDLINE | ID: mdl-37303117

ABSTRACT

Combination therapy is a promising clinical treatment strategy for cancer and other complex diseases. Multiple drugs can target multiple proteins and pathways, greatly improving the therapeutic effect and slowing down drug resistance. To narrow the search space of synergistic drug combinations, many prediction models have been developed. However, drug combination datasets always have the characteristics of class imbalance. Synergistic drug combinations receive the most attention in clinical application but are in small numbers. To predict synergistic drug combinations in different cancer cell lines, in this study, we propose a genetic algorithm-based ensemble learning framework, GA-DRUG, to address the problems of class imbalance and high dimensionality of input data. The cell-line-specific gene expression profiles under drug perturbations are used to train GA-DRUG, which contains imbalanced data processing and the search of global optimal solutions. Compared to 11 state-of-the-art algorithms, GA-DRUG achieves the best performance and significantly improves the prediction performance in the minority class (Synergy). The ensemble framework can effectively correct the classification results of a single classifier. In addition, the cellular proliferation experiment performed on several previously unexplored drug combinations further confirms the predictive ability of GA-DRUG.


Subject(s)
Algorithms , Neoplasms , Humans , Drug Combinations , Neoplasms/drug therapy , Proteins , Machine Learning