Results 1 - 20 of 186
1.
Trends Genet ; 39(1): 1-4, 2023 01.
Article in English | MEDLINE | ID: mdl-35934594

ABSTRACT

Ionizing radiation (IR)-induced DNA damage and repair are complex and occur at hierarchical chromatin structures; radiobiology needs to be studied from a 3D-genomic perspective. Differences in IR damage and repair throughout the 3D genome may help to explain differences in radiosensitivity.


Subjects
DNA Damage; DNA Repair; DNA Repair/genetics; DNA Damage/genetics; Radiation, Ionizing; Radiation Tolerance/genetics; Genomics
2.
Genome Res ; 33(8): 1381-1394, 2023 08.
Article in English | MEDLINE | ID: mdl-37524436

ABSTRACT

Accurately measuring biological age is crucial for improving healthcare for the elderly population. However, the complexity of aging biology poses challenges in how to robustly estimate aging and interpret the biological significance of the traits used for estimation. Here we present SCALE, a statistical pipeline that quantifies biological aging in different tissues using explainable features learned from literature and single-cell transcriptomic data. Applying SCALE to the "Mouse Aging Cell Atlas" (Tabula Muris Senis) data, we identified tissue-level transcriptomic aging programs for more than 20 murine tissues and created a multitissue resource of mouse quantitative aging-associated genes. We observe that SCALE correlates well with other age indicators, such as the accumulation of somatic mutations, and can distinguish subtle differences in aging even in cells of the same chronological age. We further compared SCALE with other transcriptomic and methylation "clocks" in data from aging muscle stem cells, Alzheimer's disease, and heterochronic parabiosis. Our results confirm that SCALE is more generalizable and reliable in assessing biological aging in aging-related diseases and rejuvenating interventions. Overall, SCALE represents a valuable advancement in our ability to measure aging accurately, robustly, and interpretably in single cells.


Subjects
Aging; Transcriptome; Animals; Mice; Aging/genetics; Gene Expression Profiling; Phenotype; Models, Biological
3.
Brief Bioinform ; 25(5)2024 Jul 25.
Article in English | MEDLINE | ID: mdl-39226890

ABSTRACT

Nanopore selective sequencing allows the targeted sequencing of DNA of interest using computational approaches rather than experimental methods such as targeted multiplex polymerase chain reaction or hybridization capture. Compared to sequence-alignment strategies, deep learning (DL) models for classifying target and nontarget DNA provide large speed advantages. However, the relatively low accuracy of these DL-based tools hinders their application in nanopore selective sequencing. Here, we present a DL-based tool named ReadCurrent for nanopore selective sequencing, which takes electric currents as inputs. ReadCurrent employs a modified very deep convolutional neural network (VDCNN) architecture, enabling significantly lower computational costs for training and quicker inference compared to conventional VDCNN. We evaluated the performance of ReadCurrent across 10 nanopore sequencing datasets spanning human, yeasts, bacteria, and viruses. We observed that ReadCurrent achieved a mean accuracy of 98.57% for classification, outperforming four other DL-based selective sequencing methods. In experimental validation that selectively sequenced microbial DNA from human DNA, ReadCurrent achieved an enrichment ratio of 2.85, which was higher than the 2.7 ratio achieved by MinKNOW using the sequence-alignment strategy. In summary, ReadCurrent can rapidly classify target and nontarget DNA with high accuracy, providing an alternative in the toolbox for nanopore selective sequencing. ReadCurrent is available at https://github.com/Ming-Ni-Group/ReadCurrent.


Subjects
Nanopore Sequencing; Nanopore Sequencing/methods; Humans; Sequence Analysis, DNA/methods; Neural Networks, Computer; Nanopores; Software; Deep Learning; Computational Biology/methods; High-Throughput Nucleotide Sequencing/methods
4.
Brief Bioinform ; 25(6)2024 Sep 23.
Article in English | MEDLINE | ID: mdl-39322627

ABSTRACT

Short tandem repeats (STRs) are genetic markers extensively used in biomedical and forensic applications. Because of sequencing noise in nanopore sequencing, accurate analysis methods are lacking. We developed NASTRA, a tool for Nanopore Autosomal Short Tandem Repeat Analysis, which overcomes the limitations of traditional database-based methods and provides precise germline analysis of STR genetic markers without requiring an allele sequence reference. NASTRA shows high accuracy in cell line authentication testing and paternity testing and significantly surpasses existing methods in both speed and accuracy. This makes it a promising solution for rapid cell line authentication and kinship testing and highlights the potential of nanopore sequencing for in-field applications.


Subjects
Algorithms; Microsatellite Repeats; Nanopore Sequencing; Nanopore Sequencing/methods; Humans; Genetic Markers; Sequence Analysis, DNA/methods; High-Throughput Nucleotide Sequencing/methods; Alleles
5.
Brief Bioinform ; 25(4)2024 May 23.
Article in English | MEDLINE | ID: mdl-38935071

ABSTRACT

Advances in chromatin mapping have exposed the complex chromatin hierarchical organization in mammals, including topologically associating domains (TADs) and their substructures, yet the functional implications of this hierarchy in gene regulation and disease progression are not fully elucidated. Our study delves into the phenomenon of shared TAD boundaries, which are pivotal in maintaining the hierarchical chromatin structure and regulating gene activity. By integrating high-resolution Hi-C data, chromatin accessibility, and DNA double-strand breaks (DSBs) data from various cell lines, we systematically explore the complex regulatory landscape at high-level TAD boundaries. Our findings indicate that these boundaries are not only key architectural elements but also vibrant hubs, enriched with functionally crucial genes and complex transcription factor binding site-clustered regions. Moreover, they exhibit a pronounced enrichment of DSBs, suggesting a nuanced interplay between transcriptional regulation and genomic stability. Our research provides novel insights into the intricate relationship between the 3D genome structure, gene regulation, and DNA repair mechanisms, highlighting the role of shared TAD boundaries in maintaining genomic integrity and resilience against perturbations. The implications of our findings extend to understanding the complexities of genomic diseases and open new avenues for therapeutic interventions targeting the structural and functional integrity of TAD boundaries.


Subjects
Chromatin; DNA Breaks, Double-Stranded; DNA Repair; Gene Expression Regulation; Humans; Chromatin/metabolism; Chromatin/genetics; Transcription Factors/metabolism; Transcription Factors/genetics; Animals; Genomics/methods; Genomic Instability; Chromatin Assembly and Disassembly
6.
Mol Cell ; 71(2): 306-318.e7, 2018 07 19.
Article in English | MEDLINE | ID: mdl-30017583

ABSTRACT

DNA N6-methyladenine (6mA) modification is the most prevalent DNA modification in prokaryotes, but whether it exists in human cells and whether it plays a role in human diseases remain enigmatic. Here, we showed that 6mA is extensively present in the human genome, and we cataloged 881,240 6mA sites accounting for ∼0.051% of the total adenines. [G/C]AGG[C/T] was the most significantly associated motif with 6mA modification. 6mA sites were enriched in the coding regions and mark actively transcribed genes in human cells. DNA 6mA and N6-demethyladenine modification in the human genome were mediated by methyltransferase N6AMT1 and demethylase ALKBH1, respectively. The abundance of 6mA was significantly lower in cancers, accompanied by decreased N6AMT1 and increased ALKBH1 levels, and downregulation of 6mA modification levels promoted tumorigenesis. Collectively, our results demonstrate that DNA 6mA modification is extensively present in human cells and the decrease of genomic DNA 6mA promotes human tumorigenesis.


Subjects
Adenine/analogs & derivatives; AlkB Homolog 1, Histone H2a Dioxygenase/metabolism; Genome, Human; Site-Specific DNA-Methyltransferase (Adenine-Specific)/metabolism; Adenine/metabolism; AlkB Homolog 1, Histone H2a Dioxygenase/genetics; Animals; Carcinogenesis/genetics; DNA/genetics; DNA Methylation; Heterografts; Humans; Mice; Mice, Nude; Site-Specific DNA-Methyltransferase (Adenine-Specific)/genetics
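
The motif quoted in the abstract above, [G/C]AGG[C/T], can be read as a five-base pattern with two ambiguous positions. The following is a minimal sketch of how such a pattern might be located in a sequence; it is not the authors' pipeline. The function names and the example sequence are hypothetical, and only the forward strand is scanned. The last line works out the adenine count implied by the two figures in the abstract (881,240 sites at about 0.051% of adenines).

```python
import re

# Assumed reading of the motif from the abstract: [G/C]AGG[C/T].
# The lookahead lets overlapping occurrences be reported as well.
MOTIF = re.compile(r"(?=([GC]AGG[CT]))")


def find_motif_sites(seq: str) -> list[tuple[int, str]]:
    """Return (0-based start, matched 5-mer) for each motif hit on the forward strand."""
    seq = seq.upper()
    return [(m.start(), m.group(1)) for m in MOTIF.finditer(seq)]


# Hypothetical example sequence, not taken from the paper.
example = "TTGAGGCAACAGGTGG"
sites = find_motif_sites(example)

# Adenine count implied by the abstract's figures: sites / fraction.
implied_adenines = 881_240 / 0.00051
```

Here `sites` holds two hits, `GAGGC` at position 2 and `CAGGT` at position 9, and `implied_adenines` comes to roughly 1.73 billion.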
7.
Nucleic Acids Res ; 52(13): 7610-7626, 2024 Jul 22.
Article in English | MEDLINE | ID: mdl-38813828

ABSTRACT

Gene expression is temporally and spatially regulated by the interaction of transcription factors (TFs) and cis-regulatory elements (CREs). The uneven distribution of TF binding sites across the genome poses challenges in understanding how this distribution evolves to regulate spatio-temporal gene expression and consequent heritable phenotypic variation. In this study, chromatin accessibility profiles and gene expression profiles were collected from several species including mammals (human, mouse, bovine), fish (zebrafish and medaka), and chicken. Transcription factor binding sites clustered regions (TFCRs) at different embryonic stages were characterized to investigate regulatory evolution. The study revealed dynamic changes in TFCR distribution during embryonic development and species evolution. The synchronization between TFCR complexity and gene expression was assessed across species using RegulatoryScore. Additionally, an explainable machine learning model highlighted the importance of the distance between TFCR and promoter in the coordinated regulation of TFCRs on gene expression. Our results revealed the developmental and evolutionary dynamics of TFCRs during embryonic development from fish, chicken to mammals. These data provide valuable resources for exploring the relationship between transcriptional regulation and phenotypic differences during embryonic development.


Subjects
Evolution, Molecular; Gene Expression Regulation, Developmental; Machine Learning; Transcription Factors; Animals; Transcription Factors/metabolism; Transcription Factors/genetics; Binding Sites; Humans; Mice; Cattle; Oryzias/genetics; Oryzias/metabolism; Oryzias/embryology; Chickens/genetics; Embryonic Development/genetics; Promoter Regions, Genetic; Chromatin/metabolism; Chromatin/genetics
8.
Genome Res ; 2022 Aug 17.
Article in English | MEDLINE | ID: mdl-35977841

ABSTRACT

During early mammalian embryo development, different epigenetic marks undergo reprogramming and play crucial roles in the mediation of gene expression. Currently, several databases provide multi-omics information on early embryos. However, how interconnected epigenetic markers function together to coordinate the expression of the genetic code in a spatiotemporal manner remains difficult to analyze, markedly limiting scientific and clinical research. Here, we present dbEmbryo, an integrated and interactive multi-omics database for human and mouse early embryos. dbEmbryo integrates data on gene expression, DNA methylation, histone modifications, chromatin accessibility, and higher-order chromatin structure profiles for human and mouse early embryos. It incorporates customized analysis tools, such as "multi-omics visualization," "Gene&Peak annotation," "ZGA gene cluster," "cis-regulation," "synergistic regulation," "promoter signal enrichment," and "3D genome." Users can retrieve gene expression and epigenetic profile patterns to analyze synergistic changes across different early embryo developmental stages. We showed the uniqueness of dbEmbryo among extant databases containing data on early embryo development and provided an overview. Using dbEmbryo, we obtained a phase-separated model of transcriptional control during early embryo development. dbEmbryo offers web-based analytical tools and a comprehensive resource for biologists and clinicians to decipher molecular regulatory mechanisms of human and mouse early embryo development.

9.
Brief Bioinform ; 24(1)2023 01 19.
Article in English | MEDLINE | ID: mdl-36460622

ABSTRACT

Drug response prediction in cancer cell lines is of great significance in personalized medicine. In this study, we propose GADRP, a cancer drug response prediction model based on graph convolutional networks (GCNs) and autoencoders (AEs). We first use a stacked deep AE to extract low-dimensional representations from cell line features, and then construct a sparse drug-cell line pair (DCP) network incorporating drug, cell line, and DCP similarity information. Next, an initial residual and layer attention-based GCN (ILGCN), which can alleviate the over-smoothing problem, is used to learn DCP features. Finally, a fully connected network is employed to make predictions. Benchmarking results demonstrate that GADRP significantly improves prediction performance on all metrics compared with baselines on five datasets. In particular, experiments predicting unknown DCP responses, drug-cancer tissue associations, and drug-pathway associations illustrate the predictive power of GADRP. All results highlight the effectiveness of GADRP in predicting drug responses and its potential value in guiding anti-cancer drug selection.


Subjects
Antineoplastic Agents; Neoplasms; Humans; Neoplasms/drug therapy; Antineoplastic Agents/pharmacology; Antineoplastic Agents/therapeutic use; Benchmarking; Cell Line; Learning
10.
Bioinformatics ; 40(4)2024 03 29.
Article in English | MEDLINE | ID: mdl-38603616

ABSTRACT

MOTIVATION: Clustering analysis of single-cell RNA sequencing (scRNA-seq) data is an important step in revealing cellular heterogeneity. Many clustering methods have been proposed to discover heterogeneous cell types from scRNA-seq data. However, adaptive clustering with an accurate cluster number that reflects the underlying biology of large-scale scRNA-seq data remains quite challenging. RESULTS: Here, we propose a single-cell Deep Adaptive Clustering (scDAC) model that couples the Autoencoder (AE) and the Dirichlet Process Mixture Model (DPMM). By jointly optimizing the model parameters of the AE and DPMM, scDAC achieves adaptive clustering with accurate cluster numbers on scRNA-seq data. We verify the performance of scDAC on five subsampled datasets with different numbers of cell types and compare it with 15 widely used clustering methods across nine scRNA-seq datasets. Our results demonstrate that scDAC can adaptively find accurate numbers of cell types or subtypes and outperforms other methods. Moreover, the performance of scDAC is robust to hyperparameter changes. AVAILABILITY AND IMPLEMENTATION: scDAC is implemented in Python. The source code is available at https://github.com/labomics/scDAC.


Subjects
Single-Cell Analysis; Transcriptome; Single-Cell Analysis/methods; Cluster Analysis; Transcriptome/genetics; Humans; Algorithms; Sequence Analysis, RNA/methods; Gene Expression Profiling/methods; Software
11.
Nucleic Acids Res ; 51(D1): D1432-D1445, 2023 01 06.
Article in English | MEDLINE | ID: mdl-36400569

ABSTRACT

The toxic effects of compounds on environment, humans, and other organisms have been a major focus of many research areas, including drug discovery and ecological research. Identifying the potential toxicity in the early stage of compound/drug discovery is critical. The rapid development of computational methods for evaluating various toxicity categories has increased the need for comprehensive and system-level collection of toxicological data, associated attributes, and benchmarks. To contribute toward this goal, we proposed TOXRIC (https://toxric.bioinforai.tech/), a database with comprehensive toxicological data, standardized attribute data, practical benchmarks, informative visualization of molecular representations, and an intuitive function interface. The data stored in TOXRIC contains 113 372 compounds, 13 toxicity categories, 1474 toxicity endpoints covering in vivo/in vitro endpoints and 39 feature types, covering structural, target, transcriptome, metabolic data, and other descriptors. All the curated datasets of endpoints and features can be retrieved, downloaded and directly used as output or input to Machine Learning (ML)-based prediction models. In addition to serving as a data repository, TOXRIC also provides visualization of benchmarks and molecular representations for all endpoint datasets. Based on these results, researchers can better understand and select optimal feature types, molecular representations, and baseline algorithms for each endpoint prediction task. We believe that the rich information on compound toxicology, ML-ready datasets, benchmarks and molecular representation distribution can greatly facilitate toxicological investigations, interpretation of toxicological mechanisms, compound/drug discovery and the development of computational methods.


Subjects
Databases, Factual; Toxicology; Humans; Benchmarking; Toxicology/methods; Software
12.
Brief Bioinform ; 23(3)2022 05 13.
Article in English | MEDLINE | ID: mdl-35380622

ABSTRACT

Drug-target interaction (DTI) prediction plays an important role in drug repositioning, drug discovery and drug design. However, due to the large size of the chemical and genomic spaces and the complex interactions between drugs and targets, experimental identification of DTIs is costly and time-consuming. In recent years, the emerging graph neural network (GNN) has been applied to DTI prediction because DTIs can be represented effectively using graphs. However, some of these methods are only based on homogeneous graphs, and some consist of two decoupled steps that cannot be trained jointly. To further explore GNN-based DTI prediction by integrating heterogeneous graph information, this study regards DTI prediction as a link prediction problem and proposes an end-to-end model based on HETerogeneous graph with Attention mechanism (DTI-HETA). In this model, a heterogeneous graph is first constructed based on the drug-drug and target-target similarity matrices and the DTI matrix. Then, the graph convolutional neural network is utilized to obtain the embedded representation of the drugs and targets. To highlight the contribution of different neighborhood nodes to the central node in aggregating the graph convolution information, a graph attention mechanism is introduced into the node embedding process. Afterward, an inner product decoder is applied to predict DTIs. To evaluate the performance of DTI-HETA, experiments are conducted on two datasets. The experimental results show that our model is superior to the state-of-the-art methods. Also, the identification of novel DTIs indicates that DTI-HETA can serve as a powerful tool for integrating heterogeneous graph information to predict DTIs.


Subjects
Drug Development; Neural Networks, Computer; Drug Development/methods; Drug Interactions; Drug Repositioning; Polymers
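
The DTI-HETA abstract above ends its pipeline with an inner product decoder applied to drug and target embeddings. A minimal sketch of that general idea follows, assuming the embeddings are already available as plain lists and that the inner product is squashed with a sigmoid to give a score in (0, 1). The sigmoid, the function names, and the toy embeddings are assumptions for illustration, not details confirmed by the abstract.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function mapping a real-valued score into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def inner_product_decoder(drug_emb, target_emb):
    """Score every drug-target pair as sigmoid(<z_drug, z_target>).

    Returns a matrix with one row per drug and one column per target.
    """
    return [
        [sigmoid(sum(d * t for d, t in zip(z_drug, z_target))) for z_target in target_emb]
        for z_drug in drug_emb
    ]


# Hypothetical 2-dimensional embeddings (in practice these would come from the graph encoder).
drugs = [[1.0, 0.0], [0.0, -2.0]]
targets = [[2.0, 1.0], [0.0, 1.0]]
scores = inner_product_decoder(drugs, targets)
```

Pairs whose embeddings point in similar directions receive scores above 0.5, and orthogonal pairs such as `drugs[0]` and `targets[1]` score exactly 0.5.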
13.
Brief Bioinform ; 23(2)2022 03 10.
Article in English | MEDLINE | ID: mdl-35062018

ABSTRACT

Combination therapy has shown clear curative effects on complex diseases, but the search space of drug combinations is too large to be validated experimentally, even with high-throughput screens. As the number of drugs increases, artificial intelligence techniques, especially machine learning methods, have become applicable to the discovery of synergistic drug combinations and can significantly reduce the experimental workload. In this study, to predict novel synergistic drug combinations in various cancer cell lines, the cell line-specific drug-induced gene expression profile (GP) is added as a new feature type to capture the cellular response to drugs and reveal the biological mechanism of the synergistic effect. Then, an enhanced cascade-based deep forest regressor (EC-DFR) is presented and applied to the new small-scale drug combination dataset, which involves chemical, physical and biological (GP) properties of drugs and cells. On this dataset, EC-DFR outperforms two state-of-the-art deep neural network-based methods and several advanced classical machine learning algorithms. Subsequent biological experimental validation on a set of previously untested drug combinations further confirms the performance of EC-DFR. Moreover, EC-DFR can identify the most important features, which makes it more interpretable. In the evaluation of each feature type's contribution, the GP feature contributes 82.40%, indicating that cellular responses to drugs may play crucial roles in synergism prediction. Analysis of the top contributing genes in GP further suggests potential relationships between the transcriptomic levels of key genes under drug regulation and the synergism of drug combinations.


Subjects
Artificial Intelligence; Computational Biology; Computational Biology/methods; Drug Combinations; Machine Learning; Neural Networks, Computer
14.
Brief Bioinform ; 23(1)2022 01 17.
Article in English | MEDLINE | ID: mdl-34477201

ABSTRACT

Combination therapy has shown clear efficacy against complex diseases and can greatly reduce the development of drug resistance. However, even with high-throughput screens, experimental methods are insufficient to explore novel drug combinations. To reduce the search space of drug combinations, there is an urgent need for more efficient computational methods to predict novel drug combinations. In recent decades, a growing number of machine learning (ML) algorithms have been applied to improve predictive performance. The objective of this study is to introduce and discuss recent applications of ML methods and the widely used databases in drug combination prediction. We first describe the concept of, and controversy over, synergism between drug combinations. Then, we investigate various publicly available data resources and tools for prediction tasks. Next, ML methods applied in drug combination prediction, including classic ML and deep learning methods, are introduced. Finally, we summarize the challenges facing ML methods in prediction tasks and discuss future work.


Subjects
Algorithms; Machine Learning; Databases, Factual; Drug Combinations; Drug Interactions
15.
Brief Bioinform ; 23(3)2022 05 13.
Article in English | MEDLINE | ID: mdl-35238349

ABSTRACT

Inhibition of host protein functions using established drugs produces a promising antiviral effect with excellent safety profiles, decreased incidence of resistant variants and favorable balance of costs and risks. Genomic methods have produced a large number of robust host factors, providing candidates for identification of antiviral drug targets. However, there is a lack of global perspectives and systematic prioritization of known virus-targeted host proteins (VTHPs) and drug targets. There is also a need for host-directed repositioned antivirals. Here, we integrated 6140 VTHPs and grouped viral infection modes from a new perspective of enriched pathways of VTHPs. Clarifying the superiority of nonessential membrane and hub VTHPs as potential ideal targets for repositioned antivirals, we proposed 543 candidate VTHPs. We then presented a large-scale drug-virus network (DVN) based on matching these VTHPs and drug targets. We predicted possible indications for 703 approved drugs against 35 viruses and explored their potential as broad-spectrum antivirals. In vitro and in vivo tests validated the efficacy of bosutinib, maraviroc and dextromethorphan against human herpesvirus 1 (HHV-1), hepatitis B virus (HBV) and influenza A virus (IAV). Their drug synergy with clinically used antivirals was evaluated and confirmed. The results proved that low-dose dextromethorphan is better than high-dose in both single and combined treatments. This study provides a comprehensive landscape and optimization strategy for druggable VTHPs, constructing an innovative and potent pipeline to discover novel antiviral host proteins and repositioned drugs, which may facilitate their delivery to clinical application in translational medicine to combat fatal and spreading viral infections.


Subjects
Antiviral Agents; Influenza A virus; Antiviral Agents/pharmacology; Antiviral Agents/therapeutic use; Dextromethorphan; Humans; Influenza A virus/genetics
16.
Brief Bioinform ; 23(3)2022 05 13.
Article in English | MEDLINE | ID: mdl-35352098

ABSTRACT

Synthetic lethality (SL) occurs between two genes when the inactivation of either gene alone has no effect on cell survival but the inactivation of both genes results in cell death. SL-based therapy has become one of the most promising targeted cancer therapies in the last decade as PARP inhibitors achieve great success in the clinic. The key point to exploiting SL-based cancer therapy is the identification of robust SL pairs. Although many wet-lab-based methods have been developed to screen SL pairs, known SL pairs are less than 0.1% of all potential pairs due to large number of human gene combinations. Computational prediction methods complement wet-lab-based methods to effectively reduce the search space of SL pairs. In this paper, we review the recent applications of computational methods and commonly used databases for SL prediction. First, we introduce the concept of SL and its screening methods. Second, various SL-related data resources are summarized. Then, computational methods including statistical-based methods, network-based methods, classical machine learning methods and deep learning methods for SL prediction are summarized. In particular, we elaborate on the negative sampling methods applied in these models. Next, representative tools for SL prediction are introduced. Finally, the challenges and future work for SL prediction are discussed.


Subjects
Neoplasms; Synthetic Lethal Mutations; Databases, Factual; Humans; Machine Learning; Neoplasms/genetics
17.
Bioinformatics ; 39(12)2023 12 01.
Article in English | MEDLINE | ID: mdl-37995293

ABSTRACT

SUMMARY: A variety of computational methods have been developed to identify functionally related gene modules from genome-wide gene expression profiles. Integrating the results of these methods to identify consensus modules is a promising approach to produce more accurate and robust results. In this application note, we introduce COMMO, the first web server to identify and analyze consensus functionally related gene modules from different module detection methods. First, COMMO implements eight state-of-the-art module detection methods and two consensus clustering algorithms. Second, COMMO provides users with mRNA and protein expression data for 33 cancer types from three public databases. Users can also upload their own data for module detection. Third, users can perform functional enrichment and two types of survival analyses on the observed gene modules. Finally, COMMO provides interactive, customizable visualizations and exportable results. With its extensive analysis and interactive capabilities, COMMO offers a user-friendly solution for conducting module-based precision medicine research. AVAILABILITY AND IMPLEMENTATION: The COMMO web server is available at https://commo.ncpsb.org.cn/, with the source code available on GitHub: https://github.com/Song-xinyu/COMMO/tree/master.


Subjects
Gene Regulatory Networks; Software; Consensus; Algorithms; Computers
18.
Nucleic Acids Res ; 50(W1): W312-W321, 2022 07 05.
Article in English | MEDLINE | ID: mdl-35639516

ABSTRACT

In the era of life-omics, huge amounts of multi-omics data have been generated and widely used in biomedical research. It is challenging for biologists with limited programming skills to obtain biological insights from multi-omics data. Thus, a biologist-oriented platform containing visualization functions is needed to make complex omics data digestible. Here, we propose an easy-to-use, interactive web server named ExpressVis. In ExpressVis, users can prepare datasets; perform differential expression analysis, clustering analysis, and survival analysis; and integrate expression data with protein-protein interaction networks and pathway maps. These analyses are organized into six modules. Users can use each module independently or use several modules interactively. ExpressVis displays analysis results in interactive figures and tables, and provides comprehensive interactive operations in each figure and table, between figures or tables in each module, and among different modules. It is freely accessible at https://omicsmining.ncpsb.org.cn/ExpressVis and does not require login. To test the performance of ExpressVis for multi-omics studies of clinical cohorts, we re-analyzed a published hepatocellular carcinoma dataset and reproduced their main findings, suggesting that ExpressVis is convenient enough to analyze multi-omics data. Based on its complete analysis processes and unique interactive operations, ExpressVis provides an easy-to-use solution for exploring multi-omics data.


Subjects
Multiomics; Software; Computers; Protein Interaction Maps; Internet
19.
Int J Mol Sci ; 25(17)2024 Aug 27.
Article in English | MEDLINE | ID: mdl-39273227

ABSTRACT

Predicting protein-ligand binding sites is an integral part of structural biology and drug design. A comprehensive understanding of these binding sites is essential for advancing drug innovation, elucidating mechanisms of biological function, and exploring the nature of disease. However, accurately identifying protein-ligand binding sites remains a challenging task. To address this, we propose PGpocket, a geometric deep learning-based framework to improve protein-ligand binding site prediction. Initially, the protein surface is converted into a point cloud, and then the geometric and chemical properties of each point are calculated. Subsequently, the point cloud graph is constructed based on the inter-point distances, and the point cloud graph neural network (GNN) is applied to extract and analyze the protein surface information to predict potential binding sites. PGpocket is trained on the scPDB dataset, and its performance is verified on two independent test sets, Coach420 and HOLO4K. The results show that PGpocket achieves a 58% success rate on the Coach420 dataset and a 56% success rate on the HOLO4K dataset. These results surpass competing algorithms, demonstrating PGpocket's advancement and practicality for protein-ligand binding site prediction.


Subjects
Neural Networks, Computer; Proteins; Binding Sites; Ligands; Proteins/chemistry; Proteins/metabolism; Protein Binding; Algorithms; Deep Learning; Databases, Protein
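
The PGpocket abstract above describes building a point cloud graph from inter-point distances before applying a GNN. The abstract does not say whether a distance cutoff or a nearest-neighbour rule is used, so the sketch below assumes a simple cutoff (a "radius graph"). The function name, the cutoff value, and the coordinates are hypothetical.

```python
import math
from itertools import combinations


def radius_graph(points, cutoff):
    """Connect every pair of points whose Euclidean distance is <= cutoff.

    Returns undirected edges as (i, j) index pairs with i < j.
    """
    edges = []
    for (i, p), (j, q) in combinations(enumerate(points), 2):
        if math.dist(p, q) <= cutoff:
            edges.append((i, j))
    return edges


# Hypothetical surface points (x, y, z); not taken from any real protein.
points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 3.0, 0.0)]
edges = radius_graph(points, cutoff=1.5)
```

With this cutoff only the first two points are joined. The pairwise loop is quadratic in the number of points, so a spatial index would be the usual choice for real protein surfaces.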
20.
BMC Bioinformatics ; 24(1): 325, 2023 Aug 29.
Article in English | MEDLINE | ID: mdl-37644423

ABSTRACT

INTRODUCTION: There are countless possible drug combinations, which makes it expensive and time-consuming to rely solely on clinical trials to determine the effects of each one. To screen out the most effective drug combinations more quickly, researchers began to apply machine learning to drug combination prediction. However, most of these models have low interpretability. Consequently, even though they can sometimes produce high prediction accuracy, experts in the medical and biological fields still cannot fully rely on their judgments because the decision-making process is not known. RELATED WORK: Decision trees and their ensemble algorithms are considered suitable methods for pharmaceutical applications because of their excellent performance and good interpretability. We review existing decision tree and decision tree ensemble algorithms in the medical field and point out their shortcomings. METHOD: This study proposes a decision stump (DS)-based solution to extract interpretable knowledge from datasets. In this method, a set of DSs is first generated to selectively form a decision tree (DST). Unlike the traditional decision tree, our algorithm not only enables a partial exchange of information between base classifiers by introducing a stump exchange method but also uses a modified Gini index to evaluate stump performance, so that the generation of each node is evaluated from a global view to maintain high generalization ability. Furthermore, these trees are combined to construct an ensemble of DST (EDST). EXPERIMENT: Two-drug combination datasets are collected from two cell lines with three classes (additive, antagonistic and synergistic effects) to test our method. Experimental results show that both our DST and EDST perform better than other methods. In addition, the rules generated by our methods are more compact and more accurate than those of other rule-based algorithms. Finally, we also analyze the knowledge extracted by the model in the field of bioinformatics. CONCLUSION: The novel decision tree ensemble model can effectively predict the effect of drug combinations and makes the decision-making process easy to obtain.


Subjects
Algorithms; Computational Biology; Cell Line; Drug Combinations; Knowledge
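
The decision stump in the abstract above is a one-split classifier scored by a Gini-type impurity. The abstract does not define its modified Gini index or the stump exchange step, so the sketch below uses the standard Gini impurity only, as a stand-in. The function names, the feature values, and the labels are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Standard Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_stump(values, labels):
    """Return (threshold, weighted impurity) of the best split `value <= threshold`."""
    best = (None, float("inf"))
    n = len(values)
    # The largest value is skipped because it would leave the right side empty.
    for threshold in sorted(set(values))[:-1]:
        left = [y for x, y in zip(values, labels) if x <= threshold]
        right = [y for x, y in zip(values, labels) if x > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (threshold, score)
    return best


# Hypothetical single feature and class labels for four drug pairs.
feature = [0.1, 0.2, 0.8, 0.9]
effects = ["antagonistic", "antagonistic", "synergistic", "synergistic"]
threshold, impurity = best_stump(feature, effects)
```

On this toy data the split at 0.2 separates the two classes perfectly, so the weighted impurity is 0.0.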