1.
Article in English | MEDLINE | ID: mdl-39058606

ABSTRACT

In the past decade, artificial intelligence (AI) driven drug design and discovery has been a hot research topic in AI, and an important branch is molecule generation by generative models, from GAN-based and VAE-based models to the latest diffusion-based models. However, most existing models mainly pursue basic properties of the generated molecules, such as validity and uniqueness, and only a few go further to explicitly optimize a single important molecular property (e.g., QED or PlogP), so most generated molecules are of little use in practice. In this paper, we present a novel approach to generating molecules with desirable properties, which expands the diffusion model framework with multiple innovative designs. The novelty is two-fold. On the one hand, considering that the structures of molecules are complex and diverse, and that molecular properties are usually determined by some substructures (e.g., pharmacophores), we propose to perform diffusion on two structural levels, molecules and molecular fragments, from which a mixed Gaussian distribution is obtained for the reverse diffusion process. To get desirable molecular fragments, we develop a novel fragmentation method based on electronic effects. On the other hand, we introduce two ways to explicitly optimize multiple molecular properties under the diffusion model framework. First, as potential drug molecules must be chemically valid, we optimize molecular validity by an energy-guidance function. Second, since potential drug molecules should be desirable in various properties, we employ a multi-objective mechanism to optimize multiple molecular properties simultaneously. Extensive experiments on two benchmark datasets, QM9 and ZINC250k, show that the molecules generated by our proposed method have better validity, uniqueness, novelty, Fréchet ChemNet Distance (FCD), QED, and PlogP than those generated by current SOTA models.
The code of D2L-OMP is available at https://github.com/bz99bz/D2L-OMP.
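
The abstract above reports validity, uniqueness, and novelty without defining them. The following is a minimal sketch of one common convention for these three ratios, not the evaluation code of D2L-OMP. The function name `generation_metrics` and the `is_valid` callback are hypothetical; in practice the callback would wrap a cheminformatics toolkit, and SMILES strings would be canonicalized before comparison.

```python
from typing import Callable, Dict, Iterable


def generation_metrics(
    generated: Iterable[str],
    training: Iterable[str],
    is_valid: Callable[[str], bool],
) -> Dict[str, float]:
    """Validity, uniqueness and novelty under one common convention.

    validity   = valid molecules / generated molecules
    uniqueness = distinct valid molecules / valid molecules
    novelty    = distinct valid molecules absent from training / distinct valid
    """
    generated = list(generated)
    valid = [s for s in generated if is_valid(s)]
    unique = set(valid)
    novel = unique - set(training)
    return {
        "validity": len(valid) / len(generated) if generated else 0.0,
        "uniqueness": len(unique) / len(valid) if valid else 0.0,
        "novelty": len(novel) / len(unique) if unique else 0.0,
    }
```

Other papers normalize uniqueness and novelty differently (for example, by the number of generated rather than valid molecules), so reported numbers are only comparable when the same convention is used.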

2.
Article in English | MEDLINE | ID: mdl-39078760

ABSTRACT

Causal partitioning is an effective approach for causal discovery based on the divide-and-conquer strategy. Up to now, various heuristic methods based on conditional independence (CI) tests have been proposed for causal partitioning. However, most of these methods fail to achieve satisfactory partitioning without violating d-separation, leading to poor inference performance. In this work, we transform causal partitioning into an alternative problem that can be more easily solved. Concretely, we first construct a superstructure G of the true causal graph GT by performing a set of low-order CI tests on the observed data D. Then, we leverage point-line duality to obtain a graph GA adjoint to G. We show that the solution of minimizing edge-cut ratio on GA can lead to a valid causal partitioning with smaller causal-cut ratio on G and without violating d-separation. We design an efficient algorithm to solve this problem. Extensive experiments show that the proposed method can achieve significantly better causal partitioning without violating d-separation than the existing methods. The source code and data are available at https://github.com/hzsiat/CPA.
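
To illustrate the point-line duality step, here is a minimal sketch that assumes the adjoint graph corresponds to the standard line graph, in which every edge of G becomes a node and two nodes are adjacent when the original edges share an endpoint. The abstract does not define the edge-cut ratio, so `edge_cut_ratio` below uses one plausible reading (cut edges divided by all edges). Both function names are hypothetical and this is not the CPA implementation.

```python
from itertools import combinations


def line_graph(edges):
    """Return the line graph of an undirected simple graph as an adjacency dict.

    Node labels of the input must be sortable (e.g. all strings).
    """
    nodes = sorted({tuple(sorted(e)) for e in edges})
    adjacency = {n: set() for n in nodes}
    for a, b in combinations(nodes, 2):
        if set(a) & set(b):  # the two original edges share an endpoint
            adjacency[a].add(b)
            adjacency[b].add(a)
    return adjacency


def edge_cut_ratio(adjacency, part):
    """Fraction of edges with exactly one endpoint inside `part`."""
    part = set(part)
    total = cut = 0
    for u, neighbours in adjacency.items():
        for v in neighbours:
            if u < v:  # count each undirected edge once
                total += 1
                cut += (u in part) != (v in part)
    return cut / total if total else 0.0
```

For the path a-b-c-d, the line graph is itself a path on the three nodes (a, b), (b, c), (c, d); separating (a, b) from the rest cuts one of its two edges.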

3.
Article in English | MEDLINE | ID: mdl-38809725

ABSTRACT

The discovery of cancer biomarkers helps to advance medical diagnosis and plays an important role in biomedical applications. Most existing data-driven methods identify biomarkers by ranking-based strategies, which generally return a subset or superset of the actual biomarkers, while other causal feature selection methods are based on Markov Blanket (MB) learning and face the challenges of high dimensionality and low sample size. In this work, we propose a novel hybrid causal feature selection method (called CAFES) to support large-scale cancer biomarker discovery from real RNA-seq data. Concretely, CAFES first uses the minimal-redundancy and maximal-relevance strategy for dimensionality reduction, which returns a set of candidate features. CAFES then learns the causal skeleton w.r.t. those features by CI tests and further obtains an appropriate superset of the MB of the target variable. Finally, CAFES learns the causal structure of this superset by the DAG-GNN algorithm and then obtains the MB of the target variable, which can be treated as the cancer biomarkers. We conduct experiments to evaluate the proposed method on two well-known real RNA-seq datasets that cover both binary and multi-class cases. We compare our method CAFES with seven recent methods: Semi-HITON-MB, STMB, BAMB, FBED, LCS-FS, EEMB, and EAMB. The results show that CAFES can identify dozens of cancer biomarkers, and that existing works verify 1/6 to 1/2 of the discovered biomarkers as directly related to the corresponding disease. An advantage of CAFES is that its Recall is significantly higher than that of all the counterparts, indicating that the continuous optimization (DAG-GNN) with the causal skeleton returned after feature selection (which can be treated as a conditional-independence-based constraint on the optimization problem) is effective for cancer biomarker identification under high-dimensional and low-sample RNA-seq data.
The source code of CAFES is available at https://github.com/Milkteaww/CFS.
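
The first CAFES step, minimal-redundancy and maximal-relevance (mRMR) selection, can be sketched as a greedy loop. The version below is a minimal illustration rather than the CAFES code: it assumes features have already been discretized (RNA-seq expression values are continuous), estimates mutual information from raw counts, and uses the difference form "relevance minus mean redundancy". The names `mutual_information` and `mrmr` are hypothetical.

```python
from collections import Counter
from math import log


def mutual_information(xs, ys):
    """Mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * log(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def mrmr(features, target, k):
    """Greedily pick up to k feature names by relevance minus mean redundancy.

    `features` maps a feature name to its list of discretized values.
    """
    relevance = {
        name: mutual_information(col, target) for name, col in features.items()
    }
    selected = []
    candidates = list(features)
    while candidates and len(selected) < k:
        scores = {}
        for name in candidates:
            redundancy = 0.0
            if selected:
                redundancy = sum(
                    mutual_information(features[name], features[s])
                    for s in selected
                ) / len(selected)
            scores[name] = relevance[name] - redundancy
        best = max(candidates, key=scores.get)  # ties go to the earliest name
        selected.append(best)
        candidates.remove(best)
    return selected
```

In a toy case where one feature is an exact copy of another, the copy is equally relevant but fully redundant, so the loop prefers an independent feature as its second pick.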

4.
Bioinformatics ; 40(5), 2024 May 02.
Article in English | MEDLINE | ID: mdl-38710497

ABSTRACT

MOTIVATION: Molecular property prediction (MPP) is a fundamental but challenging task in the computer-aided drug discovery process. More and more recent works employ graph-based models for MPP and have achieved considerable progress in prediction performance. However, current models often ignore relationships between molecules, which could also be helpful for MPP. RESULTS: To this end, in this article we propose a graph structure learning (GSL) based MPP approach, called GSL-MPP. Specifically, we first apply a graph neural network (GNN) over molecular graphs to extract molecular representations. Then, with molecular fingerprints, we construct a molecule similarity graph (MSG). Following that, we conduct GSL on the MSG, i.e. molecule-level GSL, to get the final molecular embeddings, which result from fusing the GNN-encoded molecular representations with the relationships among molecules; that is, they combine intra-molecule and inter-molecule information. Finally, we use these molecular embeddings to perform MPP. Extensive experiments on 10 benchmark datasets show that our method achieves state-of-the-art performance in most cases, especially on classification tasks. Further visualization studies also demonstrate the good molecular representations of our method. AVAILABILITY AND IMPLEMENTATION: Source code is available at https://github.com/zby961104/GSL-MPP.


Subjects
Neural Networks, Computer , Drug Discovery/methods , Machine Learning , Algorithms
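
The GSL-MPP abstract says a molecule similarity graph is built from molecular fingerprints but does not state the similarity measure or the graph construction rule. As a hedged illustration only, the sketch below assumes Tanimoto similarity over fingerprints represented as sets of on-bit indices and a k-nearest-neighbour rule; `tanimoto` and `similarity_graph` are hypothetical names, not functions from the GSL-MPP repository.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity of two fingerprints given as on-bit sets."""
    a, b = set(a), set(b)
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def similarity_graph(fingerprints, k=2):
    """Map each molecule name to the names of its k most similar molecules.

    Ties are broken alphabetically so the result is deterministic.
    """
    graph = {}
    for name, fp in fingerprints.items():
        scored = sorted(
            (
                (tanimoto(fp, other_fp), other_name)
                for other_name, other_fp in fingerprints.items()
                if other_name != name
            ),
            key=lambda pair: (-pair[0], pair[1]),
        )
        graph[name] = [other_name for _, other_name in scored[:k]]
    return graph
```

The resulting neighbour lists form a directed k-NN graph; a learned graph structure, as described in the abstract, would then refine these initial edges.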
5.
J Chem Inf Model ; 64(7): 2921-2930, 2024 04 08.
Article in English | MEDLINE | ID: mdl-38145387

ABSTRACT

Self-supervised pretrained models are increasingly popular in AI-aided drug discovery, leading to a growing number of pretrained models that promise to extract better feature representations for molecules. Yet the quality of the learned representations has not been fully explored. In this work, inspired by the two phenomena of Activity Cliffs (ACs) and Scaffold Hopping (SH) in traditional Quantitative Structure-Activity Relationship analysis, we propose a method named Representation-Property Relationship Analysis (RePRA) to evaluate the quality of the representations extracted by a pretrained model and to visualize the relationship between the representations and properties. The concepts of ACs and SH are generalized from the structure-activity context to the representation-property context, and the underlying principles of RePRA are analyzed theoretically. Two scores are designed to measure the generalized ACs and SH detected by RePRA, so that the quality of representations can be evaluated. In experiments, representations of molecules from 10 target tasks generated by 7 pretrained models are analyzed. The results indicate that state-of-the-art pretrained models can overcome some shortcomings of canonical Extended-Connectivity FingerPrints, while the correlation between the basis of the representation space and specific molecular substructures is not explicit. Thus, some representations could be even worse than the canonical fingerprints. Our method enables researchers to evaluate the quality of molecular representations generated by their proposed self-supervised pretrained models. Our findings can also guide the community to develop better pretraining techniques to regularize the occurrence of ACs and SH.


Subjects
Anti-HIV Agents , Drug Discovery , Hydrolases , Learning , Quantitative Structure-Activity Relationship
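
The generalized notions of ACs and SH in the RePRA abstract can be made concrete with a small pairwise check. The sketch below is an assumption-laden illustration, not the RePRA scores: it uses cosine similarity between representation vectors, and the thresholds `sim_hi`, `sim_lo`, and `prop_gap` are arbitrary placeholders. A pair counts as a generalized AC when the representations are similar but the property values differ widely, and as generalized SH when the representations are dissimilar but the property values are close.

```python
from itertools import combinations
from math import sqrt


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def label_pairs(reps, props, sim_hi=0.9, sim_lo=0.3, prop_gap=1.0):
    """Label index pairs (i, j) as "AC" or "SH"; other pairs are omitted."""
    labels = {}
    for i, j in combinations(range(len(reps)), 2):
        sim = cosine(reps[i], reps[j])
        gap = abs(props[i] - props[j])
        if sim >= sim_hi and gap >= prop_gap:
            labels[(i, j)] = "AC"  # close in representation, far in property
        elif sim <= sim_lo and gap < prop_gap:
            labels[(i, j)] = "SH"  # far in representation, close in property
    return labels
```

Counting the labeled pairs for each pretrained model would give a rough, threshold-dependent comparison; the paper's own scores should be taken from the original article.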