ABSTRACT
Molecular clustering analysis has been developed to facilitate visual inspection during structure-based virtual screening. However, traditional methods based on molecular fingerprints or molecular descriptors limit the accuracy of selecting active hit compounds, which may be attributed to the lack of representations of receptor structure and protein-ligand interactions during clustering. Here, a novel deep clustering framework named ClusterX is proposed to learn molecular representations of protein-ligand complexes and cluster the ligands. In ClusterX, a graph is used to represent each protein-ligand complex, and joint optimisation is used to learn cluster-friendly features efficiently. Experiments on the KLIFS database show that the model distinguishes well between the binding modes of different kinase inhibitors. To further validate the model, clustering results on a virtual screening dataset show that ClusterX achieves better or competitive performance compared with traditional methods such as SIFt and extended connectivity fingerprints. This framework may provide a unique tool for clustering analysis and assist computational medicinal chemists in visual decision-making.
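
The abstract does not state which objective the joint optimisation uses. As an illustration only, the sketch below shows one common formulation from the deep clustering literature (a deep-embedded-clustering-style loss): embeddings are softly assigned to centroids with a Student's t kernel, a sharpened target distribution is derived from those assignments, and the KL divergence between the two is the clustering loss. All function names and the toy 2-D embeddings are hypothetical stand-ins for the output of a graph encoder; this is not presented as the ClusterX implementation.

```python
import math

def soft_assign(embeddings, centroids):
    """Student's t soft assignment q[i][j] of embedding i to centroid j."""
    q = []
    for z in embeddings:
        # Kernel value 1 / (1 + squared Euclidean distance) per centroid.
        kernel = [1.0 / (1.0 + sum((a - b) ** 2 for a, b in zip(z, mu)))
                  for mu in centroids]
        total = sum(kernel)
        q.append([k / total for k in kernel])
    return q

def target_distribution(q):
    """Sharpened targets p[i][j], proportional to q[i][j]**2 / soft cluster size."""
    n_clusters = len(q[0])
    freq = [sum(row[j] for row in q) for j in range(n_clusters)]
    p = []
    for row in q:
        weights = [row[j] ** 2 / freq[j] for j in range(n_clusters)]
        total = sum(weights)
        p.append([w / total for w in weights])
    return p

def clustering_loss(p, q):
    """KL(P || Q) summed over all samples and clusters."""
    return sum(pij * math.log(pij / qij)
               for p_row, q_row in zip(p, q)
               for pij, qij in zip(p_row, q_row) if pij > 0.0)

# Toy 2-D embeddings (assumption: stand-ins for encoder outputs of four complexes).
embeddings = [[0.0, 0.1], [0.2, 0.0], [3.0, 3.1], [2.9, 3.0]]
centroids = [[0.0, 0.0], [3.0, 3.0]]

q = soft_assign(embeddings, centroids)
p = target_distribution(q)
loss = clustering_loss(p, q)
# Hard cluster label = centroid with the highest soft assignment.
labels = [row.index(max(row)) for row in q]
```

In a full training loop, a loss of this kind would typically be minimised together with the encoder's own objective, so that the embeddings and the cluster centroids are updated jointly; the sketch computes the loss only and performs no gradient step.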