ABSTRACT
Generative molecular models search chemical space to produce novel molecules with desired properties. Traditional combinatorial optimization methods, such as genetic algorithms, have demonstrated superior performance in various molecular optimization tasks. However, these methods do not use docking simulation to inform the design process, depend heavily on the quality and quantity of available data, and yield molecules that require additional structural optimization to become candidate drugs. To address these limitations, we propose DockingGA, a novel model that combines Transformer neural networks and genetic algorithms to generate molecules with better binding affinity for specific targets. To generate high-quality molecules, we represent each molecule with Self-referencing Chemical Structure Strings and optimize the binding affinity of the molecules to different targets. Compared with the baseline models, DockingGA achieves the best docking results for the top 1, 10 and 100 molecules in every comparison, while maintaining 100% novelty. Furthermore, the distribution of physicochemical properties shows that DockingGA generates molecules with favorable and appropriate properties. This work creates new opportunities for applying generative models in practical drug discovery.
Subjects
Algorithms; Molecular Docking Simulation; Neural Networks, Computer; Molecular Docking Simulation/methods; Drug Discovery/methods
ABSTRACT
Single-cell transcriptomics is rapidly advancing our understanding of the composition of complex tissues and biological cells, and single-cell RNA sequencing (scRNA-seq) holds great potential for identifying and characterizing the cell composition of complex tissues. Cell type identification from scRNA-seq data is still largely limited by time-consuming and irreproducible manual annotation. As scRNA-seq technology scales to thousands of cells per experiment, the exponential increase in the number of cell samples makes manual annotation even more difficult. In addition, the sparsity of gene transcriptome data remains a major challenge. This paper applies the idea of the Transformer to single-cell classification based on scRNA-seq data. We propose scTransSort, a cell-type annotation method pretrained on single-cell transcriptomics data. scTransSort represents genes as gene expression embedding blocks, which reduces both the sparsity of the data used for cell type identification and the computational complexity. Its distinguishing feature is intelligent information extraction from unordered data: it automatically extracts valid cell-type features without manually labeled features or additional references. In experiments on cells from 35 human and 26 mouse tissues, scTransSort achieved high accuracy in cell type identification and demonstrated strong robustness and generalization ability.
Subjects
Gene Expression Profiling; Single-Cell Analysis; Humans; Animals; Mice; Sequence Analysis, RNA/methods; Single-Cell Analysis/methods; Gene Expression Profiling/methods; Transcriptome
ABSTRACT
Advancements in single-cell sequencing research have revolutionized our understanding of cellular heterogeneity and functional diversity through the analysis of single-cell transcriptomes and genomes. A crucial step in single-cell RNA sequencing (scRNA-seq) analysis is identifying cell types. However, scRNA-seq data are often high dimensional and sparse, and manual cell type identification can be time-consuming, subjective, and lack reproducibility. Consequently, analyzing scRNA-seq data remains a computational challenge. With the increasing availability of well-annotated scRNA-seq datasets, advanced methods are emerging to aid in cell type identification by leveraging this information. Deep learning neural networks have great potential for analyzing single-cell data. This paper proposes MulCNN, a multi-level convolutional neural network that uses a unique cell type-specific gene expression feature extraction method. This method extracts critical features through multi-scale convolution while filtering noise. Extensive testing using datasets from various species and comparisons with popular classification methods show that MulCNN has outstanding performance and offers a new and scalable direction for scRNA-seq analysis.
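The multi-scale convolution idea mentioned in this abstract can be illustrated with a small sketch. This is not the MulCNN architecture: the fixed averaging kernels, the kernel widths, the max-pooling step, and the names `conv1d_valid` and `multi_scale_features` are all assumptions chosen for illustration, standing in for weights that a real network would learn.

```python
from typing import List, Sequence


def conv1d_valid(signal: Sequence[float], kernel: Sequence[float]) -> List[float]:
    """Slide `kernel` over `signal` without padding ('valid' mode)."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


def multi_scale_features(
    expression: Sequence[float], widths: Sequence[int] = (2, 4, 8)
) -> List[float]:
    """Return one max-pooled convolution response per kernel width.

    Each kernel is a fixed averaging filter (a hypothetical stand-in for
    learned weights). Wider kernels average over more neighboring genes,
    so an isolated spike contributes less to their response.
    """
    features = []
    for width in widths:
        if width > len(expression):
            raise ValueError("kernel width exceeds expression vector length")
        kernel = [1.0 / width] * width
        response = conv1d_valid(expression, kernel)
        features.append(max(response))  # global max pooling over positions
    return features


# Toy sparse expression vector: one run of co-expressed genes plus one
# isolated low-count entry that plays the role of noise.
cell = [0.0, 0.0, 4.0, 4.0, 4.0, 4.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0]
features = multi_scale_features(cell)
print(features)
```

In this toy vector the run of four expressed genes dominates the responses at widths 2 and 4, while the isolated entry is averaged down at every scale. A trained model would learn its kernels and feed the pooled features to a classifier.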
ABSTRACT
Recent advances in single-cell RNA sequencing (scRNA-seq) have accelerated the development of techniques to classify thousands of cells through transcriptome profiling. As more scRNA-seq data become available, supervised cell type classification methods that use well-annotated external source data are becoming more popular than unsupervised clustering algorithms. However, accurate cell annotation of single-cell transcriptomic data remains a significant challenge. Here, we propose TransCluster, a hybrid network that uses linear discriminant analysis and a modified Transformer to enhance feature learning. TransCluster is a cell-type identification tool for single-cell transcriptomic maps. It shows high accuracy and robustness on many cell data sets from different human tissues and outperforms other known methods on an external test data set. To our knowledge, TransCluster is the first attempt to use a Transformer for annotating cell types in scRNA-seq data, and it greatly improves the accuracy of cell-type identification.