Results 1 - 20 of 64
1.
Comput Biol Med ; 182: 109207, 2024 Sep 27.
Article in English | MEDLINE | ID: mdl-39341115

ABSTRACT

Precise estimations of RNA secondary structures have the potential to reveal the various roles that non-coding RNAs play in regulating cellular activity. However, traditional RNA secondary structure prediction methods rely mainly on thermodynamic models via free energy minimization, a laborious process that requires a lot of prior knowledge. Here, we propose Wfold, an end-to-end deep learning-based approach to RNA secondary structure prediction. Wfold is trained directly on annotated data and base-pairing criteria. It makes use of an image-like representation of RNA sequences, which an enhanced U-net incorporating a transformer encoder can process effectively. Wfold increases the accuracy of RNA secondary structure prediction by combining the self-attention mechanism's mining of long-range information with U-net's ability to gather local information. We evaluate Wfold's performance on within-family and cross-family RNA datasets. When trained and evaluated on different RNA families, it achieves performance similar to that of traditional methods, but it dramatically outperforms state-of-the-art methods on within-family datasets. Moreover, Wfold can also reliably predict pseudoknots. The findings imply that Wfold may be useful for improving sequence alignment, functional annotations, and RNA structure modeling.

2.
ACS Appl Mater Interfaces ; 16(38): 51554-51564, 2024 Sep 25.
Article in English | MEDLINE | ID: mdl-39264852

ABSTRACT

Hydrogenated silsesquioxane (HSQ) is a key inorganic electron beam resist, celebrated for its sub-10 nm resolution and etching resistance, but it faces challenges with stability and sensitivity. Our innovative study has comprehensively assessed the lithographic performance of three functionalized polysilsesquioxane (PSQ) resist series (olefins, halogenated alkanes, and alkanes) under electron beam lithography (EBL). We discovered that the addition of olefin groups, such as in the HMP-30 formulation with 30% propyl acrylate, remarkably increased the sensitivity to 0.6 µC/cm². The inclusion of halogenated aromatic and hydrogen-substituted methyl groups further enhanced sensitivity and contrast, with HClBN-50 achieving a 22.9 nm resolution pattern. At the same time, the storage stability of PSQ resists was significantly improved compared to commercial HSQ with increasing alkane group content. Crucially, our research has unveiled the lithography reaction mechanism, highlighting how group encapsulation and steric hindrance influence PSQ performance. This insight is groundbreaking, offering a deeper understanding of the molecular structure-performance relationship and laying the groundwork for developing next-generation electron beam resists with superior sensitivity, resolution, and contrast for microelectronics manufacturing.

3.
J Colloid Interface Sci ; 676: 158-167, 2024 Dec 15.
Article in English | MEDLINE | ID: mdl-39024816

ABSTRACT

Non-oxidative intercalation of graphite avoids damage to graphene lattices and is a suitable method to produce high-quality graphene. However, the yield of exfoliated graphene is low in this process due to the poor delamination efficiency of guest species. In this study, a Brønsted acid intercalation protocol is developed involving polyoxometalate (POM) clusters (H6P2W18O62) as guests and intercalation of graphite is realized at the sub-nanometer scale. Theoretical simulation based on DFT elucidates the stepwise intercalation mechanism of Brønsted acid molecules and clusters. Unlike common molecules/ionic guests, intercalation of POM clusters induces large expansion and extensive donor-acceptor interactions among graphite interlayers. This significantly weakens the van der Waals forces and promotes exfoliation efficiency of graphene layers. The exfoliated graphene possesses outstanding features of large lateral size, thin thickness, and high purity, and shows excellent performance as the anode for high power sodium-ion batteries. This work proffers a new pathway toward non-oxidative intercalation of graphite for large-scale production of graphene.

4.
Interdiscip Sci ; 16(3): 741-754, 2024 Sep.
Article in English | MEDLINE | ID: mdl-38710957

ABSTRACT

Molecular representation learning can preserve meaningful molecular structures as embedding vectors, which is a necessary prerequisite for molecular property prediction. Yet, learning how to accurately represent molecules remains challenging. Previous approaches to learning molecular representations in an end-to-end manner potentially suffered information loss while neglecting the utilization of molecular generative representations. To obtain rich molecular feature information, the pre-training molecular representation model utilized different molecular representations to reduce information loss caused by a single molecular representation. Therefore, we provide the MVGC, a unique multi-view generative contrastive learning pre-training model. Our pre-training framework specifically acquires knowledge of three fundamental feature representations of molecules and effectively integrates them to predict molecular properties on benchmark datasets. Comprehensive experiments on seven classification tasks and three regression tasks demonstrate that our proposed MVGC model surpasses the majority of state-of-the-art approaches. Moreover, we explore the potential of the MVGC model to learn the representation of molecules with chemical significance.


Subjects
Machine Learning; Algorithms; Models, Molecular
5.
Angew Chem Int Ed Engl ; 63(32): e202401850, 2024 Aug 05.
Article in English | MEDLINE | ID: mdl-38706222

ABSTRACT

The search for high-performance photoresists is important for the semiconductor industry due to the continuous miniaturization and intelligentization of integrated circuits. Polymer resins containing carbonate groups have many desirable properties, such as high transmittance, acid sensitivity and chemical formulation, and thus serve as promising photoresist materials. In this work, a series of aqueous developable CO2-sourced polycarbonates (CO2-PCs) were produced via alternating copolymerization of CO2 and epoxides bearing acid-cleavable cyclic acetal groups in the presence of a tetranuclear organoborane catalyst. The produced CO2-PCs were investigated as chemical amplification resists in deep ultraviolet (DUV) lithography. Under the catalysis of photogenerated acid, the acetal (ketal) groups in CO2-PC were hydrolysed into two equivalents of hydroxyl groups, which changes the exposed area from hydrophobic to hydrophilic, thus enabling the exposed area to be developed with water. Through normalized remaining thickness analysis, the optimal CO2-derived resist achieved a remarkable sensitivity of 1.9 mJ/cm², a contrast of 7.9, a favorable resolution (750 nm, half pitch), and a good etch resistance (38% higher than poly(tert-butyl acrylate)). This performance surpasses that of commercial KrF and ArF chemical amplification resists (i.e., polyhydroxystyrene-derived and polymethacrylate-based resists), suggesting broad application prospects in the field of DUV (KrF and ArF) and extreme ultraviolet (EUV) lithography for nanomanufacturing.

6.
Sci Adv ; 10(22): eadn7553, 2024 May 31.
Article in English | MEDLINE | ID: mdl-38809970

ABSTRACT

Long-range ordered phases in most high-entropy and medium-entropy alloys (HEAs/MEAs) exhibit poor ductility, stemming from their brittle nature of complex crystal structure with specific bonding state. Here, we propose a design strategy to strengthen a single-phase face-centered cubic (fcc) Ni₂CoFeV MEA severalfold by introducing trigonal κ and cubic L1₂ intermetallic phases via hierarchical ordering. The tri-phase MEA has an ultrahigh tensile strength exceeding 1.6 GPa and an outstanding ductility of 30% at room temperature, which surpasses the strength-ductility synergy of most reported HEAs/MEAs. The simultaneous activation of unusual dislocation multiple slip and stacking faults (SFs) in the κ phase, along with nano-SF networks, Lomer-Cottrell locks, and high-density dislocations in the coupled L1₂ and fcc phases, contributes to enhanced strain hardening and excellent ductility. This work offers a promising prototype to design super-strong and ductile structural materials by harnessing the hierarchical ordered phases.

7.
Interdiscip Sci ; 16(2): 361-377, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38457109

ABSTRACT

Accurately predicting compound-protein interactions (CPI) is a critical task in computer-aided drug design. In recent years, the exponential growth of compound activity and biomedical data has highlighted the need for efficient and interpretable prediction approaches. In this study, we propose GraphsformerCPI, an end-to-end deep learning framework that improves prediction performance and interpretability. GraphsformerCPI treats compounds and proteins as sequences of nodes with spatial structures, and leverages novel structure-enhanced self-attention mechanisms to integrate semantic and graph structural features within molecules for deep molecule representations. To capture the vital association between compound atoms and protein residues, we devise a dual-attention mechanism to effectively extract relational features through cross-mapping. By extending the powerful learning capabilities of Transformers to spatial structures and extensively utilizing attention mechanisms, our model offers strong interpretability, a significant advantage over most black-box deep learning methods. To evaluate GraphsformerCPI, extensive experiments were conducted on benchmark datasets including the human, C. elegans, Davis and KIBA datasets. We explored the impact of model depth and dropout rate on performance and compared our model against state-of-the-art baseline models. Our results demonstrate that GraphsformerCPI outperforms baseline models on classification datasets and achieves competitive performance on regression datasets. Specifically, on the human dataset, GraphsformerCPI achieves an average improvement of 1.6% in AUC, 0.5% in precision, and 5.3% in recall. On the KIBA dataset, the average improvement in concordance index (CI) and mean squared error (MSE) is 3.3% and 7.2%, respectively. Molecular docking shows that our model provides novel insights into the intrinsic interactions and binding mechanisms. Our research holds practical significance in effectively predicting CPIs and binding affinities, identifying key atoms and residues, and enhancing model interpretability.
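
The concordance index reported for the KIBA regression task can be illustrated with a short sketch. The abstract does not say how the authors computed it; the version below assumes the common pairwise definition, in which pairs with equal true affinity are skipped and tied predictions count as half. The function name `concordance_index` is illustrative, not taken from the paper.

```python
from itertools import combinations


def concordance_index(y_true, y_pred):
    """Fraction of comparable pairs whose predicted order matches the true order."""
    concordant = 0.0
    comparable = 0
    for (t_i, p_i), (t_j, p_j) in combinations(zip(y_true, y_pred), 2):
        if t_i == t_j:
            continue  # pairs with equal true affinity are not comparable
        comparable += 1
        if p_i == p_j:
            concordant += 0.5  # tied predictions count as half (assumed convention)
        elif (p_i > p_j) == (t_i > t_j):
            concordant += 1.0
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A perfectly ordered prediction gives 1.0, a fully reversed one gives 0.0, and a constant prediction gives 0.5 under this convention.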


Subjects
Deep Learning; Proteins; Humans; Proteins/chemistry; Proteins/metabolism; Animals; Algorithms; Caenorhabditis elegans/metabolism; Drug Design; Protein Binding
8.
J Chem Theory Comput ; 20(7): 2947-2958, 2024 Apr 09.
Article in English | MEDLINE | ID: mdl-38501645

ABSTRACT

The ordered assembly of Tau protein into filaments characterizes Alzheimer's and other neurodegenerative diseases, and thus, stabilization of Tau protein is a promising avenue for tauopathies therapy. To dissect the underlying aggregation mechanisms on Tau, we employ a set of molecular simulations and the Markov state model to determine the kinetics of ensemble of K18. K18 is the microtubule-binding domain of Tau protein and plays a vital role in the microtubule assembly, recycling processes, and amyloid fibril formation. Here, we efficiently explore the conformation of K18 with about 150 µs lifetimes in silico. Our results observe that all four repeat regions (R1-R4) are very dynamic, featuring frequent conformational conversion and lacking stable conformations, and the R2 region is more flexible than the R1, R3, and R4 regions. Additionally, it is worth noting that residues 300-310 in R2-R3 and residues 319-336 in R3 tend to form sheet structures, indicating that K18 has a broader functional role than individual repeat monomers. Finally, the simulations combined with Markov state models and deep learning reveal 5 key conformational states along the transition pathway and provide the information on the microsecond time scale interstate transition rates. Overall, this study offers significant insights into the molecular mechanism of Tau pathological aggregation and develops novel strategies for both securing tauopathies and advancing drug discovery.


Subjects
Deep Learning; Melphalan; Tauopathies; gamma-Globulins; Humans; tau Proteins/metabolism; Amino Acid Sequence; Protein Structure, Secondary
10.
J Mol Graph Model ; 128: 108703, 2024 05.
Article in English | MEDLINE | ID: mdl-38228013

ABSTRACT

Molecular property prediction plays an essential role in drug discovery for identifying the candidate molecules with target properties. Deep learning models usually require sufficient labeled data to train good prediction models. However, the size of labeled data is usually small for molecular property prediction, which brings great challenges to deep learning-based molecular property prediction methods. Furthermore, the global information of molecules is critical for predicting molecular properties. Therefore, we propose INTransformer for molecular property prediction, which is a data augmentation method via contrastive learning to alleviate the limitations of the labeled molecular data while enhancing the ability to capture global information. Specifically, INTransformer consists of two identical Transformer sub-encoders to extract the molecular representation from the original SMILES and noisy SMILES respectively, while achieving the goal of data augmentation. To reduce the influence of noise, we use contrastive learning to ensure the molecular encoding of noisy SMILES is consistent with that of the original input so that the molecular representation information can be better extracted by INTransformer. Experiments on various benchmark datasets show that INTransformer achieved competitive performance for molecular property prediction tasks compared with the baselines and state-of-the-art methods.
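
The idea of keeping the encoding of a noisy SMILES consistent with that of the original can be sketched with a contrastive loss. The abstract does not name the loss, so the InfoNCE form, the cosine similarity, and the `temperature` parameter below are assumptions; the embeddings stand in for the outputs of the two Transformer sub-encoders.

```python
import math


def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(originals, noisy, temperature=0.1):
    """Mean InfoNCE loss: originals[i] should match noisy[i] and not noisy[j]."""
    total = 0.0
    for i, anchor in enumerate(originals):
        logits = [cosine(anchor, other) / temperature for other in noisy]
        peak = max(logits)  # subtract the maximum for numerical stability
        log_denominator = peak + math.log(sum(math.exp(x - peak) for x in logits))
        total += log_denominator - logits[i]
    return total / len(originals)
```

The loss is small when each original embedding is closest to its own noisy counterpart and large when the pairing is scrambled, which is the behaviour the contrastive objective is meant to reward.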


Subjects
Drug Discovery; Electric Power Supplies; Databases, Factual
11.
Comput Biol Chem ; 107: 107972, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37883905

ABSTRACT

Accurately predicting protein-ligand binding affinities is crucial for determining molecular properties and understanding their physical effects. Neural networks and transformers are the predominant methods for sequence modeling, and both have been successfully applied independently for protein-ligand binding affinity prediction. As local and global information of molecules are vital for protein-ligand binding affinity prediction, we aim to combine bi-directional gated recurrent unit (BiGRU) and convolutional neural network (CNN) to effectively capture both local and global molecular information. Additionally, attention mechanisms can be incorporated to automatically learn and adjust the level of attention given to local and global information, thereby enhancing the performance of the model. To achieve this, we propose the PLAsformer approach, which encodes local and global information of molecules using 3DCNN and BiGRU with attention mechanism, respectively. This approach enhances the model's ability to encode comprehensive local and global molecular information. PLAsformer achieved a Pearson's correlation coefficient of 0.812 and a Root Mean Square Error (RMSE) of 1.284 when comparing experimental and predicted affinity on the PDBBind-2016 dataset. These results surpass the current state-of-the-art methods for binding affinity prediction. The high accuracy of PLAsformer's predictive performance, along with its excellent generalization ability, is clearly demonstrated by these findings.
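
The two metrics quoted for PDBBind-2016 (Pearson's correlation coefficient and RMSE between experimental and predicted affinity) follow standard definitions. A minimal sketch in plain Python, with illustrative function names:

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)


def rmse(x, y):
    """Root mean square error between experimental and predicted values."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))
```

Pearson's r measures how well the predictions track the experimental ranking and scale, while RMSE measures the typical absolute error, so a model can score well on one and poorly on the other.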


Subjects
Algorithms; Neural Networks, Computer; Ligands; Proteins/chemistry; Protein Binding
12.
Small ; 19(41): e2303296, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37294167

ABSTRACT

Hard carbons have become the most promising anode candidates for sodium-ion batteries, but poor rate performance and cycle life remain key issues. In this work, N-doped hard carbon with abundant defects and expanded interlayer spacing is constructed by using carboxymethyl cellulose sodium as the precursor with the assistance of graphitic carbon nitride. The formation of the N-doped nanosheet structure is realized by the CN• or CC• radicals generated through the conversion of nitrile intermediates in the pyrolysis process. This greatly enhances the rate capability (192.8 mAh g⁻¹ at 5.0 A g⁻¹) and ultra-long cycle stability (233.3 mAh g⁻¹ after 2000 cycles at 0.5 A g⁻¹). In situ Raman spectroscopy, ex situ X-ray diffraction, and X-ray photoelectron spectroscopy analysis, in combination with comprehensive electrochemical characterizations, reveal interlayer-insertion-coordinated quasi-metallic sodium storage in the low-potential plateau region and adsorption storage in the high-potential sloping region. The first-principles density functional theory calculations further demonstrate a strong coordination effect on nitrogen defect sites to capture sodium, especially with pyrrolic N, uncovering the formation mechanism of the quasi-metallic bond in sodium storage. This work provides new insights into the sodium storage mechanism of high-performance carbonaceous materials, and offers new opportunities for better design of hard carbon anodes.

13.
PLoS One ; 18(5): e0285563, 2023.
Article in English | MEDLINE | ID: mdl-37186596

ABSTRACT

The diffusion phenomena taking place in complex networks, such as the diffusion of diseases, rumors and viruses, are usually modelled as diffusion processes. Identification of the diffusion source is crucial for developing strategies to control these harmful diffusion processes. At present, accurately identifying the diffusion source is still an open challenge. In this paper, we define a kind of diffusion characteristic that is composed of the diffusion direction and time information of observers, and propose a neural-network-based diffusion characteristics classification framework (NN-DCCF) to identify the source. The NN-DCCF contains three stages. First, the diffusion characteristics are utilized to construct a network snapshot feature. Then, a graph LSTM auto-encoder is proposed to convert the network snapshot feature into low-dimensional representation vectors. Further, a source classification neural network is proposed to identify the diffusion source by classifying the representation vectors. With NN-DCCF, the identification of the diffusion source is converted into a classification problem. Experiments are performed on a series of synthetic and real networks. The results show that the NN-DCCF is feasible and effective in accurately identifying the diffusion source.


Subjects
Neural Networks, Computer; Diffusion
14.
J Mol Graph Model ; 122: 108498, 2023 07.
Article in English | MEDLINE | ID: mdl-37126908

ABSTRACT

Innovations in drug-target interactions (DTIs) prediction accelerate the progression of drug development. The introduction of deep learning models has a dramatic impact on DTIs prediction, with a distinct influence on saving time and money in drug discovery. This study develops an end-to-end deep collaborative learning model for DTIs prediction, called EDC-DTI, to identify new targets for existing drugs based on multiple drug-target-related information including homogeneous information and heterogeneous information by the way of deep learning. Our end-to-end model is composed of a feature builder and a classifier. Feature builder consists of two collaborative feature construction algorithms that extract the molecular properties and the topology property of networks, and the classifier consists of a feature encoder and a feature decoder which are designed for feature integration and DTIs prediction, respectively. The feature encoder, mainly based on the improved graph attention network, incorporates heterogeneous information into drug features and target features separately. The feature decoder is composed of multiple neural networks for predictions. Compared with six popular baseline models, EDC-DTI achieves highest predictive performance in the case of low computational costs. Robustness tests demonstrate that EDC-DTI is able to maintain strong predictive performance on sparse datasets. As well, we use the model to predict the most likely targets to interact with Simvastatin (DB00641), Nifedipine (DB01115) and Afatinib (DB08916) as examples. Results show that most of the predictions can be confirmed by literature with clear evidence.


Subjects
Interdisciplinary Placement; Drug Development/methods; Drug Discovery/methods; Neural Networks, Computer; Algorithms
15.
BMC Cardiovasc Disord ; 23(1): 129, 2023 03 10.
Article in English | MEDLINE | ID: mdl-36899310

ABSTRACT

BACKGROUND: This study aims to investigate the value of myocardial work (MW) parameters during the isovolumic relaxation (IVR) period in patients with left ventricular diastolic dysfunction (LVDD). METHODS: This study prospectively recruited 448 patients with risks for LVDD and 95 healthy subjects. An additional 42 patients with invasive measurements of left ventricular (LV) diastolic function were prospectively included. The MW parameters during IVR were noninvasively measured using EchoPAC. RESULTS: The total myocardial work during IVR (MWIVR), myocardial constructive work during IVR (MCWIVR), myocardial wasted work during IVR (MWWIVR), and myocardial work efficiency during IVR (MWEIVR) of these patients were 122.5 ± 60.1 mmHg%, 85.7 ± 47.8 mmHg%, 36.7 ± 30.6 mmHg%, and 69.4 ± 17.8%, respectively. The MW during IVR was significantly different between patients and healthy subjects. For patients, MWEIVR and MCWIVR were significantly correlated with the LV E/e' ratio and left atrial volume index, MWEIVR exhibited a significant correlation with the maximal rate of decrease in LV pressure (dp/dt min) and tau, and the MWEIVR corrected by IVRT also exhibited a significant correlation with tau. CONCLUSIONS: MW during IVR changes significantly in patients with risks for LVDD, and is correlated with conventional LV diastolic indices, including dp/dt min and tau. Noninvasive MW during IVR may be a promising tool to evaluate LV diastolic function.


Subjects
Ventricular Dysfunction, Left; Ventricular Function, Left; Humans; Diastole; Myocardium
16.
Int Heart J ; 64(2): 137-144, 2023 Mar 31.
Article in English | MEDLINE | ID: mdl-36927932

ABSTRACT

Cardiac shockwave therapy (CSWT) is a noninvasive treatment for patients with refractory angina or myocardial ischemia. This study aims to evaluate the potential beneficial effect and safety of CSWT in patients with severe coronary artery disease (CAD) who have undergone coronary artery bypass grafting (CABG). This was a single-arm prospective cohort study. A total of 30 patients with severe CAD who were not suitable for coronary revascularization and who had undergone CABG were enrolled. All patients received CSWT for nine sessions. Evaluation was performed before and after CSWT, including the Canadian Cardiovascular Society (CCS) classification, New York Heart Association (NYHA) classification, 6-minute walk test (6MWT), Seattle Angina Questionnaire (SAQ) score, nitroglycerin dosage, echocardiography, myocardial perfusion imaging (MPI), and safety parameters. All patients were followed up at both 1 month and 9 months after CSWT. After treatment, CSWT significantly improved CCS classification (P < 0.05), NYHA classification (P < 0.05), nitroglycerin dosage (P < 0.001), and 6MWT (P < 0.05) at 1 month and 9 months after CSWT. SAQ score (P < 0.05) and left ventricular ejection fraction (LVEF; P = 0.037) by echocardiography significantly improved at 1 month after CSWT. Significant decreases in summed stress score (SSS), summed difference score (SDS), ischemic area stress, and ischemic area difference by MPI were observed at 1 month and 9 months after CSWT (P < 0.01). There were no changes in safety parameters before and after CSWT. CSWT may have a beneficial effect on improving myocardial perfusion, clinical symptoms, exertional capacity, and quality of life and is a safe alternative treatment for patients with severe CAD who have undergone CABG.


Subjects
Coronary Artery Disease; High-Energy Shock Waves; Humans; Coronary Artery Disease/surgery; Coronary Artery Disease/diagnosis; Nitroglycerin; High-Energy Shock Waves/therapeutic use; Stroke Volume; Prospective Studies; Quality of Life; Treatment Outcome; Ventricular Function, Left; Canada; Coronary Artery Bypass
17.
J Mol Graph Model ; 121: 108454, 2023 06.
Article in English | MEDLINE | ID: mdl-36963306

ABSTRACT

The Simplified Molecular-Input Line-Entry System (SMILES) is one of the most widely used molecular representation methods for molecular property prediction. We conjecture that all the characters in the SMILES string of a molecule are essential for making up the molecule, but most of them make little contribution to determining a particular property of the molecule. We verified this conjecture in a pre-experiment. Motivated by the result, we propose to inject proper noisy information into the SMILES to augment the training data by increasing the diversity of the labeled molecules. To this end, we explore injecting perturbing noise into the original labeled SMILES strings to construct augmented data, alleviating the limitation of the labeled compound data and helping the model extract more useful molecular representations for molecular property prediction. Specifically, we directly apply mask, swap, deletion, and fusion operations to SMILES strings to randomly mask, swap, and delete atoms. The augmented data is then used under one of two strategies: each epoch alternately feeds the original and perturbed noisy molecules, or each batch alternately feeds the original and perturbed noisy molecules. We conduct experiments on both Transformer and BiGRU models to validate the effectiveness of the method on widely used datasets from MoleculeNet and ZINC. Experimental results demonstrate that the proposed method outperforms strong baselines on all the datasets. NoiseMol obtains the best performance on BBBP and FDA when compared with state-of-the-art methods. In addition, NoiseMol achieves the best accuracy on LogP. Therefore, injecting perturbing noise into the labeled SMILES strings is an effective and efficient method, which improves the prediction performance, generalization, and robustness of the deep learning models.
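
The mask, swap, and deletion operations described above can be sketched on a tokenized SMILES string. This is an illustration under stated assumptions, not the authors' implementation: the tokenizer is deliberately simplified (bracket atoms, `Br`, `Cl`, then single characters), the `[MASK]` token and all function names are hypothetical, and the fusion operation is not sketched because the abstract does not describe it.

```python
import random
import re

# Simplified tokenizer (an assumption): bracket atoms, Br/Cl, then any single character.
TOKEN = re.compile(r"\[[^\]]+\]|Br|Cl|.")
# Tokens treated as atoms for the purpose of perturbation.
ATOM = re.compile(r"\[[^\]]+\]|Br|Cl|[BCNOPSFIbcnops]")


def tokenize(smiles):
    return TOKEN.findall(smiles)


def atom_positions(tokens):
    return [i for i, token in enumerate(tokens) if ATOM.fullmatch(token)]


def mask_atom(tokens, rng, mask="[MASK]"):
    """Replace one randomly chosen atom token with a mask token."""
    out = list(tokens)
    out[rng.choice(atom_positions(out))] = mask
    return out


def swap_atoms(tokens, rng):
    """Exchange two randomly chosen atom tokens."""
    out = list(tokens)
    i, j = rng.sample(atom_positions(out), 2)
    out[i], out[j] = out[j], out[i]
    return out


def delete_atom(tokens, rng):
    """Remove one randomly chosen atom token."""
    out = list(tokens)
    del out[rng.choice(atom_positions(out))]
    return out


rng = random.Random(0)  # fixed seed so the perturbations are reproducible
tokens = tokenize("CC(=O)Oc1ccccc1C(=O)O")  # aspirin
noisy_variants = [mask_atom(tokens, rng), swap_atoms(tokens, rng), delete_atom(tokens, rng)]
```

The perturbed strings are not guaranteed to be valid SMILES; in this scheme they serve only as noisy training inputs that keep the original label.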

18.
Appl Intell (Dordr) ; 53(12): 15246-15260, 2023.
Article in English | MEDLINE | ID: mdl-36405344

ABSTRACT

Molecular property prediction is an essential but challenging task in drug discovery. The recurrent neural network (RNN) and Transformer are the mainstream methods for sequence modeling, and both have been successfully applied independently for molecular property prediction. As the local information and global information of molecules are very important for molecular properties, we aim to integrate the bi-directional gated recurrent unit (BiGRU) into the original Transformer encoder, together with self-attention to better capture local and global molecular information simultaneously. To this end, we propose the TranGRU approach, which encodes the local and global information of molecules by using the BiGRU and self-attention, respectively. Then, we use a gate mechanism to reasonably fuse the two molecular representations. In this way, we enhance the ability of the proposed model to encode both local and global molecular information. Compared to the baselines and state-of-the-art methods when treating each task as a single-task classification on Tox21, the proposed approach outperforms the baselines on 9 out of 12 tasks and state-of-the-art methods on 5 out of 12 tasks. TranGRU also obtains the best ROC-AUC scores on BBBP, FDA, LogP, and Tox21 (multitask classification) and has a comparable performance on ToxCast, BACE, and ecoli. On the whole, TranGRU achieves better performance for molecular property prediction. The source code is available in GitHub: https://github.com/Jiangjing0122/TranGRU.
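
The gate that fuses the BiGRU (local) and self-attention (global) representations can be sketched as a sigmoid-weighted interpolation. The abstract does not give the gate's exact form, so the elementwise parameterization below is an assumption; the real model (see the linked repository) most likely uses learned weight matrices rather than per-dimension scalars.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(local, global_, w_local, w_global, bias):
    """Per dimension: gate = sigmoid(w_l*local + w_g*global + b);
    fused = gate*local + (1 - gate)*global."""
    fused = []
    for l, g, wl, wg, b in zip(local, global_, w_local, w_global, bias):
        gate = sigmoid(wl * l + wg * g + b)
        fused.append(gate * l + (1.0 - gate) * g)
    return fused
```

With all weights and biases at zero the gate is 0.5 and the output is the plain average of the two representations; training moves the gate toward whichever view is more informative for each dimension.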

19.
J Mol Graph Model ; 118: 108344, 2023 01.
Article in English | MEDLINE | ID: mdl-36242862

ABSTRACT

Molecular property prediction is a significant task in drug discovery. Most deep learning-based computational methods either develop unique chemical representations or combine complex models. However, researchers are less concerned with the possible advantages of enormous quantities of unlabeled molecular data. Because the amount of labeled data available is clearly limited, this task becomes more difficult. Taking inspiration from natural language processing research and current advances in pretrained models, the SMILES of a drug molecule may in some sense be regarded as a language for chemistry. In this paper, we incorporate Rotary Position Embedding (RoPE) to efficiently encode the position information of SMILES sequences, ultimately enhancing the capability of the BERT pretrained model to extract potential molecular substructure information for molecular property prediction. We propose the MolRoPE-BERT framework, a new end-to-end deep learning framework that integrates an efficient position coding approach for capturing sequence position information with a pretrained BERT model for molecular property prediction. To generate useful molecular substructure embeddings, we first train MolRoPE-BERT exclusively on four million unlabeled drug SMILES (i.e., from ZINC 15 and ChEMBL 27). Then, we conduct a series of experiments to evaluate the performance of our proposed MolRoPE-BERT on four well-studied datasets. Compared with conventional and state-of-the-art baselines, our experiments demonstrate comparable or superior performance.
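
Rotary Position Embedding encodes position by rotating consecutive pairs of vector components through an angle proportional to the token position. The sketch below follows the standard RoPE formulation with frequency base 10000; how MolRoPE-BERT wires this into its attention layers is not stated in the abstract, so the function names and the plain-list representation are illustrative only.

```python
import math


def rope(vector, position, base=10000.0):
    """Rotate each consecutive pair (x, y) by the angle position * base**(-i/d)."""
    d = len(vector)
    if d % 2:
        raise ValueError("dimension must be even")
    out = []
    for i in range(0, d, 2):
        theta = position * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vector[i], vector[i + 1]
        out.extend([x * c - y * s, x * s + y * c])
    return out


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))
```

Because each pair is rotated, the dot product between a rotated query at position m and a rotated key at position n depends only on the offset m - n, which is what lets attention scores carry relative position information.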


Subjects
Drug Discovery
20.
J Chem Inf Model ; 62(17): 4122-4133, 2022 09 12.
Article in English | MEDLINE | ID: mdl-36036609

ABSTRACT

To develop a realistic electrostatic model that allows for the anisotropy of the atomic electron density, high-rank atomic multipole moments computed by quantum chemical calculations have been studied extensively. However, it is hard to process huge RNA systems relying only on quantum chemical calculations due to their high computational cost. In this study, we employ five machine learning methods, namely Gaussian process regression with automatic relevance determination (ARDGPR), Kriging, radial basis function neural networks, Bagging, and generalized regression neural networks, to predict atomic multipole moments. Atom-atom electrostatic interaction energies are subsequently computed using the predicted atomic multipole moments in the pilot system, the pentose of RNA. Here, the performance of the five methods is compared in terms of both the multipole moment prediction errors and the electrostatic energy prediction errors. For the predicted high-rank multipole moments of the four elements (O, C, N, and H) in capped pentose, ARDGPR and Kriging consistently outperform the other three methods. Therefore, the multipole moments predicted by the two best methods, ARDGPR and Kriging, are then used to predict the electrostatic interaction energy of each pentose. Finally, the absolute average energy errors of ARDGPR and Kriging are 1.83 and 4.33 kJ mol⁻¹, respectively. Compared to Kriging, the ARDGPR method achieves a 58% decrease in the absolute average energy error. These satisfactory results demonstrate that the ARDGPR method, with its strong feature extraction ability, can predict the electrostatic interaction energy of pentose in RNA correctly and reliably.
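
The 58% figure follows directly from the two reported errors. A short check, using only the values quoted in the abstract (the function name is illustrative):

```python
def relative_decrease(reference, value):
    """Percentage by which `value` is lower than `reference`."""
    return (reference - value) / reference * 100.0


kriging_error = 4.33  # kJ/mol, absolute average energy error reported for Kriging
ardgpr_error = 1.83   # kJ/mol, absolute average energy error reported for ARDGPR

decrease = relative_decrease(kriging_error, ardgpr_error)
print(f"{decrease:.1f}%")  # 57.7%
```

The result, (4.33 - 1.83) / 4.33, rounds to the 58% stated in the abstract.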


Subjects
Pentoses; RNA; Machine Learning; Normal Distribution; Static Electricity