Results 1 - 11 of 11
1.
J Chem Inf Model ; 64(4): 1112-1122, 2024 Feb 26.
Article in English | MEDLINE | ID: mdl-38315002

ABSTRACT

Molecular pretraining, which learns molecular representations over massive unlabeled data, has become a prominent paradigm to solve a variety of tasks in computational chemistry and drug discovery. Recently, substantial progress has been made in molecular pretraining with different molecular featurizations, including 1D SMILES strings, 2D graphs, and 3D geometries. However, the role of molecular featurizations with their corresponding neural architectures in molecular pretraining remains largely unexamined. In this paper, through two case studies (chirality classification and aromatic ring counting), we first demonstrate that different featurization techniques convey chemical information differently. In light of this observation, we propose a simple and effective MOlecular pretraining framework with COllaborative featurizations (MOCO). MOCO comprehensively leverages multiple featurizations that complement each other and outperforms existing state-of-the-art models that solely rely on one or two featurizations on a wide range of molecular property prediction tasks.


Subjects
Computational Chemistry; Drug Discovery; Learning
2.
Nature ; 620(7972): 47-60, 2023 Aug.
Article in English | MEDLINE | ID: mdl-37532811

ABSTRACT

Artificial intelligence (AI) is being increasingly integrated into scientific discovery to augment and accelerate research, helping scientists to generate hypotheses, design experiments, collect and interpret large datasets, and gain insights that might not have been possible using traditional scientific methods alone. Here we examine breakthroughs over the past decade that include self-supervised learning, which allows models to be trained on vast amounts of unlabelled data, and geometric deep learning, which leverages knowledge about the structure of scientific data to enhance model accuracy and efficiency. Generative AI methods can create designs, such as small-molecule drugs and proteins, by analysing diverse data modalities, including images and sequences. We discuss how these methods can help scientists throughout the scientific process and the central issues that remain despite such advances. Both developers and users of AI tools need a better understanding of when such approaches need improvement, and challenges posed by poor data quality and stewardship remain. These issues cut across scientific disciplines and require developing foundational algorithmic approaches that can contribute to scientific understanding or acquire it autonomously, making them critical areas of focus for AI innovation.


Subjects
Artificial Intelligence; Research Design; Artificial Intelligence/standards; Artificial Intelligence/trends; Datasets as Topic; Deep Learning; Research Design/standards; Research Design/trends; Unsupervised Machine Learning
4.
Nat Comput Sci ; 3(12): 1045-1055, 2023 Dec.
Article in English | MEDLINE | ID: mdl-38177724

ABSTRACT

Transition state search is key in chemistry for elucidating reaction mechanisms and exploring reaction networks. The search for accurate 3D transition state structures, however, requires numerous computationally intensive quantum chemistry calculations due to the complexity of potential energy surfaces. Here we developed an object-aware SE(3) equivariant diffusion model that satisfies all physical symmetries and constraints for generating sets of structures (reactant, transition state and product) in an elementary reaction. Given the reactant and product, this model generates a transition state structure in seconds instead of the hours typically required when performing quantum-chemistry-based optimizations. The generated transition state structures achieve a median of 0.08 Å root mean square deviation compared to the true transition state. With a confidence scoring model for uncertainty quantification, we approach an accuracy required for reaction barrier estimation (2.6 kcal mol⁻¹) by only performing quantum chemistry-based optimizations on 14% of the most challenging reactions. We envision usefulness for our approach in constructing large reaction networks with unknown mechanisms.
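
For readers unfamiliar with the metric, the root mean square deviation quoted in this abstract can be sketched as follows. This is a minimal illustration only: it assumes both structures list the same atoms in the same order and are already superimposed, and the function name `rmsd` and the sample coordinates are hypothetical, not taken from the paper.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between two pre-aligned structures.

    Each argument is a list of (x, y, z) tuples in angstroms, with atoms
    in the same order. No superposition is performed here (assumption).
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("structures must be non-empty and the same size")
    squared = sum(
        (ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
        for (ax, ay, az), (bx, by, bz) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


# Hypothetical three-atom example: one atom displaced by 0.3 angstrom along x.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
generated = [(0.3, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
print(round(rmsd(reference, generated), 4))
```

In practice the two structures would first be optimally superimposed (for example with the Kabsch algorithm) before the deviation is computed; that step is omitted here for brevity.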

5.
Bioinformatics ; 38(12): 3200-3208, 2022 Jun 13.
Article in English | MEDLINE | ID: mdl-35511125

ABSTRACT

MOTIVATION: Expanding our knowledge of small molecules beyond what is known in nature or designed in wet laboratories promises to significantly advance cheminformatics, drug discovery, biotechnology and materials science. In silico molecular design remains challenging, primarily due to the complexity of the chemical space and the non-trivial relationship between chemical structures and biological properties. Deep generative models that learn directly from data are intriguing, but they have yet to demonstrate interpretability in the learned representation, so that we can learn more about the relationship between the chemical and biological space. In this article, we advance research on disentangled representation learning for small molecule generation. We build on recent work by us and others on deep graph generative frameworks, which capture atomic interactions via a graph-based representation of a small molecule. The methodological novelty is how we leverage the concept of disentanglement in the graph variational autoencoder framework both to generate biologically relevant small molecules and to enhance model interpretability. RESULTS: Extensive qualitative and quantitative experimental evaluation in comparison with state-of-the-art models demonstrates the superiority of our disentanglement framework. We believe this work is an important step to address key challenges in small molecule generation with deep generative frameworks. AVAILABILITY AND IMPLEMENTATION: Training and generated data are made available at https://ieee-dataport.org/documents/dataset-disentangled-representation-learning-interpretable-molecule-generation. All code is made available at https://anonymous.4open.science/r/D-MolVAE-2799/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Deep Learning; Drug Discovery
6.
Int J Comput Assist Radiol Surg ; 16(5): 749-756, 2021 May.
Article in English | MEDLINE | ID: mdl-33864189

ABSTRACT

PURPOSE: Pelvic bone segmentation in CT has always been an essential step in clinical diagnosis and surgery planning of pelvic bone diseases. Existing methods for pelvic bone segmentation are either hand-crafted or semi-automatic and achieve limited accuracy when dealing with image appearance variations due to the multi-site domain shift, the presence of contrasted vessels, coprolith and chyme, bone fractures, low dose, metal artifacts, etc. Due to the lack of a large-scale pelvic CT dataset with annotations, deep learning methods are not fully explored. METHODS: In this paper, we aim to bridge the data gap by curating a large pelvic CT dataset pooled from multiple sources, including 1184 CT volumes with a variety of appearance variations. Then, we propose for the first time, to the best of our knowledge, to learn a deep multi-class network for segmenting lumbar spine, sacrum, left hip, and right hip, from multiple-domain images simultaneously to obtain more effective and robust feature representations. Finally, we introduce a post-processor based on the signed distance function (SDF). RESULTS: Extensive experiments on our dataset demonstrate the effectiveness of our automatic method, achieving an average Dice of 0.987 for a metal-free volume. The SDF post-processor yields a decrease of 15.1% in Hausdorff distance compared with a traditional post-processor. CONCLUSION: We believe this large-scale dataset will promote the development of the whole community, and we open-source the images, annotations, code, and trained baseline models at https://github.com/ICT-MIRACLE-lab/CTPelvic1K.
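
As a reading aid, the Dice score reported in this abstract measures the overlap between a predicted mask and a ground-truth mask. The sketch below is a minimal illustration on flattened binary masks; the function name `dice_coefficient`, the handling of two empty masks, and the toy data are assumptions made for this example and are not taken from the paper or its repository.

```python
def dice_coefficient(pred, truth):
    """Dice overlap 2|A and B| / (|A| + |B|) for two flattened binary masks.

    `pred` and `truth` are equal-length sequences of 0/1 voxel labels.
    """
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    if total == 0:
        # Both masks empty: treated as perfect agreement (assumed convention).
        return 1.0
    return 2.0 * intersection / total


# Hypothetical six-voxel example: 2 voxels overlap, 3 predicted, 2 true.
pred = [1, 1, 1, 0, 0, 0]
truth = [1, 1, 0, 0, 0, 0]
print(dice_coefficient(pred, truth))
```

For a multi-class segmentation such as the four bone classes named in the abstract, the same function would typically be applied to one binary mask per class and the results averaged.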


Subjects
Deep Learning; Image Processing, Computer-Assisted/methods; Pelvic Bones/diagnostic imaging; Pelvis/diagnostic imaging; Pelvis/surgery; Tomography, X-Ray Computed/methods; Algorithms; Humans; Pattern Recognition, Automated; Reproducibility of Results
7.
Molecules ; 26(5), 2021 Feb 24.
Article in English | MEDLINE | ID: mdl-33668217

ABSTRACT

Protein molecules are inherently dynamic and modulate their interactions with different molecular partners by accessing different tertiary structures under physiological conditions. Elucidating such structures remains challenging. Current momentum in deep learning and the powerful performance of generative adversarial networks (GANs) in complex domains, such as computer vision, inspire us to investigate GANs on their ability to generate physically realistic protein tertiary structures. The analysis presented here shows that several GAN models fail to capture complex, distal structural patterns present in protein tertiary structures. The study additionally reveals that mechanisms touted as effective in stabilizing the training of a GAN model are not all effective, and that performance based on loss alone may be orthogonal to performance based on the quality of generated datasets. A novel contribution in this study is the demonstration that Wasserstein GAN strikes a good balance and manages to capture both local and distal patterns, thus presenting a first step towards more powerful deep generative models for exploring a possibly very diverse set of structures supporting diverse activities of a protein molecule in the cell.


Subjects
Neural Networks, Computer; Proteins/chemistry; Protein Structure, Tertiary
8.
Bioinform Adv ; 1(1): vbab036, 2021.
Article in English | MEDLINE | ID: mdl-36700110

ABSTRACT

Motivation: Modeling the structural plasticity of protein molecules remains challenging. Most research has focused on obtaining one biologically active structure. This includes the recent AlphaFold2 that has been hailed as a breakthrough for protein modeling. Computing one structure does not suffice to understand how proteins modulate their interactions and even evade our immune system. Revealing the structure space available to a protein remains challenging. Data-driven approaches that learn to generate tertiary structures are increasingly garnering attention. These approaches exploit the ability to represent tertiary structures as contact or distance maps and make direct analogies with images to harness convolution-based generative adversarial frameworks from computer vision. Since such opportunistic analogies do not allow capturing highly structured data, current deep models struggle to generate physically realistic tertiary structures. Results: We present novel deep generative models that build upon the graph variational autoencoder framework. In contrast to existing literature, we represent tertiary structures as 'contact' graphs, which allow us to leverage graph-generative deep learning. Our models are able to capture rich, local and distal constraints and additionally compute disentangled latent representations that reveal the impact of individual latent factors. This elucidates what the factors control and makes our models more interpretable. Rigorous comparative evaluation along various metrics shows that the models we propose advance the state-of-the-art. While there is still much ground to cover, the work presented here is an important first step, and graph-generative frameworks promise to get us to our goal of unraveling the exquisite structural complexity of protein molecules. Availability and implementation: Code is available at https://github.com/anonymous1025/CO-VAE. Supplementary information: Supplementary data are available at Bioinformatics Advances online.
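
To make the 'contact' graph representation mentioned in this abstract concrete, the sketch below builds such a graph from residue coordinates. It is an illustration under stated assumptions, not the authors' code: the function name `contact_graph` is hypothetical, one coordinate per residue (for example the Cα atom) is assumed, and the 8 Å cutoff is a commonly used convention that the abstract does not specify.

```python
import math


def contact_graph(residue_coords, cutoff=8.0):
    """Return the edge set of a residue contact graph.

    `residue_coords` holds one (x, y, z) tuple per residue, in angstroms.
    Two residues i < j are joined by an edge when their distance is at
    most `cutoff` (the 8.0 default is an assumed convention).
    """
    edges = set()
    n = len(residue_coords)
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(residue_coords[i], residue_coords[j]) <= cutoff:
                edges.add((i, j))
    return edges


# Hypothetical four-residue chain laid out on a line, 3.8 angstroms apart.
chain = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
edges = contact_graph(chain)
print(sorted(edges))
```

The resulting edge set is the adjacency information a graph-based generative model would consume; a contact map is the same information written as a symmetric 0/1 matrix.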

9.
Talanta ; 202: 580-590, 2019 Sep 01.
Article in English | MEDLINE | ID: mdl-31171224

ABSTRACT

Aliphatic aldehydes are an important class of organic materials and precursors. However, some of them have been proven dangerous or harmful to health, especially those that occur in food and cosmetics. Here, we report a rapid and efficient one-step membrane-protected micro-solid-phase extraction and derivatization (µ-SPE-D) method for the selective determination of trace aliphatic aldehydes in complex cosmetic and food samples by coupling to high-performance liquid chromatography (HPLC). In this method, the one-step extraction and derivatization strategy reduced the sample preparation time; the membrane-protected solid-phase extraction technology eliminated matrix interference; and the efficient sorbent mono-(6-(diethylenetriamine)-6-deoxy)-beta-cyclodextrin-poly(styrene-divinylbenzene-methacrylic acid) (NH2-β-CD-Poly(St-DVB-MAA)) guaranteed the selective adsorption of aliphatic aldehydes. The sorbent, NH2-β-CD-Poly(St-DVB-MAA), was prepared via chemical fabrication and optimized carefully. The µ-SPE device and the extraction and derivatization conditions were optimized. The mechanism of extraction and derivatization was also discussed. When coupled to HPLC, the method detection limits were as low as 0.024-2.5 µg/L, and the method quantification limits were 0.081-7.6 µg/L, with relative standard deviations (RSDs) in the range of 2.2-7.7%. Finally, this membrane-protected µ-SPE-D-HPLC method was successfully applied to real sample analysis. In cosmetics analysis, aliphatic aldehydes were found and quantified in the range of 9.0-750.2 µg/kg, and the recoveries were in the range of 81.7-114.9% with RSDs less than 8.3% from toner and moisturizer samples. Meanwhile, aliphatic aldehydes were found and quantified in the range of 1.3-31.3 µg/kg, and the recoveries were in the range of 80.8-114.4% with RSDs less than 7.1% from fried food samples. Our results showed that only 5 min was needed for the extraction and derivatization processes. Furthermore, the method detection limits of formaldehyde were an order of magnitude lower than those in literature reports, which reveals the excellent selectivity of this membrane-protected µ-SPE-D-HPLC method.
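
The relative standard deviation and recovery figures quoted in this abstract follow standard analytical definitions, which can be sketched as below. This is a minimal illustration: the function names and the replicate values are hypothetical, and the spike-recovery formula shown is the usual textbook form, which the abstract itself does not spell out.

```python
import statistics


def relative_standard_deviation(values):
    """RSD (%) = 100 * sample standard deviation / mean of replicates."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("RSD is undefined when the mean is zero")
    return 100.0 * statistics.stdev(values) / mean


def spike_recovery(measured_spiked, measured_unspiked, amount_added):
    """Recovery (%) = 100 * (spiked result - unspiked result) / amount added."""
    return 100.0 * (measured_spiked - measured_unspiked) / amount_added


# Hypothetical triplicate measurements of one sample, in µg/kg.
replicates = [9.8, 10.0, 10.2]
print(round(relative_standard_deviation(replicates), 2))

# Hypothetical spike: 10.0 µg/kg added to a sample that read 10.0 µg/kg.
print(round(spike_recovery(19.0, 10.0, 10.0), 1))
```

A recovery near 100% indicates that the extraction and derivatization step neither loses nor overestimates the analyte, which is how ranges such as those reported for the cosmetic and fried food samples are usually read.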


Subjects
Aldehydes/analysis; Cosmetics/chemistry; Food Contamination/analysis; Solid Phase Microextraction; Chromatography, High Pressure Liquid
10.
Se Pu ; 36(7): 579-587, 2018 Jul 08.
Article in Chinese | MEDLINE | ID: mdl-30136528

ABSTRACT

Derivatization is an effective method to analyze substances, in which the analyte is converted into a more suitable form for analysis. As one of the most used pre-column derivatization methods, in situ derivatization can simultaneously extract and derivatize the analyte in a sample matrix, resulting in good efficiency, sensitivity, and selectivity of the analytical method. Recently, in situ derivatization combined with sample pretreatment techniques has been widely used in biological, drug, food, environmental, and cosmetic samples for the analysis of trace amines, aldehydes and ketones, alcohols, phenols, carboxylic acids, and thiols. This review summarizes the reaction types and representative derivatization reagents of in situ derivatization. The applications of in situ derivatization in liquid chromatography (LC) and liquid chromatography-mass spectrometry (LC-MS) analysis are examined. Furthermore, the long-term prospects and potential applications of in situ derivatization are discussed.


Subjects
Chromatography, Liquid; Tandem Mass Spectrometry; Alcohols; Aldehydes; Amines; Carboxylic Acids; Cosmetics/analysis; Food Analysis; Indicators and Reagents; Ketones; Pharmaceutical Preparations/analysis; Phenols; Sulfhydryl Compounds
11.
J Chromatogr A ; 1554: 37-44, 2018 Jun 15.
Article in English | MEDLINE | ID: mdl-29703597

ABSTRACT

Nowadays, the safety of cosmetics is a widespread concern. Amines are common cosmetic additives. Some of them, such as amino acids, are beneficial. Another amine, however, ε-aminocaproic acid (EACA), is prohibited from being added to cosmetics because of its adverse reactions. In this study, a simple, rapid, sensitive and eco-friendly one-step ultrasonic-assisted extraction and derivatization (UAE-D) method was developed for the determination of EACA and amino acids in cosmetics by coupling with high-performance liquid chromatography (HPLC). Using this sample preparation method, extraction and derivatization of EACA and amino acids were completed in one step in an ultrasound field. During this procedure, 4-fluoro-7-nitrobenzofurazan (NBD-F) was applied as the derivatization reagent. The extraction conditions, including the amount of NBD-F, the extraction and derivatization temperature, the ultrasonic vibration time and the pH value of the aqueous phase, were evaluated. Meanwhile, the extraction mechanism was investigated. Under optimized conditions, the method detection limits were 0.086-0.15 µg/L, and the method quantitation limits were 0.29-0.47 µg/L with RSDs less than 3.7% (n = 3). The recoveries of EACA and amino acids obtained from cosmetic samples ranged from 76.9% to 122.3%. Amino acids were found in all selected samples and quantified in the range from 1.9 ± 0.9 to 677.2 ± 17.9 µg/kg. EACA was found and quantified at 1284.3 ± 22.1 µg/kg in a toner sample. This UAE-D-HPLC method shortened and simplified the sample pretreatment and enhanced the sensitivity of the analytical method. In our experiments, only 10 min was needed for the total sample preparation process, and the method detection limits were two orders of magnitude lower than those in literature reports. Furthermore, we reduced the consumption of solvent and minimized the usage of organic solvents, which moves our method towards green analytical chemistry. In brief, our UAE-D-HPLC method is a simple, rapid, sensitive and eco-friendly analytical method for the determination of EACA and amino acids in cosmetics.


Subjects
Amino Acids/analysis; Aminocaproic Acid/analysis; Chromatography, High Pressure Liquid; Cosmetics/chemistry; Solid Phase Extraction/methods; 4-Chloro-7-nitrobenzofurazan/analogs & derivatives; 4-Chloro-7-nitrobenzofurazan/chemistry; Amino Acids/isolation & purification; Aminocaproic Acid/isolation & purification; Hydrogen-Ion Concentration; Limit of Detection; Solvents/chemistry; Sonication; Temperature