1.
bioRxiv ; 2024 Apr 15.
Article in English | MEDLINE | ID: mdl-38659952

ABSTRACT

Cells have evolved mechanisms to distribute ~10 billion protein molecules to subcellular compartments where diverse proteins involved in shared functions must efficiently assemble. Such assembly is presumed to unfold as a result of specific interactions between biomolecules; however, recent evidence suggests that distinctive chemical environments within subcellular compartments may also play an important role. Here, we test the hypothesis that protein groups with shared functions also share codes that guide them to compartment destinations. To test our hypothesis, we developed a transformer large language model, called ProtGPS, that predicts with high performance the compartment localization of human proteins excluded from the training set. We then demonstrate ProtGPS can be used for guided generation of novel protein sequences that selectively assemble into specific compartments in cells. Furthermore, ProtGPS predictions were sensitive to disease-associated mutations that produce changes in protein compartmentalization, suggesting that this type of pathogenic dysfunction can be discovered in silico. Our results indicate that protein sequences contain not only a folding code, but also a previously unrecognized chemical code governing their distribution in specific cellular compartments.

2.
ArXiv ; 2024 Feb 28.
Article in English | MEDLINE | ID: mdl-38463508

ABSTRACT

Accurate blind docking has the potential to lead to new biological breakthroughs, but for this promise to be realized, docking methods must generalize well across the proteome. Existing benchmarks, however, fail to rigorously assess generalizability. Therefore, we develop DockGen, a new benchmark based on the ligand-binding domains of proteins, and we show that existing machine learning-based docking models have very weak generalization abilities. We carefully analyze the scaling laws of ML-based docking and show that, by scaling data and model size, as well as integrating synthetic data strategies, we are able to significantly increase the generalization capacity and set new state-of-the-art performance across benchmarks. Further, we propose Confidence Bootstrapping, a new training paradigm that solely relies on the interaction between diffusion and confidence models and exploits the multi-resolution generation process of diffusion models. We demonstrate that Confidence Bootstrapping significantly improves the ability of ML-based docking methods to dock to unseen protein classes, edging closer to accurate and generalizable blind docking methods.

3.
ArXiv ; 2024 Jan 08.
Article in English | MEDLINE | ID: mdl-38259348

ABSTRACT

Protein design often begins with knowledge of a desired function encoded in a motif, around which motif-scaffolding aims to construct a functional protein. Recently, generative models have achieved breakthrough success in designing scaffolds for a diverse range of motifs. However, the generated scaffolds tend to lack structural diversity, which can hinder success in wet-lab validation. In this work, we extend FrameFlow, an SE(3) flow matching model for protein backbone generation, to perform motif-scaffolding with two complementary approaches. The first is motif amortization, in which FrameFlow is trained with the motif as input using a data augmentation strategy. The second is motif guidance, which performs scaffolding using an estimate of the conditional score from FrameFlow, and requires no additional training. Both approaches achieve an equivalent or higher success rate than previous state-of-the-art methods, with 2.5 times more structurally diverse scaffolds. Code: https://github.com/microsoft/frame-flow.

4.
Nat Chem Biol ; 20(3): 291-301, 2024 Mar.
Article in English | MEDLINE | ID: mdl-37770698

ABSTRACT

Diverse mechanisms have been described for selective enrichment of biomolecules in membrane-bound organelles, but less is known about mechanisms by which molecules are selectively incorporated into biomolecular assemblies such as condensates that lack surrounding membranes. The chemical environments within condensates may differ from those outside these bodies, and if these differed among various types of condensate, then the different solvation environments would provide a mechanism for selective distribution among these intracellular bodies. Here we use small molecule probes to show that different condensates have distinct chemical solvating properties and that selective partitioning of probes in condensates can be predicted with deep learning approaches. Our results demonstrate that different condensates harbor distinct chemical environments that influence the distribution of molecules, show that clues to condensate chemical grammar can be ascertained by machine learning and suggest approaches to facilitate development of small molecule therapeutics with optimal subcellular distribution and therapeutic benefit.


Subjects
Biomolecular Condensates; Machine Learning
5.
Science ; 382(6677): eadi1407, 2023 Dec 22.
Article in English | MEDLINE | ID: mdl-38127734

ABSTRACT

A closed-loop, autonomous molecular discovery platform driven by integrated machine learning tools was developed to accelerate the design of molecules with desired properties. We demonstrated two case studies on dye-like molecules, targeting absorption wavelength, lipophilicity, and photooxidative stability. In the first study, the platform experimentally realized 294 unreported molecules across three automatic iterations of molecular design-make-test-analyze cycles while exploring the structure-function space of four rarely reported scaffolds. In each iteration, the property prediction models that guided exploration learned the structure-property space of diverse scaffold derivatives, which were realized with multistep syntheses and a variety of reactions. The second study exploited property models trained on the explored chemical space and previously reported molecules to discover nine top-performing molecules within a lightly explored structure-property space.

6.
Syst Rev ; 12(1): 187, 2023 10 06.
Article in English | MEDLINE | ID: mdl-37803451

ABSTRACT

BACKGROUND: Evidence-based medicine requires synthesis of research through rigorous and time-intensive systematic literature reviews (SLRs), with significant resource expenditure for data extraction from scientific publications. Machine learning may enable the timely completion of SLRs and reduce errors by automating data identification and extraction. METHODS: We evaluated the use of machine learning to extract data from publications related to SLRs in oncology (SLR 1) and Fabry disease (SLR 2). SLR 1 predominantly contained interventional studies and SLR 2 observational studies. Predefined key terms and data were manually annotated to train and test bidirectional encoder representations from transformers (BERT) and bidirectional long-short-term memory machine learning models. Using human annotation as a reference, we assessed the ability of the models to identify biomedical terms of interest (entities) and their relations. We also pretrained BERT on a corpus of 100,000 open access clinical publications and/or enhanced context-dependent entity classification with a conditional random field (CRF) model. Performance was measured using the F1 score, a metric that combines precision and recall. We defined successful matches as partial overlap of entities of the same type. RESULTS: For entity recognition, the pretrained BERT+CRF model had the best performance, with an F1 score of 73% in SLR 1 and 70% in SLR 2. Entity types identified with the highest accuracy were metrics for progression-free survival (SLR 1, F1 score 88%) or for patient age (SLR 2, F1 score 82%). Treatment arm dosage was identified less successfully (F1 scores 60% [SLR 1] and 49% [SLR 2]). The best-performing model for relation extraction, pretrained BERT relation classification, exhibited F1 scores higher than 90% in cases with at least 80 relation examples for a pair of related entity types. 
CONCLUSIONS: The performance of BERT is enhanced by pretraining with biomedical literature and by combining with a CRF model. With refinement, machine learning may assist with manual data extraction for SLRs.
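The scoring rule described in this abstract (F1 computed from precision and recall, with a match counted when entities of the same type partially overlap) can be illustrated with a minimal sketch. The `(start, end, type)` span representation and the function names below are assumptions made for illustration; they are not taken from the study's code.

```python
from typing import List, Tuple

# Hypothetical representation: (start offset, exclusive end offset, entity type).
Entity = Tuple[int, int, str]


def overlaps(a: Entity, b: Entity) -> bool:
    """True when two spans share a type and at least one character position."""
    return a[2] == b[2] and a[0] < b[1] and b[0] < a[1]


def partial_match_f1(gold: List[Entity], pred: List[Entity]) -> float:
    """F1 where a partial overlap of same-type entities counts as a match."""
    # Precision: fraction of predicted entities that overlap some gold entity.
    matched_pred = sum(1 for p in pred if any(overlaps(p, g) for g in gold))
    # Recall: fraction of gold entities that overlap some predicted entity.
    matched_gold = sum(1 for g in gold if any(overlaps(g, p) for p in pred))
    precision = matched_pred / len(pred) if pred else 0.0
    recall = matched_gold / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    # F1 is the harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


gold = [(0, 5, "AGE"), (10, 20, "DOSE"), (30, 35, "PFS")]
pred = [(2, 6, "AGE"), (10, 15, "PFS"), (30, 35, "PFS")]
print(partial_match_f1(gold, pred))
```

In this toy example the second prediction covers the right text but carries the wrong type, so it counts against both precision and recall.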


Subjects
Evidence-Based Medicine; Health Expenditures; Humans; Machine Learning; Medical Oncology
7.
Sci Rep ; 13(1): 18611, 2023 10 30.
Article in English | MEDLINE | ID: mdl-37903855

ABSTRACT

A validated open-source deep-learning algorithm called Sybil can accurately predict long-term lung cancer risk from a single low-dose chest computed tomography (LDCT). However, Sybil was trained on a majority-male cohort. Use of artificial intelligence algorithms trained on imbalanced cohorts may lead to inequitable outcomes in real-world settings. We aimed to study whether Sybil predicts lung cancer risk equally regardless of sex. We analyzed 10,573 LDCTs from 6127 consecutive lung cancer screening participants across a health system between 2015 and 2021. Sybil achieved AUCs of 0.89 (95% CI: 0.85-0.93) for females and 0.89 (95% CI: 0.85-0.94) for males at 1 year, p = 0.92. At 6 years, the AUC was 0.87 (95% CI: 0.83-0.93) for females and 0.79 (95% CI: 0.72-0.86) for males, p = 0.01. In conclusion, Sybil can accurately predict future lung cancer risk in females and males in a real-world setting and performs better in females than in males for predicting 6-year lung cancer risk.
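The per-sex AUC comparison reported above can be sketched as follows. This is only an illustration of how an AUC is computed and stratified by a grouping variable, using the rank-based (Mann-Whitney) definition. The function names and toy data are assumptions, and the abstract does not state how the confidence intervals or p-values were obtained, so those are not reproduced here.

```python
from typing import Dict, Hashable, List, Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a positive case outscores a negative one."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


def auc_by_group(
    labels: Sequence[int], scores: Sequence[float], groups: Sequence[Hashable]
) -> Dict[Hashable, float]:
    """Compute a separate AUC for every distinct value in `groups`."""
    result: Dict[Hashable, float] = {}
    for g in set(groups):
        idx: List[int] = [i for i, x in enumerate(groups) if x == g]
        result[g] = roc_auc([labels[i] for i in idx], [scores[i] for i in idx])
    return result


# Toy data only; not taken from the study.
labels = [1, 0, 1, 0, 1, 0, 0, 1]
scores = [0.9, 0.1, 0.4, 0.5, 0.8, 0.3, 0.2, 0.7]
sexes = ["F", "F", "M", "M", "F", "F", "M", "M"]
print(auc_by_group(labels, scores, sexes))
```

The pairwise loop is quadratic in the number of cases; it is written this way for clarity rather than speed.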


Subjects
Lung Neoplasms; Female; Humans; Male; Lung Neoplasms/diagnostic imaging; Lung Neoplasms/epidemiology; Early Detection of Cancer/methods; Artificial Intelligence; Tomography, X-Ray Computed/methods; Risk
8.
Thorac Surg Clin ; 33(4): 401-409, 2023 Nov.
Article in English | MEDLINE | ID: mdl-37806742

ABSTRACT

Recent advances in artificial intelligence and machine learning (AI/ML) hold substantial promise to address some of the current challenges in lung cancer screening and improve health equity. This article reviews the status and future directions of AI/ML tools in the lung cancer screening workflow, focusing on determining screening eligibility, radiation dose reduction and image denoising for low-dose chest computed tomography (CT), lung nodule detection, lung nodule classification, and determining optimal screening intervals. AI/ML tools can assess for chronic diseases on CT, which creates opportunities to improve population health through opportunistic screening.


Subjects
Early Detection of Cancer; Lung Neoplasms; Humans; Artificial Intelligence; Lung Neoplasms/diagnostic imaging; Machine Learning; Tomography, X-Ray Computed
10.
Nature ; 620(7976): 1089-1100, 2023 Aug.
Article in English | MEDLINE | ID: mdl-37433327

ABSTRACT

There has been considerable recent progress in designing new proteins using deep-learning methods [1-9]. Despite this progress, a general deep-learning framework for protein design that enables solution of a wide range of design challenges, including de novo binder design and design of higher-order symmetric architectures, has yet to be described. Diffusion models [10,11] have had considerable success in image and language generative modelling but limited success when applied to protein modelling, probably due to the complexity of protein backbone geometry and sequence-structure relationships. Here we show that by fine-tuning the RoseTTAFold structure prediction network on protein structure denoising tasks, we obtain a generative model of protein backbones that achieves outstanding performance on unconditional and topology-constrained protein monomer design, protein binder design, symmetric oligomer design, enzyme active site scaffolding and symmetric motif scaffolding for therapeutic and metal-binding protein design. We demonstrate the power and generality of the method, called RoseTTAFold diffusion (RFdiffusion), by experimentally characterizing the structures and functions of hundreds of designed symmetric assemblies, metal-binding proteins and protein binders. The accuracy of RFdiffusion is confirmed by the cryogenic electron microscopy structure of a designed binder in complex with influenza haemagglutinin that is nearly identical to the design model. In a manner analogous to networks that produce images from user-specified inputs, RFdiffusion enables the design of diverse functional proteins from simple molecular specifications.


Subjects
Deep Learning; Proteins; Catalytic Domain; Cryoelectron Microscopy; Hemagglutinin Glycoproteins, Influenza Virus/chemistry; Hemagglutinin Glycoproteins, Influenza Virus/metabolism; Hemagglutinin Glycoproteins, Influenza Virus/ultrastructure; Protein Binding; Proteins/chemistry; Proteins/metabolism; Proteins/ultrastructure
11.
J Chem Inf Model ; 63(13): 4030-4041, 2023 07 10.
Article in English | MEDLINE | ID: mdl-37368970

ABSTRACT

Reaction diagram parsing is the task of extracting reaction schemes from a diagram in the chemistry literature. The reaction diagrams can be arbitrarily complex; thus, robustly parsing them into structured data is an open challenge. In this paper, we present RxnScribe, a machine learning model for parsing reaction diagrams of varying styles. We formulate this structured prediction task with a sequence generation approach, which condenses the traditional pipeline into an end-to-end model. We train RxnScribe on a dataset of 1378 diagrams and evaluate it with cross validation, achieving an 80.0% soft match F1 score, with significant improvements over previous models. Our code and data are publicly available at https://github.com/thomas0809/RxnScribe.


Subjects
Machine Learning
12.
Nat Chem Biol ; 19(11): 1342-1350, 2023 Nov.
Article in English | MEDLINE | ID: mdl-37231267

ABSTRACT

Acinetobacter baumannii is a nosocomial Gram-negative pathogen that often displays multidrug resistance. Discovering new antibiotics against A. baumannii has proven challenging through conventional screening approaches. Fortunately, machine learning methods allow for the rapid exploration of chemical space, increasing the probability of discovering new antibacterial molecules. Here we screened ~7,500 molecules for those that inhibited the growth of A. baumannii in vitro. We trained a neural network with this growth inhibition dataset and performed in silico predictions for structurally new molecules with activity against A. baumannii. Through this approach, we discovered abaucin, an antibacterial compound with narrow-spectrum activity against A. baumannii. Further investigations revealed that abaucin perturbs lipoprotein trafficking through a mechanism involving LolE. Moreover, abaucin could control an A. baumannii infection in a mouse wound model. This work highlights the utility of machine learning in antibiotic discovery and describes a promising lead with targeted activity against a challenging Gram-negative pathogen.


Subjects
Acinetobacter baumannii; Deep Learning; Animals; Mice; Anti-Bacterial Agents/pharmacology; Drug Resistance, Multiple, Bacterial; Microbial Sensitivity Tests
13.
J Chem Inf Model ; 63(7): 1925-1934, 2023 04 10.
Article in English | MEDLINE | ID: mdl-36971363

ABSTRACT

Molecular structure recognition is the task of translating a molecular image into its graph structure. Significant variation in drawing styles and conventions exhibited in chemical literature poses a significant challenge for automating this task. In this paper, we propose MolScribe, a novel image-to-graph generation model that explicitly predicts atoms and bonds, along with their geometric layouts, to construct the molecular structure. Our model flexibly incorporates symbolic chemistry constraints to recognize chirality and expand abbreviated structures. We further develop data augmentation strategies to enhance the model robustness against domain shifts. In experiments on both synthetic and realistic molecular images, MolScribe significantly outperforms previous models, achieving 76-93% accuracy on public benchmarks. Chemists can also easily verify MolScribe's prediction, informed by its confidence estimation and atom-level alignment with the input image. MolScribe is publicly available through Python and web interfaces: https://github.com/thomas0809/MolScribe.


Subjects
Benchmarking; Molecular Structure
14.
J Clin Oncol ; 41(12): 2191-2200, 2023 04 20.
Article in English | MEDLINE | ID: mdl-36634294

ABSTRACT

PURPOSE: Low-dose computed tomography (LDCT) for lung cancer screening is effective, although most eligible people are not being screened. Tools that provide personalized future cancer risk assessment could focus approaches toward those most likely to benefit. We hypothesized that a deep learning model assessing the entire volumetric LDCT data could be built to predict individual risk without requiring additional demographic or clinical data. METHODS: We developed a model called Sybil using LDCTs from the National Lung Screening Trial (NLST). Sybil requires only one LDCT and does not require clinical data or radiologist annotations; it can run in real time in the background on a radiology reading station. Sybil was validated on three independent data sets: a held-out set of 6,282 LDCTs from NLST participants, 8,821 LDCTs from Massachusetts General Hospital (MGH), and 12,280 LDCTs from Chang Gung Memorial Hospital (CGMH, which included people with a range of smoking history including nonsmokers). RESULTS: Sybil achieved areas under the receiver operating characteristic curve for lung cancer prediction at 1 year of 0.92 (95% CI, 0.88 to 0.95) on NLST, 0.86 (95% CI, 0.82 to 0.90) on MGH, and 0.94 (95% CI, 0.91 to 1.00) on CGMH external validation sets. Concordance indices over 6 years were 0.75 (95% CI, 0.72 to 0.78), 0.81 (95% CI, 0.77 to 0.85), and 0.80 (95% CI, 0.75 to 0.86) for NLST, MGH, and CGMH, respectively. CONCLUSION: Sybil can accurately predict an individual's future lung cancer risk from a single LDCT scan to further enable personalized screening. Future study is required to understand Sybil's clinical applications. Our model and annotations are publicly available.
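For readers unfamiliar with the concordance index reported here, the sketch below shows the basic idea using Harrell's formulation: among comparable pairs of participants, it measures how often the one who develops cancer earlier was assigned the higher risk. This is an assumption made for illustration, since the abstract does not say which estimator was used, and the function name and toy data are hypothetical. Weighted variants such as Uno's add inverse-probability-of-censoring weights that are omitted here.

```python
from typing import Sequence


def harrell_c_index(
    times: Sequence[float], events: Sequence[bool], risks: Sequence[float]
) -> float:
    """Fraction of comparable pairs in which the earlier event has higher risk.

    times:  follow-up time for each participant
    events: True if cancer was observed at that time, False if censored
    risks:  model-predicted risk scores (higher means higher predicted risk)
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a pair is only comparable if the earlier time is an event
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data only; not taken from the study.
times = [1.0, 2.0, 3.0, 4.0]
events = [True, True, False, True]
risks = [0.9, 0.4, 0.5, 0.1]
print(harrell_c_index(times, events, risks))
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering of risk against time to diagnosis.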


Subjects
Deep Learning; Lung Neoplasms; Humans; Lung Neoplasms/diagnostic imaging; Early Detection of Cancer/methods; Tomography, X-Ray Computed; Lung; Mass Screening/methods
15.
J Clin Oncol ; 40(20): 2281-2282, 2022 07 10.
Article in English | MEDLINE | ID: mdl-35452271
16.
Nat Med ; 28(1): 136-143, 2022 01.
Article in English | MEDLINE | ID: mdl-35027757

ABSTRACT

Screening programs must balance the benefit of early detection with the cost of overscreening. Here, we introduce a novel reinforcement learning-based framework for personalized screening, Tempo, and demonstrate its efficacy in the context of breast cancer. We trained our risk-based screening policies on a large screening mammography dataset from Massachusetts General Hospital (MGH; USA) and validated them in held-out patients from MGH and external datasets from Emory University (Emory; USA), Karolinska Institute (Karolinska; Sweden) and Chang Gung Memorial Hospital (CGMH; Taiwan). Across all test sets, we find that the Tempo policy combined with an image-based artificial intelligence (AI) risk model is significantly more efficient than current regimens used in clinical practice in terms of simulated early detection per screen frequency. Moreover, we show that the same Tempo policy can be easily adapted to a wide range of possible screening preferences, allowing clinicians to select their desired trade-off between early detection and screening costs without training new policies. Finally, we demonstrate that Tempo policies based on AI-based risk models outperform Tempo policies based on less accurate clinical risk models. Altogether, our results show that pairing AI-based risk models with agile AI-designed screening policies has the potential to improve screening programs by advancing early detection while reducing overscreening.


Subjects
Artificial Intelligence; Breast Neoplasms/diagnosis; Mammography/methods; Early Detection of Cancer/methods; Female; Humans
17.
J Chem Inf Model ; 62(9): 2035-2045, 2022 05 09.
Article in English | MEDLINE | ID: mdl-34115937

ABSTRACT

Access to structured chemical reaction data is of key importance for chemists in performing bench experiments and in modern applications like computer-aided drug design. Existing reaction databases are generally populated by human curators through manual abstraction from published literature (e.g., patents and journals), which is time consuming and labor intensive, especially with the exponential growth of chemical literature in recent years. In this study, we focus on developing automated methods for extracting reactions from chemical literature. We consider journal publications as the target source of information, which are more comprehensive and better represent the latest developments in chemistry compared to patents; however, they are less formulaic in their descriptions of reactions. To implement the reaction extraction system, we first devised a chemical reaction schema, primarily including a central product, and a set of associated reaction roles such as reactants, catalyst, solvent, and so on. We formulate the task as a structure prediction problem and solve it with a two-stage deep learning framework consisting of product extraction and reaction role labeling. Both models are built upon Transformer-based encoders, which are adaptively pretrained using domain and task-relevant unlabeled data. Our models are shown to be both effective and data efficient, achieving an F1 score of 76.2% in product extraction and 78.7% in role extraction, with only hundreds of annotated reactions.


Subjects
Databases, Factual; Humans
18.
J Clin Oncol ; 40(16): 1732-1740, 2022 06 01.
Article in English | MEDLINE | ID: mdl-34767469

ABSTRACT

PURPOSE: Accurate risk assessment is essential for the success of population screening programs in breast cancer. Models with high sensitivity and specificity would enable programs to target more elaborate screening efforts to high-risk populations, while minimizing overtreatment for the rest. Artificial intelligence (AI)-based risk models have demonstrated a significant advance over risk models used today in clinical practice. However, the responsible deployment of novel AI requires careful validation across diverse populations. To this end, we validate our AI-based model, Mirai, across globally diverse screening populations. METHODS: We collected screening mammograms and pathology-confirmed breast cancer outcomes from Massachusetts General Hospital, USA; Novant, USA; Emory, USA; Maccabi-Assuta, Israel; Karolinska, Sweden; Chang Gung Memorial Hospital, Taiwan; and Barretos, Brazil. We evaluated Uno's concordance index for Mirai in predicting risk of breast cancer at one to five years from the mammogram. RESULTS: A total of 128,793 mammograms from 62,185 patients were collected across the seven sites, of which 3,815 were followed by a cancer diagnosis within 5 years. Mirai obtained concordance indices of 0.75 (95% CI, 0.72 to 0.78), 0.75 (95% CI, 0.70 to 0.80), 0.77 (95% CI, 0.75 to 0.79), 0.77 (95% CI, 0.73 to 0.81), 0.81 (95% CI, 0.79 to 0.82), 0.79 (95% CI, 0.76 to 0.83), and 0.84 (95% CI, 0.81 to 0.88) at Massachusetts General Hospital, Novant, Emory, Maccabi-Assuta, Karolinska, Chang Gung Memorial Hospital, and Barretos, respectively. CONCLUSION: Mirai, a mammography-based risk model, maintained its accuracy across globally diverse test sets from seven hospitals across five countries. This is the broadest validation to date of an AI-based breast cancer model and suggests that the technology can offer broad and equitable improvements in care.


Subjects
Breast Neoplasms; Artificial Intelligence; Breast Neoplasms/diagnostic imaging; Breast Neoplasms/epidemiology; Early Detection of Cancer; Female; Humans; Mammography; Mass Screening
19.
Proc Natl Acad Sci U S A ; 118(39), 2021 09 28.
Article in English | MEDLINE | ID: mdl-34526388

ABSTRACT

Effective treatments for COVID-19 are urgently needed. However, discovering single-agent therapies with activity against severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has been challenging. Combination therapies play an important role in antiviral therapies, due to their improved efficacy and reduced toxicity. Recent approaches have applied deep learning to identify synergistic drug combinations for diseases with vast preexisting datasets, but these are not applicable to new diseases with limited combination data, such as COVID-19. Given that drug synergy often occurs through inhibition of discrete biological targets, here we propose a neural network architecture that jointly learns drug-target interaction and drug-drug synergy. The model consists of two parts: a drug-target interaction module and a target-disease association module. This design enables the model to utilize drug-target interaction data and single-agent antiviral activity data, in addition to available drug-drug combination datasets, which may be small in nature. By incorporating additional biological information, our model performs significantly better in synergy prediction accuracy than previous methods with limited drug combination training data. We empirically validated our model predictions and discovered two drug combinations, remdesivir and reserpine as well as remdesivir and IQ-1S, which display strong antiviral SARS-CoV-2 synergy in vitro. Our approach, which was applied here to address the urgent threat of COVID-19, can be readily extended to other diseases for which a dearth of chemical-chemical combination data exists.


Subjects
Antiviral Agents/pharmacology; COVID-19 Drug Treatment; Deep Learning; Adenosine Monophosphate/analogs & derivatives; Alanine/analogs & derivatives; Cell Survival/drug effects; Drug Combinations; Drug Interactions; Drug Synergism; Humans; SARS-CoV-2