1.
J Immunol ; 212(11): 1766-1781, 2024 Jun 01.
Article in English | MEDLINE | ID: mdl-38683120

ABSTRACT

Better understanding of the host responses to Mycobacterium tuberculosis infections is required to prevent tuberculosis and develop new therapeutic interventions. The host transcription factor BHLHE40 is essential for controlling M. tuberculosis infection, in part by repressing Il10 expression, where excess IL-10 contributes to the early susceptibility of Bhlhe40-/- mice to M. tuberculosis infection. Deletion of Bhlhe40 in lung macrophages and dendritic cells is sufficient to increase the susceptibility of mice to M. tuberculosis infection, but how BHLHE40 impacts macrophage and dendritic cell responses to M. tuberculosis is unknown. In this study, we report that BHLHE40 is required in myeloid cells exposed to GM-CSF, an abundant cytokine in the lung, to promote the expression of genes associated with a proinflammatory state and better control of M. tuberculosis infection. Loss of Bhlhe40 expression in murine bone marrow-derived myeloid cells cultured in the presence of GM-CSF results in lower levels of proinflammatory associated signaling molecules IL-1β, IL-6, IL-12, TNF-α, inducible NO synthase, IL-2, KC, and RANTES, as well as higher levels of the anti-inflammatory-associated molecules MCP-1 and IL-10 following exposure to heat-killed M. tuberculosis. Deletion of Il10 in Bhlhe40-/- myeloid cells restored some, but not all, proinflammatory signals, demonstrating that BHLHE40 promotes proinflammatory responses via both IL-10-dependent and -independent mechanisms. In addition, we show that macrophages and neutrophils within the lungs of M. tuberculosis-infected Bhlhe40-/- mice exhibit defects in inducible NO synthase production compared with infected wild-type mice, supporting that BHLHE40 promotes proinflammatory responses in innate immune cells, which may contribute to the essential role for BHLHE40 during M. tuberculosis infection in vivo.


Subjects
Basic Helix-Loop-Helix Transcription Factors; Interleukin-10; Mice, Knockout; Myeloid Cells; Animals; Mice; Interleukin-10/immunology; Interleukin-10/genetics; Basic Helix-Loop-Helix Transcription Factors/genetics; Basic Helix-Loop-Helix Transcription Factors/immunology; Myeloid Cells/immunology; Mycobacterium tuberculosis/immunology; Macrophages/immunology; Homeodomain Proteins/genetics; Mice, Inbred C57BL; Granulocyte-Macrophage Colony-Stimulating Factor; Dendritic Cells/immunology; Lung/immunology; Tuberculosis/immunology; Cell Polarity; Cells, Cultured
2.
Curr Opin Struct Biol ; 86: 102794, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38663170

ABSTRACT

Engineering new molecules with desirable functions and properties has the potential to extend our ability to engineer proteins beyond what nature has so far evolved. Advances in the so-called 'de novo' design problem have recently been brought forward by developments in artificial intelligence. Generative architectures, such as language models and diffusion processes, seem adept at generating novel, yet realistic proteins that display desirable properties and perform specified functions. State-of-the-art design protocols now achieve experimental success rates nearing 20%, thus widening the access to de novo designed proteins. Despite extensive progress, there are clear field-wide challenges, for example, in determining the best in silico metrics to prioritise designs for experimental testing, and in designing proteins that can undergo large conformational changes or be regulated by post-translational modifications. With an increase in the number of models being developed, this review provides a framework to understand how these tools fit into the overall process of de novo protein design. Throughout, we highlight the power of incorporating biochemical knowledge to improve performance and interpretability.


Subjects
Artificial Intelligence; Protein Engineering; Proteins; Proteins/chemistry; Proteins/metabolism; Protein Engineering/methods; Models, Molecular; Protein Conformation
3.
Eur J Clin Invest ; 54(6): e14183, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38381530

ABSTRACT

Large language models (LLMs) are a type of machine learning model that learn statistical patterns over text, such as predicting the next words in a sequence of text. Both general purpose and task-specific LLMs have demonstrated potential across diverse applications. Science and medicine have many data types that are highly suitable for LLMs, such as scientific texts (publications, patents and textbooks), electronic medical records, large databases of DNA and protein sequences and chemical compounds. Carefully validated systems that can understand and reason across all these modalities may maximize benefits. Despite the inevitable limitations and caveats of any new technology and some uncertainties specific to LLMs, LLMs have the potential to be transformative in science and medicine.


Subjects
Machine Learning; Humans; Electronic Health Records; Natural Language Processing; Medicine; Science; Patents as Topic
4.
Nat Biotechnol ; 42(2): 275-283, 2024 Feb.
Article in English | MEDLINE | ID: mdl-37095349

ABSTRACT

Natural evolution must explore a vast landscape of possible sequences for desirable yet rare mutations, suggesting that learning from natural evolutionary strategies could guide artificial evolution. Here we report that general protein language models can efficiently evolve human antibodies by suggesting mutations that are evolutionarily plausible, despite providing the model with no information about the target antigen, binding specificity or protein structure. We performed language-model-guided affinity maturation of seven antibodies, screening 20 or fewer variants of each antibody across only two rounds of laboratory evolution, and improved the binding affinities of four clinically relevant, highly mature antibodies up to sevenfold and three unmatured antibodies up to 160-fold, with many designs also demonstrating favorable thermostability and viral neutralization activity against Ebola and severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) pseudoviruses. The same models that improve antibody binding also guide efficient evolution across diverse protein families and selection pressures, including antibiotic resistance and enzyme activity, suggesting that these results generalize to many settings.


Subjects
Antibodies, Neutralizing; Antibodies, Viral; Humans; Neutralization Tests; Antibodies, Viral/genetics; Antibodies, Neutralizing/chemistry; SARS-CoV-2/genetics; Mutation
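
Illustrative note for entry 4: the abstract says the language model suggests evolutionarily plausible mutations without any antigen or structure information. The sketch below shows one simple way such a suggestion step could look. It assumes, which the abstract does not state, that a substitution counts as plausible when the model gives it a higher probability than the wild-type residue. The function name `suggest_mutations` and the `position_probs` table are hypothetical; a real protein language model would have to supply the probabilities.

```python
def suggest_mutations(wild_type, position_probs, top_k=3):
    """Rank substitutions the model rates above the wild-type residue.

    wild_type: amino-acid string.
    position_probs: one dict per position mapping residue letter to a
        model probability (hypothetical input, not from the abstract).
    Returns up to top_k mutation names such as "A1V".
    """
    candidates = []
    for i, (wt, probs) in enumerate(zip(wild_type, position_probs)):
        wt_prob = probs.get(wt, 0.0)
        for residue, prob in probs.items():
            # Assumed plausibility rule: keep only substitutions that
            # the model prefers over the wild-type residue.
            if residue != wt and prob > wt_prob:
                # Name = wild-type residue, 1-based position, new residue.
                candidates.append((prob - wt_prob, f"{wt}{i + 1}{residue}"))
    # Largest probability gain first; ties broken by name.
    candidates.sort(key=lambda c: (-c[0], c[1]))
    return [name for _, name in candidates[:top_k]]


if __name__ == "__main__":
    toy_probs = [
        {"A": 0.1, "V": 0.6, "L": 0.3},
        {"G": 0.2, "S": 0.5},
    ]
    print(suggest_mutations("AG", toy_probs, top_k=2))
```

Capping the output with `top_k` mirrors the small screening budget the abstract describes (20 or fewer variants per antibody), though the ranking rule itself is only an assumption.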
5.
ArXiv ; 2023 May 26.
Article in English | MEDLINE | ID: mdl-37292483

ABSTRACT

Directed evolution of proteins has been the most effective method for protein engineering. However, a new paradigm is emerging, fusing the library generation and screening approaches of traditional directed evolution with computation through the training of machine learning models on protein sequence fitness data. This chapter highlights successful applications of machine learning to protein engineering and directed evolution, organized by the improvements that have been made with respect to each step of the directed evolution cycle. Additionally, we provide an outlook for the future based on the current direction of the field, namely in the development of calibrated models and in incorporating other modalities, such as protein structure.

6.
Science ; 379(6637): 1123-1130, 2023 03 17.
Article in English | MEDLINE | ID: mdl-36927031

ABSTRACT

Recent advances in machine learning have leveraged evolutionary information in multiple sequence alignments to predict protein structure. We demonstrate direct inference of full atomic-level protein structure from primary sequence using a large language model. As language models of protein sequences are scaled up to 15 billion parameters, an atomic-resolution picture of protein structure emerges in the learned representations. This results in an order-of-magnitude acceleration of high-resolution structure prediction, which enables large-scale structural characterization of metagenomic proteins. We apply this capability to construct the ESM Metagenomic Atlas by predicting structures for >617 million metagenomic protein sequences, including >225 million that are predicted with high confidence, which gives a view into the vast breadth and diversity of natural proteins.


Subjects
Evolution, Molecular; Machine Learning; Proteins; Sequence Analysis, Protein; Amino Acid Sequence; Proteins/chemistry; Protein Conformation
7.
bioRxiv ; 2023 Dec 21.
Article in English | MEDLINE | ID: mdl-38187780

ABSTRACT

Large language models trained on sequence information alone are capable of learning high level principles of protein design. However, beyond sequence, the three-dimensional structures of proteins determine their specific function, activity, and evolvability. Here we show that a general protein language model augmented with protein structure backbone coordinates and trained on the inverse folding problem can guide evolution for diverse proteins without needing to explicitly model individual functional tasks. We demonstrate inverse folding to be an effective unsupervised, structure-based sequence optimization strategy that also generalizes to multimeric complexes by implicitly learning features of binding and amino acid epistasis. Using this approach, we screened ~30 variants of two therapeutic clinical antibodies used to treat SARS-CoV-2 infection and achieved up to 26-fold improvement in neutralization and 37-fold improvement in affinity against antibody-escaped viral variants-of-concern BQ.1.1 and XBB.1.5, respectively. In addition to substantial overall improvements in protein function, we find inverse folding performs with leading experimental success rates among other reported machine learning-guided directed evolution methods, without requiring any task-specific training data.

8.
Cell Syst ; 13(4): 274-285.e6, 2022 04 20.
Article in English | MEDLINE | ID: mdl-35120643

ABSTRACT

The degree to which evolution is predictable is a fundamental question in biology. Previous attempts to predict the evolution of protein sequences have been limited to specific proteins and to small changes, such as single-residue mutations. Here, we demonstrate that by using a protein language model to predict the local evolution within protein families, we recover a dynamic "vector field" of protein evolution that we call evolutionary velocity (evo-velocity). Evo-velocity generalizes to evolution over vastly different timescales, from viral proteins evolving over years to eukaryotic proteins evolving over geologic eons, and can predict the evolutionary dynamics of proteins that were not used to develop the original model. Evo-velocity also yields new evolutionary insights by predicting strategies of viral-host immune escape, resolving conflicting theories on the evolution of serpins, and revealing a key role of horizontal gene transfer in the evolution of eukaryotic glycolysis.


Subjects
Evolution, Molecular; Language; Amino Acid Sequence; Mutation/genetics; Proteins/genetics
9.
Sci Transl Med ; 14(633): eabk3445, 2022 Feb 23.
Article in English | MEDLINE | ID: mdl-35014856

ABSTRACT

SARS-CoV-2 evolution threatens vaccine- and natural infection-derived immunity as well as the efficacy of therapeutic antibodies. To improve public health preparedness, we sought to predict which existing amino acid mutations in SARS-CoV-2 might contribute to future variants of concern. We tested the predictive value of features comprising epidemiology, evolution, immunology, and neural network-based protein sequence modeling, and identified primary biological drivers of SARS-CoV-2 intra-pandemic evolution. We found evidence that ACE2-mediated transmissibility and resistance to population-level host immunity has waxed and waned as a primary driver of SARS-CoV-2 evolution over time. We retroactively identified with high accuracy (area under the receiver operator characteristic curve, AUROC=0.92-0.97) mutations that will spread, at up to four months in advance, across different phases of the pandemic. The behavior of the model was consistent with a plausible causal structure wherein epidemiological covariates combine the effects of diverse and shifting drivers of viral fitness. We applied our model to forecast mutations that will spread in the future and characterize how these mutations affect the binding of therapeutic antibodies. These findings demonstrate that it is possible to forecast the driver mutations that could appear in emerging SARS-CoV-2 variants of concern. We validate this result against Omicron, showing elevated predictive scores for its component mutations prior to emergence, and rapid score increase across daily forecasts during emergence. This modeling approach may be applied to any rapidly evolving pathogens with sufficiently dense genomic surveillance data, such as influenza, and unknown future pandemic viruses.


Subjects
COVID-19; SARS-CoV-2; COVID-19/virology; Humans; Mutation; Pandemics; SARS-CoV-2/genetics
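
Illustrative note for entry 9: the AUROC quoted in the abstract (0.92-0.97) can be read as the probability that a randomly chosen positive example (here, a mutation that later spreads) receives a higher score than a randomly chosen negative one. The sketch below computes that quantity directly from pairwise comparisons. The function name and the toy scores are hypothetical; this illustrates the metric only, not the authors' model or data.

```python
def auroc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    scores: predicted scores, higher means "more likely positive".
    labels: 1 for positive examples, 0 for negative examples.
    Ties between a positive and a negative count as half a win.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Toy data: two positives, two negatives, one mis-ranked pair.
    print(auroc([0.9, 0.2, 0.8, 0.1], [1, 1, 0, 0]))
```

The pairwise form is quadratic in the number of examples; it is chosen here because it makes the definition explicit, not because it is efficient.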
10.
Curr Opin Struct Biol ; 72: 145-152, 2022 02.
Article in English | MEDLINE | ID: mdl-34896756

ABSTRACT

Machine-learning models that learn from data to predict how protein sequence encodes function are emerging as a useful protein engineering tool. However, when using these models to suggest new protein designs, one must deal with the vast combinatorial complexity of protein sequences. Here, we review how to use a sequence-to-function machine-learning surrogate model to select sequences for experimental measurement. First, we discuss how to select sequences through a single round of machine-learning optimization. Then, we discuss sequential optimization, where the goal is to discover optimized sequences and improve the model across multiple rounds of training, optimization, and experimental measurement.


Subjects
Machine Learning; Protein Engineering; Amino Acid Sequence; Proteins
11.
Genome Biol ; 22(1): 131, 2021 05 03.
Article in English | MEDLINE | ID: mdl-33941239

ABSTRACT

A complete understanding of biological processes requires synthesizing information across heterogeneous modalities, such as age, disease status, or gene expression. Technological advances in single-cell profiling have enabled researchers to assay multiple modalities simultaneously. We present Schema, which uses a principled metric learning strategy that identifies informative features in a modality to synthesize disparate modalities into a single coherent interpretation. We use Schema to infer cell types by integrating gene expression and chromatin accessibility data; demonstrate informative data visualizations that synthesize multiple modalities; perform differential gene expression analysis in the context of spatial variability; and estimate evolutionary pressure on peptide sequences.


Subjects
Chromatin Assembly and Disassembly; Chromatin/genetics; Chromatin/metabolism; Computational Biology; Gene Expression Profiling/methods; Machine Learning; Single-Cell Analysis/methods; Computational Biology/methods; Gene Expression Regulation; Organ Specificity/genetics; Transcriptome
12.
Science ; 371(6526): 284-288, 2021 01 15.
Article in English | MEDLINE | ID: mdl-33446556

ABSTRACT

The ability for viruses to mutate and evade the human immune system and cause infection, called viral escape, remains an obstacle to antiviral and vaccine development. Understanding the complex rules that govern escape could inform therapeutic design. We modeled viral escape with machine learning algorithms originally developed for human natural language. We identified escape mutations as those that preserve viral infectivity but cause a virus to look different to the immune system, akin to word changes that preserve a sentence's grammaticality but change its meaning. With this approach, language models of influenza hemagglutinin, HIV-1 envelope glycoprotein (HIV Env), and severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) Spike viral proteins can accurately predict structural escape patterns using sequence data alone. Our study represents a promising conceptual bridge between natural language and viral evolution.


Subjects
Acquired Immunodeficiency Syndrome/immunology; COVID-19/immunology; HIV-1/genetics; Influenza A virus/genetics; Influenza, Human/immunology; SARS-CoV-2/genetics; Acquired Immunodeficiency Syndrome/virology; Binding Sites; COVID-19/virology; Evolution, Molecular; Hemagglutinin Glycoproteins, Influenza Virus/chemistry; Hemagglutinin Glycoproteins, Influenza Virus/genetics; Humans; Influenza, Human/virology; Mutation; Protein Domains; Spike Glycoprotein, Coronavirus/chemistry; Spike Glycoprotein, Coronavirus/genetics; env Gene Products, Human Immunodeficiency Virus/chemistry; env Gene Products, Human Immunodeficiency Virus/genetics
13.
Cell Syst ; 11(5): 461-477.e9, 2020 11 18.
Article in English | MEDLINE | ID: mdl-33065027

ABSTRACT

Machine learning that generates biological hypotheses has transformative potential, but most learning algorithms are susceptible to pathological failure when exploring regimes beyond the training data distribution. A solution to address this issue is to quantify prediction uncertainty so that algorithms can gracefully handle novel phenomena that confound standard methods. Here, we demonstrate the broad utility of robust uncertainty prediction in biological discovery. By leveraging Gaussian process-based uncertainty prediction on modern pre-trained features, we train a model on just 72 compounds to make predictions over a 10,833-compound library, identifying and experimentally validating compounds with nanomolar affinity for diverse kinases and whole-cell growth inhibition of Mycobacterium tuberculosis. Uncertainty facilitates a tight iterative loop between computation and experimentation and generalizes across biological domains as diverse as protein engineering and single-cell transcriptomics. More broadly, our work demonstrates that uncertainty should play a key role in the increasing adoption of machine learning algorithms into the experimental lifecycle.


Subjects
Computational Biology/methods; Forecasting/methods; Uncertainty; Algorithms; Machine Learning/trends; Normal Distribution
14.
Cell Syst ; 8(6): 483-493.e7, 2019 06 26.
Article in English | MEDLINE | ID: mdl-31176620

ABSTRACT

Large-scale single-cell RNA sequencing (scRNA-seq) studies that profile hundreds of thousands of cells are becoming increasingly common, overwhelming existing analysis pipelines. Here, we describe how to enhance and accelerate single-cell data analysis by summarizing the transcriptomic heterogeneity within a dataset using a small subset of cells, which we refer to as a geometric sketch. Our sketches provide more comprehensive visualization of transcriptional diversity, capture rare cell types with high sensitivity, and reveal biological cell types via clustering. Our sketch of umbilical cord blood cells uncovers a rare subpopulation of inflammatory macrophages, which we experimentally validated. The construction of our sketches is extremely fast, which enabled us to accelerate other crucial resource-intensive tasks, such as scRNA-seq data integration, while maintaining accuracy. We anticipate our algorithm will become an increasingly essential step when sharing and analyzing the rapidly growing volume of scRNA-seq data and help enable the democratization of single-cell omics.


Subjects
Single-Cell Analysis/methods; Transcriptome; Algorithms; Animals; Data Analysis; Datasets as Topic; Genetic Heterogeneity; Humans; Macrophages; RNA-Seq
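
Illustrative note for entry 14: the abstract describes summarizing a dataset with a small subset of cells that still captures rare cell types, but it does not spell out how the subset is built. The sketch below shows one plausible way to get that behaviour, under the assumption that the data space is covered with a uniform grid and one point is drawn per occupied grid cell, so that sparse regions are represented as well as dense ones. The function `grid_sketch` and its parameters are hypothetical and should not be read as the authors' algorithm.

```python
import random


def grid_sketch(points, cell_size, seed=0):
    """Pick one representative index per occupied grid cell.

    points: sequence of equal-length coordinate tuples.
    cell_size: side length of each grid cell (assumed parameter).
    Returns a sorted list of indices into points.
    """
    rng = random.Random(seed)
    cells = {}
    for index, point in enumerate(points):
        # Floor division maps each coordinate to its grid cell.
        key = tuple(int(coord // cell_size) for coord in point)
        cells.setdefault(key, []).append(index)
    # One draw per cell: dense clusters and rare outliers each
    # contribute a single representative.
    return sorted(rng.choice(members) for members in cells.values())


if __name__ == "__main__":
    cells_2d = [(0.1, 0.1), (0.2, 0.15), (0.12, 0.18), (5.0, 5.0)]
    print(grid_sketch(cells_2d, cell_size=1.0))
```

Compared with uniform random sampling, which would pick the isolated point only one time in four per draw in the toy data, sampling per cell always keeps it.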
15.
Nat Biotechnol ; 37(6): 685-691, 2019 06.
Article in English | MEDLINE | ID: mdl-31061482

ABSTRACT

Integration of single-cell RNA sequencing (scRNA-seq) data from multiple experiments, laboratories and technologies can uncover biological insights, but current methods for scRNA-seq data integration are limited by a requirement for datasets to derive from functionally similar cells. We present Scanorama, an algorithm that identifies and merges the shared cell types among all pairs of datasets and accurately integrates heterogeneous collections of scRNA-seq data. We applied Scanorama to integrate and remove batch effects across 105,476 cells from 26 diverse scRNA-seq experiments representing 9 different technologies. Scanorama is sensitive to subtle temporal changes within the same cell lineage, successfully integrating functionally similar cells across time series data of CD14+ monocytes at different stages of differentiation into macrophages. Finally, we show that Scanorama is orders of magnitude faster than existing techniques and can integrate a collection of 1,095,538 cells in just ~9 h.


Subjects
Gene Expression Profiling/methods; Sequence Analysis, RNA/methods; Single-Cell Analysis/methods; Transcriptome/genetics; Cell Differentiation/genetics; Humans; Macrophages/metabolism; Monocytes/chemistry; Monocytes/metabolism
16.
Elife ; 8, 2019 01 16.
Article in English | MEDLINE | ID: mdl-30650056

ABSTRACT

Genome-wide association studies (GWAS) are a powerful approach for connecting genotype to phenotype. Most GWAS hits are located in cis-regulatory regions, but the underlying causal variants and their molecular mechanisms remain unknown. To better understand human cis-regulatory variation, we mapped quantitative trait loci for chromatin accessibility (caQTLs), a key step in cis-regulation, in 1000 individuals from 10 diverse populations. Most caQTLs were shared across populations, allowing us to leverage the genetic diversity to fine-map candidate causal regulatory variants, several thousand of which have been previously implicated in GWAS. In addition, many caQTLs that affect the expression of distal genes also alter the landscape of long-range chromosomal interactions, suggesting a mechanism for long-range expression QTLs. In sum, our results show that molecular QTL mapping integrated across diverse populations provides a high-resolution view of how worldwide human genetic variation affects chromatin accessibility, gene expression, and phenotype. Editorial note: This article has been through an editorial process in which the authors decide how to respond to the issues raised during peer review. The Reviewing Editor's assessment is that minor issues remain unresolved (see decision letter).


Subjects
Chromosome Mapping/methods; Genetic Variation; Genetics, Population; Regulatory Sequences, Nucleic Acid/genetics; Base Sequence; Cell Line; Chromatin/genetics; Chromosomes, Human/genetics; Genome-Wide Association Study; Humans; Protein Binding; Quantitative Trait Loci/genetics; Transcription Factors/metabolism
17.
Science ; 362(6412): 347-350, 2018 10 19.
Article in English | MEDLINE | ID: mdl-30337410

ABSTRACT

Although combining data from multiple entities could power life-saving breakthroughs, open sharing of pharmacological data is generally not viable because of data privacy and intellectual property concerns. To this end, we leverage modern cryptographic tools to introduce a computational protocol for securely training a predictive model of drug-target interactions (DTIs) on a pooled dataset that overcomes barriers to data sharing by provably ensuring the confidentiality of all underlying drugs, targets, and observed interactions. Our protocol runs within days on a real dataset of more than 1 million interactions and is more accurate than state-of-the-art DTI prediction methods. Using our protocol, we discover previously unidentified DTIs that we experimentally validated via targeted assays. Our work lays a foundation for more effective and cooperative biomedical research.


Subjects
Confidentiality; Databases, Pharmaceutical/legislation & jurisprudence; Drug Delivery Systems; Information Dissemination/legislation & jurisprudence; Information Dissemination/methods; Pharmacology/legislation & jurisprudence; Computer Simulation; Humans
18.
Cell ; 165(3): 730-41, 2016 Apr 21.
Article in English | MEDLINE | ID: mdl-27087447

ABSTRACT

Cis-regulatory elements such as transcription factor (TF) binding sites can be identified genome-wide, but it remains far more challenging to pinpoint genetic variants affecting TF binding. Here, we introduce a pooling-based approach to mapping quantitative trait loci (QTLs) for molecular-level traits. Applying this to five TFs and a histone modification, we mapped thousands of cis-acting QTLs, with over 25-fold lower cost compared to standard QTL mapping. We found that single genetic variants frequently affect binding of multiple TFs, and CTCF can recruit all five TFs to its binding sites. These QTLs often affect local chromatin and transcription but can also influence long-range chromosomal contacts, demonstrating a role for natural genetic variation in chromosomal architecture. Thousands of these QTLs have been implicated in genome-wide association studies, providing candidate molecular mechanisms for many disease risk loci and suggesting that TF binding variation may underlie a large fraction of human phenotypic variation.


Subjects
Chromatin Immunoprecipitation/methods; High-Throughput Nucleotide Sequencing/methods; Polymorphism, Single Nucleotide; Quantitative Trait Loci; Sequence Analysis, DNA/methods; Transcription Factors/metabolism; Genetic Predisposition to Disease; Histone Code; Humans