Results 1 - 20 of 53
1.
PLoS Comput Biol ; 19(11): e1011621, 2023 Nov.
Article in English | MEDLINE | ID: mdl-37976326

ABSTRACT

We present here an approach to protein design that combines (i) scarce functional information such as experimental data, (ii) evolutionary information learned from natural sequence variants, and (iii) physics-grounded modeling. Using a Restricted Boltzmann Machine (RBM), we learn a sequence model of a protein family. We use semi-supervision to leverage available functional information during the RBM training. We then propose a strategy to explore the protein representation space that can be informed by external models such as an empirical force-field method (FoldX). Our approach is applied to a domain of the Cas9 protein responsible for recognition of a short DNA motif. We experimentally assess the functionality of 71 variants generated to explore a range of RBM and FoldX energies. Sequences with as many as 50 differences (20% of the protein domain) to the wild-type retained functionality. Overall, 21/71 sequences designed with our method were functional. Interestingly, 6/71 sequences showed an improved activity in comparison with the original wild-type protein sequence. These results demonstrate the interest in further exploring the synergies between machine learning of protein sequence representations and physics-grounded modeling strategies informed by structural information.


Subjects
CRISPR-Cas Systems , Proteins , Proteins/genetics , Proteins/chemistry , Amino Acid Sequence , Machine Learning , Learning
2.
PLoS Comput Biol ; 19(10): e1011521, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37883593

ABSTRACT

Predicting the effects of mutations on protein function is an important issue in evolutionary biology and biomedical applications. Computational approaches, ranging from graphical models to deep-learning architectures, can capture the statistical properties of sequence data and predict the outcome of high-throughput mutagenesis experiments probing the fitness landscape around some wild-type protein. However, how the complexity of the models and the characteristics of the data combine to determine the predictive performance remains unclear. Here, based on a theoretical analysis of the prediction error, we propose descriptors of the sequence data, characterizing their quantity and relevance relative to the model. Our theoretical framework identifies a trade-off between these two quantities, and determines the optimal subset of data for the prediction task, showing that simple models can outperform complex ones when inferred from adequately-selected sequences. We also show how repeated subsampling of the sequence data is informative about how much epistasis in the fitness landscape is not captured by the computational model. Our approach is illustrated on several protein families, as well as on in silico solvable protein models.


Subjects
Biological Evolution , Proteins , Proteins/genetics , Mutagenesis , Mutation , Computer Simulation , Genetic Fitness/genetics , Models, Genetic
3.
Elife ; 12, 2023 09 08.
Article in English | MEDLINE | ID: mdl-37681658

ABSTRACT

Antigen immunogenicity and the specificity of binding of T-cell receptors to antigens are key properties underlying effective immune responses. Here we propose diffRBM, an approach based on transfer learning and Restricted Boltzmann Machines, to build sequence-based predictive models of these properties. DiffRBM is designed to learn the distinctive patterns in amino-acid composition that, on the one hand, underlie the antigen's probability of triggering a response, and on the other hand the T-cell receptor's ability to bind to a given antigen. We show that the patterns learnt by diffRBM allow us to predict putative contact sites of the antigen-receptor complex. We also discriminate immunogenic and non-immunogenic antigens, antigen-specific and generic receptors, reaching performances that compare favorably to existing sequence-based predictors of antigen immunogenicity and T-cell receptor specificity.


Subjects
Amino Acids , Learning , T-Cell Antigen Receptor Specificity , Cell Membrane , Mitochondrial Membranes
4.
Phys Rev E ; 108(2-1): 024141, 2023 Aug.
Article in English | MEDLINE | ID: mdl-37723761

ABSTRACT

We study transition paths in energy landscapes over multicategorical Potts configurations using the mean-field approach introduced by Mauri et al. [Phys. Rev. Lett. 130, 158402 (2023), doi:10.1103/PhysRevLett.130.158402]. Paths interpolate between two fixed configurations or are anchored at one extremity only. We characterize the properties of "good" transition paths, which realize a trade-off between exploring low-energy regions in the landscape and not being too long, such as their entropy or the probability of escape from a region of the landscape. We unveil the existence of a phase transition separating a regime in which paths are stretched between their anchors from another regime where paths can explore the energy landscape more globally to minimize the energy. This phase transition is first illustrated and studied in detail on a mathematically tractable Hopfield-Potts toy model, then studied in energy landscapes inferred from protein sequence data.

5.
Phys Rev Lett ; 130(15): 158402, 2023 Apr 14.
Article in English | MEDLINE | ID: mdl-37115874

ABSTRACT

Identifying and characterizing mutational paths is an important issue in evolutionary biology, with potential applications to bioengineering. We here propose an algorithm to sample mutational paths, which we benchmark on exactly solvable models of proteins in silico, and apply to data-driven models of natural proteins learned from sequence data with restricted Boltzmann machines. We then use mean-field theory to characterize paths for different mutational dynamics of interest, and to extend Kimura's estimate of evolutionary distances to sequence-based epistatic models of selection.


Subjects
Biological Evolution , Proteins , Mutation , Proteins/genetics , Algorithms
6.
Curr Opin Struct Biol ; 80: 102571, 2023 06.
Article in English | MEDLINE | ID: mdl-36947951

ABSTRACT

Computational protein design facilitates the discovery of novel proteins with prescribed structure and functionality. Exciting designs were recently reported using novel data-driven methodologies that can be roughly divided into two categories: evolutionary-based and physics-inspired approaches. The former infer characteristic sequence features shared by sets of evolutionary-related proteins, such as conserved or coevolving positions, and recombine them to generate candidates with similar structure and function. The latter approaches estimate key biochemical properties, such as structure free energy, conformational entropy, or binding affinities using machine learning surrogates, and optimize them to yield improved designs. Here, we review recent progress along both tracks, discuss their strengths and weaknesses, and highlight opportunities for synergistic approaches.


Subjects
Machine Learning , Proteins , Proteins/chemistry , Physics , Databases, Protein
7.
Elife ; 12, 2023 03 14.
Article in English | MEDLINE | ID: mdl-36916902

ABSTRACT

Establishing accurate as well as interpretable models of network activity is an open challenge in systems neuroscience. Here, we infer an energy-based model of the anterior rhombencephalic turning region (ARTR), a circuit that controls zebrafish swimming statistics, using functional recordings of the spontaneous activity of hundreds of neurons. Although our model is trained to reproduce the low-order statistics of the network activity at short time scales, its simulated dynamics quantitatively captures the slowly alternating activity of the ARTR. It further reproduces the modulation of this persistent dynamics by the water temperature and visual stimulation. Mathematical analysis of the model unveils a low-dimensional landscape-based representation of the ARTR activity, where the slow network dynamics reflects Arrhenius-like barrier crossings between metastable states. Our work thus shows how data-driven models built from recordings of large neural populations can be reduced to low-dimensional functional models in order to reveal the fundamental mechanisms controlling the collective neuronal dynamics.


Subjects
Neural Networks, Computer , Zebrafish , Animals , Zebrafish/physiology , Neurons/physiology , Swimming , Photic Stimulation , Models, Neurological
8.
Phys Rev E ; 108(6-1): 064301, 2023 Dec.
Article in English | MEDLINE | ID: mdl-38243526

ABSTRACT

Continuous attractor neural networks (CANN) form an appealing conceptual model for the storage of information in the brain. However, a drawback of CANN is that they require finely tuned interactions. We here study the effect of quenched noise in the interactions on the coding of positional information within CANN. Using the replica method, we compute the Fisher information for a network with position-dependent input and recurrent connections composed of a short-range (in space) component and a disordered component. We find that the loss in positional information is small when the disorder strength is not too large, indicating that CANN have a regime in which the advantageous effects of local connectivity on information storage outweigh the detrimental ones. Furthermore, a substantial part of this information can be extracted with a simple linear readout.

9.
PLoS Comput Biol ; 18(9): e1010561, 2022 09.
Article in English | MEDLINE | ID: mdl-36174101

ABSTRACT

Selection protocols such as SELEX, where molecules are selected over multiple rounds for their ability to bind to a target of interest, are popular methods for obtaining binders for diagnostic and therapeutic purposes. We show that Restricted Boltzmann Machines (RBMs), an unsupervised two-layer neural network architecture, can successfully be trained on sequence ensembles from single rounds of SELEX experiments for thrombin aptamers. RBMs assign scores to sequences that can be directly related to their fitnesses estimated through experimental enrichment ratios. Hence, RBMs trained from sequence data at a given round can be used to predict the effects of selection at later rounds. Moreover, the parameters of the trained RBMs are interpretable and identify functional features contributing most to sequence fitness. To exploit the generative capabilities of RBMs, we introduce two different training protocols: one taking into account sequence counts, capable of identifying the few best binders, and another based on unique sequences only, generating more diverse binders. We then use RBM models to generate novel aptamers with putative disruptive mutations or good binding properties, and validate the generated sequences with gel shift assay experiments. Finally, we compare the RBM's performance with different supervised learning approaches that include random forests and several deep neural network architectures.


Subjects
Neural Networks, Computer , Thrombin , Machine Learning
10.
Nat Commun ; 13(1): 4122, 2022 07 15.
Article in English | MEDLINE | ID: mdl-35840595

ABSTRACT

Episodic memory formation and recall are complementary processes that rely on opposing neuronal computations in the hippocampus. How this conflict is resolved in hippocampal circuits is unclear. To address this question, we obtained in vivo whole-cell patch-clamp recordings from dentate gyrus granule cells in head-fixed mice trained to explore and distinguish between familiar and novel virtual environments. We find that granule cells consistently show a small transient depolarisation upon transition to a novel environment. This synaptic novelty signal is sensitive to local application of atropine, indicating that it depends on metabotropic acetylcholine receptors. A computational model suggests that the synaptic response to novelty may bias granule cell population activity, which can drive downstream attractor networks to a new state, favouring the switch from recall to new memory formation when faced with novelty. Such a novelty-driven switch may enable flexible encoding of new memories while preserving stable retrieval of familiar ones.


Subjects
Hippocampus , Memory, Episodic , Animals , Dentate Gyrus/physiology , Hippocampus/physiology , Mental Recall/physiology , Mice , Neurons/physiology
11.
Nature ; 606(7913): 389-395, 2022 06.
Article in English | MEDLINE | ID: mdl-35589842

ABSTRACT

Cancer immunoediting [1] is a hallmark of cancer [2] that predicts that lymphocytes kill more immunogenic cancer cells to cause less immunogenic clones to dominate a population. Although proven in mice [1,3], whether immunoediting occurs naturally in human cancers remains unclear. Here, to address this, we investigate how 70 human pancreatic cancers evolved over 10 years. We find that, despite having more time to accumulate mutations, rare long-term survivors of pancreatic cancer who have stronger T cell activity in primary tumours develop genetically less heterogeneous recurrent tumours with fewer immunogenic mutations (neoantigens). To quantify whether immunoediting underlies these observations, we infer that a neoantigen is immunogenic (high-quality) by two features: 'non-selfness', based on neoantigen similarity to known antigens [4,5], and 'selfness', based on the antigenic distance required for a neoantigen to differentially bind to the MHC or activate a T cell compared with its wild-type peptide. Using these features, we estimate cancer clone fitness as the aggregate cost of T cells recognizing high-quality neoantigens offset by gains from oncogenic mutations. With this model, we predict the clonal evolution of tumours to reveal that long-term survivors of pancreatic cancer develop recurrent tumours with fewer high-quality neoantigens. Thus, we submit evidence that the human immune system naturally edits neoantigens. Furthermore, we present a model to predict how immune pressure induces cancer cell populations to evolve over time. More broadly, our results argue that the immune system fundamentally surveils host genetic changes to suppress cancer.


Subjects
Antigens, Neoplasm , Cancer Survivors , Pancreatic Neoplasms , Antigens, Neoplasm/genetics , Antigens, Neoplasm/immunology , Humans , Pancreatic Neoplasms/genetics , Pancreatic Neoplasms/immunology , Pancreatic Neoplasms/pathology , T-Lymphocytes/immunology , Tumor Escape/immunology
12.
Phys Rev E ; 104(3-1): 034109, 2021 Sep.
Article in English | MEDLINE | ID: mdl-34654094

ABSTRACT

Restricted Boltzmann machines (RBM) are bilayer neural networks used for the unsupervised learning of model distributions from data. The bipartite architecture of RBM naturally defines an elegant sampling procedure, called alternating Gibbs sampling (AGS), where the configurations of the latent-variable layer are sampled conditional on the data-variable layer and vice versa. We study here the performance of AGS on several analytically tractable models borrowed from statistical mechanics. We show that standard AGS is not more efficient than classical Metropolis-Hastings (MH) sampling of the effective energy landscape defined on the data layer. However, RBM can identify meaningful representations of training data in their latent space. Furthermore, using these representations and combining Gibbs sampling with the MH algorithm in the latent space can enhance the sampling performance of the RBM when the hidden units encode weakly dependent features of the data. We illustrate our findings on three datasets: Bars and Stripes and MNIST, well known in machine learning, and the so-called lattice proteins dataset, introduced in theoretical biology to study the sequence-to-structure mapping in proteins.
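
The alternating Gibbs sampling procedure named in this abstract can be sketched as follows. This is a minimal illustration only, assuming a binary (0/1) RBM; the weight matrix `W`, the visible biases `b`, the hidden biases `c`, and all function names are hypothetical and are not taken from the paper, which studies other models and also a sampler combined with Metropolis-Hastings moves.

```python
import math
import random


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def sample_hidden(v, W, c, rng):
    """Sample every hidden unit given the visible configuration v."""
    h = []
    for mu in range(len(c)):
        field = c[mu] + sum(W[i][mu] * v[i] for i in range(len(v)))
        h.append(1 if rng.random() < sigmoid(field) else 0)
    return h


def sample_visible(h, W, b, rng):
    """Sample every visible unit given the hidden configuration h."""
    v = []
    for i in range(len(b)):
        field = b[i] + sum(W[i][mu] * h[mu] for mu in range(len(h)))
        v.append(1 if rng.random() < sigmoid(field) else 0)
    return v


def alternating_gibbs(v0, W, b, c, n_steps, seed=0):
    """Run n_steps of alternating Gibbs sampling starting from v0.

    W[i][mu] couples visible unit i to hidden unit mu (assumed layout).
    """
    rng = random.Random(seed)
    v = list(v0)
    for _ in range(n_steps):
        h = sample_hidden(v, W, c, rng)   # latent layer given data layer
        v = sample_visible(h, W, b, rng)  # data layer given latent layer
    return v
```

Because units within a layer are conditionally independent given the other layer, each half-step samples a whole layer at once; this is what the bipartite architecture buys.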

13.
PLoS Comput Biol ; 17(9): e1009297, 2021 09.
Article in English | MEDLINE | ID: mdl-34473697

ABSTRACT

With the increasing ability to use high-throughput next-generation sequencing to quantify the diversity of the human T cell receptor (TCR) repertoire, the ability to use TCR sequences to infer antigen-specificity could greatly aid potential diagnostics and therapeutics. Here, we use a machine-learning approach known as the Restricted Boltzmann Machine to develop a sequence-based inference approach to identify antigen-specific TCRs. Our approach combines probabilistic models of TCR sequences with clone abundance information to extract TCR sequence motifs central to an antigen-specific response. We use this model to identify patient-personalized TCR motifs that respond to individual tumor and infectious disease antigens, and to accurately discriminate specific from non-specific responses. Furthermore, the hidden structure of the model results in an interpretable representation space where TCRs responding to the same antigen cluster, correctly discriminating the response of TCRs to different viral epitopes. The model can be used to identify condition-specific responding TCRs. We focus on the examples of TCRs reactive to candidate neoantigens and selected epitopes in experiments of stimulated TCR clone expansion.


Subjects
Computational Biology/methods , Models, Statistical , T-Lymphocytes/immunology , Cancer Survivors , Carcinoma, Pancreatic Ductal/immunology , Cluster Analysis , Datasets as Topic , Humans , Pancreatic Neoplasms/immunology , Receptors, Antigen, T-Cell/immunology
14.
Bioinformatics ; 37(22): 4083-4090, 2021 11 18.
Article in English | MEDLINE | ID: mdl-34117879

ABSTRACT

MOTIVATION: Modeling of protein family sequence distribution from homologous sequence data recently received considerable attention, in particular for structure and function predictions, as well as for protein design. Notably, direct coupling analysis, a method to infer effective pairwise interactions between residues, was shown to capture important structural constraints and to successfully generate functional protein sequences. Building on this and other graphical models, we introduce a new framework to assess the quality of the secondary structures of the generated sequences with respect to reference structures for the family. RESULTS: We introduce two scoring functions characterizing the likelihood that the secondary structure of a protein sequence matches a reference structure, called Dot Product and Pattern Matching. We test these scores on published experimental protein mutagenesis and design datasets, and show improvement in the detection of nonfunctional sequences. We also show that the use of these scores helps reject nonfunctional sequences generated by graphical models (Restricted Boltzmann Machines) learned from homologous sequence alignments. AVAILABILITY AND IMPLEMENTATION: Data and code available at https://github.com/CyrilMa/ssqa. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Proteins , Proteins/chemistry , Amino Acid Sequence , Sequence Alignment , Protein Structure, Secondary , Mutagenesis
15.
Phys Rev E ; 103(5-1): 052413, 2021 May.
Article in English | MEDLINE | ID: mdl-34134280

ABSTRACT

Affinity maturation (AM) is the process through which the immune system is able to develop potent antibodies against new pathogens it encounters, and it underlies the efficacy of vaccines. At its core AM is analogous to a Darwinian evolutionary process, where B cells mutate and are selected on the basis of their affinity for an antigen (Ag), and Ag availability tunes the selective pressure. In cases when this selective pressure is high, the number of B cells might quickly decrease and the population might risk extinction in what is known as a population bottleneck. Here we study the probability for a B-cell lineage to survive this bottleneck scenario as a function of the progenitor affinity for the Ag. Using recursive relations and probability generating functions, we derive expressions for the average extinction time and progeny size for lineages that go extinct. We then extend our results to the full population, both in the absence and presence of competition for T-cell help, and quantify the population survival probability as a function of Ag concentration and initial population size. Our study suggests the population bottleneck phenomenology might represent a limiting case in the space of biologically plausible maturation scenarios, whose characterization could help guide the process of vaccine development.
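
To make the role of recursive relations and probability generating functions concrete, here is a minimal sketch for a textbook Galton-Watson branching process. It is not the affinity-dependent model of the paper: the offspring distribution `offspring_probs` and both function names are assumptions chosen for illustration. If G is the generating function of the offspring number, the probability q_n that a lineage started by one cell is extinct by generation n obeys q_{n+1} = G(q_n) with q_0 = 0.

```python
def pgf(s, offspring_probs):
    """Generating function G(s) = sum_k p_k s^k of the offspring number.

    offspring_probs[k] is the (assumed) probability of leaving k offspring.
    """
    return sum(p * s ** k for k, p in enumerate(offspring_probs))


def extinction_by_generation(offspring_probs, n_generations):
    """Return [q_1, ..., q_n], where q_n = P(lineage extinct by generation n)."""
    q = 0.0
    history = []
    for _ in range(n_generations):
        q = pgf(q, offspring_probs)  # recursion q_{n+1} = G(q_n)
        history.append(q)
    return history


def mean_extinction_time(offspring_probs, n_generations):
    """Mean extinction generation, conditioned on extinction within the horizon."""
    history = extinction_by_generation(offspring_probs, n_generations)
    previous = 0.0
    weighted = 0.0
    for generation, q in enumerate(history, start=1):
        weighted += generation * (q - previous)  # P(extinct exactly at n)
        previous = q
    return weighted / history[-1]
```

The sequence q_n increases toward the smallest fixed point of q = G(q), which is the ultimate extinction probability; the increments q_n - q_{n-1} give the distribution of the extinction time.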


Subjects
Antibody Affinity , B-Lymphocytes/immunology
16.
PLoS Comput Biol ; 17(3): e1008751, 2021 03.
Article in English | MEDLINE | ID: mdl-33765014

ABSTRACT

The sequences of antibodies from a given repertoire are highly diverse at a few sites located on the surface of a genome-encoded larger scaffold. The scaffold is often considered to play a lesser role than highly diverse, non-genome-encoded sites in controlling binding affinity and specificity. To gauge the impact of the scaffold, we carried out quantitative phage display experiments where we compare the response to selection for binding to four different targets of three different antibody libraries based on distinct scaffolds but harboring the same diversity at randomized sites. We first show that the response to selection of an antibody library may be captured by two measurable parameters. Second, we provide evidence that one of these parameters is determined by the degree of affinity maturation of the scaffold, affinity maturation being the process by which antibodies accumulate somatic mutations to evolve towards higher affinities during the natural immune response. In all cases, we find that libraries of antibodies built around maturated scaffolds have a lower response to selection for other arbitrary targets than libraries built around germline-based scaffolds. We thus propose that germline-encoded scaffolds have a higher selective potential than maturated ones as a consequence of a selection for this potential over the long-term evolution of germline antibody genes. Our results are a first step towards quantifying the evolutionary potential of biomolecules.


Subjects
Antibodies/genetics , Gene Library , Computational Biology , DNA/genetics , Evolution, Molecular , Humans
17.
Mol Biol Evol ; 38(6): 2428-2445, 2021 05 19.
Article in English | MEDLINE | ID: mdl-33555346

ABSTRACT

COVID-19 can lead to acute respiratory syndrome, which can be due to dysregulated immune signaling. We analyze the distribution of CpG dinucleotides, a pathogen-associated molecular pattern, in the SARS-CoV-2 genome. We characterize CpG content by a CpG force that accounts for statistical constraints acting on the genome at the nucleotide and amino acid levels. The CpG force, like the CpG content, is overall low compared with other pathogenic betacoronaviruses; however, it widely fluctuates along the genome, with a particularly low value, comparable with the circulating seasonal HKU1, in the spike coding region and a greater value, comparable with SARS and MERS, in the highly expressed nucleocapsid coding region (N ORF), whose transcripts are relatively abundant in the cytoplasm of infected cells and present in the 3'UTRs of all subgenomic RNA. This dual nature of CpG content could confer on SARS-CoV-2 the ability to avoid triggering pattern recognition receptors upon entry, while eliciting a stronger response during replication. We then investigate the evolution of synonymous mutations since the outbreak of the COVID-19 pandemic, finding a signature of CpG loss in regions with a greater CpG force. Sequence motifs preceding the CpG-loss-associated loci in the N ORF match recently identified binding patterns of the zinc finger antiviral protein. Using a model of the viral gene evolution under human host pressure, we find that synonymous mutations seem driven in the SARS-CoV-2 genome, and particularly in the N ORF, by the viral codon bias, the transition-transversion bias, and the pressure to lower CpG content.


Subjects
COVID-19/genetics , CpG Islands , Evolution, Molecular , Genome, Viral , RNA, Viral/genetics , SARS-CoV-2 , Humans , SARS-CoV-2/genetics , SARS-CoV-2/pathogenicity
18.
Neural Comput ; 33(4): 1063-1112, 2021 03 26.
Article in English | MEDLINE | ID: mdl-33513327

ABSTRACT

We study the learning dynamics and the representations emerging in recurrent neural networks (RNNs) trained to integrate one or multiple temporal signals. Combining analytical and numerical investigations, we characterize the conditions under which an RNN with n neurons learns to integrate D (≪ n) scalar signals of arbitrary duration. We show, for linear, ReLU, and sigmoidal neurons, that the internal state lives close to a D-dimensional manifold, whose shape is related to the activation function. Each neuron therefore carries, to various degrees, information about the value of all integrals. We discuss the deep analogy between our results and the concept of mixed selectivity forged by computational neuroscientists to interpret cortical recordings.

19.
Cell Syst ; 12(2): 195-202.e9, 2021 02 17.
Article in English | MEDLINE | ID: mdl-33338400

ABSTRACT

The recent increase of immunopeptidomics data, obtained by mass spectrometry or binding assays, opens up possibilities for investigating endogenous antigen presentation by the highly polymorphic human leukocyte antigen class I (HLA-I) protein. State-of-the-art methods predict with high accuracy presentation by HLA alleles that are well represented in databases at the time of release but have a poorer performance for rarer and less characterized alleles. Here, we introduce RBM-MHC, a method based on Restricted Boltzmann Machines (RBMs) for prediction of antigens presented on the Major Histocompatibility Complex (MHC) encoded by HLA genes. RBM-MHC can be trained on custom and newly available samples with no or a small amount of HLA annotations. RBM-MHC ensures improved predictions for rare alleles and matches state-of-the-art performance for well-characterized alleles while being less data demanding. RBM-MHC is shown to be a flexible and easily interpretable method that can be used as a predictor of cancer neoantigens and viral epitopes, as a tool for feature discovery, and to reconstruct peptide motifs presented on specific HLA molecules.


Subjects
Antigen Presentation/immunology , Computational Biology/methods , Histocompatibility Antigens Class I/genetics , Histocompatibility Antigens Class I/immunology , Algorithms , Alleles , Antigen Presentation/genetics , Databases, Protein , Epitopes , HLA Antigens/genetics , HLA Antigens/immunology , Humans , Machine Learning , Major Histocompatibility Complex/immunology , Mass Spectrometry/methods , Models, Theoretical , Peptides/chemistry , Protein Binding
20.
Science ; 369(6502): 440-445, 2020 07 24.
Article in English | MEDLINE | ID: mdl-32703877

ABSTRACT

The rational design of enzymes is an important goal for both fundamental and practical reasons. Here, we describe a process to learn the constraints for specifying proteins purely from evolutionary sequence data, design and build libraries of synthetic genes, and test them for activity in vivo using a quantitative complementation assay. For chorismate mutase, a key enzyme in the biosynthesis of aromatic amino acids, we demonstrate the design of natural-like catalytic function with substantial sequence diversity. Further optimization focuses the generative model toward function in a specific genomic context. The data show that sequence-based statistical models suffice to specify proteins and provide access to an enormous space of functional sequences. This result provides a foundation for a general process for evolution-based design of artificial proteins.


Subjects
Chorismate Mutase , Evolution, Molecular , Models, Genetic , Models, Statistical , Amino Acid Sequence , Chorismate Mutase/chemistry , Chorismate Mutase/genetics , Escherichia coli Proteins/chemistry , Escherichia coli Proteins/genetics