Results 1 - 6 of 6
1.
Cell Syst ; 15(1): 4-18.e4, 2024 01 17.
Article in English | MEDLINE | ID: mdl-38194961

ABSTRACT

Machine learning-guided protein engineering is rapidly progressing; however, collecting high-quality, large datasets remains a bottleneck. Directed evolution and protein engineering studies often require extensive experimental processes to eliminate noise and label protein sequence-function data. Meta learning has proven effective in other fields in learning from noisy data via bi-level optimization given the availability of a small dataset with trusted labels. Here, we leverage meta learning approaches to overcome noisy and under-labeled data and expedite workflows in antibody engineering. We generate yeast display antibody mutagenesis libraries and screen them for target antigen binding followed by deep sequencing. We then create representative learning tasks, including learning from noisy training data, positive and unlabeled learning, and learning out of distribution properties. We demonstrate that meta learning has the potential to reduce experimental screening time and improve the robustness of machine learning models by training with noisy and under-labeled training data.


Subject(s)
Antibodies , Engineering , Amino Acid Sequence , Machine Learning , Mutagenesis
2.
Bioinform Adv ; 3(1): vbac094, 2023.
Article in English | MEDLINE | ID: mdl-36698759

ABSTRACT

Summary: Machine learning-guided protein engineering is a rapidly advancing field. Despite major experimental and computational advances, collecting protein genotype (sequence) and phenotype (function) data remains time- and resource-intensive. As a result, the quality and quantity of training data are often a limiting factor in developing machine learning models. Data augmentation techniques have been successfully applied to the fields of computer vision and natural language processing; however, there is a lack of such augmentation techniques for biological sequence data. Towards this end, we develop nucleotide augmentation (NTA), which leverages natural nucleotide codon degeneracy to augment protein sequence data via synonymous codon substitution. As a proof of concept for protein engineering, we test several online and offline augmentation implementations to train machine learning models with benchmark datasets of protein genotype and phenotype, revealing performance gains on par with or surpassing benchmark models using a fraction of the training data. NTA also enables substantial improvements for classification tasks under heavy class imbalance. Availability and implementation: The code used in this study is publicly available at https://github.com/minotm/NTA. Supplementary information: Supplementary data are available at Bioinformatics Advances online.
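The idea of augmenting protein data through synonymous codon substitution can be illustrated with a minimal Python sketch. This is not the code from the NTA repository: the function names `augment` and `translate`, the uniform random codon choice, and the example peptide are assumptions made for illustration only. The codon table is the standard genetic code.

```python
import random

# Standard genetic code: one-letter amino acid -> synonymous DNA codons.
CODONS = {
    "A": ["GCT", "GCC", "GCA", "GCG"],
    "R": ["CGT", "CGC", "CGA", "CGG", "AGA", "AGG"],
    "N": ["AAT", "AAC"],
    "D": ["GAT", "GAC"],
    "C": ["TGT", "TGC"],
    "Q": ["CAA", "CAG"],
    "E": ["GAA", "GAG"],
    "G": ["GGT", "GGC", "GGA", "GGG"],
    "H": ["CAT", "CAC"],
    "I": ["ATT", "ATC", "ATA"],
    "L": ["TTA", "TTG", "CTT", "CTC", "CTA", "CTG"],
    "K": ["AAA", "AAG"],
    "M": ["ATG"],
    "F": ["TTT", "TTC"],
    "P": ["CCT", "CCC", "CCA", "CCG"],
    "S": ["TCT", "TCC", "TCA", "TCG", "AGT", "AGC"],
    "T": ["ACT", "ACC", "ACA", "ACG"],
    "W": ["TGG"],
    "Y": ["TAT", "TAC"],
    "V": ["GTT", "GTC", "GTA", "GTG"],
}

# Reverse lookup used to check that augmentation preserves the protein.
CODON_TO_AA = {codon: aa for aa, codons in CODONS.items() for codon in codons}


def augment(protein, n_copies, seed=0):
    """Return n_copies DNA sequences that all encode `protein`.

    Hypothetical helper: each residue is replaced by a codon drawn
    uniformly at random from its synonymous set (an assumption, not
    necessarily the sampling scheme used in the paper).
    """
    rng = random.Random(seed)
    return [
        "".join(rng.choice(CODONS[aa]) for aa in protein)
        for _ in range(n_copies)
    ]


def translate(dna):
    """Translate a DNA sequence (length divisible by 3) back to protein."""
    return "".join(CODON_TO_AA[dna[i:i + 3]] for i in range(0, len(dna), 3))


if __name__ == "__main__":
    for dna in augment("MKVLW", n_copies=3):
        print(dna, translate(dna))
```

In a training pipeline, every augmented nucleotide sequence would inherit the label of the protein it encodes, so one labeled protein can yield several distinct training inputs.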

3.
Science ; 381(6656): eadh1720, 2023 07 28.
Article in English | MEDLINE | ID: mdl-37499032

ABSTRACT

Fine-tuning of protein-protein interactions occurs naturally through coevolution, but this process is difficult to recapitulate in the laboratory. We describe a platform for synthetic protein-protein coevolution that can isolate matched pairs of interacting muteins from complex libraries. This large dataset of coevolved complexes drove a systems-level analysis of molecular recognition between Z domain-affibody pairs spanning a wide range of structures, affinities, cross-reactivities, and orthogonalities, and captured a broad spectrum of coevolutionary networks. Furthermore, we harnessed pretrained protein language models to expand, in silico, the amino acid diversity of our coevolution screen, predicting remodeled interfaces beyond the reach of the experimental library. The integration of these approaches provides a means of simulating protein coevolution and generating protein complexes with diverse molecular recognition properties for biotechnology and synthetic biology.


Subject(s)
Directed Molecular Evolution , Protein Interaction Domains and Motifs , Proteins , Amino Acids/chemistry , Machine Learning , Proteins/chemistry , Directed Molecular Evolution/methods , Datasets as Topic , Staphylococcal Protein A/chemistry
4.
Trends Pharmacol Sci ; 43(2): 123-135, 2022 02.
Article in English | MEDLINE | ID: mdl-34895944

ABSTRACT

The biophysical and functional properties of monoclonal antibody (mAb) drug candidates are often improved by protein engineering methods to increase the probability of clinical efficacy. One emerging method is deep mutational scanning (DMS) which combines the power of exhaustive protein mutagenesis and functional screening with deep sequencing and bioinformatics. The application of DMS has yielded significant improvements to the affinity, specificity, and stability of several preclinical antibodies alongside novel applications such as introducing multi-specific binding properties. DMS has also been applied directly on target antigens to precisely map antibody-binding epitopes and notably to profile the mutational escape potential of viral targets (e.g., SARS-CoV-2 variants). Finally, DMS combined with machine learning is enabling advances in the computational screening and engineering of therapeutic antibodies.


Subject(s)
COVID-19 , SARS-CoV-2 , Antibodies, Viral , Humans , Spike Glycoprotein, Coronavirus
5.
PLoS One ; 12(11): e0187373, 2017.
Article in English | MEDLINE | ID: mdl-29155837

ABSTRACT

Complement is an important pathway in innate immunity, inflammation, and many disease processes. However, despite its importance, there are few validated mathematical models of complement activation. In this study, we developed an ensemble of experimentally validated reduced order complement models. We combined ordinary differential equations with logical rules to produce a compact yet predictive model of complement activation. The model, which described the lectin and alternative pathways, was an order of magnitude smaller than comparable models in the literature. We estimated an ensemble of model parameters from in vitro dynamic measurements of the C3a and C5a complement proteins. Subsequently, we validated the model on unseen C3a and C5a measurements not used for model training. Despite its small size, the model was surprisingly predictive. Global sensitivity and robustness analysis suggested complement was robust to any single therapeutic intervention. Only the simultaneous knockdown of both C3 and C5 consistently reduced C3a and C5a formation from all pathways. Taken together, we developed a validated mathematical model of complement activation that was computationally inexpensive, and could easily be incorporated into pre-existing or new pharmacokinetic models of immune system function. The model described experimental data, and predicted the need for multiple points of therapeutic intervention to fully disrupt complement activation.


Subject(s)
Complement Activation/genetics , Immunity, Innate , Inflammation/drug therapy , Lectins/immunology , Models, Theoretical , Complement C3/genetics , Complement C3/immunology , Complement C3a/genetics , Complement C3a/immunology , Complement C5/genetics , Complement C5/immunology , Complement C5a/genetics , Complement C5a/immunology , Gene Knockdown Techniques , Humans , Inflammation/immunology , Lectins/pharmacokinetics , Lectins/therapeutic use , Pharmacokinetics
6.
BMC Syst Biol ; 11(1): 10, 2017 01 25.
Article in English | MEDLINE | ID: mdl-28122561

ABSTRACT

BACKGROUND: Ensemble modeling is a promising approach for obtaining robust predictions and coarse-grained population behavior in deterministic mathematical models. Ensemble approaches address model uncertainty by using parameter or model families instead of single best-fit parameters or fixed model structures. Parameter ensembles can be selected based upon simulation error, along with other criteria such as diversity or steady-state performance. Simulations using parameter ensembles can estimate confidence intervals on model variables, and robustly constrain model predictions, despite having many poorly constrained parameters. RESULTS: In this software note, we present a multiobjective-based technique to estimate parameter or model ensembles, the Pareto Optimal Ensemble Technique in the Julia programming language (JuPOETs). JuPOETs integrates simulated annealing with Pareto optimality to estimate ensembles on or near the optimal tradeoff surface between competing training objectives. We demonstrate JuPOETs on a suite of multiobjective problems, including test functions with parameter bounds and system constraints as well as for the identification of a proof-of-concept biochemical model with four conflicting training objectives. JuPOETs identified optimal or near-optimal solutions approximately six-fold faster than a corresponding implementation in Octave for the suite of test functions. For the proof-of-concept biochemical model, JuPOETs produced an ensemble of parameters that captured the mean of the training data for conflicting data sets, while simultaneously estimating parameter sets that performed well on each of the individual objective functions. CONCLUSIONS: JuPOETs is a promising approach for the estimation of parameter and model ensembles using multiobjective optimization. JuPOETs can be adapted to solve many problem types, including mixed binary and continuous variable types, bilevel optimization problems and constrained problems without altering the base algorithm. JuPOETs is open source, available under an MIT license, and can be installed using the Julia package manager from the JuPOETs GitHub repository.
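The notion of an "optimal tradeoff surface between competing training objectives" can be made concrete with a short sketch of Pareto ranking. The example is written in Python rather than Julia and is not taken from JuPOETs; the function names `dominates` and `pareto_rank`, the minimization convention, and the sample error values are assumptions for illustration. The rank used here (the number of other candidates that dominate a given one) is one common definition; the abstract does not state which definition JuPOETs uses.

```python
def dominates(a, b):
    """True if error vector `a` Pareto-dominates `b` (all objectives minimized).

    `a` dominates `b` when it is no worse in every objective and
    strictly better in at least one.
    """
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_rank(errors):
    """Rank each candidate by how many candidates dominate it.

    Rank 0 means no candidate is better in every objective, i.e. the
    candidate lies on the tradeoff (Pareto) front of this set.
    """
    return [sum(dominates(other, e) for other in errors) for e in errors]


if __name__ == "__main__":
    # Hypothetical training errors for five parameter sets on two objectives.
    errors = [(1.0, 4.0), (2.0, 2.0), (4.0, 1.0), (3.0, 3.0), (5.0, 5.0)]
    print(pareto_rank(errors))  # [0, 0, 0, 1, 4]
```

An ensemble "on or near" the tradeoff surface could then be selected by keeping candidates whose rank is at or below a small threshold, with a search procedure such as simulated annealing proposing new candidates.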


Subject(s)
Models, Biological , Programming Languages , Uncertainty