Results 1 - 5 of 5

1.
Bioinformatics ; 40(8), 2024 Aug 02.
Article in English | MEDLINE | ID: mdl-39115383

ABSTRACT

SUMMARY: Deep mutational scanning (DMS) experiments provide a powerful method to measure the functional effects of genetic mutations at massive scales. However, the data generated from these experiments can be difficult to analyze, with significant variation between experimental replicates. To overcome this challenge, we developed popDMS, a computational method based on population genetics theory, to infer the functional effects of mutations from DMS data. Through extensive tests, we found that the functional effects of single mutations and epistasis inferred by popDMS are highly consistent across replicates, comparing favorably with existing methods. Our approach is flexible and can be widely applied to DMS data that includes multiple time points, multiple replicates, and different experimental conditions. AVAILABILITY AND IMPLEMENTATION: popDMS is implemented in Python and Julia, and is freely available on GitHub at https://github.com/bartonlab/popDMS.


Subject(s)
Mutation , Software , Epistasis, Genetic , Computational Biology/methods , Genetics, Population/methods , High-Throughput Nucleotide Sequencing/methods , DNA Mutational Analysis/methods , Humans , Algorithms
2.
bioRxiv ; 2024 Jul 16.
Article in English | MEDLINE | ID: mdl-39071321

ABSTRACT

Human immunodeficiency virus (HIV)-1 exhibits remarkable genetic diversity. For this reason, an effective HIV-1 vaccine must elicit antibodies that can neutralize many variants of the virus. While broadly neutralizing antibodies (bnAbs) have been isolated from HIV-1 infected individuals, a general understanding of the virus-antibody coevolutionary processes that lead to their development remains incomplete. We performed a quantitative study of HIV-1 evolution in two individuals who developed bnAbs. We observed strong selection early in infection for mutations affecting HIV-1 envelope glycosylation and escape from autologous strain-specific antibodies, followed by weaker selection for bnAb resistance later in infection. To confirm our findings, we analyzed data from rhesus macaques infected with viruses derived from the same two individuals. We inferred remarkably similar fitness effects of HIV-1 mutations in humans and macaques. Moreover, we observed a striking pattern of rapid HIV-1 evolution, consistent in both humans and macaques, that precedes the development of bnAbs. Our work highlights strong parallels between infection in rhesus macaques and humans, and it reveals a quantitative evolutionary signature of bnAb development.

3.
Phys Rev E ; 107(2-1): 024116, 2023 Feb.
Article in English | MEDLINE | ID: mdl-36932614

ABSTRACT

Many dynamical systems, from quantum many-body systems to evolving populations to financial markets, are described by stochastic processes. Parameters characterizing such processes can often be inferred using information integrated over stochastic paths. However, estimating time-integrated quantities from real data with limited time resolution is challenging. Here, we propose a framework for accurately estimating time-integrated quantities using Bézier interpolation. We applied our approach to two dynamical inference problems: Determining fitness parameters for evolving populations and inferring forces driving Ornstein-Uhlenbeck processes. We found that Bézier interpolation reduces the estimation bias for both dynamical inference problems. This improvement was especially noticeable for data sets with limited time resolution. Our method could be broadly applied to improve accuracy for other dynamical inference problems using finitely sampled data.
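A minimal sketch of the idea of integrating along an interpolated path, assuming NumPy. The abstract does not specify how the Bézier control points are chosen, so the construction here (cubic segments whose inner control points come from finite-difference slopes at the samples) is an illustrative assumption, and the function names `bezier_segment` and `integrate_interpolated` are hypothetical rather than taken from the paper.

```python
import numpy as np

def bezier_segment(p0, p1, p2, p3, u):
    """Evaluate a cubic Bezier curve with control values p0..p3 at u in [0, 1]."""
    return ((1 - u) ** 3 * p0 + 3 * (1 - u) ** 2 * u * p1
            + 3 * (1 - u) * u ** 2 * p2 + u ** 3 * p3)

def integrate_interpolated(t, x, f, n_sub=50):
    """Estimate the integral of f(x(t)) over [t[0], t[-1]].

    Between consecutive samples, x(t) is replaced by a cubic Bezier segment.
    Assumption: inner control points are set from finite-difference slopes.
    """
    t = np.asarray(t, dtype=float)
    x = np.asarray(x, dtype=float)
    slopes = np.gradient(x, t)              # slope estimate at each sample
    u = np.linspace(0.0, 1.0, n_sub + 1)    # local parameter within a segment
    du = u[1] - u[0]
    total = 0.0
    for k in range(len(t) - 1):
        dt = t[k + 1] - t[k]
        p0, p3 = x[k], x[k + 1]
        p1 = p0 + slopes[k] * dt / 3.0      # control point matching left slope
        p2 = p3 - slopes[k + 1] * dt / 3.0  # control point matching right slope
        y = f(bezier_segment(p0, p1, p2, p3, u))
        # Trapezoidal rule on the finely resolved interpolated path.
        total += np.sum((y[1:] + y[:-1]) / 2.0) * du * dt
    return total
```

For an allele-frequency trajectory, `f` could be something like `lambda x: x * (1 - x)`, a variance-like term that commonly appears in population-genetic estimators; that choice is likewise an assumption made for illustration.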

4.
Phys Rev E ; 104(2-1): 024407, 2021 Aug.
Article in English | MEDLINE | ID: mdl-34525554

ABSTRACT

Boltzmann machines (BMs) are widely used as generative models. For example, pairwise Potts models (PMs), which are instances of the BM class, provide accurate statistical models of families of evolutionarily related protein sequences. Their parameters are the local fields, which describe site-specific patterns of amino acid conservation, and the two-site couplings, which mirror the coevolution between pairs of sites. This coevolution reflects structural and functional constraints acting on protein sequences during evolution. The most conservative choice to describe the coevolution signal is to include all possible two-site couplings in the PM. This choice, typical of what is known as Direct Coupling Analysis, has been successful in predicting residue contacts in the three-dimensional structure, predicting mutational effects, and generating new functional sequences. However, the resulting PM suffers from important overfitting effects: many couplings are small, noisy, and hardly interpretable; the PM is close to a critical point, meaning that it is highly sensitive to small parameter perturbations. In this work, we introduce a general parameter-reduction procedure for BMs, via a controlled iterative decimation of the less statistically significant couplings, identified by an information-based criterion that selects either weak or statistically unsupported couplings. For several protein families, our procedure allows one to remove more than 90% of the PM couplings, while preserving the predictive and generative properties of the original dense PM, and the resulting model is far from criticality, hence more robust to noise.
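The iterative decimation described above can be illustrated with a deliberately simplified sketch, assuming NumPy. The abstract's information-based criterion is not spelled out, so this example substitutes plain coupling magnitude as the score. It also omits any re-fitting of the remaining parameters between rounds, which a full procedure would presumably need. The function name `decimate_couplings` and its parameters are hypothetical.

```python
import numpy as np

def decimate_couplings(J, target_sparsity=0.9, fraction_per_round=0.1):
    """Iteratively zero the weakest site-pair couplings.

    J: symmetric (L, L) array of coupling strengths, one number per site pair
       (assumption: e.g. a norm of each q x q coupling block).
    Returns the pruned copy of J and a symmetric boolean mask of kept pairs.
    """
    J = np.array(J, dtype=float)            # copy so the input is not modified
    L = J.shape[0]
    rows, cols = np.triu_indices(L, k=1)    # each site pair counted once
    active = np.ones(rows.size, dtype=bool)
    n_target = int(round(target_sparsity * rows.size))
    n_removed = 0
    while n_removed < n_target:
        idx = np.flatnonzero(active)
        # Stand-in score: absolute magnitude (NOT the paper's criterion).
        scores = np.abs(J[rows[idx], cols[idx]])
        n_drop = max(1, int(fraction_per_round * idx.size))
        n_drop = min(n_drop, n_target - n_removed)
        weakest = idx[np.argsort(scores)[:n_drop]]
        active[weakest] = False
        J[rows[weakest], cols[weakest]] = 0.0
        J[cols[weakest], rows[weakest]] = 0.0
        n_removed += n_drop
    mask = np.zeros((L, L), dtype=bool)
    mask[rows[active], cols[active]] = True
    mask |= mask.T                          # make the mask symmetric
    return J, mask
```

Removing a small fraction per round, rather than everything at once, is what makes the decimation "controlled": in a complete pipeline each round would be a natural place to re-learn the surviving parameters and re-score them.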

5.
Phys Rev E ; 100(3-1): 032128, 2019 Sep.
Article in English | MEDLINE | ID: mdl-31639992

ABSTRACT

Statistical models for families of evolutionarily related proteins have recently gained interest: In particular, pairwise Potts models such as those inferred by direct-coupling analysis have been able to extract information about the three-dimensional structure of folded proteins and about the effect of amino acid substitutions in proteins. These models are typically required to reproduce the one- and two-point statistics of the amino acid usage in a protein family, i.e., to capture the so-called residue conservation and covariation statistics of proteins of common evolutionary origin. Pairwise Potts models are the maximum-entropy models achieving this. Although successful, these models depend on huge numbers of ad hoc introduced parameters, which have to be estimated from finite amounts of data and whose biophysical interpretation remains unclear. Here, we propose an approach to parameter reduction, which is based on selecting collective sequence motifs. It naturally leads to the formulation of statistical sequence models in terms of Hopfield-Potts models. These models can be accurately inferred using a mapping to restricted Boltzmann machines and persistent contrastive divergence. We show that, when applied to protein data, even 20-40 patterns are sufficient to obtain statistically close-to-generative models. The Hopfield patterns form interpretable sequence motifs and may be used to cluster amino acid sequences into functional subfamilies. However, the distributed collective nature of these motifs intrinsically limits the ability of Hopfield-Potts models to predict contact maps, showing the necessity of developing models going beyond the Hopfield-Potts models discussed here.
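The one- and two-point statistics mentioned above can be made concrete with a short sketch, assuming NumPy and an alignment already encoded as integers in the range 0 to q-1. The function name `one_and_two_point_stats` is hypothetical, and the example leaves out sequence reweighting and pseudocounts, which practical pipelines often add.

```python
import numpy as np

def one_and_two_point_stats(msa, q):
    """Empirical single-site and pairwise frequencies of an encoded alignment.

    msa: (n_seq, L) integer array with entries in 0..q-1.
    Returns f1 with shape (L, q), f2 with shape (L, q, L, q), and the
    connected correlations cov = f2 - f1 * f1 with the same shape as f2.
    """
    msa = np.asarray(msa)
    n_seq = msa.shape[0]
    onehot = np.eye(q)[msa]                 # (n_seq, L, q) indicator encoding
    f1 = onehot.mean(axis=0)                # conservation: f_i(a)
    # Covariation: f_ij(a, b), fraction of sequences with a at i and b at j.
    f2 = np.einsum('nia,njb->iajb', onehot, onehot) / n_seq
    cov = f2 - f1[:, :, None, None] * f1[None, None, :, :]
    return f1, f2, cov
```

A pairwise Potts model fitted to the family is constrained to match `f1` and `f2`; `cov` isolates the covariation signal that remains once conservation is accounted for.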


Subject(s)
Models, Statistical , Proteins/chemistry , Amino Acid Motifs , Cluster Analysis , Likelihood Functions , Protein Folding