Your browser doesn't support javascript.
loading
Mostrar: 20 | 50 | 100
Resultados 1 - 11 de 11
Filtrar
1.
Bioinformatics ; 39(9)2023 09 02.
Artículo en Inglés | MEDLINE | ID: mdl-37647658

RESUMEN

SUMMARY: DCAlign is a new alignment method able to cope with the conservation and the co-evolution signals that characterize the columns of multiple sequence alignments of homologous sequences. However, the pre-processing steps required to align a candidate sequence are computationally demanding. We show in v1.0 how to dramatically reduce the overall computing time by including an empirical prior over an informative set of variables mirroring the presence of insertions and deletions. AVAILABILITY AND IMPLEMENTATION: DCAlign v1.0 is implemented in Julia and it is fully available at https://github.com/infernet-h2020/DCAlign.


Asunto(s)
Alineación de Secuencia , Biología Computacional
2.
Proc Natl Acad Sci U S A ; 118(32)2021 08 10.
Artículo en Inglés | MEDLINE | ID: mdl-34312253

RESUMEN

Contact tracing is an essential tool to mitigate the impact of a pandemic, such as the COVID-19 pandemic. In order to achieve efficient and scalable contact tracing in real time, digital devices can play an important role. While a lot of attention has been paid to analyzing the privacy and ethical risks of the associated mobile applications, so far much less research has been devoted to optimizing their performance and assessing their impact on the mitigation of the epidemic. We develop Bayesian inference methods to estimate the risk that an individual is infected. This inference is based on the list of his recent contacts and their own risk levels, as well as personal information such as results of tests or presence of syndromes. We propose to use probabilistic risk estimation to optimize testing and quarantining strategies for the control of an epidemic. Our results show that in some range of epidemic spreading (typically when the manual tracing of all contacts of infected people becomes practically impossible but before the fraction of infected people reaches the scale where a lockdown becomes unavoidable), this inference of individuals at risk could be an efficient way to mitigate the epidemic. Our approaches translate into fully distributed algorithms that only require communication between individuals who have recently been in contact. Such communication may be encrypted and anonymized, and thus, it is compatible with privacy-preserving standards. We conclude that probabilistic risk estimation is capable of enhancing the performance of digital contact tracing and should be considered in the mobile applications.


Asunto(s)
Trazado de Contacto/métodos , Epidemias/prevención & control , Algoritmos , Teorema de Bayes , COVID-19/epidemiología , COVID-19/prevención & control , Trazado de Contacto/estadística & datos numéricos , Humanos , Aplicaciones Móviles , Privacidad , Medición de Riesgo , SARS-CoV-2
3.
Biophys J ; 121(10): 1919-1930, 2022 05 17.
Artículo en Inglés | MEDLINE | ID: mdl-35422414

RESUMEN

Despite major environmental and genetic differences, microbial metabolic networks are known to generate consistent physiological outcomes across vastly different organisms. This remarkable robustness suggests that, at least in bacteria, metabolic activity may be guided by universal principles. The constrained optimization of evolutionarily motivated objective functions, such as the growth rate, has emerged as the key theoretical assumption for the study of bacterial metabolism. While conceptually and practically useful in many situations, the idea that certain functions are optimized is hard to validate in data. Moreover, it is not always clear how optimality can be reconciled with the high degree of single-cell variability observed in experiments within microbial populations. To shed light on these issues, we develop an inverse modeling framework that connects the fitness of a population of cells (represented by the mean single-cell growth rate) to the underlying metabolic variability through the maximum entropy inference of the distribution of metabolic phenotypes from data. While no clear objective function emerges, we find that, as the medium gets richer, the fitness and inferred variability for Escherichia coli populations follow and slowly approach the theoretically optimal bound defined by minimal reduction of variability at given fitness. These results suggest that bacterial metabolism may be crucially shaped by a population-level trade-off between growth and heterogeneity.


Asunto(s)
Escherichia coli , Redes y Vías Metabólicas , Bacterias/metabolismo , Entropía , Escherichia coli/metabolismo , Fenotipo
4.
BMC Bioinformatics ; 22(1): 528, 2021 Oct 29.
Artículo en Inglés | MEDLINE | ID: mdl-34715775

RESUMEN

BACKGROUND: Boltzmann machines are energy-based models that have been shown to provide an accurate statistical description of domains of evolutionary-related protein and RNA families. They are parametrized in terms of local biases accounting for residue conservation, and pairwise terms to model epistatic coevolution between residues. From the model parameters, it is possible to extract an accurate prediction of the three-dimensional contact map of the target domain. More recently, the accuracy of these models has been also assessed in terms of their ability in predicting mutational effects and generating in silico functional sequences. RESULTS: Our adaptive implementation of Boltzmann machine learning, adabmDCA, can be generally applied to both protein and RNA families and accomplishes several learning set-ups, depending on the complexity of the input data and on the user requirements. The code is fully available at https://github.com/anna-pa-m/adabmDCA . As an example, we have performed the learning of three Boltzmann machines modeling the Kunitz and Beta-lactamase2 protein domains and TPP-riboswitch RNA domain. CONCLUSIONS: The models learned by adabmDCA are comparable to those obtained by state-of-the-art techniques for this task, in terms of the quality of the inferred contact map as well as of the synthetically generated sequences. In addition, the code implements both equilibrium and out-of-equilibrium learning, which allows for an accurate and lossless training when the equilibrium one is prohibitive in terms of computational time, and allows for pruning irrelevant parameters using an information-based criterion.


Asunto(s)
Aprendizaje Automático , Proteínas , Humanos , Proteínas/genética , ARN
5.
PNAS Nexus ; 3(9): pgae377, 2024 Sep.
Artículo en Inglés | MEDLINE | ID: mdl-39285934

RESUMEN

The recent COVID-19 pandemic underscores the significance of early stage nonpharmacological intervention strategies. The widespread use of masks and the systematic implementation of contact tracing strategies provide a potentially equally effective and socially less impactful alternative to more conventional approaches, such as large-scale mobility restrictions. However, manual contact tracing faces strong limitations in accessing the network of contacts, and the scalability of currently implemented protocols for smartphone-based digital contact tracing becomes impractical during the rapid expansion phases of the outbreaks, due to the surge in exposure notifications and associated tests. A substantial improvement in digital contact tracing can be obtained through the integration of probabilistic techniques for risk assessment that can more effectively guide the allocation of diagnostic tests. In this study, we first quantitatively analyze the diagnostic and social costs associated with these containment measures based on contact tracing, employing three state-of-the-art models of SARS-CoV-2 spreading. Our results suggest that probabilistic techniques allow for more effective mitigation at a lower cost. Secondly, our findings reveal a remarkable efficacy of probabilistic contact-tracing techniques in performing backward and multistep tracing and capturing superspreading events.

6.
Sci Rep ; 13(1): 7350, 2023 May 05.
Artículo en Inglés | MEDLINE | ID: mdl-37147382

RESUMEN

Estimating observables from conditioned dynamics is typically computationally hard. While obtaining independent samples efficiently from unconditioned dynamics is usually feasible, most of them do not satisfy the imposed conditions and must be discarded. On the other hand, conditioning breaks the causal properties of the dynamics, which ultimately renders the sampling of the conditioned dynamics non-trivial and inefficient. In this work, a Causal Variational Approach is proposed, as an approximate method to generate independent samples from a conditioned distribution. The procedure relies on learning the parameters of a generalized dynamical model that optimally describes the conditioned distribution in a variational sense. The outcome is an effective and unconditioned dynamical model from which one can trivially obtain independent samples, effectively restoring the causality of the conditioned dynamics. The consequences are twofold: the method allows one to efficiently compute observables from the conditioned dynamics by averaging over independent samples; moreover, it provides an effective unconditioned distribution that is easy to interpret. This approximation can be applied virtually to any dynamics. The application of the method to epidemic inference is discussed in detail. The results of direct comparison with state-of-the-art inference methods, including the soft-margin approach and mean-field methods, are promising.

7.
Phys Rev E ; 104(2-1): 024407, 2021 Aug.
Artículo en Inglés | MEDLINE | ID: mdl-34525554

RESUMEN

Boltzmann machines (BMs) are widely used as generative models. For example, pairwise Potts models (PMs), which are instances of the BM class, provide accurate statistical models of families of evolutionarily related protein sequences. Their parameters are the local fields, which describe site-specific patterns of amino acid conservation, and the two-site couplings, which mirror the coevolution between pairs of sites. This coevolution reflects structural and functional constraints acting on protein sequences during evolution. The most conservative choice to describe the coevolution signal is to include all possible two-site couplings into the PM. This choice, typical of what is known as Direct Coupling Analysis, has been successful for predicting residue contacts in the three-dimensional structure, mutational effects, and generating new functional sequences. However, the resulting PM suffers from important overfitting effects: many couplings are small, noisy, and hardly interpretable; the PM is close to a critical point, meaning that it is highly sensitive to small parameter perturbations. In this work, we introduce a general parameter-reduction procedure for BMs, via a controlled iterative decimation of the less statistically significant couplings, identified by an information-based criterion that selects either weak or statistically unsupported couplings. For several protein families, our procedure allows one to remove more than 90% of the PM couplings, while preserving the predictive and generative properties of the original dense PM, and the resulting model is far away from criticality, hence more robust to noise.

8.
Phys Rev E ; 102(6-1): 062409, 2020 Dec.
Artículo en Inglés | MEDLINE | ID: mdl-33465950

RESUMEN

Sequences of nucleotides (for DNA and RNA) or amino acids (for proteins) are central objects in biology. Among the most important computational problems is that of sequence alignment, i.e., arranging sequences from different organisms in such a way to identify similar regions, to detect evolutionary relationships between sequences, and to predict biomolecular structure and function. This is typically addressed through profile models, which capture position specificities like conservation in sequences but assume an independent evolution of different positions. Over recent years, it has been well established that coevolution of different amino-acid positions is essential for maintaining three-dimensional structure and function. Modeling approaches based on inverse statistical physics can catch the coevolution signal in sequence ensembles, and they are now widely used in predicting protein structure, protein-protein interactions, and mutational landscapes. Here, we present DCAlign, an efficient alignment algorithm based on an approximate message-passing strategy, which is able to overcome the limitations of profile models, to include coevolution among positions in a general way, and to be therefore universally applicable to protein- and RNA-sequence alignment without the need of using complementary structural information. The potential of DCAlign is carefully explored using well-controlled simulated data, as well as real protein and RNA sequences.


Asunto(s)
Secuencia Conservada , Evolución Molecular , Modelos Genéticos
9.
J R Soc Interface ; 16(151): 20180844, 2019 02 28.
Artículo en Inglés | MEDLINE | ID: mdl-30958195

RESUMEN

Accessing the network through which a propagation dynamics diffuses is essential for understanding and controlling it. In a few cases, such information is available through direct experiments or thanks to the very nature of propagation data. In a majority of cases however, available information about the network is indirect and comes from partial observations of the dynamics, rendering the network reconstruction a fundamental inverse problem. Here we show that it is possible to reconstruct the whole structure of an interaction network and to simultaneously infer the complete time course of activation spreading, relying just on single epoch (i.e. snapshot) or time-scattered observations of a small number of activity cascades. The method that we present is built on a belief propagation approximation, that has shown impressive accuracy in a wide variety of relevant cases, and is able to infer interactions in the presence of incomplete time-series data by providing a detailed modelling of the posterior distribution of trajectories conditioned to the observations. Furthermore, we show by experiments that the information content of full cascades is relatively smaller than that of sparse observations or single snapshots.


Asunto(s)
Algoritmos , Biología Computacional , Infecciones/epidemiología , Modelos Biológicos
10.
Phys Rev E ; 100(3-1): 032134, 2019 Sep.
Artículo en Inglés | MEDLINE | ID: mdl-31639925

RESUMEN

The problem of efficiently reconstructing tomographic images can be mapped into a Bayesian inference problem over the space of pixels densities. Solutions to this problem are given by pixels assignments that are compatible with tomographic measurements and maximize a posterior probability density. This maximization can be performed with standard local optimization tools when the log-posterior is a convex function, but it is generally intractable when introducing realistic nonconcave priors that reflect typical images features such as smoothness or sharpness. We introduce a new method to reconstruct images obtained from Radon projections by using expectation propagation, which allows us to approximate the intractable posterior. We show, by means of extensive simulations, that, compared to state-of-the-art algorithms for this task, expectation propagation paired with very simple but non-log-concave priors is often able to reconstruct images up to a smaller error while using a lower amount of information per pixel. We provide estimates for the critical rate of information per pixel above which recovery is error-free by means of simulations on ensembles of phantom and real images.

11.
Nat Commun ; 8: 14915, 2017 04 06.
Artículo en Inglés | MEDLINE | ID: mdl-28382977

RESUMEN

Assuming a steady-state condition within a cell, metabolic fluxes satisfy an underdetermined linear system of stoichiometric equations. Characterizing the space of fluxes that satisfy such equations along with given bounds (and possibly additional relevant constraints) is considered of utmost importance for the understanding of cellular metabolism. Extreme values for each individual flux can be computed with linear programming (as flux balance analysis), and their marginal distributions can be approximately computed with Monte Carlo sampling. Here we present an approximate analytic method for the latter task based on expectation propagation equations that does not involve sampling and can achieve much better predictions than other existing analytic methods. The method is iterative, and its computation time is dominated by one matrix inversion per iteration. With respect to sampling, we show through extensive simulation that it has some advantages including computation time, and the ability to efficiently fix empirically estimated distributions of fluxes.


Asunto(s)
Escherichia coli/metabolismo , Análisis de Flujos Metabólicos , Redes y Vías Metabólicas , Programación Lineal , Método de Montecarlo
SELECCIÓN DE REFERENCIAS
DETALLE DE LA BÚSQUEDA