1.
Phys Rev Lett ; 130(6): 067401, 2023 Feb 10.
Article in English | MEDLINE | ID: mdl-36827575

ABSTRACT

Real-world datasets characterized by discrete features are ubiquitous: from categorical surveys to clinical questionnaires, from unweighted networks to DNA sequences. Nevertheless, the most common unsupervised dimensionality reduction methods are designed for continuous spaces, and their use for discrete spaces can lead to errors and biases. In this Letter we introduce an algorithm to infer the intrinsic dimension (ID) of datasets embedded in discrete spaces. We demonstrate its accuracy on benchmark datasets, and we apply it to analyze a metagenomic dataset for species fingerprinting, finding a surprisingly small ID, of order 2. This suggests that evolutionary pressure acts on a low-dimensional manifold despite the high dimensionality of sequence space.
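
One way to see where such biases can come from is to count lattice points directly. The sketch below is an illustration only and is not the estimator introduced in the Letter: it assumes data living on the integer lattice Z^d with the L1 (Manhattan) metric, and the function names are hypothetical. It shows that the number of sites within a given radius does not grow as radius^d at small radii, which is the scaling continuous-space estimators rely on.

```python
from math import comb, log


def l1_ball_volume(radius: int, dim: int) -> int:
    """Count the sites of the integer lattice Z^dim within L1 distance `radius` of a site."""
    # Choose k nonzero coordinates, a sign for each, and split at most `radius` among them.
    return sum(2**k * comb(dim, k) * comb(radius, k) for k in range(min(dim, radius) + 1))


def apparent_dimension(radius: int, dim: int) -> float:
    """Dimension inferred by assuming volume ~ radius**d between radius and 2 * radius."""
    return log(l1_ball_volume(2 * radius, dim) / l1_ball_volume(radius, dim)) / log(2)


# On a 2D lattice the apparent dimension is well below 2 at small radii
# and only approaches 2 as the radius grows.
small_scale = apparent_dimension(1, 2)
large_scale = apparent_dimension(50, 2)
```

Under these assumptions, a continuous-volume argument applied at the scale of a few lattice spacings would underestimate the dimension, so a discrete-space method has to use the exact lattice count rather than a power law.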

2.
Patterns (N Y) ; 3(10): 100589, 2022 Oct 14.
Article in English | MEDLINE | ID: mdl-36277821

ABSTRACT

DADApy is a Python software package for analyzing and characterizing high-dimensional data manifolds. It provides methods for estimating the intrinsic dimension and the probability density, for performing density-based clustering, and for comparing different distance metrics. We review the main functionalities of the package and exemplify its usage in a synthetic dataset and in a real-world application. DADApy is freely available under the open-source Apache 2.0 license.
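
To make "estimating the intrinsic dimension" concrete, here is a minimal pure-Python sketch of the TwoNN estimator, which infers the dimension from the ratio of each point's second- to first-nearest-neighbor distance. It deliberately does not call DADApy, whose API is not described in the abstract; the function name `two_nn_id` and the synthetic dataset are assumptions made for illustration.

```python
import math
import random


def two_nn_id(points):
    """Maximum-likelihood TwoNN estimate: N / sum(log(r2 / r1)) over all points."""
    log_ratios = []
    for i, p in enumerate(points):
        # Brute-force distances to all other points; fine for a small demo.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        r1, r2 = dists[0], dists[1]
        if r1 > 0:  # skip exact duplicates, which make the ratio undefined
            log_ratios.append(math.log(r2 / r1))
    return len(log_ratios) / sum(log_ratios)


rng = random.Random(0)
# A 2D plane embedded in 5D through a fixed linear map: the true ID is 2.
plane = []
for _ in range(400):
    u, v = rng.random(), rng.random()
    plane.append((u, v, u + v, u - v, 2.0 * u))

estimated_id = two_nn_id(plane)
```

The estimate should land close to 2 even though each point has five coordinates, which is the sense in which the intrinsic dimension differs from the embedding dimension.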

3.
PNAS Nexus ; 1(2): pgac039, 2022 May.
Article in English | MEDLINE | ID: mdl-36713323

ABSTRACT

Real-world data typically contain a large number of features that are often heterogeneous in nature, relevance, and also units of measure. When assessing the similarity between data points, one can build various distance measures using subsets of these features. Finding a small set of features that still retains sufficient information about the dataset is important for the successful application of many statistical learning approaches. We introduce a statistical test that can assess the relative information retained when using 2 different distance measures, and determine if they are equivalent, independent, or if one is more informative than the other. This ranking can in turn be used to identify the most informative distance measure and, therefore, the most informative set of features, out of a pool of candidates. To illustrate the general applicability of our approach, we show that it reproduces the known importance ranking of policy variables for Covid-19 control, and also identifies compact yet informative descriptors for atomic structures. We further provide initial evidence that the information asymmetry measured by the proposed test can be used to infer relationships of causality between the features of a dataset. The method is general and should be applicable to many branches of science.
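
The comparison of two distance measures can be sketched with nearest-neighbor ranks. The code below assumes the rank-based definition Delta(A -> B) = (2/N) times the average rank, under distance B, of each point's nearest neighbor under distance A; values near 0 mean A predicts B well and values near 1 mean it does not. It is a brute-force illustration with hypothetical names, not the authors' implementation.

```python
import math
import random


def information_imbalance(space_a, space_b):
    """Delta(A -> B): mean rank in B of each point's nearest neighbor in A, times 2/N."""
    n = len(space_a)
    total_rank = 0
    for i in range(n):
        others = [j for j in range(n) if j != i]
        # Nearest neighbor of point i according to distance A.
        nn_a = min(others, key=lambda j: math.dist(space_a[i], space_a[j]))
        # Rank of that same neighbor when sorting by distance B (1 = closest).
        d_b = math.dist(space_b[i], space_b[nn_a])
        rank_b = 1 + sum(1 for j in others if math.dist(space_b[i], space_b[j]) < d_b)
        total_rank += rank_b
    return 2.0 * total_rank / n**2


rng = random.Random(1)
full = [(rng.random(), rng.random()) for _ in range(200)]  # both features
x_only = [(x,) for x, _ in full]                           # first feature only

full_to_x = information_imbalance(full, x_only)
x_to_full = information_imbalance(x_only, full)
```

In this toy case the full feature set predicts neighborhoods in the reduced one much better than the reverse, so `full_to_x` comes out smaller than `x_to_full`: the asymmetry is what ranks one distance as more informative than the other.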

4.
J Chem Phys ; 154(22): 224112, 2021 Jun 14.
Article in English | MEDLINE | ID: mdl-34241204

ABSTRACT

We probe the accuracy of linear ridge regression employing a three-body local density representation derived from the atomic cluster expansion. We benchmark the accuracy of this framework in the prediction of formation energies and atomic forces in molecules and solids. We find that such a simple regression framework performs on par with state-of-the-art machine learning methods which are, in most cases, more complex and more computationally demanding. Subsequently, we look for ways to sparsify the descriptor and further improve the computational efficiency of the method. To this aim, we use both principal component analysis and least absolute shrinkage operator regression for energy fitting on six single-element datasets. Both methods highlight the possibility of constructing a descriptor that is four times smaller than the original with a similar or even improved accuracy. Furthermore, we find that the reduced descriptors share a sizable fraction of their features across the six independent datasets, hinting at the possibility of designing material-agnostic, optimally compressed, and accurate descriptors.
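
For orientation, the two regression objectives named above can be written in their standard textbook form. The symbols are assumptions introduced here, not notation from the paper: $X$ is the matrix of descriptor vectors (one row per structure), $\mathbf{y}$ the target energies, and $\lambda, \alpha > 0$ the regularization strengths.

```latex
\hat{\mathbf{w}}_{\mathrm{ridge}}
  = \arg\min_{\mathbf{w}} \; \lVert \mathbf{y} - X\mathbf{w} \rVert_2^2 + \lambda \lVert \mathbf{w} \rVert_2^2
  = \left( X^\top X + \lambda I \right)^{-1} X^\top \mathbf{y}

\hat{\mathbf{w}}_{\mathrm{LASSO}}
  = \arg\min_{\mathbf{w}} \; \lVert \mathbf{y} - X\mathbf{w} \rVert_2^2 + \alpha \lVert \mathbf{w} \rVert_1
```

Ridge has a closed-form solution and keeps all coefficients nonzero, whereas the $\ell_1$ penalty can set coefficients exactly to zero, which is why the latter lends itself to selecting a smaller descriptor.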

5.
Chem Rev ; 121(16): 9722-9758, 2021 Aug 25.
Article in English | MEDLINE | ID: mdl-33945269

ABSTRACT

Unsupervised learning is becoming an essential tool to analyze the increasingly large amounts of data produced by atomistic and molecular simulations, in material science, solid state physics, biophysics, and biochemistry. In this Review, we provide a comprehensive overview of the methods of unsupervised learning that have been most commonly used to investigate simulation data and indicate likely directions for further developments in the field. In particular, we discuss feature representation of molecular systems and present state-of-the-art algorithms for dimensionality reduction, density estimation, and clustering, as well as kinetic models. We divide our discussion into self-contained sections, each discussing a specific method. In each section, we briefly touch upon the mathematical and algorithmic foundations of the method, highlight its strengths and limitations, and describe the specific ways in which it has been used, or can be used, to analyze molecular simulation data.

6.
J Chem Phys ; 153(12): 124108, 2020 Sep 28.
Article in English | MEDLINE | ID: mdl-33003713

ABSTRACT

The recently introduced Gaussian Process State (GPS) provides a highly flexible, compact, and physically insightful representation of quantum many-body states based on ideas from the zoo of machine learning approaches. In this work, we give a comprehensive description of how such a state can be learned from given samples of a potentially unknown target state and show how regression approaches based on Bayesian inference can be used to compress a target state into a highly compact and accurate GPS representation. By application of a type II maximum likelihood method based on relevance vector machines, we are able to extract many-body configurations from the underlying Hilbert space, which are particularly relevant for the description of the target state, as support points to define the GPS. Together with an introduced optimization scheme for the hyperparameters of the model characterizing the weighting of modeled correlation features, this makes it possible to easily extract physical characteristics of the state such as the relative importance of particular correlation properties. We apply the Bayesian learning scheme to the problem of modeling ground states of small Fermi-Hubbard chains and show that the found solutions represent a systematically improvable trade-off between sparsity and accuracy of the model. Moreover, we show how the learned hyperparameters and the extracted relevant configurations, characterizing the correlation of the wave function, depend on the interaction strength of the Hubbard model and the target accuracy of the representation.

7.
J Chem Phys ; 148(24): 241739, 2018 Jun 28.
Article in English | MEDLINE | ID: mdl-29960375

ABSTRACT

We assess Gaussian process (GP) regression as a technique to model interatomic forces in metal nanoclusters by analyzing the performance of 2-body, 3-body, and many-body kernel functions on a set of 19-atom Ni cluster structures. We find that 2-body GP kernels fail to provide faithful force estimates, despite succeeding in bulk Ni systems. However, both 3- and many-body kernels predict forces within an ∼0.1 eV/Å average error even for small training datasets and achieve high accuracy even on out-of-sample, high temperature structures. While training and testing on the same structure always provide satisfactory accuracy, cross-testing on dissimilar structures leads to higher prediction errors, posing an extrapolation problem. This can be cured using heterogeneous training on databases that contain more than one structure, which results in a good trade-off between versatility and overall accuracy. Starting from a 3-body kernel trained this way, we build an efficient non-parametric 3-body force field that allows accurate prediction of structural properties at finite temperatures, following a newly developed scheme [A. Glielmo et al., Phys. Rev. B 95, 214302 (2017)]. We use this to assess the thermal stability of Ni19 nanoclusters at a fraction of the cost of full ab initio calculations.
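
As a reminder of what GP regression computes, the standard posterior mean for a scalar target is shown below. This is the generic textbook expression, not the force-field construction of the paper, which predicts vector-valued forces with 2-body, 3-body, and many-body kernels. The symbols are assumptions: $k$ is the kernel, $\mathbf{x}_i$ the training inputs, $\mathbf{y}$ the training targets, and $\sigma_n^2$ the noise variance.

```latex
\hat{f}(\mathbf{x}_*) = \mathbf{k}_*^\top \left( K + \sigma_n^2 I \right)^{-1} \mathbf{y},
\qquad
K_{ij} = k(\mathbf{x}_i, \mathbf{x}_j),
\quad
(\mathbf{k}_*)_i = k(\mathbf{x}_*, \mathbf{x}_i)
```

The choice of $k$ fixes which interactions the model can represent, which is consistent with the abstract's finding that a kernel restricted to 2-body terms cannot capture the forces in small clusters.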

8.
Phys Rev E ; 93(3): 032901, 2016 Mar.
Article in English | MEDLINE | ID: mdl-27078430

ABSTRACT

The coefficient of restitution may be determined from the sound signal emitted by a sphere bouncing repeatedly off the ground. Although a large number of publications exploit this method, so far there has been no quantitative discussion of the error related to this type of measurement. Analyzing the main error sources, we find that even tiny deviations of the shape from the perfect sphere may lead to substantial errors that dominate the overall error of the measurement. Therefore, we come to the conclusion that the well-established method to measure the coefficient of restitution through the emitted sound is applicable only for the case of nearly perfect spheres. For larger falling heights, air drag may also lead to considerable error.
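
The acoustic method rests on a simple relation, sketched below under idealized assumptions that the abstract itself calls into question: a perfect sphere, purely vertical motion, no rotation, and no air drag. In that case the flight time between impacts is proportional to the rebound speed, so the ratio of successive flight times equals the coefficient of restitution. The function names and the synthetic impact times are hypothetical.

```python
def restitution_from_impacts(impact_times):
    """Return one coefficient-of-restitution estimate per pair of consecutive flights."""
    # Flight durations between successive impact sounds.
    flights = [t1 - t0 for t0, t1 in zip(impact_times, impact_times[1:])]
    # Without drag, flight time = 2 * v / g, so the ratio of flights is the ratio of speeds.
    return [later / earlier for earlier, later in zip(flights, flights[1:])]


def synthetic_impacts(eps, first_flight, n_bounces):
    """Impact times for an ideal sphere with constant eps and no air drag."""
    times, t, flight = [0.0], 0.0, first_flight
    for _ in range(n_bounces):
        t += flight
        times.append(t)
        flight *= eps  # each flight is shorter by the factor eps
    return times


estimates = restitution_from_impacts(synthetic_impacts(0.8, 0.5, 6))
```

For the ideal signal every estimate recovers the input value; any shape imperfection or drag breaks the proportionality between flight time and rebound speed, which is the error source the abstract analyzes.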

9.
Phys Rev E Stat Nonlin Soft Matter Phys ; 90(5-1): 052204, 2014 Nov.
Article in English | MEDLINE | ID: mdl-25493788

ABSTRACT

We consider the motion of an aspherical inelastic particle of dumbbell type bouncing repeatedly on a horizontal flat surface. The coefficient of restitution of such a particle depends not only on material properties and impact velocity but also on the angular orientation at the instant of the collision, whose variance is considerable even for small eccentricity. Assuming random angular orientation of the particle at the instant of contact, we characterize the measured coefficient of restitution as a fluctuating quantity and obtain a wide probability density function, including a finite probability for negative values of the coefficient of restitution. This may be understood from the partial exchange of translational and rotational kinetic energy.
