Accelerating seminumerical Fock-exchange calculations using mixed single- and double-precision arithmetic.

Laqua, Henryk; Kussmann, Jörg; Ochsenfeld, Christian

Affiliation

Laqua H; Department of Chemistry, Chair of Theoretical Chemistry, University of Munich (LMU), D-81377 München, Germany.
Kussmann J; Department of Chemistry, Chair of Theoretical Chemistry, University of Munich (LMU), D-81377 München, Germany.
Ochsenfeld C; Department of Chemistry, Chair of Theoretical Chemistry, University of Munich (LMU), D-81377 München, Germany.

J Chem Phys; 154(21): 214116, 2021 Jun 07.

Article in En | MEDLINE | ID: mdl-34240990

ABSTRACT

We investigate the applicability of single-precision (fp32) floating-point operations within our linear-scaling, seminumerical exchange method sn-LinK [Laqua et al., J. Chem. Theory Comput. 16, 1456 (2020)] and find that the vast majority of the three-center-one-electron (3c1e) integrals can be computed with reduced numerical precision with virtually no loss in overall accuracy. This leads to a near doubling in performance on central processing units (CPUs) compared to pure fp64 evaluation. Since the cost of evaluating the 3c1e integrals is less significant on graphics processing units (GPUs) than on CPUs, the performance gains from accelerating the 3c1e integrals alone are less impressive on GPUs. Therefore, we also investigate the possibility of employing only fp32 operations to evaluate the exchange matrix within the self-consistent-field (SCF) procedure, followed by an accurate one-shot evaluation of the exchange energy using mixed fp32/fp64 precision. This still provides very accurate results (maximal error of 1.8 µEh) while providing a sevenfold speedup on a typical "gaming" GPU (GTX 1080Ti). We also propose the use of incremental exchange builds to further reduce these errors. The proposed SCF scheme (i-sn-LinK) requires only one mixed-precision exchange-matrix calculation, while all other exchange-matrix builds are performed with only fp32 operations. Compared to pure fp64 evaluation, this leads to 4-7× speedups for the whole SCF procedure without any significant deterioration of the results or the convergence behavior.
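
The idea behind incremental exchange builds can be illustrated with a small NumPy toy model. This is only a sketch under stated assumptions, not the sn-LinK or i-sn-LinK implementation: the random tensor `T`, the function names `build_k` and `run_demo`, the geometric convergence of the density, and the choice to place the single accurate build at the first iteration (with plain fp64 standing in for the mixed-precision build) are all hypothetical. It relies only on the fact that the exchange matrix is linear in the density matrix, so K[P_new] = K[P_old] + K[P_new - P_old], and the fp32 rounding error of each update scales with the shrinking density difference rather than with the full density.

```python
import numpy as np

def build_k(T, P, dtype):
    """Toy exchange-like contraction K[m,n] = sum_{l,s} T[m,l,n,s] * P[l,s].

    T is a random stand-in for the integrals (an assumption of this sketch);
    the contraction is carried out in `dtype` and returned as float64.
    """
    K = np.einsum("mlns,ls->mn", T.astype(dtype), P.astype(dtype))
    return K.astype(np.float64)

def run_demo(n=8, n_iter=6, seed=0):
    rng = np.random.default_rng(seed)
    T = rng.standard_normal((n, n, n, n))
    P_conv = rng.standard_normal((n, n))

    # Hypothetical SCF trajectory: the density approaches P_conv geometrically.
    D = 1e-2 * rng.standard_normal((n, n))
    densities = [P_conv + 0.1**k * D for k in range(n_iter)]

    # One accurate build (fp64 here, standing in for the mixed-precision build),
    # then fp32-only updates using the density differences.
    K_inc = build_k(T, densities[0], np.float64)
    for P_old, P_new in zip(densities, densities[1:]):
        K_inc += build_k(T, P_new - P_old, np.float32)

    # Compare against a direct fp32 build from the final density.
    K_ref = build_k(T, densities[-1], np.float64)
    K_direct = build_k(T, densities[-1], np.float32)
    err_direct = np.max(np.abs(K_direct - K_ref))
    err_incremental = np.max(np.abs(K_inc - K_ref))
    return err_direct, err_incremental

if __name__ == "__main__":
    err_direct, err_incremental = run_demo()
    print(f"direct fp32 error:      {err_direct:.3e}")
    print(f"incremental fp32 error: {err_incremental:.3e}")
```

In this toy setting the incremental error is expected to be smaller than the direct fp32 error because every fp32 contraction acts on a small difference density; the actual error behavior and timings reported in the abstract come from the authors' method and hardware, not from this model.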

Full text: 1 Database: MEDLINE Language: En Publication year: 2021 Document type: Article