10.5281/zenodo.4263826
https://zenodo.org/records/4263826
oai:zenodo.org:4263826
Cheng, H.
H.
Cheng
University of Luxembourg
Großschädl, J.
J.
Großschädl
University of Luxembourg
Tian, J.
J.
Tian
University of Luxembourg
Roenne, P.
P.
Roenne
University of Luxembourg
Ryan, P.
P.
Ryan
University of Luxembourg
High-Throughput Elliptic Curve Cryptography using AVX2 Vector Instructions
Zenodo
2020
Throughput-optimized cryptography
Curve25519
Single instruction multiple data (SIMD)
Advanced vector extensions (AVX2)
2020-11-09
eng
10.5281/zenodo.4263825
https://zenodo.org/communities/futuretpm-h2020
Creative Commons Attribution 4.0 International
Single-Instruction-Multiple-Data (SIMD) extensions like Intel's AVX2 offer great potential to accelerate elliptic curve cryptography compared to a straightforward implementation using only base x64 instructions. All existing AVX2 implementations of scalar multiplication on Curve25519 and alternative elliptic curves are optimized for low latency. We argue in this paper that many applications, most notably server-side TLS handshake processing, would benefit more from throughput-optimized implementations than from latency-optimized ones. To support this argument, we introduce throughput-optimized AVX2 implementations of variable-base scalar multiplication on Curve25519 and fixed-base scalar multiplication on Ed25519. Both implementations perform four scalar multiplications in parallel, whereby each scalar multiplication uses one 64-bit element of a 256-bit AVX2 vector. The field arithmetic is based on a radix-2^29 representation of the field elements, which makes it possible to execute four parallel multiplications modulo a multiple of p = 2^255 - 19 in just 88 Skylake cycles. Four variable-base scalar multiplications on Curve25519 require less than 250,000 Skylake cycles, which translates into a throughput of 32,318 scalar multiplications per second at a clock frequency of 2 GHz. For comparison, the currently best latency-optimized AVX2 implementation reaches a throughput of only about 21,000 scalar multiplications per second on the same Skylake processor.
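The four-way layout and the radix-2^29 arithmetic described in the abstract can be illustrated with a small Python model. This is only a sketch under stated assumptions, not the authors' AVX2 code: the limb count of 9 (the smallest number of 29-bit limbs that covers 255 bits) is an assumption, and the names `to_limbs`, `pack4`, `mul_limbs`, `mul4`, and `throughput` are hypothetical. Python's unbounded integers also hide the 64-bit lane overflow that a real implementation must manage.

```python
P = 2**255 - 19
RADIX_BITS = 29
NUM_LIMBS = 9                      # assumption: ceil(255 / 29) = 9 limbs
MASK = (1 << RADIX_BITS) - 1
# 2^(29*9) = 2^261 = 2^6 * 2^255, and 2^255 is congruent to 19 mod p,
# so 2^261 is congruent to 64 * 19 = 1216 mod p.
FOLD = 19 << (RADIX_BITS * NUM_LIMBS - 255)

def to_limbs(x):
    """Split an integer below 2^261 into nine 29-bit limbs, least significant first."""
    return [(x >> (RADIX_BITS * i)) & MASK for i in range(NUM_LIMBS)]

def from_limbs(limbs):
    """Recombine limbs (which may exceed 29 bits) into one integer."""
    return sum(l << (RADIX_BITS * i) for i, l in enumerate(limbs))

def pack4(values):
    """Limb-sliced layout: vector i holds limb i of four independent field elements."""
    limbs = [to_limbs(v) for v in values]
    return [tuple(l[i] for l in limbs) for i in range(NUM_LIMBS)]

def unpack4(vecs):
    """Inverse of pack4: recover the four integers, one per lane."""
    return [from_limbs([v[lane] for v in vecs]) for lane in range(4)]

def mul_limbs(a, b):
    """Schoolbook product with lazy reduction; the result is only congruent mod p."""
    cols = [0] * (2 * NUM_LIMBS - 1)
    for i in range(NUM_LIMBS):
        for j in range(NUM_LIMBS):
            cols[i + j] += a[i] * b[j]
    # Fold columns of weight 2^261 and above back into the low nine columns.
    low = cols[:NUM_LIMBS]
    for k in range(NUM_LIMBS, 2 * NUM_LIMBS - 1):
        low[k - NUM_LIMBS] += FOLD * cols[k]
    # One carry pass; the final carry has weight 2^261 and is folded into limb 0.
    out, carry = [], 0
    for c in low:
        c += carry
        out.append(c & MASK)
        carry = c >> RADIX_BITS
    out[0] += FOLD * carry
    return out

def mul4(x, y):
    """Model of a 4-way multiplication: the same limb schedule runs in every lane."""
    lanes = []
    for lane in range(4):
        a = [v[lane] for v in x]
        b = [v[lane] for v in y]
        lanes.append(mul_limbs(a, b))
    return [tuple(l[i] for l in lanes) for i in range(NUM_LIMBS)]

def throughput(cycles_per_batch, clock_hz, batch=4):
    """Operations per second when `batch` operations finish every `cycles_per_batch` cycles."""
    return batch * clock_hz / cycles_per_batch
```

With this model, `throughput(250_000, 2_000_000_000)` gives 32,000 operations per second, so the 32,318 figure in the abstract corresponds to a batch of four taking slightly fewer than 250,000 cycles, consistent with the "less than 250,000" wording.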
European Commission
00k4n6c32
779391
Future Proofing the Connected World: A Quantum-Resistant Trusted Platform Module