Conference paper Open Access

High-Throughput Elliptic Curve Cryptography using AVX2 Vector Instructions

Cheng, H.; Großschädl, J.; Tian, J.; Roenne, P.; Ryan, P.

Single-Instruction-Multiple-Data (SIMD) extensions like Intel's AVX2 offer a great potential to accelerate elliptic curve cryptography compared to a straightforward implementation using only base x64 instructions. All existing AVX2 implementations of scalar multiplication on Curve25519 and alternative elliptic curves are optimized for low latency. We argue in this paper that many applications, most notably server-side TLS handshake processing, would benefit more from throughput-optimized implementations than latency-optimized ones. To support this argument we introduce throughput-optimized AVX2 implementations of variable-base scalar multiplication on Curve25519 and fixed-base scalar multiplication on Ed25519. Both implementations perform four scalar multiplications in parallel, whereby each scalar multiplication uses a 64-bit element of a 256-bit AVX2 vector. The field arithmetic is based on a radix-2^29 representation of the field elements, which makes it possible to execute four parallel multiplications modulo a multiple of p = 2^255 - 19 in just 88 Skylake cycles. Four variable-base scalar multiplications on Curve25519 require less than 250,000 Skylake cycles, which translates into a throughput of 32,318 scalar multiplications per second at a clock frequency of 2 GHz. For comparison, the currently best latency-optimized AVX2 implementation reaches a throughput of only about 21,000 scalar multiplications per second on the same Skylake processor.
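
The radix-2^29 idea in the abstract can be illustrated with a small Python model. This is only a sketch under stated assumptions, not the paper's AVX2 code: it assumes nine 29-bit limbs per field element (9 * 29 = 261 bits) and reduces modulo 2^261 - 1216 = 64 * p, which is one possible "multiple of p". All function names (to_limbs, from_limbs, mul_limbs, mul_4way) are hypothetical, and Python's unbounded integers do not model the 64-bit lane limits that a real vector implementation has to respect.

```python
P = 2**255 - 19
RADIX_BITS = 29
NUM_LIMBS = 9                       # assumption: 9 * 29 = 261 bits >= 255
MASK = (1 << RADIX_BITS) - 1
# 2^261 = 2^6 * 2^255 is congruent to 2^6 * 19 = 1216 modulo p
FOLD = 19 << (RADIX_BITS * NUM_LIMBS - 255)


def to_limbs(x):
    """Split an integer below 2^261 into nine 29-bit limbs, least significant first."""
    return [(x >> (RADIX_BITS * i)) & MASK for i in range(NUM_LIMBS)]


def from_limbs(limbs):
    """Recombine limbs into a single integer."""
    return sum(limb << (RADIX_BITS * i) for i, limb in enumerate(limbs))


def mul_limbs(a, b):
    """Multiply two limb vectors modulo 2^261 - 1216, a multiple of p."""
    # Schoolbook product: 17 column sums of 58-bit partial products.
    cols = [0] * (2 * NUM_LIMBS - 1)
    for i in range(NUM_LIMBS):
        for j in range(NUM_LIMBS):
            cols[i + j] += a[i] * b[j]
    # Fold the upper columns: column k >= 9 has weight 2^261 * 2^(29 * (k - 9)).
    low = cols[:NUM_LIMBS]
    for k in range(NUM_LIMBS, 2 * NUM_LIMBS - 1):
        low[k - NUM_LIMBS] += FOLD * cols[k]
    # Propagate carries; a carry out of the top limb wraps around via FOLD.
    while True:
        carry = 0
        for i in range(NUM_LIMBS):
            t = low[i] + carry
            low[i] = t & MASK
            carry = t >> RADIX_BITS
        if carry == 0:
            return low
        low[0] += FOLD * carry


def mul_4way(xs, ys):
    """Model of four independent multiplications, one per 64-bit vector lane."""
    return [mul_limbs(x, y) for x, y in zip(xs, ys)]
```

The result is only reduced modulo 64 * p, so a final reduction modulo p (for example from_limbs(r) % P in this model) is still needed before comparing or serializing field elements.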

Files (463.0 kB)
Name: 53-High-Throughput-Elliptic-Curve-Cryptography-using-AVX2-Vector-Instructions.pdf
Size: 463.0 kB
md5: fdfae83387bf443043253c61fdea4d27