Running Manjaro now. Unlike Gentoo, and like Debian arm64, it defaults to poor SSL performance and shows the same 60% performance jump just from setting OPENSSL_armcap=0. This pointed to a key factor: OpenSSL 1.0.2s versus 1.1.1c.
Then I looked at a source file that is unique to 1.1.1 (vpaes-armv8.pl), and what it says partly aligns with ejolson's theory:
jdonald wrote (Sat Aug 17, 2019 8:59 pm):
ejolson wrote (Sat Aug 17, 2019 7:43 pm):
timing-related side-channel attacks. Therefore, it is possible that the current 64-bit ARM assembler in openssl works but was deemed unsafe and that is why it is currently disabled.
Is compiled code necessarily more resistant against timing attacks?
#              CBC enc     ECB enc/dec(*)   [bit-sliced enc/dec]
# Cortex-A53   21.5        18.1/20.6        [17.5/19.8 ]
# Cortex-A57   36.0(**)    20.4/24.9(**)    [14.4/16.6 ]
# ...
# (**) these results are worse than scalar compiler-generated
# code, but it's constant-time and therefore preferred;
(Note: This is actually somewhat the opposite of the earlier hypothesis: here it is the compiled code that is considered less safe, while the constant-time assembler path for VPAES_ASM is the slower one.)
As the comment says, they are deliberately sacrificing further 64-bit performance in order to make the code more resistant to side-channel attacks. At the same time, that is a whole additional 60% performance cost on a CPU-constrained system.
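A rough illustration of why compiled code can be the less safe option: generic table-driven AES typically indexes lookup tables with secret-dependent bytes, so which cache lines get touched can leak key material through timing. One textbook constant-time alternative, shown in the minimal Python sketch below, reads every table entry and keeps the wanted one with a mask. That costs more work but produces the same memory access pattern for every index. This is an illustration under stated assumptions, not OpenSSL's code: `leaky_lookup` and `ct_lookup` are hypothetical names, Python itself makes no timing guarantees, and vpaes gets its constant-time property a different way, using NEON vector permute instructions instead of secret-indexed tables.

```python
def leaky_lookup(table: list[int], secret_index: int) -> int:
    # Direct indexing: which memory location is read depends on the secret,
    # the property cache-timing attacks on table-driven AES exploit.
    return table[secret_index]


def ct_lookup(table: list[int], secret_index: int) -> int:
    # Constant-time-style lookup (illustrative only; Python gives no timing
    # guarantees). Works for byte indices 0..255 into a table of at most 256
    # byte values: every entry is read, and the wanted one is selected with a
    # mask rather than a branch or a secret-dependent index.
    if len(table) > 256 or not 0 <= secret_index < 256:
        raise ValueError("expects a byte index and a table of <= 256 entries")
    result = 0
    for i, value in enumerate(table):
        diff = i ^ secret_index          # 0 only when i == secret_index
        mask = ((diff - 1) >> 8) & 0xFF  # 0xFF when diff == 0, else 0x00
        result |= value & mask
    return result


if __name__ == "__main__":
    # Hypothetical byte table standing in for an S-box.
    sbox_like = [(i * 7 + 3) & 0xFF for i in range(256)]
    for idx in (0, 1, 42, 255):
        assert ct_lookup(sbox_like, idx) == leaky_lookup(sbox_like, idx)
    print("ct_lookup matches direct indexing")
```

The masked version does 256 reads where direct indexing does one, which is roughly the kind of trade the (**) comment describes: slower, but with an access pattern that does not depend on the secret.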
Gentoo avoids the performance loss at the expense of remaining theoretically more vulnerable. I doubt this was a conscious decision; it's more likely an artifact of sticking with OpenSSL 1.0.2 for other reasons.
Meanwhile, non-Gentoo 64-bit users are in the odd situation of being able to reduce overhead, at the cost of that constant-time protection, by running Chromium or Firefox with OPENSSL_armcap=0.
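A minimal Python sketch of that workaround, assuming you want the variable set only for the launched program rather than for the whole session. `armcap_env` and `run_with_armcap` are hypothetical helper names. OpenSSL reads OPENSSL_armcap when it initialises its ARM capability detection and uses the value in place of what it detected, so 0 turns off the NEON-based paths such as vpaes.

```python
import os
import subprocess


def armcap_env(value: str = "0") -> dict[str, str]:
    # Copy the current environment and override only OPENSSL_armcap.
    # OpenSSL uses this value instead of its detected ARM capabilities;
    # os.environ itself is left unchanged.
    env = dict(os.environ)
    env["OPENSSL_armcap"] = value
    return env


def run_with_armcap(cmd: list[str], value: str = "0") -> int:
    # Launch cmd as a child process with the overridden environment
    # and return its exit status.
    return subprocess.run(cmd, env=armcap_env(value)).returncode
```

For example, `run_with_armcap(["openssl", "speed", "-evp", "aes-128-cbc"])` followed by a plain `openssl speed -evp aes-128-cbc` run should make the difference between the two code paths visible. Passing a browser command instead works the same way, though the binary name (`chromium`, `chromium-browser`, `firefox`) depends on the distribution.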
Funny how such intricate consequences ultimately stemmed from the decision not to put ARMv8 cryptographic extensions in the Pi 4.