
Commit b41c1d8

Eric Biggers committed
fscrypt: Don't use problematic non-inline crypto engines

Make fscrypt no longer use Crypto API drivers for non-inline crypto engines, even when the Crypto API prioritizes them over CPU-based code (which unfortunately it often does). These drivers tend to be really problematic, especially for fscrypt's workload. This commit has no effect on inline crypto engines, which are different and do work well.

Specifically, exclude drivers that have CRYPTO_ALG_KERN_DRIVER_ONLY or CRYPTO_ALG_ALLOCATES_MEMORY set. (Later, CRYPTO_ALG_ASYNC should be excluded too. That's omitted for now to keep this commit backportable, since until recently some CPU-based code had CRYPTO_ALG_ASYNC set.)

There are two major issues with these drivers: bugs and performance.

First, these drivers tend to be buggy. They're fundamentally much more error-prone and harder to test than the CPU-based code. They often don't get tested before kernel releases, and even if they do, the crypto self-tests don't properly test these drivers. Released drivers have en/decrypted or hashed data incorrectly. These bugs cause issues for fscrypt users who often didn't even want to use these drivers, e.g.:

- google/fscryptctl#32
- google/fscryptctl#9
- https://lore.kernel.org/r/PH0PR02MB731916ECDB6C613665863B6CFFAA2@PH0PR02MB7319.namprd02.prod.outlook.com

These drivers have also similarly caused issues for dm-crypt users, including data corruption and deadlocks. Since Linux v5.10, dm-crypt has disabled most of them by excluding CRYPTO_ALG_ALLOCATES_MEMORY.

Second, these drivers tend to be *much* slower than the CPU-based code. This may seem counterintuitive, but benchmarks clearly show it. There's a *lot* of overhead associated with going to a hardware driver, off the CPU, and back again.
To prove this, I gathered as many systems with this type of crypto engine as I could, and I measured synchronous encryption of 4096-byte messages (which matches fscrypt's workload):

    Intel Emerald Rapids server:
       AES-256-XTS:
          xts-aes-vaes-avx512   16171 MB/s  [CPU-based, Vector AES]
          qat_aes_xts             289 MB/s  [Offload, Intel QuickAssist]

    Qualcomm SM8650 HDK:
       AES-256-XTS:
          xts-aes-ce             4301 MB/s  [CPU-based, ARMv8 Crypto Extensions]
          xts-aes-qce              73 MB/s  [Offload, Qualcomm Crypto Engine]

    i.MX 8M Nano LPDDR4 EVK:
       AES-256-XTS:
          xts-aes-ce              647 MB/s  [CPU-based, ARMv8 Crypto Extensions]
          xts(ecb-aes-caam)        20 MB/s  [Offload, CAAM]
       AES-128-CBC-ESSIV:
          essiv(cbc-aes-caam,sha256-lib)
                                   23 MB/s  [Offload, CAAM]

    STM32MP157F-DK2:
       AES-256-XTS:
          xts-aes-neonbs         13.2 MB/s  [CPU-based, ARM NEON]
          xts(stm32-ecb-aes)      3.1 MB/s  [Offload, STM32 crypto engine]
       AES-128-CBC-ESSIV:
          essiv(cbc-aes-neonbs,sha256-lib)
                                 14.7 MB/s  [CPU-based, ARM NEON]
          essiv(stm32-cbc-aes,sha256-lib)
                                  3.2 MB/s  [Offload, STM32 crypto engine]
       Adiantum:
          adiantum(xchacha12-arm,aes-arm,nhpoly1305-neon)
                                 52.8 MB/s  [CPU-based, ARM scalar + NEON]

So, there was no case in which the crypto engine was even *close* to being faster. On the first three, which have AES instructions in the CPU, the CPU was roughly 30 to 60 times faster (!). Even on the STM32MP157F-DK2, which has a Cortex-A7 CPU that doesn't have AES instructions, AES was over 4 times faster on the CPU. And Adiantum encryption, which is what actually should be used on CPUs like that, was over 17 times faster.

Other justifications that have been given for these non-inline crypto engines (almost always coming from the hardware vendors, not actual users) don't seem very plausible either:

- The crypto engine throughput could be improved by processing multiple requests concurrently. Currently irrelevant to fscrypt, since it doesn't do that. This would also be complex, and unhelpful in many cases. 2 of the 4 engines I tested even had only one queue.
- Some of the engines, e.g. STM32, support hardware keys. Also currently irrelevant to fscrypt, since it doesn't support these. Interestingly, the STM32 driver itself doesn't support this either.
- Free up CPU for other tasks and/or reduce energy usage. Not very plausible considering the "short" message length, driver overhead, and scheduling overhead. There's just very little time for the CPU to do something else like run another task or enter low-power state, before the message finishes and it's time to process the next one.
- Some of these engines resist power analysis and electromagnetic attacks, while the CPU-based crypto generally does not. In theory, this sounds great. In practice, if this benefit requires the use of an off-CPU offload that massively regresses performance and has a low-quality, buggy driver, the price for this hardening (which is not relevant to most fscrypt users, and tends to be incomplete) is just too high. Inline crypto engines are much more promising here, as are on-CPU solutions like RISC-V High Assurance Cryptography.

Fixes: b30ab0e ("ext4 crypto: add ext4 encryption facilities")
Cc: stable@vger.kernel.org
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20250704070322.20692-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
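The speed ratios quoted in the commit message can be recomputed from the benchmark table. The sketch below divides each CPU-based throughput by the matching offload throughput; the numbers are copied from the table, while the `bench_pair` struct and the `speedup()` helper are illustrative names, not kernel code. It gives about 56x, 59x, and 32x on the three systems with AES instructions, about 4.3x (XTS) and 4.6x (CBC-ESSIV) for AES on the STM32MP157F-DK2, and about 17x for Adiantum compared with the STM32 XTS offload.

```c
#include <assert.h>
#include <stddef.h>

/* One comparison from the benchmark table (throughput in MB/s). */
struct bench_pair {
	const char *what;
	double cpu_mbps;     /* CPU-based implementation */
	double offload_mbps; /* non-inline crypto engine */
};

const struct bench_pair bench[] = {
	{ "Emerald Rapids AES-256-XTS", 16171.0, 289.0 },
	{ "SM8650 AES-256-XTS", 4301.0, 73.0 },
	{ "i.MX 8M Nano AES-256-XTS", 647.0, 20.0 },
	{ "STM32MP157F AES-256-XTS", 13.2, 3.1 },
	{ "STM32MP157F AES-128-CBC-ESSIV", 14.7, 3.2 },
	/* Adiantum on the CPU vs. the STM32 AES-256-XTS offload. */
	{ "STM32MP157F Adiantum vs. XTS offload", 52.8, 3.1 },
};
const size_t bench_count = sizeof(bench) / sizeof(bench[0]);

/* How many times faster the CPU-based path is than the offload path. */
double speedup(const struct bench_pair *p)
{
	return p->cpu_mbps / p->offload_mbps;
}
```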
1 parent 66271c1 commit b41c1d8

5 files changed

Lines changed: 37 additions & 25 deletions


Documentation/filesystems/fscrypt.rst

Lines changed: 15 additions & 22 deletions
    @@ -147,9 +147,8 @@ However, these ioctls have some limitations:
         were wiped. To partially solve this, you can add init_on_free=1 to
         your kernel command line. However, this has a performance cost.
     
    -- Secret keys might still exist in CPU registers, in crypto
    -  accelerator hardware (if used by the crypto API to implement any of
    -  the algorithms), or in other places not explicitly considered here.
    +- Secret keys might still exist in CPU registers or in other places
    +  not explicitly considered here.
     
     Full system compromise
     ~~~~~~~~~~~~~~~~~~~~~~
    @@ -406,9 +405,12 @@ the work is done by XChaCha12, which is much faster than AES when AES
     acceleration is unavailable. For more information about Adiantum, see
     `the Adiantum paper <https://eprint.iacr.org/2018/720.pdf>`_.
     
    -The (AES-128-CBC-ESSIV, AES-128-CBC-CTS) pair exists only to support
    -systems whose only form of AES acceleration is an off-CPU crypto
    -accelerator such as CAAM or CESA that does not support XTS.
    +The (AES-128-CBC-ESSIV, AES-128-CBC-CTS) pair was added to try to
    +provide a more efficient option for systems that lack AES instructions
    +in the CPU but do have a non-inline crypto engine such as CAAM or CESA
    +that supports AES-CBC (and not AES-XTS). This is deprecated. It has
    +been shown that just doing AES on the CPU is actually faster.
    +Moreover, Adiantum is faster still and is recommended on such systems.
     
     The remaining mode pairs are the "national pride ciphers":
     
    @@ -1318,22 +1320,13 @@ this by validating all top-level encryption policies prior to access.
     Inline encryption support
     =========================
     
    -By default, fscrypt uses the kernel crypto API for all cryptographic
    -operations (other than HKDF, which fscrypt partially implements
    -itself). The kernel crypto API supports hardware crypto accelerators,
    -but only ones that work in the traditional way where all inputs and
    -outputs (e.g. plaintexts and ciphertexts) are in memory. fscrypt can
    -take advantage of such hardware, but the traditional acceleration
    -model isn't particularly efficient and fscrypt hasn't been optimized
    -for it.
    -
    -Instead, many newer systems (especially mobile SoCs) have *inline
    -encryption hardware* that can encrypt/decrypt data while it is on its
    -way to/from the storage device. Linux supports inline encryption
    -through a set of extensions to the block layer called *blk-crypto*.
    -blk-crypto allows filesystems to attach encryption contexts to bios
    -(I/O requests) to specify how the data will be encrypted or decrypted
    -in-line. For more information about blk-crypto, see
    +Many newer systems (especially mobile SoCs) have *inline encryption
    +hardware* that can encrypt/decrypt data while it is on its way to/from
    +the storage device. Linux supports inline encryption through a set of
    +extensions to the block layer called *blk-crypto*. blk-crypto allows
    +filesystems to attach encryption contexts to bios (I/O requests) to
    +specify how the data will be encrypted or decrypted in-line. For more
    +information about blk-crypto, see
     :ref:`Documentation/block/inline-encryption.rst <inline_encryption>`.
     
     On supported filesystems (currently ext4 and f2fs), fscrypt can use

fs/crypto/fscrypt_private.h

Lines changed: 17 additions & 0 deletions
    @@ -45,6 +45,23 @@
      */
     
     #undef FSCRYPT_MAX_KEY_SIZE
     
    +/*
    + * This mask is passed as the third argument to the crypto_alloc_*() functions
    + * to prevent fscrypt from using the Crypto API drivers for non-inline crypto
    + * engines. Those drivers have been problematic for fscrypt. fscrypt users
    + * have reported hangs and even incorrect en/decryption with these drivers.
    + * Since going to the driver, off CPU, and back again is really slow, such
    + * drivers can be over 50 times slower than the CPU-based code for fscrypt's
    + * workload. Even on platforms that lack AES instructions on the CPU, using the
    + * offloads has been shown to be slower, even staying with AES. (Of course,
    + * Adiantum is faster still, and is the recommended option on such platforms...)
    + *
    + * Note that fscrypt also supports inline crypto engines. Those don't use the
    + * Crypto API and work much better than the old-style (non-inline) engines.
    + */
    +#define FSCRYPT_CRYPTOAPI_MASK \
    +	(CRYPTO_ALG_ALLOCATES_MEMORY | CRYPTO_ALG_KERN_DRIVER_ONLY)
    +
     #define FSCRYPT_CONTEXT_V1 1
     #define FSCRYPT_CONTEXT_V2 2
     
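To make the effect of the mask concrete, here is a small userspace model of how a (type, mask) pair filters candidate implementations. The selection rule (skip any candidate for which `(cra_flags ^ type) & mask` is nonzero, otherwise take the highest `cra_priority`) follows the check used in the Crypto API's lookup code, but the model ignores driver-name matching, template instantiation, and module loading. The flag values are believed to match `include/linux/crypto.h` and are redefined only so the sketch compiles; `struct demo_alg`, `demo_lookup()`, and the registry entries (including the `xts-aes-offload` and `cbc-aes-offload` drivers and all priorities) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed to match include/linux/crypto.h; redefined here for userspace. */
#define CRYPTO_ALG_KERN_DRIVER_ONLY 0x00001000u
#define CRYPTO_ALG_ALLOCATES_MEMORY 0x00010000u

/* Same mask as the commit adds to fs/crypto/fscrypt_private.h. */
#define FSCRYPT_CRYPTOAPI_MASK \
	(CRYPTO_ALG_ALLOCATES_MEMORY | CRYPTO_ALG_KERN_DRIVER_ONLY)

/* Hypothetical, heavily simplified stand-in for struct crypto_alg. */
struct demo_alg {
	const char *cra_name;        /* generic algorithm name */
	const char *cra_driver_name; /* name of this implementation */
	int cra_priority;            /* higher wins among eligible candidates */
	uint32_t cra_flags;
};

/* Hypothetical registry: the offload drivers and priorities are made up. */
const struct demo_alg demo_algs[] = {
	{ "xts(aes)", "xts-aes-ce", 200, 0 },
	{ "xts(aes)", "xts-aes-offload", 400,
	  CRYPTO_ALG_KERN_DRIVER_ONLY | CRYPTO_ALG_ALLOCATES_MEMORY },
	{ "cbc(aes)", "cbc-aes-offload", 300, CRYPTO_ALG_KERN_DRIVER_ONLY },
};
const size_t demo_nalgs = sizeof(demo_algs) / sizeof(demo_algs[0]);

/*
 * Return the highest-priority implementation of @name whose flags agree
 * with @type on every bit selected by @mask, or NULL if none qualifies.
 */
const struct demo_alg *demo_lookup(const char *name, uint32_t type,
				   uint32_t mask)
{
	const struct demo_alg *best = NULL;

	for (size_t i = 0; i < demo_nalgs; i++) {
		const struct demo_alg *a = &demo_algs[i];

		/* Skip candidates whose masked flag bits differ from @type. */
		if ((a->cra_flags ^ type) & mask)
			continue;
		if (strcmp(a->cra_name, name) != 0)
			continue;
		if (!best || a->cra_priority > best->cra_priority)
			best = a;
	}
	return best;
}
```

With a mask of 0, the higher-priority offload entry wins for `xts(aes)`; with `FSCRYPT_CRYPTOAPI_MASK` it is skipped and the CPU-based entry is chosen, and a name served only by a masked-out driver yields no match in this model. The real lookup can additionally try to build the algorithm from templates, which this sketch does not attempt.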
fs/crypto/hkdf.c

Lines changed: 1 addition & 1 deletion
    @@ -58,7 +58,7 @@ int fscrypt_init_hkdf(struct fscrypt_hkdf *hkdf, const u8 *master_key,
     	u8 prk[HKDF_HASHLEN];
     	int err;
     
    -	hmac_tfm = crypto_alloc_shash(HKDF_HMAC_ALG, 0, 0);
    +	hmac_tfm = crypto_alloc_shash(HKDF_HMAC_ALG, 0, FSCRYPT_CRYPTOAPI_MASK);
     	if (IS_ERR(hmac_tfm)) {
     		fscrypt_err(NULL, "Error allocating " HKDF_HMAC_ALG ": %ld",
     			    PTR_ERR(hmac_tfm));

fs/crypto/keysetup.c

Lines changed: 2 additions & 1 deletion
    @@ -104,7 +104,8 @@ fscrypt_allocate_skcipher(struct fscrypt_mode *mode, const u8 *raw_key,
     	struct crypto_skcipher *tfm;
     	int err;
     
    -	tfm = crypto_alloc_skcipher(mode->cipher_str, 0, 0);
    +	tfm = crypto_alloc_skcipher(mode->cipher_str, 0,
    +				    FSCRYPT_CRYPTOAPI_MASK);
     	if (IS_ERR(tfm)) {
     		if (PTR_ERR(tfm) == -ENOENT) {
     			fscrypt_warn(inode,
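The error path around these allocations (for example the `PTR_ERR(tfm) == -ENOENT` check in `fscrypt_allocate_skcipher()`) relies on the kernel's `ERR_PTR` convention, in which a pointer-returning function encodes a negative errno in the last `MAX_ERRNO` addresses. The sketch below is a userspace re-creation of those helpers from `include/linux/err.h`, plus a hypothetical `demo_alloc()` standing in for `crypto_alloc_skcipher()` and a hypothetical `demo_setup()` with a simplified warning message; it illustrates the idiom, not fscrypt's actual code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Userspace re-creation of the helpers in the kernel's include/linux/err.h. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error)
{
	/* A negative errno becomes an address near the top of the address space. */
	return (void *)(intptr_t)error;
}

static inline long PTR_ERR(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

static inline bool IS_ERR(const void *ptr)
{
	/* Only the last MAX_ERRNO addresses are treated as error codes. */
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical object standing in for struct crypto_skcipher. */
struct demo_tfm {
	const char *driver_name;
};

/* Hypothetical allocator: fails with -ENOENT when no implementation is usable. */
struct demo_tfm *demo_alloc(bool have_impl)
{
	static struct demo_tfm tfm = { "xts-aes-ce" };

	if (!have_impl)
		return ERR_PTR(-ENOENT);
	return &tfm;
}

/* Follows the shape of the error handling shown in the hunk above. */
int demo_setup(bool have_impl)
{
	struct demo_tfm *tfm = demo_alloc(have_impl);

	if (IS_ERR(tfm)) {
		if (PTR_ERR(tfm) == -ENOENT)
			fprintf(stderr, "Missing crypto API support\n");
		return (int)PTR_ERR(tfm);
	}
	return 0;
}
```

Returning the error inside the pointer lets one return value carry either a usable object or a reason for failure, which is why callers test `IS_ERR()` before dereferencing.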

fs/crypto/keysetup_v1.c

Lines changed: 2 additions & 1 deletion
    @@ -52,7 +52,8 @@ static int derive_key_aes(const u8 *master_key,
     	struct skcipher_request *req = NULL;
     	DECLARE_CRYPTO_WAIT(wait);
     	struct scatterlist src_sg, dst_sg;
    -	struct crypto_skcipher *tfm = crypto_alloc_skcipher("ecb(aes)", 0, 0);
    +	struct crypto_skcipher *tfm =
    +		crypto_alloc_skcipher("ecb(aes)", 0, FSCRYPT_CRYPTOAPI_MASK);
     
     	if (IS_ERR(tfm)) {
     		res = PTR_ERR(tfm);
