What to do about memory
, part 2: Secure memory erasure
Aside from the ByteArray
classes, there is another feature that I wish to extract from memory
, and that is the ScrubbedBytes
, or rather, the functionality that backs it.
What is ScrubbedBytes
?
I got into it a bit last post, briefly covering ScrubbedBytes
as one of the occasioanlly used features from memory
.
To quote the documentation:
ScrubbedBytes is a memory chunk which have the properties of:
- Being scrubbed after its goes out of scope.
- A Show instance that doesn't actually show any content
- A Eq instance that is constant time
Data types like these are used to avoid accidentally leaking sensitive data and / or personally identifiable information through means such as logs and core dump, and any production team worth their salt has a process set up to handle this (usually accompanied by / piped straight into analytics).
Breaking it down
- The memory is scrubbed after it goes out of scope.
When all references to a ScrubbedBytes
instance are gone, the finalizer runs, and the data will be explicitly zeroed out. This is great! Kind of…
But what does it actually do? Well, if we peek into the source code of newScrubbedBytes
, we see that ScrubbedBytes
is just an aligned, pinned byte array with a finalizer. The finalizer contains a dynamically-generated scrubber function, which eventually calls memset
on the memory region.
ScrubbedBytes.hs
:
newScrubbedBytes :: Int -> IO ScrubbedBytes
newScrubbedBytes (I# sz)
-- Elided zero-size cases
| otherwise = IO $ \s ->
case newAlignedPinnedByteArray# sz 8# s of
(# s1, mbarr #) ->
let !scrubber = getScrubber (byteArrayContents# (unsafeCoerce# mbarr))
!mba = ScrubbedBytes mbarr
in case mkWeak# mbarr () (finalize scrubber mba) s1 of
(# s2, _ #) -> (# s2, mba #)
where
getScrubber :: Addr# -> State# RealWorld -> State# RealWorld
getScrubber addr s =
let IO scrubBytes = memSet (Ptr addr) 0 (I# sz)
in case scrubBytes s of
(# s', _ #) -> s'
finalize :: (State# RealWorld -> State# RealWorld) -> ScrubbedBytes -> State# RealWorld -> (# State# RealWorld, () #)
finalize scrubber mba@(ScrubbedBytes _) = \s1 ->
case scrubber s1 of
s2 -> case touch# mba s2 of
s3 -> (# s3, () #)
PtrMethods.hs
:
memSet :: Ptr Word8 -> Word8 -> Int -> IO ()
memSet start v n = c_memset start v (fromIntegral n) >>= \_ -> return ()
foreign import ccall unsafe "memset"
c_memset :: Ptr Word8 -> Word8 -> CSize -> IO ()
This attempts to avoid leaking sensitive data in core dumps and memory (I believe it is attempting to use the crossing of FFI / module boundaries in between the memset and free to ensure it is not elided), and is functionality that needs to be preserved, but it’s just calling memset
from the C standard library, and it isn’t really exposed in a manner that I am comfortable with, either. In the absence of another mechanism, relying on GHC triggering GC to clean up sensitive memory is just begging for trouble.
Also, this is all done in an attempt to prevent the compiler from eliding the memset
because it occurs immediately before a free
. Many compilers will automatically remove any writes to a memory region that is about to be freed, which has been a significant thorn in the side of computer security, and has caused countless leaks - there are better options than memset
, more on that later.
I would prefer a bracketed withScrubbedBytes
-style function over relying on a data type with finalizers, but it is really the secure / guaranteed erasure capability that we are after, and using it in a finalizer is better than nothing.
Notably, this is something that cannot be handled in Haskell - it requires dropping down to C
.
- A Show instance that doesn’t actually show any content
When we print a value, we don’t want to show sensitive data or PII, so instead we sanitize it. It doesn’t stop you from directly accessing the raw contents and intentionally printing them, but it can save you from accidental exposure, eg if a debug statement makes it into production
This isn’t really specific to any data type or function or even language. Any language with a Show
or Printable
can override their debug / print / log instances to avoid any sensitive data from being exposed. Thus, preserving this functionality doesn’t require any special effort.
Perhaps instead of requiring the logic in Show
, it could be codified in a typeclass SecureShow
that provides a printing formatter that automatically elides secure sub-values via Generics or something. However, that requires then that there be a secure logging mechanism that utilizes it, and is a bit far out for the moment.
- A Eq instance that is constant time
When we compare sensitive data for equality, the goal is to report whether or not they are equal, and nothing more.
This requires special handling, because if you short-circuit the comparison, how long it takes to compare them may leak information - eg, if we stop comparing at the first byte because they are different, you know something about where in the data the difference occurs, and thus the implementation leaks more information than just ‘is equal or not’.
This is not especially problematic, as we simply need ensure that the entire data is compared before returning the result.
Additionally, constEq
is already part of the ByteArray
classes, and can be preserved from there.
Replacing memset
Now, with our properties in mind, we can really focus on the big gotcha, which is replacing use of memset
. But why? First, some history:
In the beginning, the Universe C standard was created without any concern for data security. This has made a lot of programs very insecure and been widely regarded as a bad move.
Compilers back then were not nearly as smart, and CPU design was much simpler; when you issued a memset
call, you could expect that that memory region would be set. Then compilers got smarter, and started optimizing. One of those optimizations is to discard any writes to memory that is about to be freed, and another optimization is to leave the content of malloc
uninitialized. You may see where this is going - all of a sudden, data that had been cleared out using memset
was reappearing in newly malloc
ed data. This is not a problem if calloc
is used, but it is slower, and this issue retroactively affected older code, and we’ve been paying for it ever since.
There have since been developed three general different ways of handling secure memory erasure:
- Adding a secure memset variant to the C standard, which is guaranteed not to be elided
- memset (C89)
- memset_s (C11)
- memset_explicit (C23)
memset
is the original function we are trying to replace. C11 added the memset_s
function for secure erasure.
memset_s()
can not be elided: K.3.7.4.1: Unlike memset
, any call to the memset_s
function shall be evaluated strictly according to the rules of the abstract machine as described in (5.1.2.3). That is, any call to the memset_s
function shall assume that the memory indicated by s
and n
may be accessible in the future and thus must contain the values indicated by c
.
Unfortunately, it was both improperly specified, and added as a part of Annex K, which was not a required part of the standard, and thus not widely implemented.
As a replacement for memset_s
, C23 specifies the function memset_explicit
, which should be preferred in new code. However, C23 is not yet finalized, and thus cannot be assumed to be available yet.
Thus, we should prefer to call memset_explicit
if at all possible. I’m not sure whether we should even try to call memset_s
, more investigation is needed.
- Calling an OS-specific secure memset variant
- explicit_memset (NetBSD, Oracle)
- RtlSecureZeroMemory (Windows)
- explicit_bzero (FreeBSD, OpenBSD, Linux)
- memzero_explicit (Linux)
Our secondary preference is to call an OS-specific function explicitly intended for securely erasing memory. There are a plethora of these, and we can select which one at compile-time by detecting the operating system.
We should prefer to call these if and only if memset_explicit
is not available.
- Using volatile function or memory pointers to avoid being optimized out
- calling memset on the memory region through a volatile function pointer
- writing to the memory region through a volatile pointer
- writing an extern value to the memory region
We should fall back on these if and only if no other solution is available. Notably, MacOS / clang
uses this fallback in botan
, with an accompanying note.
- Increasingly wild alternative methods
- using
mmap + MAP_ANONYMOUS + MAP_PRIVATE
- using a function from a widely-used library (eg OpenSSL, libsodium, botan) which does 1-3 for you
These can’t be relied upon, for various reasons. Also, mmap
is POSIX and MAP_ANONYMOUS
was just added in POSIX 8.
Doing it properly vs doing what we can
Doing it properly requires deeper support than just C- or ASM-level. Modern CPUs are incredibly complex, with multiple levels of cache to handle, and in some ways, the problem has just moved inside of the CPUs, where the internal microarchitecture may still JIT discard / optimize instructions out of the machine code unless the proper instructions are used. The real solution is for the OS to keep keys / sensitive data / PII inside of a secure enclave, but that’s OS & hardware level, and “Dont ever load it in local memory” is not a solution that is always available.
We can’t handle that here, so the methods described are what we can do at the language level - we do the best to provide a solution for when we DO have sensitive information in local memory.
secure-memory-erase
To that end, I’ve plucked out and stripped down the secure-memory-erase
code from botan C++
, and have been looking at / comparing it to similar solutions in other libraries like libsodium
. Right now it is a single C/C++
header + source file, which exposes a single C function, which is in turn foreign imported by an hsc
file.
secure_memory_erase.h
:
#pragma once
#include <stddef.h>
#include <stdint.h>
#ifdef __cplusplus
extern "C" {
#endif
void secure_memory_erase(void * mem, size_t bytes);
#ifdef __cplusplus
} // extern C
#endif
secure_memory_erase.cpp
:
#include "secure_memory_erase.h"
#include <stdio.h>
#if defined(SME_TARGET_OS_HAS_RTLSECUREZEROMEMORY)
// Windows: void RtlSecureZeroMemory(void * ptr, size_t n);
// https://learn.microsoft.com/en-us/windows-hardware/drivers/ddi/wdm/nf-wdm-rtlsecurezeromemory
#define NOMINMAX 1
#define _WINSOCKAPI_ // stop windows.h including winsock.h
#define WIN32_LEAN_AND_MEAN // stop windows.h including lots of things
#include <windows.h>
#undef NOMINMAX
#undef _WINSOCKAPI_
#undef WIN32_LEAN_AND_MEAN
#elif defined(SME_TARGET_OS_HAS_EXPLICIT_BZERO)
// Linux, FreeBSD, OpenBSD: void explicit_bzero(void *b, size_t len);
// https://man.freebsd.org/cgi/man.cgi?query=explicit_bzero
#include <string.h>
#elif defined(SME_TARGET_OS_HAS_EXPLICIT_MEMSET)
// NetBSD: void * explicit_memset(void *b, int c, size_t len);
// https://man.netbsd.org/NetBSD-8.0/explicit_memset.3
// NOTE: C23 is memset_explicit, *NOT* NetBSD explicit_memset
#include <string.h>
#endif
namespace Secure::Memory::Erase {
void erase(void * ptr, size_t n) {
// TODO: If C23, just call memset_explicit like libsodium
// TODO: If C11 and supported, maybe call memset_s
#if defined(SME_TARGET_OS_HAS_RTLSECUREZEROMEMORY)
// Windows
::RtlSecureZeroMemory(ptr, n);
#elif defined(SME_TARGET_OS_HAS_EXPLICIT_BZERO)
// FreeBSD, OpenBSD, Linux
::explicit_bzero(ptr, n);
#elif defined(SME_TARGET_OS_HAS_EXPLICIT_MEMSET)
// NetBSD
(void)::explicit_memset(ptr, 0, n);
#elif defined(SME_USE_VOLATILE_MEMSET_FOR_ZERO) && (SME_USE_VOLATILE_MEMSET_FOR_ZERO == 1)
// MacOS
/*
Call memset through a static volatile pointer, which the compiler
should not elide.
*/
static void* (*const volatile memset_ptr)(void*, int, size_t) = std::memset;
(memset_ptr)(ptr, 0, n);
#else
// MacOS
/*
Cast to a volatile pointer and set each byte in a loop, which the compiler
should not elide.
*/
volatile uint8_t* p = reinterpret_cast<volatile uint8_t*>(ptr);
for(size_t i = 0; i != n; ++i) {
p[i] = 0;
}
#endif
} // erase
} // Secure::Memory::Erase
extern "C" {
void secure_memory_erase(void * ptr, size_t n) {
Secure::Memory::Erase::erase(ptr,n);
}
} // extern C
Secure/Memory/Erase.hsc
:
{-# LANGUAGE CApiFFI #-}
module Secure.Memory.Erase where
import Prelude
import Foreign.C.Types
import Foreign.Ptr
#include "secure_memory_erase.h"
foreign import capi safe "secure_memory_erase.h secure_memory_erase"
secureMemoryErase
:: Ptr a -- ^ mem
-> CSize -- ^ bytes
-> IO ()
Which solution actually gets used depends on the cabal os
flags - botan C++
uses a configure
script step to detect the OS on install, correlating it with an OS-dependent capabilities
list that gets turned into a set of CPP #ifdef
flags. We do the same, using the cabal os
instead:
if os(windows)
cxx-options: -DSME_TARGET_OS_HAS_RTLSECUREZEROMEMORY
if os(linux) || os(freebsd) || os(openbsd)
cxx-options: -DSME_TARGET_OS_HAS_EXPLICIT_BZERO
if os(netbsd)
cxx-options: -DSME_TARGET_OS_HAS_EXPLICIT_MEMSET
It is not properly tested, however, and we could potentially support a wider / more fine-grained range of operating systems (botan C++
includes a rather extensive list).
I’d potentially like to ditch the C++ in favor of pure C, as well as adding a flag to use botan C++
as a dependency instead of the included source, but so far, it appears to be working.
Ideally, there should be also a Scrubbed
or Secure
typeclass that requires a secure erasure function, thus making the actual solution more of a choice, and then we’d simply be providing a sensible default secureMemoryErase
function without getting in the way of other solutions. It may be a little convoluted, but this would allow botan
to depend on secure-memory-erase
while still using its own functions internally.
Between this and the bytearray-classes
that I’ve been sussing out, I think I have separated out and established the critical functionality that should make an update to botan-low
& botan
actually meaningful, but also more broadly applicable.
Hey everyone, it’s good to be updating again. I’ve been away for health reasons - I’m doing rather well lately but I very consciously have to take it easy in order to maintain that. I may be a bit quiet or slow to respond, but I’m here, and I’ll be taking my time as to make sure I don’t deplete myself again.