Hey Tom,
Thank you for your response. You've made some great points. My response is inline.
On Mon, Aug 19, 2019 at 04:09:36PM +0000, Tom Ritter wrote:
Okay I'm going to try and clear up a lot of misconceptions and stuff here. I don't own Firefox's memory allocator but I have worked in it, recently, and am one of the people who are working on hardening it.
Firefox's memory allocator is not jemalloc. It's probably better referred to as mozjemalloc. We forked jemalloc and have been improving it (at least from our perspective.) Any analysis of or comparison to jemalloc is - at this point - outdated and should be redone from scratch against mozjemalloc on mozilla-central.
LD_PRELOAD='/path/to/libhardened_malloc.so' /path/to/program will do nothing or approximately nothing. mozjemalloc uses mmap and low-level allocation tools to create chunks of memory to be used by its internal memory allocator. To successfully replace Firefox's memory allocator you should either use LD_PRELOAD _with_ a --disable-jemalloc build OR Firefox's replace_malloc functionality: https://searchfox.org/mozilla-central/source/memory/build/replace_malloc.h
Completely agreed. Using LD_PRELOAD to hook into the allocator is improper anyway, since it won't catch early uses of the allocator. And, as you mention, it wouldn't even work with Firefox given mozjemalloc. Firefox is not the only application that wants control over its allocator.
The only way to guarantee catching early allocator use is to switch the system's allocator (i.e., libc itself) to the new one. Otherwise, the application will end up using two allocator implementations: the application's custom one and the system's, which is included and used within libc (and other system libraries, of course).
Fingerprinting: It is most likely possible to be creative enough to fingerprint what memory allocator is used. If we were to choose from different allocators at runtime, I don't think that fingerprinting is the worst thing open to us - it seems likely that any attacker who carries out such an attack could also fingerprint your CPU speed, RAM, and your ASLR base addresses, which depending on the OS might not change until reboot.
My post was more along the lines of: what system-level components, if replaced, have a potentially visible effect on current (or future) fingerprinting techniques?
And: does breaking monocultures affect fingerprinting, and if so, how? Breaking monocultures is typically done to help secure an environment through diversity, forcing an attacker to spend more resources to succeed.
The only reason I can think of to choose between allocators at runtime is to introduce randomness into the allocation strategy. An attacker relying on a blind overwrite may not be able to position their overwrite reliably AND it has to cause the process to crash, otherwise they can just try again.
Allocators can introduce randomness themselves, you don't need to choose between allocators to do that.
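To make the blind-overwrite point concrete, here is a minimal sketch of an allocator that randomizes slot selection on its own. It is a toy model only: ToySlab, adjacency_rate and success_after_retries are hypothetical names invented for this illustration, slots are plain indices rather than pointers, and nothing here reflects how mozjemalloc or hardened_malloc actually implement randomization.

```python
import bisect
import random


class ToySlab:
    """Toy slab of equal-sized slots. Slots are plain indices, not real pointers."""

    def __init__(self, slots, randomize, rng=None):
        self.free_slots = list(range(slots))  # kept sorted
        self.randomize = randomize
        self.rng = rng if rng is not None else random.Random()

    def alloc(self):
        if not self.free_slots:
            raise MemoryError("slab exhausted")
        # Deterministic mode: lowest free slot. Randomized mode: any free slot, uniformly.
        pick = self.rng.randrange(len(self.free_slots)) if self.randomize else 0
        return self.free_slots.pop(pick)

    def free(self, slot):
        bisect.insort(self.free_slots, slot)


def adjacency_rate(randomize, slots=64, trials=2000, seed=1):
    """Fraction of fresh slabs in which the victim lands right after the attacker's buffer."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        slab = ToySlab(slots, randomize, rng)
        attacker = slab.alloc()
        victim = slab.alloc()
        hits += victim == attacker + 1
    return hits / trials


def success_after_retries(p, attempts):
    """Chance that at least one of `attempts` independent tries succeeds."""
    return 1.0 - (1.0 - p) ** attempts


if __name__ == "__main__":
    print("deterministic adjacency rate:", adjacency_rate(False))
    print("randomized adjacency rate:   ", adjacency_rate(True))
    print("200 retries at p = 1/64:     ", success_after_retries(1 / 64, 200))
```

In this model the deterministic slab always places the victim next to the attacker's buffer, while the randomized one does so roughly once in 64 tries. The retry formula shows why the crash matters: if a failed attempt does not kill the process, a per-attempt chance of 1/64 climbs above 90% within 200 attempts.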
I'm assuming you're talking about randomness of the address space? When it comes to browsers, ASLR is dead. Browsers permit local execution of remotely-sourced arbitrary code, an attack vector ASLR was never meant to protect against.
Thus, discussion of whether choice of allocator improves effectiveness of ASLR when applied to the browser is moot.
In virtually all browser exploits we have seen recently, the attacker creates exploitation primitives that allow partial memory read/write and then full memory read/write. Any randomness introduced is bypassed and ineffective. I've seen a general trend away from randomness for this purpose. The exception is when the attacker is heavily constrained - like exploiting over IPC or in a network protocol. Not when the attacker has a full JavaScript execution environment available to them.
When exploiting a memory corruption vulnerability, you can target the application's memory (meaning, target a DOM object or an ArrayBuffer) or you can target the memory allocator's metadata. While allocator metadata corruption was popular in the past, I haven't seen it used recently.
Okay all that out of the way, let's talk about allocators.
I skimmed https://github.com/GrapheneOS/hardened_malloc and it looks like it has:
- out of line metadata
- double free protection
- guard regions of some type
- zero-filling
- MPK support
- randomization
- support for arenas
mozjemalloc:
- arenas (we call them partitions)
- randomization (support for, not enabled by default due to limited utility, but improvements coming)
- double free protection
- zero-filling
In Progress:
- we're actively working on guard regions
Future Work:
- out of line metadata
- MPK
hardened_malloc definitely has more bells and whistles than mozjemalloc. But the benefit gained by slapping in an LD_PRELOAD and calling it a day is small to zero. Probably negative because you'll not utilize partitions by default. You'd need a particularly constrained vulnerability to actually prevent exploitation - it's more likely you'll just cost the attacker another 2-8 hours of work.
100% agreed with your thoughts on LD_PRELOAD here, with the additions of my notes above.
Out of line metadata is on-the-surface-attractive but... that tends to only help when you have an off-by-one/four write and you corrupt metadata state because it's the only thing you *can* do. With out of line metadata, you can just corrupt a real object and effect a different type of corruption. I'm pretty skeptical of the benefit at this point, although I could be convinced. We don't see metadata corruption attacks anymore - but I'm not sure if it's because we find better exploit primitives or better vulnerabilities.
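The trade-off described above can be sketched with two toy heaps, one keeping a one-byte in-use flag in front of each payload and one tracking liveness in a separate structure. Everything here is an assumption made for illustration: InlineHeap, OutOfLineHeap and off_by_one are hypothetical names, the 16-byte slot layout is invented, and neither class resembles the real mozjemalloc or hardened_malloc data structures.

```python
SLOT = 16  # bytes per slot in this toy heap


class InlineHeap:
    """Each slot is [1-byte in-use flag][15-byte payload]; metadata sits next to user data."""
    PAYLOAD = SLOT - 1

    def __init__(self, slots):
        self.mem = bytearray(SLOT * slots)

    def alloc(self):
        for header in range(0, len(self.mem), SLOT):
            if self.mem[header] == 0:
                self.mem[header] = 1
                return header + 1  # offset of the payload
        raise MemoryError("heap exhausted")

    def free(self, offset):
        header = offset - 1
        if self.mem[header] != 1:
            raise RuntimeError("double free or corrupted header")
        self.mem[header] = 0


class OutOfLineHeap:
    """Payload fills the whole slot; liveness is tracked in a separate set."""
    PAYLOAD = SLOT

    def __init__(self, slots):
        self.mem = bytearray(SLOT * slots)
        self.live = set()

    def alloc(self):
        for offset in range(0, len(self.mem), SLOT):
            if offset not in self.live:
                self.live.add(offset)
                return offset
        raise MemoryError("heap exhausted")

    def free(self, offset):
        if offset not in self.live:
            raise RuntimeError("double free or invalid pointer")
        self.live.remove(offset)


def off_by_one(heap):
    """Allocate two neighbours, write one zero byte past the first, then allocate again."""
    a = heap.alloc()
    b = heap.alloc()
    heap.mem[b] = 0x41              # first byte of b's payload
    heap.mem[a + heap.PAYLOAD] = 0  # the one-byte overflow out of a
    c = heap.alloc()
    return {"overlapping_alloc": c == b, "b_data_intact": heap.mem[b] == 0x41}


if __name__ == "__main__":
    print("inline:     ", off_by_one(InlineHeap(4)))
    print("out of line:", off_by_one(OutOfLineHeap(4)))
```

With inline metadata the stray byte clears the neighbour's in-use flag, so the next allocation overlaps a live object. With out-of-line metadata the allocator state survives, but the same byte lands in the neighbouring object's data instead, which is the "corrupt a real object" case.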
In particular, if you wanted to pursue hardened_malloc you would need to use replace_malloc and wire up the partitions correctly. Randomization will almost certainly not help (and will hurt performance)*. MPK sounds nice but you have to use it correctly (which requires application code changes), you have to ensure there are no MPK gadgets, and oh wait no one can use it because it's only available in Linux on server CPUs. =(
* One place randomization will help is on the other side of an IPC boundary, e.g. in the parent process. I'm trying to get that enabled for mozjemalloc in H2 2019.
In conclusion, while it's possible hardened_malloc could provide some small security increase over mozjemalloc, the gap is much smaller than it was when I advocated for allocator improvements 5 years ago, the effort is definitely non-trivial, and the gap is closing.
I'm curious about how breaking monocultures affects attacks. I think supporting hardened_malloc (or <insert arbitrary allocator here>) would provide at least the framework for academic exercises.
Thanks,