Re: [tor-dev] TBB Memory Allocator choice fingerprint implications

19 Aug 2019

Hey Tom,
Thank you for your response. You've made some great points. My
response is inline.
On Mon, Aug 19, 2019 at 04:09:36PM +0000, Tom Ritter wrote:
...
Okay I'm going to try and clear up a lot of misconceptions and stuff
here.  I don't own Firefox's memory allocator but I have worked in it,
recently, and am one of the people who are working on hardening it.
Firefox's memory allocator is not jemalloc. It's probably better
referred to as mozjemalloc. We forked jemalloc and have been improving
it (at least from our perspective.) Any analysis of or comparison to
jemalloc is - at this point - outdated and should be redone from
scratch against mozjemalloc on mozilla-central.
LD_PRELOAD='/path/to/libhardened_malloc.so' /path/to/program will do
nothing or approximately nothing. mozjemalloc uses mmap and low level
allocation tools to create chunks of memory to be used by its internal
memory allocator. To successfully replace Firefox's memory allocator you
should either use LD_PRELOAD _with_ a --disable-jemalloc build OR
Firefox's replace_malloc functionality:
https://searchfox.org/mozilla-central/source/memory/build/replace_malloc.h
Completely agreed. And, using LD_PRELOAD to hook into the allocator is
improper, anyways, since it won't catch early uses of the allocator.
And, as you mention, it wouldn't even work with Firefox given
mozjemalloc. Firefox is not the only application to want to have
control over the allocator.
The only way to guarantee catching early allocator use is to switch
the system's allocator (ie, libc itself) to the new one. Otherwise,
the application will end up with two allocator implementations being
used: the application's custom one and the system's, included and used
within libc (and other system libraries, of course.)
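To make the LD_PRELOAD point concrete, here is a minimal sketch in C of
an allocator that gets its memory straight from mmap. It assumes a POSIX
system; chunk_arena, arena_init, arena_alloc, and arena_destroy are
hypothetical names, and the bump allocation is far simpler than anything
mozjemalloc does. Nothing here calls malloc or free, so a preloaded
library that only overrides those symbols would never see these
allocations.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE   /* for MAP_ANONYMOUS under strict -std modes */
#endif
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

#define CHUNK_SIZE ((size_t)1 << 20)   /* one 1 MiB chunk, illustrative */

/* Hypothetical bump arena backed by a single mmap'd chunk. */
typedef struct {
    uint8_t *base;   /* start of the chunk, NULL when not mapped */
    size_t   used;   /* bytes handed out so far */
} chunk_arena;

/* Map one anonymous chunk directly from the kernel; no libc malloc. */
static int arena_init(chunk_arena *a) {
    void *p = mmap(NULL, CHUNK_SIZE, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    a->base = p;
    a->used = 0;
    return 0;
}

/* Hand out n bytes, rounded up to 16, or NULL when the chunk is full. */
static void *arena_alloc(chunk_arena *a, size_t n) {
    size_t aligned = (n + 15) & ~(size_t)15;
    if (aligned < n || aligned > CHUNK_SIZE - a->used)
        return NULL;
    void *out = a->base + a->used;
    a->used += aligned;
    return out;
}

/* Return the whole chunk to the kernel. */
static void arena_destroy(chunk_arena *a) {
    if (a->base != NULL)
        munmap(a->base, CHUNK_SIZE);
    a->base = NULL;
    a->used = 0;
}
```

An application built this way still links against libc's malloc for
everything else, which is the "two allocator implementations" situation
described above.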
...
Fingerprinting: It is most likely possible to be creative enough to
fingerprint what memory allocator is used. If we were to choose from
different allocators at runtime, I don't think that fingerprinting is
the worst thing open to us - it seems likely that any attacker who
does such an attack could also fingerprint your CPU speed, RAM, and
your ASLR base addresses, which depending on OS might not change until
reboot.
My post was more along the lines of: what system-level components, if
replaced, have a potentially visible effect on current (or future)
fingerprinting techniques?
And: does breaking monocultures affect fingerprinting and, if so, how?
Breaking monocultures is typically done to help secure an environment
through diversity, forcing an attacker to spend more resources in
pursuit of success.
...
The only reason I can think of to choose between allocators at runtime
is to introduce randomness into the allocation strategy. An attacker
relying on a blind overwrite may not be able to position their
overwrite reliably AND it has to cause the process to crash, otherwise
they can just try again.
Allocators can introduce randomness themselves; you don't need to
choose between allocators to do that.
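As a rough illustration of an allocator randomizing on its own, the
sketch below hands out a randomly chosen free slot from a fixed pool
instead of the first free one. slot_pool and its functions are
hypothetical names, and the xorshift32 generator is only a stand-in: a
real hardened allocator would want an unpredictable, properly seeded
source of randomness.

```c
#include <stdint.h>

#define NSLOTS 8   /* illustrative pool size */

typedef struct {
    uint8_t  used[NSLOTS];   /* 1 if the slot is currently handed out */
    unsigned free_count;     /* number of slots with used[i] == 0 */
    uint32_t rng_state;      /* toy PRNG state, must be nonzero */
} slot_pool;

static void pool_init(slot_pool *p, uint32_t seed) {
    for (unsigned i = 0; i < NSLOTS; i++)
        p->used[i] = 0;
    p->free_count = NSLOTS;
    p->rng_state = seed ? seed : 1u;   /* xorshift must not start at 0 */
}

/* xorshift32: a toy generator, NOT suitable for real hardening. */
static uint32_t pool_rand(slot_pool *p) {
    uint32_t x = p->rng_state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    p->rng_state = x;
    return x;
}

/* Return the index of a randomly chosen free slot, or -1 if full. */
static int pool_alloc(slot_pool *p) {
    if (p->free_count == 0)
        return -1;
    unsigned k = pool_rand(p) % p->free_count;   /* pick the k-th free slot */
    for (unsigned i = 0; i < NSLOTS; i++) {
        if (p->used[i])
            continue;
        if (k == 0) {
            p->used[i] = 1;
            p->free_count--;
            return (int)i;
        }
        k--;
    }
    return -1;   /* not reached while free_count is consistent */
}

/* Release a slot; out-of-range indices and double frees are ignored. */
static void pool_free(slot_pool *p, int i) {
    if (i >= 0 && i < NSLOTS && p->used[i]) {
        p->used[i] = 0;
        p->free_count++;
    }
}
```

The order in which slots come back depends on the seed, so an attacker
who cannot read memory has a harder time predicting which slot the next
allocation will land in.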
I'm assuming you're talking about randomness of the address space?
When it comes to browsers, ASLR is dead. Browsers locally execute
remotely-sourced arbitrary code, an attack vector ASLR was never meant
to protect against.
Thus, discussion of whether choice of allocator improves effectiveness
of ASLR when applied to the browser is moot.
...
In virtually all browser exploits we have seen recently the attacker
creates exploitation primitives that allow partial memory read/write
and then full memory read/write. Randomness introduced is bypassed and
ineffective. I've seen a general trend away from randomness for this
purpose. The exception is when the attacker is heavily constrained -
like exploiting over IPC or in a network protocol. Not when the
attacker has a full JavaScript execution environment available to
them.
When exploiting a memory corruption vulnerability, you can target the
application's memory (meaning, target a DOM object or an ArrayBuffer)
or you can target the memory allocator's metadata. While allocator
metadata corruption was popular in the past, I haven't seen it used
recently.
Okay all that out of the way, let's talk about allocators.
I skimmed https://github.com/GrapheneOS/hardened_malloc and it looks
like it has:

out of line metadata
double free protection
guard regions of some type
zero-filling
MPK support
randomization
support for arenas

mozjemalloc:

arenas (we call them partitions)
randomization (support for, not enabled by default due to limited
  utility, but improvements coming)
double free protection
zero-filling

In Progress:

we're actively working on guard regions

Future Work:

out of line metadata
MPK
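
For readers unfamiliar with the guard regions that appear in both lists,
here is a minimal sketch, assuming POSIX mmap and mprotect. It reserves
an inaccessible page on each side of the usable memory so that a linear
overflow faults instead of silently reaching a neighbour. guarded_region,
guarded_alloc, and guarded_free are hypothetical names; neither
hardened_malloc nor mozjemalloc is claimed to work exactly this way.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE   /* for MAP_ANONYMOUS under strict -std modes */
#endif
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

typedef struct {
    uint8_t *map;        /* start of the whole mapping, guards included */
    size_t   map_len;    /* length of the whole mapping */
    uint8_t *user;       /* start of the usable pages */
    size_t   user_len;   /* length of the usable pages */
} guarded_region;

/* Map npages usable pages with one PROT_NONE guard page on each side. */
static int guarded_alloc(guarded_region *g, size_t npages) {
    long ps = sysconf(_SC_PAGESIZE);
    if (ps <= 0 || npages == 0)
        return -1;
    size_t page = (size_t)ps;
    if (npages > SIZE_MAX / page - 2)
        return -1;                       /* size computation would overflow */
    size_t total = (npages + 2) * page;

    /* Reserve everything as inaccessible first. */
    void *m = mmap(NULL, total, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (m == MAP_FAILED)
        return -1;

    /* Open up only the middle; the first and last page stay PROT_NONE. */
    uint8_t *user = (uint8_t *)m + page;
    if (mprotect(user, npages * page, PROT_READ | PROT_WRITE) != 0) {
        munmap(m, total);
        return -1;
    }
    g->map = m;
    g->map_len = total;
    g->user = user;
    g->user_len = npages * page;
    return 0;
}

/* Unmap the usable pages and both guard pages. */
static void guarded_free(guarded_region *g) {
    if (g->map != NULL)
        munmap(g->map, g->map_len);
    g->map = NULL;
    g->user = NULL;
    g->map_len = 0;
    g->user_len = 0;
}
```

Touching g.user[-1] or g.user[g.user_len] would fault, which stops a
linear overflow but does nothing against an attacker who already has an
arbitrary read/write primitive.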

hardened_malloc definitely has more bells and whistles than mozjemalloc.
But the benefit gained by slapping in an LD_PRELOAD and calling it a
day is small to zero. Probably negative because you'll not utilize
partitions by default. You'd need a particularly constrained
vulnerability to actually prevent exploitation - it's more likely
you'll just cost the attacker another 2-8 hours of work.
100% agreed with your thoughts on LD_PRELOAD here, with the additions
of my notes above.
...
Out of line metadata is on-the-surface-attractive but... that tends to
only help when you have an off-by-one/four write and you corrupt
metadata state because it's the only thing you *can* do. With out of
line metadata, you can just corrupt a real object and effect a
different type of corruption. I'm pretty skeptical of the benefit at
this point, although I could be convinced. We don't see metadata
corruption attacks anymore - but I'm not sure if it's because we find
better exploit primitives or better vulnerabilities.
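The trade-off described here can be shown with a toy model. The sketch
below is an assumption-laden illustration, not code from any real
allocator: inline_heap keeps a one-byte header directly in front of each
slot, ool_heap keeps headers in a separate array, and buggy_fill stands
in for a copy that writes one byte too many. All names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SLOT_SIZE  16
#define SLOT_COUNT 4
#define HDR_USED   0x01

/* Inline layout: each slot is [1-byte header][SLOT_SIZE bytes of data]. */
typedef struct {
    uint8_t mem[SLOT_COUNT * (1 + SLOT_SIZE)];
} inline_heap;

/* Out-of-line layout: data slots are adjacent, headers live elsewhere. */
typedef struct {
    uint8_t data[SLOT_COUNT * SLOT_SIZE];
    uint8_t hdr[SLOT_COUNT];
} ool_heap;

static uint8_t *inline_hdr(inline_heap *h, size_t i) {
    return &h->mem[i * (1 + SLOT_SIZE)];
}

static uint8_t *inline_data(inline_heap *h, size_t i) {
    return inline_hdr(h, i) + 1;
}

static uint8_t *ool_slot(ool_heap *h, size_t i) {
    return &h->data[i * SLOT_SIZE];
}

/* Zero all data and mark every slot as in use. */
static void inline_init(inline_heap *h) {
    memset(h->mem, 0, sizeof h->mem);
    for (size_t i = 0; i < SLOT_COUNT; i++)
        *inline_hdr(h, i) = HDR_USED;
}

static void ool_init(ool_heap *h) {
    memset(h->data, 0, sizeof h->data);
    memset(h->hdr, HDR_USED, sizeof h->hdr);
}

/* Simulated bug: writes len bytes of 0x41 into a SLOT_SIZE-byte slot.
 * Passing SLOT_SIZE + 1 models an off-by-one write. Callers must keep
 * the write inside the heap's backing array. */
static void buggy_fill(uint8_t *slot, size_t len) {
    memset(slot, 0x41, len);
}
```

Overflowing slot 0 by one byte with buggy_fill(inline_data(&ih, 0),
SLOT_SIZE + 1) lands on slot 1's header in the inline layout. The same
overflow against ool_slot(&oh, 0) leaves the headers alone but lands on
the first byte of slot 1's data, i.e. on a real object, which is the
"different type of corruption" mentioned above.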
In particular, if you wanted to pursue hardened_malloc you would need
to use replace_malloc and wire up the partitions correctly.
Randomization will almost certainly not help (and will hurt
performance)*. MPK sounds nice but you have to use it correctly (which
requires application code changes), you have to ensure there are no
MPK gadgets, and oh wait no one can use it because it's only available
in Linux on server CPUs. =(

* One place randomization will help is on the other side of an IPC
boundary, e.g. in the parent process. I'm trying to get that enabled
for mozjemalloc in H2 2019.
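
One way to picture what "wiring up the partitions" is for: a minimal
sketch, assuming partitions mean separate memory regions per kind of
object, so that memory used by one kind is never reused for another.
The partition names (PART_DOM, PART_BUFFER), the fixed region size, and
the bump allocation are all hypothetical and do not reflect the actual
mozjemalloc or replace_malloc interfaces.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical object kinds that should never share memory. */
enum partition { PART_DOM = 0, PART_BUFFER = 1, PART_COUNT = 2 };

#define PART_BYTES 256   /* illustrative per-partition capacity */

typedef struct {
    uint8_t mem[PART_COUNT][PART_BYTES];   /* one region per partition */
    size_t  used[PART_COUNT];              /* bytes handed out per region */
} part_heap;

static void part_init(part_heap *h) {
    for (size_t i = 0; i < PART_COUNT; i++)
        h->used[i] = 0;
}

/* Allocate n bytes (rounded up to 16) from the given partition only. */
static void *part_alloc(part_heap *h, enum partition p, size_t n) {
    if ((unsigned)p >= (unsigned)PART_COUNT)
        return NULL;
    size_t aligned = (n + 15) & ~(size_t)15;
    if (aligned < n || aligned > PART_BYTES - h->used[p])
        return NULL;
    void *out = &h->mem[p][h->used[p]];
    h->used[p] += aligned;
    return out;
}

/* Report whether ptr lies inside partition p's region. */
static int part_owns(const part_heap *h, enum partition p, const void *ptr) {
    if ((unsigned)p >= (unsigned)PART_COUNT)
        return 0;
    uintptr_t lo = (uintptr_t)h->mem[p];
    uintptr_t x = (uintptr_t)ptr;
    return x >= lo && x < lo + PART_BYTES;
}
```

A drop-in allocator loaded without this kind of wiring would serve every
request from one shared heap, which is the loss of partitioning warned
about above.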
In conclusion, while it's possible hardened_malloc could provide some
small security increase over mozjemalloc, the gap is much smaller than
it was when I advocated for allocator improvements 5 years ago, the
effort is definitely non-trivial, and the gap is closing.
I'm curious about how breaking monocultures affects attacks. I think
supporting hardened_malloc (or <insert arbitrary allocator here>)
would provide at least the framework for academic exercises.
Thanks,
-- 
Shawn Webb
Cofounder / Security Engineer
HardenedBSD

Tor-ified Signal:    +1 443-546-8752
Tor+XMPP+OTR:        lattera@is.a.hacker.sx
GPG Key ID:          0xFF2E67A277F8E1FA
GPG Key Fingerprint: D206 BB45 15E0 9C49 0CF9  3633 C85B 0AF8 AB23 0FB2
