ESR38 isn't going to have jemalloc3. But the code is in the FF tree, well-integrated, and while I wouldn't call it 'supported', the mozilla devs are at least pretty familiar with it. I wanted to see if there was anything that would stop Tor from potentially using it even if FF isn't.
The mozilla bug is https://bugzilla.mozilla.org/show_bug.cgi?id=762449
Their concerns are, at least:
- mismatched calls to VirtualAlloc/Free: https://github.com/jemalloc/jemalloc/issues/213 (This can be resolved by a patch in comment #1 which is slower than they'd like)
- performance concerns
- I think there are issues on mobile platforms, but I'm not entirely clear there
There are probably more, but these were the ones I was able to discern. I decided to test the performance question, with and without the patch. I ran a slew of in-browser benchmark sites on an Ubuntu desktop, running mozilla-release tag FIREFOX_38_0_RELEASE.
The results are close. It looks like the current jemalloc (2?) outperforms jemalloc3, but jemalloc3 (even with the patch) is not significantly slower. Both compare favorably to Chromium, for instance.
Bigger is better for all but the last chart (which labels it in the title).
If Tor wanted to consider turning on jemalloc3 in TBB, I would suggest the following:
1) Seeing if the heap isolation based on class of object (e.g. js strings/arrays) is implemented, and if not, what it would take to do so. If it's not there and it's hard to do, there's not much point to turning to jemalloc3
2) Round up the mozilla devs and lay it out as a real possibility and get their input
I will work on #1 as I'm able.
-tom
Benchmarks:
http://peacekeeper.futuremark.com
http://browsermark.rightware.com/
http://dromaeo.com/
http://v8.googlecode.com/svn/data/benchmarks/v7/run.html
http://octane-benchmark.googlecode.com/svn/latest/index.html
http://browserbench.org/Speedometer/
http://browserbench.org/JetStream/
http://v8.googlecode.com/svn/branches/bleeding_edge/benchmarks/spinning-ball...
http://www.webkit.org/perf/sunspider/sunspider.html
http://krakenbenchmark.mozilla.org/index.html
On Monday, May 11, 2015 at 11:59 AM, Tom Ritter wrote:
If Tor wanted to consider turning on jemalloc3 in TBB, I would suggest the following:
- Seeing if the heap isolation based on class of object (e.g. js strings/arrays) is implemented, and if not, what it would take to do so. If it's not there and it's hard to do, there's not much point to turning to jemalloc3
Some relevant links about heap partitioning in FF:
http://guilherme-pg.com/2014/10/15/Partitioned-heap-in-Firefox-pt1.html
http://guilherme-pg.com/2015/03/03/Partitioned-heap-in-Firefox-pt2.html
- Round up the mozilla devs and lay it out as a real possibility and get their input
Progress! Although I missed the meeting, sorry.
=========
Adding this patch: https://bug1052573.bugzilla.mozilla.org/attachment.cgi?id=8573433 to FF38 (plus enabling jemalloc3) means we have heap partitioning functions available to both Firefox proper (I believe) and a replace library.
I have successfully implemented a replace library that creates arenas and mallocs things into them.
=========
Then I went and traced down how memory tags work. Bad news. As I understand them, there's no real metadata about the tag available. Rather, different parts of the FF codebase implement nsIMemoryMultiReporter[0], which calls some functions, which self-report allocated data into buckets. For example:
look at js/xpconnect/src/XPCJSRuntime.cpp
JSReporter::CollectReports calls (among other things) ReportZoneStats
ReportZoneStats reports the mallocHeapLatin1 member as the tag "malloc-heap/latin1"
The mallocHeapLatin1 member is the number of bytes used for that type of bucket, and is filled in js/src/vm/MemoryMetrics.cpp
Specifically, we track the memory as being "malloc-heap/latin1" when we call StatsCellCallback, which is called by JS::CollectRuntimeStats
My point is: at allocation time, I don't see any way to use these memory tags to direct allocation.
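To make that concrete, here is a rough sketch in C of the shape of the reporting path as I understand it. It is not Firefox code: cell_kind, cell, zone_stats, stats_cell_callback and collect_stats are made-up stand-ins for the StatsCellCallback / CollectRuntimeStats flow, and "example/other" is an invented tag. The point it shows is that the bucket is decided while walking cells that already exist, so nothing about it is known to the allocator when the memory is requested.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical kinds of cells a reporter might walk. */
enum cell_kind { KIND_LATIN1_STRING, KIND_OTHER, KIND_COUNT };

/* Hypothetical live cell: the kind is a property of the finished object. */
struct cell {
    enum cell_kind kind;
    size_t malloc_bytes;
};

/* Hypothetical per-zone totals, one bucket per kind. */
struct zone_stats {
    size_t bytes[KIND_COUNT];
};

/* Analogue of a per-cell stats callback: classify after the fact. */
static void stats_cell_callback(struct zone_stats *stats, const struct cell *c)
{
    assert(c->kind < KIND_COUNT);
    stats->bytes[c->kind] += c->malloc_bytes;
}

/* Analogue of collecting runtime stats: walk the live cells and tally. */
static void collect_stats(struct zone_stats *stats,
                          const struct cell *cells, size_t n)
{
    for (size_t i = 0; i < KIND_COUNT; i++)
        stats->bytes[i] = 0;
    for (size_t i = 0; i < n; i++)
        stats_cell_callback(stats, &cells[i]);
}

/* The tag is only a label attached to a bucket when reporting. */
static const char *tag_for(enum cell_kind k)
{
    return k == KIND_LATIN1_STRING ? "malloc-heap/latin1" : "example/other";
}
```

Nothing in this flow hands a tag to the allocation call itself, which is why partitioning would have to be wired in at the allocation sites instead.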
=========
Future directions:
-----------------
Find the allocations of the most dangerous JS types (strings, arrays), and switch their allocations over to partitions. Just patch the code ourselves. Can we put our patches behind a preference? ...Probably... I think? Assuming we're using jemalloc3 (compiled in, no preference, just always on), I don't think allocating with or without arenas, or switching in the middle of things, is a problem. Obviously we'd test, but it seems like it would work.
I don't even think it matters for things like reallocs. Specifically: http://www.canonware.com/download/jemalloc/jemalloc-latest/doc/jemalloc.html
Look for the non-standard APIs that accept a flags argument. You use the macro MALLOCX_ARENA(int) as a flag to allocate in a specific arena.
- The non-standard API does not define a free() that requires you to pass in the arena the pointer was allocated in. You do not need to preserve that information anywhere; you can just free a pointer.
- The rallocx() and xallocx() functions do not say you must reallocate in the same arena you allocated in.
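To illustrate why that seems plausible, here is a toy model in C. It is not jemalloc and says nothing about how jemalloc does it internally; every name in it (toy_mallocx, toy_rallocx, toy_free, TOY_ARENAS, toy_live_bytes) is made up. Each block carries a small header naming its arena, so freeing needs only the pointer, and a reallocation can land in a different arena than the original allocation. With real jemalloc the corresponding calls would be mallocx(), rallocx() and dallocx() with MALLOCX_ARENA(), assuming my reading of the docs is right.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define TOY_ARENAS 4u

/* Per-arena live byte counters, standing in for real arena state. */
static size_t toy_live_bytes[TOY_ARENAS];

/* Header stored in front of each block; the union keeps user data aligned. */
typedef union {
    struct { unsigned arena; size_t size; } info;
    long double ld;
    long long ll;
    void *p;
} toy_header;

/* Allocate size bytes and record which arena they are charged to. */
static void *toy_mallocx(size_t size, unsigned arena)
{
    toy_header *h;
    assert(arena < TOY_ARENAS);
    h = malloc(sizeof *h + size);
    if (h == NULL)
        return NULL;
    h->info.arena = arena;
    h->info.size = size;
    toy_live_bytes[arena] += size;
    return h + 1;
}

/* Free needs only the pointer: the owning arena is read from the header. */
static void toy_free(void *ptr)
{
    toy_header *h;
    if (ptr == NULL)
        return;
    h = (toy_header *)ptr - 1;
    toy_live_bytes[h->info.arena] -= h->info.size;
    free(h);
}

/* Reallocate into whichever arena the caller names, possibly a new one. */
static void *toy_rallocx(void *ptr, size_t size, unsigned arena)
{
    toy_header *old;
    void *fresh;
    if (ptr == NULL)
        return toy_mallocx(size, arena);
    old = (toy_header *)ptr - 1;
    fresh = toy_mallocx(size, arena);
    if (fresh == NULL)
        return NULL;
    memcpy(fresh, ptr, old->info.size < size ? old->info.size : size);
    toy_free(ptr);
    return fresh;
}
```

If the real allocator behaves like this model, patched call sites would only need to change how they allocate, and existing free paths could stay as they are. That is something we'd have to test rather than assume.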
-----------------
Write a replace library that randomizes the arena based on the callstack. (Yep, I'm still pushing this. =P)
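A minimal sketch of what I mean by that, in C, under a pile of assumptions: the replace library has already created NUM_ARENAS arenas, it has some way of capturing return addresses (glibc's backtrace() would be one option on Linux), and it picked a random secret at startup. The names hash_callstack, pick_arena and NUM_ARENAS are made up, and FNV-1a is used only because it is short, not because it is the right hash for this.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: number of arenas the replace library created at startup. */
#define NUM_ARENAS 16u

/* Mix a per-process secret with the return addresses (FNV-1a over bytes). */
static uint64_t hash_callstack(const uintptr_t *frames, size_t nframes,
                               uint64_t secret)
{
    uint64_t h = 0xcbf29ce484222325ULL ^ secret; /* FNV-1a offset basis */
    assert(frames != NULL || nframes == 0);
    for (size_t i = 0; i < nframes; i++) {
        uintptr_t addr = frames[i];
        for (size_t b = 0; b < sizeof addr; b++) {
            h ^= (uint64_t)(addr & 0xff);
            h *= 0x100000001b3ULL;               /* FNV-1a prime */
            addr >>= 8;
        }
    }
    return h;
}

/* Same callstack and secret always map to the same arena index. */
static unsigned pick_arena(const uintptr_t *frames, size_t nframes,
                           uint64_t secret)
{
    return (unsigned)(hash_callstack(frames, nframes, secret) % NUM_ARENAS);
}
```

The index would then be offset into whatever arena numbers the library actually created and passed along via MALLOCX_ARENA(). Because the secret changes per process, an attacker can't predict ahead of time which call sites share an arena, though with only a handful of arenas collisions between call sites are unavoidable.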
-----------------
When there's an ESR38 TBB, I'll start working off that instead of mozilla-release.
-tom
[0] https://developer.mozilla.org/en-US/docs/Mozilla/Tech/XPCOM/Reference/Interf...