Igor Kromin |   Consultant. Coder. Blogger. Tinkerer. Gamer.

If you have, or suspect you have, a memory leak in your application, you simply cannot go past using jemalloc to help you track it down. It's an implementation of malloc() and related functions that provides all the usual memory allocation/deallocation functionality as well as profiling and heap allocation stats capture. These can be invaluable in helping track down memory leaks, but there are some caveats. The documentation is not perfect and some of the output formats are hard to understand and reuse for statistical analysis.

There is documentation in the jemalloc Git repository that talks about leak checking. I'm not going to go into details on how to set up jemalloc here and will assume that it has already been set up and the heap dump profile files are already available (you can read about how to set it up here). There are essentially two ways to analyse these files, both of which are done with the aid of the jeprof tool. The first is generating a text representation of memory usage and the second is generating a graph that shows both memory usage and function call relationships. For statistical analysis the former is more appropriate since we can use that information over many profile dumps to plot graphs showing memory usage over time.
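For completeness, the graph mode looks something like this, a sketch based on the jemalloc leak-checking documentation's use of the --gif flag (rendering the graph also requires Graphviz to be installed; the file names are illustrative)...

```shell
# Render a call graph of allocations as a GIF (needs Graphviz's dot).
jeprof --show_bytes --gif cmd_to_profile myheap.0.heap > myheap.gif
```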

For a single profile dump, we can run jeprof to generate our required output like this...
 Command
jeprof --text --show_bytes cmd_to_profile myheap.0.heap


...which produces output that is very human readable and nice but is not ideal for further processing...
 jeprof output
Total: 73016352 B
45024872 61.7% 61.7% 45024872 61.7% os::malloc
11258484 15.4% 77.1% 11258484 15.4% updatewindow
6291456 8.6% 85.7% 6291456 8.6% init
4271669 5.9% 91.6% 4271669 5.9% CRYPTO_malloc
1750952 2.4% 93.9% 1750952 2.4% inflateInit2_
1487634 2.0% 96.0% 1487634 2.0% readCEN
...


Since jeprof is derived from gperftools' pprof, the format of the above output is as per the gperftools documentation described here.
The first column contains the direct memory use (in bytes here, since --show_bytes was passed; the gperftools documentation describes this column in MB).
The fourth column contains memory use by the procedure and all of its callees.
The second and fifth columns are just percentage representations of the numbers in the first and fourth columns.
The third column is a cumulative sum of the second column (i.e., the kth entry in the third column is the sum of the first k entries in the second column.)




In terms of statistical analysis, the percentage columns do not really matter. The procedure name, meanwhile, sits in the last column, which isn't very spreadsheet friendly. This output is good for manually debugging a single heap dump, but if you want to do further analysis and comparison against other dumps, the only columns of interest are 1, 4 and 6, i.e. the direct memory use, the procedure-plus-callee memory use, and the procedure name/address.

Capturing just the data of interest then requires some basic filtering with grep and awk...
 commands
jeprof --text --show_bytes cmd_to_profile myheap.0.heap | grep "%" | awk '{print $6 " " $1 " " $4}'


The use of grep ensures that we only get lines containing actual data (by looking for percentage values), ignoring any other lines like total counts, etc. Then awk is used to extract and rearrange columns 1, 4 and 6. The output from that on the same profile dump as earlier looks like this...
 processed jeprof output
os::malloc 45024872 45024872
updatewindow 11258484 11258484
init 6291456 6291456
CRYPTO_malloc 4271669 4271669
inflateInit2_ 1750952 1750952
readCEN 1487634 1487634
...


That's not as human readable but is much more machine friendly!

Of course, looking at one profile dump is not enough to see trends. Since each dump file captures memory usage statistics at a single instant in time, it is necessary to look at multiple dumps to understand the behaviour of procedures and see where memory is being allocated and potentially leaked.

I used a simple Bash script to process all of the jeprof dump files into the more machine-friendly format and then went on to process those files further and plot memory usage trends. These are topics for another blog post, which I'll link to here once they are available.
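A minimal sketch of such a script follows. It simply runs the earlier grep/awk pipeline over every heap dump in a directory; the process_dumps helper name and the .txt output naming are my own illustrations, not from the original script...

```shell
# process_dumps: run the jeprof text output through the grep/awk pipeline
# shown earlier, for every *.heap file in a directory, writing one
# machine-readable .txt file per dump (myheap.0.heap -> myheap.0.txt).
process_dumps() {
    local cmd_to_profile="$1" dir="$2"
    local dump
    for dump in "$dir"/*.heap; do
        jeprof --text --show_bytes "$cmd_to_profile" "$dump" \
            | grep '%' \
            | awk '{print $6 " " $1 " " $4}' \
            > "${dump%.heap}.txt"
    done
}
```

Each dump then gets a matching text file that can be fed into a spreadsheet or a plotting tool.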


A quick disclaimer...

Although I put great effort into researching all the topics I cover, mistakes can happen. Use of any information from my blog posts is at your own risk, and I do not hold any liability for misuse of information or damages caused by following any of my posts.

All content and opinions expressed on this Blog are my own and do not represent the opinions of my employer (Oracle). Use of any information contained in this blog post/article is subject to this disclaimer.
NOTE: (2022) This Blog is no longer maintained and I will not be answering any emails or comments.

I am now focusing on Atari Gamer.