Since jemalloc does not write heap profile dumps at regular time intervals, this was not a temporal analysis of memory allocation but rather a sequential analysis of discrete heap profiles that happened to be taken over a period of time. It could have been extended to take the dump file creation times into account, but I did not do so. Even as it was, the analysis gave a good idea of the patterns in memory allocation and, in my case, painted a very clear picture of where the memory leak was.
The input was a collection of heap profile dumps that had been post-processed as described in my previous article. The result of that post-processing was a collection of data files with 3 columns each: procedure name, direct memory use, and memory use of the procedure plus its callees. A sample of one file looked like this...
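(The procedure names and byte counts below are illustrative rather than copied from a real dump; the shape is what matters - procedure name, direct bytes, then procedure-plus-callee bytes.)

```
native_proc_a 1048576 2621440
native_proc_b  524288  524288
native_proc_c  262144 1835008
```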
I used the direct memory use of a procedure and ignored the procedure+callee memory usage. This meant that I was only interested in the first and second columns in the file.
The challenge here was to stitch all of the data files together into a 2D array containing each procedure's memory allocation from every data file. This array was somewhat sparse because not every procedure name was guaranteed to appear in every data file. At first I tried to use the Linux join command for this, but because of the sparseness of the data it did not give me the correct results. Instead I ended up writing some very simple Java code that populated a HashMap with the data.
The procedure name was used as the map key, and the data was held in a long array sized to the number of input data files (one column per file, essentially). The code went along these lines (simplified here)...
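The sketch below follows that description rather than reproducing the original code exactly; the class name, output file name and I/O handling are assumptions, while the HashMap of long arrays and the bytesCutOff filter come from the description in this article.

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class HeapProfileStitcher {

    public static void main(String[] args) throws IOException {
        // Pass the post-processed data files as arguments, oldest dump first.
        List<Path> dataFiles = new ArrayList<>();
        for (String arg : args) {
            dataFiles.add(Path.of(arg));
        }

        long bytesCutOff = 0;   // raise this to also drop the 'small leakers'

        // procedure name -> one direct-memory sample per input file (missing samples stay 0)
        Map<String, long[]> samples = new HashMap<>();

        for (int col = 0; col < dataFiles.size(); col++) {
            for (String line : Files.readAllLines(dataFiles.get(col))) {
                String[] parts = line.trim().split("\\s+");
                if (parts.length < 2) {
                    continue;
                }
                String procedure = parts[0];
                long directBytes = Long.parseLong(parts[1]);   // 2nd column: direct memory use

                long[] row = samples.computeIfAbsent(procedure, k -> new long[dataFiles.size()]);
                row[col] = directBytes;
            }
        }

        // Keep only procedures whose allocation grew over the sample period by more
        // than the cut-off, and write one space-separated line per procedure.
        try (PrintWriter out = new PrintWriter("stitched.txt")) {
            for (Map.Entry<String, long[]> entry : samples.entrySet()) {
                long[] row = entry.getValue();
                if (row[row.length - 1] - row[0] > bytesCutOff) {
                    StringBuilder sb = new StringBuilder(entry.getKey());
                    for (long value : row) {
                        sb.append(' ').append(value);
                    }
                    out.println(sb);
                }
            }
        }
    }
}
```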
That code produced output like the sample below: each procedure had its memory allocation on one line, with the sample from each profile data file separated by a space character, i.e. 20 dump files would produce a final file with 21 columns (1 column for the procedure name and 20 samples).
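(Again the names and numbers are illustrative, and only five sample columns are shown here for brevity.)

```
native_proc_a 1048576 1310720 1572864 1835008 2097152
native_proc_c  262144  262144  393216  524288  786432
```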
Something to note was that the final output contained only those procedures that had more memory allocated at the end of the sample period than at the beginning. This served to remove noise and shrink the list of potentially leaking procedures. The bytesCutOff was set to zero in my case, but it could also be used to filter out any 'small leakers', leaving just the more significant memory hogs.
The above data set was then plotted in Excel, producing a graph similar to this...
From that graph it was very clear which procedures were the majority leakers. Plotting the data across multiple profile dumps made it very easy to see patterns in memory allocation that were not visible when looking at individual heap profiles.
That was not the end of the analysis, however. The plotting only went as far as identifying the procedures to concentrate on. After this I looked at the final heap profile and generated a call graph using jeprof. From the call graph I could see that both of my majority leakers were related and originated somewhere in the Java code.
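A jeprof invocation for this step looks roughly like the line below; the binary and dump file paths are placeholders, and the --pdf output relies on Graphviz being installed.

```
jeprof --pdf /path/to/profiled/binary final_dump.heap > callgraph.pdf
```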
Decompiling the Java code and tracking down where these procedures were called came next, but that work is outside the scope of this article. Happy leak hunting!
-i