First, an update. Going back through the code I discovered a bug in the way the triangle distribution was implemented. Fortunately this is a toy example, as it made the results inaccurate. My fault for not unit testing. I would still not suggest using this code for anything other than learning about fork/join.
Thread Local Random Numbers
To move on to more interesting things: I was reading through Oracle’s release notes on Java SE 7 and noticed that they include a new facility for concurrent random numbers. Since a Monte Carlo simulation generates millions of random numbers, I was very interested to see how its performance would be impacted.
The Javadoc for ThreadLocalRandom mentions contention
when multiple threads use regular
uses atomic variables to hold the current seed so that two threads calling
simultaneously get pseudo-random numbers in sequence rather than the same
number twice. In my case, I am using a separate instance of
Random for each
random variable in the simulation, but these instances are being shared across
all runs on the simulation. As a result, there is a strong possibility of
contention, so we should expect an improvement.
ThreadLocalRandom is not instantiated directly; instead, there is a static
current() that makes a new instance the first time it is called from
a given thread. So by changing from
are changing two things about the program:
- We no longer have the contention of accessing a single random number generator instance from multiple threads.
- Instead of instantiating a random number generator instance per random variable, we instantiate one per thread.
To set a baseline, here’s a fresh run using regular
As discussed in the previous posts, the move from sequential to parallel is much more important than the way that the task is divided up.
The change is very minor:
The resulting change is substantial:
The improvement is around 25%, which is well worth having.
It is interesting that the sequential case shows less of an improvement than the parallel case. This tends to show that there was genuine contention between different threads accessing the same random number generator, and not just overhead from the use of atomic variables.
It is also interesting that there is improvement in the sequential case. This
shows that not all the performance gain was created by eliminating contention.
This tends to suggest that even in regular Java code, there is an advantage to
ThreadLocalRandom if many random numbers will be needed. This is
similar to the difference between
shifting thread safety to instantiation rather than during use, it is possible
to improve performance. It is possible that there are other cases in Java
synchronized code blocks are used that would be more
performant with separate
In the numbers above, the “divide by two” case with 2 children and the smallest chunk size appears to be significantly worse than other options. Note that this method spawns the most tasks, but because of the way the fork/join framework operates, it does not mean that it spawns more threads than the others. Subsequent runs still showed some difference, but not as marked, so this may not be significant.
The Java fork/join framework is, in my opinion, a valuable contribution to Java SE 7.
ThreadLocalRandom class may be a considerably smaller and less complex addition,
but it appears to have a strong case for its usage, possibly even in “regular” Java code.