Recently I’ve been working with Scala. Scala’s collections api is much richer than Java’s and offers mutable and immutable implementations for most of the common collection types. I’ve always been interested in algorithm and data structure performance so I decided to run some benchmarks to see how the collections performed. I was most interested in the relationship between mutable and immutable collections.
Note: Scala does list the expected performance characteristics of the collections but it doesn’t show the relative performance of the types.
Two very common scenarios are appending to a collection (e.g. reading from a file or database cursor) and then iterating through the collection. It’s widely known that prepending to a list then reversing it can yield greater performance (for some implementations) so I also tried this.
As you can see in the source code, all I am doing is iterating through a list of 1000 Ints and appending each element to a collection of a given type. It’s trivial but it’s also very common. Here are the results:
Appending
Appending 1000 elements to the given collection. Benchmarks marked with M use mutable collections
Collection | Throughput (ops/s) |
---|---|
M ArrayBuffer | 173,677 |
M ArrayBuilder | 117,886 |
M ListBuffer | 69,692 |
List (prepend) | 60,677 |
M Array * | 49,790 |
M ListBuffer (prepend) | 39,799 |
Queue | 38,482 |
M MutableQueue | 30,996 |
M MutableList | 30,839 |
Stack (prepend) | 29,370 |
Vector (prepend) | 26,910 |
Vector | 26,882 |
M MutableQueue (prepend) | 17,007 |
M ArrayBuffer (prepend) | 16,634 |
M MutableList (prepend) | 16,047 |
Array (using a view) ** | 14,604 |
Queue (prepend) | 353 |
List | 307 |
Stack | 98 |
* As you can see in the source, I actually allocated a 1000 element array then copied each element into the slot so it’s not strictly an append operation. Whether this is a valid benchmark is subject to interpretation. see “array-comparison” (below) also
** For the Array benchmarks I need to get the current index of the source List. I’ve done this using zipWithIndex. A common optimisation when using zipWithIndex is to call it on a view however this yields worse performance! As I’ve said many times before - don’t assume A will be quicker than B - prove it! The overhead of the optimisation quite often outweighs the performance gains.
Mutable vs Immutable collections
We can get very good performance out of the standard immutable List. Don’t assume that immutable collections are inherently slow - they’re not.
Lists
Immutable lists offer good performance and as you’ll see in the traversal benchmarks, traversal is also performant. Prepending to an immutable list then reversing it is much faster than simply appending. This is to be expected but the magnitude of the difference surprised me.
Prepending to a mutable list and reversing it is slower than simply appending. This makes perfect sense when you read this comment in the scala documentation “A MutableList consists of a single linked list together with a pointer that refers to the terminal empty node of that list”. The expensive part of the append operation is getting to the end of the list, the mutable list maintains a pointer so we don’t need to traverse the list to get to the end.
Queues and Stacks
Admittedly Queues and Stacks are suited to specific use cases which are not represented here. However in these tests they are slower than Lists, not by a significant margin but enough to make me think that it may be better to use Lists in their place. Scala’s collection api makes it trivial to implement our own Stacks and Queues on top of lists.
Vectors
Prepending and appending to Vectors yields similar performance. This is to be expected as Vectors are optimised for random access
Arrays, ArrayBuilder and ArrayBuffer
The mutable ArrayBuffer offers amazing performance and surprisingly it’s faster than the ArrayBuilder (a lower level api) and raw Array access. Assuming a mutable data structure is suitable for my use case, ArrayBuffer would be my go to for performance focussed code blocks.
Array comparison
My comparison with raw array access is a little questionable because it’s very hard to separate out the actual overhead of the array access from the overhead of running the benchmark. In particular the way I’ve run all these benchmarks is to pass in a list, traverse it and append/set each element on the collection under test. The traversal of the original list will incur an overhead but it should be consistent throughout the benchmarks. I could have simply performed System.arrayCopy which would likely be very fast but this would then be an apples with oranges comparison and I wanted to benchmark incremental appends.
Iterating
Iterating a collection is also very common so I iterated through 1000 elements in a given collection and summed the results. I could have used Scala’s sum() helper function but I wanted to avoid any optimisations that it may introduce. I just wanted to iterate the collection but I need to do something with the elements or the JIT will drop the dead code.
Iterating over 1000 elements of the given collection. Benchmarks marked with M use mutable collections
Collection | Throughput (ops/s) |
---|---|
M ArrayBuffer | 1,226,833 |
M Vector | 402,363 |
M ListBuffer | 412,568 |
List | 396,586 |
Queue | 363,200 |
Array | 321,700 |
ArrayBuilder | 163,464 |
Stack | 54,152 |
M MutableList | 37,536 |
M MutableQueue | 33,620 |
Again the ArrayBuffer trumps everything else. The ArrayBuilder benchmark is pretty pointless as it simply calls result() to get an array then does exactly the same thing as the Array benchmark. I’ve included it in case someone asks “what about the ArrayBuilder?” Why ArrayBuffer yields such good performance deserves further investigation.
Random access
I find random access to indexed and linear sequences less common but I decided to benchmark them anyway. I simply created a collection of Ints and then accessed them randomly using the apply method. The results were quite predictable:
Benchmark | Throughput (ops/s) |
---|---|
M ArrayBuffer | 72,734,680 |
Vector | 62,082,283 |
M Array | 60,777,210 |
M ListBuffer | 1,107,126 |
M MutableList | 936,839 |
M MutableQueue | 907,515 |
List | 313,253 |
Queue | 161,754 |
Stack | 45,474 |
Once again ArrayBuffer is the winner followed closely by Vectors and Arrays. Predictably Linear Seqs offer much lower performance for random access due to the need to traverse the collection to find the element. What is surprising is the performance increase of a mutable List/ListBuffer vs the immutable List (3x faster). A mutable collection should not be inherently faster for random access to elements.
How useful are micro benchmarks?
To some people micro benchmarks are useless. They will argue that any difference in collection performance will be vastly outweighed by network calls, disk IO etc in the real world. It’s certainly true that SQL queries, IO etc will have a big impact on performance but not all applications and use cases are IO/DB bound.
Even for IO bound use cases we may still be able to squeeze a little bit more performance from good data structure and algorithm selection. It could be enough to hit that elusive NFR.
Write your own benchmarks
Micro benchmarks are very sensitive to the parameters used. Your own use case will be quite different to the trivial use case I have simulated. Collection size, the type of objects in the collection and the operations you perform on the elements will all affect performance and may render these benchmarks invalid. Feel free to clone my github repo and use it as a template for your own benchmarks and let me know your results!
Write your own collections
Scala’s collection Api is great, coming from Java it was a revelation and I’m impressed with how well thought out it is. The api also makes it easy to write our own collections and Programming in Scala (a superb book) includes examples of this. Don’t be afraid of writing your own use case specific collections. In another next post I’ll show how we can do this and try to beat the ArrayBuffer benchmark.