Wednesday, 25 July 2012

Clojure poor performance - why?

The only glaring result from my functional language benchmarking (a strong word for what I did in all honesty) is the Clojure result.

Now, let me be upfront here - I love Clojure. I've always loved the lisp family of languages, but always found Common Lisp a little clunky sometimes (although very quick). Clojure is great, it has all the benefits of the JVM, but with modern data structures and some good concurrency constructs. It's functional, immutable and has macros support (as you'd expect).

It's also dynamic, with lots of boxing going on if you're not careful. I thought this was the reason for the slowdown, but after doing a lot of optimising using type hints and the new numerics supports in 1.3 I got very little improvement in performance. I even dropped records and used maps, even though it's counter-intuitive...but all to no avail.

I thought it could be the trig and maths at fault, so the first thing I tried was to use memoize to cache the results of the travel calculations. However, this made only a small improvement.

It was when I commented out the various trig calls that I got a change in the results. It went down from 51s to 10s.

Right.

So I started to do some digging and came across the following:

https://blogs.oracle.com/jag/entry/transcendental_meditation
http://docs.oracle.com/javase/6/docs/api/java/lang/Math.html

These seems to suggest it's not using the hardware fcos/fsin some/most of the time - that to achieve the JVM required accuracy software implementations are used.

Do some googling and you'll find many more articles on this.

So beware!

I'm still not sure why Scala doesn't seem to suffer? Maybe the libraries have workarounds?

But it gives me confidence that Clojure is close enough performance wise to be practical for my purposes.

Functional languages - A performance comparison


Okay, I've decided to write a blog - one of those things I've been meaning to do for longer than I can remember. Now this isn't going to be the most exciting blog in the world, but you never know - it might well be useful to somebody.

I'm going to start off with a little project I did recently. I'm into functional programming, but am acutely aware that there is a performance penalty associated with anything immutable and/or recursive. In recent years there has been substantial effort in optimising compilers and VMs for functional languages, and performance is approaching that of more conventional imperative mutable languages.

Now, I've got a whole brain full of ideas I want to try out, but just for fun you understand. However, as a man who feels fluent in several languages and competent in at least a dozen more, I've decided to try and pick one and try and master it fully. Of course, this won't happen, but it's at least a worthy goal.

Now, there are various metrics about a language and platform:

  • Performance
  • Syntax including conciseness and expressiveness
  • Libraries and platform support

In this post I'm going to tackle the first item. There is the famous Computer Language Benchmarks Game, but this hampers functional languages in certain ways, for example it doesn't allow lazy evaluation.

So, instead I put together my own little benchmark. Now, I admit this is fairly simple and it doesn't use some functional features such as currying or lazy evaluation; however it does use recursion and immutability. Also, I've tried to keep the code idiomatic but still following the same basic algorithm each time. It would be possible to do optimisations in each language or be more idiomatic (by modifying the algorithm) but I avoided this.

The problem to solve is a simple one - there are 10 people with 500 places to visit (this is stored in a CSV file), however they can only visit 10 each. I use a brute force method of finding possible journeys, basically recurse through each resource in turn, then choose the 10 nearest places in turn (using the Haversine formula). I then fold this to get a total travel distance, which I ensure is the same across each implementation. This makes good use of recursion, some heavy lifting with fold/map, some file access and a bit of simple trig.

The source is on github.

Results


Haskell
-------
Lines: 83


1 - 0m0.025s
10 - 0m0.090s
100 - 0m0.773s
1000 - 0m7.677s
10000 - 1m17.865s



F# (Mono)
---------
Lines: 83


1 - 0m0.155s
10 - 0m0.378s
100 - 0m2.641s
1000 - 0m25.487s (SGen - 0m19.399s)
10000 - 4m9.381s (.NET in VM 1m28.366)


F# (.NET)
---------
Lines: 83


1 - 0m0.123s
10 - 0m0.199s
100 - 0m.948s
1000 - 0m8.39s
10000 - 1m23.619s



Scala
-----
Lines: 82


1 - 0m1.028s
10 - 0m1.375s
100 - 0m2.253s
1000 - 0m11.959s
10000 - 1m44.669s


Clojure
-------
Lines: 88


1 - 0m1.511s
10 - 0m2.317s
100 - 0m7.022s
1000 - 0m51.575s
10000 - 8m50.565s


OCaml
-----
Lines: 88


1 - 0m0.015s
10 - 0m0.118s
100 - 0m1.162s
1000 - 0m11.162s
10000 - 1m51.518s


Some very interesting results, some things I've noticed:
  • Haskell is the fastest but Scala seems to be accelerating in performance. HotSpot compiler maybe?
  • The F# performance is a lot better on the real VM and on the new SGen GC.
  • Clojure is strangely slow, but I'll cover why I think this is the case in a later blog.
  • The native compiled languages show a very quick 1 iteration time.
  • Clojure has the longest 1 iteration time, but this is to be expected.
  • They are all very similar on number of lines, however each implementation is very similar in code.
  • The memory usage was significantly (as in 1/100) lower for Haskell and O'Caml - although I didn't measure this precisely.

Conclusion

I'm not sure there is one, all the languages I tested are close enough to be interchangeable with the exception of Clojure (which I'm going to cover in a different post).

Which language did I find easiest? F# - the IDE and debugger makes such a difference.
The hardest? Haskell, but once I'd got it to compile it worked first time.

Table & Graph

To make it clear, here is a quick graph I've put together a table (the top number is iterations) and a graph.






1
10
100
1000
10000
Haskell
0.025
0.090
0.773
7.677
77.865
F# .NET
0.123
0.199
0.948
8.39
83.619
Scala
1.028
1.375
2.253
11.959
104.669
O’Caml
0.015
0.118
1.162
11.162
111.518
F# Mono
0.155
0.378
2.641
25.487
249.381
Clojure
1.511
2.317
7.022
51.575
530.565