On Quasi Random Numbers - MersenneTwister vs Sobol precision in Asian Option Pricing

Tue, May 5, 2009 While starting a side project that does Monte Carlo pricing in Java (http://code.google.com/p/javamc/ - nothing yet there I am waiting for Mercurial repository support), I wondered what was the importance of quasi random numbers versus more regular pseudo random numbers in Monte Carlo simulations.

This brought me to read more carefully several books about Monte Carlo and Finance (Haug Option Pricing, Sobol Primer on Monte Carlo, and Glasserman Monte Carlo Methods in Finance Engineering). I had quite a hard time to understand why the dimension of the quasi random generator was so important to price an asian option. Intuitively I thought the averaging points of an asian option were all on the same path, so they should be using the same random generator. This is very wrong as one does not care about the path in the first place but just in simulating each point in the average (using the regular black and scholes hypothesis). Finding the estimation for the average on the given points forces to use independent random generators for each point, because we want to approximate the estimation by the sum over those random points for each point.

There is another simple argument to explain why independence of the random generators is so important. If we use the same generator for each point, then each point will move exactly the same way at each simulation. The average of those point will therefore behave exactly the same way as if there was only 1 point using the same generator. And we don't price an asian anymore but just a regular vanilla option.

Using a pseudo random generator, one does not see the problem of dimension, because we can create N independent dimensions by just taking numbers N by N on a pseudo random generator. So effectively having 1 or N dimensions is the same on a pseudo random generator.

Still I wrote a small test to see if a 1D quasi random generator was so bad when simulating N dimensions (taking values N by N on the quasi random generator). Here are the results:

MersenneTwister vs MersenneTwister on 10D asian:
14:43:51,111 INFO MonteCarloSimulationTest:114 - 867970000 -- expPrice=0.978958644504466
14:43:51,428 INFO MonteCarloSimulationTest:120 - 314619000 -- expPrice=0.9733220318545934
14:43:51,430 INFO MonteCarloSimulationTest:122 - relative difference=-0.005757763804951897
can be as high as 2%

Sobol vs MersenneTwister on 10D asian:
14:48:46,909 INFO MonteCarloSimulationTest:115 - 980209000 -- expPrice=0.9895032774079221
14:48:47,345 INFO MonteCarloSimulationTest:121 - 433685000 -- expPrice=0.9790264042895171
14:48:47,348 INFO MonteCarloSimulationTest:123 - relative difference=-0.010588012548932534
about 1% it is actually bounded by MersenneTwister precision.

Sobol vs Sobol1D on 10D asian:
14:47:08,614 INFO MonteCarloSimulationTest:115 - 717444000 -- expPrice=0.8810736428068913
14:47:08,925 INFO MonteCarloSimulationTest:121 - 308499000 -- expPrice=0.9791449305055208
14:47:08,927 INFO MonteCarloSimulationTest:123 - relative difference=0.11130884290920073
about 10% and stays that way even when increasing number of simulations.

Using an asian rate with 10 points, we see that Sobol1D will always give a very bad estimate, no matter the number of simulations. While Sobol used properly will give (much) better precision for less iterations. So even though there is the word random in quasi random, the numbers are very far from being random or even behaving like random numbers. It helped me to read about Van der Corput and Halton numbers to really understand quasi random numbers.

Java Logging Still Crap in 2009

Fri, Apr 24, 2009

When java logging API was first introduced in JDK 1.4 in 2002, it caused quite a lot a fuss around, with everybody asking “Why did not they just include Log4j instead of creating their own bastard child?”.

I remember having looked at it very shortly before continuing using Log4j on all projects I have been involved with.

Today, while doing a very small project, I tried once more to use java logging. The main reason is that I was lazy to add a dependency to one more jar for this small project. While trying I found out that:

you still need to use a damned JVM parameter to point to your configuration file
you can not change the formatting without writing a formatter class!

It’s 2009! What has Sun done? I am amazed the most elementary things you expect from a Logger are still not included by default in the JDK.

Bachelier vs. Black

Mon, Mar 23, 2009

Black and Scholes gives a strange result for the price of a binary option under high volatility. You will learn here how to simulate a stock price evolution using Java, and how to show it using JFreeChart library. It starts with more complex concepts (don't be afraid) and goes done towards simpler things.

I could not write all that in a blog format, so I created a old HTML page about it here and a PDF version.

Linux Audio State = Miserable

Thu, Mar 19, 2009

There are lots of programs for playing MP3 under linux, a few dealing decently with big libraries. But when you start looking for a program that does crossfade well and manage big libraries easily - there is nothing.

Rhythmbox does some crossfade, but crashes when you move manually in the song. Audacious does some crossfade but regularly crashes with crossfade plugin.

The real alternative are AIMP2 or Foobar2000 in Wine. It is quite incredible that you can have good solid crossfade in wine and not natively in Linux.

Maybe people spent too much time on useless Pulseaudio (I have much less issues using only ALSA).

Senior Developers Team Productivity X4 (from MS Research Paper)

Tue, Feb 10, 2009 There is a very interesting MS Research paper about test driven development (TDD). It is one of the only real study about it that I know of. The paper conclusions from experiments over 4 TDD teams vs 4 traditional teams is:

"TDD seems to be applicable in various domains and can significantly reduce the defect density of developed software without significant productivity reduction of the development team"

Their data gives also other interesting results:

An experienced team (5 people over 10 years + 2 people under 5 years) : 155KLOC C# code (+60 test).

A junior team (3 people under 10 years + 6 people under 5 years): 41 KLOC Java code (+28 test).

If you do the ratio of KLOC/man month, you have the following graph:

I know this is very far from scientific evidence and more like astrology, but still, the most conservative ratio for senior/junior is 4.23!

The End Of Rings Around Plain Java - A Better Concurrency Test

Thu, Jan 15, 2009

In my previous post, I was wondering why single thread was faster. D Andreou gave the correct explanation: as we send only 1 start message and as each node only send 1 message to the next one, there is always only 1 message being processed. So the test is optimum on 1 thread. It does not make much sense to make a multithreading benchmark on a problem that is fundamentally single threaded.

His suggestion was to simple send N start messages where N >= number of processors. In theory, the performance will become optimal with N threads then. Unfortunately this is not what happened in real life. In real life the single threaded performance is still better if you send even 16 messages on a biprocessor machine.

public static void main(String[] args) throws Exception {
    OptimizedRing ring = new OptimizedRing();
    RingNode node = ring.startRing(Integer.parseInt(args[0]));
    node.sendMessage(new StartMessage());
    node.sendMessage(new TokenMessage(node.nodeId,1));
    node.sendMessage(new TokenMessage(node.nodeId,1));
    node.sendMessage(new TokenMessage(node.nodeId,1));
    ring.executor.awaitTermination(10, TimeUnit.MINUTES);
}

My idea was that it was related to the swiching from thread to thread overhead, which is precisely what I think the original author of the test had in mind to test. I am not 100% convinced it is really what’s happening. I wanted a test that would actually be faster using N threads; so I decided to add a bit of computation at before processing each Token. Unfortunately I had the bad idea to compute Pi by Monte Carlo method to do that. Running my tests I was surprised it did not change the results, and made things worse the most computer intensive the computation was (increasing the number of monte carlo iterations). It scared me a bit wondering what the hell could be wrong there. The following class performs much worse with 2 threads compared to 1:

public class BadParallelPi {

    private static void startExecutors() throws Exception {        
        long startTime = System.currentTimeMillis();
        System.out.println(startTime);
        ExecutorService executor1 = Executors.newFixedThreadPool(1);
        executor1.execute(new Computation());
        executor1.execute(new Computation());
        executor1.shutdown();
        executor1.awaitTermination(60, TimeUnit.SECONDS);
        long delay = System.currentTimeMillis() - startTime;
        System.out.println("finished single thread in "+(delay/1000.0));
        startTime = System.currentTimeMillis();
        System.out.println(startTime);
        executor1 = Executors.newFixedThreadPool(2);
        executor1.execute(new Computation());
        executor1.execute(new Computation());
        executor1.shutdown();
        executor1.awaitTermination(60, TimeUnit.SECONDS);
        delay = System.currentTimeMillis() - startTime;
        System.out.println("finished 2 threads in "+(delay/1000.0));
    }
    
    public static class Computation implements Runnable {
        public volatile int count = 0;
         private double computePi() {
            double pi = 0;
            double x,y;
            int n = 10000000;
            for (int i=0;i<n;i++) {
                x = Math.random();
                x *= x;
                y = Math.random();
                y *= y;
                if (x+y < 1) {
                    pi +=1;
                }
            }
            pi = 4*pi/n;
            return pi;
        }
        
        public void run() {
            double pi = computePi();
            long time = System.currentTimeMillis();
            System.out.println(time+" thread "+Thread.currentThread().getId()+" pi="+pi);
            count++;
        }        
    }

    
    public static void main(String[] args) throws Exception {
        startExecutors();
    }
}

Did you figure out why?

It took me less time with this simple code than with the original ring test to find out why. It is simply because of the Math.random call. Math.random only creates one random number generator, and it will be shared among threads. So every thread will wait at the other one at this point. Creating one random generator per thread showed 2 threads were much faster than 1, finally.

Back to the original ring test. Adding the correct way to compute Pi by Monte Carlo, I now had decent test results as long as the number of iterations is not too small. 10 iterations is enough to show a real difference between N threads and 1. Adding a small computation helps figuring out what happens behind the scene. You can also verify D Andreou claim, using only 1 start message the single threaded version is faster. If computation is too weak (for example number of Monte Carlo iteration of 0, one only measures method calls between threads (context switching), which is obviously optimal for 1 thread. Measuring Actor libraries on it is dangerous: if I write a single threaded Actor library, it will be the fastest of this test, but it certainly is not what you want to use as Actor library.

Let’s see now how Scala fares compared to the Plain Java solution, using computation:

Machine	Algorithm	Time for 100000 ring count, 10 mc, 4 messages	Time for 10000 ring count, 100 mc, 4 messages
Core2Duo	OptimizedRing 2 Threads	57s	37s
Core2Duo	OptimizedRing 4 Threads	78s	39s
Core2Duo	Scala Actors	82s	47s
Core2Duo	SimpleRing (100 Threads)	137s	58s
Core2Duo	OptimizedRing 1 Thread	89s	71s
Core2Quad	OptimizedRing 4 Threads	81s	25s
Core2Quad	Scala Actors	71s	30s
Core2Quad	OptimizedRing 2 Threads	61s	43s
Core2Quad	OptimizedRing 1 Threads	100s	80s

The Core2Duo is Intel(R) Core(TM)2 Duo CPU T7250 @ 2.00GHz The Core2Quad is Intel(R) Core(TM)2 Quad CPU Q6600 @ 2.40GHz

It is interesting to compare results of 4 threads on a biprocessor with monte carlo count of 10 and 100. We see a much higher thread overhead with fewer computation. With too few computation in monte carlo, the overhead of threads is too high over 2 concurrent threads. This explains why the very simple threading architecture fares much better in the last column compared to the previous one.

Scala Actors fares much better when it is not hindered in the creation of too many threads. It seem actually very good at abstracting multithreading intricacies, while still providing near Java performance in the real world where each actor does enough computation and multithreading is important.

Object Oriented Analysis And Design with Applications Book Review

Thu, Jan 8, 2009

A while ago, I had a comment from someone implying I knew nothing about OO programming because I had not mentioned (and therefore read) Object Oriented Analysis And Design with Applications from G. Booch. I was intrigued by such a silly comment and decided to look at this book that was considered as the bible of OOP.

Well, I don’t find it that good! But I don’t find the bible particularly good either. I like B. Meyer Object Oriented Software Construction book much more, because it is more practical, more in touch with realities while pointing at the real important problems like:

Real systems have no top

In contrast G Booch book has too much evident concepts that don’t really make you learn anything or think things a different way. It is a good book for someone who is learning OO for the first time. It covers the subject in details, but I did not find anything in it that made me say

wow!

It is like most other book you can find on OO. Furthermore only the first parts are on OO, the rest is more a UML tutorial.

Running Rings Around Plain Java - The Killer Code

Thu, Jan 8, 2009

I wrote my previous post too fast. I found a very simple change that increases the speed x6!

The idea is too process messages in a ThreadPoolExecutor. As my Nodes are Runnable, I just needed to initialize a common ThreadPoolExecutor, and in a sendMessage, execute the runnable each time.

Here is the full code: public class OptimizedRing {

    private ExecutorService executor;

    public static void main(String[] args) throws Exception {
        OptimizedRing ring = new OptimizedRing();
        RingNode node = ring.startRing(Integer.parseInt(args[0]));
        node.sendMessage(new StartMessage());
    }

    public RingNode startRing(int n) {
        RingNode[] nodes = spawnNodes(n, startTimer());
        connectNodes(n, nodes);
        return nodes[0];
    }

    private Timer startTimer() {
        Timer timer = new Timer();
        new Thread(timer).start();
        return timer;
    }

    private RingNode[] spawnNodes(int n, final Timer timer) {
        System.out.println("constructing nodes");
        long start = System.currentTimeMillis();
        executor = Executors.newFixedThreadPool(4);
        RingNode[] nodes = new RingNode[n+1];
        for (int i = 0; i < n ; i++) {
            nodes[i] = new RingNode(i, timer, null);
        }
        long end = System.currentTimeMillis();
        System.out.println("Took "+(end-start)+"ms to construct "+n+" nodes");
        return nodes;
    }

    private void connectNodes(int n, RingNode[] nodes) {
        System.out.println("connecting nodes");
        nodes[n] = nodes[0];
        for (int i=0; i<n; i++) {
            nodes[i].connect(nodes[i+1]);
        }
    }

    interface Message {
        String getType();
    }

    private static class StartMessage implements Message {
        public String getType() {
            return "START";
        }
    }

    private static class StopMessage implements Message {
        public String getType() {
            return "STOP";
        }
    }

    private static class CancelMessage implements Message {
        public String getType() {
            return "CANCEL";
        }
    }

    private static class TokenMessage implements Message {
        private int nodeId;
        private int value;

        public TokenMessage(int nodeId, int value) {
            this.nodeId = nodeId;
            this.value = value;
        }

        public String getType() {
            return "TOKEN";
        }
    }

    private class RingNode implements Runnable {
        private int nodeId;
        private Timer timer;
        private RingNode nextNode;
        private BlockingQueue<Message> queue = new LinkedBlockingQueue<Message>();
        private boolean isActive = false;

        public RingNode(int id, Timer timer, RingNode nextNode) {
            nodeId = id;
            this.timer = timer;
            this.nextNode = nextNode;
        }

        public void connect(RingNode node) {
            nextNode = node;
            isActive = true;
        }

        public void sendMessage(Message m) {
            queue.add(m);
            executor.execute(this);
        }

        public void run() {
            if (isActive) {
                try {
                    Message m = queue.take();
                    if (m instanceof StartMessage) {
                        log("Starting messages");
                        timer.sendMessage(m);
                        nextNode.sendMessage(new TokenMessage(nodeId, 0));
                    } else if (m instanceof StopMessage) {
                        log("Stopping");
                        nextNode.sendMessage(m);
                        isActive = false;
                        //
                    } else if (m instanceof TokenMessage) {
                        if (((TokenMessage)m).nodeId == nodeId) {
                            int nextValue = ((TokenMessage)m).value + 1;
                            if (nextValue % 10000 == 0) {
                                log("Around ring "+nextValue+" times");
                            }
                            if (nextValue == 1000000) {
                                timer.sendMessage(new StopMessage());
                                timer.sendMessage(new CancelMessage());
                                nextNode.sendMessage(new StopMessage());
                                isActive = false;
                            } else {
                                nextNode.sendMessage(new TokenMessage(nodeId, nextValue));
                            }

                        } else {
                            nextNode.sendMessage(m);
                        }
                    }

                } catch (InterruptedException ie) {
                    ie.printStackTrace();
                }
            }
        }

        public void log(String s) {
            System.out.println(System.currentTimeMillis()+" "+nodeId+": "+s);
        }

    }

    private static class Timer implements Runnable {
        private BlockingQueue<Message> queue = new LinkedBlockingQueue<Message>();
        private boolean timing = false;
        private long startTime;

        public void sendMessage(Message m) {
            //we don't need to change this implementation as timer is rarely called
            queue.add(m);
        }

        public void run() {
            while (true) {
                Message m;
                try {
                    m = queue.take();
                    if (m instanceof StartMessage) {
                        startTime = System.currentTimeMillis();
                        timing = true;
                    } else if (m instanceof StopMessage) {
                        long end = System.currentTimeMillis();
                        System.out.println("Start="+startTime+" Stop="+end+" Elapsed="+(end-startTime));
                        timing = false;
                    } else if (m instanceof CancelMessage) {
                        break;
                    }
                } catch (InterruptedException e) {
                    e.printStackTrace();
                }
            }
        }
    }
}

Code	Spawn	Send 100M messages
Scala Actors	15ms	270104ms
SimpleRing	11ms	493073ms
OptimizedRing (4 threads)	6ms	84727ms
OptimizedRing (5+ threads)	5ms	62593ms
OptimizedRing (1 thread)	5ms	60660ms

I finally saw my 4 cores used! Max multithreaded throughput is achieved at 5 threads. However 1 thread is faster. Is this related to memory bandwith limit?

Now I am left wondering if actors are really that important if one can achieve much higher throughput using plain Java and very simple concepts (BlockingQueue, ThreadPoolExecutor). Worse, this test is actually faster with only 1 thread...

Running rings around plain Java

Thu, Jan 8, 2009

Alex Miller has a very interesting test of Actors. He finds out Scala performance is relatively low compared to Erlang, and Kilim is very near Erlang. But Kilim code is the most difficult to read in the lot.

I thought it would be simple to just do the same test in plain Java. I wrote the code for it duplicating the scala logic using Threads instead of Actors.

                                                                                                                                                                                          I was a bit surprised by the result. It was slow and only 1 thread was

b-size:4;-o-tab-size:4;tab-size:4;">public class SimpleRing { public static void main(String[] args) throws Exception { SimpleRing ring = new SimpleRing(); RingNode node = ring.startRing(Integer.parseInt(args[0])); node.sendMessage(new StartMessage()); } public RingNode startRing(int n) { RingNode[] nodes = spawnNodes(n, startTimer()); connectNodes(n, nodes); return nodes[0]; } private Timer startTimer() { Timer timer = new Timer(); new Thread(timer).start(); return timer; } private RingNode[] spawnNodes(int n, final Timer timer) { System.out.println("constructing nodes"); long start = System.currentTimeMillis(); RingNode[] nodes = new RingNode[n+1]; for (int i = 0; i < n ; i++) { nodes[i] = new RingNode(i, timer, null); new Thread(nodes[i]).start(); //later use pool } long end = System.currentTimeMillis(); System.out.println("Took "+(end-start)+"ms to construct "+n+" nodes"); return nodes; } private void connectNodes(int n, RingNode[] nodes) { System.out.println("connecting nodes"); nodes[n] = nodes[0]; for (int i=0; i<n; i++) { nodes[i].connect(nodes[i+1]); } } interface Message { String getType(); } private static class StartMessage implements Message { public String getType() { return "START"; } } private static class StopMessage implements Message { public String getType() { return "STOP"; } } private static class CancelMessage implements Message { public String getType() { return "CANCEL"; } } private static class TokenMessage implements Message { private int nodeId; private int value; public TokenMessage(int nodeId, int value) { this.nodeId = nodeId; this.value = value; } public String getType() { return "TOKEN"; } } private static class RingNode implements Runnable { private int nodeId; private Timer timer; private RingNode nextNode; private BlockingQueue<Message> queue = new LinkedBlockingQueue<Message>(); public RingNode(int id, Timer timer, RingNode nextNode) { nodeId = id; this.timer = timer; this.nextNode = nextNode; } public void connect(RingNode node) { nextNode = node; } public void sendMessage(Message m) { queue.add(m); } public void run() { while (true) { try { Message m = queue.take(); if (m instanceof StartMessage) { log("Starting messages"); timer.sendMessage(m); nextNode.sendMessage(new TokenMessage(nodeId, 0)); } else if (m instanceof StopMessage) { log("Stopping"); nextNode.sendMessage(m); break; } else if (m instanceof TokenMessage) { if (((TokenMessage)m).nodeId == nodeId) { int nextValue = ((TokenMessage)m).value + 1; if (nextValue % 10000 == 0) { log("Around ring "+nextValue+" times"); } if (nextValue == 1000000) { timer.sendMessage(new StopMessage()); timer.sendMessage(new CancelMessage()); nextNode.sendMessage(new StopMessage()); break; } else { nextNode.sendMessage(new TokenMessage(nodeId, nextValue)); } } else { nextNode.sendMessage(m); } } } catch (InterruptedException ie) { ie.printStackTrace(); } } } public void log(String s) { System.out.println(System.currentTimeMillis()+" "+nodeId+": "+s); } } private static class Timer implements Runnable { private BlockingQueue<Message> queue = new LinkedBlockingQueue<Message>(); private boolean timing = false; private long startTime; public void sendMessage(Message m) { queue.add(m); } public void run() { while (true) { Message m; try { m = queue.take(); if (m instanceof StartMessage) { startTime = System.currentTimeMillis(); timing = true; } else if (m instanceof StopMessage) { long end = System.currentTimeMillis(); System.out.println("Start="+startTime+" Stop="+end+" Elapsed="+(end-startTime)); timing = false; } else if (m instanceof CancelMessage) { break; } } catch (InterruptedException e) { e.printStackTrace(); } } } } }

really active at one time. This is why the test is particularly good. It is not trivial to reproduce the functionality in plain Java in an effective manner. It really shows how the concept of Actors can be useful.

SimpleRing
- spawn 100: 11ms
- send 100M messages: 493073ms
Scala Actors
- spawn 100: 15 ms
- send 100M messages: 270104ms

UPDATE with slight changes plain Java rocks!

More Double Troubles

Tue, Jan 6, 2009 We saw in a previous entry how one has to be careful with Double.NaN. Today we will see how regular double can cause problems. By the way the NaN issue was not Java specific and this issue is also general in different programming languages.

A coworker was shocked that in Java (I was a bit surprised he saw that only today, but it is true it can be surprising that such a simple thing does not work as expected):

408.16 - 40.82 = 367.34000000000003

In C, this would lead to the same result. This is all due to the binary represention of double numbers. Using the formula 2^(exponent)*1.mantissa where mantissa is on 52 bits, we have

408.16 decomposition:

exponent = 256. Then 408.16/256 = 1.594375 = 1 + 0x9828F5C28F5C28F5C28F5C... * 2^-52
We round to 52 bits, the mantissa is 0x9828F5C28F5C3 = 2676827028518339.
As a decimal, the internal value is (2676827028518339/2^52+1) * 256 = 408.1600000000000250111042987555265426635742

40.82 decomposition:

exponent = 32. Then 40.82/32= 1.275625 = 1 + 0x468F5C28F5C28F5C...*2^-52
Rounded to 52 bits, the mantissa is 0x468F5C28F5C29 = 1241304647293993
As a decimal, the internal value is (1241304647293993/2^52 + 1)*32 = 40.8200000000000002842170943040400743484497

The difference in decimal becomes 367.34000000000002472... which becomes 367.34000000000003 when represented in binary (to convince yourself you can apply the same technique).

The Solution

One solution to this problem is to use java.math.BigDecimal which stores a number as 2 integers, one for the digits, one for the exponent power of 10 (and not 2).
The correct code would become:

BigDecimal value = BigDecimal.valueOf(408.16).subtract(BigDecimal.valueOf(40.82));

value would then be 367.34.

But BigDecimal has also many potential for bugs. For example, you should never use the constructor taking a double but always the one taking a String.

new BigDecimal(408.16) = 408.16000000000002501110429875552654266357421875

This is because of the binary representation of 408.16 as a double. 408.16 is only an approximation of 408.16!

Another trick with BigDecimal is not to use equals(Object) but compareTo(Object) because 408.160 is not equal to 408.16 using equals.

Why Could not They Make it Work With Double?

If you were too lazy to follow the steps of the explanation. There is a simpler explanation. Imagine the representation of a number in base 3 with 2 "digits". Let's imagine 1/3 is represented as 0.1 (this is a very simple number representation) 1/3+1/3+1/3 becomes 0.1+0.1+0.1 = 1.0 (in base 3) = 1.0 if we convert to base 10. Now in base 10, 1/3 can only be represented as 0.3, so 1/3+1/3+1/3 = 0.3+0.3+0.3 = 0.9 <> 1.0.
So BigDecimal is only interesting to handle ... decimals! In the enterprise world, this should be most apps. It is a bit sad it appeared so late in the JDK. It should really be a primitive type.