Scala Again

Mon, Feb 6, 2012 I am trying Scala again. Last time, several years ago, I played around with it as a web tool, combining it with a Servlet Runner like Tomcat. This time, I play around with it for some quantitative finance experiments.

Why Scala? It still seem the most advanced alternative to Java on the JVM, and the mix of functional programming and OO programming is interesting. Furthermore it goes quite far as it ships with its own library. I was curious to see if I could express some things better with Scala.

Here are my first impressions after a week:

I like the object keyword. It avoids the messy singleton pattern, or the classes with many static methods. I think it makes things much cleaner to not use static at all but distinguish between object & class.
I like the Array[Double], and especially ArrayBuffer[Double]. Finally we don't have to worry between the Double and double performance issues.
I was a bit annoyed by a(i) instead of a[i] but it makes sense. I wonder if there is a performance implication for arrays, hopefully not.
I like the real properties, automatic getter/setter: less boilerplate code, less getThis(), setThat(toto).
Very natural interaction with Java libraries.
I found a good use of case classes (to my surprise): typically an enum that can have some well defined parameters, and that you don't want to make a class (because it's not). My use case was to define boundaries of a spline.
I love the formatter in the scala (eclipse) IDE. Finally a formatter in eclipse that does not produce crap.

Now things I still need time to get used to:

member variable declared implicitly in the constructor. I first made the mistake (still?) to declare some variables twice.
I got hit by starting a line with a + instead of ending with a +. It is dangerous, but it certainly makes the code more consistent.
Performance impacts: I will need to take a look at the bytecode for some scala constructs to really understand the performance impact of some uses. For example I tend to use while loops instead of for comprehension after some scary post of the Twitter guys about for comprehension. But at first, it looks as fast as Java.
I wrote my code a bit fast. I am sure I could make use of more Scala features.
The scala IDE in eclipse 3.7.1 has known issues. I wish it was a bit more functional, but it's quite ok (search for references works, renaming works to some extent).
Scala unit tests: I used scala tests, but it seems a bit funny at first. Also I am not convinced by the syntax that avoid method names and prefer test("test name"). It makes it more difficult to browse the source code.

Some things they should consider:

Integrate directly a Log API. I just use SLF4J without any scala wrapper, but it feels like it should be part of the standard API (even if that did not work out so well for Sun).
Double.Epsilon is not the machine epsilon: very strange. I found out somewhere else there was the machine epsilon, don't remember where because I ended up just making a small object.
Unit tests should be part of the standard API.

Overall I found it quite exciting as there are definitely new ways to solve problems. It was a while since I had been excited with actual coding.

KDE 4.8 finally has a dock

Fri, Jan 27, 2012 KDE 4.8 finally has a dock: you just have to add the plasma icon tasks. Also the flexibility around ALT+TAB is welcome. With Krusader as file manager, Thunderbird and Firefox for email and web, it is becoming a real nice desktop, but it took a while since the very bad KDE 4.0 release.

It is easy to install under ubuntu 11.10 through the backports and seems very stable so far.

Something quite important is to tweak the fonts: use Déjà Vu Sans instead of Ubuntu fonts, use RGB subpixel rendering, use Crisp desktop effects. With those settings, KDE looks very nice. It's sad that they are not default in Kubuntu.

Update March 2013: It's been a while now that it is in the standard Ubuntu repositories and I believe installed by default, one has just to remove the task manager widget add the icon task widget:

One can also change the settings using a right click (I find useful not to highlight the windows) and it can look like:

List of companies where I have been an employee

Wed, Jan 11, 2012

Intern:

Siemens (Berlin)
IBM (Boeblingen)
Osram Sylvania (Beverly, MA)

Employee:

Cap Gemini (Paris) working for Alcatel
Silicomp (Paris) working for Alcatel Nextenso
C2labs / one 0 development (San Francisco, CA) working for Whenmobile, Sony Pictures, GoPix, Technorati.
Credit Agricole alternative (Paris)
Prima solutions (Paris)
Esearchvision (Paris)
Ulink (Paris)
Edifixio (Paris)
Horizon (Paris)
Darty (Paris)
Calypso (Paris)

Generating random numbers following a given discrete probability distribution

Mon, Jan 9, 2012

I have never really thought very much about generating random numbers according to a precise discrete distribution, for example to simulate an unfair dice.

In finance, we are generally interested in continuous distributions, where there is typically 2 ways:

the inverse transform (usually computed in a numerical way),
and the acceptance-rejection method, typically the ziggurat.

The inverse transform is often preferred, because it’s usable method for Quasi Monte-Carlo simulations while the acceptance rejection is not.I would have thought about the simple way to generate random numbers according to a discrete distribution as first described here. But establishing a link with Huffman encoding is brilliant. Some better performing alternative (unrelated to Huffman) is offered there.

Quant Interview & Education

Wed, Dec 21, 2011 Recently, I interviewed someone for a quant position. I was very surprised to find out that someone who did one of the best master in probabilities and finance in France could not solve a very basic probability problem:

This is accessible to someone with very little knowledge of probabilities

When I asked this problem around to co-workers (who have all at least a master in a scientific subject), very few could actually answer it properly. Most of the time, I suspect it is because they did not dedicate enough time to do it properly, and wanted to answer it too quickly.

It was more shocking that someone just out of school, with a major in probabilities could not answer that properly. It raises the question: what is all this education worth?

The results were not better as soon as the question was not exactly like what students in those masters are used to, like for example, this simple stochastic calculus question:

My opinion is that, today in our society, people study for too long. The ideal system for me would be one where people learn a lot in math/physics the first 2 years of university, and then have more freedom in their education, much like a doctorate.

We still offered the job to this person, because live problem solving is not the most important criteria. Other qualities like seriousness and motivation are much more valuable.

Gnome 3 not so crap after all

Wed, Nov 30, 2011

In a previous post, I was complaining how bad Gnome 3 was. Since I have installed a real dock: docky, it is now much more usable. I can easily switch / launch applications without an annoying full screen change.

In addition I found out that it had a good desktop search (tracker). The ALT+F2 also does some sort of completion, too bad it can not use tracker here as well.

So it looks like Gnome 3 + gnome-tweak-tool + docky makes a reasonably good desktop. XFCE does not really fit the bill for me: bad handling of sound, bad default applications, not so good integration with gnome application notifications.

Now if only I found a way to change this ugly big white scrollbar…

Good & Popular Algorithms are Simple

Thu, Nov 17, 2011 I recently tried to minimise a function according to some constraints. One popular method to minimise a function in several dimensions is Nelder-Mead Simplex. It is quite simple, so simple that I programmed it in Java in 1h30, including a small design and a test. It helped that the original paper from Nelder-Mead is very clear:

However the main issue is that it works only for unconstrained problems. Nelder and Mead suggested to add a penalty, but in practice this does not work so well. For constrained problems, there is an adaptation of the original idea by Box (incredible name for a constrained method) that he called the Complex method, a deliberate pun to the Nelder-Mead Simplex method. The basic idea is to reset the trial point near the fixed boundary if it goes outside. Now this took me much longer to program, as the paper is not as clear, even if the method is still relatively simple. But worst, after a day of deciphering the paper and programming the complex method, I find out that it does not works so well: it does not manage to minimise a simple multidimensional quadratic with a simple bound constraint: f(X)=sum(X)^2 with X >= 0 where X is an N-dimensional vector. In the end, I don't want to work with such a simple function, but it is a good simple test to see if the method is really working or not. Here is how the function looks with N=2:

And if we restrict to x,y >= 0 it becomes:

I suspected an error in my program, so I decided to try with scilab, that has also the Box method as part of their neldermead_search functionality. Scilab also failed to compute the minimum in 8 dimensions of my simple quadratic.

I tried various settings, without ever obtaining a decent result (I expect to find a value near 0).

There is another algorithm that can find a global minimum, also very popular: the Differential Evolution. At first, being a genetic algorithm, you would think it would be complicated to write. But no, the main loop is around 20 lines.

Those 20 lines are a bit more difficult than Nelder-Mead, but still, when I saw that, I understood that "this is a classic algorithm". And it does work with constraints easily. How to do this is explained well in K. Price book "Differential Evolution", and takes only a few lines of code. Here is the result I got in dimension 29:
dim=29 min=1.2601854176573729E-12 genMax=412 x=[3.340096901536317E-8, 8.889252404343621E-8, 7.163904251807348E-10, 9.71847877381699E-9, 2.7423324674150668E-8, 2.4022269439114537E-9, 1.7336434478718816E-11, 7.244238163901778E-9, 1.0013136274729337E-8, 7.412154679865083E-9, 5.4694460144807974E-9, 2.3682413086417524E-9, 4.241739250073559E-7, 4.821920889534676E-10, 2.115396281722523E-9, 8.750883007882899E-8, 2.512011485133975E-9, 4.811507109129279E-9, 1.0752997894113096E-7, 5.120475258343548E-9, 8.404448964497456E-9, 4.1062290228305595E-9, 1.7030766521603753E-8, 5.589430643552073E-9, 8.237098544820173E-10, 3.5796523161196554E-9, 5.186299547466997E-9, 2.430326342762937E-7, 5.493850433494286E-9]

It works well, although there is quite a bit of parameters. I noticed that the strategy was especially important.

Gnome 3, Unity, Crap

Wed, Aug 17, 2011

After the upgrate to Ubuntu 11.04 I was directly on Unity. Having a dock on the left side is nice, but unfortunately, it has various bugs which makes it sometimes annoying. Also the menu on top like Mac Os X is not a bad idea, but it breaks many apps (for example Picasa). Then there is the scrollbar insanity, it’s almost impossible to click on to scroll, and again breaks in some apps (for example Eclipse). Fortunately one can disable all of that and go back to the old Gnome 2.

Then I decided to try Gnome 3 the past 2 weeks. I was curious that Linus found it so bad. At first it looked nice, probably because it’s different. But very quickly, it becomes annoying: the full screen changed just to switch window with the mouse is a stupid idea. And then installing Gnome 3 in Ubuntu is not such a great idea: Rhythmbox does not work well anymore with it, all my “file open” mapping was gone and replaced with garbage: why does it propose GIMP to open a PDF is beyond me.

Gnome 2 + a nice standard dock + widgets (or the nice unity indicators) + an indexer would have been perfect. Now I am back to KDE 4 which provides all of that, even if it feels a bit bloated.

Carmack & GPGPU programming

Sun, Aug 14, 2011

Finally someone who shares the same opinion on the current state of GPGPU programming.

John Carmack: On the other hand, we have converted all of our offline processing stuff to ray tracing. For years, the back-end MegaTexture generation for Rage was done with… we had a GPGPU cluster with NVIDIA cards and it was such a huge pain to keep. It was an amazing pain where one system would be having heat problems and would be behaving weird even though we thought they had identical drivers. Something would always be wrong with render farm 12, and whenever we wanted to put in new features it was like “Okay, writing new fragment programs to go into this.” Now, granted I did this just when CUDA was in its infancy. If I did re-implement it with OpenCL or CUDA we wouldn’t have some of these problems, but when I converted all these over to ray tracing there was a number of things that got a lot better. Things that we deal with, for example shadows and reflections that have to be approximated, and were so used to doing with rasterization… we sometimes forget how big of hacks these are. To be able to say I really just want that ray, and tell me what it hit; not do a projection with feathering shadowed edges and whatever the heck else we’re doing there, so much of the code got so much easier. If it’s a choice of… now that we have these awesome multi-core x86 CPUs where we can get 24 threads in commodity boxes… it’s true that one GPU card can do more ray tracing than one 24 thread x86 box, but it’s not multiples more and if it’s just a matter of buying more $2000 boxes, it makes the development, maintenance, and upkeep much better. While everyone in high performance computing is all “rah-rah” GPUs right now, I’ve come full circle back around to saying the fact that we can get massive amounts of x86 cores and threads… it wont win on FLOPS/watt or FLOPS/volume, but in terms of results per developer hour it is much, much better.

exp(y*log(x)) Much Faster than Math.pow(x,y)

Fri, Apr 8, 2011

Today I found out that replacing Math.pow(x,y) by Math.exp(yMath.log(x))* made me gain 50% performance in my program. Of course, both x and y are double in my case. I find this quite surprising, I expected better from Math.pow.