FloatArray rewrite
September 1, 2008, 11:38:14 pm

It took me the long weekend (on and off, I did other things too!) to redo the 7 lessons (now 6) using FloatArray, VertexArray and my new VectorN, MatrixN and ColorN classes, which are also subclasses of FloatArray.

I think it was a worthwhile rewrite. The objects lend themselves well to the way OpenGL actually operates. As a result, I decided to redo the GPU math test to see what numbers I could get now that things are easier to work with. The results are staggering. I could get a consistent 2000% to 3000% improvement using C to do the floating-point math, but the GPU easily wins on bigger data sets:

Runs      Size     CPU   GPU  Speedup
1000      1000       0     2       0%
 100     10000      12     7     171%
 100    100000     150    34     441%
  10   1000000    4254   157    2709%
   1  10000000  176291  1376   12812%
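
The Speedup column reads as CPU time divided by GPU time, expressed as a percentage, so 171% means the GPU run took roughly 58% of the CPU time. A minimal sketch of that calculation in C, assuming this is how the column was derived (the function name `speedup_percent` is hypothetical, and the time units are whatever the test reported):

```c
#include <stdio.h>

/* Hypothetical helper: speedup as a percentage, i.e. CPU time / GPU time * 100.
   Returns 0 when the GPU time is not positive, to avoid dividing by zero. */
double speedup_percent(double cpu_time, double gpu_time)
{
    if (gpu_time <= 0.0) {
        return 0.0;
    }
    return cpu_time / gpu_time * 100.0;
}
```

Fed the rows above, `speedup_percent(12, 7)` comes out at about 171 and `speedup_percent(150, 34)` at about 441. The two largest rows agree with the published figures to within one percentage point.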

By Claus on September 2, 2008, 4:49:55 am

Why does the GPU win on bigger data sets? Is there so much overhead?

By Michael Lucas-Smith on September 2, 2008, 5:43:52 am

Basically it comes down to the cost of objects. Each float is an object, and the math makes two floats for each float, one of them an intermediate float that will be garbage collected. Remember that we have 10 million of them in the final example, all processed inside a very tight loop. That means lots of new memory spaces, lots of objects being pushed into old space, lots of garbage generated and lots of object table entries: lots and lots of wasted time. The VM doesn't really stand a chance as you scale up.

But as we can see, even for fairly small numbers of floats (10,000 in the table) we still get an advantage using the GPU, which is actually a little surprising given the number of free cycles we have to waste on modern CPUs :)
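
For contrast with the object cost described above, the same kind of per-element arithmetic over a flat float buffer allocates nothing per element. This is a minimal sketch in C, not the code behind the numbers in the post. The operation (x * x + x) is an assumption, since the post does not say which math the test runs, and the function name `square_plus_self` is made up for illustration:

```c
#include <stddef.h>

/* Hypothetical per-element operation: out[i] = in[i] * in[i] + in[i].
   The intermediate product is a plain local value, so nothing is
   allocated per element and nothing is left for a garbage collector. */
void square_plus_self(const float *in, float *out, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        float squared = in[i] * in[i];  /* intermediate value, not a heap object */
        out[i] = squared + in[i];
    }
}
```

Called with an input buffer of 1, 2, 3 it fills the output buffer with 2, 6, 12. However large `count` gets, the only memory involved is the two buffers the caller already owns.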