general

Moving blog to blog.epigent.com

December 6, 2010 14:21:39.804

I decided to move the blog to my own domain for Epigent at blog.epigent.com. The new blog will have the same focus on Smalltalk and related technologies. Please update your subscription to the new blog.

I want to thank Cincom, and James Robertson, for hosting my blog all these years.

general

"All existing revision control systems were built by people who build installed software"

November 5, 2010 8:08:56.223

Paul Hammond has an interesting presentation on how version control software is designed for "installed software", and how to deal with this in "deployed web applications".

Smalltalk

RoarVM Released

November 4, 2010 7:00:38.770

RoarVM has been released:

"RoarVM, formerly known as the Renaissance Virtual Machine (RVM) is developed as part of a IBM Research project to investigate programming paradigms and languages for manycore systems of the future. Specifically, this VM is meant to support manycore systems with more than 1000 cores in the future."

Note that RoarVM's approach to parallelization is based on a shared-memory model, while other Smalltalk multi-core approaches (like Polycephaly) are based on message passing between multiple images, each having its own memory space.
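To make the distinction concrete, here is a minimal Python sketch. Python threads stand in for cores or images, and this is not how RoarVM or Polycephaly are implemented. In the shared-memory version every worker updates one counter and must lock it. In the message-passing version each worker keeps private state and only sends its result as a message.

```python
import queue
import threading


def shared_memory_count(n_workers: int, increments: int) -> int:
    """All workers mutate one shared counter, guarded by a lock."""
    counter = [0]
    lock = threading.Lock()

    def work() -> None:
        for _ in range(increments):
            with lock:  # every write to shared state must be synchronized
                counter[0] += 1

    threads = [threading.Thread(target=work) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter[0]


def message_passing_count(n_workers: int, increments: int) -> int:
    """Each worker keeps private state and only sends its result as a message."""
    inbox: queue.Queue = queue.Queue()

    def work() -> None:
        local = 0  # private to this worker; nothing is shared
        for _ in range(increments):
            local += 1
        inbox.put(local)  # communicate by message, not by shared memory

    threads = [threading.Thread(target=work) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(inbox.get() for _ in range(n_workers))
```

Both return the same total; the difference is that the message-passing workers never touch each other's data, so no locking is needed while they run.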

Personally, I have higher hopes for a message-passing model, but all work in this area is interesting.

Smalltalk

VisualWorks 7.7.1 able to use 3.0 GB on Windows 7 64 bit

November 4, 2010 5:46:58.391

VisualWorks 7.7.1 (32 bit) can use 3.0 GB of memory when running on Windows 7 64 bit. Previously the memory limit for VisualWorks running on Windows was 2.0 GB.


The change in memory usage was introduced in the VisualWorks 7.7.1 VM by enabling the LARGEADDRESSAWARE flag for the executable.

To use up to 3 GB of memory, you must increase the memory policy’s “defaultMemoryUpperBound”. One way of doing this is to subclass LargeGrainMemoryPolicy and override defaultMemoryUpperBound like this:

defaultMemoryUpperBound
    "Allow the object memory to grow to 3 GB (3 * 1024^3 bytes)."
    ^1024 * 1024 * 1024 * 3

Please note that this memory policy will use memory aggressively. I will post more about this later.
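As a quick sanity check of the numbers, here is the arithmetic in Python. The 2 GB and 4 GB limits below are the standard user-mode address space sizes for a 32-bit Windows process without and with the large-address-aware flag on 64-bit Windows; the 3 GB bound falls between them, which is why the flag is needed at all.

```python
GiB = 1024 ** 3

# The value returned by the overridden defaultMemoryUpperBound method.
upper_bound = 1024 * 1024 * 1024 * 3

# Default user-mode address space of a 32-bit Windows process (2 GiB), and
# the full 32-bit address space a large-address-aware 32-bit process can
# use on 64-bit Windows (4 GiB).
default_limit = 2 ** 31
laa_limit_on_x64 = 2 ** 32

print(upper_bound)                                      # 3221225472
print(upper_bound / GiB)                                # 3.0
print(default_limit < upper_bound < laa_limit_on_x64)   # True
```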

The new memory limit works "out-of-the-box" on Windows 7 64 bit. For other Windows editions (Windows 7 32 bit, XP 32 bit/64 bit, Vista 32 bit/64 bit) it will probably be possible to configure Windows so that it supports more memory, but I have not tested this yet.

general

Evernote: "Starting from scratch"

October 28, 2010 3:02:57.665

Here is what Evernote has to say about using Windows .NET and WPF:

"Starting from scratch

Evernote 4 is a major departure from Evernote 3.5 in every way. While 3.5 added tons of great new features, there were some problems we simply couldn’t fix: the blurry fonts, slow startup times, large memory footprint, and poor support for certain graphics cards were all issues that the technology behind 3.5 (Windows .net and WPF) was incapable of resolving. As a result, we ended up chasing down platform bugs rather than adding the great features our users wanted.

So we decided to start over from scratch, with fast, native C++ that we knew we could rely on. As you’ll see, the results are amazing. This new version will set a foundation for rapid improvement."

Smalltalk

Running VisualWorks from RAM Disk

April 11, 2010 2:53:49.942

You might want to try using a RAM disk to speed up several operations in VisualWorks. I use RAMDisk from Dataram, which supports most editions of Windows, including 64 bit. Place your image (.im) and change file (.cha) on the RAM disk.

On my computer, loading code (non-binary) from a Store database on the LAN is twice as fast with a RAM disk as with a spinning disk. Saving the image and browsing code changes are also very fast.

RAM is very fast; hard disks are slow. SSDs (drives based on flash memory) are somewhere in between, but they can perform poorly compared to RAM, especially on write operations. My guess is that VisualWorks will gain more from using RAM disks than from using SSDs.

What about losing data? Most RAM disks can save their contents to disk. But any media can fail, including hard disks. So you need to have some kind of backup plan, regardless of whether you use RAM or magnetic media.

development

Misunderstanding the SOLID principles

February 7, 2009 17:39:13.333

In a recent podcast, with parts transcribed here, Joel Spolsky attacks the SOLID principles. The SOLID principles give advice on how to handle dependency management in software projects. They explain how to keep dependencies from getting out of control, and how to structure code in the various layers of a project.

Joel claims that following the SOLID principles produces too much code, and that developers who follow them spend an “(…) enormous amount of time writing a lot of extra code.”

I think it is generally accepted that dividing a project into modules helps the whole development process. The SOLID principles simply explain how to do this in practice. Joel should either state that he does not care about dependency management, or describe alternatives to the SOLID principles if he thinks they cannot keep dependencies under control. Keeping dependencies under control is hard, and you need to apply non-trivial techniques to accomplish it. If you think it is fine to treat each software project as a single module, then the SOLID principles are indeed a waste of time.

Joel does not fully understand the SOLID principles. He believes that following them will lead to an explosion in the number of classes required:

One of the SOLID principles, and I'm totally butchering this, but, one of the principles was that you shouldn't have two things in the same class that would be changed for a different reason [PDF]. Like, you don't want to have an Employee class, because it's got his name which might get changed if he gets married, and it has his salary, which might get changed if he gets a raise. Those have to be two separate classes, because they get changed under different circumstances. And you wind up with millions of tiny little classes, like the EmployeeSalary class, and it's just... (laughs) idiotic!

It seems clear that name and salary both belong to the same module, most likely the one commonly termed the “business layer”. There is then no need to create separate classes for these two properties. You should divide functionality along the layers/components of the problem domain, not per property, as Joel believes.

The linked article from Object Mentor specifically mentions an example where geometry and graphics code are mixed in the same class. The advice is to move the graphics code into a separate module, which depends on the geometry module. Then, changes in the graphics module will not affect the geometry layer. The article makes no claim that you should build the structure Joel describes.

Joel reads the article in the context of C++, without reflecting on the progress made in the tools he uses himself. C# and Visual Basic both support layering code into modules without making "millions of tiny little classes". The construct is named "extension methods" in C#, but the same idea has been used to build modules in Smalltalk projects since the mid-1990s: a module simply adds methods to existing classes defined in other modules.
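Here is a minimal sketch of that layering, written in Python for illustration (the Rectangle class and its methods are hypothetical, loosely following the geometry/graphics example). The geometry part knows nothing about drawing; the graphics part depends on geometry and attaches a draw method to the existing class instead of creating a new one. Unlike C# extension methods, which are resolved at compile time, Python attaches the method at runtime, so treat this as an analogy rather than an equivalent.

```python
# --- geometry "module": knows nothing about graphics ----------------------
class Rectangle:
    def __init__(self, width: float, height: float) -> None:
        self.width = width
        self.height = height

    def area(self) -> float:
        return self.width * self.height


# --- graphics "module": depends on geometry, not the other way round ------
def _draw(self: Rectangle, char: str = "#") -> str:
    """Render the rectangle as ASCII art, one row per unit of height."""
    row = char * int(self.width)
    return "\n".join(row for _ in range(int(self.height)))


# Add draw() to the existing class, much like a Smalltalk package extension
# adds methods to a class owned by another package.
Rectangle.draw = _draw
```

Code that only needs geometry can use Rectangle without ever loading the graphics part, and no EmployeeSalary-style helper class is needed.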

When the layering is not done per property and tools support layering without making new classes, following the SOLID principles is not … “idiotic”. You do not end up with a lot more code. But you get more control over how the various parts of your system interact.

Smalltalk

Supporting Multiple CPU Cores in VisualWorks

July 29, 2008 16:28:52.814

The VisualWorks Roadmap says Cincom will research how to make it “(…) easier to leverage multi-core computer”. Cincom will follow a share-nothing approach for multi-core support in VisualWorks. Their approach will therefore probably be to add better support for running multiple images.

But how should spawning new images happen? In a new OS thread per image? How many images will developers spawn, and when will it be done? Per task you need to parallelize, or during startup of the application?

As Anwar Ghuloum of Intel explains, there are two directions to take when scaling for multiple cores. I think they can be summarized as:

  • Parallelize the code to match the current hardware. This typically means spawning threads to match the number of cores you have at hand. You parallelize your code into n jobs in order to keep n cores busy.
  • Parallelize the code to match the domain problem. Erlang-style programming follows this approach. This style results in process counts far larger than the number of cores on today’s CPUs.
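The two directions can be sketched in Python as follows (process_order is a hypothetical stand-in for real work, and Python threads stand in for whatever concurrency unit the platform offers). The first function splits the work into exactly as many jobs as there are cores today; the second creates one concurrent unit per domain item, regardless of core count.

```python
import os
import threading
from concurrent.futures import ThreadPoolExecutor


def process_order(order_id: int) -> int:
    """Hypothetical stand-in for the real work done per domain item."""
    return order_id * order_id


def match_the_hardware(orders: list[int]) -> int:
    # One job per core we happen to have today: n chunks for n cores.
    n = os.cpu_count() or 1
    chunks = [orders[i::n] for i in range(n)]

    def run_chunk(chunk: list[int]) -> int:
        return sum(process_order(o) for o in chunk)

    with ThreadPoolExecutor(max_workers=n) as pool:
        return sum(pool.map(run_chunk, chunks))


def match_the_domain(orders: list[int]) -> int:
    # One concurrent unit per domain item, independent of the core count.
    results = [0] * len(orders)

    def handle(i: int) -> None:
        results[i] = process_order(orders[i])

    threads = [threading.Thread(target=handle, args=(i,)) for i in range(len(orders))]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(results)
```

Note that match_the_domain creates one OS thread per item, which becomes expensive as the number of items grows.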

Using the match-the-hardware approach means you target hardware that soon becomes dated. Anwar Ghuloum recommends programming “(…) for as many cores as possible, even if it is more cores than are currently in shipping products.” This means choosing the match-the-domain approach, which is what Cincom should try to support in VisualWorks.

A problem with the match-the-domain approach is that spawning one OS thread per domain process overwhelms the operating system. Erlang solves this nicely by using lightweight VM (Virtual Machine) processes that are executed by a pool of OS threads. The number of OS threads matches the number of cores.
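A rough Python approximation of this scheduling model: many small tasks are queued onto a fixed pool of OS threads sized to the core count, so the task count can grow without the thread count growing. This is only an analogy; the tasks here are plain queued callables, not preemptively scheduled Erlang processes.

```python
import os
import threading
from concurrent.futures import ThreadPoolExecutor


def run_lightweight_tasks(n_tasks: int) -> tuple[int, int, int]:
    """Schedule many small tasks on a pool of OS threads sized to the core count.

    Returns (sum of results, number of OS threads actually used, pool size).
    """
    pool_size = os.cpu_count() or 1
    used_threads: set[int] = set()
    lock = threading.Lock()

    def task(x: int) -> int:
        with lock:
            used_threads.add(threading.get_ident())  # which OS thread ran us
        return x * 2

    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        total = sum(pool.map(task, range(n_tasks)))
    return total, len(used_threads), pool_size
```

Running 10,000 tasks this way uses at most pool_size OS threads.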

VisualWorks should support letting multiple images share all static data, which would for example include all Smalltalk byte code. Letting each image consume several megabytes is too much. The question is whether the match-the-domain approach is feasible in VisualWorks at all. Without some smart work by Cincom, starting a new image will be many times as expensive as starting a new OS thread, and developers will then need to use the match-the-hardware approach.

Cincom should look at running multiple instances of a (single) image from one VM. All images should share their static data. The VM would use a pool of OS threads (possibly matching the number of cores) and schedule the execution of the images using its own algorithm. Starting a new image should consume fewer resources than starting an OS thread. Erlang consumes around 1200 bytes per process; maybe this is an impossible goal for Smalltalk. Basically, Cincom needs to look at Erlang and see if there are lessons to learn.
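To see why sharing static data matters, here is a back-of-the-envelope model in Python. The sizes are purely hypothetical (20 MB of static data such as byte code per image, 1 MB of private heap per image, 1,000 images); only the 1200 bytes per Erlang process comes from the text above.

```python
MB = 1024 ** 2


def total_memory(n_images: int, static_bytes: int, private_bytes: int,
                 share_static: bool) -> int:
    """Hypothetical estimate of memory needed by n images.

    static_bytes:  data identical in every image (e.g. byte code)
    private_bytes: heap each image needs for its own objects
    """
    if share_static:
        return static_bytes + n_images * private_bytes  # static data stored once
    return n_images * (static_bytes + private_bytes)    # static data copied per image


unshared = total_memory(1000, 20 * MB, 1 * MB, share_static=False)
shared = total_memory(1000, 20 * MB, 1 * MB, share_static=True)
erlang_bytes = 1000 * 1200  # 1,000 Erlang processes at ~1200 bytes each

print(unshared // MB, shared // MB)  # 21000 1020
print(erlang_bytes)                  # 1200000
```

Even with sharing, the per-image private heap dominates, which is why the Erlang figure looks so hard to reach.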

As I wrote earlier, a solution for scaling on multiple cores already exists for Squeak. The Hydra VM basically eases start-up of, and communication between, multiple images running in parallel. In Hydra, each image uses its own memory (no sharing of static data) and one OS thread. Cincom should aim for an approach similar to Hydra’s, but needs to improve on it.