Threading in Cincom Smalltalk
Darren Hobbes has just started re-examining Smalltalk, and is looking at VisualWorks. He was worried about threading, so I figured I'd point a few things out:
- VisualWorks is single threaded (in general; the VM runs a few threads for housekeeping purposes, and you can use native threads for external API calls)
- In VW, Smalltalk processes are "green" - i.e., they run within the context of the single heavyweight process
- In ObjectStudio, Smalltalk processes map directly to Windows threads
Interestingly enough, you get far, far better control and predictability with the green threads - and a VisualWorks app (like this blog, for instance) scales quite nicely. Our engineering group just ported Opentalk from VW to ObjectStudio, giving the two products better interoperability - and the single biggest problem was with the native threading. If a native thread dies, you lose the entire application - as opposed to a VW thread, where you log the problem and move on. Earlier today, I introduced a bug into the server's comment handling (I was putting a throttling system in place). I forgot to update the form page, and - as a consequence - every process trying to add a comment threw an exception. These simply got logged by the server, which kept humming.
Here's another example of process usage - BottomFeeder. With the default settings, Bf spawns a new process for each http query done during the update cycle. I subscribe to 296 feeds now, so that can be a lot of processes. If those mapped to Windows threads, my machine would be brought to its knees - instead, I still have UI responsiveness.
Sometimes, you have to look beyond the conventional wisdom to see how scaling works - native threads are not the be-all, end-all that people think they are.


Comments
Threading
[Pensieri di un lunatico minore] January 16, 2005 0:04:17.973
Trackback from Pensieri di un lunatico minore
Threading
James Robertson talks about threading in Smalltalk, which was spawned by some discussion on the mailing list about threading, and......
Scalability is about increasing throughput not concurrency
[ade] January 16, 2005 16:47:56.948
Darren's experiments came out of a discussion we had at the office about the contrast between the increased productivity of Smalltalk and the availability of high-quality libraries in Java.
To take your example of spawning a new thread for every feed in an update cycle: this increases concurrency but not scalability. This is because you're likely to run out of available bandwidth if you have that many threads all trying to connect at the same time. A better approach (and incidentally the one I took with Aggrevator: my attempt at an RSS aggregator) is to identify how many concurrent threads are needed to make the best use of a shared resource like bandwidth. Spawning a new thread for every feed doesn't scale when you're subscribing to several thousand feeds (as I do) and a pool of 20 threads actually performs better than several thousand threads that spend most of their time contending for bandwidth.
Implementing a good threadpool isn't easy and I have no doubt that it would be easier in Smalltalk than Java. But in the Java world we don't build things like that. We just download them. In my case Doug Lea had built a library that did exactly what I needed and it had been proven in many production environments.
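A minimal sketch of the bounded-pool idea described above, assuming Java's standard java.util.concurrent package (which grew out of Doug Lea's util.concurrent library). The class, the method names and the sleeping fetch stand-in are hypothetical and are not taken from Aggrevator; the pool size of 20 comes from the comment, and the right number for a given connection would have to be measured.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: a fixed pool of workers shares one scarce resource
// (bandwidth) instead of one thread per feed contending for it.
final class FeedPoolSketch {

    // Outcome of one update cycle.
    static final class Result {
        final int completed;        // feeds whose fetch finished
        final int peakConcurrency;  // most fetches seen in flight at once
        Result(int completed, int peakConcurrency) {
            this.completed = completed;
            this.peakConcurrency = peakConcurrency;
        }
    }

    // Stand-in for an HTTP fetch (assumption: a real aggregator does network I/O here).
    static void fetch(String feedUrl) {
        try {
            Thread.sleep(2);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Queue every feed, but let at most poolSize fetches run at the same time.
    static Result updateAll(List<String> feedUrls, int poolSize) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        AtomicInteger inFlight = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        AtomicInteger completed = new AtomicInteger();
        for (String url : feedUrls) {
            pool.execute(() -> {
                int now = inFlight.incrementAndGet();
                peak.accumulateAndGet(now, Math::max);
                try {
                    fetch(url);
                    completed.incrementAndGet();
                } finally {
                    inFlight.decrementAndGet();
                }
            });
        }
        pool.shutdown(); // accept no new work; queued fetches still run
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("update cycle timed out");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new Result(completed.get(), peak.get());
    }

    public static void main(String[] args) {
        List<String> urls = new ArrayList<>();
        for (int i = 0; i < 500; i++) {
            urls.add("http://example.invalid/feed/" + i); // placeholder URLs
        }
        Result r = updateAll(urls, 20);
        System.out.println(r.completed + " feeds fetched, peak concurrency " + r.peakConcurrency);
    }
}
```

The number of feeds only affects the length of the work queue; the number of simultaneous connections stays capped at the pool size.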
Threading in Cincom Smalltalk
[ James Robertson] January 16, 2005 17:04:42.517
Comment by James Robertson
ade,
rather than use thread pooling (I considered it, but decided that it served no useful purpose), I've got a number of configurable options in BottomFeeder:
I tend to change my usage based on available bandwidth - single threaded in hotels on slow dialup, threaded when I'm on broadband. One thing to note - just because I subscribe to 300 feeds doesn't mean that I spawn 300 threads contending for bandwidth. There are feed level checks (based on meta data from the feed itself) on whether it's "time" for an update yet. As well, each query is a conditional-get (assuming the back end supports it).
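The two checks mentioned above can be sketched roughly as follows. This is Java rather than BottomFeeder's Smalltalk, and every name here is hypothetical; it also assumes the feed metadata is something like the RSS ttl element (minutes between refreshes), which the comment does not specify. The If-None-Match and If-Modified-Since headers and the 304 status are the standard HTTP conditional-GET mechanism.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of per-feed update checks; not BottomFeeder's actual code.
final class ConditionalGetSketch {

    // A feed is due once at least ttlMinutes have passed since the last check.
    static boolean isDue(long lastCheckedMillis, int ttlMinutes, long nowMillis) {
        return nowMillis - lastCheckedMillis >= ttlMinutes * 60_000L;
    }

    // Build conditional-GET request headers from validators saved on the
    // previous fetch; a null validator means the server never supplied one.
    static Map<String, String> conditionalHeaders(String etag, String lastModified) {
        Map<String, String> headers = new LinkedHashMap<>();
        if (etag != null) {
            headers.put("If-None-Match", etag);
        }
        if (lastModified != null) {
            headers.put("If-Modified-Since", lastModified);
        }
        return headers;
    }

    // 304 Not Modified: the server sent no body, so there is nothing to parse.
    static boolean isUnchanged(int statusCode) {
        return statusCode == 304;
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        long lastChecked = now - 45 * 60_000L; // checked 45 minutes ago
        if (isDue(lastChecked, 60, now)) {
            System.out.println("query with " + conditionalHeaders("\"abc123\"", null));
        } else {
            System.out.println("not due yet, no query sent");
        }
    }
}
```

A feed that is not yet due costs no request at all, and a due feed that has not changed costs only a header exchange, so the number of subscriptions overstates the bandwidth actually contended for.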
Scalability is expensive and has to be designed in
[ade] January 16, 2005 17:25:19.736
James, I installed BottomFeeder and was very impressed with the installer. I then imported 30 or so feeds from my aggregator and things were fine. I then tried to import about 5100 feeds and it crashed during the import phase after several hours. I'm trying it again, the CPU is at 100% usage and I expect it to crash soon. The same thing happens with just about all the other aggregators I've tried except for JetBrains Omea.
My essential point is that scalability is something you have to design for from the very beginning with an idea of just how much scalability you want and can afford. Then you can measure how scalable your application is because you started off with a scalability requirement.
I started off with the specific requirement that it should be possible to easily and safely import the OPML blogroll of someone like Robert Scoble or any of the others on the prolific subscribers list at Share Your OPML. With that metric in mind I've been able to evaluate my application and re-architect when necessary. On the other hand, I doubt if my aggregator would scale up to millions of feeds. But I know that's outside the application's scope and consequently I don't have to worry about it.
We all tend to be very fuzzy when we talk about scalability and we would find it easier to resolve scalability issues if we specified measurable concrete requirements.
Threading in Cincom Smalltalk
[ James Robertson] January 16, 2005 22:12:24.528
Comment by James Robertson
ade,
This is more a failure of imagination on my part than anything else - I never really considered a usage pattern of that many feeds :)
Threading in Cincom Smalltalk
[ James Robertson] January 17, 2005 2:30:03.327
Comment by James Robertson
As an aside, I just added a thread pool - while I was watching "Desperate Housewives". It's not released to the dev stream yet, since I'd like to test it some more. Wasn't a complex piece of work though...
Threading in Cincom Smalltalk
[ Martin Kobetic] January 17, 2005 14:29:34.744
Comment by Martin Kobetic
The original thinking behind the suggestion to simply spawn a Smalltalk process for each update request was based on the fact that Smalltalk processes are lightweight green threads. As such they are very cheap, and at the OS level they all run within a single native thread. The expectation is that they will be synchronized by the OS level read/write primitives. The desired effect is to have them send out the requests at the maximum possible speed and then just wait for responses to trickle in as they come. Adding a thread pool scheduler on top of the existing Smalltalk ProcessorScheduler seems kinda pointless and makes the code that much more complicated. VW can handle thousands of ST processes without any problem. The only thing that I can think of that could make a noticeable difference is if each suspended process had a large stack with mostly identical content associated with it, so the thread pool could maybe eliminate some of the memory overhead. But I wouldn't expect that to show up in something like BottomFeeder even with a few thousand feeds. Of course these are all just conjectures and the reality will likely show something different :-). Although in that case I'd suspect there might be other causes of trouble.