David Buck reports on the next Ottawa Smalltalk User's group meeting, which will be held in conjunction with the local Python group's meeting. Sounds like fun.
I love it when an idea comes all the way back around after a multi-year detour. Have a look at this post from Dave Roberts on binary XML:
Essentially, everybody is finally realizing that while XML is the first widely accepted data markup format, it's a pig. It's verbose and redundant, which makes it store poorly, transmit slowly, and parse, well, like a pig. Don't get me wrong, the original idea for XML was actually fine, but everybody has taken it way over the top and things like SOAP are just an abomination.
Well, one way to help XML while still retaining XML semantics is to make a binary version of it. While a compression program like gzip can reduce the overall transfer and storage size of an XML document, an XML parser would still have to deal with the XML textual format on either end of a transfer. And that textual XML format actually expands binary data carried in an XML document by forcing it into BASE64 format which does a 3-goes-to-4 encoding.
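The 3-goes-to-4 expansion is easy to see with a little arithmetic. Here's a sketch in plain Smalltalk (nothing library-specific, just the size math):

```smalltalk
"Base64 encodes every 3 input bytes as 4 output characters,
 so the encoded size is (byteCount / 3) ceiling * 4 - a 33% expansion."
| encodedSize |
encodedSize := [:byteCount | (byteCount / 3) ceiling * 4].
Transcript showCR: (encodedSize value: 3000) printString  "3000 bytes become 4000 characters"
```

That third-again overhead is before you account for the XML element tags wrapped around the payload.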
The reason XML took off had far more to do with the openness of Port 80 than with anything else. RSS? Nowhere without the open ports. SOAP? Nowhere without the open ports. So now we want to create a binary encoding for XML? Heck, why not just run a CORBA broker on port 80? We could have done that 7 years ago and called it a day...
Is there any information domain where the open, world-writable wiki model can't be beneficially applied?
We're sure to find out, as wikis and wiki-variants appear everywhere.
When frustrated with DMOZ, I've often wished for a more radically open web directory, with submissions and categorizations from anyone, at any time, like a wiki.
Community moderation would curb the worst abuses. Those wishes have been answered: the latest from the folks behind Wikipedia is Wikia, which applies the wiki philosophy to a search index of web sites.
I'm not sure that moderation is enough. Have a look at the Recent Changes page on the VW Wiki at UIUC - see all the "changes" over the last few weeks? There's been a flood of daily spam attacks, with the corresponding promotion of good versions of the pages back up. Gordon notes this:
Why not let any contributor instantly add sites -- even individual pages within sites -- and reorder the results of any search based on users' perception of sites' appropriateness to the query? Well, spammers and system-abusers and ranking-wars, I guess. But could open feedback systems be devised that keep those problems suitably in check? It's worth a try!
I don't know what the Wikipedia site does to prevent spam; maybe they have something like Spam Assassin hooked up on the back end. Relying on user moderation is an endless, thankless task though - and I rather suspect that the maintainers will run out of steam before the spammers do. It really, really sucks how the a******* have ruined the entire neighborhood this way.
The Titan probe landed successfully yesterday, and has started sending back data on what it's finding. I love this description of the terrain:
Data sent back by the Huygens space probe from the Saturnian moon Titan show a frozen, orange world shrouded in a methane-rich haze with dark ice rocks dotting a riverbed-like surface the consistency of wet sand, scientists said on Saturday.
Kudos to the European Space Agency, Italian Space Agency, and NASA for their work on this.
ComputerWorld reports another nail in the Itanium's coffin - no more Windows XP for that platform:
Microsoft Corp. has pulled the plug on a version of Windows XP for workstations running Intel Corp.'s Itanium 2 processor. The move follows the decisions by major hardware suppliers to stop building workstations based on the 64-bit chip.
I see that Larry O'Brien is all excited about AOP - specifically, about being able to wrap methods:
Aspects are classes whose methods are injected into existing programs at "join points," the most common of which are method entry and exit. Another way to put it is that aspects intercept method invocations and surround the existing method with new behavior. The other critical feature of aspects is that they do this without any modification of the existing source code whatsoever.
I love the way people using Java and .NET get all excited over things that were invented eons ago in Smalltalk (or in many cases, in Lisp). Take a look at this paper from John Brant, Don Roberts, and Ralph Johnson. That dates to the mid 90's, but if you look at the references at the bottom, you'll see that it's all based on previous work.
The excitement never ends for curly brace programmers, because the past simply doesn't exist...
I just ran across an interesting article by Allen Holub on Open Source, and some qualms he's developing over it:
My most recent experiences with Tomcat demonstrate my dilemma. The newest version of Tomcat was just plain broken. It didn't load and run custom tags correctly, for example. Though I could (in fact did) go to an earlier version, the fact that Tomcat wasn't subjected to even rudimentary regression testing is disturbing.
The next issue is one of documentation and developer attitude. I wanted to use a local-configuration-file mechanism that's pushed heavily in the documentation (context.xml), and I was deploying from the development directory using the Ant build.xml file that's shipped with Tomcat. This process simply doesn't work.
I started with the documentation to find a solution. Tomcat's documentation is virtually unusable, though. It's a hodgepodge of inadequate .html files. There's no way to print it. There's no real organizational principle, index or search mechanism. A lot of the documentation is written by developers for other developers, so it is sketchy at best, and incomprehensible to a new user.
This isn't surprising. A lot of the necessary but "drudgery"-related tasks for a software project simply don't get done unless someone is paying for them (either explicitly or implicitly). Documentation is one of the best examples - and before you point to Eclipse or Apache, look at the well-funded foundations that back those projects - someone is paying for that drudge work. The same thing applies to usability-related tasks (build procedures, UI, etc.) - once the lead developer(s) have a system that works, they tend to stop - it's no longer an interesting problem, so it falls by the wayside. In a funded effort, you end up with complaints coming in that might affect revenue - so the problems tend to get addressed.
Note that funding is separate from licensing - this isn't a failure of Open Source per se. It's endemic to Open Source simply because most Open Source projects are not funded. Allen hits the punch line here:
The bottom line is that you have to do a lot of work to use an open-source product like Tomcat. You have to test new versions rigorously; you have to compensate for inadequate documentation by using a high-volume mailing list that may or may not provide an answer in a day or two; you have to trust amateurish code, which you can't really repair yourself because your version will no longer be "standard." Even if you could repair the code, progress on your real project would slow down.
All of these problems translate directly to wasted time and wasted money. The Tomcat team has no incentive to fix any of this because they're not being paid and have no market pressure on them to improve quality or provide real customer support. I know that the open-source party line is that third parties will step up to fill these gaps, but then you're paying for the software, aren't you? So much for the price advantage.
There's a danger there even to funded open source efforts that Allen doesn't address - I wrote about that here. To summarize, take an open product of significant scale - JBoss, for example. You can buy service and support from the company itself, but getting the product itself is free. There's nothing stopping another entity from setting up support for that product at a lower price - and at lower cost. The JBoss group has to carry the cost of both developers and of support staff - a third party need only carry the cost of support. In a non-open licensed product, the vendor can still recover their costs via license sales, even if their support gets cannibalized. What's the Open Source vendor going to do in that case? I rather suspect that they'll get blown out of the water.
There's no free lunch for this stuff - if you expect to get everything for free, you might want to ask yourself why your company doesn't give its products away.
The comment spammers keep trying - there was a spate of it this afternoon - 15 on one post. The amazing thing is how much effort the guy doing it went through - he was using the web form, and it was clearly not a bot - because there was just over a minute between each spam post. I've added some more checking to the server that should put a stop to that kind of nonsense.
I decided to take a look at my RSS feed stats - it's been a while since I looked at them, and I've increased my readership since then. There are four tools that dominate my feed stats right now:
- Net News Wire: 21.5%
That makes up almost 84% of my readership right there. I was a little surprised by the Net News Wire totals - that's a Mac tool, and the number of hits from that tool was near nil just a few months ago. Am I getting more Mac users, or are more people buying Macs? Or both?
I create content in part to promote my law firm, which I cannot do effectively if my contact info is removed. I do not participate in targeted advertising programs because the majority of advertisers that target the keyword 'trademark' are competitors. I cannot prevent such advertising when my page is reproduced and 'framed' by a third party.
Now have a look at his site - see that RSS feed icon? What does he think happens when people subscribe to it? Sheesh...
Darren Hobbes has just started re-examining Smalltalk, and is looking at VisualWorks. He was worried about threading, so I figured I'd point a few things out:
- VisualWorks is single threaded (in general; the VM runs a few threads for housekeeping purposes, and you can use native threads for external API calls)
- In VW, Smalltalk processes are "green" - i.e., they run within the context of the single heavyweight process
- In ObjectStudio, Smalltalk processes map directly to Windows threads
Interestingly enough, you get far, far better control and predictability with the green threads - and a VisualWorks app (like this blog, for instance) scales quite nicely. Our engineering group just ported Opentalk from VW to ObjectStudio, giving the two products better interoperability - and the single biggest problem was with the native threading. If a native thread dies, you lose the entire application - as opposed to a VW thread, where you log the problem and move on. Earlier today, I introduced a bug into the server with comment handling (I was putting a throttling system in place). I forgot to update the form page, and - as a consequence - every process trying to add a comment threw an exception. These simply got logged by the server, which kept humming.
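The pattern that saves the server is plain Smalltalk exception handling around the forked work. A sketch (addComment: and logError: are hypothetical names standing in for the blog server's actual methods):

```smalltalk
"Each request runs in its own green thread; a failure gets
 logged and only that thread dies - the server keeps running."
[[self addComment: aComment]
	on: Error
	do: [:ex |
		self logError: ex description.
		ex return]] fork
```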
Here's another example of process usage - BottomFeeder. With the default settings, Bf spawns a new process for each http query done during the update cycle. I subscribe to 296 feeds now, so that can be a lot of processes. If those mapped to Windows threads, my machine would be brought to its knees - instead, I still have UI responsiveness.
Sometimes, you have to look beyond the conventional wisdom to see how scaling works - native threads are not the be-all, end-all that people think they are.
Dave Winer claims that one of the things blogging addresses is the conventional wisdom problem that the "professional" media have:
At the time I started blogging, the pros were reporting that there was no new Macintosh software. I would call these reporters and point out that there was lots of new Mac software, they were using it, they knew about it. They would respond by saying Everyone knows there's no new Mac software.
I don't think they knew they were being dishonest; by then reporting wasn't about facts, it was about conventional wisdom. If CW said there was no new Mac software then the reporters would report that. This meant that competitors didn't actually have to win in the market, that's much harder, they just had to convince the reporters that they had. This leads to not only a very wrong place, but a dangerous one, because the reporters had come to have so much power. With no accountability, no way to vote them out of office, we were totally controlled by them. That's why blogging came about, as a counter-action to the corruption of the professional system.
This isn't a problem limited to reporters, professional or otherwise. I can't recall who said this, but it's applicable here: "It's not what you know that kills you, it's what you know that isn't so". The "conventional wisdom" pops up all over the place, all the time. Winer's Mac example is perfect - so is the "Isn't Smalltalk dead?" question that comes up a lot. Heck, Winer contributed to this himself a little while back - look at this post where he simply assumed (wrongly, as it turned out) that a scaling issue was due to using a scripting language instead of a "real" one. There's a piece of conventional wisdom that has slowed progress in the IT sector for decades now.
This isn't a problem simply of professionals versus the "squeaky clean" blogger heroes - it's about the unrevised assumptions we all make about everyday things. We all have our biases, and plenty of them are uninformed - at best.
I've been wondering for a while about some of the referer spam I get - the links either point nowhere, or they point to an "account closed" type of page. This was baffling before I ran across this eweek story - it's an unintended consequence of the CAN-SPAM act:
One troublesome technique finding favor with spammers involves sending mass mailings in the middle of the night from a domain that has not yet been registered. After the mailings go out, the spammer registers the domain early the next morning.
By doing this, spammers hope to avoid stiff CAN-SPAM fines through minimal exposure and visibility with a given domain. The ruse, they hope, makes them more difficult to find and prosecute.
The scheme, however, has unintended consequences of its own. During the interval between mailing and registration, the SMTP servers on the recipients' networks attempt Domain Name System look-ups on the nonexistent domain, causing delays and timeouts on the DNS servers and backups in SMTP message queues.
I think I'm seeing a variation on that - either the new domain isn't up when I see the ref, or the domain has been taken back down. This is getting to be like the old "Spy vs. Spy" routine in Mad magazine. How much damage are these bozos doing to the network commons? Here's a thought:
"We have to figure out how to taper DNS services gracefully rather than having catastrophic failures," said Paul Mockapetris, the author of the first DNS implementation and chief scientist at Nominum Inc., based in Redwood City, Calif. "Mail look-up was the first application put on top of DNS after I designed it, and I was so excited to see that. And now, 20 years later, people are trying to figure out how to stop doing mail look-up on DNS. It's bizarre."
Yes, that's a provocative title. I was reading this piece by Jim Rapoza of eweek - and I ran across this:
But why should we expect it to be? Face it: The bad coders are winning. They've convinced users and companies that bugs, security holes and patches are inevitable, and everyone just shrugs their shoulders and accepts that - no matter how bad things get.
But it doesn't have to be this way. All of us have seen even large, complex applications with source code that's clean, free from bugs and secure. All it takes to write good code is the desire to do so, but there really isn't any incentive for software companies to write clean, secure code.
It's not simply a matter of desire, it's a matter of incentives. Part of it is what he says in the next paragraph - end users of software want new features and functions more than they want anything else. I don't think that's all of it though. A large part of it is price. Look at what's driving the industry today - open source and outsourcing, both of which (from an IT management perspective) are about cost control. Secure code? Way, way down the priority chain. If we can get systems done for $15 an hour, have at it!
You won't start seeing secure code until end users are willing to pay for it. At present, it's pretty clear to me that most aren't willing to.
I just made a modification to the development update stream that you'll notice - on Windows, "slim mode" now goes to the Windows tray instead of to a smaller window. This is more in line with what Windows users expect, and it gets the application completely out of the way. The tooltip will change when there's new content, so you'll be able to tell what's up. If I can figure out how to add another icon to the executable with ResHacker, I'll see about changing the tray icon when there's new content as well.
This post piqued my interest - a Java developer asks a question about Groovy:
Well... the Codehaus site tells me a lot about what I guess I need to know about the language... but I'm still wondering why I'd want to use it. The home page says:
Groovy is a new agile dynamic language for the JVM combining lots of great features from languages like Python, Ruby and Smalltalk and making them available to the Java developers using a Java-like syntax.
Okay... great features, I guess, except I don't know what the great features from Python, Ruby, and Smalltalk are that I'm looking forward to. Agile and dynamic here look like buzzwords, although they both have real meaning.
The next paragraph:
Groovy is designed to help you get things done on the Java platform in a quicker, more concise and fun way - bringing the power of Python and Ruby inside the Java platform.
Um. A quicker, more concise, and fun way... when I don't really have a problem doing quick, concise things in Java as it is. (Maybe my definitions are poor.) The power of Python and Ruby still doesn't do much for me.
This points out something I commented on in a post Joel made awhile back - where Joel claimed that "millions of developers" had seen Lisp and rejected it. What I'm seeing is that lots of developers have never so much as seen anything outside the C family (with the possible exception of Basic). That's why we see posts like the one above - and I'm not pointing at it in order to ridicule it. Simply put, many developers can't imagine what benefits a dynamic language might have because they've never seen one. To them, it's all about what they learned in C and its various cousins.
This tells me that we - the proponents of dynamic languages - have a rather large education issue in front of us. It's not so much that people disagree with us about dynamic languages vs. static ones (although there's plenty of that) - it's that they have no idea whatsoever as to what a dynamic language is.
An interesting discussion of scalability started up in this comment thread after I posted on Smalltalk threads. One of the commenters pointed out that BottomFeeder had trouble importing his feeds - he had 5100 of them. The interesting thing is that he immediately assumed that it was due to the spawning of 5100 threads - he saw CPU spiking and a crash.
Now first off, I'll admit that I've never considered the use case of 5100 subscriptions :) Having said that, threads really aren't the problem here - memory usage is. Go open the settings in BottomFeeder, and look at the Memory page. There's a soft limit on memory, and a hard limit. Once BottomFeeder hits the hard limit, any request for additional memory will result in garbage collection - and eventually, if the problem persists, an out of memory exception. Now, it can get ugly at that point - it takes memory to put up a dialog stating "out of memory", so it's possible to have a complete crash. You can customize the memory policy to deal with that (assuming that actual real or virtual memory is available, of course).
Here's what's going on - the base BottomFeeder application consumes 30MB before you load any feeds. Yes, that's fairly "hefty", but I have what amounts to a "kitchen sink" image with lots of libraries pre-loaded - it's made some of the additional features I've added to the app much, much simpler. I subscribe to 300 feeds, and they are all kept in memory. As I'm sitting here, Bf is consuming an extra 30MB over its starting base. To figure out why, you have to understand this:
- How VisualWorks memory management works
- How BottomFeeder keeps feed and item data around
- What happens during the BottomFeeder update cycle
There's fairly extensive documentation on the memory model in our Application Developers Guide - for our purposes here, I'll explain parts of it. There are three areas of memory relevant to our discussion here:
- New Space - where new objects are "born"
- Survivor Space - 2 memory zones where objects that survive the first pass GC in New Space end up
- Old Space - As the Survivor Spaces fill, objects "tenure" into Old Space. Unless there's manual intervention or a hard memory limit, Old Space simply grows and is not scavenged
Now, let's look at an update cycle. BottomFeeder loops over the N subscriptions, and if threaded updates are on, spawns N threads. Each thread looks at the feed in question, and does the following:
- Is it time to check for an update? Check the meta data in the feed (if any). Possibly answer no, and generate no update
- If the answer was yes, issue a conditional-http get (assuming the feed supports that). If that answers no new stuff, the thread ends
- If there is new stuff, we now have an HTTP response object (consuming memory)
- We now parse the XML we got into an XML doc (consuming memory)
- We now do an XML to Object conversion, creating a set of items for the feed (consuming memory)
- Those items are merged into the existing feed, and this thread terminates
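The steps above can be sketched roughly like this (all the selector names here are made up for illustration; they aren't BottomFeeder's actual methods):

```smalltalk
updateFeed: aFeed
	"One update thread's work - each step below allocates transient
	 objects that should die young, before tenuring into Old Space"
	| response doc newItems |
	aFeed isDueForCheck ifFalse: [^self].
	response := aFeed issueConditionalGet.      "HTTP response object"
	response hasNewContent ifFalse: [^self].
	doc := self parseXml: response contents.    "XML document"
	newItems := self itemsFrom: doc.            "item objects"
	aFeed mergeItems: newItems                  "only these should survive"
```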
That's a noticeable amount of memory for each update that comes back from the server, although much of it is transient objects - i.e., objects that should never make it into Old Space. In older revs of Bf, I didn't have New Space set big enough, and memory usage tended to grow after the first set of updates and stabilize high - too many objects tenured. I now have that handled, at least for subscription sizes in the neighborhood of mine and smaller. For 5100 though? I suspect that a lot of tenuring is happening, and is subsequently slamming into the hard upper limit on memory - not to mention that the persistent objects likely consume enough space to stress the default upper bound.
So let's return to the example - 5100 threads answer back with well over 1000 responses, generating an equal number of XML documents, and a large number of items. That likely stresses both the size of New Space and the default memory bound - causing a scaling problem. Note how threads - which are very, very cheap in Smalltalk - don't really enter into it. Now, creating a thread pool for a large number of feeds might cut down on some of the CPU spiking, but it's not really an issue in the problem at hand.
In the case at hand, setting the upper bound of memory higher - to something like 400 or 500 MB - would allow Bf to handle that number of subscriptions. The hard part is the tenuring. While Old Space size limits can be configured on the fly, New Space and the Survivor Spaces can't be - to change them you have to save the image and restart (something that Bf does not do in its deployed state). I need to ask our VM guys how feasible it would be to allow runtime resizing of those.
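In VisualWorks that upper bound lives on the current memory policy, so raising it looks something along these lines. I'm quoting the selector from memory - check the Application Developer's Guide for the exact name in your release:

```smalltalk
"Raise the hard memory limit to roughly 400MB (selector name
 is from memory and may differ by VisualWorks release)"
ObjectMemory currentMemoryPolicy memoryUpperbound: 400 * 1024 * 1024
```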
In any case, a thread pool isn't really the issue here. Having said that, I created one as an optional thing for BottomFeeder last night. I noted that a commenter pointed out that Java developers didn't need to roll one, since they can just download one. That's as may be, but I was able to create one in about an hour while most of my attention was on "Desperate Housewives" (a true guilty pleasure if ever there was one). A general thread pool would have been more trouble anyway - I created a standalone pool implementation first and discovered that the hard way :)
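For what it's worth, a minimal fixed-size pool really is about an hour's work in Smalltalk. Here's a sketch built on SharedQueue - assume feeds is the collection of subscriptions, and update is a stand-in for whatever work each one needs:

```smalltalk
"A fixed pool of worker threads draining a shared work queue.
 A nil in the queue is a sentinel telling a worker to shut down."
| queue poolSize |
queue := SharedQueue new.
poolSize := 10.
feeds do: [:each | queue nextPut: each].
poolSize timesRepeat: [queue nextPut: nil].
poolSize timesRepeat:
	[[| feed |
	  [(feed := queue next) notNil] whileTrue:
		[[feed update] on: Error do: [:ex | ex return]]] fork]
```

SharedQueue>>next blocks until work is available, so the workers need no explicit synchronization of their own.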
With my first post of the day coming after 3:00 (US EST time), it looks like I've been working all day. Nothing could be further from the truth :) I bought the latest Turtledove book, "Homeward Bound". It's part of the World War/Colonization train he was on before the civil war/WWI extravaganza, and I'd been waiting for it. Action fans will be somewhat disappointed; it's not filled with the sounds of battle like the previous ones were. Nevertheless, it's been interesting, and it kept me up until 4 AM last night...
Tim Bray explains some of the weird referer spam we've all been seeing:
Near as I can tell, pretty well every somewhat-visible website in the world is seeing its logfiles fill up with bogus page fetches there only as a vehicle for a spammish "referrer" field; whether or not the site posts referrer data. This high-volume flood is a fairly recent phenomenon, and what makes it weird is that the vast majority of the bogus referrer sites are off the air due to some terms-of-service violation. It would appear that a sleazebag somewhere launched a really ambitious assault on the whole world -- using, I can only assume, a few zillion zombified drone machines -- only to be found out and have their hosting yanked while their mindless slaves continue to spew vacuous venom into logfiles everywhere. Damn, the Internet is a weird place.
I ran across an interesting requirement today - I needed to enable/disable individual settings in the BottomFeeder settings tool. That tool is simply a user of the new settings framework created by Vassili. In a typical VW interface, enabling/disabling widgets isn't hard - you do something like this:
(self builder componentAt: #idOfWidgetHere) enable.
It gets trickier when the widget is in a subcanvas - the builder used to create the subcanvas isn't typically kept, and the widgets in it aren't tracked by the builder in the outer UI. Typically, one solves that by grabbing a copy of the builder used for the subcanvas and caching it. Each page of the settings tool is a subcanvas, so I had to accomplish the same thing. Here's where I had to do some digging. In class SettingsManager, there's a method that swaps the pages:
pageSelected
	| page tree |
	page := self pageListHolder selection.
	page isNil ifTrue: [^self].
	tree := self pageListHolder list.
	((tree isExpandable: page) and: [(tree isExpanded: page) not])
		ifTrue: [tree expand: self pageListHolder selectionIndex].
	(self widgetAt: #GroupBox)
		label: ((TreeViewIndentedLabelAndIcon with: page label) icon: page listIcon).
	"This is the only kind of label with a sane icon positioning."
	page builder: nil.
	(self widgetAt: #Subcanvas)
		client: page
		spec: page spec
		builder: self builder newSubBuilder.
	self updatePageSubcanvasScrollbar
The actual swapping happens in the #client:spec:builder: message send. So, to do what I needed to do, I changed that section as follows by overriding that method in my subclass:
	newBuilder := self builder newSubBuilder.
	(self widgetAt: #Subcanvas)
		client: page
		spec: page spec
		builder: newBuilder.
	self updatePageSubcanvasScrollbar.
	self possiblyAdjustEnablementFor: newBuilder onPage: page
By holding a reference to the builder, I can call out to a new method and check which page we are on - and also check the applicable application state. From there, I can grab the relevant widgets and enable/disable them. Here's what that method looks like:
possiblyAdjustEnablementFor: newBuilder onPage: page
	"if we are in certain application states, disable some settings"
	RSS.RSSFeedManager default isUpdating ifFalse: [^self].
	page id = #(#rss #network) ifFalse: [^self].
	"This grabs each wrapper for the thread options and disables
	them if we are in the update cycle"
	#(#'-rss-network-runThreadedUpdates'
	  #'-rss-network-shouldSpreadUpdates'
	  #'-rss-network-shouldThrottleThreads')
		do: [:each | (newBuilder componentAt: each) disable]
Those IDs are not pretty - they are manufactured by the settings framework, and I figured out what they were with a breakpoint and an inspector. In any case, the method checks the application state and the page, and then toggles the appropriate settings based on those. In this case, I'm making sure that the threading/non-threading of updates cannot be mucked with while the update process is awake and running. In general, it looks like it's easy enough for any application to enable/disable settings options via this strategy.
aPage
	when: version valueHolder
	valueSatisfies: [:v | v = #latest]
	enable: level
The first argument is a value model. Whenever it changes, the block passed as the second argument runs, and the module passed as the third argument is enabled or disabled depending on whether the block returns true or false.
As usual with Vassili's designs, it's simple and makes sense.
Here's an interesting article on usability concerns. In looking at a US Air Force document from 1986 on creating good user interfaces, the authors discovered that most - 70% - of the guidelines are still applicable. Things haven't changed as much as we might have thought - this document was focused on the development of "green screen" mainframe applications. The key point is here:
You would be hard-pressed to find any other Air Force technical manual from 1986 that's 70% correct and relevant today. Whether for pilots, airplane engineers, or programmers, general lessons of the past might continue to apply, but the specific guidelines changed long ago.
Usability guidelines endure because they depend on human behavior, which changes very slowly, if at all. What was difficult for users twenty years ago continues to be difficult today. People can only remember so many things, and we don't get any smarter.
No matter how much spiffier we think UIs are now compared to then, the basics really haven't changed that much.
There's a new e-book out on RSS for marketers: "Unleash the Marketing and Publishing Power of RSS" by Rok Hrastnik. I was interviewed for the book - you can buy the book here:
It's a pretty good summary of what RSS is and how it works, but not at the technical implementors level. This is aimed at marketing and sales folks who would like to know what the buzz about syndication is about, and how it can be used.
Darren Hobbs asks about threading strategies in Smalltalk relative to the ones he knows in Java:
For example with the latest release of Java you might use the nonblocking IO library and one thread per cpu, keeping all the cpu's maxed out without thread-switching, so your only bottleneck is available memory (and browser timeouts). Is there an idiomatic smalltalk equivalent?
My usual tactic for maximising throughput is to minimize the number of thread/processes running, and only hand off to another process when you would otherwise have blocked, for example on IO. Nonblocking IO support in java allows me to use just one thread for all an application's network IO, and keep the rest of the threads as busy as bandwidth allows. I don't know how to do this in smalltalk, nor even if I'm worrying about the right problem in smalltalk
The primary difference (at least with respect to VisualWorks) is the nature of threads in the language. In Java, threads (usually) map directly to OS level threads. That means that you have to manage them carefully; create too many and you'll just spike all available CPUs. In VisualWorks, threads are lightweight (green) threads within the context of a single heavyweight Smalltalk process. That makes them nearly free from an overhead standpoint.
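Cheap enough, in fact, that spinning up a thousand of them in a workspace is a non-event:

```smalltalk
"A thousand green threads, each sleeping for a second - this barely
 registers on the VM, where a thousand native threads would hurt."
1000 timesRepeat:
	[[(Delay forSeconds: 1) wait] fork]
```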
In the context of something like a web server, this is especially true - the server I host the twenty Cincom Smalltalk blogs on is running the VisualWorks web application server, and handles a decent amount of traffic. Here's how things work in the server itself:
There's a listener process - this one sits on the application server's inbound port, waiting for connections. It runs at a high priority (70, where the scale in VW runs from 1 to 100). As connections come in, the server determines who (i.e., which web application) should handle each one. As soon as it determines that, it forks off a Smalltalk process at 50 to handle that connection, and then goes back to listening. This scales pretty well; we have yet to see any kind of issue in this server, and many of our customers handle much higher loads than we do.
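In VisualWorks terms, that listener/worker split is only a few lines of code. Here's a hedged sketch of the pattern - the receiver and selectors (`acceptConnection`, `handleConnection:`) are illustrative, not the actual server's code, but `forkAt:` and the priorities (70 for the listener, 50 for workers) match the description above:

```smalltalk
"Sketch: a high-priority listener loop that forks a
 medium-priority worker process per inbound connection.
 #acceptConnection and #handleConnection: are hypothetical."
[[true] whileTrue:
    [| connection |
     connection := server acceptConnection.
     [server handleConnection: connection] forkAt: 50]]
        forkAt: 70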
Now, when I say "threading in Smalltalk", I have to be careful which Smalltalk I'm talking about. Cincom Smalltalk consists of VisualWorks and ObjectStudio - ObjectStudio maps Smalltalk processes to native threads. In VisualWorks, we can use native threads when spinning up an external API call - so we can make sure that database calls (for example) don't block the VM (important for a server!). Native threads have their own issues; when engineering ported Opentalk from VisualWorks to ObjectStudio, they ran into a few problems:
- As a developer, you have far less control over native threads than you do over green threads
- Minimal ability to set priority
- If a native thread crashes, it can take down the entire system - a green thread crash can be handled easily via standard exception handling
Other Smalltalk implementations vary as well. Squeak and VisualAge are like VW - green threads for Smalltalk processes, and in VA, the ability to map an external API call to a native thread. Smalltalk MT, like ObjectStudio, maps Smalltalk processes to native threads. I'm not entirely certain what Dolphin does; I think it uses green threads. In any case, the point is that it varies by implementation.
Dare Obasanjo reports that MyMSN now supports RSS and Atom - looks like users have a fair bit of control over what they can put out as well. There's another effect here as well - now that Google and MS are both supporting Atom 0.3 format, the eventual release of Atom 1.0 becomes pretty much irrelevant. This is a correction from earlier - I misread the post by Dare.
If you've tried to post a comment via the web form with Safari, it's likely the case that you got a silent failure. I added a comment throttle awhile back to help prevent spam - the trouble is, it looks like a submit from Safari results in the server getting 2 POSTs within a few seconds of each other - I have no idea why. Very, very odd...
Update: This has a perfectly good explanation that I should have thought of - the first post was a preview, the second the actual post. Both hit the server as a post, of course. When I implemented the comment throttle, I stupidly didn't take the web form preview into account. That's been addressed; it all ought to work now.
Slashdot is reporting that "Enterprise" is on the ropes. Hmm - could that have something to do with:
- The generally lame plot-lines they've pursued? (ooh, Vulcan emotions. No one's explored that topic before)
- The non-serious way they've dealt with enemies? "Look, those guys are trying to kill us. Set phasers on Stun".
- The completely derivative nature of last season's Xindi plot (look, a sneak attack on Earth using terror tactics! That couldn't possibly be our commentary on the post-9/11 world!)
I've been frustrated with the Star Trek universe for years. Paramount needs to send Berman and anyone who looks like Berman on a permanent vacation, and find a real set of writers. For starters, they could look at some of the alumni from SG-1...
This is very good news - Google is getting out front in the fight against comment spam. I'm going to have to look at supporting this in my server:
If you're a blogger (or a blog reader), you're painfully familiar with people who try to raise their own websites' search engine rankings by submitting linked blog comments like "Visit my discount pharmaceuticals site." This is called comment spam, we don't like it either, and we've been testing a new tag that blocks it. From now on, when Google sees the attribute (rel="nofollow") on hyperlinks, those links won't get any credit when we rank websites in our search results. This isn't a negative vote for the site where the comment was posted; it's just a way to make sure that spammers get no benefit from abusing public areas like blog comments, trackbacks, and referrer lists.
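Applying the attribute is trivial for a blog engine: rewrite each link in user-submitted comment HTML before rendering it. Here's a minimal sketch in Python - the function name and the regex-based approach are my own illustration, not any real blog server's code (a production engine would use a proper HTML sanitizer):

```python
# Hedged sketch: add rel="nofollow" to user-submitted links so
# search engines give them no page-rank credit. A regex is fine
# for illustration; real comment pipelines should parse the HTML.
import re

def nofollow(html: str) -> str:
    """Add rel="nofollow" to every <a> tag that lacks a rel attribute."""
    def fix(match):
        tag = match.group(0)
        if 'rel=' in tag:
            return tag          # already has a rel attribute; leave it
        return tag[:-1] + ' rel="nofollow">'
    return re.sub(r'<a\b[^>]*>', fix, html)

comment = 'Visit <a href="http://example.com/pills">my site</a>!'
print(nofollow(comment))
# -> Visit <a href="http://example.com/pills" rel="nofollow">my site</a>!
```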
I was getting very, very tired of fixing the wiki pages on the UIUC VW Wiki. Apparently, they have a fix that they plan to implement - in the meantime, something like 100 pages got spammed repeatedly yesterday. That was annoying to clean up. However, there's a simple way to revert a Wiki page back to a former version. It was a manual process to gather all of the pages along with their last good versions, but now I have that. I think - based on the time interval - that the spammer is running a script that edits a set of known pages and inserts his set of bozo links. Well, now I have a script that simply resets all of them. Check out the Recent Changes page - see how closely bunched all of those changes from one system (mine) are? Fight fire with water, I always say...
I've started to doubt that the nofollow thing will put a serious dent in spam. Don't get me wrong - having the ability to mark a link as ineligible for crawling is useful in and of itself, and will at least help with bogus page rank. However, it's not going to fix the problem itself.
Look at the strategy behind email spam - the cost of sending the email approaches zero, so the spammer only needs a minuscule response rate to make it worth their while. This is in contrast to direct mail, where there are actual costs involved. Over time, sending mail to people who never buy anything has a real cost - doing the same with email is virtually cost-free. Sadly, the same thing applies to spam in comments, referrer lists, and wiki pages. Page rank is a nice bonus for these people, but it's not really the driver. With no actual costs, they can keep pounding away, happy to receive whatever tiny percentage of click-throughs they get.
For an example, just look at the spam pounding that the UIUC VW Wiki has been getting. Yesterday, more than 100 pages were defaced, most likely by a script. They ended up getting restored quickly (see my previous post), but over the course of the day yesterday there were 3 mass defacings. None of them stayed up long enough to guarantee a bot crawl - but they were up long enough to get found, and for some small percentage of people to - gosh knows why - click through.
While this is a good and useful idea, I don't think that the level of celebration ringing through the blogosphere is warranted - the impact of this will be minimal at best.
Update: a few posts along similar lines of thought:
Sean McGrath has an absolutely priceless commentary on the optional typing hullabaloo over in Python-Land.
I see posts like this and I realize how small the impact of Smalltalk and Lisp has been on most developers. Have a look below:
At one time or another, every programmer has imagined what it would be like to work directly with the deep structure of code. Some of the best minds in the business are working to make that happen. The legendary Charles Simonyi, who left Microsoft a couple of years ago to pursue his vision of intentional programming, says deep structure is at the core of the toolset his new company, Intentional Software, is building. Sergey Dmitriev shares Simonyi's vision, and his company -- JetBrains, creator of IntelliJ IDEA -- wants to do something similar with its next-generation toolset. These projects are still under wraps, but another champion of deep structure is working out in the open. Jonathan Edwards, currently a visiting engineer with MIT's Software Design Group, has built a prototype system that he is demonstrating in a screencast at subtextual.org.
There are big ideas at work here. In Edwards' prototype, programming, testing, and debugging are just different ways of interacting with a program's tree structure. Edwards' 2004 OOPSLA paper, Example-centric programming, explores one of the benefits of this arrangement: the examples (or "use cases") that drive program design are worked out in the context of the living and evolving program. We've all heard this stuff before. I may yet go to my grave without emacs ever having been pried from my cold dead fingers. But it's worth pondering, now and then, what we could do with tools that didn't think of programs as strings of text. Full story at InfoWorld.com
You know, it's not like this stuff doesn't already exist. I realize that it's likely news to Gosling, who seems to think that he's the first one to have ever thought of parse trees. As for Simonyi, I think the phrase "Hungarian Notation" tells me everything I need to know about what his vision would look like... almost certainly a place I'd have no interest in going.
Seriously though, live code isn't a new thing at all - Lisp has been around a long time, and so has Smalltalk. Everything that is talked about above - it's been out there, implemented and in use for (literally) decades now. I guess that - for lots of developers - if it doesn't use curly braces and semi-colons, it simply doesn't exist...
After a few comments in this thread, followed by this post, I thought I should address truly large feed subscriptions in BottomFeeder. By default, I set a memory ceiling of 128MB in Bf - that was an attempt to make it a sociable client that wouldn't grab memory like a sociopath on your desktop :) Now, that's resettable - you can set that figure higher in Settings>>Memory. If you really want to read a few thousand feeds - then I'd make the following settings changes:
- Memory: set the soft limit to somewhere around 300 MB, and the hard limit to 500 MB or above
- Turn threaded updates on
- Turn "Limit the number of threads during update" on
The latter will cut down on how many concurrent XML docs stack up for parsing at once during update cycles. The lower limits are kind of tuned for what I consider "normal" usage - a few hundred subscriptions at most.
Compared to my interest in the previous post, news of X-Files 2 work just makes me yawn. The big question: "Why?" Yeah, I know the answer is "to make money?" :) Seriously though - the series had jumped the shark many seasons before the end, and the first movie was pretty lame...
Ben Hammersley's Dangerous Precedent explains why the excitement over "nofollow" in the Blogosphere is misplaced:
This is the key point. If rel="nofollow" works, if it's applied universally, it will actually have the reverse effect. It actually gets less effective the more it is implemented. Why? Because the comment spamming sites are in competition with *each*other*, and not with any legitimate businesses. They're not so much trying to get the best pagerank for their term, as trying to get a better one than their rivals. That's a key distinction. If the playing field is levelled by rel="nofollow", then everyone involved will be forced to try all the harder to get their links out there. The blogosphere will be hit all the harder because of the need to maximise the gains. As there's no more effort in hitting 6 million blogs as there is in hitting 1 million, this really won't bother the spammers one bit. All it does is shift the problem from the high pagerank blogs we here might have, with rel="nofollow", custom sanitize settings, and mt-blacklist in full effect, all the way over to the less technically adept. And that is one enormous customer service problem heading towards Blogger, 6A and the rest.
That's about the size of it. You really have to keep in mind that spamming costs the spammer nothing - so these techniques don't really affect their behavior. See Phil Ringnalda for more along these lines.
Update: Ben Hammersley updated the post I linked to with this excellent observation - one that I certainly hadn't thought of:
Meanwhile, Scoble points out how it can be used in other ways, and undermines the second aspect of the attribute: as respecting rel="nofollow" will involve losing an enormous amount of implicit metadata, any tools that are interested in that will be forced to ignore it. Technorati will have to choose if it's a site that measures raw interconnectivity, or some curious High School metric of look-at-that-person-but-don't-pay-her-any-attention that the selective use of the rel="nofollow" attribute will produce. For many purposes, this would mean the results are totally debased and close to useless.
Yes, picking inauguration day was probably not the brightest idea on our part. When I suggested the 20th instead of the 13th (I was out of town on the 13th), I hadn't thought of this :) In any case, it looks like taking the DC Metro to the Gallery Place/Chinatown stop is the best way to go - and according to the Metro's planner, it's open today and tonight. Once you get there, walk north past the MCI Center to the Starbucks at 800 7th Street, and you'll be there. The meetup starts at 7 PM - with all the inauguration hoopla, it's probably a good idea to give yourself plenty of time. I'll have the latest Cincom Smalltalk non-commercial CDs if that's any motivational help :)
What is new here is the application of this technology "outside the box" of a debugger.
A debugger is used in a different mode than the editor - first you edit your code, then you switch to the debugger and manually run the code with some inputs. The debugger presents an entirely different UI and mode of interaction than the editor. The goal here is to eliminate this mode-switching by unifying the debugger and editor into a single tool with a consistent UI. This can be described as an example-enlightened editor.
In addition to sidelining the debugger, this approach supplants the need for a Read-Eval-Print-Loop: the canonical exploratory UI to an interpreter. Expressions typed into a REPL are instead now just example snippets in a source file, with their results appearing in the example view rather than inserted into the transcript. Results are automatically refreshed whenever the code changes, which avoids the hidden pitfalls of anachronistic definitions.
Here's the part where the Smalltalkers realize that yes, in fact, there's nothing that fascinating here: A debugger is used in a different mode than the editor - first you edit your code, then you switch to the debugger and manually run the code with some inputs. The Smalltalk debugger is both a debugger and code browser - it's not the separate tool that the author discusses. I'm sure that this kind of tool looks very interesting to people using the mainstream languages - while to those of us using Smalltalk it elicits mostly "I've had equivalent capabilities for years now". Not identical mind you, but very much akin.
After that section, the author discusses unit tests, and how they are useful as examples. Nothing to argue with there - many developers view unit tests as something close to a documentation replacement. The unit testing assistance that the tool offers looks interesting, but - in theory, you write the test first. Thus, picking out a code snippet that qualifies as an assertion puts the cart somewhat before the horse. On the other hand, I hardly qualify as a testing purist, so this kind of support likely would be useful - if it helped generate a unit test. As outlined, it suggests that the tool is obviating the need for a separate test. That's likely a bad idea - once the initial developer leaves, this artifact goes with him. A separate unit test survives as a marker.
Ultimately, I'm not convinced that it's a good idea to encourage people to not write separate tests - which is what this paper argues for. Tests, like code, are primarily communication - a link between the original author and the future maintainer. Code maintenance always goes on longer than initial development, and any increase in the communication between the original author and the future maintainer is a good thing. The path suggested by this paper would reduce communication... not a good thing at all. I realize that the author posits a new set of IDE tools where all of this is integrated - and if it was all tied together, it would be more impressive. I still don't see it as much of an advance over the Smalltalk debugger I currently have though.
There's an interesting op-ed piece in Computerworld - I was taken in by the notion of a "Toxicity Survey". Everyone in the marketing sector has read Moore's "Crossing the Chasm" - Thornton May says it's time to move on:
Inappropriate and outdated mental models on why and how technologies enter the organization: The days of "crossing the chasm" are over. Geoffrey Moore, the creator of this once-dominant descriptive framework, has moved on; vendors should too. The simplistic, product-centric characterization of customers as innovators, early adopters, early majority, late majority or laggards has given way to a much more fragmented and nuanced set of behavioral buying clusters. Just as society has fragmented into categories such as soccer moms, NASCAR dads and underemployed knowledge workers, so too have technology entry points atomized. Most vendor marketing programs haven't been successful at targeting the tribal leaders of these buying clusters.
I'll have to give this one some thought. In most marketing circles, Moore's chasm work is near gospel - certainly the conservatism of the late majority is evident in the space. The rest of the piece is thought provoking as well. I'll have to chew on this one.
Well, well. I've been saying that WS* is the new CORBA for awhile now, with the only significant difference being port 80. Looks like the IT press is starting to think similar thoughts - have a look at Alexander Krampf's op-ed in SD Times:
Is it me, or does all of this seem eerily familiar? I can't help comparing today's Web services hype to the CORBA boom of the 1990s.
What happened and which technological revolution did I miss that makes all this a reality? It must have been XML and SOAP. But wait - while XML is a great way to store and exchange information, it does so in a very verbose manner. And SOAP is really just another RPC protocol that happens to use XML instead of a binary data representation.
Deploying Web services requires a stack that includes XML parsers, Web servers and additional infrastructure. I just don't see that huge a difference between what Web services offers to me today and what CORBA offered to me five years ago. Back then, I needed to know IDL (Interface Definition Language). Today, I need to know all the different Web services schemas. And keeping track of the various standards and specifications is another enormous challenge.
I recall watching the ParcPlace distribution team trying to keep track of the vast array of CORBA services that were spinning out in the early to mid 90's - and believe me, the view of WS* specs rolling out looks like the same thing all over again.
This is probably the best advice on WS I've read yet (it's a good general point as well):
Web services are yet another tool in our increasingly large arsenal of integration approaches. They have many admirable characteristics and will make a valuable contribution in helping businesses operate more efficiently and serve customers better. Let's just make sure that when we choose Web services, we do so because they are the appropriate solution for each integration problem and not because we have a free Web services stack sitting around with nothing to do.
January 10, 2005: NASA scientists studying the Indonesian earthquake of Dec. 26, 2004, have calculated that it slightly changed our planet's shape, shaved almost 3 microseconds from the length of the day, and shifted the North Pole by centimeters.
None of these changes have yet been measured--only calculated. But Chao and Gross hope to detect the changes when Earth rotation data from ground based and space-borne sensors are reviewed.
All I can say is... wow.
Consider the profound contradiction between the OOP practices of encapsulation and inheritance. To keep your code bug-free, encapsulation hides procedures (and sometimes even data) from other programmers and doesn't allow them to edit it. Inheritance then asks these same programmers to inherit, modify, and reuse this code that they cannot see -- they see what goes in and what comes out, but they must remain ignorant of what's going on inside. In effect, a programmer with no knowledge of the specific inner workings of your encapsulated class is asked to reuse it and modify its members. True, OOP includes features to help deal with this problem, but why does OOP generate problems it must then deal with later?
All this leads to the familiar granularity paradox in OOP: should you create only extremely small and simple classes for stability (some computer science professors say yes), or should you make them large and abstract for flexibility (other professors say yes)? Which is it?
I'd say that this has more to do with black box development vs. white box development than with OOP per se. But heck - don't take my word for it - read the whole thing.
I linked to this post by Phil Ringnalda earlier - he makes the point that spammers are going for quantity, not quality - they spray their efforts far and wide (because they can), and don't really care if the effort succeeds. Like Phil, I've got the logs to prove it :) Now, I don't suffer nearly the volume of spam attempts that sites running the common blogs servers do - hello, security through obscurity. Still, I get a daily ration of attempts. I turned off comments on all posts that fall out of the RSS feeds awhile back - I downloaded all the posts and scanned for spam a few months ago and was unpleasantly surprised to find a fair amount of spam lurking in posts that I had long since forgotten. The fact that I've disabled comments for older posts hasn't entered the consciousness of whoever tries to spam the CST blogs; every day there are attempts, and every day it's to the same small set of old posts. To quote one of my colleagues here at Cincom, and so it goes...
PR Opinions (naturally enough) likes the analytical groups:
The influence of the analyst community in the technology purchasing decision is a much coveted resource. It is one of the reasons that technology vendors are anxious to access, inform and influence these analysts. Typically the bigger vendors spread their analyst budget around a large number of firms. However the smaller firms, with limited funds have a far more difficult time and often budget decisions are driven by broader marketing requirements. With this market dynamic there is always risk.
I have been working with industry analysts in North America, Europe and Asia, on and off, for well over a decade. In my experience, the majority of analysts and their firms, offer impartial, informed and valuable advice to their clients - whether those are vendors, end-users or a combination. But as with any market (think Public Relations ladies and gentlemen!) you always have rogue elements who offer biased, "pay-for-play" services which are about as valuable as you'd expect.
"Suffice to say that sometimes the industry analyst business looks something like the Mafia... some analyst firms appear to run a sophisticated version of the protection racket. If you pay up we let you do business - if not we can make life real hard for you by smashing the place up/downgrading your products. Its an open secret in the business, the corpse out in the backyard we all catch occasional whiffs of...It is becoming increasingly clear that the industry analyst business is ready for an overhaul."
Now, I'm not sure that I'd go quite that far - but the bias in the tech industry is somewhat akin to the much-discussed media bias thing in politics - it's not that the media are purposely slanting one way - or that analysts are purposely slanting one way either. It's more that they are all coming from the same "culture", and you get an inadvertent "groupthink" thing going. This is why many media stories (think the OJ trial) get a real herd mentality behind them, I think - and it's also why the analysts tend to mirror the IT industry's supposed consensus.
What you get from the analyst groups is their distilled notions of popularity - not any kind of meaningful technical analysis. That's fine as far as it goes, and it does have value. You should just be aware of what you are getting.
Last night's Smalltalk meetup went pretty well - I ran into some old and new faces. We retired to Fuddruckers from Starbucks (thank you Victor for arranging that!) after we ran out of room in the corner of Starbucks. We had a good time - kudos to Matt for arranging it. I look forward to the next one.
I share a lot of Rogers Cadenhead's frustration over RSS 1.1 - it's another solution in search of a problem - see here for the format that exists due to antipathy for Dave Winer. However, there's a reason I added support for it to BottomFeeder - I just knew that I'd get asked about it, and it's simpler to pre-empt that and just add support. It was simple enough; I subclassed the RSS 1.0 handler and overrode 4 methods in the subclass - done. I'm entirely unclear as to what problem it solves, but there it is...
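For the curious, the subclass-and-override approach looks roughly like this. The class and selector names here are hypothetical - they're not BottomFeeder's actual handler classes - but the shape is right: RSS 1.1 mostly changes the namespace and a bit of the element structure, so a subclass only needs to override the handful of methods that touch those:

```smalltalk
"Hedged sketch - hypothetical class and selector names.
 Subclass the RSS 1.0 handler; most behavior is inherited."
RSSOneZeroHandler subclass: #RSSOneOneHandler
    instanceVariableNames: ''
    classVariableNames: ''
    category: 'Feeds-Parsing'.

"Override the namespace the handler matches against; RSS 1.1
 declared its own namespace, distinct from RSS 1.0's."
RSSOneOneHandler >> namespace
    ^'http://purl.org/net/rss1.1#'
```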
I'm only now getting back to work - we had the in-laws over for a brunch, and my morning was swallowed by my own mistakes. Last night, on my way to Smalltalk Meetup, I lost my phone. Somehow, it slipped out of my pocket on the metro - so much for that. I went to the Verizon store this morning, and ran into a few hurdles. They were actually quite nice - they let me replace my old phone for the upgrade price even though I had no insurance and had bought the new phone just 6 months ago. It still took forever - the guy in front of me in the service line wanted a personal introduction to every feature on his new phone, and I had to wait to get my address book transferred from my penultimate phone to the new one - complicated by the old phone's complete lack of charge. That ate up 2 1/2 hours. I'm happy with Verizon's service, I just wish it hadn't chewed my morning up...
I made a comment in this post's comments about the memory requirements for BottomFeeder. By extension, it was about memory usage in VisualWorks applications in general. With that in mind, let's take a look at that. If I bring up a base VW 7.3 development image, and execute ObjectMemory dynamicallyAllocatedFootprint, I find that I'm using 14 MB of memory. Where is that coming from? There are two places:
- Perm Space - all the classes and objects that start off in the image
- The rest of the memory spaces:
You can't do much about perm space without doing a strip (typically with RuntimePackager). The other spaces are well documented in the class side comments of ObjectMemory. You can find out how much space they take up this way: ObjectMemory actualSizes. That will return an array full of numbers, representing the bytes used. In a base (development) image, they look like this:
#(307200 61440 204800 40960 655360 591904 204800)
You can adjust those with the #sizesAtStartup: message - it allows you to send multipliers (x factors) by which to make those bigger. For instance, if you are going to be creating lots of objects quickly, it might make sense to make Eden and the survivor spaces bigger. That's still not a lot of space - at present, BottomFeeder is taking 65 MB on my desktop. Most of that is oldSpace - i.e., objects that have become permanent within the context of the runtime - my feeds, items, etc.
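Pulled together, the workspace expressions from the discussion above look like this. The first two are exactly as described; the #sizesAtStartup: argument is a hedged sketch - check the ObjectMemory class-side comments in your image for the exact multiplier-array format:

```smalltalk
"Inspect memory usage in a VisualWorks image - numbers
 will differ from image to image."
ObjectMemory dynamicallyAllocatedFootprint.  "total bytes currently in use"
ObjectMemory actualSizes.                    "bytes used per memory space"

"Grow Eden and the survivor spaces before a burst of allocation.
 The array of multipliers here is illustrative - see the
 ObjectMemory class comments for the authoritative format."
ObjectMemory sizesAtStartup: #(2.0 2.0 1.0 1.0 1.0 1.0 1.0)
```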
I mentioned that I had preloaded a lot of code - I have. Things like Opentalk, all the network libraries, SOAP, just rafts of stuff. That's something like 16 MB of stuff in my base image. Now, I'm not really trying to limit what I load or take stuff out at all - I pretty much just package the image (i.e., make it a runtime) and go. You can do a lot better, and many people do. For instance, Liberty BASIC - that's a 3 MB download, and it's a Smalltalk application.
Looks like there's a security breach in Java - anything older than the latest. I still say that trust relationships matter more than sandboxes, because developers - all of us, whether we use static languages, dynamic languages, whatever - we all make mistakes, and in far too many cases we only learn about them later...
We didn't get the huge snowfall that had been predicted yesterday - more like 4-5 inches. That's enough for good sledding, especially when the temperature stays down around 20 (Fahrenheit). My daughter and I got the toboggan out, and headed for "the hill" - a nice steep slope in the neighborhood. Lots of kids and adults had already gathered, and more piled in all afternoon. We had a pretty good time - it was so packed kids were periodically getting bowled over by other kids on uncontrollable sleds - the rubber inflatable kind. We stayed out a fair while - I didn't even have to clear the driveway, because my neighbors did that for me (to be popular in the winter here, try being the only one in your neighborhood with a snowblower :) ).