I'm wondering if the VW Wiki at UIUC is a spam casualty. It's been offline all day (I fixed a bunch of spam last night). There's one wiki still up on the UIUC site, the Camp Smalltalk Wiki. Hit Recent Changes and scroll down - see how many pages have been changed recently? That's a Wiki that barely gets any traffic or changes; it's all spam vandalism. It's really getting to the point where a few jerks are ruining it for the whole class.
Well well - looks like another search site is supporting syndicated searches - MSN. It's very early (alpha) at the moment, but this is good news.
Here's a question that came up at the planning meeting this morning, and it's one that should properly be addressed to the community. There are a number of items in the "preview" (beta) directory of the VisualWorks distribution. The way we currently run releases, we do the following:
- Major release in the late fall/early winter of the year
- Minor release in the late spring/early summer of the year
In a minor release, we try to address things that skipped the major release, bugs, and preview items. Looking at the preview items, here's the list:
- 64-bit engines
- Database\TIGER (connect package for the Cincom database)
- Database\Supra (connect package for the Cincom database)
- Modtracking/ModificationManagement (a framework for dealing with tracking object changes/immutability for things like database operations, particularly OODBs)
- Opentalk/iiop (CORBA related)
- Packaging/base.im (part of the deployment simplification process)
- SmalltalkDoc (new documentation framework, to be integrated with the browsing tools)
- Unicode (display of arbitrary character sets w/o regard to the current locale)
The question I have here is - what do you think? If it was your call, which ones would you be most interested in seeing promoted out to a fully supported set?
Here are some new press releases from our German Smalltalk group - they are all in German:
"So, we had a problem, and it had a deceptively simple solution. This is exactly the kind of problem Word's Auto Formatter was designed to solve. The fancy name for Word's Auto Formatter is a "rule-based inference engine," which really means that it's like pattern matching on steroids. It takes into consideration the current state of the document, and interprets various keystroke input sequences in terms of a set of rules. It's designed to work very fast, so that users don't notice any effect on typing performance...."
"To summarize these rules, if the insertion point is:
- In an empty paragraph--always inserts a tab character;
- In the middle of a non-empty paragraph--always indents the whole paragraph; and
- In the first line of a paragraph:
- If there are no tab stops set, then indents the first line of the paragraph; or
- If there is a tab stop set, then inserts a tab character.... "
It gets better than that :) I really, really like this:
Personally, I'd like Tab to always be "Tab" and nothing else. Funnily enough, there's a solution for that as well. In order for "Tab" to always be "Tab", you need to press "Ctrl+Tab". Now *that's* design ;)
I think the whole tool would work better if there was an option to unload the inference engine completely. It works only for short memo writing, and it absolutely sucks for anything more complex. Note to Microsoft - Word is awfully expensive for something suitable only for memos...
One of the big things we intend to deal with over the next release cycle is deployment. Right now, it's kind of painful to go from application in development to application in deployment - using RuntimePackager can be pretty complex. We had a talk about this today, and it looks like a solution will appear in preview in 7.3.1. If you look in the packaging directory, you'll see a base.im - which is a first cut at a runtime image, ready to deploy. What we intend to do is simplify deployment down to a straightforward script - you'll toggle some settings, tell the system what parcels should be loaded - and then point the script at the base image. The script will run, preparing a deployable image (or executable on Windows). This should make it far, far easier to push a product out the door.
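To make that concrete, the heart of such a script might be nothing more than a parcel-loading loop along these lines (the parcel names are hypothetical, and the shipping script will also handle the settings and the final image save):

"Load the parcels the application needs into the base image"
#('OpentalkHTTP' 'MyApplication') do: [:name | Parcel loadParcelByName: name].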
I got a lot of good feedback to this question on priorities - not so much in comments, but in the vwnc and vw-dev mailing lists. I'll be going over all of this with the engineers in the next little while; thanks!
People aren't good at mind-reading (another term for mind-reading is "making stuff up" -- see any comedy of errors for examples). Software is even worse at mind-reading. The worst features in Microsoft Word are those that attempt to do mind-reading... auto-formatting, auto-capitalization, auto-correction, etc.
Can we get an editor that follows the simple-minded "do what I tell you to" dictum, or is that just asking too much? Like Keith, I recall older versions of Word as useful - and newer versions as being truly, truly irritating.
Our Australian representative, Andrew McNeil, passed me some pictures of this year's engineering meeting:
A lot of today will be taken up by a "Camp VisualWorks" kind of thing - given the distributed nature of our team, it's not often that all of us get together in the same room.
Here's a shot from one of yesterday's breakout sessions, where a bunch of our guys were discussing plans for pending work:
The code name for the next major release of Cincom Smalltalk is... Syrto. For those of you wondering what that is, it's a traditional Greek dance:
We had a good set of discussions at our annual planning meeting. We hold them in an interesting little place - in a "skybox" conference room overlooking the basketball court at Santa Clara University in Santa Clara, California. Other than the echoing of the basketballs from the practices, it's a pretty nice location :)
This time, thanks to the avalanche of feedback I got in email to this request for feedback, I was able to let the entire team know what people thought about our priority list immediately - very cool.
We have more planning to do in the near future - Cincom is in the process of hiring two new team members for the Smalltalk group - a marketing manager and a business manager. This isn't unique to the Smalltalk group; Cincom is doing this for the other product lines as well. This will end my relative isolation as Product Manager - I'll have a few more tasks on my plate as we get to work charting the future of the Cincom Smalltalk product suite.
All in all, it was a good week (other than the cold I seem to have picked up), and I'm looking forward to a productive year. Come see us at Smalltalk Solutions in Orlando, Florida - we should have a lot more to say by then!
There's a Smalltalk meetup happening in DC next week - take a look over here and join us! Here are the specifics:
What: Washington Smalltalk January Meetup
When: Thursday, January 20 at 7:00PM
Where: Starbuck's Coffee (Chinatown)
800 7th St., NW
Washington DC 20001
Event Description:
| members |
members := Collection new.
members addMembersFrom: 'meetup.com'.
members do: [:from | members do: [:to | (from =~ to) ifFalse: [from greet: to]]]
See you there - I'll have the latest Cincom Smalltalk non-commercial CD's on hand.
I've given a talk at Smalltalk Solutions for a few years now; typically, it's been one of two kinds of talks:
- A "demo" sort of talk, either on goodies, BottomFeeder, etc
- A "What's New in Cincom Smalltalk" kind of talk
This year, I've got something different planned - more of a business level talk. It's one thing for me to talk about BottomFeeder implementation details (like I did last year) - it's another thing entirely to talk about how I use BottomFeeder in my job. That's what I intend to do - talk about how I use BottomFeeder to keep track of the IT industry, Smalltalk related stuff - and more importantly, what kinds of things people are saying about our products. I'm hoping that StS won't be the first time I give this talk; I've got a few other applications out for earlier conferences. We'll see how it goes.
Here's an interesting article on the changing media landscape. It's not the case that blogs are replacing the MSM; rather, they are augmenting it and changing it (much like TV augmented and changed radio - you don't see many radio dramas anymore, but radio is far from dead).
Blogs have created an innovator's dilemma for the media business. They (and other technologies such as RSS, podcasting etc.) have emerged because first and foremost they have lowered the barriers to entry. Secondly mainstream media have for the most part, become staid and homogenous, reporting broadly the same news and events. Blogs on the other hand tackle far more diverse issues and topics and of course provide a wider spectrum of opinion - though this may or may not be good depending on your point of view.
The major challenge for the media business is that as blogs become widely adopted there will be a change in the media mix. If a consumer reads blogs then they are likely to still read newspapers and magazines, watch TV and listen to the radio - but it's also likely that the proportionate mix will change. Perhaps the consumer will reduce TV time or read a smaller number of magazines. That's the challenge.
For Public Relations practitioners the challenge is about understanding that mix. You need to understand where your audience is and how they are finding information. Once you have that valuable information, you need to use it wisely and communicate using the tools your audience prefer. This is why it is so important that this profession steps up and embraces the changes taking place online.
Reaching your audience may be harder than you think though. I'll give you an example based on conversations I had with some of the engineers I was meeting with yesterday at our planning meeting. For me, blogs have made it possible to hear more opinions - I read far more about MS oriented and Java oriented development than I ever did before. I also read tons more political opinion with which I disagree. That's not the norm for everyone though. A number of the people I spoke to yesterday use the proliferation of sources to further isolate themselves - they live in an echo chamber filled only with agreeable opinions. It's far easier to do this now than it used to be.
Back when your only sources of technical opinions were written journals, you were confronted with technology that you might not work with regularly, or even know much about. Back before blogs and 500+ cable channels, political junkies only had things like "CrossFire" (or going further back, "Point/CounterPoint"). Now, you can immediately locate favored content and ignore all else. There are obvious issues with this, but let's look at one in particular - reaching an audience with your marketing message:
How do you intend to get your point delivered, much less listened to?
If increasing numbers of your target audience live in self imposed echo chambers, how do you deliver an alternative point of view? That audience will get encouragement and reinforcement from other members of the "tribe", and will become increasingly ill disposed towards your alternative message. It's a real problem, and it's growing.
Julia Lerman points to an impressive failure in software development at the FBI. This sort of thing happens in the public sector and the private sector - most of us who work in and around the IT world have our own store of "war stories". Ironically, I'd guess that we hear about the public failures more often - in the private sector, the people who got fleeced seem to be far better at burying the bad news. I know about multiple failures that have never been reported publicly...
Rob Fahrni points out that the Patriots are (not) working hard to get a home field advantage:
ESPN: " The team left the Gillette Stadium grass uncovered through Wednesday's rain and Thursday's fog. With more rain or snow expected Friday and freezing temperatures for the weekend, the Indianapolis Colts' prolific offense could find the footing funky in Sunday's playoff game."
Well, we can definitely say that Apple isn't all about great marketing - they step in the smelly stuff as well as the rest of us. Have a look at this WaPo story:
CAMBRIDGE, Mass. -- Nicholas M. Ciarelli was not even old enough to shave when he started getting under Apple Computer Inc.'s skin. As a 13-year-old middle-schooler, the New Woodstock, N.Y., native built a Web site in 1998 and began publishing insider news and rumors about Apple, using the alias Nick dePlume. Three years later, ThinkSecret.com was first to report that the company would debut a G4 version of the PowerBook laptop series. The product launched soon thereafter, along with ThinkSecret's reputation among Apple's legendarily zealous fans, generating millions of page views per month.
But after a series of letters warning the Web site to stop publishing proprietary information, Apple decided enough was enough. When Ciarelli scored yet another scoop in late December, by predicting the arrival of a new software package and a sub-$500 computer rolled out at this week's MacWorld Conference and Expo in San Francisco, the computer maker filed a lawsuit accusing him of illegally misappropriating trade secrets
The stupid part is that people like that are providing free advertising and marketing. They aren't costing Apple money; rather, they are creating buzz. I've seen the same behavior with fan sites for books and movies. The question you want to ask yourself when you decide to send the lawyers is this:
Will this make me look better, or make me look stupid?
I'll give you a personal example - this action by Sony sent me over to Nintendo for a Game Cube. Sure, it was only one sale, and PlayStation2 sales aren't suffering (far from it). On the other hand, did that action by Sony have any positive impact? Likewise, will Apple's suit have any positive impact?
It sounds like the wave of spam directed at the UIUC Wiki contributed to the downtime it had recently - the drive filled up! Every spam page added is a page cached on disk somewhere, and it adds up after awhile.
Rok Hrastnik is about to release an online book on RSS marketing:
I just wanted to let you know that the RSS e-book is finally ready.
The e-book will be launched on Tuesday, 18th of January, at app. 6PM CET
I was interviewed for the book - I've got early access to the material as well. Looks good so far - there's a lot there. I'll post a link as soon as it's public.
David Buck reports on the next Ottawa Smalltalk User's group meeting, which will be held in conjunction with the local Python group's meeting. Sounds like fun.
I love it when an idea comes all the way back around after a multi-year detour. Have a look at this post from Dave Roberts on binary XML:
Essentially, everybody is finally realizing that while XML is the first widely accepted data markup format, it's a pig. It's verbose and redundant, which makes it store poorly, transmit slowly, and parse, well, like a pig. Don't get me wrong, the original idea for XML was actually fine, but everybody has taken it way over the top and things like SOAP are just an abomination.
Well, one way to help XML while still retaining XML semantics is to make a binary version of it. While a compression program like gzip can reduce the overall transfer and storage size of an XML document, an XML parser would still have to deal with the XML textual format on either end of a transfer. And that textual XML format actually expands binary data carried in an XML document by forcing it into BASE64 format which does a 3-goes-to-4 encoding.
The reason XML took off had far more to do with the openness of Port 80 than with anything else. RSS? Nowhere without the open ports. SOAP? Nowhere without the open ports. So now we want to create a binary encoding for XML? Heck, why not just run a CORBA broker on port 80? We could have done that 7 years ago and called it a day...
Is there any information domain where the open, world-writable wiki model can't be beneficially applied?
We're sure to find out, as wikis and wiki-variants appear everywhere.
When frustrated with DMOZ, I've often wished for a more radically open web directory, with submissions and categorizations from anyone, at any time, like a wiki.
Community moderation would curb the worst abuses. Those wishes have been answered: the latest from the folks behind Wikipedia is Wikia, which applies the wiki philosophy to a search index of web sites.
I'm not sure that moderation is enough. Have a look at the Recent Changes page on the VW Wiki at UIUC - see all the "changes" over the last few weeks? There's been a flood of daily spam attacks, with the corresponding promotion of good versions of the pages back up. Gordon notes this:
Why not let any contributor instantly add sites -- even individual pages within sites -- and reorder the results of any search based on users' perception of sites' appropriateness to the query? Well, spammers and system-abusers and ranking-wars, I guess. But could open feedback systems be devised that keep those problems suitably in check? It's worth a try!
I don't know what the Wikipedia site does to prevent spam; maybe they have something like Spam Assassin hooked up on the back end. Relying on user moderation is an endless, thankless task though - and I rather suspect that the maintainers will run out of steam before the spammers do. It really, really sucks how the a******* have ruined the entire neighborhood this way.
The Titan probe landed successfully yesterday, and has started sending back data on what it's finding. I love this description of the terrain:
Data sent back by the Huygens space probe from the Saturnian moon Titan show a frozen, orange world shrouded in a methane-rich haze with dark ice rocks dotting a riverbed-like surface the consistency of wet sand, scientists said on Saturday.
Kudos to the European Space Agency, the Italian Space Agency, and NASA for their work on this.
ComputerWorld reports another nail in the Itanium's coffin - no more Windows XP for that platform:
Microsoft Corp. has pulled the plug on a version of Windows XP for workstations running Intel Corp.'s Itanium 2 processor. The move follows the decisions by major hardware suppliers to stop building workstations based on the 64-bit chip.
I see that Larry O'Brien is all excited about AOP - specifically, about being able to wrap methods:
Aspects are classes whose methods are injected into existing programs at "join points," the most common of which are method entry and exit. Another way to put it is that aspects intercept method invocations and surround the existing method with new behavior. The other critical feature of aspects is that they do this without any modification of the existing source code whatsoever.
I love the way people using Java and .NET get all excited over things that were invented eons ago in Smalltalk (or in many cases, in Lisp). Take a look at this paper from John Brant, Don Roberts, and Ralph Johnson. That dates to the mid 90's, but if you look at the references at the bottom, you'll see that it's all based on previous work.
The excitement never ends for curly brace programmers, because the past simply doesn't exist...
I just ran across an interesting article by Allen Holub on Open Source, and some qualms he's developing over it:
My most recent experiences with Tomcat demonstrate my dilemma. The newest version of Tomcat was just plain broken. It didn't load and run custom tags correctly, for example. Though I could (in fact did) go to an earlier version, the fact that Tomcat wasn't subjected to even rudimentary regression testing is disturbing.
The next issue is one of documentation and developer attitude. I wanted to use a local-configuration-file mechanism that's pushed heavily in the documentation (context.xml), and I was deploying from the development directory using the Ant build.xml file that's shipped with Tomcat. This process simply doesn't work.
I started with the documentation to find a solution. Tomcat's documentation is virtually unusable, though. It's a hodgepodge of inadequate .html files. There's no way to print it. There's no real organizational principle, index or search mechanism. A lot of the documentation is written by developers for other developers, so it is sketchy at best, and incomprehensible to a new user.
This isn't surprising. A lot of the necessary, but "drudgery" related tasks for a software project simply don't get done unless someone is paying for them (either explicitly or implicitly). Documentation is one of the best examples - and before you point to Eclipse or Apache - look at the well funded foundations that back those projects - someone is paying for that drudge work. The same thing applies to usability related tasks (build procedures, UI, etc) - once the lead developer(s) have a system that works, they tend to stop - it's no longer an interesting problem, so it falls by the wayside. In a funded effort, you end up with complaints coming in that might affect revenue - so the problems tend to get addressed.
Note that funding is separate from licensing - this isn't a failure of Open Source per se. It's endemic to Open Source simply because most Open Source projects are not funded. Allen hits the punch line here:
The bottom line is that you have to do a lot of work to use an open-source product like Tomcat. You have to test new versions rigorously; you have to compensate for inadequate documentation by using a high-volume mailing list that may or may not provide an answer in a day or two; you have to trust amateurish code, which you can't really repair yourself because your version will no longer be "standard." Even if you could repair the code, progress on your real project would slow down.
All of these problems translate directly to wasted time and wasted money. The Tomcat team has no incentive to fix any of this because they're not being paid and have no market pressure on them to improve quality or provide real customer support. I know that the open-source party line is that third parties will step up to fill these gaps, but then you're paying for the software, aren't you? So much for the price advantage.
There's a danger there even to funded open source efforts that Allen doesn't address - I wrote about that here. To summarize, take an open product of significant scale - JBoss, for example. You can buy service and support from the company itself, but getting the product itself is free. There's nothing stopping another entity from setting up support for that product at a lower price - and at lower cost. The JBoss group has to carry the cost of both developers and of support staff - a third party need only carry the cost of support. In a non-open licensed product, the vendor can still recover their costs via license sales, even if their support gets cannibalized. What's the Open Source vendor going to do in that case? I rather suspect that they'll get blown out of the water.
There's no free lunch for this stuff - if you expect to get everything for free, you might want to ask yourself why your company doesn't give its products away.
The comment spammers keep trying - there was a spate of it this afternoon - 15 on one post. The amazing thing is how much effort the guy doing it went through - he was using the web form, and it was clearly not a bot - because there was just over a minute between each spam post. I've added some more checking to the server that should put a stop to that kind of nonsense.
I decided to take a look at my RSS feed stats - it's been awhile since I looked at them, and I've increased my readership since then. There are four tools that dominate my feed stats right now:
- Net News Wire: 21.5%
That makes up almost 84% of my readership right there. I was a little surprised by the Net News Wire totals - that's a Mac tool, and the number of hits from that tool was near nil just a few months ago. Am I getting more Mac users, or are more people buying Macs? Or both?
I create content in part to promote my law firm, which I cannot do effectively if my contact info is removed. I do not participate in targeted advertising programs because the majority of advertisers that target the keyword 'trademark' are competitors. I cannot prevent such advertising when my page is reproduced and 'framed' by a third party.
Now have a look at his site - see that RSS feed icon? What does he think happens when people subscribe to it? Sheesh...
Darren Hobbes has just started re-examining Smalltalk, and is looking at VisualWorks. He was worried about threading, so I figured I'd point a few things out:
- VisualWorks is single threaded (in general; the VM runs a few threads for housekeeping purposes, and you can use native threads for external API calls)
- In VW, Smalltalk processes are "green" - i.e., they run within the context of the single heavyweight process
- In ObjectStudio, Smalltalk processes map directly to Windows threads
Interestingly enough, you get far, far better control and predictability with the green threads - and a VisualWorks app (like this blog, for instance) scales quite nicely. Our engineering group just ported Opentalk from VW to ObjectStudio, giving the two products better interoperability - and the single biggest problem was with the native threading. If a native thread dies, you lose the entire application - as opposed to a VW thread, where you log the problem and move on. Earlier today, I introduced a bug into the server with comment handling (I was putting a throttling system in place). I forgot to update the form page, and - as a consequence - every process trying to add a comment threw an exception. These simply got logged by the server, which kept humming.
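Each of those comment handlers is just a forked Smalltalk process with a handler wrapped around its work; schematically it amounts to something like this (the selectors here are illustrative, not the server's actual code):

"Fork a lightweight process per request; if it fails, the error is logged and that one process dies quietly - the rest of the server never notices"
[[self handleComment: aRequest]
	on: Error
	do: [:ex | self log: ex description]] fork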
Here's another example of process usage - BottomFeeder. With the default settings, Bf spawns a new process for each http query done during the update cycle. I subscribe to 296 feeds now, so that can be a lot of processes. If those mapped to Windows threads, my machine would be brought to its knees - instead, I still have UI responsiveness.
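In pseudo-BottomFeeder terms, the whole threaded update cycle boils down to something like this (checkForUpdates is a stand-in name, not the actual API):

"One lightweight process per subscribed feed - with green threads this is cheap"
feeds do: [:each |
	[each checkForUpdates] forkAt: Processor userBackgroundPriority]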
Sometimes, you have to look beyond the conventional wisdom to see how scaling works - native threads are not the be-all, end-all that people think they are.
Dave Winer claims that one of the things blogging addresses is the conventional wisdom problem that the "professional" media have:
At the time I started blogging, the pros were reporting that there was no new Macintosh software. I would call these reporters and point out that there was lots of new Mac software, they were using it, they knew about it. They would respond by saying Everyone knows there's no new Mac software.
I don't think they knew they were being dishonest, by then reporting wasn't about facts, it was about conventional wisdom. If CW said there was no new Mac software then the reporters would report that. This meant that competitors didn't actually have to win in the market, that's much harder, they just had to convince the reporters that they had. This leads to not only a very wrong place, but a dangerous one, because the reporters had come to have so much power. With no accountability, no way to vote them out of office, we were totally controlled by them. That's why blogging came about, as a counter-action to the corruption of the professional system.
This isn't a problem limited to reporters, professional or otherwise. I can't recall who said this, but it's applicable here: "It's not what you know that kills you, it's what you know that isn't so". The "conventional wisdom" pops up all over the place, all the time. Winer's Mac example is perfect - so is the "Isn't Smalltalk dead?" question that comes up a lot. Heck, Winer contributed to this himself a little while back - look at this post where he simply assumed (wrongly, as it turned out) that a scaling issue was due to using a scripting language instead of a "real" one. There's a piece of conventional wisdom that has slowed progress in the IT sector for decades now.
This isn't a problem simply of professionals versus the "squeaky clean" blogger heroes - it's about the unrevised assumptions we all make about everyday things. We all have our biases, and plenty of them are uninformed - at best.
I've been wondering for awhile about some of the referer spam I get - the links either point nowhere, or they point to an "account closed" type of page. This was baffling before I ran across this eweek story - it's an unintended consequence of the CAN-SPAM act:
One troublesome technique finding favor with spammers involves sending mass mailings in the middle of the night from a domain that has not yet been registered. After the mailings go out, the spammer registers the domain early the next morning.
By doing this, spammers hope to avoid stiff CAN-SPAM fines through minimal exposure and visibility with a given domain. The ruse, they hope, makes them more difficult to find and prosecute.
The scheme, however, has unintended consequences of its own. During the interval between mailing and registration, the SMTP servers on the recipients' networks attempt Domain Name System look-ups on the nonexistent domain, causing delays and timeouts on the DNS servers and backups in SMTP message queues.
I think I'm seeing a variation on that - either the new domain isn't up when I see the ref, or the domain has been taken back down. This is getting to be like the old "Spy vs. Spy" routine in Mad magazine. How much damage are these bozos doing to the network commons? Here's a thought:
"We have to figure out how to taper DNS services gracefully rather than having catastrophic failures," said Paul Mockapetris, the author of the first DNS implementation and chief scientist at Nominum Inc., based in Redwood City, Calif. "Mail look-up was the first application put on top of DNS after I designed it, and I was so excited to see that. And now, 20 years later, people are trying to figure out how to stop doing mail look-up on DNS. It's bizarre."
Yes, that's a provocative title. I was reading this piece by Jim Rapoza of eweek - and I ran across this:
But why should we expect it to be? Face it: The bad coders are winning. They've convinced users and companies that bugs, security holes and patches are inevitable, and everyone just shrugs their shoulders and accepts that - no matter how bad things get.
But it doesn't have to be this way. All of us have seen even large, complex applications with source code that's clean, free from bugs and secure. All it takes to write good code is the desire to do so, but there really isn't any incentive for software companies to write clean, secure code.
It's not simply a matter of desire, it's a matter of incentives. Part of it is what he says in the next paragraph - end users of software want new features and functions more than they want anything else. I don't think that's all of it though. A large part of it is price. Look at what's driving the industry today - open source and outsourcing, both of which (from an IT management perspective) are about cost control. Secure code? Way, way down the priority chain. If we can get systems done for $15 an hour, have it!
You won't start seeing secure code until end users are willing to pay for it. At present, it's pretty clear to me that most aren't willing to.
I just made a modification to the development update stream that you'll notice - on Windows, "slim mode" now goes to the Windows tray instead of to a smaller window. This is more in line with what Windows users expect, and it gets the application completely out of the way. The tooltip will change when there's new content, so you'll be able to tell what's up. If I can figure out how to add another icon to the executable with ResHacker, I'll see about changing the tray icon when there's new content as well.
This post piqued my interest - a Java developer asks a question about Groovy:
Well... the Codehaus site tells me a lot about what I guess I need to know about the language... but I'm still wondering why I'd want to use it. The home page says:
Groovy is a new agile dynamic language for the JVM combining lots of great features from languages like Python, Ruby and Smalltalk and making them available to the Java developers using a Java-like syntax.
Okay... great features, I guess, except I don't know what the great features from Python, Ruby, and Smalltalk are that I'm looking forward to. Agile and dynamic here look like buzzwords, although they both have real meaning.
The next paragraph:
Groovy is designed to help you get things done on the Java platform in a quicker, more concise and fun way - bringing the power of Python and Ruby inside the Java platform.
Um. A quicker, more concise, and fun way... when I don't really have a problem doing quick, concise things in Java as it is. (Maybe my definitions are poor.) The power of Python and Ruby still don't do much for me.
This points out something I commented on in a post Joel made awhile back - where Joel claimed that "millions of developers" had seen Lisp and rejected it. What I'm seeing is that lots of developers have never so much as seen anything outside the C family (with the possible exception of Basic). That's why we see posts like the one above - and I'm not pointing at it in order to ridicule it. Simply put, many developers can't imagine what benefits a dynamic language might have because they've never seen one. To them, it's all about what they learned in C and its various cousins.
This tells me that we - the proponents of dynamic languages - have a rather large education issue in front of us. It's not so much that people disagree with us about dynamic languages vs. static ones (although there's plenty of that) - it's that they have no idea whatsoever as to what a dynamic language is.
An interesting discussion of scalability started up in this comment thread after I posted on Smalltalk threads. One of the commenters pointed out that BottomFeeder had trouble importing his feeds - he had 5100 of them. The interesting thing is that he immediately assumed that it was due to the spawning of 5100 threads - he saw CPU spiking and a crash.
Now first off, I'll admit that I've never considered the use case of 5100 subscriptions :) Having said that, threads really aren't the problem here - memory usage is. Go open the settings in BottomFeeder, and look at the Memory page. There's a soft limit on memory, and a hard limit. Once BottomFeeder hits the hard limit, any request for additional memory will result in garbage collection - and eventually, if the problem persists, an out of memory exception. Now, it can get ugly at that point - it takes memory to put up a dialog stating "out of memory", so it's possible to have a complete crash. You can customize the memory policy to deal with that (assuming that actual real or virtual memory is available, of course).
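For reference, those limits can also be adjusted from code via the memory policy; something along these lines (the selector names here are from memory and should be checked against the Application Developer's Guide - treat this as a sketch):

"Raise the soft (growth regime) and hard limits for a very large subscription list"
ObjectMemory currentMemoryPolicy
	growthRegimeUpperBound: 300 * 1024 * 1024;
	memoryUpperBound: 500 * 1024 * 1024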
Here's what's going on - the base BottomFeeder application consumes 30MB before you load any feeds. Yes, that's fairly "hefty", but I have what amounts to a "kitchen sink" image with lots of libraries pre-loaded - it's made some of the additional features I've added to the app much, much simpler. I subscribe to 300 feeds, and they are all kept in memory. As I'm sitting here, Bf is consuming an extra 30MB over its starting base. To figure out why, you have to understand this:
- How VisualWorks memory management works
- How BottomFeeder keeps feed and item data around
- What happens during the BottomFeeder update cycle
There's fairly extensive documentation on the memory model in our Application Developers Guide - for our purposes here, I'll explain parts of it. There are three areas of memory relevant to our discussion here:
- New Space - where new objects are "born"
- Survivor Space - 2 memory zones where objects that survive the first pass GC in New Space end up
- Old Space - As the Survivor Spaces fill, objects "tenure" into Old Space. Unless there's manual intervention or a hard memory limit, Old Space simply grows and is not scavenged
Now, let's look at an update cycle. BottomFeeder loops over the N subscriptions, and if threaded updates are on, spawns N threads. Each thread looks at the feed in question, and does the following:
- Is it time to check for an update? Check the meta data in the feed (if any). Possibly answer no, and generate no update
- If the answer was yes, issue a conditional-http get (assuming the feed supports that). If that answers no new stuff, the thread ends
- If there is new stuff, we now have an HTTP response object (consuming memory)
- We now parse the XML we got into an XML doc (consuming memory)
- We now do an XML to Object conversion, creating a set of items for the feed (consuming memory)
- Those items are merged into the existing feed, and this thread terminates
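In code, one of those per-feed update threads amounts to roughly the following sketch (the selectors are illustrative stand-ins, not BottomFeeder's actual API):

[feed isDueForUpdate ifTrue:
	[| response doc items |
	response := feed issueConditionalGet.	"nil if the server says nothing changed"
	response isNil ifFalse:
		[doc := feed parseXmlFrom: response.	"HTTP response -> XML document"
		items := feed itemsFromDocument: doc.	"XML -> feed item objects"
		feed mergeNewItems: items]]] fork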
That's a noticeable amount of memory for each update that comes back from the server, although much of it is transient objects - i.e., objects that should never make it into Old Space. In older revs of Bf, I didn't have New Space set big enough, and memory usage tended to grow after the first set of updates, and stabilize high - too many objects tenured. I now have that handled, at least for subscription sizes in the neighborhood of mine and smaller. For 5100 though? I suspect that a lot of tenuring is happening, and is subsequently slamming into the hard upper limit on memory - not to mention that the persistent objects likely consume enough space to stress the default upper bound.
So let's return to the example - 5100 threads answer back with well over 1000 responses, generating an equal number of XML documents, and a large number of items. That likely stresses both the size of New Space and the default memory bound - causing a scaling problem. Note how threads - which are very, very cheap in Smalltalk - don't really enter into it. Now, creating a thread pool for a large number of feeds might cut down on some of the CPU spiking, but it's not really an issue in the problem at hand.
In the case at hand, setting the upper bound of memory up higher - to something like 400 or 500 MB - would allow Bf to handle that number of subscriptions. The hard part is the tenuring. While Old Space size limits can be configured on the fly, New Space and the Survivor Spaces can't be - to change them you have to save the image and restart (something that Bf does not do in its deployed state). I need to ask our VM guys how feasible it would be to allow runtime resizing of those.
In any case, a thread pool isn't really the issue here. Having said that, I created one as an optional thing for BottomFeeder last night. I noted that a commenter pointed out that Java developers didn't need to roll one, since they can just download one. That's as may be, but I was able to create one in about an hour while most of my attention was on "Desperate Housewives" (a true guilty pleasure if ever there was one). A general thread pool would have been more trouble anyway - I created a standalone pool implementation first and discovered that the hard way :)
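For the curious, a green-thread pool of the sort described can be sketched in a few lines with a SharedQueue (the worker count and the feed selector are placeholders, and a real pool would also need a way to shut the workers down):

| queue |
queue := SharedQueue new.
"Ten worker processes, each pulling blocks of work off the shared queue"
10 timesRepeat:
	[[[true] whileTrue: [queue next value]]
		forkAt: Processor userBackgroundPriority].
"Queue one block of work per feed instead of forking one process per feed"
feeds do: [:each | queue nextPut: [each checkForUpdates]]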
With my first post of the day coming after 3:00 (US EST time), it looks like I've been working all day. Nothing could be further from the truth :) I bought the latest Turtledove book, "Homeward Bound". It's part of the World War/Colonization train he was on before the civil war/WWI extravaganza, and I'd been waiting for it. Action fans will be somewhat disappointed; it's not filled with the sounds of battle like the previous ones were. Nevertheless, it's been interesting, and it kept me up until 4 AM last night...
Tim Bray explains some of the weird referer spam we've all been seeing:
Near as I can tell, pretty well every somewhat-visible website in the world is seeing its logfiles fill up with bogus page fetches there only as a vehicle for a spammish "referrer" field; whether or not the site posts referrer data. This high-volume flood is a fairly recent phenomenon, and what makes it weird is that the vast majority of the bogus referrer sites are off the air due to some terms-of-service violation. It would appear that a sleazebag somewhere launched a really ambitious assault on the whole world -- using, I can only assume, a few zillion zombified drone machines -- only to be found out and have their hosting yanked while their mindless slaves continue to spew vacuous venom into logfiles everywhere. Damn, the Internet is a weird place.
I ran across an interesting requirement today - I needed to enable/disable individual settings in the BottomFeeder settings tool. That tool is simply a user of the new settings framework created by Vassili. In a typical VW interface, enabling/disabling widgets isn't hard - you do something like this:
(self builder componentAt: #idOfWidgetHere) enable.
It gets trickier when the widget is in a subcanvas - the builder used to create the subcanvas isn't typically kept, and the widgets in it aren't tracked by the builder in the outer UI. Typically, one solves that by grabbing a copy of the builder used for the subcanvas and caching it. Each page of the settings tool is a subcanvas, so I had to accomplish the same thing. Here's where I had to do some digging. In class SettingsManager, there's a method that swaps the pages:
pageSelected
	| page tree |
	page := self pageListHolder selection.
	page isNil ifTrue: [^self].
	tree := self pageListHolder list.
	((tree isExpandable: page) and: [(tree isExpanded: page) not])
		ifTrue: [tree expand: self pageListHolder selectionIndex].
	(self widgetAt: #GroupBox) label:
		((TreeViewIndentedLabelAndIcon with: page label) icon: page listIcon).
	"This is the only kind of label with a sane icon positioning."
	page builder: nil.
	(self widgetAt: #Subcanvas)
		client: page
		spec: page spec
		builder: self builder newSubBuilder.
	self updatePageSubcanvasScrollbar
The actual swapping happens in the #client:spec:builder: message send. So, to do what I needed to do, I changed that section as follows by overriding that method in my subclass:
newBuilder := self builder newSubBuilder.
(self widgetAt: #Subcanvas)
	client: page
	spec: page spec
	builder: newBuilder.
self updatePageSubcanvasScrollbar.
self possiblyAdjustEnablementFor: newBuilder onPage: page
By holding a reference to the builder, I can call out to a new method and check which page we are on - and also check the applicable application state. From there, I can grab the relevant widgets and enable/disable them. Here's what that method looks like:
possiblyAdjustEnablementFor: newBuilder onPage: page
	"if we are in certain application states, disable some settings"
	RSS.RSSFeedManager default isUpdating ifFalse: [^self].
	page id = #(#rss #network) ifFalse: [^self].
	"This grabs each wrapper for the thread options and disables them if we are in the update cycle"
	#(#'-rss-network-runThreadedUpdates' #'-rss-network-shouldSpreadUpdates' #'-rss-network-shouldThrottleThreads')
		do: [:each | (newBuilder componentAt: each) disable]
Those ID's are not pretty - they are manufactured by the settings framework, and I figured out what they were with a breakpoint and an inspector. In any case, the method checks the application state and the page, and then toggles the appropriate settings based on those. In this case, I'm making sure that the threading/non-threading of updates cannot be mucked with while the update process is awake and running. In general, it looks like it's easy enough for any application to enable/disable settings options via this strategy.
aPage when: version valueHolder valueSatisfies: [:v | v = #latest] enable: level
the first argument is a value model. Whenever it changes, the block passed as the second argument runs and the module passed as the third argument is enabled or disabled depending on whether the block returns true or false.
As usual with Vassili's designs, it's simple and makes sense.
Here's an interesting article on usability concerns. In looking at a US Air Force document from 1986 on creating good user interfaces, they discovered that most - 70% - of the guidelines were still applicable. Things haven't changed as much as we might have thought - this document was focused on the development of "green screen" mainframe applications. The key point is here:
You would be hard-pressed to find any other Air Force technical manual from 1986 that's 70% correct and relevant today. Whether for pilots, airplane engineers, or programmers, general lessons of the past might continue to apply, but the specific guidelines changed long ago.
Usability guidelines endure because they depend on human behavior, which changes very slowly, if at all. What was difficult for users twenty years ago continues to be difficult today. People can only remember so many things, and we don't get any smarter.
No matter how spiffy we think UIs are now compared to then, the basics really haven't changed that much.
There's a new e-book out on RSS for marketers: "Unleash the Marketing and Publishing Power of RSS" by Rok Hrastnik. I was interviewed for the book - you can buy the book here:
It's a pretty good summary of what RSS is and how it works, but not at the technical implementors level. This is aimed at marketing and sales folks who would like to know what the buzz about syndication is about, and how it can be used.
Darren Hobbs asks about threading strategies in Smalltalk relative to the ones he knows in Java:
For example with the latest release of Java you might use the nonblocking IO library and one thread per cpu, keeping all the cpu's maxed out without thread-switching, so your only bottleneck is available memory (and browser timeouts). Is there an idiomatic smalltalk equivalent?
My usual tactic for maximising throughput is to minimize the number of thread/processes running, and only hand off to another process when you would otherwise have blocked, for example on IO. Nonblocking IO support in java allows me to use just one thread for all an application's network IO, and keep the rest of the threads as busy as bandwidth allows. I don't know how to do this in smalltalk, nor even if I'm worrying about the right problem in smalltalk
The primary difference (at least with respect to VisualWorks) is the nature of threads in the language. In Java, threads (usually) map directly to OS level threads. That means that you have to manage them carefully; create too many and you'll just spike all available CPUs. In VisualWorks, threads are lightweight (green) threads within the context of a single heavyweight Smalltalk process. That makes them nearly free from an overhead standpoint.
In the context of something like a web server, this is especially true - the server I host the twenty Cincom Smalltalk blogs on is running the VisualWorks web application server, and handles a decent amount of traffic. Here's how things work in the server itself:
There's a listener process - this one sits on the application server's inbound port, waiting for connections. It runs at a high priority (70, where the scale in VW runs from 1 to 100). As connections come in, the server determines who (i.e., which web application) should handle each one. As soon as it determines that, it forks off a Smalltalk process at 50 to handle that connection, and then goes back to listening. This scales pretty well; we have yet to see any kind of issue in this server, and many of our customers handle much higher loads than we do.
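Schematically, that listener loop looks something like this (the socket and dispatch selectors are stand-ins rather than the application server's real code; the priorities are the ones just mentioned):

"High priority listener: accept a connection, hand it off to a worker process, and go straight back to listening"
[[true] whileTrue:
	[| connection |
	connection := listeningSocket accept.
	[self dispatch: connection] forkAt: 50]] forkAt: 70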
Now, when I say "threading in Smalltalk", I have to be careful which Smalltalk I'm talking about. Cincom Smalltalk consists of VisualWorks and ObjectStudio - ObjectStudio maps Smalltalk processes to native threads. In VisualWorks, we can use native threads when spinning up an external API call - so we can make sure that database calls (for example) don't block the VM (important for a server!). Native threads have their own issues; when engineering ported Opentalk from VW to ObjectStudio, they ran into a few problems:
- As a developer, you have far less control over native threads than you do over green threads
- minimal ability to set priority
- if a native thread crashes, it can take down the entire system - a green thread crash can be handled easily via standard exception handling
Other Smalltalk implementations vary as well. Squeak and VisualAge are like VW - green threads for Smalltalk processes, and in VA, the ability to map an external API call to a native thread. Smalltalk MT, like ObjectStudio, maps Smalltalk processes to native threads. I'm not entirely certain what Dolphin does; I think it uses green threads. In any case, the point is that it varies by implementation.
Dare Obasanjo reports that MyMSN now supports RSS and Atom - looks like users have a fair bit of control over what they can put out as well. There's another effect here as well - now that Google and MS are both supporting Atom 0.3 format, the eventual release of Atom 1.0 becomes pretty much irrelevant. This is a correction from earlier - I misread the post by Dare.
If you've tried to post a comment via the web form with Safari, it's likely the case that you got a silent failure. I added a comment throttle awhile back to help prevent spam - the trouble is, it looks like a submit from Safari results in the server getting 2 POSTs within a few seconds of each other - I have no idea why. Very, very odd...
Update: This has a perfectly good explanation that I should have thought of - the first post was a preview, the second the actual post. Both hit the server as a post, of course. When I implemented the comment throttle, I stupidly didn't take the web form preview into account. That's been addressed; it all ought to work now.
Slashdot is reporting that "Enterprise" is on the ropes. Hmm - could that have something to do with:
- The generally lame plot-lines they've pursued? (ooh, Vulcan emotions. No one's explored that topic before)
- The non-serious way they've dealt with enemies? "Look, those guys are trying to kill us. Set phasers on Stun".
- The completely derivative nature of last season's Xindi plot (look, a sneak attack on Earth using terror tactics! That couldn't possibly be our commentary on the post 9/11 world!)
I've been frustrated with the Star Trek universe for years. Paramount needs to send Berman and anyone who looks like Berman on a permanent vacation, and find a real set of writers. For starters, they could look at some of the alumni from SG-1...
This is very good news - Google is getting out front in the fight against comment spam. I'm going to have to look at supporting this in my server:
If you're a blogger (or a blog reader), you're painfully familiar with people who try to raise their own websites' search engine rankings by submitting linked blog comments like "Visit my discount pharmaceuticals site." This is called comment spam, we don't like it either, and we've been testing a new tag that blocks it. From now on, when Google sees the attribute (rel="nofollow") on hyperlinks, those links won't get any credit when we rank websites in our search results. This isn't a negative vote for the site where the comment was posted; it's just a way to make sure that spammers get no benefit from abusing public areas like blog comments, trackbacks, and referrer lists.
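Supporting it server-side is mostly a matter of emitting the attribute wherever commenter-supplied links get rendered; a minimal sketch in Smalltalk (the comment object and its selectors are hypothetical):

renderLinkFor: aComment on: aStream
	"Emit the commenter's link with a rel=nofollow attribute so it earns no page rank credit"
	aStream
		nextPutAll: '<a href="';
		nextPutAll: aComment url;
		nextPutAll: '" rel="nofollow">';
		nextPutAll: aComment linkText;
		nextPutAll: '</a>'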
I was getting very, very tired of fixing the wiki pages on the uiuc VW Wiki. Apparently, they have a fix that they plan to implement - in the meantime, something like 100 pages got spammed repeatedly yesterday. That was annoying to clean up. However, there's a simple way to revert a Wiki page back to a former version. It was a manual process to gather all of the pages along with their last good version, but now I have that. I think - based on the time interval - that the spammer is running a script that edits a set of known pages and inserts his set of bozo links. Well, now I have a script that simply resets all of them. Check out the Recent Changes page - see how closely bunched all of those changes from one system (mine) are? Fight fire with water, I always say...
I've started to doubt that the noFollow thing will put a serious dent in spam. Don't get me wrong - having the ability to mark a link as ineligible for crawling is useful in and of itself, and will at least help with bogus page-rank. However, it's not going to fix the problem itself.
Look at the strategy behind email spam - the cost of sending the email approaches zero, so the spammer only needs a miniscule response rate to make it worth their while. This is in contrast to direct mail, where there are actual costs involved. Over time, sending mails back to people who never buy anything has a real cost - doing the same with email is virtually cost-free. Sadly, the same thing applies to spam in comments, referer lists, and wiki pages. Page rank is a nice bonus for these people, but it's not really the driver. With no actual costs, they can keep pounding away, happy to receive whatever tiny percentage of click-throughs they get.
For an example, just look at the spam pounding that the UIUC VW Wiki has been getting. Yesterday, more than 100 pages were defaced, most likely by a script. They ended up getting restored quickly (see my previous post), but over the course of the day yesterday there were 3 mass defacings. None of them stayed up long enough to guarantee a bot crawl - but they were up long enough to get found, and for some small percentage of people to - gosh knows why - click through.
While this is a good and useful idea, I don't think that the level of celebration ringing through the blogosphere is warranted - the impact of this will be minimal at best.
Update: a few posts along similar lines of thought:
Sean McGrath has an absolutely priceless commentary on the optional typing hullabaloo over in Python-Land.
I see posts like this and I realize how small the impact of Smalltalk and Lisp has been on most developers. Have a look below:
At one time or another, every programmer has imagined what it would be like to work directly with the deep structure of code. Some of the best minds in the business are working to make that happen. The legendary Charles Simonyi, who left Microsoft a couple of years ago to pursue his vision of intentional programming, says deep structure is at the core of the toolset his new company, Intentional Software, is building. Sergey Dmitriev shares Simonyi's vision, and his company -- JetBrains, creator of IntelliJ IDEA -- wants to do something similar with its next-generation toolset. These projects are still under wraps, but another champion of deep structure is working out in the open. Jonathan Edwards, currently a visiting engineer with MIT's Software Design Group, has built a prototype system that he is demonstrating in a screencast at subtextual.org.
There are big ideas at work here. In Edwards' prototype, programming, testing, and debugging are just different ways of interacting with a program's tree structure. Edwards' 2004 OOPSLA paper, Example-centric programming, explores one of the benefits of this arrangement: the examples (or "use cases") that drive program design are worked out in the context of the living and evolving program. We've all heard this stuff before. I may yet go to my grave without emacs ever having been pried from my cold dead fingers. But it's worth pondering, now and then, what we could do with tools that didn't think of programs as strings of text. Full story at InfoWorld.com.
You know, it's not like this stuff doesn't already exist. I realize that it's likely news to Gosling, who seems to think that he's the first one to have ever thought of parse trees. As for Simonyi, I think the phrase "Hungarian Notation" tells me everything I need to know about what his vision would look like... almost certainly a place I'd have no interest in going.
Seriously though, live code isn't a new thing at all - Lisp has been around a long time, and so has Smalltalk. Everything that is talked about above - it's been out there, implemented and in use for (literally) decades now. I guess that - for lots of developers - if it doesn't use curly braces and semi-colons, it simply doesn't exist...
After a few comments in this thread, followed by this post, I thought I should address truly large feed subscriptions in BottomFeeder. By default, I set a memory ceiling of 128MB in Bf - that was an attempt to make it a sociable client that wouldn't grab memory like a sociopath on your desktop :) Now, that's resettable - you can set that figure higher in Settings>>Memory. If you really want to read a few thousand feeds - then I'd make the following settings changes:
- Memory: set the soft limit to somewhere around 300 MB, and the hard limit to 500 MB or above
- Turn threaded updates on
- Turn "Limit the number of threads during update" on
The latter will cut down on how many concurrent xml docs stack up for parsing at once during update cycles. The lower limits are kind of tuned for what I consider "normal" usage - a few hundred subscriptions at most.