It's a disaster of epic proportions - I can't get to the Dilbert site, and the feed hasn't been updated....
On the Atom mailing list, there's been a lot of talk recently about what should/should not be done with malformed feeds. The answer probably differs based on context:
- In a b2b context, you probably want to reject malformed XML data. This isn't an appropriate place to make a "best guess" and move along
- In a consumer context (i.e., the one most news aggregators live in), it's reasonable to flag the bad data (so that a user who cares can report it) and try to present it anyway.
The difference is context - if it's a business level communication, then guessing isn't appropriate. If, on the other hand, I'm trying to find out what the latest baseball scores are, then I don't really care about the stray Unicode character that wandered into a feed.
The truly interesting piece is the stats that Mark Pilgrim dug up:
I analyzed 5096 RSS and Atom feeds chosen at random from Syndic8.com and parsed them with Universal Feed Parser 3.0.1 using the latest version of libxml2 as the underlying XML parser.
Actually, I analyzed more feeds than that, but I threw away feeds that
- didn't either return an HTTP status code 200 or redirect to a URL that returned 200, or
- didn't have a recognizable root-level element of some version of RSS or Atom
- 3929 feeds (77.10%) were well-formed.
- 961 feeds (18.86%) were not well-formed due to specifying "Content-Type: text/xml" but containing non-us-ascii characters.
- 206 feeds (4.04%) were not well-formed for other reasons.
Nearly a quarter of the feeds chosen (and likely this holds across all feeds) have issues - and they have issues that a tighter spec is not going to solve. We've crossed the Rubicon on this one, at least in the consumer space....
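The "nearly a quarter" figure is easy to check from the counts quoted above. A quick sketch in Python; the variable names are mine, the numbers are straight from the quote:

```python
# Counts as quoted from Mark Pilgrim's analysis above.
total = 5096
well_formed = 3929
bad_charset = 961   # text/xml with non-us-ascii characters
bad_other = 206     # not well-formed for other reasons

# The three buckets should account for every feed analyzed.
assert well_formed + bad_charset + bad_other == total


def pct(n):
    """Share of the total, as a percentage rounded to two places."""
    return round(100 * n / total, 2)


not_well_formed = bad_charset + bad_other
print(pct(well_formed), pct(bad_charset), pct(bad_other), pct(not_well_formed))
```

The two "not well-formed" buckets together come to 1167 of the 5096 feeds, which is where "nearly a quarter" comes from.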
Maybe the next big health care crisis won't involve humans at all: The Register reports that PCs spend more time on "sick leave" than people do:
British PCs are taking an average of nine sick days a year due to spam and viral attack, Yahoo! claims. This is two more days than the typical Brit spends at home as a result of illness/injury/Euro 2004-induced hangover.
So when will they start taking "mental health" days?
I decided to figure out how many downloads BottomFeeder gets from the server, so I had a look at the apache logs. I was pleasantly surprised to see that I'm averaging over 100 downloads of the install files (for all platforms) per day. I've also determined that Mac penetration is higher than most people seem to think - 1/7th of the Bf downloads are for the Mac platform. Maybe I should set up a mailing list :)
In a discussion of "next gen" development features, Wenser Moise makes an odd Smalltalk reference:
I think a major cause of the delay in this revolution is that both C/C++ relied on preprocessors and headers. Some historical languages like Smalltalk actually had this support. Fortunately, more modern languages like C#, Java and VB are standalone files, one class per file, with a little or no preprocessing support. This enables easy parsing.
Historical language? Heh. He needs to have a look here. Source code in a db? Yawn, been there, done that - for years. I'm not sure about his parse node thing though. I just don't see direct representation being more meaningful than text....
Greg Reinacker - the guy behind NewsGator - just got VC backing. The aggregator space just got more interesting....
- Memory allocation
Because memory allocation is new's primary function, it is hard to see how allocating memory from the heap could be a problem. But sometimes you want to allocate memory from somewhere other than the heap, and at other times you don't want to allocate memory at all. In other words, you may want an object, but you may not want to create one, at least not here, not now.
Now, in a language like Smalltalk, this isn't necessarily a problem - you can create your own version of #new and have it do what you want (create a new instance, manage some pool of instances by issuing one, return a singleton... you get the idea). As to where in memory the object is created - well, I figure the GC sub-system is going to do a better job with that than I am, so if I get a newly allocated object, I'm happy to let the system handle it. However - I can manage what #new does. The author of this paper goes through a number of examples of the issues in Java; it's worth reading. My little discussion of #new in Smalltalk gets into his second concern: polymorphism:
Turning to the second problem: new is famously nonpolymorphic. When I write new Rectangle(), I always get a Rectangle, never anything else, and in particular never a subclass of Rectangle. Much of the time this is what I want, but sometimes it isn't; for example:
A framework where I request an implementation by name or other attribute, and am given an instance of the appropriate class. Some of the classes in the java.security package, like Signature, work this way. Or imagine a data structure wizard that lets a client specify the desired properties of a data structure (for example, constant average-time access, no duplicates, ordering not required) and returns an implementation satisfying the criteria (HashSet, for instance).
Well, #new in Smalltalk is, of course, polymorphic. A good example of Factory usage in VisualWorks is the class SocketAccessor - you send instance creation messages to that class, but you actually get a platform specific subclass. It's not that you can't do this in Java; it's that you can't do it in the context of new(). What this does is add a layer of cruft, whereby you have to remember "oh yeah, I can't do that here...". To quote another section and reveal another issue:
The problem with static factory methods is that they are static; strictly speaking, a call to one is not polymorphic. That is, when you call Signature.getInstance you are always invoking the same code. Although its strategy for returning an appropriate subclass may be clever and complex, there is no way to use a different strategy.
The obvious antidote - indeed, the only other possibility - is a nonstatic factory method. In other words, you call a method on an object, and you get back an object (possibly a new one). Since instance (that is, nonstatic) method calls can be polymorphic, creation can be as well. There are a variety of ways in which this basic mechanism can be exploited.
Here the problem is that in Java, classes aren't actually objects - you can't add class methods in the same sense that you can in languages like Smalltalk. Again, this adds a layer of cruft that developers have to work around. A lot of what's problematic in Java could be fixed by making classes into actual objects, and getting rid of keywords and adding methods (new() as opposed to #new). That's not going to happen anytime soon, so Java developers will just have to keep climbing the same piles of cruft over and over again...
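For anyone without a Smalltalk image handy, here's a rough sketch of the same idea in Python, where classes are also objects and instance creation can be overridden. The class names are hypothetical stand-ins borrowed loosely from the SocketAccessor example; this is not how VisualWorks actually implements it.

```python
import sys


class SocketAccessor:
    """Hypothetical class whose creation call answers a platform subclass."""

    def __new__(cls, *args, **kwargs):
        # Only redirect when the general class itself is asked for an instance;
        # asking a subclass directly still gives you that subclass.
        if cls is SocketAccessor:
            cls = cls.platform_class()
        return super().__new__(cls)

    @classmethod
    def platform_class(cls):
        # A class-side method: the selection strategy is ordinary code
        # that could be overridden or replaced.
        if sys.platform.startswith("win"):
            return WindowsSocketAccessor
        return UnixSocketAccessor


class UnixSocketAccessor(SocketAccessor):
    pass


class WindowsSocketAccessor(SocketAccessor):
    pass


# The caller writes the "plain" creation expression and gets a subclass back.
accessor = SocketAccessor()
```

The caller never has to remember a separate factory entry point; the ordinary creation expression is the factory.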
Now, compare that with this discussion over on the Squeak-dev list.
Security may, as Scoble says, be an industry-wide problem. It's a big one for MS though, bigger than for the rest of the industry. Why? Due to the dominance of MS operating systems, the eco-system available for viruses (worms, etc) to live in is huge. That means that the bad actors are going to target MS first, and everyone else second (if at all).
BBC reports that CERT has issued a warning against using IE until MS fixes a fairly nasty bug:
The net watchdog, the US Computer Emergency Response Center, and the net security monitor, the Internet Storm Center, have both issued warnings about the combined threat of compromised websites and browser loophole.
Cert said: "Users should be aware that any website, even those that may be trusted by the user, may be affected by this activity and thus contain potentially malicious code."
In its round-up of the threat the Internet Storm Center bluntly stated that users should if possible "use a browser other then MS Internet Explorer until the current vulnerabilities in MSIE are patched."
Lessig points to a mocked up lawsuit based on the Induce act that a bi-partisan consensus in the Congress wants to pass. If this thing passes, then a lot of our "fair use" rights will just flush straight down the drain. Lessig has been following this thing pretty closely; have a look at all his recent posts on this topic...
Cincom's owner and founder, Tom Nies, has been honored by Ernst and Young:
Cincom President Tom Nies was selected as the Ernst & Young Entrepreneur of the Year on 6-24-2004. A Joint Press Release will be issued, along with pictures of the event, and the special full page coverage on Tom and Cincom that is in the Ernst and Young Entrepreneur Showcase
I'll update this post with a link as soon as I have it.
Eric Winger explains how to make testing a nearly automatic part of your VisualWorks development. Very cool.
There are two new features coming up in BottomFeeder:
- Newspaper View - This is already in the dev stream. You can set this view globally or on a feed specific basis. If it's set globally, then you'll see all new (or all today's) items summarized into a single HTML view. If you select a folder, you'll get the same behavior for all child feeds in the folder. If there are too many items to summarize, the html view can be paged. If it's set on a feed level, you get this only for specific feeds with that property set
- Synchronization between running instances of Bf. I don't have this working yet, but I've got the basics of it written. I'll be testing it out when I get back to my office and have a second machine to test with.
There will also be a number of small enhancements here and there, including a much better algorithm for comparing inbound feed items with ones you already have - and this should finally be decoupled from the size of your saved items cache. Once I get the synchronization code working (and once Rich gets the doc up to date), the next release will be ready to go. With Newspaper view and synchronization, Bf will support pretty much the same feature set as the commercial aggregator choices - all while being cross platform and completely immune to any browser based attacks.
Check out the Smalltalk Central site and see what Smalltalk Doc is all about. Cool stuff.
No blogging for awhile today - I'm off to Universal Studios with my parents and daughter. In the meantime - have a look at this rant about GMail. I wondered about the lack of folders when I took my first look. I still have reservations about server based mail....
I switched to Firefox awhile back, and have been mostly pleased with it. Every so often, I follow a link and one of two things happens:
- Firefox pops up, and IE pops up
- Just IE pops up
Well, we're back from Universal - we went over to Islands of Adventure - and straight over to the Bilge Rats ride (a water ride guaranteed to get you wet). I went prepared - sandals and bathing suit. The lines for all of the newer rides were long - 60-75 minutes in most cases. On the other hand, all the older rides were pretty quick (especially over in the original park - no wait at all for "Earthquake", "Twister", or "Back to the Future"). Oddly enough, the "Hulk" coaster had a huge wait, while the "Dueling Dragons" one (further back in the park) was 10 minutes. Still can't get my daughter on any of the challenging coasters either. It was a full day - ended a bit early by a set of nasty t-storms that rolled through at 6:00 or so in the evening. We watched the fireworks from that all the way back to the house. One more day, and then it's back home.
Dave Buck points to a Russell Beattie post where Russell decries the complexity of Java APIs. The post is good, but make sure you read the comments. The Java developers who answer him unintentionally reinforce his point - all the while implying something like this: "Complexity is Good; it's how we know we have done software right..."
The Yankees are where they usually are, in first place, steadily opening up ground on the Red Sox. Even yesterday's shellacking by the Mets didn't matter - while the Yankees were busy losing 9-3, the Red Sox lost 9-1 to the Phillies. So what should we expect? The usual, of course:
- Yankees win the AL East
- Red Sox win the wild card, but lose in 7 games (to the Yankees, most likely)
Of course, it's always possible that they'll just choke in September :)
Postel's Law has two parts. This is something a lot of people don't want you to look at, they only want you to think about the first part, the part which XML says is not a great idea -- be liberal in what you accept. This tends to favor the big guys who have the resources to catch up, and then the chutzpah to throw a big fat hairball into the middle of the market, one that no one else can handle. Maybe Postel didn't live in a world where these big companies could create such big messes, but I have had to deal with them, many times in my career, and they usually end competitive markets. A lot of well-intentioned people in the syndication community don't have the benefit of this experience, and we may have to learn this once again for their benefit. I hope not.
I'm not so sure that being liberal in what you accept favors "the big guys". They are all using sealed, immovable XML frameworks - which means that they have to hack up their own parser to deal with things rejected by the library parser. I've mentioned before that I don't have to do that. See, Smalltalk doesn't have bad ideas like "final" classes. To deal with malformed XML, I simply subclassed the VW XML parser. In the methods where fatal errors were being raised due to bad characters (and such like), I simply have the parser "move on", and flag an error (so that the 3 people who care can report it). This required virtually no effort on my part, and I haven't so much as touched that parser code in 6 months (possibly more; it was awhile ago). The people who say that this is a lot of effort do so from the perspective of the kind of effort that is required with the mainstream tools. Look outside the mainstream a little and you'll find that the "hard" things just aren't as hard as you thought (not that VW is without flaws... like any other language/library platform, it has its own set of issues).
Bottom line - the "rules" for how to deal with malformed content are going to differ based on context. In a consumer context (most aggregator usage), you'll favor the accept almost everything idea. In a business context, you'll favor the reject anything malformed idea.
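Here's a minimal sketch of that context split in Python, using the standard library's ElementTree. To be clear, this is not the subclass-the-parser approach described above: the sketch swaps in a simpler "flag it, clean up, and retry" strategy, and the function name, the `strict` flag, and the sample feed are all made up for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Control characters that XML 1.0 does not allow in a document.
ILLEGAL_XML_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")


def parse_feed(text, strict):
    """Return (root, problems). In strict mode, malformed input raises."""
    try:
        return ET.fromstring(text), []
    except ET.ParseError as err:
        if strict:
            raise  # business context: reject, don't guess
        # Consumer context: flag the problem, strip the stray characters,
        # and try once more. If the retry also fails, the error propagates.
        cleaned = ILLEGAL_XML_CHARS.sub("", text)
        return ET.fromstring(cleaned), [str(err)]


# A feed with a stray control character in the title (hypothetical data).
bad_feed = "<rss><channel><title>Scores\x0b today</title></channel></rss>"
root, problems = parse_feed(bad_feed, strict=False)
```

With `strict=False` the caller gets a usable tree plus a list of problems to show to the user who cares; with `strict=True` the same input is simply rejected.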
The working vacation in Florida is just about over; my daughter and I head back to Maryland early tomorrow. My parents live in Melbourne Beach, and I have no desire to get up at 4:30 am - so we are heading over to a hotel for one night. I should be back in the office by 1:00 or so tomorrow. Next up - Australia in July!
This is good news - CS Lewis' "The Lion, the Witch and the Wardrobe" has started filming in New Zealand. I read the books as an adult, and I liked them a lot.
It's always a struggle to get back to work on a travel day. I arrived home about 1:30 today, but it didn't really feel like a work day (air travel at an early hour will do that to you). I expect I'll be back in the swing of things tomorrow.
So Sun is now playing the "up the version number to impress people" game.
Sun announced today the Beta 2 release of the J2SE 5.0 software development kit (JDK), which includes tools such as compilers and debuggers necessary for developing applets and applications and the Java Runtime Environment (JRE).
Not that this is new; ParcPlace jumped from VW 3.1 to VW 5i - and that was two pieces of marketing led silliness:
- The "5" was there because IBM was at version 5 of VAST - and they didn't want to look "behind" with a version 4
- The "i" was for "internet" - which was amusing all by itself, since at the time we didn't support any of the common net protocols (although we did have a browser plugin)
Lots of companies do this, and it makes me wonder - who do they think they are fooling? Does Sun think that people will forget that it's really 1.5? Like ParcPlace thought that the "i" was enough to convince people? Like many other software firms seem to think? This kind of thing is just silly.
Yeah, Bruce Tate is in Groundhog day alright:
Enter Ground Hog Day on a grander scale. Those of you who are old enough to remember have seen this before. A long time ago, when we thought we'd taken procedural programming as far as it would go, we hit that unrunnable rapid. People wrote books like the Mythical Man Month and Death March. So we innovated. Object oriented technology was born. We started playing with it in the '70s, with Small Talk. A few joined the band wagon, and used the technology to good effect. They are like the guides that led Mike and I to the dangerous river. And they had fun, and were incredibly productive. And like Mike and I, other lesser equipped paddlers in the industry followed enthusiastically, but they were in over their head. And they crashed and burned. They heard the roar (inheritance? CORBA? #define?) And they flipped, tried to recover, and swam. Some wailed, "This is truly a river that is too mighty for the layman to run. Let us build a sign, and a wall, to keep the public safely out. Let there be no more good people drown here."
Hey - there's a reason that the Smalltalk guys made it look easy (heck, CORBA is easy in Smalltalk, for goodness sake). The answers are explained here. Unfortunately for the rest of us, neither Sun nor Bruce have learned anything:
And that, finally, brings me to the point. We're in a similar place today. The early guides are putting in on the roaring river, and their kayaks have AOP (Aspect Oriented Programming) all over them. And a few of us are going to foolishly follow them down the river, and maybe even get killed. We'll hear that AOP really isn't a seventeen-foot rubber raft that is immune to capsize. But what happens next? Can that particular river be run? I, for one, think so.
Cue the next silver bullet chase music....
StepTalk is the official GNUstep scripting framework. It is more than a scripting framework with an illusion of single objective environment between objects of scriptable servers or applications. It is language independent, but the default scripting language is Smalltalk.
Car manufacturers have been stuffing new cars (especially luxury models) with a lot of new electronics - and they aren't always debugged:
And while some automakers force car owners to sign nondisclosure agreements to avoid bad publicity when their electronics go haywire, the Internet is abuzz with the horror stories.
A widely posted Associated Press story reported that the Thai minister of finance was trapped inside his BMW when a computer malfunction locked the car's doors and windows; a bystander had to break one of the car's windows with a sledgehammer to let him escape.
I wonder what software development tools are used for these kinds of systems - is it still the fiefdom of C?
I've been making progress on the synchronization feature to BottomFeeder. How does this work? Well, there are two ways you can use this:
- From Bf, execute a remote synch. This assumes that Bf is running on the remote system, and that port 8666 is accessible (HTTP) from where you are running Bf locally
- Since that may be problematic, you can also export a synch file and then get that file over to the remote system manually (FTP, floppy, whatever) and then load it up.
The synchronization data is a simple dictionary of your feeds and all the items that have been read. When you synchronize, all matching items in the local Bf will be marked read (as they were in the source). This is pretty cool! I'll have an update to the dev stream as soon as I get more testing done on it.
Sun has Jonathan Schwartz out blogging now - given the current "be careful what you say in public" atmosphere surrounding public companies, this is a very interesting decision. I have to give them credit for opening up this way.
The synchronization feature I talked about here is now in the dev stream updates for BottomFeeder. It was pretty easy to implement; the tough part will be end users having the ability to remotely access another Bf via port 8666 (HTTP). That's why you can export a synchronization file and then load it in.
There are a number of new features in the upcoming (3.6) release of BottomFeeder. I've also addressed a number of bugs that have been brought to my attention - here's the list of the major new stuff coming down the pike:
- Added "Newspaper view" to BottomFeeder. Users can view all new items (on feed/folder selection) in an html summary, or limit those summaries to individual feeds via property settings. The view summarizes all items (either all new or all today's) at the appropriate selection level. If there are too many items, a paging view is displayed
- Added remote synchronization. A running Bf can query another Bf via HTTP (on port 8666) for its state of items (read/unread). The local image will be appropriately updated. This may be limited by firewalls, so there is also a file import/export mechanism.
- Improved the "is this item new" algorithm significantly, and decoupled the logic from the size of the feed cache.
- Upgraded the Blog Poster with more wiki markup options
- Added the ability to toggle the spell checker on and off for the poster and for the comment tool
- If you browse a document link inside of Bf, Bf will now prompt you to download the file. Previously, Bf logged an error on such requests
- Image display in the HTML component has received more upgrading. This should be improved in the latest release
This keeps Bf even with or ahead of the rest of the aggregator field, including the commercial ones. The new release should be out fairly soon - so far as I know, Rich is mostly caught up on doc.
Windley points to some information about the relative levels of security on the campaign sites of Bush and Kerry - pointing out that there's an interesting tension between security and speed on something as temporally bound as a campaign website (which may map very well to sales/marketing campaigns in the business world). The main finding is that Bush's site seems to have a lot more potential security holes in it - color me not surprised after reading this:
And for those who evaluate a candidate's choice of operating systems when choosing their president, Smith's check showed that the Kerry site is housed on an Apache Web server running on a Red Hat Linux box. The Bush website is hosted on a Microsoft IIS 5.0 server and uses Microsoft's ASP.net.
IIS is a sure way to security holes....
There's more fun in the pop up space - at least one of the keylogging pieces of malware out there detects a large number of banking sites in IE, and logs keystrokes while you are on them. How many users of IE are getting waxed by this one, I wonder? See the TechRepublic story for details.
I'll be in Australia in July, and Cincom's guy in Australia - Andrew McNeil - has posted an announcement of my talk in Melbourne:
For the information of those in and around Melbourne, Australia -
Cincom will soon release VisualWorks 7.2.1 and ObjectStudio 6.9.1. Come and hear what the plans are for the next major release (due in November 2004), as well as Cincom's plans for the next few releases. James Robertson, Cincom Smalltalk Product Manager, will give a presentation and answer your questions. As time permits James will also talk about blogs, blogging, RSS, and the open source RSS/Atom news aggregator BottomFeeder.
Venue: CSBC Mediation centre Level 15, 350 Collins st Melbourne.
Citibank Building roughly half way Between Queen and Elizabeth st on the North side of Collins St. 350 is about the halfway point of Collins st Melb.
Date: Monday 19th July
Time: From 12:30
I'm looking forward to it!
Apparently, there are things we didn't know about Blaine Buxton - not only is he dating Barbie, but he's a long lost Australian :)
When I decided to support synchronization in BottomFeeder, I decided to do something really simple. The gist of the process is to transfer a synchronize definition from system 1 to system 2. There are two ways to do that:
- Export a file, and then import it into the other system
- Have the system here request a synch from a remote system (via HTTP)
In either case, a synchronize document is transferred, decoded, and applied. The document is simple, and it's not XML. Why? Well, an XML document would end up being bloated. The data being transferred is very simple - it's a dictionary:
- Key = Feed URL
- Value = OrderedCollection (guids)
The guids are the guids of the items that have been read - there's no need to transfer information about the unread items (either they are new in both places, or you have already read them in the requesting place - either way, no need to send that information). So how to format that? A simple text format - looks like this:
http://www.someUrlHere.com/feed.xml (guid1 guid2 ... ) http://www.someOtherUrlHere.com/feed.xml (guid1 guid2 ... )
So I decode that back into a dictionary, and grab all the local feeds. Then I match feed urls, and mark all local items with matching guids as read. That's it. It's simple, it's pretty fast, and it didn't require a lot of work. If a "standard" approach to this problem ever arises, I'll probably support it. For now, this works just fine between different instances of BottomFeeder.
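For illustration only, here's roughly what that round trip could look like in Python. This is not BottomFeeder's code (which is Smalltalk); the function names and the local-feed model (feed URL mapped to a dictionary of guid to read flag) are hypothetical, and the sketch assumes URLs and guids contain no whitespace or parentheses.

```python
import re

# One entry: a URL, then a parenthesised, space-separated list of guids.
ENTRY = re.compile(r"(\S+)\s*\(([^)]*)\)")


def encode(read_items):
    """Dictionary of feed URL -> list of read guids, as one line of text."""
    return " ".join(f"{url} ({' '.join(guids)})" for url, guids in read_items.items())


def decode(text):
    """Inverse of encode: back to a dictionary of feed URL -> list of guids."""
    return {url: guids.split() for url, guids in ENTRY.findall(text)}


def apply_sync(local_feeds, read_items):
    """Mark local items read when both the feed URL and the guid match."""
    for url, guids in read_items.items():
        items = local_feeds.get(url, {})  # feeds we don't have are skipped
        for guid in guids:
            if guid in items:
                items[guid] = True


# Usage: the remote side has read guid1 and guid2; locally we hold guid1 and guid9.
remote_text = encode({"http://www.someUrlHere.com/feed.xml": ["guid1", "guid2"]})
local = {"http://www.someUrlHere.com/feed.xml": {"guid1": False, "guid9": False}}
apply_sync(local, decode(remote_text))
```

Only read items travel over the wire, so unmatched local items (guid9 here) are simply left alone.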
ComputerWorld's Robert Mitchell talks about the security enhancements in XP SP2. It seems like they are at least moving in the right direction - the firewall will default to on, for instance. However, some of the security features just seem like hurdles. Take this:
To get an idea of just how far Microsoft's thinking has come, consider how SP2 handles .zip files. These can contain viruses that antivirus programs can't detect, so SP2 blocks all such attachments. To open a .zip file, the user must first save it to disk, then select it, bring up the Properties dialog box and click on an option to unlock the file. That makes handling of .zip files pretty inconvenient but safer.
How is that actually safer? As you unzip the file, either you have virus scanners that will catch bad things or you don't. Making it a pain in the butt to grab compressed files doesn't actually help. This reminds me of nothing so much as the Transportation (In)Security Agency's insistence that you take your PC out of its bag before it gets X-Rayed. News Flash - X-rays see through the bag. I'm so pleased to see that MS has gone for "in your face" perceived security, rather than the real thing.
Here's what they could do if they really wanted to improve things - make it possible to use Windows as a normal user rather than as a user with admin rights. I rarely use the root account on my Linux box - on Windows, I can't do much of anything unless I have admin rights. Since there's been zero reporting on that issue, I guess it's not going to be addressed soon...