BottomFeeder

Feed Scraping Tools

November 27, 2004 19:04:24.355

A while back, Bob mentioned some scraping tools he created for use with BottomFeeder. I had a look at them today because I'd like to have a subscription to User Friendly. I loaded Bob's code (Simple Script Runner, from the public Store) and decided that it would be more useful with some SAX drivers attached, so I created a new bundle, RSSScriptRunner, that includes a few. I'm planning to enhance this little package, but in the meantime the following script produces a valid feed with today's User Friendly comic in my local Bf directory:


| writer content str rest out contentBlock |
contentBlock := [:builder :chunk |
	| stream |
	stream := ReadStream on: chunk.
	stream throughAll: 'SRC="'.
	builder link: (stream upTo: $").
	builder title: 'User Friendly For: ', Date today printString.
	builder description: '<a href="', chunk.
	builder pubDate: Timestamp now].

out := 'userFriendly.xml' asFilename writeStream.
writer := RSS20_SAXWriter new output: out.
writer prolog.
writer startRSS.
writer startChannel.
writer title: 'User Friendly Feed'.
writer link: 'http://www.userfriendly.org/'.
writer description: 'User Friendly Feed'.
writer pubDate: Timestamp now.
writer startItem.
writer title: 'User Friendly For: ', Date today printString.
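"Fetch the User Friendly home page."
content := 'http://www.userfriendly.org/' asURI valueStream contents.
str := content readStream.
"Move ahead to the cartoon section, then to its link."
str throughAll: 'CARTOON FOR'.
str upToAll: 'href="'.
"Capture the markup through the closing anchor tag and hand it to contentBlock."
rest := str throughAll: '</A>'.
contentBlock value: writer value: rest.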
writer endItem.
writer endChannel.
writer endRSS.
out close.
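
For anyone who wants to try the scraping step without hitting the network, here is a minimal sketch of the same stream idiom that contentBlock uses to pull the image URL out of a chunk. The sample markup and the example.org URL are made up for illustration; the real page's HTML may well differ.

```smalltalk
"Hypothetical markup standing in for the chunk scraped from the page."
| sample stream url |
sample := '<A href="/cartoons/"><IMG SRC="http://example.org/strip.gif"></A>'.
stream := ReadStream on: sample.
"Skip past the opening quote of the SRC attribute."
stream throughAll: 'SRC="'.
"Read up to, but not including, the closing quote."
url := stream upTo: $".
Transcript show: url; cr.
```

Evaluated in a workspace, this should show http://example.org/strip.gif on the Transcript. If the page layout changes, the two search strings are the only things that need adjusting.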

Works like a charm.
