A while back, Bob mentioned some scraping tools he created for use with BottomFeeder. I had a look at them today because I wanted a subscription to User Friendly. I loaded Bob's code (Simple Script Runner, from the public Store) and decided that it would be more useful with some SAX drivers attached, so I created a new bundle, RSSScriptRunner, that includes a few. I'm planning to enhance this little package, but in the meantime the following script produces a valid feed with today's User Friendly comic in my local Bf directory:
| writer content str rest out contentBlock |

"Fill in the item fields from the scraped chunk of HTML."
contentBlock := [:builder :chunk | | stream |
    stream := ReadStream on: chunk.
    stream throughAll: 'SRC="'.
    builder link: (stream upTo: $").
    builder title: 'User Friendly For: ', Date today printString.
    builder description: '<a href="', chunk.
    builder pubDate: Timestamp now].

"Open the output file and write the channel header."
out := 'userFriendly.xml' asFilename writeStream.
writer := RSS20_SAXWriter new output: out.
writer prolog.
writer startRSS.
writer startChannel.
writer title: 'User Friendly Feed'.
writer link: 'http://www.userfriendly.org/'.
writer description: 'User Friendly Feed'.
writer pubDate: Timestamp now.

"Write the single item for the current cartoon."
writer startItem.
writer title: 'User Friendly For: ', Date today printString.

"Fetch the front page and extract the chunk after the CARTOON FOR marker, through the closing </A>."
content := 'http://www.userfriendly.org/' asURI valueStream contents.
str := content readStream.
str throughAll: 'CARTOON FOR'.
str upToAll: 'href="'.
rest := str throughAll: '</A>'.
contentBlock value: writer value: rest.

"Close the open elements and the file."
writer endItem.
writer endChannel.
writer endRSS.
out close.
Works like a charm.
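For anyone who wants to try the scanning step on its own, here is a minimal workspace sketch of how the block in the script above pulls the image URL out of a chunk of markup. The sample HTML and the example.com URL are made up for illustration; the real page's markup is not shown here and may differ. Only the stream messages already used in the script (throughAll: and upTo:) appear.

```smalltalk
| html stream imageUrl |

"Hypothetical sample markup, standing in for the chunk scraped from the page."
html := '<A href="/cartoons/today.html"><IMG SRC="http://example.com/strip.gif"></A>'.

stream := ReadStream on: html.

"Skip everything up to and including the start of the SRC attribute value."
stream throughAll: 'SRC="'.

"Read up to, but not including, the closing double quote."
imageUrl := stream upTo: $".

"With the sample above this shows http://example.com/strip.gif"
Transcript show: imageUrl; cr.
```

Because the scan keys on the literal text SRC=" in upper case, a change in the page's attribute casing or quoting would make throughAll: run to the end of the stream, leaving an empty link. Checking imageUrl isEmpty before writing the item would be one way to catch that.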