Finding References
I was inspired by this post from Jon Udell, where he used XQuery to walk through his feeds and find all the items that made a reference to Amazon - which is a pretty good approximation for all the posts that reference a book. Well, I'd much rather play with Smalltalk than with XQuery, and - as it happens - I have a lot of the development system available to me in the BottomFeeder runtime. So - I opened a workspace (off the System menu) and ran the following script:
| amazonCollection stream matches |
"Feeds with the most matching items sort first."
amazonCollection := SortedCollection new sortBlock: [:a :b | a value size > b value size].
"For each feed, keep the items whose description contains a link to Amazon."
RSSFeedManager default getAllMyFeeds do:
    [:each | | items |
    items := each allItems.
    matches := items select:
        [:eachItem | | desc |
        desc := eachItem description.
        desc
            ifNil: [false]
            ifNotNil: ['*href*amazon*' match: desc]].
    matches notEmpty ifTrue: [amazonCollection add: each->matches]].
"Write the report as an HTML table: one row per feed, with numbered links to the matching items."
stream := WriteStream on: (String new: 10000).
stream nextPutAll: '<p><b>Amazon Reference Report</b></p>'; cr; cr.
stream nextPutAll: '<table width="100%">'; cr; cr.
amazonCollection do: [:each |
    | key value |
    key := each key.
    value := each value.
    "Open the row inside the loop so no empty <tr> is left dangling before </table>."
    stream nextPutAll: '<tr>'; cr.
    stream nextPutAll: '<td><a href="', key link, '">', key title, '</a></td>'; cr.
    stream nextPutAll: '<td>'.
    stream nextPutAll: value size printString.
    stream nextPutAll: ' ('.
    1 to: value size do: [:cnt | | each1 |
        each1 := value at: cnt.
        each1 getMyLink isNil
            ifFalse: [stream nextPutAll: '<a href="', each1 getMyLink, '">', cnt printString, '</a>'].
        cnt = value size
            ifFalse: [stream nextPutAll: ', ']].
    stream nextPutAll: ')'.
    stream nextPutAll: '</td></tr>'; cr].
stream nextPutAll: '</table>'; cr.
^stream contents
By inspecting the results, I got a ready-to-post chunk of HTML, which I pasted below. The nice thing is, I already have objects for all the feeds and items - and those objects have a nice rich API for me to exploit. Since I have access to tools like inspectors and workspaces (no browser or debugger though), I can do this in the application, with all of the application data at my fingertips. So, here's the output from my 250 feeds - all of the items in all of my feeds that make a reference to Amazon, complete with links to the matching feeds and items:
Amazon Reference Report


Comments
Cool!
[Jon Udell] February 19, 2005 21:57:03.000
If you also restrict the matches using this regexp:
d{b9,9}d[dX]
it's a more accurate filter for book references.
Does BottomFeeder understand feeds in terms of an HTML DOM-like thingy or an XML DOM-like thing?
- Jon
Finding References
[James Robertson] February 19, 2005 23:07:17.000
Jon,
Actually, BottomFeeder doesn't maintain any of the feed/item information in XML form - once the content comes down, I parse it into objects and do all other operations on the objects. So, if I was going to do additional matching it would have to be against the object model.
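A rough sketch of what that extra check could look like against the item objects, without relying on any regex package. It assumes the pattern Jon has in mind is a 10-character ISBN (nine digits followed by a digit or X). The looksLikeIsbn block and the sample string are made up for illustration; they are not part of BottomFeeder.

```smalltalk
| looksLikeIsbn |
"Hypothetical helper: answers true if aString contains nine digits
 in a row immediately followed by a digit or an uppercase X."
looksLikeIsbn := [:aString | | run found |
    run := 0.        "length of the digit run ending just before the current character"
    found := false.
    aString do: [:ch |
        (run >= 9 and: [ch isDigit or: [ch = $X]])
            ifTrue: [found := true].
        run := ch isDigit ifTrue: [run + 1] ifFalse: [0]].
    found].

"Made-up sample input, evaluated in a workspace."
looksLikeIsbn value: 'ASIN/020161622X'
```

In the script from the post, the select: block could then answer ('*href*amazon*' match: desc) and: [looksLikeIsbn value: desc] in place of the plain match: test.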
[giorgio ferraris] February 20, 2005 8:43:06.000
james,
With a longer amazonCollection, it would probably be better to use a standard OrderedCollection and then convert it to a SortedCollection at the end, using asSortedCollection: with the sort block, because each add: can become time consuming (it does an ordering at every entry). The addAll: method used for that conversion, as implemented on SortedCollection, is specialized for speed.
Ciao
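The suggestion above could be sketched like this, with stand-in associations in place of real feed->matches pairs (the strings and arrays are placeholders, not BottomFeeder objects):

```smalltalk
| pairs sorted |
"Collect the associations unsorted first; adding to an OrderedCollection is cheap."
pairs := OrderedCollection new.
pairs
    add: 'feed A' -> #(1 2);
    add: 'feed B' -> #(1 2 3 4);
    add: 'feed C' -> #(1).

"Sort once at the end, largest number of matches first."
sorted := pairs asSortedCollection: [:a :b | a value size > b value size].
sorted first key
```

In the script from the post, that would mean starting with amazonCollection := OrderedCollection new and converting it with asSortedCollection: just before the report is written.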
Use PubSub.com to find references to Amazon.com
[Bob Wyman] February 22, 2005 15:33:26.739
If you want to generate a feed with many references to Amazon.com, come to PubSub.com and create a weblogs subscription with the following query:
URI:amazon.com and not SOURCE:amazon.com
This will cause all posts which reference a URI on the amazon.com site to be inserted into your feed. The "not SOURCE:amazon.com" predicate will prevent inclusion of any posts that may have originated from the amazon site.
Note: Use the weblogs-focused search form at: http://www.pubsub.com/weblogs.php to create this subscription.
To make extracting URI references easier for script writers, you'll find that we augment the postings that we publish by explicitly extracting the URIs that are linked to. You'll find these in the posting in a format something like:
I hope this is helpful.
bob wyman