
Finding References

February 19, 2005 20:31:05.596

I was inspired by this post from Jon Udell, where he used XQuery to walk through his feeds and find all the items that made a reference to Amazon - which is a pretty good approximation for all the posts that reference a book. Well, I'd much rather play with Smalltalk than with XQuery, and - as it happens - I have a lot of the development system available to me in the BottomFeeder runtime. So - I opened a workspace (off the System menu) and ran the following script:


"Collect feed -> matchingItems pairs, feeds with the most matches first."
| amazonCollection stream |
amazonCollection := SortedCollection sortBlock: [:a :b | a value size > b value size].
RSSFeedManager default getAllMyFeeds do: 
	[:each | | matches |
		"Keep only the items whose description contains a link to Amazon."
		matches := each allItems select:
			[:eachItem | | desc |
				desc := eachItem description.
				desc notNil and: ['*href*amazon*' match: desc]].
		matches notEmpty ifTrue: [amazonCollection add: each -> matches]].

"Build the HTML report: one table row per feed."
stream := WriteStream on: (String new: 10000).
stream nextPutAll: '<p><b>Amazon Reference Report</b></p>'; cr; cr.
stream nextPutAll: '<table width="100%">'; cr; cr.
amazonCollection do: [:each |
	| feed items |
	feed := each key.
	items := each value.
	stream nextPutAll: '<tr>'; cr.
	stream nextPutAll: '<td><a href="', feed link, '">', feed title, '</a></td>'; cr.
	stream nextPutAll: '<td>'.
	stream nextPutAll: items size printString.
	stream nextPutAll: '  ('.
	"Number each matching item, linking the number when the item has a link."
	1 to: items size do: [:cnt | | item |
		item := items at: cnt.
		item getMyLink isNil
			ifFalse: [stream nextPutAll: '<a href="', item getMyLink, '">', cnt printString, '</a> '].
		cnt = items size
			ifFalse: [stream nextPutAll: ', ']].
	stream nextPutAll: ')'.
	stream nextPutAll: '</td></tr>'; cr].
stream nextPutAll: '</table>'; cr.
^stream contents

By inspecting the results, I got a ready-to-post chunk of HTML, which I pasted below. The nice thing is, I already have objects for all the feeds and items - and those objects have a nice rich API for me to exploit. Since I have access to tools like inspectors and workspaces (no browser or debugger though), I can do this in the application, with all of the application data at my fingertips. So, here's the output from my 250 feeds - all of the items in all of my feeds that make a reference to Amazon, complete with links to the matching feeds and items:

Amazon Reference Report

John Porcaro: mktg@msft 9  (1 , 2 , 3 , 4 , 5 , 6 , 7 , 8 , 9 )
The Daily Brief: We'll Try To Be Nicer, If You'll Try To Be Smarter! 7  (1 , 2 , 3 , 4 , 5 , 6 , 7 )
0xDECAFBAD Blog 6  (1 , 2 , 3 , 4 , 5 , 6 )
Miguel de Icaza 6  (1 , 2 , 3 , 4 , 5 , 6 )
PubSub: Smalltalk 5  (1 , 2 , 3 , 4 , 5 )
Making it stick. 5  (1 , 2 , 3 , 4 , 5 )
Jon's Radio 4  (1 , 2 , 3 , 4 )
Java, .NET, and Religion 4  (1 , 2 , 3 , 4 )
Ted Leung on the air 4  (1 , 2 , 3 , 4 )
WebSense 3  (1 , 2 , 3 )
Frank Patrick's Focused Performance Blog 3  (1 , 2 , 3 )
beyond bullets 3  (1 , 2 , 3 )
Phil Windley's Technometria 3  (1 , 2 , 3 )
Planet Lisp 3  (1 , 2 , 3 )
Web Things, by Mark Baker 3  (1 , 2 , 3 )
Panopticon Central 3  (1 , 2 , 3 )
Matt Croydon::Postneo 2.0 3  (1 , 2 , 3 )
Bob Congdon 3  (1 , 2 , 3 )
Instapundit.com 3  (1 , 2 , 3 )
Extremetech 2  (1 , 2 )
Dare Obasanjo's WebLog 2  (1 , 2 )
WonderBranding: Marketing to Women 2  (1 , 2 )
Bill Clementson's Blog 2  (1 , 2 )
Doug Tidwell's weblog 2  (1 , 2 )
misbehaving.net 2  (1 , 2 )
d2r 2  (1 , 2 )
Sam Gentile's Blog 2  (1 , 2 )
Clemens Vasters: Enterprise Development & Alien Abductions 2  (1 , 2 )
Brad Abrams 2  (1 , 2 )
John Lam: Building Better Software, Faster 2  (1 , 2 )
freeform goodness 2  (1 , 2 )
PragDave 2  (1 , 2 )
Martin Fowler's Bliki 2  (1 , 2 )
Dare Obasanjo aka Carnage4Life 2  (1 , 2 )
Derek's Rantings and Musings 2  (1 , 2 )
Aaron Swartz: The Weblog 2  (1 , 2 )
Radio Free Blogistan 1  (1 )
phil ringnalda dot com 1  (1 )
manicwave 1  (1 )
Don Park's Daily Habit 1  (1 )
Scobleizer: Microsoft Geek Blogger 1  (1 )
PR Opinions 1  (1 )
Incipient(thoughts) 1  (1 )
Jonathan Schwartz's Weblog 1  (1 )
The Fishbowl 1  (1 )
Ryan Lowe's Blog 1  (1 )
Larkware News 1  (1 )
Lambda the Ultimate - Programming Languages Weblog 1  (1 )
Ralph Johnson - Blog 1  (1 )
Smalltalk Tidbits, Industry Rants 1  (1 )
Bob Westergaard - Blog 1  (1 )
Joel on Software 1  (1 )
Scripting News 1  (1 )
NRO Corner Feed 1  (1 )
MOOREWATCH 1  (1 )
Lessig Blog 1  (1 )
Belmont Club 1  (1 )
Tapped Feed 1  (1 )

Comments

Cool!

[Jon Udell] February 19, 2005 21:57:03.000

If you can also restrict the matches using this regexp:

d{b9,9}d[dX]

it's a more accurate filter for book references.

Does BottomFeeder understand feeds in terms of an HTML DOM-like thingy or an XML DOM-like thing?

- Jon

Finding References

[ James Robertson] February 19, 2005 23:07:17.000


Jon,
Actually, BottomFeeder doesn't maintain any of the feed/item information in XML form - once the content comes down, I parse it into objects and do all other operations on the objects. So, if I was going to do additional matching it would have to be against the object model.
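
As a rough sketch of what extra matching against the object model could look like: the block below checks a description string for an ISBN-10-style token, on the assumption that Jon's pattern is meant to catch nine digits followed by a digit or X. The name looksLikeIsbn10 and the sample strings are made up for illustration, and the check uses only plain String and Character protocol rather than a regex package.

```smalltalk
| looksLikeIsbn10 |
"Hypothetical helper: true when text contains nine digits followed by a digit or X."
looksLikeIsbn10 := [:text |
	((1 to: text size - 9)
		detect: [:start |
			"No non-digit among the first nine characters of the window..."
			((start to: start + 8)
				detect: [:i | (text at: i) isDigit not]
				ifNone: [nil]) isNil
				and: [| last |
					"...and the tenth character is a digit or X."
					last := text at: start + 9.
					last isDigit or: [last = $X]]]
		ifNone: [nil]) notNil].

"Made-up sample strings, not real feed data."
looksLikeIsbn10 value: 'http://www.amazon.com/exec/obidos/ASIN/0201113716/'.
looksLikeIsbn10 value: 'http://www.amazon.com/gp/help/'.
```

The first expression answers true and the second false. In the script above, the select: block could then require both the '*href*amazon*' match and looksLikeIsbn10 value: desc. Note that this simple scan has no word-boundary check, so it also fires on longer runs of digits.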

[giorgio ferraris] February 20, 2005 8:43:06.000

james,

For a longer amazonCollection, it would probably be better to use a standard OrderedCollection and then convert it to a SortedCollection at the end, using asSortedCollection: sortBlock, because add: on a SortedCollection can become time consuming (it does an ordering at every entry). The addAll: method used in that conversion, as implemented on SortedCollection, is a specialization that takes care of speed.

Ciao
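
A minimal sketch of that suggestion, using made-up stand-ins for the feed -> items associations (the strings and arrays below are placeholders, not BottomFeeder objects): gather everything in an OrderedCollection first, then sort once with the same sort block the script uses.

```smalltalk
| pairs sorted |
"Made-up stand-ins for feed -> matching items pairs."
pairs := OrderedCollection new.
pairs add: 'feed A' -> #(1 2 3).
pairs add: 'feed B' -> #(1).
pairs add: 'feed C' -> #(1 2 3 4 5).

"Sort once, after all the adds, instead of keeping the collection sorted throughout."
sorted := pairs asSortedCollection: [:a :b | a value size > b value size].
sorted first key.
```

The last expression answers 'feed C', the entry with the most matches. Whether this is measurably faster for a few hundred feeds is something to check rather than assume.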

Use PubSub.com to find references to Amazon.com

[Bob Wyman] February 22, 2005 15:33:26.739

If you want to generate a feed with many references to Amazon.com, come to PubSub.com and create a weblogs subscription with the following query:

URI:amazon.com and not SOURCE:amazon.com

This will cause all posts which reference a URI on the amazon.com site to be inserted into your feed. The "not SOURCE:amazon.com" predicate will prevent inclusions of any posts that may have originated from the amazon site.

Note: Use the weblogs focused search form at: http://www.pubsub.com/weblogs.php to create this subscription.

To make extracting URI references easier for script writers, you'll find that we augment the postings that we publish by explicitly extracting the URI's that are linked to. You'll find these in the posting in a format something like:

I hope this is helpful.

bob wyman
