I grabbed last week's logs again to see where the referrals might be coming from - I don't provide counts on the server side, so I had to run some scripts to pull those numbers out. This is fairly procedural code, because it's just slapdash stuff in a workspace - but Smalltalk is good for that kind of thing, given the immediate response. After reading in all the log entries, I had to clean up the list - my log viewer does this, so I just stole code from there:
"take the logs and cut down to the real referers"
cruft := #('.jpg' '.gif' '.js' '.png' '.swf' 'inc/' 'servlet/' 'css/' '.wmv').
cleans := refs entries reject: [:each | | val |
    val := cruft
        detect: [:each1 | '*', each1, '*' match: each command]
        ifNone: [nil].
    val notNil].
cleans := cleans reject: [:each | | val |
    val := ReferralScanner currentRejects
        detect: [:each1 | each1 match: each referer]
        ifNone: [nil].
    val notNil].
That yanks out all the image fetches, and css, etc, etc. Next, I ran through the entries to figure out what the top referers were:
"now that we have that, we can get some data"
dict := Dictionary new.
cleans do: [:each | | stream cmd total |
    "skip the first token of the command and keep the second (the page)"
    stream := each command readStream.
    stream through: Character space.
    cmd := stream upTo: Character space.
    dict at: each referer ifAbsentPut: [cmd -> 0].
    total := (dict at: each referer) value + 1.
    dict at: each referer put: (cmd -> total)].
"keep only the referers with more than 20 hits"
collection := OrderedCollection new.
dict keysAndValuesDo: [:key :value |
    value value > 20 ifTrue: [collection add: key -> value]].
That creates a dictionary of associations, matching up the referers to the pages on my server (and the counts from that source). Finally, sorting it:
collection := collection asSortedCollection: [:a :b | a value value > b value value]
Then it's just a small matter of pushing out an HTML table:
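The table-writing code isn't shown here. A minimal sketch of that step, assuming the sorted collection from above (each element a referer -> (page -> count) association, with referer and page both strings), might look like the following. The html variable and the column headings are my own placeholders, not the original script, and nothing is HTML-escaped:

```smalltalk
"Hypothetical sketch: write one table row per referer -> (page -> count) entry.
 Assumes 'collection' is the sorted collection built in the previous step."
html := WriteStream on: String new.
html nextPutAll: '<table>'; cr.
html nextPutAll: '<tr><th>Referer</th><th>Page</th><th>Count</th></tr>'; cr.
collection do: [:assoc |
    html
        nextPutAll: '<tr><td>';
        nextPutAll: assoc key;
        nextPutAll: '</td><td>';
        nextPutAll: assoc value key;
        nextPutAll: '</td><td>';
        nextPutAll: assoc value value printString;
        nextPutAll: '</td></tr>';
        cr].
html nextPutAll: '</table>'.
html contents
```

Evaluating the last line with "print it" or "inspect it" in the workspace gives the finished markup as a string, ready to paste into a post.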
More than one of those is actually from Martin Fowler's "Humane Interface" post - I had to intervene manually to clean that up. I also eliminated all the posts that got 20 or fewer referrals. That list also doesn't include the scads of search engine referrals, because they didn't tend to cluster over the week I looked at. Now that I have a script, I'll probably take a periodic look.