
BottomFeeder

Redirection hell in an aggregator

July 12, 2006 22:22:28.427

Gordon Weakliem points to Charles Miller, who has run into the infamous hotel 301 problem - Charles says:

On a recent trip to the USA, Scott stayed in a hotel with a ‘net connection. The connection was one of those with an arbitrary web-based registration that made sure you’d agreed to their ass-covering terms of service. Until you had agreed, the hotel’s proxy would redirect any HTTP connection to the registration page… using the 301 status code.

Within a few seconds of plugging in the ethernet cable, all of Scott’s RSS subscriptions had been silently replaced with the hotel’s registration page URL. All the original feed URLs were lost.

Considering this, Gordon writes:

Lots of aggregator developers have been burned by the hotel proxy problem. The problem is that as a developer who read the spec, you tend to think that other developers did the same. The interesting thing is what to do with the 301. So you ask the user if they want to accept the redirect. Does the user even understand what that means? Let's see, "Some server is saying that the resource at http://example.com/rss.xml has moved to dumbhotelproxy.com. Do you want to change your subscription?" Considering the amount of effort people are putting into making it easy to subscribe to an RSS feed (because the users don't know what to do with XML, or the orange XML button), I have no idea how to pose that question such that the user might actually do the right thing.

I ran across this one myself over 2 years ago with BottomFeeder (thank goodness I back up feeds at startup!). I stay in hotels that have 12- or 24-hour subscriptions often enough that I wanted a permanent solution to this problem, so I came up with one:

  1. When I see a redirect, I cache the old address
  2. If I see more than a handful (default: 3) of redirects to the same place, I stop the update loop and restore the old URLs

Here's the code I use to make that call:

 
 

httpRedirectEvent: anUrl
	"We got a redirect. Check whether too many redirects this cycle
	went to the same place. If so, assume that a proxy of some sort
	is in the way, and revert everything."

	self redirects add: anUrl.
	(self redirects occurrencesOf: anUrl) > self tooManyRedirects
		ifTrue: [
			self resetOriginalsAndStopUpdates: anUrl.
			self redirects: OrderedCollection new]


Each redirect generates an event, which ends up in that method. You can see how it works, and it's saved me from this problem more than once. Now, how easy do other frameworks make it for you to do this sort of thing? I have no idea. It was a few minutes' work to solve it in Smalltalk, though.
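For readers who don't read Smalltalk, here is a rough Python sketch of the same idea. It is not BottomFeeder's code: the class name `RedirectGuard`, the `on_redirect` method, and the plain list of feed URLs are all assumptions made for illustration. Only the threshold of 3 and the "cache, count, revert" logic come from the description above.

```python
from collections import Counter


class RedirectGuard:
    """Hypothetical helper: tracks redirects seen during one update cycle."""

    def __init__(self, too_many_redirects=3):
        self.too_many_redirects = too_many_redirects
        self.originals = {}       # redirect target -> list of (index, old URL)
        self.targets = Counter()  # redirect target -> times seen this cycle
        self.stopped = False

    def on_redirect(self, feeds, index, new_url):
        """Apply a redirect to feeds[index]; return False if updates should stop."""
        # Step 1: cache the old address before overwriting it.
        self.originals.setdefault(new_url, []).append((index, feeds[index]))
        feeds[index] = new_url
        self.targets[new_url] += 1

        # Step 2: too many feeds moved to the same place, so assume a proxy.
        if self.targets[new_url] > self.too_many_redirects:
            for i, old_url in self.originals.pop(new_url):
                feeds[i] = old_url  # restore the old URLs
            self.targets.clear()
            self.stopped = True
        return not self.stopped
```

A caller would create one guard per update cycle and abandon the cycle as soon as `on_redirect` returns `False`.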


Comments

[James Holderness] July 13, 2006 2:01:19.803

I've never encountered this sort of thing myself, but based on what Charles reported last year, I got the impression that you would be redirected to an HTML page - you would never actually see a valid feed. Under those conditions, it seemed to me the easiest solution would be to only accept permanent redirects after receiving a valid feed response.

Have your hotel experiences been different? Would such a solution not have worked for you?

Re: Redirection hell in an aggregator

[James Robertson] July 13, 2006 7:16:41.465


I could have done that, but my approach was what I thought of at the time.

Redirection Solution

[Gordon Weakliem] July 13, 2006 12:22:14.368

In the end, we went with the solution James mentions: if the retrieved content doesn't parse as RSS/RDF/Atom, then we don't accept the redirect. I should mention that the whole issue of redirects is pretty subtle. For instance, say you retrieve a document, get a 302, then a 301, then the document. Is the redirect temporary or permanent? You can extend that to arbitrary chains, such as 301 -> 302 -> 301 -> 200: what is the redirect address? The first case is pretty easy: you don't accept the permanent redirect (this is a real-world case we had with FeedBurner). Cases like the second are more obscure, but aggregator developers should have thought through what happens here.
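A minimal Python sketch of the "only accept the redirect if the content parses as a feed" check might look like the following. The function names `looks_like_feed` and `url_to_store` are made up for illustration, and the test is deliberately crude: it only checks that the body is well-formed XML whose root element is `rss`, `RDF`, or `feed`. A real aggregator would presumably run its full feed parser instead.

```python
import xml.etree.ElementTree as ET

# Root element names for RSS 0.9x/2.0, RSS 1.0 (RDF), and Atom.
FEED_ROOTS = {"rss", "RDF", "feed"}


def looks_like_feed(body):
    """Return True if body is well-formed XML with a feed-like root element."""
    try:
        root = ET.fromstring(body)
    except ET.ParseError:
        return False  # not XML at all, e.g. a typical HTML registration page
    # ElementTree writes namespaced tags as "{namespace}name"; keep the name.
    local_name = root.tag.rsplit("}", 1)[-1]
    return local_name in FEED_ROOTS


def url_to_store(subscribed_url, redirect_target, body):
    """Accept a permanent redirect only if the final response was a feed."""
    return redirect_target if looks_like_feed(body) else subscribed_url
```

With this check, a hotel registration page served after a 301 leaves the subscription pointing at its original URL.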

 

[James Holderness] July 13, 2006 13:02:37.080

Yikes. I'd probably say the first case (302 -> 301 -> 200) shouldn't permanently redirect. In the second case (301 -> 302 -> 301 -> 200) you probably should permanently redirect, but to the second page retrieved, not the final page.

Currently what we do is permanently redirect to the final page when the second-to-last return code is 301 (so both of those examples would permanently redirect). I don't think that's terribly bad, but I don't think it's correct. What do you do with those cases?
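To make the "second-to-last return code is 301" behaviour concrete, here is a small Python sketch. The representation of a fetch as a list of `(url, status)` pairs and the function name `final_if_last_hop_permanent` are assumptions for illustration, not anyone's actual implementation.

```python
def final_if_last_hop_permanent(hops):
    """Pick the URL to store for a fetch.

    hops is a list of (url, status) pairs in fetch order, ending with the
    response that finally returned the document (status 200).
    """
    if len(hops) >= 2 and hops[-1][1] == 200 and hops[-2][1] == 301:
        return hops[-1][0]  # last redirect was permanent: store the final URL
    return hops[0][0]       # otherwise keep the original subscription URL


# The two chains discussed above, with placeholder URLs.
first = [("A", 302), ("B", 301), ("C", 200)]
second = [("A", 301), ("B", 302), ("C", 301), ("D", 200)]
```

Under this rule both chains update the subscription (to "C" and "D" respectively), which is exactly the behaviour being questioned.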

[Aristotle Pagaltzis] July 13, 2006 15:04:15.156

I’d say the only redirect that counts is the first one. In a sequence such as

  1. A: 301 -> B
  2. B: 302 -> C
  3. C: 301 -> D
  4. D: 200

I would change the subscription from “A” to “B”. The redirect from B to C is not permanent, so it isn’t relevant. Only if you get an uninterrupted series of 301s can you short-circuit to the last of their targets:

  1. A: 301 -> B
  2. B: 301 -> C
  3. C: 301 -> D
  4. D: 302 -> E
  5. E: 301 -> F
  6. F: 200

Here, I’d change the subscription from “A” to “D”, because all redirects on the chain from A to D are permanent. The following redirect is a 302, so at that point, the chain ends.
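The rule described above (follow only the unbroken run of 301s at the start of the chain) can be sketched in Python as follows. The list of `(url, status)` pairs and the function name `permanent_target` are assumptions for illustration.

```python
def permanent_target(hops):
    """Return the URL the subscription should point to after a fetch.

    hops is a list of (url, status) pairs in fetch order. Only the leading,
    uninterrupted series of 301 responses moves the subscription.
    """
    url = hops[0][0]
    # Pair each hop with the one that follows it (its redirect target).
    for (_, status), (next_url, _) in zip(hops, hops[1:]):
        if status != 301:
            break  # first non-permanent response ends the chain
        url = next_url
    return url


# The two chains from the comment above.
short_chain = [("A", 301), ("B", 302), ("C", 301), ("D", 200)]
long_chain = [("A", 301), ("B", 301), ("C", 301),
              ("D", 302), ("E", 301), ("F", 200)]
```

For `short_chain` this yields "B" and for `long_chain` it yields "D". A chain that starts with a 302 leaves the subscription at "A".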
