How I Built a Free Grid Computer, In Less Than a Week


By now you’ve all heard about BuyLater, my happy little Firefox extension that (thanks to an unexpected LifeHacker.com article) is rapidly climbing towards 1000 users and world domination. Without getting TOO technical, I thought I would share with you how I saved BuyLater from becoming an infrastructure nightmare – one that would have either killed the main value of the application (real-time updates), or sucked tons of money and hardware into a technology backwash.

This will be a little controversial, I think – simply because the technique I used (grid computing) is most often used for less… legitimate… purposes. So much so that it is almost synonymous with “botnets”.

But let’s go back to the beginning.

When I approached Jesse about WiiMe, and suggested that he ought to generalize it beyond a single product (Wii), and a single interface (Twitter), he told me it would be too hard. The key problem, he pointed out, was keeping a large number of items up-to-date. “Don’t you realize Amazon has a limit on their API?”

Well, I did realize that. But I also had a few ideas.

Step 1 – Do more with Less

Amazon limits API requests to 1 request per second, a totally reasonable limit for most purposes. However, they enforce that limit based on IP address, NOT based on API key. So getting a bunch of extra API keys and round-robining through them was not going to work. (That was a trick I had used on Google’s Search API, some years ago.)

However, most people don’t realize that you can poll for more than one item per API call. In fact, you can pack 10 items into a single request. At 1 request per second, that gave me a theoretical maximum of 600 items per minute. When I broke the 300-user mark (somewhere in the first two days), and the total number of items exceeded 600, I had to stretch the refresh interval to 2 minutes. Uh-oh – I could see where this was going.
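
For the curious, here is the batching arithmetic as a minimal Python sketch. It only illustrates the numbers above (10 items per request, 1 request per second); the function names and the made-up item IDs are assumptions for illustration, not BuyLater's actual code.

```python
from typing import Iterator, Sequence

MAX_ITEMS_PER_REQUEST = 10   # batch size stated in the post
REQUESTS_PER_SECOND = 1      # per-IP limit stated in the post


def batches(item_ids: Sequence[str],
            size: int = MAX_ITEMS_PER_REQUEST) -> Iterator[list[str]]:
    """Yield consecutive groups of at most `size` item IDs."""
    for start in range(0, len(item_ids), size):
        yield list(item_ids[start:start + size])


def max_items_per_minute(requests_per_second: int = REQUESTS_PER_SECOND,
                         items_per_request: int = MAX_ITEMS_PER_REQUEST) -> int:
    """Upper bound on items refreshed per minute from a single IP address."""
    return requests_per_second * 60 * items_per_request


if __name__ == "__main__":
    ids = [f"ITEM{n:03d}" for n in range(25)]  # hypothetical IDs
    for group in batches(ids):
        print(",".join(group))                 # one line per API request
    print(max_items_per_minute())
```

Each group would become the item list of one request, so 25 tracked items cost three requests instead of 25.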

Sure enough, over the course of the next week, I gradually lengthened the update interval to 6 minutes – which meant that BuyLater became essentially useless for tracking Wiis and other scarce items, where the time in-stock is typically 5 minutes or less.
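
To see how quickly a single IP address runs out of room, here is a rough model in Python. It assumes the interval is driven purely by item count and is rounded up to whole minutes; that rounding, and the function name, are my simplifications for this sketch.

```python
import math

ITEMS_PER_MINUTE = 600  # single-IP ceiling: 1 request/sec x 10 items/request


def refresh_interval_minutes(total_items: int,
                             items_per_minute: int = ITEMS_PER_MINUTE) -> int:
    """Whole minutes needed to poll every tracked item once from one IP."""
    if total_items < 0:
        raise ValueError("total_items must be non-negative")
    # Never report less than one minute, even for a tiny item list.
    return max(1, math.ceil(total_items / items_per_minute))


if __name__ == "__main__":
    for count in (600, 601, 3600):
        print(count, refresh_interval_minutes(count))
```

Under this model, one item past 600 is enough to push the interval from 1 minute to 2, and it keeps growing linearly with the item count.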

I needed a BUNCH more IP addresses, and quick.

Step 2 – With a little help from my friends…

Rather than start buying additional servers (which I couldn’t afford), or additional IP addresses (which I couldn’t get), I did what any sensible child of the digital age would do – I made it someone else’s problem. I simply added a small service to the BuyLater extension that fetches a given URL every 60 seconds and returns the resulting XML data to the BuyLater server. In essence, I distributed the task of polling Amazon to the end users.
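
The shape of that service can be sketched in a few lines. This is a Python illustration of the idea only, not the extension's actual code (a Firefox extension would be written in JavaScript). The three callables `get_assignment`, `fetch` and `report`, and the example URL, are hypothetical stand-ins for "ask the server which URL to poll", "fetch it", and "send the XML back".

```python
import time
from typing import Callable, Optional

POLL_INTERVAL_SECONDS = 60  # the interval named in the post


def poll_once(get_assignment: Callable[[], Optional[str]],
              fetch: Callable[[str], str],
              report: Callable[[str, str], None]) -> bool:
    """Fetch one assigned URL and hand the XML back; return True on success."""
    url = get_assignment()
    if url is None:          # nothing to do this round
        return False
    try:
        xml = fetch(url)
    except OSError:          # network failure: skip this round, retry later
        return False
    report(url, xml)
    return True


def run(get_assignment, fetch, report, rounds: int,
        sleep: Callable[[float], None] = time.sleep,
        interval: float = POLL_INTERVAL_SECONDS) -> int:
    """Poll `rounds` times, sleeping `interval` seconds between rounds."""
    completed = 0
    for i in range(rounds):
        if poll_once(get_assignment, fetch, report):
            completed += 1
        if i < rounds - 1:   # no need to sleep after the final round
            sleep(interval)
    return completed


if __name__ == "__main__":
    received = []
    done = run(
        get_assignment=lambda: "https://example.invalid/batch?ids=A,B,C",
        fetch=lambda url: "<items/>",             # fake response, no network
        report=lambda url, xml: received.append((url, xml)),
        rounds=3,
        sleep=lambda seconds: None,               # skip real waiting in the demo
    )
    print(done, len(received))
```

Passing the fetch and report steps in as functions keeps the loop itself trivial, and a failed fetch simply costs that client one round rather than stopping it.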

Why 60 seconds? Simple math, really. I’ve always wanted to maintain a 60-second refresh interval for the BuyLater service. Most users are following 2 unique items, and I assume people have their browser open 20% of the time. (Having users in the UK and, hopefully soon, Asia, helps to spread out the polling.) Remember, each query to Amazon fetches 10 items, so every online user covers 10 items a minute; with 1 user in 5 online, that works out to 2 items polled per user per minute, exactly what is needed. Which means, hopefully, that the cluster will be able to maintain my target refresh rate… indefinitely.
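
Here is that math as a small Python sketch. The inputs are the figures from this post (2 unique items per user, 20% of users online, 10 items per request, one request per online user per interval). The function names are made up for illustration, and the model assumes the server hands out batches with no overlap or gaps.

```python
def coverage_ratio(unique_items_per_user: float = 2,
                   online_fraction: float = 0.2,
                   items_per_request: int = 10) -> float:
    """Items polled per interval divided by items needing a refresh.

    A value of 1.0 or more means the online users can, in principle,
    refresh every tracked item once per interval. The user count cancels
    out, which is why the scheme should scale with the user base.
    """
    polled_per_user = online_fraction * items_per_request
    return polled_per_user / unique_items_per_user


def break_even_online_fraction(unique_items_per_user: float = 2,
                               items_per_request: int = 10) -> float:
    """Smallest share of users that must be online to keep up."""
    return unique_items_per_user / items_per_request


if __name__ == "__main__":
    print(coverage_ratio())
    print(break_even_online_fraction())
```

With these assumptions the 20% figure is exactly the break-even point, so there is no slack: if fewer users are online, or people start tracking more items each, the refresh interval slips again.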

Now, obviously I’ll still need to add some more servers at some point, since all this data is still going back to one place. But at least I’ll be adding them for the right reasons.

Step 3 – ???????

There are two questions that people have asked me, so far:

  • What do your users think about that?
  • What does Amazon think about that?

To the first one, I have no idea. That’s really what this post is about – what DO you think about it? Is it alright for Larry to be fetching data from Amazon, that helps Sally get a deal? Should I have made the whole thing opt-in, or opt-out? From a technical standpoint, BuyLater users were already visiting both Amazon, and the BuyLater site (albeit not once every 60 seconds), and there’s no personal info in any of this data, so what’s the difference?

On to the second question – again, I have no idea. But since there were a couple of @amazon.com email addresses in yesterday’s batch of users, I imagine if they have a problem with it… I’ll hear about it pretty quick.


  1. #1 by Raphael on 11Apr08 - 5:37 pm

    I’m a user, and despite the potential increase in my Firefox memory disappearing into thin air, I don’t have a problem with this.

    Distributing the load to the client is an excellent idea, especially for things that need constant updates or are exceptionally rich web applications. I’ve thought quite a bit on how much can really be offloaded to the client to reduce the server work.

    I’m sure some people might have a problem with it… but do most people care? I doubt it.
    Maybe people would be concerned about somehow figuring out what people have on their lists – containing potentially embarrassing items… but you can probably obfuscate that.

  2. #2 by Doug Ransom on 11Apr08 - 7:55 pm

    The users who will have a problem are the same ones who have a problem sharing their bandwidth with BitTorrent when downloading legitimate content. I think your idea is capital.

    I do think you share your trade secrets too easily – demo camp and now this post. Build a bigger mass before you get emulated by someone with a better marketing scheme.

  3. #3 by John on 19Jul08 - 12:53 am

    I love the idea.

    I’d very much like to discuss with you an idea of my own. Please email me when you get a chance.

    Thanks
    john
