really, nothing here

software geek

15.8.07

Toby Segaran

Friend, fellow Foo camper, and now author Toby Segaran has just released his book Programming Collective Intelligence: Building Smart Web 2.0 Applications. My copy is now on order.

It's an overview of pretty much all the buzzword-compliant algorithms and techniques available to your above-average programmer. O'Reilly's pitching it as a solution manual for creating a Web 2.0 website, and I think that's not quite the right take on the book. (But I have dumb opinions on Web 2.0.) This is probably the book you should turn to after you've built an excellent Web 2.0 site with huge volumes of user-supplied community data and you want to mine it for all its value. It's not, however, going to help you put it in an Ajax widget and mash it up with the latest Yelp postings.

I can't review the book yet, because my copy's en route. But I can assure you Toby's the right guy to take on such a large topic and deliver on it in style, especially if you're interested in figuring out how to handle the volumes of community data you've been building up. There's a chapter on search engines available on O'ReillyNet that's a pretty good indicator, I believe, of where this book is going. It takes you from a lame AltaVista-style search engine to a learning, neural-net-based search engine that will deliver you personalized results. Nice stuff.

I also appreciate that O'Reilly changed their cover format and put on a flock of penguins for this book.


13.8.07

Berkeley DB

I confess that I strayed. Perhaps it's been spending too much time in fake grad school that has confused me, or perhaps I'm just not as savvy a developer as I thought I was.

I've been working on a certain 100,000,000 row dataset problem using hand-built data structures and mmap-ing them out of the file system. This morning, on a lark, I loaded the data into a Berkeley DB instance, and I haven't looked back since. Even with a Btree index, the database builds faster than my own code, or that of other good Netflix libraries, and then you get the benefit of the index on retrieval.
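For anyone who hasn't tried the hand-built route, here is a minimal Python sketch of the general idea: pack fixed-width records into a flat file and read them back through mmap. The record layout (user id, movie id, rating), the function names, and the sample rows are all made up for illustration; this is not my actual code.

```python
import mmap
import os
import struct
import tempfile

# Hypothetical fixed-width record: user id (uint32), movie id (uint16),
# rating (uint8), little-endian with no padding.
RECORD = struct.Struct("<IHB")


def write_records(path, rows):
    """Pack each (user, movie, rating) tuple into a flat binary file."""
    with open(path, "wb") as f:
        for row in rows:
            f.write(RECORD.pack(*row))


def read_record(mm, index):
    """Decode the record at a given position straight from the mapping."""
    return RECORD.unpack_from(mm, index * RECORD.size)


def demo():
    # Made-up sample rows, not real data.
    rows = [(1001, 1, 3), (1002, 1, 5), (1003, 2, 4)]
    path = os.path.join(tempfile.mkdtemp(), "ratings.bin")
    write_records(path, rows)
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            return [read_record(mm, i) for i in range(len(rows))]
```

A layout like this only gives you access by position. Finding a record by user or movie means scanning the file or building your own index on top, which is exactly the work a Btree index does for you.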

I haven't evangelized for Berkeley DB here yet. And not everyone will agree with me that it's a good solution for most of your data access needs (certainly my alma mater didn't). But if you have a performance-critical problem and want a quick, simple and very low overhead mechanism for managing your data, give Berkeley DB a look. It even has Ruby bindings. It's free as in beer and code. [Updated: See comment below for an actual description of the licensing requirements.]

So for all two of you that read this blog and know and care about these things: Why are we still using relational databases for our websites? I understand the need for flexible systems for our reporting tools and data mining, but when a website has maybe 30 queries that need to be executed, and that database is almost always the first piece of an architecture to fail, why not use a custom database without the overhead of query management? Why are we still parsing SQL for simple CRUD calls when it could be done 10 times faster?
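To make "CRUD without SQL" concrete, here is a rough Python sketch of the four operations against a plain key-value store. It uses dbm.dumb from the standard library purely as a stand-in so that it runs anywhere; it is not Berkeley DB and not the Berkeley DB API, and the key naming scheme and values are invented for the example.

```python
import dbm.dumb
import os
import tempfile


def crud_demo():
    """Run create, read, update, delete against a key-value file store."""
    # Stand-in store: dbm.dumb, NOT Berkeley DB. Keys and values are bytes.
    path = os.path.join(tempfile.mkdtemp(), "users")
    with dbm.dumb.open(path, "c") as db:
        db[b"user:42"] = b"alice"          # create
        first = db[b"user:42"]             # read
        db[b"user:42"] = b"alice smith"    # update
        second = db[b"user:42"]
        del db[b"user:42"]                 # delete
        gone = b"user:42" not in db
    return first, second, gone
```

Each call is a direct lookup or write by key, with no query text to parse or plan. The tradeoff is that anything beyond lookup by key (joins, ad hoc reporting) becomes your own code.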
