I confess that I strayed. Perhaps it's that spending too much time in
fake grad school has confused me, or perhaps I'm just not as savvy a developer as I thought I was.
I've been working on a certain
100,000,000 row dataset problem using hand-built data structures, mmap-ing them from the file system. This morning, on a lark, I loaded the data into a
Berkeley DB instance, and I haven't looked back since. Even with a Btree index, the database builds faster than my own code, or that of other good Netflix libraries, and then you get the benefit of the index on retrieval.
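For anyone who hasn't gone down the hand-built road, here is a minimal sketch of what that kind of structure can look like: fixed-width records sorted by key, mmap-ed read-only, and searched with a binary search. This is not my actual code. The record layout (user id, movie id, rating) and the names `RECORD`, `write_sorted`, `lookup` and `demo` are assumptions made up for illustration.

```python
import mmap
import os
import struct
import tempfile

# Hypothetical fixed-width record: user_id, movie_id (uint32 each), rating (uint8).
# "<" means little-endian with no padding, so every record is exactly 9 bytes.
RECORD = struct.Struct("<IIB")


def write_sorted(path, rows):
    """Write rows to disk as packed records, sorted by (user_id, movie_id)."""
    with open(path, "wb") as f:
        for row in sorted(rows):
            f.write(RECORD.pack(*row))


def lookup(buf, user_id, movie_id):
    """Binary-search the mapped records; return the rating, or None if absent."""
    target = (user_id, movie_id)
    lo, hi = 0, len(buf) // RECORD.size
    while lo < hi:
        mid = (lo + hi) // 2
        uid, mov, rating = RECORD.unpack_from(buf, mid * RECORD.size)
        if (uid, mov) < target:
            lo = mid + 1
        elif (uid, mov) > target:
            hi = mid
        else:
            return rating
    return None


def demo():
    """Build a tiny file, map it, and look up one present and one missing key."""
    rows = [(7, 30, 4), (2, 10, 5), (2, 11, 3), (9, 10, 1)]
    path = os.path.join(tempfile.mkdtemp(), "ratings.bin")
    write_sorted(path, rows)
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as buf:
            return lookup(buf, 2, 11), lookup(buf, 5, 5)
```

Everything here (the sort, the layout, the search) is work you end up owning yourself, which is roughly what a Btree index hands you for free.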
I haven't evangelized for Berkeley DB here yet, and not everyone will agree with me that it's a good solution for most of your data access needs (certainly my
alma mater didn't). But if you have a performance-critical problem and want a quick, simple, and very low overhead mechanism for managing your data, give Berkeley DB a look. It even has
Ruby bindings. It's free as in beer and code. [
Updated: See comment below for an actual description of the licensing requirements.]
So for all two of you who read this blog and know and care about these things: why are we still using relational databases for our websites? I understand the need for flexible systems for our reporting tools and data mining, but when a website has maybe 30 queries that need to be executed, and the database is almost always the first piece of an architecture to fail, why not use a custom database without the overhead of query management? Why are we still parsing SQL for simple CRUD calls when it could be done 10 times faster?
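To make the "CRUD without SQL" idea concrete, here is a rough sketch of a key-value backed store. It uses Python's standard-library `dbm.dumb` module purely as a stand-in so the example runs anywhere; it is not Berkeley DB and has no Btree index. The `UserStore` class, its method names, and the JSON encoding of records are all assumptions for illustration.

```python
import dbm.dumb
import json
import os
import tempfile


class UserStore:
    """Hypothetical CRUD wrapper over a key-value file; no SQL is parsed."""

    def __init__(self, path):
        # "c" opens the database, creating it if it does not exist yet.
        self.db = dbm.dumb.open(path, "c")

    @staticmethod
    def _key(user_id):
        return str(user_id).encode("utf-8")

    def create(self, user_id, record):
        key = self._key(user_id)
        if key in self.db:
            raise KeyError(f"user {user_id} already exists")
        self.db[key] = json.dumps(record).encode("utf-8")

    def read(self, user_id):
        raw = self.db.get(self._key(user_id))
        return None if raw is None else json.loads(raw)

    def update(self, user_id, record):
        key = self._key(user_id)
        if key not in self.db:
            raise KeyError(f"user {user_id} not found")
        self.db[key] = json.dumps(record).encode("utf-8")

    def delete(self, user_id):
        del self.db[self._key(user_id)]

    def close(self):
        self.db.close()


def demo():
    """Run one create, read, update and delete against a throwaway store."""
    store = UserStore(os.path.join(tempfile.mkdtemp(), "users"))
    store.create(42, {"name": "Ada"})
    created = store.read(42)
    store.update(42, {"name": "Ada", "plan": "pro"})
    updated = store.read(42)
    store.delete(42)
    deleted = store.read(42)
    store.close()
    return created, updated, deleted
```

Each of those 30-odd website queries becomes a plain method call with a key lookup behind it. Whether that is actually faster for your site is something to measure, not assume.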