really, nothing here

software geek

31.8.07

Tierney Thys



I've been commuting from Berkeley into San Francisco every day, which, despite the decent and improving status of BART as a mass transit system, still takes over an hour. I've been passing the time watching videos from the TED conferences, which have recently been put on the web.

I'm jealous of the people who get to go to these talks. The quality is very high and the curation of the attendees seems to be generally superior to anything else like it. Talks by Will Wright, Thomas Barnett and Jeff Hawkins are all noteworthy.

The winner in the blow-your-mind-that-this-could-ever-exist category, though, is Tierney Thys' lecture on the Mola mola, a 10-foot-long, jellyfish-eating variant of the sunfish. Really incredible stuff. Check it out.

As an extra bonus for anyone interested in visualization, Jonathan Harris' talk on We Feel Fine (regrettably, apparently broken now) and Universe is fascinating as well. Credit goes to RD over at The Quiet Quiet for first introducing me to these projects.

28.8.07

This is collective intelligence?


One of the more disappointing lessons of fake grad school is how readily subjective work is accepted as valid in objective fields, along with a total overvaluing of collective opinions. You get taught to force your models to report values similar to everyone else's, because failure to do so could get you noticed, and getting noticed could get you fired. So you learn a lot about manipulating assumptions to force "better" results out of your spreadsheet. There's even a whole culture around rationalizing this process as a pseudo-probabilistic enhancement by creating nonsense "scenarios".

So it's nice to see one analyst (Mary Meeker) get busted not once, but twice for fudging the facts and math to create a model for YouTube advertising that justifies the incredible $1.6B valuation it got from Google. First she gets the definition of CPM wrong, resulting in a projection that's, oh, 3 orders of magnitude off. Then, in correcting that mistake, we're to believe she spontaneously discovered that her assumptions were (gasp!) too conservative by, oh, 3 orders of magnitude -- so the original estimate wasn't that wrong to begin with! Amazing, that.
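
For anyone who hasn't had to live with ad math: CPM is the cost per thousand impressions, so one way to land exactly 3 orders of magnitude off is to apply the rate to every single impression instead. Here's a minimal sketch of that arithmetic. The impression count and the CPM are made-up numbers of mine, not anything from the actual model, and the function names are just for illustration.

```python
def ad_revenue(impressions: int, cpm_dollars: float) -> float:
    """Revenue when CPM is read correctly as cost per thousand impressions."""
    return impressions / 1000 * cpm_dollars


def ad_revenue_misread(impressions: int, cpm_dollars: float) -> float:
    """Revenue when the CPM rate is mistakenly applied to each impression."""
    return impressions * cpm_dollars


# Hypothetical inputs, chosen only to make the arithmetic easy to follow.
impressions = 1_000_000
cpm = 20.0

correct = ad_revenue(impressions, cpm)
misread = ad_revenue_misread(impressions, cpm)

# The ratio between the two readings is the factor of 1000 in question.
print(correct, misread, misread / correct)
```

Whatever inputs you pick, the misread version comes out a thousand times larger, which is why "fixing" the definition and ending up with the same answer takes some creative assumptions.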

This isn't the first time accusations of sloppiness (laziness?) have followed our intrepid analyst. A lot of her analysis from the late 90s looked a lot like fraud in hindsight to a lot of people (though Meeker was never accused of anything other than being completely wrong). Mistakes like that resulted in massive class action suits and Eliot Spitzer becoming Governor of New York. It will be interesting to see what happens this time.

Maybe it's time business schools stopped teaching scenario-ing and started teaching actual statistics before someone, you know, goes to jail.

17.8.07

Skype

Software as a Service has been one of the big success stories in software since the dot-com crash. The pitch was that you should be able to treat your software like the utilities you subscribe to, and worry about it just as little. And it was a pretty good pitch when you were handing over your keys to small, hungry, and nimble companies that could afford to hire only excellent programmers.

Skype users found out this week that things can change mighty quickly. Almost all Skype users lost the ability to log into the service and make phone calls. For a number of companies Skype has been the communications backbone for their distributed teams -- these companies are now scrambling to come up with alternate communication plans, presumably bootstrapping email or whatnot. Skype is free for most users, who are getting every bit of service that they paid for, but even the paying ones are getting screwed right now. Skype's a good company and I would expect that they'll happily refund the subscription costs for part, or all, of this month, but it's highly unlikely that those costs reflect even a small portion of the total cost of this outage.

These problems shouldn't surprise us. Who really likes their utility companies? We generally are forced to expect very little from them, and certainly rarely get the customer service we might really need. Once you've given up control over your data to a SaaS system, your relationship to this service is akin to a utility -- exactly how many excellent options do you have left? Skype isn't a particularly evil company. By and large, they remain highly user focused, and I believe that their engineers are working as rapidly as possible to resolve the problem, but they can't help you out of your complete communication blackout if you've relied on them too closely.

As more and more products are released that have minimal technical barriers to entry, we are seeing companies develop various forms of user lock-in in order to maintain their subscriber base. As these user bases grow and the value of the subscription annuity exceeds the value of future customer wins, it's only natural that firms stop innovating and start counting their money, leaving you in the cold and having to suck up the costs of switching services or deal with obsolete, or completely failing, services.

Things like this make me wonder if it's really true that open APIs to web services solve the problems of not having open source web services.

15.8.07

Toby Segaran

Friend, fellow Foo camper, and now author Toby Segaran has just released his book Programming Collective Intelligence: Building Smart Web 2.0 Applications. My copy is now on order.

It's an overview of pretty much all the buzzword-compliant algorithms and techniques available to your above-average programmer. O'Reilly's pitching it as a solution manual for creating a Web 2.0 website, and I think that's not quite the right take on the book. (But I have dumb opinions on Web 2.0.) This is probably the book you should turn to after you've built an excellent Web 2.0 site with huge volumes of user-supplied community data and you want to mine it for all its value. It's not, however, going to help you put it in an Ajax widget and mash it up with the latest Yelp postings.

I can't review the book yet, because my copy's en route. But I can assure you Toby's the right guy to take on such a large topic and deliver on it in style, and the right guy to read if you're interested in figuring out how to go about handling the volumes of community data you've been building up. There's a chapter on search engines available on O'ReillyNet that's a pretty good indicator, I believe, of where this book is going. It takes you from a lame AltaVista-style search engine to a learning, neural-net-based search engine that will deliver you personalized results. Nice stuff.

I also appreciate that O'Reilly changed their cover format and put on a flock of penguins for this book.

13.8.07

Berkeley DB

I confess that I strayed. Perhaps it's been spending too much time in fake grad school that had me confused, or perhaps I'm just not as savvy a developer as I thought I was.

I've been working on a certain 100,000,000 row dataset problem using hand-built data structures and mmap-ing them out of the file system. This morning, on a lark, I loaded the data into a Berkeley DB instance, and I haven't looked back since. Even with a Btree index the database builds faster than my own code, or that of other good Netflix libraries, and then you get the benefit of the index on retrieval.

I haven't evangelized for Berkeley DB here yet. And not everyone will agree with me that it's a good solution for most of your data access needs (certainly my alma mater didn't). But if you have a performance-critical problem and want a quick, simple and very low overhead mechanism for managing your data, give Berkeley DB a look. It even has Ruby bindings. It's free as in beer and code. [Updated: See comment below for an actual description of the licensing requirements.]

So for all two of you that read this blog and know and care about these things: Why are we still using relational databases for our websites? I understand the need for flexible systems for our reporting tools and data mining, but when a website has maybe 30 queries that need to be executed, and that database is almost always the first piece of an architecture to fail, why not use a custom database without the overhead of query management? Why are we still parsing SQL for simple CRUD calls when it could be done 10 times faster?
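
To make the "CRUD without SQL" idea concrete, here's a minimal sketch of what those calls look like against a plain key-value store. Loud caveat: this uses Python's standard-library dbm.dumb module purely as a stand-in because it ships with the language. It is not Berkeley DB, it says nothing about Berkeley DB's actual API or speed, and the "user:42" key scheme is something I made up for the example.

```python
import dbm.dumb
import os
import tempfile

# Stand-in store: dbm.dumb is a simple stdlib key-value database.
# It is NOT Berkeley DB; it only illustrates the access pattern.
path = os.path.join(tempfile.mkdtemp(), "users")

with dbm.dumb.open(path, "c") as db:
    db[b"user:42"] = b"alice"          # create
    name = db[b"user:42"]              # read
    db[b"user:42"] = b"alice smith"    # update
    updated = db[b"user:42"]
    del db[b"user:42"]                 # delete
    remaining = b"user:42" in db       # False once the key is gone

print(name, updated, remaining)
```

Each operation is a direct lookup by key, with no query string to parse or plan. That's the whole appeal when your site only ever asks the same handful of questions.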

9.8.07

Richard Serra

JP and I missed the ginormous (now a word recognized by the O.E.D. - w00t!) Richard Serra exhibit by about 16 hours when we were out east for MSA's graduation. Instead we got a bunch of rain. But we both fondly remember the Dia:Beacon exhibit we saw a few years ago and wish we'd made it to MoMA.

The nerdy among you saw the Serra segment on the NewsHour last night; the rest of you will have to settle for Wikipedia. In a nutshell, he works in many materials and tries to create new spaces and forms using a number of simple transformations like "lifting" or "cutting".

To get a completely inappropriate view of him, check out his work in Cremaster 3 as "The Architect", pouring Vaseline down the Guggenheim spiral.

At any rate, if you visit or live in NYC you should check out the exhibit before the pieces fall on someone and get banned.

Here's a pretty 'arty' description of his work. It really doesn't suck as much as these guys make it seem. In fact his pieces are pretty incredible.

7.8.07

Web 2.0 should be so much more.

Yeah. Second post promised about a month ago. Okay, so this post is maybe two years after everyone stopped trying to figure out what's going on here, and the models have even reached the point of complete mockability. But I'm still thinking about these things, primarily because I'm reasonably convinced that everyone's got the answer wrong. Web 2.0 by and large refers to a class of Ajaxy (completely irrelevant), community-driven (tangentially relevant - but epinions is still Web 1.0) and tagged (and this is the real story) content. I'm pretty happy to start with Web 2.0 as a system of tagging platforms. That doesn't include Facebook and MySpace.

So what's going on here then? I think Mark Zuckerberg gets it right when he talks about creating graphs of meaning (I'm extending his comments slightly). I'm confused as to why we've gotten this far into the evolution of Web 2.0 and Facebook is the first company to publicly disclose itself as a content management system with nuanced edges making navigation between content nodes trivial. What's del.icio.us other than a graph of URL nodes with typed edges between these nodes? There's no reason to consider just one set of edges: Facebook has two edge types (users, groups), plus an open API for creating new edge types.
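
If "a graph of nodes with typed edges" sounds abstract, here's a rough sketch of the data structure I have in mind. Everything in it is my own illustration: the TypedGraph class, the "tag" and "user" edge types, and the example.com URLs are assumptions for the sake of the example, not how del.icio.us or Facebook actually store anything.

```python
from collections import defaultdict


class TypedGraph:
    """Undirected graph whose edges each carry a type label."""

    def __init__(self):
        # _edges[edge_type][node] -> set of nodes linked by that edge type
        self._edges = defaultdict(lambda: defaultdict(set))

    def add_edge(self, edge_type, source, target):
        """Link two nodes in both directions under the given edge type."""
        self._edges[edge_type][source].add(target)
        self._edges[edge_type][target].add(source)

    def neighbors(self, node, edge_type):
        """Return nodes reachable from `node` over one edge of `edge_type`."""
        return set(self._edges.get(edge_type, {}).get(node, ()))

    def edge_types(self):
        """Return the edge types registered so far, sorted by name."""
        return sorted(self._edges)


g = TypedGraph()
# Hypothetical bookmarks: two URLs share a tag, two share a bookmarking user.
g.add_edge("tag", "example.com/a", "example.com/b")
g.add_edge("user", "example.com/a", "example.com/c")

print(g.neighbors("example.com/a", "tag"))
print(g.neighbors("example.com/a", "user"))
print(g.edge_types())
```

Adding a new edge type is just a new label passed to add_edge, which is roughly the point: navigation over one set of edges is a special case of navigation over many.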

It's frustrating to me that the conversation about Web 2.0 has been so focused on social edges between active users, because there's so much more meaning that can be categorized and made accessible using these tools. The original set of companies, Flickr and Del.icio.us, weren't just platforms for communicating, but systems that enabled gross categorization of human knowledge (well, the part on the web). That part's mostly gone, and now you just build a site for people to find a date on.

Couldn't we somehow crowdsource an asset of considerable social value? Can't someone create the Google of this world? The data structures, query languages and general approaches to these problems have even been solved already using Topic Maps. What's missing is the successful commercialization of these solutions in a slick (okay, maybe Ajaxy) interface that caters to people's need to categorize the web.

It's probably way too late for these solutions though, because the next batch of startups is coming that categorizes dynamically using learning machines rather than crowdsourcing. The technical challenges here are impressive, but it's nice to see a few kids here and there getting hot and bothered about these solutions. Since these systems won't have people or tags -- does that make them Web 3.0?

1.8.07

Carl Youngblood

For those of you with matching interests, Carl Youngblood wrote a nice introduction to Bayesian Networks. There's some stuff in the back about programming these things in Ruby that will limit the audience some, but the section on Conditional Independence is a nice explanation of one of the more important statistical insights of recent times. Highly recommended.
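
If you want a feel for conditional independence before clicking through, here's a small sketch of my own (not taken from Carl's article, and in Python rather than Ruby). It assumes the simplest common-cause network, C -> A and C -> B, with probabilities I made up, and checks that A and B look dependent on their own but become independent once you know C.

```python
from fractions import Fraction as F
from itertools import product

# Hypothetical numbers for a common-cause network: C -> A and C -> B.
p_c_true = F(1, 2)                                   # P(C=True)
p_a_given_c = {True: F(9, 10), False: F(1, 10)}      # P(A=True | C)
p_b_given_c = {True: F(8, 10), False: F(2, 10)}      # P(B=True | C)


def bern(p_true, value):
    """Probability of `value` for a binary variable with P(True) = p_true."""
    return p_true if value else 1 - p_true


# joint[(a, b, c)] = P(C=c) * P(A=a | C=c) * P(B=b | C=c)
joint = {
    (a, b, c): bern(p_c_true, c) * bern(p_a_given_c[c], a) * bern(p_b_given_c[c], b)
    for a, b, c in product([True, False], repeat=3)
}


def prob(**fixed):
    """Marginal probability of an assignment, e.g. prob(a=True, c=False)."""
    names = ("a", "b", "c")
    return sum(
        p for event, p in joint.items()
        if all(event[names.index(k)] == v for k, v in fixed.items())
    )


# Without knowing C, A and B are dependent: P(A, B) != P(A) * P(B).
marginally_independent = prob(a=True, b=True) == prob(a=True) * prob(b=True)

# Given C, they are independent: P(A, B | C) == P(A | C) * P(B | C).
conditionally_independent = all(
    prob(a=a, b=b, c=c) / prob(c=c)
    == (prob(a=a, c=c) / prob(c=c)) * (prob(b=b, c=c) / prob(c=c))
    for a, b, c in product([True, False], repeat=3)
)

print(marginally_independent, conditionally_independent)
```

Fractions keep the comparisons exact, so the equality checks aren't at the mercy of floating point rounding.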