Fun With Heat Maps
It's been a while. Long enough that RD called me out on not posting since the summer. I can't promise to keep up with his ferocious pace, but I really ought to get back in gear. So here we go (we'll start easy):
As many of you know (and have been bored to tears by), I've been wasting a good deal of my time on the Netflix Prize. Other than failing submissions, there are a bunch of interesting byproducts of this effort. Here are a few heat maps I've created from the Netflix Prize Corpus. There's not a lot of information there, but I thought a few of them were pretty interesting to look at. Not much to say about them other than that color represents probability density, and the curves on the left and the bottom of the charts are the marginal distributions of that density, marked up to show the contribution coming from each intensity level (so a very cool-colored peak would be a long tail effect, etc.). The colors suck; if someone has a good system for allocating colors to intensity, I'd love to hear about it.
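For anyone who wants to try this at home, here is a minimal sketch of the bookkeeping behind a chart like these, not the code I actually used. It assumes the data is a list of integer (x, y) pairs; the function name `joint_density` and the sample values are made up for illustration and are not Netflix data. It builds only the joint density and the plain marginals, and skips the per-intensity markup on the curves.

```python
from collections import Counter


def joint_density(pairs):
    """Turn (x, y) pairs into a joint probability table plus both marginals."""
    counts = Counter(pairs)
    total = sum(counts.values())
    # Each cell's probability is its share of all observations.
    joint = {cell: n / total for cell, n in counts.items()}

    # A marginal is the joint table summed over the other variable.
    marginal_x = Counter()
    marginal_y = Counter()
    for (x, y), p in joint.items():
        marginal_x[x] += p
        marginal_y[y] += p
    return joint, dict(marginal_x), dict(marginal_y)


# Hypothetical sample of (x, y) integer pairs, purely for illustration.
sample = [(4, 0), (4, 0), (5, 1), (3, 1), (4, 1), (1, 2), (4, 2), (5, 2)]
joint, marginal_x, marginal_y = joint_density(sample)
```

The `joint` table is what gets colored in the heat map, and `marginal_x` and `marginal_y` are the curves drawn along the bottom and the left.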
In a few of them, patterns emerge from the integer nature of the data, which is something you don't get to see every day.

Some of them need extra binning on account of the dimension variable being flaky, so there's a lot less density overall here.
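One way to read "extra binning" is merging neighboring cells of a fine count grid into coarser ones. The sketch below assumes that reading; the helper name `rebin`, the merge factor, and the toy grid are all hypothetical and are not taken from the charts.

```python
def rebin(grid, factor):
    """Sum factor-by-factor blocks of a 2D count grid into coarser bins."""
    rows, cols = len(grid), len(grid[0])
    # Round up so leftover rows and columns land in a final, smaller block.
    out_rows = (rows + factor - 1) // factor
    out_cols = (cols + factor - 1) // factor
    coarse = [[0] * out_cols for _ in range(out_rows)]
    for r in range(rows):
        for c in range(cols):
            coarse[r // factor][c // factor] += grid[r][c]
    return coarse


# Toy 4x4 grid of counts, made up for illustration.
fine = [
    [1, 0, 0, 2],
    [0, 0, 1, 0],
    [0, 3, 0, 0],
    [0, 0, 0, 1],
]
coarse = rebin(fine, 2)
```

Summing counts rather than averaging keeps the total number of observations unchanged, so the coarse grid can still be normalized into a density afterwards.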

A few show misleading but tempting linear relationships that you can't actually run a standard regression against with any hope of getting real results.


And some that show just about perfect noise.
