For the last month or two the Digg engineering team has spent quite a bit of time looking into, playing with and finally deploying Cassandra in production. It’s been a super fun project to take on, but even before the fun began we had to spend quite a bit of time figuring out Cassandra’s data model… the phrase “WTF is a ‘super column’” was uttered quite a few times.
If you’re coming from an RDBMS background (which is almost everyone) you’ll probably trip over some of the naming conventions while learning about Cassandra’s data model. It took me and my team members at Digg a couple of days of talking things out before we “got it”. In recent weeks a bikeshed went down on the dev mailing list proposing a completely new naming scheme to alleviate some of the confusion. Throughout this discussion I kept thinking: “maybe if there were some decent examples out there people wouldn’t get so confused by the naming.” So, this is my stab at explaining Cassandra’s data model; it’s intended to help you get your feet wet and doesn’t go into every single detail but, hopefully, it helps clarify a few things.