Correlating documents in CouchDB

I’m in a very fortunate position, two actually: I get to 1) migrate my biggest production application from MySQL to CouchDB, and 2) build a stunning new system for a multinational welfare organization on top of CouchDB.

I’ve been lurking in the CouchDB world for quite some time and have spent a lot of it wrestling with how to loosely relate documents in CouchDB to each other. A big part of learning to use CouchDB successfully is breaking away from the shackles of the relational world. Relationships between documents are one such shackle, and it seems hard to break.

It is unavoidable that data has to be correlated, and I wanted to rethink how to do it. After plenty of discussions in #ruote with John Mettraux, we came up with a model based on how the web works. Since CouchDB is “of the web”, the approach feels quite fitting to me and hopefully to you too.

First, some insight into my thinking at this stage.

The web has been successful in loosely expressing relationships between documents. Take two examples:

and

For those of you reading this through a reader, click through to see the gists above.

Simple as it seems, in both cases we have a document that is somehow related to the page. The nature of the relationship is expressed via the rel attribute, and the target is specified via the href attribute. This got me thinking. Since CouchDB is made of the web, can’t these same principles be applied?

Yes, they can. And here is how:

Currently you might be tempted to express relationships like this in your JSON:

Changing it to this holds the key:

If anything, this format, albeit more verbose, expresses the relationships more clearly and in a way that is web friendly. We’ve broken the shackles of relational thinking.
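To make the contrast concrete, here is a minimal Ruby sketch of the two document shapes. Every name in it is an assumption for illustration only: the `author_id` and `comment_ids` fields, the `links` array with `rel` and `href` keys, the document ids, and the `hrefs_for` helper are all hypothetical and not necessarily what your own documents look like.

```ruby
require 'json'

# Relational habit: foreign-key style fields (hypothetical field names).
RELATIONAL_STYLE = {
  '_id' => 'post-1',
  'title' => 'Correlating documents in CouchDB',
  'author_id' => 'user-42',
  'comment_ids' => ['comment-7', 'comment-8']
}.freeze

# Web style: every relationship is a link whose rel names the nature of the
# relationship and whose href names the target document (assumed layout).
WEB_STYLE = {
  '_id' => 'post-1',
  'title' => 'Correlating documents in CouchDB',
  'links' => [
    { 'rel' => 'author',  'href' => 'user-42' },
    { 'rel' => 'comment', 'href' => 'comment-7' },
    { 'rel' => 'comment', 'href' => 'comment-8' }
  ]
}.freeze

# Hypothetical helper: collect the targets of all links with a given rel.
# Documents without a links array simply yield an empty list.
def hrefs_for(doc, rel)
  doc.fetch('links', [])
     .select { |link| link['rel'] == rel }
     .map { |link| link['href'] }
end

puts JSON.pretty_generate(WEB_STYLE)
puts hrefs_for(WEB_STYLE, 'comment').inspect
```

The verbose form pays for itself in the helper: one generic lookup by `rel` covers every kind of relationship, where the foreign-key form needs code that knows each field name in advance.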

Enter Correlate

Correlate is an experiment in this line of thinking. It is a mixin for CouchRest’s extended documents that allows you to express these relationships:

Correlate generates getter and setter methods for working with your relationships, and a lot more (review the README). It also includes a compatibility layer for ActiveRecord to help when you’re migrating from ActiveRecord to CouchRest or building a system on CouchDB that needs to access legacy data via ActiveRecord.
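For a rough idea of what generated accessors over a links array can look like, here is a self-contained sketch in plain Ruby. It is not Correlate's API or implementation (see the README for that): the `LinkedDocument` module, the `links_to` macro, and the `Post` class are all hypothetical names invented for this illustration, and nothing here talks to CouchDB.

```ruby
# Hypothetical mixin for illustration; NOT Correlate's real API.
module LinkedDocument
  def self.included(base)
    base.extend(ClassMethods)
  end

  module ClassMethods
    # Define a getter and a setter for a single-valued relationship.
    def links_to(rel)
      # Getter: the href of the first link with this rel, or nil.
      define_method(rel) do
        link = links.find { |l| l['rel'] == rel.to_s }
        link && link['href']
      end

      # Setter: replace any existing link with this rel.
      define_method("#{rel}=") do |target_id|
        links.reject! { |l| l['rel'] == rel.to_s }
        links << { 'rel' => rel.to_s, 'href' => target_id }
      end
    end
  end

  # The verbose structure stays the source of truth;
  # the accessors only wrap it.
  def links
    @links ||= []
  end
end

class Post
  include LinkedDocument
  links_to :author
end

post = Post.new
post.author = 'user-42'
puts post.author
```

The point of the design is that the convenience methods never hide the data: `post.links` still holds the same web-friendly array that would be serialized into the document.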

The project is still pretty much a moving target, but I’d love to hear how others address the same issues. Correlate does a great job at maintaining relationship information in a web friendly manner and providing you with some convenience around the verbose data structure. Correlate also has a lot of room for improvement, but that will hopefully change over the coming days as I continue integrating it deeply into my existing projects.

Please fork the project on GitHub and join the experiment with me.

  • TheSorrow

    But how do you fetch the related data? You need a view… Do you generate the view on the fly?

  • http://www.aimred.com Farrel

    What about a hybrid approach? I’ve been thinking about using a SQL database and to purely use it to model the essential relationships which need representing (and not to use it to model the entities) and then using a NoSQL database to store the document for fast retrieval. Not sure if that’s been tried before in the NoSQL world or if it’s even feasible…

  • Kenneth Kalmer

    @TheSorrow – Correlate generates a view dynamically on the class where related_to() is called. In any case, the document’s _id is known to Correlate, so it picks them individually from the database.

  • Sammy

    I like the “links” “microformat” and am now considering using it in my pet project.

  • http://jchrisa.net J Chris A

    This is really cool. It looks like what I’d hoped to see in Ruby land. Thanks for sharing.