By nature of all so-called „No-SQL“ databases, there is no query language which offers the same expressiveness as SQL. In particular, there is no notion like “joins”. In most cases, this is not an issue since the more flexible structure of a document makes it unnecessary to distribute information to different tables, i.e. there is simply nothing to join.
However, if you need to do something similar to joins, there is a solution: view collations. The original idea has already been described by Christopher Lenz back in 2007. With CouchDB version 0.11, the include_docs parameter improves this approach.
Combining view collation with appropriate list functions provides an easy way to combine several documents into one complex JSON document which can be queried.
Let’s look at an example of a many-to-many relationship based on the data model I use for the Haynes catalog. A “work” represents a piece of music which has been written by one or more “composers”. Needless to say, a composer may have written several works. Users can add comments to works. For various reasons I would like to keep these three types of documents normalized, i.e. works, composers and comments are stored in different components.
In a CouchApp, I would like to render a work including composer information and the comments. Since the result should be search-engine friendly, I would like to avoid javascript to load the data asynchronously.
Ideally, I would like to have an enhanced work document which includes an array of composers and comments.
To see how this can be done, let’s look at an example:
{_id: “work_1”, type: ”work”, title: “Title of work 1”, composer_ids: [“composer_1”, “composer_2”] }
{_id: “composer_1”, type: ”composer”, name: “Name of composer 1” }
{_id: “composer_2”, type: ”composer”, name: “Name of composer 2” }
{_id: “comment_1”, type: “comment”, text: “Comment for work 1”, work_id: “work_1” }
{_id: “comment_2”, type: “comment”, text: “Another comment for work 2”, work_id: “work_1” }
As you can see, there is one work written by two composers and having two comments.
I would like to create the following document which could be used in my template to render html:
{_id: “work_1”, type: ”work”, title: “Title of work 1”,
composers: [
{_id: “composer_1”, type: ”composer”, name: “Name of composer 1” },
{_id: “composer_2”, type: ”composer”, name: “Name of composer 2” }],
comments: [
{_id: “comment_1”, type: “comment”, text: “Comment for work 1”, work_id: “work_1”},
{_id: “comment_2”, type: “comment”, text: “Another comment for work 2”, work_id: “work_1”}]
}
Wouldn’t it be great if I could use this document in may template and simply write <%= work.composers[0].name %> instead of ajax calls?
Combining the known techniques from above with a list function, gives you exactly this possibility. Let’s start with the map function for the view called “works”:
function (doc) {
if (doc.type == "work") {
emit([doc._id], {_id:doc._id});
for (var ix in doc.composer_ids) {
emit([id, "composers", ix], {_id: doc.composer_ids[ix]});
}
}
if (doc.type == "comment") {
emit([doc.work_id, "comments", doc._id], {_id: doc._id});
}
}
There is no reduce function for this view. Actually there is an approach to use reduce functions to merge the documents into one document, but as you can see in the comments to this article this is not advised for larger data sets.
Instead, we use a list function, let’s call it “join”:
function(head, req) {
provides("json", function() {
var data = [];
var dataItem = {};
var row, key;
while (row = getRow()) {
if (row.key.length == 1) {
if (dataItem._id && dataItem._id != row.value._id) {
data.push(dataItem);
}
dataItem = row.doc;
} else {
object = dataItem;
for (var i = 1; i < row.key.length - 1; i++) {
if (!(row.key[i] in object))
object[row.key[i]] = [];
object = object[row.key[i]];
}
object[row.key[row.key.length - 1]] = row.doc;
}
}
data.push(dataItem);
return toJSON(data);
});
}
Before we go into detail what happens in this method, let’s look at the result of the query http://localhost:5984/db_name/_design/design_name/_list/join/works?include_docs=true&startkey=[“work_1”]&endkey=[“work_1”,{}]:
[{_id: “work_1”, type: ”work”, title: “Title of work 1”,
composers: [
{_id: “composer_1”, type: ”composer”, name: “Name of composer 1” },
{_id: “composer_2”, type: ”composer”, name: “Name of composer 2” }],
comments: [
{_id: “comment_1”, type: “comment”, text: “Comment for work 1”, work_id: “work_1”},
{_id: “comment_2”, type: “comment”, text: “Another comment for work 2”, work_id: “work_1”}]
}]
The result is an array of the enhanced work documents. In a real example, the “data” array or rather only the first element would be used in a template to provide html instead of json, but you get the idea.
Now let’s have a closer look at what happens in the list function. If the key is an array with only one element (row.key.length == 1), we store the corresponding document. If we encounter a new one, we push the old one as this is obviously completely processed.
If the key has more than 1 element, the additional elements contain the property names of the final document which should contain the document. For example the key [“work_1”, “composers”, 0] means that work[“composers”][0] (=work.composer[0]) should contain the document for this key. The for loop makes sure that the properties exist (if not, they are initialised as []). Finally the document can be stored in the property.
As a result, you can create arbitrarily complex compound objects joining several separate documents. As you will typically join only a relative small number of documents, the list function performs well.
Tags: CouchDB