
CouchDB and Topic Maps by Hans-Henning Koch, University of Leipzig - contact: phi04bib@studserv.uni-leipzig.de

What is CouchDB?
CouchDB is an open source document-oriented database management system and an incubation project of the Apache Software Foundation.
It is written in Erlang, a functional programming language designed for distributed, fault-tolerant and fast applications. This fits CouchDB's goals well: to be a fast, scalable and lock-free DBMS.
As 'document-oriented database' implies, the data is stored in semi-structured documents and not in tables as in common RDBMSs. Therefore, CouchDB doesn't need a schema. View functions are used to query and report on documents.
CouchDB is accessed through a RESTful JSON API. There are libraries for many programming languages that allow convenient access, but curl or Apache's HttpClient for Java, which I am using, work well, too.
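To make the REST interface concrete, the following minimal sketch builds the HTTP requests a client would send, without contacting a server. It assumes a local CouchDB on its default port 5984; the database name topicmap, the design document name tm and all function names are hypothetical. The URL patterns (PUT or POST to save, GET to read, _design/<ddoc>/_view/<view> for views) follow CouchDB's HTTP API.

```javascript
// Sketch: CouchDB's RESTful JSON API expressed as plain request descriptors.
// Assumptions: server at localhost:5984 (CouchDB's default port); database and
// design document names are hypothetical; nothing is actually sent.
const BASE_URL = 'http://localhost:5984';

function createDatabaseRequest(db) {
  // PUT /db creates a database.
  return { method: 'PUT', url: `${BASE_URL}/${db}` };
}

function saveDocumentRequest(db, doc) {
  const body = JSON.stringify(doc);
  if (doc._id === undefined) {
    // POST /db stores a document and lets CouchDB assign the ID.
    return { method: 'POST', url: `${BASE_URL}/${db}`, body };
  }
  // PUT /db/id creates or updates the document with that ID
  // (an update has to carry the current _rev).
  return { method: 'PUT', url: `${BASE_URL}/${db}/${encodeURIComponent(doc._id)}`, body };
}

function getDocumentRequest(db, id) {
  return { method: 'GET', url: `${BASE_URL}/${db}/${encodeURIComponent(id)}` };
}

function viewRequest(db, designDoc, view, key) {
  // View keys are JSON values, passed URL-encoded in the query string.
  const query = `key=${encodeURIComponent(JSON.stringify(key))}`;
  return { method: 'GET', url: `${BASE_URL}/${db}/_design/${designDoc}/_view/${view}?${query}` };
}

const save = saveDocumentRequest('topicmap', { _id: 'nameID', value: 'this is a name' });
console.log(save.method, save.url); // PUT http://localhost:5984/topicmap/nameID
```

With curl, the same save request would be `curl -X PUT http://localhost:5984/topicmap/nameID -d '{"value":"this is a name"}'`.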
Documents
Documents in CouchDB are simply JSON objects, i.e. sets of key-value pairs. Every document has two special attributes: _id and _rev. The ID is a unique identifier for a document in the database; if it is not provided when a document is created, the database assigns one automatically. The _rev attribute stores the revision number of the document. When a document is modified, the revision is updated, but its old revision can still be accessed until the database is compacted.
The document model is lockless and optimistic. When an edited document is saved but its revision doesn't match the current revision in the database, because someone else saved it after you read it, an update conflict is returned. The current version then has to be loaded, the changes reapplied and the document saved again.
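The conflict-and-retry cycle can be sketched with an in-memory stand-in for the database. This is a hypothetical illustration, not CouchDB code: revisions are plain integers, a conflict is a thrown ConflictError instead of CouchDB's HTTP 409 Conflict response, and MemoryDb and updateWithRetry are names invented for the example.

```javascript
// In-memory stand-in for a CouchDB database, only to illustrate the lockless,
// optimistic revision check. Assumptions: integer revisions (CouchDB uses its own
// revision strings); a real server answers a stale write with HTTP 409 instead of throwing.
class ConflictError extends Error {}

class MemoryDb {
  constructor() {
    this.docs = new Map();
  }

  get(id) {
    // Return a copy, as a client would receive a fresh JSON document.
    const doc = this.docs.get(id);
    return doc === undefined ? undefined : JSON.parse(JSON.stringify(doc));
  }

  save(doc) {
    const current = this.docs.get(doc._id);
    // A write is accepted only if it is based on the current revision.
    if (current !== undefined && current._rev !== doc._rev) {
      throw new ConflictError(`update conflict on ${doc._id}`);
    }
    const rev = current === undefined ? 1 : current._rev + 1;
    this.docs.set(doc._id, { ...doc, _rev: rev });
    return rev;
  }
}

// Load the newest revision, reapply the change and save again after a conflict.
function updateWithRetry(db, id, applyChange, maxAttempts = 3) {
  for (let attempt = 1; ; attempt++) {
    const doc = db.get(id);
    applyChange(doc);
    try {
      return db.save(doc);
    } catch (err) {
      if (!(err instanceof ConflictError) || attempt >= maxAttempts) throw err;
    }
  }
}

const db = new MemoryDb();
db.save({ _id: 'nameID', value: 'this is a name' }); // revision 1
const mine = db.get('nameID');                        // I read revision 1 ...
const theirs = db.get('nameID');
theirs.value = "someone else's change";
db.save(theirs);                                      // ... someone else saves revision 2
mine.value = 'my change';
try {
  db.save(mine);
} catch (err) {
  console.log(err.message); // update conflict on nameID
}
console.log(updateWithRetry(db, 'nameID', (doc) => { doc.value = 'my change'; })); // 3
```

Capping the attempts keeps a heavily contended document from being retried forever; after the last attempt the conflict is passed on to the caller.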
Example: a topic name

{
  "_id" : "nameID",
  "_rev" : "12345",
  "value" : "this is a name",
  "type" : "typeID",
  "scope" : ["scopeTopicID"],
  "iids" : "itemIdentifier",
  "parent" : "parentTopicsID"
}

Topic map constructs as documents
Now, why use CouchDB as a backend for Topic Map engines? The concept of a document is much closer to the nature of topic maps and their constructs than storage in a relational database is. In a document-oriented database, a construct can be saved as a whole and doesn't have to be split up among several tables. This way, the number of join-like operations can be greatly reduced and the structure of the queries is kept relatively simple. The lockless environment and CouchDB's scalability further contribute to good performance.

Views
Views in CouchDB are JavaScript functions that are run over all documents. They are stored in special documents called design documents. For each document, key-value pairs can be emitted. The value can be the whole document, parts of it or something calculated from that document. The key is compared with the parameters the view was queried with, and the matching key-value pairs are returned to the application. A view cannot include a different document that is referenced by the one being processed; for example, it cannot return a topic with its names included. In that case, more than one view call is needed. That may sound like a disadvantage, but it is not: CouchDB keeps queries simple by intention, which contributes to scalability.
The view model is based on Google's MapReduce algorithm. Each view has to have a map function and may have a reduce function. When a reduce function is present, the map function passes intermediate results to the reduce function, which calculates the final return value from them.
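The view model can be tried out without a server. The following minimal in-memory sketch runs a map function over every document, keeps the rows whose key matches the query and optionally hands them to a reduce function; it also shows why a topic with its names takes two view calls. It is not CouchDB's implementation: emit is passed in as an argument instead of being a global, key collation and sorting are ignored, and runView, byId, namesByParent and the document type 'TOPIC' are assumptions for illustration.

```javascript
// Minimal in-memory simulation of a CouchDB view (a sketch, not CouchDB's code).
// Simplifications: emit is passed to the map function as a second argument
// (in CouchDB it is a global); keys are matched by JSON equality, and CouchDB's
// key collation and sort order are not modelled.
function runView(docs, map, { key, reduce } = {}) {
  const rows = [];
  for (const doc of docs) {
    map(doc, (k, v) => rows.push({ id: doc._id, key: k, value: v }));
  }
  const matched = key === undefined
    ? rows
    : rows.filter((row) => JSON.stringify(row.key) === JSON.stringify(key));
  if (reduce === undefined) return matched;
  // With a reduce function the map output is only intermediate:
  // reduce(keys, values, rereduce) computes the final value.
  return reduce(matched.map((row) => [row.key, row.id]), matched.map((row) => row.value), false);
}

// Hypothetical sample data: one topic and two of its names, stored as separate documents.
const docs = [
  { _id: 'topicID', documenttype: 'TOPIC' },
  { _id: 'name1', documenttype: 'NAME', value: 'Leipzig', parent: 'topicID' },
  { _id: 'name2', documenttype: 'NAME', value: 'Lipsk', parent: 'topicID' },
];

const byId = (doc, emit) => emit(doc._id, doc);
const namesByParent = (doc, emit) => {
  if (doc.documenttype === 'NAME') emit(doc.parent, doc.value);
};

// A topic "with its names included" takes two view calls instead of one join.
const topic = runView(docs, byId, { key: 'topicID' })[0].value;
const names = runView(docs, namesByParent, { key: topic._id }).map((row) => row.value);
const nameCount = runView(docs, namesByParent, {
  key: topic._id,
  reduce: (keys, values) => values.length,
});
console.log(names, nameCount); // names: ['Leipzig', 'Lipsk'], nameCount: 2
```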

Additional attributes for topic map constructs
To be able to store topic map constructs in CouchDB, a few attributes beyond those defined in the TMDM have to be added. Firstly, an ID that uniquely identifies the construct, which is required by the database. Secondly, a revision, which is required by default, too. The revision is updated after every change that is saved; the JSON string returned after saving a document contains the new revision. And finally, a document type attribute: when querying for names that have a certain topic in their scope, for example, we don't want to get variants as well, and without separate tables for the different constructs a type attribute is the only way to achieve that.

Views to retrieve Topic Map constructs
The views needed to retrieve topic map constructs from the database are quite simple; reduce functions are not needed. All views have a structure similar to the examples below. My Topic Map engine uses 28 views.

Example: variantMergeCheck

function(doc) {
  if (doc.documenttype == 'VARIANT') {
    // Sort a copy of the scope, so that variants whose scope topics are stored
    // in a different order produce the same key; the document itself is unchanged.
    var scope = doc.scope.slice();
    scope.sort();
    emit([doc.value, doc.datatype, scope, doc.parent], doc);
  }
}

Example: getNameByTheme

function(doc) {
  if (doc.documenttype == 'NAME') {
    // Names without a scope are indexed under the key 'none'.
    if (!doc.scope || doc.scope.length == 0) {
      emit('none', doc);
    } else {
      // One row per scope topic, so a name can be found via each of its themes.
      for (var i = 0; i < doc.scope.length; i++) {
        emit(doc.scope[i], doc);
      }
    }
  }
}

How to deal with potential inconsistency
Before a change is written to the database, it first has to be checked whether the document to be saved is in its latest revision. If not, it has to be reloaded and the changes reapplied to its newest revision. Unfortunately, TMAPI does not define an exception that could alert the user that this has just happened. Therefore, the changes would either have to be discarded without notice, or one has to accept that the implementation is not 100 percent TMAPI-compliant.
When a modification of one document entails changes in other documents, for example when merging, either all write operations have to succeed or all have to be reverted. But in that case it could happen that a document to be reverted has changed again in the meantime. A possible solution to this problem would be to use CouchDB 0.8.0 (the current version is 0.10.0), which supports a bulk update operation that fails when a document is out of revision.
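The all-or-nothing bulk update mentioned above can be illustrated without a server. The following is a hypothetical in-memory sketch, not CouchDB's bulk API: it checks every document's revision before writing any of them, so a single stale document rejects the whole batch and nothing has to be reverted. The store (a Map from ID to document), the integer revisions and the function name bulkSave are assumptions for illustration.

```javascript
// Hypothetical all-or-nothing bulk update over an in-memory store (a Map from
// document ID to document, with integer revisions); not CouchDB's bulk API.
function bulkSave(store, docs) {
  // First pass: compare every revision before anything is written.
  const conflicts = docs
    .filter((doc) => {
      const current = store.get(doc._id);
      return current !== undefined && current._rev !== doc._rev;
    })
    .map((doc) => doc._id);
  if (conflicts.length > 0) {
    // Nothing has been written, so nothing has to be reverted.
    return { ok: false, conflicts };
  }
  // Second pass: all revisions are current, so every write is applied.
  for (const doc of docs) {
    const current = store.get(doc._id);
    store.set(doc._id, { ...doc, _rev: current === undefined ? 1 : current._rev + 1 });
  }
  return { ok: true, conflicts: [] };
}

// Hypothetical merge of topicB into topicA: both names move to topicA, but name2
// was changed by someone else (revision 4) after this client read it (revision 3).
const store = new Map([
  ['name1', { _id: 'name1', _rev: 2, documenttype: 'NAME', parent: 'topicB' }],
  ['name2', { _id: 'name2', _rev: 4, documenttype: 'NAME', parent: 'topicB' }],
]);
const result = bulkSave(store, [
  { _id: 'name1', _rev: 2, documenttype: 'NAME', parent: 'topicA' },
  { _id: 'name2', _rev: 3, documenttype: 'NAME', parent: 'topicA' },
]);
// result.ok is false and result.conflicts is ['name2']; name1 still points to topicB.
console.log(result, store.get('name1').parent);
```

After a rejected batch, the caller would reload the listed documents, reapply the merge and submit the whole batch again.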
Further reading
CouchDB website http://couchdb.apache.org/
CouchDB Wiki http://wiki.apache.org/couchdb/
CouchDB: The Definitive Guide http://books.couchdb.org/relax/ a free O’Reilly Media book
