
Camlistore

http://camlistore.org/ Brad Fitzpatrick, brad@danga.com 2011-02-01

What is Camlistore?
20% project, personal itch to scratch
  o sick of building CMS-like systems
  o livejournal, photos, brackup, scanning cabinet, my websites, ...
  o hacking since Jun 2010, idle planning for ~year before that.
storage system?
  o yeah, but not the whole story.
"a way to store, sync, share, model and back up content"
...

Religion (or lack thereof)

"cloud" or local? both.
  o for storage
  o for content creation (sync adapters)
programming language? any, all.
  o interfaces are what matters.
identity / verifiableness
you should own your content, have backup
dorks are great, but this must be usable.
  o mental model: "Your stuff. Can't screw it up."

oh, right, it's an acronym...


Content-Addressable, Multi-Layer, Indexed Storage

Content-Addressable
at the bottom-most layer, everything is addressed by the digest of its bytes

Terminology:
"Blob" -- 0 or more bytes. No extra metadata.
"Blobref" -- handle to a blob, in form <hashfunc>-<xxxx>
  e.g. sha1-8a30407962eeb19b309b78ddf587aea18ab55232
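
A minimal sketch of how a blobref could be derived from a blob's bytes, assuming the digest is written as lowercase hex after the hash function name and a hyphen (which matches the sha1 example above). The function name blobref is hypothetical and is not taken from the Camlistore tools.

```python
import hashlib


def blobref(data: bytes, hashfunc: str = "sha1") -> str:
    """Return a handle of the form <hashfunc>-<hexdigest> for the given bytes."""
    digest = hashlib.new(hashfunc, data).hexdigest()
    return f"{hashfunc}-{digest}"


if __name__ == "__main__":
    # A blob may be zero bytes long; it still has a well-defined name.
    print(blobref(b""))
    print(blobref(b"Hello, world!"))
```

The same bytes always produce the same name, so the name doubles as an integrity check.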

Content-Addressability properties
trivial caching / syncing: you have it or you don't.
  o no "which version do you have?"
content-deduplication
  o multiple users having same content,
  o filesystem backup snapshots
incrementals are cheap
etc...
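
The "you have it or you don't" property can be sketched as a set difference over blobrefs. This is only an illustration: the two stores are plain dicts mapping blobref to bytes, and the helper names (blobref, missing_blobs, sync) are hypothetical.

```python
import hashlib


def blobref(data: bytes) -> str:
    """Name a blob by the SHA-1 digest of its bytes."""
    return "sha1-" + hashlib.sha1(data).hexdigest()


def missing_blobs(source: dict, dest: dict) -> set:
    """Blobrefs present in source but absent from dest. No version checks needed."""
    return set(source) - set(dest)


def sync(source: dict, dest: dict) -> int:
    """Copy only the blobs dest lacks; return how many were copied."""
    copied = 0
    for ref in missing_blobs(source, dest):
        dest[ref] = source[ref]
        copied += 1
    return copied


if __name__ == "__main__":
    laptop = {blobref(b): b for b in (b"photo", b"notes", b"scan")}
    server = {blobref(b"photo"): b"photo"}
    print(sync(laptop, server))  # first run copies what is missing
    print(sync(laptop, server))  # second run has nothing left to copy
```

Identical content stored twice lands under one name, which is where the deduplication and the cheap incrementals come from.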

Multi-Layer
Unix school of thought:
  o small, well-defined composable tools
Camlistore has multiple layers / parts:
  o blob server: super dumb
  o schema: how one might represent data
  o search/indexer: make sense of dumbness, data
  o frontend: interact with world, sharing.

Architecture

Blob Server: how dumb it is...


Private operations, to owner of data only:
  o get(blobref) -> blob
  o put(blobref, blob)
  o enumerate(..) -> [(blobref, size), ...]
Public/non-owner operations:
  o none!

GET /camli/sha1-xxxxxxxxx HTTP/1.0
....
Hello, world!
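
The three owner-only operations can be sketched as an in-memory store. This is a toy under stated assumptions, not the real server: the class name BlobServer is hypothetical, enumerate takes no arguments here because the slide elides them, and rejecting a put whose bytes do not match the blobref is an assumption about what a content-addressed store would want.

```python
import hashlib


def blobref(data: bytes) -> str:
    """Name a blob by the SHA-1 digest of its bytes."""
    return "sha1-" + hashlib.sha1(data).hexdigest()


class BlobServer:
    """Stores bytes keyed by blobref. No filenames, types, or timestamps."""

    def __init__(self):
        self._blobs = {}

    def put(self, ref: str, blob: bytes) -> None:
        # Assumption: refuse data whose digest does not match its name.
        if blobref(blob) != ref:
            raise ValueError("blobref does not match blob contents")
        self._blobs[ref] = blob

    def get(self, ref: str) -> bytes:
        return self._blobs[ref]

    def enumerate(self) -> list:
        """Return [(blobref, size), ...] sorted by blobref."""
        return [(ref, len(self._blobs[ref])) for ref in sorted(self._blobs)]


if __name__ == "__main__":
    server = BlobServer()
    data = b"Hello, world!"
    server.put(blobref(data), data)
    print(server.enumerate())
```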

Blob Server: dumbness continued


so, just blobs. remember:
no meta-data
no "filenames"
no "mime types"
no "{create,mod,access} time"
nothing
seriously, no metadata!
  o we will fight you. (and have)

Uh, what can you do with that?


Not a terrible lot. But let's start with an easy example at this layer...

Filesystem backups
previous project: brackup
  o slice/dice/encrypt S3 backup, content-addressed, but only files C-A, not dirs.
fossil/venti, git: recursive directories content-addressed
  o git: "tree objects"
camlistore: "schema blobs"

Schema Blobs
so if all blobs are just dumb blobs of bytes with no metadata, how do you store metadata?
as blobs themselves!
how to recognize it? same way you sniff a JPEG. magic.
start with a '{'? parse as JSON?
in memory schema -> JSON object serialization
with "camliVersion" key == "schema blob"

Schema Blob
Minimal "schema blob" is:
  { "camliVersion": 1, "camliType": "whatever" }
Whitespace doesn't matter. Just must be valid JSON in its entirety. Use whatever JSON libraries you've got.
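
The sniffing rule described above (starts with '{', parses as JSON, has a "camliVersion" key) can be sketched as follows. The function name is_schema_blob is hypothetical, and requiring the very first byte to be '{' is a literal reading of the slide rather than a confirmed rule.

```python
import json


def is_schema_blob(blob: bytes) -> bool:
    """Guess whether a blob is a schema blob by inspecting its bytes only."""
    # Cheap magic-number check first, like sniffing a JPEG header.
    if not blob.startswith(b"{"):
        return False
    try:
        obj = json.loads(blob.decode("utf-8"))
    except ValueError:
        # Covers both invalid UTF-8 and invalid JSON.
        return False
    return isinstance(obj, dict) and "camliVersion" in obj


if __name__ == "__main__":
    print(is_schema_blob(b'{ "camliVersion": 1, "camliType": "whatever" }'))
    print(is_schema_blob(b"Hello, world!"))
```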

That one is named sha1-19e851fe3eb3d1f3d9d1cefe9f92c6f3c7d754f6

Schema Blob; type "file"


{"camliVersion": 1,
 "camliType": "file",
 "fileName": "foo.dat",
 "unixPermission": "0644",
 ...,
 "size": 6000133,
 "contentParts": [
   {"blobRef": "sha1-...dead", "size": 111},
   {"blobRef": "sha1-...beef", "size": 5000000, "offset": 492 },
   {"size": 1000000},
   {"blobRef": "digalg-blobref", "size": 22}
 ]
}
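
A sketch of how a reader might reassemble file contents from "contentParts". The interpretation is an assumption, not something the slide states: "size" is the number of bytes a part contributes, "offset" is where to start reading inside the referenced blob, and a part with no "blobRef" is a run of zero bytes. It is at least consistent with the example, where the part sizes (111 + 5000000 + 1000000 + 22) sum to the file's "size" of 6000133. The names read_file and fetch are hypothetical.

```python
def read_file(schema: dict, fetch) -> bytes:
    """Concatenate the parts of a "file" schema blob.

    fetch is any callable mapping a blobref to that blob's bytes.
    """
    out = bytearray()
    for part in schema["contentParts"]:
        size = part["size"]
        ref = part.get("blobRef")
        if ref is None:
            # Assumption: a part without a blobRef is a hole of zero bytes.
            out += bytes(size)
            continue
        offset = part.get("offset", 0)
        chunk = fetch(ref)[offset:offset + size]
        if len(chunk) != size:
            raise ValueError(f"blob {ref} is too short for this part")
        out += chunk
    if len(out) != schema["size"]:
        raise ValueError("part sizes do not add up to the file size")
    return bytes(out)


if __name__ == "__main__":
    # Placeholder blobrefs, for illustration only.
    blobs = {"ref-a": b"hello", "ref-b": b"xxworldyy"}
    schema = {
        "camliVersion": 1,
        "camliType": "file",
        "size": 12,
        "contentParts": [
            {"blobRef": "ref-a", "size": 5},
            {"size": 2},
            {"blobRef": "ref-b", "size": 5, "offset": 2},
        ],
    }
    print(read_file(schema, blobs.__getitem__))
```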

Schema Blob; type "directory"


{"camliVersion": 1,
 "camliType": "directory",
 "fileName": "foodir",
 "unixPermission": "0755",
 ...,
 "entries": "sha1-c3764bc2138338d5e2936def18ff8cc9cda38455"
}

Schema Blob; type "static-set"


{"camliVersion": 1,
 "camliType": "static-set",
 "members": [
   "sha1-xxxxxxxxxxxx",
   "sha1-xxxxxxxxxxxx",
   "sha1-xxxxxxxxxxxx",
   "sha1-xxxxxxxxxxxx",
   "sha1-xxxxxxxxxxxx",
   "sha1-xxxxxxxxxxxx"
 ]
}

Backup your filesystem...


$ camput --file $HOME
sha1-8659a52f726588dc44d38dfb22d84a4da2902fed

(like git/hg/fossil, that identifier represents everything down.)


Iterative backups are cheap, easy identifier to share, etc.

But what about mutable data?


immutable data is easy to represent & reference
how to represent mutable data in an immutable, content-addressed world?
how to share a reference to a mutable object when changing an object mutates its name?

Objects & Permanodes

Terminology...
"object": a set of blobs representing a mutable object. you modify an object by adding a new mutation claim blob to the set.
"signed schema blob" or "claim": a schema blob that you JSON-sign. (OpenPGP)
  o aside: bootstrapping tools for this.
"permanode": a blob that's just a signed schema blob of a random number that serves as the anchor and reference point for the object. like a "permalink" on the web.

Permanode
$ camput --permanode
sha1-ea799271abfbf85d8e22e4577f15f704c8349026
$ camget sha1-ea799271abfbf85d8e22e4577f15f704c8349026
{"camliVersion": 1,
  "camliSigner": "sha1-c4da9d771661563a27704b91b67989e7ea1e50b8",
  "camliType": "permanode",
  "random": "oj)r}$Wa/[J|XQThNdhE"
,"camliSig":"iQEcBAABAgAGBQJNRxceAAoJEGjzeDN/6vt8ihIH/Aov7FRIq4dODAP
WGDwqL1X9Ko2ZtSSO1lwHxCQVdCMquDtAdI3387fDlEG/ALoT/LhmtXQgYTt8Qq
DxVduEK1or6/jqo3RMQ8tTgZ+rW2cj9f3Q/dg7el0Ngoq03hyYXdo3whxCH2x0jajSt4
RCcgdXN6XmLlOgD/LVQEJ303Du1OhCvKX1A40BIdwe1zxBc5zkLmoa8rClAlHdq
wogxYFY4cwFm+jJM5YhSPemNrDe8W7KT6r0oA7SVfOan1NbIQUel65xwIZBD0ah
CXBx6WXvfId6AdiahnbZiBup1fWSzxeeW7Y2/RQwv5IZ8UgfBqRHvnxcbNmScrzlp3
V3ZoY==BfKn"}
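
To make the "object = permanode plus a growing set of claims" idea concrete, here is a sketch that folds claims into a current view of an object. The slides do not show what a mutation claim looks like, so every field name used here ("permaNode", "claimType", "attribute", "value", "claimDate") and both claim types are illustrative assumptions, and signature verification is left out entirely.

```python
def resolve_attributes(permanode_ref: str, claims: list) -> dict:
    """Replay the claims that target one permanode, oldest first."""
    attrs = {}
    relevant = [c for c in claims if c["permaNode"] == permanode_ref]
    # Assumption: claimDate strings share one ISO 8601 format, so they sort as text.
    for claim in sorted(relevant, key=lambda c: c["claimDate"]):
        if claim["claimType"] == "set-attribute":
            attrs[claim["attribute"]] = claim["value"]
        elif claim["claimType"] == "del-attribute":
            attrs.pop(claim["attribute"], None)
    return attrs


if __name__ == "__main__":
    node = "sha1-ea799271abfbf85d8e22e4577f15f704c8349026"
    claims = [
        {"permaNode": node, "claimType": "set-attribute",
         "attribute": "title", "value": "draft", "claimDate": "2011-02-01T10:00:00Z"},
        {"permaNode": node, "claimType": "set-attribute",
         "attribute": "title", "value": "final", "claimDate": "2011-02-01T11:00:00Z"},
    ]
    print(resolve_attributes(node, claims))
```

Nothing is overwritten: a change is one more immutable blob in the set, and the permanode's blobref stays the stable name to share.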

Search / Indexer...

Search / Indexer
subscribes to blobs in real-time
  o or enumerates / mapreduces world on init
builds index of:
  o directed blob graph,
  o resolved attributes,
  o set memberships,
  o dates, tags, ...
  o whatever's needed
eventually consistent
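
One piece of such an index, the directed blob graph, can be sketched by scanning schema blobs for strings that look like blobrefs. This is a guess at the idea rather than the real indexer: it recognizes only full-length sha1 blobrefs, treats every such string in a schema blob as an outgoing edge, and the names find_refs and build_graph are hypothetical.

```python
import json
import re
from collections import defaultdict

# Assumption: only sha1 blobrefs with a full 40-hex-digit digest are recognized.
BLOBREF_RE = re.compile(r"^sha1-[0-9a-f]{40}$")


def find_refs(value):
    """Yield every blobref-looking string nested anywhere in a JSON value."""
    if isinstance(value, str):
        if BLOBREF_RE.match(value):
            yield value
    elif isinstance(value, list):
        for item in value:
            yield from find_refs(item)
    elif isinstance(value, dict):
        for item in value.values():
            yield from find_refs(item)


def build_graph(blobs: dict) -> dict:
    """Map each schema blob's blobref to the set of blobrefs it points at."""
    edges = defaultdict(set)
    for ref, data in blobs.items():
        try:
            obj = json.loads(data)
        except ValueError:
            continue  # not JSON, so a plain data blob
        if isinstance(obj, dict) and "camliVersion" in obj:
            edges[ref].update(find_refs(obj))
    return dict(edges)
```

Because blobs never change, an index like this can be thrown away and rebuilt from an enumerate of the blob server at any time.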

Privacy Model & Sharing


all your blobs & searches are private
  o nothing public by default
to share something (a blob, object, or search query e.g. "recent public photos of mine") you create a "share" claim
  o claim = "signed schema blob"

demo: http://camlistore.org/docs/sharing

Questions?

http://camlistore.org/
