Memcached
compiled by Anand Kishore
anand (at) semanticvoid.com
A shot at documenting the super cool cache

I have tried to compile the following from various articles and posts on the mailing list.

Motivation
Brad Fitzpatrick, the guy behind Memcached, built it in response to limitations he faced with mod_perl while building LiveJournal. One of those limitations was that each process (or rather, each request) had its own address space and hence could only cache on its own; the cached data was not available to the other processes. What he wanted instead was a global hash table that all web processes on all machines could access. That's how Memcached came into existence.

How Does It Work
Memcached is similar to a hash table. A hash table is implemented as an array of buckets. Each bucket (array element) contains a list of nodes, with each node containing [key, value]. This list is later searched to find the node containing the right key. Most hashes start small and dynamically resize over time as the buckets' lists get too long. A request to get/set a key with a value requires that the key be run through a hash function. A hash function is a one-way function mapping a key (numeric or string) to a number that becomes the bucket number. Once the bucket number has been calculated, the list of nodes for that bucket is searched, looking for the node with the given key. If it's not found, a new one can be added to the list.
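The lookup described above can be sketched in a few lines of Python. This is a toy illustration of chained-bucket hashing, not memcached's actual implementation (which is in C and resizes dynamically):

```python
# A toy hash table with chained buckets: each bucket holds a list of
# [key, value] nodes, and the hash function picks the bucket.

NUM_BUCKETS = 8

def make_table():
    return [[] for _ in range(NUM_BUCKETS)]

def bucket_for(key):
    # One-way hash function: map any key to a bucket number.
    return hash(key) % NUM_BUCKETS

def set_item(table, key, value):
    bucket = table[bucket_for(key)]
    for node in bucket:
        if node[0] == key:           # key already present: update in place
            node[1] = value
            return
    bucket.append([key, value])      # not found: add a new [key, value] node

def get_item(table, key):
    for k, v in table[bucket_for(key)]:
        if k == key:
            return v
    return None

table = make_table()
set_item(table, "userid", 42)
```

A real implementation would also resize (rehash into more buckets) once the average list length grows past some threshold.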

Memcached presents the user with a dictionary interface, but instead of a single hash function it uses two layers of hashing. The first layer is in the client library, where the client decides which server to send the request to: it hashes the key into a list of virtual buckets, each bucket representing a Memcached server. Each Memcached server then maintains an ordinary hash table of its own. Every Memcached server instance is independent, unaware of the presence or status of the other servers; they are unified only by the client library. If a server fails, the clients can be configured to route around the dead machine or machines and use the remaining active servers.
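A minimal sketch of that first, client-side layer is shown below. The server addresses are made up, and real client libraries use more careful schemes (such as consistent hashing, so that a dead server disturbs as few keys as possible), but the basic idea is just "hash the key into the server list":

```python
# Client-side hashing layer: the key picks a server, and dead servers
# are routed around by hashing over the survivors instead.
import hashlib

servers = ["10.0.0.1:11211", "10.0.0.2:11211", "10.0.0.3:11211"]

def server_for(key, alive=servers):
    # Hash the key into the list of virtual buckets (one per server).
    digest = hashlib.md5(key.encode()).hexdigest()
    return alive[int(digest, 16) % len(alive)]

# If 10.0.0.2 dies, the client simply hashes over the remaining servers:
alive = [s for s in servers if s != "10.0.0.2:11211"]
```

Note the drawback of this naive modulo scheme: when the server list changes, most keys map to a different server, which is exactly what consistent hashing was later designed to avoid.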

Memory Management & LRU
Memcached uses a slab allocator for memory allocation. A slab allocator allocates only large chunks of memory, slicing them up into little chunks for particular classes of items, and maintains a free list for each class to which objects are returned when freed. In Memcached the classes are decided by the size of the object. The size classes go up by powers of 2: 64 bytes, 128 bytes, 256 bytes, 512 bytes, 1k and so on. The size of an item is roughly equivalent to 'key + size of its data + a fixed header'.
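Picking the size class for an item is then just "the smallest class the item fits in". A sketch, using the powers-of-two classes listed above (real memcached also supports a configurable growth factor rather than strict doubling):

```python
# Size classes by powers of two, as described above (in bytes).
CLASSES = [64, 128, 256, 512, 1024, 2048, 4096, 8192, 16384]

def class_for(item_size):
    # An item goes into the smallest class it fits in.
    for c in CLASSES:
        if item_size <= c:
            return c
    raise ValueError("item too large for any slab class")
```

So an item whose key + data + header total 300 bytes lands in the 512-byte class, wasting the difference; that internal fragmentation is the price paid for fast, fixed-size allocation.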

There is no global LRU for all items, but rather a separate LRU for each size class. As the cache fills up initially, memcached allocates a 1MB chunk of memory to a class whenever it needs to store a new item of that class's size, chops that 1MB into slots of the appropriate size for the class, and makes them available. Once all the memory you've given memcached has been handed out this way, it starts removing items from the LRU queues to make room for new ones, but only on an individual class basis. For instance, if you're trying to store an item of total size 10k, memcached will look at the free slots it has for the 16k class; if it has none, it will throw away an existing item from that class (the one that hasn't been requested for the longest time) and let your new item take its slot. However, if there are no existing items in the LRU for that class and no free slots either (because no item of that size was ever stored while the cache was filling, so no 1MB chunk was ever given to the class), memcached gives up and returns an error. This is why you can get an 'out of memory' error even though the cache is "full".

Another feature of Memcached is that it is lockless. All objects are multiversioned internally and reference counted, so no client can block any other client's actions.

Namespaces
Most of the client libraries allow you to specify a namespace attribute, which is used to form keys as 'namespace + key'. This lets you use keys that are specific to a namespace and will not collide with those of another namespace. But this is all the support you get: there is no support for purging or invalidating a whole namespace. We can get around this limitation with a neat hack floating around on the mailing list.

Invalidating Namespaces (by Martin Atkins)
Let's say we have a "namespace" called some.namespace. Now, track "revisions" of this namespace in rev.some.namespace; this is simply an integer which starts its life at zero.

When inserting into or querying memcache, first get the value of this key. Assuming its value is 5, the actual key prefix we use in the request is "some.namespace.5"; e.g. "some.namespace.5.userid".

Now a "delete" on that entire namespace can be done atomically by using memcached's atomic increment command to increment rev.some.namespace in-place. Any future gets/sets will go to some.namespace.6.userid. The leftover stale data under some.namespace.5.* will be ignored and eventually expire out of the cache.

There is a possible race condition in this scenario, of course:

1. Client A fetches rev.some.namespace = 5
2. Client B flushes some.namespace by incrementing the counter to 6
3. Client A performs some operation on the old revision

In the set case this is mostly harmless, since Client A will just set a memcache key that will be ignored until it expires. In the get case Client A receives some stale data. Whether the split-second chance of receiving stale data there matters depends on your application.

Of course, if you delete too frequently you'll end up with lots of dead data hanging around waiting to expire. If you delete really frequently you might even find cases where a stale revision zero is still hanging around when the integer overflows. In practice this shouldn't occur: an application isn't going to flush its cache that often, or there would be little point in having a cache in the first place.

The final caveat is that the increment command will fail if the key does not already exist. This can be mitigated by having your application, on startup, use the "add" command ("set unless there's already a value") to set all of the revision counters to zero. The expiry time for these counters can be set to never expire (though you might still lose a counter in case of machine failure).

I'll be updating this document with more information as I delve deeper and deeper into Memcached.