| Memcached |
Back to my blog
compiled by Anand
Kishore
anand (at) semanticvoid.com |
A shot at documenting the
super cool cache
I have tried to compile the following from various articles and the posts in the mailing list.
Motivation
Brad Fitzpatrick, the
guy behind Memcached, built Memcached in response to the limitations in
mod_perl he faced while building the LiveJournal system. One of the
limitations he faced was that each process or rather request had its
own address space and hence could cache on its own. The cached data was
not available to the other processes. What he rather wanted was a
global hash table that all web processes on all machines could access.
Thats how Memcached came into existence.
How
Does It Work
Memcached is similar to a hash table. A
hash
table is implemented as an array of buckets. Each bucket (array
element) contains a list of nodes, with each
node containing [key, value]. This list later is searched to find the
node containing the right key. Most
hashes start small and dynamically resize over time as the lists of the
buckets get too long. A request to get/set a key with a value requires
that the key be run through a hash function. A hash
function is a one-way function mapping a key (be it numeric or string)
to some number that is going to be the
bucket number. Once the bucket number has been calculated, the list of
nodes for that bucket is searched,
looking for the node with the given key. If it's not found, a new one
can be added to the list.
Memcached represents the user with a dictionary interface but instead
of a one way hash function it represents a two way hash. The first kind
of hashing is done in the client library where the client decides to
which server to send the request to. It does so by hashing the key into
a list of virtual buckets, with each bucket representing a Memcached
server. Each Memcached server represents a typical hash table.
Each Memcached server instance is independent, unaware of the
presence or status of other servers. All these servers are unified
together by the client library. If a server fails, the clients can
be configured to route around the dead machine or machines and use the
remaining active servers.
Memory
Management & LRU
Memcached
uses a slab allocator for memory allocation. A slab allocator
allocates only large chunks of memory, slicing them up into
little chunks for particular classes of items, then maintaining free
lists for each class whenever an object
is freed. In Memcached the classes are decided on the size of the
object. That size classes go by the powers of 2: 64 bytes, 128 bytes,
256 bytes, 512bytes, 1k and so on. The size of an item is roughly the
equivalent to 'key +
size of its data + a fixed header'.
There is no global LRU for for all items, but rather separate LRU's for
different size classes. As the cache is filled in initially, memcached
allocates chunks of 1Mb of memory to a class whenever it needs to store
a new item. It chops up that 1Mb into slots of the appropriate size for
that class and makes them available. When all the memory you've given
to memcached has been exhausted by this, it starts to remove items from
LRU queues in order to put in new ones. But it only does so on an
individual class basis. So that, for instance, if you're trying to
store an item of total size 10k, memcached will look at the free slots
it has for the 16k class, and if it has none, it'll throw away an
existing item from that class (the one that hasn't been requested for
the longest time) and let your new item take its slot. However, if
there are no existing items in the LRU for that
class, and no free slots either, because when the cache had been
filling
an item of that size had never been stored and a chunk of 1Mb had never
been given to this class, memcached gives up and returns the error (the
reason you get 'out of
memory' error).
Another feature of Memcached is that it is lockless. All objects are multiversioned internally
and reference counted, so no client can block any other client's actions.
Namespaces
Most
of the client libraries allow you to
specify a namespace attrubute. This namespace is used as 'namespace + key'.
This allows us to specify keys which are specific to the namespace and
will not collide with that of another namespace. But this is all the
support you get. There is no support for purging or invalidating of
namespaces. We can get over this limitation by using the cool hack
floating around in the mailing list.
Invalidating Namespaces (by
Martin Atkins)
Let's say we have a "namespace" called some.namespace. Now, track
"revisions" of this namespace in rev.some.namespace; this is
simply an integer which starts its life at zero.
When inserting into or querying memcache, first get the value of this
key. Assuming its value is 5, the actual
key prefix we use in the
request is "some.namespace.5"; eg. "some.namespace.5.userid".
Now a "delete" on that entire namespace can be done atomically
by using
memcached's atomic increment command to increment rev.some.namespace
in-place. Any future gets/sets will go to some.namespace.6.userid. The
leftover stale data under some.namespace.5.* will be ignored and
eventually expire out of the cache.
There is a possible race condition in this scenario, of course:
Client A fetches rev.some.namespace = 5
Client B flushes some.namespace by incrementing the counter to 6
Client A performs some operation on the old revision
In the set case this is mostly harmless since Client A will
just set a
memcache key which will be ignored until it expires. In the get case
client A gets some stale data. Whether the split-second chance of
recieving stale data there matters depends on your application.
Of course, if you delete too frequently you'll end up with lots of dead
data hanging around waiting to expire. If you delete really
frequently
you might find cases where a stale revision zero is still hanging about
when the integer overflows. This scenario should
not occur as an application isn't going to be flushing its cache too
often, else there is little point in having a cache in the first place.
The final caveat is that the increment command
will fail if the key does not already exist. This can be mitigated by
having your application, on startup, use the "set unless there's
already
a value" command to set all of the revision counters to zero. Also the
expiry time for this counter could be set to never expire (but you
might yet end up losing the counter incase of machine failure).
I'll be updating this document with more information as I delve deeper and deeper into Memcached.