Sunday, July 3, 2011

Memcached and MySQL (Part I - Memcached)

We all know that disk I/O is more expensive than memory access, and we also know that data in memory is volatile while data on disk is non-volatile. A relational database stores its data on disk, which makes the data non-volatile but makes retrieval and storage slower than in memory. Conversely, if we store data in memory, retrieval and storage are very fast, but the data is volatile and thus prone to loss. The question is whether we can get the best of both worlds. That question will be answered later. Let's first see what Memcached is. Below is a point-wise explanation of Memcached (I will try to cover as much as I can).

Memcached in a nutshell

  1. Memcached is a distributed in-memory object caching system. It's distributed, which means that Memcached does not represent a single server but can span hundreds of computers. It's in-memory, which means that all the objects are stored in memory (RAM). The trailing "d" in the name stands for daemon, the server process; "Memcache" without the "d" is a name used by some client libraries, not a separate non-distributed product.

  2. It's an in-memory key/value store, where each piece of data (object) is stored under a key that identifies it.

  3. Data is in-memory and is therefore volatile. It's good as a cache but not for data that needs to be persisted, where a loss would hurt the application or its users.

  4. Operations on a single key take constant time, O(1), no matter how many items are stored. The basic operations of Memcached are add, set, get, multi-get (one constant-time lookup per requested key), delete, and replace (Note: a set after a set on the same key is considered to be an update).

  5. All items in Memcached have an expiration time. An expiration time of zero ('0') means that the item never expires on its own, although it can still be evicted when memory runs out. A value of up to 30 days (2,592,000 seconds) is interpreted as a number of seconds from now. If the value is greater than 30 days, it is treated as an absolute UNIX timestamp.

  6. Memcached does not have a garbage collection mechanism. An item's memory is reclaimed only when you explicitly delete the item, when you get an item that has already expired, or when Memcached runs out of allotted memory. In short, Memcached memory reclaiming is lazy, which is logical given the complexity and processing involved in garbage collection.

  7. Memcached reclaims the memory using the following mechanism:

    1. If an item is requested, Memcached checks its expiry time. If the item has expired, Memcached returns a negative response (a miss) and frees the item's memory.

    2. If Memcached is unable to accommodate any new items, it starts to free the memory of LRU (least recently used) items in order to accommodate new items.

  8. Memcached servers are isolated, which means that one server is unaware of the presence of another server. Where to route the request is the responsibility of the Memcached client library.

  9. Generally you do not need an authentication mechanism for Memcached, and earlier versions did not support one at all. SASL authentication can now be used if the client supports it. Memcached infrastructure usually sits in a closed internal network, so authentication and other security measures may add complexity and unwanted latency to an otherwise simple concept.

  10. Memcached has a client part and a server part. The client part is responsible for routing each request to the appropriate Memcached server in the cluster, managing connections, and handling failures. The server part is responsible for processing requests and reclaiming memory.

  11. You can cache objects, query results, data sets, and anything else sensible in Memcached. Just remember that it's a cache and not persistent storage.

  12. Memcached has no built-in replication or fail-over mechanism.

  13. Compression and serialization of cached objects should be investigated when selecting a client for Memcached. The client's connection handling should also be studied carefully in order to avoid connection leaks, which can exhaust the server's connections and render the Memcached server useless.

  14. The hashing algorithm depends on the client. Clients generally implement a 'Consistent Hashing' algorithm. It distributes the keys evenly across several Memcached servers, but its biggest advantage shows when new servers are added to the Memcached cluster: only a small share of keys is re-mapped to a different server, whereas ordinary modulo-based hashing re-maps most of the keys.
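
To make the expiry and reclaiming rules from points 4 to 7 concrete, here is a small sketch in Python. It is not how the real server is implemented and it does not talk to Memcached at all; `ToyCache`, `max_items`, and the injectable `clock` are names made up for this illustration, and a simple item count stands in for the memory limit.

```python
import time
from collections import OrderedDict

THIRTY_DAYS = 60 * 60 * 24 * 30  # seconds; cut-off between relative and absolute expiry


class ToyCache:
    """Hypothetical single-process model of Memcached's item lifecycle."""

    def __init__(self, max_items, clock=time.time):
        self.max_items = max_items   # assumption: item count stands in for memory
        self.clock = clock           # injectable so the behaviour can be tested
        self.items = OrderedDict()   # key -> (value, deadline); order tracks recency

    def _deadline(self, exptime):
        if exptime == 0:
            return None                  # never expires, but may still be evicted
        if exptime > THIRTY_DAYS:
            return exptime               # absolute UNIX timestamp
        return self.clock() + exptime    # seconds from now

    def _live(self, key):
        """True if key is stored and unexpired; an expired item is freed here."""
        if key not in self.items:
            return False
        deadline = self.items[key][1]
        if deadline is not None and deadline <= self.clock():
            del self.items[key]          # lazy reclaim, only when the key is touched
            return False
        return True

    def set(self, key, value, exptime=0):
        if not self._live(key) and len(self.items) >= self.max_items:
            self.items.popitem(last=False)   # evict the least recently used item
        self.items[key] = (value, self._deadline(exptime))
        self.items.move_to_end(key)          # mark as most recently used
        return True

    def add(self, key, value, exptime=0):
        # add stores only if the key is absent
        return False if self._live(key) else self.set(key, value, exptime)

    def replace(self, key, value, exptime=0):
        # replace stores only if the key is present
        return self.set(key, value, exptime) if self._live(key) else False

    def get(self, key):
        if not self._live(key):
            return None                      # miss
        self.items.move_to_end(key)
        return self.items[key][0]

    def get_multi(self, keys):
        found = {}
        for key in keys:                     # one constant-time lookup per key
            value = self.get(key)
            if value is not None:
                found[key] = value
        return found

    def delete(self, key):
        if not self._live(key):
            return False
        del self.items[key]
        return True
```

Note that nothing in this sketch runs in the background: an expired item keeps occupying its slot until somebody asks for it or until room is needed for a new item, which is the lazy behaviour described above.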

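The client-side routing from points 8, 10 and 14 can be sketched in a few lines of Python. This is only an illustration of the consistent hashing idea, not the code of any particular client library: `HashRing`, `server_for`, `modulo_server`, the `replicas` count, and the server names in the usage note are all hypothetical, and MD5 is used purely because it spreads keys evenly.

```python
import bisect
import hashlib


def _hash(value):
    """Map a string to a point on the ring (first 8 bytes of its MD5 digest)."""
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Hypothetical client-side router based on consistent hashing."""

    def __init__(self, servers=(), replicas=100):
        self.replicas = replicas   # virtual points per server, smooths the spread
        self._points = []          # sorted hash points on the ring
        self._owner = {}           # hash point -> server name
        for server in servers:
            self.add_server(server)

    def add_server(self, server):
        # Each server is placed on the ring at several pseudo-random points.
        for i in range(self.replicas):
            point = _hash(f"{server}#{i}")
            self._owner[point] = server
            bisect.insort(self._points, point)

    def server_for(self, key):
        if not self._points:
            raise LookupError("no servers configured")
        # Walk clockwise to the first point after the key's hash, wrapping around.
        idx = bisect.bisect(self._points, _hash(key)) % len(self._points)
        return self._owner[self._points[idx]]


def modulo_server(key, servers):
    """The 'normal' hashing scheme, for comparison: hash modulo server count."""
    return servers[_hash(key) % len(servers)]
```

A quick way to see the difference is to map a few thousand keys with `HashRing(["cache1:11211", "cache2:11211", "cache3:11211"])`, call `add_server("cache4:11211")`, and count how many keys changed servers. With the ring, the only keys that move are the ones taken over by the new server, in expectation about a quarter of them; with `modulo_server`, most keys end up on a different server, which means most of the cache is effectively lost at once.
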
Below is a diagram that shows the client part and server part of Memcached (in a Memcached cluster of two Memcached servers):



The next part will discuss how we can use Memcached to alleviate the load on the database server, which was the actual motive of this post.
