
Sunday, July 10, 2011

Memcached and MySQL (Part II – Memcached + MySQL)

In part one of this blog (you can see it here) I gave a detailed overview of what Memcached really is. In this part I will address the usage of Memcached to alleviate the load on MySQL. In a scalable, high-traffic application, the database will prove to be a bottleneck due to the limitations of the disk read/write rate, which is slow compared to reading from memory. As Memcached is a memory-based distributed object cache, we can utilize its power here. Is avoiding the database a good optimization? Yes it is, and this is where Memcached comes into play, stopping requests from ever hitting MySQL. Well, not in all cases, but to a considerable extent. When I say Memcache I am referring to only one instance, not the distributed version (in the latter case it's Memcached). Having a Memcache installation on the same server as MySQL will still help the data source perform much better, but it means that memory will be divided between Memcache and MySQL, which is not what I prefer. I would prefer a separate server dedicated to Memcache, so MySQL can keep its own machine's memory and definitely get more of it. Below is the architecture that I am considering:


As you can see, we have a MySQL server and a Memcache server. Our queries either fetch data from the database or add/delete/update data in it. For each type we can follow a particular sequence:


Read Queries:
  1. Check Memcached to see if the data set is available

  2. If it's available, then congratulations, you just avoided a hit to the database

  3. If it's not, too bad. Get the result set from the database

  4. You want to avoid hitting the database again, so set the data set in Memcache

Write Queries:

  1. Write data to the database

  2. If the write was a success, then either drop the data set from Memcache or update Memcache with the current information

Of course this is a very naive sketch compared to a practical implementation. In practice things are quite different and a bit more complicated when it comes to the usage of Memcached. Additionally, there can be different levels of caching in an application, which are built up over time in the life cycle of the application based on its needs. We can also add Memcache servers to grow the Memcached cluster and cater to the needs of ever-increasing data. Now we can clearly see that the performance of the application will increase greatly by reducing the load on the database using an in-memory key/value store.


Sunday, July 3, 2011

Memcached and MySQL (Part I - Memcached)

We all know that disk I/O is more expensive than memory access, and we also know that data in memory is volatile while data on disk is non-volatile. In relational database storage, data lives on disk, which means that it is non-volatile but slower to retrieve and store than data in memory. On the contrary, if we store data in memory, retrieval and storage are super fast but the data is volatile and thus prone to loss. The question is whether we can get the best of both worlds; it will be answered ahead. Let's first see what Memcached is. Below is a point-wise explanation of Memcached (I will try to cover as much as I can):

Memcached in a nutshell

  1. Memcached is a distributed in-memory object caching system. It's distributed, which means that Memcached does not represent a single server but can span hundreds of machines. It's in-memory, which means that all the objects are stored in memory (RAM). Memcached is the distributed version of Memcache.

  2. It's an in-memory key/value store, where a data item or object is stored using a key as its identifier.

  3. Data is in-memory and is therefore volatile. That makes it good as a cache, but not good for data that needs to be persisted and whose loss might hurt the application or its users.

  4. All operations in Memcached take constant time, and hence their complexity is O(1). The basic operations are add, set, get, multi-get, delete, and replace (note: a set after a set on the same key is treated as an update).
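As a toy model (not the real wire protocol), the semantics of these commands can be mimicked with a dict; the `mc_` function names here are made up for illustration:

```python
# Toy model of Memcached's basic command semantics using a dict.
# add: store only if the key is absent; set: store unconditionally
# (a second set acts as an update); replace: store only if present.

store = {}

def mc_add(key, value):
    if key in store:
        return False        # add fails on an existing key
    store[key] = value
    return True

def mc_set(key, value):
    store[key] = value      # set always succeeds; repeat = update
    return True

def mc_replace(key, value):
    if key not in store:
        return False        # replace fails on a missing key
    store[key] = value
    return True

def mc_get(key):
    return store.get(key)

def mc_multi_get(keys):
    # multi-get returns only the keys that are present
    return {k: store[k] for k in keys if k in store}

def mc_delete(key):
    return store.pop(key, None) is not None

print(mc_add("a", 1))            # True: key was absent
print(mc_add("a", 2))            # False: add never overwrites
print(mc_set("a", 2))            # True: set updates the existing key
print(mc_replace("b", 3))        # False: nothing to replace yet
print(mc_multi_get(["a", "b"]))  # {'a': 2}
```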

  5. All items in Memcached have an expiration time. An expiration time of zero ('0') means that the item will never expire (though it may still be evicted when memory runs low). If the expiry time is greater than 30 days (2,592,000 seconds), it is treated as an absolute UNIX timestamp rather than a relative offset.
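A small sketch of how that exptime rule can be interpreted (the function name and the fixed `now` value are illustrative; a real server uses its current clock):

```python
import time

THIRTY_DAYS = 30 * 24 * 60 * 60  # 2,592,000 seconds

def absolute_expiry(exptime, now=None):
    """How Memcached interprets an item's exptime field (a sketch).

    0            -> never expires (though LRU may still evict it)
    <= 30 days   -> relative: seconds from now
    >  30 days   -> an absolute UNIX timestamp
    """
    now = time.time() if now is None else now
    if exptime == 0:
        return None                  # no expiry
    if exptime <= THIRTY_DAYS:
        return now + exptime         # relative offset
    return exptime                   # already an absolute timestamp

now = 1_700_000_000
print(absolute_expiry(0, now))              # None
print(absolute_expiry(60, now))             # 1700000060
print(absolute_expiry(1_800_000_000, now))  # 1800000000
```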

  6. Memcached does not have a garbage collection mechanism. An item's memory is reclaimed only when you explicitly delete it, when you request an item that has already expired, or when Memcached runs out of its allotted memory. In short, Memcached's memory reclamation is lazy, which is logical keeping in mind the complexity and processing involved in garbage collection.

  7. Memcached reclaims the memory using the following mechanism:

    1. If an item is requested, Memcached checks its expiry time. If the item has expired, it returns a negative response and frees the item's memory.

    2. If Memcached is unable to accommodate any new items, it starts to free the memory of LRU (least recently used) items in order to accommodate new items.
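Both reclamation paths, lazy expiry on access and LRU eviction when full, can be sketched together. This `TinyCache` class is a made-up illustration in which a fixed item count stands in for the allotted memory:

```python
import time
from collections import OrderedDict

class TinyCache:
    """Sketch of Memcached's lazy reclamation: expired items are freed
    only when touched, and the least recently used item is evicted when
    the cache is full (item capacity stands in for allotted memory)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()   # key -> (value, expires_at or None)

    def set(self, key, value, ttl=None, now=None):
        now = time.time() if now is None else now
        if key in self.items:
            self.items.pop(key)
        elif len(self.items) >= self.capacity:
            self.items.popitem(last=False)   # evict the LRU item
        expires = None if ttl is None else now + ttl
        self.items[key] = (value, expires)

    def get(self, key, now=None):
        now = time.time() if now is None else now
        entry = self.items.get(key)
        if entry is None:
            return None
        value, expires = entry
        if expires is not None and now >= expires:
            del self.items[key]              # lazy reclaim on access
            return None
        self.items.move_to_end(key)          # mark as recently used
        return value

cache = TinyCache(capacity=2)
cache.set("a", 1, ttl=10, now=100)
cache.set("b", 2, now=100)
print(cache.get("a", now=105))   # 1: still fresh
print(cache.get("a", now=111))   # None: expired, reclaimed lazily
cache.set("c", 3, now=112)       # fits: "a" was already reclaimed
cache.set("d", 4, now=113)       # full: evicts the LRU item ("b")
print(cache.get("b", now=114))   # None: evicted
print(cache.get("d", now=114))   # 4
```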

  8. Memcached servers are isolated, which means that one server is unaware of the presence of another. Routing a request to the right server is the responsibility of the Memcached client library.

  9. Generally you do not need an authentication mechanism for Memcached, and previously it was not even supported. Now, if the client supports it, SASL authentication can be used. A Memcached infrastructure generally sits in a closed internal network, so authentication and other security measures may complicate, and add unwanted latency to, an otherwise simple system.

  10. Memcached has a client part and a server part. The client part is responsible for routing each request to the appropriate server in the Memcached cluster, managing connections, and handling failures. The server part is responsible for processing requests and reclaiming memory.

  11. You can cache objects, queries, data sets, and anything sensible in Memcached. Just remember it's a cache and not persistent storage.

  12. There is no replication or a fail-over mechanism in Memcached.

  13. Compression and serialization of cached objects should be investigated when selecting a Memcached client. The client's connection handling should also be read carefully in order to avoid connection leaks, which will render the Memcached server useless.

  14. The hashing algorithm depends on the client. Generally clients implement the 'consistent hashing' algorithm. It distributes keys evenly across several Memcached servers, but its biggest advantage shows when new servers are added to the Memcached cluster: consistent hashing minimizes the number of keys that get re-hashed when a server is added, whereas with a normal hashing algorithm (e.g. hash modulo server count) the amount of re-hashing is considerable.
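A minimal sketch of such a hash ring, assuming MD5 positions and 100 virtual nodes per server (both are illustrative choices; real clients, such as ketama-based ones, differ in the details):

```python
import bisect
import hashlib

def _hash(value):
    # Stable hash -> integer position on the ring
    return int(hashlib.md5(value.encode()).hexdigest(), 16)

class HashRing:
    """Sketch of client-side consistent hashing: servers are placed on a
    hash ring (with virtual nodes for evenness) and each key is routed
    to the first server clockwise from the key's hash position."""

    def __init__(self, servers, replicas=100):
        self.replicas = replicas
        self.ring = []       # sorted list of (position, server)
        for server in servers:
            self.add(server)

    def add(self, server):
        for i in range(self.replicas):
            pos = _hash(f"{server}#{i}")
            bisect.insort(self.ring, (pos, server))

    def route(self, key):
        pos = _hash(key)
        idx = bisect.bisect(self.ring, (pos, ""))
        return self.ring[idx % len(self.ring)][1]  # wrap around the ring

keys = [f"user:{i}" for i in range(1000)]
ring = HashRing(["cache1:11211", "cache2:11211"])
before = {k: ring.route(k) for k in keys}

ring.add("cache3:11211")     # grow the cluster by one server
after = {k: ring.route(k) for k in keys}

moved = sum(1 for k in keys if before[k] != after[k])
print(f"{moved} of {len(keys)} keys moved")  # roughly a third, not all
```

With a modulo-based scheme, adding a third server would re-route about two thirds of the keys; here only the keys claimed by the new server's ring positions move.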

Below is a diagram that shows the client part and server part of Memcached (in a Memcached cluster of two Memcached servers):



The next part will discuss how we can use Memcached to alleviate the load on the database server, which was the actual motive of this post.