Sunday, July 27, 2008

Playing with Memcached Part - 1




One technology I am truly excited about is memcached. The one word that comes to mind when I think of it is "mesmerized". I am so excited about it that all I can think of is modifying every well-known open source project to work with memcached out of the box.

This three-part series on memcached is a brain dump of whatever I know about it. Here is how I plan to divide the information:
  • Part one: Answers to what, where, how, who. Advantages. Components. Overall idea.
  • Part two: Component details. Client-Server division. Detailed discussion of BeITMemcached
  • Part three: A simple utility using memcached
Instead of answering what, let me start with where and who.

Where?
Memcached is everywhere :). Every architecture that matters.

Who?
Every site that matters. Facebook, Digg, Twitter, Slashdot.

Now, I come back to what.

So, what is Memcached?
Memory + Caching + Daemon. That's all it is. It is an in-memory process that lets software developers store data in a giant hash table, which can be distributed among several machines. How memcached distributes data is commonly misunderstood, so I explain it in more detail later in this post.

Hmm, what is so great about it?
It is great because it was the first to solve a great problem. Its greatness is amplified by its simplicity.

What is the problem?
The reason I answered where and who before writing anything about memcached itself was to show that it is used by some of the highest-traffic sites on the Internet. For these websites, scalability is a major concern. They have to serve an ever-growing, ever more demanding user base without compromising on quality of service (QoS).

Disk is the slowest component in the physical architecture of an application, and writes are especially costly. Look at the following diagram.
[Diagram: basic application architecture]
It is the most common client-server architecture. Requests for content come in from the Internet. The application is replicated for load distribution and the database is clustered. The database is essentially the disk, and hence the biggest bottleneck when it comes to scaling.

One may argue for using in-memory tables instead. Here is why memcached has the advantage over them:

1) In-memory tables essentially require the complete table structure to be in memory. Some tables are so large that they may not fit in the available memory.
2) No distribution. This leaves a lot of space unutilized.
3) Scaling them requires complex schemes, master-slave variants for example, which become tedious beyond a certain size.

Coming back to distribution:
Memcached is so powerful because it is distributed. But unlike databases or common distribution schemes, there is no central controller that distributes the data. There is also no automatic distribution among the nodes running memcached. Look at the following diagram, which shows what memcached does not do.
[Diagram: memcached, the wrong picture]
As I mentioned earlier, memcached's biggest feature is its simplicity. Instead of being an entirely server-side application, it is divided into two components. This is the most important part to understand from a developer's perspective.

1) Client: Runs as part of the application code. It gets the list of servers running the memcached server and uses an internal hashing scheme to distribute data evenly across them.
2) Server: A very simple and concise process supporting commands to store, delete and retrieve data.
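
To give a feel for how small the server's command set is, here is a minimal sketch in Python that formats the store, retrieve and delete commands of memcached's text protocol as raw bytes. The helper names (build_set, build_get, build_delete) and the example key are my own, purely for illustration; a real client library does this for you and also handles the network connection.

```python
def build_set(key: str, value: bytes, flags: int = 0, exptime: int = 0) -> bytes:
    """Format a 'set' command: a header line, then the data block."""
    # Header: set <key> <flags> <exptime> <bytes>\r\n
    header = f"set {key} {flags} {exptime} {len(value)}\r\n".encode("ascii")
    # The data block is followed by its own \r\n terminator.
    return header + value + b"\r\n"


def build_get(key: str) -> bytes:
    """Format a 'get' command for a single key."""
    return f"get {key}\r\n".encode("ascii")


def build_delete(key: str) -> bytes:
    """Format a 'delete' command for a single key."""
    return f"delete {key}\r\n".encode("ascii")


if __name__ == "__main__":
    # "user:42" is a hypothetical key used only for this example.
    print(build_set("user:42", b"hello"))
    print(build_get("user:42"))
    print(build_delete("user:42"))
```

These byte strings are what a client would write to the server's socket. Nothing here opens a connection, so the sketch runs without a memcached server.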

Look at the following diagram. This is how distribution works in memcached.
[Diagram: memcached, the right picture]
This is an important point to understand. The application code talks to the memcached client library, which generates a hash from the key passed to it. Based on this hash, it selects the server to communicate with. The hashing scheme should be consistent across all clients, so that every application server maps a given key to the same memcached server.
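
The client-side selection can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the server addresses are made up, and plain CRC32-modulo hashing stands in for whatever scheme a particular client library actually uses.

```python
import zlib
from typing import Sequence

# Hypothetical server list; in a real setup it comes from configuration.
SERVERS = ["10.0.0.1:11211", "10.0.0.2:11211", "10.0.0.3:11211"]


def pick_server(key: str, servers: Sequence[str] = SERVERS) -> str:
    """Choose a server for a key by hashing the key on the client side."""
    # CRC32 is deterministic, so every client computes the same hash
    # for the same key, with no coordination between clients or servers.
    key_hash = zlib.crc32(key.encode("utf-8"))
    # Modulo maps the hash onto an index into the server list.
    return servers[key_hash % len(servers)]


if __name__ == "__main__":
    for key in ("user:1", "user:2", "session:abc"):
        print(key, "->", pick_server(key))
```

Every client that uses the same server list, in the same order, with the same hash function will pick the same server for a given key. The servers themselves never talk to each other. One known weakness of plain modulo hashing is that adding or removing a server remaps most keys.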

Typical application architecture with memcached:
[Diagram: typical architecture with memcached]
The most important addition here is a separate process, which can be independent of the application server. This server can have schemes to persist data to disk.

Some issues:
  • No authentication support, so the servers have to be firewalled.
  • By default, a value can hold a maximum of 1 MB of data.
  • Maximum key length is 250 bytes.
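
Since the server rejects items that break these limits, it can help to check them in application code before sending anything. Below is a minimal sketch in Python. The function name check_item is my own, and the limits are parameters: the defaults assume the 250-byte key limit given in memcached's protocol documentation and the default value limit of roughly 1 MB. The protocol also does not allow whitespace or control characters in keys.

```python
from typing import List

MAX_KEY_BYTES = 250            # key limit per memcached's protocol documentation
MAX_VALUE_BYTES = 1024 * 1024  # default value limit, roughly 1 MB


def check_item(key: str, value: bytes,
               max_key_bytes: int = MAX_KEY_BYTES,
               max_value_bytes: int = MAX_VALUE_BYTES) -> List[str]:
    """Return a list of problems; an empty list means the item looks fine."""
    problems = []
    raw_key = key.encode("utf-8")
    if not raw_key:
        problems.append("key is empty")
    if len(raw_key) > max_key_bytes:
        problems.append("key is longer than %d bytes" % max_key_bytes)
    # Bytes 0..32 are control characters and the space; 127 is DEL.
    if any(b <= 32 or b == 127 for b in raw_key):
        problems.append("key contains whitespace or control characters")
    if len(value) > max_value_bytes:
        problems.append("value is larger than %d bytes" % max_value_bytes)
    return problems


if __name__ == "__main__":
    print(check_item("user:42", b"hello"))
    print(check_item("bad key", b"x" * (2 * 1024 * 1024)))
```

A long key can be brought under the limit by hashing it first, at the cost of no longer being human-readable.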
Some links:
http://www.danga.com/memcached (origins, FAQs, more links)
http://www.splinedancer.com/memcached-win32/ (latest Windows port)
