Should you use memcached? Should you just shard mysql more?
Memcached's popularity is expanding its use into some odd places. It's becoming an authoritative datastore for some large sites, and almost more importantly it's sneaking into the lowly web startup. This is
causing some discussion.
Most of whom seem to be missing the point. In this post I attempt to explain my point of view for how memcached should really influence your bouncing baby startups, and even give some pointers to the big guys who might have trouble seeing the forest through the trees.
Using memcached does not scale your website! Entertain me, I'm playing semantics here: This thing is not for scaleout. Mostly. What memcached really is, is a giant floating magnifying glass. It takes what you have already built and makes stretch ten times further. I insist on not confusing caching with scaleout as when your little stretch-armstrong of a website hits that tenfold limit, you're still screwed. There's no magic switch or configuration option in memcached that will save you from dealing with proper optimization and sharding.
You sure can get away with a hell of a lot though!
Keep it in the front of your mind; no it will not help you batch your writes, or make them smaller, or really help you deal with them in any useful way. If you want to write data you will need back later, you must shard. If it's data you don't care about, maybe write it to memcached and make a note of it in your business plan.
Also strongly keep in mind; memcached won't help your cache misses suck less. If you're writing awful data warehouse quality queries which you expect to run live on the site, go bust out the failboat and get-a-rowin'. You're screwed. As your dataset grows you will find new slices of hell in which your queries behave in all new ways. What once scanned "a few extra rows" now might hit tens of thousands. Cache misses will suck. You will have to deal with this. That's not something this solves.
Sometimes memcached does let you achieve the impossible, or scale the unlikely. Take slightly complex queries, or even template operations, which under the best of conditions might take 15-20 milliseconds each. An obnoxious join, a weird subquery, a tree walk, or fancy HTML templating. Being able to do this live could mean the difference betwen your website standing apart or having to settle with an awful workaround. In these cases, with a high enough hit rate, you can soak those cache misses and make the feature work.
My example isn't translating a 5 second query into 0.5ms with memcached, it's a 15-20ms query. If you had a dozen of these in a page load, a bad load might take an extra quarter second to render, but it wouldn't ruin the user experience. The issue memcached solves here is subtle. Tacking on 0.25 seconds per page render might not make the site completely unusable, but realize these queries are using solid resources on your expensive hardware for that extra quarter second. With a quadcore database, it's possible under the best conditions you would only be able to render 14-16 pages per second off of that machine. Throw in all the other things you have to do on a page load, writes, internal database whoosits and uneven CPU usage and you'd be lucky to get 5 pages per second.
In this case, it's still walking the line of scalability, but it turns something mildly impossible into something highly probable. On the cheap.
The cost equation
Now the most important factor here has reared its ugly head: Cost.
Cost. Ugly for startups. Ugly for established companies. Nightmares for venture capital. What is your cost? Why am I talking cash about companies who have millions of dollars in VC or sales? Just buy more servers! Whatever, right?
Well no. The largest cost is time. All others pale in comparison. The best physical goods investments your company can make are more related to your people than your hardware. Hardware has horrific depreciation. Most of the value is lost immediately, the rest over the first year of operation.
In comparison, buying your employees really fucking nice chairs, desks, and monitors in a swanky comfortable office are much more solid investments for your company. Aeron chairs have great resale value for that inevitable going-bumpkus dot bomb sale. Also anything you do to make your workers happier and more productive will pay out more than any hardware investment. Your product ships on time, you react to the market faster.
To sidestep into hardware a little... Always max out the RAM in your databases. Everyone should. I didn't realize people don't actually do this until I read some of these arguments against memcached. Whenever I add memcached to a website, RAM memcached gets is RAM that didn't fit into the databases, but easily fits into empty memory slots in webservers or cheaper hardware. A good solid database might cost $5,000, but a beefy memcached box will cost less than half that. Way less than that if you just add memory to existing hardware. So "adding that extra RAM to your databases" isn't a very fair apples-to-apples comparison unless you're already doing something wrong.
So it should be obvious just what the hell I'm getting at now, and what seems to be bothering everyone else about this whole stupid memcached fad.
You're all wasting your goddamn time! Yeesh!
How can a small site or startup benefit from memcached?
Simple: The idea.
Caching really wedges your whole RDBMS worldview. You don't just CRUD anymore. Your data is a process. A flow between points instead of just the store and display. At any time in this flow an idea may be injected. Maybe it's serializing a generated object and caching it, maybe it's utilizing
gearman to shift off some asyncronous work. There is just more to it now.
But that's all messy complicated. What can you do? What should you do?
Design for having cache, design for change.
... but don't write all the code yet.
... but certainly design for change.
Think good object design. A "user" is a class. That user has base properties which you might find in the `user` table. A "user" object might have a profile, which is really another object with another class representing a `profile` table.
my $user; is an invaluable abstraction.
That user object must load and store data. When you build this at first it's all standard CRUD. Straight to a database.
Where would you think to add caching to this system? I hope I've made it too obvious.
At the query layer! Use a database abstraction class and have it memcache resultset objects and... No no no, that's a lie. I'm lying. Don't do that.
Do it inside that $user object. At the highest level possible. Take the whole object state and shovel it somewhere. That object is its own biggest authority. It knows when it's been updated, when it needs to load data, and when to write to the database. It might've had to read from several tables or load dependent objects based on what you ask it to do.
Instead of wrangling your best and brightest into figuring out a cache invalidation algorithm which might work "okay" against your schemas, do what's simple for the object. If adding caching to the $user object means the load() function tries memcached first, and all write operations hit memcached with a delete operation, so be it. You just added basic caching to one of the hottest objects in your website in, oh, half an hour. Maybe a few days if you're really scraping the bottom of the talent barrel.
Now we're back where we started. Reap the time benefits! Abstract your data access methods properly, plan for caching. Actually go write caching into a few objects. Maybe turn it off when you're done. You don't need it yet. Write your objects to talk directly to your database and save time.
Same idea for sharding. Either focus on that now, or realize you can take a $user object and extend its load() magic to find and write to users based on a sharding scheme. You probably don't have to rewrite all of the code to make this happen. Refactor to win.
So now you're ready. You're building your site fast and abstracting where you can. Brace for change. Be ready to shard, be ready to cache. React and change to what you push out which is actually popular, vs overplanning and wasting valuable time. Keeping it simple is gold here.
You're building something new and you're going to fail at it. Your design will be wrong, you will anticipate the wrong feature to be popular. Dealing with this quickly can set you apart. Being able to slap memcached into a bunch of objects in a few days (or even hours) can mean the difference between riding a load spike or riding the walrus.
Bullet points for fun! How can your small site benefit from memcached:
- Design for change! Holy crap I can't say this enough.
-
Don't cache in ways that piss off your users.- Not keeping it simple is fail.
- Cache and shard at the highest level possible relative to your data.
- Read
High Performance MySQL 2nd ed. Memcached won't fix your lack of database knowledge.
- The same ideas which help you prepare for cache, helps you prepare for sharding.
- Don't waste all your time getting it right now. Get it close, get an idea, try it out, and prepare to be wrong.
Finally:
- Keep an open mind. Sites like grazr and fotolog do things differently. Doesn't mean they're right, doesn't mean they're wrong. Be inventive where it makes sense for your business.
There. Sorry this came out so long :)