Drop-dead simple Django caching

Caching is easy to screw up. Usually it's a manual process, which is error-prone and tedious. It's actually quite easy to cache, but knowing when to invalidate which caches is a lot harder. There is a subset of the caching problem that, with Django, can be handled quite easily. The underlying idea is that every Django model has a primary key, which makes for an excellent cache key. Using this basic idea, we can cover a fairly large use case for caching automatically and in a much more deterministic way. Let's begin.

First, we need to decide upon a setting for how long each individual item should be saved in the cache. I'm going to call that SIMPLE_CACHE_SECONDS, default it to 2592000 seconds (30 days), and grab it like so:

from django.conf import settings

SIMPLE_CACHE_SECONDS = getattr(settings, 'SIMPLE_CACHE_SECONDS', 2592000)

The next thing we need to do is be able to generate a cache key from an instance of a model. Thanks to Django's _meta information, we can get the app label and model name, plus the primary key, and we're all set.

def key_from_instance(instance):
    opts = instance._meta
    return '%s.%s:%s' % (opts.app_label, opts.module_name, instance.pk)
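
To make the key format concrete, here is a small sketch that runs without Django. The key_from_parts helper and the fake_post object are hypothetical stand-ins for illustration only; a real model instance supplies _meta and pk itself. As far as I know, newer Django releases spell the attribute _meta.model_name rather than _meta.module_name, so adjust accordingly.

```python
from types import SimpleNamespace


def key_from_parts(app_label, model_name, pk):
    """Build a cache key in the 'app_label.model_name:pk' format."""
    return '%s.%s:%s' % (app_label, model_name, pk)


# Hypothetical stand-in for a saved BlogPost instance with primary key 42.
fake_post = SimpleNamespace(
    _meta=SimpleNamespace(app_label='blog', module_name='blogpost'),
    pk=42,
)

example_key = key_from_parts(
    fake_post._meta.app_label,
    fake_post._meta.module_name,
    fake_post.pk,
)
print(example_key)  # blog.blogpost:42
```

Splitting the formatting out like this also means the same helper could build a key when only the model class and a pk are known, without an instance in hand.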

So now let's start setting the cache! My preferred way to do it is via a signal, but you could do it in a less generic way by overriding save on a model. My signal looks like this:

from django.core.cache import cache
from django.db.models.signals import post_save

def post_save_cache(sender, instance, **kwargs):
    cache.set(key_from_instance(instance), instance, SIMPLE_CACHE_SECONDS)
post_save.connect(post_save_cache)

Now that we're putting items in the cache, we should probably delete them from the cache when the model instance is deleted:

from django.db.models.signals import pre_delete

def pre_delete_uncache(sender, instance, **kwargs):
    cache.delete(key_from_instance(instance))
pre_delete.connect(pre_delete_uncache)
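
Taken together, the two handlers behave like a small write-through cache: saving populates an entry and deleting evicts it. Here is a rough, Django-free sketch of that life cycle. DictCache, on_save, on_delete and the post object are made-up stand-ins (they only mimic the set/get/delete calls used above) and are not part of Django.

```python
from types import SimpleNamespace


class DictCache(object):
    """Hypothetical in-memory stand-in mimicking cache.set/get/delete."""

    def __init__(self):
        self._data = {}

    def set(self, key, value, timeout=None):
        # A real backend would expire the entry after `timeout` seconds;
        # this stand-in ignores the timeout entirely.
        self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)

    def delete(self, key):
        self._data.pop(key, None)


def key_from_instance(instance):
    opts = instance._meta
    return '%s.%s:%s' % (opts.app_label, opts.module_name, instance.pk)


cache = DictCache()


def on_save(instance):
    # Mirrors post_save_cache: store the whole instance under its key.
    cache.set(key_from_instance(instance), instance, 2592000)


def on_delete(instance):
    # Mirrors pre_delete_uncache: evict the instance's entry.
    cache.delete(key_from_instance(instance))


# Hypothetical stand-in for a model instance.
post = SimpleNamespace(
    _meta=SimpleNamespace(app_label='blog', module_name='blogpost'),
    pk=1,
    title='Hello',
)

on_save(post)
cached_after_save = cache.get('blog.blogpost:1')

on_delete(post)
cached_after_delete = cache.get('blog.blogpost:1')
```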

This is all good and well, but right now we don't really have a way to get at that information. Cache is pretty useless if we never use it! Our interface to the database is through the model's QuerySet, so let's make sure that our QuerySet is making good use of our newly-populated cache. To do so, we'll subclass QuerySet:

from django.db.models.query import QuerySet

class SimpleCacheQuerySet(QuerySet):
    def filter(self, *args, **kwargs):
        pk = None
        for val in ('pk', 'pk__exact', 'id', 'id__exact'):
            if val in kwargs:
                pk = kwargs[val]
                break
        if pk is not None:
            opts = self.model._meta
            key = '%s.%s:%s' % (opts.app_label, opts.module_name, pk)
            obj = cache.get(key)
            if obj is not None:
                self._result_cache = [obj]
        return super(SimpleCacheQuerySet, self).filter(*args, **kwargs)

The only method that we really need to override is filter, since get and get_or_create both just rely on filter anyway. The for loop in the filter method checks whether the query is a lookup by id or pk; if so, we construct a key and try to fetch the object from the cache. If we find the item in the cache, we place it into the QuerySet's internal result cache (the private _result_cache attribute). At that point we're as good as done. Then we just let Django do the rest!
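
The lookup detection is the part most worth isolating, so here is a hedged, standalone sketch of it. extract_pk mirrors the loop in filter. cacheable_pk is a hypothetical tightening that is not in the code above: it only reports a pk when that lookup is the sole filter condition, on the assumption that a cached object cannot be trusted to satisfy any additional conditions.

```python
PK_LOOKUPS = ('pk', 'pk__exact', 'id', 'id__exact')


def extract_pk(filter_kwargs):
    """Return the value of the first primary-key lookup, or None."""
    for name in PK_LOOKUPS:
        if name in filter_kwargs:
            return filter_kwargs[name]
    return None


def cacheable_pk(filter_args, filter_kwargs):
    """Return the pk only when it is the one and only filter condition."""
    if filter_args or len(filter_kwargs) != 1:
        return None
    return extract_pk(filter_kwargs)


print(extract_pk({'id__exact': 3, 'title': 'x'}))   # 3
print(cacheable_pk((), {'pk': 7}))                   # 7
print(cacheable_pk((), {'pk': 7, 'title': 'x'}))     # None
```

Whether the stricter check is wanted depends on how the manager is used; it trades a few cache hits for not returning an object that a combined filter would have excluded.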

This SimpleCacheQuerySet won't be used all on its own, though; we need to actually force a model to use it. How do we do that? We create a manager:

from django.db import models

class SimpleCacheManager(models.Manager):
    def get_query_set(self):
        return SimpleCacheQuerySet(self.model)

Now that we have this transparent caching library set up, we can go around to all of our models and import it and attach it as needed. Here's how that might look:

from django.db import models
from django_simplecache import SimpleCacheManager

class BlogPost(models.Model):
    title = models.TextField()
    body = models.TextField()

    objects = SimpleCacheManager()

That's it! Just by attaching this manager to our model, we get all the benefits of per-object caching right away. Of course, this isn't comprehensive. It does hit the vast majority of use cases, though. If you were to use this for a real site, however, you wouldn't be able to use the update method, because it would leave stale objects in the cache. Handling it is a little bit trickier since there's no post_update signal, but it's nowhere near impossible. Let's just say that, for now, it's being left unimplemented as an exercise for the reader. in_bulk would actually be quite fun to implement, too, because you could get all of the results possible from the cache, get the rest from the database, and then merge those two dictionaries before returning.
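
To give a feel for the in_bulk idea, here is a minimal sketch of the merge step, written without Django so it runs on its own. in_bulk_cached and all of the fake_* names are hypothetical. In a real implementation, the cache_get_many callable would presumably wrap Django's cache.get_many and fetch_from_db would wrap the stock QuerySet.in_bulk, but that wiring is an assumption and is not shown here.

```python
def in_bulk_cached(pks, cache_get_many, fetch_from_db, key_for_pk):
    """Return {pk: obj}, asking the database only for cache misses."""
    pk_by_key = dict((key_for_pk(pk), pk) for pk in pks)
    # cache_get_many returns {key: obj} for the keys it found.
    hits = cache_get_many(list(pk_by_key))
    found = dict((pk_by_key[key], obj) for key, obj in hits.items())
    missing = [pk for pk in pks if pk not in found]
    if missing:
        # fetch_from_db returns {pk: obj} for the pks that exist.
        found.update(fetch_from_db(missing))
    return found


# Hypothetical stand-ins for the cache and the database.
fake_cache = {'blog.blogpost:1': 'post one (cached)'}
fake_db = {1: 'post one', 2: 'post two', 3: 'post three'}
db_calls = []


def fake_get_many(keys):
    return dict((k, fake_cache[k]) for k in keys if k in fake_cache)


def fake_fetch(pks):
    db_calls.append(list(pks))
    return dict((pk, fake_db[pk]) for pk in pks if pk in fake_db)


result = in_bulk_cached(
    [1, 2, 3],
    fake_get_many,
    fake_fetch,
    lambda pk: 'blog.blogpost:%s' % pk,
)
print(result)
print(db_calls)  # [[2, 3]]
```

Only the misses (2 and 3) reach the fake database; a fuller version would probably also write those freshly fetched objects back into the cache.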

I think this would be a really good reusable Django application. Essentially, we've grown a library from the ground up that really isn't all that much code. I think it took me 20 minutes to write the actual code, but with some serious polish and love, this library could evolve into something that I think many reusable apps would use to great benefit. What do you think? What should a good, simple, Django caching library have?

24 Comments So Far...

By Massimo at 2:01 a.m. on Nov. 29, 2008

Yes, Django's cache is nice, but you have one problem: you force the developer to decide, once and for all, in the settings, where the cache lives. Sometimes you want to cache data in different places depending on context and type, within the same app.

In web2py for example, you do
cache.ram(key,f,t)
and the output of function f is cached in RAM. Same for cache.disk and cache.memcache. The developer can create additional caching mechanisms using the same interface, pipe existing caches as in:
cache.ram(key,lambda: cache.disk(key,f,t0),t1)
and use them as decorators for controller actions:
@cache(...cache.ram...).
or in database queries without modifying the models:
rows=db().select(...cache=(cache.ram,t))

Perhaps Django could use a similar mechanism and thus allow more flexibility.

Perhaps I am wrong. In that case you should amend the Django documentation.

 

By Andreas at 4:58 a.m. on Nov. 29, 2008

Are you saying that making decisions is a bad thing, or what? I think it's rather good that there are many ways to cache with Django, because how you cache will depend on the size of your site.

This is an easy way of caching, but it will end up filling your RAM with a clone of your db. In most cases, if you want to lower the load on your servers, it's better to cache processed data rather than raw data. Though I could see a combination of this and the great staticgenerator some performance ass...

 

By Orestis Markou at 6:22 a.m. on Nov. 29, 2008

I'm pretty sure the above comment states the opposite: that Django has only one cache (so far as the OP can tell, I don't know if that's right), whereas web2py has different caches that you can use depending on the data/context.

I'm pretty sure you can set up Django with different caches, though - it doesn't seem that complicated. You may even roll your own caching layer, built with the same machinery.

 

By matt at 2:22 a.m. on Nov. 30, 2008

Django supports cache backends, be they disk, database, memcached, or your own homegrown "cachinations".

personally, i just pre-generate every possible request for my websites and save them as gzipped tarballs using md5 hashes. it's much faster and more robust than using an in-memory cache, because no one's memory is that good.

 

By Alexander Solovyov at 12:23 p.m. on Nov. 29, 2008

 

By Alex at 12:53 p.m. on Nov. 29, 2008

One issue with the way you've implemented it (that I didn't think of last night :P) is that it breaks lazy filtering on the queryset.

 

By Mayuresh at 11:01 p.m. on Dec. 2, 2008

The cache would be used for some models which have

objects = SimpleCacheManager().

post_save_cache and pre_delete_uncache would be called for all the models in the system. So caching would occur for all models.
Cache space would be wasted for models which do not have objects = SimpleCacheManager().

 

By Sumit Chachra at 5:41 p.m. on Dec. 12, 2008

How about passing the sender in the constructor of the manager (unless there is some magical way to pass it in Python) and then registering with the save and delete signals only for that sender?

Registration is guaranteed to happen as long as those models are being used.

Would this be an ok approach?

 

By Sumit Chachra at 5:35 p.m. on Dec. 12, 2008

Any hints on how to make this perfect, given the lack of a post_update signal? I know this was an exercise for the reader, but would love to get some hints!

 

By nendyGoarcace at 3:57 a.m. on April 17, 2009

nice, really nice!

 

By Taavi at 2:30 p.m. on April 21, 2009

Tried this out with the current Django 1.1 beta and it doesn't appear to work anymore. As far as I can tell, for some reason the internal result_cache is being reset after the cached item is retrieved, but before the queryset tries to return the object. I spent a few hours tinkering around, but wasn't able to come up with a solution. Maybe someone else can figure something out. Back to using the low-level cache API I guess...

 

