Django Tip: A Denormalization Alternative

When building any website with textual content, you can either write plain text or write in a markup language of some kind. The obvious choice of markup language is HTML (or XHTML), but HTML is not as human-readable as something like Textile, Markdown, or reStructuredText. The advantage of those human-readable alternatives is that content written in them can be translated very easily into HTML.

When one of my friends started designing his blog using Django, it got me thinking about how best to deal with that translated HTML. It seems like a waste to re-translate it every time a visitor views the page, but it also seems redundant to store the translated HTML in the database alongside the source text.

Here's my solution to the problem: cache it. For a month. Here's an example, using reStructuredText:

from django.db import models
from django.contrib.markup.templatetags.markup import restructuredtext
from django.core.cache import cache
from django.utils.safestring import mark_safe

CACHE_TIMEOUT = 60 * 60 * 24 * 30  # 30 days, in seconds

class MyContent(models.Model):
    content = models.TextField()

    def _cache_key(self):
        return 'mycontent_html_%s' % self.pk

    def _get_content_html(self):
        key = self._cache_key()
        html = cache.get(key)
        # Compare against None so an empty rendering still counts as a hit.
        if html is None:
            html = restructuredtext(self.content)
            cache.set(key, html, CACHE_TIMEOUT)
        return mark_safe(html)
    content_html = property(_get_content_html)

    def save(self, *args, **kwargs):
        super(MyContent, self).save(*args, **kwargs)
        # Invalidate after saving so the next read re-renders the new content.
        cache.delete(self._cache_key())

The content_html property either gets the translated HTML from the cache or, on a miss, translates the content and stores the result in the cache for a month (30 days). It then returns the result marked as safe HTML for display in a template. Finally, the save method on the model is overridden so that whenever the model is re-saved, the cached HTML is deleted.
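To see the hit, miss, and invalidation flow without a Django project, here is a minimal sketch in plain Python. It is only an illustration: InMemoryCache, render, and Content are hypothetical stand-ins (for the cache backend, restructuredtext, and the MyContent model respectively), not part of Django.

```python
import time


class InMemoryCache:
    """Tiny stand-in for a cache backend: get/set/delete with a timeout."""

    def __init__(self):
        self._store = {}

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            # Entry is stale: drop it and report a miss.
            del self._store[key]
            return None
        return value

    def set(self, key, value, timeout):
        self._store[key] = (value, time.monotonic() + timeout)

    def delete(self, key):
        self._store.pop(key, None)


cache = InMemoryCache()
render_calls = []  # records each time the "expensive" translation runs


def render(text):
    """Hypothetical stand-in for restructuredtext(): wraps text in <p>."""
    render_calls.append(text)
    return "<p>%s</p>" % text


class Content:
    """Plain-Python analogue of the MyContent model."""

    def __init__(self, pk, content):
        self.pk = pk
        self.content = content

    def _cache_key(self):
        return "mycontent_html_%s" % self.pk

    @property
    def content_html(self):
        html = cache.get(self._cache_key())
        if html is None:
            html = render(self.content)
            cache.set(self._cache_key(), html, 60 * 60 * 24 * 30)
        return html

    def save(self):
        # A real model would write to the database here.
        cache.delete(self._cache_key())


post = Content(pk=1, content="Hello")
first = post.content_html      # miss: renders and stores
second = post.content_html     # hit: served from the cache
post.content = "Hello again"
post.save()                    # invalidates the cached HTML
third = post.content_html      # miss again: re-renders the new text
```

The translation runs only twice across the three reads: once for the original text and once after the save.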

There we go! We now have the HTML-rendered data that we want, and no duplicated data in the database. Keep in mind that this approach becomes more useful the more RAM your webserver has.
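How much this helps depends on how slow rendering is compared with a cache lookup, so it is worth measuring before committing to it. The sketch below is one rough way to do that with the standard timeit module. The render function and the dictionary cache are hypothetical stand-ins. A dictionary lookup says nothing about the network latency of a real cache server, so a real measurement should swap in the actual markup renderer and cache backend.

```python
import timeit


def render(text):
    """Hypothetical stand-in for a markup renderer (not real reST)."""
    paragraphs = text.split("\n\n")
    return "".join("<p>%s</p>" % p.strip() for p in paragraphs)


fake_cache = {}  # stand-in for a local cache lookup


def render_every_time(text):
    return render(text)


def render_with_cache(key, text):
    html = fake_cache.get(key)
    if html is None:
        html = render(text)
        fake_cache[key] = html
    return html


sample = "\n\n".join("Paragraph number %d of a blog post." % i for i in range(200))
runs = 200

uncached_seconds = timeit.timeit(lambda: render_every_time(sample), number=runs)
cached_seconds = timeit.timeit(lambda: render_with_cache("post-1", sample), number=runs)

print("uncached: %.3f ms per call" % (uncached_seconds / runs * 1000))
print("cached:   %.3f ms per call" % (cached_seconds / runs * 1000))
```

If the uncached number is already small for typical content, the cache may not be worth the memory it uses.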

21 Comments So Far...

By Ben at 4:21 p.m. on Feb. 15, 2008

Have you given this any performance tests? I'd be curious to see if network latency offsets the gains from not having to reprocess each time.

 

By Eric Florenzano at 4:32 p.m. on Feb. 15, 2008

No, I haven't. I'm running memcached locally on the same machine as my webserver, so it's not really an issue for me. I'd suspect that if you're avoiding fetching items from the cache due to network latency, then there may be a problem with your network or architecture.

 

By Ben at 5:23 p.m. on Feb. 17, 2008

The basis of my question is: why implement caching if it doesn't decrease time-to-output? For large objects that are expensive to create, of course you'll want to cache it. For small bits that are cheap to generate, why bother caching it -- you'll likely make your app slower.

I ran a highly unscientific experiment using 'blog-sized' input text run through markdown... I was getting 4ms as a mean generation time... so there's no gain in going over a network to hit a cache.

Sure, you'll be waiting on IO, so you can have higher concurrency... but it's a questionable gain.

I agree that there's merit to your approach, and appreciate the write-up. I was just trying to explore the trade-offs.

 

By Eric Florenzano at 5:33 p.m. on Feb. 17, 2008

I definitely agree that it's not for every application.

My view about it is that every programming choice is about tradeoffs. In this case, the tradeoff is processing cycles vs. RAM space used. I've just chosen the latter of the two--which makes sense for me because my processing power is limited and slow on my server arrangement, but RAM is plentiful.

That said, I do see your point. If push came to shove, and memory started becoming a problem, this would definitely be the first thing to go.

Thanks for the comment!

 

By Max Battcher at 4:38 p.m. on Feb. 15, 2008

An alternative to writing your caching in the model is to make use of the new-ish cache template tag:

http://www.djangoproject.com/documentation/cache/#template-fragment-caching

So then your template might look like::

{% load cache %}
{% load markup %}
{% cache 2592000 blog_post object.id %}
{{ object.content|restructuredtext }}
{% endcache %}

 

By Eric Florenzano at 4:41 p.m. on Feb. 15, 2008

Definitely! That's a great solution for a lot of different things--and I definitely thought about that. However, it's a little unclear how one goes about invalidating that data. That's why I ended up recommending explicit caching.

 

By Doug Napoleone at 9:43 p.m. on Feb. 15, 2008

NOTE: there is an even better way.

Use 'rstpages' and combine the power of restructuredtext and flatpages!!! (and it doubles as a wiki)

This is the backbone of the new PyCon website.
http://us.pycon.org/

Want to see something really cool?
Check out the recent changes RSS feed:
http://us.pycon.org/2008/recent/?feed=rss2

And I am barely scratching the surface.
https://pycon.coderanger.net/

 

By Eric Florenzano at 6:27 p.m. on Feb. 16, 2008

Where can I find out more info about rstpages? Is it this module?
https://pycon.coderanger.net/browser/django/trunk/pycon/restructuredtext

Also, I'm not really sure why an RSS feed would have the straight ReST and not processed ReST, but that's an interesting idea.

Anyways, thanks for all of your hard work on PyCon-Tech stuff. I'm sure we'll run into each other at PyCon!

 

By Rajesh Dhawan at 3:26 p.m. on Feb. 18, 2008

You could also add a non-editable content_html field straight to your model and compute it by overriding MyContent.save().

This idea is used by the Textpattern CMS and there's a Django write up on it here:

http://code.djangoproject.com/wiki/UsingMarkup

 

By Eric Florenzano at 3:51 p.m. on Feb. 18, 2008

Yep, that would be classic denormalization. It's a good solution in certain situations.

 

