Posted on Feb. 15, 2008 at 3:28 P.M.

When building any website with textual content, you can either write plain text or write in a markup language of some kind. The obvious choice of markup language is HTML (or XHTML), but HTML is not as human-readable as something like Textile, Markdown, or reStructuredText. The advantage of those human-readable alternatives is that content written in them can be translated very easily into HTML.

When one of my friends started designing his blog using Django, it got me thinking about how best to deal with that translated HTML. It seems like a waste to re-translate it every time a visitor views the page, but it also seems redundant to store the translated HTML in the database alongside the source text.

Here's my solution to the problem: cache it. For a month. Here's an example, using reStructuredText:

from django.db import models
from django.contrib.markup.templatetags.markup import restructuredtext
from django.core.cache import cache
from django.utils.safestring import mark_safe

class MyContent(models.Model):
    content = models.TextField()

    def _get_content_html(self):
        key = 'mycontent_html_%s' % str(self.pk)
        html = cache.get(key)
        if html is None:
            html = restructuredtext(self.content)
            cache.set(key, html, 60*60*24*30)
        return mark_safe(html)
    content_html = property(_get_content_html)

    def save(self, *args, **kwargs):
        # Save first, then drop the cached HTML so the next read re-renders it.
        super(MyContent, self).save(*args, **kwargs)
        cache.delete('mycontent_html_%s' % str(self.pk))

What I'm doing here is writing a method that either gets the translated HTML from the cache, or translates the content and stores the result in the cache for a month (60*60*24*30 seconds). It then returns the result marked as safe HTML, ready to display in a template. The last thing we do is override the save method on the model, so that whenever the model is re-saved, its cached HTML is deleted.
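The same get-or-render-then-invalidate flow can be tried outside Django. The following is only a minimal sketch under stated assumptions: `SimpleCache`, `render`, `stats`, and `Content` are hypothetical stand-ins for the cache backend, the markup translator, and the model, and none of them come from Django or from the code above.

```python
import time

ONE_MONTH = 60 * 60 * 24 * 30  # cache timeout in seconds

# Hypothetical counter so we can see how often rendering actually happens.
stats = {"renders": 0}


class SimpleCache:
    """Hypothetical in-memory stand-in for a cache backend such as memcached."""

    def __init__(self):
        self._store = {}

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            # Expired entries behave like misses.
            del self._store[key]
            return None
        return value

    def set(self, key, value, timeout):
        self._store[key] = (value, time.monotonic() + timeout)

    def delete(self, key):
        self._store.pop(key, None)


def render(text):
    """Hypothetical stand-in for an expensive markup-to-HTML translation."""
    stats["renders"] += 1
    return "<p>%s</p>" % text


class Content:
    """Hypothetical stand-in for the model; it holds no database state."""

    def __init__(self, pk, content, cache):
        self.pk = pk
        self.content = content
        self._cache = cache

    def _key(self):
        return "mycontent_html_%s" % self.pk

    @property
    def content_html(self):
        # Serve from the cache, rendering and storing on a miss.
        html = self._cache.get(self._key())
        if html is None:
            html = render(self.content)
            self._cache.set(self._key(), html, ONE_MONTH)
        return html

    def save(self):
        # A real model would write to the database here; this sketch only
        # invalidates the cached HTML so the next read re-renders it.
        self._cache.delete(self._key())
```

Reading `content_html` twice in a row renders only once. Changing `content` without calling `save()` keeps serving the old HTML, which is why the invalidation step matters.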

There we go! We now have the HTML-rendered data that we want, and no duplicated data in the database. Keep in mind that this approach becomes more useful the more RAM your webserver has, since cached entries can be evicted when memory runs short.

Ben
at 4:21 p.m.
on Feb. 15, 2008

Have you given this any performance tests? I'd be curious to see if network latency offsets the gains from not having to reprocess each time.

at 4:32 p.m.
on Feb. 15, 2008

No, I haven't. I'm running memcached locally on the same machine as my webserver, so it's not really an issue for me. I'd suspect that if you're avoiding fetching items from the cache due to network latency, then there may be a problem with your network or architecture.

Ben
at 5:23 p.m.
on Feb. 17, 2008

The basis of my question is: why implement caching if it doesn't decrease time-to-output? For large objects that are expensive to create, of course you'll want to cache them. For small bits that are cheap to generate, why bother caching them? You'll likely make your app slower.

I ran a highly unscientific experiment using 'blog-sized' input text run through markdown... I was getting 4ms as a mean generation time... so there's no gain in going over a network to hit a cache.

Sure, you'll be waiting on IO, so you can have higher concurrency... but it's a questionable gain.

I agree that there's merit to your approach, and appreciate the write-up. I was just trying to explore the trade-offs.
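An experiment like the one described above could be reproduced with a small timing harness. This is a rough sketch, not the commenter's actual test: `render` is a hypothetical stand-in for a real markup library, and the in-process dictionary lookup does not include the network round trip that a remote cache would add, so the numbers only give a lower bound on cache cost.

```python
import timeit


def render(text):
    """Hypothetical stand-in for a markup renderer such as Markdown."""
    paragraphs = text.split("\n\n")
    return "\n".join("<p>%s</p>" % p.strip() for p in paragraphs if p.strip())


def per_call_seconds(func, repeat=5, number=200):
    """Return the fastest observed time per call, in seconds."""
    timings = timeit.repeat(func, repeat=repeat, number=number)
    return min(timings) / number


# Assumed 'blog-sized' input: 50 short paragraphs.
sample = "\n\n".join(["Some blog-sized paragraph of text."] * 50)

# A plain dict plays the role of the cache; a real cache adds network latency.
cache = {"post_1": render(sample)}

render_time = per_call_seconds(lambda: render(sample))
lookup_time = per_call_seconds(lambda: cache.get("post_1"))

print("render: %.6f s per call" % render_time)
print("lookup: %.6f s per call" % lookup_time)
```

Swapping in the real renderer and the real cache client for the two lambdas would give figures that can be compared directly for a given setup.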

at 5:33 p.m.
on Feb. 17, 2008

I definitely agree that it's not for every application.

My view is that every programming choice is about tradeoffs. In this case, the tradeoff is processing cycles vs. RAM space used. I've chosen to spend RAM rather than processing cycles, which makes sense for me because processing power is limited and slow on my server arrangement, while RAM is plentiful.

That said, I do see your point. If push came to shove, and memory started becoming a problem, this would definitely be the first thing to go.

Thanks for the comment!

at 4:38 p.m.
on Feb. 15, 2008

An alternative to writing your caching in the model is to make use of the new-ish cache template tag:

http://www.djangoproject.com/documentation/cache/#template-fragment-caching

So then your template might look like::

{% load cache %}
{% load markup %}
{% cache 2592000 blog_post object.id %}
{{ object.content|restructuredtext }}
{% endcache %}

at 4:41 p.m.
on Feb. 15, 2008

Definitely! That's a great solution for a lot of different things--and I definitely thought about that. However, it's a little unclear how one goes about invalidating that data. That's why I ended up recommending explicit caching.

at 9:43 p.m.
on Feb. 15, 2008

NOTE: there is an even better way.

Use 'rstpages' and combine the power of restructuredtext and flatpages!!! (and it doubles as a wiki)

This is the backbone of the new PyCon website.
http://us.pycon.org/

Want to see something really cool?
Check out the recent changes RSS feed:
http://us.pycon.org/2008/recent/?feed=rss2

And I am barely scratching the surface.
https://pycon.coderanger.net/

at 6:27 p.m.
on Feb. 16, 2008

Where can I find out more info about rstpages? Is it this module?
https://pycon.coderanger.net/browser/django/trunk/pycon/restructuredtext

Also, I'm not really sure why an RSS feed would have the straight ReST and not processed ReST, but that's an interesting idea.

Anyways, thanks for all of your hard work on PyCon-Tech stuff. I'm sure we'll run into each other at PyCon!

Rajesh Dhawan
at 3:26 p.m.
on Feb. 18, 2008

You could also add a non-editable content_html field straight to your model and compute it by overriding MyContent.save().

This idea is used by the Textpattern CMS and there's a Django write up on it here:

http://code.djangoproject.com/wiki/UsingMarkup

at 3:51 p.m.
on Feb. 18, 2008

Yep, that would be classic denormalization. It's a good solution in certain situations.
