Materialized Views with Django (This Old Pony #102)

how

                April 28, 2022

            Materialized Views with Django (This Old Pony #102)

            In the last edition of This Old Pony we introduced the database view and its more valuable cousin, the materialized view. Materialized views give us a way of denormalizing and caching data directly in the database.
If you’re using Django’s ORM you probably want a nice way of accessing this data with all the accommodations the ORM provides.
Direct method
We’ll start with two facts. A database view (materialized or otherwise) is queryable in the same way that a table is. And the Django ORM lets you map existing database tables to model classes without managing the table schema. This should hint at the beginning of the solution!
Given that you have a view, you can create a model class which maps the resultant columns to model fields and set manage = False in your model’s Meta class options.
That’s it!
Implementing the direct method
Given that “that’s it!” isn’t really an adequate explanation, let’s take a look at how you could actually implement this.
Let’s assume that our model, or view in this case, is an aggregation of season stats for basketball players. To simplify our example, all the stats are already aggregated per game, so all we need to do is aggregate over a single season (compared to some hyper precise system where individual points, rebounds, etc. are tied to a game minute).
A view might look like:
CREATE VIEW event_counts_view AS
        SELECT events.event, sum(events.count) AS total
        FROM events;

That leaves our model looking like this:
class EventCounts(models.Model):
    event = models.CharField(max_length=100)
    total = models.IntegerField()

    class Meta:
        db_table = "event_counts_view"
        managed = False

If the view is already present in the database, i.e. responsibility for its creation is entirely out of scope of the Django app, then you’re done! However if you actually want to create this view, whether in production, development, or test runs (!!!) then you’ll need a separate step. Thankfully this is as simple as creating a blank migration with the --empty flag and then adding the CREATE VIEW SQL:
class Migration(migrations.Migration):

    dependencies = [
        ...
    ]

    operations = [
        migrations.RunSQL("CREATE VIEW..."),
    ]

The same steps work for creating a materialized view, except of course replacing VIEW with MATERIALIZED VIEW.
This leaves two more requirements to address: refreshing the materialized view, and updating the view or materialized view in a migration in the event that it changes.
The former can be accomplished with by adding a simple function that executes the SQL to refresh the materialized view. This can be called from signals in your app, a periodic task, or a management command. And the migrations can be satisfied by adding new empty migrations that use the CREATE OR REPLACE VIEW or CREATE OR REPLACE MATERIALIZED VIEW command.
But honestly this is probably starting to sound a tad tedious, isn’t it?
Class-ifying the logic
One solution I’ve worked with in the past involves adding some classmethods to the View models to provide a nice handle for dropping, creating, and refreshing materialized views. Dropping is necessary because you cannot update the schema for a view, you can only replace it.
That looks a bit like this:
class MaterializedEventCounts(models.Model)
    ...

    @classmethod
    def create_view_sql(cls):
        return "SELECT * FROM view_table;"

    @classmethod
    def drop_view_sql(cls):
        return "DROP VIEW mat_view CASCADE;"

    @classmethod
    def refresh_sql():
        return "REFRESH MATERIALIZED VIEW CONCURRENTLY mat_view;"

    @classmethod
    def refresh(cls):
        with connection.cursor() as c:
            c.execute(cls.refresh_sql())

This omits other methods for adding indices, for example. By breaking everything up into little classmethods this allows composition, and then ultimately adding two methods to an empty migration to drop the views and rebuild them each time there’s a change to the view query or indexes.
The migration is little more than:
class Migration(migrations.Migration):
    dependencies = [...]
    operations = [
        migrations.RunSQL(
            MaterializedEventCounts.rebuild_view_sql()
        )
     ]

Where of course the mystery rebuild_view_sql classmethod returns a list of SQL strings that drops the view and rebuilds it. We’ll ignore the issues surrounding importing a concrete model in a migration.
Here I share only the flavor of this solution instead of the whole thing in order to illustrate it without recommending it. It’s somewhat tedious and just a little verbose! It is overkill for one view and unnecessary overhead for more than a handful. It is in fact, a solution pursued only because somebody (not to point fingers here) was unaware that the unmaintained package for managing PostgreSQL views had been forked and revived.
Django apps to the rescue
The package in question is django-pgviews-redux.
It replaces all of the crufty little classmethods, the need to specify an ID field for a materialized view, adds support for Django’s Meta options including indexes, and side steps the need for “hand crafted” migrations.
Without regurgitating the documentation, here’s a short example from a demo project that shows how to implement both database views and materialized views with django-pgviews-redux(please excuse the uninspired names):
from django.db import models

from django_pgviews import view as pg

class Event(models.Model):
    event = models.CharField(max_length=100)
    count = models.IntegerField()

    class Meta:
        db_table = "foo_events"

class EventView(pg.View):
    sql = """
      SELECT foo_events.event, sum(foo_events.count)
      FROM foo_events;
    """

    class Meta:
      db_table = 'foo_view'
      managed = False

class EventMatView(pg.MaterializedView):
    dependencies = ['fun.EventView',]
    sql = """SELECT * from foo_view;"""

    class Meta:
        db_table = 'foo_mat_view'
        managed = False

This is a somewhat trivial example but it’s just enough to show what the models look like in practice.
One difference in the fields between rolling your own materialized view and using django-pgviews-redux MaterializedView is that the latter automatically adds a default sequential ID field. Otherwise you need to specifically identify which field will be the primary key. 
And it provides some additional affordances like a nice little refresh() method for refreshing your materialized view, concurrently, too!
The bigger take aways:

Materialized view-based models created with pg.MaterializedView can use the Meta.Indexes option just like a model, which may save some headache in creating index on materialized views[2], too, or at least make them more legible in the code.
Migrations seem to just work. This was something I was pretty skeptical about, and to be fair, haven’t pursued testing in a production environment yet, BUT I didn’t need to do anything to see changes in the database. I haven’t gone back to the project to more deeply inspect why or how but I was left with more delight than suspicion.

NEXT ISSUE: The Python PaaS showdown!
Two months ago I started a mini-project to test out various platforms as a service (PaaS) for deploying Python web applications. It involves deploying a single, simple app to different platforms and comparing the entirety of the experience.
The initial project already includes:

Heroku
Platform.sh
Render

I’m planning on adding the following to the assessment:

Railyway
Fly.io

What other production-deployment, Python-friendly PaaS would you like to see included? I’m sure I’m missing some.
Not-quite-punctually yours,
Ben

[0] django-pgviews-redux
[1] Dimitri Fontaine (Major Contributor to PostgreSQL) recommends in his book The Art of PostgreSQL on using an intermediate view for materialized views, such that the query definition is left to the view and the materialized view simply SELECT * from the view. The core rationale is that it makes it easy to discern if there is drift, and how much, between a materialized view and the source data. I think it also makes testing simpler ~ maybe ~ in that you can pull data from the view instead of the materialized view in test runs and avoid the need to refresh the materialized view for every relevant test function.
[2] This is pretty powerful, and is restricted to materialized views (for reasons that will be apparent), but you can add indexes to your materialized views. This means everything from unique constraints to full text search, e.g. add a column to the view query creating your search tokens and then index that in the materialized view. Et voilà, fast search.

Don't miss what's next. Subscribe to This Old Pony: Running Django apps in the Wild: