Using raw SQL in django

For a django project I’m working on, I need to run a simple query with COUNT and GROUP BY statements to collect some summary data. I tried working with the model API, and eventually fell back to using raw SQL instead.

Here’s the query I wanted to run:

SELECT
(year/10)*10 as decade,
count(*) as case_count
FROM
docket_case
GROUP BY
decade


One problem I encountered was there was no way (that I could find) to do a GROUP BY. I found one reference to a group_by() method, but I don’t think the patch has been submitted or approved.

I spent some time trying to work out the right way to do it using the model API, before realizing I was banging my head on the wrong brick wall. I was using values() and select() extra(select='...') together, which should have clued me in right away. Well, that and the fact that the data I was retrieving wasn’t one of my model classes.

Using a tip from page 325 of the new django book (see my earlier post for a review), I decided to just go ahead and run the query directly and work with the result set. The results need to go into a template, so I wanted to have dictionaries instead of the tuples I get from cursor.fetchall(). This is what I came up with:

from itertools import *
from django.db import connection

def query_to_dicts(query_string, *query_args):
"""Run a simple query and produce a generator
that returns the results as a bunch of dictionaries
with keys for the column values selected.
"""
cursor = connection.cursor()
cursor.execute(query_string, query_args)
col_names = [desc[0] for desc in cursor.description]
while True:
row = cursor.fetchone()
if row is None:
break
row_dict = dict(izip(col_names, row))
yield row_dict
return


You call it like:

    results = query_to_dicts("""
SELECT
(year/10)*10 as decade,
count(*) as case_count
FROM
docket_case
GROUP BY
decade
""")


or, with arguments (this query counts the cases in each year of a particular decade):

    results = query_to_dicts("""
SELECT
year,
count(*) as case_count
FROM
docket_case
WHERE
year/10 = %s
GROUP BY
year
""", decade/10)


I don’t expect a large result set, but I figured this was an excuse to experiment with generators and iterators. I’m pretty happy with the results, but surprised I had to write something like this myself. I didn’t see a cursor method to fetch the rows as dictionaries instead of tuples, and the search I ran didn’t return anything that looked useful.

Am I missing something?

[Updated 31 Dec: One correction: The method is extra(), the argument is select. I was adding a count field using extra(), and then asking for the value with values(). There does not seem to be a way to structure the query I wanted using the models API methods.]
  • http://www.blogger.com/profile/07642457177220572135 David Avraamides

    This is one of the limitations of Django’s ORM that eventually led to me moving away from Django. The problem is that arbitrarty SQL in general will not return rows that map to a model class so you can’t leverage all the other functionality that the model classes provide, like accessing related objects.

    I know it’s not an easy thing to do and I’m not surprised that it’s not supported directly in Django, but in my domain (financial services) the norm is to do lots of aggregation in SQL so I was always running into this limitation.

    The query_to_dicts function you wrote is pretty common – and a little surprising it’s not part of the DB API. (Try searching for “fetchalldict”). One thing you may want to change on your function is to change desc[0] to desc[0].lower().replace(‘ ‘, ‘_’). Eventually, you might use it to call a view or SP that you didn’t write and you won’t be able to use the .member syntax in templates if they have spaces in them.

  • http://www.blogger.com/profile/01892352754222143463 Doug Hellmann

    Thanks for the tip, David, I’ll add those changes to my function just to be safe.

    Once I figured it out, it was clear to me that trying to use the model interface for these particular queries was just wrong. I don’t mind writing SQL (I actually like it, since I can express what I want very clearly), and in this particular case I’m not loading model data, I’m computing meta-data about those models. So, it makes sense to do it directly instead of trying to use a QuerySet() on the models.

    What was confusing or surprising was that there didn’t seem to be a fetchalldict() method already available.

  • Anonymous

    custom sql django does it help?

  • http://www.blogger.com/profile/01892352754222143463 Doug Hellmann

    @Anonymous – That’s an interesting link, and I’ll consider adding a custom manager with some methods instead of embedding the query in my view function. Thanks!

  • http://www.blogger.com/profile/14649043170344448278 JohnLenton

    did you try importing dictfetchall (or one, or many) from django.db.backend ?

  • http://www.blogger.com/profile/01892352754222143463 Doug Hellmann

    Aha! No, I did not know about dictfetch*(). A quick google shows up some patches, but nothing in the documentation.

    Can I ask how you knew about them?

  • http://www.blogger.com/profile/14649043170344448278 JohnLenton

    dictfecth* has been part of psycopg’s dbapi for a long time (at least as long as I have been using it). I therefore guessed you were using a different backend, so I googled for “MySQLdb dictfetchall”, and one of the first results was from django.db.backends.mysql.base. Then, I checked that it was actually in all the backends, and it is — even in dummy, which makes me think it’s ok to use it in my code. You should probably ask if it really is public api, however :)

  • http://www.blogger.com/profile/01892352754222143463 Doug Hellmann

    Ok, that makes sense. I’ve used psycopg for a while on another project and I thought there was such a function, but couldn’t seem to discover the name. We use it at work with our own ORM, which has the feature built in but it works somewhat differently.

    I came to the opposite conclusion that you did: if I couldn’t find anything about it in the (usually) excellent documentation, it either didn’t exist, wasn’t portable across backends, or wasn’t intended to be used publicly.

  • Anonymous

    Hi Doug – Very nice approach! I agree about aggregation. And also for large or complex joins (e.g. in genomic databases). Do you know if there is a way to issue a stored proc execute on the connection, and get back TWO result sets?

  • http://www.blogger.com/profile/01892352754222143463 Doug Hellmann

    @Anon – I don’t use stored procedures that often. Can they return multiple “result sets” as rows, or do you mean sets of rows? Can you use UNION to combine the sets into one?

  • http://www.blogger.com/profile/08805149752833221751 ggbeta

    Doug thanks for the very useful code snippet. I use it extensively in the app I’m writing. Can you please let me know how you use it for checking if there are any rows present? I would like to display ‘No records found’ message if there are no rows returned. Thank you!

  • http://www.blogger.com/profile/01892352754222143463 Doug Hellmann

    @ggbeta – Good question. In my case, there is always data for the generator to return. If the query returns no data, neither would the generator. If you want to check for data you could pass a COUNT query in, or you could convert the output of the generator to a list and check its length before attempting any further processing of the data.

  • Anonymous

    Thanks for this, saved me some time.

    Is there some way to return the number of rows found? Since it iterates through the rows I thought this would be trivial. I tried putting a counter in the generator but it will only yield one value.

  • http://www.blogger.com/profile/01892352754222143463 Doug Hellmann

    This post was for Django 0.96. I think more recent versions have support for aggregation functions through the ORM, so you might be able to ask for counts that way.

    If not, the cursor object created by the Python DB-API has a rowcount attribute. See http://www.python.org/dev/peps/pep-0249/

  • http://www.blogger.com/profile/03672404826730142638 jrs_66

    Thanks a million for this… I’ve spent MUCH time with the Django ORM thinking I’m just missing something… I now believe the ORM is simply lacking much support for even moderately complex SQL (although it works great for single table selects ;)… It just seems simpler to write straight SQL (and more logical)…

  • http://www.blogger.com/profile/15979653958648881988 Scott

    Using psycopg2:

    import psycopg2.extras

    cursor = connection.cursor(cursor_factory=psycopg2.extras.DictCursor)

    Using Mysqldb:

    import MySQLdb.cursors

    cursor = connection.cursor(MySQLdb.DictCursor)

    cursor.execute() will then return a dictionary like object.

  • http://www.blogger.com/profile/15979653958648881988 Scott

    corrections:

    it’s MySQLdb.cursors.DictCursor, and

    cursor.fetchone() will retrieve a dict like object.

    You get the idea.

  • http://www.blogger.com/profile/01892352754222143463 Doug Hellmann

    @Scott – Thanks for the tips! Using the package-native cursor factories is probably faster than my generator hack, even though it looks like they aren’t portable between backends (a problem for otherwise reusable apps).

  • http://www.blogger.com/profile/05012722401165767905 Skylar Saveland
  • http://www.blogger.com/profile/01892352754222143463 Doug Hellmann

    @Skylar – IIRC, you can use slicing notation with query sets, can’t you?