Using raw SQL in django
For a django project I’m working on, I need to run a simple query with COUNT
and GROUP BY
statements to collect some summary data. I tried working with the model API, and eventually fell back to using raw SQL instead.
Here’s the query I wanted to run:
SELECT
(year/10)*10 as decade,
count(*) as case_count
FROM
docket_case
GROUP BY
decade
One problem I encountered was there was no way (that I could find) to do a GROUP BY
. I found one reference to a group_by() method, but I don’t think the patch has been submitted or approved.
I spent some time trying to work out the right way to do it using the model API, before realizing I was banging my head on the wrong brick wall. I was using values()
and [STRIKEOUT:select()
] extra(select='…')
together, which should have clued me in right away. Well, that and the fact that the data I was retrieving wasn’t one of my model classes.
Using a tip from page 325 of the new django book (see my earlier
post) for a
review), I decided to just go ahead and run the query directly and
work with the result set. The results need to go into a template, so I
wanted to have dictionaries instead of the tuples I get from
cursor.fetchall()
. This is what I came up with:
from itertools import *
from django.db import connection
def query_to_dicts(query_string, *query_args):
"""Run a simple query and produce a generator
that returns the results as a bunch of dictionaries
with keys for the column values selected.
"""
cursor = connection.cursor()
cursor.execute(query_string, query_args)
col_names = [desc[] for desc in cursor.description]
while True:
row = cursor.fetchone()
if row is None:
break
row_dict = dict(izip(col_names, row))
yield row_dict
return
You call it like:
results = query_to_dicts("""
SELECT
(year/10)*10 as decade,
count(*) as case_count
FROM
docket_case
GROUP BY
decade
""")
or, with arguments (this query counts the cases in each year of a particular decade):
results = query_to_dicts("""
SELECT
year,
count(*) as case_count
FROM
docket_case
WHERE
year/10 = %s
GROUP BY
year
""", decade/10)
I don’t expect a large result set, but I figured this was an excuse to experiment with generators and iterators. I’m pretty happy with the results, but surprised I had to write something like this myself. I didn’t see a cursor
method to fetch the rows as dictionaries instead of tuples, and the search I ran didn’t return anything that looked useful.
Am I missing something?
[Updated 31 Dec: One correction: The method is extra()
, the argument is select. I was adding a count field using extra()
, and then asking for the value with values()
. There does not seem to be a way to structure the query I wanted using the models API methods.]