Monthly Archives: December 2012

Testing with Django, Haystack and Whoosh

The problem: you want to test a Django view for results of a search query, but Haystack will be using your real query index, built from your real database, instead of an index built from your test fixtures.

Turns out you can generalise this for any Haystack back-end by replacing the haystack.backend module with the simple back-end.

from myapp.models import MyModel
from django.test import TestCase
import haystack

class SearchViewTests(TestCase):
    fixtures = ['test-data.json']

    def setUp(self):
        self._haystack_backend = haystack.backend
        haystack.backend = haystack.load_backend('simple')

    def tearDown(self):
        haystack.backend = self._haystack_backend

    def test_search(self):
        results = SearchQuerySet().all()
        assert results.count() == MyModel.objects.count()

My first attempt at this made changes to the project settings and did HAYSTACK_WHOOSH_STORAGE = "ram" which works but was complicated because then you have to re-build the index with the fixtures loaded, except the fixtures don’t get loaded in TestCase.setUpClass, so the choice was to load the fixtures myself or to re-index for each test. And it was specific to the Whoosh back-end of course.

(This is for Django 1.4 and Haystack 1.2.7. In my actual project I get to deploy on Python 2.5. Ain’t I lucky? On a fucking PowerMac G5 running OS X Server 10.5 for fuck sacks.)

Optimizing queries in Haystack results

My Adobe software updates app (which uses Haystack + Django to provide a search feature) has a very inefficient search results template, where for each search result the template links back to the update’s related product page.

The meat of the search results template looks something like this:

{% for result in page.object_list %}
<div class="search-result">
    <a href="{{ result.object.get_absolute_url }}">{{ result.object }}</a>
    <a href="{% url "product_page" result.object.product.slug %}">{{ result.object.product }}</a>
</div>
{% endfor %}

The reverse URL lookup triggers a separate SQL query to find the related product object’s slug field for each object in the results list, and that slows down the page response significantly.

For a regular queryset you would tell Django to fetch the related objects in one go when populating the template context in order to avoid the extra queries, but in this case page.object_list is generated by Haystack. So how to tell Haystack to use select_related() for the queryset?

It is easy. When you register a model to be indexed with Haystack for searching, you have to define a SearchIndex model, and you can also override the read_queryset() method that is used by Haystack to get a Django queryset:

# myapp.search_indexes.py
from haystack import indexes, site
from myapp.models import MyModel

class MyModelIndex(indexes.SearchIndex):
    # Indexed fields declared here
    ...
    def get_model(self):
        return MyModel

    def read_queryset(self):
        return self.model.objects.select_related()

site.register(MyModel, MyModelIndex)

And that solved it for me. Shaves three quarters off the execution time.

PS This all pertains to Django 1.4 and Haystack 1.2.7.

PPS Also pertains to a version of my Adobe software updates page that I haven’t deployed quite yet.