Full-stack web applications developer

Welcome to my blog

As I discover new things in my field and solve troublesome problems, I will attempt to document my finds here.

Improving Xapian backed Haystack searches in Django

January 9, 2019, Arthur Pemberton, 0 Comments

I recently switched a Django project utilizing Haystack from Whoosh as the engine to Xapian. The performance significantly improved, but I was left with some deficiencies in the search results — this turned out to be due to the default settings.

Here are the necessary settings to improve the search result quality:

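import xapian

# Flags xapian-haystack hands to the Xapian query parser for AutoQuery.
HAYSTACK_XAPIAN_FLAGS = (
    xapian.QueryParser.FLAG_PHRASE |
    xapian.QueryParser.FLAG_BOOLEAN |
    xapian.QueryParser.FLAG_LOVEHATE |
    xapian.QueryParser.FLAG_WILDCARD |
    xapian.QueryParser.FLAG_PURE_NOT
    # (the original listing is cut off here; any further flags are not shown)
)

HAYSTACK_CONNECTIONS = {
    'default': {
        'ENGINE': 'xapian_backend.XapianEngine',
        # `root` is assumed to be a path helper defined earlier in settings.
        'PATH': root.path('data/xapian')(),
        # (remaining keys are cut off in the original listing)
    },
}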

Before these settings, I was having problems with queries that were apparently case sensitive, at least when using AutoQuery, and with partial searches failing. These changes improve case insensitivity and stemming, while still giving almost 100x better performance than Whoosh.

Hopefully this helps someone out there looking for solutions.

Backend request headers with an Apache reverse proxy (frontend server)

July 18, 2018, Arthur Pemberton, 0 Comments

If you ever find yourself using Apache as a frontend server or reverse proxy, as opposed to Nginx, for example, you may want to pass the actual host IP and scheme information back to your application so that it can function properly. By default, mod_proxy sets the X-Forwarded-For header, but that header is a list and is also used by other types of proxies. You may want to set X-Forwarded-Proto and X-Real-IP as well. The following entries in your VirtualHost or similar should do the job:

Read More
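
On the application side, here is a minimal sketch of how a Python backend might read these headers from a WSGI environ. The function name `client_info` is hypothetical, and the sketch assumes the proxy is trusted and that it sets the headers named above; it is not taken from the original post.

```python
def client_info(environ):
    """Return (client_ip, scheme) for a WSGI request behind a trusted proxy."""
    # WSGI exposes the "X-Real-IP" header as HTTP_X_REAL_IP, and so on.
    real_ip = environ.get("HTTP_X_REAL_IP", "").strip()

    # X-Forwarded-For is a comma-separated list; the left-most entry is the
    # address reported by the first proxy in the chain.
    forwarded_for = environ.get("HTTP_X_FORWARDED_FOR", "")
    first_hop = forwarded_for.split(",")[0].strip()

    # Prefer the single-valued header, then the list, then the socket peer.
    client_ip = real_ip or first_hop or environ.get("REMOTE_ADDR", "")

    scheme = (
        environ.get("HTTP_X_FORWARDED_PROTO", "").strip().lower()
        or environ.get("wsgi.url_scheme", "http")
    )
    return client_ip, scheme
```

These headers can be forged by any client that reaches the backend directly, so this only makes sense when the backend accepts connections from the proxy alone.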

phpMyAdmin: Incorrect format parameter

June 20, 2018, Arthur Pemberton, 3 Comments

If you’re getting “Incorrect format parameter” errors when you try to import files into phpMyAdmin, be sure to check your post_max_size and upload_max_filesize settings in php.ini. More than likely, they are too small for the file that you’re uploading.
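
As a rough way to check this ahead of time, the sketch below converts php.ini shorthand sizes (such as 8M) to bytes and compares them with a file size. The helper names are hypothetical, and the comparison is only approximate because the POST body is slightly larger than the file itself.

```python
# php.ini shorthand suffixes are powers of 1024.
_UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def php_size_to_bytes(value):
    """Convert a php.ini size such as '8M', '512K' or '1024' to bytes."""
    value = value.strip()
    suffix = value[-1:].upper()
    if suffix in _UNITS:
        return int(value[:-1]) * _UNITS[suffix]
    return int(value)


def fits_upload_limits(file_size, post_max_size, upload_max_filesize):
    """Return True if a file of file_size bytes should fit both limits."""
    upload_limit = php_size_to_bytes(upload_max_filesize)
    post_limit = php_size_to_bytes(post_max_size)
    # The multipart POST body adds some overhead on top of the file, so the
    # file has to be strictly smaller than post_max_size.
    return file_size <= upload_limit and file_size < post_limit
```

For example, `fits_upload_limits(5 * 1024 ** 2, "8M", "2M")` is False, because the 5 MiB file exceeds the 2M upload limit even though it fits within the 8M POST limit.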

Full page screenshots with Python and Selenium

December 18, 2017, Arthur Pemberton, 9 Comments

Currently, none of the major Selenium drivers (browsers) support the ability to easily take a screenshot of an entire web page. The following function takes multiple screenshots through the viewport and scrolls between screenshots, then stitches the resulting images into a single PNG.

Read More
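
The scroll-and-stitch approach depends on knowing where to scroll before each capture. The sketch below covers only that step; it is not the function from the full post. The name `scroll_offsets` is hypothetical, and it assumes the page height and viewport height are already known in pixels.

```python
def scroll_offsets(page_height, viewport_height):
    """Return the vertical scroll positions needed to cover the whole page."""
    if page_height <= 0 or viewport_height <= 0:
        raise ValueError("heights must be positive")
    if page_height <= viewport_height:
        # The whole page fits in a single capture.
        return [0]

    # Full viewport-sized steps that end before the last screenful.
    offsets = list(range(0, page_height - viewport_height, viewport_height))

    # The final capture is aligned to the bottom of the page, so it may
    # overlap the previous one; pasting each image at its own offset
    # handles the overlap.
    offsets.append(page_height - viewport_height)
    return offsets
```

With Selenium, the two heights could come from `driver.execute_script("return document.body.scrollHeight")` and `driver.execute_script("return window.innerHeight")`. Each offset would then be applied with `window.scrollTo(0, y)` before calling `driver.get_screenshot_as_png()`, and the captures pasted at their offsets into one image with a library such as Pillow.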

Calling openssl genrsa from Python

September 28, 2017, Arthur Pemberton, 0 Comments

I have been writing some automation code for a few dozen websites, and I wanted to generate SSL keys. I couldn’t get the `subprocess` call to work properly at first, so I thought I would post the final solution. The argument to `communicate()` is critical.
Read More
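
The excerpt does not say which argument is passed to `communicate()`, so the following is only a sketch under one plausible assumption: the key is passphrase-protected and the passphrase is supplied on stdin through `-passout stdin`. The function names are hypothetical, and this is not the author's final solution.

```python
import subprocess


def genrsa_command(bits=2048):
    """Build the openssl command line for an AES-256 encrypted RSA key."""
    return ["openssl", "genrsa", "-aes256", "-passout", "stdin", str(bits)]


def generate_rsa_key(passphrase, bits=2048):
    """Run openssl genrsa and return the PEM-encoded key as bytes."""
    proc = subprocess.Popen(
        genrsa_command(bits),
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    # communicate() writes the passphrase to stdin, closes the pipe, and
    # waits for the process to exit. Without input, openssl would block
    # waiting on stdin.
    out, err = proc.communicate(input=passphrase.encode("utf-8") + b"\n")
    if proc.returncode != 0:
        raise RuntimeError(err.decode("utf-8", "replace"))
    return out
```

Calling `generate_rsa_key("example passphrase")` requires the `openssl` binary to be on the PATH.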

Replicating Chilkat AES Cryptography with PyCrypto

November 22, 2016, Arthur Pemberton, 0 Comments

Today, I needed to replicate an encrypted query string token to interoperate with a third-party commercial application. I was able to determine the library, symmetric algorithm and secret key being used to create the token. It turned out to be a web application using the Chilkat .NET library to do the encryption and decryption. Specifically, it was using the Chilkat AES (aka Rijndael) methods.
Read More
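
One detail that often matters when matching another library's AES output is padding, since PyCrypto's block cipher modes expect input whose length is a multiple of the 16-byte block size and do not pad it for you. The sketch below shows PKCS#7 padding in plain Python. Whether the Chilkat token in question uses PKCS#7 is an assumption here; the excerpt does not state the padding scheme, mode or IV.

```python
BLOCK_SIZE = 16  # AES block size in bytes


def pkcs7_pad(data, block_size=BLOCK_SIZE):
    """Pad data so that its length is a multiple of block_size."""
    pad_len = block_size - (len(data) % block_size)
    # A full block of padding is added when data is already aligned.
    return data + bytes([pad_len]) * pad_len


def pkcs7_unpad(padded, block_size=BLOCK_SIZE):
    """Strip PKCS#7 padding, raising ValueError if it is malformed."""
    if not padded or len(padded) % block_size:
        raise ValueError("invalid padded length")
    pad_len = padded[-1]
    if pad_len < 1 or pad_len > block_size:
        raise ValueError("invalid padding")
    if padded[-pad_len:] != bytes([pad_len]) * pad_len:
        raise ValueError("invalid padding")
    return padded[:-pad_len]
```

The padded bytes would be passed to the cipher's `encrypt()` call, and `pkcs7_unpad()` applied to the result of `decrypt()`.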