Press "Enter" to skip to content

@timeout decorator

There was a problem lately in a Django (Python) system I work with: some of the memcache requests took a really long time to finish, several seconds in some cases. This resulted in long page loads and even caused some MySQL queries to pile up, producing an undesirably high load on the database. Something had to be done!

We need timeouts!

The main idea was that some timeout is needed, so that the system won't wait forever for memcache. I started investigating Django caching and dug deeper and deeper until I hit a wall named python-memcached, the library used to access memcache. Along the way I didn't see any indication of timeout settings, but there were some options passed down to the library, so I still had some hope. Unfortunately, it turned out that python-memcached did not support timeouts for requests; it was a dead end.

Still, we need timeouts, I started looking for some other way…

Act I: Signals

Searching the internet was really productive, and I managed to find a timeout script that used signals. The idea behind this solution is that you schedule an asynchronous alarm event in the future with signal.alarm(), which calls a handler function when it fires. A little added hurdle is that the original handler has to be stored, and set back when the timeout logic is done.

There was a problem with this solution: signal.alarm() only accepts integer values, which translate to seconds, so the smallest possible timeout is 1 second. That is already too much, since the memcache requests in question should finish within a few milliseconds. No need to worry, there is another signal function for the job, signal.setitimer(), which works with float values, so a fraction of a second can be set. The final code looked like this:

from functools import wraps
import signal
 
class TimeoutException(Exception):
    pass
 
def timeout(timeout):
    def wrap_function(func):
        @wraps(func)
        def __wrapper(*args, **kwargs):
            def handler(signum, frame):
                raise TimeoutException()
 
            # install our handler, remembering the original one
            old = signal.signal(signal.SIGALRM, handler)
            # timeout is in milliseconds, setitimer expects seconds
            signal.setitimer(signal.ITIMER_REAL, float(timeout) / 1000)
            try:
                result = func(*args, **kwargs)
            finally:
                # cancel any pending alarm, then restore the original handler
                signal.setitimer(signal.ITIMER_REAL, 0)
                signal.signal(signal.SIGALRM, old)
            return result
        return __wrapper
    return wrap_function
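Here is the decorator in action, repeated so the snippet runs on its own (this works on Unix only, and only from the main thread; the `fast`/`slow` function names are mine, for illustration):

```python
from functools import wraps
import signal
import time

class TimeoutException(Exception):
    pass

def timeout(timeout):
    def wrap_function(func):
        @wraps(func)
        def __wrapper(*args, **kwargs):
            def handler(signum, frame):
                raise TimeoutException()

            old = signal.signal(signal.SIGALRM, handler)
            signal.setitimer(signal.ITIMER_REAL, float(timeout) / 1000)
            try:
                result = func(*args, **kwargs)
            finally:
                signal.setitimer(signal.ITIMER_REAL, 0)  # cancel pending alarm
                signal.signal(signal.SIGALRM, old)
            return result
        return __wrapper
    return wrap_function

@timeout(50)  # 50 ms budget
def fast():
    return "done"

@timeout(50)
def slow():
    time.sleep(1)  # will be interrupted by SIGALRM
    return "done"

ok = fast()
print(ok)  # → done
timed_out = False
try:
    slow()
except TimeoutException:
    timed_out = True
print(timed_out)  # → True
```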

This is a working solution in simple programs, but in a framework such as Django, signals are delicate things that the system itself relies on. When I was testing this version with the test suite, it simply quit on the first SIGALRM. I had to find another way.

Act II: Processes

I started looking for ways to do timeouts with parallel processing and found Python's multiprocessing module. The idea here is to run the critical logic in a separate process, store the result in a queue, and terminate the process if it's still running when the timeout comes:

from functools import wraps
from multiprocessing import Process, Queue
 
class TimeoutException(Exception):
    pass
 
def timeout(timeout):
    def wrap_function(func):
        @wraps(func)
        def __wrapper(*args, **kwargs):
            def queue_wrapper(args, kwargs):
                # runs in the child process, hands the result back via the queue
                q.put(func(*args, **kwargs))
 
            q = Queue()
            p = Process(target=queue_wrapper, args=(args, kwargs))
            p.start()
            # wait at most `timeout` milliseconds for the child to finish
            p.join(float(timeout) / 1000)
            if p.is_alive():
                # still running: kill it and report the timeout
                p.terminate()
                p.join()
                raise TimeoutException()
            return q.get()
        return __wrapper
    return wrap_function

This solution was a step forward, but it had its own flaws. Not messing with signals any more (except for SIGTERM, but that is delivered to a different process, so no problem) was good, and the test suite was running steadily, but it was really slow. One of my colleagues pointed out that this solution wasn't suited for large frameworks like Django either, since process creation is based on the fork(2) system call, which creates a copy of the current process.

Copying the whole environment for a single call is a really bad idea: it uses a lot of memory and takes a considerable amount of time, which in the end renders the whole timeout effort useless. In the test suite I set a 100 ms timeout, and a lot of tests timed out; the system clearly waited the full 100 ms almost every time (hence the slowness). Some tests even failed, since they depend on the cache, which timed out frequently. Again, I had to find another way.
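To put a rough number on the process-creation cost, here is a self-contained measurement of my own (not from the original system); the absolute figures depend heavily on the machine and on how big the parent process is:

```python
import time
from multiprocessing import Process

def noop():
    pass

# direct call: effectively free
start = time.perf_counter()
noop()
direct = time.perf_counter() - start

# same call via a freshly created process: pay for fork/spawn + join
start = time.perf_counter()
p = Process(target=noop)
p.start()
p.join()
forked = time.perf_counter() - start

print("direct: %.6fs  via Process: %.6fs" % (direct, forked))
```

On a typical machine the process round trip is orders of magnitude slower than the direct call, which is exactly the overhead that ate the 100 ms budget.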

Act III: Threads

What does parallel processing but is cheaper than a whole process? Threads. I started by looking at Python's threading module, but my first experiments were clunky: I could start a thread with a function, but couldn't get the result out. I was starting to lose hope when I found ThreadPool.
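The clunky pattern I was fighting looks roughly like this (a sketch with names of my own choosing): a plain Thread target has no return value, so the result has to be smuggled out through a shared container, and join() with a timeout can only abandon the thread, never stop it:

```python
import threading
import time

def clunky_call(func, args, timeout_ms):
    result = {}

    def runner():
        # a thread target cannot return a value; stash it in a shared dict
        result["value"] = func(*args)

    t = threading.Thread(target=runner)
    t.daemon = True  # we cannot kill the thread, only abandon it
    t.start()
    t.join(timeout_ms / 1000.0)
    if "value" not in result:
        raise RuntimeError("timed out (and the thread is still running!)")
    return result["value"]

print(clunky_call(sum, ([1, 2, 3],), 100))  # → 6
timed_out = False
try:
    clunky_call(time.sleep, (1,), 10)
except RuntimeError:
    timed_out = True
print(timed_out)  # → True
```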

I found it in a StackOverflow answer, and it is an undocumented hidden gem of Python concurrent programming. It lives in the multiprocessing package and uses the same interface as (process) Pools. Pools are really convenient: you can apply a function to them asynchronously and get an async result object back. Calling get() on this object returns the result of the function applied on the pool, and you can set a timeout; if the work didn't finish before the timeout, a TimeoutError exception is raised. Now imagine this with threads!
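The mechanism in isolation looks like this (a minimal sketch of my own; pow and time.sleep stand in for real work):

```python
from multiprocessing import TimeoutError
from multiprocessing.pool import ThreadPool
import time

pool = ThreadPool(processes=1)

# fast work: get() simply returns the function's result
fast = pool.apply_async(pow, (2, 10))
value = fast.get(timeout=0.5)
print(value)  # → 1024

# slow work: get() gives up after the timeout and raises TimeoutError
slow = pool.apply_async(time.sleep, (1,))
timed_out = False
try:
    slow.get(timeout=0.05)
except TimeoutError:
    timed_out = True
print(timed_out)  # → True
```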

No more handlers, signals, or manual queue management, and the exception is even raised for us. No wonder my first implementation was only a few lines compared to the previous solutions, and it was working! But… as usual, it had some problems.

The first problem was that it relies on an attribute which is not accessible in Python < 2.7.2, but there was a clever fix for this. The other was that once it was running in the test suite, it exhausted all the threads really fast… because I was an idiot, creating a ThreadPool (with one thread) for every timeout. In a small environment this caused no problem at all, but in the test suite it was showering errors (and segmentation faults). The fix was a lazy global ThreadPool, which is created only if needed and only once, plus a try/except for thread.error for extreme cases. After fixing everything, it was running like a charm. Here is the final decorator:

from functools import wraps
from multiprocessing import TimeoutError
from multiprocessing.pool import ThreadPool
import thread
import threading
import weakref
 
thread_pool = None
 
def get_thread_pool():
    # lazy global pool: created on first use, then reused for every call
    global thread_pool
    if thread_pool is None:
        # fix for python <2.7.2: ThreadPool expects this attribute
        if not hasattr(threading.current_thread(), "_children"):
            threading.current_thread()._children = weakref.WeakKeyDictionary()
        thread_pool = ThreadPool(processes=1)
    return thread_pool
 
def timeout(timeout):
    def wrap_function(func):
        @wraps(func)
        def __wrapper(*args, **kwargs):
            try:
                async_result = get_thread_pool().apply_async(func, args=args, kwds=kwargs)
                # get() raises multiprocessing.TimeoutError if the call
                # does not finish within `timeout` milliseconds
                return async_result.get(float(timeout) / 1000)
            except thread.error:
                # could not start a new thread: fall back to a direct call
                return func(*args, **kwargs)
        return __wrapper
    return wrap_function
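For what it's worth, the same decorator translated to modern Python 3 might look like this (my own sketch, untested against the original Django setup): the `thread` module became `_thread`, whose `error` is now just an alias of RuntimeError, and the < 2.7.2 workaround is no longer needed:

```python
import time
from functools import wraps
from multiprocessing import TimeoutError
from multiprocessing.pool import ThreadPool

thread_pool = None

def get_thread_pool():
    # lazy global pool, as in the original
    global thread_pool
    if thread_pool is None:
        thread_pool = ThreadPool(processes=1)
    return thread_pool

def timeout(timeout_ms):
    def wrap_function(func):
        @wraps(func)
        def __wrapper(*args, **kwargs):
            try:
                async_result = get_thread_pool().apply_async(func, args=args, kwds=kwargs)
                return async_result.get(timeout_ms / 1000)
            except RuntimeError:
                # _thread.error is RuntimeError in Python 3: thread could
                # not be started, fall back to a direct call
                return func(*args, **kwargs)
        return __wrapper
    return wrap_function

@timeout(500)
def quick():
    return "ok"

@timeout(50)
def slow():
    time.sleep(1)

result = quick()
print(result)  # → ok
timed_out = False
try:
    slow()
except TimeoutError:
    timed_out = True
print(timed_out)  # → True
```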

This was a long journey with many obstacles along the way, but I learned a lot about concurrent programming, processes, and threads. I won't say that I understand it all fully, though. The three versions I created show that you can do things in many ways with Python, but in the end only one fits your exact needs, and I wouldn't be surprised if there was an even better solution out there…

You can find my experimenting in the following gist: https://gist.github.com/aorcsik/bcc17a299434ee2a2a1a
