Python multiprocessing
Sometimes I have a good time writing small, easy to understand test programs to research specific behaviours in one language or another. Sometimes the programs grow a bit unwieldy and not so easy to understand after all, but all's well that ends well. Over the last year or so, I've grown more and more interested in Python programming, and I find it very enjoyable. There are some strange constructs that can be hard to wrap your head around, and sometimes I run into some very weird problems, but nothing I'd call a big deal (imho).
My latest forays have been into multiprocessing and how it behaves in Python. It is one of the features that took me a while to wrap my head around, since there is some syntactic weirdness that could use addressing, or at least takes a bit of time to get used to. In short:
- Multiprocessing requires all objects to be pickled and sent over to the running process through a pipe. This means all objects must be picklable, including instance methods (see the sketch after this list).
- The default pickle implementation in Python can't handle instance methods, so some modifications need to be made.
- The correct parameters must be passed to the functions and callbacks handed to apply_async. Failing to do so causes very strange errors to be reported.
- Correct behaviour might be hard to predict, since values are calculated at different points in time. This is especially true if your code has side effects.
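As a quick illustration of the first two points, here is a minimal sketch (assuming Python 2, which this post is written for) of what happens when you hand a bound instance method to the default pickler without the workaround used further down. The class C and its method are just placeholders:

#!/usr/bin/python
import pickle

class C(object):
    def method(self):
        return 42

if __name__ == '__main__':
    c = C()
    try:
        # The default pickler has no recipe for bound instance methods.
        pickle.dumps(c.method)
    except (pickle.PicklingError, TypeError) as e:
        print "can't pickle the bound method: " + str(e)

(The pure-Python pickle module raises PicklingError, while cPickle raises TypeError, hence catching both.)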
The small but rather interesting test code below explores and shows some of the aspects mentioned above. Of special interest, imho, are the timing differences; they clearly show what you are getting yourself into when doing multiprocessing.
#!/usr/bin/python
import multiprocessing
import copy_reg
import types

def _pickle_method(method):
    # Reduce a bound method to something picklable: its name, the
    # instance it is bound to, and the class it was defined on.
    func_name = method.im_func.__name__
    obj = method.im_self
    cls = method.im_class
    return _unpickle_method, (func_name, obj, cls)

def _unpickle_method(func_name, obj, cls):
    # Walk the MRO to find the function, then rebind it to the instance.
    for cls in cls.mro():
        try:
            func = cls.__dict__[func_name]
        except KeyError:
            pass
        else:
            break
    return func.__get__(obj, cls)

# Teach pickle how to handle instance methods.
copy_reg.pickle(types.MethodType, _pickle_method, _unpickle_method)

class A:
    def __init__(self):
        print "A::__init__()"
        self.weird = "weird"

class B(object):
    def __init__(self, myA):
        print "B::__init__()"
        self.a = myA

    def doAsync(self, lala):
        print "B::doAsync()"
        return lala**lala

    def callBack(self, result):
        print "B::callBack()"
        self.a.weird = "wherio"
        print result

def callback(result):
    print "callback result: " + str(result)

def func(x):
    print "func"
    return x**x

if __name__ == '__main__':
    pool = multiprocessing.Pool(2)
    a = A()
    b = B(a)
    print a.weird
    print "Starting"
    result1 = pool.apply_async(func, [4], callback=callback)
    result2 = pool.apply_async(b.doAsync, [8], callback=b.callBack)
    print a.weird
    print "result1: " + str(result1.get())
    print "result2: " + str(result2.get())
    print a.weird
    print "End"
The above code resulted in the following two runs, and if you look closely, the timing problems show up rather clearly: things simply don't happen in the order you might always expect when parallelizing an application.
oan@laptop4:~$ ./multiprocessingtest.py
A::__init__()
B::__init__()
weird
Starting
weird
func
B::doAsync()
callback result: 256
B::callBack()
16777216result1: 256
result2: 16777216
wherio
End

oan@laptop4:~$ ./multiprocessingtest.py
A::__init__()
B::__init__()
weird
Starting
weird
func
B::doAsync()
callback result: 256
B::callBack()
16777216
result1: 256
result2: 16777216
wherio
End
One more warning is in order. A job that leaves via the multiprocessing.Pool and later triggers a callback has one effect that could take some getting used to: the job itself runs in a worker process on a pickled copy of your objects, while the callback runs back in the parent process. Any change the job makes to an object has therefore not taken place in the context of the callback.
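To make that concrete, here is a minimal sketch (hypothetical names, again assuming Python 2) where the job mutates a module-level dict. The worker only ever touches its own copy, so the callback, running in the parent process, still sees the original value:

#!/usr/bin/python
import multiprocessing

state = {"flag": "unchanged"}

def job(x):
    # Runs in the worker process, on that process's copy of state.
    state["flag"] = "changed in worker"
    return x * 2

def done(result):
    # Runs in the parent process; the parent's state was never touched.
    print "result: " + str(result)
    print "flag in parent: " + state["flag"]   # still prints "unchanged"

if __name__ == '__main__':
    pool = multiprocessing.Pool(1)
    result = pool.apply_async(job, [21], callback=done)
    result.get()
    pool.close()
    pool.join()

The same applies to self in an instance method job like doAsync above: the worker gets a pickled copy, so any mutation it makes never travels back to the parent.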