Welcome to part 11 of the intermediate Python programming tutorial series. In this part, we're going to talk more about the built-in library: multiprocessing.
In the previous multiprocessing tutorial, we showed how you can spawn processes. If those processes are fine acting on their own, without communicating with each other or back to the main program, then that approach works well. These processes could also share a common database, or something like that, to work together. Many times, though, it will make more sense to use multiprocessing to do some processing and then return results back to the main program. That's what we're going to cover here.
To begin, we're going to import Pool:
from multiprocessing import Pool
Pool allows us to create a pool of worker processes.
Let's say we want to run a function over each item in an iterable. Let's just do:
def job(num):
    return num * 2
Simple enough, now let's set up the processes:
if __name__ == '__main__':
    p = Pool(processes=20)
    data = p.map(job, [i for i in range(20)])
    p.close()
    print(data)
In the above case, what we're going to do is first set up the Pool object, which will have 20 processes that we'll allow to do some work. Next, we're going to map the job function to a list of parameters ([i for i in range(20)]). When done, we close the pool, and then we print the result. In this case, we get:
[0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38]
If you raise the range and the number of processes into the hundreds, you can watch your CPU max out, and see the worker processes appear in your task manager, if you like. Around 500 seems to do the trick for me.