Solving Python's everlasting problem of slow code

Python is truly my go-to programming language. The ease of use, clean code and speed of development it delivers are unmatched. I have often been able to prototype an idea very quickly. But when I moved on to larger workloads, one problem kept coming up: execution speed.

Because Python enforces far fewer constraints than statically typed languages such as Go, Rust or C/C++, it is easier and faster to write code in Python, but it will also never achieve the same level of speed.

So let me be clear from the start: there are ways of improving Python's speed, but they all come with added complexity and their own downsides. If you don't necessarily have to stick to Python, your best choice would probably be to just use a language such as Go.

Alternative interpreters/compilers

PyPy

PyPy is an alternative to Python's default interpreter, CPython. Thanks to its just-in-time (JIT) compiler, PyPy is on average about 4x faster than CPython. Not all libraries work with PyPy though; a list of compatible libraries is available here.
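
PyPy shines most on pure-Python, CPU-bound code, because the JIT can compile hot loops down to machine code. As a minimal sketch (the workload here is arbitrary), a script like the following needs no changes at all:

# A pure-Python, CPU-bound loop - the kind of code PyPy's JIT speeds up.
total = 0
for i in range(10_000_000):
    total += i * i
print(total)

Instead of python3 script.py you simply run pypy3 script.py and let the JIT do its work.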

Nuitka

This one is arguably a favorite of mine. Nuitka first translates your Python code to C and then compiles it. Along the way it applies a lot of clever optimizations, which can offer a speedup of well over 300%. When you compile in standalone mode, you get a single binary that includes all of your program's dependencies and can be easily distributed. The target computer doesn't even need to have Python installed anymore to run the program! At the same time, Nuitka offers greater compatibility than PyPy.
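
As a minimal usage sketch (assuming Nuitka was installed with pip install nuitka; my_script.py is a placeholder name):

python -m nuitka --onefile my_script.py

The --onefile option is what bundles your program and all of its dependencies into that single, self-contained executable.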


Solving the problem of concurrency

When it comes to concurrency (making progress on multiple tasks at the same time), there is a problem with Python: the Global Interpreter Lock (GIL). The GIL allows only one thread per Python process to execute Python bytecode at any given moment. This article goes into more detail: Link
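
To see the GIL in action, here is a small sketch (the workload size is arbitrary): splitting a CPU-bound countdown across two threads takes about as long as running it on a single thread, because the threads merely take turns holding the GIL instead of running in parallel.

from datetime import datetime
import threading


def countdown(n):
    # Pure-Python, CPU-bound work; only one thread can execute it at a time.
    while n > 0:
        n -= 1


start_time = datetime.now()

threads = [threading.Thread(target=countdown, args=(10_000_000,)) for _ in range(2)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

print("Took: ", datetime.now() - start_time)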

Let's look at some example code:

from datetime import datetime

import httpx


start_time = datetime.now()

for _ in range(0, 10):
    httpx.get("https://httpbin.org/get")

print("Took: ", datetime.now() - start_time)

Here we send 10 HTTP GET requests to httpbin and measure how long it takes.

The output: Took:  0:00:04.893904

On my machine it took nearly 5 seconds to send those requests and receive the responses. Let's see if we can speed things up a little.

Asynchronous Code (asyncio)

By running code asynchronously, we can make use of waiting time that occurs throughout the program. Take web requests as an example: normally the program sends an HTTP request and then waits for the server to send back a response before going on to the next line of code.

With asynchronous code, the next web request (or task) is started while we are still waiting for the first response to arrive. The main downside is that asynchronous code becomes harder to debug.

from datetime import datetime
import asyncio

import httpx


async def run():
    async with httpx.AsyncClient() as client:
        # Start all 10 requests at once and wait until every response has arrived.
        requests = [client.get("https://httpbin.org/get") for _ in range(0, 10)]
        await asyncio.gather(*requests)


start_time = datetime.now()
asyncio.run(run())
print("Took: ", datetime.now() - start_time)

By modifying our code to use asyncio, we reduce the time the program takes to complete all requests to 0:00:01.466089. The program now runs more than 3 times as fast.

Multiprocessing

Since one Python process can only ever execute one thread at a time (→ GIL), why not use multiple processes? That is exactly what the multiprocessing package allows us to do. While this speeds up the program, each additional process takes up system resources, and exchanging data between the processes becomes an additional challenge.

Modifying our source code once again, this time to start multiple processes for completing the requests, we get:

from datetime import datetime
from multiprocessing import Pool, cpu_count

import httpx


def run():
    httpx.get("https://httpbin.org/get")


if __name__ == "__main__":
    # The __main__ guard keeps freshly spawned worker processes
    # (e.g. on Windows and macOS) from re-running this block.
    start_time = datetime.now()

    # Set the number of processes running at the same time
    # equal to the number of CPU cores.
    pool = Pool(cpu_count())

    for _ in range(0, 10):
        pool.apply_async(run)
    pool.close()
    pool.join()

    print("Took: ", datetime.now() - start_time)

This brings our total down to 0:00:00.531039! That is roughly a third of the time the asynchronous implementation took.
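
One note on exchanging data: our example throws the responses away. If you need results back in the parent process, Pool.map is a common pattern; it pickles each worker's return value and sends it back. A minimal sketch (fetch is a hypothetical helper, using the same httpbin URL):

from multiprocessing import Pool, cpu_count

import httpx


def fetch(url):
    # Runs in a worker process; the returned status code is pickled
    # and sent back to the parent process.
    return httpx.get(url).status_code


if __name__ == "__main__":
    with Pool(cpu_count()) as pool:
        # map() blocks until all workers are done and returns the results in order.
        status_codes = pool.map(fetch, ["https://httpbin.org/get"] * 10)
    print(status_codes)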
