Python Compiler And Interpreter

Feb 12th, 2021 - written by Kimserey.

Last week we looked at how compilers work in general. We saw that they are mostly composed of two parts, the front end and the back end: the front end compiles programming source code to an intermediate representation, and the back end is the runtime. In today’s post we will look specifically into how Python gets compiled and interpreted with the default implementation, CPython.

The Python Executable

In this post, we assume that we are using CPython, installed and available in our bin, whether on the computer itself or in a virtual environment. For example, on my machine:

❯ where python3
/Library/Frameworks/Python.framework/Versions/3.8/bin/python3

❯ file /Library/Frameworks/Python.framework/Versions/3.8/bin/python3
/Library/Frameworks/Python.framework/Versions/3.8/bin/python3: Mach-O 64-bit executable x86_64

We can see that python3 is a Mach-O 64-bit executable x86_64 file, Mach-O being the executable format for macOS.

On an Ubuntu machine I would see the following:

❯ file ./python3.8
./python3.8: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=02526282ea6c4d6eec743ad74a1eeefd035346a3, for GNU/Linux 3.2.0, stripped

When you download Python from the official site https://www.python.org/downloads/, you get the CPython implementation. An easy way to verify that is to fire up the interpreter and run the following:

❯ python3
Python 3.8.4 (v3.8.4:dfa645a65e, Jul 13 2020, 10:45:06) 
[Clang 6.0 (clang-600.0.57)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import platform
>>> platform.python_implementation()
'CPython'

We can see that platform.python_implementation() returns CPython, which is the default implementation of Python. CPython is an implementation of Python written in C. The executable created from the compilation of CPython contains a compiler, which compiles the programming code to Python Bytecode, and an interpreter, which interprets the Python Bytecode and executes the machine code instructions corresponding to each operation.
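The same check can be done from a script rather than the interactive prompt. The following is a minimal sketch using only the standard library; the helper name describe_interpreter is made up for this example.

```python
import platform
import sys


def describe_interpreter():
    """Collect a few facts about the interpreter running this script.

    describe_interpreter is a hypothetical helper name used for illustration.
    """
    return {
        "implementation": platform.python_implementation(),  # e.g. 'CPython'
        "implementation_name": sys.implementation.name,      # e.g. 'cpython'
        "version": platform.python_version(),
        "executable": sys.executable,                        # path of the running binary
    }


for key, value in describe_interpreter().items():
    print(f"{key}: {value}")
```

On an alternative implementation such as PyPy, the first two values would differ, which makes this a quick way to confirm which implementation runs a given script.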

CPython can be downloaded from the official GitHub repository and if we build it locally, we’ll get python.exe, which would be the same executable as the one installed from the official site (minus the version difference).

In order to run a script, we do:

❯ python3 my_file.py

or to run a module as script, we do:

❯ python3 -m my_module

Python will then first compile to Bytecode, then interpret the Bytecode in order to execute the commands requested.
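The two steps can be reproduced by hand with the built-in compile() and exec() functions. This is only a sketch to make the split visible; the source string and the "<example>" file name are arbitrary values chosen for illustration.

```python
# A tiny program given as text, standing in for the content of a script.
source = "result = 6 * 7"

# Step 1: compile the source text into a code object holding the Bytecode.
code_object = compile(source, "<example>", "exec")

# Step 2: hand the code object to the interpreter, which executes it.
namespace = {}
exec(code_object, namespace)

print(type(code_object).__name__)  # code
print(namespace["result"])         # 42
```

Running python3 my_file.py performs both steps for us, so we rarely need to call compile() directly.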

Compiler and Interpreter

Now that we understand what the python3 executable is, we can look at what the compilation and interpretation steps are.

The first compilation step converts Python programming source code into Python Bytecode. For imported modules, the Python Bytecode is then cached in a file with a .pyc extension. Those files can be found under __pycache__/ folders. The Python Bytecode plays the role of machine code for the CPython VM (or CPython interpreter).
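To see where the .pyc file for a given source file would be located, the standard library exposes importlib.util.cache_from_source. A small sketch, where my_module.py is a hypothetical file name that does not need to exist:

```python
import importlib.util
import sys

# "my_module.py" is a hypothetical file name; the function only computes a path.
pyc_path = importlib.util.cache_from_source("my_module.py")

print(pyc_path)                      # e.g. __pycache__/my_module.cpython-38.pyc
print(sys.implementation.cache_tag)  # e.g. cpython-38
```

The cache tag embedded in the file name lets several Python versions keep their Bytecode side by side in the same __pycache__/ folder.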

When we do python3 -m my_module, if the Bytecode was already generated and the files haven’t changed, the compilation step is skipped and the interpretation step starts right away. Usually compilation and interpretation are done in one go, but we can also force the compilation separately using compileall:

❯ python3 -m compileall .
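The same compilation can be triggered from Python code with compileall.compile_dir. The sketch below works in a temporary directory so that it leaves nothing behind; my_module.py is a hypothetical module created only for the example.

```python
import compileall
import importlib.util
import pathlib
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Write a throwaway module to compile (hypothetical name and content).
    source_file = pathlib.Path(tmp) / "my_module.py"
    source_file.write_text('print("Hello World")\n')

    # Compile every .py file found under the directory, without running any.
    succeeded = bool(compileall.compile_dir(tmp, quiet=1))

    # Locate the .pyc that should have been written for our module.
    pyc_file = pathlib.Path(importlib.util.cache_from_source(str(source_file)))
    pyc_written = pyc_file.exists()

print("compiled:", succeeded)
print("pyc written:", pyc_written)
```

Note that the module is only compiled here: its print call never runs, which shows that compilation and interpretation really are separate steps.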

Python Bytecode is a special language understood by the CPython VM, but we also have access to a human readable version with the dis module, whose name stands for disassembler. A disassembler is a program that converts machine code to assembly language, as opposed to an assembler, which converts assembly language to machine code. The assembly language in question here is a human readable form of the Python Bytecode. For example, we can create a function:

>>> def say_hello():
...     print("Hello World")
...

And we can then use dis to look into the generated Python Bytecode:

>>> from dis import dis
>>> dis(say_hello)
  2           0 LOAD_GLOBAL              0 (print)
              2 LOAD_CONST               1 ('Hello World')
              4 CALL_FUNCTION            1
              6 POP_TOP
              8 LOAD_CONST               0 (None)
             10 RETURN_VALUE

Each line corresponds to an opcode which will be understood by the CPython VM. The output above comes from Python 3.8; the exact opcodes differ between CPython versions.
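The disassembly can also be inspected programmatically. As a sketch, the raw Bytecode is available as bytes on the function's code object, and dis.get_instructions yields one entry per opcode. No expected output is listed because opcode names vary between CPython versions.

```python
import dis


def say_hello():
    print("Hello World")


# The raw Bytecode is a bytes object attached to the function's code object.
raw_bytecode = say_hello.__code__.co_code
print(raw_bytecode.hex())

# dis.get_instructions yields one Instruction per opcode, in order.
opnames = []
for instruction in dis.get_instructions(say_hello):
    opnames.append(instruction.opname)
    print(instruction.offset, instruction.opname, instruction.argrepr)
```

The hexadecimal string is what the VM actually consumes; the names printed by the loop are the human readable form of those same bytes.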

The code used to compile the Python code is present in CPython under cpython/Python/compile.c. It contains the compiler and assembler.

Once the Python Bytecode is generated, it is fed into the VM, which runs it through the eval function. The evaluation can be found under cpython/Python/ceval.c, which contains the evaluation loop, taking care of frames, setting up the environment with variables, and interpreting the opcodes with the appropriate calls.
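To give an intuition of what an evaluation loop does, here is a toy stack-based interpreter written in Python. It is not how ceval.c is implemented: it handles only the handful of opcodes from the say_hello disassembly, ignores frames entirely, and the run function and its tuple-based instruction format are invented for this illustration.

```python
def run(instructions, global_names):
    """Toy evaluation loop: dispatch on each opcode and update a value stack.

    Hypothetical helper for illustration only, not CPython's real loop.
    """
    stack = []
    for opname, argument in instructions:
        if opname == "LOAD_GLOBAL":
            stack.append(global_names[argument])
        elif opname == "LOAD_CONST":
            stack.append(argument)
        elif opname == "CALL_FUNCTION":
            # argument is the number of positional arguments on the stack.
            arguments = [stack.pop() for _ in range(argument)][::-1]
            function = stack.pop()
            stack.append(function(*arguments))
        elif opname == "POP_TOP":
            stack.pop()
        elif opname == "RETURN_VALUE":
            return stack.pop()
        else:
            raise ValueError(f"unknown opcode: {opname}")
    raise RuntimeError("program ended without RETURN_VALUE")


# The say_hello Bytecode from the 3.8 disassembly, written as (opcode, argument).
program = [
    ("LOAD_GLOBAL", "print"),
    ("LOAD_CONST", "Hello World"),
    ("CALL_FUNCTION", 1),
    ("POP_TOP", None),
    ("LOAD_CONST", None),
    ("RETURN_VALUE", None),
]

result = run(program, {"print": print})  # prints Hello World
print(result)                            # None
```

The real loop follows the same idea, a big dispatch over opcodes that push and pop values on a stack, with much more machinery around it.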

Python is known for being “Batteries included”, an expression referring to products shipped with their batteries so that they are usable straight away: Python ships with a large set of modules, precompiled and available to be used without any extra installation. This set of modules is known as the Python Standard Library, and its Python modules can be found under cpython/Lib.

Extra modules can be installed with pip install when we create an application. For regular Python modules, their respective Bytecode is loaded into the process by the interpreter at runtime, when they are imported. In the case of Python modules having a C implementation, the packages ship with .so files, which are compiled shared libraries that are dynamically linked at runtime.
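We can ask the import system where a module would be loaded from, which tells a regular Python module apart from a compiled one. A minimal sketch using importlib; the helper name module_origin is made up for this example.

```python
import importlib.machinery
import importlib.util


def module_origin(name):
    """Return where the import system would load `name` from, or None if not found.

    module_origin is a hypothetical helper name used for illustration.
    """
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


# File suffixes this interpreter accepts for compiled extension modules.
print(importlib.machinery.EXTENSION_SUFFIXES)

print(module_origin("json"))  # a regular Python package: path to a source file
print(module_origin("sys"))   # compiled into the executable itself: built-in
```

For a pip-installed package with a C implementation, the origin would typically be a path ending with one of the printed extension suffixes.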

And that concludes how Python code gets compiled and run!

Conclusion

CPython is the default implementation of Python, providing both the compiler and the interpreter. It is used to run Python scripts or modules. In this post we saw a general picture of what happens when we run a Python script. We saw that the CPython executable compiles the code into Python Bytecode, which then gets interpreted by the CPython VM. Hope you liked this post and I'll see you on the next one!

Designed, built and maintained by Kimserey Lam.