how is python written
To appreciate C one must learn assembly; to appreciate Python one must learn C. Each layer stacks on the one below it: Assembly -> C -> Python -> (and now, arguably, English).
Python has many implementations (PyPy, Jython, etc.), but the original and most popular is CPython.
getting started with CPython source code
Clone it:
$ git clone https://github.com/python/cpython
$ cd cpython
$ git checkout v3.8.0b4 # or whatever version you prefer
What’s inside:
cpython/
│
├── Doc ← Source for the documentation
├── Grammar ← The computer-readable language definition
├── Include ← The C header files
├── Lib ← Standard library modules written in Python
├── Mac ← macOS support files
├── Misc ← Miscellaneous files
├── Modules ← Standard Library Modules written in C
├── Objects ← Core types and the object model
├── Parser ← The Python parser source code
├── PC ← Windows build support files for older versions of Windows
├── PCbuild ← Windows build support files
├── Programs ← Source code for the python executable and other binaries
├── Python ← The CPython interpreter source code
└── Tools ← Standalone tools useful for building or extending Python
After forking and cloning, I broke things down into:
- Finding the entry point (the main function) of Python
- Locating where specific functions are implemented
- Editing those functions
- Remaking grammar files/source files and running the modified Python
table of contents
- The Walrus Operator
- String Optimization
- Chained Comparison Operators
- Integer Optimization
- How Python’s id() Works
- Adding a New Statement to Python
- Function Overloading in Python
- Parser and Tokenization
- Visualizing Abstract Syntax Trees
- Pointers in Python
- The Print Function Internals
- Python is slow as hell
- Python Execution Process
the walrus operator
Python 3.8 added the Walrus Operator (:=). It assigns and returns in one shot.
Say you’re building a tiny shell that takes commands until you type exit:
Without walrus:

import os

command = input("$ ")
while command != "exit":
    os.system(command)
    command = input("$ ")

With walrus:

while (command := input("$ ")) != "exit":
    os.system(command)
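The walrus operator is also handy in comprehensions, where it lets you compute a value once and reuse it in both the filter and the result. A small sketch:

```python
# Compute x * x once, test it, and keep it - no double evaluation
results = [y for x in range(5) if (y := x * x) > 4]
print(results)  # [9, 16]
```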
what’s weird about the walrus operator
You might expect a := 10 to work on its own. Nope - not without parentheses:
>>> a := 10
File "<stdin>", line 1
a := 10
^
SyntaxError: invalid syntax
But wrap it in parentheses and it’s fine:
>>> (a := 10)
10
This is documented in wtf-python. Python disallows non-parenthesized assignment expressions but allows non-parenthesized assignment statements. Annoying.
modifying Python’s grammar
But we can fix that. Just modify the grammar:
# Add walrus operator here so we don't need () every time
expr_stmt: testlist_star_expr (annassign | augassign (yield_expr|testlist) |
[('=' (yield_expr|testlist_star_expr))+ [TYPE_COMMENT]] |
[(':=' (yield_expr|testlist_star_expr))+ [TYPE_COMMENT]]
)
The grammar file isn’t used directly by the compiler. A parser table generated by pgen is what actually matters. So after changing the grammar, you run make regen-pgen and make to rebuild. Now it works:
>>> a := 10
>>> a
10
string optimization
Python does something clever with strings called String Interning:
>>> a = 'samit'
>>> b = 'samit'
>>> a == b
True
>>> id(a)
4337581920 # memory location
>>> id(b)
4337581920 # same memory location
Both variables point to the same memory location.
string interning
The idea is simple: don’t create a new copy of a string every time. Keep one copy of each distinct string and use pointer references everywhere else.
You can check if two objects are the same in-memory object with is:
>>> 'python' is 'python'
True
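Interning only happens automatically for certain strings; for everything else, sys.intern lets you opt in. A quick sketch (the strings are built at runtime with join so the compiler can't merge them as constants):

```python
import sys

a = "".join(["hello", " ", "world"])  # built at runtime - not auto-interned
b = "".join(["hello", " ", "world"])
print(a == b, a is b)                 # True False - equal, but two objects

c = sys.intern(a)
d = sys.intern(b)
print(c is d)                         # True - one shared interned copy
```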
how does it work
Here’s the actual C code:
void PyUnicode_InternInPlace(PyObject **p)
{
    PyObject *s = *p;
    PyObject *t;

    // If it's already interned, there's nothing to do
    if (PyUnicode_CHECK_INTERNED(s))
        return;
    if (interned == NULL) {
        interned = PyDict_New(); // create the global mapping that acts as the cache
        if (interned == NULL) {
            PyErr_Clear();
            return;
        }
    }
    // The key to string interning: we store pointers, not copies of strings
    t = PyDict_SetDefault(interned, s, s);
    if (t == NULL) {
        PyErr_Clear();
        return;
    }
    if (t != s) {
        Py_INCREF(t);     // one more reference now points at the canonical copy
        Py_SETREF(*p, t); // set *p = t, so the caller's string points at the interned instance
        return;
    }
    // The interned dict holds two references (key and value) that we don't count
    Py_REFCNT(s) -= 2;
    _PyUnicode_STATE(s).interned = SSTATE_INTERNED_MORTAL;
}
I added a print statement to the C code and you can literally watch interning happen:
>>> a = 'samit'
Already interned: __doc__
Already interned: __annotations__
Already interned: top
Interning: <module>
Interning: samit
Already interned: stderr
Already interned: flush
Already interned: flush
Already interned: stdout
>>> b = 'samit'
Already interned: samit # already interned
chained comparison operators
Python lets you chain comparisons. a < b < c actually means (a < b) and (b < c). Like how math notation works.
>>> -3 < -2 < -1
True
>>> 3 > 2 == 1
False
In C, the same expressions give opposite results:
#include <stdio.h>

int main(int argc, char *argv[]) {
    printf("%d\n", -3 < -2 < -1);
    printf("%d\n", 3 > 2 == 1);
    return 0;
}
// Output:
// 0  (false)
// 1  (true)
C evaluates -3 < -2 < -1 as ((-3 < -2) < -1) because of left-to-right associativity. Classic C gotcha.
how Python implements comparison chaining
Looking at the bytecode with dis:
import dis
dis.dis("1 < 2 < 3")
0 RESUME 0
1 LOAD_CONST 0 (1)
LOAD_CONST 1 (2)
SWAP 2
COPY 2
COMPARE_OP 2 (<)
COPY 1
TO_BOOL
POP_JUMP_IF_FALSE 5 (to L1)
POP_TOP
LOAD_CONST 2 (3)
COMPARE_OP 2 (<)
RETURN_VALUE
L1: SWAP 2
POP_TOP
RETURN_VALUE
Python short-circuits here: once it knows 1 < 2, it checks 2 < 3. C would just evaluate (1 < 2) < 3 which is completely different.
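One consequence of the chaining rule: the middle operand is evaluated exactly once, and the second comparison is skipped entirely if the first one fails. A small demonstration:

```python
def f(x):
    print("f called")
    return x

print(1 < f(2) < 3)   # "f called" prints once, not twice, then True
print(f(5) < 3 < 10)  # f runs, 5 < 3 is False, so 3 < 10 is never evaluated
```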
integer optimization
Python preallocates integers from -5 to 256. Any time you use a number in that range, you get the same object back instead of a new one.
Check it:
>>> x, y = 36, 36
>>> id(x) == id(y)
True
>>> x, y = 257, 257
>>> id(x) == id(y)
False
Here’s the C code:
static PyObject *
get_small_int(sdigit ival)
{
    PyObject *v;
    // NSMALLNEGINTS = 5, NSMALLPOSINTS = 257
    assert(-NSMALLNEGINTS <= ival && ival < NSMALLPOSINTS);
    v = (PyObject *)&small_ints[ival + NSMALLNEGINTS];
    Py_INCREF(v);
    return v; // the preallocated integer instance
}
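You can probe the cache boundaries from Python itself. int(str(n)) builds the number at runtime, which sidesteps compile-time constant folding (this is a CPython implementation detail, not a language guarantee):

```python
# Two independent int(...) calls return the same object only if n is cached
def is_cached(n):
    return int(str(n)) is int(str(n))

print(is_cached(-5))   # True  - lower bound of the cache
print(is_cached(-6))   # False - just outside it
print(is_cached(256))  # True  - upper bound
print(is_cached(257))  # False
```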
peephole optimizations
CPython also does “peephole optimizations” which lead to some weird behavior:
>>> x = 1000
>>> y = 1000
>>> x is y
False # In REPL
# But in a file this would return True
CPython preloads:
- Integers between -5 and 256
- Strings under 20 characters that contain only ASCII letters, digits, and underscores
>>> s1 = "realpython"
>>> s2 = "realpython"
>>> s1 is s2
True
>>> s1 = "Real Python!" # Contains non-alphanumeric characters
>>> s2 = "Real Python!"
>>> s1 is s2
False
how Python’s id() works
id() just returns the memory address of an object. Here it is in bltinmodule.c:
static PyObject *
builtin_id(PyModuleDef *self, PyObject *v)
{
    PyObject *id = PyLong_FromVoidPtr(v); // the argument's memory address, as a Python int
    if (id && PySys_Audit("builtins.id", "O", id) < 0) {
        Py_DECREF(id);
        return NULL;
    }
    return id;
}
PyLong_FromVoidPtr(v) converts a pointer to a Python long integer:
PyObject *
PyLong_FromVoidPtr(void *p)
{
    return PyLong_FromUnsignedLong((unsigned long)(uintptr_t)p);
}
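Since id() is just the address in CPython, you can even round-trip it with ctypes. This is a CPython-only party trick, not something to use in real code:

```python
import ctypes

x = ["some", "object"]
addr = id(x)  # in CPython, the raw memory address (PyLong_FromVoidPtr)
same = ctypes.cast(addr, ctypes.py_object).value  # reinterpret the address
print(same is x)  # True - we got the very same object back
```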
the ‘is’ operator
is checks if two objects point to the same memory location:
>>> a = 10
>>> b = 10
>>> a is b
True
>>> id(a)
4343226816
>>> id(b)
4343226816
In bytecode it’s the IS_OP instruction:
>>> import dis
>>> dis.dis("a is b")
0 RESUME 0
1 LOAD_NAME 0 (a)
LOAD_NAME 1 (b)
IS_OP 0
RETURN_VALUE
The actual implementation in ceval.c:
case TARGET(IS_OP): {
    PyObject *right = POP();
    PyObject *left = TOP();
    // Compare the two pointers; oparg flips the result for 'is not'
    int res = Py_Is(left, right) ^ oparg;
    // Push the boolean result onto the stack
    PyObject *b = res ? Py_True : Py_False;
    Py_INCREF(b);
    SET_TOP(b);
    Py_DECREF(left);
    Py_DECREF(right);
    DISPATCH();
}
oparg is how it handles both is and is not:
>>> dis.dis("a is b")
0 RESUME 0
1 LOAD_NAME 0 (a)
LOAD_NAME 1 (b)
IS_OP 0
RETURN_VALUE
>>> dis.dis("a is not b")
0 RESUME 0
1 LOAD_NAME 0 (a)
LOAD_NAME 1 (b)
IS_OP 1
RETURN_VALUE
adding a new statement to Python
Python’s main function looks different depending on your OS:
/* Minimal main program -- everything is loaded from the library */
#include "Python.h"

#ifdef MS_WINDOWS
int
wmain(int argc, wchar_t **argv)
{
    return Py_Main(argc, argv);
}
#else
int
main(int argc, char **argv)
{
    return Py_BytesMain(argc, argv);
}
#endif
I added a new statement called nuke that just kills the runtime with exit code 45:
# in Grammar/python.gram
| 'break' { _PyAST_Break(EXTRA) }
| 'continue' { _PyAST_Continue(EXTRA) }
| 'nuke' { _PyAST_nuke(EXTRA) }

// in Python/compile.c
case Break_kind:
    return compiler_break(c);
case Continue_kind:
    return compiler_continue(c);
case Nuke_kind:
    return compiler_nuke(c);

static int
compiler_nuke(struct compiler *c)
{
    exit(45); // the entire implementation
}
Run make regen-pegen (Python 3.9+ uses pegen; earlier versions used pgen and make regen-pgen) then make, and we’ve got a new keyword:
>>> nuke
[Program exits with code 45]
function overloading in Python
Python doesn’t do function overloading like C++:
// C++ supports function overloading (C does not)
int area(int len, int breadth) { return len * breadth; }
float area(int radius) { return 3.14 * radius * radius; }
// area(3,4) calls the first function, area(3) calls the second
In Python, defining a function with the same name just overwrites the previous one:
def area(length, breadth):
    return length * breadth

locals()
# {'area': <function area at 0x104d1b420>}

def area(radius):
    return radius * radius * 3.14

locals()
# {'area': <function area at 0x104d1b2e0>}

print(area(4))    # 50.24
print(area(3, 4)) # TypeError: area() takes 1 positional argument but 2 were given
implementing function overloading
You can fake it with decorators though:
from inspect import getfullargspec
from functools import wraps

class OverloadRegistry:
    """Singleton class to hold overloaded functions."""
    _instance = None  # private

    def __init__(self) -> None:
        if OverloadRegistry._instance is not None:
            raise Exception("Use get_instance() to get singleton instance")
        self.registry = {}
        OverloadRegistry._instance = self

    @classmethod
    def get_instance(cls):
        if cls._instance is None:
            cls._instance = OverloadRegistry()
        return cls._instance

    def _make_key(self, fn, args=None):
        if args is None:
            args = getfullargspec(fn).args
        # key = (module, name, number of arguments)
        return (fn.__module__, fn.__name__, len(args))

    def register(self, fn):
        key = self._make_key(fn)
        self.registry[key] = fn  # attach this key to the function

        @wraps(fn)  # @wraps preserves the metadata of the original function
        def wrapper(*args, **kwargs):
            match_key = (fn.__module__, fn.__name__, len(args))
            target_fn = self.registry.get(match_key)
            if not target_fn:
                raise TypeError("No matching function for these arguments")
            return target_fn(*args, **kwargs)

        return wrapper

def overload(fn):
    return OverloadRegistry.get_instance().register(fn)

@overload
def area(l, b): return l * b

@overload
def area(r): return r * r * 3.14

print(area(5, 6))  # 30
print(area(3))     # 28.26
Python 3.4+ also has functools.singledispatch which dispatches based on type:
from functools import singledispatch

@singledispatch
def area(shape):
    raise TypeError("Unsupported type")

@area.register
def _(dimensions: tuple):
    l, b = dimensions
    return l * b

@area.register
def _(radius: int):
    return 3.14 * radius * radius

print(area((5, 6)))  # 30
print(area(3))       # 28.26
And 3.8+ has singledispatchmethod for class methods:
from functools import singledispatchmethod

class AreaCalc:
    @singledispatchmethod
    def area(self, shape):
        raise TypeError("Unsupported Type")

    @area.register
    def _(self, dimensions: tuple):
        l, b = dimensions
        return l * b

    @area.register
    def _(self, radius: int):
        return 3.14 * radius * radius

calc = AreaCalc()
print(calc.area((5, 6)))  # 30
print(calc.area(3))       # 28.26
parser and tokenization
Python’s parser uses Deterministic Finite Automata (DFA) to process tokens. Here it is in action:
$ ./python.exe -d test_tokens.py
Token NAME/'def' ... It's a keyword
DFA 'file_input', state 0: Push 'stmt'
DFA 'stmt', state 0: Push 'compound_stmt'
DFA 'compound_stmt', state 0: Push 'funcdef'
DFA 'funcdef', state 0: Shift.
Token NAME/'my_function' ... It's a token we know
DFA 'funcdef', state 1: Shift.
Token LPAR/'(' ... It's a token we know
...
[Figure: DFA visualisation]
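You don't need a debug build to watch tokenization, by the way: the standard-library tokenize module exposes the same token stream from pure Python:

```python
import io
import tokenize

source = "def my_function():\n    pass\n"
# generate_tokens wants a readline callable, so wrap the source in StringIO
for tok in tokenize.generate_tokens(io.StringIO(source).readline):
    print(tokenize.tok_name[tok.type], repr(tok.string))
# NAME 'def', NAME 'my_function', OP '(', OP ')', OP ':', ...
```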
visualizing abstract syntax trees
CPython converts your code into an Abstract Syntax Tree (AST). You can visualize this with instaviz:
import instaviz

def main():
    a = 2 * 2 * 2 * 2
    b = 1 + 5
    c = [1, 4, 6]
    for i in c:
        print(i)
    else:
        print(a)
    return c

instaviz.show(main)
Here’s the AST showing all loops/conditional statements:
[Figure: instaviz rendering of the AST]
The compiler then lowers this AST into something the interpreter can execute:
- Create a control-flow-graph representing execution order
- Convert CFG nodes into bytecode
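You can peek at both ends of that pipeline with the standard library: ast shows the tree the compiler walks, and dis shows the bytecode it ultimately emits:

```python
import ast
import dis

source = "a = 2 * 2"
print(ast.dump(ast.parse(source)))          # the AST: Assign(..., BinOp(...))
dis.dis(compile(source, "<demo>", "exec"))  # the bytecode the CFG is lowered to
```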
pointers in Python
Python doesn’t expose pointers to you, but they’re everywhere in CPython’s internals. Good writeup on this: pointers in Python.
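You can still observe the pointer-like behavior from pure Python, though: assignment copies references, not objects. A tiny sketch:

```python
a = [1, 2, 3]
b = a          # b references the same list object - no copy is made
b.append(4)
print(a)       # [1, 2, 3, 4] - the mutation is visible through both names
print(a is b)  # True - one object, two names
```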
the print function internals
Here’s print() in CPython:
static PyObject *
builtin_print(PyObject *self, PyObject *const *args, Py_ssize_t nargs, PyObject *kwnames)
{
    // ... declarations and parsing of sep/end/file/flush elided ...

    // If no file was specified, use stdout
    if (file == NULL || file == Py_None) {
        file = _PySys_GetObjectId(&PyId_stdout);
        // ...
    }
    // Write each argument, with the separator between them
    for (i = 0; i < nargs; i++) {
        if (i > 0) {
            if (sep == NULL)
                err = PyFile_WriteString(" ", file);
            else
                err = PyFile_WriteObject(sep, file, Py_PRINT_RAW);
            if (err)
                return NULL;
        }
        err = PyFile_WriteObject(args[i], file, Py_PRINT_RAW);
        if (err)
            return NULL;
    }
    // Write the end string (default: newline)
    if (end == NULL)
        err = PyFile_WriteString("\n", file);
    else
        err = PyFile_WriteObject(end, file, Py_PRINT_RAW);
    Py_RETURN_NONE;
}
So when you call print("hello world!"):
"hello world"becomes aPyUnicodeObjectbuiltin_print()runs with 1 argumentfilegets set tostdout- The argument gets written to
file - A
\ngets written tofile
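The same control flow is easy to sketch in pure Python. my_print here is a hypothetical stand-in that mirrors the C logic above, not the real implementation:

```python
import sys

def my_print(*args, sep=" ", end="\n", file=None):
    if file is None:
        file = sys.stdout      # default stream, like the C code
    for i, arg in enumerate(args):
        if i > 0:
            file.write(sep)    # separator goes between arguments
        file.write(str(arg))   # roughly PyFile_WriteObject with Py_PRINT_RAW
    file.write(end)            # trailing newline by default

my_print("hello", "world!")  # hello world!
```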
Python is slow as hell
I wrote the same BST in both Python and C++ to compare:
"""
BST Structure
5
/ \
3 7
/ \
1 9
"""
import time
import random
class Node:
def __init__(self, data) -> None:
self.left = None
self.right = None
self.data = data
class BST:
def __init__(self) -> None:
self.root = None
def insert(self, data):
if self.root is None: self.root = Node(data)
else:
self._insert(data, self.root)
def _insert(self, data, currNode):
if data < currNode.data:
# left tree
if currNode.left is None: currNode.left = Node(data)
else:
self._insert(data, currNode.left)
elif data > currNode.data:
if currNode.right is None: currNode.right = Node(data)
else:
self._insert(data, currNode.right)
else:
print("Value already in tree")
def inorder(node): # L N R
if node:
inorder(node.left)
print(node.data)
inorder(node.right)
def run_python_benchmark():
nums = random.sample(range(1, 100000), 10000) # 10,000 unique numbers
tree = BST()
start = time.time()
for num in nums:
tree.insert(num)
end = time.time()
print(f"Python BST insert time: {end - start:.6f} seconds")
run_python_benchmark()
#include <iostream>
#include <chrono>
#include <vector>
#include <algorithm>
#include <random>

using namespace std;
using namespace std::chrono;

class Node {
public:
    int data;
    Node *left;
    Node *right;
    Node(int val) {
        data = val;
        left = right = nullptr;
    }
};

class BST {
public:
    Node *root;
    BST() { root = nullptr; }

    void insert(int data) {
        if (!root)
            root = new Node(data);
        else
            insertHelper(data, root);
    }

private:
    void insertHelper(int data, Node *currNode) {
        if (data < currNode->data) {
            if (!currNode->left)
                currNode->left = new Node(data);
            else
                insertHelper(data, currNode->left);
        }
        else if (data > currNode->data) {
            if (!currNode->right)
                currNode->right = new Node(data);
            else
                insertHelper(data, currNode->right);
        }
    }
};

int main() {
    vector<int> nums(10000);
    iota(nums.begin(), nums.end(), 1);
    shuffle(nums.begin(), nums.end(), mt19937(random_device{}()));

    BST tree;
    auto start = high_resolution_clock::now();
    for (int n : nums)
        tree.insert(n);
    auto end = high_resolution_clock::now();

    duration<double> elapsed = end - start;
    cout << "C++ BST insert time: " << elapsed.count() << " seconds" << endl;
    return 0;
}
performance comparison
| Language | Average Time (per run) |
|---|---|
| Python | 14.52 milliseconds |
| C++ | 2.22 milliseconds |
C++ is roughly 6.5x faster in this case.
python execution process
Why is Python slow? Because it interprets bytecode at runtime instead of compiling straight to machine code like C/C++.
What happens when you run a Python script:
- The compiler reads your .py source
- The source is compiled to bytecode, producing a .pyc file
- The Python Virtual Machine (PVM) loads that bytecode
- The PVM interprets each bytecode instruction in its evaluation loop (it never compiles them to machine code)
- The CPU executes the interpreter loop itself, one iteration per bytecode instruction
That’s a lot of steps. No wonder it’s slow.
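You can reproduce the first steps of that pipeline interactively: compile() produces the bytecode and exec() hands it to the PVM:

```python
import dis

source = "x = 1 + 2"
code = compile(source, "<demo>", "exec")  # source -> code object (bytecode)
dis.dis(code)                             # inspect what the PVM will run
ns = {}
exec(code, ns)                            # the PVM interprets the bytecode
print(ns["x"])                            # 3
```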
Spent 3 days reading through this. Made me appreciate C a lot more - a non-OOP language built Python.