long-running programs can document what they are doing in files
e.g.: long-running algorithms, web servers
import logging
logging.basicConfig(
    filename="sort.log",
    level=logging.DEBUG,
    filemode="w"
)
logging.debug("hello")
Exercise: add logging to an existing function (e.g. to a sorting algorithm)
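One possible sketch of this exercise. The function name insertion_sort_logged and the logger name "sortdemo" are made up for illustration. The logger emits one DEBUG message per insertion; with a logging.basicConfig call like the one above, these messages would end up in sort.log.

```python
import logging

# hypothetical logger name, chosen for this sketch
logger = logging.getLogger("sortdemo")


def insertion_sort_logged(unsorted):
    """Return a sorted copy of a list and log every insertion."""
    result = []
    for new_item in unsorted:
        i = 0
        # advance past all items that are not greater than new_item
        while i < len(result) and new_item >= result[i]:
            i += 1
        result.insert(i, new_item)
        logger.debug("inserted %r at index %d", new_item, i)
    return result
```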
why:
pytest: testing library with a simple interface
doctest: checks code examples in docstrings
unittest: testing library that is included in the standard library
code to be tested:
# insertion_sort.py
def insertion_sort(unsorted):
    """Return a sorted version of a list."""
    sorted = []
    for new_item in unsorted:
        i = 0
        for sorted_item in sorted:
            if new_item >= sorted_item:
                i += 1
            else:
                break
        sorted.insert(i, new_item)
    return sorted
assert: keyword that makes sure some condition is met
assert isinstance(a, int)
assert a > 0
If the condition is not met, an AssertionError is raised
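A small sketch of how a failing assert behaves. The function name check_positive_int is made up; the optional message after the comma becomes the text of the AssertionError.

```python
def check_positive_int(a):
    """Return a unchanged, or fail with AssertionError if it is not a positive int."""
    assert isinstance(a, int), "a must be an int"
    assert a > 0, "a must be positive"
    return a


try:
    check_positive_int(-1)
except AssertionError as error:
    message = str(error)
    print(message)  # a must be positive
```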
# insertion_sort_test.py
from insertion_sort import insertion_sort
assert insertion_sort([3, 2, 4, 1, 5]) == [1, 2, 3, 4, 5]
assert insertion_sort([1, 1, 1]) == [1, 1, 1]
assert insertion_sort([]) == []
the script should run without throwing errors
testing library with a simple interface, based on assert
pip install pytest
test file that works with pytest:
# insertion_sort_test.py
from insertion_sort import insertion_sort
def test_insertion_sort():
    assert insertion_sort([3, 2, 4, 1, 5]) == [1, 2, 3, 4, 5]
    assert insertion_sort([1, 1, 1]) == [1, 1, 1]
    assert insertion_sort([]) == []
finding and running tests:
python -m pytest
=================== test session starts ===================
platform win32 -- Python 3.8.7, pytest-6.2.1, [...]
rootdir: C:\[...]
collected 1 item
insertion_sort_test.py . [100%]
==================== 1 passed in 0.19s ====================
naming test files: *_test.py (or test_*.py)
naming test functions: test*
Determine how much of the code is covered by tests (what percentage of statements is executed during the tests):
pip install pytest-cov
python -m pytest --cov=.
example output:
Name Stmts Miss Cover
--------------------------------------------
insertion_sort.py 10 0 100%
insertion_sort_test.py 5 0 100%
--------------------------------------------
TOTAL 15 0 100%
import pytest
def test_no_argument_raises():
    with pytest.raises(TypeError):
        insertion_sort()
grouping tests via classes:
class TestExceptions:
    def test_no_argument_raises(self):
        with pytest.raises(TypeError):
            insertion_sort()
    def test_different_types_raises(self):
        with pytest.raises(TypeError):
            insertion_sort(["a", 1])
fixtures can set up conditions before running a test
def test_foo(tmp_path):
    # tmp_path is a path to a temporary directory
    ...
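A sketch of a test that uses the tmp_path fixture, assuming pytest passes in a pathlib.Path pointing to a fresh temporary directory. The test name and file name are made up.

```python
def test_write_and_read(tmp_path):
    # tmp_path is a pathlib.Path provided by pytest
    target = tmp_path / "numbers.txt"
    target.write_text("1 2 3", encoding="utf-8")
    assert target.read_text(encoding="utf-8") == "1 2 3"
```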
built-in fixtures:
tmp_path
capsys (capture output to stdout and stderr)
Code examples may be included in docstrings and may be used for testing
simple doctest:
# insertion_sort.py
def insertion_sort(unsorted):
    """Return a sorted version of a list.
    >>> insertion_sort([3, 2, 4, 1, 5])
    [1, 2, 3, 4, 5]
    """
    # code here
running doctests from pytest:
python -m pytest --doctest-modules
"""
>>> insertion_sort(range(10)) #doctest: +NORMALIZE_WHITESPACE
[0, 1, 2, 3, 4, 5,
6, 7, 8, 9]
>>> insertion_sort(range(10)) #doctest: +ELLIPSIS
[0, 1, 2, ..., 8, 9]
"""
unittest: testing package inside the standard library
often, pytest is recommended over unittest
python -m unittest
looks for files matching test*.py (the default pattern)
Note: in order to be discovered all packages must contain a file named __init__.py (see https://bugs.python.org/issue35617)
specifying a different pattern:
python -m unittest discover -p "*_test.py"
# insertion_sort_test.py
import unittest
import insertion_sort
class InsertionSort(unittest.TestCase):
    def test_five_items(self):
        input = [3, 2, 4, 1, 5]
        expected = [1, 2, 3, 4, 5]
        actual = insertion_sort.insertion_sort(input)
        self.assertEqual(actual, expected)
    def test_empty(self):
        actual = insertion_sort.insertion_sort([])
        self.assertEqual(actual, [])
assertions:
.assertEqual(a, 3)
.assertTrue(b)
.assertFalse(c)
.assertIsNone(d)
.assertIn(a, [2, 3, 4])
.assertIsInstance(a, int)
.assertRaises(TypeError, len)
there are also contrary assertions, e.g. .assertNotEqual(a, 3)
Defining functions that are executed before / after each test:
import unittest
class WidgetTestCase(unittest.TestCase):
    def setUp(self):
        self.widget = Widget('The widget')
    def tearDown(self):
        self.widget.dispose()
pip package: coverage
execution:
python -m coverage run test_shorten.py
python -m coverage report
Example output:
Name Stmts Miss Cover
-------------------------------------
shorten.py 4 0 100%
test_shorten.py 11 0 100%
-------------------------------------
TOTAL 15 0 100%
# insertion_sort_test.py
import doctest
import insertion_sort
def load_tests(loader, tests, ignore):
    tests.addTests(doctest.DocTestSuite(insertion_sort))
    return tests
from the interactive Python console:
help(round)
import math
help(math)
help(math.floor)
from the terminal:
python -m pydoc round
python -m pydoc math
python -m pydoc math.floor
Docstring of a module: description, list of exported functions with single-line summaries
Docstring of a class: description, list of methods
Docstring of a function: description, list of parameters
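A sketch of a function docstring following this structure. The layout (summary line, then a parameter list) is one common convention, not the only one, and the function clamp is made up for illustration.

```python
def clamp(value, lower, upper):
    """Restrict a number to the range from lower to upper.

    Parameters:
        value: the number to restrict
        lower: smallest allowed result
        upper: largest allowed result

    Returns:
        value if it lies within the range, otherwise the nearest bound
    """
    return max(lower, min(value, upper))
```

Calling help(clamp) in the interactive console displays this text.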
Linter for validating docstrings
reStructuredText (reST) = simple markup language (similar to Markdown), is used in Python docstrings
Sphinx = tool that uses existing docstrings to generate documentation in HTML and similar formats
Example:
Heading
=======

- list item 1
- list item 2

Link to `Wikipedia`_.

.. _Wikipedia: https://www.wikipedia.org/

.. code:: python

    print("hello")
Newer Python versions support optional type hints
Type hints can support the IDE - e.g. by providing additional errors
Variables:
i: int = 3
Functions:
def double(n: int) -> int:
    return 2 * n
from typing import List, Set, Dict, Tuple
names: List[str] = ['Anna', 'Bernd', 'Caro']
person: Tuple[str, str, int] = ('Anna', 'Berger', 1990)
roman_numerals: Dict[int, str] = {1: 'I', 2: 'II', 3: 'III', 4: 'IV'}
from typing import Iterable
names: Iterable[str] = ...
Example: class Length
a = Length(130, "cm")
a.value # 130
a.unit # cm
a.unit = "in"
a.value # 51.18
str(a) # 51.18in
b = Length.from_string("12cm")
2 * b # 24cm
b + a # 142cm
Getters & setters (not common in Python):
r = Rectangle(length=3, width=4)
print(r.get_area()) # 12
r.set_length(4)
print(r.get_area()) # 16
Using properties in Python:
r = Rectangle(length=3, width=4)
print(r.area) # 12
r.length = 4
print(r.area) # 16
Exercise: Implement a class called Rectangle_gs that uses getters and setters
class Rectangle_gs:
    def __init__(self, length, width):
        self._length = length
        self._width = width
    def get_length(self):
        return self._length
    def set_length(self, new_length):
        self._length = new_length
    def get_width(self):
        return self._width
    def set_width(self, new_width):
        self._width = new_width
    def get_area(self):
        return self._length * self._width
With properties we can "redirect" reading and writing of attributes to a function, so accessing r.area can lead to the execution of a getter or setter function.
class Rectangle:
    def __init__(self, length, width):
        self.length = length
        self.width = width
    def _get_area(self):
        return self.length * self.width
    area = property(_get_area)
property is a built-in, so it's always available
Extension: Setter for area
class Rectangle:
    ...
    def _set_area(self, new_area):
        # adjust the length
        self.length = new_area / self.width
    area = property(_get_area, _set_area)
Alternative way to create Properties via decorators:
class Rectangle:
    def __init__(self, length, width):
        self.length = length
        self.width = width
    @property
    def area(self):
        return self.length * self.width
    @area.setter
    def area(self, new_area):
        self.length = new_area / self.width
static attributes and static methods are associated with a class, but not with any specific instance of it
example: static attributes and static methods of the datetime class:
datetime.today()
datetime.fromisoformat()
datetime.resolution
Class attributes are attributes that are only defined on the class (not on each instance) - all instances share these attributes.
Example: Money class with a shared class attribute called _currency_data
class Money:
    _currency_data = [
        {"code": "USD", "symbol": "$", "rate": 1.0},
        {"code": "EUR", "symbol": "€", "rate": 1.1},
        {"code": "GBP", "symbol": "£", "rate": 1.25},
        {"code": "JPY", "symbol": "¥", "rate": 0.01},
    ]
    def __init__(self, ...):
        ...
If a method does not have to access data of a specific instance, it can be declared as a static method.
class Money:
    ...
    @staticmethod
    def _get_currency_data(code):
        for currency in Money._currency_data:
            if code == currency["code"]:
                return currency
        raise ValueError(f"unknown currency: {code}")
Note: A static method does not receive self as its first parameter - there is no reference to a specific instance.
There are two main applications for static methods:
Money.from_string("23.40EUR")
_get_currency_data
If we want to make the following code more portable (especially for inheritance), it would make sense not to mention the class name (Money) in the method definition:
class Money:
    ...
    @staticmethod
    def _get_currency_data(code):
        for currency in Money._currency_data:
            if code == currency["code"]:
                return currency
        raise ValueError(f"unknown currency: {code}")
Class methods are special static methods which enable referencing the class by a generic name (conventionally cls):
class Money:
    ...
    @classmethod
    def _get_currency_data(cls, code):
        for currency in cls._currency_data:
            if code == currency["code"]:
                return currency
        raise ValueError(f"unknown currency: {code}")
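A sketch of why cls helps with inheritance. The subclass PlayMoney and its currency entry are made up; the lookup method is the one shown above, and it reads the data of whichever class it is called on.

```python
class Money:
    _currency_data = [
        {"code": "USD", "symbol": "$", "rate": 1.0},
        {"code": "EUR", "symbol": "€", "rate": 1.1},
    ]

    @classmethod
    def _get_currency_data(cls, code):
        # cls is the class the method was called on (Money or a subclass)
        for currency in cls._currency_data:
            if code == currency["code"]:
                return currency
        raise ValueError(f"unknown currency: {code}")


class PlayMoney(Money):  # hypothetical subclass with its own (made-up) data
    _currency_data = [{"code": "PLY", "symbol": "P", "rate": 1.0}]


print(Money._get_currency_data("EUR")["symbol"])      # €
print(PlayMoney._get_currency_data("PLY")["symbol"])  # P
```

With the @staticmethod version, PlayMoney._get_currency_data("PLY") would still search Money._currency_data and raise a ValueError.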
Magic methods are special methods that influence the behavior of a class.
They begin and end with two underscores, e.g. __init__
List of magic methods: https://docs.python.org/3/reference/datamodel.html#special-method-names
Methods for converting to strings:
__repr__: default representation, ideally readable / interpretable by Python
__str__: "nice" representation for humans, falls back to __repr__ if not overridden
Example:
from datetime import time
a = time(23, 45)
repr(a) # 'datetime.time(23, 45)'
str(a) # '23:45:00'
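A sketch of a class that defines both methods. The class Point2D is made up for illustration.

```python
class Point2D:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __repr__(self):
        # ideally valid Python that recreates the object
        return f"Point2D({self.x!r}, {self.y!r})"

    def __str__(self):
        # readable form for humans
        return f"({self.x}, {self.y})"


p = Point2D(1, 2)
print(repr(p))  # Point2D(1, 2)
print(str(p))   # (1, 2)
```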
Methods for mathematical operators:
__add__
__mul__
__rmul__
__call__
__getitem__
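A sketch showing __add__, __mul__ and __rmul__ on a made-up class Centimeters (a simplified stand-in, not the Length class from the earlier example). __rmul__ is what makes 2 * b work when the left operand is a plain number.

```python
class Centimeters:
    def __init__(self, value):
        self.value = value

    def __add__(self, other):
        # called for: a + b
        return Centimeters(self.value + other.value)

    def __mul__(self, factor):
        # called for: a * 2
        return Centimeters(self.value * factor)

    def __rmul__(self, factor):
        # called for: 2 * a (int cannot multiply itself by Centimeters)
        return self * factor

    def __str__(self):
        return f"{self.value}cm"


a = Centimeters(130)
b = Centimeters(12)
print(2 * b)  # 24cm
print(b + a)  # 142cm
```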
Proxy for parent classes
without super:
class Child(A, B):
    def __init__(self, x, y):
        A.__init__(self, x, y)
        B.__init__(self, x, y)
with super:
class Child(A, B):
    def __init__(self, x, y):
        super().__init__(x, y)
@logattraccess
class Foo():
    def __init__(self):
        self.a = 3
f = Foo()
f.a # prints: "get property 'a'"
f.b = 3 # prints: "set property 'b'"
In general we can assign any attributes
a.value = 3
In order to reduce memory consumption we can define so-called slots in a class:
class Money():
    __slots__ = ['currency', 'amount']
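A sketch of the effect of __slots__: instances no longer get a per-instance __dict__, and assigning an attribute that is not listed raises an AttributeError. The constructor shown here is an assumption.

```python
class Money:
    __slots__ = ["currency", "amount"]

    def __init__(self, currency, amount):
        self.currency = currency
        self.amount = amount


m = Money("EUR", 10)
m.amount = 20  # allowed: 'amount' is a slot

try:
    m.owner = "Anna"  # not listed in __slots__
except AttributeError:
    print("cannot set attribute 'owner'")
```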
iterable: an object that can be iterated over via for element in my_iterable
hierarchy of iterables:
advantages of "dynamic iterables" / iterators:
examples of "static iterables":
examples of "dynamic iterables":
examples of iterators:
enumerate()
reversed()
open()
os.walk()
os.scandir()
map()
filter()
example: open() returns an iterator of lines in a file
with open("./wikipedia_complete.txt", encoding="utf-8") as f:
    for line in f:
        print(line)
The file could be gigabytes in size and this would still work
example functions:
Loads all files in foo/ at the same time, then iterates over them:
for text in read_textfiles_as_list("./foo/"):
    print(text[:5])
Loads and prints text files individually - keeping memory consumption low:
for text in read_textfiles_as_iterator("./foo/"):
    print(text[:5])
itertools: module for creating iterators
itertools.count
itertools.repeat
itertools.product
from itertools import count
for i in count():
    print(i)
    if i >= 5:
        break
# 0 1 2 3 4 5
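A short sketch of the other two functions listed above. repeat yields the same value again and again (here limited to three times); product yields all combinations of its inputs.

```python
from itertools import product, repeat

repeated = list(repeat("x", 3))
print(repeated)
# ['x', 'x', 'x']

combinations = list(product("ab", [1, 2]))
print(combinations)
# [('a', 1), ('a', 2), ('b', 1), ('b', 2)]
```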
Generator functions and generator expressions are two ways to define custom iterators
A function can contain a yield statement instead of a return statement - this makes it a generator function
def count():
    i = 0
    while True:
        yield i
        i += 1
A generator function can be repeatedly exited (via yield) and entered again (by requesting the next value)
create an iterator that returns the string contents of all UTF-8 text files in a directory
usage:
for content in read_textfiles("."):
    print(content[:10])
solution:
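import os
def read_textfiles(path="."):
    for file in os.listdir(path):
        try:
            with open(os.path.join(path, file), encoding="utf-8") as fobj:
                yield fobj.read()
        except (OSError, UnicodeDecodeError):
            # skip directories and files that are not valid UTF-8
            pass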
Generator expressions are similar to list comprehensions
list comprehension:
mylist = [i*i for i in range(3)]
generator expression:
mygenerator = (i*i for i in range(3))
summing all numbers from 1 to 10 million:
via a list comprehension - will use hundreds of megabytes in memory (see task manager):
sum([a for a in range(1, 10_000_001)])
via a generator expression:
sum((a for a in range(1, 10_000_001)))
In Python every for loop happens via an iterator.
When iterating over an iterable, an iterator is created for that iteration.
Every iterable has an __iter__ method which returns an iterator
An iterator has a method called __next__
__next__() either returns the next value of the iteration or raises a StopIteration exception
An iterator is actually also an iterable (it has an __iter__ method which returns itself)
Iterator of a list:
numbers = [1, 2, 3, 4]
numbers_iterator = numbers.__iter__()
Iterators have a method named __next__ which will return the next object of an iteration.
Example:
numbers = [1, 2, 3]
numbers_iterator = numbers.__iter__()
print(numbers_iterator.__next__()) # 1
print(numbers_iterator.__next__()) # 2
When an iterator is used up, a StopIteration exception is raised.
print(numbers_iterator.__next__()) # 1
print(numbers_iterator.__next__()) # 2
print(numbers_iterator.__next__()) # 3
print(numbers_iterator.__next__()) # StopIteration
The global function next() is equivalent to calling .__next__()
next(numbers_iterator)
numbers_iterator.__next__()
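Put together, a for loop behaves roughly like the following while loop. This is a sketch of the mechanism, not the actual implementation inside Python; iter() is the built-in counterpart to calling .__iter__().

```python
numbers = [1, 2, 3]
collected = []

# roughly what "for n in numbers: collected.append(n)" does
numbers_iterator = iter(numbers)
while True:
    try:
        n = next(numbers_iterator)
    except StopIteration:
        break  # iterator is used up: leave the loop
    collected.append(n)

print(collected)  # [1, 2, 3]
```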
exercise: create custom iterables from a class by implementing __iter__ and __next__
for i in random():
...
or
for number in roulette():
print(number, end=" ")
4 0 29 7 13 19
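A sketch of the general shape of such a class, using a deterministic made-up example (Countdown) rather than random numbers so that the output is predictable.

```python
class Countdown:
    """Iterator that counts down from start to 1."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        # an iterator returns itself
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value


for number in Countdown(3):
    print(number, end=" ")
# 3 2 1
```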
A for loop can have an optional else clause
It will be executed if the loop finishes normally - i.e. if Python does not encounter a break or return statement
This functionality is not present in any other widespread language
Many Python developers don't know it either
Quote from Python's inventor:
I would not have the feature at all if I had to do it over.
is_prime() with loops and for ... else
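One way is_prime() might be written with for ... else (a sketch; treating numbers below 2 as non-prime is an assumption added here). The else block only runs if the loop was not left via break.

```python
def is_prime(n):
    if n < 2:
        return False
    for i in range(2, int(n**0.5) + 1):
        if n % i == 0:
            result = False
            break  # divisor found: the else block is skipped
    else:
        # loop finished without break: no divisor exists
        result = True
    return result


print([is_prime(n) for n in [2, 3, 4, 5, 6]])
# [True, True, False, True, False]
```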
defining a lambda function (anonymous function):
multiply = lambda a, b: a * b
using a lambda for sorting:
pairs = [(1, 'one'), (2, 'two'), (3, 'three'), (4, 'four')]
pairs.sort(key=lambda pair: pair[1])
A higher-order function is a function that can receive and/or return other functions
remember: in Python, "everything is an object" - and so are functions
functools module: collection of some higher-order functions
examples:
functools.lru_cache
functools.cache (Python 3.9)
functools.partial
functools.reduce
from functools import partial
open_utf8 = partial(open, encoding='utf-8')
memoization: strategy for performance optimization
return values of previous function calls are cached and used on subsequent function calls with the same arguments
from functools import lru_cache
def fibonacci(n):
    if n in [0, 1]:
        return n
    return fibonacci(n-1) + fibonacci(n-2)
# make faster by caching
fibonacci = lru_cache(fibonacci)
Decorator syntax: simple way of applying higher-order functions to function definitions
@lru_cache # Python >= 3.8
def fibonacci(n):
    ...
is equivalent to:
def fibonacci(n):
    ...
fibonacci = lru_cache(fibonacci)
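Any function that takes a function and returns a function can be used this way. A sketch with a made-up decorator count_calls that records how often the wrapped function runs:

```python
from functools import wraps


def count_calls(func):
    """Higher-order function: wraps func and counts its calls."""
    @wraps(func)  # keep the name and docstring of func
    def wrapper(*args, **kwargs):
        wrapper.calls += 1
        return func(*args, **kwargs)
    wrapper.calls = 0
    return wrapper


@count_calls
def double(n):
    return 2 * n


double(1)
double(5)
print(double.calls)  # 2
```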
Set: Unordered collection of elements with no duplicates
Frozenset: immutable set
ingredients = {"flour", "water", "salt", "yeast"}
ingredients = set(["flour", "water", "salt", "yeast"])
ingredients = frozenset(["flour", "water", "salt", "yeast"])
Sets can be an alternative to lists if the order is not relevant.
ingredients1 = {"flour", "water", "salt", "yeast"}
ingredients2 = {"water", "salt", "flour", "yeast"}
ingredients1 == ingredients2 # True
Take care: An empty set must always be created via set()
Why does {} not produce an empty set?
x = set('abc')
y = set('aeiou')
x | y
x & y
x - y
x <= y
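The results of these operations, as a sketch (sets are unordered, so sorted() is used here to get a stable printout):

```python
x = set("abc")
y = set("aeiou")

print(sorted(x | y))  # union: ['a', 'b', 'c', 'e', 'i', 'o', 'u']
print(sorted(x & y))  # intersection: ['a']
print(sorted(x - y))  # difference: ['b', 'c']
print(x <= y)         # subset test: False
```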
countries = {
"Canada", "USA", "Mexico", "Guatemala", "Belize",
"El Salvador", "Honduras", "Nicaragua", "Costa Rica",
"Panama"}
neighbors = [
{"Canada", "USA"},
{"USA", "Mexico"},
{"Mexico", "Guatemala"},
{"Mexico", "Belize"},
{"Guatemala", "Belize"},
{"Guatemala", "El Salvador"},
{"Guatemala", "Honduras"},
{"Honduras", "Nicaragua"},
{"Nicaragua", "Costa Rica"},
{"Costa Rica", "Panama"}
]
Example:
from collections import namedtuple
Point = namedtuple('Point', ['x', 'y'])
p = Point(11, y=22)
print(p[0])
print(p.x)
Enum = collection of symbolic names that can be used instead of specific strings
Using a string:
a = Button(position="left")
Using an enum named Position:
a = Button(position=Position.LEFT)
Enums can prevent typos and help with autocompletion.
Defining an enum:
from enum import Enum
class Position(Enum):
    UP = 1
    RIGHT = 2
    DOWN = 3
    LEFT = 4
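A sketch of how the members of such an enum can be inspected and looked up:

```python
from enum import Enum


class Position(Enum):
    UP = 1
    RIGHT = 2
    DOWN = 3
    LEFT = 4


print(Position.LEFT.name)                 # LEFT
print(Position.LEFT.value)                # 4
print(Position(4) is Position.LEFT)       # lookup by value: True
print(Position["LEFT"] is Position.LEFT)  # lookup by name: True
print([p.name for p in Position])         # ['UP', 'RIGHT', 'DOWN', 'LEFT']
```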
Threads:
Multiprocessing:
Advantages of threads: simpler, variables may be modified directly
general mechanism: we instruct Python to run multiple functions in separate threads / processes, e.g.:
Run download_xkcd_comic(i) in parallel threads for i = 100 - 120
Run is_prime(i) in parallel processes for several numbers and collect the boolean results in a list
Threads: Python will repeatedly switch between parallel threads so they are seemingly running concurrently; however at any point in time only one thread is active (Global interpreter lock - GIL)
Multiprocessing: Python will start multiple processes (visible in the task manager); it can be harder to share values between processes
high-level:
concurrent.futures.ThreadPoolExecutor
concurrent.futures.ProcessPoolExecutor
low-level:
threading.Thread
multiprocessing.Process
from concurrent.futures import ThreadPoolExecutor
def print_multiple(text, n):
    for i in range(n):
        print(text, end="")
with ThreadPoolExecutor() as executor:
    executor.submit(print_multiple, ".", 200)
    executor.submit(print_multiple, "o", 200)
print("completed all tasks")
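submit() returns a Future object; its result() method waits for the task and returns the function's return value. A sketch with a made-up function square:

```python
from concurrent.futures import ThreadPoolExecutor


def square(n):
    return n * n


with ThreadPoolExecutor() as executor:
    # start all tasks first, then collect the results in submission order
    futures = [executor.submit(square, n) for n in [1, 2, 3]]
    results = [future.result() for future in futures]

print(results)  # [1, 4, 9]
```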
We'll write a program that executes two threads (a and b). The two threads contain loops that count how often they were called
example output:
797 iterations in thread a
799 iterations in thread b
1750 iterations in thread a
20254 iterations in thread b
829 iterations in thread a
Exercise: concurrently download Python package documentation pages for these topics:
["os", "sys", "urllib", "pprint", "math", "time"]
example URL: https://docs.python.org/3/library/os.html
The downloads should be saved to a separate downloads folder
The program must be a Python file that only "runs" its main part if it is executed directly - and not imported (via __name__ == "__main__")
from concurrent.futures import ProcessPoolExecutor
def print_multiple(text, n):
    for i in range(n):
        print(text, end="")
if __name__ == "__main__":
    with ProcessPoolExecutor() as executor:
        executor.submit(print_multiple, ".", 200)
        executor.submit(print_multiple, "o", 200)
May be used for parallel processing of multiple input data entries to generate output data
example: process every entry in the list [2, 3, 4, 5, 6] and determine whether it is a prime number → [True, True, False, True, False]
with ProcessPoolExecutor() as executor:
    # map() returns a lazy iterator, list() collects the results
    prime_indicators = list(executor.map(is_prime, [2, 3, 4, 5, 6]))
Exercise: write a function that creates a list of prime numbers in a specific range:
prime_range(100_000_000_000_000, 100_000_000_000_100)
# [100000000000031, 100000000000067,
# 100000000000097, 100000000000099]
Make use of a ProcessPoolExecutor and use this function:
def is_prime(n):
    for i in range(2, int(n**0.5) + 1):
        if n % i == 0:
            return False
    return True