file = a sequence of bytes on a storage device
Many file formats are a sequence of text characters - e.g. the formats .txt, .html, .csv or .py.
The content of text files can be represented as strings, other file contents can be represented as byte sequences.
file = open("message.txt", "w", encoding="utf-8")
file.write("hello world\n")
file.write("end\n")
file.close()
The file is opened for writing (w).
The character encoding will be UTF-8.
file = open("message.txt", encoding="utf-8")
content = file.read()
file.close()
print(content)
Standard mode: reading (r)
# mode: text, append
open("todos.txt", mode="ta")
t
: text mode (default)b
: binaryr
: reading (default)w
: (over)writinga
: appendingwasm_content = bytes([
0, 97, 115, 109, 1, 0, 0, 0, 1, 5, 1, 96, 0, 1, 127,
3, 2, 1, 0, 7, 10, 1, 6, 97, 110, 115, 119, 101, 114,
0, 0, 10, 6, 1, 4, 0, 65, 42, 11
])
file = open("answer.wasm", mode="wb")
file.write(wasm_content)
file.close()
with open("todos.txt", encoding="utf-8") as file_obj:
content = file_obj.read()
The file will be closed automatically when the program leaves the indented block.
Text files may be encoded in various ways:
Recommendation: Use UTF-8 (best support for special characters)
The default character encoding for text files depends on the operating system:
import locale
locale.getpreferredencoding()
Objects that support using .read()
or .write()
etc:
open()
)sys.stdout
, sys.stdin
sys.stdin.readline()
urllib.request.urlopen('https://google.com')
Read line by line (small memory consumption):
with open("myfile.txt", encoding="utf-8") as file:
for line in file:
print(line)
Methods / Attributes:
.close()
.mode
.read()
(read the entire file).read(10)
(read the next 10 bytes).readline()
(read the next line)Create a program that asks the user for entries on a shopping list and stores them in a text file
possibilities:
JSON: popular and standardized data file format
can represent the fundamental Python datatypes (none, bool, int, float, list, dict)
import json
data = ["one", "two", "three"]
jsonstring = json.dumps(data)
with open("numbers.json", mode="w", encoding="utf-8") as jsonfile:
jsonfile.write(jsonstring)
import json
with open("numbers.json", encoding="utf-8") as jsonfile:
jsonstring = jsonfile.read()
data = json.loads(jsonstring)
CSV is a file format which can hold tabular data; entries are separated by commas
example:
ISO,Country,Capital,Languages
AD,Andorra,Andorra la Vella,"ES,FR"
AE,United Arab Emirates,Abu Dhabi,"AE,fa,en,hi,ur"
AF,Afghanistan,Kabul,"AF,tk"
Python libraries:
import pandas as pd
data = pd.DataFrame(
[
["CN", 9.6, 1386],
["RU", 17.0, 144],
["US", 9.8, 327],
],
columns=["code", "area", "population"],
)
data.to_csv("countries.csv")
import pandas as pd
data = pd.read_csv("countries.csv")
print(data)
print(data.values.tolist())
import csv
data = [
['code', 'area', 'population'],
['CN', 9.6, 1386],
['RU', 17, 144],
['US', 9.8, 327]
]
with open('countr.csv', 'w', encoding='utf-8', newline='') as f:
writer = csv.writer(f)
writer.writerows(data)
with open('countr.csv', encoding='utf-8', newline='') as f:
reader = csv.reader(f)
for row in reader:
print(row)
two packages in the standard library:
xml.etree.ElementTree
xml.dom.minidom
external library (extension of ElementTree):
lxml
import xml.etree.ElementTree as et
person = et.Element('person')
name = et.SubElement(person, 'name')
name.text = 'Adam'
age = et.SubElement(person, 'age')
age.text = '40'
age.set("unit", "years")
Always specify an encoding when saving XML!
xmlbytestring: bytes = et.tostring(person, encoding='utf-8')
with open("myfile.xml", mode="wb") as file:
file.write(xmlbytestring)
# or
xmlstring: str = et.tostring(person, encoding='unicode')
with open("myfile.xml", encoding="utf-8", mode="w") as file:
file.write(xmlstring)
# or
tree = et.ElementTree(person)
tree.write("myfile.xml", encoding="utf-8")
import xml.etree.ElementTree as et
person = et.fromstring(xmlstring)
for childnode in person:
print(childnode.tag)
print(childnode.text)
print(childnode.attrib)
File format that can be used to save various types of Python objects
Warning:
import pickle
import datetime
now = datetime.datetime.now()
serialized = pickle.dumps(now)
with open("datetime.pickle", mode="wb") as picklefile:
picklefile.write(serialized)
import pickle
with open("datetime.pickle", mode="rb") as picklefile:
serialized = picklefile.read()
earlier = pickle.loads(serialized)
Python data structure for the tic-tac-toe board:
field = [
['X', 'O', None],
['X', 'X', 'O'],
['O', 'O', 'X']
]
important packages:
os.getcwd()
(current path)os.chdir()
os.listdir()
os.walk()
os.mkdir("foo")
os.mkdir("foo/bar/baz")
os.remove("foo.txt")
(delete file)os.rmdir("foo/bar/baz")
(delete empty folder)shutil.rmtree()
(delete folder)os.rename("foo.txt", "bar.txt")
shutil.move()
(move file or folder)shutil.copy("foo.txt", "bar")
(copy file)shutil.copytree()
(copy folder)Direct output via os.system
:
os.system("ls .")
os.system("mkdir foo")
Read results in Python via os.popen
:
a = os.popen("ls .").read()
print(a)
create a program that searches for occurrences of a term in all files from a folder and prints their number
Mini language that can define a search pattern for text
Examples of simple search patterns:
https?://.+?\.com
<h1>.+?</h1>
\d?\d:\d\d
Online: https://regexr.com/
In VS Code: Ctrl+F and click the button labeled .*
The following characters have special meanings:
\
^
$
.
|
?
*
+
()
[]
{}
Avoid the special meaning by prepending a backslash:
13\$
9\.99€
1\+1=2
.
: any character except newline\s
: whitespace\d
: digit\w
: digit, letter or underscoreExercise: find all digits in a document
a*
: character a repeated 0 or more times (matches longest string)a*?
: character a repeated 0 or more times (matches shortest string)a+
: character a repeated 1 or more times (matches longest string)a+?
: charcter a repeated 1 or more times (matches shortest string)a?
: optional character aExercises:
.jpeg
or .jpg
Group expressions via (...)
Examples:
(ab)+
matches repetitions of the sequence ab<(-=)+->
matches the following pattern: <-=-=-=-=->
...|...|...
: Match any of the listed alternatives
Example to find an image: \.(jpe?g|png|gif)
Exercises:
http://
or https://
and end with .com
or .org
Groups can be used to extract information
Example: (\d?\d):(\d\d)
will extract two values
\A
: start of the string\Z
: end of the string^
: start of a line$
: end of a line[a-z]
: any lowercase ASCII letter[a-zA-Z]
: any ASCII letter[,;.]
: same as (,|;|.)
Regular expressions are used to find a substring of a specific pattern inside a string
in Python regular expressions can be handled via the re
Package, in particular:
re.search
re.finditer
example:
import re
match = re.search(r"[a-z][A-Z]", "abCdE")
if match:
print("match")
print(match[0]) # bC
else:
print("no match")
example:
import re
match_iter = re.finditer(r"https?://.+?\.com", website_content)
for match in match_iter:
print(match[0])
Exercise: find all URLs in a HTML document on the drive
For an example, save the page https://news.ycombinator.com to disk
times = re.finditer(
r'(\d?\d):(\d\d)',
'The course times are 9:30 - 16:30'
)
for time in times:
print(f"hour: {time[1]}")
print(f"minute: {time[2]}")
Optimizing performance if expressions are reused:
my_re = "..."
re.search(my_re, ...)
re.finditer(my_re, ...)
becomes
my_re = "..."
my_re_obj = re.compile(my_re)
my_re_obj.search(...)
my_re_obj.finditer(...)
Python packages:
datetime.date
datetime.time
datetime.datetime
a = datetime.time(hour, minute, second, microsecond)
b = datetime.datetime(2018, 1, 13)
c = datetime.datetime(2018, 3, 26, 12, 30)
c - b
import time
for i in range(10):
print(i)
time.sleep(1)
current Unix time (seconds since 1970-01-01 00:00:00 UTC)
import time
time.time()
Functions for the Python environment
examples:
argv
stdout.write
getrefcount
path
version
version_info
may be read via sys.argv
# hello.py
import sys
print(sys.argv)
python hello.py --run --file=foo.txt
['hello.py', '--run', '--file=foo.txt']
import sys
old_stdout = sys.stdout
class LoudStdout:
def write(self, text):
old_stdout.write(text.upper())
loudstdout = LoudStdout()
sys.stdout = loudstdout
import socket
client = socket.socket()
client.connect(("google.com", 80))
client.send(b"GET / HTTP/1.1\r\nHost: www.google.com\r\n\r\n")
response = client.recv(4096)
print(response)
Exercise: script that reads and processes a file named request.httpreq
and creates a file named response.httpres
low level HTTP interface
from http.client import HTTPSConnection
connection = HTTPSConnection("www.google.com")
connection.request("GET", "/")
answer = connection.getresponse()
content = answer.read()
connection.close()
print(content)
part of the standard library
from urllib import request
content: bytes = request.urlopen(
"https://www.google.com").read()
external package that may be installed via pip:
pip install requests
import requests
content: str = requests.get("https://www.google.com").text
Installation:
pip install selenium
geckodriver / chromedriver / ...
download geckodriver from:
https://github.com/mozilla/geckodriver/releases/tag/v0.23.0
Download and save in a folder in Python's sys.path
- or in the project directory
python -m http.server
https://docs.python.org/2/howto/webservers.html
(info is somewhat dated)
= standard interfaces for letting a program on a server reply to an HTTP request
CGI (Common Gateway Interface): Interface between server software (e.g. Apache) and a web application (written in any programming language)
Interface relies on stdin, stdout and environment variables
Drawback: a new Process has to be started for every request (which is slow)
WSGI = Web Server Gateway Interface
Standard for letting a Python program on a server reply to an HTTP request
Inspired by CGI; advantage: one running Process can handle multiple request one after another
The entry point is a Python function:
The function will receive two parameters: environ
and start_response
The request parameters are available via environ
(e.g. URL, HTTP headers, form data, ...)
The second parameter is a function which we usually name start_response
.
In order to start a response we call start_response(status, response_headers)
, e.g.:
start_response(
"200 OK",
[("Content-Type", "text/plain; charset=utf-8")]
)
The response body is an iterable (e.g. a list) of bytestrings.
server software:
from wsgiref.simple_server import make_server
from app import application
server = make_server("localhost", 8000, application)
server.serve_forever()
This part will usually be handled by libraries like guinicorn
# app.py
def application(environ, start_response):
response_status = "200 OK"
response_body_text = "hello"
response_body = response_body_text.encode("utf-8")
response_headers = [
("Content-Type", "text/plain; charset=utf-8"),
("Content-Length", str(len(response_body))),
]
start_response(response_status, response_headers)
return [response_body]
We can view it in the debugger, e.g.:
{
"CONTENT_LENGTH": "12"
"HTTP_USER_AGENT": 'Mozilla/5.0 (Win...',
"HTTP_COOKIE":
"PATH_INFO": "/todos/new",
"REQUEST_METHOD": "GET",
"wsgi.input": ...
}
PATH_INFO in the environ dictionary: requested path on the server
task: show different pages at different addresses
response_headers = [
("Location", "/login"),
("Content-Length", "0")
]
start_response("307 Temporary Redirect", response_headers)
return [b'']
Task: Make some pages redirect, e.g. make /now
redirect to /time
Cookies = small sets of data which may be placed in the Browser by a website
Cookies may be used to recognize a previous visitor, e.g. for:
response_headers.append((
"Set-Cookie", "mycookie123=abcd; Max-Age=30"
))
try:
current_cookies_str = environ["HTTP_COOKIE"]
except KeyError:
...
result may be: "cookie1=one; cookie2=two"
from http import cookies
...
mycookies = cookies.SimpleCookie()
mycookies.load(current_cookies_str)
mycookies["cookie1"].value
by setting an expired cookie
response_headers.append((
"Set-Cookie",
"mycookie123=; expires=Thu, 01 Jan 1970 00:00:00 GMT"
))
Task: Create a website which only lets the user visit content 5 times before requiring them to log in
This is how we read parameters from forms:
from urllib.parse import parse_qs
request_body_size = int(environ.get('CONTENT_LENGTH', '0'))
# environ["wsgi.input"] is a file-like object
request_body = (environ["wsgi.input"]
.read(request_body_size)
.decode("utf-8"))
parameters = parse_qs(request_body)
first_name = parameters.get("first-name")[0]
WSGI configuration file
# /var/www/username_pythonanywhere_com_wsgi.py
import sys
path = "/home/username/"
if path not in sys.path:
sys.path.append(path)
from app import application
For basics see the presentation on HTTP
tools for API development with Python:
Task: develop a simple API with FastAPI that has these resouces:
/time
/date
/randnr
(random number)Task:
develop an API with FastAPI that can manage a list of todos
implement these endpoints:
GET /todos
(JSON array of all todos)GET /todos/$id
(single todo by id)POST /todos
(add a todo)DELETE /todos/$id
(delete a single todo)PATCH /todos/$id
(modify a todo)POST /delete_completed_todos
The todos can be stored in a simple JSON file - or in an actual database
FastAPI: library for API development that relies on type annotations
basic example:
# main.py
from fastapi import FastAPI
app = FastAPI()
@app.get("/")
async def root():
return {"message": "Hello World"}
@app.get("/double")
async def double(number: int):
return number * 2
run the example via:
uvicorn main:app --reload
view documentation and testing environment (Swagger) at: http://localhost:8000/docs
For basics see presentation databases and data store
Database API specification: standard which is implemented by various Python database bindings; standardized in PEP 249
PEP 249 defines:
some libraries that implement PEP 249:
SQLite uses one file to store a database
comes with Python (module sqlite3
)
import sqlite3
# database stored in a file
connection = sqlite3.connect('contacts.db')
# in-memory database
connection = sqlite3.connect(':memory:')
PIP package psycopg
import psycopg
connection = psycopg.connect(host="localhost",
dbname="...",
user="...",
password="...")
or
connection = psycopg.connect(
"postgresql://user:password@localhost/dbname")
PIP package PyMySQL
import pymysql.cursors
connection = pymysql.connect(host='localhost',
user='user',
password='passwd',
db='db',
charset='utf8mb4')
Python package cx_Oracle
import cx_Oracle
connection = cx_Oracle.connect(user="user",
password="password",
dsn="localhost/oraclepdb")
ODBC = open database connectivity: Standard for database interfaces (independent of Python)
pyodbc = implementation for Python - this enables access to any ODBC database
import pyodbc
connection = pyodbc.connect(
"Driver=SQLite3 ODBC Driver;Database=contacts.db")
connection = ...
cursor = connection.cursor()
cursor.execute("SELECT ...")
print(cursor.fetchall())
...
cursor.execute("INSERT INTO ...")
cursor.execute("INSERT INTO ...")
connection.commit()
...
connection.close()
dangerous:
search_name = input()
res = cursor.execute(
f"""SELECT tel FROM person WHERE name = '{search_name}'"""
)
safe method (with SQL escaping):
search_name = input()
res = cursor.execute(
"""SELECT tel FROM person WHERE name = ?;""",
(search_name, )
)
https://www.python.org/dev/peps/pep-0249/#paramstyle
The attributes sqlite3.paramstyle
, pymysql.paramstyle
etc. indicate the format for queries with parameters
cursor.execute(command, parameters)
cursor.rowcount
: number of result rows in the last querycursor.fetchone()
: get a single row of the result (usually as a tuple)cursor.fetchmany(10)
cursor.fetchall()
Todo application (with web interface) (see courses-tutorials/python-todolist-wsgi-sqlite)
SQLite data types and corresponding Python types:
NULL
- None
INT
/ INTEGER
- int
REAL
- float
TEXT
- str
BLOB
- bytes
Two types that are usually not supported by SQLite:
TIMESTAMP
- datetime
DATE
- date
Storing them as SQL DECIMAL values with automatic conversion to / from Python types:
connection = sqlite3.connect(
'contacts.db'
detect_types=sqlite3.PARSE_DECLTYPES)
Task: Create a contacts database that automatically converts SQL decimals that represent the date of birth into Python date
objects.
We can add support for more types by writing so-called adapter and converter functions. These custom types will usually be stored as byte sequences in the database.
An adapter is a custom function that converts a Python object into an SQL value.
A converter is a custom function that converts an SQL value into a Python object.
see:
Tasks:
ipadress.IPv4Address
class - in SQL it would be saved as an encoded byte stringSQLAlchemy = Object Relational Mapper
Object oriented access to arbitrary SQL databases
potential alternative: Django ORM
# db_interface.py
from sqlalchemy import create_engine
engine = create_engine("sqlite:///music.db", echo=True)
engine.connect()
# schema.py
from sqlalchemy.orm import mapped_column, Mapped, DeclarativeBase
class Base(DeclarativeBase):
pass
class Artist(Base):
__tablename__ = "artist"
id: Mapped[int] = mapped_column(primary_key=True)
name: Mapped[str] = mapped_column()
country: Mapped[str] = mapped_column()
# db_interface.py
Base.metadata.create_all(engine)
from sqlalchemy.orm import sessionmaker
Session = sessionmaker(bind=engine)
session = Session()
...
session.commit()
INSERT INTO artist VALUES ('The Beatles', 'United Kingdom');
becomes:
beatles = Artist(name="The Beatles", country="United Kingdom")
session.add(beatles)
SELECT name, country FROM artist;
becomes:
for artist in session.query(Artist):
print(f"{artist.name} ({artist.country})")
or
for name, country in session.query(Artist.name, Artist.country):
print(f"{name} ({country})")
SELECT name, country FROM artist ORDER BY name;
becomes:
for name, country in session.query(
Artist.name, Artist.country).order_by(Artist.name):
...
SELECT name, country FROM artist WHERE artist.name='The Beatles'
becomes:
session.query(Artist).filter(Artist.name=="The Beatles").one()
currently:
<db_schema.Song object at 0x00000175902A5FD0>
better:
Help! - The Beatles
We can achieve this via the methods __repr__
/ __str__
UPDATE song
SET title = 'Help'
WHERE title = 'Help!';
becomes
entry = session.query(Song).filter(Song.title=="Help!").one()
entry.title = "Help"
DELETE
FROM song
WHERE title = 'Help!';
becomes
session.query(Song).filter(Song.title=="Help!").delete()
class Song(Base):
__tablename__ = "song"
id = Column(Integer, primary_key=True)
title = Column(String(30))
artist_id = mapped_column(ForeignKey("artist.id"))
from sqlalchemy.orm import relationship
class Artist(Base):
...
songs: Mapped[Song] = relationship("Song", back_populates="artist")
class Song(Base):
...
artist_id = mapped_column(ForeignKey("artist.id"))
artist: Mapped[Artist] = relationship("Artist", back_populates="song")
yesterday = Song(title="Yesterday", artist=beatles)
help = Song(title="Help!", artist_id=beatles.id)
session.add(...)
print(yesterday.artist)
print(beatles.songs)
PIP package openpyxl
Creating, saving, loading:
import openpyxl
wb = openpyxl.Workbook()
wb.save("wb.xlsx")
wb2 = openpyxl.load_workbook("wb.xlsx")
Creating and getting worksheets:
ws1 = wb.worksheets[0]
print(wb.sheetnames) # ["Sheet"]
ws1 = wb["Sheet"]
# new sheet 0
wb.create_sheet("Sheet2", 0)
ws2 = wb["Sheet2"]
Working with worksheets:
ws = wb.worksheets[0]
ws.title = "times table"
Working with cells:
a1 = ws.cell(1, 1)
a1 = ws["A1"]
a1.value # None
a1.value = 3
a1.row # 1
a1.column # 1
a1.coordinate # "A1"
Exercise: Create the followng table:
product | price | stock |
---|---|---|
apple | 1.00 | 10 |
banana | 0.70 | 20 |
pear | 0.80 | 20 |
import tkinter as tk
window = tk.Tk()
window.mainloop()
A window is created as a Tk
object.
The method mainloop()
starts the program (and waits for user interaction)
import tkinter as tk
window = tk.Tk()
hello_label = tk.Label(master=window, text="Hello!")
hello_label.pack()
window.mainloop()
hello_label = tk.Label(master=window, text="Hello!")
hello_label.config(text="Hi!")
...
message_label = tk.Label(master=window, text="")
message_label.pack()
def display_message():
message_label.config(text="Hello!")
hello_button = tk.Button(master=window,
text="Say Hello!",
command=display_message)
hello_button.pack()
...
Application state is best stored inside a class.
import tkinter as tk
class CounterApp:
def __init__(self):
self.count = 0
self.window = tk.Tk()
self.count_btn = tk.Button(
master=self.window,
text=str(self.count),
command=self.increment_count
)
self.count_btn.pack()
def increment_count(self):
self.count += 1
self.count_btn.config(text=str(self.count))
def run(self):
self.window.mainloop()
counter = CounterApp()
counter.run()
Task: add a reset button to the counter
Simple layouts in rows and columns via .pack
:
label.pack()
More complex layouts in grids:
label_a.grid(column=0, row=0)
label_b.grid(column=0, row=1)
config options:
height
(in pixels or relative to the font size)width
borderwidth
background
(background color)foreground
(text color)justify
(text alignment, e.g. tk.CENTER
, tk.LEFT
, tk.RIGHT
)padx
, pady
(distance between the border and the content)font
(e.g.: ("Arial", 16)
)Label
Button
Frame
Entry
Enables creating .exe files from Python projects
Especially useful for GUI applications
Installation:
pip install pyinstaller
Creating an executable application:
pyinstaller app.pyw --onefile --windowed
Result: dist/app.exe
https://automatetheboringstuff.com/chapter16/
(for error corrections see next slide)
for hidden password entry: use module getpass
Errors in the resource:
provide the query as two entries:
UIDs = imapObj.search(['SINCE 05-Jul-2014'])
UIDs = imapObj.search(['SINCE', '05-Jul-2014'])
use newer version of pyzmail:
→ pyzmail
pyzmail36
use bytes instead of a string:
→ 'BODY[]'
b'BODY[]