NumPy

Library for efficient data processing

Data are stored in multidimensional arrays of numeric values which are implemented in an efficient way:

  • smaller memory use than e.g. lists of numbers in Python
  • much faster execution of operations like element-wise addition of arrays

Data can represent images, sound, measurements and much more

NumPy

common import convention:

import numpy as np

Arrays

creating a 1-dimensional array:

a1d = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

Arrays

creating a 2-dimensional array:

a2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

output:

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

Arrays

creating a 3-dimensional array:

a3d = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])

output:

array([[[1, 2],
        [3, 4]],

       [[5, 6],
        [7, 8]]])

NumPy arrays vs Python lists

Arrays are implemented in C; the numeric entries are not full Python objects and require fewer resources

NumPy arrays vs Python lists

Python list (with references to Python integer objects):

list_a = [1, 2, 3, 4]

NumPy array (data are contained within the array without referencing Python integers):

array_a = np.array(list_a)

Fast element-wise operation (implemented in C):

array_a * array_a
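
As a rough illustration of this difference, the sketch below compares the memory footprint and the element-wise product of a list and an array. The names `n`, `list_b` and `array_b` are introduced here for illustration only, and the exact byte counts for the list depend on the platform and Python version.

```python
import sys

import numpy as np

n = 1000
list_b = list(range(n))
array_b = np.array(list_b, dtype=np.int64)

# list: the list object itself plus one Python integer object per entry
list_bytes = sys.getsizeof(list_b) + sum(sys.getsizeof(x) for x in list_b)
# array: 8 bytes per int64 entry, stored directly in the array's buffer
array_bytes = array_b.nbytes

print(list_bytes, array_bytes)

# element-wise product: explicit loop for the list, one operator for the array
squares_list = [x * x for x in list_b]
squares_array = array_b * array_b
```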

NumPy arrays vs Python lists

Exercise:

Compare the execution time of an operation in pure Python and in NumPy by using time.perf_counter()

e.g. compute the square roots of all numbers from 0 to 1,000,000

Array shape

We can query these attributes:

  • a3d.shape: (2, 2, 2)
  • a3d.ndim: 3
  • a3d.size: 8
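
A small sketch of how these attributes relate to each other, using a hypothetical array `example` with axes of different lengths so that the entries of `shape` are easier to tell apart:

```python
import numpy as np

# hypothetical example array with 2 x 3 x 4 entries
example = np.zeros((2, 3, 4))

print(example.shape)  # length along each axis
print(example.ndim)   # number of axes
print(example.size)   # total number of entries

# size is the product of the axis lengths, ndim is the length of shape
size_from_shape = int(np.prod(example.shape))
ndim_from_shape = len(example.shape)
```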

More than one way to do it

from the Zen of Python:

There should be one-- and preferably only one --obvious way to do it.

this philosophy is often not applied in NumPy

More than one way to do it

example: transposing an array

a2d.T
a2d.transpose()
np.transpose(a2d)

NumPy functions vs array methods

many operations available in two ways:

  • functions in the numpy package
  • methods of the array class

we will be using mostly functions

NumPy functions vs array methods

available as functions and methods:

np.max(a2d)
a2d.max()
np.round(a2d)
a2d.round()

available as functions only:

np.sin(a2d)
np.exp(a2d)
np.expand_dims(a2d, 2)

Creating arrays

creating a 2x6 array filled with 0:

np.zeros((2, 6))
# or
np.full((2, 6), 0.0)

Creating arrays

creating number sequences:

np.linspace(0, 1.0, 11)
# [0.0, 0.1, ... 1.0]
np.arange(0, 3.14, 0.1)
# [0.0, 0.1, ... 3.1]
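
The two functions treat their end point differently, which the following sketch makes visible. The variable names `xs` and `ys` are chosen here for illustration.

```python
import numpy as np

# linspace: start, stop (included), number of points
xs = np.linspace(0, 1.0, 11)
# arange: start, stop (excluded), step size
ys = np.arange(0, 3.14, 0.1)

print(len(xs), xs[0], xs[-1])
print(len(ys), ys[0], ys[-1])
```

With a non-integer step, the number of entries returned by `np.arange` can be affected by rounding, so `np.linspace` is often the safer choice when the number of points matters.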

Creating arrays

creating a 2x2 array of random values:

# create a random number generator
rng = np.random.default_rng(seed=1)

# floats between 0 and 1:
rng.random((2, 2))
# integers between 1 and 6:
rng.integers(1, 7, (2, 2))

older interface: np.random.random() and np.random.randint()
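
A brief sketch of what the seed is for: two generators created with the same seed produce the same sequence of values, which makes experiments reproducible. The names `rng_a`, `rng_b` and `rolls` are illustrative.

```python
import numpy as np

rng_a = np.random.default_rng(seed=1)
rng_b = np.random.default_rng(seed=1)

# same seed, same sequence of random floats
floats_a = rng_a.random((2, 2))
floats_b = rng_b.random((2, 2))
print(np.array_equal(floats_a, floats_b))

# the upper bound (7) is excluded, so the values lie between 1 and 6
rolls = rng_a.integers(1, 7, (2, 2))
print(rolls)
```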

Selecting array entries

a1d[0] # 0
a2d[0, 1] # 2
a2d[0, :] # [1, 2, 3]
a2d[:, 0] # [1, 4, 7]

with 2D arrays: [row index, column index]

in general:

  • last index: counts rightwards
  • second to last index (if it exists): counts downwards

Selecting array entries

a2d[0, :] # [1, 2, 3]

shorter form:

a2d[0] # [1, 2, 3]

Slices

a1d[:3] # [0, 1, 2]
a1d[3:6] # [3, 4, 5]
a1d[6:] # [6, 7, 8, 9]
a1d[0:8:2] # [0, 2, 4, 6]
a1d[3:0:-1] # [3, 2, 1]
a1d[::-1] # [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
a2d[1:, :] # [[4, 5, 6], [7, 8, 9]]

slicing along a single axis (as in the a1d examples) also works on Python lists

Operations on arrays

Operators

Operators are applied element-wise:

a = np.array([0, 1, 2, 3])
b = np.array([2, 2, 2, 2])

-a
# np.array([0, -1, -2, -3])
a + b
# np.array([2, 3, 4, 5])
a * b
# np.array([0, 2, 4, 6])

Operators

element-wise comparison of arrays:

a < b
# np.array([True, True, False, False])
a == b
# np.array([False, False, True, False])

Warning: a == b yields an array of booleans, not a single boolean, so it cannot be used directly in an if statement - use np.array_equal(a, b) instead
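
The following sketch shows what happens when the comparison result is used in an if statement, and which alternatives give a single boolean. The variable names `same`, `any_equal`, `all_equal` and `raised` are illustrative.

```python
import numpy as np

a = np.array([0, 1, 2, 3])
b = np.array([2, 2, 2, 2])

# a single boolean: do both arrays have the same shape and entries?
same = np.array_equal(a, b)

# reducing the element-wise result explicitly
any_equal = bool(np.any(a == b))
all_equal = bool(np.all(a == b))

# using the element-wise result directly is ambiguous and raises an error
try:
    if a == b:
        pass
    raised = False
except ValueError:
    raised = True

print(same, any_equal, all_equal, raised)
```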

Operators

operations with single numbers (broadcasting):

print(a + 1)
# np.array([1, 2, 3, 4])

Some constants are available directly in NumPy:

print(a + np.pi)
print(a + np.e)
print(np.nan)

Element-wise functions

NumPy provides some mathematical functions that are applied element-wise:

print(np.sin(a))
# [0.0, 0.84147098, 0.9... ]
print(np.sqrt(a))
# [0.0, 1.0, 1.414... ]

Element-wise functions

  • abs
  • sin
  • cos
  • sqrt
  • exp
  • log
  • log10
  • round
  • ...

Aggregation functions

Aggregations compute scalar values for an entire array or for each of its rows / columns / ...

sum over all entries:

np.sum(a2d)

sum along axis 0 ("downwards"):

np.sum(a2d, axis=0)

sum along axis 1 ("rightwards"):

np.sum(a2d, axis=1)
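
For the array a2d from the earlier slides, the three calls can be sketched as follows. The result names are chosen for illustration.

```python
import numpy as np

a2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

total = np.sum(a2d)                # one value for the whole array
column_sums = np.sum(a2d, axis=0)  # axis 0 is collapsed: one value per column
row_sums = np.sum(a2d, axis=1)     # axis 1 is collapsed: one value per row

print(total)        # 45
print(column_sums)  # [12 15 18]
print(row_sums)     # [ 6 15 24]
```

The axis passed to the function is the one that disappears from the result.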

Aggregation functions

  • sum
  • min
  • max
  • std
  • percentile

Exercises

(see next slides)

  • prices and amounts → total price
  • kinetic energy
  • centroid of a triangle
  • sine and cosine - value table
  • dice rolls

Exercises

given an array of prices and an array of quantities, determine the total price:

prices = np.array([3.99, 4.99, 3.99, 12.99])
# buying the first item 3 times and the last item 2 times
quantities = np.array([3, 0, 0, 2])

# solution: 37.95

Exercise

given an array of masses and velocities of some bodies, determine the kinetic energy of every body and the total kinetic energy of all bodies together

masses = np.array([1.2, 2.2, 1.5, 2.0])
velocities = np.array([12.0, 14.0, 14.0, 7.5])

formula: E = m*v^2 / 2

Exercises

given the coordinates of the vertices of a triangle, determine its centroid (arithmetic mean of its vertices)

a = np.array([5, 1])
b = np.array([6, 8])
c = np.array([1, 3])

# solution: [4, 4]

Exercises

create a "value table" for the sine and cosine functions in the interval between 0 and 2*pi.

result:

# x, sin(x), cos(x)
np.array([[0.0, 0.01, 0.02, ...],
          [0.0, 0.0099998, 0.0199987, ...],
          [1.0, 0.99995, 0.99980, ...]])

using this data, verify the following equation: sin(x)^2 + cos(x)^2 = 1

Exercises

Simulate 1 million dice rolls with 10 dice each

Boolean indexing

Boolean indexing (long form)

a = np.array([4.1, 2.7, -1, 3.8, -1])

a_valid = a > 0
# array([True, True, False, True, False])
a_filtered = a[a_valid]
# array([4.1, 2.7, 3.8])

a_invalid = a < 0
a_with_nans = np.copy(a)
a_with_nans[a_invalid] = np.nan
# array([4.1, 2.7, nan, 3.8, nan])

Boolean indexing (short form)

a = np.array([4.1, 2.7, -1, 3.8, -1])

a_filtered = a[a > 0]

a_with_nans = np.copy(a)
a_with_nans[a_with_nans < 0] = np.nan

Numeric types

  • int
  • float
  • decimal

Int

an int8 consists of 8 bits and can store 2^8 (256) different numbers

number of representable values for integer types:

  • int8: 256 (-128 to +127)
  • int16: 65,536 (-32,768 to +32,767)
  • int32: 4,294,967,296
  • int64: 18,446,744,073,709,551,616

Int

an unsigned integer (uint) can only represent non-negative numbers

e.g. uint8: 0 to 255
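
The ranges can be queried with `np.iinfo`. The sketch below also shows, as an illustration that goes slightly beyond the slide, that results outside the range wrap around silently in array arithmetic; the names `small` and `wrapped` are hypothetical.

```python
import numpy as np

# smallest and largest representable value for several integer types
for dtype in (np.int8, np.uint8, np.int16, np.int64):
    info = np.iinfo(dtype)
    print(dtype.__name__, info.min, info.max)

# 250 + 10 does not fit into a uint8; the result wraps around modulo 256
small = np.array([250], dtype=np.uint8)
wrapped = small + np.array([10], dtype=np.uint8)
print(wrapped)  # [4]
```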

Float

standardized way of representing real numbers in computers: IEEE 754

  • binary floating point numbers
  • decimal floating point numbers

Float

common floating point types:

  • float32 (single): exact for ~7 decimal digits
  • float64 (double): exact for ~16 decimal digits

Float

rounding errors: some numbers cannot be represented as floating point numbers, they will always be approximations

examples in the decimal system: 1/3, 1/7, π

examples in the binary system (i.e. floats): 1/10, 1/5, 1/3, π

example: π + π evaluates to 6.2 when using decimal numbers with 2 significant digits (3.1 + 3.1); a more exact result would be 6.3

example: 0.1 + 0.2 evaluates to ~ 0.30000000000000004 when using 64 bit floats
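
Because of such rounding errors, floats are usually compared with a tolerance instead of `==`. A minimal sketch using `np.isclose` with its default tolerances (the variable names are illustrative):

```python
import numpy as np

total = 0.1 + 0.2
print(total)

# exact comparison fails because of the rounding error
exactly_equal = total == 0.3
# comparison with a small tolerance succeeds
close = bool(np.isclose(total, 0.3))

print(exactly_equal, close)  # False True
```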

Float

some operations result in loss of precision - e.g. subtracting numbers that are close to each other

example:

a = 0.001234567 (7 significant digits)
b = 0.001234321 (7 significant digits)

c = a - b
c = 0.000000246 (3 significant digits)

Float

Special values in IEEE 754:

  • inf and -inf (infinite values)
  • nan (not-a-number: undefined / unknown value)

Floats in IEEE 754

storage format:

(-) 2^e * s
  • e ... exponent
  • s ... significand / coefficient

Examples

pi as float32:

0 10000000 10010010000111111011011

2*pi as float32:

0 10000001 10010010000111111011011

pi/2 as float32:

0 01111111 10010010000111111011011
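
Bit patterns like these can be reproduced by reinterpreting the 4 bytes of a float32 as an unsigned integer. The helper `float32_bits` below is a hypothetical name introduced for this sketch, not a NumPy function.

```python
import numpy as np


def float32_bits(value):
    """Return the sign, exponent and significand bits of value as float32."""
    # store as float32, then view the same 4 bytes as an unsigned integer
    raw = np.array([value], dtype=np.float32).view(np.uint32)[0]
    bits = format(int(raw), "032b")
    # 1 sign bit, 8 exponent bits, 23 significand bits
    return f"{bits[0]} {bits[1:9]} {bits[9:]}"


print(float32_bits(np.pi))      # 0 10000000 10010010000111111011011
print(float32_bits(2 * np.pi))  # 0 10000001 10010010000111111011011
print(float32_bits(np.pi / 2))  # 0 01111111 10010010000111111011011
```

Multiplying or dividing by 2 only changes the exponent bits; the significand stays the same.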

Examples

numbers 0.20000000, 0.20000001, ... 0.20000005 expressed as closest float32 numbers:

  • 0 01111100 10011001100110011001101
  • 0 01111100 10011001100110011001101
  • 0 01111100 10011001100110011001110
  • 0 01111100 10011001100110011001111
  • 0 01111100 10011001100110011001111
  • 0 01111100 10011001100110011010000

Examples

Avogadro constant (6.02214076 * 10^23):

0 11001101 11111110000110001000001

Planck length (1.61625518 * 10^-35):

0 00001011 01010111101011110110100

Overflow and underflow

largest float32 number:

0 11111110 11111111111111111111111

~ 2^127.9999 ~ 3.403e38

smallest positive float32 number with full precision:

0 00000001 00000000000000000000000

= 2^-126 ~ 1.175e-38

larger numbers will yield inf

smaller numbers will lose precision or yield 0.0
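
These limits can be looked up with `np.finfo`. The following sketch also provokes an overflow and an underflow; the names `too_large` and `too_small` are illustrative, and `np.errstate` only suppresses the overflow warning.

```python
import numpy as np

info = np.finfo(np.float32)
print(info.max)   # largest finite float32 value
print(info.tiny)  # smallest positive float32 value with full precision

# overflow: doubling the largest value yields inf
with np.errstate(over="ignore"):
    too_large = np.array([info.max], dtype=np.float32) * np.float32(2)
print(too_large)

# underflow: a value far below the representable range becomes 0.0
too_small = np.float32(1e-50)
print(too_small)
```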

Special values

inf: 0 11111111 00000000000000000000000

nan: 0 11111111 00000000000000000000001 (or any other non-zero significand)

Array types

Each NumPy array can only hold data of one type (e.g. only 64 bit floats or only bytes)

Array types

Each array has a predefined data type for all entries

a = np.array([1])
a.dtype # int64 on most platforms (int32 on Windows with NumPy 1.x)
b = np.array([1.0])
b.dtype # float64
c = np.array(['abc'])
c.dtype # <U3
d = np.array([b'abc'])
d.dtype # |S3

Array types

Types may be stated explicitly:

a = np.array([1, 2, 3, 4], dtype='int64')
b = np.array([1, 2, 3, 4], dtype='uint8')

If possible, types are converted automatically:

c = a + b
c.dtype # int64
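
A small sketch of how the resulting type can be inspected in advance with `np.result_type`, and how an explicit conversion can be done with `astype`. The names `mixed` and `as_float` are illustrative.

```python
import numpy as np

a = np.array([1, 2, 3, 4], dtype="int64")
b = np.array([1, 2, 3, 4], dtype="uint8")

# type that can hold the values of both operands
print(np.result_type(a, b))  # int64

# combining integers with floats yields floats
mixed = a + np.array([0.5])
print(mixed.dtype)  # float64

# explicit conversion creates a new array with the requested type
as_float = b.astype("float32")
print(as_float.dtype)  # float32
```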

Array types

common types:

  • bool / bool_ (stored as 8 bits)
  • int8, int16, int32, int64
  • uint8, uint16, uint32, uint64
  • float16, float32, float64

Float types

precision for float types:

  • float16: ~3 decimal digits
  • float32: ~7 decimal digits
  • float64: ~16 decimal digits

Float types

float16: exact for ~3 decimal digits

np.array([2.71828, 0.271828], dtype="float16")
# array([2.719 , 0.2717])

Float types

float16: overflow

np.array([65450, 65500, 65550], dtype="float16")
# array([65440, 65500, inf])

float16: underflow

np.array(
    [3.141e-5, 3.141e-6, 3.141e-7, 3.141e-8, 3.141e-9],
    dtype="float16"
)
# array([3.14e-05, 3.16e-06, 2.98e-07, 5.96e-08, 0.00e+00])

NumPy advanced

Views

Several operations in NumPy produce views of the data: multiple NumPy arrays can refer to the same underlying data (for efficiency)

Views

comparison: creating a copy of a list, creating a view of an array

numbers = [1, 2, 3]  # avoid the name "list", which would shadow the built-in type
numbers_copy = numbers[:]
numbers_copy[0] = 10 # does NOT change numbers

array = np.array([1, 2, 3])
array_view = array[:]
array_view[0] = 10 # DOES change array

Copying arrays

Arrays can be copied via np.copy()
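
The difference between a view and a copy can be sketched as follows; `np.shares_memory` reports whether two arrays refer to the same data. The variable names are illustrative.

```python
import numpy as np

original = np.array([1, 2, 3])
view = original[:]
copy = np.copy(original)

view[0] = 10  # also changes original
copy[1] = 20  # leaves original untouched

print(original)  # [10  2  3]
print(np.shares_memory(original, view))  # True
print(np.shares_memory(original, copy))  # False
```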

Reshaping arrays

np.reshape(a3d, (8, )) # 1d array
np.reshape(a3d, (2, 4)) # 2d array

automatic sizing for one axis:

np.ravel(a3d) # 1d array
np.reshape(a3d, (-1, )) # 1d array
np.reshape(a3d, (2, -1)) # 2d array

these operations return views where possible (otherwise copies, e.g. for data that is not contiguous in memory)
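
Whether a reshaped array is a view can be checked with `np.shares_memory`. The sketch below shows one case where a view is returned and one where NumPy has to copy; the names `flat` and `flat_t` are illustrative.

```python
import numpy as np

a3d = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])
a2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# reshaping a freshly created array gives a view ...
flat = np.reshape(a3d, (-1,))
flat[0] = 100  # ... so this change is visible in a3d as well
print(a3d[0, 0, 0])  # 100

# the transposed array is not contiguous in memory, so ravel must copy
flat_t = np.ravel(a2d.T)
print(np.shares_memory(a3d, flat))    # True
print(np.shares_memory(a2d, flat_t))  # False
```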

Transposing

reversing order of axes (flipping axes in 2D):

np.transpose(a2d)

a2d.T

Concatenating arrays

concatenating along an existing axis (axis 0 by default):

np.concatenate([a1d, a1d])
np.concatenate([a2d, a2d])
np.concatenate([a2d, a2d], axis=1)

concatenating along a new axis:

np.stack([a1d, a1d])

Linear algebra

np.transpose(m)
np.linalg.inv(m)
np.eye(2) # unit matrix
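
A minimal sketch of these functions, assuming a hypothetical invertible 2x2 matrix `m` (the slide does not define one): multiplying a matrix by its inverse gives the unit matrix, up to rounding errors.

```python
import numpy as np

# hypothetical example matrix; any invertible square matrix would do
m = np.array([[2.0, 1.0],
              [1.0, 1.0]])

m_inv = np.linalg.inv(m)
identity = np.eye(2)

print(m_inv)
# compare with a tolerance because of floating point rounding
print(np.allclose(m @ m_inv, identity))  # True
```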

Array multiplication

via the binary operator @

example: rotating several points by 45° / 90° (counterclockwise):

points = np.array([[0, 0], [0, 1], [1, 1], [1, 0]])

m = np.array([[np.sqrt(0.5), np.sqrt(0.5)],
              [-np.sqrt(0.5), np.sqrt(0.5)]])

print(points @ m)
print(points @ m @ m)

Array multiplication

example:

known data: prices of various products, number of items in stock for different stores

prices = np.array([3.99, 12.99, 5.90, 15])
quantities = np.array([[0, 80, 80, 100],
                       [100, 0, 0, 0],
                       [50, 0, 0, 50]])

wanted: total value for each of the three stores
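
One possible way to compute this, sketched with the `@` operator: each row of quantities is multiplied entry by entry with the prices and summed. The name `store_values` is chosen here for illustration.

```python
import numpy as np

prices = np.array([3.99, 12.99, 5.90, 15])
quantities = np.array([[0, 80, 80, 100],
                       [100, 0, 0, 0],
                       [50, 0, 0, 50]])

# (3 stores x 4 products) @ (4 prices) gives one total per store
store_values = quantities @ prices
print(store_values)  # approximately [3011.2, 399.0, 949.5]
```

The same result could be obtained with `np.sum(quantities * prices, axis=1)`, which makes the element-wise multiplication and the row-wise sum explicit.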