use: managing big amounts of data
examples:
Popularity according to Stack Overflow Developer Survey 2020
basic operations for database records:
SQL:
INSERT INTO product (name, category, price)
VALUES ('IPhone', 'electronics', 699);
MongoDB shell:
db.products.insertOne({
name: 'IPhone',
category: 'electronics',
price: 699,
});
SQL:
SELECT * FROM product
WHERE category = 'electronics';
MongoDB shell:
db.products.find({ category: 'electronics' });
SQL:
SELECT name, price FROM product
WHERE category = 'electronics';
MongoDB shell:
db.products
.find({ category: 'electronics' })
.project({ name: 1, price: 1 });
SQL:
UPDATE product
SET category = 'phones'
WHERE name = 'IPhone';
MongoDB shell:
db.products.updateOne(
{ name: 'IPhone' },
{ $set: { category: 'phones' } }
);
SQL:
DELETE FROM product
WHERE name = 'IPhone';
MongoDB shell:
db.products.deleteOne({ name: 'IPhone' });
Create / change / query data in an online playground
SQL (Structured Query Language): developed in the 1970s, many variants
MongoDB: released in 2009
SQL: entries are stored in tables with predefined field names and field types (database schema)
MongoDB: entries (documents) in a collection may generally have arbitrary fields; a schema may be defined
SQL: standardized language (in theory), independent of any programming language
MongoDB: direct bindings for programming languages
entries automatically get a unique _id
field:
entry = {
_id: ObjectId('5e715e1b31315b0be066db84'),
name: 'Argentina',
continent: 'South America',
};
MongoDB is based on the BSON file format. It resembles JSON, but it is a binary format and can be read and written more efficiently
Importing and exporting can be done via the programs mongodump
and mongorestore
Atlas: hosted MongoDB databases provided by the developers of MongoDB (Login via Google supported)
try to:
MongoDB shell = simple command line interface for MongoDB that comes with MongoDB
try it online:
https://docs.mongodb.com/manual/tutorial/getting-started/
use a subset of MongoDB shell in pure JavaScript (without installing MongoDB):
important commands:
.insertOne
.insertMany
.find
.findOne
.updateOne
.replaceOne
.deleteOne
.deleteMany
creating entries in a collection:
db.countries.insertOne({
name: 'Argentina',
continent: 'South America',
});
creating multiple entries at once:
db.countries.insertMany([
{ name: 'Finland', continent: 'Europe' },
{ name: 'Greece', continent: 'Europe' },
]);
reading an array of all entries:
db.countries.find();
only query for some specific entries:
db.countries.find({ continent: 'Europe' });
reading a single entry via findOne
:
db.countries.findOne({ name: 'Greece' });
changing an entry by setting its population:
db.countries.updateOne(
{ name: 'Argentina' },
{ $set: { population: 44 } }
);
replacing an entry:
db.countries.replaceOne(
{ name: 'Brazil' },
{ name: 'Brazil', population: 210 }
);
deleting an entry:
db.countries.deleteOne({ name: 'Finland' });
deleting all entries:
db.countries.deleteMany({});
Create and modify a contact database
db.todos.find({ completed: false }).count();
$text
$regex
$gt
, $gte
, $lt
, $lte
$in
db.products.find({ name: { $text: 'fairphone' } });
db.products.find({
category: 'phone',
price: { $lt: 300 },
});
db.products.find({
category: { $in: ['laptop', 'tablet'] },
price: { $lt: 400 },
});
see: https://docs.mongodb.com/manual/reference/operator/query/
retrieving only specific fields:
db.products
.find({ category: 'phone' })
.project({ name: 1, price: 1 });
retrieves only name and price (and _id) of all phones
SQL = Structured Query Language
Standardized query language for tabular data
The standard is newer than most implementations
Standardized by ANSI and ISO
Old drafts of the standard (free):
open source:
proprietary:
https://db-fiddle.com (PostgreSQL, MySQL, SQLite)
https://www.w3schools.com/sql/trysql.asp?filename=trysql_select_all (SQLite)
Desktop application:
https://sqlitebrowser.org/ (SQLite)
SQL statements are terminated by ;
Two variants of comments:
/* multi-line
comment */
-- single line comment
SQL is motly case insensitive
Convention: keywords capitalized, rest lowercase and separated by underscores
example:
SELECT first_name, last_name, tel FROM person;
Table names and column names are converted to uppercase by SQL, e.g. first_name
→ FIRST_NAME
, person
→ PERSON
Exceptions: In PostgreSQL names are converted to lowercase
If names should be used in their exact spelling they need to be quoted; this is rather uncommon
SELECT "First_Name", "Last_Name", "Tel" FROM "Person";
Exception: In MySQL would use backticks (`) in place of quotes; the mode ANSI_QUOTES
can be enabled for standards compliant behavior
command: CREATE TABLE
CREATE TABLE person(
name VARCHAR(50),
tel VARCHAR(20)
);
ISO / ANSI SQL Standard (selection):
BOOLEAN
INT
/ INTEGER
, SMALLINT
, BIGINT
NUMERIC
/ DECIMAL
REAL
, DOUBLE PRECISION
VARCHAR(n)
VARBINARY(n)
DATE
, TIME
, TIMESTAMP
INSERT INTO person (name, tel)
VALUES ('John Smith', '012345');
short form:
INSERT INTO person
VALUES ('John Smith', '012345');
Querying data of all persons:
SELECT name, tel FROM person;
or
SELECT * FROM person;
SELECT tel
FROM person
WHERE name = 'John Smith';
SELECT tel
FROM person
WHERE name LIKE '% Smith'
AND tel LIKE '+49%';
UPDATE person
SET name = 'John Miller'
WHERE name = 'John Smith';
DELETE FROM person
WHERE name = 'John Miller';
Create and modify a contact database
validation via JSON schema, e.g.:
const elementSchema = {
bsonType: 'object',
required: [
'atomic_number',
'symbol',
'name',
'atomic_mass',
],
properties: {
atomic_number: {
bsonType: 'int',
minimum: 1,
},
symbol: {
bsonType: 'string',
},
name: {
bsonType: 'string',
},
atomic_mass: {
bsonType: 'double',
},
},
};
db.createCollection('elements', {
validator: { $jsonSchema: elementSchema },
});
ISO / ANSI SQL Standard (selection):
BOOLEAN
INT
/ INTEGER
, SMALLINT
, BIGINT
REAL
, DOUBLE PRECISION
VARCHAR(n)
VARBINARY(n)
DATE
, TIME
, TIMESTAMP
Represented by the expressions TRUE
and FALSE
Deviations from the standard:
BOOLEAN
→ BIT
0
and 1
SMALLINT
(commonly 16 Bit)INT
/ INTEGER
(commonly 32 Bit)BIGINT
(commonly 64 Bit)NUMERIC
/ DECIMAL
(decimal numbers with variable accuracy)REAL
(commonly 32 Bit)DOUBLE PRECISION
(commonly 64 Bit)Deviations from the standard:
REAL
→ FLOAT
Decimal number with 10 digits before and 2 digits after the decimal point:
DECIMAL(12, 2)
MySQL distinguishes between:
SMALLINT
(-32768 to 32767)UNSIGNED SMALLINT
(0 to 65535)VARCHAR(10)
: Text of up to 10 charactersUsually Unicode is supported
On SQL Server: Use NVARCHAR
for Unicode support (needs twice as much space)
On Oracle: use VARCHAR2
Maximum length does not affect size on disk (but may affect size in memory)
Text is always delimited by single quotes:
INSERT INTO test VALUES ('Hello');
Escaping quotes by doubling:
INSERT INTO test VALUES ('Let''s go');
VARBINARY(n)
: Byte sequence with maximum length n
Deviations from the standard:
BYTEA
BLOB
, but VARBINARY
is acceptedTypes:
DATE
TIME
TIMESTAMP
: date and timeexamples:
'2013-02-14'
(recommended format)'13:02:17'
, '13:02:17.232'
'2013-02-14 13:02:17'
, '2013-02-14T13:02:17'
Deviations from the standard:
SQLite
TIMESTAMP
→ DATETIME
Limitations:
TIMESTAMP
is limited to years between 1970 and 2038 - a better alternative would be DATETIME
entries:
CREATE TABLE element(
atomic_number INT,
symbol VARCHAR(2),
name VARCHAR(20),
atomic_mass REAL
);
Constraining columns:
NOT NULL
UNIQUE
PRIMARY KEY
FOREIGN KEY
)Entry may not be left blank
CREATE TABLE element(
atomic_number INT NOT NULL,
symbol VARCHAR(3) NOT NULL,
name VARCHAR(20) NOT NULL,
atomic_mass REAL NOT NULL
);
Each entry in a column must be unique
CREATE TABLE element(
atomic_number INT NOT NULL UNIQUE,
symbol VARCHAR(2) NOT NULL UNIQUE,
name VARCHAR(20) NOT NULL UNIQUE,
atomic_mass REAL NOT NULL
);
Enables a unique identification of a row in a table
A primary key is always unique and not null
natural key:
CREATE TABLE element(
atomic_number INT PRIMARY KEY,
symbol VARCHAR(2) NOT NULL UNIQUE,
name VARCHAR(20) NOT NULL UNIQUE,
atomic_mass REAL NOT NULL
);
surrogate key:
CREATE TABLE element(
id INT PRIMARY KEY,
atomic_numer INT,
symbol VARCHAR(2) NOT NULL UNIQUE,
name VARCHAR(20) NOT NULL UNIQUE,
atomic_mass REAL NOT NULL
);
Automatic creation of a numeric primary key starting at 1:
Standard SQL (implemented in PostgreSQL, Oracle):
CREATE TABLE element(
id INT PRIMARY KEY GENERATED ALWAYS AS IDENTITY;
...
);
other variants:
AUTO_INCREMENT
AUTOINCREMENT
SERIAL
SQLite will always automatically generate a unique numeric key under the name rowid
In general: ordered lists can be searched much quicker than unordered lists (binary search)
Example: in the phone book we can quickly search for a last name, but not for a first name
Indexes may be added to one or more columns (additional data that references the entries in a specific order)
CREATE INDEX idx_name
ON element (name);
Querying for names will now be faster
CREATE TABLE element(
atomic_number INT,
symbol VARCHAR(2) NOT NULL UNIQUE,
name VARCHAR(20) NOT NULL UNIQUE,
atomic_mass REAL NOT NULL
);
CREATE INDEX idx_name
ON element (name);
INSERT INTO element(atomic_number, symbol, name, atomic_mass)
VALUES (1, 'H', 'Hydrogen', 1.008);
INSERT INTO element(atomic_number, symbol, name, atomic_mass)
VALUES (2, 'He', 'Helium', 4.0026);
SELECT *
FROM element
WHERE name='Hydrogen';
SQL: mostly scales vertically: adding resources to an existing server
MongoDB: mostly scales horizontally: adding additional servers (via sharding)
SQL: mostly uses atomic entries (and first normal form)
MongoDB: often includes composite entries (arrays, objects):
{
"name": "sue",
"groups": ["news", "sports"]
}
1 : 1
1 : n
m : n
0..1 : 1..1
0..1 : 0..n
0..m : 0..n
https://en.wikipedia.org/wiki/Entity%E2%80%93relationship_model
Properties of a database that guarantee its validity (protecting against errors):
tables:
template: Chinook Musikdatenbank
CREATE TABLE artist(
id INT PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
name VARCHAR(20) NOT NULL,
country VARCHAR(5) NOT NULL,
year SMALLINT NOT NULL
);
INSERT INTO artist (name, country, year)
VALUES ('The Beatles', 'UK', 1960);
INSERT INTO artist (name, country, year)
VALUES ('AC/DC', 'AUS', 1973);
Reference to one entry in another table
example: each entry in the table song may reference an entry in the table artist via the column artist_id
CREATE TABLE song(
id INT PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
title VARCHAR(30) NOT NULL,
artist_id INT,
FOREIGN KEY(artist_id) REFERENCES artist(id)
);
INSERT INTO song (title, artist_id)
VALUES ('Let it Be', 1);
A foreign key guarantees the existence of a matching entry in the other associated table
INSERT INTO song (title, artist_id)
VALUES ('Wish You Were Here', 10);
→ error
SELECT song.title, artist.name
FROM artist
INNER JOIN song
ON artist.id=song.artist_id;
The above code lists all combinations where artist.id
and song.artist_id
match
SELECT song.title, artist.name
FROM song
LEFT JOIN artist
ON artist.id=song.artist_id;
The above code lists all combinations and also includes songs that don't have an artist defined
SQL:
SELECT * FROM iris;
SQLAlchemy (Python):
session.query(Iris)
mongo shell (JS):
db.iris.find({});
Pandas (Python): N/A
SQL:
SELECT sepal_length, sepal_width FROM iris;
SQLAlchemy (Python):
session.query(Iris.sepal_length, Iris.sepal_width)
Mongo shell:
db.iris.find({}, { sepal_length: 1, sepal_width: 1 });
Pandas:
iris_data.loc[:,["sepal_length", "sepal_width"]]
SQL:
SELECT * FROM iris WHERE name='Iris-setosa';
SQLAlchemy (Python):
session.query(Iris).filter(Iris.name="Iris-setosa")
mongo shell:
db.iris.find({ name: 'Iris-setosa' });
pandas (Python):
iris_setosa_data = iris_data.loc[
iris_data["name"] == "Iris-setosa"
]
pandas (Python): selecting a range of rows:
iris_data.iloc[10:20]
SQL:
SELECT sepal_length, sepal_width
FROM iris
WHERE name='Iris-setosa';
SQLAlchemy (Python):
session.query(
Iris.sepal_length, Iris.sepal_width
).filter(Iris.name="Iris-setosa")
mongo shell:
db.iris.find(
{ name: 'Iris-setosa' },
{ sepal_length: 1, sepal_width: 1 }
);
pandas (Python):
iris_data.loc[
[iris_data["name"] == "Iris-setosa"],
["sepal_length", "sepal_width"],
]
SQL:
SELECT sepal_length, sepal_width
FROM iris
ORDER BY sepal_length;
SQLAlchemy:
session.query(
Iris.sepal_length, Iris.sepal_width
).order_by(Iris.sepal_length)
mongo shell:
db.iris
.find({}, { sepal_length: 1, sepal_width: 1 })
.sort({ sepal_length: 1 });
pandas (Python):
iris_data.loc[["sepal_length", "sepal_width"]].sort_values(
by="sepal_length"
)