This chapter is all about SQL, which is the industry standard database query language.
What is SQL?
Structured Query Language, or SQL, is a language used to communicate with a database. It is the standard language for a relational database, and is used for structuring, storing, manipulating, and retriving data. While SQL is a standard language, every database implements its own version of SQL with slightly different behaviours. SQL usage between databases doesn't vary widely, so the techniques here will transfer readily to other database systems.
SQL is split into two main sublanguages; a Data Definition Language (DDL) and a Data Manipulation Language (DML). It can also be generated programmatically through other applications; for example, you may have worked with JDBC in the past. A popular method of generating SQL from object-oriented definitions is an ORM like SQLAlchemy. We'll jump right into describing the SQL syntax and usage.
All data in a relational database is stored in tables. A table is not unlike an Excel table; each column defines an attribute name and type, and each row is an instance of the table's schema. A schema is a blueprint for what data in the given table should look like, including types and constraints.
To create a table in SQL, we use the
CREATE TABLE syntax to define our table name, attribute names and types, as well as any constraints on our data. A sample table creation is below:
CREATE TABLE person ( id int PRIMARY KEY, full_name varchar(255) NOT NULL, age int );
Here, we've defined a table called
person with three attributes,
id attribute is of type
int and is a primary key - this means that it uniquely identifies a row in the
person table and cannot be null. The
full_name attribute if of type
varchar(255) - a string of up to length 255 characters - and cannot be null. The
age attribute is of type
int and can be null.
If we define multiple tables, we can define foreign key relationships between them. A foreign key in one table references the primary key of another, and is the key behind the relationality of a relational database:
CREATE TABLE dogs ( id int PRIMARY KEY, pet_name varchar(255) NOT NULL, owner_id int REFERENCES person(id) );
To drop a table, use the
DROP TABLE <table_name> syntax. This will delete the table and all of the data in it.
To alter a table, use the
ALTER TABLE <table_name> syntax. You can
RENAME columns and constraints.
Constraints are a way for SQL to check that certain properties always hold for your data. You've already seen some examples of constraints:
NOT NULL and
PRIMARY KEY are two such constraints. Constraints on a single attribute can appear after the attribute in table creation. Constraints on multiple attributes can appear outside of any column declaration. As we've studied, certain sets of constraints can fulfill certain properties, which can make our databases easier to work with; try to conform to a normal form if possible. The following is an example of some common constraints:
CREATE TABLE Cats ( id int PRIMARY KEY, -- UNIQUE and NOT NULL num_legs int NOT NULL, -- Cannot be NULL pet_name varchar(255) UNIQUE, -- Must be unique owner_id int DEFAULT 0, -- Specify a default value
Constraints can come after DDL:
CREATE TABLE Cats ( id int, owner_id int, PRIMARY KEY (id), FOREIGN KEY (owner_id) REFERENCES Person(id)
And here's an example of table
CREATE TABLE Cats ( id int PRIMARY KEY, num_legs int NOT NULL, CHECK (num_legs < 5)
Global constraints are not supported by all databases, and are thus omitted from discussion here.
Now that we understand how data is organized into tables, we can start selecting data from tables. Select queries in SQL follow a basic syntax:
SELECT <columns...> FROM <tables...> WHERE <filters...>;
For example, let's say I had the
Person table above. I could select all of the rows and columns:
SELECT * FROM Person;
SELECT id, age FROM Person;
I could order my output or limit the number of rows emitted:
SELECT * FROM Person ORDER BY age [ASC|DESC];
SELECT * FROM Person LIMIT 10;
I can ask for non-duplicate output:
SELECT DISTINCT full_name FROM Person;
I can also rename columns when outputting:
SELECT id AS person_id FROM Person;
Inserts, Updates, and Deletes
To insert data into a table, use the
INSERT INTO syntax, optionally specifying the order of the columns you'll be inserting:
INSERT INTO Person (id, full_name, age) VALUES (1, "Barry", 19); INSERT INTO Person VALUES (1, "Barry", 19);
Updates can be done like so:
UPDATE Person SET age=20 WHERE id=1;
DELETE FROM Person WHERE id=1;
To delete all data, but leave the table intact:
DELETE FROM Person;
Filters and Aggregation
Specifying filters on your data is the job of the
WHERE clause. You can use comparison operations (=, <, <=, >, >=), text operators (LIKE), range operators (BETWEEN), or many more (IS NOT NULL, IN). Each filter can be joined using logical operators
NOT. The following are some examples:
SELECT id FROM Person WHERE id < 10;
SELECT id FROM Person WHERE full_name = "Barry the Bee";
SELECT id FROM Person WHERE id BETWEEN 10 AND 100;
SELECT id FROM Person WHERE age IS NOT NULL;
SELECT id FROM Person WHERE full_name LIKE "B%"; /* Everything that starts with 'B' */
You can also aggregate your data using a suite of aggregation functions:
SUM, and more. To get the average of each subset of data, use the
GROUP BY clause. The attribute you group by must be a selected column. To filter data based on aggregations, use the
HAVING clause. The following are some examples:
/* Get the average age across all people. */ SELECT AVG(age) FROM Person;
/* Get the number of people in each age group. */ SELECT COUNT(*), age FROM Person GROUP BY age;
/* Get the age groups with more than 10 people in them. */ SELECT COUNT(*) AS num, age FROM Person GROUP BY age HAVING num > 10;
Given multiple tables, it may be beneficial to combine tables on some common attribute. Given the
Person relations above, it might be useful to join the
Person(id) attribute with the
Cat(owner_id) attribute. To do so, SQL exposes a number of
JOIN operations, each one preserving a different amount of data on either side of the join.
We'll focus on
CROSS JOIN and
INNER JOIN. The
CROSS JOIN returns every possible combination of rows from the left and right table combined. If the left table had \(m\) rows, and right right \(n\) rows, expect to have \(mn\) rows in the joined table. An example is below:
-- Get all pairs of people and cats SELECT * FROM Person, Cat;
INNER JOIN returns pairs of rows that share some common attribute. If a row on one of the two tables has no corresponding match in the other table, then it will simply not be included in the result. Other joins (
OUTER JOIN) will include elements with no matches. An example is below:
-- Get all owners and pets together SELECT * FROM Person INNER JOIN Cat ON Person.id = Cat.owner_id;
NATURAL (INNER) JOIN is an
INNER JOIN that infers the columns that should be joined on.
Given the results of multiple subqueries, SQL allows you to combine the results using set operations. The
UNION (ALL) operator adds the two result sets together, returning rows that are in either set (adding
ALL preserves duplicates). The
INTERSECT operator intersects the result sets, returning rows that are in both sets. The
MINUS operator returns rows that are in the first set, but not the second. An example of how to use one of these operators is below:
-- Gets everybody below 10 years of age and above 18 years of age. (SELECT * FROM Person WITH age < 10) UNION -- Add the top set to the bottom set (SELECT * FROM Person WITH age > 18)
Sometimes for very complex queries, it is helpful to break a query up into several smaller queries, then combine their results. This is achievable using SQL subqueries, which can be nested inside of a SELECT, INSERT, UPDATE, or DELETE statement by enclosing the subquery with parentheses. For example:
(SELECT * FROM Person WITH age == ( SELECT MAX(age) FROM Person ));
It is often helpful to avoid nesting by naming the subqueries:
WITH maximum_age(age) AS (SELECT MAX(age) FROM Person) SELECT * FROM Person WITH age == maximum_age.age;
Where to now?
There is a whole host of advanced SQL functions, including views, triggers, indices, functions, recursion, and more. Depending on the system you are using or building, a different set of these features may be available at varying levels of fidelity. For reference on what might be available, peruse the PostgreSQL documentation!