SQL

For Data Science

Matt Deitke

2019

ID Name

1 ‘a’

2 ‘b’

3 ‘c’

4 ‘d’

5 ‘e’

ID Name

4 ‘v’

5 ‘w’

6 ‘x’

7 ‘y’

8 ‘z’

Last updated: August 6, 2019

Acknowledgements

The following is a collection of my lecture notes on the topic of SQL. These notes primarily

follow the Coursera course “SQL for Data Science” from the University of California, Davis

taught by Sadie St. Lawrence,

with the exception of my choice of using Python as a running

example for demonstration.

With Google Colab, I have uploaded the code for each section to a Jupyter Notebook,

which

will allow anybody to follow along and run the code locally. The perminant URL to these

notes, and others I have written about, can be found on my website mattdeitke.github.io.

If I have made an error assigning credit to somebody’s work, a technical error, or a grammat-

ical error, please email me mdeitke23@gmail.com or Tweet me @MattDeitke. More notes

and information about the author are located on his website at mattdeitke.github.io.

http://sadiestlawrence.com/

https://colab.research.google.com/drive/1CPQmPIx -r9SnCrf08IiC7PvMp3Ra9W6

Contents

1 Introduction 2

1.1 ER Diagrams . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2

1.2 Connecting a database to Python . . . . . . . . . . . . . . . . . . . . . . . . . 3

1.3 Creating tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4

1.4 Retrieving data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5

1.5 Temporary tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

1.6 Adding comments to SQL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

1.7 Pandas dataframe from SQLite database . . . . . . . . . . . . . . . . . . . . . 8

2 Filtering, sorting, and calculations 10

2.1 WHERE operator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10

2.2 IN, OR, AND, and NOT operators . . . . . . . . . . . . . . . . . . . . . . . . 11

2.3 Wildcards . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12

2.4 ORDER BY operator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13

2.5 Math operators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13

2.6 Aggregate functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13

2.7 GROUP BY and HAVING command . . . . . . . . . . . . . . . . . . . . . . . 15

2.8 Order of clauses in a SELECT statement . . . . . . . . . . . . . . . . . . . . 15

3 Subqueries and joins 17

3.1 Using subqueries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17

3.2 Joining tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18

3.2.1 Cartesian (cross) joins . . . . . . . . . . . . . . . . . . . . . . . . . . . 18

3.2.2 Inner joins . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20

3.2.3 Aliases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20

3.2.4 Self join . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21

3.2.5 Left join . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22

3.2.6 Right join . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23

3.2.7 Full outer join . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24

3.2.8 Union . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24

CONTENTS iii

4 Modifying and analyzing data 26

4.1 Strings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26

4.1.1 Concatenations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26

4.1.2 Trimming strings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27

4.1.3 Substring . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27

4.1.4 Uppercase and lowercase . . . . . . . . . . . . . . . . . . . . . . . . . . 28

4.1.5 Date and time strings . . . . . . . . . . . . . . . . . . . . . . . . . . . 28

4.2 Case statements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29

4.3 Views . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30

Notation

We will be using the monospace font to denote methods, function calls, and variable names.

Code blocks will have a background. Each Python code block will be in gray with an

occassional output block in white.

Python code block

Output

Each SQL code block will have a yellow background with the output in a white background.

SQL code block

Output

An output block will always follow an SQL or Python code block, so there is no need

to change the style for an output block in each language. We will also be running SQL

commands inside of a Python environment, in which case we will use a Python code block.

Chapter 1

Introduction

SQL

(Structured Query Language) is a programming language that will allow us to ma-

nipulate large amounts of data. In particular, we can query, insert, update, and modify

data in a database. Unlike most languages, SQL is not used to write applications; instead,

it is used to communicate with databases. The primary goal we have when using SQL is to

read or retrieve data, write dat a, and update data. To carry out these tasks, we use

SQL with a relational database management system, which include some popular ones like

Microsoft SQL Server, MySQL, IBM DB2 Oracle, Sybase ASE, PostgreSQL, SQLite, and

Apache Open Oﬃce Base. When using diﬀerent database management systems, the speciﬁc

syntax for a SQL task may vary slightly, but will generally follow the same structure.

Each database can be thought of as a collection of tables. In each table we will have rows

and columns. The columns will store the diﬀerent variables and the rows will record distinct

observations. As we have gotten more data in the recent years, other types of relational

databases have come up, including NoSQL (Not Only SQL), which gives users a way to

store and retrieve unstructured data without tabular relations.

The main two types of database models include relational models and transactional models.

With relational models, data is neatly stored and organized into a table, allowing the user to

easily look-up and access a particular piece of information. In contrast, transactional models

are much less organized and often need to be translated into a relational model in order to

easily access the information. For example, a doctor scribbling down everything that the

patient says is unstructured and would be added to a transactional model. If somebody

went in and organized the doctors noted into distinct categories, such as symptoms, blood

pressure, temperature, then that data would be stored like a relational model.

1.1 ER Diagrams

Relational models specify three things: entities, attributes, and relationships. An entity

is used to diﬀerentiate between observations, which could be a name or an ID. The at-

tributes are the values of the variables for each entity. The relationships of an entity and

attributes can be described using graph-like models. We call the graph-like models Entity-

Relationship (ER) Diagrams. If each entry maps to many other entries, we say the

relationship is one-to-many. Similarily, there exist relationships that are many-to-many and

one-to-one. An example of a one-to-many relationship would be tracking a single customers

entire invoice history.

We use ER diagrams to show how diﬀerent tables are linked together. ER diagram notations

may vary. The primary notations are the Chen Notation, Crow’s Foot Notation, and UML

Class Diagram Notation, which are shown in Figure 1.1.2.

Pronounced “see-quel” or “S-Q-L”

1.2. CONNECTING A DATABASE TO PYTHON 3

Figure 1.1.1: ER diagram

Figure 1.1.2: Diﬀerent types of ER diagram notations

1.2 Connecting a database to Python

In Python, we will be using the relational database management system SQLite with the

module sqlite3. To use the module, we simply can import it.

import sqlite3

We will be using a local database as an example throughout this text, but one could ﬁnd

many ways to connect to a database on the cloud in practice. To connect to a database,

we will need to specify the path using the connect(path) method. For now, let us call our

database data.db. If there is no ﬁle in the path speciﬁed, the ﬁle will automatically be

created.

conn = sqlite3.connect(‘data.db’)

For developmental purposes, we could also deﬁne a database inside of Python memory by

specifying the path to be :memory:. Now, instead of creating a new database for develop-

mental purposes, Python will create and destroy a new database after each session.

conn = sqlite3.connect(‘:memory:’)

4 CHAPTER 1. INTRODUCTION

It is typical to use the variable name conn when connecting to a database. Once we have

connected to the database, we need to create a cursor with the cursor() method, which

will enable us to run SQL commands.

c = conn.cursor()

It is also typical to use c as the variable name when creating a cursor, but it does not

matter as long as the variable name is consistent throughout the script. Now, we have

setup everything necessary in order to run an SQL command, which can be done using the

execute() method from the cursor object.

c.execute(<INSERT SQL COMMAND>)

We will come back to speciﬁc commands we can run in our database soon, but for now, we

can end our session by closing the connection to the database with the close() command.

conn.close()

1.3 Creating tables

To start exploring SQL commands, we will begin with an empty database and create a table

named Shoes using the SQL command CREATE TABLE.

CREATE TABLE Shoes (

ID char(12) PRIMARY KEY,

Brand char(10) NOT NULL,

Type char(250) NULL,

Color char(250) NULL,

Price decimal(8,2) NULL,

Desc Varchar(750) NULL

);

Each parameter that we specify inside of the CREATE TABLE method creates a new column.

We use PRIMARY KEY to characterize the entity. We also deﬁne the variable types that the

values in the column will take on along with if null values are accepted. The variable type

char must have a ﬁxed character length, while the length of Vachar can diﬀer. We would

use null values if we did not have all the information for a particular observation. Primary

keys can never have a null value.

To add an observation into our table we will use the INSERT INTO command.

INSERT INTO Shoes VALUES (

‘1234567890’,

‘Nike’,

‘Air Jordan’,

‘Red’,

‘450.00’,

NULL

);

Notice that when using the INSERT INTO command, each value must correspond to the

corect column. It may be hard to remember or know the order of the columns, so instead,

we can be more explicit when adding values, by ﬁrst specifying the column order.

1.4. RETRIEVING DATA 5

INSERT INTO Shoes (

ID,

Brand,

Type,

Color,

Price,

Desc

) VALUES (

‘1234567890’,

‘Nike’,

‘Air Jordan’,

‘Red’,

‘450.00’,

NULL

);

This also gives us the ﬂexibity to insert less values than there are columns.

In order to run SQL commands from a Python interface, we must use the execute(SQL

command) command. We will assume a database has been connected with the variable conn

and a cursor, c, was created from that connection.

c.execute(‘‘‘

CREATE TABLE Shoes (

ID char(12) PRIMARY KEY,

Brand char(10) NOT NULL,

Type char(250) NULL,

Color char(250) NULL,

Price decimal(8,2) NULL,

Desc Varchar(750) NULL

);

’’’)

Note that the triple quote block allows us to write strings in a multiline environment.

However, even after running this code our database has not been modiﬁed. In order to

permanently change the database from python we must call commit() from the variable

connected to the database. We can think of a database commit similar to a commit when

using Git.

conn.commit()

Regardless of the SQL command, we will use the same execute(SQL command) method in

order to stage changes to our database. Therefore, if we do not explicitly write the Python

code block after an SQL command is newly introduced, then it is implied that in order

to use it in Python, one must call c.execute(SQL command). We do this because SQL is

frequently used outside of Python and this will allow us to focus our attention more on SQL

at times.

1.4 Retrieving data

To get data from a table, we must specify at least two things: what information we would

like and the name of the table. Using the Shoes table we have just created, we will populate

it with a few more observations. For each of these observations, we will only specify the ID

and Brand, where the Brand name must be unique.

6 CHAPTER 1. INTRODUCTION

INSERT INTO Shoes (ID, Brand) VALUES (‘2345678901’, ‘LeBron’);

INSERT INTO Shoes (ID, Brand) VALUES (‘3456789012’, ‘Adidas’);

INSERT INTO Shoes (ID, Brand) VALUES (‘4567890123’, ‘Yeezy’);

INSERT INTO Shoes (ID, Brand) VALUES (‘5678901234’, ‘Puma’);

INSERT INTO Shoes (ID, Brand) VALUES (‘6789012345’, ‘New Balance’);

Now we will be able to get more information when retrieving data from our table. To retrieve

a variable, such as ID, from the Shoes table, we use the SELECT and FROM commands inside

of SQL.

SELECT ID FROM Shoes;

In Python, after running this command inside of the c.execute() method nothing will be

returned nor printed. In order to access the variables we are selecting we must fetch them.

For now, we will use fetchall, which will return all of the selected variables.

c.execute("SELECT ID FROM Shoes;")

c.fetchall()

[(‘1234567890’,),

(‘2345678901’,),

(‘3456789012’,),

(‘4567890123’,),

(‘5678901234’,),

(‘6789012345’,)]

We now have a Python list of tuples corresponding to each observation. In each observation

tuple, we see the values for each ID variable we have set. SQL stores this information as a

tuple in order to more easily let us retrieve multiple variables from a table, which can be

done by specifying all the variables after the SELECT command, separated with a comma.

c.execute("SELECT ID, Brand FROM Shoes;")

c.fetchall()

[(‘1234567890’, ‘Nike’),

(‘2345678901’, ‘LeBron’),

(‘3456789012’, ‘Adidas’),

(‘4567890123’, ‘Yeezy’),

(‘5678901234’, ‘Puma’),

(‘6789012345’, ‘New Balance’)]

If we want to retrieve every single value from the table, we can specify the asterisk (*)

character after the SELECT command.

c.execute("SELECT * FROM Shoes;")

c.fetchall()

1.5. TEMPORARY TABLES 7

[(‘1234567890’, ‘Nike’, ‘Air Jordan’, ‘Red’, 450, None),

(‘2345678901’, ‘LeBron’, None, None, None, None),

(‘3456789012’, ‘Adidas’, None, None, None, None),

(‘4567890123’, ‘Yeezy’, None, None, None, None),

(‘5678901234’, ‘Puma’, None, None, None, None),

(‘6789012345’, ‘New Balance’, None, None, None, None)]

When working with larger databases, it is often not a good choice to view the entire dataset,

because it stores so many observations. Instead, we can specify how many observations we

want to see using the LIMIT operator.

c.execute("SELECT * FROM Shoes LIMIT 3;")

c.fetchall()

[(‘1234567890’, ‘Nike’, ‘Air Jordan’, ‘Red’, 450, None),

(‘2345678901’, ‘LeBron’, None, None, None, None),

(‘3456789012’, ‘Adidas’, None, None, None, None)]

1.5 Temporary tables

We often do not need to want to create an entirely new table for many tasks. For example, if

we are creating a new table to store intermediate calculations before inserting that resultant

value into a table, then we will not need the table once the session is over. So, instead of

creating a table, we will use temporary tables, which are faster to create and removed from

memory after a session is closed. To create a temporary table, we use the SQL command

CREATE TEMPORARY TABLE.

CREATE TEMPORARY TABLE Nikes AS

SELECT *

FROM shoes

WHERE Brand = ‘Nike’;

Here, we have created a temporary table called Nikes that stores only the observations from

the table that have the value of the Brand variable as Nike.

1.6 Adding comments to SQL

Inside of an SQL environment, there are two says to add comments: single-line comments

and multiline comments. To add a single like comment to SQL, we add two minus signs

(--) to the beginning of the part we want commented out. For example, the below example

will comment out the selection of Brand.

select ID,

--Brand,

Color

FROM Shoes

);

To comment out multiple lines, we use /* < COMMENTED OUT AREA > */.

8 CHAPTER 1. INTRODUCTION

select ID,

/*Brand,

Color*/

FROM Shoes

);

1.7 Pandas dataframe from SQLite database

Pandas is a module in Python that is commonly used to represent and analyze dataframes.

Instead of using SQL from Python to analyze data, it is often faster and easier to ana-

lyze data natively on Pandas. We only need the sqlite3 and pandas modules for this

demonstration, which can be imported once they are installed locally.

import sqlite3

import pandas as pd

In the remainder of this section, we will be looking at how to load a larger SQLite database

with multiple tables. We will use the Kaggle database titled “The History of Baseball” from

Sean Lahman, which can be accessed here.

The database contains 26 unique tables inside

of the database.sqlite ﬁle. The ﬁletype .sqlite speciﬁes that a database must be of

type SQLite, while a ﬁletype of .db allows for many other types of databases, in addition

to SQLite.

With multiple tables, it is often a good idea to store each of the tables inside of a dictionary.

As we have previously done, we can set a connection and cursor to the new databack with

the connect() and cursor() commands.

conn = sqlite3.connect(‘the-history-of-baseball/database.sqlite’)

c = conn.cursor()

We can get the names of all the tables with a SELECT command and fetchall() method,

where we are choosing from sqlite master.

c.execute("SELECT name FROM sqlite_master WHERE type=’table’;")

c.fetchall()

[(‘all_star’,),

(‘appearances’,),

(‘manager_award’,),

(‘player_award’,),

(‘manager_award_vote’,),

(‘player_award_vote’,),

(‘batting’,),

(‘batting_postseason’,),

(‘player_college’,),

(‘fielding’,),

(‘fielding_outfield’,),

(‘fielding_postseason’,),

(‘hall_of_fame’,),

(‘home_game’,),

(‘manager’,),

(‘manager_half’,),

https://www.kaggle.com/seanlahman/the-history-of-baseball

1.7. PANDAS DATAFRAME FROM SQLITE DATABASE 9

(‘player’,),

(‘park’,),

(‘pitching’,),

(‘pitching_postseason’,),

(‘salary’,),

(‘college’,),

(‘postseason’,),

(‘team’,),

(‘team_franchise’,),

(‘team_half’,)]

We can create a dataframe from each table using the Pandas method read sql query().

We will save all of the tables into a dictionary with the name tables.

tables = {}

c.execute("SELECT name FROM sqlite_master WHERE type=‘table’;")

for table in c.fetchall():

tables[table[0]] = pd.read_sql_query(‘SELECT * FROM ’ + table[0], conn)

Now we have stored all the tables as values in the dictionary, which can be acceessed like

any other Python dictionary.

Chapter 2

Filtering, sorting, and

calculations

2.1 WHERE operator

Filtering through a database gives us a way to look at only speciﬁc observations and variables

for a given table. By lowering the number of observations and variables we look at, our

analysis will run faster. It is also best to ﬁrst ﬁlter a database from SQL, before creating

a dataframe from Python if there is data that needs to be removed, since not all the data

would need to originally be copied to Python.

The WHERE operator in SQL gives us a way to ﬁlter from a SELECT command. With the

WHERE operator we must specify the column name, operator, and value we want to compare.

SELECT column_name1, column_name2

FROM table_name,

WHERE column_name1 <operator> value;

Operator Description

= Equal

<> Not equal

> Greater than

< Less than

>= Greater than or equal

<= Less than or equal

BETWEEN Inclusive range

IS NULL Test for null values

NOT NULL Tests for non-null values

Table 2.1: Operator types in SQLite

The primary operators in SQLite are shown in Table 2.1. Using the Shoes table we pre-

viously worked with, we can access only the shoes from the “Nike” Brand with Brand =

‘Nike’.

2.2. IN, OR, AND, AND NOT OPERATORS 11

SELECT *

FROM Shoes

WHERE Brand = ‘Nike’;

If we are using numerical variables, we could ﬁlter for values in a range. For example, if we

want the observations where the price is greater than 45, we would write statement Price

> 45.

SELECT *

FROM Shoes

WHERE Price > 45;

To set the upper and lower bounds of a value, we use the BETWEEN operator, with the syntax

BETWEEN lower bound AND upper bound.

SELECT *

FROM Shoes

WHERE Price BETWEEN 300 AND 500;

When using the IS NULL or NOT NULL operators, we do not provide a value to test against,

since we are only checking if the value is null.

SELECT *

FROM Shoes

WHERE Price IS NULL;

2.2 IN, OR, AND, and NOT operators

The IN operator in SQL works similar to ﬁnding if α ∈ S, where S is a mathematical set.

To deﬁne the set, we must specify all the possible values that we would like. For example,

in the following code block, we check in each observation if the Price is 99, 199, or 299.

SELECT *

FROM Shoes

WHERE Price IN (99, 199, 299);

The OR operator works similar to the IN operator, in that it checks if any of the conditions

speciﬁed hold true. The order that the OR operator checks is from left-to-right, and if a

condition is true, then we do not have to test through all the other conditional statements,

since one true statement in an OR operator results in the entire operation being true.

SELECT *

FROM Shoes

WHERE Price = 99 OR Price = 199;

We typically prefer using the IN operator, since it allows us to more easily specify many

conditions and runs faster. With the IN operator, the order in which we add the items does

not matter.

The AND operator will only result in true if both of the conditions are true. We can string

together multiple AND and OR operations into a single WHERE statement. And has a higher

precedence than the OR operator.

12 CHAPTER 2. FILTERING, SORTING, AND CALCULATIONS

SELECT *

FROM Shoes

WHERE (Price = 99 OR Price = 199) AND Brand = ‘Nike’;

The NOT operator typically is used to negate a conditional statement. We can use either the

NOT operator or <> operator to achieve the same results of not ﬁltering for a speciﬁc value.

The performance of each operator is the same, so there is really no beneﬁt when using one

over the other.

SELECT *

FROM Shoes

WHERE NOT ID = ‘0123456789’ AND NOT Price = 99;

The NOT operator has higher precedence than the AND operator, which also has higher

precedence than the OR operator.

2.3 Wildcards

Wildcards give us a way to ﬁlter for a particular value, if we do not know all the information

about that value. There are two wildcard characters that do the same thing, the underscore

( ) and percent (%) characters. If, for example, we have multiple Brand values that have the

string “Jordan” inside of it, such as “Air Jordan” or “Jordan Max,” then we could search

if the string “Jordan” exists in the Brand value with the following code.

SELECT *

FROM Shoes

WHERE Brand LIKE ‘%Jordan%’;

Instead of using the % character, we can achieve the same results using the character.

SELECT *

FROM Shoes

WHERE Brand LIKE ‘_Jordan_’;

The wildcard characters are speciﬁying that any value can come before or after we see the

string “Jordan,” which means we are just looking for if the word “Jordan” exists in the

value. If, instead, we want to check and see if the word starts with “Jordan,” we could run

the following code.

SELECT *

FROM Shoes

WHERE Brand LIKE ‘Jordan_’;

Similarily, to check if “Jordan,” is at the end of the value, we could run the following.

SELECT *

FROM Shoes

WHERE Brand LIKE ‘_Jordan’;

While wildcard characters work great, they take longer to run than the comparison operators

we have previously introduced. Also, wildcard characters can only be used with varaibles

that hold string values.

2.4. ORDER BY OPERATOR 13

2.4 ORDER BY operator

We can sort the data we retrieve from the SELECT statement using the ORDER BY operator.

The two ways we can sort data are in ascending and descending order, with the ASC and

DESC operators, respectively. The ORDER BY operator always comes at the end of the SELECT

statement. By default, if we do not specify which order to sort, the sorting will occur in

ascending order, as shown.

SELECT *

FROM Shoes

ORDER BY Brand;

To specify ascending or descending order, that command must follow the column name.

SELECT *

FROM Shoes

ORDER BY Brand DESC;

We can chain together multiple ORDER BY statements. Chaining the statements together will

ﬁrst order by the ﬁrst variable speciﬁed, then order the ties based on the second varaible

speciﬁed, and continue this pattern for all the order speciﬁed. For example, the code block

below orders ﬁrst by the Brand name, and then if two Brand names are equal, the tiebreaker

will be the ID.

SELECT *

FROM Shoes

ORDER BY Brand, ID ASC;

2.5 Math operators

Inside of a SELECT statement, we can add additional variables, which can be formed using

arithmetic operations. For example, if we had the variables Price and Units from the

Shoes table, then we could track the Total Cost as follows.

SELECT Price,

Units,

Units * Price AS Total_Cost

FROM Shoes;

By specifying Total Cost after the AS operator, we are storing that as the varible name, and

the calculation is done for each row. We can use all the basic arithmetic operations, which

include parentheses (()), multiplication (*), division (/), addition (+), and subtraction (-).

2.6 Aggregate functions

Aggregate functions in SQL allow us a quick way to summarize our data. The primary

aggregate functions are shown in Table 2.2. Aggregate functions ignore null values.

Notice that for the COUNT() method, we are able to use two diﬀerent types of parameters:

the column name and all the columns. The speciﬁc syntax when using aggregate functions

is shown below.

SELECT COUNT(ID) AS id_count

FROM Shoes;

14 CHAPTER 2. FILTERING, SORTING, AND CALCULATIONS

Function Description

AVG(X) Average value in a column

COUNT(X) Number of values in a column

COUNT(*) Number of rows in the table

MIN(X) Minimum value in a column

MAX(X) Maximum value in a column

SUM(X) Integer valued sum over each value in a column

TOTAL(X) Floating point valued sum over each value in a column

Table 2.2: Primary aggregate functions. We are using X to denote the name of a column.

We can run multiple aggregate functions at once in Python. We will be using the batting

table from “The History of Baseball” dataset we download earlier. In this dataset, they

store empty strings instead of null values, so running the code NULLIF(var, ’’) will set

empty string values to null in the var column. We can run aggregate functions with the

following code block.

c.execute("""

SELECT

AVG(NULLIF(h, ’’)),

MAX(NULLIF(h, ’’)),

MIN(NULLIF(h, ’’)),

COUNT(*),

COUNT(NULLIF(h, ’’)),

SUM(NULLIF(h, ’’)),

TOTAL(NULLIF(h, ’’))

FROM batting;""")

c.fetchall()

[(39.261647068608795, 262, 0, 101332, 96183, 3776303, 3776303.0)]

Notice that we do not need to specify a variable name for each select statement. However,

if we would like to access that data at a later time, then we must assign it to a variable. We

can also use previously introduced ﬁltering operations to analyze more speciﬁc data. For

example, we can ﬁlter for the year 2000 as follows.

c.execute("""

SELECT

AVG(NULLIF(h, ’’)),

MAX(NULLIF(h, ’’)),

MIN(NULLIF(h, ’’)),

COUNT(*),

COUNT(NULLIF(h, ’’)),

SUM(NULLIF(h, ’’)),

TOTAL(NULLIF(h, ’’))

FROM batting

WHERE year = 2000;""")

c.fetchall()

[(32.69219653179191, 240, 0, 1384, 1384, 45246, 45246.0)]

2.7. GROUP BY AND HAVING COMMAND 15

If we want the total number of distinct values, such as the total number of unique Brand

names in our Shoes table, we can use the DISTINCT operator. The DISTINCT operator will

count each category only once and can be used with the syntax that follows.

SELECT COUNT(DISTINCT Brand)

FROM Shoes;

2.7 GROUP BY and HAVING command

If we want to create groups to organize and ﬁlter diﬀerent row categories, then we can use

the GROUP BY and HAVING commands. When using the GROUP BY command, we must specify

how we want to group our items. If, for example, we wanted to group our Shoes table by

brand, we could write GROUP BY Brand.

SELECT *

FROM Shoes

GROUP BY Brand;

The HAVING clause follows the grouping and allows us to ﬁlter based on the groups we have

assigned. If we want to remove all of the rows that have a unique brand name, then we

could write HAVING count(Brand) > 1.

SELECT *

FROM Shoes

GROUP BY Brand

HAVING count(Brand) > 1;

We can also change the columns we select to display information about the groups. If we

wanted to know the number of items in the group, maximum value, and minimum value,

we count write the following.

SELECT COUNT(Brand), MAX(Brand), MIN(Brand)

FROM Shoes

GROUP BY Brand

HAVING count(Brand) > 1;

2.8 Order of clauses in a SELECT statement

When using a SELECT statement in SQL, each clause must follow a speciﬁc order. The

particular order and descriptions of each clause is shown in Table 2.3.

16 CHAPTER 2. FILTERING, SORTING, AND CALCULATIONS

Order Clause Description Required

1 SELECT Columns to be returned 3

2 FROM Table to retrieve data from Only when using tables

3 WHERE Row-level ﬁltering 7

4 GROUP BY Create groups 7

5 HAVING Group-level ﬁltering 7

6 ORDER BY Sorting speciﬁcation 7

7 LIMIT Caps the maximum number of observations 7

Table 2.3: Clause ordering with a SELECT statement

Chapter 3

Subqueries and joins

3.1 Using subqueries

As we have seen with “The History of Baseball” database, a database may contain multiple

tables. Subqueries allow us a way to combine data from diﬀerent tables to analyze it all

in one table. A subquery works by embedding one query inside of another query. To set

up why subqueries could be useful, let us consider ﬁnding the number of all-stars who were

born in January from our history of baseball database. We also know that the birth month

is a variable in the player table, and we can ﬁnd out if somebody was an all star if their

player id is in the all star table. Without subqueries, we would ﬁrst need to ﬁlter for

birth month = 1.

SELECT player_id

FROM player

WHERE birth_month = 1;

[(‘abercda01’,),

(‘abernbi01’,),

(‘abreujo02’,),

(‘actama99’,),

(‘youngan01’,),

(‘yellela01’,),

(‘ydeem01’,),

(‘wynnea01’,)]

The result turns out to be around 1612 players, which we now need to copy into the WHERE

clause when ﬁltering through all stars.

SELECT *

FROM all_star

WHERE player_id IN (‘abercda01’, ‘abernbi01’, ‘abreujo02’, ‘actama99’,

..., ‘youngan01’, ‘yellela01’, ‘ydeem01’, ‘wynnea01’);

After copying and formatting all 1612 player IDs into the WHERE clause, we will get the

combination of all-star players who are born in January. This process required plenty of

manual assistance, which does is not ideal.

18 CHAPTER 3. SUBQUERIES AND JOINS

The subquery approach to ﬁnding the all-star players who are born in January is as follows.

SELECT player_id

FROM player

WHERE birth_month = 1 AND player_id IN (

SELECT player_id

FROM all_star);

Now, not only have we gotten the same result with less code and manual work, but our

approach will work as the information is updated in the table. There is not a limit to the

number of subqueries that we can use, however, as the nesting of SELECT statements grows,

the performance slows down. With subqueries, we are only able to select one column at

a time. As we go deeper into nested queries, it may be diﬃcult to understand what is

happening if the indentation is poor. The website PoorSQL can be useful to reorganize

SQL commands, in order to improve readibility.

3.2 Joining tables

Joining tables gives us a way to combine tables based on the primary key attribute. It is

important to note that databases do not always come with joined tables, primarily because

it is more eﬃcient to store and index into tables separately. In addition, multiple tables

inside of a database allows for greater scalability. Joining tables do not permanent records

of information. Instead, they are only used to view data once a query is executed.

For our discussion on joining tables, we will be using two tables, T1 and T2. Each table

will store an ID (integer) and Name (text) attribute. The speciﬁc observations we have are

shown in Figure 3.2.1.

ID Name

1 ‘a’

2 ‘b’

3 ‘c’

4 ‘d’

5 ‘e’

(a) Contents of T1

ID Name

4 ‘v’

5 ‘w’

6 ‘x’

7 ‘y’

8 ‘z’

(b) Contents of T2

Figure 3.2.1: Contents of our example tables.

In the tables we have created, the ID attribute is a primary key that must be unique for

each observation in each table. We can use a venn diagram as shown in Figure 3.2.2 to show

the overlapping and uniqueness of the keys in both tables.

3.2.1 Cartesian (cross) joins

One of the easiest ways to join two databases is to combine each item in one database with

each item in the other database. This is known as cartesian or cross join and is shown in

Figure 3.2.3. We can create that speciﬁc cross join between two tables, T1 and T2 as follows.

SELECT *

FROM T1 CROSS JOIN T2;

With one table having m rows and the other table having n rows, the cartesian join will

result in mn rows. This can cause the number of rows to be large, even with relatively small

databases, so this method for joining databases is seldom used in practice.

3.2. JOINING TABLES 19

Table 1

1, 2, 3 4, 5

6, 7, 8

Table 2

Figure 3.2.2: Venn diagram showing the relationship between the IDs in T1 and T2.

ID Name

1 ‘a’

2 ‘b’

3 ‘c’

4 ‘d’

5 ‘e’

ID Name

4 ‘v’

5 ‘w’

6 ‘x’

7 ‘y’

8 ‘z’

T1.id T1.Name T2.id T2.Name

1 ‘a’ 4 ‘v’

1 ‘a’ 5 ‘w’

1 ‘a’ 6 ‘x’

1 ‘a’ 7 ‘y’

1 ‘a’ 8 ‘z’

2 ‘b’ 4 ‘v’

2 ‘b’ 5 ‘w’

2 ‘b’ 6 ‘x’

2 ‘b’ 7 ‘y’

2 ‘b’ 8 ‘z’

3 ‘c’ 4 ‘v’

3 ‘c’ 5 ‘w’

3 ‘c’ 6 ‘x’

3 ‘c’ 7 ‘y’

3 ‘c’ 8 ‘z’

4 ‘d’ 4 ‘v’

4 ‘d’ 5 ‘w’

4 ‘d’ 6 ‘x’

4 ‘d’ 7 ‘y’

4 ‘d’ 8 ‘z’

5 ‘e’ 4 ‘v’

5 ‘e’ 5 ‘w’

5 ‘e’ 6 ‘x’

5 ‘e’ 7 ‘y’

5 ‘e’ 8 ‘z’

Figure 3.2.3: Resultant table after a cartesian (cross) join between T1 and T2.

20 CHAPTER 3. SUBQUERIES AND JOINS

3.2.2 Inner joins

If we have keys for each observation in two tables, T1 and T2, the inner join will keep only

the observations that have keys in both T1 and T2. If we look at the table keys as sets, this

means we are ﬁnding T1.id ∩ T2.id. The SQL syntax for inner joins is as follows.

SELECT *

FROM T1 INNER JOIN T2

ON T1.id = T2.id;

[(4, ‘d’, 4, ‘v’), (5, ‘e’, 5, ‘w’)]

We could also combine more than two tables using multiple inner join statements in the

SELECT clause as shown.

SELECT *

FROM T1

INNER JOIN T2 ON T1.keyid = T2.keyid

INNER JOIN T3 ON T1.keyid = T3.keyid;

Table 1

1, 2, 3 4, 5

6, 7, 8

Table 2

Figure 3.2.4: The selection from inner join takes the intersection of keys from two tables.

3.2.3 Aliases

Consider, for example, if we had longer table names, such as table name1, table name2,

and table name3 and performed an inner join on all three of the tables as shown.

SELECT table_name1.attr, table_name2.attr, table_name3.attr

FROM table_name1

INNER JOIN table_name2 ON table_name1.keyid = table_name2.keyid

INNER JOIN table_name3 ON table_name1.keyid = table_name3.keyid;

Using an alias, we can temporarily set the references to the variable names to be shorter,

making it easier to write compact code. For example, we can set the alias for table name1,

table name2, and table name3 to be T1, T2, and T3, respectively, with the code that follows.

3.2. JOINING TABLES 21

SELECT T1.attr, T2.attr, T3.attr

FROM table_name1 T1

INNER JOIN table_name2 T2 ON T1.keyid = T2.keyid

INNER JOIN table_name3 T3 ON T1.keyid = T3.keyid;

3.2.4 Self join

In this section, we will use the table that results from the cross join between T1 and T2 as

shown in Figure 3.2.3. We will label the cross joined table as CJ, the ID from T1 as ID1 and

the ID from T2 as ID2. A self join will join an attribute with itself, generally with a speciﬁc

ﬁltering in mind. For example, if we have a table named people that stores what city a

given person is in, along with their name, we can SELECT people from the same city using a

self join technique that requires creating a copy of the table. If we want to use self join on

our table, CJ, where we are looking to combine all the attributes that have ID1 = ID2 and

ID1 <> ID2, then we can use the self join that follows.

SELECT *

FROM CJ cj1

INNER JOIN CJ cj2

on (cj1.id1 = cj2.id1) and (cj1.id2 <> cj2.id2)

[(1, 4, ‘a’, ‘v’, 1, 5, ‘a’, ‘w’),

(1, 4, ‘a’, ‘v’, 1, 6, ‘a’, ‘x’),

(1, 4, ‘a’, ‘v’, 1, 7, ‘a’, ‘y’),

(1, 4, ‘a’, ‘v’, 1, 8, ‘a’, ‘z’),

(1, 5, ‘a’, ‘w’, 1, 4, ‘a’, ‘v’),

(1, 5, ‘a’, ‘w’, 1, 6, ‘a’, ‘x’),

(1, 5, ‘a’, ‘w’, 1, 7, ‘a’, ‘y’),

(1, 5, ‘a’, ‘w’, 1, 8, ‘a’, ‘z’),

(1, 6, ‘a’, ‘x’, 1, 4, ‘a’, ‘v’),

(1, 6, ‘a’, ‘x’, 1, 5, ‘a’, ‘w’),

(1, 6, ‘a’, ‘x’, 1, 7, ‘a’, ‘y’),

(1, 6, ‘a’, ‘x’, 1, 8, ‘a’, ‘z’),

(1, 7, ‘a’, ‘y’, 1, 4, ‘a’, ‘v’),

(1, 7, ‘a’, ‘y’, 1, 5, ‘a’, ‘w’),

(1, 7, ‘a’, ‘y’, 1, 6, ‘a’, ‘x’),

(1, 7, ‘a’, ‘y’, 1, 8, ‘a’, ‘z’),

(1, 8, ‘a’, ‘z’, 1, 4, ‘a’, ‘v’),

(1, 8, ‘a’, ‘z’, 1, 5, ‘a’, ‘w’),

(1, 8, ‘a’, ‘z’, 1, 6, ‘a’, ‘x’),

(1, 8, ‘a’, ‘z’, 1, 7, ‘a’, ‘y’),

...

(5, 4, ‘e’, ‘v’, 5, 5, ‘e’, ‘w’),

(5, 4, ‘e’, ‘v’, 5, 6, ‘e’, ‘x’),

(5, 4, ‘e’, ‘v’, 5, 7, ‘e’, ‘y’),

(5, 4, ‘e’, ‘v’, 5, 8, ‘e’, ‘z’),

(5, 5, ‘e’, ‘w’, 5, 4, ‘e’, ‘v’),

(5, 5, ‘e’, ‘w’, 5, 6, ‘e’, ‘x’),

(5, 5, ‘e’, ‘w’, 5, 7, ‘e’, ‘y’),

(5, 5, ‘e’, ‘w’, 5, 8, ‘e’, ‘z’),

(5, 6, ‘e’, ‘x’, 5, 4, ‘e’, ‘v’),

(5, 6, ‘e’, ‘x’, 5, 5, ‘e’, ‘w’),

(5, 6, ‘e’, ‘x’, 5, 7, ‘e’, ‘y’),

(5, 6, ‘e’, ‘x’, 5, 8, ‘e’, ‘z’),

(5, 7, ‘e’, ‘y’, 5, 4, ‘e’, ‘v’),

22 CHAPTER 3. SUBQUERIES AND JOINS

(5, 7, ‘e’, ‘y’, 5, 5, ‘e’, ‘w’),

(5, 7, ‘e’, ‘y’, 5, 6, ‘e’, ‘x’),

(5, 7, ‘e’, ‘y’, 5, 8, ‘e’, ‘z’),

(5, 8, ‘e’, ‘z’, 5, 4, ‘e’, ‘v’),

(5, 8, ‘e’, ‘z’, 5, 5, ‘e’, ‘w’),

(5, 8, ‘e’, ‘z’, 5, 6, ‘e’, ‘x’),

(5, 8, ‘e’, ‘z’, 5, 7, ‘e’, ‘y’)]

Here, we have shortened the output to put it on only a single page. Notice that no pair of

IDs from cj1 is equal to the pair of IDs from cj2.

3.2.5 Left join

The left join keeps all the elements from the left table, which maintains the same number

of rows in the left table before and after the observation. However, the left join will also

add missing information from the right table to the observations in the left table, if the

observation keys intersect.

3.2. JOINING TABLES 23

Table 1

1, 2, 3 4, 5

6, 7, 8

Table 2

Figure 3.2.5: Left join keeps all the elements from the left table and adds additional information

to each observation from the right table, if available.

Using our example data with tables T1 and T2, we can perform a left join as follows.

SELECT *

FROM T1 LEFT JOIN T2

ON T1.id = T2.id;

[(1, ‘a’, None, None),

(2, ‘b’, None, None),

(3, ‘c’, None, None),

(4, ‘d’, 4, ‘v’),

(5, ‘e’, 5, ‘w’)]

Notice that all the left observations remain, and if the id attribute does not overlap between

the two tables, then a null value (None) is ﬁlled in. A left join could be useful, for example,

if we had tables of both customers and orders and would like to keep all the customers, as

well as match the orders with the customers who have made orders.

3.2.6 Right join

Table 1

1, 2, 3 4, 5

6, 7, 8

Table 2

Figure 3.2.6: A right join works similar to a left join, just the roles of left and right are switched.

A right join works just like a left join, in that all the observations from the right table remain,

with the addition of variables to the column space. However, right join is not in SQLite,

which is what we have been using to program our database in Python. The functionality is

24 CHAPTER 3. SUBQUERIES AND JOINS

purposfully not added because any left join can be made into a right join by switching the

ordering of the tables.

3.2.7 Full outer join

The outer join of two tables gives both the left join and right join between the IDs of two

tables. For overlapping IDs, the observations will be combined, and unique IDs will remain

unique IDs. However, like the right join, the full outer join is not available as a command

when using SQLite.

Table 1

1, 2, 3 4, 5

6, 7, 8

Table 2

Figure 3.2.7: The outer join takes the left join and right join between the keys of two tables.

3.2.8 Union

The union operator does not work in the traditional probabilitic sense that one may think.

From a probabilistic perspective, the full outer join appears to be most similar to the

union operator as shown in Figure 3.2.7. What the union operator does in SQL is stack

the information from one table onto the other as shown in Figure 3.2.8. Thus, the union

operator requires the number of columns to be the same, along with the type of variables

that can be in each column.

City

New York

Seattle

Atlanta

City

Palo Alto

Berkeley

Austin

UNION

City

New York

Seattle

Atlanta

Palo Alto

Berkeley

Austin

Figure 3.2.8: Union between two tables.

To ﬁnd the union between the two tables, T1 and T2, we can use the UNION command. By

default, SQL will sort rows based on the ordering of the key.

3.2. JOINING TABLES 25

SELECT * FROM T1

UNION

SELECT * FROM T2;

[(1, ‘a’),

(2, ‘b’),

(3, ‘c’),

(4, ‘d’),

(4, ‘v’),

(5, ‘e’),

(5, ‘w’),

(6, ‘x’),

(7, ‘y’),

(8, ‘z’)]

Chapter 4

Modifying and analyzing data

4.1 Strings

Just like most programming languages, SQL comes with a variety of ways to modify strings

of text. A few of the common functions that work on strings include concatenate, substring,

trim, upper, and lower. For our discussion on strings, we will be using the Names table in

Figure 4.1.1

First Last

‘ John’ ‘Doe ’

‘Bill’ ‘Gates’

‘LeBron’ ‘James ’

Figure 4.1.1: Names table that we will be using for our discussion on strings.

4.1.1 Concatenations

A concatenation between two strings takes the contents from the ﬁrst string and adds to it

the contents of the second string. For example, if the ﬁrst string was “this and ” and the

second string was “that”, then the concatenation between the ﬁrst and second string would

be “this and that”. The concatenate operator in SQLite is ||. If we wanted to combine the

First and Last names in our Names table, we could call the following.

SELECT First || ‘ ’ || Last

FROM Names;

[(‘ John Doe ’,), (‘Bill Gates’,), (‘LeBron James ’,)]

Notice that we are also adding in a space between the ﬁrst and last names. If we did not

want to add a space, we could call First || Last.

SELECT First || Last

FROM Names;

[(‘ JohnDoe ’,), (‘BillGates’,), (‘LeBronJames ’,)]

4.1. STRINGS 27

4.1.2 Trimming strings

Notice that in our Names table, there appears to be leading and trailing whitespace. We

can remove all the whitespace on the lefthand and righthand side of the string using the

TRIM() method.

SELECT TRIM(First) || ‘ ’ || TRIM(Last)

FROM Names;

[(‘John Doe’,), (‘Bill Gates’,), (‘LeBron James’,)]

We could also remove only the whitespace on the lefthand side of the strings using the

LTRIM() method.

SELECT LTRIM(First) || ‘ ’ || LTRIM(Last)

FROM Names;

[(‘John Doe ’,), (‘Bill Gates’,), (‘LeBron James ’,)]

Similarily, we could also remove the whitespace on the righthand side of the string using

the RTRIM() method.

SELECT RTRIM(First) || ‘ ’ || RTRIM(Last)

FROM Names;

[(‘ John Doe’,), (‘Bill Gates’,), (‘LeBron James’,)]

4.1.3 Substring

Taking a substring of a particular string will give you back part of the original string. In

SQL, We use the SUBSTR(string name, starting position, number of characters)

method. SQL starts strings at index 1, as opposed to many programming languages that

start a string with index 0. Assuming our strings are trimmed, if we wanted to get the ﬁrst

letter of the ﬁrst name and concatenate it with the ﬁrst three letters of the last name, we

could call the following.

SELECT SUBSTR(First, 1, 1) || SUBSTR(Last, 1, 3)

FROM Names;

[(‘JDoe’,), (‘BGat’,), (‘LJam’,)]

If we specify a string length that is too long, for example if we try to get the ﬁrst ﬁve

characters from the name “John”, then the string will still work, but it will only return

characters until it gets to the end of the string.

SELECT SUBSTR(First, 1, 5)

FROM Names;

28 CHAPTER 4. MODIFYING AND ANALYZING DATA

[(‘John’,), (‘Bill’,), (‘LeBro’,)]

4.1.4 Uppercase and lowercase

To convert a string to uppercase or lowercase, we will using the UPPER() and LOWER()

commands, respectively. We can call UPPER() on our ﬁrst names as follows.

SELECT UPPER(First)

FROM Names;

[(‘JOHN’,), (‘BILL’,), (‘LEBRON’,)]

Similarily, we can also call LOWER() on a string to convert each character to be lowercase.

SELECT LOWER(First)

FROM Names;

[(‘john’,), (‘bill’,), (‘lebron’,)]

4.1.5 Date and time strings

Working with dates and times in SQL may be tricky because they are not stored the same

in every DBMS. Table 4.1 shows the 5 functions associated with dates and times in SQLite.

Function Name

DATE(timestring, modifiers...)

TIME(timestring, modifiers...)

DATETIME(timestring, modifiers...)

JULIANDAY(timestring, modifiers...)

STRFTIME(format, timestring, modifiers...)

Table 4.1: SQLite date and time functions

Table 4.2 shows the formats a string must be in, in order for SQLite to pick it up as a

timestring. For example, we can represent the date December 11, 1980 at 5:30 PM as

1980-12-11 17:30, where we have converted the time to a 24 hour scale. We can also refer

to the current time as ‘now’.

We can also add modiﬁers to our timestrings that modify the time we have speciﬁed. In

particular, we can add and subtract time, move the time to the beginning of the day/mon-

th/year, and change which day we want to use. Table 4.3 shows the strings needed for

each modiﬁer. For example, if we want to refer to the Monday of the current week, where

weekday N starts at Sunday = 0 and goes to Saturday = 6, we can call the following:

SELECT date(‘now’, ‘weekday 1’);

4.2. CASE STATEMENTS 29

YYYY-MM-DD

YYYY-MM-DD HH:MM

YYYY-MM-DD HH:MM:SS.SSS

YYYY-MM-DD‘T’HH:MM

YYYY-MM-DD‘T’HH:MM:SS

YYYY-MM-DD‘T’HH:MM:SS.SSS

HH:MM

HH:MM:SS

HH:MM:SS.SSS

Table 4.2: Diﬀerent types of timestrings (note that the ‘T’ character in the middle timestrings is

the literal T character)

±N N N days start of day weekday N

±N N N hours start of month

±N N N minutes start of year

±N N N.N N N N seconds

±N N N months

±N N N years

Table 4.3: Diﬀerent types of modiﬁers

If we want to add 2 hours, 20 minutes, and 15 seconds to the start of the current month,

then we can call:

SELECT date(‘now’, ‘start of month’, ‘+2 hours’, ‘+20 minutes’,

‘+15 seconds’);

4.2 Case statements

Case statements allow us to use conditional statements in the form of if/then/else inside of

SQL. We can use CASE statements inside of SELECT, INSERT, UPDATE, and DELETE statements.

The syntax to write a case statement follows.

CASE

WHEN C1 THEN E1

WHEN C2 THEN E2

...

ELSE

END

We can also use case statements to act like switch functions in other programming languages,

as we show in the following example.

30 CHAPTER 4. MODIFYING AND ANALYZING DATA

CASE <input expression>

WHEN <when expression> THEN E1

...

ELSE

END

For example, suppose we want to separate our data that has a city category into ‘San

Francisco’ and ‘Other’ categories. Here, we can set our new category name to be SF and

use the select statement that follows.

SELECT

CASE city

WHEN ‘San Francisco’ THEN ‘San Francisco’

ELSE ‘Other’

END SF

FROM <table>;

4.3 Views

Views give us a way to save our searches in SQL in a variable, which other users can easily

reference. For example, suppose we have a list of NBA players and we have a strict criteria to

determine if they have had a successful career based on if they meet the following conditions











NBA years ≥ 10

All NBA appearances ≥ 5

All Star appearances ≥ 3

NBA Championships ≥ 1

. (4.3.1)

Here, we could create a VIEW using the statement that follows.

CREATE VIEW successful AS

SELECT *

FROM <players table>

WHERE

NBA_years >= 10

All_NBA_appearances >= 5

All_Star_appearances >= 3

NBA_Championships >= 1;

After creating that view, we can reference it in the future just like we reference tables.

SELECT * FROM successful;

We can also ﬁnd all of the created views from a single table using the dot VIEWS command.

SELECT * FROM <table name>.VIEWS;