Commit 16ad543

committed
5
1 parent f4e8571 commit 16ad543

28 files changed: +703 -716 lines changed

.DS_Store

0 Bytes
Binary file not shown.

_episodes/00-sql-introduction.md

+18-47
@@ -1,6 +1,6 @@
11
---
22
title: "Introduction to SQL"
3-
teaching: 30
3+
teaching: 20
44
exercises: 5
55
questions:
66
- "What is a relational database and why should I use it?"
@@ -20,12 +20,8 @@ keypoints:
2020

2121
_Note: this should have been done by participants before the start of the workshop._
2222

23-
We use [SQLite Manager](https://addons.mozilla.org/en-us/firefox/addon/sqlite-manager/)
24-
and the
25-
[Portal Project dataset](https://figshare.com/articles/Portal_Project_Teaching_Database/1314459)
26-
throughout this lesson. See [Setup](/sql-ecology-lesson/setup/) for
27-
instructions on how to download the data, and also how to install and open
28-
SQLite Manager.
23+
See [Setup](../setup/) for
24+
instructions on how to download the data, and also how to install jupyter notebook
2925

3026
## What is SQL?
3127

@@ -36,7 +32,7 @@ These queries can allow you to perform a number of actions such as: insert, upda
3632

3733
> ## Think about this situation
3834
>
39-
> Imagine if you are a owner of a convenience store, and you are trying to record your soda purchase record <br>
35+
> Imagine if you are a owner of a convenience store, and you are trying to record your soda purchase. <br>
4036
> In each invoice, it contains the following information: <br>
4137
> Invoice id, Date, Category, Soda name, Volume, Cost, Retail Price, Vendor, Number of bottle purchased <br>
4238
> How would you store the data?
@@ -46,19 +42,20 @@ These queries can allow you to perform a number of actions such as: insert, upda
4642
If you store all these invoice information in one Excel file,
4743
What problem could raise from this approach?
4844
<b>Data redundency:</b> <br>
49-
![alt text](../img/00_1.png)
45+
![alt text](../img/00_1.png){:width="700px"}
5046
<br><br>
5147
Imagine if you consistantly purchased some Big Dog Cola from LCDM Beverage vendor every day for 5 days,
5248
Notice these columns: Category, Soda_name, Volume, Cost, Retail_Price, Vendor, Vendor number
5349
With tranditional file approach, you have to record exact same information in these columns 5 times.
5450
<b>Data inconsistancy:</b> <br>
55-
![alt text](../img/00_2.png)
51+
![alt text](../img/00_2.png){:width="700px"}
5652
<br><br>
5753
Imagine if the vendor changed its phone number. Then multiple changes has to be made.
5854
There are only 5 rows so it might be easy to change everything. If the data size gets large, mistakes are likely to occur.
5955

6056
If you thought about storing these information in few different Excel files, <b>great idea! You are on the right track </b><br>
61-
However, if you want information from all files at the same time, how do you combine them? If each file contains thoudans of rows, Ah...
57+
However, if you want information from all files at the same time, how do you combine them? If each file contains thoudans of rows, Ah... <br>
58+
The largest table that Excel can handle is 1,048,576 * 16,384. Excel can be very slow when working with large number of data. Moreover, in the real word, you can easily get 1 million rows of data...
6259
![alt text](../img/tuxue.png){:height="100px" width="100px"}
6360

6461
## Goals
@@ -103,6 +100,7 @@ Using a relational database serves several purposes.
103100
* It's fast, even for large amounts of data.
104101
* It improves quality control of data entry (type constraints and use of forms in MS Access, Filemaker, Oracle Application Express etc.)
105102
* The concepts of relational database querying are core to understanding how to do similar things using programming languages such as R or Python.
103+
* Many big companies are using Relational Database to store financial data. As business students, we can learn how to pool useful data by ourselves.
106104

107105
## Database Management Systems (DBMS)
108106

@@ -111,47 +109,12 @@ relational data. We're going to use SQLite today, but basically everything we
111109
teach you will apply to the other database systems as well (e.g. MySQL,
112110
PostgreSQL, MS Access, MS SQL Server, Oracle Database and Filemaker Pro). The
113111
only things that will differ are the details of exactly how to import and
114-
export data and the [details of data types](#datatypediffs).
112+
export data and the data types.
115113

116114
Look at the [popularity of database](https://db-engines.com/en/ranking) <br>
117115
![alt text](../img/dbms.png){:height="170px"} <br>
118116
Top 4 are all Relational Database Management Systems (RDBMS). More and more companies choose to use relational database.
119117

120-
## Relational databases
121-
122-
Let's look at a pre-existing database, the `soda.db`
123-
file that we downloaded during
124-
[Setup](/sql-business/setup/). Clicking on the "open file" icon, then
125-
find that file and clicking on it will open the database.
126-
127-
You can see the tables in the database by looking at the left hand side of the
128-
screen under Tables, where each table corresponds to one of the `csv` files
129-
we were exploring earlier. To see the contents of any table, click on it, and
130-
then click the “Browse and Search” tab in the right panel. This will
131-
give us a view that we're used to - just a copy of the table. Hopefully this
132-
helps to show that a database is, in some sense, just a collection of tables,
133-
where there's some value in the tables that allows them to be connected to each
134-
other (the "related" part of "relational database").
135-
136-
The leftmost tab, "Structure", provides some metadata about each table. It
137-
describes the columns, often called *fields*. (The rows of a database table
138-
are called *records*.) If you scroll down in the Structure view, you'll
139-
see a list of fields, their labels, and their data *type*. Each field contains
140-
one variety or type of data, often numbers or text. You can see in the
141-
`surveys` table that most fields contain numbers (integers) while the `species`
142-
table is nearly all text.
143-
144-
The "Execute SQL" tab is blank now - this is where we'll be typing our queries
145-
to retrieve information from the database tables.
146-
147-
To summarize:
148-
149-
* Relational databases store data in tables with fields (columns) and records
150-
(rows)
151-
* Data in tables has types, and all values in a field have
152-
the same type ([list of data types](#datatypes))
153-
* Queries let us look up data or make calculations based on columns
154-
155118
## Database Design
156119

157120
* Every row-column combination contains a single *atomic* value, i.e., not
@@ -162,3 +125,11 @@ To summarize:
162125
* Needs an identifier in common between tables – shared column - to
163126
reconnect (known as a *foreign key*).
164127

128+
To summarize:
129+
130+
* Relational databases store data in tables with fields (columns) and records
131+
(rows)
132+
* Each row is uniquely identified by a Primary Key
133+
* Queries let us look up data or make calculations based on columns
134+
135+
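
Reviewer's sketch for the keys and redundancy discussion in `_episodes/00-sql-introduction.md`: a minimal Python example, assuming a made-up two-table layout (`vendor` and `invoice`, with hypothetical column names and phone numbers that are not taken from `soda.db`). It illustrates how a primary key plus a foreign key lets the vendor's phone number live in one row, so a change is made once rather than on every invoice.

```python
import sqlite3

# In-memory database; table and column names are illustrative assumptions,
# not the actual schema of the lesson's soda.db.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE vendor (
        Vendor_id   INTEGER PRIMARY KEY,   -- uniquely identifies each vendor
        Vendor_name TEXT,
        Phone       TEXT
    );
    CREATE TABLE invoice (
        Invoice_id INTEGER PRIMARY KEY,    -- uniquely identifies each invoice
        Date       TEXT,
        Vendor_id  INTEGER REFERENCES vendor(Vendor_id)  -- foreign key
    );
    """
)

# One vendor row, five daily invoices that only store the vendor's id.
conn.execute("INSERT INTO vendor VALUES (1, 'LCDM Beverage', '555-0100')")
conn.executemany(
    "INSERT INTO invoice VALUES (?, ?, 1)",
    [(i, f"2017-01-0{i}") for i in range(1, 6)],
)

# The phone number changes: a single UPDATE, no matter how many invoices exist.
conn.execute("UPDATE vendor SET Phone = '555-0199' WHERE Vendor_id = 1")

# Reconnect the tables through the shared column to see the result.
phones = conn.execute(
    """
    SELECT DISTINCT v.Phone
    FROM invoice AS i
    JOIN vendor AS v ON v.Vendor_id = i.Vendor_id
    """
).fetchall()
invoice_count = conn.execute("SELECT COUNT(*) FROM invoice").fetchone()[0]
```

With the single-spreadsheet approach the same change would touch all five rows; here every invoice picks up the new number through the join.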

_episodes/01-sql-intro2.md _episodes/01-sql-data.md

+2-2
@@ -1,7 +1,7 @@
11
---
22
title: "Getting to know your data"
33
teaching: 30
4-
exercises: 5
4+
exercises: 0
55
questions:
66
- "How to connect to database with python?"
77
- "What is in the database?"
@@ -22,7 +22,7 @@ keypoints:
2222

2323
_Note: this should have been done by participants before the start of the workshop._
2424

25-
See [Setup](/sql-ecology-lesson/setup/) for
25+
See [Setup](../setup/) for
2626
instructions on how to download the data, and also how to install and open SQLite Manager.
2727

2828
## Import

_episodes/02-sql-basic-queries.md

+41-2
@@ -191,9 +191,48 @@ To truly be alphabetical, we might want to order by genus then species.
191191
> ## Challenge
192192
>
193193
> - Write a query that returns Item_Description, Bottle_Cost, volume and retail price
194-
> of the soda, sorted firstly with retail price in ascending order, then with volume in descending order.
194+
> of the soda, sorted firstly with retail price in descending order, then with volume in ascending order.
195195
{: .challenge}
196196

197+
## Dealing with dates
198+
Firstly, we take a look at the `invoice_info` table
199+
```
200+
SELECT * FROM invoice_info;
201+
```
202+
The date in sqlite3 can be stored as multiple string formats:
203+
```
204+
YYYY-MM-DD
205+
YYYY-MM-DD HH:MM
206+
YYYY-MM-DD HH:MM:SS
207+
YYYY-MM-DD HH:MM:SS.SSS
208+
YYYY-MM-DDTHH:MM
209+
YYYY-MM-DDTHH:MM:SS
210+
YYYY-MM-DDTHH:MM:SS.SSS
211+
HH:MM
212+
HH:MM:SS
213+
HH:MM:SS.SSS
214+
now
215+
DDDDDDDDDD
216+
```
217+
In our database, the date were stored as "YYYY-MM-DD" format. We can extract the year/month/date with `strftime` function. For example:
218+
```
219+
Select *, strftime('%Y', Date) AS YEAR
220+
FROM invoice_info;
221+
```
222+
For month and day, simply replace `%Y` with `%m` or `%d`. You can also do `%Y-%m` to get the year and month.
223+
You can get all invoice after 2017 by:
224+
```
225+
SELECT * FROM invoice_info
226+
WHERE Date >= "2017-01-01";
227+
```
228+
You can get a invoice from a range of time by:
229+
```
230+
SELECT * FROM invoice_info
231+
WHERE Date BETWEEN "2017-01-01" AND "2017-02-27";
232+
```
233+
Note that `BETWEEN` is inclusive, that is, invoices at 2017-01-01 and 2017-02-27 will be returned
234+
If you are interested at more cool things you can do with dates, heres the [Documentation](https://www.sqlite.org/lang_datefunc.html)
235+
197236
## Order of execution
198237

199238
Another note for ordering. We don’t actually have to display a column to sort by
@@ -221,7 +260,7 @@ we recommend to put each clause on its own line.
221260
> ## Challenge
222261
>
223262
> - Let's try to combine what we've learned so far in a single
224-
> query. Using the **item_info** table write a query to display the three date fields,
263+
> query. Using the **item_info** table write a query to display the three data fields,
225264
> `Item_Description`, `Bottle_Volume_ml` and Retail Price for 6 packs (give it an alias `Six_Pack_Price`), for
226265
> all sodas that are usually sold in 6 packs, ordered firstly by `Six_Pack_Price`, then alphabetically by the `Item_Description`.
227266
> - Write the query as a single line, then put each clause on its own line, and
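
Reviewer's sketch for the "Dealing with dates" section added to `_episodes/02-sql-basic-queries.md`: a minimal Python example, assuming a cut-down `invoice_info` table with hypothetical columns (`Invoice_id`, `Date`) and invented rows rather than the real lesson data. It exercises `strftime('%Y', ...)` and the inclusive `BETWEEN` range on dates stored as `YYYY-MM-DD` text. The date bounds are passed as bound parameters here; the lesson text writes them in double quotes, which SQLite tolerates, although standard SQL string literals use single quotes.

```python
import sqlite3

# In-memory stand-in for invoice_info; the schema and rows are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE invoice_info (Invoice_id INTEGER PRIMARY KEY, Date TEXT)")
conn.executemany(
    "INSERT INTO invoice_info VALUES (?, ?)",
    [
        (1, "2016-12-31"),
        (2, "2017-01-01"),
        (3, "2017-02-27"),
        (4, "2017-02-28"),
    ],
)

# Extract the year from each date with strftime.
years = conn.execute(
    "SELECT Invoice_id, strftime('%Y', Date) AS Year "
    "FROM invoice_info ORDER BY Invoice_id"
).fetchall()

# BETWEEN is inclusive on both ends, so invoices 2 and 3 are returned.
in_range = [
    row[0]
    for row in conn.execute(
        "SELECT Invoice_id FROM invoice_info "
        "WHERE Date BETWEEN ? AND ? ORDER BY Invoice_id",
        ("2017-01-01", "2017-02-27"),
    )
]
```

Comparing dates as text works here only because `YYYY-MM-DD` strings sort in the same order as the dates they represent.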

_episodes/03-sql-aggregation.md

+2-2
@@ -96,8 +96,8 @@ of these groups (`HAVING`).
9696

9797
> ## Challenge
9898
>
99-
> What sodas were sold more than 100000 bottles from 2012 to 2017?
100-
> In another word, write a query that returns item_id in invoice_info table
99+
> What sodas were sold more than 100000 bottles in the whole database?
100+
> In another word, write a query that returns item_id and total bottles sold in invoice_info table
101101
> Where the total bottles sold is more than 100000
102102
{: .challenge}
103103
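
Reviewer's sketch for the reworded challenge in `_episodes/03-sql-aggregation.md`: one possible shape of the query, shown in Python against invented data. The `item_id` column comes from the challenge text; the `Bottles_Sold` column name and all rows are assumptions, so the real `invoice_info` table may need a different column name.

```python
import sqlite3

# In-memory stand-in for invoice_info; Bottles_Sold is a hypothetical column name.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE invoice_info (item_id INTEGER, Bottles_Sold INTEGER)")
conn.executemany(
    "INSERT INTO invoice_info VALUES (?, ?)",
    [
        (1, 60000),
        (1, 50000),   # item 1 totals 110000
        (2, 99999),   # item 2 stays below the threshold
        (3, 100000),
        (3, 1),       # item 3 totals 100001
    ],
)

# Group by item, total the bottles, then keep only groups above 100000.
# WHERE filters rows before grouping; HAVING filters the grouped totals.
big_sellers = conn.execute(
    """
    SELECT item_id, SUM(Bottles_Sold) AS Total_Bottles
    FROM invoice_info
    GROUP BY item_id
    HAVING SUM(Bottles_Sold) > 100000
    ORDER BY item_id
    """
).fetchall()
```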
