generated from fhdsl/OTTR_Quarto
-
Notifications
You must be signed in to change notification settings - Fork 1
/
Copy pathweek1-exercises.qmd
144 lines (98 loc) · 2.78 KB
/
week1-exercises.qmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
---
title: "week1-exercises"
format: html
---
In the exercises, you'll work with a different database. It is a synthetic dataset of breast cancer patients, but it has the same table structure (or schema) as the in-class example. It is much larger than the in class example.
We'll first connect to the database:
```{r}
#| context: setup
library(duckdb)
library(DBI)
library(DiagrammeR)
con <- DBI::dbConnect(duckdb::duckdb(),
"data/synthea-smaller_breast_cancer.db")
```
## `person`
Let's get some details about the tables.
`DESCRIBE` the person table.
```{sql}
#| connection: "con"
DESCRIBE -------;
```
Show the first 10 rows of `person`.
```{sql}
#| connection: "con"
SELECT * FROM
---------
LIMIT ----;
```
How many people (or rows) are in the person table?
```{sql}
#| connection: "con"
SELECT COUNT(-) FROM person;
```
How many people are born after 1980?
```{sql}
#| connection: "con"
#| eval: false
SELECT COUNT(person_id) FROM person
WHERE year_of_birth -------;
```
How about how many people who have `gender_source_value` of 'M'? (Hint: remember to use *single* quotes for values, not *double* quotes):
```{sql}
#| connection: "con"
#| eval: false
SELECT COUNT(person_id) FROM person
WHERE gender_source_value = ----
```
Ok, we now have a better idea of what is in the `person` table. Let's take a deeper dive into the `concept` table.
## Look at the `concept` table
`DESCRIBE` the concept table.
```{sql}
#| connection: "con"
DESCRIBE concept;
```
Select the distinct `domain_id`s from the `concept` table:
```{sql}
#| connection: "con"
SELECT DISTINCT --------
FROM -------;
```
Return the number of distinct `concept_name`s with `domain_id` equal to `'Procedure'`:
```{sql}
#| connection: "con"
SELECT COUNT(concept_name) FROM concept
WHERE -----------;
```
## Look at `procedure_occurrence`
Let's take a look at `procedure_occurrence` and see whether it uses `concept_id`s.
DESCRIBE `procedure_occurrence`:
```{sql}
#| connection: "con"
DESCRIBE procedure_occurrence;
```
How many columns are `concept_id`s?
How many distinct `procedure_concept_id`s are there in this `procedure_occurrence` table?
```{sql}
#| connection: "con"
SELECT COUNT(DISTINCT procedure_concept_id)
FROM procedure_occurrence;
```
## Explore another table
Describe another table in the Database. (You can use `SHOW TABLES` to show a list of tables).
```{sql}
#| connection: "con"
```
Return the first 10 rows of this table.
```{sql}
#| connection: "con"
```
Choose a column and try to write a `WHERE` statement to filter it. `COUNT` the number of rows returned.
```{sql}
#| connection: "con"
```
## Disconnect from the database
When you're done with your assignment, run the below code chunk to disconnect from the database.
```{r}
dbDisconnect(con)
```