Move all kable to tinytable
RohanAlexander committed Nov 9, 2024
1 parent b6de9c7 commit b76d3c9
Showing 66 changed files with 6,435 additions and 5,224 deletions.
10 changes: 9 additions & 1 deletion 00-errata.qmd
@@ -10,7 +10,7 @@ Chapman and Hall/CRC published this book in July 2023. You can purchase that [he
This online version has some updates to what was printed. An online version that matches the print version is available [here](https://rohanalexander.github.io/telling_stories-published/).
:::

*Last updated: 8 November 2024.*
*Last updated: 9 November 2024.*

The book was reviewed by Piotr Fryzlewicz in *The American Statistician* [@Fryzlewicz2024] and Nick Cox on [Amazon](https://www.amazon.com/gp/customer-reviews/R3S602G9RUDOF/ref=cm_cr_dp_d_rvw_ttl?ie=UTF8&ASIN=1032134771). I am grateful that they gave such a lot of their time to provide the review, as well as their corrections and suggestions.

@@ -32,6 +32,14 @@ Seamus Ross,
Tino Kanngiesser, and
Zak Varty.

## Class usage

I have found various ways to use this book in classes. While traditional chalk-and-talk lectures work, if the students can commit to reading the chapter before the class, then I have found that using class for group-based projects and discussion is more enjoyable. Each week create small groups, each of two to four students (randomly create new groups every week to give students a chance to work with new people). Then generally following a "think-pair-share" exercise [@lyman1981responsive] have them work through most exercises, first by themselves, then compare with their group, and finally share selected answers with the class. I recommend creating a Google Doc and using that in places to make it easier to share. If you take this approach then the weekly quiz becomes especially important to ensure students are doing the readings.
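
The weekly regrouping can be scripted. The following is a minimal sketch in R, assuming a hypothetical roster (`students`) and a target group size of three; the names and the approach are illustrative and do not come from the book.

```{r}
#| eval: false
set.seed(853)

# Hypothetical roster; replace with the actual class list
students <- paste("Student", 1:14)
target_group_size <- 3

# Shuffle the roster, then deal students into groups one at a time so that
# group sizes differ by at most one
shuffled <- sample(students)
number_of_groups <- ceiling(length(shuffled) / target_group_size)
group_ids <- rep(seq_len(number_of_groups), length.out = length(shuffled))
groups <- split(shuffled, group_ids)

groups
```

Changing the seed each week produces a new allocation while keeping any given week's groups reproducible.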

In terms of timing and coverage, I have found that if Part I "Foundations" is covered, then the rest of the chapters are fairly independent. While I try to go with one chapter per week, students sometimes take a while to get started, and the first three chapters take about three weeks (even though there is not much to the first chapter).

Typically, somewhere between the first and second papers is where it all starts to come together. It is important to return Paper 1 to students quickly so that they can incorporate its lessons into future papers.


## Errors

6 changes: 3 additions & 3 deletions 01-introduction.qmd
@@ -287,10 +287,10 @@ Ultimately, we are all just telling stories with data, but these stories are inc
b. Data simplify reality to make analysis possible, but they cannot capture every detail.
c. Data are always inaccurate and useless.

### Activities {.unnumbered}
### Class activities {.unnumbered}

- [The instructor should take a photo of the class, then display the photo on the screen.] Write three aspects about what the photo shows (faces, class composition, etc), and three aspects about what the photo does not (context, thoughts, emotions, motivations, students who are not present, etc). Discuss how this relates to data science.
- [The instructor should give each group a different item to use for measurement, some of which are more useful than others, for instance, measuring tape, paper, ruler, markers, scales, etc.] Using the item you were given, please answer the following question: "How long is your hair?". Relate your experience to data science.
- The instructor should take a photo of the class, then display the photo on the screen. In small groups, students should identify three aspects the photo shows, and three aspects the photo does not. Discuss how this relates to data science.
- The instructor should give each group a different item to use for measurement, some of which are more useful than others, for instance, measuring tape, paper, ruler, markers, scales, etc. Students should then use the item to answer the following question: "How long is your hair?". Add the number to a spreadsheet. If you only had the spreadsheet, what would you understand and not understand about hair length? Relate this to data science more broadly.

### Task {.unnumbered}

51 changes: 49 additions & 2 deletions 02-drinking_from_a_fire_hose.qmd
@@ -1349,9 +1349,56 @@ midwest |>
b. "geo_area"
c. "country"

### Activity {.unnumbered}
### Class activities {.unnumbered}

- Use the [starter folder](https://github.com/RohanAlexander/starter_folder) and create a new repo. Add a link to the GitHub repo in the class's shared Google Doc.
- Pick one of the `dplyr` verbs -- `mutate()`, `select()`, `filter()`, `arrange()`, `summarize()`. Explain what it does and the context, and then livecode an example of its use.
- Explain what the class of an object is in R, with an example.
- Simulate 100 draws from the uniform distribution with mean 5 and standard deviation 2 in the `simulation.R` script. Write one test for this dataset in the `tests.R` script.
- Simulate 50 draws from the Poisson distribution with lambda 10 in the `simulation.R` script. Write two tests for this dataset in the `tests.R` script.
- Gather some data on Marriage Licence Statistics in Toronto using Open Data Toronto in the `gather.R` script. Clean it in the `cleaning.R` script.^[Consider `separate()` and then `lubridate::ymd()` for the dates.] Graph it in the Quarto document.
- The following code produces an error. Please add it to a GitHub Gist, and email it to the instructor asking for help:
```{r}
#| eval: false
tibble(year = 1875:1972,
       level = as.numeric(datasets::LakeHuron)) |>
  ggplot(aes(x = year, y = level)) |>
  geom_point()
```
- The following code creates an odd-looking graph in terms of dates. Please identify the issue and fix it, by adding functions before `ggplot()`.
```{r}
#| eval: false
set.seed(853)

data <-
  tibble(date = as.character(sample(seq(
    as.Date("2022-01-01"),
    as.Date("2022-12-31"),
    by = "day"
  ),
  10)), # https://stackoverflow.com/a/21502397
  number = rcauchy(n = 10, location = 1) |> round(0))

data |>
  # MAKE CHANGE HERE
  ggplot(aes(x = date, y = number)) +
  geom_col()
```
- Consider the following code to make a graph. You want to move the legend to the bottom but cannot remember the `ggplot2` function to do that. Please find the answer on Stack Overflow.
```{r}
#| eval: false
penguins |>
  drop_na() |>
  ggplot(aes(x = bill_length_mm, y = bill_depth_mm, color = species)) +
  geom_point()
```
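
For the two simulation activities above, one possible starting point is sketched below. It assumes that the uniform distribution is meant to be specified through its mean and standard deviation, which have to be converted into the `min` and `max` arguments that `runif()` expects. The object names and the particular tests are illustrative rather than the book's.

```{r}
#| eval: false
library(tibble)

set.seed(853)

# A uniform distribution is defined by its minimum and maximum. Because
# mean = (min + max) / 2 and sd = (max - min) / sqrt(12), the bounds sit
# sd * sqrt(3) either side of the mean.
target_mean <- 5
target_sd <- 2
half_width <- target_sd * sqrt(3)
lower <- target_mean - half_width
upper <- target_mean + half_width

simulated_uniform <- tibble(draw = runif(n = 100, min = lower, max = upper))

simulated_poisson <- tibble(count = rpois(n = 50, lambda = 10))

# Example tests: stopifnot() raises an error if any condition is FALSE
stopifnot(
  nrow(simulated_uniform) == 100,
  all(simulated_uniform$draw >= lower & simulated_uniform$draw <= upper)
)

stopifnot(
  nrow(simulated_poisson) == 50,
  all(simulated_poisson$count >= 0),
  all(simulated_poisson$count == round(simulated_poisson$count))
)
```

In the starter folder the simulation lines would go in `simulation.R` and the `stopifnot()` calls in `tests.R`.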


### Task {.unnumbered}

The purpose of this activity is to redo the Australian Elections example, but for Canada. It is a chance to work in a realistic setting because the Canadian situation has some differences, but the Australian example provides guardrails.
The purpose of this task is to redo the Australian Elections example, but for Canada. It is a chance to work in a realistic setting because the Canadian situation has some differences, but the Australian example provides guardrails.

By way of background, Canada's House of Commons has 338 seats, each representing an electoral district known as a "riding". There are two major parties, Liberal and Conservative; three minor parties, Bloc Québécois, New Democratic, and Green; and some smaller parties and independents. The steps you should follow are:

55 changes: 51 additions & 4 deletions 03-workflow.qmd
@@ -1279,9 +1279,56 @@ election_results |>
c. Predictive analytics in your code.
d. Collaborative work with future colleagues.

### Activity I {.unnumbered}

The purpose of this activity is to give and receive peer review. Peer review generally, and code review specifically [@codereview], is an important part of working as a professional.

### Class activities {.unnumbered}

- Use the [starter folder](https://github.com/RohanAlexander/starter_folder) and create a new repo. Add a link to the GitHub repo in the class's shared Google Doc.
- Use Quarto to make a PDF with a title, author, and an abstract.^[There are always a small number of students who struggle with getting the PDF set up locally. Worst case, do everything else locally as HTML, then build the PDF in Posit Cloud.]
- Add three sections and some code that produces the mean bill length, by species, for `palmerpenguins::penguins` (with the code itself hidden).
- Add a citation of R and `palmerpenguins`, then add a graph of body mass, by sex.
- Add a paragraph of text about the graph and a cross-reference. Also add a table about the number of species, by year.
- [The instructor should (very slowly) live code all this and have students code along.] Set up git on your local computer.^[If you have hidden your GitHub email then make sure you use the alias when you add an email address locally.] Make a GitHub repo, then make a local copy, make some changes, and push.^[There will always be a few students who cannot get git working locally. I find the best approach is to triage by pairing them with an advanced student while you demonstrate, and if there are remaining issues then deal with them individually at an office hour.]
- Find the GitHub repo of a partner, fork it, make a change, and make a pull request.
- The following code produces an error. Please follow the strategies in @sec-dealingwitherrors to fix it.
```{r}
#| eval: false
tibble(year = 1875:1972,
       level = as.numeric(datasets::LakeHuron)) |>
  ggplot(aes(x = year, y = level)) |>
  geom_point()
```
- The following code produces an error. Please follow the strategies in @sec-dealingwitherrors to fix it.
```{r}
#| eval: false
tibble(year = 1871:1970,
       annual_nile_flow = as.character(datasets::Nile)) |>
  ggplot(aes(x = annual_nile_flow)) +
  geom_histogram()
```
- The following code produces an error. Following @sec-omgpleasemakeareprexplease create a reprex (change the example to use a more common dataset such as `mtcars`), add it to a GitHub Gist, and email it to the instructor.
```{r}
#| eval: false
tibble(year = 1875:1972,
       level = as.numeric(datasets::LakeHuron)) |>
  ggplot(aes(x = year, y = level)) |>
  geom_point()
```
- The following code produces an error. Please use ChatGPT, or an equivalent LLM, to correct it. Discuss: 1) the prompt, and 2) the corrected code.
```{r}
#| eval: false
penguins |>
  ggplot(aes(x = bill_length_mm, y = bill_depth_mm, color = species)) |>
  geom_point()
```
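
For the Quarto activity above that asks for the mean bill length, by species, with the code itself hidden, a chunk along the following lines is one option. This is a sketch: the object and column names are illustrative, and it relies on the Quarto chunk option `echo: false`, which hides the code while still showing its output.

```{r}
#| echo: false
#| message: false
library(dplyr)
library(palmerpenguins)

# Mean bill length by species; missing bill measurements are dropped
# before averaging
mean_bill_length <-
  penguins |>
  group_by(species) |>
  summarise(mean_bill_length_mm = mean(bill_length_mm, na.rm = TRUE))

mean_bill_length
```

Setting the option per chunk, rather than for the whole document, makes it possible to show code in some places and hide it in others.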
### Task I {.unnumbered}

The purpose of this task is to give and receive peer review. Peer review generally, and code review specifically [@codereview], is an important part of working as a professional.

Please start by running `usethis::git_vaccinate()`. Then update your work from the Activity in @sec-fire-hose, to use the [starter folder](https://github.com/rohanalexander/starter_folder). This would involve, amongst other things, moving the downloading and cleaning to appropriate scripts, updating the README, adding a title, etc. Generally, you should look at the rubric for the *Donaldson* Paper in [Online Appendix -@sec-papers] and quickly try to comply with as much as possible, without doing too much extra work. Then exchange it with someone else.

@@ -1303,9 +1350,9 @@ Read @googlecoderview and @giladpeerreview. Then using GitHub Issues please cond
*[Any other comments.]*

### Activity II {.unnumbered}
### Task II {.unnumbered}

The purpose of this activity is to develop comfort with:
The purpose of this task is to develop comfort with:

1. Quarto, and
2. Git and GitHub.

19 changes: 17 additions & 2 deletions 04-writing_research.qmd
@@ -863,9 +863,24 @@ A variety of authors have established rules\index{writing!rules} for writing. Th
d. They are acceptable as long as the content is good.


### Activity {.unnumbered}
### Class activities {.unnumbered}

@caroonworking [p. xii] writes at least 1,000 words almost every day. The purpose of this activity is to give you the chance to do that also. Please pick one of the papers specified in the prerequisites and complete the following tasks:

- Discuss your preferred approach (data-first/question-first/other) to research and why.
- Explain, with reference to examples, what an estimand, an estimator, and an estimate are.
- Please consider "selection bias" and include the definition in a sentence in the same way that @monicababynames does for the Gini coefficient.
- Please use ChatGPT, or an equivalent LLM, to answer the question "What is a selection effect?". With a partner, improve the response by adding context and references, and correcting it (if necessary). Discuss three aspects: 1) the prompt, 2) the original answer, 3) your augmented answer.
- Pick one of the well-written quantitative papers:
    - Write out the original title. What do you like, and not like, about it? Write an alternative title for it.
    - Write out the abstract. What do you like, and not like, about it?
    - Please prompt ChatGPT, or an equivalent LLM, to create an alternative abstract (copy the prompt so you can discuss it).
    - Draw on all of this to put together an improved abstract and then discuss everything.
- Make a plan, based on @king2006publication, for how you will write a meaningful paper by the end of this class. (For PhD students: Detail three journals/conferences, in order, that you will submit it to, and why the paper would be a good fit at each.)
- *Paper review:* Please read @Gerring2012 and write a review of one page.
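
For the estimand, estimator, and estimate activity above, a small simulation can make the distinction concrete. The sketch below assumes a hypothetical population of heights; the numbers and names are illustrative only.

```{r}
#| eval: false
set.seed(853)

# Hypothetical population: heights, in centimeters, of 10,000 people
population <- rnorm(n = 10000, mean = 170, sd = 10)

# Estimand: the quantity of interest, here the population mean. It is only
# known because the population was simulated.
estimand <- mean(population)

# Estimator: the rule applied to a sample, here the sample mean
estimator <- function(sample_values) {
  sum(sample_values) / length(sample_values)
}

# Estimate: the number the estimator returns for one particular sample
one_sample <- sample(population, size = 100)
estimate <- estimator(one_sample)

c(estimand = estimand, estimate = estimate)
```

Drawing a different sample gives a different estimate, while the estimand and the estimator stay the same.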

### Task {.unnumbered}

@caroonworking [p. xii] writes at least 1,000 words almost every day. The purpose of this task is to give you the chance to do that also. Please pick one of the papers specified in the prerequisites and complete the following tasks:

- Day 1: Transcribe, by writing each word yourself, the entire introduction.
- Day 2: Rewrite the introduction so that it is five lines (or 10 per cent, whichever is less) shorter.
111 changes: 109 additions & 2 deletions 05-graphs_tables_maps.qmd
@@ -1922,7 +1922,114 @@ beps |>
c. `popup`
d. `label`

### Activity I {.unnumbered}

### Class activities {.unnumbered}

- Use the [starter folder](https://github.com/RohanAlexander/starter_folder) and create a new repo. Add a link to the GitHub repo in the class's shared Google Doc. Do all the following in `paper.qmd`.
- The following produces a scatterplot showing the level, in feet, of Lake Huron between 1875 and 1972. Please improve it.
```{r}
#| eval: false
tibble(year = 1875:1972,
       level = as.numeric(datasets::LakeHuron)) |>
  ggplot(aes(x = year, y = level)) +
  geom_point()
```
- The following produces a bar chart of the height of 31 Black Cherry Trees. Please improve it.
```{r}
#| eval: false
datasets::trees |>
  as_tibble() |>
  ggplot(aes(x = Height)) +
  geom_bar()
```

- The following produces a line plot showing the weight of chicks, in grams, by how many days old they were. Please improve it.
```{r}
#| eval: false
datasets::ChickWeight |>
  as_tibble() |>
  ggplot(aes(x = Time, y = weight, group = Chick)) +
  geom_line()
```
```{r}
#| echo: false
#| eval: false
# The best I've managed (based on stealing the best bits of past student work) is:
datasets::ChickWeight |>
  group_by(Diet, Time) |>
  summarize(average_weight = mean(weight)) |>
  ggplot(aes(x = Time,
             y = average_weight,
             color = Diet)) +
  geom_line(linewidth = 1.5) +
  geom_point(data = datasets::ChickWeight,
             aes(x = Time, y = weight),
             alpha = 0.5) +
  geom_line(data = datasets::ChickWeight,
            aes(x = Time, y = weight, group = Chick),
            alpha = 0.1) +
  labs(x = "Days since birth",
       y = "Average weight (grams)",
       color = "Diet") +
  theme_classic() +
  scale_color_brewer(palette = "Set1") +
  scale_y_continuous(breaks = seq(0, 400, 50))
```

- The following produces a histogram showing the annual number of sunspots between 1700 and 1988. Please improve it.
```{r}
#| eval: false
tibble(year = 1700:1988,
       sunspots = as.numeric(datasets::sunspot.year) |> round(0)) |>
  ggplot(aes(x = sunspots)) +
  geom_histogram()
```

- Please follow [this code](https://github.com/saloni-nd/misc/blob/main/Mortality%20rates%20by%20age%20-%20HMD.R) from Saloni Dattani, and make a graph for two countries of interest to you.
- The following code, taken from the `palmerpenguins` [vignette](https://allisonhorst.github.io/palmerpenguins/articles/examples.html), produces a beautiful graph. Please modify it to create the ugliest graph that you can.^[The idea for this exercise is from Liza Bolton.]
```{r}
#| eval: false
#| warning: false
ggplot(data = penguins,
       aes(x = flipper_length_mm,
           y = body_mass_g)) +
  geom_point(aes(color = species,
                 shape = species),
             size = 3,
             alpha = 0.8) +
  scale_color_manual(values = c("darkorange", "purple", "cyan4")) +
  labs(
    title = "Penguin size, Palmer Station LTER",
    subtitle = "Flipper length and body mass for Adelie, Chinstrap and Gentoo Penguins",
    x = "Flipper length (mm)",
    y = "Body mass (g)",
    color = "Penguin species",
    shape = "Penguin species"
  ) +
  theme_minimal() +
  theme(
    legend.position = c(0.2, 0.7),
    plot.title.position = "plot",
    plot.caption = element_text(hjust = 0, face = "italic"),
    plot.caption.position = "plot"
  )
```

- The following code provides estimates for the speed of light, from five experiments, each of 20 runs. Please create an average speed of light, per experiment, then use `knitr::kable()` to create a cross-referenced table, with specified column names, and no decimal places.
```{r}
#| eval: false
datasets::morley |>
  tibble()
```
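
One way to approach the speed of light activity above is sketched here. In `datasets::morley` the experiment is recorded in `Expt` and the measurement in `Speed`, which is in km/s with 299,000 subtracted. The chunk label, caption, and column names are assumptions made for illustration. In Quarto, a label that starts with `tbl-` can be cross-referenced in text, in this case as `@tbl-speed-of-light`.

```{r}
#| label: tbl-speed-of-light
#| tbl-cap: "Average measured speed of light, by experiment"
#| eval: false
library(dplyr)

# Average the measured speed within each experiment
average_speed <-
  datasets::morley |>
  tibble::as_tibble() |>
  group_by(Expt) |>
  summarise(average_speed = mean(Speed))

# digits = 0 rounds the averages to whole numbers in the printed table
average_speed |>
  knitr::kable(
    col.names = c("Experiment", "Average speed (km/s, minus 299,000)"),
    digits = 0
  )
```

Rounding inside `kable()` keeps the underlying averages unrounded, so they remain available at full precision for any later calculation.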

### Task I {.unnumbered}

Please create a graph using `ggplot2` and a map using `ggmap` and add explanatory text to accompany both. Be sure to include cross-references and captions, etc. Each of these should take about a page.

@@ -1932,7 +2039,7 @@ And finally, with regard to the map that you created, please reflect on the foll

Submit a link to a high-quality GitHub repo.

### Activity II {.unnumbered}
### Task II {.unnumbered}

Please obtain data on the ethnic origins and number of Holocaust victims killed at Auschwitz concentration camp. Then use `shiny` to create an interactive graph and an interactive table. These should show the number of people murdered by nationality/category and should allow the user to specify the groups they are interested in seeing data for. Publish them using the free tier of shinyapps.io.
