|
| 1 | +--- |
| 2 | +title: "ggplot2 extensions: ggTimeSeries" |
| 3 | +--- |
| 4 | + |
| 5 | +### ggTimeSeries |
| 6 | +<https://github.com/Ather-Energy/ggTimeSeries> |
| 7 | + |
| 8 | +This R package offers novel time series visualisations. It is based on `ggplot2` and provides `geom`s and pre-packaged functions for easily creating any of these charts. Some examples are listed below. |
| 9 | + |
| 10 | +```{r, message=FALSE,warning=FALSE} |
| 11 | +# Example from https://github.com/Ather-Energy/ggTimeSeries |
| 12 | +library(ggplot2) |
| 13 | +library(ggthemes) |
| 14 | +library(data.table) |
| 15 | +library(ggTimeSeries) |
| 16 | +``` |
| 17 | + |
| 18 | +## Line Charts Legacy |
| 19 | +IoT devices generate a lot of sequential data over time, also called time series data. Legacy portrayals of such data would centre around line charts. Line charts have reportedly been around since the early 1700s (source: Wikipedia) and we have nothing against them. They facilitate trend detection and comparison, are simple to draw, and are easy to understand; all in all, a very well-behaved visualisation. In modern times, their use is widespread, from the heartbeat monitor at a hospital to the multiple-monitor display at a trader's desk. |
| 20 | + |
| 21 | +```{r excel97_line, ext = 'png', fig.align = 'center', echo = FALSE, message = F, warning = F} |
| 22 | +set.seed(10) |
| 23 | +dfData = data.frame( |
| 24 | + Time = 1:100, |
| 25 | + Signal = abs( |
| 26 | + c( |
| 27 | + cumsum(rnorm(100, 0, 3)), |
| 28 | + cumsum(rnorm(100, 0, 4)), |
| 29 | + cumsum(rnorm(100, 0, 1)), |
| 30 | + cumsum(rnorm(100, 0, 2)) |
| 31 | + ) |
| 32 | + ), |
| 33 | + Variable = c(rep('a', 100), rep('b', 100), rep('c', 100), rep('d', 100)), |
| 34 | + VariableLabel = c(rep('Class A', 100), rep('Class B', 100), rep('Class C', 100), rep('Class D', 100)) |
| 35 | +) |
| 36 | +
|
| 37 | +Excel97Plot = ggplot(dfData, aes(x = Time, y = Signal, color = VariableLabel)) + |
| 38 | + geom_line() + |
| 39 | + geom_point() + |
| 40 | + theme_excel() + |
| 41 | + scale_colour_excel() |
| 42 | +
|
| 43 | +print("Excel 97 look recreated in R with the ggthemes package") |
| 44 | +plot(Excel97Plot) |
| 45 | +
|
| 46 | +``` |
| 47 | + |
| 48 | +## Alternatives |
| 49 | + |
| 50 | +However, there are cases when the data scientist becomes more demanding and specific. Five alternatives available to such a data scientist are listed below. All of these options are available as `geom`s or packaged functions in the `ggplot2`-based `ggTimeSeries` package. |
| 51 | + |
| 52 | +Before that, we set a minimal theme: |
| 53 | +```{r minimalTheme} |
| 54 | +minimalTheme = theme_set(theme_bw(12)) |
| 55 | +minimalTheme = theme_update( |
| 56 | + axis.ticks = element_blank(), |
| 57 | + legend.position = 'none', |
| 58 | + strip.background = element_blank(), |
| 59 | + panel.border = element_blank(), |
| 60 | + panel.background = element_blank(), |
| 61 | + panel.grid = element_blank() |
| 63 | +) |
| 64 | +
|
| 65 | +``` |
| 66 | + |
| 67 | +### Calendar Heatmap |
| 68 | +Available as `stat_calendar_heatmap` and `ggplot_calendar_heatmap`. |
| 69 | + |
| 70 | +A calendar heatmap is a great way to visualise daily data. Its structure makes it easy to detect weekly, monthly, or seasonal patterns. |
| 71 | + |
| 72 | +```{r calendar_heatmap, fig.align = 'center', echo = TRUE, message = F, warning = F} |
| 73 | +
|
| 74 | +# creating some data |
| 75 | +set.seed(1) |
| 76 | +dtData = data.table( |
| 77 | + DateCol = seq( |
| 78 | + as.Date("1/01/2014", "%d/%m/%Y"), |
| 79 | + as.Date("31/12/2015", "%d/%m/%Y"), |
| 80 | + "days" |
| 81 | + ), |
| 82 | + ValueCol = runif(730) |
| 83 | + ) |
| 84 | +dtData[, ValueCol := ValueCol + (strftime(DateCol,"%u") %in% c(6,7) * runif(1) * 0.75), .I] |
| 85 | +dtData[, ValueCol := ValueCol + (abs(as.numeric(strftime(DateCol,"%m")) - 6.5)) * runif(1) * 0.75, .I] |
| 86 | +
|
| 87 | +# base plot |
| 88 | +p1 = ggplot_calendar_heatmap( |
| 89 | + dtData, |
| 90 | + 'DateCol', |
| 91 | + 'ValueCol' |
| 92 | +) |
| 93 | +
|
| 94 | +# adding some formatting |
| 95 | +p1 + |
| 96 | + xlab('') + |
| 97 | + ylab('') + |
| 98 | + scale_fill_continuous(low = 'green', high = 'red') + |
| 99 | + facet_wrap(~Year, ncol = 1) |
| 100 | +
|
| 101 | +
|
| 102 | +# creating some categorical data |
| 103 | +dtData[, CategCol := letters[1 + round(ValueCol * 7)]] |
| 104 | +
|
| 105 | +# base plot |
| 106 | +p2 = ggplot_calendar_heatmap( |
| 107 | + dtData, |
| 108 | + 'DateCol', |
| 109 | + 'CategCol' |
| 110 | +) |
| 111 | +
|
| 112 | +# adding some formatting |
| 113 | +p2 + |
| 114 | + xlab('') + |
| 115 | + ylab('') + |
| 116 | + facet_wrap(~Year, ncol = 1) |
| 117 | +
|
| 118 | +
|
| 119 | +``` |
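
The calendar layout described above can be approximated in base R by turning each date into a grid cell. This is only a minimal sketch of the idea, not the code `ggTimeSeries` uses internally; the helper name `calendar_grid` and its column names are assumptions made for illustration.

```r
# Hypothetical helper (not part of ggTimeSeries): map dates to calendar-grid cells.
calendar_grid <- function(dates) {
  data.frame(
    Date    = dates,
    Year    = as.integer(format(dates, "%Y")),
    # %W: week of the year (00-53), with Monday as the first day of the week
    Week    = as.integer(format(dates, "%W")),
    # %u: weekday number, 1 = Monday ... 7 = Sunday
    Weekday = as.integer(format(dates, "%u"))
  )
}

# Two weeks of daily dates starting on 1 January 2014
grid <- calendar_grid(seq(as.Date("2014-01-01"), as.Date("2014-01-14"), by = "day"))
head(grid)
```

Plotting `Week` on one axis and `Weekday` on the other, with one panel per `Year`, gives the familiar calendar grid in which weekly patterns line up as rows.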
| 120 | + |
| 121 | + |
| 122 | +### Horizon Plots |
| 123 | +Available as `stat_horizon` and `ggplot_horizon`. |
| 124 | + |
| 125 | +Imagine an area chart which has been chopped into multiple chunks of equal height. If you overlay these chunks one on top of the other, and colour them to indicate which chunk it is, you get a horizon plot. Horizon plots are useful when visualising y values spanning a vast range but with a skewed distribution, and/or when trying to highlight outliers without losing the context of variation in the rest of the data. |
| 126 | + |
| 127 | + |
| 128 | +```{r horizon, fig.align = 'center', echo = TRUE, message = F, warning = F} |
| 129 | +
|
| 130 | +# creating some data |
| 131 | +set.seed(1) |
| 132 | +dfData = data.frame(x = 1:1000, y = cumsum(rnorm(1000))) |
| 133 | +
|
| 134 | +# base plot |
| 135 | +p1 = ggplot_horizon(dfData, 'x', 'y') |
| 136 | +
|
| 137 | +
|
| 138 | +print("If you're seeing any vertical white stripes, it's a display thing.") |
| 139 | +# adding some formatting |
| 140 | +p1 + |
| 141 | + xlab('') + |
| 142 | + ylab('') + |
| 143 | + scale_fill_continuous(low = 'green', high = 'red') + |
| 144 | + coord_fixed( 0.5 * diff(range(dfData$x)) / diff(range(dfData$y))) |
| 145 | +``` |
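
The chopping step described above can be sketched in base R. This is an illustrative simplification that assumes the series is first shifted to start at zero; it is not necessarily how `stat_horizon` computes its bands, and `horizon_bands` is a hypothetical name.

```r
# Hypothetical helper (not part of ggTimeSeries): split a series into
# n_bands slices of equal height, one column per slice.
horizon_bands <- function(y, n_bands = 3) {
  y <- y - min(y)                      # assumption: shift so the minimum is zero
  band_height <- max(y) / n_bands
  sapply(seq_len(n_bands), function(k) {
    # the part of y that falls inside band k, clipped to [0, band_height]
    pmin(pmax(y - (k - 1) * band_height, 0), band_height)
  })
}

set.seed(1)
y <- cumsum(rnorm(1000))
bands <- horizon_bands(y)
dim(bands)
```

Each column is one chunk of the area chart. Drawing the columns on top of each other with progressively stronger colours gives the overlaid effect, and the row sums recover the shifted series.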
| 146 | + |
| 147 | + |
| 148 | +### Steamgraphs |
| 149 | + |
| 150 | +Available as `stat_steamgraph`. |
| 151 | + |
| 152 | +A steamgraph is a more aesthetically appealing version of a stacked area chart. It tries to highlight the changes in the data by placing the groups with the most variance on the edges, and the groups with the least variance towards the centre. This feature in conjunction with the centred alignment of each of the contributing areas makes it easier for the viewer to compare the contribution of any of the components across time. |
| 153 | + |
| 154 | +```{r steamgraph, fig.align = 'center', echo = TRUE, message = F, warning = F} |
| 155 | +# creating some data |
| 156 | +set.seed(10) |
| 157 | +dfData = data.frame( |
| 158 | + Time = 1:1000, |
| 159 | + Signal = abs( |
| 160 | + c( |
| 161 | + cumsum(rnorm(1000, 0, 3)), |
| 162 | + cumsum(rnorm(1000, 0, 4)), |
| 163 | + cumsum(rnorm(1000, 0, 1)), |
| 164 | + cumsum(rnorm(1000, 0, 2)) |
| 165 | + ) |
| 166 | + ), |
| 167 | + VariableLabel = c(rep('Class A', 1000), rep('Class B', 1000), rep('Class C', 1000), rep('Class D', 1000)) |
| 168 | +) |
| 169 | +
|
| 170 | +# base plot |
| 171 | +p1 = ggplot(dfData, aes(x = Time, y = Signal, group = VariableLabel, fill = VariableLabel)) + |
| 172 | + stat_steamgraph() |
| 173 | +
|
| 174 | +
|
| 175 | +# adding some formatting |
| 176 | +p1 + |
| 177 | + xlab('') + |
| 178 | + ylab('') + |
| 179 | + coord_fixed( 0.2 * diff(range(dfData$Time)) / diff(range(dfData$Signal))) |
| 180 | +
|
| 181 | +``` |
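
The centred alignment mentioned above can be illustrated with a symmetric baseline: at every time step the stack starts at minus half the total, so the whole shape is centred on zero. The sketch below assumes that simple baseline and skips the variance-based ordering of groups; it is not taken from the `stat_steamgraph` source, and all variable names are illustrative.

```r
set.seed(10)
# one column per group, one row per time step
signal <- abs(sapply(c(3, 4, 1, 2), function(s) cumsum(rnorm(1000, 0, s))))
colnames(signal) <- paste("Class", LETTERS[1:4])

# assumption: symmetric baseline, i.e. start each stack at -total / 2
baseline <- -rowSums(signal) / 2

# upper and lower edge of each group's ribbon
upper <- baseline + t(apply(signal, 1, cumsum))
lower <- upper - signal

range(upper[, 4] + lower[, 1])  # outer edges mirror each other around zero
```

Each group would then be drawn as a ribbon between its `lower` and `upper` column.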
| 182 | + |
| 183 | + |
| 184 | +### Waterfall |
| 185 | +Available as `stat_waterfall` and `ggplot_waterfall`. |
| 186 | + |
| 187 | +Rather than the values themselves, a waterfall plot tries to bring out the changes in the values. |
| 188 | + |
| 189 | + |
| 190 | +```{r waterfall, fig.align = 'center', echo = TRUE, message = F, warning = F} |
| 191 | +# creating some data |
| 192 | +set.seed(1) |
| 193 | +dfData = data.frame(x = 1:100, y = cumsum(rnorm(100))) |
| 194 | +
|
| 195 | +# base plot |
| 196 | +p1 = ggplot_waterfall( |
| 197 | + dtData = dfData, |
| 198 | + 'x', |
| 199 | + 'y' |
| 200 | +) |
| 201 | +
|
| 202 | +# adding some formatting |
| 203 | +p1 + |
| 204 | + xlab('') + |
| 205 | + ylab('') |
| 206 | +``` |
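
The changes that a waterfall plot emphasises are the differences between consecutive values. A minimal base R sketch of that preparation step is shown below; the column names are assumptions for illustration and are not the ones `ggplot_waterfall` produces.

```r
set.seed(1)
y <- cumsum(rnorm(100))

# one row per step: where the value started, where it ended, and the change
steps <- data.frame(
  x    = seq_along(y)[-1],
  from = head(y, -1),
  to   = y[-1]
)
steps$change    <- steps$to - steps$from
steps$direction <- ifelse(steps$change >= 0, "increase", "decrease")

head(steps)
```

Drawing a segment from `from` to `to` at each `x`, coloured by `direction`, makes rises and falls stand out even when the level of the series barely moves.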
| 207 | + |
| 208 | + |
| 209 | +### Occurrence Dot Plot |
| 210 | +Available as `stat_occurrence`. |
| 211 | + |
| 212 | +This one is a favourite in infographics. For rare events, the reader would find it convenient to have the count of events encoded in the chart itself instead of having to map the value back to the Y axis. |
| 213 | + |
| 214 | + |
| 215 | +```{r occurrence_dotplot, fig.align = 'center', echo = TRUE, message = F, warning = F} |
| 216 | +# creating some data |
| 217 | +set.seed(1) |
| 218 | +dfData = data.table(x = 1:100, y = floor(4 * abs(rnorm(100, 0 , 0.4)))) |
| 219 | +
|
| 220 | +# base plot |
| 221 | +p1 = ggplot(dfData, aes(x = x, y = y)) + |
| 222 | + stat_occurrence() |
| 223 | +
|
| 224 | +# adding some formatting |
| 225 | +p1 + |
| 226 | + xlab('') + |
| 227 | + ylab('') + |
| 228 | + coord_fixed(ylim = c(0,1 + max(dfData$y))) |
| 229 | +``` |
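
Encoding the count in the chart itself amounts to drawing one dot per event. The sketch below shows that expansion in base R on a small made-up table of counts; it is an illustration of the idea rather than the implementation behind `stat_occurrence`.

```r
# made-up counts: number of events observed at each x
counts <- data.frame(x = 1:5, y = c(0, 2, 1, 0, 3))

# one row per event; `dot` is the height at which that event's dot is drawn
dots <- data.frame(
  x   = rep(counts$x, times = counts$y),
  dot = sequence(counts$y)
)

dots
```

Positions with a count of zero contribute no rows, so the reader can count the dots directly instead of reading a value off the y axis.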