netdiffuseR_sunbelt2016_workshop_reading_data.Rmd

---
title: |
  | Understanding Diffusion with __netdiffuseR__
  | _Reading data_
subtitle: Sunbelt 2016 INSNA
author: George Vega Yon \and Thomas Valente
date: |
  | Newport Beach, CA
  | April 5, 2016
institute: |
  | Department of Preventive Medicine
  | University of Southern California
header-includes: \usepackage{grffile}
output: beamer_presentation
fontsize: 10pt
---

```{r Config-knitr, echo=FALSE, warning=FALSE, message=FALSE}
knitr::opts_chunk$set(
  out.width = ".8\\textwidth", dev = "pdf", fig.align = "center",
  fig.width = 7, fig.height = 5)
pdf.options(family="Palatino")

options(width = 100)

library(sna)
library(SparseM)
library(netdiffuseR)
```

# Introduction
## Data in __netdiffuseR__

- __netdiffuseR__ has its own class of objects: `diffnet`.

- Most of the package's functions accept different types of graphs:
    * Static: `matrix`, `dgCMatrix` (from the __Matrix__ pkg), 
    * Dynamic: `list` + `dgCMatrix`, `array`, `diffnet`
  
- But `diffnet` is the class from which you get the most.

- From __netdiffuseR__'s perspective, network data comes in three classes:

    1. Raw R network data: Datasets with edgelist, attributes, survey data, etc.
    2. Already R data: already read into R using igraph, statnet, etc. (`igraph_to_diffnet`,     `network_to_diffnet`, etc.)
    3. Graph files: DL, UCINET, pajek, etc. (`read_pajek`, `read_dl`, `read_ucinet`, etc.)

- In this presentation we will show focus on 1.

# Introduction
## `diffnet` objects

A diffusion network, a.k.a. `diffnet` object, is a `list` that holds the following objects:

- `graph`: A `list` with $t$ `dgCMatrix` matrices of size $n\times n$,
- `toa`: An integer vector of length $n$,
- `adopt`: A matrix of size $n\times t$,
- `cumadopt`: A matrix of size $n\times t$,
- `vertex.static.attrs`: A `data.frame` of size $n\times k$,
- `vertex.dyn.attrs`: A list with $t$ dataframes of size $n\times k$,
- `graph.attrs`: Currently ignored..., and
- `meta`: A list with metadata about the object.

These are created using `as_diffnet` (or its wrappers).

------

\huge  Example 1: Nomination networks (cross-section data)  \normalsize

# Example 1: Nomination networks (cross-section data) 

- In this part we review the function `survey_to_diffnet`
- This function can use as input either a longitudinal dataset (which should be in long format, this is, one row per individual and time period), or a cross sectional dataset.
- For the first example we will use the `fakesurvey` dataset, which holds cross section data.

# Example 1: Nomination networks (cross-section data) 

We start by taking a look at the data.

\tiny
```{r split=FALSE}
# Loading the data
data("fakesurvey")
fakesurvey
```
\normalsize

In group one 4 nominates id 6, who does not show in the data, and in group two 1 nominates 3, 4, and 8, also individuals who don't show up in the survey.

# Example 1: Nomination networks (cross-section data) 
## Not including unsurveyed

Reading the data into __netdiffuseR__ with the option `no.unsurveyed = TRUE`

\tiny
```{r}
# Coercing the survey data into a diffnet object
diffnet_wo_unsurveyed <- survey_to_diffnet(
  dat      = fakesurvey,                # The dataset
  idvar    = "id",                      # Name of the idvar (must be integer)
  netvars  = c("net1", "net2", "net3"), # Vector of names of nomination vars
  toavar   = "toa",                     # Name of the time of adoption var
  groupvar = "group",                   # Name of the group var (OPTIONAL)
  no.unsurveyed = TRUE                  # KEEP OR NOT UNSURVEYED
)
diffnet_wo_unsurveyed

# Retrieving nodes ids
nodes(diffnet_wo_unsurveyed)
```
\normalsize

# Example 1: Nomination networks (cross-section data)
## ~~Not~~ including unsurveyed

Reading the data into __netdiffuseR__ with the option `no.unsurveyed = FALSE`

\tiny
```{r}
# Coercing the survey data into a diffnet object
diffnet_w_unsurveyed <- survey_to_diffnet(
  dat      = fakesurvey,                # The dataset
  idvar    = "id",                      # Name of the idvar (must be integer)
  netvars  = c("net1", "net2", "net3"), # Vector of names of nomination vars
  toavar   = "toa",                     # Name of the time of adoption var
  groupvar = "group",                   # Name of the group var (OPTIONAL)
  no.unsurveyed = FALSE                 # KEEP OR NOT UNSURVEYED
)
diffnet_w_unsurveyed

# Retrieving nodes ids
nodes(diffnet_w_unsurveyed)
```
\normalsize

# Example 1: Nomination networks (cross-section data)
Furthermore, we can compare the two diffusion networks by subtracting one from another:

\footnotesize
```{r}
difference <- diffnet_w_unsurveyed - diffnet_wo_unsurveyed
difference
```
\normalsize

------

\huge Example 2: Nomination networks (longitudinal data) \normalsize

# Example 2: Nomination networks (longitudinal data)

- Keep using `survey_to_diffnet`
- Now we will load panel data!
- For the first example we will use the `fakesurveyDyn` dataset, which holds cross section data.

# Example 2: Nomination networks (longitudinal data)

We start by taking a look at the data.

\tiny
```{r split=FALSE}
# Loading the data
data("fakesurveyDyn")
fakesurveyDyn
```
\normalsize

In group one 4 nominates id 6, who does not show in the data, and in group two 1 nominates 3, 4, and 8, also individuals who don't show up in the survey.

# Example 2: Nomination networks (longitudinal data)
## Reading the data-in

\tiny
```{r}
dyndiffnet <- survey_to_diffnet(
  dat      = fakesurveyDyn,             # The dataset
  idvar    = "id",                      # Name of the idvar (must be integer)
  netvars  = c("net1", "net2", "net3"), # Vector of names of nomination vars
  toavar   = "toa",                     # Name of the time of adoption var
  groupvar = "group",                   # Name of the group var (OPTIONAL)
  no.unsurveyed = TRUE,                 # keep or not unsurveyed
  timevar  = "time"                     # This is new!
)
dyndiffnet
```
\normalsize

# Example 2: Nomination networks (longitudinal data)
## Quick look at some attributes

\tiny
```{r}
# As a list
dyndiffnet[["age"]]

# As a data frame!
dyndiffnet[["age", as.df=TRUE]]
```
\normalsize

# Example 2: Nomination networks (longitudinal data)
## Quick look at the dynamic graph

\tiny
```{r, fig.width=12, fig.height=4}
plot_diffnet(dyndiffnet, vertex.cex = 1.5, displaylabels = TRUE)
```
\normalsize


------

\huge Example 3: Edgelist (cross-section data) \normalsize

# Example 3: Edgelist (cross-section data)

- Now we will use an edgelist as an input.
- For this example we'll use the `fakeEdgelist` dataset.
- We are also using the `fakesurvey` dataset as it holds the attributes.
- Now is turn of the function `edgelist_to_diffnet` to get into action... But first, we will look at the function `edgelist_to_adjmat` (which is actually what's underthehood)

# Example 3: Edgelist (cross-section data)

Lets take a look at the data

\footnotesize
```{r}
data("fakeEdgelist")
fakeEdgelist
```
\normalsize

# Example 3: Edgelist (cross-section data)
## Edgelists as adjacency matrices

\tiny
```{r}
# Coercing the edgelist to an adjacency matrix
adjmat  <- edgelist_to_adjmat(
  edgelist   = fakeEdgelist[,1:2], # Should be a two column matrix/data.frame
  w          = fakeEdgelist$value, # An optional vector with weights
  undirected = FALSE,              # In this case, the edgelist is directed
  t          = 5)                  # We use this option to make 5 replicas of it
# nnodes(adjmat)
adjmat[[1]][1:5,1:5]
```
\normalsize

- The problem is with the last edge. It has an `NA` in the column `value`.
- If we want to keep it we have to complete that data

# Example 3: Edgelist (cross-section data)
## Edgelists as adjacency matrices (cont.)

\scriptsize
```{r}
# Fixing
fakeEdgelist[11,"value"] <- 0

# Coercing the edgelist to an adjacency matrix
adjmat  <- edgelist_to_adjmat(
  edgelist   = fakeEdgelist[,1:2], # Should be a two column matrix/data.frame
  w          = fakeEdgelist$value, # An optional vector with weights
  undirected = FALSE,              # In this case, the edgelist is directed
  t          = 5)                  # We use this option to make 5 replicas of it
# nnodes(adjmat)
adjmat[[1]][1:5,1:5]

```
\normalsize

# Example 3: Edgelist (cross-section data)
## Ids in edgelist and attributes

- Notice that, in this case, the ids are already processed accordingly to groups.
- So we need to make ids from both, attributes and edgelist, to match

\footnotesize
```{r}
# Before
fakesurvey$id

# Changing the id
fakesurvey$id <- with(fakesurvey, group*100 + id)

# After
fakesurvey$id
```
\normalsize

# Example 3: Edgelist (cross-section data)
## Reading the data in

\scriptsize
```{r}
diffnet <- edgelist_to_diffnet(
  edgelist = fakeEdgelist[,1:2], # Passed to edgelist_to_adjmat
  w        = fakeEdgelist$value, # Passed to edgelist_to_adjmat
  dat      = fakesurvey,         # Data frame with -idvar- and -toavar-
  idvar    = "id",               # Name of the -idvar- in -dat-
  toavar   = "toa",              # Name of the -toavar- in -dat-
  keep.isolates = TRUE           # Passed to edgelist_to_adjmat   
)
diffnet
```
\normalsize


------

\huge Example 4: Edgelist (longitudinal data) \normalsize

# Example 4: Edgelist (longitudinal data)

- Very much like before, but now we have a dynamic graph (so we need dynamic attributes as well)
- For this example we'll use the `fakeDynEdgelist` dataset.
- We are also using the `fakesurveyDyn` dataset as it holds the attributes.
- Again, for this data, we need to fix the ids


# Example 4: Edgelist (longitudinal data)
## Ids in edgelist and attributes

\scriptsize
```{r}
# Before
head(fakesurveyDyn$id)

# Changing the id
fakesurveyDyn$id <- with(fakesurveyDyn, group*100 + id)

# After
head(fakesurveyDyn$id)
```
\normalsize

# Example 4: Edgelist (longitudinal data)
## Reading the data in

\tiny
```{r}
diffnet <- edgelist_to_diffnet(
  edgelist = fakeDynEdgelist[,1:2], # As usual, a two column dataset
  w        = fakeDynEdgelist$value, # Here we are using weights
  t0       = fakeDynEdgelist$time,  # An integer vector with starting point of spell
  t1       = fakeDynEdgelist$time,  # An integer vector with the endpoint of spell
  dat      = fakesurveyDyn,         # Attributes dataset
  idvar    = "id",                  
  toavar   = "toa",
  timevar  = "time",
  keep.isolates = TRUE,             # Keeping isolates (if there's any)
  warn.coercion = FALSE
)

diffnet
```
\normalsize

--------

\huge Example 5: What about import/export graphs? \normalsize

# Example 5: What about import/export graphs?
## `diffnet` to `igraph`

- We can use the `igraph_to_diffnet` and `diffnet_to_igraph` functions

\tiny
```{r}
# Back and forth
ig <- diffnet_to_igraph(medInnovationsDiffNet)
# ig (igraph has a bug printing this)
dn <- igraph_to_diffnet(ig[[1]], "toa")

# Comparing
dn; medInnovationsDiffNet
```
\normalsize

# Example 5: What about import/export graphs?
## `diffnet` to `network`

- For `diffnet_to_network`... coming soon (in the meantime, you can use the __intergraph__ package!)

\tiny
```{r}
library(intergraph)
asNetwork(ig[[1]])
```
\normalsize

# Example 5: What about import/export graphs?
## `diffnet` to `RSiena`

\scriptsize
```{r}
library(RSiena)

# Creating an array from the fakeDyn
medInnovationsDiffNet <- medInnovationsDiffNet[,,1:5]
nominationsData <- as.array(medInnovationsDiffNet)
nominationsData <- (nominationsData > 1) + 0L
nominations     <- sienaDependent(nominationsData)

# Covariates
dat     <- diffnet.attrs(medInnovationsDiffNet)
proage1 <- coCovar(as.numeric(dat[[1]][["proage"]]))

adopts  <- lapply(dat, function(x) as.integer(with(x, per==toa)))
adopts  <- varCovar(do.call(cbind, adopts))

# Putting all together
mydata <- sienaDataCreate(nominations, proage1, adopts)
myeff <- getEffects( mydata )
print01Report( mydata, myeff,modelname="s50")
```
\normalsize