The epicontacts data structure is designed for epidemiological network analysis of cases and contacts. Data held as a separate line list and contact list can be combined into an object of class epicontacts to facilitate manipulation, visualization and analysis.

Using a simulated Ebola outbreak dataset from the outbreaks package, this vignette explores how to create an epicontacts object and how to use several generic methods to work with the data.

make_epicontacts()

make_epicontacts() creates the epicontacts data structure. The function accepts arguments for:

  • linelist: a data frame with at least one column providing unique case identifiers
  • contacts: a data frame with at least two columns indicating individuals that were in contact with one another (nb these may or may not be referenced in the line list)
  • id: the name or index for the column in the linelist that contains unique identifiers for individual cases; defaults to the first column
  • from: the name or index for the column in the contacts that contains the originating case; defaults to the first column; can still be used in non-directed networks but should not be interpreted as direction (see directed argument)
  • to: the name or index for the column in the contacts that contains the secondary case; defaults to the second column; can still be used in non-directed networks but should not be interpreted as direction (see directed argument)
  • directed: a logical value indicating whether contacts have a direction (i.e. go “from” one individual “to” another)
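
As a minimal sketch of how id, from and to map onto input columns, the example below builds two small hypothetical data frames (toy_linelist, toy_contacts and all of their column names are invented for illustration) in which the relevant columns are not in the default positions. The call to make_epicontacts() is guarded so that the block still runs when the package is not installed.

```r
# Hypothetical toy data: the identifier is the second column, not the first
toy_linelist <- data.frame(
  age     = c(34, 51, 27),
  case_id = c("a1", "b2", "c3"),
  stringsAsFactors = FALSE
)

# Hypothetical contacts: an attribute column precedes the two case columns
toy_contacts <- data.frame(
  setting  = c("home", "work"),
  infector = c("a1", "b2"),
  infectee = c("b2", "c3"),
  stringsAsFactors = FALSE
)

if (requireNamespace("epicontacts", quietly = TRUE)) {
  toy_x <- epicontacts::make_epicontacts(
    linelist = toy_linelist,
    contacts = toy_contacts,
    id       = "case_id",   # column holding unique case identifiers
    from     = "infector",  # column holding the originating case
    to       = "infectee",  # column holding the secondary case
    directed = TRUE
  )
  print(class(toy_x))
}
```

Passing column names rather than indices keeps the call valid if the columns of the input data are later reordered.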

Before creating an epicontacts object, it may be helpful to examine the structure of the line list and contact data. The example that follows uses the ebola_sim data loaded from the outbreaks package; its structure, as reported by str(ebola_sim), is:

## List of 2
##  $ linelist:'data.frame':    5888 obs. of  11 variables:
##   ..$ case_id                : chr [1:5888] "d1fafd" "53371b" "f5c3d8" "6c286a" ...
##   ..$ generation             : int [1:5888] 0 1 1 2 2 0 3 3 2 3 ...
##   ..$ date_of_infection      : Date[1:5888], format: NA "2014-04-09" ...
##   ..$ date_of_onset          : Date[1:5888], format: "2014-04-07" "2014-04-15" ...
##   ..$ date_of_hospitalisation: Date[1:5888], format: "2014-04-17" "2014-04-20" ...
##   ..$ date_of_outcome        : Date[1:5888], format: "2014-04-19" NA ...
##   ..$ outcome                : Factor w/ 2 levels "Death","Recover": NA NA 2 1 2 NA 2 1 2 1 ...
##   ..$ gender                 : Factor w/ 2 levels "f","m": 1 2 1 1 1 1 1 1 2 2 ...
##   ..$ hospital               : Factor w/ 11 levels "Connaught Hopital",..: 4 2 7 NA 7 NA 2 9 7 11 ...
##   ..$ lon                    : num [1:5888] -13.2 -13.2 -13.2 -13.2 -13.2 ...
##   ..$ lat                    : num [1:5888] 8.47 8.46 8.48 8.46 8.45 ...
##  $ contacts:'data.frame':    3800 obs. of  3 variables:
##   ..$ infector: chr [1:3800] "d1fafd" "cac51e" "f5c3d8" "0f58c4" ...
##   ..$ case_id : chr [1:3800] "53371b" "f5c3d8" "0f58c4" "881bd4" ...
##   ..$ source  : Factor w/ 2 levels "funeral","other": 2 1 2 2 2 1 2 2 2 2 ...

ebola_sim is a list with two data frames, which contain the line list and contacts respectively. The line list data frame already has a unique identifier for cases in the first column, and the contacts data has the individual contacts represented in the first and second columns. Note that if the input data were not formatted as such, the id, from and to arguments allow for explicit definition of the columns that contain these attributes.
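
Before relying on the defaults, it can be worth checking these assumptions directly. The sketch below uses a small hypothetical data set (toy_data, standing in for a list shaped like ebola_sim) and base R only; it is one possible sanity check, not part of the package.

```r
# Hypothetical stand-in for a list with the same layout as ebola_sim
toy_data <- list(
  linelist = data.frame(case_id = c("a1", "b2", "c3"),
                        gender  = c("f", "m", "f"),
                        stringsAsFactors = FALSE),
  contacts = data.frame(infector = c("a1", "b2", "z9"),
                        case_id  = c("b2", "c3", "a1"),
                        stringsAsFactors = FALSE)
)

# The default id column (the first) should hold unique, non-missing values
ids <- toy_data$linelist[[1]]
ids_ok <- !anyNA(ids) && anyDuplicated(ids) == 0

# Individuals named in the contacts but absent from the line list
contact_ids <- unique(c(toy_data$contacts[[1]], toy_data$contacts[[2]]))
not_in_linelist <- setdiff(contact_ids, ids)

print(ids_ok)
print(not_in_linelist)
```

A non-empty not_in_linelist is not an error in itself, since contacts may involve individuals who are not in the line list.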

Assuming this network of contacts is directed, the following call to make_epicontacts will generate an epicontacts object:

x <- make_epicontacts(linelist = ebola_sim$linelist, contacts = ebola_sim$contacts, directed = TRUE)

Calling class(x) confirms that make_epicontacts() returned an epicontacts object:

## [1] "epicontacts"

At their core, epicontacts objects are list objects, as a check with is.list(x) confirms:

## [1] TRUE
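
To illustrate what it means for a classed object to be a list underneath, here is a base R sketch using an invented class name (toy_epicontacts). The linelist and contacts elements mirror those of a real epicontacts object; the directed element is included as an assumption about how the direction flag might be stored.

```r
# Hypothetical toy object: a plain list carrying a class attribute
toy_obj <- structure(
  list(
    linelist = data.frame(id = c("a1", "b2"), stringsAsFactors = FALSE),
    contacts = data.frame(from = "a1", to = "b2", stringsAsFactors = FALSE),
    directed = TRUE
  ),
  class = "toy_epicontacts"  # invented class name, for illustration only
)

print(class(toy_obj))    # the class attribute
print(is.list(toy_obj))  # the underlying type is still a list
print(names(toy_obj))    # named elements, as in any list
```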

As with other lists, the named elements of the epicontacts data structure can be easily accessed with the $ operator.

$linelist

head(x$linelist)
##       id generation date_of_infection date_of_onset date_of_hospitalisation
## 1 d1fafd          0              <NA>    2014-04-07              2014-04-17
## 2 53371b          1        2014-04-09    2014-04-15              2014-04-20
## 3 f5c3d8          1        2014-04-18    2014-04-21              2014-04-25
## 4 6c286a          2              <NA>    2014-04-27              2014-04-27
## 5 0f58c4          2        2014-04-22    2014-04-26              2014-04-29
## 6 49731d          0        2014-03-19    2014-04-25              2014-05-02
##   date_of_outcome outcome gender           hospital       lon      lat
## 1      2014-04-19    <NA>      f  Military Hospital -13.21799 8.473514
## 2            <NA>    <NA>      m Connaught Hospital -13.21491 8.464927
## 3      2014-04-30 Recover      f              other -13.22804 8.483356
## 4      2014-05-07   Death      f               <NA> -13.23112 8.464776
## 5      2014-05-17 Recover      f              other -13.21016 8.452143
## 6      2014-05-07    <NA>      f               <NA> -13.23443 8.468572

$contacts

head(x$contacts)
##      from     to  source
## 2  d1fafd 53371b   other
## 3  cac51e f5c3d8 funeral
## 5  f5c3d8 0f58c4   other
## 8  0f58c4 881bd4   other
## 11 8508df 40ae5f   other
## 12 127d83 f547d6 funeral

Generics

The epicontacts data structure enables some convenient implementations of “generic” functions in R. These functions (plot(), print(), summary(), etc.) behave differently depending on the class of the input.
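
The mechanism behind this is S3 method dispatch: a generic such as summary() looks for a function named summary.<class> and calls it. The sketch below demonstrates this with an invented class (toy_net); it shows the general technique and is not the package's own code.

```r
# Invented class, used only to demonstrate S3 dispatch
toy_net <- structure(
  list(nodes = c("a1", "b2", "c3"), n_edges = 2L),
  class = "toy_net"
)

# summary() is generic: calling it on a "toy_net" object dispatches here
summary.toy_net <- function(object, ...) {
  out <- list(n_nodes = length(object$nodes), n_edges = object$n_edges)
  class(out) <- "summary.toy_net"
  out
}

# print() is generic too, so the summary gets its own display method
print.summary.toy_net <- function(x, ...) {
  cat("nodes:", x$n_nodes, "| edges:", x$n_edges, "\n")
  invisible(x)
}

s <- summary(toy_net)
print(s)
```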

summary.epicontacts()

The summary method provides descriptive information about the dimensions of the line list and contact list and the relationship between them (i.e. how many IDs they share). Calling summary(x) gives:

## 
## /// Overview //
##   // number of unique IDs in linelist: 5888
##   // number of unique IDs in contacts: 5511
##   // number of unique IDs in both: 4352
##   // number of contacts: 3800
##   // contacts with both cases in linelist: 56.868 %
## 
## /// Degrees of the network //
##   // in-degree summary:
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0000  0.0000  1.0000  0.5392  1.0000  1.0000 
## 
##   // out-degree summary:
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0000  0.0000  0.0000  0.5392  1.0000  6.0000 
## 
##   // in and out degree summary:
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   1.000   1.000   1.078   1.000   7.000 
## 
## /// Attributes //
##   // attributes in linelist:
##  generation date_of_infection date_of_onset date_of_hospitalisation date_of_outcome outcome gender hospital lon lat
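
To make the overview figures concrete, the following base R sketch computes comparable quantities on hypothetical toy data. It is not the package's implementation; in particular, computing degrees over line list cases only is an assumption made here for simplicity.

```r
# Hypothetical toy data, in the shape of x$linelist and x$contacts
toy_linelist <- data.frame(id = c("a1", "b2", "c3", "d4"),
                           stringsAsFactors = FALSE)
toy_contacts <- data.frame(from = c("a1", "a1", "z9"),
                           to   = c("b2", "c3", "a1"),
                           stringsAsFactors = FALSE)

# Unique IDs in each component, and those shared by both
ids_linelist <- unique(toy_linelist$id)
ids_contacts <- unique(c(toy_contacts$from, toy_contacts$to))
ids_both     <- intersect(ids_linelist, ids_contacts)

# Share of contacts whose two ends both appear in the line list
both_known <- toy_contacts$from %in% ids_linelist &
  toy_contacts$to %in% ids_linelist
pct_both_known <- 100 * mean(both_known)

# In-degree: times each case appears as "to"; out-degree: as "from"
in_degree  <- table(factor(toy_contacts$to,   levels = ids_linelist))
out_degree <- table(factor(toy_contacts$from, levels = ids_linelist))

print(length(ids_both))
print(pct_both_known)
```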

subset.epicontacts()

With this method, one can reduce the size of the epicontacts object by filtering rows based on explicit values in the line list (node) and contact list (edge) components. For more on how to parameterize the subset, see ?subset.epicontacts.

nb this function returns an epicontacts object, which can in turn be passed to another generic method.

rokupafuneral <- subset(x, 
                        node_attribute = list("hospital" = "Rokupa Hospital"), 
                        edge_attribute = list("source" = "funeral"))
summary(rokupafuneral)
## 
## /// Overview //
##   // number of unique IDs in linelist: 443
##   // number of unique IDs in contacts: 1019
##   // number of unique IDs in both: 45
##   // number of contacts: 572
##   // contacts with both cases in linelist: 0 %
## 
## /// Degrees of the network //
##   // in-degree summary:
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0000  0.0000  0.0000  0.4037  1.0000  1.0000 
## 
##   // out-degree summary:
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0000  0.0000  0.0000  0.4037  1.0000  4.0000 
## 
##   // in and out degree summary:
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0000  0.0000  1.0000  0.8073  1.0000  4.0000 
## 
## /// Attributes //
##   // attributes in linelist:
##  generation date_of_infection date_of_onset date_of_hospitalisation date_of_outcome outcome gender hospital lon lat
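
As a rough base R analogue of the two filters, the sketch below works on hypothetical toy data. It assumes the node and edge filters act independently on their respective components, which the differing ID counts in the summary above suggest but which is not confirmed here; see ?subset.epicontacts for the actual behavior.

```r
# Hypothetical toy data, in the shape of x$linelist and x$contacts
toy_linelist <- data.frame(
  id       = c("a1", "b2", "c3"),
  hospital = c("Rokupa Hospital", "other", "Rokupa Hospital"),
  stringsAsFactors = FALSE
)
toy_contacts <- data.frame(
  from   = c("a1", "b2", "c3"),
  to     = c("b2", "c3", "a1"),
  source = c("funeral", "other", "funeral"),
  stringsAsFactors = FALSE
)

# Node filter: keep line list rows matching the attribute value
# (which() drops rows where the attribute is NA)
kept_nodes <- toy_linelist[which(toy_linelist$hospital == "Rokupa Hospital"), ]

# Edge filter: keep contact rows matching the attribute value
kept_edges <- toy_contacts[which(toy_contacts$source == "funeral"), ]

print(kept_nodes$id)
print(nrow(kept_edges))
```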

plot.epicontacts()

By default, passing an epicontacts object to the plot function is effectively the same as calling vis_epicontacts(), and generates an interactive visualization of the network of cases and contacts. This method includes a number of options to customize the plot; for more, see ?vis_epicontacts.

plot(rokupafuneral, y = "outcome")