This package implements small helper functions useful in infectious disease modelling and epidemic analysis.

To install the current stable, CRAN version of the package, type:

`install.packages("epitrix")`

To benefit from the latest features and bug fixes, install the development, *github* version of the package using:

`devtools::install_github("reconhub/epitrix")`

Note that this requires the package *devtools* to be installed.

The main features of the package include:

- `gamma_shapescale2mucv`: convert shape and scale of a Gamma distribution to mean and CV
- `gamma_mucv2shapescale`: convert mean and CV of a Gamma distribution to shape and scale
- `gamma_log_likelihood`: Gamma log-likelihood using mean and CV
- `r2R0`: convert growth rate into a reproduction number
- `lm2R0_sample`: generates a distribution of R0 from a log-incidence linear model
- `fit_disc_gamma`: fits a discretised Gamma distribution to data (typically useful for describing delays)
- `clean_labels`: generate portable labels by removing non-standard characters or replacing them with their closest alphanumeric matches, standardising separators, etc.
- `hash_names`: generate unique, anonymised, reproducible labels from various data fields (e.g. First name, Last name, Date of birth).

In this example, we simulate data which replicate the serial interval (SI), i.e. the delay between primary and secondary symptom onsets, in Ebola Virus Disease (EVD). We start by converting previous estimates of the mean and standard deviation of the SI (WHO Ebola Response Team (2014) NEJM 371:1481–1495) to the parameters of a Gamma distribution:

```
library(epitrix)
mu <- 15.3 # mean in days
sigma <- 9.3 # standard deviation in days
cv <- sigma/mu # coefficient of variation
cv
```

`## [1] 0.6078431`

```
param <- gamma_mucv2shapescale(mu, cv) # conversion to Gamma parameters
param
```

```
## $shape
## [1] 2.706556
##
## $scale
## [1] 5.652941
```
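The conversion follows from the moments of the Gamma distribution: the mean is shape × scale, and the CV is 1/sqrt(shape). As a minimal base R sketch of both directions (the helper names `mucv_to_shapescale` and `shapescale_to_mucv` are hypothetical and not part of *epitrix*):

```r
# Hypothetical helpers illustrating the moment relations of a Gamma
# distribution: mean = shape * scale, CV = 1 / sqrt(shape).
mucv_to_shapescale <- function(mu, cv) {
  shape <- 1 / cv^2
  scale <- mu * cv^2
  list(shape = shape, scale = scale)
}

shapescale_to_mucv <- function(shape, scale) {
  list(mu = shape * scale, cv = 1 / sqrt(shape))
}

p <- mucv_to_shapescale(15.3, 9.3 / 15.3)
p$shape  # about 2.7066
p$scale  # about 5.6529

# Converting back recovers the original mean and CV
back <- shapescale_to_mucv(p$shape, p$scale)
```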

The *shape* and *scale* are parameters of a Gamma distribution we can use to generate delays. However, delays are typically reported in whole days, which implies a discretisation (from continuous time to discrete numbers). We use the package *distcrete* to achieve this discretisation. It generates a list of functions, including one to simulate data (`$r`), which we use to simulate 500 delays:

```
si <- distcrete::distcrete("gamma", interval = 1,
shape = param$shape,
scale = param$scale, w = 0)
si
```

```
## A discrete distribution
## name: gamma
## parameters:
## shape: 2.70655567117586
## scale: 5.65294117647059
```
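To make the discretisation step concrete, here is a base R sketch of one common convention, in which the probability of a delay of `k` days is the Gamma probability mass on the interval [k, k + 1). This is assumed to correspond to `interval = 1` and `w = 0`, but it is an illustration rather than a description of how *distcrete* is implemented; the names `d_disc` and `r_disc` are hypothetical.

```r
shape <- 2.706556
scale <- 5.652941

# Probability mass of a delay of k whole days: mass of the continuous
# Gamma on [k, k + 1) (assumed convention, see text)
d_disc <- function(k) {
  pgamma(k + 1, shape = shape, scale = scale) -
    pgamma(k, shape = shape, scale = scale)
}

# Simulate discretised delays by rounding continuous draws down
r_disc <- function(n) floor(rgamma(n, shape = shape, scale = scale))

probs <- d_disc(0:200)
sum(probs)  # very close to 1

set.seed(1)
delays <- r_disc(500)
range(delays)
```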

```
set.seed(1)
x <- si$r(500)
head(x, 10)
```

`## [1] 8 10 15 28 7 27 32 17 16 4`

```
hist(x, col = "grey", border = "white",
xlab = "Days between primary and secondary onset",
main = "Simulated serial intervals")
```

`x` contains simulated data, for illustrative purposes. In practice, one would use real data from an ongoing outbreak. Now we use `fit_disc_gamma` to estimate the parameters of a discretised Gamma distribution from the data:

```
si_fit <- fit_disc_gamma(x)
si_fit
```

```
## $mu
## [1] 15.21914
##
## $cv
## [1] 0.5851581
##
## $sd
## [1] 8.905604
##
## $ll
## [1] -1741.393
##
## $converged
## [1] TRUE
##
## $distribution
## A discrete distribution
## name: gamma
## parameters:
## shape: 2.92047557759351
## scale: 5.21118646429829
```
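For readers curious about what such a fit involves, the following base R sketch estimates the mean and CV of a discretised Gamma by maximum likelihood with `optim`. It is a simplified illustration under the assumption that a delay of `k` days has the Gamma mass on [k, k + 1); it is not necessarily how `fit_disc_gamma` works internally, and `disc_gamma_nll` and `sim_delays` are hypothetical names.

```r
# Negative log-likelihood of a discretised Gamma, parameterised by
# log(mean) and log(CV) so that both stay positive during optimisation
disc_gamma_nll <- function(par, x) {
  mu <- exp(par[1])
  cv <- exp(par[2])
  shape <- 1 / cv^2
  scale <- mu * cv^2
  p <- pgamma(x + 1, shape = shape, scale = scale) -
    pgamma(x, shape = shape, scale = scale)
  # guard against log(0) for extremely unlikely observations
  -sum(log(pmax(p, .Machine$double.xmin)))
}

# Simulated whole-day delays (illustrative data only)
set.seed(1)
sim_delays <- floor(rgamma(500, shape = 2.7, scale = 5.65))

# Start from the empirical mean and CV, then minimise
start <- c(log(mean(sim_delays)), log(sd(sim_delays) / mean(sim_delays)))
fit <- optim(start, disc_gamma_nll, x = sim_delays)
est <- exp(fit$par)  # estimated mean and CV of the underlying Gamma
est
```

Optimising on the log scale avoids having to constrain the search to positive values of the mean and CV.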

The package *incidence* can fit a log-linear model to incidence curves (function `fit`), which produces a growth rate (r). This growth rate can in turn be translated into a basic reproduction number (R0) using `r2R0`. We illustrate this using simulated Ebola data from the *outbreaks* package, and using the serial interval from the previous example:

```
library(outbreaks)
library(incidence)
i <- incidence(ebola_sim$linelist$date_of_onset)
i
```

```
## <incidence object>
## [5888 cases from days 2014-04-07 to 2015-04-30]
##
## $counts: matrix with 389 rows and 1 columns
## $n: 5888 cases in total
## $dates: 389 dates marking the left-side of bins
## $interval: 1 day
## $timespan: 389 days
```

`f <- fit(i[1:150]) # fit on first 150 days`

`## Warning in fit(i[1:150]): 22 dates with incidence of 0 ignored for fitting`

`plot(i[1:200], fit = f, color = "#9fc2fc")`

`r2R0(f$info$r, si$d(1:100))`

`## [1] 1.348624`

`r2R0(f$info$r.conf, si$d(1:100))`

```
## 2.5 % 97.5 %
## [1,] 1.314055 1.383674
```
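The link between growth rate and reproduction number can be sketched with the discrete form of the Euler-Lotka relation, R0 = 1 / sum(w(t) * exp(-r * t)), where w is the serial interval distribution. The base R code below is an illustration only: `r_to_R0` is a hypothetical name, and `r2R0` may handle time steps or normalisation differently.

```r
# Discretised Gamma serial interval over days 1..100, using the
# parameters derived earlier (mass on [t, t + 1) is an assumed convention)
shape <- 2.706556
scale <- 5.652941
days <- 1:100
w <- pgamma(days + 1, shape = shape, scale = scale) -
  pgamma(days, shape = shape, scale = scale)

# Hypothetical helper: reproduction number from growth rate r and
# serial interval weights w (w[t] = probability of a delay of t days)
r_to_R0 <- function(r, w) {
  w <- w / sum(w)  # normalise so the weights sum to 1
  1 / sum(w * exp(-r * seq_along(w)))
}

r_to_R0(0, w)      # equals 1 (up to rounding): no growth
r_to_R0(0.02, w)   # above 1: growing epidemic
r_to_R0(-0.02, w)  # below 1: declining epidemic
```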

In addition, we can use the function `lm2R0_sample` to generate samples of R0 values compatible with a model fit:

```
R0_val <- lm2R0_sample(f$lm, si$d(1:100), n = 100)
head(R0_val)
```

`## [1] 1.350970 1.347374 1.350076 1.358523 1.341549 1.341634`

`hist(R0_val, col = "grey", border = "white")`
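One way to think about sampling R0 values from a model fit is to draw growth rates around the fitted slope, using its standard error, and convert each draw to R0. The base R sketch below does this on simulated data. It assumes a t-distributed slope and the discrete Euler-Lotka relation; all names (`r_to_R0`, `dat`, `r_draws`) are hypothetical, and `lm2R0_sample` may differ in its details.

```r
set.seed(1)

# Discretised Gamma serial interval over days 1..100 (assumed convention)
shape <- 2.706556
scale <- 5.652941
t_days <- 1:100
w <- pgamma(t_days + 1, shape = shape, scale = scale) -
  pgamma(t_days, shape = shape, scale = scale)

r_to_R0 <- function(r, w) {
  w <- w / sum(w)
  1 / sum(w * exp(-r * seq_along(w)))
}

# Simulated, exponentially growing daily incidence (illustrative data)
dat <- data.frame(day = 1:50)
dat$cases <- rpois(nrow(dat), lambda = 5 * exp(0.03 * dat$day))
dat <- dat[dat$cases > 0, ]  # log(0) is undefined, so drop empty days

# Log-linear model: the slope estimates the growth rate r
model <- lm(log(cases) ~ day, data = dat)
coefs <- summary(model)$coefficients
r_hat <- coefs["day", "Estimate"]
r_se <- coefs["day", "Std. Error"]

# Draw growth rates around the estimate, then convert each to R0
r_draws <- r_hat + r_se * rt(100, df = model$df.residual)
R0_draws <- vapply(r_draws, r_to_R0, numeric(1), w = w)
summary(R0_draws)
```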

If you want to use labels that will work across different computers, independent of local encoding and operating systems, `clean_labels` will make your life easier. The function transforms character strings by replacing diacritic symbols with their closest alphanumeric matches, setting all characters to lower case, and replacing various separators with a single, consistent one.

For instance:

```
x <- " Thîs- is A wêïrD LäBeL .."
x
```

`## [1] " Thîs- is A wêïrD LäBeL .."`

`clean_labels(x)`

`## [1] "this_is_a_weird_label"`

```
variables <- c("Date.of.ONSET ",
"/ date of hôspitalisation /",
"-DäTÈ--OF___DîSCHARGE-",
"GEndèr/",
" Location. ")
variables
```

```
## [1] "Date.of.ONSET " "/ date of hôspitalisation /"
## [3] "-DäTÈ--OF___DîSCHARGE-" "GEndèr/"
## [5] " Location. "
```

`clean_labels(variables)`

```
## [1] "date_of_onset" "date_of_hospitalisation"
## [3] "date_of_discharge" "gender"
## [5] "location"
```
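The lower-casing and separator handling can be approximated in a few lines of base R. The sketch below covers ASCII input only and deliberately leaves out the transliteration of diacritics that `clean_labels` performs; `clean_ascii_labels` is a hypothetical name.

```r
# Hypothetical, ASCII-only approximation of the cleaning steps
clean_ascii_labels <- function(x) {
  x <- tolower(x)                  # lower case everything
  x <- gsub("[^a-z0-9]+", "_", x)  # any run of other characters -> "_"
  gsub("^_+|_+$", "", x)           # drop leading/trailing separators
}

cleaned <- clean_ascii_labels(c("Date.of.ONSET ",
                                " Location. ",
                                "-DATE--OF___DISCHARGE-"))
cleaned
```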

`hash_names` can be used to generate hashed labels from linelist data. Based on pre-defined fields, it will generate anonymous labels. This system has the following desirable features:

- given the same input, the output will always be the same, so this encoding system generates labels which can be used by different people and organisations
- given different inputs, the output will always be different; even minor differences in input will result in entirely different outputs
- given an output, it is very hard to infer the input (it requires hacking skills); if security is challenged, the hashing algorithm can be ‘salted’ to strengthen security

```
first_name <- c("Jane", "Joe", "Raoul", "Raoul")
last_name <- c("Doe", "Smith", "Dupont", "Dupond")
age <- c(25, 69, 36, 36)
## detailed output by default
hash_names(first_name, last_name, age)
```

```
## label hash_short hash
## 1 jane_doe_25 4ea703 4ea703cdc94049fc649cd152faa5beed02858d69
## 2 joe_smith_69 c76ee2 c76ee26d66438710023bb256a644ba9db9c2f6be
## 3 raoul_dupont_36 abdf0b abdf0bf7990ce5127b406ca16fc1edd659ff51b3
## 4 raoul_dupond_36 783fc2 783fc2dc44b496c3afd3b4546e997aea63f1e0df
```

```
## short labels for practical use
hash_names(first_name, last_name, age,
size = 8, full = FALSE)
```

`## [1] "4ea703cd" "c76ee26d" "abdf0bf7" "783fc2dc"`
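The properties listed above (reproducibility, sensitivity to small changes, salting) can be demonstrated with any hash function. The sketch below uses MD5 through `tools::md5sum` purely because it ships with base R; this is not the algorithm used by `hash_names`, MD5 is not suitable for real anonymisation, and `hash_label` is a hypothetical helper.

```r
# Hypothetical helper: hash a label, optionally prefixed with a salt.
# tools::md5sum() works on files, so the text goes through a temp file.
hash_label <- function(label, salt = "") {
  tmp <- tempfile()
  on.exit(unlink(tmp))
  writeBin(charToRaw(paste0(salt, label)), tmp)
  unname(tools::md5sum(tmp))
}

h1 <- hash_label("raoul_dupont_36")
h2 <- hash_label("raoul_dupont_36")
h3 <- hash_label("raoul_dupond_36")                    # one letter differs
h4 <- hash_label("raoul_dupont_36", salt = "a secret") # salted

identical(h1, h2)  # TRUE: same input, same hash
h1 == h3           # FALSE: a minor change gives a different hash
h1 == h4           # FALSE: the salt changes the hash
substr(h1, 1, 8)   # shortened label for practical use
```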

The overview vignette essentially replicates the content of this `README`. To request or contribute other vignettes, see the section “*getting help, contributing*”.

Click here for the website dedicated to *epitrix*.

Bug reports and feature requests should be posted on *github* using the *issue* system. All other questions should be posted on the **RECON forum**.

Contributions are welcome via **pull requests**.

Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.