Handle dates data

This function detects variables of a data.frame that effectively represent dates and converts them to Date objects. When variables are character strings or factors, the function tries to convert them using various pre-defined formats (see details). For each variable, the most common date format is detected automatically, and entries that do not follow it are set to NA (i.e. missing). A tolerance threshold (error_tolerance) sets the proportion of entries that may fail to convert. By default, the tolerance is 0.5, meaning that up to 50% of the entries of a given variable may be errors. If the proportion of errors is higher, the variable is assumed not to be a date and is left untouched.
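
The detection rule described above can be sketched in base R. This is only an illustration under simplifying assumptions, not the package's implementation: the helper `sketch_guess_date()` and its short list of candidate formats are hypothetical, and the formats actually tried are those defined by `guess_dates()`.

```r
# Hypothetical helper illustrating the rule; not part of the package.
sketch_guess_date <- function(x, error_tolerance = 0.5,
                              formats = c("%Y-%m-%d", "%d/%m/%Y", "%d %m %Y")) {
  chr <- as.character(x)
  # Parse the whole vector once per candidate format (NA = failed to parse).
  parsed <- lapply(formats, function(f) as.Date(chr, format = f))
  # Treat the format that parses the most entries as the "most common" one.
  n_ok <- vapply(parsed, function(d) sum(!is.na(d)), integer(1))
  best <- which.max(n_ok)
  # Proportion of non-missing entries that could not be read as dates.
  n_present <- sum(!is.na(chr))
  prop_errors <- if (n_present == 0) 0 else 1 - n_ok[best] / n_present
  if (prop_errors > error_tolerance) {
    return(x)  # too many errors: assume this is not a date, leave untouched
  }
  parsed[[best]]  # entries not following the best format are NA
}

onset <- c("male", "08/01/2018", "02/01/2018", "04/01/2018")
sketch_guess_date(onset)                         # Date vector, first entry NA
sketch_guess_date(onset, error_tolerance = 0.1)  # character vector, unchanged
```

One of the four entries fails to parse here (25%), so the vector is converted under a tolerance of 0.5 and returned unchanged under a tolerance of 0.1.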

clean_dates(
  x,
  force_Date = TRUE,
  guess_dates = TRUE,
  error_tolerance = 0.5,
  ...,
  classes = NULL
)

Arguments

x	a `data.frame`
force_Date	a `logical` or `integer` vector indicating the columns in which `POSIXct` and `POSIXlt` objects should be converted to `Date` objects; a single `logical` value applies to all columns; defaults to `TRUE`; you should use this if your dates are only precise to the day (i.e. no time information within days).
guess_dates	a `logical` or `integer` vector indicating which columns should be guessed, assuming these columns store character strings or `factors`; this feature is experimental; see `guess_dates()` for more information.
error_tolerance	a number between 0 and 1 indicating the proportion of entries which cannot be identified as dates to be tolerated; if this proportion is exceeded, the original vector is returned, and a message is issued; defaults to 0.5 (50 percent)
...	further arguments passed on to `guess_dates()`
classes	a vector of class definitions for each of the columns. If this is not provided, the classes will be read from the columns themselves. Practically, this is used in `clean_data()` to mark columns as protected.
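
As a rough illustration of what `force_Date` implies, a `POSIXct` value can be reduced to a `Date` with base R's `as.Date()`, which discards the time of day. The snippet below is a sketch using only base R, not the package's internals. The variable names are made up, and the time zone is passed explicitly because the resulting day depends on it. The last lines show that a column selector given as positions and one given as a `logical` vector pick the same columns.

```r
# A timestamp late in the evening, stored in UTC (made-up value).
admission <- as.POSIXct("2018-01-08 23:20:09", tz = "UTC")

# Converting to Date keeps the calendar day and drops the time.
day_utc <- as.Date(admission, tz = "UTC")

# The same instant falls on the next calendar day in a zone ahead of UTC,
# so the time zone matters for timestamps close to midnight.
day_tokyo <- as.Date(admission, tz = "Asia/Tokyo")

# Columns can be selected by position or by a logical vector of equal length.
df <- data.frame(a = 1:2, b = 3:4, c = 5:6)
by_position <- c(1L, 3L)
by_logical  <- c(TRUE, FALSE, TRUE)
same_columns <- identical(names(df)[by_position], names(df)[by_logical])
```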

Value

A data.frame with standardised dates.

Examples


## make toy data
onsets <- as.POSIXct("2018-01-01", tz = "UTC")
onsets <- seq(onsets, by = "1 day", length.out = 10)
onsets <- sample(onsets, 20, replace = TRUE)
onsets2 <- format(as.Date(onsets), "%d/%m/%Y")
onsets3 <- format(as.Date(onsets), "%d %m %Y")
outcomes <- onsets + 1e7
admissions <- onsets + 86400 + sample(86400, 20)
admissions[1:5] <- NA
discharges <- admissions + (86400 * sample(5, 20, replace = TRUE)) + sample(86400, 20)
onset_with_errors <- onsets2
onset_with_errors[c(1,20)] <- c("male", "confirmed")
mixed_info <- onsets3
mixed_info[1:10] <- sample(c("bleeding", "fever"), 10, replace = TRUE)
gender <- sample(c("male", "female"), 20, replace = TRUE)
case_type <- c("confirmed", "probable", "suspected", "not a case")
case <- sample(case_type, 20, replace = TRUE)
toy_data <- data.frame("Date of Onset." = onsets,
                       "onset 2" = onsets2,
                       "ONSET 3" = onsets3,
                       "onset_4" = onset_with_errors,
                       "date admission" = admissions,
                       "DATE.of.DISCHARGE" = discharges,
                       "GENDER_ " = gender,
                       "Épi.Case_définition" = case,
                       "date of admission" = admissions,
                       "Date-of_discharge" = discharges,
                       "extra" = mixed_info,
                       stringsAsFactors = FALSE,
                       check.names = FALSE)
## show data
toy_data
#>    Date of Onset.    onset 2    ONSET 3    onset_4      date admission
#> 1      2018-01-10 10/01/2018 10 01 2018       male                <NA>
#> 2      2018-01-08 08/01/2018 08 01 2018 08/01/2018                <NA>
#> 3      2018-01-02 02/01/2018 02 01 2018 02/01/2018                <NA>
#> 4      2018-01-04 04/01/2018 04 01 2018 04/01/2018                <NA>
#> 5      2018-01-05 05/01/2018 05 01 2018 05/01/2018                <NA>
#> 6      2018-01-07 07/01/2018 07 01 2018 07/01/2018 2018-01-08 18:20:09
#> 7      2018-01-04 04/01/2018 04 01 2018 04/01/2018 2018-01-05 09:34:39
#> 8      2018-01-04 04/01/2018 04 01 2018 04/01/2018 2018-01-05 20:40:49
#> 9      2018-01-04 04/01/2018 04 01 2018 04/01/2018 2018-01-05 04:47:37
#> 10     2018-01-03 03/01/2018 03 01 2018 03/01/2018 2018-01-04 18:33:36
#> 11     2018-01-05 05/01/2018 05 01 2018 05/01/2018 2018-01-06 20:28:07
#> 12     2018-01-06 06/01/2018 06 01 2018 06/01/2018 2018-01-07 14:16:32
#> 13     2018-01-05 05/01/2018 05 01 2018 05/01/2018 2018-01-06 01:51:22
#> 14     2018-01-03 03/01/2018 03 01 2018 03/01/2018 2018-01-04 12:51:18
#> 15     2018-01-04 04/01/2018 04 01 2018 04/01/2018 2018-01-05 09:38:50
#> 16     2018-01-09 09/01/2018 09 01 2018 09/01/2018 2018-01-10 12:57:26
#> 17     2018-01-05 05/01/2018 05 01 2018 05/01/2018 2018-01-06 19:32:14
#> 18     2018-01-06 06/01/2018 06 01 2018 06/01/2018 2018-01-07 15:02:59
#> 19     2018-01-07 07/01/2018 07 01 2018 07/01/2018 2018-01-08 02:53:43
#> 20     2018-01-07 07/01/2018 07 01 2018  confirmed 2018-01-08 11:31:40
#>      DATE.of.DISCHARGE GENDER_  Épi.Case_définition   date of admission
#> 1                 <NA>     male            probable                <NA>
#> 2                 <NA>     male            probable                <NA>
#> 3                 <NA>   female           suspected                <NA>
#> 4                 <NA>   female           suspected                <NA>
#> 5                 <NA>   female          not a case                <NA>
#> 6  2018-01-14 16:57:18   female           suspected 2018-01-08 18:20:09
#> 7  2018-01-08 05:01:34     male           confirmed 2018-01-05 09:34:39
#> 8  2018-01-08 09:42:15   female          not a case 2018-01-05 20:40:49
#> 9  2018-01-07 07:13:17     male          not a case 2018-01-05 04:47:37
#> 10 2018-01-06 10:50:14     male           confirmed 2018-01-04 18:33:36
#> 11 2018-01-08 19:21:05     male           confirmed 2018-01-06 20:28:07
#> 12 2018-01-08 16:07:12     male          not a case 2018-01-07 14:16:32
#> 13 2018-01-07 13:45:23     male           suspected 2018-01-06 01:51:22
#> 14 2018-01-08 13:20:23   female           confirmed 2018-01-04 12:51:18
#> 15 2018-01-07 01:11:50     male            probable 2018-01-05 09:38:50
#> 16 2018-01-12 10:54:32   female           confirmed 2018-01-10 12:57:26
#> 17 2018-01-09 00:18:16   female           confirmed 2018-01-06 19:32:14
#> 18 2018-01-10 23:02:40     male          not a case 2018-01-07 15:02:59
#> 19 2018-01-09 04:19:17   female          not a case 2018-01-08 02:53:43
#> 20 2018-01-13 00:47:03   female            probable 2018-01-08 11:31:40
#>      Date-of_discharge      extra
#> 1                 <NA>      fever
#> 2                 <NA>   bleeding
#> 3                 <NA>      fever
#> 4                 <NA>   bleeding
#> 5                 <NA>   bleeding
#> 6  2018-01-14 16:57:18      fever
#> 7  2018-01-08 05:01:34   bleeding
#> 8  2018-01-08 09:42:15   bleeding
#> 9  2018-01-07 07:13:17      fever
#> 10 2018-01-06 10:50:14      fever
#> 11 2018-01-08 19:21:05 05 01 2018
#> 12 2018-01-08 16:07:12 06 01 2018
#> 13 2018-01-07 13:45:23 05 01 2018
#> 14 2018-01-08 13:20:23 03 01 2018
#> 15 2018-01-07 01:11:50 04 01 2018
#> 16 2018-01-12 10:54:32 09 01 2018
#> 17 2018-01-09 00:18:16 05 01 2018
#> 18 2018-01-10 23:02:40 06 01 2018
#> 19 2018-01-09 04:19:17 07 01 2018
#> 20 2018-01-13 00:47:03 07 01 2018
str(toy_data)
#> 'data.frame':	20 obs. of  11 variables:
#>  $ Date of Onset.     : POSIXct, format: "2018-01-10" "2018-01-08" ...
#>  $ onset 2            : chr  "10/01/2018" "08/01/2018" "02/01/2018" "04/01/2018" ...
#>  $ ONSET 3            : chr  "10 01 2018" "08 01 2018" "02 01 2018" "04 01 2018" ...
#>  $ onset_4            : chr  "male" "08/01/2018" "02/01/2018" "04/01/2018" ...
#>  $ date admission     : POSIXct, format: NA NA ...
#>  $ DATE.of.DISCHARGE  : POSIXct, format: NA NA ...
#>  $ GENDER_            : chr  "male" "male" "female" "female" ...
#>  $ Épi.Case_définition: chr  "probable" "probable" "suspected" "suspected" ...
#>  $ date of admission  : POSIXct, format: NA NA ...
#>  $ Date-of_discharge  : POSIXct, format: NA NA ...
#>  $ extra              : chr  "fever" "bleeding" "fever" "bleeding" ...

## clean variable names, store in new object, show results
clean_data <- clean_variable_names(toy_data)
#> Warning: Some variable names were duplicated after cleaning and had suffixes attached:
#> 
#>   Date-of_discharge -> date_of_discharge_1
clean_data1 <- clean_dates(clean_data, first_date = "2018-01-01")
clean_data1
#>    date_of_onset    onset_2    onset_3    onset_4 date_admission
#> 1     2018-01-10 2018-01-10 2018-01-10       <NA>           <NA>
#> 2     2018-01-08 2018-01-08 2018-01-08 2018-01-08           <NA>
#> 3     2018-01-02 2018-01-02 2018-01-02 2018-01-02           <NA>
#> 4     2018-01-04 2018-01-04 2018-01-04 2018-01-04           <NA>
#> 5     2018-01-05 2018-01-05 2018-01-05 2018-01-05           <NA>
#> 6     2018-01-07 2018-01-07 2018-01-07 2018-01-07     2018-01-08
#> 7     2018-01-04 2018-01-04 2018-01-04 2018-01-04     2018-01-05
#> 8     2018-01-04 2018-01-04 2018-01-04 2018-01-04     2018-01-05
#> 9     2018-01-04 2018-01-04 2018-01-04 2018-01-04     2018-01-05
#> 10    2018-01-03 2018-01-03 2018-01-03 2018-01-03     2018-01-04
#> 11    2018-01-05 2018-01-05 2018-01-05 2018-01-05     2018-01-06
#> 12    2018-01-06 2018-01-06 2018-01-06 2018-01-06     2018-01-07
#> 13    2018-01-05 2018-01-05 2018-01-05 2018-01-05     2018-01-06
#> 14    2018-01-03 2018-01-03 2018-01-03 2018-01-03     2018-01-04
#> 15    2018-01-04 2018-01-04 2018-01-04 2018-01-04     2018-01-05
#> 16    2018-01-09 2018-01-09 2018-01-09 2018-01-09     2018-01-10
#> 17    2018-01-05 2018-01-05 2018-01-05 2018-01-05     2018-01-06
#> 18    2018-01-06 2018-01-06 2018-01-06 2018-01-06     2018-01-07
#> 19    2018-01-07 2018-01-07 2018-01-07 2018-01-07     2018-01-08
#> 20    2018-01-07 2018-01-07 2018-01-07       <NA>     2018-01-08
#>    date_of_discharge gender epi_case_definition date_of_admission
#> 1               <NA>   male            probable              <NA>
#> 2               <NA>   male            probable              <NA>
#> 3               <NA> female           suspected              <NA>
#> 4               <NA> female           suspected              <NA>
#> 5               <NA> female          not a case              <NA>
#> 6         2018-01-14 female           suspected        2018-01-08
#> 7         2018-01-08   male           confirmed        2018-01-05
#> 8         2018-01-08 female          not a case        2018-01-05
#> 9         2018-01-07   male          not a case        2018-01-05
#> 10        2018-01-06   male           confirmed        2018-01-04
#> 11        2018-01-08   male           confirmed        2018-01-06
#> 12        2018-01-08   male          not a case        2018-01-07
#> 13        2018-01-07   male           suspected        2018-01-06
#> 14        2018-01-08 female           confirmed        2018-01-04
#> 15        2018-01-07   male            probable        2018-01-05
#> 16        2018-01-12 female           confirmed        2018-01-10
#> 17        2018-01-09 female           confirmed        2018-01-06
#> 18        2018-01-10   male          not a case        2018-01-07
#> 19        2018-01-09 female          not a case        2018-01-08
#> 20        2018-01-13 female            probable        2018-01-08
#>    date_of_discharge_1      extra
#> 1                 <NA>       <NA>
#> 2                 <NA>       <NA>
#> 3                 <NA>       <NA>
#> 4                 <NA>       <NA>
#> 5                 <NA>       <NA>
#> 6           2018-01-14       <NA>
#> 7           2018-01-08       <NA>
#> 8           2018-01-08       <NA>
#> 9           2018-01-07       <NA>
#> 10          2018-01-06       <NA>
#> 11          2018-01-08 2018-01-05
#> 12          2018-01-08 2018-01-06
#> 13          2018-01-07 2018-01-05
#> 14          2018-01-08 2018-01-03
#> 15          2018-01-07 2018-01-04
#> 16          2018-01-12 2018-01-09
#> 17          2018-01-09 2018-01-05
#> 18          2018-01-10 2018-01-06
#> 19          2018-01-09 2018-01-07
#> 20          2018-01-13 2018-01-07

## Only clean the columns that have the words "date" or "admission" in them
the_date_cols <- grep("(date|admission)", names(clean_data))
the_date_cols
#> [1]  1  5  6  9 10
clean_data2 <- clean_dates(clean_data,
                           first_date  = "2018-01-01",
                           force_Date  = the_date_cols,
                           guess_dates = the_date_cols)
clean_data2
#>    date_of_onset    onset_2    onset_3    onset_4 date_admission
#> 1     2018-01-10 10/01/2018 10 01 2018       male           <NA>
#> 2     2018-01-08 08/01/2018 08 01 2018 08/01/2018           <NA>
#> 3     2018-01-02 02/01/2018 02 01 2018 02/01/2018           <NA>
#> 4     2018-01-04 04/01/2018 04 01 2018 04/01/2018           <NA>
#> 5     2018-01-05 05/01/2018 05 01 2018 05/01/2018           <NA>
#> 6     2018-01-07 07/01/2018 07 01 2018 07/01/2018     2018-01-08
#> 7     2018-01-04 04/01/2018 04 01 2018 04/01/2018     2018-01-05
#> 8     2018-01-04 04/01/2018 04 01 2018 04/01/2018     2018-01-05
#> 9     2018-01-04 04/01/2018 04 01 2018 04/01/2018     2018-01-05
#> 10    2018-01-03 03/01/2018 03 01 2018 03/01/2018     2018-01-04
#> 11    2018-01-05 05/01/2018 05 01 2018 05/01/2018     2018-01-06
#> 12    2018-01-06 06/01/2018 06 01 2018 06/01/2018     2018-01-07
#> 13    2018-01-05 05/01/2018 05 01 2018 05/01/2018     2018-01-06
#> 14    2018-01-03 03/01/2018 03 01 2018 03/01/2018     2018-01-04
#> 15    2018-01-04 04/01/2018 04 01 2018 04/01/2018     2018-01-05
#> 16    2018-01-09 09/01/2018 09 01 2018 09/01/2018     2018-01-10
#> 17    2018-01-05 05/01/2018 05 01 2018 05/01/2018     2018-01-06
#> 18    2018-01-06 06/01/2018 06 01 2018 06/01/2018     2018-01-07
#> 19    2018-01-07 07/01/2018 07 01 2018 07/01/2018     2018-01-08
#> 20    2018-01-07 07/01/2018 07 01 2018  confirmed     2018-01-08
#>    date_of_discharge gender epi_case_definition date_of_admission
#> 1               <NA>   male            probable              <NA>
#> 2               <NA>   male            probable              <NA>
#> 3               <NA> female           suspected              <NA>
#> 4               <NA> female           suspected              <NA>
#> 5               <NA> female          not a case              <NA>
#> 6         2018-01-14 female           suspected        2018-01-08
#> 7         2018-01-08   male           confirmed        2018-01-05
#> 8         2018-01-08 female          not a case        2018-01-05
#> 9         2018-01-07   male          not a case        2018-01-05
#> 10        2018-01-06   male           confirmed        2018-01-04
#> 11        2018-01-08   male           confirmed        2018-01-06
#> 12        2018-01-08   male          not a case        2018-01-07
#> 13        2018-01-07   male           suspected        2018-01-06
#> 14        2018-01-08 female           confirmed        2018-01-04
#> 15        2018-01-07   male            probable        2018-01-05
#> 16        2018-01-12 female           confirmed        2018-01-10
#> 17        2018-01-09 female           confirmed        2018-01-06
#> 18        2018-01-10   male          not a case        2018-01-07
#> 19        2018-01-09 female          not a case        2018-01-08
#> 20        2018-01-13 female            probable        2018-01-08
#>    date_of_discharge_1      extra
#> 1                 <NA>      fever
#> 2                 <NA>   bleeding
#> 3                 <NA>      fever
#> 4                 <NA>   bleeding
#> 5                 <NA>   bleeding
#> 6           2018-01-14      fever
#> 7           2018-01-08   bleeding
#> 8           2018-01-08   bleeding
#> 9           2018-01-07      fever
#> 10          2018-01-06      fever
#> 11          2018-01-08 05 01 2018
#> 12          2018-01-08 06 01 2018
#> 13          2018-01-07 05 01 2018
#> 14          2018-01-08 03 01 2018
#> 15          2018-01-07 04 01 2018
#> 16          2018-01-12 09 01 2018
#> 17          2018-01-09 05 01 2018
#> 18          2018-01-10 06 01 2018
#> 19          2018-01-09 07 01 2018
#> 20          2018-01-13 07 01 2018
str(clean_data2)
#> 'data.frame':	20 obs. of  11 variables:
#>  $ date_of_onset      : Date, format: "2018-01-10" "2018-01-08" ...
#>  $ onset_2            : chr  "10/01/2018" "08/01/2018" "02/01/2018" "04/01/2018" ...
#>  $ onset_3            : chr  "10 01 2018" "08 01 2018" "02 01 2018" "04 01 2018" ...
#>  $ onset_4            : chr  "male" "08/01/2018" "02/01/2018" "04/01/2018" ...
#>  $ date_admission     : Date, format: NA NA ...
#>  $ date_of_discharge  : Date, format: NA NA ...
#>  $ gender             : chr  "male" "male" "female" "female" ...
#>  $ epi_case_definition: chr  "probable" "probable" "suspected" "suspected" ...
#>  $ date_of_admission  : Date, format: NA NA ...
#>  $ date_of_discharge_1: Date, format: NA NA ...
#>  $ extra              : chr  "fever" "bleeding" "fever" "bleeding" ...
#>  - attr(*, "comment")= Named chr  "Date of Onset." "onset 2" "ONSET 3" "onset_4" ...
#>   ..- attr(*, "names")= chr  "date_of_onset" "onset_2" "onset_3" "onset_4" ...

## A more complex example: clean date and admissions, but avoid the discharge
## columns, since their timestamps are important
the_date_cols <- grepl("(date|admission)", names(clean_data))
discharge     <- grepl("discharge", names(clean_data))

## set names so that these are easier to track
names(the_date_cols) <- names(clean_data)
names(discharge)     <- names(clean_data)

the_date_cols # columns we want
#>       date_of_onset             onset_2             onset_3             onset_4 
#>                TRUE               FALSE               FALSE               FALSE 
#>      date_admission   date_of_discharge              gender epi_case_definition 
#>                TRUE                TRUE               FALSE               FALSE 
#>   date_of_admission date_of_discharge_1               extra 
#>                TRUE                TRUE               FALSE 
!discharge    # columns that are not the discharge columns ("!" means "not")
#>       date_of_onset             onset_2             onset_3             onset_4 
#>                TRUE                TRUE                TRUE                TRUE 
#>      date_admission   date_of_discharge              gender epi_case_definition 
#>                TRUE               FALSE                TRUE                TRUE 
#>   date_of_admission date_of_discharge_1               extra 
#>                TRUE               FALSE                TRUE 
to_keep     <- the_date_cols & !discharge # removing the discharge columns
clean_data3 <- clean_dates(clean_data,
                           first_date  = "2018-01-01",
                           force_Date  = to_keep,
                           guess_dates = to_keep)
clean_data3
#>    date_of_onset    onset_2    onset_3    onset_4 date_admission
#> 1     2018-01-10 10/01/2018 10 01 2018       male           <NA>
#> 2     2018-01-08 08/01/2018 08 01 2018 08/01/2018           <NA>
#> 3     2018-01-02 02/01/2018 02 01 2018 02/01/2018           <NA>
#> 4     2018-01-04 04/01/2018 04 01 2018 04/01/2018           <NA>
#> 5     2018-01-05 05/01/2018 05 01 2018 05/01/2018           <NA>
#> 6     2018-01-07 07/01/2018 07 01 2018 07/01/2018     2018-01-08
#> 7     2018-01-04 04/01/2018 04 01 2018 04/01/2018     2018-01-05
#> 8     2018-01-04 04/01/2018 04 01 2018 04/01/2018     2018-01-05
#> 9     2018-01-04 04/01/2018 04 01 2018 04/01/2018     2018-01-05
#> 10    2018-01-03 03/01/2018 03 01 2018 03/01/2018     2018-01-04
#> 11    2018-01-05 05/01/2018 05 01 2018 05/01/2018     2018-01-06
#> 12    2018-01-06 06/01/2018 06 01 2018 06/01/2018     2018-01-07
#> 13    2018-01-05 05/01/2018 05 01 2018 05/01/2018     2018-01-06
#> 14    2018-01-03 03/01/2018 03 01 2018 03/01/2018     2018-01-04
#> 15    2018-01-04 04/01/2018 04 01 2018 04/01/2018     2018-01-05
#> 16    2018-01-09 09/01/2018 09 01 2018 09/01/2018     2018-01-10
#> 17    2018-01-05 05/01/2018 05 01 2018 05/01/2018     2018-01-06
#> 18    2018-01-06 06/01/2018 06 01 2018 06/01/2018     2018-01-07
#> 19    2018-01-07 07/01/2018 07 01 2018 07/01/2018     2018-01-08
#> 20    2018-01-07 07/01/2018 07 01 2018  confirmed     2018-01-08
#>      date_of_discharge gender epi_case_definition date_of_admission
#> 1                 <NA>   male            probable              <NA>
#> 2                 <NA>   male            probable              <NA>
#> 3                 <NA> female           suspected              <NA>
#> 4                 <NA> female           suspected              <NA>
#> 5                 <NA> female          not a case              <NA>
#> 6  2018-01-14 16:57:18 female           suspected        2018-01-08
#> 7  2018-01-08 05:01:34   male           confirmed        2018-01-05
#> 8  2018-01-08 09:42:15 female          not a case        2018-01-05
#> 9  2018-01-07 07:13:17   male          not a case        2018-01-05
#> 10 2018-01-06 10:50:14   male           confirmed        2018-01-04
#> 11 2018-01-08 19:21:05   male           confirmed        2018-01-06
#> 12 2018-01-08 16:07:12   male          not a case        2018-01-07
#> 13 2018-01-07 13:45:23   male           suspected        2018-01-06
#> 14 2018-01-08 13:20:23 female           confirmed        2018-01-04
#> 15 2018-01-07 01:11:50   male            probable        2018-01-05
#> 16 2018-01-12 10:54:32 female           confirmed        2018-01-10
#> 17 2018-01-09 00:18:16 female           confirmed        2018-01-06
#> 18 2018-01-10 23:02:40   male          not a case        2018-01-07
#> 19 2018-01-09 04:19:17 female          not a case        2018-01-08
#> 20 2018-01-13 00:47:03 female            probable        2018-01-08
#>    date_of_discharge_1      extra
#> 1                 <NA>      fever
#> 2                 <NA>   bleeding
#> 3                 <NA>      fever
#> 4                 <NA>   bleeding
#> 5                 <NA>   bleeding
#> 6  2018-01-14 16:57:18      fever
#> 7  2018-01-08 05:01:34   bleeding
#> 8  2018-01-08 09:42:15   bleeding
#> 9  2018-01-07 07:13:17      fever
#> 10 2018-01-06 10:50:14      fever
#> 11 2018-01-08 19:21:05 05 01 2018
#> 12 2018-01-08 16:07:12 06 01 2018
#> 13 2018-01-07 13:45:23 05 01 2018
#> 14 2018-01-08 13:20:23 03 01 2018
#> 15 2018-01-07 01:11:50 04 01 2018
#> 16 2018-01-12 10:54:32 09 01 2018
#> 17 2018-01-09 00:18:16 05 01 2018
#> 18 2018-01-10 23:02:40 06 01 2018
#> 19 2018-01-09 04:19:17 07 01 2018
#> 20 2018-01-13 00:47:03 07 01 2018
str(clean_data3)
#> 'data.frame':	20 obs. of  11 variables:
#>  $ date_of_onset      : Date, format: "2018-01-10" "2018-01-08" ...
#>  $ onset_2            : chr  "10/01/2018" "08/01/2018" "02/01/2018" "04/01/2018" ...
#>  $ onset_3            : chr  "10 01 2018" "08 01 2018" "02 01 2018" "04 01 2018" ...
#>  $ onset_4            : chr  "male" "08/01/2018" "02/01/2018" "04/01/2018" ...
#>  $ date_admission     : Date, format: NA NA ...
#>  $ date_of_discharge  : POSIXct, format: NA NA ...
#>  $ gender             : chr  "male" "male" "female" "female" ...
#>  $ epi_case_definition: chr  "probable" "probable" "suspected" "suspected" ...
#>  $ date_of_admission  : Date, format: NA NA ...
#>  $ date_of_discharge_1: POSIXct, format: NA NA ...
#>  $ extra              : chr  "fever" "bleeding" "fever" "bleeding" ...
#>  - attr(*, "comment")= Named chr  "Date of Onset." "onset 2" "ONSET 3" "onset_4" ...
#>   ..- attr(*, "names")= chr  "date_of_onset" "onset_2" "onset_3" "onset_4" ...
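
The `str()` calls in the examples are one way to see which columns were converted. A more compact check could compare column classes before and after cleaning. The helper `class_changes()` below is hypothetical and uses only base R. It assumes both data frames have the same columns in the same order, and the stand-in data are made up so that the snippet runs without the package.

```r
# Hypothetical helper: report columns whose first class differs.
class_changes <- function(before, after) {
  first_class <- function(df) {
    vapply(df, function(col) class(col)[1], character(1))
  }
  out <- data.frame(column = names(after),
                    before = unname(first_class(before)),
                    after  = unname(first_class(after)),
                    stringsAsFactors = FALSE)
  out[out$before != out$after, , drop = FALSE]
}

# Stand-in data; in practice, pass the input and output of clean_dates().
raw <- data.frame(onset  = as.POSIXct(c("2018-01-10", "2018-01-08"), tz = "UTC"),
                  gender = c("male", "female"),
                  stringsAsFactors = FALSE)
cleaned <- raw
cleaned$onset <- as.Date(cleaned$onset, tz = "UTC")

changed <- class_changes(raw, cleaned)
changed
```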
