This function extracts the structures of two data.frames and compares them, issuing a series of diagnostics.

Usage

compare_data(ref, x, ...)

# S3 method for default
compare_data(ref, x, ...)

# S3 method for data_structure
compare_data(
  ref,
  x,
  use_dim = TRUE,
  use_names = TRUE,
  use_classes = TRUE,
  use_values = TRUE,
  columns = TRUE,
  ...
)

# S3 method for data.frame
compare_data(ref, x, ...)

# S3 method for data_comparison
print(x, ..., common_values = TRUE, diff_only = TRUE)
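The usage above lists a single generic with several S3 methods. The following is a minimal sketch of that kind of layered dispatch in base R. It is not the package's code: `compare_sketch`, `as_structure_sketch` and the class `structure_sketch` are hypothetical names. The generic calls `UseMethod()`, and R picks the method that matches the class of the first argument, `ref`. The data.frame method converts both inputs and then re-dispatches to a structure method, where `use_*` switches select the checks.

```r
# Hypothetical helper: summarise a data.frame as a classed list
as_structure_sketch <- function(df) {
  structure(
    list(dim = dim(df),
         names = names(df),
         classes = vapply(df, function(col) class(col)[1], character(1))),
    class = "structure_sketch"
  )
}

# Hypothetical generic: R selects a method from the class of `ref`
compare_sketch <- function(ref, x, ...) UseMethod("compare_sketch")

# Fallback for inputs that no other method handles
compare_sketch.default <- function(ref, x, ...) {
  stop("don't know how to compare objects of class ", class(ref)[1],
       call. = FALSE)
}

# data.frame method: convert both inputs, then re-dispatch
compare_sketch.data.frame <- function(ref, x, ...) {
  compare_sketch(as_structure_sketch(ref), as_structure_sketch(x), ...)
}

# Structure method: the `use_*` switches decide which checks run
compare_sketch.structure_sketch <- function(ref, x, use_dim = TRUE,
                                            use_names = TRUE,
                                            use_classes = TRUE, ...) {
  out <- list()
  if (use_dim) out$dim <- identical(ref$dim, x$dim)
  if (use_names) out$names <- identical(ref$names, x$names)
  if (use_classes) out$classes <- identical(ref$classes, x$classes)
  out
}

# A list with elements `dim` and `names`, both FALSE
res <- compare_sketch(iris, iris[-1, -2], use_classes = FALSE)
str(res)
```

Passing `...` through the data.frame method lets switches such as `use_classes` reach the structure method unchanged.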

Arguments

ref

the reference data.frame

x

a data.frame to be checked

...

further arguments passed to other methods

use_dim

a logical indicating if dataset dimensions should be compared

use_names

a logical indicating if names of the variables should be compared

use_classes

a logical indicating if classes of the variables should be compared

use_values

a logical indicating if values of matching categorical variables should be compared

columns

the names or indices of the columns to compare. Defaults to TRUE, which keeps all columns.

common_values

when TRUE (default), common values are printed. When FALSE, common values are suppressed.

diff_only

when TRUE (default), only differences between ref and x are presented and similarities are ignored, so common values are hidden.
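A default of TRUE for `columns` works because of how R indexes data frames. A single TRUE is recycled across every column, while names or indices select a subset. Below is a minimal base-R sketch, assuming that the same selection is applied to both datasets before they are compared. The helper `keep_columns` is hypothetical.

```r
# Hypothetical helper: restrict a data.frame to the requested columns.
# A logical TRUE is recycled, so every column is kept;
# character names or numeric indices select a subset.
keep_columns <- function(df, columns = TRUE) {
  df[, columns, drop = FALSE]
}

ncol(keep_columns(iris))              # TRUE keeps all 5 columns
names(keep_columns(iris, "Species"))  # only "Species"
names(keep_columns(iris, c(1, 5)))    # "Sepal.Length" and "Species"
```

With `drop = FALSE`, a one-column selection stays a data.frame. Selecting a name that is absent from `df` makes base R stop with "undefined columns selected", so a real implementation has to decide how to handle columns present in only one dataset.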

Value

an object of class data_comparison, i.e. a named list with one element per test.
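Giving a named list a class attribute is what makes `print()` dispatch to a dedicated method, such as the `print` method listed under Usage. The sketch below is a minimal illustration and not the package's method. The constructor `new_comparison_sketch`, the class `comparison_sketch` and the element names are hypothetical, and each test is assumed to store TRUE when the two datasets agree. Its `diff_only` argument mimics the documented behaviour of skipping tests that found no difference.

```r
# Hypothetical constructor: one named logical per test (TRUE = no difference)
new_comparison_sketch <- function(results) {
  structure(results, class = "comparison_sketch")
}

# print() dispatches here because of the class attribute
print.comparison_sketch <- function(x, ..., diff_only = TRUE) {
  tests <- unclass(x)
  # Keep only the tests that found a difference
  if (diff_only) tests <- tests[!unlist(tests)]
  if (length(tests) == 0) {
    cat("No differences\n")
  } else {
    for (nm in names(tests)) {
      cat(sprintf("%s: %s\n", nm, if (tests[[nm]]) "same" else "different"))
    }
  }
  invisible(x)
}

res <- new_comparison_sketch(list(dim = FALSE, names = TRUE, classes = TRUE))
print(res)                     # only the `dim` test is shown
print(res, diff_only = FALSE)  # all three tests are shown
```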

Details

The comparison relies on checking differences in:

  • dimensions (numbers of rows and columns)

  • names of columns

  • classes of the columns (only the first class is used)

  • values of the categorical variables
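The checks listed above can be approximated in base R. This is a minimal sketch, not the package's implementation. The helper `diff_structure` is hypothetical, and it assumes that "categorical" means factor or character columns, since the Species example compares a factor with a character vector. As in the Details, only the first class of each column is used.

```r
# Hypothetical helper: mirrors the listed checks using base R only.
# Assumption: factor and character columns count as categorical.
diff_structure <- function(ref, x) {
  first_class <- function(df) {
    vapply(df, function(col) class(col)[1], character(1))
  }
  is_categorical <- function(col) is.factor(col) || is.character(col)

  common <- intersect(names(ref), names(x))
  cls_ref <- first_class(ref)[common]
  cls_x <- first_class(x)[common]

  # Categorical variables present in both datasets
  both_cat <- vapply(common, function(v) {
    is_categorical(ref[[v]]) && is_categorical(x[[v]])
  }, logical(1))
  cat_vars <- common[both_cat]

  # Values lost from, or added to, each categorical variable
  values <- lapply(cat_vars, function(v) {
    old <- unique(as.character(ref[[v]]))
    new <- unique(as.character(x[[v]]))
    list(missing = setdiff(old, new), new = setdiff(new, old))
  })
  names(values) <- cat_vars

  list(
    dim = list(ref = dim(ref), x = dim(x)),
    missing_vars = setdiff(names(ref), names(x)),
    new_vars = setdiff(names(x), names(ref)),
    changed_classes = common[cls_ref != cls_x],
    values = values
  )
}

d <- diff_structure(iris, data.frame(Species = letters, stringsAsFactors = FALSE))
d$changed_classes          # "Species"
d$values$Species$missing   # "setosa" "versicolor" "virginica"
```

Converting both sides with `as.character()` lets a factor be compared with a character vector by value, which is the situation in the Species example.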

Examples

## no differences
compare_data(iris, iris)
#>
#> /// Comparisons of data content //
#> (showing differences only)
#>
#>
#> // Comparison of dimensions /
#> Same number of rows and columns
#>
#> // Comparison of variable names /
#>
#> Same variable names, in the same order
#>
#> // Comparison of variable classes /
#>
#> Same variable classes
#>
#> // Comparison of values in categorical variables /
#>
#> Same values for categorical variables
## different dimensions
compare_data(iris, iris[-1, -2])
#>
#> /// Comparisons of data content //
#> (showing differences only)
#>
#>
#> // Comparison of dimensions /
#> * different numbers of rows: ref has 150, new data has 149
#> * different numbers of columns: ref has 5, new data has 4
#>
#>
#> // Comparison of variable names /
#>
#> * variables missing in the new data:
#> [1] "Sepal.Width"
#>
#>
#> // Comparison of variable classes /
#>
#>
#> // Comparison of values in categorical variables /
#>
#> Same values for categorical variables
compare_data(iris[-1, -2], iris) # inverse
#>
#> /// Comparisons of data content //
#> (showing differences only)
#>
#>
#> // Comparison of dimensions /
#> * different numbers of rows: ref has 149, new data has 150
#> * different numbers of columns: ref has 4, new data has 5
#>
#>
#> // Comparison of variable names /
#>
#> * new variables:
#> [1] "Sepal.Width"
#>
#>
#> // Comparison of variable classes /
#>
#>
#> // Comparison of values in categorical variables /
#>
#> Same values for categorical variables
## one variable in common but different class and content
compare_data(iris, data.frame(Species = letters, stringsAsFactors = FALSE))
#>
#> /// Comparisons of data content //
#> (showing differences only)
#>
#>
#> // Comparison of dimensions /
#> * different numbers of rows: ref has 150, new data has 26
#> * different numbers of columns: ref has 5, new data has 1
#>
#>
#> // Comparison of variable names /
#>
#> * variables missing in the new data:
#> [1] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width"
#>
#>
#> // Comparison of variable classes /
#> `Species` has changed from `factor` to `character`
#>
#>
#> // Comparison of values in categorical variables /
#>
#> * Missing values in `Species`:
#> [1] "setosa" "versicolor" "virginica"
#>
#> * New values in `Species`:
#> [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q" "r" "s"
#> [20] "t" "u" "v" "w" "x" "y" "z"
#>
## Comparing only specific columns
iris1 <- iris2 <- iris
iris1$letter <- sample(letters[1:3], nrow(iris), replace = TRUE)
iris2$letter <- sample(letters[1:8], nrow(iris), replace = TRUE)
compare_data(iris1, iris2, columns = "Species")
#>
#> /// Comparisons of data content //
#> (showing differences only)
#>
#>
#> // Comparison of dimensions /
#> Same number of rows and columns
#>
#> // Comparison of variable names /
#>
#> Same variable names, in the same order
#>
#> // Comparison of variable classes /
#>
#> Same variable classes
#>
#> // Comparison of values in categorical variables /
#>
#> Same values for categorical variables
compare_data(iris, iris2, columns = "Species")
#>
#> /// Comparisons of data content //
#> (showing differences only)
#>
#>
#> // Comparison of dimensions /
#> Same number of rows and columns
#>
#> // Comparison of variable names /
#>
#> Same variable names, in the same order
#>
#> // Comparison of variable classes /
#>
#> Same variable classes
#>
#> // Comparison of values in categorical variables /
#>
#> Same values for categorical variables
compare_data(iris, iris1)
#>
#> /// Comparisons of data content //
#> (showing differences only)
#>
#>
#> // Comparison of dimensions /
#> * different numbers of columns: ref has 5, new data has 6
#>
#>
#> // Comparison of variable names /
#>
#> * new variables:
#> [1] "letter"
#>
#>
#> // Comparison of variable classes /
#>
#>
#> // Comparison of values in categorical variables /
#>
compare_data(iris1, iris2)
#>
#> /// Comparisons of data content //
#> (showing differences only)
#>
#>
#> // Comparison of dimensions /
#> Same number of rows and columns
#>
#> // Comparison of variable names /
#>
#> Same variable names, in the same order
#>
#> // Comparison of variable classes /
#>
#> Same variable classes
#>
#> // Comparison of values in categorical variables /
#>
#> * New values in `letter`:
#> [1] "d" "e" "f" "g" "h"
#>