Compare structures of two datasets

This function extracts the structures of two data.frames and compares them, issuing a series of diagnostics.

compare_data(ref, x, ...)

# S3 method for default
compare_data(ref, x, ...)

# S3 method for data_structure
compare_data(
  ref,
  x,
  use_dim = TRUE,
  use_names = TRUE,
  use_classes = TRUE,
  use_values = TRUE,
  columns = TRUE,
  ...
)

# S3 method for data.frame
compare_data(ref, x, ...)

# S3 method for data_comparison
print(x, ..., common_values = TRUE, diff_only = TRUE)

Arguments

ref	the reference `data.frame`
x	a `data.frame` to be checked
...	further arguments passed to other methods
use_dim	a `logical` indicating if dataset dimensions should be compared
use_names	a `logical` indicating if names of the variables should be compared
use_classes	a `logical` indicating if classes of the variables should be compared
use_values	a `logical` indicating if values of matching categorical variables should be compared
columns	the names or indices of the columns to compare. Defaults to `TRUE`, which keeps all columns.
common_values	when `TRUE` (default), common values are printed. When `FALSE`, common values are suppressed.
diff_only	when `TRUE` (default), only differences between `ref` and `x` are presented and similarities are ignored; common values are hidden.
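The `columns` argument accepts names or indices, and `TRUE` keeps every column. Below is a minimal base-R sketch of how such a selector behaves. It assumes, without confirmation from this page, that the selection follows ordinary `[` column indexing. `select_columns()` is a hypothetical helper and is not part of the function's API.

```r
# Hypothetical helper: NOT part of compare_data()'s API.
# Assumption: `columns` follows base R `[` column-indexing rules.
select_columns <- function(df, columns = TRUE) {
  # drop = FALSE keeps a data.frame even when only one column is selected
  df[, columns, drop = FALSE]
}

ncol(select_columns(iris))              # 5: TRUE keeps all columns
names(select_columns(iris, "Species"))  # "Species": selection by name
names(select_columns(iris, c(1, 5)))    # "Sepal.Length" "Species": by position
```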

Value

An object of class `data_comparison`: a named list with one element per test.

Details

The comparison relies on checking differences in:

dimensions of the datasets (numbers of rows and columns)
names of columns
classes of the columns (only the first class is used)
values of the categorical variables
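To make these checks concrete, here is a minimal base-R sketch of the same kinds of tests: names, first classes and categorical values, plus the dimension check that appears in the printed output. It illustrates the ideas under stated assumptions and does not reproduce the function's actual code. `sketch_compare()` is a hypothetical name. Treating factors and character vectors as the "categorical" variables is also an assumption.

```r
# Hypothetical sketch of the checks described above; NOT compare_data()'s code.
# Assumption: factors and character vectors count as "categorical".
sketch_compare <- function(ref, x) {
  first_class <- function(df) {
    vapply(df, function(col) class(col)[1], character(1))
  }
  is_categorical <- function(col) is.factor(col) || is.character(col)

  common <- intersect(names(ref), names(x))

  # Classes: only the first class of each shared column is compared
  ref_cls <- first_class(ref)[common]
  x_cls <- first_class(x)[common]
  changed <- common[ref_cls != x_cls]

  # Values: shared columns that are categorical in both datasets
  is_cat <- vapply(common, function(v) {
    is_categorical(ref[[v]]) && is_categorical(x[[v]])
  }, logical(1))
  cat_vars <- common[is_cat]
  values <- lapply(cat_vars, function(v) {
    ref_vals <- unique(as.character(ref[[v]]))
    x_vals <- unique(as.character(x[[v]]))
    list(missing = setdiff(ref_vals, x_vals), new = setdiff(x_vals, ref_vals))
  })
  names(values) <- cat_vars

  # One element per test, in the order used by the printed output
  list(
    dim = list(ref = dim(ref), x = dim(x)),
    names = list(missing = setdiff(names(ref), names(x)),
                 new = setdiff(names(x), names(ref)),
                 same_order = identical(names(ref), names(x))),
    classes = data.frame(variable = changed,
                         ref = unname(ref_cls[changed]),
                         x = unname(x_cls[changed]),
                         stringsAsFactors = FALSE),
    values = values
  )
}

res <- sketch_compare(iris, data.frame(Species = letters,
                                       stringsAsFactors = FALSE))
res$names$missing           # the four numeric iris columns
res$classes                 # Species: factor in ref, character in x
res$values$Species$missing  # "setosa" "versicolor" "virginica"
```

Returning plain vectors per test keeps the sketch easy to inspect. A printing step, like the one the real `print` method performs, would format them separately.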

Examples


## no differences
compare_data(iris, iris)
#> 
#>  /// Comparisons of data content // 
#>      (showing differences only)
#> 
#> 
#>  // Comparison of dimensions /
#> Same number of rows and columns
#> 
#>  // Comparison of variable names /
#> 
#> Same variable names, in the same order
#> 
#>  // Comparison of variable classes /
#> 
#> Same variable classes
#> 
#>  // Comparison of values in categorical variables /
#> 
#> Same values for categorical variables

## different dimensions
compare_data(iris, iris[-1, -2])
#> 
#>  /// Comparisons of data content // 
#>      (showing differences only)
#> 
#> 
#>  // Comparison of dimensions /
#>   * different numbers of rows: ref has 150, new data has 149
#>   * different numbers of columns: ref has 5, new data has 4
#> 
#> 
#>  // Comparison of variable names /
#> 
#>   * variables missing in the new data:
#> [1] "Sepal.Width"
#> 
#> 
#>  // Comparison of variable classes /
#> 
#> 
#>  // Comparison of values in categorical variables /
#> 
#> Same values for categorical variables
compare_data(iris[-1, -2], iris) # inverse
#> 
#>  /// Comparisons of data content // 
#>      (showing differences only)
#> 
#> 
#>  // Comparison of dimensions /
#>   * different numbers of rows: ref has 149, new data has 150
#>   * different numbers of columns: ref has 4, new data has 5
#> 
#> 
#>  // Comparison of variable names /
#> 
#>   * new variables:
#> [1] "Sepal.Width"
#> 
#> 
#>  // Comparison of variable classes /
#> 
#> 
#>  // Comparison of values in categorical variables /
#> 
#> Same values for categorical variables

## one variable in common but different class and content
compare_data(iris,
             data.frame(Species = letters,
                        stringsAsFactors = FALSE))
#> 
#>  /// Comparisons of data content // 
#>      (showing differences only)
#> 
#> 
#>  // Comparison of dimensions /
#>   * different numbers of rows: ref has 150, new data has 26
#>   * different numbers of columns: ref has 5, new data has 1
#> 
#> 
#>  // Comparison of variable names /
#> 
#>   * variables missing in the new data:
#> [1] "Sepal.Length" "Sepal.Width"  "Petal.Length" "Petal.Width" 
#> 
#> 
#>  // Comparison of variable classes /
#> `Species` has changed from `factor` to `character`
#> 
#> 
#>  // Comparison of values in categorical variables /
#> 
#>   * Missing values in `Species`:
#> [1] "setosa"     "versicolor" "virginica" 
#> 
#>   * New values in `Species`:
#>  [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q" "r" "s"
#> [20] "t" "u" "v" "w" "x" "y" "z"
#> 
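The Details state that only the first class of each column is used. In base R that corresponds to `class(...)[1]`, which is what separates the two `Species` columns in the example above. A consequence worth knowing, inferred from that rule rather than documented here: an ordered factor has `"ordered"` as its first class, so it would presumably be reported as a class change relative to a plain factor.

```r
# First class of each column, as used (per the Details) for class comparison
class(iris$Species)[1]  # "factor"
class(letters)[1]       # "character"

# An ordered factor carries two classes; only the first would be compared
sizes <- factor(c("S", "M", "L"), levels = c("S", "M", "L"), ordered = TRUE)
class(sizes)            # "ordered" "factor"
class(sizes)[1]         # "ordered"
```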

## Comparing only specific columns

set.seed(1) # make the random letters below reproducible
iris1 <- iris2 <- iris
iris1$letter <- sample(letters[1:3], nrow(iris), replace = TRUE)
iris2$letter <- sample(letters[1:8], nrow(iris), replace = TRUE)
compare_data(iris1, iris2, columns = "Species")
#> 
#>  /// Comparisons of data content // 
#>      (showing differences only)
#> 
#> 
#>  // Comparison of dimensions /
#> Same number of rows and columns
#> 
#>  // Comparison of variable names /
#> 
#> Same variable names, in the same order
#> 
#>  // Comparison of variable classes /
#> 
#> Same variable classes
#> 
#>  // Comparison of values in categorical variables /
#> 
#> Same values for categorical variables
compare_data(iris, iris2, columns = "Species")
#> 
#>  /// Comparisons of data content // 
#>      (showing differences only)
#> 
#> 
#>  // Comparison of dimensions /
#> Same number of rows and columns
#> 
#>  // Comparison of variable names /
#> 
#> Same variable names, in the same order
#> 
#>  // Comparison of variable classes /
#> 
#> Same variable classes
#> 
#>  // Comparison of values in categorical variables /
#> 
#> Same values for categorical variables
compare_data(iris, iris1)
#> 
#>  /// Comparisons of data content // 
#>      (showing differences only)
#> 
#> 
#>  // Comparison of dimensions /
#>   * different numbers of columns: ref has 5, new data has 6
#> 
#> 
#>  // Comparison of variable names /
#> 
#>   * new variables:
#> [1] "letter"
#> 
#> 
#>  // Comparison of variable classes /
#> 
#> 
#>  // Comparison of values in categorical variables /
#> 
compare_data(iris1, iris2)
#> 
#>  /// Comparisons of data content // 
#>      (showing differences only)
#> 
#> 
#>  // Comparison of dimensions /
#> Same number of rows and columns
#> 
#>  // Comparison of variable names /
#> 
#> Same variable names, in the same order
#> 
#>  // Comparison of variable classes /
#> 
#> Same variable classes
#> 
#>  // Comparison of values in categorical variables /
#> 
#>   * New values in `letter`:
#> [1] "d" "e" "f" "g" "h"
#>
