Source: `R/compare_data.R`, `R/print.data_comparison.R` (Rd file: `compare_data.Rd`)
This function extracts the structures of two data.frames and compares them, issuing a series of diagnostics.
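As a rough illustration of what the "structure" of a data.frame might contain, the base-R sketch below gathers the pieces that the comparisons rely on: dimensions, column names, the first class of each column, and the unique values of categorical columns. It is a hypothetical helper (`extract_structure_sketch()` is an invented name), not the package's own code, and treating only character and factor columns as categorical is an assumption.

```r
# Hypothetical helper, not part of the package: summarise the structure of a
# data.frame. Assumption: "categorical" means character or factor columns.
extract_structure_sketch <- function(df) {
  stopifnot(is.data.frame(df))
  # keep only the first class of each column
  classes <- vapply(df, function(col) class(col)[1], character(1))
  is_cat <- classes %in% c("character", "factor")
  # unique values of each categorical column, compared as strings
  values <- lapply(df[is_cat], function(col) sort(unique(as.character(col))))
  list(
    dim = dim(df),
    names = names(df),
    classes = classes,
    values = values
  )
}

iris_structure <- extract_structure_sketch(iris)
iris_structure$dim             # 150 5
iris_structure$values$Species  # "setosa" "versicolor" "virginica"
```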
```r
compare_data(ref, x, ...)

# S3 method for default
compare_data(ref, x, ...)

# S3 method for data_structure
compare_data(
  ref,
  x,
  use_dim = TRUE,
  use_names = TRUE,
  use_classes = TRUE,
  use_values = TRUE,
  columns = TRUE,
  ...
)

# S3 method for data.frame
compare_data(ref, x, ...)

# S3 method for data_comparison
print(x, ..., common_values = TRUE, diff_only = TRUE)
```
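The usage above declares a generic with `default`, `data_structure` and `data.frame` methods, and only the `data_structure` method exposes the `use_*` switches. One plausible reading is that the `data.frame` method converts its inputs and re-dispatches. The base-R sketch below illustrates that S3 pattern under this assumption; `compare_data_sketch()` and the class `data_structure_sketch` are invented names, and the sketch is not the package's implementation.

```r
# Hypothetical generic mirroring the dispatch pattern in the usage.
compare_data_sketch <- function(ref, x, ...) {
  UseMethod("compare_data_sketch")
}

# Default method: refuse unsupported inputs.
compare_data_sketch.default <- function(ref, x, ...) {
  stop("no method for objects of class ", class(ref)[1], call. = FALSE)
}

# data.frame method (assumed design): convert both inputs to a minimal
# structure object, then re-dispatch so the structure method does the work.
compare_data_sketch.data.frame <- function(ref, x, ...) {
  as_structure <- function(df) {
    structure(list(names = names(df), dim = dim(df)),
              class = "data_structure_sketch")
  }
  compare_data_sketch(as_structure(ref), as_structure(x), ...)
}

# Structure method: the only one carrying the switches for individual tests.
compare_data_sketch.data_structure_sketch <- function(ref, x,
                                                      use_dim = TRUE,
                                                      use_names = TRUE, ...) {
  out <- list()
  if (use_dim) out$dim <- identical(ref$dim, x$dim)
  if (use_names) out$names <- identical(ref$names, x$names)
  structure(out, class = "data_comparison_sketch")
}

sketch_result <- compare_data_sketch(iris, iris[-1, -2])
unclass(sketch_result)  # dim and names are both FALSE
```

Passing `...` through the `data.frame` method is what lets options such as `use_dim` reach the structure method, which matches the way the usage lists them only once.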
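Argument | Description |
---|---|
ref | the reference data, either a `data.frame` or a `data_structure` |
x | for `compare_data()`, the new data to compare against `ref` (a `data.frame` or a `data_structure`); for `print()`, a `data_comparison` object |
... | further arguments passed to other methods |
use_dim | a `logical` indicating whether to compare dimensions (numbers of rows and columns); defaults to `TRUE` |
use_names | a `logical` indicating whether to compare column names; defaults to `TRUE` |
use_classes | a `logical` indicating whether to compare column classes; defaults to `TRUE` |
use_values | a `logical` indicating whether to compare the values of categorical variables; defaults to `TRUE` |
columns | the names or indices of columns to compare. Defaults to `TRUE`, in which case all columns are compared |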
common_values | when |
diff_only | when `TRUE` (the default), the printed output shows differences only |
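The `columns` argument accepts names or indices, with `TRUE` as the default. A minimal, hypothetical way to resolve such a specification in base R is sketched below (`select_columns_sketch()` is an invented helper). Subsetting before any test runs is consistent with the examples, where `compare_data(iris, iris2, columns = "Species")` reports the same number of columns even though `iris2` carries an extra `letter` column.

```r
# Hypothetical helper: resolve a `columns` specification.
# Assumption: TRUE keeps every column, a character vector selects by name,
# and a numeric vector selects by position.
select_columns_sketch <- function(df, columns = TRUE) {
  if (isTRUE(columns)) {
    return(df)
  }
  if (is.character(columns)) {
    unknown <- setdiff(columns, names(df))
    if (length(unknown) > 0) {
      stop("unknown columns: ", paste(unknown, collapse = ", "), call. = FALSE)
    }
  }
  # drop = FALSE keeps a data.frame even when a single column is selected
  df[, columns, drop = FALSE]
}

names(select_columns_sketch(iris, "Species"))  # "Species"
names(select_columns_sketch(iris, c(1, 5)))    # "Sepal.Length" "Species"
```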
An object of class `data_comparison`. This is a named list with one element for each test.
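To make the idea of "a named list with one element per test" concrete, the sketch below builds such a classed list in base R and pairs it with a `print()` method that writes one section per test. The function `compare_sketch()`, the class `comparison_list_sketch` and the `same`/`detail` fields are all hypothetical; they do not describe the actual contents of a `data_comparison` object.

```r
# Hypothetical sketch: one named element per test, each recording whether
# the inputs agree (`same`) plus some detail about the difference.
compare_sketch <- function(ref, x) {
  tests <- list(
    dim = list(same = identical(dim(ref), dim(x)),
               detail = c(ref = nrow(ref), new = nrow(x))),
    names = list(same = identical(names(ref), names(x)),
                 detail = setdiff(names(ref), names(x)))
  )
  structure(tests, class = "comparison_list_sketch")
}

# A print method writing one line per test.
print.comparison_list_sketch <- function(x, ...) {
  for (test in names(x)) {
    status <- if (x[[test]]$same) "same" else "different"
    cat("// Comparison of ", test, ": ", status, "\n", sep = "")
  }
  invisible(x)
}

cmp <- compare_sketch(iris, iris[-1, -2])
print(cmp)
vapply(cmp, function(t) t$same, logical(1))  # dim = FALSE, names = FALSE
```

Keeping the results as plain list elements means they can be queried programmatically, while the class only changes how the object is displayed.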
The comparison relies on checking differences in:

- dimensions (numbers of rows and columns)
- names of columns
- classes of the columns (only the first class is used)
- values of the categorical variables
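The checks listed above can be approximated in a few lines of base R. The sketch below is hypothetical (`compare_details_sketch()` is not part of the package): it reports missing and new column names, common columns whose first class differs, and missing and new values in columns that are categorical in both inputs, assuming "categorical" means character or factor. Applied to the example with `data.frame(Species = letters, stringsAsFactors = FALSE)`, it finds the same missing variables, class change and missing/new values of `Species` as the printed output.

```r
# Hypothetical sketch of the name, class and value checks.
# Assumption: categorical = character or factor; values compared as strings.
compare_details_sketch <- function(ref, x) {
  common <- intersect(names(ref), names(x))
  # first class of each common column, in the order of `common`
  first_class <- function(df) {
    vapply(df[common], function(col) class(col)[1], character(1))
  }
  changed <- common[first_class(ref) != first_class(x)]
  is_cat <- function(col) is.character(col) || is.factor(col)
  values <- list()
  for (v in common) {
    if (is_cat(ref[[v]]) && is_cat(x[[v]])) {
      ref_vals <- unique(as.character(ref[[v]]))
      x_vals <- unique(as.character(x[[v]]))
      values[[v]] <- list(missing = setdiff(ref_vals, x_vals),
                          new = setdiff(x_vals, ref_vals))
    }
  }
  list(
    missing_names = setdiff(names(ref), names(x)),
    new_names = setdiff(names(x), names(ref)),
    class_changes = changed,
    values = values
  )
}

details <- compare_details_sketch(
  iris, data.frame(Species = letters, stringsAsFactors = FALSE)
)
details$class_changes  # "Species"
```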
```r
## no differences
compare_data(iris, iris)
#>
#> /// Comparisons of data content //
#> (showing differences only)
#>
#>
#> // Comparison of dimensions /
#> Same number of rows and columns
#>
#> // Comparison of variable names /
#>
#> Same variable names, in the same order
#>
#> // Comparison of variable classes /
#>
#> Same variable classes
#>
#> // Comparison of values in categorical variables /
#>
#> Same values for categorical variables

## different dimensions
compare_data(iris, iris[-1, -2])
#>
#> /// Comparisons of data content //
#> (showing differences only)
#>
#>
#> // Comparison of dimensions /
#> * different numbers of rows: ref has 150, new data has 149
#> * different numbers of columns: ref has 5, new data has 4
#>
#>
#> // Comparison of variable names /
#>
#> * variables missing in the new data:
#> [1] "Sepal.Width"
#>
#>
#> // Comparison of variable classes /
#>
#>
#> // Comparison of values in categorical variables /
#>
#> Same values for categorical variables

compare_data(iris[-1, -2], iris) # inverse
#>
#> /// Comparisons of data content //
#> (showing differences only)
#>
#>
#> // Comparison of dimensions /
#> * different numbers of rows: ref has 149, new data has 150
#> * different numbers of columns: ref has 4, new data has 5
#>
#>
#> // Comparison of variable names /
#>
#> * new variables:
#> [1] "Sepal.Width"
#>
#>
#> // Comparison of variable classes /
#>
#>
#> // Comparison of values in categorical variables /
#>
#> Same values for categorical variables

## one variable in common but different class and content
compare_data(iris, data.frame(Species = letters, stringsAsFactors = FALSE))
#>
#> /// Comparisons of data content //
#> (showing differences only)
#>
#>
#> // Comparison of dimensions /
#> * different numbers of rows: ref has 150, new data has 26
#> * different numbers of columns: ref has 5, new data has 1
#>
#>
#> // Comparison of variable names /
#>
#> * variables missing in the new data:
#> [1] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width"
#>
#>
#> // Comparison of variable classes /
#> `Species` has changed from `factor` to `character`
#>
#>
#> // Comparison of values in categorical variables /
#>
#> * Missing values in `Species`:
#> [1] "setosa" "versicolor" "virginica"
#>
#> * New values in `Species`:
#> [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q" "r" "s"
#> [20] "t" "u" "v" "w" "x" "y" "z"
#>

## Comparing only specific columns
iris1 <- iris2 <- iris
iris1$letter <- sample(letters[1:3], nrow(iris), replace = TRUE)
iris2$letter <- sample(letters[1:8], nrow(iris), replace = TRUE)
compare_data(iris1, iris2, columns = "Species")
#>
#> /// Comparisons of data content //
#> (showing differences only)
#>
#>
#> // Comparison of dimensions /
#> Same number of rows and columns
#>
#> // Comparison of variable names /
#>
#> Same variable names, in the same order
#>
#> // Comparison of variable classes /
#>
#> Same variable classes
#>
#> // Comparison of values in categorical variables /
#>
#> Same values for categorical variables

compare_data(iris, iris2, columns = "Species")
#>
#> /// Comparisons of data content //
#> (showing differences only)
#>
#>
#> // Comparison of dimensions /
#> Same number of rows and columns
#>
#> // Comparison of variable names /
#>
#> Same variable names, in the same order
#>
#> // Comparison of variable classes /
#>
#> Same variable classes
#>
#> // Comparison of values in categorical variables /
#>
#> Same values for categorical variables

compare_data(iris, iris1)
#>
#> /// Comparisons of data content //
#> (showing differences only)
#>
#>
#> // Comparison of dimensions /
#> * different numbers of columns: ref has 5, new data has 6
#>
#>
#> // Comparison of variable names /
#>
#> * new variables:
#> [1] "letter"
#>
#>
#> // Comparison of variable classes /
#>
#>
#> // Comparison of values in categorical variables /
#>

compare_data(iris1, iris2)
#>
#> /// Comparisons of data content //
#> (showing differences only)
#>
#>
#> // Comparison of dimensions /
#> Same number of rows and columns
#>
#> // Comparison of variable names /
#>
#> Same variable names, in the same order
#>
#> // Comparison of variable classes /
#>
#> Same variable classes
#>
#> // Comparison of values in categorical variables /
#>
#> * New values in `letter`:
#> [1] "d" "e" "f" "g" "h"
#>
```