NEWS.md
* `clean_variable_spelling()` and `clean_spelling()` have been migrated to the {matchmaker} package, and arguments from the aforementioned functions are passed to the {matchmaker} functions. Tests and documentation have been updated to reflect this.
* `clean_variable_spelling()` and `clean_spelling()` gain the option to specify which columns contain the keys (`from`) and values (`to`). These default to 1 and 2, respectively, which ensures that backwards compatibility is retained (this fixes #99).
* `linelist_example()` is a new function that serves as an alias for `system.file("extdata", thing, package = "linelist")`, which is much easier for new R users to understand.
* `top_values()` no longer throws a spurious warning when the levels in the subset data are identical to the levels in the full data (#96).
* `top_values()` gains a new `subset` argument that allows the user to retain the top levels of a subset of a vector. This is particularly useful for retrospective analysis based on current trends (fixes #92 via #94 and #95, @thibautjombart).
* `top_values()` gains the explicit `ties.method` parameter, which defaults to "first" to fix issue #88 (thanks to @cwhittaker1000 for spotting the issue and providing a detailed explanation).
* `top_values()` issues a warning if one of the top values had a tied value that was not included.
* `top_values()` issues a warning if the user uses a `ties.method` that is not guaranteed to return exactly n top values.
* `clean_spelling()` gains the `anchor_regex` argument, which will wrap all regex keyword entries in `^` and `$` before processing.
* The linelist class and all associated epivars/dictionary functions have been removed as out of scope of this package. Without any validation, these functions were no more than a fancy wrapper to `dplyr::rename()`, thus they are being removed after fda9e18b02f5853cd311ddcc513c427244b21dd7. If the linelist class is resurrected (e.g. to implement a hxl validator package), it can be taken from that commit. This is related to #29.
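
The effect of `ties.method` on the number of top values can be illustrated with base R's `rank()`. This is a minimal sketch of the idea only, not the `top_values()` implementation; the data and variable names are made up for illustration.

```r
# Illustrative data: "b" and "c" are tied for second place
x <- c("a", "a", "a", "b", "b", "c", "c", "d")
tab  <- table(x)
freq <- setNames(as.integer(tab), names(tab))  # a = 3, b = 2, c = 2, d = 1
n <- 2

# "first" breaks ties by order of appearance, so exactly n values are kept
rank_first <- rank(-freq, ties.method = "first")
top_first  <- names(freq)[rank_first <= n]

# "min" gives tied values the same rank, so more than n values may be kept
rank_min <- rank(-freq, ties.method = "min")
top_min  <- names(freq)[rank_min <= n]

print(top_first)
print(top_min)
```

With "first", the tied value "c" is left out of the top two, which is the situation the tie warning above is meant to flag.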
* `clean_spelling()` now gains the `.regex` keyword that allows the user to supply perl-style regular expressions to change words that may have similar spelling.
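
As a rough sketch of what perl-style matching with and without anchoring looks like, the base R snippet below uses a hypothetical helper, `replace_regex()`, which is not part of linelist or {matchmaker}. The `anchor` argument is an assumption standing in for the behaviour described for `anchor_regex`.

```r
# Hypothetical helper for illustration only
replace_regex <- function(x, pattern, value, anchor = TRUE) {
  # Optionally wrap the pattern in "^" and "$" so it must match the whole string
  if (anchor) pattern <- paste0("^", pattern, "$")
  x[grepl(pattern, x, perl = TRUE)] <- value
  x
}

x <- c("mild", "milld", "mild case", "severe")

# Anchored: only entries that consist entirely of the pattern are replaced
anchored <- replace_regex(x, "mil+d", "mild")

# Unanchored: any entry containing the pattern is replaced
unanchored <- replace_regex(x, "mil+d", "mild", anchor = FALSE)

print(anchored)
print(unanchored)
```

Anchoring keeps "mild case" intact, whereas the unanchored pattern would overwrite it.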
* `guess_dates()` now processes at double the speed of the previous version.
* `guess_dates()` will now properly constrain date vectors to the start and end dates.
* `guess_dates()` correctly parses dates represented as integers from Excel (#73).
* `print.data_comparison()` now sets `diff_only = TRUE` by default (#71).
* `compare_data()` gains the option `columns`, which allows users to choose which columns they want to compare. Defaults to `TRUE`, which compares all columns (#58).
* `guess_dates()` can now handle dates that were imported from Excel as integers (#66).
* `guess_dates()` gains the argument `modern_excel` to indicate how integer dates should be formatted.
* `getOption("linelist_guess_orders")` replaces the explicit list of orders in `guess_dates()` for easier access.
* `guess_dates()` no longer throws an error if passed a date class object (#65).
* `guess_dates()` has been better documented to reflect the above changes (#64).
* `clean_spelling()` gains a new keyword: `.na` (or should I say "valueword"). When this keyword is in the values (second) column of the wordlist, the keys will be replaced with a missing (`<NA>`) value. This is useful for contrasting between presence of an absence and an absence of a presence with the `.missing` keyword. See #55 and #57 for details.
* `print.data_comparison()` gains the logical arguments `common_values` and `diff_only` to control the length of print output (see #61).
* `compare_data()` now correctly accounts for different values in variables. Thanks to @ffinger for finding the bug (#56).
* `compare_data()` now returns a list of variable classes instead of `TRUE` if the classes match (see #53 for details).
* `clean_variable_spelling()` will now run global variables before processing named variables instead of in tandem. This allows the user to define misspellings in the `.global` variable. See https://github.com/reconhub/linelist/issues/51 for details.
* `clean_spelling()` will no longer throw a warning if there is no value for `.default` to replace.
* `clean_variable_spelling()`, `clean_variables()`, and `clean_data()` gain the `warn` and `warn_spelling` arguments, which will capture all errors and warnings issued from `clean_spelling()` for each variable. See https://github.com/reconhub/linelist/pull/48 for details.
* `compare_data()` allows users to compare structural changes to data frames. This includes names, classes, dimensions, and values in matching categorical variables. (See https://github.com/reconhub/linelist/pull/50 for details.)
* `top_values()` will mask all but the top n values in a factor.
* The crayon package is added to imports.
* `clean_spelling()` wordlists now allow the optional `.missing` keyword to replace both `NA` and blank (`""`) cells in the data. Values that are `NA` will be converted to `"NA"` (character) with a warning. See https://github.com/reconhub/linelist/pull/44 and https://github.com/reconhub/linelist/pull/45 for details.
* `guess_dates()` can once again parse date formats that are file names: `example_format_2019-02-19.xlsx`. (See #43 for details.)
* `clean_spelling()` gains a `quiet` argument to suppress warnings.
* `clean_variable_spelling()` will no longer error if there are variable specifications that don't exist in the data. It will also suppress all warnings from `clean_spelling()` (see #41 for details).
* `clean_spelling()` will check the spelling of a vector against a wordlist.
* `clean_variable_spelling()` will apply `clean_spelling()` to all specified columns in a data frame.
* `clean_variables()` wraps `clean_variable_labels()` and `clean_variable_spelling()`.
* `clean_data()` can now optionally check labels against a wordlist (see #38 for details).
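
The wordlist behaviour described in the entries above (key and value columns, the `.missing` keyword, and the `.na` value) can be sketched in base R. The helper `apply_wordlist()` below is hypothetical and only approximates the semantics stated in this changelog; it is not the linelist or {matchmaker} implementation, and the example wordlist is made up.

```r
# Hypothetical helper for illustration only
apply_wordlist <- function(x, wordlist, from = 1, to = 2) {
  keys   <- wordlist[[from]]
  values <- wordlist[[to]]
  out    <- as.character(x)

  # .missing: replace both NA and blank ("") cells
  if (".missing" %in% keys) {
    out[is.na(out) | out == ""] <- values[keys == ".missing"]
  }

  # Ordinary keys: replace exact matches with their values
  hit   <- match(out, keys)
  found <- !is.na(hit)
  out[found] <- values[hit[found]]

  # .na in the values column: turn the matched keys into missing values
  out[!is.na(out) & out == ".na"] <- NA_character_
  out
}

wordlist <- data.frame(
  key   = c("y", "n", "unk", ".missing"),
  value = c("yes", "no", ".na", "not recorded"),
  stringsAsFactors = FALSE
)

x <- c("y", "n", "unk", "", NA, "yes")
cleaned <- apply_wordlist(x, wordlist)
print(cleaned)
```

Here "unk" becomes a true missing value, while cells that were blank or `NA` in the input are labelled "not recorded", which is the contrast the `.na` and `.missing` keywords are meant to express.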
* `mask()` will temporarily replace column names with epivars.
* `unmask()` reverses the effect of mask.
* The `geo` epivar was replaced with `geo_lat` and `geo_lon` (see #35).
* `lookup()` function can look up the column name corresponding to an epivar (see #28).
* `add_epivars()` adds epivars to the global dictionary.
* `add_description()` updates the description of one of the epivars (see #26).
* `template_linelist()` function (see #24).
* `get_vars()` can take multiple variables (see #15).
* `guess_dates()` now throws an appropriate error if a vector is passed instead of a data frame. See https://github.com/reconhub/linelist/issues/4 for details.