The convert argument in write_vc() and
read_vc() allows you to apply transformations to data
columns during the write and read operations. This is useful when you
want to store data types that git2rdata doesn’t support.
The only requirement is that some R package provides two functions that
do the transformation. The first function converts the unsupported
data type into a supported data type. The second function converts
the supported data type back into the original unsupported data type.
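The convert argument is a named list where each name is a column of the data and each value is a character vector with the elements write and read. Both elements name a function in the "package::function" format.
A simple example is converting text to uppercase for storage while keeping it lowercase in R: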
# Create sample data
data <- data.frame(
  id = 1:3,
  name = c("alice", "bob", "charlie"),
  stringsAsFactors = FALSE
)
# Write with case conversion
write_vc(
  data,
  file = "people",
  root = root,
  sorting = "id",
  convert = list(
    name = c(
      write = "base::toupper", # Convert to uppercase when writing
      read = "base::tolower" # Convert to lowercase when reading
    )
  )
)
## 766b5ac81e1dd8ac12c46ab0f765de87fa05b465
## "people.tsv"
## d4e04e976482cb8bfd5978f21be4ec353bf156a0
## "people.yml"
The stored file contains the names in uppercase:
# Check the raw file content
raw_content <- readLines(file.path(root, "people.tsv"))
cat(raw_content, sep = "\n")
## id name
## 1 ALICE
## 2 BOB
## 3 CHARLIE
When reading the data back, the conversion is automatically applied:
## id name
## 1 1 alice
## 2 2 bob
## 3 3 charlie
##
## Use `display_metadata()` to view the metadata.
## $name
## [1] "base::toupper" "base::tolower"
You can apply conversions to multiple columns:
data2 <- data.frame(
  id = 1:2,
  first_name = c("alice", "bob"),
  last_name = c("smith", "jones"),
  stringsAsFactors = FALSE
)
write_vc(
  data2,
  file = "names",
  root = root,
  sorting = "id",
  convert = list(
    first_name = c(write = "base::toupper", read = "base::tolower"),
    last_name = c(write = "base::toupper", read = "base::tolower")
  )
)
## be646a79460482c4df21cbdbf3d1395140015240
## "names.tsv"
## 65167961915d41cf81854bd4f715e61ee96b3183
## "names.yml"
## id first_name last_name
## 1 1 alice smith
## 2 2 bob jones
##
## Use `display_metadata()` to view the metadata.
git2rdata doesn’t support 64-bit integers. You
can store them by converting them to character strings.
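As a rough illustration of the round trip such a conversion pair has to perform, the sketch below uses the bit64 package. That choice is an assumption, not something the text above prescribes. The code only exercises the two conversion steps in plain R, and the convert specification in the final comment is a hypothetical, untested pairing.

```r
# Sketch only: assumes the bit64 package is installed (an assumption,
# the surrounding text does not name a package).
if (requireNamespace("bit64", quietly = TRUE)) {
  # A value just above 2^53, which a double cannot represent exactly
  big <- bit64::as.integer64("9007199254740993")

  # Write direction: integer64 -> character, a type that can be stored.
  # as.character() dispatches to the bit64 method once bit64 is loaded.
  stored <- as.character(big)

  # Read direction: character -> integer64
  restored <- bit64::as.integer64(stored)

  # The pair is only usable if the round trip is lossless
  round_trip_ok <- is.character(stored) && all(restored == big)
}

# Hypothetical convert specification for a column named `big`:
# convert = list(
#   big = c(write = "base::as.character", read = "bit64::as.integer64")
# )
```

Whether `base::as.character` is the right write function depends on how the conversion is dispatched inside write_vc(), so check the result on a small data set before relying on it.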
Convert numeric data to a more compact string representation:
Package availability: All packages referenced in
the convert argument must be available when calling
write_vc() and read_vc(). The function checks
for package availability at read and write time.
Function validation: The function validates that the specified functions exist in the specified packages.
Metadata storage: Conversion specifications are
stored in the metadata YAML file, ensuring that read_vc()
knows how to reverse the transformations.
Strict mode: When updating existing files,
changes to the convert argument are detected by
compare_meta() and will trigger an error in strict mode or
a warning in non-strict mode.
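To make the availability and validation checks above more concrete, here is a minimal sketch of how a "package::function" string could be resolved in base R. The helper `resolve_spec()` is hypothetical and is not git2rdata's actual implementation. It only shows the kind of checks described: the package must be available and must export a function with that name.

```r
# Hypothetical helper, not part of git2rdata: turn a "package::function"
# string into the function it names, failing early when that is impossible.
resolve_spec <- function(spec) {
  parts <- strsplit(spec, "::", fixed = TRUE)[[1]]
  if (length(parts) != 2 || any(parts == "")) {
    stop("expected 'package::function', got: ", spec)
  }
  # Package availability check
  if (!requireNamespace(parts[1], quietly = TRUE)) {
    stop("package not available: ", parts[1])
  }
  # Function validation check: the name must be an exported function
  fun <- tryCatch(
    getExportedValue(parts[1], parts[2]),
    error = function(e) NULL
  )
  if (!is.function(fun)) {
    stop("no exported function '", parts[2], "' in package ", parts[1])
  }
  fun
}

to_upper <- resolve_spec("base::toupper")
to_upper("alice")
```

A string without the `package::` prefix, such as `"toupper"`, is rejected by this sketch. That mirrors the restriction that functions from the global environment are not accepted.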
The convert argument only accepts functions in the
package::function format. Anonymous functions or functions
from the global environment are not supported.
Conversions must be reversible. The read function
should be able to restore the original data from the converted
form.
The conversion is applied before meta() processes
the data, so optimizations (like factor encoding) work on the converted
data.
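The reversibility requirement above can be checked on a sample of your data before you rely on a conversion pair. The helper `is_reversible()` below is a hypothetical name used for illustration only. The sketch shows that the uppercase/lowercase pair from the earlier example is reversible only when the original text is already lowercase.

```r
# Hypothetical helper: TRUE when read(write(x)) reproduces x exactly.
is_reversible <- function(x, write, read) {
  identical(read(write(x)), x)
}

lower_names <- c("alice", "bob", "charlie")
mixed_names <- c("Alice", "Bob", "Charlie")

# Lowercase input survives the round trip
lower_ok <- is_reversible(lower_names, base::toupper, base::tolower)

# Mixed-case input does not: "Alice" comes back as "alice"
mixed_ok <- is_reversible(mixed_names, base::toupper, base::tolower)

print(c(lower = lower_ok, mixed = mixed_ok))
```

A pair that fails this check still writes and reads without error, but the data returned by read_vc() would differ from what was written.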