Type: | Package |
Title: | A Comprehensive Toolkit for Working with Encrypted Parquet Files |
Version: | 0.1.0 |
Description: | Utilities for reading, writing, and managing RCDF files, including encryption and decryption support. It offers a flexible interface for handling data stored in encrypted Parquet format, along with metadata extraction, key management, and secure operations using Advanced Encryption Standard (AES) and Rivest-Shamir-Adleman (RSA) encryption. |
Author: | Bhas Abdulsamad |
Maintainer: | Bhas Abdulsamad <aeabdulsamad@gmail.com> |
License: | MIT + file LICENSE |
Encoding: | UTF-8 |
Imports: | arrow, duckdb, haven, openxlsx, fs, zip, glue, utils (≥ 4.0.0), openssl (≥ 2.1.1), dplyr (≥ 1.1.0), stringr (≥ 1.4.0), jsonlite (≥ 1.8.0), DBI (≥ 1.1.0), RSQLite (≥ 2.2.0), uuid (≥ 0.1.2), methods |
Suggests: | rlang (≥ 1.0.2), testthat (≥ 3.0.0), cli, devtools, knitr, rmarkdown, mockery, tibble, withr, gt (≥ 0.10.0) |
Config/testthat/edition: | 3 |
RoxygenNote: | 7.2.3 |
BugReports: | https://github.com/yng-me/rcdf/issues |
VignetteBuilder: | knitr |
Depends: | R (≥ 4.1.0) |
URL: | https://yng-me.github.io/rcdf/ |
NeedsCompilation: | no |
Packaged: | 2025-08-23 05:43:52 UTC; bhasabdulsamad |
Repository: | CRAN |
Date/Publication: | 2025-08-28 08:50:02 UTC |
Add metadata attributes to a data frame
Description
Adds variable labels and value labels to a data frame based on a metadata dictionary. This is particularly useful for preparing datasets for use with packages like 'haven' or for exporting to formats like SPSS or Stata.
Usage
add_metadata(data, metadata, ..., set_data_types = FALSE)
Arguments
data |
A data frame containing the raw dataset. |
metadata |
A data frame that serves as a metadata dictionary. It must contain at least the columns: '"variable_name"', '"label"', and '"type"'. Optionally, it may include a '"valueset"' column for categorical variables, which should be a list column with data frames containing '"value"' and '"label"' columns. |
... |
Additional arguments (currently unused). |
set_data_types |
Logical; if 'TRUE', attempts to coerce column data types to match those implied by the metadata. (Note: currently not fully implemented.) |
Details
The function first checks the structure of 'metadata' using an internal helper. Then, for each variable listed in 'metadata', it:
- adds a variable label, stored in the '"label"' attribute;
- converts the values to a labelled vector using 'haven::labelled()' if a 'valueset' is provided.
If value labels are present, the function tries to align data types between the data and the valueset (e.g., converting character codes to integers if necessary).
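The steps above are easier to picture on a single column. The following is a minimal base-R sketch, not the package's implementation. 'label_column()' is a hypothetical helper. It sets the '"label"' and '"labels"' attributes directly instead of calling 'haven::labelled()'; these are the attribute names 'haven' uses for variable and value labels. As a result, the output carries no 'haven_labelled' class.

```r
# Hypothetical helper (not part of rcdf): attach a variable label and,
# optionally, value labels to one column using plain attributes.
label_column <- function(x, label, valueset = NULL) {
  if (!is.null(valueset)) {
    # Align types: character codes such as "1" become numeric when the
    # valueset stores its codes as numbers.
    if (is.character(x) && is.numeric(valueset$value)) {
      x <- as.numeric(x)
    }
    # Named vector: the values are the codes, the names are the labels.
    value_labels <- valueset$value
    names(value_labels) <- valueset$label
    attr(x, "labels") <- value_labels
  }
  attr(x, "label") <- label
  x
}

sex <- label_column(
  c("1", "2", "1"),
  label = "Gender",
  valueset = data.frame(value = c(1, 2), label = c("Male", "Female"))
)
age <- label_column(c(23, 45, 34), label = "Age in years")
attributes(sex)
```

The type-alignment step is why 'sex' comes back numeric even though its codes were supplied as text.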
Value
A 'tibble' with the same data as 'data', but with added attributes:
- variable labels (via the '"label"' attribute);
- value labels (as a 'haven::labelled' class, if applicable).
Examples
data <- data.frame(
sex = c(1, 2, 1),
age = c(23, 45, 34)
)
metadata <- data.frame(
variable_name = c("sex", "age"),
label = c("Gender", "Age in years"),
type = c("categorical", "numeric"),
valueset = I(list(
data.frame(value = c(1, 2), label = c("Male", "Female")),
NULL
))
)
labelled_data <- add_metadata(data, metadata)
str(labelled_data)
Convert to 'rcdf' class
Description
Converts an existing list or compatible object into an object of class '"rcdf"'.
Usage
as_rcdf(data)
Arguments
data |
A list or object to be converted to class '"rcdf"'. |
Value
The input object with class set to '"rcdf"'.
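Setting the class is what allows S3 generics to dispatch on these objects. The sketch below illustrates the idea with a hypothetical constructor, 'as_rcdf_sketch()', which mirrors the documented behaviour of setting the class to '"rcdf"'. It also defines a hypothetical generic, 'n_tables()'. Neither is part of the package.

```r
# Hypothetical constructor mirroring the documented behaviour:
# the input list is returned with its class set to "rcdf".
as_rcdf_sketch <- function(data) {
  stopifnot(is.list(data))
  class(data) <- "rcdf"
  data
}

# Hypothetical S3 generic with a method for the "rcdf" class.
n_tables <- function(x, ...) UseMethod("n_tables")
n_tables.rcdf <- function(x, ...) length(unclass(x))
n_tables.default <- function(x, ...) stop("`x` is not an rcdf object")

obj <- as_rcdf_sketch(list(mtcars = mtcars, iris = iris))
class(obj)
n_tables(obj)
```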
Examples
my_list <- list(a = 1, b = 2)
rcdf_obj <- as_rcdf(my_list)
class(rcdf_obj)
Create an empty 'rcdf' object
Description
Initializes and returns an empty 'rcdf' object. This is a convenient constructor for creating a new 'rcdf'-class list structure.
Usage
rcdf_list()
Value
A list object of class '"rcdf"'.
Examples
rcdf <- rcdf_list()
class(rcdf)
Read environment variables from a file
Description
Reads a '.env' file containing environment variables in the format 'KEY=VALUE', and returns them as a named list. Lines starting with '#' are considered comments and ignored. The function also removes quotes ('"') around values if present.
Usage
read_env(path)
Arguments
path |
A string specifying the path to the '.env' file. If not provided, defaults to '.env' in the current working directory. |
Value
A named list of environment variables. Each element is a key-value pair extracted from the file. If no variables are found, 'NULL' is returned.
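The parsing rules above can be sketched in a few lines of base R. This is an illustration, not the package's code. 'parse_env_lines()' is a hypothetical helper that takes lines of text rather than a path. It assumes that only the first '=' on a line separates the key from the value, and that blank lines are skipped.

```r
# Hypothetical parser for KEY=VALUE lines (not the rcdf implementation).
parse_env_lines <- function(lines) {
  lines <- trimws(lines)
  # Drop blank lines and comment lines starting with '#'.
  lines <- lines[nzchar(lines) & !startsWith(lines, "#")]
  # Keep only lines that contain an '=' separator.
  lines <- lines[grepl("=", lines, fixed = TRUE)]
  if (length(lines) == 0) return(NULL)

  # Split on the first '=' only, so values may themselves contain '='.
  eq <- as.integer(regexpr("=", lines, fixed = TRUE))
  keys <- trimws(substr(lines, 1, eq - 1))
  values <- trimws(substr(lines, eq + 1, nchar(lines)))
  # Strip one pair of surrounding double quotes, e.g. "secret" -> secret.
  values <- sub('^"(.*)"$', "\\1", values)

  out <- as.list(values)
  names(out) <- keys
  out
}
```

To parse a file with this sketch, pass it 'readLines(".env", warn = FALSE)'.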
Examples
## Not run:
# Assuming an `.env` file with the following content:
# DB_HOST=localhost
# DB_USER=root
# DB_PASS="secret"
env_vars <- read_env(".env")
print(env_vars)
# Should output something like:
# $DB_HOST
# [1] "localhost"
# If no path is given, it defaults to `.env` in the current directory.
env_vars <- read_env()
## End(Not run)
Read Parquet file with optional decryption
Description
This function reads a Parquet file, optionally decrypting it using the provided decryption key. If no decryption key is provided, it reads the file normally without decryption. It supports reading Parquet files as Arrow tables or regular data frames, depending on the 'as_arrow_table' argument.
Usage
read_parquet(path, ..., decryption_key = NULL, as_arrow_table = TRUE)
Arguments
path |
The file path to the Parquet file. |
... |
Additional arguments passed to 'arrow::open_dataset()' when no decryption key is provided. |
decryption_key |
A list containing 'aes_key' and 'aes_iv'. If provided, the Parquet file will be decrypted using these keys. Default is 'NULL'. |
as_arrow_table |
Logical. If 'TRUE', the function will return the result as an Arrow table. If 'FALSE', a regular data frame will be returned. Default is 'TRUE'. |
Value
An Arrow table or a data frame, depending on the value of 'as_arrow_table'.
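In the example below, 'aes_key' is a 64-character hexadecimal string and 'aes_iv' a 32-character one. Suppose these are hex encodings of a 256-bit key and a 128-bit IV; the documentation does not state the encoding, so this is an assumption. Under that assumption, a quick pre-flight check can catch truncated or mistyped keys before decryption fails. Both helpers below are hypothetical.

```r
# Hypothetical helper: decode a hex string into raw bytes.
hex_to_raw <- function(hex) {
  stopifnot(
    is.character(hex), length(hex) == 1,
    nchar(hex) %% 2 == 0, grepl("^[0-9A-Fa-f]+$", hex)
  )
  starts <- seq(1, nchar(hex), by = 2)
  as.raw(strtoi(substring(hex, starts, starts + 1), base = 16L))
}

# Hypothetical check of a decryption_key list before it is passed on.
# Assumes hex-encoded AES key (16, 24 or 32 bytes) and a 16-byte IV.
check_aes_key <- function(key) {
  stopifnot(is.list(key), all(c("aes_key", "aes_iv") %in% names(key)))
  key_bytes <- hex_to_raw(key$aes_key)
  iv_bytes <- hex_to_raw(key$aes_iv)
  if (!(length(key_bytes) %in% c(16L, 24L, 32L))) {
    stop("aes_key must decode to 16, 24 or 32 bytes")
  }
  if (length(iv_bytes) != 16L) {
    stop("aes_iv must decode to 16 bytes")
  }
  invisible(TRUE)
}

check_aes_key(list(
  aes_key = "5bddd0ea4ab48ed5e33b1406180d68158aa255cf3f368bdd4744abc1a7909ead",
  aes_iv = "7D3EF463F4CCD81B11B6EC3230327B2D"
))
```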
Examples
# Using sample Parquet files from `mtcars` dataset
dir <- system.file("extdata", package = "rcdf")
# Without decryption
df <- read_parquet(file.path(dir, "mtcars.parquet"))
df
# With decryption
decryption_key <- list(
aes_key = "5bddd0ea4ab48ed5e33b1406180d68158aa255cf3f368bdd4744abc1a7909ead",
aes_iv = "7D3EF463F4CCD81B11B6EC3230327B2D"
)
df_with_encryption <- read_parquet(
file.path(dir, "mtcars-encrypted.parquet"),
decryption_key = decryption_key
)
df_with_encryption
Read and decrypt RCDF data
Description
This function reads an RCDF (Reusable Data Container Format) archive, decrypts its contents using the specified decryption key, and loads it into R as an RCDF object. The data files in the archive (usually Parquet files) are decrypted. If metadata, such as a data dictionary and value sets, is provided, it is applied to the data.
Usage
read_rcdf(path, decryption_key, ..., password = NULL, metadata = NULL)
Arguments
path |
A string specifying the path to the RCDF archive (zip file). |
decryption_key |
The key used to decrypt the RCDF contents. This can be an RSA or AES key, depending on how the RCDF was encrypted. |
... |
Additional parameters passed to other functions, if needed. |
password |
Password for an encrypted (password-protected) RSA private key. Optional; only needed when the private key is password-protected. |
metadata |
An optional metadata object containing data dictionaries and value sets. This metadata is applied to the data if provided. |
Value
An RCDF object: a list of tables, one per record, read from the decrypted Parquet files, with the metadata attached.
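The 'password' argument is only needed when the RSA private key file is itself encrypted. The sketch below uses the 'openssl' package, which this package already imports, to create such a key and to show that the password is required to load it again. It demonstrates the key-handling concept only and does not call 'read_rcdf()'; the file name and password are arbitrary.

```r
library(openssl)

# Throwaway 2048-bit RSA key pair.
key <- rsa_keygen(bits = 2048)

# Save the private key as a password-protected PEM file.
key_path <- tempfile(fileext = ".pem")
write_pem(key, path = key_path, password = "1234")

# Loading the key back requires the same password.
key_back <- read_key(key_path, password = "1234")

# Data encrypted with the public key can be decrypted with the reloaded key.
secret <- rand_bytes(32)
cipher <- rsa_encrypt(secret, key$pubkey)
recovered <- rsa_decrypt(cipher, key_back)

unlink(key_path)
```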
Examples
dir <- system.file("extdata", package = "rcdf")
rcdf_path <- file.path(dir, 'mtcars.rcdf')
private_key <- file.path(dir, 'sample-private-key.pem')
rcdf_data <- read_rcdf(path = rcdf_path, decryption_key = private_key)
rcdf_data
# Using encrypted/password protected private key
rcdf_path_pw <- file.path(dir, 'mtcars-pw.rcdf')
private_key_pw <- file.path(dir, 'sample-private-key-pw.pem')
pw <- '1234'
rcdf_data_with_pw <- read_rcdf(
path = rcdf_path_pw,
decryption_key = private_key_pw,
password = pw
)
rcdf_data_with_pw
Write Parquet file with optional encryption
Description
This function writes a dataset to a Parquet file. If an encryption key is provided, the data will be encrypted before writing. Otherwise, the function writes the data as a regular Parquet file without encryption.
Usage
write_parquet(data, path, ..., encryption_key = NULL)
Arguments
data |
A data frame or tibble to write to a Parquet file. |
path |
The file path where the Parquet file will be written. |
... |
Additional arguments passed to 'arrow::write_parquet()' if no encryption key is provided. |
encryption_key |
A list containing 'aes_key' and 'aes_iv'. If provided, the data will be encrypted using AES before writing to Parquet. |
Value
None. The function writes the data to a Parquet file at the specified 'path'.
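The examples hard-code a published key and IV, which is fine for a demonstration but not for real data. The following is a minimal sketch for generating fresh values in the same hexadecimal form. It assumes that any 64-character hex key and 32-character hex IV are accepted; the exact format requirements are not documented here. 'new_encryption_key()' is hypothetical and relies on the cryptographically secure 'openssl::rand_bytes()'.

```r
library(openssl)

# Hypothetical helper: a random AES-256 key and 128-bit IV,
# hex-encoded like the key and IV in the example.
new_encryption_key <- function() {
  to_hex <- function(bytes) paste(as.character(bytes), collapse = "")
  list(
    aes_key = to_hex(rand_bytes(32)),  # 32 bytes -> 64 hex characters
    aes_iv  = to_hex(rand_bytes(16))   # 16 bytes -> 32 hex characters
  )
}

key <- new_encryption_key()
str(key)
```

Store the returned list safely. The same values must be passed as 'decryption_key' to 'read_parquet()' to read the file back.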
Examples
data <- mtcars
key <- "5bddd0ea4ab48ed5e33b1406180d68158aa255cf3f368bdd4744abc1a7909ead"
iv <- "7D3EF463F4CCD81B11B6EC3230327B2D"
temp_dir <- tempdir()
rcdf::write_parquet(
data = data,
path = file.path(temp_dir, "mtcars.parquet"),
encryption_key = list(aes_key = key, aes_iv = iv)
)
unlink(file.path(temp_dir, "mtcars.parquet"), force = TRUE)
Write data to RCDF format
Description
This function writes data to an RCDF (Reusable Data Container Format) archive. It encrypts the data with AES, encrypts the AES keys with the RSA public key given in 'pub_key', and generates metadata. It then bundles the encrypted Parquet files and the metadata into a zip archive. The metadata can include items such as system information and the RSA-encrypted AES keys.
Usage
write_rcdf(data, path, pub_key, ..., metadata = list())
Arguments
data |
A list of data frames or tables to be written to RCDF format. Each element of the list represents a record. |
path |
The path where the RCDF file will be written. The file will be saved with a '.rcdf' extension if not already specified. |
pub_key |
The public RSA key used to encrypt the AES encryption keys. |
... |
Additional arguments passed to helper functions if needed. |
metadata |
A list of metadata to be included in the RCDF file. Can contain system information or other relevant details. |
Value
NULL. The function writes the data to a '.rcdf' file at the specified path.
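The scheme described above is hybrid encryption. The bulk data is encrypted with a symmetric AES key, and only that short key is encrypted with the recipient's RSA public key. The sketch below shows the round trip with the 'openssl' package, under a few simplifications. Serialized R bytes stand in for the Parquet files. CBC mode is an assumption, since the mode rcdf uses is not documented here. No zip archive or metadata is produced.

```r
library(openssl)

# Recipient's RSA key pair (in practice, loaded from PEM files).
rsa_key <- rsa_keygen(bits = 2048)
pub_key <- rsa_key$pubkey

# 1. Serialize a table to bytes (stand-in for a Parquet file).
payload <- serialize(mtcars, connection = NULL)

# 2. Encrypt the bytes with a fresh random AES-256 key (CBC mode assumed).
aes_key <- rand_bytes(32)
iv <- rand_bytes(16)
encrypted <- aes_cbc_encrypt(payload, key = aes_key, iv = iv)

# 3. Encrypt ("wrap") the AES key with the recipient's RSA public key.
wrapped_key <- rsa_encrypt(aes_key, pub_key)

# Receiving side: unwrap the AES key with the private key, then decrypt.
aes_key2 <- rsa_decrypt(wrapped_key, rsa_key)
decrypted <- aes_cbc_decrypt(encrypted, key = aes_key2, iv = iv)
restored <- unserialize(decrypted)
```

RSA can only encrypt inputs smaller than its key size. Wrapping a 32-byte AES key, rather than encrypting the data itself with RSA, is what makes large files practical.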
Examples
# Example usage of writing an RCDF file
rcdf_data <- rcdf_list()
rcdf_data$mtcars <- mtcars
dir <- system.file("extdata", package = "rcdf")
temp_dir <- tempdir()
write_rcdf(
data = rcdf_data,
path = file.path(temp_dir, "mtcars.rcdf"),
pub_key = file.path(dir, 'sample-public-key.pem')
)
write_rcdf(
data = rcdf_data,
path = file.path(temp_dir, "mtcars-pw.rcdf"),
pub_key = file.path(dir, 'sample-public-key-pw.pem')
)
unlink(file.path(temp_dir, "mtcars.rcdf"), force = TRUE)
unlink(file.path(temp_dir, "mtcars-pw.rcdf"), force = TRUE)
Write RCDF data to multiple formats
Description
Exports RCDF-formatted data to one or more supported open data formats. The function automatically dispatches to the appropriate writer function based on the 'formats' provided.
Usage
write_rcdf_as(data, path, formats, ...)
Arguments
data |
A named list or RCDF object. Each element should be a table or tibble-like object (typically a 'dbplyr' or 'dplyr' table). |
path |
The target directory where output files should be saved. |
formats |
A character vector of file formats to export to. Supported formats include: '"csv"', '"tsv"', '"json"', '"parquet"', '"xlsx"', '"dta"', '"sav"', and '"sqlite"'. |
... |
Additional arguments passed to the respective writer functions. |
Value
Invisibly returns 'NULL'. Files are written to disk.
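The dispatch described above can be sketched as a lookup table that maps each format name to a writer function. This hypothetical base-R version makes several simplifications. Only '"csv"' and '"tsv"' are implemented. Each format goes into its own subdirectory, which is an assumed layout and not necessarily the package's. TSV files use the '.txt' extension, as 'write_rcdf_tsv()' does.

```r
# Hypothetical dispatcher: one writer function per supported format.
write_tables_as <- function(data, path, formats) {
  writers <- list(
    csv = function(df, file) utils::write.csv(df, file, row.names = FALSE),
    tsv = function(df, file) {
      utils::write.table(df, file, sep = "\t", row.names = FALSE, quote = FALSE)
    }
  )
  extensions <- c(csv = "csv", tsv = "txt")

  # Fail early on formats without a writer.
  unknown <- setdiff(formats, names(writers))
  if (length(unknown) > 0) {
    stop("Unsupported format(s): ", paste(unknown, collapse = ", "))
  }

  written <- character(0)
  for (fmt in formats) {
    out_dir <- file.path(path, fmt)
    dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)
    # One file per table, named after the list element.
    for (name in names(data)) {
      file <- file.path(out_dir, paste0(name, ".", extensions[[fmt]]))
      writers[[fmt]](as.data.frame(data[[name]]), file)
      written <- c(written, file)
    }
  }
  invisible(written)
}

out_dir <- file.path(tempdir(), "export-sketch")
files <- write_tables_as(list(mtcars = mtcars, iris = iris), out_dir, c("csv", "tsv"))
basename(files)
```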
See Also
write_rcdf_csv write_rcdf_tsv write_rcdf_json write_rcdf_parquet write_rcdf_xlsx write_rcdf_dta write_rcdf_sav write_rcdf_sqlite
Examples
dir <- system.file("extdata", package = "rcdf")
rcdf_path <- file.path(dir, 'mtcars.rcdf')
private_key <- file.path(dir, 'sample-private-key.pem')
rcdf_data <- read_rcdf(path = rcdf_path, decryption_key = private_key)
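temp_dir <- file.path(tempdir(), "rcdf-export")
dir.create(temp_dir, showWarnings = FALSE)
write_rcdf_as(data = rcdf_data, path = temp_dir, formats = c("csv", "xlsx"))
# recursive = TRUE is required for unlink() to remove a directory
unlink(temp_dir, recursive = TRUE, force = TRUE)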
Write RCDF data to CSV files
Description
Writes each table in the RCDF object as a separate '.csv' file.
Usage
write_rcdf_csv(data, path, ..., parent_dir = NULL)
Arguments
data |
A valid RCDF object. |
path |
The base output directory. |
... |
Additional arguments passed to 'write.csv()'. |
parent_dir |
Optional subdirectory under 'path' to group CSV files. |
Value
Invisibly returns 'NULL'. Files are written to disk.
See Also
Examples
dir <- system.file("extdata", package = "rcdf")
rcdf_path <- file.path(dir, 'mtcars.rcdf')
private_key <- file.path(dir, 'sample-private-key.pem')
rcdf_data <- read_rcdf(path = rcdf_path, decryption_key = private_key)
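temp_dir <- file.path(tempdir(), "rcdf-export")
dir.create(temp_dir, showWarnings = FALSE)
write_rcdf_csv(data = rcdf_data, path = temp_dir)
# recursive = TRUE is required for unlink() to remove a directory
unlink(temp_dir, recursive = TRUE, force = TRUE)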
Write RCDF data to Stata '.dta' files
Description
Writes each table in the RCDF object to a '.dta' file for use in Stata.
Usage
write_rcdf_dta(data, path, ..., parent_dir = NULL)
Arguments
data |
A valid RCDF object. |
path |
Output directory for files. |
... |
Additional arguments passed to 'foreign::write.dta()'. |
parent_dir |
Optional subdirectory under 'path' to group Stata files. |
Value
Invisibly returns 'NULL'. Files are written to disk.
Examples
dir <- system.file("extdata", package = "rcdf")
rcdf_path <- file.path(dir, 'mtcars.rcdf')
private_key <- file.path(dir, 'sample-private-key.pem')
rcdf_data <- read_rcdf(path = rcdf_path, decryption_key = private_key)
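temp_dir <- file.path(tempdir(), "rcdf-export")
dir.create(temp_dir, showWarnings = FALSE)
write_rcdf_dta(data = rcdf_data, path = temp_dir)
# recursive = TRUE is required for unlink() to remove a directory
unlink(temp_dir, recursive = TRUE, force = TRUE)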
Write RCDF data to JSON files
Description
Writes each table in the RCDF object as a separate '.json' file.
Usage
write_rcdf_json(data, path, ..., parent_dir = NULL)
Arguments
data |
A valid RCDF object. |
path |
The output directory for files. |
... |
Additional arguments passed to 'jsonlite::write_json()'. |
parent_dir |
Optional subdirectory under 'path' to group JSON files. |
Value
Invisibly returns 'NULL'. Files are written to disk.
Examples
dir <- system.file("extdata", package = "rcdf")
rcdf_path <- file.path(dir, 'mtcars.rcdf')
private_key <- file.path(dir, 'sample-private-key.pem')
rcdf_data <- read_rcdf(path = rcdf_path, decryption_key = private_key)
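temp_dir <- file.path(tempdir(), "rcdf-export")
dir.create(temp_dir, showWarnings = FALSE)
write_rcdf_json(data = rcdf_data, path = temp_dir)
# recursive = TRUE is required for unlink() to remove a directory
unlink(temp_dir, recursive = TRUE, force = TRUE)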
Write RCDF data to Parquet files
Description
This function writes an RCDF object (a list of data frames) to multiple Parquet files. Each data frame in the list is written to its corresponding Parquet file in the specified path.
Usage
write_rcdf_parquet(data, path, ..., parent_dir = NULL)
Arguments
data |
A list where each element is a data frame or tibble that will be written to a Parquet file. |
path |
The directory path where the Parquet files will be written. |
... |
Additional arguments passed to 'rcdf::write_parquet()' while writing each Parquet file. |
parent_dir |
An optional parent directory to be included in the path where the files will be written. |
Value
A character vector of file paths to the written Parquet files.
Examples
dir <- system.file("extdata", package = "rcdf")
rcdf_path <- file.path(dir, 'mtcars.rcdf')
private_key <- file.path(dir, 'sample-private-key.pem')
rcdf_data <- read_rcdf(path = rcdf_path, decryption_key = private_key)
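temp_dir <- file.path(tempdir(), "rcdf-export")
dir.create(temp_dir, showWarnings = FALSE)
write_rcdf_parquet(data = rcdf_data, path = temp_dir)
# recursive = TRUE is required for unlink() to remove a directory
unlink(temp_dir, recursive = TRUE, force = TRUE)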
Write RCDF data to SPSS '.sav' files
Description
Writes each table in the RCDF object to a '.sav' file using the 'haven' package for compatibility with SPSS.
Usage
write_rcdf_sav(data, path, ..., parent_dir = NULL)
Arguments
data |
A valid RCDF object. |
path |
Output directory for files. |
... |
Additional arguments passed to 'haven::write_sav()'. |
parent_dir |
Optional subdirectory under 'path' to group SPSS files. |
Value
Invisibly returns 'NULL'. Files are written to disk.
Examples
dir <- system.file("extdata", package = "rcdf")
rcdf_path <- file.path(dir, 'mtcars.rcdf')
private_key <- file.path(dir, 'sample-private-key.pem')
rcdf_data <- read_rcdf(path = rcdf_path, decryption_key = private_key)
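temp_dir <- file.path(tempdir(), "rcdf-export")
dir.create(temp_dir, showWarnings = FALSE)
write_rcdf_sav(data = rcdf_data, path = temp_dir)
# recursive = TRUE is required for unlink() to remove a directory
unlink(temp_dir, recursive = TRUE, force = TRUE)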
Write RCDF data to a SQLite database
Description
Writes all tables in the RCDF object to a single SQLite database file.
Usage
write_rcdf_sqlite(data, path, db_name = "cbms_data", ..., parent_dir = NULL)
Arguments
data |
A valid RCDF object. |
path |
Output directory for the database file. |
db_name |
Name of the SQLite database file (without extension). |
... |
Additional arguments passed to 'DBI::dbWriteTable()'. |
parent_dir |
Optional subdirectory under 'path' to store the SQLite file. |
Value
Invisibly returns 'NULL'. A '.db' file is written to disk.
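Writing several tables into one SQLite file amounts to one 'DBI::dbWriteTable()' call per table on a single connection. The following is a minimal sketch with DBI and RSQLite, both of which this package imports. It assumes each list element becomes a table named after the element. A plain named list stands in for an RCDF object.

```r
library(DBI)

# Stand-in for an RCDF object: a named list of tables.
tables <- list(mtcars = mtcars, iris = iris)

# Database file named after the default 'db_name', with a '.db' extension.
db_path <- file.path(tempdir(), "cbms_data.db")
con <- dbConnect(RSQLite::SQLite(), db_path)

# One table per list element; the element name becomes the table name.
for (name in names(tables)) {
  dbWriteTable(con, name, tables[[name]], overwrite = TRUE)
}

table_names <- dbListTables(con)
n_cars <- dbGetQuery(con, "SELECT COUNT(*) AS n FROM mtcars")$n

dbDisconnect(con)
unlink(db_path)
```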
Examples
dir <- system.file("extdata", package = "rcdf")
rcdf_path <- file.path(dir, 'mtcars.rcdf')
private_key <- file.path(dir, 'sample-private-key.pem')
rcdf_data <- read_rcdf(path = rcdf_path, decryption_key = private_key)
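temp_dir <- file.path(tempdir(), "rcdf-export")
dir.create(temp_dir, showWarnings = FALSE)
write_rcdf_sqlite(data = rcdf_data, path = temp_dir)
# recursive = TRUE is required for unlink() to remove a directory
unlink(temp_dir, recursive = TRUE, force = TRUE)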
Write RCDF data to TSV files
Description
Writes each table in the RCDF object as a separate tab-separated '.txt' file.
Usage
write_rcdf_tsv(data, path, ..., parent_dir = NULL)
Arguments
data |
A valid RCDF object. |
path |
The base output directory. |
... |
Additional arguments passed to 'write.table()'. |
parent_dir |
Optional subdirectory under 'path' to group TSV files. |
Value
Invisibly returns 'NULL'. Files are written to disk.
Examples
dir <- system.file("extdata", package = "rcdf")
rcdf_path <- file.path(dir, 'mtcars.rcdf')
private_key <- file.path(dir, 'sample-private-key.pem')
rcdf_data <- read_rcdf(path = rcdf_path, decryption_key = private_key)
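temp_dir <- file.path(tempdir(), "rcdf-export")
dir.create(temp_dir, showWarnings = FALSE)
write_rcdf_tsv(data = rcdf_data, path = temp_dir)
# recursive = TRUE is required for unlink() to remove a directory
unlink(temp_dir, recursive = TRUE, force = TRUE)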
Write RCDF data to Excel files
Description
Writes each table in the RCDF object as a separate '.xlsx' file using the 'openxlsx' package.
Usage
write_rcdf_xlsx(data, path, ..., parent_dir = NULL)
Arguments
data |
A valid RCDF object. |
path |
The output directory. |
... |
Additional arguments passed to 'openxlsx::write.xlsx()'. |
parent_dir |
Optional subdirectory under 'path' to group Excel files. |
Value
Invisibly returns 'NULL'. Files are written to disk.
Examples
dir <- system.file("extdata", package = "rcdf")
rcdf_path <- file.path(dir, 'mtcars.rcdf')
private_key <- file.path(dir, 'sample-private-key.pem')
rcdf_data <- read_rcdf(path = rcdf_path, decryption_key = private_key)
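temp_dir <- file.path(tempdir(), "rcdf-export")
dir.create(temp_dir, showWarnings = FALSE)
write_rcdf_xlsx(data = rcdf_data, path = temp_dir)
# recursive = TRUE is required for unlink() to remove a directory
unlink(temp_dir, recursive = TRUE, force = TRUE)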