Type: | Package |
Title: | Read Linguistic Data in the Cross Linguistic Data Format (CLDF) |
Version: | 1.5.1 |
Maintainer: | Simon J. Greenhill <simon@simon.net.nz> |
Description: | Cross-Linguistic Data Format (CLDF) is a framework for storing cross-linguistic data, ensuring compatibility and ease of data exchange between different linguistic datasets; see Forkel et al. (2018) <doi:10.1038/sdata.2018.205>. The 'rcldf' package is designed to facilitate the manipulation and analysis of these datasets by simplifying the loading, querying, and visualisation of CLDF datasets, making it easier to conduct comparative linguistic analyses, manage language data, and apply statistical methods directly within R. |
License: | Apache License (≥ 2.0) |
Encoding: | UTF-8 |
Imports: | archive, bib2df (≥ 1.1.1), csvwr, digest, dplyr, jsonlite, logger, magrittr, purrr, readr, remotes, rlang, tools, urltools, utils |
Suggests: | ggplot2, patchwork, testthat, mockthat, spelling, covr, knitr, rmarkdown, qpdf |
URL: | https://github.com/SimonGreenhill/rcldf |
BugReports: | https://github.com/SimonGreenhill/rcldf/issues |
Language: | en-US |
RoxygenNote: | 7.3.2 |
VignetteBuilder: | knitr |
NeedsCompilation: | no |
Packaged: | 2025-09-22 08:05:55 UTC; simon |
Author: | Simon J. Greenhill [aut, cre] |
Repository: | CRAN |
Date/Publication: | 2025-09-30 07:20:02 UTC |
rcldf: Read Linguistic Data in the Cross Linguistic Data Format (CLDF)
Description
The rcldf package is designed to facilitate the manipulation and analysis of datasets in Cross-Linguistic Data Format (CLDF, Forkel et al. 2018 doi:10.1038/sdata.2018.205). CLDF is a framework for storing cross-linguistic data, ensuring compatibility and ease of data exchange between different linguistic datasets. This package simplifies the loading, querying, and visualisation of CLDF datasets, making it easier to conduct comparative linguistic analyses, manage language data, and apply statistical methods directly within R.
Details
rcldf is an R library for reading Cross-Linguistic Data Format (CLDF) files.
Author(s)
Maintainer: Simon J. Greenhill simon@simon.net.nz
See Also
Useful links:
Report bugs at https://github.com/SimonGreenhill/rcldf/issues
Adds a dataframe.
Description
Adds a dataframe.
Usage
add_dataframe(table, filename, group)
Arguments
table |
a metadata section from the CLDF metadata. |
filename |
the filename. |
group |
a grouping from the metadata. |
Value
A dataframe
Extracts a CLDF table as a 'wide' dataframe by resolving all foreign key links
Description
Extracts a CLDF table as a 'wide' dataframe by resolving all foreign key links
Usage
as.cldf.wide(object, table)
Arguments
object |
the cldf object. |
table |
the name of the table to extract. |
Value
A tibble dataframe
Examples
md <- system.file("extdata/huon", "cldf-metadata.json", package = "rcldf")
cldfobj <- cldf(md)
forms <- as.cldf.wide(cldfobj, 'FormTable')
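To make "resolving foreign key links" concrete, here is a minimal base R sketch of the idea on invented toy tables. It is not the package's implementation: the data, the column names, and the "column.table" suffixing of the linked table's columns are assumptions made for illustration.

```r
# Toy stand-ins for a FormTable and a LanguageTable (invented data).
forms <- data.frame(
  ID = c("f1", "f2", "f3"),
  Language_ID = c("l1", "l2", "l1"),
  Form = c("wata", "kana", "miti"),
  stringsAsFactors = FALSE
)
languages <- data.frame(
  ID = c("l1", "l2"),
  Name = c("Alpha", "Beta"),
  stringsAsFactors = FALSE
)

# Suffix the linked table's columns so they cannot clash with the form columns.
names(languages) <- paste(names(languages), "LanguageTable", sep = ".")

# Resolve the foreign key: Language_ID in forms points at ID in languages.
wide <- merge(
  forms, languages,
  by.x = "Language_ID", by.y = "ID.LanguageTable",
  all.x = TRUE
)
print(wide)
```

Each form row now carries the attributes of the language it points to, which is what makes the result "wide".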
Reads a Cross-Linguistic Data Format dataset into an object.
Description
Reads a Cross-Linguistic Data Format dataset into an object.
read_cldf() is an alias of cldf(), included for users who expect a readr-style name such as readr::read_csv().
Usage
cldf(
mdpath,
load_bib = FALSE,
cache_dir = tools::R_user_dir("rcldf", which = "cache")
)
read_cldf(
mdpath,
load_bib = FALSE,
cache_dir = tools::R_user_dir("rcldf", which = "cache")
)
Arguments
mdpath |
the path to the directory or metadata JSON file. |
load_bib |
a boolean flag (TRUE/FALSE, default FALSE) to load the sources.bib BibTeX file. |
cache_dir |
a directory to cache downloaded files to |
Value
A cldf object
Examples
cldfobj <- cldf(system.file("extdata/huon", "cldf-metadata.json", package = "rcldf"))
Coalesce value to truthiness
Description
Determine whether the input is true, with missing values being interpreted as false.
Usage
coalesce_truth(x)
Arguments
x |
a logical value, which may be missing. |
Value
FALSE if x is anything but TRUE
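The documented behaviour can be approximated in base R with isTRUE(). The helper name coalesce_truth_sketch is hypothetical, and this is only a sketch of the described semantics for a single value, not the function's actual source.

```r
# Hypothetical stand-in: TRUE only for a single, non-missing TRUE value.
coalesce_truth_sketch <- function(x) {
  isTRUE(x)
}

results <- c(
  true_input  = coalesce_truth_sketch(TRUE),
  false_input = coalesce_truth_sketch(FALSE),
  na_input    = coalesce_truth_sketch(NA),
  null_input  = coalesce_truth_sketch(NULL)
)
print(results)
```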
Map csvw datatypes to R types
Description
Translate csvw datatypes to R types. This implementation currently targets readr::cols column specifications.
Usage
datatype_to_type(datatypes)
Arguments
datatypes |
a list of csvw datatypes |
Details
rcldf adds some overrides here to add e.g. anyURI etc.
Value
a readr::cols specification - a list of collectors
Examples
cspec <- datatype_to_type(list("double", list(base="date", format="yyyy-MM-dd")))
readr::read_csv(readr::readr_example("challenge.csv"), col_types=cspec)
CSVW default dialect
Description
The CSVW Default Dialect specification described in CSV Dialect Description Format.
Usage
default_dialect
Format
An object of class list of length 13.
Value
a list specifying a default csv dialect
Create a default table schema given a csv file and dialect
Description
If neither the table nor the group has a tableSchema annotation, then this default schema will be used.
Usage
default_schema(filename, dialect = default_dialect)
Arguments
filename |
a csv file |
dialect |
specification of the csv's dialect (default: default_dialect) |
Value
a table schema
Returns the cache dir.
Description
Returns the cache dir.
Usage
get_cache_dir(cache_dir = NA)
Arguments
cache_dir |
a directory to use |
Value
A string of the cache dir
Returns a dataframe with details on the CLDF dataset in path.
Description
Returns a dataframe with details on the CLDF dataset in path.
Usage
get_details(path, cache_dir = NA)
Arguments
path |
the path to resolve |
cache_dir |
a directory to cache downloaded files to |
Value
A dataframe.
Returns the filesize in bytes of a directory.
Description
Returns the filesize in bytes of a directory.
Usage
get_dir_size(path)
Arguments
path |
a directory to size |
Value
A numeric of the file size in bytes
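One way to compute a directory's size in base R is to list every file beneath it and sum the individual sizes. The helper dir_size_sketch below is hypothetical and may differ from how get_dir_size() is implemented.

```r
# Hypothetical helper: total size in bytes of all files under `path`.
dir_size_sketch <- function(path) {
  files <- list.files(path, recursive = TRUE, full.names = TRUE, all.files = TRUE)
  sum(file.size(files))
}

# Build a throwaway directory with two files of known size.
demo_dir <- tempfile("dirsize")
dir.create(demo_dir)
writeBin(as.raw(1:10), file.path(demo_dir, "a.bin"))  # 10 bytes
writeBin(as.raw(1:5), file.path(demo_dir, "b.bin"))   # 5 bytes

total <- dir_size_sketch(demo_dir)
print(total)
```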
Get a filename from url value in metadata (handles .zip files)
Description
Get a filename from url value in metadata (handles .zip files)
Usage
get_filename(base_dir, url)
Arguments
base_dir |
the base_dir |
url |
the url statement |
Value
A string
Downloads and installs a CLDF dataset from a Zenodo endpoint
Description
Downloads and installs a CLDF dataset from a Zenodo endpoint
Usage
get_from_zenodo(zid, load_bib = FALSE, cache_dir = NULL)
Arguments
zid |
Zenodo endpoint |
load_bib |
load sources (TRUE/FALSE, default FALSE) |
cache_dir |
A cache_dir to use. If NULL it will use get_cache_dir |
Value
A cldf object
Identifies the separator characters specified by the CLDF metadata.
Description
Identifies the separator characters specified by the CLDF metadata.
Usage
get_separators(metadata)
Arguments
metadata |
the CLDF metadata. |
Value
A dataframe with three columns (name, separator, url).
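As an illustration of where separators live, the sketch below walks a hand-written metadata list and collects every column that declares one. It assumes the metadata is held as a nested list mirroring the CSVW JSON layout (tables, then tableSchema, then columns); the helper name separators_sketch and the toy metadata are invented, and the real metadata object may be structured differently.

```r
# Invented metadata fragment in the shape of CSVW JSON read into nested lists.
metadata <- list(
  tables = list(
    list(
      url = "forms.csv",
      tableSchema = list(columns = list(
        list(name = "ID"),
        list(name = "Segments", separator = " "),
        list(name = "Source", separator = ";")
      ))
    ),
    list(
      url = "languages.csv",
      tableSchema = list(columns = list(list(name = "ID"), list(name = "Name")))
    )
  )
)

# Hypothetical helper: one row per column that declares a separator.
separators_sketch <- function(metadata) {
  rows <- list()
  for (tbl in metadata$tables) {
    for (col in tbl$tableSchema$columns) {
      if (!is.null(col$separator)) {
        rows[[length(rows) + 1]] <- data.frame(
          name = col$name, separator = col$separator, url = tbl$url,
          stringsAsFactors = FALSE
        )
      }
    }
  }
  if (length(rows) == 0) {
    return(data.frame(name = character(), separator = character(), url = character()))
  }
  do.call(rbind, rows)
}

seps <- separators_sketch(metadata)
print(seps)
```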
Extracts a single table from a CLDF dataset.
Description
Extracts a single table from a CLDF dataset.
Usage
get_table_from(
table,
mdpath,
cache_dir = tools::R_user_dir("rcldf", which = "cache")
)
Arguments
table |
a CLDF table type |
mdpath |
a path to a CLDF file |
cache_dir |
a directory to cache downloaded files to |
Value
a dataframe
Examples
md_json <- system.file("extdata/huon", "cldf-metadata.json", package = "rcldf")
df <- get_table_from("LanguageTable", md_json)
Convert a CLDF URL tablename to a short tablename
Description
Convert a CLDF URL tablename to a short tablename
Usage
get_tablename(conformsto, url = NA)
Arguments
conformsto |
the dc:conformsTo statement |
url |
the url statement |
Value
A string
Examples
get_tablename("http://cldf.clld.org/v1.0/terms.rdf#ValueTable")
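The conversion can be pictured as keeping only the fragment after the "#" in the term URL. The helper tablename_sketch below is a hypothetical base R approximation; it ignores the url argument, so whatever fallback get_tablename() applies when conformsto is missing is not modelled here.

```r
# Hypothetical helper: strip everything up to and including the last "#".
tablename_sketch <- function(conformsto) {
  sub("^.*#", "", conformsto)
}

short <- tablename_sketch("http://cldf.clld.org/v1.0/terms.rdf#ValueTable")
print(short)
```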
Returns TRUE if url looks like a github URL
Description
Returns TRUE if url looks like a github URL
Usage
is_github(url)
Arguments
url |
A string |
Value
A boolean TRUE/FALSE
Examples
is_github('https://github.com/SimonGreenhill/rcldf/')
Returns TRUE if url looks like a URL
Description
Returns TRUE if url looks like a URL
Usage
is_url(url)
Arguments
url |
A string |
Value
A boolean TRUE/FALSE
Examples
is_url('http://simon.net.nz')
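Checks such as is_url() and is_github() are typically simple pattern tests. The two helpers below are hypothetical base R approximations using regular expressions; the exact patterns the package uses are not stated in this manual and may be stricter or looser.

```r
# Hypothetical helpers: pattern-based guesses, not the package's own rules.
looks_like_url <- function(url) {
  grepl("^https?://", url, ignore.case = TRUE)
}
looks_like_github <- function(url) {
  grepl("^https?://(www\\.)?github\\.com/", url, ignore.case = TRUE)
}

checks <- c(
  url_web    = looks_like_url("http://simon.net.nz"),
  url_local  = looks_like_url("extdata/huon/cldf-metadata.json"),
  gh_repo    = looks_like_github("https://github.com/SimonGreenhill/rcldf/"),
  gh_other   = looks_like_github("http://simon.net.nz")
)
print(checks)
```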
Returns a dataframe of directories in the cache dir
Description
Returns a dataframe of directories in the cache dir
Usage
list_cache_files(cache_dir = NULL)
Arguments
cache_dir |
the cache directory to use. If NULL then R_user_dir will be used. |
Value
A dataframe of the directories
Returns a CLDF dataset object of the latest CLTS version.
Description
Returns a CLDF dataset object of the latest CLTS version.
Usage
load_clts(load_bib = FALSE, cache_dir = NULL)
Arguments
load_bib |
load sources (TRUE/FALSE, default FALSE) |
cache_dir |
A cache_dir to use. If NULL it will use get_cache_dir |
Value
A cldf object
Returns a CLDF dataset object of the latest Concepticon version.
Description
Returns a CLDF dataset object of the latest Concepticon version.
Usage
load_concepticon(load_bib = FALSE, cache_dir = NULL)
Arguments
load_bib |
load sources (TRUE/FALSE, default FALSE) |
cache_dir |
A cache_dir to use. If NULL it will use get_cache_dir |
Value
A cldf object
Returns a CLDF dataset object of the latest Glottolog version.
Description
Returns a CLDF dataset object of the latest Glottolog version.
Usage
load_glottolog(load_bib = FALSE, cache_dir = NULL)
Arguments
load_bib |
load sources (TRUE/FALSE, default FALSE) |
cache_dir |
A cache_dir to use. If NULL it will use get_cache_dir |
Value
A cldf object
Returns the cachekey for the given path.
Description
Returns the cachekey for the given path.
Usage
make_cache_key(path)
Arguments
path |
a path to generate the cachekey for. |
Value
A string.
Converts all values specified in the CLDF metadata as null to R's NA.
Description
Note that this is run by default on loading a dataset with cldf()
Usage
nullify(cldfobj, nulls = NULL)
Arguments
cldfobj |
a CLDF Object |
nulls |
a dataframe of null values to replace (default=NULL). |
Value
A cldf object
Examples
cldfobj <- cldf(system.file("extdata/huon", "cldf-metadata.json", package = "rcldf"))
cldfobj <- nullify(cldfobj)
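The effect of null conversion can be sketched on a plain data frame in base R. The toy table and the per-column null markers below are invented, and a named list is used here for brevity even though nullify() itself takes a dataframe of nulls; this illustrates the idea rather than the package's code.

```r
# Invented table in which "?" and "" stand for missing values.
values <- data.frame(
  ID = c("v1", "v2", "v3"),
  Value = c("1", "?", "0"),
  Comment = c("", "unsure", ""),
  stringsAsFactors = FALSE
)

# Hypothetical null markers, keyed by column name.
nulls <- list(Value = "?", Comment = "")

# Replace each column's null markers with NA.
for (col in names(nulls)) {
  values[[col]][values[[col]] %in% nulls[[col]]] <- NA
}
print(values)
```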
Override defaults
Description
Merges two lists applying override values on top of the default values.
Usage
override_defaults(...)
Arguments
... |
any number of lists with configuration values |
Value
a list with the values from the first list replacing those in the second and so on
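The precedence rule (earlier lists win over later ones) can be reproduced in base R with utils::modifyList(). The helper override_sketch is hypothetical and only mirrors the documented behaviour; note that modifyList() merges nested lists recursively, which may or may not match override_defaults().

```r
# Hypothetical helper: earlier arguments take priority over later ones.
override_sketch <- function(...) {
  layers <- list(...)
  # Start from the lowest-priority list (the last) and layer the others on top.
  Reduce(function(acc, layer) utils::modifyList(acc, layer), rev(layers))
}

user_settings <- list(delimiter = ";")
defaults <- list(delimiter = ",", header = TRUE)

merged <- override_sketch(user_settings, defaults)
str(merged)
```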
Summarises the CLDF file
Description
Summarises the CLDF file
Usage
## S3 method for class 'cldf'
print(x, ...)
Arguments
x |
the CLDF dataset |
... |
Arguments to be passed to or from other methods. Currently not used. |
Value
No return value, called for side effects.
Examples
cldfobj <- cldf(system.file("extdata/huon", "cldf-metadata.json", package = "rcldf"))
print(cldfobj)
Adds BibTeX source information into a CLDF dataset
Description
Adds BibTeX source information into a CLDF dataset
Usage
read_bib(object)
Arguments
object |
A CLDF object |
Value
A tibble dataframe
Relabels a column in a dataset for merging.
Description
Relabels a column in a dataset for merging.
Usage
relabel(column, table)
Arguments
column |
the column name. |
table |
the tablename. |
Value
A string of "column.table"
Helper function to resolve the path (e.g. directory or md.json file)
Description
Helper function to resolve the path (e.g. directory or md.json file)
Usage
resolve_path(path, cache_dir = NA)
Arguments
path |
the path to resolve |
cache_dir |
a directory to cache downloaded files to |
Value
A list of two items:
path - string containing the path to the metadata.json file
metadata - a csvwr metadata object
Expands all values with separators.
Description
Note that this is run by default on loading a dataset with cldf()
Usage
separate(cldfobj, separators = NULL)
Arguments
cldfobj |
a CLDF Object |
separators |
a dataframe of separator values to replace (default=NULL). |
Value
A cldf object
Examples
cldfobj <- cldf(system.file("extdata/huon", "cldf-metadata.json", package = "rcldf"))
cldfobj <- separate(cldfobj)
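Expanding separated values amounts to splitting each cell on the separator declared for its column. The base R sketch below does this on an invented table and stores the result as list columns; the data, the named vector of separators, and the choice of list columns are all assumptions, so the shape returned by separate() may differ.

```r
# Invented table with space-separated segments and ";"-separated sources.
forms <- data.frame(
  ID = c("f1", "f2"),
  Segments = c("w a t a", "k a n a"),
  Source = c("smith2001;jones1999", NA),
  stringsAsFactors = FALSE
)

# Hypothetical separators, keyed by column name.
separators <- c(Segments = " ", Source = ";")

# Split each listed column on its separator, yielding one vector per cell.
for (col in names(separators)) {
  forms[[col]] <- strsplit(forms[[col]], separators[[col]], fixed = TRUE)
}
str(forms)
```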
Sets the cache dir for the current session.
Description
Sets the cache dir for the current session.
Usage
set_cache_dir(cache_dir = NA)
Arguments
cache_dir |
a directory to use |
Value
NULL. Sets an environment value.
Summarises the CLDF file
Description
Summarises the CLDF file
Usage
## S3 method for class 'cldf'
summary(object, ...)
Arguments
object |
the CLDF dataset |
... |
Arguments to be passed to or from other methods. Currently not used. |
Value
None
Examples
cldfobj <- cldf(system.file("extdata/huon", "cldf-metadata.json", package = "rcldf"))
summary(cldfobj)