Title: Processing Agro-Environmental Data
Version: 0.1.0
Description: A set of tools for processing and analyzing data developed in the context of the "Who Has Eaten the Planet" (WHEP) project, funded by the European Research Council (ERC). For more details on multi-regional input–output model "Food and Agriculture Biomass Input–Output" (FABIO) see Bruckner et al. (2019) <doi:10.1021/acs.est.9b03554>.
License: MIT + file LICENSE
Imports: cli, dplyr, fs, FAOSTAT, httr, mipfp, nanoparquet, pins, purrr, readr, rlang, stringr, tidyr, withr, yaml
Encoding: UTF-8
RoxygenNote: 7.3.2
Suggests: ggplot2, googlesheets4, here, knitr, pointblank, rmarkdown, testthat (≥ 3.0.0), tibble
Config/testthat/edition: 3
VignetteBuilder: knitr
URL: https://eduaguilera.github.io/whep/, https://github.com/eduaguilera/whep
BugReports: https://github.com/eduaguilera/whep/issues
Depends: R (≥ 4.2.0)
LazyData: true
NeedsCompilation: no
Packaged: 2025-07-23 12:39:29 UTC; usuario
Author: Catalin Covaci [aut, cre], Eduardo Aguilera [aut, cph], João Serra [ctb], European Research Council [fnd]
Maintainer: Catalin Covaci <catalin.covaci@csic.es>
Repository: CRAN
Date/Publication: 2025-07-25 09:50:01 UTC

whep: Processing Agro-Environmental Data

Description


A set of tools for processing and analyzing data developed in the context of the "Who Has Eaten the Planet" (WHEP) project, funded by the European Research Council (ERC). For more details on multi-regional input–output model "Food and Agriculture Biomass Input–Output" (FABIO) see Bruckner et al. (2019) doi:10.1021/acs.est.9b03554.

Author(s)

Maintainer: Catalin Covaci catalin.covaci@csic.es (ORCID)

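Authors:

  • Eduardo Aguilera (ORCID) [copyright holder]

Other contributors:

  • João Serra (ORCID) [contributor]

  • European Research Council [funder]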

See Also

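Useful links:

  • https://eduaguilera.github.io/whep/

  • https://github.com/eduaguilera/whep

  • Report bugs at https://github.com/eduaguilera/whep/issues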

Get area codes from area names

Description

Add a new column to an existing tibble with the corresponding code for each area name. The codes are assumed to be those defined by the FABIO model.

Usage

add_area_code(table, name_column = "area_name", code_column = "area_code")

Arguments

table

The table that will be modified with a new column.

name_column

The name of the column in table containing the names.

code_column

The name of the output column containing the codes.

Value

A tibble with all the contents of table and an extra column named code_column, which contains the codes. If there is no code match, an NA is included.

Examples

table <- tibble::tibble(
  area_name = c("Armenia", "Afghanistan", "Dummy Country", "Albania")
)

add_area_code(table)

table |>
  dplyr::rename(my_area_name = area_name) |>
  add_area_code(name_column = "my_area_name")

add_area_code(table, code_column = "my_custom_code")
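The lookup performed by these add_* functions can be pictured as matching each name against a correspondence table, with NA for names that have no match. A minimal base-R sketch, assuming a toy lookup table (`toy_areas`, codes chosen for illustration) in place of the package's internal FABIO correspondences; `add_code_sketch()` is a hypothetical helper, not the package implementation:

```r
# Toy lookup table standing in for the package's area correspondences
# (codes are illustrative only).
toy_areas <- data.frame(
  area_name = c("Armenia", "Afghanistan", "Albania"),
  area_code = c(1, 2, 3)
)

# Hypothetical helper: add a code column by matching names against `lookup`.
# match() keeps the row order of `table` and returns NA when a name is absent.
add_code_sketch <- function(table, lookup,
                            name_column = "area_name",
                            code_column = "area_code") {
  idx <- match(table[[name_column]], lookup$area_name)
  table[[code_column]] <- lookup$area_code[idx]
  table
}

table <- data.frame(
  area_name = c("Armenia", "Afghanistan", "Dummy Country", "Albania")
)
result <- add_code_sketch(table, toy_areas)
result
```

Using match() rather than a join means the output always has exactly one row per input row, even if the lookup table contained a duplicated name.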

Get area names from area codes

Description

Add a new column to an existing tibble with the corresponding name for each area code. The codes are assumed to be those defined by the FABIO model, which in turn come from FAOSTAT internal codes. Equivalences with ISO 3166-1 numeric codes can be found in the Area Codes CSV inside the zip file that can be downloaded from FAOSTAT. TODO: Consider using ISO3 codes instead; they would be convenient, but they do not cover all the periods we need.

Usage

add_area_name(table, code_column = "area_code", name_column = "area_name")

Arguments

table

The table that will be modified with a new column.

code_column

The name of the column in table containing the codes.

name_column

The name of the output column containing the names.

Value

A tibble with all the contents of table and an extra column named name_column, which contains the names. If there is no name match, an NA is included.

Examples

table <- tibble::tibble(area_code = c(1, 2, 4444, 3))

add_area_name(table)

table |>
  dplyr::rename(my_area_code = area_code) |>
  add_area_name(code_column = "my_area_code")

add_area_name(table, name_column = "my_custom_name")

Get commodity balance sheet item codes from item names

Description

Add a new column to an existing tibble with the corresponding code for each commodity balance sheet item name. The codes are assumed to be those defined by FAOSTAT.

Usage

add_item_cbs_code(
  table,
  name_column = "item_cbs_name",
  code_column = "item_cbs_code"
)

Arguments

table

The table that will be modified with a new column.

name_column

The name of the column in table containing the names.

code_column

The name of the output column containing the codes.

Value

A tibble with all the contents of table and an extra column named code_column, which contains the codes. If there is no code match, an NA is included.

Examples

table <- tibble::tibble(
  item_cbs_name = c("Cottonseed", "Eggs", "Dummy Item")
)
add_item_cbs_code(table)

table |>
  dplyr::rename(my_item_cbs_name = item_cbs_name) |>
  add_item_cbs_code(name_column = "my_item_cbs_name")

add_item_cbs_code(table, code_column = "my_custom_code")

Get commodity balance sheet item names from item codes

Description

Add a new column to an existing tibble with the corresponding name for each commodity balance sheet item code. The codes are assumed to be those defined by FAOSTAT.

Usage

add_item_cbs_name(
  table,
  code_column = "item_cbs_code",
  name_column = "item_cbs_name"
)

Arguments

table

The table that will be modified with a new column.

code_column

The name of the column in table containing the codes.

name_column

The name of the output column containing the names.

Value

A tibble with all the contents of table and an extra column named name_column, which contains the names. If there is no name match, an NA is included.

Examples

table <- tibble::tibble(item_cbs_code = c(2559, 2744, 9876))
add_item_cbs_name(table)

table |>
  dplyr::rename(my_item_cbs_code = item_cbs_code) |>
  add_item_cbs_name(code_column = "my_item_cbs_code")

add_item_cbs_name(table, name_column = "my_custom_name")

Get production item codes from item names

Description

Add a new column to an existing tibble with the corresponding code for each production item name. The codes are assumed to be those defined by FAOSTAT.

Usage

add_item_prod_code(
  table,
  name_column = "item_prod_name",
  code_column = "item_prod_code"
)

Arguments

table

The table that will be modified with a new column.

name_column

The name of the column in table containing the names.

code_column

The name of the output column containing the codes.

Value

A tibble with all the contents of table and an extra column named code_column, which contains the codes. If there is no code match, an NA is included.

Examples

table <- tibble::tibble(
  item_prod_name = c("Rice", "Cabbages", "Dummy Item")
)
add_item_prod_code(table)

table |>
  dplyr::rename(my_item_prod_name = item_prod_name) |>
  add_item_prod_code(name_column = "my_item_prod_name")

add_item_prod_code(table, code_column = "my_custom_code")

Get production item names from item codes

Description

Add a new column to an existing tibble with the corresponding name for each production item code. The codes are assumed to be those defined by FAOSTAT.

Usage

add_item_prod_name(
  table,
  code_column = "item_prod_code",
  name_column = "item_prod_name"
)

Arguments

table

The table that will be modified with a new column.

code_column

The name of the column in table containing the codes.

name_column

The name of the output column containing the names.

Value

A tibble with all the contents of table and an extra column named name_column, which contains the names. If there is no name match, an NA is included.

Examples

table <- tibble::tibble(item_prod_code = c(27, 358, 12345))
add_item_prod_name(table)

table |>
  dplyr::rename(my_item_prod_code = item_prod_code) |>
  add_item_prod_name(code_column = "my_item_prod_code")

add_item_prod_name(table, name_column = "my_custom_name")

Supply and use tables

Description

Create a table with processes, their inputs (use) and their outputs (supply).

Usage

build_supply_use(
  cbs_version = NULL,
  feed_intake_version = NULL,
  primary_prod_version = NULL,
  primary_residues_version = NULL,
  processing_coefs_version = NULL
)

Arguments

cbs_version

File version passed to get_wide_cbs() call.

feed_intake_version

File version passed to get_feed_intake() call.

primary_prod_version

File version passed to get_primary_production() call.

primary_residues_version

File version passed to get_primary_residues() call.

processing_coefs_version

File version passed to get_processing_coefs() call.

Value

A tibble with the supply and use data for processes. It contains the following columns:

Examples

# Note: These are smaller samples to show outputs, not the real data.
# For all data, call the function with default versions (i.e. no arguments).
build_supply_use(
  cbs_version = "20250721T132006Z-8ea47",
  feed_intake_version = "20250721T143825Z-c1313",
  primary_prod_version = "20250721T145805Z-8e12a",
  primary_residues_version = "20250721T150132Z-dfd94",
  processing_coefs_version = "20250721T143403Z-216d7"
)

Trade data sources

Description

Transform a data frame where each row covers a range of years into one where each row corresponds to a single year, effectively 'expanding' each year range.

Usage

expand_trade_sources(trade_sources)

Arguments

trade_sources

A tibble where each row contains a year range.

Value

A tibble where each row corresponds to a single year for a given source.

Examples

trade_sources <- tibble::tibble(
  Name = c("a", "b", "c"),
  Trade = c("t1", "t2", "t3"),
  Info_Format = c("year", "partial_series", "year"),
  Timeline_Start = c(1, 1, 2),
  Timeline_End = c(3, 4, 5),
  Timeline_Freq = c(1, 1, 2),
  `Imp/Exp` = "Imp",
  SACO_link = NA
)
expand_trade_sources(trade_sources)
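A minimal base-R sketch of the expansion, assuming each row covers the years `seq(Timeline_Start, Timeline_End, by = Timeline_Freq)`. How the real function treats `Info_Format` (e.g. `partial_series`) and the remaining columns is not documented here, so this sketch ignores them; `expand_years_sketch()` is a hypothetical helper:

```r
# Hypothetical helper: turn one row per year range into one row per year.
expand_years_sketch <- function(sources) {
  rows <- lapply(seq_len(nrow(sources)), function(i) {
    # Assumption: years are taken every Timeline_Freq years within the range.
    years <- seq(sources$Timeline_Start[i], sources$Timeline_End[i],
                 by = sources$Timeline_Freq[i])
    data.frame(Name = sources$Name[i], Year = years)
  })
  do.call(rbind, rows)
}

sources <- data.frame(
  Name = c("a", "b", "c"),
  Timeline_Start = c(1, 1, 2),
  Timeline_End = c(3, 4, 5),
  Timeline_Freq = c(1, 1, 2)
)
expanded <- expand_years_sketch(sources)
expanded
```

Under this assumption, source "c" (range 2 to 5, frequency 2) expands to the years 2 and 4 only.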

Bilateral trade data

Description

Reports trade between pairs of countries in given years.

Usage

get_bilateral_trade(trade_version = NULL, cbs_version = NULL)

Arguments

trade_version

File version used for bilateral trade input. See whep_inputs for version details.

cbs_version

File version passed to get_wide_cbs() call.

Value

A tibble with the reported trade between countries. For efficient memory usage, the tibble is not exactly in tidy format. It contains the following columns:

The step-by-step approach used to obtain this data tries to follow the FABIO model and is explained below. All steps are performed separately for each (year, item) group.

Examples

# Note: These are smaller samples to show outputs, not the real data.
# For all data, call the function with default versions (i.e. no arguments).
get_bilateral_trade(
  trade_version = "20250721T141553Z-5707e",
  cbs_version = "20250721T132006Z-8ea47"
)

Scrape activity_data from FAOSTAT and lightly post-process it

Description

Important: subsets can be requested dynamically through the ... argument.

Note: crop data incurs some overhead because the FAOSTAT QCL domain is scraped individually; this is acceptable.

Usage

get_faostat_data(activity_data, ...)

Arguments

activity_data

Activity data required from FAOSTAT; must be one of c('livestock', 'crop_area', 'crop_yield', 'crop_production').

...

Filters given as name = value pairs, where the name can be any column returned by get_faostat_bulk(), in particular year, area or ISO3_CODE.

Value

A data.frame with the FAOSTAT data for activity_data; by default, it covers all years and countries.

Examples


get_faostat_data("livestock", year = 2010, area = "Portugal")
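The ... mechanism can be pictured as a set of named filters applied to the downloaded table. A minimal base-R sketch, assuming each named argument keeps only the rows whose column of that name takes one of the given values; `filter_dots_sketch()` and the toy `faostat_like` table are hypothetical and not part of the package:

```r
# Hypothetical helper: keep rows matching every named filter passed via `...`.
filter_dots_sketch <- function(data, ...) {
  filters <- list(...)
  for (col in names(filters)) {
    # Keep the rows whose `col` value is among the requested ones.
    data <- data[data[[col]] %in% filters[[col]], , drop = FALSE]
  }
  data
}

# Toy stand-in for a bulk FAOSTAT download (values are made up).
faostat_like <- data.frame(
  area = c("Portugal", "Portugal", "Spain"),
  year = c(2010, 2011, 2010),
  value = c(10, 20, 30)
)
subset_pt <- filter_dots_sketch(faostat_like, year = 2010, area = "Portugal")
subset_pt
```

With no filters the loop never runs, so the whole table is returned, matching the documented default of all years and countries.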


Livestock feed intake

Description

Get the amount of items used for feeding livestock.

Usage

get_feed_intake(version = NULL)

Arguments

version

File version to use as input. See whep_inputs for details.

Value

A tibble with the feed intake data. It contains the following columns:

Examples

# Note: These are smaller samples to show outputs, not the real data.
# For all data, call the function with default version (i.e. no arguments).
get_feed_intake(version = "20250721T143825Z-c1313")

Primary items production

Description

Get the amount of crops, livestock and livestock products.

Usage

get_primary_production(version = NULL)

Arguments

version

File version to use as input. See whep_inputs for details.

Value

A tibble with the item production data. It contains the following columns:

Examples

# Note: These are smaller samples to show outputs, not the real data.
# For all data, call the function with default version (i.e. no arguments).
get_primary_production(version = "20250721T145805Z-8e12a")

Crop residue items

Description

Get type and amount of residue produced for each crop production item.

Usage

get_primary_residues(version = NULL)

Arguments

version

File version to use as input. See whep_inputs for details.

Value

A tibble with the crop residue data. It contains the following columns:

Examples

# Note: These are smaller samples to show outputs, not the real data.
# For all data, call the function with default version (i.e. no arguments).
get_primary_residues(version = "20250721T150132Z-dfd94")

Processed products share factors

Description

Reports quantities of commodity balance sheet items used for processing and quantities of their corresponding processed output items.

Usage

get_processing_coefs(version = NULL)

Arguments

version

File version to use as input. See whep_inputs for details.

Value

A tibble with the quantities for each processed product. It contains the following columns:

For the final data obtained, the quantities in final_value_processed are balanced in the following sense: for each unique tuple (year, area_code, item_cbs_code_processed), the sum of final_value_processed should equal exactly the quantity reported for that year, country and item in the production column returned by get_wide_cbs(). This holds because processed items are not primary products, so their 'production' value is the amount of subproduct obtained from processing. TODO: Fix the few data points where this does not hold.
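The balance condition can be checked by summing final_value_processed per (year, area_code, item_cbs_code_processed) and comparing the totals with CBS production. A minimal base-R sketch on made-up toy tables; the CBS item column name `item_cbs_code` is an assumption about the layout of the get_wide_cbs() output, and all codes and values are illustrative:

```r
# Toy processing coefficients: two inputs both yield processed item 2.
coefs <- data.frame(
  year = c(2000, 2000, 2000),
  area_code = c(1, 1, 1),
  item_cbs_code_processed = c(2, 2, 3),
  final_value_processed = c(40, 60, 15)
)

# Toy CBS table; `item_cbs_code` as the item column name is an assumption.
cbs <- data.frame(
  year = c(2000, 2000),
  area_code = c(1, 1),
  item_cbs_code = c(2, 3),
  production = c(100, 15)
)

# Sum processed output per (year, area_code, item_cbs_code_processed).
totals <- aggregate(
  final_value_processed ~ year + area_code + item_cbs_code_processed,
  data = coefs, FUN = sum
)

# Align each total with the CBS production of the same year, area and item.
check <- merge(
  totals, cbs,
  by.x = c("year", "area_code", "item_cbs_code_processed"),
  by.y = c("year", "area_code", "item_cbs_code")
)
check$balanced <- abs(check$final_value_processed - check$production) < 1e-9
check
```

Rows with balanced equal to FALSE would correspond to the few data points mentioned in the TODO above.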

Examples

# Note: These are smaller samples to show outputs, not the real data.
# For all data, call the function with default version (i.e. no arguments).
get_processing_coefs(version = "20250721T143403Z-216d7")

Commodity balance sheet data

Description

States supply and use parts for each commodity balance sheet (CBS) item.

Usage

get_wide_cbs(version = NULL)

Arguments

version

File version to use as input. See whep_inputs for details.

Value

A tibble with the commodity balance sheet data in wide format. It contains the following columns:

The other columns are quantities (measured in tonnes), where total supply and total use should be balanced.

For supply:

For use:

There is an additional column domestic_supply which is computed as the total use excluding export.
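A minimal base-R sketch of these relations on a one-row toy sheet. The supply and use column names used here (`production`, `import`, `export`, `food`, `feed`) are assumptions for illustration, since the actual column lists are not reproduced above, and the values are made up:

```r
# One-row toy sheet in tonnes; column names and values are illustrative.
cbs_toy <- data.frame(
  production = 80, import = 20,        # supply side
  export = 30, food = 50, feed = 20    # use side
)
supply_cols <- c("production", "import")
use_cols <- c("export", "food", "feed")

# Total supply and total use should be balanced.
total_supply <- rowSums(cbs_toy[supply_cols])
total_use <- rowSums(cbs_toy[use_cols])

# domestic_supply: total use excluding export.
cbs_toy$domestic_supply <- total_use - cbs_toy$export
cbs_toy
```

When the sheet is balanced, domestic_supply equals total supply minus export as well, since both totals are the same.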

Examples

# Note: These are smaller samples to show outputs, not the real data.
# For all data, call the function with default version (i.e. no arguments).
get_wide_cbs(version = "20250721T132006Z-8ea47")

Commodity balance sheet items

Description

Defines name/code correspondences for commodity balance sheet (CBS) items.

Usage

items_cbs

Format

A tibble where each row corresponds to one CBS item. It contains the following columns:

Source

Inspired by FAOSTAT data.


Primary production items

Description

Defines name/code correspondences for production items.

Usage

items_prod

Format

A tibble where each row corresponds to one production item. It contains the following columns:

Source

Inspired by FAOSTAT data.


Polities

Description

Defines name/code correspondences for polities (political entities).

Usage

polities

Format

A tibble where each row corresponds to one polity. It contains the following columns: TODO: to be documented in the upcoming polities pull request.


External inputs

Description

The information needed for accessing external datasets used as inputs in our modeling.

Usage

whep_inputs

Format

A tibble where each row corresponds to one external input dataset. It contains the following columns:

Source

Created by the package authors.


Input file versions

Description

Lists all existing versions of an input file from whep_inputs.

Usage

whep_list_file_versions(file_alias)

Arguments

file_alias

Internal name of the requested file. You can find the possible values in the whep_inputs dataset.

Value

A tibble where each row is a version. For details about its format, see pins::pin_versions().

Examples

whep_list_file_versions("read_example")

Download, cache and read files

Description

Fetches input files that the package's functions need but that were built from external sources and are too large to include directly in the package. This function is public for transparency purposes, so that users can inspect the original inputs of this package that were not directly processed here.

If the requested file doesn't exist locally, it is downloaded from a public link and cached before reading it. This is all implemented using the pins package. It supports multiple file formats and file versioning.
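The download-once-then-read-from-cache behaviour can be sketched in base R. This is not how pins implements it: `read_cached_sketch()`, the `fetch` callback and the CSV-only cache layout are hypothetical, and a local function stands in for the real download so the sketch runs offline:

```r
# Hypothetical helper: read `alias` from the cache, fetching it only if missing.
read_cached_sketch <- function(alias, fetch, cache_dir = tempdir()) {
  path <- file.path(cache_dir, paste0(alias, ".csv"))
  if (!file.exists(path)) {
    fetch(path)  # download step: only runs on a cache miss
  }
  read.csv(path)
}

# Offline stand-in for a download: writes a small CSV and counts its calls.
n_fetches <- 0
fake_fetch <- function(dest) {
  n_fetches <<- n_fetches + 1
  write.csv(data.frame(x = 1:3), dest, row.names = FALSE)
}

cache_dir <- file.path(tempdir(), "whep_cache_sketch")
dir.create(cache_dir, showWarnings = FALSE)
unlink(file.path(cache_dir, "read_example.csv"))  # start from an empty cache

first <- read_cached_sketch("read_example", fake_fetch, cache_dir)
second <- read_cached_sketch("read_example", fake_fetch, cache_dir)
n_fetches
```

The second call finds the cached file and skips the fetch, so the counter stays at one.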

Usage

whep_read_file(file_alias, type = "parquet", version = NULL)

Arguments

file_alias

Internal name of the requested file. You can find the possible values in the alias column of the whep_inputs dataset.

type

The extension of the file that must be read. Possible values:

  • parquet: This is the default value for code efficiency reasons.

  • csv: Mainly available for those who want a more human-readable option. If the parquet version is available, CSV offers no advantage: this function already returns the dataset as an R object, so the source format is irrelevant, and parquet is read faster.

Saving each file in both formats serves transparency and accessibility, e.g., when the data must be shared with non-programmers who can easily import a CSV into a spreadsheet. You will most likely never need to set this option manually, unless a file could not be supplied in parquet format but was available in another one.

version

The version of the file that must be read. Possible values:

  • NULL: This is the default value. A frozen version is chosen to make the code reproducible. Each release will have its own frozen versions. The version is the string that can be found in whep_inputs in the version column.

  • "latest": This overrides the frozen version and instead fetches the latest available version, which may or may not match the frozen one.

  • Other: A specific version can also be used. For more details read the version column information from whep_inputs.
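The three cases for version can be summarised as a small dispatch. A minimal base-R sketch, assuming the frozen version is taken from the version column of whep_inputs and the available versions are passed as a character vector; `resolve_version_sketch()`, its arguments, the error on unknown versions and the second version string are hypothetical:

```r
# Hypothetical helper mirroring the three documented cases for `version`.
resolve_version_sketch <- function(version, frozen, available) {
  if (is.null(version)) {
    return(frozen)  # NULL: the frozen version, for reproducibility
  }
  if (identical(version, "latest")) {
    # Version strings start with a UTC timestamp (YYYYMMDDTHHMMSSZ),
    # so the lexicographic maximum is the most recent one.
    return(max(available))
  }
  if (!version %in% available) {
    stop("Unknown version: ", version)
  }
  version  # any other value: used as given
}

# The second version string is made up for illustration.
available <- c("20250721T152646Z-ce61b", "20250801T090000Z-abcde")
frozen <- "20250721T152646Z-ce61b"
resolve_version_sketch(NULL, frozen, available)
resolve_version_sketch("latest", frozen, available)
```

Sorting the strings works here only because every version begins with a fixed-width timestamp; versions in a different format would need their creation dates instead.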

Value

A tibble with the dataset. Some information about each dataset can be found in the code where it's used as input for further processing.

Examples

whep_read_file("read_example")
whep_read_file("read_example", type = "parquet", version = "latest")
whep_read_file(
  "read_example",
  type = "csv",
  version = "20250721T152646Z-ce61b"
)