| Type: | Package |
| Title: | Dataset of the 'Contoso' Company |
| Version: | 1.1.1 |
| Description: | A collection of synthetic datasets simulating sales transactions from a fictional company. The dataset includes various related tables that contain essential business and operational data, useful for analyzing sales performance and other business insights. Key tables included in the package are: - "sales": Contains data on individual sales transactions, including order details, pricing, quantities, and customer information. - "customer": Stores customer-specific details such as demographics, geographic location, occupation, and birthday. - "store": Provides information about stores, including location, size, status, and operational dates. - "orders": Contains details about customer orders, including order and delivery dates, store, and customer data. - "product": Contains data on products, including attributes such as product name, category, price, cost, and weight. - "date": A time-based table that includes date-related attributes like year, month, quarter, day, and working day indicators. This dataset is ideal for practicing data analysis, performing time-series analysis, creating reports, or simulating business intelligence scenarios. |
| License: | MIT + file LICENSE |
| Imports: | DBI, dplyr, cli, duckdb (≥ 1.4.0) |
| Suggests: | testthat (≥ 3.0.0) |
| Encoding: | UTF-8 |
| LazyData: | true |
| RoxygenNote: | 7.3.2 |
| Depends: | R (≥ 4.1.0) |
| URL: | https://usrbinr.github.io/contoso/, https://github.com/usrbinr/contoso |
| Config/testthat/edition: | 3 |
| BugReports: | https://github.com/usrbinr/contoso/issues |
| NeedsCompilation: | no |
| Packaged: | 2025-11-09 02:23:30 UTC; hagan |
| Author: | Alejandro Hagan [aut, cre] |
| Maintainer: | Alejandro Hagan <alejandro.hagan@outlook.com> |
| Repository: | CRAN |
| Date/Publication: | 2025-11-12 21:00:15 UTC |
Creates duckdb versions of Contoso datasets
Description
Creates duckdb versions of Contoso datasets
Usage
create_contoso_duckdb(db_dir = c("in_memory"), size = "100K")
Arguments
| db_dir | "temp" or "in_memory" |
| size | "100K", "1M", "10M", or "100M" |
Details
The create_contoso_duckdb() function registers the following Contoso datasets as DuckDB tables:
- sales: Contains sales transaction data.
- product: Contains details about products, including attributes like product name, manufacturer, and category.
- customer: Contains customer demographic and geographic information.
- store: Contains information about store locations and attributes.
- fx: Contains foreign exchange rate data for currency conversion.
- date: Contains various date-related information, including day, week, month, and year.
- con: the duckdb connection to your database
You can choose to store the database in memory or in a temporary directory. If you choose "temp", the database will be created in a temporary file on disk. If you choose "in_memory", the database will be created entirely in memory and will be discarded after the R session ends.
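The two storage modes map naturally onto a DuckDB database path. The sketch below illustrates that idea under stated assumptions and is not the package's internal code: `resolve_db_path()` is a hypothetical helper, and the connection step runs only when the DBI and duckdb packages are installed.

```r
# Hypothetical helper (not part of contoso): translate the db_dir choice
# into a database path that duckdb understands.
resolve_db_path <- function(db_dir = c("in_memory", "temp")) {
  db_dir <- match.arg(db_dir)
  if (db_dir == "in_memory") {
    ":memory:"                     # DuckDB's in-memory database
  } else {
    tempfile(fileext = ".duckdb")  # file in the session's temporary directory
  }
}

path <- resolve_db_path("temp")

# Open and close a connection only when the required packages are available.
if (requireNamespace("DBI", quietly = TRUE) &&
    requireNamespace("duckdb", quietly = TRUE)) {
  con <- DBI::dbConnect(duckdb::duckdb(), dbdir = path)
  DBI::dbDisconnect(con, shutdown = TRUE)
}
```

A temporary file survives only as long as the R session's temporary directory, so neither mode is meant for long-term storage.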
Value
A list of lazy tbl objects that are references to the Contoso datasets stored in the DuckDB database. The list contains the following tables:
- sales
- product
- customer
- store
- fx
- orderrows
- date
Examples
# Create a DuckDB version of Contoso datasets stored in memory
## Not run:
create_contoso_duckdb(db_dir = "in_memory", size = "100K")
## End(Not run)
Customer Data from the Contoso Dataset
Description
This dataset contains information about customers from the Contoso dataset, including demographic details, geographical information, contact information, and other personal attributes. It provides insights into customer profiles, including location, age, occupation, and more.
Usage
customer
Format
A data frame with 24 columns:
- customer_key: double. Unique identifier for each customer.
- geo_area_key: double. Unique identifier for the geographical area the customer resides in.
- start_dt: Date. Date when the customer relationship began.
- end_dt: Date. Date when the customer relationship ended, if applicable.
- continent: character. The continent where the customer resides.
- gender: character. The gender of the customer (e.g., 'Male', 'Female').
- title: character. The title of the customer (e.g., 'Mr.', 'Ms.').
- given_name: character. The given (first) name of the customer.
- middle_initial: character. The middle initial of the customer, if applicable.
- surname: character. The surname (last name) of the customer.
- street_address: character. The street address of the customer.
- city: character. The city where the customer resides.
- state: character. The state or province where the customer resides.
- state_full: character. The full name of the state or province.
- zip_code: character. The postal (ZIP) code of the customer's address.
- country: character. The country where the customer resides, using the country code.
- country_full: character. The full name of the country where the customer resides.
- birthday: Date. The date of birth of the customer.
- age: double. The age of the customer.
- occupation: character. The customer's occupation or profession.
- company: character. The company the customer is associated with, if applicable.
- vehicle: character. The type or make of vehicle the customer owns or drives.
- latitude: double. The latitude of the customer's address.
- longitude: double. The longitude of the customer's address.
Source
https://github.com/sql-bi/Contoso-Data-Generator-V2-Data/releases/tag/ready-to-use-data
Date Dimension Data from the Contoso Dataset
Description
This dataset contains date-related information used for time-based analysis in the Contoso dataset. It includes various representations of date-related attributes, such as year, quarter, month, and day, along with indicators for working days. It is useful for time-series analysis and aggregating data by different time periods.
Usage
date
Format
A data frame with 17 columns:
- date: Date. The actual date for the record.
- date_key: double. Unique identifier for the date (often in YYYYMMDD format).
- year: double. The year part of the date.
- year_quarter: character. The year and quarter (e.g., "2025 Q1").
- year_quarter_number: double. The numerical representation of the quarter (e.g., 1, 2, 3, 4).
- quarter: character. The quarter of the year (e.g., "Q1", "Q2").
- year_month: character. The year and month (e.g., "2025-03").
- year_month_short: character. A shortened version of year and month (e.g., "2025 Mar").
- year_month_number: double. The numerical representation of the year-month (e.g., 202503 for March 2025).
- month: character. The month name (e.g., "March").
- month_short: character. The abbreviated month name (e.g., "Mar").
- month_number: double. The numerical representation of the month (e.g., 3 for March).
- dayof_week: character. The full name of the day of the week (e.g., "Monday").
- dayof_week_short: character. The abbreviated day of the week (e.g., "Mon").
- dayof_week_number: double. The numerical representation of the day of the week (e.g., 1 for Monday).
- working_day: double. Indicator of whether the date is a working day (1 for working day, 0 for non-working day).
- working_day_number: double. A numerical indicator for working day (e.g., 1 for working day, 0 for non-working day).
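To make the column formats concrete, the following sketch derives a handful of these attributes for a single date using base R only. It is an illustration, not the code that generated the dataset, and it assumes that date_key follows the YYYYMMDD layout mentioned above.

```r
# Derive a few date-dimension attributes for one date with base R.
d <- as.Date("2025-03-17")
month_number <- as.integer(format(d, "%m"))

date_row <- data.frame(
  date              = d,
  date_key          = as.numeric(format(d, "%Y%m%d")),  # assumed YYYYMMDD layout
  year              = as.numeric(format(d, "%Y")),
  quarter           = paste0("Q", (month_number - 1) %/% 3 + 1),
  year_month_number = as.numeric(format(d, "%Y%m")),
  month             = month.name[month_number],         # locale-independent name
  month_short       = month.abb[month_number],
  month_number      = month_number
)
```

Using `month.name` and `month.abb` rather than `format(d, "%B")` keeps the month labels in English regardless of the session locale.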
Source
https://github.com/sql-bi/Contoso-Data-Generator-V2-Data/releases/tag/ready-to-use-data
Foreign Exchange Data from the Contoso Dataset
Description
This dataset contains information about foreign exchange (FX) rates between different currencies. It includes details about the exchange rate for a given date, as well as the currencies involved. This dataset is useful for analyzing currency conversions and understanding the exchange rates between different currencies over time.
Usage
fx
Format
A data frame with 4 columns:
- date: Date. The date of the exchange rate.
- from_currency: character. The code of the source currency (e.g., "USD", "EUR").
- to_currency: character. The code of the target currency (e.g., "GBP", "JPY").
- exchange: double. The exchange rate between the source and target currencies on the given date.
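A currency conversion with a table of this shape can be sketched as a join on the date followed by a multiplication. The example below uses made-up rates in a toy table (`fx_toy`, `amounts`, and `amount_usd` are hypothetical names) and assumes that multiplying an amount in from_currency by exchange yields the amount in to_currency.

```r
# Toy FX table with the same four columns as `fx` (rates are invented).
fx_toy <- data.frame(
  date          = as.Date(c("2025-03-03", "2025-03-03", "2025-03-04")),
  from_currency = c("USD", "USD", "USD"),
  to_currency   = c("EUR", "GBP", "EUR"),
  exchange      = c(0.95, 0.80, 0.96)
)

# Toy amounts in USD that should be reported in EUR.
amounts <- data.frame(
  date       = as.Date(c("2025-03-03", "2025-03-04")),
  amount_usd = c(100, 200)
)

# Keep only the USD -> EUR rates, then match each amount to its date's rate.
usd_to_eur <- subset(fx_toy, from_currency == "USD" & to_currency == "EUR")
converted <- merge(amounts, usd_to_eur, by = "date")
converted$amount_eur <- converted$amount_usd * converted$exchange
```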
Source
https://github.com/sql-bi/Contoso-Data-Generator-V2-Data/releases/tag/ready-to-use-data
Launch the DuckDB UI in your browser
Description
The launch_ui() function installs and launches the DuckDB UI extension
for an active DuckDB database connection. This allows users to interact
with the database via a web-based graphical interface.
The connection created by create_contoso_duckdb() is returned as the con element of its output list.
Usage
launch_ui(.con)
Arguments
| .con | A valid DuckDB connection |
Details
The function performs the following steps:
- Checks that the provided DuckDB connection is valid. If the connection is invalid, it aborts with a descriptive error message.
- Installs the ui extension into the connected DuckDB instance.
- Calls the start_ui() procedure to launch the DuckDB UI in your browser.
This provides a convenient way to explore and manage DuckDB databases interactively without needing to leave the R environment.
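The steps above can be sketched roughly as follows. This is a hypothetical re-creation for illustration, not the package source: `launch_ui_sketch()` is an invented name, the error is raised with base `stop()`, and the sketch assumes DuckDB loads the ui extension automatically once it is installed.

```r
# Hypothetical re-creation of the documented steps; not the package source.
launch_ui_sketch <- function(.con) {
  # Step 1: validate the connection before touching the database.
  if (!inherits(.con, "DBIConnection") || !DBI::dbIsValid(.con)) {
    stop("`.con` must be a valid, open DuckDB connection.", call. = FALSE)
  }
  # Step 2: install the DuckDB `ui` extension (may need network access).
  DBI::dbExecute(.con, "INSTALL ui")
  # Step 3: start the UI, which DuckDB opens in the default browser.
  DBI::dbExecute(.con, "CALL start_ui()")
  invisible(NULL)
}
```

Checking the class before calling `DBI::dbIsValid()` lets the function fail with a readable message even when the argument is not a connection object at all.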
Value
The function is called for its side effects and does not return a value. It launches the DuckDB UI and opens it in your default web browser.
See Also
- create_contoso_duckdb() for creating example Contoso datasets in DuckDB.
- DBI::dbConnect() and DBI::dbDisconnect() for managing DuckDB connections.
- duckdb::duckdb() for creating a DuckDB driver instance.
Examples
## Not run:
# Connect to DuckDB
db <- create_contoso_duckdb()
# Launch the DuckDB UI
launch_ui(db$con)
# Clean up
DBI::dbDisconnect(db$con, shutdown = TRUE)
## End(Not run)
Order Rows Data from the Contoso Dataset
Description
This dataset contains detailed information about the individual items (rows) within each order in the Contoso dataset. It includes details such as the product, quantity, pricing, and cost of each item in an order. This dataset is useful for analyzing the breakdown of order components and individual product sales.
Usage
orderrows
Format
A data frame with 7 columns:
- order_key: double. Unique identifier for the order to which the item belongs.
- line_number: double. Line number within the order, identifying each product line.
- product_key: double. Unique identifier for the product in the order row.
- quantity: double. The quantity of the product ordered.
- unit_price: double. The price per unit of the product.
- net_price: double. The net price per unit of the product, after any applicable discounts.
- unit_cost: double. The cost per unit of the product.
Source
https://github.com/sql-bi/Contoso-Data-Generator-V2-Data/releases/tag/ready-to-use-data
Order Data from the Contoso Dataset
Description
This dataset contains information about customer orders, including order dates, delivery dates, and store details.
Usage
orders
Format
A data frame with 6 columns:
- order_key: double. Unique identifier for the order.
- customer_key: double. Unique identifier for the customer who placed the order.
- store_key: double. Unique identifier for the store where the order was placed.
- order_date: Date. The date when the order was placed.
- delivery_date: Date. The date when the order is expected to be delivered.
- currency_code: character. The currency code used for the order (e.g., USD, EUR).
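Because orders and orderrows both carry order_key, order-level attributes can be attached to each line item with a join on that column. The sketch below uses small toy data frames (`orders_toy`, `orderrows_toy`, and `days_to_delivery` are hypothetical names) rather than the packaged data.

```r
# Toy order headers with the columns documented above.
orders_toy <- data.frame(
  order_key     = c(1, 2),
  customer_key  = c(101, 102),
  store_key     = c(10, 20),
  order_date    = as.Date(c("2025-03-03", "2025-03-04")),
  delivery_date = as.Date(c("2025-03-06", "2025-03-08")),
  currency_code = c("USD", "EUR")
)

# Toy line items: order 1 has two lines, order 2 has one.
orderrows_toy <- data.frame(
  order_key   = c(1, 1, 2),
  line_number = c(1, 2, 1),
  product_key = c(501, 502, 501),
  quantity    = c(2, 1, 3)
)

# Attach the header columns to every line item via the shared key.
order_lines <- merge(orderrows_toy, orders_toy, by = "order_key")
order_lines$days_to_delivery <-
  as.numeric(order_lines$delivery_date - order_lines$order_date)
```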
Source
https://github.com/sql-bi/Contoso-Data-Generator-V2-Data/releases/tag/ready-to-use-data
Product Data from the Contoso Dataset
Description
This dataset contains information about products in the Contoso dataset. It includes product details such as identifiers, descriptions, pricing, weight, and categorization. This dataset is useful for analyzing product characteristics, pricing, and product-related sales insights.
Usage
product
Format
A data frame with 14 columns:
- product_key: double. Unique identifier for each product.
- product_code: character. A code that uniquely identifies the product.
- product_name: character. The name or description of the product.
- manufacturer: character. The name of the manufacturer of the product.
- brand: character. The brand of the product.
- color: character. The color of the product.
- weight_unit: character. The unit of measurement for the product's weight (e.g., "kg", "lbs").
- weight: double. The weight of the product.
- cost: double. The cost price of the product.
- price: double. The selling price of the product.
- category_key: double. Unique identifier for the category to which the product belongs.
- category_name: character. The name of the category to which the product belongs.
- sub_category_key: double. Unique identifier for the subcategory to which the product belongs.
- sub_category_name: character. The name of the subcategory to which the product belongs.
Source
https://github.com/sql-bi/Contoso-Data-Generator-V2-Data/releases/tag/ready-to-use-data
Sales Data from the Contoso Dataset
Description
This dataset contains information about sales orders, including order details, pricing, and customer data from the Contoso dataset. It provides insights into the transactions that have occurred, including order dates, delivery dates, customer and store information, as well as product details.
Usage
sales
Format
A data frame with 20 columns:
- order_key: double. Unique identifier for each order.
- line_number: double. Line number within the order (for multi-line orders).
- order_date: Date. Date when the order was placed.
- delivery_date: Date. Date when the order was delivered.
- customer_key: double. Unique identifier for the customer who placed the order.
- store_key: double. Unique identifier for the store where the order was placed.
- product_key: double. Unique identifier for the product in the order.
- quantity: double. The quantity of the product ordered.
- unit_price: double. The price per unit of the product.
- net_price: double. The net price per unit of the product, after any discounts.
- unit_cost: double. The cost per unit of the product.
- currency_code: character. The currency code used for the transaction (e.g., USD, EUR).
- exchange_rate: double. The exchange rate applied to the currency, if applicable.
- gross_revenue: double. A product's unit_price multiplied by quantity.
- net_revenue: double. A product's net_price multiplied by quantity.
- unit_discount: double. A product's unit_price minus net_price.
- discounts: double. A product's unit_discount multiplied by quantity.
- cogs: double. A product's unit_cost multiplied by quantity.
- margin: double. A product's net_revenue minus cogs.
- unit_margin: double. A product's margin divided by quantity.
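The derived measures follow directly from quantity, unit_price, net_price, and unit_cost. As a worked example on a single made-up order line (the values are invented, not taken from the dataset), the definitions above translate to:

```r
# One toy order line: 3 units listed at 10, sold at 9, costing 6 each.
line <- data.frame(quantity = 3, unit_price = 10, net_price = 9, unit_cost = 6)

line$gross_revenue <- line$unit_price * line$quantity     # 10 * 3 = 30
line$net_revenue   <- line$net_price * line$quantity      #  9 * 3 = 27
line$unit_discount <- line$unit_price - line$net_price    # 10 - 9 = 1
line$discounts     <- line$unit_discount * line$quantity  #  1 * 3 = 3
line$cogs          <- line$unit_cost * line$quantity      #  6 * 3 = 18
line$margin        <- line$net_revenue - line$cogs        # 27 - 18 = 9
line$unit_margin   <- line$margin / line$quantity         #  9 / 3 = 3
```

Note that gross_revenue minus discounts equals net_revenue (30 - 3 = 27), which is a quick consistency check on the definitions.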
Source
https://github.com/sql-bi/Contoso-Data-Generator-V2-Data/releases/tag/ready-to-use-data
Store Data from the Contoso Dataset
Description
This dataset contains information about stores within the Contoso dataset. It includes details about the store's geographic location, operational status, and physical characteristics such as size and opening/closing dates. It provides insights into the store network of the company.
Usage
store
Format
A data frame with 11 columns:
- store_key: double. Unique identifier for each store.
- store_code: double. A code that uniquely identifies the store.
- geo_area_key: double. Unique identifier for the geographical area where the store is located.
- country_code: character. The country code where the store is located (e.g., "US", "DE").
- country_name: character. The full name of the country where the store is located.
- state: character. The state or province where the store is located.
- open_date: Date. The date when the store was opened.
- close_date: Date. The date when the store was closed, if applicable.
- description: character. A description of the store (e.g., "Flagship store", "Outlet store").
- square_meters: double. The physical size of the store in square meters.
- status: character. The operational status of the store (e.g., "Open", "Closed").
Source
https://github.com/sql-bi/Contoso-Data-Generator-V2-Data/releases/tag/ready-to-use-data