{brickster} is the R toolkit for Databricks. It includes:

- Wrappers for Databricks APIs (e.g. `db_cluster_list`, `db_volume_read`)
- Browse workspace assets via the RStudio Connections Pane (`open_workspace()`)
- {DBI} + {dbplyr} backend (no more ODBC installs!)
- Interactive Databricks REPL
```r
library(brickster)

# only requires `DATABRICKS_HOST` if using OAuth U2M
# first request will open browser window to login
Sys.setenv(DATABRICKS_HOST = "https://<workspace-prefix>.cloud.databricks.com")

# open RStudio/Positron connection pane to view Databricks resources
open_workspace()

# list all SQL warehouses
warehouses <- db_sql_warehouse_list()
```
Refer to the “Connect to a Databricks Workspace” article for more details on getting authentication configured.
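If you authenticate with a personal access token rather than OAuth U2M, the host and token can be supplied as environment variables. A minimal sketch, assuming `DATABRICKS_HOST` and `DATABRICKS_TOKEN` are the variables the package reads (the article above covers the details, including persisting them in `.Renviron`):

```r
library(brickster)

# PAT-based sketch: host and token are picked up from environment variables
# (placing these in ~/.Renviron avoids setting them every session)
Sys.setenv(
  DATABRICKS_HOST  = "https://<workspace-prefix>.cloud.databricks.com",
  DATABRICKS_TOKEN = "<personal-access-token>"
)

# subsequent calls authenticate with the token
warehouses <- db_sql_warehouse_list()
```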
## {DBI} Backend

```r
library(brickster)
library(DBI)

# Connect to Databricks using DBI (assumes you followed quickstart to authenticate)
con <- dbConnect(
  DatabricksSQL(),
  warehouse_id = "<warehouse-id>"
)

# Standard {DBI} operations
tables <- dbListTables(con)
dbGetQuery(con, "SELECT * FROM samples.nyctaxi.trips LIMIT 5")

# Use with {dbplyr} for {dplyr} syntax
library(dplyr)
library(dbplyr)

nyc_taxi <- tbl(con, I("samples.nyctaxi.trips"))

result <- nyc_taxi |>
  filter(year(tpep_pickup_datetime) == 2016) |>
  group_by(pickup_zip) |>
  summarise(
    trip_count = n(),
    avg_fare = mean(fare_amount, na.rm = TRUE),
    avg_distance = mean(trip_distance, na.rm = TRUE)
  ) |>
  collect()
```
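Continuing the session above, it can help to inspect the SQL that {dbplyr} generates before collecting results. A small follow-up sketch using standard {dbplyr}/{DBI} functions: `show_query()` prints the translated query without pulling any data, and `dbDisconnect()` closes the connection when you are done.

```r
# preview the SQL that would run on the warehouse (no data is pulled)
nyc_taxi |>
  filter(year(tpep_pickup_datetime) == 2016) |>
  count(pickup_zip) |>
  show_query()

# close the connection when finished
dbDisconnect(con)
```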
Files can be written to and read from Unity Catalog volumes:

```r
library(readr)
library(brickster)

# upload `data.csv` to a volume
local_file <- tempfile(fileext = ".csv")
write_csv(x = iris, file = local_file)
db_volume_write(
  path = "/Volumes/<catalog>/<schema>/<volume>/data.csv",
  file = local_file
)

# read `data.csv` from a volume and write to a local file
downloaded_file <- tempfile(fileext = ".csv")
file <- db_volume_read(
  path = "/Volumes/<catalog>/<schema>/<volume>/data.csv",
  destination = downloaded_file
)
volume_csv <- read_csv(downloaded_file)
```
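Volumes store arbitrary files, so the same pair of functions works beyond CSVs. A short sketch re-using `db_volume_write()` and `db_volume_read()` to round-trip an RDS file (the volume path is a placeholder):

```r
library(brickster)

# serialise an R object locally, then upload it to the volume
local_rds <- tempfile(fileext = ".rds")
saveRDS(mtcars, local_rds)
db_volume_write(
  path = "/Volumes/<catalog>/<schema>/<volume>/mtcars.rds",
  file = local_rds
)

# download the file again and restore the object
downloaded_rds <- tempfile(fileext = ".rds")
db_volume_read(
  path = "/Volumes/<catalog>/<schema>/<volume>/mtcars.rds",
  destination = downloaded_rds
)
mtcars_copy <- readRDS(downloaded_rds)
```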
Run commands against an existing interactive Databricks cluster; read this article for more details.
```r
library(brickster)

# commands after this will run on the interactive cluster
# read the vignette for more details
db_repl(cluster_id = "<interactive_cluster_id>")
```
Install from CRAN:

```r
install.packages("brickster")
```

Or install the development version from GitHub:

```r
# install.packages("pak")
pak::pak("databrickslabs/brickster")
```
{brickster} is deliberate about which APIs it wraps: it isn't intended to replace IaC tooling (e.g. Terraform) or to be used for account/workspace administration.
| API                     | Available | Version |
|-------------------------|-----------|---------|
| DBFS                    | Yes       | 2.0     |
| Secrets                 | Yes       | 2.0     |
| Repos                   | Yes       | 2.0     |
| mlflow Model Registry   | Yes       | 2.0     |
| Clusters                | Yes       | 2.0     |
| Libraries               | Yes       | 2.0     |
| Workspace               | Yes       | 2.0     |
| Endpoints               | Yes       | 2.0     |
| Query History           | Yes       | 2.0     |
| Jobs                    | Yes       | 2.1     |
| Volumes (Files)         | Yes       | 2.0     |
| SQL Statement Execution | Yes       | 2.0     |
| REST 1.2 Commands       | Partially | 1.2     |
| Unity Catalog - Tables  | Yes       | 2.1     |
| Unity Catalog - Volumes | Yes       | 2.1     |
| Unity Catalog           | Partially | 2.1     |