parabar is a
package designed to provide a simple interface for executing tasks in
parallel, while also providing functionality for tracking and displaying
the progress of the tasks.
This package is aimed at two audiences: (1) end-users who want to
execute a task in parallel in an interactive R session and
track the execution progress, and (2) R package developers
who want to use parabar as a
solution for parallel processing in their packages.
You can install parabar directly from CRAN
using the following command:
# Install the package from `CRAN`.
install.packages("parabar")
# Load the package.
library(parabar)
Alternatively, you can install the latest development version
from GitHub via:
# Install the package from `GitHub`.
remotes::install_github("mihaiconstantin/parabar")
# Load the package.
library(parabar)
Below you can find a few examples of how to use parabar in
your R scripts, both for end-users and for developers. All
examples below assume that you have already installed and loaded the
package.
In general, the usage of parabar
consists of the following steps:
1. Start a backend for parallel processing.
2. Execute a task in parallel.
3. Stop the backend.
Optionally, you can also configure the progress bar if the backend supports progress tracking, or perform additional operations on the backend.
The simplest, and perhaps least interesting, way to use parabar is
by requesting a synchronous backend.
# Start a synchronous backend.
backend <- start_backend(cores = 4, cluster_type = "psock", backend_type = "sync")
# Run a task in parallel.
results <- par_sapply(backend, 1:1000, function(x) {
    # Sleep a bit.
    Sys.sleep(0.01)
    # Compute and return.
    x + 1
})
At this point you will notice the following warning message:
Warning message:
Progress tracking not supported for backend of type 'SyncBackend'.
This happens because progress tracking only works for
asynchronous backends, and parabar
enables progress tracking by default at load time. We can disable it
via the package options to get rid of the warning message.
# Disable progress tracking.
set_option("progress_track", FALSE)
We can verify that the warning message is gone by running the task again, reusing the backend we created earlier.
# Run a task in parallel.
results <- par_sapply(backend, 1:1000, function(x) {
    # Sleep a bit.
    Sys.sleep(0.01)
    # Compute and return.
    x + 1
})
When we are done with this backend, we can stop it to free up the resources.
# Stop the backend.
stop_backend(backend)
The more interesting way to use parabar is
by requesting an asynchronous backend. This is the default backend type,
and highlights the strengths of the package.
First, let’s ensure progress tracking is enabled (recall that we disabled it above).
# Enable progress tracking.
set_option("progress_track", TRUE)
Now, we can proceed with creating the backend and running the task.
# Start an asynchronous backend.
backend <- start_backend(cores = 4, cluster_type = "psock", backend_type = "async")
# Run a task in parallel.
results <- par_sapply(backend, 1:1000, function(x) {
    # Sleep a bit.
    Sys.sleep(0.01)
    # Compute and return.
    x + 1
})
At this point, we can see that the progress bar is displayed, and that the progress is tracked. The progress bar is updated in real-time, after each task execution, e.g.:
> completed 928 out of 1000 tasks [ 93%] [ 3s]
We can also configure the progress bar. For example, suppose we want to display an actual progress bar.
# Change the progress bar options.
configure_bar(type = "modern", format = "[:bar] :percent")
# Run a task in parallel.
results <- par_sapply(backend, 1:1000, function(x) {
    # Sleep a bit.
    Sys.sleep(0.01)
    # Compute and return.
    x + 1
})
The progress bar will now look like this:
[====================>-------------------------------------------------] 30%
By default, parabar uses
the progress
package to display the progress bar. However, we can easily swap it with
another progress bar engine. For example, suppose we want to use the
built-in utils::txtProgressBar.
# Change to and adjust the style of the `basic` progress bar.
configure_bar(type = "basic", style = 3)
# Run a task in parallel.
results <- par_sapply(backend, 1:1000, function(x) {
    # Sleep a bit.
    Sys.sleep(0.01)
    # Compute and return.
    x + 1
})
Check out ?configure_bar for more information on the
possible ways of configuring the progress bar.
We can also disable the progress bar for asynchronous backends altogether, by adjusting the package options.
# Disable progress tracking.
set_option("progress_track", FALSE)
# Run a task in parallel.
results <- par_sapply(backend, 1:1000, function(x) {
    # Sleep a bit.
    Sys.sleep(0.01)
    # Compute and return.
    x + 1
})
We can stop the backend when we are done.
# Stop the backend.
stop_backend(backend)
Finally, we can also use the ?par_sapply function without a
backend, which will resort to running the task sequentially by means of
base::sapply.
# Run the task sequentially using the `base::sapply`.
results <- par_sapply(backend = NULL, 1:300, function(x) {
    # Sleep a bit.
    Sys.sleep(0.01)
    # Compute and return.
    x + 1
})
As indicated above, the general workflow consists of starting a backend, executing a task in parallel, and stopping the backend. However, there are additional operations that can be performed on a backend (see the Developers section). The table below lists all available operations that can be performed on a backend.
| Operation | Description |
|---|---|
| start_backend(backend) | Start a backend. |
| stop_backend(backend) | Stop a backend. |
| clear(backend) | Remove all objects from a backend. |
| peek(backend) | List the names of the variables on a backend. |
| export(backend, variables, environment) | Export objects to a backend. |
| evaluate(backend, expression) | Evaluate expressions on a backend. |
| par_sapply(backend, x, fun) | Run tasks in parallel on a backend. |
| par_lapply(backend, x, fun) | Run tasks in parallel on a backend. |
| par_apply(backend, x, margin, fun) | Run tasks in parallel on a backend. |
Check the documentation corresponding to each operation for more information and examples.
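As a rough illustration of the operations in the table, the sketch below exports a variable to a backend, inspects it, evaluates an expression on it, runs two tasks, and then clears the backend. It assumes a synchronous backend with two cores, so that each call returns its result directly; the variable name `offset` and the example inputs are made up for this illustration.

```r
# Load the package.
library(parabar)

# Disable progress tracking, since synchronous backends do not support it.
set_option("progress_track", FALSE)

# Start a synchronous backend with two cores.
backend <- start_backend(cores = 2, cluster_type = "psock", backend_type = "sync")

# Create a variable in the current environment (hypothetical name).
offset <- 10

# Export the variable to the backend.
export(backend, "offset", environment())

# List the names of the variables available on the backend.
peek(backend)

# Evaluate an expression on the backend, where `offset` is resolved.
evaluate(backend, offset + 1)

# Use the exported variable in a task that returns a list.
results <- par_lapply(backend, 1:4, function(x) x + offset)

# Apply a function over the rows of a matrix.
row_sums <- par_apply(backend, matrix(1:6, nrow = 2), 1, sum)

# Remove all objects from the backend.
clear(backend)

# Stop the backend.
stop_backend(backend)
```

Note that the sketch leaves progress tracking disabled; it can be enabled again with set_option("progress_track", TRUE).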
parabar
provides a rich API for developers who want to use the package in their
own projects.
From a high-level perspective, the package consists of
backends and
contexts in which these backends are
employed for executing tasks in parallel.
A backend represents a set of
operations, defined by the ?BackendService interface.
Backends can be synchronous (i.e., ?SyncBackend) or
asynchronous (i.e., ?AsyncBackend). The former will block
the execution of the current R session until the parallel
task is completed, while the latter will return immediately and the task
will be executed in a background R session.
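To make the blocking behaviour concrete, here is a minimal sketch that uses a synchronous backend through the R6 classes described later in this section. The number of cores and the task itself are arbitrary choices for illustration.

```r
# Load the package.
library(parabar)

# Describe the cluster to create.
specification <- Specification$new()
specification$set_cores(2)
specification$set_type("psock")

# Obtain a synchronous backend from the factory and start it.
backend <- BackendFactory$new()$get("sync")
backend$start(specification)

# This call blocks the current session until all tasks are done.
backend$sapply(1:10, function(x) x + 1)

# The output is available as soon as the call above returns.
results <- backend$get_output()

# Stop the backend.
backend$stop()
```

With an asynchronous backend, the same sapply call would return immediately, and the output would have to be fetched once the task completes.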
The ?BackendService interface defines the following
operations:
- start: Start the backend.
- stop: Stop the backend.
- clear: Remove all objects from the backend.
- peek: Show the variable names available on the backend.
- export: Export variables from a given environment to the backend.
- evaluate: Evaluate an arbitrary expression on the backend.
- sapply: Run a task on the backend.
- lapply: Run a task on the backend.
- apply: Run a task on the backend.
- get_output: Get the output of the task execution.
Check out the documentation for BackendService for more
information on each method.
A context represents the specific
conditions in which a backend object operates. The default context class
(i.e., ?Context) simply forwards the call to the
corresponding backend method. However, a more complex context can
augment the operation before forwarding the call to the backend. One
example of a complex context is the
?ProgressTrackingContext class. This class extends the
regular ?Context class and decorates, e.g., the backend
sapply operation to log the progress after each task
execution and display a progress bar.
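The decoration idea can be illustrated without any package code. The sketch below is not how parabar implements its contexts; it is a simplified base R analogy in which `plain_sapply` stands in for a backend operation and `with_progress` wraps it to report progress after each task. Both names are hypothetical.

```r
# A stand-in for a backend operation (hypothetical, not part of parabar).
plain_sapply <- function(x, fun) {
    sapply(x, fun)
}

# A decorator that adds progress reporting before forwarding the call.
with_progress <- function(operation) {
    function(x, fun) {
        # Create a text progress bar spanning the number of tasks.
        bar <- utils::txtProgressBar(min = 0, max = length(x), style = 3)

        # Close the bar when the decorated operation finishes.
        on.exit(close(bar))

        # Count the completed tasks.
        completed <- 0

        # Wrap the task so that each execution advances the bar.
        tracked_fun <- function(...) {
            result <- fun(...)
            completed <<- completed + 1
            utils::setTxtProgressBar(bar, completed)
            result
        }

        # Forward the call to the original operation.
        operation(x, tracked_fun)
    }
}

# Decorate the operation and use it like the original.
tracked_sapply <- with_progress(plain_sapply)
results <- tracked_sapply(1:20, function(x) x + 1)
```

The caller uses `tracked_sapply` exactly like `plain_sapply`, which mirrors how a context exposes the same operations as the backend it wraps.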
The following are the main classes provided by parabar:
- BackendService: Interface for backend operations.
- SyncBackend: Synchronous backend extending the abstract Backend class and implementing the BackendService interface.
- AsyncBackend: Asynchronous backend extending the abstract Backend class and implementing the BackendService interface.
- Specification: Backend specification used when starting a backend.
- BackendFactory: Factory for creating Backend objects.
- Context: Default context for executing backend operations without interference.
- ProgressTrackingContext: Context for decorating the sapply operation to track and display the progress.
- ContextFactory: Factory for creating Context objects.
- UserApiConsumer: Wrapper around the developer API.
Additionally, parabar also
provides several classes for creating and updating different progress
bars, namely:
- BasicBar: A simple, but robust, bar created via utils::txtProgressBar, extending the Bar abstract class.
- ModernBar: A modern bar created via progress::progress_bar, extending the Bar abstract class.
- BarFactory: Factory for creating Bar objects.
Below is an example of how to use the package's R6 class
API.
We start by creating a ?Specification object instructing
the ?Backend object how to create a cluster via the
built-in function parallel::makeCluster.
# Create a specification object.
specification <- Specification$new()
specification$set_cores(4)
specification$set_type("psock")
We proceed by obtaining an asynchronous backend instance from the
?BackendFactory and starting the backend using the
?Specification instance above.
# Create a backend factory.
backend_factory <- BackendFactory$new()
# Get an asynchronous backend instance.
backend <- backend_factory$get("async")
# Start the backend.
backend$start(specification)
Finally, we can run a task in parallel by calling, e.g., the
sapply method on the backend instance.
# Run a task in parallel.
backend$sapply(1:1000, function(x) {
    # Sleep a bit.
    Sys.sleep(0.01)
    # Compute and return.
    x + 1
})
At this point, the task was deployed in a background R
session, and the caller process is free to do other things.
Calling backend$get_output immediately after the
backend$sapply call will throw an error, indicating that
the task is still running, i.e.:
Error: A task is currently running.
We can, however, block the caller process and wait for the task to complete before fetching the results.
results <- backend$get_output(wait = TRUE)
We can now introduce the context concept to decorate the
backend instance and, in this example, track the progress
of the task. First, we obtain a ?Context instance from the
?ContextFactory. Furthermore, since we are using an
asynchronous backend, we can request a context that facilitates
progress-tracking.
# Create a context factory.
context_factory <- ContextFactory$new()
# Get a progress-tracking context.
context <- context_factory$get("progress")
# Register the backend with the context.
context$set_backend(backend)
The ?Context class (and its subclasses)
implements the ?BackendService interface, which means that
we can use it to execute backend operations.
Since we are using the ?ProgressTrackingContext context,
we also need to register a ?Bar instance with the context.
First, let’s obtain a ?Bar instance from the
?BarFactory.
# Create a bar factory.
bar_factory <- BarFactory$new()
# Get a `modern` bar (i.e., via `progress::progress_bar`).
bar <- bar_factory$get("modern")
We can now register the bar instance with the
context instance.
# Register the `bar` with the `context`.
context$set_bar(bar)
We may also configure the bar, or change its appearance.
For instance, it may be a good idea to show the progress bar right
away.
# Configure the `bar`.
context$configure_bar(
    show_after = 0,
    format = " > completed :current out of :total tasks [:percent] [:elapsed]"
)
At this point, the backend$sapply operation is decorated
with progress tracking. Finally, we can run the task in parallel and
enjoy our progress bar using the context instance.
# Run a task in parallel with progress tracking.
context$sapply(1:1000, function(x) {
    # Sleep a bit.
    Sys.sleep(0.01)
    # Compute and return.
    x + 1
})
All there is left to do is to fetch the results and stop the backend.
# Get the results.
results <- context$get_output()
# Stop the backend.
context$stop()
Check out the UML diagram below for a quick overview of the package design.
Note. For the sake of clarity, the diagram
only displays the sapply operation for running tasks in
parallel. However, other operations are supported as well (see the
table in the section Additional Operations).
The documentation, vignettes, and other website materials by Mihai Constantin are licensed under CC BY 4.0.