Efficient Estimation of Bid-Ask Spreads from Open, High, Low, and Close Prices

This vignette illustrates how to estimate bid-ask spreads from open, high, low, and close prices. Let’s start by loading the package:

library(bidask)

The package offers two ways to estimate bid-ask spreads:

  1. edge(): designed for tidy data.
  2. spread(): designed for xts objects.

The function edge() implements the efficient estimator described in Ardia, Guidotti, & Kroencke (2024). Open, high, low, and close prices are passed as separate vectors.
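To try edge() without downloading any data, one option is to simulate trades with a known spread and aggregate them into bars. The sketch below uses base R for the simulation and only calls bidask::edge() if the package is installed. The variable names (n_bars, n_trades, true_spread) and the simulation design are illustrative assumptions, not the setup used in the paper.

```r
# Simulate OHLC bars from trade prices with a known 1% spread (base R only).
set.seed(42)
n_bars <- 1000       # number of bars (e.g. days)
n_trades <- 390      # trades per bar
true_spread <- 0.01  # proportional spread of 1%

# The log mid-price follows a random walk across all trades.
log_mid <- cumsum(rnorm(n_bars * n_trades, sd = 0.03 / sqrt(n_trades)))
# Each trade executes at the ask (+1) or at the bid (-1) with equal probability.
side <- sample(c(-1, 1), n_bars * n_trades, replace = TRUE)
price <- exp(log_mid) * (1 + side * true_spread / 2)

# Aggregate trades into bars: each matrix column holds the trades of one bar.
p <- matrix(price, nrow = n_trades, ncol = n_bars)
ohlc <- data.frame(
  open  = p[1, ],
  high  = apply(p, 2, max),
  low   = apply(p, 2, min),
  close = p[n_trades, ]
)

# Pass the four price vectors to edge() if bidask is available.
if (requireNamespace("bidask", quietly = TRUE)) {
  est <- bidask::edge(ohlc$open, ohlc$high, ohlc$low, ohlc$close)
  print(est)  # should be in the neighbourhood of true_spread, up to noise
}
```

Because the true spread is known here, the estimate can be compared directly against true_spread.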

The function spread() requires an xts object with columns named Open, High, Low, and Close. It offers extra functionality, such as additional estimators and rolling estimates.

An output value of 0.01 corresponds to a spread estimate of 1%.
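As a small illustration of the units, estimates in decimal form can be converted to percent or basis points by plain multiplication. The second value below is a made-up example.

```r
# Convert spread estimates from decimal units to percent and basis points.
estimate <- c(0.01, 0.0025)      # 0.0025 is a made-up example value
percent <- estimate * 100        # 0.01 -> 1 (%)
basis_points <- estimate * 1e4   # 0.01 -> 100 (bps)
print(data.frame(estimate, percent, basis_points))
```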

Examples are provided below.

Tidy data

The function edge() can be easily used with tidy data and the dplyr grammar. In the following example, we estimate bid-ask spreads for cryptocurrencies.

Download daily prices for Bitcoin and Ethereum using the crypto2 package:

library(dplyr)
library(crypto2)
df <- crypto_list(only_active=TRUE) %>%
  filter(symbol %in% c("BTC", "ETH")) %>%
  crypto_history(start_date = "20200101", end_date = "20221231")
#> ❯ Scraping historical crypto data
#> ❯ Processing historical crypto data
head(df)
#> # A tibble: 6 × 17
#>      id slug    name    symbol timestamp           ref_cur_id ref_cur_name
#>   <int> <chr>   <chr>   <chr>  <dttm>              <chr>      <chr>       
#> 1     1 bitcoin Bitcoin BTC    2020-01-01 23:59:59 2781       USD         
#> 2     1 bitcoin Bitcoin BTC    2020-01-02 23:59:59 2781       USD         
#> 3     1 bitcoin Bitcoin BTC    2020-01-03 23:59:59 2781       USD         
#> 4     1 bitcoin Bitcoin BTC    2020-01-04 23:59:59 2781       USD         
#> 5     1 bitcoin Bitcoin BTC    2020-01-05 23:59:59 2781       USD         
#> 6     1 bitcoin Bitcoin BTC    2020-01-06 23:59:59 2781       USD         
#> # ℹ 10 more variables: time_open <dttm>, time_close <dttm>, time_high <dttm>,
#> #   time_low <dttm>, open <dbl>, high <dbl>, low <dbl>, close <dbl>,
#> #   volume <dbl>, market_cap <dbl>

Estimate the spread for each coin in each year:

df %>%
  mutate(yyyy = format(timestamp, "%Y")) %>%
  group_by(symbol, yyyy) %>%
  arrange(timestamp) %>%
  summarise(EDGE = edge(open, high, low, close))
#> # A tibble: 6 × 3
#> # Groups:   symbol [2]
#>   symbol yyyy      EDGE
#>   <chr>  <chr>    <dbl>
#> 1 BTC    2020  0.00319 
#> 2 BTC    2021  0.00376 
#> 3 BTC    2022  0.000200
#> 4 ETH    2020  0.00223 
#> 5 ETH    2021  0.00628 
#> 6 ETH    2022  0.00262
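For readers who do not use dplyr, the same split-apply logic can be sketched in base R. The helper estimate_by_group() and the toy data are hypothetical and exist only for this illustration. Any function that takes open, high, low, and close vectors can be plugged in as the estimator.

```r
# Hypothetical helper: split rows by one or more grouping columns, sort each
# group by time, and apply an estimator taking (open, high, low, close).
estimate_by_group <- function(df, by, estimator) {
  groups <- split(df, df[by], drop = TRUE)
  sapply(groups, function(g) {
    g <- g[order(g$timestamp), ]
    estimator(g$open, g$high, g$low, g$close)
  })
}

# Toy data for two made-up symbols (random prices, not real market data).
set.seed(1)
make_toy <- function(symbol, n = 200) {
  close <- 100 * exp(cumsum(rnorm(n, sd = 0.02)))
  open <- c(100, head(close, -1))
  data.frame(
    symbol = symbol,
    timestamp = as.Date("2020-01-01") + seq_len(n) - 1,
    open = open,
    high = pmax(open, close) * (1 + abs(rnorm(n, sd = 0.005))),
    low = pmin(open, close) * (1 - abs(rnorm(n, sd = 0.005))),
    close = close
  )
}
toy <- rbind(make_toy("AAA"), make_toy("BBB"))

# Placeholder "estimator" that only counts observations, to show the mechanics.
counts <- estimate_by_group(toy, "symbol", function(o, h, l, c) length(o))
print(counts)

# With bidask installed, plug in edge() instead.
if (requireNamespace("bidask", quietly = TRUE)) {
  print(estimate_by_group(toy, "symbol", bidask::edge))
}
```

Applied to the crypto data above, the grouping columns would be c("symbol", "yyyy") once the yyyy column has been added.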

xts objects

The function spread() provides additional functionalities for xts objects. In the following example, we estimate bid-ask spreads for equities.

Download daily data for Microsoft (MSFT) using the quantmod package:

library(quantmod)
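# Note: getSymbols() restricts the date range via `from` and `to`
# (e.g. from = "2019-01-01", to = "2022-12-31"), not `start` and `end`.
# The output below was produced without a date filter, so it covers the full default history.
x <- getSymbols("MSFT", auto.assign = FALSE)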
head(x)
#>            MSFT.Open MSFT.High MSFT.Low MSFT.Close MSFT.Volume MSFT.Adjusted
#> 2007-01-03     29.91     30.25    29.40      29.86    76935100      21.32047
#> 2007-01-04     29.70     29.97    29.44      29.81    45774500      21.28477
#> 2007-01-05     29.63     29.75    29.45      29.64    44607200      21.16339
#> 2007-01-08     29.65     30.10    29.53      29.93    50220200      21.37045
#> 2007-01-09     30.00     30.18    29.73      29.96    44636600      21.39187
#> 2007-01-10     29.80     29.89    29.43      29.66    55017400      21.17767

This is an xts object:

class(x)
#> [1] "xts" "zoo"

So we can estimate the spread with:

spread(x)
#>                   EDGE
#> 2024-08-16 0.005442099

By default, the call above is equivalent to:

edge(open = x[,1], high = x[,2], low = x[,3], close = x[,4])
#> [1] 0.005442099

But spread() also provides additional functionalities. For instance, estimate the spread for each month and plot the estimates:

sp <- spread(x, width = endpoints(x, on = "months"))
plot(sp)
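To see what is passed as width in the monthly example, it can help to inspect the output of endpoints() on a small series. The sketch below builds a hypothetical daily series covering January to March 2022 and assumes the xts package is installed.

```r
# endpoints() returns the row index of the last observation in each period,
# preceded by 0. Consecutive pairs delimit the periods.
if (requireNamespace("xts", quietly = TRUE)) {
  library(xts)
  idx <- as.Date("2022-01-01") + 0:89        # 2022-01-01 to 2022-03-31
  z <- xts(seq_along(idx), order.by = idx)   # toy daily series
  ep <- endpoints(z, on = "months")
  print(ep)
  #> [1]  0 31 59 90
}
```

Here rows 1 to 31 form January, rows 32 to 59 February, and rows 60 to 90 March, which is how the monthly call above obtains one window per calendar month.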

Or estimate the spread using a rolling window of 21 observations:

sp <- spread(x, width = 21)
plot(sp)
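The additional estimators mentioned at the start of this vignette are selected through spread() as well. The sketch below is hedged: it assumes that the installed version of bidask provides the simulator sim() and a method argument accepting names such as "EDGE", "AR", "CS", and "ROLL", as listed in the package reference. Check ?spread and ?sim for the options available in your version.

```r
# Compare several estimators on simulated prices with a known 1% spread.
# Assumption: sim() and the `method` argument exist as in the package reference.
if (requireNamespace("bidask", quietly = TRUE)) {
  library(bidask)
  set.seed(1)
  x_sim <- sim(n = 1000, spread = 0.01)   # simulated OHLC prices (xts)
  est <- spread(x_sim, method = c("EDGE", "AR", "CS", "ROLL"))
  print(est)                              # one column per estimator
}
```

Using simulated data keeps the comparison reproducible and offline, and the known spread gives a reference point for each estimator.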

To illustrate higher-frequency estimates, we are going to download intraday data from Alpha Vantage. You must register with Alpha Vantage in order to download their data, but the one-time registration is fast and free. Register at https://www.alphavantage.co/ to receive your key. You can set the API key globally as follows:

setDefaults(getSymbols.av, api.key = "<API-KEY>")
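To keep the key out of scripts, one possible variation reads it from an environment variable. The variable name AV_API_KEY is an assumption made for this sketch and is not something quantmod or Alpha Vantage prescribes.

```r
# Read the API key from an environment variable (AV_API_KEY is a made-up name).
api_key <- Sys.getenv("AV_API_KEY", unset = "")

if (!nzchar(api_key)) {
  message("AV_API_KEY is not set; skipping Alpha Vantage configuration.")
} else if (requireNamespace("quantmod", quietly = TRUE)) {
  library(quantmod)
  # Same call as above, with the key taken from the environment.
  setDefaults(getSymbols.av, api.key = api_key)
}
```

The variable can be defined, for example, in the user's .Renviron file so that it is available in every session.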

Download minute data for Microsoft:

x <- getSymbols(
  Symbols = "MSFT", 
  auto.assign = FALSE, 
  src = "av", 
  periodicity = "intraday", 
  interval = "1min", 
  output.size = "full")
head(x)
#>                     MSFT.Open MSFT.High MSFT.Low MSFT.Close MSFT.Volume
#> 2023-08-17 04:00:00    319.20    322.00   319.20     320.39         992
#> 2023-08-17 04:01:00    320.03    320.18   320.00     320.18         914
#> 2023-08-17 04:02:00    320.38    320.38   320.35     320.35         170
#> 2023-08-17 04:03:00    320.35    320.35   320.06     320.34          96
#> 2023-08-17 04:04:00    320.34    320.34   320.34     320.34          17
#> 2023-08-17 04:05:00    320.34    320.34   320.29     320.30          11

Estimate the spread for each day and plot the estimates:

sp <- spread(x, width = endpoints(x, on = "days"))
plot(sp)

GitHub

If you find this package useful, please star the repo!

The repository also contains implementations for Python, C++, MATLAB, and more.

Cite as

Ardia, D., Guidotti, E., Kroencke, T.A. (2024). Efficient Estimation of Bid-Ask Spreads from Open, High, Low, and Close Prices. Journal of Financial Economics, 161, 103916. doi: 10.1016/j.jfineco.2024.103916

A BibTeX entry for LaTeX users is:

@article{edge,
  title = {Efficient estimation of bid–ask spreads from open, high, low, and close prices},
  journal = {Journal of Financial Economics},
  volume = {161},
  pages = {103916},
  year = {2024},
  doi = {10.1016/j.jfineco.2024.103916},
  author = {David Ardia and Emanuele Guidotti and Tim A. Kroencke},
}