Tidygeocoder provides a unified interface for performing both forward and reverse geocoding queries with a variety of geocoding services. In forward geocoding you provide an address to the geocoding service and you get latitude and longitude coordinates in return. In reverse geocoding you provide the latitude and longitude and the geocoding service will return that location’s address. In both cases, other data about the location can be provided by the geocoding service.
The geocode() and geo() functions are for
forward geocoding while the reverse_geocode() and
reverse_geo() functions perform reverse geocoding. The
geocode() and reverse_geocode() functions
extract either addresses (forward geocoding) or coordinates (reverse
geocoding) from the input dataframe and pass this data to the
geo() and reverse_geo() functions respectively
which execute the geocoding queries. All extra arguments
(...) given to geocode() are passed to
geo() and extra arguments given to
reverse_geocode() are passed to
reverse_geo().
library(tibble)
library(dplyr)
library(tidygeocoder)
address_single <- tibble(singlelineaddress = c(
  "11 Wall St, NY, NY",
  "600 Peachtree Street NE, Atlanta, Georgia"
))
address_components <- tribble(
  ~street, ~cty, ~st,
  "11 Wall St", "NY", "NY",
  "600 Peachtree Street NE", "Atlanta", "GA"
)You can use the address argument to specify single-line
addresses. Note that when multiple addresses are provided, the batch
geocoding functionality of the Census geocoding service is used.
Additionally, verbose = TRUE displays logs to the
console.
census_s1 <- address_single %>%
  geocode(address = singlelineaddress, method = "census", verbose = TRUE)
#> 
#> Number of Unique Addresses: 2
#> Executing batch geocoding...
#> Batch limit: 10,000
#> Passing 2 addresses to the US Census batch geocoder
#> Querying API URL: https://geocoding.geo.census.gov/geocoder/locations/addressbatch
#> Passing the following parameters to the API:
#> format : "json"
#> benchmark : "Public_AR_Current"
#> vintage : "Current_Current"
#> Query completed in: 0.1 seconds| singlelineaddress | lat | long | 
|---|---|---|
| 11 Wall St, NY, NY | 40.70711 | -74.01078 | 
| 600 Peachtree Street NE, Atlanta, Georgia | 33.81903 | -84.37582 | 
Alternatively you can run the same query with the geo()
function by passing the address values from the dataframe directly. In
either geo() or geocode(), the
lat and long arguments are used to name the
resulting latitude and longitude fields. Here the method
argument is used to specify the “osm” (Nominatim) geocoding service.
Refer to the geo() function documentation for the possible
values of the method argument.
osm_s1 <- geo(
  address = address_single$singlelineaddress, method = "osm",
  lat = latitude, long = longitude
)
#> Passing 2 addresses to the Nominatim single address geocoder
#> Query completed in: 2 seconds| address | latitude | longitude | 
|---|---|---|
| 11 Wall St, NY, NY | 40.73128 | -73.43475 | 
| 600 Peachtree Street NE, Atlanta, Georgia | 33.77085 | -84.38614 | 
Instead of single-line addresses, you can use any combination of the
following arguments to specify your addresses: street,
city, state, county,
postalcode, and country.
census_c1 <- address_components %>%
  geocode(street = street, city = cty, state = st, method = "census")
#> Passing 2 addresses to the US Census batch geocoder
#> Query completed in: 0.2 seconds| street | cty | st | lat | long | 
|---|---|---|---|---|
| 11 Wall St | NY | NY | 40.70711 | -74.01078 | 
| 600 Peachtree Street NE | Atlanta | GA | 33.81903 | -84.37582 | 
To return the full geocoding service results (not just latitude and
longitude), specify full_results = TRUE. Additionally, for
the Census geocoder you can get fields for geographies such as Census
tracts by specifying
api_options = list(census_return_type = 'geographies'). Be
sure to use full_results = TRUE with the “geographies”
return type in order to allow the Census geography columns to be
returned.
census_full1 <- address_single %>% geocode(
  address = singlelineaddress,
  method = "census", full_results = TRUE, api_options = list(census_return_type = 'geographies')
)
#> Passing 2 addresses to the US Census batch geocoder
#> Query completed in: 0.2 seconds| singlelineaddress | lat | long | id | input_address | match_indicator | match_type | matched_address | tiger_line_id | tiger_side | state_fips | county_fips | census_tract | census_block | 
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 11 Wall St, NY, NY | 40.70711 | -74.01078 | 1 | 11 Wall St, NY, NY, , , | Match | Exact | 11 WALL ST, NEW YORK, NY, 10005 | 59659656 | R | 36 | 061 | 000700 | 1004 | 
| 600 Peachtree Street NE, Atlanta, Georgia | 33.81903 | -84.37582 | 2 | 600 Peachtree Street NE, Atlanta, Georgia, , , | Match | Non_Exact | 600 PEACHTREE HILLS CIR NE, ATLANTA, GA, 30305 | 618355955 | L | 13 | 121 | 009302 | 3000 | 
As mentioned earlier, the geocode() function passes
addresses in dataframes to the geo() function for geocoding
so we can also directly use the geo() function in a similar
way:
salz <- geo("Salzburg, Austria", method = "osm", full_results = TRUE) %>%
  select(-licence)
#> Passing 1 address to the Nominatim single address geocoder
#> Query completed in: 1 seconds| address | lat | long | place_id | osm_type | osm_id | class | type | place_rank | importance | addresstype | name | display_name | boundingbox | 
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Salzburg, Austria | 47.79813 | 13.04648 | 62682279 | relation | 86538 | boundary | administrative | 12 | 0.688384 | city | Salzburg | Salzburg, 5020, Österreich | 47.7512115, 47.8543925, 12.9856478, 13.1275256 | 
For reverse geocoding you’ll use reverse_geocode()
instead of geocode() and reverse_geo() instead
of geo(). Note that the reverse geocoding functions are
structured very similarly to the forward geocoding functions and share
many of the same arguments (method, limit,
full_results, etc.). For reverse geocoding you will provide
latitude and longitude coordinates as inputs and the location’s address
will be returned by the geocoding service.
Below, the reverse_geocode() function is used to geocode
coordinates in a dataframe. The lat and long
arguments specify the columns that contain the latitude and longitude
data. The address argument can be used to specify the
single line address column name that is returned from the geocoder. Just
as with forward geocoding, the method argument is used to
specify the geocoding service.
lat_longs1 <- tibble(
  latitude = c(38.895865, 43.6534817),
  longitude = c(-77.0307713, -79.3839347)
)
rev1 <- lat_longs1 %>%
  reverse_geocode(lat = latitude, long = longitude, address = addr, method = "osm")
#> Passing 2 coordinates to the Nominatim single coordinate geocoder
#> Query completed in: 2 seconds| latitude | longitude | addr | 
|---|---|---|
| 38.89587 | -77.03077 | L’Enfant’s plan, Pennsylvania Avenue, Ward 2, Washington, District of Columbia, 20045, United States | 
| 43.65348 | -79.38393 | Toronto City Hall, 100, Queen Street West, Yonge-Bay Corridor, Spadina—Fort York, Toronto, Golden Horseshoe, Ontario, M5H 2N2, Canada | 
The same query can also be performed by passing the latitude and
longitudes directly to the reverse_geo() function. Here we
will use full_results = TRUE so that the full results are
returned (not just the single line address column).
rev2 <- reverse_geo(
  lat = lat_longs1$latitude,
  long = lat_longs1$longitude,
  method = "osm",
  full_results = TRUE
)
#> Passing 2 coordinates to the Nominatim single coordinate geocoder
#> Query completed in: 2 seconds
glimpse(rev2)
#> Rows: 2
#> Columns: 30
#> $ lat              [3m[38;5;246m<dbl>[39m[23m 38.89587, 43.65348
#> $ long             [3m[38;5;246m<dbl>[39m[23m -77.03077, -79.38393
#> $ address          [3m[38;5;246m<chr>[39m[23m "L’Enfant's plan, Pennsylvania Avenue, Ward 2, Washington, District of Columbia, 20045, United States", "Toronto City Hall, 100, Queen Street West, Yonge-Bay Co…
#> $ place_id         [3m[38;5;246m<int>[39m[23m 320497331, 323744191
#> $ licence          [3m[38;5;246m<chr>[39m[23m "Data © OpenStreetMap contributors, ODbL 1.0. http://osm.org/copyright", "Data © OpenStreetMap contributors, ODbL 1.0. http://osm.org/copyright"
#> $ osm_type         [3m[38;5;246m<chr>[39m[23m "way", "way"
#> $ osm_id           [3m[38;5;246m<int>[39m[23m 899927546, 198500761
#> $ osm_lat          [3m[38;5;246m<chr>[39m[23m "38.895859599999994", "43.6536032"
#> $ osm_lon          [3m[38;5;246m<chr>[39m[23m "-77.0306779870984", "-79.38400546703345"
#> $ class            [3m[38;5;246m<chr>[39m[23m "tourism", "amenity"
#> $ type             [3m[38;5;246m<chr>[39m[23m "artwork", "townhall"
#> $ place_rank       [3m[38;5;246m<int>[39m[23m 30, 30
#> $ importance       [3m[38;5;246m<dbl>[39m[23m 8.492943e-05, 4.288063e-01
#> $ addresstype      [3m[38;5;246m<chr>[39m[23m "tourism", "amenity"
#> $ name             [3m[38;5;246m<chr>[39m[23m "L’Enfant's plan", "Toronto City Hall"
#> $ tourism          [3m[38;5;246m<chr>[39m[23m "L’Enfant's plan", NA
#> $ road             [3m[38;5;246m<chr>[39m[23m "Pennsylvania Avenue", "Queen Street West"
#> $ borough          [3m[38;5;246m<chr>[39m[23m "Ward 2", NA
#> $ city             [3m[38;5;246m<chr>[39m[23m "Washington", "Toronto"
#> $ state            [3m[38;5;246m<chr>[39m[23m "District of Columbia", "Ontario"
#> $ `ISO3166-2-lvl4` [3m[38;5;246m<chr>[39m[23m "US-DC", "CA-ON"
#> $ postcode         [3m[38;5;246m<chr>[39m[23m "20045", "M5H 2N2"
#> $ country          [3m[38;5;246m<chr>[39m[23m "United States", "Canada"
#> $ country_code     [3m[38;5;246m<chr>[39m[23m "us", "ca"
#> $ boundingbox      [3m[38;5;246m<list>[39m[23m <"38.8957273", "38.8959688", "-77.0311667", "-77.0301895">, <"43.6529946", "43.6541458", "-79.3848438", "-79.3830415">
#> $ amenity          [3m[38;5;246m<chr>[39m[23m NA, "Toronto City Hall"
#> $ house_number     [3m[38;5;246m<chr>[39m[23m NA, "100"
#> $ city_block       [3m[38;5;246m<chr>[39m[23m NA, "Yonge-Bay Corridor"
#> $ quarter          [3m[38;5;246m<chr>[39m[23m NA, "Spadina—Fort York"
#> $ state_district   [3m[38;5;246m<chr>[39m[23m NA, "Golden Horseshoe"Only unique input data (either addresses or coordinates) is passed to geocoding services even if your data contains duplicates. NA and blank inputs are excluded from queries. Input latitudes and longitudes are also limited to the range of possible values.
Below is an example of how duplicate and missing data is handled by tidygeocoder. As the console messages shows, only the two unique addresses are passed to the geocoding service.
# create a dataset with duplicate and NA addresses
duplicate_addrs <- address_single %>%
  bind_rows(address_single) %>%
  bind_rows(tibble(singlelineaddress = rep(NA, 3)))
duplicates_geocoded <- duplicate_addrs %>%
  geocode(singlelineaddress, verbose = TRUE)
#> 
#> Number of Unique Addresses: 2
#> Passing 2 addresses to the Nominatim single address geocoder
#> 
#> Number of Unique Addresses: 1
#> Querying API URL: https://nominatim.openstreetmap.org/search
#> Passing the following parameters to the API:
#> q : "11 Wall St, NY, NY"
#> limit : "1"
#> format : "json"
#> HTTP Status Code: 200
#> Query completed in: 0.6 seconds
#> Total query time (including sleep): 1 seconds
#> 
#> 
#> Number of Unique Addresses: 1
#> Querying API URL: https://nominatim.openstreetmap.org/search
#> Passing the following parameters to the API:
#> q : "600 Peachtree Street NE, Atlanta, Georgia"
#> limit : "1"
#> format : "json"
#> HTTP Status Code: 200
#> Query completed in: 0.5 seconds
#> Total query time (including sleep): 1 seconds
#> 
#> Query completed in: 2 seconds| singlelineaddress | lat | long | 
|---|---|---|
| 11 Wall St, NY, NY | 40.73128 | -73.43475 | 
| 600 Peachtree Street NE, Atlanta, Georgia | 33.77085 | -84.38614 | 
| 11 Wall St, NY, NY | 40.73128 | -73.43475 | 
| 600 Peachtree Street NE, Atlanta, Georgia | 33.77085 | -84.38614 | 
| NA | NA | NA | 
| NA | NA | NA | 
| NA | NA | NA | 
As shown above, duplicates will not be removed from your results by
default. However, you can return only unique results by using
unique_only = TRUE. Note that passing
unique_only = TRUE to geocode() or
reverse_geocode() will result in the original dataframe
format (including column names) to be discarded in favor of the standard
field names (ie. “address”, ‘lat, ’long’, etc.).
uniqueonly1 <- duplicate_addrs %>%
  geocode(singlelineaddress, unique_only = TRUE)
#> Passing 2 addresses to the Nominatim single address geocoder
#> Query completed in: 2 seconds| address | lat | long | 
|---|---|---|
| 11 Wall St, NY, NY | 40.73128 | -73.43475 | 
| 600 Peachtree Street NE, Atlanta, Georgia | 33.77085 | -84.38614 | 
The geocode_combine() function allows you to execute and
combine the results of multiple geocoding queries. The queries are
specified as a list of lists with the queries parameter and
are executed in the order provided. By default only addresses that are
not found are passed to the next query, but this behavior can be toggled
with the cascade argument.
In the first example below, the US Census service is used for the
first query while the Nominatim (“osm”) service is used for the second
query. The global_params argument passes the
address column from the input dataset to both queries.
addresses_combine <- tibble(
  address = c('100 Wall Street NY, NY', 'Paris', 'Not An Address')
)
cascade_results1 <- addresses_combine %>%
  geocode_combine(
    queries = list(
      list(method = 'census'),
      list(method = 'osm')
    ),
    global_params = list(address = 'address')
  )
#> 
#> Passing 3 addresses to the US Census batch geocoder
#> Query completed in: 0.2 seconds
#> Passing 3 addresses to the Nominatim single address geocoder
#> Query completed in: 3 seconds| address | lat | long | query | 
|---|---|---|---|
| 100 Wall Street NY, NY | 40.70522 | -74.006800 | osm | 
| Paris | 48.85350 | 2.348391 | osm | 
| Not An Address | NA | NA | 
If cascade is set to FALSE then all addresses are
attempted by each query regardless of if the address was found initially
or not.
no_cascade_results1 <- addresses_combine %>%
  geocode_combine(
    queries = list(
      list(method = 'census'),
      list(method = 'osm')
    ),
    global_params = list(address = 'address'),
    cascade = FALSE
  )
#> 
#> Passing 3 addresses to the US Census batch geocoder
#> Query completed in: 0.2 seconds
#> Passing 3 addresses to the Nominatim single address geocoder
#> Query completed in: 3 seconds| address | lat | long | query | 
|---|---|---|---|
| 100 Wall Street NY, NY | NA | NA | census | 
| 100 Wall Street NY, NY | 40.70522 | -74.006800 | osm | 
| Paris | NA | NA | census | 
| Paris | 48.85350 | 2.348391 | osm | 
| Not An Address | NA | NA | census | 
| Not An Address | NA | NA | osm | 
Additionally, the results from each query can be returned in separate
list items by setting return_list = TRUE.
The limit argument can be specified to allow multiple
results (rows) per input if available. The maximum value for the
limit argument is often 100 for geocoding services. To use
the default limit value for the selected geocoding service
you can use limit = NULL which will prevent the limit
parameter from being included in the query.
geo_limit <- geo(
  c("Lima, Peru", "Cairo, Egypt"),
  method = "osm",
  limit = 3, full_results = TRUE
)
#> Passing 2 addresses to the Nominatim single address geocoder
#> Query completed in: 2 seconds
glimpse(geo_limit)
#> Rows: 5
#> Columns: 15
#> $ address      [3m[38;5;246m<chr>[39m[23m "Lima, Peru", "Lima, Peru", "Lima, Peru", "Cairo, Egypt", "Cairo, Egypt"
#> $ lat          [3m[38;5;246m<dbl>[39m[23m -12.06211, -12.20011, -12.00021, 30.04439, 30.03325
#> $ long         [3m[38;5;246m<dbl>[39m[23m -77.03653, -76.28506, -76.83308, 31.23573, 31.56217
#> $ place_id     [3m[38;5;246m<int>[39m[23m 3325796, 3023110, 3206663, 44830415, 46149175
#> $ licence      [3m[38;5;246m<chr>[39m[23m "Data © OpenStreetMap contributors, ODbL 1.0. http://osm.org/copyright", "Data © OpenStreetMap contributors, ODbL 1.0. http://osm.org/copyright", "Data © OpenStreet…
#> $ osm_type     [3m[38;5;246m<chr>[39m[23m "node", "relation", "relation", "relation", "relation"
#> $ osm_id       [3m[38;5;246m<dbl>[39m[23m 4289361265, 1944659, 1944670, 5466227, 4103336
#> $ class        [3m[38;5;246m<chr>[39m[23m "place", "boundary", "boundary", "place", "boundary"
#> $ type         [3m[38;5;246m<chr>[39m[23m "city", "administrative", "administrative", "city", "administrative"
#> $ place_rank   [3m[38;5;246m<int>[39m[23m 15, 8, 12, 16, 8
#> $ importance   [3m[38;5;246m<dbl>[39m[23m 0.7282255, 0.5524865, 0.5247970, 0.7411248, 0.5215683
#> $ addresstype  [3m[38;5;246m<chr>[39m[23m "city", "state", "region", "city", "state"
#> $ name         [3m[38;5;246m<chr>[39m[23m "Lima", "Lima", "Lima", "القاهرة", "القاهرة"
#> $ display_name [3m[38;5;246m<chr>[39m[23m "Lima, Lima Metropolitana, Lima, 15083, Perú", "Lima, Perú", "Lima, Lima Metropolitana, Lima, Perú", "القاهرة, مصر", "القاهرة, مصر"
#> $ boundingbox  [3m[38;5;246m<list>[39m[23m <"-12.2221065", "-11.9021065", "-77.1965256", "-76.8765256">, <"-13.3241714", "-10.2741856", "-77.8863105", "-75.5075000">, <"-12.5199316", "-11.5724356", "-77.1992…To directly specify specific API parameters for a given
method you can use the custom_query parameter.
For example, the
Nominatim (OSM) geocoder has a ‘polygon_geojson’ argument that can
be used to return GeoJSON geometry content. To pass this parameter you
can insert it with a named list using the custom_query
argument:
cairo_geo <- geo("Cairo, Egypt",
  method = "osm", full_results = TRUE,
  custom_query = list(polygon_geojson = 1), verbose = TRUE
)
#> 
#> Number of Unique Addresses: 1
#> Passing 1 address to the Nominatim single address geocoder
#> 
#> Number of Unique Addresses: 1
#> Querying API URL: https://nominatim.openstreetmap.org/search
#> Passing the following parameters to the API:
#> q : "Cairo, Egypt"
#> limit : "1"
#> polygon_geojson : "1"
#> format : "json"
#> HTTP Status Code: 200
#> Query completed in: 0.4 seconds
#> Total query time (including sleep): 1 seconds
#> 
#> Query completed in: 1 seconds
glimpse(cairo_geo)
#> Rows: 1
#> Columns: 17
#> $ address             [3m[38;5;246m<chr>[39m[23m "Cairo, Egypt"
#> $ lat                 [3m[38;5;246m<dbl>[39m[23m 30.04439
#> $ long                [3m[38;5;246m<dbl>[39m[23m 31.23573
#> $ place_id            [3m[38;5;246m<int>[39m[23m 44830415
#> $ licence             [3m[38;5;246m<chr>[39m[23m "Data © OpenStreetMap contributors, ODbL 1.0. http://osm.org/copyright"
#> $ osm_type            [3m[38;5;246m<chr>[39m[23m "relation"
#> $ osm_id              [3m[38;5;246m<int>[39m[23m 5466227
#> $ class               [3m[38;5;246m<chr>[39m[23m "place"
#> $ type                [3m[38;5;246m<chr>[39m[23m "city"
#> $ place_rank          [3m[38;5;246m<int>[39m[23m 16
#> $ importance          [3m[38;5;246m<dbl>[39m[23m 0.7411248
#> $ addresstype         [3m[38;5;246m<chr>[39m[23m "city"
#> $ name                [3m[38;5;246m<chr>[39m[23m "القاهرة"
#> $ display_name        [3m[38;5;246m<chr>[39m[23m "القاهرة, مصر"
#> $ boundingbox         [3m[38;5;246m<list>[39m[23m <"29.7483062", "30.3209168", "31.2200331", "31.9090054">
#> $ geojson.type        [3m[38;5;246m<chr>[39m[23m "Polygon"
#> $ geojson.coordinates [3m[38;5;246m<list>[39m[23m <<array[1 x 119 x 2]>>To test a query without sending any data to a geocoding service, you
can use no_query = TRUE (NA results are returned).
noquery1 <- geo(c("Vancouver, Canada", "Las Vegas, NV"),
  no_query = TRUE,
  method = "arcgis"
)
#> 
#> Number of Unique Addresses: 2
#> Passing 2 addresses to the ArcGIS single address geocoder
#> 
#> Number of Unique Addresses: 1
#> Querying API URL: https://geocode.arcgis.com/arcgis/rest/services/World/GeocodeServer/findAddressCandidates
#> Passing the following parameters to the API:
#> SingleLine : "Vancouver, Canada"
#> maxLocations : "1"
#> f : "json"
#> outFields : "*"
#> 
#> Number of Unique Addresses: 1
#> Querying API URL: https://geocode.arcgis.com/arcgis/rest/services/World/GeocodeServer/findAddressCandidates
#> Passing the following parameters to the API:
#> SingleLine : "Las Vegas, NV"
#> maxLocations : "1"
#> f : "json"
#> outFields : "*"
#> Query completed in: 0 seconds| address | lat | long | 
|---|---|---|
| Vancouver, Canada | NA | NA | 
| Las Vegas, NV | NA | NA | 
Additional usage notes for the geocode(),
geo(), reverse_geocode(), and
reverse_geo() functions:
quiet = TRUE to silence console logs displayed by
default (how many inputs were submitted, to what geocoding service, and
the elapsed time).progress_bar argument to control if a progress
bar is displayed.verbose, quiet, and
progress_bar arguments can be set globally with
options. For instance
options(tidygeocoder.verbose = TRUE) will set verbose to
TRUE for all queries by default.api_options or
api_url arguments. See ?geo or
?reverse_geo for details.min_time argument will
default to a value based on the maximum query rate of the given
geocoding service. If you are using a local Nominatim server or have a
commercial geocoder plan that has less restrictive usage limits then you
can manually set min_time to a lower value (such as
0).mode argument can be used to specify whether the
batch geocoding or single address/coordinate geocoding should be used.
By default batch geocoding will be used if available when more than one
address or coordinate is provided (with some noted exceptions for slower
batch geocoding services).return_addresses and return_coords
parameters (for forward and reverse geocoding respectively) can be used
to toggle whether the input addresses or coordinates are returned.
Setting these parameters to FALSE is necessary to use batch
geocoding if limit is greater than 1 or NULL.reverse_geocode() and geocode()
functions, the return_input argument can be used to toggle
if the input dataset is included in the returned dataframe.geocode() and
reverse_geocode() functions. See #154 for
details.