
CongressData is a package designed to allow a user with
only basic knowledge of R to interact with CongressData, a
dataset with over 1000 variables that compiles information about all US
congressional districts across 1789-2023, and its codebook. The dataset
tracks district characteristics, members of congress, and the behavior
of those members in policymaking. Users can find variables related to
demographics, politics, and policy; subset the data across multiple
dimensions; create custom aggregations of the dataset; and access
citations in both plain text and BibTeX for every variable.
CongressData is a functional package that interacts with
the CongressData dataset via the internet. Install the package from
GitHub like so:
# use the devtools library to download the package from GitHub
library(devtools)
# if there are issues or you only want to download CongressData
install_github("ippsr/CongressData")

get_var_info: Retrieve information regarding variables
in CongressData and identify variables of interest with
get_var_info. The function allows you to search the codebook
to find the years each variable is observed in the data; a short and
long description of each variable; and the source and citation/s for
each variable. Citations are available in both bibtex and plain text.
Use the function to search for broad terms like ‘tax’ with the
related_to argument and/or partial-match variable names
with var_names.
suppressMessages(library(dplyr))
library(CongressData)
#> Please cite:
#> Grossmann, Matt, Caleb Lucas, and Benjamin Yoel. Introducing CongressData and Correlates of State Policy
#> East Lansing, MI: Institute for Public Policy and Social Research (IPPSR), 2024.
#>
#> Run `CongressData::get_congress_version()` to print the version of CongressData the package is using.
# variables related to health insurance
h_ins_cong <- get_var_info(related_to = "health insurance")
cat("There are",nrow(h_ins_cong),"variables related to health insurance in CongressData")
#> There are 41 variables related to health insurance in CongressData
head(h_ins_cong$variable)
#> [1] "percent_under18_healthins" "percent_private_under18"
#> [3] "percent_public_under18" "percent_privpub_under18"
#> [5] "percent_pop18_34" "percent_private_18_34"
# variables with 'under18' in their name
under18_cong <- get_var_info(var_names = "under18")
head(under18_cong$variable)
#> [1] "percent_under18" "percent_under18_healthins"
#> [3] "percent_private_under18" "percent_public_under18"
#> [5] "percent_privpub_under18" "under18"

get_var_info returns the information described above (the years observed,
short and long descriptions, the source, and the citations) to
simplify using CongressData.
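
To make the difference between the two search arguments concrete, here is a minimal offline sketch on a toy codebook. It is not the package's implementation: the helper names `match_var_names` and `match_related_to`, the `short_desc` column, and the toy rows are all assumptions made for illustration.

```r
# Toy stand-in for the codebook. Only the `variable` column name appears in
# the package output above; `short_desc` is an assumed name.
codebook <- data.frame(
  variable   = c("percent_under18", "percent_private_under18", "median_income"),
  short_desc = c("Share of population under 18",
                 "Share under 18 with private health insurance",
                 "Median household income"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: partial, case-insensitive match on variable names only,
# in the spirit of the var_names argument.
match_var_names <- function(cb, pattern) {
  cb[grepl(pattern, cb$variable, ignore.case = TRUE), , drop = FALSE]
}

# Hypothetical helper: match a term in the name or the description,
# in the spirit of the related_to argument.
match_related_to <- function(cb, term) {
  hit <- grepl(term, cb$variable, ignore.case = TRUE) |
    grepl(term, cb$short_desc, ignore.case = TRUE)
  cb[hit, , drop = FALSE]
}

by_name  <- match_var_names(codebook, "under18")
by_topic <- match_related_to(codebook, "health insurance")

print(by_name$variable)
print(by_topic$variable)
```

In this toy example the name search finds both `under18` variables, while the topic search finds only the variable whose description mentions health insurance.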

get_cong_data: Access all or part of CongressData with
get_cong_data. Subset by state names with
states and by years with years (either a single
year or a two-element vector giving the minimum and maximum years you want).
You can also use the related_to argument to search across
variable names, short/long descriptions from the codebook, and citations
for non-exact matches of a supplied term. For example, searching ‘tax’
will return variables with words like ‘taxes’ and ‘taxable’ in any of
those columns.
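
The behavior of the years argument can be pictured with a small offline sketch on a toy data frame. This is an illustration only: `filter_years` is a hypothetical helper, the toy rows are made up, and treating the range endpoints as inclusive is an assumption rather than something the package documentation above confirms.

```r
# Toy district-year records (made-up rows for illustration only).
toy <- data.frame(
  state = c("Indiana", "Indiana", "Michigan", "Ohio"),
  year  = c(1955, 1970, 1980, 1975),
  stringsAsFactors = FALSE
)

# Hypothetical helper: one value selects that single year, two values are
# treated as an inclusive min/max range (inclusiveness is an assumption).
filter_years <- function(df, years) {
  stopifnot(length(years) %in% c(1, 2))
  rng <- range(years)
  df[df$year >= rng[1] & df$year <= rng[2], , drop = FALSE]
}

in_range  <- filter_years(toy, c(1960, 1980))  # rows from 1960 through 1980
one_year  <- filter_years(toy, 1970)           # rows from 1970 only
in_states <- in_range[in_range$state %in% c("Indiana", "Kentucky", "Michigan"), ]

print(in_states)
```

Here Kentucky matches no toy rows, so the combined subset keeps only the Indiana and Michigan records that fall inside the range.
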
# load the entire dataset
all_the_dat <- get_cong_data()
# subset by state, topic, and years
cong_subset <- get_cong_data(states = c("Indiana", "Kentucky", "Michigan"),
                             related_to = "tax",
                             years = c(1960, 1980))

Run get_congress_version to see what version of the
dataset is available in CongressData.
CongressData::get_congress_version()
#> You are using CongressData version: 2

get_var_info: Each variable in CongressData was
collected from an external source, so please cite the sources of the
variables you use. The get_var_info function described above makes this
easy: supply a vector of variable names with the var_names argument
and collect the citations from the plain text or BibTeX columns.
NOTE: Some variables have multiple citations, so check that you have them
all.
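
When several variables are used, their citations often overlap. The sketch below shows one way to collapse them into a unique list. It runs offline on a stand-in data frame: the `variable` and `plaintext_cite` column names come from the output shown in this document, but the citation strings are placeholders, not real citations, and the real get_var_info result may be structured differently.

```r
# Stand-in for a get_var_info() result; citation text is placeholder only.
var_info <- data.frame(
  variable       = c("percent_bus", "percent_under18", "com_benghazi_299"),
  plaintext_cite = c("Placeholder citation A\n",
                     "Placeholder citation A",
                     "Placeholder citation B\n"),
  stringsAsFactors = FALSE
)

# Trim stray whitespace and trailing newlines, then drop duplicates so each
# source is listed once.
cites <- unique(trimws(var_info$plaintext_cite))

cat(cites, sep = "\n")
```

Trimming before deduplicating matters because some citation strings end with a newline, which would otherwise make identical sources look different.
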
# plain-text citation for one variable
get_var_info(var_names = "com_benghazi_299") %>%
  pull(plaintext_cite)
#> [1] "Charles Stewart III and Jonathan Woon. Congressional Committee Assignments, 103rd to 114th Congresses, 1993--2017: House of Representatives, 2017.\n"
# bibtex is also available
get_var_info(var_names = "percent_bus") %>%
  pull(plaintext_cite)
#> [1] "U.S. Census Bureau. (2022). 2009-2019 American Community Survey 1-year Estimates. Retrieved from the Census Bureau Data API."

In addition to citing each variable’s source, we ask that you cite CongressData if you use this package or the dataset:
Grossmann, Matt, Caleb Lucas, and Benjamin Yoel. Introducing CongressData and Correlates of State Policy. East Lansing, MI: Institute for Public Policy and Social Research (IPPSR), 2024.
For questions about the CongressData dataset, contact Ben Yoel (yoelbenj@msu.edu).