id submission_type answer
tutorial-id none 131-stops
name question Luke Li
email question lukelex2008@gmail.com
introduction-1 question wisdom justice courage temperance
introduction-2 question > show_file(".gitignore") stops_files >
introduction-3 question > show_file("stops.qmd", chunk = "Last") #| message: false library(tidyverse) library(primer.data) >
introduction-4 question > library(tidyverse) ── Attaching core tidyverse packages ──────────────── tidyverse 2.0.0 ── ✔ dplyr 1.1.4 ✔ readr 2.1.5 ✔ forcats 1.0.0 ✔ stringr 1.5.1 ✔ ggplot2 3.5.2 ✔ tibble 3.3.0 ✔ lubridate 1.9.4 ✔ tidyr 1.3.1 ✔ purrr 1.1.0 ── Conflicts ────────────────────────────────── tidyverse_conflicts() ── ✖ dplyr::filter() masks stats::filter() ✖ dplyr::lag() masks stats::lag() ℹ Use the conflicted package to force all conflicts to become errors >
introduction-5 question This data is from the Stanford Open Policing Project, which aims to improve police accountability and transparency by providing data on traffic stops across the United States. The New Orleans dataset includes detailed information about traffic stops conducted by the New Orleans Police Department.
introduction-6 question difference in potential outcomes
introduction-7 question only one potential outcome can be observed
introduction-8 question arrested
introduction-9 question day time night time
introduction-10 question 2, one for mask, one for no mask
introduction-11 question 0 - no mask - no arrest, 1 - mask - arrest. Causal effect is difference between no arrest and arrest
introduction-12 question time
introduction-13 question white vs black
introduction-14 question How is the race of a driver related to whether or not they get arrested at a traffic stop?
wisdom-1 question preceptor table examination of data research question
wisdom-2 question data in least rows and columns to find quantity of interest
wisdom-3 question Rows are units, columns for outcomes and covariates
wisdom-4 question individual drivers
wisdom-5 question arrested
wisdom-6 question race
wisdom-7 question none
wisdom-8 question current
wisdom-9 question rows are units of individual drivers stopped, column for outcome of arrested or not, column for covariate of race
wisdom-10 question What is difference in whether or not someone is arrested between white people and black people?
wisdom-11 question Often times, a person's appearance unfortunately factors into how others treat them. Using data from the Stanford Open Policing Project, we use 400,000 traffic stops from New Orleans to find the difference in arrest rate between Black and White drivers adjusting for other variables.
justice-1 question population table, validity, stability, representativeness, unconfoundedness
justice-2 question Consistency between columns from data and columns from preceptor table
justice-3 question Arrested is a very objective outcome. Validity should hold
justice-4 question unit/time combination from population where data and preceptor table are derived from
justice-5 question individual drivers stopped / year
justice-6 question no change over time
justice-7 question conditions that warrant an arrest changed since that time period
justice-8 question data and preceptor table are random samples from same popoulation
justice-9 question traffic stop data only came from reported traffic stops, what if some werent reported
justice-10 question New Orleans is not representative of the whole country
justice-11 question Treatments are randomly assigned and independent of covariates
justice-12 question > library(tidymodels) ── Attaching packages ────────────────────────────── tidymodels 1.3.0 ── ✔ broom 1.0.8 ✔ rsample 1.3.0 ✔ dials 1.4.0 ✔ tune 1.3.0 ✔ infer 1.0.8 ✔ workflows 1.2.0 ✔ modeldata 1.4.0 ✔ workflowsets 1.1.1 ✔ parsnip 1.3.2 ✔ yardstick 1.3.2 ✔ recipes 1.3.1 ── Conflicts ───────────────────────────────── tidymodels_conflicts() ── ✖ scales::discard() masks purrr::discard() ✖ dplyr::filter() masks stats::filter() ✖ recipes::fixed() masks stringr::fixed() ✖ dplyr::lag() masks stats::lag() ✖ yardstick::spec() masks readr::spec() ✖ recipes::step() masks stats::step() • Dig deeper into tidy modeling with R at https://www.tmwr.org >
justice-13 question > library(broom) >
justice-14 question $$ P(Y = 1) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_n X_n)}} $$
justice-15 question A potential weakness of our model is that the dataset is heavily modified from the actual study, and is reduced down to around 400,000 entries, therefore leaving out nearly 3.1 million entries. The deletion of the entries may have led to a misrepresentation of the population, and this would mess up our final predictions because it would be providing values that don't represent to the area, causing our entire model to be unrealistic.
courage-1 question data generating mechanism
courage-2 exercise linear_reg(engine = "lm")
courage-3 exercise linear_reg(engine = "lm") |> fit(arrested ~ sex, data = x)
courage-4 exercise linear_reg(engine = "lm") |> fit(arrested ~ sex, data = x) |> tidy(conf.int = TRUE)
courage-5 exercise linear_reg(engine = "lm") |> fit(arrested ~ race, data = x)
courage-6 exercise linear_reg(engine = "lm") |> fit(arrested ~ race, data = x) |> tidy(conf.int = TRUE)
courage-7 exercise linear_reg(engine = "lm") |> fit(arrested ~ sex + race, data = x) |> tidy(conf.int = TRUE)
courage-8 exercise linear_reg(engine = "lm") |> fit(arrested ~ sex + race*zone, data = x) |> tidy(conf.int = TRUE)
courage-9 exercise fit_stops
courage-10 question > x <- stops |> + filter(race %in% c("black", "white")) |> + mutate(race = str_to_title(race), + sex = str_to_title(sex)) + + fit_stops <- linear_reg() |> + set_engine("lm") |> + fit(arrested ~ sex + race*zone, data = x) >
courage-11 question > library(easystats) # Attaching packages: easystats 0.7.4 (red = needs update) ✖ bayestestR 0.16.0 ✖ correlation 0.8.7 ✖ datawizard 1.1.0 ✔ effectsize 1.0.1 ✖ insight 1.3.0 ✖ modelbased 0.11.2 ✖ performance 0.14.0 ✖ parameters 0.26.0 ✔ report 0.6.1 ✔ see 0.11.0 Restart the R-Session and update packages with `easystats::easystats_update()`. >
courage-12 question > check_predictions(extract_fit_engine(fit_stops)) >
courage-13 question $$ \widehat{\text{arrested}} = 0.177 + 0.0614 \cdot \text{sex}_{\text{Male}} - 0.0445 \cdot \text{race}_{\text{White}} + 0.0146 \cdot \text{zone}_{\text{B}} + 0.00610 \cdot \text{zone}_{\text{C}} + 0.0781 \cdot \text{zone}_{\text{D}} + 0.00190 \cdot \text{zone}_{\text{E}} - 0.00271 \cdot \text{zone}_{\text{F}} + 0.0309 \cdot \text{zone}_{\text{G}} + 0.0757 \cdot \text{zone}_{\text{H}} + \text{(interaction terms for race and zone)} $$
courage-14 question > tutorial.helpers::show_file("stops.qmd", chunk = "Last") #| cache: true x <- stops |> filter(race %in% c("black", "white")) |> mutate(race = str_to_title(race), sex = str_to_title(sex)) fit_stops <- linear_reg() |> set_engine("lm") |> fit(arrested ~ sex + race*zone, data = x) >
courage-15 question > tutorial.helpers::show_file(".gitignore") stops_files *_cache >
courage-16 exercise tidy(fit_stops, conf.int = TRUE)
courage-17 question > tutorial.helpers::show_file("stops.qmd", chunk = "Last") library(dplyr) library(gt) tidy(fit_stops, conf.int = TRUE) |> select(term, estimate, conf.low, conf.high) |> mutate( estimate = round(estimate, 3), conf.low = round(conf.low, 3), conf.high = round(conf.high, 3), `95% CI` = paste0("[", conf.low, ", ", conf.high, "]") ) |> select(term, estimate, `95% CI`) |> gt() |> tab_header( title = "Model Coefficients", subtitle = "Estimates and 95% Confidence Intervals" ) |> cols_label( term = "Term", estimate = "Estimate" ) |> fmt_number( columns = estimate, decimals = 3 ) |> tab_options( table.font.size = "small", column_labels.font.weight = "bold" ) >
courage-18 question We model the log odds of getting arrested at a traffic stop as a logistic function of sex and race.
temperance-1 question Using our data generating mechanism and interpreting results; humility reminds us that it is always a lie.
temperance-2 question Holding all other variables constant, the if the individual is male, then their probability of getting arrested increases by 0.06
temperance-3 question When comparing white people with other races, white people have a 0.04 lower value for arrested, meaning that they are less likely to get arrested, relative to other races, conditional on the other variables in the model.
temperance-4 question If all other variables are at their base value, then the chance of getting arrested at a traffic stop is 0.18.
temperance-5 question > library(marginaleffects) >
temperance-6 question What is the difference in chance of getting arrested at a traffic stop between black and white people?
temperance-7 question > predictions(fit_stops) Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 % 0.179 0.00343 52.2 <0.001 Inf 0.173 0.186 0.142 0.00419 33.8 <0.001 828.0 0.133 0.150 0.250 0.00451 55.5 <0.001 Inf 0.241 0.259 0.142 0.00419 33.8 <0.001 828.0 0.133 0.150 0.232 0.01776 13.1 <0.001 127.6 0.198 0.267 --- 378457 rows omitted. See ?print.marginaleffects --- 0.208 0.00390 53.4 <0.001 Inf 0.201 0.216 0.270 0.00377 71.5 <0.001 Inf 0.262 0.277 0.270 0.00377 71.5 <0.001 Inf 0.262 0.277 0.270 0.00377 71.5 <0.001 Inf 0.262 0.277 0.189 0.00545 34.7 <0.001 874.0 0.179 0.200 Type: numeric
temperance-8 question > plot_predictions(fit_stops, by = "sex") >
temperance-9 question > plot_predictions(fit_stops, condition = "sex") >
temperance-10 question > plot_predictions(fit_stops, condition = c("sex", "race")) >
temperance-11 question library(ggplot2) library(dplyr) # Here's the polished plot plot_predictions(fit_stops, condition = c("sex", "race")) |> ggplot(aes(x = sex, y = estimate, color = race)) + geom_point(size = 3) + geom_errorbar(aes(ymin = estimate - std.error, ymax = estimate + std.error), width = 0.1, size = 0.8) + scale_color_manual(values = c("Black" = "#F8766D", "White" = "#00BFC4")) + labs( title = "Arrest Probability by Sex and Race", subtitle = "Black individuals—especially men—are arrested at higher rates than White counterparts.", x = "Sex", y = "Estimated Probability of Arrest", color = "Race", caption = "Source: Model predictions from study dataset" ) + theme_minimal(base_size = 13) + theme( plot.title = element_text(face = "bold", size = 16), plot.subtitle = element_text(size = 12, margin = margin(b = 10)), plot.caption = element_text(size = 10, face = "italic"), legend.position = "right", panel.grid.minor = element_blank() )
temperance-12 question > tutorial.helpers::show_file("stops.qmd", chunk = "Last") plot_predictions(fit_stops, condition = c("sex", "race")) |> ggplot(aes(x = sex, y = estimate, color = race)) + geom_point(size = 3) + geom_errorbar(aes(ymin = estimate - std.error, ymax = estimate + std.error), width = 0.1, size = 0.8) + scale_color_manual(values = c("Black" = "#F8766D", "White" = "#00BFC4")) + labs( title = "Arrest Probability by Sex and Race", subtitle = "Black individuals—especially men—are arrested at higher rates than White counterparts.", x = "Sex", y = "Estimated Probability of Arrest", color = "Race", caption = "Source: Model predictions from study dataset" ) + theme_minimal(base_size = 13) + theme( plot.title = element_text(face = "bold", size = 16), plot.subtitle = element_text(size = 12, margin = margin(b = 10)), plot.caption = element_text(size = 10, face = "italic"), legend.position = "right", panel.grid.minor = element_blank() ) >
temperance-13 question One quantity of interest and a measure of its uncertainty is the coefficient for having the male sex. When comparing men with women, men have a 0.06 higher value for arrested, meaning that they are more likely to get arrested, relative to women, conditional on the other variables in the model. The 95% confidence interval for this value ranges from 0.059 to 0.064.
temperance-14 question The estimates may be wrong because the
temperance-15 question > tutorial.helpers::show_file("stops.qmd") --- title: "Stops" format: html author: Luke Li execute: echo: false --- ```{r} #| message: false library(tidyverse) library(primer.data) library(tidymodels) library(broom) library(gt) library(marginaleffects) ``` ```{r} #| cache: true x <- stops |> filter(race %in% c("black", "white")) |> mutate(race = str_to_title(race), sex = str_to_title(sex)) fit_stops <- linear_reg() |> set_engine("lm") |> fit(arrested ~ sex + race*zone, data = x) ``` ```{r} library(ggplot2) plot_predictions(fit_stops, condition = c("sex", "race")) + labs( title = "Arrest Probability Varies by Sex and Race", subtitle = "Black males have the highest predicted probability of arrest; disparities exist across both race and gender.", x = "Sex", y = "Estimated Probability of Being Arrested", caption = "Source: Model predictions from fitted logistic regression on stop data" ) + theme_minimal(base_size = 13) + theme( plot.title = element_text(face = "bold", size = 16), plot.subtitle = element_text(size = 12, margin = margin(b = 10)), plot.caption = element_text(size = 10, face = "italic"), axis.title.x = element_text(size = 12), axis.title.y = element_text(size = 12), legend.title = element_text(size = 12), legend.text = element_text(size = 11) ) + scale_color_manual( values = c("Black" = "#F8766D", "White" = "#00BFC4") ) ``` ```{r} tidy(fit_stops, conf.int = TRUE) |> select(term, estimate, conf.low, conf.high) |> mutate( estimate = round(estimate, 3), conf.low = round(conf.low, 3), conf.high = round(conf.high, 3), `95% CI` = paste0("[", conf.low, ", ", conf.high, "]") ) |> select(term, estimate, `95% CI`) |> gt() |> tab_header( title = "Model Coefficients", subtitle = "Estimates and 95% Confidence Intervals" ) |> cols_label( term = "Term", estimate = "Estimate" ) |> fmt_number( columns = estimate, decimals = 3 ) |> tab_options( table.font.size = "small", column_labels.font.weight = "bold" ) ``` $$ P(Y = 1) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_n X_n)}} $$ $$ \widehat{\text{arrested}} = 0.177 + 0.0614 \cdot \text{sex}_{\text{Male}} - 0.0445 \cdot \text{race}_{\text{White}} + 0.0146 \cdot \text{zone}_{\text{B}} + 0.00610 \cdot \text{zone}_{\text{C}} + 0.0781 \cdot \text{zone}_{\text{D}} + 0.00190 \cdot \text{zone}_{\text{E}} - 0.00271 \cdot \text{zone}_{\text{F}} + 0.0309 \cdot \text{zone}_{\text{G}} + 0.0757 \cdot \text{zone}_{\text{H}} + \text{(interaction terms for race and zone)} $$ Often times, a person's appearance unfortunately factors into how others treat them. Using data from the Stanford Open Policing Project, we use 400,000 traffic stops from New Orleans to find the difference in arrest rate between Black and White drivers adjusting for other variables. A potential weakness of our model is that the dataset is heavily modified from the actual study, and is reduced down to around 400,000 entries, therefore leaving out nearly 3.1 million entries. The deletion of the entries may have led to a misrepresentation of the population, and this would mess up our final predictions because it would be providing values that don't represent to the area, causing our entire model to be unrealistic. We modeled the probability of getting arrested at a traffic stop as a linear function of sex and the product of race and zone. One quantity of interest and a measure of its uncertainty is the coefficient for having the male sex. When comparing men with women, men have a 6% higher probability for getting arrested, conditional on the other variables in the model. The 95% confidence interval for this value ranges from 5.9% to 6.4%.Based on all of the errors in our data that we collected during Justice, our original data source to base our model on was already flawed. Additionally, our model can only predict estimates on paper, based on an ideal world. It doesn’t account for many factors of the real world that we, ourselves, can’t even describe. >
temperance-16 question https://lukearoni.github.io/stops/
temperance-17 question https://github.com/lukearoni/stops
minutes question 70