| tutorial-id |
none |
131-stops |
| name |
question |
Surya Fraser |
| email |
question |
surya.fraser@gmail.com |
| introduction-1 |
question |
Wisdom, Justice, Courage, and Temperance |
| introduction-2 |
question |
> show_file(".gitignore")
stops_files
> |
| introduction-3 |
question |
> show_file("stops.qmd", chunk = "Last")
#|message: false
library(tidyverse)
library(primer.data)
> |
| introduction-4 |
question |
> library(tidyverse)
> |
| introduction-5 |
question |
Description
This data is from the Stanford Open Policing Project, which aims to improve police accountability and transparency by providing data on traffic stops across the United States. The New Orleans dataset includes detailed information about traffic stops conducted by the New Orleans Police Department. |
| introduction-6 |
question |
A causal effect is the difference between potenial outcomes. |
| introduction-7 |
question |
The fundamental problem of causal inference is that each unit can only receive one treatment. |
| introduction-8 |
question |
arrested |
| introduction-9 |
question |
The variable could be record. It could have values 1 and 0 representing whether or not the person had a criminal record. |
| introduction-10 |
question |
2: the person is wearing a mask or not. |
| introduction-11 |
question |
If mask is 1 (the person is wearing a mask) maybe arrested is 1 (an arrest was made) and if it was 0, maybe arrested was 0 and the causal effect would be 1. |
| introduction-12 |
question |
race |
| introduction-13 |
question |
White and black people maybe. |
| introduction-14 |
question |
What is the difference in the average arrested values for black ad white people? |
| wisdom-1 |
question |
The question, the Preceptor Table, and an examination of the data. |
| wisdom-2 |
question |
A Preceptor Table contains the minimum number of units and columns nessisary to answer the question. |
| wisdom-3 |
question |
The rows in the Preceptor Table are units, it contains columns for covariates, and it also contains column(s) for outcome(s). |
| wisdom-4 |
question |
The drivers. |
| wisdom-5 |
question |
arrested |
| wisdom-6 |
question |
Maybe age. |
| wisdom-7 |
question |
There are none. |
| wisdom-8 |
question |
When the drivers were pulled over maybe. |
| wisdom-9 |
question |
The drivers are rows, there is a column for race, and another column for arrested. |
| wisdom-10 |
question |
"What is the difference in arrest rate between Black and White drivers adjusting for other variables?" |
| wisdom-11 |
question |
Arrests in traffic stops are related to a variety of factors. To understand the impact of race om traffic stop arrests I examine data from the Stanford Open Policing Project. |
| justice-1 |
question |
The Population Table, validity, stability, representativeness, and unconfoundedness. |
| justice-2 |
question |
Validity refers to the consistency between columns in the data and Preceptor Table. |
| justice-4 |
question |
A Population Table combines the data and the Preceptor Table. |
| justice-5 |
question |
Units: drivers
Time: time of arrest |
| justice-6 |
question |
Stability concerns the relationship between the Preceptor Table and the rows in the data, population, and Preceptor Table. |
| justice-7 |
question |
Stability might not hold because of changes overtime. |
| justice-8 |
question |
Representativeness concerns the relationships between the rows in the Population Table. |
| justice-11 |
question |
Unconfoundedness concerns whether the decision for a unit to receive treatment was random. |
| justice-12 |
question |
> library(tidymodels)
── Attaching packages ────────────────
✔ broom 1.0.8 ✔ rsample 1.3.0
✔ dials 1.4.0 ✔ tune 1.3.0
✔ infer 1.0.9 ✔ workflows 1.2.0
✔ modeldata 1.4.0 ✔ workflowsets 1.1.1
✔ parsnip 1.3.2 ✔ yardstick 1.3.2
✔ recipes 1.3.1
── Conflicts ─────────────────────────
✖ scales::discard() masks purrr::discard()
✖ dplyr::filter() masks stats::filter()
✖ recipes::fixed() masks stringr::fixed()
✖ dplyr::lag() masks stats::lag()
✖ yardstick::spec() masks readr::spec()
✖ recipes::step() masks stats::step()
• Use suppressPackageStartupMessages() to eliminate package startup messages
> |
| justice-13 |
question |
> library(broom)
> |
| justice-14 |
question |
$$
\log\left( \frac{\mathbb{P}(Y = 1)}{\mathbb{P}(Y = 0)} \right) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k
$$ |
| justice-15 |
question |
However, one potential weakness in this model is the lack of stability in the data due to changes overtime in the law. |
| courage-1 |
question |
Courage is how the data generation mechanism is created. |
| courage-2 |
exercise |
linear_reg(engine = "lm") |
| courage-3 |
exercise |
linear_reg(engine = "lm")|>fit(arrested ~ sex, data = x) |
| courage-4 |
exercise |
linear_reg(engine = "lm")|>fit(arrested ~ sex, data = x)|>tidy(conf.int = TRUE) |
| courage-5 |
exercise |
linear_reg(engine = "lm")|>fit(arrested ~ race, data = x)|>tidy(conf.int = TRUE) |
| courage-6 |
exercise |
linear_reg(engine = "lm")|>fit(arrested ~ race, data = x)|>tidy(conf.int = TRUE) |
| courage-7 |
exercise |
linear_reg(engine = "lm")|>fit(arrested ~ sex + race, data = x)|>tidy(conf.int = TRUE) |
| courage-8 |
exercise |
linear_reg(engine = "lm")|>fit(arrested ~ sex + race*zone, data = x)|>tidy(conf.int = TRUE) |
| courage-9 |
exercise |
fit_stops |
| courage-10 |
question |
> x <- stops |>
+ filter(race %in% c("black", "white")) |>
+ mutate(race = str_to_title(race),
+ sex = str_to_title(sex))
+
+ fit_stops <- linear_reg() |>
+ set_engine("lm") |>
+ fit(arrested ~ sex + race*zone, data = x)
> |
| courage-11 |
question |
> library(easystats)
# Attaching packages: easystats 0.7.5
✔ bayestestR 0.16.1 ✔ correlation 0.8.8
✔ datawizard 1.2.0 ✔ effectsize 1.0.1
✔ insight 1.3.1 ✔ modelbased 0.12.0
✔ performance 0.15.0 ✔ parameters 0.27.0
✔ report 0.6.1 ✔ see 0.11.0
> |
| courage-12 |
question |
> check_predictions(extract_fit_engine(fit_stops)) |
| courage-13 |
question |
\log\left( \frac{\mathbb{P}(\text{stop} = 1)}{\mathbb{P}(\text{stop} = 0)} \right)
= 0.177
+ 0.0614 \cdot \text{sex}_{\text{Male}}
- 0.0445 \cdot \text{race}_{\text{White}}
+ 0.0146 \cdot \text{zone}_{\text{B}}
+ 0.00610 \cdot \text{zone}_{\text{C}}
+ 0.0781 \cdot \text{zone}_{\text{D}}
+ 0.00190 \cdot \text{zone}_{\text{E}}
- 0.00271 \cdot \text{zone}_{\text{F}}
+ 0.0309 \cdot \text{zone}_{\text{G}}
+ 0.0757 \cdot \text{zone}_{\text{H}}
+ \cdots |
| courage-14 |
question |
> tutorial.helpers::show_file("stops.qmd", chunk = "Last")
x <- stops |>
filter(race %in% c("black", "white")) |>
mutate(race = str_to_title(race),
sex = str_to_title(sex))
fit_stops <- linear_reg() |>
set_engine("lm") |>
fit(arrested ~ sex + race*zone, data = x)
> |
| courage-15 |
question |
> tutorial.helpers::show_file(".gitignore")
stops_files
*_cache
> |
| courage-16 |
exercise |
tidy(fit_stops, conf.int=TRUE) |
| courage-17 |
question |
> tutorial.helpers::show_file("stops.qmd", chunk = "Last")
# Load required libraries
library(broom)
library(gt)
# Get the model summary
table_data <- tidy(fit_stops, conf.int = TRUE)
# Clean and rename columns
table_data |>
dplyr::mutate(term = gsub("sexMale", "Sex: Male", term),
term = gsub("raceWhite", "Race: White", term),
term = gsub("zone", "Zone ", term),
term = gsub("\\(Intercept\\)", "Intercept", term)) |>
dplyr::select(term, estimate, std.error, conf.low, conf.high, p.value) |>
gt() |>
fmt_number(columns = c(estimate, std.error, conf.low, conf.high, p.value), decimals = 4) |>
cols_label(
term = "Term",
estimate = "Estimate",
std.error = "Std. Error",
conf.low = "95% CI (Low)",
conf.high = "95% CI (High)",
p.value = "p-value"
) |>
tab_header(
title = "Logistic Regression Coefficients",
subtitle = "Model: Probability of Stop"
) |>
tab_options(
table.font.size = "small",
data_row.padding = px(4),
table.align = "center"
)
> |
| courage-18 |
question |
I model traffic stop arrests with a logistic model as a function of race, sex, and zone. |
| temperance-1 |
question |
Temperance concerns answering the initial question. |
| temperance-2 |
question |
This estimate imples that male drivers have a higher log odds of being arrested. |
| temperance-3 |
question |
This estimate means white drivers have a lower log odds of being arrested. |
| temperance-4 |
question |
This reflects the probability for being arrested when all ariables are at their reference levels. |
| temperance-5 |
question |
> library(marginaleffects) |
| temperance-6 |
question |
What is the difference in arrest rate between Black and White drivers adjusting for other variables? |
| temperance-7 |
question |
> predictions(fit_stops)
Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
0.179 0.00343 52.2 <0.001 Inf 0.173 0.186
0.142 0.00419 33.8 <0.001 828.0 0.133 0.150
0.250 0.00451 55.5 <0.001 Inf 0.241 0.259
0.142 0.00419 33.8 <0.001 828.0 0.133 0.150
0.232 0.01776 13.1 <0.001 127.6 0.198 0.267
--- 378457 rows omitted. See ?print.marginaleffects ---
0.208 0.00390 53.4 <0.001 Inf 0.201 0.216
0.270 0.00377 71.5 <0.001 Inf 0.262 0.277
0.270 0.00377 71.5 <0.001 Inf 0.262 0.277
0.270 0.00377 71.5 <0.001 Inf 0.262 0.277
0.189 0.00545 34.7 <0.001 874.0 0.179 0.200
Type: numeric
> |
| temperance-8 |
question |
> plot_predictions(fit_stops,by="sex") |
| temperance-9 |
question |
> plot_predictions(fit_stops,condition ="sex")
> |
| temperance-10 |
question |
> plot_predictions(fit_stops,condition =c("sex", "race"))
> |
| temperance-11 |
question |
# Load required packages
library(marginaleffects)
library(ggplot2)
# Generate predicted probabilities by sex and race
preds <- plot_predictions(fit_stops, condition = c("sex", "race"), draw = FALSE)
# Plot with ggplot2
ggplot(preds, aes(x = sex, y = estimate, fill = race)) +
geom_col(position = position_dodge(width = 0.7), width = 0.6) +
geom_errorbar(aes(ymin = conf.low, ymax = conf.high),
position = position_dodge(width = 0.7),
width = 0.2) +
labs(
title = "Predicted Probability of Being Stopped",
subtitle = "Grouped by Sex and Race",
x = "Sex",
y = "Predicted Probability",
fill = "Race"
) +
theme_minimal(base_size = 13) +
theme(
legend.position = "top",
plot.title = element_text(face = "bold"),
axis.title.y = element_text(margin = margin(r = 10))
) |
| temperance-12 |
question |
> tutorial.helpers::show_file("stops.qmd", chunk = "Last")
# Load required packages
library(marginaleffects)
library(ggplot2)
# Generate predicted probabilities by sex and race
preds <- plot_predictions(fit_stops, condition = c("sex", "race"), draw = FALSE)
# Plot with ggplot2
ggplot(preds, aes(x = sex, y = estimate, fill = race)) +
geom_col(position = position_dodge(width = 0.7), width = 0.6) +
geom_errorbar(aes(ymin = conf.low, ymax = conf.high),
position = position_dodge(width = 0.7),
width = 0.2) +
labs(
title = "Predicted Probability of Being Stopped",
subtitle = "Grouped by Sex and Race",
x = "Sex",
y = "Predicted Probability",
fill = "Race"
) +
theme_minimal(base_size = 13) +
theme(
legend.position = "top",
plot.title = element_text(face = "bold"),
axis.title.y = element_text(margin = margin(r = 10))
)
> |
| temperance-13 |
question |
According to the model's output white drivers have a lower log odds of being arrested. |
| temperance-15 |
question |
> tutorial.helpers::show_file("stops.qmd")
---
title: "Stops"
format: html
author: Surya Fraser
execute:
echo: false
---
```{r}
#|message: false
#|warning: false
suppressPackageStartupMessages({
library(tidyverse)
library(tidymodels)
})
library(primer.data)
library(broom)
library(marginaleffects)
```
```{r}
x <- stops |>
filter(race %in% c("black", "white")) |>
mutate(race = str_to_title(race),
sex = str_to_title(sex))
```
```{r}
#| cache: true
fit_stops <- linear_reg() |>
set_engine("lm") |>
fit(arrested ~ sex + race*zone, data = x)
```
```{r}
# Load required libraries
library(broom)
library(gt)
# Get the model summary
table_data <- tidy(fit_stops, conf.int = TRUE)
# Clean and rename columns
table_data |>
dplyr::mutate(term = gsub("sexMale", "Sex: Male", term),
term = gsub("raceWhite", "Race: White", term),
term = gsub("zone", "Zone ", term),
term = gsub("\\(Intercept\\)", "Intercept", term)) |>
dplyr::select(term, estimate, std.error, conf.low, conf.high, p.value) |>
gt() |>
fmt_number(columns = c(estimate, std.error, conf.low, conf.high, p.value), decimals = 4) |>
cols_label(
term = "Term",
estimate = "Estimate",
std.error = "Std. Error",
conf.low = "95% CI (Low)",
conf.high = "95% CI (High)",
p.value = "p-value"
) |>
tab_header(
title = "Logistic Regression Coefficients",
subtitle = "Model: Probability of Stop"
) |>
tab_options(
table.font.size = "small",
data_row.padding = px(4),
table.align = "center"
)
```
```{r}
# Load required packages
library(marginaleffects)
library(ggplot2)
# Generate predicted probabilities by sex and race
preds <- plot_predictions(fit_stops, condition = c("sex", "race"), draw = FALSE)
# Plot with ggplot2
ggplot(preds, aes(x = sex, y = estimate, fill = race)) +
geom_col(position = position_dodge(width = 0.7), width = 0.6) +
geom_errorbar(aes(ymin = conf.low, ymax = conf.high),
position = position_dodge(width = 0.7),
width = 0.2) +
labs(
title = "Predicted Probability of Being Stopped",
subtitle = "Grouped by Sex and Race",
x = "Sex",
y = "Predicted Probability",
fill = "Race"
) +
theme_minimal(base_size = 13) +
theme(
legend.position = "top",
plot.title = element_text(face = "bold"),
axis.title.y = element_text(margin = margin(r = 10))
)
```
$$
\log\left( \frac{\mathbb{P}(Y = 1)}{\mathbb{P}(Y = 0)} \right) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k
$$
$$
\log\left( \frac{\mathbb{P}(\text{stop} = 1)}{\mathbb{P}(\text{stop} = 0)} \right)
= 0.177
+ 0.0614 \cdot \text{sex}_{\text{Male}}
- 0.0445 \cdot \text{race}_{\text{White}}
+ 0.0146 \cdot \text{zone}_{\text{B}}
+ 0.00610 \cdot \text{zone}_{\text{C}}
+ 0.0781 \cdot \text{zone}_{\text{D}}
+ 0.00190 \cdot \text{zone}_{\text{E}}
- 0.00271 \cdot \text{zone}_{\text{F}}
+ 0.0309 \cdot \text{zone}_{\text{G}}
+ 0.0757 \cdot \text{zone}_{\text{H}}
+ \cdots
$$
Arrests in traffic stops are related to a variety of factors. To understand the impact of race om traffic stop arrests I examine data from the Stanford Open Policing Project. However, one potential weakness in this model is the lack of stability in the data due to changes overtime in the law. I model traffic stop arrests with a logistic model as a function of race, sex, and zone. According to the model's output white drivers have a lower log odds of being arrested.
> |
| temperance-16 |
question |
https://sfraser8910.github.io/stops/ |
| temperance-17 |
question |
https://github.com/sfraser8910/stops |
| minutes |
question |
120 |