| tutorial-id |
none |
131-stops |
| name |
question |
Ayush Chandra |
| email |
question |
chandra.ayush2021@gmail.com |
| introduction-1 |
question |
The cardinal virtues are courage, wisdom, justice, and temperance |
| introduction-2 |
question |
> show_file(".gitignore")
stops_files
> |
| introduction-3 |
question |
> show_file("stops.qmd", chunk = "Last")
#| message: false
library(tidyverse)
library(primer.data)
> |
| introduction-4 |
question |
> library(tidyverse)
── Attaching core tidyverse packages ───────────────────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ readr 2.1.5
✔ forcats 1.0.0 ✔ stringr 1.5.1
✔ ggplot2 3.5.2 ✔ tibble 3.3.0
✔ lubridate 1.9.4 ✔ tidyr 1.3.1
✔ purrr 1.1.0
── Conflicts ─────────────────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package to force all conflicts to become errors |
| introduction-5 |
question |
Description
This data is from the Stanford Open Policing Project, which aims to improve police accountability and transparency by providing data on traffic stops across the United States. The New Orleans dataset includes detailed information about traffic stops conducted by the New Orleans Police Department. |
| introduction-6 |
question |
The difference between 2 potential outcomes |
| introduction-7 |
question |
We can only observe one outcome at a time |
| introduction-8 |
question |
arrested |
| introduction-9 |
question |
it will be manipulated by the inputs |
| introduction-10 |
question |
2, it is binary. Arrested or not arrested |
| introduction-11 |
question |
Arrested and not arrested |
| introduction-12 |
question |
time |
| introduction-13 |
question |
white people and black people |
| introduction-14 |
question |
What is the connection between race and the probability of getting arrested |
| wisdom-1 |
question |
Wisdom involves the preceptors table and the assumption of validity |
| wisdom-2 |
question |
A preceptor table is a data table with the least number of rows and columns needed for all of the data |
| wisdom-3 |
question |
the rows are units while the columns are outcomes and covariates |
| wisdom-4 |
question |
The units are the individual people |
| wisdom-5 |
question |
the outcome is arrested |
| wisdom-6 |
question |
race |
| wisdom-7 |
question |
no treatments since predictive |
| wisdom-8 |
question |
whien the people are pulled over |
| wisdom-9 |
question |
The units are the people who have been pulled over, while the outcome columns are whether they were arrested or not. The covariates for this are race and zone |
| wisdom-10 |
question |
what is the connection between race and the number of arrests made |
| wisdom-11 |
question |
our data source is from the US government |
| justice-1 |
question |
Justice involves the population table and the 4 assumptions that underlie it: validity, representativeness, unconfoundedness, and stability |
| justice-2 |
question |
Validity is the assumption that the columns in the data table alight witht the columns in the population and preceptors tables |
| justice-3 |
question |
It may not be true since the studies could have different covatiates |
| justice-4 |
question |
A population table is the combination of the preceptors table and the data table |
| justice-5 |
question |
it is arrests/year |
| justice-6 |
question |
Stability means that the relationship between the columns in the Population Table is the same for three categories of rows: the data, the Preceptor Table, and the larger population from which both are drawn. |
| justice-7 |
question |
It may not hold since he data could have been drawn from different sources |
| justice-8 |
question |
It is the assumption that the data represents the entire population |
| justice-9 |
question |
It may not be true since this is just a data set taken from New Orleans |
| justice-10 |
question |
It may not be true since the columns may not be aligned |
| justice-11 |
question |
Unconfoundedness means that the treatment assignment is independent of the potential outcomes, when we condition on pre-treatment covariates. This assumption is only relevant for causal models. We describe a model as “confounded” if this is not true. The easiest way to ensure unconfoundedness is to assign treatment randomly. |
| justice-12 |
question |
> library(tidymodels)
── Attaching packages ─────────────────────────────────────────────────── tidymodels 1.3.0 ──
✔ broom 1.0.8 ✔ rsample 1.3.0
✔ dials 1.4.0 ✔ tune 1.3.0
✔ infer 1.0.8 ✔ workflows 1.2.0
✔ modeldata 1.4.0 ✔ workflowsets 1.1.1
✔ parsnip 1.3.2 ✔ yardstick 1.3.2
✔ recipes 1.3.1
── Conflicts ────────────────────────────────────────────────────── tidymodels_conflicts() ──
✖ scales::discard() masks purrr::discard()
✖ dplyr::filter() masks stats::filter()
✖ recipes::fixed() masks stringr::fixed()
✖ dplyr::lag() masks stats::lag()
✖ yardstick::spec() masks readr::spec()
✖ recipes::step() masks stats::step()
• Dig deeper into tidy modeling with R at https://www.tmwr.org |
| justice-13 |
question |
> library(broom)
> |
| justice-14 |
question |
$$
P(Y = 1) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_n X_n)}}
$$
with \( Y \sim \text{Bernoulli}(\rho) \) where \( \rho = \frac{1}{1 + e^{-(\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_n X_n)}} \) |
| justice-15 |
question |
One weakness is that the study is taken from a small place in New Orleans, therefore it doesn't fully represent the entire US population. |
| courage-1 |
question |
Courage involves the data generating mechanism |
| courage-2 |
exercise |
linear_reg(engine = "lm") |
| courage-3 |
exercise |
linear_reg(engine = "lm") |>
fit(arrested ~ sex, data = x) |
| courage-4 |
exercise |
linear_reg(engine = "lm") |>
fit(arrested ~ sex, data = x) |>
tidy(conf.int = TRUE) |
| courage-5 |
exercise |
linear_reg() |>
set_engine("lm") |>
fit(arrested ~ race, data = x) |
| courage-6 |
exercise |
linear_reg() |>
set_engine("lm") |>
fit(arrested ~ race, data = x) |>
tidy(conf.int = TRUE) |
| courage-7 |
exercise |
linear_reg() |>
set_engine("lm") |>
fit(arrested ~ sex + race, data = x) |>
tidy(conf.int = TRUE) |
| courage-8 |
exercise |
linear_reg() |>
set_engine("lm") |>
fit(arrested ~ sex + race*zone, data = x) |>
tidy(conf.int = TRUE) |
| courage-9 |
exercise |
fit_stops |
| courage-10 |
question |
> x <- stops |>
+ filter(race %in% c("black", "white")) |>
+ mutate(race = str_to_title(race),
+ sex = str_to_title(sex))
+
+ fit_stops <- linear_reg() |>
+ set_engine("lm") |>
+ fit(arrested ~ sex + race*zone, data = x)
> |
| courage-11 |
question |
> library(easystats)
# Attaching packages: easystats 0.7.4 (red = needs update)
✖ bayestestR 0.16.0 ✖ correlation 0.8.7
✖ datawizard 1.1.0 ✔ effectsize 1.0.1
✖ insight 1.3.0 ✖ modelbased 0.11.2
✖ performance 0.14.0 ✖ parameters 0.26.0
✔ report 0.6.1 ✔ see 0.11.0
Restart the R-Session and update packages with `easystats::easystats_update()`. |
| courage-12 |
question |
> check_predictions(extract_fit_engine(fit_stops)) |
| courage-13 |
question |
$$
\widehat{\text{arrested}} = 0.177
+ 0.0614 \cdot \text{sex}_{\text{Male}}
- 0.0445 \cdot \text{race}_{\text{White}}
+ 0.0146 \cdot \text{zone}_{\text{B}}
+ 0.00610 \cdot \text{zone}_{\text{C}}
+ 0.0781 \cdot \text{zone}_{\text{D}}
+ 0.00190 \cdot \text{zone}_{\text{E}}
- 0.00271 \cdot \text{zone}_{\text{F}}
+ 0.0309 \cdot \text{zone}_{\text{G}}
+ 0.0757 \cdot \text{zone}_{\text{H}}
+ \text{(interaction terms for race and zone)}
$$ |
| courage-14 |
question |
> tutorial.helpers::show_file("stops.qmd", chunk = "Last")
#| cache: true
fit_stops <- linear_reg() |>
set_engine("lm") |>
fit(arrested ~ sex + race*zone, data = x)
> |
| courage-15 |
question |
> tutorial.helpers::show_file(".gitignore")
stops_files
*_cache
> |
| courage-16 |
exercise |
tidy(fit_stops, conf.int = TRUE) |
| courage-17 |
question |
#| label: tidy-table
#| echo: false
#| message: false
#| warning: false
library(gt)
tidy(fit_stops) |>
mutate(term = str_replace_all(term, "sex", "Sex: "),
term = str_replace_all(term, "race", "Race: "),
term = str_replace_all(term, "zone", "Zone: "),
estimate = round(estimate, 3),
std.error = round(std.error, 3),
statistic = round(statistic, 2),
p.value = signif(p.value, 3)) |>
gt() |>
tab_header(
title = "Linear Regression Results",
subtitle = "Effect of Race, Sex, and Zone on Arrest Probability"
) |>
cols_label(
term = "Predictor",
estimate = "Estimate",
std.error = "Std. Error",
statistic = "t value",
p.value = "p value"
) |>
fmt_number(columns = c(estimate, std.error, statistic, p.value), decimals = 3) |>
tab_style(
style = cell_text(weight = "bold"),
locations = cells_column_labels(everything())
) |>
opt_table_outline() |>
opt_row_striping() |
| courage-18 |
question |
we model accidents as a binary function |
| temperance-1 |
question |
Temperance uses the Data Generating Mechanism to answer the specific question with which we began. Humility reminds us that this answer is always a lie. We can also explore the general question by using the DGM to calculate many similar quantities of interest, displaying the results graphically. |
| temperance-2 |
question |
males have a 6% chance of getting arrested compared to the intercept |
| temperance-3 |
question |
White people have a 4% less chance of getting arrested than others |
| temperance-4 |
question |
The intercept is where the values intercect |
| temperance-5 |
question |
> library(marginaleffects)
> |
| temperance-6 |
question |
What si the collelation between sex, race, and zone with our outcome variable arrests |
| temperance-7 |
question |
> predictions(fit_stops)
> |
| temperance-8 |
question |
> plot_predictions(fit_stops, by = "sex")
> |
| temperance-9 |
question |
> plot_predictions(fit_stops, condition = "sex")
> |
| temperance-10 |
question |
> plot_predictions(fit_stops, condition = c("sex", "race"))
> |
| temperance-11 |
question |
preds <- augment(fit_stops, new_data = x)
preds |>
ggplot(aes(x = zone, y = .pred, color = race, linetype = sex, group = interaction(race, sex))) +
stat_summary(fun = mean, geom = "line", size = 1.2) +
stat_summary(fun = mean, geom = "point", size = 3, shape = 21, fill = "white") +
scale_color_manual(values = c("Black" = "#1f77b4", "White" = "#ff7f0e")) +
scale_linetype_manual(values = c("Male" = "solid", "Female" = "dashed")) +
labs(
title = "Predicted Arrest Rate by Race, Zone, and Sex",
subtitle = "Model-based averages from linear regression",
x = "Zone",
y = "Predicted Arrest Rate",
color = "Race",
linetype = "Sex"
) +
theme_minimal(base_size = 14) +
theme(
plot.title = element_text(face = "bold", size = 18),
legend.position = "right",
panel.grid.minor = element_blank()
) |
| temperance-12 |
question |
> tutorial.helpers::show_file("stops.qmd", chunk = "Last")
#| label: beautiful-plot
#| fig-cap: "Predicted Arrest Rate by Race, Zone, and Sex"
#| fig-width: 10
#| fig-height: 6
#| warning: false
preds |>
ggplot(aes(x = zone, y = .pred, color = race, linetype = sex, group = interaction(race, sex))) +
stat_summary(fun = mean, geom = "line", size = 1.2) +
stat_summary(fun = mean, geom = "point", size = 3, shape = 21, fill = "white") +
scale_color_manual(values = c("Black" = "#1f77b4", "White" = "#ff7f0e")) +
scale_linetype_manual(values = c("Male" = "solid", "Female" = "dashed")) +
labs(
title = "Predicted Arrest Rate by Race, Zone, and Sex",
subtitle = "Model-based averages from linear regression",
x = "Zone",
y = "Predicted Arrest Rate",
color = "Race",
linetype = "Sex"
) +
theme_minimal(base_size = 14) +
theme(
plot.title = element_text(face = "bold", size = 18),
legend.position = "right",
panel.grid.minor = element_blank()
)
> |
| temperance-13 |
question |
One quantity of intrest is how much more black epople ere arrested than white people |
| temperance-14 |
question |
it may be wrong since we only have a 95% confidence |
| temperance-15 |
question |
> tutorial.helpers::show_file("stops.qmd")
---
title: "Stops"
author: "Ayush Chandra"
format: html
execute:
echo: false
---
```{r}
#| message: false
library(tidyverse)
library(primer.data)
library(tidymodels)
library(broom)
library(marginaleffects)
x <- stops |>
filter(race %in% c("black", "white")) |>
mutate(race = str_to_title(race),
sex = str_to_title(sex))
```
```{r}
#| cache: true
fit_stops <- linear_reg() |>
set_engine("lm") |>
fit(arrested ~ sex + race*zone, data = x)
#| label: augment-model
#| message: false
#| warning: false
preds <- augment(fit_stops, new_data = x)
```
```{r}
#| label: tidy-table
#| echo: false
#| message: false
#| warning: false
library(gt)
tidy(fit_stops) |>
mutate(term = str_replace_all(term, "sex", "Sex: "),
term = str_replace_all(term, "race", "Race: "),
term = str_replace_all(term, "zone", "Zone: "),
estimate = round(estimate, 3),
std.error = round(std.error, 3),
statistic = round(statistic, 2),
p.value = signif(p.value, 3)) |>
gt() |>
tab_header(
title = "Linear Regression Results",
subtitle = "Effect of Race, Sex, and Zone on Arrest Probability"
) |>
cols_label(
term = "Predictor",
estimate = "Estimate",
std.error = "Std. Error",
statistic = "t value",
p.value = "p value"
) |>
fmt_number(columns = c(estimate, std.error, statistic, p.value), decimals = 3) |>
tab_style(
style = cell_text(weight = "bold"),
locations = cells_column_labels(everything())
) |>
opt_table_outline() |>
opt_row_striping()
```
```{r}
#| label: beautiful-plot
#| fig-cap: "Predicted Arrest Rate by Race, Zone, and Sex"
#| fig-width: 10
#| fig-height: 6
#| warning: false
preds |>
ggplot(aes(x = zone, y = .pred, color = race, linetype = sex, group = interaction(race, sex))) +
stat_summary(fun = mean, geom = "line", size = 1.2) +
stat_summary(fun = mean, geom = "point", size = 3, shape = 21, fill = "white") +
scale_color_manual(values = c("Black" = "#1f77b4", "White" = "#ff7f0e")) +
scale_linetype_manual(values = c("Male" = "solid", "Female" = "dashed")) +
labs(
title = "Predicted Arrest Rate by Race, Zone, and Sex",
subtitle = "Model-based averages from linear regression",
x = "Zone",
y = "Predicted Arrest Rate",
color = "Race",
linetype = "Sex"
) +
theme_minimal(base_size = 14) +
theme(
plot.title = element_text(face = "bold", size = 18),
legend.position = "right",
panel.grid.minor = element_blank()
)
```
Racial arguments in policing outcomes remain an important issue, particularly when examining how factors like race and location influence the likelihood of an arrest during traffic stops. Using data from a study of New Orleans drivers, we wish to understand how driver race can have an impact on the probability of getting arrested during a traffic stop. One weakness is that the study is taken from a small place in New Orleans, therefore it doesn't fully represent the entire US population. We model our outcome variable, arrested, as a linear function of sex, race, and zone. From this, we examined that Males are less likely of getting arrested than Females. We estimate that the probability of a Black driver in New Orleans getting arrested during a traffic stop is roughly 5% more than the probability for White drivers.
> |
| temperance-16 |
question |
https://smartsilver5294.github.io/stops/ |
| temperance-17 |
question |
https://github.com/SmartSilver5294/stops |
| minutes |
question |
180 |