| tutorial-id |
none |
131-stops |
| name |
question |
Shuntaro Kawakami |
| email |
question |
skawakam@hotmail.com |
| introduction-1 |
question |
Wisdom, Justice, Courage, Temperance |
| introduction-2 |
question |
> show_file(".gitignore")
stops_files |
| introduction-3 |
question |
> show_file("stops.qmd", chunk = "Last")
library(tidyverse)
library(primer.data) |
| introduction-4 |
question |
> library(tidyverse) |
| introduction-5 |
question |
Description
This data is from the Stanford Open Policing Project, which aims to improve police accountability and transparency by providing data on traffic stops across the United States. The New Orleans dataset includes detailed information about traffic stops conducted by the New Orleans Police Department. |
| introduction-6 |
question |
The difference between the potential outcome under treatment and the potential outcome under control |
| introduction-7 |
question |
It is not possible to observe different outcomes for the same unit at the same time. |
| introduction-8 |
question |
arrested |
| introduction-9 |
question |
Whether or not the driver received a ticket |
| introduction-10 |
question |
two |
| introduction-11 |
question |
If the driver is wearing a mask, they are more likely to be arrested |
| introduction-12 |
question |
race |
| introduction-13 |
question |
White vs Black |
| introduction-14 |
question |
Are Black people more likely to be arrested? |
| wisdom-1 |
question |
Preceptor Table |
| wisdom-2 |
question |
The smallest possible table of data with rows and columns such that, if there is no missing data, we can easily calculate the quantities of interest. |
| wisdom-3 |
question |
Unit = The things or individuals on which data is collected.
Outcome = The main variable(s) you are trying to predict or explain.
Covariate = Variables that are used to help explain or predict the outcome. |
| wisdom-4 |
question |
Driver |
| wisdom-5 |
question |
arrest |
| wisdom-6 |
question |
race |
| wisdom-7 |
question |
No treatment |
| wisdom-8 |
question |
Current |
| wisdom-9 |
question |
id, arrested, race |
| wisdom-10 |
question |
Does race affect the number of drivers arrested? |
| wisdom-11 |
question |
We are interested in the pattern of arrests among drivers who are pulled over. One pattern we want to examine is whether arrest is correlated with race. |
| justice-1 |
question |
Population Table, validity, stability, representativeness, unconfoundedness |
| justice-2 |
question |
Validity is about columns in population table and data. In order to consider the two data sets to be drawn from the same population, the columns from one must have a valid correspondence with the columns in the other. |
| justice-3 |
question |
The arrested column in the data or Population Table may not necessarily correspond to the arrest outcome in the Preceptor Table. |
| justice-4 |
question |
The Population Table includes a row for each unit/time combination in the underlying population from which both the Preceptor Table and the data are drawn. It can be constructed if the validity assumption is (mostly) true. |
| justice-5 |
question |
Drivers pulled over
Data collected from July 1, 2011 to July 18, 2018 |
| justice-6 |
question |
Stability means that the relationship between the columns in the Population Table is the same for three categories of rows: the data, the Preceptor Table, and the larger population from which both are drawn. |
| justice-7 |
question |
The arrest trend in the collected data may not represent the current arrest trend. |
| justice-8 |
question |
Representativeness refers to the idea that the data used for analysis (such as a sample or training dataset) accurately reflects the larger population or process from which it was drawn. |
| justice-9 |
question |
Drivers in New Orleans may drive differently from drivers in other locations in the US. |
| justice-10 |
question |
Drivers in New Orleans may drive differently from drivers in other locations in the US. |
| justice-11 |
question |
Unconfoundedness means that the treatment assignment is independent of the potential outcomes. The easiest way to ensure unconfoundedness is to assign treatment randomly. |
| justice-12 |
question |
> library(tidymodels)
── Attaching packages ─────────────────────────────── tidymodels 1.3.0 ──
✔ broom 1.0.8 ✔ rsample 1.3.0
✔ dials 1.4.0 ✔ tune 1.3.0
✔ infer 1.0.9 ✔ workflows 1.2.0
✔ modeldata 1.4.0 ✔ workflowsets 1.1.1
✔ parsnip 1.3.2 ✔ yardstick 1.3.2
✔ recipes 1.3.1
── Conflicts ────────────────────────────────── tidymodels_conflicts() ──
✖ scales::discard() masks purrr::discard()
✖ dplyr::filter() masks stats::filter()
✖ recipes::fixed() masks stringr::fixed()
✖ dplyr::lag() masks stats::lag()
✖ yardstick::spec() masks readr::spec()
✖ recipes::step() masks stats::step()
• Use tidymodels_prefer() to resolve common conflicts. |
| justice-13 |
question |
> library(broom) |
| justice-14 |
question |
\[
\log\left( \frac{\rho}{1 - \rho} \right) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k
\] |
| justice-15 |
question |
Racial disparities in policing outcomes remain a pressing concern, particularly when examining how factors like race and location influence the likelihood of arrest during traffic stops. Using data from a study of New Orleans drivers, we seek to understand the relationship between driver race and the probability of getting arrested during a traffic stop. |
| courage-1 |
question |
Intellectual Honesty
Speaking Truth to Power
Transparency & Accountability
Ethics & Privacy
Perseverance |
| courage-2 |
exercise |
linear_reg(engine = "lm") |
| courage-3 |
exercise |
linear_reg(engine = "lm") |> fit(arrested ~ sex, data = x) |
| courage-4 |
exercise |
linear_reg(engine = "lm") |> fit(arrested ~ sex, data = x) |> tidy(conf.int = TRUE) |
| courage-5 |
exercise |
linear_reg(engine = "lm") |> fit(arrested ~ race, data = x) |> tidy(conf.int = TRUE) |
| courage-6 |
exercise |
linear_reg(engine = "lm") |> fit(arrested ~ race, data = x) |> tidy(conf.int = TRUE) |
| courage-7 |
exercise |
linear_reg(engine = "lm") |> fit(arrested ~ sex + race, data = x) |> tidy(conf.int = TRUE) |
| courage-8 |
exercise |
linear_reg(engine = "lm") |> fit(arrested ~ sex + race*zone, data = x) |> tidy(conf.int = TRUE) |
| courage-9 |
exercise |
fit_stops |
| courage-10 |
question |
x <- stops |>
filter(race %in% c("black", "white")) |>
mutate(race = str_to_title(race),
sex = str_to_title(sex))
fit_stops <- linear_reg() |>
set_engine("lm") |>
fit(arrested ~ sex + race*zone, data = x) |
| courage-11 |
question |
> library(easystats)
# Attaching packages: easystats 0.7.4 (red = needs update)
✖ bayestestR 0.16.0 ✖ correlation 0.8.7
✖ datawizard 1.1.0 ✔ effectsize 1.0.1
✖ insight 1.3.0 ✖ modelbased 0.11.2
✖ performance 0.14.0 ✖ parameters 0.26.0
✔ report 0.6.1 ✔ see 0.11.0
Restart the R-Session and update packages with `easystats::easystats_update()`. |
| courage-12 |
question |
> check_predictions(extract_fit_engine(fit_stops)) |
| courage-13 |
question |
\[
\hat{Y} = \text{logit}^{-1} \left(
-2.43
+ 0.01 \cdot \text{age}
+ 0.04 \cdot \text{sex}_{\text{Male}}
+ 0.09 \cdot \text{treatment}_{\text{Civic Duty}}
+ 0.07 \cdot \text{treatment}_{\text{Hawthorne}}
+ 0.20 \cdot \text{treatment}_{\text{Self}}
+ 0.36 \cdot \text{treatment}_{\text{Neighbors}}
+ 0.82 \cdot \text{voter\_class}_{\text{Sometimes Vote}}
+ 1.61 \cdot \text{voter\_class}_{\text{Always Vote}}
+ 0.03 \cdot \text{treatment}_{\text{Civic Duty}} \times \text{voter\_class}_{\text{Sometimes Vote}}
\right)
\] |
| courage-14 |
question |
#| cache: true
x <- stops |>
filter(race %in% c("black", "white")) |>
mutate(race = str_to_title(race),
sex = str_to_title(sex))
fit_stops <- linear_reg() |>
set_engine("lm") |>
fit(arrested ~ sex + race*zone, data = x) |
| courage-15 |
question |
> tutorial.helpers::show_file(".gitignore")
stops_files
*_cache |
| courage-16 |
exercise |
tidy(fit_stops, conf.int=TRUE) |
| courage-17 |
question |
> tutorial.helpers::show_file("stops.qmd", chunk = "Last")
#| cache: true
x <- stops |>
filter(race %in% c("black", "white")) |>
mutate(race = str_to_title(race),
sex = str_to_title(sex))
fit_stops <- linear_reg() |>
set_engine("lm") |>
fit(arrested ~ sex + race*zone, data = x)
tidy(fit_stops, conf.int=TRUE) |
| temperance-1 |
question |
Temperance guides us in the use of the model we have created to answer the questions with which we began. |
| temperance-2 |
question |
The estimate of 0.06 for sexMale means that, holding race and zone constant, being male is associated with a 0.06 increase (about 6 percentage points) in the predicted probability of arrest compared to being female (the reference group).
Since the 95% confidence interval (0.0585 to 0.0644) does not contain zero, this difference is statistically significant at the 5% level. |
| temperance-3 |
question |
The estimate of -0.04 for raceWhite means that, holding sex constant, being White is associated with a 0.04 decrease (about 4 percentage points) in the predicted probability of arrest compared to Black drivers, the reference race category (the data is filtered to Black and White drivers only). Because the model includes a race*zone interaction, this difference applies to the reference zone.
Since the 95% confidence interval (-0.057 to -0.032) does not include zero, this difference is statistically significant at the 5% level. |
| temperance-4 |
question |
The estimate of 0.18 for the intercept means that, when all predictor variables are at their reference levels (female, Black, and in zone A), the predicted probability of arrest is approximately 0.18.
The 95% confidence interval (0.171 to 0.184) is narrow and does not include zero, indicating that the intercept is statistically significant. |
| temperance-5 |
question |
> library(marginaleffects) |
| temperance-6 |
question |
Is race correlated with whether a driver is arrested? |
| temperance-7 |
question |
> predictions(fit_stops)
Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
0.179 0.00343 52.2 <0.001 Inf 0.173 0.186
0.142 0.00419 33.8 <0.001 828.0 0.133 0.150
0.250 0.00451 55.5 <0.001 Inf 0.241 0.259
0.142 0.00419 33.8 <0.001 828.0 0.133 0.150
0.232 0.01776 13.1 <0.001 127.6 0.198 0.267
--- 378457 rows omitted. See ?print.marginaleffects ---
0.208 0.00390 53.4 <0.001 Inf 0.201 0.216
0.270 0.00377 71.5 <0.001 Inf 0.262 0.277
0.270 0.00377 71.5 <0.001 Inf 0.262 0.277
0.270 0.00377 71.5 <0.001 Inf 0.262 0.277
0.189 0.00545 34.7 <0.001 874.0 0.179 0.200
Type: numeric |
| temperance-8 |
question |
> predictions(fit_stops, by="sex")
sex Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
Female 0.192 0.001234 156 <0.001 Inf 0.190 0.194
Male 0.254 0.000823 309 <0.001 Inf 0.253 0.256
Type: numeric |
| temperance-9 |
question |
Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
0.179 0.00343 52.2 <0.001 Inf 0.173 0.186
0.142 0.00419 33.8 <0.001 828.0 0.133 0.150
0.250 0.00451 55.5 <0.001 Inf 0.241 0.259
0.142 0.00419 33.8 <0.001 828.0 0.133 0.150
0.232 0.01776 13.1 <0.001 127.6 0.198 0.267
--- 378457 rows omitted. See ?print.marginaleffects ---
0.208 0.00390 53.4 <0.001 Inf 0.201 0.216
0.270 0.00377 71.5 <0.001 Inf 0.262 0.277
0.270 0.00377 71.5 <0.001 Inf 0.262 0.277
0.270 0.00377 71.5 <0.001 Inf 0.262 0.277
0.189 0.00545 34.7 <0.001 874.0 0.179 0.200
Type: numeric |
| temperance-10 |
question |
Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
0.179 0.00343 52.2 <0.001 Inf 0.173 0.186
0.142 0.00419 33.8 <0.001 828.0 0.133 0.150
0.250 0.00451 55.5 <0.001 Inf 0.241 0.259
0.142 0.00419 33.8 <0.001 828.0 0.133 0.150
0.232 0.01776 13.1 <0.001 127.6 0.198 0.267
--- 378457 rows omitted. See ?print.marginaleffects ---
0.208 0.00390 53.4 <0.001 Inf 0.201 0.216
0.270 0.00377 71.5 <0.001 Inf 0.262 0.277
0.270 0.00377 71.5 <0.001 Inf 0.262 0.277
0.270 0.00377 71.5 <0.001 Inf 0.262 0.277
0.189 0.00545 34.7 <0.001 874.0 0.179 0.200
Type: numeric |
| temperance-11 |
question |
plot_predictions(extract_fit_engine(fit_stops),
condition = c("zone", "race", "sex")) +
labs(
title = "Sex, Race, and Zone Predict Differences in the Outcome",
subtitle = "Men and Black individuals in Zone A have notably different predicted values compared to others.",
x = "Zone",
y = "Predicted Probability of Arrest"
) +
theme_minimal(base_size = 14) +
theme(
plot.title = element_text(face = "bold", size = 16),
plot.subtitle = element_text(size = 13, margin = margin(b = 10)),
axis.title.x = element_text(size = 13),
axis.title.y = element_text(size = 13)
) |
| temperance-12 |
question |
> tutorial.helpers::show_file("stops.qmd", chunk = "Last")
#| cache: true
x <- stops |>
filter(race %in% c("black", "white")) |>
mutate(race = str_to_title(race),
sex = str_to_title(sex))
fit_stops <- linear_reg() |>
set_engine("lm") |>
fit(arrested ~ sex + race*zone, data = x)
tidy(fit_stops, conf.int=TRUE) |
| temperance-13 |
question |
T |
| temperance-14 |
question |
The estimates and confidence intervals for the quantities of interest might be wrong or misleading if key modeling assumptions are violated. For example, if the model suffers from omitted variable bias, important predictors that influence the outcome (like socioeconomic status or regional differences) might be missing, leading to biased coefficient estimates. Similarly, if the relationships between predictors and the outcome are nonlinear or if there are interactions not captured in the model, the linear estimates could misrepresent the true effects. |
| temperance-15 |
question |
> tutorial.helpers::show_file("stops.qmd")
---
title: "Stops"
format: html
author: "Shuntaro Kawakami"
execute:
echo: false
---
```{r}
library(tidyverse)
library(primer.data)
library(tidymodels)
library(broom)
library(marginaleffects)
```
Racial disparities in policing outcomes remain a pressing concern, particularly when examining how factors like race and location influence the likelihood of arrest during traffic stops. Using data from a study of New Orleans drivers, we seek to understand the relationship between driver race and the probability of getting arrested during a traffic stop.
```{r}
#| cache: true
x <- stops |>
filter(race %in% c("black", "white")) |>
mutate(race = str_to_title(race),
sex = str_to_title(sex))
fit_stops <- linear_reg() |>
set_engine("lm") |>
fit(arrested ~ sex + race*zone, data = x)
tidy(fit_stops, conf.int=TRUE)
```
Using data from a study of New Orleans drivers, we seek to understand the relationship between driver race and the probability of getting arrested during a traffic stop. However, the data may not fully represent the population described by our Preceptor Table, since the two may not come from the same time frame, and some of the data may come from biased officers who target certain groups of individuals. |
| temperance-16 |
question |
https://skawakamNY.github.io/stops |
| temperance-17 |
question |
https://github.com/skawakamNY/stops |
| minutes |
question |
90 |
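
The coefficient readings in temperance-2 through temperance-4 can be checked by hand: in a linear model with a `race*zone` interaction, a prediction is the sum of the intercept and every coefficient that applies to that row. The sketch below is only an illustration under stated assumptions. It uses base R `lm()` on simulated data (the data frame `sim`, its two zones, and the probabilities are all made up), not the `stops` data or the actual `fit_stops` object.

```r
# Simulated stand-in for the stops data; every value here is made up.
set.seed(1)
n <- 2000
sim <- data.frame(
  sex  = sample(c("Female", "Male"), n, replace = TRUE),
  race = sample(c("Black", "White"), n, replace = TRUE),
  zone = sample(c("A", "B"), n, replace = TRUE)
)

# Hypothetical arrest probabilities, chosen only for illustration.
p <- 0.18 + 0.06 * (sim$sex == "Male") - 0.04 * (sim$race == "White") +
  0.03 * (sim$zone == "B")
sim$arrested <- rbinom(n, size = 1, prob = p)

# Same formula shape as fit_stops, fitted with base R lm().
fit_sim <- lm(arrested ~ sex + race * zone, data = sim)
b <- coef(fit_sim)

# Prediction for a White male in zone B, built by hand:
# intercept + sex term + race term + zone term + race-by-zone interaction.
by_hand <- unname(
  b["(Intercept)"] + b["sexMale"] + b["raceWhite"] +
    b["zoneB"] + b["raceWhite:zoneB"]
)

# The same prediction from predict().
from_predict <- unname(predict(
  fit_sim,
  newdata = data.frame(sex = "Male", race = "White", zone = "B")
))

# TRUE when the two calculations agree.
all.equal(by_hand, from_predict)
```

For a driver in the reference zone, the `zoneB` and `raceWhite:zoneB` terms drop out, which is why the raceWhite coefficient on its own describes the White versus Black difference in the reference zone only.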