| tutorial-id |
none |
131-stops |
| name |
question |
Darakhshan Fatima |
| email |
question |
darakhshan.fatima110@gmail.com |
| introduction-1 |
question |
Wisdom, Justice, Courage, Temperance |
| introduction-2 |
question |
> show_file(".gitignore")
stops_files
> |
| introduction-3 |
question |
> cat(readLines("stops.qmd"), sep = "\n")
---
title: "Stops"
author: "DK"
format: html
---
> |
| introduction-4 |
question |
> library(tidyverse)
+ library(primer.data) |
| introduction-5 |
question |
This data is from the Stanford Open Policing Project, which aims to improve police accountability and transparency by providing data on traffic stops across the United States. The New Orleans dataset includes detailed information about traffic stops conducted by the New Orleans Police Department. |
| introduction-6 |
question |
A causal effect is the difference between two potential outcomes. |
| introduction-7 |
question |
The fundamental problem of causal inference is that we can observe only one potential outcome for any given unit, never both. |
| introduction-8 |
question |
arrested |
| introduction-9 |
question |
officer_aggressive_tone: 1 = officer used an aggressive or commanding tone, 0 = officer used a calm or respectful tone. It can be manipulated through training in de-escalation, communication, or community policing.
Another is officer_threatening_body_language: 1 = officer exhibited threatening or dominating body language (e.g., hand on weapon, standing too close, crossed arms), 0 = officer showed neutral or non-threatening posture. It can be manipulated through body language and conflict management training. |
| introduction-10 |
question |
A binary treatment variable like mask can take on two values, so each unit has two potential outcomes, one for each value:
mask = 1 (the person is wearing a mask)
mask = 0 (the person is not wearing a mask) |
| introduction-11 |
question |
Treatment variable mask has two values:
mask = 1: the person is wearing a mask
mask = 0: the person is not wearing a mask
Guess at the potential outcomes:
If the person wears a mask (mask = 1), they do not get arrested → Y1=0
If the person does not wear a mask (mask = 0), they get arrested → Y0=1
Causal effect for this unit = Y1 - Y0 = 0 - 1 = -1 |
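The arithmetic above can be sketched in R. The values are the guessed potential outcomes from this answer, not observed data, and the names `y_mask` and `y_no_mask` are hypothetical.

```r
# Guessed potential outcomes for one unit (1 = arrested, 0 = not arrested).
# These are illustrative guesses, not values from the stops data.
y_mask    <- 0  # outcome if the person wears a mask (mask = 1)
y_no_mask <- 1  # outcome if the person does not wear a mask (mask = 0)

# Causal effect for this unit: difference between the two potential outcomes.
causal_effect <- y_mask - y_no_mask
causal_effect
```

In practice only one of the two potential outcomes is ever observed for a given unit, so this difference cannot be computed directly from data.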
| introduction-12 |
question |
zone or race |
| introduction-13 |
question |
Group 1: Black drivers
Group 2: White drivers |
| introduction-14 |
question |
How does race affect the likelihood of being arrested during a traffic stop? |
| wisdom-1 |
question |
Wisdom requires a question, the creation of a Preceptor Table and an examination of our data. |
| wisdom-2 |
question |
A Preceptor Table is the smallest possible table of rows and columns such that, if no values were missing, we could easily calculate the quantity of interest. |
| wisdom-3 |
question |
The rows of the Preceptor Table are the units. The outcome is at least one of the columns. If the problem is causal, there will be at least two (potential) outcome columns. The other columns are covariates. If the problem is causal, at least one of the covariates will be considered a treatment. |
| wisdom-4 |
question |
individual drivers |
| wisdom-5 |
question |
arrested |
| wisdom-6 |
question |
race, age and zone |
| wisdom-7 |
question |
No treatment is needed as it is a predictive model. |
| wisdom-8 |
question |
moment of the traffic stop |
| wisdom-9 |
question |
In our Preceptor Table, the unit is the individual driver, the outcome variable is arrested, and the covariates are race and zone. There is no treatment because this is a predictive model. |
| wisdom-10 |
question |
Does the average arrest rate differ by race, across all traffic stops? |
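A minimal sketch of how this question translates into a calculation, using a small made-up data frame rather than the real data. The `toy` data frame and its values are hypothetical; the actual analysis would use the stops data from primer.data.

```r
# Hypothetical toy data; values are invented for illustration only.
toy <- data.frame(
  race     = c("Black", "Black", "Black", "Black",
               "White", "White", "White", "White"),
  arrested = c(1, 0, 1, 0, 0, 0, 1, 0)
)

# The mean of a 0/1 outcome within each group is that group's arrest rate.
rates <- tapply(toy$arrested, toy$race, mean)
rates

# Difference in average arrest rates between the two groups.
rate_diff <- rates[["Black"]] - rates[["White"]]
rate_diff
```

A difference computed this way is a predictive comparison between groups, not a causal effect of race.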
| wisdom-11 |
question |
Many researchers are interested in how demographic characteristics like race relate to outcomes such as being arrested during traffic stops. This dataset, collected by the Stanford Open Policing Project from over 400,000 stops, allows us to examine whether arrest rates differ by race. |
| justice-1 |
question |
Justice concerns the Population Table and the four key assumptions which underlie it: validity, stability, representativeness, and unconfoundedness. |
| justice-2 |
question |
Validity is the consistency, or lack thereof, in the columns of the data set and the corresponding columns in the Preceptor Table. |
| justice-3 |
question |
The assumption of validity might not hold because certain columns, like "arrested," could reflect personal biases of officers toward specific races, which would distort the true relationship between race and arrest outcomes. Additionally, if officers treated drivers differently based on factors like the type of car, but we don’t have a column for that in the data, then omitted variable bias may violate the assumption of validity. |
| justice-4 |
question |
The Population Table includes a row for each unit/time combination in the underlying population from which both the Preceptor Table and the data are drawn. |
| justice-5 |
question |
Unit = one individual driver who was stopped
Time = the date and time of the stop
Each row in the Population Table corresponds to a single traffic stop conducted on a specific driver at a specific date and time. |
| justice-6 |
question |
Stability means that the relationship between the columns in the Population Table is the same for three categories of rows: the data, the Preceptor Table, and the larger population from which both are drawn. |
| justice-7 |
question |
One reason why the assumption of stability might not hold in this case is that officer behavior and policing practices may vary across zones or over time. For example, in certain zones or during specific time periods, officers may be more likely to arrest drivers of certain races due to local policies or events. |
| justice-8 |
question |
Representativeness, or the lack thereof, concerns two relationships among the rows in the Population Table. The first is between the data and the other rows. The second is between the other rows and the Preceptor Table. |
| justice-9 |
question |
One reason the assumption of representativeness might not be true in this data is that it includes only Black and White drivers, while in reality, drivers of other races may also have been stopped and arrested. By excluding those other racial groups, the data may not fully represent the diversity of the actual driving population at that time and location, limiting the generalizability of any conclusions drawn. |
| justice-10 |
question |
One reason the assumption of representativeness might not be true in this case is that the Preceptor Table may not have been randomly selected from the population. If the officers in the Preceptor Table were chosen based on specific criteria (such as only including officers with a certain number of stops or from specific zones), then their behavior and outcomes may not reflect the full range of variation seen in the overall population. This non-random selection would make it difficult to generalize population-level insights to the Preceptor Table accurately. |
| justice-11 |
question |
Unconfoundedness means that the treatment assignment is independent of the potential outcomes, when we condition on pre-treatment covariates. |
| justice-12 |
question |
> library(tidyverse)
+ library(primer.data)
+ library(tidymodels)
── Attaching packages ──────────────── tidymodels 1.3.0 ──
✔ broom 1.0.8 ✔ rsample 1.3.0
✔ dials 1.4.0 ✔ tune 1.3.0
✔ infer 1.0.9 ✔ workflows 1.2.0
✔ modeldata 1.4.0 ✔ workflowsets 1.1.1
✔ parsnip 1.3.2 ✔ yardstick 1.3.2
✔ recipes 1.3.1
── Conflicts ─────────────────── tidymodels_conflicts() ──
✖ scales::discard() masks purrr::discard()
✖ dplyr::filter() masks stats::filter()
✖ recipes::fixed() masks stringr::fixed()
✖ purrr::is_null() masks testthat::is_null()
✖ dplyr::lag() masks stats::lag()
✖ rsample::matches() masks dplyr::matches(), tidyr::matches(), testthat::matches()
✖ yardstick::spec() masks readr::spec()
✖ recipes::step() masks stats::step()
• Search for functions across packages at https://www.tidymodels.org/find/
Warning message:
package ‘infer’ was built under R version 4.5.1
> |
| justice-13 |
question |
> library(tidyverse)
+ library(primer.data)
+ library(tidymodels)
+ library(broom)
> |
| justice-14 |
question |
$$
\rho = \mathbb{P}(Y = 1) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k)}}
$$ |
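The formula above can be evaluated numerically. The sketch below assumes made-up coefficients and covariate values (`beta` and `x_vals` are hypothetical, not estimates from the stops data) and checks a hand-written inverse-logit against base R's `plogis()`.

```r
# Inverse-logit: maps a linear predictor to a probability in (0, 1).
inv_logit <- function(eta) 1 / (1 + exp(-eta))

# Hypothetical coefficients and covariate values, for illustration only.
beta   <- c(intercept = -1.5, x1 = 0.8, x2 = -0.4)
x_vals <- c(1, 1, 0)  # the leading 1 multiplies the intercept

eta <- sum(beta * x_vals)  # beta_0 + beta_1 * X_1 + beta_2 * X_2
rho <- inv_logit(eta)
rho

# Base R's plogis() computes the same function.
all.equal(rho, plogis(eta))
```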
| justice-15 |
question |
A potential weakness in the model is that it assumes a linear relationship on the log-odds scale between the predictors and the outcome, which may not hold true if important interaction terms or nonlinear effects are omitted. |
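One way to probe this weakness is to compare an additive model with one that includes an interaction. The sketch below uses simulated data with hypothetical variables `g` and `z` and base R's `glm()`; it is an illustration of the idea, not the tutorial's own model.

```r
# Simulate a binary outcome whose log-odds include a g:z interaction.
# All names and coefficient values here are made up for illustration.
set.seed(1)
n   <- 500
g   <- rbinom(n, 1, 0.5)
z   <- rbinom(n, 1, 0.5)
eta <- -1 + 0.5 * g + 0.3 * z + 1.0 * g * z
y   <- rbinom(n, 1, plogis(eta))
sim <- data.frame(y, g, z)

# Additive model: assumes the effect of g on the log-odds is the same for all z.
fit_add <- glm(y ~ g + z, family = binomial, data = sim)

# Interaction model: lets the effect of g differ by z.
fit_int <- glm(y ~ g * z, family = binomial, data = sim)

# Likelihood-ratio comparison of the two nested models.
anova(fit_add, fit_int, test = "Chisq")
```

If the interaction model fits noticeably better, the additive log-odds assumption is probably too restrictive for those data.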
| courage-1 |
question |
Courage creates the data generating mechanism. |
| courage-2 |
exercise |
linear_reg(engine = "lm") |
| courage-3 |
exercise |
linear_reg(engine = "lm") |> fit(arrested ~ sex, data = x) |
| courage-4 |
exercise |
linear_reg(engine = "lm") |> fit(arrested ~ sex, data = x) |> tidy(conf.int = TRUE) |
| courage-5 |
exercise |
linear_reg(engine = "lm") |> fit(arrested ~ race, data = x) |
| courage-6 |
exercise |
linear_reg(engine = "lm") |> fit(arrested ~ race, data = x) |> tidy(conf.int = TRUE) |
| courage-7 |
exercise |
linear_reg(engine = "lm") |> fit(arrested ~ sex + race, data = x) |> tidy(conf.int = TRUE) |
| courage-8 |
exercise |
linear_reg(engine = "lm") |> fit(arrested ~ sex + race*zone, data = x) |> tidy(conf.int = TRUE) |
| courage-9 |
exercise |
fit_stops |
| courage-10 |
question |
> x <- stops |>
+ filter(race %in% c("black", "white")) |>
+ mutate(race = str_to_title(race),
+ sex = str_to_title(sex))
+
+ fit_stops <- linear_reg() |>
+ set_engine("lm") |>
+ fit(arrested ~ sex + race*zone, data = x)
> |
| courage-11 |
question |
> library(easystats)
# Attaching packages: easystats 0.7.5 (red = needs update)
✔ bayestestR 0.16.1 ✔ correlation 0.8.8
✖ datawizard 1.1.0 ✔ effectsize 1.0.1
✔ insight 1.3.1 ✔ modelbased 0.12.0
✔ performance 0.15.0 ✔ parameters 0.27.0
✔ report 0.6.1 ✔ see 0.11.0
Restart the R-Session and update packages with `easystats::easystats_update()`.
Warning message:
package ‘easystats’ was built under R version 4.5.1
> |
| courage-12 |
question |
> check_predictions(extract_fit_engine(fit_stops))
> |
| courage-13 |
question |
$$
\widehat{\mathbb{P}(\text{arrested} = 1)} =
\frac{1}{1 + \exp\left(- \left[
0.177
+ 0.0614 \cdot \text{sex}_{\text{Male}}
- 0.0445 \cdot \text{race}_{\text{White}}
+ 0.0146 \cdot \text{zone}_{\text{B}}
+ 0.00610 \cdot \text{zone}_{\text{C}}
+ 0.0781 \cdot \text{zone}_{\text{D}}
+ 0.00190 \cdot \text{zone}_{\text{E}}
- 0.00271 \cdot \text{zone}_{\text{F}}
+ 0.0309 \cdot \text{zone}_{\text{G}}
+ 0.0757 \cdot \text{zone}_{\text{H}}
+ \text{(interaction terms)}
\right] \right)}
$$ |
| courage-14 |
question |
> tutorial.helpers::show_file("stops.qmd", chunk = "Last")
#| cache: True
x <- stops |>
filter(race %in% c("black", "white")) |>
mutate(
race = str_to_title(race),
sex = str_to_title(sex),
arrested = factor(arrested) # This line is essential!
)
fit_stops <- logistic_reg() |>
set_engine("glm") |>
set_mode("classification") |>
fit(arrested ~ sex + race * zone, data = x)
> |
| courage-15 |
question |
> tutorial.helpers::show_file(".gitignore")
stops_files
*_cache
> |
| courage-16 |
exercise |
tidy(fit_stops, conf.int = TRUE) |
| minutes |
question |
180 |