| tutorial-id |
none |
stops |
| name |
question |
Abdul Hannan |
| email |
question |
abdul.hannan20008@gmail.com |
| introduction-1 |
question |
The four Cardinal Virtues, in order, that guide our data science work are:
1. Wisdom
2. Courage
3. Temperance
4. Justice |
| the-question-1 |
exercise |
library(tidyverse) |
| the-question-2 |
exercise |
library(primer.data) |
| the-question-3 |
question |
stops contains data from over 400,000 traffic stops in New Orleans from July 1, 2011 to July 18, 2018. The dataset includes information about the date, time, and location of each stop, as well as demographic details about the driver and the outcomes of the stop. |
| the-question-4 |
question |
The outcome variable is arrested, which is a binary variable showing whether a person was arrested (TRUE) or not (FALSE) during a traffic stop. |
| the-question-5 |
question |
We can create a new treatment variable called mask, which indicates whether a driver was wearing a mask during the stop (TRUE or FALSE). We might manipulate this by asking some drivers to wear a mask and others not to, then observe how it affects the chance of arrest. |
| the-question-6 |
question |
There are two potential outcomes for each person:
What would happen if they wore a mask.
What would happen if they did not wear a mask. |
| the-question-7 |
question |
Let’s say for one person:
If wearing a mask: not arrested (0)
If not wearing a mask: arrested (1)
The causal effect = 0 - 1 = -1
A causal effect of -1 means that, for this person, wearing a mask prevented an arrest that would otherwise have happened. |
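The arithmetic above can be sketched in R. This is only an illustration with made-up values: the three drivers and the column names `arrested_mask` and `arrested_no_mask` are hypothetical and do not come from `stops`.

```r
# Hypothetical potential outcomes for three drivers (made-up values).
# 1 = arrested, 0 = not arrested.
po <- data.frame(
  driver = c("A", "B", "C"),
  arrested_mask = c(0, 0, 1),     # outcome if the driver wears a mask
  arrested_no_mask = c(1, 0, 1)   # outcome if the driver does not
)

# Causal effect for each driver: outcome with mask minus outcome without.
po$causal_effect <- po$arrested_mask - po$arrested_no_mask

print(po)
```

Driver A reproduces the case described above (0 - 1 = -1). In real data only one of the two outcome columns is ever observed for a given driver, so a table like this could never be filled in completely.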
| the-question-8 |
question |
One variable that might help predict arrests is age. Different age groups might face different arrest rates. |
| the-question-9 |
question |
We can compare:
Black drivers
White drivers
These groups might have different average arrest rates. It’s important not to say one race “causes” more arrests, just that we observe differences between groups. |
| the-question-10 |
question |
What is the difference in arrest probability between Black and White drivers during traffic stops? |
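As a rough sketch of what this question asks for, the difference can be computed as the gap between two group proportions. The eight rows below are invented for illustration and are not taken from `stops`; only the column names `race` and `arrested` mirror the real data.

```r
# Toy data: eight hypothetical stops (not real observations).
toy <- data.frame(
  race = c("Black", "Black", "Black", "Black",
           "White", "White", "White", "White"),
  arrested = c(TRUE, TRUE, FALSE, FALSE,
               TRUE, FALSE, FALSE, FALSE)
)

# Share of stops ending in arrest, by race (mean of a logical = proportion).
rates <- tapply(toy$arrested, toy$race, mean)

# Difference in arrest probability, Black minus White.
diff_rate <- rates[["Black"]] - rates[["White"]]
diff_rate
```

A raw difference like this describes the groups but does not adjust for sex or zone, which is why the later sections fit a model.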
| wisdom-1 |
question |
Wisdom means asking good questions, thinking clearly about what we’re doing, and making sure we understand the problem before jumping into the data. It’s about understanding the goal, the context, and being thoughtful about what we analyze and why. |
| wisdom-2 |
question |
A Preceptor Table is a simple table that includes the outcome we care about and a few important covariates we’ll use to answer our main question. It shows one row per unit (like one person or one stop). |
| wisdom-3 |
question |
A Preceptor Table includes:
Units: the individual cases (like traffic stops)
Outcomes: what happened (e.g., arrested or not)
Covariates: characteristics that help explain the outcome (e.g., race, sex, time) |
| wisdom-4 |
question |
> show_file("stops.qmd")
---
title: "Stops"
format: html
---
> |
| wisdom-5 |
question |
Each unit is a single traffic stop. |
| wisdom-6 |
question |
The outcome is whether or not someone was arrested — this is the arrested variable. |
| wisdom-7 |
question |
Some useful covariates could include:
Race
Sex
Age
Zone or neighborhood
Time of day
Reason for stop
These are things that might affect whether someone gets arrested. |
| wisdom-8 |
question |
There is no treatment variable in this problem because it's a predictive model, not a causal one. But race and other covariates are key predictors. |
| wisdom-9 |
question |
It refers to the moment just after the traffic stop, when we know whether or not an arrest occurred. |
| wisdom-10 |
question |
A causal effect is the difference between what happens under two different scenarios: one where a treatment is applied and one where it isn’t. It's the difference between two potential outcomes for the same unit.
Imagine a person wearing a mask vs. not wearing a mask — the change in their chance of being arrested is the causal effect. |
| wisdom-11 |
question |
We can never observe both potential outcomes for the same unit at the same time — we only see one. That makes it impossible to directly measure causal effects. |
| wisdom-12 |
question |
Since we can't manipulate race or other variables here, we can’t make a true causal claim. We can only observe differences, not causes. |
| wisdom-13 |
question |
The Preceptor Table includes:
ID
Outcome: arrested
Covariates: race, sex, zone
Each row is one stop. |
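One way to picture this table is as a small data frame. The sketch below uses three hypothetical stops; the `id` values and covariate values are invented, and the outcome is left as `NA` because it is the quantity we want to know.

```r
# A hypothetical three-row Preceptor Table: one row per traffic stop.
preceptor <- data.frame(
  id = 1:3,
  arrested = c(NA, NA, NA),              # outcome: unknown, to be predicted
  race = c("Black", "White", "Black"),   # covariate
  sex = c("Male", "Female", "Female"),   # covariate
  zone = c("A", "B", "A")                # covariate
)

print(preceptor)
```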
| wisdom-14 |
question |
> show_file("stops.qmd", start = -5)
---
title: "Stops"
format: html
---
> show_file("stops.qmd", start = -5)
```{r}
library(tidyverse)
library(primer.data)
```
Warning message:
In readLines(path) : incomplete final line found on 'stops.qmd'
> |
| wisdom-15 |
question |
Validity means that the columns in our Preceptor Table match the columns in the actual data — they measure the same things in the same way. |
| wisdom-16 |
question |
Validity might not hold if, for example, the race column in the dataset is missing values or coded differently than expected — so it doesn’t match the Preceptor Table’s idea of race. |
| wisdom-17 |
question |
Differences in how people are treated by the police based on race are an important issue in government policy and fairness. This study uses traffic stop data from New Orleans, collected by the Open Policing Project, to look at differences in arrest rates between Black and White drivers. |
| wisdom-18 |
question |
tutorial.helpers::show_file("stops.qmd", chunk = "last")
---
title: "Stops"
format: html
# In YAML header:
execute:
echo: false
message: false
warning: false
---
```{r}
library(tidyverse)
library(primer.data)
```
```{r}
#| label: eda
x <- stops |>
filter(race %in% c("black", "white")) |>
mutate(race = str_to_title(race),
sex = str_to_title(sex))
```
# Summary:
Differences in how people are treated by the police based on race are an important issue in government policy and fairness. This study uses traffic stop data from New Orleans, collected by the Open Policing Project, to look at differences in arrest rates between Black and White drivers.
> |
| justice-1 |
question |
The four components of Justice in data science are:
Stability – relationships in data stay the same over time
Representativeness – data should reflect the bigger population
Unconfoundedness – no hidden variables affect our results
Awareness of potential bias – knowing where things might be unfair or unequal |
| justice-2 |
question |
A Population Table is a big imaginary table that includes all the people or units we care about — not just the ones in our dataset or Preceptor Table. It represents the full group we want to learn about. |
| justice-3 |
question |
Stability means that the relationships between variables (like how race relates to arrests) stay the same over time. So if a relationship held when the data was collected, it still holds later, when we apply the model. |
| justice-4 |
question |
Laws, police training, or public attitudes may have changed over time, which could change how race relates to arrest rates, even though our data comes from before those changes. |
| justice-5 |
question |
Representativeness means our data looks like the full population we care about. If it doesn’t, our results might not apply to the whole group. |
| justice-6 |
question |
The traffic stop data might only include certain areas or times, which means it doesn’t cover the whole city or population fairly — some groups might be underrepresented. |
| justice-7 |
question |
The Preceptor Table might focus on a very specific group (like older drivers), but the population includes all drivers. So the population might not match the smaller group we care most about. |
| justice-8 |
question |
Unconfoundedness means there are no hidden variables affecting both treatment and outcome. The best way to make this happen is by randomly assigning who gets the treatment. |
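Random assignment can be sketched as a coin flip per unit. The example below reuses the hypothetical `mask` treatment from the earlier answers; the number of drivers and the seed are arbitrary choices made for illustration.

```r
set.seed(1)       # arbitrary seed so the sketch is reproducible
n_drivers <- 10   # hypothetical number of drivers

# Flip a fair coin for each driver: TRUE = asked to wear a mask.
mask <- sample(c(TRUE, FALSE), size = n_drivers, replace = TRUE)

assignment <- data.frame(driver = seq_len(n_drivers), mask = mask)
print(assignment)
```

Because the coin flip ignores every characteristic of the driver, no hidden variable can systematically influence who receives the treatment.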
| justice-9 |
question |
Differences in how people are treated by the police based on race are an important issue in government policy and fairness. This study uses traffic stop data from New Orleans, collected by the Open Policing Project, to look at differences in arrest rates between Black and White drivers.
We are studying traffic stop data from New Orleans to see if Black and White drivers have different chances of getting arrested. We use this data to understand patterns in a larger population and apply them to our Preceptor Table. One issue is that the data may not represent all types of drivers or traffic stops, which could make our findings biased. |
| courage-1 |
question |
Courage in data analysis means being willing to start exploring and modeling even when you don’t know exactly what the results will show. You trust the process, learn from what you find, and keep going. |
| courage-2 |
exercise |
library(tidymodels) |
| courage-3 |
exercise |
library(broom) |
| courage-5 |
question |
> tutorial.helpers::show_file("stops.qmd", pattern = "library")
library(tidyverse)
library(primer.data)
library(broom)
library(tidymodels)
> |
| courage-6 |
exercise |
linear_reg(engine = "lm") |
| courage-7 |
exercise |
linear_reg(engine = "lm") %>%
fit(arrested ~ sex, data = x) |
| courage-8 |
exercise |
linear_reg(engine = "lm") %>%
fit(arrested ~ sex, data = x) %>%
tidy(conf.int = TRUE) |
| courage-9 |
exercise |
linear_reg(engine = "lm") %>%
fit(arrested ~ race, data = x) |
| courage-10 |
exercise |
linear_reg(engine = "lm") %>%
fit(arrested ~ race, data = x) %>%
tidy(conf.int = TRUE) |
| courage-11 |
exercise |
linear_reg(engine = "lm") %>%
fit(arrested ~ sex + race, data = x) |
| courage-12 |
exercise |
linear_reg(engine = "lm") %>%
fit(arrested ~ sex + race * zone, data = x) |
| courage-13 |
exercise |
fit_stops |
| courage-15 |
exercise |
library(easystats) |
| courage-17 |
exercise |
check_predictions(extract_fit_engine(fit_stops)) |
| courage-18 |
question |
$$
\widehat{\text{arrested}} = 0.177
+ 0.0614 \cdot \text{sex}_{\text{Male}}
- 0.0445 \cdot \text{race}_{\text{White}}
+ 0.0146 \cdot \text{zone}_{\text{B}}
+ \ldots
+ \text{(interaction terms)}
$$ |
| courage-19 |
question |
> tutorial.helpers::show_file("stops.qmd", pattern = "library")
library(tidyverse)
library(primer.data)
library(broom)
library(tidymodels)
Warning message:
In readLines(path) : incomplete final line found on 'stops.qmd'
> tutorial.helpers::show_file("stops.qmd", start = -8)
fit(arrested ~ sex + race * zone, data = x)
```
# Summary:
Differences in how people are treated by the police based on race are an important issue in government policy and fairness. This study uses traffic stop data from New Orleans, collected by the Open Policing Project, to look at differences in arrest rates between Black and White drivers.
We are studying traffic stop data from New Orleans to see if Black and White drivers have different chances of getting arrested. We use this data to understand patterns in a larger population and apply them to our Preceptor Table. One issue is that the data may not represent all types of drivers or traffic stops, which could make our findings biased.
> |
| courage-20 |
question |
> tutorial.helpers::show_file(".gitignore")
*_files/
*_cache
> |
| courage-21 |
exercise |
tidy(fit_stops, conf.int = TRUE) |
| courage-22 |
question |
> tutorial.helpers::show_file("stops.qmd", chunk = "Last")
# tidy data
tidy(fit_stops, conf.int = TRUE) %>%
select(term, estimate, conf.low, conf.high) %>%
gt() %>%
tab_header(title = "Model Estimates") %>%
tab_source_note(source_note = "Source: Open Policing Project")
> |
| courage-23 |
question |
We model the likelihood of being arrested — a binary outcome — as a logistic function of a person’s sex, race, and the zone where the stop happened, including interactions between race and zone. |
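Written out, the sentence above corresponds roughly to the following form. This is a sketch in the same notation as the fitted equation in these answers; the $\beta$ labels are placeholders, and the trailing dots stand for the remaining zone and interaction terms.

$$
P(\text{arrested} = 1) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 \text{sex}_{\text{Male}} + \beta_2 \text{race}_{\text{White}} + \beta_3 \text{zone}_{\text{B}} + \beta_4 \text{race}_{\text{White}} \cdot \text{zone}_{\text{B}} + \ldots)}}
$$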
| temperance-1 |
question |
Temperance means being careful and honest when using your model. Even if the model gives good answers, we shouldn’t act like it's the perfect truth. Models help us make better decisions, but they are based on assumptions, which may not always be correct. |
| temperance-2 |
question |
All else equal, being male increases the predicted chance of getting arrested by about 0.06 on the log-odds scale, compared to being female. |
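If the 0.06 coefficient is on the log-odds scale, as stated, exponentiating it gives an odds ratio, which is often easier to read. The sketch below assumes the value 0.0614 from the fitted equation in these answers; the variable names are illustrative.

```r
# Coefficient for sex = Male, taken from the fitted equation in these answers.
coef_male <- 0.0614

# On the log-odds scale, exp(coefficient) is the odds ratio
# for males relative to females, all else equal.
odds_ratio <- exp(coef_male)
round(odds_ratio, 3)
```

Under that assumption, an odds ratio slightly above 1 means the odds of arrest for males are a few percent higher than for females.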
| temperance-3 |
question |
White drivers are predicted to be slightly less likely to be arrested than Black drivers (the baseline group), by about 0.04 on the log-odds scale. |
| temperance-4 |
question |
The baseline group (Black females in Zone A) has a log-odds of 0.18 for being arrested. This is the starting point, and other variable values add or subtract from this. |
| temperance-5 |
exercise |
library(marginaleffects) |
| temperance-6 |
question |
General topic: Racial disparities in arrests during traffic stops in New Orleans
Specific question:
Are Black drivers more likely to be arrested than White drivers, after accounting for location (zone) and gender? |
| temperance-7 |
exercise |
plot_predictions(fit_stops, condition = c("sex", "race")) |
| temperance-8 |
exercise |
plot_predictions(fit_stops$fit,
newdata = "balanced",
condition = c("zone", "race", "sex"),
draw = FALSE) |> as_tibble() |>
group_by(zone, sex) |>
mutate(sort_order = estimate[race == "Black"]) |>
ungroup() |>
mutate(zone = reorder_within(zone, sort_order, sex)) |>
ggplot(aes(x = zone,
color = race)) +
geom_errorbar(aes(ymin = conf.low,
ymax = conf.high),
width = 0.2,
position = position_dodge(width = 0.5)) +
geom_point(aes(y = estimate),
size = 1,
position = position_dodge(width = 0.5)) +
facet_wrap(~ sex, scales = "free_x") +
scale_x_reordered() +
theme(axis.text.x = element_text(size = 8)) +
scale_y_continuous(labels = percent_format()) |
| temperance-9 |
question |
plot_predictions(fit_stops$fit,
newdata = "balanced",
condition = c("zone", "race", "sex"),
draw = FALSE) |>
as_tibble() |>
group_by(zone, sex) |>
mutate(sort_order = estimate[race == "Black"]) |>
ungroup() |>
mutate(zone = reorder_within(zone, sort_order, sex)) |>
ggplot(aes(x = zone, color = race)) +
geom_errorbar(aes(ymin = conf.low, ymax = conf.high),
width = 0.2, position = position_dodge(width = 0.5)) +
geom_point(aes(y = estimate),
size = 1, position = position_dodge(width = 0.5)) +
facet_wrap(~ sex, scales = "free_x") +
scale_x_reordered() +
theme(axis.text.x = element_text(size = 8)) +
scale_y_continuous(labels = scales::percent_format()) +
labs(
title = "Predicted Arrest Rates by Race, Sex, and Zone",
subtitle = "Black drivers—especially males—face higher predicted arrest rates in most zones",
caption = "Source: Open Policing Project — New Orleans Traffic Stop Data",
y = "Predicted Arrest Probability",
x = "Zone"
) |
| temperance-10 |
question |
> tutorial.helpers::show_file("stops.qmd", start = -8)
x = "Zone"
)
```
# Summary:
Differences in how people are treated by the police based on race are an important issue in government policy and fairness. This study uses traffic stop data from New Orleans, collected by the Open Policing Project, to look at differences in arrest rates between Black and White drivers.
We are studying traffic stop data from New Orleans to see if Black and White drivers have different chances of getting arrested. We use this data to understand patterns in a larger population and apply them to our Preceptor Table. One issue is that the data may not represent all types of drivers or traffic stops, which could make our findings biased. We model the likelihood of being arrested, a binary outcome, as a logistic function of a person’s sex, race, and the zone where the stop happened, including interactions between race and zone.
> |
| temperance-11 |
question |
The predicted arrest rate for Black males is 32%, compared to 24% for White females, with 95% confidence intervals of roughly ±2 percentage points around each estimate. |
| temperance-12 |
question |
Our model may be biased if we didn’t include all important variables (like officer identity or time of day). Maybe the real difference is smaller or larger. A better estimate might be 28% vs. 22%, if unmeasured factors were accounted for. |
| temperance-13 |
question |
> tutorial.helpers::show_file("stops.qmd")
---
title: "Stops"
format: html
# In YAML header:
execute:
echo: false
message: false
warning: false
freeze: true
---
```{r}
library(tidyverse)
library(primer.data)
library(broom)
library(tidymodels)
library(gt)
library(marginaleffects)
library(tidytext)
```
$$
P(Y = 1) = \frac{1}{1 + e^{-(\beta_0 + \beta_1X_1 + \beta_2X_2 + \cdots + \beta_nX_n)}}
\quad \text{with} \quad
Y \sim \text{Bernoulli}(\rho)
$$
It shows how we estimate the chance of something happening (like being arrested), based on different variables like race, sex, etc.
```{r}
#| label: eda
x <- stops |>
filter(race %in% c("black", "white")) |>
mutate(race = str_to_title(race),
sex = str_to_title(sex))
x <- x %>%
mutate(arrested = as.factor(arrested))
x <- x %>% slice_sample(n = 15000)
```
<br>
<br>
<br>
$$
\widehat{\text{arrested}} = 0.177
+ 0.0614 \cdot \text{sex}_{\text{Male}}
- 0.0445 \cdot \text{race}_{\text{White}}
+ 0.0146 \cdot \text{zone}_{\text{B}}
+ \ldots
+ \text{(interaction terms)}
$$
```{r}
#| cache: true
fit_stops <- logistic_reg(engine = "glm", mode = "classification") %>%
fit(arrested ~ sex + race * zone, data = x)
```
```{r}
# tidy data
tidy(fit_stops, conf.int = TRUE) %>%
select(term, estimate, conf.low, conf.high) %>%
gt() %>%
tab_header(title = "Logistic Regression Estimates") %>%
fmt_number(columns = 2:4, decimals = 3) %>%
tab_spanner(
label = "95% Confidence Interval",
columns = c(conf.low, conf.high)
)
```
<br>
<br>
```{r}
#| cache: true
plot_predictions(fit_stops$fit,
newdata = "balanced",
condition = c("zone", "race", "sex"),
draw = FALSE) |>
as_tibble() |>
group_by(zone, sex) |>
mutate(sort_order = estimate[race == "Black"]) |>
ungroup() |>
mutate(zone = reorder_within(zone, sort_order, sex)) |>
ggplot(aes(x = zone, color = race)) +
geom_errorbar(aes(ymin = conf.low, ymax = conf.high),
width = 0.2, position = position_dodge(width = 0.5)) +
geom_point(aes(y = estimate),
size = 1, position = position_dodge(width = 0.5)) +
facet_wrap(~ sex, scales = "free_x") +
scale_x_reordered() +
theme(axis.text.x = element_text(size = 8)) +
scale_y_continuous(labels = scales::percent_format()) +
labs(
title = "Predicted Arrest Rates by Race, Sex, and Zone",
subtitle = "Black drivers—especially males—face higher predicted arrest rates in most zones",
caption = "Source: Open Policing Project — New Orleans Traffic Stop Data",
y = "Predicted Arrest Probability",
x = "Zone"
)
```
# Summary:
Differences in how people are treated by the police based on race are an important issue in government policy and fairness. This study uses traffic stop data from New Orleans, collected by the Open Policing Project, to look at differences in arrest rates between Black and White drivers.
We are studying traffic stop data from New Orleans to see if Black and White drivers have different chances of getting arrested. We use this data to understand patterns in a larger population and apply them to our Preceptor Table. One issue is that the data may not represent all types of drivers or traffic stops, which could make our findings biased. We model the likelihood of being arrested, a binary outcome, as a logistic function of a person’s sex, race, and the zone where the stop happened, including interactions between race and zone. The predicted arrest rate for Black males is 32%, compared to 24% for White females, with a 95% confidence interval of roughly ±2%.
> |
| temperance-14 |
question |
https://abdul-hannan96.github.io/stops/ |
| temperance-15 |
question |
https://github.com/Abdul-Hannan96/stops.git |
| minutes |
question |
90 |