| tutorial-id |
none |
131-stops |
| name |
question |
Naveed Ahmad |
| email |
question |
naveedgill4u@yahoo.com |
| introduction-1 |
question |
Wisdom, Justice, Courage, and Temperance |
| introduction-2 |
question |
show_file(".gitignore")
stops_files
> |
| introduction-3 |
question |
show_file("stops.qmd", chunk = "Last")
#| message: false
library(tidyverse)
library(primer.data)
> |
| introduction-4 |
question |
library(tidyverse)
+ library(primer.data)
── Attaching core tidyverse packages ───────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ readr 2.1.5
✔ forcats 1.0.0 ✔ stringr 1.5.1
✔ ggplot2 3.5.2 ✔ tibble 3.3.0
✔ lubridate 1.9.4 ✔ tidyr 1.3.1
✔ purrr 1.0.4
── Conflicts ─────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package to force all conflicts to become errors
> |
| introduction-5 |
question |
New Orleans Traffic Stops Data
Description
This data is from the Stanford Open Policing Project, which aims to improve police accountability and transparency by providing data on traffic stops across the United States. The New Orleans dataset includes detailed information about traffic stops conducted by the New Orleans Police Department. |
| introduction-6 |
question |
Difference between two potential outcomes |
| introduction-7 |
question |
Two potential outcomes cannot be studied about an individual at the same time |
| introduction-8 |
question |
Zone |
| introduction-9 |
question |
Guidance, if the driver was guided it will carry 9 scores for future avoidance of stop and if not then it will carry 3 scores to avoid future stop |
| introduction-10 |
question |
Two, either may be arrested or not |
| introduction-11 |
question |
if wearing mask 1 score
if not wearing mask 0 score
casual effect 1-0 = 1 |
| introduction-12 |
question |
sex |
| introduction-13 |
question |
female group may different average value of arrest then men |
| introduction-14 |
question |
what is the average difference of being arrested on the basis of race? |
| wisdom-1 |
question |
Wisdom guides for specific research question, preceptor table and assumption of validity |
| wisdom-2 |
question |
A table with units for each observation, outcome variable and covariates, if all values are filled, answer our research question. |
| wisdom-3 |
question |
A table with units for each observation, outcome variable and covariates |
| wisdom-4 |
question |
each stop |
| wisdom-5 |
question |
arrested |
| wisdom-6 |
question |
race, age, sex, place, time, date |
| wisdom-7 |
question |
Its predictive model, no treatment |
| wisdom-8 |
question |
present |
| wisdom-9 |
question |
Units for each individual stopped, arrested outcome variable, race, age, sex, zone, covariates |
| wisdom-10 |
question |
What is the average difference of arrest between white and black drivers adjusting sex and zone? |
| wisdom-11 |
question |
When stopped, the arrest of drivers may be predicted to very on some factors like race, sex, zone. In this study we will use data from the open policing project to predict the arrest by accounting for some covariates. |
| justice-1 |
question |
It involves creation of population table, and assumptions of validity, stability, representativeness and unconfoundedness. |
| justice-2 |
question |
The relationship between columns of preceptor table and columns of data set is similar |
| justice-3 |
question |
Columns of data may not have the similar columns in preceptor table, as individual stopped may have different attitude in both data and preceptor table |
| justice-4 |
question |
A table having rows with unit and time combination for recording all stopped with other variables |
| justice-5 |
question |
Unit is the row recording each stopped individual on specific time period |
| justice-6 |
question |
Stability means the relationship between columns of the population table is consistent with three categories of rows: rows of preceptor table, data and other rows of population table. |
| justice-7 |
question |
The police officer as well drivers and legal provisions may have changed over the time, behavior of drivers and police officers. |
| justice-8 |
question |
The preceptor table may have different drivers then in data and population table, or zone restrictions may have changed |
| justice-9 |
question |
The population may have drivers who were not educated as in the data set |
| justice-10 |
question |
The preceptor table may have individual drivers who were more trained then in population table. |
| justice-11 |
question |
Treatment assignment should be independent or unbiased |
| justice-12 |
question |
library(tidymodels)
── Attaching packages ─────────────────────────────────────── tidymodels 1.3.0 ── |
| justice-13 |
question |
library(broom) |
| justice-14 |
question |
Since $Y$ is a binary variable (with exactly two possible values), the probability family is **Bernoulli**:
$$
Y \sim \text{Bernoulli}(\rho)
$$
where $\rho$ is the probability that one of the two possible values — conventionally referred to as 1 (or TRUE) — occurs. By definition, $1 - \rho$ is the probability of the other value.
For a binary outcome variable, we use a **log-odds model**:
$$
\log\left(\frac{\rho}{1 - \rho}\right) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k
$$ |
| justice-15 |
question |
However, our assumptions that data is valid, stable and representative of all drivers if not proved fully, the predictions may very. Because, the divers or offcicers may have change in behavior, education, traing, and also change in zone may differ hence, our predictions about arrest may very from actual. |
| courage-1 |
question |
it decide about Data Generating Mechanizm |
| courage-2 |
exercise |
linear_reg(engine = "lm") |
| courage-3 |
exercise |
linear_reg(engine = "lm") |>
fit(arrested ~ sex, data = x) |
| courage-4 |
exercise |
linear_reg(engine = "lm") |>
fit(arrested ~ sex, data = x) |>
tidy(conf.int = TRUE) |
| courage-5 |
exercise |
linear_reg(engine = "lm") |>
fit(arrested ~ race, data = x) |
| courage-6 |
exercise |
linear_reg(engine = "lm") |>
fit(arrested ~ race, data = x) |>
tidy(conf.int = TRUE) |
| courage-7 |
exercise |
linear_reg(engine = "lm") |>
fit(arrested ~ sex + race, data = x) |>
tidy(conf.int = TRUE) |
| courage-8 |
exercise |
linear_reg(engine = "lm") |>
fit(arrested ~ sex + race*zone, data = x) |>
tidy(conf.int = TRUE) |
| courage-9 |
exercise |
fit_stops |
| courage-10 |
question |
fit_stops
parsnip model object
Call:
stats::lm(formula = arrested ~ sex + race * zone, data = data)
Coefficients:
(Intercept) sexMale raceWhite zoneB
0.1773298 0.0614460 -0.0445247 0.0146036
zoneC zoneD zoneE zoneF
0.0061012 0.0780600 0.0019025 -0.0027057
zoneG zoneH zoneI zoneJ
0.0308717 0.0757019 0.0330416 0.0237773
zoneK zoneL zoneM zoneN
0.0586687 -0.0038877 0.0393026 0.0139437
zoneO zoneP zoneQ zoneR
0.0232251 0.0140617 0.0126170 0.0119566
zoneS zoneT zoneU zoneV
0.0594727 0.0113267 0.0071986 0.0770051
zoneW zoneX zoneY raceWhite:zoneB
0.1143814 0.0057280 0.0386437 -0.0077384
raceWhite:zoneC raceWhite:zoneD raceWhite:zoneE raceWhite:zoneF
0.0065557 0.0294040 0.0068179 -0.0137965
raceWhite:zoneG raceWhite:zoneH raceWhite:zoneI raceWhite:zoneJ
0.0088500 0.0085970 -0.0339373 -0.0244272
raceWhite:zoneK raceWhite:zoneL raceWhite:zoneM raceWhite:zoneN
-0.0381747 -0.0075094 -0.0423222 -0.0566405
raceWhite:zoneO raceWhite:zoneP raceWhite:zoneQ raceWhite:zoneR
-0.0149832 0.0092133 -0.0544990 -0.0379411
raceWhite:zoneS raceWhite:zoneT raceWhite:zoneU raceWhite:zoneV
-0.0250048 -0.0272932 0.0383220 -0.0387945
raceWhite:zoneW raceWhite:zoneX raceWhite:zoneY
-0.1233162 0.0843196 -0.0002596
> |
| courage-11 |
question |
library(easystats)
# Attaching packages: easystats 0.7.4
✔ bayestestR 0.16.0 ✔ correlation 0.8.7
✔ datawizard 1.1.0 ✔ effectsize 1.0.1
✔ insight 1.3.0 ✔ modelbased 0.11.2
✔ performance 0.14.0 ✔ parameters 0.26.0
✔ report 0.6.1 ✔ see 0.11.0
> |
| courage-12 |
question |
check_predictions(extract_fit_engine(fit_stops))
> |
| courage-13 |
question |
$$
\widehat{\text{arrested}} = 0.177
+ 0.0614 \cdot \text{sex}_{\text{Male}}
- 0.0445 \cdot \text{race}_{\text{White}}
+ 0.0146 \cdot \text{zone}_{\text{B}}
+ 0.00610 \cdot \text{zone}_{\text{C}}
+ 0.0781 \cdot \text{zone}_{\text{D}}
+ 0.00190 \cdot \text{zone}_{\text{E}}
- 0.00271 \cdot \text{zone}_{\text{F}}
+ 0.0309 \cdot \text{zone}_{\text{G}}
+ 0.0757 \cdot \text{zone}_{\text{H}}
+ \text{(interaction terms for race and zone)}
$$ |
| courage-14 |
question |
tutorial.helpers::show_file("stops.qmd", chunk = "Last")
#| cache: true
x <- stops |>
filter(race %in% c("black", "white")) |>
mutate(race = str_to_title(race),
sex = str_to_title(sex))
fit_stops <- linear_reg() |>
set_engine("lm") |>
fit(arrested ~ sex + race*zone, data = x)
> |
| courage-15 |
question |
tutorial.helpers::show_file(".gitignore")
stops_files
*_cache
> |
| courage-16 |
exercise |
tidy(fit_stops, conf.int = TRUE) |
| courage-17 |
question |
tutorial.helpers::show_file("stops.qmd", chunk = "Last")
#| message: false
#| warning: false
stops_tbl <- tidy(fit_stops, conf.int = TRUE) |>
select(term, estimate, conf.low, conf.high) |>
mutate(across(c(estimate, conf.low, conf.high), round, 2)) |>
gt() |>
tab_header(
title = "Coefficients with 95% Confidence Intervals"
) |>
cols_label(
term = "Term",
estimate = "Estimate",
conf.low = "95% CI Lower",
conf.high = "95% CI Upper"
) |>
fmt_number(
columns = c(estimate, conf.low, conf.high),
decimals = 2
) |>
tab_options(
table.font.size = "small",
table.border.top.color = "gray80",
table.border.bottom.color = "gray80"
)
stops_tbl
> |
| courage-18 |
question |
We used logistic regression model to predict arrests as function of race, sex, and zone. |
| temperance-1 |
question |
Modesty in claims because our results are based upon assumptions |
| temperance-2 |
question |
Male has .06 more chances of arrest than female |
| temperance-3 |
question |
Race white has -.04 less chances of arrest |
| temperance-4 |
question |
There are 0.18 chances of a driver being arrested at stop |
| temperance-5 |
question |
library(marginaleffects)
Please cite the software developers who make your work possible.
One package: citation("package_name")
All project packages: softbib::softbib()
> |
| temperance-6 |
question |
what is the average difference of black and white drivers of being arrested on basis of sex, race and zone at each stop? |
| temperance-7 |
question |
predictions(fit_stops)
Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
0.179 0.00343 52.2 <0.001 Inf 0.173 0.186
0.142 0.00419 33.8 <0.001 828.0 0.133 0.150
0.250 0.00451 55.5 <0.001 Inf 0.241 0.259
0.142 0.00419 33.8 <0.001 828.0 0.133 0.150
0.232 0.01776 13.1 <0.001 127.6 0.198 0.267
--- 378457 rows omitted. See ?print.marginaleffects ---
0.208 0.00390 53.4 <0.001 Inf 0.201 0.216
0.270 0.00377 71.5 <0.001 Inf 0.262 0.277
0.270 0.00377 71.5 <0.001 Inf 0.262 0.277
0.270 0.00377 71.5 <0.001 Inf 0.262 0.277
0.189 0.00545 34.7 <0.001 874.0 0.179 0.200
Type: numeric |
| temperance-8 |
question |
predictions(fit_stops, by = "sex")
sex Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
Female 0.192 0.001234 156 <0.001 Inf 0.190 0.194
Male 0.254 0.000823 309 <0.001 Inf 0.253 0.256
Type: numeric |
| temperance-9 |
question |
predictions(fit_stops, condition = "sex")
Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
0.179 0.00343 52.2 <0.001 Inf 0.173 0.186
0.142 0.00419 33.8 <0.001 828.0 0.133 0.150
0.250 0.00451 55.5 <0.001 Inf 0.241 0.259
0.142 0.00419 33.8 <0.001 828.0 0.133 0.150
0.232 0.01776 13.1 <0.001 127.6 0.198 0.267
--- 378457 rows omitted. See ?print.marginaleffects ---
0.208 0.00390 53.4 <0.001 Inf 0.201 0.216
0.270 0.00377 71.5 <0.001 Inf 0.262 0.277
0.270 0.00377 71.5 <0.001 Inf 0.262 0.277
0.270 0.00377 71.5 <0.001 Inf 0.262 0.277
0.189 0.00545 34.7 <0.001 874.0 0.179 0.200
Type: numeric
> |
| temperance-10 |
question |
predictions(fit_stops, condition = c("sex", "race"))
Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
0.179 0.00343 52.2 <0.001 Inf 0.173 0.186
0.142 0.00419 33.8 <0.001 828.0 0.133 0.150
0.250 0.00451 55.5 <0.001 Inf 0.241 0.259
0.142 0.00419 33.8 <0.001 828.0 0.133 0.150
0.232 0.01776 13.1 <0.001 127.6 0.198 0.267
--- 378457 rows omitted. See ?print.marginaleffects ---
0.208 0.00390 53.4 <0.001 Inf 0.201 0.216
0.270 0.00377 71.5 <0.001 Inf 0.262 0.277
0.270 0.00377 71.5 <0.001 Inf 0.262 0.277
0.270 0.00377 71.5 <0.001 Inf 0.262 0.277
0.189 0.00545 34.7 <0.001 874.0 0.179 0.200
Type: numeric
> |
| temperance-11 |
question |
stops_pre_tbl |>
slice_sample(n = 1000) |>
mutate(zone = factor(zone)) |> # Ensures correct ordering of zones
ggplot(aes(y = zone, x = estimate, color = race, shape = sex)) +
geom_point(size = 2) +
geom_errorbarh(aes(xmin = conf.low, xmax = conf.high), height = 0.2) + # horizontal error bars
labs(
title = "Prediction of arrested by Zone, Race, and Sex",
subtitle = "Black and sex Male has the highest pobability of arrested",
x = "Predicted Probability",
y = "Zone"
) +
theme_minimal(base_size = 13) |
| temperance-12 |
question |
tutorial.helpers::show_file("stops.qmd", chunk = "Last")
stops_pre_tbl |>
slice_sample(n = 1000) |>
mutate(zone = factor(zone)) |> # Ensures correct ordering of zones
ggplot(aes(y = zone, x = estimate, color = race, shape = sex)) +
geom_point(size = 2) +
geom_errorbarh(aes(xmin = conf.low, xmax = conf.high), height = 0.2) + # horizontal error bars
labs(
title = "Prediction of arrested by Zone, Race, and Sex",
subtitle = "Black and sex Male has the highest pobability of arrested",
x = "Predicted Probability",
y = "Zone"
) +
theme_minimal(base_size = 13) |
| temperance-13 |
question |
The predictions of being arrested show that black, male drivers have the highest probability in all zones except zone x. The highest probability of being arrested is more than 35% for black drivers in zone w with variation in perdition 32% to 38%. |
| temperance-14 |
question |
The assumption that data drivers in our prediction represent the drivers in the population rest all things being constant but if our assumptions do not hold good then actual results may very as the attitude of police officer, training of drivers, wearing of safety halmets, etc may reduce chances of being arrested. |
| temperance-15 |
question |
tutorial.helpers::show_file("stops.qmd")
---
title: "Stops"
author: Naveed Ahmad
format: html
execute:
echo: false
---
```{r}
#| message: false
#| warning: false
library(tidyverse)
library(primer.data)
library(tidymodels)
library(broom)
library(gt)
library(marginaleffects)
library(ggplot2)
```
## Summary
<div style="text-align: justify;">When stopped, the arrest of drivers may be predicted to very on some factors like race, sex, zone. In this study we will use data from the open policing project to predict the arrest of drivers by adjusting for some covariates. However, our assumptions that data is valid, stable and representative of all drivers if not proved fully, the predictions may very. Because, the divers or offcicers may have change in behavior, education, traing, and also change in zone may differ hence, our predictions about arrest may very from actual. We used logistic regression model to predict arrested as function of race, sex, and zone. The predictions of being arrested show that black, male drivers have the highest probability in all zones except zone x. The highest probability of being arrested is more than 35% for black drivers in zone w with variation in perdition 32% to 38%.</div>
## Statistical Model
Since $Y$ is a binary variable (with exactly two possible values), the probability family is **Bernoulli**:
$$
Y \sim \text{Bernoulli}(\rho)
$$
where $\rho$ is the probability that one of the two possible values — conventionally referred to as 1 (or TRUE) — occurs. By definition, $1 - \rho$ is the probability of the other value.
For a binary outcome variable, we use a **log-odds model**:
$$
\log\left(\frac{\rho}{1 - \rho}\right) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k
$$
```{r}
#| cache: true
x <- stops |>
filter(race %in% c("black", "white")) |>
mutate(race = str_to_title(race),
sex = str_to_title(sex))
fit_stops <- linear_reg() |>
set_engine("lm") |>
fit(arrested ~ sex + race*zone, data = x)
```
## Table of Coefficients of Parameters
```{r}
#| message: false
#| warning: false
stops_tbl <- tidy(fit_stops, conf.int = TRUE) |>
select(term, estimate, conf.low, conf.high) |>
mutate(across(c(estimate, conf.low, conf.high), round, 2)) |>
gt() |>
tab_header(
title = "Coefficients with 95% Confidence Intervals"
) |>
cols_label(
term = "Term",
estimate = "Estimate",
conf.low = "95% CI Lower",
conf.high = "95% CI Upper"
) |>
fmt_number(
columns = c(estimate, conf.low, conf.high),
decimals = 2
) |>
tab_options(
table.font.size = "small",
table.border.top.color = "gray80",
table.border.bottom.color = "gray80"
)
stops_tbl
```
## Fited regression model
$$
\widehat{\text{arrested}} = 0.177
+ 0.0614 \cdot \text{sex}_{\text{Male}}
- 0.0445 \cdot \text{race}_{\text{White}}
+ 0.0146 \cdot \text{zone}_{\text{B}}
+ 0.00610 \cdot \text{zone}_{\text{C}}
+ 0.0781 \cdot \text{zone}_{\text{D}}
+ 0.00190 \cdot \text{zone}_{\text{E}}
- 0.00271 \cdot \text{zone}_{\text{F}}
+ 0.0309 \cdot \text{zone}_{\text{G}}
+ 0.0757 \cdot \text{zone}_{\text{H}}
+ \text{(interaction terms for race and zone)}
$$
this is our Data Generating Mechanism
```{r}
#| message: false
#| warning: false
stops_pre_tbl <- predictions(fit_stops, condition = c("sex", "race"))
```
## Plot showing prediction of arrested
```{r}
stops_pre_tbl |>
slice_sample(n = 1000) |>
mutate(zone = factor(zone)) |> # Ensures correct ordering of zones
ggplot(aes(y = zone, x = estimate, color = race, shape = sex)) +
geom_point(size = 2) +
geom_errorbarh(aes(xmin = conf.low, xmax = conf.high), height = 0.2) + # horizontal error bars
labs(
title = "Prediction of arrested by Zone, Race, and Sex",
subtitle = "Black and sex Male has the highest pobability of arrested",
x = "Predicted Probability",
y = "Zone"
) +
theme_minimal(base_size = 13)
```
> |
| temperance-16 |
question |
https://naveedgill4u.github.io/stops/ |
| temperance-17 |
question |
https://github.com/naveedgill4u/stops |
| minutes |
question |
210 |