id submission_type answer
tutorial-id none 131-stops
name question Shuntaro Kawakami
email question skawakam@hotmail.com
introduction-1 question Wisdom, Justice, Courage, Temperance
introduction-2 question > show_file(".gitignore") stops_files
introduction-3 question > show_file("stops.qmd", chunk = "Last") library(tidyverse) library(primer.data)
introduction-4 question > library(tidyverse)
introduction-5 question Description This data is from the Stanford Open Policing Project, which aims to improve police accountability and transparency by providing data on traffic stops across the United States. The New Orleans dataset includes detailed information about traffic stops conducted by the New Orleans Police Department.
introduction-6 question The difference between the potential outcome under treatment and the potential outcome under control.
introduction-7 question It is not possible to observe both potential outcomes for the same unit at the same time.
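The two answers above define an individual causal effect as a difference of potential outcomes and note that both can never be seen at once. Here is a minimal R sketch of that idea. It uses a hypothetical data frame `po` with made-up values, and nothing in it comes from the stops data.

```r
# Hypothetical potential outcomes for three drivers (1 = arrested, 0 = not).
# These numbers are made up; real data never contain both outcome columns.
po <- data.frame(
  driver    = c("a", "b", "c"),
  y_treat   = c(1, 0, 1),          # outcome under treatment
  y_control = c(0, 0, 1),          # outcome under control
  treated   = c(TRUE, FALSE, TRUE) # which condition each driver actually got
)

# Individual causal effect: treatment outcome minus control outcome
po$effect <- po$y_treat - po$y_control

# What we could actually observe: only the outcome for the assigned condition
po$obs_treat   <- ifelse(po$treated, po$y_treat, NA)
po$obs_control <- ifelse(po$treated, NA, po$y_control)

print(po[, c("driver", "treated", "obs_treat", "obs_control")])
```

Each row of the observed columns has exactly one missing value, so the `effect` column could never be computed from real data.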
introduction-8 question arrested
introduction-9 question Whether or not the driver received a ticket.
introduction-10 question two
introduction-11 question Whether the driver is wearing a mask; a driver wearing a mask might be more likely to be arrested.
introduction-12 question race
introduction-13 question White vs Black
introduction-14 question Are Black drivers more likely to be arrested?
wisdom-1 question Preceptor Table
wisdom-2 question The smallest possible table of data, with rows and columns, such that if there is no missing data, we can easily calculate the quantities of interest.
wisdom-3 question Unit = The things or individuals on which data is collected. Outcome = The main variable(s) you are trying to predict or explain. Covariate = Variables that are used to help explain or predict the outcome.
wisdom-4 question Driver
wisdom-5 question arrest
wisdom-6 question race
wisdom-7 question No treatment
wisdom-8 question Current
wisdom-9 question id, arrested, race
wisdom-10 question Does race affect the number of drivers arrested?
wisdom-11 question We are interested in the patterns of arrest among drivers who are pulled over. In particular, we want to find out whether race is correlated with being arrested.
justice-1 question Population Table, validity, stability, representativeness, unconfoundedness
justice-2 question Validity concerns the columns in the Preceptor Table and in the data. In order to consider the two data sets to be drawn from the same population, the columns from one must have a valid correspondence with the columns in the other.
justice-3 question The arrested column in the data may not necessarily represent the same thing as the arrested outcome in the Preceptor Table.
justice-4 question The Population Table includes a row for each unit/time combination in the underlying population from which both the Preceptor Table and the data are drawn. It can be constructed if the validity assumption is (mostly) true.
justice-5 question Drivers pulled over in New Orleans, with data collected between July 1, 2011 and July 18, 2018.
justice-6 question Stability means that the relationship between the columns in the Population Table is the same for three categories of rows: the data, the Preceptor Table, and the larger population from which both are drawn.
justice-7 question The arrest patterns in the data, which were collected between 2011 and 2018, may not represent current arrest patterns.
justice-8 question Representativeness refers to the idea that the data used for analysis (such as a sample or training dataset) accurately reflects the larger population or process from which it was drawn.
justice-9 question Drivers in New Orleans may drive differently from drivers in other locations in the US.
justice-10 question Drivers in New Orleans may drive differently from drivers in other locations in the US.
justice-11 question Unconfoundedness means that the treatment assignment is independent of the potential outcomes (The easiest way to ensure unconfoundedness is to assign treatment randomly)
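To illustrate why random assignment supports unconfoundedness, here is a small simulation sketch. The pre-treatment trait `z`, the sample size, and both assignment rules are hypothetical. The point is only that a coin-flip assignment leaves `z` balanced across groups, while an assignment that depends on `z` does not.

```r
set.seed(1)
n <- 10000
z <- rnorm(n)  # hypothetical pre-treatment trait that could affect the outcome

# Random assignment: a fair coin flip that ignores z
treat_random <- rbinom(n, 1, 0.5)

# Non-random assignment: units with larger z are more likely to be treated
treat_confounded <- rbinom(n, 1, plogis(2 * z))

# Difference in the average of z between treated and control units
balance <- function(treat) mean(z[treat == 1]) - mean(z[treat == 0])

print(c(random = balance(treat_random), confounded = balance(treat_confounded)))
```

Under random assignment the difference should be near zero. Under the `z`-dependent rule it is clearly positive, so a naive treated-versus-control comparison would mix the treatment effect with the effect of `z`.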
justice-12 question > library(tidymodels) ── Attaching packages ─────────────────────────────── tidymodels 1.3.0 ── ✔ broom 1.0.8 ✔ rsample 1.3.0 ✔ dials 1.4.0 ✔ tune 1.3.0 ✔ infer 1.0.9 ✔ workflows 1.2.0 ✔ modeldata 1.4.0 ✔ workflowsets 1.1.1 ✔ parsnip 1.3.2 ✔ yardstick 1.3.2 ✔ recipes 1.3.1 ── Conflicts ────────────────────────────────── tidymodels_conflicts() ── ✖ scales::discard() masks purrr::discard() ✖ dplyr::filter() masks stats::filter() ✖ recipes::fixed() masks stringr::fixed() ✖ dplyr::lag() masks stats::lag() ✖ yardstick::spec() masks readr::spec() ✖ recipes::step() masks stats::step() • Use tidymodels_prefer() to resolve common conflicts.
justice-13 question > library(broom)
justice-14 question \[ \log\left( \frac{\rho}{1 - \rho} \right) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k \]
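The left-hand side of the formula above is the log-odds (logit) of the probability rho. In base R, `qlogis()` computes the logit and `plogis()` computes its inverse. The following is a short sketch. The coefficients `b0` and `b1` are hypothetical and are not estimates from this analysis.

```r
# Logit: log(rho / (1 - rho)), which base R provides as qlogis()
rho <- 0.25
log_odds <- log(rho / (1 - rho))
stopifnot(isTRUE(all.equal(log_odds, qlogis(rho))))

# Inverse logit: 1 / (1 + exp(-eta)), which base R provides as plogis()
rho_back <- 1 / (1 + exp(-log_odds))
stopifnot(isTRUE(all.equal(rho_back, plogis(log_odds))))

# Hypothetical coefficients for a one-predictor version of the model
b0 <- -1.5
b1 <- 0.4
x1 <- 1
rho_hat <- plogis(b0 + b1 * x1)  # predicted probability, always between 0 and 1
print(c(log_odds = log_odds, rho_back = rho_back, rho_hat = rho_hat))
```

The model fit later in this submission uses `linear_reg()` with the `lm` engine. That model describes the probability of arrest directly rather than its log-odds, so its coefficients are not on this logit scale.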
justice-15 question Racial disparities in policing outcomes remain a pressing concern, particularly when examining how factors like race and location influence the likelihood of arrest during traffic stops. Using data from a study of New Orleans drivers, we seek to understand the relationship between driver race and the probability of getting arrested during a traffic stop.
courage-1 question Intellectual Honesty, Speaking Truth to Power, Transparency & Accountability, Ethics & Privacy, Perseverance
courage-2 exercise linear_reg(engine = "lm")
courage-3 exercise linear_reg(engine = "lm") |> fit(arrested ~ sex, data = x)
courage-4 exercise linear_reg(engine = "lm") |> fit(arrested ~ sex, data = x) |> tidy(conf.int = TRUE)
courage-5 exercise linear_reg(engine = "lm") |> fit(arrested ~ race, data = x) |> tidy(conf.int = TRUE)
courage-6 exercise linear_reg(engine = "lm") |> fit(arrested ~ race, data = x) |> tidy(conf.int = TRUE)
courage-7 exercise linear_reg(engine = "lm") |> fit(arrested ~ sex + race, data = x) |> tidy(conf.int = TRUE)
courage-8 exercise linear_reg(engine = "lm") |> fit(arrested ~ sex + race*zone, data = x) |> tidy(conf.int = TRUE)
courage-9 exercise fit_stops
courage-10 question x <- stops |> filter(race %in% c("black", "white")) |> mutate(race = str_to_title(race), sex = str_to_title(sex)) fit_stops <- linear_reg() |> set_engine("lm") |> fit(arrested ~ sex + race*zone, data = x)
courage-11 question > library(easystats) # Attaching packages: easystats 0.7.4 (red = needs update) ✖ bayestestR 0.16.0 ✖ correlation 0.8.7 ✖ datawizard 1.1.0 ✔ effectsize 1.0.1 ✖ insight 1.3.0 ✖ modelbased 0.11.2 ✖ performance 0.14.0 ✖ parameters 0.26.0 ✔ report 0.6.1 ✔ see 0.11.0 Restart the R-Session and update packages with `easystats::easystats_update()`.
courage-12 question > check_predictions(extract_fit_engine(fit_stops))
courage-13 question \[ \widehat{\text{arrested}} = 0.18 + 0.06 \cdot \text{sex}_{\text{Male}} - 0.04 \cdot \text{race}_{\text{White}} + \sum_{z} \hat{\gamma}_{z} \cdot \text{zone}_{z} + \sum_{z} \hat{\delta}_{z} \cdot \left( \text{race}_{\text{White}} \times \text{zone}_{z} \right) \] where the sums run over the non-reference zones, and \(\hat{\gamma}_{z}\) and \(\hat{\delta}_{z}\) are the zone and raceWhite:zone estimates reported by tidy(fit_stops).
courage-14 question #| cache: true x <- stops |> filter(race %in% c("black", "white")) |> mutate(race = str_to_title(race), sex = str_to_title(sex)) fit_stops <- linear_reg() |> set_engine("lm") |> fit(arrested ~ sex + race*zone, data = x)
courage-15 question > tutorial.helpers::show_file(".gitignore") stops_files *_cache
courage-16 exercise tidy(fit_stops, conf.int=TRUE)
courage-17 question > tutorial.helpers::show_file("stops.qmd", chunk = "Last") #| cache: true x <- stops |> filter(race %in% c("black", "white")) |> mutate(race = str_to_title(race), sex = str_to_title(sex)) fit_stops <- linear_reg() |> set_engine("lm") |> fit(arrested ~ sex + race*zone, data = x) tidy(fit_stops, conf.int=TRUE)
temperance-1 question Temperance guides us in the use of the model we have created to answer the questions with which we began.
temperance-2 question The estimate of 0.06 for sexMale means that, holding race and zone constant, male drivers have a predicted probability of arrest about 0.06 (6 percentage points) higher than female drivers, the reference group. Since the 95% confidence interval (0.0585 to 0.0644) does not contain zero, this difference is statistically significant at the 5% level.
temperance-3 question The estimate of -0.04 for raceWhite means that, holding sex constant, White drivers in the reference zone have a predicted probability of arrest about 0.04 (4 percentage points) lower than Black drivers, the reference race category. Because the model includes a race-by-zone interaction, the gap between White and Black drivers in other zones also depends on the raceWhite:zone estimates. Since the 95% confidence interval (-0.057 to -0.032) does not include zero, this effect is statistically significant at the 5% level.
temperance-4 question The estimate of 0.18 for the intercept means that, when all predictors are at their reference levels (female, Black, and in Zone A), the predicted probability of arrest is approximately 0.18. The 95% confidence interval (0.171 to 0.184) is narrow and does not include zero, indicating that the intercept is statistically significant.
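Because the fitted model is linear, a predicted probability of arrest in the reference zone is just a sum of the relevant estimates, since every zone and raceWhite:zone term is zero there. The sketch below uses the rounded estimates quoted in the three answers above, so the results are only approximate.

```r
# Rounded estimates quoted above (approximate)
b_intercept <- 0.18   # female, Black, reference zone
b_male      <- 0.06   # sexMale
b_white     <- -0.04  # raceWhite (reference zone only, because of race*zone)

# In the reference zone, all zone and raceWhite:zone terms are zero
p <- c(
  female_black = b_intercept,
  male_black   = b_intercept + b_male,
  female_white = b_intercept + b_white,
  male_white   = b_intercept + b_male + b_white
)
print(round(p, 2))
```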
temperance-5 question > library(marginaleffects)
temperance-6 question Is race correlated with whether a driver is arrested?
temperance-7 question > predictions(fit_stops) Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 % 0.179 0.00343 52.2 <0.001 Inf 0.173 0.186 0.142 0.00419 33.8 <0.001 828.0 0.133 0.150 0.250 0.00451 55.5 <0.001 Inf 0.241 0.259 0.142 0.00419 33.8 <0.001 828.0 0.133 0.150 0.232 0.01776 13.1 <0.001 127.6 0.198 0.267 --- 378457 rows omitted. See ?print.marginaleffects --- 0.208 0.00390 53.4 <0.001 Inf 0.201 0.216 0.270 0.00377 71.5 <0.001 Inf 0.262 0.277 0.270 0.00377 71.5 <0.001 Inf 0.262 0.277 0.270 0.00377 71.5 <0.001 Inf 0.262 0.277 0.189 0.00545 34.7 <0.001 874.0 0.179 0.200 Type: numeric
temperance-8 question > predictions(fit_stops, by="sex") sex Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 % Female 0.192 0.001234 156 <0.001 Inf 0.190 0.194 Male 0.254 0.000823 309 <0.001 Inf 0.253 0.256 Type: numeric
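My understanding of the marginaleffects documentation is that `by = "sex"` averages the unit-level predictions (the rows printed by `predictions(fit_stops)`) within each level of `sex`. Below is a base-R sketch of that averaging step. It uses a hypothetical data frame `unit_preds` with made-up numbers.

```r
# Hypothetical unit-level predictions (made-up numbers, not from fit_stops)
unit_preds <- data.frame(
  sex      = c("Female", "Female", "Male", "Male", "Male"),
  estimate = c(0.18, 0.14, 0.24, 0.20, 0.24)
)

# Average the unit-level predictions within each level of sex
by_sex <- aggregate(estimate ~ sex, data = unit_preds, FUN = mean)
print(by_sex)
```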
temperance-9 question Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 % 0.179 0.00343 52.2 <0.001 Inf 0.173 0.186 0.142 0.00419 33.8 <0.001 828.0 0.133 0.150 0.250 0.00451 55.5 <0.001 Inf 0.241 0.259 0.142 0.00419 33.8 <0.001 828.0 0.133 0.150 0.232 0.01776 13.1 <0.001 127.6 0.198 0.267 --- 378457 rows omitted. See ?print.marginaleffects --- 0.208 0.00390 53.4 <0.001 Inf 0.201 0.216 0.270 0.00377 71.5 <0.001 Inf 0.262 0.277 0.270 0.00377 71.5 <0.001 Inf 0.262 0.277 0.270 0.00377 71.5 <0.001 Inf 0.262 0.277 0.189 0.00545 34.7 <0.001 874.0 0.179 0.200 Type: numeric
temperance-10 question Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 % 0.179 0.00343 52.2 <0.001 Inf 0.173 0.186 0.142 0.00419 33.8 <0.001 828.0 0.133 0.150 0.250 0.00451 55.5 <0.001 Inf 0.241 0.259 0.142 0.00419 33.8 <0.001 828.0 0.133 0.150 0.232 0.01776 13.1 <0.001 127.6 0.198 0.267 --- 378457 rows omitted. See ?print.marginaleffects --- 0.208 0.00390 53.4 <0.001 Inf 0.201 0.216 0.270 0.00377 71.5 <0.001 Inf 0.262 0.277 0.270 0.00377 71.5 <0.001 Inf 0.262 0.277 0.270 0.00377 71.5 <0.001 Inf 0.262 0.277 0.189 0.00545 34.7 <0.001 874.0 0.179 0.200 Type: numeric
temperance-11 question plot_predictions(fit_stops, condition = c("zone", "race", "sex")) + labs( title = "Predicted Probability of Arrest by Zone, Race, and Sex", subtitle = "Men have higher predicted arrest probabilities than women, and the gap between Black and White drivers varies by zone.", x = "Zone", y = "Predicted probability of arrest", color = "Race" ) + theme_minimal(base_size = 14) + theme( plot.title = element_text(face = "bold", size = 16), plot.subtitle = element_text(size = 13, margin = margin(b = 10)), axis.title.x = element_text(size = 13), axis.title.y = element_text(size = 13) )
temperance-12 question > tutorial.helpers::show_file("stops.qmd", chunk = "Last") #| cache: true x <- stops |> filter(race %in% c("black", "white")) |> mutate(race = str_to_title(race), sex = str_to_title(sex)) fit_stops <- linear_reg() |> set_engine("lm") |> fit(arrested ~ sex + race*zone, data = x) tidy(fit_stops, conf.int=TRUE)
temperance-13 question T
temperance-14 question The estimates and confidence intervals for the quantities of interest might be wrong or misleading if key modeling assumptions are violated. For example, if the model suffers from omitted variable bias, important predictors that influence the outcome (like socioeconomic status or regional differences) might be missing, leading to biased coefficient estimates. Similarly, if the relationships between predictors and the outcome are nonlinear or if there are interactions not captured in the model, the linear estimates could misrepresent the true effects.
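The omitted-variable concern can be made concrete with a simulation sketch. Everything here is hypothetical: `z` stands in for an omitted factor (it is not a variable in the stops data), and the true effect of `x` is set to 1. Leaving `z` out of the regression pushes the estimate for `x` away from 1.

```r
set.seed(42)
n <- 10000
z <- rnorm(n)                  # hypothetical omitted variable
x <- z + rnorm(n)              # predictor correlated with z
y <- 1 * x + 2 * z + rnorm(n)  # true effect of x on y is 1

fit_short <- lm(y ~ x)         # z omitted
fit_long  <- lm(y ~ x + z)     # z included

coef_short <- unname(coef(fit_short)["x"])
coef_long  <- unname(coef(fit_long)["x"])

# Expected bias of the short regression: 2 * cov(x, z) / var(x) = 2 * 1 / 2 = 1
print(c(short = coef_short, long = coef_long))
```

With `z` omitted, the estimate lands near 2 instead of 1. Including `z` recovers an estimate near 1.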
temperance-15 question > tutorial.helpers::show_file("stops.qmd") --- title: "Stops" format: html author: "Shuntaro Kawakami" execute: echo: false --- ```{r} library(tidyverse) library(primer.data) library(tidymodels) library(broom) library(marginaleffects) ``` Racial disparities in policing outcomes remain a pressing concern, particularly when examining how factors like race and location influence the likelihood of arrest during traffic stops. Using data from a study of New Orleans drivers, we seek to understand the relationship between driver race and the probability of getting arrested during a traffic stop. ```{r} #| cache: true x <- stops |> filter(race %in% c("black", "white")) |> mutate(race = str_to_title(race), sex = str_to_title(sex)) fit_stops <- linear_reg() |> set_engine("lm") |> fit(arrested ~ sex + race*zone, data = x) tidy(fit_stops, conf.int=TRUE) ``` Using data from a study of New Orleans drivers, we seek to understand the relationship between driver race and the probability of getting arrested during a traffic stop. However, our data may not fully represent the population: the data and the Preceptor Table do not come from the same time frame, and some stops may reflect biased officers who target certain groups of individuals.
temperance-16 question https://skawakamNY.github.io/stops
temperance-17 question https://github.com/skawakamNY/stops
minutes question 90