id submission_type answer
tutorial-id none 131-stops
name question Alfred Cheung
email question cheungha21@gmail.com
introduction-1 question Wisdom, Justice, Courage, Temperance
introduction-2 question > show_file(".gitignore") stops_files >
introduction-3 question ```{r} library(tidyverse) library(primer.data) ```
introduction-4 question > library(tidyverse) + library(primer.data) + library(tidyverse) +
introduction-5 question This data is from the Stanford Open Policing Project, which aims to improve police accountability and transparency by providing data on traffic stops across the United States. The New Orleans dataset includes detailed information about traffic stops conducted by the New Orleans Police Department.
introduction-6 question Causal models have more than one (potential) outcome column because we need more than one potential outcome in order to estimate a causal effect.
introduction-7 question Car accidents
introduction-8 question arrested
introduction-9 question sex, age
introduction-10 question 2
introduction-11 question There are 2 potential outcomes because the treatment variable mask takes on 2 posible values: either wearing a mask and getting arrested or not wearing one and getting arrested.
introduction-12 question sex, age
introduction-13 question Let’s consider the variable race. Does race have an effect on the rate people get arrested at?
introduction-14 question Let’s consider the variable race. Does race have an effect on the rate people get arrested at?
wisdom-1 question Wisdom requires a question, the creation of a Preceptor Table, and an examination of our data.
wisdom-2 question A Preceptor Table is the smallest possible table with rows and columns such that, if there is no missing data, our question is easy to answer.
wisdom-3 question A Preceptor Table is the smallest possible table of data with rows and columns such that, if there is no missing data, we can easily calculate the quantity of interest.
wisdom-4 question people
wisdom-5 question arrested
wisdom-6 question sex, age , race
wisdom-7 question no
wisdom-8 question 2011-2018
wisdom-9 question A Preceptor Table is the smallest possible table with rows and columns such that, if there is no missing data, our question is easy to answer.
wisdom-10 question What is the difference in arrest rate between Black and White drivers adjusting for other variables?
wisdom-11 question What is the difference in arrest rate between Black and White drivers adjusting for other variables?
justice-1 question Validity, Stability,Representativeness, unconfoundedness
justice-2 question Validity is the consistency, or lack thereof, in the columns of your data set and the corresponding columns in your Preceptor Table.
justice-3 question match the data set
justice-4 question The Population Table includes a row for each unit/time combination in the underlying population from which both the Preceptor Table and the data are drawn.
justice-5 question unit: people Time: 2010s
justice-6 question Stability means that the relationship between the columns in the Population Table is the same for three categories of rows: the data, the Preceptor Table, and the larger population from which both are drawn.
justice-7 question The data should match
justice-8 question Representativeness, or the lack thereof, concerns two relationships among the rows in the Population Table. The first is between the data and the other rows. The second is between the other rows and the Preceptor Table.
justice-9 question look at the location
justice-10 question The dataset stops is a heavily modified version of the data from the actual study, and therefore has left out nearly 3.1 million entries from the real data, shortening it to roughly 400,000 entries. The deletion of the entries may have led to a misrepresentation of the population, in that a lot of the current data may only be from select areas with select conditions present, and could be from biased officers who are more likely to arrest drivers compared to other officers in the zone. This would mess up our final predictions because it would be providing values that are unrealistic to the area, causing our entire model to be unreastic.
justice-11 question Unconfoundedness means that the treatment assignment is independent of the potential outcomes, when we condition on pre-treatment covariates. A model is confounded if this is not true.
justice-12 question > library(tidymodels) ── Attaching packages ───────────────────────────────────────────────────────── tidymodels 1.3.0 ── ✔ broom 1.0.8 ✔ rsample 1.3.0 ✔ dials 1.4.0 ✔ tune 1.3.0 ✔ infer 1.0.7 ✔ workflows 1.2.0 ✔ modeldata 1.4.0 ✔ workflowsets 1.1.0 ✔ parsnip 1.3.1 ✔ yardstick 1.3.2 ✔ recipes 1.2.1 ── Conflicts ──────────────────────────────────────────────────────────── tidymodels_conflicts() ── ✖ scales::discard() masks purrr::discard() ✖ dplyr::filter() masks stats::filter() ✖ recipes::fixed() masks stringr::fixed() ✖ dplyr::lag() masks stats::lag() ✖ yardstick::spec() masks readr::spec() ✖ recipes::step() masks stats::step() • Search for functions across packages at https://www.tidymodels.org/find/ >
justice-13 question > library(broom)
justice-14 question $$ Y_i = \beta_0 + \beta_1 \cdot \text{Black}_i + \beta_2 \cdot X_{2i} + \beta_3 \cdot X_{3i} + \cdots + \beta_p \cdot X_{pi} + \varepsilon_i $$
justice-15 question Racial disparities in policing outcomes remain a pressing concern, particularly when examining how factors like race and location influence the likelihood of arrest during traffic stops. Using data from a study of New Orleans drivers, we seek to understand the relationship between driver race and the probability of getting arrested during a traffic stop.
courage-1 question Courage starts with math, explores models, and then creates the Data Generating Mechanism.
courage-2 exercise linear_reg(engine = "lm")
courage-3 exercise linear_reg(engine = "lm")|> fit(arrested ~ sex, data = x)
courage-4 exercise linear_reg(engine = "lm")|> fit(arrested ~ sex, data = x)|> tidy(conf.int = TRUE)
courage-5 exercise linear_reg(engine = "lm")|> fit(arrested ~ race, data = x)|> tidy(conf.int = TRUE)
courage-6 exercise linear_reg() |> set_engine("lm") |> fit(arrested ~ race, data = x) |> tidy(conf.int = TRUE)
courage-7 exercise linear_reg() |> set_engine("lm") |> fit(arrested ~ sex + race, data = x) |> tidy(conf.int = TRUE)
courage-8 exercise linear_reg() |> set_engine("lm") |> fit(arrested ~ sex + race*zone, data = x) |> tidy(conf.int = TRUE)
courage-9 exercise fit_stops
courage-10 question x <- stops |> filter(race %in% c("black", "white")) |> mutate(race = str_to_title(race), sex = str_to_title(sex)) fit_stops <- linear_reg() |> set_engine("lm") |> fit(arrested ~ sex + race*zone, data = x)
courage-11 question > library(easystats) # Attaching packages: easystats 0.7.4 ✔ bayestestR 0.15.2 ✔ correlation 0.8.7 ✔ datawizard 1.0.2 ✔ effectsize 1.0.0 ✔ insight 1.1.0 ✔ modelbased 0.10.0 ✔ performance 0.13.0 ✔ parameters 0.24.2 ✔ report 0.6.1 ✔ see 0.11.0
courage-12 question check_predictions(extract_fit_engine(fit_stops))
courage-13 question $$ \hat{Y}_i = 0.1773 + 0.0614 \cdot \text{sexMale}_i - 0.0445 \cdot \text{raceWhite}_i + 0.0146 \cdot \text{zoneB}_i + 0.0061 \cdot \text{zoneC}_i + 0.0781 \cdot \text{zoneD}_i + 0.0019 \cdot \text{zoneE}_i - 0.0027 \cdot \text{zoneF}_i + 0.0309 \cdot \text{zoneG}_i + 0.0757 \cdot \text{zoneH}_i $$
courage-14 question $$ \hat{Y}_i = 0.1773 + 0.0614 \cdot \text{sexMale}_i - 0.0445 \cdot \text{raceWhite}_i + 0.0146 \cdot \text{zoneB}_i + 0.0061 \cdot \text{zoneC}_i + 0.0781 \cdot \text{zoneD}_i + 0.0019 \cdot \text{zoneE}_i - 0.0027 \cdot \text{zoneF}_i + 0.0309 \cdot \text{zoneG}_i + 0.0757 \cdot \text{zoneH}_i $$
courage-15 question stops_files _cache
courage-16 exercise tidy(fit_stops, conf.int = TRUE)
courage-17 question fit_stops_logistic <- logistic_reg() |> set_engine("glm") |> fit(as.factor(arrested) ~ sex + race, data = x) tidy(fit_stops_logistic, conf.int = TRUE) |> select(term, estimate, conf.low, conf.high) |> mutate(across(where(is.numeric), ~round(., 3))) |> knitr::kable( caption = "Logistic Regression Estimates for Arrest Probability (Source: Traffic stops dataset filtered for Black and White drivers)" )
courage-18 question Using data from a study of New Orleans drivers, we seek to understand the relationship between driver race and the probabilty of getting arrested during a traffic stop. However, our data from both our Preceptor Table and our dataset may not fully represent the population as both may not be from the same time frame and some of our data may come from biased officers, who may target certain groups of individuals.
temperance-1 question Temperance uses the Data Generating Mechanism to answer the questions with which we began. Humility reminds us that this answer is always a lie. We can also use the DGM to calculate many similar quantities of interest, displaying the results graphically.
temperance-2 question If the people is male, it will have higher chance get arrested
temperance-3 question When comparing to Black, race white have less chance get arrested
temperance-4 question there is an error and the upper limit of the estimate ins 0.18
temperance-5 question R 4.3.3 started. R version 4.3.3 (2024-02-29 ucrt) -- "Angel Food Cake" Copyright (C) 2024 The R Foundation for Statistical Computing Platform: x86_64-w64-mingw32/x64 (64-bit) R is free software and comes with ABSOLUTELY NO WARRANTY. You are welcome to redistribute it under certain conditions. Type 'license()' or 'licence()' for distribution details. Natural language support but running in an English locale R is a collaborative project with many contributors. Type 'contributors()' for more information and 'citation()' on how to cite R or R packages in publications. Type 'demo()' for some demos, 'help()' for on-line help, or 'help.start()' for an HTML browser interface to help. Type 'q()' to quit R. > library(tutorial.helpers) > show_file(".gitignore") stops_files Warning message: In readLines(path) : incomplete final line found on '.gitignore' > show_file(".gitignore") stops_files > library(tidyverse) + library(primer.data) ── Attaching core tidyverse packages ─────────────────────────────────────────── tidyverse 2.0.0 ── ✔ dplyr 1.1.4 ✔ readr 2.1.5 ✔ forcats 1.0.0 ✔ stringr 1.5.1 ✔ ggplot2 3.5.1 ✔ tibble 3.2.1 ✔ lubridate 1.9.4 ✔ tidyr 1.3.1 ✔ purrr 1.0.4 ── Conflicts ───────────────────────────────────────────────────────────── tidyverse_conflicts() ── ✖ dplyr::filter() masks stats::filter() ✖ dplyr::lag() masks stats::lag() ℹ Use the conflicted package to force all conflicts to become errors > show_file("stops.qmd", chunk = "Last") > library(tidyverse) + library(primer.data) + library(tidyverse) + > ?stops > library(tidymodels) ── Attaching packages ───────────────────────────────────────────────────────── tidymodels 1.3.0 ── ✔ broom 1.0.8 ✔ rsample 1.3.0 ✔ dials 1.4.0 ✔ tune 1.3.0 ✔ infer 1.0.7 ✔ workflows 1.2.0 ✔ modeldata 1.4.0 ✔ workflowsets 1.1.0 ✔ parsnip 1.3.1 ✔ yardstick 1.3.2 ✔ recipes 1.2.1 ── Conflicts ──────────────────────────────────────────────────────────── tidymodels_conflicts() ── ✖ scales::discard() masks purrr::discard() ✖ dplyr::filter() masks stats::filter() ✖ recipes::fixed() masks stringr::fixed() ✖ dplyr::lag() masks stats::lag() ✖ yardstick::spec() masks readr::spec() ✖ recipes::step() masks stats::step() • Search for functions across packages at https://www.tidymodels.org/find/ > library(broom) > x <- stops |> + filter(race %in% c("black", "white")) |> + mutate(race = str_to_title(race), + sex = str_to_title(sex)) + + fit_stops <- linear_reg() |> + set_engine("lm") |> + fit(arrested ~ sex + race*zone, data = x) > library(easystats) # Attaching packages: easystats 0.7.4 ✔ bayestestR 0.15.2 ✔ correlation 0.8.7 ✔ datawizard 1.0.2 ✔ effectsize 1.0.0 ✔ insight 1.1.0 ✔ modelbased 0.10.0 ✔ performance 0.13.0 ✔ parameters 0.24.2 ✔ report 0.6.1 ✔ see 0.11.0 > check_predictions(extract_fit_engine(fit_stops)) > tutorial.helpers::show_file("stops.qmd", chunk = "Last") > > tutorial.helpers::show_file(".gitignore") > library(marginaleffects)
temperance-6 question What is the difference in arrest rate between Black and White drivers adjusting for other variables?
temperance-7 question > predictions(fit_stops) Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 % 0.179 0.00343 52.2 <0.001 Inf 0.173 0.186 0.142 0.00419 33.8 <0.001 828.0 0.133 0.150 0.250 0.00451 55.5 <0.001 Inf 0.241 0.259 0.142 0.00419 33.8 <0.001 828.0 0.133 0.150 0.232 0.01776 13.1 <0.001 127.6 0.198 0.267 --- 378457 rows omitted. See ?print.marginaleffects --- 0.208 0.00390 53.4 <0.001 Inf 0.201 0.216 0.270 0.00377 71.5 <0.001 Inf 0.262 0.277 0.270 0.00377 71.5 <0.001 Inf 0.262 0.277 0.270 0.00377 71.5 <0.001 Inf 0.262 0.277 0.189 0.00545 34.7 <0.001 874.0 0.179 0.200 Type: numeric
temperance-8 question > plot_predictions(fit_stops, by = "sex")
temperance-9 question plot_predictions(fit_stops, condition = "sex")
temperance-10 question plot_predictions(fit_stops, condition = c("sex","race"))
temperance-11 question plot_predictions(fit_stops$fit, newdata = "balanced", condition = c("zone", "race", "sex"), draw = FALSE) |> as_tibble() |> group_by(zone, sex) |> mutate(sort_order = estimate[race == "Black"]) |> ungroup() |> mutate(zone = reorder_within(zone, sort_order, sex)) |> ggplot(aes(x = zone, color = race)) + geom_errorbar(aes(ymin = conf.low, ymax = conf.high), width = 0.2, position = position_dodge(width = 0.5)) + geom_point(aes(y = estimate), size = 1, position = position_dodge(width = 0.5)) + facet_wrap(~ sex, scales = "free_x") + scale_x_reordered() + theme(axis.text.x = element_text(size = 8)) + scale_y_continuous(labels = percent_format())
temperance-12 question > plot_predictions(fit_stops, by = "sex") > plot_predictions(fit_stops, condition = "sex") > plot_predictions(fit_stops, condition = c("sex","race")) > tutorial.helpers::show_file("stops.qmd", chunk = "Last") plot_predictions(fit_stops$fit, newdata = "balanced", condition = c("zone", "race", "sex"), draw = FALSE) |> as_tibble() |> group_by(zone, sex) |> mutate(sort_order = estimate[race == "Black"]) |> ungroup() |> mutate(zone = reorder_within(zone, sort_order, sex)) |> ggplot(aes(x = zone, color = race)) + geom_errorbar(aes(ymin = conf.low, ymax = conf.high), width = 0.2, position = position_dodge(width = 0.5)) + geom_point(aes(y = estimate), size = 1, position = position_dodge(width = 0.5)) + facet_wrap(~ sex, scales = "free_x") + scale_x_reordered() + theme(axis.text.x = element_text(size = 8)) + scale_y_continuous(labels = percent_format())
temperance-13 question Racial disparities in policing outcomes remain a pressing concern, particularly when examining how factors like race and location influence the likelihood of arrest during traffic stops. Using data from a study of New Orleans drivers, we seek to understand the relationship between driver race and the probability of getting arrested during a traffic stop.
temperance-14 question Racial disparities in policing outcomes remain a pressing concern, particularly when examining how factors like race and location influence the likelihood of arrest during traffic stops. Using data from a study of New Orleans drivers, we seek to understand the relationship between driver race and the probability of getting arrested during a traffic stop.
temperance-15 question ops dataset filtered for Black and White drivers)" ) ``` ```{r} plot_predictions(fit_stops$fit, newdata = "balanced", condition = c("zone", "race", "sex"), draw = FALSE) |> as_tibble() |> group_by(zone, sex) |> mutate(sort_order = estimate[race == "Black"]) |> ungroup() |> mutate(zone = reorder_within(zone, sort_order, sex)) |> ggplot(aes(x = zone, color = race)) + geom_errorbar(aes(ymin = conf.low, ymax = conf.high), width = 0.2, position = position_dodge(width = 0.5)) + geom_point(aes(y = estimate), size = 1, position = position_dodge(width = 0.5)) + facet_wrap(~ sex, scales = "free_x") + scale_x_reordered() + theme(axis.text.x = element_text(size = 8)) + scale_y_continuous(labels = percent_format()) ```
temperance-16 question https://github.com/AlfredCheung2121
temperance-17 question https://github.com/AlfredCheung2121
minutes question 90