id submission_type answer
tutorial-id none 131-stops
name question Ayush Chandra
email question chandra.ayush2021@gmail.com
introduction-1 question The cardinal virtues are courage, wisdom, justice, and temperance
introduction-2 question > show_file(".gitignore") stops_files >
introduction-3 question > show_file("stops.qmd", chunk = "Last") #| message: false library(tidyverse) library(primer.data) >
introduction-4 question > library(tidyverse) ── Attaching core tidyverse packages ───────────────────────────────────── tidyverse 2.0.0 ── ✔ dplyr 1.1.4 ✔ readr 2.1.5 ✔ forcats 1.0.0 ✔ stringr 1.5.1 ✔ ggplot2 3.5.2 ✔ tibble 3.3.0 ✔ lubridate 1.9.4 ✔ tidyr 1.3.1 ✔ purrr 1.1.0 ── Conflicts ─────────────────────────────────────────────────────── tidyverse_conflicts() ── ✖ dplyr::filter() masks stats::filter() ✖ dplyr::lag() masks stats::lag() ℹ Use the conflicted package to force all conflicts to become errors
introduction-5 question Description This data is from the Stanford Open Policing Project, which aims to improve police accountability and transparency by providing data on traffic stops across the United States. The New Orleans dataset includes detailed information about traffic stops conducted by the New Orleans Police Department.
introduction-6 question The difference between 2 potential outcomes
introduction-7 question We can only observe one outcome at a time
introduction-8 question arrested
introduction-9 question it will be manipulated by the inputs
introduction-10 question 2, it is binary. Arrested or not arrested
introduction-11 question Arrested and not arrested
introduction-12 question time
introduction-13 question white people and black people
introduction-14 question What is the connection between race and the probability of getting arrested
wisdom-1 question Wisdom involves the preceptors table and the assumption of validity
wisdom-2 question A preceptor table is a data table with the least number of rows and columns needed for all of the data
wisdom-3 question the rows are units while the columns are outcomes and covariates
wisdom-4 question The units are the individual people
wisdom-5 question the outcome is arrested
wisdom-6 question race
wisdom-7 question no treatments since predictive
wisdom-8 question whien the people are pulled over
wisdom-9 question The units are the people who have been pulled over, while the outcome columns are whether they were arrested or not. The covariates for this are race and zone
wisdom-10 question what is the connection between race and the number of arrests made
wisdom-11 question our data source is from the US government
justice-1 question Justice involves the population table and the 4 assumptions that underlie it: validity, representativeness, unconfoundedness, and stability
justice-2 question Validity is the assumption that the columns in the data table alight witht the columns in the population and preceptors tables
justice-3 question It may not be true since the studies could have different covatiates
justice-4 question A population table is the combination of the preceptors table and the data table
justice-5 question it is arrests/year
justice-6 question Stability means that the relationship between the columns in the Population Table is the same for three categories of rows: the data, the Preceptor Table, and the larger population from which both are drawn.
justice-7 question It may not hold since he data could have been drawn from different sources
justice-8 question It is the assumption that the data represents the entire population
justice-9 question It may not be true since this is just a data set taken from New Orleans
justice-10 question It may not be true since the columns may not be aligned
justice-11 question Unconfoundedness means that the treatment assignment is independent of the potential outcomes, when we condition on pre-treatment covariates. This assumption is only relevant for causal models. We describe a model as “confounded” if this is not true. The easiest way to ensure unconfoundedness is to assign treatment randomly.
justice-12 question > library(tidymodels) ── Attaching packages ─────────────────────────────────────────────────── tidymodels 1.3.0 ── ✔ broom 1.0.8 ✔ rsample 1.3.0 ✔ dials 1.4.0 ✔ tune 1.3.0 ✔ infer 1.0.8 ✔ workflows 1.2.0 ✔ modeldata 1.4.0 ✔ workflowsets 1.1.1 ✔ parsnip 1.3.2 ✔ yardstick 1.3.2 ✔ recipes 1.3.1 ── Conflicts ────────────────────────────────────────────────────── tidymodels_conflicts() ── ✖ scales::discard() masks purrr::discard() ✖ dplyr::filter() masks stats::filter() ✖ recipes::fixed() masks stringr::fixed() ✖ dplyr::lag() masks stats::lag() ✖ yardstick::spec() masks readr::spec() ✖ recipes::step() masks stats::step() • Dig deeper into tidy modeling with R at https://www.tmwr.org
justice-13 question > library(broom) >
justice-14 question $$ P(Y = 1) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_n X_n)}} $$ with \( Y \sim \text{Bernoulli}(\rho) \) where \( \rho = \frac{1}{1 + e^{-(\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_n X_n)}} \)
justice-15 question One weakness is that the study is taken from a small place in New Orleans, therefore it doesn't fully represent the entire US population.
courage-1 question Courage involves the data generating mechanism
courage-2 exercise linear_reg(engine = "lm")
courage-3 exercise linear_reg(engine = "lm") |> fit(arrested ~ sex, data = x)
courage-4 exercise linear_reg(engine = "lm") |> fit(arrested ~ sex, data = x) |> tidy(conf.int = TRUE)
courage-5 exercise linear_reg() |> set_engine("lm") |> fit(arrested ~ race, data = x)
courage-6 exercise linear_reg() |> set_engine("lm") |> fit(arrested ~ race, data = x) |> tidy(conf.int = TRUE)
courage-7 exercise linear_reg() |> set_engine("lm") |> fit(arrested ~ sex + race, data = x) |> tidy(conf.int = TRUE)
courage-8 exercise linear_reg() |> set_engine("lm") |> fit(arrested ~ sex + race*zone, data = x) |> tidy(conf.int = TRUE)
courage-9 exercise fit_stops
courage-10 question > x <- stops |> + filter(race %in% c("black", "white")) |> + mutate(race = str_to_title(race), + sex = str_to_title(sex)) + + fit_stops <- linear_reg() |> + set_engine("lm") |> + fit(arrested ~ sex + race*zone, data = x) >
courage-11 question > library(easystats) # Attaching packages: easystats 0.7.4 (red = needs update) ✖ bayestestR 0.16.0 ✖ correlation 0.8.7 ✖ datawizard 1.1.0 ✔ effectsize 1.0.1 ✖ insight 1.3.0 ✖ modelbased 0.11.2 ✖ performance 0.14.0 ✖ parameters 0.26.0 ✔ report 0.6.1 ✔ see 0.11.0 Restart the R-Session and update packages with `easystats::easystats_update()`.
courage-12 question > check_predictions(extract_fit_engine(fit_stops))
courage-13 question $$ \widehat{\text{arrested}} = 0.177 + 0.0614 \cdot \text{sex}_{\text{Male}} - 0.0445 \cdot \text{race}_{\text{White}} + 0.0146 \cdot \text{zone}_{\text{B}} + 0.00610 \cdot \text{zone}_{\text{C}} + 0.0781 \cdot \text{zone}_{\text{D}} + 0.00190 \cdot \text{zone}_{\text{E}} - 0.00271 \cdot \text{zone}_{\text{F}} + 0.0309 \cdot \text{zone}_{\text{G}} + 0.0757 \cdot \text{zone}_{\text{H}} + \text{(interaction terms for race and zone)} $$
courage-14 question > tutorial.helpers::show_file("stops.qmd", chunk = "Last") #| cache: true fit_stops <- linear_reg() |> set_engine("lm") |> fit(arrested ~ sex + race*zone, data = x) >
courage-15 question > tutorial.helpers::show_file(".gitignore") stops_files *_cache >
courage-16 exercise tidy(fit_stops, conf.int = TRUE)
courage-17 question #| label: tidy-table #| echo: false #| message: false #| warning: false library(gt) tidy(fit_stops) |> mutate(term = str_replace_all(term, "sex", "Sex: "), term = str_replace_all(term, "race", "Race: "), term = str_replace_all(term, "zone", "Zone: "), estimate = round(estimate, 3), std.error = round(std.error, 3), statistic = round(statistic, 2), p.value = signif(p.value, 3)) |> gt() |> tab_header( title = "Linear Regression Results", subtitle = "Effect of Race, Sex, and Zone on Arrest Probability" ) |> cols_label( term = "Predictor", estimate = "Estimate", std.error = "Std. Error", statistic = "t value", p.value = "p value" ) |> fmt_number(columns = c(estimate, std.error, statistic, p.value), decimals = 3) |> tab_style( style = cell_text(weight = "bold"), locations = cells_column_labels(everything()) ) |> opt_table_outline() |> opt_row_striping()
courage-18 question we model accidents as a binary function
temperance-1 question Temperance uses the Data Generating Mechanism to answer the specific question with which we began. Humility reminds us that this answer is always a lie. We can also explore the general question by using the DGM to calculate many similar quantities of interest, displaying the results graphically.
temperance-2 question males have a 6% chance of getting arrested compared to the intercept
temperance-3 question White people have a 4% less chance of getting arrested than others
temperance-4 question The intercept is where the values intercect
temperance-5 question > library(marginaleffects) >
temperance-6 question What si the collelation between sex, race, and zone with our outcome variable arrests
temperance-7 question > predictions(fit_stops) >
temperance-8 question > plot_predictions(fit_stops, by = "sex") >
temperance-9 question > plot_predictions(fit_stops, condition = "sex") >
temperance-10 question > plot_predictions(fit_stops, condition = c("sex", "race")) >
temperance-11 question preds <- augment(fit_stops, new_data = x) preds |> ggplot(aes(x = zone, y = .pred, color = race, linetype = sex, group = interaction(race, sex))) + stat_summary(fun = mean, geom = "line", size = 1.2) + stat_summary(fun = mean, geom = "point", size = 3, shape = 21, fill = "white") + scale_color_manual(values = c("Black" = "#1f77b4", "White" = "#ff7f0e")) + scale_linetype_manual(values = c("Male" = "solid", "Female" = "dashed")) + labs( title = "Predicted Arrest Rate by Race, Zone, and Sex", subtitle = "Model-based averages from linear regression", x = "Zone", y = "Predicted Arrest Rate", color = "Race", linetype = "Sex" ) + theme_minimal(base_size = 14) + theme( plot.title = element_text(face = "bold", size = 18), legend.position = "right", panel.grid.minor = element_blank() )
temperance-12 question > tutorial.helpers::show_file("stops.qmd", chunk = "Last") #| label: beautiful-plot #| fig-cap: "Predicted Arrest Rate by Race, Zone, and Sex" #| fig-width: 10 #| fig-height: 6 #| warning: false preds |> ggplot(aes(x = zone, y = .pred, color = race, linetype = sex, group = interaction(race, sex))) + stat_summary(fun = mean, geom = "line", size = 1.2) + stat_summary(fun = mean, geom = "point", size = 3, shape = 21, fill = "white") + scale_color_manual(values = c("Black" = "#1f77b4", "White" = "#ff7f0e")) + scale_linetype_manual(values = c("Male" = "solid", "Female" = "dashed")) + labs( title = "Predicted Arrest Rate by Race, Zone, and Sex", subtitle = "Model-based averages from linear regression", x = "Zone", y = "Predicted Arrest Rate", color = "Race", linetype = "Sex" ) + theme_minimal(base_size = 14) + theme( plot.title = element_text(face = "bold", size = 18), legend.position = "right", panel.grid.minor = element_blank() ) >
temperance-13 question One quantity of intrest is how much more black epople ere arrested than white people
temperance-14 question it may be wrong since we only have a 95% confidence
temperance-15 question > tutorial.helpers::show_file("stops.qmd") --- title: "Stops" author: "Ayush Chandra" format: html execute: echo: false --- ```{r} #| message: false library(tidyverse) library(primer.data) library(tidymodels) library(broom) library(marginaleffects) x <- stops |> filter(race %in% c("black", "white")) |> mutate(race = str_to_title(race), sex = str_to_title(sex)) ``` ```{r} #| cache: true fit_stops <- linear_reg() |> set_engine("lm") |> fit(arrested ~ sex + race*zone, data = x) #| label: augment-model #| message: false #| warning: false preds <- augment(fit_stops, new_data = x) ``` ```{r} #| label: tidy-table #| echo: false #| message: false #| warning: false library(gt) tidy(fit_stops) |> mutate(term = str_replace_all(term, "sex", "Sex: "), term = str_replace_all(term, "race", "Race: "), term = str_replace_all(term, "zone", "Zone: "), estimate = round(estimate, 3), std.error = round(std.error, 3), statistic = round(statistic, 2), p.value = signif(p.value, 3)) |> gt() |> tab_header( title = "Linear Regression Results", subtitle = "Effect of Race, Sex, and Zone on Arrest Probability" ) |> cols_label( term = "Predictor", estimate = "Estimate", std.error = "Std. Error", statistic = "t value", p.value = "p value" ) |> fmt_number(columns = c(estimate, std.error, statistic, p.value), decimals = 3) |> tab_style( style = cell_text(weight = "bold"), locations = cells_column_labels(everything()) ) |> opt_table_outline() |> opt_row_striping() ``` ```{r} #| label: beautiful-plot #| fig-cap: "Predicted Arrest Rate by Race, Zone, and Sex" #| fig-width: 10 #| fig-height: 6 #| warning: false preds |> ggplot(aes(x = zone, y = .pred, color = race, linetype = sex, group = interaction(race, sex))) + stat_summary(fun = mean, geom = "line", size = 1.2) + stat_summary(fun = mean, geom = "point", size = 3, shape = 21, fill = "white") + scale_color_manual(values = c("Black" = "#1f77b4", "White" = "#ff7f0e")) + scale_linetype_manual(values = c("Male" = "solid", "Female" = "dashed")) + labs( title = "Predicted Arrest Rate by Race, Zone, and Sex", subtitle = "Model-based averages from linear regression", x = "Zone", y = "Predicted Arrest Rate", color = "Race", linetype = "Sex" ) + theme_minimal(base_size = 14) + theme( plot.title = element_text(face = "bold", size = 18), legend.position = "right", panel.grid.minor = element_blank() ) ``` Racial arguments in policing outcomes remain an important issue, particularly when examining how factors like race and location influence the likelihood of an arrest during traffic stops. Using data from a study of New Orleans drivers, we wish to understand how driver race can have an impact on the probability of getting arrested during a traffic stop. One weakness is that the study is taken from a small place in New Orleans, therefore it doesn't fully represent the entire US population. We model our outcome variable, arrested, as a linear function of sex, race, and zone. From this, we examined that Males are less likely of getting arrested than Females. We estimate that the probability of a Black driver in New Orleans getting arrested during a traffic stop is roughly 5% more than the probability for White drivers. >
temperance-16 question https://smartsilver5294.github.io/stops/
temperance-17 question https://github.com/SmartSilver5294/stops
minutes question 180