id submission_type answer
tutorial-id none 131-stops
name question Darakhshan Fatima
email question darakhshan.fatima110@gmail.com
introduction-1 question Wisdom, Justice, Courage, Temperance
introduction-2 question > show_file(".gitignore") stops_files >
introduction-3 question > cat(readLines("stops.qmd"), sep = "\n") --- title: "Stops" author: "DK" format: html --- >
introduction-4 question > library(tidyverse) + library(primer.data)
introduction-5 question This data is from the Stanford Open Policing Project, which aims to improve police accountability and transparency by providing data on traffic stops across the United States. The New Orleans dataset includes detailed information about traffic stops conducted by the New Orleans Police Department.
introduction-6 question A causal effect is the difference between two potential outcomes.
introduction-7 question We can't observe both potential outcomes for the same individual.
introduction-8 question arrested
introduction-9 question officer_aggressive_tone: 1 = officer used an aggressive or commanding tone, 0 = officer used a calm or respectful tone. It can be manipulated through training in de-escalation, communication, or community policing. Another is officer_threatening_body_language: 1 = officer exhibited threatening or dominating body language (e.g., hand on weapon, standing too close, crossed arms), 0 = officer showed neutral or non-threatening posture. It can be manipulated through body language and conflict management training.
introduction-10 question A binary treatment variable like mask can take on two values, mask = 1 (the person is wearing a mask) and mask = 0 (the person is not wearing a mask), so each unit has two potential outcomes, one for each treatment value.
introduction-11 question The treatment variable mask has two values: mask = 1 (the person is wearing a mask) and mask = 0 (the person is not wearing a mask). Guess at the potential outcomes: if the person wears a mask (mask = 1), they do not get arrested → Y1 = 0; if the person does not wear a mask (mask = 0), they get arrested → Y0 = 1. Causal effect for this unit = Y1 - Y0 = 0 - 1 = -1.
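The arithmetic in the answer above can be sketched in R. The two values are the guessed potential outcomes from that answer, not observed data, and the variable names `y1`, `y0`, and `causal_effect` are illustrative only.

```r
# Guessed potential outcomes for a single unit (taken from the answer above; not observed data)
y1 <- 0  # outcome if the person wears a mask (mask = 1): not arrested
y0 <- 1  # outcome if the person does not wear a mask (mask = 0): arrested

# The causal effect for this unit is the difference between the two potential outcomes
causal_effect <- y1 - y0
print(causal_effect)
```

In practice only one of `y1` and `y0` is ever observed for a given unit, which is why this difference cannot be computed directly from data.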
introduction-12 question zone or race
introduction-13 question Group 1: Black drivers; Group 2: White drivers
introduction-14 question How does race affect the likelihood of being arrested during a traffic stop?
wisdom-1 question Wisdom requires a question, the creation of a Preceptor Table and an examination of our data.
wisdom-2 question A Preceptor Table is the smallest possible table of rows and columns such that, if no values were missing, we could calculate the quantity of interest.
wisdom-3 question The rows of the Preceptor Table are the units. The outcome is at least one of the columns. If the problem is causal, there will be at least two (potential) outcome columns. The other columns are covariates. If the problem is causal, at least one of the covariates will be considered a treatment.
wisdom-4 question individual drivers
wisdom-5 question arrested
wisdom-6 question race, age and zone
wisdom-7 question No treatment is needed as it is a predictive model.
wisdom-8 question moment of the traffic stop
wisdom-9 question In our Preceptor Table, the unit is the individual driver, the outcome variable is arrested, and the covariates are race and zone. There is no treatment because this is a predictive model.
wisdom-10 question Does the average arrest rate differ by race, across all traffic stops?
wisdom-11 question Many researchers are interested in how demographic characteristics like race relate to outcomes such as being arrested during traffic stops. This dataset, collected by the Stanford Open Policing Project from over 400,000 stops, allows us to examine whether arrest rates differ by race.
justice-1 question Justice concerns the Population Table and the four key assumptions which underlie it: validity, stability, representativeness, and unconfoundedness.
justice-2 question Validity is the consistency, or lack thereof, in the columns of the data set and the corresponding columns in the Preceptor Table.
justice-3 question The assumption of validity might not hold because certain columns, like "arrested," could reflect personal biases of officers toward specific races, which would distort the true relationship between race and arrest outcomes. Additionally, if officers treated drivers differently based on factors like the type of car, but we don’t have a column for that in the data, then omitted variable bias may violate the assumption of validity.
justice-4 question The Population Table includes a row for each unit/time combination in the underlying population from which both the Preceptor Table and the data are drawn.
justice-5 question Unit = one individual driver who was stopped. Time = the date and time of the stop. Each row in the Population Table corresponds to a single traffic stop conducted on a specific driver at a specific date and time.
justice-6 question Stability means that the relationship between the columns in the Population Table is the same for three categories of rows: the data, the Preceptor Table, and the larger population from which both are drawn.
justice-7 question One reason why the assumption of stability might not hold in this case is that officer behavior and policing practices may vary across zones or over time. For example, in certain zones or during specific time periods, officers may be more likely to arrest drivers of certain races due to local policies or events.
justice-8 question Representativeness, or the lack thereof, concerns two relationships among the rows in the Population Table. The first is between the data and the other rows. The second is between the other rows and the Preceptor Table.
justice-9 question One reason the assumption of representativeness might not be true in this data is that it includes only Black and White drivers, while in reality, drivers of other races may also have been stopped and arrested. By excluding those other racial groups, the data may not fully represent the diversity of the actual driving population at that time and location, limiting the generalizability of any conclusions drawn.
justice-10 question One reason the assumption of representativeness might not be true in this case is that the Preceptor Table may not have been randomly selected from the population. If the officers in the Preceptor Table were chosen based on specific criteria (such as only including officers with a certain number of stops or from specific zones), then their behavior and outcomes may not reflect the full range of variation seen in the overall population. This non-random selection would make it difficult to generalize population-level insights to the Preceptor Table accurately.
justice-11 question Unconfoundedness means that the treatment assignment is independent of the potential outcomes, when we condition on pre-treatment covariates.
justice-12 question > library(tidyverse) + library(primer.data) + library(tidymodels) ── Attaching packages ──────────────── tidymodels 1.3.0 ── ✔ broom 1.0.8 ✔ rsample 1.3.0 ✔ dials 1.4.0 ✔ tune 1.3.0 ✔ infer 1.0.9 ✔ workflows 1.2.0 ✔ modeldata 1.4.0 ✔ workflowsets 1.1.1 ✔ parsnip 1.3.2 ✔ yardstick 1.3.2 ✔ recipes 1.3.1 ── Conflicts ─────────────────── tidymodels_conflicts() ── ✖ scales::discard() masks purrr::discard() ✖ dplyr::filter() masks stats::filter() ✖ recipes::fixed() masks stringr::fixed() ✖ purrr::is_null() masks testthat::is_null() ✖ dplyr::lag() masks stats::lag() ✖ rsample::matches() masks dplyr::matches(), tidyr::matches(), testthat::matches() ✖ yardstick::spec() masks readr::spec() ✖ recipes::step() masks stats::step() • Search for functions across packages at https://www.tidymodels.org/find/ Warning message: package ‘infer’ was built under R version 4.5.1 >
justice-13 question > library(tidyverse) + library(primer.data) + library(tidymodels) + library(broom) >
justice-14 question $$ \rho = \mathbb{P}(Y = 1) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k)}} $$
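A minimal sketch of how the formula above turns a linear predictor into a probability, using base R only. The coefficients and covariate values are hypothetical, chosen purely for illustration; they are not estimates from the stops data.

```r
# Hypothetical coefficients and covariate values (illustration only, not fitted values)
beta <- c(-1.5, 0.4, -0.2)  # beta_0, beta_1, beta_2
x <- c(1, 0)                # X_1, X_2

# Linear predictor: beta_0 + beta_1 * X_1 + beta_2 * X_2
eta <- beta[1] + sum(beta[-1] * x)

# Inverse-logit written out by hand, matching the formula
rho_manual <- 1 / (1 + exp(-eta))

# The same transformation using base R's plogis()
rho_plogis <- plogis(eta)

print(c(manual = rho_manual, plogis = rho_plogis))
```

Whatever the value of the linear predictor, the result always lies strictly between 0 and 1, which is what makes this form suitable for a binary outcome such as arrested.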
justice-15 question A potential weakness in the model is that it assumes a linear relationship on the log-odds scale between the predictors and the outcome, which may not hold true if important interaction terms or nonlinear effects are omitted.
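To make the point about omitted interaction terms concrete, the sketch below compares the model columns produced by an additive formula with those produced by an interacted one. The `toy` data frame is a made-up grid with two races and three zones, assumed only for illustration; it is not the stops data.

```r
# Toy grid of factor levels (hypothetical; expand.grid() creates factors by default)
toy <- expand.grid(race = c("Black", "White"), zone = c("A", "B", "C"))

# Columns of the design matrix without and with the race-by-zone interaction
additive <- colnames(model.matrix(~ race + zone, data = toy))
interacted <- colnames(model.matrix(~ race * zone, data = toy))

# Terms that exist only when the interaction is included
extra_terms <- setdiff(interacted, additive)
print(extra_terms)
```

An additive model forces the race difference to be the same in every zone on the log-odds scale; the extra columns are what allow that difference to vary by zone.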
courage-1 question Courage creates the data generating mechanism.
courage-2 exercise linear_reg(engine = "lm")
courage-3 exercise linear_reg(engine = "lm") |> fit(arrested ~ sex, data = x)
courage-4 exercise linear_reg(engine = "lm") |> fit(arrested ~ sex, data = x) |> tidy(conf.int = TRUE)
courage-5 exercise linear_reg(engine = "lm") |> fit(arrested ~ race, data = x)
courage-6 exercise linear_reg(engine = "lm") |> fit(arrested ~ race, data = x) |> tidy(conf.int = TRUE)
courage-7 exercise linear_reg(engine = "lm") |> fit(arrested ~ sex + race, data = x) |> tidy(conf.int = TRUE)
courage-8 exercise linear_reg(engine = "lm") |> fit(arrested ~ sex + race*zone, data = x) |> tidy(conf.int = TRUE)
courage-9 exercise fit_stops
courage-10 question > x <- stops |> + filter(race %in% c("black", "white")) |> + mutate(race = str_to_title(race), + sex = str_to_title(sex)) + + fit_stops <- linear_reg() |> + set_engine("lm") |> + fit(arrested ~ sex + race*zone, data = x) >
courage-11 question > library(easystats) # Attaching packages: easystats 0.7.5 (red = needs update) ✔ bayestestR 0.16.1 ✔ correlation 0.8.8 ✖ datawizard 1.1.0 ✔ effectsize 1.0.1 ✔ insight 1.3.1 ✔ modelbased 0.12.0 ✔ performance 0.15.0 ✔ parameters 0.27.0 ✔ report 0.6.1 ✔ see 0.11.0 Restart the R-Session and update packages with `easystats::easystats_update()`. Warning message: package ‘easystats’ was built under R version 4.5.1 >
courage-12 question > check_predictions(extract_fit_engine(fit_stops)) >
courage-13 question $$ \widehat{\mathbb{P}(\text{arrested} = 1)} = \frac{1}{1 + \exp\left(- \left[ 0.177 + 0.0614 \cdot \text{sex}_{\text{Male}} - 0.0445 \cdot \text{race}_{\text{White}} + 0.0146 \cdot \text{zone}_{\text{B}} + 0.00610 \cdot \text{zone}_{\text{C}} + 0.0781 \cdot \text{zone}_{\text{D}} + 0.00190 \cdot \text{zone}_{\text{E}} - 0.00271 \cdot \text{zone}_{\text{F}} + 0.0309 \cdot \text{zone}_{\text{G}} + 0.0757 \cdot \text{zone}_{\text{H}} + \text{(interaction terms)} \right] \right)} $$
courage-14 question > tutorial.helpers::show_file("stops.qmd", chunk = "Last") #| cache: True x <- stops |> filter(race %in% c("black", "white")) |> mutate( race = str_to_title(race), sex = str_to_title(sex), arrested = factor(arrested) # This line is essential! ) fit_stops <- logistic_reg() |> set_engine("glm") |> set_mode("classification") |> fit(arrested ~ sex + race * zone, data = x) >
courage-15 question > tutorial.helpers::show_file(".gitignore") stops_files *_cache >
courage-16 exercise tidy(fit_stops, conf.int = TRUE)
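As a rough base-R analogue of asking tidy() for estimates with confidence intervals, the sketch below fits a logistic regression to simulated data and tabulates coefficients with Wald intervals. The data frame `sim`, its coefficients, and the use of `confint.default()` are assumptions for illustration; the intervals reported by tidy() for the real model may be computed differently and will not match these numbers.

```r
set.seed(1)

# Simulated data (hypothetical; not the stops data)
n <- 500
sim <- data.frame(sex = factor(sample(c("Female", "Male"), n, replace = TRUE)))
sim$arrested <- factor(rbinom(n, 1, plogis(-1 + 0.5 * (sim$sex == "Male"))))

# Logistic regression with a factor outcome, as in the classification setup
fit <- glm(arrested ~ sex, data = sim, family = binomial)

# Estimates with 95% Wald intervals, all on the log-odds scale
result <- cbind(estimate = coef(fit), confint.default(fit))
print(result)
```

The estimates and intervals are on the log-odds scale, so they need to be passed through the inverse-logit before being read as probabilities.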
minutes question 180