ci: Confidence Intervals for Education

The ci package (v0.0.1) is an educational package providing intuitive functions for calculating confidence intervals (CIs) for various statistical parameters. Designed primarily for teaching and learning about statistical inference (particularly confidence intervals), the package offers user-friendly wrappers around established methods for the parameters listed in the table below.

All functions integrate seamlessly with Tidyverse workflows, making them ideal for classroom demonstrations and student exercises.

This vignette introduces the concept of CIs and shows how to use the ci package to calculate and visualize them.
| Function | Purpose | Input Data | Output Data |
|---|---|---|---|
| `ci_binom()` | Proportion CI | Summary statistics (frequencies) | Data frame |
| `ci_multinom()` | Proportion CI | Summary statistics (frequencies) | Data frame |
| `ci_mean_t()` | Mean CI | Data frame | Data frame |
| `ci_mean_t_stat()` | Mean CI | Summary statistics (M, SD, n) | Data frame |
| `ci_boot()` | Any statistic CI | Data frame | Data frame |
Examples of the most common use cases:
# Binomial CI: two categories (e.g., mutant/wild-type)
ci_binom(x = 54, n = 80)
# Multinomial CI: 3+ categories (e.g., blood types)
ci_multinom(c("A" = 20, "B" = 35, "AB" = 25, "O" = 15))
# Mean CI based on data: average with groups
data |>
group_by(group_var) |>
ci_mean_t(numeric_var)
# Mean CI based on summary statistics (mean, SD, n): e.g., literature values
ci_mean_t_stat(mean_ = 75, sd_ = 10, n = 25)
# Any statistic (median, correlation, etc.)
data |> ci_boot(var1, var2, FUN = cor, R = 1000)
To get help, see ?ci or help(package = "ci") for the package overview, and ?ci::ci_mean_t, ?ci::ci_binom, etc. for individual functions.

A confidence interval (CI) is a fundamental concept in statistics, providing a range of values that likely contains the true population parameter. CIs help quantify the uncertainty associated with sample estimates.
Imagine you want to know the average height of all students in your university. You can’t measure everyone, so you measure 100 students and find their average height is 170 cm. But would a different group of 100 students give exactly the same average? Probably not.
A confidence interval gives you a range of plausible values for the true population average. Instead of saying “the average is 170 cm,” you might say “I’m 95% confident the true average is between 168 cm and 172 cm.”
Population vs Sample:
Parameter vs Statistic:
Confidence Level:
Key Point: The interval either contains the true value or it doesn’t. The “95%” refers to how often the method works, not the probability for any single interval.
❌ Wrong: “95% probability the true value is in this interval”
✅ Right: “We’re 95% confident the true value is in this interval”
❌ Wrong: Making intervals narrower by changing the confidence level
✅ Right: Choose the confidence level before calculating (usually 95%)
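To see what “the method works 95% of the time” means in practice, you can simulate it yourself. The sketch below uses only base R (no ci package functions) and an assumed known population: it draws many samples and counts how many t-based intervals capture the true mean.

```r
# Coverage simulation: how often does a 95% CI capture the true mean?
set.seed(1)
true_mean <- 170  # assumed known population mean (cm), for illustration only

captured <- replicate(1000, {
  sample_heights <- rnorm(100, mean = true_mean, sd = 10)  # one sample of 100 students
  ci <- t.test(sample_heights, conf.level = 0.95)$conf.int
  ci[1] <= true_mean && true_mean <= ci[2]                 # did this interval capture 170?
})

mean(captured)  # proportion of intervals that captured the true mean; close to 0.95
```

Each individual interval either captures 170 or it doesn’t; the 95% describes the long-run success rate of the procedure.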
Choosing Confidence Level:
Interpreting Results:
Sample Size Planning:
Binomial proportion CI measures the uncertainty around a proportion in binary outcomes (i.e., two categories like mutant/wild-type, diseased/healthy, success/failure, yes/no).
To calculate confidence intervals in this case, pass the number of successes x and the total n as input to ci_binom().

Scenario: You investigate 80 Drosophila specimens (n = 80) and 54 have no mutation (x = 54).
# 54 out of 80 Drosophila are mutation-free
ci_binom(x = 54, n = 80)
#> # A tibble: 1 × 5
#> est lwr.ci upr.ci x n
#> <dbl> <dbl> <dbl> <int> <int>
#> 1 0.675 0.566 0.768 54 80
Interpretation: We are 95% confident that between 56.6% and 76.8% of the population are mutation-free.
Let’s simulate results with the same proportion (67.5%) but different sample sizes:
ci_n1 <- ci_binom(x = 54, n = 80) # Small sample
ci_n2 <- ci_binom(x = 135, n = 200) # Larger sample
ci_n3 <- ci_binom(x = 675, n = 1000) # Even larger sample
Now let’s merge all the results and calculate the difference between the upper and lower CI bounds:
bind_rows(ci_n1, ci_n2, ci_n3) |>
mutate(diff_upr_lwr = round(upr.ci - lwr.ci, digits = 3))
#> # A tibble: 3 × 6
#> est lwr.ci upr.ci x n diff_upr_lwr
#> <dbl> <dbl> <dbl> <int> <int> <dbl>
#> 1 0.675 0.566 0.768 54 80 0.201
#> 2 0.675 0.607 0.736 135 200 0.129
#> 3 0.675 0.645 0.703 675 1000 0.058
💡 Notice: Larger samples give narrower (more precise) intervals. This is true not only for proportions but also for means and other statistics.
Let’s see how confidence level affects interval width:
ci_q1 <- ci_binom(x = 54, n = 80, conf.level = 0.90) # Less confident, narrower
ci_q2 <- ci_binom(x = 54, n = 80, conf.level = 0.95) # Standard
ci_q3 <- ci_binom(x = 54, n = 80, conf.level = 0.99) # More confident, wider
bind_rows(ci_q1, ci_q2, ci_q3) |>
mutate(diff_upr_lwr = round(upr.ci - lwr.ci, digits = 3))
#> # A tibble: 3 × 6
#> est lwr.ci upr.ci x n diff_upr_lwr
#> <dbl> <dbl> <dbl> <int> <int> <dbl>
#> 1 0.675 0.584 0.754 54 80 0.17
#> 2 0.675 0.566 0.768 54 80 0.201
#> 3 0.675 0.531 0.792 54 80 0.261
💡 Notice: Higher confidence levels give wider intervals.
⚠️ This is just an educational example. Always choose the confidence level based on your study design before calculating CIs.
Multinomial proportion CI measures uncertainty around proportions in categorical outcomes with 3 or more categories (e.g., A/B/AB/O blood types, eye colors, genotypes).
To calculate confidence intervals in this case, pass a named vector of counts to ci_multinom().

Scenario: Blood type distribution in patients with a certain medical condition.
# Blood type distribution
blood_types <- c("A" = 20, "B" = 35, "AB" = 25, "O" = 15)
ci_multinom(blood_types)
#> # A tibble: 4 × 6
#> group est lwr.ci upr.ci x n
#> <fct> <dbl> <dbl> <dbl> <int> <int>
#> 1 A 0.211 0.126 0.331 20 95
#> 2 B 0.368 0.257 0.497 35 95
#> 3 AB 0.263 0.167 0.388 25 95
#> 4 O 0.158 0.0860 0.272 15 95
Scenario: You survey lab members about their preferred method of commuting to the university.
# Transportation preferences
c("Car" = 20, "Bus" = 45, "Bike" = 15, "Walk" = 20) |>
ci_multinom(method = "goodman")
#> # A tibble: 4 × 6
#> group est lwr.ci upr.ci x n
#> <fct> <dbl> <dbl> <dbl> <int> <int>
#> 1 Car 0.2 0.119 0.316 20 100
#> 2 Bus 0.45 0.332 0.574 45 100
#> 3 Bike 0.15 0.0816 0.259 15 100
#> 4 Walk 0.2 0.119 0.316 20 100
Interpretation: We’re 95% confident that the true proportion of people who prefer taking the bus is between 33.2% and 57.4%.
In this case, simultaneous CIs are calculated. This means that CIs are calculated for each category at once, ensuring that the overall confidence level (e.g., 95%) is maintained across all categories.
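To see why this matters, you can compare the simultaneous intervals with separate per-category binomial intervals, which control the confidence level for each category individually but not jointly. A sketch using the package’s own functions:

```r
blood_types <- c("A" = 20, "B" = 35, "AB" = 25, "O" = 15)

# Simultaneous CIs: overall 95% confidence across all four categories at once
ci_multinom(blood_types)

# Separate binomial CI for category "A" alone (20 of 95 total):
# each such interval is 95% on its own, so the chance that all four
# capture their true proportions simultaneously is below 95%
ci_binom(x = 20, n = 95)
```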
To calculate confidence intervals for means directly from data frames, use ci_mean_t().
Scenario: You are investigating crop yield.
# Using built-in dataset as example
data(npk, package = "datasets")
head(npk, n = 3)
#> block N P K yield
#> 1 1 0 1 1 49.5
#> 2 1 1 1 0 62.8
#> 3 1 0 0 0 46.8
# Average crop yield (overall)
npk |>
ci_mean_t(yield)
#> # A tibble: 1 × 3
#> mean lwr.ci upr.ci
#> <dbl> <dbl> <dbl>
#> 1 54.9 52.3 57.5
Interpretation: We’re 95% confident the true average yield is between 52.3 and 57.5.
Scenario: You are investigating crop yield in different conditions (with/without fertilizers).
# Compare yields with and without nitrogen fertilizer
npk |>
group_by(N) |>
ci_mean_t(yield)
#> # A tibble: 2 × 4
#> N mean lwr.ci upr.ci
#> <fct> <dbl> <dbl> <dbl>
#> 1 0 52.1 48.6 55.5
#> 2 1 57.7 54.0 61.4
Look at the confidence intervals: if they don’t overlap, it suggests a potential difference between groups.
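Comparing intervals this way is only a rough visual check; overlapping CIs do not prove the absence of a difference. For the nitrogen comparison above, base R’s t.test() gives both a p-value and a CI for the difference in means directly:

```r
# Two-sample t-test: CI for the *difference* in mean yield (N = 0 vs N = 1)
t.test(yield ~ N, data = npk)
```

If the CI for the difference excludes 0, that is direct evidence of a difference between the groups.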
# Multiple grouping factors
npk |>
group_by(N, P, K) |>
ci_mean_t(yield)
#> # A tibble: 8 × 6
#> N P K mean lwr.ci upr.ci
#> <fct> <fct> <fct> <dbl> <dbl> <dbl>
#> 1 0 1 1 50.5 44.6 56.4
#> 2 1 1 0 57.9 44.3 71.5
#> 3 0 0 0 51.4 40.0 62.9
#> 4 1 0 1 54.7 44.2 65.1
#> 5 1 0 0 63.8 51.1 76.4
#> 6 1 1 1 54.4 41.9 66.8
#> 7 0 0 1 52 38.0 66.0
#> 8 0 1 0 54.3 31.0 77.7
Scenario: Iris flower measurements.
data(iris, package = "datasets")
head(iris, n = 3)
#> Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 1 5.1 3.5 1.4 0.2 setosa
#> 2 4.9 3.0 1.4 0.2 setosa
#> 3 4.7 3.2 1.3 0.2 setosa
# Petal length by flower species
iris |>
group_by(Species) |>
ci_mean_t(Petal.Length)
#> # A tibble: 3 × 4
#> Species mean lwr.ci upr.ci
#> <fct> <dbl> <dbl> <dbl>
#> 1 setosa 1.46 1.41 1.51
#> 2 versicolor 4.26 4.13 4.39
#> 3 virginica 5.55 5.40 5.71
Interpretation: The results suggest that the three species have clearly different petal lengths (no overlapping intervals).
When you only have descriptive statistics (from literature, reports, etc.), you may use them directly.
Scenario: You read a study that reports “mean = 75, SD = 10, n = 25” but doesn’t give confidence intervals.
Scenario: You compare 3 studies.
You may enter data as vectors:
# Three enzyme activity measurements
mean_val <- c(78, 82, 75)
std_dev <- c(8, 7, 9)
n <- c(30, 28, 32)
group <- c("Enzyme A", "Enzyme B", "Enzyme C")
ci_mean_t_stat(mean_val, std_dev, n, group)
#> # A tibble: 3 × 6
#> group mean lwr.ci upr.ci sd n
#> <fct> <dbl> <dbl> <dbl> <dbl> <int>
#> 1 Enzyme A 78 75.0 81.0 8 30
#> 2 Enzyme B 82 79.3 84.7 7 28
#> 3 Enzyme C 75 71.8 78.2 9 32
You may enter data as a data frame too. Then use either `$`-based or with()-based syntax:
# Three different treatment conditions from literature
treatments <-
tibble(
treatment = c("Control", "Low dose", "High dose"),
mean_response = c(78, 82, 75),
sd = c(8, 7, 9),
n = c(30, 28, 32)
)
treatments |>
with(ci_mean_t_stat(mean_response, sd, n, treatment))
#> # A tibble: 3 × 6
#> group mean lwr.ci upr.ci sd n
#> <fct> <dbl> <dbl> <dbl> <dbl> <int>
#> 1 Control 78 75.0 81.0 8 30
#> 2 Low dose 82 79.3 84.7 7 28
#> 3 High dose 75 71.8 78.2 9 32
# Same mean and SD but four different sample sizes
ci_mean_t_stat(mean_ = 75, sd_ = 10, n = c(10, 25, 50, 100))
#> # A tibble: 4 × 6
#> group mean lwr.ci upr.ci sd n
#> <fct> <dbl> <dbl> <dbl> <dbl> <int>
#> 1 "" 75 67.8 82.2 10 10
#> 2 "" 75 70.9 79.1 10 25
#> 3 "" 75 72.2 77.8 10 50
#> 4 "" 75 73.0 77.0 10 100
Larger samples yield narrower (more precise) confidence intervals. Doubling the sample size doesn’t halve the interval width: the width shrinks roughly in proportion to 1/√n, so you need about four times the sample size to halve it.
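The 1/√n behavior is visible directly in the t-interval formula, where the half-width is t × SD / √n. A base-R sketch (the helper function name is ours, for illustration):

```r
# Half-width of a t-based CI: t_critical * sd / sqrt(n)
half_width <- function(sd_, n, conf.level = 0.95) {
  qt(1 - (1 - conf.level) / 2, df = n - 1) * sd_ / sqrt(n)
}

half_width(sd_ = 10, n = 25)   # ~4.1, matching the n = 25 row above
half_width(sd_ = 10, n = 50)   # doubling n shrinks the width by less than half
half_width(sd_ = 10, n = 100)  # quadrupling n roughly halves the width
```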
Bootstrapping methods are based on resampling your data with replacement to create many “bootstrap samples.” For each sample, the statistic of interest is calculated. The distribution of these statistics is then used to estimate confidence intervals.
Bootstrap is useful when:
⚠️ Warning: Bootstrap needs adequate sample sizes (not fewer than 20 per group, preferably 30+ per group).
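To demystify what bootstrapping does, here is a minimal percentile bootstrap written in plain base R. This is a conceptual sketch of the resampling idea, not the package’s actual implementation:

```r
set.seed(42)
x <- iris$Petal.Length

# 1. Resample the data with replacement many times,
#    computing the statistic of interest each time
boot_medians <- replicate(1000, median(sample(x, replace = TRUE)))

# 2. Percentile method: take quantiles of the bootstrap
#    statistics as the 95% confidence interval
quantile(boot_medians, probs = c(0.025, 0.975))
```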
The main arguments of ci_boot() are:

- `data`: your data frame, usually piped in using `%>%` or `|>`.
- `x`, `y`: one or two unquoted variable names to use (`y` is optional).
- `FUN`: the function to compute the statistic (e.g., `median`, `cor`, etc.). Use no parentheses after the function name (e.g., `FUN = median`, not `FUN = median()`).
- `...`: additional arguments to pass to `FUN` (e.g., `method = "pearson"` for `cor()`).
- `R`: number of bootstrap resamples (e.g., 1000). A larger `R` gives more accurate CIs but takes longer to compute.
- `bci.method`: bootstrap CI method: `"bca"` for bias-corrected and accelerated (recommended default), or `"perc"` for the percentile method.

Use the BCa method when:

Use the percentile method when:

💡 Advice: Bootstrap methods give slightly different results on the same data in different runs due to their random nature. For reproducibility, set a random seed using set.seed() before calling ci_boot().
Single-variable statistics need only one variable to be computed. Examples are mean, median, SD, etc.
Median CI
set.seed(123) # For reproducible results
# Median petal length for all flowers
iris |>
ci_boot(Petal.Length, FUN = median, R = 1000, bci.method = "bca")
#> # A tibble: 1 × 3
#> median lwr.ci upr.ci
#> <dbl> <dbl> <dbl>
#> 1 4.35 4 4.54
Median CI by Group
set.seed(456)
# Median by species
iris |>
group_by(Species) |>
ci_boot(Petal.Length, FUN = median, R = 1000, bci.method = "bca")
#> # A tibble: 3 × 4
#> Species median lwr.ci upr.ci
#> <fct> <dbl> <dbl> <dbl>
#> 1 setosa 1.5 1.4 1.5
#> 2 versicolor 4.35 4.1 4.5
#> 3 virginica 5.55 5.2 5.6
Standard Deviation CI
When to use: CIs of variability measures (SD, IQR, etc.) should be used when you want to know how consistent or variable your data is.
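For example, a bootstrap CI for the standard deviation works exactly like the median examples above; just pass FUN = sd (a sketch using the package’s documented interface):

```r
set.seed(789)

# Bootstrap CI for the standard deviation of petal length
iris |>
  ci_boot(Petal.Length, FUN = sd, R = 1000, bci.method = "bca")
```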
Variable-pair statistics need two variables to be computed. Examples are correlation coefficients (Pearson, Spearman, Kendall), Cramér’s V, Goodman-Kruskal Tau and Lambda, etc.
Pearson correlation measures the strength of the linear relationship between two continuous variables:
set.seed(101)
# Relationship between petal length and width for each species
iris |>
group_by(Species) |>
ci_boot(
Petal.Length, Petal.Width,
FUN = cor, method = "pearson",
R = 1000, bci.method = "bca"
)
#> # A tibble: 3 × 4
#> Species cor lwr.ci upr.ci
#> <fct> <dbl> <dbl> <dbl>
#> 1 setosa 0.332 0.0649 0.517
#> 2 versicolor 0.787 0.665 0.868
#> 3 virginica 0.322 0.111 0.514
Note 1: `method` is an argument of the function cor().

Note 2: If the CI does not include 0, there is likely a real correlation.
Spearman correlation measures the strength of a monotonic (possibly non-linear) relationship between two variables, based on ranks:
set.seed(101)
# Correlation between petal length and width
iris |>
group_by(Species) |>
ci_boot(
Petal.Length, Petal.Width,
FUN = cor, method = "spearman",
R = 1000, bci.method = "bca"
)
#> # A tibble: 3 × 4
#> Species cor lwr.ci upr.ci
#> <fct> <dbl> <dbl> <dbl>
#> 1 setosa 0.271 -0.00466 0.510
#> 2 versicolor 0.787 0.615 0.875
#> 3 virginica 0.363 0.111 0.591
Results: Versicolor shows a strong positive correlation, while setosa and virginica show only weak-to-moderate ones; setosa’s interval even includes 0, so its correlation may not be real.
The package offers several bootstrap methods, set using the `bci.method` argument. Different methods can give slightly different results. Let’s look at two examples:

- `bca` – bias-corrected and accelerated (BCa) bootstrap CI (recommended default)
- `perc` – percentile bootstrap CI

set.seed(222)
# Percentile method (simpler, more robust)
iris |> ci_boot(Petal.Length, FUN = median, R = 1000, bci.method = "perc")
#> # A tibble: 1 × 3
#> median lwr.ci upr.ci
#> <dbl> <dbl> <dbl>
#> 1 4.35 4 4.55
# BCa method (often more accurate)
iris |> ci_boot(Petal.Length, FUN = median, R = 1000, bci.method = "bca")
#> # A tibble: 1 × 3
#> median lwr.ci upr.ci
#> <dbl> <dbl> <dbl>
#> 1 4.35 4 4.5
# Same data, different confidence levels
ci_mean_t_stat(mean_ = 75, sd_ = 10, n = 25, conf.level = 0.90)
#> # A tibble: 1 × 6
#> group mean lwr.ci upr.ci sd n
#> <fct> <dbl> <dbl> <dbl> <dbl> <int>
#> 1 "" 75 71.6 78.4 10 25
ci_mean_t_stat(mean_ = 75, sd_ = 10, n = 25, conf.level = 0.95)
#> # A tibble: 1 × 6
#> group mean lwr.ci upr.ci sd n
#> <fct> <dbl> <dbl> <dbl> <dbl> <int>
#> 1 "" 75 70.9 79.1 10 25
ci_mean_t_stat(mean_ = 75, sd_ = 10, n = 25, conf.level = 0.99)
#> # A tibble: 1 × 6
#> group mean lwr.ci upr.ci sd n
#> <fct> <dbl> <dbl> <dbl> <dbl> <int>
#> 1 "" 75 69.4 80.6 10 25
Questions:
# Effect of sample size on precision
ci_mean_t_stat(mean_ = 85, sd_ = 15, n = c(10, 25, 50, 100, 400))
#> # A tibble: 5 × 6
#> group mean lwr.ci upr.ci sd n
#> <fct> <dbl> <dbl> <dbl> <dbl> <int>
#> 1 "" 85 74.3 95.7 15 10
#> 2 "" 85 78.8 91.2 15 25
#> 3 "" 85 80.7 89.3 15 50
#> 4 "" 85 82.0 88.0 15 100
#> 5 "" 85 83.5 86.5 15 400
Questions:
# Two treatment methods
ci_mean_t_stat(
mean_ = c(78, 82),
sd_ = c(8, 7),
n = c(30, 28),
group = c("Treatment A", "Treatment B")
)
#> # A tibble: 2 × 6
#> group mean lwr.ci upr.ci sd n
#> <fct> <dbl> <dbl> <dbl> <dbl> <int>
#> 1 Treatment A 78 75.0 81.0 8 30
#> 2 Treatment B 82 79.3 84.7 7 28
Questions:
In this exercise, we investigate how the number of bootstrap resamples (R) affects both computation time and the precision of confidence intervals. We’ll run the same bootstrap analysis multiple times with different values of R (500, 1000, 3000, 5000, and 9999) to observe patterns in computation time, interval width, and interval stability. Is a larger R always worth it?

To avoid repetitive code, we create a helper function simulate_ci() that:

- runs ci_boot() with a specified number of resamples (R),
- measures computation time with system.time(),
- returns the results together with useful metrics (CI width, the R value, computation time).
value, computation time)# Function to run bootstrap CI and measure computation time
simulate_ci <- function(R) {
# system.time() tracks how long the code takes to run
time_s <- system.time(
result <- iris |> ci_boot(Petal.Length, FUN = mean, R = R, bci.method = "bca")
)
# Add useful metrics to the results:
# - diff: CI width (difference between upper and lower bounds)
# - R: number of resamples used
# - time_s: elapsed computation time in seconds
result <- result |>
mutate(
diff = (upr.ci - lwr.ci) |> round(digits = 3),
upr.ci = upr.ci |> round(digits = 3),
lwr.ci = lwr.ci |> round(digits = 3),
time_s = time_s[3], # [3] extracts elapsed time (wall-clock time),
R = R,
)
return(result)
}
Now we run the simulation multiple times (5 replications) for each
value of R
to see how results vary:
# Run simulation with different numbers of bootstrap resamples
# Each R value is tested 5 times to assess variability
set.seed(307)
results <-
  rep(c(500, 1000, 3000, 5000, 9999), each = 5) |> # each R value 5 times
  lapply(simulate_ci) |>
  bind_rows()
print(results, n = Inf)
#> # A tibble: 25 × 6
#> mean lwr.ci upr.ci diff time_s R
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 3.76 3.49 4.01 0.525 0.0500 500
#> 2 3.76 3.49 4.06 0.573 0.100 500
#> 3 3.76 3.49 4.02 0.526 0.120 500
#> 4 3.76 3.46 4.00 0.54 0.0800 500
#> 5 3.76 3.42 4.03 0.612 0.100 500
#> 6 3.76 3.45 4.01 0.562 0.150 1000
#> 7 3.76 3.47 4.06 0.589 0.140 1000
#> 8 3.76 3.45 4.03 0.583 0.0900 1000
#> 9 3.76 3.47 4.03 0.559 0.120 1000
#> 10 3.76 3.45 4.04 0.587 0.0900 1000
#> 11 3.76 3.48 4.04 0.562 0.260 3000
#> 12 3.76 3.49 4.04 0.552 0.310 3000
#> 13 3.76 3.48 4.05 0.567 0.240 3000
#> 14 3.76 3.47 4.04 0.571 0.230 3000
#> 15 3.76 3.47 4.05 0.574 0.240 3000
#> 16 3.76 3.46 4.04 0.575 0.420 5000
#> 17 3.76 3.48 4.04 0.558 0.390 5000
#> 18 3.76 3.46 4.03 0.561 0.550 5000
#> 19 3.76 3.47 4.04 0.569 0.390 5000
#> 20 3.76 3.47 4.04 0.572 0.41 5000
#> 21 3.76 3.47 4.03 0.563 0.98 9999
#> 22 3.76 3.47 4.04 0.572 0.860 9999
#> 23 3.76 3.48 4.04 0.558 0.860 9999
#> 24 3.76 3.48 4.03 0.558 0.860 9999
#> 25 3.76 3.48 4.04 0.554 0.870 9999
# Summary by number of replications (R)
summary_of_results <-
results |>
group_by(R) |>
summarize(
diff_mean = mean(diff) |> round(digits = 3),
diff_sd = sd(diff) |> round(digits = 3),
time_mean = mean(time_s) |> round(digits = 2),
time_sd = sd(time_s) |> round(digits = 2)
)
summary_of_results
#> # A tibble: 5 × 5
#> R diff_mean diff_sd time_mean time_sd
#> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 500 0.555 0.037 0.09 0.03
#> 2 1000 0.576 0.014 0.12 0.03
#> 3 3000 0.565 0.009 0.26 0.03
#> 4 5000 0.567 0.007 0.43 0.07
#> 5 9999 0.561 0.007 0.89 0.05
Questions:

- How does R affect computation time?
- How does R affect the (a) width and (b) stability of the confidence intervals?
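A quick plot can help answer these questions visually. A sketch assuming ggplot2 is available, using the summary_of_results object computed above:

```r
library(ggplot2)

# Computation time grows roughly linearly with R,
# while the run-to-run variability of CI width (diff_sd) shrinks and plateaus
ggplot(summary_of_results, aes(x = R, y = time_mean)) +
  geom_line(color = "steelblue") +
  geom_point(color = "steelblue") +
  labs(
    x = "Number of bootstrap resamples (R)",
    y = "Mean computation time (s)"
  )
```

Swapping `time_mean` for `diff_sd` on the y-axis shows the diminishing precision gains beyond a few thousand resamples.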