In discrimination experiments, candidates are sent to the same test
(e.g. a job offer or a house rental) and one examines whether they receive the same
outcome. The numbers of non-negative or explicitly positive answers are
examined in detail, looking for differences in outcome. In what follows,
we consider a test of the effect of gender and origin on the
recruitment of software developers (the inter1 data set). The
candidates have a French, Moroccan, Senegalese or Vietnamese origin,
suggested by their first and last names.
library(callback)
m <- inter1
table(m$origin, m$lastn)
#>    
#>     Bertrand Diallo Diouf Kaidi Moreau Pham Tran Zalegh
#>   F      310      0     0     0    310    0    0      0
#>   M        0      0     0   310      0    0    0    310
#>   S        0    310   310     0      0    0    0      0
#>   V        0      0     0     0      0  310  310      0
table(m$origin, m$firstn)
#>    
#>     Abdallah Amadou Anthony Fatou Jamila Minh Trang Sophie Tien Hiep
#>   F        0      0     310     0      0          0    310         0
#>   M      310      0       0     0    310          0      0         0
#>   S        0    310       0   310      0          0      0         0
#>   V        0      0       0     0      0        310      0       310
The contents of the data set are:
str(m)
#> 'data.frame':    2480 obs. of  11 variables:
#>  $ offer    : Factor w/ 310 levels "1","2","3","4",..: 1 1 1 1 1 1 1 1 2 2 ...
#>  $ firstn   : Factor w/ 8 levels "Abdallah","Amadou",..: 7 4 5 6 1 2 3 8 1 2 ...
#>  $ lastn    : Factor w/ 8 levels "Bertrand","Diallo",..: 5 3 4 7 8 2 1 6 8 2 ...
#>  $ origin   : Factor w/ 4 levels "F","M","S","V": 1 3 2 4 2 3 1 4 2 3 ...
#>  $ sentorder: int  3 7 6 2 1 5 4 8 8 4 ...
#>  $ gender   : Factor w/ 2 levels "Man","Woman": 2 2 2 2 1 1 1 1 1 1 ...
#>  $ callback : logi  TRUE TRUE TRUE TRUE FALSE FALSE ...
#>  $ paris    : Factor w/ 2 levels "No","Yes": 1 1 1 1 1 1 1 1 1 1 ...
#>  $ cont     : Factor w/ 2 levels "LTC","STC": 1 1 1 1 1 1 1 1 1 1 ...
#>  $ ansorder : int  1 2 3 4 9 9 9 9 9 9 ...
#>  $ date     : Factor w/ 3 levels "April 2009","February 2009",..: 2 2 2 2 2 2 2 2 2 2 ...
The offer variable is very important. It identifies the
job offer. This matters because, in order to test
discrimination, the candidates must apply to the same job
offer. It is passed as the cluster parameter of the
callback() function. With cluster = "offer" we
are sure that all the computations will be paired, which means that we
will always compare candidates on the very same job offer. This is
essential to produce meaningful results, since otherwise the difference
in answers could come from differences between recruiters and not from
differences in gender or origin.
The second group of important variables defines the
candidates. Here, there are two variables: the suggested origin (F for
French, M for Moroccan, V for Vietnamese and S for Senegalese) and the
gender. Combined, they give the candidate variable that we use
in the analysis. The origin and gender
variables are factors, and the reference levels of these factors
implicitly define the reference candidate. By convention, the reference
candidate is the one that is the least likely to be discriminated
against. Here the man of French origin should be taken, because his
origin and gender should not be sources of discrimination in the French
labor market. In practice, we will check that this candidate really had
the highest callback rate. We can find the reference levels of our
factors by looking at the first level given by the levels()
function.
By default, the levels are sorted alphabetically.
It is pure chance that we find the French man as the reference. The reference can be
changed with the relevel() function. For instance, to take the woman as the
reference, apply relevel() to the gender factor with ref = "Woman":
the new factor gender2 then has “Woman” as the reference.
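As a small illustration in base R, the sketch below shows how levels() reports the reference level first and how relevel() moves another level to the front. The toy vector is hypothetical and is not taken from the inter1 data.

```r
# Toy factor, for illustration only (not the inter1 data).
gender <- factor(c("Man", "Woman", "Woman", "Man"))

# Levels are sorted alphabetically by default, so "Man" is the reference.
levels(gender)

# relevel() returns a new factor whose first level is the chosen reference.
gender2 <- relevel(gender, ref = "Woman")
levels(gender2)
```

Only the order of the levels changes; the observations themselves are untouched.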
The last element we need is, obviously, the outcome of the job
application. It is given by the callback variable. It is a
Boolean variable, TRUE when the recruiter gives a non-negative callback
(in this data set), and FALSE otherwise.
We can now launch the callback() function, which
prepares the data for statistical analysis. Here we need to choose the
comp parameter. There are $n=8$ candidates, so that $n(n-1)/2 = 8 \times 7/2 = 28$ comparisons are
possible. This is a large number, which is why callback() by default
performs the statistical analysis with respect to the reference candidate
only, with comp = "ref". This reduces our
analysis to $n-1=7$ comparisons. One
can get the 28 comparisons by setting comp = "all"
instead.
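The counts quoted above can be checked in base R. The sketch below rebuilds the eight candidate labels by hand, following the "origin.gender" naming printed by the package, and enumerates the pairs. The variable names are illustrative.

```r
# Candidate labels, built from the four origins and two genders.
origins <- c("F", "M", "S", "V")
genders <- c("Man", "Woman")
candidates <- as.vector(outer(origins, genders, paste, sep = "."))

n     <- length(candidates)  # number of candidates
n_all <- choose(n, 2)        # all unordered pairs, n(n-1)/2
n_ref <- n - 1               # pairs involving the reference candidate only

# Enumerate all pairs, then keep those that include "F.Man".
all_pairs <- combn(sort(candidates), 2)
ref_pairs <- all_pairs[, all_pairs[1, ] == "F.Man", drop = FALSE]
```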
dtest <- callback(
  data = m,
  cluster = "offer",
  candid = c("origin", "gender"),
  callback = "callback"
)
The dtest object contains the formatted data needed for
the callback analysis. Using print() gives the main
characteristics of the experiment:
print(dtest)
#> 
#>  Structure of the experiment 
#>  ---------------------------
#>  
#>  Candidates defined by: origin gender 
#>  Callback variable: callback 
#>  
#>  Number of tests for each candidate:
#> 
#>   F.Man F.Woman   M.Man M.Woman   S.Man S.Woman   V.Man V.Woman 
#>     310     310     310     310     310     310     310     310 
#> 
#>  
#>  Number of tests for each pair of candidates:
#> 
#>  F.Man.vs.F.Woman F.Man.vs.M.Man F.Man.vs.M.Woman F.Man.vs.S.Man
#>               310            310              310            310
#>  F.Man.vs.S.Woman F.Man.vs.V.Man F.Man.vs.V.Woman
#>               310            310              310
#> 
#>  
#>  Number of tests with all the candidates: 310
We find that the experiment is standard, since all the candidates have been sent to all the tests. When a candidate of the same type is sent several times to a test, the most favorable answer is kept (the “max” rule). There are other ways to deal with this issue.
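A minimal sketch of the “max” rule in base R, on hypothetical toy data. It illustrates the idea and is not the package's internal code: when a candidate type appears twice on the same offer, the answers are collapsed with max(), so that one positive answer is enough.

```r
# Toy data: candidate "A" was sent twice to offer 1.
toy <- data.frame(
  offer    = c(1, 1, 1, 2, 2),
  candid   = c("A", "A", "B", "A", "B"),
  callback = c(FALSE, TRUE, FALSE, FALSE, TRUE)
)

# Keep the most favorable answer per offer and candidate type.
collapsed <- aggregate(callback ~ offer + candid, data = toy, FUN = max)
collapsed$callback <- as.logical(collapsed$callback)
collapsed
```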
We can take a look at the global callback rates of the candidates by entering:
print(stat_raw(dtest))
#> 
#>  Proportions: raw callback rates 
#>  Confidence intervals: Student at 95 %
#>  
#>         tests callback inf_p_callback p_callback sup_p_callback
#> F.Man     310       86     0.22730239 0.27741935      0.3275363
#> F.Woman   310       70     0.17900426 0.22580645      0.2726086
#> M.Man     310       65     0.16411033 0.20967742      0.2552445
#> M.Woman   310       32     0.06916861 0.10322581      0.1372830
#> S.Man     310       43     0.10001944 0.13870968      0.1773999
#> S.Woman   310       26     0.05284271 0.08387097      0.1148992
#> V.Man     310       38     0.08587036 0.12258065      0.1592909
#> V.Woman   310       62     0.15522525 0.20000000      0.2447748
A graphical representation is obtained by calling plot() on the same object.
It is possible to change the definition of the confidence intervals, the confidence level and the colors of the plot. If you prefer the Clopper-Pearson definition, a 90% confidence interval, a “steelblue3” bar and a black confidence interval, enter:
g <- stat_raw(dtest, level = 0.9, method = "cp")
print(g)
#> 
#>  Proportions: raw callback rates 
#>  Confidence intervals: Clopper-Pearson at 90 %
#>  
#>         tests callback inf_p_callback p_callback sup_p_callback
#> F.Man     310       86     0.23570047 0.27741935      0.3223433
#> F.Woman   310       70     0.18721293 0.22580645      0.2683608
#> M.Man     310       65     0.17222662 0.20967742      0.2513276
#> M.Woman   310       32     0.07612170 0.10322581      0.1361644
#> S.Man     310       43     0.10749071 0.13870968      0.1752003
#> S.Woman   310       26     0.05943210 0.08387097      0.1144672
#> V.Man     310       38     0.09312657 0.12258065      0.1575588
#> V.Woman   310       62     0.16327759 0.20000000      0.2410655
plot(g, col = c("steelblue3", "black"))
When all the candidates are sent to all the tests, the previous figures may be used to measure discrimination. However, when there is a rotation of the candidates, so that only a part of them is sent to each test, this may no longer be the case. For this reason, we prefer to use matched statistics, which only compare candidates that have been sent to the same tests.
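For readers who want to see where such intervals come from, the bounds reported above for F.Man (86 callbacks out of 310 tests) can be approximated with base R alone. This is a cross-check under two assumptions: that the Student interval is the usual t interval on the 0/1 outcomes, and that "cp" is the standard exact binomial interval. It is not the package's own code.

```r
# F.Man: 86 callbacks out of 310 tests, taken from the table above.
k <- 86
n <- 310
x <- c(rep(1, k), rep(0, n - k))  # 0/1 outcomes

# Student interval at 95% on the mean of the 0/1 outcomes.
ci_student <- as.numeric(t.test(x, conf.level = 0.95)$conf.int)

# Clopper-Pearson (exact binomial) interval at 90%.
ci_cp <- as.numeric(binom.test(k, n, conf.level = 0.90)$conf.int)

round(ci_student, 4)
round(ci_cp, 4)
```

Under these assumptions, the two intervals should agree with the F.Man rows printed by stat_raw() up to rounding.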
Since we do pairwise comparisons, we will consider two candidates 1 and 2 that are sent to the same test. There are four possible outcomes: no callback (denoted 0 for both candidates), only one of the two candidates is called back (denoted 1 for the candidate called back, 0 for the other), or both candidates are called back (denoted 1 for both candidates). We count the corresponding cases and denote them $n_{00}$ (neither candidate called back), $n_{10}$ (only candidate 1), $n_{01}$ (only candidate 2) and $n_{11}$ (both candidates).
In order to get the result of the discrimination tests, we will use
the stat_count() function. Its result can be saved into an object for
further exports, or printed. The following instruction:
sp <- stat_count(dtest)
does not produce any printed output, but saves an object of class
stat_count into sp. We can get the statistics
with:
print(sp)
#> 
#>  Callback counts:
#>  ----------------
#>                  tests callback disc callback1 Neither Only 1 Only 2 Both
#> F.Man vs F.Woman   310      113   70        86      70    197     43   27
#> F.Man vs M.Man     310      106   61        86      65    204     41   20
#> F.Man vs M.Woman   310       96   74        86      32    214     64   10
#> F.Man vs S.Man     310      100   71        86      43    210     57   14
#> F.Man vs S.Woman   310       97   82        86      26    213     71   11
#> F.Man vs V.Man     310       97   70        86      38    213     59   11
#> F.Man vs V.Woman   310      111   74        86      62    199     49   25
#>                  Difference calldif
#> F.Man vs F.Woman         43      16
#> F.Man vs M.Man           45      21
#> F.Man vs M.Woman         22      54
#> F.Man vs S.Man           29      43
#> F.Man vs S.Woman         15      60
#> F.Man vs V.Man           27      48
#> F.Man vs V.Woman         37      24
The callback counts describe the results of the paired experiments. The first column defines the comparison in the form “candidate 1 vs candidate 2”. Here “F.Man vs F.Woman” means that we compare men of French origin (“F.Man”) with women of French origin (“F.Woman”). Out of 310 tests, 113 got at least one callback. The men of French origin got 86 callbacks and the women of French origin 70. The difference, called net discrimination, equals $86-70=16$ callbacks. The next columns give more detail. Out of 310 job offers, neither candidate was called back in $n_{00}=197$, only the man in $n_{10}=43$, only the woman in $n_{01}=27$, and both in $n_{11}=43$. Discrimination only occurs when a single candidate is called back. The net discrimination is thus $n_{10}-n_{01}=43-27=16$ (the last column of the table). The corresponding proportions are available with:
sp$props
#>                  p_callback   p_cand1    p_cand2     p_c00     p_c10      p_c01
#> F.Man vs F.Woman  0.3645161 0.2774194 0.22580645 0.6354839 0.1387097 0.08709677
#> F.Man vs M.Man    0.3419355 0.2774194 0.20967742 0.6580645 0.1322581 0.06451613
#> F.Man vs M.Woman  0.3096774 0.2774194 0.10322581 0.6903226 0.2064516 0.03225806
#> F.Man vs S.Man    0.3225806 0.2774194 0.13870968 0.6774194 0.1838710 0.04516129
#> F.Man vs S.Woman  0.3129032 0.2774194 0.08387097 0.6870968 0.2290323 0.03548387
#> F.Man vs V.Man    0.3129032 0.2774194 0.12258065 0.6870968 0.1903226 0.03548387
#> F.Man vs V.Woman  0.3580645 0.2774194 0.20000000 0.6419355 0.1580645 0.08064516
#>                       p_c11 p_cand_dif
#> F.Man vs F.Woman 0.13870968 0.05161290
#> F.Man vs M.Man   0.14516129 0.06774194
#> F.Man vs M.Woman 0.07096774 0.17419355
#> F.Man vs S.Man   0.09354839 0.13870968
#> F.Man vs S.Woman 0.04838710 0.19354839
#> F.Man vs V.Man   0.08709677 0.15483871
#> F.Man vs V.Woman 0.11935484 0.07741935
We can save the output or print it, as in the previous example. Printing is the default.
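The identities linking the counts and the proportions can be verified by hand. The short check below uses only the figures of the first comparison (F.Man vs F.Woman) quoted in the text; the variable names are chosen for illustration.

```r
# Counts for F.Man vs F.Woman, as given in the text.
n00 <- 197  # neither candidate called back
n10 <- 43   # only candidate 1 (F.Man) called back
n01 <- 27   # only candidate 2 (F.Woman) called back
n11 <- 43   # both called back

tests     <- n00 + n10 + n01 + n11  # number of tests
callback  <- n10 + n01 + n11        # tests with at least one callback
callback1 <- n10 + n11              # callbacks for candidate 1
callback2 <- n01 + n11              # callbacks for candidate 2
net_disc  <- n10 - n01              # net discrimination

# Proportions of each outcome, comparable with the first row of sp$props.
props <- c(p_c00 = n00, p_c10 = n10, p_c01 = n01, p_c11 = n11) / tests
round(props, 7)
```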
In fact, there are three ways to compute proportions
in discrimination studies. First, you can divide the number of callbacks
by the number of tests. We call these “matched callback rates”, given by the
function stat_mcr(). Second, you can restrict your analysis
to the tests which got at least one callback. We call these “total callback
shares”, given by the function stat_tcs(). Last, you can
divide by the number of tests where only one candidate has been called
back. We call these “exclusive callback shares”, given by the function
stat_ecs().
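The three definitions differ in their denominator. As a sketch, the denominators are computed below from the F.Man vs F.Woman counts reported earlier. Dividing the net discrimination by each of them is shown purely as an illustration: the exact statistics reported by stat_tcs() and stat_ecs() are not detailed here and may be defined differently.

```r
# Counts for F.Man vs F.Woman, from the callback counts table.
n00 <- 197; n10 <- 43; n01 <- 27; n11 <- 43

den_mcr <- n00 + n10 + n01 + n11  # all tests
den_tcs <- n10 + n01 + n11        # tests with at least one callback
den_ecs <- n10 + n01              # tests where only one candidate is called back

# Net discrimination n10 - n01 on each scale (illustrative assumption).
net <- n10 - n01
c(mcr = net / den_mcr, tcs = net / den_tcs, ecs = net / den_ecs)
```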
The callback rates of candidates 1 and 2, denoted respectively $p_1$ and $p_2$, are obtained by dividing the number of callbacks of each candidate by the total number of discrimination tests $n$:
$$p_1 = \frac{n_{10}+n_{11}}{n}, \qquad p_2 = \frac{n_{01}+n_{11}}{n}, \qquad \text{with } n = n_{00}+n_{10}+n_{01}+n_{11}.$$
The absence of discrimination is measured by:
$$p_1 = p_2 \Leftrightarrow n_{10} = n_{01}.$$
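The equivalence follows in one step from the definitions, since the term $n_{11}$ cancels in the difference. With the F.Man vs F.Woman counts reported earlier ($n_{10}=43$, $n_{01}=27$, $n=310$), this gives:

```latex
$$
p_1 - p_2
  = \frac{(n_{10}+n_{11}) - (n_{01}+n_{11})}{n}
  = \frac{n_{10}-n_{01}}{n}
  = \frac{43-27}{310}
  \approx 0.0516
$$
```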
The stat_mcr() function provides the proportions, the
confidence intervals and the equality tests. By default, the confidence level is
95%; it can be changed with the level option. The Student
definition is obtained with:
mcr <- stat_mcr(dtest)
print(mcr)
#> 
#>  Proportions: matched callback rates 
#>  Confidence intervals: Student at 95 %
#>  
#>                  tests inf_p_callback p_callback sup_p_callback inf_p_cand1
#> F.Man vs F.Woman   310      0.3106416  0.3645161      0.4183907   0.2273024
#> F.Man vs M.Man     310      0.2888373  0.3419355      0.3950337   0.2273024
#> F.Man vs M.Woman   310      0.2579222  0.3096774      0.3614326   0.2273024
#> F.Man vs S.Man     310      0.2702542  0.3225806      0.3749071   0.2273024
#> F.Man vs S.Woman   310      0.2610009  0.3129032      0.3648056   0.2273024
#> F.Man vs V.Man     310      0.2610009  0.3129032      0.3648056   0.2273024
#> F.Man vs V.Woman   310      0.3043985  0.3580645      0.4117306   0.2273024
#>                    p_cand1 sup_p_cand1 inf_p_cand2    p_cand2 sup_p_cand2
#> F.Man vs F.Woman 0.2774194   0.3275363  0.17900426 0.22580645   0.2726086
#> F.Man vs M.Man   0.2774194   0.3275363  0.16411033 0.20967742   0.2552445
#> F.Man vs M.Woman 0.2774194   0.3275363  0.06916861 0.10322581   0.1372830
#> F.Man vs S.Man   0.2774194   0.3275363  0.10001944 0.13870968   0.1773999
#> F.Man vs S.Woman 0.2774194   0.3275363  0.05284271 0.08387097   0.1148992
#> F.Man vs V.Man   0.2774194   0.3275363  0.08587036 0.12258065   0.1592909
#> F.Man vs V.Woman 0.2774194   0.3275363  0.15522525 0.20000000   0.2447748
#>                  inf_cand_dif p_cand_dif sup_cand_dif
#> F.Man vs F.Woman -0.001263807 0.05161290    0.1044896
#> F.Man vs M.Man    0.018669997 0.06774194    0.1168139
#> F.Man vs M.Woman  0.123097544 0.17419355    0.2252896
#> F.Man vs S.Man    0.087439176 0.13870968    0.1899802
#> F.Man vs S.Woman  0.140210120 0.19354839    0.2468867
#> F.Man vs V.Man    0.104550333 0.15483871    0.2051271
#> F.Man vs V.Woman  0.023420286 0.07741935    0.1314184
#> 
#>  Student test 
#>                  statistic       p_stat c_stat
#> F.Man vs F.Woman  1.920642 5.569699e-02    .  
#> F.Man vs M.Man    2.716294 6.973512e-03    ** 
#> F.Man vs M.Woman  6.708070 9.404184e-11    ***
#> F.Man vs S.Man    5.323431 1.961932e-07    ***
#> F.Man vs S.Woman  7.140081 6.704916e-12    ***
#> F.Man vs V.Man    6.058490 3.990671e-09    ***
#> F.Man vs V.Woman  2.821082 5.095962e-03    ** 
#> 
#>  Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.10 ' ' 1
and a corresponding plot with:
This represents the differences of proportions and their confidence
intervals. Another plot is available, with the confidence intervals of
the callback rates of the two candidates. However, these confidence
intervals with level $1-\alpha$ can be misleading, because the fact that they
overlap does not guarantee that the equality of the callback rates is accepted at the
$\alpha$ level. To get it anyway,
enter:
The difference analysis is not available with the Clopper-Pearson intervals.
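As a cross-check of the Student statistics shown earlier, the F.Man vs F.Woman row can be approximated in base R by a one-sample t-test on the within-offer differences. This assumes that the package's Student test is the usual paired t-test. The counts come from the callback counts table, and the vector d is rebuilt by hand for illustration.

```r
# Within-offer differences (candidate 1 minus candidate 2):
# +1 when only F.Man is called back (43 offers),
# -1 when only F.Woman is called back (27 offers),
#  0 when both or neither are called back (43 + 197 offers).
d <- c(rep(1, 43), rep(-1, 27), rep(0, 43 + 197))

tt <- t.test(d, conf.level = 0.95)
p_cand_dif <- unname(tt$estimate)      # mean difference, 16 / 310
t_stat     <- unname(tt$statistic)     # Student statistic
ci_dif     <- as.numeric(tt$conf.int)  # interval on the difference

round(c(p_cand_dif, t_stat, ci_dif), 4)
```

Under this assumption, the result should match the first row of the stat_mcr() output up to rounding.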