By using the formula parameter, it is already possible to protect linked tables with the functions described in the other vignettes. The result is the strictest form of protection, which we call global protection.
This vignette illustrates alternative methods for linked tables. A
common method for such protection is back-tracking where one
iterates until a consistent solution is found. In the functions
described below, such a method can be achieved by specifying
linkedGauss = "back-tracking". With the GaussSuppression
package, one can find such a consistent solution using an improved
approach, avoiding the need for iteration.
Below we start with some examples of protected tables with
alternative methods. Then we show in more detail different function
calls that achieve this. We also discuss the parameters
recordAware and collapseAware.
Finally, an example with interval protection is also shown.
We use a modified version of the example 1 dataset used elsewhere.
library(GaussSuppression)
dataset <- SSBtoolsData("example1")
dataset <- dataset[c(1, 2, 4, 6, 8, 10, 12, 13, 14, 15), ]
dataset$freq = c(6, 8, 9, 1, 2, 4, 3, 7, 2, 2)
print(dataset)
#>      age      geo    eu year freq
#> 1  young    Spain    EU 2014    6
#> 2  young  Iceland nonEU 2014    8
#> 4    old    Spain    EU 2014    9
#> 6    old Portugal    EU 2014    1
#> 8  young  Iceland nonEU 2015    2
#> 10   old    Spain    EU 2015    4
#> 12   old Portugal    EU 2015    3
#> 13 young    Spain    EU 2016    7
#> 14 young  Iceland nonEU 2016    2
#> 15 young Portugal    EU 2016    2
In the examples, we work with two linked tables:
- a three-way table where age, eu, and
year are crossed, and
- a two-way table where geo and year are
crossed.
In this example, small counts (1s and 2s) are protected. All zeros are treated as known structural zeros and are omitted from both the input and the output.
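As a rough illustration of this setup, the inner cells of both tables can be built with base R. This is only a sketch: it rebuilds the data by hand from the printed dataset and uses `aggregate()` instead of any GaussSuppression function. The names `inner_1`, `inner_2`, and `flag_small` are introduced here for illustration.

```r
# Hand-built copy of the dataset printed above (illustration only)
dataset <- data.frame(
  age  = c("young", "young", "old", "old", "young",
           "old", "old", "young", "young", "young"),
  geo  = c("Spain", "Iceland", "Spain", "Portugal", "Iceland",
           "Spain", "Portugal", "Spain", "Iceland", "Portugal"),
  eu   = c("EU", "nonEU", "EU", "EU", "nonEU",
           "EU", "EU", "EU", "nonEU", "EU"),
  year = c(2014, 2014, 2014, 2014, 2015, 2015, 2015, 2016, 2016, 2016),
  freq = c(6, 8, 9, 1, 2, 4, 3, 7, 2, 2)
)

# Inner cells of the two linked tables. Combinations without records
# (the structural zeros) do not appear in the result.
inner_1 <- aggregate(freq ~ age + eu + year, data = dataset, FUN = sum)
inner_2 <- aggregate(freq ~ geo + year, data = dataset, FUN = sum)

# Hypothetical helper: flag counts of 1 or 2 as primary suppressed
flag_small <- function(freq, max_n = 2) freq > 0 & freq <= max_n

inner_1$primary <- flag_small(inner_1$freq)
inner_2$primary <- flag_small(inner_2$freq)

print(inner_1[inner_1$primary, ])
print(inner_2[inner_2$primary, ])
```

Only inner cells are flagged in this sketch. In the published tables, small marginal cells such as young-Total-2015 are primary suppressed as well.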
As in the other vignettes, primary suppressed cells are underlined and labeled in red, while the secondary suppressed cells are labeled in purple.
We first illustrate local protection, where the tables are protected separately without any coordination between them.
Table 1: Linked suppressed tables by linkedGauss = "local"

| age | year | nonEU | EU | Total |
|---|---|---|---|---|
| young | 2014 | 8 | 6 | 14 |
| young | 2015 | 2 | | 2 |
| young | 2016 | 2 | 9 | 11 |
| young | Total | 12 | 15 | 27 |
| old | 2014 | | 10 | 10 |
| old | 2015 | | 7 | 7 |
| old | 2016 | | | |
| old | Total | | 17 | 17 |
| Total | 2014 | 8 | 16 | 24 |
| Total | 2015 | 2 | 7 | 9 |
| Total | 2016 | 2 | 9 | 11 |
| Total | Total | 12 | 32 | 44 |

| year | Iceland | Portugal | Spain | Total |
|---|---|---|---|---|
| 2014 | 8 | 1 | 15 | 24 |
| 2015 | 2 | 3 | 4 | 9 |
| 2016 | 2 | 2 | 7 | 11 |
| Total | 12 | 6 | 26 | 44 |

Clearly, this is not a satisfactory solution. The totals for 2015 and 2016 are suppressed in one table, but not in the other. Furthermore, there is also an inconsistency for Iceland-2014, which is the same as nonEU-2014.
We continue with consistent protection.
Table 2: Linked suppressed tables by linkedGauss = "consistent"

| age | year | nonEU | EU | Total |
|---|---|---|---|---|
| young | 2014 | 8 | 6 | 14 |
| young | 2015 | 2 | | 2 |
| young | 2016 | 2 | 9 | 11 |
| young | Total | 12 | 15 | 27 |
| old | 2014 | | 10 | 10 |
| old | 2015 | | 7 | 7 |
| old | 2016 | | | |
| old | Total | | 17 | 17 |
| Total | 2014 | 8 | 16 | 24 |
| Total | 2015 | 2 | 7 | 9 |
| Total | 2016 | 2 | 9 | 11 |
| Total | Total | 12 | 32 | 44 |

| year | Iceland | Portugal | Spain | Total |
|---|---|---|---|---|
| 2014 | 8 | 1 | 15 | 24 |
| 2015 | 2 | 3 | 4 | 9 |
| 2016 | 2 | 2 | 7 | 11 |
| Total | 12 | 6 | 26 | 44 |

The inconsistency problems are now avoided.
However, a remaining problem with this solution is that Spain-2015 can be derived from EU-2015 and Portugal-2015.
Finally, we illustrate an improved form of consistent protection, denoted as super-consistent, which also avoids this problem.
Table 3: Linked suppressed tables by linkedGauss = "super-consistent"

| age | year | nonEU | EU | Total |
|---|---|---|---|---|
| young | 2014 | 8 | 6 | 14 |
| young | 2015 | 2 | | 2 |
| young | 2016 | 2 | 9 | 11 |
| young | Total | 12 | 15 | 27 |
| old | 2014 | | 10 | 10 |
| old | 2015 | | 7 | 7 |
| old | 2016 | | | |
| old | Total | | 17 | 17 |
| Total | 2014 | 8 | 16 | 24 |
| Total | 2015 | 2 | 7 | 9 |
| Total | 2016 | 2 | 9 | 11 |
| Total | Total | 12 | 32 | 44 |

| year | Iceland | Portugal | Spain | Total |
|---|---|---|---|---|
| 2014 | 8 | 1 | 15 | 24 |
| 2015 | 2 | 3 | 4 | 9 |
| 2016 | 2 | 2 | 7 | 11 |
| Total | 12 | 6 | 26 | 44 |

The suppressed cells in each table correspond to related equations that cannot be solved. The super-consistent method makes use of the fact that common cells across tables must have the same value. Thus, the equations from the different tables can be combined when searching for solutions. The super-consistent method ensures that suppressed cells cannot be uniquely determined from the combined system of equations. However, the coordination is not as strict as in the global method, where the system of equations becomes even larger. In this particular case, the super-consistent solution turns out to be the same as the global one.
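The idea of combining equations across tables can be sketched with a small rank test in base R. The example revisits the Spain-2015 problem noted for the consistent solution. It assumes, as that remark implies, that Spain-2015 is suppressed while EU-2015 (7) and Portugal-2015 (3) are published; treating Iceland-2015 and Total-2015 as the other unknowns is an assumption made for illustration. The helper `is_determined` is hypothetical and is not part of GaussSuppression.

```r
# Unknown (suppressed) cells for 2015, assumed for illustration
cells <- c("Iceland-2015", "Spain-2015", "Total-2015")

# Equation within table 2:
#   Iceland + Portugal + Spain = Total, with Portugal = 3 published,
#   i.e. Iceland + Spain - Total = -3
within_table <- matrix(c(1, 1, -1), nrow = 1, dimnames = list(NULL, cells))

# Equation linking the tables:
#   EU-2015 = Portugal-2015 + Spain-2015, with EU = 7 and Portugal = 3 published,
#   i.e. Spain = 4
cross_table <- matrix(c(0, 1, 0), nrow = 1, dimnames = list(NULL, cells))

# A cell is uniquely determined when its unit vector lies in the row space
# of the coefficient matrix, so adding it as a row does not raise the rank.
is_determined <- function(A, cell) {
  e <- as.numeric(colnames(A) == cell)
  qr(rbind(A, e))$rank == qr(A)$rank
}

# Looking at table 2 alone
print(is_determined(within_table, "Spain-2015"))

# Combining the equations from both tables
print(is_determined(rbind(within_table, cross_table), "Spain-2015"))
```

Only the coefficient matrix matters for this test; the right-hand sides (-3 and 4) decide which value is revealed, not whether one is.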
To achieve both treating zeros as known structural zeros and omitting
them from the output, we use the parameter settings
extend0 = FALSE and removeEmpty = TRUE.
In SuppressLinkedTables(), the argument
withinArg specifies which parameters may differ between the
linked tables. In our examples, we choose this to be either
dimVar, hierarchies, or
formula.
The output from SuppressLinkedTables() is a list, with
one element for each of the linked tables.
SuppressLinkedTables() with dimVar
output <- SuppressLinkedTables(data = dataset,
              fun = SuppressSmallCounts, 
              withinArg = list(table_1 = list(dimVar = c("age", "eu", "year")), 
                               table_2 = list(dimVar = c("geo", "year"))),
              freqVar = "freq", 
              maxN = 2,
              extend0 = FALSE, 
              removeEmpty = TRUE,
              linkedGauss = "super-consistent")
#> [preAggregate 10*12->7*11]
#> [preAggregate 10*12->9*10]
#> 
#> ====== Linked GaussSuppression by "super-consistent" algorithm:
#> 
#> GaussSuppression_anySum: .....................................
print(output[["table_1"]])
#>      age    eu  year freq primary suppressed
#> 1  Total Total Total   44   FALSE      FALSE
#> 2  Total Total  2014   24   FALSE      FALSE
#> 3  Total Total  2015    9   FALSE       TRUE
#> 4  Total Total  2016   11   FALSE       TRUE
#> 5  Total    EU Total   32   FALSE      FALSE
#> 6  Total    EU  2014   16   FALSE      FALSE
#> 7  Total    EU  2015    7   FALSE      FALSE
#> 8  Total    EU  2016    9   FALSE      FALSE
#> 9  Total nonEU Total   12   FALSE      FALSE
#> 10 Total nonEU  2014    8   FALSE      FALSE
#> 11 Total nonEU  2015    2    TRUE       TRUE
#> 12 Total nonEU  2016    2    TRUE       TRUE
#> 13   old Total Total   17   FALSE      FALSE
#> 14   old Total  2014   10   FALSE      FALSE
#> 15   old Total  2015    7   FALSE      FALSE
#> 16   old    EU Total   17   FALSE      FALSE
#> 17   old    EU  2014   10   FALSE      FALSE
#> 18   old    EU  2015    7   FALSE      FALSE
#> 19 young Total Total   27   FALSE      FALSE
#> 20 young Total  2014   14   FALSE      FALSE
#> 21 young Total  2015    2    TRUE       TRUE
#> 22 young Total  2016   11   FALSE       TRUE
#> 23 young    EU Total   15   FALSE      FALSE
#> 24 young    EU  2014    6   FALSE      FALSE
#> 25 young    EU  2016    9   FALSE      FALSE
#> 26 young nonEU Total   12   FALSE      FALSE
#> 27 young nonEU  2014    8   FALSE      FALSE
#> 28 young nonEU  2015    2    TRUE       TRUE
#> 29 young nonEU  2016    2    TRUE       TRUE
print(output[["table_2"]])
#>         geo  year freq primary suppressed
#> 1     Total Total   44   FALSE      FALSE
#> 2     Total  2014   24   FALSE      FALSE
#> 3     Total  2015    9   FALSE       TRUE
#> 4     Total  2016   11   FALSE       TRUE
#> 5   Iceland Total   12   FALSE      FALSE
#> 6   Iceland  2014    8   FALSE      FALSE
#> 7   Iceland  2015    2    TRUE       TRUE
#> 8   Iceland  2016    2    TRUE       TRUE
#> 9  Portugal Total    6   FALSE      FALSE
#> 10 Portugal  2014    1    TRUE       TRUE
#> 11 Portugal  2015    3   FALSE      FALSE
#> 12 Portugal  2016    2    TRUE       TRUE
#> 13    Spain Total   26   FALSE      FALSE
#> 14    Spain  2014   15   FALSE       TRUE
#> 15    Spain  2015    4   FALSE      FALSE
#> 16    Spain  2016    7   FALSE       TRUE
SuppressLinkedTables() with hierarchies
First, we need hierarchies for the input. Here, these are generated
separately with SSBtools::FindDimLists().
h_age  <- SSBtools::FindDimLists(dataset["age"])[[1]]
h_geo  <- SSBtools::FindDimLists(dataset["geo"])[[1]]
h_eu   <- SSBtools::FindDimLists(dataset["eu"])[[1]]
h_year <- SSBtools::FindDimLists(dataset["year"])[[1]]
  
print(h_age)
#>   levels codes
#> 1      @ Total
#> 2     @@   old
#> 3     @@ young
print(h_geo)
#>   levels    codes
#> 1      @    Total
#> 2     @@  Iceland
#> 3     @@ Portugal
#> 4     @@    Spain
print(h_eu)
#>   levels codes
#> 1      @ Total
#> 2     @@    EU
#> 3     @@ nonEU
print(h_year)
#>   levels codes
#> 1      @ Total
#> 2     @@  2014
#> 3     @@  2015
#> 4     @@  2016
The output is identical to using dimVar, so we only show
the code. Note that the only difference is the withinArg
argument.
output <- SuppressLinkedTables(data = dataset,
              fun = SuppressSmallCounts, 
              withinArg = 
                list(table_1 = list(hierarchies = list(age = h_age, eu = h_eu, year = h_year)), 
                     table_2 = list(hierarchies = list(geo = h_geo, year = h_year))),
              freqVar = "freq", 
              maxN = 2,
              extend0 = FALSE, 
              removeEmpty = TRUE,
              linkedGauss = "super-consistent")
SuppressLinkedTables() with formula
When using formula, the output is similar to that
obtained with dimVar or hierarchies. The only
difference in the output is the ordering of rows, so we only show the
code.
Again, the only difference in the code is the withinArg
argument. However, note that we have omitted
removeEmpty = TRUE here, since this is the default when a
formula is used as input.
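A call matching this description might look as follows. It is a sketch assembled from the dimVar call above and the formulas `~age*eu*year` and `~geo*year` used elsewhere in this vignette. It assumes that each `withinArg` element accepts a `formula` entry in the same way as `dimVar` or `hierarchies`.

```r
library(GaussSuppression)

# Same modified example 1 data as above
dataset <- SSBtoolsData("example1")
dataset <- dataset[c(1, 2, 4, 6, 8, 10, 12, 13, 14, 15), ]
dataset$freq <- c(6, 8, 9, 1, 2, 4, 3, 7, 2, 2)

# Only withinArg differs from the earlier calls; removeEmpty is left
# at its default, which the text states is TRUE for formula input.
output <- SuppressLinkedTables(data = dataset,
              fun = SuppressSmallCounts,
              withinArg = list(table_1 = list(formula = ~age*eu*year),
                               table_2 = list(formula = ~geo*year)),
              freqVar = "freq",
              maxN = 2,
              extend0 = FALSE,
              linkedGauss = "super-consistent")
```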
SuppressSmallCounts() with formula and linkedGauss
Since only the formula parameter varies between the
linked tables, one option is to run SuppressSmallCounts()
directly with formula as a list and the
linkedGauss parameter specified. Here we show 10 output
rows.
output <- SuppressSmallCounts(data = dataset,
              formula = list(table_1 = ~age*eu*year, table_2 = ~geo*year),   
              freqVar = "freq",
              maxN = 2,
              extend0 = FALSE,
              linkedGauss = "super-consistent") 
#> 
#> ====== Linked GaussSuppression by "super-consistent" algorithm:
#> 
#> GaussSuppression_anySum: ....................................
print(output[c(1, 6:7, 12, 19, 23, 25:28), ])
#>      age  year      geo freq primary suppressed
#> 1  Total Total    Total   44   FALSE      FALSE
#> 6  Total  2014    Total   24   FALSE      FALSE
#> 7  Total  2015    Total    9   FALSE       TRUE
#> 12   old Total       EU   17   FALSE      FALSE
#> 19 young  2016    Total   11   FALSE       TRUE
#> 23 Total  2014    nonEU    8   FALSE      FALSE
#> 25 Total  2016    nonEU    2    TRUE       TRUE
#> 26 Total  2014  Iceland    8   FALSE      FALSE
#> 27 Total  2014 Portugal    1    TRUE       TRUE
#> 28 Total  2014    Spain   15   FALSE       TRUE
tables_by_formulas() with formula and linkedGauss
Similar output can be obtained by tables_by_formulas().
In this case, the region variable is specified manually, and table
membership variables are included in the output. Again, 10 output rows
are shown.
output <-  tables_by_formulas(data = dataset,
              table_fun = SuppressSmallCounts,                
              table_formulas = list(table_1 = ~age*eu*year, table_2 = ~geo*year),   
              freqVar = "freq",
              maxN = 2,
              extend0 = FALSE,
              linkedGauss = "super-consistent",
              substitute_vars = list(region = c("geo", "eu"))) 
#> 
#> ====== Linked GaussSuppression by "super-consistent" algorithm:
#> 
#> GaussSuppression_anySum: ....................................
              
print(output[c(1, 6:7, 12, 19, 23, 25:28), ])
#>      age  year   region freq primary suppressed table_1 table_2
#> 1  Total Total    Total   44   FALSE      FALSE    TRUE    TRUE
#> 6  Total  2014    Total   24   FALSE      FALSE    TRUE    TRUE
#> 7  Total  2015    Total    9   FALSE       TRUE    TRUE    TRUE
#> 12   old Total       EU   17   FALSE      FALSE    TRUE   FALSE
#> 19 young  2016    Total   11   FALSE       TRUE    TRUE   FALSE
#> 23 Total  2014    nonEU    8   FALSE      FALSE    TRUE   FALSE
#> 25 Total  2016    nonEU    2    TRUE       TRUE    TRUE   FALSE
#> 26 Total  2014  Iceland    8   FALSE      FALSE   FALSE    TRUE
#> 27 Total  2014 Portugal    1    TRUE       TRUE   FALSE    TRUE
#> 28 Total  2014    Spain   15   FALSE       TRUE   FALSE    TRUE
recordAware and collapseAware
An important issue is which cells are considered common cells. In the
functions, the parameter recordAware is set to
TRUE by default. In this case, common cells are determined
based on whether they aggregate the same underlying records. This is
similar to the use of cell keys, a well-known concept from the cell-key
method of statistical disclosure control.
When recordAware = FALSE, common cells are instead
identified by matching variable combinations. This does not always work
well. For example, here recordAware = TRUE is necessary to
capture that Iceland-2014 and nonEU-2014 are the
same.
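The difference between the two ways of identifying common cells can be sketched in base R. This is an illustration of the idea only, not the package's internal implementation. The data are a hand-built copy of the relevant columns of the dataset above, and all object names are introduced here.

```r
# Hand-built copy of the relevant columns of the dataset (illustration only)
dataset <- data.frame(
  geo  = c("Spain", "Iceland", "Spain", "Portugal", "Iceland",
           "Spain", "Portugal", "Spain", "Iceland", "Portugal"),
  eu   = c("EU", "nonEU", "EU", "EU", "nonEU",
           "EU", "EU", "EU", "nonEU", "EU"),
  year = c(2014, 2014, 2014, 2014, 2015, 2015, 2015, 2016, 2016, 2016)
)

# Record-aware view: a cell is identified by the records it aggregates
rows_iceland_2014 <- which(dataset$geo == "Iceland" & dataset$year == 2014)
rows_noneu_2014   <- which(dataset$eu == "nonEU" & dataset$year == 2014)
same_records <- identical(rows_iceland_2014, rows_noneu_2014)

# Label-based view: a cell is identified by its variable combination
label_table_2 <- c(geo = "Iceland", year = "2014")
label_table_1 <- c(eu = "nonEU", year = "2014")
same_labels <- identical(label_table_2, label_table_1)

print(c(same_records = same_records, same_labels = same_labels))
```

The two cells aggregate the same record, so a record-based comparison links them, whereas a comparison of variable combinations does not.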
A related parameter is collapseAware, but it is not
available when using SuppressLinkedTables(). When it is
used, even more cells are treated as common cells. In particular, the
suppression algorithm then automatically accounts for cells in one table
that are sums of cells in another table. In our example, this means that
the combination "consistent" and
collapseAware = TRUE gives the same result as
"super-consistent".
For more details on parameters and options, see the documentation for
SuppressLinkedTables().
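The statement about collapseAware can be checked with a comparison along the following lines. This is a sketch under the assumption that `collapseAware` is passed to `SuppressSmallCounts()` in the same way as `linkedGauss` when `formula` is a list; the object names `args_common`, `out_super`, and `out_collapse` are introduced here.

```r
library(GaussSuppression)

# Same modified example 1 data as above
dataset <- SSBtoolsData("example1")
dataset <- dataset[c(1, 2, 4, 6, 8, 10, 12, 13, 14, 15), ]
dataset$freq <- c(6, 8, 9, 1, 2, 4, 3, 7, 2, 2)

# Arguments shared by both calls
args_common <- list(data = dataset,
                    formula = list(table_1 = ~age*eu*year, table_2 = ~geo*year),
                    freqVar = "freq",
                    maxN = 2,
                    extend0 = FALSE)

# Super-consistent protection
out_super <- do.call(SuppressSmallCounts,
                     c(args_common, list(linkedGauss = "super-consistent")))

# Consistent protection with collapseAware (assumed to be accepted here)
out_collapse <- do.call(SuppressSmallCounts,
                        c(args_common, list(linkedGauss = "consistent",
                                            collapseAware = TRUE)))

# Compare the suppression patterns of the two results
print(identical(out_super$suppressed, out_collapse$suppressed))
```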
Intervals for the primary suppressed cells are computed whenever the
lpPackage parameter is specified. When
linkedGauss = "super-consistent", the intervals can also be
calculated with the super-consistent method.
There are several possibilities. See the documentation for the
parameter linkedIntervals in the help page for
SuppressLinkedTables().
If rangePercent and/or rangeMin are
provided, further suppression is performed to ensure that the interval
width requirements are met. See the help page for
GaussSuppressionFromData(), under the description of the
lpPackage parameter, for more details.
In the example below, the required interval width is at least 4
(rangeMin = 4). To achieve this, two additional cells are suppressed:
Portugal-2015 and Spain-2015. Without this additional suppression, some
intervals would have a width of only 3 (see variables lo_1 and
up_1 below).
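The width requirement can be illustrated with a few of the interval bounds reported in the output of this example. The sketch below uses base R only and reads the requirement simply as `up - lo >= rangeMin`, which is a simplifying assumption; the data frame `intervals` and its derived columns are introduced here for illustration.

```r
# Interval bounds for three primary suppressed cells of table_2, copied from
# the printed output: lo_1/up_1 before and lo/up after the extra suppression
intervals <- data.frame(
  cell = c("Iceland-2015", "Portugal-2014", "Portugal-2016"),
  lo_1 = c(0, 0, 0),
  up_1 = c(4, 3, 3),
  lo   = c(0, 0, 0),
  up   = c(4, 6, 6)
)

range_min <- 4  # corresponds to rangeMin = 4

# Widths before and after the additional suppression
intervals$width_before <- intervals$up_1 - intervals$lo_1
intervals$width_after  <- intervals$up - intervals$lo

# Simplified reading of the requirement: width must be at least range_min
intervals$ok_before <- intervals$width_before >= range_min
intervals$ok_after  <- intervals$width_after >= range_min

print(intervals)
```

The two Portugal cells fall short of the required width before the additional suppression and meet it afterwards.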
Table 4: Linked suppressed tables with intervals by linkedGauss = "super-consistent", rangeMin = 4

| age | year | nonEU | EU | Total |
|---|---|---|---|---|
| young | 2014 | 8 | 6 | 14 |
| young | 2015 | 2 [0, 4] | | 2 [0, 4] |
| young | 2016 | 2 [0, 4] | 9 | 11 |
| young | Total | 12 | 15 | 27 |
| old | 2014 | | 10 | 10 |
| old | 2015 | | 7 | 7 |
| old | 2016 | | | |
| old | Total | | 17 | 17 |
| Total | 2014 | 8 | 16 | 24 |
| Total | 2015 | 2 [0, 4] | 7 | 9 |
| Total | 2016 | 2 [0, 4] | 9 | 11 |
| Total | Total | 12 | 32 | 44 |

| year | Iceland | Portugal | Spain | Total |
|---|---|---|---|---|
| 2014 | 8 | 1 [0, 6] | 15 | 24 |
| 2015 | 2 [0, 4] | 3 | 4 | 9 |
| 2016 | 2 [0, 4] | 2 [0, 6] | 7 | 11 |
| Total | 12 | 6 | 26 | 44 |

This functionality can be used with all the function calls above.
The example below uses SuppressLinkedTables() with
dimVar.
output <- SuppressLinkedTables(data = dataset,
                               fun = SuppressSmallCounts, 
                               withinArg = list(table_1 = list(dimVar = c("age", "eu", "year")), 
                                                table_2 = list(dimVar = c("geo", "year"))),
                               freqVar = "freq", 
                               maxN = 2,
                               extend0 = FALSE, 
                               removeEmpty = TRUE,
                               linkedGauss = "super-consistent",
                               lpPackage = "highs", 
                               rangeMin = 4)
#> [preAggregate 10*12->7*11]
#> [preAggregate 10*12->9*10]
#> 
#> ====== Linked GaussSuppression by "super-consistent" algorithm:
#> 
#> GaussSuppression_anySum: .....................................
#> (16*18-0exact->9*5-DDcol2->9*3-GaussI->9*3)
#> 
#> Using highs for intervals...
#> ----
#> (16*18)
#> ..................
#> 10+1-6+2-5+3+
#>   2: 1 new, (4.000) 1-
#>   1: 2 new, (3.000) 1+
#> GaussSuppression_none: .............................
#> (16*16-0exact->11*5-DDcol2->11*3-GaussI->11*3)
#> 
#> Using highs for intervals...
#> ----
print(output[["table_1"]])
#>      age    eu  year freq rlim_freq lo_1 up_1 lo up suppressed_integer primary
#> 1  Total Total Total   44        NA   NA   NA NA NA                  0   FALSE
#> 2  Total Total  2014   24        NA   NA   NA NA NA                  0   FALSE
#> 3  Total Total  2015    9        NA   NA   NA NA NA                  2   FALSE
#> 4  Total Total  2016   11        NA   NA   NA NA NA                  2   FALSE
#> 5  Total    EU Total   32        NA   NA   NA NA NA                  0   FALSE
#> 6  Total    EU  2014   16        NA   NA   NA NA NA                  0   FALSE
#> 7  Total    EU  2015    7        NA   NA   NA NA NA                  0   FALSE
#> 8  Total    EU  2016    9        NA   NA   NA NA NA                  0   FALSE
#> 9  Total nonEU Total   12        NA   NA   NA NA NA                  0   FALSE
#> 10 Total nonEU  2014    8        NA   NA   NA NA NA                  0   FALSE
#> 11 Total nonEU  2015    2         4    0    4  0  4                  1    TRUE
#> 12 Total nonEU  2016    2         4    0    4  0  4                  1    TRUE
#> 13   old Total Total   17        NA   NA   NA NA NA                  0   FALSE
#> 14   old Total  2014   10        NA   NA   NA NA NA                  0   FALSE
#> 15   old Total  2015    7        NA   NA   NA NA NA                  0   FALSE
#> 16   old    EU Total   17        NA   NA   NA NA NA                  0   FALSE
#> 17   old    EU  2014   10        NA   NA   NA NA NA                  0   FALSE
#> 18   old    EU  2015    7        NA   NA   NA NA NA                  0   FALSE
#> 19 young Total Total   27        NA   NA   NA NA NA                  0   FALSE
#> 20 young Total  2014   14        NA   NA   NA NA NA                  0   FALSE
#> 21 young Total  2015    2         4    0    4  0  4                  1    TRUE
#> 22 young Total  2016   11        NA   NA   NA NA NA                  2   FALSE
#> 23 young    EU Total   15        NA   NA   NA NA NA                  0   FALSE
#> 24 young    EU  2014    6        NA   NA   NA NA NA                  0   FALSE
#> 25 young    EU  2016    9        NA   NA   NA NA NA                  0   FALSE
#> 26 young nonEU Total   12        NA   NA   NA NA NA                  0   FALSE
#> 27 young nonEU  2014    8        NA   NA   NA NA NA                  0   FALSE
#> 28 young nonEU  2015    2         4    0    4  0  4                  1    TRUE
#> 29 young nonEU  2016    2         4    0    4  0  4                  1    TRUE
#>    suppressed
#> 1       FALSE
#> 2       FALSE
#> 3        TRUE
#> 4        TRUE
#> 5       FALSE
#> 6       FALSE
#> 7       FALSE
#> 8       FALSE
#> 9       FALSE
#> 10      FALSE
#> 11       TRUE
#> 12       TRUE
#> 13      FALSE
#> 14      FALSE
#> 15      FALSE
#> 16      FALSE
#> 17      FALSE
#> 18      FALSE
#> 19      FALSE
#> 20      FALSE
#> 21       TRUE
#> 22       TRUE
#> 23      FALSE
#> 24      FALSE
#> 25      FALSE
#> 26      FALSE
#> 27      FALSE
#> 28       TRUE
#> 29       TRUE
print(output[["table_2"]])
#>         geo  year freq rlim_freq lo_1 up_1 lo up suppressed_integer primary
#> 1     Total Total   44        NA   NA   NA NA NA                  0   FALSE
#> 2     Total  2014   24        NA   NA   NA NA NA                  0   FALSE
#> 3     Total  2015    9        NA   NA   NA NA NA                  2   FALSE
#> 4     Total  2016   11        NA   NA   NA NA NA                  2   FALSE
#> 5   Iceland Total   12        NA   NA   NA NA NA                  0   FALSE
#> 6   Iceland  2014    8        NA   NA   NA NA NA                  0   FALSE
#> 7   Iceland  2015    2         4    0    4  0  4                  1    TRUE
#> 8   Iceland  2016    2         4    0    4  0  4                  1    TRUE
#> 9  Portugal Total    6        NA   NA   NA NA NA                  0   FALSE
#> 10 Portugal  2014    1         4    0    3  0  6                  1    TRUE
#> 11 Portugal  2015    3        NA   NA   NA NA NA                  3   FALSE
#> 12 Portugal  2016    2         4    0    3  0  6                  1    TRUE
#> 13    Spain Total   26        NA   NA   NA NA NA                  0   FALSE
#> 14    Spain  2014   15        NA   NA   NA NA NA                  2   FALSE
#> 15    Spain  2015    4        NA   NA   NA NA NA                  3   FALSE
#> 16    Spain  2016    7        NA   NA   NA NA NA                  2   FALSE
#>    suppressed
#> 1       FALSE
#> 2       FALSE
#> 3        TRUE
#> 4        TRUE
#> 5       FALSE
#> 6       FALSE
#> 7        TRUE
#> 8        TRUE
#> 9       FALSE
#> 10       TRUE
#> 11       TRUE
#> 12       TRUE
#> 13      FALSE
#> 14       TRUE
#> 15       TRUE
#> 16       TRUE