This vignette describes how to download the latest Maddison Project
data and create from it four data sets included in the package
MaddisonData: MaddisonCountries,
MaddisonData, MaddisonSources, and
MaddisonYears.
The Wikipedia article on the Maddison Project says, “Development economist Branko Milanović (writing for the World Bank), development economist Morten Jerven, and billionaire philanthropist Bill Gates have identified the Maddison Project, the Penn World Tables, and World Bank/IMF data (the World Development Indicators), as the three main sources of worldwide economic statistics such as GDP data, with the focus of the Maddison Project being on historical data. Economist Paul Krugman has suggested the Maddison Project as a data source for historical debt, growth, and labor output and productivity data.”
Bolt and van Zanden (2020) say that, “The Maddison database on Historical Statistics of the World Economy has probably the widest coverage of data on GDP per capita across countries and over time currently available … . To compare income levels and developments for this period and set of countries, national income estimates are converted … to a common currency using purchasing power parities (PPPs).”
The web site for Maddison project data offers both “Angus Maddison’s unaltered final dataset” and the “Latest Maddison Project Release”.
On 2025-08-28 I saw that the “Latest Maddison Project Release” was
dated 2024-09-18. I requested that and found that it was available in
either Excel or Stata format. I downloaded the Excel format to my
current working directory as mpd2023web.xlsx. (On
2025-08-28 I found a 4 MB file by this name dated 2025-06-03. I changed
that name to mpd023web0.xlsx and downloaded ostensibly the
same file and got one with 4.9 MB. I doubt if I would have deleted some
of the content of the file with date 2025-06-03, though that’s possible.
In any event, I plan to ignore this difference for the moment.)
In 2024, I also downloaded “Maddison
Database 2010” and compared it with mpd2020.xlsx. I
compared the two for numbers for the former USSR, the UK, and the US. I
found that the newer data had many numbers the older data didn’t while
the older data had a few numbers absent from the newer data. However, it
seemed that I would be wise to ignore the older data.
(MadXlsx <- path_package2('^mpd2023.*xlsx$'))
#> [1] "/private/var/folders/2n/zqk768wj3818l8x2wttbc5kw0000gn/T/RtmpldI43c/Rbuild51b644c22063/MaddisonData/inst/extdata/mpd2023_web.xlsx"
#> attr(,"searched")
#> [1] "/private/var/folders/2n/zqk768wj3818l8x2wttbc5kw0000gn/T/RtmpldI43c/Rbuild51b644c22063/MaddisonData/vignettes"
#> [2] "/private/var/folders/2n/zqk768wj3818l8x2wttbc5kw0000gn/T/RtmpldI43c/Rbuild51b644c22063/MaddisonData/vignettes/extdata"
#> [3] "/private/var/folders/2n/zqk768wj3818l8x2wttbc5kw0000gn/T/RtmpldI43c/Rbuild51b644c22063/MaddisonData/vignettes/inst/extdata"
#> [4] "/private/var/folders/2n/zqk768wj3818l8x2wttbc5kw0000gn/T/RtmpldI43c/Rbuild51b644c22063/MaddisonData/vignettes"
#> [5] "/private/var/folders/2n/zqk768wj3818l8x2wttbc5kw0000gn/T/RtmpldI43c/Rbuild51b644c22063/MaddisonData/vignettes/extdata"
#> [6] "/private/var/folders/2n/zqk768wj3818l8x2wttbc5kw0000gn/T/RtmpldI43c/Rbuild51b644c22063/MaddisonData/vignettes/inst/extdata"
#> [7] "/private/var/folders/2n/zqk768wj3818l8x2wttbc5kw0000gn/T/RtmpldI43c/Rbuild51b644c22063/MaddisonData"
#> [8] "/private/var/folders/2n/zqk768wj3818l8x2wttbc5kw0000gn/T/RtmpldI43c/Rbuild51b644c22063/MaddisonData/extdata"
#> [9] "/private/var/folders/2n/zqk768wj3818l8x2wttbc5kw0000gn/T/RtmpldI43c/Rbuild51b644c22063/MaddisonData/inst/extdata"
(foundData <- (length(MadXlsx)>0))
#> [1] TRUEIf more than one file was found, pick the most recent one.
MadInfo <- file.info(MadXlsx)
cat('Multiple files found.\n')
#> Multiple files found.
print(MadInfo)
#> size
#> /private/var/folders/2n/zqk768wj3818l8x2wttbc5kw0000gn/T/RtmpldI43c/Rbuild51b644c22063/MaddisonData/inst/extdata/mpd2023_web.xlsx 4903804
#> isdir
#> /private/var/folders/2n/zqk768wj3818l8x2wttbc5kw0000gn/T/RtmpldI43c/Rbuild51b644c22063/MaddisonData/inst/extdata/mpd2023_web.xlsx FALSE
#> mode
#> /private/var/folders/2n/zqk768wj3818l8x2wttbc5kw0000gn/T/RtmpldI43c/Rbuild51b644c22063/MaddisonData/inst/extdata/mpd2023_web.xlsx 644
#> mtime
#> /private/var/folders/2n/zqk768wj3818l8x2wttbc5kw0000gn/T/RtmpldI43c/Rbuild51b644c22063/MaddisonData/inst/extdata/mpd2023_web.xlsx 2025-08-28 21:49:07
#> ctime
#> /private/var/folders/2n/zqk768wj3818l8x2wttbc5kw0000gn/T/RtmpldI43c/Rbuild51b644c22063/MaddisonData/inst/extdata/mpd2023_web.xlsx 2025-11-20 07:47:08
#> atime
#> /private/var/folders/2n/zqk768wj3818l8x2wttbc5kw0000gn/T/RtmpldI43c/Rbuild51b644c22063/MaddisonData/inst/extdata/mpd2023_web.xlsx 2025-09-07 13:02:24
#> uid
#> /private/var/folders/2n/zqk768wj3818l8x2wttbc5kw0000gn/T/RtmpldI43c/Rbuild51b644c22063/MaddisonData/inst/extdata/mpd2023_web.xlsx 501
#> gid
#> /private/var/folders/2n/zqk768wj3818l8x2wttbc5kw0000gn/T/RtmpldI43c/Rbuild51b644c22063/MaddisonData/inst/extdata/mpd2023_web.xlsx 20
#> uname
#> /private/var/folders/2n/zqk768wj3818l8x2wttbc5kw0000gn/T/RtmpldI43c/Rbuild51b644c22063/MaddisonData/inst/extdata/mpd2023_web.xlsx sg
#> grname
#> /private/var/folders/2n/zqk768wj3818l8x2wttbc5kw0000gn/T/RtmpldI43c/Rbuild51b644c22063/MaddisonData/inst/extdata/mpd2023_web.xlsx staff
imax <- which.max(MadInfo$mtime)
Madxlsx <- MadXlsx[imax]Now read Madxlsx.
MaddisonData0 <- readxl::read_xlsx(Madxlsx, sheet='Full data')
head(MaddisonData0, 2)
#> # A tibble: 2 × 6
#> countrycode country region year gdppc pop
#> <chr> <chr> <chr> <dbl> <dbl> <dbl>
#> 1 AFG Afghanistan South and South East Asia 1 NA NA
#> 2 AFG Afghanistan South and South East Asia 730 NA NA
tail(MaddisonData0)
#> # A tibble: 6 × 6
#> countrycode country region year gdppc pop
#> <chr> <chr> <chr> <dbl> <dbl> <dbl>
#> 1 ZWE Zimbabwe Sub Saharan Africa 2017 1844. 13870.
#> 2 ZWE Zimbabwe Sub Saharan Africa 2018 1900. 14097.
#> 3 ZWE Zimbabwe Sub Saharan Africa 2019 1753. 14345.
#> 4 ZWE Zimbabwe Sub Saharan Africa 2020 1586. 14615.
#> 5 ZWE Zimbabwe Sub Saharan Africa 2021 1687. 14900.
#> 6 ZWE Zimbabwe Sub Saharan Africa 2022 1704. 15205.
MaddisonSources0 <- readxl::read_xlsx(Madxlsx,
sheet='Sources')
#> New names:
#> • `` -> `...2`
#> • `` -> `...3`
names(MaddisonSources0) <- c('ISO', 'years', 'source')
head(MaddisonSources0)
#> # A tibble: 6 × 3
#> ISO years source
#> <chr> <chr> <chr>
#> 1 GDP pc: 2008 - 2022: Total Economy Database the Conference Board… <NA> <NA>
#> 2 <NA> <NA> <NA>
#> 3 ISO code Coun… Source
#> 4 ARG Arge… <NA>
#> 5 <NA> 1800… Prado…
#> 6 <NA> 1870… Berto…
tail(MaddisonSources0)
#> # A tibble: 6 × 3
#> ISO years source
#> <chr> <chr> <chr>
#> 1 <NA> <NA> <NA>
#> 2 ZWE Zimb… <NA>
#> 3 <NA> 1914… Broad…
#> 4 <NA> <NA> <NA>
#> 5 Population sources <NA> <NA>
#> 6 1990 onward Total Economy Database the Conference Board for all … <NA> <NA>How many characters in countrycode?
ctryCds <- unique(MaddisonData0$countrycode)
nCds <- length(ctryCds)
table(nchCode <- nchar(ctryCds))
#>
#> 3
#> 169Good: All 3-letter codes.
Let’s concatenate country after countrycode
then check to see if there are any ctryCds with multiple
country names.
cdCtry <- sort(with(MaddisonData0, unique(
paste0(countrycode, country))))
if(length(cdCtry)!=nCds){
stop('some countrycode(s) have more than one',
' country')
}
cd_Ctry <- data.frame(ISO=substring(cdCtry, 1, 3),
country=substring(cdCtry, 4))NOTES:
The variable countrycode in
MaddisonData is called ISO in
MaddisonSources. We will standardize on ISO
for this.
There is a countrycode
function in a countrycode
package that sounds like it might help translating between
country and countrycode. However, a little
experimentation exposed “ambiguous” results. Therefore, we will not
pursue that further here.
Let’s create MaddisonCountries as a
data.frame with columns: countrycode,
country, and region:
rownames(cd_Ctry) <- cd_Ctry$countrycode
ctryRgn <- sort(with(MaddisonData0, unique(
paste0(countrycode, region))))
MaddisonCountries <- cbind(cd_Ctry,
region = substring(ctryRgn, 4))
rownames(MaddisonCountries) <- MaddisonCountries$ISO
save(MaddisonCountries, file='MaddisonCountries.rdb',
compress=TRUE)
# CANNOT include MaddisonCountries.rda in .Rbuildignore,
# so change the name.
# per R Packages, section 9.7
# As of 2025-09-24 the R Packages book recommends writing
# using usethis::use_data
if(FALSE){
tryUse <- try(usethis::use_data(MaddisonData))
}
getwd()
#> [1] "/private/var/folders/2n/zqk768wj3818l8x2wttbc5kw0000gn/T/RtmpldI43c/Rbuild51b644c22063/MaddisonData/vignettes"This saves MaddisonCountries in
MaddisonCountries.rdb in the working directory. To include
that in the package, copy or move that file into the data
subdirectory of the package and change the suffix to
rda.
We can use subset
got translate between country and countrycode
or select all countries in selected regions.
subset(MaddisonCountries, ISO=='GBR', country)
#> country
#> GBR United Kingdom
subset(MaddisonCountries, grepl('Yugo', country), 1:3)
#> ISO country region
#> YUG YUG Former Yugoslavia Eastern Europe
table(MaddisonCountries$region)
#>
#> East Asia Eastern Europe
#> 6 32
#> Latin America Middle East and North Africa
#> 26 20
#> South and South East Asia Sub Saharan Africa
#> 16 46
#> Western Europe Western Offshoots
#> 19 4
# What are "Western Offshoots"?
subset(MaddisonCountries, grepl('Of', region),
c(country, ISO))
#> country ISO
#> AUS Australia AUS
#> CAN Canada CAN
#> NZL New Zealand NZL
#> USA United States USANow create MaddisonData with countrycode
but without country and region.
First identify all rows of MaddisonData with both
gdppc and pop NA.
NAs <- with(MaddisonData0, is.na(gdppc) & is.na(pop))
str(MaddisonData <- MaddisonData0[!NAs, c(1, 4:6)])
#> tibble [23,280 × 4] (S3: tbl_df/tbl/data.frame)
#> $ countrycode: chr [1:23280] "AFG" "AFG" "AFG" "AFG" ...
#> $ year : num [1:23280] 1820 1870 1913 1950 1951 ...
#> $ gdppc : num [1:23280] NA NA NA 1156 1170 ...
#> $ pop : num [1:23280] 3280 4207 5730 8150 8284 ...
names(MaddisonData)[1] <- 'ISO'
save(MaddisonData, file='MaddisonData.rdb',
compress=TRUE)
# CANNOT include MaddisonData.rda in .Rbuildignore,
# so change the name.
# per R Packages, section 9.7
# As of 2025-09-24 the R Packages book recommends writing
# using usethis::use_data
if(FALSE){
tryUse <- try(usethis::use_data(MaddisonCountries))
}
getwd()
#> [1] "/private/var/folders/2n/zqk768wj3818l8x2wttbc5kw0000gn/T/RtmpldI43c/Rbuild51b644c22063/MaddisonData/vignettes"MaddisonSourcesThe first and last rows in MaddisonSources0 give general
info for GDP and population for all countries since 2008 and 1990,
respectively.
The format is a row giving ISO and country followed by one or more rows with sources for different date ranges followed by a blank row.
Get the first row for each country:
str(sourceNA <- which(is.na(MaddisonSources0[,1])
& is.na(MaddisonSources0[,2]) &
is.na(MaddisonSources0[,3])))
#> int [1:69] 2 7 11 14 17 20 24 28 31 37 ...
str(row1 <- c(4, sourceNA[-1]+1))
#> num [1:69] 4 8 12 15 18 21 25 29 32 38 ...
MaddisonSources0[row1[1:3], ]
#> # A tibble: 3 × 3
#> ISO years source
#> <chr> <chr> <chr>
#> 1 ARG Argentina <NA>
#> 2 BEL Belgium <NA>
#> 3 BGR Bulgaria <NA>
MaddisonSources0[tail(row1, 3), ]
#> # A tibble: 3 × 3
#> ISO years source
#> <chr> <chr> <chr>
#> 1 ZMB Zambia <NA>
#> 2 ZWE Zimbabwe <NA>
#> 3 Population sources <NA> <NA>Sources for each ISO / country?
ISOsourceNms <- MaddisonSources0[head(row1, -1), 1, drop=TRUE]
nISOsources <- length(ISOsourceNms)
MaddisonSources <- vector('list', nISOsources)
names(MaddisonSources) <- ISOsourceNms
# Get year ranges for each country
MaddisonYears <- data.frame(
ISO =character(0),
yearBegin=integer(0),
yearEnd =integer(0),
sourceNum=integer(0)
)
for(i in 1:nISOsources){
rowi <- (row1[i]+1):(row1[i+1]-2)
MadSrci <- MaddisonSources0[rowi, 2:3]
MaddisonSources[[i]] <- MadSrci
MadYrsi0 <- MadDateRanges(MadSrci[, 1, drop=TRUE])
MadYrsi <- cbind(ISO=ISOsourceNms[i], MadYrsi0)
MaddisonYears <- rbind(MaddisonYears,
MadYrsi)
}
MaddisonSources[['EGY']]
#> # A tibble: 3 × 2
#> years source
#> <chr> <chr>
#> 1 1 Scheidel, W. and Friesen, S. J., ‘The size of the econ…
#> 2 700 – 1500 Pamuk, Ş. and M. Shatzmiller (2011). “Real Wages and G…
#> 3 1820, 1870, 1913, 1950 Pamuk, S. (2006), Estimating Economic Growth in the Mi…
MaddisonYears[MaddisonYears$ISO=='EGY', ]
#> ISO yearBegin yearEnd sourceNum
#> 24 EGY 1 1 1
#> 25 EGY 700 1500 2
#> 26 EGY 1820 1820 3
#> 27 EGY 1870 1870 3
#> 28 EGY 1913 1913 3
#> 29 EGY 1950 1950 3
attr(MaddisonSources, 'since2008') <- paste(
"gdppc since 2008: Total Economy Database (TED) from the",
"Conference Board for all countries included in TED and UN",
"national accounts statistics for all others.")
head(MaddisonSources, 3)
#> $ARG
#> # A tibble: 2 × 2
#> years source
#> <chr> <chr>
#> 1 1800 - 1870 Prados de la Escosura, L. (2009). “Lost Decades? Economic Perform…
#> 2 1870 - 1900 Bertola, L and Ocampo, J.A. (2012) The Economic Development of La…
#>
#> $BEL
#> # A tibble: 2 × 2
#> years source
#> <chr> <chr>
#> 1 1 Scheidel, W. and Friesen, S. J., ‘The size of the economy and the …
#> 2 1500- 1846 Buyst, E. (2011), “Towards Estimates of Long Term Growth in the So…
#>
#> $BGR
#> # A tibble: 1 × 2
#> years source
#> <chr> <chr>
#> 1 1892-1945 Ivanov, M. (2006). “Bulgarian National Income between 1892 and 1924…
head(MaddisonYears)
#> ISO yearBegin yearEnd sourceNum
#> 1 ARG 1800 1870 1
#> 2 ARG 1870 1900 2
#> 3 BEL 1 1 1
#> 4 BEL 1500 1846 2
#> 5 BGR 1892 1945 1
#> 6 BIH 1952 2008 1
tail(MaddisonSources, 3)
#> $ZAF
#> # A tibble: 1 × 2
#> years source
#> <chr> <chr>
#> 1 1700–1900 (Cape Colony) Fourie, J. and Van Zanden, J.L. (2013). GDP in the Du…
#>
#> $ZMB
#> # A tibble: 1 × 2
#> years source
#> <chr> <chr>
#> 1 1906-1950 Broadberry, S., & Gardner, L. (2022). Economic growth in Sub-Sahara…
#>
#> $ZWE
#> # A tibble: 1 × 2
#> years source
#> <chr> <chr>
#> 1 1914-1950 Broadberry, S., & Gardner, L. (2022). Economic growth in Sub-Sahara…
tail(MaddisonYears)
#> ISO yearBegin yearEnd sourceNum
#> 128 VEN 1830 2012 1
#> 129 YUG 1952 2008 1
#> 130 YUG 2008 2008 2
#> 131 ZAF 1700 1900 1
#> 132 ZMB 1906 1950 1
#> 133 ZWE 1914 1950 1
save(MaddisonSources, file='MaddisonSources.rdb',
compress=TRUE)
save(MaddisonYears, file='MaddisonYears.rdb',
compress=TRUE) Bolt and Van Zanden (2024), “Maddison style estimates of the evolution of the world economy: A new 2023 update”, Journal of Economic Surveys, 1–41.