In this notebook, we will be exploring some of the concepts on population projection and stable population theory covered in the lecture.
The aim of the first section of this lab is to introduce you to the dataset that we’ll be using.
The dataset has values for \(_nL_x\) (person-years lived between ages \(x\) and \(x+n\)), \(_nF_x\) (age-specific fertility) and \(_nK_x\) (population between ages \(x\) and \(x+n\)) for six countries, which have different types of mortality, fertility and age structures:
Data are for females in 2015 and were obtained from the World Health Organisation (\(_nL_x\)) and United Nations Population Division (\(_nF_x\) and \(_nK_x\)).
d <- read_csv("bfdw_projection_data.csv")
##
## ── Column specification ────────────────────────────────────────────────────────
## cols(
## age = col_double(),
## country = col_character(),
## nLx = col_double(),
## nKx = col_double(),
## nFx = col_double()
## )
Throughout this lab, we’ll be making use of the %>%
operator. This is like a ‘pipe’, which takes whatever is on the left hand side and feeds it forward into a function. For example, below we are taking our data d
, filtering it to only include Australia, and then printing the head (the first six rows).
d %>%
filter(country=="Australia") %>%
head
## # A tibble: 6 x 5
## age country nLx nKx nFx
## <dbl> <chr> <dbl> <dbl> <dbl>
## 1 0 Australia 498495. 752. 0
## 2 5 Australia 498196. 740. 0
## 3 10 Australia 497994. 689. 0
## 4 15 Australia 497637. 717. 15.5
## 5 20 Australia 497040. 813. 52.9
## 6 25 Australia 496409. 885. 102.
Notice that:
To save trouble later, let’s change \(_nL_x\) to have a radix of 1, and \(_nF_x\) to have units births per woman.
d <- d %>%
mutate(nLx = nLx/10^5,
nFx = nFx/1000)
Let’s investigate the values of \(_nL_x\) by country. First, let’s calculate the life expectancy at birth, \(e_0\), for all countries. Remember that \[e_0 = \frac{\sum {_nL_x}}{l_0}\]
To calculate in R we can use group_by
to do the calculation separately for each country.
# calculate life expectancy at birth
# we know radix is 1
radix <- 1
d %>%
group_by(country) %>%
summarise(e0 = sum(nLx)/radix) %>%
arrange(e0) # arrange by increasing life expectancy
## # A tibble: 6 x 2
## country e0
## <chr> <dbl>
## 1 Niger 62.8
## 2 Kenya 65.8
## 3 Senegal 68.6
## 4 Russia 76.3
## 5 Australia 84.8
## 6 Singapore 86.1
There is more than 23 years difference in the life expectancy at birth for females in Niger compared to Singapore.
We can use the age-specific fertility rates to calculate some summary fertility measures for each country. Note that we will assume that the proportion of babies that are female (fraction of females at birth) is 0.4886. Using the information we have, we can calculate the total fertility rate (TFR), gross reproduction ratio (GRR) and net reproduction ratio (NRR). We can also calculate the mean age at childbearing (\(\mu\)). Remember that \[ TFR = \sum {_nF_x} \cdot n\] \[ GRR = TFR \cdot f_{fab}\] \[NRR = \frac{\sum {_nF_x}\cdot {_nL_x} \cdot f_{fab}}{l_0} \] \[ \mu = \frac{\sum (x+2.5)\cdot {_nF_x}\cdot {_nL_x}}{\sum \cdot {_nF_x}\cdot {_nL_x}} \]
# calculate TFR, GRR and NRR by country
# set fraction female at birth to be 0.4886
ffab <- 0.4486
d %>%
filter(age>=15, age < 50) %>% # restrict our dataset to be fertility rates for women aged 15-49
group_by(country) %>%
summarise(tfr = sum(nFx*5),
grr = tfr*ffab,
nrr = sum(nFx*nLx*ffab/radix),
mu = sum((age+2.5)*nFx*nLx)/sum(nFx*nLx)) %>%
arrange(tfr)
## # A tibble: 6 x 5
## country tfr grr nrr mu
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 Singapore 1.23 0.553 0.549 31.2
## 2 Russia 1.66 0.745 0.730 27.7
## 3 Australia 1.92 0.860 0.853 30.5
## 4 Kenya 4.44 1.99 1.78 28.6
## 5 Senegal 5.18 2.32 2.14 29.8
## 6 Niger 7.63 3.42 2.94 28.9
We can also plot the age-specific fertility rates for each country.
# create a dataset which only has reproductive ages (15-49)
d_rpa <- d %>% filter(age >= 15, age < 50)
# plot age-specific fertility rates
ggplot(data = d_rpa, aes(x = age, y = nFx, color = country)) +
geom_line()+
ylab("Age-specific fertility rate (births/woman)")+
ggtitle("Age-specific fertility rate by country")+
scale_color_brewer(palette="Set1") +
theme_minimal(base_size = 12) ## change ggplot default theme
We can also plot the proportion of the population in each age group. This is like half of a population pyramid (the female half). First we need to calculate the proportion in each age group from the raw numbers.
# let's calculate the proportion in each age group so we can compare across countries
d <- d %>%
group_by(country) %>%
mutate(nCx = nKx/sum(nKx))
ggplot(d, aes(x = age, y = nCx)) +
facet_wrap(~country, ncol=3)+
geom_bar(stat="identity", position = "dodge")+
ggtitle("Proportion of population in each age group")+
ylab("proportion")+
coord_flip()
In this section, we will do a population projection exercise using the fertility, mortality and population data from the dataset from the first section of this lab. We will then compare these results to the growth rate and population structure implied by stable population theory.
We will use the same data from the earlier exercises.
We need to create a Leslie matrix which contains the fertility and survivorship information in order to project a population forward. Elements in the top line of the Leslie matrix are equal to \[ _nL_0 \cdot \frac{1}{2}(_nF_x + {_nF_{x+n}}\cdot\frac{_nL_{x+n}}{_nL_x}) \cdot F_{fab} \] and the subdiagonals are equal to \[ \frac{_nL_{x+n}}{_nL_x} \] Below is a function that helps to create a Leslie matrix. It takes four inputs:
With these inputs, the function leslie
creates a square matrix with dimension equal to n_age_groups
and fills in the top row and subdiagonal according to the equations above.
## function to define Leslie matrix, based on nLx and nFx values
leslie <- function(nLx,
nFx,
n_age_groups=10,
ffab = 0.4886){
L = matrix(0, nrow = n_age_groups, ncol = n_age_groups)
L[1,] = ffab * nLx[1]*(nFx[1:n_age_groups]+nFx[2:(n_age_groups+1)]*nLx[2:(n_age_groups+1)]/nLx[1:n_age_groups])/2 # top row
diag(L[2:n_age_groups,1:(n_age_groups-1)]) = nLx[2:n_age_groups] / nLx[1:(n_age_groups-1)] # subdiagonal
return(L)
}
We can use this function to create a Leslie matrix using the \(_nL_x\) and \(_nF_x\) values for Australia:
LAA <- leslie(nLx = d$nLx[d$country=="Australia"],
nFx = d$nFx[d$country=="Australia"])
LAA
## [,1] [,2] [,3] [,4] [,5] [,6] [,7]
## [1,] 0.0000000 0.0000000 0.01889559 0.08323168 0.1885731 0.2770073 0.2398413
## [2,] 0.9993985 0.0000000 0.00000000 0.00000000 0.0000000 0.0000000 0.0000000
## [3,] 0.0000000 0.9995963 0.00000000 0.00000000 0.0000000 0.0000000 0.0000000
## [4,] 0.0000000 0.0000000 0.99928252 0.00000000 0.0000000 0.0000000 0.0000000
## [5,] 0.0000000 0.0000000 0.00000000 0.99880073 0.0000000 0.0000000 0.0000000
## [6,] 0.0000000 0.0000000 0.00000000 0.00000000 0.9987295 0.0000000 0.0000000
## [7,] 0.0000000 0.0000000 0.00000000 0.00000000 0.0000000 0.9983661 0.0000000
## [8,] 0.0000000 0.0000000 0.00000000 0.00000000 0.0000000 0.0000000 0.9975890
## [9,] 0.0000000 0.0000000 0.00000000 0.00000000 0.0000000 0.0000000 0.0000000
## [10,] 0.0000000 0.0000000 0.00000000 0.00000000 0.0000000 0.0000000 0.0000000
## [,8] [,9] [,10]
## [1,] 0.1053085 0.01931361 0.00106925
## [2,] 0.0000000 0.00000000 0.00000000
## [3,] 0.0000000 0.00000000 0.00000000
## [4,] 0.0000000 0.00000000 0.00000000
## [5,] 0.0000000 0.00000000 0.00000000
## [6,] 0.0000000 0.00000000 0.00000000
## [7,] 0.0000000 0.00000000 0.00000000
## [8,] 0.0000000 0.00000000 0.00000000
## [9,] 0.9963091 0.00000000 0.00000000
## [10,] 0.0000000 0.99443093 0.00000000
We could also use \(_nL_x\) and \(_nF_x\) values for any combination of the different countries, for example, \(_nL_x\) values from Australia and \(_nF_x\) from Senegal:
LAS <- leslie(nLx = d$nLx[d$country=="Australia"],
nFx = d$nFx[d$country=="Senegal"])
We can now use the Leslie matrices to project a population forward in time. Let’s start with the Australia-only Leslie matrix and project forward Australia’s 2015 population 250 years. Note that because we are using five-year age groups, the population projection happens in five-year steps.
n_age_groups <- 10 # 0-50 in 5yr age groups
n_projections <- 50 # want to project forward 50*5 = 250 years
initial_pop <- d$nKx[d$country=="Australia"] # set initial population to be Australia's population in 2015
# define population matrix K
K <- matrix(0, nrow = n_age_groups, ncol = n_projections+1)
K[,1] <- initial_pop[1:n_age_groups]
# do the projection!
for(i in 2:(n_projections+1)){
K[,i] <- LAA%*%K[,i-1]
}
head(K[,1:5])
## [,1] [,2] [,3] [,4] [,5]
## [1,] 751.868 781.6098 751.5275 708.4348 680.9857
## [2,] 740.330 751.4157 781.1396 751.0754 708.0087
## [3,] 688.526 740.0312 751.1124 780.8243 750.7723
## [4,] 717.395 688.0320 739.5002 750.5735 780.2641
## [5,] 812.546 716.5347 687.2069 738.6133 749.6734
## [6,] 884.737 811.5136 715.6243 686.3338 737.6749
Now we have a matrix of populations by age for Australia projected into the future. We can use these numbers to investigate:
First, let’s get the matrix K
in a form that’s a bit easier to plot. This involves making K
into a dataframe in ‘long’ format, so that every row refers to a different year and age combination.
# get K in a form that's easier to ggplot
# make into a dataframe, name columns as years and add an age column
Kdf <- as.data.frame(K)
colnames(Kdf) <- seq(from = 2015, to = (2015+n_projections*5), by = 5)
Kdf <- cbind(age = seq(from = 0, to = 45, by = 5), Kdf)
# get in long format and then add proportion of population in each age group
Klong <- Kdf %>%
gather(year, population, -age) %>%
group_by(year) %>%
mutate(proportion = population/sum(population),
age = as.factor(age))
head(Klong)
## # A tibble: 6 x 4
## # Groups: year [1]
## age year population proportion
## <fct> <chr> <dbl> <dbl>
## 1 0 2015 752. 0.0953
## 2 5 2015 740. 0.0939
## 3 10 2015 689. 0.0873
## 4 15 2015 717. 0.0910
## 5 20 2015 813. 0.103
## 6 25 2015 885. 0.112
Create a data frame that has the total population by year, and calculate the annual growth rate based on the equation
\[ K(t+5) = K(t) \cdot e^{5R} \] So \[ R = \frac{1}{5} \cdot \log \frac{K(t+5)}{K(t)} \]
# total population by year
tot_pop <- Klong %>%
group_by(year) %>%
summarise(pop = sum(population)) %>%
mutate(R = c(NA, log(pop[2:n()]/pop[1:(n()-1)])/5))
head(tot_pop)
## # A tibble: 6 x 3
## year pop R
## <chr> <dbl> <dbl>
## 1 2015 7887. NA
## 2 2020 7879. -0.000190
## 3 2025 7761. -0.00302
## 4 2030 7689. -0.00188
## 5 2035 7484. -0.00541
## 6 2040 7278. -0.00557
Now we can plot total population, growth rate, and proportion in each age group over time.
# plot total population over time
ggplot(data = tot_pop, aes(x = year, y = pop, group = 1)) +
geom_point() + geom_line()+
theme(axis.text.x = element_text(angle = 90, hjust = 1))+
ggtitle("Total population over time")
# plot growth rate over time
ggplot(data = tot_pop, aes(x = year, y = R, group = 1)) +
geom_point() + geom_line()+
theme(axis.text.x = element_text(angle = 90, hjust = 1))+
ggtitle("Growth rate of population over time")
## Warning: Removed 1 rows containing missing values (geom_point).
## Warning: Removed 1 row(s) containing missing values (geom_path).
# plot proportion in each age group over time
ggplot(data = Klong, aes(x = year, y = proportion, group = age, color = age)) +
geom_point() + geom_line()+
theme(axis.text.x = element_text(angle = 90, hjust = 1))+
ggtitle("Proportion of population in each age group over time")
We can do an eigen-decomposition of the Leslie Matrix to get the values of the stable growth rate \(r\) and stable population age structure. If we compare these values to the values of the growth rate \(R\) and population age structure in the final projection period, you can see that they are very similar.
eigen.AA <- eigen(LAA)
r.AA <- log(Re(eigen.AA$value[1]))/5
v <- Re(eigen.AA$vectors[,1])
k.AA <- v/sum(v)
# compare stable population rate and rate at end of projection period
cbind(stable = r.AA, proj = tot_pop$R[nrow(tot_pop)])
## stable proj
## [1,] -0.002413445 -0.002422304
# compare stable population age distribution and age distribution at end of projection period
cbind(stable = k.AA, proj = Klong$proportion[Klong$year==(2015+n_projections*5)])
## stable proj
## [1,] 0.09518527 0.09518237
## [2,] 0.09628290 0.09627466
## [3,] 0.09741247 0.09740990
## [4,] 0.09852435 0.09853611
## [5,] 0.09960088 0.09962347
## [6,] 0.10068198 0.10069951
## [7,] 0.10173779 0.10173462
## [8,] 0.10272465 0.10270184
## [9,] 0.10358801 0.10356753
## [10,] 0.10426171 0.10426999
ages <- seq(0, 45, by = 5)
ggplot(data = data.frame(age = ages, proportion = k.AA), aes(x = age, y = proportion))+
geom_bar(stat = 'identity')+
coord_flip()+
ggtitle("Stable age population structure")
1. Would you expect a population projection for Niger to converge to the stable properties more quickly or slowly than Australia? Why? Repeat the projection exercise for Niger to see what happens.
Answer: First, let’s look plot the proportion of the population in each age group for both countries.
# let's calculate the proportion in each age group so we can compare across countries
df_forplot <- d %>%
filter(country %in% c("Australia", "Niger")) %>%
group_by(country) %>%
mutate(nCx = nKx/sum(nKx))
ggplot(df_forplot, aes(x = age, y = nCx)) +
facet_wrap(~country, ncol=3)+
geom_bar(stat="identity", position = "dodge")+
ggtitle("Proportion of population in each age group")+
ylab("proportion")+
coord_flip()
According to this plot, which country has an age distribution that looks more like a stable age distribution?
In stable population, the proportions in each age group do not change over time. The age pyramid for Australia has clear indentations, suggesting that the proportions in each age group are changing over time (from variation in age-specific fertility and mortality rates). In contrast, Niger looks much more like a stable population — we don’t see notches or indents in the age pyramid, and the proportion in each age group is monotonically decreasing. This suggests proportions in each age group aren’t changing as much over time. Thus, we would expect the population projection for Niger to converge to stable properties more quickly than Australia.
## Leslie matrix for Giger
LNN <- leslie(nLx = d$nLx[d$country=="Niger"],
nFx = d$nFx[d$country=="Niger"])
n_age_groups <- 10 # 0-50 in 5yr age groups
n_projections <- 50 # want to project forward 50*5 = 250 years
initial_pop <- d$nKx[d$country=="Niger"] # set initial population to be Niger's population in 2015
# define population matrix K
K <- matrix(0, nrow = n_age_groups, ncol = n_projections+1)
K[,1] <- initial_pop[1:n_age_groups]
# do the projection!
for(i in 2:(n_projections+1)){
K[,i] <- LNN%*%K[,i-1]
}
head(K[,1:5])
## [,1] [,2] [,3] [,4] [,5]
## [1,] 2031.254 2498.1488 3081.8377 3817.922 4728.830
## [2,] 1603.349 1974.1868 2427.9644 2995.255 3710.659
## [3,] 1279.366 1588.8466 1956.3302 2406.003 2968.163
## [4,] 1021.085 1268.6693 1575.5623 1939.973 2385.887
## [5,] 797.591 1008.7022 1253.2840 1556.455 1916.447
## [6,] 634.829 785.8361 993.8359 1234.813 1533.516
# get K in a form that's easier to ggplot
# make into a dataframe, name columns as years and add an age column
Kdf <- as.data.frame(K)
colnames(Kdf) <- seq(from = 2015, to = (2015+n_projections*5), by = 5)
Kdf <- cbind(age = seq(from = 0, to = 45, by = 5), Kdf)
# get in long format and then add proportion of population in each age group
Klong <- Kdf %>%
gather(year, population, -age) %>%
group_by(year) %>%
mutate(proportion = population/sum(population),
age = as.factor(age))
head(Klong)
## # A tibble: 6 x 4
## # Groups: year [1]
## age year population proportion
## <fct> <chr> <dbl> <dbl>
## 1 0 2015 2031. 0.226
## 2 5 2015 1603. 0.178
## 3 10 2015 1279. 0.142
## 4 15 2015 1021. 0.113
## 5 20 2015 798. 0.0886
## 6 25 2015 635. 0.0705
# total population by year
tot_pop <- Klong %>%
group_by(year) %>%
summarise(pop = sum(population)) %>%
mutate(R = c(NA, log(pop[2:n()]/pop[1:(n()-1)])/5))
head(tot_pop)
## # A tibble: 6 x 3
## year pop R
## <chr> <dbl> <dbl>
## 1 2015 9001. NA
## 2 2020 11055. 0.0411
## 3 2025 13604. 0.0415
## 4 2030 16781. 0.0420
## 5 2035 20748. 0.0424
## 6 2040 25675. 0.0426
# plot total population over time
ggplot(data = tot_pop, aes(x = year, y = pop, group = 1)) +
geom_point() + geom_line()+
theme(axis.text.x = element_text(angle = 90, hjust = 1))+
ggtitle("Total population over time")
# plot growth rate over time
ggplot(data = tot_pop, aes(x = year, y = R, group = 1)) +
geom_point() + geom_line()+
theme(axis.text.x = element_text(angle = 90, hjust = 1))+
ggtitle("Growth rate of population over time")
## Warning: Removed 1 rows containing missing values (geom_point).
## Warning: Removed 1 row(s) containing missing values (geom_path).
# plot proportion in each age group over time
ggplot(data = Klong, aes(x = year, y = proportion, group = age, color = age)) +
geom_point() + geom_line()+
theme(axis.text.x = element_text(angle = 90, hjust = 1))+
ggtitle("Proportion of population in each age group over time")
These plots confirm that the population projection for Niger converges to the stable properties more quickly than Australia. For example, we see that proportion within each age group becomes constant (key feature of stable population) around year 2035 in Niger but not until year 2200 in Australia.
2. Investigate how different fertility and mortality situations affect the stable population characteristics. Obtain eigendecompositions of different combinations of high and low fertility/mortality countries and look at the implied growth rates and population age structures.
Here, we’ll investigate two different combinations:
## high fertility country (Niger) with a low mortality county (Singapore)
LSN <- leslie(nLx = d$nLx[d$country=="Singapore"],
nFx = d$nFx[d$country=="Niger"])
eigen.SN <- eigen(LSN)
r.SN <- log(Re(eigen.SN$value[1]))/5
v <- Re(eigen.SN$vectors[,1])
k.SN <- v/sum(v)
ages <- seq(0, 45, by = 5)
ggplot(data = data.frame(age = ages, proportion = k.SN), aes(x = age, y = proportion))+
geom_bar(stat = 'identity')+
coord_flip()+
ggtitle("Stable age population structure")
r.SN
## [1] 0.04766744
The implied growth rate in a country with low mortality and high fertility is very high. The population age distribution is heavily skewed towards the earlier age groups — this is an example of an “expansive” age pyramid.
## high fertility and low mortality
LNS <- leslie(nLx = d$nLx[d$country=="Niger"],
nFx = d$nFx[d$country=="Singapore"])
eigen.NS <- eigen(LNS)
r.NS <- log(Re(eigen.NS$value[1]))/5
v <- Re(eigen.NS$vectors[,1])
k.NS <- v/sum(v)
ages <- seq(0, 45, by = 5)
ggplot(data = data.frame(age = ages, proportion = k.NS), aes(x = age, y = proportion))+
geom_bar(stat = 'identity')+
coord_flip()+
ggtitle("Stable age population structure")
r.NS
## [1] -0.02113221
The implied growth rate in a country with high mortality and low fertility is negative (population size is decreasing over time). The population age distribution is heavily skewed towards the older age groups. This is an example of an “restrictive” age pyramid.