In this notebook, we will be exploring some of the concepts on population projection and stable population theory covered in the lecture.

Section I: Introduction to the data

The aim of the first section of this lab is to introduce you to the dataset that we’ll be using.

The dataset has values for \(_nL_x\) (person-years lived between ages \(x\) and \(x+n\)), \(_nF_x\) (age-specific fertility) and \(_nK_x\) (population between ages \(x\) and \(x+n\)) for six countries, which have different types of mortality, fertility and age structures:

Data are for females in 2015 and were obtained from the World Health Organisation (\(_nL_x\)) and United Nations Population Division (\(_nF_x\) and \(_nK_x\)).

d <- read_csv("bfdw_projection_data.csv")
## 
## ── Column specification ────────────────────────────────────────────────────────
## cols(
##   age = col_double(),
##   country = col_character(),
##   nLx = col_double(),
##   nKx = col_double(),
##   nFx = col_double()
## )

Throughout this lab, we’ll be making use of the %>% operator. This is like a ‘pipe’, which takes whatever is on the left hand side and feeds it forward into a function. For example, below we are taking our data d, filtering it to only include Australia, and then printing the head (the first six rows).

d %>% 
  filter(country=="Australia") %>% 
  head
## # A tibble: 6 x 5
##     age country       nLx   nKx   nFx
##   <dbl> <chr>       <dbl> <dbl> <dbl>
## 1     0 Australia 498495.  752.   0  
## 2     5 Australia 498196.  740.   0  
## 3    10 Australia 497994.  689.   0  
## 4    15 Australia 497637.  717.  15.5
## 5    20 Australia 497040.  813.  52.9
## 6    25 Australia 496409.  885. 102.

Notice that:

To save trouble later, let’s change \(_nL_x\) to have a radix of 1, and \(_nF_x\) to have units births per woman.

d <- d %>% 
  mutate(nLx = nLx/10^5, 
         nFx = nFx/1000)

Person-years lived: \(_nL_x\)

Let’s investigate the values of \(_nL_x\) by country. First, let’s calculate the life expectancy at birth, \(e_0\), for all countries. Remember that \[e_0 = \frac{\sum {_nL_x}}{l_0}\]

To calculate in R we can use group_by to do the calculation separately for each country.

# calculate life expectancy at birth 
# we know radix is 1
radix <- 1

d %>% 
  group_by(country) %>% 
  summarise(e0 = sum(nLx)/radix) %>%
  arrange(e0) # arrange by increasing life expectancy
## # A tibble: 6 x 2
##   country      e0
##   <chr>     <dbl>
## 1 Niger      62.8
## 2 Kenya      65.8
## 3 Senegal    68.6
## 4 Russia     76.3
## 5 Australia  84.8
## 6 Singapore  86.1

There is more than 23 years difference in the life expectancy at birth for females in Niger compared to Singapore.

Fertility rates: \(_nF_x\)

We can use the age-specific fertility rates to calculate some summary fertility measures for each country. Note that we will assume that the proportion of babies that are female (fraction of females at birth) is 0.4886. Using the information we have, we can calculate the total fertility rate (TFR), gross reproduction ratio (GRR) and net reproduction ratio (NRR). We can also calculate the mean age at childbearing (\(\mu\)). Remember that \[ TFR = \sum {_nF_x} \cdot n\] \[ GRR = TFR \cdot f_{fab}\] \[NRR = \frac{\sum {_nF_x}\cdot {_nL_x} \cdot f_{fab}}{l_0} \] \[ \mu = \frac{\sum (x+2.5)\cdot {_nF_x}\cdot {_nL_x}}{\sum \cdot {_nF_x}\cdot {_nL_x}} \]

# calculate TFR, GRR and NRR by country
# set fraction female at birth to be 0.4886
ffab <- 0.4486

d %>% 
  filter(age>=15, age < 50) %>% # restrict our dataset to be fertility rates for women aged 15-49
  group_by(country) %>% 
  summarise(tfr = sum(nFx*5), 
            grr = tfr*ffab,
            nrr = sum(nFx*nLx*ffab/radix),
            mu = sum((age+2.5)*nFx*nLx)/sum(nFx*nLx)) %>%
  arrange(tfr)
## # A tibble: 6 x 5
##   country     tfr   grr   nrr    mu
##   <chr>     <dbl> <dbl> <dbl> <dbl>
## 1 Singapore  1.23 0.553 0.549  31.2
## 2 Russia     1.66 0.745 0.730  27.7
## 3 Australia  1.92 0.860 0.853  30.5
## 4 Kenya      4.44 1.99  1.78   28.6
## 5 Senegal    5.18 2.32  2.14   29.8
## 6 Niger      7.63 3.42  2.94   28.9

We can also plot the age-specific fertility rates for each country.

# create a dataset which only has reproductive ages (15-49)
d_rpa <- d %>% filter(age >= 15, age < 50)

# plot age-specific fertility rates
ggplot(data = d_rpa, aes(x = age, y = nFx, color = country)) + 
  geom_line()+
  ylab("Age-specific fertility rate (births/woman)")+
  ggtitle("Age-specific fertility rate by country")+ 
  scale_color_brewer(palette="Set1") + 
  theme_minimal(base_size = 12) ## change ggplot default theme

We can also plot the proportion of the population in each age group. This is like half of a population pyramid (the female half). First we need to calculate the proportion in each age group from the raw numbers.

# let's calculate the proportion in each age group so we can compare across countries
d <- d %>% 
  group_by(country) %>%
  mutate(nCx = nKx/sum(nKx))
ggplot(d, aes(x = age, y = nCx)) + 
  facet_wrap(~country, ncol=3)+
  geom_bar(stat="identity", position = "dodge")+
  ggtitle("Proportion of population in each age group")+
  ylab("proportion")+
  coord_flip()

Section II: Population Projections

In this section, we will do a population projection exercise using the fertility, mortality and population data from the dataset from the first section of this lab. We will then compare these results to the growth rate and population structure implied by stable population theory.

We will use the same data from the earlier exercises.

Leslie matrices

We need to create a Leslie matrix which contains the fertility and survivorship information in order to project a population forward. Elements in the top line of the Leslie matrix are equal to \[ _nL_0 \cdot \frac{1}{2}(_nF_x + {_nF_{x+n}}\cdot\frac{_nL_{x+n}}{_nL_x}) \cdot F_{fab} \] and the subdiagonals are equal to \[ \frac{_nL_{x+n}}{_nL_x} \] Below is a function that helps to create a Leslie matrix. It takes four inputs:

  • a vector of \(_nL_x\) values
  • a vector of \(_nF_x\) values
  • the number of age groups. By default this is 10, because we are considering the five year age groups from ages 0–50.
  • The fraction female at birth, which is set to a default value of 0.4886.

With these inputs, the function leslie creates a square matrix with dimension equal to n_age_groups and fills in the top row and subdiagonal according to the equations above.

## function to define Leslie matrix, based on nLx and nFx values
leslie <- function(nLx,
                   nFx, 
                   n_age_groups=10,
                   ffab = 0.4886){
  L = matrix(0, nrow = n_age_groups, ncol = n_age_groups)
  L[1,] = ffab * nLx[1]*(nFx[1:n_age_groups]+nFx[2:(n_age_groups+1)]*nLx[2:(n_age_groups+1)]/nLx[1:n_age_groups])/2 # top row 
  diag(L[2:n_age_groups,1:(n_age_groups-1)]) = nLx[2:n_age_groups] / nLx[1:(n_age_groups-1)] # subdiagonal
  return(L)
}

We can use this function to create a Leslie matrix using the \(_nL_x\) and \(_nF_x\) values for Australia:

LAA <- leslie(nLx = d$nLx[d$country=="Australia"], 
              nFx = d$nFx[d$country=="Australia"])
LAA
##            [,1]      [,2]       [,3]       [,4]      [,5]      [,6]      [,7]
##  [1,] 0.0000000 0.0000000 0.01889559 0.08323168 0.1885731 0.2770073 0.2398413
##  [2,] 0.9993985 0.0000000 0.00000000 0.00000000 0.0000000 0.0000000 0.0000000
##  [3,] 0.0000000 0.9995963 0.00000000 0.00000000 0.0000000 0.0000000 0.0000000
##  [4,] 0.0000000 0.0000000 0.99928252 0.00000000 0.0000000 0.0000000 0.0000000
##  [5,] 0.0000000 0.0000000 0.00000000 0.99880073 0.0000000 0.0000000 0.0000000
##  [6,] 0.0000000 0.0000000 0.00000000 0.00000000 0.9987295 0.0000000 0.0000000
##  [7,] 0.0000000 0.0000000 0.00000000 0.00000000 0.0000000 0.9983661 0.0000000
##  [8,] 0.0000000 0.0000000 0.00000000 0.00000000 0.0000000 0.0000000 0.9975890
##  [9,] 0.0000000 0.0000000 0.00000000 0.00000000 0.0000000 0.0000000 0.0000000
## [10,] 0.0000000 0.0000000 0.00000000 0.00000000 0.0000000 0.0000000 0.0000000
##            [,8]       [,9]      [,10]
##  [1,] 0.1053085 0.01931361 0.00106925
##  [2,] 0.0000000 0.00000000 0.00000000
##  [3,] 0.0000000 0.00000000 0.00000000
##  [4,] 0.0000000 0.00000000 0.00000000
##  [5,] 0.0000000 0.00000000 0.00000000
##  [6,] 0.0000000 0.00000000 0.00000000
##  [7,] 0.0000000 0.00000000 0.00000000
##  [8,] 0.0000000 0.00000000 0.00000000
##  [9,] 0.9963091 0.00000000 0.00000000
## [10,] 0.0000000 0.99443093 0.00000000

We could also use \(_nL_x\) and \(_nF_x\) values for any combination of the different countries, for example, \(_nL_x\) values from Australia and \(_nF_x\) from Senegal:

LAS <- leslie(nLx = d$nLx[d$country=="Australia"], 
              nFx = d$nFx[d$country=="Senegal"])

Project the population

We can now use the Leslie matrices to project a population forward in time. Let’s start with the Australia-only Leslie matrix and project forward Australia’s 2015 population 250 years. Note that because we are using five-year age groups, the population projection happens in five-year steps.

n_age_groups <- 10 # 0-50 in 5yr age groups
n_projections <- 50 # want to project forward 50*5 = 250 years
initial_pop <- d$nKx[d$country=="Australia"] # set initial population to be Australia's population in 2015
# define population matrix K
K <- matrix(0, nrow = n_age_groups, ncol = n_projections+1)
K[,1] <- initial_pop[1:n_age_groups]

# do the projection!
for(i in 2:(n_projections+1)){
  K[,i] <- LAA%*%K[,i-1] 
}

head(K[,1:5])
##         [,1]     [,2]     [,3]     [,4]     [,5]
## [1,] 751.868 781.6098 751.5275 708.4348 680.9857
## [2,] 740.330 751.4157 781.1396 751.0754 708.0087
## [3,] 688.526 740.0312 751.1124 780.8243 750.7723
## [4,] 717.395 688.0320 739.5002 750.5735 780.2641
## [5,] 812.546 716.5347 687.2069 738.6133 749.6734
## [6,] 884.737 811.5136 715.6243 686.3338 737.6749

Now we have a matrix of populations by age for Australia projected into the future. We can use these numbers to investigate:

  • The total population over time
  • The share of the population in each age group over time
  • The population growth rate over time

First, let’s get the matrix K in a form that’s a bit easier to plot. This involves making K into a dataframe in ‘long’ format, so that every row refers to a different year and age combination.

# get K in a form that's easier to ggplot
# make into a dataframe, name columns as years and add an age column
Kdf <- as.data.frame(K)
colnames(Kdf) <- seq(from = 2015, to = (2015+n_projections*5), by = 5)
Kdf <- cbind(age = seq(from = 0, to = 45, by = 5), Kdf)

# get in long format and then add proportion of population in each age group
Klong <- Kdf %>% 
  gather(year, population, -age) %>%
  group_by(year) %>%
  mutate(proportion = population/sum(population), 
         age = as.factor(age))
head(Klong)
## # A tibble: 6 x 4
## # Groups:   year [1]
##   age   year  population proportion
##   <fct> <chr>      <dbl>      <dbl>
## 1 0     2015        752.     0.0953
## 2 5     2015        740.     0.0939
## 3 10    2015        689.     0.0873
## 4 15    2015        717.     0.0910
## 5 20    2015        813.     0.103 
## 6 25    2015        885.     0.112

Create a data frame that has the total population by year, and calculate the annual growth rate based on the equation

\[ K(t+5) = K(t) \cdot e^{5R} \] So \[ R = \frac{1}{5} \cdot \log \frac{K(t+5)}{K(t)} \]

# total population by year
tot_pop <- Klong %>% 
  group_by(year) %>%
  summarise(pop = sum(population)) %>%
  mutate(R = c(NA, log(pop[2:n()]/pop[1:(n()-1)])/5))
head(tot_pop)
## # A tibble: 6 x 3
##   year    pop         R
##   <chr> <dbl>     <dbl>
## 1 2015  7887. NA       
## 2 2020  7879. -0.000190
## 3 2025  7761. -0.00302 
## 4 2030  7689. -0.00188 
## 5 2035  7484. -0.00541 
## 6 2040  7278. -0.00557

Now we can plot total population, growth rate, and proportion in each age group over time.

# plot total population over time 
ggplot(data = tot_pop, aes(x = year, y = pop, group = 1)) +
  geom_point() + geom_line()+
  theme(axis.text.x = element_text(angle = 90, hjust = 1))+
  ggtitle("Total population over time")

# plot growth rate over time
ggplot(data = tot_pop, aes(x = year, y = R, group = 1)) +
  geom_point() + geom_line()+
  theme(axis.text.x = element_text(angle = 90, hjust = 1))+
  ggtitle("Growth rate of population over time")
## Warning: Removed 1 rows containing missing values (geom_point).
## Warning: Removed 1 row(s) containing missing values (geom_path).

# plot proportion in each age group over time
ggplot(data = Klong, aes(x = year, y = proportion, group = age, color = age)) +
  geom_point() + geom_line()+ 
  theme(axis.text.x = element_text(angle = 90, hjust = 1))+
  ggtitle("Proportion of population in each age group over time")

Stable population quantities

We can do an eigen-decomposition of the Leslie Matrix to get the values of the stable growth rate \(r\) and stable population age structure. If we compare these values to the values of the growth rate \(R\) and population age structure in the final projection period, you can see that they are very similar.

eigen.AA <- eigen(LAA)
r.AA <- log(Re(eigen.AA$value[1]))/5 
v <- Re(eigen.AA$vectors[,1])
k.AA <- v/sum(v)

# compare stable population rate and rate at end of projection period
cbind(stable = r.AA,  proj = tot_pop$R[nrow(tot_pop)])
##            stable         proj
## [1,] -0.002413445 -0.002422304
# compare stable population age distribution and age distribution at end of projection period
cbind(stable = k.AA, proj = Klong$proportion[Klong$year==(2015+n_projections*5)])
##           stable       proj
##  [1,] 0.09518527 0.09518237
##  [2,] 0.09628290 0.09627466
##  [3,] 0.09741247 0.09740990
##  [4,] 0.09852435 0.09853611
##  [5,] 0.09960088 0.09962347
##  [6,] 0.10068198 0.10069951
##  [7,] 0.10173779 0.10173462
##  [8,] 0.10272465 0.10270184
##  [9,] 0.10358801 0.10356753
## [10,] 0.10426171 0.10426999
ages <- seq(0, 45, by = 5)
ggplot(data = data.frame(age = ages, proportion = k.AA), aes(x = age, y = proportion))+
  geom_bar(stat = 'identity')+
  coord_flip()+
  ggtitle("Stable age population structure")

Solutions: Section II Exercises

1. Would you expect a population projection for Niger to converge to the stable properties more quickly or slowly than Australia? Why? Repeat the projection exercise for Niger to see what happens.

Answer: First, let’s look plot the proportion of the population in each age group for both countries.

# let's calculate the proportion in each age group so we can compare across countries
df_forplot <- d %>% 
  filter(country %in% c("Australia", "Niger")) %>% 
  group_by(country) %>% 
  mutate(nCx = nKx/sum(nKx))

ggplot(df_forplot, aes(x = age, y = nCx)) + 
  facet_wrap(~country, ncol=3)+
  geom_bar(stat="identity", position = "dodge")+
  ggtitle("Proportion of population in each age group")+
  ylab("proportion")+
  coord_flip()

According to this plot, which country has an age distribution that looks more like a stable age distribution?

In stable population, the proportions in each age group do not change over time. The age pyramid for Australia has clear indentations, suggesting that the proportions in each age group are changing over time (from variation in age-specific fertility and mortality rates). In contrast, Niger looks much more like a stable population — we don’t see notches or indents in the age pyramid, and the proportion in each age group is monotonically decreasing. This suggests proportions in each age group aren’t changing as much over time. Thus, we would expect the population projection for Niger to converge to stable properties more quickly than Australia.

## Leslie matrix for Giger 
LNN <- leslie(nLx = d$nLx[d$country=="Niger"], 
              nFx = d$nFx[d$country=="Niger"])


n_age_groups <- 10 # 0-50 in 5yr age groups
n_projections <- 50 # want to project forward 50*5 = 250 years
initial_pop <- d$nKx[d$country=="Niger"] # set initial population to be Niger's population in 2015
# define population matrix K
K <- matrix(0, nrow = n_age_groups, ncol = n_projections+1)
K[,1] <- initial_pop[1:n_age_groups]

# do the projection!
for(i in 2:(n_projections+1)){
  K[,i] <- LNN%*%K[,i-1] 
}

head(K[,1:5])
##          [,1]      [,2]      [,3]     [,4]     [,5]
## [1,] 2031.254 2498.1488 3081.8377 3817.922 4728.830
## [2,] 1603.349 1974.1868 2427.9644 2995.255 3710.659
## [3,] 1279.366 1588.8466 1956.3302 2406.003 2968.163
## [4,] 1021.085 1268.6693 1575.5623 1939.973 2385.887
## [5,]  797.591 1008.7022 1253.2840 1556.455 1916.447
## [6,]  634.829  785.8361  993.8359 1234.813 1533.516
# get K in a form that's easier to ggplot
# make into a dataframe, name columns as years and add an age column
Kdf <- as.data.frame(K)
colnames(Kdf) <- seq(from = 2015, to = (2015+n_projections*5), by = 5)
Kdf <- cbind(age = seq(from = 0, to = 45, by = 5), Kdf)

# get in long format and then add proportion of population in each age group
Klong <- Kdf %>% 
  gather(year, population, -age) %>%
  group_by(year) %>%
  mutate(proportion = population/sum(population), 
         age = as.factor(age))
head(Klong)
## # A tibble: 6 x 4
## # Groups:   year [1]
##   age   year  population proportion
##   <fct> <chr>      <dbl>      <dbl>
## 1 0     2015       2031.     0.226 
## 2 5     2015       1603.     0.178 
## 3 10    2015       1279.     0.142 
## 4 15    2015       1021.     0.113 
## 5 20    2015        798.     0.0886
## 6 25    2015        635.     0.0705
# total population by year
tot_pop <- Klong %>% 
  group_by(year) %>%
  summarise(pop = sum(population)) %>%
  mutate(R = c(NA, log(pop[2:n()]/pop[1:(n()-1)])/5))
head(tot_pop)
## # A tibble: 6 x 3
##   year     pop       R
##   <chr>  <dbl>   <dbl>
## 1 2015   9001. NA     
## 2 2020  11055.  0.0411
## 3 2025  13604.  0.0415
## 4 2030  16781.  0.0420
## 5 2035  20748.  0.0424
## 6 2040  25675.  0.0426
# plot total population over time 
ggplot(data = tot_pop, aes(x = year, y = pop, group = 1)) +
  geom_point() + geom_line()+
  theme(axis.text.x = element_text(angle = 90, hjust = 1))+
  ggtitle("Total population over time")

# plot growth rate over time
ggplot(data = tot_pop, aes(x = year, y = R, group = 1)) +
  geom_point() + geom_line()+
  theme(axis.text.x = element_text(angle = 90, hjust = 1))+
  ggtitle("Growth rate of population over time")
## Warning: Removed 1 rows containing missing values (geom_point).
## Warning: Removed 1 row(s) containing missing values (geom_path).

# plot proportion in each age group over time
ggplot(data = Klong, aes(x = year, y = proportion, group = age, color = age)) +
  geom_point() + geom_line()+ 
  theme(axis.text.x = element_text(angle = 90, hjust = 1))+
  ggtitle("Proportion of population in each age group over time")

These plots confirm that the population projection for Niger converges to the stable properties more quickly than Australia. For example, we see that proportion within each age group becomes constant (key feature of stable population) around year 2035 in Niger but not until year 2200 in Australia.

2. Investigate how different fertility and mortality situations affect the stable population characteristics. Obtain eigendecompositions of different combinations of high and low fertility/mortality countries and look at the implied growth rates and population age structures.

Here, we’ll investigate two different combinations:

  • low fertility and high mortality
  • high fertility and low mortality
## high fertility country (Niger) with a low mortality county (Singapore)
LSN <- leslie(nLx = d$nLx[d$country=="Singapore"], 
              nFx = d$nFx[d$country=="Niger"])

eigen.SN <- eigen(LSN)
r.SN <- log(Re(eigen.SN$value[1]))/5 
v <- Re(eigen.SN$vectors[,1])
k.SN <- v/sum(v)

ages <- seq(0, 45, by = 5)
ggplot(data = data.frame(age = ages, proportion = k.SN), aes(x = age, y = proportion))+
  geom_bar(stat = 'identity')+
  coord_flip()+
  ggtitle("Stable age population structure")

r.SN
## [1] 0.04766744

The implied growth rate in a country with low mortality and high fertility is very high. The population age distribution is heavily skewed towards the earlier age groups — this is an example of an “expansive” age pyramid.

## high fertility and low mortality
LNS <- leslie(nLx = d$nLx[d$country=="Niger"], 
              nFx = d$nFx[d$country=="Singapore"])

eigen.NS <- eigen(LNS)
r.NS <- log(Re(eigen.NS$value[1]))/5 
v <- Re(eigen.NS$vectors[,1])
k.NS <- v/sum(v)

ages <- seq(0, 45, by = 5)
ggplot(data = data.frame(age = ages, proportion = k.NS), aes(x = age, y = proportion))+
  geom_bar(stat = 'identity')+
  coord_flip()+
  ggtitle("Stable age population structure")

r.NS
## [1] -0.02113221

The implied growth rate in a country with high mortality and low fertility is negative (population size is decreasing over time). The population age distribution is heavily skewed towards the older age groups. This is an example of an “restrictive” age pyramid.

Section III: Social Contact Matrices and Infectious Disease Models

In this section, we’ll explore social contact matrices. What is a contact matrix? Let’s assume the population has been divided into \(a\) discrete age groups. Let \(c_{ij}\) be the average number of contacts that a person in age group \(i\) has with people in age group \(j\) over a given time period. The contact matrix \(C = (c_{ij})\) is the \(a \times a\) matrix where each entry of the expected number of contacts between each pair of age groups.

Why do researchers work with age-structured contact matrices? Most fundamentally, age-specific interpersonal contact patterns determine the trajectory of outbreaks of directly transmitted pathogens (e.g., COVID-19). Contact matrices are also a key input in mathematical disease models.

We will use social contact data from the Berkeley Interpersonal Contact Study, which has been collecting data on interpersonal contact over the course of the pandemic. Here, we’ll focus on contact data collected in April 2020 from two states: Iowa and Washington. The dataset contains the following variables:

## read in contact data 
contact_data <- data.frame(
  stringsAsFactors = FALSE,
  alter_age = c("[0,18)","[0,18)","[0,18)",
                "[0,18)","[0,18)","[0,18)","[0,18)","[18,25)",
                "[18,25)","[18,25)","[18,25)","[18,25)","[18,25)","[25,35)",
                "[25,35)","[25,35)","[25,35)","[25,35)","[25,35)",
                "[35,45)","[35,45)","[35,45)","[35,45)","[35,45)",
                "[35,45)","[45,55)","[45,55)","[45,55)","[45,55)",
                "[45,55)","[45,55)","[55,65)","[55,65)","[55,65)",
                "[55,65)","[55,65)","[55,65)","[65,100]","[65,100]",
                "[65,100]","[65,100]","[65,100]","[65,100]","[18,25)",
                "[25,35)","[35,45)","[45,55)","[55,65)","[65,100]",
                "[0,18)","[0,18)","[0,18)","[0,18)","[0,18)","[0,18)",
                "[0,18)","[18,25)","[18,25)","[18,25)","[18,25)",
                "[18,25)","[18,25)","[25,35)","[25,35)","[25,35)",
                "[25,35)","[25,35)","[25,35)","[35,45)","[35,45)","[35,45)",
                "[35,45)","[35,45)","[35,45)","[45,55)","[45,55)",
                "[45,55)","[45,55)","[45,55)","[45,55)","[55,65)",
                "[55,65)","[55,65)","[55,65)","[55,65)","[55,65)",
                "[65,100]","[65,100]","[65,100]","[65,100]","[65,100]",
                "[65,100]","[18,25)","[25,35)","[35,45)","[45,55)",
                "[55,65)","[65,100]"),
  ego_age = c("[0,18)","[35,45)","[18,25)",
              "[25,35)","[45,55)","[55,65)","[65,100]","[35,45)",
              "[18,25)","[25,35)","[45,55)","[55,65)","[65,100]",
              "[35,45)","[18,25)","[25,35)","[45,55)","[55,65)",
              "[65,100]","[35,45)","[18,25)","[25,35)","[45,55)",
              "[55,65)","[65,100]","[35,45)","[18,25)","[25,35)",
              "[45,55)","[55,65)","[65,100]","[35,45)","[18,25)",
              "[25,35)","[45,55)","[55,65)","[65,100]","[35,45)",
              "[18,25)","[25,35)","[45,55)","[55,65)","[65,100]","[0,18)",
              "[0,18)","[0,18)","[0,18)","[0,18)","[0,18)",
              "[0,18)","[35,45)","[18,25)","[25,35)","[45,55)",
              "[55,65)","[65,100]","[35,45)","[18,25)","[25,35)","[45,55)",
              "[55,65)","[65,100]","[35,45)","[18,25)","[25,35)",
              "[45,55)","[55,65)","[65,100]","[35,45)","[18,25)",
              "[25,35)","[45,55)","[55,65)","[65,100]","[35,45)",
              "[18,25)","[25,35)","[45,55)","[55,65)","[65,100]",
              "[35,45)","[18,25)","[25,35)","[45,55)","[55,65)",
              "[65,100]","[35,45)","[18,25)","[25,35)","[45,55)",
              "[55,65)","[65,100]","[0,18)","[0,18)","[0,18)","[0,18)",
              "[0,18)","[0,18)"),
  state = c("Iowa","Iowa","Iowa","Iowa","Iowa",
            "Iowa","Iowa","Iowa","Iowa","Iowa","Iowa","Iowa","Iowa","Iowa","Iowa",
            "Iowa","Iowa","Iowa","Iowa","Iowa","Iowa","Iowa","Iowa","Iowa",
            "Iowa","Iowa","Iowa","Iowa","Iowa","Iowa","Iowa","Iowa","Iowa","Iowa",
            "Iowa","Iowa","Iowa","Iowa","Iowa","Iowa","Iowa","Iowa","Iowa",
            "Iowa","Iowa","Iowa","Iowa","Iowa","Iowa","Washington",
            "Washington","Washington","Washington",
            "Washington","Washington","Washington","Washington","Washington","Washington","Washington",
            "Washington","Washington","Washington","Washington","Washington", "Washington","Washington","Washington",
            "Washington","Washington","Washington","Washington", "Washington","Washington",
            "Washington","Washington","Washington","Washington","Washington",
            "Washington","Washington", "Washington","Washington","Washington","Washington","Washington",
            "Washington", "Washington","Washington","Washington","Washington", "Washington","Washington",
            "Washington", "Washington","Washington","Washington","Washington"),
  avg_contacts = c(1.96355865075412,
                   1.07790278376977,0.846004352438355,1.25872430861939,
                   0.856510337330594,0.283046641676595,0.103733294413363,
                   0.520884363375575,2.89924081331928,0.799656186032481,
                   0.541806519198463,0.293628014593759,0.168317120672265,1.10939850263363,
                   0.970173425755095,2.29066969762187,0.706785631862259,
                   0.517604165576141,0.332388336966515,2.3915300439597,
                   0.600961347389168,1.05498606110888,0.885540297483235,
                   0.585549244618107,0.303659625641791,0.932109063584607,
                   0.657972659382363,0.707465490742922,1.45268153797397,
                   0.660202277518944,0.294450038109138,0.65051855766295,
                   0.376356112153727,0.546831059814438,0.696810813389629,
                   0.987824394701072,0.411559114570664,0.419786640822864,
                   0.268457211094917,0.43696483084501,0.386718344800324,
                   0.512126819771215,0.795187584529312,0.373597394030026,
                   0.674385049219153,0.549181696022704,0.459332813407329,
                   0.160210410124759,0.0730627799228152,1.08669597261484,
                   1.25292308158473,0.797696476263686,1.09896801915227,
                   0.774099268925644,0.316942675609594,0.133045284839802,
                   0.237978826422459,1.32149144073559,0.351105050032515,
                   0.276594819959561,0.205780509393844,0.0954807846997501,
                   0.604310003842793,0.575734619243985,1.21848144580133,
                   0.370019430439582,0.323342743883215,0.207709023733976,
                   1.45446289422518,0.346937110312083,0.537263025647986,
                   0.542291526093321,0.387640440740019,0.220904109649164,
                   0.531104772928663,0.394915215687221,0.322180377724249,
                   0.770546174690304,0.415008863029254,0.233243729370475,
                   0.37983125755873,0.293953272223552,0.281677340714602,
                   0.415213636487433,0.693203608564507,0.301820892190442,
                   0.246118895036495,0.155084923122772,0.205742349729577,
                   0.265340627452728,0.343185426450645,0.60028694161922,
                   0.322555475126993,0.728680365204508,0.738590288300677,
                   0.446913245500217,0.183071831119944,0.087381567624679)
)

head(contact_data)
##   alter_age ego_age state avg_contacts
## 1    [0,18)  [0,18)  Iowa    1.9635587
## 2    [0,18) [35,45)  Iowa    1.0779028
## 3    [0,18) [18,25)  Iowa    0.8460044
## 4    [0,18) [25,35)  Iowa    1.2587243
## 5    [0,18) [45,55)  Iowa    0.8565103
## 6    [0,18) [55,65)  Iowa    0.2830466

Let’s visualize our contact matrices for Iowa and Washington. Which state do we expect will have higher levels of social contact?

## plot contact matrices 
contact_data %>% 
  ggplot() +
  geom_tile(aes(x = ego_age, y = alter_age, fill = avg_contacts)) +
  coord_equal() +
  viridis::scale_fill_viridis(name = str_wrap("Contacts per respondent", width = 10)) +
  facet_grid(~state) +
  theme_minimal(base_size = 14) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = .3)) +
  labs(
    title = "Social Contact Matrices (April 2020)",
    x = "Ego age",
    y = "Alter age"
  )

We can see two clear patterns in this plot. First, levels of contact are generally higher in Iowa than in Washington. Second, in both states there are higher values in the leading diagonal, suggesting that contact patterns are highly assortative by age category — that is, people have more conversational contacts close to them in age.

Summarizing Contact Intensity

The leading eigenvalue of a contact matrix is a summary measure of the overall levels of contact implied by a contact matrix.

Our contact data is formatted in a dataframe. To summarize state-level contact intensity, we will manipulate our dataframe into two separate matrices, one for each state. Then, we will calculate the leading eigenvalue of each contact matrix and compare overall contact intensity.

## Manipulate data.frame into a matrix (Iowa)
iowa_matrix <- contact_data %>%
  filter(state == "Iowa") %>% 
    select(alter_age, ego_age, avg_contacts) %>%
    arrange(ego_age) %>%
    pivot_wider(names_from = ego_age, values_from = avg_contacts) %>%
    column_to_rownames(var = "alter_age") %>%
    data.matrix()

## Manipulate data.frame into a matrix (Washington)
washington_matrix <- contact_data %>%
  filter(state == "Washington") %>% 
    select(alter_age, ego_age, avg_contacts) %>%
    arrange(ego_age) %>%
    pivot_wider(names_from = ego_age, values_from = avg_contacts) %>%
    column_to_rownames(var = "alter_age") %>%
    data.matrix()

## calculate leading eigenvalues for Iowa
eigen_iowa <- eigen(iowa_matrix)$values[1]

## calculate leading eigenvalue for Washington
eigen_washington <- eigen(washington_matrix)$values[1]

paste0("The leading eigenvalue of the contact matrix for Iowa (", round(eigen_iowa, 3), ") is larger than the leading eigenvalue of the contact matrix for Washington (", round(eigen_washington, 3), "), suggesting that overall contact intensity in Iowa is higher than in Washington.")
## [1] "The leading eigenvalue of the contact matrix for Iowa (5.677) is larger than the leading eigenvalue of the contact matrix for Washington (3.702), suggesting that overall contact intensity in Iowa is higher than in Washington."

The total contact intensity is higher in Iowa than in Washington. What implications does this have for the spread of infectious disease?

Infectious Disease Models and \(R_0\)

As we saw in lecture, the social contact matrix is also related the the next generation matrix in infectious disease models. The next generation matrix is a transition matrix (just like the Leslie matrix) and describes the transition between stages (compartments) in the infectious disease model.

The dominant eigenvalue of the next generation matrix is the basic reproduction number,\(R_0\), which is the expected number of secondary infections from one infected person in a fully susceptible population.

If we have an age and disease stage structured population model, for respiratory pathogens spread via direct contact (such as SARS-CoV-2) we can define the next generation matrix as:

\[ NGM = D_u \cdot C \cdot D_{dI} \]

where:

  • \(NGM\) is the next generation matrix
  • \(D_u\) is a diagonal matrix with diagonal entries \(u_i\) representing the probability of a successful transmission for age group \(i\), given contact with an infectious individual
  • \(C\) is the contact matrix
  • \(D_{dI}\) is a diagonal matrix with diagonal entries \(dI\) equal to the infectious period

Here’s a function for calculating the \(R_0\):

## function to calculate R0 from NGM
u = c(0.39, 0.62, 0.81, 0.83, 0.81, 0.74) * 0.2 # relative susceptibility of each age class * probability of transmission given contact
compute_R0 = function(u, C){
  dI <- 5 #recovery period in days 
  Du <- diag(u, 7)
  Dy <- diag(dI, 7)
  NGM <- Du %*% C %*% Dy
  R0  <- abs(eigen(NGM)$values[1])
  return(R0)
}

Now let’s use the function to calculate \(R_0\) for Iowa and Washington:

compute_R0(u, iowa_matrix)
## [1] 4.077668
compute_R0(u, washington_matrix)
## [1] 2.505717

We can conclude that the \(R_0\) for a COVID-19 like diseases is much higher in Iowa than in Washington based on the social contact data.

Section III Exercises Solutions

1. What does an increase in contact among school age children (in either Iowa or Washington) imply for disease transmission? What about if susceptibility didn’t vary by age (\(u\) is constant over age)?

## create a new matrix with higher contact among school children (first entry of matrix)
iowa_matrix_higher_contact <- iowa_matrix
iowa_matrix_higher_contact[1, 1] <- iowa_matrix[1, 1]*3 ## triple amount of contact 

## Calcuate R0 
compute_R0(u, iowa_matrix)
## [1] 4.077668
compute_R0(u, iowa_matrix_higher_contact)
## [1] 4.220683

We see that this increase in contact among school age children implies that disease transmission is increasing. However, despite tripling the amount of contact, we see a relatively small change in \(R_0\). (This is because susceptibility is relatively low for this age group.)

What happens if susceptibility doesn’t vary by age?

## create new u vector 
u_constant = rep(0.1435, 7) # relative susceptibility of each age class * probability of transmission given contact

## implication 
compute_R0(u_constant, iowa_matrix)
## [1] 4.072938
compute_R0(u_constant, iowa_matrix_higher_contact)
## [1] 5.029525

We see that if susceptibility does not vary by age, there is a much larger increase in \(R_0\).

2. What impact does a longer infectious period have on \(R_0\)?

## update function so we can change infectious period 
compute_R0_infec_period = function(u, C, dI){
  dI <- dI #recovery period in days 
  Du <- diag(u, 7)
  Dy <- diag(dI, 7)
  NGM <- Du %*% C %*% Dy
  R0  <- abs(eigen(NGM)$values[1])
  return(R0)
}

## try three different infectious periods 
compute_R0_infec_period(u, iowa_matrix, dI = 3)
## [1] 2.446601
compute_R0_infec_period(u, iowa_matrix, dI = 5)
## [1] 4.077668
compute_R0_infec_period(u, iowa_matrix, dI = 7)
## [1] 5.708735

A longer infectious period is causes an increase \(R_0\). The relationship between infectious period and \(R_0\) appears to be linear.

We can also visualize this relationship:

## create vector of different infectious periods 
infect_periods <- seq(1, 10, by = 1)

## create a list to store estimates 
R0_list <- list()

## loop over all different infectious periods 
for (infect_period in infect_periods) {
  
  ## calculate R0
  R0_estimate <- tibble(R0 = compute_R0_infec_period(u, iowa_matrix, dI = infect_period),
                        infectious_period = infect_period)
  ## store each estimate in a list 
  R0_list[[infect_period]] <- R0_estimate
}

## put results into data.frame 
results <- R0_list %>%
  bind_rows()

## plot results
ggplot(data = results, aes(x = infectious_period, y = R0)) + 
  geom_line(size = 1) + 
  geom_point(size = 3) + 
  theme_minimal(15) + 
  labs(x = "Infectious Period",
       y = "R0",
       title = "Relationship between R0 and Infectious Period")

Our plot shows that the relationship between \(R_0\) and infectious period is linear.