Heterogeneity and transmission scaling

Author:Daniel J. Klein, Senior Research Scientist, IDM


Models of disease transmission often assume a homogeneous pattern of mixing between individuals that stems from a mechanism in which the population is sufficiently “well-mixed” to make each individual equally likely to next encounter each other individual. Along with this mixing assumption, simple models additionally suppose that susceptibility and infectivity are homogeneous across the population. These assumptions enable elegant mathematical calculations to be performed, including those for the base reproductive rate, R_0, and endemic equilibrium. Homogeneous mixing also greatly simplifies implementation in the sense the only a single transmission constant, \beta, is required to describe the transmission rate per person.

However, assumptions of homogeneity quickly become invalid when considering the complexities of disease transmission. Spatial effects, even at relatively small scales, can invalidate the well- mixed assumption, particularly if the disease is highly infectious or the population density is high. Non-uniform patterns of mixing based on age are well-documented in school-based disease transmission [1]. Susceptibility and infectivity are inherently heterogeneous, as governed by both behavioral and biological factors including hand washing, viral load, and immunity. Many of these factors can be targeted by interventions that potentially have non-uniform uptake, for example due to differences in accessibility.

These sources of heterogeneity are abundant in almost every infectious disease. The “right” level of heterogeneity to include in a model depends on the question being asked, but rarely is a purely homogeneous model sufficient. The Epidemiological MODeling software (EMOD) has long-supported heterogeneity in the form of “nodes.” Each node represents a small geographical area that can have customized parameters. Individuals migrate between nodes either temporarily or permanently using mobility patterns driven by local, regional, and long-distance transportation. While nodes have long provided the capability to model large-scale spatial heterogeneities, disease transmission within each node was homogeneously governed by a single transmission parameter, Base_Infectivity, which determined \beta.

Heterogeneous Intra-Node Transmission (HINT)

EMOD implements the Heterogeneous Intra-Node Transmission (HINT) feature that enables the modeler to include disease transmission heterogeneities within each node. Intra-node heterogeneity is driven by “individuals properties,” each of which has an associated discrete set of “values.” Examples of individual properties, with typical values given parenthetically, include risk group (low, medium, high), age group (under 5, 5 to 15, over 15), vaccine accessibility class (easy, hard), intra-node locale (census tracts), and place (home, school, work, community, non-school). All nodes in a simulation must have the same individual properties and values, but the characteristics of the heterogeneity can vary from node to node.

Associated with each individual property is a values-by-values-sized matrix of multipliers, denoted \beta, that scale base infectivity, \beta_0. Specifically, the entry of the multiplier matrix at row i and column j determines the transmission constant from infected individuals having the i^{th} value to susceptible individuals having the j^{th} value.

Two individual properties can be combined to achieve a higher level of transmission heterogeneity. When multiple individual properties are configured, their effects are combined independently via multiplication of the appropriate multipliers. For example if the individual properties are age group (young and old) and vaccine accessibility (easy and hard),

\beta^{\text{age}} = \begin{bmatrix}1.5 & 0.9 \\ 0.8 & 1.0\end{bmatrix} \text{and } \beta^{\text{accessibility}} = \begin{bmatrix}1.0 & 0.4 \\ 0.3 & 5.0\end{bmatrix},

the transmission multiplier from an infected individual who is a young and hard to access to a susceptible individual who is old and easy to access will be 0.9 \times 0.3 = 0.27. Similarly, the multiplier from an infected young vaccine refuser to a susceptible young vaccine refuser will be 1.5 \times 5.0 = 7.5. While most multiplier matrices are symmetric, asymmetries as in the above example can arise for several reasons including heterogeneity in susceptibility.

The product of the base infectivity, \beta_0, and the multiplier matrix, \beta, governs “who acquires infection from whom,” and thus is sometimes called the WAIFW matrix. The values in this matrix can be thought of as the “effective contact rate,” where an effective contact is defined as one that will result in disease transmission were it between a susceptible and infected individual [2].

Disease dynamics

EMOD is an exceptionally powerful software tool. Even in “generic” mode, EMOD is capable of simulating spatially diverse Susceptible-Exposed-Infected-Recovered (SEIR model) and SEIRS dynamics, including options for non-exponential incubation and infectious periods, inter-node migration, and density- or frequency-dependent transmission.

Single homogeneous node

The mathematical fundamentals underlying generic disease dynamics are most vividly illustrated in a single well-mixed community. In EMOD, this scenario corresponds to a single node without HINT. To simplify presentation of the mathematical principles, we focus on the well-known SIR model dynamics with exponentially distributed waiting times between state transitions.

In mathematical biology, disease dynamics are often presented in terms of the proportion of the initial population in susceptible, infected, or recovered states (SIR). Because EMOD represents individuals, and to be clear about mechanisms by which the transmission rate varies as a function of the node population, we instead present the dynamics in terms of the number of individuals (X, Y, Z) in place of (S, I, R), respectively.

The governing dynamics can be written as an ordinary differential equation (ODE) with fixed birthrate v, death rate {\mu}, and recovery rate {\gamma}.

\frac{dX}{dt} &= vN - \frac{\beta_0}{N}XY - \mu{X} \\
\frac{dY}{dt} &= \frac{\beta_0}{N}XY - \gamma{Y} - \mu{Y} \\
\frac{dZ}{dt} &= \gamma{Y} - \mu{Z}

Here N = X + Y + Z is the total number of individuals.

The ODE dynamics in (2) are deterministic and continuous. In a finite population of individuals, there is no way to precisely replicate these dynamics, and it is not clear that one would want to in practice because disease transmission is an inherently stochastic process that unfolds in a population of finite size.

In EMOD, the state changes at fixed time steps. The size of the time step, denoted \delta, is selected to be small compared to the characteristic timescale of the disease dynamics. Each update step consists of three primary sub-steps:

  1. Shed: The default behavior in a homogeneous generic simulation is for each infected individual to shed contagion at a fixed rate. The total rate of contagion shedding from all infected individuals is called the force of infection, \lambda = \frac{\beta}{N}Y.
  2. Expose: Each susceptible individual becomes infected with probability p_1 = \text{exp}(\lambda\delta), where \lambda is the time step.
  3. Finalize: The final part of each time step contains recovery from infection and disease mortality for infected individuals, along with basic demographic updates. EMOD supports advanced demographics, but here we use the per-capita birth and death rates as in (2).
    1. Recovery: Each infected individual recovers in one time step with probability, p_r = 1 - \text{exp}(-\gamma\delta).
    2. Birth: For birthrate v, the number of new susceptible individuals will be Poisson distributed with rate vN\delta.
    3. Natural Death: The probability of death for each individual is p_d = 1 - \text{exp}(-\mu\delta).

Putting these pieces together over the course of a time step, let:

\Delta{N_i} & \sim \text{Binomial}(X,p_i) \qquad [\text{Infection}] \\
\Delta{N_r} & \sim \text{Binomial}(Y,p_r) \qquad [\text{Recovery}] \\
\Delta{N_b} & \sim \text{Binomial}(vN\delta) \qquad [\text{Birth}] \\
\Delta{N_{X,d}} & \sim \text{Binomial}(X,p_d) \qquad [\text{Death of susceptibles}] \\
\Delta{N_{Y,d}} & \sim \text{Binomial}(Y,p_d) \qquad [\text{Death of infected}] \\
\Delta{N_{Z,d}} & \sim \text{Binomial}(Z,p_d) \qquad [\text{Death of recovered}]

With these numbers in mind and assuming that only one state transition event happens to each individual in a time step:

X(t + \delta) &= X(t) - \Delta{N_i} + \Delta{N_b} - \Delta{N_{X,d}} \\
Y(t + \delta) &= Y(t) - \Delta{N_i} + \Delta{N_r} - \Delta{N_{Y,d}} \\
Z(t + \delta) &= Z(t) - \Delta{N_r} - \Delta{N_{Z,d}} \\
t &= t + \delta

HINT node with one individual property

HINT feature adds the ability to simulate heterogeneity within each EMOD node. It is enabled through individual properties defined in the demographics file. The disease dynamics parallel the homogeneous case. To begin, consider a single node with one individual property having V values. The X, Y, and Z disease states will gain an n subscript to index into the individual property values. Using the user-provided multiplier matrix of size V \times V, the ODE-form of the disease dynamics are:

\frac{dX_v}{dt} &= vN - \sum^{V}_{k=1}\frac{\beta_0\beta_{k,v}}{N}X_v Y_k - \mu{X_v} \\
\frac{dY_v}{dt} &= \sum^V_{k=1}\frac{\beta_0\beta_{k,v}}{N}X_vY_k - \gamma{Y_v} - \mu{Y_v} \\
\frac{dZ_v}{dt} &= \gamma{Y_v} - \mu{Z_v}

For v \in [1...V]. From this ODE, conversion to an individual-based implementation follows the methods for a homogeneous node presented above.

At simulation initialization, individuals are assigned stochastically to values of the individual property according to the distribution specified in the demographics file. Note also that intra-node transmission heterogeneities can be transmission-route specific.

HINT node with multiple individual properties

Multiple individual properties can be easily configured in the EMOD demographics file. When enabled, the heterogeneity induced by separate individual properties act independently. As an example, consider two individual properties, p_1 and p_2, having V_1 and V_2 values, respectively. The multiplier matrices will be denoted \beta^{p_1} and \beta^{p_2}, and have sizes V_1 \times V_1 and V_2 \times V_2, respectively. The resulting disease dynamics in ODE-form are:

\frac{dX_{v_1,v_2}}{dt} &= vN - \sum^{v_1}_{k_1=1} \sum^{v_2}_{k_2=1}\frac{\beta_0\beta^{p_1}_{k_1,v_1}\beta^{p_2}_{k_2,k_2}}{N}X_{v_1,v_2}Y_{k_1,k_2} - \mu{X_{v_1,v_2}} \\

\frac{dX_{v_1,v_2}}{dt} &= vN - \sum^{v_1}_{k_1=1} \sum^{v_2}_{k_2=1}\frac{\beta_0\beta^{p_1}_{k_1,v_1}\beta^{p_2}_{k_2,k_2}}{N}X_{v_1,v_2}Y_{k_1,k_2} - \gamma{Y_{v_1,v_2}} - \mu{Y_{v_1,v_2}} \\

\frac{dZ_{v_1,v_2}}{dt} &= \gamma{Y_{v_1,v_2}} - \mu{Z_{v_1,v_2}}

For indices (V_1,V_2) \in [1...V_1]\times[V_1...V_2].

When configured for multiple individual properties, EMOD assigns individual property values to individuals independently based on the separate probability distributions provided by the user in the demographics file.

Transmission scaling

Disease transmission depends on the rate effective contacts as represented by parameter \beta_0. But what happens to the effective contact rate as the size of the population changes? There are two commonly accepted answers: frequency-dependent and density-dependent transmission scaling.

Frequency versus density-dependent transmission scaling

The two common forms of transmission scaling with population (more precisely, population density) are:

Frequency dependence, true mass action

The number of effective contacts per person per unit time is independent of the population size. A single infected person in a node with a large population will infect just as many people as they would in a node with a small population. One way to think about this is that the population density is fixed, so that the node with the large population is spread over a large area. For this case,

\text{Transmission} \propto \frac{\text{Number of effective contacts per unit time}}{\text{Current size of the node population}},

so that the transmission rate inversely proportional to N(t). The dynamics described above, starting from (2), are frequency-dependent because the \beta_0 parameter is divided by the current node population, N(t).

Density dependence, pseudo mass action

The number of effective contacts per person per time is proportional to the population. Think of a room containing N individuals. A single highly-infectious cough would infect more people if the number of people in the room (population density) was higher. For this case,

\text{Transmission} \propto \frac{\text{Number of effective contacts per unit time}}{\text{Fixed population size}},

Usually, the denominator is taken as N(t=0), the initial node population. Note that the transmission rate per capita is a constant in this case. Density-dependent transmission is often found in SIR models which inherently neglect the total population size.

Transmission scaling in EMOD

EMOD supports flexibility in transmission scaling above and beyond pure frequency- or density-dependent scaling. The default configuration employs frequency-dependent scaling, the most commonly- used form of scaling in computational epidemiology. For version 1.6, EMOD allows the transmission rate to scale as a saturating function of population (density) [3], even when HINT is enabled. The force of infection contributed by each infected individual under these two cases of transmission scaling are as follows:

\text{Force per infected} = \left\{
        \frac{\beta_0}{N} & \text{Frequency dependence} \\
        \left[1 - \exp\left(- \frac{\rho}{\rho_{50}}\right)\right] \frac{\beta_0}{N} & \text{Saturating function of density}

Here, r is the population density and r_{50} is an input parameter the governs the transition from density to frequency dependence. The population density is computed as \rho = \frac{N}{A} where A is is the area of the node. This node area is in turn computed from the longitude and latitude of the node center point (from the demographics file) and the node size. It is assumed that all nodes have equal size in terms of the degrees of latitude and longitude, as determined by configuration parameter Node_Grid_Size. Denoting this node grid size by w, the area of a node located at (lat, lon) is computed based on the corresponding area of a sphere,

A = R^2\Big\lbrace\Big[\text{cos}\Big(\Big(90-\text{lat}-\frac{w}{2}\Big)\frac{\pi}{180}\Big)- \text{cos}\Big(\Big(90-\text{lat}+\frac{w}{2}\Big)\frac{\pi}{180}\Big)\Big]\frac{w\pi}{180}\Big\rbrace,

where R=6371.2213 km is the radius of Earth.

The saturating function of density is enabled by setting the following EMOD configuration parameter: Population_Density_Infectivity_Correction: SATURATING_FUNCTION_OF_DENSITY

Finally, the r_{50} parameter is configured using an EMOD configuration parameter Population_Density_C50.

Contact scaling

HINT and transmission scaling allow the EMOD user to specify and control heterogeneity within EMOD. One potential application for these features is in configuring “contact scaling,’’ a particular form of heterogeneous mixing in which shedding and acquisition are symmetric. The best example of contact scaling is location-based mixing, which can be used in place of intra-day migration. Typical places for location-based mixing include home, school, and work. Contact scaling could also be used for mixing between age groups.

Contact scaling is typically specified by a matrix, \rho=\frac{N}{A}. The entry at c_{ij} indicates that an individual with value i sheds and acquires from mixing pool j with weight c_{ij}. Each row of the contact scaling matrix typically sums to one, especially for location-based mixing wherein c_{ij} can be used to represent the fraction of the day an individual with value i spends at location j.

In terms of implementation, there is one main difference between contact scaling and transmission scaling. In contact scaling, individuals both shed into and acquire from pools of contagion according to the corresponding row of the C matrix. Normalization is done not by N, as in frequency scaling, but instead by the “effective population” of each mixing pool, which is computed as the C-weighted sum of the population-by-value. Transmission scaling via the \beta matrix directly specifies the transmission rate per susceptible between each group of individuals, so there is no concept of place but the overall approach is more flexible, e.g. by allowing asymmetries.

Interestingly, contact scaling exhibits neither frequency-dependent nor density-dependent scaling, although it more closely resembles frequency-dependent transmission because the denominator is a function of the current population. Currently, contact scaling is not supported as an independent feature in the EMOD software, however one can simulate contact scaling to good approximation if the fraction of the population in each value n_i = \frac{N_i(t)}{N(t)} is constant or stabilizes quickly. Then EMOD can be configured using,

\beta = C[\text{diag}(C^Tn)]^{-1}C^T

Here, n is a V-length column vector of the equilibrium fractions of the population by value. Absent mortality factors or interventions moving individuals from one value to the next, n should be close to the probability distribution used to assign values to individuals.


EMOD is a powerful software tool for simulating generic disease dynamics. Amongst the many improvements offered by version 1.6 are the new capabilities described mathematically in this document: HINT and transmission scaling with population density. These improvements add to the long list of EMOD features that make it the most powerful disease simulation framework available today.

  1. Mossong, N. Hens, M. Jit, P. Beutels, K. Auranen, R. Mikolajczyk, M. Massari, S. Salmaso, G. S. Tomba, J. Wallinga, et al. Social contacts and mixing patterns relevant to the spread of infectious diseases. PLos medicine, 5(3):e74, 2008.
[2]M.J. Keeling and P. Rohani. Modeling infectious diseases in humans and animals. Princeton University Press, 2008.
  1. Hu, K. Nigmatulina, and P. Eckhoff. The scaling of contact rates with population density for the infectious disease models. Mathematical biosciences, 2013.