Household contact layer

The household contact layer represents the pairwise connections between household members. The population is generated within this contact layer, not as a separate pool of people.

As locations, households are special in the following ways:

  • Unlike schools and workplaces, everyone must be assigned to a household.

  • The size of the household is important (for example, a 2-person household looks very different in comparison to a 5- or 6-person household) and some households only have 1 person.

  • The reference person/head of the household can be well-defined by data.

Data needed

The following data sets are required for households:

  1. Age bracket distribution specifying the distribution of people in age bins for the location. For example:

    age_bracket , percent
    0_4         , 0.0594714358950416
    5_9         , 0.06031137308234759
    10_14       , 0.05338015778985113
    15_19       , 0.054500690394160285
    20_24       , 0.06161403846144956
    25_29       , 0.08899312471888453
    30_34       , 0.0883533486774803
    35_39       , 0.07780767611060545
    40_44       , 0.07099017823587304
    45_49       , 0.06996903280562596
    50_54       , 0.06655242534751997
    55_59       , 0.06350008343899961
    60_64       , 0.05761405140489549
    65_69       , 0.04487122889235999
    70_74       , 0.030964420778483555
    75_100       , 0.05110673396642193
    
  2. Age distribution of the reference person for each household size

    The distribution is what matters, so it doesn’t matter if absolute counts are available or not, each row is normalized. If this is not available, default to sampling the age of the reference individual from the age distribution for adults:

    family_size , 18-20 , 20-24 , 25-29 , 30-34 , 35-39 , 40-44 , 45-49 , 50-54 , 55-64 , 65-74 , 75-99
    2           , 163   , 999   , 2316  , 2230  , 1880  , 1856  , 2390  , 3118  , 9528  , 9345  , 5584
    3           , 115   , 757   , 1545  , 1907  , 2066  , 1811  , 2028  , 2175  , 3311  , 1587  , 588
    4           , 135   , 442   , 1029  , 1951  , 2670  , 2547  , 2368  , 1695  , 1763  , 520   , 221
    5           , 61    , 172   , 394   , 905   , 1429  , 1232  , 969   , 683   , 623   , 235   , 94
    6           , 25    , 81    , 153   , 352   , 511   , 459   , 372   , 280   , 280   , 113   , 49
    7           , 24    , 33    , 63    , 144   , 279   , 242   , 219   , 115   , 157   , 80    , 16
    
  3. Distribution of household sizes:

    household_size , percent
    1              , 0.2781590909877753
    2              , 0.3443313103056699
    3              , 0.15759535523004006
    4              , 0.13654311541644018
    5              , 0.050887858718118274
    6              , 0.019738368167953997
    7              , 0.012744901174002305
    
  4. Household contact matrix specifying the number/weight of contacts by age bin:

            0-10        , 10-20       , 20-30
    0-10    0.659867911 , 0.503965302 , 0.214772978
    10-20   0.314776879 , 0.895460015 , 0.412465791
    20-30   0.132821425 , 0.405073038 , 1.433888594
    

    By default, SynthPops uses matrices from a study (Prem et al. 2017) that projected inferred age mixing patterns from the POLYMOD study (Mossong et al. 2008) in Europe to other countries. SynthPops can take in user-specified contact matrices if other age mixing patterns are available for the household, school, and workplace settings (see the social contact data on Zenodo for other empirical contact matrices from survey studies).

    In theory, the household contact matrix varies with household size, but in general data at that resolution is unavailable.

Workflow

Use these SynthPops functions to instantiate households as follows:

  1. Call generate_synthetic_population() and provide the binned age bracket distribution data described above. This wrapper function calls the following functions:

    1. From the binned age distribution, get_age_n() creates samples of ages from the binned distribution, and then normalizes to create a single-year distribution. This distribution can therefore be gathered using whatever age bins are present in any given dataset.

    2. generate_household_sizes_from_fixed_pop_size() generates empty households with known size based on the distribution of household sizes.

    3. generate_all_households() contains the core implementation and constructs households with individuals of different ages living together. It takes in the remaining data sources above, and then does the following:

      • Calls generate_living_alone() to populate households with 1 person (either from data on those living alone or, if unavailable, from the adult age distribution).

      • Calls generate_larger_households() repeatedly with with different household sizes to populate those households, first sampling the age of a reference person and then their household contacts as outlined above.