src.create_initial_states.make_educ_group_columns

Module Contents

Functions

make_educ_group_columns(states, query, group_size, strict_assort_by, weak_assort_by, adults_per_group, n_contact_models, column_prefix, occupation_name, seed)

Generate contact model columns for education contacts.

create_balanced_group_column(states, query, group_size, strict_assort_by, weak_assort_by)

Create a group id for a recurrent contact model with equally sized groups.

_get_id_to_weak_group(participants, raw_id)

Create a mapping from groups to weak_assort_by groups.

_split_data_by_query(df, query)

Split data into those selected by query and the rest.

_create_group_id_for_participants(df, group_size, strict_assort_by, weak_assort_by)

Create the group id for those selected by query.

_determine_group_sizes(target_size, population_size)

Calculate group sizes given a target size and a population size.

_create_group_id_for_one_strict_assort_by_group(df, group_size, weak_assort_by, start_id)

Create group id for all people of the same strict_assort_by group.

_get_key_with_longest_value(dict_)

Get the key from dict_ that has the longest value.

_get_key_with_shortest_value(dict_)

Get the key from dict_ that has the shortest value.

_create_group_id_for_non_participants(df)

Create group_id for those not selected by query.

make_educ_group_columns(states, query, group_size, strict_assort_by, weak_assort_by, adults_per_group, n_contact_models, column_prefix, occupation_name, seed)

Generate contact model columns for education contacts.

This generates raw group ids using create_balanced_group_column. It then replicates this column n_contact_models times and mixes in some individuals for which occupation == “working” (e.g. to simulate teachers).

Parameters
  • states (pandas.DataFrame) – DataFrame with background variables, including all assort_by variables.

  • query (str) – Query that selects which individuals are part of a group.

  • group_size (int) – Target group size that will be achieved approximately.

  • strict_assort_by (list or str) – Groups only contain individuals that have the same value in all strict_assort_by variables.

  • weak_assort_by (list or str) – Individuals that have the same value in all weak_assort_by variables are more likely to be matched into one group. Adults are taken from the modal weak group.

  • adults_per_group (int) – Number of teachers added to each class.

  • n_contact_models (int) – Number of contact models for which group ids are generated. This is also the average number of classes each teacher teaches.

  • column_prefix (str) – Prefix for column names.

  • occupation_name (str) – Value to which the

  • seed (int) – Random seed.

Returns

  • The generated id columns. Column names are f"{prefix}_{number}", where prefix is column_prefix and number counts the contact models.

  • Modified occupation column in which “working” was changed to occupation_name in some cases.

Return type

(pd.DataFrame, pd.Series)
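The replication and naming of the id columns can be illustrated with a minimal sketch. It only shows how one raw id column might be copied into n_contact_models columns named from the prefix; it does not mix in working adults. The values of raw_id and column_prefix are made up, and starting the numbering at 0 is an assumption.

```python
import pandas as pd

# Hypothetical raw group ids, standing in for the output of
# create_balanced_group_column.
raw_id = pd.Series([0, 0, 1, 1], name="raw_id")

column_prefix = "educ_group_id"  # illustrative value, not from the source
n_contact_models = 3

# Assumption: numbering starts at 0. The source only says that the number
# counts the contact models.
id_columns = pd.concat(
    {f"{column_prefix}_{i}": raw_id for i in range(n_contact_models)},
    axis=1,
)
```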

create_balanced_group_column(states, query, group_size, strict_assort_by, weak_assort_by)

Create a group id for a recurrent contact model with equally sized groups.

This is a low-level function that is usually called via make_educ_group_columns rather than directly.

When reading the code, it is helpful to distinguish four types of groups of individuals:

  1. The group whose ID column we want to generate, called just “group”.

  2. The groups induced by the strict_assort_by variables, called “strong_group”.

  3. The groups induced by the weak_assort_by variables, called “weak_group”.

  4. Participants and non-participants. Participants are those selected by query.

The algorithm is deterministic, but its result may depend on the row order of states.

Parameters
  • states (pandas.DataFrame) – DataFrame with background variables, including all assort_by variables.

  • query (str) – Query that selects which individuals are part of a group.

  • group_size (int) – Target group size that will be achieved approximately.

  • strict_assort_by (list or str) – Groups only contain individuals that have the same value in all strict_assort_by variables.

  • weak_assort_by (list or str) – Individuals that have the same value in all weak_assort_by variables are more likely to be matched into one group.

Returns

The group_id with same index as states.

Return type

pandas.Series
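The four kinds of groups can be made concrete with a small pandas sketch. The column names (age, county, occupation) and the query are invented for illustration, and using groupby(...).ngroup() to label strong and weak groups is an assumption about one possible way to do it, not the module's code.

```python
import pandas as pd

# Toy population; all columns and values are illustrative.
states = pd.DataFrame(
    {
        "age": [6, 6, 7, 7, 40],
        "county": ["A", "A", "A", "B", "A"],
        "occupation": ["school", "school", "school", "school", "working"],
    }
)

# Participants are those selected by the query.
participants = states.query("occupation == 'school'")

# Strong groups: same value in all strict_assort_by variables (here: age).
strong_group = participants.groupby(["age"]).ngroup()

# Weak groups: same value in all weak_assort_by variables (here: county).
weak_group = participants.groupby(["county"]).ngroup()
```

The final group ids would then be formed within each strong group, preferring to keep members of the same weak group together.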

_get_id_to_weak_group(participants, raw_id)

Create a mapping from groups to weak_assort_by groups.

This mapping is not unique by itself, since each group can have members from multiple weak_assort_by groups. We make it unique by assigning the weak_assort_by group of the first group member to the whole group.

Parameters
  • participants (pandas.DataFrame) – DataFrame of participating individuals. It has to have the “__weak_group_id” column.

  • raw_id (pandas.Series) – column giving the groups which are to be mapped to __weak_group_ids.

Returns

id_to_weak_group: The index contains the group ids in participants; the values are the first weak group id of each group.

Return type

pandas.Series
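Taking the weak group of the first member of each group can be sketched with a pandas groupby. The data below is made up, and groupby(...).first() is an assumption about how such a mapping could be built, not necessarily what the function does internally.

```python
import pandas as pd

# Hypothetical participants with their weak group ids.
participants = pd.DataFrame(
    {"__weak_group_id": [10, 11, 11, 12, 12]},
    index=["a", "b", "c", "d", "e"],
)
# Hypothetical raw group ids with the same index as participants.
raw_id = pd.Series([0, 0, 1, 1, 1], index=participants.index, name="group_id")

# For each group, keep the weak group id of its first member.
id_to_weak_group = participants.groupby(raw_id)["__weak_group_id"].first()
```

Group 0 contains members from weak groups 10 and 11 and is mapped to 10, the weak group of its first member.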

_split_data_by_query(df, query)

Split data into those selected by query and the rest.
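A minimal sketch of such a split, assuming df has a unique index. The helper name split_by_query and the example data are hypothetical; only DataFrame.query and DataFrame.drop are real pandas calls.

```python
import pandas as pd


def split_by_query(df, query):
    """Return (selected, rest): rows matching ``query`` and all other rows."""
    selected = df.query(query)
    # Assumes a unique index, so dropping by label removes exactly the
    # selected rows.
    rest = df.drop(index=selected.index)
    return selected, rest


df = pd.DataFrame(
    {
        "age": [5, 8, 35, 70],
        "occupation": ["school", "school", "working", "retired"],
    }
)
pupils, others = split_by_query(df, "occupation == 'school'")
```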

_create_group_id_for_participants(df, group_size, strict_assort_by, weak_assort_by)

Create the group id for those selected by query.

The main work is done in _create_group_id_for_one_strict_assort_by_group.

_determine_group_sizes(target_size, population_size)

Calculate group sizes given a target size and a population size.

Parameters
  • target_size (int) – Target group size

  • population_size (int) – Number of people that are split into groups.

Returns

List of integers. The length is the number of groups. The entries are the group sizes. Not all groups have the same size, but they differ by at most one.

Return type

list
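One way to obtain sizes that differ by at most one is sketched below. The rule for choosing the number of groups (rounded ratio, at least one) is an assumption; the source does not state how the actual function picks it.

```python
def determine_group_sizes(target_size, population_size):
    """Split ``population_size`` people into groups of roughly ``target_size``."""
    if population_size <= 0:
        return []
    # Assumption: the number of groups is the rounded ratio, but at least one.
    n_groups = max(1, round(population_size / target_size))
    base, remainder = divmod(population_size, n_groups)
    # The first ``remainder`` groups get one extra member.
    return [base + 1] * remainder + [base] * (n_groups - remainder)


sizes = determine_group_sizes(target_size=25, population_size=103)
# sizes == [26, 26, 26, 25]
```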

_create_group_id_for_one_strict_assort_by_group(df, group_size, weak_assort_by, start_id)

Create group id for all people of the same strict_assort_by group.

To make matching as assortative as possible with respect to the weak_assort_by variables, we first try to fill each group with members of a single weak_assort_by group, starting with the largest remaining one. If that group does not have enough members, we fill the remaining places with members of the smallest remaining weak_assort_by group.

Parameters
  • df (pandas.DataFrame) – DataFrame that only contains people from one strict_assort_by_group.

  • group_size (int) – The target group size.

  • weak_assort_by (str or list) – Variable or list of variables according to which group matching should be assortative.

  • start_id (int) – The id of the first group.

Returns

The index is the same as df. The values are the group_ids.

Return type

pd.Series
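The greedy filling strategy can be sketched in plain Python. This is an illustration under assumptions, not the module's implementation: the helper assign_groups, its dictionary input (weak group key to list of member ids) and the precomputed group_sizes are hypothetical, and ties between equally large weak groups are broken by dictionary order.

```python
def assign_groups(weak_groups, group_sizes, start_id=0):
    """Greedily assign members to groups, keeping weak groups together."""
    remaining = {k: list(v) for k, v in weak_groups.items() if v}
    assignment = {}
    for offset, size in enumerate(group_sizes):
        if not remaining:
            break
        group_id = start_id + offset
        # Start each group with the largest remaining weak group.
        key = max(remaining, key=lambda k: len(remaining[k]))
        needed = size
        while needed > 0 and remaining:
            taken = remaining[key][:needed]
            remaining[key] = remaining[key][needed:]
            for member in taken:
                assignment[member] = group_id
            needed -= len(taken)
            if not remaining[key]:
                del remaining[key]
            if needed > 0 and remaining:
                # Top up from the smallest remaining weak group.
                key = min(remaining, key=lambda k: len(remaining[k]))
    return assignment


weak_groups = {"x": [1, 2, 3, 4, 5], "y": [6, 7], "z": [8]}
assignment = assign_groups(weak_groups, group_sizes=[3, 3, 2])
```

In this example the second group starts with the two remaining members of "x" and is topped up with the single member of "z", the smallest remaining weak group, so "y" stays intact for the third group.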

_get_key_with_longest_value(dict_)

Get the key from dict_ that has the longest value.

_get_key_with_shortest_value(dict_)

Get the key from dict_ that has the shortest value.
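Both helpers can be sketched with the built-in max and min and a key function. The function names here drop the leading underscore to mark them as illustrative stand-ins. When several values share the same length, max and min return the first such key in the dictionary's iteration order.

```python
def get_key_with_longest_value(dict_):
    """Return the key whose value has the largest len()."""
    return max(dict_, key=lambda k: len(dict_[k]))


def get_key_with_shortest_value(dict_):
    """Return the key whose value has the smallest len()."""
    return min(dict_, key=lambda k: len(dict_[k]))


groups = {"x": [1, 2, 3], "y": [4], "z": [5, 6]}
longest = get_key_with_longest_value(groups)    # "x"
shortest = get_key_with_shortest_value(groups)  # "y"
```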

_create_group_id_for_non_participants(df)

Create group_id for those not selected by query.