`src.create_initial_states.make_educ_group_columns`

Module Contents

Functions

`make_educ_group_columns`(states, query, group_size, strict_assort_by, weak_assort_by, adults_per_group, n_contact_models, column_prefix, occupation_name, seed)	Generate contact model columns for education contacts.
`create_balanced_group_column`(states, query, group_size, strict_assort_by, weak_assort_by)	Create a group id for a recurrent contact model with equally sized groups.
`_get_id_to_weak_group`(participants, raw_id)	Create a mapping from groups to weak_assort_by groups.
`_split_data_by_query`(df, query)	Split data into those selected by query and the rest.
`_create_group_id_for_participants`(df, group_size, strict_assort_by, weak_assort_by)	Create the group id for those selected by query.
`_determine_group_sizes`(target_size, population_size)	Calculate group sizes given a target size and a population size.
`_create_group_id_for_one_strict_assort_by_group`(df, group_size, weak_assort_by, start_id)	Create group id for all people of the same strict_assort_by group.
`_get_key_with_longest_value`(dict_)	Get the key from `dict_` that has the longest value.
`_get_key_with_shortest_value`(dict_)	Get the key from `dict_` that has the shortest value.
`_create_group_id_for_non_participants`(df)	Create group_id for those not selected by query.

make_educ_group_columns(states, query, group_size, strict_assort_by, weak_assort_by, adults_per_group, n_contact_models, column_prefix, occupation_name, seed)

Generate contact model columns for education contacts.

This generates raw group ids using create_balanced_group_column. It then replicates this column n_contact_models times and mixes in some individuals for which occupation == “working” (e.g. to simulate teachers).

Parameters

states (pandas.DataFrame) – DataFrame with background variables, including all assort_by variables.
query (str) – Query that selects which individuals are part of a group.
group_size (int) – Target group size that will be achieved approximately.
strict_assort_by (list or str) – Groups only contain individuals that have the same value in all strict_assort_by variables.
weak_assort_by (list or str) – Individuals that have the same value in all weak_assort_by variables are more likely to be matched into one group. Adults are taken from the modal weak group.
adults_per_group (int) – Number of teachers added to each class.
n_contact_models (int) – Number of contact models for which group ids are generated. This is also the average number of classes each teacher teaches.
column_prefix (str) – Prefix for column names.
occupation_name (str) – Value to which the occupation column is changed for the “working” adults that are mixed into the groups.
seed (int) – Random seed.

Returns

pd.DataFrame: The generated id columns. Column names are f"{column_prefix}_{number}", where number counts the contact models.
pd.Series: Modified occupation column in which “working” was changed to occupation_name in some cases.

Return type

pd.DataFrame, pd.Series
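The replication step described above can be sketched as follows. This is a minimal illustration, not the module's implementation: the helper name `replicate_group_column` is hypothetical, counting from 0 is an assumption, and the step that mixes in working adults is left out entirely.

```python
import pandas as pd


def replicate_group_column(raw_id, n_contact_models, column_prefix):
    """Copy one raw group id column into several contact model columns.

    Hypothetical helper: numbering the columns from 0 is an assumption.
    """
    columns = {
        f"{column_prefix}_{number}": raw_id for number in range(n_contact_models)
    }
    return pd.DataFrame(columns, index=raw_id.index)


# Toy raw ids; -1 is used here as a placeholder for "no group".
raw_id = pd.Series([0, 0, 1, -1], index=[10, 11, 12, 13])
id_columns = replicate_group_column(raw_id, 3, "school_group_id")
```

In this sketch every column is an identical copy of the raw ids; in the documented function the columns differ because adults are mixed into them.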

create_balanced_group_column(states, query, group_size, strict_assort_by, weak_assort_by)

Create a group id for a recurrent contact model with equally sized groups.

This is a low-level function that will usually be called via make_educ_group_columns.

When reading the code, it helps to distinguish four types of groups of individuals:

1. The group whose id column we want to generate, called just “group”.
2. The groups induced by the strict_assort_by variables, called “strong_group”.
3. The groups induced by the weak_assort_by variables, called “weak_group”.
4. Participants and non-participants. Participants are those selected by query.

The algorithm is deterministic but might depend on the order of states.

Parameters

states (pandas.DataFrame) – DataFrame with background variables, including all assort_by variables.
query (str) – Query that selects which individuals are part of a group.
group_size (int) – Target group size that will be achieved approximately.
strict_assort_by (list or str) – Groups only contain individuals that have the same value in all strict_assort_by variables.
weak_assort_by (list or str) – Individuals that have the same value in all weak_assort_by variables are more likely to be matched into one group.

Returns

The group_id with same index as states.

Return type

pandas.Series
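The distinction between participants, strong groups and weak groups described for create_balanced_group_column can be illustrated with a small pandas sketch. The column names (`age`, `state`, `county`) and the query are made up for this example, and numbering the induced groups with `groupby(...).ngroup()` is one possible choice, not necessarily what the module does.

```python
import pandas as pd

# Toy states with hypothetical background variables.
states = pd.DataFrame(
    {
        "age": [6, 6, 7, 7, 30],
        "state": ["A", "A", "A", "B", "A"],
        "county": [1, 2, 1, 1, 1],
    }
)

query = "age < 18"             # selects the participants
strict_assort_by = ["state"]   # groups never mix different states
weak_assort_by = ["county"]    # same county is preferred, not required

participants = states.query(query)

# Integer labels for the groups induced by the assort_by variables.
strong_group = participants.groupby(strict_assort_by).ngroup()
weak_group = participants.groupby(weak_assort_by).ngroup()
```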

_get_id_to_weak_group(participants, raw_id)

Create a mapping from groups to weak_assort_by groups.

This mapping is not unique, since each group can have members from multiple weak_assort_by groups. We make it unique by assigning the weak_assort_by group of the first group member to the whole group.

Parameters

participants (pandas.DataFrame) – DataFrame of participating individuals. It has to have the “__weak_group_id” column.
raw_id (pandas.Series) – Column giving the groups that are to be mapped to __weak_group_ids.

Returns

id_to_weak_group: The index contains the group ids in participants; the values are the first weak group id of each group.

Return type

pandas.Series
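The "first member decides" rule can be expressed with a pandas groupby. The following is a sketch with toy data, assuming raw_id shares the index of participants; it is not taken from the module's source.

```python
import pandas as pd

# Toy participants: four people with their weak group ids.
participants = pd.DataFrame(
    {"__weak_group_id": [3, 5, 5, 2]}, index=[10, 11, 12, 13]
)
# Raw group ids: persons 10 and 11 form group 0, persons 12 and 13 form group 1.
raw_id = pd.Series([0, 0, 1, 1], index=participants.index)

# For each group, take the weak group id of its first member.
id_to_weak_group = participants.groupby(raw_id)["__weak_group_id"].first()
```

Group 0 contains members of weak groups 3 and 5 but is mapped to 3 only, because person 10 comes first.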

_split_data_by_query(df, query)

Split data into those selected by query and the rest.
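One way to perform such a split with pandas is shown below. The function name `split_by_query` is hypothetical, and the sketch assumes that the index of df is unique.

```python
import pandas as pd


def split_by_query(df, query):
    """Return (selected, rest); assumes df has a unique index."""
    selected = df.query(query)
    rest = df.drop(index=selected.index)
    return selected, rest


df = pd.DataFrame({"age": [6, 30, 7]}, index=[10, 11, 12])
selected, rest = split_by_query(df, "age < 18")
```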

_create_group_id_for_participants(df, group_size, strict_assort_by, weak_assort_by)

Create the group id for those selected by query.

The main work is done in _create_group_id_for_one_strict_assort_by_group.

_determine_group_sizes(target_size, population_size)

Calculate group sizes given a target size and a population size.

Parameters

target_size (int) – Target group size
population_size (int) – Number of people that are split into groups.

Returns

List of integers. The length is the number of groups and the entries are the group sizes. Not all groups have the same size, but sizes differ by at most one.

Return type

list
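A calculation with these properties could look like the sketch below. How the number of groups is chosen (here: rounding population_size / target_size, with at least one group) is an assumption; the source only states that the resulting sizes differ by at most one.

```python
def determine_group_sizes(target_size, population_size):
    """Split population_size into group sizes that differ by at most one.

    Assumption: the number of groups is the rounded ratio of
    population_size to target_size, but never less than one.
    """
    if population_size == 0:
        return []
    n_groups = max(1, round(population_size / target_size))
    base, remainder = divmod(population_size, n_groups)
    # The first `remainder` groups get one extra member.
    return [base + 1] * remainder + [base] * (n_groups - remainder)


sizes = determine_group_sizes(target_size=25, population_size=103)
```

With 103 people and a target size of 25, this yields four groups, three of size 26 and one of size 25.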

_create_group_id_for_one_strict_assort_by_group(df, group_size, weak_assort_by, start_id)

Create group id for all people of the same strict_assort_by group.

To make matching as assortative as possible with respect to the weak_assort_by variables, we first try to fill each group with members of a single weak_assort_by group, starting with the largest remaining one. If that group does not have enough members, we fill the remaining places with members of the smallest remaining weak_assort_by group.

Parameters

df (pandas.DataFrame) – DataFrame that only contains people from one strict_assort_by_group.
group_size (int) – The target group size.
weak_assort_by (str or list) – Variable or list of variables according to which group matching should be assortative.
start_id (int) – The id of the first group.

Returns

The index is the same as df. The values are the group_ids.

Return type

pd.Series
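The filling strategy described above can be sketched as a greedy loop. This is an illustration under assumptions, not the module's code: the function name `assign_groups` is hypothetical, it takes precomputed group sizes instead of a single group_size, and the column name `county` is made up.

```python
import pandas as pd


def assign_groups(df, group_sizes, weak_assort_by, start_id):
    """Greedy sketch: largest weak group first, then fill from the smallest."""
    # Map each weak group to the index labels of its unassigned members.
    pools = {
        key: list(labels)
        for key, labels in df.groupby(weak_assort_by).groups.items()
    }
    group_id = pd.Series(-1, index=df.index, dtype=int)

    for offset, size in enumerate(group_sizes):
        members = []
        first_pick = True
        while len(members) < size and pools:
            # Start with the largest pool, then top up from the smallest.
            pick = max if first_pick else min
            key = pick(pools, key=lambda k: len(pools[k]))
            n_missing = size - len(members)
            members += pools[key][:n_missing]
            pools[key] = pools[key][n_missing:]
            if not pools[key]:
                del pools[key]
            first_pick = False
        group_id.loc[members] = start_id + offset

    return group_id


df = pd.DataFrame({"county": ["a", "a", "a", "b", "b", "c"]})
group_id = assign_groups(df, group_sizes=[4, 2], weak_assort_by="county", start_id=10)
```

In this toy example the first group takes all three members of county "a" and is topped up with the single member of county "c"; the second group then consists of county "b" only.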

_get_key_with_longest_value(dict_)

Get the key from dict_ that has the longest value.

_get_key_with_shortest_value(dict_)

Get the key from dict_ that has the shortest value.
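Both lookups can be written with the built-in max and min and a key function. The sketch below uses hypothetical names without the leading underscore; when several values share the same length, max and min return the first such key in iteration order, which may differ from the module's tie-breaking.

```python
def get_key_with_longest_value(dict_):
    """Return the key whose value has the greatest length."""
    return max(dict_, key=lambda key: len(dict_[key]))


def get_key_with_shortest_value(dict_):
    """Return the key whose value has the smallest length."""
    return min(dict_, key=lambda key: len(dict_[key]))


pools = {"a": [1, 2, 3], "b": [4], "c": [5, 6]}
longest = get_key_with_longest_value(pools)
shortest = get_key_with_shortest_value(pools)
```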

_create_group_id_for_non_participants(df)

Create group_id for those not selected by query.
