src.create_initial_states.make_educ_group_columns
Module Contents
Functions
- make_educ_group_columns: Generate contact model columns for education contacts.
- create_balanced_group_column: Create a group id for a recurrent contact model with equally sized groups.
- _get_id_to_weak_group: Create a mapping from groups to weak_assort_by groups.
- Split data into those selected by query and the rest.
- _create_group_id_for_participants: Create the group id for those selected by query.
- _determine_group_sizes: Calculate group sizes given a target size and a population size.
- _create_group_id_for_one_strict_assort_by_group: Create group id for all people of the same strict_assort_by group.
- Get the key from
- Get the key from
- Create group_id for those not selected by query.
- make_educ_group_columns(states, query, group_size, strict_assort_by, weak_assort_by, adults_per_group, n_contact_models, column_prefix, occupation_name, seed)
Generate contact model columns for education contacts.
This generates raw group ids using create_balanced_group_column. It then replicates this column n_contact_models times and mixes in some individuals for which occupation == “working” (e.g. to simulate teachers).
- Parameters
states (pandas.DataFrame) – DataFrame with background variables, including all assort_by variables.
query (str) – Query that selects which individuals are part of a group.
group_size (int) – Target group size that will be achieved approximately.
strict_assort_by (list or str) – Groups only contain individuals that have the same value in all strict_assort_by variables.
weak_assort_by (list or str) – Individuals that have the same value in all weak_assort_by variables are more likely to be matched into one group. Adults are taken from the modal weak group.
adults_per_group (int) – Number of teachers added to each class.
n_contact_models (int) – Number of contact models for which group ids are generated. This is also the average number of classes each teacher teaches.
column_prefix (str) – Prefix for column names.
occupation_name (str) – Value to which the occupation of the working adults mixed in as teachers is changed.
seed (int) – Random seed.
- Returns
- pd.DataFrame: The generated id columns. Column names are f"{prefix}_{number}", where prefix is column_prefix and number counts the contact models.
- pd.Series: Modified occupation column where “working” was changed to occupation_name in some cases.
- Return type
tuple of (pd.DataFrame, pd.Series)
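The replication step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the module's implementation: the name sketch_make_educ_columns is hypothetical, participants are assumed to carry raw ids >= 0 and everyone else -1, contact models are assumed to be numbered from 0, and teachers are drawn from all working adults instead of from the modal weak group. Each teacher is placed in exactly one class per contact model, so every teacher teaches n_contact_models classes.

```python
import numpy as np
import pandas as pd


def sketch_make_educ_columns(raw_id, occupation, adults_per_group,
                             n_contact_models, column_prefix, occupation_name,
                             seed):
    """Hypothetical sketch: replicate raw_id and mix in working adults."""
    rng = np.random.default_rng(seed)
    # Assumption: participants have ids >= 0 and everyone else has -1.
    class_ids = np.sort(raw_id[raw_id >= 0].unique())
    n_teachers = len(class_ids) * adults_per_group

    # Draw the teachers once from all working adults (no weak-group matching).
    workers = occupation.index[(occupation == "working").to_numpy()]
    teachers = rng.choice(workers.to_numpy(), size=n_teachers, replace=False)

    columns = {}
    for number in range(n_contact_models):
        col = raw_id.copy()
        # Each teacher gets exactly one class per contact model, so every
        # teacher ends up in n_contact_models classes overall.
        shuffled = rng.permutation(teachers)
        col.loc[shuffled] = np.repeat(class_ids, adults_per_group)
        columns[f"{column_prefix}_{number}"] = col

    new_occupation = occupation.copy()
    new_occupation.loc[teachers] = occupation_name
    return pd.DataFrame(columns), new_occupation
```

Reusing one set of teachers across all contact models is what links adults_per_group and n_contact_models in this sketch: with k classes it draws k * adults_per_group teachers once and only reshuffles them for each model.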
- create_balanced_group_column(states, query, group_size, strict_assort_by, weak_assort_by)
Create a group id for a recurrent contact model with equally sized groups.
This is a low-level function that is usually called via make_educ_group_columns rather than directly.
When reading the code, it is helpful to distinguish four types of groups of individuals:
1. The group whose ID column we want to generate, called just “group”.
2. The groups induced by the strict_assort_by variables, called “strong_group”.
3. The groups induced by the weak_assort_by variables, called “weak_group”.
4. Participants and non-participants. Participants are those selected by query.
The algorithm is deterministic but might depend on the order of states.
- Parameters
states (pandas.DataFrame) – DataFrame with background variables, including all assort_by variables.
query (str) – Query that selects which individuals are part of a group.
group_size (int) – Target group size that will be achieved approximately.
strict_assort_by (list or str) – Groups only contain individuals that have the same value in all strict_assort_by variables.
weak_assort_by (list or str) – Individuals that have the same value in all weak_assort_by variables are more likely to be matched into one group.
- Returns
The group id of each individual, with the same index as states.
- Return type
pd.Series
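The split into participants and non-participants (the fourth kind of group listed above) can be illustrated with pandas. This is a hypothetical sketch: both helper names are made up, and filling non-participants with -1 is an assumed convention for "not in any group" that this page does not specify.

```python
import pandas as pd


def sketch_split_by_query(states, query):
    """Split states into rows selected by query and all remaining rows."""
    participants = states.query(query)
    non_participants = states.drop(index=participants.index)
    return participants, non_participants


def sketch_assemble_group_id(states, participant_ids, fill_value=-1):
    """Combine participant ids into one column aligned with states."""
    # Assumed convention: everyone not selected by query gets fill_value.
    group_id = pd.Series(fill_value, index=states.index, dtype="int64")
    # Assignment aligns on the index, so row order does not matter here.
    group_id.loc[participant_ids.index] = participant_ids
    return group_id
```

Keeping the original index throughout is what lets the final column share its index with states, as the Returns entry above requires.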
- _get_id_to_weak_group(participants, raw_id)
Create a mapping from groups to weak_assort_by groups.
This mapping is not unique because each group can have members from multiple weak_assort_by groups. We make it unique by assigning the weak_assort_by group of the first group member to the whole group.
- Parameters
participants (pandas.DataFrame) – DataFrame of participating individuals. It has to have the “__weak_group_id” column.
raw_id (pandas.Series) – Column giving the groups that are to be mapped to __weak_group_ids.
- Returns
id_to_weak_group – The index contains the group ids in participants; the values are the weak group id of each group's first member.
- Return type
pandas.Series
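A minimal sketch of the first-member rule, assuming participants and raw_id share the same index (the function name is hypothetical). GroupBy.first returns the first non-missing value in each group, so the result depends on row order, in line with the note that the algorithm may depend on the order of states.

```python
import pandas as pd


def sketch_id_to_weak_group(participants, raw_id):
    """Map each group id to the weak group id of its first member."""
    # groupby aligns raw_id with participants on the index; first() keeps the
    # first non-missing "__weak_group_id" (in row order) within each group.
    return participants["__weak_group_id"].groupby(raw_id).first()
```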
- _create_group_id_for_participants(df, group_size, strict_assort_by, weak_assort_by)
Create the group id for those selected by query.
The main work is done in _create_group_id_for_one_strict_assort_by_group.
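The loop over strict_assort_by groups might look roughly like this. All names are hypothetical, and sketch_chunk_ids is a deliberately naive stand-in for _create_group_id_for_one_strict_assort_by_group that ignores weak_assort_by; the sketch only shows how a running start_id keeps the ids of different strict groups from colliding.

```python
import numpy as np
import pandas as pd


def sketch_chunk_ids(sub, group_size, start_id):
    """Naive stand-in for the per-strict-group step: consecutive chunks."""
    return pd.Series(start_id + np.arange(len(sub)) // group_size, index=sub.index)


def sketch_ids_for_participants(df, group_size, strict_assort_by,
                                assign_one_group=sketch_chunk_ids):
    if df.empty:
        return pd.Series([], index=df.index, dtype="int64")
    if isinstance(strict_assort_by, str):
        strict_assort_by = [strict_assort_by]
    pieces = []
    start_id = 0
    # Each strict_assort_by group is processed on its own; start_id moves past
    # the largest id used so far, so ids never overlap across strict groups.
    for _, sub in df.groupby(strict_assort_by, observed=True):
        ids = assign_one_group(sub, group_size, start_id)
        pieces.append(ids)
        start_id = int(ids.max()) + 1
    # Restore the original row order of df.
    return pd.concat(pieces).reindex(df.index)
```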
- _determine_group_sizes(target_size, population_size)
Calculate group sizes given a target size and a population size.
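One plausible way to compute such sizes is sketched below; the rule is an assumption, since this page does not describe it. The sketch rounds population_size / target_size half up to get the number of groups (at least one), then spreads the population as evenly as possible so that sizes differ by at most one. The function name is hypothetical.

```python
def sketch_group_sizes(target_size, population_size):
    """Split population_size into nearly equal groups close to target_size."""
    if population_size == 0:
        return []
    # Number of groups: population / target, rounded half up, at least one.
    n_groups = max(1, (population_size + target_size // 2) // target_size)
    base, remainder = divmod(population_size, n_groups)
    # The first `remainder` groups get one extra member.
    return [base + 1] * remainder + [base] * (n_groups - remainder)
```

For example, 25 people with a target size of 10 give three groups of sizes 9, 8 and 8.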
- _create_group_id_for_one_strict_assort_by_group(df, group_size, weak_assort_by, start_id)
Create group id for all people of the same strict_assort_by group.
To make matching as assortative as possible with respect to the weak_assort_by variables, we first try to fill each group with members of a single weak_assort_by group, starting with the largest remaining one. If this is not enough, we fill the rest of the group with members of the smallest remaining weak_assort_by group.
- Parameters
df (pandas.DataFrame) – DataFrame that only contains people from one strict_assort_by group.
group_size (int) – The target group size.
weak_assort_by (str or list) – Variable or list of variables according to which group matching should be assortative.
start_id (int) – The id of the first group.
- Returns
The index is the same as df. The values are the group_ids.
- Return type
pd.Series
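The greedy filling rule described for _create_group_id_for_one_strict_assort_by_group can be sketched as below. This is an illustration, not the module's code: the function name is hypothetical, it takes a precomputed list of group sizes that must sum to len(df) instead of a single target size, and it handles only one weak_assort_by column.

```python
import pandas as pd


def sketch_fill_one_strict_group(df, group_sizes, weak_col, start_id):
    """Greedy, weak-assortative group assignment within one strict group.

    Assumes sum(group_sizes) == len(df).
    """
    # Remaining (unassigned) index labels per weak group, in row order.
    remaining = {k: list(v) for k, v in df.groupby(weak_col).groups.items()}
    group_id = pd.Series(-1, index=df.index)
    for offset, size in enumerate(group_sizes):
        # Drop weak groups that have been used up.
        remaining = {k: v for k, v in remaining.items() if v}
        # First try to fill the group from the largest remaining weak group.
        largest = max(remaining, key=lambda k: len(remaining[k]))
        members = remaining[largest][:size]
        remaining[largest] = remaining[largest][size:]
        # Top up with members of the smallest non-empty weak group.
        while len(members) < size:
            candidates = [k for k, v in remaining.items() if v]
            smallest = min(candidates, key=lambda k: len(remaining[k]))
            take = remaining[smallest][: size - len(members)]
            remaining[smallest] = remaining[smallest][len(take):]
            members = members + take
        group_id.loc[members] = start_id + offset
    return group_id
```

Topping up from the smallest weak group, rather than the next largest, tends to use up small weak groups early, which leaves larger, more homogeneous blocks available for later groups.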