`src.prepare_data.shared`

Module Contents

`draw_groups`(df, group_name, query, assort_bys, n_per_group)	Assign individuals to random groups based on their characteristics.
`_create_groups`(df, counter, n_per_group)
`create_groups_from_dist`(initial_states, group_name, group_distribution, query, assort_bys)	Assign individuals to random groups to match a group size distribution.
`_determine_number_of_groups`(nobs, dist)
`_create_group_ids`(nr_of_groups, assort_by_vals)
`_expand_or_contract_ids`(ids, nobs, assort_by_vals)
`_check_created_groups`(group_sr, size_sr, group_distribution)
`format_thousands_with_comma`(value, pos)
`draw_from_distribution_for_subset`(states, distribution, query, outside_val)	For all workers, draw from a distribution how many people they are going to meet.
`create_age_groups`(age_sr)
`create_age_groups_rki`(df)
`relabel_age_groups_rki_for_parquet`(sr)
`from_timestamps_to_epochs`(timestamps)	Convert timestamps to epochs.
`from_epochs_to_timestamps`(epochs)	Convert epochs to timestamps.

draw_groups(df, group_name, query, assort_bys, n_per_group)

Assign individuals to random groups based on their characteristics.

Parameters

df (pandas.DataFrame) – sid states DataFrame
group_name (str) – name of the column to be created
query (str) – query string identifying who gets a group. All others are assigned -1. Make sure your contact model assigns these people 0 contacts so they do not meet.
assort_bys (list) – columns by which to group individuals, such that in every group people share all characteristics in the assort_by variables.
n_per_group (int) – number of people per group.

Returns

Series with the group ids. It has the same index as df and is -1 for individuals without a group.

Return type

drawn_groups (pandas.Series)
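The following is a minimal sketch of the behaviour described above, not this module's actual implementation. The function name `sketch_draw_groups` and its `seed` parameter are hypothetical, and it returns the Series directly instead of naming it after `group_name`. Within each combination of `assort_bys` values, members are shuffled and consecutive chunks of `n_per_group` people form one group, so the last group in a cell may be smaller.

```python
import numpy as np
import pandas as pd


def sketch_draw_groups(df, query, assort_bys, n_per_group, seed=0):
    """Hypothetical sketch: random groups of size n_per_group within assort_by cells."""
    rng = np.random.default_rng(seed)
    # Everyone starts without a group (-1), matching the documented convention.
    groups = pd.Series(-1, index=df.index, dtype=int)
    eligible = df.query(query)
    next_id = 0
    for _, cell in eligible.groupby(assort_bys):
        # Shuffle the members of this cell so group membership is random.
        shuffled = rng.permutation(cell.index.to_numpy())
        # Consecutive chunks of n_per_group people share one id; the last chunk may be smaller.
        local_ids = np.arange(len(shuffled)) // n_per_group
        groups.loc[shuffled] = local_ids + next_id
        # Keep ids unique across cells.
        next_id += local_ids.max() + 1
    return groups
```

Because every group is built inside a single cell, all members of a group share the same values in every `assort_bys` column.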

create_groups_from_dist(initial_states, group_name, group_distribution, query, assort_bys)

Assign individuals to random groups to match a group size distribution.

Notes

This could be made faster by not creating single-member groups.
Group assignment is completely random (within each combination of assort_by values). This means there is either perfect or zero assortativity in the contacts with respect to each characteristic. E.g. if age is not in the assort_bys, groups are assigned completely at random with respect to age. On the other hand, if age is in the assort_bys, all members of each group have exactly the same age.

Parameters

initial_states (pandas.DataFrame) – SID initial states DataFrame.
group_name (str) – name of the Series to be created.
group_distribution (pandas.Series) – the index is the support of the group sizes; the values are the shares of each group size we are aiming for.
query (str) – query string to identify the subpopulation for which we want to create group ids. Note that group_distribution must describe the distribution of group sizes in this subpopulation.
assort_bys (list) – columns by which to group individuals, such that in every group people share all characteristics in the assort_by variables.

Returns

Index is the same as initial_states. Values are the identifiers (strings) of each group.

Return type

group_sr (pandas.Series)
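The sketch below illustrates how a group size distribution can be turned into group ids for one subpopulation cell. It is hypothetical: the name `sketch_group_ids_from_dist` and the `seed` parameter are invented, it uses integer ids instead of the string ids described above, and it assumes the values of `group_distribution` are the shares of *groups* of each size (the docstring does not say whether they are shares of groups or of people). It first computes how many groups of each size are needed, then builds one id per person, trims surplus seats or adds single-member groups for leftover people, and finally shuffles.

```python
import numpy as np
import pandas as pd


def sketch_group_ids_from_dist(nobs, group_distribution, seed=0):
    """Hypothetical sketch: assign nobs people to groups matching a size distribution."""
    rng = np.random.default_rng(seed)
    sizes = group_distribution.index.to_numpy()
    # ASSUMPTION: values are shares of groups of each size; normalise them to sum to 1.
    shares = group_distribution.to_numpy() / group_distribution.sum()
    # Expected group size under that assumption.
    mean_size = (sizes * shares).sum()
    # Number of groups of each size needed to hold roughly nobs people.
    n_groups = np.round(shares * nobs / mean_size).astype(int)
    # One entry per seat: group k of size s contributes s copies of k.
    sizes_per_group = np.repeat(sizes, n_groups)
    ids = np.repeat(np.arange(len(sizes_per_group)), sizes_per_group)
    if len(ids) > nobs:
        # Drop surplus seats; the last groups shrink or disappear.
        ids = ids[:nobs]
    elif len(ids) < nobs:
        # Leftover people become single-member groups with fresh ids.
        first_new = len(sizes_per_group)
        extra = np.arange(first_new, first_new + nobs - len(ids))
        ids = np.concatenate([ids, extra])
    # Random assignment of people to seats.
    return rng.permutation(ids)
```

In the full function this would run once per combination of `assort_bys` values within the `query` subpopulation, with ids made unique across cells.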

draw_from_distribution_for_subset(states, distribution, query, outside_val)

For all workers, draw from a distribution how many people they are going to meet.

Parameters

states (pandas.DataFrame) – sid states DataFrame
distribution (pandas.Series) – index is the support, values are the probabilities.
query (str) – query to identify the subset of the states for which to draw the number of contacts.
outside_val – value the output Series takes for individuals who do not fulfill the query condition.

Returns

Index is the same as states. The values are outside_val for anyone outside the query and drawn from the distribution for the rest.

Return type

contacts (pandas.Series)
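A minimal sketch of the documented behaviour, assuming NumPy's `Generator.choice` as the sampler; the function name `sketch_draw_for_subset` and the `seed` parameter are hypothetical and not part of this module. Rows matching `query` receive a draw from the support of `distribution` with the given probabilities; all other rows keep `outside_val`.

```python
import numpy as np
import pandas as pd


def sketch_draw_for_subset(states, distribution, query, outside_val, seed=0):
    """Hypothetical sketch: draw values for the query subset, outside_val elsewhere."""
    rng = np.random.default_rng(seed)
    # Start with outside_val for everyone.
    out = pd.Series(outside_val, index=states.index)
    inside = states.query(query).index
    # Normalise so small rounding errors in the probabilities do not make choice() fail.
    probs = distribution.to_numpy() / distribution.sum()
    out.loc[inside] = rng.choice(distribution.index.to_numpy(), size=len(inside), p=probs)
    return out
```

For example, `sketch_draw_for_subset(states, pd.Series([0.5, 0.5], index=[1, 3]), "occupation == 'worker'", 0)` gives every worker either 1 or 3 contacts and everyone else 0.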

from_timestamps_to_epochs(timestamps)

Convert timestamps to epochs.
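The docstring does not state the epoch unit or reference date. The sketch below assumes whole seconds since the Unix epoch (1970-01-01, timezone-naive); the name `sketch_timestamps_to_epochs` is hypothetical and the real function may use a different unit.

```python
import pandas as pd


def sketch_timestamps_to_epochs(timestamps):
    """Hypothetical sketch: timestamps -> integer seconds since 1970-01-01."""
    # ASSUMPTION: epochs are whole seconds since the naive Unix epoch.
    ts = pd.to_datetime(pd.Series(timestamps))
    # Floor-dividing a timedelta Series by one second yields integer seconds.
    return (ts - pd.Timestamp("1970-01-01")) // pd.Timedelta(seconds=1)
```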

from_epochs_to_timestamps(epochs)

Convert epochs to timestamps.
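As with the forward conversion, the epoch unit is not documented. This hypothetical sketch (`sketch_epochs_to_timestamps` is an invented name) assumes seconds since 1970-01-01 and relies on the `unit` argument of `pandas.to_datetime`.

```python
import pandas as pd


def sketch_epochs_to_timestamps(epochs):
    """Hypothetical sketch: seconds since 1970-01-01 -> pandas timestamps."""
    # ASSUMPTION: epochs are seconds since the naive Unix epoch.
    return pd.to_datetime(pd.Series(epochs), unit="s")
```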