src.prepare_data.shared

Module Contents

Functions

draw_groups(df, group_name, query, assort_bys, n_per_group)

Assign individuals to random groups based on their characteristics.

_create_groups(df, counter, n_per_group)

create_groups_from_dist(initial_states, group_name, group_distribution, query, assort_bys)

Assign individuals to random groups to match a group size distribution.

_determine_number_of_groups(nobs, dist)

_create_group_ids(nr_of_groups, assort_by_vals)

_expand_or_contract_ids(ids, nobs, assort_by_vals)

_check_created_groups(group_sr, size_sr, group_distribution)

format_thousands_with_comma(value, pos)

draw_from_distribution_for_subset(states, distribution, query, outside_val)

For all workers, draw from a distribution the number of people each one is going to meet.

create_age_groups(age_sr)

create_age_groups_rki(df)

relabel_age_groups_rki_for_parquet(sr)

from_timestamps_to_epochs(timestamps)

Convert timestamps to epochs.

from_epochs_to_timestamps(epochs)

Convert epochs to timestamps.

draw_groups(df, group_name, query, assort_bys, n_per_group)

Assign individuals to random groups based on their characteristics.

Parameters
  • df (pandas.DataFrame) – sid states DataFrame

  • group_name (str) – name of the column to be created

  • query (str) – query string identifying who gets a group. Everyone else is assigned -1. Make sure your contact model assigns these individuals 0 contacts so they do not meet.

  • assort_bys (list) – columns by which to group individuals, such that in every group people share all characteristics in the assort_by variables.

  • n_per_group (int) – number of people per group.

Returns

Series with the group ids.

It has the same index as df. It’s -1 for individuals without a group.

Return type

drawn_groups (pandas.Series)
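The behaviour described above can be illustrated with the following sketch. It is not the module's implementation: draw_groups_sketch and its seed argument are hypothetical, and it assumes that eligible individuals are shuffled within each assort_bys cell and cut into consecutive chunks of n_per_group, so the last group in a cell may be smaller. Individuals outside query keep -1.

```python
import numpy as np
import pandas as pd


def draw_groups_sketch(df, group_name, query, assort_bys, n_per_group, seed=0):
    """Hypothetical sketch of draw_groups; not the module's implementation.

    Assumes assort_bys is a non-empty list of column names.
    """
    rng = np.random.default_rng(seed)
    # Everyone starts without a group (-1).
    groups = pd.Series(-1, index=df.index, name=group_name)
    next_id = 0
    # Groups are formed separately within each combination of assort_bys values,
    # so all members of a group share these characteristics.
    for _, cell in df.query(query).groupby(assort_bys):
        shuffled = rng.permutation(cell.index.to_numpy())
        # Consecutive chunks of n_per_group shuffled members form one group;
        # the last chunk in a cell may be smaller (assumed remainder handling).
        ids = next_id + np.arange(len(shuffled)) // n_per_group
        groups.loc[shuffled] = ids
        next_id = ids.max() + 1
    return groups


df = pd.DataFrame({"age": [30, 40, 50, 60, 10, 35], "region": ["a", "a", "a", "b", "b", "b"]})
print(draw_groups_sketch(df, "work_group", "age >= 18", ["region"], 2))
```

Because the shuffle happens inside each cell, the sketch has perfect assortativity on the assort_bys columns and none on any other column.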

_create_groups(df, counter, n_per_group)

create_groups_from_dist(initial_states, group_name, group_distribution, query, assort_bys)

Assign individuals to random groups to match a group size distribution.

Notes

  • This could be made faster by not creating single-member groups.

  • Group assignment is completely random (within each assort_by value combination). This means there is either perfect or zero assortativity in the contacts with respect to characteristics. For example, if age is not among the assort_bys, groups are assigned completely at random with respect to age. If age is among the assort_bys, all members of each group have exactly the same age.

Parameters
  • initial_states (pandas.DataFrame) – SID initial states DataFrame.

  • group_name (str) – name of the Series to be created.

  • group_distribution (pandas.Series) – the index is the support of the group sizes; the values are the shares of each group size we are aiming for.

  • query (str) – query string to identify the subpopulation for which we want to create group ids. Note that group_distribution must describe the distribution of group sizes in this subpopulation.

  • assort_bys (list) – columns by which to group individuals, such that in every group people share all characteristics in the assort_by variables.

Returns

The index is the same as in initial_states. The values are the identifiers (strings) of each group.

Return type

group_sr (pandas.Series)
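A rough sketch of how group sizes could be matched to group_distribution within each assort_bys cell. create_groups_from_dist_sketch and its seed argument are hypothetical, and the sketch makes assumptions the description above does not confirm: the values of group_distribution are read as shares of groups (not of individuals), rounding errors are fixed by truncating surplus slots or padding with single-member groups, and individuals outside query get a missing value.

```python
import numpy as np
import pandas as pd


def create_groups_from_dist_sketch(
    initial_states, group_name, group_distribution, query, assort_bys, seed=0
):
    """Hypothetical sketch; not the module's implementation."""
    rng = np.random.default_rng(seed)
    sizes = group_distribution.index.to_numpy()
    # Assumption: values are shares of *groups* of each size.
    shares = group_distribution.to_numpy() / group_distribution.sum()
    mean_size = (sizes * shares).sum()

    # Assumption: individuals outside the query get a missing value.
    group_sr = pd.Series(pd.NA, index=initial_states.index, name=group_name, dtype="object")

    for key, cell in initial_states.query(query).groupby(assort_bys):
        nobs = len(cell)
        key = key if isinstance(key, tuple) else (key,)
        prefix = "_".join(str(k) for k in key)

        # Number of groups of each size so that the total roughly equals nobs.
        counts = np.round(nobs / mean_size * shares).astype(int)

        ids, next_id = [], 0
        for size, count in zip(sizes, counts):
            for _ in range(count):
                ids.extend([f"{prefix}_{next_id}"] * int(size))
                next_id += 1

        # Rounding may over- or undershoot: drop surplus slots, pad with singletons.
        ids = ids[:nobs]
        while len(ids) < nobs:
            ids.append(f"{prefix}_{next_id}")
            next_id += 1

        # Completely random assignment within the cell.
        group_sr.loc[cell.index] = rng.permutation(ids)
    return group_sr
```

Prefixing each identifier with the assort_bys values keeps groups from different cells distinct, which is one way to get string identifiers as described in the return value.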

_determine_number_of_groups(nobs, dist)

_create_group_ids(nr_of_groups, assort_by_vals)

_expand_or_contract_ids(ids, nobs, assort_by_vals)

_check_created_groups(group_sr, size_sr, group_distribution)

format_thousands_with_comma(value, pos)

draw_from_distribution_for_subset(states, distribution, query, outside_val)

For all workers, draw from a distribution the number of people each one is going to meet.

Parameters
  • states (pandas.DataFrame) – sid states DataFrame

  • distribution (pandas.Series) – index is the support, values are the probabilities.

  • query (str) – query to identify the subset of the states for which to draw the number of contacts.

  • outside_val – value the output Series takes for individuals who do not fulfill the query condition.

Returns

The index is the same as in states. The values are outside_val for anyone outside the query and drawn from the distribution for everyone else.

Return type

contacts (pandas.Series)
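A minimal sketch of the documented behaviour, assuming the subset is selected with DataFrame.query and the draws are made with NumPy's Generator.choice, using the distribution's values as probabilities (they must sum to 1). The function name and the seed argument are hypothetical.

```python
import numpy as np
import pandas as pd


def draw_from_distribution_for_subset_sketch(states, distribution, query, outside_val, seed=0):
    """Hypothetical sketch; not the module's implementation."""
    rng = np.random.default_rng(seed)
    # Start with outside_val for everyone; overwrite it for the queried subset.
    contacts = pd.Series(outside_val, index=states.index)
    selected = states.query(query).index
    # Draw from the support (index) with the given probabilities (values).
    draws = rng.choice(
        distribution.index.to_numpy(), size=len(selected), p=distribution.to_numpy()
    )
    contacts.loc[selected] = draws
    return contacts


states = pd.DataFrame({"occupation": ["worker", "retired", "worker", "worker"]})
dist = pd.Series([0.2, 0.5, 0.3], index=[1, 2, 5])
print(draw_from_distribution_for_subset_sketch(states, dist, "occupation == 'worker'", 0))
```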

create_age_groups(age_sr)

create_age_groups_rki(df)

relabel_age_groups_rki_for_parquet(sr)

from_timestamps_to_epochs(timestamps)

Convert timestamps to epochs.

https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#from-timestamps-to-epoch

from_epochs_to_timestamps(epochs)

Convert epochs to timestamps.

https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#epoch-timestamps
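For the inverse direction, the linked guide uses pandas.to_datetime with a unit argument. A sketch assuming the epochs are integer seconds since 1970-01-01; the function name is hypothetical.

```python
import pandas as pd


def from_epochs_to_timestamps_sketch(epochs):
    """Hypothetical sketch: interpret epochs as seconds since 1970-01-01."""
    return pd.to_datetime(epochs, unit="s")


epochs = pd.Series([10, 1583020800])
print(from_epochs_to_timestamps_sketch(epochs).tolist())
```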