src.prepare_data.shared
Module Contents
Functions
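| draw_groups(df, group_name, query, assort_bys, n_per_group) | Assign individuals to random groups based on their characteristics. |
| create_groups_from_dist(initial_states, group_name, group_distribution, query, assort_bys) | Assign individuals to random groups to match a group size distribution. |
| draw_from_distribution_for_subset(states, distribution, query, outside_val) | For all workers, draw from a distribution how many people they are going to meet. |
| from_timestamps_to_epochs(timestamps) | Convert timestamps to epochs. |
| from_epochs_to_timestamps(epochs) | Convert epochs to timestamps. |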
- draw_groups(df, group_name, query, assort_bys, n_per_group)
Assign individuals to random groups based on their characteristics.
- Parameters
df (pandas.DataFrame) – sid states DataFrame
group_name (str) – name of the column to be created
query (str) – query string identifying who gets a group. All others are assigned -1. Make sure your contact model assigns these individuals 0 contacts so that they do not meet.
assort_bys (list) – columns by which to group individuals, such that in every group people share all characteristics in the assort_bys variables.
n_per_group (int) – number of people per group.
- Returns
- Series with the group ids. It has the same index as df and is -1 for individuals without a group.
- Return type
drawn_groups (pandas.Series)
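The behavior described above (random groups of n_per_group within each combination of assort_bys values, -1 for everyone outside the query) can be illustrated with a minimal sketch. This is not the module's implementation: the name `draw_groups_sketch`, the extra `seed` parameter, integer group ids that are unique across strata, and the choice to let the last group of a stratum be smaller than n_per_group are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def draw_groups_sketch(df, group_name, query, assort_bys, n_per_group, seed=0):
    """Hypothetical re-implementation, not the module's actual code."""
    rng = np.random.default_rng(seed)
    # Everyone starts without a group (-1).
    groups = pd.Series(-1, index=df.index, name=group_name)
    # Only individuals matching the query receive a group.
    eligible = df.query(query)
    next_id = 0
    for _, stratum in eligible.groupby(assort_bys):
        # Shuffle the stratum so that group membership is random.
        members = rng.permutation(stratum.index.to_numpy())
        # Consecutive chunks of n_per_group members form one group; the last
        # chunk may be smaller. Offsetting by next_id keeps ids unique.
        local_ids = np.arange(len(members)) // n_per_group
        groups.loc[members] = local_ids + next_id
        next_id += int(local_ids.max()) + 1
    return groups


df = pd.DataFrame(
    {
        "age_group": ["young", "young", "young", "old", "old", "old"],
        "occupation": ["worker", "worker", "worker", "worker", "worker", "retired"],
    }
)
groups = draw_groups_sketch(
    df, "work_group", "occupation == 'worker'", ["age_group"], n_per_group=2
)
```

The retired individual keeps -1, and no group mixes the "young" and "old" strata, which is the perfect-assortativeness property that assort_bys is meant to guarantee.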
- create_groups_from_dist(initial_states, group_name, group_distribution, query, assort_bys)
Assign individuals to random groups to match a group size distribution.
Notes
This could be made faster by not creating single-member groups.
- Group assignment is completely random within each combination of assort_bys values. This means that there is either perfect or zero assortativeness in the contacts with respect to a characteristic. For example, if age is not among the assort_bys, groups are assigned completely at random with respect to age; if age is among the assort_bys, all members of each group have exactly the same age.
- Parameters
initial_states (pandas.DataFrame) – sid initial states DataFrame.
group_name (str) – name of the Series to be created.
group_distribution (pandas.Series) – the index is the support of the group sizes; the values are the shares of each group size we are aiming for.
query (str) – query string identifying the subpopulation for which we want to create group ids. Note that group_distribution must describe the distribution of group sizes in this subpopulation.
assort_bys (list) – columns by which to group individuals, such that in every group people share all characteristics in the assort_bys variables.
- Returns
- Series with the same index as initial_states. Its values are the identifiers (strings) of the groups.
- Return type
group_sr (pandas.Series)
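One way to match a group size distribution is to draw group sizes from group_distribution until every member of a stratum is covered, as in the minimal sketch below. It is not the module's implementation, and several details are assumptions: the name `create_groups_from_dist_sketch` and the `seed` parameter are hypothetical, the values of group_distribution are read as the probability that a newly formed group has a given size (sizes assumed positive), the last group of a stratum is truncated, string ids are built from the stratum values, and individuals outside the query get the placeholder id "-1".

```python
import numpy as np
import pandas as pd


def create_groups_from_dist_sketch(
    initial_states, group_name, group_distribution, query, assort_bys, seed=0
):
    """Hypothetical re-implementation, not the module's actual code."""
    rng = np.random.default_rng(seed)
    # Assumption: individuals outside the query get the id "-1".
    group_sr = pd.Series("-1", index=initial_states.index, name=group_name, dtype=object)
    sizes = group_distribution.index.to_numpy()
    # Assumption: values are the probability that a new group has that size.
    probs = group_distribution.to_numpy() / group_distribution.sum()
    eligible = initial_states.query(query)
    for key, stratum in eligible.groupby(assort_bys):
        members = rng.permutation(stratum.index.to_numpy())
        # Draw group sizes until every member of the stratum is covered;
        # the last group is truncated to the members that are left.
        drawn = []
        while sum(drawn) < len(members):
            drawn.append(int(rng.choice(sizes, p=probs)))
        # Position i belongs to the first group whose cumulative size exceeds i.
        local_ids = np.searchsorted(np.cumsum(drawn), np.arange(len(members)), side="right")
        # Prefix ids with the stratum values so they are unique across strata.
        key = key if isinstance(key, tuple) else (key,)
        prefix = "_".join(str(k) for k in key)
        group_sr.loc[members] = [f"{prefix}_{i}" for i in local_ids]
    return group_sr


states = pd.DataFrame({"county": ["a"] * 6 + ["b"] * 4, "works": [1] * 9 + [0]})
group_distribution = pd.Series([0.5, 0.5], index=[1, 3])
groups = create_groups_from_dist_sketch(
    states, "work_group", group_distribution, "works == 1", ["county"]
)
```

Because sizes are drawn per stratum, the realized size distribution only approximates group_distribution, and the truncated last group can add small groups that the target distribution does not contain.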
- draw_from_distribution_for_subset(states, distribution, query, outside_val)
For all workers, draw from a distribution how many people they are going to meet.
- Parameters
states (pandas.DataFrame) – sid states DataFrame
distribution (pandas.Series) – index is the support, values are the probabilities.
query (str) – query string identifying the subset of states for which to draw the number of contacts.
outside_val – value the output Series takes for individuals who do not fulfill the query condition.
- Returns
- Series with the same index as states. Its values are outside_val for anyone outside the query and drawn from the distribution for everyone else.
- Return type
contacts (pandas.Series)
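The draw itself amounts to sampling from a discrete distribution for the rows selected by the query. The minimal sketch below shows one way to do this with NumPy's `Generator.choice`. It is not the module's implementation: the name `draw_from_distribution_for_subset_sketch`, the `seed` parameter, and the renormalization of the probabilities are assumptions for illustration.

```python
import numpy as np
import pandas as pd


def draw_from_distribution_for_subset_sketch(states, distribution, query, outside_val, seed=0):
    """Hypothetical re-implementation, not the module's actual code."""
    rng = np.random.default_rng(seed)
    # Start with outside_val for everyone.
    contacts = pd.Series(outside_val, index=states.index)
    # Boolean mask of the individuals selected by the query.
    in_subset = states.eval(query)
    # Renormalize in case the probabilities do not sum exactly to one.
    probs = distribution.to_numpy() / distribution.sum()
    contacts.loc[in_subset] = rng.choice(
        distribution.index.to_numpy(), size=int(in_subset.sum()), p=probs
    )
    return contacts


states = pd.DataFrame({"occupation": ["worker", "retired", "worker", "school"]})
distribution = pd.Series([0.2, 0.5, 0.3], index=[0, 1, 2])
contacts = draw_from_distribution_for_subset_sketch(
    states, distribution, "occupation == 'worker'", outside_val=0
)
```

Using outside_val=0 here matches the advice above that individuals who should not meet anyone get zero contacts.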
- from_timestamps_to_epochs(timestamps)
Convert timestamps to epochs.
https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#from-timestamps-to-epoch
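The linked pandas user guide section computes epochs by subtracting the Unix epoch and floor-dividing by a one-second timedelta. A minimal sketch of that recipe follows. It assumes epochs are whole seconds since 1970-01-01 for naive timestamps, which this page does not state, and the name `from_timestamps_to_epochs_sketch` is hypothetical.

```python
import pandas as pd


def from_timestamps_to_epochs_sketch(timestamps):
    """Hypothetical sketch following the pandas user guide recipe."""
    # Time elapsed since the Unix epoch, floor-divided into whole seconds;
    # any sub-second part is discarded.
    return (timestamps - pd.Timestamp("1970-01-01")) // pd.Timedelta("1s")


stamps = pd.date_range("2012-10-08 18:15:05", periods=4, freq="D")
epochs = from_timestamps_to_epochs_sketch(stamps)
```

Consecutive days differ by 86,400 seconds, so the resulting integers are evenly spaced.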
- from_epochs_to_timestamps(epochs)
Convert epochs to timestamps.
https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#epoch-timestamps
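The inverse direction in the linked pandas section uses `pd.to_datetime` with a `unit` argument. The sketch below assumes, as in the previous conversion, that epochs count seconds since 1970-01-01; the name `from_epochs_to_timestamps_sketch` is hypothetical.

```python
import pandas as pd


def from_epochs_to_timestamps_sketch(epochs):
    """Hypothetical sketch following the pandas user guide recipe."""
    # Interpret the integers as seconds since the Unix epoch.
    return pd.to_datetime(epochs, unit="s")


timestamps = from_epochs_to_timestamps_sketch([1349720105, 1349806505])
```

If both conversions use seconds, applying them one after the other returns the original timestamps up to sub-second precision.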