src.create_initial_states.create_initial_infections
Module Contents
Functions
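create_initial_infections(empirical_infections, synthetic_data, start, end, seed, virus_shares, reporting_delay, population_size) | Create a DataFrame with initial infections.
_calculate_group_infection_probs(cases, population_size, synthetic_data) | Calculate the infection probability for each group and date.
_draw_bools_by_group(synthetic_data, group_by, probabilities) | Draw boolean values for each individual in synthetic data.
_unbiased_sum_preserving_round(arr) | Round values in an array, preserving the sum as well as possible.
_add_variant_info_to_infections(bool_df, virus_shares) | Draw which infections are of which virus variant.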
-
create_initial_infections(empirical_infections, synthetic_data, start, end, seed, virus_shares, reporting_delay, population_size)
Create a DataFrame with initial infections.
Warning
If a person is drawn to be newly infected more than once, she is only infected on the first date. If the probability of being infected is large, leaving this uncorrected leads to a lower infection probability in the simulated data than in the empirical data.
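The first-date rule in the warning can be illustrated with a small sketch. The helper name `keep_first_infection` and the toy data are hypothetical and not part of the module; the sketch only assumes a boolean DataFrame with one column per date, as described under Returns.

```python
import pandas as pd


def keep_first_infection(bool_df):
    """Keep only the first True in each row (hypothetical helper)."""
    # Running count of infections per individual across the date columns.
    infections_so_far = bool_df.astype(int).cumsum(axis=1)
    # A draw survives only if it is the first infection of that individual.
    return bool_df & (infections_so_far == 1)


dates = pd.date_range("2020-10-01", periods=3)
drawn = pd.DataFrame(
    [
        [False, True, True],    # drawn twice, only the first date is kept
        [True, False, True],    # drawn twice, only the first date is kept
        [False, False, False],  # never drawn
    ],
    columns=dates,
)
corrected = keep_first_infection(drawn)
```

In this toy example four infections are drawn but only two remain after the correction, which is the downward effect the warning describes.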
- Parameters
empirical_infections (pandas.Series) – Series of newly infected individuals with the index levels ["date", "county", "age_group_rki"]. Should already be corrected upwards to include undetected cases.
synthetic_data (pandas.DataFrame) – Dataset with one row per simulated individual. Must contain the columns age_group_rki and county.
start (str or pd.Timestamp) – Start date.
end (str or pd.Timestamp) – End date.
seed (int) – Seed for the random draws.
virus_shares (dict or None) – If None, it is assumed that there is only one strain. If dict, keys are the names of the virus strains and the values are pandas.Series with a DatetimeIndex and the share among newly infected individuals on each day as value.
reporting_delay (int) – Number of days by which the reporting of cases is delayed. If given, the infections for the requested time frame are taken from correspondingly later days.
population_size (int) – Population size behind the empirical_infections.
- Returns
DataFrame with same index as synthetic_data and one column for each day between start and end. Dtype is boolean or categorical. Values identify which individual gets infected with which variant.
- Return type
pandas.DataFrame
-
_calculate_group_infection_probs(cases, population_size, synthetic_data)
Calculate the infection probability for each group and date.
- Parameters
cases (pandas.DataFrame) – Columns are the dates; the index levels are counties and age groups.
population_size (int) – Size of the population from which the cases originate.
synthetic_data (pandas.DataFrame) – Dataset with one row per simulated individual. Must contain the columns age_group_rki and county.
- Returns
Columns are dates; the index levels are counties and age groups. The values are the probabilities of being infected, by age group, on a particular date.
- Return type
group_infection_probs (pandas.DataFrame)
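The exact formula is not given in this reference. One plausible reading, shown here as a sketch under stated assumptions, is to estimate each group's size in the full population from its share in the synthetic data and to divide the cases by that size. The function name `group_infection_probs_sketch`, the clipping at 1 and the toy data are assumptions for illustration only.

```python
import pandas as pd


def group_infection_probs_sketch(cases, population_size, synthetic_data):
    """Cases divided by estimated group size (assumed formula)."""
    group_by = ["county", "age_group_rki"]
    # Share of each group among the simulated individuals.
    group_shares = synthetic_data.groupby(group_by).size() / len(synthetic_data)
    # Assumption: the synthetic data is representative, so the same shares
    # apply to the population behind the case numbers.
    group_sizes = group_shares * population_size
    # Divide every date column by the size of the row's group.
    probs = cases.div(group_sizes.reindex(cases.index), axis=0)
    # Assumption: probabilities are capped at 1.
    return probs.clip(upper=1.0)


synthetic_data = pd.DataFrame(
    {
        "county": ["A", "A", "B", "B"],
        "age_group_rki": ["0-4", "5-14", "0-4", "0-4"],
    }
)
dates = pd.date_range("2020-10-01", periods=2)
index = pd.MultiIndex.from_tuples(
    [("A", "0-4"), ("A", "5-14"), ("B", "0-4")],
    names=["county", "age_group_rki"],
)
cases = pd.DataFrame([[5, 10], [0, 25], [50, 5]], index=index, columns=dates)
probs = group_infection_probs_sketch(cases, 1000, synthetic_data)
```

With a population of 1000, the group ("A", "0-4") is estimated at 250 people, so 5 cases on the first date correspond to a probability of 0.02.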
-
_draw_bools_by_group(synthetic_data, group_by, probabilities)
Draw boolean values for each individual in synthetic data.
- Parameters
synthetic_data (pd.DataFrame) – Synthetic data set containing the group_by variables.
group_by (list) – List of variables according to which the data are grouped.
probabilities (pd.DataFrame) – The index levels are the group_by variables. There can be several columns with probabilities.
- Returns
pandas.DataFrame or pandas.Series
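A minimal sketch of such a draw, assuming each individual is matched to the probabilities of its group and compared against an independent uniform random number. The function name `draw_bools_by_group_sketch` and its `seed` argument are hypothetical; the documented function takes no seed.

```python
import numpy as np
import pandas as pd


def draw_bools_by_group_sketch(synthetic_data, group_by, probabilities, seed=0):
    """Draw one boolean per individual and probability column (sketch)."""
    rng = np.random.default_rng(seed)
    # Look up each individual's group probabilities with a left join.
    individual_probs = synthetic_data[group_by].join(probabilities, on=group_by)
    individual_probs = individual_probs.drop(columns=group_by)
    # One uniform draw in [0, 1) per cell; True where the draw is below the
    # probability. Individuals without a matching group end up False.
    draws = rng.uniform(size=individual_probs.shape)
    return individual_probs > draws


synthetic_data = pd.DataFrame(
    {"county": ["A", "A", "B"], "age_group_rki": ["0-4", "0-4", "0-4"]}
)
dates = pd.date_range("2020-10-01", periods=2)
probabilities = pd.DataFrame(
    [[1.0, 0.0], [0.0, 1.0]],
    index=pd.MultiIndex.from_tuples(
        [("A", "0-4"), ("B", "0-4")], names=["county", "age_group_rki"]
    ),
    columns=dates,
)
result = draw_bools_by_group_sketch(
    synthetic_data, ["county", "age_group_rki"], probabilities
)
```

The toy probabilities are 0 or 1, so the outcome does not depend on the seed: individuals in county A are drawn on the first date only, the individual in county B on the second date only.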
-
_unbiased_sum_preserving_round(arr)
Round values in an array, preserving the sum as well as possible. The function loops over the elements of an array and accumulates each element's deviation from the next lower integer. Whenever the accumulated deviation reaches a predefined threshold, 1 is added to the current rounded-down element and the accumulated deviation is reduced by 1.
- Parameters
arr (numpy.ndarray) – 1d numpy array.
- Returns
numpy.ndarray
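The loop described above can be sketched as follows. The threshold value is not stated in this reference; 0.5 is an assumption here, exposed as a hypothetical `threshold` argument, and the function name is likewise illustrative.

```python
import numpy as np


def unbiased_sum_preserving_round_sketch(arr, threshold=0.5):
    """Round down, carrying the accumulated remainder forward (sketch)."""
    arr = np.asarray(arr, dtype=float)
    rounded = np.floor(arr)
    collected = 0.0
    for i in range(len(arr)):
        # Add this element's deviation from its floor to the running total.
        collected += arr[i] - rounded[i]
        if collected >= threshold:
            # Round this element up and pay for it from the running total.
            rounded[i] += 1
            collected -= 1
    return rounded


values = np.array([0.25, 0.25, 0.25, 0.25, 1.5, 2.5])
result = unbiased_sum_preserving_round_sketch(values)
```

Plain rounding with `np.round` would turn this input into a sum of 4 (banker's rounding sends 1.5 and 2.5 both to 2, and every 0.25 to 0), whereas the sketch returns values that sum to 5, matching the input.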
-
_add_variant_info_to_infections(bool_df, virus_shares)
Draw which infections are of which virus variant.
- Parameters
bool_df (pandas.DataFrame) – DataFrame with same index as synthetic_data and one column for each day between start and end. True for individuals being infected on each day.
virus_shares (dict) – A mapping between the names of the virus strains and their share among newly infected individuals over time.
- Returns
DataFrame with same index as synthetic_data and one column for each day between start and end. Dtype is categorical, identifying which individual gets infected on each day by which variant.
- Return type
virus_strain_infections (pandas.DataFrame)
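A sketch of how such a draw could work, assuming the variant of each infection is drawn independently according to that day's shares. The function name, the `seed` argument, the strain names and the use of missing values for uninfected individuals are assumptions for illustration, not the module's confirmed behavior.

```python
import numpy as np
import pandas as pd


def add_variant_info_sketch(bool_df, virus_shares, seed=0):
    """Replace True by a drawn variant name, one column per day (sketch)."""
    rng = np.random.default_rng(seed)
    names = list(virus_shares)
    out = pd.DataFrame(index=bool_df.index)
    for date in bool_df.columns:
        # Shares of all variants among new infections on this day.
        shares = np.array([virus_shares[name].loc[date] for name in names], dtype=float)
        shares = shares / shares.sum()  # guard against rounding in the input
        # Draw a variant for everyone, then keep it only for the infected.
        drawn = rng.choice(names, size=len(bool_df), p=shares)
        infected = bool_df[date].to_numpy()
        column = pd.Series(drawn, index=bool_df.index).where(infected)
        # Assumption: uninfected individuals get a missing value.
        out[date] = pd.Categorical(column, categories=names)
    return out


dates = pd.date_range("2020-10-01", periods=2)
virus_shares = {
    "base_strain": pd.Series([1.0, 0.0], index=dates),
    "b117": pd.Series([0.0, 1.0], index=dates),
}
bool_df = pd.DataFrame([[True, False], [False, True], [True, True]], columns=dates)
result = add_variant_info_sketch(bool_df, virus_shares)
```

The toy shares are degenerate (all infections on the first day are "base_strain", all on the second day "b117"), so the result does not depend on the seed.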