src.create_initial_states.create_initial_infections

Module Contents

Functions

create_initial_infections(empirical_infections, synthetic_data, start, end, seed, virus_shares, reporting_delay, population_size)

Create a DataFrame with initial infections.

_calculate_group_infection_probs(cases, population_size, synthetic_data)

Calculate the infection probability for each group and date.

_draw_bools_by_group(synthetic_data, group_by, probabilities)

Draw boolean values for each individual in synthetic data.

_unbiased_sum_preserving_round(arr)

Round values in an array, preserving the sum as well as possible.

_add_variant_info_to_infections(bool_df, virus_shares)

Draw which infections are of which virus variant.

create_initial_infections(empirical_infections, synthetic_data, start, end, seed, virus_shares, reporting_delay, population_size)

Create a DataFrame with initial infections.

Warning

If a person is drawn to be newly infected more than once, she is only infected on the first date. If the probability of being infected is large, not correcting for this leads to a lower infection probability than in the empirical data.
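The rule in this warning can be illustrated with a minimal sketch. It is not taken from the module's source: the helper name keep_first_infection and the example array are hypothetical, and the input is assumed to be a boolean array with one row per individual and one column per day.

```python
import numpy as np


def keep_first_infection(infected):
    """Hypothetical helper: keep only the first True in each row."""
    infected = np.asarray(infected, dtype=bool)
    # The running count of True values per row equals 1 from the first
    # infection until the next one, so combining it with the original
    # mask keeps exactly the first infection of each individual.
    return infected & (np.cumsum(infected, axis=1) == 1)


# Rows are individuals, columns are consecutive days.
draws = np.array(
    [
        [False, True, True],    # drawn twice, only day 2 is kept
        [True, False, True],    # drawn twice, only day 1 is kept
        [False, False, False],  # never drawn
    ]
)
first_only = keep_first_infection(draws)
```

Because later draws of the same person are dropped, the number of infections in first_only can be smaller than the number of draws, which is the downward bias the warning describes.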

Parameters
  • empirical_infections (pandas.Series) – Series of newly infected individuals with the index levels ["date", "county", "age_group_rki"]. Should already be corrected upwards to include undetected cases.

  • synthetic_data (pandas.DataFrame) – Dataset with one row per simulated individual. Must contain the columns age_group_rki and county.

  • start (str or pd.Timestamp) – Start date.

  • end (str or pd.Timestamp) – End date.

  • seed (int) – Seed for the random draws.

  • virus_shares (dict or None) – If None, it is assumed that there is only one strain. If dict, keys are the names of the virus strains and the values are pandas.Series with a DatetimeIndex and the share among newly infected individuals on each day as value.

  • reporting_delay (int) – Number of days by which the reporting of cases is delayed. If given, later days are used to get the infections of the demanded time frame.

  • population_size (int) – Population size behind the empirical_infections.

Returns

DataFrame with the same index as synthetic_data and one column for each day between start and end. Dtype is boolean or categorical. Values identify which individual gets infected with which variant.

Return type

pandas.DataFrame

_calculate_group_infection_probs(cases, population_size, synthetic_data)

Calculate the infection probability for each group and date.

Parameters
  • cases (pandas.DataFrame) – Columns are the dates; the index contains counties and age groups.

  • population_size (int) – Size of the population from which the cases originate.

  • synthetic_data (pandas.DataFrame) – Dataset with one row per simulated individual. Must contain the columns age_group_rki and county.

Returns

group_infection_probs: Columns are dates; the index contains counties and age groups. The values are the probabilities of being infected for each group on a particular date.

Return type

pandas.DataFrame
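One way such probabilities could be computed is sketched below. This is an assumption, not the documented implementation: the sketch scales the synthetic group sizes up to population_size and divides the cases by these scaled sizes. The function name, the age group labels, and the example numbers are hypothetical.

```python
import pandas as pd


def group_infection_probs_sketch(cases, population_size, synthetic_data):
    """Hypothetical sketch: cases divided by upscaled synthetic group sizes."""
    group_by = ["county", "age_group_rki"]
    # Assumption: each synthetic individual represents the same number of people.
    upscale_factor = population_size / len(synthetic_data)
    group_sizes = synthetic_data.groupby(group_by).size() * upscale_factor
    # Groups without reported cases get zero cases.
    cases = cases.reindex(group_sizes.index).fillna(0)
    return cases.div(group_sizes, axis=0)


synthetic_data = pd.DataFrame(
    {
        "county": ["A", "A", "A", "B"],
        "age_group_rki": ["15-34", "15-34", "60-79", "15-34"],
    }
)
cases = pd.DataFrame(
    {"2020-10-01": [20.0, 5.0]},
    index=pd.MultiIndex.from_tuples(
        [("A", "15-34"), ("A", "60-79")], names=["county", "age_group_rki"]
    ),
)
probs = group_infection_probs_sketch(
    cases, population_size=400, synthetic_data=synthetic_data
)
```

In this example each synthetic individual stands for 100 people, so the group ("A", "15-34") represents 200 people and its 20 cases give a probability of 0.1.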

_draw_bools_by_group(synthetic_data, group_by, probabilities)

Draw boolean values for each individual in synthetic data.

Parameters
  • synthetic_data (pd.DataFrame) – Synthetic data set containing the group_by variables.

  • group_by (list) – List of variables according to which the data are grouped.

  • probabilities (pd.DataFrame) – The index levels are the group_by variables. There can be several columns with probabilities.

Returns

pandas.DataFrame or pandas.Series
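The drawing step could look roughly like the following sketch. It assumes that every individual receives one independent Bernoulli draw per probability column, using the probability of her group. The function name and the seed argument are hypothetical and not part of the documented signature.

```python
import numpy as np
import pandas as pd


def draw_bools_by_group_sketch(synthetic_data, group_by, probabilities, seed=0):
    """Hypothetical sketch: one Bernoulli draw per individual and column."""
    rng = np.random.default_rng(seed)
    # Attach each individual's group probabilities via the group_by columns.
    individual_probs = synthetic_data[group_by].join(probabilities, on=group_by)
    individual_probs = individual_probs[probabilities.columns].to_numpy(dtype=float)
    # A uniform draw in [0, 1) is below p with probability p.
    draws = rng.random(individual_probs.shape) < individual_probs
    return pd.DataFrame(
        draws, index=synthetic_data.index, columns=probabilities.columns
    )


group_by = ["county", "age_group_rki"]
synthetic_data = pd.DataFrame(
    {"county": ["A", "A", "B"], "age_group_rki": ["0-4", "0-4", "5-14"]}
)
# Probabilities of 0 and 1 make the outcome deterministic for illustration.
probabilities = pd.DataFrame(
    {"2020-10-01": [1.0, 0.0], "2020-10-02": [0.0, 1.0]},
    index=pd.MultiIndex.from_tuples([("A", "0-4"), ("B", "5-14")], names=group_by),
)
bools = draw_bools_by_group_sketch(synthetic_data, group_by, probabilities)
```

The result has the same index as synthetic_data and one boolean column per probability column.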

_unbiased_sum_preserving_round(arr)

Round values in an array, preserving the sum as well as possible.

The function loops over the elements of the array and accumulates each element's deviation from its value rounded down to the nearest integer. Whenever the accumulated deviations reach a predefined threshold, 1 is added to the current element and the accumulated deviations are reduced by 1.

Parameters
  • arr (numpy.ndarray) – 1d numpy array.

Returns

numpy.ndarray
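The procedure described for this function can be sketched as follows. The threshold value is not stated in the documentation, so the default of 0.5 used here is an assumption, as is the function name.

```python
import numpy as np


def sum_preserving_round_sketch(arr, threshold=0.5):
    """Hypothetical sketch of the described rounding; threshold is assumed."""
    arr = np.asarray(arr, dtype=float)
    floored = np.floor(arr)
    rounded = floored.astype(int)
    carried = 0.0
    for i, deviation in enumerate(arr - floored):
        # Accumulate what was lost by rounding this element down.
        carried += deviation
        if carried >= threshold:
            # Hand back one unit to the current element.
            rounded[i] += 1
            carried -= 1.0
    return rounded


values = np.array([1.25, 0.25, 2.25, 0.25])
rounded = sum_preserving_round_sketch(values)
```

Here the values sum to 4.0 and the rounded array is [1, 1, 2, 0], which also sums to 4. Rounding each element independently would give [1, 0, 2, 0] and lose one unit.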

_add_variant_info_to_infections(bool_df, virus_shares)

Draw which infections are of which virus variant.

Parameters
  • bool_df (pandas.DataFrame) – DataFrame with same index as synthetic_data and one column for each day between start and end. True for individuals being infected on each day.

  • virus_shares (dict) – A mapping between the names of the virus strains and their share among newly infected individuals over time.

Returns

virus_strain_infections: DataFrame with the same index as synthetic_data and one column for each day between start and end. Dtype is categorical, identifying which individual gets infected on each day by which variant.

Return type

pandas.DataFrame
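A rough sketch of how variants might be assigned is given below. It assumes that, on each day, a variant is drawn for every infected individual with probabilities equal to that day's shares, and that individuals who are not infected get a missing value. The function name, the seed argument, the strain names, and the handling of non-infected individuals are all hypothetical.

```python
import numpy as np
import pandas as pd


def add_variant_info_sketch(bool_df, virus_shares, seed=0):
    """Hypothetical sketch: draw a variant for each infection on each day."""
    rng = np.random.default_rng(seed)
    names = list(virus_shares)
    out = pd.DataFrame(index=bool_df.index)
    for date in bool_df.columns:
        shares = np.array([virus_shares[name].loc[date] for name in names], dtype=float)
        shares = shares / shares.sum()  # guard against shares not summing to 1
        drawn = pd.Series(
            rng.choice(names, size=len(bool_df), p=shares), index=bool_df.index
        )
        # Keep the drawn variant only where the individual is infected that day.
        out[date] = pd.Categorical(drawn.where(bool_df[date]), categories=names)
    return out


dates = pd.date_range("2020-10-01", periods=2)
bool_df = pd.DataFrame(
    [[True, False], [False, True], [True, True]], columns=dates
)
# Shares of 0 and 1 make the outcome deterministic for illustration.
virus_shares = {
    "base": pd.Series([1.0, 0.0], index=dates),
    "b117": pd.Series([0.0, 1.0], index=dates),
}
variants = add_variant_info_sketch(bool_df, virus_shares)
```

With these shares every infection on the first day is of the "base" strain and every infection on the second day is of the "b117" strain.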