src.contact_models.task_create_contact_params

Module Contents

Functions

_create_parametrization()

Create the pytask parametrization.

task_calculate_and_plot_nr_of_contacts(depends_on, specs, produces)

_create_assort_params(model_name, contacts, places, recurrent, frequency, weekend)

_create_n_contacts(contacts, places, recurrent, frequency, weekend)

Calculate the reported number of contacts for the given specification.

_get_relevant_contacts_subset(contacts, places, recurrent, frequency, weekend)

Reduce the contacts data to that relevant for the current contact model.

_reduce_empirical_distribution_to_max_contacts(empirical_distribution, max_contacts, assert_below_this)

Find the closest distribution that has no more than max_contacts.

measure_of_diff_btw_distributions(params, old_distribution, desired_total)

Cost function for the minimization problem.

_make_decreasing(sr)

Make a pandas.Series decreasing.

Attributes

_create_parametrization()

Create the pytask parametrization.

Each entry includes the criteria that decide which contacts count toward a contact type. These are specified in data_selection_criteria, which maps each contact type to the criteria a contact must fulfill to belong to it and, optionally, the maximum number of contacts allowed.

The parametrization is a list of tuples: the first element of each tuple is the data_selection_criteria entry and the second is the list of expected outputs.
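
The shape of such a parametrization can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the contact type names, the criteria keys and values, the output file names and the helper build_parametrization are hypothetical and not taken from the module.

```python
# Hypothetical criteria: the names, keys and values are illustrative only.
data_selection_criteria = {
    "work_recurrent_daily": {
        "places": ["work"],
        "recurrent": True,
        "frequency": "daily",
        "weekend": False,
        "max_contacts": 15,  # optional upper bound on reported contacts
    },
    "other_non_recurrent": {
        "places": ["leisure"],
        "recurrent": False,
        "frequency": None,  # None: keep all frequencies
        "weekend": None,  # None: keep weekdays and weekends
    },
}


def build_parametrization(criteria):
    """Pair each criteria entry with the list of outputs expected for it."""
    parametrization = []
    for name, spec in criteria.items():
        # Hypothetical output names, one data file and one plot per type.
        expected_outputs = [f"{name}.pkl", f"{name}.png"]
        parametrization.append((spec, expected_outputs))
    return parametrization


parametrization = build_parametrization(data_selection_criteria)
```

Each tuple can then be unpacked into the specs and produces arguments of a task function.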

PARAMETRIZATION
task_calculate_and_plot_nr_of_contacts(depends_on, specs, produces)
_create_assort_params(model_name, contacts, places, recurrent, frequency, weekend)
_create_n_contacts(contacts, places, recurrent, frequency, weekend)

Calculate the reported number of contacts for the given specification.

Parameters
  • contacts (pandas.DataFrame) –

  • places (list) – list of the places belonging to the current contact model. If places is ["work"], only workers are counted as reporting contacts.

  • recurrent (bool) – whether to keep recurrent or non-recurrent contacts

  • frequency (str) – which frequency to keep. None means all frequencies are kept

  • weekend (bool) – whether to use weekday or weekend data. None means both are kept.

Returns

The index contains the reporting individuals (or the reporting workers if places == ["work"]) and the values are the number of contacts each individual reported that meet the conditions (with respect to recurrent, places, frequency and weekend).

Return type

n_contacts (pandas.Series)
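
As a rough illustration of the returned Series, the following sketch counts condition-meeting contacts per reporting individual. The column names ("id", "place", "recurrent"), the toy data and the helper count_contacts are assumptions; the sketch also ignores frequency, weekend and the restriction to workers.

```python
import pandas as pd

# Toy contact diary: one row per reported contact (hypothetical columns).
contacts = pd.DataFrame(
    {
        "id": [1, 1, 1, 2, 3, 3],
        "place": ["work", "work", "leisure", "leisure", "work", "work"],
        "recurrent": [True, False, True, True, True, True],
    }
)


def count_contacts(contacts, places, recurrent):
    """Count, per reporting individual, the contacts meeting the conditions."""
    meets = contacts["place"].isin(places) & (contacts["recurrent"] == recurrent)
    # Summing the boolean flags per individual keeps individuals with
    # zero matching contacts in the index.
    return meets.groupby(contacts["id"]).sum().astype(int)


n_contacts = count_contacts(contacts, places=["work"], recurrent=True)
```

Grouping the boolean mask, rather than filtering first, is what keeps individual 2 in the result with a count of zero.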

_get_relevant_contacts_subset(contacts, places, recurrent, frequency, weekend)

Reduce the contacts data to that relevant for the current contact model.

Parameters
  • contacts (pandas.DataFrame) –

  • places (list) – list of the places belonging to the current contact model

  • recurrent (bool) – whether to keep recurrent or non-recurrent contacts

  • frequency (str) – which frequency to keep. None means all frequencies are kept

  • weekend (bool) – whether to use weekday or weekend data. None means both are kept.

Returns

Reduced DataFrame that only contains longer contacts of the specified places, frequency and day of week. It is further reduced to workers only when looking at work contacts.

Return type

df (pandas.DataFrame)
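
The "None means all are kept" convention for frequency and weekend can be sketched as follows. The column names, the frequency labels and the helper select_contacts are assumptions, and the sketch leaves out the duration and worker filters mentioned above.

```python
import pandas as pd

# Toy data with hypothetical columns and labels.
contacts = pd.DataFrame(
    {
        "place": ["work", "work", "leisure", "work"],
        "recurrent": [True, True, True, False],
        "frequency": ["daily", "weekly", "daily", "daily"],
        "weekend": [False, True, False, False],
    }
)


def select_contacts(contacts, places, recurrent, frequency=None, weekend=None):
    """Keep only the rows that match the given selection criteria."""
    keep = contacts["place"].isin(places) & (contacts["recurrent"] == recurrent)
    if frequency is not None:  # None: do not filter on frequency
        keep &= contacts["frequency"] == frequency
    if weekend is not None:  # None: keep weekday and weekend rows
        keep &= contacts["weekend"] == weekend
    return contacts[keep]


all_work = select_contacts(contacts, ["work"], recurrent=True)
daily_work = select_contacts(contacts, ["work"], recurrent=True, frequency="daily")
weekend_work = select_contacts(contacts, ["work"], recurrent=True, weekend=True)
```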

_reduce_empirical_distribution_to_max_contacts(empirical_distribution, max_contacts, assert_below_this)

Find the closest distribution that has no more than max_contacts.

Parameters
  • empirical_distribution (pandas.Series) – value counts of the number of reported contacts.

  • max_contacts (int) – maximum allowed number of reported contacts

Returns

Approximated value counts of the number of reported contacts, adjusted so that no one has more than max_contacts contacts, the number of individuals reporting contacts stays the same, and the total number of reported contacts over all individuals stays close to the original.

Return type

closest_distribution (pandas.Series)
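
To see why an approximation is needed at all, consider plain truncation, sketched below. This is not the module's method: it is only a naive baseline in which everyone above the cap is moved onto the cap, and the helper names and toy numbers are hypothetical. Truncation preserves the number of individuals but lowers the total number of contacts, which is the gap the approximation tries to close.

```python
import pandas as pd

# Value counts: index = number of reported contacts, value = individuals.
empirical_distribution = pd.Series({0: 5, 1: 10, 2: 4, 5: 1})


def truncate_value_counts(distribution, max_contacts):
    """Move all individuals above max_contacts onto max_contacts."""
    # The grouping function is applied to each index value.
    return distribution.groupby(lambda n: min(n, max_contacts)).sum()


def total_contacts(distribution):
    """Dot product of the support (index) and the value counts."""
    return int((distribution.index.to_numpy() * distribution.to_numpy()).sum())


truncated = truncate_value_counts(empirical_distribution, max_contacts=3)
individuals_before = int(empirical_distribution.sum())
individuals_after = int(truncated.sum())
total_before = total_contacts(empirical_distribution)
total_after = total_contacts(truncated)
```

In this toy example the number of individuals is unchanged while the total number of contacts falls, because the one person who reported five contacts now counts as three.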

measure_of_diff_btw_distributions(params, old_distribution, desired_total)

Cost function for the minimization problem.

We try to minimize the difference between the truncated value counts and the proposed distribution while keeping the number of total contacts as close as possible to the original distribution.

Parameters
  • params (pandas.DataFrame) – estimagic params DataFrame. The index is the support of the distribution; the "value" column holds the number of individuals reporting the respective number of contacts.

  • old_distribution (pandas.Series) – The distribution to be approximated. The index is the support of the distribution. It must be the same as that of params.

  • desired_total (int) – desired value of the dot product of the support (the index of params) and the "value" column. In the context of reported contacts this is the total number of contacts reported by all individuals.

Returns

Sum of the parameter distance penalty and a penalty for the distance between the desired and achieved total.

Return type

cost (float)
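
A cost of this form might look like the sketch below. The squared penalties and their equal weighting are assumptions chosen for illustration; the documentation only states that the cost is the sum of the two penalties, so the actual functional form may differ. The function name cost_sketch is hypothetical.

```python
import pandas as pd


def cost_sketch(params, old_distribution, desired_total):
    """Sum of a distance penalty and a penalty on the achieved total."""
    proposed = params["value"]
    # Penalty 1 (assumed squared): distance between proposed and old counts.
    distance_penalty = ((proposed - old_distribution) ** 2).sum()
    # Achieved total: dot product of the support and the proposed counts.
    achieved_total = (proposed.index.to_numpy() * proposed.to_numpy()).sum()
    # Penalty 2 (assumed squared): gap between achieved and desired total.
    total_penalty = (achieved_total - desired_total) ** 2
    return float(distance_penalty + total_penalty)


old_distribution = pd.Series([5, 10, 4, 1], index=[0, 1, 2, 3])
params_same = pd.DataFrame({"value": [5, 10, 4, 1]}, index=[0, 1, 2, 3])
params_shifted = pd.DataFrame({"value": [5, 10, 3, 2]}, index=[0, 1, 2, 3])

cost_same = cost_sketch(params_same, old_distribution, desired_total=23)
cost_shifted = cost_sketch(params_shifted, old_distribution, desired_total=23)
```

Here the unchanged proposal pays only for missing the desired total, while the shifted proposal trades a small distance penalty for a total closer to the target.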

_make_decreasing(sr)

Make a pandas.Series decreasing.
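
The documentation does not say how the Series is made decreasing. One possible way, shown purely as an assumption, is a running minimum, which yields a non-increasing Series; the helper name below is hypothetical and the module may use a different rule.

```python
import pandas as pd


def make_decreasing_sketch(sr):
    """Replace each value by the minimum of all values up to that position."""
    return sr.cummin()


sr = pd.Series([5, 7, 3, 4, 1])
decreasing = make_decreasing_sketch(sr)
```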