src.contact_models.task_create_contact_params
Module Contents
Functions
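_create_parametrization() | Create the pytask parametrization.
_create_n_contacts(contacts, places, recurrent, frequency, weekend) | Calculate the reported number of contacts for the given specification.
_get_relevant_contacts_subset(contacts, places, recurrent, frequency, weekend) | Reduce the contacts data to that relevant for the current contact model.
_reduce_empirical_distribution_to_max_contacts(empirical_distribution, max_contacts, assert_below_this) | Find the closest distribution that has no more than max_contacts.
measure_of_diff_btw_distributions(params, old_distribution, desired_total) | Cost function for the minimization problem.
Make a pandas.Series decreasing.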
- _create_parametrization()
Create the pytask parametrization.
Each entry of data_selection_criteria maps a contact type to the criteria that a contact must fulfill to count towards that type and, optionally, to the maximal number of contacts allowed.
The parametrization is a list of tuples in which the first element is the data_selection_criteria entry and the second is the list of expected outputs.
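The structure described above could look roughly like the following sketch. The contact type names, the keys inside each criteria entry, the output directory and the helper name create_parametrization are all hypothetical and only illustrate the list-of-tuples layout.

```python
from pathlib import Path

# Hypothetical criteria: the type names and keys are illustrative assumptions.
data_selection_criteria = {
    "work_recurrent": {
        "places": ["work"],
        "recurrent": True,
        "frequency": "daily",
        "weekend": False,
        "max_contacts": 15,  # optional upper bound on reported contacts
    },
    "other_non_recurrent": {
        "places": ["leisure"],
        "recurrent": False,
        "frequency": None,  # None keeps all frequencies
        "weekend": None,  # None keeps weekdays and weekends
    },
}


def create_parametrization(criteria, out_dir):
    """Return a list of (criteria entry, expected outputs) tuples."""
    parametrization = []
    for name, spec in criteria.items():
        produces = [out_dir / f"{name}.pkl"]  # assumed output naming
        parametrization.append((spec, produces))
    return parametrization


parametrization = create_parametrization(data_selection_criteria, Path("bld"))
```

Each tuple can then be handed to the task function as one parametrized run.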
- _create_n_contacts(contacts, places, recurrent, frequency, weekend)
Calculate the reported number of contacts for the given specification.
- Parameters
contacts (pandas.DataFrame) –
places (list) – list of the places belonging to the current contact model. If places is ["work"], only workers are counted as reporting contacts.
recurrent (bool) – whether to keep recurrent or non-recurrent contacts.
frequency (str) – which frequency to keep. None means all frequencies are kept.
weekend (bool) – whether to use weekday or weekend data. None means both are kept.
- Returns
the index contains the reporting individuals (or the reporting workers if places == ["work"]) and the values are the number of contacts each individual reported that meet the conditions (with respect to recurrent, places, frequency and weekend).
- Return type
n_contacts (pandas.Series)
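A minimal sketch of the counting step, assuming the contacts have already been filtered to the relevant rows and that each row is one reported contact with an "id" column identifying the reporting individual. The column name, the helper name count_reported_contacts and the explicit list of reporting ids are assumptions for illustration, not the module's actual data layout.

```python
import pandas as pd


def count_reported_contacts(relevant, reporting_ids):
    """Count rows per individual, keeping individuals with zero contacts."""
    counts = relevant.groupby("id").size()
    # Individuals without any condition-meeting contact get a count of 0.
    return counts.reindex(reporting_ids, fill_value=0).rename("n_contacts")


# Toy data: individual 2 reports, but has no contact meeting the conditions.
relevant = pd.DataFrame({"id": [1, 1, 3], "place": ["work", "work", "work"]})
reporting_ids = pd.Index([1, 2, 3], name="id")
n_contacts = count_reported_contacts(relevant, reporting_ids)
```

Reindexing on all reporting individuals matters because a plain groupby would silently drop everyone who reported no qualifying contact.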
- _get_relevant_contacts_subset(contacts, places, recurrent, frequency, weekend)
Reduce the contacts data to that relevant for the current contact model.
- Parameters
contacts (pandas.DataFrame) –
places (list) – list of the places belonging to the current contact model
recurrent (bool) – whether to keep recurrent or non-recurrent contacts.
frequency (str) – which frequency to keep. None means all frequencies are kept.
weekend (bool) – whether to use weekday or weekend data. None means both are kept.
- Returns
reduced DataFrame that only contains longer contacts of the specified places, frequency and day of week. The data is reduced to workers only when looking at work contacts.
- Return type
df (pandas.DataFrame)
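The filtering logic can be sketched with boolean masks. The column names ("place", "recurrent", "frequency", "weekend", "is_worker") and the helper name relevant_subset are assumptions, and the restriction to longer contacts mentioned above is left out because the source does not say how contact duration is encoded.

```python
import pandas as pd


def relevant_subset(contacts, places, recurrent, frequency=None, weekend=None):
    """Keep only the rows that match the given specification."""
    keep = contacts["place"].isin(places) & (contacts["recurrent"] == recurrent)
    if frequency is not None:  # None keeps all frequencies
        keep &= contacts["frequency"] == frequency
    if weekend is not None:  # None keeps weekdays and weekends
        keep &= contacts["weekend"] == weekend
    if places == ["work"]:  # work contacts are only counted for workers
        keep &= contacts["is_worker"]
    return contacts[keep]


contacts = pd.DataFrame(
    {
        "id": [1, 1, 1, 2, 3],
        "place": ["work", "work", "leisure", "leisure", "work"],
        "recurrent": [True, True, False, False, True],
        "frequency": ["daily", "weekly", "daily", "weekly", "daily"],
        "weekend": [False, False, True, False, False],
        "is_worker": [True, True, True, False, True],
    }
)
daily_work = relevant_subset(contacts, ["work"], recurrent=True, frequency="daily")
weekend_leisure = relevant_subset(
    contacts, ["leisure"], recurrent=False, weekend=True
)
```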
- _reduce_empirical_distribution_to_max_contacts(empirical_distribution, max_contacts, assert_below_this)
Find the closest distribution that has no more than max_contacts.
- Parameters
empirical_distribution (pandas.Series) – value counts of the number of reported contacts.
max_contacts (int) – maximum allowed number of reported contacts.
- Returns
approximated value counts of the number of reported contacts. They are adjusted so that no one has too many contacts, the number of individuals reporting contacts stays the same and the total number of reported contacts over all individuals is close to the original total.
- Return type
closest_distribution (pandas.Series)
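To see why a search for the closest distribution is needed at all, consider the naive alternative of clipping. The sketch below is not the function's method; it simply moves everyone above max_contacts down to max_contacts. The helper name clip_distribution and the toy numbers are assumptions.

```python
import pandas as pd


def clip_distribution(empirical_distribution, max_contacts):
    """Naive baseline: cap the support at max_contacts and merge the counts."""
    support = empirical_distribution.index.to_series()
    clipped_support = support.clip(upper=max_contacts)
    return empirical_distribution.groupby(clipped_support.to_numpy()).sum()


def total_contacts(distribution):
    """Total number of reported contacts implied by the value counts."""
    return int((distribution.index.to_numpy() * distribution.to_numpy()).sum())


# Index: number of reported contacts, values: number of individuals.
empirical = pd.Series([50, 30, 15, 4, 1], index=[0, 1, 2, 5, 40])
clipped = clip_distribution(empirical, max_contacts=5)
```

Clipping keeps the number of individuals unchanged but lowers the total number of reported contacts, which is the loss the documented function tries to keep small.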
- measure_of_diff_btw_distributions(params, old_distribution, desired_total)
Cost function for the minimization problem.
We try to minimize the difference between the truncated value counts and the proposed distribution while keeping the number of total contacts as close as possible to the original distribution.
- Parameters
params (pandas.DataFrame) – estimagic params DataFrame. The index is the support of the distribution, the "value" column contains the number of individuals reporting the respective number of contacts.
old_distribution (pandas.Series) – The distribution to be approximated. The index is the support of the distribution. It must be the same as that of params.
desired_total (int) – desired value of the dot product of the params values with the support. In the context of the number of reported contacts, this is the total number of contacts reported by all individuals.
- Returns
sum of the parameter distance penalty and a penalty for the distance between the desired and achieved total.
- Return type
cost (float)
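A cost function of this shape might be sketched as follows. The squared-error form of both penalties, the total_weight argument and the function name cost_sketch are assumptions; the source only states that the result is the sum of a parameter distance penalty and a penalty on the total.

```python
import numpy as np
import pandas as pd


def cost_sketch(params, old_distribution, desired_total, total_weight=1.0):
    """Distance to the old distribution plus a penalty on the total."""
    proposed = params["value"]
    # Assumed form: squared distance between proposed and old value counts.
    distance_penalty = ((proposed - old_distribution) ** 2).sum()
    # Achieved total: dot product of the counts with the support (the index).
    achieved_total = np.dot(proposed.to_numpy(), proposed.index.to_numpy())
    total_penalty = total_weight * (achieved_total - desired_total) ** 2
    return float(distance_penalty + total_penalty)


support = [0, 1, 2, 5]
old_distribution = pd.Series([50.0, 30.0, 15.0, 5.0], index=support)
params = pd.DataFrame({"value": [49.0, 30.0, 15.0, 6.0]}, index=support)
cost = cost_sketch(params, old_distribution, desired_total=90)
```

In this toy case the proposal moves one individual from 0 to 5 contacts, so it pays a small distance penalty while hitting the desired total exactly.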