june.infection_seed.observed_to_cases

class june.infection_seed.observed_to_cases.Observed2Cases(age_per_area_df: pandas.core.frame.DataFrame, female_fraction_per_area_df: pandas.core.frame.DataFrame, health_index_generator: HealthIndexGenerator = None, symptoms_trajectories: Optional[TrajectoryMaker] = None, n_observed_deaths: Optional[pandas.core.frame.DataFrame] = None, area_super_region_df: Optional[pandas.core.frame.DataFrame] = None, smoothing=False)

Class to convert observed deaths over time into the predicted number of latent cases over time, used to seed the infection. It reads population data to compute average death rates for a particular region, timings from the trajectories config file to estimate the median time it takes for someone infected to die in hospital, and the health index to obtain the death rate as a function of age and sex.

age_per_area_df:

data frame with the age distribution per area, to compute the weighted death rate

female_fraction_per_area_df:

data frame with the fraction of females per area as a function of age to compute the weighted death rate

health_index_generator:

generator of the health index to compute death_rate(age,sex)

symptoms_trajectories:

used to read the trajectory config file and compute the median time it takes to die in hospital

n_observed_deaths:

time series with the number of observed deaths per region

area_super_region_df:

df with area, super_area, region mapping

smoothing:

whether to smooth the observed deaths time series before computing the expected number of cases (so that the estimates become less dependent on spikes in the data)

_smooth_time_series(time_series_df: pandas.core.frame.DataFrame) → pandas.core.frame.DataFrame

Smooth a time series by applying a gaussian filter in 1d

time_series_df:

df with time as index

smoothed time series df
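As a rough illustration of what one-dimensional Gaussian smoothing does to a deaths time series, the sketch below convolves a list of daily counts with a Gaussian kernel. It uses plain Python lists in place of a data frame. The helper name `gaussian_smooth`, the `sigma` and `truncate` values, and the edge handling (renormalising the kernel where it runs off the ends) are assumptions made for illustration and may differ from what `_smooth_time_series` does.

```python
import math


def gaussian_smooth(values, sigma=2.0, truncate=4.0):
    """Smooth a sequence with a Gaussian kernel (hypothetical helper)."""
    radius = int(truncate * sigma + 0.5)
    offsets = range(-radius, radius + 1)
    kernel = [math.exp(-0.5 * (k / sigma) ** 2) for k in offsets]
    smoothed = []
    for i in range(len(values)):
        total, weight = 0.0, 0.0
        for k, w in zip(offsets, kernel):
            j = i + k
            # Only use samples inside the series; renormalise by the weights used.
            if 0 <= j < len(values):
                total += w * values[j]
                weight += w
        smoothed.append(total / weight)
    return smoothed


# Made-up daily deaths with two isolated spikes.
daily_deaths = [0, 0, 10, 0, 0, 0, 8, 0, 0]
smoothed = gaussian_smooth(daily_deaths, sigma=1.0)
```

The spikes are spread over neighbouring days, so a single noisy day has less influence on the inferred number of cases.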

aggregate_age_sex_dfs_by_region(age_per_area_df: pandas.core.frame.DataFrame, female_fraction_per_area_df: pandas.core.frame.DataFrame) → Tuple[pandas.core.frame.DataFrame, pandas.core.frame.DataFrame]

Combines the age per area dataframe and female fraction per area to create two data frames with numbers of females by age per region, and numbers of males by age per region

age_per_area_df:

data frame with the number of people with a certain age per area

female_fraction_per_area_df:

fraction of those that are females per area and age bin

females_per_age_region_df:

number of females as a function of age per region

males_per_age_region_df:

number of males as a function of age per region
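The split into females and males per region can be sketched as follows. This is a simplified illustration with nested dictionaries instead of data frames. The helper name `split_by_sex_per_region`, the `area_to_region` mapping argument, and the assumption that counts and fractions share the same age keys are hypothetical (the real inputs use single-year ages and age bins).

```python
from collections import defaultdict


def split_by_sex_per_region(age_counts, female_fraction, area_to_region):
    """Return (females, males) as region -> age -> number of people."""
    females = defaultdict(lambda: defaultdict(float))
    males = defaultdict(lambda: defaultdict(float))
    for area, counts in age_counts.items():
        region = area_to_region[area]
        for age, n_people in counts.items():
            fraction = female_fraction[area][age]
            females[region][age] += n_people * fraction
            males[region][age] += n_people * (1.0 - fraction)
    return females, males


# Made-up numbers for two areas in the same region.
age_counts = {"A1": {30: 100, 80: 20}, "A2": {30: 50, 80: 10}}
female_fraction = {"A1": {30: 0.5, 80: 0.6}, "A2": {30: 0.4, 80: 0.7}}
area_to_region = {"A1": "North", "A2": "North"}

females, males = split_by_sex_per_region(age_counts, female_fraction, area_to_region)
```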

aggregate_areas_by_region(df_per_area: pandas.core.frame.DataFrame) → pandas.core.frame.DataFrame

Aggregates an area dataframe into a region dataframe

df_per_area:

data frame indexed by area

convert_regional_cases_to_super_area(n_cases_per_region_df: pandas.core.frame.DataFrame, dates: Union[List[str], Dict[str, List]]) → pandas.core.frame.DataFrame

Converts regional cases to cases by super area by weighting each super area within the region according to its population

n_cases_per_region_df:

data frame with the number of cases by region, indexed by date

dates:

dates to select (it can be a dictionary with different dates for different regions)

data frame with the number of cases by super area, indexed by date
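A minimal sketch of the population weighting, using dictionaries for a single date. The helper name `regional_to_super_area` and its arguments are hypothetical; the weight of each super area is taken to be its share of the population of its region.

```python
def regional_to_super_area(cases_per_region, super_area_population, super_area_region):
    """Distribute regional cases over super areas in proportion to population."""
    region_population = {}
    for super_area, population in super_area_population.items():
        region = super_area_region[super_area]
        region_population[region] = region_population.get(region, 0) + population

    cases = {}
    for super_area, population in super_area_population.items():
        region = super_area_region[super_area]
        weight = population / region_population[region]
        cases[super_area] = weight * cases_per_region[region]
    return cases


# Made-up example: one region with two super areas.
cases_per_super_area = regional_to_super_area(
    cases_per_region={"London": 100},
    super_area_population={"SA1": 3000, "SA2": 1000},
    super_area_region={"SA1": "London", "SA2": "London"},
)
```

With these numbers, `SA1` holds three quarters of the regional population and therefore receives three quarters of the regional cases.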

filter_symptoms_trajectories(symptoms_trajectories: List[TrajectoryMaker], symptoms_to_keep: Tuple[str] = ('dead_hospital', 'dead_icu')) → List[june.infection.trajectory_maker.TrajectoryMaker]

Filter all symptoms trajectories to obtain only the ones that contain the symptoms given in `symptoms_to_keep`

symptoms_trajectories:

list of all symptoms trajectories

symptoms_to_keep:

tuple of strings containing the desired symptoms for which to find trajectories

trajectories containing `symptoms_to_keep`
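The filtering idea can be illustrated with a stand-in trajectory type. `SketchTrajectory` and its `symptom_tags` attribute are hypothetical and are not the actual trajectory classes; the sketch keeps a trajectory if any of its stages carries one of the requested tags, which is an assumption about what "contain" means here.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SketchTrajectory:
    """Hypothetical stand-in: the ordered symptom tags of a trajectory."""
    symptom_tags: List[str]


def filter_trajectories(
    trajectories: List[SketchTrajectory],
    symptoms_to_keep: Tuple[str, ...] = ("dead_hospital", "dead_icu"),
) -> List[SketchTrajectory]:
    """Keep trajectories in which at least one stage has a requested tag."""
    return [
        trajectory
        for trajectory in trajectories
        if any(tag in symptoms_to_keep for tag in trajectory.symptom_tags)
    ]


all_trajectories = [
    SketchTrajectory(["exposed", "mild", "recovered"]),
    SketchTrajectory(["exposed", "mild", "hospitalised", "dead_hospital"]),
    SketchTrajectory(["exposed", "mild", "intensive_care", "dead_icu"]),
]
fatal_trajectories = filter_trajectories(all_trajectories)
```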

classmethod from_file(health_index_generator, age_per_area_path: str = PosixPath('/home/sadie/JUNE/data/input/demography/age_structure_single_year.csv'), female_fraction_per_area_path: str = PosixPath('/home/sadie/JUNE/data/input/demography/female_ratios_per_age_bin.csv'), trajectories_path: str = PosixPath('/home/sadie/JUNE/configs/defaults/symptoms/trajectories.yaml'), observed_deaths_path: str = PosixPath('/home/sadie/JUNE/data/covid_real_data/n_deaths_region.csv'), area_super_region_path: str = PosixPath('/home/sadie/JUNE/data/input/geography/area_super_area_region.csv'), smoothing=False) → june.infection_seed.observed_to_cases.Observed2Cases

Creates class from paths to data

health_index_generator:

generator of the health index to compute death_rate(age,sex)

age_per_area_path:

path to data with number of people of a given age by area

female_fraction_per_area_path:

path to data with fraction of people that are female by area and age bin

trajectories_path:

path to config file with possible symptoms trajectories and their timings

observed_deaths_path:

path to time series of observed deaths over time

area_super_region_path:

path to data on area, super_area, region mapping

smoothing:

whether to smooth the observed deaths time series before computing the expected number of cases (so that the estimates become less dependent on spikes in the data)

Instance of Observed2Cases

get_latent_cases_from_observed(n_observed: int, avg_rates: List) → int

Given a number of observed cases, such as observed deaths or observed hospital admissions, this function converts it into number of latent cases necessary to produce such an observation.

n_observed:

observed number of cases (such as deaths or hospital admissions)

avg_rates:

average rates to produce the observed cases, such as the average death rate or average admission rate. It is a list because an observation can combine several rates: for instance, the death rate is a combination of the dead_home, dead_hospital and dead_icu rates.

Number of latent cases
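One plausible reading of this conversion is that the observed count is divided by the combined rate. The sketch below assumes the rates in `avg_rates` are summed and the result is rounded to an integer; the helper name and the rate values are made up for illustration.

```python
def latent_cases_from_observed(n_observed, avg_rates):
    """Latent cases needed to produce n_observed, given per-outcome rates."""
    combined_rate = sum(avg_rates)
    return round(n_observed / combined_rate)


# Made-up rates: 0.4% of infections die in hospital, 0.1% die in ICU.
n_latent = latent_cases_from_observed(n_observed=10, avg_rates=[0.004, 0.001])
```

With these numbers, 10 observed deaths at a combined rate of 0.5% correspond to 2000 latent cases.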

get_latent_cases_per_region(n_observed_df: pandas.core.frame.DataFrame, time_to_get_symptoms: int, avg_rates_per_region: dict) → pandas.core.frame.DataFrame

Converts observed cases per region into latent cases per region.

n_observed_df:

time series of the observed cases

time_to_get_symptoms:

days it takes from infection to the symptoms of interest (such as time to death)

avg_rates_per_region:

average probability to get those symptoms per region

n_cases_per_region_df:

number of latent cases per region time series
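A sketch of the per-region conversion under two assumptions: each observation is attributed to infections that happened `time_to_get_symptoms` days earlier, and `avg_rates_per_region` maps each region to a list of rates that are summed. The helper name and the data layout (dictionaries keyed by date) are hypothetical.

```python
from datetime import date, timedelta


def latent_cases_per_region(observed, time_to_get_symptoms, avg_rates_per_region):
    """Map date -> region -> observed count to infection date -> region -> latent cases."""
    latent = {}
    for day, per_region in observed.items():
        # Shift the observation back to the estimated day of infection.
        infection_day = day - timedelta(days=time_to_get_symptoms)
        latent[infection_day] = {
            region: round(n_observed / sum(avg_rates_per_region[region]))
            for region, n_observed in per_region.items()
        }
    return latent


# Made-up observation: 5 deaths in London on 20 March 2020.
observed_deaths = {date(2020, 3, 20): {"London": 5}}
latent = latent_cases_per_region(
    observed_deaths,
    time_to_get_symptoms=14,
    avg_rates_per_region={"London": [0.004, 0.001]},
)
```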

get_median_completion_time(stage: Stage) → float

Get median completion time of a stage, from its distribution

stage:

given stage in trajectory

Median time spent in stage

get_regional_latent_cases() → pandas.core.frame.DataFrame

Find the regional latent cases from the observed deaths.

data frame with latent cases per region indexed by date

get_super_area_population_weights() → pandas.core.frame.DataFrame

Compute the share of its region's population that each super area holds, used to convert regional cases into cases by super area

data frame indexed by super area, with weights and region

get_symptoms_rates_per_age_sex() → dict

Computes the rates of ending up with a given SymptomTag for all ages and sexes.

dictionary with rates of symptoms (fate) as a function of age and sex

get_time_it_takes_to_symptoms(symptoms_trajectories: List[TrajectoryMaker], symptoms_tags: List[str])

Compute the median time it takes to get certain symptoms in `symptoms_tags`, such as death or hospital admission.

symptoms_trajectories:

list of symptoms trajectories

symptoms_tags:

symptoms tags for the symptoms of interest

get_weighted_time_to_symptoms(symptoms_trajectories: List[TrajectoryMaker], avg_rate_for_symptoms: List[float], symptoms_tags: List[str]) → float

Get the time it takes to get certain symptoms weighted by population. For instance, when computing the death rate, more people die in hospital than in icu, therefore the median time to die in hospital gets weighted more than the median time to die in icu.

symptoms_trajectories:

trajectories for symptoms that include the symptoms of interest

avg_rate_for_symptoms:

list containing the average rate for each of the symptoms given in `symptoms_tags`. WARNING: the rates must be in the same order as `symptoms_tags`

symptoms_tags:

tags of the symptoms for which we want to know the median time

Weighted median time to symptoms
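The weighting described above amounts to a weighted average of the per-symptom median times, with the average rates as weights. A minimal sketch, assuming the weights are normalised by their sum; the helper name and all numbers are made up.

```python
def weighted_time_to_symptoms(median_times, avg_rates):
    """Average the median times, weighting each by its rate (same order)."""
    total_rate = sum(avg_rates)
    return sum(t * r for t, r in zip(median_times, avg_rates)) / total_rate


# Made-up values: median days to die in hospital and in ICU, with their rates.
median_times = [14.0, 20.0]
avg_rates = [0.003, 0.001]
weighted_time = weighted_time_to_symptoms(median_times, avg_rates)
```

Because the hospital rate is three times the ICU rate in this example, the result (15.5 days) lies closer to the hospital median than to the ICU median.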

weight_rates_by_age_sex_per_region(symptoms_rates_dict: dict, symptoms_tags: List[SymptomTag]) → List[float]

Get the weighted average by age and sex of symptoms rates for symptoms in symptoms_tags. For example, to get the weighted average death rate per region, select symptoms_tags = ('dead_hospital', 'dead_icu', 'dead_home')

symptoms_rates_dict:

dictionary with rates for all the possible final symptoms, indexed by sex and age.

symptoms_tags:

final symptoms to keep

List of weighted rates for symptoms in `symptoms_tags` (in the same order as `symptoms_tags`)
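For a single region and a single symptom, the weighted average can be sketched as a population-weighted mean over age and sex. The helper name, the `"f"`/`"m"` keys, and all numbers are hypothetical; the actual method works on the per-region data frames and returns one value per tag.

```python
def weighted_rate(rates_by_sex_age, females_by_age, males_by_age):
    """Population-weighted average of a rate given as sex -> age -> rate."""
    people = {"f": females_by_age, "m": males_by_age}
    weighted_sum = 0.0
    n_people = 0.0
    for sex, counts in people.items():
        for age, count in counts.items():
            weighted_sum += rates_by_sex_age[sex][age] * count
            n_people += count
    return weighted_sum / n_people


# Made-up death rates and population counts for one region.
death_rates = {"f": {30: 0.0005, 80: 0.08}, "m": {30: 0.001, 80: 0.1}}
females_by_age = {30: 70, 80: 19}
males_by_age = {30: 80, 80: 11}
avg_death_rate = weighted_rate(death_rates, females_by_age, males_by_age)
```

A region with an older population gives more weight to the high rates at old ages, so its average rate is higher.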