src.gss package

Submodules

src.gss.Data_simulation_util module

src.gss.Data_simulation_util.assign_gridcells_7x7(data)

Assigns grid cell indices to the data based on the values of two columns, “partyid” and “polviews”. The grid is 7x7 in size. The grid cells are stored in the dataframe provided to the function.

Parameters:

data – The pandas dataframe containing the preprocessed GSS data.

Returns:

The dataframe containing the grid cell indices as well as the GSS data.
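As an illustration only, the mapping from a pair of coordinates to a cell of a 7x7 grid might look like the sketch below. The function name grid_cell_index, the row-major numbering, and the clamping of points on the upper border are assumptions made for this example, not the documented behaviour of assign_gridcells_7x7.

```python
def grid_cell_index(x, y, x_min, x_max, y_min, y_max, n=7):
    """Return a row-major cell index in an n x n grid (assumed convention)."""
    # Fraction of the way across each axis, in [0, 1] for in-range points.
    fx = (x - x_min) / (x_max - x_min)
    fy = (y - y_min) / (y_max - y_min)
    # Clamp so that points on the upper border fall into the last cell.
    col = min(int(fx * n), n - 1)
    row = min(int(fy * n), n - 1)
    return row * n + col


# Corner and centre of a square spanning (-0.25, 0.25) on both axes.
corner = grid_cell_index(-0.25, -0.25, -0.25, 0.25, -0.25, 0.25)
centre = grid_cell_index(0.0, 0.0, -0.25, 0.25, -0.25, 0.25)
```

The sketch assumes that every point lies inside the given limits.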

src.gss.Data_simulation_util.borders_3x3_small(data)

Assigns grid cell indices to the data based on the values of two columns, “partyid” and “polviews”. The grid cells are stored in the dataframe provided to the function.

Parameters:

data – The pandas dataframe containing the preprocessed GSS data.

Returns:

The dataframe containing the grid cell indices as well as the GSS data.

src.gss.Data_simulation_util.get_bins(data, opinion)

Computes the bin edges for creating a histogram of the dataset. The bin size is chosen such that each unique value is contained in a separate bin.

Parameters:
  • data – The pandas dataframe containing the preprocessed GSS data.

  • opinion (str) – The key for accessing the dataframe.

Returns:

Bin edges covering the entire data.
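One way to obtain such edges is to place them at the midpoints between neighbouring unique values. The sketch below works on a plain list rather than a dataframe column; the name unique_value_bin_edges and the handling of the outermost edges are assumptions for this example and may differ from get_bins.

```python
def unique_value_bin_edges(values):
    """Bin edges such that every unique value falls inside its own bin."""
    unique = sorted(set(values))
    if len(unique) == 1:
        # A single value: use a bin of width 1 centred on it (arbitrary choice).
        return [unique[0] - 0.5, unique[0] + 0.5]
    # Interior edges are midpoints between neighbouring unique values.
    mids = [(a + b) / 2 for a, b in zip(unique, unique[1:])]
    # Outer edges mirror the first and last midpoints around the extremes.
    first = unique[0] - (mids[0] - unique[0])
    last = unique[-1] + (unique[-1] - mids[-1])
    return [first] + mids + [last]


edges = unique_value_bin_edges([1, 2, 2, 3, 5])
```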

src.gss.Data_simulation_util.get_grid_cell(x_min, x_max, y_min, y_max, coordinate)
src.gss.Data_simulation_util.get_histogram_counts(data, opinion, bins, method, density=False)

Calculates histogram counts for each grid cell based on the specified opinion column and method. @param data: @param opinion: @param bins: @param method: @param density: @return:
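A minimal sketch of per-cell histogram counting, using plain lists in place of the dataframe. The function name histogram_counts_per_cell and its arguments are hypothetical, and the method and density parameters of get_histogram_counts are not modelled because their meaning is not documented here.

```python
from bisect import bisect_right
from collections import defaultdict


def histogram_counts_per_cell(cells, opinions, edges):
    """Count opinion values per bin, separately for every grid cell."""
    n_bins = len(edges) - 1
    counts = defaultdict(lambda: [0] * n_bins)
    for cell, value in zip(cells, opinions):
        # Bins are half-open [edge_i, edge_i+1), except the last, which is closed.
        i = bisect_right(edges, value) - 1
        if value == edges[-1]:
            i = n_bins - 1
        if 0 <= i < n_bins:
            counts[cell][i] += 1
    return dict(counts)


counts = histogram_counts_per_cell(
    cells=[0, 0, 1],
    opinions=[-1.0, 1.0, 0.0],
    edges=[-1.5, -0.5, 0.5, 1.5],
)
```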

src.gss.Data_simulation_util.grid_uniform(data, seed=123)

A new and smaller grid is computed. Redistributes agents uniformly within each grid cell, and updates the grid cell indices based on the new agent positions. @param data: @param seed: @return:
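The redistribution step can be pictured as drawing a new position for every agent uniformly inside the cell it belongs to. The sketch below assumes a square n x n grid with row-major cell indices and uses the standard library random module; the name spread_uniformly and these conventions are illustrative, and the recomputation of the smaller grid is not covered.

```python
import random


def spread_uniformly(cell_indices, n, x_min, x_max, y_min, y_max, seed=123):
    """Draw one uniform position per agent inside its (row-major) grid cell."""
    rng = random.Random(seed)
    width = (x_max - x_min) / n
    height = (y_max - y_min) / n
    positions = []
    for idx in cell_indices:
        row, col = divmod(idx, n)
        # Sample within the bounds of this agent's cell.
        x = rng.uniform(x_min + col * width, x_min + (col + 1) * width)
        y = rng.uniform(y_min + row * height, y_min + (row + 1) * height)
        positions.append((x, y))
    return positions


positions = spread_uniformly([0, 4, 8], 3, -0.25, 0.25, -0.25, 0.25)
```

Passing the same seed reproduces the same positions.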

src.gss.GSS_analysis module

This file contains the GSS class handling all the functions that are related to the GSS dataset. The function parameter search() is used to sample the parameter space.

class src.gss.GSS_analysis.GSS(data_key: str, first_year: int | None = None, last_year: int = 2018)

Bases: object

This class handles all interactions with the GSS dataset: loading and preparing the relevant data, comparing simulations, etc.

bins: ndarray | None
compute_kl_divergence(data_counts, sim_counts)

Calculates the KL divergence KL(data || simulation) between the joint distributions over social space and opinion space.

Parameters:
  • data_counts (list) – Histogram counts for the joint distribution from the GSS data. Each entry in this list must contain the histogram counts for the opinion within a grid cell. Designed to take the first return value of Util.get_histogram_counts(), as in input, _ = Util.get_histogram_counts().

  • sim_counts (list) – Histogram counts for the joint distribution from the simulated data. Each entry in this list must contain the histogram counts for the opinion within a grid cell. Designed to take the first return value of Util.get_histogram_counts(), as in input, _ = Util.get_histogram_counts().

Returns:

The KL divergence value between the GSS data and the simulated data.

Return type:

float

The method applies pseudo counts to avoid division by zero.
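For orientation, a KL divergence with pseudo counts can be sketched as follows. The per-cell histograms are flattened into one joint histogram, a pseudo count is added to every bin, and both histograms are normalised before summing p * log(p / q). The pseudo count of 1 and the function name are assumptions for this example; the value used by compute_kl_divergence is not stated here.

```python
import math


def kl_divergence(data_counts, sim_counts, pseudo_count=1.0):
    """KL(data || simulation) over the flattened per-cell histograms."""
    # Flatten the list of per-cell histograms and add the pseudo count to each bin.
    p = [c + pseudo_count for cell in data_counts for c in cell]
    q = [c + pseudo_count for cell in sim_counts for c in cell]
    p_total, q_total = sum(p), sum(q)
    # Sum p * log(p / q) over all bins of the normalised histograms.
    return sum(
        (pi / p_total) * math.log((pi / p_total) / (qi / q_total))
        for pi, qi in zip(p, q)
    )


same = kl_divergence([[3, 1], [0, 4]], [[3, 1], [0, 4]])
different = kl_divergence([[3, 1], [0, 4]], [[1, 3], [4, 0]])
```

Because every bin receives a positive pseudo count, neither the ratio nor the logarithm can hit a zero.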

data_analysis(R: tuple[float, float], alpha: float | int, beta: float | int, sigma_op: float | int, sigma_sp: float | int, seed: int | None = 123, model: Model | None = None, boundary: bool = True) None

Simulates a realization using the initial data provided by the data_preparation function, computes KL divergences, and generates histograms for both the GSS data and simulation results.

Parameters:
  • R (tuple[float, float]) – Radii parameter for spatial interaction.

  • alpha (float | int) – Opinion interaction strength.

  • beta (float | int) – Spatial interaction strength.

  • sigma_op (float | int) – Opinion noise intensity.

  • sigma_sp (float | int) – Spatial noise intensity.

  • seed (int | None) – The seed used for the random number generator in the simulation.

  • model (Model | None) – If the simulation was computed elsewhere, an instance of the Model class can be passed here.

  • boundary (bool) – Whether to enforce boundary conditions during the simulation.

Returns:

None

Raises:
  • ValueError – If the initial data or number of agents is not set by running data_preparation() first.

This method performs the following steps:
  1. Initializes the Model class with the given parameters if no model is provided.

  2. Runs the simulation for the specified time period.

  3. For each time step, generates histograms for the GSS data and simulation results.

  4. Computes KL divergences between the GSS data and simulation results for each grid cell and the overall distribution.

  5. Stores histogram data and KL divergence values for later analysis.

data_preparation(seed: int | None = 123) None

Processes the raw data files and prepares histograms for comparison with simulation results. Generates initial data for the simulation.

Parameters:

seed (Optional[int]) – Defines the seed for the spreading of the data. If None is provided, None is passed to the random number generator as its seed.

This method performs the following steps:

  1. Reads only the specified columns (‘year’, ‘partyid’, ‘polviews’, and the column specified by self.data_key).

  2. Filters the data to include only usable values within the desired time period.

  3. Rescales the data using the rescale_data method.

  4. Determines the unique years for which data is available and filters them based on the specified time range.

  5. Groups the data by year for easy access.

  6. Sets the input dataframe containing the initial data for the simulation.

  7. Computes the bin edges for histogram analysis.

  8. Determines the number of agents and the length of the investigated time period.

  9. Disperses the agents in the social space and assigns them to a 3x3 grid.

  10. Computes the histograms for the initial data.

  11. Sets the boundary conditions for the social space.

Returns:

None

distributed_gss_data: dict | None
goodness_of_fit() float

A simple heuristic to evaluate the goodness of fit by integrating the KL divergence against an exponential decay, to prioritize the fit for times closer to the initialization.

Returns:

The goodness of fit.

Return type:

float
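The heuristic can be pictured as a weighted integral of the KL divergence timeseries, where the weight exp(-decay_rate * t) shrinks with time. The sketch below uses the trapezoidal rule on equally spaced time points; the decay rate, the time step, and the function name weighted_kl_integral are assumptions for this example, not the values used by goodness_of_fit.

```python
import math


def weighted_kl_integral(kl_values, decay_rate=0.1, dt=1.0):
    """Integrate KL(t) * exp(-decay_rate * t) with the trapezoidal rule."""
    weighted = [
        kl * math.exp(-decay_rate * i * dt) for i, kl in enumerate(kl_values)
    ]
    # Trapezoidal rule over equally spaced time points.
    return sum((a + b) / 2 * dt for a, b in zip(weighted, weighted[1:]))


early_mismatch = weighted_kl_integral([1.0, 0.0, 0.0])
late_mismatch = weighted_kl_integral([0.0, 0.0, 1.0])
```

With these assumptions a mismatch right after initialization contributes more to the result than an equally large mismatch at a later time.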

gss_data: None | DataFrame | Series
gss_data_groups: DataFrameGroupBy | None
gss_histograms: List | None
initial_data: DataFrame | None
kl_divergence: List[int] | None
model: Model | None
number_of_agents: int | None
plot_distributed_data(year, radii, xlim, ylim, boundary=True, use_R_sp=True, filename=None, figure=None, s=None, network_alpha=None, colorbar=True, network_lw=None, hist_ylim=None, node_linewidth=0)
plot_gss_participants(ax: Axes | None = None) axes
plot_kl_divergence(filename: str | None = None, ax=None) Axes | None

Plots the KL divergence timeseries.

Parameters:
  • filename (str, optional) – If specified, the plot will be saved to this file. The file extension has to be included. Default is None.

  • ax (matplotlib.axes.Axes, optional) – The timeseries will be plotted on this axis. Default is None. If None is given, a new plot will be created.

Returns:

Matplotlib Axes object containing the KL divergences if no filename is provided.

Return type:

Optional[plt.Axes]

Raises:
  • ValueError – If kl_divergence is not defined, indicating that data_analysis() needs to be run first.

rescale_data()

Rescales the ‘partyid’ and ‘polviews’ columns of the GSS dataset to the range (-0.25, 0.25) and the column specified by ‘data_key’ to the range (-1, 1). This method uses MinMaxScaler from scikit-learn to normalize the data. The ‘partyid’ and ‘polviews’ columns are scaled to a smaller range suitable for spatial data, while the column specified by ‘data_key’ is scaled to a larger range suitable for opinion data.

gss_data

The GSS dataset containing the columns to be rescaled.

Type:

pd.DataFrame

data_key

The key for the specific column in the GSS dataset to be rescaled.

Type:

str

Returns:

None
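Min-max scaling to a target range (low, high) maps each value v to low + (v - min) / (max - min) * (high - low). The sketch below applies that formula to a plain list to show the arithmetic; it is a standalone illustration, not the scikit-learn call used by rescale_data, and the sample values are made up for this example.

```python
def min_max_rescale(values, low, high):
    """Linearly map values so that min -> low and max -> high."""
    v_min, v_max = min(values), max(values)
    span = v_max - v_min
    if span == 0:
        # This sketch rejects constant input instead of picking a convention.
        raise ValueError("cannot rescale a constant column")
    return [low + (v - v_min) / span * (high - low) for v in values]


spatial = min_max_rescale([0, 3, 6], -0.25, 0.25)
opinion = min_max_rescale([1, 4, 7], -1, 1)
```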

simulation_histograms: List | None
time_period: int | None
xlim: tuple[float, float] | None
years: ndarray | None
ylim: tuple[float, float] | None

This function performs a uniform sampling from the parameter space and analyzes the KL divergence between the opinion data gathered from the GSS data set and the simulated distribution. The timeseries of the KL divergence will be plotted and saved. The filename contains the used parameter values and the “goodness-of-fit”.

Parameters:
  • data_key (str) – Key for the relevant GSS dataset (‘helpsick’ or ‘eqwlth’).

  • sample_size (int, optional) – Number of parameter sets to sample. Default is 100.

  • seed (Optional[int], optional) – Seed for the random number generator. Default is 123.

  • first_year (Optional[int], optional) – First year for the data comparison. Default is None.

  • last_year (Optional[int], optional) – Last year for the data comparison. Default is None.

Returns:

None
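Uniform sampling of a parameter space can be sketched as drawing each parameter independently from its own interval. The parameter names below are taken from data_analysis(), while the bounds, the function name sample_parameters, and the use of the standard library random module are assumptions for this example.

```python
import random


def sample_parameters(bounds, sample_size=100, seed=123):
    """Draw sample_size parameter sets uniformly from the given intervals."""
    rng = random.Random(seed)
    return [
        {name: rng.uniform(low, high) for name, (low, high) in bounds.items()}
        for _ in range(sample_size)
    ]


# Hypothetical bounds, chosen only to make the example concrete.
bounds = {
    "alpha": (0.0, 10.0),
    "beta": (0.0, 10.0),
    "sigma_op": (0.0, 1.0),
    "sigma_sp": (0.0, 1.0),
}
samples = sample_parameters(bounds, sample_size=5)
```

Each sampled dictionary could then be passed on to a simulation run and scored by its goodness of fit.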

src.gss.image_util module

Image Combination Module

This module provides functions to combine multiple images in various ways. You can stack images vertically, place them side by side, or combine multiple PNG images into a single image.

Functions:

  1. combine_images_side_by_side(image1_path: str, image2_path: str, output_path: str) -> None:

    Combine two images side by side and save the resulting image.

  2. stack_images_vertically(image_paths: List[str], output_path: str) -> None:

    Stack multiple images vertically and save the resulting image.

  3. combine_pngs(png1_path: str, png2_path: str, png3_path: str, png4_path: str, png5_path: str, png6_path: str, output_path: str) -> None:

    Combine six PNG images into a single image and save the resulting image.

src.gss.image_util.combine_images_side_by_side(image1_path: str, image2_path: str, output_path: str) None

Combine two images side by side and save the resulting image.

Parameters:

  • image1_path (str) – The file path to the first image.

  • image2_path (str) – The file path to the second image.

  • output_path (str) – The file path where the combined image will be saved.

Returns:

None

Notes:

  • The function opens the two images specified in image1_path and image2_path.

  • It calculates the width and height of the resulting combined image.

  • A new blank image is created with the calculated dimensions.

  • The first image is pasted onto the new image at the leftmost position.

  • The second image is pasted onto the new image at the rightmost position.

  • The combined image is saved to output_path.

Example usage:

combine_images_side_by_side("image1.png", "image2.png", "output.png")
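The size and offset arithmetic from the notes above can be written without any imaging library. In the sketch below, the function name side_by_side_layout is hypothetical, and taking the canvas height as the larger of the two image heights is an assumption about how differently sized images are handled.

```python
def side_by_side_layout(size1, size2):
    """Return the canvas size and paste offsets for two (width, height) images."""
    (w1, h1), (w2, h2) = size1, size2
    # The canvas is as wide as both images together and as tall as the taller one.
    canvas = (w1 + w2, max(h1, h2))
    # The first image goes to the left edge, the second directly to its right.
    offsets = [(0, 0), (w1, 0)]
    return canvas, offsets


canvas, offsets = side_by_side_layout((300, 200), (100, 250))
```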

src.gss.image_util.combine_pngs(png1_path, png2_path, png3_path, png4_path, png5_path, png6_path, output_path)

Combine six PNG images into a single image and save the resulting image.

Parameters:

  • png1_path (str) – The file path to the first PNG image.

  • png2_path (str) – The file path to the second PNG image.

  • png3_path (str) – The file path to the third PNG image.

  • png4_path (str) – The file path to the fourth PNG image.

  • png5_path (str) – The file path to the fifth PNG image.

  • png6_path (str) – The file path to the sixth PNG image.

  • output_path (str) – The file path where the combined image will be saved.

Returns:

None

Notes:

  • The function opens the six PNG images specified by the file paths.

  • It calculates the width and height of the resulting combined image.

  • A new blank image is created with the calculated dimensions.

  • Each PNG image is pasted onto the new image at the appropriate position.

  • The combined image is saved to output_path.

Example usage:

combine_pngs("png1.png", "png2.png", "png3.png", "png4.png", "png5.png", "png6.png", "output.png")

src.gss.image_util.stack_images_vertically(image_paths, output_path)

Stack multiple images vertically and save the resulting image.

Parameters:

  • image_paths (List[str]) – A list of file paths to the images to be stacked.

  • output_path (str) – The file path where the combined image will be saved.

Returns:

None

Notes:

  • The function opens all images specified in image_paths.

  • It calculates the width and height of the resulting combined image.

  • A new blank image is created with the calculated dimensions.

  • Each image is pasted onto the new image, stacked vertically.

  • The combined image is saved to output_path.

Example usage:

stack_images_vertically(["image1.png", "image2.png", "image3.png"], "output.png")
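For an arbitrary number of images, the vertical offsets are a running sum of the image heights. The sketch below computes only the layout; the function name vertical_stack_layout and the choice of the widest image as the canvas width are assumptions for this example.

```python
def vertical_stack_layout(sizes):
    """Return the canvas size and paste offsets for stacked (width, height) images."""
    canvas_width = max(width for width, _ in sizes)
    offsets = []
    y = 0
    for _, height in sizes:
        # Each image starts where the previous one ended.
        offsets.append((0, y))
        y += height
    return (canvas_width, y), offsets


canvas_size, paste_at = vertical_stack_layout([(300, 200), (250, 100), (320, 50)])
```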

Module contents