segregation.spatial.AbsoluteCentralization

class segregation.spatial.AbsoluteCentralization(data, group_pop_var, total_pop_var, center='mean', metric='euclidean')

Calculation of the Absolute Centralization index.

Parameters
data : a geopandas DataFrame with a geometry column.
group_pop_var : string

The name of the variable in data that contains the population size of the group of interest.

total_pop_var : string

The name of the variable in data that contains the total population of the unit.

center : string, two-dimensional values (tuple, list, array) or integer.

This defines what is considered to be the center of the spatial context under study.

If string, this can be set to:

“mean”: the center longitude/latitude is the mean of the longitudes/latitudes of all units.
“median”: the center longitude/latitude is the median of the longitudes/latitudes of all units.
“population_weighted_mean”: the center longitude/latitude is the mean of the longitudes/latitudes of all units, weighted by their total population.
“largest_population”: the center longitude/latitude is the centroid of the unit with the largest total population. If there is a tie for the largest population, the mean of all coordinates will be taken.
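To make the string options concrete, here is a minimal NumPy sketch that resolves each one from unit centroids. It assumes the centroid longitudes/latitudes and total populations are already available as plain arrays; the helper name `string_center` is hypothetical and this is not the class's internal code. For the tie rule of “largest_population” we assume the centroids of the tied units are averaged.

```python
import numpy as np


def string_center(x, y, pop, how="mean"):
    """Hypothetical helper: resolve a string `center` option to [lon, lat].

    x, y : centroid longitudes/latitudes of the units
    pop  : total population of each unit
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    pop = np.asarray(pop, dtype=float)
    if how == "mean":
        return [float(x.mean()), float(y.mean())]
    if how == "median":
        return [float(np.median(x)), float(np.median(y))]
    if how == "population_weighted_mean":
        # Weight each centroid by the unit's total population
        return [float(np.average(x, weights=pop)), float(np.average(y, weights=pop))]
    if how == "largest_population":
        # Assumption: with a tie, average the centroids of all tied units
        tied = pop == pop.max()
        return [float(x[tied].mean()), float(y[tied].mean())]
    raise ValueError(f"unknown center option: {how!r}")
```

With no tie, the “largest_population” branch simply returns the centroid of the single most populous unit.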

If a tuple, list or array, this argument should be the coordinates of the desired center, with longitude as the first value and latitude as the second, i.e. (longitude, latitude) for a tuple or [longitude, latitude] for a list or numpy array.

If an integer, the center is the centroid of the polygon in data whose row corresponds to that integer, interpreted as an index. For example, center = 0 uses the centroid of the first row of data as the center, center = 1 uses the second row, and so on.
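The non-string forms of center can be sketched as a small dispatch. This assumes the unit centroids are given as arrays `x` and `y` in row order and that the integer is read as a positional row index; the helper `resolve_center` is hypothetical, not part of the library.

```python
import numpy as np


def resolve_center(center, x, y):
    """Hypothetical helper: resolve a coordinate or integer `center` to [lon, lat]."""
    # bool is a subclass of int, so exclude it explicitly
    if isinstance(center, (int, np.integer)) and not isinstance(center, bool):
        # Integer: centroid of the unit in that (positional) row
        return [float(x[center]), float(y[center])]
    # Tuple, list or array: expect exactly (longitude, latitude)
    values = np.asarray(center, dtype=float)
    if values.shape != (2,):
        raise ValueError("center must hold exactly two values: (longitude, latitude)")
    return [float(values[0]), float(values[1])]
```

Keeping the coordinate and integer cases in one function mirrors how a single center argument accepts several types.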

Notes

Based on Massey, Douglas S., and Nancy A. Denton. “The dimensions of residential segregation.” Social Forces 67.2 (1988): 281-315.
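In Massey and Denton's formulation, units are sorted by increasing distance from the center and ACE = sum_{i=2}^{n} X_{i-1} A_i - sum_{i=2}^{n} X_i A_{i-1}, where X_i is the cumulative proportion of the group of interest and A_i the cumulative proportion of land area in units 1 through i. The index ranges from -1 to 1, with positive values indicating that the group tends to live close to the center. The NumPy sketch below implements that formula on plain arrays; the helper name and its inputs (precomputed distances to the center, group counts and unit areas) are illustrative assumptions rather than this class's internals.

```python
import numpy as np


def absolute_centralization(dist, group, area):
    """Hypothetical helper: Massey-Denton ACE from per-unit arrays."""
    # Sort units by increasing distance from the center (stable for ties)
    order = np.argsort(np.asarray(dist, dtype=float), kind="stable")
    group = np.asarray(group, dtype=float)[order]
    area = np.asarray(area, dtype=float)[order]
    # Cumulative proportions of the group (X_i) and of land area (A_i)
    X = np.cumsum(group) / group.sum()
    A = np.cumsum(area) / area.sum()
    # sum_{i>=2} X_{i-1} A_i  -  sum_{i>=2} X_i A_{i-1}
    return float(np.sum(X[:-1] * A[1:]) - np.sum(X[1:] * A[:-1]))
```

A group spread in proportion to area yields 0; with three equal-area units, putting the whole group in the nearest unit gives 2/3 and putting it in the farthest gives -2/3.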

A discussion of defining the center in this function can be found in https://github.com/pysal/segregation/issues/18.

Reference: [MD88].

Examples

In this example, we calculate the absolute centralization index (ACE) for Riverside County using 2010 census tract data. The group of interest is non-Hispanic Black people, stored in the variable nhblk10 of the dataset.

Firstly, we import the required modules and the function itself.

>>> import pandas as pd
>>> import geopandas as gpd
>>> import segregation
>>> from segregation.spatial import AbsoluteCentralization

Secondly, we need to read the data:

>>> # This example uses census data from an external database; the user must provide their own copy of it.
>>> # A step-by-step procedure for downloading the data can be found here: https://github.com/spatialucr/geosnap/blob/master/examples/01_getting_started.ipynb
>>> # After downloading LTDB_Std_All_fullcount.zip and extracting the files, the file path might look like the one below.
>>> filepath = '~/data/LTDB_Std_2010_fullcount.csv'
>>> census_2010 = pd.read_csv(filepath, encoding = "ISO-8859-1", sep = ",")

Then, we keep only the desired county (in this case, Riverside County) and the columns of interest:

>>> df = census_2010.loc[census_2010.county == "Riverside County"][['tractid', 'pop10','nhblk10']]

Then, we read the Riverside map data using geopandas (the county id is 06065):

>>> map_url = 'https://raw.githubusercontent.com/renanxcortes/inequality-segregation-supplementary-files/master/Tracts_grouped_by_County/06065.json'
>>> map_gpd = gpd.read_file(map_url)

The tract identifiers in the census table and in the GeoDataFrame must have the same data type for the merge to work, so we convert GEOID10 to a numeric column. Afterwards, we keep only the columns that will be used.

>>> map_gpd['INTGEOID10'] = pd.to_numeric(map_gpd["GEOID10"])
>>> gdf_pre = map_gpd.merge(df, left_on = 'INTGEOID10', right_on = 'tractid')
>>> gdf = gdf_pre[['geometry', 'pop10', 'nhblk10']]
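The harmonization step can be illustrated on toy data. The two frames below are hypothetical stand-ins for the GeoJSON attributes (a string GEOID10 with a leading zero) and the LTDB table (an integer tractid); converting GEOID10 with pd.to_numeric drops the leading zero so the two keys compare equal.

```python
import pandas as pd

# Hypothetical toy stand-ins for the map attributes and the census table
tracts = pd.DataFrame({"GEOID10": ["06065040101", "06065040203"]})
census = pd.DataFrame({
    "tractid": [6065040101, 6065040203],
    "pop10": [5000, 3000],
    "nhblk10": [400, 150],
})

# String GEOIDs become integers, e.g. "06065040101" -> 6065040101
tracts["INTGEOID10"] = pd.to_numeric(tracts["GEOID10"])

# Both keys are now integers, so the merge matches rows as intended
merged = tracts.merge(census, left_on="INTGEOID10", right_on="tractid")
```

Converting the string side to numbers, rather than the numbers to zero-padded strings, matches what the example above does.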

Finally, the index is estimated:

>>> absolute_centralization_index = AbsoluteCentralization(gdf, 'nhblk10', 'pop10')
>>> absolute_centralization_index.statistic
0.6416113799795511

Attributes
statistic : float

Absolute Centralization Index

core_data : a geopandas DataFrame

A geopandas DataFrame that contains the columns used to perform the estimate.

center_values : list

The center values, in the form [longitude, latitude], used to calculate the centralization distances.

__init__(data, group_pop_var, total_pop_var, center='mean', metric='euclidean')

Initialize self. See help(type(self)) for accurate signature.

Methods

__init__(data, group_pop_var, total_pop_var[, center, metric])

Initialize self.