segregation.spatial.SpatialDissim
class segregation.spatial.SpatialDissim(data, group_pop_var, total_pop_var, w=None, standardize=False)
Calculation of the Spatial Dissimilarity index.
- Parameters
- data : a geopandas DataFrame with a geometry column.
- group_pop_var : string
The name of the variable in data that contains the population size of the group of interest.
- total_pop_var : string
The name of the variable in data that contains the total population of the unit.
- w : W
A PySAL weights object. If not provided, a Queen contiguity matrix is used.
- standardize : boolean
Whether to row-standardize the weights matrix. If True, the values of cij in the formula are row standardized. For comparison, the seg R package (Hong, Seong-Yun, David O’Sullivan, and Yukio Sadahiro. “Implementing spatial segregation measures in R.” PloS one 9.11 (2014): e113767) uses row standardization by default.
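To make the effect of standardize concrete, here is a minimal pure-Python sketch of row standardization on a small dense matrix. The helper row_standardize and the three-unit matrix are hypothetical and only illustrate the idea; they are not the library's own code.

```python
def row_standardize(weights):
    """Return a copy of a dense weights matrix whose non-empty rows sum to 1."""
    standardized = []
    for row in weights:
        total = sum(row)
        # Rows without neighbors (islands) are left as zeros.
        standardized.append([value / total if total else 0.0 for value in row])
    return standardized


# Hypothetical binary contiguity matrix for three units in a line: 0 - 1 - 2.
c = [
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
]
c_std = row_standardize(c)
print(c_std)
```

After standardization each unit's neighbors share a total weight of 1, so a unit with many neighbors no longer contributes more to the sum of cij than a unit with few.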
Notes
Based on Morrill, R. L. (1991) “On the Measure of Geographic Segregation”. Geography Research Forum.
Reference: [Mor91].
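As a rough illustration of the adjustment, the sketch below assumes the commonly cited form of Morrill's index, D(adj) = D - sum_ij |cij (zi - zj)| / sum_ij cij, where zi is the group share in unit i and D is the classic dissimilarity index. The function names and toy data are hypothetical, every unit is assumed to have a positive total population, and this is not presented as the library's implementation.

```python
def dissimilarity(group, total):
    """Classic (aspatial) dissimilarity index D."""
    T = sum(total)
    P = sum(group) / T  # overall share of the group of interest
    weighted_dev = sum(t * abs(g / t - P) for g, t in zip(group, total))
    return weighted_dev / (2 * T * P * (1 - P))


def spatial_dissimilarity(group, total, c):
    """D minus the weighted mean absolute difference in group shares
    between neighboring units (assumed form of Morrill's adjustment)."""
    z = [g / t for g, t in zip(group, total)]
    n = len(z)
    num = sum(abs(c[i][j] * (z[i] - z[j])) for i in range(n) for j in range(n))
    den = sum(c[i][j] for i in range(n) for j in range(n))
    return dissimilarity(group, total) - num / den


# Hypothetical data: three units in a line (0 - 1 - 2), 100 people each.
group = [10, 50, 90]
total = [100, 100, 100]
c = [
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
]
d = dissimilarity(group, total)
sd = spatial_dissimilarity(group, total, c)
print(d, sd)
```

For this toy layout D is about 0.533 and the adjusted value about 0.133: the large share differences between adjacent units are subtracted from the aspatial index.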
Examples
In this example, we calculate the degree of spatial dissimilarity (D) for Riverside County using 2010 census tract data. The group of interest is non-Hispanic Black people, stored in the variable nhblk10 of the dataset. Because no weights object is passed, the default Queen contiguity matrix is used.
First, we import the required modules and the function:
>>> import pandas as pd
>>> import geopandas as gpd
>>> import segregation
>>> from segregation.spatial import SpatialDissim
Secondly, we need to read the data:
>>> # This example uses census data; you must provide your own copy of the external database.
>>> # A step-by-step procedure for downloading the data can be found here: https://github.com/spatialucr/geosnap/blob/master/examples/01_getting_started.ipynb
>>> # After downloading LTDB_Std_All_fullcount.zip and extracting the files, the filepath might look like the one below.
>>> filepath = '~/data/LTDB_Std_2010_fullcount.csv'
>>> census_2010 = pd.read_csv(filepath, encoding="ISO-8859-1", sep=",")
Then, we filter only for the desired county (in this case, Riverside County):
>>> df = census_2010.loc[census_2010.county == "Riverside County"][['tractid', 'pop10','nhblk10']]
Then, we read the Riverside map data using geopandas (the county id is 06065):
>>> map_url = 'https://raw.githubusercontent.com/renanxcortes/inequality-segregation-supplementary-files/master/Tracts_grouped_by_County/06065.json'
>>> map_gpd = gpd.read_file(map_url)
The key columns of the census table and the GeoDataFrame must have the same data type for the merge to work, so we convert the map's GEOID10 column to numeric. Then we keep only the columns that will be used.
>>> map_gpd['INTGEOID10'] = pd.to_numeric(map_gpd["GEOID10"])
>>> gdf_pre = map_gpd.merge(df, left_on='INTGEOID10', right_on='tractid')
>>> gdf = gdf_pre[['geometry', 'pop10', 'nhblk10']]
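The reason for the type conversion can be seen in a small pure-Python sketch, used here as a stand-in for the pandas merge. The tract identifiers and population values are made up for illustration; the assumption is that the map stores identifiers as zero-padded strings while the census table stores them as integers.

```python
# Hypothetical tract identifiers: zero-padded strings on the map side,
# integers on the census side (values are made up for illustration).
map_ids = ["06065030101", "06065030103"]
census_rows = {6065030101: {"pop10": 1200}, 6065030103: {"pop10": 950}}

# Without conversion, no string key matches an integer key.
unmatched = [gid for gid in map_ids if gid in census_rows]

# Converting to int (the role pd.to_numeric plays in the merge step)
# aligns the keys so every tract finds its census row.
matched = [gid for gid in map_ids if int(gid) in census_rows]
print(len(unmatched), len(matched))
```

A merge on mismatched key types fails to pair rows in the same way, which is why the identifier column is converted before merging.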
The value is estimated below.
>>> spatial_dissim_index = SpatialDissim(gdf, 'nhblk10', 'pop10')
>>> spatial_dissim_index.statistic
0.2864885055405311
To use different neighborhood matrices:
>>> from libpysal.weights import Rook, KNN
Assuming K-nearest neighbors with k = 4:
>>> knn = KNN.from_dataframe(gdf, k=4)
>>> spatial_dissim_index = SpatialDissim(gdf, 'nhblk10', 'pop10', w=knn)
>>> spatial_dissim_index.statistic
0.28544347200877285
Assuming a Rook contiguity neighborhood:
>>> roo = Rook.from_dataframe(gdf)
>>> spatial_dissim_index = SpatialDissim(gdf, 'nhblk10', 'pop10', w=roo)
>>> spatial_dissim_index.statistic
0.2866269198707091
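The statistic changes slightly with the weights because each scheme defines neighbors differently. The sketch below shows the difference between Rook and Queen contiguity on a regular grid, in pure Python. The helper grid_neighbors is hypothetical and works on cell indices only; libpysal derives contiguity from the actual polygon geometries.

```python
def grid_neighbors(rows, cols, queen=True):
    """Map each cell index of a rows x cols grid to its sorted neighbor indices.

    Rook contiguity counts cells that share an edge; Queen contiguity
    also counts cells that touch only at a corner.
    """
    neighbors = {}
    for r in range(rows):
        for c in range(cols):
            found = []
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    if dr == 0 and dc == 0:
                        continue  # a cell is not its own neighbor
                    if not queen and dr != 0 and dc != 0:
                        continue  # skip diagonal (corner-only) contacts for Rook
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols:
                        found.append(rr * cols + cc)
            neighbors[r * cols + c] = sorted(found)
    return neighbors


rook = grid_neighbors(3, 3, queen=False)
queen = grid_neighbors(3, 3, queen=True)
print(rook[4])   # center cell, edge-sharing neighbors only
print(queen[4])  # center cell, edge- and corner-sharing neighbors
```

On a 3 x 3 grid the center cell has four Rook neighbors and eight Queen neighbors, so the neighbor-difference term in the index is averaged over a different set of pairs.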
- Attributes
- statistic : float
Spatial Dissimilarity Index
- core_data : a geopandas DataFrame
A geopandas DataFrame that contains the columns used to perform the estimate.
__init__(data, group_pop_var, total_pop_var, w=None, standardize=False)
Initialize self. See help(type(self)) for accurate signature.
Methods
__init__(data, group_pop_var, total_pop_var) Initialize self.