The Chicago Hardship Index

4.2. The Chicago Hardship Index#

4.2.1. Raw Data and Summary Statistics#

The hardship index is a way to use data to explore urban inequities. Exploratory data analysis for social justice issues should aim to be accessible, significant, equitable, impartial, and transparent.

The hardship index (HI) is the mean of six indicator estimates. Each indicator has been normalized and scaled from 0-100 (except per capita income). A higher hardship index value indicates greater hardship. The 6 raw indicator estimates are complicated by time variation over 5-year periods (US Census Bureau 2008) and geographical boundary approximation (Great Cities Institute 2019). The indicators are

UNEMP = % of community age 16 and older who are unemployed
NOHS = % of community age 25 and older without a high school diploma
DEP = % of community who are dependent (under age 18 or over age 64)
HOUS= % of community with overcrowded housing (more than 1 occupant per room)
POV = % below federal poverty line
INC = per capita income

Datafile: ‘HIHOM20142017.xlsx’

Data Source: https://greatcities.uic.edu/wp-content/uploads/2016/07/GCI-Hardship-Index-Fact-SheetV2.pdf (2010-2014) https://greatcities.uic.edu/wp-content/uploads/2019/12/Hardship-Index-Fact-Sheet-2017-ACS-Final-1.pdf (2013-2017).

4.2.2. Summary Statistics#

Upload the hardship index data for 2014 and 2017. The first two of seventy-seven community areas in Chicago are printed below.

raw_hardship=pd.read_excel('HIHOM20142017.xlsx')
raw_hardship.head(2)

	Community	index	population(K)	HI14	UNEMP14	NOHS14	DEP14	HOUS14	POV14	INC14	...	UNEMP17	NOHS17	DEP17	HOUS17	POV17	INC17	HOM14	HOM17	LAT	LON
0	Albany Park	13	48.0	47.0	10.0	26.9	32.2	9.5	18.3	23240	...	7.1	14.3	32.0	8.6	16.5	25848	2	1	41.96823	-87.72421
1	Archer Heights	56	14.0	54.4	14.3	32.2	38.5	9.8	14.6	16507	...	9.2	17.9	40.2	9.7	15.9	18497	0	3	41.81093	-87.72677

2 rows × 21 columns

Separate the 2014 and 2017 hardship index (HI) data into two dataframes called “dfHI14” and “dfHI17.” The column names will reflect the year.

dfHI14=raw_hardship[["Community","HI14","UNEMP14","NOHS14","DEP14","HOUS14","POV14","INC14"]]
dfHI14 = dfHI14.rename(columns = {'Community':'Community14'})
dfHI17=raw_hardship[["Community","HI17","UNEMP17","NOHS17","DEP17","HOUS17","POV17","INC17"]]
dfHI17 = dfHI17.rename(columns = {'Community':'Community17'})

The first two communities’ HI and the indicator scores for 2014 are printed below.

dfHI14.head(2)

	Community14	HI14	UNEMP14	NOHS14	DEP14	HOUS14	POV14	INC14
0	Albany Park	47.0	10.0	26.9	32.2	9.5	18.3	23240
1	Archer Heights	54.4	14.3	32.2	38.5	9.8	14.6	16507

The scores for two communities in 2017 are printed below.

dfHI17.head(2)

	Community17	HI17	UNEMP17	NOHS17	DEP17	HOUS17	POV17	INC17
0	Albany Park	45.7	7.1	14.3	32.0	8.6	16.5	25848
1	Archer Heights	56.1	9.2	17.9	40.2	9.7	15.9	18497

Calculate the summary statistics for indicators from the data in all seventy-seven communities. For example, we calculate statistics for the HOUS indicator in 2017. By replacing “HOUS17,” statistics for other indicators can be calculated–like “HI17,” “NOHS17”, etc.

x=dfHI17["HOUS17"]
import numpy
print("Minimum: ", numpy.min(x))
print("Maximum: ", numpy.max(x))
print("Standard Deviation: ", numpy.std(x))
print("Mean: ", numpy.mean(x))
print("Median: ", numpy.median(x))

Minimum:  0.3
Maximum:  13.6
Standard Deviation:  2.87070805717617
Mean:  4.114285714285715
Median:  3.3

The table below lists the indicator values for several communities, as shown above, and the summary statistics for each indicator.

4.2.3. K-Means Clustering#

We will use a machine learning method called K-means clustering (see Chapter 10.3) to visualize patterns in the geographic information. By plotting the homocide data with it, twe can investigate a relationship between location, hardship cluster, and number of homocides. We use the sklearn library for machine learning.

Datafile: ‘standardizedindicators.xlsx’

Read standardized hardship index and homicde data.

hom_df = pd.read_excel('standardizedindicators.xlsx')
hom_df.head(2)

	Community	index	HI14	UNEMP14	NOHS14	DEP14	HOUS14	POV14	INC14	HI17	UNEMP17	NOHS17	DEP17	HOUS17	POV17	INC17	HOM14	HOM17	LAT	LON
0	Albany Park	13	47.0	-0.705481	0.667993	-0.470891	1.42992	-0.353466	-0.165118	45.7	-0.706842	0.506067	-0.530539	1.562581	-0.418499	-0.183089	2	1	41.96823	-87.72421
1	Archer Heights	56	54.4	-0.193683	1.136956	0.471085	1.51857	-0.672245	-0.605100	56.1	-0.442634	1.090575	0.742190	1.945762	-0.473646	-0.607925	0	3	41.81093	-87.72677

Create dataframe with just HI and HOM 2017 info

HIHOM=hom_df[["UNEMP17","NOHS17","DEP17","HOUS17","POV17","INC17"]]
HIHOM.head()

	UNEMP17	NOHS17	DEP17	HOUS17	POV17	INC17
0	-0.706842	0.506067	-0.530539	1.562581	-0.418499	-0.183089
1	-0.442634	1.090575	0.742190	1.945762	-0.473646	-0.607925
2	-0.128101	2.243355	0.773232	0.552377	1.575995	-0.574983
3	0.035456	-0.257040	0.229994	-0.283653	-0.896442	-0.290469
4	1.054544	0.018978	0.586979	-0.771338	0.629300	-0.609312

Use the KMeans() function to make n_clusters=2 clusters and get the labels indicating which cluster each point belongs to.

# Fit the k means model
k_means = KMeans(init="k-means++", n_clusters=2, n_init=2)
k_means.fit(HIHOM)
#Get Labels
k_means_labels = k_means.labels_
k_means_labels

array([1, 1, 1, 0, 1, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 0,
       0, 1, 0, 1, 1, 0, 1, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1,
       1, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1,
       0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1], dtype=int32)

Add the labels to hom_df

hom_df["CLASS"]=k_means_labels
hom_df.head(2)

	Community	index	HI14	UNEMP14	NOHS14	DEP14	HOUS14	POV14	INC14	HI17	...	NOHS17	DEP17	HOUS17	POV17	INC17	HOM14	HOM17	LAT	LON	CLASS
0	Albany Park	13	47.0	-0.705481	0.667993	-0.470891	1.42992	-0.353466	-0.165118	45.7	...	0.506067	-0.530539	1.562581	-0.418499	-0.183089	2	1	41.96823	-87.72421	1
1	Archer Heights	56	54.4	-0.193683	1.136956	0.471085	1.51857	-0.672245	-0.605100	56.1	...	1.090575	0.742190	1.945762	-0.473646	-0.607925	0	3	41.81093	-87.72677	1

2 rows × 21 columns

Make a geographic plot of Chicago’s 77 community areas with marker color (blue or red) indicating k-means bi-clustering based only on the 6 standardized economic hardship indicators. Affluent community areas such as the Loop (central business district), Near North Side (which includes the “Gold Coast”), and Hyde Park (site of the University of Chicago) appear in blue. Lower-income communities with a history of injustices, including Woodlawn, Englewood, and Austin, appear in red. Marker sizes are proportional to homicide counts, with actual numbers in parentheses following named community areas.

fig=plt.figure(figsize=(25,20))

for i in hom_df.index:
    if hom_df.loc[i,"CLASS"]==0:   #toggle class (0 or 1) if "Loop" does not appear on the map
        plt.scatter(hom_df.loc[i,'LON'], hom_df.loc[i,'LAT'],s=20*hom_df.loc[i,'HOM17']+20,color='b', alpha=0.95)
        if hom_df.loc[i,"Community"] in ["Loop"]:
            plt.gca().text(hom_df.loc[i,'LON']+.003, hom_df.loc[i,'LAT']-.005, hom_df.loc[i,'Community']+'('+str(hom_df.loc[i,'HOM17'])+')',color='blue', size=30)
        if hom_df.loc[i,"Community"] in ["Hyde Park","Near West Side","Kenwood","Near North Side","Near South Side","West Town"]:
            plt.gca().text(hom_df.loc[i,'LON']+.003, hom_df.loc[i,'LAT']-.005, hom_df.loc[i,'Community']+'('+str(hom_df.loc[i,'HOM17'])+')',color='blue', size=20)
        if hom_df.loc[i,"Community"] in ["Lincoln Park","Lakeview","Uptown","Edgewater","Rogers Park","Logan Square","Avondale","North Center"]:
            plt.gca().text(hom_df.loc[i,'LON']-.015, hom_df.loc[i,'LAT']-.0075, hom_df.loc[i,'Community']+'('+str(hom_df.loc[i,'HOM17'])+')',color='blue', size=20)
        if hom_df.loc[i,"Community"] in ["Lincoln Square","Irving Park","Portage Park","Jefferson Park"]:
            plt.gca().text(hom_df.loc[i,'LON']-.025, hom_df.loc[i,'LAT']-.0075, hom_df.loc[i,'Community']+'('+str(hom_df.loc[i,'HOM17'])+')',color='blue', size=20)
        if hom_df.loc[i,"Community"] in ["Dunning","O’Hare","Edison Park","Norwood Park"]:
            plt.gca().text(hom_df.loc[i,'LON']-.015, hom_df.loc[i,'LAT']+.005, hom_df.loc[i,'Community']+'('+str(hom_df.loc[i,'HOM17'])+')',color='blue', size=20)
        if hom_df.loc[i,"Community"] in ["Forest Glen","Garfield Ridge","Clearning","Ashburn","Beverly","Mount Greenwood","Morgan Park"]:
            plt.gca().text(hom_df.loc[i,'LON']-.015, hom_df.loc[i,'LAT']+.0015, hom_df.loc[i,'Community']+'('+str(hom_df.loc[i,'HOM17'])+')',color='blue', size=20)
    
        
        
        if hom_df.loc[i,"Community"] in ["Hegewisch"]:
            plt.gca().text(hom_df.loc[i,'LON']-.01, hom_df.loc[i,'LAT']-.0075, hom_df.loc[i,'Community']+'('+str(hom_df.loc[i,'HOM17'])+')',color='blue', size=20)
    
    
    else:
        plt.scatter(hom_df.loc[i,'LON'], hom_df.loc[i,'LAT'],s=20*hom_df.loc[i,'HOM17']+20,color='r', alpha=0.95)
        if hom_df.loc[i,"Community"] in ["Austin","Belmont Craigin","Montclare"]:
            plt.gca().text(hom_df.loc[i,'LON']-.02,hom_df.loc[i,'LAT']-.013, hom_df.loc[i,'Community']+'('+str(hom_df.loc[i,'HOM17'])+')',color='red', size=20)
        if hom_df.loc[i,"Community"] in ["Woodlawn","Englewood","Chicago Lawn","South Lawndale","McKinley Park","Brighton Park","Archer Heights","West Elsdon","West Lawn","New City","Greater Grand Crossing","Auburn Gresham","South Chicago","East Side","South Deering","Riverdale"]:
            plt.gca().text(hom_df.loc[i,'LON']-.015,hom_df.loc[i,'LAT']-.009, hom_df.loc[i,'Community']+'('+str(hom_df.loc[i,'HOM17'])+')',color='red', size=20)
        if hom_df.loc[i,"Community"] in ["Avalon Park","Burnside","Pullman"]:
            plt.gca().text(hom_df.loc[i,'LON']-.015,hom_df.loc[i,'LAT']-.006,hom_df.loc[i,'Community']+'('+str(hom_df.loc[i,'HOM17'])+')',color='red', size=20)
        if hom_df.loc[i,"Community"] in ["Roseland","North Lawndale","East Garfield Park","Hermosa","West Englewood","Humboldt Park"]:
            plt.gca().text(hom_df.loc[i,'LON']-.015,hom_df.loc[i,'LAT']+.007,hom_df.loc[i,'Community']+'('+str(hom_df.loc[i,'HOM17'])+')',color='red', size=20)
        if hom_df.loc[i,"Community"] in ["West Garfield Park"]:
            plt.gca().text(hom_df.loc[i,'LON']-.04,hom_df.loc[i,'LAT']-.007,hom_df.loc[i,'Community']+'('+str(hom_df.loc[i,'HOM17'])+')',color='red', size=20)
        if hom_df.loc[i,"Community"] in ["Lower West Side","Oakland","Douglas","Armour Square"]:
            plt.gca().text(hom_df.loc[i,'LON']-.001,hom_df.loc[i,'LAT']+.002,hom_df.loc[i,'Community']+'('+str(hom_df.loc[i,'HOM17'])+')',color='red', size=20)
        if hom_df.loc[i,"Community"] in ["Bridgeport"]:
            plt.gca().text(hom_df.loc[i,'LON']-.01,hom_df.loc[i,'LAT']-.005,hom_df.loc[i,'Community']+'('+str(hom_df.loc[i,'HOM17'])+')',color='red', size=20)
        if hom_df.loc[i,"Community"] in ["South Shore"]:
            plt.gca().text(hom_df.loc[i,'LON']-.015,hom_df.loc[i,'LAT']+.003,hom_df.loc[i,'Community']+'('+str(hom_df.loc[i,'HOM17'])+')',color='red', size=20)
        if hom_df.loc[i,"Community"] in ["Washington Park","West Pullman","Grand Boulevard"]:
            plt.gca().text(hom_df.loc[i,'LON']-.025,hom_df.loc[i,'LAT']+.003,hom_df.loc[i,'Community']+'('+str(hom_df.loc[i,'HOM17'])+')',color='red', size=20)
        if hom_df.loc[i,"Community"] in ["Fuller Park"]:
            plt.gca().text(hom_df.loc[i,'LON']-.027,hom_df.loc[i,'LAT']+.002,hom_df.loc[i,'Community']+'('+str(hom_df.loc[i,'HOM17'])+')',color='red', size=20)
        if hom_df.loc[i,"Community"] in ["Washington Heights"]:
            plt.gca().text(hom_df.loc[i,'LON']-.05,hom_df.loc[i,'LAT']+.005,hom_df.loc[i,'Community']+'('+str(hom_df.loc[i,'HOM17'])+')',color='red', size=20)
        if hom_df.loc[i,"Community"] in ["West Ridge","North Park","Albany Park"]:
            plt.gca().text(hom_df.loc[i,'LON']-.02,hom_df.loc[i,'LAT']+.005,hom_df.loc[i,'Community']+'('+str(hom_df.loc[i,'HOM17'])+')',color='red', size=20)

#plt.gca().set_facecolor('lightgray')
plt.gca().grid()
plt.yticks(fontsize=20) 
plt.xticks(fontsize=20)
#title
plt.title('KMeans Classification of Community Areas by 6 Standardized Hardship Indicators',size=25)
plt.xlabel("Longitude",size=30)
plt.ylabel("Latitude",size=30)
plt.legend(loc="lower left")
fig.savefig("Fig4.png") 
#show the plot
plt.show()

No artists with labels found to put in legend.  Note that artists whose label start with an underscore are ignored when legend() is called with no argument.

../../_images/b1f544a0cd16ecc7e03d42ea66d3b4d1c660b39d8e6f3f18be2a470666a26a13.png

Exercise

Note that the Near North Side has the same homicide count as Riverdale. However, the homicide rate in Riverdale was 10 times higher than the Near North Side. Using the 2017 data, change the marker size so that it corresponds to homicide rate (per 10,000 people) rather than homicide count.

The Chicago Hardship Index

Contents

4.2. The Chicago Hardship Index#

4.2.1. Raw Data and Summary Statistics#

4.2.2. Summary Statistics#

4.2.3. K-Means Clustering#