2.4. Glimpse of Chicago#

Personal Note from the Author

In the Summer of 2020, Hope Wood (Wheaton College Urban Studies staff), Demetrius Crawford (Wheaton College Urban Studies outreach program participant) and I (Wheaton College math professor) launched the idea of creating JNB labs as a tool to teach math, data analysis and STEAM to disadvantaged Chicago grade-school students.

In Fall 2020, Mr. Jim Wilkes, principal of Cornerstone Academy, an alternative Christian High School serving mainly low to middle income students on the West Side of Chicago held the pilot offering of a JNB after-school program via Zoom. A class of Wheaton College student mentors worked online in small groups of 2 or 3 with Cornerstone students. Mr. Wilkes and I served as co-program directors, and Hope Wood and Antoinette Ratliff as program staff.

The following JNBs were developed for use with 8th graders participating in an after-school program at a non-profit organization called Celestial Ministries. The first two years, the scholars’ middle school math teacher brought them to to the program and participated in the class. I served as teacher. We are planning for the middle school teacher to take over the teaching responsibilities.

Celestial Ministries serves single parent children in Lawndale neighborhood on the West Side of Chicago. Parents and friends were invite to a recognition ceremony in which students demonstrated various functionalities of JNBs which covered in the program. Students who successfully completed the program were given a certificate and a 100 dollar stipend.

Word of Advice

There are likely opportunities to introduce Python Jupyter Notebooks in disadvantaged communities within proximity of your workplace or residence. Establishing a good working relationship with a well-established community organization and loal community math teacher is highly recommended as opposed to seeking to establish a program on your own.

2.4.1. Michael Jordan’s Greatest Scoring Game#

Note

In this section, we will introduce a very important programming tool called a function.

STEP#1 Watch the PowerPoint on this lesson. link to PPT

STEP #2 From the PowerPoint, a variable is like a b ____ . A variable has a n ______ . We can store some i ______ such as a number (eg. 3) or a message (eg. ‘God is Good!’) in a variable. Let’s give names to two variables:

  • feet

  • inch

In Michael Jordan’s case, the info stored in the variable feet is the number 6 and the info stored in the variable inch is also equal to 6. For short, we just write feet=6 and inch=6 Let’s go around and each person state your name and what your value is for the variables feet and inch.

(Answers: box, name, info)

STEP #3 The word for a short computer program is a c ______. The programming language we are using is Py ______ n. In the next cell, we give Python code to create the function in STEP 2. (If it looks complicated, try to find someone to help explain it to you.)

(Answers: code, Python)

def height(feet,inch):
    height_in_inches = 12*feet + inch  
    return height_in_inches
height(6,0)  #Test on Seimone's height  feet=6, inch=0
72

STEP #4 What do we have to do in the next cell to test our function on 6’6” Michael Jordan?

STEP #5 One of the fun parts of programming is that we can try to design a function to do almost anything we want.

Define a function called points(fg,ft,tp) which computes the points scored by a basketball player given the input variables

  • fg= number of 2 point field goals made

  • ft=number of free throws made

  • tp=number of three point shots made

Use a variable called pts to store the answer.

STEP 6 In his highest scoring game, Michael Jordan made 21 two-point field goals, 21 free throws and 2 three point shots. Check that your function gives the correct value of 69 for his point total.

#Supply the missing values for points(fg,ft,tp)

For Discussion

  1. What would you like your mentors to explain about today’s lesson?

  2. What is one of your favorite sports memories?

2.4.2. How Bad is COVID-19?#

Note

In this section we will analyze data imported directly from the Chicago Data Portal

https://data.cityofchicago.org/

STEP 1 Let’s read in up to 100,000 rows of COVID data for Chicago from the Chicago Data Portal.

import pandas as pd
import numpy as np
rawCOVID = pd.read_json('https://data.cityofchicago.org/resource/yhhz-zm2v.json?$limit=100000') #Import data directly from data portal
rawCOVID.head(10)
zip_code week_number week_start week_end cases_weekly cases_cumulative case_rate_weekly case_rate_cumulative tests_weekly tests_cumulative ... death_rate_weekly death_rate_cumulative population row_id zip_code_location :@computed_region_rpca_8um6 :@computed_region_vrxf_vc4k :@computed_region_6mkv_f3dw :@computed_region_bdys_3d7i :@computed_region_43wa_7qmu
0 60601 17 2020-04-19T00:00:00.000 2020-04-25T00:00:00.000 7.0 45.0 47.7 306.6 35.0 198 ... 6.8 13.6 14675 60601-2020-17 {'type': 'Point', 'coordinates': [-87.622844, ... 42.0 38.0 14309.0 580.0 36.0
1 60601 12 2021-03-21T00:00:00.000 2021-03-27T00:00:00.000 29.0 1003.0 197.6 6834.8 616.0 23059 ... 0.0 75.0 14675 60601-2021-12 {'type': 'Point', 'coordinates': [-87.622844, ... 42.0 38.0 14309.0 580.0 36.0
2 60601 38 2020-09-13T00:00:00.000 2020-09-19T00:00:00.000 7.0 204.0 47.7 1390.1 325.0 5654 ... 0.0 34.1 14675 60601-2020-38 {'type': 'Point', 'coordinates': [-87.622844, ... 42.0 38.0 14309.0 580.0 36.0
3 60601 43 2021-10-24T00:00:00.000 2021-10-30T00:00:00.000 22.0 1552.0 149.9 10575.8 1243.0 58296 ... 0.0 81.8 14675 60601-2021-43 {'type': 'Point', 'coordinates': [-87.622844, ... 42.0 38.0 14309.0 580.0 36.0
4 60601 24 2020-06-07T00:00:00.000 2020-06-13T00:00:00.000 6.0 78.0 40.9 531.5 106.0 782 ... 0.0 34.1 14675 60601-2020-24 {'type': 'Point', 'coordinates': [-87.622844, ... 42.0 38.0 14309.0 580.0 36.0
5 60601 26 2020-06-21T00:00:00.000 2020-06-27T00:00:00.000 4.0 88.0 27.3 599.7 175.0 1075 ... 0.0 34.1 14675 60601-2020-26 {'type': 'Point', 'coordinates': [-87.622844, ... 42.0 38.0 14309.0 580.0 36.0
6 60601 28 2020-07-05T00:00:00.000 2020-07-11T00:00:00.000 4.0 96.0 27.3 654.2 202.0 1439 ... 0.0 34.1 14675 60601-2020-28 {'type': 'Point', 'coordinates': [-87.622844, ... 42.0 38.0 14309.0 580.0 36.0
7 60601 39 2020-09-20T00:00:00.000 2020-09-26T00:00:00.000 9.0 213.0 61.3 1451.4 390.0 6044 ... 6.8 40.9 14675 60601-2020-39 {'type': 'Point', 'coordinates': [-87.622844, ... 42.0 38.0 14309.0 580.0 36.0
8 60601 40 2020-09-27T00:00:00.000 2020-10-03T00:00:00.000 5.0 218.0 34.1 1485.5 393.0 6437 ... 0.0 40.9 14675 60601-2020-40 {'type': 'Point', 'coordinates': [-87.622844, ... 42.0 38.0 14309.0 580.0 36.0
9 60601 41 2020-10-04T00:00:00.000 2020-10-10T00:00:00.000 7.0 225.0 47.7 1533.2 433.0 6870 ... 0.0 40.9 14675 60601-2020-41 {'type': 'Point', 'coordinates': [-87.622844, ... 42.0 38.0 14309.0 580.0 36.0

10 rows × 26 columns

Let’s check the number of rows and columns.

rawCOVID.shape
(10560, 26)

We’ll list the 26 column names

list(rawCOVID.columns)
['zip_code',
 'week_number',
 'week_start',
 'week_end',
 'cases_weekly',
 'cases_cumulative',
 'case_rate_weekly',
 'case_rate_cumulative',
 'tests_weekly',
 'tests_cumulative',
 'test_rate_weekly',
 'test_rate_cumulative',
 'percent_tested_positive_weekly',
 'percent_tested_positive_cumulative',
 'deaths_weekly',
 'deaths_cumulative',
 'death_rate_weekly',
 'death_rate_cumulative',
 'population',
 'row_id',
 'zip_code_location',
 ':@computed_region_rpca_8um6',
 ':@computed_region_vrxf_vc4k',
 ':@computed_region_6mkv_f3dw',
 ':@computed_region_bdys_3d7i',
 ':@computed_region_43wa_7qmu']

Q1 Does case matter when referring to a column name?

STEP TWO Let’s streamline the data to just 4 columns, drop rows with missing data, simplify the column names, and then display the first 20 rows.

COVID=rawCOVID[['deaths_cumulative', "population",  'tests_cumulative','zip_code']]
COVID.dropna  #drop rows with missing data
COVID.columns=["deaths","population","tests","zip"] #simplify the column names
COVID.head(20)
deaths population tests zip
0 2 14675 198 60601
1 11 14675 23059 60601
2 5 14675 5654 60601
3 12 14675 58296 60601
4 5 14675 782 60601
5 5 14675 1075 60601
6 5 14675 1439 60601
7 6 14675 6044 60601
8 6 14675 6437 60601
9 6 14675 6870 60601
10 7 14675 7642 60601
11 7 14675 8222 60601
12 7 14675 8798 60601
13 7 14675 9377 60601
14 8 14675 10125 60601
15 8 14675 11129 60601
16 8 14675 11710 60601
17 9 14675 12417 60601
18 9 14675 13092 60601
19 10 14675 13973 60601

Q2 Why is there different information for the same zip code?

STEP THREE

Let’s check how many rows have data in each column

COVID.count()
deaths        10560
population    10560
tests         10560
zip           10560
dtype: int64

Q3 How many data rows are there in each column?

STEP FOUR

Let’s find out how many times each zip code appears.

COVID["zip"].value_counts()
60601      176
60602      176
60646      176
60647      176
60649      176
60651      176
60652      176
60653      176
60654      176
60655      176
60656      176
60657      176
60661      176
60614      176
60615      176
60617      176
60618      176
60827      176
60707      176
60666      176
60660      176
60643      176
60633      176
60624      176
60640      176
60625      176
60630      176
60659      176
60628      176
60645      176
60644      176
60610      176
60620      176
60603      176
60607      176
60608      176
60606      176
60604      176
60605      176
60611      176
60612      176
60609      176
60613      176
60616      176
60619      176
60621      176
60642      176
60622      176
60623      176
60626      176
60629      176
60631      176
60632      176
60636      176
60637      176
60634      176
60638      176
60639      176
60641      176
Unknown    176
Name: zip, dtype: int64

STEP FIVE

Let’s make a copy of the COVID dataframe and display the first 5 rows.

df1=COVID
df1.head(5)
deaths population tests zip
0 2 14675 198 60601
1 11 14675 23059 60601
2 5 14675 5654 60601
3 12 14675 58296 60601
4 5 14675 782 60601

STEP SIX

Filter data by a specific zip code, for example, ‘60611’

df2=df1[df1['zip']=='60611']
df2.head()
deaths population tests zip
214 3 32426 10229 60611
215 4 32426 11000 60611
216 4 32426 11777 60611
218 14 32426 45055 60611
229 14 32426 41668 60611
df2["zip"].value_counts()
60611    176
Name: zip, dtype: int64

STEP SEVEN

Let’s find the largest value in the ‘tests’ column for zip ‘60611’.

df2 = df1[df1["zip"]=='60637'] #get just rows with zip 60611
numtested=df2["tests"].max() #get the largest number for tests
numtested
413972

Q4 How can we find how many deaths due to COVID have occurred in zip ‘60623’? (Create a new dataframe for 60623 called temp)

Hide code cell source
df3 = COVID[df1["zip"]=='60623'] #get just rows with zip 60611
numdeaths=df3["deaths"].max() #get the largest number for tests
numdeaths
334

Q5 How can we get the population in 60623?

Hide code cell source
df3 = COVID[df1["zip"]=='60623'] #get just rows with zip 60611
pop=df3["population"].max() #get the largest number for tests
pop
df3 = COVID[df1["zip"]=='60623'] #get just rows with zip 60611
pop=df3["population"].max() #get the largest number for tests
pop
85979

STEP EIGHT We can instruct the computer to give us the number COVID tests for any Chicago zipcode.

def MyCOVID(COVID,zip):
    alreadychecked=0  #eliminate duplication of information
    for z in COVID.index:  #go through all the index values
        if COVID.loc[z,"zip"]==zip and alreadychecked==0:    #found the zip we requested (first-time)
            alreadychecked=1  #we will only do this once
            df=COVID[COVID["zip"]==zip]
            numtested=df["tests"].max()
            print("Zip code: ", zip)  #print zipcode
            print("number tested is ", numtested) #print number tested
    return ("Enter a different zip code if you wish.")

Q6 Test out the function on zip code ‘60610’

Hide code cell source
MyCOVID(COVID,'60637')
Zip code:  60637
number tested is  413972
'Enter a different zip code if you wish.'

Exercises#

Exercises

  1. Define a function myCOVID2() which outputs for each given zip code the population, number tested, and number of deaths.

  2. Use your function myCOVID2() to determine the COVID data for each of the following Chicago landmarks:

a) North Park University (zip ‘60625’)

b) Wheaton in Chicago (zip=‘60637’)

  1. Why is COVID disproportionately impacting black and brown communities?

2.4.3. Mapping Famous Chicagoans#

Note

In the section we will learn how to use Python to put information on a map of a geographical location.

Step One Let’s’ begin by importing data of some famous people born in Chicago.

import pandas as pd
Chi=pd.read_csv('chicagoans.csv')
Chi
Name Age Alive Noted For Place of Birth Zip Lat Lon fun fact
0 Michelle Obama 56 yes First Lady Provident Hospital 60605 41.802665 -87.613657 father worked at city water plant
1 Robin Williams 63 no Actor St Luke's Hospital 60605 41.863100 -87.623100 went to Julliard with Christopher Reeve
2 Walt Disney 65 no Film Producer 149 Tripp Ave 60651 41.907879 -87.732594 won 22 oscars
3 Bobby Fischer 64 no Chess Michael Reese Hospital 60606 41.884116 -87.637656 grandmaster at age 15
4 Isiah Thomas 59 yes Basketball West Side 60624 41.874827 -87.710075 youngest of 9 children

Q1 How old was Bobby Fischer when he became a grandmaster?

STEP TWO Let’s use a Python library called folium to make a map of Chicago.

!pip install folium
Collecting folium
  Downloading folium-0.14.0-py2.py3-none-any.whl (102 kB)
                                              0.0/102.3 kB ? eta -:--:--
     -------------------------------------- 102.3/102.3 kB 5.8 MB/s eta 0:00:00
Collecting branca>=0.6.0 (from folium)
  Downloading branca-0.6.0-py3-none-any.whl (24 kB)
Requirement already satisfied: jinja2>=2.9 in c:\users\pisihara\appdata\local\anaconda3\lib\site-packages (from folium) (3.1.2)
Requirement already satisfied: numpy in c:\users\pisihara\appdata\local\anaconda3\lib\site-packages (from folium) (1.24.3)
Requirement already satisfied: requests in c:\users\pisihara\appdata\local\anaconda3\lib\site-packages (from folium) (2.29.0)
Requirement already satisfied: MarkupSafe>=2.0 in c:\users\pisihara\appdata\local\anaconda3\lib\site-packages (from jinja2>=2.9->folium) (2.1.1)
Requirement already satisfied: charset-normalizer<4,>=2 in c:\users\pisihara\appdata\local\anaconda3\lib\site-packages (from requests->folium) (2.0.4)
Requirement already satisfied: idna<4,>=2.5 in c:\users\pisihara\appdata\local\anaconda3\lib\site-packages (from requests->folium) (3.4)
Requirement already satisfied: urllib3<1.27,>=1.21.1 in c:\users\pisihara\appdata\local\anaconda3\lib\site-packages (from requests->folium) (1.26.16)
Requirement already satisfied: certifi>=2017.4.17 in c:\users\pisihara\appdata\local\anaconda3\lib\site-packages (from requests->folium) (2023.5.7)
Installing collected packages: branca, folium
Successfully installed branca-0.6.0 folium-0.14.0
import folium # map rendering library
from folium.features import DivIcon #used to add popup info to a map
Chicago_map = folium.Map(location=[41.886456, -87.62325],  tiles="openstreetmap", zoom_start=10)
Chicago_map
Make this Notebook Trusted to load map: File -> Trust Notebook

Q2

Try adjusting the “zoom_start” value. What happens?

What happens if you change the numbers in location=[41.886456, -87.62325]”

STEP THREE Let’s add our data about famous Chicagoans to the map.

Chicago_map=folium.Map(location=[41.886456,-87.62325],tiles="openstreetmap",zoom_start=11)
for i in Chi.index:
    p=[Chi.loc[i,"Lat"],Chi.loc[i,"Lon"]]
    folium.Marker(p,icon=DivIcon(
        icon_size=(100,0),
        icon_anchor=(0,8),
        html='<div style="font-size:20pt; color:red">'+str(Chi.loc[i,"Name"]) +'</div>',
            )).add_to(Chicago_map)
    Chicago_map.add_child(folium.CircleMarker(p, radius=1,color='black'))
Chicago_map.save("Chicagoans.html")
Chicago_map
Make this Notebook Trusted to load map: File -> Trust Notebook

Q3 What side of the city was Michelle Obama born?

STEP FOUR Let’s make a function which can add another person to our dataframe called Chi

def addperson(map_name,df,name,age,alive,noted,birth,zipcode,lat,lon,fact):
    our_map=map_name 
    new_row = {'Name':name, 
                   'Age':age, 
                   'Alive':alive,
                   'Noted For':noted,
                    'Place of Birth':birth,
                    'Zip':zipcode,
                    'Lat':lat,
                    'Lon':lon,
                    'fun fact':fact            }
    df = pd.concat([df, pd.DataFrame([new_row])], ignore_index=True)
    i=len(df)-1
    p=[df.loc[i,"Lat"],df.loc[i,"Lon"]]
    folium.Marker(p,icon=DivIcon(
        icon_size=(100,0),
        icon_anchor=(0,8),
        html='<div style="font-size: 15pt; color : blue">'+str(df.loc[i,"Name"]) +'</div>',
            )).add_to(our_map)
    our_map.add_child(folium.CircleMarker(p, radius=1,color='blue'))
    return df,our_map

Q4 What command is used to add a new line to the dataframe df?

STEP FIVE Let’s add Jenifer Hudson to our dataframe using the following info:

‘Jennifer Hudson’,38,‘yes’,‘Actress’,‘Englewood’,60621,41.779699,-87.633194,‘worked at Burger King’

[Chi,Chicago_map]=addperson(Chicago_map,Chi,'Jennifer Hudson','Actress',41,'yes','Englewood',60621,41.779699,-87.633194,'worked at Burger King')
Chicago_map
Make this Notebook Trusted to load map: File -> Trust Notebook

Q5 Where did Jennifer Hudson work before becoming famous?

STEP SIX Add your info to the map by editing the info in the next cell.

#answer to Step Six

STEP SEVEN Use the following two commands to save your map and info in a new Excel file:

Chicago_map.save(“Chicagoans.html”)

Chi.to_excel(‘ChiRevised.xlsx’)

Enter and run the commands in the cell below.

Q7 Did your map display correctly? Did the new Excel file include your name?

Discussion

  1. Whom do you admire? Why so?

  2. What character qualities can help someone to be successful in life?

  3. What can help you to be more successful?

2.4.4. Predicting Exemplary Schools#

Note

This section is a friendly competition predict whether a K-8 Chicago school’s IL State Board of Education Summative Designation is designated ‘Exemplary’ (the ‘Non-Exemplary’ school categories are “Commendable”, “Targeted”, and “Comprehensive”).

The prediction will be based on the file MiddleSchool.xlsx with the following data:

  • Student Enrollment - Black or African American

  • Student Enrollment - Hispanic or Latino

  • Student Enrollment - Children with Disabilities

  • Student Enrollment - Low Income

  • Total Number of School Days

  • 8th Grade Passing Algebra 1

  • Student Attendance Rate

  • Student Chronic Truancy Rate

  • Avg Class Size – All Grades

  • Teacher Retention Rate

Scoring

Scoring for our competition is based on values for the Confusion Matrix:

\[\begin{split} \begin{pmatrix} TP & FN \\ FP & TN \end{pmatrix} \end{split}\]

where

  • TP=True Positive: your model predicts exemplary and the school is exemplary

  • TN=True Negative: your model predicts not exemplary and the school is not exemplary

  • FP=False Positive: your model predicts exemplary but the school is not exemplary

  • FN=False Negative: your model predicts not exemplary but the school is exemplary

The number of each type of prediction then determines

  • Accuracy = \(\frac{\mid TP\mid + \mid TN \mid}{\mid TP\mid + \mid TN \mid+ \mid FP\mid + \mid FN \mid}\) (proportion that were correctly predicted out of all the schools)

  • Specificity (Precision) = \(\frac{\mid TN\mid}{\mid TN\mid + \mid FP\mid }\) (proportion that were correct out of those you predicted to be exemplary)

  • Sensitivity (Recall) = \(\frac{\mid TP\mid}{\mid TP\mid + \mid FN\mid }\) (proportion that you predicted correctly among just the exemplary schools)

Your competition (F1) score is the geometric mean of the precision (specificity), and recall(sensitivity):

\[ F_1=2\frac{precision*recall}{precision+recall} \]

STEP ONE Exploratory Data Analysis 1a) Import the usual libraries including matplotlib.pyplot as plt, as well as the MiddleSchool report card data into a datframe df.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
df=pd.read_excel("MiddleSchool.xlsx")
df.head(2)
School Name School Type Grades Served Summative Designation # Student Enrollment % Student Enrollment - Black or African American % Student Enrollment - Hispanic or Latino % Student Enrollment - Children with Disabilities % Student Enrollment - Low Income Total Number of School Days % 8th Grade Passing Algebra 1 Student Attendance Rate Student Chronic Truancy Rate Avg Class Size – All Grades Teacher Retention Rate
0 Barbour Two-Way Lang Immersion ELEMENTARY K 1 2 3 4 5 6 7 8 Commendable 779 6.0 84.3 6.5 72.9 181 0.0 96.1 7.4 26.2 91.4
1 Galapagos Rockford Charter Sch CHARTER SCH K 1 2 3 4 5 6 7 8 Commendable 343 60.1 18.4 16.6 44.3 181 0.0 95.8 29.6 19.1 44.2

1b) Display the column names and the number of data rows in each column.

df.count()
School Name                                          122
School Type                                          122
Grades Served                                        122
Summative Designation                                122
# Student Enrollment                                 122
% Student Enrollment - Black or African American     122
% Student Enrollment - Hispanic or Latino            122
% Student Enrollment - Children with Disabilities    122
% Student Enrollment - Low Income                    122
Total Number of School Days                          122
% 8th Grade Passing Algebra 1                        122
Student Attendance Rate                              122
Student Chronic Truancy Rate                         122
Avg Class Size – All Grades                          122
Teacher Retention Rate                               122
dtype: int64

1c Get the max and min values in each column.

n1min=df["# Student Enrollment"].min()
n1max=df["# Student Enrollment"].max()
n2min=df["% Student Enrollment - Black or African American"].min()
n2max=df["% Student Enrollment - Black or African American"].max()
n3min=df["% Student Enrollment - Hispanic or Latino"].min()
n3max=df["% Student Enrollment - Hispanic or Latino"].max()
n4min=df["% Student Enrollment - Children with Disabilities"].min()
n4max=df["% Student Enrollment - Children with Disabilities"].max()
n5min=df["% Student Enrollment - Low Income"].min()
n5max=df["% Student Enrollment - Low Income"].max()
n6min=df["Total Number of School Days"].min()
n6max=df["Total Number of School Days"].max()
n7min=df["% 8th Grade Passing Algebra 1"].min()
n7max=df["% 8th Grade Passing Algebra 1"].max()
n8min=df["Student Attendance Rate"].min()
n8max=df["Student Attendance Rate"].max()
n9min=df["Student Chronic Truancy Rate"].min()
n9max=df["Student Chronic Truancy Rate"].max()
n10min=df["Avg Class Size – All Grades"].min()
n10max=df["Avg Class Size – All Grades"].max()
n11min=df["Teacher Retention Rate"].min()
n11max=df["Teacher Retention Rate"].max()
print("min enroll",n1min)
print("max enroll",n1max)
print("min % Student Enrollment - Black or African American",n2min)
print("max % Student Enrollment - Black or African American",n2max)
print("min % Student Enrollment - Hispanic or Latino",n3min)
print("max % Student Enrollment - Hispanic or Latino",n3max)
print("min % Student Enrollment - Children with Disabilities",n4min)
print("max % Student Enrollment - Children with Disabilities",n4max)
print("min % Student Enrollment - Low Income",n5min)
print("max % Student Enrollment - Low Income",n5max)
print("min Total Number of School Days",n6min)
print("max Total Number of School Days",n6max)
print("min % 8th Grade Passing Algebra 1",n7min)
print("max % 8th Grade Passing Algebra 1",n7max)
print("min Student Attendance Rate",n8min)
print("max Student Attendance Rate",n8max)
print("min Student Chronic Truancy Rate",n9min)
print("max Student Chronic Truancy Rate",n9max)
print("Avg Class Size – All Grades",n10min)
print("Avg Class Size – All Grades",n10max)
print("Teacher Retention Rate",n11min)
print("Teacher Retention Rate",n11max)
min enroll 92
max enroll 1306
min % Student Enrollment - Black or African American 0.2
max % Student Enrollment - Black or African American 99.5
min % Student Enrollment - Hispanic or Latino 0.5
max % Student Enrollment - Hispanic or Latino 99.8
min % Student Enrollment - Children with Disabilities 5.8
max % Student Enrollment - Children with Disabilities 56.3
min % Student Enrollment - Low Income 0.3
max % Student Enrollment - Low Income 99.6
min Total Number of School Days 174
max Total Number of School Days 186
min % 8th Grade Passing Algebra 1 0.0
max % 8th Grade Passing Algebra 1 100.0
min Student Attendance Rate 91.1
max Student Attendance Rate 99.7
min Student Chronic Truancy Rate 0.0
max Student Chronic Truancy Rate 46.2
Avg Class Size – All Grades 2.0
Avg Class Size – All Grades 31.6
Teacher Retention Rate 0.0
Teacher Retention Rate 100.0

1d Check how many schools are in each category.

df["Summative Designation"].value_counts()
Commendable      102
Targeted           8
Exemplary          6
Comprehensive      6
Name: Summative Designation, dtype: int64

STEP TWO Define a function which predicts whether a school is exemplary (1) or not-exemplary (0).

2a For a simple prediction, let us predict that a school is exemplary if the Teacher Retention Rate is at least 90%.

#---PREDICTION MODEL----#
def mypredict(df):
    for i in df.index:
        if df.loc[i,"Teacher Retention Rate"]>90:
            df.loc[i,"Prediction"]=1
        else:
            df.loc[i,"Prediction"]=0
    return df

#---APPLY MODEL TO OUR DATA---#
mydf=mypredict(df)
mydf=mydf.reset_index(drop=True)

#---COMPUTE YOUR SCORE---#
TP=0
TN=0
FP=0
FN=0
numschools=0
for i in mydf.index:
    if mydf.loc[i,"Prediction"]==1 and mydf.loc[i,"Summative Designation"]=="Exemplary":
        TP=TP+1
    if mydf.loc[i,"Prediction"]==0 and mydf.loc[i,"Summative Designation"]!="Exemplary":
        TN=TN+1
    if mydf.loc[i,"Prediction"]==1 and mydf.loc[i,"Summative Designation"]!="Exemplary":
        FP=FP+1
    if mydf.loc[i,"Prediction"]==0 and mydf.loc[i,"Summative Designation"]=="Exemplary":
        FN=FN+1
    numschools=numschools+1
print("|TP|=",TP)
print("|TN|=",TN)
print("|FP|=",FP)
print("|FN|=",FN)
accuracy=round((TP+TN)/numschools,2)
precision=round(TP/(TP+FP),2)
recall=round(TP/(TP+FN),2)
F1score=2*(precision*recall)/(precision+recall)
print("Accuracy (% correct all 122 schools)=",100*accuracy,"%")
print("Precision (% correct of those you predicted to be exemplary) =",100*precision,"%")
print("Recall (% correct of schools that are exemplary) =",100*recall,"%")
print('COMPETITION F1 SCORE=',round(F1score*100,2),"%" )
    
|TP|= 5
|TN|= 70
|FP|= 46
|FN|= 1
Accuracy (% correct all 122 schools)= 61.0 %
Precision (% correct of those you predicted to be exemplary) = 10.0 %
Recall (% correct of schools that are exemplary) = 83.0 %
COMPETITION F1 SCORE= 17.85 %

2.4.5. Assignment#

Assignment

Modify the Prediction Model to see how high you can score.