2.4. Glimpse of Chicago#
Word of Advice
There are likely opportunities to introduce Python Jupyter Notebooks in disadvantaged communities within proximity of your workplace or residence. Establishing a good working relationship with a well-established community organization and loal community math teacher is highly recommended as opposed to seeking to establish a program on your own.
2.4.1. Michael Jordan’s Greatest Scoring Game#
Note
In this section, we will introduce a very important programming tool called a function.
STEP#1 Watch the PowerPoint on this lesson. link to PPT
STEP #2 From the PowerPoint, a variable is like a b ____ . A variable has a n ______ . We can store some i ______ such as a number (eg. 3) or a message (eg. ‘God is Good!’) in a variable. Let’s give names to two variables:
feet
inch
In Michael Jordan’s case, the info stored in the variable feet is the number 6 and the info stored in the variable inch is also equal to 6. For short, we just write feet=6 and inch=6 Let’s go around and each person state your name and what your value is for the variables feet and inch.
(Answers: box, name, info)
STEP #3 The word for a short computer program is a c ______. The programming language we are using is Py ______ n. In the next cell, we give Python code to create the function in STEP 2. (If it looks complicated, try to find someone to help explain it to you.)
(Answers: code, Python)
def height(feet,inch):
height_in_inches = 12*feet + inch
return height_in_inches
height(6,0) #Test on Seimone's height feet=6, inch=0
72
STEP #4 What do we have to do in the next cell to test our function on 6’6” Michael Jordan?
STEP #5 One of the fun parts of programming is that we can try to design a function to do almost anything we want.
Define a function called points(fg,ft,tp) which computes the points scored by a basketball player given the input variables
fg= number of 2 point field goals made
ft=number of free throws made
tp=number of three point shots made
Use a variable called pts to store the answer.
STEP 6 In his highest scoring game, Michael Jordan made 21 two-point field goals, 21 free throws and 2 three point shots. Check that your function gives the correct value of 69 for his point total.
#Supply the missing values for points(fg,ft,tp)
For Discussion
What would you like your mentors to explain about today’s lesson?
What is one of your favorite sports memories?
2.4.2. How Bad is COVID-19?#
Note
In this section we will analyze data imported directly from the Chicago Data Portal
STEP 1 Let’s read in up to 100,000 rows of COVID data for Chicago from the Chicago Data Portal.
import pandas as pd
import numpy as np
rawCOVID = pd.read_json('https://data.cityofchicago.org/resource/yhhz-zm2v.json?$limit=100000') #Import data directly from data portal
rawCOVID.head(10)
zip_code | week_number | week_start | week_end | cases_weekly | cases_cumulative | case_rate_weekly | case_rate_cumulative | tests_weekly | tests_cumulative | ... | death_rate_weekly | death_rate_cumulative | population | row_id | zip_code_location | :@computed_region_rpca_8um6 | :@computed_region_vrxf_vc4k | :@computed_region_6mkv_f3dw | :@computed_region_bdys_3d7i | :@computed_region_43wa_7qmu | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 60601 | 17 | 2020-04-19T00:00:00.000 | 2020-04-25T00:00:00.000 | 7.0 | 45.0 | 47.7 | 306.6 | 35.0 | 198 | ... | 6.8 | 13.6 | 14675 | 60601-2020-17 | {'type': 'Point', 'coordinates': [-87.622844, ... | 42.0 | 38.0 | 14309.0 | 580.0 | 36.0 |
1 | 60601 | 12 | 2021-03-21T00:00:00.000 | 2021-03-27T00:00:00.000 | 29.0 | 1003.0 | 197.6 | 6834.8 | 616.0 | 23059 | ... | 0.0 | 75.0 | 14675 | 60601-2021-12 | {'type': 'Point', 'coordinates': [-87.622844, ... | 42.0 | 38.0 | 14309.0 | 580.0 | 36.0 |
2 | 60601 | 38 | 2020-09-13T00:00:00.000 | 2020-09-19T00:00:00.000 | 7.0 | 204.0 | 47.7 | 1390.1 | 325.0 | 5654 | ... | 0.0 | 34.1 | 14675 | 60601-2020-38 | {'type': 'Point', 'coordinates': [-87.622844, ... | 42.0 | 38.0 | 14309.0 | 580.0 | 36.0 |
3 | 60601 | 43 | 2021-10-24T00:00:00.000 | 2021-10-30T00:00:00.000 | 22.0 | 1552.0 | 149.9 | 10575.8 | 1243.0 | 58296 | ... | 0.0 | 81.8 | 14675 | 60601-2021-43 | {'type': 'Point', 'coordinates': [-87.622844, ... | 42.0 | 38.0 | 14309.0 | 580.0 | 36.0 |
4 | 60601 | 24 | 2020-06-07T00:00:00.000 | 2020-06-13T00:00:00.000 | 6.0 | 78.0 | 40.9 | 531.5 | 106.0 | 782 | ... | 0.0 | 34.1 | 14675 | 60601-2020-24 | {'type': 'Point', 'coordinates': [-87.622844, ... | 42.0 | 38.0 | 14309.0 | 580.0 | 36.0 |
5 | 60601 | 26 | 2020-06-21T00:00:00.000 | 2020-06-27T00:00:00.000 | 4.0 | 88.0 | 27.3 | 599.7 | 175.0 | 1075 | ... | 0.0 | 34.1 | 14675 | 60601-2020-26 | {'type': 'Point', 'coordinates': [-87.622844, ... | 42.0 | 38.0 | 14309.0 | 580.0 | 36.0 |
6 | 60601 | 28 | 2020-07-05T00:00:00.000 | 2020-07-11T00:00:00.000 | 4.0 | 96.0 | 27.3 | 654.2 | 202.0 | 1439 | ... | 0.0 | 34.1 | 14675 | 60601-2020-28 | {'type': 'Point', 'coordinates': [-87.622844, ... | 42.0 | 38.0 | 14309.0 | 580.0 | 36.0 |
7 | 60601 | 39 | 2020-09-20T00:00:00.000 | 2020-09-26T00:00:00.000 | 9.0 | 213.0 | 61.3 | 1451.4 | 390.0 | 6044 | ... | 6.8 | 40.9 | 14675 | 60601-2020-39 | {'type': 'Point', 'coordinates': [-87.622844, ... | 42.0 | 38.0 | 14309.0 | 580.0 | 36.0 |
8 | 60601 | 40 | 2020-09-27T00:00:00.000 | 2020-10-03T00:00:00.000 | 5.0 | 218.0 | 34.1 | 1485.5 | 393.0 | 6437 | ... | 0.0 | 40.9 | 14675 | 60601-2020-40 | {'type': 'Point', 'coordinates': [-87.622844, ... | 42.0 | 38.0 | 14309.0 | 580.0 | 36.0 |
9 | 60601 | 41 | 2020-10-04T00:00:00.000 | 2020-10-10T00:00:00.000 | 7.0 | 225.0 | 47.7 | 1533.2 | 433.0 | 6870 | ... | 0.0 | 40.9 | 14675 | 60601-2020-41 | {'type': 'Point', 'coordinates': [-87.622844, ... | 42.0 | 38.0 | 14309.0 | 580.0 | 36.0 |
10 rows × 26 columns
Let’s check the number of rows and columns.
rawCOVID.shape
(10560, 26)
We’ll list the 26 column names
list(rawCOVID.columns)
['zip_code',
'week_number',
'week_start',
'week_end',
'cases_weekly',
'cases_cumulative',
'case_rate_weekly',
'case_rate_cumulative',
'tests_weekly',
'tests_cumulative',
'test_rate_weekly',
'test_rate_cumulative',
'percent_tested_positive_weekly',
'percent_tested_positive_cumulative',
'deaths_weekly',
'deaths_cumulative',
'death_rate_weekly',
'death_rate_cumulative',
'population',
'row_id',
'zip_code_location',
':@computed_region_rpca_8um6',
':@computed_region_vrxf_vc4k',
':@computed_region_6mkv_f3dw',
':@computed_region_bdys_3d7i',
':@computed_region_43wa_7qmu']
Q1 Does case matter when referring to a column name?
STEP TWO Let’s streamline the data to just 4 columns, drop rows with missing data, simplify the column names, and then display the first 20 rows.
COVID=rawCOVID[['deaths_cumulative', "population", 'tests_cumulative','zip_code']]
COVID.dropna #drop rows with missing data
COVID.columns=["deaths","population","tests","zip"] #simplify the column names
COVID.head(20)
deaths | population | tests | zip | |
---|---|---|---|---|
0 | 2 | 14675 | 198 | 60601 |
1 | 11 | 14675 | 23059 | 60601 |
2 | 5 | 14675 | 5654 | 60601 |
3 | 12 | 14675 | 58296 | 60601 |
4 | 5 | 14675 | 782 | 60601 |
5 | 5 | 14675 | 1075 | 60601 |
6 | 5 | 14675 | 1439 | 60601 |
7 | 6 | 14675 | 6044 | 60601 |
8 | 6 | 14675 | 6437 | 60601 |
9 | 6 | 14675 | 6870 | 60601 |
10 | 7 | 14675 | 7642 | 60601 |
11 | 7 | 14675 | 8222 | 60601 |
12 | 7 | 14675 | 8798 | 60601 |
13 | 7 | 14675 | 9377 | 60601 |
14 | 8 | 14675 | 10125 | 60601 |
15 | 8 | 14675 | 11129 | 60601 |
16 | 8 | 14675 | 11710 | 60601 |
17 | 9 | 14675 | 12417 | 60601 |
18 | 9 | 14675 | 13092 | 60601 |
19 | 10 | 14675 | 13973 | 60601 |
Q2 Why is there different information for the same zip code?
STEP THREE
Let’s check how many rows have data in each column
COVID.count()
deaths 10560
population 10560
tests 10560
zip 10560
dtype: int64
Q3 How many data rows are there in each column?
STEP FOUR
Let’s find out how many times each zip code appears.
COVID["zip"].value_counts()
60601 176
60602 176
60646 176
60647 176
60649 176
60651 176
60652 176
60653 176
60654 176
60655 176
60656 176
60657 176
60661 176
60614 176
60615 176
60617 176
60618 176
60827 176
60707 176
60666 176
60660 176
60643 176
60633 176
60624 176
60640 176
60625 176
60630 176
60659 176
60628 176
60645 176
60644 176
60610 176
60620 176
60603 176
60607 176
60608 176
60606 176
60604 176
60605 176
60611 176
60612 176
60609 176
60613 176
60616 176
60619 176
60621 176
60642 176
60622 176
60623 176
60626 176
60629 176
60631 176
60632 176
60636 176
60637 176
60634 176
60638 176
60639 176
60641 176
Unknown 176
Name: zip, dtype: int64
STEP FIVE
Let’s make a copy of the COVID dataframe and display the first 5 rows.
df1=COVID
df1.head(5)
deaths | population | tests | zip | |
---|---|---|---|---|
0 | 2 | 14675 | 198 | 60601 |
1 | 11 | 14675 | 23059 | 60601 |
2 | 5 | 14675 | 5654 | 60601 |
3 | 12 | 14675 | 58296 | 60601 |
4 | 5 | 14675 | 782 | 60601 |
STEP SIX
Filter data by a specific zip code, for example, ‘60611’
df2=df1[df1['zip']=='60611']
df2.head()
deaths | population | tests | zip | |
---|---|---|---|---|
214 | 3 | 32426 | 10229 | 60611 |
215 | 4 | 32426 | 11000 | 60611 |
216 | 4 | 32426 | 11777 | 60611 |
218 | 14 | 32426 | 45055 | 60611 |
229 | 14 | 32426 | 41668 | 60611 |
df2["zip"].value_counts()
60611 176
Name: zip, dtype: int64
STEP SEVEN
Let’s find the largest value in the ‘tests’ column for zip ‘60611’.
df2 = df1[df1["zip"]=='60637'] #get just rows with zip 60611
numtested=df2["tests"].max() #get the largest number for tests
numtested
413972
Q4 How can we find how many deaths due to COVID have occurred in zip ‘60623’? (Create a new dataframe for 60623 called temp)
Show code cell source
df3 = COVID[df1["zip"]=='60623'] #get just rows with zip 60611
numdeaths=df3["deaths"].max() #get the largest number for tests
numdeaths
334
Q5 How can we get the population in 60623?
Show code cell source
df3 = COVID[df1["zip"]=='60623'] #get just rows with zip 60611
pop=df3["population"].max() #get the largest number for tests
pop
df3 = COVID[df1["zip"]=='60623'] #get just rows with zip 60611
pop=df3["population"].max() #get the largest number for tests
pop
85979
STEP EIGHT We can instruct the computer to give us the number COVID tests for any Chicago zipcode.
def MyCOVID(COVID,zip):
alreadychecked=0 #eliminate duplication of information
for z in COVID.index: #go through all the index values
if COVID.loc[z,"zip"]==zip and alreadychecked==0: #found the zip we requested (first-time)
alreadychecked=1 #we will only do this once
df=COVID[COVID["zip"]==zip]
numtested=df["tests"].max()
print("Zip code: ", zip) #print zipcode
print("number tested is ", numtested) #print number tested
return ("Enter a different zip code if you wish.")
Q6 Test out the function on zip code ‘60610’
Show code cell source
MyCOVID(COVID,'60637')
Zip code: 60637
number tested is 413972
'Enter a different zip code if you wish.'
Exercises#
Exercises
Define a function myCOVID2() which outputs for each given zip code the population, number tested, and number of deaths.
Use your function myCOVID2() to determine the COVID data for each of the following Chicago landmarks:
a) North Park University (zip ‘60625’)
b) Wheaton in Chicago (zip=‘60637’)
Why is COVID disproportionately impacting black and brown communities?
2.4.3. Mapping Famous Chicagoans#
Note
In the section we will learn how to use Python to put information on a map of a geographical location.
Step One Let’s’ begin by importing data of some famous people born in Chicago.
import pandas as pd
Chi=pd.read_csv('chicagoans.csv')
Chi
Name | Age | Alive | Noted For | Place of Birth | Zip | Lat | Lon | fun fact | |
---|---|---|---|---|---|---|---|---|---|
0 | Michelle Obama | 56 | yes | First Lady | Provident Hospital | 60605 | 41.802665 | -87.613657 | father worked at city water plant |
1 | Robin Williams | 63 | no | Actor | St Luke's Hospital | 60605 | 41.863100 | -87.623100 | went to Julliard with Christopher Reeve |
2 | Walt Disney | 65 | no | Film Producer | 149 Tripp Ave | 60651 | 41.907879 | -87.732594 | won 22 oscars |
3 | Bobby Fischer | 64 | no | Chess | Michael Reese Hospital | 60606 | 41.884116 | -87.637656 | grandmaster at age 15 |
4 | Isiah Thomas | 59 | yes | Basketball | West Side | 60624 | 41.874827 | -87.710075 | youngest of 9 children |
Q1 How old was Bobby Fischer when he became a grandmaster?
STEP TWO Let’s use a Python library called folium to make a map of Chicago.
!pip install folium
Collecting folium
Downloading folium-0.14.0-py2.py3-none-any.whl (102 kB)
0.0/102.3 kB ? eta -:--:--
-------------------------------------- 102.3/102.3 kB 5.8 MB/s eta 0:00:00
Collecting branca>=0.6.0 (from folium)
Downloading branca-0.6.0-py3-none-any.whl (24 kB)
Requirement already satisfied: jinja2>=2.9 in c:\users\pisihara\appdata\local\anaconda3\lib\site-packages (from folium) (3.1.2)
Requirement already satisfied: numpy in c:\users\pisihara\appdata\local\anaconda3\lib\site-packages (from folium) (1.24.3)
Requirement already satisfied: requests in c:\users\pisihara\appdata\local\anaconda3\lib\site-packages (from folium) (2.29.0)
Requirement already satisfied: MarkupSafe>=2.0 in c:\users\pisihara\appdata\local\anaconda3\lib\site-packages (from jinja2>=2.9->folium) (2.1.1)
Requirement already satisfied: charset-normalizer<4,>=2 in c:\users\pisihara\appdata\local\anaconda3\lib\site-packages (from requests->folium) (2.0.4)
Requirement already satisfied: idna<4,>=2.5 in c:\users\pisihara\appdata\local\anaconda3\lib\site-packages (from requests->folium) (3.4)
Requirement already satisfied: urllib3<1.27,>=1.21.1 in c:\users\pisihara\appdata\local\anaconda3\lib\site-packages (from requests->folium) (1.26.16)
Requirement already satisfied: certifi>=2017.4.17 in c:\users\pisihara\appdata\local\anaconda3\lib\site-packages (from requests->folium) (2023.5.7)
Installing collected packages: branca, folium
Successfully installed branca-0.6.0 folium-0.14.0
import folium # map rendering library
from folium.features import DivIcon #used to add popup info to a map
Chicago_map = folium.Map(location=[41.886456, -87.62325], tiles="openstreetmap", zoom_start=10)
Chicago_map
Q2
Try adjusting the “zoom_start” value. What happens?
What happens if you change the numbers in location=[41.886456, -87.62325]”
STEP THREE Let’s add our data about famous Chicagoans to the map.
Chicago_map=folium.Map(location=[41.886456,-87.62325],tiles="openstreetmap",zoom_start=11)
for i in Chi.index:
p=[Chi.loc[i,"Lat"],Chi.loc[i,"Lon"]]
folium.Marker(p,icon=DivIcon(
icon_size=(100,0),
icon_anchor=(0,8),
html='<div style="font-size:20pt; color:red">'+str(Chi.loc[i,"Name"]) +'</div>',
)).add_to(Chicago_map)
Chicago_map.add_child(folium.CircleMarker(p, radius=1,color='black'))
Chicago_map.save("Chicagoans.html")
Chicago_map
Q3 What side of the city was Michelle Obama born?
STEP FOUR Let’s make a function which can add another person to our dataframe called Chi
def addperson(map_name,df,name,age,alive,noted,birth,zipcode,lat,lon,fact):
our_map=map_name
new_row = {'Name':name,
'Age':age,
'Alive':alive,
'Noted For':noted,
'Place of Birth':birth,
'Zip':zipcode,
'Lat':lat,
'Lon':lon,
'fun fact':fact }
df = pd.concat([df, pd.DataFrame([new_row])], ignore_index=True)
i=len(df)-1
p=[df.loc[i,"Lat"],df.loc[i,"Lon"]]
folium.Marker(p,icon=DivIcon(
icon_size=(100,0),
icon_anchor=(0,8),
html='<div style="font-size: 15pt; color : blue">'+str(df.loc[i,"Name"]) +'</div>',
)).add_to(our_map)
our_map.add_child(folium.CircleMarker(p, radius=1,color='blue'))
return df,our_map
Q4 What command is used to add a new line to the dataframe df?
STEP FIVE Let’s add Jenifer Hudson to our dataframe using the following info:
‘Jennifer Hudson’,38,‘yes’,‘Actress’,‘Englewood’,60621,41.779699,-87.633194,‘worked at Burger King’
[Chi,Chicago_map]=addperson(Chicago_map,Chi,'Jennifer Hudson','Actress',41,'yes','Englewood',60621,41.779699,-87.633194,'worked at Burger King')
Chicago_map
Q5 Where did Jennifer Hudson work before becoming famous?
STEP SIX Add your info to the map by editing the info in the next cell.
#answer to Step Six
STEP SEVEN Use the following two commands to save your map and info in a new Excel file:
Chicago_map.save(“Chicagoans.html”)
Chi.to_excel(‘ChiRevised.xlsx’)
Enter and run the commands in the cell below.
Q7 Did your map display correctly? Did the new Excel file include your name?
Discussion
Whom do you admire? Why so?
What character qualities can help someone to be successful in life?
What can help you to be more successful?
2.4.4. Predicting Exemplary Schools#
Note
This section is a friendly competition predict whether a K-8 Chicago school’s IL State Board of Education Summative Designation is designated ‘Exemplary’ (the ‘Non-Exemplary’ school categories are “Commendable”, “Targeted”, and “Comprehensive”).
The prediction will be based on the file MiddleSchool.xlsx with the following data:
Student Enrollment - Black or African American
Student Enrollment - Hispanic or Latino
Student Enrollment - Children with Disabilities
Student Enrollment - Low Income
Total Number of School Days
8th Grade Passing Algebra 1
Student Attendance Rate
Student Chronic Truancy Rate
Avg Class Size – All Grades
Teacher Retention Rate
Scoring
Scoring for our competition is based on values for the Confusion Matrix:
where
TP=True Positive: your model predicts exemplary and the school is exemplary
TN=True Negative: your model predicts not exemplary and the school is not exemplary
FP=False Positive: your model predicts exemplary but the school is not exemplary
FN=False Negative: your model predicts not exemplary but the school is exemplary
The number of each type of prediction then determines
Accuracy = \(\frac{\mid TP\mid + \mid TN \mid}{\mid TP\mid + \mid TN \mid+ \mid FP\mid + \mid FN \mid}\) (proportion that were correctly predicted out of all the schools)
Specificity (Precision) = \(\frac{\mid TN\mid}{\mid TN\mid + \mid FP\mid }\) (proportion that were correct out of those you predicted to be exemplary)
Sensitivity (Recall) = \(\frac{\mid TP\mid}{\mid TP\mid + \mid FN\mid }\) (proportion that you predicted correctly among just the exemplary schools)
Your competition (F1) score is the geometric mean of the precision (specificity), and recall(sensitivity):
STEP ONE Exploratory Data Analysis 1a) Import the usual libraries including matplotlib.pyplot as plt, as well as the MiddleSchool report card data into a datframe df.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
df=pd.read_excel("MiddleSchool.xlsx")
df.head(2)
School Name | School Type | Grades Served | Summative Designation | # Student Enrollment | % Student Enrollment - Black or African American | % Student Enrollment - Hispanic or Latino | % Student Enrollment - Children with Disabilities | % Student Enrollment - Low Income | Total Number of School Days | % 8th Grade Passing Algebra 1 | Student Attendance Rate | Student Chronic Truancy Rate | Avg Class Size – All Grades | Teacher Retention Rate | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Barbour Two-Way Lang Immersion | ELEMENTARY | K 1 2 3 4 5 6 7 8 | Commendable | 779 | 6.0 | 84.3 | 6.5 | 72.9 | 181 | 0.0 | 96.1 | 7.4 | 26.2 | 91.4 |
1 | Galapagos Rockford Charter Sch | CHARTER SCH | K 1 2 3 4 5 6 7 8 | Commendable | 343 | 60.1 | 18.4 | 16.6 | 44.3 | 181 | 0.0 | 95.8 | 29.6 | 19.1 | 44.2 |
1b) Display the column names and the number of data rows in each column.
df.count()
School Name 122
School Type 122
Grades Served 122
Summative Designation 122
# Student Enrollment 122
% Student Enrollment - Black or African American 122
% Student Enrollment - Hispanic or Latino 122
% Student Enrollment - Children with Disabilities 122
% Student Enrollment - Low Income 122
Total Number of School Days 122
% 8th Grade Passing Algebra 1 122
Student Attendance Rate 122
Student Chronic Truancy Rate 122
Avg Class Size – All Grades 122
Teacher Retention Rate 122
dtype: int64
1c Get the max and min values in each column.
n1min=df["# Student Enrollment"].min()
n1max=df["# Student Enrollment"].max()
n2min=df["% Student Enrollment - Black or African American"].min()
n2max=df["% Student Enrollment - Black or African American"].max()
n3min=df["% Student Enrollment - Hispanic or Latino"].min()
n3max=df["% Student Enrollment - Hispanic or Latino"].max()
n4min=df["% Student Enrollment - Children with Disabilities"].min()
n4max=df["% Student Enrollment - Children with Disabilities"].max()
n5min=df["% Student Enrollment - Low Income"].min()
n5max=df["% Student Enrollment - Low Income"].max()
n6min=df["Total Number of School Days"].min()
n6max=df["Total Number of School Days"].max()
n7min=df["% 8th Grade Passing Algebra 1"].min()
n7max=df["% 8th Grade Passing Algebra 1"].max()
n8min=df["Student Attendance Rate"].min()
n8max=df["Student Attendance Rate"].max()
n9min=df["Student Chronic Truancy Rate"].min()
n9max=df["Student Chronic Truancy Rate"].max()
n10min=df["Avg Class Size – All Grades"].min()
n10max=df["Avg Class Size – All Grades"].max()
n11min=df["Teacher Retention Rate"].min()
n11max=df["Teacher Retention Rate"].max()
print("min enroll",n1min)
print("max enroll",n1max)
print("min % Student Enrollment - Black or African American",n2min)
print("max % Student Enrollment - Black or African American",n2max)
print("min % Student Enrollment - Hispanic or Latino",n3min)
print("max % Student Enrollment - Hispanic or Latino",n3max)
print("min % Student Enrollment - Children with Disabilities",n4min)
print("max % Student Enrollment - Children with Disabilities",n4max)
print("min % Student Enrollment - Low Income",n5min)
print("max % Student Enrollment - Low Income",n5max)
print("min Total Number of School Days",n6min)
print("max Total Number of School Days",n6max)
print("min % 8th Grade Passing Algebra 1",n7min)
print("max % 8th Grade Passing Algebra 1",n7max)
print("min Student Attendance Rate",n8min)
print("max Student Attendance Rate",n8max)
print("min Student Chronic Truancy Rate",n9min)
print("max Student Chronic Truancy Rate",n9max)
print("Avg Class Size – All Grades",n10min)
print("Avg Class Size – All Grades",n10max)
print("Teacher Retention Rate",n11min)
print("Teacher Retention Rate",n11max)
min enroll 92
max enroll 1306
min % Student Enrollment - Black or African American 0.2
max % Student Enrollment - Black or African American 99.5
min % Student Enrollment - Hispanic or Latino 0.5
max % Student Enrollment - Hispanic or Latino 99.8
min % Student Enrollment - Children with Disabilities 5.8
max % Student Enrollment - Children with Disabilities 56.3
min % Student Enrollment - Low Income 0.3
max % Student Enrollment - Low Income 99.6
min Total Number of School Days 174
max Total Number of School Days 186
min % 8th Grade Passing Algebra 1 0.0
max % 8th Grade Passing Algebra 1 100.0
min Student Attendance Rate 91.1
max Student Attendance Rate 99.7
min Student Chronic Truancy Rate 0.0
max Student Chronic Truancy Rate 46.2
Avg Class Size – All Grades 2.0
Avg Class Size – All Grades 31.6
Teacher Retention Rate 0.0
Teacher Retention Rate 100.0
1d Check how many schools are in each category.
df["Summative Designation"].value_counts()
Commendable 102
Targeted 8
Exemplary 6
Comprehensive 6
Name: Summative Designation, dtype: int64
STEP TWO Define a function which predicts whether a school is exemplary (1) or not-exemplary (0).
2a For a simple prediction, let us predict that a school is exemplary if the Teacher Retention Rate is at least 90%.
#---PREDICTION MODEL----#
def mypredict(df):
for i in df.index:
if df.loc[i,"Teacher Retention Rate"]>90:
df.loc[i,"Prediction"]=1
else:
df.loc[i,"Prediction"]=0
return df
#---APPLY MODEL TO OUR DATA---#
mydf=mypredict(df)
mydf=mydf.reset_index(drop=True)
#---COMPUTE YOUR SCORE---#
TP=0
TN=0
FP=0
FN=0
numschools=0
for i in mydf.index:
if mydf.loc[i,"Prediction"]==1 and mydf.loc[i,"Summative Designation"]=="Exemplary":
TP=TP+1
if mydf.loc[i,"Prediction"]==0 and mydf.loc[i,"Summative Designation"]!="Exemplary":
TN=TN+1
if mydf.loc[i,"Prediction"]==1 and mydf.loc[i,"Summative Designation"]!="Exemplary":
FP=FP+1
if mydf.loc[i,"Prediction"]==0 and mydf.loc[i,"Summative Designation"]=="Exemplary":
FN=FN+1
numschools=numschools+1
print("|TP|=",TP)
print("|TN|=",TN)
print("|FP|=",FP)
print("|FN|=",FN)
accuracy=round((TP+TN)/numschools,2)
precision=round(TP/(TP+FP),2)
recall=round(TP/(TP+FN),2)
F1score=2*(precision*recall)/(precision+recall)
print("Accuracy (% correct all 122 schools)=",100*accuracy,"%")
print("Precision (% correct of those you predicted to be exemplary) =",100*precision,"%")
print("Recall (% correct of schools that are exemplary) =",100*recall,"%")
print('COMPETITION F1 SCORE=',round(F1score*100,2),"%" )
|TP|= 5
|TN|= 70
|FP|= 46
|FN|= 1
Accuracy (% correct all 122 schools)= 61.0 %
Precision (% correct of those you predicted to be exemplary) = 10.0 %
Recall (% correct of schools that are exemplary) = 83.0 %
COMPETITION F1 SCORE= 17.85 %
2.4.5. Assignment#
Assignment
Modify the Prediction Model to see how high you can score.