{ "cells": [ { "cell_type": "markdown", "id": "bd66cd07", "metadata": {}, "source": [ "# Glimpse of Chicago" ] }, { "cell_type": "markdown", "id": "7b71d00e", "metadata": {}, "source": [ ":::{admonition} Personal Note from the Author\n", "In the Summer of 2020, Hope Wood (Wheaton College Urban Studies staff), Demetrius Crawford (Wheaton College Urban Studies outreach program participant) and I (Wheaton College math professor) launched the idea of creating JNB labs as a tool to teach math, data analysis and STEAM to disadvantaged Chicago grade-school students. \n", "\n", "In Fall 2020, Mr. Jim Wilkes, principal of Cornerstone Academy, an alternative Christian High School serving mainly low to middle income students on the West Side of Chicago held the pilot offering of a JNB after-school program via Zoom. A class of Wheaton College student mentors worked online in small groups of 2 or 3 with Cornerstone students. Mr. Wilkes and I served as co-program directors, and Hope Wood and Antoinette Ratliff as program staff.\n", "\n", "The following JNBs were developed for use with 8th graders participating in an after-school program at a non-profit organization called Celestial Ministries. The first two years, the scholars' middle school math teacher brought them to to the program and participated in the class. I served as teacher. We are planning for the middle school teacher to take over the teaching responsibilities. \n", "\n", "Celestial Ministries serves single parent children in Lawndale neighborhood on the West Side of Chicago. Parents and friends were invite to a recognition ceremony in which students demonstrated various functionalities of JNBs which covered in the program. Students who successfully completed the program were given a certificate and a 100 dollar stipend.\n", ":::" ] }, { "cell_type": "markdown", "id": "9c0f0756", "metadata": {}, "source": [ ":::::{admonition} Word of Advice\n", ":class: tip\n", "There are likely opportunities to introduce Python Jupyter Notebooks in disadvantaged communities within proximity of your workplace or residence. Establishing a good working relationship with a well-established community organization and loal community math teacher is highly recommended as opposed to seeking to establish a program on your own.\n", ":::::" ] }, { "cell_type": "markdown", "id": "694e3080", "metadata": {}, "source": [ "```{index} function\n", "```" ] }, { "cell_type": "markdown", "id": "7015bcc4", "metadata": {}, "source": [ "### Michael Jordan's Greatest Scoring Game\n", ":::{note}\n", "In this section, we will introduce a very important programming tool called a function. \n", ":::" ] }, { "cell_type": "markdown", "id": "7e92afcc", "metadata": {}, "source": [ "__STEP#1__\n", "Watch the PowerPoint on this lesson.\n", "link to PPT" ] }, { "cell_type": "markdown", "id": "e4a329fd", "metadata": {}, "source": [ "__STEP #2__\n", "From the PowerPoint, a variable is like a b ____ . A variable has a n ______ . We can store some i ______ such as a number (eg. 3) or a message (eg. 'God is Good!') in a variable.\n", "Let's give names to two variables:\n", "\n", "* feet\n", "* inch\n", " \n", "In Michael Jordan's case, the info stored in the variable feet is the number 6 and the info stored in the variable inch is also equal to 6. For short, we just write feet=6 and inch=6\n", "Let's go around and each person state your name and what your value is for the variables feet and inch.\n", "\n", "(Answers: box, name, info)" ] }, { "cell_type": "markdown", "id": "4357ab78", "metadata": {}, "source": [ "__STEP #3__\n", "The word for a short computer program is a c ______. The programming language we are using is Py ______ n.\n", "In the next cell, we give Python code to create the function in STEP 2. (If it looks complicated, try to find someone to help explain it to you.) \n", "\n", "(Answers: code, Python)" ] }, { "cell_type": "code", "execution_count": 8, "id": "da6d100b", "metadata": {}, "outputs": [], "source": [ "def height(feet,inch):\n", " height_in_inches = 12*feet + inch \n", " return height_in_inches" ] }, { "cell_type": "code", "execution_count": 9, "id": "02b602e2", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "72" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "height(6,0) #Test on Seimone's height feet=6, inch=0" ] }, { "cell_type": "markdown", "id": "b8155ef1", "metadata": {}, "source": [ "__STEP #4__\n", "What do we have to do in the next cell to test our function on 6'6\" Michael Jordan?" ] }, { "cell_type": "code", "execution_count": null, "id": "e2868579", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "6ca75e65", "metadata": {}, "source": [ "__STEP #5__\n", "One of the fun parts of programming is that we can try to design a function to do almost anything we want. \n", "\n", "Define a function called points(fg,ft,tp) which computes the points scored by a basketball player given the input variables \n", "\n", "* fg= number of 2 point field goals made\n", "* ft=number of free throws made\n", "* tp=number of three point shots made\n", " \n", "Use a variable called pts to store the answer." ] }, { "cell_type": "markdown", "id": "4a80bde0", "metadata": {}, "source": [ "__STEP 6__\n", "In his highest scoring game, Michael Jordan made 21 two-point field goals, 21 free throws and 2 three point shots.\n", "Check that your function gives the correct value of 69 for his point total." ] }, { "cell_type": "code", "execution_count": null, "id": "c822a978", "metadata": {}, "outputs": [], "source": [ "#Supply the missing values for points(fg,ft,tp)\n" ] }, { "cell_type": "markdown", "id": "59b3547d", "metadata": {}, "source": [ ":::{admonition} For Discussion\n", "\n", "1. What would you like your mentors to explain about today's lesson?\n", "\n", "2. What is one of your favorite sports memories?\n", "\n", ":::" ] }, { "cell_type": "markdown", "id": "f3c40d7a", "metadata": {}, "source": [ "```{index} Covid-19\n", "```" ] }, { "cell_type": "markdown", "id": "8b261d47", "metadata": {}, "source": [ "### How Bad is COVID-19?" ] }, { "cell_type": "markdown", "id": "982dd982", "metadata": {}, "source": [ ":::{note}\n", "In this section we will analyze data imported directly from the Chicago Data Portal\n", "\n", "https://data.cityofchicago.org/\n", ":::" ] }, { "cell_type": "markdown", "id": "750bce4b", "metadata": {}, "source": [ "__STEP 1__\n", "Let's read in up to 100,000 rows of COVID data for Chicago from the Chicago Data Portal." ] }, { "cell_type": "code", "execution_count": 14, "id": "e04a37b8", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
zip_codeweek_numberweek_startweek_endcases_weeklycases_cumulativecase_rate_weeklycase_rate_cumulativetests_weeklytests_cumulative...death_rate_weeklydeath_rate_cumulativepopulationrow_idzip_code_location:@computed_region_rpca_8um6:@computed_region_vrxf_vc4k:@computed_region_6mkv_f3dw:@computed_region_bdys_3d7i:@computed_region_43wa_7qmu
060601172020-04-19T00:00:00.0002020-04-25T00:00:00.0007.045.047.7306.635.0198...6.813.61467560601-2020-17{'type': 'Point', 'coordinates': [-87.622844, ...42.038.014309.0580.036.0
160601122021-03-21T00:00:00.0002021-03-27T00:00:00.00029.01003.0197.66834.8616.023059...0.075.01467560601-2021-12{'type': 'Point', 'coordinates': [-87.622844, ...42.038.014309.0580.036.0
260601382020-09-13T00:00:00.0002020-09-19T00:00:00.0007.0204.047.71390.1325.05654...0.034.11467560601-2020-38{'type': 'Point', 'coordinates': [-87.622844, ...42.038.014309.0580.036.0
360601432021-10-24T00:00:00.0002021-10-30T00:00:00.00022.01552.0149.910575.81243.058296...0.081.81467560601-2021-43{'type': 'Point', 'coordinates': [-87.622844, ...42.038.014309.0580.036.0
460601242020-06-07T00:00:00.0002020-06-13T00:00:00.0006.078.040.9531.5106.0782...0.034.11467560601-2020-24{'type': 'Point', 'coordinates': [-87.622844, ...42.038.014309.0580.036.0
560601262020-06-21T00:00:00.0002020-06-27T00:00:00.0004.088.027.3599.7175.01075...0.034.11467560601-2020-26{'type': 'Point', 'coordinates': [-87.622844, ...42.038.014309.0580.036.0
660601282020-07-05T00:00:00.0002020-07-11T00:00:00.0004.096.027.3654.2202.01439...0.034.11467560601-2020-28{'type': 'Point', 'coordinates': [-87.622844, ...42.038.014309.0580.036.0
760601392020-09-20T00:00:00.0002020-09-26T00:00:00.0009.0213.061.31451.4390.06044...6.840.91467560601-2020-39{'type': 'Point', 'coordinates': [-87.622844, ...42.038.014309.0580.036.0
860601402020-09-27T00:00:00.0002020-10-03T00:00:00.0005.0218.034.11485.5393.06437...0.040.91467560601-2020-40{'type': 'Point', 'coordinates': [-87.622844, ...42.038.014309.0580.036.0
960601412020-10-04T00:00:00.0002020-10-10T00:00:00.0007.0225.047.71533.2433.06870...0.040.91467560601-2020-41{'type': 'Point', 'coordinates': [-87.622844, ...42.038.014309.0580.036.0
\n", "

10 rows × 26 columns

\n", "
" ], "text/plain": [ " zip_code week_number week_start week_end \\\n", "0 60601 17 2020-04-19T00:00:00.000 2020-04-25T00:00:00.000 \n", "1 60601 12 2021-03-21T00:00:00.000 2021-03-27T00:00:00.000 \n", "2 60601 38 2020-09-13T00:00:00.000 2020-09-19T00:00:00.000 \n", "3 60601 43 2021-10-24T00:00:00.000 2021-10-30T00:00:00.000 \n", "4 60601 24 2020-06-07T00:00:00.000 2020-06-13T00:00:00.000 \n", "5 60601 26 2020-06-21T00:00:00.000 2020-06-27T00:00:00.000 \n", "6 60601 28 2020-07-05T00:00:00.000 2020-07-11T00:00:00.000 \n", "7 60601 39 2020-09-20T00:00:00.000 2020-09-26T00:00:00.000 \n", "8 60601 40 2020-09-27T00:00:00.000 2020-10-03T00:00:00.000 \n", "9 60601 41 2020-10-04T00:00:00.000 2020-10-10T00:00:00.000 \n", "\n", " cases_weekly cases_cumulative case_rate_weekly case_rate_cumulative \\\n", "0 7.0 45.0 47.7 306.6 \n", "1 29.0 1003.0 197.6 6834.8 \n", "2 7.0 204.0 47.7 1390.1 \n", "3 22.0 1552.0 149.9 10575.8 \n", "4 6.0 78.0 40.9 531.5 \n", "5 4.0 88.0 27.3 599.7 \n", "6 4.0 96.0 27.3 654.2 \n", "7 9.0 213.0 61.3 1451.4 \n", "8 5.0 218.0 34.1 1485.5 \n", "9 7.0 225.0 47.7 1533.2 \n", "\n", " tests_weekly tests_cumulative ... death_rate_weekly \\\n", "0 35.0 198 ... 6.8 \n", "1 616.0 23059 ... 0.0 \n", "2 325.0 5654 ... 0.0 \n", "3 1243.0 58296 ... 0.0 \n", "4 106.0 782 ... 0.0 \n", "5 175.0 1075 ... 0.0 \n", "6 202.0 1439 ... 0.0 \n", "7 390.0 6044 ... 6.8 \n", "8 393.0 6437 ... 0.0 \n", "9 433.0 6870 ... 0.0 \n", "\n", " death_rate_cumulative population row_id \\\n", "0 13.6 14675 60601-2020-17 \n", "1 75.0 14675 60601-2021-12 \n", "2 34.1 14675 60601-2020-38 \n", "3 81.8 14675 60601-2021-43 \n", "4 34.1 14675 60601-2020-24 \n", "5 34.1 14675 60601-2020-26 \n", "6 34.1 14675 60601-2020-28 \n", "7 40.9 14675 60601-2020-39 \n", "8 40.9 14675 60601-2020-40 \n", "9 40.9 14675 60601-2020-41 \n", "\n", " zip_code_location \\\n", "0 {'type': 'Point', 'coordinates': [-87.622844, ... \n", "1 {'type': 'Point', 'coordinates': [-87.622844, ... \n", "2 {'type': 'Point', 'coordinates': [-87.622844, ... \n", "3 {'type': 'Point', 'coordinates': [-87.622844, ... \n", "4 {'type': 'Point', 'coordinates': [-87.622844, ... \n", "5 {'type': 'Point', 'coordinates': [-87.622844, ... \n", "6 {'type': 'Point', 'coordinates': [-87.622844, ... \n", "7 {'type': 'Point', 'coordinates': [-87.622844, ... \n", "8 {'type': 'Point', 'coordinates': [-87.622844, ... \n", "9 {'type': 'Point', 'coordinates': [-87.622844, ... \n", "\n", " :@computed_region_rpca_8um6 :@computed_region_vrxf_vc4k \\\n", "0 42.0 38.0 \n", "1 42.0 38.0 \n", "2 42.0 38.0 \n", "3 42.0 38.0 \n", "4 42.0 38.0 \n", "5 42.0 38.0 \n", "6 42.0 38.0 \n", "7 42.0 38.0 \n", "8 42.0 38.0 \n", "9 42.0 38.0 \n", "\n", " :@computed_region_6mkv_f3dw :@computed_region_bdys_3d7i \\\n", "0 14309.0 580.0 \n", "1 14309.0 580.0 \n", "2 14309.0 580.0 \n", "3 14309.0 580.0 \n", "4 14309.0 580.0 \n", "5 14309.0 580.0 \n", "6 14309.0 580.0 \n", "7 14309.0 580.0 \n", "8 14309.0 580.0 \n", "9 14309.0 580.0 \n", "\n", " :@computed_region_43wa_7qmu \n", "0 36.0 \n", "1 36.0 \n", "2 36.0 \n", "3 36.0 \n", "4 36.0 \n", "5 36.0 \n", "6 36.0 \n", "7 36.0 \n", "8 36.0 \n", "9 36.0 \n", "\n", "[10 rows x 26 columns]" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import pandas as pd\n", "import numpy as np\n", "rawCOVID = pd.read_json('https://data.cityofchicago.org/resource/yhhz-zm2v.json?$limit=100000') #Import data directly from data portal\n", "rawCOVID.head(10)" ] }, { "cell_type": "markdown", "id": "409fce9f", "metadata": {}, "source": [ "Let's check the number of rows and columns." ] }, { "cell_type": "code", "execution_count": 15, "id": "f1409c64", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(10560, 26)" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "rawCOVID.shape" ] }, { "cell_type": "markdown", "id": "b224e1e7", "metadata": {}, "source": [ "We'll list the 26 column names" ] }, { "cell_type": "code", "execution_count": 16, "id": "1b289ff0", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['zip_code',\n", " 'week_number',\n", " 'week_start',\n", " 'week_end',\n", " 'cases_weekly',\n", " 'cases_cumulative',\n", " 'case_rate_weekly',\n", " 'case_rate_cumulative',\n", " 'tests_weekly',\n", " 'tests_cumulative',\n", " 'test_rate_weekly',\n", " 'test_rate_cumulative',\n", " 'percent_tested_positive_weekly',\n", " 'percent_tested_positive_cumulative',\n", " 'deaths_weekly',\n", " 'deaths_cumulative',\n", " 'death_rate_weekly',\n", " 'death_rate_cumulative',\n", " 'population',\n", " 'row_id',\n", " 'zip_code_location',\n", " ':@computed_region_rpca_8um6',\n", " ':@computed_region_vrxf_vc4k',\n", " ':@computed_region_6mkv_f3dw',\n", " ':@computed_region_bdys_3d7i',\n", " ':@computed_region_43wa_7qmu']" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "list(rawCOVID.columns)" ] }, { "cell_type": "markdown", "id": "cee7840b", "metadata": {}, "source": [ "__Q1__ Does case matter when referring to a column name?\n", "\n", "__STEP TWO__\n", "Let's streamline the data to just 4 columns, drop rows with missing data, simplify the column names, and then display the first 20 rows." ] }, { "cell_type": "code", "execution_count": 17, "id": "111c25d8", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
deathspopulationtestszip
021467519860601
111146752305960601
2514675565460601
312146755829660601
451467578260601
5514675107560601
6514675143960601
7614675604460601
8614675643760601
9614675687060601
10714675764260601
11714675822260601
12714675879860601
13714675937760601
148146751012560601
158146751112960601
168146751171060601
179146751241760601
189146751309260601
1910146751397360601
\n", "
" ], "text/plain": [ " deaths population tests zip\n", "0 2 14675 198 60601\n", "1 11 14675 23059 60601\n", "2 5 14675 5654 60601\n", "3 12 14675 58296 60601\n", "4 5 14675 782 60601\n", "5 5 14675 1075 60601\n", "6 5 14675 1439 60601\n", "7 6 14675 6044 60601\n", "8 6 14675 6437 60601\n", "9 6 14675 6870 60601\n", "10 7 14675 7642 60601\n", "11 7 14675 8222 60601\n", "12 7 14675 8798 60601\n", "13 7 14675 9377 60601\n", "14 8 14675 10125 60601\n", "15 8 14675 11129 60601\n", "16 8 14675 11710 60601\n", "17 9 14675 12417 60601\n", "18 9 14675 13092 60601\n", "19 10 14675 13973 60601" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "COVID=rawCOVID[['deaths_cumulative', \"population\", 'tests_cumulative','zip_code']]\n", "COVID.dropna #drop rows with missing data\n", "COVID.columns=[\"deaths\",\"population\",\"tests\",\"zip\"] #simplify the column names\n", "COVID.head(20)" ] }, { "cell_type": "markdown", "id": "f6fa66c6", "metadata": {}, "source": [ "__Q2__ Why is there different information for the same zip code?\n", "\n", "__STEP THREE__\n", "\n", "Let's check how many rows have data in each column" ] }, { "cell_type": "code", "execution_count": 18, "id": "58665fff", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "deaths 10560\n", "population 10560\n", "tests 10560\n", "zip 10560\n", "dtype: int64" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "COVID.count()" ] }, { "cell_type": "markdown", "id": "64cb59b3", "metadata": {}, "source": [ "__Q3__ How many data rows are there in each column?\n", "\n", "__STEP FOUR__\n", "\n", "Let's find out how many times each zip code appears." ] }, { "cell_type": "code", "execution_count": 20, "id": "d7145418", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "60601 176\n", "60602 176\n", "60646 176\n", "60647 176\n", "60649 176\n", "60651 176\n", "60652 176\n", "60653 176\n", "60654 176\n", "60655 176\n", "60656 176\n", "60657 176\n", "60661 176\n", "60614 176\n", "60615 176\n", "60617 176\n", "60618 176\n", "60827 176\n", "60707 176\n", "60666 176\n", "60660 176\n", "60643 176\n", "60633 176\n", "60624 176\n", "60640 176\n", "60625 176\n", "60630 176\n", "60659 176\n", "60628 176\n", "60645 176\n", "60644 176\n", "60610 176\n", "60620 176\n", "60603 176\n", "60607 176\n", "60608 176\n", "60606 176\n", "60604 176\n", "60605 176\n", "60611 176\n", "60612 176\n", "60609 176\n", "60613 176\n", "60616 176\n", "60619 176\n", "60621 176\n", "60642 176\n", "60622 176\n", "60623 176\n", "60626 176\n", "60629 176\n", "60631 176\n", "60632 176\n", "60636 176\n", "60637 176\n", "60634 176\n", "60638 176\n", "60639 176\n", "60641 176\n", "Unknown 176\n", "Name: zip, dtype: int64" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "COVID[\"zip\"].value_counts()" ] }, { "cell_type": "markdown", "id": "6f86b4d6", "metadata": {}, "source": [ "__STEP FIVE__\n", "\n", "Let's make a copy of the COVID dataframe and display the first 5 rows." ] }, { "cell_type": "code", "execution_count": 21, "id": "8ef563c0", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
deathspopulationtestszip
021467519860601
111146752305960601
2514675565460601
312146755829660601
451467578260601
\n", "
" ], "text/plain": [ " deaths population tests zip\n", "0 2 14675 198 60601\n", "1 11 14675 23059 60601\n", "2 5 14675 5654 60601\n", "3 12 14675 58296 60601\n", "4 5 14675 782 60601" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df1=COVID\n", "df1.head(5)" ] }, { "cell_type": "markdown", "id": "3e535252", "metadata": {}, "source": [ "__STEP SIX__\n", "\n", "Filter data by a specific zip code, for example, '60611'" ] }, { "cell_type": "code", "execution_count": 22, "id": "4d202f14", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
deathspopulationtestszip
2143324261022960611
2154324261100060611
2164324261177760611
21814324264505560611
22914324264166860611
\n", "
" ], "text/plain": [ " deaths population tests zip\n", "214 3 32426 10229 60611\n", "215 4 32426 11000 60611\n", "216 4 32426 11777 60611\n", "218 14 32426 45055 60611\n", "229 14 32426 41668 60611" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df2=df1[df1['zip']=='60611']\n", "df2.head()" ] }, { "cell_type": "code", "execution_count": 23, "id": "0fc0deff", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "60611 176\n", "Name: zip, dtype: int64" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df2[\"zip\"].value_counts()" ] }, { "cell_type": "markdown", "id": "559a87b5", "metadata": {}, "source": [ "__STEP SEVEN__\n", "\n", "Let's find the largest value in the 'tests' column for zip '60611'." ] }, { "cell_type": "code", "execution_count": 24, "id": "fde40a4e", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "413972" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df2 = df1[df1[\"zip\"]=='60637'] #get just rows with zip 60611\n", "numtested=df2[\"tests\"].max() #get the largest number for tests\n", "numtested" ] }, { "cell_type": "markdown", "id": "bf55ea86", "metadata": {}, "source": [ "__Q4__\n", "How can we find how many deaths due to COVID have occurred in zip '60623'? (Create a new dataframe for 60623 called temp)" ] }, { "cell_type": "code", "execution_count": 25, "id": "35a95df5", "metadata": { "tags": [ "hide-input" ] }, "outputs": [ { "data": { "text/plain": [ "334" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df3 = COVID[df1[\"zip\"]=='60623'] #get just rows with zip 60611\n", "numdeaths=df3[\"deaths\"].max() #get the largest number for tests\n", "numdeaths" ] }, { "cell_type": "markdown", "id": "eb2e1976", "metadata": {}, "source": [ "__Q5__ How can we get the population in 60623?" ] }, { "cell_type": "code", "execution_count": null, "id": "6e286b10", "metadata": { "tags": [ "hide-input" ] }, "outputs": [], "source": [ "df3 = COVID[df1[\"zip\"]=='60623'] #get just rows with zip 60611\n", "pop=df3[\"population\"].max() #get the largest number for tests\n", "pop\n" ] }, { "cell_type": "code", "execution_count": 26, "id": "3ca60212", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "85979" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df3 = COVID[df1[\"zip\"]=='60623'] #get just rows with zip 60611\n", "pop=df3[\"population\"].max() #get the largest number for tests\n", "pop\n" ] }, { "cell_type": "markdown", "id": "d2c61ab9", "metadata": {}, "source": [ "__STEP EIGHT__\n", "We can instruct the computer to give us the number COVID tests for any Chicago zipcode. " ] }, { "cell_type": "code", "execution_count": 27, "id": "fd06597e", "metadata": {}, "outputs": [], "source": [ "def MyCOVID(COVID,zip):\n", " alreadychecked=0 #eliminate duplication of information\n", " for z in COVID.index: #go through all the index values\n", " if COVID.loc[z,\"zip\"]==zip and alreadychecked==0: #found the zip we requested (first-time)\n", " alreadychecked=1 #we will only do this once\n", " df=COVID[COVID[\"zip\"]==zip]\n", " numtested=df[\"tests\"].max()\n", " print(\"Zip code: \", zip) #print zipcode\n", " print(\"number tested is \", numtested) #print number tested\n", " return (\"Enter a different zip code if you wish.\")" ] }, { "cell_type": "markdown", "id": "0dff1c91", "metadata": {}, "source": [ "__Q6__ Test out the function on zip code '60610'" ] }, { "cell_type": "code", "execution_count": 28, "id": "7734ea28", "metadata": { "tags": [ "hide-input" ] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Zip code: 60637\n", "number tested is 413972\n" ] }, { "data": { "text/plain": [ "'Enter a different zip code if you wish.'" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "MyCOVID(COVID,'60637')" ] }, { "cell_type": "markdown", "id": "4f1481ed", "metadata": {}, "source": [ "#### Exercises" ] }, { "cell_type": "markdown", "id": "ad7e7b9b", "metadata": {}, "source": [ ":::{admonition} Exercises\n", "1. Define a function myCOVID2() which outputs for each given zip code the population, number tested, and number of deaths. \n", "\n", "2. Use your function myCOVID2() to determine the COVID data for each of the following Chicago landmarks:\n", "\n", "a) North Park University (zip '60625')\n", "\n", "b) Wheaton in Chicago (zip='60637')\n", "\n", "3. Why is COVID disproportionately impacting black and brown communities?\n", "\n", ":::" ] }, { "cell_type": "markdown", "id": "173e2c1f", "metadata": {}, "source": [ "```{index} map\n", "```\n", "\n" ] }, { "cell_type": "markdown", "id": "2c870b3e", "metadata": {}, "source": [ "### Mapping Famous Chicagoans" ] }, { "cell_type": "markdown", "id": "3271c6cc", "metadata": {}, "source": [ ":::{note}\n", "In the section we will learn how to use Python to put information on a map of a geographical location.\n", ":::" ] }, { "cell_type": "markdown", "id": "62281a10", "metadata": {}, "source": [ "__Step One__\n", "Let's' begin by importing data of some famous people born in Chicago." ] }, { "cell_type": "code", "execution_count": 3, "id": "bfd0e1d6", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
NameAgeAliveNoted ForPlace of BirthZipLatLonfun fact
0Michelle Obama56yesFirst LadyProvident Hospital6060541.802665-87.613657father worked at city water plant
1Robin Williams63noActorSt Luke's Hospital6060541.863100-87.623100went to Julliard with Christopher Reeve
2Walt Disney65noFilm Producer149 Tripp Ave6065141.907879-87.732594won 22 oscars
3Bobby Fischer64noChessMichael Reese Hospital6060641.884116-87.637656grandmaster at age 15
4Isiah Thomas59yesBasketballWest Side6062441.874827-87.710075youngest of 9 children
\n", "
" ], "text/plain": [ " Name Age Alive Noted For Place of Birth Zip \\\n", "0 Michelle Obama 56 yes First Lady Provident Hospital 60605 \n", "1 Robin Williams 63 no Actor St Luke's Hospital 60605 \n", "2 Walt Disney 65 no Film Producer 149 Tripp Ave 60651 \n", "3 Bobby Fischer 64 no Chess Michael Reese Hospital 60606 \n", "4 Isiah Thomas 59 yes Basketball West Side 60624 \n", "\n", " Lat Lon fun fact \n", "0 41.802665 -87.613657 father worked at city water plant \n", "1 41.863100 -87.623100 went to Julliard with Christopher Reeve \n", "2 41.907879 -87.732594 won 22 oscars \n", "3 41.884116 -87.637656 grandmaster at age 15 \n", "4 41.874827 -87.710075 youngest of 9 children " ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import pandas as pd\n", "Chi=pd.read_csv('chicagoans.csv')\n", "Chi" ] }, { "cell_type": "markdown", "id": "d2c77241", "metadata": {}, "source": [ "__Q1__ How old was Bobby Fischer when he became a grandmaster?\n", "\n", "__STEP TWO__\n", "Let's use a Python library called folium to make a map of Chicago." ] }, { "cell_type": "markdown", "id": "7fa75d3f", "metadata": {}, "source": [ "```{index} folium\n", "```" ] }, { "cell_type": "code", "execution_count": 3, "id": "5035812d", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Collecting folium\n", " Downloading folium-0.14.0-py2.py3-none-any.whl (102 kB)\n", " 0.0/102.3 kB ? eta -:--:--\n", " -------------------------------------- 102.3/102.3 kB 5.8 MB/s eta 0:00:00\n", "Collecting branca>=0.6.0 (from folium)\n", " Downloading branca-0.6.0-py3-none-any.whl (24 kB)\n", "Requirement already satisfied: jinja2>=2.9 in c:\\users\\pisihara\\appdata\\local\\anaconda3\\lib\\site-packages (from folium) (3.1.2)\n", "Requirement already satisfied: numpy in c:\\users\\pisihara\\appdata\\local\\anaconda3\\lib\\site-packages (from folium) (1.24.3)\n", "Requirement already satisfied: requests in c:\\users\\pisihara\\appdata\\local\\anaconda3\\lib\\site-packages (from folium) (2.29.0)\n", "Requirement already satisfied: MarkupSafe>=2.0 in c:\\users\\pisihara\\appdata\\local\\anaconda3\\lib\\site-packages (from jinja2>=2.9->folium) (2.1.1)\n", "Requirement already satisfied: charset-normalizer<4,>=2 in c:\\users\\pisihara\\appdata\\local\\anaconda3\\lib\\site-packages (from requests->folium) (2.0.4)\n", "Requirement already satisfied: idna<4,>=2.5 in c:\\users\\pisihara\\appdata\\local\\anaconda3\\lib\\site-packages (from requests->folium) (3.4)\n", "Requirement already satisfied: urllib3<1.27,>=1.21.1 in c:\\users\\pisihara\\appdata\\local\\anaconda3\\lib\\site-packages (from requests->folium) (1.26.16)\n", "Requirement already satisfied: certifi>=2017.4.17 in c:\\users\\pisihara\\appdata\\local\\anaconda3\\lib\\site-packages (from requests->folium) (2023.5.7)\n", "Installing collected packages: branca, folium\n", "Successfully installed branca-0.6.0 folium-0.14.0\n" ] } ], "source": [ "!pip install folium\n" ] }, { "cell_type": "code", "execution_count": 1, "id": "c9d938fa", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
Make this Notebook Trusted to load map: File -> Trust Notebook
" ], "text/plain": [ "" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import folium # map rendering library\n", "from folium.features import DivIcon #used to add popup info to a map\n", "Chicago_map = folium.Map(location=[41.886456, -87.62325], tiles=\"openstreetmap\", zoom_start=10)\n", "Chicago_map" ] }, { "cell_type": "markdown", "id": "bcd16688", "metadata": {}, "source": [ "__Q2__\n", "\n", "Try adjusting the \"zoom_start\" value. What happens?\n", "\n", "What happens if you change the numbers in location=[41.886456, -87.62325]\"" ] }, { "cell_type": "markdown", "id": "ad09281c", "metadata": {}, "source": [ "__STEP THREE__\n", "Let's add our data about famous Chicagoans to the map." ] }, { "cell_type": "code", "execution_count": 8, "id": "06ce6283", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
Make this Notebook Trusted to load map: File -> Trust Notebook
" ], "text/plain": [ "" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "Chicago_map=folium.Map(location=[41.886456,-87.62325],tiles=\"openstreetmap\",zoom_start=11)\n", "for i in Chi.index:\n", " p=[Chi.loc[i,\"Lat\"],Chi.loc[i,\"Lon\"]]\n", " folium.Marker(p,icon=DivIcon(\n", " icon_size=(100,0),\n", " icon_anchor=(0,8),\n", " html='
'+str(Chi.loc[i,\"Name\"]) +'
',\n", " )).add_to(Chicago_map)\n", " Chicago_map.add_child(folium.CircleMarker(p, radius=1,color='black'))\n", "Chicago_map.save(\"Chicagoans.html\")\n", "Chicago_map" ] }, { "cell_type": "markdown", "id": "5619656a", "metadata": {}, "source": [ "__Q3__ What side of the city was Michelle Obama born?\n", "\n", "__STEP FOUR__\n", "Let's make a function which can add another person to our dataframe called Chi " ] }, { "cell_type": "code", "execution_count": 9, "id": "ea2a3352", "metadata": {}, "outputs": [], "source": [ "def addperson(map_name,df,name,age,alive,noted,birth,zipcode,lat,lon,fact):\n", " our_map=map_name \n", " new_row = {'Name':name, \n", " 'Age':age, \n", " 'Alive':alive,\n", " 'Noted For':noted,\n", " 'Place of Birth':birth,\n", " 'Zip':zipcode,\n", " 'Lat':lat,\n", " 'Lon':lon,\n", " 'fun fact':fact }\n", " df = pd.concat([df, pd.DataFrame([new_row])], ignore_index=True)\n", " i=len(df)-1\n", " p=[df.loc[i,\"Lat\"],df.loc[i,\"Lon\"]]\n", " folium.Marker(p,icon=DivIcon(\n", " icon_size=(100,0),\n", " icon_anchor=(0,8),\n", " html='
'+str(df.loc[i,\"Name\"]) +'
',\n", " )).add_to(our_map)\n", " our_map.add_child(folium.CircleMarker(p, radius=1,color='blue'))\n", " return df,our_map" ] }, { "cell_type": "markdown", "id": "ac314e09", "metadata": {}, "source": [ "__Q4__ What command is used to add a new line to the dataframe df?\n", "\n", "__STEP FIVE__\n", "Let's add Jenifer Hudson to our dataframe using the following info:\n", "\n", "'Jennifer Hudson',38,'yes','Actress','Englewood',60621,41.779699,-87.633194,'worked at Burger King'\n" ] }, { "cell_type": "code", "execution_count": 10, "id": "cef0aa52", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
Make this Notebook Trusted to load map: File -> Trust Notebook
" ], "text/plain": [ "" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "[Chi,Chicago_map]=addperson(Chicago_map,Chi,'Jennifer Hudson','Actress',41,'yes','Englewood',60621,41.779699,-87.633194,'worked at Burger King')\n", "Chicago_map" ] }, { "cell_type": "markdown", "id": "f089fdb8", "metadata": {}, "source": [ "__Q5__ Where did Jennifer Hudson work before becoming famous?\n", "\n", "__STEP SIX__\n", "Add your info to the map by editing the info in the next cell.\n" ] }, { "cell_type": "code", "execution_count": null, "id": "af428909", "metadata": {}, "outputs": [], "source": [ "#answer to Step Six\n" ] }, { "cell_type": "markdown", "id": "d98a308b", "metadata": {}, "source": [ "__STEP SEVEN__\n", "Use the following two commands to save your map and info in a new Excel file:\n", "\n", "Chicago_map.save(\"Chicagoans.html\")\n", "\n", "Chi.to_excel('ChiRevised.xlsx')\n", "\n", "Enter and run the commands in the cell below." ] }, { "cell_type": "markdown", "id": "c347166d", "metadata": {}, "source": [ "__Q7__ Did your map display correctly? Did the new Excel file include your name?\n", "\n" ] }, { "cell_type": "markdown", "id": "3b28c35d", "metadata": {}, "source": [ ":::{admonition} Discussion\n", "\n", "\n", "1) Whom do you admire? Why so?\n", "\n", "2) What character qualities can help someone to be successful in life?\n", "\n", "3) What can help you to be more successful?\n", "\n", "\n", ":::" ] }, { "cell_type": "markdown", "id": "0fd326b6", "metadata": {}, "source": [ "## Predicting Exemplary Schools" ] }, { "cell_type": "markdown", "id": "e5e9d99a", "metadata": {}, "source": [ ":::{note}\n", "This section is a friendly competition predict whether a K-8 Chicago school's IL State Board of Education Summative Designation is designated 'Exemplary' (the 'Non-Exemplary' school categories are \"Commendable\", \"Targeted\", and \"Comprehensive\").\n", "\n", "The prediction will be based on the file MiddleSchool.xlsx with the following data:\n", " \n", "* Student Enrollment - Black or African American \n", "* Student Enrollment - Hispanic or Latino \n", "* Student Enrollment - Children with Disabilities \n", "* Student Enrollment - Low Income \n", "* Total Number of School Days \n", "* 8th Grade Passing Algebra 1 \n", "* Student Attendance Rate \n", "* Student Chronic Truancy Rate \n", "* Avg Class Size – All Grades \n", "* Teacher Retention Rate\n", ":::" ] }, { "cell_type": "markdown", "id": "120a6c0c", "metadata": {}, "source": [ "```{index} true positive\n", "```\n", "\n", "```{index} false positive\n", "```\n", "\n", "```{index} true negative\n", "```\n", "\n", "```{index} false negative\n", "```\n", "\n", "```{index} true accuracy\n", "```\n", "\n", "```{index} specificity\n", "```\n", "\n", "```{index} precision\n", "```\n", "\n", "```{index} sensitivity\n", "```\n", "\n", "```{index} recall\n", "```\n", "\n", "\n", "\n", "\n" ] }, { "cell_type": "markdown", "id": "5bd890ae", "metadata": {}, "source": [ "__Scoring__\n", "\n", "Scoring for our competition is based on values for the __Confusion Matrix__:\n", "\n", "$$\n", "\\begin{pmatrix}\n", "TP & FN \\\\\n", "FP & TN \n", "\\end{pmatrix}\n", "$$\n", "\n", "where\n", "\n", "* TP=True Positive: your model predicts exemplary and the school is exemplary\n", "* TN=True Negative: your model predicts not exemplary and the school is not exemplary\n", "* FP=False Positive: your model predicts exemplary but the school is not exemplary\n", "* FN=False Negative: your model predicts not exemplary but the school is exemplary\n", " \n", " The number of each type of prediction then determines\n", " \n", "* **Accuracy** = $\\frac{\\mid TP\\mid + \\mid TN \\mid}{\\mid TP\\mid + \\mid TN \\mid+ \\mid FP\\mid + \\mid FN \\mid}$ (proportion that were correctly predicted out of all the schools)\n", " \n", "* **Specificity (Precision)** = $\\frac{\\mid TN\\mid}{\\mid TN\\mid + \\mid FP\\mid }$ (proportion that were correct out of those you predicted to be exemplary)\n", " \n", "* **Sensitivity (Recall)** = $\\frac{\\mid TP\\mid}{\\mid TP\\mid + \\mid FN\\mid }$ (proportion that you predicted correctly among just the exemplary schools)\n", " \n", "Your competition (F1) score is the geometric mean of the precision (specificity), and recall(sensitivity):\n", " \n", "$$\n", "F_1=2\\frac{precision*recall}{precision+recall}\n", "$$" ] }, { "cell_type": "markdown", "id": "c0788e11", "metadata": {}, "source": [ "__STEP ONE__ Exploratory Data Analysis\n", "1a) Import the usual libraries including matplotlib.pyplot as plt, as well as the MiddleSchool report card data into a datframe df." ] }, { "cell_type": "code", "execution_count": 12, "id": "b3cff453", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
School NameSchool TypeGrades ServedSummative Designation# Student Enrollment% Student Enrollment - Black or African American% Student Enrollment - Hispanic or Latino% Student Enrollment - Children with Disabilities% Student Enrollment - Low IncomeTotal Number of School Days% 8th Grade Passing Algebra 1Student Attendance RateStudent Chronic Truancy RateAvg Class Size – All GradesTeacher Retention Rate
0Barbour Two-Way Lang ImmersionELEMENTARYK 1 2 3 4 5 6 7 8Commendable7796.084.36.572.91810.096.17.426.291.4
1Galapagos Rockford Charter SchCHARTER SCHK 1 2 3 4 5 6 7 8Commendable34360.118.416.644.31810.095.829.619.144.2
\n", "
" ], "text/plain": [ " School Name School Type Grades Served \\\n", "0 Barbour Two-Way Lang Immersion ELEMENTARY K 1 2 3 4 5 6 7 8 \n", "1 Galapagos Rockford Charter Sch CHARTER SCH K 1 2 3 4 5 6 7 8 \n", "\n", " Summative Designation # Student Enrollment \\\n", "0 Commendable 779 \n", "1 Commendable 343 \n", "\n", " % Student Enrollment - Black or African American \\\n", "0 6.0 \n", "1 60.1 \n", "\n", " % Student Enrollment - Hispanic or Latino \\\n", "0 84.3 \n", "1 18.4 \n", "\n", " % Student Enrollment - Children with Disabilities \\\n", "0 6.5 \n", "1 16.6 \n", "\n", " % Student Enrollment - Low Income Total Number of School Days \\\n", "0 72.9 181 \n", "1 44.3 181 \n", "\n", " % 8th Grade Passing Algebra 1 Student Attendance Rate \\\n", "0 0.0 96.1 \n", "1 0.0 95.8 \n", "\n", " Student Chronic Truancy Rate Avg Class Size – All Grades \\\n", "0 7.4 26.2 \n", "1 29.6 19.1 \n", "\n", " Teacher Retention Rate \n", "0 91.4 \n", "1 44.2 " ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import pandas as pd\n", "import numpy as np\n", "import matplotlib.pyplot as plt\n", "df=pd.read_excel(\"MiddleSchool.xlsx\")\n", "df.head(2)" ] }, { "cell_type": "markdown", "id": "e36fb03d", "metadata": {}, "source": [ "__1b)__ Display the column names and the number of data rows in each column." ] }, { "cell_type": "code", "execution_count": 13, "id": "97eb992f", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "School Name 122\n", "School Type 122\n", "Grades Served 122\n", "Summative Designation 122\n", "# Student Enrollment 122\n", "% Student Enrollment - Black or African American 122\n", "% Student Enrollment - Hispanic or Latino 122\n", "% Student Enrollment - Children with Disabilities 122\n", "% Student Enrollment - Low Income 122\n", "Total Number of School Days 122\n", "% 8th Grade Passing Algebra 1 122\n", "Student Attendance Rate 122\n", "Student Chronic Truancy Rate 122\n", "Avg Class Size – All Grades 122\n", "Teacher Retention Rate 122\n", "dtype: int64" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.count()" ] }, { "cell_type": "markdown", "id": "7e5cb180", "metadata": {}, "source": [ "__1c__ Get the max and min values in each column." ] }, { "cell_type": "code", "execution_count": 15, "id": "60e522ee", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "min enroll 92\n", "max enroll 1306\n", "min % Student Enrollment - Black or African American 0.2\n", "max % Student Enrollment - Black or African American 99.5\n", "min % Student Enrollment - Hispanic or Latino 0.5\n", "max % Student Enrollment - Hispanic or Latino 99.8\n", "min % Student Enrollment - Children with Disabilities 5.8\n", "max % Student Enrollment - Children with Disabilities 56.3\n", "min % Student Enrollment - Low Income 0.3\n", "max % Student Enrollment - Low Income 99.6\n", "min Total Number of School Days 174\n", "max Total Number of School Days 186\n", "min % 8th Grade Passing Algebra 1 0.0\n", "max % 8th Grade Passing Algebra 1 100.0\n", "min Student Attendance Rate 91.1\n", "max Student Attendance Rate 99.7\n", "min Student Chronic Truancy Rate 0.0\n", "max Student Chronic Truancy Rate 46.2\n", "Avg Class Size – All Grades 2.0\n", "Avg Class Size – All Grades 31.6\n", "Teacher Retention Rate 0.0\n", "Teacher Retention Rate 100.0\n" ] } ], "source": [ "n1min=df[\"# Student Enrollment\"].min()\n", "n1max=df[\"# Student Enrollment\"].max()\n", "n2min=df[\"% Student Enrollment - Black or African American\"].min()\n", "n2max=df[\"% Student Enrollment - Black or African American\"].max()\n", "n3min=df[\"% Student Enrollment - Hispanic or Latino\"].min()\n", "n3max=df[\"% Student Enrollment - Hispanic or Latino\"].max()\n", "n4min=df[\"% Student Enrollment - Children with Disabilities\"].min()\n", "n4max=df[\"% Student Enrollment - Children with Disabilities\"].max()\n", "n5min=df[\"% Student Enrollment - Low Income\"].min()\n", "n5max=df[\"% Student Enrollment - Low Income\"].max()\n", "n6min=df[\"Total Number of School Days\"].min()\n", "n6max=df[\"Total Number of School Days\"].max()\n", "n7min=df[\"% 8th Grade Passing Algebra 1\"].min()\n", "n7max=df[\"% 8th Grade Passing Algebra 1\"].max()\n", "n8min=df[\"Student Attendance Rate\"].min()\n", "n8max=df[\"Student Attendance Rate\"].max()\n", "n9min=df[\"Student Chronic Truancy Rate\"].min()\n", "n9max=df[\"Student Chronic Truancy Rate\"].max()\n", "n10min=df[\"Avg Class Size – All Grades\"].min()\n", "n10max=df[\"Avg Class Size – All Grades\"].max()\n", "n11min=df[\"Teacher Retention Rate\"].min()\n", "n11max=df[\"Teacher Retention Rate\"].max()\n", "print(\"min enroll\",n1min)\n", "print(\"max enroll\",n1max)\n", "print(\"min % Student Enrollment - Black or African American\",n2min)\n", "print(\"max % Student Enrollment - Black or African American\",n2max)\n", "print(\"min % Student Enrollment - Hispanic or Latino\",n3min)\n", "print(\"max % Student Enrollment - Hispanic or Latino\",n3max)\n", "print(\"min % Student Enrollment - Children with Disabilities\",n4min)\n", "print(\"max % Student Enrollment - Children with Disabilities\",n4max)\n", "print(\"min % Student Enrollment - Low Income\",n5min)\n", "print(\"max % Student Enrollment - Low Income\",n5max)\n", "print(\"min Total Number of School Days\",n6min)\n", "print(\"max Total Number of School Days\",n6max)\n", "print(\"min % 8th Grade Passing Algebra 1\",n7min)\n", "print(\"max % 8th Grade Passing Algebra 1\",n7max)\n", "print(\"min Student Attendance Rate\",n8min)\n", "print(\"max Student Attendance Rate\",n8max)\n", "print(\"min Student Chronic Truancy Rate\",n9min)\n", "print(\"max Student Chronic Truancy Rate\",n9max)\n", "print(\"Avg Class Size – All Grades\",n10min)\n", "print(\"Avg Class Size – All Grades\",n10max)\n", "print(\"Teacher Retention Rate\",n11min)\n", "print(\"Teacher Retention Rate\",n11max)\n" ] }, { "cell_type": "markdown", "id": "317115e9", "metadata": {}, "source": [ "__1d__ Check how many schools are in each category." ] }, { "cell_type": "code", "execution_count": 16, "id": "fc06fb40", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Commendable 102\n", "Targeted 8\n", "Exemplary 6\n", "Comprehensive 6\n", "Name: Summative Designation, dtype: int64" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df[\"Summative Designation\"].value_counts()" ] }, { "cell_type": "markdown", "id": "db08abbd", "metadata": {}, "source": [ "__STEP TWO__\n", "Define a function which predicts whether a school is exemplary (1) or not-exemplary (0)." ] }, { "cell_type": "markdown", "id": "68f45acd", "metadata": {}, "source": [ "__2a__ For a simple prediction, let us predict that a school is exemplary if the Teacher Retention Rate is at least 90%." ] }, { "cell_type": "code", "execution_count": 17, "id": "9ea6ccff", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "|TP|= 5\n", "|TN|= 70\n", "|FP|= 46\n", "|FN|= 1\n", "Accuracy (% correct all 122 schools)= 61.0 %\n", "Precision (% correct of those you predicted to be exemplary) = 10.0 %\n", "Recall (% correct of schools that are exemplary) = 83.0 %\n", "COMPETITION F1 SCORE= 17.85 %\n" ] } ], "source": [ "#---PREDICTION MODEL----#\n", "def mypredict(df):\n", " for i in df.index:\n", " if df.loc[i,\"Teacher Retention Rate\"]>90:\n", " df.loc[i,\"Prediction\"]=1\n", " else:\n", " df.loc[i,\"Prediction\"]=0\n", " return df\n", "\n", "#---APPLY MODEL TO OUR DATA---#\n", "mydf=mypredict(df)\n", "mydf=mydf.reset_index(drop=True)\n", "\n", "#---COMPUTE YOUR SCORE---#\n", "TP=0\n", "TN=0\n", "FP=0\n", "FN=0\n", "numschools=0\n", "for i in mydf.index:\n", " if mydf.loc[i,\"Prediction\"]==1 and mydf.loc[i,\"Summative Designation\"]==\"Exemplary\":\n", " TP=TP+1\n", " if mydf.loc[i,\"Prediction\"]==0 and mydf.loc[i,\"Summative Designation\"]!=\"Exemplary\":\n", " TN=TN+1\n", " if mydf.loc[i,\"Prediction\"]==1 and mydf.loc[i,\"Summative Designation\"]!=\"Exemplary\":\n", " FP=FP+1\n", " if mydf.loc[i,\"Prediction\"]==0 and mydf.loc[i,\"Summative Designation\"]==\"Exemplary\":\n", " FN=FN+1\n", " numschools=numschools+1\n", "print(\"|TP|=\",TP)\n", "print(\"|TN|=\",TN)\n", "print(\"|FP|=\",FP)\n", "print(\"|FN|=\",FN)\n", "accuracy=round((TP+TN)/numschools,2)\n", "precision=round(TP/(TP+FP),2)\n", "recall=round(TP/(TP+FN),2)\n", "F1score=2*(precision*recall)/(precision+recall)\n", "print(\"Accuracy (% correct all 122 schools)=\",100*accuracy,\"%\")\n", "print(\"Precision (% correct of those you predicted to be exemplary) =\",100*precision,\"%\")\n", "print(\"Recall (% correct of schools that are exemplary) =\",100*recall,\"%\")\n", "print('COMPETITION F1 SCORE=',round(F1score*100,2),\"%\" )\n", " " ] }, { "cell_type": "markdown", "id": "0e9dfcef", "metadata": {}, "source": [ "## Assignment" ] }, { "cell_type": "markdown", "id": "328d92c1", "metadata": {}, "source": [ ":::{admonition} Assignment\n", "\n", "Modify the Prediction Model to see how high you can score.\n", "\n", ":::" ] } ], "metadata": { "celltoolbar": "Edit Metadata", "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.8" } }, "nbformat": 4, "nbformat_minor": 5 }