{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# JNB Lab Solutions\n", "\n", "The United States government publishes grant opportunities to solicit applications from eligible applicants. The dataset of grant opportunities is updated every day and can be downloaded as an XML file from https://www.grants.gov/xml-extract. In this lab, we will classify these grant entries into the UN Sustainable Development Goals (SDGs) that we discussed throughout the chapter.\n", "\n", "## Lab Exercises, Part 1: Supervised Learning and Vectorizations\n", "\n", "1. Use `pandas` to read in the provided XML file, which is from June 25, 2024, when work on this lab began. You can use XML files from other days, but the exercises and solutions for this lab are based on this file." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "tags": [ "remove-output" ] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Intel MKL WARNING: Support of Intel(R) Streaming SIMD Extensions 4.2 (Intel(R) SSE4.2) enabled only processors has been deprecated. Intel oneAPI Math Kernel Library 2025.0 will require Intel(R) Advanced Vector Extensions (Intel(R) AVX) instructions.\n", "Intel MKL WARNING: Support of Intel(R) Streaming SIMD Extensions 4.2 (Intel(R) SSE4.2) enabled only processors has been deprecated. Intel oneAPI Math Kernel Library 2025.0 will require Intel(R) Advanced Vector Extensions (Intel(R) AVX) instructions.\n" ] } ], "source": [ "import pandas as pd\n", "import numpy as np\n", "\n", "#read the XML file into a DataFrame\n", "df = pd.read_xml(\"GrantsDBExtract20240625v2.xml\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now that we've read in the data, we can take a look at it." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table>\n",
"<thead>\n",
"<tr><th></th><th>OpportunityID</th><th>OpportunityTitle</th><th>OpportunityNumber</th><th>OpportunityCategory</th><th>FundingInstrumentType</th><th>CategoryOfFundingActivity</th><th>CategoryExplanation</th><th>CFDANumbers</th><th>EligibleApplicants</th><th>AdditionalInformationOnEligibility</th><th>...</th><th>CloseDateExplanation</th><th>OpportunityCategoryExplanation</th><th>EstimatedSynopsisPostDate</th><th>FiscalYear</th><th>EstimatedSynopsisCloseDate</th><th>EstimatedSynopsisCloseDateExplanation</th><th>EstimatedAwardDate</th><th>EstimatedProjectStartDate</th><th>GrantorContactName</th><th>GrantorContactPhoneNumber</th></tr>\n",
"</thead>\n",
"<tbody>\n",
"<tr><th>0</th><td>262148</td><td>Establishment of the Edmund S. Muskie Graduate...</td><td>SCAPPD-14-AW-161-SCA-08152014</td><td>D</td><td>CA</td><td>O</td><td>Public Diplomacy</td><td>19.040</td><td>25.0</td><td>Eligibility for U.S. institutions is limited t...</td><td>...</td><td>None</td><td>None</td><td>NaN</td><td>NaN</td><td>NaN</td><td>None</td><td>NaN</td><td>NaN</td><td>None</td><td>None</td></tr>\n",
"<tr><th>1</th><td>262149</td><td>Eradication of Yellow Crazy Ants on Johnston A...</td><td>F14AS00402</td><td>D</td><td>CA</td><td>NR</td><td>None</td><td>15.608</td><td>99.0</td><td>The recipient has already been selected for th...</td><td>...</td><td>None</td><td>None</td><td>NaN</td><td>NaN</td><td>NaN</td><td>None</td><td>NaN</td><td>NaN</td><td>None</td><td>None</td></tr>\n",
"<tr><th>2</th><td>131073</td><td>Cooperative Ecosystem Studies Unit, Piedmont S...</td><td>G12AS20003</td><td>D</td><td>CA</td><td>ST</td><td>None</td><td>15.808</td><td>25.0</td><td>This financial assistance opportunity is being...</td><td>...</td><td>None</td><td>None</td><td>NaN</td><td>NaN</td><td>NaN</td><td>None</td><td>NaN</td><td>NaN</td><td>None</td><td>None</td></tr>\n",
"<tr><th>3</th><td>131094</td><td>Plant Feedstock Genomics for Bioenergy: A Joi...</td><td>DE-FOA-0000598</td><td>D</td><td>G</td><td>ST</td><td>None</td><td>81.049</td><td>99.0</td><td>DOE Eligibility Criteria: Applicants from U.S....</td><td>...</td><td>None</td><td>None</td><td>NaN</td><td>NaN</td><td>NaN</td><td>None</td><td>NaN</td><td>NaN</td><td>None</td><td>None</td></tr>\n",
"<tr><th>4</th><td>131095</td><td>Management of HIV-Related Lung Disease and Car...</td><td>RFA-HL-12-034</td><td>D</td><td>G</td><td>HL</td><td>None</td><td>93.838</td><td>25.0</td><td>Other Eligible Applicants include the followin...</td><td>...</td><td>None</td><td>None</td><td>NaN</td><td>NaN</td><td>NaN</td><td>None</td><td>NaN</td><td>NaN</td><td>None</td><td>None</td></tr>\n",
"</tbody>\n",
"</table>\n",
"<p>5 rows × 38 columns</p>\n",
"</div>
" ], "text/plain": [ " OpportunityID OpportunityTitle \\\n", "0 262148 Establishment of the Edmund S. Muskie Graduate... \n", "1 262149 Eradication of Yellow Crazy Ants on Johnston A... \n", "2 131073 Cooperative Ecosystem Studies Unit, Piedmont S... \n", "3 131094 Plant Feedstock Genomics for Bioenergy: A Joi... \n", "4 131095 Management of HIV-Related Lung Disease and Car... \n", "\n", " OpportunityNumber OpportunityCategory FundingInstrumentType \\\n", "0 SCAPPD-14-AW-161-SCA-08152014 D CA \n", "1 F14AS00402 D CA \n", "2 G12AS20003 D CA \n", "3 DE-FOA-0000598 D G \n", "4 RFA-HL-12-034 D G \n", "\n", " CategoryOfFundingActivity CategoryExplanation CFDANumbers \\\n", "0 O Public Diplomacy 19.040 \n", "1 NR None 15.608 \n", "2 ST None 15.808 \n", "3 ST None 81.049 \n", "4 HL None 93.838 \n", "\n", " EligibleApplicants AdditionalInformationOnEligibility ... \\\n", "0 25.0 Eligibility for U.S. institutions is limited t... ... \n", "1 99.0 The recipient has already been selected for th... ... \n", "2 25.0 This financial assistance opportunity is being... ... \n", "3 99.0 DOE Eligibility Criteria: Applicants from U.S.... ... \n", "4 25.0 Other Eligible Applicants include the followin... ... 
\n", "\n", " CloseDateExplanation OpportunityCategoryExplanation \\\n", "0 None None \n", "1 None None \n", "2 None None \n", "3 None None \n", "4 None None \n", "\n", " EstimatedSynopsisPostDate FiscalYear EstimatedSynopsisCloseDate \\\n", "0 NaN NaN NaN \n", "1 NaN NaN NaN \n", "2 NaN NaN NaN \n", "3 NaN NaN NaN \n", "4 NaN NaN NaN \n", "\n", " EstimatedSynopsisCloseDateExplanation EstimatedAwardDate \\\n", "0 None NaN \n", "1 None NaN \n", "2 None NaN \n", "3 None NaN \n", "4 None NaN \n", "\n", " EstimatedProjectStartDate GrantorContactName GrantorContactPhoneNumber \n", "0 NaN None None \n", "1 NaN None None \n", "2 NaN None None \n", "3 NaN None None \n", "4 NaN None None \n", "\n", "[5 rows x 38 columns]" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "2. Which variable seems best suited for classifying these grant opportunities into the various UN SDGs?\n", "\n", "3. Take the variable you've chosen above and transform it into a bag-of-words matrix."
] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "<78106x24535 sparse matrix of type '<class 'numpy.int64'>'\n", "\twith 762851 stored elements in Compressed Sparse Row format>" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from sklearn import metrics\n", "from sklearn.metrics import confusion_matrix\n", "from sklearn.metrics import ConfusionMatrixDisplay\n", "from sklearn.model_selection import train_test_split\n", "from sklearn.naive_bayes import MultinomialNB\n", "from sklearn.neural_network import MLPClassifier\n", "\n", "from sklearn.ensemble import RandomForestClassifier\n", "from sklearn.feature_extraction.text import CountVectorizer\n", "from sklearn.feature_extraction.text import TfidfVectorizer\n", "\n", "#use the opportunity titles as the documents to classify\n", "docs = df.OpportunityTitle\n", "#build the bag-of-words (document-term count) matrix\n", "cv = CountVectorizer()\n", "cv_fit = cv.fit_transform(docs)\n", "\n", "cv_fit" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "4. From the bag of words that you've made, identify the first ten features and print them out. " ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array(['000', '000001', '000001azerbaijan', '000002', '000003', '000008',\n", " '00001', '000011', '00001395', '00002413'], dtype=object)" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "feature_names = cv.get_feature_names_out()\n", "feature_names[:10]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "5. Refer to the [section on Document-Term Matrices](sec2_transform_features.ipynb) to create another vectorization of the documents. Show the counts of the ten most frequent terms for documents 100-110." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table>\n",
"<thead>\n",
"<tr><th></th><th>program</th><th>research</th><th>clinical</th><th>national</th><th>fy</th><th>trial</th><th>health</th><th>grants</th><th>development</th><th>education</th></tr>\n",
"</thead>\n",
"<tbody>\n",
"<tr><th>100</th><td>1</td><td>1</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td></tr>\n",
"<tr><th>101</th><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>1</td><td>1</td><td>0</td></tr>\n",
"<tr><th>102</th><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td></tr>\n",
"<tr><th>103</th><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>3</td><td>0</td><td>0</td><td>0</td></tr>\n",
"<tr><th>104</th><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td></tr>\n",
"<tr><th>105</th><td>0</td><td>0</td><td>0</td><td>2</td><td>0</td><td>0</td><td>0</td><td>0</td><td>1</td><td>0</td></tr>\n",
"<tr><th>106</th><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td></tr>\n",
"<tr><th>107</th><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td></tr>\n",
"<tr><th>108</th><td>1</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td></tr>\n",
"<tr><th>109</th><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td></tr>\n",
"<tr><th>110</th><td>0</td><td>0</td><td>0</td><td>1</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td></tr>\n",
"</tbody>\n",
"</table>\n",
"</div>" ], "text/plain": [ " program research clinical national fy trial health grants \\\n", "100 1 1 0 0 0 0 0 0 \n", "101 0 0 0 0 0 0 0 1 \n", "102 0 0 0 0 0 0 0 0 \n", "103 0 0 0 0 0 0 3 0 \n", "104 0 0 0 0 0 0 0 0 \n", "105 0 0 0 2 0 0 0 0 \n", "106 0 0 0 0 0 0 0 0 \n", "107 0 0 0 0 0 0 0 0 \n", "108 1 0 0 0 0 0 0 0 \n", "109 0 0 0 0 0 0 0 0 \n", "110 0 0 0 1 0 0 0 0 \n", "\n", " development education \n", "100 0 0 \n", "101 1 0 \n", "102 0 0 \n", "103 0 0 \n", "104 0 0 \n", "105 1 0 \n", "106 0 0 \n", "107 0 0 \n", "108 0 0 \n", "109 0 0 \n", "110 0 0 " ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#unigram counts with English stop words removed\n", "count_vectorizer = CountVectorizer(stop_words='english')\n", "count_vector = count_vectorizer.fit_transform(docs).toarray()\n", "count_vector_df_unigram = pd.DataFrame(count_vector, columns=count_vectorizer.get_feature_names_out())\n", "#total count of each term across all documents\n", "term_freq = pd.DataFrame({\"term\": count_vector_df_unigram.columns.values, \"freq\" : count_vector_df_unigram.sum(axis=0)})\n", "count_vector_df_unigram.loc[100:110,term_freq.sort_values(by=\"freq\", ascending=False)[:10].term] # rows 100-110, ten most frequent terms" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "6. (Bonus, if you have a powerful computer) Create yet another vectorization, similar to the above, but using bigrams. Again, show the ten most frequent terms for documents 100-110." ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "tags": [ "remove-output" ] }, "outputs": [ { "ename": "", "evalue": "", "output_type": "error", "traceback": [ "\u001b[1;31mThe Kernel crashed while executing code in the current cell or a previous cell. \n", "\u001b[1;31mPlease review the code in the cell(s) to identify a possible cause of the failure. \n", "\u001b[1;31mClick here for more info. \n", "\u001b[1;31mView Jupyter log for further details." 
] } ], "source": [ "#DO NOT RUN THIS CELL UNLESS YOU HAVE A POWERFUL COMPUTER:\n", "#.toarray() builds a dense documents-by-bigrams matrix, which can exhaust memory and crash the kernel\n", "\n", "count_vectorizer = CountVectorizer(ngram_range=(2,2), stop_words='english') \n", "count_vector = count_vectorizer.fit_transform(docs).toarray()\n", "\n", "count_vector_df_bigram = pd.DataFrame(count_vector, columns=count_vectorizer.get_feature_names_out())\n", "term_freq = pd.DataFrame({\"term\": count_vector_df_bigram.columns.values, \"freq\" : count_vector_df_bigram.sum(axis=0)})\n", "count_vector_df_bigram.loc[100:110,term_freq.sort_values(by=\"freq\", ascending=False)[:10].term] # rows 100-110, ten most frequent bigrams" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "7. Once again, make another vectorization, this time using the TF-IDF vectorizer. Show the ten highest-weighted terms for documents 110-120." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table>\n",
"<thead>\n",
"<tr><th></th><th>program</th><th>research</th><th>national</th><th>clinical</th><th>grants</th><th>fy</th><th>health</th><th>development</th><th>cooperative</th><th>trial</th></tr>\n",
"</thead>\n",
"<tbody>\n",
"<tr><th>110</th><td>0.000000</td><td>0.000000</td><td>0.273389</td><td>0.0</td><td>0.000000</td><td>0.0</td><td>0.0</td><td>0.000000</td><td>0.0</td><td>0.0</td></tr>\n",
"<tr><th>111</th><td>0.000000</td><td>0.000000</td><td>0.000000</td><td>0.0</td><td>0.000000</td><td>0.0</td><td>0.0</td><td>0.000000</td><td>0.0</td><td>0.0</td></tr>\n",
"<tr><th>112</th><td>0.167061</td><td>0.000000</td><td>0.000000</td><td>0.0</td><td>0.000000</td><td>0.0</td><td>0.0</td><td>0.000000</td><td>0.0</td><td>0.0</td></tr>\n",
"<tr><th>113</th><td>0.200372</td><td>0.000000</td><td>0.000000</td><td>0.0</td><td>0.000000</td><td>0.0</td><td>0.0</td><td>0.000000</td><td>0.0</td><td>0.0</td></tr>\n",
"<tr><th>114</th><td>0.000000</td><td>0.000000</td><td>0.000000</td><td>0.0</td><td>0.000000</td><td>0.0</td><td>0.0</td><td>0.161333</td><td>0.0</td><td>0.0</td></tr>\n",
"<tr><th>115</th><td>0.000000</td><td>0.000000</td><td>0.000000</td><td>0.0</td><td>0.000000</td><td>0.0</td><td>0.0</td><td>0.000000</td><td>0.0</td><td>0.0</td></tr>\n",
"<tr><th>116</th><td>0.000000</td><td>0.133704</td><td>0.000000</td><td>0.0</td><td>0.000000</td><td>0.0</td><td>0.0</td><td>0.000000</td><td>0.0</td><td>0.0</td></tr>\n",
"<tr><th>117</th><td>0.000000</td><td>0.000000</td><td>0.000000</td><td>0.0</td><td>0.000000</td><td>0.0</td><td>0.0</td><td>0.148290</td><td>0.0</td><td>0.0</td></tr>\n",
"<tr><th>118</th><td>0.000000</td><td>0.000000</td><td>0.000000</td><td>0.0</td><td>0.000000</td><td>0.0</td><td>0.0</td><td>0.000000</td><td>0.0</td><td>0.0</td></tr>\n",
"<tr><th>119</th><td>0.000000</td><td>0.000000</td><td>0.000000</td><td>0.0</td><td>0.304318</td><td>0.0</td><td>0.0</td><td>0.000000</td><td>0.0</td><td>0.0</td></tr>\n",
"<tr><th>120</th><td>0.000000</td><td>0.000000</td><td>0.000000</td><td>0.0</td><td>0.000000</td><td>0.0</td><td>0.0</td><td>0.000000</td><td>0.0</td><td>0.0</td></tr>\n",
"</tbody>\n",
"</table>\n",
"</div>" ], "text/plain": [ " program research national clinical grants fy health \\\n", "110 0.000000 0.000000 0.273389 0.0 0.000000 0.0 0.0 \n", "111 0.000000 0.000000 0.000000 0.0 0.000000 0.0 0.0 \n", "112 0.167061 0.000000 0.000000 0.0 0.000000 0.0 0.0 \n", "113 0.200372 0.000000 0.000000 0.0 0.000000 0.0 0.0 \n", "114 0.000000 0.000000 0.000000 0.0 0.000000 0.0 0.0 \n", "115 0.000000 0.000000 0.000000 0.0 0.000000 0.0 0.0 \n", "116 0.000000 0.133704 0.000000 0.0 0.000000 0.0 0.0 \n", "117 0.000000 0.000000 0.000000 0.0 0.000000 0.0 0.0 \n", "118 0.000000 0.000000 0.000000 0.0 0.000000 0.0 0.0 \n", "119 0.000000 0.000000 0.000000 0.0 0.304318 0.0 0.0 \n", "120 0.000000 0.000000 0.000000 0.0 0.000000 0.0 0.0 \n", "\n", " development cooperative trial \n", "110 0.000000 0.0 0.0 \n", "111 0.000000 0.0 0.0 \n", "112 0.000000 0.0 0.0 \n", "113 0.000000 0.0 0.0 \n", "114 0.161333 0.0 0.0 \n", "115 0.000000 0.0 0.0 \n", "116 0.000000 0.0 0.0 \n", "117 0.148290 0.0 0.0 \n", "118 0.000000 0.0 0.0 \n", "119 0.000000 0.0 0.0 \n", "120 0.000000 0.0 0.0 " ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#unigram TF-IDF weights with English stop words removed\n", "tfidf_vectorizer = TfidfVectorizer(ngram_range=(1,1), stop_words='english')\n", "tfidf_vector = tfidf_vectorizer.fit_transform(docs).toarray()\n", "tfidf_vector_df = pd.DataFrame(tfidf_vector, columns=tfidf_vectorizer.get_feature_names_out())\n", "#here \"freq\" is the summed TF-IDF weight of each term, not a raw count\n", "term_freq = pd.DataFrame({\"term\": tfidf_vector_df.columns.values, \"freq\" : tfidf_vector_df.sum(axis=0)})\n", "tfidf_vector_df.loc[110:120,term_freq.sort_values(by=\"freq\", ascending=False)[:10].term] # rows 110-120, ten highest-weighted terms" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "8. Modify the function you made in Exercise 4 of Section 5 so that it trains a model on 90% of the UN SDG dataset. Then use the trained model to assign a predicted class to each entry in the grants dataset, using each of the Perceptron, Naive Bayes, and Ridge Classifier models. 
\n", "\n", "*For simplicity, this solution uses only Naive Bayes; either of the other classifiers can be substituted for the* `nb` *variable in this solution.*" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[ 5 3 10 ... 4 6 3]\n" ] } ], "source": [ "# change this to your own data directory\n", "data_dir = \"data/\"\n", "\n", "# read and preprocess data\n", "text_file_name = \"osdg-community-data-v2023-01-01.csv\"\n", "text_df = pd.read_csv(data_dir + text_file_name,sep = \"\\t\", quotechar='\"')\n", "col_names = text_df.columns.values[0].split('\\t')\n", "text_df[col_names] = text_df[text_df.columns.values[0]].apply(lambda x: pd.Series(str(x).split(\"\\t\")))\n", "text_df = text_df.astype({'sdg':int, 'labels_negative': int, 'labels_positive':int, 'agreement': float}, copy=True)\n", "text_df.drop(text_df.columns.values[0], axis=1, inplace=True)\n", "text_df = text_df.query(\"agreement > 0.5 and (labels_positive - labels_negative) > 2\")\n", "text_df.reset_index(inplace=True, drop=True)\n", "\n", "# hold out 10% of the labeled texts, train on the remaining 90%\n", "docs = text_df.text\n", "categories = text_df.sdg\n", "X_train, X_test, y_train, y_test = \\\n", " train_test_split(docs, categories, test_size=0.1, random_state=7)\n", "\n", "X_train_count_vectorizer = CountVectorizer(ngram_range=(2,2), stop_words = \"english\" )\n", "X_train_count_vectorizer.fit(X_train) \n", "X_train_count_vector = X_train_count_vectorizer.transform(X_train) \n", "\n", "#train Naive Bayes model\n", "nb = MultinomialNB()\n", "nb.fit(X_train_count_vector, y_train)\n", "\n", "#apply model to the grant titles (docs now holds the OSDG texts, so use the grants DataFrame directly)\n", "X_test_count_vector = X_train_count_vectorizer.transform(df.OpportunityTitle)\n", "y_pred = nb.predict(X_test_count_vector)\n", "print(y_pred)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "One thing you've probably noticed is that the grants dataset does not explicitly have UN SDG classes associated with it. 
While we can classify the grants with a model trained on separate labeled data, as in the previous exercise, pre-labeled data is often unavailable. In that case, we can turn to *unsupervised learning*. This was mentioned in the [section giving an overview on machine learning](sec4_classification_algos.ipynb) and involves data that is not already labeled with the \"correct\" category. \n", "\n", "## Unsupervised Learning\n", "\n", "Two popular algorithms for unsupervised learning revolve around what is known as *clustering*. Clustering, as the name suggests, groups data points into clusters such that similarity is high within each cluster and low between different clusters. The two algorithms are a hard-assignment method called $k$-means, which puts each data point into exactly one cluster, and a probabilistic method called Gaussian Mixture Modeling, which assigns each data point a probability of belonging to each cluster.\n", "\n", "$k$-means is described in more detail in the [chapter on linear algebra and optimization](../LinearAlgebra/KMeans/jnb2.ipynb). In practice, $k$-means clustering follows this procedure:\n", "1. Create $k$ random points to serve as centroids; you choose the value of $k$. These will be the \"centers\" of our clusters.\n", "2. Assign each data point to its closest centroid. Closeness is typically measured with Euclidean distance, \n", "\n", "$$\n", "\\text{dist} = \\sqrt{(x_1-a_1)^2 + (x_2-a_2)^2 + \\dots + (x_j-a_j)^2}\n", "$$\n", "\n", "for data point $x$ and centroid $a$, each with $j$ features (columns in the dataset).\n", "\n", "3. Measure the distance of each point from its assigned centroid and sum these distances over all $n$ points. Again, this is done with Euclidean distance.\n", "4. Re-calculate the centroid of each cluster by taking the mean vector of all points in the cluster.\n", "5. 
Repeat steps 2-4 until the total distance changes only marginally between iterations, or until the centroids do not move between iterations.\n", "\n", "Gaussian Mixture Modeling is similar to $k$-means, except that it assigns each point a probability of belonging to each cluster, assuming that the points in each cluster follow a multivariate normal distribution centered at that cluster's mean. In practice, it is fit with the Expectation-Maximization (EM) algorithm, which works as follows: \n", "1. Create $k$ random points to serve as means, and assign each cluster a $j \\times j$ covariance matrix; this can be set randomly, but it is more common to use the identity matrix. These mean vectors and covariance matrices are the *mixture parameters*. We also need prior probabilities to help with normalization in the next step: a single vector of length $k$ giving the prior probability that a point belongs to each cluster. This vector is known as the *mixing proportions*.\n", "2. *Expectation Step*: Calculate the log-likelihood of the data points given the current parameters. This involves two main steps:\n", "- (a) Calculate the probability density of each point under each cluster using the multivariate normal probability density function. Multiply each value by the corresponding mixing proportion, then normalize the values for each point so that they sum to 1. These probabilities are stored in a matrix known as the *hidden matrix*, with one row per point and one column per cluster.\n", "- (b) For each point, take the probability of its most likely cluster and compute the natural log of that probability. Sum these logs over all points. \n", "3. *Maximization Step*: Given the hidden matrix, re-calculate the mean vector and covariance matrix for each cluster, as well as the mixing proportions. 
\n", "- To recalculate the mixing proportions, take the mean of each column of the hidden matrix. The new proportion for a cluster is then the average probability, over all points, of belonging to that cluster.\n", "- To recalculate the mean vectors, for each entry in each mean vector, take the corresponding feature column in the data and compute its dot product with that cluster's column of the hidden matrix. Divide the result by the sum of that hidden-matrix column, which gives a single value for that cluster and feature; doing this for all $k$ clusters and all $j$ features fills in the full $k \\times j$ matrix of means.\n", "- To recalculate the covariance matrices, for each cluster, take the deviation of each point from the cluster mean, weight the products of the deviation's entries by the point's probability of belonging to that cluster, and sum over all points to build a new covariance matrix. Then divide the entries of this matrix by the sum of the probabilities for that cluster.\n", "\n", "Clusters can be evaluated in a variety of ways, including consulting domain experts or using the Jaccard index. Evaluating these clusters is beyond the scope of this lab.\n", "\n", "## Lab Exercises, Part 2: Unsupervised Learning\n", "\n", "9. Look at the documentation for $k$-means and the EM algorithm in `scikit-learn` and use these with various values of $k$ to cluster the grants. If you have a powerful computer, try $k = 17$ to match the number of UN SDGs." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#answers for this will likely take the form of the following:\n", "#for k-means:\n", "from sklearn.cluster import KMeans\n", "from sklearn.preprocessing import StandardScaler\n", "\n", "scaler = StandardScaler()\n", "scaled_data = scaler.fit_transform(count_vector_df_unigram)\n", "kmeans = KMeans(n_clusters=5, random_state=0).fit_predict(scaled_data)\n", "\n", "#for Gaussian Mixture Modeling with EM Algorithm:\n", "from sklearn.mixture import GaussianMixture\n", "gm = GaussianMixture(n_components=5, random_state=0).fit_predict(scaled_data)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "10. Look at the entries for one of the $k$-means clusters you made in the previous exercise. How similar are the entries to each other?\n", "\n", "*Answers will vary due to the random nature of both clustering methods.*" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "11. Compare the entries from the previous cluster you analyzed to a different cluster. How similar are the entries in the first cluster to the ones in the new cluster?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "*Answers will vary both due to the random nature of clustering, as well as depending on the chosen clusters.*" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "12. (Bonus) Implement $k$-means and the EM algorithm only using the `numpy` package - feel free to consult other resources for mathematical help. This exercise is not for the faint of heart but is advisable for those who want to improve their understanding of the mathematics underlying these methods!\n", "\n", "*Implementations may vary. 
One example implementation is given below.*" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#For k-means:\n", "import copy\n", "def kmeans(data, k):\n", "    #we can't have k less than 1, or there will be problems.\n", "    if k < 1:\n", "        print(\"Invalid k-value given; k must be at least 1\")\n", "        return -1\n", "\n", "    #make a dictionary to hold the classes of all the items.\n", "    cluster_assignments = initial_new_cluster(k)\n", "\n", "    #we also need something to hold our cluster evaluations\n", "    evals = []\n", "\n", "    #set our initializations\n", "    inits = get_randinits(data, k)\n", "\n", "    #main clustering process\n", "    clustered = False\n", "    #a max iterations parameter for sanity check\n", "    maxRuns = 50\n", "    numRuns = 0\n", "\n", "    while clustered == False and numRuns < maxRuns:\n", "        prev_clusters = copy.deepcopy(cluster_assignments)\n", "        cluster_assignments = initial_new_cluster(k)\n", "        #assign every point to its closest centroid\n", "        for entry in data:\n", "            dists = get_distances(entry, inits)\n", "            minDist = np.min(dists)\n", "            closest_cluster = np.where(dists == minDist)[0][0]\n", "            cluster_assignments[closest_cluster].append(entry)\n", "\n", "        new_centers = get_new_centers(cluster_assignments, inits)\n", "        #check: are our centers the same?\n", "        #if clusters_same(prev_clusters, cluster_assignments):\n", "        if centroids_same(inits, new_centers):\n", "            clustered = True\n", "        else:\n", "            inits = copy.deepcopy(new_centers)\n", "        evals.append(eval_clusters(cluster_assignments, inits))\n", "        numRuns += 1\n", "\n", "    return cluster_assignments, evals\n", "\n", "def get_randinits(data, k):\n", "    #set min and max values per variable\n", "    minvals = []\n", "    maxvals = []\n", "\n", "    #range over length of first vector\n", "    for i in range(len(data[0])):\n", "        current_min = data[0][i]\n", "        current_max = data[0][i]\n", "\n", "        #now range over all rows:\n", "        for j in range(len(data)):\n", "            if data[j][i] < current_min:\n", "                current_min = data[j][i]\n", "            if 
data[j][i] > current_max:\n", "                current_max = data[j][i]\n", "\n", "        minvals.append(current_min)\n", "        maxvals.append(current_max)\n", "\n", "    #okay, now we have a list of the min/max values.\n", "    #let's make random vectors with np.random.rand(k, len(minvals))\n", "    #to each one, we'll multiply it by the range (max - min) and add the min.\n", "    inits = np.random.rand(k, len(minvals))\n", "    for i in range(len(minvals)):\n", "        current_range = maxvals[i] - minvals[i]\n", "        inits[:, i] *= current_range\n", "        inits[:, i] += minvals[i]\n", "\n", "    return inits\n", "\n", "#helper functions: distances, centroid comparison, centroid update, evaluation.\n", "def get_distances(entry, inits):\n", "    dists = []\n", "    for item in inits:\n", "        currentDist = get_dist(entry, item)\n", "        dists.append(currentDist)\n", "    return np.array(dists)\n", "\n", "def get_dist(entry, item):\n", "    #Euclidean distance between two vectors\n", "    dist = 0.0\n", "    for i in range(len(item)):\n", "        dist += (entry[i] - item[i]) ** 2\n", "    return np.sqrt(dist)\n", "\n", "def centroids_same(c1, c2):\n", "    same = np.all(c1 == c2)\n", "    return same\n", "\n", "def get_new_centers(clusters, centers):\n", "    #we'll get the data, and the number of clusters\n", "    newcenters = []\n", "    clusts = clusters.keys()\n", "    for c in clusts:\n", "        current_clust = np.array(clusters[c])\n", "        #for each cluster, we're going to average each variable\n", "        new_center = []\n", "        if len(current_clust) == 0:\n", "            #an empty cluster keeps its previous center\n", "            newcenters.append(centers[c])\n", "        else:\n", "            for i in range(len(current_clust[0])):\n", "                new_center.append(np.average(current_clust[:, i]))\n", "            newcenters.append(new_center)\n", "\n", "    return np.array(newcenters)\n", "\n", "def eval_clusters(clusters, centroids):\n", "    #total distance of all points from their assigned centroids\n", "    score = 0\n", "    for i in range(len(centroids)):\n", "        current_clust = clusters[i]\n", "        for j in current_clust:\n", "            score += get_dist(j, centroids[i])\n", "\n", "    return score\n", "\n", "def initial_new_cluster(k):\n", "    cluster_assignments = {}\n", "    for i in range(k):\n", "        cluster_assignments[i] = []\n", "    
return cluster_assignments" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#For GMM:\n", "def MultiVarNormal(x, mean, cov):\n", "    \"\"\"\n", "    MultiVarNormal implements the PDF for a multivariate gaussian distribution\n", "    (one sample at a time)\n", "    Input:\n", "        - x: (d,) numpy array\n", "        - mean: (d,) numpy array; the mean vector\n", "        - cov: (d,d) numpy array; the covariance matrix\n", "    Output:\n", "        prob - a scalar\n", "    Hint:\n", "        - Use np.linalg.det to compute determinant\n", "        - Use np.linalg.pinv to invert a matrix\n", "    \"\"\"\n", "    d = x.shape[0]\n", "    pdf = 1\n", "    # begin solution\n", "    #added: another thing to keep in mind is that sometimes we get determinants very close to 0\n", "    #this makes probabilities very large; much larger than 1\n", "    #so we need to compute a pseudo-determinant if it is too small.\n", "    mydet = np.linalg.det(cov)\n", "    #if mydet < 1e-12:\n", "        #pseudo determinant through evals\n", "        #evals, evecs = np.linalg.eig(cov)\n", "        #mydet = np.product(evals[evals > 1e-12])\n", "        #d = np.linalg.matrix_rank(cov)\n", "\n", "    n1 = 1.0 / np.sqrt(((2.0 * np.pi) ** d) * mydet)\n", "    n2 = np.exp(-0.5 * (np.matmul(np.transpose(x - mean), np.matmul(np.linalg.pinv(cov), (x - mean)))))\n", "    pdf = n1*n2\n", "    # end solution\n", "    return pdf\n", "\n", "def UpdateMixProps(hidden_matrix):\n", "    \"\"\"\n", "    Returns the new mixing proportions given a hidden matrix\n", "    Input:\n", "        - hidden_matrix: (n, k) numpy array\n", "    Output:\n", "        - mix_props: (k,) numpy array\n", "    Hint:\n", "        - See equation in Lecture 10 pg 42\n", "    \"\"\"\n", "    n,k = hidden_matrix.shape\n", "    mix_props = np.zeros(k)\n", "    # begin solution\n", "    #the new proportion for each cluster is the mean of its column\n", "    for i in range(k):\n", "        current_mean = np.mean(hidden_matrix[:,i])\n", "        mix_props[i] = current_mean\n", "\n", "    # end solution\n", "    return mix_props\n", "\n", "def UpdateMeans(X, hidden_matrix):\n", "    \"\"\"\n", "    Update means for gaussian distributions given the data and the 
hidden matrix\n", "    Input:\n", "        - X: (n, d) numpy array\n", "        - hidden_matrix: (n, k) numpy array\n", "    Output:\n", "        - new_means: (k, d) numpy array\n", "    Hint:\n", "        - See equation in Lecture 10 pg 43\n", "    \"\"\"\n", "    n,d = X.shape\n", "    k = hidden_matrix.shape[1]\n", "    new_means = np.zeros([k,d])\n", "    # begin solution\n", "    #the approach:\n", "    #for each entry in each mean vector, take the corresponding column in the data (column = feature)\n", "    #then take its dot product with the cluster's column of the hidden matrix\n", "    #and normalize by the sum of that cluster's column, which gives us a single value for each k\n", "    #then do that d times to get the full matrix.\n", "    for i in range(d): #across the columns\n", "        current_subset = X[:,i]\n", "        for c in range(k): #across the clusters\n", "            current_vals = hidden_matrix[:,c]\n", "\n", "            weighted_val = np.dot(current_subset, current_vals)\n", "            total_prob = np.sum(current_vals)\n", "\n", "            current_entry = weighted_val / total_prob\n", "\n", "            new_means[c,i] = current_entry\n", "    # end solution\n", "    return new_means\n", "\n", "def UpdateCovars(X, hidden_matrix, means):\n", "    \"\"\"\n", "    Update covariance matrices for gaussian distributions given the data and the hidden matrix\n", "    Input:\n", "        - X: (n, d) numpy array\n", "        - hidden_matrix: (n, k) numpy array\n", "        - means: (k, d) numpy array; means for all distributions\n", "    Output:\n", "        - new_covs: (k, d, d) numpy array\n", "    Hint:\n", "        - See equation in Lecture 10 pg 43\n", "    \"\"\"\n", "    n,d = X.shape\n", "    k = hidden_matrix.shape[1]\n", "    new_covs = np.zeros([k,d,d])\n", "    # begin solution\n", "    #the approach:\n", "    #first we want to range through each cluster\n", "    for i in range(k):\n", "        #we want to get the sum of probabilities for this cluster\n", "        current_sum = np.sum(hidden_matrix[:,i])\n", "\n", "        #next we want to range through each data point\n", "        for 
j in range(n):\n", "            #take the deviation from the mean of the cluster\n", "            current_diff = X[j] - means[i]\n", "\n", "            #take the probability for this data point from this cluster\n", "            current_prob = hidden_matrix[j, i]\n", "\n", "            #we need the (d, d) outer product of the deviation with itself;\n", "            #np.matmul on two 1-D arrays returns their inner product instead,\n", "            #so we fill in the entries one at a time\n", "            #(np.outer(current_diff, current_diff) would give the same matrix)\n", "            for a in range(d):\n", "                for b in range(d):\n", "                    current_cov = current_diff[a] * current_diff[b] * current_prob\n", "                    new_covs[i][a,b] += current_cov\n", "\n", "        #then normalize the entries\n", "        new_covs[i] = new_covs[i] / current_sum\n", "    # end solution\n", "    return new_covs\n", "\n", "def HiddenMatrix(X, means, covs, mix_props):\n", "    \"\"\"\n", "    Computes the hidden matrix for the data.\n", "    This function should also compute the log likelihood\n", "    Input:\n", "        - X: (n, d) numpy array\n", "        - means: (k, d) numpy array; the mean vectors\n", "        - covs: (k, d, d) numpy array; the covariance matrices\n", "        - mix_props: (k,) numpy array; the mixing proportions\n", "    Output:\n", "        - hidden_matrix: (n, k) numpy array\n", "        - ll: scalar; the log likelihood\n", "    Hint:\n", "        - Construct an intermediate matrix t of shape (n, k).\n", "          t[i,j] = P(X_i | c = j)P(c = j), for i=1,...,n, j=1,...,k\n", "          This matrix can be used to calculate the log likelihood and the hidden matrix.\n", "        - Each row of the hidden matrix sums to 1\n", "        - hidden_matrix[i,j] = P(X_i | c = j)P(c = j) / (Sum_{l=1}^{k}(P(X_i | c = l)P(c = l))),\n", "          for i=1,...,n, j=1,...,k\n", "    \"\"\"\n", "    n,d = X.shape\n", "    k = means.shape[0]\n", "    hidden_matrix = np.zeros([n,k])\n", "    ll = 0\n", "    t = np.zeros([n,k]) # intermediate matrix\n", "\n", "    # begin solution\n", "    for i in range(n): #across the rows\n", "        for c in range(k): #across the clusters\n", "            current_mean = means[c]\n", "            current_cov = covs[c]\n", "            current_mix = mix_props[c]\n", "\n", "            current_prob = MultiVarNormal(X[i, :], current_mean, current_cov)\n", "            #current_prob = stats.multivariate_normal.pdf(X[i, 
:], current_mean, current_cov)\n", "            current_entry = current_prob * current_mix\n", "\n", "            t[i,c] = current_entry\n", "\n", "    #we have the intermediate matrix t; normalize each row so the hidden matrix entries sum to 1\n", "    for i in range(n):\n", "        current_total = np.sum(t[i,:])\n", "        for j in range(k): #across the columns, which are the clusters for the hidden matrix\n", "            hidden_matrix[i,j] = t[i,j] / current_total\n", "\n", "    #now take the log likelihood via the intermediate matrix\n", "    for a in range(n):\n", "        #each point contributes the log of its total probability across all clusters\n", "        ll += np.log(np.sum(t[a,:]))\n", "\n", "    # end solution\n", "    return hidden_matrix,ll\n", "\n", "def GMM(X, init_means, init_covs, init_mix_props, thres=0.001):\n", "    \"\"\"\n", "    Runs the GMM algorithm\n", "    Input:\n", "        - X: (n, d) numpy array\n", "        - init_means: (k,d) numpy array; the initial means\n", "        - init_covs: (k,d,d) numpy array; the initial covariance matrices\n", "        - init_mix_props: the initial mixing proportions\n", "    Output:\n", "        - clusters: (n,) numpy array; the cluster assignment for each sample\n", "        - hidden_matrix: (n, k) numpy array\n", "        - ll_list: the log likelihood at all iterations\n", "    Hint:\n", "        - Use all above functions\n", "        - Stopping condition: the difference between your ll from the current iteration\n", "          and the last iteration is below your threshold\n", "        - You can set maximum iteration as 1,000 to avoid infinite loop\n", "        - Remember to check if your algorithm has converged\n", "    \"\"\"\n", "    n,d = X.shape\n", "    k = init_means.shape[0]\n", "    clusters = np.zeros(n)\n", "    ll_list = []\n", "    # begin solution\n", "    threshold_met = False\n", "    current_iter = 0\n", "    current_means = init_means\n", "    current_covs = init_covs\n", "    current_mix = init_mix_props\n", "    while not threshold_met:\n", "        #first: get the hidden matrix and log-likelihood\n", "        current_hidden, current_ll = 
HiddenMatrix(X, current_means, current_covs, current_mix)\n", "        ll_list.append(current_ll)\n", "\n", "        #second: update all parameters\n", "        current_means = UpdateMeans(X, current_hidden)\n", "        current_covs = UpdateCovars(X, current_hidden, current_means)\n", "        current_mix = UpdateMixProps(current_hidden)\n", "\n", "        #third: check if our threshold is met,\n", "        #unless we have only run one time.\n", "        if current_iter == 0:\n", "            current_iter += 1\n", "        elif np.abs(ll_list[current_iter] - ll_list[current_iter - 1]) < thres:\n", "            threshold_met = True\n", "        elif current_iter >= 1000:\n", "            #stop after 1,000 iterations to avoid an infinite loop, as suggested in the hint\n", "            threshold_met = True\n", "        else:\n", "            current_iter += 1\n", "\n", "    #now we need to put each point into a cluster\n", "    for j in range(n):\n", "        largest_prob_cluster = np.argmax(current_hidden[j, :])\n", "        clusters[j] = largest_prob_cluster\n", "    # end solution\n", "    return clusters,current_hidden,ll_list" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Lab Exercises, Part 3: Similarity\n", "\n", "13. Construct a heatmap for 40 of the grants in the dataset." ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "tags": [ "remove-input" ] }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/var/folders/px/b7vc3nh913zb_m0x36ncftj00000gn/T/ipykernel_68678/3899229267.py:6: DtypeWarning: Columns (33,36,37) have mixed types. 
Specify dtype option on import or set low_memory=False.\n", " df = pd.read_csv(\"GrantsDBExtract20240625v2.csv\")[:10000]\n" ] } ], "source": [ "#for this part: I am switching kernels and will put in some of the setup in this\n", "#hidden cell.\n", "import pandas as pd\n", "import numpy as np\n", "\n", "df = pd.read_csv(\"GrantsDBExtract20240625v2.csv\")[:10000]" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "2024-07-15 21:10:45.922395: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'inputs' with dtype string\n", "\t [[{{node inputs}}]]\n" ] }, { "data": { "text/plain": [ "" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAgMAAAGiCAYAAAB6c8WBAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjcuMSwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/bCgiHAAAACXBIWXMAAA9hAAAPYQGoP6dpAABlI0lEQVR4nO3de1xUdf4/8NdcYAaRi0pyUxG1FC+ZgpUiorVhdFGzXc0uaJrlpS2zUMndreyrmGuueUGzNstN0S21rEzFstIgE68l5iUviIIoKiCXYZg5vz/8STtymfcZBgHn9fRx/vDMi898Zs6Z4cPnnPM+GkVRFBAREZHL0tZ3B4iIiKh+cTBARETk4jgYICIicnEcDBAREbk4DgaIiIhcHAcDRERELo6DASIiIhfHwQAREZGL42CAiIjIxXEwQERE5OI4GCAiImogfvjhBzz88MMICgqCRqPBZ599Zvdnvv/+e4SHh8NoNKJdu3ZYunSp6uflYICIiKiBKCoqQvfu3bFo0SJR/sSJE3jggQcQFRWFvXv34tVXX8ULL7yAtWvXqnpeDW9URERE1PBoNBqsX78eQ4YMqTYzdepUbNiwAYcOHapYN27cOOzfvx9paWni5+LMABERUR0ymUwoKCiwWUwmk1PaTktLQ0xMjM26gQMHIj09HWazWdyO3im9cQLzheOi3L3dx4pyIXofuxk/jbuorX3leaLcIK2/KJehle0E+8rOiXJ93YNEuWJYRLkM80VRrhxWUa6XW0tR7mfh631O00qU+9Gt1G7GS/gRSCvLEeUKLSWi3F0erUW5vaXZolyYQfYeHzLlinKdDLeIcleUclEuq+ySKBdqaCHKPWr2tptJc5d9EW4oyBDl2jcJkOUE3z0A4Obkv8Wkn2+LcDLYJGzviOm8KNfe4CfKDSq3v20BYNSZj0U5R0l/J0kkLlqBN954w2bda6+9htdff73Wbefk5MDf3/Z3j7+/P8rLy3HhwgUEBgaK2mkwgwEiIqIGwyobDEkkJCRg8uTJNusMBoPT2tdoNDb
/v3b0//r1NeFggIiIqA4ZDAan/vL/XwEBAcjJsZ25zM3NhV6vR4sWspk2wIHBQFZWFpYsWYLU1FTk5ORAo9HA398fffr0wbhx49C6tf3pT5PJVOl4idZkqrM3i4iISBVFdhi0vvXu3RtffPGFzbotW7YgIiICbm5u4nZUHbTasWMHwsLCsH79enTv3h1xcXF48skn0b17d3z22Wfo0qULfvzxR7vtJCYmwsfHx2Z56x3110USERHVCavVeYsKV65cwb59+7Bv3z4AVy8d3LdvHzIzMwFcPeQQFxdXkR83bhxOnTqFyZMn49ChQ/jggw/w73//G6+88oqq51U1M/DSSy/hmWeewb/+9a9qH580aRJ27dpVYztVHT/RFp5R0xUiIqI6o9TTzEB6ejoGDBhQ8f9rvytHjhyJDz/8ENnZ2RUDAwAIDQ3Fxo0b8dJLL2Hx4sUICgrCggUL8Oijj6p6XlWDgV9//RUff1z9GZzPPfecqPJRVcdPzGUX1HSFiIjoptO/f3/UVP7nww8/rLQuOjoae/bsqdXzqjpMEBgYiNTU1GofT0tLE1/GQERE1GDV02GC+qJqZuCVV17BuHHjsHv3btx3333w9/eHRqNBTk4OUlJS8P7772P+/Pl11FUiIqIbpJGcQOgsqssRr1mzBv/617+we/duWCxXr8PU6XQIDw/H5MmTMWzYMIc60i/4XlHum/3viXLDer5oNyMtqnG8VFZUw0vvIcrFussKzhxVikQ5KXeNTpQrU2Tvy6a8X0W5sX69RLn/XN4vyvX3uU2UMwpe7y/CIjwW4RdDU51RlJNui1Z6L1EuVJE977flsuJJkqJdAFAq/AxdFBZj6qGXXQp1UvDZCNY0EbW11yz7fEe5yYqK7bHICizdppO9x1K/WwtFuTZaT1HumKVAlOskfB0HhMXb/HVNRbk1pz4T5RxVdlr2fSTh3rq709qqK6ovLRw+fDiGDx8Os9mMCxeuHuf38/NTdQkDERFRg+bEokONgcNFh9zc3Hh+ABER3Zxc7DABb1RERETk4liOmIiI6HqN5CoAZ+FggIiI6Dr1VXSovvAwARERkYvjzAAREdH1eJiAiIjIxbnYYYIGMxiQFjmRFBMCgP/uecdu5rmIKaK2TO7NRDm9sJDMYWExoQuWYlHOQyur8SArXwO0FRZraeImu+W0GzSi3BDfrqKcr3C3zUe53YyH1l3UVjs3X1Huq4u/iHIvNZMVYsqCWZTrYJa9x3uFhbG6QLYP/KhcFuWa62TP663IjlyeKbdfYOdunbeorRxhYad0YTGh84K+AcADkN9rXsKskxUTOq+Y7IcAeAs/G16Qfe+FCr/jzerq4NUdF6szwHMGiIiIXFyDmRkgIiJqMHiYgIiIyMW52AmEqg8TlJSUYMeOHcjIyKj0WGlpKVasWGG3DZPJhIKCApvFIrw5DhERETmXqsHAkSNHEBYWhn79+qFbt27o378/srOzKx7Pz8/H008/bbedxMRE+Pj42Cy/5h9R33siIqK6oFidtzQCqgYDU6dORbdu3ZCbm4vDhw/D29sbkZGRyMzMVPWkCQkJyM/Pt1m6Cm9LS0REVOesVuctjYCqcwZSU1OxdetW+Pn5wc/PDxs2bMDEiRMRFRWFbdu2wdNTdmmLwWCAwWB7WZpOeFkeEREROZeqwUBJSQn0etsfWbx4MbRaLaKjo7Fq1Sqndo6IiKg+KC52HpuqwUCnTp2Qnp6OsLAwm/ULFy6EoigYNGiQwx3x08gKXOQppaKcpKDQu+lzRG39uecLolwUfEW5VBSIcl30smJH/opsMxoUWWGaHI3sQ6DRyNo7oZSIchesstyjip8ot0dvv3hJibVM1FaeVbbfGfWy/fiYRtbeZWH/PnOTtXfSdFGUy9fL2vPSGkW5rHLZPl+glb1eT0FBnNNa+0WnAOAWyIpntdPJCjEd18ray9E4d/q4DLL29MIiYDHlsmJM24X73m2KrPDUQciKrdW5RnKs31l
UnTPwyCOPIDk5ucrHFi1ahBEjRkBpKNWjiIiISETVYCAhIQEbN26s9vGkpCRYG8nJEkRERNXiCYREREQuzsUOE3AwQEREdD3eqIiIiIhcCWcGiIiIrsfDBERERC6ukZz45yw8TEBEROTiODNARER0PR4mqB/7yvNEueyyy6Kcyd1+9T5pZcFP9ywQ5fp3f0aUc9PIJmSylUJR7kzJBVGuiV5WLW6kR0dRrpNnsCgnrd5XJKy29yHOiHJ+Vvv3yjhfli9q6w6DvyinE27bfMUsyl0WvndNhVXv8spklQCD3HxEOR9h5VAvvZsol3K58q3Rq9LH51a7maNW2Wu9UF4kyjUTViD0FlRHBICdFtl3nlRa3mFRLrxZe1Fui/C3Q5ZZ9j21s/yKKNfSzVv2xHWNhwmIiIjIlThlZkBRFHGdeiIiogbPxWYGnDIYMBgM2L9/f6UbGFXHZDLBZDLZrLMqVmiFU6xERER1iXctrMHkyZOrXG+xWDB79my0aNECADBv3rwa20lMTMQbb7xhsy7EKxSh3rJjWUREROQ8qgYD8+fPR/fu3eHr62uzXlEUHDp0CJ6enqLDBQkJCZUGFg+HPaKmK0RERHWHhwmqN3PmTLz33nt4++23cc8991Ssd3Nzw4cffojOnTuL2jEYDDAYbM9+5iECIiJqMFzs0kLVtzBes2YNxo8fj1deeQVms+zyKCIiokbFxW5hrPrP8V69emH37t04f/48IiIi8Msvv/BKAiIiokbMoasJmjZtio8++girV6/GfffdB4ul9mddDtLKirqs0pvshwDoNTq7mSj4itqSFhP6bv/7otyDPSaIcjvzjopys2+JEuWk8zipkBVr8dTICskYNbLdTDoyHaBtIcqtM2fZzfzFq4uorR1l2aJcv6YdRLmj5ouiXLFVtr83M3iIcuJCUZZiUS5cZ7+4FwB8UXpSlLvdK0SUa69tajejiFoCWrjJinFdUmTbormwEFOJplyUk+rbQnY1l/RzZobsL1qz8Kx7aUGuYJ2XKFfnXOwwQa0uLXzsscfQt29f7N69GyEhsg8xERFRg9dIpvedpdZ1Blq1aoVWrVo5oy9ERERUDxrMvQmIiIgaDB4mICIicnEudpiAF/cTERG5OM4MEBERXc/FZgY4GCAiIrqei50zwMMERERELq7BzAxkaGUFPWLdW4tyh5UiuxlpcR03YbEMaTGhr/YmiXJRt48W5eYVHRDltJBVivR28xTl3ASFnQBgpY/9AjEAMCK/UJT7zk32Ot602C9k9XdhMaGs0guiXGs3H1HOqJV99K5YSkW5FhqD/RCAY+bzotxEvewOovnCyj7SfcVfJ9v3viiyX5BrVJNOorYOK7L3WPr5OS8sTvRucIkoJ9X3SJ4o9+tEWXGiF1cIiyK5eYtiWWbZ920pnFuMyWE8TEBEROTiXOwwAQcDRERE13OxmQGeM0BEROTiVA0G9u7dixMnTlT8/+OPP0ZkZCRat26Nvn37YvXq1aJ2TCYTCgoKbBaL8GYXREREdU6xOm9pBFQNBsaMGYOTJ08CAN5//308++yziIiIwPTp09GrVy+MHTsWH3zwgd12EhMT4ePjY7Psy//NoRdARETkdFar85ZGQNU5A4cPH0b79lfPMk5KSsL8+fPx7LPPVjzeq1cvzJw5E6NH13wWfEJCAiZPnmyzblK3UWq6QkRERE6iambAw8MD589fvTTpzJkzuOuuu2wev+uuu2wOI1THYDDA29vbZtEJLz0iIiKqc/U4M5CUlITQ0FAYjUaEh4dj+/btNeZXrlyJ7t27o0mTJggMDMTTTz+NvDzZpabXqBoMxMbGYsmSJQCA6OhofPrppzaP//e//0WHDh1UdYCIiKjBURTnLSqsWbMGkyZNwvTp07F3715ERUUhNjYWmZmZVeZ37NiBuLg4jBkzBgcPHsQnn3yCXbt24ZlnnlH1vBpFkff07NmziIyMRJs2bRAREYElS5YgPDwcYWFhOHz4MH7
66SesX78eDzzwgKpOAECvoH6iXAe35qJcrqXYbqaLvpmorV3mXFFu38Xjolz3ZqGi3PYD9s+/AID77njWfgiAVbip/fWyIkFphb+Lco95dxPlSiEbQZdCdrJpvmK2mzltzhe11c39FlEuOTddlEu4pY8ol6uRvdYcYeEcs/A9jlJkhWTWW2RFm3x0HqLcQMVXlPtRe8Vu5nbI9uOPSg6LcuONHUW5dI397x4AyLbaL4ymhpvwbztfraxAVbEiK/7TTSvbV3ZZLsna0/mKcnNPJotyjipZ84bT2tIOmQaTybYYlcFggMFQeVvcdddd6NmzZ8Uf3gAQFhaGIUOGIDExsVJ+7ty5WLJkCX7//Y/v44ULF2LOnDk4ffq0vI/iJICgoCDs3bsXvXv3xqZNm6AoCn7++Wds2bIFrVq1wo8//ujQQICIiKhBceJhgqpOmq/qF3tZWRl2796NmJgYm/UxMTFITU2tspt9+vRBVlYWNm7cCEVRcO7cOXz66ad48MEHVb1c1UWHfH19MXv2bMyePVvtjxIRETUOTrwKICHhb5VOmq9qVuDChQuwWCzw97ctpe7v74+cnJwq2+7Tpw9WrlyJ4cOHo7S0FOXl5Rg0aBAWLlyoqo8sOkRERFSHqjppvqrBwDUaje19MBRFqbTumoyMDLzwwgv4xz/+gd27d2PTpk04ceIExo0bp6qPLEdMRER0vXooFuTn5wedTldpFiA3N7fSbME1iYmJiIyMRHx8PADg9ttvh6enJ6KiovB///d/CAwMFD03ZwaIiIiuVw+XFrq7uyM8PBwpKSk261NSUtCnT9UnHRcXF0Ortf1VrtNdvVRfxfUBnBkgIiKqROUlgc4yefJkPPXUU4iIiEDv3r2xbNkyZGZmVkz7JyQk4MyZM1ixYgUA4OGHH8bYsWOxZMkSDBw4ENnZ2Zg0aRLuvPNOBAUFiZ+XgwEiIqIGYvjw4cjLy8OMGTOQnZ2Nrl27YuPGjQgJCQEAZGdn29QcGDVqFAoLC7Fo0SK8/PLL8PX1xT333IO33npL1fNyMEBERHS9erynwIQJEzBhwoQqH/vwww8rrfvrX/+Kv/71r7V6Tg4GiIiIrtdIbjDkLA1mMNDXXXZsQ1ppzUPrZjfjr8he/pmSC6Lc7FuiRLl5RQdEOWllwZR9y0Q56/lToty42EWiXGl5mSwnrHp3VikR5UaUeYly/3W3X4HQTXhPjNaKuyjnY2giyp3UyN674+WyConewqpyB0tkFQObNJF9Ni6Wy6ro3aMPEOU+U86Lcu019qve7VIKRG0Fu8sqke7Xyr57tKj6ErDruTv5fizSCoQXrSb7IRV+sshq4B8vkVVyLTPKKh+SczWYwQAREVGDUQ+XFtYnDgaIiIiuo1jr52qC+qK6zsDChQsxcuRI/Pe//wUA/Oc//0Hnzp3RqVMnvPrqqygvtz/FYzKZUFBQYLOUK7IbshAREZFzqRoMvPnmm5g+fTqKiorw4osv4q233sJLL72EJ554AiNHjsT777+PN9980247Vd20YVf+IYdfBBERkVPVQ9Gh+qTqMMGHH36IDz/8EEOHDsX+/fsRHh6Ojz76CE888QQAoFOnTpgyZQreeKPmWz8mJCRUumnD9G5jVHadiIiojvCcgeplZ2cjIiICANC9e3dotVrccccdFY/37NkTZ8+etdtOVfdx1jv5zFoiIiKSUXWYICAgABkZGQCAo0ePwmKxVPwfAA4ePIiWLVs6t4dEREQ3mlVx3tIIqJoZePzxxxEXF4fBgwfjm2++wdSpU/HKK68gLy8PGo0GM2fOxJ///Oe66isREdGN0UiO9TuLqsHAG2+8AQ8PD/z000947rnnMHXqVNx+++2YMmUKiouL8fDDD4tOIKxKMWRXE0gLdUjK0hgUWXGQJnqjKGe/xM1V0qIkVuGNMqTFhLS3hIhyJuGxMovww2IWFh0qE15RkqOXvX+S99ksfE6TRrYtTBbZXmCBrD3pe6Kr5l7n1zN
bZf1rrrFftAuA+CqgYo1sH9AJJyuNsP89IC3Co9PIcm7Cz625no41X7AUi3I+Otn3mfR7Srgrw10r+3XjrmkgV7xzMFA9nU6H6dOn26x77LHH8Nhjjzm1U0RERHTjNJAhGBERUQNST7cwri8cDBAREV3PxQ4TqK5ASERERDcXzgwQERFdr5FcEugsHAwQERFdz8UqEPIwARERkYvjzAAREdH1eJjAvqKiIqxatQqpqanIycmBRqOBv78/IiMjMWLECHh6eqpuM8N8UZRrpZeUEwLaaprYzeRoZAVTRnp0FOVSUSDKebvJ3h9/fVNRblzsIlFOWkzoo91vi3KP9PyrKPdT6RlRrrNBVsq6nUm23da6XbGbkb7HZmFlldZNbhHlihT7t/oG5Pv7L6U5olx4U1nhqW9LZIWswowBolwzRVYsLNOUJ8oVu5XZzTTXeYjaumwpEeVWXzoqyvVq1kGUO1FyXpSTamnwEeX2FpwU5cb79BDlfsIlUa67R7Aot6f4tChX1xReTVCzjIwM3HbbbZgyZQouXbqENm3aoFWrVrh06RLi4+PRsWNHm/sVEBERUcOmemZg4sSJ6NevHz766CO4u7vbPFZWVoZRo0Zh4sSJ2LZtm9M6SUREdEPxMEHNdu7cifT09EoDAQBwd3fHq6++ijvvvNMpnSMiIqoXLnY1gerBQLNmzXD06FF07ty5ysePHTuGZs2a1diGyWSCyWSyWWdVrNAKbxhCRERUp1xsZkD1b9+xY8di5MiRmDt3Lvbv34+cnBycO3cO+/fvx9y5czF69Gg899xzNbaRmJgIHx8fm+V04UlHXwMRERHVguqZgddffx0eHh6YN28epkyZAs3/v3WqoigICAjAtGnTMGXKlBrbSEhIwOTJk23WxXYarLYrREREdcPFriZw6NLCqVOnYurUqThx4gRycq5e0hQQEIDQ0FDRzxsMBhgMBpt1PERAREQNBg8TyIWGhqJ3797o3bt3xUDg9OnTGD16tFM6R0RERHXP6RUIL168iI8++ggffPCBqp8rh2xKZlPer6JcEzeD3cy1Qxz2dPKUFcvw1LiJcm4aWQGWtMLfRbnScvsFWADAIpz2khYTWr9noSjXo8vjotxB0zlR7m/CWaThOvszVcmm46K2tO6yfcWgle0D+0tkhZg6GGWFmJrqjKKc0ckf+WZa+58zANijFIpyvsKCXBfL7ReU8hUWHWqms1+gDADKLLJCUdlll0W5CM82opzUd5d/E+X8jLLiRL/A/nsMABZF9hd0niIr7tTK0FyUq3O8mqBmGzZsqPHx48dlX65EREQNlosdJlA9GBgyZAg0Gg2UGkaD0r+4iYiIqP6pPmcgMDAQa9euhdVqrXLZs2dPXfSTiIjohlGsVqctjYHqwUB4eHiNv/DtzRoQERE1eFbFeUsjoPowQXx8PIqKiqp9vEOHDrwvARERUSOiejAQFRVV4+Oenp6Ijo52uENERET1rpH8Re8sTr+0kIiIqNHjpYVEREQuzsVmBlgDmIiIyMU5PDOQlZUFX19fNG3a1Ga92WxGWloa+vXrp6q9Xm6ySmtRfgGinBvs1zo4IayIlWctFeWMGtnbudKnqf0QgPdKW4lypcLqjWZh7qdSWXU8aWXBvQdXiXJPh78iyrXVyKrt/aXpebuZFItsW0irSzbXy6rZ9XGX7cfSbdvJ3VuU+9Z0WpT7j8FflPu7VfYZ8tXIKhV2cWshyrVws//+eSiyv3U+Lzspyo31v1uU62SR7St/7iDbFlKDDsn2qbUhsvdlRra7KBcqrOC4z3JJlIvRyfa9uqZwZqBm2dnZuPPOOxESEgJfX1+MHDkSV678Ubby4sWLGDBggFM7SUREdEO52KWFqgcD06ZNg06nw86dO7Fp0yZkZGSgf//+uHTpj1Ef6wwQERE1HqoPE2zduhXr169HREQEgKuXGg4fPhz33HMPvvnmGwD2yxGbTCaYTCabdRbFAp3wBj5ERER1qpFUDnQ
W1TMD+fn5aNasWcX/DQYDPv30U7Rt2xYDBgxAbm6u3TYSExPh4+Njs6TnH1LbFSIiorrBwwQ1a9euHQ4cOGCzTq/X45NPPkG7du3w0EMP2W0jISEB+fn5NkuET5jarhAREZETqB4MxMbGYtmyZZXWXxsQ3HHHHXbPGTAYDPD29rZZeIiAiIgaDBebGVB9zsDMmTNRXFxcdWN6PdatW4esrKxad4yIiKi+uNqJ8KpnBvR6Pby9q7+m+ezZs3jjjTdq1SkiIiK6cTSKk4c/+/fvR8+ePWGxWFT93N1B/UW5U8X2T1AEgCG+Xe1mfrcUiNoqspaJct5aWWGVc+WFolwfQ7Aod1ZYPKlMkW0TL2GBnYOmc6Lc7YZAUW757rmi3HMRU0S5bwqP2M34uHmK2uphDBLlUotOinJ7Hr1FlFvwtawIzxmNWZT71ZwnykmL/xwwXxDl/PWy4k4FVpP9EIAya7ndzEN62X73z8u7RLkuXq1FucxS2XvsIyxQJTXCECrKrRYWWeouLDyVr8j2vQctssJYa7UXRbmNmRtFOUcVjI1xWlve721xWlt1RfVhgg0bNtT4+PHjxx3uDBERUYPQSI71O4vqwcCQIUOg0WhqPJ5ir84AERFRQ8ZyxHYEBgZi7dq1sFqtVS579uypi34SERFRHVE9GAgPD6/xF769WQMiIqIGj5cW1iw+Ph5FRUXVPt6hQwds27atVp0iIiKqV65VjVj9YCAqKqrGxz09PREdHe1wh4iIiOjGUn2YgIiI6GanWBWnLWolJSUhNDQURqMR4eHh2L59e415k8mE6dOnIyQkBAaDAe3bt8cHH3yg6jlVzwwQERHd9OrpWP+aNWswadIkJCUlITIyEu+++y5iY2ORkZGBNm3aVPkzw4YNw7lz5/Dvf/8bHTp0QG5uLsrL7dfi+F9OKzrUrl07bN68GbfeeqtDP788+ElRbov+iijXVmO/oEfbctnEyIfKGVHuEb2sSNB3iqyoRpBWVpRkoElW7ChHL7vks12ZrDjR3zSZotyDxhBR7ixkxZ3eTZ8jyr0QMc1u5pRVtj/dqpUVTDlqlRWyekBpLsr9s/iA/RCA+706inLFwsJTeUqpKHeL1kP4vLIvpsMmWVGxmWhrN5NskL2GE+WXRbm73WRFeA6Uyz7f0iJlUvcqPqLcSqvs+6y38PWWCg+uZ1tlxdEKFdn3wNbTm0U5R10eMcBpbfkmy8+ju+uuu9CzZ08sWbKkYl1YWBiGDBmCxMTESvlNmzbhsccew/Hjx9G8uex7pSqqZwYWLFhQ5frMzEwsX74cAQEBAIAXXnjB4U4RERHVKyeeQGgymWAy2VbXNBgMMBhsB4RlZWXYvXs3pk2z/UMmJiYGqampVba9YcMGREREYM6cOfjPf/4DT09PDBo0CG+++SY8PGSDdcCBwcCkSZMQHBwMvd72R61WK1asWAE3NzdoNBoOBoiIqNFyZtGhxMTESvfsee211/D666/brLtw4QIsFgv8/W1nZfz9/ZGTk1Nl28ePH8eOHTtgNBqxfv16XLhwARMmTMDFixdVnTegejAwduxY/Pzzz1i1ahXCwsIq1ru5uWHLli3o3Lmz2iaJiIhuWgkJCZg8ebLNuutnBf7X9VV8FUWptrKv1WqFRqPBypUr4eNz9VDRvHnz8Oc//xmLFy8Wzw6oHgy8++67+OyzzzBw4EBMmTIFzz//vNomqpwyMSsWuGl0qtsiIiJyOiceJqjqkEBV/Pz8oNPpKs0C5ObmVpotuCYwMBDBwcEVAwHg6jkGiqIgKytLfB6fQ5cWDhkyBGlpaVi/fj1iY2Ornb6oTmJiInx8fGyWrwoPOtIVIiIip6uPSwvd3d0RHh6OlJQUm/UpKSno06dPlT8TGRmJs2fP4sqVP06GPnLkCLRaLVq1aiV+bofrDAQHB2Pr1q3o168fevTooaoEcUJCAvLz822WB726ONoVIiIi57I6cVFh8uTJeP/99/HBBx/g0KFDeOm
ll5CZmYlx48YBuPr7My4uriL/+OOPo0WLFnj66aeRkZGBH374AfHx8Rg9enTdnkD4vzQaDRISEhATE4MdO3YgMFB2//Cqpkx4iICIiFzd8OHDkZeXhxkzZiA7Oxtdu3bFxo0bERJy9RLt7OxsZGb+cVl306ZNkZKSgr/+9a+IiIhAixYtMGzYMPzf//2fqud1StGh8PBwhIeHAwBOnz6N1157TXX1IyIiooZCqcd7E0yYMAETJkyo8rEPP/yw0rpOnTpVOrSgltOKDl2zf/9+9OzZExaLrLjJNc+0/bMoZ4Gsux6wP9MgbeussFjG+XJZAZs3LbJiHsuMsqIpUlrIig6ds8hex/26AFHuL17nRbl7srNFuQe8OolyC9Jn280MD58kaivfItsWZ8sui3KdjLJ9wFPjJso1F+a2l2aJcq9DVihqs1FWTOiMtViUa6E1inJGwRHOS8LiNZnlskJR0iJBfbWywi/P3ibbFlKR+6q/gdz/+o9b1VXsrve+u2y21kN4tPmcYrIfAtAVnqLc30+tFOUclfeg8+6x0+Kr753WVl1RPTOwYcOGGh8/fvy4w50hIiKiG0/1YGDIkCHQaDQ1njBY3fWQREREjUF9HiaoD6qvJggMDMTatWthtVqrXPbs2VMX/SQiIrpx6ulqgvqiejAQHh5e4y98e7MGRERE1LCoPkwQHx+PoqLqT1Tp0KEDtm2T36GJiIiooXG1wwSqBwNRUVE1Pu7p6YnoaOedhUlERHSjcTBARETk4lxtMOBwOWIiIiK6OXBmgIiI6HqKa10ir3owkJWVBaPRCD8/PwDA9u3bsXTpUmRmZiIkJAQTJ05E7969VXfES9iV7aYzopyH1t1upsQqq1B2vixflPuL8GZLfy+TVdqDWRaT3tfBrMiqQvrrm4pyySZZgakUi6w9HzdZ5bFTVlmFREl1wTW754va8giq+VyZawYFhoty6YUnRLniclnVNj+jj/0QgAiP1qLcGsiqbqYVyvaBnOLLolyfFh1FuXt0t9jNbC46JmpLEVYizRJ+fvSess/jx/tl77FUW6OfKDe+PFeUa26RfR5/vnhUlGvrJau66WuU32mvLvEwgR3Dhg3Drl27AACff/45+vfvjytXriAyMhLFxcWIjo7Gl19+6fSOEhERUd1QPTPw66+/IiwsDACQmJiIWbNmYerUqRWPL1q0CP/4xz/w0EMPOa+XREREN5Bida3DBKpnBrRaLQoKrt7Y48SJE4iNjbV5PDY2FocPH66xDZPJhIKCApulXDgFR0REVNcUq/OWxkD1YCA6OhrJyckAgB49euC7776zeXzbtm0IDg6usY3ExET4+PjYLLvyD6ntChERETmB6sMEs2fPRlRUFM6ePYu+ffti+vTp2LVrF8LCwnD48GGsWbMGS5curbGNhIQETJ482Wbd9G5j1HaFiIioTii8mqBmYWFh2LlzJ/72t79hzpw5KCoqwsqVK6HX69GrVy+sXr0aQ4YMqbENg8EAg8H23uB64RnxREREda2xTO87i0N1Btq3b4/k5GQoioLc3FxYrVb4+fnBzc3N2f0jIiKiOlarCoQajQb+/v4IDAysGAicPn0ao0ePdkrniIiI6oNi1ThtaQw0ipPvN7x//3707NkTFou6qwPuDuovypmssko8txsD7GbyrKWitgK0HqLcr2UXRLmsUlku1jtMlGut2C+wBAAmjWxTm4VFWI4Ki/94amQzRjqN7EPjC1l7v5ZftJv5PvegqK2Ss9tFudYdHhTlnvbtIcqdg6wwVoZw37tcXv0dR/9XqEFWwOa34rOiXMYHT4hyo15IFeUKFfvvi6/GYDcDAKfLC0S51npvUS5L2F6g3kuUk/q9LE+UayIoyAYA7d18RTkjZId4T1oKRTlPrezz/empDaKcozIj7nVaW23Sv3FaW3VF9WGCDRtq3gDHj8sqkhERETVUjeUvemdRPRgYMmQINBoNappQ0Aj/wiMiIqL6p/qcgcDAQKx
duxZWq7XKZc+ePXXRTyIiohvG1c4ZUD0YCA8Pr/EXvr1ZAyIiooZOUZy3NAaqDxPEx8ejqKj6k5A6dOiAbdu21apTREREdOOoHgxERdV8K1dPT09ER0c73CEiIqL61lim953FoaJDRERENzNXK0dcq6JDRERE1Pg5VHToiy++QHp6Ou6//3707t0b3377LebOnQur1YqhQ4fi2WefVd2RLv53iXLN9U1FuUOFp+1mjHpZ8Q2dRjZm6te0gyhXCllBpq/PHxDlfAxNRDmTRVawqXWTW0Q5g7A4SHO9rH9HS86Jcrd6+ItyJwTFncI87BenAoDU/KOi3OljX4lyXcKGiXIWYYH0K+ZiUa6bd4god1Dw+QGAfj63iXJphb+LctLXG+TRwm7mSrmsqJj09unSvvm5y4oTSb9XpIKFRYwyzfmi3JkSWSGrc0WXRbk23i1FuZbuPqLcT2e/E+UcdazzQKe11SFjs9Paqiuq98alS5di6NCh+Oqrr3D//fdj5cqVGDJkCIKDg9G2bVtMmjQJ77zzTl30lYiI6IawKhqnLY2B6nMGFixYgKSkJIwdOxbbtm3DAw88gLfffhsTJkwAANx9992YM2cOXnzxRad3loiIiJxP9czAyZMnMXDg1emTAQMGwGKxoF+/fhWP9+/fH6dOnXJeD4mIiG4wRdE4bWkMVA8GWrRoUfHL/uzZsygvL0dmZmbF46dOnULz5s1rbMNkMqGgoMBmsbrazaOJiKjBcrUKhKoPEwwePBhjxozByJEjsWHDBsTFxeHll1+GVquFRqNBfHw8YmJiamwjMTERb7zxhs06vyZBaNm0ldruEBEROV1jqRzoLKpnBt566y1ER0dj9erV6NmzJ9577z2MGTMGgwcPRmxsLFq0aIHExMQa20hISEB+fr7N4ucZ5PCLICIiIsepnhnw9PTEe++9Z7PulVdewfPPPw+z2QwvL/uXtxgMBhgMtvca1zr5MhsiIiJHNZbpfWdx2m9go9EILy8vnD59GqNHj3ZWs0RERDecq11a6FDRoZrs378fPXv2hMUiK+Rxzei2fxblTpUXiHL36OwXzjmmkRUlyVdkxXrOmmV9M2plEzL36WTFdU5qykQ5C2SbukgpF+X2l5wR5R5o0l6Ue+P+y6LcR1/5iXLfaOwXV9l95aSorSe8u4lya4uOiHIHD/1XlHsuYooo181qFOU2WGWFnf6kkxWISVdk+7yHRrbP3wFPUa5YsC9Hlcr247fcZUV4AnWyvmUKv6P8dLJiXFK9ICs69J1yUZQL0sr6F2mW7XtNrbLvn50G2cnkc08mi3KO+rXdQ05rq+vxL53WVl1RfZhgw4YNNT5+/PhxhztDRETUEDSWSwKdRfVgYMiQIdBoNKhpQkGjca03kYiIbi68msCOwMBArF27Flartcplz549ddFPIiIiqiOqBwPh4eE1/sK3N2tARETU0LnaCYSqDxPEx8ejqKio2sc7dOiAbdu21apTRERE9YnnDNgRFRVV4+Oenp6Ijo52uENERER0Y6keDBAREd3sXO1oNwcDRERE12ksx/qdRfVgoKioCKtWrUJqaipycnKg0Wjg7++PyMhIjBgxAp6essIc19tbmi3K3W4MEOWyYL9Q0GWrrFjPZausOFGx1STKXbHI2svVy4rrHC+XFU0pU2SFoFrpZcVLOhhlhWlKISsisuDrFqLcsuIDolxU0w52M8Xlsm12DsLCTsK7b0qLCb2bPkeU69/9GVFOuu9tVGSfx/NlsgI7AzxDRbnPzWdFuUMFp+1mTrW4XdSWuVz2ufi52P5zAsDQJvb3OwA4YJW9d1LbcVmUKxfuo3tMOaKcp0F2g7nvSjPthwB0VmTfK3XN1c4ZUHU1QUZGBm677TZMmTIFly5dQps2bdCqVStcunQJ8fHx6NixIzIyMuqqr0RERFQHVM0MTJw4Ef369cNHH30Ed3d3m8fKysowatQoTJw4kVcTEBFRo8bDBDXYuXMn0tPTKw0
EAMDd3R2vvvoq7rzzTqd1joiIqD642PmD6gYDzZo1w9GjR9G5c+cqHz927BiaNWtmtx2TyQSTyfZYrVWx8jbGRERE9UDVb9+xY8di5MiRmDt3Lvbv34+cnBycO3cO+/fvx9y5czF69Gg899xzdttJTEyEj4+PzXKuKMvhF0FERORMrEBYg9dffx0eHh6YN28epkyZUnFDIkVREBAQgGnTpmHKFPtnSSckJGDy5Mk266JuHaimK0RERHXG1a4mUH1p4dSpUzF16lScOHECOTlXLz0JCAhAaKjs0iEAMBgMMBgMNut4iICIiKh+OFx0KDQ0VNUAgIiIqLGQVWO4eaj+c7ykpAQ7duyosp5AaWkpVqxY4ZSOERER1RcFGqctaiUlJSE0NBRGoxHh4eHYvn276Od+/PFH6PV63HHHHaqfU6OouN/wkSNHEBMTg8zMTGg0GkRFRSE5ORmBgYEAgHPnziEoKAgWi6yi1/96POQRUe42NBHlOpjtb4DP3GQVwEzCil1uwkMdLTQG+yEABYr9KoqAvLKgTiPbKX8rzRXlmuqMolxf90BRrhiy12ERXvRjEIx1txQfF7Xlq5dV1jxRJKvaNs0nQpT71CKryPfd/vdFucjbnxblXoKsqtwWd1lFw0NleaLcve5BotwJpcRuZoRJtn8ec9eJcjla2f55xHpFlOuklVX6lCoUfn6knzM/uIlyRuHflCbh39oBVtn2eCnzY1HOUT8E/MVpbfXL+UScXbNmDZ566ikkJSUhMjIS7777Lt5//31kZGSgTZs21f5cfn4+evbsiQ4dOuDcuXPYt2+fqj6qmhmYOnUqunXrhtzcXBw+fBje3t6IjIxEZqaszCQREVFjYFWct5hMJhQUFNgs119ef828efMwZswYPPPMMwgLC8P8+fPRunVrLFmypMb+Pvfcc3j88cfRu3dvh16vqsFAamoqZs2aBT8/P3To0AEbNmxAbGwsoqKicPy47C8sIiKihs4KjdOWqi6nT0xMrPScZWVl2L17N2JiYmzWx8TEIDU1tdq+Ll++HL///jtee+01h1+vqhMIS0pKoNfb/sjixYuh1WoRHR2NVatWOdwRIiKihsKRY/3Vqepy+uuvqAOACxcuwGKxwN/f32a9v79/xdV71zt69CimTZuG7du3V/r9rIaqn+zUqRPS09MRFhZms37hwoVQFAWDBg1yuCNEREQ3o6oup6+J5rrzuxRFqbQOACwWCx5//HG88cYbuO2222rVR1WHCR555BEkJydX+diiRYswYsQIqDgfkYiIqEGyOnGR8vPzg06nqzQLkJubW2m2AAAKCwuRnp6O559/Hnq9Hnq9HjNmzMD+/fuh1+vx7bffip9b1WAgISEBGzdurPbxpKQkWK2udnUmERHdbOrj0kJ3d3eEh4cjJSXFZn1KSgr69OlTKe/t7Y1ffvkF+/btq1jGjRuHjh07Yt++fbjrrrvEz+34AQYiIiJyqsmTJ+Opp55CREQEevfujWXLliEzMxPjxo0DcPWP8jNnzmDFihXQarXo2rWrzc+3bNkSRqOx0np7HBoMZGVlwdfXF02bNrVZbzabkZaWhn79+jnSLBERUYNQX3Pcw4cPR15eHmbMmIHs7Gx07doVGzduREhICAAgOzu7Ti7nV1V0KDs7G4MHD8bu3buh0WjwxBNPYPHixRWDgtoUHeoRECnKeQkL3TTTedjNnCy7KGorr0xWnKiTZ7Aol2uWtdfBcIso92uxrDCN2SorYhTeNESUMwrHkhllwiJGWtkJNiF6H1HuV9M5u5kuhsrH4aqSXnJalGtlaC7KWYSFrC6WF4lyBq2sQMyPB5aLcp3DZAVXOnvICkodKpEVY/LW2//cApVPsKpKiJuvqK3fTbKCSLmmy6JcuyYBoty5snxRTspNKyvWk1V0QZS7p3nVt6q/3mWrrPCUh3AfvWSxX1AKANLObBPlHLXR/zGntfXAudVOa6uuqDpnYNq0adDpdNi5cyc2bdq
EjIwM9O/fH5cuXarI8ARCIiKixkXVYYKtW7di/fr1iIi4Wko1KioKw4cPxz333INvvvkGgGzETkRE1JA5s85AY6BqZiA/Px/NmjWr+L/BYMCnn36Ktm3bYsCAAcjNlU0HExERNWRWjfOWxkDVYKBdu3Y4cOCAzTq9Xo9PPvkE7dq1w0MPPSRqp6o6zVbhMVQiIiJyLlWDgdjYWCxbtqzS+msDAultE6uq03yuKEtNV4iIiOqMM+9N0BioGgzMnDkTn3xS9a0Y9Xo91q1bJ7phUUJCAvLz820Wf0/ZLVOJiIjqmuLEpTFQdQKhXq+Ht7d3tY/rdLqKayFrUlWdZq1G1biEiIiozrjagWvVv4FLSkqwY8cOZGRkVHqstLQUK1ascErHiIiI6MZQNTNw5MgRxMTEIDMzExqNBlFRUUhOTkZg4NXCI/n5+Xj66acRFxenuiOdhAV29MLxSxc0sZvJ18uKZQS5yYrc5FmKRbmJ+vaiXLFwfqlJE9lmbK6RFf34tuSU7ImF/iMs7LNMWHTolPWKKPc67M9SrYGswEmowU+U21coe++e9+khym1UsmXtKbLiP9JiQhmHqj4ceL2HekwU5VoLizHdrpflzitldjPjTbLviqctJlFurPcdopxUF6usOJHU9PLDotw8396i3DcaWcGrAH0z+yEA5xTZ9228RbYv1zWri10mr2pmYOrUqejWrRtyc3Nx+PBheHt7IzIysk5KIxIREdUXVztnQNVgIDU1FbNmzYKfnx86dOiADRs2IDY2FlFRUaITB4mIiKjhUXWYoKSkBHq97Y8sXrwYWq0W0dHRWLVqlVM7R0REVB9c7QRCVYOBTp06IT09HWFhYTbrFy5cCEVRMGjQIKd2joiIqD40lsqBzqLqMMEjjzyC5OTkKh9btGgRRowYwRsVERERNTKqBgMJCQnYuHFjtY8nJSXBanW1yRUiIrrZuFoFQlWHCYiIiFyBq81xs+wfERGRi3PKzEC7du2wefNm3HrrrQ63cUUpF+WMGp0o96Ny2W7GS2sUteWjcRflwnWy4hv5wiHnl+WygjMXy2XFQcoViygXZpQVQ2kmLBL0d6ussM8lc6Eo18FN9j5vNtrfp9IKnXtJbD+f20S5dKVAlDtfJstt8ZS9J531soIu0mJCX+5dLMq1av+AKAcvWeyc2f77st4jVNRWhNJalDsmLFCVVnRSlItt6vj3ZVXKzbLP91c62T6VZc4X5QL1so1WYJUVd5qtlT3vUFHKca52AqGqwcCCBQuqXJ+ZmYnly5cjIODqL5EXXnih9j0jIiKqJ6529puqwcCkSZMQHBxcqdaA1WrFihUr4ObmBo1Gw8EAERE1aq52zoCqwcDYsWPx888/Y9WqVTa1Btzc3LBlyxZ07tzZ6R0kIiKiuqXqBMJ3330Xr732GgYOHIhFixY5/KQmkwkFBQU2i0V4PJuIiKiuWTXOWxoD1VcTDBkyBGlpaVi/fj1iY2ORk5Oj+kkTExPh4+Njsxwr+F11O0RERHXB6sSlMXDo0sLg4GBs3boV/fr1Q48ePVRXHUxISEB+fr7N0sFbdltfIiIici6HLy3UaDRISEhATEwMduzYgcBA+T2oDQYDDAbby9J0wksGiYiI6lpj+YveWWpdZyA8PBzh4eHO6AsREVGDoDSSY/3OovowQUlJCXbs2IGMjIxKj5WWlmLFihVO6RgRERHdGKpmBo4cOYKYmBhkZmZCo9EgKioKycnJFYcI8vPz8fTTTyMuLk51R7LKLolyzfWespzOw/5zlssqcXnp3US5L0pPinJuwkMigW4+otw9elnFwGKNbOKrmSLr3x5FVjHQVyOrVOiulz1vsbBa5WXFfsWznOLLorYurpZV5Ov49H9Eub5eHUS5AZ6yKnr7y86LchfNV0S51obmopy0smDW79Xf4Ox/PdhjgihnsprtZi7CfgaQfw+UCvc7T71sf88T7J9qeOntf+cB8r8A2wi/f8yK7HulRLDNAEC
naRhV8l3tMIGqd33q1Kno1q0bcnNzcfjwYXh7eyMyMhKZmZl11T8iIqIbjlcT1CA1NRWzZs2Cn58fOnTogA0bNiA2NhZRUVE4fty5Nd6JiIjoxlB1mKCkpKRSKeLFixdDq9UiOjoaq1atcmrniIiI6gPLEdegU6dOSE9PtylFDAALFy6EoigYNGiQUztHRERUHxpL5UBnUXWY4JFHHkFycnKVjy1atAgjRoxQXYCIiIiooeE5AzVISEjAxo3VnxWclJQEq7WxvHQiIiICnFB0iIiI6Gbjan/WqpoZyMrKwoULFyr+v337djzxxBOIiorCk08+ibS0NKd3kIiI6EZTnLg0BqpmBoYNG4a///3viI2Nxeeff46hQ4fioYceQmRkJI4cOYLo6GisW7cODz30kOqOhBpaiHLtNLKiQ96K/XFOgbZM1FbK5crVFqtyu1eIKOevk72GPlZZ7jNFVnBGJxz7ZZryRDlfN1n/urjJtm2ORVaEJdcsKxITYQy2m+nToqOorVEvpIpyFmEBljsge+8+N58V5e51DxLlNiunRbnb9bKiQ/CSxaTFhL7amyTKDe35gt1MF6tR1NZeS6koN8DYWpQ7apUVdopQmopyUqXustvAh8NblLMKf42Fm2TPu8IoK94WICxSRs6lajDw66+/VlxJkJiYiFmzZmHq1KkVjy9atAj/+Mc/HBoMEBERNRS8mqCmsFaLgoKrf5WdOHECsbGxNo/Hxsbi8OHDzusdERFRPeDVBDWIjo6uuLSwR48e+O6772we37ZtG4KD7U/NEhERUcOh6jDB7NmzERUVhbNnz6Jv376YPn06du3ahbCwMBw+fBhr1qzB0qVL7bZjMplgMtkeH7YoFuiEN/AhIiKqS43lxD9nUTUzEBYWhp07d6KsrAxz5sxBUVERVq5ciddffx3Hjh3D6tWrMWrUKLvtJCYmwsfHx2Y5kn/M0ddARETkVFYoTlsaA9V1Btq3b4/k5GQoioLc3FxYrVb4+fnBzU12pihwtXjR5MmTbdY91XWE2q4QERGREzhcdEij0cDf39+hnzUYDDAYbC8f4SECIiJqKBrLiX/OouowAXD1zoU7duxARkbla+9LS0uxYsUKp3SMiIiovrDoUA2OHDmCmJgYZGZmQqPRICoqCsnJyQgMDAQA5Ofn4+mnn0ZcXJzqjjxqlhXCWO8mKzhzprzQbsZT6y5qq4/PraJce62siMgXRUdFOaWJrMhJe43svTNCNvtS7CYrxnSxXFZcpYVbgCh3xFouys1EW1Hua9gvhnKP7hZRWz8ql0S5IA9ZgaVi4VfEoQJZkaCQ5j6inEYju3j6vCLbB84JC0CZrGZRTlJMCADW7VlgN/NIz7+K2iqxyl5rifBvRR+N7HvlC0uOKCcVpJNVgPrOKisq5i6crc33kD1vnrC4UxdtE1GurnFmoAZTp05Ft27dkJubi8OHD8Pb2xuRkZHIzMysq/4RERFRHVM1M5CamoqtW7fCz88Pfn5+2LBhAyZOnIioqChs27YNnp6yEqtEREQNmatVIFQ1GCgpKYFeb/sjixcvhlarRXR0NFatWuXUzhEREdWHxnJJoLOoGgx06tQJ6enpFfcnuGbhwoVQFAWDBg1yaueIiIio7qk6Z+CRRx6pKEd8vUWLFmHEiBFQFNcaTRER0c3H1a4mUDUYSEhIwMaNG6t9PCkpCVarq52DSURENxveqIiIiIhcisMVCImIiG5WPIHQji+++ALp6em4//770bt3b3z77beYO3curFYrhg4dimeffdahjqS5y4qSBENWkOJunf1CPKe1siI3R62ywirSXWdUk06inBaya1t2KbL+uQkngprrPEQ5X2HOQ5E970P6QFEuWV8kyknOX9lcJLtB1l2eIaLcifILolyURbbvnWpxuyg3tFRW6OYjg68oN94k22brPUJFuYuQfb67WI2inKSg0Po9C0VtSQsd5VhLRLmWWtlr6K1vKcpJ/Sr8norUNhfl2pl
l3z/fuMveF29hkbdMjawIVF2rz6FAUlIS/vnPfyI7OxtdunTB/PnzERUVVWV23bp1WLJkCfbt2weTyYQuXbrg9ddfx8CBA1U9p6rDBEuXLsXQoUPx1Vdf4f7778fKlSsxZMgQBAcHo23btpg0aRLeeecdVR0gIiKiq9asWYNJkyZh+vTp2Lt3L6KiohAbG1ttcb8ffvgB9913HzZu3Ijdu3djwIABePjhh7F3715Vz6tqZmDBggVISkrC2LFjsW3bNjzwwAN4++23MWHCBADA3XffjTlz5uDFF19U1QkiIqKGpL5O/Js3bx7GjBmDZ555BgAwf/58bN68GUuWLEFiYmKl/Pz5823+P2vWLHz++ef44osv0KNHD/HzqpoZOHnyZMXUw4ABA2CxWNCvX7+Kx/v3749Tp06paZKIiKjBsUJx2mIymVBQUGCzmEymSs9ZVlaG3bt3IyYmxmZ9TEwMUlNTZf22WlFYWIjmzWWHg65RNRho0aJFxS/7s2fPory83Gbq4tSpU6IOVPXGWBT7N5UhIiK6EZxZZyAxMRE+Pj42S1V/5V+4cAEWiwX+/v426/39/ZGTI7ux1dtvv42ioiIMGzZM1etVdZhg8ODBGDNmDEaOHIkNGzYgLi4OL7/8MrRaLTQaDeLj4yuNaKqSmJiIN954w2ZdL5/OuNO3q6rOExERNXQJCQmYPHmyzTqDwVBt/vq7iyqKIrrjaHJyMl5//XV8/vnnaNlS3QmqqgYDb731FkwmE1avXo2+fftiwYIFeOeddzB48GCYzWZER0dXOdq5XlVvzNRuo1V1nIiIqK4485wBg8FQ4y//a/z8/KDT6SrNAuTm5laaLbjemjVrMGbMGHzyySf405/+pLqPqgYDnp6eeO+992zWvfLKK3j++edhNpvh5SW7r3VVb4xOeO9sIiKiuqbUw8WF7u7uCA8PR0pKCh555JGK9SkpKRg8eHC1P5ecnIzRo0cjOTkZDz74oEPP7ZSiQ0ajEUaj7NpaIiIiqtrkyZPx1FNPISIiAr1798ayZcuQmZmJcePGAbg6s37mzBmsWLECwNWBQFxcHN555x3cfffdFbMKHh4e8PHxET+v6nLEJSUl2LFjBzIyMio9VlpaWtFBIiKixqq+7k0wfPhwzJ8/HzNmzMAdd9yBH374ARs3bkRIyNXCZ9nZ2TYn7r/77rsoLy/HxIkTERgYWLGovcRfo6i4zeCRI0cQExODzMxMaDQaREVFITk5GYGBVyvHnTt3DkFBQbBY1F8Z0KZ5N1GurYfspIhAvf1DFrdo7B/DAYCfy2RncXZ0ayHK5Smlotzx0vOiXLB7M1FOp5GN/S5bZBXFmulk1SBzzbLKaFklsup9HZoGiXKSCo7ZpkuitoINsst0csoui3KhxltEObPwKptB2pqPJ16zouyEKFdkqXzZU1UimrQW5bLKZftAoUX22Six2q9Sd6tR9p6s27NAlLut4yP2QwA0ws9ZL2FVS6mfr8i2bTN32eFc6a8GH72sEum5Mtk+UGCWVRg9c+mgKOeoCW3VnY1fk6ST/3VaW3VF1czA1KlT0a1bN+Tm5uLw4cPw9vZGZGRktZWRiIiIqOFTNRhITU3FrFmz4Ofnhw4dOmDDhg2IjY1FVFQUjh8/Xld9JCIiuqGcWWegMVB1AmFJSQn0etsfWbx4MbRaLaKjo7Fq1Sqndo6IiKg+8K6FNejUqRPS09MRFhZms37hwoVQFAWDBg1yaueIiIio7qk6TPDII48gOTm5yscWLVqEESNGiE86ISIiaqjq62qC+qJqMJCQkICNGzdW+3hSUhKs1sby0omIiKqmOPFfY+CUokNEREQ3E1f7s1Z10SEiIiK6uaieGSgqKsKqVauQmpqKnJwcaDQa+Pv7IzIyEiNGjICnp6dDHWnfJECU66OXFWtJt9gvJtNOWDRHWlznkiIr1CIphgMA440dRbn9WlmhFjfh866+dFSUK7OUi3Jj/e8W5Q4Ii5f00PuJcr9
Z8u1msoRFfVrrvUW5M6aLolygTvY5+bn4tCiX00T2nuSaLotyY73vEOWOQVagqlSR7SsDjLIiRiWCv9tyrLK+SYsJHTm8XpQbHf6KKNcesv1dKtso2wce0geKcp+aZfteP72suFO+8HObaZUVHaprjWV631lUzQxkZGTgtttuw5QpU3Dp0iW0adMGrVq1wqVLlxAfH4+OHTtWWaaYiIioMXG1EwhVzQxMnDgR/fr1w0cffQR3d3ebx8rKyjBq1ChMnDgR27Ztc2oniYiIqO6oGgzs3LkT6enplQYCwNVbL7766qu488477bZjMplgMtlOqVsVK7TCmt5ERER1yepil8mr+u3brFkzHD1a/fHkY8eOoVkz+zfNSUxMhI+Pj81yqlB2kw0iIqK65mrliFUNBsaOHYuRI0di7ty52L9/P3JycnDu3Dns378fc+fOxejRo/Hcc8/ZbSchIQH5+fk2S4hXqMMvgoiIiByn6jDB66+/Dg8PD8ybNw9TpkyBRnP17HRFURAQEIBp06ZhypQpdtsxGAwwGGxvH8xDBERE1FDw3gR2TJ06FVOnTsWJEyeQk5MDAAgICEBoKP+yJyKimwMvLbTj0KFDWL58OcrKytC7d280a9YMc+bMwejRo/Htt9/WRR+JiIioDqmaGdi0aRMGDx6Mpk2bori4GOvXr0dcXBy6d+8ORVEwcOBAbN68Gffcc4/qjrTX+4hyewTFhADgfHmh3cxxrcFuBgC8tZWvnqhKc40sd15YnCgdxaKctIiRWZFd8dqrWQdRLrvssijXyeImyn1ZmifK6YyyMew9OvsFqvSeOlFbWeUFopyfu6w4UaawvaFNZNsiw2p/fweAdsLiXlJpRSdFOU+97LN21HpFlPMRfNZaao2itjTCQ5TSYkIf7J4ryk2JeFWUk5IWdrqgkRXa8hR+752CrOjZqXL7RcAAIET4u6CuNZb6AM6iamZgxowZiI+PR15eHpYvX47HH38cY8eORUpKCrZu3YopU6Zg9uzZddVXIiKiG8IKxWlLY6BqMHDw4EGMGjUKADBs2DAUFhbi0UcfrXh8xIgROHDggFM7SEREdKO52l0LHT6FX6vVwmg0wtfXt2Kdl5cX8vNlU0FERETUMKgaDLRt2xbHjh2r+H9aWhratGlT8f/Tp08jMFB2EwwiIqKGivcmqMH48eNhsfxx8knXrl1tHv/6668dOnmQiIioIVFcrByxqsHAuHHjanx85syZteoMERER3Xiqiw4RERHd7BrLVQDOwsEAERHRdRrLsX5naTCDATfhuYy36WQFKR5AC7uZHI1sc++0yIrhlGhkRT/eDS4R5UZmyYr1uGtkhXOkTpScF+UiPNvYDwH4c4fTotx7B5uIct7CYlHP3pZlN/Pxftm2uN0jSJTLFBZW8dPJXusBq6w4UVetrNjR2pJsUa6LVVacKLbpraJcnrDQVoTSVJT7wpJjN9Nb31LUVi/PEFGuPTxEOWkxoTnps0Q5qYiuT4py06NzRblp39v/DgUAb8i+f3R6X1GuOWTfe+RcDl1amJWVhStXKlcKM5vN+OGHH2rdKSIiovrEOgM1yM7Oxp133omQkBD4+vpi5MiRNoOCixcvYsCAAU7vJBER0Y3ECoQ1mDZtGnQ6HXbu3IlNmzYhIyMD/fv3x6VLf9wvwNUuxyAiImrsVJ0zsHXrVqxfvx4REREAgKioKAwfPhz33HMPvvnmGwCARmP/pjkmkwkmk+0xRItigc7Jx76JiIgc4Wp/2KqaGcjPz0ezZs0q/m8wGPDpp5+ibdu2GDBgAHJzZSemJCYmwsfHx2bZk/+bup4TERHVEVerQKhqMNCuXbtKNyLS6/X45JNP0K5dOzz00EOidhISEpCfn2+z9PTppKYrREREdYYnENYgNjYWy5Ytq7T+2oDgjjvuELVjMBjg7e1ts/AQARERUf1Qdc7AzJkzUVxcXHVDej3WrVuHrCz713YTERE1ZI3lKgBnUTUzoNfrceb
MGSxfvhy//Xb1GP9vv/2G8ePHY/To0fj+++8REiIr4EFERNRQKYritKUxUDUzsGnTJgwePBhNmzZFcXEx1q9fj7i4OHTv3h2KomDgwIHYvHkz71xIRETUiKiaGZgxYwbi4+ORl5eH5cuX4/HHH8fYsWORkpKCrVu3YsqUKZg9e3Zd9ZWIiOiGYNGhGhw8eBCjRo0CAAwbNgyFhYV49NFHKx4fMWJEpasNiIiIGhteTSD9Qa0WRqMRvr6+Feu8vLyQny+7UQsRERE1DKoGA23btsWxY8cq/p+WloY2bf64c93p06cRGBjovN4RERHVA6uiOG1pDFSdQDh+/HhYLJaK/3ft2tXm8a+//ponDxIRUaPXOH6FO4+qwcC4ceNqfHzmzJm16gwRERHdeKoGA0RERK6gsVwF4CwcDBAREV2HgwEHtGvXDps3b8att97qcBvFsNgPAThtLRLlzDpPu5ky4f2k0vIOi3J9W4TJckfyRLlOTYJEOTfheaAXLFWXkr5eS4OPKPfdZdmdJgcdChDlRhhCRbkmwtuARe7Ltptpa/QTtfV7mWybtXVvZj8EoBe8RLntuCzKFQo/P25a2T1AppfL9vlys+x5vfQeolypu6y9IJ399+9Xa4GorV+LToty2cJ9pVQpF+Uiuj4pykml//qxKNejy+OiXGt3d1HOQ+sma0/TRJQ7ai0U5epaY6kc6CyqBgMLFiyocn1mZiaWL1+OgICrX/ovvPBC7XtGREREN4SqwcCkSZMQHBwMvd72x6xWK1asWAE3NzdoNBoOBoiIqFHjYYIajB07Fj///DNWrVqFsLA/psTd3NywZcsWdO7cWdSOyWSCyWSyWWdRLLyNMRERNQiNpXKgs6gqOvTuu+/itddew8CBA7Fo0SKHnzQxMRE+Pj42yy/5smOURERE5FyqyxEPGTIEaWlpWL9+PWJjY5GTk6P6SRMSEpCfn2+zdPPpqLodIiKiusBbGAsEBwdj69atmD17Nnr06KH6xRoMBhgMBpt1PERAREQNBc8ZENJoNEhISEBMTAx27NjBexIQERE1UqoPExw6dAjLly/Hb79dvcbc09MTv/32G15++WV8++23Tu8gERHRjcbDBDXYtGkTBg8ejKZNm6K4uBjr169HXFwcunfvDkVRMHDgQGzevNmhmxVZhG9YG639YkIAcF4x2c3ooRG1Fd6svSgnHVn9OlFWnOiZj0pFuYtW+68VAHx0RlFub8FJUc7PKCtOtDZE9s48fFL2vAZhkZP/uLWxmxlfnitqq6nWYD8EINMsu4V3sV5WmKZckVVYkhbtyiq6IMrN8+0tyn2lkxX2kX42wuEtyn1ntV8EKlLbXNTWGffLotxDetns5wWNbFtMj5bte1LSYkJ7D64S5V6OSBDljMKtu/aKsHhbU1nxsbrmaocJVM0MzJgxA/Hx8cjLy8Py5cvx+OOPY+zYsUhJScHWrVsxZcoUzJ49u676SkREdNNLSkpCaGgojEYjwsPDsX379hrz33//PcLDw2E0GtGuXTssXbpU9XOqGgwcPHgQo0aNAgAMGzYMhYWFePTRRyseHzFiBA4cOKC6E0RERA2J4sR/aqxZswaTJk3C9OnTsXfvXkRFRSE2NhaZmZlV5k+cOIEHHngAUVFR2Lt3L1599VW88MILWLt2rarnVX3OQMUParUwGo3w9fWtWOfl5YX8fNk0KRERUUNlVRSnLWrMmzcPY8aMwTPPPIOwsDDMnz8frVu3xpIlS6rML126FG3atMH8+fMRFhaGZ555BqNHj8bcuXNVPa+qwUDbtm1x7Nixiv+npaWhTZs/jsuePn2aVxUQEVGj58yZAZPJhIKCApvl+iq8AFBWVobdu3cjJibGZn1MTAxSU1Or7GdaWlql/MCBA5Geng6z2Sx+vaoGA+PHj4fF8sfJMV27drW5T8HXX3/t0MmDREREN6uqqu4mJiZWyl24cAEWiwX+/v426/39/ast8JeTk1Nlvry8HBcuyE4
YBlReTTBu3LgaH585c6aa5oiIiBoktdP7NUlISMDkyZNt1l1feO9/aTS2V7opilJpnb18Vetr4nDRISIiopuVM29UVFXV3ar4+flBp9NVmgXIzc2t9Nf/NQEBAVXm9Xo9WrRoIe6jwycQEhERkfO4u7sjPDwcKSkpNutTUlLQp0+fKn+md+/elfJbtmxBREQE3NxkNVkAQKM0kPJIfwkZLMrlCwvseGvd7WYeLJcVONmivyLKmSErEOOrsd83ADhvlRUdKlVkRU7cNLKx350aX1HuF8jel+bC11smfP+aQbaDFwoK8ZywyIrmtNY1FeW25P8mysX6yApP7THJbgT2J0NrUe6oVbbNPDSyScNj5ouiXBs3WYGqHvAS5X5S7F+1NLzcV9TW29YTopybVvaeeAq+ewCgk76ZKCd10lIoyt2qk33vvZ1e+Zh2VZ6JiBfl/CF7X64IC2glnfyvKOeo226JcFpbR86ni7Nr1qzBU089haVLl6J3795YtmwZ3nvvPRw8eBAhISFISEjAmTNnsGLFCgBXLy3s2rUrnnvuOYwdOxZpaWkYN24ckpOTbS79t0fVYYKsrCwYjUb4+fkBALZv346lS5ciMzMTISEhmDhxInr3llUuIyIiaqiceZhAjeHDhyMvLw8zZsxAdnY2unbtio0bNyIkJAQAkJ2dbVNzIDQ0FBs3bsRLL72ExYsXIygoCAsWLFA1EABUHiYYNmwYdu3aBQD4/PPP0b9/f1y5cgWRkZEoLi5GdHQ0vvzyS1UdICIioj9MmDABJ0+ehMlkwu7du9GvX7+Kxz788EN89913Nvno6Gjs2bMHJpMJJ06csHuyf1VUzQz8+uuvCAu7Or2ZmJiIWbNmYerUqRWPL1q0CP/4xz/w0EMPqe4IERFRQ+HMqwkaA1UzA1qtFgUFV4+xnjhxArGxsTaPx8bG4vBh+zejqKoAg0V43JuIiKiu1Vc54vqiajAQHR2N5ORkAECPHj0qTVVs27YNwcHBdtupqgDDb/lH1XSFiIiInETVYYLZs2cjKioKZ8+eRd++fTF9+nTs2rULYWFhOHz4MNasWSO6W1JVBRhGdZXdfpOIiKiuKcLbh98sVA0GwsLCsHPnTkyfPh1z5sxBUVERVq5cCb1ej169emH16tUYMmSI3XaqKsCg0+hUdZyIiKiuWBvJ9L6zqK5A2L59e6xevRqKoiA3NxdWqxV+fn6qihsQERE1ZA2kBM8No7oC4aFDh7B8+XIcOXIE/v7+yM/PxwsvvIDRo0fj22+/rYs+EhERUR1SNTOwadMmDB48GE2bNkVxcTHWr1+PuLg4dO/eHYqiYODAgdi8ebNDdy48YjovykUbZZXWvGD/sMN2N1mFvyyzrLKXWXpFhJusAlg3rSz3kyVP9rzCge5PuCTKWYQj51BdE1HumFIkynkIDyl5CMa6P1+Unbh6q7Aa2bmiy6JcZBOjKOdpaCXKGYXj+svCqpYBwup4gXpZxUCz8PhruEn2Gcr3sP+837iXiNryKfcQ5frpq64Nf71TkL3H3oLvKDU8tLLZWem+Iq0s+H76P0W5KRGvinJnrMWiXF1ztcMEqmYGZsyYgfj4eOTl5WH58uV4/PHHMXbsWKSkpGDr1q2YMmUKZs+eXVd9JSIiuiEURXHa0hioGgwcPHgQo0aNAnC1GmFhYaFNycMRI0bgwIEDTu0gERER1S2Hb2Gs1WphNBrh6+tbsc7Lywv5+fZvIEJERNSQsQJhDdq2bYtjx45V/D8tLQ1t2rSp+P/p06cRGBjovN4RERHVA1erQKhqZmD8+PGwWP44wadr1642j3/99dcOnTxIRERE9UfVYMDenZBmzpxZq84QERE1BI3lxD9ncficASIiopsVLy0kIiIil6J6ZuCLL75Aeno67r//fvTu3Rvffvst5s6dC6vViqFDh+LZZ591qCPtDX6i3IFyWYGdUL2P3cxtiqzYyM7yK6KcTiMbW2WZC0S5Qp1ZlDtekivKuWt
lm7u7h/07TwJAniIr6rLPIitiNFyR7QNfamVXrLgLihO19ZIVkjlpkRWeauPdUpRrapX91fFdaaYod78xRJSTFqY5p8gK5xRYTaJciVW2L68wyvqXZ7HfP2+tu6itc2Wyz2O+XrZ/niqX7Z86va8oJ9VaIyvutfaK/dvMA8CjTTuKctJiQnPSZ4lyI8IniXJ1zdUOE6iaGVi6dCmGDh2Kr776Cvfffz9WrlyJIUOGIDg4GG3btsWkSZPwzjvv1FVfiYiIbgirojhtaQxUzQwsWLAASUlJGDt2LLZt24YHHngAb7/9NiZMmAAAuPvuuzFnzhy8+OKLddJZIiKiG4EzAzU4efIkBg4cCAAYMGAALBYL+vXrV/F4//79cerUKbvtmEwmFBQU2CwWaV1/IiIicipVg4EWLVpU/LI/e/YsysvLkZn5xzHNU6dOoXnz5nbbSUxMhI+Pj81yJP+Y3Z8jIiK6EaxQnLY0BqoOEwwePBhjxozByJEjsWHDBsTFxeHll1+GVquFRqNBfHw8YmJi7LaTkJCAyZMn26x7qusIdT0nIiKqI652mEDVYOCtt96CyWTC6tWr0bdvXyxYsADvvPMOBg8eDLPZjOjoaCQmJtptx2AwwGAw2KzTCW9LS0RERM6lajDg6emJ9957z2bdK6+8gueffx5msxleXrJ7mxMRETVkjeUqAGdRXWfg0KFD+Omnn9CnTx907NgRv/32G9555x2YTCY8+eSTvDcBERE1eo3lBkPOomowsGnTJgwePBhNmzZFcXEx1q9fj7i4OHTv3h2KomDgwIHYvHmzQwOCQeXeotzXetk5j2bBqO4gikVttXST9S1YJ5sZKUW5KNdW4ynKlRll7blrZJt7T/FpUa6Vwf7JogAQo5MV9lmrXBTlyoRXnkRr7BeJ8TW2ErWVp8iK67R0t1/sCgB2ultFuc6KrIhRgFV2mO0Hi6xQVLxFdvfR2cICUNKCXAEag/0QgC5a+wV2MjVlorYKzEWiXKZVlgsRFDwDgOaQFViSOmqVFcbq2zRUlLsC2efsjFX2PSotJpS8e74oR86l6mqCGTNmID4+Hnl5eVi+fDkef/xxjB07FikpKdi6dSumTJmC2bNn11VfiYiIbghXKzqkajBw8OBBjBo1CgAwbNgwFBYW4tFHH614fMSIEThw4IBTO0hERHSjKYritKUxcPhGRVqtFkajEb6+vhXrvLy8kJ8vmzYkIiKihkHVYKBt27Y4duyP4kBpaWlo06ZNxf9Pnz6NwEDZsUYiIqKGSnHiv8ZA1QmE48ePh8Xyx0klXbt2tXn866+/5tUERETU6DWW6X1nUTUYGDduXI2Pz5w5s1adISIiaghcbTDg8DkDREREdHNQXXSIiIjoZuda8wIAlAaqtLRUee2115TS0tIG115D7hvbazhtsb2G1V5D7hvbo/qmUZSGeWCkoKAAPj4+yM/Ph7e3rALgjWqvIfeN7XHbsr3G1ze2V/v2qHZ4zgAREZGL42CAiIjIxXEwQERE5OIa7GDAYDDgtddeg8Egu4vZjWyvIfeN7TWctthew2qvIfeN7VF9a7AnEBIREdGN0WBnBoiIiOjG4GCAiIjIxXEwQERE5OI4GCAiInJxHAwQERG5uAY5GEhKSkJoaCiMRiPCw8Oxfft2h9pJTExEr1694OXlhZYtW2LIkCE4fPiw0/qZmJgIjUaDSZMmOdzGmTNn8OSTT6JFixZo0qQJ7rjjDuzevduhtsrLy/G3v/0NoaGh8PDwQLt27TBjxgxYrVbRz//www94+OGHERQUBI1Gg88++8zmcUVR8PrrryMoKAgeHh7o378/Dh48qLots9mMqVOnolu3bvD09ERQUBDi4uJw9uxZh/v2v5577jloNBrMnz+/Vu0dOnQIgwYNgo+PD7y8vHD33XcjMzPTofauXLmC559/Hq1atYKHhwfCwsKwZMmSKtuS7LdqtoW99tRuD7WfK3vbQ9qedHtI2pNujyVLluD222+
Ht7c3vL290bt3b3z99dcVj6vZDvbac+RzYa9//0vyuZC0p+ZzYa89NZ8LqlsNbjCwZs0aTJo0CdOnT8fevXsRFRWF2NjYane2mnz//feYOHEifvrpJ6SkpKC8vBwxMTEoKiqqdT937dqFZcuW4fbbb3e4jUuXLiEyMhJubm74+uuvkZGRgbfffhu+vr4OtffWW29h6dKlWLRoEQ4dOoQ5c+bgn//8JxYuXCj6+aKiInTv3h2LFi2q8vE5c+Zg3rx5WLRoEXbt2oWAgADcd999KCwsVNVWcXEx9uzZg7///e/Ys2cP1q1bhyNHjmDQoEEO9+2azz77DDt37kRQUFCtXuvvv/+Ovn37olOnTvjuu++wf/9+/P3vf4fRaHSovZdeegmbNm3Cxx9/jEOHDuGll17CX//6V3z++eeVspL9Vs22sNee2u2h5nMl2R6S9tRsD0l70u3RqlUrzJ49G+np6UhPT8c999yDwYMHV/zCV7Md7LXnyOfCXv/UbAdJe2o/F/baU/O5oDpWjzdJqtKdd96pjBs3zmZdp06dlGnTptW67dzcXAWA8v3339eqncLCQuXWW29VUlJSlOjoaOXFF190qJ2pU6cqffv2rVVf/teDDz6ojB492mbd0KFDlSeffFJ1WwCU9evXV/zfarUqAQEByuzZsyvWlZaWKj4+PsrSpUtVtVWVn3/+WQGgnDp1SnXfrsnKylKCg4OVX3/9VQkJCVH+9a9/2W2ruvaGDx/u0PtWXXtdunRRZsyYYbOuZ8+eyt/+9je77V2/39ZmW1TVXlXUbI/q2nN0e1TVXm22R1Xt1WZ7NGvWTHn//fdrvR2ub68qarZDde05uh2qaq8226Gq9mqzHci5GtTMQFlZGXbv3o2YmBib9TExMUhNTa11+/n5+QCA5s2b16qdiRMn4sEHH8Sf/vSnWrWzYcMGRERE4C9/+QtatmyJHj164L333nO4vb59++Kbb77BkSNHAAD79+/Hjh078MADD9SqnwBw4sQJ5OTk2Gwbg8GA6Ohop20bjUbj8KyI1WrFU089hfj4eHTp0qVWfbFarfjqq69w2223YeDAgWjZsiXuuuuuGg9N2NO3b19s2LABZ86cgaIo2LZtG44cOYKBAwfa/dnr99vabgvJ50DN9qiqvdpsj+vbq+32qKp/jmwPi8WC1atXo6ioCL179671dri+ver6Lt0OVbVXm+1wfXu13Q5V9a82nwtysnoejNg4c+aMAkD58ccfbdbPnDlTue2222rVttVqVR5++OFa/yWenJysdO3aVSkpKVEURanVzIDBYFAMBoOSkJCg7NmzR1m6dKliNBqVjz76yKH2rFarMm3aNEWj0Sh6vV7RaDTKrFmzHGoL1/11++OPPyoAlDNnztjkxo4dq8TExKhq63olJSVKeHi48sQTTzjUN0VRlFmzZin33XefYrVaFUVRajUzkJ2drQBQmjRposybN0/Zu3evkpiYqGg0GuW7775zqH8mk0mJi4tTACh6vV5xd3dXVqxYYbetqvbb2mwLyedAzfaorj1Ht0dV7dVme1TXPzXb48CBA4qnp6ei0+kUHx8f5auvvlIUxfHtUF1715Nuh5rac2Q7VNeeo9uhpv45+rkg59PXw/jDLo1GY/N/RVEqrVPr+eefx4EDB7Bjxw6H2zh9+jRefPFFbNmypdpjZGpYrVZERERg1qxZAIAePXrg4MGDWLJkCeLi4lS3t2bNGnz88cdYtWoVunTpgn379mHSpEkICgrCyJEja91fwPnbxmw247HHHoPVakVSUpJDbezevRvvvPMO9uzZU+v9BEDFCZeDBw/GSy+9BAC44447kJqaiqVLlyI6Olp1mwsWLMBPP/2EDRs2ICQkBD/88AMmTJiAwMDAGmeYatpvHdkW9j4HardHVe3VZntU1V5ttkd1r1fN9ujYsSP27duHy5cvY+3atRg5ciS+//77isfVbofq2uvcuXNFRs12qK69kpISh7ZDde1dm51
Qux1qer2Ofi6oDtTvWMSWyWRSdDqdsm7dOpv1L7zwgtKvXz+H233++eeVVq1aKcePH69V/9avX68AUHQ6XcUCQNFoNIpOp1PKy8tVtdemTRtlzJgxNuuSkpKUoKAgh/rXqlUrZdGiRTbr3nzzTaVjx46q28J1f93+/vvvCgBlz549NrlBgwYpcXFxqtq6pqysTBkyZIhy++23KxcuXHC4b//6178qtsH/bhetVquEhISobs9kMil6vV558803bXJTpkxR+vTpo7q94uJixc3NTfnyyy9tcmPGjFEGDhxYbTvV7beObgt7nwO126O69hzdHtW15+j2qK49R7fHNffee6/y7LPP1uozUVV71zj6ubi+vdp+Lq5vr7afi+vbq+12IOdqUOcMuLu7Izw8HCkpKTbrU1JS0KdPH9XtKYqC559/HuvWrcO3336L0NDQWvXv3nvvxS+//IJ9+/ZVLBEREXjiiSewb98+6HQ6Ve1FRkZWuuTpyJEjCAkJcah/xcXF0GptN6lOpxNfWliT0NBQBAQE2GybsrIyfP/99w5tG7PZjGHDhuHo0aPYunUrWrRo4XDfnnrqKRw4cMBmuwQFBSE+Ph6bN29W3Z67uzt69erltG1jNpthNpvF28befqt2W0g+B2q2h7321G4Pe+2p3R722lO7Papq32QyOe0zca29a32r7efiWnvO+lxca89Zn4tr7dV2O5CT1dMgpFqrV69W3NzclH//+99KRkaGMmnSJMXT01M5efKk6rbGjx+v+Pj4KN99952SnZ1dsRQXFzutv7U5Z+Dnn39W9Hq9MnPmTOXo0aPKypUrlSZNmigff/yxQ+2NHDlSCQ4OVr788kvlxIkTyrp16xQ/Pz9lypQpop8vLCxU9u7dq+zdu1cBUHFc8NqZzLNnz1Z8fHyUdevWKb/88osyYsQIJTAwUCkoKFDVltlsVgYNGqS0atVK2bdvn822MZlMDvXtevaOjdprb926dYqbm5uybNky5ejRo8rChQsVnU6nbN++3aH2oqOjlS5duijbtm1Tjh8/rixfvlwxGo1KUlJSpbYk+62abWGvPbXbw5HPVU3bQ9Kemu0haU+6PRISEpQffvhBOXHihHLgwAHl1VdfVbRarbJlyxbV28Fee458Luz1T812kLSn9nNhrz01nwuqWw1uMKAoirJ48WIlJCREcXd3V3r27OnwpYAAqlyWL1/utL7WZjCgKIryxRdfKF27dlUMBoPSqVMnZdmyZQ63VVBQoLz44otKmzZtFKPRqLRr106ZPn16tV8k19u2bVuV79fIkSMVRbl6MtZrr72mBAQEKAaDQenXr5/yyy+/qG7rxIkT1W6bbdu2OdS369n70pO09+9//1vp0KGDYjQale7duyufffaZw+1lZ2cro0aNUoKCghSj0ah07NhRefvttytO7Ppfkv1Wzbaw157a7eHI56qm7SFtT7o9JO1Jt8fo0aMrvotuueUW5d5777X5RatmO9hrz5HPhb3+Xc/e50LSnprPhb321HwuqG5pFEVR1M0lEBER0c2kQZ0zQERERDceBwNEREQujoMBIiIiF8fBABERkYvjYICIiMjFcTBARETk4jgYICIicnEcDBAREbk4DgaIiIhcHAcDRERELo6DASIiIhf3/wAPwvNn9bZE8gAAAABJRU5ErkJggg==", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "import tensorflow as tf\n", "import tensorflow_hub as hub\n", "\n", "# change this to your own embedding directory\n", "embedding_dir = \"\"\n", "\n", "# load the embedding\n", "embed = hub.load(embedding_dir + \"universal-sentence-encoder_4\")\n", "\n", "df[\"embedding\"] = list(embed(df.OpportunityTitle))\n", "\n", "#make heatmap of similarities\n", "import seaborn as sns\n", "sns.heatmap(np.array(np.inner(df.embedding[:40].tolist(), \n", " df.embedding[:40].tolist())))\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "14. Isolate twenty of the entries, then write a function that will take one of these twenty entries and an integer $k$ as an input; the function will return the $k$ most similar entries in the rest of the dataset.\n", "\n", "*The function and its setup are given as follows:*" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Grants most similar to grant Establishment of the Edmund S. 
Muskie Graduate Internship Program\n" ] }, { "data": { "text/plain": [ "427 Internship Training Program Interior Museum Pr...\n", "3015 Career Discovery Internship Program\n", "7994 Mosaics in Science Internship Program\n", "7995 Mosaics in Science Internship Program\n", "8964 Buiness Plan Internship Program\n", "Name: OpportunityTitle, dtype: object" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# hold out the first twenty grants as queries and search the rest of the dataset\n", "test_set = df[:20]\n", "train_set = df[20:]\n", "\n", "def get_most_similar(text_df, point, k=5):\n", "    # similarity = inner product between each stored embedding and the query title's embedding\n", "    sentence = point.OpportunityTitle\n", "    sentence_sim = np.inner(list(text_df.embedding), embed([sentence])).ravel()\n", "    # positions of the k largest similarities, re-sorted so rows keep their original order\n", "    top_k = np.sort(np.argsort(sentence_sim)[::-1][:k])\n", "    return text_df.iloc[top_k].OpportunityTitle\n", "\n", "print(\"Grants most similar to grant\", test_set.iloc[0].OpportunityTitle)\n", "get_most_similar(train_set, test_set.iloc[0])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Generative AI and Language Models\n", "With the rise of generative AI and large language models such as the GPT family of models developed by OpenAI, it is easier than ever to give a model a string of text and have it classify that text into a predicted UN SDG. \n", "\n", "So how exactly do these models work? The mathematical theory behind them is highly complex, as it builds on years of research in AI, natural language processing, and machine learning. We mentioned generative text models briefly in [a previous section](sec2_transform_features.ipynb), where we noted that they are similar to probabilistic language models but are *generative* in the sense that they produce the next word in a sequence based on a highly complex model.\n", "\n", "We won't have you recreate any generative AI programs here. Instead, we will provide a quick guide to best practices for using them in the context of text classification. 
\n", "\n", "The guiding principle is to **be as specific as possible and narrow the desired task as much as you can.** Expect that you will sometimes get incorrect results, or ones that do not align with the task you intended. As you go, iterate on and refine the prompts so that they become more specific.\n", "\n", "Additionally, some advanced prompting techniques exist, including few-shot prompting and chain-of-thought prompting, which, respectively, provide the LLM with a few worked examples or guide it through a few reasoning steps.\n", "\n", "## Lab Exercises, Part 4: Generative AI and Language Models\n", "\n", "15. Use an LLM available online, such as ChatGPT, and ask it a simple, direct question about which UN SDGs it predicts some of the grants fall under. For example, \"For a grant with the title __________, what UN SDG aligns best with the grant?\" Do the classifications make sense?\n", "\n", "*Answers may vary but should make sense if an advanced LLM is used.*\n", "\n", "16. Use few-shot prompting to classify some other grants. Use your own previous classification models (or your own classifications) to supply the example shots in the prompt. For example, \"The grant titled _____________ falls under UN SDG _____. The grant titled __________ falls under which UN SDG?\" Compare the classifications from this step with the classifications from the previous step.\n", "\n", "*Answers may vary but should make sense if an advanced LLM is used. Prompts should also follow the format specified in this question.*\n", "\n", "17. (Bonus) Check out the following additional resources for more details on LLM prompting. Work through the code provided on each page and submit your resulting notebooks to answer this question. 
Extend the principles found in these resources to some of these grant titles as well.\n", "\n", "- [https://cookbook.openai.com/examples/multiclass_classification_for_transactions](https://cookbook.openai.com/examples/multiclass_classification_for_transactions): a resource from OpenAI that uses some of their own models to classify text documents.\n", "- [https://huggingface.co/docs/transformers/main/tasks/prompting](https://huggingface.co/docs/transformers/main/tasks/prompting): a resource from Hugging Face, whose package we used in this section, that covers general LLM prompting in more detail.\n", "\n", "*Answers should follow the code examples given on the websites.*" ] } ], "metadata": { "kernelspec": { "display_name": "base", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.16" } }, "nbformat": 4, "nbformat_minor": 2 }