{
"cells": [
{
"cell_type": "markdown",
"id": "0128ad0b",
"metadata": {},
"source": [
"# Data preparation\n",
"\n",
"* From **business understanding**, we know the task to be solved.\n",
"* In **data understanding**, we explored the data.\n",
"* Now we apply the data transformations that are necessary or useful for that task.\n",
"\n",
"## Outline\n",
"0. Summary of data understanding\n",
"1. Missing and invalid data\n",
"2. Feature extraction\n",
"3. Making different statistical units\n",
"4. Data transformation\n",
"\n",
"## Data and tasks\n",
"* Titanic2 (*titanic_train.csv*) - data preparation for an analysis of ticket fares\n",
"* Home Credit (*application_train.csv*) - segmentation of clients by family situation"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "2fb31a8e",
"metadata": {},
"outputs": [],
"source": [
"# setup\n",
"import pandas as pd\n",
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"import seaborn as sns\n",
"\n",
"sns.set_theme(style=\"ticks\", color_codes=True)"
]
},
{
"cell_type": "markdown",
"id": "465ee695",
"metadata": {},
"source": [
"## Part I. Titanic and ticket fares\n",
"### Summary of data understanding\n",
"Just a few facts from the exploration, as far as they matter for this exercise.\n",
"\n",
"Let's consider these columns only: *pclass*, *sex*, *age*, *ticket*, *fare*, *cabin*, *embarked*"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "2a75a9ce",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" passenger_id ticket pclass fare sex age cabin \\\n",
"0 1216 335432 3 7.7333 female NaN NaN \n",
"1 699 315089 3 8.6625 male 38.0 NaN \n",
"2 1267 345773 3 24.1500 female 30.0 NaN \n",
"3 449 29105 2 23.0000 female 54.0 NaN \n",
"4 576 28221 2 13.0000 male 40.0 NaN \n",
".. ... ... ... ... ... ... ... \n",
"845 158 680 1 50.0000 male 55.0 C39 \n",
"846 174 11771 1 29.7000 male 58.0 B37 \n",
"847 467 244367 2 26.0000 female 24.0 NaN \n",
"848 1112 SOTON/O.Q. 3101315 3 13.7750 female 3.0 NaN \n",
"849 425 250647 2 13.0000 male 52.0 NaN \n",
"\n",
" embarked \n",
"0 Q \n",
"1 S \n",
"2 S \n",
"3 S \n",
"4 S \n",
".. ... \n",
"845 S \n",
"846 C \n",
"847 S \n",
"848 S \n",
"849 S \n",
"\n",
"[850 rows x 8 columns]"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# data reading\n",
"df1 = pd.read_csv('titanic_train.csv')\n",
"df1 = df1[['passenger_id', 'ticket', 'pclass', 'fare', 'sex', 'age', 'cabin', 'embarked']]\n",
"df1"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "93fa11a8",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"passenger_id 0.000000\n",
"ticket 0.000000\n",
"pclass 0.000000\n",
"fare 0.001176\n",
"sex 0.000000\n",
"age 0.204706\n",
"cabin 0.775294\n",
"embarked 0.001176\n",
"dtype: float64\n"
]
}
],
"source": [
"# share of missing data (NaN, NULL) by columns\n",
"print(1 - df1.count()/len(df1))"
]
},
{
"cell_type": "markdown",
"id": "1003360b",
"metadata": {},
"source": [
"* *ticket*, *pclass* and *sex* are complete\n",
"* *fare* and *embarked* each lack a negligible share of values (about 0.1 %, one row out of 850)\n",
"* *age* (about 20 %) and *cabin* (about 78 %) have substantial shares of missing data"
]
},
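{
"cell_type": "markdown",
"id": "7c1e4a90",
"metadata": {},
"source": [
"The shares above can also be reported together with absolute counts, which makes statements such as *negligible* or *substantial* easier to check. The following is a minimal sketch, not part of the original analysis: the helper `missing_summary` and the `toy` frame are hypothetical names with made-up values, and `isna().mean()` is used as an equivalent of `1 - count()/len()`.\n",
"\n",
"```python\n",
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"def missing_summary(df):\n",
"    # count and share of missing values per column, largest share first\n",
"    summary = pd.DataFrame({\n",
"        'n_missing': df.isna().sum(),\n",
"        'share_missing': df.isna().mean(),\n",
"    })\n",
"    return summary.sort_values('share_missing', ascending=False)\n",
"\n",
"# toy data standing in for df1 (values are invented for illustration)\n",
"toy = pd.DataFrame({\n",
"    'fare': [7.25, np.nan, 13.0, 26.0],\n",
"    'age': [22.0, np.nan, np.nan, 35.0],\n",
"    'sex': ['male', 'female', 'female', 'male'],\n",
"})\n",
"print(missing_summary(toy))\n",
"```\n",
"\n",
"Calling `missing_summary(df1)` would give the same shares as the cell above, with the number of affected rows next to them."
]
},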
{
"cell_type": "code",
"execution_count": 4,
"id": "26a4f014",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"pclass\n",
"3 478\n",
"1 206\n",
"2 166\n",
"Name: count, dtype: int64\n",
"sex\n",
"male 551\n",
"female 299\n",
"Name: count, dtype: int64\n",
"embarked\n",
"S 589\n",
"C 176\n",
"Q 84\n",
"Name: count, dtype: int64\n",
"ticket\n",
"CA. 2343 10\n",
"1601 8\n",
"S.O.C. 14879 6\n",
"CA 2144 6\n",
"PC 17608 6\n",
"Name: count, dtype: int64\n",
"cabin\n",
"G6 4\n",
"D 4\n",
"B96 B98 4\n",
"C22 C26 4\n",
"B57 B59 B63 B66 4\n",
"Name: count, dtype: int64\n"
]
}
],
"source": [
"# invalid values in data?\n",
"# frequency tables of categorical columns\n",
"print(df1['pclass'].value_counts())\n",
"print(df1['sex'].value_counts())\n",
"print(df1['embarked'].value_counts())\n",
"# the most frequent values in string columns\n",
"print(df1['ticket'].value_counts().sort_values(ascending=False)[:5])\n",
"print(df1['cabin'].value_counts().sort_values(ascending=False)[:5])"
]
},
{
"cell_type": "markdown",
"id": "965f233b",
"metadata": {},
"source": [
"> String columns (*ticket*, *cabin*) have the expected frequencies -- no single value is suspiciously frequent.  \n",
"> Categorical columns seem to contain only valid values.\n",
"\n",
"Let's look at the numeric columns (*age*, *fare*)."
]
},
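{
"cell_type": "markdown",
"id": "b3d9f216",
"metadata": {},
"source": [
"Reading the frequency tables by eye works for a handful of categories; the same check can be made explicit. The sketch below is an illustration only: the `ALLOWED` sets are an assumption read off the frequency tables above, and `invalid_counts` and `toy` are hypothetical names, not part of the original notebook.\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# allowed values per column (assumed for this sketch)\n",
"ALLOWED = {\n",
"    'pclass': {1, 2, 3},\n",
"    'sex': {'male', 'female'},\n",
"    'embarked': {'S', 'C', 'Q'},\n",
"}\n",
"\n",
"def invalid_counts(df, allowed):\n",
"    # number of non-missing values outside the allowed set, per column\n",
"    counts = {}\n",
"    for col, values in allowed.items():\n",
"        present = df[col].dropna()\n",
"        counts[col] = int((~present.isin(values)).sum())\n",
"    return pd.Series(counts)\n",
"\n",
"# toy data with one invalid class and one misspelled sex (invented values)\n",
"toy = pd.DataFrame({\n",
"    'pclass': [1, 3, 4],\n",
"    'sex': ['male', 'female', 'Female'],\n",
"    'embarked': ['S', None, 'Q'],\n",
"})\n",
"print(invalid_counts(toy, ALLOWED))\n",
"```\n",
"\n",
"Missing values are dropped before the check, so they are not counted as invalid; they are handled separately."
]
},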
{
"cell_type": "code",
"execution_count": 5,
"id": "fdb1a868",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Fare: minimum= 0.0 ; maximum= 512.3292 ; median= 14.1083\n"
]
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAeQAAAHkCAYAAADvrlz5AAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjcuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8pXeV/AAAACXBIWXMAAA9hAAAPYQGoP6dpAAAsN0lEQVR4nO3df1TVdYL/8ddFll+LmBgC49RoEJKpYEFLJeLgMnbSmlhn1qPijD/GrAzTSj2lFY2pzYi/2FYZDYsZc3QU13XcqUmZZqs5HANOthWi4SK1CZI3hVR+iHy+f/jljjfAAC/cN97n4xzOkc/nc+99f95pTz4/uNdmWZYlAADgVl7uHgAAACDIAAAYgSADAGAAggwAgAEIMgAABiDIAAAYgCADAGAAggwAgAEI8rekpaUpLS3N3cMAAHgYb3cPwDSVlZXuHgIAwANxhAwAgAEIMgAABiDIAAAYgCADAGAAggwAgAEIMgAABiDIAAAYgCADAGAAggwAgAEIMgAABiDIAAAYgCADAGAAggwAgAEIMgAABiDIAAAYgCADAGAAggwAgAEIMgAABiDIAAAYwNvdA/Ak298q1emaOknSjf38NfW+aDePCABgCoLcg07X1KnKfsHdwwAAGIhT1gAAGIAgAwBgAIIMAIABCDIAAAYgyAAAGIAgAwBgAIIMAIABCDIAAAYgyAAAGMDtQbbb7Vq0aJESEhI0atQoPfzwwyorK3OsP3LkiNLS0hQbG6uxY8cqJyfH6fHNzc3KyspSYmKiYmJiNGvWLFVUVPT0bgAAcE3cHuRHH31UX3zxhbZs2aLdu3fLz89PM2bMUF1dnc6cOaOZM2dq8ODBysvLU3p6ujZs2KC8vDzH4zdu3KgdO3bopZde0s6dO2Wz2TRnzhw1Nja6ca8AAOgct76X9ZkzZ/T9739fjz76qG699VZJ0mOPPaYf//jH+uyzz1RQUCAfHx9lZGTI29tbERERqqio0JYtWzRp0iQ1NjZq69atWrRokZKSkiRJ69atU2Jiog4cOKAJEya4c/cAAOgwtx4h9+/fX2vXrnXE+PTp08rJyVFYWJgiIyNVVFSk+Ph4eXv//eeGhIQElZeXy263q7S0VOfPn1dCQoJjfVBQkIYNG6bCwsIe3x8AALrKmE97eu655/SHP/xBPj4+2rRpkwICAlRVVaWoqCin7QYOHChJOnnypKqqqiRJ4eHhrbaprKxs97XGjRvX7rrKyspWzwcAQHdz+zXkFj//+c+Vl5enBx98UPPmzdOnn36q+vp6+fj4OG3n6+srSWpoaFBd3eXPFm5rm4aGhp4ZOAAALmDMEXJkZKQkafny5Tp8+LC2bdsmPz+/VjdntYQ2ICBAfn5+kqTGxkbHn1u28ff3b/e18vPz2113taNnAAC6i1uPkO12u/bv369Lly45lnl5eSkiIkLV1dUKCwtTdXW102Navg8NDXWcWm5rm7CwsG4ePQAAruPWIFdXV+upp57SBx984Fh28eJFlZSUKCIiQvHx8SouLnYKdkFBgYYMGaIBAwYoOjpagYGBOnTokGN9bW2tSkpKFBcX16P7AgDAtXBrkKOjozV69Gi9+OKLKioq0rFjx7RkyRLV1tZqxowZmjRpks6dO6elS5eqrKxMe/bsUW5urubOnSvp8rXjtLQ0ZWZmKj8/X6WlpVq4cKHCwsKUkpLizl0DAKBT3HoN2Wazaf369VqzZo0WLFigb775RnFxcXrjjTf0ve99T5L06quvasWKFUpNTVVISIgWL16s1NRUx3PMnz9fTU1NWrZsmerr6xUfH6+cnJxWN3oBAGAym2VZlrsHYZKWm7quduNXV2Xt/FBV9guSpLABAZo/eZTLXwMA0DsZ82tPAAB4MoIMAIABCDIAAAYgyAAAGIAgAwBgAIIMAIABCDIAAAYgyAAAGIAgAwBgAIIMAIABCDIAAAYgyAAAGIAgAwBgAIIMAIABCDIAAAYgyAAAGIAgAw
BgAIIMAIABCDIAAAYgyAAAGIAgAwBgAIIMAIABCDIAAAYgyAAAGIAgAwBgAIIMAIABCDIAAAYgyAAAGIAgAwBgAIIMAIABCDIAAAYgyAAAGIAgAwBgAIIMAIABCDIAAAYgyAAAGIAgAwBgAIIMAIABCDIAAAYgyAAAGIAgAwBgAIIMAIABCDIAAAYgyAAAGIAgAwBgAIIMAIABCDIAAAYgyAAAGIAgAwBgAIIMAIABCDIAAAZwe5DPnj2r559/XmPGjNEdd9yhKVOmqKioyLH+mWee0dChQ52+xowZ41jf3NysrKwsJSYmKiYmRrNmzVJFRYU7dgUAgC7zdvcAnnzySdntdq1du1bBwcHavn27Zs+erT179igiIkJHjx7VI488orS0NMdj+vTp4/jzxo0btWPHDq1atUqhoaFavXq15syZo/3798vHx8cduwQAQKe59Qi5oqJCf/vb3/TCCy8oLi5Ot9xyi5YuXarQ0FDt379fly5dUllZmUaMGKGQkBDHV3BwsCSpsbFRW7duVXp6upKSkhQdHa1169bp1KlTOnDggDt3DQCATnFrkPv376/Nmzdr+PDhjmU2m02WZammpkYnTpxQQ0ODIiIi2nx8aWmpzp8/r4SEBMeyoKAgDRs2TIWFhd0+fgAAXMWtp6yDgoKUlJTktOzNN9/U559/rtGjR+vYsWOy2WzKzc3Vu+++Ky8vLyUlJWnBggXq27evqqqqJEnh4eFOzzFw4EBVVlb22H4AAHCt3H4N+UrFxcV69tlnNW7cOCUnJysrK0teXl4aNGiQsrOzVVFRoV/96lc6duyYcnNzVVdXJ0mtrhX7+vqqpqam3dcZN25cu+sqKytbBR4AgO5mTJAPHjyop59+WjExMVq7dq0kKT09XTNmzFBQUJAkKSoqSiEhIZo8ebI+/vhj+fn5Sbp8Lbnlz5LU0NAgf3//nt8JAAC6yIggb9u2TStWrFBKSooyMzMdR7w2m80R4xZRUVGSpKqqKseRbHV1tW6++WbHNtXV1YqOjm739fLz89tdd7WjZwAAuovbfw95+/btWr58uaZNm6b169c7nX5+6qmnNHv2bKftP/74Y0lSZGSkoqOjFRgYqEOHDjnW19bWqqSkRHFxcT2zAwAAuIBbj5DLy8u1cuVKpaSkaO7cubLb7Y51fn5+mjhxoh599FFt2rRJEyZMUHl5uX75y19q4sSJjjuv09LSlJmZqeDgYA0aNEirV69WWFiYUlJS3LVbAAB0mluD/Oc//1kXL17UgQMHWv3ecGpqql5++WVt2LBB2dnZys7OVt++ffXAAw9owYIFju3mz5+vpqYmLVu2TPX19YqPj1dOTg5vCgIA6FVslmVZ7h6ESVquIV/tOnNXZe38UFX2C5KksAEBmj95lMtfAwDQO7n9GjIAACDIAAAYgSADAGAAggwAgAEIMgAABiDIAAAYgCADAGAAggwAgAEIMgAABiDIAAAYgCADAGAAggwAgAEIMgAABiDIAAAYgCADAGAAggwAgAEIMgAABiDIAAAYgCADAGAAggwAgAEIMgAABiDIAAAYgCADAGAAggwAgAEIMgAABiDIAAAYgCADAGAAggwAgAEIMgAABiDIAAAYgCADAGAAggwAgAEIMgAABiDIAAAYgCADAGAAggwAgAEIMgAABiDIAAAYgCADAGAAggwAgAEIMgAABiDIAAAYgCADAGAAggwAgAEIMgAABiDIAAAYgCADAGAAggwAgAEIMgAABiDIAAAYgCADAGAAtwf57Nmzev755zVmzBjdcccdmjJlioqKihzrjxw5orS0NMXGxmrs2LHKyclxenxzc7OysrKUmJiomJgYzZo1SxUVFT29GwAAXBO3B/nJJ5/URx99pLVr12r37t26/fbbNXv2bB0/flxnzpzRzJkzNXjwYOXl5Sk9PV0bNmxQXl6e4/EbN27Ujh079NJLL2nnzp2y2WyaM2eOGhsb3bhXAAB0jrc7X7yiokJ/+9vf9P
vf/1533HGHJGnp0qV69913tX//fvn5+cnHx0cZGRny9vZWRESEKioqtGXLFk2aNEmNjY3aunWrFi1apKSkJEnSunXrlJiYqAMHDmjChAnu3D0AADrMrUfI/fv31+bNmzV8+HDHMpvNJsuyVFNTo6KiIsXHx8vb++8/NyQkJKi8vFx2u12lpaU6f/68EhISHOuDgoI0bNgwFRYW9ui+AABwLdwa5KCgICUlJcnHx8ex7M0339Tnn3+u0aNHq6qqSmFhYU6PGThwoCTp5MmTqqqqkiSFh4e32qaysrKbRw8AgOu49ZT1txUXF+vZZ5/VuHHjlJycrFWrVjnFWpJ8fX0lSQ0NDaqrq5OkNrepqalp93XGjRvX7rrKyspWgQcAoLu5/aauFgcPHtTs2bM1cuRIrV27VpLk5+fX6uashoYGSVJAQID8/Pwkqc1t/P39e2DUAAC4hhFHyNu2bdOKFSuUkpKizMxMxxFvWFiYqqurnbZt+T40NFRNTU2OZTfffLPTNtHR0e2+Xn5+frvrrnb0DABAd3H7EfL27du1fPlyTZs2TevXr3c6/RwfH6/i4mJdunTJsaygoEBDhgzRgAEDFB0drcDAQB06dMixvra2ViUlJYqLi+vR/QAA4Fq4Ncjl5eVauXKlUlJSNHfuXNntdn311Vf66quv9M0332jSpEk6d+6cli5dqrKyMu3Zs0e5ubmaO3eupMvXjtPS0pSZman8/HyVlpZq4cKFCgsLU0pKijt3DQCATnHrKes///nPunjxog4cOKADBw44rUtNTdXLL7+sV199VStWrFBqaqpCQkK0ePFipaamOrabP3++mpqatGzZMtXX1ys+Pl45OTmtbvQCAMBkNsuyLHcPwiQt15Cvdp25q7J2fqgq+wVJUtiAAM2fPMrlrwEA6J3cfg0ZAAAQZAAAjECQAQAwAEEGAMAABBkAAAMQZAAADECQAQAwAEEGAMAABBkAAAMQZAAADECQAQAwAEEGAMAABBkAAAMQZAAADECQAQAwAEEGAMAABBkAAAMQZAAADNClIBcWFur8+fNtrqutrdV//dd/XdOgAADwNF0K8s9+9jMdP368zXUlJSV65plnrmlQAAB4Gu+ObrhkyRJVVlZKkizLUkZGhgIDA1ttd+LECd14442uGyEAAB6gw0fI48ePl2VZsizLsazl+5YvLy8vxcbGatWqVd0yWAAArlcdPkJOTk5WcnKyJGn69OnKyMhQREREtw0MAABP0uEgX+l3v/udq8cBAIBH61KQ6+rqlJ2drXfeeUd1dXVqbm52Wm+z2XTw4EGXDBAAAE/QpSCvWLFCeXl5uuuuu3TbbbfJy4tfZwYA4Fp0Kchvv/22Fi5cqIcfftjV4wEAwCN16dC2qalJI0eOdPVYAADwWF0K8ujRo/Xuu++6eiwAAHisLp2yvv/++/XCCy/o66+/VkxMjPz9/Vtt89BDD13r2AAA8BhdCvKCBQskSXv37tXevXtbrbfZbAQZAIBO6FKQ8/PzXT0OAAA8WpeCPGjQIFePAwAAj9alIL/yyivfuc3jjz/elacGAMAjuTzIgYGBGjhwIEEGAKATuhTk0tLSVssuXLig4uJiZWRk6LnnnrvmgQEA4Elc9p6XAQEBSkxM1Lx58/TrX//aVU8LAIBHcPmbUIeHh+v48eOufloAAK5rXTpl3RbLslRZWaktW7ZwFzYAAJ3UpSBHR0fLZrO1uc6yLE5ZAwDQSV0K8rx589oMcmBgoMaOHavBgwdf67gAAPAoXQpyenq6q8cBAIBH6/I15MbGRu3Zs0eHDh1SbW2t+vfvr7i4OKWmpsrX19eVYwQA4LrXpSDX1tbqZz/7mUpLS/W9731PISEhKi8v1/79+/XGG29o+/bt6tu3r6vHCgDAdatLv/a0Zs0aVVVVadu2bfrLX/6inTt36i9/+Yu2bdsmu92uDRs2uHqcAABc17oU5Pz8fC1YsEBxcXFOy+Pi4jR//ny9/fbbLhkcAA
CeoktBPn/+vG666aY219100006e/bstYwJAACP06Ug33LLLXrnnXfaXJefn68f/OAH1zQoAAA8TZdu6po9e7aefPJJNTY26oEHHtCNN96o06dP649//KN27dqljIwMFw8TAIDrW5eCfP/99+vEiRPKzs7Wrl27HMv/4R/+QfPmzdPkyZNdNkAAADxBl4J84cIFPfbYY0pLS9Phw4dVU1OjyspKTZ48Wf369XP1GAEAuO516hrykSNH9NBDD+n111+XJAUFBWnMmDEaM2aM1q9fr6lTp/JJTwAAdEGHg/zFF19oxowZqqmpUWRkpNM6Hx8fPfvsszp//rymTp2qqqqqLg1m48aNmj59utOyZ555RkOHDnX6GjNmjGN9c3OzsrKylJiYqJiYGM2aNUsVFRVden0AANylw0HevHmz+vfvr//4j//Qj370I6d1/v7+SktLU15engICApSdnd3pgbz++uvKyspqtfzo0aN65JFH9P777zu+9u7d61i/ceNG7dixQy+99JJ27twpm82mOXPmqLGxsdNjAADAXToc5IKCAv3iF7/QDTfc0O42AwYM0MyZM1VQUNDhAZw6dUq/+MUvtGHDBg0ZMsRp3aVLl1RWVqYRI0YoJCTE8RUcHCzp8vtpb926Venp6UpKSlJ0dLTWrVunU6dO6cCBAx0eAwAA7tbhIH/11Vcd+v3iqKioTp2y/vTTT9WvXz/t27dPMTExTutOnDihhoYGRUREtPnY0tJSnT9/XgkJCY5lQUFBGjZsmAoLCzs8BgAA3K3Dd1kHBwerurr6O7f7+uuvr3oU/W3JyclKTk5uc92xY8dks9mUm5urd999V15eXkpKStKCBQvUt29fR/jDw8OdHjdw4EBVVla2+5rjxo1rd11lZWWr5wMAoLt1+Ag5Pj5ee/bs+c7t9u7dq9tuu+2aBtXis88+k5eXlwYNGqTs7GwtWbJE//3f/63HHntMzc3Nqqurk3T5prIr+fr6qqGhwSVjAACgJ3T4CHn69OmaMmWKXn75ZS1cuLDVZx43NjZq3bp1eu+997R582aXDC49PV0zZsxQUFCQpMunw0NCQjR58mR9/PHH8vPzc7x2y58lqaGhQf7+/u0+b35+frvrrnb0DABAd+lwkEeMGKFnnnlGK1eu1H/+53/q7rvv1ve//31dunRJJ0+e1KFDh3TmzBk98cQTSkxMdMngbDabI8YtoqKiJElVVVWOU8vV1dW6+eabHdtUV1crOjraJWMAAKAndOqduqZNm6bo6Gjl5OQoPz/fcVr4H//xHzV69GjNmjWr1Y1Z1+Kpp57S2bNnlZOT41j28ccfS5IiIyN10003KTAwUIcOHXIEuba2ViUlJUpLS3PZOAAA6G6dfuvMO++8U3feeack6cyZM/Ly8uq2t8ucOHGiHn30UW3atEkTJkxQeXm5fvnLX2rixImOO6/T0tKUmZmp4OBgDRo0SKtXr1ZYWJhSUlK6ZUwAAHSHLr2XdYv+/fu7ahxt+uEPf6gNGzYoOztb2dnZ6tu3rx544AEtWLDAsc38+fPV1NSkZcuWqb6+XvHx8crJyWl1oxcAACazWZZluXsQJmm5qetqN351VdbOD1VlvyBJChsQoPmTR7n8NQAAvVOnPlwCAAB0D4IMAIABCDIAAAYgyAAAGIAgAwBgAIIMAIABCDIAAAYgyAAAGIAgAwBgAIIMAIABCDIAAAYgyAAAGIAgAwBgAIIMAIABCDIAAAYgyAAAGIAgAwBgAIIMAIABCDIAAAYgyAAAGIAgAwBgAIIMAIABCDIAAAYgyAAAGIAgAwBgAIIMAIABCDIAAAYgyAAAGIAgAwBgAIIMAIABCDIAAAYgyAAAGIAgAwBgAIIMAIABCDIAAAYgyG7iZXP3CAAAJvF29wA8Vf8gP21/q1Sna+p0Yz9/Tb0v2t1DAgC4EUF2o9M1daqyX3D3MAAABuCUNQAABiDIAAAYgCADAGAAggwAgAEIMgAABiDIAAAYgC
ADAGAAggwAgAEIMgAABiDIAAAYgCADAGAAggwAgAEIMgAABjAqyBs3btT06dOdlh05ckRpaWmKjY3V2LFjlZOT47S+ublZWVlZSkxMVExMjGbNmqWKioqeHDYAANfMmCC//vrrysrKclp25swZzZw5U4MHD1ZeXp7S09O1YcMG5eXlObbZuHGjduzYoZdeekk7d+6UzWbTnDlz1NjY2NO7AABAl7n985BPnTqlpUuXqri4WEOGDHFa94c//EE+Pj7KyMiQt7e3IiIiVFFRoS1btmjSpElqbGzU1q1btWjRIiUlJUmS1q1bp8TERB04cEATJkxwxy4BANBpbj9C/vTTT9WvXz/t27dPMTExTuuKiooUHx8vb++//9yQkJCg8vJy2e12lZaW6vz580pISHCsDwoK0rBhw1RYWNhj+wAAwLVy+xFycnKykpOT21xXVVWlqKgop2UDBw6UJJ08eVJVVVWSpPDw8FbbVFZWdsNoAQDoHm4P8tXU19fLx8fHaZmvr68kqaGhQXV1dZLU5jY1NTXtPu+4cePaXVdZWdkq8AAAdDe3n7K+Gj8/v1Y3ZzU0NEiSAgIC5OfnJ0ltbuPv798zgwQAwAWMPkIOCwtTdXW107KW70NDQ9XU1ORYdvPNNzttEx0d3e7z5ufnt7vuakfPAAB0F6OPkOPj41VcXKxLly45lhUUFGjIkCEaMGCAoqOjFRgYqEOHDjnW19bWqqSkRHFxce4YMgAAXWJ0kCdNmqRz585p6dKlKisr0549e5Sbm6u5c+dKunztOC0tTZmZmcrPz1dpaakWLlyosLAwpaSkuHn0AAB0nNGnrAcMGKBXX31VK1asUGpqqkJCQrR48WKlpqY6tpk/f76ampq0bNky1dfXKz4+Xjk5Oa1u9AIAwGQ2y7Isdw/CJC3XkK92nbmrsnZ+qCr7BUnS7bcEy15Tryr7BYUNCND8yaNc/noAgN7D6FPWAAB4CoIMAIABCDIAAAYgyAAAGIAgAwBgAIIMAIABCDIAAAYgyAAAGIAgAwBgAIIMAIABCDIAAAYgyAAAGIAgAwBgAIIMAIABjP48ZFy2/a1Sna6pkyTd2M9fU++LdvOIAACuRpB7gdM1dY7PUQYAXJ84ZQ0AgAEIMgAABiDIAAAYgCADAGAAggwAgAEIMgAABiDIAAAYgCADAGAAggwAgAEIMgAABiDIAAAYgCADAGAAggwAgAEIMgAABiDIAAAYgCADAGAAggwAgAEIMgAABiDIAAAYgCADAGAAggwAgAEIMgAABiDIAAAYgCADAGAAggwAgAEIMgAABiDIAAAYgCADAGAAggwAgAEIMgAABiDIAAAYgCADAGAAggwAgAEIMgAABiDIAAAYoFcE+csvv9TQoUNbfe3atUuSdOTIEaWlpSk2NlZjx45VTk6Om0cMAEDneLt7AB1x9OhR+fr66uDBg7LZbI7lffv21ZkzZzRz5kz98z//s1588UUdPnxYL774om644QZNmjTJjaMGAKDjekWQjx07piFDhmjgwIGt1uXm5srHx0cZGRny9vZWRESEKioqtGXLFoIMAOg1esUp66NHjyoyMrLNdUVFRYqPj5e3999/tkhISFB5ebnsdntPDREAgGvSK4J87Ngx2e12TZ06Vffcc4+mTJmi9957T5JUVVWlsLAwp+1bjqRPnjzZ42PtCi/bd28DALi+GX/KurGxUSdOnJC/v78WL16sgIAA7du3T3PmzNFrr72m+vp6+fj4OD3G19dXktTQ0NDmc44bN67d16usrFR4eLjrdqAD+gf5aftbpTpdUydJurGfv6beF92jYwAAuJfxQfbx8VFhYaG8vb0d4R0+fLiOHz+unJwc+fn5qbGx0ekxLSEOCAjo8fF21emaOlXZL7h7GAAANzE+yFLbYY2KitL777+vsLAwVVdXO61r+T40NLTN58vPz2/3ta529AwAQHcx/hpyaWmpRo0apaKiIqfln3zyiSIjIxUfH6/i4mJdun
TJsa6goEBDhgzRgAEDenq4AAB0ifFBjoqK0q233qoXX3xRRUVFOn78uFatWqXDhw/rkUce0aRJk3Tu3DktXbpUZWVl2rNnj3JzczV37lx3Dx0AgA4z/pS1l5eXsrOzlZmZqQULFqi2tlbDhg3Ta6+9pqFDh0qSXn31Va1YsUKpqakKCQnR4sWLlZqa6uaRdx13XQOA5zE+yJIUHByslStXtrt+5MiR2rlzZw+OqHtdedd15KAb3D0cAEAPMP6Utadquev6zLl6dw8FANADCDIAAAYgyAAAGIAgAwBgAIIMAIABCDIAAAYgyAAAGIAgAwBgAIIMAIABCDIAAAYgyAAAGIAgAwBgAIIMAIABCDIAAAYgyAAAGIAgAwBgAILcy3jZ3D0CAEB38Hb3ANA5/YP8tP2tUp2uqdON/fw19b5odw8JAOACBLkXOl1Tpyr7BXcPAwDgQpyyBgDAAAQZAAADEGQAAAxAkAEAMABBBgDAAAQZAAADEGQAAAxAkAEAMABBBgDAAAQZAAADEGQAAAxAkAEAMABBBgDAAHzak4dq+QhHSXyMIwAYgCB7KD7CEQDMwilrAAAMQJABADAAQQYAwABcQ/YgLTdyRQ664ZqfQ+JmMABwJYLsQVpu5BrQz69Tj7sy5NwMBgDdg1PW+E4tET5zrt7dQwGA6xZBRpd52dw9AgC4fnDKupu54rqtqfoH+Tn2j+vJAHBtCHI36+p1296Ca8oA4BqcsgYAwAAEGQAAA3DK+jrTck035AZ/TRnfc9d0ucELAK4NQe7F2orgldesr3wTj6vdVOaKmF55g5fEm4Z4Ev67A65BkHuxb0fw29G98oarq91U5qq7pbnByzPx3x1wDYLcy3U0up15Hk/WW472ess4AXQcQQau0Ft+MOkt4wTQcQQZ3cLdN3lxBAmgtyHIcOKqkF55Xbqn7/iWOIIE0PtcF0Fubm7WK6+8ol27dqm2tlZ33nmnXnjhBf3gBz9w99B6ne+6Uawz2rvj+1qOWHmrzs5jzoDe4boI8saNG7Vjxw6tWrVKoaGhWr16tebMmaP9+/fLx8fH3cPrdVx1o1h7z/lt3/4B4Oy5Bsf7f7f8uWWdO498e8tp8G/PJ2cLgI5x97/xXh/kxsZGbd26VYsWLVJSUpIkad26dUpMTNSBAwc0YcIEN48Q3+XbPwDYa+odR9Ytf25Z5069JWzd8QMV4Anc/W+81791Zmlpqc6fP6+EhATHsqCgIA0bNkyFhYVuHBkAAB1nsyzLcvcgrsXbb7+t9PR0ffTRR/Lz+/vRwBNPPKH6+nr95je/afWYcePGtft8//d//6c+ffooPDzcJeM7V3dRzc2WvPt4qdmy1Nx8ebqv/N7T1nl52RTo/w+t5qgzz/nt52hv3tt6vW+ra2hSc7OlPn281HSpuUPj/K7n7G5XG0t789nZMbfMi5eXTf6+7Z9MM2legGvRXX+Xw8PDtW3btu/crtefsq6ru3y+/9vXin19fVVTU9Pp57PZbPL2ds20VFZWSpLL4n49uXJuuuN/4J15TufY9HHJc16r7/q7c7WxuGqcV4twd7xeZ/Bv6+qYn6trb37c/cNkrw9yy1FxY2Oj0xFyQ0OD/P3923xMfn5+j4yt5Ui8p16vN2Furo75uTrm5+qYn6szdX56/TXklp9wqqurnZZXV1crLCzMHUMCAKDTen2Qo6OjFRgYqEOHDjmW1dbWqqSkRHFxcW4cGQAAHdfrT1n7+PgoLS1NmZmZCg4O1qBBg7R69WqFhYUpJSXF3cMDAKBDen2QJWn+/PlqamrSsmXLVF9fr/j4eOXk5PCmIACAXuO6CHKfPn20aNEiLVq0yN1DAQCgS3r9NWQAAK4Hvf6NQQAAuB5whAwAgAEIMgAABiDIAAAYgCADAGAAgtwNmpublZWVpcTERMXExGjWrFmqqKhw97B63MaNGzV9+nSnZUeOHFFaWppiY2M1duxY5eTkOK2/3ufu7Nmzev755zVmzBjdcc
cdmjJlioqKihzrPX1+7Ha7Fi1apISEBI0aNUoPP/ywysrKHOs9fX5alJeXa9SoUdqzZ49jGXMjffnllxo6dGirr127dknqBXNkweX+7d/+zbr77rutv/71r9aRI0esWbNmWSkpKVZDQ4O7h9ZjXnvtNWvo0KFWWlqaY9nXX39t/dM//ZO1dOlSq6yszNq9e7c1YsQIa/fu3Y5trve5mzlzpvXggw9ahYWF1vHjx63ly5dbI0eOtMrKypgfy7J++tOfWpMnT7b+53/+xyorK7PS09Ote++917pw4QLz8/81NjZa//Iv/2JFRUVZeXl5lmXxb6tFfn6+NWLECOvUqVNWdXW146uurq5XzBFBdrGGhgZr1KhR1vbt2x3LampqrJEjR1r79+9348h6RlVVlTV79mwrNjbWuu+++5yCnJ2dbSUmJloXL150LFuzZo01fvx4y7Ku/7k7ceKEFRUVZRUXFzuWNTc3WykpKdb69es9fn6+/vpra+HChdaxY8ccy44cOWJFRUVZH330kcfPT4s1a9ZY06dPdwoyc3PZpk2brAcffLDNdb1hjjhl7WKlpaU6f/68EhISHMuCgoI0bNgwFRYWunFkPePTTz9Vv379tG/fPsXExDitKyoqUnx8vNPnTSckJKi8vFx2u/26n7v+/ftr8+bNGj58uGOZzWaTZVmqqalhfvr319q1a3XrrbdKkk6fPq2cnByFhYUpMjLS4+dHkgoLC7Vz50796le/clrO3Fx29OhRRUZGtrmuN8wRQXaxqqoqSa0/+HrgwIGOD8W+niUnJ2vNmjW66aabWq2rqqpq9ZGYAwcOlCSdPHnyup+7oKAgJSUlOb3H+ptvvqnPP/9co0eP9vj5udJzzz2ne++9V2+99ZZWrFihgIAAj5+f2tpaLV68WMuWLWu1j54+Ny2OHTsmu92uqVOn6p577tGUKVP03nvvSeodc0SQXayurk6SWn2wha+vrxoaGtwxJGPU19e3OS+S1NDQ4HFzV1xcrGeffVbjxo1TcnIy83OFn//858rLy9ODDz6oefPm6dNPP/X4+cnIyFBsbKweeOCBVus8fW4kqbGxUSdOnNC5c+e0YMECbd68WSNGjNCcOXNUUFDQK+bouvhwCZP4+flJuvyXo+XP0uX/4P7+/u4alhH8/PzU2NjotKzlL3pAQIBHzd3Bgwf19NNPKyYmRmvXrpXE/Fyp5bTj8uXLdfjwYW3bts2j52fv3r0qKirSH//4xzbXe/LctPDx8VFhYaG8vb0dUR0+fLiOHz+unJycXjFHHCG7WMvpjurqaqfl1dXVrU6XeJqwsLA250WSQkNDPWbutm3bpvT0dI0ZM0Zbtmxx/OP39Pmx2+3av3+/Ll265Fjm5eWliIgIxz566vzk5eXJbrdr7NixGjVqlEaNGiVJeuGFFzRhwgSPnpsrBQQEtDrCjYqK0qlTp3rFHBFkF4uOjlZgYKAOHTrkWFZbW6uSkhLFxcW5cWTuFx8fr+LiYqf/4RYUFGjIkCEaMGCAR8zd9u3btXz5ck2bNk3r1693+p+Hp89PdXW1nnrqKX3wwQeOZRcvXlRJSYkiIiI8en4yMzP1pz/9SXv37nV8SZc/C37z5s0ePTctSktLNWrUKKff65ekTz75RJGRkb1jjnrkXm4Ps3btWuuuu+6yDh486Phdth/96EfX1e/7dcSSJUucfu3p9OnTVnx8vLVkyRLrs88+s/Ly8qwRI0ZYe/bscWxzPc/d//7v/1q33367NW/ePKffkayurrZqa2s9fn6am5utWbNmWePHj7cKCwuto0ePWgsXLrTi4+OtL7/80uPn59uu/LUn5sayLl26ZP30pz+1Jk6caBUWFlplZWXWypUrreHDh1ulpaW9Yo4Icjdoamqyfv3rX1sJCQlWbGysNWfOHOuLL75w97B63LeDbFmW9dFHH1n/+q//ag0fPtz64Q9/aP3ud79zWn89z92mTZusqKioNr+WLFliWZZnz49lWV
Ztba31wgsvWPfee681cuRIa9asWU6/l+zp83OlK4NsWcyNZVmW3W63nnnmGevee++1RowYYU2ePNkqLCx0rDd9jvg8ZAAADMA1ZAAADECQAQAwAEEGAMAABBkAAAMQZAAADECQAQAwAEEGAMAABBkAAAMQZACqqqpSWlqaRowYobvvvtvxUXQAeg4fvwhAubm5+vDDD7V69WqFhoZeNx/JB/QmBBmAzp49q4EDB+r+++9391AAj8V7WQMeLjk5WV9++aXj+8cff1wpKSl65ZVXVFRUpG+++UbBwcEaP368nn76acfnNw8dOlTp6el65513dOLECc2ePVuPPfaYTp48qczMTL3//vtqaGhQbGyslixZomHDhrlrF4FegSADHq6kpETr169XSUmJXnnlFYWGhmrixImKjY3V9OnT5ePjo7/+9a/Kzc3VwoUL9cgjj0i6HGRvb2898cQTGjp0qMLCwhQSEqKHHnpI/v7+evzxx+Xv76/c3Fx98skn2r17tyIiIty8t4C5OGUNeLhhw4YpODhYPj4+io2N1fvvv6/bbrtNGzZsUGBgoCTpnnvuUUFBgQoLCx1BlqSRI0fq4Ycfdny/bt06nT17Vr///e81aNAgSdKYMWN0//33a8OGDcrKyurZnQN6EYIMwMno0aM1evRoXbx4UeXl5Tpx4oSOHj2qr7/+WjfccIPTtlFRUU7fFxQU6LbbblNoaKiampokSV5eXhozZoz27dvXU7sA9EoEGYCT5uZmrV27Vm+88YYuXLig8PBwjRw5Ur6+vq22vfHGG52+P3v2rCoqKnT77be3+dx1dXXcwQ20gyADcLJ582a9/vrrysjI0Pjx49W3b19J0k9+8pPvfGzfvn111113afHixW2u9/HxcelYgesJbwwCwElxcbEiIyP1k5/8xBHjU6dO6dixY2pubr7qY++66y6Vl5dryJAhGjFihONr37592rVrl/r06dMTuwD0SgQZgJORI0fq6NGj2rx5sz744APt2rVL06ZNU2Nj43e+g9eMGTPU3NysGTNm6E9/+pMKCgr03HPP6be//a1uueWWHtoDoHfilDUAJ3PnztWZM2f029/+Vv/+7/+u8PBw/fjHP5bNZtNvfvMb1dTUqF+/fm0+NjQ0VDt27NCaNWuUkZGhhoYGDR48WCtWrOjQKW/Ak/F7yAAAGIBT1gAAGIAgAwBgAIIMAIABCDIAAAYgyAAAGIAgAwBgAIIMAIABCDIAAAYgyAAAGIAgAwBgAIIMAIABCDIAAAb4f7yjjtuRSFpdAAAAAElFTkSuQmCC",
"text/plain": [
"[figure: histogram of fare]"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# distribution of values in numeric columns\n",
"sns.displot(df1['fare'])\n",
"print('Fare: minimum=', df1['fare'].min(), '; maximum=', df1['fare'].max(), '; median=', df1['fare'].median())"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "bbfffb36",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Passengers with zero fare: 11\n",
"The most often fares: \n",
"fare\n",
"13.0000 42\n",
"8.0500 40\n",
"7.7500 39\n",
"7.8958 32\n",
"26.0000 29\n",
"Name: count, dtype: int64\n"
]
}
],
"source": [
"# a zero fare is rather unexpected; how many passengers have a zero fare?\n",
"print('Passengers with zero fare: ', (df1['fare']==0).sum())\n",
"print('The most often fares: ')\n",
"print(df1['fare'].value_counts().sort_values(ascending=False).iloc[0:5])"
]
},
{
"cell_type": "markdown",
"id": "e8891a3e",
"metadata": {},
"source": [
"> Fare values seem to be valid, with the exception of zero and missing values."
]
},
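{
"cell_type": "markdown",
"id": "e5a08c3d",
"metadata": {},
"source": [
"One possible way to handle the suspicious zeros is to mark them as missing, so that they are treated together with the genuinely missing fares later. Whether a zero fare really is invalid is a judgment call; the sketch below simply assumes it is. The helper `zero_fare_to_nan` and the `toy` frame are hypothetical and use invented values.\n",
"\n",
"```python\n",
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"def zero_fare_to_nan(df, col='fare'):\n",
"    # return a copy in which zero values of `col` are replaced by NaN\n",
"    out = df.copy()\n",
"    out[col] = out[col].mask(out[col] == 0)\n",
"    return out\n",
"\n",
"# toy data: two zero fares and one fare that is already missing\n",
"toy = pd.DataFrame({'fare': [0.0, 7.75, np.nan, 13.0, 0.0]})\n",
"cleaned = zero_fare_to_nan(toy)\n",
"print(cleaned['fare'].isna().sum())\n",
"```\n",
"\n",
"Working on a copy keeps the original frame intact, so the zero-fare rows can still be inspected afterwards."
]
},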
{
"cell_type": "code",
"execution_count": 7,
"id": "f96cdf2b",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Age: minimum= 0.1667 ; maximum= 80.0 ; median= 28.0\n",
"The most often ages:\n",
"age\n",
"18.0 32\n",
"30.0 30\n",
"24.0 29\n",
"22.0 28\n",
"25.0 26\n",
"Name: count, dtype: int64\n"
]
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAeQAAAHkCAYAAADvrlz5AAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjcuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8pXeV/AAAACXBIWXMAAA9hAAAPYQGoP6dpAAAn70lEQVR4nO3de1SVdb7H8Q+IKOiYjOGlJi/pkDdujjh0vESyxBaWaxhP40kxQyeZowdHO6Ok5MhaZZeZUMkyR8X0ZE2M4dDkTE1KtnI6jKGWrckLBwMcBQIRZQQFhd/5wyW1E0txw/MD3q+1WCt+z96P34073zzPvnkYY4wAAICjPJ0eAAAAEGQAAKxAkAEAsABBBgDAAgQZAAALEGQAACxAkAEAsABBBgDAAu0uyLGxsYqNjXV6DAAAXHg5PUBLKy4udnoEAACu0u6OkAEAsBFBBgDAAgQZAAALEGQAACxAkAEAsABBBgDAAgQZAAALEGQAACxAkAEAsABBBgDAAgQZAAALEGQAACxAkAEAsABBBgDAAgQZAAALEGQAACxAkAEAsABBBgDAAgQZAAALEGTgBtXXG6v2A6Bt8HJ6AKC18fT0UPrOXJVVVDd5H/5+vpo6IcCNUwFo7Qgy0ARlFdUqOlXl9BgA2hBOWQMAYAGCDACABQgyAAAWIMgAAFiAIAMAYAGCDACABQgyAAAWIMgAAFiAIAMAYAGCDACABQgyAAAWIMgAAFiAIAMAYAGCDACABQgyAAAWIMgAAFiAIAMAYAGCDACABQgyAAAWIMgAAFiAIAMAYAGCDACABQgyAAAWIMgAAFiAIAMAYAGCDACABQgyAAAWIMgAAFiAIAMAYAGCDACABQgyAAAWIMgAAFiAIAMAYAGCDACABQgyAAAWIMhoN+rrjdMjAMA1eTk9ANBSPD09lL4zV2UV1U3eR0BfP0WF93PjVABwGUFGu1JWUa2iU1VNvr5/dx83TgMAX+GUNQAAFiDIAABYgCADAGABggwAgAUIMgAAFiDIAABYgCADAGABggwAgAUIMgAAFiDIAABYgCADAGABggwAgAUcD/LFixe1atUqRUREKDQ0VNOmTdOBAwcath8+fFixsbEKCQlRRESE0tLSHJwWAIDm4XiQX375ZWVkZOipp55SZmam7rzzTj366KP68ssvVVFRobi4OPXv318ZGRlKSEhQamqqMjIynB4bAAC3cvzjF7OysnT//fdrzJgxkqTHH39c27Zt06effqqCggJ5e3srOTlZXl5eGjhwoAoLC7VhwwZNmTLF4ckBAHAfx4+Qu3fvrt27d+vEiROqq6tTenq6vL29NWTIEO3bt09hYWHy8vrq94bw8HDl5+ervLzcwakBAHAvx4+Qk5KStHDhQkVGRqpDhw7y9PRUamqq+vbtq5KSEgUEBLhcvmfPnpKkoqIi9ejRw4mRAQBwO8eDfOzYMXXr1k0vvfSSevXqpW3btikxMVFbt27VhQsX5O3t7XL5Tp06SZJqamquuc/IyMhrbisuLlafPn3cMzwAAG7iaJBPnjypRYsWafPmzRo5cqQkKTAwUHl5eVqzZo06d+6s2tpal+tcCbGvr2+LzwsAQHNxNMifffaZLl68qMDAQJf14OBgffjhh7rttttUWlrqsu3K97169brmfrOysq657duOngEAcIqjT+q6cur46NGjLuu5ubnq16+fwsLCtH//ftXV1TVsy87O1oABA3j8GADQpjga5KCgII0cOVKJiYn6+9//roKCAq1evVrZ2dmaM2eOpkyZonPnzikpKUl5eXnavn27tmzZovj4eCfHBgDA7Rw9Ze3p6am1a9dq9erVWrJkic6ePauAgABt3rxZISEhkqSNGzdqxYoViomJkb+/vxYvXqyYmBgnxwYAwO0cf5b1LbfcouXLl2v58uWNbg8KClJ6enoLTwUAQMty/I1BAAAAQQYAwAoEGQAACxBkAAAsQJABALAAQQYAwAIEGQ
AACxBkAAAsQJABALAAQQYAwAIEGQAACxBkAAAsQJABALAAQQYAwAIEGQAACxBkNJv6emPVfgDAZl5OD4C2y9PTQ+k7c1VWUd3kffj7+WrqhAA3TgUAdiLIaFZlFdUqOlXl9BgAYD1OWQMAYAGCDACABQgyAAAWIMgAAFiAIAMAYAGCDACABQgyAAAWIMgAAFiAIAMAYAGCDACABQgyAAAWIMgAAFiAIAMAYAGCDACABQgyrNbVt6Pq643TYwBAs+PzkGE1H28veXp6KH1nrsoqqpu8n4C+fooK7+fGyQDAvQgyWoWyimoVnapq8vX9u/u4cRoAcD9OWQMAYAGCDACABQgyAAAWIMgAAFiAIAMAYAGCDACABQgyAAAWIMgAAFiAIAMAYAGCDACABQgyAAAWIMgAAFiAIAMAYAGCDACABQgyAAAWIMgAAFiAIAMAYAGCDACABQgyAAAWIMgAAFiAIAMAYAGCDACABQgyAAAWIMgAAFiAIAMAYAGCDACABQgyAAAWIMgAAFiAIAMAYAGCDACABQgyAAAWIMgAAFiAIAMAYAGCDACABQgyAAAWIMgAAFiAIAMAYAGCDACABawIcmZmpqKjoxUYGKhJkybpnXfeadh2+PBhxcbGKiQkRBEREUpLS3NwUgAAmofjQX7rrbe0dOlSTZ06VTt27FB0dLQee+wxffLJJ6qoqFBcXJz69++vjIwMJSQkKDU1VRkZGU6PDQCAW3k5+YcbY5SamqqZM2dq5syZkqR58+bpwIED+vjjj/Xxxx/L29tbycnJ8vLy0sCBA1VYWKgNGzZoypQpTo4OAIBbOXqE/MUXX+jkyZN64IEHXNbT0tIUHx+vffv2KSwsTF5eX/3eEB4ervz8fJWXl7f0uAAANBtHj5ALCgokSdXV1Zo9e7YOHTqkH/zgB/rP//xPjR8/XiUlJQoICHC5Ts+ePSVJRUVF6tGjR6P7jYyMvOafWVxcrD59+rjnBgAA4CaOHiGfO3dOkpSYmKj7779fmzZt0ujRozV37lxlZ2frwoUL8vb2drlOp06dJEk1NTUtPi8AAM3F0SPkjh07SpJmz56tmJgYSdKQIUN06NAhvfLKK+rcubNqa2tdrnMlxL6+vtfcb1ZW1jW3fdvRMwAATnH0CLl3796SdNVp6UGDBunEiRPq3bu3SktLXbZd+b5Xr14tMyQAAC3A0SAPHTpUXbp00cGDB13Wc3Nz1bdvX4WFhWn//v2qq6tr2Jadna0BAwZc8/FjAABaI0eD3LlzZ/385z/XSy+9pB07duj48eN6+eWX9dFHHykuLk5TpkzRuXPnlJSUpLy8PG3fvl1btmxRfHy8k2MDAOB2jj6GLElz586Vj4+PVq1apS+//FIDBw7UmjVr9OMf/1iStHHjRq1YsUIxMTHy9/fX4sWLGx5vBgCgrXA8yJIUFxenuLi4RrcFBQUpPT29hScCAKBlOf7WmQAAgCADAGAFggwAgAUIMgAAFiDIAABYgCADAGABggwAgAUIMgAAFiDIAABYgCADAGABggw4oKtvR9XXG7fsyx37cdcsAJrOiveyBtobH28veXp6KH1nrsoqqpu8n4C+fooK73dT+/H389XUCQHffUEAzYogAw4qq6hW0amqJl/fv7uPW/YDwHmcsgYAwAIEGQAACxBkAAAsQJABALBAk4Kck5OjqqrGn0BSWVmpP//5zzc1FAAA7U2Tgvzwww/r2LFjjW47dOiQlixZclNDAQDQ3lz3y54SExNVXFwsSTLGKDk5WV27dr3qcgUFBbr11lvdNyEAAO3AdR8hT5w4UcYYGfPVO/pc+f7Kl6enp0JCQvTMM880y7AAALRV132EPH78eI0fP16SNGPGDCUnJ2vgwIHNNhgAAO1Jk96p69VXX3X3HAAAtGtNCvL58+e1bt067d69W+fPn1d9fb3Ldg8PD+3atcstAwIA0B40KcgrVqxQRkaGRo0apSFDhsjTk5czAwBwM5oU5Pfee08LFy
7UnDlz3D0PAADtUpMObS9duqSgoCB3zwIAQLvVpCCPGTNGH374obtnAQCg3WrSKevo6GgtX75cp0+fVnBwsHx8fK66zE9+8pObnQ0AgHajSUFesGCBJCkzM1OZmZlXbffw8CDIAADcgCYFOSsry91zAADQrjUpyLfffru75wAAoF1rUpBffPHF77zMf/3XfzVl1wAAtEtuD3LXrl3Vs2dPggwAwA1oUpCPHDly1Vp1dbX279+v5ORkLVu27KYHAwCgPXHbe176+vpq7Nixmjdvnn7zm9+4a7cAALQLbn8T6j59+ujYsWPu3i0AAG1ak05ZN8YYo+LiYm3YsIFnYQMAcIOaFOTBgwfLw8Oj0W3GGE5ZAwBwg5oU5Hnz5jUa5K5duyoiIkL9+/e/2bkAAGhXmhTkhIQEd88BAEC71uTHkGtra7V9+3bt3btXlZWV8vPz08iRIxUTE6NOnTq5c0YAANq8JgW5srJSDz/8sI4cOaLbbrtN/v7+ys/P144dO/Taa6/p9ddf1/e+9z13zwoAQJvVpJc9paSkqKSkRFu3btX777+v9PR0vf/++9q6davKy8uVmprq7jkBAGjTmhTkrKwsLViwQCNHjnRZHzlypObPn6/33nvPLcMBANBeNCnIVVVVuuOOOxrddscdd+jMmTM3MxMAAO1Ok4J85513avfu3Y1uy8rKUr9+/W5qKAAA2psmPalr9uzZeuyxx1RbW6sHHnhAt956q06dOqW3335b27ZtU3JyspvHBACgbWtSkKOjo1VQUKB169Zp27ZtDesdO3bUvHnzNHXqVLcNCABAe9CkIFdXV2vu3LmKjY3Vp59+qrNnz6q4uFhTp07VLbfc4u4ZAQBo827oMeTDhw/rJz/5iTZv3ixJ6tatm8aNG6dx48Zp9erVmjZtGp/0BABAE1x3kP/5z3/qkUce0dmzZzVo0CCXbd7e3lq6dKmqqqo0bdo0lZSUuH1QAADasusO8vr16+Xn56c//vGPioqKctnm4+Oj2NhYZWRkyNfXV+vWrXP7oAAAtGXXHeTs7Gz9/Oc/V/fu3a95mR49eiguLk7Z2dnumA0AgHbjuoNcVlZ2Xa8vDggI4JQ1AAA36LqD/P3vf1+lpaXfebnTp09/61E0AAC42nUHOSwsTNu3b//Oy2VmZmrIkCE3NRQAAO3NdQd5xowZ2rt3r5599lnV1NRctb22tlbPPfec9uzZo+nTp7t1SAAA2rrrfmOQwMBALVmyRE8//bTeeust3X333frBD36guro6FRUVae/evaqoqNAvf/lLjR07tjlnBgCgzbmhd+qaPn26Bg8erLS0NGVlZTUcKXfp0kVjxozRrFmzFBwc3CyDAgDQlt3wW2f+6Ec/0o9+9CNJUkVFhTw9PXm7TAAAblKT3sv6Cj8/P3fNAQBAu9akz0MGAADuRZABALAAQQYAwAIEGQAACxBkAAAsQJABALAAQQYAwAIEGQAACxBkAAAsQJABALCAVUHOz89XaGioy+cuHz58WLGxsQoJCVFERITS0tIcnBAAgOZhTZAvXryoX/3qV6qurm5Yq6ioUFxcnPr376+MjAwlJCQoNTVVGRkZDk4KAID73dSHS7jTmjVr1KVLF5e1P/zhD/L29lZycrK8vLw0cOBAFRYWasOGDZoyZYpDkwIA4H5WHCHn5OQoPT1dzz33nMv6vn37FBYWJi+vr35vCA8PV35+vsrLy1t6TAAAmo3jQa6srNTixYv1xBNPqE+fPi7bSkpK1Lt3b5e1nj17SpKKiopabEYAAJqb46esk5OTFRISogceeOCqbRcuXJC3t7fLWqdOnSRJNTU119xnZGTkNbcVFxdfFX4AAJzmaJAzMzO1b98+vf32241u79y5s2pra13WroTY19e32ecDAKClOBrkjIwMlZeXKyIiwmV9+fLlSktL02233abS0lKXbVe+79Wr1zX3m5WVdc1t33b0DACAUxwN8vPPP68LFy64rEVFRWn+/PmKjo
7Wn//8Z73xxhuqq6tThw4dJEnZ2dkaMGCAevTo4cTIAAA0C0ef1NWrVy/169fP5UuSevToodtvv11TpkzRuXPnlJSUpLy8PG3fvl1btmxRfHy8k2MDAOB2jj/L+tv06NFDGzduVH5+vmJiYvTiiy9q8eLFiomJcXo0AADcyvFnWX/T0aNHXb4PCgpSenq6Q9MAANAyrD5CBgCgvSDIAABYgCADAGABggwAgAUIMgAAFiDIAABYgCADAGABggwAgAUIMgAAFiDIAABYgCADAGABggwAgAUIMgAAFiDIAABYgCADAGABggwAgAUIMgAAFiDIAABYgCADAGABggwAgAUIMgAAFiDIAABYgCADAGABggwAgAUIMgAAFiDIAABYgCADAGABggwAgAUIMgAAFiDIAABYgCADAGABggwAgAUIMgAAFiDIAABYgCADAGABggwAgAUIMgAAFiDIAABYgCADAGABggwAgAUIMgAAFiDIAABYgCADAGABggwAgAUI8k2qrzdW7ANoqq6+Hd12H+S+DDSdl9MDtHaenh5K35mrsorqJl3f389XUycEuHkq4Pr5eHvd9P1Y4r4M3CyC7AZlFdUqOlXl9BjATeF+DDiLU9YAAFiAIANwCxsfi+YxbbQmnLIG4Bbueiw6oK+fosL78Zg22h2CDMCtbvaxaP/uPm7ZD9DacMoaAAALEGQAACxAkAEAsABBBgDAAgQZAAALEGQAACxAkAG0STa+UQnwbXgdMoA2iQ/NQGtDkAG0abzBCFoLTlkDAGABggwAgAUIMgAAFiDIAABYgCADAGABggwAgAUIMgAAFiDIAABYgCADAGABggwAgAUIMgAAFiDIAABYwPEgnzlzRr/+9a81btw4jRgxQg899JD27dvXsP3w4cOKjY1VSEiIIiIilJaW5uC0AAA0D8eD/Nhjj+ngwYNauXKl3nzzTQ0bNkyzZ8/WsWPHVFFRobi4OPXv318ZGRlKSEhQamqqMjIynB4bAAC3cvTjFwsLC/XRRx/p97//vUaMGCFJSkpK0ocffqgdO3aoc+fO8vb2VnJysry8vDRw4EAVFhZqw4YNmjJlipOjAwDgVo4eIfv5+Wn9+vUaPnx4w5qHh4eMMTp79qz27dunsLAweXl99XtDeHi48vPzVV5e7sTIAAA0C0eD3K1bN91zzz3y9vZuWHvnnXd0/PhxjRkzRiUlJerdu7fLdXr27ClJKioqatFZAQBoTo6esv6m/fv3a+nSpYqMjNT48eP1zDPPuMRakjp16iRJqqmpueZ+IiMjr7mtuLhYffr0cc/AAAC4ieNP6rpi165dmj17toKCgrRy5UpJUufOnVVbW+tyuSsh9vX1bfEZAbQ/XX07qr7euGVf7toP2iYrjpC3bt2qFStWaMKECXr++ecbjop79+6t0tJSl8te+b5Xr17X3F9WVtY1t33b0TMAfJOPt5c8PT2UvjNXZRXVTd6Pv5+vpk4IcONkaGscD/Lrr7+uJ598UjNmzNDSpUvl6fnVQXtYWJjeeOMN1dXVqUOHDpKk7OxsDRgwQD169HBqZADtUFlFtYpOVTk9BtowR09Z5+fn6+mnn9aECRMUHx+v8vJylZWVqaysTP/61780ZcoUnTt3TklJScrLy9P27du1ZcsWxcfHOzk2AABu5+gR8l//+lddvHhRO3fu1M6dO122xcTE6Nlnn9XGjRu1YsUKxcTEyN/fX4sXL1ZMTIxDEwMA0DwcDfIvfvEL/eIXv/jWywQFBSk9Pb2FJgIAwBnWPMsaAID2jCADAGABgtyG8FpJAGi9HH/ZE9yH10oCQOtFkNsYXisJAK0Tp6wBALAAQQYAwAIE2WHufON6d7BtHgBoL3gM2WHueuP6gL5+igrv1+bmAYD2giBb4mafjOXf3ceN09g3DwC0dZyyBgDAAgQZAAALEGQAACxAkAEAsABBBgDAAgQZAAALEGQAACxAkAEAsABBBgDAAgQZAAALEGQAACxAkAEAsA
BBBgDAAgQZAFqAOz9rnM8sb5v4+EUAaAHu+qxxfz9fTZ0Q4MbJYAuCDAAt6GY/axxtF6esAQCwAEEGAMACBBkAAAsQZAAALECQAQCwAEEGAMACBBkAWhHeYKTt4nXIANCK8AYjbRdBBoBWiDcYaXs4ZQ0AgAUIMgAAFiDIAABYgCADAGABggwAgAUIMgAAFiDIAABYgCADAGABggwAgAUIMgAAFiDIAABYgCADAGABggwAgAUIMgAAFiDIAABYgCADAGABggwAgAUIMgAAFiDIAABYgCADAGABggwAgAUIMgAAFiDIAABYgCADAGABggwAgAUIMgAAFiDIAABYgCADAGABggwAgAUIMgAAFiDIAABYgCADQDvU1bej6uuNW/blrv20d15ODwAAaHk+3l7y9PRQ+s5clVVUN3k//n6+mjohwI2TtV8EGQDasbKKahWdqnJ6DIhT1gAAWIEgAwCazF2PRfN4NqesAQA3wR2PRQf09VNUeL92/3g2QQYA3LSbeSzav7vPTe+jLWgVp6zr6+v1wgsvaOzYsQoODtasWbNUWFjo9FgAALhNqwjy2rVr9cYbb+ipp55Senq6PDw89Oijj6q2ttbp0QAAlmjtr622/pR1bW2tNm3apEWLFumee+6RJK1atUpjx47Vzp07NWnSJIcnBADYoLW/ttr6IB85ckRVVVUKDw9vWOvWrZuGDh2qnJwcggwAcNFaH4v2MMZY/Rzx9957TwkJCTp48KA6d+7csP7LX/5SFy5c0O9+97urrhMZGXnN/Z04cUIdOnRQnz593DZj1fmLqmvi6Y2OXp7y6eR1U/tgP+yH+yD7ac37sWkWSerg6aEuPh2bfP1v6tOnj7Zu3fqdl7P+CPn8+fOSJG9vb5f1Tp066ezZsze8Pw8PD3l53fzNLi4ulnT5B+2Ovzh3/eVfz36+PrsN89zIfq5n9pac50Y0NrtNP+dv28eN/Nxtuk2SVHnmlKSm32eucOJ2fdvP3baf8zf305T/V225TcXFxao8c/P3mRtlfZCvHBXX1ta6HCHX1NTIx8en0etkZWU1+1xXjsJb4s9yN2Z3BrM7g9mdwew3zvpnWV/5DaW0tNRlvbS0VL1793ZiJAAA3M76IA8ePFhdu3bV3r17G9YqKyt16NAhjRw50sHJAABwH+tPWXt7eys2NlbPP/+8vv/97+v222/Xb3/7W/Xu3VsTJkxwejwAANzC+iBL0vz583Xp0iU98cQTunDhgsLCwpSWlnbVE70AAGitWkWQO3TooEWLFmnRokVOjwIAQLOw/jFkAADaA+vfGAQAgPaAI2QAACxAkAEAsABBBgDAAgQZAAALEOQbVF9frxdeeEFjx45VcHCwZs2apcLCQqfH+k5r167VjBkzXNYOHz6s2NhYhYSEKCIiQmlpaQ5Nd7UzZ87o17/+tcaNG6cRI0booYce0r59+xq22zx7eXm5Fi1apPDwcIWGhmrOnDnKy8tr2G7z7F+Xn5+v0NBQbd++vWHN5tlPnjypu+6666qvbdu2SbJ7dknKzMxUdHS0AgMDNWnSJL3zzjsN22ydfe/evY3+zO+6666G94O2dXZJunjxolatWqWIiAiFhoZq2rRpOnDgQMP2Fp/d4IasWbPG3H333eaDDz4whw8fNrNmzTITJkwwNTU1To92Ta+88oq56667TGxsbMPa6dOnzY9//GOTlJRk8vLyzJtvvmkCAwPNm2++6eCkX4mLizOTJ082OTk55tixY+bJJ580QUFBJi8vz/rZH3zwQTN16lTz2Wefmby8PJOQkGBGjx5tqqurrZ/9itraWvPTn/7UBAQEmIyMDGOM/feZrKwsExgYaL788ktTWlra8HX+/HnrZ8/MzDRDhgwxmzdvNgUFBebFF180gwcPNgcOHLB69pqaGpefdWlpqfnb3/5mhg4dav7whz9YPbsxxqSmpprRo0ebPXv2mIKCApOUlGRGjBhhSkpKHJmdIN+AmpoaExoaal5//f
WGtbNnz5qgoCCzY8cOBydrXElJiZk9e7YJCQkx9913n0uQ161bZ8aOHWsuXrzYsJaSkmImTpzoxKguCgoKTEBAgNm/f3/DWn19vZkwYYJZvXq11bOfPn3aLFy40OTm5jasHT582AQEBJiDBw9aPfvXpaSkmBkzZrgE2fbZX375ZTN58uRGt9k8e319vbn33nvNs88+67I+a9Yss27dOqtn/6ba2lozadIks2DBAmOM3T93Y4yZPHmyeeaZZxq+/9e//mUCAgLMu+++68jsnLK+AUeOHFFVVZXCw8Mb1rp166ahQ4cqJyfHwcka9/nnn+uWW27Rn/70JwUHB7ts27dvn8LCwlw+Gzo8PFz5+fkqLy9v6VFd+Pn5af369Ro+fHjDmoeHh4wxOnv2rPWzr1y5Uj/84Q8lSadOnVJaWpp69+6tQYMGWT37FTk5OUpPT9dzzz3nsm777EePHtWgQYMa3Wbz7F988YVOnjypBx54wGU9LS1N8fHxVs/+Ta+99pqKi4u1ZMkSSXb/3CWpe/fu2r17t06cOKG6ujqlp6fL29tbQ4YMcWR2gnwDSkpKJF39odU9e/Zs+DBum4wfP14pKSm64447rtpWUlJy1cdX9uzZU5JUVFTUIvNdS7du3XTPPfe4vFf5O++8o+PHj2vMmDFWz/51y5Yt0+jRo/Xuu+9qxYoV8vX1tX72yspKLV68WE888cRV93PbZ8/NzVV5ebmmTZumf/u3f9NDDz2kPXv2SLJ79oKCAklSdXW1Zs+erbvvvlsPPvig3n//fUl2z/51NTU1WrdunWbOnNkwn+2zJyUlycvLS5GRkQoMDNSqVau0evVq9e3b15HZCfINOH/+vCRd9aEWnTp1Uk1NjRMjNdmFCxcavR2SrLst+/fv19KlSxUZGanx48e3mtlnzpypjIwMTZ48WfPmzdPnn39u/ezJyckKCQm56mhNsvs+U1tbq4KCAp07d04LFizQ+vXrFRgYqEcffVTZ2dlWz37u3DlJUmJiou6//35t2rRJo0eP1ty5c62f/eveeust1dTUuDx51PbZjx07pm7duumll15Senq6fvrTnyoxMVFHjhxxZPZW8eEStujcubOky//zX/lv6fJfjo+Pj1NjNUnnzp1VW1vrsnblTubr6+vESI3atWuXfvWrXyk4OFgrV66U1Hpmv3L69Mknn9Snn36qrVu3Wj17Zmam9u3bp7fffrvR7TbP7u3trZycHHl5eTX8Izp8+HAdO3ZMaWlpVs/esWNHSdLs2bMVExMjSRoyZIgOHTqkV155xerZvy4zM1NRUVHy8/NrWLN59pMnT2rRokXavHmzRo4cKUkKDAxUXl6e1qxZ48jsHCHfgCun8EpLS13WS0tLrzq1YbvevXs3ejskqVevXk6MdJWtW7cqISFB48aN04YNGxp+CbJ59vLycu3YsUN1dXUNa56enho4cGDD/cTW2TMyMlReXt7wEpDQ0FBJ0vLlyzVp0iSrZ5cu/yP5zSOagIAAffnll1bPfuXfjoCAAJf1QYMG6cSJE1bPfsXp06f1ySefKDo62mXd5tk/++wzXbx4UYGBgS7rwcHBKigocGR2gnwDBg8erK5du2rv3r0Na5WVlTp06FDDb1itRVhYmPbv3+8SjuzsbA0YMEA9evRwcLLLXn/9dT355JOaPn26Vq9e7fIPrc2zl5aW6r//+7/18ccfN6xdvHhRhw4d0sCBA62e/fnnn9df/vIXZWZmNnxJlz+PfP369VbPfuTIEYWGhrq8Vl2S/vGPf2jQoEFWzz506FB16dJFBw8edFnPzc1V3759rZ79igMHDsjDw0OjRo1yWbd59isHWEePHnVZz83NVb9+/ZyZvdmev91GrVy50owaNcrs2rWr4XXIUVFRVr8O2RhjEhMTXV72dOrUKRMWFmYSExPN//3f/5mMjAwTGBhotm/f7uCUl33xxRdm2LBhZt68eVe9xrGystLq2evr682sWbPMxIkTTU5Ojj
l69KhZuHChCQsLMydPnrR69sZ8/WVPNs9eV1dnHnzwQXP//febnJwck5eXZ55++mkzfPhwc+TIEatnN8aYl156yYSGhpq3337bFBYWmrVr15rBgwebv//979bPbszl92eIioq6at3m2evq6sy0adPMfffdZ7Kzs01+fr5ZtWqVGTJkiPnkk08cmZ0g36BLly6Z3/zmNyY8PNyEhISYRx991Pzzn/90eqzv9M0gG2PMwYMHzc9+9jMzfPhwc++995pXX33VoelcvfzyyyYgIKDRr8TERGOMvbMbY0xlZaVZvny5GT16tAkKCjKzZs1yeV2yzbN/09eDbIzds5eXl5slS5aY0aNHm8DAQDN16lSTk5PTsN3m2Y0xZtOmTWb8+PFm2LBhZvLkyWbnzp0N22yfffny5eZnP/tZo9tsnv3MmTMmOTnZREREmNDQUDN16lSzd+/ehu0tPTufhwwAgAV4DBkAAAsQZAAALECQAQCwAEEGAMACBBkAAAsQZAAALECQAQCwAEEGAMACBBkAAAsQZAAALECQAQCwAEEG2qkLFy4oJSVFUVFRGj58uEaMGKG4uDgdPny44TJ//OMfFR0drcDAQE2ePFnZ2dkaOnSotm/f3nCZoqIiPfbYYxo1apSCg4M1c+ZMHTp0yImbBLRqBBlopxYvXqw333xTc+bM0aZNm/T4448rNzdXCxculDFGmZmZevzxxzVixAitXbtWEydO1Ny5c10+H/b06dP6j//4D33++edatmyZUlJSVF9fr+nTp+vYsWMO3jqg9fFyegAALa+2tlZVVVVatmyZoqOjJUmjRo1SVVWVnn32WZWVlSk1NVX33nuvnnrqKUnS2LFj1bFjR6WkpDTsZ8uWLTpz5ox+//vf6/bbb5ckjRs3TtHR0UpNTdULL7zQ8jcOaKU4QgbaIW9vb6WlpSk6OlqlpaXKyclRenq6du/eLUkqKChQUVGR7rvvPpfrTZo0yeX77OxsDRkyRL169dKlS5d06dIleXp6aty4cfrf//3fFrs9QFvAETLQTu3Zs0dPP/20vvjiC3Xp0kV33XWXunTpIknq2LGjJKlHjx4u1/H393f5/syZMyosLNSwYcMa/TPOnz8vHx+fZpgeaHsIMtAOHT9+XPPmzVNkZKR+97vfqW/fvpKk1157TXv27Gl4nLi8vNzlet/8/nvf+55GjRqlxYsXN/rneHt7N8P0QNvEKWugHfrHP/6hmpoaxcfHN8RYunzULEk9e/ZU3759tXPnTpfr/fWvf3X5ftSoUcrPz9eAAQMUGBjY8PWnP/1J27ZtU4cOHZr/xgBtBEEG2qFhw4bJy8tLv/3tb/XRRx9p9+7dSkhI0AcffCDp8qnm+fPna9euXVq+fLn+9re/aePGjUpNTZUkeXpe/qfjkUceUX19vR555BH95S9/UXZ2tpYtW6b/+Z//0Z133unUzQNaJQ9jjHF6CAAt791339WLL76o48eP65ZbblFISIgefvhhzZgxQ8uWLdP06dOVnp6utLQ0FRUV6Yc//KGmT5+upKQkrVmzRlFRUZIun/5OSUlRdna2ampq1L9/f82YMUP//u//7vAtBFoXggygUTt27NDQoUNdjnQ/+OADxcfH66233tLgwYMdnA5oewgygEbNmTNHx44d04IFC9SnTx8VFBTohRdeUL9+/fTqq686PR7Q5hBkAI2qqKhQSkqKPvzwQ50+fVq33nqrJk6cqPnz5ze8PAqA+xBkAAAswLOsAQCwAEEGAMACBBkAAAsQZAAALECQAQCwAEEGAMACBBkAAAsQZAAALECQAQCwwP8DsNqdAD50ApQAAAAASUVORK5CYII=",
"text/plain": [
"
"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"sns.displot(df1['age'])\n",
"print('Age: minimum=', df1['age'].min(), '; maximum=', df1['age'].max(), '; median=', df1['age'].median())\n",
"print('The most often ages:')\n",
"print(df1['age'].value_counts().sort_values(ascending=False).iloc[0:5])"
]
},
{
"cell_type": "markdown",
"id": "7cc6c1ce",
"metadata": {},
"source": [
"> Age values seem to be fully valid, with the exception of missing values."
]
},
{
"cell_type": "markdown",
"id": "b017c87c",
"metadata": {},
"source": [
"### Dealing with missing and invalid data\n",
"\n",
"Now we use the exploration outcomes for data cleaning.\n",
"\n",
"**TASK 1.** \n",
"Consider how to treat missing or invalid values of fare, embarkment, age and cabin. Then prepare a script for data cleaning."
]
},
{
"cell_type": "markdown",
"id": "4be35bb4",
"metadata": {},
"source": [
"**Answers:**\n",
"\n",
"* missing fare -- could be either omitted (one case only) or estimated from other attributes\n",
"* zero fare -- only a few cases; could be kept as valid (possibly special passengers) or omitted (possibly errors)\n",
"* missing embarkment -- could be either omitted (one case only) or estimated from other attributes\n",
"* missing age -- should not be omitted (too many cases; we have to deal with it another way)\n",
"* missing cabin -- should not be omitted (the missing value is informative)"
]
},
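{
"cell_type": "markdown",
"id": "a1c3e5f7",
"metadata": {},
"source": [
"The alternative to omitting records (estimating the missing value from other attributes) could look roughly like the sketch below. It is only an illustration, not the solution used in the rest of this notebook. It assumes that the median fare within the same *pclass* is an acceptable estimate, works on a copy with the hypothetical name `df1_imp`, and adds a hypothetical flag `cabin_known` to keep the information carried by a missing *cabin*."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b2d4f6a8",
"metadata": {},
"outputs": [],
"source": [
"# sketch only - alternative to omitting records; df1_imp and cabin_known are illustrative names\n",
"df1_imp = df1.copy()\n",
"# missing fare -> median fare of the same passenger class (an assumption, other estimates are possible)\n",
"df1_imp['fare'] = df1_imp['fare'].fillna(df1_imp.groupby('pclass')['fare'].transform('median'))\n",
"# missing cabin is informative -> keep it as a flag instead of dropping rows\n",
"df1_imp['cabin_known'] = df1_imp['cabin'].notna()\n",
"# check: number of missing values left in the treated columns\n",
"df1_imp[['fare', 'cabin_known']].isna().sum()"
]
},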
{
"cell_type": "code",
"execution_count": 8,
"id": "b9e22961",
"metadata": {},
"outputs": [],
"source": [
"# cleaning - example of omitting records with missing or invalid values (keeping only records with valid, non-missing values)\n",
"# note: the condition fare > 0 also drops the zero-fare records\n",
"# we will not run it yet\n",
"# df1 = df1[df1['fare'].notna() & (df1['fare']>0) & (df1['embarked'].notna())]"
]
},
{
"cell_type": "markdown",
"id": "4167d850",
"metadata": {},
"source": [
"### Feature extraction\n",
"\n",
"Several passengers could travel on one ticket; in that case each of them carries the same fare, although it was paid only once. This is a reason to make new statistical units: tickets. But is the data for the same ticket consistent? Let's check the integrity of the data for the tickets.\n",
"\n",
"**TASK 2.** \n",
"Explore whether all passengers with the same ticket have the same fare, pclass, embarkment and cabin."
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "114f6b3b",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"fare\n",
"1 659\n",
"0 1\n",
"Name: count, dtype: int64\n"
]
},
{
"data": {
"text/plain": [
" passenger_id ticket pclass fare sex age cabin embarked\n",
"416 1225 3701 3 NaN male 60.5 NaN S"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# same fare for the same ticket?\n",
"print(df1.groupby('ticket').agg({'fare': 'nunique'}).value_counts())\n",
"# Which ticket is for a passenger with missing fare? Are there more passengers for this ticket?\n",
"ticket_na_fare = df1[df1['fare'].isna()]['ticket'].values.tolist()\n",
"df1[df1['ticket'].isin(ticket_na_fare)]"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "88157b90",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"pclass\n",
"1 660\n",
"Name: count, dtype: int64\n"
]
}
],
"source": [
"# same pclass for the same ticket?\n",
"print(df1.groupby('ticket').agg({'pclass': 'nunique'}).value_counts())"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "915f0225",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"embarked\n",
"1 658\n",
"0 1\n",
"2 1\n",
"Name: count, dtype: int64\n"
]
},
{
"data": {
"text/plain": [
" passenger_id ticket pclass fare sex age cabin embarked\n",
"285 258 113798 1 31.0 female 30.0 NaN C\n",
"381 46 113798 1 31.0 male NaN NaN S"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# same embarkments for the same ticket?\n",
"print(df1.groupby('ticket').agg({'embarked': 'nunique'}).value_counts())\n",
"# For which ticket were there more embarkments?\n",
"tmp_tickets = df1.groupby('ticket').agg({'embarked': 'nunique'})\n",
"ticket_mult_emb = tmp_tickets[tmp_tickets['embarked'] > 1].index.tolist()\n",
"df1[df1['ticket'].isin(ticket_mult_emb)]"
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "1105dee2",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"cabin\n",
"0 532\n",
"1 112\n",
"2 14\n",
"3 1\n",
"4 1\n",
"Name: count, dtype: int64\n"
]
}
],
"source": [
"# same cabin for the same ticket?\n",
"print(df1.groupby('ticket').agg({'cabin': 'nunique'}).value_counts())"
]
},
{
"cell_type": "markdown",
"id": "eb568332",
"metadata": {},
"source": [
"> For each ticket, there is a single fare (possibly missing or zero) and a single class.  \n",
"> Every ticket except two has exactly one embarkment place: one ticket has two places and one has none (missing).  \n",
"> A ticket can have various numbers of cabins (possibly none)."
]
},
{
"cell_type": "markdown",
"id": "9d8caa76",
"metadata": {},
"source": [
"Now we make a table of tickets in a few steps:\n",
"\n",
"1. Base table -- unique rows of *ticket*, *pclass*, *fare* (we know these are consistent within a ticket).\n",
"2. Aggregated features grouped by *ticket* -- e.g. count of passengers; join the aggregated table to the base table.\n",
"3. Artificial aggregation as a solution to multiple embarkment -- we take the highest (alphabetically last) value of *embarked* to unify embarkment places for tickets.\n",
"\n",
"**TASK 3.** \n",
"Make a table with tickets as rows and features (some of them aggregated) as columns. Choose useful features for future analysis by yourself."
]
},
{
"cell_type": "markdown",
"id": "67c628de",
"metadata": {},
"source": [
"**Chosen features:**\n",
"* count of passengers\n",
"* ratio of male passengers\n",
"* age of the youngest and of the oldest passenger\n",
"* average age of passengers\n",
"* count of passengers with known age\n",
"* sex of the oldest passenger\n",
"* count of (distinct) cabins"
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "c5433f7b",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" pclass fare embarked pass_cnt rate_males age_min age_max \\\n",
"ticket \n",
"335432 3 7.7333 Q 1 0.0 NaN NaN \n",
"315089 3 8.6625 S 1 1.0 38.0 38.0 \n",
"345773 3 24.1500 S 2 0.5 30.0 36.0 \n",
"29105 2 23.0000 S 2 0.0 20.0 54.0 \n",
"28221 2 13.0000 S 1 1.0 40.0 40.0 \n",
"... ... ... ... ... ... ... ... \n",
"3101267 3 6.4958 S 1 1.0 18.0 18.0 \n",
"19943 1 90.0000 S 1 1.0 38.0 38.0 \n",
"680 1 50.0000 S 1 1.0 55.0 55.0 \n",
"11771 1 29.7000 C 1 1.0 58.0 58.0 \n",
"250647 2 13.0000 S 1 1.0 52.0 52.0 \n",
"\n",
" age_mean age_valid_cnt cabin_cnt sex_oldest \n",
"ticket \n",
"335432 NaN 0 0 female \n",
"315089 38.0 1 0 male \n",
"345773 33.0 2 0 male \n",
"29105 37.0 2 0 female \n",
"28221 40.0 1 0 male \n",
"... ... ... ... ... \n",
"3101267 18.0 1 0 male \n",
"19943 38.0 1 1 male \n",
"680 55.0 1 1 male \n",
"11771 58.0 1 1 male \n",
"250647 52.0 1 0 male \n",
"\n",
"[660 rows x 11 columns]"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# User function: share of male passengers in a group\n",
"def rate_males(s):\n",
"    return (s == 'male').mean()\n",
"\n",
"### Base table\n",
"df2_base = df1[['ticket', 'pclass', 'fare']].drop_duplicates()\n",
"df2_base = df2_base.set_index('ticket') # setting 'ticket' column as key\n",
"\n",
"### Multiple embarkment solution\n",
"df2_emb = df1.groupby('ticket').agg({'embarked': 'max'})\n",
"# print('Ticket with multiple embarkment has been unified:')\n",
"# print(df2_emb.loc['113798'])\n",
"# no need to set index - groupby + agg sets index by default\n",
"\n",
"### Some chosen features\n",
"df2_feat = df1.groupby('ticket').agg({'ticket': 'count', 'sex': [rate_males],\n",
"                                     'age': ['min', 'max', 'mean', 'count'], 'cabin': 'nunique'})\n",
"# column names update\n",
"df2_feat.columns = ['pass_cnt', 'rate_males', 'age_min', 'age_max', 'age_mean', 'age_valid_cnt', 'cabin_cnt']\n",
"\n",
"# sex of the oldest person for the ticket\n",
"df2_feat_sex_oldest = df1.sort_values(by=['ticket', 'age'], ascending=[True, False]) \\\n",
" .drop_duplicates('ticket')[['ticket', 'sex']]\n",
"df2_feat_sex_oldest = df2_feat_sex_oldest.set_index('ticket') # setting 'ticket' column as key\n",
"df2_feat_sex_oldest.columns = ['sex_oldest']\n",
"\n",
"### Joining tables together\n",
"df2 = df2_base.join(df2_emb) # join is by default LEFT and index<->index\n",
"df2 = df2.join(df2_feat)\n",
"df2 = df2.join(df2_feat_sex_oldest)\n",
"\n",
"df2"
]
},
{
"cell_type": "markdown",
"id": "87466f9c",
"metadata": {},
"source": [
"### Data transformation\n",
"\n",
"* The distribution of fare is strongly skewed. Let's apply a log transformation to make it more symmetric.\n",
"* The fare is given as a total for the ticket, so it's better to also compute the average fare per passenger.\n",
"\n",
"**TASK 4.**\n",
"Add new columns to the table as stated above."
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "b996b7a9",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" pclass fare embarked pass_cnt rate_males age_min age_max \\\n",
"ticket \n",
"335432 3 7.7333 Q 1 0.0 NaN NaN \n",
"315089 3 8.6625 S 1 1.0 38.0 38.0 \n",
"345773 3 24.1500 S 2 0.5 30.0 36.0 \n",
"29105 2 23.0000 S 2 0.0 20.0 54.0 \n",
"28221 2 13.0000 S 1 1.0 40.0 40.0 \n",
"... ... ... ... ... ... ... ... \n",
"3101267 3 6.4958 S 1 1.0 18.0 18.0 \n",
"19943 1 90.0000 S 1 1.0 38.0 38.0 \n",
"680 1 50.0000 S 1 1.0 55.0 55.0 \n",
"11771 1 29.7000 C 1 1.0 58.0 58.0 \n",
"250647 2 13.0000 S 1 1.0 52.0 52.0 \n",
"\n",
" age_mean age_valid_cnt cabin_cnt sex_oldest fare_log \\\n",
"ticket \n",
"335432 NaN 0 0 female 0.941178 \n",
"315089 38.0 1 0 male 0.985090 \n",
"345773 33.0 2 0 male 1.400538 \n",
"29105 37.0 2 0 female 1.380211 \n",
"28221 40.0 1 0 male 1.146128 \n",
"... ... ... ... ... ... \n",
"3101267 18.0 1 0 male 0.874818 \n",
"19943 38.0 1 1 male 1.959041 \n",
"680 55.0 1 1 male 1.707570 \n",
"11771 58.0 1 1 male 1.487138 \n",
"250647 52.0 1 0 male 1.146128 \n",
"\n",
" fare_per_pass \n",
"ticket \n",
"335432 7.7333 \n",
"315089 8.6625 \n",
"345773 12.0750 \n",
"29105 11.5000 \n",
"28221 13.0000 \n",
"... ... \n",
"3101267 6.4958 \n",
"19943 90.0000 \n",
"680 50.0000 \n",
"11771 29.7000 \n",
"250647 13.0000 \n",
"\n",
"[660 rows x 13 columns]"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# we use log10 for easier interpretation, but the natural log is fine, too\n",
"# be careful with zero fare - log(0) is undefined, so we use log(x+1) instead\n",
"df2['fare_log'] = np.log10(df2['fare']+1)\n",
"df2['fare_per_pass'] = df2['fare'] / df2['pass_cnt']\n",
"df2"
]
},
{
"cell_type": "markdown",
"id": "5995bcca",
"metadata": {},
"source": [
"**TASK 5.**\n",
"1. Make new columns with meaningful categories \"binned\" from the count of passengers, mean age and count of distinct cabins.\n",
"2. Make flags \"child\" and \"baby\": the flag is True when the youngest passenger on a ticket was under 15 or under 3 years old, respectively.\n",
"3. Find the most frequent combinations of men and women travelling on one ticket (e.g. \"single man\", \"man+woman\", \"two men\", \"other\" etc.) and make a new column with the category description."
]
},
{
"cell_type": "code",
"execution_count": 15,
"id": "51452489",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"pass_cnt\n",
"1 542\n",
"2 83\n",
"3 19\n",
"4 7\n",
"6 4\n",
"5 3\n",
"10 1\n",
"8 1\n",
"Name: count, dtype: int64\n",
"pass_cnt_cat\n",
"1 542\n",
"2 83\n",
"3 19\n",
"4+ 16\n",
"Name: count, dtype: int64\n",
"age_mean_cat\n",
"15- 36\n",
"15-20 75\n",
"20-25 108\n",
"25-30 91\n",
"30-40 108\n",
"40+ 101\n",
"Name: count, dtype: int64\n",
"cabin_cnt\n",
"0 532\n",
"1 112\n",
"2 14\n",
"4 1\n",
"3 1\n",
"Name: count, dtype: int64\n",
"cabin_cnt_cat\n",
"none 532\n",
"1 112\n",
"2+ 16\n",
"Name: count, dtype: int64\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"C:\\Users\\jhucin\\Anaconda3\\lib\\site-packages\\seaborn\\axisgrid.py:118: UserWarning: The figure layout has changed to tight\n",
" self._figure.tight_layout(*args, **kwargs)\n"
]
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAeQAAAHkCAYAAADvrlz5AAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjcuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8pXeV/AAAACXBIWXMAAA9hAAAPYQGoP6dpAAAy5klEQVR4nO3de1SVZaLH8R+IW0AziRDoohK2vSQIJkZjGsloHS1PxHE8Y5ihY1RmYxfxGDOJx+wyk7eyspTUk13IYLScaWaUqVXNYUwo6zSiDgY0CogiyigCBu/5o8WuHVCKG95H+X7WYq3Zz7P3ww9i/O33st/Xy7IsSwAAwFbedgcAAAAUMgAARqCQAQAwAIUMAIABKGQAAAxAIQMAYAAKGQAAA1DIAAAYoNMVclJSkpKSkuyOAQCAGx+7A3S0srIyuyMAANBMp9tCBgDARBQyAAAGoJABADAAhQwAgAEoZAAADEAhAwBgAAoZAAADUMgAABiAQgYAwAAUMgAABqCQAQAwAIUMAIABKGQAAAxAIQMAYAAKGQAAA1DIAAAYgEIGAMAAFDIAAAagkAEAMACFjHNOY6Nl5FoAcDZ87A4AnClvby9lbt2rQ1U1Z7VOUIC/Jo91eigVAJwdChnnpENVNSo9fMLuGADgMeyyBgDAABQyAAAGoJABADAAhQwAgAEoZAAADEAhAwBgAAoZAAADUMgAABiAQgYAwAAUMgAABrC9kE+dOqVly5YpLi5O0dHRmjJlij755BPXfEFBgZKSkhQVFaW4uDhlZGTYmBYAgPZheyG/8MILysrK0mOPPaZNmzbpiiuu0MyZM3Xw4EFVVVUpOTlZ/fr1U1ZWlmbPnq0VK1YoKyvL7tgAAHiU7TeXyMnJ0c0336zrrrtOkvRf//Vf2rhxo3bu3Kni4mI5HA6lp6fLx8dH4eHhKikp0erVq5WYmGhzcgAAPMf2LeRevXrpvffe0/79+9XQ0KDMzEw5HA4NGjRIeXl5iomJkY/Pt+8bYmNjVVRUpMrKShtTAwDgWbZvIaelpemBBx5QfHy8unTpIm9vb61YsUJ9+vRReXm5nE73+9X27t1bklRaWqrAwEA7IgMA4HG2F/K+ffvUs2dPPffccwoODtbGjRs1b948bdiwQbW1tXI4HG7P79atmySprq6u1TXj4+NbnSsrK1NoaKhnwgMA4CG2FvKBAwc0d+5crVu3TsOHD5ckRUREqLCwUM8++6x8fX1VX1/v9pqmIvb39+/wvAAAtBdbC/nzzz/XqVOnFBER4TY+dOhQffDBB7rkkktUUVHhNtf0ODg4uNV1c3JyWp37oa1nAADsYutJXU27jvfs2eM2vnfvXvXt21cxMTHKz89XQ0ODay43N1dhYWEcPwYAnFdsLeTIyEgNHz5c8+bN09/+9jcVFxdr+fLlys3N1V133aXExEQdP35caWlpKiwsVHZ2ttavX6+UlBQ7YwMA4HG27rL29vbW888/r+XLl2v+/Pk6duyYnE6n1q1bp6ioKEnSmjVrtHjxYiUkJCgoKEipqalKSEiwMzYAAB7nZVmWZXeIjtR0DPmHjjPDfCvf3KnSwyfOao1LLu6u+34W5ZlAAHCWbL8wCAAAoJABADAChQwAgAEoZAAADEAhAwBgAAoZAAADUMgAABiAQgYAwAAUMgAABqCQAQAwAIUMAIABKGQAAAxAIQMAYAAKGQAAA1DIAAAYgEIGAMAAFDIAAAagkNFp9fDvqsZGy2PreXItAJ2Pj90BALv4OXzk7e2lzK17daiq5qzWCgrw1+SxTg8lA9AZUcjo9A5V1aj08Am7YwDo5NhlDQCAAShkAAAMQCEDAGAAChkAAANQyAAAGIBCBgDAABQyAAAGoJABADAAhQwAgAEoZAAADEAhAwBgAAoZAAADUMgAABiAQgYAwAAUMgAABrD1fsjbt2/XHXfc0eLcZZddppycHBUUFGjx4sX64osv1KtXL02dOlUzZszo4K
QAALQvWws5OjpaH330kdvY3r17ddddd+nuu+9WVVWVkpOT9dOf/lQLFy7Uzp07tXDhQvXq1UuJiYk2pQYAwPNsLWSHw6GgoCDX41OnTumJJ57QuHHjNGnSJL344otyOBxKT0+Xj4+PwsPDVVJSotWrV1PIAIDzilHHkF999VWVlZVp/vz5kqS8vDzFxMTIx+fb9w2xsbEqKipSZWWlXTEBAPA4Ywq5rq5Oq1at0rRp09S7d29JUnl5uUJCQtye1zRXWlra4RkBAGgvtu6y/q7Nmzerrq5OU6dOdY3V1tbK4XC4Pa9bt26Svinw1sTHx7c6V1ZWptDQ0LNMCwCAZxmzhbxp0yaNGzdOAQEBrjFfX1/V19e7Pa+piP39/Ts0HwAA7cmILeQjR47o008/VUpKitt4SEiIKioq3MaaHgcHB7e6Xk5OTqtzP7T1DACAXYzYQv7kk0/k5eWlESNGuI3HxMQoPz9fDQ0NrrHc3FyFhYUpMDCwo2MCANBujCjk3bt36/LLL5efn5/beGJioo4fP660tDQVFhYqOztb69evb7YlDQDAuc6IQj58+LB69erVbDwwMFBr1qxRUVGREhIStHLlSqWmpiohIaHjQwIA0I6MOIacnp7e6lxkZKQyMzM7LgwAADYwYgsZAIDOjkIGAMAAFDIAAAagkAEAMACFDACAAShkAAAMQCEDAGAAChkAAANQyOgwjY2W3REAwFhGXKkLnYO3t5cyt+7VoaqaNq/h7BOgcbF9PZgKAMxAIaNDHaqqUenhE21+fVAvvx9/EgCcg9hlDQCAAShkAAAMQCEDAGAAChkAAANQyAAAGIBCBgDAABQyAAAGoJABADAAhQwAgAEoZAAADEAhAwBgAAoZ8IAe/l09ejcr7owFdD7cXALwAD+Hj0fuZiVJQQH+mjzW6aFkAM4VFDLgQWd7NysAnRe7rAEAMACFDACAAShkAAAMQCEDAGAAChkAAANQyAAAGIBCBgDAABQyAAAGoJABADAAhQwAgAEoZAAADGBEIW/atEnjx49XRESEJkyYoHfffdc1V1BQoKSkJEVFRSkuLk4ZGRk2JgUAoH3YXsibN2/WI488osmTJ2vLli0aP368HnzwQX366aeqqqpScnKy+vXrp6ysLM2ePVsrVqxQVlaW3bEBAPAoW+/2ZFmWVqxYoWnTpmnatGmSpFmzZumTTz7Rxx9/rI8//lgOh0Pp6eny8fFReHi4SkpKtHr1aiUmJtoZHQAAj7J1C/nLL7/UgQMHdMstt7iNZ2RkKCUlRXl5eYqJiZGPz7fvG2JjY1VUVKTKysqOjgsAQLuxdQu5uLhYklRTU6MZM2Zo165duuyyy3TPPfdozJgxKi8vl9PpfqP23r17S5JKS0sVGBjY4rrx8fGtfs+ysjKFhoZ65gcAAMBDbN1CPn78uCRp3rx5uvnmm/Xyyy9r5MiRuvfee5Wbm6va2lo5HA6313Tr1k2SVFdX1+F5AQBoL7ZuIXft2lWSNGPGDCUkJEiSBg0apF27dmnt2rXy9fVVfX2922uaitjf37/VdXNyclqd+6GtZwAA7GLrFnJISIgkNdst3b9/f+3fv18hISGqqKhwm2t6HBwc3DEhAQDoALYW8uDBg9W9e3d99tlnbuN79+5Vnz59FBMTo/z8fDU0NLjmcnNzFRYW1urxYwAAzkW2FrKvr69+8Ytf6LnnntOWLVv01Vdf6YUXXtBf//pXJScnKzExUcePH1daWpoKCwuVnZ2t9evXKyUlxc7YAAB4nK3HkCXp3nvvlZ+fn5YtW6aDBw8qPDxczz77rK655hpJ0po1a7R48WIlJCQoKChIqampruPNAACcL2wvZElKTk5WcnJyi3ORkZHKzMzs4EQAAHQs2y+dCQAAKGQAAIxAIQMAYAAKGQAAA1DIAAAYgEIGAMAAFDIAAAagkAEAMACFDACAAShkAAAMQCEDAGAAChkAAANQyAAAGIBCBgDAABQyAAAGoJABADAAhQ
wAgAEoZAAADEAhAwBgAAoZAAADUMgAABiAQgYAwAAUMgAABqCQAQAwAIUMAIABKGQAAAxAIQMAYAAKGQAAA1DIAAAYgEIGAMAAFDIAAAagkAEAMACFDACAAShkAAAMQCEDAGAA2wv5wIEDGjBgQLOvjRs3SpIKCgqUlJSkqKgoxcXFKSMjw+bEAAB4no/dAfbs2aNu3bpp27Zt8vLyco1fcMEFqqqqUnJysn76059q4cKF2rlzpxYuXKhevXopMTHRxtQAAHiW7YW8d+9ehYWFqXfv3s3m1q9fL4fDofT0dPn4+Cg8PFwlJSVavXo1hQwAOK/Yvst6z5496t+/f4tzeXl5iomJkY/Pt+8bYmNjVVRUpMrKyo6KCABAuzNiCzkoKEhTpkxRcXGx+vbtq3vvvVejRo1SeXm5nE6n2/ObtqRLS0sVGBjY4prx8fGtfr+ysjKFhoZ67gcAAMADbC3k+vp6FRcXy8/PT6mpqfL399fbb7+tmTNnau3ataqtrZXD4XB7Tbdu3SRJdXV1dkQG2l0P/65qbLTk7e31408+DZ5cC0D7sbWQHQ6HduzYIR8fH1fxDhkyRPv27VNGRoZ8fX1VX1/v9pqmIvb392913ZycnFbnfmjrGTCBn8NH3t5eyty6V4eqas5qraAAf00e6/zxJwKwne27rFsqVqfTqY8++kghISGqqKhwm2t6HBwc3CH5ALscqqpR6eETdscA0EFsPalr9+7dio6OVl5entv4F198of79+ysmJkb5+flqaGhwzeXm5iosLKzV48cAAJyLbC1kp9OpK6+8UgsXLlReXp727dunJ554Qjt37tTdd9+txMREHT9+XGlpaSosLFR2drbWr1+vlJQUO2MDAOBxtu6y9vb21qpVq/T0009rzpw5qq6u1uDBg7V27VoNGDBAkrRmzRotXrxYCQkJCgoKUmpqqhISEuyMDQCAx9l+DPmiiy7S448/3up8ZGSkMjMzOzARAAAdr027rHfs2KETJ1o+2aS6ulq///3vzyoUAACdTZsK+Y477tC+fftanNu1a5fmz59/VqEAAOhsTnuX9bx581RWViZJsixL6enp6tGjR7PnFRcX6+KLL/ZcQgAAOoHT3kK+8cYbZVmWLMtyjTU9bvry9vZWVFSUnnjiiXYJCwDA+eq0t5DHjBmjMWPGSJKmTp2q9PR0hYeHt1swAAA6kzadZf3KK694OgcAAJ1amwr55MmTWrVqld577z2dPHlSjY2NbvNeXl7atm2bRwICANAZtKmQFy9erKysLI0YMUKDBg2St7ftt1UGAOCc1qZC/vOf/6wHHnhAd911l6fzAADQKbVp0/brr79WZGSkp7MAANBptamQr7vuOn3wwQeezgIAQKfVpl3W48eP14IFC3TkyBENHTpUfn5+zZ5z6623nm02AAA6jTYV8pw5cyRJmzZt0qZNm5rNe3l5UcgAAJyBNhVyTk6Op3MAANCptamQL730Uk/nAACgU2tTIa9cufJHn3Pfffe1ZWkAADoljxdyjx491Lt3bwoZAIAz0KZC3r17d7Oxmpoa5efnKz09Xb/+9a/POhgAAJ2Jx6556e/vr1GjRmnWrFn6zW9+46llAQDoFDx+EerQ0FDt27fP08sCAHBea9Mu65ZYlqWysjKtXr2as7ABADhDbSrkgQMHysvLq8U5y7LYZQ0AwBlqUyHPmjWrxULu0aOH4uLi1K9fv7PNBQBAp9KmQp49e7ancwAA0Km1+RhyfX29srOztX37dlVXVysgIEDDhw9XQkKCunXr5smMAACc99pUyNXV1brjjju0e/duXXLJJQoKClJRUZG2bNmiV199Va+99pouuOACT2cFAOC81aaPPS1ZskTl5eXasGGD/vKXvygzM1N/+ctftGHDBlVWVmrFihWezgkAwHmtTYWck5OjOXPmaPjw4W7jw4cP1/33368///nPHgkHAEBn0aZCPnHihC6//PIW5y6//HIdPXr0bDIBANDptK
mQr7jiCr333nstzuXk5Khv375nFQoAgM6mTSd1zZgxQw8++KDq6+t1yy236OKLL9bhw4f1zjvvaOPGjUpPT/dwTAAAzm9tKuTx48eruLhYq1at0saNG13jXbt21axZszR58mSPBQQAoDNoUyHX1NTo3nvvVVJSknbu3Kljx46prKxMkydP1oUXXujpjAAAnPfO6BhyQUGBbr31Vq1bt06S1LNnT40ePVqjR4/W8uXLNWXKFO70BABAG5x2If/zn//UnXfeqWPHjql///5ucw6HQ4888ohOnDihKVOmqLy83ONBAQA4n512Ib/00ksKCAjQ7373O40bN85tzs/PT0lJScrKypK/v79WrVrVpjBFRUWKjo5Wdna2a6ygoEBJSUmKiopSXFycMjIy2rQ2AAAmO+1Czs3N1S9+8Qv16tWr1ecEBgYqOTlZubm5Zxzk1KlTevjhh1VTU+Maq6qqUnJysvr166esrCzNnj1bK1asUFZW1hmvDwCAyU77pK5Dhw6d1ueLnU5nm3ZZP/vss+revbvb2JtvvimHw6H09HT5+PgoPDxcJSUlWr16tRITE8/4ewAAYKrT3kK+6KKLVFFR8aPPO3LkyA9uRbdkx44dyszM1FNPPeU2npeXp5iYGPn4fPu+ITY2VkVFRaqsrDyj7wEAgMlOews5JiZG2dnZmjBhwg8+b9OmTRo0aNBpB6iurlZqaqp+9atfKTQ01G2uvLxcTqfTbax3796SpNLSUgUGBra4Znx8fKvfr6ysrNn3AQDAbqe9hTx16lRt375dTz75pOrq6prN19fX66mnntKHH36o22+//bQDpKenKyoqSrfcckuzudraWjkcDrexpnstt5QBAIBz1WlvIUdERGj+/Pl6/PHHtXnzZl177bW67LLL1NDQoNLSUm3fvl1VVVX65S9/qVGjRp3Wmps2bVJeXp7eeeedFud9fX1VX1/vNtZUxP7+/q2um5OT0+rcD209AwBglzO6Utftt9+ugQMHKiMjQzk5Oa5y7N69u6677jpNnz5dQ4cOPe31srKyVFlZqbi4OLfxBQsWKCMjQ5dcckmz49ZNj4ODg88kOgAARjvjS2deffXVuvrqqyV987Ekb2/vNl8u8+mnn1Ztba3b2Lhx43T//fdr/Pjx+v3vf6833nhDDQ0N6tKli6RvPn4VFhbW6vFjAADORW26/WKTgICAs7p2dXBwsPr27ev2JX3zeeZLL71UiYmJOn78uNLS0lRYWKjs7GytX79eKSkpZxMbAADjnFUht7fAwECtWbNGRUVFSkhI0MqVK5WamqqEhAS7owEA4FFtuttTe9qzZ4/b48jISGVmZtqUBgCAjmH0FjIAAJ0FhQwAgAEoZAAADEAhAwBgAAoZAAADUMgAABiAQgYAwAAUMgAABqCQAQAwAIUMAIABKGQAAAxAIQMAYAAKGQAAA1DIAAAYgEIGAMAAFDIAAAagkAEAMACFDACAAShkAAAMQCEDAGAAChkAAANQyAAAGIBCBs5jPfy7qrHR8shanloHQMt87A4AoP34OXzk7e2lzK17daiqps3rBAX4a/JYpweTAfg+ChnoBA5V1aj08Am7YwD4AeyyBgDAABQyAAAGoJABADAAhQwAgAEoZAAADEAhAwBgAAoZAAADUMgAABiAQgYAwAAUMgAABrC9kCsrKzV37lzFxsYqOjpad911lwoLC13zBQUFSkpKUlRUlOLi4pSRkWFjWgAA2ofthXzPPffon//8p1avXq233npLvr6+uvPOO3Xy5ElVVVUpOTlZ/fr1U1ZWlmbPnq0VK1YoKyvL7tgAAHiUrTeXqKqq0mWXXaZ77rlHV155pSTp3nvv1b//+7/rH//4h3Jzc+VwOJSeni4fHx+Fh4erpKREq1evVmJiop3RAQDwKFu3kAMCArR06VJXGR8+fFgZGRkKCQlR//79lZeXp5iYGPn4fPu+ITY2VkVFRaqsrLQrNgAAHmfM7Rd//etf680335TD4d
ALL7wgf39/lZeXy+l0vwdr7969JUmlpaUKDAy0IyoAAB5nTCFPmzZNkydP1uuvv65Zs2bptddeU21trRwOh9vzunXrJkmqq6trda34+PhW58rKyhQaGuqZ0AAAeIgxhdy/f39J0qJFi7Rz505t2LBBvr6+qq+vd3teUxH7+/t3eEYAANqLrYVcWVmp3Nxc/du//Zu6dOkiSfL29lZ4eLgqKioUEhKiiooKt9c0PQ4ODm513ZycnFbnfmjrGQAAu9h6UldFRYUeeughffzxx66xU6dOadeuXQoPD1dMTIzy8/PV0NDgms/NzVVYWBjHjwEA5xVbC3ngwIG67rrrtHDhQuXl5Wnv3r2aN2+eqqurdeeddyoxMVHHjx9XWlqaCgsLlZ2drfXr1yslJcXO2AAAeJythezl5aXly5crNjZWc+bM0aRJk3Ts2DG9+uqruuSSSxQYGKg1a9aoqKhICQkJWrlypVJTU5WQkGBnbAAAPM72k7ouuOACpaenKz09vcX5yMhIZWZmdmwoAAA6mO2XzgQAABQyAABGoJABADAAhQwAgAEoZAAADEAhAwBgAAoZAAADUMgAABiAQgYAwAAUMgAABqCQAQAwAIUMAIABKGQAAAxAIQMAYAAKGQAAA1DIAAAYgEIGAMAAFDIAAAagkAEAMACFDACAAShkAAAMQCEDAGAAChkAAANQyAAAGIBCBgDAABQyAAAGoJABADAAhQygwzU2WkauBdjJx+4AADofb28vZW7dq0NVNWe1TlCAvyaPdXooFWAvChmALQ5V1aj08Am7YwDGYJc1AAAGoJAB/Kge/l05Vgu0M3ZZA/hRfg4fjx33dfYJ0LjYvh5KBpw/KGQAp80Tx32Devl5KA1wfmGXNQAABrC9kI8ePapHH31Uo0eP1rBhw/Tzn/9ceXl5rvmCggIlJSUpKipKcXFxysjIsDEtAADtw/ZCfvDBB/XZZ59p6dKleuutt3TVVVdpxowZ2rdvn6qqqpScnKx+/fopKytLs2fP1ooVK5SVlWV3bAAAPMrWY8glJSX661//qtdff13Dhg2TJKWlpemDDz7Qli1b5OvrK4fDofT0dPn4+Cg8PFwlJSVavXq1EhMT7YwOAIBH2bqFHBAQoJdeeklDhgxxjXl5ecmyLB07dkx5eXmKiYmRj8+37xtiY2NVVFSkyspKOyIDANAubC3knj176vrrr5fD4XCNvfvuu/rqq6903XXXqby8XCEhIW6v6d27tySptLS0Q7MCANCejPrYU35+vh555BHFx8drzJgxeuKJJ9zKWpK6desmSaqrq2t1nfj4+FbnysrKFBoa6pnAAAB4iO0ndTXZtm2bZsyYocjISC1dulSS5Ovrq/r6erfnNRWxv79/h2cEYBZPX0GMq5HBTkZsIW/YsEGLFy/W2LFj9fTTT7u2ikNCQlRRUeH23KbHwcHBra6Xk5PT6twPbT0DOLd48gpi3DkKdrO9kF977TUtWrRIU6dO1SOPPCJv72832mNiYvTGG2+ooaFBXbp0kSTl5uYqLCxMgYGBdkUGYBjuHIXzga27rIuKivT4449r7NixSklJUWVlpQ4dOqRDhw7pX//6lxITE3X8+HGlpaWpsLBQ2dnZWr9+vVJSUuyMDQCAx9m6hfynP/1Jp06d0tatW7V161a3uYSEBD355JNas2aNFi9erISEBAUFBSk1NVUJCQk2JQYAoH3YWsh333237r777h98TmRkpDIzMzsoEb6vsdGSt7eX3TEA4Lxn+zFkmI1b7gFAx6CQ8aO45R4AtD9jPocMAEBnRiEDAGAAChkAAANQyAAAGIBCBgDAABQyAAAGoJABADAAhQwAgAEoZAAADEAhAwBgAAoZAAADUMgAABiAQgYAwAAUMgAABqCQAQAwAIUMAIABKGQAAAxAIQMAYAAKGQAAA1DIAAAYgEIGAEk9/LuqsdHy2HqeXAudg4/dAQDABH4OH3l7eylz614dqqo5q7WCAvw1eazTQ8
nQWVDIAPAdh6pqVHr4hN0x0AmxyxoAAANQyAAAGIBCBgDAABQyAAAGoJABADAAhQwAgAEoZAAADEAhnyVPXY2Hq/oAQOfGhUHOkieu7MNVfQAAFLIHcGUfAMDZYpc1AAAGMKqQn3/+eU2dOtVtrKCgQElJSYqKilJcXJwyMjJsSgcAQPsxppDXrVunZ555xm2sqqpKycnJ6tevn7KysjR79mytWLFCWVlZNqUEAKB92H4M+eDBg0pLS1N+fr7CwsLc5t588005HA6lp6fLx8dH4eHhKikp0erVq5WYmGhTYgAAPM/2LeS///3vuvDCC/X2229r6NChbnN5eXmKiYmRj8+37xtiY2NVVFSkysrKjo4KAEC7sX0LecyYMRozZkyLc+Xl5XI63T8O1Lt3b0lSaWmpAgMDW3xdfHx8q9+vrKxMoaGhbUwLAED7sH0L+YfU1tbK4XC4jXXr1k2SVFdXZ0ckAOhQnrxoEBcgMpvtW8g/xNfXV/X19W5jTUXs7+/f6utycnJanfuhrWcAMI0nLj4kcQGic4HRhRwSEqKKigq3sabHwcHBdkQCgA7HxYc6B6N3WcfExCg/P18NDQ2usdzcXIWFhbV6/BgAgHOR0YWcmJio48ePKy0tTYWFhcrOztb69euVkpJidzQAADzK6EIODAzUmjVrVFRUpISEBK1cuVKpqalKSEiwOxoAAB5l1DHkJ598stlYZGSkMjMzbUgDAEDHMXoLGQCAzoJCBgDAABTyeYgP/wPAuceoY8jwDE9dSMDZJ0DjYvt6KBUA4IdQyOcpT1xIIKiXn4fSAAB+DLusAQAwAIUMAIABKGQA8LAe/l05uRJnjGPIAOBhfg4fj5xcyYmVnQuFDADt5GxPruTEys6FXdYAABiAQgYAwAAUMgAABqCQAQAwAIUMAIABKGQAAAxAIQMAYAAK2QBc1QcAwIVBDOCpq/pIXNkHAM5VFLJBuGUiAHRe7LIGAMAAFDIAAAagkAEAMACFDACAAShkAAAMQCEDAGAAChkAAANQyAAAGIBCBgDAABQyAAAGoJABADAAhQwAnYCn7yrHHeo8j5tLAEAn4Mm7ygUF+GvyWKeHkqEJhQwAnYgn7iqH9sEuawAADHBOFHJjY6OeeeYZjRo1SkOHDtX06dNVUlJidywAADzmnCjk559/Xm+88YYee+wxZWZmysvLSzNnzlR9fb3d0QCg0zH1BDETM50J448h19fX6+WXX9bcuXN1/fXXS5KWLVumUaNGaevWrZowYYLNCQGgc/HkCWLOPgEaF9v3rNfy1DqSfSetGV/Iu3fv1okTJxQbG+sa69mzpwYPHqwdO3ZQyABgE0+cIBbUy88ja3lqHTt5WZZl9IfJ/vznP2v27Nn67LPP5Ovr6xr/5S9/qdraWr344ovNXhMfH9/qevv371eXLl0UGhrqsYwnTp5Sw1ns3ujq4y2/bj5nvU5nWMvETJ1hLRMzdYa1TMxk6lqezNTF20vd/bqe1RrfFRoaqg0bNvzo84zfQj558qQkyeFwuI1369ZNx44dO+P1vLy85ONzdj92WVmZJLlK3VP/4Tz5B9DaWt/PfjZrtcXZrNWU3S801JhMp7sWv/fmOmqtM/3dm/QzlpWV6ajO7O/mx5zvv3dPrFNWVqbqo579vZ8O4wu5aau4vr7ebQu5rq5Ofn5+Lb4mJyenXTM1bYG39/dpD2S3B9ntcy7nJ7s97Mpu/FnWTe9QKioq3MYrKioUEhJiRyQAADzO+EIeOHCgevTooe3bt7vGqqurtWvXLg0fPtzGZAAAeI7xu6wdDoeSkpL09NNP66KLLtKll16q3/72twoJCdHYsWPtjgcAgEcYX8iSdP/99+vrr7/Wr371K9XW1iomJkYZGRnNTvQCAOBcdU4UcpcuXTR37lzNnTvX7igAALQL448hAwDQGRh/YRAAADoDtpABADAAhQwAgAEoZAAADEAhAwBgAAr5DDQ2NuqZZ57RqFGjNHToUE2fPl
0lJSV2x/pRzz//vKZOneo2VlBQoKSkJEVFRSkuLk4ZGRk2pWvu6NGjevTRRzV69GgNGzZMP//5z5WXl+eaNzl7ZWWl5s6dq9jYWEVHR+uuu+5SYWGha97k7N9VVFSk6OhoZWdnu8ZMz37gwAENGDCg2dfGjRslmZ9/06ZNGj9+vCIiIjRhwgS9++67rjlTs2/fvr3F3/mAAQNc14M2NbsknTp1SsuWLVNcXJyio6M1ZcoUffLJJ675Ds9u4bQ9++yz1rXXXmu9//77VkFBgTV9+nRr7NixVl1dnd3RWrV27VprwIABVlJSkmvsyJEj1jXXXGOlpaVZhYWF1ltvvWVFRERYb731lo1Jv5WcnGxNnDjR2rFjh7Vv3z5r0aJFVmRkpFVYWGh89kmTJlmTJ0+2Pv/8c6uwsNCaPXu2NXLkSKumpsb47E3q6+ut2267zXI6nVZWVpZlWeb/zViWZeXk5FgRERHWwYMHrYqKCtfXyZMnjc+/adMma9CgQda6deus4uJia+XKldbAgQOtTz75xOjsdXV1br/riooK66OPPrIGDx5svfnmm0ZntyzLWrFihTVy5Ejrww8/tIqLi620tDRr2LBhVnl5uS3ZKeTTVFdXZ0VHR1uvvfaaa+zYsWNWZGSktWXLFhuTtay8vNyaMWOGFRUVZd10001uhbxq1Spr1KhR1qlTp1xjS5YssW688UY7oropLi62nE6nlZ+f7xprbGy0xo4day1fvtzo7EeOHLEeeOABa+/eva6xgoICy+l0Wp999pnR2b9ryZIl1tSpU90K+VzI/sILL1gTJ05scc7k/I2NjdYNN9xgPfnkk27j06dPt1atWmV09u+rr6+3JkyYYM2ZM8eyLLN/75ZlWRMnTrSeeOIJ1+N//etfltPptP74xz/akp1d1qdp9+7dOnHihGJjY11jPXv21ODBg7Vjxw4bk7Xs73//uy688EK9/fbbGjp0qNtcXl6eYmJi3O4LHRsbq6KiIlVWVnZ0VDcBAQF66aWXNGTIENeYl5eXLMvSsWPHjM++dOlSXXnllZKkw4cPKyMjQyEhIerfv7/R2Zvs2LFDmZmZeuqpp9zGz4Xse/bsUf/+/VucMzn/l19+qQMHDuiWW25xG8/IyFBKSorR2b/v1VdfVVlZmebPny/J7N+7JPXq1Uvvvfee9u/fr4aGBmVmZsrhcGjQoEG2ZKeQT1N5ebmk5jes7t27t+tG3CYZM2aMlixZossvv7zZXHl5ebNbV/bu3VuSVFpa2iH5WtOzZ09df/31btcpf/fdd/XVV1/puuuuMzr7d/3617/WyJEj9cc//lGLFy+Wv7+/8dmrq6uVmpqqX/3qV83+zk3PLkl79+5VZWWlpkyZop/85Cf6+c9/rg8//FCS2fmLi4slSTU1NZoxY4auvfZaTZo0SX/5y18kmZ39u+rq6rRq1SpNmzbNlc/07GlpafLx8VF8fLwiIiK0bNkyLV++XH369LElO4V8mk6ePClJzW5o0a1bN9XV1dkRqc1qa2tb/DkkGfez5Ofn65FHHlF8fLzGjBlzzmSfNm2asrKyNHHiRM2aNUt///vfjc+enp6uqKioZltqkvl/M/X19SouLtbx48c1Z84cvfTSS4qIiNDMmTOVm5trdP7jx49LkubNm6ebb75ZL7/8skaOHKl7773X+OzftXnzZtXV1bmdQGp69n379qlnz5567rnnlJmZqdtuu03z5s3T7t27bcl+TtxcwgS+vr6Svvk/ftP/lr75D+Pn52dXrDbx9fVVfX2921jTH5i/v78dkVq0bds2Pfzwwxo6dKiWLl0q6dzJ3rTrdNGiRdq5c6c2bNhgdPZNmzYpLy9P77zzTovzJmeXvnmjvGPHDvn4+Lj+ER0yZIj27dunjIwMo/N37dpVkjRjxgwlJCRIkgYNGqRdu3Zp7dq1Rmf/rk2bNmncuHEKCAhwjZmc/cCBA5o7d67WrVun4cOHS5IiIiJUWFioZ5991pbsbC
GfpqZdeBUVFW7jFRUVzXZrmC4kJKTFn0OSgoOD7YjUzIYNGzR79myNHj1aq1evdr0JMjl7ZWWltmzZooaGBteYt7e3wsPDXX8npmbPyspSZWWl6+Mf0dHRkqQFCxZowoQJRmdv4u/v32yLxul06uDBg0bnb/r3w+l0uo33799f+/fvNzp7kyNHjujTTz/V+PHj3cZNzv7555/r1KlTioiIcBsfOnSoiouLbclOIZ+mgQMHqkePHtq+fbtrrLq6Wrt27XK9uzpXxMTEKD8/3604cnNzFRYWpsDAQBuTfeO1117TokWLdPvtt2v58uVu/8ianL2iokIPPfSQPv74Y9fYqVOntGvXLoWHhxud/emnn9Yf/vAHbdq0yfUlfXMv8pdeesno7NI3J11GR0e7fV5dkr744gv179/f6PyDBw9W9+7d9dlnn7mN7927V3369DE6e5NPPvlEXl5eGjFihNu4ydmbNrL27NnjNr5371717dvXnuztdv72eWjp0qXWiBEjrG3btrk+hzxu3DijP4dsWZY1b948t489HT582IqJibHmzZtn/eMf/7CysrKsiIgIKzs728aU3/jyyy+tq666ypo1a1azzzdWV1cbnb2xsdGaPn26deONN1o7duyw9uzZYz3wwANWTEyMdeDAAaOzt+S7H3syPXtDQ4M1adIk6+abb7Z27NhhFRYWWo8//rg1ZMgQa/fu3cbnf+6556zo6GjrnXfesUpKSqznn3/eGjhwoPW3v/3N+OyW9c01GsaNG9ds3OTsDQ0N1pQpU6ybbrrJys3NtYqKiqxly5ZZgwYNsj799FNbslPIZ+Drr7+2fvOb31ixsbFWVFSUNXPmTOuf//yn3bF+1PcL2bIs67PPPrN+9rOfWUOGDLFuuOEG65VXXrEpnbsXXnjBcjqdLX7NmzfPsixzs1uWZVVXV1sLFiywRo4caUVGRlrTp093+1yyydm/77uFbFnmZ6+srLTmz59vjRw50oqIiLAmT55s7dixwzVvev6XX37ZGjNmjHXVVVdZEydOtLZu3eqaMz37ggULrJ/97Gctzpmc/ejRo1Z6eroVFxdnRUdHW5MnT7a2b9/umu/o7NwPGQAAA3AMGQAAA1DIAAAYgEIGAMAAFDIAAAagkAEAMACFDACAAShkAAAMQCEDAGAAChkAAANQyAAAGIBCBgDAABQyYKDa2lotWbJE48aN05AhQzRs2DAlJyeroKDA9Zzf/e53Gj9+vCIiIjRx4kTl5uZq8ODBys7Odj2ntLRUDz74oEaMGKGhQ4dq2rRp2rVr1xnnmTp1qh599FG98MILGjVqlIYOHaqZM2fq8OHDysrK0tixYxUdHa0777xT+/fvd3vttm3bdNtttykiIkIjR47UY489ppqammbPmTJliqKjozVkyBDddNNN2rBhg2t++/btGjBggHJzczV9+nQNHTpUP/nJT/TUU0/p66+/PuOfBzARN5cADHT//fdrx44deuihh9SnTx8VFxdrxYoVuuCCC/Tuu+9q8+bNmjdvniZNmqQbb7xRn3/+udasWaOamho98cQTuu2223TkyBHdeuut8vPz03333Sc/Pz+tX79eX3zxhd566y2Fh4efdp6pU6dq165dGjx4sGbOnKnS0lItWrRIl19+uXx9fXXffffp6NGjWrx4sWJiYvTSSy9Jkt555x09/PDDuuWWWzRx4kQdOHBAy5Yt0+DBg7V27Vp5eXnp/fffV0pKiu644w6NGTNGtbW12rBhgz766CO9/vrrGjZsmLZv36477rhDF198saZMmaJhw4bp/fff17p167Rw4UL953/+Z3v9pwA6TrveSwrAGaurq7OmT59u/f73v3cbf/nlly2n02kdPHjQiouLs1JSUtzmX3zxRbdbJi5dutSKiIiw9u/f77Z2fHy8NXv27DPKlJSUZEVERFhHjx51jU2fPt1yOp3WV1995Rr77//+b+vqq6+2LOub+0OPHj3amjFjhtta//u//2
s5nU7rvffesyzLslavXm2lpqa6PaeqqspyOp3WqlWrLMuyrL/97W+W0+m0li1b5va8MWPGNPs9AOcqH7vfEABw53A4lJGRIUmqqKhQSUmJvvzyS7333nuSpOLiYpWWluqXv/yl2+smTJigJUuWuB7n5uZq0KBBCg4Odu3W9fb21ujRo/X222+fca7w8HBdeOGFrsdBQUG66KKLdPnll7vGevXqpX/961+SpC+//FLl5eVKSUlx260cExOjHj166K9//avi4uL0i1/8QpJUU1Ojr776SkVFRfq///s/SdKpU6fcMkRHR7s9DgkJabb7GzhXUciAgT788EM9/vjj+vLLL9W9e3cNGDBA3bt3lyR17dpVkhQYGOj2mqCgILfHR48eVUlJia666qoWv8fJkyfl5+d32pl69OjRbOyHXn/06FFJ0sKFC7Vw4cJm8xUVFZKkI0eOaMGCBdq2bZu8vLzUt29fXX311ZIk63tH1Hx9fd0ee3t7N3sOcK6ikAHDfPXVV5o1a5bi4+P14osvqk+fPpKkV199VR9++KEaGhokSZWVlW6v+/7jCy64QCNGjFBqamqL38fhcLRD+m/17NlTkpSamqoRI0Y0m2/a2n744Ye1b98+rV27VsOGDZPD4dDJkye1cePGds0HmIazrAHDfPHFF6qrq1NKSoqrjKVvtpolqXfv3urTp4+2bt3q9ro//elPbo9HjBihoqIihYWFKSIiwvX19ttva+PGjerSpUu7/hxXXHGFAgMDtX//frfvHxISoiVLlrjO9s7Pz9eNN96o2NhY15uEDz74QJLU2NjYrhkBk7CFDBjmqquuko+Pj377299q+vTpqq+vV3Z2tt5//31J3+xqvv/++/Xwww9rwYIFGjt2rHbv3q3nnntO0je7cSXpzjvv1ObNm3XnnXdq+vTpCggI0B/+8Ae9+eabmj9/frv/HF26dNEDDzygRx99VF26dNENN9yg6upqPf/88zp48KBrV3pkZKTeeecdXXXVVQoJCdGnn36qF198UV5eXjp58mS75wRMQSEDhunbt6+WLFmilStX6p577tGFF16oqKgovfLKK5o6dary8vJ0++23q6amRhkZGcrKytKVV16ptLQ0paWlyd/fX5IUHBysN954Q0uWLFF6errq6urUr18/LV68WP/xH//RIT/LpEmT1L17d61Zs0aZmZny9/fXsGHD9PTTT7tOBnvyySe1aNEiLVq0SJLUr18/LVy4UG+//bby8vI6JCdgAj6HDJyDtmzZosGDB+uKK65wjTV9nnfz5s0aOHCgjekAtAWFDJyD7rrrLu3bt09z5sxRaGioiouL9cwzz6hv37565ZVXTmuNxsbG0zpG26VLF3l5eZ1tZAA/gkIGzkFVVVVasmSJPvjgAx05ckQXX3yxbrzxRt1///2uj0f9mGeffVYrV6780ef9z//8j6655pqzjQzgR1DIQCd18OBB12eBf0hYWFiLn0EG4FkUMgAABuBzyAAAGIBCBgDAABQyAAAGoJABADAAhQwAgAEoZAAADEAhAwBgAAoZAAAD/D9vwU8D7vvAfQAAAABJRU5ErkJggg==",
"text/plain": [
"