{ "cells": [ { "cell_type": "markdown", "id": "0128ad0b", "metadata": {}, "source": [ "# Data preparation\n", "\n", "* From **business understanding**, we know the task to be solved. \n", "* Then we do **data understanding** to explore the data.\n", "* Now we apply the data transformations that are necessary or useful for that task.\n", "\n", "## Outline\n", "0. Summary of data understanding\n", "1. Missing and invalid data\n", "2. Feature extraction\n", "3. Making different statistical units\n", "4. Data transformation\n", "\n", "## Data and tasks\n", "* Titanic2 (*titanic_train.csv*) - data preparation for an analysis of ticket fares\n", "* Home Credit (*application_train.csv*) - segmentation of clients by family situation" ] }, { "cell_type": "code", "execution_count": 1, "id": "2fb31a8e", "metadata": {}, "outputs": [], "source": [ "# setup\n", "import pandas as pd\n", "import numpy as np\n", "import matplotlib.pyplot as plt\n", "import seaborn as sns\n", "\n", "sns.set_theme(style=\"ticks\", color_codes=True)" ] }, { "cell_type": "markdown", "id": "465ee695", "metadata": {}, "source": [ "## Part I. Titanic and ticket fares\n", "### Summary of data understanding\n", "Just a few facts from the exploration -- enough for the aim of this exercise.\n", "\n", "Let's consider these columns only: *passenger_id*, *pclass*, *sex*, *age*, *ticket*, *fare*, *cabin*, *embarked*" ] }, { "cell_type": "code", "execution_count": 2, "id": "2a75a9ce", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\"><th></th><th>passenger_id</th><th>ticket</th><th>pclass</th><th>fare</th><th>sex</th><th>age</th><th>cabin</th><th>embarked</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>1216</td><td>335432</td><td>3</td><td>7.7333</td><td>female</td><td>NaN</td><td>NaN</td><td>Q</td></tr>\n",
"    <tr><th>1</th><td>699</td><td>315089</td><td>3</td><td>8.6625</td><td>male</td><td>38.0</td><td>NaN</td><td>S</td></tr>\n",
"    <tr><th>2</th><td>1267</td><td>345773</td><td>3</td><td>24.1500</td><td>female</td><td>30.0</td><td>NaN</td><td>S</td></tr>\n",
"    <tr><th>3</th><td>449</td><td>29105</td><td>2</td><td>23.0000</td><td>female</td><td>54.0</td><td>NaN</td><td>S</td></tr>\n",
"    <tr><th>4</th><td>576</td><td>28221</td><td>2</td><td>13.0000</td><td>male</td><td>40.0</td><td>NaN</td><td>S</td></tr>\n",
"    <tr><th>...</th><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td></tr>\n",
"    <tr><th>845</th><td>158</td><td>680</td><td>1</td><td>50.0000</td><td>male</td><td>55.0</td><td>C39</td><td>S</td></tr>\n",
"    <tr><th>846</th><td>174</td><td>11771</td><td>1</td><td>29.7000</td><td>male</td><td>58.0</td><td>B37</td><td>C</td></tr>\n",
"    <tr><th>847</th><td>467</td><td>244367</td><td>2</td><td>26.0000</td><td>female</td><td>24.0</td><td>NaN</td><td>S</td></tr>\n",
"    <tr><th>848</th><td>1112</td><td>SOTON/O.Q. 3101315</td><td>3</td><td>13.7750</td><td>female</td><td>3.0</td><td>NaN</td><td>S</td></tr>\n",
"    <tr><th>849</th><td>425</td><td>250647</td><td>2</td><td>13.0000</td><td>male</td><td>52.0</td><td>NaN</td><td>S</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"<p>850 rows × 8 columns</p>\n",
"</div>
" ], "text/plain": [ " passenger_id ticket pclass fare sex age cabin \\\n", "0 1216 335432 3 7.7333 female NaN NaN \n", "1 699 315089 3 8.6625 male 38.0 NaN \n", "2 1267 345773 3 24.1500 female 30.0 NaN \n", "3 449 29105 2 23.0000 female 54.0 NaN \n", "4 576 28221 2 13.0000 male 40.0 NaN \n", ".. ... ... ... ... ... ... ... \n", "845 158 680 1 50.0000 male 55.0 C39 \n", "846 174 11771 1 29.7000 male 58.0 B37 \n", "847 467 244367 2 26.0000 female 24.0 NaN \n", "848 1112 SOTON/O.Q. 3101315 3 13.7750 female 3.0 NaN \n", "849 425 250647 2 13.0000 male 52.0 NaN \n", "\n", " embarked \n", "0 Q \n", "1 S \n", "2 S \n", "3 S \n", "4 S \n", ".. ... \n", "845 S \n", "846 C \n", "847 S \n", "848 S \n", "849 S \n", "\n", "[850 rows x 8 columns]" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# data reading\n", "df1 = pd.read_csv('titanic_train.csv')\n", "df1 = df1[['passenger_id', 'ticket', 'pclass', 'fare', 'sex', 'age', 'cabin', 'embarked']]\n", "df1" ] }, { "cell_type": "code", "execution_count": 3, "id": "93fa11a8", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "passenger_id 0.000000\n", "ticket 0.000000\n", "pclass 0.000000\n", "fare 0.001176\n", "sex 0.000000\n", "age 0.204706\n", "cabin 0.775294\n", "embarked 0.001176\n", "dtype: float64\n" ] } ], "source": [ "# share of missing data (NaN, NULL) by columns\n", "print(1 - df1.count()/len(df1))" ] }, { "cell_type": "markdown", "id": "1003360b", "metadata": {}, "source": [ "* *ticket*, *pclass* and *sex* are complete\n", "* *fare* and *embarked* have negligible counts of missing data\n", "* *age* and *cabin* have significant counts of missing data" ] }, { "cell_type": "code", "execution_count": 4, "id": "26a4f014", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "pclass\n", "3 478\n", "1 206\n", "2 166\n", "Name: count, dtype: int64\n", "sex\n", "male 551\n", "female 299\n", "Name: count, dtype: int64\n", 
"embarked\n", "S 589\n", "C 176\n", "Q 84\n", "Name: count, dtype: int64\n", "ticket\n", "CA. 2343 10\n", "1601 8\n", "S.O.C. 14879 6\n", "CA 2144 6\n", "PC 17608 6\n", "Name: count, dtype: int64\n", "cabin\n", "G6 4\n", "D 4\n", "B96 B98 4\n", "C22 C26 4\n", "B57 B59 B63 B66 4\n", "Name: count, dtype: int64\n" ] } ], "source": [ "# invalid values in data?\n", "# frequency tables of categorical columns\n", "print(df1['pclass'].value_counts())\n", "print(df1['sex'].value_counts())\n", "print(df1['embarked'].value_counts())\n", "# the most frequent values in string columns\n", "print(df1['ticket'].value_counts().sort_values(ascending=False)[:5])\n", "print(df1['cabin'].value_counts().sort_values(ascending=False)[:5])" ] }, { "cell_type": "markdown", "id": "965f233b", "metadata": {}, "source": [ "> String columns (*ticket*, *cabin*) have the expected frequencies -- no single value is suspiciously frequent. \n", "> Categorical columns seem to have valid values.\n", "\n", "Let's look at the numeric columns (*age*, *fare*)." 
] }, { "cell_type": "code", "execution_count": 5, "id": "fdb1a868", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Fare: minimum= 0.0 ; maximum= 512.3292 ; median= 14.1083\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "C:\\Users\\jhucin\\Anaconda3\\lib\\site-packages\\seaborn\\axisgrid.py:118: UserWarning: The figure layout has changed to tight\n", " self._figure.tight_layout(*args, **kwargs)\n" ] }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAeQAAAHkCAYAAADvrlz5AAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjcuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8pXeV/AAAACXBIWXMAAA9hAAAPYQGoP6dpAAAsN0lEQVR4nO3df1TVdYL/8ddFll+LmBgC49RoEJKpYEFLJeLgMnbSmlhn1qPijD/GrAzTSj2lFY2pzYi/2FYZDYsZc3QU13XcqUmZZqs5HANOthWi4SK1CZI3hVR+iHy+f/jljjfAAC/cN97n4xzOkc/nc+99f95pTz4/uNdmWZYlAADgVl7uHgAAACDIAAAYgSADAGAAggwAgAEIMgAABiDIAAAYgCADAGAAggwAgAEI8rekpaUpLS3N3cMAAHgYb3cPwDSVlZXuHgIAwANxhAwAgAEIMgAABiDIAAAYgCADAGAAggwAgAEIMgAABiDIAAAYgCADAGAAggwAgAEIMgAABiDIAAAYgCADAGAAggwAgAEIMgAABiDIAAAYgCADAGAAggwAgAEIMgAABiDIAAAYwNvdA/Ak298q1emaOknSjf38NfW+aDePCABgCoLcg07X1KnKfsHdwwAAGIhT1gAAGIAgAwBgAIIMAIABCDIAAAYgyAAAGIAgAwBgAIIMAIABCDIAAAYgyAAAGMDtQbbb7Vq0aJESEhI0atQoPfzwwyorK3OsP3LkiNLS0hQbG6uxY8cqJyfH6fHNzc3KyspSYmKiYmJiNGvWLFVUVPT0bgAAcE3cHuRHH31UX3zxhbZs2aLdu3fLz89PM2bMUF1dnc6cOaOZM2dq8ODBysvLU3p6ujZs2KC8vDzH4zdu3KgdO3bopZde0s6dO2Wz2TRnzhw1Nja6ca8AAOgct76X9ZkzZ/T9739fjz76qG699VZJ0mOPPaYf//jH+uyzz1RQUCAfHx9lZGTI29tbERERqqio0JYtWzRp0iQ1NjZq69atWrRokZKSkiRJ69atU2Jiog4cOKAJEya4c/cAAOgwtx4h9+/fX2vXrnXE+PTp08rJyVFYWJgiIyNVVFSk+Ph4eXv//eeGhIQElZeXy263q7S0VOfPn1dCQoJjfVBQkIYNG6bCwsIe3x8AALrKmE97eu655/SHP/xBPj4+2rRpkwICAlRVVaWoqCin7QYOHChJOnnypKqqqiRJ4eHhrbaprKxs97XGjRvX7rrKyspWzwcAQHdz+zXkFj//+c+Vl5enBx98UPPmzdOnn36q+vp6+fj4OG3n6+srSWpoaFBd3eXPFm5rm4aGhp4ZOAAALmDMEXJkZKQkafny5Tp8+LC2bdsmPz+/VjdntYQ2ICBAfn5+kqTGxkbHn1u28ff3b/e18vPz2113taNnAAC6i1uPkO12u/bv369Lly45lnl5eSkiIkLV1dUKCwtTdXW102Navg8NDXWcWm5rm7CwsG4ePQAAruPWIFdXV+upp57SBx984Fh28eJFlZSUKCIiQvH
x8SouLnYKdkFBgYYMGaIBAwYoOjpagYGBOnTokGN9bW2tSkpKFBcX16P7AgDAtXBrkKOjozV69Gi9+OKLKioq0rFjx7RkyRLV1tZqxowZmjRpks6dO6elS5eqrKxMe/bsUW5urubOnSvp8rXjtLQ0ZWZmKj8/X6WlpVq4cKHCwsKUkpLizl0DAKBT3HoN2Wazaf369VqzZo0WLFigb775RnFxcXrjjTf0ve99T5L06quvasWKFUpNTVVISIgWL16s1NRUx3PMnz9fTU1NWrZsmerr6xUfH6+cnJxWN3oBAGAym2VZlrsHYZKWm7quduNXV2Xt/FBV9guSpLABAZo/eZTLXwMA0DsZ82tPAAB4MoIMAIABCDIAAAYgyAAAGIAgAwBgAIIMAIABCDIAAAYgyAAAGIAgAwBgAIIMAIABCDIAAAYgyAAAGIAgAwBgAIIMAIABCDIAAAYgyAAAGIAgAwBgAIIMAIABCDIAAAYgyAAAGIAgAwBgAIIMAIABCDIAAAYgyAAAGIAgAwBgAIIMAIABCDIAAAYgyAAAGIAgAwBgAIIMAIABCDIAAAYgyAAAGIAgAwBgAIIMAIABCDIAAAYgyAAAGIAgAwBgAIIMAIABCDIAAAYgyAAAGIAgAwBgAIIMAIABCDIAAAYgyAAAGIAgAwBgAIIMAIABCDIAAAYgyAAAGIAgAwBgAIIMAIABCDIAAAZwe5DPnj2r559/XmPGjNEdd9yhKVOmqKioyLH+mWee0dChQ52+xowZ41jf3NysrKwsJSYmKiYmRrNmzVJFRYU7dgUAgC7zdvcAnnzySdntdq1du1bBwcHavn27Zs+erT179igiIkJHjx7VI488orS0NMdj+vTp4/jzxo0btWPHDq1atUqhoaFavXq15syZo/3798vHx8cduwQAQKe59Qi5oqJCf/vb3/TCCy8oLi5Ot9xyi5YuXarQ0FDt379fly5dUllZmUaMGKGQkBDHV3BwsCSpsbFRW7duVXp6upKSkhQdHa1169bp1KlTOnDggDt3DQCATnFrkPv376/Nmzdr+PDhjmU2m02WZammpkYnTpxQQ0ODIiIi2nx8aWmpzp8/r4SEBMeyoKAgDRs2TIWFhd0+fgAAXMWtp6yDgoKUlJTktOzNN9/U559/rtGjR+vYsWOy2WzKzc3Vu+++Ky8vLyUlJWnBggXq27evqqqqJEnh4eFOzzFw4EBVVlb22H4AAHCt3H4N+UrFxcV69tlnNW7cOCUnJysrK0teXl4aNGiQsrOzVVFRoV/96lc6duyYcnNzVVdXJ0mtrhX7+vqqpqam3dcZN25cu+sqKytbBR4AgO5mTJAPHjyop59+WjExMVq7dq0kKT09XTNmzFBQUJAkKSoqSiEhIZo8ebI+/vhj+fn5Sbp8Lbnlz5LU0NAgf3//nt8JAAC6yIggb9u2TStWrFBKSooyMzMdR7w2m80R4xZRUVGSpKqqKseRbHV1tW6++WbHNtXV1YqOjm739fLz89tdd7WjZwAAuovbfw95+/btWr58uaZNm6b169c7nX5+6qmnNHv2bKftP/74Y0lSZGSkoqOjFRgYqEOHDjnW19bWqqSkRHFxcT2zAwAAuIBbj5DLy8u1cuVKpaSkaO7cubLb7Y51fn5+mjhxoh599FFt2rRJEyZMUHl5uX75y19q4sSJjjuv09LSlJmZqeDgYA0aNEirV69WWFiYUlJS3LVbAAB0mluD/Oc//1kXL17UgQMHWv3ecGpqql5++WVt2LBB2dnZys7OVt++ffXAAw9owYIFju3mz5+vpqYmLVu2TPX19YqPj1dOTg5vCgIA6FVslmVZ7h6ESVquIV/tOnNXZe38UFX2C5KksAEBmj95lMtfAwDQO7n9GjIAACDIAAAYgSADAGAAggwAgAEIMgAABiDIAAAYgCADAGAAggwAgAEIMgAABiDIAAAYgCADAGAAggwAgAEIMgAABiDIAAAYgCADAGAAggwAgAEIMgAABiD
IAAAYgCADAGAAggwAgAEIMgAABiDIAAAYgCADAGAAggwAgAEIMgAABiDIAAAYgCADAGAAggwAgAEIMgAABiDIAAAYgCADAGAAggwAgAEIMgAABiDIAAAYgCADAGAAggwAgAEIMgAABiDIAAAYgCADAGAAggwAgAEIMgAABiDIAAAYgCADAGAAggwAgAEIMgAABiDIAAAYgCADAGAAggwAgAEIMgAABiDIAAAYgCADAGAAtwf57Nmzev755zVmzBjdcccdmjJlioqKihzrjxw5orS0NMXGxmrs2LHKyclxenxzc7OysrKUmJiomJgYzZo1SxUVFT29GwAAXBO3B/nJJ5/URx99pLVr12r37t26/fbbNXv2bB0/flxnzpzRzJkzNXjwYOXl5Sk9PV0bNmxQXl6e4/EbN27Ujh079NJLL2nnzp2y2WyaM2eOGhsb3bhXAAB0jrc7X7yiokJ/+9vf9Pvf/1533HGHJGnp0qV69913tX//fvn5+cnHx0cZGRny9vZWRESEKioqtGXLFk2aNEmNjY3aunWrFi1apKSkJEnSunXrlJiYqAMHDmjChAnu3D0AADrMrUfI/fv31+bNmzV8+HDHMpvNJsuyVFNTo6KiIsXHx8vb++8/NyQkJKi8vFx2u12lpaU6f/68EhISHOuDgoI0bNgwFRYW9ui+AABwLdwa5KCgICUlJcnHx8ex7M0339Tnn3+u0aNHq6qqSmFhYU6PGThwoCTp5MmTqqqqkiSFh4e32qaysrKbRw8AgOu49ZT1txUXF+vZZ5/VuHHjlJycrFWrVjnFWpJ8fX0lSQ0NDaqrq5OkNrepqalp93XGjRvX7rrKyspWgQcAoLu5/aauFgcPHtTs2bM1cuRIrV27VpLk5+fX6uashoYGSVJAQID8/Pwkqc1t/P39e2DUAAC4hhFHyNu2bdOKFSuUkpKizMxMxxFvWFiYqqurnbZt+T40NFRNTU2OZTfffLPTNtHR0e2+Xn5+frvrrnb0DABAd3H7EfL27du1fPlyTZs2TevXr3c6/RwfH6/i4mJdunTJsaygoEBDhgzRgAEDFB0drcDAQB06dMixvra2ViUlJYqLi+vR/QAA4Fq4Ncjl5eVauXKlUlJSNHfuXNntdn311Vf66quv9M0332jSpEk6d+6cli5dqrKyMu3Zs0e5ubmaO3eupMvXjtPS0pSZman8/HyVlpZq4cKFCgsLU0pKijt3DQCATnHrKes///nPunjxog4cOKADBw44rUtNTdXLL7+sV199VStWrFBqaqpCQkK0ePFipaamOrabP3++mpqatGzZMtXX1ys+Pl45OTmtbvQCAMBkNsuyLHcPwiQt15Cvdp25q7J2fqgq+wVJUtiAAM2fPMrlrwEA6J3cfg0ZAAAQZAAAjECQAQAwAEEGAMAABBkAAAMQZAAADECQAQAwAEEGAMAABBkAAAMQZAAADECQAQAwAEEGAMAABBkAAAMQZAAADECQAQAwAEEGAMAABBkAAAMQZAAADNClIBcWFur8+fNtrqutrdV//dd/XdOgAADwNF0K8s9+9jMdP368zXUlJSV65plnrmlQAAB4Gu+ObrhkyRJVVlZKkizLUkZGhgIDA1ttd+LECd14442uGyEAAB6gw0fI48ePl2VZsizLsazl+5YvLy8vxcbGatWqVd0yWAAArlcdPkJOTk5WcnKyJGn69OnKyMhQREREtw0MAABP0uEgX+l3v/udq8cBAIBH61KQ6+rqlJ2drXfeeUd1dXVqbm52Wm+z2XTw4EGXDBAAAE/QpSCvWLFCeXl5uuuuu3TbbbfJy4tfZwYA4Fp0Kchvv/22Fi5cqIcfftjV4wEAwCN16dC2qalJI0eOdPVYAADwWF0K8ujRo/Xuu++6eiwAAHisLp2yvv/++/XCCy/o66+/VkxMjPz9/Vtt89BDD13r2AAA8BhdCvKCBQskSXv37tXevXtbrbfZbAQZAIBO6FKQ8/PzXT0OAAA8WpeCPGj
QIFePAwAAj9alIL/yyivfuc3jjz/elacGAMAjuTzIgYGBGjhwIEEGAKATuhTk0tLSVssuXLig4uJiZWRk6LnnnrvmgQEA4Elc9p6XAQEBSkxM1Lx58/TrX//aVU8LAIBHcPmbUIeHh+v48eOufloAAK5rXTpl3RbLslRZWaktW7ZwFzYAAJ3UpSBHR0fLZrO1uc6yLE5ZAwDQSV0K8rx589oMcmBgoMaOHavBgwdf67gAAPAoXQpyenq6q8cBAIBH6/I15MbGRu3Zs0eHDh1SbW2t+vfvr7i4OKWmpsrX19eVYwQA4LrXpSDX1tbqZz/7mUpLS/W9731PISEhKi8v1/79+/XGG29o+/bt6tu3r6vHCgDAdatLv/a0Zs0aVVVVadu2bfrLX/6inTt36i9/+Yu2bdsmu92uDRs2uHqcAABc17oU5Pz8fC1YsEBxcXFOy+Pi4jR//ny9/fbbLhkcAACeoktBPn/+vG666aY219100006e/bstYwJAACP06Ug33LLLXrnnXfaXJefn68f/OAH1zQoAAA8TZdu6po9e7aefPJJNTY26oEHHtCNN96o06dP649//KN27dqljIwMFw8TAIDrW5eCfP/99+vEiRPKzs7Wrl27HMv/4R/+QfPmzdPkyZNdNkAAADxBl4J84cIFPfbYY0pLS9Phw4dVU1OjyspKTZ48Wf369XP1GAEAuO516hrykSNH9NBDD+n111+XJAUFBWnMmDEaM2aM1q9fr6lTp/JJTwAAdEGHg/zFF19oxowZqqmpUWRkpNM6Hx8fPfvsszp//rymTp2qqqqqLg1m48aNmj59utOyZ555RkOHDnX6GjNmjGN9c3OzsrKylJiYqJiYGM2aNUsVFRVden0AANylw0HevHmz+vfvr//4j//Qj370I6d1/v7+SktLU15engICApSdnd3pgbz++uvKyspqtfzo0aN65JFH9P777zu+9u7d61i/ceNG7dixQy+99JJ27twpm82mOXPmqLGxsdNjAADAXToc5IKCAv3iF7/QDTfc0O42AwYM0MyZM1VQUNDhAZw6dUq/+MUvtGHDBg0ZMsRp3aVLl1RWVqYRI0YoJCTE8RUcHCzp8vtpb926Venp6UpKSlJ0dLTWrVunU6dO6cCBAx0eAwAA7tbhIH/11Vcd+v3iqKioTp2y/vTTT9WvXz/t27dPMTExTutOnDihhoYGRUREtPnY0tJSnT9/XgkJCY5lQUFBGjZsmAoLCzs8BgAA3K3Dd1kHBwerurr6O7f7+uuvr3oU/W3JyclKTk5uc92xY8dks9mUm5urd999V15eXkpKStKCBQvUt29fR/jDw8OdHjdw4EBVVla2+5rjxo1rd11lZWWr5wMAoLt1+Ag5Pj5ee/bs+c7t9u7dq9tuu+2aBtXis88+k5eXlwYNGqTs7GwtWbJE//3f/63HHntMzc3Nqqurk3T5prIr+fr6qqGhwSVjAACgJ3T4CHn69OmaMmWKXn75ZS1cuLDVZx43NjZq3bp1eu+997R582aXDC49PV0zZsxQUFCQpMunw0NCQjR58mR9/PHH8vPzc7x2y58lqaGhQf7+/u0+b35+frvrrnb0DABAd+lwkEeMGKFnnnlGK1eu1H/+53/q7rvv1ve//31dunRJJ0+e1KFDh3TmzBk98cQTSkxMdMngbDabI8YtoqKiJElVVVWOU8vV1dW6+eabHdtUV1crOjraJWMAAKAndOqduqZNm6bo6Gjl5OQoPz/fcVr4H//xHzV69GjNmjWr1Y1Z1+Kpp57S2bNnlZOT41j28ccfS5IiIyN10003KTAwUIcOHXIEuba2ViUlJUpLS3PZOAAA6G6dfuvMO++8U3feeack6cyZM/Ly8uq2t8ucOHGiHn30UW3atEkTJkxQeXm5fvnLX2rixImOO6/T0tKUmZmp4OBgDRo0SKtXr1ZYWJhSUlK6ZUwAAHSHLr2XdYv+/fu7ahxt+uEPf6gNGzYoOztb2dnZ6tu
3rx544AEtWLDAsc38+fPV1NSkZcuWqb6+XvHx8crJyWl1oxcAACazWZZluXsQJmm5qetqN351VdbOD1VlvyBJChsQoPmTR7n8NQAAvVOnPlwCAAB0D4IMAIABCDIAAAYgyAAAGIAgAwBgAIIMAIABCDIAAAYgyAAAGIAgAwBgAIIMAIABCDIAAAYgyAAAGIAgAwBgAIIMAIABCDIAAAYgyAAAGIAgAwBgAIIMAIABCDIAAAYgyAAAGIAgAwBgAIIMAIABCDIAAAYgyAAAGIAgAwBgAIIMAIABCDIAAAYgyAAAGIAgAwBgAIIMAIABCDIAAAYgyAAAGIAgAwBgAIIMAIABCDIAAAYgyG7iZXP3CAAAJvF29wA8Vf8gP21/q1Sna+p0Yz9/Tb0v2t1DAgC4EUF2o9M1daqyX3D3MAAABuCUNQAABiDIAAAYgCADAGAAggwAgAEIMgAABiDIAAAYgCADAGAAggwAgAEIMgAABiDIAAAYgCADAGAAggwAgAEIMgAABjAqyBs3btT06dOdlh05ckRpaWmKjY3V2LFjlZOT47S+ublZWVlZSkxMVExMjGbNmqWKioqeHDYAANfMmCC//vrrysrKclp25swZzZw5U4MHD1ZeXp7S09O1YcMG5eXlObbZuHGjduzYoZdeekk7d+6UzWbTnDlz1NjY2NO7AABAl7n985BPnTqlpUuXqri4WEOGDHFa94c//EE+Pj7KyMiQt7e3IiIiVFFRoS1btmjSpElqbGzU1q1btWjRIiUlJUmS1q1bp8TERB04cEATJkxwxy4BANBpbj9C/vTTT9WvXz/t27dPMTExTuuKiooUHx8vb++//9yQkJCg8vJy2e12lZaW6vz580pISHCsDwoK0rBhw1RYWNhj+wAAwLVy+xFycnKykpOT21xXVVWlqKgop2UDBw6UJJ08eVJVVVWSpPDw8FbbVFZWdsNoAQDoHm4P8tXU19fLx8fHaZmvr68kqaGhQXV1dZLU5jY1NTXtPu+4cePaXVdZWdkq8AAAdDe3n7K+Gj8/v1Y3ZzU0NEiSAgIC5OfnJ0ltbuPv798zgwQAwAWMPkIOCwtTdXW107KW70NDQ9XU1ORYdvPNNzttEx0d3e7z5ufnt7vuakfPAAB0F6OPkOPj41VcXKxLly45lhUUFGjIkCEaMGCAoqOjFRgYqEOHDjnW19bWqqSkRHFxce4YMgAAXWJ0kCdNmqRz585p6dKlKisr0549e5Sbm6u5c+dKunztOC0tTZmZmcrPz1dpaakWLlyosLAwpaSkuHn0AAB0nNGnrAcMGKBXX31VK1asUGpqqkJCQrR48WKlpqY6tpk/f76ampq0bNky1dfXKz4+Xjk5Oa1u9AIAwGQ2y7Isdw/CJC3XkK92nbmrsnZ+qCr7BUnS7bcEy15Tryr7BYUNCND8yaNc/noAgN7D6FPWAAB4CoIMAIABCDIAAAYgyAAAGIAgAwBgAIIMAIABCDIAAAYgyAAAGIAgAwBgAIIMAIABCDIAAAYgyAAAGIAgAwBgAIIMAIABjP48ZFy2/a1Sna6pkyTd2M9fU++LdvOIAACuRpB7gdM1dY7PUQYAXJ84ZQ0AgAEIMgAABiDIAAAYgCADAGAAggwAgAEIMgAABiDIAAAYgCADAGAAggwAgAEIMgAABiDIAAAYgCADAGAAggwAgAEIMgAABiDIAAAYgCADAGAAggwAgAEIMgAABiDIAAAYgCADAGAAggwAgAEIMgAABiDIAAAYgCADAGAAggwAgAEIMgAABiDIAAAYgCADAGAAggwAgAEIMgAABiDIAAAYgCADAGAAggwAgAEIMgAABiDIAAAYoFcE+csvv9TQoUNbfe3atUuSdOTIEaWlpSk2NlZjx45VTk6Om0cMAEDneLt7AB1x9OhR+fr66uDBg7LZbI7lffv21ZkzZzRz5kz98z//s1588UUdPnxYL774om644QZNmjTJjaMGAKDjekWQjx07piFDhmjgwIG
t1uXm5srHx0cZGRny9vZWRESEKioqtGXLFoIMAOg1esUp66NHjyoyMrLNdUVFRYqPj5e3999/tkhISFB5ebnsdntPDREAgGvSK4J87Ngx2e12TZ06Vffcc4+mTJmi9957T5JUVVWlsLAwp+1bjqRPnjzZ42PtCi/bd28DALi+GX/KurGxUSdOnJC/v78WL16sgIAA7du3T3PmzNFrr72m+vp6+fj4OD3G19dXktTQ0NDmc44bN67d16usrFR4eLjrdqAD+gf5aftbpTpdUydJurGfv6beF92jYwAAuJfxQfbx8VFhYaG8vb0d4R0+fLiOHz+unJwc+fn5qbGx0ekxLSEOCAjo8fF21emaOlXZL7h7GAAANzE+yFLbYY2KitL777+vsLAwVVdXO61r+T40NLTN58vPz2/3ta529AwAQHcx/hpyaWmpRo0apaKiIqfln3zyiSIjIxUfH6/i4mJdunTJsa6goEBDhgzRgAEDenq4AAB0ifFBjoqK0q233qoXX3xRRUVFOn78uFatWqXDhw/rkUce0aRJk3Tu3DktXbpUZWVl2rNnj3JzczV37lx3Dx0AgA4z/pS1l5eXsrOzlZmZqQULFqi2tlbDhg3Ta6+9pqFDh0qSXn31Va1YsUKpqakKCQnR4sWLlZqa6uaRdx13XQOA5zE+yJIUHByslStXtrt+5MiR2rlzZw+OqHtdedd15KAb3D0cAEAPMP6Utadquev6zLl6dw8FANADCDIAAAYgyAAAGIAgAwBgAIIMAIABCDIAAAYgyAAAGIAgAwBgAIIMAIABCDIAAAYgyAAAGIAgAwBgAIIMAIABCDIAAAYgyAAAGIAgAwBgAILcy3jZ3D0CAEB38Hb3ANA5/YP8tP2tUp2uqdON/fw19b5odw8JAOACBLkXOl1Tpyr7BXcPAwDgQpyyBgDAAAQZAAADEGQAAAxAkAEAMABBBgDAAAQZAAADEGQAAAxAkAEAMABBBgDAAAQZAAADEGQAAAxAkAEAMABBBgDAAHzak4dq+QhHSXyMIwAYgCB7KD7CEQDMwilrAAAMQJABADAAQQYAwABcQ/YgLTdyRQ664ZqfQ+JmMABwJYLsQVpu5BrQz69Tj7sy5NwMBgDdg1PW+E4tET5zrt7dQwGA6xZBRpd52dw9AgC4fnDKupu54rqtqfoH+Tn2j+vJAHBtCHI36+p1296Ca8oA4BqcsgYAwAAEGQAAA3DK+jrTck035AZ/TRnfc9d0ucELAK4NQe7F2orgldesr3wTj6vdVOaKmF55g5fEm4Z4Ev67A65BkHuxb0fw29G98oarq91U5qq7pbnByzPx3x1wDYLcy3U0up15Hk/WW472ess4AXQcQQau0Ft+MOkt4wTQcQQZ3cLdN3lxBAmgtyHIcOKqkF55Xbqn7/iWOIIE0PtcF0Fubm7WK6+8ol27dqm2tlZ33nmnXnjhBf3gBz9w99B6ne+6Uawz2rvj+1qOWHmrzs5jzoDe4boI8saNG7Vjxw6tWrVKoaGhWr16tebMmaP9+/fLx8fH3cPrdVx1o1h7z/lt3/4B4Oy5Bsf7f7f8uWWdO498e8tp8G/PJ2cLgI5x97/xXh/kxsZGbd26VYsWLVJSUpIkad26dUpMTNSBAwc0YcIEN48Q3+XbPwDYa+odR9Ytf25Z5069JWzd8QMV4Anc/W+81791Zmlpqc6fP6+EhATHsqCgIA0bNkyFhYVuHBkAAB1nsyzLcvcgrsXbb7+t9PR0ffTRR/Lz+/vRwBNPPKH6+nr95je/afWYcePGtft8//d//6c+ffooPDzcJeM7V3dRzc2WvPt4qdmy1Nx8ebqv/N7T1nl52RTo/w+t5qgzz/nt52hv3tt6vW+ra2hSc7OlPn281HSpuUPj/K7n7G5XG0t789nZMbfMi5eXTf6+7Z9MM2legGvRXX+Xw8PDtW3btu/crtefsq6ru3y+/9vXin19fVVTU9Pp57PZbPL2ds20VFZWSpLL4n49uXJ
uuuN/4J15TufY9HHJc16r7/q7c7WxuGqcV4twd7xeZ/Bv6+qYn6trb37c/cNkrw9yy1FxY2Oj0xFyQ0OD/P3923xMfn5+j4yt5Ui8p16vN2Furo75uTrm5+qYn6szdX56/TXklp9wqqurnZZXV1crLCzMHUMCAKDTen2Qo6OjFRgYqEOHDjmW1dbWqqSkRHFxcW4cGQAAHdfrT1n7+PgoLS1NmZmZCg4O1qBBg7R69WqFhYUpJSXF3cMDAKBDen2QJWn+/PlqamrSsmXLVF9fr/j4eOXk5PCmIACAXuO6CHKfPn20aNEiLVq0yN1DAQCgS3r9NWQAAK4Hvf6NQQAAuB5whAwAgAEIMgAABiDIAAAYgCADAGAAgtwNmpublZWVpcTERMXExGjWrFmqqKhw97B63MaNGzV9+nSnZUeOHFFaWppiY2M1duxY5eTkOK2/3ufu7Nmzev755zVmzBjdcccdmjJlioqKihzrPX1+7Ha7Fi1apISEBI0aNUoPP/ywysrKHOs9fX5alJeXa9SoUdqzZ49jGXMjffnllxo6dGirr127dknqBXNkweX+7d/+zbr77rutv/71r9aRI0esWbNmWSkpKVZDQ4O7h9ZjXnvtNWvo0KFWWlqaY9nXX39t/dM//ZO1dOlSq6yszNq9e7c1YsQIa/fu3Y5trve5mzlzpvXggw9ahYWF1vHjx63ly5dbI0eOtMrKypgfy7J++tOfWpMnT7b+53/+xyorK7PS09Ote++917pw4QLz8/81NjZa//Iv/2JFRUVZeXl5lmXxb6tFfn6+NWLECOvUqVNWdXW146uurq5XzBFBdrGGhgZr1KhR1vbt2x3LampqrJEjR1r79+9348h6RlVVlTV79mwrNjbWuu+++5yCnJ2dbSUmJloXL150LFuzZo01fvx4y7Ku/7k7ceKEFRUVZRUXFzuWNTc3WykpKdb69es9fn6+/vpra+HChdaxY8ccy44cOWJFRUVZH330kcfPT4s1a9ZY06dPdwoyc3PZpk2brAcffLDNdb1hjjhl7WKlpaU6f/68EhISHMuCgoI0bNgwFRYWunFkPePTTz9Vv379tG/fPsXExDitKyoqUnx8vNPnTSckJKi8vFx2u/26n7v+/ftr8+bNGj58uGOZzWaTZVmqqalhfvr319q1a3XrrbdKkk6fPq2cnByFhYUpMjLS4+dHkgoLC7Vz50796le/clrO3Fx29OhRRUZGtrmuN8wRQXaxqqoqSa0/+HrgwIGOD8W+niUnJ2vNmjW66aabWq2rqqpq9ZGYAwcOlCSdPHnyup+7oKAgJSUlOb3H+ptvvqnPP/9co0eP9vj5udJzzz2ne++9V2+99ZZWrFihgIAAj5+f2tpaLV68WMuWLWu1j54+Ny2OHTsmu92uqVOn6p577tGUKVP03nvvSeodc0SQXayurk6SWn2wha+vrxoaGtwxJGPU19e3OS+S1NDQ4HFzV1xcrGeffVbjxo1TcnIy83OFn//858rLy9ODDz6oefPm6dNPP/X4+cnIyFBsbKweeOCBVus8fW4kqbGxUSdOnNC5c+e0YMECbd68WSNGjNCcOXNUUFDQK+bouvhwCZP4+flJuvyXo+XP0uX/4P7+/u4alhH8/PzU2NjotKzlL3pAQIBHzd3Bgwf19NNPKyYmRmvXrpXE/Fyp5bTj8uXLdfjwYW3bts2j52fv3r0qKirSH//4xzbXe/LctPDx8VFhYaG8vb0dUR0+fLiOHz+unJycXjFHHCG7WMvpjurqaqfl1dXVrU6XeJqwsLA250WSQkNDPWbutm3bpvT0dI0ZM0Zbtmxx/OP39Pmx2+3av3+/Ll265Fjm5eWliIgIxz566vzk5eXJbrdr7NixGjVqlEaNGiVJeuGFFzRhwgSPnpsrBQQEtDrCjYqK0qlTp3rFHBFkF4uOjlZgYKAOHTrkWFZbW6uSkhLFxcW5cWTuFx8fr+LiYqf/4RYUFGjIkCEaMGCAR8zd9u3
btXz5ck2bNk3r1693+p+Hp89PdXW1nnrqKX3wwQeOZRcvXlRJSYkiIiI8en4yMzP1pz/9SXv37nV8SZc/C37z5s0ePTctSktLNWrUKKff65ekTz75RJGRkb1jjnrkXm4Ps3btWuuuu+6yDh486Phdth/96EfX1e/7dcSSJUucfu3p9OnTVnx8vLVkyRLrs88+s/Ly8qwRI0ZYe/bscWxzPc/d//7v/1q33367NW/ePKffkayurrZqa2s9fn6am5utWbNmWePHj7cKCwuto0ePWgsXLrTi4+OtL7/80uPn59uu/LUn5sayLl26ZP30pz+1Jk6caBUWFlplZWXWypUrreHDh1ulpaW9Yo4Icjdoamqyfv3rX1sJCQlWbGysNWfOHOuLL75w97B63LeDbFmW9dFHH1n/+q//ag0fPtz64Q9/aP3ud79zWn89z92mTZusqKioNr+WLFliWZZnz49lWVZtba31wgsvWPfee681cuRIa9asWU6/l+zp83OlK4NsWcyNZVmW3W63nnnmGevee++1RowYYU2ePNkqLCx0rDd9jvg8ZAAADMA1ZAAADECQAQAwAEEGAMAABBkAAAMQZAAADECQAQAwAEEGAMAABBkAAAMQZACqqqpSWlqaRowYobvvvtvxUXQAeg4fvwhAubm5+vDDD7V69WqFhoZeNx/JB/QmBBmAzp49q4EDB+r+++9391AAj8V7WQMeLjk5WV9++aXj+8cff1wpKSl65ZVXVFRUpG+++UbBwcEaP368nn76acfnNw8dOlTp6el65513dOLECc2ePVuPPfaYTp48qczMTL3//vtqaGhQbGyslixZomHDhrlrF4FegSADHq6kpETr169XSUmJXnnlFYWGhmrixImKjY3V9OnT5ePjo7/+9a/Kzc3VwoUL9cgjj0i6HGRvb2898cQTGjp0qMLCwhQSEqKHHnpI/v7+evzxx+Xv76/c3Fx98skn2r17tyIiIty8t4C5OGUNeLhhw4YpODhYPj4+io2N1fvvv6/bbrtNGzZsUGBgoCTpnnvuUUFBgQoLCx1BlqSRI0fq4Ycfdny/bt06nT17Vr///e81aNAgSdKYMWN0//33a8OGDcrKyurZnQN6EYIMwMno0aM1evRoXbx4UeXl5Tpx4oSOHj2qr7/+WjfccIPTtlFRUU7fFxQU6LbbblNoaKiampokSV5eXhozZoz27dvXU7sA9EoEGYCT5uZmrV27Vm+88YYuXLig8PBwjRw5Ur6+vq22vfHGG52+P3v2rCoqKnT77be3+dx1dXXcwQ20gyADcLJ582a9/vrrysjI0Pjx49W3b19J0k9+8pPvfGzfvn111113afHixW2u9/HxcelYgesJbwwCwElxcbEiIyP1k5/8xBHjU6dO6dixY2pubr7qY++66y6Vl5dryJAhGjFihONr37592rVrl/r06dMTuwD0SgQZgJORI0fq6NGj2rx5sz744APt2rVL06ZNU2Nj43e+g9eMGTPU3NysGTNm6E9/+pMKCgr03HPP6be//a1uueWWHtoDoHfilDUAJ3PnztWZM2f029/+Vv/+7/+u8PBw/fjHP5bNZtNvfvMb1dTUqF+/fm0+NjQ0VDt27NCaNWuUkZGhhoYGDR48WCtWrOjQKW/Ak/F7yAAAGIBT1gAAGIAgAwBgAIIMAIABCDIAAAYgyAAAGIAgAwBgAIIMAIABCDIAAAYgyAAAGIAgAwBgAIIMAIABCDIAAAb4f7yjjtuRSFpdAAAAAElFTkSuQmCC", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# distribution of values in numeric columns\n", "sns.displot(df1['fare'])\n", "print('Fare: minimum=', df1['fare'].min(), '; maximum=', df1['fare'].max(), '; median=', df1['fare'].median())" ] }, { "cell_type": "code", "execution_count": 6, "id": "bbfffb36", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Passengers with zero fare: 11\n", "The most often fares: \n", "fare\n", "13.0000 42\n", "8.0500 40\n", "7.7500 39\n", "7.8958 32\n", "26.0000 29\n", "Name: count, dtype: int64\n" ] } ], "source": [ "# a zero fare is rather unexpected; how many passengers have a zero fare?\n", "print('Passengers with zero fare: ', (df1['fare']==0).sum())\n", "print('The most often fares: ')\n", "print(df1['fare'].value_counts().sort_values(ascending=False).iloc[0:5])" ] }, { "cell_type": "markdown", "id": "e8891a3e", "metadata": {}, "source": [ "> Fare values seem valid, with the exception of zeros and missing values." 
] }, { "cell_type": "code", "execution_count": 7, "id": "f96cdf2b", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Age: minimum= 0.1667 ; maximum= 80.0 ; median= 28.0\n", "The most often ages:\n", "age\n", "18.0 32\n", "30.0 30\n", "24.0 29\n", "22.0 28\n", "25.0 26\n", "Name: count, dtype: int64\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "C:\\Users\\jhucin\\Anaconda3\\lib\\site-packages\\seaborn\\axisgrid.py:118: UserWarning: The figure layout has changed to tight\n", " self._figure.tight_layout(*args, **kwargs)\n" ] }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAeQAAAHkCAYAAADvrlz5AAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjcuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8pXeV/AAAACXBIWXMAAA9hAAAPYQGoP6dpAAAn70lEQVR4nO3de1SVdb7H8Q+IKOiYjOGlJi/pkDdujjh0vESyxBaWaxhP40kxQyeZowdHO6Ok5MhaZZeZUMkyR8X0ZE2M4dDkTE1KtnI6jKGWrckLBwMcBQIRZQQFhd/5wyW1E0txw/MD3q+1WCt+z96P34073zzPvnkYY4wAAICjPJ0eAAAAEGQAAKxAkAEAsABBBgDAAgQZAAALEGQAACxAkAEAsABBBgDAAu0uyLGxsYqNjXV6DAAAXHg5PUBLKy4udnoEAACu0u6OkAEAsBFBBgDAAgQZAAALEGQAACxAkAEAsABBBgDAAgQZAAALEGQAACxAkAEAsABBBgDAAgQZAAALEGQAACxAkAEAsABBBgDAAgQZAAALEGQAACxAkAEAsABBBgDAAgQZAAALEGTgBtXXG6v2A6Bt8HJ6AKC18fT0UPrOXJVVVDd5H/5+vpo6IcCNUwFo7Qgy0ARlFdUqOlXl9BgA2hBOWQMAYAGCDACABQgyAAAWIMgAAFiAIAMAYAGCDACABQgyAAAWIMgAAFiAIAMAYAGCDACABQgyAAAWIMgAAFiAIAMAYAGCDACABQgyAAAWIMgAAFiAIAMAYAGCDACABQgyAAAWIMgAAFiAIAMAYAGCDACABQgyAAAWIMgAAFiAIAMAYAGCDACABQgyAAAWIMgAAFiAIAMAYAGCDACABQgyAAAWIMgAAFiAIAMAYAGCDACABQgyAAAWIMhoN+rrjdMjAMA1eTk9ANBSPD09lL4zV2UV1U3eR0BfP0WF93PjVABwGUFGu1JWUa2iU1VNvr5/dx83TgMAX+GUNQAAFiDIAABYgCADAGABggwAgAUIMgAAFiDIAABYgCADAGABggwAgAUIMgAAFiDIAABYgCADAGABggwAgAUcD/LFixe1atUqRUREKDQ0VNOmTdOBAwcath8+fFixsbEKCQlRRESE0tLSHJwWAIDm4XiQX375ZWVkZOipp55SZmam7rzzTj366KP68ssvVVFRobi4OPXv318ZGRlKSEhQamqqMjIynB4bAAC3cvzjF7OysnT//fdrzJgxkqTHH39c27Zt06effqqCggJ5e3srOTlZXl5eGjhwoAoLC7VhwwZNmTLF4ckBAHAfx4+Qu3fvrt27d+vEiROqq6tTenq6vL29NWTIEO3bt09hYWHy8vrq94bw8HDl5+ervLzcwakBAHAvx4+Qk5KStHDhQkVGR
qpDhw7y9PRUamqq+vbtq5KSEgUEBLhcvmfPnpKkoqIi9ejRw4mRAQBwO8eDfOzYMXXr1k0vvfSSevXqpW3btikxMVFbt27VhQsX5O3t7XL5Tp06SZJqamquuc/IyMhrbisuLlafPn3cMzwAAG7iaJBPnjypRYsWafPmzRo5cqQkKTAwUHl5eVqzZo06d+6s2tpal+tcCbGvr2+LzwsAQHNxNMifffaZLl68qMDAQJf14OBgffjhh7rttttUWlrqsu3K97169brmfrOysq657duOngEAcIqjT+q6cur46NGjLuu5ubnq16+fwsLCtH//ftXV1TVsy87O1oABA3j8GADQpjga5KCgII0cOVKJiYn6+9//roKCAq1evVrZ2dmaM2eOpkyZonPnzikpKUl5eXnavn27tmzZovj4eCfHBgDA7Rw9Ze3p6am1a9dq9erVWrJkic6ePauAgABt3rxZISEhkqSNGzdqxYoViomJkb+/vxYvXqyYmBgnxwYAwO0cf5b1LbfcouXLl2v58uWNbg8KClJ6enoLTwUAQMty/I1BAAAAQQYAwAoEGQAACxBkAAAsQJABALAAQQYAwAIEGQAACxBkAAAsQJABALAAQQYAwAIEGQAACxBkAAAsQJABALAAQQYAwAIEGQAACxBkNJv6emPVfgDAZl5OD4C2y9PTQ+k7c1VWUd3kffj7+WrqhAA3TgUAdiLIaFZlFdUqOlXl9BgAYD1OWQMAYAGCDACABQgyAAAWIMgAAFiAIAMAYAGCDACABQgyAAAWIMgAAFiAIAMAYAGCDACABQgyAAAWIMgAAFiAIAMAYAGCDACABQgyrNbVt6Pq643TYwBAs+PzkGE1H28veXp6KH1nrsoqqpu8n4C+fooK7+fGyQDAvQgyWoWyimoVnapq8vX9u/u4cRoAcD9OWQMAYAGCDACABQgyAAAWIMgAAFiAIAMAYAGCDACABQgyAAAWIMgAAFiAIAMAYAGCDACABQgyAAAWIMgAAFiAIAMAYAGCDACABQgyAAAWIMgAAFiAIAMAYAGCDACABQgyAAAWIMgAAFiAIAMAYAGCDACABQgyAAAWIMgAAFiAIAMAYAGCDACABQgyAAAWIMgAAFiAIAMAYAGCDACABQgyAAAWIMgAAFiAIAMAYAGCDACABQgyAAAWIMgAAFiAIAMAYAGCDACABawIcmZmpqKjoxUYGKhJkybpnXfeadh2+PBhxcbGKiQkRBEREUpLS3NwUgAAmofjQX7rrbe0dOlSTZ06VTt27FB0dLQee+wxffLJJ6qoqFBcXJz69++vjIwMJSQkKDU1VRkZGU6PDQCAW3k5+YcbY5SamqqZM2dq5syZkqR58+bpwIED+vjjj/Xxxx/L29tbycnJ8vLy0sCBA1VYWKgNGzZoypQpTo4OAIBbOXqE/MUXX+jkyZN64IEHXNbT0tIUHx+vffv2KSwsTF5eX/3eEB4ervz8fJWXl7f0uAAANBtHj5ALCgokSdXV1Zo9e7YOHTqkH/zgB/rP//xPjR8/XiUlJQoICHC5Ts+ePSVJRUVF6tGjR6P7jYyMvOafWVxcrD59+rjnBgAA4CaOHiGfO3dOkpSYmKj7779fmzZt0ujRozV37lxlZ2frwoUL8vb2drlOp06dJEk1NTUtPi8AAM3F0SPkjh07SpJmz56tmJgYSdKQIUN06NAhvfLKK+rcubNqa2tdrnMlxL6+vtfcb1ZW1jW3fdvRMwAATnH0CLl3796SdNVp6UGDBunEiRPq3bu3SktLXbZd+b5Xr14tMyQAAC3A0SAPHTpUXbp00cGDB13Wc3Nz1bdvX4WFhWn//v2qq6tr2Jadna0BAwZc8/FjAABaI0eD3LlzZ/385z/XSy+9pB07duj48eN6+eWX9dFHHykuLk5TpkzRuXPnlJSUpLy8PG3fvl1btmxRfHy8k2MDAOB2jj6GLElz586Vj4+PVq1apS+//FIDBw7UmjVr9OMf/1iStHHjRq1Ys
UIxMTHy9/fX4sWLGx5vBgCgrXA8yJIUFxenuLi4RrcFBQUpPT29hScCAKBlOf7WmQAAgCADAGAFggwAgAUIMgAAFiDIAABYgCADAGABggwAgAUIMgAAFiDIAABYgCADAGABggw4oKtvR9XXG7fsyx37cdcsAJrOiveyBtobH28veXp6KH1nrsoqqpu8n4C+fooK73dT+/H389XUCQHffUEAzYogAw4qq6hW0amqJl/fv7uPW/YDwHmcsgYAwAIEGQAACxBkAAAsQJABALBAk4Kck5OjqqrGn0BSWVmpP//5zzc1FAAA7U2Tgvzwww/r2LFjjW47dOiQlixZclNDAQDQ3lz3y54SExNVXFwsSTLGKDk5WV27dr3qcgUFBbr11lvdNyEAAO3AdR8hT5w4UcYYGfPVO/pc+f7Kl6enp0JCQvTMM880y7AAALRV132EPH78eI0fP16SNGPGDCUnJ2vgwIHNNhgAAO1Jk96p69VXX3X3HAAAtGtNCvL58+e1bt067d69W+fPn1d9fb3Ldg8PD+3atcstAwIA0B40KcgrVqxQRkaGRo0apSFDhsjTk5czAwBwM5oU5Pfee08LFy7UnDlz3D0PAADtUpMObS9duqSgoCB3zwIAQLvVpCCPGTNGH374obtnAQCg3WrSKevo6GgtX75cp0+fVnBwsHx8fK66zE9+8pObnQ0AgHajSUFesGCBJCkzM1OZmZlXbffw8CDIAADcgCYFOSsry91zAADQrjUpyLfffru75wAAoF1rUpBffPHF77zMf/3XfzVl1wAAtEtuD3LXrl3Vs2dPggwAwA1oUpCPHDly1Vp1dbX279+v5ORkLVu27KYHAwCgPXHbe176+vpq7Nixmjdvnn7zm9+4a7cAALQLbn8T6j59+ujYsWPu3i0AAG1ak05ZN8YYo+LiYm3YsIFnYQMAcIOaFOTBgwfLw8Oj0W3GGE5ZAwBwg5oU5Hnz5jUa5K5duyoiIkL9+/e/2bkAAGhXmhTkhIQEd88BAEC71uTHkGtra7V9+3bt3btXlZWV8vPz08iRIxUTE6NOnTq5c0YAANq8JgW5srJSDz/8sI4cOaLbbrtN/v7+ys/P144dO/Taa6/p9ddf1/e+9z13zwoAQJvVpJc9paSkqKSkRFu3btX777+v9PR0vf/++9q6davKy8uVmprq7jkBAGjTmhTkrKwsLViwQCNHjnRZHzlypObPn6/33nvPLcMBANBeNCnIVVVVuuOOOxrddscdd+jMmTM3MxMAAO1Ok4J85513avfu3Y1uy8rKUr9+/W5qKAAA2psmPalr9uzZeuyxx1RbW6sHHnhAt956q06dOqW3335b27ZtU3JyspvHBACgbWtSkKOjo1VQUKB169Zp27ZtDesdO3bUvHnzNHXqVLcNCABAe9CkIFdXV2vu3LmKjY3Vp59+qrNnz6q4uFhTp07VLbfc4u4ZAQBo827oMeTDhw/rJz/5iTZv3ixJ6tatm8aNG6dx48Zp9erVmjZtGp/0BABAE1x3kP/5z3/qkUce0dmzZzVo0CCXbd7e3lq6dKmqqqo0bdo0lZSUuH1QAADasusO8vr16+Xn56c//vGPioqKctnm4+Oj2NhYZWRkyNfXV+vWrXP7oAAAtGXXHeTs7Gz9/Oc/V/fu3a95mR49eiguLk7Z2dnumA0AgHbjuoNcVlZ2Xa8vDggI4JQ1AAA36LqD/P3vf1+lpaXfebnTp09/61E0AAC42nUHOSwsTNu3b//Oy2VmZmrIkCE3NRQAAO3NdQd5xowZ2rt3r5599lnV1NRctb22tlbPPfec9uzZo+nTp7t1SAAA2rrrfmOQwMBALVmyRE8//bTeeust3X333frBD36guro6FRUVae/evaqoqNAvf/lLjR07tjlnBgCgzbmhd+qaPn26Bg8erLS0NGVlZTUcKXfp0kVjxozRrFmzFBwc3CyDAgDQlt3wW2f+6Ec/0o9+9CNJUkVFhTw9PXm7TAAAblKT3
sv6Cj8/P3fNAQBAu9akz0MGAADuRZABALAAQQYAwAIEGQAACxBkAAAsQJABALAAQQYAwAIEGQAACxBkAAAsQJABALCAVUHOz89XaGioy+cuHz58WLGxsQoJCVFERITS0tIcnBAAgOZhTZAvXryoX/3qV6qurm5Yq6ioUFxcnPr376+MjAwlJCQoNTVVGRkZDk4KAID73dSHS7jTmjVr1KVLF5e1P/zhD/L29lZycrK8vLw0cOBAFRYWasOGDZoyZYpDkwIA4H5WHCHn5OQoPT1dzz33nMv6vn37FBYWJi+vr35vCA8PV35+vsrLy1t6TAAAmo3jQa6srNTixYv1xBNPqE+fPi7bSkpK1Lt3b5e1nj17SpKKiopabEYAAJqb46esk5OTFRISogceeOCqbRcuXJC3t7fLWqdOnSRJNTU119xnZGTkNbcVFxdfFX4AAJzmaJAzMzO1b98+vf32241u79y5s2pra13WroTY19e32ecDAKClOBrkjIwMlZeXKyIiwmV9+fLlSktL02233abS0lKXbVe+79Wr1zX3m5WVdc1t33b0DACAUxwN8vPPP68LFy64rEVFRWn+/PmKjo7Wn//8Z73xxhuqq6tThw4dJEnZ2dkaMGCAevTo4cTIAAA0C0ef1NWrVy/169fP5UuSevToodtvv11TpkzRuXPnlJSUpLy8PG3fvl1btmxRfHy8k2MDAOB2jj/L+tv06NFDGzduVH5+vmJiYvTiiy9q8eLFiomJcXo0AADcyvFnWX/T0aNHXb4PCgpSenq6Q9MAANAyrD5CBgCgvSDIAABYgCADAGABggwAgAUIMgAAFiDIAABYgCADAGABggwAgAUIMgAAFiDIAABYgCADAGABggwAgAUIMgAAFiDIAABYgCADAGABggwAgAUIMgAAFiDIAABYgCADAGABggwAgAUIMgAAFiDIAABYgCADAGABggwAgAUIMgAAFiDIAABYgCADAGABggwAgAUIMgAAFiDIAABYgCADAGABggwAgAUIMgAAFiDIAABYgCADAGABggwAgAUIMgAAFiDIAABYgCADAGABggwAgAUIMgAAFiDIAABYgCADAGABggwAgAUI8k2qrzdW7ANoqq6+Hd12H+S+DDSdl9MDtHaenh5K35mrsorqJl3f389XUycEuHkq4Pr5eHvd9P1Y4r4M3CyC7AZlFdUqOlXl9BjATeF+DDiLU9YAAFiAIANwCxsfi+YxbbQmnLIG4Bbueiw6oK+fosL78Zg22h2CDMCtbvaxaP/uPm7ZD9DacMoaAAALEGQAACxAkAEAsABBBgDAAgQZAAALEGQAACxAkAG0STa+UQnwbXgdMoA2iQ/NQGtDkAG0abzBCFoLTlkDAGABggwAgAUIMgAAFiDIAABYgCADAGABggwAgAUIMgAAFiDIAABYgCADAGABggwAgAUIMgAAFiDIAABYwPEgnzlzRr/+9a81btw4jRgxQg899JD27dvXsP3w4cOKjY1VSEiIIiIilJaW5uC0AAA0D8eD/Nhjj+ngwYNauXKl3nzzTQ0bNkyzZ8/WsWPHVFFRobi4OPXv318ZGRlKSEhQamqqMjIynB4bAAC3cvTjFwsLC/XRRx/p97//vUaMGCFJSkpK0ocffqgdO3aoc+fO8vb2VnJysry8vDRw4EAVFhZqw4YNmjJlipOjAwDgVo4eIfv5+Wn9+vUaPnx4w5qHh4eMMTp79qz27dunsLAweXl99XtDeHi48vPzVV5e7sTIAAA0C0eD3K1bN91zzz3y9vZuWHvnnXd0/PhxjRkzRiUlJerdu7fLdXr27ClJKioqatFZAQBoTo6esv6m/fv3a+nSpYqMjNT48eP1zDPPuMRakjp16iRJqqmpueZ+IiMjr7mtuLhYffr0cc/AAAC4ieNP6rpi165dmj17toKCgrRy5UpJUufOnVVbW+tyuSsh9vX1bfEZAbQ/XX07qr7euGVf7toP2iYrjpC3bt2qFStWaMKEC
Xr++ecbjop79+6t0tJSl8te+b5Xr17X3F9WVtY1t33b0TMAfJOPt5c8PT2UvjNXZRXVTd6Pv5+vpk4IcONkaGscD/Lrr7+uJ598UjNmzNDSpUvl6fnVQXtYWJjeeOMN1dXVqUOHDpKk7OxsDRgwQD169HBqZADtUFlFtYpOVTk9BtowR09Z5+fn6+mnn9aECRMUHx+v8vJylZWVqaysTP/61780ZcoUnTt3TklJScrLy9P27du1ZcsWxcfHOzk2AABu5+gR8l//+lddvHhRO3fu1M6dO122xcTE6Nlnn9XGjRu1YsUKxcTEyN/fX4sXL1ZMTIxDEwMA0DwcDfIvfvEL/eIXv/jWywQFBSk9Pb2FJgIAwBnWPMsaAID2jCADAGABgtyG8FpJAGi9HH/ZE9yH10oCQOtFkNsYXisJAK0Tp6wBALAAQQYAwAIE2WHufON6d7BtHgBoL3gM2WHueuP6gL5+igrv1+bmAYD2giBb4mafjOXf3ceN09g3DwC0dZyyBgDAAgQZAAALEGQAACxAkAEAsABBBgDAAgQZAAALEGQAACxAkAEAsABBBgDAAgQZAAALEGQAACxAkAEAsABBBgDAAgQZAFqAOz9rnM8sb5v4+EUAaAHu+qxxfz9fTZ0Q4MbJYAuCDAAt6GY/axxtF6esAQCwAEEGAMACBBkAAAsQZAAALECQAQCwAEEGAMACBBkAWhHeYKTt4nXIANCK8AYjbRdBBoBWiDcYaXs4ZQ0AgAUIMgAAFiDIAABYgCADAGABggwAgAUIMgAAFiDIAABYgCADAGABggwAgAUIMgAAFiDIAABYgCADAGABggwAgAUIMgAAFiDIAABYgCADAGABggwAgAUIMgAAFiDIAABYgCADAGABggwAgAUIMgAAFiDIAABYgCADAGABggwAgAUIMgAAFiDIAABYgCADAGABggwAgAUIMgAAFiDIAABYgCADQDvU1bej6uuNW/blrv20d15ODwAAaHk+3l7y9PRQ+s5clVVUN3k//n6+mjohwI2TtV8EGQDasbKKahWdqnJ6DIhT1gAAWIEgAwCazF2PRfN4NqesAQA3wR2PRQf09VNUeL92/3g2QQYA3LSbeSzav7vPTe+jLWgVp6zr6+v1wgsvaOzYsQoODtasWbNUWFjo9FgAALhNqwjy2rVr9cYbb+ipp55Senq6PDw89Oijj6q2ttbp0QAAlmjtr622/pR1bW2tNm3apEWLFumee+6RJK1atUpjx47Vzp07NWnSJIcnBADYoLW/ttr6IB85ckRVVVUKDw9vWOvWrZuGDh2qnJwcggwAcNFaH4v2MMZY/Rzx9957TwkJCTp48KA6d+7csP7LX/5SFy5c0O9+97urrhMZGXnN/Z04cUIdOnRQnz593DZj1fmLqmvi6Y2OXp7y6eR1U/tgP+yH+yD7ac37sWkWSerg6aEuPh2bfP1v6tOnj7Zu3fqdl7P+CPn8+fOSJG9vb5f1Tp066ezZsze8Pw8PD3l53fzNLi4ulnT5B+2Ovzh3/eVfz36+PrsN89zIfq5n9pac50Y0NrtNP+dv28eN/Nxtuk2SVHnmlKSm32eucOJ2fdvP3baf8zf305T/V225TcXFxao8c/P3mRtlfZCvHBXX1ta6HCHX1NTIx8en0etkZWU1+1xXjsJb4s9yN2Z3BrM7g9mdwew3zvpnWV/5DaW0tNRlvbS0VL1793ZiJAAA3M76IA8ePFhdu3bV3r17G9YqKyt16NAhjRw50sHJAABwH+tPWXt7eys2NlbPP/+8vv/97+v222/Xb3/7W/Xu3VsTJkxwejwAANzC+iBL0vz583Xp0iU98cQTunDhgsLCwpSWlnbVE70AAGitWkWQO3TooEWLFmnRokVOjwIAQLOw/jFkAADaA+vfGAQAgPaAI2QAACxAkAEAsABBBgDAAgQZAAALEOQbVF9frxdeeEFjx45VcHCwZs2apcLCQqfH+k5r167VjBkzXNYOHz6s2NhYh
YSEKCIiQmlpaQ5Nd7UzZ87o17/+tcaNG6cRI0booYce0r59+xq22zx7eXm5Fi1apPDwcIWGhmrOnDnKy8tr2G7z7F+Xn5+v0NBQbd++vWHN5tlPnjypu+6666qvbdu2SbJ7dknKzMxUdHS0AgMDNWnSJL3zzjsN22ydfe/evY3+zO+6666G94O2dXZJunjxolatWqWIiAiFhoZq2rRpOnDgQMP2Fp/d4IasWbPG3H333eaDDz4whw8fNrNmzTITJkwwNTU1To92Ta+88oq56667TGxsbMPa6dOnzY9//GOTlJRk8vLyzJtvvmkCAwPNm2++6eCkX4mLizOTJ082OTk55tixY+bJJ580QUFBJi8vz/rZH3zwQTN16lTz2Wefmby8PJOQkGBGjx5tqqurrZ/9itraWvPTn/7UBAQEmIyMDGOM/feZrKwsExgYaL788ktTWlra8HX+/HnrZ8/MzDRDhgwxmzdvNgUFBebFF180gwcPNgcOHLB69pqaGpefdWlpqfnb3/5mhg4dav7whz9YPbsxxqSmpprRo0ebPXv2mIKCApOUlGRGjBhhSkpKHJmdIN+AmpoaExoaal5//fWGtbNnz5qgoCCzY8cOBydrXElJiZk9e7YJCQkx9913n0uQ161bZ8aOHWsuXrzYsJaSkmImTpzoxKguCgoKTEBAgNm/f3/DWn19vZkwYYJZvXq11bOfPn3aLFy40OTm5jasHT582AQEBJiDBw9aPfvXpaSkmBkzZrgE2fbZX375ZTN58uRGt9k8e319vbn33nvNs88+67I+a9Yss27dOqtn/6ba2lozadIks2DBAmOM3T93Y4yZPHmyeeaZZxq+/9e//mUCAgLMu+++68jsnLK+AUeOHFFVVZXCw8Mb1rp166ahQ4cqJyfHwcka9/nnn+uWW27Rn/70JwUHB7ts27dvn8LCwlw+Gzo8PFz5+fkqLy9v6VFd+Pn5af369Ro+fHjDmoeHh4wxOnv2rPWzr1y5Uj/84Q8lSadOnVJaWpp69+6tQYMGWT37FTk5OUpPT9dzzz3nsm777EePHtWgQYMa3Wbz7F988YVOnjypBx54wGU9LS1N8fHxVs/+Ta+99pqKi4u1ZMkSSXb/3CWpe/fu2r17t06cOKG6ujqlp6fL29tbQ4YMcWR2gnwDSkpKJF39odU9e/Zs+DBum4wfP14pKSm64447rtpWUlJy1cdX9uzZU5JUVFTUIvNdS7du3XTPPfe4vFf5O++8o+PHj2vMmDFWz/51y5Yt0+jRo/Xuu+9qxYoV8vX1tX72yspKLV68WE888cRV93PbZ8/NzVV5ebmmTZumf/u3f9NDDz2kPXv2SLJ79oKCAklSdXW1Zs+erbvvvlsPPvig3n//fUl2z/51NTU1WrdunWbOnNkwn+2zJyUlycvLS5GRkQoMDNSqVau0evVq9e3b15HZCfINOH/+vCRd9aEWnTp1Uk1NjRMjNdmFCxcavR2SrLst+/fv19KlSxUZGanx48e3mtlnzpypjIwMTZ48WfPmzdPnn39u/ezJyckKCQm56mhNsvs+U1tbq4KCAp07d04LFizQ+vXrFRgYqEcffVTZ2dlWz37u3DlJUmJiou6//35t2rRJo0eP1ty5c62f/eveeust1dTUuDx51PbZjx07pm7duumll15Senq6fvrTnyoxMVFHjhxxZPZW8eEStujcubOky//zX/lv6fJfjo+Pj1NjNUnnzp1VW1vrsnblTubr6+vESI3atWuXfvWrXyk4OFgrV66U1Hpmv3L69Mknn9Snn36qrVu3Wj17Zmam9u3bp7fffrvR7TbP7u3trZycHHl5eTX8Izp8+HAdO3ZMaWlpVs/esWNHSdLs2bMVExMjSRoyZIgOHTqkV155xerZvy4zM1NRUVHy8/NrWLN59pMnT2rRokXavHmzRo4cKUkKDAxUXl6e1qxZ48jsHCHfgCun8EpLS13WS0tLrzq1YbvevXs3ejskqVevXk6Md
JWtW7cqISFB48aN04YNGxp+CbJ59vLycu3YsUN1dXUNa56enho4cGDD/cTW2TMyMlReXt7wEpDQ0FBJ0vLlyzVp0iSrZ5cu/yP5zSOagIAAffnll1bPfuXfjoCAAJf1QYMG6cSJE1bPfsXp06f1ySefKDo62mXd5tk/++wzXbx4UYGBgS7rwcHBKigocGR2gnwDBg8erK5du2rv3r0Na5WVlTp06FDDb1itRVhYmPbv3+8SjuzsbA0YMEA9evRwcLLLXn/9dT355JOaPn26Vq9e7fIPrc2zl5aW6r//+7/18ccfN6xdvHhRhw4d0sCBA62e/fnnn9df/vIXZWZmNnxJlz+PfP369VbPfuTIEYWGhrq8Vl2S/vGPf2jQoEFWzz506FB16dJFBw8edFnPzc1V3759rZ79igMHDsjDw0OjRo1yWbd59isHWEePHnVZz83NVb9+/ZyZvdmev91GrVy50owaNcrs2rWr4XXIUVFRVr8O2RhjEhMTXV72dOrUKRMWFmYSExPN//3f/5mMjAwTGBhotm/f7uCUl33xxRdm2LBhZt68eVe9xrGystLq2evr682sWbPMxIkTTU5Ojjl69KhZuHChCQsLMydPnrR69sZ8/WVPNs9eV1dnHnzwQXP//febnJwck5eXZ55++mkzfPhwc+TIEatnN8aYl156yYSGhpq3337bFBYWmrVr15rBgwebv//979bPbszl92eIioq6at3m2evq6sy0adPMfffdZ7Kzs01+fr5ZtWqVGTJkiPnkk08cmZ0g36BLly6Z3/zmNyY8PNyEhISYRx991Pzzn/90eqzv9M0gG2PMwYMHzc9+9jMzfPhwc++995pXX33VoelcvfzyyyYgIKDRr8TERGOMvbMbY0xlZaVZvny5GT16tAkKCjKzZs1yeV2yzbN/09eDbIzds5eXl5slS5aY0aNHm8DAQDN16lSTk5PTsN3m2Y0xZtOmTWb8+PFm2LBhZvLkyWbnzp0N22yfffny5eZnP/tZo9tsnv3MmTMmOTnZREREmNDQUDN16lSzd+/ehu0tPTufhwwAgAV4DBkAAAsQZAAALECQAQCwAEEGAMACBBkAAAsQZAAALECQAQCwAEEGAMACBBkAAAsQZAAALECQAQCwAEEG2qkLFy4oJSVFUVFRGj58uEaMGKG4uDgdPny44TJ//OMfFR0drcDAQE2ePFnZ2dkaOnSotm/f3nCZoqIiPfbYYxo1apSCg4M1c+ZMHTp0yImbBLRqBBlopxYvXqw333xTc+bM0aZNm/T4448rNzdXCxculDFGmZmZevzxxzVixAitXbtWEydO1Ny5c10+H/b06dP6j//4D33++edatmyZUlJSVF9fr+nTp+vYsWMO3jqg9fFyegAALa+2tlZVVVVatmyZoqOjJUmjRo1SVVWVnn32WZWVlSk1NVX33nuvnnrqKUnS2LFj1bFjR6WkpDTsZ8uWLTpz5ox+//vf6/bbb5ckjRs3TtHR0UpNTdULL7zQ8jcOaKU4QgbaIW9vb6WlpSk6OlqlpaXKyclRenq6du/eLUkqKChQUVGR7rvvPpfrTZo0yeX77OxsDRkyRL169dKlS5d06dIleXp6aty4cfrf//3fFrs9QFvAETLQTu3Zs0dPP/20vvjiC3Xp0kV33XWXunTpIknq2LGjJKlHjx4u1/H393f5/syZMyosLNSwYcMa/TPOnz8vHx+fZpgeaHsIMtAOHT9+XPPmzVNkZKR+97vfqW/fvpKk1157TXv27Gl4nLi8vNzlet/8/nvf+55GjRqlxYsXN/rneHt7N8P0QNvEKWugHfrHP/6hmpoaxcfHN8RYunzULEk9e/ZU3759tXPnTpfr/fWvf3X5ftSoUcrPz9eAAQMUGBjY8PWnP/1J27ZtU4cOHZr/xgBtBEEG2qFhw4bJy8tLv/3tb/XRRx9p9+7dSkhI0AcffCDp8qnm+fPna9euXVq+fLn+9re/aePGjUpNTZUkeXpe/
qfjkUceUX19vR555BH95S9/UXZ2tpYtW6b/+Z//0Z133unUzQNaJQ9jjHF6CAAt791339WLL76o48eP65ZbblFISIgefvhhzZgxQ8uWLdP06dOVnp6utLQ0FRUV6Yc//KGmT5+upKQkrVmzRlFRUZIun/5OSUlRdna2ampq1L9/f82YMUP//u//7vAtBFoXggygUTt27NDQoUNdjnQ/+OADxcfH66233tLgwYMdnA5oewgygEbNmTNHx44d04IFC9SnTx8VFBTohRdeUL9+/fTqq686PR7Q5hBkAI2qqKhQSkqKPvzwQ50+fVq33nqrJk6cqPnz5ze8PAqA+xBkAAAswLOsAQCwAEEGAMACBBkAAAsQZAAALECQAQCwAEEGAMACBBkAAAsQZAAALECQAQCwwP8DsNqdAD50ApQAAAAASUVORK5CYII=", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "sns.displot(df1['age'])\n", "print('Age: minimum=', df1['age'].min(), '; maximum=', df1['age'].max(), '; median=', df1['age'].median())\n", "print('The most often ages:')\n", "print(df1['age'].value_counts().sort_values(ascending=False).iloc[0:5])" ] }, { "cell_type": "markdown", "id": "7cc6c1ce", "metadata": {}, "source": [ "> Age values seem to be fully valid with exception of missing values." ] }, { "cell_type": "markdown", "id": "b017c87c", "metadata": {}, "source": [ "### Dealing with missing and invalid data\n", "\n", "Now we use exploration outcomes for the data cleaning.\n", "\n", "**TASK 1.** \n", "Consider how to treat missing or invalid data of fare, embarkment, age and cabin. Then prepare a script for data cleaning." ] }, { "cell_type": "markdown", "id": "4167d850", "metadata": {}, "source": [ "### Feature extraction\n", "\n", "Multiple persons travelled on one ticket, so they can have the same fare which was paid only once. It's a reason to make new statistical units – tickets. But is data for the same ticket consistent? Let's check the integrity of data for the tickets.\n", "\n", "**TASK 2.** \n", "Explore whether all passengers with the same ticket have the same fare, pclass, embarkment and cabin." ] }, { "cell_type": "markdown", "id": "9d8caa76", "metadata": {}, "source": [ "After you explored the consistency of data to the common ticket (and you know how to solve possible inconsistency), make a **table of tickets** by few steps:\n", "\n", "1. Base table -- unique rows of *ticket*, *pclass*, *fare* (we know these data is consistent).\n", "2. Aggregated features grouped by *ticket* -- e. g. count of passengers; join aggregated table to the base table.\n", "3. 
Artificial aggregation as a solution of multiple embarkment -- we take the highest value of *embarked* to unify embarkment places for tickets.\n", "\n", "**TASK 3.** \n", "Make a table with tickets as rows and features (some of them aggregated). Choose useful features for future analysis by yourself." ] }, { "cell_type": "markdown", "id": "87466f9c", "metadata": {}, "source": [ "### Data transformation\n", "\n", "* The distribution of fare is very skew. Let's transform it by log to get it better balanced.\n", "* The fare is given as a total. But it's better to get an average fare per one passenger.\n", "\n", "**TASK 4.**\n", "Add new columns to the table as stated above." ] }, { "cell_type": "markdown", "id": "5995bcca", "metadata": {}, "source": [ "**TASK 5.**\n", "1. Make new columns as meaningful categories \"binned\" from count of passengers, mean age, count of distinct cabins.\n", "2. Make flags \"child\" and \"baby\": flag is True when the youngest passenger for a ticket was under 15, resp. under 3 years.\n", "3. Find the most often combinations of men and women travelling on one ticket (e. g. \"single man\", \"man+woman\", \"two men\", \"other\" etc.) and make a new column with category description." ] }, { "cell_type": "markdown", "id": "530fcd07", "metadata": {}, "source": [ "## Part II. Home credit\n", "The dataset `application_train.csv` contains Home Credit clients who got a loan. Each client (=row in the dataset) has plenty of data in columns. We are interested in the segmentation of client portfolio. Segmentation is a division the basic dataset into some well-defined segment, like \"young single men\", \"old widow women living alone\" etc.\n", "\n", "The relevant columns are *days_birth*, *code_gender*, *cnt_children*, *cnt_fam_members*, *name_family_status*.\n", "\n", "**TASK: look into data and try to find some big segments based on some features from the set of relevant columns. 
You may need to do some binning before.**" ] }, { "cell_type": "code", "execution_count": 8, "id": "26e91654", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n", "<table border=\"1\" class=\"dataframe\">\n", "  <thead>\n",
"    <tr style=\"text-align: right;\"><th></th><th>days_birth</th><th>code_gender</th><th>cnt_children</th><th>cnt_fam_members</th><th>name_family_status</th></tr>\n",
"  </thead>\n", "  <tbody>\n",
"    <tr><th>0</th><td>-9461</td><td>M</td><td>0</td><td>1.0</td><td>Single / not married</td></tr>\n",
"    <tr><th>1</th><td>-16765</td><td>F</td><td>0</td><td>2.0</td><td>Married</td></tr>\n",
"    <tr><th>2</th><td>-19046</td><td>M</td><td>0</td><td>1.0</td><td>Single / not married</td></tr>\n",
"    <tr><th>3</th><td>-19005</td><td>F</td><td>0</td><td>2.0</td><td>Civil marriage</td></tr>\n",
"    <tr><th>4</th><td>-19932</td><td>M</td><td>0</td><td>1.0</td><td>Single / not married</td></tr>\n",
"    <tr><th>...</th><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td></tr>\n",
"    <tr><th>307506</th><td>-9327</td><td>M</td><td>0</td><td>1.0</td><td>Separated</td></tr>\n",
"    <tr><th>307507</th><td>-20775</td><td>F</td><td>0</td><td>1.0</td><td>Widow</td></tr>\n",
"    <tr><th>307508</th><td>-14966</td><td>F</td><td>0</td><td>1.0</td><td>Separated</td></tr>\n",
"    <tr><th>307509</th><td>-11961</td><td>F</td><td>0</td><td>2.0</td><td>Married</td></tr>\n",
"    <tr><th>307510</th><td>-16856</td><td>F</td><td>0</td><td>2.0</td><td>Married</td></tr>\n",
"  </tbody>\n", "</table>\n", "<p>307511 rows × 5 columns</p>\n", "</div>
" ], "text/plain": [ " days_birth code_gender cnt_children cnt_fam_members \\\n", "0 -9461 M 0 1.0 \n", "1 -16765 F 0 2.0 \n", "2 -19046 M 0 1.0 \n", "3 -19005 F 0 2.0 \n", "4 -19932 M 0 1.0 \n", "... ... ... ... ... \n", "307506 -9327 M 0 1.0 \n", "307507 -20775 F 0 1.0 \n", "307508 -14966 F 0 1.0 \n", "307509 -11961 F 0 2.0 \n", "307510 -16856 F 0 2.0 \n", "\n", " name_family_status \n", "0 Single / not married \n", "1 Married \n", "2 Single / not married \n", "3 Civil marriage \n", "4 Single / not married \n", "... ... \n", "307506 Separated \n", "307507 Widow \n", "307508 Separated \n", "307509 Married \n", "307510 Married \n", "\n", "[307511 rows x 5 columns]" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_hc = pd.read_csv('application_train.csv')\n", "df_hc.columns = df_hc.columns.str.lower()\n", "df_hc = df_hc[['days_birth', 'code_gender', 'cnt_children', 'cnt_fam_members', 'name_family_status']]\n", "df_hc" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.18" } }, "nbformat": 4, "nbformat_minor": 5 }