{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Measuring user-perceived serendipity\n", "\n", "Your task is to devise a metric capable of measuring (or predicting, in case it includes a training part) to what extent the displayed results were serendipitous from the user's perspective. Serendipity is one of the key concepts behind truly successful recommendations, but on the other hand, it is notoriously difficult to quantify properly.\n", "\n", "At your disposal are the results of a user study conducted on the movie and book domains, where users were exposed to several iterations of recommendations (18 in total, supplied by 3 different algorithms). After each block of 6 iterations, users were asked to fill in a questionnaire including their perception of the recommendations' serendipity. Your task is to use the other available information (or collect some additional) to estimate this value.\n", "\n", "**Source data:**\n", "https://osf.io/chbj9/?view_only=460e981414d349a3b667c254e3b5632d\n", "\n", "**Work-in-progress report:** in the root folder. You probably only need to focus on sections 3 and 4 (study design and implemented algorithm variants), and possibly on RQ2 in the results.\n", "\n", "### Repository Content:\n", "**common** folder\n", "- common/df_books.csv, common/df_movies.csv: mapping between internal IDs and those used in the original datasets\n", "- df_elicitation_selections.csv: items selected during the preference elicitation phase of the study (i.e., the initial user profile)\n", "- common/books.csv: additional metadata we collected for books (not directly used for results presentation)\n", "- common/metadata_updated.json: additional metadata we collected for movies (not directly used for results presentation)\n", "\n", "**data** folder\n", "- distance_matrix_rating: the distance matrix of items w.r.t. 
cosine of their rating vectors (books and movies separately)\n", "- item_item: trained EASE model with item-item relatedness (books and movies separately)\n", "\n", "**serendipity_task** folder\n", "- serendipity_task/after_block_questionnaire.json: results for questionnaire including serendipity evaluation\n", "- serendipity_task/book_data_indexed_full.json: data used to present books*\n", "- serendipity_task/movie_data_indexed_full.json: data used to present movies*\n", "- serendipity_task/df_iteration_impressions_selections_serendipities.json: details about what users received and how they reacted\n", "\n", "\\* not all information, but a substantial part:-)" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "C:\\Users\\lpesk\\AppData\\Roaming\\Python\\Python38\\site-packages\\pandas\\core\\computation\\expressions.py:20: UserWarning: Pandas requires version '2.7.3' or newer of 'numexpr' (version '2.7.1' currently installed).\n", " from pandas.core.computation.check import NUMEXPR_INSTALLED\n" ] } ], "source": [ "import numpy as np\n", "import pandas as pd\n", "import json" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [], "source": [ "study_results = pd.read_json(\"serendipity_task/df_iteration_impressions_selections_serendipities.json\")\n", "questionnaire_results = pd.read_json(\"serendipity_task/after_block_questionnaire.json\")\n", "\n", "study_results[\"block\"] = study_results[\"iteration\"] // 6\n", "study_results[\"block_iteration\"] = study_results[\"iteration\"] % 6\n", "\n" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " index participation items \\\n", "0 0 61 [1218, 4682, 745, 12886, 9719, 3416, 6387, 279... \n", "1 1 61 [10199, 12644, 1996, 9609, 11478, 12514, 9267,... \n", "2 2 61 [6707, 7784, 12259, 9072, 10906, 10193, 4648, ... \n", "\n", " selected_items iteration cf_ild cb_ild bin_div \\\n", "0 [2796, 6387, 12886, 3416, 1723] 0 0.490715 0.533223 0.457035 \n", "1 [9312, 9609, 5524, 10199, 7736] 1 0.658099 0.508626 0.328032 \n", "2 [12259, 1800, 3948, 10193, 10906] 2 0.587088 0.507497 0.238507 \n", "\n", " relevance sel_cb_surprise sel_cb_unexpectedness sel_cf_unexpectedness \\\n", "0 18.308832 0.313333 0.723443 0.638649 \n", "1 14.200634 0.190000 0.636354 0.662195 \n", "2 12.025384 0.073333 0.634591 0.625639 \n", "\n", " sel_incremental_genre_coverage sel_pmi_unexpectedness block \\\n", "0 0.0 0.205209 0 \n", "1 0.0 0.258737 0 \n", "2 0.0 0.270986 0 \n", "\n", " block_iteration \n", "0 0 \n", "1 1 \n", "2 2 " ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "study_results.head(3)\n", "\n", "#participation: ID of a user\n", "#items: list of items displayed to the participant at this iteration\n", "#selected_items = list of items the user selected at this iteration\n", "#iteration: identification of the step in the study (0-17)\n", "# - the study was divided into 3 blocks (each served by a different algorithm), followed by the questionnaire\n", "# - block and block_iteration provide this information\n", "#all other columns represent individual metrics evaluated on the full list of recommendations, or on the selected items (\"sel\" prefix)\n", "# - check the paper for metrics definitions" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " index dataset participation block q1 q2 q3 q4 q5 q6 q7 \\\n", "0 0 movies 61 0 1.0 -1.0 1.0 -1.0 -1.0 -1.0 1.0 \n", "1 389 movies 63 0 3.0 -1.0 2.0 3.0 1.0 -2.0 1.0 \n", "2 1012 movies 61 1 -1.0 -3.0 -1.0 -1.0 2.0 -2.0 1.0 \n", "3 1370 movies 61 2 -1.0 1.0 1.0 -1.0 1.0 -1.0 1.0 \n", "4 1723 movies 63 1 2.0 1.0 -2.0 -2.0 -2.0 2.0 3.0 \n", "\n", " q8 q9 \n", "0 1.0 1.0 \n", "1 -2.0 3.0 \n", "2 -1.0 -1.0 \n", "3 1.0 1.0 \n", "4 3.0 2.0 " ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "questionnaire_results.head(5)\n", "#dataset: which dataset the participant worked on\n", "#participation: ID of the user\n", "#block: to which block this questionnaire results correspond\n", "#q1 - q9: responses to questions, 6-point Likert scale (strongly disagree = -3, somewhat agree = 1 etc.)\n", "# - nan correspond to \"I dont understand\" response\n", "# - q4 correspond to serendipity perception, i.e., this is your target value\n" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([ 1., 3., -1., 2., -3., -2., nan])" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# list of all questions: \n", "# AQ1 relevance perception The recommended movies matched my interests.\n", "# AQ2 novelty perception ... were mostly novel to me.\n", "# AQ3 diversity perception ... were highly different from each other.\n", "# AQ4 serendipity perception ... were unexpected yet interesting to me.\n", "# AQ5 exploration perception ... differed from my usual choices.\n", "# AQ6 exploitation perception ... were mostly similar to what I usually watch.\n", "# AQ7 popularity perception ... mostly popular (i.e., blockbusters).\n", "# AQ8 uniformity perception ... mostly similar to each other.\n", "# AQ9 overall satisfaction Overall, I am satisfied with the recommended movies." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Why is this a difficult task?\n", "- we tried several common serendipity metrics; none of them provided a reasonable correlation with the user-perceived serendipity\n", "- no other evaluated metric was any better either (we tried Pearson's correlation)\n", "![localImage](objective_after_block_q_correlations.png)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## How to start?\n", "- create some notion of item-to-item distance (or similarity)\n", " - we worked with rating vectors, genres, and plot embeddings; other alternatives are possible\n", "- evaluate the difference between the selected items and, e.g., the user profile known so far\n", " - other alternatives are possible, e.g., differences from the current offer, cluster analysis, uniqueness w.r.t. the full user profile, ...\n", " - there are several ways to aggregate the differences (we only tried the mean)\n", "- aggregate the results over all iterations of the block\n", " - we only tried mean values of the list-wise unexpectedness\n", "- evaluate the correspondence with the user-supplied values\n", " - we only tried Pearson's correlation over the whole dataset\n", " \n", "## Where to dig?\n", "We tried several different notions of item similarity, so although I believe there is still some space for further research there, perhaps there are also other possible areas of interest, namely:\n", "- How to aggregate the item-item similarity results (we only tried a simple mean, with no weighting or non-linearity whatsoever)\n", "- Train your serendipity models if you devise some parameters that need to be set\n", "- Challenge the assumption of what serendipity means. Check out some ideas by Brett Binst: https://www.serendipityengine.be/post/let-go-of-the-one-size-fits-all-definition-of-serendipity, https://dl.acm.org/doi/abs/10.1145/3640457.3688017\n", "- Challenge how we evaluate the correspondence. 
Perhaps we already have good results per user or per \[some other context\] and just do not consider them properly\n", "\n", "## What to always include?\n", "- Reasons and hypotheses for the steps you performed: why do you think this is a good idea?\n", "- Comparison with (some of the) existing methods (you already have their results)\n", "- Results and conclusions: how do you read the obtained data?\n", "\n", "## What should the semestral work look like?\n", "There are multiple possible ways to tackle this problem, so pick 2-3 promising directions that sound like fun to you and check them out (check them out = justify, code, evaluate, and analyze). Positive results are warmly welcomed, yet negative ones are expected (welcome to the real world:-). Though in the case of negative results, I might ask you to go a few steps further and check a few other options, and so on." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Template code to start with:" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "((6548, 6548), (6548, 6548))" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# loading data\n", "#item_item_movies = np.load(\"data/movieLens/item_item.npy\")\n", "item_item_books = np.load(\"data/goodbooks-10k/item_item.npy\")\n", "\n", "#distance_matrix_cf_movies = np.load(\"data/movieLens/distance_matrix_rating.npy\")\n", "distance_matrix_cf_books = np.load(\"data/goodbooks-10k/distance_matrix_rating.npy\")\n", "\n", "(item_item_books.shape, distance_matrix_cf_books.shape)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- note that other pre-computed distance matrices can be found at https://osf.io/9y8gx/ in the \"repro\" folder" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [], "source": [ "#unexpectedness of a single item\n", "def item_unexpectedness(user_history, item, distance_matrix):\n", " return 
distance_matrix[user_history, item].mean()\n", "\n", "#unexpectedness of a list of items, using collaborative distance matrix\n", "def list_unexpectedness_cf(user_history, rec_list, dataset):\n", " if len(rec_list) == 0 or len(user_history) == 0:\n", " return 0 \n", " if dataset == \"movies\":\n", " return np.array([item_unexpectedness(user_history, x, distance_matrix_cf_movies) for x in rec_list]).mean()\n", " elif dataset == \"books\":\n", " return np.array([item_unexpectedness(user_history, x, distance_matrix_cf_books) for x in rec_list]).mean()\n", "\n", "\n", "#usage in our codes:\n", "# for i, row in df.iterrows():\n", "# full_hist = np.unique(elicitation_hist + iteration_hist)\n", "# df.loc[i, \"sel_cf_unexpectedness\"] = list_unexpectedness_cf(full_hist, curr_iter_selections, row.dataset)" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0.6519418720830903" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#usage example\n", "item_unexpectedness([1,5],6,distance_matrix_cf_books)" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0.5961706642294122" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#usage example\n", "list_unexpectedness_cf([1,3,5,7], [2,4,6], \"books\")" ] }, { "cell_type": "code", "execution_count": 40, "metadata": {}, "outputs": [], "source": [ "NEG_INF = -1e6\n", "\n", "#estimated relevance of all items, using the knowledge of items previously selected by the user\n", "def predict_scores(item_item, all_items, selected_items, filter_out_items, k=10):\n", " user_vector = np.zeros(shape=(all_items.size,), dtype=item_item.dtype)\n", " if selected_items.size == 0:\n", " return np.zeros_like(user_vector)\n", " user_vector[selected_items] = 1\n", " probs = np.dot(user_vector, item_item)\n", " # Here the NEG_INF used for masking must be 
STRICTLY smaller than probs predicted by the algorithms\n", " # So that the masking works properly\n", " assert NEG_INF < probs.min()\n", " # Mask out selected items\n", " probs[selected_items] = NEG_INF\n", " # Mask out items to be filtered\n", " probs[filter_out_items] = NEG_INF\n", " return probs\n", "\n" ] }, { "cell_type": "code", "execution_count": 41, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "count 6548.000000\n", "mean -458.154907\n", "std 21402.509766\n", "min -1000000.000000\n", "25% -0.000657\n", "50% 0.000003\n", "75% 0.000671\n", "max 0.246466\n", "dtype: float64" ] }, "execution_count": 41, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#usage example\n", "all_books = pd.Series(range(0,item_item_books.shape[0]))\n", "probs = predict_scores(item_item_books, all_books, pd.Series([6,7]), pd.Series([2]))\n", "pd.Series(probs).describe()" ] }, { "cell_type": "code", "execution_count": 42, "metadata": {}, "outputs": [], "source": [ "from sklearn.preprocessing import QuantileTransformer\n", "\n", "#normalization of the relevance scores; top_k holds the indices of the items recommended to the user\n", "def relevance_normed_movies(top_k, selected_items, filter_out_items):\n", " rel_scores = predict_scores(item_item_movies, all_movies, selected_items, filter_out_items, top_k.size)\n", " rel_scores_normed = QuantileTransformer().fit_transform(rel_scores.reshape(-1, 1)).reshape(rel_scores.shape)\n", " return rel_scores_normed[top_k].mean()#.sum()\n", " \n", "#normalization of the relevance scores; top_k holds the indices of the items recommended to the user\n", "def relevance_normed_books(top_k, selected_items, filter_out_items):\n", " rel_scores = predict_scores(item_item_books, all_books, selected_items, filter_out_items, top_k.size)\n", " rel_scores_normed = QuantileTransformer().fit_transform(rel_scores.reshape(-1, 1)).reshape(rel_scores.shape)\n", " return rel_scores_normed[top_k].mean()#.sum()" ] 
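}, { "cell_type": "markdown", "metadata": {}, "source": [ "One common (illustrative) way to combine the building blocks above into a single per-item score is serendipity = unexpectedness x normalized relevance. This is a hedged sketch, not the formulation used in the study; `item_serendipity_books` is a hypothetical helper reusing the functions and matrices defined above." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#hedged sketch (illustrative, not from the paper): unexpectedness x relevance\n", "def item_serendipity_books(user_history, item, selected_items, filter_out_items):\n", "    unexp = item_unexpectedness(user_history, item, distance_matrix_cf_books)\n", "    rel = relevance_normed_books(pd.Series([item]), selected_items, filter_out_items)\n", "    return unexp * rel" ] 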
}, { "cell_type": "code", "execution_count": 45, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0.9987033" ] }, "execution_count": 45, "metadata": {}, "output_type": "execute_result" } ], "source": [ "normed_scores = relevance_normed_books(pd.Series([1,4,9]),pd.Series([6,7]), pd.Series([2]))\n", "normed_scores" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.5" } }, "nbformat": 4, "nbformat_minor": 4 }