{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Measuring user-perceived serendipity\n", "\n", "Your task is to devise a metric capable of measuring (or predicting, in case it includes a training part) to what extent the displayed results were serendipitous from the user's perspective. Serendipity is one of the key concepts behind truly successful recommendations, but on the other hand, it is notoriously difficult to quantify properly.\n", "\n", "At your disposal are the results of a user study conducted on the movie and book domains, where users were exposed to several iterations of recommendations (18 in total, supplied by 3 different algorithms). After each block of 6 iterations, users were asked to fill in a questionnaire including their perception of the recommendations' serendipity. Your task is to use the other available information (or collect some additional) to estimate this value.\n", "\n", "**Source data:**\n", "https://osf.io/chbj9/?view_only=460e981414d349a3b667c254e3b5632d\n", "\n", "**Work-in-progress report:** in the root folder. You probably only need to focus on sections 3 and 4 (study design and implemented algorithm variants), and possibly on RQ2 in the results.\n", "\n", "### Repository Content:\n", "**common** folder\n", "- common/df_books.csv, common/df_movies.csv: mapping between internal IDs and those used in the original datasets\n", "- df_elicitation_selections.csv: items selected during the preference elicitation phase of the study (i.e., the initial user profile)\n", "- common/books.csv: additional metadata we collected for books (not directly used for results presentation)\n", "- common/metadata_updated.json: additional metadata we collected for movies (not directly used for results presentation)\n", "\n", "**data** folder\n", "- distance_matrix_rating: the distance matrix of items w.r.t. 
cosine of their rating vectors (books and movies separately)\n", "- item_item: trained EASE model with item-item relatedness (books and movies separately)\n", "\n", "**serendipity_task** folder\n", "- serendipity_task/after_block_questionnaire.json: results for questionnaire including serendipity evaluation\n", "- serendipity_task/book_data_indexed_full.json: data used to present books*\n", "- serendipity_task/movie_data_indexed_full.json: data used to present movies*\n", "- serendipity_task/df_iteration_impressions_selections_serendipities.json: details about what users received and how they reacted\n", "\n", "\\* not all information, but a substantial part:-)" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "C:\\Users\\lpesk\\AppData\\Roaming\\Python\\Python38\\site-packages\\pandas\\core\\computation\\expressions.py:20: UserWarning: Pandas requires version '2.7.3' or newer of 'numexpr' (version '2.7.1' currently installed).\n", " from pandas.core.computation.check import NUMEXPR_INSTALLED\n" ] } ], "source": [ "import numpy as np\n", "import pandas as pd\n", "import json" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [], "source": [ "study_results = pd.read_json(\"serendipity_task/df_iteration_impressions_selections_serendipities.json\")\n", "questionnaire_results = pd.read_json(\"serendipity_task/after_block_questionnaire.json\")\n", "\n", "study_results[\"block\"] = study_results[\"iteration\"] // 6\n", "study_results[\"block_iteration\"] = study_results[\"iteration\"] % 6\n", "\n" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " index participation items \\\n", "0 0 61 [1218, 4682, 745, 12886, 9719, 3416, 6387, 279... \n", "1 1 61 [10199, 12644, 1996, 9609, 11478, 12514, 9267,... \n", "2 2 61 [6707, 7784, 12259, 9072, 10906, 10193, 4648, ... \n", "\n", " selected_items iteration cf_ild cb_ild bin_div \\\n", "0 [2796, 6387, 12886, 3416, 1723] 0 0.490715 0.533223 0.457035 \n", "1 [9312, 9609, 5524, 10199, 7736] 1 0.658099 0.508626 0.328032 \n", "2 [12259, 1800, 3948, 10193, 10906] 2 0.587088 0.507497 0.238507 \n", "\n", " relevance sel_cb_surprise sel_cb_unexpectedness sel_cf_unexpectedness \\\n", "0 18.308832 0.313333 0.723443 0.638649 \n", "1 14.200634 0.190000 0.636354 0.662195 \n", "2 12.025384 0.073333 0.634591 0.625639 \n", "\n", " sel_incremental_genre_coverage sel_pmi_unexpectedness block \\\n", "0 0.0 0.205209 0 \n", "1 0.0 0.258737 0 \n", "2 0.0 0.270986 0 \n", "\n", " block_iteration \n", "0 0 \n", "1 1 \n", "2 2 " ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "study_results.head(3)\n", "\n", "#participation: ID of a user\n", "#items: list of items displayed to the participant at this iteration\n", "#selected_items = list of items the user selected at this iteration\n", "#iteration: identification of the step in the study (0-17)\n", "# - the study was divided into 3 blocks (each served by a different algorithm), followed by the questionnaire\n", "# - block and block_iteration provide this information\n", "#all other columns represent individual metrics evaluated on the full list of recommendations, or on the selected items (\"sel\" prefix)\n", "# - check the paper for metrics definitions" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " index dataset participation block q1 q2 q3 q4 q5 q6 q7 \\\n", "0 0 movies 61 0 1.0 -1.0 1.0 -1.0 -1.0 -1.0 1.0 \n", "1 389 movies 63 0 3.0 -1.0 2.0 3.0 1.0 -2.0 1.0 \n", "2 1012 movies 61 1 -1.0 -3.0 -1.0 -1.0 2.0 -2.0 1.0 \n", "3 1370 movies 61 2 -1.0 1.0 1.0 -1.0 1.0 -1.0 1.0 \n", "4 1723 movies 63 1 2.0 1.0 -2.0 -2.0 -2.0 2.0 3.0 \n", "\n", " q8 q9 \n", "0 1.0 1.0 \n", "1 -2.0 3.0 \n", "2 -1.0 -1.0 \n", "3 1.0 1.0 \n", "4 3.0 2.0 " ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "questionnaire_results.head(5)\n", "#dataset: which dataset the participant worked on\n", "#participation: ID of the user\n", "#block: to which block this questionnaire results correspond\n", "#q1 - q9: responses to questions, 6-point Likert scale (strongly disagree = -3, somewhat agree = 1 etc.)\n", "# - nan correspond to \"I dont understand\" response\n", "# - q4 correspond to serendipity perception, i.e., this is your target value\n" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([ 1., 3., -1., 2., -3., -2., nan])" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# list of all questions: \n", "# AQ1 relevance perception The recommended movies matched my interests.\n", "# AQ2 novelty perception ... were mostly novel to me.\n", "# AQ3 diversity perception ... were highly different from each other.\n", "# AQ4 serendipity perception ... were unexpected yet interesting to me.\n", "# AQ5 exploration perception ... differed from my usual choices.\n", "# AQ6 exploitation perception ... were mostly similar to what I usually watch.\n", "# AQ7 popularity perception ... mostly popular (i.e., blockbusters).\n", "# AQ8 uniformity perception ... mostly similar to each other.\n", "# AQ9 overall satisfaction Overall, I am satisfied with the recommended movies." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Why is this a difficult task?\n", "- we tried several common serendipity metrics; none of them provided a reasonable correlation with the user-perceived serendipity\n", "- no other evaluated metric was any better either (we tried Pearson's correlation)\n", "![localImage](objective_after_block_q_correlations.png)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## How to start?\n", "- create some notion of item-to-item distance (or similarity)\n", " - we worked with rating vectors, genres, and plot embeddings; other alternatives are possible\n", "- evaluate the difference between the selected items and, e.g., the user profile known so far\n", " - other alternatives are possible, e.g., differences from the current offer, cluster analysis, uniqueness w.r.t. the full user profile, ...\n", " - there are several ways to aggregate the differences (we only tried the mean)\n", "- aggregate the results over all iterations of the block\n", " - we only tried mean values of the list-wise unexpectedness\n", "- evaluate the correspondence with the user-supplied values\n", " - we only tried Pearson's correlation over the whole dataset\n", " \n", "## Where to dig?\n", "We tried several different notions of item similarity, so although I believe there is still some space for further research there, perhaps there are also other possible areas of interest, namely:\n", "- How to aggregate the item-item similarity results (we only tried a simple mean, with no weighting or non-linearity whatsoever)\n", "- Train your serendipity models if you devise some parameters that need to be set\n", "- Challenge the assumption of what serendipity means. Check out some ideas by Brett Binst: https://www.serendipityengine.be/post/let-go-of-the-one-size-fits-all-definition-of-serendipity, https://dl.acm.org/doi/abs/10.1145/3640457.3688017\n", "- Challenge how we evaluate the correspondence. 
Perhaps we already have good results per user or per \[some other context\] and just do not consider them properly\n", "\n", "## What to always include?\n", "- Reasons and hypotheses for the steps you performed: why do you think this is a good idea?\n", "- Comparison with (some of the) existing methods (you already have their results)\n", "- Results and conclusions: how do you read the obtained data?\n", "\n", "## What should the semestral work look like?\n", "There are multiple possible ways to tackle this problem, so pick 2-3 promising directions that sound like fun to you and check them out (check them out = justify, code, evaluate, and analyze). Positive results are warmly welcomed, yet negative ones are expected (welcome to the real world:-). Though in the case of negative results, I might ask you to go a few steps further and check a few other options, and so on." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Template code to start with:" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "((6548, 6548), (6548, 6548))" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# loading data\n", "#item_item_movies = np.load(\"data/movieLens/item_item.npy\")\n", "item_item_books = np.load(\"data/goodbooks-10k/item_item.npy\")\n", "\n", "#distance_matrix_cf_movies = np.load(\"data/movieLens/distance_matrix_rating.npy\")\n", "distance_matrix_cf_books = np.load(\"data/goodbooks-10k/distance_matrix_rating.npy\")\n", "\n", "(item_item_books.shape, distance_matrix_cf_books.shape)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- note that other pre-computed distance matrices can be found at https://osf.io/9y8gx/ in the \"repro\" folder" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [], "source": [ "#unexpectedness of a single item\n", "def item_unexpectedness(user_history, item, distance_matrix):\n", " return 
distance_matrix[user_history, item].mean()\n", "\n", "#unexpectedness of a list of items, using collaborative distance matrix\n", "def list_unexpectedness_cf(user_history, rec_list, dataset):\n", " if len(rec_list) == 0 or len(user_history) == 0:\n", " return 0 \n", " if dataset == \"movies\":\n", " return np.array([item_unexpectedness(user_history, x, distance_matrix_cf_movies) for x in rec_list]).mean()\n", " elif dataset == \"books\":\n", " return np.array([item_unexpectedness(user_history, x, distance_matrix_cf_books) for x in rec_list]).mean()\n", "\n", "\n", "#usage in our codes:\n", "# for i, row in df.iterrows():\n", "# full_hist = np.unique(elicitation_hist + iteration_hist)\n", "# df.loc[i, \"sel_cf_unexpectedness\"] = list_unexpectedness_cf(full_hist, curr_iter_selections, row.dataset)" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0.6519418720830903" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#usage example\n", "item_unexpectedness([1,5],6,distance_matrix_cf_books)" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0.5961706642294122" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#usage example\n", "list_unexpectedness_cf([1,3,5,7], [2,4,6], \"books\")" ] }, { "cell_type": "code", "execution_count": 40, "metadata": {}, "outputs": [], "source": [ "NEG_INF = -1e6\n", "\n", "#estimated relevance of all items, using the knowledge of items previously selected by the user\n", "def predict_scores(item_item, all_items, selected_items, filter_out_items, k=10):\n", " user_vector = np.zeros(shape=(all_items.size,), dtype=item_item.dtype)\n", " if selected_items.size == 0:\n", " return np.zeros_like(user_vector)\n", " user_vector[selected_items] = 1\n", " probs = np.dot(user_vector, item_item)\n", " # Here the NEG_INF used for masking must be 
STRICTLY smaller than probs predicted by the algorithms\n", " # So that the masking works properly\n", " assert NEG_INF < probs.min()\n", " # Mask out selected items\n", " probs[selected_items] = NEG_INF\n", " # Mask out items to be filtered\n", " probs[filter_out_items] = NEG_INF\n", " return probs\n", "\n" ] }, { "cell_type": "code", "execution_count": 41, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "count 6548.000000\n", "mean -458.154907\n", "std 21402.509766\n", "min -1000000.000000\n", "25% -0.000657\n", "50% 0.000003\n", "75% 0.000671\n", "max 0.246466\n", "dtype: float64" ] }, "execution_count": 41, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#usage example\n", "all_books = pd.Series(range(0,item_item_books.shape[0]))\n", "probs = predict_scores(item_item_books, all_books, pd.Series([6,7]), pd.Series([2]))\n", "pd.Series(probs).describe()" ] }, { "cell_type": "code", "execution_count": 42, "metadata": {}, "outputs": [], "source": [ "from sklearn.preprocessing import QuantileTransformer\n", "\n", "#normalization of the relevance scores; top_k holds the indices of the items recommended to the user\n", "def relevance_normed_movies(top_k, selected_items, filter_out_items):\n", " rel_scores = predict_scores(item_item_movies, all_movies, selected_items, filter_out_items, top_k.size)\n", " rel_scores_normed = QuantileTransformer().fit_transform(rel_scores.reshape(-1, 1)).reshape(rel_scores.shape)\n", " return rel_scores_normed[top_k].mean()#.sum()\n", " \n", "#normalization of the relevance scores; top_k holds the indices of the items recommended to the user\n", "def relevance_normed_books(top_k, selected_items, filter_out_items):\n", " rel_scores = predict_scores(item_item_books, all_books, selected_items, filter_out_items, top_k.size)\n", " rel_scores_normed = QuantileTransformer().fit_transform(rel_scores.reshape(-1, 1)).reshape(rel_scores.shape)\n", " return rel_scores_normed[top_k].mean()#.sum()" ] 
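}, { "cell_type": "markdown", "metadata": {}, "source": [ "One common (illustrative) way to combine the building blocks above into a single per-item score is serendipity = unexpectedness x normalized relevance. This is a hedged sketch, not the formulation used in the study; `item_serendipity_books` is a hypothetical helper reusing the functions and matrices defined above." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#hedged sketch (illustrative, not from the paper): unexpectedness x relevance\n", "def item_serendipity_books(user_history, item, selected_items, filter_out_items):\n", "    unexp = item_unexpectedness(user_history, item, distance_matrix_cf_books)\n", "    rel = relevance_normed_books(pd.Series([item]), selected_items, filter_out_items)\n", "    return unexp * rel" ] 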
}, { "cell_type": "code", "execution_count": 45, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0.9987033" ] }, "execution_count": 45, "metadata": {}, "output_type": "execute_result" } ], "source": [ "normed_scores = relevance_normed_books(pd.Series([1,4,9]),pd.Series([6,7]), pd.Series([2]))\n", "normed_scores" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.5" } }, "nbformat": 4, "nbformat_minor": 4 }