{
"cells": [
{
"cell_type": "markdown",
"id": "72784d36",
"metadata": {},
"source": [
"# Working with study results in EasyStudy\n",
"- The data were collected for the following paper: https://dl.acm.org/doi/10.1145/3539618.3592056\n",
"- Normally, the data are stored in an SQLite database\n",
" -> use any suitable DB viewer, e.g. https://sqlitebrowser.org/\n",
" -> simple exports can save them to CSV or JSON\n",
" \n",
"Two main tables: **participants** and **interactions**\n",
"\n",
"## Study details:\n",
"- we evaluate two recommendation algorithms: Gamma = generalized matrix factorization, Delta = multi-objective RS (weighted average over relevance, novelty, and diversity objectives)\n",
"- at each iteration (8 in total), the results of both algorithms were displayed and arranged into rows or columns according to result_layout (several variants)\n",
" - at each iteration, users could select the items they were interested in, rate each algorithm (1-5 scale), and compare the two algorithms pairwise\n",
"- the original research questions concerned the suitability of individual result_layouts; here, the tasks are more aligned with your semester projects\n",
"\n",
"## Your task:\n",
"- Inspect the data (your semester projects will produce similar data)\n",
"- Perform simple evaluation of results\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "6d868b42",
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"import numpy as np\n",
"import os\n",
"import json\n",
"\n",
"import warnings\n",
"warnings.simplefilter(action='ignore', category=FutureWarning)"
]
},
{
"cell_type": "markdown",
"id": "1c8ae166",
"metadata": {},
"source": [
"# Pre-process Participants data\n",
"- this was already done for you, just FYI\n",
"- remove test users, remove sensitive data, only keep participants who completed the whole study"
]
},
{
"cell_type": "markdown",
"id": "1f1c2031",
"metadata": {},
"source": [
"df_participation = pd.read_csv(\"participation-export.csv\", index_col=0)\n",
"\n",
"#filter-out test users\n",
"\n",
"df_participation = df_participation.loc[31:]\n",
"df_participation.participant_email = df_participation.participant_email.fillna(\"\")\n",
"df_participation = df_participation[~df_participation.participant_email.str.contains(\"testuser\")]\n",
"df_participation.shape\n",
"\n",
"#filter-out unfinished participations\n",
"\n",
"df_participation = df_participation[df_participation.age_group.notna()]\n",
"df_completed_participation = df_participation[df_participation.time_finished.notna()]\n",
"df_uncompleted_participation = df_participation[df_participation.time_finished.isna()]\n",
"\n",
"#remove sensitive data and save for further usage\n",
"\n",
"df_completed_participation = df_completed_participation.drop([\"participant_email\",\"extra_data\"], axis=1)\n",
"df_completed_participation.to_csv(\"participation-export_filtered.csv\")"
]
},
{
"cell_type": "markdown",
"id": "e7b4f5a4",
"metadata": {},
"source": [
"# Pre-process Interactions\n",
"- this was already done for you, just FYI"
]
},
{
"cell_type": "markdown",
"id": "2292ae8f",
"metadata": {},
"source": [
"df_interaction = pd.read_json(\"interaction-export.json\", encoding='utf-8')\n",
"\n",
"def get_participants_interaction(df_i, df_p):\n",
" return df_i[df_i.participation.isin(df_p.index)]\n",
"\n",
"\n",
"#Only get interactions of non-dummy-data participants who completed the study\n",
"df_interaction = get_participants_interaction(df_interaction, df_completed_participation)\n",
"df_interaction.to_json(\"interaction-export_filtered.json\")\n",
"\n",
"#Toy dataset for faster downloads (using only 20 participants)\n",
"df_interactionTop20 = get_participants_interaction(df_interaction, df_completed_participation.iloc[0:20])\n",
"df_interactionTop20.to_json(\"interaction-export_filteredSmall.json\")\n"
]
},
{
"cell_type": "markdown",
"id": "b1d80635",
"metadata": {
"scrolled": true
},
"source": [
"# Load data"
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "f98fdecb",
"metadata": {},
"outputs": [],
"source": [
"df_interaction = pd.read_json(\"interaction-export_filtered.json\", encoding='utf-8')\n",
"df_completed_participation = pd.read_csv(\"participation-export_filtered.csv\", index_col=0)"
]
},
{
"cell_type": "code",
"execution_count": 15,
"id": "cb282f50",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" id participation interaction_type time \\\n",
"2718 3219 36 loaded-page 2023-01-15 23:13:22.508734 \n",
"2719 3220 36 changed-viewport 2023-01-15 23:13:27.502013 \n",
"2720 3221 36 on-input 2023-01-15 23:13:31.569834 \n",
"2721 3222 36 on-input 2023-01-15 23:13:31.580454 \n",
"2722 3223 36 selected-item 2023-01-15 23:13:32.643579 \n",
"\n",
" data \n",
"2718 {\"page\": \"preference_elicitation\", \"context\": ... \n",
"2719 {\"viewport\": {\"left\": 0, \"top\": 0, \"width\": 25... \n",
"2720 {\"search_text_box_value\": \"potter\", \"context\":... \n",
"2721 {\"id\": \"\", \"text\": \"Search\", \"name\": \"search\",... \n",
"2722 {\"selected_item\": {\"movieName\": \"Harry Potter ... "
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_interaction.head()"
]
},
{
"cell_type": "code",
"execution_count": 16,
"id": "06ed24dc",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array(['loaded-page', 'changed-viewport', 'on-input', 'selected-item',\n",
" 'elicitation-ended', 'iteration-started', 'iteration-ended',\n",
" 'study-ended', 'deselected-item'], dtype=object)"
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_interaction.interaction_type.unique()"
]
},
{
"cell_type": "code",
"execution_count": 17,
"id": "4b0d01e6",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" age_group | \n",
" gender | \n",
" education | \n",
" ml_familiar | \n",
" user_study_id | \n",
" time_joined | \n",
" time_finished | \n",
" uuid | \n",
" language | \n",
"
\n",
" \n",
" id | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
"
\n",
" \n",
" \n",
" \n",
" 36 | \n",
" 21.0 | \n",
" 0.0 | \n",
" 4.0 | \n",
" True | \n",
" 13 | \n",
" 2023-01-15 23:13:22.047657 | \n",
" 2023-01-15 23:20:18.022112 | \n",
" JQpLq2n0cE-86IMSaaIuig | \n",
" en | \n",
"
\n",
" \n",
" 37 | \n",
" 29.0 | \n",
" 0.0 | \n",
" 5.0 | \n",
" True | \n",
" 13 | \n",
" 2023-01-15 23:19:44.236601 | \n",
" 2023-01-15 23:27:10.655670 | \n",
" H43BQ14jGPykvJK-BPMPlg | \n",
" en | \n",
"
\n",
" \n",
" 39 | \n",
" 21.0 | \n",
" 0.0 | \n",
" 2.0 | \n",
" True | \n",
" 13 | \n",
" 2023-01-16 17:04:24.403909 | \n",
" 2023-01-16 17:20:09.564196 | \n",
" Bunsc02cUiXVrGE_1R7StQ | \n",
" en | \n",
"
\n",
" \n",
" 40 | \n",
" 21.0 | \n",
" 1.0 | \n",
" 2.0 | \n",
" False | \n",
" 13 | \n",
" 2023-01-16 18:59:21.828438 | \n",
" 2023-01-16 19:12:04.004289 | \n",
" 9fKA90WNfh5mzgD_v1Otgg | \n",
" cs | \n",
"
\n",
" \n",
" 42 | \n",
" 21.0 | \n",
" 0.0 | \n",
" 4.0 | \n",
" False | \n",
" 13 | \n",
" 2023-01-16 23:14:00.585090 | \n",
" 2023-01-16 23:23:47.331900 | \n",
" pVVFqHS8_nWp0uISHV61qg | \n",
" en | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" age_group gender education ml_familiar user_study_id \\\n",
"id \n",
"36 21.0 0.0 4.0 True 13 \n",
"37 29.0 0.0 5.0 True 13 \n",
"39 21.0 0.0 2.0 True 13 \n",
"40 21.0 1.0 2.0 False 13 \n",
"42 21.0 0.0 4.0 False 13 \n",
"\n",
" time_joined time_finished \\\n",
"id \n",
"36 2023-01-15 23:13:22.047657 2023-01-15 23:20:18.022112 \n",
"37 2023-01-15 23:19:44.236601 2023-01-15 23:27:10.655670 \n",
"39 2023-01-16 17:04:24.403909 2023-01-16 17:20:09.564196 \n",
"40 2023-01-16 18:59:21.828438 2023-01-16 19:12:04.004289 \n",
"42 2023-01-16 23:14:00.585090 2023-01-16 23:23:47.331900 \n",
"\n",
" uuid language \n",
"id \n",
"36 JQpLq2n0cE-86IMSaaIuig en \n",
"37 H43BQ14jGPykvJK-BPMPlg en \n",
"39 Bunsc02cUiXVrGE_1R7StQ en \n",
"40 9fKA90WNfh5mzgD_v1Otgg cs \n",
"42 pVVFqHS8_nWp0uISHV61qg en "
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_completed_participation.head()"
]
},
{
"cell_type": "markdown",
"id": "44ba1823",
"metadata": {},
"source": [
"# Enrich the interactions data frame\n",
"- retrieve the necessary information from the \"data\" payload, e.g.:\n",
" - which iteration an interaction belongs to\n",
" - which result layout was used\n",
" - how the algorithms were ordered (which one was in the advantaged position)"
]
},
{
"cell_type": "code",
"execution_count": 20,
"id": "c6cb1493",
"metadata": {},
"outputs": [],
"source": [
"N_ITERATIONS = 8\n",
"def get_iteration(x):\n",
" return json.loads(x)[\"iteration\"]"
]
},
{
"cell_type": "code",
"execution_count": 21,
"id": "cfb48855",
"metadata": {},
"outputs": [],
"source": [
"#This cell takes up to a minute or two to complete\n",
"import json\n",
"def set_iteration(row):\n",
" if row.interaction_type == \"iteration-started\" or row.interaction_type == \"iteration-ended\":\n",
" row['iteration'] = json.loads(row.data)['iteration']\n",
" else:\n",
" row['iteration'] = None\n",
" return row\n",
"\n",
"def set_result_layout(row):\n",
" if row.interaction_type == \"iteration-started\":\n",
" row['result_layout'] = json.loads(row.data)['result_layout']\n",
" else:\n",
" row['result_layout'] = None\n",
" return row\n",
"\n",
"#'algorithm_assignment': {'0': {'algorithm': 'relevance_based',\n",
"# 'name': 'gamma',\n",
"# 'order': 1},\n",
"# '1': {'algorithm': 'weighted_average', 'name': 'delta', 'order': 0}},\n",
"\n",
"def set_mapping(row):\n",
" if row.interaction_type == 'iteration-started':\n",
" dat = json.loads(row.data)['algorithm_assignment'].values()\n",
" for mapping in dat:\n",
" row[mapping['name'].upper()] = mapping['order']\n",
" else:\n",
" row['GAMMA'] = None\n",
" row['DELTA'] = None\n",
" return row\n",
"\n",
"\n",
"\n",
"\n",
"d = df_interaction.copy()\n",
"d = d.set_index(\"id\")\n",
"d = d.apply(set_iteration, axis=1).apply(set_result_layout, axis=1).apply(set_mapping, axis=1)\n",
"# forward-fill the iteration-level fields within each participation; groupby().ffill() keeps the original index\n",
"cols = ['iteration', 'result_layout', 'GAMMA', 'DELTA']\n",
"d[cols] = d.groupby('participation', sort=False)[cols].ffill()\n",
"d = d[d.iteration.notna()]"
]
},
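{
"cell_type": "markdown",
"id": "9c1e0a01",
"metadata": {},
"source": [
"The row-wise `apply` calls above parse the same JSON payload several times, which is likely why the cell is slow. Below is a minimal vectorized sketch of the same enrichment. It assumes, as in the records shown further below, that every `iteration-started` payload contains `iteration`, `result_layout` and `algorithm_assignment`, and every `iteration-ended` payload contains `iteration`. `order_of` and `enrich_interactions` are hypothetical helper names, and the demo uses a tiny hand-made frame so it runs without the export files. On the real data it would be called as `enrich_interactions(df_interaction.set_index('id'))`; it has not been checked against the pre-computed `interaction-export_filteredEnriched.json`."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9c1e0a02",
"metadata": {},
"outputs": [],
"source": [
"import json\n",
"\n",
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"\n",
"def order_of(payload, name):\n",
"    # Position of the named algorithm (0 = advantaged, 1 = disadvantaged)\n",
"    for assignment in payload['algorithm_assignment'].values():\n",
"        if assignment['name'] == name:\n",
"            return assignment['order']\n",
"    return np.nan\n",
"\n",
"\n",
"def enrich_interactions(df):\n",
"    # Hypothetical helper: vectorized variant of set_iteration / set_result_layout / set_mapping\n",
"    d = df.copy()\n",
"    is_bound = d['interaction_type'].isin(['iteration-started', 'iteration-ended'])\n",
"    # Parse each iteration-started / iteration-ended payload only once\n",
"    payload = d.loc[is_bound, 'data'].map(json.loads)\n",
"    starts = payload[d.loc[is_bound, 'interaction_type'] == 'iteration-started']\n",
"    d['iteration'] = payload.map(lambda p: p['iteration'])\n",
"    d['result_layout'] = starts.map(lambda p: p['result_layout'])\n",
"    d['GAMMA'] = starts.map(lambda p: order_of(p, 'gamma'))\n",
"    d['DELTA'] = starts.map(lambda p: order_of(p, 'delta'))\n",
"    # Forward-fill within each participant, then drop rows before the first iteration\n",
"    cols = ['iteration', 'result_layout', 'GAMMA', 'DELTA']\n",
"    d[cols] = d.groupby('participation', sort=False)[cols].ffill()\n",
"    return d[d['iteration'].notna()]\n",
"\n",
"\n",
"# Tiny hand-made frame with the same columns as df_interaction\n",
"start = {'iteration': 1, 'result_layout': 'rows',\n",
"         'algorithm_assignment': {'0': {'name': 'gamma', 'order': 1},\n",
"                                  '1': {'name': 'delta', 'order': 0}}}\n",
"toy = pd.DataFrame({\n",
"    'participation': [1, 1, 1, 1],\n",
"    'interaction_type': ['loaded-page', 'iteration-started', 'selected-item', 'iteration-ended'],\n",
"    'data': ['{}', json.dumps(start), '{}', json.dumps({'iteration': 1})],\n",
"})\n",
"enriched = enrich_interactions(toy)\n",
"print(enriched[['interaction_type', 'iteration', 'result_layout', 'GAMMA', 'DELTA']])"
]
},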
{
"cell_type": "code",
"execution_count": null,
"id": "7dee65e0",
"metadata": {},
"outputs": [],
"source": [
"# in case of problems with the code above, load the pre-computed enriched data instead:\n",
"d = pd.read_json(\"interaction-export_filteredEnriched.json\", encoding='utf-8')"
]
},
{
"cell_type": "code",
"execution_count": 22,
"id": "dbb8dbcc",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"participation 173\n",
"interaction_type iteration-started\n",
"time 2023-01-20 10:33:24.402124\n",
"data {\"iteration\": 3, \"weights\": [0.333333333333333...\n",
"iteration 3.0\n",
"result_layout rows\n",
"GAMMA 1.0\n",
"DELTA 0.0\n",
"Name: 26009, dtype: object"
]
},
"execution_count": 22,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# what does an iteration-started record look like?\n",
"d.loc[26009]"
]
},
{
"cell_type": "code",
"execution_count": 32,
"id": "c47be068",
"metadata": {},
"outputs": [],
"source": [
"# The GAMMA and DELTA fields indicate whether each algorithm was shown in the advantaged (0) or disadvantaged (1) position.\n",
"# - The advantaged position is the left column, the top row, and so on"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "ab2f48fe",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'iteration': 3,\n",
" 'weights': [0.33333333333333337, 0.33333333333333337, 0.33333333333333337],\n",
" 'movies': {'gamma': {'movies': [{'movie': 'Valerian and the City of a Thousand Planets (2017)',\n",
" 'url': '/assets/utils/ml-latest/img/173291.jpg',\n",
" 'movie_idx': '1462',\n",
" 'movie_id': 173291,\n",
" 'genres': ['Action', 'Adventure', 'Sci-Fi']},\n",
" {'movie': 'Transcendence (2014)',\n",
" 'url': '/assets/utils/ml-latest/img/110730.jpg',\n",
" 'movie_idx': '1072',\n",
" 'movie_id': 110730,\n",
" 'genres': ['Drama', 'Sci-Fi', 'IMAX']},\n",
" {'movie': 'Rampage (2018)',\n",
" 'url': '/assets/utils/ml-latest/img/186587.jpg',\n",
" 'movie_idx': '1665',\n",
" 'movie_id': 186587,\n",
" 'genres': ['Action', 'Adventure', 'Sci-Fi']},\n",
" {'movie': 'Resident Evil: The Final Chapter (2017)',\n",
" 'url': '/assets/utils/ml-latest/img/168498.jpg',\n",
" 'movie_idx': '1429',\n",
" 'movie_id': 168498,\n",
" 'genres': ['Action', 'Horror', 'Sci-Fi']},\n",
" {'movie': 'Project Almanac (2015)',\n",
" 'url': '/assets/utils/ml-latest/img/127096.jpg',\n",
" 'movie_idx': '1191',\n",
" 'movie_id': 127096,\n",
" 'genres': ['Sci-Fi', 'Thriller']},\n",
" {'movie': 'Pixels (2015)',\n",
" 'url': '/assets/utils/ml-latest/img/135137.jpg',\n",
" 'movie_idx': '1229',\n",
" 'movie_id': 135137,\n",
" 'genres': ['Action', 'Comedy', 'Sci-Fi']},\n",
" {'movie': 'Independence Day: Resurgence (2016)',\n",
" 'url': '/assets/utils/ml-latest/img/135567.jpg',\n",
" 'movie_idx': '1238',\n",
" 'movie_id': 135567,\n",
" 'genres': ['Action', 'Adventure', 'Sci-Fi']},\n",
" {'movie': 'Transformers: The Last Knight (2017)',\n",
" 'url': '/assets/utils/ml-latest/img/174585.jpg',\n",
" 'movie_idx': '1468',\n",
" 'movie_id': 174585,\n",
" 'genres': ['Action', 'Adventure', 'Sci-Fi', 'Thriller']},\n",
" {'movie': 'Hansel & Gretel: Witch Hunters (2013)',\n",
" 'url': '/assets/utils/ml-latest/img/100163.jpg',\n",
" 'movie_idx': '966',\n",
" 'movie_id': 100163,\n",
" 'genres': ['Action', 'Fantasy', 'Horror', 'IMAX']},\n",
" {'movie': 'Seven Sisters (2017)',\n",
" 'url': '/assets/utils/ml-latest/img/173925.jpg',\n",
" 'movie_idx': '1464',\n",
" 'movie_id': 173925,\n",
" 'genres': ['Sci-Fi', 'Thriller']}],\n",
" 'order': 1},\n",
" 'delta': {'movies': [{'movie': 'In Time (2011)',\n",
" 'url': '/assets/utils/ml-latest/img/90405.jpg',\n",
" 'movie_idx': '871',\n",
" 'movie_id': 90405,\n",
" 'genres': ['Crime', 'Sci-Fi', 'Thriller']},\n",
" {'movie': \"Ender's Game (2013)\",\n",
" 'url': '/assets/utils/ml-latest/img/106002.jpg',\n",
" 'movie_idx': '1034',\n",
" 'movie_id': 106002,\n",
" 'genres': ['Action', 'Adventure', 'Sci-Fi', 'IMAX']},\n",
" {'movie': 'Transcendence (2014)',\n",
" 'url': '/assets/utils/ml-latest/img/110730.jpg',\n",
" 'movie_idx': '1072',\n",
" 'movie_id': 110730,\n",
" 'genres': ['Drama', 'Sci-Fi', 'IMAX']},\n",
" {'movie': 'Lucy (2014)',\n",
" 'url': '/assets/utils/ml-latest/img/111360.jpg',\n",
" 'movie_idx': '1077',\n",
" 'movie_id': 111360,\n",
" 'genres': ['Action', 'Sci-Fi']},\n",
" {'movie': 'Passengers (2016)',\n",
" 'url': '/assets/utils/ml-latest/img/166635.jpg',\n",
" 'movie_idx': '1399',\n",
" 'movie_id': 166635,\n",
" 'genres': ['Adventure', 'Drama', 'Romance', 'Sci-Fi']},\n",
" {'movie': 'Riddick (2013)',\n",
" 'url': '/assets/utils/ml-latest/img/104243.jpg',\n",
" 'movie_idx': '1016',\n",
" 'movie_id': 104243,\n",
" 'genres': ['Action', 'Sci-Fi', 'Thriller', 'IMAX']},\n",
" {'movie': 'John Carter (2012)',\n",
" 'url': '/assets/utils/ml-latest/img/93363.jpg',\n",
" 'movie_idx': '900',\n",
" 'movie_id': 93363,\n",
" 'genres': ['Action', 'Adventure', 'Sci-Fi', 'IMAX']},\n",
" {'movie': 'Terminator Genisys (2015)',\n",
" 'url': '/assets/utils/ml-latest/img/120799.jpg',\n",
" 'movie_idx': '1170',\n",
" 'movie_id': 120799,\n",
" 'genres': ['Action', 'Adventure', 'Sci-Fi', 'Thriller']},\n",
" {'movie': 'Total Recall (2012)',\n",
" 'url': '/assets/utils/ml-latest/img/95875.jpg',\n",
" 'movie_idx': '923',\n",
" 'movie_id': 95875,\n",
" 'genres': ['Action', 'Sci-Fi', 'Thriller']},\n",
" {'movie': 'Elysium (2013)',\n",
" 'url': '/assets/utils/ml-latest/img/103253.jpg',\n",
" 'movie_idx': '998',\n",
" 'movie_id': 103253,\n",
" 'genres': ['Action', 'Drama', 'Sci-Fi', 'IMAX']}],\n",
" 'order': 0}},\n",
" 'algorithm_assignment': {'0': {'algorithm': 'relevance_based',\n",
" 'name': 'gamma',\n",
" 'order': 1},\n",
" '1': {'algorithm': 'weighted_average', 'name': 'delta', 'order': 0}},\n",
" 'result_layout': 'rows',\n",
" 'refinement_layout': '3',\n",
" 'shown': {'relevance_based': [[1288,\n",
" 1108,\n",
" 1327,\n",
" 972,\n",
" 1084,\n",
" 1367,\n",
" 1326,\n",
" 1204,\n",
" 1377,\n",
" 1107],\n",
" [1422, 1236, 1346, 1409, 1386, 1471, 1352, 1433, 1403, 1349],\n",
" [1462, 1072, 1665, 1429, 1191, 1229, 1238, 1468, 966, 1464]],\n",
" 'weighted_average': [[1084,\n",
" 1204,\n",
" 1163,\n",
" 972,\n",
" 954,\n",
" 973,\n",
" 797,\n",
" 1206,\n",
" 1275,\n",
" 1089],\n",
" [1054, 1230, 1205, 1213, 1154, 813, 733, 1117, 764, 1166],\n",
" [871, 1034, 1072, 1077, 1399, 1016, 900, 1170, 923, 998]]}}"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# the main payload resides in the \"data\" field\n",
"# among other things, it contains the lists of movies, in the order displayed, for both the gamma and delta algorithms\n",
"# the full history of shown items, including previous iterations, is available under \"shown\"\n",
"# movie_idx vs. movie_id: the item index in our reduced dataset vs. the original MovieLens ID\n",
"json.loads(d.loc[26009][\"data\"])"
]
},
{
"cell_type": "code",
"execution_count": 23,
"id": "90206979",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"participation 36\n",
"interaction_type iteration-ended\n",
"time 2023-01-15 23:19:22.851985\n",
"data {\"iteration\": 6, \"selected\": [[1040, 314, 355,...\n",
"iteration 6.0\n",
"result_layout columns\n",
"GAMMA 1.0\n",
"DELTA 0.0\n",
"Name: 3377, dtype: object"
]
},
"execution_count": 23,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# what does an iteration-ended record look like?\n",
"d.loc[3377]"
]
},
{
"cell_type": "code",
"execution_count": 24,
"id": "98463259",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'iteration': 6,\n",
" 'selected': [[1040, 314, 355, 956, 1231, 468],\n",
" [1155, 1228, 1142, 1039, 883],\n",
" [1140, 1054],\n",
" [293, 392, 885, 1345],\n",
" [1417, 329],\n",
" [345, 437]],\n",
" 'new_weights': [0.33333333333333337,\n",
" 0.33333333333333337,\n",
" 0.33333333333333337],\n",
" 'selected_variants': [[0, 0, 0, 0, 1, 1],\n",
" [0, 0, 0, 0, 0],\n",
" [1, 0],\n",
" [0, 0, 0, 0],\n",
" [0, 1],\n",
" [0, 0]],\n",
" 'dont_like_anything': [False, False, False, False, False, False],\n",
" 'algorithm_comparison': ['third',\n",
" 'third',\n",
" 'fourth',\n",
" 'second',\n",
" 'third',\n",
" 'first'],\n",
" 'ratings': [{'gamma': 4.0, 'delta': 4.0},\n",
" {'gamma': 4.0, 'delta': 4.0},\n",
" {'gamma': 2.0, 'delta': 3.0},\n",
" {'gamma': 2.0, 'delta': 3.0},\n",
" {'gamma': 2.0, 'delta': 2.0},\n",
" {'gamma': 1.0, 'delta': 3.0}]}"
]
},
"execution_count": 24,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"json.loads(d.loc[3377][\"data\"])\n",
"#selected_variants: was each click on the algorithm in the advantaged (0) or disadvantaged (1) position?\n",
"#selected: sequence of all selected items (movie_idx) in all iterations so far"
]
},
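{
"cell_type": "markdown",
"id": "9c1e0a03",
"metadata": {},
"source": [
"The `iteration-ended` payload above stores per-iteration lists (`ratings`, `algorithm_comparison`, `selected_variants`) that appear to grow by one entry per iteration. Below is a minimal sketch that turns one such payload into a long table with one row per iteration and algorithm, assuming that list position *i* corresponds to iteration *i + 1*. `ratings_to_long` is a hypothetical helper, and the two-iteration payload is made up in the same format as the record above. If the lists do accumulate, it should be enough on the real data to take the last `iteration-ended` record of each participant, e.g. `d[d.interaction_type == 'iteration-ended'].groupby('participation').tail(1)`."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9c1e0a04",
"metadata": {},
"outputs": [],
"source": [
"import json\n",
"\n",
"import pandas as pd\n",
"\n",
"\n",
"def ratings_to_long(payload_json, participation):\n",
"    # Hypothetical helper; assumes list position i corresponds to iteration i + 1\n",
"    payload = json.loads(payload_json)\n",
"    rows = []\n",
"    for i, rating in enumerate(payload['ratings']):\n",
"        for algorithm, score in rating.items():\n",
"            rows.append({'participation': participation,\n",
"                         'iteration': i + 1,\n",
"                         'algorithm': algorithm.upper(),\n",
"                         'rating': score,\n",
"                         'comparison': payload['algorithm_comparison'][i]})\n",
"    return pd.DataFrame(rows)\n",
"\n",
"\n",
"# Made-up two-iteration payload in the same format as the record above\n",
"example = json.dumps({\n",
"    'iteration': 2,\n",
"    'algorithm_comparison': ['third', 'fourth'],\n",
"    'ratings': [{'gamma': 4.0, 'delta': 4.0}, {'gamma': 2.0, 'delta': 3.0}],\n",
"})\n",
"long_ratings = ratings_to_long(example, participation=36)\n",
"print(long_ratings.groupby('algorithm')['rating'].mean())"
]
},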
{
"cell_type": "markdown",
"id": "6f9097db",
"metadata": {},
"source": [
"# Keep only the item-selection interactions\n",
"- alternatively, similar information (without the order of the clicks) can be collected from the iteration-ended records"
]
},
{
"cell_type": "code",
"execution_count": 25,
"id": "2599e54f",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"(30305, 9)\n",
"(30264, 9)\n"
]
}
],
"source": [
"# adding information on whether the selected item was displayed on an advantaged position (variant=0), or not (variant=1)\n",
"d[\"variant\"] = -1\n",
"print(d.shape)\n",
"d.loc[d[\"interaction_type\"] == \"selected-item\", \"variant\"] = d[d[\"interaction_type\"] == \"selected-item\"].data.map(lambda x: json.loads(x)[\"selected_item\"]).map(lambda x: x.get(\"variant\", -1))\n",
"d = d.loc[d.iteration <= 8] \n",
"print(d.shape)"
]
},
{
"cell_type": "code",
"execution_count": 26,
"id": "e1dba16d",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(6980, 9)"
]
},
"execution_count": 26,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"selected_item_interactions = d[d.variant >= 0].copy()\n",
"selected_item_interactions.shape"
]
},
{
"cell_type": "code",
"execution_count": 27,
"id": "a12ab6a5",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" participation | \n",
" interaction_type | \n",
" time | \n",
" data | \n",
" iteration | \n",
" result_layout | \n",
" GAMMA | \n",
" DELTA | \n",
" variant | \n",
"
\n",
" \n",
" id | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
"
\n",
" \n",
" \n",
" \n",
" 3271 | \n",
" 36 | \n",
" selected-item | \n",
" 2023-01-15 23:15:59.742434 | \n",
" {\"selected_item\": {\"genres\": [\"Adventure\", \"Fa... | \n",
" 1.0 | \n",
" row-single-scrollable | \n",
" 1.0 | \n",
" 0.0 | \n",
" 0 | \n",
"
\n",
" \n",
" 3272 | \n",
" 36 | \n",
" selected-item | \n",
" 2023-01-15 23:16:00.293013 | \n",
" {\"selected_item\": {\"genres\": [\"Adventure\", \"Fa... | \n",
" 1.0 | \n",
" row-single-scrollable | \n",
" 1.0 | \n",
" 0.0 | \n",
" 0 | \n",
"
\n",
" \n",
" 3273 | \n",
" 36 | \n",
" selected-item | \n",
" 2023-01-15 23:16:00.636370 | \n",
" {\"selected_item\": {\"genres\": [\"Action\", \"Adven... | \n",
" 1.0 | \n",
" row-single-scrollable | \n",
" 1.0 | \n",
" 0.0 | \n",
" 0 | \n",
"
\n",
" \n",
" 3274 | \n",
" 36 | \n",
" selected-item | \n",
" 2023-01-15 23:16:02.500438 | \n",
" {\"selected_item\": {\"genres\": [\"Adventure\", \"Fa... | \n",
" 1.0 | \n",
" row-single-scrollable | \n",
" 1.0 | \n",
" 0.0 | \n",
" 0 | \n",
"
\n",
" \n",
" 3277 | \n",
" 36 | \n",
" selected-item | \n",
" 2023-01-15 23:16:09.472159 | \n",
" {\"selected_item\": {\"genres\": [\"Fantasy\"], \"mov... | \n",
" 1.0 | \n",
" row-single-scrollable | \n",
" 1.0 | \n",
" 0.0 | \n",
" 1 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" participation interaction_type time \\\n",
"id \n",
"3271 36 selected-item 2023-01-15 23:15:59.742434 \n",
"3272 36 selected-item 2023-01-15 23:16:00.293013 \n",
"3273 36 selected-item 2023-01-15 23:16:00.636370 \n",
"3274 36 selected-item 2023-01-15 23:16:02.500438 \n",
"3277 36 selected-item 2023-01-15 23:16:09.472159 \n",
"\n",
" data iteration \\\n",
"id \n",
"3271 {\"selected_item\": {\"genres\": [\"Adventure\", \"Fa... 1.0 \n",
"3272 {\"selected_item\": {\"genres\": [\"Adventure\", \"Fa... 1.0 \n",
"3273 {\"selected_item\": {\"genres\": [\"Action\", \"Adven... 1.0 \n",
"3274 {\"selected_item\": {\"genres\": [\"Adventure\", \"Fa... 1.0 \n",
"3277 {\"selected_item\": {\"genres\": [\"Fantasy\"], \"mov... 1.0 \n",
"\n",
" result_layout GAMMA DELTA variant \n",
"id \n",
"3271 row-single-scrollable 1.0 0.0 0 \n",
"3272 row-single-scrollable 1.0 0.0 0 \n",
"3273 row-single-scrollable 1.0 0.0 0 \n",
"3274 row-single-scrollable 1.0 0.0 0 \n",
"3277 row-single-scrollable 1.0 0.0 1 "
]
},
"execution_count": 27,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"selected_item_interactions.head()"
]
},
{
"cell_type": "code",
"execution_count": 28,
"id": "b3cebbff",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'selected_item': {'genres': ['Action', 'Fantasy', 'Horror', 'Thriller'],\n",
" 'movie': 'Underworld: Vzpoura Lycanů (2009) Akční|Fantasy|Horor|Thriller',\n",
" 'movie_id': 65682,\n",
" 'movie_idx': '650',\n",
" 'url': '/assets/utils/ml-latest/img/65682.jpg',\n",
" 'variant': 1},\n",
" 'selected_items': [{'genres': ['Action', 'Comedy', 'IMAX'],\n",
" 'movie': 'Noc v muzeu 2 (2009) Akční|Komedie|IMAX',\n",
" 'movie_id': 68793,\n",
" 'movie_idx': '674',\n",
" 'url': '/assets/utils/ml-latest/img/68793.jpg',\n",
" 'variant': 0},\n",
" {'genres': ['Action', 'Fantasy', 'Horror', 'IMAX'],\n",
" 'movie': 'Underworld: Probuzení (2012) Akční|Fantasy|Horor|IMAX',\n",
" 'movie_id': 91974,\n",
" 'movie_idx': '890',\n",
" 'url': '/assets/utils/ml-latest/img/91974.jpg',\n",
" 'variant': 0},\n",
" {'genres': ['Action', 'Fantasy', 'Horror'],\n",
" 'movie': 'Underworld: Evolution (2006) Akční|Fantasy|Horor',\n",
" 'movie_id': 42738,\n",
" 'movie_idx': '472',\n",
" 'url': '/assets/utils/ml-latest/img/42738.jpg',\n",
" 'variant': 1},\n",
" {'genres': ['Action', 'Fantasy', 'Horror', 'Thriller'],\n",
" 'movie': 'Underworld: Vzpoura Lycanů (2009) Akční|Fantasy|Horor|Thriller',\n",
" 'movie_id': 65682,\n",
" 'movie_idx': '650',\n",
" 'url': '/assets/utils/ml-latest/img/65682.jpg',\n",
" 'variant': 1}],\n",
" 'context': {'url': 'http://hmon.ms.mff.cuni.cz:5000/plugin1/compare-algorithms',\n",
" 'time': '2023-02-05T18:46:36.111Z',\n",
" 'viewport': {'left': 0,\n",
" 'top': -620,\n",
" 'width': 1249.3333740234375,\n",
" 'height': 1856.3958740234375},\n",
" 'extra': {'variant': 1}}}"
]
},
"execution_count": 28,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# information available for the selected-item interaction_type\n",
"#selected_item: the current selection\n",
"#selected_items: the currently selected items, including the current one\n",
"json.loads(selected_item_interactions.loc[47768][\"data\"])"
]
},
{
"cell_type": "code",
"execution_count": 29,
"id": "94a4f5b0",
"metadata": {},
"outputs": [],
"source": [
"# adding information on the corresponding MovieLens movie ID\n",
"def getSelectedMovieId(x):\n",
"    return json.loads(x)[\"selected_item\"][\"movie_id\"]\n",
"\n",
"selected_item_interactions[\"movieID\"] = selected_item_interactions.data.map(getSelectedMovieId)"
]
},
{
"cell_type": "code",
"execution_count": 30,
"id": "cc96cb74",
"metadata": {},
"outputs": [],
"source": [
"# adding information on which algorithm is responsible for the selection\n",
"selected_item_interactions[\"selected_algorithm\"] = \"GAMMA\"\n",
"selected_item_interactions.loc[selected_item_interactions.variant == selected_item_interactions.DELTA, \"selected_algorithm\"] = \"DELTA\""
]
},
{
"cell_type": "code",
"execution_count": 31,
"id": "7520b829",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" participation | \n",
" interaction_type | \n",
" time | \n",
" data | \n",
" iteration | \n",
" result_layout | \n",
" GAMMA | \n",
" DELTA | \n",
" variant | \n",
" movieID | \n",
" selected_algorithm | \n",
"
\n",
" \n",
" id | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
"
\n",
" \n",
" \n",
" \n",
" 3271 | \n",
" 36 | \n",
" selected-item | \n",
" 2023-01-15 23:15:59.742434 | \n",
" {\"selected_item\": {\"genres\": [\"Adventure\", \"Fa... | \n",
" 1.0 | \n",
" row-single-scrollable | \n",
" 1.0 | \n",
" 0.0 | \n",
" 0 | \n",
" 106489 | \n",
" DELTA | \n",
"
\n",
" \n",
" 3272 | \n",
" 36 | \n",
" selected-item | \n",
" 2023-01-15 23:16:00.293013 | \n",
" {\"selected_item\": {\"genres\": [\"Adventure\", \"Fa... | \n",
" 1.0 | \n",
" row-single-scrollable | \n",
" 1.0 | \n",
" 0.0 | \n",
" 0 | \n",
" 5952 | \n",
" DELTA | \n",
"
\n",
" \n",
" 3273 | \n",
" 36 | \n",
" selected-item | \n",
" 2023-01-15 23:16:00.636370 | \n",
" {\"selected_item\": {\"genres\": [\"Action\", \"Adven... | \n",
" 1.0 | \n",
" row-single-scrollable | \n",
" 1.0 | \n",
" 0.0 | \n",
" 0 | \n",
" 7153 | \n",
" DELTA | \n",
"
\n",
" \n",
" 3274 | \n",
" 36 | \n",
" selected-item | \n",
" 2023-01-15 23:16:02.500438 | \n",
" {\"selected_item\": {\"genres\": [\"Adventure\", \"Fa... | \n",
" 1.0 | \n",
" row-single-scrollable | \n",
" 1.0 | \n",
" 0.0 | \n",
" 0 | \n",
" 98809 | \n",
" DELTA | \n",
"
\n",
" \n",
" 3277 | \n",
" 36 | \n",
" selected-item | \n",
" 2023-01-15 23:16:09.472159 | \n",
" {\"selected_item\": {\"genres\": [\"Fantasy\"], \"mov... | \n",
" 1.0 | \n",
" row-single-scrollable | \n",
" 1.0 | \n",
" 0.0 | \n",
" 1 | \n",
" 135143 | \n",
" GAMMA | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" participation interaction_type time \\\n",
"id \n",
"3271 36 selected-item 2023-01-15 23:15:59.742434 \n",
"3272 36 selected-item 2023-01-15 23:16:00.293013 \n",
"3273 36 selected-item 2023-01-15 23:16:00.636370 \n",
"3274 36 selected-item 2023-01-15 23:16:02.500438 \n",
"3277 36 selected-item 2023-01-15 23:16:09.472159 \n",
"\n",
" data iteration \\\n",
"id \n",
"3271 {\"selected_item\": {\"genres\": [\"Adventure\", \"Fa... 1.0 \n",
"3272 {\"selected_item\": {\"genres\": [\"Adventure\", \"Fa... 1.0 \n",
"3273 {\"selected_item\": {\"genres\": [\"Action\", \"Adven... 1.0 \n",
"3274 {\"selected_item\": {\"genres\": [\"Adventure\", \"Fa... 1.0 \n",
"3277 {\"selected_item\": {\"genres\": [\"Fantasy\"], \"mov... 1.0 \n",
"\n",
" result_layout GAMMA DELTA variant movieID selected_algorithm \n",
"id \n",
"3271 row-single-scrollable 1.0 0.0 0 106489 DELTA \n",
"3272 row-single-scrollable 1.0 0.0 0 5952 DELTA \n",
"3273 row-single-scrollable 1.0 0.0 0 7153 DELTA \n",
"3274 row-single-scrollable 1.0 0.0 0 98809 DELTA \n",
"3277 row-single-scrollable 1.0 0.0 1 135143 GAMMA "
]
},
"execution_count": 31,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"selected_item_interactions.head()"
]
},
{
"cell_type": "markdown",
"id": "0fc44299",
"metadata": {},
"source": [
"## Task 1: which algorithm (GAMMA or DELTA) attracted more selections?\n",
"- apply a simple groupby and count\n",
"- what kind of test would you apply to determine whether the differences are statistically significant?"
]
},
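{
"cell_type": "markdown",
"id": "9c1e0a05",
"metadata": {},
"source": [
"A possible starting point for Task 1, shown on a toy frame with the same columns as `selected_item_interactions` (`participation`, `selected_algorithm`); the counts are made up. Because each participant contributes many clicks, the clicks are not independent. The sketch therefore contrasts a naive binomial test on the pooled counts with a paired Wilcoxon signed-rank test on per-participant counts. These are two reasonable options, not the method used in the paper. It assumes SciPy is installed (`scipy.stats.binomtest` requires SciPy 1.7 or newer)."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9c1e0a06",
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"from scipy import stats\n",
"\n",
"# Toy stand-in for selected_item_interactions (one row per click, made-up values)\n",
"clicks = pd.DataFrame({\n",
"    'participation': [1, 1, 1, 2, 2, 3, 3, 3, 3, 4, 4, 5],\n",
"    'selected_algorithm': ['GAMMA', 'DELTA', 'DELTA', 'DELTA', 'DELTA', 'GAMMA',\n",
"                           'DELTA', 'DELTA', 'DELTA', 'DELTA', 'GAMMA', 'DELTA'],\n",
"})\n",
"\n",
"# Pooled number of clicks per algorithm\n",
"totals = clicks.groupby('selected_algorithm').size()\n",
"print(totals)\n",
"\n",
"# Naive test: under H0 every click goes to either algorithm with probability 0.5\n",
"pooled = stats.binomtest(int(totals['DELTA']), n=int(totals.sum()), p=0.5)\n",
"print('binomial test p-value:', pooled.pvalue)\n",
"\n",
"# Paired test on per-participant click counts (0 if an algorithm was never clicked)\n",
"per_user = clicks.groupby(['participation', 'selected_algorithm']).size().unstack(fill_value=0)\n",
"paired = stats.wilcoxon(per_user['DELTA'], per_user['GAMMA'])\n",
"print('Wilcoxon signed-rank p-value:', paired.pvalue)"
]
},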
{
"cell_type": "markdown",
"id": "791b4e15",
"metadata": {},
"source": [
"### Task 1.1: were there any differences depending on whether the algorithm was displayed in the advantaged or disadvantaged position?"
]
},
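{
"cell_type": "markdown",
"id": "9c1e0a07",
"metadata": {},
"source": [
"For Task 1.1, a contingency table of `selected_algorithm` versus `variant` (0 = advantaged, 1 = disadvantaged position) summarizes the clicks. A chi-square test of independence (`scipy.stats.chi2_contingency`) is one way to check whether the clicked algorithm is associated with the position it was shown in. The toy counts below are made up, and the caveat that clicks of one participant are not independent still applies."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9c1e0a08",
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"from scipy.stats import chi2_contingency\n",
"\n",
"# Toy stand-in: algorithm and position (variant) of each click, made-up values\n",
"clicks = pd.DataFrame({\n",
"    'selected_algorithm': ['GAMMA'] * 10 + ['DELTA'] * 14,\n",
"    'variant': [0] * 7 + [1] * 3 + [0] * 8 + [1] * 6,\n",
"})\n",
"\n",
"table = pd.crosstab(clicks['selected_algorithm'], clicks['variant'])\n",
"print(table)\n",
"\n",
"# Share of each algorithm's clicks in the advantaged (0) and disadvantaged (1) position\n",
"print(table.div(table.sum(axis=1), axis=0))\n",
"\n",
"chi2, p_value, dof, expected = chi2_contingency(table)\n",
"print(f'chi2={chi2:.3f}, p={p_value:.3f}, dof={dof}')"
]
},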
{
"cell_type": "markdown",
"id": "a3a890c3",
"metadata": {},
"source": [
"### Task 1.2: were there any differences w.r.t. result_layout in combination with the (dis)advantaged position?"
]
},
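{
"cell_type": "markdown",
"id": "9c1e0a09",
"metadata": {},
"source": [
"For Task 1.2, the same breakdown can be split by `result_layout`, for example as the share of each algorithm's clicks that landed in the advantaged position for every layout. The layout names here (`rows`, `columns`) come from the records above; the clicks themselves are made up. A test such as the chi-square test of independence could then be run separately for each layout."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9c1e0a10",
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Toy stand-in with made-up clicks\n",
"clicks = pd.DataFrame({\n",
"    'result_layout': ['rows'] * 6 + ['columns'] * 6,\n",
"    'selected_algorithm': ['GAMMA', 'GAMMA', 'DELTA', 'DELTA', 'DELTA', 'GAMMA',\n",
"                           'DELTA', 'GAMMA', 'GAMMA', 'DELTA', 'DELTA', 'DELTA'],\n",
"    'variant': [0, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 0],\n",
"})\n",
"\n",
"# Fraction of clicks in the advantaged position (variant == 0) per layout and algorithm\n",
"advantaged_share = (\n",
"    clicks.assign(advantaged=clicks['variant'] == 0)\n",
"    .groupby(['result_layout', 'selected_algorithm'])['advantaged']\n",
"    .mean()\n",
"    .unstack()\n",
")\n",
"print(advantaged_share)"
]
},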
{
"cell_type": "markdown",
"id": "c2cf2e55",
"metadata": {},
"source": [
"## Task 2: what are the average DCG scores for GAMMA and DELTA?\n",
"- you will need to link selections to the positions of the displayed items; these can be obtained from the iteration-started interactions"
]
},
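{
"cell_type": "markdown",
"id": "9c1e0a11",
"metadata": {},
"source": [
"A minimal DCG sketch for Task 2. It assumes binary relevance (an item counts as relevant if the participant selected it from that algorithm's list in the same iteration) and the common logarithmic discount, DCG = sum over positions i = 1..k of rel_i / log2(i + 1). It also assumes that the order of the `movies` lists in the `iteration-started` payload is the display order. `movie_idx` is a string in that payload (e.g. `'1462'`) but an integer in `selected`, hence the cast. Because the same movie can appear in both lists (e.g. movie_idx 1072 in the record above), clicks are attributed to an algorithm via `selected_algorithm` rather than by `movie_idx` alone; its values are upper-case while the payload keys are lower-case. `dcg` and `dcg_per_algorithm` are hypothetical helpers. On the real data, the scores would be computed per participant and iteration and then averaged per algorithm; dividing by the ideal DCG would give nDCG if normalization is preferred."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9c1e0a12",
"metadata": {},
"outputs": [],
"source": [
"import json\n",
"import math\n",
"\n",
"\n",
"def dcg(ranked_items, relevant):\n",
"    # Binary-relevance DCG: a relevant item at 1-based position i adds 1 / log2(i + 1)\n",
"    return sum(1.0 / math.log2(i + 1)\n",
"               for i, item in enumerate(ranked_items, start=1)\n",
"               if item in relevant)\n",
"\n",
"\n",
"def dcg_per_algorithm(start_payload_json, selected_by_algorithm):\n",
"    # movie_idx is a string in the iteration-started payload, so cast it to int\n",
"    movies = json.loads(start_payload_json)['movies']\n",
"    scores = {}\n",
"    for name, block in movies.items():\n",
"        ranked = [int(m['movie_idx']) for m in block['movies']]\n",
"        scores[name] = dcg(ranked, selected_by_algorithm.get(name, set()))\n",
"    return scores\n",
"\n",
"\n",
"# Abbreviated iteration-started payload (movie_idx values from the record above)\n",
"payload = json.dumps({'movies': {\n",
"    'gamma': {'movies': [{'movie_idx': '1462'}, {'movie_idx': '1072'}, {'movie_idx': '1665'}]},\n",
"    'delta': {'movies': [{'movie_idx': '871'}, {'movie_idx': '1034'}, {'movie_idx': '1072'}]},\n",
"}})\n",
"# Hypothetical clicks in that iteration, already split by selected_algorithm (lower-cased)\n",
"selected = {'gamma': {1665}, 'delta': {871, 1072}}\n",
"scores = dcg_per_algorithm(payload, selected)\n",
"print(scores)"
]
},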
{
"cell_type": "markdown",
"id": "c92ddc86",
"metadata": {},
"source": [
"## Task 3: do GAMMA and DELTA differ substantially in the diversity, novelty, or popularity lift of the provided recommendations?\n",
"- consider diversity w.r.t. movie genres, e.g., using intra-list diversity (ILD)\n",
" - or, alternatively, focus on the divergence of the overall genre distributions (e.g., KL divergence)\n",
"- consider the mean release year as a proxy for novelty\n",
"- alternatively, employ consumption statistics from the https://grouplens.org/datasets/movielens/latest/ (large) dataset (suitable for novelty, popularity lift, and collaborative diversity)"
]
},
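{
"cell_type": "markdown",
"id": "9c1e0a13",
"metadata": {},
"source": [
"A sketch of three of the measures suggested for Task 3, computed on single recommendation lists:\n",
"- intra-list diversity (ILD) as the mean pairwise Jaccard distance between genre sets (one of several possible distance choices)\n",
"- the KL divergence between the genre distributions of two lists, with additive smoothing (an assumption needed to keep the divergence finite when a genre is missing from one list)\n",
"- the mean release year parsed from the title as a novelty proxy; the parser assumes the year appears in parentheses, as in both the English and the Czech titles above\n",
"\n",
"Genres and titles are the first three movies of each list in the `iteration-started` record above; all function names are hypothetical. Popularity lift and collaborative diversity would additionally need the MovieLens consumption statistics and are not covered here."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9c1e0a14",
"metadata": {},
"outputs": [],
"source": [
"import itertools\n",
"import math\n",
"import re\n",
"from collections import Counter\n",
"\n",
"\n",
"def jaccard_distance(a, b):\n",
"    # 1 - |A & B| / |A | B|; two empty genre sets are treated as identical\n",
"    union = a | b\n",
"    return 1.0 - len(a & b) / len(union) if union else 0.0\n",
"\n",
"\n",
"def ild(genre_lists):\n",
"    # Intra-list diversity: mean pairwise Jaccard distance of the genre sets\n",
"    pairs = list(itertools.combinations([set(g) for g in genre_lists], 2))\n",
"    if not pairs:\n",
"        return 0.0\n",
"    return sum(jaccard_distance(a, b) for a, b in pairs) / len(pairs)\n",
"\n",
"\n",
"def genre_kl(list_p, list_q, alpha=1.0):\n",
"    # KL(P || Q) between genre distributions; additive smoothing alpha keeps Q > 0\n",
"    p_counts = Counter(g for genres in list_p for g in genres)\n",
"    q_counts = Counter(g for genres in list_q for g in genres)\n",
"    vocab = sorted(set(p_counts) | set(q_counts))\n",
"    p_total = sum(p_counts.values()) + alpha * len(vocab)\n",
"    q_total = sum(q_counts.values()) + alpha * len(vocab)\n",
"    kl = 0.0\n",
"    for g in vocab:\n",
"        p = (p_counts[g] + alpha) / p_total\n",
"        q = (q_counts[g] + alpha) / q_total\n",
"        kl += p * math.log(p / q)\n",
"    return kl\n",
"\n",
"\n",
"def release_year(title):\n",
"    # Last four-digit number in parentheses, e.g. 'Lucy (2014)' -> 2014\n",
"    years = re.findall('[(]([0-9]{4})[)]', title)\n",
"    return int(years[-1]) if years else None\n",
"\n",
"\n",
"# First three movies of each list in the iteration-started record above\n",
"gamma_genres = [['Action', 'Adventure', 'Sci-Fi'], ['Drama', 'Sci-Fi', 'IMAX'], ['Action', 'Adventure', 'Sci-Fi']]\n",
"delta_genres = [['Crime', 'Sci-Fi', 'Thriller'], ['Action', 'Adventure', 'Sci-Fi', 'IMAX'], ['Drama', 'Sci-Fi', 'IMAX']]\n",
"gamma_titles = ['Valerian and the City of a Thousand Planets (2017)', 'Transcendence (2014)', 'Rampage (2018)']\n",
"delta_titles = ['In Time (2011)', \"Ender's Game (2013)\", 'Transcendence (2014)']\n",
"\n",
"print('ILD gamma / delta:', ild(gamma_genres), ild(delta_genres))\n",
"print('KL(gamma || delta):', genre_kl(gamma_genres, delta_genres))\n",
"for name, titles in [('gamma', gamma_titles), ('delta', delta_titles)]:\n",
"    years = [release_year(t) for t in titles]\n",
"    print('mean release year', name, sum(years) / len(years))"
]
},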
{
"cell_type": "code",
"execution_count": null,
"id": "b7a0e7b2",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.8"
}
},
"nbformat": 4,
"nbformat_minor": 5
}