6. Evaluation tool
The output of each of the recommender tools discussed here is a table with two columns: the first column contains the name of the user and the second a semicolon-separated list of the item ids that make up the recommendations. During pre-processing the dataset is split into three datasets: training, validation and test. The validation dataset is used to compare different options for each recommender, whether that means exploring the hyperparameter space of the matrix factorization based recommenders or checking whether a bag of words built on characters alone performs better than one built on different tags. The final performance of each tool is calculated on the test set with the best parameters found for that tool.
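As an illustration, a recommendations file might contain lines like the two below. The user names and item ids are made up, and the tab between the two columns is only an assumption, since the exact column separator is not restated here:
user_1	item_12;item_305;item_77
user_2	item_9;item_12;item_4018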
In order to evaluate which set of parameters or which tool works best, we use the evaluate_predictions.py script that, given a training set, a test set (either the validation or the test split) and the list of recommendations, calculates the following metrics: precision@k, recall@k, f1@k and map@k. By default the program calculates the metrics with k = 20.
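As a rough sketch of what these metrics measure for a single user (this is not the code of evaluate_predictions.py, and the exact normalisation used for map@k by the script may differ), the per-user values could be computed as follows; map@k is then the mean of AP@k over all users:

def metrics_at_k(recommended, relevant, k=20):
    """precision@k, recall@k and f1@k for one user.
    recommended: ordered list of recommended item ids
    relevant:    set of item ids the user has in the test set
    """
    top_k = recommended[:k]
    hits = sum(1 for item in top_k if item in relevant)
    precision = hits / k
    recall = hits / len(relevant) if relevant else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

def average_precision_at_k(recommended, relevant, k=20):
    """AP@k for one user."""
    hits, score = 0, 0.0
    for rank, item in enumerate(recommended[:k], start=1):
        if item in relevant:
            hits += 1
            score += hits / rank  # precision at the rank of each hit
    return score / min(len(relevant), k) if relevant else 0.0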
In order to run the evaluation tool, simply execute:
python3 ./6_evaluate/evaluate_predictions.py -train train.u2i.txt -test test.u2i.txt -recom recom.txt
Here -train takes the training file that was used to generate the recommendations, -test takes the file containing either the test or the validation dataset, and -recom takes the file with the recommendations. With the -b option you can also evaluate all the recommendation files found in a folder; in that case the -recom option indicates the name of the folder where your recommendations are stored. The result is printed on screen and contains a summary of the hyperparameters used, if these are included in the recommendations file, together with the four calculated metrics.
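For example, assuming -b is a plain flag and using an illustrative folder name, a batch run would look like:
python3 ./6_evaluate/evaluate_predictions.py -train train.u2i.txt -test test.u2i.txt -b -recom recommendations_folder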