Guide on the LightFM package in Python
Suppose we are a video-on-demand service (e.g. Netflix) and we wish to build a model that recommends relevant movies to viewers. In machine learning jargon, the viewers are known as users while the movies are known as items. For an e-commerce service, for instance, the users would be the customers while the items would be the products.
Most recommendation models leverage either or both of the following types of data:
- interaction data that captures how users interact with items. For example, the interaction data for our video-on-demand service includes user ratings of movies, users' browsing history of movies, and so on.
- attribute data that captures information about users and items separately. For instance, a user's attribute data may be their profile (e.g. age, nationality) while an item's attribute data may be its year of release or genre (e.g. horror).
There are mainly two types of recommendation models:
- collaborative filtering models that leverage interaction data. The main idea is that users who interact with the same items will share a similar preference for other items. For instance, suppose user A and user B both give a 5-star rating to a movie M. Intuitively, we should expect user A and user B to share a similar taste in movies, which means that we can recommend movies to user B based on the interests of user A (we give a concrete example below).
- content-based filtering models that leverage attribute data, recommending items to a user based solely on that user's own past data.

For instance, consider the following data about a user's ratings of movies they've watched before:
| | Movie 1 | Movie 2 | Movie 3 | Movie 4 |
|---|---|---|---|---|
| Genre | Horror | Romance | Horror | Romance |
| Release date | 2022 | 2020 | 1995 | 2010 |
| User A's rating | 5 | 0 | 4 | 1 |
We can make the following (naive) conclusions:

- this user enjoys horror movies over romance movies.
- this user prefers modern movies over old ones.
Therefore, it would make more sense to recommend modern horror movies rather than romance movies to this user. Notice how recommendations of this type do not rely on other users - we generate the recommendations based entirely on this user's past information.
As a concrete example of collaborative filtering, consider the following ratings given by two users:

| | Movie 1 | Movie 2 | Movie 3 | Movie 4 |
|---|---|---|---|---|
| Genre | Horror | Romance | Horror | Romance |
| User A | 5 | 0 | 5 | 1 |
| User B | 0 | 4 | 2 | 5 |

Here, user A clearly favors horror movies while user B favors romance movies. A collaborative filtering model would therefore recommend horror movies to users whose rating patterns resemble user A's, and romance movies to users who rate like user B.

There also exist hybrid recommendation models that leverage both interaction and attribute data. Because each of these model types has its own strengths and weaknesses, they are both widely used in practice - one type is not superior to the other. For instance, we may opt for content-based filtering models simply because we lack interaction data.
This guide will cover how to use the LightFM package to train recommendation models based on matrix factorization.
Tutorial for LightFM
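Throughout this tutorial, we assume the following imports. The guide's own import block isn't shown, so take this as the minimal set used by the snippets below:

```python
import numpy as np
import pandas as pd
from scipy import sparse
from sklearn.metrics.pairwise import cosine_similarity

from lightfm import LightFM
from lightfm.data import Dataset
from lightfm.cross_validation import random_train_test_split
from lightfm.evaluation import precision_at_k
```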
Suppose we wanted to build a movie recommendation system based on the following dataset:
```python
df_data = pd.read_csv("./data.csv", index_col=0)
print(len(df_data))   # 10987720 rows
df_data
```
```
   userID  itemID  rating occupation   genre
0     196     242     3.0     writer  Comedy
1     196     242     3.0     writer  Comedy
2     196     242     3.0     writer  Comedy
3     196     242     3.0     writer  Comedy
4     196     242     3.0     writer  Comedy
```
Note the following:
- `itemID` refers to the movie ID.
- `rating` is a numeric value that ranges from `0` to `5`.
- `occupation` is the profession of the user. A user can only have a single occupation.
- `genre` refers to the type of the movie. An item can have multiple genres (e.g. `"Comedy|Musical"`).
Our goal is to build a movie recommendation system based on the following features:
- user-item interaction data - `rating`.
- user attribute data - `occupation`.
- item attribute data - `genre`.
Preparing LightFM Dataset
The first step is to prepare the dataset (`Dataset`) that we will feed into our LightFM model. There are two things we must supply:
- a list of unique user occupations.
- a list of unique movie genres.
The list of unique occupations can be obtained like so:
```python
list_str_occupations_unique = list(df_data["occupation"].drop_duplicates())
print(len(list_str_occupations_unique))   # 21
list_str_occupations_unique
```
```
['writer', 'marketing', 'student', 'other', ... ]
```
The list of unique movie genres can be obtained like so:
```python
series_genre_of_movies = df_data["genre"].str.split("|")
list_str_movie_genre_unique = list(set(np.concatenate(series_genre_of_movies).ravel()))
print(len(list_str_movie_genre_unique))   # 18
list_str_movie_genre_unique
```
```
['War', 'Animation', 'Sci-Fi', 'Comedy', 'Film-Noir', ... ]
```
Here, we are using the Series' `str.split(-)` method to obtain a list of genres for each movie (e.g. `"Comedy|Musical"` becomes `["Comedy","Musical"]`), and then flattening the result to collect the unique genres.
We can now build the LightFM dataset like so:
```python
dataset = Dataset()
dataset.fit(users=df_data["userID"],
            items=df_data["itemID"],
            item_features=list_str_movie_genre_unique,
            user_features=list_str_occupations_unique)
```
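As a quick sanity check, `Dataset` exposes shape helpers. The expected values in the comments below assume the counts from this dataset (943 users, 352 movies, 21 occupations, 18 genres):

```python
# Inspect the dimensions registered by the Dataset.
print(dataset.interactions_shape())    # (943, 352)
print(dataset.user_features_shape())   # (943, 964) - 943 user IDs + 21 occupations
print(dataset.item_features_shape())   # (352, 370) - 352 item IDs + 18 genres
```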
Preparing user features
Next, we must build the users' features. The input format expected by LightFM is as follows:
```
[(user_id_1, ['feature1']), (user_id_2, ['feature2']), ...]
```
We can obtain this input format like so:
```python
df_data_with_unique_user_ids = df_data.drop_duplicates("userID")
list_user_features = [(x, [y]) for x, y in zip(df_data_with_unique_user_ids["userID"],
                                               df_data_with_unique_user_ids["occupation"])]
# print(len(list_user_features))   # 943
list_user_features
```
```
[(196, ['writer']), (63, ['marketing']), (226, ['student']), (154, ['student']), ... ]
```
We then pass this into LightFM's `build_user_features(-)` method:
```python
sm_user_features = dataset.build_user_features(list_user_features)
sm_user_features
```
```
<943x964 sparse matrix of type '<class 'numpy.float32'>'
	with 1886 stored elements in Compressed Sparse Row format>
```
Here, the shape of `sm_user_features` is `943` by `964`. This is because there are `943` unique users and each `userID` is itself treated as a user attribute. We also have `21` unique occupations, giving a total of `943 + 21 = 964` user attributes.
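To see this identity-plus-occupation structure concretely, here is a small check. It relies on `build_user_features(-)`'s default `normalize=True`, which makes each row sum to one:

```python
# Each user's row has two non-zero entries: the user's own indicator
# column and their occupation column. With normalize=True (the default),
# the row sums to 1, so each entry is 0.5.
row = sm_user_features[0].toarray()
print(row.sum())        # 1.0
print((row > 0).sum())  # 2
```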
Preparing item features
Similarly, we build the items' features:
```python
# Keep one row per movie, then split each movie's genres into a list.
df_movies_uniq = df_data.drop_duplicates("itemID")
series_genre_of_movies = df_movies_uniq["genre"].str.split("|")
list_item_features = [(x, y) for x, y in zip(df_movies_uniq["itemID"], series_genre_of_movies)]
print(f"Length of item features: {len(list_item_features)}")   # 352
list_item_features
```
```
[(242, ['Comedy']),
 (257, ['Action', 'Adventure', 'Comedy', 'Sci-Fi']),
 (111, ['Comedy', 'Romance']),
 (25, ['Comedy']),
 (382, ['Comedy', 'Drama']),
 ... ]
```
We then pass this into LightFM's `build_item_features(-)` method:
```python
sm_item_features = dataset.build_item_features(list_item_features)
sm_item_features
```
```
<352x370 sparse matrix of type '<class 'numpy.float32'>'
	with 1097 stored elements in Compressed Sparse Row format>
```
Preparing interaction data
Now that we've prepared the user and item data, we can move on to preparing the interaction data by using the `build_interactions(-)` method:
```python
sm_interactions, sm_weights = dataset.build_interactions(df_data[["userID","itemID","rating"]].values)
sm_interactions
```
```
<943x352 sparse matrix of type '<class 'numpy.int32'>'
	with 10987720 stored elements in COOrdinate format>
```
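Note that `build_interactions(-)` returns two matrices: `sm_interactions` marks which (user, item) pairs interacted, while `sm_weights` carries the supplied ratings as sample weights. A quick way to inspect one pair, using LightFM's internal indices (user `196` maps to row 0, movie `242` to column 0):

```python
# COO matrices don't support indexing, so convert to CSR first.
print(sm_interactions.tocsr()[0, 0])   # non-zero: user 196 interacted with movie 242
print(sm_weights.tocsr()[0, 0])        # the corresponding rating-based weight
```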
Internal mapping of IDs
Instead of using the original IDs of the users and items in our dataset, LightFM internally assigns a new consecutive non-negative integer ID to each user and item. We can see the mapping like so:
```python
user_id_map, user_feature_map, item_id_map, feature_item_map = dataset.mapping()
```
The `user_id_map` is:

```python
user_id_map
```

```
{196: 0, 63: 1, 226: 2, ...}
```
The `user_feature_map` is:

```python
user_feature_map
```

```
{196: 0, 63: 1, ..., 'healthcare': 962, 'marketing': 963}
```
Remember, LightFM also treats the ID of every user as a feature - this is why we see the IDs of our users included in the `user_feature_map`. We will later use these mappings to perform predictions.
Evaluating performance
Since we want to evaluate the performance of our LightFM model, we will use the library's `random_train_test_split(-)` method:
```python
sm_train_interactions, sm_test_interactions = random_train_test_split(sm_interactions,
                                                                      test_percentage=0.2,
                                                                      random_state=42)
print(f"Shape of train interactions: {sm_train_interactions.shape}")
print(f"Shape of test interactions: {sm_test_interactions.shape}")
```
```
Shape of train interactions: (943, 352)
Shape of test interactions: (943, 352)
```
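The shapes are unchanged because the split partitions the stored entries of the sparse matrix rather than its rows; only the non-zero interactions are divided, roughly 80/20:

```python
# The 10987720 stored interactions are split ~80/20 between the two matrices.
print(sm_train_interactions.nnz)   # ~8790176
print(sm_test_interactions.nnz)    # ~2197544
```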
It's finally time to fit the LightFM model:
```python
LEARNING_RATE = 0.25
NO_EPOCHS = 20
NO_COMPONENTS = 20   # Number of latent factors
ITEM_ALPHA = 1e-6    # Regularization factor for item features
USER_ALPHA = 1e-6    # Regularization factor for user features
```
```python
model = LightFM(loss="warp",
                no_components=NO_COMPONENTS,
                learning_rate=LEARNING_RATE,
                item_alpha=ITEM_ALPHA,
                user_alpha=USER_ALPHA,
                random_state=42)
```
```python
model.fit(interactions=sm_train_interactions,
          user_features=sm_user_features,
          item_features=sm_item_features,
          epochs=NO_EPOCHS)
```
It took roughly 5 minutes to train the model on my M1 Mac. Let's now evaluate the performance (precision@k) of our model using our testing data:
```python
np_arr_prec = precision_at_k(model,
                             test_interactions=sm_test_interactions,
                             user_features=sm_user_features,
                             item_features=sm_item_features)
print(len(np_arr_prec))   # 943
np_arr_prec[:10]
```
```
array([0.3, 0.3, 0.1, 0.3, 0.1, 0.5, 0.1, 0.9, 0.7, 0. ], dtype=float32)
```
Here, we have the precision@k value for every user. We can compute the mean precision@k across all users like so:
```python
np_arr_prec.mean()
```

```
0.38462353
```
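precision@k is not the only metric available. For instance, `lightfm.evaluation` also ships `auc_score(-)`, which measures how reliably the model ranks an item the user interacted with above one they didn't:

```python
from lightfm.evaluation import auc_score

# Mean AUC across users, using the same features as during training.
np_arr_auc = auc_score(model,
                       test_interactions=sm_test_interactions,
                       user_features=sm_user_features,
                       item_features=sm_item_features)
print(np_arr_auc.mean())
```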
User-to-Item recommendation
Suppose we wanted to recommend 5 movies to a particular user with ID `63`. We can use the `predict(-)` method of our model like so:
```python
user_id = 63
list_scores = model.predict(user_id_map[user_id], list(item_id_map.values()))
print(len(list_scores))   # 352
list_scores[:5]
```
```
array([-89.237495, -40.246788, -49.907825, -54.08462 , -99.96051 ],
      dtype=float32)
```
Note the following:
- we convert the `user_id` to the ID used by LightFM internally using the `user_id_map` dictionary that we obtained earlier from `dataset.mapping()`.
- we supply a list of movie IDs that we want to obtain a recommendation score for. Since we are interested in finding the top k movie recommendations for this user, we compute the recommendation score for every movie.
- the `item_id_map` is a dictionary that maps the original movie IDs to the non-negative consecutive integers used by LightFM internally:

```python
item_id_map
```

```
{242: 0, 257: 1, 111: 2, ...}
```
We convert the list of recommendation scores into a Series such that we can assign the original movie IDs as the index:
```python
series_scores = pd.Series(list_scores)
series_scores.index = item_id_map.keys()
print(len(list_scores))   # 352
series_scores[:5]
```
```
242   -89.237495
257   -40.246788
111   -49.907825
25    -54.084621
382   -99.960510
dtype: float32
```
Finally, we sort the scores in descending order to obtain the top movie recommendations for the user:
```python
series_scores.sort_values(ascending=False, inplace=True)
series_scores[:5]
```
```
222    124.631828
1       21.564163
15     -17.863367
288    -31.532494
258    -34.471897
dtype: float32
```
Here, we see that the movie with ID `222` is the top recommended movie for this particular user. Note that the magnitude of the scores does not carry any meaning - they are used for ranking purposes only.
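We can wrap these steps into a small helper. This is a hypothetical convenience function, not part of LightFM:

```python
def recommend_movies(model, user_id, k=5):
    """Score every movie for `user_id` and return the top-k,
    indexed by the original movie IDs."""
    scores = model.predict(user_id_map[user_id], list(item_id_map.values()))
    series = pd.Series(scores, index=list(item_id_map.keys()))
    return series.sort_values(ascending=False).head(k)

recommend_movies(model, user_id=63)
```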
Item-to-item recommendation
To obtain the vector embedding of each movie, we can use the `get_item_representations(-)` method:
```python
_, np_item_embeddings = model.get_item_representations(features=sm_item_features)
print(np_item_embeddings.shape)   # (352, 20)
np_item_embeddings[:2]
```
```
array([[-0.79904926, -0.85671186,  0.42553982,  0.994905  , -0.06102959,
        -0.41155615, -0.64710814,  0.38948753, -0.16504961, -0.24440393,
        -0.46848   , -0.00726059, -0.5730575 , -0.12569594, -0.84235895,
         0.9981231 , -0.36846963,  0.0336417 ,  0.1883249 ,  0.7433187 ],
       [ 1.3654532 ,  1.4837279 ,  1.0903912 , -0.14545436, -1.0986278 ,
         0.08251551, -2.776378  , -0.20987356,  1.8015835 ,  2.2055554 ,
        -0.22924855, -3.5627067 , -0.35516343, -0.79560184, -1.4587665 ,
         1.6426092 ,  1.2299991 ,  0.26629227,  0.877507  , -0.35510343]],
      dtype=float32)
```
Note the following:
- for every movie, we get a vector representation that encodes the characteristics of the movie.
- the shape is `352` by `20` because there are `352` movies in total and we set the latent vector size (`NO_COMPONENTS`) to `20` during model fitting.
We compute the cosine similarity for each pair of movies:
```python
np_item_similarities = cosine_similarity(sparse.csr_matrix(np_item_embeddings))
print(np_item_similarities.shape)   # (352, 352)
np_item_similarities[:2]
```
```
array([[ 9.99999940e-01,  9.83655304e-02,  8.39596242e-03, -1.05853856e-01,
         4.35901970e-01, -1.35404930e-01, -7.39989057e-03,  5.28273523e-01,
        -5.74964844e-02, ...
```
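As a quick sanity check, the entry at row 0, column 1 should match the cosine similarity computed by hand from the embeddings:

```python
# Manually verify the similarity between the first two movies.
a, b = np_item_embeddings[0], np_item_embeddings[1]
print(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))   # ~0.0984
```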
It's good practice to convert this NumPy array into a DataFrame so that we can assign the movie IDs to the rows and columns like so:
```python
df_item_similarities = pd.DataFrame(np_item_similarities)
df_item_similarities.columns = item_id_map.keys()
df_item_similarities.index = item_id_map.keys()
df_item_similarities
```
```
          242       257       111        25       382       202       153       286        66       845  ...      1181
242  1.000000  0.098366  0.008396 -0.105854  0.435902 -0.135405 -0.007400  0.528274 -0.057496  0.247484  ...  0.113449
257  0.098366  1.000000  0.079299  0.134674 -0.236781 -0.060198  0.160672 -0.037351 -0.040263  0.233311  ...  0.121223
111  0.008396  0.079299  1.000000  0.555100  0.156462  0.340305  0.059644  0.220777  0.389159  0.734334  ...  0.113434
25  -0.500537 -0.442127 -0.427205 -0.404253 -0.395989 -0.377560 -0.364232 -0.344025 -0.341550 -0.338591  ...  1.000000
382  0.435902 -0.236781  0.156462  0.271259  1.000000  0.204918  0.178467  0.311732  0.174588  0.147333  ...  0.076015
```
Remember, the `item_id_map` is a dictionary obtained earlier from `dataset.mapping()` that maps the original movie IDs to the non-negative consecutive integers used by LightFM internally:

```python
item_id_map
```

```
{242: 0, 257: 1, 111: 2, ...}
```
To get the top 5 recommendations for a particular movie, say the movie with ID `1049`:
```python
int_movie_id = 1049
series_rec_movie_ids = df_item_similarities.loc[int_movie_id, :]
# head(6) because the top match is always the movie itself
series_top_5_rec_movie_ids = series_rec_movie_ids.sort_values(ascending=False).head(6)
series_top_5_rec_movie_ids
```
```
1049    1.000000
832     0.786510
1047    0.745040
756     0.741784
930     0.729942
407     0.729872
Name: 1049, dtype: float32
```
As expected, the first value is the movie itself, which receives a perfect score. We see that the movie with ID 832 is the most recommended (similar) movie!
Training a LightFM model using only interaction data

So far, we've used both interaction data and user/item attributes. Suppose instead we only have the following dataset of user ratings of movies:
```python
df_ratings = pd.read_csv("./data/rating.csv")
df_ratings = df_ratings[:50000]
df_ratings.head(10)
```
```
   userId  movieId  rating            timestamp
0       1        2     3.5  2005-04-02 23:53:47
1       1       29     3.5  2005-04-02 23:31:16
2       1       32     3.5  2005-04-02 23:33:39
3       1       47     3.5  2005-04-02 23:32:07
4       1       50     3.5  2005-04-02 23:29:40
```
Using this data, we can create an interaction matrix for the ratings:
```python
def create_interaction_matrix(df_rating):
    # Pivot the ratings into a users-by-movies matrix, filling
    # missing (user, movie) pairs with 0.
    interactions = df_rating.groupby(['userId', 'movieId'])['rating'] \
        .sum().unstack().reset_index() \
        .fillna(0).set_index('userId')
    return interactions
```
```python
df_interactions = create_interaction_matrix(df_ratings)
df_interactions.head(10)
```
```
movieId    1    2    3    4    5    6    7    8    9   10  ...  117590  118696  125916
userId
1        0.0  3.5  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  ...     0.0     0.0     0.0
2        0.0  0.0  4.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  ...     0.0     0.0     0.0
3        4.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  ...     0.0     0.0     0.0
4        0.0  0.0  0.0  0.0  0.0  3.0  0.0  0.0  0.0  4.0  ...     0.0     0.0     0.0
5        0.0  3.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  ...     0.0     0.0     0.0
6        5.0  0.0  3.0  0.0  0.0  0.0  5.0  0.0  0.0  0.0  ...     0.0     0.0     0.0
7        0.0  0.0  3.0  0.0  0.0  0.0  3.0  0.0  0.0  0.0  ...     0.0     0.0     0.0
8        4.0  0.0  5.0  0.0  0.0  3.0  0.0  0.0  0.0  4.0  ...     0.0     0.0     0.0
9        0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  ...     0.0     0.0     0.0
10       4.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  ...     0.0     0.0     0.0

10 rows × 6471 columns
```
We then train a LightFM model directly on this interaction matrix:

```python
def train(df_interactions, n_components=30, loss='warp', k=15, epoch=20, n_jobs=4):
    # Convert the dense interaction matrix into the sparse format
    # expected by LightFM, then fit the model.
    x = sparse.csr_matrix(df_interactions.values)
    model = LightFM(no_components=n_components, loss=loss, k=k)
    model.fit(x, epochs=epoch, num_threads=n_jobs)
    return model
```
```python
model = train(df_interactions=df_interactions)
```
Let's create a user dictionary that maps each user ID (`1` to `91`) to the row index used internally by LightFM:
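The helper `create_user_dict(-)` is not part of LightFM; here is a minimal sketch consistent with the output below:

```python
def create_user_dict(df_interactions):
    # Map each userId (the index of the interaction matrix) to its
    # row position, which is the user index LightFM uses internally.
    return {user_id: i for i, user_id in enumerate(df_interactions.index)}
```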
```python
dict_users = create_user_dict(df_interactions)
dict_users
```
```
{1: 0,
 2: 1,
 3: 2,
 4: 3,
 5: 4,
 ...}
```
Note that there are `91` users in total.
Let's also create a movie dictionary that maps the movie ID to the movie title:
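Again, `create_dict_movies(-)` is a helper rather than a LightFM function. Here is a minimal sketch, assuming `df_movies` is the MovieLens movie metadata with `movieId` and `title` columns (the path below is an assumption):

```python
df_movies = pd.read_csv("./data/movie.csv")   # assumed path to the movie metadata

def create_dict_movies(df_movies):
    # Map each movieId to its title.
    return dict(zip(df_movies["movieId"], df_movies["title"]))
```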
```python
dict_movies = create_dict_movies(df_movies=df_movies)
dict_movies
```
```
{1: 'Toy Story (1995)',
 2: 'Jumanji (1995)',
 3: 'Grumpier Old Men (1995)',
 4: 'Waiting to Exhale (1995)',
 5: 'Father of the Bride Part II (1995)',
 ...}
```
Note that there are `27278` movies in total.
Recommending movies to a user
We then obtain a score for each movie (`2889` scores in total in this case):
```python
int_user_id = 20
int_user_index = dict_users[int_user_id]
n_movies = len(df_interactions.columns)
list_int_all_movie_indexes = np.arange(n_movies)   # [0, 1, 2, ..., 2888]
list_float_scores = model.predict(int_user_index, list_int_all_movie_indexes)
list_float_scores
```
```
[-2.4829762 -4.9836407 -4.567343  ... -7.8306265 -5.070689  -5.515493 ]
```
We then convert the list into a Pandas Series and set the index to the corresponding movie ID:
```python
series_float_scores = pd.Series(list_float_scores)
series_float_scores.index = df_interactions.columns
series_float_scores
```
```
movieId
1        -2.482976
2        -4.983641
3        -4.567343
4        -6.787599
5        -3.771592
            ...
116797   -7.433063
117511   -6.747071
117590   -7.830626
118696   -5.070689
125916   -5.515493
Length: 2889, dtype: float32
```
We then sort the movie recommendations in descending order of score:
```python
series_float_scores.sort_values(ascending=False, inplace=True)
series_float_scores
```
```
movieId
588       1.637854
53125     0.056910
595      -0.019940
5816     -0.071756
4306     -0.114593
            ...
3035     -8.781129
1632     -8.905715
446      -9.055195
106920   -9.316463
986      -9.662092
Length: 2889, dtype: float32
```
Finally, we refer to `dict_movies` to map the movie IDs to the movie titles:
```python
n_rec_movies = 6   # number of recommendations; this assignment isn't shown in the original
index_int_top_n_movie_ids = series_float_scores.index[:n_rec_movies]
index_str_top_n_movie_titles = index_int_top_n_movie_ids.map(lambda int_movie_id: dict_movies[int_movie_id])
print(index_str_top_n_movie_titles)
```
```
Index(['Aladdin (1992)', "Pirates of the Caribbean: At World's End (2007)",
       'Beauty and the Beast (1991)',
       'Harry Potter and the Chamber of Secrets (2002)', 'Shrek (2001)',
       'Ace Ventura: Pet Detective (1994)'],
      dtype='object', name='movieId')
```
Item-to-item recommendation
To obtain the embedding of each movie, use the `item_embeddings` property:
```python
np_2d_item_embeddings = model.item_embeddings
print(np_2d_item_embeddings.shape)   # (2889, 30)
np_2d_item_embeddings
```
```
array([[-0.09092457, -0.0838875 , -0.12737845, ...,  0.15028293,
         0.0911804 , -0.10203753],
       [-0.19036956, -0.434018  , -0.01351528, ...,  0.20598511,
         0.08963097, -0.65899724],
       [-0.32468513, -0.39591578,  0.28006303, ...,  0.04760544,
         0.38568467,  0.04577473],
       ...,
       [ 0.22653413,  0.3293239 ,  0.03833063, ..., -0.00624772,
        -0.06222142, -0.22419415],
       [ 0.27651408, -0.17156073,  0.3964168 , ...,  0.03881634,
        -0.27182811,  0.00873393],
       [ 0.30261683, -0.16283555,  0.41420266, ..., -0.04435388,
        -0.2530296 , -0.03735547]], dtype=float32)
```
We then compute the cosine similarity between every pair of movie embeddings:
```python
np_2d_similarities = cosine_similarity(sparse.csr_matrix(np_2d_item_embeddings))
print(np_2d_similarities.shape)   # (2889, 2889)
print(np_2d_similarities)
```
```
[[ 1.          0.40566397  0.3469511  ... -0.36663648 -0.6410434  -0.6627071 ]
 [ 0.40566397  1.          0.21013616 ... -0.06052235 -0.31996462 -0.31183812]
 [ 0.3469511   0.21013616  1.0000001  ...  0.08269963 -0.05860725 -0.06574367]
 ...
 [-0.36663648 -0.06052235  0.08269963 ...  1.          0.50560886  0.52789694]
 [-0.6410434  -0.31996462 -0.05860725 ...  0.50560886  1.          0.98076695]
 [-0.6627071  -0.31183812 -0.06574367 ...  0.52789694  0.98076695  1.        ]]
```
As before, we convert this NumPy array into a DataFrame with the movie IDs as the rows and columns:

```python
df_movie_movie_embedding_matrix = pd.DataFrame(np_2d_similarities)
df_movie_movie_embedding_matrix.columns = df_interactions.columns
df_movie_movie_embedding_matrix.index = df_interactions.columns
df_movie_movie_embedding_matrix
```
```
movieId         1         2         3         4         5  ...    117590    118696    125916
movieId
1        1.000000  0.405664  0.346951 -0.294259  0.437728  ... -0.366636 -0.641043 -0.662707
2        0.405664  1.000000  0.210136 -0.176198  0.409218  ... -0.060522 -0.319965 -0.311838
3        0.346951  0.210136  1.000000 -0.298944  0.407445  ...  0.082700 -0.058607 -0.065744
4       -0.294259 -0.176198 -0.298944  1.000000 -0.614848  ...  0.169824  0.257572  0.292765
5        0.437728  0.409218  0.407445 -0.614848  1.000000  ... -0.406298 -0.521836 -0.536075
...           ...       ...       ...       ...       ...  ...       ...       ...       ...
116797  -0.337960  0.048154  0.121961  0.198091 -0.410618  ...  0.947438  0.465295  0.490205
117511  -0.561736 -0.237367  0.044734  0.236011 -0.543472  ...  0.879492  0.747042  0.749149
117590  -0.366636 -0.060522  0.082700  0.169824 -0.406298  ...  1.000000  0.505609  0.527897
118696  -0.641043 -0.319965 -0.058607  0.257572 -0.521836  ...  0.505609  1.000000  0.980767
125916  -0.662707 -0.311838 -0.065744  0.292765 -0.536075  ...  0.527897  0.980767  1.000000
```
Finally, to get the top `n_items` most similar movies for a particular query movie (the query movie itself will appear first with a similarity of 1):

```python
int_movie_id = 5378   # assumed query movie ID; the original doesn't show this assignment
n_items = 10          # assumed number of recommendations to return
series_movie_scores = df_movie_movie_embedding_matrix.loc[int_movie_id, :]
index_top_n_rec_movie_ids = series_movie_scores.sort_values(ascending=False).head(n_items + 1).index
index_str_top_n_rec_movie_titles = index_top_n_rec_movie_ids.map(lambda int_movie_id: dict_movies[int_movie_id])
print(index_str_top_n_rec_movie_titles)
```
```
Index(['Star Wars: Episode II - Attack of the Clones (2002)',
       'Spider-Man 2 (2004)', '50 First Dates (2004)',
       'X-Men: The Last Stand (2006)', 'Matrix Revolutions, The (2003)',
       'My Big Fat Greek Wedding (2002)',
       'Harry Potter and the Chamber of Secrets (2002)', 'Aladdin (1992)',
       'Signs (2002)',
       "Harry Potter and the Sorcerer's Stone (a.k.a. Harry Potter and the Philosopher's Stone) (2001)",
       "Pirates of the Caribbean: At World's End (2007)"],
      dtype='object', name='movieId')
```