Content-Based Filtering

So what are the alternatives to collaborative filtering? We will examine content-based filtering.

Content-based still uses features ratings.

Let’s change w to a vector v^(j) which is computed from the user x^(j) for the user features
and x to a vector v⁽ⁱ⁾ vector for the movie features
the product will be the predictor we use to predict which item we can recommend based on user features and movie features they rated
So given vector V_u and vector V_m how can we make a recommendation

given a feature vector describing a user, such as age and gender, and country, and so on, we have to compute the vector V_u, and similarly, given a vector describing a movie such as year of release, the stars in the movie, and so on, we have to compute a vector V_m.

In order to do the former, we’re going to use a neural network.

The first neural network will be what we’ll call the user network.
Here’s an example of user network, that takes as input the list of features of the user, x_u, so the age, the gender, the country of the user, and so on.
Then using a few layers, say dense neural network layers, it will output this vector v_u that describes the user.
Notice that in this neural network, the output layer has 32 units, and so v_u is actually a list of 32 numbers.
Unlike most of the neural networks that we were using earlier, the final layer is not a layer with one unit, it’s a layer with 32 units.

Similarly, to compute v_m for a movie, we can have a movie network as follows, that takes as input features of the movie and through a few layers of a neural network is outputting v_m, that vector that describes the movie.

Finally, we’ll predict the rating of this user on that movie as v_ u dot product with v_m.
Notice that the user network and the movie network can hypothetically have different numbers of hidden layers and different numbers of units per hidden layer.
All the output layer needs to have the same size of the same dimension.
In the description you’ve seen so far, we were predicting the 1-5 or 0-5 star movie rating.
If we had binary labels, if y was to the user like or favor an item, then you can also modify this algorithm to output.
Instead of v_u.v_m, you can apply the sigmoid function to that and use this to predict the probability that’s y^(i,j) is 1.
To flesh out this notation, we can also add superscripts i and j here if we want to emphasize that this is the prediction by user j on movie i.
Here the user network and the movie network as two separate neural networks.
But it turns out that we can actually draw them together in a single diagram as if it was a single neural network.

On the upper portion of this diagram, we have the user network which inputs x_u and ends up computing V_u
On the lower portion of this diagram, we have what was the movie network, the input is x_m and ends up computing V_m and these two vectors are then dot-product together.
This dot here represents dot product, and this gives us our prediction.

Cost Function

To train all the parameters of both the user network and the movie network we will construct a cost function J, which is going to be very similar to the cost function in collaborative filtering.

This will train both movie and user network in one cost function equation. To regularize it we can add the NN term.

To find a similar movie to movie V⁽ⁱ⁾ we can find V^(k) and calculate and find the smallest squared distance between them. You can calculate these values ahead of time so when the user is online you can present these referrals.

Scale Up

To show recommended movies, items, songs…. on sites all these can be calculated ahead of time. Even if we recommend items the user doesn’t like it isn’t the end of the world.

We can do the retrieval in advance then we take all the items retrieved and rank the list

TF Implementation

Here we start with the two NN one for user and one for movies
Layout the sequential layers
Then we tell TF Keras how to feed the user features and movie features with input_user & input_item
We input the data and extract vu, then normalize the value
We input and extract vm then normalize it
This is where we multiply the two values from the two NN and get the dot product as output and get the prediction
Then we declare the model and feed it the input and output
Finally we specify the cost function