LLR (Log-Likelihood Ratio) used for recommendations

The blog is aimed at a brief explanation of LLR. For more details please refer the original paper at http://ucrel.lancs.ac.uk/papers/tedstats.pdf. Another great resource is http://tdunning.blogspot.in/2008/03/surprise-and-coincidence.html from which much of the inspiration for this blog is drawn.

The approach is centered around analyzing counts of co-occurrence of events. Concretely, lets consider two events A and B. Let event A describe the view of an iPhone. Let event B describe viewing an iPad. Now, the problem we are trying to solve is as follows. To what extent does one indicate the likelihood of the occurrence of the other. So, if the user has viewed an iPhone, how likely is he to view an iPad?

Mathematical Treatment

Mathematically, it can be described as follows. To compute the score, we define a few counts. Let k_11 be the number of times the events occurred together, let k_12 and k_21 be the number of times each has occurred without the other and k_22 be the number of times something has been observed that was neither of these events. We get the following table:

Event A Everything but A
Event B A and B together (k_11) B, but not A (k_12)
Everything but B A without B (k_21) Neither A nor B (k_22)

The LLR score is given as follows:

LLR = 2 sum(k) (H(k) – H(rowSums(k)) – H(colSums(k)))

where H is Shannon’s entropy, computed as the sum of (k_ij / sum(k)) log (k_ij / sum(k)). In R, this function is defined as

H = function(k) {N = sum(k) ; return (sum(k/N * log(k/N + (k==0)))}.

credit for the illustration: http://tdunning.blogspot.in/2008/03/surprise-and-coincidence.html

An E-Commerce example

Now, lets take a concrete example. Say, we are designing a recommendation system for an e-commerce website. We analyse the user click stream data and compute LLR between items. Say we get the following results:-

Item
View-View LLR Scores
iPhone iPad_12.31, iPod Touch_10.11, MacbookPro_8.44, …
iPad iPadPro_16.11, …
Galaxy Nexus Samsung Note_10.12, iPad_6.11 …

So, what exactly does this table represent. Lets consider the first entry in the first row. What this means is that users who ‘view’ iPad are likely to view ‘iPhone’ as well. The quantitative measure of this likeness is established by the LLR scores. So, in this case the value is 12.31.

Ok, that’s well and good. But, how exactly does this serve recommendations? Let us consider a user, say John. John has come to the website and has just clicked on the link for iPad. Now, the problem that we are trying to solve is this. Given that John is currently viewing an iPad. What items should we recommend to him? LLR provides a ready answer.

Concretely, in the example above, we would query the view-view LLR column. The item iPad appears in two rows (row 1 and row 3). In row 1 it appears with a similarity score of 12.31. In row 3 it appears with a similarity score of 6.11. Now, we recommend to the user an iPhone and a Galaxy Nexus. The order being determined by their similarity scores.

Lets extend our example to make the value more apparent. Say, now John goes and looks at a Samsung Note. So, we now know two items that John has viewed. The question we are trying to answer is the following. Given that John has viewed an iPad and a Samsung Note, what items should we recommend to him? Again, LLR to the rescue.

In the most simple approach, we would derive recommendations from both of these items. We would then merge these recommendations using their similarity scores and suitable weights. Concretely, we have the following:-

recommendations because he viewed iPad – [ (iPhone,12.31), (Galaxy Nexus, 6.11) ]

recommendations because he viewed Samsung Note – [ (Galaxy Nexus, 10.12) ]

Now, we simply combine the above results to return Galaxy Nexus followed by iPhone in the list of recommendations. This illustrates the beauty of this approach. A product such as Galaxy Nexus would be ranked higher in recommendations owing to being indicated by multiple items that the user has interacted with.

This drives real time personalization. As the user visits different items, we build up a user history consisting of the items he has viewed/purchased. With each click we gain more knowledge about the user intent and can utilize this knowledge to serve recommendations. Another significant advantage with this approach is that the LLR model can be readily indexed in a search index such as SOLR or Elastic Search. Recommendations can then be reduced to simple search queries.

The view-view computation was simply an example. In reality you could have multiple indicators such as view-bought as well as view-view. In this case, views are indicated by both views and purchases. Then, we can maintain a user history vector for both the items he has viewed as well as purchased. This can then be used to query the LLR table and serve recommendations. Similarly, user searches can be an indicator. For instance, you could have a view-search_term table. For example, this table may look as follows:-

Item
View-Search_Term LLR Scores
iPhone apple_10.11, …
iPad tablet_16.11, …
Galaxy Nexus google_10.12, …

Which clearly shows that searching for the term ‘apple’ indicates a strong affinity to viewing an item such as an ‘iPhone’.

An example of how this is done can be found here – http://www.slideshare.net/pferrel/unified-recommender-39986309.

Hmm, Sounds a lot like a simple co-occurrence count

The above approach has significant advantages over a simple co-occurrence count. One of the chief ones being that popular items will not be recommended if they are not indicated by other items. Concretely, say that ‘beer’ is an item that is always purchased. In that case it would have high co-occurrence with each other item. Now, using traditional co-occurrence count measures we would recommend ‘beer’ for every item that is viewed. This approach may indeed prove useful for certain use cases. However, it fails to capture an essential aspect of the problem. That ‘beer’ is co-occurring with, say ‘diapers’ not because ‘diapers’ indicates ‘beer’ but simply because ‘beer’ is a really popular item and just co-occurs with most items in the database. These problems are solved by LLR where it is able to capture significant co-occurrences and hence eliminate such recommendations.

Implementation

Mahout has some great libraries that offer LLR implemented using Spark out of the box. The following link:- https://mahout.apache.org/users/algorithms/intro-cooccurrence-spark.html describes the spark-itemsimilarity command line tool that can be used for computing LLR.

Another fantastic resource is https://github.com/pferrel/template-scala-parallel-universal-recommendation. This is more of an end to end implementation of a recommendation system that uses the above discussed concepts.

For any questions, please post comments and it’d be great to have a discussion on this.

16 thoughts on “LLR (Log-Likelihood Ratio) used for recommendations

  1. Thanks Nikaash,it will be helpful for me if you give me a sample snippet.If possible can you please share me your email id.

    Like

  2. Hi!
    I’ve been trying to reconstruct the maths behind the UR for a week and I give up. Say, we get the LLR scores. Let’s take the following matrix in R (rows are users, columns are items).
    m = c(1,1,0,0,0,
    0,0,1,1,0,
    0,0,0,0,1,
    1,0,0,1,0,
    1,1,0,0,0,
    1,0,0,0,0,
    1,1,1,0,0,
    0,0,0,1,1,
    0,0,0,1,1)
    If we calculate LLR for each item-item (using your code for LLR), we’ll get:
    rs = c(0,4.72,0.03,1.76,6.95,
    4.72,0,0.3,4.72,3.13,
    0.03,0.3,0,0.03,1.89,
    1.76,4.72,0.03,0,0.9,
    6.95,3.13,1.89,0.9,0)
    So what’s next. If I query the recommendations for user3 (he currently has ‘0 0 0 0 1’), then what do we do? How do we convert LLR scores to indicators? I’ve set up ur and pushed the events there. However, the scores I’m getting are nothing similar to what I get trying to calculate it by hand. Any help? 🙂

    Like

    1. Hey! Thanks for the question.
      So, it works as follows. Say user3 has 0 0 0 0 1. That basically means he has seen one item, that is item5 in this case (assuming items are numbered 1 to 5). Now, we need to recommend some items to him. So, this is what we do:
      a. Go to the item-item matrix you have constructed above (assuming the LLR scores are valid)
      b. Go to the 5th row, the row corresponding to item5, since the user has seen item5
      c. Pick the columns with the highest scores, in this case column 1, 2 and then 3. Corresponding to item1, item2 and item3 respectively.
      d. Recommend item1, item2 and item3 to the user. Of course, you can recommend more than 3 items, I picked 3 to keep the example simple.

      Now, in this case the data set is really small, so its not quite clear that recommending item1, item2 and item3 may be relevant to a user who has seen item5, but in general this is the concept.

      Ok, let’s make it more interesting. Consider user2 from your example (0 0 1 1 0). He has seen item3 followed by item4. I take the liberty of assuming that he saw item3 first, and then item4. The following logic holds either way. So, how do we recommend items for this user?
      a. We pick items corresponding to item3 as well as item4.
      b. Then, we merge their scores.
      c. What I mean by that is, say LLR(item3, item2)=0.3 and LLR(item4, item2)=4.72. Then for user2, who has seen both item3 and item4, the combined score for item2 will be (0.3 + 4.72 = 5.02).

      Also, another scheme is possible. We might like to say that items that were viewed a while ago are less important than items viewed more recently. In that case, we could give different weights when calculating final LLR scores. For example:
      a. For user2 as discussed above, he saw item3 and then item4.
      b. So, we might want to give less weight to item3 and more to item4.
      c. Then, the final score would be (0.3 X 0.5 + 4.72 X 1.0 = 4.87). Observe that item3 got a weight of 0.5 and item4 got a weight of 1.0. In other words, we said that we would like to give twice the weight to the more recent item than to the less recent one.

      Finally, this scheme is very generic. So, for example, if you could ‘somehow’ build a matrix of item-item similarity. That matrix might come from simple co-occurrence counts, or from LLR or from some other mechanism. (using algorithms from NLP to build such a matrix – http://www.sciencedirect.com/science/article/pii/S1877050916308559). Then, with such a scheme, you could recommend items to any user who has seen a certain set of items.

      Does that make it more clear?

      Like

      1. Yes, that helped. I actually asked about the algorithm on the action-ml forum. So my misunderstanding was more about extracting these scores. And it looks like these are calculated using another algorithm from Elasticsearch.
        As for LLR, I’m actually not sure that we can use them directly like you said. So LLR can’t be scores. E.g. in my case. LLR score for items 1-5 is very high, and there is a dependency between there variables. However, this is a ‘negative’ dependency, i.e. if 1 then not 5, and if 5 then not 1. So LLR only shows the strength of correlation. I think that the next step should have been combining co-occurence and check LLR only for no-empty positions in the co-occurence matrix. Does it make sense?

        Like

      2. Hi Ana,

        Yes, that seems reasonable. In fact, If I’m not mistaken, the Mahout implementation does exactly this (please do correct me if I’m wrong on this). It only computes LLR scores for pairs of items that have co-occurred some minimum number of times. It does not make sense to have an LLR score between two items that had no co-occurrence.

        Thank you,
        Nikaash Puri

        Like

    1. Hi Sourav,

      This is done simply using LLR, which is more of a statistical computation than a machine learning algorithm. There are other machine learning based approaches that use neural networks and Factorization Machines (FM) for recommendation systems.

      Thank you,
      Nikaash Puri

      Like

  3. Hi Nikaash, I have a query regarding how the user’s current action is being used here? In the comment section, you explained an example where user 2 has visited item 3 and 4. You check the similar products corresponding to 3 and 4. Then you add the scores and recommend the0 highest scoring item. My question is, suppose item 3 and 4 were some electronic products(iPhone and Nexus) and the recommended product was also an electronic product(Samsung note). But this time the user has clicked a cosmetic product, but according to our algorithm, it will recommend him Samsung note which should not be the case. So how is this being tackled by the algorithm? Does the Hp matrix(0,0,1,1,0) is being updated real time and more weight is given to the current click?

    Like

    1. Hi Rishav,

      That’s correct. The amount of weight to be given to the most recent interaction is usually configurable. For example, the last item seen can be given weight 1, followed by 0.5 for the second last item, then 0.25 and so on. It is also possible to say that if the user switches categories, then we should reset the weights and not use past weights at all. Further, we could create rules such as only looking at the items that the user has seen in the current active session (session might be a browsing length of 30 minutes).

      There’s no one solution that works for all domains. Usually, you should A/B test between different approaches (i.e. show some users recommendations from one algorithm and others recommendations from another algorithm). And then select the algorithm that gives the best results for your particular problem.

      Thank you,
      Nikaash Puri

      Like

  4. Hi nikaash. Can you please explain me LLR by applying the formula,let’s say for item 1 and item 2 in the matrix in the second comment. I mean a step by step application of the formula as to how we got 4.72. I didn’t understand the formula.

    Like

Leave a comment