Optimizing an algorithm that pulls data from a very large MySQL table

My website provides a large amount of tagged and categorized content, and I’m attempting to create a strong algorithm to line the content up with user needs. Users are able to select the tags that they are most interested in, and content that matches those tags is given higher priority. Furthermore, the content can be “liked” (which pushes it up in priority), while time decay pushes it down. So, ultimately, the user should see relatively new content that is in line with what they’re interested in and is also popular with other users.

The current algorithm works like this:

  1. Pull all items from content table. Assign each item a score of 1.
  2. Check content_tags against user_tags to see if there are matches in the array pulled from step 1. If so, apply a multiplier to that item’s score.
  3. Check content_likes to see how many likes each item has. Apply another multiplier, based on this amount.
  4. Apply a third factor based on the time decay of the item. Obviously, older items receive a bigger penalty than newer items.
  5. Sort by total score. The resulting array should have the most relevant items first. Then, I can simply trim this array down to 20 or so items and display them on the page.
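
To make the five steps above concrete, here is a minimal sketch in Python (used only for brevity). Everything specific in it is an assumption for illustration: the multiplier values, the half-life style of time decay, and the names Item, score, and rank are placeholders, not the actual values or code behind the site. It also assumes the tags and like counts have already been loaded for each item.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# All constants are hypothetical placeholders, chosen only for illustration.
TAG_MULTIPLIER = 1.5    # applied once if any item tag matches a user tag
LIKE_WEIGHT = 0.1       # each like adds 10% to the like multiplier
HALF_LIFE_DAYS = 7.0    # the decay factor halves for every 7 days of age
PAGE_SIZE = 20          # how many items to keep after sorting


@dataclass
class Item:
    id: int
    tags: set
    likes: int
    created: datetime


def score(item, user_tags, now):
    """Compute one item's score following steps 1 to 4."""
    s = 1.0                                   # step 1: base score
    if item.tags & user_tags:                 # step 2: tag match
        s *= TAG_MULTIPLIER
    s *= 1.0 + LIKE_WEIGHT * item.likes       # step 3: likes
    age_days = (now - item.created).total_seconds() / 86400.0
    s *= 0.5 ** (age_days / HALF_LIFE_DAYS)   # step 4: time decay
    return s


def rank(items, user_tags, now, limit=PAGE_SIZE):
    """Step 5: sort by score, highest first, and trim to `limit` items."""
    ranked = sorted(items, key=lambda it: score(it, user_tags, now), reverse=True)
    return ranked[:limit]
```

Note that every item is scored on each page load in this sketch, which is exactly the cost the rest of this question is about.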

As you can probably tell, this is a sluggish algorithm. Not only does it have to run a query to pull every single piece of content, but it then has to run separate queries to check content_tags, user_tags, and content_likes. Too many queries!

I suppose my first question is: am I doing this all wrong? Beyond that, can you think of any ways to optimize everything I’ve summarized above? The algorithm itself works quite well, assuming items and users have relevant tags. But I’m afraid that when my content table grows to tens of thousands of items, I’ll be in a real mess.

Thanks for your help!
