
IMDB lookup for MythTV

January 6, 2011

MythTV used to support fetching movie details from IMDB, but that feature has apparently been removed due to legal reasons. Quite early on I figured that I could do better than the script that came with MythTV, so I wrote my own application and scheduled it with cron to populate the videometadata table in the MythTV database.

If I may say so, my application is smarter than any script I have seen so far (please correct me if I’m wrong!). It is smarter because it uses Google to search for the correct IMDB page based on the filename. I would say it has worked for 99% of the movies I have had over the years. The algorithm is pretty simple:

Assume you have a file with the following filename:

Movie Name 1999 720p BluRay x264.mkv

Remove the common file endings (x264, 720p, 1080p, BluRay, HDTV, …) together with all text after them. This will leave you with a title like:

Movie Name 1999
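A minimal Python sketch of this clean-up step, assuming the filename uses spaces, dots or underscores between words. The tag set `RELEASE_TAGS` and the function name `strip_release_tags` are hypothetical; the set only holds the examples listed above and a real one would be longer.

```python
import re

# Hypothetical tag set built from the examples in the post; extend it with
# whatever tags show up in your own filenames (compared in lower case).
RELEASE_TAGS = {"x264", "720p", "1080p", "bluray", "hdtv"}


def strip_release_tags(filename: str) -> str:
    """Drop the extension, then cut the name at the first known release tag."""
    # Remove an extension such as ".mkv"; requiring a leading letter keeps
    # a trailing ".1999" in "Movie.Name.1999" from being treated as one.
    stem = re.sub(r"\.[A-Za-z][A-Za-z0-9]{1,3}$", "", filename)
    words = [w for w in re.split(r"[\s._]+", stem) if w]
    kept = []
    for word in words:
        if word.lower() in RELEASE_TAGS:
            break  # the first tag and everything after it is discarded
        kept.append(word)
    return " ".join(kept)
```

For example, `strip_release_tags("Movie Name 1999 720p BluRay x264.mkv")` gives `"Movie Name 1999"`, and the dotted variant `"Movie.Name.1999.1080p.HDTV.x264.mkv"` gives the same result.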

Now do a Google search:

site:imdb.com inurl:title intitle:1999 Movie Name
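The query can be assembled from the cleaned title as in the sketch below. It assumes the release year is a four-digit token between 1900 and 2099 and uses the last such token, so a title that itself starts with a number (e.g. "2001") keeps that number as an ordinary word. `build_query` is a hypothetical helper name.

```python
import re
from typing import Optional

YEAR = re.compile(r"(19|20)\d{2}")  # assumption: release years 1900-2099


def build_query(title: str) -> str:
    """Turn "Movie Name 1999" into "site:imdb.com inurl:title intitle:1999 Movie Name"."""
    words = title.split()
    year_index: Optional[int] = None
    # Scan from the end: the release year is normally the last year-like word.
    for i in range(len(words) - 1, -1, -1):
        if YEAR.fullmatch(words[i]):
            year_index = i
            break
    parts = ["site:imdb.com", "inurl:title"]
    if year_index is not None:
        parts.append("intitle:" + words[year_index])
    parts.extend(w for i, w in enumerate(words) if i != year_index)
    return " ".join(parts)
```

Without a year, the `intitle:` part is simply left out, e.g. `build_query("Movie Name")` gives `"site:imdb.com inurl:title Movie Name"`.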

If there are no results, you probably didn’t drop enough file endings (as described above). Chop off the last remaining word and try again, and repeat until you reach the actual title of the movie and get a Google match.
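The retry loop might look like the sketch below. The `search` argument is a hypothetical callable standing in for whatever search client you use: it receives a title, builds the query and returns the result URLs with the best match first. No real Google API is assumed here.

```python
from typing import Callable, List, Optional, Tuple


def search_with_backoff(
    title: str, search: Callable[[str], List[str]]
) -> Optional[Tuple[str, str]]:
    """Search for the title, dropping the last word after every empty result.

    Returns (title that matched, first result URL), or None if even the
    first word alone gives no results.
    """
    words = title.split()
    while words:
        candidate = " ".join(words)
        results = search(candidate)
        if results:
            return candidate, results[0]
        words.pop()  # probably a leftover file ending: chop it off and retry
    return None
```

Because the loop can shrink the title down to a single word, a match found this way can easily be wrong, which is why the similarity check in the next step matters.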

Now download the IMDB page for the first Google match and calculate the percentage of words that the IMDB movie title has in common with the filename (after you dropped the endings). If this percentage is lower than 70%, discard the result as a false match.
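One way to compute the similarity is sketched below. The post does not say exactly how words are compared, so this assumes a case-insensitive comparison that ignores punctuation and measures the share of filename words found in the IMDB title. Both function names are hypothetical; only the 70% threshold comes from the text.

```python
import re
from typing import List


def _words(text: str) -> List[str]:
    """Lower-cased words, ignoring punctuation such as the "( )" around the year."""
    return re.findall(r"\w+", text.lower())


def word_overlap(imdb_title: str, cleaned_name: str) -> float:
    """Percentage of words in the cleaned filename that also occur in the IMDB title."""
    name_words = _words(cleaned_name)
    if not name_words:
        return 0.0
    title_words = set(_words(imdb_title))
    shared = sum(1 for w in name_words if w in title_words)
    return 100.0 * shared / len(name_words)


def is_plausible_match(imdb_title: str, cleaned_name: str, threshold: float = 70.0) -> bool:
    """Keep the match unless the overlap falls below the threshold."""
    return word_overlap(imdb_title, cleaned_name) >= threshold
```

With this choice of direction, an IMDB title with extra words (say a subtitle) still scores 100% as long as every word of the filename appears in it. Measuring from the IMDB side instead would be stricter.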

Pretty simple, yet very effective. I think the smart part is letting Google do the searching, since it is able to make a pretty good guess at what you’re looking for. The script most commonly fails when the year is missing and the movie is a remake of an older movie with the same title.

And if you happen to be writing an IMDB script, please consider having it always fetch its text parsing commands (regular expressions) from a remote server that can be kept up to date. That way users don’t have to update the script every time IMDB changes its layout.
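A rough sketch of that idea, assuming the server publishes a JSON object that maps field names to regular expressions whose first group captures the value. `PATTERNS_URL` and the field names are made up for illustration; parsing is kept separate from downloading so the parsing part can be used and tested offline.

```python
import json
import re
import urllib.request
from typing import Dict, Pattern

# Hypothetical location of the pattern file; host it wherever you like.
PATTERNS_URL = "https://example.com/imdb-patterns.json"


def load_patterns(raw_json: str) -> Dict[str, Pattern]:
    """Compile a JSON object of {field_name: regex} into compiled patterns."""
    return {name: re.compile(rx, re.S) for name, rx in json.loads(raw_json).items()}


def fetch_patterns(url: str = PATTERNS_URL) -> Dict[str, Pattern]:
    """Download the current patterns, so a layout fix needs no script update."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return load_patterns(resp.read().decode("utf-8"))


def extract(page_html: str, patterns: Dict[str, Pattern]) -> Dict[str, str]:
    """Apply each pattern; group 1 of the first match becomes the field value."""
    fields = {}
    for name, rx in patterns.items():
        match = rx.search(page_html)
        if match:
            fields[name] = match.group(1).strip()
    return fields
```

When IMDB changes its layout, only the JSON file on the server has to be edited; every installed copy picks up the new patterns on its next run.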

Have you seen a better approach to solving this problem? Let me know…