65 Digging Up Movie Info from IMDb



This hack demonstrates a more sophisticated use of Internet access through lynx and a shell script: it searches the Internet Movie Database website (http://www.imdb.com/) for films that match a specified pattern. What makes the script interesting is that it must handle two different formats of returned information: if the search pattern matches more than one movie, moviedata returns a list of possible titles, but if exactly one movie matches, it returns information about that specific film.

As a result, the script must cache the returned page and examine it twice: once to see whether it contains a list of matches, and again, if it turns out to be a single film's page, to extract the summary.

The Code

    #!/bin/sh

    # moviedata - Given a movie title, returns a list of matches, if
    #   there's more than one, or a synopsis of the movie if there's
    #   just one. Uses the Internet Movie Database (imdb.com).

    imdburl="http://us.imdb.com/Tsearch?restrict=Movies+only&title="
    titleurl="http://us.imdb.com/Title?"
    tempout="/tmp/moviedata.$$"

    summarize_film()
    {
       # Produce an attractive synopsis of the film

       # The page title, with its HTML tags stripped
       grep "^<title>" "$tempout" | sed 's/<[^>]*>//g;s/(more)//'

       # The plot outline, stripped of tags, rewrapped, and indented
       grep '<b class="ch">Plot Outline:</b>' "$tempout" | \
         sed 's/<[^>]*>//g;s/(more)//;s/(view trailer)//' | fmt | sed 's/^/  /'

       exit 0
    }

    # Remove the cached page on exit, hangup, or termination
    trap "rm -f $tempout" 0 1 15

    if [ $# -eq 0 ] ; then
      echo "Usage: $0 {movie title | movie ID}" >&2
      exit 1
    fi

    fixedname="$(echo "$@" | tr ' ' '+')"     # for the URL

    # A single all-digit argument is treated as an IMDb movie ID
    if [ $# -eq 1 ] ; then
      nodigits="$(echo "$1" | sed 's/[[:digit:]]*//g')"
      if [ -z "$nodigits" ] ; then
        lynx -source "$titleurl$fixedname" > "$tempout"
        summarize_film
      fi
    fi

    url="$imdburl$fixedname"

    lynx -source "$url" > "$tempout"

    # A search-results page lists several titles; anything else is one film
    if [ ! -z "$(grep "IMDb title search" "$tempout")" ] ; then
      grep 'HREF="/Title?' "$tempout" | \
        sed 's/<OL><LI><A HREF="//;s/<\/A><\/LI>//;s/<LI><A HREF="//' | \
        sed 's/">/ -- /;s/<.*//;s/\/Title?//' | \
        sort -u | \
        more
    else
      summarize_film
    fi

    exit 0

How It Works

This script builds a different URL depending on whether the command argument specified is a film name or an IMDb film ID number, and then it saves the lynx output from the web page to the $tempout file.
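To see the URL decision in isolation, here is a minimal sketch that fetches nothing. The helper names is_movie_id and pick_url are hypothetical (they don't appear in the script); the digit-stripping test and the two URL prefixes are taken from the listing above.

```shell
imdburl="http://us.imdb.com/Tsearch?restrict=Movies+only&title="
titleurl="http://us.imdb.com/Title?"

is_movie_id()    # hypothetical helper, not part of moviedata
{
  # Delete every digit; an argument made only of digits leaves nothing behind
  nodigits="$(echo "$1" | sed 's/[[:digit:]]*//g')"
  [ -n "$1" ] && [ -z "$nodigits" ]
}

pick_url()       # hypothetical helper, not part of moviedata
{
  # $1 is a single word: a numeric ID, or a title already joined with '+'
  if is_movie_id "$1" ; then
    echo "$titleurl$1"
  else
    echo "$imdburl$1"
  fi
}

pick_url 0056172
pick_url lawrence+of+arabia
```

One consequence of this test is that a film whose whole title is a number, given as a single argument, will be looked up as an ID rather than searched for by name.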

If the command argument is a film name, the script then examines $tempout for the string "IMDb title search" to see whether the file contains a list of film names (when more than one movie matches the search criteria) or the description of a single film. Using a complex series of sed substitutions that rely on the source code organization of the IMDb site, it then displays the output appropriately for each of those two possible cases.
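The sed sequence is easier to follow when you run it on a few lines by hand. The sketch below feeds it a made-up sample; the markup is an assumption, guessed from what the patterns look for rather than copied from a real IMDb page, and extract_matches is a hypothetical wrapper around the pipeline from the script.

```shell
# Assumed sample of search-results markup (not captured from IMDb)
sample='<OL><LI><A HREF="/Title?0056172">Lawrence of Arabia (1962)</A></LI>
<LI><A HREF="/Title?0245226">Lawrence of Arabia (1935)</A></LI>
<P>An unrelated line that grep should skip</P>'

extract_matches()    # hypothetical wrapper around the script's pipeline
{
  # Keep only lines that link to a title, peel off the list and anchor
  # markup, then turn the closing "> of the link into a " -- " separator
  grep 'HREF="/Title?' | \
    sed 's/<OL><LI><A HREF="//;s/<\/A><\/LI>//;s/<LI><A HREF="//' | \
    sed 's/">/ -- /;s/<.*//;s/\/Title?//' | \
    sort -u
}

printf '%s\n' "$sample" | extract_matches
```

Each surviving line comes out as an ID, the " -- " separator, and the title, which is the format shown in the results below.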

Running the Script

Though short, this script is quite flexible with input formats: You can specify a film title in quotes or as separate words. If more than one match is returned, you can then specify the seven-digit IMDb ID value shown in the listing to select a specific match.
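Why quoted and unquoted titles behave the same comes down to how the arguments are flattened before tr sees them. Here is a small sketch; join_title is a hypothetical name, and it uses "$*" to make the joining explicit where the script echoes its arguments.

```shell
join_title()    # hypothetical helper, not part of moviedata
{
  # "$*" joins all arguments with spaces, so one quoted argument and
  # several bare words become the same string before tr runs
  echo "$*" | tr ' ' '+'
}

join_title "lawrence of arabia"
join_title lawrence of arabia
```

Both calls print the same plus-separated string, so both forms produce the same search URL.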

The Results

    $ moviedata lawrence of arabia
    0056172 -- Lawrence of Arabia (1962)
    0099356 -- Dangerous Man: Lawrence After Arabia, A (1990) (TV)
    0194547 -- With Allenby in Palestine and Lawrence in Arabia (1919)
    0245226 -- Lawrence of Arabia (1935)
    0363762 -- Lawrence of Arabia: A Conversation with Steven Spielberg (2000) (V)
    0363791 -- Making of 'Lawrence of Arabia', The (2000) (V)
    $ moviedata 0056172
    Lawrence of Arabia (1962)
      Plot Outline: British lieutenant T.E. Lawrence rewrites the political
      history of Saudi Arabia.
    $ moviedata monsoon wedding
    Monsoon Wedding (2001)
      Plot Outline: A stressed father, a bride-to-be with a secret, a
      smitten event planner, and relatives from around the world create
      much ado about the preparations for an arranged marriage in India.

Hacking the Script

The most obvious hack to this script would be to get rid of the ugly IMDb movie ID numbers. The IDs as shown are rather unfriendly and prone to mistyping, so it would be straightforward to hide them and have the shell script output a simple menu with unique index values (e.g., 1, 2, 3) that can then be typed in to select a particular film.

A problem with this script, as with most scripts that scrape values from a third-party website, is that if IMDb changes its page layout, the script will break and you'll need to rework its grep and sed sequences. It's a lurking bug, but with a site like IMDb that hasn't changed in years, probably not a dramatic or dangerous one.
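One way the numbered menu might look is sketched below, working from the "ID -- Title" lines the script already produces. The function names show_menu and id_for_choice are hypothetical, and the sketch assumes that no title itself contains the " -- " separator.

```shell
# Sample lines in the format moviedata prints for multiple matches
matches='0056172 -- Lawrence of Arabia (1962)
0099356 -- Dangerous Man: Lawrence After Arabia, A (1990) (TV)
0245226 -- Lawrence of Arabia (1935)'

show_menu()        # hypothetical helper
{
  # Replace each leading ID with a running index: "1) Title"
  awk -F' -- ' '{ print NR ") " $2 }'
}

id_for_choice()    # hypothetical helper
{
  # $1 is the index the user typed; print the ID found on that line
  awk -F' -- ' -v n="$1" 'NR == n { print $1 }'
}

printf '%s\n' "$matches" | show_menu
printf '%s\n' "$matches" | id_for_choice 2
```

In a full version, the script would read the chosen number from the user, look up the matching ID this way, and then fetch that film's page exactly as it does now for a numeric argument.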
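A cheap defense is to have the script notice when a page no longer looks the way it expects, instead of quietly printing nothing. The sketch below is one possible check, not something the original script does; check_layout is a hypothetical function, and the two markers it looks for are the ones the script already relies on.

```shell
check_layout()    # hypothetical helper, not part of moviedata
{
  # $1 is a cached page; succeed only if one of the markers the script
  # depends on is present
  if grep "IMDb title search" "$1" > /dev/null ||
     grep "^<title>" "$1" > /dev/null ; then
    return 0
  fi
  echo "moviedata: unrecognized page layout; patterns may need updating" >&2
  return 1
}

# Try it on a stand-in for a cached film page
page="/tmp/layoutcheck.$$"
echo "<title>Lawrence of Arabia (1962)</title>" > "$page"
check_layout "$page" && echo "layout looks familiar"
rm -f "$page"
```

Calling a check like this right after lynx writes to $tempout would turn a silent failure into an explicit warning.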




Wicked Cool Shell Scripts. 101 Scripts for Linux, Mac OS X, and Unix Systems
ISBN: 1593270127
Year: 2004
Pages: 150
Authors: Dave Taylor
