Hack 34 Detective Case Study: iFilm


Sometimes, the detective work is more complicated than the solution.

iFilm (http://www.ifilm.com/) offers a huge selection of worthwhile movies, including special clips and trailers from the latest theater flicks; previews for the newest and coolest games; and odd bits of ephemera from contributors, celebrities, and third-party sites. They also go out of their way to create a true media experience for the end user, with frames here, media markup (like SMIL) there, and convoluted logic that a normal visitor would never worry about.

Of course, we're not your average visitor. In this hack, we'll archive iFilm media, specifically the QuickTime versions (yes, I'm an Apple enthusiast), without wading through ad-heavy windows, pop-up annoyances, or movie-specific recommendations. Similar to the Newgrounds hack [Hack #33], we'd like to pass an ID on the command line and, a few minutes later, have a movie ready for watching.

First, we're going to load a page that has the media we want to download. Picking randomly from the home page, we come across a game trailer located at http://ifilm.com/filmdetail?ifilmid=2462842&cch=1. By Jove, there's a unique ID in that URL! We're off to a good start, although we doubt there are two million movies available, a hunch that is confirmed by randomly choosing numbers for the ifilmid and receiving lots of errors. Likewise, that cch gives us no contextual clues; it could stand for just about anything. Removing it from the final URL seems to have no adverse effects, so we'll assume it's not relevant for what we want to do.

Throwing out ineffectual URL-line arguments both helps you focus on what's actually needed to get the job done and keeps your code clear of seemingly useful, yet utterly unnecessary, variables and values.
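If you'd rather paste a whole film-detail URL than pick the ID out by eye, a small helper can do the gleaning. This is only a sketch: the function name extract_ifilmid is our own invention, and it assumes the ID is always a run of digits following ifilmid=, which is all we've observed so far.

```shell
#!/bin/sh
# Hypothetical helper (not part of the original hack): print the ifilmid
# value found in a film-detail URL, ignoring other arguments such as cch.
# Assumes the ID is purely numeric. Prints nothing if no ID is present.
extract_ifilmid() {
    printf '%s\n' "$1" | sed -n 's/.*[?&]ifilmid=\([0-9][0-9]*\).*/\1/p'
}

# Example with the URL from the text:
extract_ifilmid 'http://ifilm.com/filmdetail?ifilmid=2462842&cch=1'
```

Because the sed expression prints only on a match, an empty result is a cheap signal that the URL wasn't a film-detail page after all.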


Whatever movie page we visit, we see standard links: namely, buttons for 56 KB, 200 KB, and 500 KB downloads. Likewise, we notice that, most of the time, the 500 KB download is available only for members: those who have paid for the better quality. Now, if we were viewing this in a browser over a dialup connection, we'd probably always choose the 56 KB version, 'cause we're impatient sleuths. Since we'll be archiving the movies for posterity, however, we'll want the best quality version we can grab, so we'll work our way backward from 500 KB to 56 KB, checking for availability.

Mousing over our three choices, we see they're all using JavaScript links:

 playVideo(2462842, 56, 'no', '', '1', '');
 playVideo(2462842, 200, 'no', '', '1', '');
 playVideo(2462842, 500, 'yes', '', '1', '');

Exploring such links is always walking a fine line between the information you actually need to know and display and other attributes that need not concern you. By glancing at the links, we've learned three things:

  • The ID of the movie is passed to the playVideo function.

  • The quality of the movie is passed as the second argument.

  • Whether a movie requires pay access is passed as the third argument. We've never investigated the internals of the JavaScript to confirm this assertion, but we've instead relied on the fact that most 500 KB versions were for "members only."

We don't know what the other three arguments are and, honestly, we don't need to: they don't appear to differ from movie to movie, so chances are there's no need to investigate.
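To make those three observations concrete, here's a rough sketch that pulls the first three arguments out of a playVideo link. The function name parse_playvideo is hypothetical, and reading the third argument as a "pay access" flag is the same unconfirmed guess we made above.

```shell
#!/bin/sh
# Hypothetical helper: print "ID QUALITY PAYFLAG" for a playVideo(...) call.
# Assumes the argument layout seen in the links above; the meaning of the
# third argument (pay access) is an unverified assumption.
parse_playvideo() {
    printf '%s\n' "$1" |
        sed -n "s/.*playVideo(\([0-9][0-9]*\), *\([0-9][0-9]*\), *'\([a-z]*\)'.*/\1 \2 \3/p"
}

# Example with one of the links from the text:
parse_playvideo "playVideo(2462842, 500, 'yes', '', '1', '');"
```

The remaining three arguments are deliberately ignored, in keeping with "know only what you need to."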

Click the best quality link that isn't a pay version (in our example, 200 KB). Up pops a window allowing us to choose (or have autodetected) our media player. As mentioned, my preference is for QuickTime, so I'll select that as my preferred format, arriving eventually at an iFilm Player window with ads, preference settings, navigation, and more.

Viewing the source, we come across the following snippet:

 <iframe  name="mp" id="mp"  src='/media/components/mp/if/qt.jsp?pinfo=ipt: ifilmgpt:1fid:2462842mt:movbw:200refsite:rcid:prn:it:pop:lid:sid: 1cid:1cch:100cr:1ctxpg:plc:trueadmt:movadid:2471970adbt: sponsoradsn:' style="width:340;height:315;position:absolute;top:60px;left: 340px;z-index:10;overflow : hidden;" frameborder=0 hspace="0" vspace="0"  MARGINWIDTH="0" MARGINHEIGHT="0" SCROLLING="no" class="bg01"></iframe> 

We'll assume the mp in name="mp" id="mp" means "movie player." The most important portion of this is also the longest: the src. For us to find out more about where the movies are served from, we need to open that URL in a window of its own, instead of in that <iframe>. An <iframe>, or "inline frame," keeps us from directly viewing the HTML source it refers to. For us to see the actual HTML of our movie player, we'll need to prefix the <iframe>'s URL with http://ifilm.com/, load it up, and see what we can see.

In the source, we see yet another URL:

 QTSRC="http://www.ifilm.com/media/getmetafile.smil?fid=2462842&mt=mov&bw=200 &adid=2471970&admt=mov&refsite=&pinfo=ipt:ifilmgpt:1fid:2462842mt:mov bw:200refsite:rcid:prn:it:pop:lid:sid:1cid:1cch:100cr:1ctxpg:pl c:trueadmt:movadid:2471970adbt:sponsoradsn:" 

As before, we learn something new. For one, all the URLs are sent around with the movie's unique ID, as well as with our bandwidth choice (&bw). We can also see the format of the video file we chose in various places (mov). All these arguments are handed to something called getmetafile.smil; the .smil extension suggests that at some point we'll be handed a Synchronized Multimedia Integration Language (SMIL) file. SMIL (http://www.w3.org/AudioVideo/) is a markup language specifically for integrating media in presentations. You may not have encountered it before, but in the vein of "know only what you need to," it won't prove much of a worry. Load the provided URL in your browser. Depending on your browser, you'll either be asked to download a file, or you'll be shown plain text in the window. That's the SMIL file we were expecting, which is interpreted by the embedded media player in the original pop-up window we opened; on its own and out of the context of that pop-up window, it provides just the information we're after, the URL of the movie itself:

 <video src="http://anon.ifilm.speedera.net/anon.ifilm/qt/portal/2462842_200.mov" title="IFILM.com" region="IFILM" />

That URL contains the movie's unique ID (2462842), the format type (qt and .mov), and our preferred quality setting (200). It then becomes a simple matter of passing that off to a downloader like wget (see [Hack #26]), and the movie is ours:

 %  wget http://anon.ifilm.speedera.net/anon.ifilm/qt/portal/2462842_200.mov  
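The SMIL step can be scripted too. The sketch below is hedged in two ways: the function names are our own, and the trimmed-down metafile URL assumes the server needs only fid, mt, and bw, which we have not verified (the real request also carried ad-related and pinfo arguments).

```shell
#!/bin/sh
# Hypothetical helpers; the names are illustrative, not from the original hack.

# Build a getmetafile.smil URL from a film ID and a bandwidth value.
# ASSUMPTION (untested): only fid, mt, and bw are required by the server.
metafile_url() {
    printf 'http://www.ifilm.com/media/getmetafile.smil?fid=%s&mt=mov&bw=%s\n' "$1" "$2"
}

# Read SMIL on stdin and print the src attribute found on any line that
# contains a <video> element.
smil_video_src() {
    sed -n 's/.*<video[^>]*src="\([^"]*\)".*/\1/p'
}

# Offline demonstration, using the <video> element shown earlier:
printf '%s\n' '<video src="http://anon.ifilm.speedera.net/anon.ifilm/qt/portal/2462842_200.mov" title="IFILM.com" region="IFILM" />' |
    smil_video_src
```

If the trimmed URL does turn out to work, the two pieces chain together as wget -q -O - "$(metafile_url 2462842 200)" | smil_video_src, and the result can be handed straight to wget.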

You can do much better than all these manual shenanigans with a little shell scripting. We'll assume wget on a Linux system to do the actual downloading and write a simple program to accept a film ID and get the best quality video.

The Code

Save the following code to a file called leechifilm.sh:

 #!/bin/sh
 #
 # LeechiFilm - saves movies from iFilm.com.
 # Part of the Leecharoo suite - for all those hard to leech places.
 # http://disobey.com/d/code/ or contact morbus@disobey.com.
 #
 # This code is free software; you can redistribute it and/or
 # modify it under the same terms as Perl itself.
 #

 # Each command-line argument is treated as one film ID.
 for id in "$@"; do
     f56="http://anon.ifilm.speedera.net/anon.ifilm/qt/portal/${id}_56.mov"
     f200="http://anon.ifilm.speedera.net/anon.ifilm/qt/portal/${id}_200.mov"
     f500="http://anon.ifilm.speedera.net/anon.ifilm/qt/portal/${id}_500.mov"

     # Try the best quality first; fall back only if the download fails.
     wget -c "$f500" || wget -c "$f200" || wget -c "$f56"
 done

Running the Hack

Invoke the script on the command line, passing it a movie's unique ID. You'll need to do the initial work of gleaning that identifier, as we did previously:

 % leechifilm.sh 2462842
 ... etc ...
 --21:43:31--  http://anon.ifilm.speedera.net/anon.ifilm/qt/portal/ [RETURN]
 2462842_200.mov
            => `2462842_200.mov'
 Resolving anon.ifilm.speedera.net... done.
 Connecting to anon.ifilm.speedera.net[64.15.251.217]:80... connected.
 HTTP request sent, awaiting response... 200 OK
 Length: 4,231,752 [video/quicktime]
 ... etc ...

You can also pass multiple film IDs, and wget will trudge along happily. If you're familiar with bash, you can eschew the need for an external script and just define function leechifilm( ) within your .bash_profile.
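One way such a function might look is sketched below. It's an illustration rather than the author's own version: it walks the three quality levels from best to worst and stops at the first download that succeeds, assuming the same speedera.net URL pattern we uncovered earlier.

```shell
# Hypothetical bash function, suitable for pasting into ~/.bash_profile.
# Assumes the URL pattern found above: <base>/<id>_<quality>.mov
leechifilm() {
    local base="http://anon.ifilm.speedera.net/anon.ifilm/qt/portal"
    local id bw
    for id in "$@"; do
        # Best quality first; move on to the next ID after one success.
        for bw in 500 200 56; do
            wget -c "$base/${id}_${bw}.mov" && break
        done
    done
}
```

After reloading your profile, leechifilm 2462842 behaves like the standalone script, and extra IDs can be listed after the first.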



Spidering Hacks
ISBN: 0596005776
Year: 2005
Pages: 157
