Hack 24. Create Link Graphs

Use the font size of links to express the importance of certain terms.

Flickr (http://flickr.com/) is a site that lets users upload images and then tag those images with single-word terms. You can come back later and search the Flickr database using those terms, and you can see a link graph [Hack #91] that shows the most-used terms in a larger font than the less frequently used ones.

In this hack, I show you how to create a link graph by analyzing an article from the Cable News Network web site (http://cnn.com/) for keywords. The script counts how often each word occurs and scales the font size of each word in proportion to its count.
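To make the scaling step concrete, here is a minimal sketch of one way to map a count onto a point size. The function name scale_font_size and the 9pt and 27pt bounds are assumptions chosen for illustration. It also uses min-max scaling, which pins the least-used word at exactly the smallest size; the listing in Example 3-19 multiplies the raw count by a ratio instead, so its sizes come out slightly different.

```php
<?php
// Hypothetical helper: maps a word count onto a font size in points.
// $smallest and $largest are assumed bounds, not values taken from the hack.
function scale_font_size( $count, $min, $max, $smallest = 9, $largest = 27 )
{
  // When every word has the same count there is no range to scale over.
  if ( $max <= $min ) return $smallest;

  // Position of this count between the rarest (0.0) and most common (1.0) word.
  $fraction = ( $count - $min ) / ( $max - $min );

  return (int) round( $smallest + $fraction * ( $largest - $smallest ) );
}

echo scale_font_size( 1, 1, 19 ), "\n";   // 9
echo scale_font_size( 10, 1, 19 ), "\n";  // 18
echo scale_font_size( 19, 1, 19 ), "\n";  // 27
```

The early return covers the case where all words occur equally often, which would otherwise cause a division by zero.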

3.15.1. The Code

Save the code in Example 3-19 as linkgraph.php.

Example 3-19. Link graph code
<?php
// Count how often each word appears in the article text.
$wordcounts = array();

// explode() replaces split(), which was removed in PHP 7.
$words = explode( " ", "CNN number Americans disapproving President Bush job perance risen highest level presidency according CNN USA Today Gallup poll released Monday According poll percent respondents disapproved Bush performance compared percent approved margerror plus minus percentage points percent figure highest disapproval rating recorded CNN USA Today Gallup poll Bush president January approval percentage percent matches low point late March point gap between those disapproved approved largest recorded during Bush tenure As Bush prepares address nation Tuesday defend Iraq policy just percent those responding poll approved handling war percent disapproved Full story approval rating Iraq unchanged poll late May disapproval figure marked increase percentage points But poll found issues other Iraq war dragging down Bush numbers Respondents expressed stronger disapproval handling economy energy policy health care Social Security lone bright spot president poll handling terrorism which scored percent approval rating compared just percent disapproved presidents worst numbers latest poll came issue Social Security respondents disapproving performance margmore percent percent Bush made changing Social Security system signature issue second term He proposed creating voluntary government sponsored personal retirement accounts workers younger Under proposal workers could invest portion their Social Security taxes range government selected funds exchange guaranteed benefits retirement plan run instiff opposition Democrats accounts are too risky undermine Social Security system Some Republicans are wary taking such politically risky economy only percent poll respondents approved Bush performance compared percent disapproved On energy policy percent approved percent disapproved health care percent approved percent disapproved poll results based interviews Friday Sunday American adults" );

foreach( $words as $word )
{
  // Fold case so that "Bush" and "bush" are counted together.
  $word = strtolower( $word );
  if ( strlen( $word ) > 0 )
  {
    if ( ! array_key_exists( $word, $wordcounts ) )
      $wordcounts[ $word ] = 0;
    $wordcounts[ $word ] += 1;
  }
}

// Find the smallest and largest counts.
$min = 1000000;
$max = -1000000;
foreach( array_keys( $wordcounts ) as $word )
{
  if ( $wordcounts[ $word ] > $max ) $max = $wordcounts[ $word ];
  if ( $wordcounts[ $word ] < $min ) $min = $wordcounts[ $word ];
}

// Points of font size added per occurrence. The guard avoids a
// division by zero when every word has the same count.
$ratio = ( $max > $min ) ? 18.0 / ( $max - $min ) : 0;
?>
<html>
<head>
<style type="text/css">
body { font-family: arial, verdana, sans-serif; }
.link { line-height: 20pt; }
</style>
</head>
<body>
<div style="width:600px;">
<?php
// Emit the words alphabetically, each linked to Wikipedia and sized by count.
$wc = array_keys( $wordcounts );
sort( $wc );
foreach( $wc as $word )
{
  $fs = (int)( 9 + ( $wordcounts[ $word ] * $ratio ) );
?>
<a href="http://en.wikipedia.org/wiki/<?php echo( urlencode( $word ) ); ?>" style="font-size:<?php echo( $fs ); ?>pt;"><?php echo( htmlspecialchars( $word ) ); ?></a> &nbsp;
<?php } ?>
</div>
</body>
</html>

I've hardcoded in the keywords of an article; you could just as easily fetch an article from the Web programmatically.
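If you do want to start from a live page, the raw HTML first has to be reduced to a list of words. The following is a rough sketch of that step, not the method used to prepare the keyword list above: the function name extract_words and the short stop-word list are assumptions made for illustration.

```php
<?php
// Hypothetical helper: turns an HTML page into a flat list of lowercase words.
// The stop-word list is an illustrative assumption, not the one used for the hack.
function extract_words( $html )
{
  $stopwords = array( 'a', 'an', 'and', 'in', 'of', 'on', 'that', 'the', 'to', 'with' );

  // Put a space before each tag so words in adjacent elements do not fuse,
  // then drop the markup and decode entities such as &amp;.
  $text = html_entity_decode( strip_tags( str_replace( '<', ' <', $html ) ) );

  // Split on anything that is not a letter and discard empty pieces.
  $words = preg_split( '/[^a-z]+/', strtolower( $text ), -1, PREG_SPLIT_NO_EMPTY );

  // Remove the common words and reindex the result.
  return array_values( array_diff( $words, $stopwords ) );
}

// A literal string stands in for a downloaded page here.
$html = '<p>The poll found that <b>Bush</b> approval fell in the poll.</p>';
print_r( array_count_values( extract_words( $html ) ) );
```

To work on a real article, you could replace the literal string with the result of file_get_contents() on the article's URL, which requires allow_url_fopen to be enabled in your PHP configuration. Note that strip_tags() keeps the text inside script and style elements, so a real page usually needs some extra cleanup first.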


3.15.2. Running the Hack

Upload the file to your web server and navigate your browser to linkgraph.php. You should see something like Figure 3-20.

As you can see, terms like percent, bush, approved, disapproved, security, and social stand out from the rest because they were used more often. From these clues alone, you can tell that this CNN article was about recent polling numbers and Bush's second-term efforts on Social Security. The word disapproved is slightly larger than approved, which could indicate something negative or could just reflect the writing style of the article. Either way, even on this simple data set, a link graph brings out the main features of the data at a glance.
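If you would rather confirm the most frequent terms in code than by eye, a few lines are enough to rank them. This sketch assumes a $wordcounts array shaped like the one the hack builds; the counts below are made-up sample values, not the article's real totals.

```php
<?php
// Sample counts for illustration only; these are not the article's real totals.
$wordcounts = array( 'poll' => 4, 'security' => 6, 'energy' => 2, 'percent' => 9 );

// Sort by count, highest first, keeping each word paired with its count.
arsort( $wordcounts );

// Keep the three most frequent terms.
$top = array_slice( $wordcounts, 0, 3, true );

foreach( $top as $word => $count )
  echo $word, ": ", $count, "\n";
```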

3.15.3. See Also

  • "Search Google by Link Graph" [Hack #91]

Figure 3-20. The link graph of the article




PHP Hacks: Tips & Tools For Creating Dynamic Websites
ISBN: 0596101392
Year: 2006
Pages: 163
