Data Mining

   

Many Web pages contain useful information, but the piece you need is often surrounded by material you don't need or want. Using PHP, you can read a page on the Web and extract just the information you are after.

When extracting information from other sites, it is important to remember that any data on another site is most likely copyrighted. Do not use other people's data for your own purposes without express permission from the owner of that data.

This next example details how to retrieve a stock quote from the Reuters News Service Web page. This is merely an example and should not be used on your own site unless you have express written consent from Reuters to use their data.

HTML is a rather poor language for describing the type of data contained on a page. It works well for formatting data, but it does not label what the data means, so finding the value you need sometimes takes a little creativity.

Before you can extract the data you require, you must look at the data in its "natural" state. In the example below, the data is fetched from the Reuters page at www.reuters.com/quote.jhtml after you search for a quote. Looking at the HTML for the page, you will notice that the quote prices are one line below the "Bid" and "Ask" lines, which are unique to the page. Therefore, you can extract the prices by searching for the unique text.
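The idea of finding a value one line below a unique label can be tried on its own, without touching the network. The snippet below is a minimal sketch under stated assumptions: the function name find_value_after_label() and the sample markup are made up for illustration and are not taken from the Reuters page.

```php
<?php
// Hypothetical helper: return the text on the line that follows the first
// line whose tag-stripped, trimmed content equals $label, or null if absent.
function find_value_after_label(array $lines, string $label): ?string
{
    $count = count($lines);
    // Stop one element early so that $lines[$i + 1] always exists.
    for ($i = 0; $i < $count - 1; $i++) {
        if (trim(strip_tags($lines[$i])) === $label) {
            return trim(strip_tags($lines[$i + 1]));
        }
    }
    return null;
}

// Made-up sample markup, one array element per line of the page.
$sample = [
    "<table>\n",
    "<tr><td><b>Bid</b></td>\n",
    "<td>17.25</td></tr>\n",
    "<tr><td><b>Ask</b></td>\n",
    "<td>17.30</td></tr>\n",
    "</table>\n",
];

echo "Bid: ", find_value_after_label($sample, "Bid"), "\n";
echo "Ask: ", find_value_after_label($sample, "Ask"), "\n";
```

Stopping the loop one element before the end keeps the lookup of the following line inside the array even when the label happens to be on the last line. A real page can change its layout at any time, so a match on label text like this should be treated as fragile.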

This script uses another method of reading a file, called the file() function:

  $file = "/home/me/myfile.txt";
  $filearray = file($file);

The file() function opens the file, reads its contents into an array in which each element holds one line of the file (including its line ending), and then closes the file. file() requires one argument: the name of the file you wish to open, including the path if applicable. The name can also be a URL, as in the script below, provided PHP's allow_url_fopen setting is enabled.
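To see what file() returns, the following sketch writes a small temporary file and reads it back. The file contents and variable names are illustrative assumptions. The FILE_IGNORE_NEW_LINES flag shown here is an optional second argument available in PHP 5 and later, and is not used by the script in this section.

```php
<?php
// Create a small temporary file so the example does not depend on a real path.
$path = tempnam(sys_get_temp_dir(), "demo");
file_put_contents($path, "first line\nsecond line\nthird line\n");

// Default behavior: one array element per line, line endings kept.
$with_newlines = file($path);

// FILE_IGNORE_NEW_LINES drops the trailing newline from each element.
$without_newlines = file($path, FILE_IGNORE_NEW_LINES);

echo count($with_newlines), " lines read\n";
var_dump($with_newlines[0]);     // the string still ends with a newline
var_dump($without_newlines[0]);  // same text without the newline

// Remove the temporary file again.
unlink($path);
```

Because the default result keeps line endings, comparing an element against a fixed string usually needs trim() first. Note also that file() returns false rather than an array when the file cannot be read, so production code should check the result before looping over it.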

Once again, remember to respect copyright notices when using scripts that mine Web pages for data! The result of this script can be seen in Figure 6-2.

Script 6-2. stockquote.php

   1.  <html>
   2.  <head><title>Stock Quote Mining</title></head>
   3.  <body>
   4.  <form action="stockquote.php">
   5.  <p>Enter Symbol For Quote: <input type="text" name="symbol">
   6.  <br><input type="submit" name="submit" value="Fetch Quote">
   7.  </form>
   8.  <hr noshade>
   9.  <?
  10.  if(isset($symbol)) {
  11.    $file = "http://www.reuters.com/quote.jhtml?ticker=";
  12.    $contents = file($file . $symbol);
  13.    $size = sizeof($contents);
  14.    echo "<h3>Quote for $symbol:</h3>";
  15.    for($i = 0; $i < $size; $i++) {
  16.      $data = strip_tags($contents[$i]);
  17.      if(trim($data) == "Ask") {
  18.        $quote = strip_tags($contents[$i + 1]);
  19.        echo "<P>Ask: $quote";
  20.      }
  21.      if(trim($data) == "Bid") {
  22.        $quote = strip_tags($contents[$i + 1]);
  23.        echo "<P>Bid: $quote";
  24.      }
  25.    }
  26.  }
  27.  ?>
  28.  </body>
  29.  </html>

Figure 6-2. stockquote.php


Script 6-2. stockquote.php Line-by-Line Explanation

LINE    DESCRIPTION

1-8     Print out the beginning HTML for the page, as well as a form asking for the symbol of the stock for which the user wishes to retrieve a quote.

9       Begin the PHP for the page.

10      Check to see if the $symbol variable has been set. If it has, then execute lines 11-26.

11      Assign the $file variable to the page that has stock quote information.

12      Assign the data opened from the file() function to the $contents variable. The file() function's argument ($file . $symbol) is the URL appended with the stock quote symbol entered by the user. For example, if the user entered "CSCO" (the symbol for Cisco Systems), then the argument is "http://www.reuters.com/quote.jhtml?ticker=CSCO".

13      Get the size of the $contents array. This is used below to loop through each record in the array.

14      Print out a heading.

15-25   Use a for loop to loop through each record in the array.

16      Strip out the HTML tags from the record retrieved from the array and assign this value to the $data variable.

17      If, after trimming whitespace from $data, the resulting value is "Ask", then we know, after analyzing the page we wish to mine, that the value we are looking for is on the next line. Execute lines 18-20.

18      Get the "Ask" quote from the next line in the array ($contents[$i + 1]) and assign it to the $quote variable.

19      Echo the "Ask" quote to the page.

20      Close out the if statement started on line 17.

21      If, after trimming whitespace from $data, the resulting value is "Bid", then we know, after analyzing the page we wish to mine, that the value we are looking for is on the next line. Execute lines 22-24.

22      Get the "Bid" quote from the next line in the array ($contents[$i + 1]) and assign it to the $quote variable.

23      Echo the "Bid" quote to the page.

24      Close out the if statement started on line 21.

25      Close out the for loop started on line 15.

26      Close out the if statement started on line 10.

27-29   End the PHP and HTML for the page.
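Script 6-2 reads $symbol as a plain global, which depends on the register_globals setting (removed in PHP 5.4), and it places the user's input into the URL and the page without encoding it. The following is a minimal sketch of how lines 10 through 14 might be handled on a current PHP version. The helper names build_quote_url() and quote_heading() are hypothetical and are not part of the original script.

```php
<?php
// Hypothetical helper: build the quote URL from a base and a user-supplied symbol.
function build_quote_url(string $base, string $symbol): string
{
    // urlencode() keeps characters such as "&" or spaces from altering the query string.
    return $base . urlencode($symbol);
}

// Hypothetical helper: build the heading with the symbol escaped for HTML output.
function quote_heading(string $symbol): string
{
    return "<h3>Quote for " . htmlspecialchars($symbol, ENT_QUOTES, "UTF-8") . ":</h3>";
}

// Read the form field from $_GET instead of relying on a global $symbol.
$symbol = (isset($_GET["symbol"]) && is_string($_GET["symbol"]))
    ? trim($_GET["symbol"])
    : "";

if ($symbol !== "") {
    $url = build_quote_url("http://www.reuters.com/quote.jhtml?ticker=", $symbol);
    echo quote_heading($symbol), "\n";
    // The page at $url would be read with file() here, checking for a
    // false return value before looping over the lines.
}
```

Keeping the URL building and the HTML escaping in small functions makes both steps easy to check in isolation, for example by confirming that a symbol containing "<" or "&" comes out encoded rather than being passed through verbatim.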


   


Advanced PHP for Web Professionals
ISBN: 0130085391
Year: 2005
Pages: 92
