Recipe 13.1. Fetching a URL with the GET Method


13.1.1. Problem

You want to retrieve the contents of a URL. For example, you want to include part of one web page in another page's content.

13.1.2. Solution

Provide the URL to file_get_contents( ), as shown in Example 13-1.

Example 13-1. Fetching a URL with file_get_contents( )

<?php
$page = file_get_contents('http://www.example.com/robots.txt');
?>

Or you can use the cURL extension, as shown in Example 13-2.

Example 13-2. Fetching a URL with cURL

<?php
$c = curl_init('http://www.example.com/robots.txt');
curl_setopt($c, CURLOPT_RETURNTRANSFER, true);
$page = curl_exec($c);
curl_close($c);
?>

You can also use the HTTP_Request class from PEAR, as shown in Example 13-3.

Example 13-3. Fetching a URL with HTTP_Request

<?php
require_once 'HTTP/Request.php';
$r = new HTTP_Request('http://www.example.com/robots.txt');
$r->sendRequest();
$page = $r->getResponseBody();
?>

13.1.3. Discussion

file_get_contents( ), like all PHP file-handling functions, uses PHP's streams feature. This means that it can handle local files as well as a variety of network resources, including HTTP URLs. There's a catch, though: the allow_url_fopen configuration setting must be turned on (which it usually is).
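
Because stream-based URL fetching depends on allow_url_fopen, a script that may run on unfamiliar hosts can check the setting before relying on it. The following is a minimal sketch; url_fopen_enabled( ) is a hypothetical helper name, not a PHP built-in.

```php
<?php
// Hypothetical helper: reports whether the stream-based functions
// (file_get_contents(), fopen(), and so on) may open URLs.
function url_fopen_enabled() {
    // ini_get() returns the setting as a string such as "1", "0",
    // "On", "Off", or an empty string; filter_var() normalizes it
    return filter_var(ini_get('allow_url_fopen'), FILTER_VALIDATE_BOOLEAN);
}

if (url_fopen_enabled()) {
    print "allow_url_fopen is on; file_get_contents() can fetch URLs.\n";
} else {
    print "allow_url_fopen is off; use cURL or another HTTP client.\n";
}
```

If the check fails, the cURL and PEAR approaches from the Solution do not depend on this setting.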

This makes for extremely easy retrieval of remote documents. As Example 13-4 shows, you can use the same technique to grab a remote XML document.

Example 13-4. Fetching a remote XML document

<?php
$url = 'http://rss.news.yahoo.com/rss/oddlyenough';
$rss = simplexml_load_file($url);
print '<ul>';
foreach ($rss->channel->item as $item) {
    print '<li><a href="' .
          htmlentities($item->link) .
          '">' .
          htmlentities($item->title) .
          '</a></li>';
}
print '</ul>';
?>

To retrieve a page that includes query string variables, use http_build_query( ) to create the query string. It accepts an array of key/value pairs and returns a single string with everything properly escaped. You're still responsible for the ? in the URL that sets off the query string. Example 13-5 demonstrates http_build_query( ).

Example 13-5. Building a query string with http_build_query( )

<?php
$vars = array('page' => 4, 'search' => 'this & that');
$qs = http_build_query($vars);
$url = 'http://www.example.com/search.php?' . $qs;
$page = file_get_contents($url);
?>
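
To see what "properly escaped" means here, it can help to print the query string itself. This short sketch uses the same array and passes '&' as the separator explicitly, so the output does not depend on the arg_separator.output configuration setting; that extra argument is an addition for the illustration.

```php
<?php
$vars = array('page' => 4, 'search' => 'this & that');

// The second argument is the numeric-key prefix (unused here);
// the third forces '&' as the separator between pairs
$qs = http_build_query($vars, '', '&');
print $qs . "\n";
// prints: page=4&search=this+%26+that

// parse_str() reverses the encoding, which makes a handy sanity check
parse_str($qs, $decoded);
print $decoded['search'] . "\n";
// prints: this & that
```

The spaces become + and the literal & becomes %26, so the server sees one search value rather than two separate variables.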

To retrieve a protected page, put the username and password in the URL. In Example 13-6, the username is david, and the password is hax0r.

Example 13-6. Retrieving a protected page

<?php
$url = 'http://david:hax0r@www.example.com/secrets.php';
$page = file_get_contents($url);
?>
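
A password that contains characters such as @ or : is awkward to embed in a URL. As an alternative sketch, and assuming the server uses HTTP Basic authentication, the credentials can be sent in an Authorization header through a stream context. The helper name basic_auth_context( ) is made up for this illustration.

```php
<?php
// Hypothetical helper: builds a stream context that sends
// HTTP Basic credentials in an Authorization header.
function basic_auth_context($user, $password) {
    $header = 'Authorization: Basic ' .
              base64_encode($user . ':' . $password);
    return stream_context_create(array('http' => array('header' => $header)));
}

$context = basic_auth_context('david', 'hax0r');

// Show the header that would be sent
$options = stream_context_get_options($context);
print $options['http']['header'] . "\n";

// Pass the context as the third argument when fetching
// (left commented out so the sketch runs without a network):
// $page = file_get_contents('http://www.example.com/secrets.php',
//                           false, $context);
```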

Example 13-7 shows how to retrieve a protected page with cURL.

Example 13-7. Retrieving a protected page with cURL

<?php
$c = curl_init('http://www.example.com/secrets.php');
curl_setopt($c, CURLOPT_RETURNTRANSFER, true);
curl_setopt($c, CURLOPT_USERPWD, 'david:hax0r');
$page = curl_exec($c);
curl_close($c);
?>

Example 13-8 shows how to retrieve a protected page with HTTP_Request.

Example 13-8. Retrieving a protected page with HTTP_Request

<?php
require_once 'HTTP/Request.php';
$r = new HTTP_Request('http://www.example.com/secrets.php');
$r->setBasicAuth('david','hax0r');
$r->sendRequest();
$page = $r->getResponseBody();
?>

PHP's http stream wrapper automatically follows redirects. Since PHP 5.0.0, file_get_contents( ) and fopen( ) support a stream context argument that lets you specify options for how the stream is retrieved. In PHP 5.1.0 and later, one of those options is max_redirects, the maximum number of redirects to follow. Example 13-9 sets max_redirects to 1, which turns off redirect following.

Example 13-9. Not following redirects

<?php
$url = 'http://www.example.com/redirector.php';
// Define the options
$options = array('max_redirects' => 1 );
// Create a context with options for the http stream
$context = stream_context_create(array('http' => $options));
// Pass the options to file_get_contents. The second
// argument is whether to use the include path, which
// we don't want here.
print file_get_contents($url, false, $context);
?>

The max_redirects stream wrapper option actually indicates not how many redirects should be followed, but the maximum number of requests that should be made when following the redirect chain. That is, a value of 1 tells PHP to make at most 1 request (follow no redirects). A value of 2 tells PHP to make at most 2 requests (follow no more than 1 redirect). A value of 0, however, behaves like a value of 1: PHP makes just 1 request.

If the redirect chain would have PHP make more requests than are allowed by max_redirects, PHP issues a warning.
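
Since max_redirects counts requests rather than redirects, the off-by-one is easy to get wrong. One way to keep it straight is a small wrapper that takes the number of redirects you are willing to follow and adds 1. This is only a sketch; redirect_limit_context( ) is a hypothetical helper.

```php
<?php
// Hypothetical helper: translate "follow at most $redirects redirects"
// into the request count that max_redirects expects.
function redirect_limit_context($redirects) {
    $options = array('max_redirects' => $redirects + 1);
    return stream_context_create(array('http' => $options));
}

// Follow at most 2 redirects, which means at most 3 requests
$context = redirect_limit_context(2);
$options = stream_context_get_options($context);
print $options['http']['max_redirects'] . "\n";
// prints: 3

// A fetch that exceeds the limit triggers a warning and returns false.
// The fetch is commented out so the sketch runs without a network:
// $page = file_get_contents('http://www.example.com/redirector.php',
//                           false, $context);
// if ($page === false) { print "Request failed or too many redirects\n"; }
```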

cURL only follows redirects when the CURLOPT_FOLLOWLOCATION option is set, as shown in Example 13-10.

Example 13-10. Following redirects with cURL

<?php
$c = curl_init('http://www.example.com/redirector.php');
curl_setopt($c, CURLOPT_RETURNTRANSFER, true);
curl_setopt($c, CURLOPT_FOLLOWLOCATION, true);
$page = curl_exec($c);
curl_close($c);
?>

To set a maximum number of redirects that cURL should follow, set CURLOPT_FOLLOWLOCATION to true and then set the CURLOPT_MAXREDIRS option to that maximum number.
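
Putting the two options together might look like the following sketch. The function name fetch_following_redirects( ) and the limit of 3 are assumptions chosen for illustration, and the code requires the cURL extension.

```php
<?php
// Hypothetical helper: fetch a URL with cURL, following at most
// $maxRedirects redirects. Returns the body, or false on failure.
function fetch_following_redirects($url, $maxRedirects) {
    $c = curl_init($url);
    curl_setopt($c, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($c, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($c, CURLOPT_MAXREDIRS, $maxRedirects);
    $page = curl_exec($c);
    if ($page === false) {
        // curl_error() describes what went wrong, including
        // a redirect chain longer than the limit
        print 'cURL error: ' . curl_error($c) . "\n";
    }
    curl_close($c);
    return $page;
}

// Example call (needs network access):
// $page = fetch_following_redirects(
//     'http://www.example.com/redirector.php', 3);
```

Unlike max_redirects in the http stream wrapper, CURLOPT_MAXREDIRS counts redirects, not requests.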

HTTP_Request does not follow redirects, but another PEAR module, HTTP_Client, can. HTTP_Client wraps around HTTP_Request and provides additional capabilities. Example 13-11 shows how to use HTTP_Client to follow redirects.

Example 13-11. Following redirects with HTTP_Client

<?php
require_once 'HTTP/Client.php';

// The page to fetch
$url = 'http://www.example.com/redirector.php';
// Create a client
$client = new HTTP_Client();
// Issue a GET request
$client->get($url);
// Get the response
$response = $client->currentResponse();
// $response is an array with three elements:
// code, headers, and body
print $response['body'];
?>

cURL can do a few different things with the page it retrieves. As you've seen in previous examples, if CURLOPT_RETURNTRANSFER is set, curl_exec( ) returns the body of the page requested. If CURLOPT_RETURNTRANSFER is not set, curl_exec( ) prints the response body.

To write the retrieved page to a file, open a file handle for writing with fopen( ) and set the CURLOPT_FILE option to that file handle. Example 13-12 uses cURL to copy a remote web page to a local file.

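Example 13-12. Writing a response body to a file with cURL

<?php
// Open the local file for writing; stop if that fails
$fh = fopen('local-copy-of-files.html','w')
    or die("Can't open local-copy-of-files.html for writing");
$c = curl_init('http://www.example.com/files.html');
// Send the response body to the file handle instead of printing it
curl_setopt($c, CURLOPT_FILE, $fh);
curl_exec($c);
curl_close($c);
// Flush and close the local copy
fclose($fh);
?>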

To pass the cURL resource and the contents of the retrieved page to a function, set the CURLOPT_WRITEFUNCTION option to a callback for that function: either a function name as a string, or an array that holds a class name or object instance and a method name. The "write function" must return the number of bytes it was passed. Note that with large responses, the write function might get called more than once as cURL processes the response in chunks. Example 13-13 uses a cURL write function to save page contents in a database.

Example 13-13. Saving a page to a database table with cURL

<?php
class PageSaver {
    protected $db;
    protected $page = '';

    public function __construct() {
        $this->db = new PDO('sqlite:./pages.db');
    }

    // Called by cURL, possibly several times, as chunks arrive
    public function write($curl, $data) {
        $this->page .= $data;
        return strlen($data);
    }

    public function save($curl) {
        $info = curl_getinfo($curl);
        $st = $this->db->prepare('INSERT INTO pages '.
                           '(url,page) VALUES (?,?)');
        $st->execute(array($info['url'], $this->page));
    }
}

// Create the saver instance
$pageSaver = new PageSaver();
// Create the cURL resource
$c = curl_init('http://www.sklar.com/');
// Set the write function
curl_setopt($c, CURLOPT_WRITEFUNCTION, array($pageSaver,'write'));
// Execute the request
curl_exec($c);
// Save the accumulated data
$pageSaver->save($c);
?>
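
The chunking behavior is easier to see without a network or a database. The sketch below calls a write method by hand, twice, the way cURL might for a single large response. ChunkCollector is a hypothetical class, and passing null for the handle is only a stand-in for the resource cURL would supply.

```php
<?php
// Hypothetical stand-in for a write-callback target: collects chunks
class ChunkCollector {
    public $chunks = 0;
    public $body = '';

    public function write($curl, $data) {
        $this->chunks++;
        $this->body .= $data;
        // Must return the number of bytes received;
        // cURL treats any other value as an error
        return strlen($data);
    }
}

$collector = new ChunkCollector();
// Simulate cURL delivering one response in two pieces. With a real
// request, cURL itself calls write() and passes its handle as $curl.
$collector->write(null, '<html><body>Hello, ');
$collector->write(null, 'world</body></html>');

print $collector->chunks . " chunks received\n";
print $collector->body . "\n";
```

Because the callback only ever sees one chunk at a time, any processing that needs the whole page has to wait until curl_exec( ) returns, as save( ) does in Example 13-13.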

13.1.4. See Also

Recipe 13.2 for fetching a URL with the POST method; documentation on file_get_contents( ) at http://www.php.net/file_get_contents, simplexml_load_file( ) at http://www.php.net/simplexml_load_file, stream_context_create( ) at http://www.php.net/stream_context_create, curl_init( ) at http://www.php.net/curl-init, curl_setopt( ) at http://www.php.net/curl-setopt, curl_exec( ) at http://www.php.net/curl-exec, curl_getinfo( ) at http://www.php.net/curl_getinfo, and curl_close( ) at http://www.php.net/curl-close; the PEAR HTTP_Request class at http://pear.php.net/package/HTTP_Request; and the PEAR HTTP_Client class at http://pear.php.net/package/HTTP_Client.




PHP Cookbook, 2nd Edition
PHP Cookbook: Solutions and Examples for PHP Programmers
ISBN: 0596101015
Year: 2006
Pages: 445
