19.1 Stripping HTML Tags


You want to remove or extract all HTML tags from a document.

Technique

Either use the strip_tags() function after you have read the entire file or string:

 <?php $new_str = strip_tags($old_str); ?> 
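
As a quick illustration of what the call does, here is a minimal sketch. The sample string is hypothetical and not part of the recipe. The optional second argument to strip_tags() lists tags that should be kept.

```php
<?php
// Hypothetical sample input, used only for illustration.
$old_str = '<p>Hello, <b>world</b>!</p>';

// Remove every tag.
$plain = strip_tags($old_str);

// Remove every tag except <b>.
$partly = strip_tags($old_str, '<b>');

echo $plain, "\n";   // Hello, world!
echo $partly, "\n";  // Hello, <b>world</b>!
```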

Or, when looping through a file line by line, use the fgetss() function (deprecated in PHP 7.3 and removed in PHP 8.0):

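 <?php
 $fp = @fopen('/path/to/somefile', 'r')
     or die('Cannot Open Somefile');
 $no_html_tags = '';
 // Compare against false so a line containing only "0" does not end the loop.
 while (($line = @fgetss($fp, 1024)) !== false) {
     $no_html_tags .= $line;
 }
 fclose($fp);
 ?>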
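
On PHP 8 and later, one possible substitute is to read the whole file and call strip_tags() once. The sketch below assumes the file fits in memory. The helper name strip_tags_from_file() and the temporary demo file are hypothetical.

```php
<?php
// Hypothetical helper: read a whole file and strip its tags in one pass.
function strip_tags_from_file(string $path): string
{
    $contents = file_get_contents($path);
    if ($contents === false) {
        throw new RuntimeException("Cannot open $path");
    }
    return strip_tags($contents);
}

// Demo on a temporary file; note the <a> tag that spans two lines.
$tmp = tempnam(sys_get_temp_dir(), 'html');
file_put_contents($tmp, "<h1>Title</h1>\n<p>Body <a\nhref=\"#\">link</a></p>\n");
$no_html_tags = strip_tags_from_file($tmp);
unlink($tmp);

echo $no_html_tags;
```

Processing the file as one string means a tag that is split across two lines is still removed, which calling strip_tags() on each line separately would not guarantee.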

Comments

Sometimes you have HTML documents that you want to convert to plain text, a format accessible to everyone, everywhere. PHP enables you to do this by providing functions that take away all the HTML tags in a document. In other languages, this often requires complex regular expressions. But there is one weakness: these functions leave character entities untouched. In HTML, things such as extra spaces and copyright characters (©) are represented with special format specifiers that look something like this:

 &individual_specifier;

Therefore, we must also eliminate these unwanted specifiers and replace them with their corresponding characters. To do this, use the HTML_processor class from PEAR:

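 <?php
 require("HTML_processor.php");
 $processor = new HTML_Processor;
 // Call-time pass-by-reference (&$text) is a fatal error since PHP 5.4.
 // This call assumes ConvertSpecial() declares its parameter by reference.
 $processor->ConvertSpecial($text);
 ?>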
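
If that class is not available, PHP's built-in html_entity_decode() can do the same replacement. This is a swapped-in alternative, not the recipe's own method, shown as a minimal sketch. The sample string is hypothetical, and UTF-8 output is an assumption.

```php
<?php
// Hypothetical sample input, used only for illustration.
$html = '<p>Fish &amp; chips&nbsp;&copy; 2000</p>';

// Strip tags first, so that text such as "&lt;b&gt;" is not decoded
// into a tag and then removed.
$text = strip_tags($html);

// Decode entities into UTF-8 characters (the encoding is an assumption).
$text = html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');

// &nbsp; decodes to U+00A0; turn it into an ordinary space for plain text.
$text = str_replace("\u{A0}", ' ', $text);

echo $text, "\n";
```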


PHP Developer's Cookbook (2nd Edition)
ISBN: 0672323257
Year: 2000
Pages: 351
