12.7 Implementation Considerations


Two big concerns face the Perl programmer implementing web services: Unicode and performance. Perl 5.8 has solid Unicode support, required for XML processing, but most programmers have never needed to use it. Perl has traditionally been slammed for performance, but it's possible to write blazing web services in Perl. This section addresses both issues.

12.7.1 Internationalization

XML performs all character processing in terms of the Universal Character Set (UCS) specification. Look for more information on UCS at:

http://www.unicode.org/unicode/uni2book/u2.html

Many specifications that use XML speak about the character set or charset of a string or a document, which denotes both the character repertoire and the encoding form (or simply, encoding) used to represent sequences of characters as sequences of bytes. The XML specification requires all XML processors to support both the UTF-8 and UTF-16 encodings of UCS.

UTF-8, the most common encoding, is a UCS Transformation Format (defined in the UCS specification) that serializes a Unicode character as a sequence of one to four bytes. The advantages of UTF-8 are well known: it's compact compared with other encoding forms, it uses a single byte to encode each ASCII character, and every XML parser must know how to deal with it. UTF-16 is slightly more complex: it requires a minimum of two bytes per character, and the XML specification requires a Byte Order Mark (BOM) with this encoding. Per the UCS specification, a BOM signature may be used with all the encoding forms (UCS-4, UCS-2, UTF-32, UTF-16, and UTF-8). For the multibyte forms, it lets a receiver detect messages that use a byte ordering other than big-endian. It also gives a strong hint about which encoding form is used, which is its only role in UTF-8, where byte order isn't an issue.
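
To make the byte counts concrete, the following minimal sketch uses the Encode module (standard since Perl 5.8) to measure a few characters in UTF-8 and UTF-16. The sample characters and the helper name encoded_length are chosen for illustration only and don't come from the text above.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Number of bytes needed to store a character string in the given encoding.
# (Hypothetical helper, named for this example only.)
sub encoded_length {
    my ($encoding, $chars) = @_;
    return length(encode($encoding, $chars));
}

# Sample characters picked for illustration.
my @samples = (
    [ 'U+0041 LATIN CAPITAL LETTER A'      => "\x{41}"    ],
    [ 'U+0416 CYRILLIC CAPITAL LETTER ZHE' => "\x{416}"   ],
    [ 'U+4E2D CJK IDEOGRAPH'               => "\x{4E2D}"  ],
    [ 'U+10348 GOTHIC LETTER HWAIR'        => "\x{10348}" ],
);

for my $sample (@samples) {
    my ($name, $char) = @$sample;
    printf "%-36s UTF-8: %d byte(s), UTF-16BE: %d byte(s)\n",
        $name,
        encoded_length('UTF-8',    $char),
        encoded_length('UTF-16BE', $char);
}

# The plain "UTF-16" encoder prepends a big-endian BOM (bytes FE FF).
my $with_bom = encode('UTF-16', "A");
my $hex = join ' ', map { sprintf('%02X', ord($_)) } split //, $with_bom;
print "UTF-16 with BOM: $hex\n";
```

The explicit UTF-16BE and UTF-16LE encoders emit no BOM, which is why the byte counts in the loop reflect the characters alone.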

You may be lucky and never have to deal with the sending or receiving of non-English characters. If you do have to, there are several questions to ask: what encoding to use, how to convert data into that encoding, and how to specify the encoding for messages on the wire.

If the characters you send don't require a specific code table (as Chinese, Japanese, or Korean text does), it's best to stay with the UTF-8 or ISO 8859-1 encodings. To convert from ISO 8859-1 to UTF-8 using Perl 5.6 and later, the following code may be used:

my $utf8 = pack('U*', unpack('C*', $latin1));  # $latin1 holds the ISO 8859-1 bytes

Those who need to work with other encodings may use the Unicode::Map8, Unicode::String, or Encode modules. The Unicode support supplied with Perl 5.8 is much more complete and allows easy conversion between encodings using the Encode module from Perl's standard library:

use Encode;

$utf8 = decode('iso-8859-1', $data);

# this code will make $utf8 consist of completely valid UTF-8 string
# AND turn utf8 flag on (unless $data is in ASCII/EBCDIC only)
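
As a minimal sketch of the full round trip, the following code decodes ISO 8859-1 bytes into a Perl character string, encodes that string as UTF-8 bytes, and checks that the pack/unpack idiom gives the same characters. The sample string and the variable names are assumptions made for illustration.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# "cafe" with an e-acute, as ISO 8859-1 bytes (sample data for illustration).
my $latin1_bytes = "caf\xE9";

# Bytes -> Perl character string; the UTF-8 flag is turned on because
# the data contains a non-ASCII character.
my $chars = decode('iso-8859-1', $latin1_bytes);

# Character string -> UTF-8 bytes, ready to be written to the wire.
my $utf8_bytes = encode('UTF-8', $chars);

# The pack/unpack idiom produces the same character string.
my $via_pack = pack('U*', unpack('C*', $latin1_bytes));

printf "characters: %d, UTF-8 bytes: %d, flagged: %s, same as pack: %s\n",
    length($chars),
    length($utf8_bytes),
    (utf8::is_utf8($chars)  ? 'yes' : 'no'),
    ($via_pack eq $chars    ? 'yes' : 'no');
```

The distinction between the character string ($chars) and its serialized bytes ($utf8_bytes) matters: length() counts characters for the former and bytes for the latter.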

To specify the encoding on the wire with the SOAP::Lite module, use the encoding method:

use SOAP::Lite;

# specify type explicitly so it won't be encoded as base64
# ($text holds the non-ASCII string to send)
my $string = SOAP::Data->type(string => $text);

my $result = SOAP::Lite
  -> proxy ("...")
  -> uri ("...")
  -> encoding('iso-8859-1') # specify encoding, because default is UTF-8
  -> hello($string)
  -> result;

Nothing needs to be done on the server side. The XML::Parser module that is used by default in SOAP::Lite always returns data using UTF-8 encoding. There is one thing to watch for: returned strings are always encoded as UTF-8, but they may not be recognized as UTF-8 strings by Perl, which might be important in some cases.

The XML::Parser module (Version 2.28 and later) tags the strings it generates as UTF-8 in Perl 5.6 and beyond. In other cases (for example, if you're running an older version of XML::Parser), something similar to the following code can be used in Perl 5.7 and beyond:

$result = pack 'U0A*', $result;
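
The difference between "encoded as UTF-8" and "recognized as UTF-8" can be seen in a short sketch. It uses Encode's decode function rather than the pack idiom, and the sample bytes are an assumption chosen for illustration.

```perl
use strict;
use warnings;
use Encode qw(decode);

# The two UTF-8 bytes of U+0416, held in a plain byte string:
# valid UTF-8 data, but Perl doesn't know that yet.
my $raw = "\xD0\x96";

# After decoding, Perl treats the data as a single character.
my $text = decode('UTF-8', $raw);

printf "before: length %d, flagged %s\n",
    length($raw), (utf8::is_utf8($raw) ? 'yes' : 'no');
printf "after:  length %d, flagged %s, code point U+%04X\n",
    length($text), (utf8::is_utf8($text) ? 'yes' : 'no'), ord($text);
```

Until the string is flagged, functions such as length, substr, and regular expression matching operate on bytes rather than characters, which is where the "important in some cases" caveat comes from.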

Simon Cozens's "Perl and Unicode" tutorial (http://www.netthink.co.uk/downloads/unicode.pdf) and Grant McLean's "Perl-XML Frequently Asked Questions" (http://perl-xml.sourceforge.net/faq/) provide extensive coverage of this topic.

Besides the character set and encoding considerations, there is one more aspect of internationalization that people who work with XML need to be aware of. Text encapsulated in XML can be written in many different human languages, and applications often need to process each language differently (e.g., as the result of content negotiation, one value or another has to be presented). The XML specification defines a special attribute, xml:lang, which specifies the language in which the content of an element is written. For instance, xml:lang="en" can be used to mark elements in a language-dependent context.
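
As a rough illustration of language-dependent selection, the sketch below picks one value out of several according to xml:lang. The document, the element names, and the greeting_for helper are all hypothetical, and the regular expression is a deliberately naive stand-in for a real XML parser.

```perl
use strict;
use warnings;

# Hypothetical document with the same value in three languages.
my $xml = <<'XML';
<greetings>
  <greeting xml:lang="en">Hello</greeting>
  <greeting xml:lang="de">Hallo</greeting>
  <greeting xml:lang="fr">Bonjour</greeting>
</greetings>
XML

# Return the text of the <greeting> marked with the requested language,
# or undef if there is none. Naive matching, for illustration only.
sub greeting_for {
    my ($doc, $lang) = @_;
    my ($text) = $doc =~ m{<greeting\s+xml:lang="\Q$lang\E">([^<]*)</greeting>};
    return $text;
}

for my $lang (qw(en de ja)) {
    my $greeting = greeting_for($xml, $lang);
    printf "%s: %s\n", $lang, defined $greeting ? $greeting : '(no value)';
}
```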

12.7.2 Performance and Optimization

The first thing that comes to mind when performance and optimization for web-based applications are discussed is the switch from a CGI application on the server side to something more persistent, like mod_perl, FastCGI, or PerlEx from ActiveState. Using mod_perl or a daemon server instead of CGI for a SOAP::Lite implementation may improve performance tenfold in some situations; however, there are cases in which you will want even better results.

Even though the SOAP::Lite module isn't the best performer in a web-services world, the modular structure lets you swap out components to get implementations that address your specific requirements, such as the ability to send big documents or do streaming processing.

Because parsing XML messages is involved in the processing of every request, the first natural step is to replace the default XML::Parser module with a module that provides better performance (the XML::LibXML::SAX::Parser module looks promising, even though it can be more expensive memory-wise). Another reason for changing the parser is the ability to do SOAP processing on platforms where XML::Parser isn't available (such as WinCE, although this situation may change before this book is published). Even though the SOAP::Lite package includes a lightweight regexp-based parser (XML::Parser::Lite) that works anywhere Perl works, it doesn't provide full XML parser support. Matt Sergeant's XML::SAX::PurePerl parser is the better choice. Note that you may need an additional module to convert SAX callbacks into the Expat callbacks currently used in SOAP::Lite; a future version of SOAP::Lite may include direct support for the SAX interface. The code to register a new parser looks simple:

use SOAP::Lite;

BEGIN {
  package MyParser;
  use base qw(SOAP::Parser);

  # parser code here
}

my $soap = SOAP::Lite
  ->proxy("...")
  ->uri("...");

# register new parser
$soap->deserializer->parser(MyParser->new);

In most cases, changing the parser doesn't provide a significant performance boost: the deserializer spends most of its time on memory management, which is still the most time-consuming operation in many computing processes. The serializer and deserializer can be replaced in a similar fashion (on both the client and the server side), which allows new functionality (such as streaming processing) to be implemented while still reusing the rest of the components:

my $soap = SOAP::Lite
  ->proxy("...")
  ->uri("...")
  ->deserializer(MyDeserializer->new)
  ->serializer(MySerializer->new);

The last thing to mention in this section is that a full XML parser isn't the only way to process XML messages. Because Perl is so good at dealing with text, and XML messages are all text, it's possible to implement regexp-based parsing to extract the parameters and other information required to execute the method call. Although the logic that generates the regular expression can be quite complex, it's possible to generate the pattern once, store it, and match incoming requests against the stored patterns. While this solution has limited scope, it can be used for simple requests (with regular processing as the fallback for all other requests), and preliminary tests show that this approach can reduce the response time by a factor of 10.
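
The idea can be sketched as follows, under stated assumptions: the pattern is written by hand rather than generated, it recognizes only calls with a single scalar parameter, and the names $SIMPLE_CALL and parse_simple_call are hypothetical. It isn't SOAP::Lite's implementation, only an illustration of matching requests against a stored pattern and falling back when the match fails.

```perl
use strict;
use warnings;

# Compiled once and reused for every request. Captures:
#   1 = method name, 2 = parameter name, 3 = parameter value.
my $SIMPLE_CALL = qr{
    <(?:[\w.-]+:)?Body\b[^>]*> \s*
      <(?:[\w.-]+:)?(\w+)\b[^>]*> \s*       # method element
        <(\w+)\b[^>]*> ([^<]*) </\2> \s*    # one scalar parameter
      </(?:[\w.-]+:)?\1> \s*
    </(?:[\w.-]+:)?Body>
}x;

# Return a hash reference describing the call, or undef when the request
# is not simple enough for the fast path. The value is returned as-is,
# without unescaping XML entities.
sub parse_simple_call {
    my ($request) = @_;
    return unless $request =~ $SIMPLE_CALL;
    return { method => $1, param => $2, value => $3 };
}

# Hypothetical request, used only to exercise the pattern.
my $request = <<'XML';
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <m:hello xmlns:m="urn:Demo">
      <name>World</name>
    </m:hello>
  </soap:Body>
</soap:Envelope>
XML

if (my $call = parse_simple_call($request)) {
    printf "fast path: %s(%s => %s)\n", @{$call}{qw(method param value)};
}
else {
    print "falling back to regular XML processing\n";
}
```

Anything the pattern doesn't recognize, such as a call with several parameters or nested data, returns undef and is handed to the regular deserializer, so the fast path never has to be complete.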



Programming Web Services with Perl
ISBN: 0596002068
