We will now look at how to make PHP work with the Unicode characters we use most often.
Installation and Configuration of mbstring and mbregex
As we mention later in Appendix A, "Installation/Configuration," you will need to configure PHP as you compile or install it to enable the multi-byte string and regular expression functions. For fresh builds of PHP that you compile yourself, make sure you pass the options that enable mbstring and mbregex when you run the configuration program.
For PHP installations on machines running Microsoft Windows (where you will often not compile PHP5 yourself), you enable mbstring functionality by editing the php.ini file, which is typically in the Windows root directory (C:\Windows) or in the directory into which PHP was installed. Make sure that the extension entry for php_mbstring.dll is uncommented by verifying that there is no semicolon (;) at the beginning of its line.
You will also need to make sure that PHP searches the directory containing the mbstring extension dynamic link library (php_mbstring.dll) by setting the extension_dir configuration option in the same php.ini file:
extension_dir = "c:\php\ext\"
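After restarting the web server, it can help to confirm from PHP itself that the extension was picked up. The following is a minimal sketch; mbstring_status is a hypothetical helper name introduced here for illustration, while extension_loaded, function_exists, and ini_get are standard PHP functions.

```php
<?php
// Hypothetical helper (not part of PHP): reports which multi-byte
// features the running PHP build exposes.
function mbstring_status()
{
    return array(
        // True when the mbstring extension was compiled in or loaded.
        'mbstring'      => extension_loaded('mbstring'),
        // mb_ereg() is only defined when multi-byte regex support is present.
        'mbregex'       => function_exists('mb_ereg'),
        // The directory PHP searches for loadable extensions.
        'extension_dir' => ini_get('extension_dir'),
    );
}

foreach (mbstring_status() as $name => $value) {
    echo $name, ': ', var_export($value, true), "\n";
}
```

If the mbstring entry reports false on Windows, the extension_dir value printed here is the first thing to compare against the actual location of the DLL.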
Once the extension is enabled, we turn to configuring it, which works the same way under the Unix and Windows versions of PHP5. We do this by setting a number of options in the [mbstring] section of php.ini, as shown in Table 6-1.
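To see which values PHP actually picked up from php.ini, the settings can be read back at run time. This is only a sketch: mbstring_settings is a hypothetical helper name, and it assumes a PHP version in which ini_get_all accepts the second (details) argument.

```php
<?php
// Hypothetical helper: returns the current value of every mbstring.*
// option, or an empty array when the extension is not loaded.
function mbstring_settings()
{
    if (!extension_loaded('mbstring')) {
        return array();
    }
    // Passing false as the second argument returns plain values
    // instead of per-option detail arrays.
    return ini_get_all('mbstring', false);
}

foreach (mbstring_settings() as $option => $value) {
    echo $option, ' = ', var_export($value, true), "\n";
}
```

Running this before and after editing php.ini is a quick way to check that the file being edited is the one PHP reads.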
mbstring becomes even more useful through its ability to overload a group of functions that are not normally multi-byte safe, replacing them with implementations that are. There are three groups of functions available for overloading: the mail function, the string functions, and the regular expression functions.
The three groups of functions are represented by the bit flags 1, 2, and 4, respectively; the value of 7 that we use for mbstring.func_overload in php.ini is the bitwise OR of these three values.
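The arithmetic behind the setting can be sketched in a few lines. The constant names and the overloaded_groups helper below are hypothetical and exist only for illustration; the mapping of 1 to mail, 2 to the string functions, and 4 to the regular expression functions follows the PHP manual's description of mbstring.func_overload.

```php
<?php
// Illustrative constants (hypothetical names, not defined by PHP).
define('OVERLOAD_MAIL', 1);    // mail()
define('OVERLOAD_STRING', 2);  // strlen(), strpos(), substr(), ...
define('OVERLOAD_REGEX', 4);   // ereg(), eregi(), split(), ...

// Hypothetical helper: lists the groups switched on in a given
// func_overload value by testing each bit in turn.
function overloaded_groups($setting)
{
    $groups = array();
    if ($setting & OVERLOAD_MAIL) {
        $groups[] = 'mail';
    }
    if ($setting & OVERLOAD_STRING) {
        $groups[] = 'string';
    }
    if ($setting & OVERLOAD_REGEX) {
        $groups[] = 'regex';
    }
    return $groups;
}

// Combining all three flags gives the value used in php.ini.
$all = OVERLOAD_MAIL | OVERLOAD_STRING | OVERLOAD_REGEX;
echo $all, "\n";                                 // prints 7
echo implode(', ', overloaded_groups(6)), "\n";  // prints "string, regex"
```

Because each group occupies its own bit, any subset can be chosen: a value of 6, for example, overloads the string and regular expression functions but leaves mail alone.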
If you do not wish to use function overloading, every overloaded function also has a multi-byte version that you can call directly. In most cases its name is that of the non-multi-byte function with mb_ prefixed to it (mb_strpos, mb_strlen, mb_eregi, and so on); the multi-byte counterpart of mail is named mb_send_mail.
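The difference between the two families is easiest to see side by side. The sketch below assumes that the mbstring extension is loaded and that the string functions are not being overloaded; the sample word and variable names are illustrative only.

```php
<?php
// "héllo" written with explicit UTF-8 bytes, so the example does not
// depend on the encoding in which this source file is saved.
$word = "h\xC3\xA9llo";

// The plain functions work in bytes (assuming they are not overloaded).
$byteLength = strlen($word);
$bytePos    = strpos($word, 'l');

// The mb_ versions work in characters. Passing the encoding explicitly
// avoids relying on the php.ini settings.
$charLength = mb_strlen($word, 'UTF-8');
$charPos    = mb_strpos($word, 'l', 0, 'UTF-8');

echo "strlen: $byteLength, mb_strlen: $charLength\n";  // strlen: 6, mb_strlen: 5
echo "strpos: $bytePos, mb_strpos: $charPos\n";        // strpos: 3, mb_strpos: 2
```

The é occupies two bytes in UTF-8, which is why the byte-oriented results are each one higher than the character-oriented ones.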