The first time you'll hear about an application running slowly is probably from a user report. Unfortunately, users are rarely particularly useful in this respect; the cry of "this thing is so slow" across the office tells you virtually nothing. It's important, therefore, to do a little digging.
Let's go back to basics for a minute and look at the structure of an HTTP GET or POST request. When the user's Web browser makes a request, it makes a socket connection to the Web server, usually on port 80 or, for SSL connections, 443. This is a blocking activity in the Web browser, so called because the browser cannot do anything else until the connection is successfully made. In practice, however, most modern Web browsers will allow the user to cancel while the connection attempt is being made, and on faster connections the connection is made in literally fractions of a second. But if the server is heavily loaded the connection may take a while to establish. If this is the case, it points to other applications or processes on that server slowing things down (not necessarily the application in question).
After the connection is established, the Web browser does not wait for any response, because the HTTP protocol does not dictate that there should be one. It immediately sends a small request, usually no more than a few hundred bytes in size. This request contains, among other data, the document that the Web browser requires and any GET or POST parameters the user has supplied as part of the request.
This request is, in itself, small. The time between the socket connection's being established and the request's being sent to the Web server is likely to be minimal.
The time between when the Web server has received the request and when it starts to return data is known as the processing time for your script. In most cases, PHP will not attempt to send any output to the Web browser until the entire script has finished executing, unless the volume of your output exceeds the value of output_buffering in php.ini. This means that the processing time is roughly equal to the time between when PHP starts executing your script and the time it finishes executing. This is the most likely place for a delay.
The time between when data starts to be returned to the Web browser and when that data is finished transferring is the delivery time and is not likely to be related to PHP in any way. The time for this data to be transferred is much more likely to be tied to network performance, either at the server side (for example, an overloaded connection) or client side (a horrifyingly slow modem). Unless your page weight exceeds 55K, which is generally regarded as the limit for sensible Web pages, this is unlikely to be a cause of delays.
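If you would rather have the script report its own processing time than infer it from outside, a minimal sketch along the following lines may help. It assumes PHP 7 or later and relies on $_SERVER['REQUEST_TIME_FLOAT'], which PHP sets when it starts handling the request. The function name elapsedSinceRequestStart and the log label are our own inventions for illustration, not part of PHP or of the application discussed here.

```php
<?php
// Hypothetical helper: seconds elapsed since PHP began handling this request.
function elapsedSinceRequestStart(): float
{
    // REQUEST_TIME_FLOAT is set by PHP at request start; fall back to "now" if absent.
    $start = (float) ($_SERVER['REQUEST_TIME_FLOAT'] ?? microtime(true));
    return microtime(true) - $start;
}

// Log the total once the script has finished, whichever path it took to get there.
register_shutdown_function(function (): void {
    error_log(sprintf('DEBUG: PROCESSING TIME: %.6f', elapsedSinceRequestStart()));
});
```

Included at the top of a script, this writes one line to the error log per request. The figure should be close to the processing time described above, although it excludes any time the Web server spent before handing the request to PHP.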
The easiest way to identify where a performance bottleneck is occurring is to use some manual tool, rather than a bona fide Web browser, to make the HTTP request, and analyze the results yourself.
Suppose you want to analyze where poor performance might be occurring in requests for the page /example.php, on the server www.example.com, with GET parameter foo equaling bar. This is, of course, equivalent to http://www.example.com/example.php?foo=bar. Start a console session and use telnet as follows:
    ed@genesis:~$ telnet www.example.com 80
    Trying 192.168.1.2...
    Connected to www.example.com
    Escape character is '^]'.
    GET /example.php?foo=bar HTTP/1.1
    Host: www.example.com

    <HTML>
    <BODY>
    Hello, World!
    </BODY>
    </HTML>
In Windows, the same approach applies: start a command prompt and use telnet in exactly the same manner.
To get real-life output, you will need to substitute a real hostname and URL for the examples shown. You need to press Enter where you see a blank line, and enter spaces exactly as above. You may find it easier to write it all out in Notepad and paste it into telnet.
Have a stopwatch handy when you do this. Observe where the delay lies, and infer as follows:
A delay between pressing Enter and observing Trying 192.168.1.2 indicates a delay in resolving the IP address of the server against a name server. This is unusual. It could indicate unresponsive name servers, either those you are using yourself (typically those of your own ISP) or those serving the domain of the server in question (typically the ISP hosting the server). Why this unresponsiveness exists is outside the scope of this book, but you can at least reassure yourself PHP is not to blame. To a real-world user, this delay would be experienced only once, when first accessing the site, because most Web browsers (and, indeed, operating systems) cache the results of name server lookups.
A delay between the Trying . . . line and the Connected line indicates that the server itself took a while to respond to your request to connect. A delay here is massively damaging, because a page with several images could easily consist of twenty or thirty HTTP requests. If each one has a delay attached, the page could appear dramatically sluggish even if script execution time is markedly quick. Unfortunately, the delay could exist in one of two places: either in the network to and from the server or in the server's ability to respond to connections in a timely manner. The latter of these could be caused by server load, which could be caused by poorly optimized PHP (not necessarily this script) or other processes bringing the server to its knees. A quick check of memory and CPU utilization on the server can reveal the truth here. If it's the former, then it's outside the scope of this book; if it's the latter, then you should try to track down which script is causing the problem and, if it's PHP, apply the same methods seen here to that script.
A delay between pressing Enter twice after having entered your HTTP request and seeing the HTML of your response almost certainly indicates poor performance at processing time. This can be validated and verified by adding watches in code.
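If a stopwatch feels too crude, the same observations can be approximated in code. The following is only a sketch, assuming PHP 7 or later and a plain HTTP server on port 80; the function names timeHttpRequest and summarizePhases are hypothetical. It times the name lookup, the connection, the wait for the first byte of the response, and the remainder of the transfer. Note that the "processing" figure also includes the time taken to send the request and one network round trip, so treat it as an upper bound.

```php
<?php
// Hypothetical helper: turn five timestamps into the four delays discussed above.
function summarizePhases(float $start, float $resolved, float $connected, float $firstByte, float $done): array
{
    return [
        'dns'        => $resolved - $start,      // before "Trying ..."
        'connect'    => $connected - $resolved,  // "Trying ..." to "Connected"
        'processing' => $firstByte - $connected, // request sent to first byte back
        'delivery'   => $done - $firstByte,      // first byte to end of response
    ];
}

// Hypothetical helper: make one GET request and time each phase.
function timeHttpRequest(string $host, string $path, int $port = 80): array
{
    $start = microtime(true);
    $ip = gethostbyname($host); // returns $host unchanged if the lookup fails
    $resolved = microtime(true);

    $socket = @fsockopen($ip, $port, $errno, $errstr, 10.0);
    if ($socket === false) {
        throw new RuntimeException("Connection failed: $errstr ($errno)");
    }
    $connected = microtime(true);

    // "Connection: close" asks the server to close the socket when it is done,
    // so that feof() below eventually becomes true.
    fwrite($socket, "GET $path HTTP/1.1\r\nHost: $host\r\nConnection: close\r\n\r\n");
    fread($socket, 1); // blocks until the server starts to respond
    $firstByte = microtime(true);

    while (!feof($socket)) {
        fread($socket, 8192); // discard the rest of the response
    }
    fclose($socket);
    $done = microtime(true);

    return summarizePhases($start, $resolved, $connected, $firstByte, $done);
}
```

Calling print_r(timeHttpRequest('www.example.com', '/example.php?foo=bar')) from the command line would print the four figures in seconds; substitute your own hostname and path as before.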
Assuming that the delay appears to be down to script processing time, you can now determine what in the script is causing the delay (or delays).
Forgetting the wider edicts on good PHP architecture for just a moment, we must remember that PHP is a scripting language. Its scripts have a start, a middle, and an end, just like any scripting language. The truth of the matter is that at the most basic, simplified level, the pattern of execution of any PHP script looks something like this:
Read input parameters, be they GET, POST, or COOKIE
Use those input parameters to make decisions, consulting external data sources in the process
Send the resulting output back to the Web browser
The first and last of these three steps are not likely to cause performance bottlenecks because they are integral to PHP and your Web server. In other words, it's not your code. Only the real processing of data could be causing delays.
These delays will boil down to one of three things:
Poor algorithms: inefficient code resulting in high execution time
Poor hardware: the code's not unreasonable, but the hardware it's running on is elderly or overworked
External bottlenecks: for example, a database causing holdups
We should point out that hardware is rarely the problem. Except in extreme cases such as intense usage of GD for graphics rendering, PHP does not require the latest or greatest in server hardware.
There are straightforward steps to resolving each different variety of identifiable bottleneck, as we discuss in the remainder of this appendix.
However, the first step is to determine where the bottleneck can be found and which of the three categories described previously it falls into.
The easiest offenders to track down are slow database queries. Almost all enterprise-class PHP applications depend on a database of some kind. In this book we have heavily advocated PostgreSQL, but many applications, particularly those you will encounter that are written by third parties, will make use of the more popular MySQL.
You must eliminate database bottlenecks before attempting to optimize code. Because most code is dependent on the database in some respect, even if only for the supply of data, to look at code before its data source is optimized is not time well spent.
The simplest way to determine whether database bottlenecks exist in a script is to temporarily adapt your database abstraction layer (first described in Chapter 8) to add a timer method.
Consider the query method of your abstraction layer for PostgreSQL. By adding error log stamps before and after each query, you can show how long each query took and easily produce an analyzable log for your page. The following excerpt shows an example, demonstrating how you might measure the time taken to perform a query in the database abstraction layer by sandwiching the execute query statement as follows:
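    // microtime(true) returns a float; the default string form cannot be subtracted reliably
    $fltTimeStart = microtime(true);
    // pg_query() is the current name for the older pg_exec()
    $q_handle = pg_query($this->link_ident, $sql);
    $fltTimeTaken = microtime(true) - $fltTimeStart;
    error_log("DEBUG: QUERY: $sql");
    error_log(sprintf("DEBUG: TIME TAKEN: %.6f", $fltTimeTaken));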
This will yield error log output similar to the following:
    [Sun May 16 22:10:19 2004] [error] DEBUG: QUERY: SELECT id,logged_in,user_id FROM "user_sessions" WHERE session_id='98ce552be0a2ea6b6f69fbebcd14997c' AND user_agent='Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)' AND ip_address='192.168.4.3'
    [Sun May 16 22:10:19 2004] [error] DEBUG: TIME TAKEN: 0.003752
Executing a typical PHP script with your database abstraction layer doctored as shown in the preceding example will then yield a series of DEBUG: QUERY xx and DEBUG: TIME TAKEN statements as above.
The pattern of queries will be easier to analyze if you can ensure that no other warnings, error messages, or code-initiated debug statements are writing to the error log, so that the timing of your queries has the log to itself for now.
You should easily be able to spot the slow performers. In typical setups, anything lasting more than half a second should raise a red flag. Also note the sum total of the duration of queries for a page. Anything totaling more than three seconds is considered incredibly bad form and will drive your user nuts.
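For pages that issue many queries, scanning the log by eye becomes tedious. As a rough sketch, assuming PHP 7 or later and log lines in the DEBUG: QUERY / DEBUG: TIME TAKEN format shown earlier, a small function can apply the two rules of thumb for you. The name analyzeQueryLog, its parameters, and the keys of the array it returns are hypothetical.

```php
<?php
// Hypothetical helper: flag slow queries and an excessive page total
// from error log lines in the DEBUG format shown earlier.
function analyzeQueryLog(array $lines, float $slowThreshold = 0.5, float $totalThreshold = 3.0): array
{
    $slow = [];
    $total = 0.0;
    $lastQuery = null;

    foreach ($lines as $line) {
        if (preg_match('/DEBUG: QUERY: (.*)$/', $line, $m)) {
            // Remember the SQL until its matching TIME TAKEN line arrives.
            $lastQuery = trim($m[1]);
        } elseif (preg_match('/DEBUG: TIME TAKEN: ([0-9.]+(?:[eE][-+]?[0-9]+)?)/', $line, $m)) {
            $seconds = (float) $m[1];
            $total += $seconds;
            if ($seconds > $slowThreshold) {
                $slow[] = ['sql' => $lastQuery, 'seconds' => $seconds];
            }
            $lastQuery = null;
        }
    }

    return [
        'slow'       => $slow,                    // individual queries over the threshold
        'total'      => $total,                   // sum of all query times seen
        'overBudget' => $total > $totalThreshold, // true if the page total is too high
    ];
}
```

You might feed it the relevant portion of your error log, for example with file($path, FILE_IGNORE_NEW_LINES), where $path is wherever your Web server writes its error log. The half-second and three-second defaults simply mirror the guidelines above and can be tightened to suit.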
After any database bottlenecks have been addressed, you may safely look at the code itself. PHP is very fast, so algorithmic holdups are certainly less likely to be a factor than is poor database performance.
However, by seeding your script with time stamp output similar to that used in tracking down database problems, you can determine where bottlenecks might lie.
You cannot do this in one fell swoop, for obvious reasons. The burden is on you to place "start the clock" and "stop the clock" statements on either side of blocks of code you feel might be troublesome. If you do find one, you can always drill down and add more start/stop statements within that block of code to find the precise culprit. Don't be afraid to output variable values in your error_log statement, too, to give a clue as to progress in for loops, for example.
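One way to keep such start and stop statements tidy is to wrap them in a small class. What follows is a minimal sketch, assuming PHP 7.4 or later; the class name BlockTimer, its methods, and the log format are all hypothetical. The optional context array is there to carry variable values, such as a loop counter, into the log.

```php
<?php
// Hypothetical helper: labelled "start the clock" / "stop the clock" statements.
final class BlockTimer
{
    /** @var array<string, float> start times keyed by label */
    private array $started = [];

    public function start(string $label): void
    {
        $this->started[$label] = microtime(true);
    }

    // Stops the named clock, logs the elapsed time, and returns it in seconds.
    public function stop(string $label, array $context = []): float
    {
        if (!isset($this->started[$label])) {
            throw new LogicException("Timer '$label' was never started");
        }
        $elapsed = microtime(true) - $this->started[$label];
        unset($this->started[$label]);

        error_log(sprintf(
            'DEBUG: BLOCK: %s TIME TAKEN: %.6f CONTEXT: %s',
            $label,
            $elapsed,
            json_encode($context)
        ));
        return $elapsed;
    }
}
```

In use, you would call $timer->start('build_report') before a suspect block and $timer->stop('build_report', ['rows' => $rowCount]) after it, where the label and the context keys are whatever helps you read the log. Because each clock is keyed by its label, timers can be nested when you drill down into a block.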
This approach can be automated to some extent by using a package such as APD, the Advanced PHP Debugger (http://pecl.php.net/package/apd), though for smaller applications, and those built in a "best practice" modular fashion, its immense functionality may well prove to be overkill.
Progress between steps should be nearly instantaneous in PHP, as long as an external data source is not involved. Delays of more than fractions of a second between logical blocks point to serious problems in your code.