Now that you know what's to blame and where it is, you can concentrate on patching your code to improve performance.
Mercifully, there is one thing of which you can be certain: You won't be the only PHP developer who's ever encountered bottlenecks like yours. As a result, there are some tried and tested techniques you can follow to try to patch them.
Earlier, you saw how to identify which queries were holding up execution of your script. But why a query is running badly could be a result of any number of factors. Optimizing SQL statements is a massive topic, and there simply isn't the space to go into much detail here. There are numerous excellent volumes on PostgreSQL, MySQL, and other databases that deal with query optimization. However, the following headline points are worth bearing in mind:
Numeric primary keys will always make for a faster primary index than alphanumeric ones, even when a unique alphanumeric column would ordinarily be an obvious candidate for the primary key. In general, a numeric "id" column is what is required.
Ensure that columns and, perhaps more important, combinations of columns by which you will filter or order in SELECT statements have corresponding indices defined.
Scheduled scripts to clean up tables (using, for example, the VACUUM statement in PostgreSQL) whose contents are updated frequently may well prove to be sound investments.
Never use SELECT *. If you know the names of the columns you want, specify them. This will improve performance.
Try to minimize the number of queries you are performing to achieve your desired result. Selecting a string of matching id values and then using a brace of queries to retrieve their contents may fit better with your object model, but it is very inefficient. Is there a way to find a middle ground? In Chapter 7, for example, you met the GenericObjectCollection class, which let you do just that: adopt an OOP-compliant approach to development while minimizing the number of SQL queries necessary to effect that approach.
Subselects are easier to put together than INNER or OUTER joins, but nowhere near as efficient. Avoid them if possible (this will also, as a side effect, increase RDBMS portability). If you suspect that subselects may be slowing your application down, try replacing them with either INNER or OUTER joins as appropriate. Consult the documentation of your preferred flavor of database for more information.
Most tips such as the preceding increase speed by only fractions of a second. But this really is a case of every little bit helping, especially when you are dealing with pages with a large number of queries.
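Several of these tips can be seen in one short sketch. The following is illustrative only: it uses an in-memory SQLite database via PDO so that it is self-contained, and the "users" table, its columns, and the sample rows are all hypothetical. The point is the shape of the final query: a numeric primary key, named columns rather than SELECT *, and a single IN () query in place of one query per id.

```php
<?php
// Self-contained sketch using an in-memory SQLite database; the
// "users" table and its contents are purely illustrative.
$pdo = new PDO('sqlite::memory:');
$pdo->exec('CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT)');
$pdo->exec("INSERT INTO users VALUES (3,'Ann','ann@example.com'),
                                     (7,'Bob','bob@example.com'),
                                     (42,'Cho','cho@example.com')");

// Instead of running one SELECT * per id in a loop, name the columns
// you actually need and fetch every matching row in a single query.
$ids = [3, 42];
$placeholders = implode(',', array_fill(0, count($ids), '?'));
$stmt = $pdo->prepare("SELECT id, name FROM users WHERE id IN ($placeholders)");
$stmt->execute($ids);
$users = $stmt->fetchAll(PDO::FETCH_ASSOC);
// $users now holds both rows after one round trip to the database.
```

One round trip with a bound IN () list scales far better than a loop of single-row queries, and naming the columns keeps the result set as small as possible.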
Don't forget that you will almost certainly need to look outside the PHP script itself to track down the offending SQL query, particularly if you are following the best practice principles as set out in this book. If you have devised an object model that is heavily dependent on a database backend, you may find that performance tweaks in queries used in that model have unexpected performance gains elsewhere. By the same token, however, when changes are being made to such high-level classes, care should be taken to thoroughly test all components that depend on them to ensure that functionality and behavior remain consistent with expectations.
Ultimately, the same principles as those for optimizing database queries apply here. It's not good enough to know simply which block of code is causing a holdup; you need to work out which PHP statement is slow, and under what circumstances.
However, it is sometimes the case that simple errors in application logic are to blame. In such cases, the application does function as expected, so ordinary QA techniques have not registered a bug. However, some human error in the code has caused performance to be adversely affected. These are the easiest to fix.
Consider the following example:
switch ($i) {
    case 0:
        array_pop($arMediumArray);
        break;
    case 1:
        array_reverse($arAnotherHugeArray);
    case 2:
        $arHugeArray = array_unique($arHugeArray);
        break;
}
As you can probably spot, the break statement is missing from the case where $i is equal to 1. This means that the case where $i is equal to 2 will also be executed when $i is equal to 1. In that case, all that is necessary is for $arAnotherHugeArray to be reversed; but, in fact, $arHugeArray is also being made unique, as a side effect. It is quite possible that $arHugeArray will never be used again after this switch statement when $i equals 1, so the erroneous call may not cause a malfunction per se. What it likely does, however, is double the execution time of the block. Adding the break statement stops the redundant call and speeds execution as a result.
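With the break restored, the fall-through disappears. The sketch below adds some illustrative data so that it is self-contained; the variable names follow the original example. (Note, as an aside, that array_reverse() returns a new array rather than modifying its argument in place, so its result must be assigned to take effect.)

```php
<?php
// Illustrative data; variable names follow the original example.
$i = 1;
$arMediumArray      = range(1, 10);
$arAnotherHugeArray = range(1, 1000);
$arHugeArray        = range(1, 1000);

switch ($i) {
    case 0:
        array_pop($arMediumArray);
        break;
    case 1:
        // array_reverse() returns a new array rather than reversing
        // its argument in place, so the result must be assigned.
        $arAnotherHugeArray = array_reverse($arAnotherHugeArray);
        break; // the break the original omitted
    case 2:
        $arHugeArray = array_unique($arHugeArray);
        break;
}
// With $i equal to 1, array_unique() is no longer called on $arHugeArray.
```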
The methods alluded to previously should still allow you to track down such problems. By printing the execution time for the entire block, and the execution time for the statements you think should be getting executed, you will often see a discrepancy between the two. Although this discrepancy may be the result of any number of problems, it is entirely possible that a logic error is to blame. Accordingly, it is worthwhile checking for such problems, particularly when using error-prone constructs such as the switch statement, before getting too in-depth with query optimization and other more intense performance tuning.
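The timing comparison just described can be as simple as a pair of microtime() calls. In this sketch, someSlowStep() and someOtherStep() are hypothetical stand-ins for your own code; usleep() merely simulates work.

```php
<?php
// Wrap both the whole suspect block and the one statement you believe
// dominates it, then compare the two figures. The two functions are
// stand-ins for real application code; usleep() simulates work.
function someSlowStep(): void  { usleep(50000); }  // ~50 ms
function someOtherStep(): void { usleep(10000); }  // ~10 ms

$blockStart = microtime(true);

$stepStart = microtime(true);
someSlowStep();
$stepTime = microtime(true) - $stepStart;

someOtherStep();

$blockTime = microtime(true) - $blockStart;

// If $blockTime greatly exceeds $stepTime, something you did not expect
// to run (a fallen-through case, say) is consuming the difference.
printf("block: %.4fs, step: %.4fs\n", $blockTime, $stepTime);
```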
At all other times, judicious use of the techniques met earlier should allow you to narrow your bottleneck down to a single offender, or at least, a collection of offenders.
The most common kinds of built-in PHP functionality that can be very slow, for one reason or another, include the following:
Any method that relies on an external source for data. We have looked at databases as a bottleneck separately, but consider, for example, disk access to a busy disk or network drive, HTTP and FTP requests, IP address resolution, and communication with other objects via Web services protocols such as SOAP and RPC.
Anything memory-intensive may cause no problems when called in isolation, but when it is fighting for physical memory with other instances of the script, and forced to use virtual memory (that is, disk) as a result, it can grind to a halt. Typically, anything to do with the built-in graphics library GD is likely to be an offender, especially when it has to work with large images. Consider whether implementing caching techniques, or performing offline generation using scheduled tasks, is feasible.
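A simple caching approach for GD work might look like the following sketch. The thumbnail() function, paths, and dimensions are all hypothetical; the point is that the expensive resampling runs only when no cached copy exists.

```php
<?php
// Caching sketch, assuming the GD extension is available. The function
// name, paths, and dimensions are illustrative, not prescriptive.
function thumbnail(string $src, string $cacheDir, int $w = 100): string
{
    $cached = $cacheDir . '/' . md5($src) . '.jpg';
    if (file_exists($cached)) {
        return $cached;                 // cheap path: no GD work at all
    }
    [$origW, $origH] = getimagesize($src);
    $h     = (int) ($origH * $w / $origW);
    $thumb = imagecreatetruecolor($w, $h);
    $full  = imagecreatefromjpeg($src);
    imagecopyresampled($thumb, $full, 0, 0, 0, 0, $w, $h, $origW, $origH);
    imagejpeg($thumb, $cached);         // expensive step runs only once
    imagedestroy($thumb);
    imagedestroy($full);
    return $cached;
}
```

Every request after the first is a single file_exists() check and a read from disk, rather than a full decode-resample-encode cycle.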
Anything involving sockets is heavily dependent on network performance. Here, too, it is better to carry out socket conversations offline, advising the user of his or her status when a request is made, than to initiate a socket conversation as a direct consequence of the request. Consider, for example, maintaining an e-mail queue database table, which is processed every five minutes, and inserting into that queue from within your script instead of sending mail directly from within the script.
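The queue half of that arrangement might be sketched as follows. An in-memory SQLite table keeps the example self-contained; the mail_queue schema and queueMail() function are hypothetical.

```php
<?php
// Queue sketch: the page script only inserts a row, so no SMTP socket
// is opened during the request. Schema and names are illustrative.
$pdo = new PDO('sqlite::memory:');
$pdo->exec('CREATE TABLE mail_queue (
    id        INTEGER PRIMARY KEY,
    recipient TEXT,
    subject   TEXT,
    body      TEXT,
    queued_at TEXT
)');

function queueMail(PDO $pdo, string $to, string $subject, string $body): void
{
    $stmt = $pdo->prepare(
        "INSERT INTO mail_queue (recipient, subject, body, queued_at)
         VALUES (?, ?, ?, datetime('now'))"
    );
    $stmt->execute([$to, $subject, $body]);
}

// In the page script: fast, no network conversation involved.
queueMail($pdo, 'user@example.com', 'Welcome', 'Hello!');

// A scheduled script run every five minutes then drains the table and
// performs the actual (slow) SMTP conversation offline.
```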
Exercise extreme caution when using shared network drives (over NFS in particular) as sources for large binary files. For example, a community Web site may allow users to upload a JPEG photograph of themselves for use in their profiles. In a multiple-server environment, this JPEG needs to be accessible by all Web servers. It makes sense to use disk rather than the database (databases are still awful at storing binary data), but shared network disk protocols are notoriously slow. Consider, instead, replicating binary data across each server using rsync or similar tools. A sync script might run as a background process every five minutes, redistributing data between servers, so a local copy is always available within five minutes of the initial upload.
There are many more examples of methods that can cause bottlenecks. The PHP online documentation often refers to potential performance problems in its entries for particular methods and classes.
When you're relying on PEAR classes, it is also worth checking that your server is installed with the latest supported versions. As with all distributed components, bugs do crop up, and it's worth ensuring that you have the latest versions to rule out any performance problems that might arise from those bugs.
Any changes you make to your code or SQL queries as a result of the discoveries discussed previously need to be tested thoroughly. Apply the same basic first principles of QA as those discussed in Chapter 23, "Quality Assurance," when testing the scripts and components you have changed, just as you would when testing the application as a whole.
This includes testing not just the input parameters and usage scenarios you used while making your changes, but all parameters and scenarios on which your changes could conceivably have an impact. Err on the side of overtesting rather than undertesting.
If you have made changes on a development environment rather than a live environment, try to snatch a copy of your live database or data store and test locally. If this is not possible, try to set up a virtual host in Apache (or your preferred Web server) to use as a temporary staging environment, using your new code base against the live database but running distinctly from the live environment.
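A staging virtual host of that kind might look like the fragment below. The hostname and paths are illustrative only; the staging code base would point at the live database through its own configuration, leaving the live virtual host untouched.

```apache
# Illustrative staging virtual host: new code base, live database,
# separate hostname so it runs alongside, but distinct from, the
# live site.
<VirtualHost *:80>
    ServerName   staging.example.com
    DocumentRoot /var/www/staging/htdocs
</VirtualHost>
```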
Either way, be sure that when you go live, you are confident not only that your changes will have the desired effect in improving performance but also that they will not adversely affect functionality in any way.