Report Generation Architecture

You've now learned the basics of reporting in PHP against a typical database. How do you put this theory into practice, however, and build a reporting platform for your application?

You may be tempted to follow an approach similar to the following:

A page called salesreport.php renders a form requesting input criteria.
When submitted, this page performs an HTTP POST to salesreport-results.php, which performs the SQL queries necessary to retrieve the report results, and renders the data on screen.

Quick, yes, but dirty, too; there are a number of drawbacks to this approach:

It's not a good example of using the MVC design pattern that, for reasons discussed in Chapter 13, is to be encouraged and employed wherever possible.
There is little scope for code reuse (except through the copy-and-paste approach) when developing different reports in the future.
If the user wants to view the report's output again at some point in the future, the report has to be generated once more. This results in not only a waste of processing time but also the possibility of different results. What if the database has changed since then? A report is a snapshot in time; users will expect it to be the same next time they view it.
Users may be kept waiting for a long time while the report generates. On more complex reports, this could be up to a minute or more; what if the request times out at their end, that is, in their Web browser? What if they click the Stop button? It's generally accepted that a Web page should load pretty quickly to keep the user happy, and this includes a report that's been generated.

The last of these drawbacks is potentially the most serious, of course. Our book sales example won't take more than a few seconds on even the slowest of database servers, but many reports run to considerably more complex output than that and can even comprise hundreds, if not thousands, of queries.

As a rule, and given the nature of HTTP as a stateless protocol, if the rendering of a subsequent page on your application is dependent on some external, potentially slow process, you should attempt to take that process offline.

To cite another example, consider how many e-commerce sites ask their users for credit card numbers to make online purchases. It's a fair number, isn't it? Consider, however, how many of those sport almost comical messages declaring "don't click the Submit button more than once" or something similar. Furthermore, even the most credit-worthy of you will have been made to wait as many as forty seconds to have your card authorized. If your request times out, how do you know whether you've been charged?

The approach adopted by Amazon is very different, and certainly far more highly recommended. Why do you need to authorize there and then? Why not do it offline and let the user know in due course whether the purchase went through successfully?

Of course, since Amazon's delivery is physical rather than electronic, it can afford to notify its user of success or otherwise by e-mail some minutes, or even hours, later. But the same principle can be applied to real-time scenarios, too, such as when allowing access to online content requiring credit card payment.

Consider the traditional approach to authorizing credit cards:

First page requests card number in HTML form; offers data via SSL HTTP POST.
Second page accepts HTTP POST from first page, authorizes card in real time, and returns result to user thirty seconds later.

Compare this to a more intelligent approach, which is an extension of that adopted by Amazon and provides pseudo real-time processing:

First page requests card number in HTML form; offers data via SSL HTTP POST.
Second page accepts HTTP POST from first page, puts card data into to-be-processed database, and returns user to authorization-pending page a split second later.
The authorization-pending page refreshes on the client side every couple of seconds until consultation with the database shows that the card transaction has been processed and a result has been determined.
An entirely separate process on the server that runs independently of the Web servers periodically (or constantly) checks the database for pending card transactions and, if it finds them, processes them, updating the database with the result when finished.

The benefits are obvious:

There is virtually no scope for the user to click the Stop button, because there is virtually no delay in returning the page to him/her.
If the user clicks the Submit button more than once, it's no big deal because a UNIQUE key on the database can prevent duplicate entries.
If the user clicks Refresh on the authorization-pending page, there's no problem because it isn't changing any data; it simply refreshes.
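
The queue-and-poll flow described above can be sketched roughly as follows. This is a minimal illustration rather than a definitive implementation: an in-memory SQLite database stands in for the application's real database, and the table name card_transaction, its columns, and the function names are all hypothetical.

```php
<?php
// Sketch only: SQLite in memory stands in for the real database server.
// The card_transaction table and every function name here are hypothetical.

function createQueue(PDO $db): void
{
    // The UNIQUE key means a repeated Submit cannot create a second row.
    $db->exec(
        "CREATE TABLE card_transaction (
            order_ref TEXT NOT NULL UNIQUE,
            status    TEXT NOT NULL DEFAULT 'PENDING'
        )"
    );
}

// Second page: queue the request and return at once.
// Returns false when the same order has already been queued.
function queueTransaction(PDO $db, string $orderRef): bool
{
    try {
        $stmt = $db->prepare('INSERT INTO card_transaction (order_ref) VALUES (?)');
        $stmt->execute([$orderRef]);
        return true;
    } catch (PDOException $e) {
        // Assumed here to be the UNIQUE violation; real code would check the error code.
        return false;
    }
}

// Authorization-pending page: a read-only lookup, so Refresh is harmless.
function transactionStatus(PDO $db, string $orderRef): ?string
{
    $stmt = $db->prepare('SELECT status FROM card_transaction WHERE order_ref = ?');
    $stmt->execute([$orderRef]);
    $status = $stmt->fetchColumn();
    return $status === false ? null : $status;
}

// Separate server process: authorize each pending row and record the outcome.
function processPending(PDO $db, callable $authorize): int
{
    $refs = $db->query("SELECT order_ref FROM card_transaction WHERE status = 'PENDING'")
               ->fetchAll(PDO::FETCH_COLUMN);
    $update = $db->prepare('UPDATE card_transaction SET status = ? WHERE order_ref = ?');
    foreach ($refs as $orderRef) {
        $update->execute([$authorize($orderRef) ? 'APPROVED' : 'DECLINED', $orderRef]);
    }
    return count($refs);
}

$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
createQueue($db);
queueTransaction($db, 'order-1001');
queueTransaction($db, 'order-1001');          // second click: ignored
echo transactionStatus($db, 'order-1001'), "\n"; // PENDING
```

The Web pages only ever insert or read a row; the slow authorization call lives entirely in processPending, which would run outside the Web server.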

We can apply this same logic to generating reports, as discussed next.

The Offline Approach

Using the offline approach to generating reports has huge benefits from both a user experience and an application performance perspective. Admittedly, the report itself doesn't generate any more quickly, since exactly the same database queries are being executed. But the user experience is far better, and many of the practical and technical pitfalls discussed previously are avoided altogether.

The basic principles are as follows:

A database table called report contains template information for every distinct report available on the system.
A generic page called reporting.php lists the available reports by consulting said table (and, of course, uses a Smarty template so as to be MVC-compliant). A link is also provided to the My Reports page (more on that shortly).
The user can select a given report from the table, which links to another page called newreport.php, with the report identifier passed as an HTTP GET parameter (for example, newreport.php?report_id=14). This page uses an XML template associated with that report to render a page specific to that report, which is then used by the users to specify the criteria for this instance of the report.
When the user submits the report, it is entered into a table called report_instance and flagged as PENDING. The supplied criteria are entered into a table called report_instance_criteria. The user is then redirected to the My Reports page.
The My Reports page shows a list of all reports held in the database for that user, along with their status: PENDING, PROCESSING, or COMPLETED. The COMPLETED ones can be viewed in one of a number of formats (for example, HTML, PDF, Excel). Which formats are available depends on the "translator" scripts available for that report.
A single PHP script called reportprocessor.phpx (with the extension .phpx signifying that this is a command-line executable rather than a Web page) runs continuously on a machine attached to the application's database server, scanning the report_instance table for reports whose status is set to PENDING. It immediately sets each one to PROCESSING and then forks a new handler process appropriate to that report; each report type has its own dedicated handler, which is also a PHP script. It then recommences the loop, again looking for reports marked as PENDING.
The handler script consults the report_instance_criteria table to determine the criteria for this report instance and then builds SQL queries as needed to extract the relevant data. This data is written to an XML file, and the instance is marked as COMPLETED.
Should the user then look at the My Reports page, the report in question will be seen to be COMPLETED rather than PROCESSING. The user can then view the results of the report.
The "translator" scripts for a given report take the raw XML output of the handler script and translate it into the output format in question. In the case of HTML, this might be accomplished using XSLT. In the case of a PDF, this might be accomplished using XSLT followed by a pass of Apache FOP to translate the resulting XML into a PDF document, as discussed in the first section of this chapter.
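
One pass of the processor described in these steps might look something like the sketch below. It is an illustration under stated assumptions, not the book's source code: it assumes the report and report_instance tables each have a primary key column named id, uses the one-letter status flags P (pending) and I (in progress), and invents the handler and output paths and the function names.

```php
<?php
// Sketch only. Assumes primary key columns named id, status flags P and I,
// and a hypothetical directory layout for handlers and output files.

// Claim the oldest pending instance, or return null if none is waiting.
function claimNextPending(PDO $db): ?array
{
    $row = $db->query(
        "SELECT ri.id AS instance_id, r.report_code
           FROM report_instance ri
           JOIN report r ON r.id = ri.report_id
          WHERE ri.status_flag = 'P'
          ORDER BY ri.submitted, ri.id
          LIMIT 1"
    )->fetch(PDO::FETCH_ASSOC);
    if ($row === false) {
        return null;
    }
    // Re-checking the flag in the UPDATE stops two processors claiming one row.
    $claim = $db->prepare(
        "UPDATE report_instance SET status_flag = 'I' WHERE id = ? AND status_flag = 'P'"
    );
    $claim->execute([$row['instance_id']]);
    return $claim->rowCount() === 1 ? $row : null;
}

// Build the command line for the handler dedicated to this report type.
function handlerCommand(string $reportCode, int $instanceId): string
{
    $script = '/scripts/handlers/' . basename($reportCode) . '.php';
    $output = '/generated/output' . $instanceId . '.xml';
    return escapeshellarg($script) . ' ' . escapeshellarg($output);
}

// Claim every pending instance and hand each command to $launch.
// Returns the number of handlers started in this pass.
function processorPass(PDO $db, callable $launch): int
{
    $started = 0;
    while (($job = claimNextPending($db)) !== null) {
        $launch(handlerCommand($job['report_code'], (int) $job['instance_id']));
        $started++;
    }
    return $started;
}
```

In a long-running script, processorPass would be called in a loop with a short sleep between passes, and $launch would start the handler as a separate process, for example with pcntl_fork() or proc_open(), so that a slow report never blocks the scan.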

Let's look at this approach in more detail now. We don't give you the source code verbatim because it would run to many hundreds of pages; however, if you follow the architectural outline detailed in this section, you should easily be able to construct a reporting platform to fit your application's requirements perfectly. Alternatively, the full source code for an example implementation of this methodology can be found on the Wrox Web site at www.wrox.com.

Later in the chapter, you'll meet a real-world example of using this methodology to process a user's request for a report.

The Reports Interface

The main Reports interface is the home page of the reporting component of your application. Should you add reporting as an option to your navigation, this is inevitably the first page to be rendered.

The page will comprise two files: reporting.php and reporting.tpl, a Smarty template.

The purpose of the interface is as follows:

Welcome the user to the reporting component of the application.
Provide a link to the My Reports page so that the user may view any requested reports and see their status.
Provide a list of the types of report on the system and allow the user to create a new instance of that report.

Note that "instance" has nothing to do with OOP. It is simply an incarnation of a particular report. For example, you may have a generic report that generates sales comparison figures; an instance of that report would be that generated by Joe Bloggs on the 3rd of August to compare figures from April and June, for example. A report is not associated with particular input criteria, whereas an instance of it is associated with a very particular set of input criteria.

The list of types of report available on the system is obtained by consulting the report table. You may wish to provide some administrative interface to allow new reports to be added to this table, but keep in mind that in the architecture we are presenting there is a requirement for some fairly sophisticated files to be constructed for each new report. With this in mind, you may see limited value in providing an interface to create brand-new reports. Indeed, it is probably commercially preferable that your client is dependent on you for creating new report templates.

The report database table, which the Reports interface uses to generate the list of available report types, is structured as shown in the following table.

Column	Data Type	Description
id	SERIAL	The identifier and primary key of the table
report_code	character varying(8) NOT NULL	Another unique identifier, but more human readable, such as salesrep
report_name	character varying(64) NOT NULL	The name of the report, such as "sales comparison"
report_desc	character varying(256)	A description for the report

Note that the preceding is tailored for PostgreSQL in keeping with the rest of this book. It would be a trivial matter to adapt this table structure for MySQL or another database platform, however.

As you can see, we don't store much data about the report in the report table. We don't need to; almost everything else will be stored on disk, as you'll see.

The data about available reports is extracted from the preceding table. Unless a huge number of reports are available, you probably don't need to concern yourself with pagination. You simply provide the name of each report type and a link to create a new instance of a report of this type, using the interface discussed in the next section.

The identifier of the chosen report should be passed as an HTTP GET parameter to newreport.php, discussed next. The preceding information should be presented using a Smarty template (see Chapter 13), perhaps named reporting.tpl.
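
As a rough sketch of the data reporting.php might prepare for its template, the function below turns rows from the report table into a list of names and links. The function name and the array keys name, description, and url are hypothetical, and the primary key column is assumed to be named id.

```php
<?php
// Sketch only: shapes rows from the report table for a template such as
// reporting.tpl. The keys name, description, and url are hypothetical.
function reportLinks(array $rows): array
{
    $links = [];
    foreach ($rows as $row) {
        $links[] = [
            'name'        => $row['report_name'],
            'description' => $row['report_desc'] ?? '',
            // The report identifier travels as an HTTP GET parameter.
            'url'         => 'newreport.php?' . http_build_query(['report_id' => (int) $row['id']]),
        ];
    }
    return $links;
}

$rows = [
    ['id' => 14, 'report_code' => 'salesrep', 'report_name' => 'sales comparison', 'report_desc' => null],
];
$links = reportLinks($rows);
echo $links[0]['url'], "\n"; // newreport.php?report_id=14
```

The resulting array could then be handed to Smarty with assign() and rendered with display(), keeping the query logic out of the template.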

Type	Description
freetext	A free text entry box, stored as entered
date	A date, represented as three form components (day, month, year) but stored as a single ISO date (YYYY-MM-DD)
time	A time, represented as three form components (hour, minute, seconds) but stored as a single ISO time (HH:MM:SS)
datetime	A date and time, represented as six form components (day, month, year, hours, minutes, seconds) but stored as a single ISO date and time (YYYY-MM-DD HH:MM:SS)
fkmultiple	A list box showing entries from a foreign key table. The format for this type is fkmultiple:tablename/storecolumn/displaycolumn, where tablename is the table containing the foreign entity, storecolumn is the column containing the values to be stored, and displaycolumn is the column to display in the list box. This is used in our preceding example to allow the selection of a publisher or author (from the publisher and author tables respectively).
fksingle	As fkmultiple, but allowing only a single entity from the foreign table to be selected
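
To make the type formats above concrete, here is a small sketch of how a type specification might be parsed and how date components might be combined into a single stored value. The function names and returned array keys are hypothetical, as are the id and name columns in the usage line.

```php
<?php
// Sketch only: function names and array keys are hypothetical.

// Split a type specification into its kind and, for the foreign-key types,
// the table, stored column, and displayed column.
function parseCriterionType(string $spec): array
{
    $parts = explode(':', $spec, 2);
    $type = ['kind' => $parts[0]];
    if ($type['kind'] === 'fkmultiple' || $type['kind'] === 'fksingle') {
        $fk = explode('/', $parts[1] ?? '');
        if (count($fk) !== 3) {
            throw new InvalidArgumentException(
                "Expected {$type['kind']}:tablename/storecolumn/displaycolumn"
            );
        }
        $type['table'] = $fk[0];
        $type['storeColumn'] = $fk[1];
        $type['displayColumn'] = $fk[2];
    }
    return $type;
}

// Combine the three date form components into one ISO date (YYYY-MM-DD),
// or return null if they do not form a real calendar date.
function isoDate(int $day, int $month, int $year): ?string
{
    return checkdate($month, $day, $year)
        ? sprintf('%04d-%02d-%02d', $year, $month, $day)
        : null;
}

$type = parseCriterionType('fkmultiple:publisher/id/name');
echo $type['table'], "\n";      // publisher
echo isoDate(1, 3, 2004), "\n"; // 2004-03-01
```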

The report_instance table is structured as follows.

Column	Data Type	Description
id	SERIAL	The identifier and primary key of the table
report_id	int4	The report in question, a foreign key from the report table
submitted	datetime	Date and time at which the report was submitted for processing
submitting_user_id	int4	The ID of the user submitting the report, used for matching on the My Reports page
status_flag	character(1)	The status of the report for the processor; P is pending, I is in progress, and C is completed

The report_instance_criteria table is structured as follows.

Column	Data Type	Description
id	SERIAL	The identifier and primary key of the table
instance_id	int4	The report instance in question, a foreign key from the report_instance table
criterion_name	character varying(128)	The name of the criterion in question, such as datefrom1
criterion_value	character varying(128)	The value of the criterion in question, such as 2004-01-01
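
Recording a newly submitted report in these two tables could be sketched as follows. This is only an illustration: the function name is hypothetical, both tables are assumed to have a primary key column named id, and multi-valued criteria are assumed to be stored as comma-separated strings.

```php
<?php
// Sketch only. Assumes primary key columns named id and comma-separated
// storage for multi-valued criteria; neither is confirmed by the text.
function submitReportInstance(PDO $db, int $reportId, int $userId, array $criteria): int
{
    $db->beginTransaction();
    try {
        // The new instance starts life flagged P (pending).
        $insert = $db->prepare(
            "INSERT INTO report_instance (report_id, submitted, submitting_user_id, status_flag)
             VALUES (?, ?, ?, 'P')"
        );
        $insert->execute([$reportId, date('Y-m-d H:i:s'), $userId]);
        // With PostgreSQL, lastInsertId() may need the sequence name as its argument.
        $instanceId = (int) $db->lastInsertId();

        // One row per criterion supplied on the form.
        $criterion = $db->prepare(
            'INSERT INTO report_instance_criteria (instance_id, criterion_name, criterion_value)
             VALUES (?, ?, ?)'
        );
        foreach ($criteria as $name => $value) {
            $stored = is_array($value) ? implode(',', $value) : (string) $value;
            $criterion->execute([$instanceId, $name, $stored]);
        }
        $db->commit();
        return $instanceId;
    } catch (Exception $e) {
        // Leave no half-written instance behind for the processor to find.
        $db->rollBack();
        throw $e;
    }
}
```

Wrapping both inserts in one transaction means the processor never sees a pending instance whose criteria have not yet been written.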

# /scripts/handlers/salesrep.php /generated/output19039.xml
datefrom1=2004-03-01
dateto1=2004-03-31
datefrom2=2004-05-01
dateto2=2004-05-31
authors=31,14,12,11,15
publishers=
[blank line]
[script executes here]
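
One way a handler might consume input of this shape is sketched below. It assumes that the criteria arrive on standard input as name=value lines ended by a blank line, which is only one reading of the invocation shown above, and the function names are hypothetical.

```php
<?php
// Sketch only: assumes criteria are supplied on standard input as name=value
// lines, terminated by a blank line. Function names are hypothetical.

// Read criteria from an open stream until a blank line or end of input.
function readCriteria($stream): array
{
    $criteria = [];
    while (($line = fgets($stream)) !== false) {
        $line = rtrim($line, "\r\n");
        if ($line === '') {
            break; // blank line: no more criteria
        }
        // Split on the first = only, so values may themselves contain =.
        [$name, $value] = array_pad(explode('=', $line, 2), 2, '');
        $criteria[$name] = $value;
    }
    return $criteria;
}

// Turn a comma-separated criterion such as authors into a list of integer IDs.
function idList(string $value): array
{
    return $value === '' ? [] : array_map('intval', explode(',', $value));
}

// Usage with an in-memory stream standing in for STDIN.
$in = fopen('php://memory', 'r+');
fwrite($in, "datefrom1=2004-03-01\nauthors=31,14,12,11,15\npublishers=\n\n");
rewind($in);
$criteria = readCriteria($in);
echo count(idList($criteria['authors'])), "\n"; // 5
```

In a real handler the stream would be STDIN and the output file path would be taken from $argv[1].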

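// Load the raw XML produced by the handler
$objDomXML = new DOMDocument();
$objDomXML->loadXML($strXML);

// Load the XSL stylesheet for the required output format
$objDomXSL = new DOMDocument();
$objDomXSL->loadXML($strXSL);

// Apply the stylesheet to produce the HTML
$proc = new XSLTProcessor();
$proc->importStylesheet($objDomXSL);
$strHTML = $proc->transformToXml($objDomXML);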
