For the rest of this chapter, I will be writing about console applications. When you create a console application project, only one class is available by default: the App class, a subclass of ConsoleApplication. It has two events, Run and UnhandledException. The Run event is triggered when the application starts, and it serves as the starting point for all console applications.

Transformer: Command-Line XML Transformation

The first console example is a simple command-line application that takes two arguments. The first is a path to an XML document, and the second is a path to an XSLT stylesheet that will be used to convert the document. The results of the transformation are sent to standard output by using the Print method.

Besides App, the project has two classes, mwArgHandler and mwTransformer. mwArgHandler parses the arguments that are passed to the application from the command line. mwTransformer wraps the process of transforming an XML document using a given stylesheet.

Class App

All console applications have an App class, which should come as no surprise. This class has one property, which is an instance of mwArgHandler.

One thing you will notice is that output is sent to standard output using the Print command. In addition to the expected output of the program, which is the result of the XSLT transformation, I also send error messages to standard output using Print. I do not have to do it this way, but I chose to because it's convenient and because standard output and standard error are often the same thing. If I wanted to send error messages specifically to standard error, I could use the StdErr method to get a reference to it. It implements REALbasic's Writeable interface (which is different from the iWritable interface I introduced in a previous chapter), so after you have a reference to it, you can write data to it by calling Write(someData as String).
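To make the difference between the two streams concrete outside of REALbasic, here is a minimal sketch in Python. It is an illustration only: the report function and its parameters are hypothetical names, not part of the Transformer project. Results go to standard output and error messages go to standard error, so the two can be redirected separately.

```python
import sys


def report(result, errors, out=None, err=None):
    """Write the result to standard output and each error to standard error.

    The out and err parameters exist only so the streams can be swapped
    for testing; by default the real process streams are used.
    """
    out = sys.stdout if out is None else out
    err = sys.stderr if err is None else err
    if result:
        out.write(result + "\n")
    for message in errors:
        err.write("error: " + message + "\n")


if __name__ == "__main__":
    report("<html>transformed document</html>", ["stylesheet not found"])
```

Run from a shell, redirecting standard output to a file would capture only the transformed document, while the error message would still appear in the terminal.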
Property Arg As mwArgHandler

Listing 7.56. Function Run(args() as String) As Integer Handles Event
Listing 7.57. Sub UnhandledException(error As RuntimeException) Handles Event
Class mwArgHandler Inherits Dictionary

This class handles the parsing of command-line arguments. Earlier I described a special kind of argument called an option, which is basically a key/value pair that together represents a single argument. This class parses those key/value pairs and makes the values more readily accessible. The arguments are stored in the Arguments array property, and the options are stored in the Options dictionary:

Property Arguments(-1) As String
Property Options As Dictionary

There is one very important piece of information you need to parse command-line arguments accurately: the encoding used for the text. Windows computers use UTF-16, whereas UNIX-like systems use ASCII. The args() array is parsed in the Constructor of the class, and you will see that each argument is converted to UTF-8 as it is parsed. mwArgHandler is a subclass of Dictionary, so if the arguments are not converted to UTF-8, the keys in the Dictionary are UTF-16 on Windows. If you then check for a key by calling Dictionary.HasKey("literalString") with a String literal, or with a variable that is encoded in UTF-8, the result will always be False.

There is also a difference in how the Print method is called. In the App class, I used Print to display error messages as well as the output of the transformation, because standard error is usually the same thing as standard output. Because the App class is a subclass of ConsoleApplication, I know that it will always run in an environment where standard input and standard output are meaningful. This is not the case with mwArgHandler and mwTransformer, because I could also make use of these classes in desktop applications. So I want to make sure the classes work in both kinds of applications: desktop and console.

To do that, these classes introduce another compiler directive, TargetHasGui, which is used to see whether the class instance is running in a console application or a desktop application. Depending on the result, different steps are taken with error and logging messages.

Listing 7.58. Sub Constructor(args() as String)
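To make the parsing idea concrete outside of REALbasic, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the constructor from the listing: the --key=value option syntax and the names to_text and parse_args are hypothetical. It shows the two points made above: every argument is normalized to one text form before it is used as a key, and options are separated from plain arguments.

```python
def to_text(raw, encoding="utf-8"):
    """Return raw as text, decoding it first if it arrived as bytes."""
    if isinstance(raw, bytes):
        return raw.decode(encoding)
    return raw


def parse_args(args, encoding="utf-8"):
    """Split command-line arguments into plain arguments and options.

    Assumption: an option is written as --key=value. Anything else is
    treated as a plain argument.
    """
    arguments = []
    options = {}
    for raw in args:
        text = to_text(raw, encoding)
        if text.startswith("--") and "=" in text:
            key, _, value = text[2:].partition("=")
            options[key] = value
        else:
            arguments.append(text)
    return arguments, options


if __name__ == "__main__":
    # Simulate arguments delivered as UTF-16 bytes.
    raw_args = [s.encode("utf-16-le") for s in ("doc.xml", "style.xsl", "--mode=html")]
    arguments, options = parse_args(raw_args, encoding="utf-16-le")
    print(arguments)
    print("mode" in options)
```

Without the normalization step, the lookup fails for the reason described above: the UTF-16 bytes for a key are simply not equal to the UTF-8 bytes for the same characters, so a dictionary keyed on one never matches the other.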
Listing 7.59. Function getArgCount() As Integer
Listing 7.60. Function getFolderItemFromPath(aPath as String) As FolderItem
Class mwTransformer

The mwTransformer class is simple. It has one function that wraps the XMLDocument.Transform method and adds some error checking. This same class will be used in the RSSReader application, as well as in the rbCGI application that I will share with you in the next section.

Listing 7.61. Function transform(source as FolderItem, xslt_folder as FolderItem) As String
CGI Application

CGI stands for the Common Gateway Interface. It's called an interface because it provides the means for Apache (or any web server that supports CGI) to execute scripts and applications on the host machine of a web server. When a user types a URL into his or her web browser, that URL often represents the location of an HTML file that the server just picks up and sends back to the browser. In a CGI program, the URL represents a script or a program that gets executed. The output of the program then gets sent back to the user.

To provide security, Apache allows the administrator to configure which directories allow CGI programs to be executed. Most often, the default CGI directory is the cgi-bin directory. On OS X, the cgi-bin directory is here:

/Library/WebServer/CGI-Executables

For this example, we'll be placing our REALbasic CGI program in this directory. Very often, you'll see CGI scripts that end with a .cgi extension, but we won't need to use that; in fact, you should avoid using any extension, because it will mess things up. Programs written in scripting languages, such as Perl and Python, usually reside on the web server as text files that are executed by an interpreter. Apache uses extensions to map an interpreter to a particular file. Because a REALbasic application is compiled, it doesn't need an interpreter, and it's better just to leave the extension off. It also provides for a much nicer URL, which is important, too.

Now we can start work on the program. The easiest way to work is to save the project in the CGI-Executables directory. You'll need to compile the application to test it with Apache, and it's easier to compile it and leave it there to test than to compile it and copy it to the CGI directory.

As is the case with all console applications, there are two default events, UnhandledException and Run. The Run event is triggered when the program is launched. In the case of a CGI application, it is triggered when a user requests it by typing or selecting the application's URL in a web browser. When I created the console application project, I named the application CGI for Linux and CGIX for Macintosh OS X, both with no extensions.

Because console applications do not have a graphical interface, they have to be able to input data and output data in some other fashion. For programs that are executed on the command line, this is typically referred to as standard input and standard output, respectively. With a REALbasic console application, the command Input represents (you guessed it) standard input. Print sends data to standard output.

In addition to standard input and output, CGI applications also make use of environment variables that are set by the web server. To access environment variables, you need the System object, which includes the method System.EnvironmentVariable(), which returns the value of the environment variable whose name is passed to it.

Because the Run event is triggered when the application is invoked by the web server, it is in the Run event that we put the main part of our code. I also created a cgiRequest class, which is instantiated when the Run event handler is executed. It is a subclass of Dictionary. It holds the data that is passed to the CGI application from Apache, and it calls a Write method that sends data back to the client browser. The Run event handler should look like this:

Listing 7.62. Function Run(args() as String) As Integer Handles Event
At the beginning of the Run event handler is a pragma directive. It's similar to the other compiler directives that were encountered in previous chapters, except that instead of identifying the system platform, it tells REALbasic how to compile the application. In this case, background tasks are disabled because Apache doesn't work well with them. If you don't disable them, every time you do a loop, or execute anything that triggers a new thread or background task, the application crashes mercilessly. You can also use this directive to speed up processor-intensive stretches of your REALbasic code in other situations. The downside is that other threads stop running and events don't get triggered, but it can make a significant performance difference.

Class cgiRequest Inherits Dictionary

Property query as Dictionary
Property cookies as Dictionary

The two environment variables that matter most to us are "REQUEST_METHOD" and "QUERY_STRING". There are several kinds of requests a web server can accept. The two that concern us are "Post" requests and "Get" requests. In practice, the only distinction between the two that matters here is the way that form data is passed to the CGI program. Anytime you fill out a form on a web page, either to log in or make a purchase, the information that you enter needs to be transferred to the server so that it can take some appropriate action. When you create a form in HTML, you have the option of selecting the request method you want to use: either "Get" or "Post". If you choose "Get", the data from the form is encoded and sent across as part of the URL. If you use "Post", the data is sent to the CGI program as standard input. Here is an example of a "Get" request URL:

localhost/cgi-bin/test?cat=dog

The first step in processing a CGI request is to find out what kind of request it is, and process it accordingly. In the cgiRequest class, I have implemented the following method:

Listing 7.63. Sub getQueryString()
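To make the "Get" versus "Post" distinction concrete outside of REALbasic, here is a minimal sketch in Python. It is an illustration only, not the method from the listing: the function names read_query_string and parse_query are hypothetical. CONTENT_LENGTH is not mentioned in the text above; it is the standard CGI variable that tells the program how much data to read from standard input for a "Post" request.

```python
import os
import sys
from urllib.parse import parse_qs


def read_query_string(environ=None, stdin=None):
    """Return the raw form data for a CGI request.

    A "Get" request carries the data in QUERY_STRING; a "Post" request
    delivers it on standard input. The environ and stdin parameters
    exist so that the function can be exercised without a web server.
    """
    environ = os.environ if environ is None else environ
    stdin = sys.stdin if stdin is None else stdin
    method = environ.get("REQUEST_METHOD", "GET").upper()
    if method == "POST":
        # CONTENT_LENGTH counts bytes; for plain ASCII form data that is
        # the same as the number of characters read here.
        length = int(environ.get("CONTENT_LENGTH") or 0)
        return stdin.read(length)
    return environ.get("QUERY_STRING", "")


def parse_query(query_string):
    """Decode key=value pairs into a dictionary, keeping the first value per key."""
    return {key: values[0] for key, values in parse_qs(query_string).items()}


if __name__ == "__main__":
    fake_environ = {"REQUEST_METHOD": "GET", "QUERY_STRING": "cat=dog"}
    print(parse_query(read_query_string(fake_environ)))
```

For the example URL above, the web server would set QUERY_STRING to cat=dog, and the sketch would produce a dictionary with the single key cat.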
Listing 7.64. Sub handleRequest()
The method creates a new Dictionary to hold the values of the query (the data from the form). If the request method is a "Post", the method grabs the string from standard input. If it is a "Get", it grabs it from the environment variable "QUERY_STRING". Beyond that, everything else is the same. The string is parsed, and the dictionary values are set.

Finally, the HTTP_COOKIE environment variable must be checked. If it has a value, you will need to parse it and instantiate a cgiCookie object. In web programming, cookies are a way to identify individual users, and they can help you store information about a specific user. After the variable is parsed, the result consists of key/value pairs for each cookie that has been set. You can set new cookies in the cgiResponse class.

Class cgiCookie

Property key(-1) as String
Property value(-1) as String

Listing 7.65. Function getItem(sequence as integer, byref key as string, byref value as string) As Boolean
Listing 7.66. Function hasKey(myKey as string) As Boolean
Listing 7.67. Sub setValue(key as string, value as string)
Listing 7.68. Function getValue(index as integer) As String
Listing 7.69. Sub getValues(myKey as string, byref myValues() as string)
Listing 7.70. Function count() As Integer
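To make the parallel key and value arrays concrete outside of REALbasic, here is a minimal sketch in Python. It is an illustration only, not the cgiCookie listings: the class name CookieList is hypothetical, and it assumes that setting a value appends a new pair rather than replacing an existing one, which the text does not confirm. The input is a cookie string in the usual "name=value; name=value" form that a web server passes to a CGI program.

```python
class CookieList:
    """Cookie pairs stored in two parallel lists, one for keys and one for values."""

    def __init__(self, header=""):
        self.keys = []
        self.values = []
        # Each cookie is separated by a semicolon; each pair by the first "=".
        for pair in header.split(";"):
            name, separator, value = pair.strip().partition("=")
            if separator and name:
                self.set_value(name, value)

    def set_value(self, key, value):
        # Assumption: a repeated key is kept as an additional entry.
        self.keys.append(key)
        self.values.append(value)

    def has_key(self, key):
        return key in self.keys

    def get_value(self, index):
        return self.values[index]

    def get_values(self, key):
        """Return every value stored under key, in the order it was set."""
        return [v for k, v in zip(self.keys, self.values) if k == key]

    def count(self):
        return len(self.keys)


if __name__ == "__main__":
    cookies = CookieList("session=abc123; theme=dark")
    print(cookies.count(), cookies.get_values("theme"))
```

Keeping two lists instead of a single dictionary allows the same cookie name to appear more than once, which is the case a method that returns several values for one key has to handle.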
Class cgiResponse

We now have a cgiRequest object that contains all the needed values from the request, plus the query parsed into a dictionary. This object will be passed to the cgiResponse object when it is instantiated, and the cgiResponse object will be able to use the data in it to prepare the appropriate response. To send data back to the client, we need to send some header information followed by an HTML string:

Property str as String
Property request as cgiRequest
Property contentType as String
Property cookies as String
Property location as String
Property status as String

Listing 7.71. Sub Constructor(myRequest as cgiRequest)
Listing 7.72. Sub addHeader(myHeader as String)
Listing 7.73. Sub setCookie(cName as String, cValue as String, cExpires as Date, cPath as String, cDomain as String, cSecure as Boolean)
Listing 7.74. Sub setContentType(myType as String)
Listing 7.75. Sub setLocation(myURL as String)
Listing 7.76. Sub setStatus(myStatus as String)
Listing 7.77. Function Transform() as String
Listing 7.78. Sub write()
Listing 7.79. Sub setHTML(myHtml as string)
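To make the shape of a CGI response concrete outside of REALbasic, here is a minimal sketch in Python. It is an illustration only, not the cgiResponse listings: the class name CgiResponse and its method names are hypothetical, and set_cookie handles only a name, a value, and a path. The point it shows is the one stated above: header lines come first, a blank line ends the headers, and the HTML follows.

```python
import sys


class CgiResponse:
    """Collects headers and a body, then renders them in CGI order."""

    def __init__(self, content_type="text/html"):
        self.status = ""
        self.location = ""
        self.content_type = content_type
        self.cookies = []
        self.body = ""

    def set_cookie(self, name, value, path="/"):
        self.cookies.append("Set-Cookie: {}={}; Path={}".format(name, value, path))

    def render(self):
        """Return the header lines, a blank line, and then the body."""
        lines = []
        if self.status:
            lines.append("Status: " + self.status)
        if self.location:
            lines.append("Location: " + self.location)
        lines.append("Content-Type: " + self.content_type)
        lines.extend(self.cookies)
        return "\n".join(lines) + "\n\n" + self.body

    def write(self, out=None):
        out = sys.stdout if out is None else out
        out.write(self.render())


if __name__ == "__main__":
    response = CgiResponse()
    response.set_cookie("session", "abc123")
    response.body = "<html><body>Hello</body></html>"
    response.write()
```

The blank line is what tells the web server where the headers stop. If it is missing, the server cannot separate the headers from the page and will typically report an error instead of returning the output.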
If you are using OS X and placed the application in the /Library/WebServer/CGI-Executables directory, and set the application name as "CGIX", you should be able to access the script from the following URL:

Localhost/cgi-bin/CGIX/CGIX

The reason CGIX is repeated is that this is a Mach-O build for OS X, so it is built as a typical bundle, which isn't recognized as such by Apache. All Apache sees is a directory with an application inside of it, so you have to use the path that specifies the actual executable. You should be able to paste this URL into the browser, press Enter, and then get back a list of the variables. If you want to test the query string, enter a URL such as the following:

Localhost/cgi-bin/CGIX/CGIX/realsoftware.xml

If you have the realsoftware.xml file in the right location, as well as a stylesheet to transform it, the results will look something like Figure 7.5.

Figure 7.5. CGI output in Safari.

You now have a good starting point for writing CGI programs in REALbasic for Apache. One thing you'll notice, especially if you have a lot of traffic on your site, is that CGI can be slow at times. The reason is that the program has to be started up with each request, which produces a lot of overhead. The downside to REALbasic is that it produces large executable files (about 1.3MB for this simple CGI program), so this particular solution is best limited to low-traffic sites.

Because of the startup overhead, a variety of CGI workarounds have been developed that speed up the process. The way they work is that instead of invoking the program each time it is requested, the program stays resident in memory and handles the requests as they come in. This is usually accomplished with an Apache plug-in. This is an interesting approach that can be used with REALbasic as well, and you don't need to rely on console programming. I developed a REALbasic application that worked with an Apache plug-in called "mod_scgi".

Mod_scgi works by taking the data that Apache would normally send as environment variables to a CGI program and instead sending it as a block of data over a TCP connection. Using REALbasic's networking capabilities, you can create a ServerSocket that creates a pool of TCPSockets that listen on the appropriate port, get the data when it is available, parse it, and act on it just like a CGI program. As soon as an individual socket is done, instead of exiting, it returns to listening on the port for the next request. This creates a huge performance boost and is a tactic that should be considered if you expect a lot of traffic to your site.

The following is the original (and best) guide to CGI, from NCSA, the inventors of Mosaic:

http://hoohoo.ncsa.uiuc.edu/cgi/
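Returning to the mod_scgi approach, here is a minimal sketch in Python of how the block of data sent over the TCP connection can be parsed. It is an illustration only, not the REALbasic application mentioned above: the function names are hypothetical, and the sketch assumes the complete request has already been read from the socket. In the SCGI protocol, the headers arrive as a netstring (a length, a colon, the data, and a comma) in which each name and each value ends with a NUL byte, and the request body follows the comma.

```python
def build_scgi_request(headers, body=b""):
    """Encode (name, value) pairs and a body as one SCGI request (for testing)."""
    blob = b"".join(
        name.encode("latin-1") + b"\x00" + value.encode("latin-1") + b"\x00"
        for name, value in headers
    )
    return str(len(blob)).encode("ascii") + b":" + blob + b"," + body


def parse_scgi_request(data):
    """Parse one complete SCGI request into a headers dictionary and a body."""
    length_text, separator, rest = data.partition(b":")
    if not separator:
        raise ValueError("missing netstring length")
    header_length = int(length_text)
    blob = rest[:header_length]
    if rest[header_length:header_length + 1] != b",":
        raise ValueError("header netstring is not terminated by a comma")

    # Every name and value ends with NUL, so the split leaves one empty
    # trailing element that is dropped here.
    fields = blob.split(b"\x00")
    if fields[-1] != b"" or len(fields) % 2 == 0:
        raise ValueError("malformed header block")
    fields = fields[:-1]

    headers = {}
    for i in range(0, len(fields), 2):
        headers[fields[i].decode("latin-1")] = fields[i + 1].decode("latin-1")

    body_start = header_length + 1
    body_length = int(headers.get("CONTENT_LENGTH", "0"))
    body = rest[body_start:body_start + body_length]
    return headers, body


if __name__ == "__main__":
    request = build_scgi_request(
        [("CONTENT_LENGTH", "7"), ("SCGI", "1"), ("REQUEST_METHOD", "POST")],
        b"cat=dog",
    )
    print(parse_scgi_request(request))
```

The resulting dictionary holds the same names a CGI program would read from environment variables, such as REQUEST_METHOD and QUERY_STRING, which is why the rest of the request handling can stay the same as in the CGI version.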