2.2. Stable Database ConnectionsA new database connection can be created quickly and easily, but this ease can disguise the high cost of making repeated connections. You must manage the use of database connections with great care. The consequences of allowing multiple connections to occur, perhaps hidden within an application, can be substantial, as the next example illustrates. Some time ago I came across an application in which numerous small files of up to an arbitrary maximum of 100 lines were being processed. Each line in these small text files contained both data and the identification of the database instance into which that data had to be loaded. In this particular case, there was a single server, but the principle being illustrated is exactly the same as if there were a hundred database instances. The process for each file was coded as follows: Open the file Until the end of file is reached Read a row Connect to the server specified by the row Insert the data Disconnect Close the file This process worked quite satisfactorily, except for the occasional circumstance in which a large number of small files would arrive in a very short space of time, and at a rate greater than the ability of the application to process them. This resulted in a substantial backlog, which took considerable time to clear. I explained the problem of performance degradation as a consequence of frequent connection and disconnection to the customer with the help of a simple program (written in C) emulating the current application. Table 2-1 gives the results from that demonstration.
The demonstration showed the importance of trying to minimize the number of separate database connections that had to be made. Thus, there was an obvious and enormous advantage in applying a simple check to determine whether the "next" insert was into the same database as the previous one. The rationalization could go further, as the number of database instances was of course finite. You could likely achieve further performance gain by setting up an array of handlers, one for each specific database connection, opening a new connection each time a new database is referenced, and thus connecting at most once to each database. As Table 2-1 shows, the simple technique of connecting only once (or a very few times) improved performance by a factor of more than 200 with very little additional effort. Of course, this was an excellent opportunity to show that minimizing the number of round-trips between a program and the database kernel, using arrays and populating them with incoming data, can also lead to spectacular improvements in performance. By inserting several rows at once, the throughput could be radically improvedby another factor of five. The results in Table 2-1 demonstrate that improvements in the process could reach a modest factor of 1,200. Why such dramatic improvement?
Database connections and round-trips are like Chinese Wallsthe more you have, the longer it takes to receive the correct message. |