Highly Available PostgreSQL with Sequoia | Professional Apache Geronimo (Wrox Professional Guides)

Sequoia is an Open Source project providing transparent database clustering, load balancing, replication, and failover via JDBC. As a follow-on project to ObjectWeb C-JDBC (http://c-jdbc.objectweb.org), Sequoia also offers a group-based communication project, a C++ client library with an ODBC driver and a management console. Sequoia accomplishes this via the Sequoia JDBC driver that forwards all database requests to one or more Sequoia controllers, which distributes these requests over a group of database instances. Sequoia accomplishes this by providing the concept of a redundant array of inexpensive databases (RAID). Figure 13-12 shows a high-level diagram explaining how Sequoia works.

image from book
Figure 13-12: High-level architecture of Sequoia

In the Sequoia architecture, client applications make use of the Sequoia JDBC drivers instead of the database-specific JDBC drivers. This simple change hooks Sequoia so that the client applications are backed by a redundant array of databases without any changes to the client applications. The Sequoia controller hosts distributed host managers that host virtual databases. These virtual databases handle incoming JDBC requests from the Sequoia JDBC drivers via a request manager that provides load balancing across the actual databases on the back end via the various RAID configurations.

However, right in the middle is a recovery log that is stored in an instance of a HSQLDB database (http://hsqldb.org). HSQLDB is a 100 percent Java relational database that runs in a 100KB footprint and is designed to be embedded inside applications. The recovery log handles checkpoints and allows back ends to dynamically be added to a running cluster and to dynamically recover from a failure. You can download and find out more information on Sequoia from:

http://sequoia.continuent.org

Using Sequoia, the deployment plan for PostgreSQL can be made to make use of all of these features.

Configuring a Clustered PostgreSQL

For starters, the PostgreSQL deployment plan from the previous section is used as a starting point for the clustered database configuration. This deployment plan will hold almost all of the same data as the previous deployment plan, but with some changes for use with Sequoia. The changes are actually quite minimal, as shown in Listing 13-5.

Listing 13-5: The Clustered PostgreSQL Deployment Plan

 <?xml version="1.0" encoding="UTF-8"?> <connector xmlns="http://geronimo.apache.org/xml/ns/j2ee/connector-1.1">   <dep:environment xmlns:dep="http://geronimo.apache.org/xml/ns/deployment-1.1">     <dep:moduleId>       <dep:groupId>postgresql</dep:groupId>       <dep:artifactId>sequoia</dep:artifactId>       <dep:version>1.1</dep:version>       <dep:type>car</dep:type>     </dep:moduleId>     <dep:dependencies>       <dep:dependency>         <dep:groupId>sequoia</dep:groupId>         <dep:artifactId>sequoia</dep:artifactId>         <dep:version>2.9</dep:version>         <dep:type>jar</dep:type>         <dep:import>classes</dep:import>       </dep:dependency>     </dep:dependencies>   <resourceadapter>     <outbound-resourceadapter>       <connection-definition>         <connectionfactory-interface>           javax.sql.DataSource         </connectionfactory-interface>         <connectiondefinition-instance>           <name>ClusteredPostgreSQLDataSource</name>           <config-property-setting name="UserName">             geronimo           </config-property-setting>           <config-property-setting name="Password">             geronimo           </config-property-setting>           <config-property-setting name="Driver">             org.continuent.sequoia.driver.Driver           </config-property-setting>           <config-property-setting name="ConnectionURL">             jdbc:sequoia://localhost:5432/test           </config-property-setting>           <connectionmanager>             <local-transaction/>             <single-pool>               <max-size>10</max-size>               <min-size>0</min-size>               <blocking-timeout-milliseconds>5000</blocking-timeout-milliseconds>       <idle-timeout-minutes>15</idle-timeout-minutes>       <match-one/>      </single-pool>     </connectionmanager>    </connectiondefinition-instance>   </connection-definition>  </outbound-resourceadapter> </resourceadapter>

The first change is the artifact-ID to identify the fact that this is PostgreSQL with Sequoia. The next change is the addition of the Sequoia JDBC driver JAR dependency instead of the PostgreSQL JDBC driver. The last change is to the JDBC URL. It must change to work with the Sequoia driver. In addition to the Geronimo deployment plan, there is also configuration for Sequoia itself.

The Sequoia Configuration

To make Sequoia aware of the actual databases, Sequoia-specific configurations are necessary. Though getting Sequoia up and running is an exercise not detailed here, two configuration files are shown in Listings 13-6 and 13-7.

Listing 13-6: The postgresql-raidbl1-distributin-1.xml Sequoia Configuration File

 <?xml version="1.0" encoding="UTF8"?> <!DOCTYPE SEQUOIA PUBLIC"-//Continuent//DTD SEQUOIA 2.3//EN" "http://sequoia.continuent.org/dtds/sequoia-2.3.dtd"> <SEQUOIA>   <VirtualDatabase name="myDB">  <Distribution>     <MessageTimeouts/>  </Distribution>    <AuthenticationManager>       <Admin>          <User username="admin" password=""/>       </Admin>        <VirtualUsers>         <VirtualLogin vLogin="user" vPassword=""/>       </VirtualUsers>     </AuthenticationManager>     <DatabaseBackend name="backend1" driver="org.postgresql.Driver" url="jdbc:postgresql://node1/myDB" connectionTestStatement="select now()">       <ConnectionManager vLogin="user" rLogin="TEST" rPassword="">         <VariablePoolConnectionManager initPoolSize="10" minPoolSize="5" maxPoolSize="50" idleTimeout="30" waitTimeout="10"/>       </ConnectionManager>     </DatabaseBackend>     <DatabaseBackend name="backend2" driver="org.postgresql.Driver" url="jdbc:postgresql://node2/myDB" connectionTestStatement="select now()">       <ConnectionManager vLogin="user">          <VariablePoolConnectionManager initPoolSize="10"/>       </ConnectionManager>     </DatabaseBackend>     <RequestManager>       <RequestScheduler>          <RAIDb-1Scheduler level="passThrough"/>       </RequestScheduler>       <LoadBalancer>          <RAIDb-1>             <WaitForCompletion policy="first"/>             <RAIDb-1-LeastPendingRequestsFirst/>          </RAIDb-1>       </LoadBalancer>       <RecoveryLog driver="org.hsqldb.jdbcDriver"         url="jdbc:hsqldb:hsql://localhost:9001" login="SA" password="">         <RecoveryLogTable tableName="RECOVERY" idColumnType="BIGINT NOT NULL"           vloginColumnType="VARCHAR NOT NULL" sqlColumnType="VARCHAR NOT NULL"           extraStatementDefinition=",PRIMARY KEY (id)"/>         <CheckpointTable tableName="CHECKPOINT"           checkpointNameColumnType="VARCHAR NOT NULL"/>         <BackendTable tableName="BACKEND"           databaseNameColumnType="VARCHAR NOT NULL"           backendNameColumnType="VARCHAR NOT NULL"           checkpointNameColumnType="VARCHAR NOT NULL"/>         <DumpTable tableName="DUMP" dumpNameColumnType="VARCHAR NOT NULL"           dumpDateColumnType="VARCHAR NOT NULL"           dumpPathColumnType="VARCHAR NOT NULL"           dumpFormatColumnType="VARCHAR NOT NULL"           checkpointNameColumnType="VARCHAR NOT NULL"           backendNameColumnType="VARCHAR NOT NULL"           tablesColumnType="VARCHAR NOT NULL"/>       </RecoveryLog>     </RequestManager>   </VirtualDatabase> </SEQUOIA>

Listing 13-7: The postgresql-raidb1-distribution-1.xml Sequoia Configuration File

 <?xml version="1.0" encoding="UTF8"?> <!DOCTYPE SEQUOIA PUBLIC"-//Continuent//DTD SEQUOIA 2.3//EN" "http://sequoia.continuent.org/dtds/sequoia-2.3.dtd"> <SEQUOIA>   <VirtualDatabase name="myDB">  <Distribution>     <MessageTimeouts/>  </Distribution>    <AuthenticationManager>       <Admin>          <User username="admin" password=""/>       </Admin>        <VirtualUsers>         <VirtualLogin vLogin="user" vPassword=""/>       </VirtualUsers>     </AuthenticationManager>     <DatabaseBackend name="backend3" driver="org.postgresql.Driver" url="jdbc:postgresql://node3/myDB " connectionTestStatement="select now()">       <ConnectionManager vLogin="user" rLogin="TEST" rPassword="">         <VariablePoolConnectionManager initPoolSize="10" minPoolSize="1" maxPoolSize="0"/>       </ConnectionManager>     </DatabaseBackend>     <DatabaseBackend name="backend4" driver="org.postgresql.Driver" url="jdbc:postgresql://node4/myDB " connectionTestStatement="select now()">       <ConnectionManager vLogin="user"> <!-- assumes rlogin/rpassword identical to vLogin/cPassword -->         <VariablePoolConnectionManager initPoolSize="1"/>       </ConnectionManager>     </DatabaseBackend>     <RequestManager>       <RequestScheduler>          <RAIDb-1Scheduler level="passThrough"/>       </RequestScheduler>       <LoadBalancer>          <RAIDb-1>             <WaitForCompletion policy="first"/>             <RAIDb-1-LeastPendingRequestsFirst/>          </RAIDb-1>       </LoadBalancer>       <RecoveryLog driver="org.hsqldb.jdbcDriver"         url="jdbc:hsqldb:hsql://localhost:9001" login="SA" password="">         <RecoveryLogTable tableName="RECOVERY" idColumnType="BIGINT NOT NULL"           vloginColumnType="VARCHAR NOT NULL" sqlColumnType="VARCHAR NOT NULL"           extraStatementDefinition=",PRIMARY KEY (id)"/>         <CheckpointTable tableName="CHECKPOINT"           checkpointNameColumnType="VARCHAR NOT NULL"/>         <BackendTable tableName="BACKEND"           databaseNameColumnType="VARCHAR NOT NULL"           backendNameColumnType="VARCHAR NOT NULL"           checkpointNameColumnType="VARCHAR NOT NULL"/>         <DumpTable tableName="DUMP" dumpNameColumnType="VARCHAR NOT NULL"           dumpDateColumnType="VARCHAR NOT NULL"           dumpPathColumnType="VARCHAR NOT NULL"           dumpFormatColumnType="VARCHAR NOT NULL"           checkpointNameColumnType="VARCHAR NOT NULL"           backendNameColumnType="VARCHAR NOT NULL"           tablesColumnType="VARCHAR NOT NULL"/>       </RecoveryLog>     </RequestManager>   </VirtualDatabase> </SEQUOIA>

Listings 13-6 and 13-7 set up four PostgreSQL database instances on four nodes to be clustered using Sequoia. These two configurations are almost identical except for two differences:

The JDBC URLs for the PostgreSQL databases that are hosted on two separate nodes per configuration file for a total of four
The connection pool manager configuration for each database

In addition to these two differences, these configuration files set up the RAID configuration and the recovery logs for each request manager. For more information on RAID configuration options and other features provided by Sequoia, see the Sequoia documentation. But the only thing left to do at this point is to deploy the configuration.

Deploying the Clustered PostgreSQL Deployment Plan

Deploying this plan is no different than deploying the nonclustered version of the plan. Interaction with the Geronimo deployer can be accomplished using either Maven or Ant, or a direct interaction can take place on the command line, using the shell script wrapper for the Geronimo deployer:

 $ ./bin/deploy.sh --user system --password manager deploy \   ./plan.xml \   ~/.maven/repository/tranql/rars/tranql-connector-1.0.rar     Deployed ClusteredPostgreSQLDataSource

Again, this output (upon successfully deploying the plan) is minimal. In addition to this message, the Geronimo Console can be used to determine what J2EE resource adapters are currently running.