Appendix A: Trade3 Application | DB2 UDB V8 and WebSphere V5. Performance Tuning and Operations Guide2004

< Day Day Up >

9.3 Problem determination scenarios

As we have already covered the basic tuning guidelines for DB2 UDB and WebSphere in previous chapters, here we are going to apply those skills and methodologies to solve various problems using scenarios.

9.3.1 Connectivity scenario

In Chapter 5, "Operational setup" on page 117, we demonstrate the WebSphere Application Server and DB2 UDB system setup using the DB2 EJB Sample application AccessEmployee. We install and configure WebSphere Application Server in the Kanaga machine. The database server is in Atlantic, which has DB2 UDB V8 running and Sample database created. We create the DB2 data source named jdbc/Sample, which uses the DB2 Universal JDBC Driver Type 2 Driver to connect to the sample database. During the setup for the DB2 data source, we associated the DB2 data source with a J2C Authentication Data, which represents the combination of the user db2inst1 and its corresponding password. For more details about the above description refer to "The steps to create and configure DB2 Data Source" on page 144.

The db2inst1 user is the instance owner, also a member of the system administrator group of the instance. This user has the upmost authority of the instance, and could do almost anything against the instance. Based on the security consideration, you are recommended to not use the system administrator authority in your database client application, but use a general user with the appropriate authority.

In this subsection, a new user named db2itso is created. This user does not belong to any system groups such as SYSADM, SYSCTRL and SYSMAINT of the DB2 UDB instance. We use this user for the AccessEmployee application to connect to the sample database to replace the original db2inst1 user. The basic steps are as below:

Create new user and grant appropriate authority.
Create new J2C Authentication Data and use it for AccessEmployee application.
Test and troubleshoot runtime issues.

The following shows you the details of the above steps.

Create new user and grant appropriate authority

In our example, we create new user db2itso and grant appropriate database access authority using the following steps:

Use ksh via the SMIT utility on AIX to create a db2itso user ID belongs to the group db2ugrp.
Add the following line to the .profile of db2itso user to initialize the DB2 environment:
```
 . /home/db2inst1/sqllib/db2profile 
```
Use the db2inst1 user to revoke the database connect authority from the public group.
Grant the database connect authority to the db2itso user.

Establish the connection to the SAMPLE database by using the db2itso user, as shown in the Example 9-4.

Example 9-4: New user creating and authority changing

 $  db2 connect to sample $ db2 "revoke connect on database from public" $ db2 "grant connect on database to db2itso" $  db2 "select char(grantor,10) grantor,char(grantee,10) grantee, dbadmauth, connectauth, granteetype, securityadmauth from sysibm.sysdbauth" GRANTOR    GRANTEE    DBADMAUTH CONNECTAUTH GRANTEETYPE SECURITYADMAUTH ---------- ---------- --------- ----------- ----------- --------------- SYSIBM     DB2INST1   Y         Y           U           N SYSIBM     PUBLIC     N         N           G           N DB2INST1   PUBLIC     N         N           G           N DB2INST1   DB2ITSO    N         Y           U           N   4 record(s) selected. $ db2 "grant select,insert,update,delete on table db2inst1.employee to user db2itso" DB20000I  The SQL command completed successfully. $ db2 "grant select,insert,update,delete on table db2inst1.department to user db2itso" DB20000I  The SQL command completed successfully. $ db2 connect to sample user db2itso Enter current password for db2itso:   Database Connection Information  Database server       = DB2/6000 8.1.3  SQL authorization ID  = DB2ITSO  Local database alias  = SAMPLE

Create and use new J2C Authentication Data

We create the new J2C Authentication Data with the properties shown in Figure 9-3. The steps for J2C Authentication Data creation are the same as the steps described in step 1 in "The steps to create and configure DB2 Data Source" on page 144.

click to expand
Figure 9-3: Creating New J2C Authentication Data for AccessEmployee

Then we modify the property "Container-managed Authentication Alias" for the existing data source jdbc/Sample to utilize the new J2C entry:

From the WebSphere Application Server Administrative Console, choose Resource -> JDBC Providers -> My DB2 Universal JDBC Provider -> Data Sources.
Click EJBSample, then modify the property "Container-managed Authentication Alias" to "kanaga/ITSODB2AccessEmp".

Be aware that the server may need to be restarted for these changes to take effect. After restarting the WebSphere application server server1, try to test the connection to jdbc/Sample using the steps described in step 5 of "The steps to create and configure DB2 Data Source" on page 144. The connection should be established successfully at this point.

Testing and troubleshooting runtime issues

There is no need to reinstall the AccessEmployee application as we only changed the J2C Authentication Data. We could begin the testing via accessing the following URL:

http://kanaga:9080/AccessEmp/AccessEmployee.html

On the resulting page, try to click Display Employee List, Employee List is displayed in another page successfully. But when clicking Query Employee button or Submit Query button, it would fail with error messages like below.

Example 9-5: Error messages from query employee running

 Error: Cannot retrieve the employee's info with emp No.: 000010 StackTrace: RemoteException occurred in server thread; nested exception is: com.ibm.websphere.csi.CSITransactionRolledbackException: Global tx rolled back

The major difference between Display Employee List and other buttons is that Display Employee List does not use the EJB, whereas other buttons use EJB to interact with the data. From the error messages above, it is difficult to tell what is the underlying reason for the problem. Let us begin the troubleshooting from the System.out stream log. In our testing it is: /usr/WebSphere/AppServer/logs/server1/SystemOut.log

Within the SystemOut.log, you could find the following snippets, as shown in Example 9-6.

Example 9-6: SystemOut.log Investigation

 PART 1: ************ Start Display Current Environment ************ WebSphere Platform 5.0 [BASE 5.0.2 ptf2M0325.01] running with process name kanaga\kanaga\server1 and process id 51268 Host Operating System is AIX, version 5.1 Java version = J2RE 1.3.1 IBM AIX build ca131-20030618 (JIT enabled: jitc), Java Compiler = jitc, Java VM name = Classic VM was.install.root = /usr/WebSphere/AppServer user.install.root = /usr/WebSphere/AppServer Java Home = /usr/WebSphere/AppServer/java/jre ws.ext.dirs = /usr/WebSphere/AppServer/java/lib:/usr/WebSphere/AppServer/classes:/usr/WebSphere/AppServer/cla sses:/usr/WebSphere/AppServer/lib:/usr/WebSphere/AppServer/lib/ext:/usr/WebSphere/AppServer/web /help:/usr/WebSphere/AppServer/deploytool/itp/plugins/com.ibm.etools.ejbdeploy/runtime Classpath = /usr/WebSphere/AppServer/properties:/usr/WebSphere/AppServer/properties:/usr/WebSphere/AppServe r/lib/bootstrap.jar:/usr/WebSphere/AppServer/lib/j2ee.jar:/usr/WebSphere/AppServer/lib/lmproxy. jar ... PART 2: [11/24/03 21:03:04:391 PST] 641cdbcf EJBContainerI I WSVR0037I: Starting EJB jar: AccessEmpEjb.jar [11/24/03 21:03:07:143 PST] 641cdbcf WebContainer  A SRVE0169I: Loading Web Module: AccessEmpWeb.war. [11/24/03 21:03:07:183 PST] 641cdbcf WebGroup     I SRVE0180I: [AccessEmpWeb.war] [/AccessEmp] [Servlet.LOG]: JSP 1.2 Processor: init [11/24/03 21:03:07:226 PST] 641cdbcf WebGroup     I SRVE0180I: [AccessEmpWeb.war] [/AccessEmp] [Servlet.LOG]: SimpleFileServlet: init [11/24/03 21:03:07:230 PST] 641cdbcf WebGroup     I SRVE0180I: [AccessEmpWeb.war] [/AccessEmp] [Servlet.LOG]: InvokerServlet: init [11/24/03 21:03:07:238 PST] 641cdbcf ApplicationMg A WSVR0221I: Application started: AccessEmployee PART 3: [11/24/03 21:03:07:995 PST] 641cdbcf HttpTransport A SRVE0171I: Transport http is listening on port 9,080. [11/24/03 21:03:09:206 PST] 641cdbcf HttpTransport A SRVE0171I: Transport https is listening on port 9,443. [11/24/03 21:03:09:225 PST] 641cdbcf HttpTransport A SRVE0171I: Transport http is listening on port 9,090. [11/24/03 21:03:09:369 PST] 641cdbcf HttpTransport A SRVE0171I: Transport https is listening on port 9,043. [11/24/03 21:03:09:537 PST] 641cdbcf RMIConnectorC A ADMC0026I: RMI Connector available at port 2809 [11/24/03 21:03:09:586 PST] 641cdbcf WsServer      A WSVR0001I: Server server1 open for e-business ... PART 4: [11/24/03 21:04:34:621 PST] 1e98dbc3 WebGroup      I SRVE0180I: [AccessEmpWeb.war] [/AccessEmp] [Servlet.LOG]: AccessEmpServlet: init [11/24/03 21:04:36:595 PST] 1e98dbc3 WSRdbDataSour  u Database version is                                  SQL08014 [11/24/03 21:04:36:595 PST] 1e98dbc3 WSRdbDataSour  u JDBC Driver version is                                  1.5.54 PART 5: [11/24/03 21:04:37:137 PST] 1e98dbc3 DataAccessReq E PMGR6022E: Error using adapter to create or execute an Interaction. com.ibm.ws.rsadapter.cci.WSInteractionImpl@62ddbca [11/24/03 21:04:37:281 PST] 1e98dbc3 DataAccessReq E PMGR0000E: Call stack: com.ibm.ws.ejbpersistence.utilpm.PersistenceManagerException: PMGR6022E: Error using adapter to create or execute an Interaction. com.ibm.ws.rsadapter.cci.WSInteractionImpl@62ddbca    at com.ibm.ws.ejbpersistence.dataaccess.DataAccessRequestImpl.executeOneRowFBPK(Unknown Source)    at com.ibm.ws.ejbpersistence.beanextensions.ConcreteBeanStatefulInstanceExtensionImpl.fetchRecordF romDataStore(Unknown Source)    at com.ibm.ws.ejbpersistence.beanextensions.ConcreteBeanStatefulInstanceExtensionImpl.getRecordFor Load(Unknown Source)    at com.ibm.ws.ejbpersistence.beanextensions.ConcreteBeanStatefulInstanceExtensionImpl.executeLoad( Unknown Source)    at com.ibm.ws.ejbpersistence.beanextensions.CBReadyState.ejbLoad(Unknown Source)    at com.ibm.ws.ejbpersistence.beanextensions.ConcreteBeanStatefulInstanceExtensionImpl.ejbLoad(Unkn own Source)    at com.db2int.ejb.ConcreteEmployee_52940fe3.ejbLoad(Unknown Source)    at com.ibm.ejs.container.ContainerManaged2_0BeanO.loadForEnlist(ContainerManaged2_0BeanO.java:331)    at com.ibm.ejs.container.EntityBeanO.enlist(EntityBeanO.java:718)    at com.ibm.ejs.container.activator.OptCEntityActivationStrategy.atActivate(OptCEntityActivationStr ategy.java:117)    at com.ibm.ejs.container.activator.Activator.activateBean(Activator.java:516)    at com.ibm.ejs.container.EJSHome.activateBean_Common(EJSHome.java:1234)    at com.ibm.ejs.container.EJSHome.activateBean_Local(EJSHome.java:1150)    at com.db2int.ejb.EJSCMPEmployeeHomeBean_52940fe3.findByPrimaryKey_Local(Unknown Source)    at com.db2int.ejb.EJSLocalCMPEmployeeHome_52940fe3.findByPrimaryKey(Unknown Source)    at com.db2int.ejb.AccessEmployeeBean.getEmployeeInfo(Unknown Source)    at com.db2int.ejb.EJSRemoteStatelessAccessEmployee_cb37d10c.getEmployeeInfo(Unknown Source)    at com.db2int.ejb._AccessEmployee_Stub.getEmployeeInfo(_AccessEmployee_Stub.java:258)    at client.AccessEmpServlet.queryEmployee(AccessEmpServlet.java:328)    at client.AccessEmpServlet.service(AccessEmpServlet.java:195)    at javax.servlet.http.HttpServlet.service(HttpServlet.java:853) ... PART 6: ---- Begin backtrace for nested exception com.ibm.db2.jcc.a.SqlException: "DB2ITSO.EMPLOYEE" is an undefined name.    at com.ibm.db2.jcc.a.cq.e(cq.java:1482)    at com.ibm.db2.jcc.a.cq.a(cq.java:1092)    at com.ibm.db2.jcc.c.bc.g(bc.java:138)    at com.ibm.db2.jcc.c.bc.a(bc.java:42)    at com.ibm.db2.jcc.c.q.a(q.java:31)    at com.ibm.db2.jcc.c.bo.g(bo.java:145)    at com.ibm.db2.jcc.a.cq.k(cq.java:1072)    at com.ibm.db2.jcc.a.cr.Z(cr.java:1495)    at com.ibm.db2.jcc.a.cr.d(cr.java:1918)    at com.ibm.db2.jcc.a.cr.Q(cr.java:408)    at com.ibm.db2.jcc.a.cr.executeQuery(cr.java:391)    at com.db2int.ejb.websphere_deploy.DB2UDBNT_V8_1.EmployeeBeanFunctionSet_52940fe3.FindByPrimaryKey (Unknown Source)    at com.db2int.ejb.websphere_deploy.DB2UDBNT_V8_1.EmployeeBeanFunctionSet_52940fe3.execute(Unknown Source)    at com.ibm.ws.rsadapter.cci.WSInteractionImpl.execute(WSInteractionImpl.java:347)    at com.ibm.ws.rsadapter.cci.WSRelationalRAAdapter.executeFinder(WSRelationalRAAdapter.java:274)    at com.ibm.ws.ejbpersistence.dataaccess.DataAccessRequestImpl.executeOneRowFBPK(Unknown Source) ... PART 7: ---- Begin backtrace for Nested Throwables com.ibm.db2.jcc.a.SqlException: An error occurred during implicit system action type "2". Information returned for the error includes SQLCODE "-204", SQLSTATE "42704" and message tokens "DB2ITSO.EMPLOYEE". ...

Some explanations for different parts which are highlighted in the above example are provided as below:

PART 1 shows you current environment information such as WAS version, Operating System information, CLASSPATH setting, etc.
PART 2 shows you that the AccessEmployee application has been successfully started.
PART 3 shows you that the application server is started successfully.
PART 4 shows you that the AccessEmpServlet servlet is triggered by clicking the Query Employee button.
PART 5 shows you that an error happens when running the application. The call stack is provided. In the call stack you can find that the problem happens when fetching records from the data store that is related to EJB operations.
PART 6 shows you the back trace of the nested exception, the SqlException. In this part you can find clues as to why the application running failed. It is due to "DB2ITSO.EMPLOYEE" is an undefined name.
PART 7 shows you some information from the Nested Throwables, including the SQLCODE, which is returned by DB2 UDB.

At this point we know the reason the application running failed, as we use the new J2C Authentication Data associated with the user db2itso to access the DB2 data source, but the table name in the application for EJB access is not a fully qualified name with the form "schema.tablename". It is defaulted to DB2ITSO.<Table_Name>, for example, DB2ITSO.EMPLOYEE - an inexistent table. When access to this inexistent table, error is returned from DB2 to WAS, then WAS to the application.

There other methods also exist to find out the cause of the failed application running. For example, you could find clues in the WAS FFDC logs, get detailed information from the activity log, or increase the DB2 UDB diagnostic level so that you also could find information about the inexistent table, etc.

There is a custom property, named currentSchema, of the DB2 data source that uses DB2 Universal JDBC Driver. This property identifies the default schema name used to qualify unqualified database object references where applicable in dynamically prepared SQL statements. Unless currentSchema is used, the default schema name is the authorization ID of the current session user.

We try to resolve this problem by modifying the currentSchema property. In WebSphere Application Server Administrative Console, select Resources -> JDBC Providers -> My DB2 Universal JDBC Provider -> Data Sources, then choose the EJBSample entry which is related to the JNDI name "jdbc/Sample". Select Custom Properties and set the value of currentSchema to db2inst1.

Note

Changing currentSchema to "db2inst1" does not resolve the problem. But because it is an "obvious" way to resolve the problem and is commonly used when people see the problem the first time, so we take this action here for demonstration purposes. The correct resolution is provided later on in this subsection.

After applying the change to currentSchema above, we restart the application server for the changes to take effect. We then redo the testing, but the previous problem is still there. We need to check the SystemOut.log again to see what happens after changing the currentSchema property. Some snippets of SystemOut.log are extracted as shown in the following example.

From Example 9-7 on page 366 you can find that the currentSchema modification works, as the table name shown in the SystemOut.log has been changed to "db2inst1.Employee". Perhaps you would get confused if you try to test the SQL statement "select * from db2inst1.EMPLOYEE" from a DB2 Command Line Processor (CLP), as the statement could be run without any problem. But why does the error message say that the "db2inst1.EMPLOYEE" is an undefined name? To obtain more information about the problem nature, the trace of DB2 Universal JDBC Driver could be utilized.

Example 9-7: SystemOut.log investigation after changing currentSchema property

 [11/24/03 21:18:10:651 PST] 26251959 DataAccessReq E PMGR6022E: Error using adapter to create or execute an Interaction. com.ibm.ws.rsadapter.cci.WSInteractionImpl@47c35950 [11/24/03 21:18:10:801 PST] 26251959 DataAccessReq E PMGR0000E: Call stack: com.ibm.ws.ejbpersistence.utilpm.PersistenceManagerException: PMGR6022E: Error using adapter to create or execute an Interaction. com.ibm.ws.rsadapter.cci.WSInteractionImpl@47c35950    at com.ibm.ws.ejbpersistence.dataaccess.DataAccessRequestImpl.executeOneRowFBPK(Unknown Source) ... ---- Begin backtrace for nested exception com.ibm.db2.jcc.a.SqlException: "db2inst1.EMPLOYEE" is an undefined name. ---- Begin backtrace for Nested Throwables com.ibm.db2.jcc.a.SqlException: An error occurred during implicit system action type "2". Information returned for the error includes SQLCODE "-204", SQLSTATE "42704" and message tokens "db2inst1.EMPLOYEE". ...

Similar to what we just did to change the currentSchema property, in the same Custom Properties page we could change the following properties to activate the JDBC trace:

traceLevel: -1
traceFile: /usr/WebSphere/AppServer/logs/trace.out

After applying the changes made above and restarting the application server, then we redo the testing and check the trace output file trace.out, as shown in the example below:

Example 9-8: JDBC Trace investigation

 PART 1: [ibm][db2][jcc][Time:1069738090654][Thread:Servlet.Engine.Transports : 0][DB2ConnectionPoolDataSource@2abfd88c] getPooledConnection () called [ibm][db2][jcc] BEGIN TRACE_DRIVER_CONFIGURATION [ibm][db2][jcc] Driver: IBM DB2 JDBC Universal Driver Architecture 1.5.54 ... [ibm][db2][jcc] User's account name = root [ibm][db2][jcc] User's home directory = / [ibm][db2][jcc] User's current working directory = /usr/WebSphere/AppServer [ibm][db2][jcc] END TRACE_DRIVER_CONFIGURATION PART 2: [ibm][db2][jcc] BEGIN TRACE_CONNECTS [ibm][db2][jcc] Using properties: { password=******, dataSourceName=null, traceDirectory=null, portNumber=50000, fullyMaterializeLobData=true, currentSQLID=null, securityMechanism=3, clientAccountingInformation=null, cliSchema=null, resultSetHoldability=2, serverName=null, currentPackageSet=null, loginTimeout=0, planName=null, traceFile=/usr/WebSphere/AppServer/logs/trace.out, currentFunctionPath=null, kerberosServerPrincipal=null, retrieveMessagesFromServerOnGetMessage=true, cursorSensitivity=0, currentPackagePath=null, currentSchema=db2inst1, clientUser=null, keepDynamic=0, driverType=2, clientApplicationInformation=null, description=null, readOnly=false, clientWorkstation=null, pkList=null, deferPrepares=true, jdbcCollection=NULLID, databaseName=rsample, traceLevel=-1, traceFileAppend=false, user=db2itso } [ibm][db2][jcc] END TRACE_CONNECTS ... [ibm][db2][jcc][Time:1069738091595][Thread:Servlet.Engine.Transports : 0][Connection@480e9893] setCurrentSchema () called ... PART 3: [ibm][db2][jcc][Connection@480e9893] BEGIN TRACE_CONNECTS [ibm][db2][jcc][Connection@480e9893] Successfully connected to server jdbc:db2:rsample [ibm][db2][jcc][Connection@480e9893] User: db2itso [ibm][db2][jcc][Connection@480e9893] Database product name: DB2/6000 [ibm][db2][jcc][Connection@480e9893] Database product version: SQL08014 [ibm][db2][jcc][Connection@480e9893] Driver name: IBM DB2 JDBC Universal Driver Architecture [ibm][db2][jcc][Connection@480e9893] Driver version: 1.5.54 [ibm][db2][jcc][Connection@480e9893] DB2 Application Correlator: G901265A.OAC4.0D9415052811 [ibm][db2][jcc][Connection@480e9893] END TRACE_CONNECTS ... [ibm][db2][jcc][Time:1069738091720][Thread:Servlet.Engine.Transports : 0][Connection@480e9893] prepareStatement (SELECT T1.EMPNO, T1.FIRSTNME, T1.MIDINIT, T1.LASTNAME, T1.PHONENO, T1.HIREDATE, T1.JOB, T1.EDLEVEL, T1.SEX, T1.BIRTHDATE, T1.SALARY, T1.BONUS, T1.COMM, T1.WORKDEPT FROM EMPLOYEE T1 WHERE T1.EMPNO = ?, 1003, 1007) called ... [ibm][db2][jcc][Time:1069738091739][Thread:Servlet.Engine.Transports : 0][PreparedStatement@151f188c] executeQuery () called ... PART 4: [time:1069738091767][thread:Servlet.Engine.Transports : 0][tracepoint:5][Request.flush]         SEND BUFFER: EXCSQLIMM              (ASCII)           (EBCDIC)         0 1 2 3 4 5 6 7   8 9 A B C D E F   0123456789ABCDEF  0123456789ABCDEF  0000   0053D0510001004D  200A0044211399A2  .S.Q...M ..D!...  ..}....(......rs  0010   8194979385404040  4040404040404040  .....@@@@@@@@@@@  ample  0020   D5E4D3D3C9C44040  4040404040404040  ......@@@@@@@@@@  NULLID  0030   4040E2E8E2E2D5F2  F0F0404040404040  @@........@@@@@@    SYSSN200  0040   404040405359534C  564C303100410005  @@@@SYSLVL01.A..      ...<.<......  0050   2105F1002FD04300  0100292414000000  !.../.C...)$....  ..1..}..........  0060   001F534554204355  5252454E54205343  ..SET CURRENT SC    ...........+....  0070   48454D41203D2027  646232696E737431  HEMA = 'db2inst1  ..(.........>...  0080   27FF004ED0510002  0048201400442113  '..N.Q...H ..D!.  ...+}...........  0090   99A2819497938540  4040404040404040  .......@@@@@@@@@  rsample  00A0   4040D5E4D3D3C9C4  4040404040404040  @@......@@@@@@@@    NULLID  00B0   40404040E2E8E2E2  D5F2F0F040404040  @@@@........@@@@      SYSSN200  00C0   4040404040405359  534C564C30310041  @@@@@@SYSLVL01.A        ...<.<....  00D0   002AD05300020024  2414000000001A53  .*.S...$$......S  ..}.............  00E0   455420434C49454E  5420555345524944  ET CLIENT USERID  ....<..+........  00F0   276462326974736F  27FF002DD0530002  'db2itso'..-.S..  .......?....}...  0100   0027241400000000  1D53455420434C49  .'$......SET CLI  ..............<.  0110   454E542057524B53  544E4E414D45276B  ENT WRKSTNNAME'k  .+.......++.(..,  0120   616E61676127FF00   48D0530002004224 anaga'..H.S...B$ />/./....}......  0130   1400000000385345   5420434C49454E54 .....8SET CLIENT ...........<..+.  0140   204150504C4E414D   45276462326A6363 APPLNAME'db2jcc  ..&&<+.(.....|..  0150   536572766C65742E   456E67696E652E54 Servlet.Engine.T ....%....>..>...  0160   72616E73706F7274   73203A203027FF00 ransports : 0'.. ./>..?..........  0170   5FD0430002005924   14000000004F5345 _.C...Y$.....OSE .}...........|..  0180   5420434C49454E54   20414343544E4727 T CLIENT ACCTNG' ...<..+......+..  0190   4A43433031303530   4A564D2020202020 JCC01050JVM      ..........(.....  01A0   2020202020202020   20206462326A6363           db2jcc .............|..  01B0   536572766C65742E   456E67696E656462 Servlet.Enginedb ....%....>..>...  01C0   326974736F20272C   5827303027FF0058 2itso ',X'00'..X ....?...........  01D0   D05100030052200D   0044211399A28194 .Q...R ..D!..... }...........rsam  01E0   9793854040404040   404040404040D5E4 ...@@@@@@@@@@@.. ple           NU  01F0   D3D3C9C440404040   4040404040404040 ....@@@@@@@@@@@@ LLID  0200   E2E8E2E2D5F3F0F0   4040404040404040 ........@@@@@@@@ SYSSN300  0210   40405359534C564C   3031000100052116 @@SYSLVL01....!. ...<.<........  0220   F1000521460400D8   D043000300D22414 ...!F....C....$. 1......Q}....K..  0230   00000000C853454C   4543542054312E45 .....SELECT T1.E ....H..<........  0240   4D504E4F2C205431   2E46495253544E4D MPNO, T1.FIRSTNM (&+|..........+(  0250   452C2054312E4D49   44494E49542C2054 E, T1.MIDINIT, T ......(...+.....  0260   312E4C4153544E41   4D452C2054312E50 1.LASTNAME, T1.P ..<...+.(......&  0270   484F4E454E4F2C20   54312E4849524544 HONENO, T1.HIRED .|+.+|..........  0280   4154452C2054312E   4A4F422C2054312E ATE, T1.JOB, T1. .........|......  0290   45444C4556454C2C   2054312E5345582C EDLEVEL, T1.SEX, ..<...<.........  02A0   2054312E42495254   48444154452C2054 T1.BIRTHDATE, T  ................  02B0   312E53414C415259   2C2054312E424F4E 1.SALARY, T1.BON ....<.........|+  02C0   55532C2054312E43   4F4D4D2C2054312E US, T1.COMM, T1. ........|((.....  02D0   574F524B44455054   2046524F4D20454D WORKDEPT FROM EM .|....&....|(..(  02E0   504C4F5945452020   5431205748455245 PLOYEE T1 WHERE  &<|.............  02F0   2054312E454D504E   4F203D203FFF0053 T1.EMPNO = ?..S  .....(&+|.......  0300   D0410004004D2008   0044211399A28194 .A...M ..D!..... }....(......rsam  0310   9793854040404040   404040404040D5E4 ...@@@@@@@@@@@.. ple           NU  0320   D3D3C9C440404040   4040404040404040 ....@@@@@@@@@@@@ LLID  0330   E2E8E2E2D5F3F0F0   4040404040404040 ........@@@@@@@@ SYSSN300  0340   40405359534C564C   3031000100052146 @@SYSLVL01....!F   ...<.<........  0350   05005BD051000500   55200C0044211399 ..[.Q...U ..D!.. ..$}...........r  0360   A281949793854040   4040404040404040 ......@@@@@@@@@@ sample  0370   40D5E4D3D3C9C440   4040404040404040 @......@@@@@@@@@  NULLID  0380   404040E2E8E2E2D5   F3F0F04040404040 @@@........@@@@@    SYSSN300  0390   4040404040535953   4C564C3031000100 @@@@@SYSLVL01...      ...<.<.....  03A0   08211400007FFF00   05215D010028D003 .!.......!]..(.. ....."....)...}.  03B0   0005002224120010   00100676D0417FFF ..."$......v.A.. ............}.".  03C0   0671E4D00001000E   147A000000063030 .q.......z....00 ..U}.....:......  03D0   30303130                            0010             .... ... PART 5: [ibm][db2][jcc] BEGIN TRACE_DIAGNOSTICS [ibm][db2][jcc][SQLException@5a20988c] java.sql.SQLException [ibm][db2][jcc][SQLException@5a20988c][Sqlca@5d91d88c] DB2 SQLCA from server [ibm][db2][jcc][SQLException@5a20988c][Sqlca@5d91d88c] SqlCode        = -204 [ibm][db2][jcc][SQLException@5a20988c][Sqlca@5d91d88c] SqlErrd        = { -2146303874, 0, 0, 0, -10, 0 } [ibm][db2][jcc][SQLException@5a20988c][Sqlca@5d91d88c] SqlErrmc       = db2inst1.EMPLOYEE [ibm][db2][jcc][SQLException@5a20988c][Sqlca@5d91d88c] SqlErrmcTokens = { db2inst1.EMPLOYEE } [ibm][db2][jcc][SQLException@5a20988c][Sqlca@5d91d88c] SqlErrp        = SQLNQ1FB [ibm][db2][jcc][SQLException@5a20988c][Sqlca@5d91d88c] SqlState       = 42704 [ibm][db2][jcc][SQLException@5a20988c][Sqlca@5d91d88c] SqlWarn        = [ibm][db2][jcc][SQLException@5a20988c] SQL state  = 42704 [ibm][db2][jcc][SQLException@5a20988c] Error code = -204 [ibm][db2][jcc][SQLException@5a20988c] Tokens     = db2inst1.EMPLOYEE [ibm][db2][jcc][SQLException@5a20988c] Stack trace follows com.ibm.db2.jcc.a.SqlException: db2inst1.EMPLOYEE    at com.ibm.db2.jcc.a.cq.e(cq.java:1482)    at com.ibm.db2.jcc.a.cq.a(cq.java:1092)    at com.ibm.db2.jcc.c.bc.g(bc.java:138)    at com.ibm.db2.jcc.c.bc.a(bc.java:42)    at com.ibm.db2.jcc.c.q.a(q.java:31)    at com.ibm.db2.jcc.c.bo.g(bo.java:145)    at com.ibm.db2.jcc.a.cq.k(cq.java:1072)    at com.ibm.db2.jcc.a.cr.Z(cr.java:1495)    at com.ibm.db2.jcc.a.cr.d(cr.java:1918)    at com.ibm.db2.jcc.a.cr.Q(cr.java:408)    at com.ibm.db2.jcc.a.cr.executeQuery(cr.java:391)    at com.db2int.ejb.websphere_deploy.DB2UDBNT_V8_1.EmployeeBeanFunctionSet_52940fe3.FindByPrimaryKey (Unknown Source)    at com.db2int.ejb.websphere_deploy.DB2UDBNT_V8_1.EmployeeBeanFunctionSet_52940fe3.execute(Unknown Source)    at com.ibm.ws.rsadapter.cci.WSInteractionImpl.execute(WSInteractionImpl.java:347)    at com.ibm.ws.rsadapter.cci.WSRelationalRAAdapter.executeFinder(WSRelationalRAAdapter.java:274)    at com.ibm.ws.ejbpersistence.dataaccess.DataAccessRequestImpl.executeOneRowFBPK(Unknown Source) ...

Some explanations for the different parts, which are highlighted in the above example, are provided as below:

PART 1 contains some environment information, such as the IBM DB2 JDBC Universal Driver version, Operating System information, JVM information, CLASSPATH setting, and so on. Only part of the information is shown in our example.
PART 2 shows you the current setting of properties for the DB2 data source. There are some properties highlighted in the example that you probably are interested in, for example, currentSchema=db2inst1, and you also could find that the function setCurrentSchema () is called to make the currentSchema property change.
PART 3 indicates that a connection has been made to the sample database (see the message "Successfully connected to server jdbc:db2:rsample"). Messages also shown here include the SQL statement that is executed on the DB2 side.

PART 4 shows you the send buffer content. From the ASCII translation of the send buffer, you could find that the corresponding command that is used to make the currentSchema change:

 SET CURRENT SCHEMA = 'db2inst1'

The statement above is the problem source, as the schema name is case sensitive when it is enclosed in the single quotes. The problem could be easily reproduced within the DB2 CLP, as shown in the example below.

Example 9-9: Reproducing problem using similar SQL statements

 atlantic / # su - db2itso $ db2 connect to sample user db2itso using ibmdb2 $ db2 "set current schema = 'db2inst1'" DB20000I  The SQL command completed successfully. $ db2 "select * from employee" SQL0204N  "db2inst1.EMPLOYEE" is an undefined name.  SQLSTATE=42704

As the current schema value is enclosed by the single quotes, it would not be converted to upper case when running inside DB2. But the fully qualified table object name for the EMPLOYEE table is "DB2INST1.EMPLOYEE", not "db2inst1.EMPLOYEE". This is the reason that why the error message says "db2inst1.EMPLOYEE" is an undefined name.

Note

The reason why you could run the SQL statement "select * from db2inst1.EMPLOYEE" successfully in the DB2CLP is that the DB2 engine helps you to convert the lower case schema name to an upper case schema name automatically, as there are no quotation marks surrounding the schema name.

PART 5 shows you the trace diagnostics, including the output for most tokens of the SQL Communications Area (SQLCA), stack traceback for the function calls, etc.

Now we know that the root cause of the failed application running is due to the lower case schema name. We just need to change the schema name to upper case, that is, DB2INST1, in the Custom Properties configuration page for the DB2 data source in WebSphere Application Server Administrative Console. After applying the change and restarting the application server, all the services provided by the AccessEmployee application, including the "Query Employee" service, can be run successfully.

Up to this point, we have successfully resolved the problem by investigating both WebSphere Application Server logs and DB2 Universal JDBC Driver traces. The steps discussed above could be applied to most problems that involve both WAS and DB2 UDB. Also please remember to turn off the trace once the problem troubleshooting work is done.

There are also many other methods to resolve this problem. For example, you could create aliases for related tables from the DB2 UDB side so that you could access DB2INST1.<table_name> via the alias DB2ITSO.<table_name>, or you could modify the schema for the mapping between EJB and RDB, etc. As the purpose of this scenario is just to show you some basic methodologies that you can utilize for a general problem troubleshooting situation, so we only discuss the method of changing currentSchema property of the data source to resolve the problem. If you encounter a similar "wrong schema" problem in your environment, you can have your own method that is most appropriate for your runtime environment.

9.3.2 Concurrency scenario

Performance degradation due to inappropriate concurrency settings of an application or database is a very common problem. During our lab environment test we have also found a similar problem and this scenario is dedicated to resolve that issue. The root cause of the problem demonstrated here is one where the default lock settings of the TRADE3DB database are creating a significant locking contention on the DB2 UDB server, which results in poor performance/response time.

We are using a TRADE3 application for our scenarios which is the third generation WebSphere end-to-end benchmark and performance sample application. TRADE3 application is developed using the WebSphere Application Developer (WSAD) development tool. This includes development of EJBs, servlets and visual development of JSPs and Java clients.

Note

Refer to the Appendix A, "Trade3 application" on page 401, for a complete understanding of the TRADE3 application and the architecture used.

Simulate an environment

In order to simulate this environment we are using the TRADE3 application. We applied a workload through Web Performance Tools (WPT) application stress, which simulates an environment where 40 users log in and make a request for the stock quotes, and then repeatedly request the same stock quotes again and again. This simulates a high volume trading application where multiple customers perform stock quote requests. Due to applied stress number of user start complaining about poor response time, so a task is assigned to a performance problem determination team to look into this matter and come up with the root cause and an appropriate solution for the same.

Table 9-2 is the detail of hardware and software infrastructure used by TRADE3 application in our lab environment. The DB2 UDB database we created is call TRADE3DB.

Table 9-2: Infrastructure for concurrency scenario
Hardware/Software	Configuration
WebSphere Application Server 5.0.2	Hostname: HELIUM IP: 9.1.38.185 x-Series with Windows 2000 Server 4 GB RAM Dual CPU: Intel Pentium® III CPU family 1133 MHz Two 16 GB HDD Application: TRADE3
DB2 UDB 8.1.3	Hostname: GALLIUM IP: 9.1.38.184 x-Series with Windows 2000 Server 4 GB RAM Dual CPU: Intel Pentium III CPU family 1133 MHz Two 16 GB HDD Database: TRADE3DB
Web Performance Tools (WPT)	Hostname: LOCHNESS IP: 9.1.38.167 x-Series with Windows 2000 2 GB RAM Intel Pentium 4 CPU 2.40 GHz 37.2 GB HDD WPT application: Stress

Analyze the problem and determine the root cause

As we are working in our controlled environment, so we have ignored the real-world possibilities because it is almost impossible to cover all of them within the scope of this book. We are diagnosing some of the key areas to depict this performance degradation problem. A few of the most common areas are:

Memory constraints
Disk I/O contention
CPU resources
Application server/application issues
Database configuration/SQL

Let us start analyzing them one by one to drill down the root cause of the problem.

Memory constraints

In order to verify memory usage, CPU consumption, and disk I/O statistics, we use Performance Monitoring tools of the Windows perfmon and Windows Task Manager. We check these statistics both on WebSphere Application Server (HELIUM) and DB2 UDB server (GALLIUM) by capturing the information through the perfmon tool and Task Manager. Below are the steps to configure these tools:
1. To start the perfmon, click Start -> Run. Enter perfmon, or enter Control Panel. Select Administrative Tools -> Performance.
2. Select Performance Logs and Alerts from the left pane of the window.
3. Right-click Counter logs and select New Log Settings.
4. Give any name (for example, Memory Overview) and click OK, as shown in Figure 9-4.
  
  Figure 9-4: perfmon settings for Memory Overview
5. Click ADD to add a performance counter. Select the counters from the list and click ADD, as shown in Figure 9-5.
  
  Figure 9-5: prefmon settings for Memory Overview
6. Close the windows. You will be sent back to the initial screen and click OK to close the window.
7. Double-click Counter Logs. If the newly created log is not already running, (check the green icon as shown in Figure 9-6 on page 376) then right-click the log and select Start.
  
  Figure 9-6: prefmon green signal
8. Select System Monitor from the tree, then select the View Log File Data to see the results.
The rest of the activity we track from the Windows Task Manager to make it more informative and appropriate to our settings.
1. Open the Windows Task Manager and click the Processes tab. Chose the View menu and click Select columns and finally check the check boxes for the desired options, as shown in Figure 9-7 on page 377
  
  Figure 9-7: Task Manager settings
2. Select the appropriate columns for the Task Manager and click OK.
We considered whether system swapping due to potential memory constraints was the cause of the problem. The statistics shown in Figure 9-8 show that plenty of free memory is available for use, so it is not a memory problem. Let us continue the analysis with the rest of the key areas.

Figure 9-8: Memory usage
Disk I/O contention

We checked the disk I/O activity through the Task Manager. We have not found much contention on the I/O activity, as shown in Figure 9-9 on page 379, on the GALLIUM server.

Figure 9-9: I/O activity for concurrency scenario

We thought that the possible cause of the poor response time at peak periods could be due to disk I/O contention, but we found that the disk I/O rate is within an acceptable range. The output of Figure 9-9 clearly shows that there is not an issue with I/O contention, so we must continue with rest of the possibilities.
CPU resources

In order to check the fate of CPU consumption let us verify it with the Task Manager.

The above statistics in CPU column of the output shown in Figure 9-10 clearly indicate that the percentage of CPU consumption is under moderate limits, so CPU is not a problem in this case. We continue our diagnostic process with WebSphere Application Server and DB2 UDB.

Figure 9-10: CPU consumption for concurrency scenario
Application server/application issues

While looking at the problem through the performance analyzing tool provided by WebSphere - Tivoli Performance Viewer we found that response time is too high, as shown in Figure 9-11 on page 381.

Figure 9-11: High response time

We checked with WAS logs but could not find much there. The final area left to be explored is the DB2 UDB server, so let us examine the database activity.

Database configuration/SQL

We analyzed the DB2 diagnostic file, DB2DIAG.LOG, and found a high number of lock escalation, especially against database table DB2ADMIN.HOLDINGEJB to attain a intent share lock. Example 9-10 shows the few messages thrown in DB2DIAG.LOG file.

Example 9-10: Information from the DB2 diagnostic log DB2DIAG.LOG

 2003-11-25-18.17.01.968001   Instance:DB2 Node:000 PID:3476(db2syscs.exe)   TID:4056   Appid:G90126B9.A704.0158C6010957 data management  sqldEscalateLocks Probe:3   Database:TRADE3DB ADM5502W The escalation of "109" locks on table "DB2ADMIN.HOLDINGEJB" to lock intent "S" was successful.

The diagnostic level parameter DIAGLEVEL default value of 3 is appropriate for routine monitoring. We have changed this value to 4, which is the highest level of diagnostics. This is because the routine monitoring level does not provide us with the information required to perform proper problem diagnosis.

Then we verified this fact with snapshot for the TRADE3 database, as shown in Example 9-11.

Example 9-11: Snapshot of TRADE3DB for concurrency scenario

 db2 get snapshot for database on trade3db               Database Snapshot Database name                              = TRADE3DB Database path                              = C:\DB2\NODE0000\SQL Input database alias                       = TRADE3DB Database status                            = Active Catalog database partition number          = 0 Catalog network node name                  = Operating system running at database server= NT Location of the database                   = Local First database connect timestamp           = 11-25-2003 18:13:34 Last reset timestamp                       = Last backup timestamp                      = Snapshot timestamp                         = 11-25-2003 18:19:20 Locks held currently                       = 123 Lock waits                                 = 270 Lock list memory in use (Bytes)            = 19188 Deadlocks detected                         = 0 Lock escalations                           = 747 Exclusive lock escalations                 = 0 Agents currently waiting on locks          = 1 Lock Timeouts                              = 5 Number of indoubt transactions             = 0

Bold values in Example 9-11 also verify the same fact of the lock escalation. In order to further confirm the above cause we checked it through DB2 Command Line Process output, as shown in Figure 9-12 on page 383.

click to expand
Figure 9-12: CLP showing UOW waiting

Apart from the above indications, we checked our system Event Viewer Application log and found the message as shown in Figure 9-13 on page 384.

click to expand
Figure 9-13: Lock Escalation Event log

All of the above facts lead to a DB2 concurrency issue, so the root cause of the problem is inappropriate settings of locking parameters.

Apply best practices

Referring to the results shown in Example 9-11 on page 382, snapshot clearly points to of DB2 lock contention as follows:

270 Lock Waits

This corresponds to an application having to wait for another user to release a lock on a required resource before it can continue to execute.
747 Lock Escalations

According to this factor the total number of locks held by the application exceeded the value specified in LOCKLIST parameter, and therefore an escalation of the locks took place. Such an escalation has the potential to significantly impact concurrency among users requesting incompatible locks on shared resources. Exclusive lock escalations are the most restrictive of such locks, essentially single threading execution of applications contending for the same resource.

We issued the following command to determine the TRADE3DB database configuration parameters relating to locking and output, as shown in Example 9-12.

Example 9-12: Database configuration setting of TRADE3DB

 db2 get db cfg for trade3db #Trimmed results of db config Percent. of lock lists per application      (MAXLOCKS) = 10 Lock timeout (sec)                        (LOCKTIMEOUT) = -1

The LOCKLIST parameter value can be calculated by examining the memory requirements of each lock, the maximum number of concurrent connections, and the average number of concurrent locks held by an application. Each initial lock on a DB2 object consumes 72 bytes of memory, with each additional lock on the same object consuming 36 bytes. The LOCKLIST parameter value is specified in 4-KB blocks, and may be computed as follows:

 LOCKLIST = Maximum Conn App * Avg no of locks * 72/4096

The MAXLOCKS defines the percentage of LOCKLIST memory that can be assigned to single application at a given time before lock escalation occurs.

The considerations when setting maxlocks is to use it in conjunction with the size of the lock list (LOCKLIST). The actual limit of the number of locks held by an application before lock escalation occurs is:

 MAXLOCKS * LOCKLIST * 4 096 / (100 * 36)

Where 4,096 is the number of bytes in a page, 100 is the largest percentage value allowed for maxlocks, and 36 is the number of bytes per lock. If you know that one of your applications requires 1000 locks, and you do not want lock escalation to occur, then you should choose values for maxlocks and locklist in this formula so that the result is greater than 1000.

The LOCKTIMEOUT parameter specifies the number of seconds that an application waits to obtain a lock. This helps avoid deadlocks for applications. In a transaction processing (OLTP) environment, you can use an initial starting value of 20 to 30 seconds.

If we see the values of MAXLOCKS, LOCKTIMEOUT database parameters of TRADE3DB, as shown in Example 9-12, they are not optimally tuned. Let us change these parameters as shown in Example 9-13 on page 386.

Example 9-13: Update the configuration

 db2 update db cfg using MAXLOCKS 75 DB20000I The UPDATE DATABASE CONFIGURATION command completed successfully. db2 update db cfg using LOCKTIMEOUT 20 DB20000I  The UPDATE DATABASE CONFIGURATION command completed successfully. SQL1363W One or more of the parameters submitted for immediate modification were not changed dynamically. For these configuration parameters, all applications must disconnect from this database before the changes become effective. db2stop force 11-25-2003 19:04:26    0   0   SQL1064N DB2STOP processing was successful. SQL1064N DB2STOP processing was successful. db2start 11-25-2003 19:04:41    0   0   SQL1063N DB2START processing was successful. SQL1063N DB2START processing was successful. db2 get db cfg for trade3db #Trimmed results of db config Percent. of lock lists per application     (MAXLOCKS) = 75 Lock timeout (sec)                      (LOCKTIMEOUT) = 20

After changing these parameters we again applied the workload through the WPT stress application and examined the DB2DIAG.LOG file, and there are no further messages of lock escalation. In order to confirm it further, let us have a look at the snapshot shown in Example 9-14.

Example 9-14: Snapshot results after resolving concurrency problem

 db2 get snapshot for database on trade3db #Trimmed results of db config               Database Snapshot Database name                            = TRADE3DB Database path                            = C:\DB2\NODE0000\SQL Input database alias                     = TRADE3DB Database status                          = Active Locks held currently                     = 0 Lock waits                               = 0 Lock list memory in use (Bytes)          = 9000 Deadlocks detected                       = 0 Lock escalations                         = 0 Exclusive lock escalations               = 0 Agents currently waiting on locks        = 0 Lock Timeouts                            = 5 Number of indoubt transactions           = 0

This confirms that now we do not have concurrency problems.

9.3.3 High I/O consumption scenario

Performance tuning is an iterative process, and you encounter the situation when you just resolved the one problem and stuck with the next one. So after resolving the problem of concurrency we again simulated the environment via the TRADE3 application by putting a stress through the stress application of WPT. We still found the performance issues, and in order to eradicate the problem we once again started our problem determination and tuning process.

Analyze the problem and determine the root cause

As discussed in previous scenarios, we are evaluating only the key areas to determine a problem. Also here we are eliminating the scope of the concurrency problem, as it is already resolved in previous scenario. Below are the areas which we are going to explore:

Memory constraints
CPU resources
Disk I/O contention
Application server/application issues
Database configuration/SQL

Let us start analyzing them one by one to drill down to the root cause of the problem.

Memory constraints

We checked both DB2 UDB and WAS servers. None of them is showing any memory issues. We have confirmed this with the help of the prefmon tool of Windows.
CPU resources

In order to check the fate of CPU consumption we have verified with the Task Manager. The statistics of CPU indicate that the percentage of CPU consumption is under moderate limits, so CPU is not a problem in this case.
Disk I/O contention

The output of the statistics shown in Figure 9-14 on page 388 clearly indicate that there is high I/O activity is happening at the database server, which is probably causing poor performance of the system.

Figure 9-14: Disk I/O contention
Application server/application-related issues

We checked the O/S resources and processes related to Websphere there is no such indication of contention among resources. Also, the previous diagnosis shows that some high I/O activity is happening at the database server. So we continue with the DB2 UDB server investigation.

Database configuration/SQL

Below are the messages thrown by DB2 in the diagnostic file indicating that there is a BUFFERPOOL-related problem (Example 9-15).

Example 9-15: db2diag.log file to evaluate high CPU consumption scenario

 2003-11-25-02.38.38.187000   Instance:DB2   Node:000 PID:2168(db2syscs.exe)   TID:2468   Appid:G90126B9.C904.015645020606 buffer pool services  sqlbKeepPagesPinned Probe:20   Database:TRADE3DB ADM6071I  The maximum number of pinned pages allowed concurrently was reached in buffer pool "IBMDEFAULTBP" (ID "1").  As a result performance may not be optimal. Increasing the SORTHEAP database configuration parameter may reduce the chances of this condition occurring in the future. 2003-11-25-02.38.38.218000  Instance:DB2   Node:000 PID:2168(db2syscs.exe)  TID:2468   Appid:G90126B9.C904.015645020606 buffer pool services  sqlbKeepPagesPinned Probe:20  Database:TRADE3DB BP 1, requested pages to pin = 1, actual pinned = 0

The above analysis of the db2diag.log file in Example 9-15 on page 388 and snapshot in Example 9-16 on page 389 clearly indicate that the root cause of the problem is inadequate sizing of the bufferpool. The size of the default bufferpool IBMDEFAULTBP of the TRADE3DB database is creating a problem. This configuration of DB2 UDB results in a very high rate of I/O on the system when we are applying a workload through the TRADE3 application.

Example 9-16: Snapshot to evaluate high CPU consumption scenario

 update dbm configuration using DFT_MON_BUFPOOL ON db2 get snapshot for bufferpools on tarde3db #Trimmed results of db config Buffer pool data logical reads              = 91411 Buffer pool data physical reads             = 181 Asynchronous pool data page reads           = 9 Buffer pool data writes                     = 245 Asynchronous pool data page writes          = 78 Buffer pool index logical reads             = 151390 Buffer pool index physical reads            = 203 Asynchronous pool index page reads          = 1 Buffer pool index writes                    = 246 Asynchronous pool index page writes         = 61 Total buffer pool read time (ms)            = 9 Total buffer pool write time (ms)           = 20943 Total elapsed asynchronous read time        = 0 Total elapsed asynchronous write time       = 14968 Asynchronous data read requests             = 4 Asynchronous index read requests            = 1 LSN Gap cleaner triggers                    = 8 Dirty page steal cleaner triggers           = 0 Dirty page threshold cleaner triggers       = 0 Time waited for prefetch (ms)               = 2 Unread prefetch pages                       = 0 Direct reads                                = 528 Direct writes                               = 136 Direct read requests                        = 103 Direct write requests                       = 36 Direct reads elapsed time (ms)              = 20 Direct write elapsed time (ms)              = 323 Database files closed                       = 0 Data pages copied to extended storage       = 0 Index pages copied to extended storage      = 0 Data pages copied from extended storage     = 0 Index pages copied from extended storage    = 0

Apply best practices

A buffer pool is a memory used to cache table and index data pages when they are being read from disk or being modified. The buffer pool improves database system performance by allowing data to be accessed from memory instead of from disk. Accessing it from the memory is much faster than disk access. The less often the database manager needs to read from or written to a disk, the better the performance.

As pages are read and modified, they accumulate in the database buffer pool. When a page is read in, it is read into a buffer pool page. If the buffer pool is full of modified pages, one of these modified pages must be written out to the disk before the new page can be read in. When the size of the bufferpool is small, a lot of disk I/O happens, which leads to performance degradation.

In the snapshot output shown in Example 9-17, look for the following logical reads and physical reads so that you can calculate the buffer pool hit ratio, which can help you tune your buffer pools.

Example 9-17: Buffer pool values

 Buffer pool data logical reads             = 91411 Buffer pool data physical reads            = 181 Buffer pool data writes                    = 245 Buffer pool index logical reads            = 151390 Buffer pool index physical reads           = 203

The buffer pool hit ratio indicates the percentage of time the database manager needs to not load a page from disk to fulfill the request, as the page is already in the buffer pool. The higher the buffer pool hit ratio, the lower the frequency of disk I/O.

The above calculation shows all of the pages that has been cached by the buffer pool. Ideally this ratio should be 95 percent or above. So in order to increase the buffer pool hit ratio, increase the buffer pool size.

Take the formula mentioned in "System tuning" on page 272 while calculating the size of the bufferpool and review the results using the snapshot information of your environment. See the DB2 UDB Administration Guide - Performance, SC09-4821, for more information about bufferpool tuning.

We have identified the root cause of the problem as too small of the size of the bufferpool. Let us increase the size of the buffer pool to remove this problem using command shown in Example 9-18.

Example 9-18: Bufferpool settings

 db2 alter bufferpool ibmdefaultbp immediate size 2000 DB20000I  The SQL command completed successfully. db2stop force 11-26-2003 12:04:26     0   0   SQL1064N  DB2STOP processing was successful. SQL1064N  DB2STOP processing was successful. db2start 11-26-2003 12:04:41     0   0   SQL1063N  DB2START processing was successful. SQL1063N  DB2START processing was successful. db2 get snapshot for database manager

After changing these parameters shown in Example 9-18, we again applied the workload through the WPT stress application and examined the DB2DIAG.LOG file. Now there is no longer a buffer pool problem.

9.3.4 High CPU utilization scenario

This scenario describes the problem of performance degradation due to high CPU consumption on both the DB2 UDB and WAS server. There can be many reasons of high consumption of CPU. It may be a system problem, DB2 UDB configuration problem, WAS configuration it or might be inefficient query design. So we are going to solve this high CPU consumption by following our problem determination methodology.

Simulate an environment

We have simulated this environment on DB2 UDB server by changing few configuration parameters, as shown in Example 9-19. We are using the TRADE3 application and the TRADE3DB database with Web Performance Tools (WPT) application stress. The details of hardware and software infrastructure used for this environment are the same as provided in the previous scenario.

Example 9-19: Database configuration change for high CPU consumption scenario

 su - db2inst1 db2 update dbm cfg using intra_parallel yes DB20000I  The UPDATE DATABASE MANAGER CONFIGURATION command completed successfully. SQL1362W  One or more of the parameters submitted for immediate modification were not changed dynamically. Client changes will not be effective until the next time the application is started or the TERMINATE command has been issued. Server changes will not be effective until the next DB2START command. db2 update dbm cfg using num_poolagents 50 maxagents 500 DB20000I  The UPDATE DATABASE MANAGER CONFIGURATION command completed successfully. SQL1362W  One or more of the parameters submitted for immediate modification were not changed dynamically. Client changes will not be effective until the next time the application is started or the TERMINATE command has been issued. Server changes will not be effective until the next DB2START command. db2 update dbm cfg using DIAGLEVEL 4 DB20000I  The UPDATE DATABASE MANAGER CONFIGURATION command completed successfully. db2stop force 11-21-2003 19:35:19     0   0   SQL1064N  DB2STOP processing was successful. SQL1064N  DB2STOP processing was successful. db2start 11-21-2003 19:36:19     0   0   SQL1063N  DB2START processing was successful. SQL1063N  DB2START processing was successful. db2 update db cfg for sample using dft_degree 16

In Example 9-19 on page 391 we have changed a few parameters to replicate the environment of high CPU consumption, especially database agents. This configuration of DB2 UDB results in a very high rate of agent and process creation on the system when appropriate workload is applied by the TRADE3 application, which results in high CPU consumption.

Analyze the problem and determine the root cause

As we are working in our controlled environment, so we are diagnosing some of the key areas to depict this performance degradation problem, as shown below:

Memory constraints
Disk I/O contention
CPU resources
Application server/application issues
Database configuration/SQL

Let us start analyzing them one by one to drill down to the root cause of the problem.

Memory constraints

In order to verify this concerned area we used the perfmon and Task Manger tools of Windows. The statistics shown by the tools do not lead to a memory problem. It seems to be some other problem. Let us continue the analysis with the rest of the areas.
Disk I/O contention

We have ruled out the possibility of I/O contention, as we have already resolved this issue in a previous scenario. We must continue on with the rest of the possibilities.
CPU resources

Both DB2 UDB and WAS server are showing high CPU consumption. The statistics of CPU consumption of Helium - WAS server, shown in Figure 9-15 on page 394, clearly indicate that CPU is being used a lot.

Figure 9-15: CPU analysis for high CPU consumption scenario
Application server/application related issues

Let us further drill down to find out which process is consuming a lot of processing power of the CPU on Helium. We have checked the configuration of Websphere Application Server through the Tivoli Performance Viewer tool. Thread pool enables components of the server to reuse threads to eliminate the need to create new threads at run time, as creating new threads expends time and resources. Figure 9-16 on page 395 clearly indicates that there is a high number of thread creation and destruction going on, and at the same time the average number of concurrently active threads is very low. This may be the cause of high CPU consumption.

Figure 9-16: Tivoli Performance Viewer for High CPU consumption scenario output

We analyzed the Web container thread pool settings through the admin console (as shown in Figure 9-17), which can be a possible reason for this problem.

Figure 9-17: Webcontainer thread pool settings

Database configuration/SQL

Let us evaluate these DB2 UDB settings to resolve this problem. Below in Example 9-20 are the results of the DB2 diagnostic file DB2DIAG.LOG.

Example 9-20: db2diag.log file to evaluate high CPU consumption scenario

 2003-11-26-16.36.52.312000   Instance:DB2   Node:000 PID:2524(db2syscs.exe)   TID:3340   Appid:G90126B9.IE0A.015246232717 base sys utilities  sqleGetAgentFromPool Probe:35   Database:TRADE3DB Attempting to STEAL an agent 2003-11-26-16.36.52.375000   Instance:DB2   Node:000 PID:2524(db2syscs.exe)   TID:3340   Appid:G90126B9.IE0A.015246232717 base sys utilities  sqleGetAgentFromPool Probe:35   Database:TRADE3DB Attempting to STEAL an agent 2003-11-26-16.36.52.453000   Instance:DB2   Node:000 PID:2524(db2syscs.exe)   TID:3340   Appid:G90126B9.IE0A.015246232717 base sys utilities  sqleGetAgentFromPool Probe:35   Database:TRADE3DB

The database manager configuration parameter for agent pool size (num_poolagents) affects the total number of subagents that can be kept associated with applications. If the pool size is too small and the pool is full, a subagent disassociates itself from the application it is working on and terminates. Because subagents must be constantly created and re-associated to applications, performance suffers.

In addition, if the value of num_poolagents is too small, one application may fill the pool with associated subagents. Then when another application requires a new subagent and has no subagents in its associated agent pool, it will steal subagents from the agent pools of other applications. This situation is costly, and causes poor performance.

The above analysis of db2diag.log file and snapshot in Example 9-21 on page 396 clearly indicate that there is a problem of stealing agents among applications, which eventually creates extra cost to the system. So the root cause of the problem is inappropriate configuration settings of DB2 agents and Websphere Web container thread pool settings.

Example 9-21: Snapshot to evaluate high CPU consumption scenario

 update dbm configuration using DFT_MON_STMT ON db2 get snapshot for database manager             Database Manager Snapshot Node name                                      = Node type                                      = Enterprise Server Edition with local and remote clients Instance name                                  = DB2 Number of database partitions in DB2 instance  = 1 Database manager status                        = Active Product name                                   = DB2 v8.1.3.132 Service level                                  = s030728 (WR21324) Private Sort heap allocated                    = 0 Private Sort heap high water mark              = 48 Post threshold sorts                           = Not Collected Piped sorts requested                          = 128 Piped sorts accepted                           = 128 Start Database Manager timestamp               = 11-26-2003 16:43:37.000031 Last reset timestamp                           = Snapshot timestamp                             = 11-26-2003 16:51:42.462675 Remote connections to db manager               = 14 Remote connections executing in db manager     = 14 Local connections                              = 0 Local connections executing in db manager      = 0 Active local databases                         = 1 High water mark for agents registered          = 125 High water mark for agents waiting for a token = 0 Agents registered                              = 125 Agents waiting for a token                     = 0 Idle agents                                    = 0 Committed private Memory (Bytes)               = 71647232 Switch list for db partition number 0 Buffer Pool Activity Information  (BUFFERPOOL) = ON  11-26-2003 16:43:37.000031 Lock Information                        (LOCK) = ON  11-26-2003 16:50:19.273637 Sorting Information                     (SORT) = OFF SQL Statement Information          (STATEMENT) = ON  11-26-2003 16:43:37.000031 Table Activity Information             (TABLE) = OFF Take Timestamp Information         (TIMESTAMP) = ON  11-26-2003 16:43:37.000031 Unit of Work Information                 (UOW) = OFF Agents assigned from pool                      = 446 Agents created from empty pool                 = 124 Agents stolen from another application         = 1113 High water mark for coordinating agents        = 15 Max agents overflow                            = 0 Hash joins after heap threshold exceeded       = 0

Apply best practices

Let us first tune the Websphere Application Server (Helium). As we have seen during the investigation of the Helium, which Tivoli Performance Viewer is showing a high number of thread creation and destruction, which may be the cause of high CPU consumption.

Take the help of Tivoli Performance Viewer to tune the Web container thread pool size settings. Apply the workload with the typical number of incoming client requests, fixed number of iterations, and use a standard configuration settings. Check the Percent Maxed and Active Threads counters of the Web container submodule of the Thread Pools module.

Simple rules are if the value of the Percent Maxed counter is consistently in the double digits, then the Web container could be a bottleneck and the number of threads should be increased.

If the number of active threads are significantly lower than the number of threads in the pool, this indicates a need to lower the thread pool size for a performance gain.

Figure 9-18 shows that we have changed the value of Maximum Size parameter to 40 threads. This must enhance the average of active threads, which helps to decrease the more often creation and destruction of the threads.

click to expand
Figure 9-18: Thread pool settings of Web Container

Now let us rectify the DB2 UDB server problem of stealing agents, which we have found during the diagnostic process.

The ideal settings of agents for a server with a non-partitioned database and local and remote clients is:

 maxagents/50 x max_querydegree or maxagents - max_coordagents.

Otherwise, a -1 value can be chosen for NUM_POOLAGENTS setting, which is a calculated value.

-1 stands for calculated value = maxagents - num_initagents.

If the ratio of agents created due to empty agent pool versus agents assigned from the pool is very high, it may indicate that the NUM_POOLAGENTS configuration parameter should be increased; otherwise, set the values of NUM_POOLAGENTS and MAXAGENTS to an appropriate ratio.

A low ratio suggests that NUM_POOLAGENTS is set too high, and that some of the agents in the pool are rarely used and are wasting system resources. A high ratio can indicate that the overall workload for this node is too high. You can adjust the workload by lowering the maximum number of coordinating agents specified by the MAXAGENTS configuration parameter.

See the DB2 UDB Administration Guide - Performance, SC09-4821, for more details.

Let us change the configuration parameters, as shown in Example 9-22, to resolve this problem.

Example 9-22: Num_PoolAgents configuration

 Change agent parameters and show results db2 update dbm cfg using num_poolagents -1 DB20000I  The UPDATE DATABASE MANAGER CONFIGURATION command completed successfully. SQL1362W  One or more of the parameters submitted for immediate modification were not changed dynamically. Client changes will not be effective until the next time the application is started or the TERMINATE command has been issued. Server changes will not be effective until the next DB2START command. db2stop force 11-21-2003 19:35:19     0   0   SQL1064N  DB2STOP processing was successful. SQL1064N  DB2STOP processing was successful. db2start 11-21-2003 19:36:19     0   0   SQL1063N  DB2START processing was successful. SQL1063N  DB2START processing was successful. db2 update db cfg for trade3db using dft_degree -1 DB20000I  The UPDATE DATABASE CONFIGURATION command completed successfully. db2 get snapshot for database manager

After applying the changes at DB2 UDB server and WAS server we again checked the resources. Now there is no problem of high CPU consumption and the problem has been resolved.

< Day Day Up >