| < Day Day Up > |
|
As we have already covered the basic tuning guidelines for DB2 UDB and WebSphere in previous chapters, here we are going to apply those skills and methodologies to solve various problems using scenarios.
In Chapter 5, "Operational setup" on page 117, we demonstrate the WebSphere Application Server and DB2 UDB system setup using the DB2 EJB Sample application AccessEmployee. We install and configure WebSphere Application Server in the Kanaga machine. The database server is in Atlantic, which has DB2 UDB V8 running and Sample database created. We create the DB2 data source named jdbc/Sample, which uses the DB2 Universal JDBC Driver Type 2 Driver to connect to the sample database. During the setup for the DB2 data source, we associated the DB2 data source with a J2C Authentication Data, which represents the combination of the user db2inst1 and its corresponding password. For more details about the above description refer to "The steps to create and configure DB2 Data Source" on page 144.
The db2inst1 user is the instance owner, also a member of the system administrator group of the instance. This user has the upmost authority of the instance, and could do almost anything against the instance. Based on the security consideration, you are recommended to not use the system administrator authority in your database client application, but use a general user with the appropriate authority.
In this subsection, a new user named db2itso is created. This user does not belong to any system groups such as SYSADM, SYSCTRL and SYSMAINT of the DB2 UDB instance. We use this user for the AccessEmployee application to connect to the sample database to replace the original db2inst1 user. The basic steps are as below:
Create new user and grant appropriate authority.
Create new J2C Authentication Data and use it for AccessEmployee application.
Test and troubleshoot runtime issues.
The following shows you the details of the above steps.
In our example, we create new user db2itso and grant appropriate database access authority using the following steps:
Use ksh via the SMIT utility on AIX to create a db2itso user ID belongs to the group db2ugrp.
Add the following line to the .profile of db2itso user to initialize the DB2 environment:
. /home/db2inst1/sqllib/db2profile
Use the db2inst1 user to revoke the database connect authority from the public group.
Grant the database connect authority to the db2itso user.
Establish the connection to the SAMPLE database by using the db2itso user, as shown in the Example 9-4.
Example 9-4: New user creating and authority changing
$ db2 connect to sample $ db2 "revoke connect on database from public" $ db2 "grant connect on database to db2itso" $ db2 "select char(grantor,10) grantor,char(grantee,10) grantee, dbadmauth, connectauth, granteetype, securityadmauth from sysibm.sysdbauth" GRANTOR GRANTEE DBADMAUTH CONNECTAUTH GRANTEETYPE SECURITYADMAUTH ---------- ---------- --------- ----------- ----------- --------------- SYSIBM DB2INST1 Y Y U N SYSIBM PUBLIC N N G N DB2INST1 PUBLIC N N G N DB2INST1 DB2ITSO N Y U N 4 record(s) selected. $ db2 "grant select,insert,update,delete on table db2inst1.employee to user db2itso" DB20000I The SQL command completed successfully. $ db2 "grant select,insert,update,delete on table db2inst1.department to user db2itso" DB20000I The SQL command completed successfully. $ db2 connect to sample user db2itso Enter current password for db2itso: Database Connection Information Database server = DB2/6000 8.1.3 SQL authorization ID = DB2ITSO Local database alias = SAMPLE
We create the new J2C Authentication Data with the properties shown in Figure 9-3. The steps for J2C Authentication Data creation are the same as the steps described in step 1 in "The steps to create and configure DB2 Data Source" on page 144.
Figure 9-3: Creating New J2C Authentication Data for AccessEmployee
Then we modify the property "Container-managed Authentication Alias" for the existing data source jdbc/Sample to utilize the new J2C entry:
From the WebSphere Application Server Administrative Console, choose Resource -> JDBC Providers -> My DB2 Universal JDBC Provider -> Data Sources.
Click EJBSample, then modify the property "Container-managed Authentication Alias" to "kanaga/ITSODB2AccessEmp".
Be aware that the server may need to be restarted for these changes to take effect. After restarting the WebSphere application server server1, try to test the connection to jdbc/Sample using the steps described in step 5 of "The steps to create and configure DB2 Data Source" on page 144. The connection should be established successfully at this point.
There is no need to reinstall the AccessEmployee application as we only changed the J2C Authentication Data. We could begin the testing via accessing the following URL:
http://kanaga:9080/AccessEmp/AccessEmployee.html
On the resulting page, try to click Display Employee List, Employee List is displayed in another page successfully. But when clicking Query Employee button or Submit Query button, it would fail with error messages like below.
Example 9-5: Error messages from query employee running
Error: Cannot retrieve the employee's info with emp No.: 000010 StackTrace: RemoteException occurred in server thread; nested exception is: com.ibm.websphere.csi.CSITransactionRolledbackException: Global tx rolled back
The major difference between Display Employee List and other buttons is that Display Employee List does not use the EJB, whereas other buttons use EJB to interact with the data. From the error messages above, it is difficult to tell what is the underlying reason for the problem. Let us begin the troubleshooting from the System.out stream log. In our testing it is: /usr/WebSphere/AppServer/logs/server1/SystemOut.log
Within the SystemOut.log, you could find the following snippets, as shown in Example 9-6.
Example 9-6: SystemOut.log Investigation
PART 1: ************ Start Display Current Environment ************ WebSphere Platform 5.0 [BASE 5.0.2 ptf2M0325.01] running with process name kanaga\kanaga\server1 and process id 51268 Host Operating System is AIX, version 5.1 Java version = J2RE 1.3.1 IBM AIX build ca131-20030618 (JIT enabled: jitc), Java Compiler = jitc, Java VM name = Classic VM was.install.root = /usr/WebSphere/AppServer user.install.root = /usr/WebSphere/AppServer Java Home = /usr/WebSphere/AppServer/java/jre ws.ext.dirs = /usr/WebSphere/AppServer/java/lib:/usr/WebSphere/AppServer/classes:/usr/WebSphere/AppServer/cla sses:/usr/WebSphere/AppServer/lib:/usr/WebSphere/AppServer/lib/ext:/usr/WebSphere/AppServer/web /help:/usr/WebSphere/AppServer/deploytool/itp/plugins/com.ibm.etools.ejbdeploy/runtime Classpath = /usr/WebSphere/AppServer/properties:/usr/WebSphere/AppServer/properties:/usr/WebSphere/AppServe r/lib/bootstrap.jar:/usr/WebSphere/AppServer/lib/j2ee.jar:/usr/WebSphere/AppServer/lib/lmproxy. jar ... PART 2: [11/24/03 21:03:04:391 PST] 641cdbcf EJBContainerI I WSVR0037I: Starting EJB jar: AccessEmpEjb.jar [11/24/03 21:03:07:143 PST] 641cdbcf WebContainer A SRVE0169I: Loading Web Module: AccessEmpWeb.war. [11/24/03 21:03:07:183 PST] 641cdbcf WebGroup I SRVE0180I: [AccessEmpWeb.war] [/AccessEmp] [Servlet.LOG]: JSP 1.2 Processor: init [11/24/03 21:03:07:226 PST] 641cdbcf WebGroup I SRVE0180I: [AccessEmpWeb.war] [/AccessEmp] [Servlet.LOG]: SimpleFileServlet: init [11/24/03 21:03:07:230 PST] 641cdbcf WebGroup I SRVE0180I: [AccessEmpWeb.war] [/AccessEmp] [Servlet.LOG]: InvokerServlet: init [11/24/03 21:03:07:238 PST] 641cdbcf ApplicationMg A WSVR0221I: Application started: AccessEmployee PART 3: [11/24/03 21:03:07:995 PST] 641cdbcf HttpTransport A SRVE0171I: Transport http is listening on port 9,080. [11/24/03 21:03:09:206 PST] 641cdbcf HttpTransport A SRVE0171I: Transport https is listening on port 9,443. [11/24/03 21:03:09:225 PST] 641cdbcf HttpTransport A SRVE0171I: Transport http is listening on port 9,090. [11/24/03 21:03:09:369 PST] 641cdbcf HttpTransport A SRVE0171I: Transport https is listening on port 9,043. [11/24/03 21:03:09:537 PST] 641cdbcf RMIConnectorC A ADMC0026I: RMI Connector available at port 2809 [11/24/03 21:03:09:586 PST] 641cdbcf WsServer A WSVR0001I: Server server1 open for e-business ... PART 4: [11/24/03 21:04:34:621 PST] 1e98dbc3 WebGroup I SRVE0180I: [AccessEmpWeb.war] [/AccessEmp] [Servlet.LOG]: AccessEmpServlet: init [11/24/03 21:04:36:595 PST] 1e98dbc3 WSRdbDataSour u Database version is SQL08014 [11/24/03 21:04:36:595 PST] 1e98dbc3 WSRdbDataSour u JDBC Driver version is 1.5.54 PART 5: [11/24/03 21:04:37:137 PST] 1e98dbc3 DataAccessReq E PMGR6022E: Error using adapter to create or execute an Interaction. com.ibm.ws.rsadapter.cci.WSInteractionImpl@62ddbca [11/24/03 21:04:37:281 PST] 1e98dbc3 DataAccessReq E PMGR0000E: Call stack: com.ibm.ws.ejbpersistence.utilpm.PersistenceManagerException: PMGR6022E: Error using adapter to create or execute an Interaction. com.ibm.ws.rsadapter.cci.WSInteractionImpl@62ddbca at com.ibm.ws.ejbpersistence.dataaccess.DataAccessRequestImpl.executeOneRowFBPK(Unknown Source) at com.ibm.ws.ejbpersistence.beanextensions.ConcreteBeanStatefulInstanceExtensionImpl.fetchRecordF romDataStore(Unknown Source) at com.ibm.ws.ejbpersistence.beanextensions.ConcreteBeanStatefulInstanceExtensionImpl.getRecordFor Load(Unknown Source) at com.ibm.ws.ejbpersistence.beanextensions.ConcreteBeanStatefulInstanceExtensionImpl.executeLoad( Unknown Source) at com.ibm.ws.ejbpersistence.beanextensions.CBReadyState.ejbLoad(Unknown Source) at com.ibm.ws.ejbpersistence.beanextensions.ConcreteBeanStatefulInstanceExtensionImpl.ejbLoad(Unkn own Source) at com.db2int.ejb.ConcreteEmployee_52940fe3.ejbLoad(Unknown Source) at com.ibm.ejs.container.ContainerManaged2_0BeanO.loadForEnlist(ContainerManaged2_0BeanO.java:331) at com.ibm.ejs.container.EntityBeanO.enlist(EntityBeanO.java:718) at com.ibm.ejs.container.activator.OptCEntityActivationStrategy.atActivate(OptCEntityActivationStr ategy.java:117) at com.ibm.ejs.container.activator.Activator.activateBean(Activator.java:516) at com.ibm.ejs.container.EJSHome.activateBean_Common(EJSHome.java:1234) at com.ibm.ejs.container.EJSHome.activateBean_Local(EJSHome.java:1150) at com.db2int.ejb.EJSCMPEmployeeHomeBean_52940fe3.findByPrimaryKey_Local(Unknown Source) at com.db2int.ejb.EJSLocalCMPEmployeeHome_52940fe3.findByPrimaryKey(Unknown Source) at com.db2int.ejb.AccessEmployeeBean.getEmployeeInfo(Unknown Source) at com.db2int.ejb.EJSRemoteStatelessAccessEmployee_cb37d10c.getEmployeeInfo(Unknown Source) at com.db2int.ejb._AccessEmployee_Stub.getEmployeeInfo(_AccessEmployee_Stub.java:258) at client.AccessEmpServlet.queryEmployee(AccessEmpServlet.java:328) at client.AccessEmpServlet.service(AccessEmpServlet.java:195) at javax.servlet.http.HttpServlet.service(HttpServlet.java:853) ... PART 6: ---- Begin backtrace for nested exception com.ibm.db2.jcc.a.SqlException: "DB2ITSO.EMPLOYEE" is an undefined name. at com.ibm.db2.jcc.a.cq.e(cq.java:1482) at com.ibm.db2.jcc.a.cq.a(cq.java:1092) at com.ibm.db2.jcc.c.bc.g(bc.java:138) at com.ibm.db2.jcc.c.bc.a(bc.java:42) at com.ibm.db2.jcc.c.q.a(q.java:31) at com.ibm.db2.jcc.c.bo.g(bo.java:145) at com.ibm.db2.jcc.a.cq.k(cq.java:1072) at com.ibm.db2.jcc.a.cr.Z(cr.java:1495) at com.ibm.db2.jcc.a.cr.d(cr.java:1918) at com.ibm.db2.jcc.a.cr.Q(cr.java:408) at com.ibm.db2.jcc.a.cr.executeQuery(cr.java:391) at com.db2int.ejb.websphere_deploy.DB2UDBNT_V8_1.EmployeeBeanFunctionSet_52940fe3.FindByPrimaryKey (Unknown Source) at com.db2int.ejb.websphere_deploy.DB2UDBNT_V8_1.EmployeeBeanFunctionSet_52940fe3.execute(Unknown Source) at com.ibm.ws.rsadapter.cci.WSInteractionImpl.execute(WSInteractionImpl.java:347) at com.ibm.ws.rsadapter.cci.WSRelationalRAAdapter.executeFinder(WSRelationalRAAdapter.java:274) at com.ibm.ws.ejbpersistence.dataaccess.DataAccessRequestImpl.executeOneRowFBPK(Unknown Source) ... PART 7: ---- Begin backtrace for Nested Throwables com.ibm.db2.jcc.a.SqlException: An error occurred during implicit system action type "2". Information returned for the error includes SQLCODE "-204", SQLSTATE "42704" and message tokens "DB2ITSO.EMPLOYEE". ...
Some explanations for different parts which are highlighted in the above example are provided as below:
PART 1 shows you current environment information such as WAS version, Operating System information, CLASSPATH setting, etc.
PART 2 shows you that the AccessEmployee application has been successfully started.
PART 3 shows you that the application server is started successfully.
PART 4 shows you that the AccessEmpServlet servlet is triggered by clicking the Query Employee button.
PART 5 shows you that an error happens when running the application. The call stack is provided. In the call stack you can find that the problem happens when fetching records from the data store that is related to EJB operations.
PART 6 shows you the back trace of the nested exception, the SqlException. In this part you can find clues as to why the application running failed. It is due to "DB2ITSO.EMPLOYEE" is an undefined name.
PART 7 shows you some information from the Nested Throwables, including the SQLCODE, which is returned by DB2 UDB.
At this point we know the reason the application running failed, as we use the new J2C Authentication Data associated with the user db2itso to access the DB2 data source, but the table name in the application for EJB access is not a fully qualified name with the form "schema.tablename". It is defaulted to DB2ITSO.<Table_Name>, for example, DB2ITSO.EMPLOYEE - an inexistent table. When access to this inexistent table, error is returned from DB2 to WAS, then WAS to the application.
There other methods also exist to find out the cause of the failed application running. For example, you could find clues in the WAS FFDC logs, get detailed information from the activity log, or increase the DB2 UDB diagnostic level so that you also could find information about the inexistent table, etc.
There is a custom property, named currentSchema, of the DB2 data source that uses DB2 Universal JDBC Driver. This property identifies the default schema name used to qualify unqualified database object references where applicable in dynamically prepared SQL statements. Unless currentSchema is used, the default schema name is the authorization ID of the current session user.
We try to resolve this problem by modifying the currentSchema property. In WebSphere Application Server Administrative Console, select Resources -> JDBC Providers -> My DB2 Universal JDBC Provider -> Data Sources, then choose the EJBSample entry which is related to the JNDI name "jdbc/Sample". Select Custom Properties and set the value of currentSchema to db2inst1.
Note | Changing currentSchema to "db2inst1" does not resolve the problem. But because it is an "obvious" way to resolve the problem and is commonly used when people see the problem the first time, so we take this action here for demonstration purposes. The correct resolution is provided later on in this subsection. |
After applying the change to currentSchema above, we restart the application server for the changes to take effect. We then redo the testing, but the previous problem is still there. We need to check the SystemOut.log again to see what happens after changing the currentSchema property. Some snippets of SystemOut.log are extracted as shown in the following example.
From Example 9-7 on page 366 you can find that the currentSchema modification works, as the table name shown in the SystemOut.log has been changed to "db2inst1.Employee". Perhaps you would get confused if you try to test the SQL statement "select * from db2inst1.EMPLOYEE" from a DB2 Command Line Processor (CLP), as the statement could be run without any problem. But why does the error message say that the "db2inst1.EMPLOYEE" is an undefined name? To obtain more information about the problem nature, the trace of DB2 Universal JDBC Driver could be utilized.
Example 9-7: SystemOut.log investigation after changing currentSchema property
[11/24/03 21:18:10:651 PST] 26251959 DataAccessReq E PMGR6022E: Error using adapter to create or execute an Interaction. com.ibm.ws.rsadapter.cci.WSInteractionImpl@47c35950 [11/24/03 21:18:10:801 PST] 26251959 DataAccessReq E PMGR0000E: Call stack: com.ibm.ws.ejbpersistence.utilpm.PersistenceManagerException: PMGR6022E: Error using adapter to create or execute an Interaction. com.ibm.ws.rsadapter.cci.WSInteractionImpl@47c35950 at com.ibm.ws.ejbpersistence.dataaccess.DataAccessRequestImpl.executeOneRowFBPK(Unknown Source) ... ---- Begin backtrace for nested exception com.ibm.db2.jcc.a.SqlException: "db2inst1.EMPLOYEE" is an undefined name. ---- Begin backtrace for Nested Throwables com.ibm.db2.jcc.a.SqlException: An error occurred during implicit system action type "2". Information returned for the error includes SQLCODE "-204", SQLSTATE "42704" and message tokens "db2inst1.EMPLOYEE". ...
Similar to what we just did to change the currentSchema property, in the same Custom Properties page we could change the following properties to activate the JDBC trace:
traceLevel: -1
traceFile: /usr/WebSphere/AppServer/logs/trace.out
After applying the changes made above and restarting the application server, then we redo the testing and check the trace output file trace.out, as shown in the example below:
Example 9-8: JDBC Trace investigation
PART 1: [ibm][db2][jcc][Time:1069738090654][Thread:Servlet.Engine.Transports : 0][DB2ConnectionPoolDataSource@2abfd88c] getPooledConnection () called [ibm][db2][jcc] BEGIN TRACE_DRIVER_CONFIGURATION [ibm][db2][jcc] Driver: IBM DB2 JDBC Universal Driver Architecture 1.5.54 ... [ibm][db2][jcc] User's account name = root [ibm][db2][jcc] User's home directory = / [ibm][db2][jcc] User's current working directory = /usr/WebSphere/AppServer [ibm][db2][jcc] END TRACE_DRIVER_CONFIGURATION PART 2: [ibm][db2][jcc] BEGIN TRACE_CONNECTS [ibm][db2][jcc] Using properties: { password=******, dataSourceName=null, traceDirectory=null, portNumber=50000, fullyMaterializeLobData=true, currentSQLID=null, securityMechanism=3, clientAccountingInformation=null, cliSchema=null, resultSetHoldability=2, serverName=null, currentPackageSet=null, loginTimeout=0, planName=null, traceFile=/usr/WebSphere/AppServer/logs/trace.out, currentFunctionPath=null, kerberosServerPrincipal=null, retrieveMessagesFromServerOnGetMessage=true, cursorSensitivity=0, currentPackagePath=null, currentSchema=db2inst1, clientUser=null, keepDynamic=0, driverType=2, clientApplicationInformation=null, description=null, readOnly=false, clientWorkstation=null, pkList=null, deferPrepares=true, jdbcCollection=NULLID, databaseName=rsample, traceLevel=-1, traceFileAppend=false, user=db2itso } [ibm][db2][jcc] END TRACE_CONNECTS ... [ibm][db2][jcc][Time:1069738091595][Thread:Servlet.Engine.Transports : 0][Connection@480e9893] setCurrentSchema () called ... PART 3: [ibm][db2][jcc][Connection@480e9893] BEGIN TRACE_CONNECTS [ibm][db2][jcc][Connection@480e9893] Successfully connected to server jdbc:db2:rsample [ibm][db2][jcc][Connection@480e9893] User: db2itso [ibm][db2][jcc][Connection@480e9893] Database product name: DB2/6000 [ibm][db2][jcc][Connection@480e9893] Database product version: SQL08014 [ibm][db2][jcc][Connection@480e9893] Driver name: IBM DB2 JDBC Universal Driver Architecture [ibm][db2][jcc][Connection@480e9893] Driver version: 1.5.54 [ibm][db2][jcc][Connection@480e9893] DB2 Application Correlator: G901265A.OAC4.0D9415052811 [ibm][db2][jcc][Connection@480e9893] END TRACE_CONNECTS ... [ibm][db2][jcc][Time:1069738091720][Thread:Servlet.Engine.Transports : 0][Connection@480e9893] prepareStatement (SELECT T1.EMPNO, T1.FIRSTNME, T1.MIDINIT, T1.LASTNAME, T1.PHONENO, T1.HIREDATE, T1.JOB, T1.EDLEVEL, T1.SEX, T1.BIRTHDATE, T1.SALARY, T1.BONUS, T1.COMM, T1.WORKDEPT FROM EMPLOYEE T1 WHERE T1.EMPNO = ?, 1003, 1007) called ... [ibm][db2][jcc][Time:1069738091739][Thread:Servlet.Engine.Transports : 0][PreparedStatement@151f188c] executeQuery () called ... PART 4: [time:1069738091767][thread:Servlet.Engine.Transports : 0][tracepoint:5][Request.flush] SEND BUFFER: EXCSQLIMM (ASCII) (EBCDIC) 0 1 2 3 4 5 6 7 8 9 A B C D E F 0123456789ABCDEF 0123456789ABCDEF 0000 0053D0510001004D 200A0044211399A2 .S.Q...M ..D!... ..}....(......rs 0010 8194979385404040 4040404040404040 .....@@@@@@@@@@@ ample 0020 D5E4D3D3C9C44040 4040404040404040 ......@@@@@@@@@@ NULLID 0030 4040E2E8E2E2D5F2 F0F0404040404040 @@........@@@@@@ SYSSN200 0040 404040405359534C 564C303100410005 @@@@SYSLVL01.A.. ...<.<...... 0050 2105F1002FD04300 0100292414000000 !.../.C...)$.... ..1..}.......... 0060 001F534554204355 5252454E54205343 ..SET CURRENT SC ...........+.... 0070 48454D41203D2027 646232696E737431 HEMA = 'db2inst1 ..(.........>... 0080 27FF004ED0510002 0048201400442113 '..N.Q...H ..D!. ...+}........... 0090 99A2819497938540 4040404040404040 .......@@@@@@@@@ rsample 00A0 4040D5E4D3D3C9C4 4040404040404040 @@......@@@@@@@@ NULLID 00B0 40404040E2E8E2E2 D5F2F0F040404040 @@@@........@@@@ SYSSN200 00C0 4040404040405359 534C564C30310041 @@@@@@SYSLVL01.A ...<.<.... 00D0 002AD05300020024 2414000000001A53 .*.S...$$......S ..}............. 00E0 455420434C49454E 5420555345524944 ET CLIENT USERID ....<..+........ 00F0 276462326974736F 27FF002DD0530002 'db2itso'..-.S.. .......?....}... 0100 0027241400000000 1D53455420434C49 .'$......SET CLI ..............<. 0110 454E542057524B53 544E4E414D45276B ENT WRKSTNNAME'k .+.......++.(.., 0120 616E61676127FF00 48D0530002004224 anaga'..H.S...B$ />/./....}...... 0130 1400000000385345 5420434C49454E54 .....8SET CLIENT ...........<..+. 0140 204150504C4E414D 45276462326A6363 APPLNAME'db2jcc ..&&<+.(.....|.. 0150 536572766C65742E 456E67696E652E54 Servlet.Engine.T ....%....>..>... 0160 72616E73706F7274 73203A203027FF00 ransports : 0'.. ./>..?.......... 0170 5FD0430002005924 14000000004F5345 _.C...Y$.....OSE .}...........|.. 0180 5420434C49454E54 20414343544E4727 T CLIENT ACCTNG' ...<..+......+.. 0190 4A43433031303530 4A564D2020202020 JCC01050JVM ..........(..... 01A0 2020202020202020 20206462326A6363 db2jcc .............|.. 01B0 536572766C65742E 456E67696E656462 Servlet.Enginedb ....%....>..>... 01C0 326974736F20272C 5827303027FF0058 2itso ',X'00'..X ....?........... 01D0 D05100030052200D 0044211399A28194 .Q...R ..D!..... }...........rsam 01E0 9793854040404040 404040404040D5E4 ...@@@@@@@@@@@.. ple NU 01F0 D3D3C9C440404040 4040404040404040 ....@@@@@@@@@@@@ LLID 0200 E2E8E2E2D5F3F0F0 4040404040404040 ........@@@@@@@@ SYSSN300 0210 40405359534C564C 3031000100052116 @@SYSLVL01....!. ...<.<........ 0220 F1000521460400D8 D043000300D22414 ...!F....C....$. 1......Q}....K.. 0230 00000000C853454C 4543542054312E45 .....SELECT T1.E ....H..<........ 0240 4D504E4F2C205431 2E46495253544E4D MPNO, T1.FIRSTNM (&+|..........+( 0250 452C2054312E4D49 44494E49542C2054 E, T1.MIDINIT, T ......(...+..... 0260 312E4C4153544E41 4D452C2054312E50 1.LASTNAME, T1.P ..<...+.(......& 0270 484F4E454E4F2C20 54312E4849524544 HONENO, T1.HIRED .|+.+|.......... 0280 4154452C2054312E 4A4F422C2054312E ATE, T1.JOB, T1. .........|...... 0290 45444C4556454C2C 2054312E5345582C EDLEVEL, T1.SEX, ..<...<......... 02A0 2054312E42495254 48444154452C2054 T1.BIRTHDATE, T ................ 02B0 312E53414C415259 2C2054312E424F4E 1.SALARY, T1.BON ....<.........|+ 02C0 55532C2054312E43 4F4D4D2C2054312E US, T1.COMM, T1. ........|((..... 02D0 574F524B44455054 2046524F4D20454D WORKDEPT FROM EM .|....&....|(..( 02E0 504C4F5945452020 5431205748455245 PLOYEE T1 WHERE &<|............. 02F0 2054312E454D504E 4F203D203FFF0053 T1.EMPNO = ?..S .....(&+|....... 0300 D0410004004D2008 0044211399A28194 .A...M ..D!..... }....(......rsam 0310 9793854040404040 404040404040D5E4 ...@@@@@@@@@@@.. ple NU 0320 D3D3C9C440404040 4040404040404040 ....@@@@@@@@@@@@ LLID 0330 E2E8E2E2D5F3F0F0 4040404040404040 ........@@@@@@@@ SYSSN300 0340 40405359534C564C 3031000100052146 @@SYSLVL01....!F ...<.<........ 0350 05005BD051000500 55200C0044211399 ..[.Q...U ..D!.. ..$}...........r 0360 A281949793854040 4040404040404040 ......@@@@@@@@@@ sample 0370 40D5E4D3D3C9C440 4040404040404040 @......@@@@@@@@@ NULLID 0380 404040E2E8E2E2D5 F3F0F04040404040 @@@........@@@@@ SYSSN300 0390 4040404040535953 4C564C3031000100 @@@@@SYSLVL01... ...<.<..... 03A0 08211400007FFF00 05215D010028D003 .!.......!]..(.. ....."....)...}. 03B0 0005002224120010 00100676D0417FFF ..."$......v.A.. ............}.". 03C0 0671E4D00001000E 147A000000063030 .q.......z....00 ..U}.....:...... 03D0 30303130 0010 .... ... PART 5: [ibm][db2][jcc] BEGIN TRACE_DIAGNOSTICS [ibm][db2][jcc][SQLException@5a20988c] java.sql.SQLException [ibm][db2][jcc][SQLException@5a20988c][Sqlca@5d91d88c] DB2 SQLCA from server [ibm][db2][jcc][SQLException@5a20988c][Sqlca@5d91d88c] SqlCode = -204 [ibm][db2][jcc][SQLException@5a20988c][Sqlca@5d91d88c] SqlErrd = { -2146303874, 0, 0, 0, -10, 0 } [ibm][db2][jcc][SQLException@5a20988c][Sqlca@5d91d88c] SqlErrmc = db2inst1.EMPLOYEE [ibm][db2][jcc][SQLException@5a20988c][Sqlca@5d91d88c] SqlErrmcTokens = { db2inst1.EMPLOYEE } [ibm][db2][jcc][SQLException@5a20988c][Sqlca@5d91d88c] SqlErrp = SQLNQ1FB [ibm][db2][jcc][SQLException@5a20988c][Sqlca@5d91d88c] SqlState = 42704 [ibm][db2][jcc][SQLException@5a20988c][Sqlca@5d91d88c] SqlWarn = [ibm][db2][jcc][SQLException@5a20988c] SQL state = 42704 [ibm][db2][jcc][SQLException@5a20988c] Error code = -204 [ibm][db2][jcc][SQLException@5a20988c] Tokens = db2inst1.EMPLOYEE [ibm][db2][jcc][SQLException@5a20988c] Stack trace follows com.ibm.db2.jcc.a.SqlException: db2inst1.EMPLOYEE at com.ibm.db2.jcc.a.cq.e(cq.java:1482) at com.ibm.db2.jcc.a.cq.a(cq.java:1092) at com.ibm.db2.jcc.c.bc.g(bc.java:138) at com.ibm.db2.jcc.c.bc.a(bc.java:42) at com.ibm.db2.jcc.c.q.a(q.java:31) at com.ibm.db2.jcc.c.bo.g(bo.java:145) at com.ibm.db2.jcc.a.cq.k(cq.java:1072) at com.ibm.db2.jcc.a.cr.Z(cr.java:1495) at com.ibm.db2.jcc.a.cr.d(cr.java:1918) at com.ibm.db2.jcc.a.cr.Q(cr.java:408) at com.ibm.db2.jcc.a.cr.executeQuery(cr.java:391) at com.db2int.ejb.websphere_deploy.DB2UDBNT_V8_1.EmployeeBeanFunctionSet_52940fe3.FindByPrimaryKey (Unknown Source) at com.db2int.ejb.websphere_deploy.DB2UDBNT_V8_1.EmployeeBeanFunctionSet_52940fe3.execute(Unknown Source) at com.ibm.ws.rsadapter.cci.WSInteractionImpl.execute(WSInteractionImpl.java:347) at com.ibm.ws.rsadapter.cci.WSRelationalRAAdapter.executeFinder(WSRelationalRAAdapter.java:274) at com.ibm.ws.ejbpersistence.dataaccess.DataAccessRequestImpl.executeOneRowFBPK(Unknown Source) ...
Some explanations for the different parts, which are highlighted in the above example, are provided as below:
PART 1 contains some environment information, such as the IBM DB2 JDBC Universal Driver version, Operating System information, JVM information, CLASSPATH setting, and so on. Only part of the information is shown in our example.
PART 2 shows you the current setting of properties for the DB2 data source. There are some properties highlighted in the example that you probably are interested in, for example, currentSchema=db2inst1, and you also could find that the function setCurrentSchema () is called to make the currentSchema property change.
PART 3 indicates that a connection has been made to the sample database (see the message "Successfully connected to server jdbc:db2:rsample"). Messages also shown here include the SQL statement that is executed on the DB2 side.
PART 4 shows you the send buffer content. From the ASCII translation of the send buffer, you could find that the corresponding command that is used to make the currentSchema change:
SET CURRENT SCHEMA = 'db2inst1'
The statement above is the problem source, as the schema name is case sensitive when it is enclosed in the single quotes. The problem could be easily reproduced within the DB2 CLP, as shown in the example below.
Example 9-9: Reproducing problem using similar SQL statements
atlantic / # su - db2itso $ db2 connect to sample user db2itso using ibmdb2 $ db2 "set current schema = 'db2inst1'" DB20000I The SQL command completed successfully. $ db2 "select * from employee" SQL0204N "db2inst1.EMPLOYEE" is an undefined name. SQLSTATE=42704
As the current schema value is enclosed by the single quotes, it would not be converted to upper case when running inside DB2. But the fully qualified table object name for the EMPLOYEE table is "DB2INST1.EMPLOYEE", not "db2inst1.EMPLOYEE". This is the reason that why the error message says "db2inst1.EMPLOYEE" is an undefined name.
Note | The reason why you could run the SQL statement "select * from db2inst1.EMPLOYEE" successfully in the DB2CLP is that the DB2 engine helps you to convert the lower case schema name to an upper case schema name automatically, as there are no quotation marks surrounding the schema name. |
PART 5 shows you the trace diagnostics, including the output for most tokens of the SQL Communications Area (SQLCA), stack traceback for the function calls, etc.
Now we know that the root cause of the failed application running is due to the lower case schema name. We just need to change the schema name to upper case, that is, DB2INST1, in the Custom Properties configuration page for the DB2 data source in WebSphere Application Server Administrative Console. After applying the change and restarting the application server, all the services provided by the AccessEmployee application, including the "Query Employee" service, can be run successfully.
Up to this point, we have successfully resolved the problem by investigating both WebSphere Application Server logs and DB2 Universal JDBC Driver traces. The steps discussed above could be applied to most problems that involve both WAS and DB2 UDB. Also please remember to turn off the trace once the problem troubleshooting work is done.
There are also many other methods to resolve this problem. For example, you could create aliases for related tables from the DB2 UDB side so that you could access DB2INST1.<table_name> via the alias DB2ITSO.<table_name>, or you could modify the schema for the mapping between EJB and RDB, etc. As the purpose of this scenario is just to show you some basic methodologies that you can utilize for a general problem troubleshooting situation, so we only discuss the method of changing currentSchema property of the data source to resolve the problem. If you encounter a similar "wrong schema" problem in your environment, you can have your own method that is most appropriate for your runtime environment.
Performance degradation due to inappropriate concurrency settings of an application or database is a very common problem. During our lab environment test we have also found a similar problem and this scenario is dedicated to resolve that issue. The root cause of the problem demonstrated here is one where the default lock settings of the TRADE3DB database are creating a significant locking contention on the DB2 UDB server, which results in poor performance/response time.
We are using a TRADE3 application for our scenarios which is the third generation WebSphere end-to-end benchmark and performance sample application. TRADE3 application is developed using the WebSphere Application Developer (WSAD) development tool. This includes development of EJBs, servlets and visual development of JSPs and Java clients.
Note | Refer to the Appendix A, "Trade3 application" on page 401, for a complete understanding of the TRADE3 application and the architecture used. |
In order to simulate this environment we are using the TRADE3 application. We applied a workload through Web Performance Tools (WPT) application stress, which simulates an environment where 40 users log in and make a request for the stock quotes, and then repeatedly request the same stock quotes again and again. This simulates a high volume trading application where multiple customers perform stock quote requests. Due to applied stress number of user start complaining about poor response time, so a task is assigned to a performance problem determination team to look into this matter and come up with the root cause and an appropriate solution for the same.
Table 9-2 is the detail of hardware and software infrastructure used by TRADE3 application in our lab environment. The DB2 UDB database we created is call TRADE3DB.
Hardware/Software | Configuration |
---|---|
WebSphere Application Server 5.0.2 | Hostname: HELIUM IP: 9.1.38.185 x-Series with Windows 2000 Server 4 GB RAM Dual CPU: Intel Pentium® III CPU family 1133 MHz Two 16 GB HDD Application: TRADE3 |
DB2 UDB 8.1.3 | Hostname: GALLIUM IP: 9.1.38.184 x-Series with Windows 2000 Server 4 GB RAM Dual CPU: Intel Pentium III CPU family 1133 MHz Two 16 GB HDD Database: TRADE3DB |
Web Performance Tools (WPT) | Hostname: LOCHNESS IP: 9.1.38.167 x-Series with Windows 2000 2 GB RAM Intel Pentium 4 CPU 2.40 GHz 37.2 GB HDD WPT application: Stress |
As we are working in our controlled environment, so we have ignored the real-world possibilities because it is almost impossible to cover all of them within the scope of this book. We are diagnosing some of the key areas to depict this performance degradation problem. A few of the most common areas are:
Memory constraints
Disk I/O contention
CPU resources
Application server/application issues
Database configuration/SQL
Let us start analyzing them one by one to drill down the root cause of the problem.
Memory constraints
In order to verify memory usage, CPU consumption, and disk I/O statistics, we use Performance Monitoring tools of the Windows perfmon and Windows Task Manager. We check these statistics both on WebSphere Application Server (HELIUM) and DB2 UDB server (GALLIUM) by capturing the information through the perfmon tool and Task Manager. Below are the steps to configure these tools:
To start the perfmon, click Start -> Run. Enter perfmon, or enter Control Panel. Select Administrative Tools -> Performance.
Select Performance Logs and Alerts from the left pane of the window.
Right-click Counter logs and select New Log Settings.
Give any name (for example, Memory Overview) and click OK, as shown in Figure 9-4.
Figure 9-4: perfmon settings for Memory Overview
Click ADD to add a performance counter. Select the counters from the list and click ADD, as shown in Figure 9-5.
Figure 9-5: prefmon settings for Memory Overview
Close the windows. You will be sent back to the initial screen and click OK to close the window.
Double-click Counter Logs. If the newly created log is not already running, (check the green icon as shown in Figure 9-6 on page 376) then right-click the log and select Start.
Figure 9-6: prefmon green signal
Select System Monitor from the tree, then select the View Log File Data to see the results.
The rest of the activity we track from the Windows Task Manager to make it more informative and appropriate to our settings.
Open the Windows Task Manager and click the Processes tab. Chose the View menu and click Select columns and finally check the check boxes for the desired options, as shown in Figure 9-7 on page 377
Figure 9-7: Task Manager settings
Select the appropriate columns for the Task Manager and click OK.
We considered whether system swapping due to potential memory constraints was the cause of the problem. The statistics shown in Figure 9-8 show that plenty of free memory is available for use, so it is not a memory problem. Let us continue the analysis with the rest of the key areas.
Figure 9-8: Memory usage
Disk I/O contention
We checked the disk I/O activity through the Task Manager. We have not found much contention on the I/O activity, as shown in Figure 9-9 on page 379, on the GALLIUM server.
Figure 9-9: I/O activity for concurrency scenario
We thought that the possible cause of the poor response time at peak periods could be due to disk I/O contention, but we found that the disk I/O rate is within an acceptable range. The output of Figure 9-9 clearly shows that there is not an issue with I/O contention, so we must continue with rest of the possibilities.
CPU resources
In order to check the fate of CPU consumption let us verify it with the Task Manager.
The above statistics in CPU column of the output shown in Figure 9-10 clearly indicate that the percentage of CPU consumption is under moderate limits, so CPU is not a problem in this case. We continue our diagnostic process with WebSphere Application Server and DB2 UDB.
Figure 9-10: CPU consumption for concurrency scenario
Application server/application issues
While looking at the problem through the performance analyzing tool provided by WebSphere - Tivoli Performance Viewer we found that response time is too high, as shown in Figure 9-11 on page 381.
Figure 9-11: High response time
We checked with WAS logs but could not find much there. The final area left to be explored is the DB2 UDB server, so let us examine the database activity.
Database configuration/SQL
We analyzed the DB2 diagnostic file, DB2DIAG.LOG, and found a high number of lock escalation, especially against database table DB2ADMIN.HOLDINGEJB to attain a intent share lock. Example 9-10 shows the few messages thrown in DB2DIAG.LOG file.
Example 9-10: Information from the DB2 diagnostic log DB2DIAG.LOG
2003-11-25-18.17.01.968001 Instance:DB2 Node:000 PID:3476(db2syscs.exe) TID:4056 Appid:G90126B9.A704.0158C6010957 data management sqldEscalateLocks Probe:3 Database:TRADE3DB ADM5502W The escalation of "109" locks on table "DB2ADMIN.HOLDINGEJB" to lock intent "S" was successful.
The diagnostic level parameter DIAGLEVEL default value of 3 is appropriate for routine monitoring. We have changed this value to 4, which is the highest level of diagnostics. This is because the routine monitoring level does not provide us with the information required to perform proper problem diagnosis.
Then we verified this fact with snapshot for the TRADE3 database, as shown in Example 9-11.
Example 9-11: Snapshot of TRADE3DB for concurrency scenario
db2 get snapshot for database on trade3db Database Snapshot Database name = TRADE3DB Database path = C:\DB2\NODE0000\SQL Input database alias = TRADE3DB Database status = Active Catalog database partition number = 0 Catalog network node name = Operating system running at database server= NT Location of the database = Local First database connect timestamp = 11-25-2003 18:13:34 Last reset timestamp = Last backup timestamp = Snapshot timestamp = 11-25-2003 18:19:20 Locks held currently = 123 Lock waits = 270 Lock list memory in use (Bytes) = 19188 Deadlocks detected = 0 Lock escalations = 747 Exclusive lock escalations = 0 Agents currently waiting on locks = 1 Lock Timeouts = 5 Number of indoubt transactions = 0
Bold values in Example 9-11 also verify the same fact of the lock escalation. In order to further confirm the above cause we checked it through DB2 Command Line Process output, as shown in Figure 9-12 on page 383.
Figure 9-12: CLP showing UOW waiting
Apart from the above indications, we checked our system Event Viewer Application log and found the message as shown in Figure 9-13 on page 384.
Figure 9-13: Lock Escalation Event log
All of the above facts lead to a DB2 concurrency issue, so the root cause of the problem is inappropriate settings of locking parameters.
Referring to the results shown in Example 9-11 on page 382, snapshot clearly points to of DB2 lock contention as follows:
270 Lock Waits
This corresponds to an application having to wait for another user to release a lock on a required resource before it can continue to execute.
747 Lock Escalations
According to this factor the total number of locks held by the application exceeded the value specified in LOCKLIST parameter, and therefore an escalation of the locks took place. Such an escalation has the potential to significantly impact concurrency among users requesting incompatible locks on shared resources. Exclusive lock escalations are the most restrictive of such locks, essentially single threading execution of applications contending for the same resource.
We issued the following command to determine the TRADE3DB database configuration parameters relating to locking and output, as shown in Example 9-12.
Example 9-12: Database configuration setting of TRADE3DB
db2 get db cfg for trade3db #Trimmed results of db config Percent. of lock lists per application (MAXLOCKS) = 10 Lock timeout (sec) (LOCKTIMEOUT) = -1
The LOCKLIST parameter value can be calculated by examining the memory requirements of each lock, the maximum number of concurrent connections, and the average number of concurrent locks held by an application. Each initial lock on a DB2 object consumes 72 bytes of memory, with each additional lock on the same object consuming 36 bytes. The LOCKLIST parameter value is specified in 4-KB blocks, and may be computed as follows:
LOCKLIST = Maximum Conn App * Avg no of locks * 72/4096
The MAXLOCKS defines the percentage of LOCKLIST memory that can be assigned to single application at a given time before lock escalation occurs.
The considerations when setting maxlocks is to use it in conjunction with the size of the lock list (LOCKLIST). The actual limit of the number of locks held by an application before lock escalation occurs is:
MAXLOCKS * LOCKLIST * 4 096 / (100 * 36)
Where 4,096 is the number of bytes in a page, 100 is the largest percentage value allowed for maxlocks, and 36 is the number of bytes per lock. If you know that one of your applications requires 1000 locks, and you do not want lock escalation to occur, then you should choose values for maxlocks and locklist in this formula so that the result is greater than 1000.
The LOCKTIMEOUT parameter specifies the number of seconds that an application waits to obtain a lock. This helps avoid deadlocks for applications. In a transaction processing (OLTP) environment, you can use an initial starting value of 20 to 30 seconds.
If we see the values of MAXLOCKS, LOCKTIMEOUT database parameters of TRADE3DB, as shown in Example 9-12, they are not optimally tuned. Let us change these parameters as shown in Example 9-13 on page 386.
Example 9-13: Update the configuration
db2 update db cfg using MAXLOCKS 75 DB20000I The UPDATE DATABASE CONFIGURATION command completed successfully. db2 update db cfg using LOCKTIMEOUT 20 DB20000I The UPDATE DATABASE CONFIGURATION command completed successfully. SQL1363W One or more of the parameters submitted for immediate modification were not changed dynamically. For these configuration parameters, all applications must disconnect from this database before the changes become effective. db2stop force 11-25-2003 19:04:26 0 0 SQL1064N DB2STOP processing was successful. SQL1064N DB2STOP processing was successful. db2start 11-25-2003 19:04:41 0 0 SQL1063N DB2START processing was successful. SQL1063N DB2START processing was successful. db2 get db cfg for trade3db #Trimmed results of db config Percent. of lock lists per application (MAXLOCKS) = 75 Lock timeout (sec) (LOCKTIMEOUT) = 20
After changing these parameters we again applied the workload through the WPT stress application and examined the DB2DIAG.LOG file, and there are no further messages of lock escalation. In order to confirm it further, let us have a look at the snapshot shown in Example 9-14.
Example 9-14: Snapshot results after resolving concurrency problem
db2 get snapshot for database on trade3db #Trimmed results of db config Database Snapshot Database name = TRADE3DB Database path = C:\DB2\NODE0000\SQL Input database alias = TRADE3DB Database status = Active Locks held currently = 0 Lock waits = 0 Lock list memory in use (Bytes) = 9000 Deadlocks detected = 0 Lock escalations = 0 Exclusive lock escalations = 0 Agents currently waiting on locks = 0 Lock Timeouts = 5 Number of indoubt transactions = 0
This confirms that now we do not have concurrency problems.
Performance tuning is an iterative process, and you encounter the situation when you just resolved the one problem and stuck with the next one. So after resolving the problem of concurrency we again simulated the environment via the TRADE3 application by putting a stress through the stress application of WPT. We still found the performance issues, and in order to eradicate the problem we once again started our problem determination and tuning process.
As discussed in previous scenarios, we are evaluating only the key areas to determine a problem. Also here we are eliminating the scope of the concurrency problem, as it is already resolved in previous scenario. Below are the areas which we are going to explore:
Memory constraints
CPU resources
Disk I/O contention
Application server/application issues
Database configuration/SQL
Let us start analyzing them one by one to drill down to the root cause of the problem.
Memory constraints
We checked both DB2 UDB and WAS servers. None of them is showing any memory issues. We have confirmed this with the help of the prefmon tool of Windows.
CPU resources
In order to check the fate of CPU consumption we have verified with the Task Manager. The statistics of CPU indicate that the percentage of CPU consumption is under moderate limits, so CPU is not a problem in this case.
Disk I/O contention
The output of the statistics shown in Figure 9-14 on page 388 clearly indicate that there is high I/O activity is happening at the database server, which is probably causing poor performance of the system.
Figure 9-14: Disk I/O contention
Application server/application-related issues
We checked the O/S resources and processes related to Websphere there is no such indication of contention among resources. Also, the previous diagnosis shows that some high I/O activity is happening at the database server. So we continue with the DB2 UDB server investigation.
Database configuration/SQL
Below are the messages thrown by DB2 in the diagnostic file indicating that there is a BUFFERPOOL-related problem (Example 9-15).
Example 9-15: db2diag.log file to evaluate high CPU consumption scenario
2003-11-25-02.38.38.187000 Instance:DB2 Node:000 PID:2168(db2syscs.exe) TID:2468 Appid:G90126B9.C904.015645020606 buffer pool services sqlbKeepPagesPinned Probe:20 Database:TRADE3DB ADM6071I The maximum number of pinned pages allowed concurrently was reached in buffer pool "IBMDEFAULTBP" (ID "1"). As a result performance may not be optimal. Increasing the SORTHEAP database configuration parameter may reduce the chances of this condition occurring in the future. 2003-11-25-02.38.38.218000 Instance:DB2 Node:000 PID:2168(db2syscs.exe) TID:2468 Appid:G90126B9.C904.015645020606 buffer pool services sqlbKeepPagesPinned Probe:20 Database:TRADE3DB BP 1, requested pages to pin = 1, actual pinned = 0
The above analysis of the db2diag.log file in Example 9-15 on page 388 and snapshot in Example 9-16 on page 389 clearly indicate that the root cause of the problem is inadequate sizing of the bufferpool. The size of the default bufferpool IBMDEFAULTBP of the TRADE3DB database is creating a problem. This configuration of DB2 UDB results in a very high rate of I/O on the system when we are applying a workload through the TRADE3 application.
Example 9-16: Snapshot to evaluate high CPU consumption scenario
update dbm configuration using DFT_MON_BUFPOOL ON db2 get snapshot for bufferpools on tarde3db #Trimmed results of db config Buffer pool data logical reads = 91411 Buffer pool data physical reads = 181 Asynchronous pool data page reads = 9 Buffer pool data writes = 245 Asynchronous pool data page writes = 78 Buffer pool index logical reads = 151390 Buffer pool index physical reads = 203 Asynchronous pool index page reads = 1 Buffer pool index writes = 246 Asynchronous pool index page writes = 61 Total buffer pool read time (ms) = 9 Total buffer pool write time (ms) = 20943 Total elapsed asynchronous read time = 0 Total elapsed asynchronous write time = 14968 Asynchronous data read requests = 4 Asynchronous index read requests = 1 LSN Gap cleaner triggers = 8 Dirty page steal cleaner triggers = 0 Dirty page threshold cleaner triggers = 0 Time waited for prefetch (ms) = 2 Unread prefetch pages = 0 Direct reads = 528 Direct writes = 136 Direct read requests = 103 Direct write requests = 36 Direct reads elapsed time (ms) = 20 Direct write elapsed time (ms) = 323 Database files closed = 0 Data pages copied to extended storage = 0 Index pages copied to extended storage = 0 Data pages copied from extended storage = 0 Index pages copied from extended storage = 0
A buffer pool is a memory used to cache table and index data pages when they are being read from disk or being modified. The buffer pool improves database system performance by allowing data to be accessed from memory instead of from disk. Accessing it from the memory is much faster than disk access. The less often the database manager needs to read from or written to a disk, the better the performance.
As pages are read and modified, they accumulate in the database buffer pool. When a page is read in, it is read into a buffer pool page. If the buffer pool is full of modified pages, one of these modified pages must be written out to the disk before the new page can be read in. When the size of the bufferpool is small, a lot of disk I/O happens, which leads to performance degradation.
In the snapshot output shown in Example 9-17, look for the following logical reads and physical reads so that you can calculate the buffer pool hit ratio, which can help you tune your buffer pools.
Example 9-17: Buffer pool values
Buffer pool data logical reads = 91411 Buffer pool data physical reads = 181 Buffer pool data writes = 245 Buffer pool index logical reads = 151390 Buffer pool index physical reads = 203
The buffer pool hit ratio indicates the percentage of time the database manager needs to not load a page from disk to fulfill the request, as the page is already in the buffer pool. The higher the buffer pool hit ratio, the lower the frequency of disk I/O.
The above calculation shows all of the pages that has been cached by the buffer pool. Ideally this ratio should be 95 percent or above. So in order to increase the buffer pool hit ratio, increase the buffer pool size.
Take the formula mentioned in "System tuning" on page 272 while calculating the size of the bufferpool and review the results using the snapshot information of your environment. See the DB2 UDB Administration Guide - Performance, SC09-4821, for more information about bufferpool tuning.
We have identified the root cause of the problem as too small of the size of the bufferpool. Let us increase the size of the buffer pool to remove this problem using command shown in Example 9-18.
Example 9-18: Bufferpool settings
db2 alter bufferpool ibmdefaultbp immediate size 2000 DB20000I The SQL command completed successfully. db2stop force 11-26-2003 12:04:26 0 0 SQL1064N DB2STOP processing was successful. SQL1064N DB2STOP processing was successful. db2start 11-26-2003 12:04:41 0 0 SQL1063N DB2START processing was successful. SQL1063N DB2START processing was successful. db2 get snapshot for database manager
After changing these parameters shown in Example 9-18, we again applied the workload through the WPT stress application and examined the DB2DIAG.LOG file. Now there is no longer a buffer pool problem.
This scenario describes the problem of performance degradation due to high CPU consumption on both the DB2 UDB and WAS server. There can be many reasons of high consumption of CPU. It may be a system problem, DB2 UDB configuration problem, WAS configuration it or might be inefficient query design. So we are going to solve this high CPU consumption by following our problem determination methodology.
We have simulated this environment on DB2 UDB server by changing few configuration parameters, as shown in Example 9-19. We are using the TRADE3 application and the TRADE3DB database with Web Performance Tools (WPT) application stress. The details of hardware and software infrastructure used for this environment are the same as provided in the previous scenario.
Example 9-19: Database configuration change for high CPU consumption scenario
su - db2inst1 db2 update dbm cfg using intra_parallel yes DB20000I The UPDATE DATABASE MANAGER CONFIGURATION command completed successfully. SQL1362W One or more of the parameters submitted for immediate modification were not changed dynamically. Client changes will not be effective until the next time the application is started or the TERMINATE command has been issued. Server changes will not be effective until the next DB2START command. db2 update dbm cfg using num_poolagents 50 maxagents 500 DB20000I The UPDATE DATABASE MANAGER CONFIGURATION command completed successfully. SQL1362W One or more of the parameters submitted for immediate modification were not changed dynamically. Client changes will not be effective until the next time the application is started or the TERMINATE command has been issued. Server changes will not be effective until the next DB2START command. db2 update dbm cfg using DIAGLEVEL 4 DB20000I The UPDATE DATABASE MANAGER CONFIGURATION command completed successfully. db2stop force 11-21-2003 19:35:19 0 0 SQL1064N DB2STOP processing was successful. SQL1064N DB2STOP processing was successful. db2start 11-21-2003 19:36:19 0 0 SQL1063N DB2START processing was successful. SQL1063N DB2START processing was successful. db2 update db cfg for sample using dft_degree 16
In Example 9-19 on page 391 we have changed a few parameters to replicate the environment of high CPU consumption, especially database agents. This configuration of DB2 UDB results in a very high rate of agent and process creation on the system when appropriate workload is applied by the TRADE3 application, which results in high CPU consumption.
As we are working in our controlled environment, so we are diagnosing some of the key areas to depict this performance degradation problem, as shown below:
Memory constraints
Disk I/O contention
CPU resources
Application server/application issues
Database configuration/SQL
Let us start analyzing them one by one to drill down to the root cause of the problem.
Memory constraints
In order to verify this concerned area we used the perfmon and Task Manger tools of Windows. The statistics shown by the tools do not lead to a memory problem. It seems to be some other problem. Let us continue the analysis with the rest of the areas.
Disk I/O contention
We have ruled out the possibility of I/O contention, as we have already resolved this issue in a previous scenario. We must continue on with the rest of the possibilities.
CPU resources
Both DB2 UDB and WAS server are showing high CPU consumption. The statistics of CPU consumption of Helium - WAS server, shown in Figure 9-15 on page 394, clearly indicate that CPU is being used a lot.
Figure 9-15: CPU analysis for high CPU consumption scenario
Application server/application related issues
Let us further drill down to find out which process is consuming a lot of processing power of the CPU on Helium. We have checked the configuration of Websphere Application Server through the Tivoli Performance Viewer tool. Thread pool enables components of the server to reuse threads to eliminate the need to create new threads at run time, as creating new threads expends time and resources. Figure 9-16 on page 395 clearly indicates that there is a high number of thread creation and destruction going on, and at the same time the average number of concurrently active threads is very low. This may be the cause of high CPU consumption.
Figure 9-16: Tivoli Performance Viewer for High CPU consumption scenario output
We analyzed the Web container thread pool settings through the admin console (as shown in Figure 9-17), which can be a possible reason for this problem.
Figure 9-17: Webcontainer thread pool settings
Database configuration/SQL
Let us evaluate these DB2 UDB settings to resolve this problem. Below in Example 9-20 are the results of the DB2 diagnostic file DB2DIAG.LOG.
Example 9-20: db2diag.log file to evaluate high CPU consumption scenario
2003-11-26-16.36.52.312000 Instance:DB2 Node:000 PID:2524(db2syscs.exe) TID:3340 Appid:G90126B9.IE0A.015246232717 base sys utilities sqleGetAgentFromPool Probe:35 Database:TRADE3DB Attempting to STEAL an agent 2003-11-26-16.36.52.375000 Instance:DB2 Node:000 PID:2524(db2syscs.exe) TID:3340 Appid:G90126B9.IE0A.015246232717 base sys utilities sqleGetAgentFromPool Probe:35 Database:TRADE3DB Attempting to STEAL an agent 2003-11-26-16.36.52.453000 Instance:DB2 Node:000 PID:2524(db2syscs.exe) TID:3340 Appid:G90126B9.IE0A.015246232717 base sys utilities sqleGetAgentFromPool Probe:35 Database:TRADE3DB
The database manager configuration parameter for agent pool size (num_poolagents) affects the total number of subagents that can be kept associated with applications. If the pool size is too small and the pool is full, a subagent disassociates itself from the application it is working on and terminates. Because subagents must be constantly created and re-associated to applications, performance suffers.
In addition, if the value of num_poolagents is too small, one application may fill the pool with associated subagents. Then when another application requires a new subagent and has no subagents in its associated agent pool, it will steal subagents from the agent pools of other applications. This situation is costly, and causes poor performance.
The above analysis of db2diag.log file and snapshot in Example 9-21 on page 396 clearly indicate that there is a problem of stealing agents among applications, which eventually creates extra cost to the system. So the root cause of the problem is inappropriate configuration settings of DB2 agents and Websphere Web container thread pool settings.
Example 9-21: Snapshot to evaluate high CPU consumption scenario
update dbm configuration using DFT_MON_STMT ON db2 get snapshot for database manager Database Manager Snapshot Node name = Node type = Enterprise Server Edition with local and remote clients Instance name = DB2 Number of database partitions in DB2 instance = 1 Database manager status = Active Product name = DB2 v8.1.3.132 Service level = s030728 (WR21324) Private Sort heap allocated = 0 Private Sort heap high water mark = 48 Post threshold sorts = Not Collected Piped sorts requested = 128 Piped sorts accepted = 128 Start Database Manager timestamp = 11-26-2003 16:43:37.000031 Last reset timestamp = Snapshot timestamp = 11-26-2003 16:51:42.462675 Remote connections to db manager = 14 Remote connections executing in db manager = 14 Local connections = 0 Local connections executing in db manager = 0 Active local databases = 1 High water mark for agents registered = 125 High water mark for agents waiting for a token = 0 Agents registered = 125 Agents waiting for a token = 0 Idle agents = 0 Committed private Memory (Bytes) = 71647232 Switch list for db partition number 0 Buffer Pool Activity Information (BUFFERPOOL) = ON 11-26-2003 16:43:37.000031 Lock Information (LOCK) = ON 11-26-2003 16:50:19.273637 Sorting Information (SORT) = OFF SQL Statement Information (STATEMENT) = ON 11-26-2003 16:43:37.000031 Table Activity Information (TABLE) = OFF Take Timestamp Information (TIMESTAMP) = ON 11-26-2003 16:43:37.000031 Unit of Work Information (UOW) = OFF Agents assigned from pool = 446 Agents created from empty pool = 124 Agents stolen from another application = 1113 High water mark for coordinating agents = 15 Max agents overflow = 0 Hash joins after heap threshold exceeded = 0
Let us first tune the Websphere Application Server (Helium). As we have seen during the investigation of the Helium, which Tivoli Performance Viewer is showing a high number of thread creation and destruction, which may be the cause of high CPU consumption.
Take the help of Tivoli Performance Viewer to tune the Web container thread pool size settings. Apply the workload with the typical number of incoming client requests, fixed number of iterations, and use a standard configuration settings. Check the Percent Maxed and Active Threads counters of the Web container submodule of the Thread Pools module.
Simple rules are if the value of the Percent Maxed counter is consistently in the double digits, then the Web container could be a bottleneck and the number of threads should be increased.
If the number of active threads are significantly lower than the number of threads in the pool, this indicates a need to lower the thread pool size for a performance gain.
Figure 9-18 shows that we have changed the value of Maximum Size parameter to 40 threads. This must enhance the average of active threads, which helps to decrease the more often creation and destruction of the threads.
Figure 9-18: Thread pool settings of Web Container
Now let us rectify the DB2 UDB server problem of stealing agents, which we have found during the diagnostic process.
The ideal settings of agents for a server with a non-partitioned database and local and remote clients is:
maxagents/50 x max_querydegree or maxagents - max_coordagents.
Otherwise, a -1 value can be chosen for NUM_POOLAGENTS setting, which is a calculated value.
-1 stands for calculated value = maxagents - num_initagents.
If the ratio of agents created due to empty agent pool versus agents assigned from the pool is very high, it may indicate that the NUM_POOLAGENTS configuration parameter should be increased; otherwise, set the values of NUM_POOLAGENTS and MAXAGENTS to an appropriate ratio.
A low ratio suggests that NUM_POOLAGENTS is set too high, and that some of the agents in the pool are rarely used and are wasting system resources. A high ratio can indicate that the overall workload for this node is too high. You can adjust the workload by lowering the maximum number of coordinating agents specified by the MAXAGENTS configuration parameter.
See the DB2 UDB Administration Guide - Performance, SC09-4821, for more details.
Let us change the configuration parameters, as shown in Example 9-22, to resolve this problem.
Example 9-22: Num_PoolAgents configuration
Change agent parameters and show results db2 update dbm cfg using num_poolagents -1 DB20000I The UPDATE DATABASE MANAGER CONFIGURATION command completed successfully. SQL1362W One or more of the parameters submitted for immediate modification were not changed dynamically. Client changes will not be effective until the next time the application is started or the TERMINATE command has been issued. Server changes will not be effective until the next DB2START command. db2stop force 11-21-2003 19:35:19 0 0 SQL1064N DB2STOP processing was successful. SQL1064N DB2STOP processing was successful. db2start 11-21-2003 19:36:19 0 0 SQL1063N DB2START processing was successful. SQL1063N DB2START processing was successful. db2 update db cfg for trade3db using dft_degree -1 DB20000I The UPDATE DATABASE CONFIGURATION command completed successfully. db2 get snapshot for database manager
After applying the changes at DB2 UDB server and WAS server we again checked the resources. Now there is no problem of high CPU consumption and the problem has been resolved.
| < Day Day Up > |
|