kfogarty.com - Database Consultants

Website of Kenny Fogarty. An independent database consultant, based in London, England.

del.icio.us |Digg it |Earthlink |Furl |iFeedReaders |ma.gnolia |Maple.nu |Netvouz |Netscape |RawSugar |reddit |Scuttle |Shadows |Simpy |Spurl |StumbleUpon |Wink |Yahoo MyWeb |

DB2 Estimator

Filed under: Basics, DB2 — Kenny at 8:40 pm on Saturday, December 2, 2006

DB2 Estimator is a free stand-alone tool runs on a personal computer and can be that can be used to estimate the performance of applications that run against DB2 V5, V6, V7 or V8 subsystems. DB2 Estimator does not need to connect to a DB2 subsystem in order to function.

DB2 Estimator can be used to provide a table sizing information, and can be used to provide detailed performance analysis of an entire DB2 application.

DB2 Estimator is a proven aid to application DBA’s and can help to save time, lower costs migrate applications into production environments, enhance and modify existing applications.

DB2 Estimator can be used throughout all phases of the life cycle of a relational database application.

Initial Design Phase

In the initial design phase, the detailed specifications of the tables,indexes and SQL statements used by your application might not be available. Using DB2 Estimator, you can quickly and easily determine whether your proposed design is feasible and optimal.

Tables can be defined in DB2 Estimator ahead of time. Columns and significant attributes of the table can be entered into DB2 Estimator. Index creation is as simple as choosing the columns that need to be added From these basic definitions, DASD space requirements can be viewed, giving you a ball-park figure for the space needed to accomodate your application.

SQL statements can be defined or imported into DB2 Estimator which will approximate those statements that will be used within the application. Then DB2 Estimator can be used to view the cost and performance of running those SQL statements in terms of CPU and I/O capacity. SQL statements can be grouped together to model DB2 transactions and costs can then be viewed at the transaction level.

Finally, DB2 applications can be modelled by grouping transactions and specifying a rate for each transaction type. DB2 estimator estimates the amount of CPU and I/O capacity needed to run the application in your environment as well as the average elapsed time for each transaction type. The sample results screen contains the results of such an estimate.

DB2 Estimator can be used to model standalone DB2 systems, or distributed DB2 Server Systems and include Data-Sharing. DB2 Estimator also provides several predefined utilities and determine the costs of running them.

DB2 Estimator allows the definitions contained within the project to help evaluate alternative application designs. By comparing the cost and performance of these designs, it is possible to determine which design provides the optimum cost/performance ratio before valuable time and resources are invested in developing the application or creating a real database.

Developing Application and Database Objects

When you are ready to create real database objects, you can use the models specified in DB2 Estimator as a guide for naming columns and specifying attribute values. As application modules or individual programs are designed and coded, DB2 Estimator can be used to get a quick estimate of the expected performance without the need for creating real database objects. These estimates can alert you to problems that are easiest to fix at this early stage.

As application code and database design nears completion. DB2 Estimator can be used to ensure that performance is still optimal.

The accuracy of the estimates generated by DB2 Estimator can be enhanced by inputting data into DB2 Estimator from the DB2 EXPLAIN statement, or from such tools as DB2 PM.

Production Application

Performance monitoring tools such as Mainview or DB2PM are commonly used to detect application performance problems during production. Common solutions to these performance problems include changes in index and SQL design. DB2 Estimator can be used to evaluate alternative SQL designs and the resulting effects on system performance before any production database objects are modified. DB2 Estimator can also help to determine the impact resulting from a hardware and/or workload change.

Major Benefits of DB2 Estimator

  • Can provide accurate estimates of DB2 application capacity requirements
  • Can handle DB2 UDB for OS/390 versions 5, 6, and 7
  • The easiest way to estimate performance of alternative Table structures and SQL coding. DB2 application capacity requirements
  • Lets you view costs instantly online, and print or export files to spreadsheet applications
  • Easily obtain DASD size calculations by table, index and application.
  • Provides online help and an easy-to-use graphical user interface
  • Can be used without a connection to a DB2 subsystem
  • Can be used as a training and learning tool
  • You can download table, index, view and SQL definitions via flat files

Assumptions
DB2 Estimator assumes that only one table exists per table space.
DB2 Estimator assumes that each table and index is on a separate disk device and that each partition of a table or index is on a separate device.
It is not possible to override DB2 Estimator and allocate more than one table or index on a disk device.
It is not possible to put more than one table in a table space.

With this simplification, DB2 Estimator does not need to mention table spaces in the table sizing reports, utility performance estimates, or in Capacity Runs reports.

Sphere: Related Content

del.icio.us |Digg it |Earthlink |Furl |iFeedReaders |ma.gnolia |Maple.nu |Netvouz |Netscape |RawSugar |reddit |Scuttle |Shadows |Simpy |Spurl |StumbleUpon |Wink |Yahoo MyWeb |

SELECT *

Filed under: Basics, DB2 — Kenny at 3:23 pm on Tuesday, October 31, 2006

It is best enforce a standard whereby SELECT * is not used in any host programs.

The SELECT * statement is a temptation because it allows the programmer to avoid having to explicitly name each of the required columns. However, this convienience can be costly in terms of CPU time, elapsed time and maintenance of host programs.

When an SQL statement SELECTs data for use by a program, the program must have a host variable to receive a value from each column selected. If a column were added to the table, the embedded SELECT * would automatically pass its value to the program. If there is no host variable defined for the new column, there is no place to store the returned value for access by the program.

Similarly, if a column is deleted from the table, SELECT * would not return the values expected by the program, as one of the columns is no longer present.

If the columns required by the program are listed explicitly, any changed and/or deleted columns would not affect the program unless the program explicitly references a column that has been removed.

As stated above, SELECT * is a temptation to code because it avoids the programmer having to name each of the columns required by the program. However, this convenience can have a detrimental effect on CPU time, elapsed time, and maintenance of host programs (see above).

If SELECT * is used within the code of a host program, there could be instances where an unnecessary number of columns are fetched from the database and passed back to the host program. (The more columns that are processed, the more CPU is required). Also, if there is a large number of columns returned, it is less likely that a sort can be done in a sort pool in central storage, and will increase the number of work files that must be written to DASD and then merged, before the rows are returned to the host program.

If the rows returned by the SELECT * is from a distributed process, the rows must be returned over the network, which can be expensive, and cause bottlenecks and other network/performance issues.

It increases the likelihood that a compiled column assignment procedure is greater than 4 KB at bind time and thus not stored in the plan or package. A compiled assignment procedure is used rather than an interpretative process to move columns to host variable output area. If more than 100 rows are fetched, the procedure is compiled each time the SQL is executed rather than only at bind time, which increases the CPU time.

Sphere: Related Content

del.icio.us |Digg it |Earthlink |Furl |iFeedReaders |ma.gnolia |Maple.nu |Netvouz |Netscape |RawSugar |reddit |Scuttle |Shadows |Simpy |Spurl |StumbleUpon |Wink |Yahoo MyWeb |

General Rules of Thumb

Filed under: Basics, DB2 — Kenny at 8:42 am on Tuesday, October 31, 2006

Here are some general rules of thumb that can be used in a DB2 environment:

  • Never use SELECT *:
    As a general rule, a query should never ask DB2 for anything more than is required to satisfy the desired task. For maximum flexibility and efficiency, each query should access only the columns needed for the function that will be performed.

  • Singleton SELECT versus the cursor:
    To return a single row, an application program can use a cursor or a singleton SELECT. A cursor requires an OPEN, FETCH, and CLOSE to retrieve one row, whereas a singleton SELECT requires only SELECT… INTO. Usually, the singleton SELECT outperforms the cursor.

    When the selected row must be updated after it is retrieved, however, using a cursor with the FOR UPDATE OF clause is recommended over a singleton SELECT. The FOR UPDATE OF clause ensures the integrity of the data in the row because it causes DB2 to hold an exclusive lock on the page containing the row to be updated. The singleton select provides no such locking.

  • Use FOR FETCH ONLY:
    When a SELECT statement will be used only for retrieval, use the FOR FETCH ONLY clause.
  • Avoid using DISTINCT:

    The DISTINCT verb removes duplicate rows form an answer set. If duplicates will not cause a problem, do not code distinct, because it adds overhead by invoking a sort to remove the duplicates.

  • Limit the data selected:
    Return the minimum number of columns and rows needed by your application program by making efficient use of the WHERE (SQL predicate) clause. It is almost always more efficient to allow DB2 to use the WHERE clause to limit the data returned.
  • Code predicates on indexed columns:

    DB2 usually performs more efficiently when it can satisfy a request using an existing index rather than no index. Design all SQL statements to take advantage of indexes.

  • Multicolumn indexes:
    If a table has only multicolumn indexes, try to specify the high-level column in the WHERE clause of your query. This results in an index scan with at least one matching column.
  • Use ORDER BY only when sequence is important:
    Code the ORDER BY clause when the sequence of rows being returned is important. Order only those columns that are absolutely necessary in order to improve efficiency.

    Use equivalent data types: Use the same data types and lengths when comparing column values to host variables or literals. This eliminates the need for data conversion and allows for the use of an index. For example, comparing a column defined as CHAR(6) to a field which is CHAR(4) or CHAR(7) will cause data conversion and should be avoided at all costs. The easiest way to ensure datatype consistency is to use the DCLGEN fields whenever possible.

  • Use BETWEEN rather than <= and >= :
    BETWEEN allows the optimizer to select a more efficient access path.
  • Use IN instead of LIKE:
    If you know that only a certain number of occurrences exist, using IN with the specific list is more efficient than using LIKE. The functionality of LIKE can be imitated using a range of values. For example, if you want to retrieve all employees with a last name starting with “M,” use BETWEEN ‘maaaaaaaaaaaaaaa’ and ‘mzzzzzzzzzzzzzzz’ instead of LIKE ‘m%’
  • Avoid using NOT (except with EXISTS):
    Not should only be used as an alternative to very complex predicates.
Sphere: Related Content

del.icio.us |Digg it |Earthlink |Furl |iFeedReaders |ma.gnolia |Maple.nu |Netvouz |Netscape |RawSugar |reddit |Scuttle |Shadows |Simpy |Spurl |StumbleUpon |Wink |Yahoo MyWeb |

DB2 Performance Monitoring Rules of Thumb

Filed under: Basics, DB2 — Kenny at 10:00 pm on Friday, October 6, 2006

Some performance monitoring rules of thumb:

Do not overdo monitoring and tracing
DB2 performance monitoring can consume a tremendous amount of resources. Sometimes the associated overhead is worthwhile because the monitoring (problem determination or exception notification) can help alleviate or avoid a problem. However, absorbing a large CPU overhead for monitoring a DB2 subsystem that is already performing within the desired scope of acceptance is not worthwhile.

Excessive overhead caused by monitoring and tracing can turn a reasonably well performing DB2 subsystem into a poor one.

Plan and implement two types of monitoring strategies

  1. Ongoing performance monitoring to eliminate exceptions
  2. Develop procedures for monitoring exceptions after they have been observed.

use the correct tool for the job
It would be unwise to turn on a trace that causes 200 percent CPU overhead to solve a production problem that could be solved just as easily by other types of monitoring (using EXPLAIN or DB2 Catalog reports, for example).

Tuning should not be the be all and end all
Establish the goals of performance tuning in advance and stop when they have been achieved.
Tuning can go beyond the point at which reasonable gains can be realized for the amount of effort exerted.

Remember the 80/20 rule

80% of performance tuning results will come from 20% of your efforts.

Sphere: Related Content

del.icio.us |Digg it |Earthlink |Furl |iFeedReaders |ma.gnolia |Maple.nu |Netvouz |Netscape |RawSugar |reddit |Scuttle |Shadows |Simpy |Spurl |StumbleUpon |Wink |Yahoo MyWeb |

What is a Plan?

Filed under: Basics, DB2 — Kenny at 12:36 pm on Monday, September 25, 2006

In DB2 terms, a plan is an executable module containing the access path logic produced by the DB2 optimizer.

A plan can be composed of one or more DBRMs and packages.

Before a DB2 for z/OS program (with static SQL) can be run, it must have a plan associated with it.

Plans are created by the DB2 BIND command. The plan is stored in the DB2 directory and accessed when its program is run. Information about the plan is stored in the DB2 catalog.

Sphere: Related Content

del.icio.us |Digg it |Earthlink |Furl |iFeedReaders |ma.gnolia |Maple.nu |Netvouz |Netscape |RawSugar |reddit |Scuttle |Shadows |Simpy |Spurl |StumbleUpon |Wink |Yahoo MyWeb |

What is a Package?

Filed under: Basics, DB2 — Kenny at 5:41 am on Monday, September 25, 2006

A package is a single, bound DBRM with optimized access paths.

By using packages, the table access logic is “packaged” at a lower level of granularity than a plan — at the package (or program) level.

To execute a package, you first must include it in the package list of a plan (usually, there are some exceptions, such as for triggers and user-defined functions). In general, packages are not directly executed, they are only indirectly executed when the plan in which they are contained executes (UDFs and triggers are exceptions to this rule).

A plan can consist of one or more DBRMs, one or more packages or, a combination of packages and DBRMs.

Sphere: Related Content

del.icio.us |Digg it |Earthlink |Furl |iFeedReaders |ma.gnolia |Maple.nu |Netvouz |Netscape |RawSugar |reddit |Scuttle |Shadows |Simpy |Spurl |StumbleUpon |Wink |Yahoo MyWeb |

What is SQL?

Filed under: Basics, DB2 — Kenny at 12:55 pm on Sunday, September 24, 2006

Structured Query Language (SQL) is a language that can be used for data definition, data manipulation, database queries, and authorization administration for all database functions. SQL is a set-processor, non-procedural, and non-navigational language and consists of 3 components:

  1. Data Manipulation Language (DML) - statements that are used to retrieve, change, and delete data (i.e. SELECT, UPDATE, INSERT, DELETE).
  2. Data Definition Language (DDL) - statements used to define DB2 objects such as databases and tables (i.e. CREATE, ALTER, DROP).
  3. Data Control Language (DCL) - statements used to administer security for DB2 objects (i.e. GRANT, REVOKE).

SQL can be used in several modes which are described below:
Dynamic (Interactive) SQL: SQL statements that can be coded and executed interactively from terminals using various tools provided by the DBMS (For example, SPUFI and QMF).

Static (Embedded) SQL

A static SQL statement is an SQL statement that can be hardcoded in an application program at development time because information about the structure and objects (such as tables, column, and data types) with which it is intended to interact with is known in advance.

Since the details of a static SQL statement are known at development time, the work of analyzing the statement and selecting the optimum data access plan for executing the statement is performed by the DB2 optimizer as part of the development process. Because their operational form is stored in the database (as a package) and does not have to be generated at application runtime, static SQL statements execute quickly.

The downside to this approach is that all static SQL statements must be prepared (in other words, their access plans must be generated and stored in the database) before they can be executed. Furthermore, static SQL statements cannot be altered at runtime, and each application that uses static SQL must bind its operational packages to every database with which the application will interact. Additionally, because static SQL applications require prior knowledge of database objects, changes made to those objects after an application has been developed can produce undesirable results.

The following are examples of static SQL statements:

  • SELECT COUNT(*) FROM employee
  • UPDATE employee SET lastname = ‘Jones’ WHERE empid = ‘001′
  • SELECT MAX(salary), MIN(salary) INTO :MaxSalary, :MinSalary FROM employee

Generally, static SQL statements are well suited for high-performance applications that execute predefined operations against a known set of database objects.

Dynamic SQL

Although static SQL statements are relatively easy to incorporate into an application, their use is somewhat limited because their format must be known in advance.

Dynamic SQL statements, offer more flexibility because they can be constructed at application runtime. Information about a dynamic SQL statement’s structure and the objects with which it plans to interact does not need to be known in advance, and because dynamic SQL statements do not have a precoded, fixed format, the data objects they reference can change without affecting the statement (provided that objects referenced by the statement are not deleted).

Even though dynamic SQL statements are generally more flexible than static SQL embedded SQL programming statements, they are usually more complicated to incorporate into an application. And because the work of analyzing the statement to select the best access plan is performed at application runtime (again, by the DB2 optimizer), dynamic SQL statements can take longer to execute than their static SQL counterparts. (Because dynamic SQL statements can take advantage of the database statistics available at application runtime, there are some cases in which a dynamic SQL statement executes faster than an equivalent static SQL statement, but those are the exception and not the norm.)

The following are examples of dynamic SQL statements:

  • SELECT COUNT(*) FROM ?
  • INSERT INTO EMPLOYEES VALUES (?, ?)
  • DELETE FROM DEPARTMENT WHERE DEPTID = ?

Generally, dynamic SQL statements are well suited for applications that interact with a rapidly changing database or that allow users to define and execute ad-hoc queries.

Sphere: Related Content

del.icio.us |Digg it |Earthlink |Furl |iFeedReaders |ma.gnolia |Maple.nu |Netvouz |Netscape |RawSugar |reddit |Scuttle |Shadows |Simpy |Spurl |StumbleUpon |Wink |Yahoo MyWeb |

Bind or Rebind?

Filed under: Basics, DB2 — Kenny at 9:26 am on Friday, September 22, 2006

Two commands, similar in function, but what’s the difference between them, and when should you use BIND, but not REBIND, and vice-versa?
Invalid Plans and Packages
BIND PLAN

The BIND PLAN command builds an application plan. All DB2 programs require an application plan to allocate DB2 resources and support SQL requests made at run time. In general, BIND is used when the source code has changed. If the source code has changed, then you must precompile and compile the program. The new load module will have a new timestamp/token. To move that same consistency token into DB2, you must do a BIND with the new DBRM in the DBRM library as input.

BIND PACKAGE

The BIND PACKAGE command builds an application package. DB2 records the description of the package in the catalog tables and saves the prepared package in the directory.

REBIND PLAN

The REBIND PLAN command rebinds an application plan when when changes are made that affect the plan, but do not change the SQL statements in the programs. In general, REBIND is used when something other than the source code has changed.

For example, REBIND PLAN can be used when authorizations are changed, or if a new index has been created for the plan, or if the RUNSTATS utility has been run on tables accessed by the plan.

If the rebind is successful, the process prepares an application plan and updates its description in the SYSIBM.SYSPLAN catalog table.

REBIND PLAN is generally faster and more economical than BIND PLAN. But if the SQL statements have been changed, or the program has been recompiled, the BIND PLAN command should be used with ACTION(REPLACE).

REBIND PACKAGE

The REBIND PACKAGE command rebinds an application package when you make changes that affect the package, but have not changed the SQL statements in the program. For example, you can use REBIND PACKAGE when you change the authorizations, create a new index for the package, or use RUNSTATS.

When the REBIND PACKAGE(*) command is issued, trigger packages will not be affected.

REBIND PACKAGE is generally faster and more economical than BIND PACKAGE. You should use BIND PACKAGE with the ACTION(REPLACE) option under the following conditions:

  • When you change the SQL statements
  • When you recompile the program
  • You previously ran BIND PACKAGE with the SQLERROR(CONTINUE) option

NOTES

The BIND command cannot be issued when the following tablespaces are being reorganised: SYSDBAUT, SYSDBASE, SYSGPAUT, SYSPKAGE, SYSPLAN, SYSSTATS, SYSUSER and SYSVIEWS.

With a REBIND, good results are as likely to occur as bad results.
It is possible to limit potential negative impact of a rebind command by rebinding the affected package into a practice collection with EXPLAIN(YES). This package can then be freed, and the plan_table analysed to confirm the access path has either improved or remained the same.

Remember: not all improvements are reflected in the plan_table. For example, the plan_table cannot show improved index screening with list prefetch or improved lock avoidance that comes from changing REBIND parameters.

Even if the access path hasn’t changed, it is certain that there is no new negative access path, and it might be an idea to rebind into production to take advantage of the hidden, subtle improvements.

Other References:
http://db2mag.com/db_area/archives/2002/q1/programmers.shtml

Sphere: Related Content

del.icio.us |Digg it |Earthlink |Furl |iFeedReaders |ma.gnolia |Maple.nu |Netvouz |Netscape |RawSugar |reddit |Scuttle |Shadows |Simpy |Spurl |StumbleUpon |Wink |Yahoo MyWeb |

What is a DBRM?

Filed under: Basics, DB2 — Kenny at 12:30 pm on Monday, September 18, 2006

DBRM is an acronym for Data Base Request Module.

A DBRM is the output of the pre-compilation of source code that contains SQL requests. These SQL requests are extracted by the pre-compiler and written to a member of an MVS dataset. Along with the SQL itself, contextual information, such as the data type and length of the host variables used by the statements, is stored in the DBRM.

In the source, the pre-compiler replaces the SQL statements by calls to the DB2 run-time interface and this modified source is input to the usual compile and link-edit routines. The DBRM is used by DB2 at bind time, when one or more DBRMs are put together to form a DB2 plan.

This plan will be used at run time, together with the accompanying load module(s).

See also:
SYSIBM.SYSDBRM table.

Sphere: Related Content

del.icio.us |Digg it |Earthlink |Furl |iFeedReaders |ma.gnolia |Maple.nu |Netvouz |Netscape |RawSugar |reddit |Scuttle |Shadows |Simpy |Spurl |StumbleUpon |Wink |Yahoo MyWeb |

DB2 Traces

Filed under: Basics, DB2, Traces — Kenny at 5:42 am on Monday, September 18, 2006

DB2 traces record diagnostic information describing particular events within a DB2 subsystem. As DB2 operates, trace information is written to various destinations. This trace information can be read and analyzed to obtain information on subsystem and application performance.

DB2 provides six types of traces, and each describes information about the DB2 environment. These six types of traces are outlined in the following table:

Trace Started By Description
Accounting DSNZPARM or START TRACE command Records performance information about the execution of DB2 application programs
Audit DSNZPARM or START TRACE command Provides information about DB2 DDL, security, utilities, and data modification
Global DSNZPARM or START TRACE command Provides information for the servicing of DB2
Monitor DSNZPARM or START TRACE command Records data useful for online monitoring of the DB2 subsystem and DB2 application programs
Performance START TRACE COMMAND Collects detailed data about DB2 events, enabling database and performance analysts to pinpoint the causes of performance problems.
Statistics DSNZPARM or START TRACE command Records information regarding the DB2 subsystems use of resources.

DB2 traces can be started in two ways; By specifying the appropriate DSNZPARM values at DB2 startup, or by using the -START TRACE command to initiate specific traces when DB2 is already running.

Each trace is broken down further into classes, each of which provides information about aspects of that trace. Classes are composed of IFCIDs. An IFCID (sometimes pronounced if-kid) defines a record that represents a trace event. IFCIDs are the single smallest unit of tracing that can be invoked by DB2.

Sphere: Related Content