contact contact

Database Performance Tuning

By Michael Marxmeier (mike@marxmeier.com)

Tuning the database performance is not a simple task and it depends on your specific requirements, the operating system and the target hardware. There is no "one fits it all" aproach. The goal is to avoid abvious slowdowns and balance the available resources (I/O bandwidth, memory and CPU).

Please note: This document is currently a draft and neither complete nor verified for its accuracy.

2002-04-22 - Update note on HP-UX buffer cache, add note on IPC communication
2002-02-13 - Add note on limiting HP-UX buffer cache
2002-02-13 - Add note on HP-UX patch PHKL_24005
2001-11-17 - Remove information refering to obsolete Linux 2.0 kernel and add a few clarifications.

Async or Sync mode writing mode

The biggest difference in database write performance depends on the decision to use Asnc or Sync writing mode. As of A.06.20 eloqdb6 defaults to Sync writing mode.

In Sync writing mode the database is more resistant against operating system failures but it causes a significant load on the disk subsystem. In Sync write mode the operating system is forced to write committed transactions and some internal structural changes to disk. This is used to protect the database against the possible loss of buffered information when the system goes down.

Unless your system is not protected against power failure, is suspectable to fail for one reason or another or you need the best possible protection against downtime you should consider to use the async write mode.

To configure the eloqdb6 for Async write mode you need to change the eloqdb6.cfg configuration file:

SyncMode = 0
Note: A.06.20 defaults to Sync write mode. When updating from an A.06.10 revision you should make sure the corect SymcMode setting is in your eloqdb6.cfg configuration file.

Note: When importing a database (dbimport) is may be helpful to temporarily switch to Async write mode even if you usually use the Sync write mode.

Optimizing the filesystem performance for large files

Especially when using sync writing mode, the file system block size could have a big impact on the system performance.
While this section mostly applies to Linux based systems the background information could also be interesting for tuning HP-UX based systems.

The eloqdb6 database server always accesses the volume files in 8 KB blocks. Often those read or write requests are combined to access up to 64 KB at a time.

UNIX file systems are usually organized in blocks. The file system block size can be choosen at file system creation time (eg. on Linux by using the -b 4096 command line argument to mk2efs would result in a 4KB block size).

In order to maintain your data on the disk, the operating system maintains additional information where your data is located on disk. When files get bigger so does the overhead to keep track where your data is located. Enlarging the file system block size greatly reduces the amount of overhead required to maintain your data if you use big files (as eloqdb6 does) at the expense of using slightly more disk space for small files.

In addition to the data blocks (holding the file contents) additional information is maintained where the data blocks are actually located on the disk. When the file size exceeds a trivial size, the location of the data blocks is also maintained in a separate block on disk (indirect blocks - which the OS also needs to keep track of). At some point (depending on file size and block size) you need blocks that specify the location of blocks that specify the location of your data on disk (double indirection). At this point maintaining this information becomes a factor when reading or writing the database.

While a block size of 8 KB would be a "perfect fit", the max. usable block size on Linux depends on the architecture (processor). On the x86 this is 4 KB. We recommend to specify a block size of 4 KB for the Linux platform to create a file system dedicated to hold your database.

This filesystem block size is even more important when using the sync writing mode with the Linux ext2 file system. The algorithm used by the ext2 filesystem for syncing buffers to disk is inefficient for big files. While this has improved with the Linux 2.4 kernel it could be a problem with Linux 2.2 based installations. Using a small block size could result in system delays and the system may seem to stall for a few seconds when eloqdb6 performs a checkpoint operation (default every 60 seconds). This gets noticeable when the database volume files grow beyound trivial sizes (300 MB and up) and gets worse the bigger the files get.

As a solution make sure you use a 4 KB block size with the ext2 filesystem. Besides reducing the overhead it also greatly reduces the impact of the ext2 sync inefficiency. The file system block size can be choosen at file system creation time (eg. on Linux by using the -b 4096 command line argument to mk2efs would result in a 4KB block size). The the default block size for the ext 2 filesystem used to be 1 KB but in recent releases this has been changed to 4 KB.

The HP-UX platform is less likely affected by this. The hfs file system (which uses indirect blocks) uses a default block size of 16 KB and should no longer be used for anything besides /stand. The VxFS file system (which is the default) is extent based, that means the information where data blocks are located on disk is maintained differently.

HP-UX buffer cache

On HP-UX with large memory configuration you may want to limit the amount of memory which gets used as buffer space. In async write mode, pending write operations could add up in memory and the I/O system could be unable to keep up.
It has been observed that HP-UX seems to gather pending changes in memory and send them to the disc at once which causes unnecessary pressure on memory and I/O.

You may encounter a periodic system freeze (1 or 2 seconds, or more, every 30 seconds or so). In this case, the cause is often a large buffer cache whose dirty pages must be flushed every 30 seconds by the syncer process. By default, at HP-UX 10.X, the buffer cache is dynamic and is allowed to expand to take up to 50% of main memory. On systems experiencing these system freezes, setting the buffer cache to a smaller value, starting with perhaps 10% of memory, and adjusting the buffer cache may resolve this performance problem.

The problem has been determined to be either a file system I/O bottleneck or a memory bottleneck. In either case, adjusting the size of the buffer cache may resolve the problem. The size of the buffer cache is controlled by the kernel parameters bufpages, nbuf, dbc_max_pct, and dbc_min_pct.

Another option is to run the syncer more often (for example each 5 seconds instead of the default 30 seconds). This should cause dirty cache pages written to disk earlier instead of accumulating in memory.

Maximize your disk performance

If you have multiple disk drives it makes sense to split the load to multiple disk drives. For example have the data volume on a different drive than the log volume.
The log volume is heavily used for write operations (especially in sync write mode) as all transactions need to be saved to the transaction log before applied to the data volume(s). The load on the data volume(s) usually tends to be more read oriented and accesses is significantly reduced by caching the disk content.

In addition the performance of the disk subsystem can be improved by combining multiple disks into a logical disk drive. This can be achieved by using a RAID0 (striping) or RAID5 (using more than 2 disks) configurations. RAID1 (mirroring) increases the read performance but may be slightly slower during write. For RAID 5 configurations choosing the right stripe size is important for the performance.

Enable IPC communication method

Especially for read performance the Eloquence database is affected by communication latency and overhead. Read operations are typically cheap (especially if the information is present in the eloqdb6 cache) and the communication overhead could be significant in this situation.
While not much can be done about it when operating in a network environment (where client and server run on different machines) enabling the IPC communication provides up to 30% better performance in situations where the data base server and client run on the same system.

The IPC communication uses a shared memory segment for communication between client and server which results in significant less communication overhead (measured as systm time) then using sockets since data need not be copied in the kernel and the networking subsystem is avoided.

To enable IPC communication you need to change the eloqdb6.cfg configuration file:

EnableIPC = 1
On HP-UX using IPC communication requires adapting the kernel configuration for semaphores and SYSV shared memory segments. Please refer to the Eloquence Installation and Configuration manual for details.
On Linux the default kernel limits should usually be sufficient and need not be changed.

With more than 400 connected users, the current IPC communication method has been found to cause a a significant slowdown. This seems to be a limitation of HP-UX which to handle a big number of shared memory segments efficiently. Eloquence A.07.00 will provide an option to use a common shared memory segment with multiple client programs. For Eloquence versions before A.07.00 it is recommended to disable IPC communication with more than 400 connected users.

The IPC communication method is currently disabled by default because problems have been reported on HP-UX systems which are not current on kernel patches. Please make sure you have current kernel patches installed before enabling IPC communication.

For HP-UX 11.0 patch PHKL_24005 seems to fix this problem.
For HP-UX 10.20 no specific patch has been identified by now.

In rare situations a writev() call could fail with an EFAULT (indicating a wrong memory access has detected in the kernel) although the arguments to the system call are correct. The EFAULT causes eloqdb6 to abort with an internal error ("panic") and re-start automatically (depending on the panic= configuration in the eloqdb6.cfg).
While this kernel problem has only affected few installations and happens rarely (read once in a couple of months) we have decided to disable IPC communication by default until a specific kernel patch could be identified as a requirement.

Data base cache size

Almost all disk access is performed through the internal eloqdb6 buffer cache. The bigger it gets the more disk accesses can be avoided. The default size of 5 MB which is almost certainly not sufficient for a serious server.

Of course you need to make sure that enough RAM is available to run the other processes without reverting to paging as this will immediately kill performance. In addition, the eloqdb6 to some extend relies on the buffer cache of the operating system to buffer write requests temporarily. So it is also important to leave enough memory available for the OS buffer cache as otherwise additional disk access is required.

Checkpoint size

A checkpoint operation is executed regulary by the eloqdb6 server (by default each 60 seconds). If the discardable size of the log volume exceeds the configured amount, the checkpoint operation is executed immediately at a higher priority to keep the log volume size within the configured limit.

The checkpoint operation is used to prune committed transactions from the log volume after making sure all data bas been written to the data volume (this involves a sync on the data volume in sync write mode).

The check point size should be choosen that all committed transactions in the configured checkpoint interval fits in the log volume to avoid executing checkpoint operations too often. In addition the available memory in the system should be taken into account as in asnyc mode we usually want to avoid changes to the transaction log really hit the disk.

Use transactions

The Eloquence always uses transactions for all database operations which modify content. If you don't use transactions in your program, each database operation is executed with it's own transaction. When using transactions in your program you are in control over the life time of a transaction. Database operations use sub-transactions in this case.

Since committing a transaction involves some overhead (and a sync operation in Sync write mode) grouping transaction could make a big difference.

  Privacy | Webmaster | Terms of use | Impressum Revision:  2002-04-22  
  Copyright © 1995-2002 Marxmeier Software AG