By Michael Marxmeier (mike@marxmeier.com)
Tuning the database performance is not a simple task and it
depends on your specific requirements, the operating system
and the target hardware. There is no "one fits it all" aproach.
The goal is to avoid abvious slowdowns and balance the available
resources (I/O bandwidth, memory and CPU).
Please note: This document is currently a draft and neither
complete nor verified for its accuracy.
Revision:
2002-04-22 - Update note on HP-UX buffer cache, add note on IPC communication
2002-02-13 - Add note on limiting HP-UX buffer cache
2002-02-13 - Add note on HP-UX patch PHKL_24005
2001-11-17 - Remove information refering to obsolete
Linux 2.0 kernel and add a few clarifications.
Async or Sync mode writing mode
The biggest difference in database write performance depends on the
decision to use Asnc or Sync writing mode.
As of A.06.20 eloqdb6 defaults to Sync writing mode.
In Sync writing mode the database is more resistant against
operating system failures but it causes a significant
load on the disk subsystem.
In Sync write mode the operating system is forced to write committed
transactions and some internal structural changes to disk.
This is used to protect the database against the possible loss
of buffered information when the system goes down.
Unless your system is not protected against power failure, is
suspectable to fail for one reason or another or you need the best
possible protection against downtime you should consider to use the
async write mode.
To configure the eloqdb6 for Async write mode you need to change
the eloqdb6.cfg configuration file:
[Server]
...
SyncMode = 0
Note: A.06.20 defaults to Sync write mode. When updating from an A.06.10
revision you should make sure the corect SymcMode setting is in your
eloqdb6.cfg configuration file.
Note: When importing a database (dbimport) is may be helpful to
temporarily switch to Async write mode even if you usually use the
Sync write mode.
Optimizing the filesystem performance for large files
Especially when using sync writing mode, the file system block size
could have a big impact on the system performance.
While this section mostly applies to Linux based systems the background
information could also be interesting for tuning HP-UX based systems.
The eloqdb6 database server always accesses the volume
files in 8 KB blocks. Often those read or write requests are
combined to access up to 64 KB at a time.
UNIX file systems are usually organized in blocks.
The file system block size can be choosen at file system creation
time (eg. on Linux by using the -b 4096 command line argument to
mk2efs would result in a 4KB block size).
In order to maintain your data on the disk, the operating system
maintains additional information where your data is located on disk.
When files get bigger so does the overhead to keep track where your
data is located.
Enlarging the file system block size greatly reduces the amount of
overhead required to maintain your data if you use big files (as
eloqdb6 does) at the expense of using slightly more disk space for
small files.
In addition to the data blocks (holding the file contents)
additional information is maintained where the data blocks are actually
located on the disk. When the file size exceeds a trivial size, the
location of the data blocks is also maintained in a separate block
on disk (indirect blocks - which the OS also needs to keep track of).
At some point (depending on file size and block size) you need blocks
that specify the location of blocks that specify the location of your
data on disk (double indirection). At this point maintaining this
information becomes a factor when reading or writing the database.
While a block size of 8 KB would be a "perfect fit", the max. usable
block size on Linux depends on the architecture (processor). On the
x86 this is 4 KB.
We recommend to specify a block size of 4 KB for the Linux platform
to create a file system dedicated to hold your database.
This filesystem block size is even more important when using the
sync writing mode with the Linux ext2 file system.
The algorithm used by the ext2 filesystem for syncing buffers to disk
is inefficient for big files. While this has improved with the Linux 2.4
kernel it could be a problem with Linux 2.2 based installations.
Using a small block size could result in system delays and the system
may seem to stall for a few seconds when eloqdb6 performs a checkpoint
operation (default every 60 seconds). This gets noticeable when the
database volume files grow beyound trivial sizes (300 MB and up) and
gets worse the bigger the files get.
As a solution make sure you use a 4 KB block size with the ext2
filesystem. Besides reducing the overhead it also greatly reduces
the impact of the ext2 sync inefficiency.
The file system block size can be choosen at file system creation
time (eg. on Linux by using the -b 4096 command line argument to
mk2efs would result in a 4KB block size). The the default block size
for the ext 2 filesystem used to be 1 KB but in recent releases this
has been changed to 4 KB.
The HP-UX platform is less likely affected by this.
The hfs file system (which uses indirect blocks) uses a default
block size of 16 KB and should no longer be used for
anything besides /stand.
The VxFS file system (which is the default) is extent based,
that means the information where data blocks are located on disk is
maintained differently.
HP-UX buffer cache
On HP-UX with large memory configuration you may want to limit
the amount of memory which gets used as buffer space.
In async write mode, pending write operations could add up in
memory and the I/O system could be unable to keep up.
It has been observed that HP-UX seems to gather pending changes
in memory and send them to the disc at once which causes
unnecessary pressure on memory and I/O.
You may encounter a periodic system freeze (1 or 2 seconds, or
more, every 30 seconds or so). In this case, the cause is often a
large buffer cache whose dirty pages must be flushed every 30 seconds
by the syncer process. By default, at HP-UX 10.X, the buffer cache is
dynamic and is allowed to expand to take up to 50% of main memory.
On systems experiencing these system freezes, setting the buffer cache
to a smaller value, starting with perhaps 10% of memory, and adjusting
the buffer cache may resolve this performance problem.
The problem has been determined to be either a file system I/O
bottleneck or a memory bottleneck. In either case, adjusting
the size of the buffer cache may resolve the problem.
The size of the
buffer cache is controlled by the kernel parameters bufpages,
nbuf, dbc_max_pct, and dbc_min_pct.
Another option is to run the syncer more often (for example
each 5 seconds instead of the default 30 seconds). This should
cause dirty cache pages written to disk earlier instead of accumulating
in memory.
Maximize your disk performance
If you have multiple disk drives it makes sense to split the load to
multiple disk drives. For example have the data volume on a different
drive than the log volume.
The log volume is heavily used for write operations (especially in
sync write mode) as all transactions need to be saved to the
transaction log before applied to the data volume(s).
The load on the data volume(s) usually tends to be more read oriented
and accesses is significantly reduced by caching the disk content.
In addition the performance of the disk subsystem can be improved
by combining multiple disks into a logical disk drive. This can be
achieved by using a RAID0 (striping) or RAID5 (using more than 2 disks)
configurations. RAID1 (mirroring) increases the read performance but may
be slightly slower during write. For RAID 5 configurations
choosing the right stripe size is important for the performance.
Enable IPC communication method
Especially for read performance the Eloquence database is affected
by communication latency and overhead. Read operations are typically
cheap (especially if the information is present in the eloqdb6 cache)
and the communication overhead could be significant in this situation.
While not much can be done about it when operating in a network
environment (where client and server run on different machines)
enabling the IPC communication provides up to 30% better performance
in situations where the data base server and client run on the
same system.
The IPC communication uses a shared memory segment for communication
between client and server which results in significant less
communication overhead (measured as systm time) then using sockets
since data need not be copied in the kernel and the networking
subsystem is avoided.
To enable IPC communication you need to change the eloqdb6.cfg
configuration file:
[Server]
...
EnableIPC = 1
On HP-UX using IPC communication requires adapting the kernel
configuration for semaphores and SYSV shared memory segments.
Please refer to the Eloquence Installation and Configuration
manual for details.
On Linux the default kernel limits should usually be sufficient
and need not be changed.
With more than 400 connected users, the current IPC communication
method has been found to cause a a significant slowdown. This seems
to be a limitation of HP-UX which to handle a big number of
shared memory segments efficiently. Eloquence A.07.00 will
provide an option to use a common shared memory segment with
multiple client programs. For Eloquence versions before
A.07.00 it is recommended to disable IPC communication with more
than 400 connected users.
The IPC communication method is currently disabled by default
because problems have been reported on HP-UX systems which are
not current on kernel patches. Please make sure you have current
kernel patches installed before enabling IPC communication.
For HP-UX 11.0 patch PHKL_24005 seems to fix this problem.
For HP-UX 10.20 no specific patch has been identified by now.
In rare situations a writev() call could fail with an EFAULT
(indicating a wrong memory access has detected in the kernel)
although the arguments to the system call are correct.
The EFAULT causes eloqdb6 to abort with an internal error ("panic")
and re-start automatically (depending on the panic= configuration
in the eloqdb6.cfg).
While this kernel problem has only affected few installations and
happens rarely (read once in a couple of months) we have decided
to disable IPC communication by default until a specific kernel
patch could be identified as a requirement.
Data base cache size
Almost all disk access is performed through the internal eloqdb6
buffer cache. The bigger it gets the more disk accesses can be
avoided. The default size of 5 MB which is almost certainly not
sufficient for a serious server.
Of course you need to make sure that enough RAM is available
to run the other processes without reverting to paging as this
will immediately kill performance. In addition, the eloqdb6
to some extend relies on the buffer cache of the operating system
to buffer write requests temporarily. So it is also important to
leave enough memory available for the OS buffer cache as otherwise
additional disk access is required.
Checkpoint size
A checkpoint operation is executed regulary by the eloqdb6 server
(by default each 60 seconds). If the discardable size of the log volume
exceeds the configured amount, the checkpoint operation is executed
immediately at a higher priority to keep the log volume size within the
configured limit.
The checkpoint operation is used to prune committed transactions
from the log volume after making sure all data bas been written to
the data volume (this involves a sync on the data volume in
sync write mode).
The check point size should be choosen that all committed transactions
in the configured checkpoint interval fits in the log volume to avoid
executing checkpoint operations too often. In addition the available
memory in the system should be taken into account as in asnyc mode
we usually want to avoid changes to the transaction log really
hit the disk.
Use transactions
The Eloquence always uses transactions for all database
operations which modify content. If you don't use transactions in
your program, each database operation is executed with it's own
transaction. When using transactions in your program you are
in control over the life time of a transaction. Database operations
use sub-transactions in this case.
Since committing a transaction involves some overhead (and a sync
operation in Sync write mode) grouping transaction could make a big
difference.
|