HP Eloquence A.06.31 Release Notes


HP Eloquence database server (eloqdb6)

This document covers fixes and enhancements which have been integrated into the HP Eloquence A.06.31 eloqdb6 database server.

Contents of this document:

Improved support for deadlock detection and recovery
dbctl command line interface enhancements
dbfsck enhancements
Fixed problems

Improved support for deadlock detection and recovery

Eloqdb6 includes improved support for deadlock detection and recovery. The deadlock resolution algorithm is designed not to affect existing applications. However, programs that use transactions may require changes in order to recover from a deadlock situation.

A deadlock occurs whenever two or more sessions wait for each other in such a way that none of them can proceed (each session waits for another session that, directly or indirectly, is waiting for it).
Previously, all sessions involved in a deadlock were blocked indefinitely by eloqdb6, possibly causing the server to hang or preventing a clean shutdown. This could only be resolved by killing the eloqdb6 process and its I/O threads.
Eloqdb6 can now reliably detect and act upon deadlock situations. A deadlock is resolved by returning the new status code -35 to one of the sessions involved. To maintain full backward compatibility, status -35 is only returned to programs that satisfy specific characteristics:
- When using secondary blocking DBLOCKs, status -35 is returned whenever such a DBLOCK would cause a deadlock. This can only happen when the AllowSecondaryBlockingLock configuration item has been enabled; otherwise, status -135 is returned when the DBLOCK is about to block.
- Using transactions can cause deadlock situations if there are dependencies between concurrent transactions. In this case, status -35 is returned to the application that has done the least work in its current transaction. Programs that do not use transactions are never affected, since interrupting them could have an impact on data integrity.
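These release notes do not describe how eloqdb6 detects deadlocks internally. A common technique is a wait-for graph: each blocked session points to the session it waits for, and a deadlock is a cycle in that graph. The following is a minimal sketch of that technique, assuming each blocked session waits for exactly one other session, together with the victim rule stated above (the transactional session with the least work done). The names `find_cycle`, `choose_victim`, `work_done` and `in_transaction` are hypothetical and are not part of eloqdb6.

```python
from typing import Dict, List, Optional, Set

# Hypothetical model: waits_for[s] is the one session that blocked session s waits for.


def find_cycle(waits_for: Dict[int, int], start: int) -> Optional[List[int]]:
    """Follow wait-for edges from `start`; return the sessions forming a cycle, if any."""
    path: List[int] = []
    seen: Set[int] = set()
    node: Optional[int] = start
    while node is not None and node not in seen:
        seen.add(node)
        path.append(node)
        node = waits_for.get(node)  # None: this session is not waiting
    if node is None:
        return None  # the chain ends at a session that can proceed: no deadlock
    return path[path.index(node):]  # keep only the part of the path that loops


def choose_victim(cycle: List[int], work_done: Dict[int, int],
                  in_transaction: Dict[int, bool]) -> Optional[int]:
    """Pick the transactional session with the least work done in its transaction."""
    candidates = [s for s in cycle if in_transaction.get(s, False)]
    if not candidates:
        return None  # this sketch never selects a session outside a transaction
    return min(candidates, key=lambda s: work_done.get(s, 0))


if __name__ == "__main__":
    waits = {1: 2, 2: 3, 3: 1, 4: 1}  # session 4 waits on the cycle 1 -> 2 -> 3 -> 1
    cycle = find_cycle(waits, 4)
    print(cycle)  # [1, 2, 3]
    print(choose_victim(cycle, {1: 10, 2: 3, 3: 7}, {1: True, 2: True, 3: True}))  # 2
```

Session 4 is blocked but is not part of the cycle, so it is not a candidate; it can proceed once the victim's transaction is rolled back.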
The deadlock detection in eloqdb6 can only resolve deadlock situations within the server context. Deadlock situations beyond its scope (e.g. an application waiting indefinitely for user input, or a deadlock involving multiple servers) must be resolved manually, for example using the dbctl cancelthread or killthread functions described below.
The new database status -35 can be returned on a DBLOCK, DBPUT, DBUPDATE or DBDELETE statement if the eloqdb6 server has detected a deadlock condition. Retrying the statement is of no use. An appropriate action is to roll back the transaction and possibly to unlock in order to resolve the situation.
The secondary status (returned in the 10th status element) is 1 if the status -35 was caused by a deadlock resolution and -1 if it was caused by a manual intervention using dbctl cancelthread.
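The following is a minimal sketch of how an application might interpret status -35 and react to it, assuming the status array is represented as a 10-element Python list (primary status in the first element, secondary status in the 10th). The functions `classify_status` and `recover_after_deadlock` and the `rollback`/`unlock` callbacks are hypothetical stand-ins for the application's own transaction rollback and unlock calls, not an Eloquence API.

```python
from typing import Callable, List, Optional

DEADLOCK = -35  # status code introduced in eloqdb6 A.06.31


def classify_status(status: List[int]) -> str:
    """Interpret a 10-element status array (first = status, 10th = secondary)."""
    if status[0] != DEADLOCK:
        return "not a deadlock status"
    secondary = status[9]  # 10th status element (zero-based index 9)
    if secondary == 1:
        return "deadlock resolution"
    if secondary == -1:
        return "dbctl cancelthread"
    return "unexpected secondary status"


def recover_after_deadlock(status: List[int], rollback: Callable[[], None],
                           unlock: Optional[Callable[[], None]] = None) -> bool:
    """On -35, roll back (and optionally unlock) instead of retrying the statement."""
    if status[0] != DEADLOCK:
        return False
    rollback()          # undo the current transaction
    if unlock is not None:
        unlock()        # release locks still held so other sessions can proceed
    return True
```

After recovering, the application can decide to start the whole transaction again from the beginning; re-issuing only the failed statement would not help.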
A database session which blocks on a resource (either due to a DBLOCK or because of concurrent transactions) can now be signaled or even killed. The command syntax is:
```
  dbctl cancelthread {tid}
  dbctl killthread {tid}
```
cancelthread causes the blocking statement to return with status -35 (secondary status -1); killthread terminates the entire session by closing the connection to the client process (which will then fail with status -700).
The session to be signaled or killed must be specified by its 'tid' (thread identifier), which can be obtained with either 'dbctl list thread' or the HTTP status display. A thread in the 'W' state (in the 'ST' column) can be signaled or killed. Additionally, a description is output showing the internal location where the thread blocked and which other thread it is waiting for.

dbctl command line interface enhancements

The dbctl command line interface has been enhanced to support restoring a database archive with a different database name. The new command syntax is:

  dbctl dbrestore [/info] source [new_database_name]

When the /info option is specified, the header of the specified archive is displayed and no restore takes place.
If a new_database_name is specified, its length must not exceed 64 characters, it must not begin with '/' and must not contain any space characters.
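The naming rules above can be checked before calling dbctl. This is a minimal sketch that validates a new database name and builds an argument list following the documented `dbctl dbrestore [/info] source [new_database_name]` syntax; the helper names are hypothetical, and only the literal space character is checked, as the rule states. The resulting list could be passed to `subprocess.run`, which this sketch does not do.

```python
from typing import List, Optional


def check_new_database_name(name: str) -> List[str]:
    """Return the dbrestore naming rules that `name` violates (empty list if none)."""
    problems: List[str] = []
    if len(name) > 64:
        problems.append("longer than 64 characters")
    if name.startswith("/"):
        problems.append("begins with '/'")
    if " " in name:
        problems.append("contains a space character")
    return problems


def dbrestore_args(source: str, new_name: Optional[str] = None,
                   info: bool = False) -> List[str]:
    """Build a dbctl dbrestore argument list following the documented syntax."""
    args = ["dbctl", "dbrestore"]
    if info:
        args.append("/info")  # display the archive header only
    args.append(source)
    if new_name is not None:
        problems = check_new_database_name(new_name)
        if problems:
            raise ValueError(f"invalid database name {new_name!r}: " + ", ".join(problems))
        args.append(new_name)
    return args
```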

To copy an entire database, first use dbctl dbstore, then restore it with a different name by using dbctl dbrestore.

dbfsck enhancements

The dbfsck utility has been enhanced to support repairing a fixrec free-list, an internal list which keeps track of free record numbers. Previous dbfsck releases crashed if such a list was corrupted. Since patch PE63-0104230, fixrec free-lists can be safely repaired with dbfsck.
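The on-disk layout of a fixrec free-list is not documented here. As an illustration only, the sketch below assumes free records are chained by record number (0 terminating the chain) and shows the two steps a repair typically involves: walking the chain defensively so that a loop or a bogus entry is reported instead of crashing, and rebuilding the chain from the records known to be unused. All names are hypothetical and this is not dbfsck's actual algorithm.

```python
from typing import Dict, Set, Tuple

# Hypothetical model: record numbers 1..max_recno, free records chained through
# next_free, and 0 terminating the chain.


def free_list_is_valid(head: int, next_free: Dict[int, int],
                       max_recno: int, in_use: Set[int]) -> bool:
    """Walk the chain, rejecting out-of-range, in-use or repeated (cyclic) entries."""
    seen: Set[int] = set()
    node = head
    while node != 0:
        if not 1 <= node <= max_recno or node in in_use or node in seen:
            return False
        seen.add(node)
        node = next_free.get(node, 0)
    return True


def rebuild_free_list(max_recno: int, in_use: Set[int]) -> Tuple[int, Dict[int, int]]:
    """Recreate the chain from scratch using only records known to be unused."""
    free = [r for r in range(1, max_recno + 1) if r not in in_use]
    next_free = {r: nxt for r, nxt in zip(free, free[1:])}
    return (free[0] if free else 0), next_free
```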

Fixed problems

Eloqdb6 includes improved support for deadlock detection and recovery.
A dbrestore operation could corrupt the database volume (#224).
Wrong status -35 on DBLOCK (#258). A released lock could fail to update lock dependencies. In rare situations this could result in a spurious status -35 (deadlock detected).
A duplicate DBLOCK could cause a deadlock (#110). A duplicate DBLOCK request issued from the same session could result in a deadlock condition (or status -135 in previous eloqdb6 versions). This happened when the duplicate DBLOCK request would block on another session's pending lock request, which was itself blocked by the lock the first session already held.
This was solved by changing the DBLOCK strategy slightly. Instead of a purely fair lock strategy, a DBLOCK which is about to block on a pending (blocked) lock request is now granted if the session already holds another granted DBLOCK. This avoids the situation and allows for greater concurrency.
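The strategy change can be illustrated with a simplified lock manager. This minimal sketch assumes all locks are exclusive, models lock targets as opaque names, and uses hypothetical names throughout; it compares the purely fair rule with the new rule in the #110 scenario, where session 1 holds a lock, session 2 waits for it, and session 1 issues a duplicate DBLOCK.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class LockRequest:
    session: int
    items: FrozenSet[str]  # hypothetical: lock targets as opaque names, all exclusive


def conflicts(a: LockRequest, b: LockRequest) -> bool:
    """Locks of the same session never conflict; others conflict on shared items."""
    return a.session != b.session and bool(a.items & b.items)


def may_grant(req: LockRequest, granted: List[LockRequest],
              waiting: List[LockRequest], purely_fair: bool = False) -> bool:
    if any(conflicts(req, g) for g in granted):
        return False  # a conflicting lock is actually held: the request must wait
    if not any(conflicts(req, w) for w in waiting):
        return True   # no conflicting request is queued ahead of this one
    if purely_fair:
        return False  # old behaviour: queue behind the waiting request
    # New behaviour: a session already holding a granted lock may go ahead.
    return any(g.session == req.session for g in granted)


held = LockRequest(1, frozenset({"X"}))       # session 1 holds X
pending = LockRequest(2, frozenset({"X"}))    # session 2 waits for X
duplicate = LockRequest(1, frozenset({"X"}))  # session 1 locks X again
print(may_grant(duplicate, [held], [pending], purely_fair=True))  # False: deadlock
print(may_grant(duplicate, [held], [pending]))                    # True
```

Under the purely fair rule the duplicate request waits behind session 2, which in turn waits for session 1, so neither can proceed.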
Database volume file could grow to 2 GB during online backup (#225). If no volume limits were specified for a database volume file, eloqdb6 assumed a volume file limit of 2 GB during online backup. Since eloqdb6 currently does not support LFS (large file support, i.e. 64-bit file operations) on the UNIX platform, this caused an internal failure when online backup was stopped.
This does not cause a problem during normal operation, since the attempt to enlarge the volume file results in an error return from the operating system, which is handled by eloqdb6. During online backup, however, this condition was not detected. Eloqdb6 now limits the size of database volume files to 2 GB - 8K on the UNIX/Linux platform.
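The new limit can be worked out explicitly. This small sketch assumes binary units (1 GB = 2^30 bytes, 1K = 1024 bytes) and compares the cap with the largest offset a signed 32-bit file offset can hold; `extension_allowed` is a hypothetical helper, not an eloqdb6 function.

```python
GB = 1024 ** 3
KB = 1024
MAX_VOLUME_SIZE = 2 * GB - 8 * KB  # "2 GB - 8K", assuming binary units

INT32_MAX = 2 ** 31 - 1  # largest offset a signed 32-bit off_t can hold


def extension_allowed(current_size: int, extension: int) -> bool:
    """Return True if growing the volume file by `extension` bytes stays within the cap."""
    return current_size + extension <= MAX_VOLUME_SIZE


if __name__ == "__main__":
    print(MAX_VOLUME_SIZE)              # 2147475456
    print(INT32_MAX - MAX_VOLUME_SIZE)  # 8191
```

Keeping the cap slightly below 2^31 bytes means a volume file never reaches the size at which 32-bit file operations fail.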
Eloqdb6 could fail to recover from an unclean shutdown due to inconsistencies in the log volume. The log volume consistency check performed at startup has been relaxed, since this information is not used during recovery and is rewritten once eloqdb6 is up (#242).

Due to a race condition, eloqdb6 could fail with an internal error during startup on a Linux 2.4 kernel (#251):

   server panic: Fatal problem detected in tio_thread_main
   Assertion failed: ctx && ctx->pid == getpid()
   server panic: Aborting on internal failure, file tio.c, line 1382

Decoding of DBLOCK status could be incomplete in the HTTP status display and dbctl list lock output. Compound lock descriptors (PREDICATE lock) were not decoded correctly if an effective set or database lock was encountered (#260).
The eloqdb6 title was not included in the HTTP status display (#5).
The list of opened databases was not available in the HTTP status display (#16).
Using transactions could result in a server deadlock.
A server crash during schema, dbcreate, dberase or dbpurge could result in data corruption under rare circumstances.
The IPC communication method (using shared memory) is now disabled by default (the EnableIPC setting in the eloqdb6.cfg configuration file now defaults to 0).
This is a workaround for problems encountered on some HP-UX systems, which could result in an eloqdb6 panic due to a failed writev() system call and possibly also corrupt the database volume. This is believed to be a bug in the HP-UX kernel.
Since IPC communication does not significantly improve overall performance, disabling it should not have any noticeable effect.
On Linux SMP systems, shutting down the eloqdb6 server could sometimes cause a segmentation violation (SIGSEGV).
Using regular expressions with DBFIND could cause the server to hang indefinitely under rare circumstances.
A newline character was missing in the 'dbctl list lock' output.
On the HP-UX platform the file descriptor soft limit could cause connection failures.
A problem has been resolved which could cause SQL/R to return incomplete results for a query on a busy database if the query was optimized by an index.
SQL/R uses an API to the data base server which allows it to optimize queries with indexes. A write access to the database could cause any index cursor to become invalid. In this case an EOF condition was wrongly assumed. The index cursor is now revalidated appropriately.
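The fix can be illustrated with an index cursor that remembers the last key it returned. This minimal sketch assumes the index is a sorted key list with a modification counter; when the counter has changed, the cursor re-seeks past its last key instead of trusting its stale position, which is what produced the false EOF. The classes are hypothetical and are not the SQL/R or eloqdb6 API.

```python
import bisect
from typing import List, Optional


class Index:
    """Hypothetical sorted index with a counter that changes on every write."""

    def __init__(self, keys: List[int]) -> None:
        self.keys = sorted(keys)
        self.version = 0

    def insert(self, key: int) -> None:
        bisect.insort(self.keys, key)
        self.version += 1

    def delete(self, key: int) -> None:
        self.keys.remove(key)
        self.version += 1


class IndexCursor:
    def __init__(self, index: Index) -> None:
        self.index = index
        self.pos = 0
        self.last_key: Optional[int] = None
        self.version = index.version

    def next(self) -> Optional[int]:
        if self.version != self.index.version:
            # A write invalidated the stored position: re-seek past the last key
            # instead of treating the stale position as end-of-file.
            self.pos = (0 if self.last_key is None
                        else bisect.bisect_right(self.index.keys, self.last_key))
            self.version = self.index.version
        if self.pos >= len(self.index.keys):
            return None  # genuine EOF
        key = self.index.keys[self.pos]
        self.pos += 1
        self.last_key = key
        return key
```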
A problem related to stopping online backup mode was solved. Depending on database usage during online backup this could result in data corruption when stopping online backup mode.
When the HP Eloquence database is in online backup mode, committed transactions are saved in the log volume, not in the data volume(s). When online backup mode is stopped (through dbctl backup stop), committed transactions are transferred from the log volume to the data volume(s). This could cause the data volume to be extended.
A volume extension while transferring committed transactions from the log volume to the data volume could cause corruption of information in the log volume.
Symptoms:
- server panic after stopping online backup mode
- dbfsck reports volume corruption
Under rare circumstances a volume crash recovery could corrupt the data volumes. This can happen when the eloqdb6 server has crashed or was terminated with kill -9. In this case all committed transactions since the last checkpoint operation are recovered from the log volume during eloqdb6 startup.
A problem occurred if a data area affected by a transaction was deleted by a subsequent transaction and re-used since the last checkpoint operation. In some cases this could lead to corruption of structural volume information. This has been fixed by delaying page re-use until the next checkpoint operation.
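Delaying page re-use until the next checkpoint can be sketched with two free lists: pages freed since the last checkpoint are held back, and only pages freed before it may be handed out again. This is a minimal illustration under that assumption; the `PageAllocator` class is hypothetical and does not reflect eloqdb6's internal page management.

```python
from typing import List


class PageAllocator:
    def __init__(self) -> None:
        self.next_new = 1              # next never-used page number
        self.reusable: List[int] = []  # freed before the last checkpoint: safe to reuse
        self.pending: List[int] = []   # freed since the last checkpoint: held back

    def allocate(self) -> int:
        if self.reusable:
            return self.reusable.pop()
        page = self.next_new
        self.next_new += 1
        return page

    def free(self, page: int) -> None:
        # Crash recovery may still replay log records that refer to this page,
        # so it must not be re-used before the next checkpoint.
        self.pending.append(page)

    def checkpoint(self) -> None:
        self.reusable.extend(self.pending)
        self.pending.clear()
```

Without the pending list, a page freed and immediately re-used could be overwritten during recovery by replayed changes that belonged to its previous contents.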
Symptoms:
- server panic during volume crash recovery
When an eloqdb6 server panic (internal error) is encountered the eloqdb6 process restarts itself by default. This can be configured in the eloqdb6.cfg configuration file with the panic= configuration item.
Previously, a fatal problem during startup caused eloqdb6 to be restarted in an endless loop, possibly filling the log file and consuming system resources. Now eloqdb6 does not restart automatically if a problem is encountered during restart.
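The corrected restart behaviour amounts to a simple supervision rule: restart after a panic that follows a successful startup, but stop when startup itself fails. This is a minimal sketch assuming a hypothetical `run_server` callable that reports one of three outcomes; `restart_on_panic` stands in for the panic= setting, whose actual values are not listed here, and `max_runs` is only a safety bound for the sketch.

```python
from typing import Callable, List


def supervise(run_server: Callable[[], str], restart_on_panic: bool = True,
              max_runs: int = 100) -> List[str]:
    """run_server returns 'shutdown', 'panic' (after startup) or 'startup_failed'."""
    history: List[str] = []
    for _ in range(max_runs):
        outcome = run_server()
        history.append(outcome)
        if outcome == "panic" and restart_on_panic:
            continue  # panic after a successful startup: restart
        break         # clean shutdown, startup failure, or restart disabled
    return history
```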
A problem related to rollback of subtransactions was solved. Depending on database usage this could result in btree corruption or an eloqdb6 server panic.
When a subtransaction was rolled back but the main transaction continued it could happen that invalidated btree information remained in memory and was used by subsequent btree operations. This problem could only happen when using transactions.
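The essence of this fix is that in-memory btree data changed inside a subtransaction must not outlive that subtransaction's rollback. This minimal sketch assumes a node cache keyed by node id that records which nodes each open subtransaction modified and evicts them on rollback, so later btree operations re-read a consistent version. `NodeCache` and its methods are hypothetical and do not describe eloqdb6's buffer pool.

```python
from typing import Dict, List, Set


class NodeCache:
    def __init__(self) -> None:
        self.nodes: Dict[int, str] = {}     # node id -> cached node contents
        self.savepoints: List[Set[int]] = []  # per open subtransaction: touched node ids

    def begin_sub(self) -> None:
        self.savepoints.append(set())

    def write(self, node_id: int, data: str) -> None:
        self.nodes[node_id] = data
        if self.savepoints:
            self.savepoints[-1].add(node_id)

    def rollback_sub(self) -> None:
        # Evict every node changed in the rolled-back subtransaction so that
        # stale btree information cannot be used by later operations.
        for node_id in self.savepoints.pop():
            self.nodes.pop(node_id, None)

    def commit_sub(self) -> None:
        touched = self.savepoints.pop()
        if self.savepoints:
            self.savepoints[-1] |= touched  # the enclosing level inherits the changes
```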
Symptoms:
- Assertion failed: f_bhp->id.node_id == node_id
  server panic: Aborting on internal failure, file mpool.c, line 392
- DBPUT failed due to duplicate secondary index
- dbfsck reports corrupted btree pages
Impact:
A database which shows the symptoms above must be reloaded (dbexport, dberase and dbimport). Information other than the internal btree structures is not affected.
The eloqdb6 server could panic during volume recovery, complaining about a buffer leak. This was caused by a bug in processing defective recovery information from the log volume.
Cache buffer management has been tuned to provide better performance for delete and update operations in a transaction. This should also reduce the impact of transactions on concurrent read operations.
A problem in the HTTP status display was fixed which caused the lock status to return wrong HTML code (#549).