-
A deadlock occurs whenever two or more sessions wait for each
other in such a way that none of them can proceed (each session
waits for a resource held by another session that is itself
waiting).
Previously, all sessions involved in a deadlock were blocked
indefinitely by eloqdb6, possibly causing a server hang
or preventing a clean shutdown. This could only be resolved
by killing the eloqdb6 process and its I/O threads.
Eloqdb6 is now able to reliably detect and act upon deadlock
situations. A deadlock is resolved by returning the new status
code -35 to one of the sessions involved. To maintain full
backward compatibility, the -35 status code is only returned
to programs which satisfy specific characteristics:
-
When using secondary blocking DBLOCKs a status -35 is returned
whenever this DBLOCK would cause a deadlock. This can only
happen when the AllowSecondaryBlockingLock configuration item
has been enabled. Otherwise a status code -135 is returned
when the DBLOCK is about to block.
-
Using transactions can cause deadlock situations if there
are dependencies between concurrent transactions. In this
case the status -35 is returned to the application which has
the least work done in the current transaction. Programs
not using transactions are never affected since this could
have an impact on data integrity.
The deadlock detection in eloqdb6 can only resolve deadlock
situations within the server context. Deadlock situations
beyond its scope (e.g. an application waiting indefinitely for
user input, or a deadlock spanning multiple servers) must be
resolved manually (for example using the dbctl cancelthread or
killthread functions described below).
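
The detection and victim selection described above can be pictured
as a search for a cycle in a "wait-for" graph. The following is a
minimal sketch under stated assumptions, not the eloqdb6
implementation: the session names, the work_done counters and both
helper functions are hypothetical, and each blocked session is
assumed to wait for exactly one other session.

```python
def find_cycle(waits_for):
    """Return the sessions forming a wait-for cycle, or None.

    waits_for maps each blocked session to the one session it waits for
    (a simplifying assumption of this sketch).
    """
    for start in waits_for:
        path = []
        seen = set()
        current = start
        # Follow the chain of waiters until it ends or repeats.
        while current in waits_for and current not in seen:
            seen.add(current)
            path.append(current)
            current = waits_for[current]
        if current in seen:
            # 'current' was visited before: the cycle starts there.
            return path[path.index(current):]
    return None


def pick_victim(cycle, work_done):
    """Choose the session with the least work done in its transaction."""
    return min(cycle, key=lambda session: work_done[session])


# Hypothetical example: A waits for B, B for C, C for A; D waits for A.
waits_for = {"A": "B", "B": "C", "C": "A", "D": "A"}
work_done = {"A": 12, "B": 3, "C": 7, "D": 1}

cycle = find_cycle(waits_for)
victim = pick_victim(cycle, work_done)
print(cycle, victim)
```

In this example session D is blocked but is not part of the cycle,
so it is not a candidate for the -35 status even though it has done
the least work.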
-
The new database status -35 can be returned on a DBLOCK,
DBPUT, DBUPDATE or DBDELETE statement if the eloqdb6 server
has detected a deadlock condition. There is no use in retrying
this statement. An appropriate action is to roll back the
transaction and possibly unlock to resolve the situation.
The secondary status (returned in the 10th status element)
is 1 if the status -35 was caused by a deadlock resolution
and -1 if it was caused by a manual intervention using
dbctl cancelthread.
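
The handling described above can be sketched as follows. This is
only an illustration: the status array is modelled as a plain list
of 10 integers, the primary status is assumed to be in the first
element, and the rollback and unlock arguments are hypothetical
stand-ins for whatever calls the application uses to roll back its
transaction and release its locks.

```python
DEADLOCK_STATUS = -35


def classify_status(status):
    """Interpret a status array returned by DBLOCK/DBPUT/DBUPDATE/DBDELETE.

    Assumption: element 1 (index 0) holds the primary status and
    element 10 (index 9) holds the secondary status.
    """
    if status[0] != DEADLOCK_STATUS:
        return "other"
    if status[9] == 1:
        return "deadlock"      # resolved by the server's deadlock detection
    if status[9] == -1:
        return "cancelled"     # manual dbctl cancelthread
    return "unknown"


def recover(status, rollback, unlock):
    """Roll back and unlock instead of retrying the failed statement."""
    reason = classify_status(status)
    if reason in ("deadlock", "cancelled"):
        rollback()
        unlock()
    return reason


# Hypothetical usage with stand-in callbacks that only record the calls.
calls = []
status = [-35, 0, 0, 0, 0, 0, 0, 0, 0, 1]
reason = recover(status,
                 rollback=lambda: calls.append("rollback"),
                 unlock=lambda: calls.append("unlock"))
print(reason, calls)
```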
-
A database session which blocks on a resource (either due to a
DBLOCK or because of concurrent transactions) can now be signaled
or even killed. The command syntax is:
    dbctl cancelthread {tid}
    dbctl killthread {tid}
cancelthread causes the blocking statement to return with status
-35 (secondary status -1), killthread terminates the entire
session by closing the connection to the client process (which will
then fail with a -700 status).
The session to be signaled or killed must be specified by its 'tid'
(thread identifier), which can be obtained with either
'dbctl list thread' or the HTTP status display. A thread with a 'W'
state (in the 'ST' column) can be signaled or killed. Additionally,
a description is output showing the internal location where the
block was caused and which other thread it is waiting for.
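
For scripted use, the two command forms above could be assembled by
a small helper such as the one below. This is a sketch only: the
helper name is hypothetical, it does not run dbctl, and any
additional dbctl options an installation may need (for example to
select the server or to authenticate) are deliberately left out.

```python
import shlex


def thread_command(action, tid):
    """Build the argument list for 'dbctl cancelthread' or 'dbctl killthread'."""
    if action not in ("cancelthread", "killthread"):
        raise ValueError("action must be 'cancelthread' or 'killthread'")
    return ["dbctl", action, str(tid)]


# The tid value 42 is made up for illustration.
command = thread_command("cancelthread", 42)
print(shlex.join(command))
```

The resulting list can be passed to subprocess.run() once the tid
has been read from 'dbctl list thread' or the HTTP status display.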
-
Eloqdb6 includes improved support for deadlock detection and
recovery.
-
A dbrestore operation could corrupt the database volume (#224).
-
Wrong status -35 on DBLOCK (#258). A released lock could fail to
update lock dependencies. In rare situations this could result
in a wrong -35 status (deadlock detected).
-
A duplicate DBLOCK could cause a deadlock (#110). A duplicate DBLOCK
request issued from the same session could result in a deadlock
condition (or a -135 status with previous eloqdb6 versions). This
happened when the duplicate DBLOCK request would block on another
lock request which was in turn blocked by the requesting session.
This was solved by changing the DBLOCK strategy slightly. Instead of
a purely fair lock strategy, a DBLOCK which is about to block on a
blocked lock is now granted if the session already has another
granted DBLOCK. This avoids the situation and allows for greater
concurrency.
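
One way to read the changed strategy is as a small decision rule.
The sketch below is an interpretation of the description above, not
the eloqdb6 lock manager: the function and its three boolean inputs
are hypothetical, and "conflicts" refers to lock requests of other
sessions only.

```python
def decide(conflicts_with_granted, conflicts_with_waiting, holds_granted_lock):
    """Return "grant" or "block" for a new DBLOCK request.

    conflicts_with_granted: another session holds a conflicting lock.
    conflicts_with_waiting: another session's conflicting request is
                            itself still waiting (a "blocked lock").
    holds_granted_lock:     the requesting session already has a
                            granted DBLOCK.
    """
    if conflicts_with_granted:
        # A conflicting lock is actually held: the request must wait.
        return "block"
    if conflicts_with_waiting and not holds_granted_lock:
        # Purely fair behaviour: queue behind the earlier waiter.
        return "block"
    # Either nothing conflicts, or the session already holds a lock and
    # is allowed to pass the waiting request.
    return "grant"


print(decide(False, True, True))    # session already holds a lock
print(decide(False, True, False))   # session holds nothing yet
```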
-
Database volume file could grow to 2GB during online backup (#225).
If no volume limits were specified for a database volume file,
eloqdb6 assumed a volume file limit of 2GB during online backup.
Since eloqdb6 currently does not support LFS (64 bit file operations)
on the UNIX platform, this caused an internal failure when online
backup was stopped.
This does not cause a problem during normal operation, since the
attempt to enlarge the volume file results in an error return from
the operating system which is handled by eloqdb6. However, this
error was not detected during online backup. Eloqdb6 now limits the
size of database volume files to 2 GB - 8K on the UNIX/Linux platform.
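
The new limit works out as follows. The calculation assumes that
"GB" and "K" are binary units (2^30 and 2^10 bytes) and that
non-LFS file operations use a signed 32 bit offset; both are
assumptions of this sketch rather than statements from the notes
above.

```python
GIB = 1024 ** 3          # assumed meaning of "GB"
KIB = 1024               # assumed meaning of "K"

max_volume_size = 2 * GIB - 8 * KIB
signed_32bit_max = 2 ** 31 - 1   # largest offset without 64 bit file operations

print(max_volume_size)
print(max_volume_size <= signed_32bit_max)
```

Under these assumptions the limit is 2147475456 bytes, which stays
below the largest offset a signed 32 bit value can address.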
-
Eloqdb6 could fail to recover from an unclean shutdown due to
inconsistencies in the log volume. Testing of log volume consistency
during startup has been relaxed as this information is not used
during recovery and is rewritten when eloqdb6 is up (#242).
-
Due to a race condition, eloqdb6 could fail during startup on a
Linux 2.4 kernel with an internal error (#251):
    server panic: Fatal problem detected in tio_thread_main
    Assertion failed: ctx && ctx->pid == getpid()
    server panic: Aborting on internal failure, file tio.c, line 1382
-
Decoding of DBLOCK status could be incomplete in the HTTP status
display and dbctl list lock output. Compound lock descriptors
(PREDICATE lock) were not decoded correctly if an effective set
or database lock was encountered (#260).
-
The eloqdb6 title was not included in the HTTP status display (#5).
-
The list of opened databases was not available in the HTTP status
display (#16).
-
Using transactions could result in a server deadlock.
-
A server crash during schema, dbcreate, dberase or dbpurge could
result in data corruption under rare circumstances.
-
The IPC communication method (using shared memory) is now disabled
by default (the EnableIPC setting in the eloqdb6.cfg configuration
file now defaults to 0).
This is a workaround due to problems encountered on some HP-UX
systems which could result in an eloqdb6 panic due to a failed
writev() system call and also possibly corrupt your database volume.
This is believed to be a bug in the HP-UX kernel.
Since IPC communication does not significantly improve overall
performance, disabling it should not have any noticeable effect.
-
On Linux SMP systems shutting down the eloqdb6 server could
sometimes cause a segment violation (SIGSEGV).
-
Using regular expressions with DBFIND could cause the server to
hang indefinitely under rare circumstances.
-
A newline character was missing in the 'dbctl list lock' output.
-
On the HP-UX platform the file descriptor soft limit could cause
connection failures.
-
A problem has been resolved which could cause SQL/R to return
incomplete results for a query on a busy database if the query was
optimized by an index.
SQL/R uses an API to the database server which allows it to
optimize queries with indexes. A write access to the database could
cause any index cursor to become invalid. In this case an EOF
condition was wrongly assumed. The index cursor is now revalidated
appropriately.
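
The idea of revalidating a cursor instead of reporting EOF can be
illustrated with a toy index. This is a sketch under stated
assumptions and not the eloqdb6 or SQL/R code: the Index and
IndexCursor classes, the version counter and the re-seek by last
returned key are all hypothetical, and keys are assumed to be unique.

```python
import bisect


class Index:
    """Toy index: a sorted list of unique keys plus a change counter."""

    def __init__(self, keys):
        self.keys = sorted(keys)
        self.version = 0

    def insert(self, key):
        bisect.insort(self.keys, key)
        self.version += 1      # any write invalidates open cursors


class IndexCursor:
    """Cursor that re-seeks after a write instead of assuming EOF."""

    def __init__(self, index):
        self.index = index
        self.version = index.version
        self.pos = 0
        self.last_key = None

    def next(self):
        if self.version != self.index.version:
            # Index changed: position just after the last key returned.
            if self.last_key is None:
                self.pos = 0
            else:
                self.pos = bisect.bisect_right(self.index.keys, self.last_key)
            self.version = self.index.version
        if self.pos >= len(self.index.keys):
            return None        # genuine end of index
        self.last_key = self.index.keys[self.pos]
        self.pos += 1
        return self.last_key


index = Index([10, 20, 30])
cursor = IndexCursor(index)
results = [cursor.next()]
index.insert(5)                # concurrent write while the query runs
while (key := cursor.next()) is not None:
    results.append(key)
print(results)
```

The key 5 is not returned because it sorts before the cursor
position, but the scan still delivers the remaining keys 20 and 30
rather than stopping early.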
-
A problem related to stopping online backup mode was solved.
Depending on database usage during online backup this could result
in data corruption when stopping online backup mode.
When the HP Eloquence database is in online backup mode, committed
transactions are saved in the log volume, not in the data volume(s).
When the online backup mode is stopped (through dbctl backup stop),
committed transactions are transferred from the log volume to the
data volume(s). This could cause an extension of the data volume.
A volume extension while transferring committed transactions from
the log volume to the data volume could cause corruption of
information in the log volume.
Symptoms:
- server panic after stopping online backup mode
- dbfsck reports volume corruption
-
Under rare circumstances a volume crash recovery could corrupt the
data volumes. This can happen when the eloqdb6 server has crashed
or was terminated with kill -9. In this case all committed
transactions since the last checkpoint operation are recovered
from the log volume during eloqdb6 startup.
The problem occurred if a data area affected by a transaction was
deleted by a subsequent transaction and had been re-used since
the last checkpoint operation. In some cases this could lead to
corruption of structural volume information. This has been fixed by
delaying page re-use until the next checkpoint operation.
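
Delaying page re-use until the next checkpoint can be pictured with
a free list that has a separate "pending" stage. The sketch below
is illustrative only; the PageAllocator class and its method names
are hypothetical and do not reflect the eloqdb6 volume format.

```python
class PageAllocator:
    """Free-page bookkeeping with re-use deferred to the next checkpoint."""

    def __init__(self, page_count):
        self.free = list(range(page_count))   # pages that may be handed out
        self.pending = []                     # freed since the last checkpoint

    def allocate(self):
        if not self.free:
            raise RuntimeError("no reusable page; the volume would be extended")
        return self.free.pop(0)

    def release(self, page):
        # Do not make the page reusable yet: crash recovery may still
        # replay log records that refer to its old contents.
        self.pending.append(page)

    def checkpoint(self):
        # After a checkpoint the old contents are no longer needed.
        self.free.extend(self.pending)
        self.pending.clear()


allocator = PageAllocator(2)
first = allocator.allocate()
allocator.release(first)
second = allocator.allocate()     # must not be the page just released
allocator.checkpoint()
third = allocator.allocate()      # the released page is reusable now
print(first, second, third)
```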
Symptoms:
- server panic during volume crash recovery
-
When an eloqdb6 server panic (internal error) is encountered the
eloqdb6 process restarts itself by default. This can be configured
in the eloqdb6.cfg configuration file with the panic= configuration
item.
Previously, a fatal problem encountered during startup caused
eloqdb6 to be restarted in an endless loop, possibly filling
the log file and consuming system resources. Eloqdb6 no longer
restarts automatically if a problem is encountered during restart.
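
The changed restart behaviour can be sketched as a small supervisor
loop. This is an illustration of the rule described above, not the
eloqdb6 code: the supervise function, the run_server callback and
the max_runs safety cap are hypothetical.

```python
def supervise(run_server, restart_on_panic=True, max_runs=10):
    """Run the server, restarting after a panic unless startup failed.

    run_server() is a stand-in that returns (startup_ok, panicked).
    max_runs only bounds this sketch; it is not an eloqdb6 setting.
    Returns the number of times the server was started.
    """
    runs = 0
    while runs < max_runs:
        runs += 1
        startup_ok, panicked = run_server()
        if not panicked:
            break              # clean shutdown
        if not restart_on_panic or not startup_ok:
            break              # avoid an endless restart loop
    return runs


# Hypothetical sequence: a panic in normal operation, then a failed restart.
outcomes = iter([(True, True), (False, True), (True, False)])
runs = supervise(lambda: next(outcomes))
print(runs)
```

The server is started twice: once initially and once after the
panic. The failure during the second startup ends the loop instead
of triggering a third attempt.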
-
A problem related to rollback of subtransactions was solved.
Depending on database usage this could result in btree corruption
or an eloqdb6 server panic.
When a subtransaction was rolled back but the main transaction
continued it could happen that invalidated btree information
remained in memory and was used by subsequent btree operations.
This problem could only happen when using transactions.
Symptoms:
- Assertion failed: f_bhp->id.node_id == node_id
server panic: Aborting on internal failure, file mpool.c, line 392
- DBPUT failed due to duplicate secondary index
- dbfsck reports corrupted btree pages
Impact:
A database which shows the symptoms above must be reloaded (dbexport,
dberase and dbimport). Data other than internal btree information
is not affected.
-
The eloqdb6 server could panic while processing volume recovery,
complaining about a buffer leak. This was caused by a bug in
processing defective recovery information from the log volume.
-
Cache buffer management has been tuned to provide better performance
for delete and update operations in a transaction. This should also
reduce the impact of transactions on concurrent read operations.
-
A problem in the HTTP status display was fixed which caused the lock
status to return wrong HTML code (#549).