Title: Unexpected sort order (Linux)
Document: 996585141
Author: Michael Marxmeier, Roland Genske
Keywords: linux, LANG, collating sequence, locale, sort, order
Q:
Eloquence uses an unexpected sorting order on Linux.
A:
Eloquence uses the collating sequence defined by
the operating system for the SORT BY, QFIND .. IN
statements and the LEX function.
Since Eloquence internally uses the HP-ROMAN8 character
set encoding, the locale should either sort by byte order
(C, POSIX) or be built for the HP-ROMAN8 character set.
In the simplest case, the LANG, ELOQLANG and LC_COLLATE environment
variables are either left unset or set to POSIX or C.
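As a minimal sketch of the "unset or set to C" advice, the wrapper below runs a single command with collation forced to byte order. The function name run_byte_order is hypothetical, and plain sort stands in for an Eloquence program so that the effect is visible on any system.

```shell
# Hypothetical wrapper: run one command with collation forced to byte order.
run_byte_order() {
    (
        # Clear the variables that select a collating locale. LC_ALL is not
        # discussed in this document, but when set it overrides both LANG
        # and LC_COLLATE, so it is cleared as well.
        unset LANG LC_COLLATE LC_ALL
        # Ask Eloquence explicitly for the C locale.
        ELOQLANG=C
        export ELOQLANG
        "$@"
    )
}

# Example: upper case letters sort before lower case ones (A, B, a, b).
printf 'b\nA\na\nB\n' | run_byte_order sort
```

The subshell keeps the changes local to the one command, so the login environment is left untouched.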
Relevant environment variables:
- LANG
  The LANG environment variable defines the default locale. It is used
  for all categories, such as ordering, character classification, date,
  time and monetary settings, or the language used for messages.
  Individual categories can be overridden by setting the environment
  variable for that category.
  Please refer to man locale for more information.
- LC_COLLATE
  The LC_COLLATE environment variable defines the setting to
  be used for ordering and comparison. If set, it overrides the
  LANG setting for the collating category.
- ELOQLANG
  The ELOQLANG environment variable allows you to set an Eloquence
  specific value for LANG, LC_CTYPE and LC_COLLATE.
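The following sketch prints which locale the C library would use for collation. It follows the standard precedence LC_ALL, then LC_COLLATE, then LANG. The function name effective_collate is hypothetical. ELOQLANG is left out on purpose, because this document does not state how Eloquence ranks it against the other variables.

```shell
# Hypothetical helper: print the locale selected for the collating category.
# An empty variable counts as unset, as it does for the C library.
effective_collate() {
    if [ -n "${LC_ALL:-}" ]; then
        printf '%s\n' "$LC_ALL"        # overrides every category
    elif [ -n "${LC_COLLATE:-}" ]; then
        printf '%s\n' "$LC_COLLATE"    # overrides LANG for collation only
    elif [ -n "${LANG:-}" ]; then
        printf '%s\n' "$LANG"          # default for all categories
    else
        printf 'POSIX\n'               # nothing set: byte order
    fi
}

effective_collate
```

Running it in the environment that starts Eloquence is a quick way to see whether a stray LANG or LC_COLLATE setting is in effect.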
Using the POSIX or C locale has the disadvantage that national
characters are not sorted in the expected order. For example,
the German umlaut Ä should be equivalent to AE and sorted
between A and B. Since Ä has the binary value 216 in
HP-ROMAN8, it is instead sorted after all unaccented letters
(A-Z, a-z).
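The effect can be reproduced outside Eloquence. The sketch below sorts four one-character lines by byte value and prints the resulting bytes as decimal numbers. The helper name sorted_bytes is hypothetical, and the Ä is written as the octal escape \330 (decimal 216), its HP-ROMAN8 code.

```shell
# Hypothetical helper: sort stdin in byte order and print every byte of the
# result as a decimal number (10 is the newline after each line).
sorted_bytes() {
    LC_ALL=C sort | od -An -tu1 | tr -s ' \n' ' '
}

# Input lines: A (65), Ä (216 in HP-ROMAN8), B (66), z (122).
# In byte order the Ä comes last, after z.
printf 'A\n\330\nB\nz\n' | sorted_bytes
```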
If the sorting order is important, you should set the
LANG, ELOQLANG or LC_COLLATE environment variable to a suitable
locale. Since Eloquence internally uses the HP-ROMAN8 character
mapping, the chosen locale needs to be built for this character set,
or the order will be wrong.
For Germany this would typically be de_DE.hp-roman8.
If Linux does not provide the correct locale, you have the option
to create one from the locale source files shipped with your
distribution:
- Please make sure the glibc-i18ndata package from your Linux
distribution is installed. The package name may vary between
distributions.
- As root, change to the directory
/usr/share/i18n/locales. This directory contains the
source files for the installed locales.
- Run localedef to compile and install the new locale. For example, to
create a de_DE locale with the HP-ROMAN8 character set mapping,
the following command would be used:
localedef -i de_DE -f HP-ROMAN8 de_DE.hp-roman8
Then the LANG, ELOQLANG or LC_COLLATE environment variable should
be set to de_DE.hp-roman8 to make use of this locale.
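Before compiling, it can be worth checking whether the locale is already present. The sketch below reads a list of locale names, such as the output of locale -a, and looks for a given name. The helper names squash_name and locale_listed are hypothetical. The comparison ignores case and punctuation on the assumption that glibc may list the codeset in a normalized form (for example hproman8 instead of hp-roman8).

```shell
# Hypothetical helper: reduce a locale name to lower-case letters and digits
# so that "de_DE.hp-roman8" and "de_DE.hproman8" compare equal.
squash_name() {
    printf '%s' "$1" | tr -d '._@-' | tr '[:upper:]' '[:lower:]'
}

# Succeed if the locale named in $1 appears in the list read from stdin.
locale_listed() {
    want=$(squash_name "$1")
    while IFS= read -r name; do
        [ "$(squash_name "$name")" = "$want" ] && return 0
    done
    return 1
}

# Report whether the locale still has to be compiled (localedef needs root).
if locale -a 2>/dev/null | locale_listed de_DE.hp-roman8; then
    echo "de_DE.hp-roman8 is already installed"
else
    echo "not found; run: localedef -i de_DE -f HP-ROMAN8 de_DE.hp-roman8"
fi
```

The script only reports what it finds. The localedef call itself is left to the administrator.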
The sort order may still be different from what is expected.
The Linux operating system by default uses a collating sequence
which follows the established standards (ISO 14651) but may be
incompatible with assumptions made by the software.
For example, the German locale defines the following properties:
- Lower and upper case letters are "folded" (and may even evaluate
to the same value). For example, a would be sorted between A and B
or could even be considered identical with A.
- Lower case letters have a smaller value than upper case ones.
- Spaces and control characters are sorted differently.
While this is correct, it may well be unexpected for the customer or
even violate assumptions made by the application.
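A quick way to see these differences is to sort the same input under two locales. In the sketch below, the helper name sort_with is hypothetical. The second call only shows the German rules if the de_DE.hp-roman8 locale is installed. Otherwise sort typically falls back to byte order, and the shell may print a warning.

```shell
# Hypothetical helper: sort stdin under the collating rules of locale $1.
sort_with() {
    LC_ALL="$1" sort
}

# Byte order: all upper case letters first (A, B, a, b).
printf 'b\nA\na\nB\n' | sort_with C

# Locale rules: upper and lower case are folded together, if installed.
printf 'b\nA\na\nB\n' | sort_with de_DE.hp-roman8
```

Comparing the two outputs on your own data shows whether an application relies on byte order.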
As a solution, you can install your own locale to define more traditional
rules. As an example, we created a definition for a locale de_DE@eq
which implements more traditional ordering rules for the de_DE
locale. The same approach could also be used with Eloquence for other
locales.
- Please make sure the localedb package from your Linux
distribution is installed. The package name may vary between
distributions. This package should provide the localedef command and
the locale sources.
- As root, change to the directory
/usr/share/i18n/locales. This directory contains the
source files for the installed locales. Unpack the example
locale files into this directory:
tar -xzvf /tmp/eq_locale_de.tar.gz
This should unpack the files de_DE@eq and eq_coll_1.
- Run localedef to compile and install the locale:
localedef -i de_DE@eq -f HP-ROMAN8 de_DE@eq
This creates the de_DE@eq locale which uses the HP-ROMAN8 character
encoding.
The new locale (de_DE@eq) is now installed on your system (most
likely in the directory /usr/share/locale/).
To make Eloquence use this locale, set the
LANG, ELOQLANG or LC_COLLATE environment
variable accordingly:
export ELOQLANG=de_DE@eq
The example locale file is available on the Eloquence FTP server at
ftp://ftp.marxmeier.com/eloq/misc/eq_collate_de.tar.gz.
The example files have been tested with glibc 2.2 and may not work
with earlier glibc versions.