Unexpected sort order (Linux)

Title: Unexpected sort order (Linux)
Document: 996585141
Author: Michael Marxmeier, Roland Genske
Keywords: linux,LANG,collating sequence,locale,sort,order


Q: Eloquence uses an unexpected sorting order on Linux.

A: Eloquence uses the collating sequence defined by the operating system for the SORT BY, QFIND .. IN statements and the LEX function.

Since Eloquence internally uses the HP-ROMAN8 character set encoding, the locale should use plain byte order (the C or POSIX locale). The LANG, ELOQLANG and LC_COLLATE environment variables should either be unset or set to POSIX or C.
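A minimal sketch of the byte-order setup, assuming a Bourne-compatible shell and that these lines go into the profile or start script from which Eloquence is launched (that placement is an assumption, not something prescribed here):

```shell
#!/bin/sh
# Option 1: remove the variables from the environment that starts
# Eloquence (the C locale is then the default):
#   unset LANG LC_COLLATE ELOQLANG
# Option 2: set them explicitly to C (POSIX is equivalent):
export LANG=C LC_COLLATE=C ELOQLANG=C
```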

Relevant environment variables:

LANG
The LANG environment variable defines the locale. It provides the default for various categories such as ordering (collation), character classification, date, time and monetary formats, and the language used for messages. Individual categories can be overridden by setting the environment variable for that category. Please refer to man locale for more information.

LC_COLLATE
The LC_COLLATE environment variable defines the setting used for ordering and comparison. If set, it overrides the LANG setting for the collation category.

ELOQLANG
The ELOQLANG environment variable allows setting an Eloquence-specific value for LANG, LC_CTYPE and LC_COLLATE.
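The precedence between LANG and LC_COLLATE can be checked with the standard locale command. This is a sketch assuming glibc's locale utility; LC_ALL, which would override both variables, is cleared for the check. ELOQLANG is read only by Eloquence, so locale does not report it:

```shell
#!/bin/sh
# LANG supplies the default for every category; LC_COLLATE overrides
# only the collation category. glibc's locale command prints values
# inherited from LANG in quotes and explicitly set ones without quotes.
env -u LC_ALL LANG=C LC_COLLATE=POSIX locale |
  grep -E '^(LANG|LC_CTYPE|LC_COLLATE)='
```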

Using the POSIX or C locale has the disadvantage that national characters are not sorted in the expected order. For example, the German umlaut Ä should be treated as equivalent to AE and sorted between A and B. Since Ä has the binary value 216 in Eloquence (HP-ROMAN8), it is sorted after all ASCII letters.
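The effect can be reproduced outside Eloquence with the standard sort command. This sketch feeds it the HP-ROMAN8 byte for Ä (216, octal 330) together with a few ASCII letters; LC_ALL=C is used here only to force byte-order collation for the demonstration:

```shell
#!/bin/sh
# Under byte-order (C/POSIX) collation, lines are compared by byte value:
# A=65, B=66, a=97, b=98, HP-ROMAN8 Ä=216. Ä therefore ends up last,
# after the lower case letters, instead of between A and B.
printf 'b\nB\n\330\nA\na\n' | LC_ALL=C sort | od -An -c
```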

If the sorting order is important, you should set the LANG, ELOQLANG or LC_COLLATE environment variable to an appropriate locale. Since Eloquence internally uses the HP-ROMAN8 character mapping, the chosen locale must use this character set, or the order will be wrong. For Germany this would typically be de_DE.hp-roman8.
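Before pointing Eloquence at a locale, it can help to confirm that the locale exists and really uses HP-ROMAN8. A sketch assuming glibc's locale command; has_roman8_locale is a hypothetical helper name:

```shell
#!/bin/sh
# Succeeds if the named locale is installed and uses the HP-ROMAN8
# character map. If the locale is missing, glibc falls back to the
# C locale and `locale charmap` reports ANSI_X3.4-1968 instead.
has_roman8_locale() {
  [ "$(LC_ALL="$1" locale charmap 2>/dev/null)" = "HP-ROMAN8" ]
}

if has_roman8_locale de_DE.hp-roman8; then
  export ELOQLANG=de_DE.hp-roman8
else
  echo "de_DE.hp-roman8 is not installed" >&2
fi
```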

If your Linux distribution does not provide a suitable locale, you can create one from the locale source files it includes:

  1. Please make sure the glibc-i18ndata package from your Linux distribution is installed. The package name could vary for your distribution.

  2. As root change to the directory /usr/share/i18n/locales. This directory contains the source files for the installed locales.

  3. Run localedef to compile and install the new locale. For example, to create a de_DE locale with the HP-ROMAN8 character set mapping, the following command would be used:
      localedef -i de_DE -f HP-ROMAN8 de_DE.hp-roman8
    

Then the LANG, ELOQLANG or LC_COLLATE environment variable should be set to de_DE.hp-roman8 to make use of this locale.
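The procedure above can be wrapped in a small function. This is a sketch only: the function name build_hproman8_locale is hypothetical, it must be run as root, and the default arguments reproduce the de_DE example:

```shell
#!/bin/sh
# Hypothetical helper: compile a locale source with the HP-ROMAN8 charmap.
# $1 = locale source name (default de_DE), $2 = new locale name.
build_hproman8_locale() {
  src=${1:-de_DE}
  name=${2:-de_DE.hp-roman8}
  dir=/usr/share/i18n/locales
  # Step 1: the locale sources come from the i18n data package.
  if [ ! -f "$dir/$src" ]; then
    echo "locale source $dir/$src not found" >&2
    return 1
  fi
  # Steps 2 and 3: compile in a subshell so the caller's cwd is unchanged.
  # localedef exits with 1 if it only issued warnings but still wrote the locale.
  (cd "$dir" && localedef -i "$src" -f HP-ROMAN8 "$name")
}
```

For example: `build_hproman8_locale de_DE de_DE.hp-roman8 && export ELOQLANG=de_DE.hp-roman8`.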

The sort order may still differ from what is expected. By default, Linux uses a collating sequence that follows the established standard (ISO 14651), but this may be incompatible with assumptions made by the software.

For example, the German locale defines the following properties:

  • Lower and upper case letters are "folded" (and may even evaluate to the same value). For example, a would be sorted between A and B or could even be considered identical with A.

  • Lower case letters have a smaller value than upper case ones.

  • Spaces and control characters are sorted differently.

While this is correct, it may well be unexpected for the customer or even violate assumptions made by the application.
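The difference between the two rule sets is easy to see on a handful of letters. A sketch, assuming a de_DE.hp-roman8 locale has been built as described above; if it is not installed, sort falls back to byte order and both lines look the same. compare_order is a hypothetical helper name:

```shell
#!/bin/sh
# Print a small sample sorted under the given locale, on one line.
compare_order() {
  printf 'b\nB\na\nA\n' | LC_ALL="$1" sort | tr '\n' ' '
  echo
}

compare_order C                  # byte order: upper case before lower case
compare_order de_DE.hp-roman8    # glibc rules: a and A grouped, lower case first
```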

As a solution, you can install your own locale that defines more traditional rules. As an example, we created a definition for a locale de_DE@eq, which implements more traditional ordering rules for the de_DE locale. The same approach could also be used with Eloquence for other locales.

  1. Please make sure the localedb package from your Linux distribution is installed. The package name could vary for your distribution. This package should provide the localedef command and the locale sources.

  2. As root change to the directory /usr/share/i18n/locales. This directory contains the source files for the installed locales. Please unpack the example locale files into this directory.
      tar -xzvf /tmp/eq_locale_de.tar.gz
    
    This should unpack the files de_DE@eq and eq_coll_1.

  3. Run localedef to compile/install the locale
      localedef -i de_DE@eq -f HP-ROMAN8 de_DE@eq
    
    This creates the de_DE@eq locale which uses the HP-ROMAN8 character encoding.

The new locale (de_DE@eq) is now installed on your system (most likely in the directory /usr/share/locale/).

To make Eloquence use this locale, set the LANG, ELOQLANG or LC_COLLATE environment variable, for example:

  export ELOQLANG=de_DE@eq
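The three steps and the final check can be combined into a sketch like the following. The helper name install_eq_locale is hypothetical, it must run as root, and the archive path defaults to the /tmp location used above:

```shell
#!/bin/sh
# Hypothetical helper: unpack the example sources and compile de_DE@eq.
install_eq_locale() {
  archive=${1:-/tmp/eq_locale_de.tar.gz}
  if [ ! -r "$archive" ]; then
    echo "archive $archive not found" >&2
    return 1
  fi
  # Unpack de_DE@eq and eq_coll_1 next to the other locale sources,
  # then compile the locale with the HP-ROMAN8 character map.
  (cd /usr/share/i18n/locales &&
    tar -xzvf "$archive" &&
    localedef -i de_DE@eq -f HP-ROMAN8 de_DE@eq)
  # localedef may exit non-zero for mere warnings, so check the result
  # directly: the new locale should report HP-ROMAN8 as its charmap.
  [ "$(LC_ALL=de_DE@eq locale charmap 2>/dev/null)" = "HP-ROMAN8" ]
}
```

After it succeeds, `export ELOQLANG=de_DE@eq` selects the locale for Eloquence.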

The example locale file is available on the Eloquence ftp server at ftp://ftp.marxmeier.com/eloq/misc/eq_collate_de.tar.gz.

The example files have been tested with glibc 2.2 and may not work with earlier glibc versions.

 
 
Revision: Mon May 4 15:59:29 2009
  Copyright © 1995-2004 Marxmeier Software AG