Ordering
Ordering by the binary value of a character code produces unsatisfactory
results, because:
- The binary value of a code does not reflect its position in any
ordering sequence.
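For example, in Japanese, the binary values of the SBCS Katakana
characters fall in the middle of the range used by DBCS characters (in
code page 932, SBCS Katakana occupy X'A1' to X'DF', between the DBCS
first-byte ranges X'81' to X'9F' and X'E0' to X'FC').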
- Each DBCS language has several ways to order DBCS characters.
Which one to use depends on the requirements.
For example, the following sequences are used for Japanese:
- Radical stroke count sequence
- Total stroke count sequence
- Phonetic reading sequence
- Representative phonetic reading sequence
- Combination of the above
- User defined sequence
Usually, sorting requires information beyond the character code points,
such as the phonetic reading.
To satisfy the minimum requirements, you may use the collating sequence
table provided by OS/2 in accordance with the current process code page
(DosGetCollate) and re-align the order as follows:
- Check if the byte is an SBCS character or the first
byte of a DBCS character.
- If so, translate it using the OS/2 collating sequence
table.
- If not (i.e. if the byte is the second byte of a DBCS
character), leave it as it is.
- Perform ordering by using those values.
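The steps above can be sketched in C roughly as follows. This is a minimal
illustration under stated assumptions, not the OS/2 implementation:
make_sort_key and compare_keys are hypothetical names, and the caller is
assumed to have already filled a 256-byte collate table (for example with
DosGetCollate) and a 256-byte is_lead table that marks the DBCS first bytes
of the current process code page.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: build a sort key from a mixed SBCS/DBCS string.
 *   collate[256] : collating weights for the current code page
 *                  (assumption: obtained by the caller, e.g. DosGetCollate).
 *   is_lead[256] : nonzero for bytes that start a DBCS character
 *                  (assumption: built from the code page's lead-byte ranges).
 *   key          : output buffer, must hold at least len bytes.
 */
static void make_sort_key(const unsigned char *src, size_t len,
                          const unsigned char collate[256],
                          const unsigned char is_lead[256],
                          unsigned char *key)
{
    size_t i = 0;

    while (i < len) {
        /* SBCS character or DBCS first byte: translate through the table. */
        key[i] = collate[src[i]];

        if (is_lead[src[i]] && i + 1 < len) {
            /* DBCS second byte: leave it as it is. */
            key[i + 1] = src[i + 1];
            i += 2;
        } else {
            /* SBCS character, or a truncated DBCS character at the end. */
            i += 1;
        }
    }
}

/* Order two keys by their translated byte values; shorter key sorts first
 * when one key is a prefix of the other. */
static int compare_keys(const unsigned char *a, size_t alen,
                        const unsigned char *b, size_t blen)
{
    size_t n = (alen < blen) ? alen : blen;
    int r = memcmp(a, b, n);

    if (r != 0)
        return r;
    return (alen > blen) - (alen < blen);
}
```

A caller would build one key per string and then sort on compare_keys.
Building the keys once, rather than translating inside every comparison,
avoids repeating the table lookups during the sort.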
You should also provide user exit(s) for national language-unique ordering.
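
One possible shape for such a user exit is a replaceable comparison
callback. The sketch below is only an illustration: collate_exit_fn,
set_collate_exit, compare_strings, and example_reverse_exit are all
hypothetical names, and the reversing exit merely stands in for a real
national language routine (for example, one that compares phonetic
readings).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical user-exit signature: returns <0, 0, or >0, like memcmp. */
typedef int (*collate_exit_fn)(const unsigned char *a, size_t alen,
                               const unsigned char *b, size_t blen);

/* Currently installed exit; NULL means "use the default ordering". */
static collate_exit_fn g_collate_exit = NULL;

/* Install (or, with NULL, remove) the user exit. */
static void set_collate_exit(collate_exit_fn fn)
{
    g_collate_exit = fn;
}

/* Default ordering: plain byte-wise comparison of the two strings. */
static int default_compare(const unsigned char *a, size_t alen,
                           const unsigned char *b, size_t blen)
{
    size_t n = (alen < blen) ? alen : blen;
    int r = memcmp(a, b, n);

    if (r != 0)
        return r;
    return (alen > blen) - (alen < blen);
}

/* Placeholder exit for illustration only: reverses the default order. */
static int example_reverse_exit(const unsigned char *a, size_t alen,
                                const unsigned char *b, size_t blen)
{
    return default_compare(b, blen, a, alen);
}

/* Entry point used by the sort: call the exit if one is installed. */
static int compare_strings(const unsigned char *a, size_t alen,
                           const unsigned char *b, size_t blen)
{
    if (g_collate_exit != NULL)
        return g_collate_exit(a, alen, b, blen);
    return default_compare(a, alen, b, blen);
}
```

Routing every comparison through compare_strings keeps the sort code
unchanged when a language-specific ordering is installed or removed.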