LCMapString
Functional Difference from WIN95
T.B.D.
Functional Difference from SBCS Open32
New
Implementation
This function maps one character string to another, performing a specified
locale-dependent transformation. The function can also be used to generate
a sort key for the input string. Instead of calling UniTransLower or
UniTransUpper, the same result can be obtained by calling
UniCreateTransformObject and UniTransformStr.
───────────────────────────────────────────────────────────────
WIN95 flag            Mapping in OS/2
LCMAP_FULLWIDTH       Mapping table in Open32 *1
LCMAP_HALFWIDTH       Mapping table in Open32 *2
LCMAP_HIRAGANA        Mapping table in Open32 *3
LCMAP_KATAKANA        Mapping table in Open32 *4
LCMAP_LOWERCASE       UniTransLower
LCMAP_UPPERCASE       UniTransUpper
LCMAP_SORTKEY         UniStrxfrm *5
───────────────────────────────────────────────────────────────
SORT_STRINGSORT
NORM_IGNORECASE       *6
NORM_IGNOREKANATYPE   *7
NORM_IGNORENONSPACE   *8
NORM_IGNORESYMBOLS
NORM_IGNOREWIDTH      *9
Note:
- *1 Maps half-width characters to full-width characters by using ToFullTBL[]
and FullHalfTbl[], which are internal hard-coded tables in Open32.
- *2 Maps full-width characters to half-width characters. Hiragana and
Katakana characters are mapped by using FullHalfTbl[], which is an internal
hard-coded table in Open32.
- *3 Maps Katakana characters to Hiragana characters by using ToHiraTBL[],
which is an internal hard-coded range table in Open32.
- *4 Maps Hiragana characters to Katakana characters by using ToKataTBL[],
which is an internal hard-coded range table in Open32.
- *5 When LCMAP_SORTKEY is set, in most cases this function first maps the
characters to the appropriate characters, as described in the following
notes, and then gets the sort key. There are three exceptions:
NORM_IGNORESYMBOLS, NORM_IGNOREKANATYPE combined with NORM_IGNOREWIDTH, and
SORT_STRINGSORT. In those cases this function operates on the resulting
sort key instead.
- *6 Before getting the sort key, this function maps upper-case characters
to lower-case characters.
- *7 Before getting the sort key, this function maps Hiragana characters to
Katakana characters.
- *8 Before getting the sort key, this function maps each character combined
with a nonspacing character (VOICE_SOUND or SEMIVOICE_SOUND) to two
characters: the base character and the nonspacing character. ToBaseTBL[],
an internal table in Open32, is used for this mapping.
- *9 If both NORM_IGNOREWIDTH and NORM_IGNOREKANATYPE are set, Hiragana is
mapped to Katakana.
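The range-table idea behind the LCMAP_HIRAGANA and LCMAP_KATAKANA notes can be sketched as follows. This is only an illustration: the contents of ToHiraTBL[] and ToKataTBL[] are not given here, and Open32 most likely stores code page values rather than Unicode. The names KanaRange, map_by_ranges, to_katakana and to_hiragana are hypothetical, and the ranges assume Unicode code points, where Hiragana U+3041-U+3096 and Katakana U+30A1-U+30F6 differ by 0x60.

```c
#include <stddef.h>
#include <stdint.h>

/* One entry of a hypothetical range table: every code point in
   [first, last] is shifted by delta. Not the real Open32 layout. */
typedef struct {
    uint16_t first;
    uint16_t last;
    int16_t  delta;
} KanaRange;

/* Assumption: Unicode Hiragana U+3041..U+3096 -> Katakana U+30A1..U+30F6. */
static const KanaRange kToKata[] = {
    { 0x3041, 0x3096, 0x60 },
};

/* Assumption: the reverse direction subtracts the same offset. */
static const KanaRange kToHira[] = {
    { 0x30A1, 0x30F6, -0x60 },
};

/* Return the mapped code point, or ch unchanged if no range matches. */
static uint16_t map_by_ranges(uint16_t ch, const KanaRange *tbl, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (ch >= tbl[i].first && ch <= tbl[i].last)
            return (uint16_t)(ch + tbl[i].delta);
    }
    return ch;
}

uint16_t to_katakana(uint16_t ch)
{
    return map_by_ranges(ch, kToKata, sizeof kToKata / sizeof kToKata[0]);
}

uint16_t to_hiragana(uint16_t ch)
{
    return map_by_ranges(ch, kToHira, sizeof kToHira / sizeof kToHira[0]);
}
```

Characters outside the listed ranges pass through unchanged, which is why a range table needs entries only for the blocks that actually move.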
Definition of the sort key.
Open32 generates the sort key so that it returns the same sort key as
WIN95, except for the Unicode character weight.
Here is the definition of the sort key array.
According to MSDN:
- [Unicode sort weights] 0x01
- [Diacritic weights] 0x01
- [Case weights] 0x01
- [Special weights] 0x00
On the other hand, here is the actual result of LCMapString() in Windows95-J:
- [Unicode weights] 0x01
- [Diacritic weights] 0x01
- [Case weights] 0x01
- [Special weights] 0x01
- [Other weights] 0x00
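The observed five-field layout can be sketched as a small assembly routine. This is a minimal sketch, not the Open32 code: WeightField and build_sort_key are hypothetical names, and the weight bytes passed in are whatever the caller has already computed. The separator 0x01 is written after each of the first four fields even when a field is empty, and the key ends with 0x00.

```c
#include <stddef.h>
#include <string.h>

/* One weight field of the key; len may be 0 for an empty field. */
typedef struct {
    const unsigned char *bytes;
    size_t len;
} WeightField;

enum { FIELD_COUNT = 5 };  /* Unicode, diacritic, case, special, other */

/*
 * Hypothetical helper: concatenates the five fields in the observed order,
 * writes 0x01 after each of the first four fields (even an empty one)
 * and terminates the key with 0x00.
 * Returns the key length in bytes, or 0 if cap is too small.
 */
size_t build_sort_key(const WeightField fields[FIELD_COUNT],
                      unsigned char *out, size_t cap)
{
    size_t need = FIELD_COUNT;  /* four separators plus the terminator */
    for (int i = 0; i < FIELD_COUNT; i++)
        need += fields[i].len;
    if (cap < need)
        return 0;

    size_t pos = 0;
    for (int i = 0; i < FIELD_COUNT; i++) {
        if (fields[i].len > 0) {
            memcpy(out + pos, fields[i].bytes, fields[i].len);
            pos += fields[i].len;
        }
        out[pos++] = (i < FIELD_COUNT - 1) ? 0x01 : 0x00;
    }
    return pos;
}
```

With a two-byte Unicode weight field and four empty fields, the helper produces the two weight bytes followed by 01 01 01 01 00, seven bytes in total.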
Here are the rules for each field, as found from the actual results.
- The separator 0x01 always exists, even if the weight is empty.
- The sort key array of the string is grouped by kind of weight.
- The Unicode character weight is common to characters whose other weights
differ. For example, Katakana 'a' is equal to Hiragana 'a' in Unicode
character weight.
- Some weights can be omitted. If no weight follows, that kind of weight
does not appear.
Here are the weight values, as found from the actual results.
- Alpha-numeric (AW) Weights
  - 1st byte of the Unicode character weight.
    - 0x00-0x08, etc.
    - Special character 0x09-0x0d !"#$%&()*,./:;?@[]^_`{|}
    - Math symbol + < = >
    - 0a \
    - 0c Numeric character 0123456789
    - 0e Alphabet character A-Z a-z
    - 22 Katakana, Hiragana
    - 8x Kanji character
    - fe 0xfd 0xfe
    - ff
  - 2nd byte of the Unicode character weight.
    The character weight is common to characters whose other weights differ.
    - eg1 Half-width lower
    - Half-width upper
    - Full-width lower
    - Full-width upper
    - eg2 Small Half-width Katakana
    - Large Half-width Katakana
    - Small Full-width Katakana
    - Large Full-width Katakana
    - Small Full-width Hiragana
    - Large Full-width Hiragana
- Diacritic Weight (DW)
  - 02 Single-byte lower character (Omitted if nothing follows.)
  - 03 Katakana voice sound nonspacing character.
  - 04 Katakana semi-voice sound nonspacing character.
- Case Weight (CW)
  - 02 Single-byte lower character (Omitted if nothing follows.)
  - 03 Double-byte lower character
  - 0c Single-byte upper character
  - 0d Double-byte upper character
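The case weights listed above, together with the "omitted if nothing follows" rule, can be sketched as below. Both function names are hypothetical. The trimming step rests on one reading of the rule, namely that a trailing run of the value 0x02 is dropped from the end of the field; the text does not spell this out.

```c
#include <stddef.h>

/* Case weight values as listed in the text:
   02 single-byte lower, 03 double-byte lower,
   0c single-byte upper, 0d double-byte upper. */
unsigned char case_weight(int is_upper, int is_double_byte)
{
    if (is_upper)
        return is_double_byte ? 0x0d : 0x0c;
    return is_double_byte ? 0x03 : 0x02;
}

/*
 * Assumption: "omitted if nothing follows" means that a trailing run of
 * 0x02 weights is dropped. Returns the number of weights that remain.
 */
size_t trim_trailing_default(const unsigned char *weights, size_t n)
{
    while (n > 0 && weights[n - 1] == 0x02)
        n--;
    return n;
}
```

Under this reading, a single-byte string such as "aB" keeps both case weights (02 0c), "Ba" keeps only the first (0c), and an all-lower-case single-byte string leaves the case field empty.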
- Special Weight
  This field is divided into sub-fields by a unique separator, 0xFF.
  - Field1
    - c4 small Katakana or small Hiragana
    - c6 Single-byte Katakana (Omitted if nothing follows.)
  - Field2
    - Katakana or Hiragana ( Only one in the field regardless the n
    - c4 Katakana
    - e4 Hiragana (Omitted if nothing follows.)
  - Field3
    - c4 Single-byte Katakana
    - c5 small Hiragana (Omitted if nothing follows.)
- Other weight
  MSDN does not describe this field. Each character that has a weight in
  this field has a 4-byte weight, which begins with 0x80 0x70 0x06. This
  weight is ignored when the NORM_IGNORESYMBOLS flag or the SORT_STRINGSORT
  flag is set. When the SORT_STRINGSORT flag is set, a character weight is
  generated instead for each character that has a weight in this field. In
  that character weight, the 1st byte is one of the values 0x07, 0x08, and
  0x0A listed above, and the 2nd byte is the original 4th byte of this field.