EMu's support for Unicode
Frederic would not match Fréderic, as the e acute character was not interpreted as an e character with a diacritic associated with it.
Now
- Case folding is similar to converting a character to its lower case equivalent, except that it handles some special cases. The purpose of case folding is to make searching case insensitive. One special case is that the German lower case sharp s character (ß) is generally written in upper case as SS. So Großen would be converted to GROSSEN in upper case. When searching, we would like to enter either of the previous terms and find all case variations. To do this, the ß character needs to be folded to ss for searching purposes.
- The base version of a character is its most basic representation after all diacritics and marks have been removed. For example, the base character of é is e.
Together, case folding and base characters provide the basic mechanisms required for flexible searching over the full range of Unicode characters.
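The two mechanisms can be illustrated with a minimal sketch in Python's standard library. This is not EMu's implementation; the function names `base_characters` and `search_key` are hypothetical, and the sketch assumes that stripping combining marks after canonical decomposition (NFD) is an acceptable approximation of a "base character".

```python
import unicodedata


def base_characters(text):
    """Return text with diacritics and other combining marks removed."""
    # NFD splits a character such as é into e plus a combining acute accent.
    decomposed = unicodedata.normalize("NFD", text)
    # Keep only characters that are not combining marks.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def search_key(text):
    """Build a comparison key: case fold first, then reduce to base characters."""
    # str.casefold() applies Unicode case folding, so ß becomes ss.
    return base_characters(text.casefold())


print(search_key("Großen"))    # grossen
print(search_key("GROSSEN"))   # grossen
print(search_key("Fréderic"))  # frederic
```

Comparing search keys rather than the raw strings lets Großen match GROSSEN and Frederic match Fréderic, which is the behaviour described above.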
All data stored in
Tip: The Import Tool is able to convert ANSI to UTF-8.
With Unicode support, searching in ?
) or as part of a more complex string (fred@global.com).
If you are already familiar with searching in fre*
in fre
. In fre*
in fre*
.
Escaping a character involves preceding it with a backslash (\). Thus an fre* to locate all words beginning with the letters fre becomes fre\* in
In the following sections we explore what changes have been implemented and how they impact usage of