How Unicode affects searching in EMu
Now that we understand what an index term is, we can talk about searching. With the incorporation of Unicode into EMu, an index term may now consist of:
- letters
- numbers
- punctuation
- symbols
Because punctuation is now an index term, it may be included in searches, and the high-speed indexes are used to locate matches.

An issue arises because, in versions of EMu prior to 5.0, some punctuation characters have a special meaning in a search. For example:
- `@John` will find all records containing words that sound like John (phonetic searching).
- `^joh*` uses wildcards to match records where the first word starts with the letters joh (case ignored).
- `=John` will locate records containing John with case significance (that is, an upper case J and lower case ohn).
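Because versions of EMu prior to 5.0 do not index punctuation, these characters could safely be given special meanings. So what does a search for `fred@global.com` mean? In EMu 4.3 it means:
- "fred"
- AND the phonetic of "global"
- AND "com"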
However, in EMu 5.0, is the @ character to be treated as punctuation, or does it mean the phonetic of the word "global"?
When searching for `fred@global.com` in EMu 5.0, the @ character is treated as punctuation and so must appear in matching records.
How then do we indicate that the @ character means we want the phonetic version of the following word? We precede the character with a special marker indicating that the character is to take on its special (here, phonetic) meaning. The marker character used is the backslash (`\`). Using a marker character to alter the meaning of another character is not new in computing: in many programming languages `\n` can be used in strings to represent the newline character, and in some, `\u{}` introduces the escape sequence for a Unicode code point.
There is a simple rule for formatting a search in EMu 5.0: all graphic characters in a search, except spaces and marks, are matched literally. Where the special meaning of a character (e.g. `@`) is required, the character must be preceded by the backslash (`\`) escape character. The only exception is the backslash itself, which must be entered twice (`\\`) where the actual character is required.
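To make the rule concrete, here is a minimal Python sketch that splits an EMu 5.0 search string into operators and literal characters. It is not EMu's parser: the names `tokenize`, `literal` and `OPERATORS` are hypothetical, and the operator spellings are simply those that appear in this document.

```python
# Hypothetical sketch of the EMu 5.0 escaping rule; not EMu's own parser.
# Operator spellings are the ones used in this document. Longer spellings
# are tried first so that \== is read as one operator, not \= plus "=".
OPERATORS = sorted(
    ["==", "=", "~", "&", "@", "?", "*", "[", "]", "{", "}",
     "^", "$", '"', "'", "(", ")", "!"],
    key=len,
    reverse=True,
)


def tokenize(search: str) -> list[tuple[str, str]]:
    """Split a search into ("op", ...), ("char", ...) and ("space", ...) tokens."""
    tokens = []
    i = 0
    while i < len(search):
        ch = search[i]
        if ch.isspace():
            # Spaces separate index terms rather than being matched.
            tokens.append(("space", ch))
            i += 1
        elif ch != "\\":
            # Unescaped characters, including @ ^ $ and =, match themselves.
            tokens.append(("char", ch))
            i += 1
        elif search.startswith("\\\\", i):
            # A doubled backslash stands for one literal backslash.
            tokens.append(("char", "\\"))
            i += 2
        else:
            for op in OPERATORS:
                if search.startswith(op, i + 1):
                    tokens.append(("op", op))
                    i += 1 + len(op)
                    break
            else:
                raise ValueError(f"unknown escape at position {i}")
    return tokens


def literal(text: str) -> str:
    """Escape text so that every character in it is searched for literally."""
    # Under the rule above, only the backslash itself needs escaping.
    return text.replace("\\", "\\\\")
```

With this model, `tokenize("fred@global.com")` yields `@` as an ordinary character, while `tokenize(r"fred\@global.com")` yields it as the phonetic operator.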
The table below compares some searches in EMu 4.3 and EMu 5.0:
Find | EMu 4.3 | EMu 5.0 |
---|---|---|
Records containing Fred | `fred` | `fred` |
Records where Fred is the only word in the field | `^fred$` | `\^fred\$` |
Records that contain Fred phonetically | `@fred` | `\@fred` |
Records containing Fred with matching case | `=Fred` | `\=Fred` |
Records containing the phrase Sacré-Cœur | `"sacré cœur"` | `\"sacre-coeur\"` |
Records where blue and sky are within five index terms of each other | `'(blue sky) <= 5 words'` | `\'\(blue sky\) <= 5 words\'` |
In the following sections we look at all available special search operators and show examples of their use in EMu 5.0.

Transformations
A transformation is an operator applied to a search term to alter how it is interpreted. The table below lists all valid transformations:
Transformation | Type of search | Description |
---|---|---|
`\~` | Stemming | Search for all variations of a word. For example, searching for `\~elect` will match elect, election, electing and elected, but not electricity (its base word is electric). |
`\&` | Case | Ignore the case (upper or lower) of the search term. This is the default transformation if one is not specified explicitly. |
`\@` | Phonetic | Use phonetic ("sounds like") searching for the specified word. |
`\=` | Case | Perform the search using case significance for the following word. |
`\==` | Diacritics | Perform the search matching not only the case but also any marks (diacritics). |
A transformation is always applied to a word and is placed immediately before the word to which it applies. Some examples are:
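Find | Search |
---|---|
Records containing all tenses of the word locate | `\~locate` |
Records where melbourne appears in lower case | `\=melbourne` |
Records with Sacré Cœur, matching both case and diacritics | `\==Sacré \==Cœur` |
Records containing words that sound like smythe | `\@smythe` |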
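The three case and diacritic transformations differ only in what they treat as significant. The Python sketch below models them with the standard `unicodedata` module, assuming that the default `\&` ignores both case and marks, that `\=` makes case significant but still ignores marks, and that `\==` makes both significant. `term_matches` and `strip_marks` are hypothetical names, and EMu's own normalisation may differ (for example, NFD does not decompose the ligature œ, so this sketch does not equate cœur with coeur). Stemming (`\~`) and phonetic (`\@`) matching are not modelled.

```python
import unicodedata


def strip_marks(text: str) -> str:
    """Decompose text (NFD) and drop combining marks such as accents."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def term_matches(search: str, indexed: str, transformation: str = "&") -> bool:
    """Compare one search word with one indexed word (hypothetical model)."""
    if transformation == "&":
        # Default: case ignored; marks assumed to be ignored as well.
        return strip_marks(search).casefold() == strip_marks(indexed).casefold()
    if transformation == "=":
        # Case significant, marks ignored.
        return strip_marks(search) == strip_marks(indexed)
    if transformation == "==":
        # Case and marks significant; NFC makes composed and decomposed
        # spellings of the same character compare equal.
        return (unicodedata.normalize("NFC", search)
                == unicodedata.normalize("NFC", indexed))
    raise ValueError(f"unsupported transformation: {transformation!r}")
```

Under these assumptions, `term_matches("melbourne", "Melbourne")` is true, but the same comparison with `"="` is false.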

Regular expressions
Regular expressions provide a mechanism for searching for patterns within a word, so that parts of a word can be matched. In general, the high-speed indexes cannot be used with regular expression searches. The only exception is a trailing regular expression (that is, one with leading letters, such as `abs\*`) where partial indexing has been enabled.
Regular expressions can be combined with the `\=` and `\==` transformations to enforce case and diacritic significance.
The table below lists all valid regular expressions:
Regular Expression | Type of search | Description |
---|---|---|
`\?` | Wildcard | Matches any single grapheme. |
`\*` | Wildcard | Matches zero or more graphemes. |
`\[any one of\]` | Pattern Matching | Matches exactly one grapheme from those listed in *any one of*, which may consist of individual graphemes or a range given as a beginning and end grapheme separated by a minus sign. |
`\{one or more of\}` | Pattern Matching | Matches one or more graphemes from those listed in *one or more of*, which may consist of individual graphemes or a range given as a beginning and end grapheme separated by a minus sign (e.g. `\{0-9\}`). |
Some examples are:
Find | Search |
---|---|
Records containing words starting with abs | `abs\*` |
Records containing Arabic numbers | `\{0-9\}` |
Records with a three-grapheme word | `\?\?\?` |
Records with organisation or organization | `organi\[sz\]ation` |
Records with at least one word containing a capital S | `\=\*S\*` |
Records containing either an upper case or lower case é | `\==\*\[éÉ\]\*` |
|
Anchors
Anchors indicate that a search term must be the first or last word in a piece of text. Anchors can be combined with all other types of search operators: transformations, regular expressions, phrases and proximity.
The table below lists all valid anchors:
Anchors | Type of search | Description |
---|---|---|
`\^` | Anchor | The following search term must be the first word in the text. |
`\$` | Anchor | The preceding search term must be the last word in the text. |
Some examples are:
Find | Search |
---|---|
Records with text ending in a question mark | `?\$` |
Records with text beginning with the word the | `\^the` |
Records where the text contains only the word Unknown | `\^Unknown\$` |
Records with text where the first word starts with a lower case Latin letter | |
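Because punctuation is now an index term, an anchor can apply to a punctuation mark as well as to a word, as in the `?\$` example. The sketch below models this in Python under a hypothetical tokenisation: a run of letters or digits is one index term, and every other non-space character is a term of its own. `index_terms` and `anchored` are illustrative names, not EMu functions, and case is ignored as with the default transformation.

```python
import re
import unicodedata


def index_terms(text: str) -> list[str]:
    """Hypothetical tokeniser: words, plus one term per punctuation mark."""
    text = unicodedata.normalize("NFC", text)
    return [t.casefold() for t in re.findall(r"\w+|[^\w\s]", text)]


def anchored(text: str, term: str, first: bool = False, last: bool = False) -> bool:
    """Model a leading anchor (first=True), a trailing anchor (last=True), or both."""
    terms = index_terms(text)
    target = term.casefold()
    if not terms:
        return False
    if first and last:
        return terms == [target]  # the only term in the text
    if first:
        return terms[0] == target
    if last:
        return terms[-1] == target
    return target in terms
```

In this model, `index_terms("fred@global.com")` gives `["fred", "@", "global", ".", "com"]`, which is why a trailing `?` can be anchored just like a word.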

Proximity searches
Proximity searching provides a mechanism for finding a list of words within a specified distance (in words, sentences or paragraphs) of each other. There are two types of proximity search:
- The first is the phrase search, where the words must appear next to each other and in the order specified. The words in a phrase search may have transformations, regular expressions and anchors applied.
- The second is the regular proximity search. Proximity searches may include transformations, regular expressions, anchors and phrases.
The table below lists all valid proximity operators:
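| Proximity | Type of search | Description |
|---|---|---|
| `\"…\"` | Phrase | The search terms enclosed within the phrase operator must appear next to each other and in the order specified. |
| | Proximity | The search terms may appear in any order unless otherwise specified. The distance between the terms, measured in words, sentences or paragraphs, gives the range within which the search terms must appear; for example, `\'\(blue sky\) <= 5 words\'` finds records where blue and sky are within five index terms of each other. |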
Some examples are:
Find | Search |
---|---|
Records where the phrase the black cat appears | `\"the black cat\"` |
Records containing only the phrase Not Applicable | `\"\^Not Applicable\$\"` |
Records where | |
Records where the kanji character 豈 appears within 5 characters of the phrase 香港 | |
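The difference between a phrase search and a proximity search can be sketched in Python. The sketch assumes a hypothetical tokenisation (a run of letters or digits is one index term; each other non-space character is a term of its own) and measures distance as the difference in index-term positions, so `within(text, ["blue", "sky"], 5)` approximates the `<= 5 words` example. `phrase` and `within` are illustrative names only; EMu's own definition of distance, and its handling of sentences and paragraphs, may differ.

```python
import re
import unicodedata


def index_terms(text: str) -> list[str]:
    """Hypothetical tokeniser: words, plus one term per punctuation mark."""
    text = unicodedata.normalize("NFC", text)
    return [t.casefold() for t in re.findall(r"\w+|[^\w\s]", text)]


def phrase(text: str, words: list[str]) -> bool:
    """True if the words appear next to each other, in the given order."""
    terms = index_terms(text)
    target = [w.casefold() for w in words]
    n = len(target)
    return any(terms[i:i + n] == target for i in range(len(terms) - n + 1))


def within(text: str, words: list[str], distance: int) -> bool:
    """True if all words fall in a window spanning at most `distance` positions.

    Order does not matter. A sliding window over the positions of the
    wanted words finds the narrowest span that still contains all of them.
    """
    terms = index_terms(text)
    wanted = {w.casefold() for w in words}
    hits = [(pos, t) for pos, t in enumerate(terms) if t in wanted]
    counts: dict[str, int] = {}
    left = 0
    for pos, t in hits:
        counts[t] = counts.get(t, 0) + 1
        # Shrink from the left while the window still holds every word.
        while len(counts) == len(wanted):
            left_pos, left_t = hits[left]
            if pos - left_pos <= distance:
                return True
            counts[left_t] -= 1
            if counts[left_t] == 0:
                del counts[left_t]
            left += 1
    return False
```

Because punctuation counts as index terms in this model, commas between two words widen the distance between them.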

Conditionals
EMu 5.0 provides a single conditional operator, NOT (`\!`). The NOT operator excludes records that have the next term in the searched field.
For example, a search for `\!rock roll` will return records that mention sausage roll in the searched field, but not those that mention rock and roll.
The NOT operator can be applied to any of the other search operators: transformations, regular expressions, anchors and proximity.
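The effect of `\!` on a set of records can be sketched as a filter: a record matches `\!rock roll` if it contains roll and does not contain rock. The Python below is a hypothetical model: the names `terms_of`, `record_matches` and `search` are illustrative, words and punctuation marks are split into separate terms, case is ignored, and only single-word terms are handled.

```python
import re


def terms_of(text: str) -> set[str]:
    """Hypothetical tokeniser: words and punctuation marks, case-folded."""
    return {t.casefold() for t in re.findall(r"\w+|[^\w\s]", text)}


def record_matches(text: str, required: list[str], excluded: list[str]) -> bool:
    """Every required term present and no excluded (NOT) term present."""
    terms = terms_of(text)
    return (all(r.casefold() in terms for r in required)
            and not any(e.casefold() in terms for e in excluded))


def search(records: list[str], required: list[str], excluded: list[str]) -> list[str]:
    """Return the records that satisfy the search, in their original order."""
    return [r for r in records if record_matches(r, required, excluded)]
```

Here `search(["a sausage roll", "rock and roll", "bread roll"], ["roll"], ["rock"])` returns `["a sausage roll", "bread roll"]`.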
The table below lists the valid conditional operator:
Conditionals | Type of search | Description |
---|---|---|
`\!` | NOT | Records that contain the next search term in the searched field are excluded from search results. |
Some examples are:
Find | Search |
---|---|
Records that do not contain the kanji character 豈 | |
Records that contain anything apart from the single word Unknown | `\!\^Unknown\$` |
Records that do not contain the phrase Not Applicable | `\!\"Not Applicable\"` |
Records containing the phrase Sacré Cœur (matching case and diacritics) but not the word Paris | `\"\==Sacré \==Cœur\" \!Paris` |