Phrase Search, Truncation, and Wildcards

Phrase search is used to focus the search

A phrase is an entity with a certain meaning, formed by two or more words. For example, “university library” and “university library website” are phrases that mean different things.

A phrase matters in information retrieval because it makes the search more precise, which also means that fewer matches are returned. To ensure that the words of a phrase stay together in the intended order, the search engine must be given a specific command. In information retrieval, phrases are often marked with quotation marks: “ ”.

Without quotation marks, search engines often insert the AND operator between two words automatically, whether or not it was typed. Medic is an exception to this rule: it places the OR operator between the words instead. See the next chapter, Combining Search Terms into a Search Query, for more information on the AND and OR operators.

Without phrase search, the search terms may appear anywhere in the text and in any order: in different fields, and near to or far from each other.
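The difference between a phrase search and an implicit AND search can be illustrated with a minimal Python sketch. The function names and the sample record are hypothetical, and real search engines work differently in detail; the sketch only shows the principle.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into words."""
    return re.findall(r"\w+", text.lower())

def matches_all_words(query, text):
    """Implicit AND: every query word occurs somewhere, in any order."""
    words = set(tokenize(text))
    return all(term in words for term in tokenize(query))

def matches_phrase(phrase, text):
    """Phrase search: the query words occur next to each other, in order."""
    terms, words = tokenize(phrase), tokenize(text)
    return any(words[i:i + len(terms)] == terms
               for i in range(len(words) - len(terms) + 1))

# Hypothetical record: both words are present, but not as a phrase.
record = "The library of the university opened a new website."

print(matches_all_words("university library", record))  # True
print(matches_phrase("university library", record))     # False
```

The record contains both words, so the AND search accepts it, but the words are not adjacent in the wanted order, so the phrase search rejects it.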

Truncation allows inflected forms to be included in the search

Words are inflected or appear as parts of compound words. They can also be formed into different kinds of derivatives.

When you search for information, you are rarely looking for only one particular inflected form of a search term. Database search engines, however, usually cannot determine the base form of a term and search for the exact spelling instead. As a result, some relevant results may be unintentionally excluded from the search.

When you wish to search for all possible inflected forms and derivatives, this must be indicated in the search term itself. This is done with truncation.

To truncate a word, type the part of the word that all the wanted forms share (usually the uninflected form or the word stem) and attach a truncation symbol to its end. An asterisk is used as the truncation symbol: *

As a general rule, truncation increases the number of matches, because a single search term retrieves several inflected forms and other variants.

How are words truncated?

The placement of the truncation symbol requires attention. If the word is truncated too early (comp*), irrelevant matches will also be found. If it is truncated too late (computer*), some relevant matches will be missed.

A good general rule is to truncate after the word stem, but this may produce too many irrelevant matches if the stem is very short, e.g., air. Words with irregular forms, such as mouse -> mice, are also problematic for truncation.
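The effect of truncation placement can be sketched in Python by treating a truncated term as a simple prefix match. This is only an illustration: the function name and the word list are hypothetical, and the middle option comput* is added here for comparison.

```python
def truncation_matches(term, vocabulary):
    """Return the words matched by a right-truncated term such as 'comput*'."""
    if not term.endswith("*"):
        # No truncation symbol: only the exact spelling matches.
        return [word for word in vocabulary if word == term]
    stem = term[:-1]
    return [word for word in vocabulary if word.startswith(stem)]

# Hypothetical index vocabulary.
vocabulary = ["company", "compute", "computer", "computers",
              "computing", "mouse", "mice"]

for term in ("comp*", "comput*", "computer*", "mous*"):
    print(term, truncation_matches(term, vocabulary))
```

In this sketch comp* also retrieves company, computer* misses compute and computing, and mous* retrieves mouse but not mice, because the irregular plural does not share the truncated stem.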

Wildcards should be used to retrieve different spellings of the same word

Wildcards are most often used with search terms that have variant spellings. For example, British English uses ‘organisation’, whereas American English uses ‘organization’. A wildcard can be used to replace the characters that vary in a search term.
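A wildcard search can be sketched in Python with the standard library function fnmatchcase. The sketch assumes that ? stands for exactly one character; the actual wildcard symbol varies between databases, so check the instructions of the database you are using. The function name and the word list are hypothetical.

```python
from fnmatch import fnmatchcase

def wildcard_matches(pattern, words):
    """Return the words that match the pattern, where '?' is one character."""
    return [word for word in words
            if fnmatchcase(word.lower(), pattern.lower())]

# Hypothetical index vocabulary with both spelling variants.
words = ["organisation", "organization", "organ", "organisations"]

print(wildcard_matches("organi?ation", words))
# ['organisation', 'organization']
```

Both spelling variants are retrieved with one search term. The plural form is not matched, because the wildcard replaces only the one varying character.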

Special cases of word search automation

Phrase search also works in Google.

Note that automatic search features such as truncation and wildcards often do not apply when the search term is typed within quotation marks (phrase search).

For example, UEF-Primo cannot handle Finnish inflection, which is why truncation should always be used in UEF-Primo. In addition, words within quotation marks cannot be truncated in UEF-Primo.