

A regular language is a set of strings that can be matched in a single pass through the text using only a fixed amount of memory. This subset suffices to describe all regular languages. The syntax described so far is most of the traditional Unix egrep regular expression syntax. Some examples: ab|cd is equivalent to (ab)|(cd) ab\ is equivalent to a(b\). Explicit parentheses can be used to force different meanings, as in arithmetic expressions. The operator precedence, from weakest to strongest binding, is first alternation, then concatenation, and finally the repetition operators. The metacharacters \, +, and ? are repetition operators: e 1 \ matches a sequence of zero or more (possibly different) strings, each of which match e 1 e 1 + matches one or more e 1 ? matches zero or one.

Two regular expressions can be altered or concatenated to form a new regular expression: if e 1 matches s and e 2 matches t, then e 1 | e 2 matches s or t, and e 1 e 2 matches st. To match a metacharacter, escape it with a backslash: \+ matches a literal plus character. Except for the metacharacters like \*+?()|, characters match themselves. The simplest regular expression is a single literal character. When a string is in the set described by a regular expression, we often say that the regular expression matches the string. Regular expressions are a notation for describing sets of character strings.
