Lugaru's Epsilon Programmer's Editor 14.06

Regular Expressions

Most of Epsilon's searching commands, described in Searching, take a simple string to search for. Epsilon provides a more powerful regular expression search facility, and a regular expression replace facility.

Instead of a simple search string, you provide a pattern, which describes a set of strings. Epsilon searches the buffer for an occurrence of one of the strings contained in the set. You can think of the pattern as generating a (possibly infinite) set of strings, and the regex search commands as looking in the buffer for the first occurrence of one of those strings.
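The "set of strings" view above can be sketched in a few lines of Python. This is only an illustration of the idea for a small finite set, not how Epsilon searches; the helper name first_occurrence is hypothetical.

```python
def first_occurrence(text, strings):
    """Return (position, string) for the earliest occurrence in text of
    any member of strings, or None if no member occurs."""
    best = None
    for s in strings:
        pos = text.find(s)
        if pos != -1 and (best is None or pos < best[0]):
            best = (pos, s)
    return best


buffer_text = "one xyz two abc three"

# The pattern abc|xyz denotes the set {"abc", "xyz"}.
result = first_occurrence(buffer_text, ["abc", "xyz"])
print(result)  # (4, 'xyz')
```

A real regex engine never enumerates the set, which may be infinite; it matches the pattern against the text directly.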

The following characters have special meaning in a regex search: vertical bar, parentheses, plus, star, question mark, square brackets, period, dollar, percent sign, left angle bracket ("<"), and caret ("^"). To match one of them literally, you must quote it; see Entering Special Characters. See the following sections for syntax details and additional examples.

 abc|def               Finds either abc or def.
 (abc)                 Finds abc.
 abc+                  Finds abc or abcc or abccc or ... .
 abc*                  Finds ab or abc or abcc or abccc or ... .
 abc?                  Finds ab or abc.
 [abcx-z]              Finds any single character of a, b, c, x, y, or z.
 [^abcx-z]             Finds any single character except a, b, c, x, y, or z.
 .                     Finds any single character except <Newline>.
 abc$                  Finds abc that occurs at the end of a line.
 ^abc                  Finds abc that occurs at the beginning of a line.
 %^abc                 Finds a literal ^abc.
 <Tab>                 Finds a <Tab> character.
 <#123>                Finds the character with ASCII code 123.
 <p:cyrillic>          Finds any character with that Unicode property.
 <alpha|1-5&!x-z>      Finds any alpha character other than x, y, or z, or any digit 1-5.
 <^c:*comment>printf   Finds uses of printf that aren't commented out.
 <h:0d 0a 45>          Finds the character sequence with those hexadecimal codes.
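
Several of the simpler notations in this table are written the same way in Python's re module, so you can try them outside the editor. The sketch below uses Python only as a stand-in: the % quoting and the <...> forms are Epsilon syntax with no direct equivalent here, and Python needs the re.MULTILINE flag before ^ and $ match at every line boundary.

```python
import re

text = "abc\nxabc\nzebra abc"

# ^abc: abc at the beginning of a line.
starts = [m.start() for m in re.finditer(r"^abc", text, re.MULTILINE)]

# abc$: abc at the end of a line.
ends = [m.start() for m in re.finditer(r"abc$", text, re.MULTILINE)]

# [abcx-z]: any single character of a, b, c, x, y, or z.
in_class = re.findall(r"[abcx-z]", "axe buzz")

# [^abcx-z]: any single character except those.
not_in_class = re.findall(r"[^abcx-z]", "axe buzz")

# . : any single character except a newline.
dots = re.findall(r".", "a\nb")

print(starts, ends)            # [0] [0, 5, 15]
print(in_class, not_in_class)  # ['a', 'x', 'b', 'z', 'z'] ['e', ' ', 'u']
print(dots)                    # ['a', 'b']
```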

Plain Patterns

In a regular expression, a string that does not contain any of the above characters denotes the set that contains precisely that one string. For example, the regular expression abc denotes the set that contains, as its only member, the string "abc". If you search for this regular expression, Epsilon will search for the string "abc", just as in a normal search.

Alternation

To include more than one string in the set, you can use the vertical bar character. For example, the regular expression abc|xyz denotes the set that contains the strings "abc" and "xyz". If you search for that pattern, Epsilon will find the first occurrence of either "abc" or "xyz". The alternation operator (|) always applies as widely as possible, limited only by grouping parentheses.
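
As a rough illustration, here is the same alternation tried in Python's re module, which writes this operator the same way. It is a sketch in another tool, not Epsilon itself; the sample text is made up.

```python
import re

text = "first xyz, then abc"

# The search stops at whichever alternative occurs first in the text.
m = re.search(r"abc|xyz", text)
print(m.group(), m.start())  # xyz 6

# Alternation applies as widely as possible: ab|cd means "ab" or "cd".
wide = re.fullmatch(r"ab|cd", "acd")

# Parentheses limit its scope: a(b|c)d means "abd" or "acd".
narrow = re.fullmatch(r"a(b|c)d", "acd")

print(wide is None, narrow is not None)  # True True
```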

Grouping

You can enclose any regular expression in parentheses, and the resulting expression refers to the same set. So searching for (abc|xyz) has the same effect as searching for abc|xyz, described under Alternation. You would use parentheses for grouping in conjunction with some of the operators described below.

Parentheses also let you retrieve specific portions of the match. In a regular expression replacement, for instance, the syntax #3 refers to the text matched by the third parenthesized group. The find_group() function gives EEL programmers similar access. The special syntax (?: ) provides grouping just like ( ), but isn't counted as a group when retrieving parts of the match in these ways.
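
The following Python sketch shows the same idea of numbered groups and an uncounted (?: ) group. Note the assumption: Python's re module writes group references in a replacement as \1, \2 rather than Epsilon's #1, #2, so this illustrates the concept and not Epsilon's replacement syntax.

```python
import re

# Two counted groups with an uncounted (?: ) group between them.
pattern = r"([a-z]+)(?:, )([a-z]+)"
text = "doe, jane"

m = re.search(pattern, text)
print(m.groups())  # ('doe', 'jane')

# Swap the two counted groups; the comma-and-space group has no number.
swapped = re.sub(pattern, r"\2 \1", text)
print(swapped)  # jane doe
```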

Concatenation

You can concatenate two regular expressions to form a new regular expression. Suppose the regular expressions p and q denote sets P and Q, respectively. Then the regular expression pq denotes the set of strings formed by taking a member of P and appending a member of Q. For example, suppose you concatenate the regular expressions (abc|xyz) and (def|ghi) to yield (abc|xyz)(def|ghi). As shown earlier, (abc|xyz) denotes the set that contains "abc" and "xyz"; likewise, (def|ghi) denotes the set that contains "def" and "ghi". Applying the rule, we see that (abc|xyz)(def|ghi) denotes the set that contains the following four strings: "abcdef", "abcghi", "xyzdef", "xyzghi".
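
The concatenation rule can be checked mechanically. This Python sketch builds the set by pairing every member of P with every member of Q, then confirms that each result matches the concatenated pattern; Python's re module is used only as a convenient stand-in for this part of the syntax.

```python
import re
from itertools import product

P = ["abc", "xyz"]  # strings denoted by (abc|xyz)
Q = ["def", "ghi"]  # strings denoted by (def|ghi)

# Every member of P followed by every member of Q.
concatenated = [p + q for p, q in product(P, Q)]
print(concatenated)  # ['abcdef', 'abcghi', 'xyzdef', 'xyzghi']

# Each of the four strings matches the concatenated pattern in full.
all_match = all(re.fullmatch(r"(abc|xyz)(def|ghi)", s) for s in concatenated)
print(all_match)  # True
```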

Closure

Clearly, any regular expression must have finite length; otherwise you couldn't type it in. But because of the closure operators, the set to which the regular expression refers may contain an infinite number of strings. If you append a plus sign to a parenthesized regular expression, the resulting expression denotes the set of strings formed by concatenating one or more strings matched by the parenthesized expression. For example, the regular expression (ab)+ refers to the set that contains "ab", "abab", "ababab", "abababab", and so on. Star works similarly, except that it denotes zero or more such repetitions, so its set also includes the empty string.
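
Here is a small Python sketch of the difference between plus and star, again using the re module as a stand-in since it writes these operators the same way. It lists only the first few members of the infinite set.

```python
import re

plus = re.compile(r"(ab)+")
star = re.compile(r"(ab)*")

# The first few members of the set denoted by (ab)+.
members = ["ab" * n for n in range(1, 5)]
print(members)  # ['ab', 'abab', 'ababab', 'abababab']

plus_ok = all(plus.fullmatch(s) for s in members)
star_ok = all(star.fullmatch(s) for s in members)

# Only star accepts zero repetitions, that is, the empty string.
plus_empty = plus.fullmatch("") is not None
star_empty = star.fullmatch("") is not None
print(plus_ok, star_ok, plus_empty, star_empty)  # True True False True
```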

Optionality

You can use the question-mark operator in the same places you might put a star or a plus. If you append a question mark to a parenthesized regular expression, the resulting expression denotes the set that contains every string matched by the parenthesized expression, plus the empty string. You would typically use the question-mark operator to mark a subpart of the search string as optional.

You can also apply the plus, star, and question-mark operators to subexpressions and to unparenthesized items. These operators always apply to the smallest possible expression to their left. For example, the regular expression abc+ refers to the set that contains "abc", "abcc", "abccc", "abcccc", and so on. The expression a(bc)*d refers to the set that contains "ad", "abcd", "abcbcd", "abcbcbcd", and so on. The expression a(b?c)*d denotes the set that contains all strings that start with "a" and end with "d", with the inside consisting of any number of the letter "c", each optionally preceded by "b". The set includes such strings as "ad", "acd", "abcd", "abccccbcd".
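
To see how far each operator reaches, you can test candidate strings against the examples above. The Python sketch below does that with the re module, which binds these operators the same way; the helper name matches is hypothetical.

```python
import re

def matches(pattern, candidates):
    """Return the candidates that the pattern matches in full."""
    return [s for s in candidates if re.fullmatch(pattern, s)]

# Without parentheses, plus applies only to the "c" just before it.
print(matches(r"abc+", ["ab", "abc", "abcc", "abcabc"]))    # ['abc', 'abcc']

# With parentheses, plus applies to the whole group.
print(matches(r"(abc)+", ["ab", "abc", "abcc", "abcabc"]))  # ['abc', 'abcabc']

# The question mark makes only the "c" optional.
print(matches(r"abc?", ["a", "ab", "abc"]))                 # ['ab', 'abc']

print(matches(r"a(bc)*d", ["ad", "abcd", "abcbcd", "abd"]))
# ['ad', 'abcd', 'abcbcd']

print(matches(r"a(b?c)*d", ["ad", "acd", "abcd", "abccccbcd", "abbcd"]))
# ['ad', 'acd', 'abcd', 'abccccbcd']
```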

Subtopics:

Entering Special Characters
Character Classes
Regular Expression Examples
Searching Rules
Regular Expression Assertions
Regular Expression Commands





Lugaru Epsilon Programmer's Editor 14.06 manual. Copyright (C) 1984, 2024 by Lugaru Software Ltd. All rights reserved.