Lugaru's Epsilon Programmer's Editor 14.04
Regular Expression Examples
- The pattern
if|else|for|do|while|switch specifies
six of the statement keywords in C and EEL.
- The pattern
c[ad]+r specifies strings like "car",
"cdr", "caadr", "caaadar". These correspond to
compositions of the car and cdr Lisp operations.
- The pattern
c[ad][ad]?[ad]?[ad]?r specifies the strings
that name from one to four composed car and cdr operations in Lisp,
such as "car" and "cadadr".
- The pattern
[a-zA-Z]+ specifies the set of all sequences
of 1 or more letters. The character class part denotes
any upper- or lower-case letter, and the plus operator specifies
one or more of those.
Epsilon's commands to move by words accomplish their task by
performing a regular expression search. They use a pattern similar
to [a-zA-Z0-9_]+, which specifies one or more letters, digits,
or underscore characters. (The actual pattern includes national
characters as well.)
- The pattern
(<Newline>|<Return>|<Tab>|<Space>)+
specifies nonempty sequences of the whitespace characters newline,
return, tab, and space. You could also write this pattern as
<Newline|Return|Tab|Space>+ or as
<Wspace|Return>+, using a character class name.
- The pattern
/%*.*%*/ specifies a set that includes all 1-line
C-language comments. The percent character quotes the first and
third stars, so they refer to the star character itself. The middle
star applies to the period, denoting zero or more occurrences of any
character other than newline. Taken together, then, the pattern
denotes the set of strings that begin with "slash star", followed
by any number of non-newline characters, followed by "star slash".
You can also write this pattern as /<Star>.*<Star>/.
- The pattern
/%*(.|<Newline>)*%*/ looks like the
previous pattern, except that instead of ".", we have
(.|<Newline>). So instead of "any character except
newline", we have "any character except newline, or newline", or
more simply, "any character at all". This set includes all C
comments, with or without newlines in them. You could also write
this as /%*<Any>*%*/ instead.
- The pattern
<^digit|a-f> matches any character
except one of these: 0123456789abcdef.
- The pattern
<alpha&!r&!x-z&!p:softdotted> matches
all Latin letters except R, X, Y, Z, I, and J. I and J are excluded
because they have the Unicode property SoftDotted, which marks a
character whose dot can be replaced by an accent. The pattern also
matches all non-Latin Unicode letters that lack this property.
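For readers who know other regex dialects, here is a rough translation of most of the patterns above into Python's re module. It is a sketch, not part of Epsilon: the names (KEYWORDS, in_set, and so on) are illustrative. Epsilon quotes a special character with % where Python uses a backslash, and writes named characters such as <Newline> where Python writes \n. The word pattern is plain ASCII, unlike Epsilon's actual pattern, which also covers national characters, and the SoftDotted example is left out because Python's standard re module has no syntax for Unicode properties. The helper in_set uses fullmatch to ask whether a whole string belongs to a pattern's set.

```python
import re

# Approximate Python translations of the Epsilon patterns above.
# Epsilon quotes special characters with %, Python with a backslash.
KEYWORDS = re.compile(r"if|else|for|do|while|switch")
CAR_CDR = re.compile(r"c[ad]+r")
CAR_CDR_UP_TO_4 = re.compile(r"c[ad][ad]?[ad]?[ad]?r")  # same set as c[ad]{1,4}r
LETTERS = re.compile(r"[a-zA-Z]+")
WORD = re.compile(r"[a-zA-Z0-9_]+")          # ASCII only; Epsilon also includes national characters
WHITESPACE = re.compile(r"(\n|\r|\t| )+")    # (<Newline>|<Return>|<Tab>|<Space>)+
ONE_LINE_COMMENT = re.compile(r"/\*.*\*/")   # /%*.*%*/ ; "." never matches a newline
ANY_COMMENT = re.compile(r"/\*(.|\n)*\*/")   # /%*(.|<Newline>)*%*/
ANY_COMMENT_DOTALL = re.compile(r"/\*.*\*/", re.DOTALL)  # like /%*<Any>*%*/
NOT_HEX_DIGIT = re.compile(r"[^0-9a-f]")     # <^digit|a-f>


def in_set(pattern, s):
    """True if the whole string s belongs to the set the pattern specifies."""
    return pattern.fullmatch(s) is not None


print(in_set(CAR_CDR, "caaadar"))                    # True
print(in_set(ONE_LINE_COMMENT, "/* two\nlines */"))  # False
print(in_set(ANY_COMMENT, "/* two\nlines */"))       # True
```

Membership and searching are different questions. KEYWORDS.search("double") finds "do" inside the word, and because .* is greedy, ONE_LINE_COMMENT.search("/* a */ x = 1; /* b */") matches from the first "/*" through the last "*/". That fits the wording above: the comment patterns specify sets that include all comments, not sets that contain only single comments.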
An advanced example
Let's build a regular expression that specifies precisely the set of
legal strings in the C programming language. All C strings begin and end
with double quote characters. The inside of the string denotes a
sequence of characters. Most characters stand for themselves, but
newline, double quote, and backslash must appear after a "quoting"
backslash. Any other character may appear after a backslash as well.
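Before building the pattern, it can help to see the same rules checked directly by a single scan over the string, with no regular expression at all. This is a different technique from the one the section develops, offered only for comparison; is_c_string_by_hand is a hypothetical name. The sketch treats the last character as the closing quote, so any other unquoted double quote, or an unquoted newline, makes the string illegal.

```python
def is_c_string_by_hand(s):
    """Check the C string rules directly, without a regular expression.

    Hypothetical helper for illustration only.
    """
    # Must begin and end with a double quote.
    if len(s) < 2 or s[0] != '"' or s[-1] != '"':
        return False
    i = 1
    end = len(s) - 1          # index of the closing quote
    while i < end:
        c = s[i]
        if c == "\\":
            # A quoting backslash may precede any character, even a newline,
            # but it must not swallow the closing quote.
            if i + 1 >= end:
                return False
            i += 2
        elif c in '\n"':
            return False      # newline or double quote without a quoting backslash
        else:
            i += 1            # an ordinary character stands for itself
    return True


print(is_c_string_by_hand('"say \\"hi\\""'))  # True
print(is_c_string_by_hand('"a\nb"'))          # False
```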
We want to construct a pattern that generates the set of all
possible C strings. To capture the idea that the pattern must begin
and end with a double quote, we begin by writing
"something"
We still have to write the something part, to generate
the inside of the C strings. We said that the inside of a C string
consists of a sequence of characters. The star operator means
"zero or more of something". That looks promising, so we write
"(something)*"
Now we need to come up with a something part that
stands for an individual character in a C string. Recall that
characters other than newline, double quote, and backslash stand for
themselves. The pattern <^Newline|"|\> captures
precisely those characters. In a C string, a "quoting" backslash
must precede the special characters (newline, double quote, and
backslash). In fact, a backslash may precede any character in a C
string. The pattern \(.|<Newline>) means
precisely "backslash followed by any character". Putting those
together with the alternation operator (|), we get the pattern
<^Newline|"|\>|\(.|<Newline>) which
generates either a single "normal" character or any character
preceded by a backslash. Substituting this pattern for the
something yields
"(<^Newline|"|\>|\(.|<Newline>))*"
which represents precisely the set of legal C strings.
In fact, if you type this pattern into a regex-search command
(described below), Epsilon will find the next C string in the
buffer.
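As a rough cross-check, the finished pattern can be transcribed into Python's re syntax. This is a sketch under stated assumptions: NORMAL_CHAR, ESCAPED_CHAR, is_c_string, and next_c_string are hypothetical names, and next_c_string only loosely imitates a forward search from a position in a buffer; it is not Epsilon's regex-search command.

```python
import re

# Python translation of "(<^Newline|"|\>|\(.|<Newline>))*".
# Epsilon's <^Newline|"|\> becomes the negated class [^\n"\\];
# Epsilon's \(.|<Newline>) becomes \\(?:.|\n), a backslash then any character.
NORMAL_CHAR = r'[^\n"\\]'
ESCAPED_CHAR = r'\\(?:.|\n)'
C_STRING = re.compile('"(?:' + NORMAL_CHAR + "|" + ESCAPED_CHAR + ')*"')


def is_c_string(s):
    """True if all of s is one legal C string."""
    return C_STRING.fullmatch(s) is not None


def next_c_string(buffer, point=0):
    """Span (start, end) of the first C string at or after point, or None.

    Hypothetical helper that loosely imitates a forward regex search.
    """
    m = C_STRING.search(buffer, point)
    return m.span() if m else None


source = 'printf("x = %d\\n", x); puts("done");'
start, end = next_c_string(source)
print(source[start:end])   # "x = %d\n"
start, end = next_c_string(source, end)
print(source[start:end])   # "done"
```

Leaving the backslash out of the normal-character class is what rejects an input such as "ab\" (quote, a, b, backslash, quote): if the backslash could stand for itself, the pattern could treat the final quote as the closing one and accept a string that C considers unterminated. Like any forward search, next_c_string can go wrong if it starts inside a string: starting just after the opening quote of "x = %d\n" pairs that string's closing quote with the opening quote of "done".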
Epsilon Programmer's Editor 14.04 manual. Copyright (C) 1984, 2021 by Lugaru Software Ltd. All rights reserved.