Unicode Features
This section explains how to use Epsilon to edit text containing
non-English characters such as ê or å.
Epsilon supports Unicode, as well as many 8-bit national character
sets such as ISO 8859-1 (Latin 1).
In Unix, Unicode support is only available when Epsilon runs under
X11, and when a font using the iso10646 (Unicode) character set is in
use. See https://www.lugaru.com/links.html#unicode for Unicode
font sources. Epsilon includes a shell script named
get_unicode_core_x11_fonts that can install Unicode-based fonts in
various sizes. By default it's in /opt/epsilon14.00/bin on
Linux and FreeBSD, and /Applications/Epsilon 14.00.app/Contents/Resources
on macOS. Under Unix, Epsilon
displays all characters using a glyph width determined by
the widest character in the font.
Epsilon for Windows shows characters in a font using their specified
individual widths, but only in a full-width window. If you instead
create side-by-side windows, Epsilon will ignore the special width
rules of zero-width characters and extra-wide characters, among other
things, putting each character into a same-width cell. Text with such
characters should be edited in full-width windows for the best
display. (See the change-show-spaces command to make
zero-width characters visible.)
To enable Unicode display in Epsilon for Windows Console, see the
console-ansi-font variable. Also see DOS/OEM Character Set Support for more
information on the DOS/OEM encoding used by default in the Windows
Console version.
In this release, Epsilon doesn't display Unicode characters outside the
basic multilingual plane (BMP), or include any of the special
processing needed to handle complex scripts, such as scripts written
right-to-left. Each character outside the BMP is handled as a pair of
surrogate characters. While Epsilon cannot display their glyphs, the
show-point command will report the Unicode character name of
the one at point.
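As an illustration of the surrogate-pair representation mentioned above, here is a minimal Python sketch of the standard UTF-16 arithmetic that splits a character outside the BMP into two 16-bit units. It is not Epsilon code; the function name to_surrogate_pair is hypothetical, and the name lookup uses Python's unicodedata module only to mimic the kind of character name a command like show-point might report.

```python
import unicodedata
from typing import Tuple


def to_surrogate_pair(code_point: int) -> Tuple[int, int]:
    """Split a code point above U+FFFF into UTF-16 high and low surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point is not outside the BMP")
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits of the 20-bit offset
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits of the offset
    return high, low


high, low = to_surrogate_pair(0x1F600)
print(hex(high), hex(low))               # 0xd83d 0xde00
print(unicodedata.name(chr(0x1F600)))    # GRINNING FACE
```

Each such character therefore occupies two 16-bit positions in a buffer, which is why it behaves as a pair rather than as a single character.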
Epsilon knows how to translate between its native Unicode format and
dozens of encodings and character sets (such as UTF-8, ISO-8859-4, or
KOI-8).
Epsilon autodetects the encoding for files that start with a Unicode
signature ("byte order mark"), and for many files that use the UTF-8
encoding. To force translation from a particular encoding, provide a
numeric argument to a file reading command like
find-file. Epsilon will then prompt for the name of the
encoding to use. Press "?" when prompted for an encoding to see a
list of available encodings. The special encoding "raw" reads and
writes 8-bit data without any character set translation.
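The general idea behind signature-based detection can be sketched in Python as follows. This is only an illustration under stated assumptions, not Epsilon's actual algorithm: the function name guess_encoding and its fallback parameter are hypothetical, only the UTF-8 and UTF-16 signatures are checked, and "valid UTF-8" is used as a simple stand-in for whatever heuristics Epsilon applies to files without a signature.

```python
import codecs

# Byte order marks to look for at the start of the data (UTF-8 and UTF-16 only).
_SIGNATURES = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def guess_encoding(data: bytes, fallback: str = "latin-1") -> str:
    """Guess an encoding from a signature, then from UTF-8 validity."""
    for bom, name in _SIGNATURES:
        if data.startswith(bom):
            return name
    try:
        data.decode("utf-8")   # no signature: accept if it is valid UTF-8
        return "utf-8"
    except UnicodeDecodeError:
        return fallback        # hypothetical default when nothing matches


print(guess_encoding(b"\xff\xfeh\x00i\x00"))   # utf-16-le
print(guess_encoding(b"caf\xe9"))              # latin-1
```

A heuristic like this can guess wrong for short or ambiguous files, which is the situation where forcing a particular encoding with a numeric argument is useful.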
Epsilon uses the buffer's current encoding when writing or rereading a file.
Use the set-encoding command to set the buffer's encoding.
The unicode-convert-from-encoding command makes Epsilon translate an
8-bit buffer in a certain encoding to its 16-bit Unicode version. The
unicode-convert-to-encoding command does the reverse.
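Conceptually, these two commands correspond to decoding bytes into Unicode text and encoding Unicode text back into bytes. The Python sketch below shows that round trip using Python's own codecs; the function names are hypothetical, and the encoding names are Python's, which may not match the names Epsilon uses.

```python
def convert_from_encoding(data: bytes, encoding: str) -> str:
    """Interpret 8-bit data in the given encoding as Unicode text."""
    return data.decode(encoding)


def convert_to_encoding(text: str, encoding: str) -> bytes:
    """Translate Unicode text into 8-bit data in the given encoding."""
    return text.encode(encoding)


latin1_bytes = b"f\xeate"   # the word "fete" with e-circumflex, in ISO 8859-1
text = convert_from_encoding(latin1_bytes, "iso8859-1")
print(convert_to_encoding(text, "utf-8"))   # b'f\xc3\xaate'
```

The same text produces different byte sequences in different encodings, so converting with the wrong encoding name yields the wrong characters rather than an error in many cases.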
You can add a large set of additional converters to Epsilon by
downloading a file. Mostly these converters add support for various
Far East languages and for EBCDIC conversions. See
https://www.lugaru.com/encodings.html for details.
Internally, buffers with no character codes outside the range 0-255
are stored with 8 bits per character; other buffers are stored with 16
bits per character. Epsilon automatically converts formats as needed.
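The storage rule described above can be expressed as a small Python sketch. It only models the decision stated in the text (all character codes in the range 0-255 means 8 bits, anything else means 16 bits); the function name bits_per_character is hypothetical and nothing here reflects Epsilon's internal data structures.

```python
def bits_per_character(text: str) -> int:
    """Return 8 if every character code is in 0-255, otherwise 16."""
    return 8 if all(ord(ch) <= 0xFF for ch in text) else 16


print(bits_per_character("f\u00eate"))        # 8: e-circumflex is code 234
print(bits_per_character("\u0416"))           # 16: Cyrillic letter, code 1046
```

Under this rule, inserting a single character above code 255 into an otherwise 8-bit buffer is what triggers the automatic switch to the 16-bit format.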
The detect-encodings variable controls whether Epsilon tries to
autodetect certain UTF-8 and UTF-16 files. The
default-read-encoding variable says which encoding to use when
autodetecting doesn't select an encoding. The
default-write-encoding variable sets which encoding Epsilon
uses to save a file with 16-bit characters and no specified encoding,
in a context where prompting wouldn't be appropriate, such as when
auto-saving.
See the insert-ascii command in Inserting and Deleting
to type arbitrary Unicode characters, and the show-point
command to see what specific characters are present (if the current
font doesn't make that clear enough).
Epsilon Programmer's Editor 14.00 manual. Copyright (C) 1984, 2020 by Lugaru Software Ltd. All rights reserved.