summaryrefslogtreecommitdiff
path: root/doc/context/sources/general/manuals/luametatex/luametatex-languages.tex
diff options
context:
space:
mode:
authorHans Hagen <pragma@wxs.nl>2020-01-26 19:35:43 +0100
committerContext Git Mirror Bot <phg@phi-gamma.net>2020-01-26 19:35:43 +0100
commit43fc66771a0c9d27cc0b7fe7a69392ea313bd0ca (patch)
tree9b339c63cd28528e5062fe980e964808df619374 /doc/context/sources/general/manuals/luametatex/luametatex-languages.tex
parent5189b2143a30a39cd3533569cbef3f06422cc1d9 (diff)
downloadcontext-43fc66771a0c9d27cc0b7fe7a69392ea313bd0ca.tar.gz
2020-01-26 18:37:00
Diffstat (limited to 'doc/context/sources/general/manuals/luametatex/luametatex-languages.tex')
-rw-r--r--doc/context/sources/general/manuals/luametatex/luametatex-languages.tex1113
1 files changed, 1113 insertions, 0 deletions
diff --git a/doc/context/sources/general/manuals/luametatex/luametatex-languages.tex b/doc/context/sources/general/manuals/luametatex/luametatex-languages.tex
new file mode 100644
index 000000000..19112a7f1
--- /dev/null
+++ b/doc/context/sources/general/manuals/luametatex/luametatex-languages.tex
@@ -0,0 +1,1113 @@
+% language=uk
+
+\environment luametatex-style
+
+\startcomponent luametatex-languages
+
+\startchapter[reference=languages,title={Languages, characters, fonts and glyphs}]
+
+\startsection[title={Introduction}]
+
+\topicindex {languages}
+
+\LUATEX's internal handling of the characters and glyphs that eventually become
+typeset is quite different from the way \TEX82 handles those same objects. The
+easiest way to explain the difference is to focus on unrestricted horizontal mode
+(i.e.\ paragraphs) and hyphenation first. Later on, it will be easy to deal
+with the differences that occur in horizontal and math modes.
+
+In \TEX82, the characters you type are converted into \type {char} node records
+when they are encountered by the main control loop. \TEX\ attaches and processes
+the font information while creating those records, so that the resulting \quote
+{horizontal list} contains the final forms of ligatures and implicit kerning.
+This packaging is needed because we may want to get the effective width of for
+instance a horizontal box.
+
+When it becomes necessary to hyphenate words in a paragraph, \TEX\ converts (one
+word at time) the \type {char} node records into a string by replacing ligatures
+with their components and ignoring the kerning. Then it runs the hyphenation
+algorithm on this string, and converts the hyphenated result back into a \quote
+{horizontal list} that is consecutively spliced back into the paragraph stream.
+Keep in mind that the paragraph may contain unboxed horizontal material, which
+then already contains ligatures and kerns and the words therein are part of the
+hyphenation process.
+
+Those \type {char} node records are somewhat misnamed, as they are glyph
+positions in specific fonts, and therefore not really \quote {characters} in the
+linguistic sense. There is no language information inside the \type {char} node
+records at all. Instead, language information is passed along using \type
+{language whatsit} nodes inside the horizontal list.
+
+In \LUATEX, the situation is quite different. The characters you type are always
+converted into \nod {glyph} node records with a special subtype to identify them
+as being intended as linguistic characters. \LUATEX\ stores the needed language
+information in those records, but does not do any font|-|related processing at
+the time of node creation. It only stores the index of the current font and a
+reference to a character in that font.
+
+When it becomes necessary to typeset a paragraph, \LUATEX\ first inserts all
+hyphenation points right into the whole node list. Next, it processes all the
+font information in the whole list (creating ligatures and adjusting kerning),
+and finally it adjusts all the subtype identifiers so that the records are \quote
+{glyph nodes} from now on.
+
+\stopsection
+
+\startsection[title={Characters, glyphs and discretionaries},reference=charsandglyphs]
+
+\topicindex {characters}
+\topicindex {glyphs}
+\topicindex {hyphenation}
+
+\TEX82 (including \PDFTEX) differentiates between \type {char} nodes and \type
+{lig} nodes. The former are simple items that contained nothing but a \quote
+{character} and a \quote {font} field, and they lived in the same memory as
+tokens did. The latter also contained a list of components, and a subtype
+indicating whether this ligature was the result of a word boundary, and it was
+stored in the same place as other nodes like boxes and kerns and glues.
+
+In \LUATEX, these two types are merged into one, somewhat larger structure called
+a \nod {glyph} node. Besides having the old character, font, and component
+fields there are a few more, like \quote {attr} that we will see in \in {section}
+[glyphnodes], these nodes also contain a subtype, that codes four main types and
+two additional ghost types. For ligatures, multiple bits can be set at the same
+time (in case of a single|-|glyph word).
+
+\startitemize
+ \startitem
+ \type {character}, for characters to be hyphenated: the lowest bit
+ (bit 0) is set to 1.
+ \stopitem
+ \startitem
+ \nod {glyph}, for specific font glyphs: the lowest bit (bit 0) is
+ not set.
+ \stopitem
+ \startitem
+ \type {ligature}, for constructed ligatures bit 1 is set.
+ \stopitem
+ \startitem
+ \type {ghost}, for so called \quote {ghost objects} bit 2 is set.
+ \stopitem
+ \startitem
+ \type {left}, for ligatures created from a left word boundary and for
+ ghosts created from \lpr {leftghost} bit 3 gets set.
+ \stopitem
+ \startitem
+ \type {right}, for ligatures created from a right word boundary and
+ for ghosts created from \lpr {rightghost} bit 4 is set.
+ \stopitem
+\stopitemize
+
+The \nod {glyph} nodes also contain language data, split into four items that
+were current when the node was created: the \prm {setlanguage} (15~bits), \prm
+{lefthyphenmin} (8~bits), \prm {righthyphenmin} (8~bits), and \prm {uchyph}
+(1~bit).
+
+Incidentally, \LUATEX\ allows 16383 separate languages, and words can be 256
+characters long. The language is stored with each character. You can set
+\prm {firstvalidlanguage} to for instance~1 and make thereby language~0
+an ignored hyphenation language.
+
+The new primitive \lpr {hyphenationmin} can be used to signal the minimal length
+of a word. This value is stored with the (current) language.
+
+Because the \prm {uchyph} value is saved in the actual nodes, its handling is
+subtly different from \TEX82: changes to \prm {uchyph} become effective
+immediately, not at the end of the current partial paragraph.
+
+Typeset boxes now always have their language information embedded in the nodes
+themselves, so there is no longer a possible dependency on the surrounding
+language settings. In \TEX82, a mid|-|paragraph statement like \type {\unhbox0}
+would process the box using the current paragraph language unless there was a
+\prm {setlanguage} issued inside the box. In \LUATEX, all language variables
+are already frozen.
+
+In traditional \TEX\ the process of hyphenation is driven by \type {lccode}s. In
+\LUATEX\ we made this dependency less strong. There are several strategies
+possible. When you do nothing, the currently used \type {lccode}s are used, when
+loading patterns, setting exceptions or hyphenating a list.
+
+When you set \prm {savinghyphcodes} to a value greater than zero the current set
+of \type {lccode}s will be saved with the language. In that case changing a \type
+{lccode} afterwards has no effect. However, you can adapt the set with:
+
+\starttyping
+\hjcode`a=`a
+\stoptyping
+
+This change is global which makes sense if you keep in mind that the moment that
+hyphenation happens is (normally) when the paragraph or a horizontal box is
+constructed. When \prm {savinghyphcodes} was zero when the language got
+initialized you start out with nothing, otherwise you already have a set.
+
+When a \lpr {hjcode} is greater than 0 but less than 32 is indicates the
+to be used length. In the following example we map a character (\type {x}) onto
+another one in the patterns and tell the engine that \type {œ} counts as one
+character. Because traditionally zero itself is reserved for inhibiting
+hyphenation, a value of 32 counts as zero.
+
+Here are some examples (we assume that French patterns are used):
+
+\starttabulate[||||]
+\NC \NC \type{foobar} \NC \type{foo-bar} \NC \NR
+\NC \type{\hjcode`x=`o} \NC \type{fxxbar} \NC \type{fxx-bar} \NC \NR
+\NC \type{\lefthyphenmin3} \NC \type{œdipus} \NC \type{œdi-pus} \NC \NR
+\NC \type{\lefthyphenmin4} \NC \type{œdipus} \NC \type{œdipus} \NC \NR
+\NC \type{\hjcode`œ=2} \NC \type{œdipus} \NC \type{œdi-pus} \NC \NR
+\NC \type{\hjcode`i=32 \hjcode`d=32} \NC \type{œdipus} \NC \type{œdipus} \NC \NR
+\NC
+\stoptabulate
+
+Carrying all this information with each glyph would give too much overhead and
+also make the process of setting up these codes more complex. A solution with
+\type {hjcode} sets was considered but rejected because in practice the current
+approach is sufficient and it would not be compatible anyway.
+
+Beware: the values are always saved in the format, independent of the setting
+of \prm {savinghyphcodes} at the moment the format is dumped.
+
+A boundary node normally would mark the end of a word which interferes with for
+instance discretionary injection. For this you can use the \prm {wordboundary}
+as a trigger. Here are a few examples of usage:
+
+\startbuffer
+ discrete---discrete
+\stopbuffer
+\typebuffer \startnarrower \dontcomplain \hsize 1pt \getbuffer \par \stopnarrower
+\startbuffer
+ discrete\discretionary{}{}{---}discrete
+\stopbuffer
+\typebuffer \startnarrower \dontcomplain \hsize 1pt \getbuffer \par \stopnarrower
+\startbuffer
+ discrete\wordboundary\discretionary{}{}{---}discrete
+\stopbuffer
+\typebuffer \startnarrower \dontcomplain \hsize 1pt \getbuffer \par \stopnarrower
+\startbuffer
+ discrete\wordboundary\discretionary{}{}{---}\wordboundary discrete
+\stopbuffer
+\typebuffer \startnarrower \dontcomplain \hsize 1pt \getbuffer \par \stopnarrower
+\startbuffer
+ discrete\wordboundary\discretionary{---}{}{}\wordboundary discrete
+\stopbuffer
+\typebuffer \startnarrower \dontcomplain \hsize 1pt \getbuffer \par \stopnarrower
+
+We only accept an explicit hyphen when there is a preceding glyph and we skip a
+sequence of explicit hyphens since that normally indicates a \type {--} or \type
+{---} ligature in which case we can in a worse case usage get bad node lists
+later on due to messed up ligature building as these dashes are ligatures in base
+fonts. This is a side effect of separating the hyphenation, ligaturing and
+kerning steps.
+
+The start and end of a sequence of characters is signalled by a \nod {glue}, \nod
+{penalty}, \nod {kern} or \nod {boundary} node. But by default also a \nod
+{hlist}, \nod {vlist}, \nod {rule}, \nod {dir}, \nod {whatsit}, \nod {ins}, and
+\nod {adjust} node indicate a start or end. You can omit the last set from the
+test by setting \lpr {hyphenationbounds} to a non|-|zero value:
+
+\starttabulate[|c|l|]
+\DB value \BC behaviour \NC \NR
+\TB
+\NC \type{0} \NC not strict \NC \NR
+\NC \type{1} \NC strict start \NC \NR
+\NC \type{2} \NC strict end \NC \NR
+\NC \type{3} \NC strict start and strict end \NC \NR
+\LL
+\stoptabulate
+
+The word start is determined as follows:
+
+\starttabulate[|l|l|]
+\DB node \BC behaviour \NC \NR
+\TB
+\BC boundary \NC yes when wordboundary \NC \NR
+\BC hlist \NC when hyphenationbounds 1 or 3 \NC \NR
+\BC vlist \NC when hyphenationbounds 1 or 3 \NC \NR
+\BC rule \NC when hyphenationbounds 1 or 3 \NC \NR
+\BC dir \NC when hyphenationbounds 1 or 3 \NC \NR
+\BC whatsit \NC when hyphenationbounds 1 or 3 \NC \NR
+\BC glue \NC yes \NC \NR
+\BC math \NC skipped \NC \NR
+\BC glyph \NC exhyphenchar (one only) : yes (so no -- ---) \NC \NR
+\BC otherwise \NC yes \NC \NR
+\LL
+\stoptabulate
+
+The word end is determined as follows:
+
+\starttabulate[|l|l|]
+\DB node \BC behaviour \NC \NR
+\TB
+\BC boundary \NC yes \NC \NR
+\BC glyph \NC yes when different language \NC \NR
+\BC glue \NC yes \NC \NR
+\BC penalty \NC yes \NC \NR
+\BC kern \NC yes when not italic (for some historic reason) \NC \NR
+\BC hlist \NC when hyphenationbounds 2 or 3 \NC \NR
+\BC vlist \NC when hyphenationbounds 2 or 3 \NC \NR
+\BC rule \NC when hyphenationbounds 2 or 3 \NC \NR
+\BC dir \NC when hyphenationbounds 2 or 3 \NC \NR
+\BC whatsit \NC when hyphenationbounds 2 or 3 \NC \NR
+\BC ins \NC when hyphenationbounds 2 or 3 \NC \NR
+\BC adjust \NC when hyphenationbounds 2 or 3 \NC \NR
+\LL
+\stoptabulate
+
+\in {Figures} [hb:1] upto \in [hb:5] show some examples. In all cases we set the
+min values to 1 and make sure that the words hyphenate at each character.
+
+\hyphenation{o-n-e t-w-o}
+
+\def\SomeTest#1#2%
+ {\lefthyphenmin \plusone
+ \righthyphenmin \plusone
+ \parindent \zeropoint
+ \everypar \emptytoks
+ \dontcomplain
+ \hbox to 2cm {%
+ \vtop {%
+ \hsize 1pt
+ \hyphenationbounds#1
+ #2
+ \par}}}
+
+\startplacefigure[reference=hb:1,title={\type{one}}]
+ \startcombination[4*1]
+ {\SomeTest{0}{one}} {\type{0}}
+ {\SomeTest{1}{one}} {\type{1}}
+ {\SomeTest{2}{one}} {\type{2}}
+ {\SomeTest{3}{one}} {\type{3}}
+ \stopcombination
+\stopplacefigure
+
+\startplacefigure[reference=hb:2,title={\type{one\null two}}]
+ \startcombination[4*1]
+ {\SomeTest{0}{one\null two}} {\type{0}}
+ {\SomeTest{1}{one\null two}} {\type{1}}
+ {\SomeTest{2}{one\null two}} {\type{2}}
+ {\SomeTest{3}{one\null two}} {\type{3}}
+ \stopcombination
+\stopplacefigure
+
+\startplacefigure[reference=hb:3,title={\type{\null one\null two}}]
+ \startcombination[4*1]
+ {\SomeTest{0}{\null one\null two}} {\type{0}}
+ {\SomeTest{1}{\null one\null two}} {\type{1}}
+ {\SomeTest{2}{\null one\null two}} {\type{2}}
+ {\SomeTest{3}{\null one\null two}} {\type{3}}
+ \stopcombination
+\stopplacefigure
+
+\startplacefigure[reference=hb:4,title={\type{one\null two\null}}]
+ \startcombination[4*1]
+ {\SomeTest{0}{one\null two\null}} {\type{0}}
+ {\SomeTest{1}{one\null two\null}} {\type{1}}
+ {\SomeTest{2}{one\null two\null}} {\type{2}}
+ {\SomeTest{3}{one\null two\null}} {\type{3}}
+ \stopcombination
+\stopplacefigure
+
+\startplacefigure[reference=hb:5,title={\type{\null one\null two\null}}]
+ \startcombination[4*1]
+ {\SomeTest{0}{\null one\null two\null}} {\type{0}}
+ {\SomeTest{1}{\null one\null two\null}} {\type{1}}
+ {\SomeTest{2}{\null one\null two\null}} {\type{2}}
+ {\SomeTest{3}{\null one\null two\null}} {\type{3}}
+ \stopcombination
+\stopplacefigure
+
+% (Future versions of \LUATEX\ might provide more granularity.)
+
+In traditional \TEX\ ligature building and hyphenation are interwoven with the
+line break mechanism. In \LUATEX\ these phases are isolated. As a consequence we
+deal differently with (a sequence of) explicit hyphens. We already have added
+some control over aspects of the hyphenation and yet another one concerns
+automatic hyphens (e.g.\ \type {-} characters in the input).
+
+When \lpr {automatichyphenmode} has a value of 0, a hyphen will be turned into
+an automatic discretionary. The snippets before and after it will not be
+hyphenated. A side effect is that a leading hyphen can lead to a split but one
+will seldom run into that situation. Setting a pre and post character makes this
+more prominent. A value of 1 will prevent this side effect and a value of 2 will
+not turn the hyphen into a discretionary. Experiments with other options, like
+permitting hyphenation of the words on both sides were discarded.
+
+\startbuffer[a]
+before-after \par
+before--after \par
+before---after \par
+\stopbuffer
+
+\startbuffer[b]
+-before \par
+after- \par
+--before \par
+after-- \par
+---before \par
+after--- \par
+\stopbuffer
+
+\startbuffer[c]
+before-after \par
+before--after \par
+before---after \par
+\stopbuffer
+
+\startbuffer[demo]
+\startcombination[nx=4,ny=3,location=top]
+ {\framed[align=normal,strut=no,top=\vskip.5ex,bottom=\vskip.5ex]{\automatichyphenmode\zerocount \hsize6em \getbuffer[a]}} {A~0~6em}
+ {\framed[align=normal,strut=no,top=\vskip.5ex,bottom=\vskip.5ex]{\automatichyphenmode\zerocount \hsize2pt \getbuffer[a]}} {A~0~2pt}
+ {\framed[align=normal,strut=no,top=\vskip.5ex,bottom=\vskip.5ex]{\automatichyphenmode\plusone \hsize2pt \getbuffer[a]}} {A~1~2pt}
+ {\framed[align=normal,strut=no,top=\vskip.5ex,bottom=\vskip.5ex]{\automatichyphenmode\plustwo \hsize2pt \getbuffer[a]}} {A~2~2pt}
+ {\framed[align=normal,strut=no,top=\vskip.5ex,bottom=\vskip.5ex]{\automatichyphenmode\zerocount \hsize6em \getbuffer[b]}} {B~0~6em}
+ {\framed[align=normal,strut=no,top=\vskip.5ex,bottom=\vskip.5ex]{\automatichyphenmode\zerocount \hsize2pt \getbuffer[b]}} {B~0~2pt}
+ {\framed[align=normal,strut=no,top=\vskip.5ex,bottom=\vskip.5ex]{\automatichyphenmode\plusone \hsize2pt \getbuffer[b]}} {B~1~2pt}
+ {\framed[align=normal,strut=no,top=\vskip.5ex,bottom=\vskip.5ex]{\automatichyphenmode\plustwo \hsize2pt \getbuffer[b]}} {B~2~2pt}
+ {\framed[align=normal,strut=no,top=\vskip.5ex,bottom=\vskip.5ex]{\automatichyphenmode\zerocount \hsize6em \getbuffer[c]}} {C~0~6em}
+ {\framed[align=normal,strut=no,top=\vskip.5ex,bottom=\vskip.5ex]{\automatichyphenmode\zerocount \hsize2pt \getbuffer[c]}} {C~0~2pt}
+ {\framed[align=normal,strut=no,top=\vskip.5ex,bottom=\vskip.5ex]{\automatichyphenmode\plusone \hsize2pt \getbuffer[c]}} {C~1~2pt}
+ {\framed[align=normal,strut=no,top=\vskip.5ex,bottom=\vskip.5ex]{\automatichyphenmode\plustwo \hsize2pt \getbuffer[c]}} {C~2~2pt}
+\stopcombination
+\stopbuffer
+
+\startplacefigure[locationreference=automatichyphenmode:1,title={The automatic modes \type {0} (default), \type {1} and \type {2}, with a \prm {hsize}
+of 6em and 2pt (which triggers a linebreak).}]
+ \dontcomplain \tt \getbuffer[demo]
+\stopplacefigure
+
+\startplacefigure[reference=automatichyphenmode:2,title={The automatic modes \type {0} (default), \type {1} and \type {2}, with \lpr {preexhyphenchar} and \lpr {postexhyphenchar} set to characters \type {A} and \type {B}.}]
+ \postexhyphenchar`A\relax
+ \preexhyphenchar `B\relax
+ \dontcomplain \tt \getbuffer[demo]
+\stopplacefigure
+
+In \in {figure} [automatichyphenmode:1] \in {and} [automatichyphenmode:2] we show
+what happens with three samples:
+
+Input A: \typebuffer[a]
+Input B: \typebuffer[b]
+Input C: \typebuffer[c]
+
+As with primitive companions of other single character commands, the \prm {-}
+command has a more verbose primitive version in \lpr {explicitdiscretionary}
+and the normally intercepted in the hyphenator character \type {-} (or whatever
+is configured) is available as \lpr {automaticdiscretionary}.
+
+\stopsection
+
+\startsection[title={The main control loop}]
+
+\topicindex {main loop}
+\topicindex {hyphenation}
+
+In \LUATEX's main loop, almost all input characters that are to be typeset are
+converted into \nod {glyph} node records with subtype \quote {character}, but
+there are a few exceptions.
+
+\startitemize[n]
+
+\startitem
+ The \prm {accent} primitive creates nodes with subtype \quote {glyph}
+ instead of \quote {character}: one for the actual accent and one for the
+ accentee. The primary reason for this is that \prm {accent} in \TEX82 is
+ explicitly dependent on the current font encoding, so it would not make much
+ sense to attach a new meaning to the primitive's name, as that would
+ invalidate many old documents and macro packages. A secondary reason is that
+ in \TEX82, \prm {accent} prohibits hyphenation of the current word. Since
+ in \LUATEX\ hyphenation only takes place on \quote {character} nodes, it is
+ possible to achieve the same effect. Of course, modern \UNICODE\ aware macro
+ packages will not use the \prm {accent} primitive at all but try to map
+ directly on composed characters.
+
+ This change of meaning did happen with \prm {char}, that now generates
+ \quote {glyph} nodes with a character subtype. In traditional \TEX\ there was
+ a strong relationship between the 8|-|bit input encoding, hyphenation and
+ glyphs taken from a font. In \LUATEX\ we have \UTF\ input, and in most cases
+ this maps directly to a character in a font, apart from glyph replacement in
+ the font engine. If you want to access arbitrary glyphs in a font directly
+ you can always use \LUA\ to do so, because fonts are available as \LUA\
+ table.
+\stopitem
+
+\startitem
+ All the results of processing in math mode eventually become nodes with
+ \quote {glyph} subtypes. In fact, the result of processing math is just
+ a regular list of glyphs, kerns, glue, penalties, boxes etc.
+\stopitem
+
+\startitem
+ The \ALEPH|-|derived commands \lpr {leftghost} and \lpr {rightghost}
+ create nodes of a third subtype: \quote {ghost}. These nodes are ignored
+ completely by all further processing until the stage where inter|-|glyph
+ kerning is added.
+\stopitem
+
+\startitem
+ Automatic discretionaries are handled differently. \TEX82 inserts an empty
+ discretionary after sensing an input character that matches the \prm
+ {hyphenchar} in the current font. This test is wrong in our opinion: whether
+ or not hyphenation takes place should not depend on the current font, it is a
+ language property. \footnote {When \TEX\ showed up we didn't have \UNICODE\
+ yet and being limited to eight bits meant that one sometimes had to
+ compromise between supporting character input, glyph rendering, hyphenation.}
+
+ In \LUATEX, it works like this: if \LUATEX\ senses a string of input
+ characters that matches the value of the new integer parameter \prm
+ {exhyphenchar}, it will insert an explicit discretionary after that series of
+ nodes. Initially \TEX\ sets the \type {\exhyphenchar=`\-}. Incidentally, this
+ is a global parameter instead of a language-specific one because it may be
+ useful to change the value depending on the document structure instead of the
+ text language.
+
+ The insertion of discretionaries after a sequence of explicit hyphens happens
+ at the same time as the other hyphenation processing, {\it not\/} inside the
+ main control loop.
+
+ The only use \LUATEX\ has for \prm {hyphenchar} is at the check whether a
+ word should be considered for hyphenation at all. If the \prm {hyphenchar}
+ of the font attached to the first character node in a word is negative, then
+ hyphenation of that word is abandoned immediately. This behaviour is added
+ for backward compatibility only, and the use of \type {\hyphenchar=-1} as a
+ means of preventing hyphenation should not be used in new \LUATEX\ documents.
+\stopitem
+
+\startitem
+ The \prm {setlanguage} command no longer creates whatsits. The meaning of
+ \prm {setlanguage} is changed so that it is now an integer parameter like all
+ others. That integer parameter is used in \type {\glyph_node} creation to add
+ language information to the glyph nodes. In conjunction, the \prm {language}
+ primitive is extended so that it always also updates the value of \prm
+ {setlanguage}.
+\stopitem
+
+\startitem
+ The \prm {noboundary} command (that prohibits word boundary processing
+ where that would normally take place) now does create nodes. These nodes are
+ needed because the exact place of the \prm {noboundary} command in the
+ input stream has to be retained until after the ligature and font processing
+ stages.
+\stopitem
+
+\startitem
+ There is no longer a \type {main_loop} label in the code. Remember that
+ \TEX82 did quite a lot of processing while adding \type {char_nodes} to the
+ horizontal list? For speed reasons, it handled that processing code outside
+ of the \quote {main control} loop, and only the first character of any \quote
+ {word} was handled by that \quote {main control} loop. In \LUATEX, there is
+ no longer a need for that (all hard work is done later), and the (now very
+ small) bits of character|-|handling code have been moved back inline. When
+ \prm {tracingcommands} is on, this is visible because the full word is
+ reported, instead of just the initial character.
+\stopitem
+
+\stopitemize
+
+Because we tend to make hard coded behaviour configurable a few new primitives
+have been added:
+
+\starttyping
+\hyphenpenaltymode
+\automatichyphenpenalty
+\explicithyphenpenalty
+\stoptyping
+
+The first parameter has the following consequences for automatic discs (the ones
+resulting from an \prm {exhyphenchar}:
+
+\starttabulate[|c|l|l|]
+\DB mode \BC automatic disc \type {-} \BC explicit disc \prm{-} \NC \NR
+\TB
+\NC \type{0} \NC \prm {exhyphenpenalty} \NC \prm {exhyphenpenalty} \NC \NR
+\NC \type{1} \NC \prm {hyphenpenalty} \NC \prm {hyphenpenalty} \NC \NR
+\NC \type{2} \NC \prm {exhyphenpenalty} \NC \prm {hyphenpenalty} \NC \NR
+\NC \type{3} \NC \prm {hyphenpenalty} \NC \prm {exhyphenpenalty} \NC \NR
+\NC \type{4} \NC \lpr {automatichyphenpenalty} \NC \lpr {explicithyphenpenalty} \NC \NR
+\NC \type{5} \NC \prm {exhyphenpenalty} \NC \lpr {explicithyphenpenalty} \NC \NR
+\NC \type{6} \NC \prm {hyphenpenalty} \NC \lpr {explicithyphenpenalty} \NC \NR
+\NC \type{7} \NC \lpr {automatichyphenpenalty} \NC \prm {exhyphenpenalty} \NC \NR
+\NC \type{8} \NC \lpr {automatichyphenpenalty} \NC \prm {hyphenpenalty} \NC \NR
+\LL
+\stoptabulate
+
+other values do what we always did in \LUATEX: insert \prm {exhyphenpenalty}.
+
+\stopsection
+
+\startsection[title={Loading patterns and exceptions},reference=patternsexceptions]
+
+\topicindex {hyphenation}
+\topicindex {hyphenation+patterns}
+\topicindex {hyphenation+exceptions}
+\topicindex {patterns}
+\topicindex {exceptions}
+
+Although we keep the traditional approach towards hyphenation (which is still
+superior) the implementation of the hyphenation algorithm in \LUATEX\ is quite
+different from the one in \TEX82.
+
+After expansion, the argument for \prm {patterns} has to be proper \UTF8 with
+individual patterns separated by spaces, no \prm {char} or \prm {chardef}d
+commands are allowed. The current implementation is quite strict and will reject
+all non|-|\UNICODE\ characters. Likewise, the expanded argument for \prm
+{hyphenation} also has to be proper \UTF8, but here a bit of extra syntax is
+provided:
+
+\startitemize[n]
+\startitem
+ Three sets of arguments in curly braces (\type {{}{}{}}) indicate a desired
+ complex discretionary, with arguments as in \prm {discretionary}'s command in
+ normal document input.
+\stopitem
+\startitem
+ A \type {-} indicates a desired simple discretionary, cf.\ \type {\-} and
+ \type {\discretionary{-}{}{}} in normal document input.
+\stopitem
+\startitem
+ Internal command names are ignored. This rule is provided especially for \prm
+ {discretionary}, but it also helps to deal with \prm {relax} commands that
+ may sneak in.
+\stopitem
+\startitem
+ An \type {=} indicates a (non|-|discretionary) hyphen in the document input.
+\stopitem
+\stopitemize
+
+The expanded argument is first converted back to a space|-|separated string while
+dropping the internal command names. This string is then converted into a
+dictionary by a routine that creates key|-|value pairs by converting the other
+listed items. It is important to note that the keys in an exception dictionary
+can always be generated from the values. Here are a few examples:
+
+\starttabulate[|l|l|l|]
+\DB value \BC implied key (input) \BC effect \NC\NR
+\TB
+\NC \type {ta-ble} \NC table \NC \type {ta\-ble} ($=$ \type {ta\discretionary{-}{}{}ble}) \NC\NR
+\NC \type {ba{k-}{}{c}ken} \NC backen \NC \type {ba\discretionary{k-}{}{c}ken} \NC\NR
+\LL
+\stoptabulate
+
+The resultant patterns and exception dictionary will be stored under the language
+code that is the present value of \prm {language}.
+
+In the last line of the table, you see there is no \prm {discretionary} command
+in the value: the command is optional in the \TEX-based input syntax. The
+underlying reason for that is that it is conceivable that a whole dictionary of
+words is stored as a plain text file and loaded into \LUATEX\ using one of the
+functions in the \LUA\ \type {lang} library. This loading method is quite a bit
+faster than going through the \TEX\ language primitives, but some (most?) of that
+speed gain would be lost if it had to interpret command sequences while doing so.
+
+It is possible to specify extra hyphenation points in compound words by using
+\type {{-}{}{-}} for the explicit hyphen character (replace \type {-} by the
+actual explicit hyphen character if needed). For example, this matches the word
+\quote {multi|-|word|-|boundaries} and allows an extra break inbetween \quote
+{boun} and \quote {daries}:
+
+\starttyping
+\hyphenation{multi{-}{}{-}word{-}{}{-}boun-daries}
+\stoptyping
+
+The motivation behind the \ETEX\ extension \prm {savinghyphcodes} was that
+hyphenation heavily depended on font encodings. This is no longer true in
+\LUATEX, and the corresponding primitive is basically ignored. Because we now
+have \lpr {hjcode}, the case relate codes can be used exclusively for \prm
+{uppercase} and \prm {lowercase}.
+
+The three curly brace pair pattern in an exception can be somewhat unexpected so
+we will try to explain it by example. The pattern \type {foo{}{}{x}bar} pattern
+creates a lookup \type {fooxbar} and the pattern \type {foo{}{}{}bar} creates
+\type {foobar}. Then, when a hit happens there is a replacement text (\type {x})
+or none. Because we introduced penalties in discretionary nodes, the exception
+syntax now also can take a penalty specification. The value between square brackets
+is a multiplier for \lpr {exceptionpenalty}. Here we have set it to 10000 so
+effectively we get 30000 in the example.
+
+\def\ShowSample#1#2%
+ {\startlinecorrection[blank]
+ \hyphenation{#1}%
+ \exceptionpenalty=10000
+ \bTABLE[foregroundstyle=type]
+ \bTR
+ \bTD[align=middle,nx=4] \type{#1} \eTD
+ \eTR
+ \bTR
+ \bTD[align=middle] \type{10em} \eTD
+ \bTD[align=middle] \type {3em} \eTD
+ \bTD[align=middle] \type {0em} \eTD
+ \bTD[align=middle] \type {6em} \eTD
+ \eTR
+ \bTR
+ \bTD[width=10em]\vtop{\hsize 10em 123 #2 123\par}\eTD
+ \bTD[width=10em]\vtop{\hsize 3em 123 #2 123\par}\eTD
+ \bTD[width=10em]\vtop{\hsize 0em 123 #2 123\par}\eTD
+ \bTD[width=10em]\vtop{\setupalign[verytolerant,stretch]\rmtf\hsize 6em 123 #2 #2 #2 #2 123\par}\eTD
+ \eTR
+ \eTABLE
+ \stoplinecorrection}
+
+\ShowSample{x{a-}{-b}{}x{a-}{-b}{}x{a-}{-b}{}x{a-}{-b}{}xx}{xxxxxx}
+\ShowSample{x{a-}{-b}{}x{a-}{-b}{}[3]x{a-}{-b}{}[1]x{a-}{-b}{}xx}{xxxxxx}
+
+\ShowSample{z{a-}{-b}{z}{a-}{-b}{z}{a-}{-b}{z}{a-}{-b}{z}z}{zzzzzz}
+\ShowSample{z{a-}{-b}{z}{a-}{-b}{z}[3]{a-}{-b}{z}[1]{a-}{-b}{z}z}{zzzzzz}
+
+\stopsection
+
+\startsection[title={Applying hyphenation}]
+
+\topicindex {hyphenation+how it works}
+\topicindex {hyphenation+discretionaries}
+\topicindex {discretionaries}
+
+The internal structures \LUATEX\ uses for the insertion of discretionaries in
+words is very different from the ones in \TEX82, and that means there are some
+noticeable differences in handling as well.
+
+First and foremost, there is no \quote {compressed trie} involved in hyphenation.
+The algorithm still reads pattern files generated by \PATGEN, but \LUATEX\ uses a
+finite state hash to match the patterns against the word to be hyphenated. This
+algorithm is based on the \quote {libhnj} library used by \OPENOFFICE, which in
+turn is inspired by \TEX.
+
+There are a few differences between \LUATEX\ and \TEX82 that are a direct result
+of the implementation:
+
+\startitemize
+\startitem
+ \LUATEX\ happily hyphenates the full \UNICODE\ character range.
+\stopitem
+\startitem
+ Pattern and exception dictionary size is limited by the available memory
+ only, all allocations are done dynamically. The trie|-|related settings in
+ \type {texmf.cnf} are ignored.
+\stopitem
+\startitem
+ Because there is no \quote {trie preparation} stage, language patterns never
+ become frozen. This means that the primitive \prm {patterns} (and its \LUA\
+ counterpart \type {lang.patterns}) can be used at any time, not only in
+ ini\TEX.
+\stopitem
+\startitem
+ Only the string representation of \prm {patterns} and \prm {hyphenation} is
+ stored in the format file. At format load time, they are simply
+ re|-|evaluated. It follows that there is no real reason to preload languages
+ in the format file. In fact, it is usually not a good idea to do so. It is
+ much smarter to load patterns no sooner than the first time they are actually
+ needed.
+\stopitem
+\startitem
+ \LUATEX\ uses the language-specific variables \lpr {prehyphenchar} and \lpr
+ {posthyphenchar} in the creation of implicit discretionaries, instead of
+ \TEX82's \prm {hyphenchar}, and the values of the language|-|specific
+ variables \lpr {preexhyphenchar} and \lpr {postexhyphenchar} for explicit
+ discretionaries (instead of \TEX82's empty discretionary).
+\stopitem
+\startitem
+ The value of the two counters related to hyphenation, \prm {hyphenpenalty}
+ and \prm {exhyphenpenalty}, are now stored in the discretionary nodes. This
+ permits a local overload for explicit \prm {discretionary} commands. The
+ value current when the hyphenation pass is applied is used. When no callbacks
+ are used this is compatible with traditional \TEX. When you apply the \LUA\
+ \type {lang.hyphenate} function the current values are used.
+\stopitem
+\startitem
+ The hyphenation exception dictionary is maintained as key|-|value hash, and
+ that is also dynamic, so the \type {hyph_size} setting is not used either.
+\stopitem
+\stopitemize
+
+Because we store penalties in the disc node the \prm {discretionary} command has
+been extended to accept an optional penalty specification, so you can do the
+following:
+
+\startbuffer
+\hsize1mm
+1:foo{\hyphenpenalty 10000\discretionary{}{}{}}bar\par
+2:foo\discretionary penalty 10000 {}{}{}bar\par
+3:foo\discretionary{}{}{}bar\par
+\stopbuffer
+
+\typebuffer
+
+This results in:
+
+\blank \start \getbuffer \stop \blank
+
+Inserted characters and ligatures inherit their attributes from the nearest glyph
+node item (usually the preceding one, but the following one for the items
+inserted at the left-hand side of a word).
+
+Word boundaries are no longer implied by font switches, but by language switches.
+One word can have two separate fonts and still be hyphenated correctly (but it
+can not have two different languages, the \prm {setlanguage} command forces a
+word boundary).
+
+All languages start out with \type {\prehyphenchar=`\-}, \type {\posthyphenchar=0},
+\type {\preexhyphenchar=0} and \type {\postexhyphenchar=0}. When you assign the
+values of one of these four parameters, you are actually changing the settings
+for the current \prm {language}, this behaviour is compatible with \prm {patterns}
+and \prm {hyphenation}.
+
+\LUATEX\ also hyphenates the first word in a paragraph. Words can be up to 256
+characters long (up from 64 in \TEX82). Longer words are ignored right now, but
+eventually either the limitation will be removed or perhaps it will become
+possible to silently ignore the excess characters (this is what happens in
+\TEX82, but there the behaviour cannot be controlled).
+
+If you are using the \LUA\ function \type {lang.hyphenate}, you should be aware
+that this function expects to receive a list of \quote {character} nodes. It will
+not operate properly in the presence of \quote {glyph}, \quote {ligature}, or
+\quote {ghost} nodes, nor does it know how to deal with kerning.
+
+\stopsection
+
+\startsection[title={Applying ligatures and kerning}]
+
+\topicindex {ligatures}
+\topicindex {kerning}
+
+After all possible hyphenation points have been inserted in the list, \LUATEX\
+will process the list to convert the \quote {character} nodes into \quote {glyph}
+and \quote {ligature} nodes. This is actually done in two stages: first all
+ligatures are processed, then all kerning information is applied to the result
+list. But those two stages are somewhat dependent on each other: If the used font
+makes it possible to do so, the ligaturing stage adds virtual \quote {character}
+nodes to the word boundaries in the list. While doing so, it removes and
+interprets \prm {noboundary} nodes. The kerning stage deletes those word
+boundary items after it is done with them, and it does the same for \quote
+{ghost} nodes. Finally, at the end of the kerning stage, all remaining \quote
+{character} nodes are converted to \quote {glyph} nodes.
+
+This word separation is worth mentioning because, if you overrule from \LUA\ only
+one of the two callbacks related to font handling, then you have to make sure you
+perform the tasks normally done by \LUATEX\ itself in order to make sure that the
+other, non|-|overruled, routine continues to function properly.
+
+Although we could improve the situation the reality is that in modern \OPENTYPE\
+fonts ligatures can be constructed in many ways: by replacing a sequence of
+characters by one glyph, or by selectively replacing individual glyphs, or by
+kerning, or any combination of this. Add to that contextual analysis and it will
+be clear that we have to let \LUA\ do that job instead. The generic font handler
+that we provide (which is part of \CONTEXT) distinguishes between base mode
+(which essentially is what we describe here and which delegates the task to \TEX)
+and node mode (which deals with more complex fonts.
+
+Let's look at an example. Take the word \type {office}, hyphenated \type
+{of-fice}, using a \quote {normal} font with all the \type {f}-\type {f} and
+\type {f}-\type {i} type ligatures:
+
+\starttabulate[|l|l|]
+\NC initial \NC \type {{o}{f}{f}{i}{c}{e}} \NC\NR
+\NC after hyphenation \NC \type {{o}{f}{{-},{},{}}{f}{i}{c}{e}} \NC\NR
+\NC first ligature stage \NC \type {{o}{{f-},{f},{<ff>}}{i}{c}{e}} \NC\NR
+\NC final result \NC \type {{o}{{f-},{<fi>},{<ffi>}}{c}{e}} \NC\NR
+\stoptabulate
+
+That's bad enough, but let us assume that there is also a hyphenation point
+between the \type {f} and the \type {i}, to create \type {of-f-ice}. Then the
+final result should be:
+
+\starttyping
+{o}{{f-},
+ {{f-},
+ {i},
+ {<fi>}},
+ {{<ff>-},
+ {i},
+ {<ffi>}}}{c}{e}
+\stoptyping
+
+with discretionaries in the post-break text as well as in the replacement text of
+the top-level discretionary that resulted from the first hyphenation point.
+
+Here is that nested solution again, in a different representation:
+
+\testpage[4]
+
+\starttabulate[|l|c|c|c|c|c|c|]
+\DB \BC pre \BC \BC post \BC \BC replace \BC \NC \NR
+\TB
+\NC topdisc \NC \type {f-} \NC (1) \NC \NC sub 1 \NC \NC sub 2 \NC \NR
+\NC sub 1 \NC \type {f-} \NC (2) \NC \type {i} \NC (3) \NC \type {<fi>} \NC (4) \NC \NR
+\NC sub 2 \NC \type {<ff>-} \NC (5) \NC \type {i} \NC (6) \NC \type {<ffi>} \NC (7) \NC \NR
+\LL
+\stoptabulate
+
+When line breaking is choosing its breakpoints, the following fields will
+eventually be selected:
+
+\starttabulate[|l|c|c|]
+\NC \type {of-f-ice} \NC \type {f-} \NC (1) \NC \NR
+\NC \NC \type {f-} \NC (2) \NC \NR
+\NC \NC \type {i} \NC (3) \NC \NR
+\NC \type {of-fice} \NC \type {f-} \NC (1) \NC \NR
+\NC \NC \type {<fi>} \NC (4) \NC \NR
+\NC \type {off-ice} \NC \type {<ff>-} \NC (5) \NC \NR
+\NC \NC \type {i} \NC (6) \NC \NR
+\NC \type {office} \NC \type {<ffi>} \NC (7) \NC \NR
+\stoptabulate
+
+The current solution in \LUATEX\ is not able to handle nested discretionaries,
+but it is in fact smart enough to handle this fictional \type {of-f-ice} example.
+It does so by combining two sequential discretionary nodes as if they were a
+single object (where the second discretionary node is treated as an extension of
+the first node).
+
+One can observe that the \type {of-f-ice} and \type {off-ice} cases both end with
+the same actual post replacement list (\type {i}), and that this would be the
+case even if \type {i} was the first item of a potential following ligature like
+\type {ic}. This allows \LUATEX\ to do away with one of the fields, and thus make
+the whole stuff fit into just two discretionary nodes.
+
+The mapping of the seven list fields to the six fields in this discretionary node
+pair is as follows:
+
+\starttabulate[|l|c|c|]
+\DB field \BC description \NC \NC \NR
+\TB
+\NC \type {disc1.pre} \NC \type {f-} \NC (1) \NC \NR
+\NC \type {disc1.post} \NC \type {<fi>} \NC (4) \NC \NR
+\NC \type {disc1.replace} \NC \type {<ffi>} \NC (7) \NC \NR
+\NC \type {disc2.pre} \NC \type {f-} \NC (2) \NC \NR
+\NC \type {disc2.post} \NC \type {i} \NC (3,6) \NC \NR
+\NC \type {disc2.replace} \NC \type {<ff>-} \NC (5) \NC \NR
+\LL
+\stoptabulate
+
+What is actually generated after ligaturing has been applied is therefore:
+
+\starttyping
+{o}{{f-},
+ {<fi>},
+ {<ffi>}}
+ {{f-},
+ {i},
+ {<ff>-}}{c}{e}
+\stoptyping
+
+The two discretionaries have different subtypes from a discretionary appearing on
+its own: the first has subtype 4, and the second has subtype 5. The need for
+these special subtypes stems from the fact that not all of the fields appear in
+their \quote {normal} location. The second discretionary especially looks odd,
+with things like the \type {<ff>-} appearing in \type {disc2.replace}. The fact
+that some of the fields have different meanings (and different processing code
+internally) is what makes it necessary to have different subtypes: this enables
+\LUATEX\ to distinguish this sequence of two joined discretionary nodes from the
+case of two standalone discretionaries appearing in a row.
+
+Of course there is still that relationship with fonts: ligatures can be implemented by
+mapping a sequence of glyphs onto one glyph, but also by selective replacement and
+kerning. This means that the above examples are just representing the traditional
+approach.
+
+\stopsection
+
+\startsection[title={Breaking paragraphs into lines}]
+
+\topicindex {linebreaks}
+\topicindex {paragraphs}
+\topicindex {discretionaries}
+
+This code is almost unchanged, but because of the above|-|mentioned changes
+with respect to discretionaries and ligatures, line breaking will potentially be
+different from traditional \TEX. The actual line breaking code is still based on
+the \TEX82 algorithms, and it does not expect there to be discretionaries inside
+of discretionaries. But, as patterns evolve and font handling can influence
+discretionaries, you need to be aware of the fact that long term consistency is not
+an engine matter only.
+
+But that situation is now fairly common in \LUATEX, due to the changes to the
+ligaturing mechanism. And also, the \LUATEX\ discretionary nodes are implemented
+slightly different from the \TEX82 nodes: the \type {no_break} text is now
+embedded inside the disc node, where previously these nodes kept their place in
+the horizontal list. In traditional \TEX\ the discretionary node contains a
+counter indicating how many nodes to skip, but in \LUATEX\ we store the pre, post
+and replace text in the discretionary node.
+
+The combined effect of these two differences is that \LUATEX\ does not always use
+all of the potential breakpoints in a paragraph, especially when fonts with many
+ligatures are used. Of course kerning also complicates matters here.
+
+\stopsection
+
+\startsection[title={The \type {lang} library}][library=lang]
+
+\subsection {\type {new} and \type {id}}
+
+\topicindex {languages+library}
+
+\libindex {new}
+\libindex {id}
+
+This library provides the interface to \LUATEX's structure representing a
+language, and the associated functions.
+
+\startfunctioncall
+<language> l = lang.new()
+<language> l = lang.new(<number> id)
+\stopfunctioncall
+
+This function creates a new userdata object. An object of type \type {<language>}
+is the first argument to most of the other functions in the \type {lang} library.
+These functions can also be used as if they were object methods, using the colon
+syntax. Without an argument, the next available internal id number will be
+assigned to this object. With argument, an object will be created that links to
+the internal language with that id number.
+
+\startfunctioncall
+<number> n = lang.id(<language> l)
+\stopfunctioncall
+
+The number returned is the internal \prm {language} id number this object refers to.
+
+\subsection {\type {hyphenation}}
+
+\libindex {hyphenation}
+
+You can hyphenate a string directly with:
+
+\startfunctioncall
+<string> n = lang.hyphenation(<language> l)
+lang.hyphenation(<language> l, <string> n)
+\stopfunctioncall
+
+\subsection {\type {clear_hyphenation} and \type {clean}}
+
+\libindex {clear_hyphenation}
+\libindex {clean}
+
+This either returns the current hyphenation exceptions for this language, or adds
+new ones. The syntax of the string is explained in~\in {section}
+[patternsexceptions].
+
+\startfunctioncall
+lang.clear_hyphenation(<language> l)
+\stopfunctioncall
+
+This call clears the exception dictionary (string) for this language.
+
+\startfunctioncall
+<string> n = lang.clean(<language> l, <string> o)
+<string> n = lang.clean(<string> o)
+\stopfunctioncall
+
+This function creates a hyphenation key from the supplied hyphenation value. The
+syntax of the argument string is explained in \in {section} [patternsexceptions].
+This function is useful if you want to do something else based on the words in a
+dictionary file, like spell|-|checking.
+
+\subsection {\type {patterns} and \type {clear_patterns}}
+
+\libindex {patterns}
+\libindex {clear_patterns}
+
+\startfunctioncall
+<string> n = lang.patterns(<language> l)
+lang.patterns(<language> l, <string> n)
+\stopfunctioncall
+
+This adds additional patterns for this language object, or returns the current
+set. The syntax of this string is explained in \in {section}
+[patternsexceptions].
+
+\startfunctioncall
+lang.clear_patterns(<language> l)
+\stopfunctioncall
+
+This can be used to clear the pattern dictionary for a language.
+
+\subsection {\type {hyphenationmin}}
+
+\libindex {hyphenationmin}
+
+This function sets (or gets) the value of the \TEX\ parameter
+\type {\hyphenationmin}.
+
+\startfunctioncall
+n = lang.hyphenationmin(<language> l)
+lang.hyphenationmin(<language> l, <number> n)
+\stopfunctioncall
+
+\subsection {\type {[pre|post][ex|]hyphenchar}}
+
+\libindex {prehyphenchar}
+\libindex {posthyphenchar}
+\libindex {preexhyphenchar}
+\libindex {postexhyphenchar}
+
+\startfunctioncall
+<number> n = lang.prehyphenchar(<language> l)
+lang.prehyphenchar(<language> l, <number> n)
+
+<number> n = lang.posthyphenchar(<language> l)
+lang.posthyphenchar(<language> l, <number> n)
+\stopfunctioncall
+
+These two are used to get or set the \quote {pre|-|break} and \quote
+{post|-|break} hyphen characters for implicit hyphenation in this language. The
+intial values are decimal 45 (hyphen) and decimal~0 (indicating emptiness).
+
+\startfunctioncall
+<number> n = lang.preexhyphenchar(<language> l)
+lang.preexhyphenchar(<language> l, <number> n)
+
+<number> n = lang.postexhyphenchar(<language> l)
+lang.postexhyphenchar(<language> l, <number> n)
+\stopfunctioncall
+
+These gets or set the \quote {pre|-|break} and \quote {post|-|break} hyphen
+characters for explicit hyphenation in this language. Both are initially
+decimal~0 (indicating emptiness).
+
+\subsection {\type {hyphenate}}
+
+\libindex {hyphenate}
+
+The next call inserts hyphenation points (discretionary nodes) in a node list. If
+\type {tail} is given as argument, processing stops on that node. Currently,
+\type {success} is always true if \type {head} (and \type {tail}, if specified)
+are proper nodes, regardless of possible other errors.
+
+\startfunctioncall
+<boolean> success = lang.hyphenate(<node> head)
+<boolean> success = lang.hyphenate(<node> head, <node> tail)
+\stopfunctioncall
+
+Hyphenation works only on \quote {characters}, a special subtype of all the glyph
+nodes with the node subtype having the value \type {1}. Glyph modes with
+different subtypes are not processed. See \in {section} [charsandglyphs] for
+more details.
+
+\subsection {\type {[set|get]hjcode}}
+
+\libindex {sethjcode}
+\libindex {gethjcode}
+
+The following two commands can be used to set or query hj codes:
+
+\startfunctioncall
+lang.sethjcode(<language> l, <number> char, <number> usedchar)
+<number> usedchar = lang.gethjcode(<language> l, <number> char)
+\stopfunctioncall
+
+When you set a hjcode the current sets get initialized unless the set was already
+initialized due to \prm {savinghyphcodes} being larger than zero.
+
+\stopsection
+
+\stopchapter
+
+\stopcomponent
+
+% \parindent0pt \hsize=1.1cm
+% 12-34-56 \par
+% 12-34-\hbox{56} \par
+% 12-34-\vrule width 1em height 1.5ex \par
+% 12-\hbox{34}-56 \par
+% 12-\vrule width 1em height 1.5ex-56 \par
+% \hjcode`\1=`\1 \hjcode`\2=`\2 \hjcode`\3=`\3 \hjcode`\4=`\4 \vskip.5cm
+% 12-34-56 \par
+% 12-34-\hbox{56} \par
+% 12-34-\vrule width 1em height 1.5ex \par
+% 12-\hbox{34}-56 \par
+% 12-\vrule width 1em height 1.5ex-56 \par
+