authorContext Git Mirror Bot <phg42.2a@gmail.com>2015-10-07 14:15:06 +0200
committerContext Git Mirror Bot <phg42.2a@gmail.com>2015-10-07 14:15:06 +0200
commitee1c809d23ce322e7946f941545f7e0fa27ae5c6 (patch)
tree3e32a64b19cf9706e5ff0df289eb56e77571a5ca /doc/context/sources/general/manuals/luatex/luatex-languages.tex
parent961f357ef202a44da1f4b315c82ef143a6f51497 (diff)
2015-10-07 12:05:00
Diffstat (limited to 'doc/context/sources/general/manuals/luatex/luatex-languages.tex')
-rw-r--r--doc/context/sources/general/manuals/luatex/luatex-languages.tex514
1 files changed, 514 insertions, 0 deletions
diff --git a/doc/context/sources/general/manuals/luatex/luatex-languages.tex b/doc/context/sources/general/manuals/luatex/luatex-languages.tex
new file mode 100644
index 000000000..56978b0fd
--- /dev/null
+++ b/doc/context/sources/general/manuals/luatex/luatex-languages.tex
@@ -0,0 +1,514 @@
+\environment luatex-style
+\environment luatex-logos
+
+\startcomponent luatex-languages
+
+\startchapter[reference=languages,title={Languages and characters, fonts and glyphs}]
+
+\LUATEX's internal handling of the characters and glyphs that eventually become
+typeset is quite different from the way \TEX82 handles those same objects. The
+easiest way to explain the difference is to focus on unrestricted horizontal mode
+(i.e.\ paragraphs) and hyphenation first. Later on, it will be easy to deal
+with the differences that occur in horizontal and math modes.
+
+In \TEX82, the characters you type are converted into \type {char_node} records
+when they are encountered by the main control loop. \TEX\ attaches and processes
+the font information while creating those records, so that the resulting \quote
+{horizontal list} contains the final forms of ligatures and implicit kerning.
+This packaging is needed because we may want to get the effective width of, for
+instance, a horizontal box.
+
+When it becomes necessary to hyphenate words in a paragraph, \TEX\ converts (one
+word at a time) the \type {char_node} records into a string array by replacing
+ligatures with their components and ignoring the kerning. Then it runs the
+hyphenation algorithm on this string, and converts the hyphenated result back
+into a \quote {horizontal list} that is consecutively spliced back into the
+paragraph stream. Keep in mind that the paragraph may contain unboxed horizontal
+material, which then already contains ligatures and kerns and the words therein
+are part of the hyphenation process.
+
+The \type {char_node} records are somewhat misnamed, as they are glyph positions
+in specific fonts, and therefore not really \quote {characters} in the linguistic
+sense. There is no language information inside the \type {char_node} records.
+Instead, language information is passed along using \type {language whatsit}
+records inside the horizontal list.
+
+In \LUATEX, the situation is quite different. The characters you type are always
+converted into \type {glyph_node} records with a special subtype to identify them
+as being intended as linguistic characters. \LUATEX\ stores the needed language
+information in those records, but does not do any font|-|related processing at
+the time of node creation. It only stores the index of the current font.
+
+When it becomes necessary to typeset a paragraph, \LUATEX\ first inserts all
+hyphenation points right into the whole node list. Next, it processes all the
+font information in the whole list (creating ligatures and adjusting kerning),
+and finally it adjusts all the subtype identifiers so that the records are \quote
+{glyph nodes} from now on.
+
+That was the broad overview. The rest of this chapter will deal with the minutiae
+of the new process.
+
+\section[charsandglyphs]{Characters and glyphs}
+
+\TEX82 (including \PDFTEX) differentiates between \type {char_node}s and \type
+{lig_node}s. The former are simple items that contained nothing but a \quote
+{character} and a \quote {font} field, and they lived in the same memory as
+tokens did. The latter also contained a list of components, and a subtype
+indicating whether this ligature was the result of a word boundary, and it was
+stored in the same place as other nodes like boxes and kerns and glues.
+
+In \LUATEX, these two types are merged into one, somewhat larger structure called
+a \type {glyph_node}. Besides having the old character, font, and component
+fields, and the new special fields like \quote {attr}
+(see~\in{section}[glyphnodes]), these nodes also contain:
+
+\startitemize
+
+\startitem A subtype, split into four main types:
+
+ \startitemize
+ \startitem
+ \type {character}, for characters to be hyphenated: the lowest bit
+ (bit 0) is set to 1.
+ \stopitem
+ \startitem
+ \type {glyph}, for specific font glyphs: the lowest bit (bit 0) is
+ not set.
+ \stopitem
+ \startitem
+ \type {ligature}, for ligatures (bit 1 is set)
+ \stopitem
+ \startitem
+ \type {ghost}, for \quote {ghost objects} (bit 2 is set)
+ \stopitem
+ \stopitemize
+
+ The latter two make further use of two extra fields (bits 3 and 4):
+
+ \startitemize
+ \startitem
+ \type {left}, for ligatures created from a left word boundary and for
+ ghosts created from \type {\leftghost}
+ \stopitem
+ \startitem
+ \type {right}, for ligatures created from a right word boundary and
+ for ghosts created from \type {\rightghost}
+ \stopitem
+ \stopitemize
+
+ For ligatures, both bits can be set at the same time (in case of a
+ single|-|glyph word).
+
+\stopitem
+
+\startitem
+ \type {glyph_node}s of type \quote {character} also contain language data,
+ split into four items that were current when the node was created: the
+ \type {\setlanguage} (15 bits), \type {\lefthyphenmin} (8 bits), \type
+ {\righthyphenmin} (8 bits), and \type {\uchyph} (1 bit).
+\stopitem
+
+\stopitemize
+
+Incidentally, \LUATEX\ allows 16383 separate languages, and words can be 256
+characters long.
+
+The new primitive \type {\hyphenationmin} can be used to signal the minimal length
+of a word. This value is stored with the (current) language.
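+
+As a minimal sketch, and assuming that language number~1 (a hypothetical number)
+has been set up, the value could be attached to that language as follows. The
+threshold of 5 is an arbitrary choice.
+
+\starttyping
+\language=1        % hypothetical language number
+\hyphenationmin=5  % minimal word length, stored with language 1
+\stoptyping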
+
+Because the \type {\uchyph} value is saved in the actual nodes, its handling is
+subtly different from \TEX82: changes to \type {\uchyph} become effective
+immediately, not at the end of the current partial paragraph.
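+
+A small sketch of the consequence (the word is arbitrary): within a single
+paragraph, each word carries the \type {\uchyph} value that was current when its
+nodes were created.
+
+\starttyping
+\uchyph=0 Capitalized % created with uchyph 0: not to be hyphenated
+\uchyph=1 Capitalized % created with uchyph 1: may be hyphenated
+\stoptyping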
+
+Typeset boxes now always have their language information embedded in the nodes
+themselves, so there is no longer a possible dependency on the surrounding
+language settings. In \TEX82, a mid-paragraph statement like \type {\unhbox0} would
+process the box using the current paragraph language unless there was a
+\type {\setlanguage} issued inside the box. In \LUATEX, all language variables are
+already frozen.
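+
+The following sketch illustrates this, assuming that 1 and 2 are language numbers
+(both hypothetical) for which patterns have been loaded:
+
+\starttyping
+\language=1                 % hypothetical language number
+\setbox0=\hbox{typesetting} % these nodes are tagged with language 1
+\language=2                 % another hypothetical language number
+Text \unhbox0\ text.        % the unboxed word keeps language 1
+\stoptyping
+
+In \TEX82 the unboxed word would instead be treated according to the language of
+the surrounding paragraph.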
+
+\section{The main control loop}
+
+In \LUATEX's main loop, almost all input characters that are to be typeset are
+converted into \type {glyph} node records with subtype \quote {character}, but
+there are a few exceptions.
+
+First, the \type {\accent} primitive creates nodes with subtype \quote {glyph}
+instead of \quote {character}: one for the actual accent and one for the
+accentee. The primary reason for this is that \type {\accent} in \TEX82 is
+explicitly dependent on the current font encoding, so it would not make much
+sense to attach a new meaning to the primitive's name, as that would invalidate
+many old documents and macro packages. A secondary reason is that in \TEX82,
+\type {\accent} prohibits hyphenation of the current word. Since in \LUATEX\
+hyphenation only takes place on \quote {character} nodes, it is possible to
+achieve the same effect.
+
+This change of meaning did happen with \type {\char}, which now generates \quote
+{glyph} nodes with a character subtype. In traditional \TEX\ there was a strong
+relationship between the 8|-|bit input encoding, hyphenation and glyphs taken
+from a font. In \LUATEX\ we have \UTF\ input, and in most cases this maps
+directly to a character in a font, apart from glyph replacement in the font
+engine. If you want to access arbitrary glyphs in a font directly you can always
+use \LUA\ to do so, because fonts are available as \LUA\ tables.
+
+Second, all the results of processing in math mode eventually become nodes with
+\quote {glyph} subtypes.
+
+Third, the \ALEPH|-|derived commands \type {\leftghost} and \type {\rightghost}
+create nodes of a third subtype: \quote {ghost}. These nodes are ignored
+completely by all further processing until the stage where inter|-|glyph kerning
+is added.
+
+Fourth, automatic discretionaries are handled differently. \TEX82 inserts an
+empty discretionary after sensing an input character that matches the \type
+{\hyphenchar} in the current font. This test is wrong, in our opinion: whether or
+not hyphenation takes place should not depend on the current font, it is a
+language property.
+
+In \LUATEX, it works like this: if \LUATEX\ senses a string of input characters
+that matches the value of the new integer parameter \type {\exhyphenchar}, it will
+insert an explicit discretionary after that series of nodes. Initex sets the \type
+{\exhyphenchar=`\-}. Incidentally, this is a global parameter instead of a
+language-specific one because it may be useful to change the value depending on
+the document structure instead of the text language.
+
+The insertion of discretionaries after a sequence of explicit hyphens happens at
+the same time as the other hyphenation processing, {\it not\/} inside the main
+control loop.
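+
+As a short sketch (the sample text is arbitrary), the default setting behaves as
+follows:
+
+\starttyping
+\exhyphenchar=`\-   % the value that initex sets
+A well-known case.  % an explicit discretionary follows the hyphen
+\stoptyping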
+
+The only use \LUATEX\ has for \type {\hyphenchar} is in the check whether a word
+should be considered for hyphenation at all. If the \type {\hyphenchar} of the font
+attached to the first character node in a word is negative, then hyphenation of
+that word is abandoned immediately. {\bf This behavior is added for backward
+compatibility only, and \type {\hyphenchar=-1} should not be used as a means of
+preventing hyphenation in new \LUATEX\ documents.}
+
+Fifth, \type {\setlanguage} no longer creates whatsits. The meaning of \type
+{\setlanguage} is changed so that it is now an integer parameter like all others.
+That integer parameter is used in \type {glyph_node} creation to add language
+information to the glyph nodes. In conjunction, the \type {\language} primitive is
+extended so that it always also updates the value of \type {\setlanguage}.
+
+Sixth, the \type {\noboundary} command (this command prohibits word boundary
+processing where that would normally take place) now does create whatsits. These
+whatsits are needed because the exact place of the \type {\noboundary} command in
+the input stream has to be retained until after the ligature and font processing
+stages.
+
+Finally, there is no longer a \type {main_loop} label in the code. Remember that
+\TEX82 did quite a lot of processing while adding \type {char_nodes} to the
+horizontal list? For speed reasons, it handled that processing code outside of
+the \quote {main control} loop, and only the first character of any \quote {word}
+was handled by that \quote {main control} loop. In \LUATEX, there is no longer a
+need for that (all hard work is done later), and the (now very small) bits of
+character|-|handling code have been moved back inline. When \type
+{\tracingcommands} is on, this is visible because the full word is reported,
+instead of just the initial character.
+
+\section[patternsexceptions]{Loading patterns and exceptions}
+
+The hyphenation algorithm in \LUATEX\ is quite different from the one in \TEX82,
+although it uses essentially the same user input.
+
+After expansion, the argument for \type {\patterns} has to be proper \UTF8 with
+individual patterns separated by spaces; no \type {\char} or \type {\chardef}d
+commands are allowed. The current implementation is even more strict, and will
+reject all non|-|\UNICODE\ characters, but that will be changed in the future.
+For now, the generated errors are a valuable tool in discovering font-encoding
+specific pattern files.
+
+Likewise, the expanded argument for \type {\hyphenation} also has to be proper
+\UTF8, but here a tiny little bit of extra syntax is provided:
+
+\startitemize[n]
+\startitem
+ Three sets of arguments in curly braces (\type {{}{}{}}) indicate a desired
+ complex discretionary, with arguments as in the \type {\discretionary} command in
+ normal document input.
+\stopitem
+\startitem
+ A \type {-} indicates a desired simple discretionary, cf.\ \type {\-} and \type
+ {\discretionary{-}{}{}} in normal document input.
+\stopitem
+\startitem
+ Internal command names are ignored. This rule is provided especially for \type
+ {\discretionary}, but it also helps to deal with \type {\relax} commands that
+ may sneak in.
+\stopitem
+\startitem
+ An \type {=} indicates a (non|-|discretionary) hyphen in the document input.
+\stopitem
+\stopitemize
+
+The expanded argument is first converted back to a space-separated string while
+dropping the internal command names. This string is then converted into a
+dictionary by a routine that creates key|-|value pairs by converting the other
+listed items. It is important to note that the keys in an exception dictionary
+can always be generated from the values. Here are a few examples:
+
+\starttabulate[|l|l|l|]
+\NC \ssbf value \NC \ssbf implied key (input) \NC \ssbf effect \NC\NR
+\NC \type {ta-ble} \NC table \NC \type {ta\-ble} ($=$ \type {ta\discretionary{-}{}{}ble}) \NC\NR
+\NC \type {ba{k-}{}{c}ken} \NC backen \NC \type {ba\discretionary{k-}{}{c}ken} \NC\NR
+\stoptabulate
+
+The resultant patterns and exception dictionary will be stored under the language
+code that is the present value of \type {\language}.
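+
+As a sketch, the two values from the table could be stored like this, where the
+language number~1 is only an assumption:
+
+\starttyping
+\language=1   % hypothetical language number
+\hyphenation{ta-ble ba{k-}{}{c}ken}
+\stoptyping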
+
+In the last line of the table, you see there is no \type {\discretionary} command
+in the value: the command is optional in the \TEX-based input syntax. The
+underlying reason for that is that it is conceivable that a whole dictionary of
+words is stored as a plain text file and loaded into \LUATEX\ using one of the
+functions in the \LUA\ \type {lang} library. This loading method is quite a bit
+faster than going through the \TEX\ language primitives, but some (most?) of that
+speed gain would be lost if it had to interpret command sequences while doing so.
+
+It is possible to specify extra hyphenation points in compound words by using
+\type {{-}{}{-}} for the explicit hyphen character (replace \type {-} by the
+actual explicit hyphen character if needed). For example, this matches the word
+\quote {multi|-|word|-|boundaries} and allows an extra break in between \quote
+{boun} and \quote {daries}:
+
+\starttyping
+\hyphenation{multi{-}{}{-}word{-}{}{-}boun-daries}
+\stoptyping
+
+The motivation behind the \ETEX\ extension \type {\savinghyphcodes} was that
+hyphenation heavily depended on font encodings. This is no longer true in
+\LUATEX, and the corresponding primitive is ignored pending complete removal. The
+future semantics of \type {\uppercase} and \type {\lowercase} are still under
+consideration; no changes have taken place yet.
+
+\section{Applying hyphenation}
+
+The internal structures \LUATEX\ uses for the insertion of discretionaries in
+words are very different from the ones in \TEX82, and that means there are some
+noticeable differences in handling as well.
+
+First and foremost, there is no \quote {compressed trie} involved in hyphenation.
+The algorithm still reads \PATGEN-generated pattern files, but \LUATEX\ uses a
+finite state hash to match the patterns against the word to be hyphenated. This
+algorithm is based on the \quote {libhnj} library used by \OPENOFFICE, which in
+turn is inspired by \TEX. The memory allocation for this new implementation is
+completely dynamic, so the \WEBC\ setting for \type {trie_size} is ignored.
+
+Differences between \LUATEX\ and \TEX82 that are a direct result of that:
+
+\startitemize
+\startitem
+ \LUATEX\ happily hyphenates the full \UNICODE\ character range.
+\stopitem
+\startitem
+ Pattern and exception dictionary size is limited by the available memory
+ only; all allocations are done dynamically. The trie|-|related settings in
+ \type {texmf.cnf} are ignored.
+\stopitem
+\startitem
+ Because there is no \quote {trie preparation} stage, language patterns never
+ become frozen. This means that the primitive \type {\patterns} (and its \LUA\
+ counterpart \type {lang.patterns}) can be used at any time, not only in
+ ini\TEX.
+\stopitem
+\startitem
+ Only the string representation of \type {\patterns} and \type {\hyphenation} is
+ stored in the format file. At format load time, they are simply
+ re|-|evaluated. It follows that there is no real reason to preload languages
+ in the format file. In fact, it is usually not a good idea to do so. It is
+ much smarter to load patterns no sooner than the first time they are actually
+ needed.
+\stopitem
+\startitem
+ \LUATEX\ uses the language-specific variables \type {\prehyphenchar} and \type
+ {\posthyphenchar} in the creation of implicit discretionaries, instead of
+ \TEX82's \type {\hyphenchar}, and the values of the language|-|specific variables
+ \type {\preexhyphenchar} and \type {\postexhyphenchar} for explicit
+ discretionaries (instead of \TEX82's empty discretionary).
+\stopitem
+\startitem
+ The values of the two counters related to hyphenation, \type {\hyphenpenalty}
+ and \type {\exhyphenpenalty}, are now stored in the discretionary nodes. This
+ permits a local overload for explicit \type {\discretionary} commands. The
+ value current when the hyphenation pass is applied is used. When no callbacks
+ are used this is compatible with traditional \TEX. When you apply the \LUA\
+ \type {lang.hyphenate} function the current values are used.
+\stopitem
+\stopitemize
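+
+Because patterns never become frozen, a sketch like the following is possible in
+the middle of a run. The language number and the patterns are toy values chosen
+for illustration, not a real pattern set:
+
+\starttyping
+\language=1         % hypothetical language number
+\patterns{1a 1e 1o} % toy patterns: allow a break before a, e and o
+% ... some text is typeset ...
+\patterns{1u}       % later call for the same language, outside initex
+\stoptyping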
+
+Inserted characters and ligatures inherit their attributes from the nearest glyph
+node item (usually the preceding one, but the following one for the items
+inserted at the left-hand side of a word).
+
+Word boundaries are no longer implied by font switches, but by language switches.
+One word can have two separate fonts and still be hyphenated correctly (but it
+cannot have two different languages, because the \type {\setlanguage} command
+forces a word boundary).
+
+All languages start out with \type {\prehyphenchar=`\-}, \type {\posthyphenchar=0},
+\type {\preexhyphenchar=0} and \type {\postexhyphenchar=0}. When you assign the
+values of one of these four parameters, you are actually changing the settings
+for the current \type {\language}; this behavior is compatible with \type {\patterns}
+and \type {\hyphenation}.
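+
+A sketch of such assignments follows. The language number is hypothetical, and
+repeating an explicit hyphen at the start of the next line is just one
+conceivable use of these parameters:
+
+\starttyping
+\language=1           % hypothetical language number
+\prehyphenchar=`\-    % default: hyphen at the end of the broken line
+\posthyphenchar=0     % default: nothing at the start of the next line
+\preexhyphenchar=0    % default for explicit hyphens
+\postexhyphenchar=`\- % repeat an explicit hyphen on the next line
+\stoptyping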
+
+\LUATEX\ also hyphenates the first word in a paragraph. Words can be up to 256
+characters long (up from 64 in \TEX82). Longer words generate an error right now,
+but eventually either the limitation will be removed or perhaps it will become
+possible to silently ignore the excess characters (this is what happens in
+\TEX82, but there the behavior cannot be controlled).
+
+If you are using the \LUA\ function \type {lang.hyphenate}, you should be aware
+that this function expects to receive a list of \quote {character} nodes. It will
+not operate properly in the presence of \quote {glyph}, \quote {ligature}, or
+\quote {ghost} nodes, nor does it know how to deal with kerning. In the near
+future, it will be able to skip over \quote {ghost} nodes, and we may add a less
+fuzzy function you can call as well.
+
+The hyphenation exception dictionary is maintained as a key|-|value hash, and that
+is also dynamic, so the \type {hyph_size} setting is not used either.
+
+\section{Applying ligatures and kerning}
+
+After all possible hyphenation points have been inserted in the list, \LUATEX\
+will process the list to convert the \quote {character} nodes into \quote {glyph}
+and \quote {ligature} nodes. This is actually done in two stages: first all
+ligatures are processed, then all kerning information is applied to the result
+list. But those two stages are somewhat dependent on each other: If the used font
+makes it possible to do so, the ligaturing stage adds virtual \quote {character}
+nodes to the word boundaries in the list. While doing so, it removes and
+interprets \type {noboundary} nodes. The kerning stage deletes those word
+boundary items after it is done with them, and it does the same for \quote
+{ghost} nodes. Finally, at the end of the kerning stage, all remaining \quote
+{character} nodes are converted to \quote {glyph} nodes.
+
+This work separation is worth mentioning because, if you overrule from \LUA\ only
+one of the two callbacks related to font handling, then you have to perform the
+tasks normally done by \LUATEX\ itself, so that the other, non|-|overruled,
+routine continues to function properly.
+
+Work in this area is not yet complete, but most of the possible cases are handled
+by our rewritten ligaturing engine. We are working hard to make sure all of the
+possible inputs will become supported soon.
+
+For example, take the word \type {office}, hyphenated \type {of-fice}, using a
+\quote {normal} font with all the \type {f}-\type {f} and \type {f}-\type {i}
+type ligatures:
+
+\starttabulate[|l|l|]
+\NC Initial: \NC \type {{o}{f}{f}{i}{c}{e}} \NC\NR
+\NC After hyphenation: \NC \type {{o}{f}{{-},{},{}}{f}{i}{c}{e}} \NC\NR
+\NC First ligature stage: \NC \type {{o}{{f-},{f},{<ff>}}{i}{c}{e}} \NC\NR
+\NC Final result: \NC \type {{o}{{f-},{<fi>},{<ffi>}}{c}{e}} \NC\NR
+\stoptabulate
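+
+Written by hand, the final result in this table would roughly correspond to the
+following input. This is only a sketch, which assumes that the font combines the
+letters inside each argument into the ligatures shown:
+
+\starttyping
+o\discretionary{f-}{fi}{ffi}ce
+\stoptyping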
+
+That's bad enough, but let us assume that there is also a hyphenation point
+between the \type {f} and the \type {i}, to create \type {of-f-ice}. Then the
+final result should be:
+
+\starttyping
+{o}{{f-},
+ {{f-},
+ {i},
+ {<fi>}},
+ {{<ff>-},
+ {i},
+ {<ffi>}}}{c}{e}
+\stoptyping
+
+with discretionaries in the post-break text as well as in the replacement text of
+the top-level discretionary that resulted from the first hyphenation point.
+
+Here is that nested solution again, in a different representation:
+
+\starttabulate[|l|l|l|l|]
+\NC \NC pre \NC post \NC replace \NC \NR
+\NC topdisc \NC \type {f-}$^1$ \NC sub1 \NC sub2 \NC \NR
+\NC sub1 \NC \type {f-}$^2$ \NC \type {i}$^3$ \NC \type {<fi>}$^4$ \NC \NR
+\NC sub2 \NC \type {<ff>-}$^5$\NC \type {i}$^6$ \NC \type {<ffi>}$^7$ \NC \NR
+\stoptabulate
+
+When line breaking is choosing its breakpoints, the following fields will
+eventually be selected:
+
+\starttabulate[|l|l|l|]
+\NC \type {of-f-ice} \NC \type {f-}$^1$ \NC \NR
+\NC \NC \type {f-}$^2$ \NC \NR
+\NC \NC \type {i}$^3$ \NC \NR
+\NC \type {of-fice} \NC \type {f-}$^1$ \NC \NR
+\NC \NC \type {<fi>}$^4$ \NC \NR
+\NC \type {off-ice} \NC \type {<ff>-}$^5$ \NC \NR
+\NC \NC \type {i}$^6$ \NC \NR
+\NC \type {office} \NC \type {<ffi>}$^7$ \NC \NR
+\stoptabulate
+
+The current solution in \LUATEX\ is not able to handle nested discretionaries,
+but it is in fact smart enough to handle this fictional \type {of-f-ice} example.
+It does so by combining two sequential discretionary nodes as if they were a
+single object (where the second discretionary node is treated as an extension of
+the first node).
+
+One can observe that the \type {of-f-ice} and \type {off-ice} cases both end with
+the same actual post replacement list (\type {i}), and that this would be the
+case even if that \type {i} was the first item of a potential following ligature
+like \type {ic}. This allows \LUATEX\ to do away with one of the fields, and thus
+make the whole stuff fit into just two discretionary nodes.
+
+The mapping of the seven list fields to the six fields in this discretionary node
+pair is as follows:
+
+\starttabulate[|l|p|]
+\NC \bf field \NC \bf description \NC \NR
+\NC \type {disc1.pre} \NC \type {f-}$^1$ \NC \NR
+\NC \type {disc1.post} \NC \type {<fi>}$^4$ \NC \NR
+\NC \type {disc1.replace} \NC \type {<ffi>}$^7$ \NC \NR
+\NC \type {disc2.pre} \NC \type {f-}$^2$ \NC \NR
+\NC \type {disc2.post} \NC \type {i}$^{3{,}6}$\NC \NR
+\NC \type {disc2.replace} \NC \type {<ff>-}$^5$\NC \NR
+\stoptabulate
+
+What is actually generated after ligaturing has been applied is therefore:
+
+\starttyping
+{o}{{f-},
+ {<fi>},
+ {<ffi>}}
+ {{f-},
+ {i},
+ {<ff>-}}{c}{e}
+\stoptyping
+
+The two discretionaries have different subtypes from a discretionary appearing on
+its own: the first has subtype 4, and the second has subtype 5. The need for
+these special subtypes stems from the fact that not all of the fields appear in
+their \quote {normal} location. The second discretionary especially looks odd,
+with things like the \type {<ff>-} appearing in \type {disc2.replace}. The fact
+that some of the fields have different meanings (and different processing code
+internally) is what makes it necessary to have different subtypes: this enables
+\LUATEX\ to distinguish this sequence of two joined discretionary nodes from the
+case of two standalone discretionaries appearing in a row.
+
+Of course there is still that relationship with fonts: ligatures can be implemented by
+mapping a sequence of glyphs onto one glyph, but also by selective replacement and
+kerning. This means that the above examples are just representing the traditional
+approach.
+
+\section{Breaking paragraphs into lines}
+
+This code is still almost unchanged, but because of the above|-|mentioned changes
+with respect to discretionaries and ligatures, line breaking will potentially be
+different from traditional \TEX. The actual line breaking code is still based on
+the \TEX82 algorithms, and it does not expect there to be discretionaries inside
+of discretionaries.
+
+But that situation is now fairly common in \LUATEX, due to the changes to the
+ligaturing mechanism. And also, the \LUATEX\ discretionary nodes are implemented
+slightly differently from the \TEX82 nodes: the \type {no_break} text is now
+embedded inside the disc node, where previously these nodes kept their place in
+the horizontal list (the discretionary node contained a counter indicating how
+many nodes to skip).
+
+The combined effect of these two differences is that \LUATEX\ does not always use
+all of the potential breakpoints in a paragraph, especially when fonts with many
+ligatures are used.
+
+\stopchapter
+
+\stopcomponent