Diffstat (limited to 'doc/context/sources/general/manuals/evenmore/evenmore-hyphenation.tex')
-rw-r--r-- | doc/context/sources/general/manuals/evenmore/evenmore-hyphenation.tex | 426 |
1 files changed, 426 insertions, 0 deletions
diff --git a/doc/context/sources/general/manuals/evenmore/evenmore-hyphenation.tex b/doc/context/sources/general/manuals/evenmore/evenmore-hyphenation.tex
new file mode 100644
index 000000000..50113ed27
--- /dev/null
+++ b/doc/context/sources/general/manuals/evenmore/evenmore-hyphenation.tex
@@ -0,0 +1,426 @@
+% language=us runpath=texruns:manuals/evenmore
+
+\environment evenmore-style
+
+\startcomponent evenmore-hyphenation
+
+\usebodyfont[pagella]
+
+\startchapter[title=Hyphenation]
+
+\startsection[title={Introduction}]
+
+Hyphenation is driven by the character codes. In a traditional \TEX\ such a code
+accesses a glyph in a font, which is why the font encoding mattered, but in
+\LUATEX\ we use \UNICODE\ when hyphenation is applied. \footnote {In
+\CONTEXT\ \MKII\ we also use \UTF\ patterns, which made it possible to ship
+patterns that didn't depend on a font encoding. Mojca and Arthur made \UTF\ the
+default when the (upgraded) hyphenation pattern project started.} Later, the
+character codes are adapted by the font handler where they become glyphs. There
+are moments when you don't want to hyphenate and a cheap trick is to switch to a
+language that has no hyphenation patterns. But, in a system like \CONTEXT\ that
+doesn't work well because we have lots of language bound properties. Therefore in
+\MKIV\ we set the left and right hyphen minima to extreme values, something that
+blocks hyphenation quite well. But this is not a pretty solution at all. Even
+worse, when discretionaries (\type {\discretionary}), automatic hyphens (\type
+{-}) or explicit hyphens (\type {\-}) are used, these still kick in.
+
+For that reason in \LMTX\ we have a mode variable that controls hyphenation.
+In \LUATEX\ we have primitives like \type {\compoundhyphenmode}, \type
+{\hyphenationbounds} and \type {\hyphenpenaltymode} that control how
+hyphenation and discretionary injection are handled, but when the more generic
+\type {\hyphenationmode} parameter was introduced in \LUAMETATEX\ the precursors
+were all merged into this one. One can argue that this is a form of regression
+but there are good reasons, most notably the fact that we keep these
+properties with glyph nodes so that we have better control over them in grouped
+situations: because some operations happen when the paragraph as a whole gets
+treated, local overloads would otherwise be lost. \footnote {Of course it also is
+a wink to those who complain that we add primitives to an otherwise leaner
+variant of \LUATEX, but let us not elaborate on that misunderstanding.} In any
+case it means that in \LMTX\ we have to set different parameters but that is no
+big deal because users are supposed to use the more high level interfaces.
+Instead of setting parameters to values one flips bits in \type
+{\hyphenationmode}, which in the end makes more sense and also permits
+extensions later without adding much overhead.
+
+Currently this mode parameter controls the following options:
+
+\starttabulate[|Tr|||]
+\NC \uchexnumber{\normalhyphenationcode} \NC \type{\normalhyphenationcode} \NC honour the (normal) \type{\discretionary} primitive \NC \NR
+\NC \uchexnumber{\automatichyphenationcode} \NC \type{\automatichyphenationcode} \NC turn \type {-} into (automatic) discretionaries \NC \NR
+\NC \uchexnumber{\explicithyphenationcode} \NC \type{\explicithyphenationcode} \NC turn \type {\-} into (explicit) discretionaries \NC \NR
+\NC \uchexnumber{\syllablehyphenationcode} \NC \type{\syllablehyphenationcode} \NC hyphenate (syllable) according to language \NC \NR
+\NC \uchexnumber{\uppercasehyphenationcode} \NC \type{\uppercasehyphenationcode} \NC hyphenate uppercase characters too \NC \NR
+\NC \uchexnumber{\compoundhyphenationcode} \NC \type{\compoundhyphenationcode} \NC permit break at an explicit hyphen (border cases) \NC \NR
+\NC \uchexnumber{\strictstarthyphenationcode} \NC \type{\strictstarthyphenationcode} \NC traditional \TEX\ compatibility wrt the start of a word \NC \NR
+\NC \uchexnumber{\strictendhyphenationcode} \NC \type{\strictendhyphenationcode} \NC traditional \TEX\ compatibility wrt the end of a word \NC \NR
+\NC \uchexnumber{\automaticpenaltyhyphenationcode} \NC \type{\automaticpenaltyhyphenationcode} \NC use \type {\automatichyphenpenalty} \NC \NR
+\NC \uchexnumber{\explicitpenaltyhyphenationcode} \NC \type{\explicitpenaltyhyphenationcode} \NC use \type {\explicithyphenpenalty} \NC \NR
+\NC \uchexnumber{\permitgluehyphenationcode} \NC \type{\permitgluehyphenationcode} \NC turn glue in discretionaries into kerns \NC \NR
+\stoptabulate
+
+The default \CONTEXT\ setup is:
+
+\starttyping
+\hyphenationmode \numexpr
+    \normalhyphenationcode
+  + \automatichyphenationcode
+  + \explicithyphenationcode
+  + \syllablehyphenationcode
+  + \uppercasehyphenationcode
+  + \compoundhyphenationcode
+  % \strictstarthyphenationcode
+  % \strictendhyphenationcode
+  + \automaticpenaltyhyphenationcode
+  + \explicitpenaltyhyphenationcode
+  + \permitgluehyphenationcode
+\relax
+\stoptyping
+
+When a discretionary node is created (triggered by \type {\discretionary}) the
+current value is used. Injected glyph nodes on the other hand will store the
+current value and use that when it is needed for hyphenating the list.
+
+\stopsection
+
+\startsection[title={Controlling hyphenation}]
+
+We start with an example that has some Dutch words:
+
+\startbuffer[sample]
+NEDERLANDS\par Nederlands\par nederlands\par
+\CONTEXT \par test\-test\par test-test \par
+\stopbuffer
+
+\typebuffer[sample]
+
+\startbuffer[result]
+\startlinecorrection
+\dontleavehmode \dorecurse{\boxlines\scratchboxone} {%
+    \setbox\scratchbox\boxline\scratchboxone#1%
+    \ruledhpack{\strut\unhbox\scratchbox}%
+    \kern.25\emwidth
+}
+\stoplinecorrection
+\stopbuffer
+
+When we typeset this with a \type {\hsize} of 2mm we get:
+
+\setbox\scratchboxone\vbox{\dontcomplain \nl \hsize 2mm \getbuffer[sample]}
+
+\getbuffer[result]
+
+But when we block hyphenation with \type {\nohyphens} we see:
+
+\setbox\scratchboxone\vbox{\dontcomplain \nl \hsize 2mm \nohyphens \getbuffer[sample]}
+
+\getbuffer[result]
+
+The \MKIV\ behavior can be emulated by setting the mode as follows:
+
+\startbuffer[demo]
+\bitwiseflip \hyphenationmode \syllablehyphenationcode
+\stopbuffer
+
+\setbox\scratchboxone\vbox{\dontcomplain \nl \hsize 2mm \getbuffer[demo] \getbuffer[sample]}
+
+\getbuffer[result]
+
+This time the three non|-|syllable variants get hyphenated and that is not what
+we want. In this case there is a \type {\discretionary} in the definition of the
+macro that generates \CONTEXT\ and, apart from the fact that we might not even
+want to hyphenate logos, we have to block it when we apply \type {\nohyphens}.
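+
+A minimal sketch of the opposite selection, assuming that the codes combine as
+independent bits in the way the default setup suggests: only the syllable and
+uppercase options are set, so within the group the \type {\discretionary},
+\type {-} and \type {\-} cases should not introduce break points while language
+driven hyphenation still applies. This is an untested illustration and not a
+\CONTEXT\ default.
+
+\starttyping
+\begingroup
+  \hyphenationmode \numexpr
+      \syllablehyphenationcode
+    + \uppercasehyphenationcode
+  \relax
+  NEDERLANDS\par nederlands\par test\-test\par test-test\par
+\endgroup
+\stoptyping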
+
+These mode settings are applied directly to the three non|-|syllable variants but
+are delayed for the syllable discretionaries because hyphenation happens later, so
+the state becomes a property of glyph nodes. Doing the same for the other
+discretionaries would demand an adaptation of various pieces of the engine code,
+and plugged in user (\LUA) code would also have to consider it, which makes no
+sense.
+
+\startbuffer[sample]
+\nohyphens nederlands {\dohyphens nederlands} nederlands\par
+\stopbuffer
+
+\typebuffer[sample]
+
+\setbox\scratchboxone\vbox{\dontcomplain \nl \hsize 2mm \getbuffer[sample]}
+\getbuffer[result]
+
+Compare this with:
+
+\startbuffer[sample]
+nederlands {\nohyphens nederlands} nederlands\par
+\stopbuffer
+
+\typebuffer[sample]
+
+\setbox\scratchboxone\vbox{\dontcomplain \nl \hsize 2mm \getbuffer[sample]}
+\getbuffer[result]
+
+\stopsection
+
+\startsection[title={Compound hyphenation}]
+
+Yet another discretionary related issue is with compound words, that is: cases
+where \type {\discretionary} commands sit between words. There are of course
+tricks to deal with it like adding a huge penalty combined with a zero skip. This
+is okay in a traditional \TEX\ engine but in an opened up one you might not want
+this. Just to mention one aspect: when processing \OPENTYPE\ fonts you actually
+need to look into discretionaries in order to deal with glyphs that interact. And
+you don't want to deal with penalties and skips unless they have an explicit
+meaning.
+We show the four possibilities:
+
+\startbuffer[sample]
+nederlands\discretionary {!}{!}{!}nederlands\blank
+\stopbuffer
+
+\typebuffer[sample]
+
+\setbox\scratchboxone\vbox{\dontcomplain \nl \hsize 2mm \getbuffer[sample]}
+\getbuffer[result]
+
+\startbuffer[sample]
+nederlands\discretionary options 1 {!}{!}{!}nederlands\blank
+\stopbuffer
+
+\typebuffer[sample]
+
+\setbox\scratchboxone\vbox{\dontcomplain \nl \hsize 2mm \getbuffer[sample]}
+\getbuffer[result]
+
+\startbuffer[sample]
+nederlands\discretionary options 2 {!}{!}{!}nederlands\blank
+\stopbuffer
+
+\typebuffer[sample]
+
+\setbox\scratchboxone\vbox{\dontcomplain \nl \hsize 2mm \getbuffer[sample]}
+\getbuffer[result]
+
+\startbuffer[sample]
+nederlands\discretionary options 3 {!}{!}{!}nederlands\blank
+\stopbuffer
+
+\typebuffer[sample]
+
+\setbox\scratchboxone\vbox{\dontcomplain \nl \hsize 2mm \getbuffer[sample]}
+\getbuffer[result]
+
+Here is an example of such an interference. Of course in practice this seldom
+happens and certainly not with ligatures. Some fonts have kerning between certain
+glyphs and, for instance, dashes, and there it could matter.
+
+\startbuffer
+ef%
+\penalty \plustenthousand
+\hskip \zeropoint
+\discretionary{-}{f}{f}%
+\penalty \plustenthousand
+\hskip \zeropoint
+e
+ef\discretionary options 3 {-}{f}{f}e
+\stopbuffer
+
+\typebuffer
+
+As you can see, we only get the ligature when we set the options. In the process
+of processing \OPENTYPE\ features it can be that one actually loses a
+discretionary, although we try to prevent this when possible.
+
+\startlinecorrection
+\scale[height=2cm]{\setupbodyfont[pagella]\showglyphs\getbuffer}
+\stoplinecorrection
+
+But, as said, the fact that we don't need the penalties and glue helps at the
+\LUA\ end: the cleaner the node list, the better.
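+
+As a sketch of how this could be wrapped for compound words, assuming that the
+\type {options 3} variant is the one wanted: the macro name \type
+{\compoundbreak} is hypothetical and not part of \CONTEXT.
+
+\starttyping
+% \compoundbreak is a made up name used for illustration only
+\protected\def\compoundbreak{\discretionary options 3 {-}{}{-}}
+
+wiskunde\compoundbreak onderwijs
+\stoptyping
+
+Compared to the penalty and skip approach this leaves just one discretionary
+node between the two words.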
+
+\stopsection
+
+\startsection[title={Tracing}]
+
+The already present tracker command has been extended to handle the options:
+
+\startbuffer[sample0]
+\enabletrackers[discretionaries]
+\stopbuffer
+\startbuffer[sample1]
+test\discretionary {]} {[} {[]}test
+\stopbuffer
+\startbuffer[sample2]
+testing\discretionary {]} {[} {[]}testing
+\stopbuffer
+\startbuffer[sample3]
+testing\discretionary options 3 {]} {[} {[]}testing
+\stopbuffer
+
+\typebuffer[sample0,sample1,sample2,sample3]
+
+\setbox\scratchboxone\vbox{\dontcomplain \getbuffer[sample0,sample1]} \getbuffer[result]
+\setbox\scratchboxone\vbox{\dontcomplain \hsize 2mm \getbuffer[sample0,sample2]} \getbuffer[result]
+\setbox\scratchboxone\vbox{\dontcomplain \hsize 2mm \getbuffer[sample0,sample3]} \getbuffer[result]
+
+\stopsection
+
+\startsection[title={Glue in discretionaries}]
+
+In case you cannot predict what goes into a discretionary you can run into
+an error message with respect to unsupported glue in a disc node. The mode value
+\number\permitgluehyphenationcode\space makes glue acceptable and turns it into a
+kern, as demonstrated here:
+
+\startbuffer
+{\hsize 1mm \darkblue \discretionary{potential conspiracy}{prophets}{influencers}\par}
+\stopbuffer
+
+\typebuffer
+
+The line break occurs but the space in the pre part is of course frozen:
+
+{\getbuffer}
+
+As usual \TEX\ users will come up with applications.
+
+\stopsection
+
+\startsection[title={Penalties}]
+
+By default the par builder will use the value of \type {\hyphenpenalty} that gets
+stored in the discretionary node. However, when the \type {\discretionary} is
+followed by a \type {penalty} keyword and a number, that one will be used instead.
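+
+A minimal sketch of that keyword, assuming it is given directly after the
+command in the same way as the \type {options} keyword in the earlier examples.
+The value 5000 is an arbitrary choice:
+
+\starttyping
+testing\discretionary penalty 5000 {-}{}{}testing
+\stoptyping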
+
+\stopsection
+
+\startsection[title=Exceptions]
+
+At some point a user on the \CONTEXT\ mailing list wondered how to deal with a case
+like this:
+
+\startbuffer[example]
+\switchtobodyfont[pagella]\mainlanguage[de]auffasse
+\stopbuffer
+
+\typebuffer[example]
+
+\startlinecorrection
+\scale[height=2cm]{\inlinebuffer[example]}
+\stoplinecorrection
+
+\startbuffer
+\startexceptions[de]
+au{f-}{-f}{ff}(f\zwnj f)asse
+\stopexceptions
+\stopbuffer
+
+In \LUAMETATEX\ you can block the unwanted ligature using this trick:
+
+\typebuffer \getbuffer
+
+\startlinecorrection
+\scale[height=2cm]{\inlinebuffer[example]}
+\stoplinecorrection
+
+The exception mechanism in \LUATEX\ and therefore \LUAMETATEX\ works as follows.
+When we have this exception:
+
+\starttyping
+au{f-}{-f}{ff}asse
+\stoptyping
+
+the engine will register that exception under \type {auffasse}, that is: the
+replacement part determines the word. When it runs into that word, it will create
+a so called discretionary node with a pre, post and replace part. However, it
+only uses the \type {ff} for a lookup and keeps the original two glyphs: these
+become the replacement text. But in \LUAMETATEX\ you can add an alternative
+replacement:
+
+\startbuffer
+\startexceptions[de]
+au{f-}{-f}{ff}(st)asse
+\stopexceptions
+\stopbuffer
+
+\typebuffer \getbuffer
+
+This time the replacement text becomes \type {st}. So we get \type {austasse} and
+it is that sequence that is seen by the font handler when it applies its tricks.
+On some fonts however
+
+\startbuffer[example]
+\switchtobodyfont[pagella]\mainlanguage[de]auffasse
+\stopbuffer
+
+\startlinecorrection
+\scale[height=2cm]{\showglyphs\showfontkerns\inlinebuffer[example]}
+\stoplinecorrection
+
+But in the Pagella font that we use here, a kern is added between the \type {s} and
+the \type {t}.
+If you don't want that you can say this:
+
+\startbuffer
+\startexceptions[de]
+au{f-}{-f}{ff}(s\zwnj t)asse
+\stopexceptions
+\stopbuffer
+
+\typebuffer \getbuffer
+
+\startlinecorrection
+\scale[height=2cm]{\showglyphs\showfontkerns\inlinebuffer[example]}
+\stoplinecorrection
+
+A \type {zwj} will block a ligature (some fonts have an \type {st} ligature) and a
+\type {zwnj} blocks ligatures as well as kerns.
+
+You can actually abuse this mechanism for trickery like this:
+
+\startbuffer
+\startexceptions[nl]
+wis-kun-d{e-}{o}{eo}(e-o)n-der-wijs
+\stopexceptions
+\stopbuffer
+
+\typebuffer \getbuffer
+
+The Dutch word \type {wiskundeonderwijs} is found as exception and comes out like
+this:
+
+\startbuffer[example]
+\switchtobodyfont[pagella]\mainlanguage[nl]wiskundeonderwijs
+\stopbuffer
+
+\startlinecorrection
+\scale[height=1cm]{\showglyphs\showfontkerns\inlinebuffer[example]}
+\stoplinecorrection
+
+Watch the hyphen that makes the compound word more visible! The other hyphens in
+the exception are proper hyphenation points and when a break happens there a
+hyphen is automatically added. The \type {\nokerning} and \type {\noligaturing}
+macros can be used grouped:
+
+\startbuffer[example]
+{every}\quad
+{\nokerning every}\quad
+{\noligaturing every}\quad
+{e{\nokerning v}ery}\quad
+{e{\glyphoptions\noleftkernglyphoptioncode v}ery}\quad
+{e{\glyphoptions\norightkernglyphoptioncode v}ery}\quad
+\stopbuffer
+
+\typebuffer[example]
+
+There are several low level control options. In addition to those shown here we
+have a pair for ligatures: \typ {\noleftligatureglyphoptioncode} and \typ
+{\norightligatureglyphoptioncode}.
+
+\startlinecorrection[blank]
+\scale[width=\textwidth]{\showglyphs\showfontkerns\inlinebuffer[example]}
+\stoplinecorrection
+
+There are alternative mechanisms, like a blocker that implements a font feature
+and a replacement mechanism, but these are not discussed here.
+
+\stopsection
+
+\stopchapter
+
+\stopcomponent