% language=uk

\environment mk-environment

\startcomponent mk-order

\chapter{The order of things}

Normally the text that makes up a paragraph comes directly from the input
stream or from macro expansions (think of labels). When \TEX\ has collected
enough content to make a paragraph, for instance because a \type {\par} token
signals it, \TEX\ will try to create one. The raw material available for making
such a paragraph is linked in a list of nodes: references to glyphs in a font,
kerns (fixed spacing), glue (flexible spacing), penalties (consider them to be
directives), and whatsits (these can be anything, e.g.\ \PDF\ literals or
hyperlinks). The result is a list of horizontal boxes (wrappers with lists that
represent \quote {lines}) and this is either wrapped in a vertical box or added
to the main vertical list that keeps the page stream.

The treatment consists of four activities:

\startitemize[packed]
\item construction of ligatures (an f plus an i can become fi)
\item hyphenation of words that cross a line boundary
\item kerning of characters based on information in the font
\item breaking the list into lines in the most optimal way
\stopitemize

The process of breaking into lines is also influenced by protrusion (like
hanging punctuation) and expansion (hz-optimization) but here we will not take
these processes into account. There are numerous variables that control the
process and the quality.

These activities are rather interwoven and optimized. For instance, in order to
hyphenate, ligatures have to be decomposed and|/|or constructed. Hyphenation
happens when needed.
Decisions about optimal breakpoints in lines can be influenced by penalties
(like: not too many hyphenated words in a row) and by permitting extra stretch
between words. Because a paragraph can be boxed and unboxed, decomposed and fed
into the machinery again, information is kept around. Just imagine the
following: you want to measure the width of a word and therefore you box it. In
order to get the right dimensions, \TEX\ has to construct the ligatures and add
kerns. However, when we unbox that word and feed it into the paragraph builder,
potential hyphenation points have to be consulted, and such a point might lie
between the characters that resulted in the ligature. You can imagine that
adding (and removing) inter|-|character kerns complicates the process even
more.

At the cost of some extra runtime and memory usage, in \LUATEX\ these steps are
more isolated. There is a function that builds ligatures, one that kerns
characters, and another one that hyphenates all words in a list, not just the
ones that are candidates for breaking. The potential breakpoints (called
discretionaries) can contain ligature information as well. The linebreak
process is also a separate function.

The order in which this happens now is:

\startitemize[packed,intro]
\item hyphenation of words
\item building of ligatures from sequences of glyphs
\item kerning of glyphs
\item breaking all this into lines
\stopitemize

One can discuss endlessly about the terminology here: are we dealing with
characters or with glyphs? When a glyph node is made, it contains a reference
to a slot in a font. Because in traditional \TEX\ the number of slots is
limited to 256, the relationship between a character in the input and the shape
in the font, called a glyph, is kind of indirect (the input encoding versus
font encoding issue), while in \LUATEX\ we can keep the font in \UNICODE\
encoding if we want.
In traditional \TEX, hyphenation is based on the font encoding and therefore on
glyphs, and although in \LUATEX\ this is still the case, there we can more
safely talk of characters till we start mapping them to shapes that have no
\UNICODE\ point. This is of course macro package dependent, but in \CONTEXT\
\MKIV\ we normalize all input to \UNICODE\ exclusively.

The last step is now really isolated and for that reason we can best talk in
terms of preparation of the to|-|be paragraph when we refer to the first three
activities. In \LUATEX\ these three are available as functions that operate on
a node list. They each have their own callback so we can disable them by
replacing the default functions by dummies. Then we can hook a new function
into the two places that matter: \type {hpack_filter} and \type
{pre_linebreak_filter} and move the preparation to there.

A simple overload is shown below. Because the first node is always a whatsit
that holds directional information (and at some point in the future maybe even
more paragraph related state info), we can safely assume that \type {head} does
not change. Of course this situation might change when you start adding your
own functionality.

\starttyping
local function my_preparation(head)
    local tail = node.slide(head)           -- also adds prev pointers
    lang.hyphenate(head,tail)               -- returns done
    head, tail = node.ligaturing(head,tail) -- returns head, tail, done
    head, tail = node.kerning(head,tail)    -- returns head, tail, done
    return head
end

callback.register("pre_linebreak_filter", my_preparation)
callback.register("hpack_filter", my_preparation)

local dummy = function(head,tail) return tail end

callback.register("hyphenate", dummy)
callback.register("ligaturing", dummy)
callback.register("kerning", dummy)
\stoptyping

It might be clear that the order of actions matters. It might also be clear
that you are responsible for that order yourself.
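
To see why the order matters, consider the following variant (a deliberately
wrong sketch for illustration only, not code from any macro package): when
ligaturing runs before hyphenation, characters may already have been combined
into ligature glyphs by the time the hyphenation patterns, which operate on
characters, get to see the words.

\starttyping
-- a deliberately wrong preparation order (illustration only)

local function my_bad_preparation(head)
    local tail = node.slide(head)
    head, tail = node.ligaturing(head,tail) -- glyphs get combined here ...
    lang.hyphenate(head,tail)               -- ... so the patterns see
    head, tail = node.kerning(head,tail)    --     ligatures, not characters
    return head
end
\stoptyping

Running hyphenation first side|-|steps this problem, because the
discretionaries it creates can carry the ligature information along.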
There is no pre||cooked mechanism for guarding your actions and there are
several reasons for this:

\startitemize

\item Each macro package does things its own way, so any hard-coded mechanism
would be replaced and overloaded anyway. Compare this to the usage of catcodes,
font systems, auxiliary files, user interfaces, handling of inserts, etc. The
combination of callbacks, the three mentioned functions and the availability of
\LUA\ makes it possible to implement any system you like.

\item Macro packages might want to provide hooks for specialized node list
processing, and since there are many places where code can be hooked in, some
kind of oversight is needed (real people who keep track of interference of user
supplied features; no program can do that).

\item User functions can mess up the node list and successive actions then
might make the wrong assumptions. In order to guard against this, macro
packages might add tracing options, and again there are too many ways to
communicate with users. Debugging and tracing have to be embedded in the bigger
system in a natural way.

\stopitemize

In \CONTEXT\ \MKIV\ there are already a few places where users can hook code
into the task list, but so far we haven't really encouraged that. The
interfaces are simply not stable enough yet. On the other hand, there are
already quite some node list manipulators at work. The most prominent one is
the \OPENTYPE\ feature handler. That one replaces the ligature and kerning
functions (at least for some fonts). It also means that we need to keep an eye
on possible interference between \CONTEXT\ \MKIV\ mechanisms and those provided
by \LUATEX.

For fonts, that is actually quite simple: the \LUATEX\ functions use ligature
and kerning information stored in the \TFM\ table, and for \OPENTYPE\ fonts we
simply don't provide that information when we define a font, so in that case
\LUATEX\ will not ligature and kern.
Users can influence this process to some extent by setting the \type {mode} for
a specific instance of a font to \type {base} or \type {node}. Because
\TYPEONE\ fonts have no features like \OPENTYPE\ fonts have, such fonts are (at
least currently) always processed in base mode.

Deep down in \CONTEXT\ we call a sequence of actions a \quote {task}. One such
task is \quote {processors} and the actions discussed so far are in this
category. Within this category we have subcategories:

\starttabulate[|l|p|]
\NC \bf subcategory \NC \bf intended usage \NC \NR
\HL
\NC before      \NC experimental (or module) plugins \NC \NR
\NC normalizers \NC cleanup and preparation handlers \NC \NR
\NC characters  \NC operations on individual characters \NC \NR
\NC words       \NC operations on words \NC \NR
\NC fonts       \NC font related manipulations \NC \NR
\NC lists       \NC manipulations on the list as a whole \NC \NR
\NC after       \NC experimental (or module) plugins \NC \NR
\stoptabulate

Here \quote {plugins} are experimental handlers or specialized ones provided in
modules that are not part of the kernel. The categories are not that
distinctive and only provide a convenient way to group actions.

Examples of normalizers are: checking for missing characters and replacing
character references by fallbacks. Character processors are for instance
directional analysers (for right to left typesetting), case swapping, and
specialized character triggered hyphenation (like compound words). Word
processors deal with hyphenation (here we use the default function provided by
\LUATEX) and spell checking. The font processors deal with \OPENTYPE\ as well
as with the ligature building and kerning of other font types. Finally, the
list processors are responsible for tasks like special spacing (french
punctuation) and kerning (additional inter||character kerning). Of course, this
all is rather \CONTEXT\ specific and we expect to add quite a few more (and
less trivial) handlers in the upcoming years.
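To give an impression of what a handler in such a subcategory can look like,
here is a sketch of a trivial normalizer that reports characters missing from
the current font. This is illustration only: the actual \CONTEXT\ code is more
involved, and the name \type {check_missing} is made up for the occasion.

\starttyping
local glyph = node.id("glyph")

local function check_missing(head,tail)
    for n in node.traverse_id(glyph,head) do
        local chars = font.fonts[n.font].characters
        if chars and not chars[n.char] then
            texio.write_nl(string.format("missing character U+%04X",n.char))
        end
    end
    return head, tail, false -- head, tail, done
end
\stoptyping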
Many of these handlers are triggered by attributes. Nodes can have many
attributes and each attribute can have many values. Traditionally \TEX\ had
only a few attributes: language and font, where the first is not even a real
attribute and the second is only bound to glyph nodes. In \LUATEX\ the
language is also a glyph property. The nice thing about attributes is that they
can be set at the \TEX\ end and obey grouping. This makes them for instance
perfect for implementing color mechanisms. Because attributes are part of the
nodes, and not nodes themselves, they don't influence or interfere with
processing unless one explicitly tests for them and acts accordingly.

In addition to the mentioned task \quote {processors} we also have a task
\quote {shipouts} and there will be more tasks in future versions of \CONTEXT.
Again we have subcategories, currently:

\starttabulate[|l|p|]
\NC \bf subcategory \NC \bf intended usage \NC \NR
\HL
\NC before      \NC experimental (or module) plugins \NC \NR
\NC normalizers \NC cleanup and preparation handlers \NC \NR
\NC finishers   \NC manipulations on the list as a whole \NC \NR
\NC after       \NC experimental (or module) plugins \NC \NR
\stoptabulate

An example of a normalizer is the cleanup of the \quote {to be shipped out}
list. Finishers deal with color, transparency, overprint, negated content
(sometimes used in page imposition), special effects (like outline fonts) and
viewer layers (something specific to \PDF). Quite possibly hyperlink support
will also be handled there, but not before the backend code is rewritten.

The previous description is far from complete. For instance, not all handlers
use the same interface: some work from \type {head} onwards, some need a \type
{tail} pointer too. Some report back success or failure. So the task handler
needs to normalize their usage. Also, some effort goes into optimizing the task
in such a way that processing the document is still reasonably fast.
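
One way to deal with those differing interfaces is to wrap each handler once
when it is registered. The following sketch assumes that a handler either
already returns the triplet head, tail and done, or just a success flag; the
name \type {normalized} is made up for this example.

\starttyping
-- wrap handlers so that they all return head, tail and done

local function normalized(fnc,returns_list)
    if returns_list then
        return fnc -- already returns head, tail, done
    else
        return function(head,tail,...)
            local done = fnc(head,tail,...)
            return head, tail, done
        end
    end
end
\stoptyping

A task runner then only has to deal with one calling convention.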
Keep in mind that each construction of a box invokes a callback, and there are
many boxes used for constructing a page. Even a nilled callback is one, so for
a simple one word paragraph four callbacks are triggered: the (nilled)
hyphenate, ligature and kern callbacks as well as the one called \type
{pre_linebreak_filter}. The task handler that we plug into the filter callbacks
calls many functions and each of them does one or more passes over the node
list, and in turn might do many calls to functions. You can imagine that we're
quite happy that \TEX\ as well as \LUA\ is so efficient.

As I already mentioned, implementing a task handler as well as deciding what
actions within tasks to perform in what order is specific for the way a macro
package is set up. The following code can serve as a starting point:

\starttyping
filters = { } -- global namespace

local list = { }

function filters.add(fnc,n)
    if not n or n > #list + 1 then
        table.insert(list,fnc)
    elseif n < 1 then
        table.insert(list,1,fnc)
    else
        table.insert(list,n,fnc)
    end
end

function filters.remove(n)
    if n and n > 0 and n <= #list then
        table.remove(list,n)
    end
end

local function run_filters(head,...)
    local tail = node.slide(head)
    for _, fnc in ipairs(list) do
        head, tail = fnc(head,tail,...)
    end
    return head
end

local function hyphenation(head,tail)
    return head, tail, lang.hyphenate(head,tail) -- returns done
end
local function ligaturing(head,tail)
    return node.ligaturing(head,tail) -- returns head, tail, done
end
local function kerning(head,tail)
    return node.kerning(head,tail) -- returns head, tail, done
end

filters.add(hyphenation)
filters.add(ligaturing)
filters.add(kerning)

callback.register("pre_linebreak_filter", run_filters)
callback.register("hpack_filter", run_filters)
\stoptyping

Although one can inject extra filters by using the \type {add} function, it may
be clear that this can be dangerous due to interference.
Therefore a slightly more secure variant is the following, where \type {main}
is reserved for macro package actions and the others can be used by add||ons.

\starttyping
filters = { } -- global namespace

local list = {
    pre = { }, main = { }, post = { },
}

local order = {
    "pre", "main", "post"
}

local function somewhere(where)
    if not where then
        texio.write_nl("error: invalid filter category")
    elseif not list[where] then
        texio.write_nl(string.format("error: invalid filter category '%s'",where))
    else
        return list[where]
    end
    return false
end

function filters.add(where,fnc,n)
    local list = somewhere(where)
    if not list then
        -- error already reported
    elseif not n or n > #list + 1 then
        table.insert(list,fnc)
    elseif n < 1 then
        table.insert(list,1,fnc)
    else
        table.insert(list,n,fnc)
    end
end

function filters.remove(where,n)
    local list = somewhere(where)
    if list and n and n > 0 and n <= #list then
        table.remove(list,n)
    end
end

local function run_filters(head,...)
    local tail = node.slide(head)
    for _, category in ipairs(order) do
        for _, fnc in ipairs(list[category]) do
            head, tail = fnc(head,tail,...)
        end
    end
    return head
end

filters.add("main",hyphenation)
filters.add("main",ligaturing)
filters.add("main",kerning)

callback.register("pre_linebreak_filter", run_filters)
callback.register("hpack_filter", run_filters)
\stoptyping

Of course, \CONTEXT\ users who try to use this code will be punished by losing
much of the functionality already present, simply because we use yet another
variant of the above code.

\stopcomponent