diff options
author | Hans Hagen <pragma@wxs.nl> | 2021-05-16 11:46:45 +0200 |
---|---|---|
committer | Context Git Mirror Bot <phg@phi-gamma.net> | 2021-05-16 11:46:45 +0200 |
commit | 330909ad62342ff873dc758b909968c66d0252a4 (patch) | |
tree | 72b7552cdc6925b962badb33aa9b307d949144b0 /doc/context/sources/general/manuals/luametatex/luametatex-internals.tex | |
parent | 4396699cb99f42f6378ed7229788bbceb898851a (diff) | |
download | context-330909ad62342ff873dc758b909968c66d0252a4.tar.gz |
2021-05-15 22:44:00
Diffstat (limited to 'doc/context/sources/general/manuals/luametatex/luametatex-internals.tex')
-rw-r--r-- | doc/context/sources/general/manuals/luametatex/luametatex-internals.tex | 244 |
1 files changed, 244 insertions, 0 deletions
diff --git a/doc/context/sources/general/manuals/luametatex/luametatex-internals.tex b/doc/context/sources/general/manuals/luametatex/luametatex-internals.tex new file mode 100644 index 000000000..70d45cfeb --- /dev/null +++ b/doc/context/sources/general/manuals/luametatex/luametatex-internals.tex @@ -0,0 +1,244 @@ +% language=uk + +\environment luametatex-style + +\startcomponent luametatex-internals + +\startchapter[reference=internals,title={The internals}] + +\topicindex{nodes} +\topicindex{boxes} +\topicindex{\LUA} + +This is a reference manual and not a tutorial. This means that we discuss changes +relative to traditional \TEX\ and also present new (or extended) functionality. +As a consequence we will refer to concepts that we assume to be known or that +might be explained later. Because the \LUATEX\ and \LUAMETATEX\ engines open up +\TEX\ there's suddenly quite some more to explain, especially about the way a (to +be) typeset stream moves through the machinery. However, discussing all that in +detail makes not much sense, because deep knowledge is only relevant for those +who write code not possible with regular \TEX\ and who are already familiar with +these internals (or willing to spend time on figuring it out). + +So, the average user doesn't need to know much about what is in this manual. For +instance fonts and languages are normally dealt with in the macro package that +you use. Messing around with node lists is also often not really needed at the +user level. If you do mess around, you'd better know what you're dealing with. +Reading \quotation {The \TEX\ Book} by Donald Knuth is a good investment of time +then also because it's good to know where it all started. A more summarizing +overview is given by \quotation {\TEX\ by Topic} by Victor Eijkhout. You might +want to peek in \quotation {The \ETEX\ manual} too. + +But \unknown\ if you're here because of \LUA, then all you need to know is that +you can call it from within a run. If you want to learn the language, just read +the well written \LUA\ book. The macro package that you use probably will provide +a few wrapper mechanisms but the basic \lpr {directlua} command that does the job +is: + +\starttyping +\directlua{tex.print("Hi there")} +\stoptyping + +You can put code between curly braces but if it's a lot you can also put it in a +file and load that file with the usual \LUA\ commands. If you don't know what +this means, you definitely need to have a look at the \LUA\ book first. + +If you still decide to read on, then it's good to know what nodes are, so we do a +quick introduction here. If you input this text: + +\starttyping +Hi There ... +\stoptyping + +eventually we will get a linked lists of nodes, which in \ASCII\ art looks like: + +\starttyping +H <=> i <=> [glue] <=> T <=> h <=> e <=> r <=> e ... +\stoptyping + +When we have a paragraph, we actually get something like this, where a \type +{par} node stores some metadata and is followed by a \type {hlist} flagged as +indent box: + +\starttyping +[par] <=> [hlist] <=> H <=> i <=> [glue] <=> T <=> h <=> e <=> r <=> e ... +\stoptyping + +Each character becomes a so called glyph node, a record with properties like the +current font, the character code and the current language. Spaces become glue +nodes. There are many node types and nodes can have many properties but that will +be discussed later. Each node points back to a previous node or next node, given +that these exist. Sometimes multiple characters are represented by one glyph +(shape), so one can also get: + +\starttyping +[par] <=> [hlist] <=> H <=> i <=> [glue] <=> Th <=> e <=> r <=> e ... +\stoptyping + +And maybe some characters get positioned relative to each other, so we might +see: + +\starttyping +[par] <=> [hlist] <=> H <=> [kern] <=> i <=> [glue] <=> Th <=> e <=> r <=> e ... +\stoptyping + +Actually, the above representation is one view, because in \LUAMETATEX\ we can +choose for this: + +\starttyping +[par] <=> [glue] <=> H <=> [kern] <=> i <=> [glue] <=> Th <=> e <=> r <=> e ... +\stoptyping + +where glue (currently fixed) is used instead of an empty hlist (think of a \type +{\hbox}). Options like this are available because want a certain view on these +lists from the \LUA\ end and the result being predicable is part of that. + +It's also good to know beforehand that \TEX\ is basically centered around +creating paragraphs and pages. The par builder takes a list and breaks it into +lines. At some point horizontal blobs are wrapped into vertical ones. Lines are +so called boxes and can be separated by glue, penalties and more. The page +builder accumulates lines and when feasible triggers an output routine that will +take the list so far. Constructing the actual page is not part of \TEX\ but done +using primitives that permit manipulation of boxes. The result is handled back to +\TEX\ and flushed to a (often \PDF) file. + +\starttyping +\setbox\scratchbox\vbox\bgroup + line 1\par line 2 +\egroup + +\showbox\scratchbox +\stoptyping + +The above code produces the next log lines that reveal how the engines sees a +paragraph (wrapped in a \type {\vbox}): + +\starttyping[style=small] +1:4: > \box257= +1:4: \vbox[normal][16=1,17=1,47=1], width 483.69687, height 27.58083, depth 0.1416, direction l2r +1:4: .\list +1:4: ..\hbox[line][16=1,17=1,47=1], width 483.69687, height 7.59766, depth 0.1416, glue 455.40097fil, direction l2r +1:4: ...\list +1:4: ....\glue[left hang][16=1,17=1,47=1] 0.0pt +1:4: ....\glue[left][16=1,17=1,47=1] 0.0pt +1:4: ....\glue[parfillleft][16=1,17=1,47=1] 0.0pt +1:4: ....\par[newgraf][16=1,17=1,47=1], hangafter 1, hsize 483.69687, pretolerance 100, tolerance 3000, adjdemerits 10000, linepenalty 10, doublehyphendemerits 10000, finalhyphendemerits 5000, clubpenalty 2000, widowpenalty 2000, brokenpenalty 100, emergencystretch 12.0, parfillskip 0.0pt plus 1.0fil, hyphenationmode 499519 +1:4: ....\glue[indent][16=1,17=1,47=1] 0.0pt +1:4: ....\glyph[32768][16=1,17=1,47=1], language (n=1,l=2,r=3), hyphenationmode 499519, options 128 , font <30: DejaVuSerif @ 10.0pt>, glyph U+00006C l +1:4: ....\glyph[32768][16=1,17=1,47=1], language (n=1,l=2,r=3), hyphenationmode 499519, options 128 , font <30: DejaVuSerif @ 10.0pt>, glyph U+000069 i +1:4: ....\glyph[32768][16=1,17=1,47=1], language (n=1,l=2,r=3), hyphenationmode 499519, options 128 , font <30: DejaVuSerif @ 10.0pt>, glyph U+00006E n +1:4: ....\glyph[32768][16=1,17=1,47=1], language (n=1,l=2,r=3), hyphenationmode 499519, options 128 , font <30: DejaVuSerif @ 10.0pt>, glyph U+000065 e +1:4: ....\glue[space][16=1,17=1,47=1] 3.17871pt plus 1.58936pt minus 1.05957pt, font 30 +1:4: ....\glyph[32768][16=1,17=1,47=1], language (n=1,l=2,r=3), hyphenationmode 499519, options 128 , font <30: DejaVuSerif @ 10.0pt>, glyph U+000031 1 +1:4: ....\penalty[line][16=1,17=1,47=1] 10000 +1:4: ....\glue[parfill][16=1,17=1,47=1] 0.0pt plus 1.0fil +1:4: ....\glue[right][16=1,17=1,47=1] 0.0pt +1:4: ....\glue[right hang][16=1,17=1,47=1] 0.0pt +1:4: ..\glue[par][16=1,17=1,47=1] 5.44995pt plus 1.81665pt minus 1.81665pt +1:4: ..\glue[baseline][16=1,17=1,47=1] 6.79396pt +1:4: ..\hbox[line][16=1,17=1,47=1], width 483.69687, height 7.59766, depth 0.1416, glue 455.40097fil, direction l2r +1:4: ...\list +1:4: ....\glue[left hang][16=1,17=1,47=1] 0.0pt +1:4: ....\glue[left][16=1,17=1,47=1] 0.0pt +1:4: ....\glue[parfillleft][16=1,17=1,47=1] 0.0pt +1:4: ....\par[newgraf][16=1,17=1,47=1], hangafter 1, hsize 483.69687, pretolerance 100, tolerance 3000, adjdemerits 10000, linepenalty 10, doublehyphendemerits 10000, finalhyphendemerits 5000, clubpenalty 2000, widowpenalty 2000, brokenpenalty 100, emergencystretch 12.0, parfillskip 0.0pt plus 1.0fil, hyphenationmode 499519 +1:4: ....\glue[indent][16=1,17=1,47=1] 0.0pt +1:4: ....\glyph[32768][16=1,17=1,47=1], language (n=1,l=2,r=3), hyphenationmode 499519, options 128 , font <30: DejaVuSerif @ 10.0pt>, glyph U+00006C l +1:4: ....\glyph[32768][16=1,17=1,47=1], language (n=1,l=2,r=3), hyphenationmode 499519, options 128 , font <30: DejaVuSerif @ 10.0pt>, glyph U+000069 i +1:4: ....\glyph[32768][16=1,17=1,47=1], language (n=1,l=2,r=3), hyphenationmode 499519, options 128 , font <30: DejaVuSerif @ 10.0pt>, glyph U+00006E n +1:4: ....\glyph[32768][16=1,17=1,47=1], language (n=1,l=2,r=3), hyphenationmode 499519, options 128 , font <30: DejaVuSerif @ 10.0pt>, glyph U+000065 e +1:4: ....\glue[space][16=1,17=1,47=1] 3.17871pt plus 1.58936pt minus 1.05957pt, font 30 +1:4: ....\glyph[32768][16=1,17=1,47=1], language (n=1,l=2,r=3), hyphenationmode 499519, options 128 , font <30: DejaVuSerif @ 10.0pt>, glyph U+000032 2 +1:4: ....\penalty[line][16=1,17=1,47=1] 10000 +1:4: ....\glue[parfill][16=1,17=1,47=1] 0.0pt plus 1.0fil +1:4: ....\glue[right][16=1,17=1,47=1] 0.0pt +1:4: ....\glue[right hang][16=1,17=1,47=1] 0.0pt +\stoptyping + +The \LUATEX\ engine provides hooks for \LUA\ code at nearly every reasonable +point in the process: collecting content, hyphenating, applying font features, +breaking into lines, etc. This means that you can overload \TEX's natural +behaviour, which still is the benchmark. When we refer to \quote {callbacks} we +means these hooks. The \TEX\ engine itself is pretty well optimized but when you +kick in much \LUA\ code, you will notices that performance drops. Don't blame and +bother the authors with performance issues. In \CONTEXT\ over 50\% of the time +can be spent in \LUA, but so far we didn't get many complaints about efficiency. +Adding more callbacks makes no sense, also because at some point the performance +hit gets too large. There are plenty of ways to achieve goals. For that reason: +take remarks about \LUATEX, features, potential, performance etc.\ with a natural +grain of salt. + +Where plain \TEX\ is basically a basic framework for writing a specific style, +macro packages like \CONTEXT\ and \LATEX\ provide the user a whole lot of +additional tools to make documents look good. They hide the dirty details of font +management, language support, turning structure into typeset results, wrapping +pages, including images, and so on. You should be aware of the fact that when you +hook in your own code to manipulate lists, this can interfere with the macro +package that you use. Each successive step expects a certain result and if you +mess around to much, the engine eventually might bark and quit. It can even +crash, because testing everywhere for what users can do wrong is no real option. + +When you read about nodes in the following chapters it's good to keep in mind +what commands relate to them. Here are a few: + +\starttabulate[|l|l|p|] +\DB command \BC node \BC explanation \NC \NR +\TB +\NC \prm {hbox} \NC \nod {hlist} \NC horizontal box \NC \NR +\NC \prm {vbox} \NC \nod {vlist} \NC vertical box with the baseline at the bottom \NC \NR +\NC \prm {vtop} \NC \nod {vlist} \NC vertical box with the baseline at the top \NC \NR +\NC \prm {hskip} \NC \nod {glue} \NC horizontal skip with optional stretch and shrink \NC \NR +\NC \prm {vskip} \NC \nod {glue} \NC vertical skip with optional stretch and shrink \NC \NR +\NC \prm {kern} \NC \nod {kern} \NC horizontal or vertical fixed skip \NC \NR +\NC \prm {discretionary} \NC \nod {disc} \NC hyphenation point (pre, post, replace) \NC \NR +\NC \prm {char} \NC \nod {glyph} \NC a character \NC \NR +\NC \prm {hrule} \NC \nod {rule} \NC a horizontal rule \NC \NR +\NC \prm {vrule} \NC \nod {rule} \NC a vertical rule \NC \NR +\NC \prm {textdirection} \NC \nod {dir} \NC a change in text direction \NC \NR +\LL +\stoptabulate + +Whatever we feed into \TEX\ at some point becomes a token which is either +interpreted directly or stored in a linked list. A token is just a number that +encodes a specific command (operator) and some value (operand) that further +specifies what that command is supposed to do. In addition to an interface to +nodes, there is an interface to tokens, as later chapters will demonstrate. + +Text (interspersed with macros) comes from an input medium. This can be a file, +token list, macro body cq.\ arguments, some internal quantity (like a number), +\LUA, etc. Macros get expanded. In the process \TEX\ can enter a group. Inside +the group, changes to registers get saved on a stack, and restored after leaving +the group. When conditionals are encountered, another kind of nesting happens, +and again there is a stack involved. Tokens, expansion, stacks, input levels are +all terms used in the next chapters. Don't worry, they loose their magic once you +use \TEX\ a lot. You have access to most of the internals and when not, at least +it is possible to query some state we're in or level we're at. + +When we talk about pack(ag)ing it can mean two things. When \TEX\ has consumed +some tokens that represent text they are added to the current list. When the text +is put into a so called \type {\hbox} (for instance a line in a paragraph) it +(normally) first gets hyphenated, next ligatures are build, and finally kerns are +added. Each of these stages can be overloaded using \LUA\ code. When these three +stages are finished, the dimension of the content is calculated and the box gets +its width, height and depth. What happens with the box depends on what macros do +with it. + +The other thing that can happen is that the text starts a new paragraph. In that +case some information is stored in a leading \type {par} node. Then indentation +is appended and the paragraph ends with some glue. Again the three stages are +applied but this time afterwards, the long line is broken into lines and the +result is either added to the content of a box or to the main vertical list (the +running text so to say). This is called par building. At some point \TEX\ decides +that enough is enough and it will trigger the page builder. So, building is +another concept we will encounter. Another example of a builder is the one that +turns an intermediate math list into something typeset. + +Wrapping something in a box is called packing. Adding something to a list is +described in terms of contributing. The more complicated processes are wrapped +into builders. For now this should be enough to enable you to understand the next +chapters. The text is not as enlightening and entertaining as Don Knuths books, +sorry. + +\stopchapter + +\stopcomponent |