From 7b271baae19db1528fbe6621bdf50af89a5a336b Mon Sep 17 00:00:00 2001 From: Hans Hagen Date: Fri, 22 Feb 2019 20:29:46 +0100 Subject: 2019-02-22 19:43:00 --- .../general/manuals/onandon/onandon-runtoks.tex | 531 +++++++++++++++++++++ 1 file changed, 531 insertions(+) create mode 100644 doc/context/sources/general/manuals/onandon/onandon-runtoks.tex (limited to 'doc/context/sources/general/manuals/onandon/onandon-runtoks.tex') diff --git a/doc/context/sources/general/manuals/onandon/onandon-runtoks.tex b/doc/context/sources/general/manuals/onandon/onandon-runtoks.tex new file mode 100644 index 000000000..b3adeb4a5 --- /dev/null +++ b/doc/context/sources/general/manuals/onandon/onandon-runtoks.tex @@ -0,0 +1,531 @@ +% language=uk + +\startcomponent onandon-amputating + +\environment onandon-environment + +\startchapter[title={Amputating code}] + +\startsection[title={Introduction}] + +Because \CONTEXT\ is already rather old in terms of software life and because it +evolves over time, code can get replaced by better code. Reasons for this can be: + +\startitemize[packed] +\startitem a better understanding of the way \TEX\ and \METAPOST\ work \stopitem +\startitem demand for more advanced options \stopitem +\startitem a brainwave resulting in a better solution \stopitem +\startitem new functionality provided in \TEX\ engine used \stopitem +\startitem the necessity to speed up a core process \stopitem +\stopitemize + +Replacing code that in itself does a good job but is no longer the best to be +used comes with sentiments. It can be rather satisfying to cook up a +(conceptually as well as codewise) good solution and therefore removing code from +a file can result in a somewhat bad feeling and even a feeling of losing +something. Hence the title of this chapter. + +Here I will discuss one of the more complex subsystems: the one dealing with +typeset text in \METAPOST\ graphics. I will stick to the principles and not +present (much) code as that can be found in archives. This is not a tutorial, +but more a sort of wrap|-|up for myself. It anyhow show the thinking behind +this mechanism. I'll also introduce a new \LUATEX\ feature here: subruns. + +\stopsection + +\startsection[title={The problem}] + +\METAPOST\ is meant for drawing graphics and adding text to them is not really +part of the concept. Its a bit like how \TEX\ sees images: the dimensions matter, +the content doesn't. This means that in \METAPOST\ a blob of text is an +abstraction. The native way to create a typeset text picture is: + +\starttyping +picture p ; p := btex some text etex ; +\stoptyping + +In traditional \METAPOST\ this will create a temporary \TEX\ file with the words +\type {some text} wrapped in a box that when typeset is just shipped out. The +result is a \DVI\ file that with an auxiliary program will be transformed into a +\METAPOST\ picture. That picture itself is made from multiple pictures, because +each sequences of characters becomes a picture and kerns become shifts. + +There is also a primitive \type {infont} that takes a text and just converts it +into a low level text object but no typesetting is done there: so no ligatures +and no kerns are found there. In \CONTEXT\ this operator is redefined to do the +right thing. + +In both cases, what ends up in the \POSTSCRIPT\ file is references to fonts and +characters and the original idea is that \DVIPS\ understands what +fonts to embed. Details are communicated via specials (comments) that \DVIPS\ is +supposed to intercept and understand. This all happens in an 8~bit (font) universe. + +When we moved on to \PDF, a converter from \METAPOST's rather predictable and +simple \POSTSCRIPT\ code to \PDF\ was written in \TEX. The graphic operators +became \PDF\ operators and the text was retypeset using the font information and +snippets of strings and injected at the right spot. The only complication was +that a non circular pen actually produced two path of which one has to be +transformed. + +At that moment it already had become clear that a more tight integration in +\CONTEXT\ would happen and not only would that demand a more sophisticated +handling of text, but it would also require more features not present in +\METAPOST, like dealing with \CMYK\ colors, special color spaces, transparency, +images, shading, and more. All this was implemented. In the next sections we will +only discuss texts. + +\stopsection + +\startsection[title={Using the traditional method}] + +The \type {btex} approach was not that flexible because what happens is that +\type {btex} triggers the parser to just grabbing everything upto the \type +{etex} and pass that to an external program. It's special scanner mode and +because because of that using macros for typesetting texts is a pain. So, instead +of using this method in \CONTEXT\ we used \type {textext}. Before a run the +\METAPOST\ file was scanned and for each \type {textext} the argument was copied +to a file. The \type {btex} calls were scanned to and replaced by \type {textext} +calls. + +For each processed snippet the dimensions were stored in order to be loaded at +the start of the \METAPOST\ run. In fact, each text was just a rectangle with +certain dimensions. The \PDF\ converter would use the real snippet (by +typesetting it). + +Of course there had to be some housekeeping in order to make sure that the right +snippets were used, because the order of definition (as picture) can be different +from them being used. This mechanism evolved into reasonable robust text handling +but of course was limited by the fact that the file was scanned for snippets. So, +the string had to be string and not assembled one. This disadvantage was +compensated by the fact that we could communicate relevant bits of the +environment and apply all the usual context trickery in texts in a way that was +consistent with the rest of the document. + +A later implementation could communicate the text via specials which is more +flexible. Although we talk of this method in the past sense it is still used in +\MKII. + +\stopsection + +\startsection[title={Using the library}] + +When the \MPLIB\ library showed up in \LUATEX, the same approach was used but +soon we moved on to a different approach. We already used specials to communicate +extensions to the backend, using special colors and fake objects as signals. But +at that time paths got pre- and postscripts fields and those could be used to +really carry information with objects because unlike specials, they were bound to +that object. So, all extensions using specials as well as texts were rewritten to +use these scripts. + +The \type {textext} macro changed its behaviour a bit too. Remember that a +text effectively was just a rectangle with some transformation applied. However +this time the postscript field carried the text and the prescript field some +specifics, like the fact that that we are dealing with text. Using the script made +it possible to carry some more inforation around, like special color demands. + +\starttyping +draw textext("foo") ; +\stoptyping + +Among the prescripts are \typ {tx_index=trial} and \typ {tx_state=trial} +(multiple prescripts are prepended) and the postscript is \type {foo}. In a +second run the prescript is \type {tx_index=trial} and \typ {tx_state=final}. +After the first run we analyze all objects, collect the texts (those with a \type +{tx_} variables set) and typeset them. As part of the second run we pass the +dimensions of each indexed text snippet. Internally before the first run we +\quote {reset} states, then after the first run we \quote {analyze}, and after +the second run we \quote {process} as part of the conversion of output to \PDF. + +\stopsection + +\startsection[title={Using \type {runscript}}] + +When the \type {runscript} feature was introduced in the library we no longer +needed to pass the dimensions via subscripted variables. Instead we could just +run a \LUA\ snippets and ask for the dimensions of a text with some index. This +is conceptually not much different but it saves us creating \METAPOST\ code that +stored the dimensions, at the cost of potentially a bit more runtime due to the +\type {runscript} calls. But the code definitely looks a bit cleaner this way. Of +course we had to keep the dimensions at the \LUA\ end but we already did that +because we stored the preprocessed snippets for final usage. + +\stopsection + +\startsection[title={Using a sub \TEX\ run}] + +We now come the current (post \LUATEX\ 1.08) solution. For reasons I will +mention later a two pass approach is not optimal, but we can live with that, +especially because \CONTEXT\ with \METAFUN\ (which is what we're talking about +here) is quit efficient. More important is that it's kind of ugly to do all the +not that special work twice. In addition to text we also have outlines, graphics +and more mechanisms that needed two passes and all these became one pass +features. + +A \TEX\ run is special in many ways. At some point after starting up \TEX\ +enters the main loop and begins reading text and expanding macros. Normally you +start with a file but soon a macro is seen, and a next level of input is entered, +because as part of the expansion more text can be met, files can be opened, +other macros be expanded. When a macro expands a token register, another level is +entered and the same happens when a \LUA\ call is triggered. Such a call can +print back something to \TEX\ and that has to be scanned as if it came from a +file. + +When token lists (and macros) get expanded, some commands result in direct +actions, others result in expansion only and processing later as one of more +tokens can end up in the input stack. The internals of the engine operate in +miraculous ways. All commands trigger a function call, but some have their own +while others share one with a switch statement (in \CCODE\ speak) because they +belong to a category of similar actions. Some are expanded directly, some get +delayed. + +Does it sound complicated? Well, it is. It's even more so when you consider that +\TEX\ uses nesting, which means pushing and popping local assignments, knows +modes, like horizontal, vertical and math mode, keeps track of interrupts and at +the same type triggers typesetting, par building, page construction and flushing +to the output file. + +It is for this reason plus the fact that users can and will do a lot to influence +that behaviour that there is just one main loop and in many aspects global state. +There are some exceptions, for instance when the output routine is called, which +creates a sort of closure: it interrupts the process and for that reason gets +grouping enforced so that it doesn't influence the main run. But even then the +main loop does the job. + +Starting with version 1.10 \LUATEX\ provides a way to do a local run. There are +two ways provided: expanding a token register and calling a \LUA\ function. It +took a bit of experimenting to reach an implementation that works out reasonable +and many variants were tried. In the appendix we give an example of usage. + +The current variant is reasonable robust and does the job but care is needed. +First of all, as soon as you start piping something to \TEX\ that gets typeset +you'd better in a valid mode. If not, then for instance glyphs can end up in a +vertical list and \LUATEX\ will abort. In case you wonder why we don't intercept +this: we can't because we don't know the users intentions. We cannot enforce a +mode for instance as this can have side effects, think of expanding \type +{\everypar} or injecting an indentation box. Also, as soon as you start juggling +nodes there is no way that \TEX\ can foresee what needs to be copied to +discarded. Normally it works out okay but because in \LUATEX\ you can cheat in +numerous ways with \LUA, you can get into trouble. + +So, what has this to do with \METAPOST ? Well, first of all we could now use a +one pass approach. The \type {textext} macro calls \LUA, which then let \TEX\ do +some typesetting, and then gives back the dimensions to \METAPOST. The \quote +{analyze} phase is now integrated in the run. For a regular text this works quite +well because we just box some text and that's it. However, in the next section we +will see where things get complicated. + +Let's summarize the one pass approach: the \type {textext} macro creates +rectangle with the right dimensions and for doing passes the string to \LUA\ +using \type {runscript}. We store the argument of \type {textext} in a variable, +then call \type {runtoks}, which expands the given token list, where we typeset a +box with the stored text (that we fetch with a \LUA\ call), and the \type +{runscript} passes back the three dimensions as fake \RGB\ color to \METAPOST\ +which applies a \type {scantokens} to the result. So, in principle there is no +real conceptual difference except that we now analyze in|-|place instead of +between runs. I will not show the code here because in \CONTEXT\ we use a wrapper +around \type {runscript} so low level examples won't run well. + +\stopsection + +\startsection[title={Some aspects}] + +An important aspect of the text handling is that the whole text can be +transformed. Normally this is only some scaling but rotation is also quite valid. +In the first approach, the original \METAPOST\ one, we have pictures constructed +of snippets and pictures transform well as long as the backend is not too +confused, something that can happen when for instance very small or large font +scales are used. There were some limitations with respect to the number of fonts +and efficient inclusion when for instance randomization was used (I remember +cases with thousands of font instances). The \PDF\ backend could handle most +cases well, by just using one size and scaling at the \PDF\ level. All the \type +{textext} approaches use rectangles as stubs which is very efficient and permits +all transforms. + +How about color? Think of this situation: + +\starttyping +\startMPcode + draw textext("some \color[red]{text}") + withcolor green ; +\stopMPcode +\stoptyping + +And what about the document color? We suffice by saying that this is all well +supported. Of course using transparency, spot colors etc.\ also needs extensions. +These are however not directly related to texts although we need to take it into +account when dealing with the inclusion. + +\starttyping +\startMPcode + draw textext("some \color[red]{text}") + withcolor "blue" + withtransparency (1,0.5) ; +\stopMPcode +\stoptyping + +What if you have a graphic with many small snippets of which many have the same +content? These are by default shared, but if needed you can disable it. This makes +sense if you have a case like this: + +\starttyping +\useMPlibrary[dum] + +\startMPcode + draw textext("\externalfigure[unknown]") notcached ; + draw textext("\externalfigure[unknown]") notcached ; +\stopMPcode +\stoptyping + +Normally each unknown image gets a nice placeholder with some random properties. +So, do we want these two to have the same or not? At least you can control it. + +When I said that things can get complicated with the one pass approach the +previous code snippet is a good example. The dummy figure is generated by +\METAPOST. So, as we have one pass, and jump temporarily back to \TEX, +we have two problems: we reenter the \MPLIB\ instance again in the middle of +a run, and we might pipe back something to and|/|or from \TEX\ nested. + +The first problem could be solved by starting a new \MPLIB\ session. This +normally is not a problem as both runs are independent of each other. In +\CONTEXT\ we can have \METAPOST\ runs in many places and some produce some more +of less stand alone graphic in the text while other calls produce \PDF\ code in +the backend that is used in a different way (for instance in a font). In the +first case the result gets nicely wrapped in a box, while in the second case it +might directly end up in the page stream. And, as \TEX\ has no knowledge of what +is needed, it's here that we can get the complications that can lead to aborting +a run when you are careless. But in any case, if you abort, then you can be sure +you're doing the wrong thing. So, the second problem can only be solved by +careful programming. + +When I ran the test suite on the new code, some older modules had to be fixed. +They were doing the right thing from the perspective of intermediate runs and +therefore independent box handling, putting a text in a box and collecting +dimensions, but interwoven they demanded a bit more defensive programming. For +instance, the multi|-|pass approach always made copies snippets while the one +pass approach does that only when needed. And that confused some old code in a +module, which incidentally is never used today because we have better +functionality built|-|in (the \METAFUN\ \type {followtext} mechanism). + +The two pass approach has special code for cases where a text is not used. +Imagine this: + +\starttyping +picture p ; p := textext("foo") ; + +draw boundingbox p; +\stoptyping + +Here the \quote {analyze} stage will never see the text because we don't flush p. +However because \type {textext} is called it can also make sure we still know the +dimensions. In the next case we do use the text but in two different ways. These +subtle aspects are dealt with properly and could be made a it simpler in the +single pass approach. + +\starttyping +picture p ; p := textext("foo") ; + +draw p rotated 90 withcolor red ; +draw p withcolor green ; +\stoptyping + +\stopsection + +\startsection[title=One or two runs] + +So are we better off now? One problem with two passes is that if you use the +equation solver you need to make sure that you don't run into the redundant +equation issue. So, you need to manage your variables well. In fact you need to +do that anyway because you can call out to \METAPOST\ many times in a run so old +variables can interfere anyway. So yes, we're better off here. + +Are we worse off now? The two runs with in between the text processing is very +robust. There is no interference of nested runs and no interference of nested +local \TEX\ calls. So, maybe we're also bit worse off. You need to anyhow keep +this in mind when you write your own low level \TEX|-|\METAPOST\ interaction +trickery, but fortunately now many users do that. And if you did write your own +plugins, you now need to make them single pass. + +The new code is conceptually cleaner but also still not trivial because due to +the mentioned complications. It's definitely less code but somehow amputating the +old code does hurt a bit. Maybe I should keep it around as reference of how text +handling evolved over a few decades. + +\stopsection + +\startsection[title=Appendix] + +Because the single pass approach made me finally look into a (although somewhat +limited) local \TEX\ run, I will show a simple example. For the sake of +generality I will use \type {\directlua}. Say that you need the dimensions of a +box while in \LUA: + +\startbuffer +\directlua { + tex.sprint("result 1: <") + + tex.sprint("\\setbox0\\hbox{one}") + tex.sprint("\\number\\wd0") + + tex.sprint("\\setbox0\\hbox{\\directlua{tex.print{'first'}}}") + tex.sprint(",") + tex.sprint("\\number\\wd0") + + tex.sprint(">") +} +\stopbuffer + +\typebuffer \getbuffer + +This looks ok, but only because all printed text is collected and pushed into a +new input level once the \LUA\ call is done. So take this then: + +\startbuffer +\directlua { + tex.sprint("result 2: <") + + tex.sprint("\\setbox0\\hbox{one}") + tex.sprint(tex.getbox(0).width) + + tex.sprint("\\setbox0\\hbox{\\directlua{tex.print{'first'}}}") + tex.sprint(",") + tex.sprint(tex.getbox(0).width) + + tex.sprint(">") +} +\stopbuffer + +\typebuffer \getbuffer + +This time we get the widths of the box known at the moment that we are in \LUA, +but we haven't typeset the content yet, so we get the wrong dimensions. This +however will work okay: + +\startbuffer +\toks0{\setbox0\hbox{one}} +\toks2{\setbox0\hbox{first}} +\directlua { + tex.forcehmode(true) + + tex.sprint("<") + + tex.runtoks(0) + tex.sprint(tex.getbox(0).width) + + tex.runtoks(2) + tex.sprint(",") + tex.sprint(tex.getbox(0).width) + + tex.sprint(">") +} +\stopbuffer + +\typebuffer \getbuffer + +as does this: + +\startbuffer +\toks0{\setbox0\hbox{\directlua{tex.sprint(MyGlobalText)}}} +\directlua { + tex.forcehmode(true) + + tex.sprint("result 3: <") + + MyGlobalText = "one" + tex.runtoks(0) + tex.sprint(tex.getbox(0).width) + + MyGlobalText = "first" + tex.runtoks(0) + tex.sprint(",") + tex.sprint(tex.getbox(0).width) + + tex.sprint(">") +} +\stopbuffer + +\typebuffer \getbuffer + +Here is a variant that uses functions: + +\startbuffer +\directlua { + tex.forcehmode(true) + + tex.sprint("result 4: <") + + tex.runtoks(function() + tex.sprint("\\setbox0\\hbox{one}") + end) + tex.sprint(tex.getbox(0).width) + + tex.runtoks(function() + tex.sprint("\\setbox0\\hbox{\\directlua{tex.print{'first'}}}") + end) + tex.sprint(",") + tex.sprint(tex.getbox(0).width) + + tex.sprint(">") +} +\stopbuffer + +\typebuffer \getbuffer + +The \type {forcemode} is needed when you do this in vertical mode. Otherwise the +run aborts. Of course you can also force horizontal mode before the call. I'm +sure that users will be surprised by side effects when they really use this +feature but that is to be expected: you really need to be aware of the subtle +interference of input levels and mix of input media (files, token lists, macros +or \LUA) as well as the fact that \TEX\ often looks one token ahead, and often, +when forced to typeset something, also can trigger builders. You're warned. + +\stopsection + +\stopchapter + +\stopcomponent + +% \starttext + +% \toks0{\hbox{test}} [\ctxlua{tex.runtoks(0)}]\par + +% \toks0{\relax\relax\hbox{test}\relax\relax}[\ctxlua{tex.runtoks(0)}]\par + +% \toks0{xxxxxxx} [\ctxlua{tex.runtoks(0)}]\par + +% \toks0{\hbox{(\ctxlua{context("test")})}} [\ctxlua{tex.runtoks(0)}]\par + +% \toks0{\global\setbox1\hbox{(\ctxlua{context("test")})}} [\ctxlua{tex.runtoks(0)}\box1]\par + +% \startluacode +% local s = "[\\ctxlua{tex.runtoks(0)}\\box1]" +% context("<") +% context( function() context(s) end) +% context( function() context(s) end) +% context(">") +% \stopluacode\par + +% \toks10000{\hbox{\red test1}} +% \toks10002{\green\hbox{test2}} +% \toks10004{\hbox{\global\setbox1\hbox to 1000sp{\directlua{context("!4!")}}}} +% \toks10006{\hbox{\global\setbox3\hbox to 2000sp{\directlua{context("?6?")}}}} +% \hbox{x\startluacode +% local s0 = "(\\hbox{\\ctxlua{tex.runtoks(10000)}})" +% local s2 = "[\\hbox{\\ctxlua{tex.runtoks(10002)}}]" +% context("") +% \stopluacode x}\par + + -- cgit v1.2.3