Diffstat (limited to 'doc/context/sources/general/manuals/about/about-calls.tex')
-rw-r--r-- | doc/context/sources/general/manuals/about/about-calls.tex | 739 |
1 files changed, 739 insertions, 0 deletions
diff --git a/doc/context/sources/general/manuals/about/about-calls.tex b/doc/context/sources/general/manuals/about/about-calls.tex
new file mode 100644
index 000000000..83bf89aad
--- /dev/null
+++ b/doc/context/sources/general/manuals/about/about-calls.tex
@@ -0,0 +1,739 @@
+% language=uk
+
+\startcomponent about-calls
+
+\environment about-environment
+
+\startchapter[title={Calling Lua}]
+
+\startsection[title=Introduction]
+
+One evening, on Skype, Luigi and I were pondering the somewhat
+disappointing impact of jit in \LUAJITTEX\ and one of the reasons we could come
+up with is that when you invoke \LUA\ from inside \TEX\ each \type {\directlua}
+gets an extensive treatment. Take the following:
+
+\starttyping
+\def\SomeValue#1%
+  {\directlua{tex.print(math.sin(#1)/math.cos(2*#1))}}
+\stoptyping
+
+Each time \type {\SomeValue} is expanded, the \TEX\ parser will do the following:
+
+\startitemize[packed]
+\startitem
+    It sees \type {\directlua} and will jump to the related scanner.
+\stopitem
+\startitem
+    There it will see a \type +{+ and enter a special mode in which it starts
+    collecting tokens.
+\stopitem
+\startitem
+    In the process, it will expand control sequences that are expandable.
+\stopitem
+\startitem
+    The scanning ends when a matching \type +}+ is seen.
+\stopitem
+\startitem
+    The collected tokens are converted into a regular (C) string.
+\stopitem
+\startitem
+    This string is passed to the \type {lua_load} function that compiles it into
+    bytecode.
+\stopitem
+\startitem
+    The bytecode is executed and characters that are printed to \TEX\ are
+    injected into the input buffer.
+\stopitem
+\stopitemize
+
+In the process, some state information is set and reset and errors are dealt
+with. Although it looks like a lot of actions, this all happens very fast, so
+fast actually that for regular usage you don't need to bother about it.
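Seen from the \LUA\ end, the last two steps of that list come down to compiling a
string and calling the result. The following is a minimal sketch in plain \LUA,
not the actual engine code. The names \type {directlua}, \type {emit} and \type
{emit_for_chunks} are made up for this illustration, and \type {emit} merely
stands in for \type {tex.print} so that the sketch runs outside \TEX.

```lua
-- Hypothetical stand-in for tex.print: it only collects what gets printed.
local printed = { }
local function emit(s)
    printed[#printed+1] = tostring(s)
end

-- Lua 5.1 and LuaJIT compile strings with loadstring, later versions with load.
local compile = loadstring or load

-- Assumption: this mimics what one \directlua expansion does at the Lua end.
-- The collected string is compiled into a function and that function is called.
local function directlua(code)
    local chunk, message = compile(code)
    if not chunk then
        error(message)
    end
    return chunk()
end

-- A loaded chunk cannot see our locals, so the printer is exposed as a global.
emit_for_chunks = emit

directlua("emit_for_chunks(math.sin(1.23)/math.cos(2*1.23))")
directlua("local hidden = 1") -- local to the chunk, invisible afterwards
```

Every call compiles its string again, even when the string is identical to the
previous one, and that is the part the variants discussed later try to avoid.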
+
+There are however applications where you might want to see a performance boost,
+for instance when you're crunching numbers that end up in tables or graphics
+while processing the document. Again, this is not that typical for jobs, but with
+the availability of \LUA\ more of that kind of usage will show up. And, as we now
+also have \LUAJITTEX\ its jitting capabilities could be an advantage.
+
+Back to the example: there are two calls to functions there and apart from the
+fact that they need to be resolved in the \type {math} table, they are also
+executed as C functions. As \LUAJIT\ optimizes known functions like this, there can
+be a potential speed gain but as \type {\directlua} is parsed and loaded each
+time, the jit machinery will not do that, unless the same code gets exercised
+lots of times. In fact, the jit related overhead would be a waste in this one time
+usage.
+
+In the next sections we will show two variants that follow a different approach
+and as a consequence can speed things up a bit. But, be warned: the impact is not as
+large as you might expect, and as the code might look less intuitive, the good
+old \type {\directlua} command is still the advised method.
+
+Before we move on it's important to realize that a \type {\directlua} call is
+in fact a function call. Say that we have this:
+
+\starttyping
+\SomeValue{1.23}
+\stoptyping
+
+This becomes:
+
+\starttyping
+\directlua{tex.print(math.sin(1.23)/math.cos(2*1.23))}
+\stoptyping
+
+Which in \LUA\ is wrapped up as:
+
+\starttyping
+function()
+    tex.print(math.sin(1.23)/math.cos(2*1.23))
+end
+\stoptyping
+
+that gets executed. So, the code is always wrapped in a function. Being a
+function it is also a closure and therefore local variables are local to this
+function and are invisible at the outer level.
+
+\stopsection
+
+\startsection[title=Indirect \LUA]
+
+The first variant is tagged as indirect \LUA.
+With indirect we mean that instead
+of directly parsing, compiling and executing the code, it is done in steps. This
+method is not as generic as the one discussed in the next section, but for cases
+where relatively constant calls are used it is fine. Consider the next call:
+
+\starttyping
+\def\NextValue
+  {\indirectlua{myfunctions.nextvalue()}}
+\stoptyping
+
+This macro does not pass values and always looks the same. Of course there can be
+much more code, for instance the following is equally valid:
+
+\starttyping
+\def\MoreValues {\indirectlua{
+    for i=1,100 do
+        myfunctions.nextvalue(i)
+    end
+}}
+\stoptyping
+
+Again, there is no variable information passed from \TEX. Even the next variant
+is relatively constant:
+
+\starttyping
+\def\SomeValues#1{\indirectlua{
+    for i=1,#1 do
+        myfunctions.nextvalue(i)
+    end
+}}
+\stoptyping
+
+especially when this macro is called many times with the same value. So how does
+\type {\indirectlua} work? Well, its behaviour is in fact undefined! It does,
+like \type {\directlua}, parse the argument and make the string, but instead of
+calling \LUA\ directly, it will pass the string to a \LUA\ function \type
+{lua.call}.
+
+\starttyping
+lua.call = function(s) load(s)() end
+\stoptyping
+
+The previous definition is quite okay and in fact makes \type {\indirectlua}
+behave like \type {\directlua}. This definition makes
+
+% \ctxlua{lua.savedcall = lua.call lua.call = function(s) load(s)() end}
+% \testfeatureonce{10000}{\directlua {math.sin(1.23)}}
+% \testfeatureonce{10000}{\indirectlua{math.sin(1.23)}}
+% \ctxlua{lua.call = lua.savedcall}
+
+\starttyping
+\directlua  {tex.print(math.sin(1.23))}
+\indirectlua{tex.print(math.sin(1.23))}
+\stoptyping
+
+equivalent calls but the second one is slightly slower, which is to be expected
+due to the wrapping and indirect loading.
+But look at this:
+
+\starttyping
+local indirectcalls = { }
+
+function lua.call(code)
+    local fun = indirectcalls[code]
+    if not fun then
+        fun = load(code)
+        if type(fun) ~= "function" then
+            fun = function() end
+        end
+        indirectcalls[code] = fun
+    end
+    fun()
+end
+\stoptyping
+
+This time the code needs about one third of the runtime. How much we gain depends
+on the size of the code and its complexity, but on average it's much faster.
+Of course, during a \TEX\ job only a small part of the time is spent on this, so
+the overall impact is much smaller, but it makes runtime number crunching more
+feasible.
+
+If we bring jit into the picture, the situation becomes somewhat more diffuse.
+When we use \LUAJITTEX\ the whole job is processed faster, also this part, but
+because loading and interpreting is more optimized the impact might be less. If
+you enable jit, in most cases a run is slower than normal. But as soon as you
+have millions of calls to e.g.\ \type {math.sin} it might make a difference.
+
+This variant of calling \LUA\ is quite intuitive and also permits us to implement
+specific solutions because the \type {lua.call} function can be defined as you
+wish. Of course macro package writers can decide to use this feature too, so you
+need to beware of unpleasant side effects if you redefine this function.
+
+% \testfeatureonce{100000}{\directlua  {math.sin(1.23)}}
+% \testfeatureonce{100000}{\indirectlua{math.sin(1.23)}}
+
+\stopsection
+
+\startsection[title=Calling \LUA]
+
+In the process we did some tests with indirect calls in \CONTEXT\ core code and
+indeed some gain in speed could be noticed. However, many calls get variable
+input and therefore don't qualify. Also, as a mixture of \type {\directlua} and
+\type {\indirectlua} calls in the source can be confusing it only makes sense to
+use this feature in real time|-|critical cases, because even in moderately
+complex documents there are not that many calls anyway.
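Why variable input does not qualify can be made visible outside \TEX\ with a small
experiment. This is a sketch under assumptions, not a measurement of the real
mechanism: \type {cachedcall} follows the cached \type {lua.call} idea from the
previous section, while \type {sink} and the counter \type {compilations} are
names made up for this illustration.

```lua
-- Lua 5.1 and LuaJIT compile strings with loadstring, later versions with load.
local compile = loadstring or load

local compilations = 0 -- hypothetical counter, only here to observe the cache
local cache = { }

-- Same idea as the cached lua.call: compile each distinct string only once.
local function cachedcall(code)
    local fun = cache[code]
    if not fun then
        compilations = compilations + 1
        fun = compile(code)
        if type(fun) ~= "function" then
            fun = function() end -- invalid code degrades into a no-op
        end
        cache[code] = fun
    end
    fun()
end

-- A global that the loaded chunks can reach (loaded chunks don't see locals).
local total = 0
function sink(value)
    total = total + value
end

-- Constant code: one compilation, 999 cache hits.
for i=1,1000 do
    cachedcall("sink(1)")
end
local constant = compilations

-- Variable code: every string differs, so every call compiles again.
for i=1,1000 do
    cachedcall("sink(2*" .. i .. ")")
end
local variable = compilations - constant
```

In this sketch \type {constant} ends up as 1 and \type {variable} as 1000: with
changing arguments the cache only grows and never pays off, which is the
situation the call variant with separately passed arguments is meant for.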
+
+The next method uses a slightly different approach. Here we stay at the \TEX\
+end, parse some basic type arguments, push them on the \LUA\ stack, and call a
+predefined function. The amount of parsing \TEX\ code is not less, but especially
+when we pass numbers stored in registers, no tokenization (serialization of a
+number value into the input stream) and stringification (converting the tokens
+back to a \LUA\ number) take place.
+
+\starttyping
+\indirectluacall 123
+    {some string}
+    \scratchcounter
+    {another string}
+    true
+    \dimexpr 10pt\relax
+\relax
+\stoptyping
+
+Actually, an extension like this had been on the agenda for a while, but never
+really got much priority. The first number is a reference to a function to be
+called.
+
+\starttyping
+lua.calls = lua.calls or { }
+lua.calls[123] = function(s1,n1,s2,b,n2)
+    -- do something with
+    --
+    -- string  s1
+    -- number  n1
+    -- string  s2
+    -- boolean b
+    -- number  n2
+end
+\stoptyping
+
+The first number to \type {\indirectluacall} is mandatory. It had better be a
+number that has a function associated with it in the \type {lua.calls} table. Following
+that number and before the also mandatory \type {\relax}, there can be any number
+of arguments: strings, numbers and booleans.
+
+Anything surrounded by \type {{}} becomes a string. The keywords \type {true} and
+\type {false} become boolean values. Spaces are skipped and everything else is
+assumed to be a number. This means that if you omit the final \type {\relax}, you
+get an error message mentioning a \quote {missing number}. The normal number
+parser applies, so when a dimension register is passed, it is turned into a
+number. The example shows that wrapping a more verbose dimension into a \type
+{\dimexpr} also works.
+
+Performance wise, each string goes from a list of tokens to temporary C string to
+\LUA\ string, so that adds some overhead. A number is more efficient, especially
+when you pass it using a register.
+The booleans are simple sequences of character
+tokens so they are relatively efficient too. Because \LUA\ functions accept an
+arbitrary number of arguments, you can provide as many as you like, or even fewer
+than the function expects: it is all driven by the final \type {\relax}.
+
+An important characteristic of this kind of call is that there is no \type {load}
+involved, which means that the functions in \type {lua.calls} can be subjected to
+jitting.
+
+\stopsection
+
+\startsection[title=Name spaces]
+
+As with \type {\indirectlua} there is a potential clash when users mess with the
+\type {lua.calls} table without taking the macro package usage into account. It is not
+that complex to define a variant that provides namespaces:
+
+\starttyping
+\newcount\indirectmain \indirectmain=1
+\newcount\indirectuser \indirectuser=2
+
+\indirectluacall \indirectmain
+    {function 1}
+    {some string}
+\relax
+
+\indirectluacall \indirectuser
+    {function 1}
+    {some string}
+\relax
+\stoptyping
+
+A matching implementation is this:
+
+\starttyping
+lua.calls = lua.calls or { }
+
+local main = { }
+
+lua.calls[1] = function(name,...)
+    main[name](...)
+end
+
+main["function 1"] = function(a,b,c)
+    -- do something with a,b,c
+end
+
+local user = { }
+
+lua.calls[2] = function(name,...)
+    user[name](...)
+end
+
+user["function 1"] = function(a,b,c)
+    -- do something with a,b,c
+end
+\stoptyping
+
+Of course this is also ok:
+
+\starttyping
+\indirectluacall \indirectmain 1
+    {some string}
+\relax
+
+\indirectluacall \indirectuser 1
+    {some string}
+\relax
+\stoptyping
+
+with:
+
+\starttyping
+main[1] = function(a,b,c)
+    -- do something with a,b,c
+end
+
+user[1] = function(a,b,c)
+    -- do something with a,b,c
+end
+\stoptyping
+
+Normally a macro package, if it wants to expose this mechanism, will provide a
+more abstract interface that hides the implementation details.
+In that case the
+user is not supposed to touch \type {lua.calls} but this is not much different
+from the limitations in redefining primitives, so users can learn to live with
+this.
+
+\stopsection
+
+\startsection[title=Practice]
+
+There are some limitations. For instance in \CONTEXT\ we often pass tables and
+this is not implemented. Providing a special interface for that is possible but
+does not really help. Often the data passed that way is far from constant, so it
+can as well be parsed by \LUA\ itself, which is quite efficient. We did some
+experiments with the simpler calls and the outcome is somewhat disputable. If
+we replace some of the \quote {critical} calls we can gain some 3\% on a run of
+for instance the \type {fonts-mkiv.pdf} manual and a bit more on the command
+reference \type {cont-en.pdf}. The first manual uses lots of position tracking
+(an unfortunate side effect of using a specific feature that triggers continuous
+tracking) and low level font switches and many of these can benefit from the
+indirect call variant. The command reference manual uses \XML\ processing and
+that involves many calls to the \XML\ mapper and also does quite some string
+manipulations so again there is something to gain there.
+
+The following numbers are just an indication, as only a subset of \type
+{\directlua} calls has been replaced. The 166 page font manual processes in about
+9~seconds which is not bad given its complexity. The timings are on a Dell
+Precision M6700 with Core i7 3840QM, 16 GB memory, a fast SSD and 64 bit Windows
+8. The binaries were cross compiled with mingw (32 bit) by Luigi. \footnote {While
+testing with several function definitions we noticed that \type {math.random} in
+our binaries made jit twice as slow as normal, while for instance \type
+{math.sin} was 100 times faster. As the font manual uses the random function for
+rendering random punk examples it might have some negative impact.
+Our experience
+is that binaries compiled with the ms compiler are somewhat faster but as long as
+the engines that we test are compiled similarly the numbers can be compared.}
+
+% old: 8.870 8.907 9.089 / jit: 6.948 6.966 7.009 / jiton: 7.449 7.586 7.609
+% new: 8.710 8.764 8.682 | 8.64 / jit: 6.935 6.969 6.967 | 6.82 / jiton: 7.412 7.223 7.481
+%
+% 3% on total, 6% on lua
+
+\starttabulate[|lT|cT|cT|cT|]
+\HL
+\NC          \NC \LUATEX \NC \LUAJITTEX \NC \LUAJITTEX\ + jit \NC \NR
+\HL
+\NC direct   \NC 8.90    \NC 6.95       \NC 7.50              \NC \NR
+\NC indirect \NC 8.65    \NC 6.80       \NC 7.30              \NC \NR
+\HL
+\stoptabulate
+
+So, we can gain some 3\% on such a document and given that we spend probably half
+the time in \LUA, this means that these new features can make \LUA\ run more than
+5\% faster which is not that bad for a couple of lines of extra code. For regular
+documents we can forget about jit which confirms earlier experiments. The
+command reference has these timings:
+
+\starttabulate[|lT|cT|cT|cT|]
+\HL
+\NC          \NC \LUATEX \NC \LUAJITTEX \NC \NR
+\HL
+\NC direct   \NC 2.55    \NC 1.90       \NC \NR
+\NC indirect \NC 2.40    \NC 1.80       \NC \NR
+\HL
+\stoptabulate
+
+Here the differences are larger which is due to the fact that we can indirect
+most of the calls used in this processing. The document is rather simple but as
+mentioned is encoded in \XML\ and the \TEX||\XML\ interface qualifies for this
+kind of speedup.
+
+As Luigi is still trying to figure out why jitting doesn't work out so well, we
+also did some tests with (in itself useless) calculations. After all we need
+proof. The first test was a loop with 100.000 steps doing a regular \type
+{\directlua}:
+
+\starttyping
+\directlua {
+    local t = { }
+    for i=1,10000 do
+        t[i] = math.sin(i/10000)
+    end
+}
+\stoptyping
+
+The second test is a bit optimized. When we use jit this kind of optimization
+happens automatically for known (!) functions so there is not much won.
+
+\starttyping
+\directlua {
+    local sin = math.sin
+    local t = { }
+    for i=1,10000 do
+        t[i] = sin(i/10000)
+    end
+}
+\stoptyping
+
+We also tested this with \type {\indirectlua} and therefore defined some
+functions to test the call variant:
+
+\starttyping
+lua.calls[1] = function()
+    -- overhead
+end
+
+lua.calls[2] = function()
+    local t = { }
+    for i=1,10000 do
+        t[i] = math.sin(i/10000) -- naive
+    end
+end
+
+lua.calls[3] = function()
+    local sin = math.sin
+    local t = { }
+    for i=1,10000 do
+        t[i] = sin(i/10000) -- normal
+    end
+end
+\stoptyping
+
+These are called with:
+
+\starttyping
+\indirectluacall1\relax
+\indirectluacall2\relax
+\indirectluacall3\relax
+\stoptyping
+
+The overhead variant demonstrated that there was hardly any: less than 0.1 second.
+
+\starttabulate[|lT|lT|cT|cT|cT|]
+\HL
+\NC                 \NC        \NC \LUATEX \NC \LUAJITTEX \NC \LUAJITTEX\ + jit \NC \NR
+\HL
+\NC directlua       \NC normal \NC 167     \NC 64         \NC 46                \NC \NR
+\NC                 \NC local  \NC 122     \NC 57         \NC 46                \NC \NR
+\NC indirectlua     \NC normal \NC 166     \NC 63         \NC 45                \NC \NR
+\NC                 \NC local  \NC 121     \NC 56         \NC 45                \NC \NR
+\NC indirectluacall \NC normal \NC 165     \NC 66         \NC 48                \NC \NR
+\NC                 \NC local  \NC 120     \NC 60         \NC 47                \NC \NR
+\HL
+\stoptabulate
+
+The results are somewhat disappointing but not that unexpected. We do see a speedup
+with \LUAJITTEX\ and in this case even jitting makes sense. However in a regular
+typesetting run jitting will never catch up with the costs it carries for the
+overall process. The indirect call is somewhat faster than the direct call.
+Possible reasons are that hashing at the \LUA\ end also costs time and the
+100.000 calls from \TEX\ to \LUA\ are not that big a burden. The indirect call is
+therefore also not much faster because it has some additional parsing overhead at
+the \TEX\ end. That one only speeds up when we pass arguments and even then not
+always the same amount. It is therefore mostly a convenience feature.
+
+We left one aspect out and that is garbage collection.
+It might be that in large
+runs less loading has a positive impact on collecting garbage. We also need to
+keep in mind that careful application can have some real impact. Take the
+following example of \CONTEXT\ code:
+
+\startntyping
+\dorecurse {1000} {
+
+    \startsection[title=section #1]
+
+    \startitemize[n,columns]
+        \startitem test \stopitem
+        \startitem test \stopitem
+        \startitem test \stopitem
+        \startitem test \stopitem
+    \stopitemize
+
+    \starttabulate[|l|p|]
+        \NC test \NC test \NC \NR
+        \NC test \NC test \NC \NR
+        \NC test \NC test \NC \NR
+    \stoptabulate
+
+    test {\setfontfeature{smallcaps} abc} test
+    test {\setfontfeature{smallcaps} abc} test
+    test {\setfontfeature{smallcaps} abc} test
+    test {\setfontfeature{smallcaps} abc} test
+    test {\setfontfeature{smallcaps} abc} test
+    test {\setfontfeature{smallcaps} abc} test
+
+    \framed[align={lohi,middle}]{test}
+
+    \startembeddedxtable
+        \startxrow \startxcell x \stopxcell \startxcell x \stopxcell \stopxrow
+        \startxrow \startxcell x \stopxcell \startxcell x \stopxcell \stopxrow
+        \startxrow \startxcell x \stopxcell \startxcell x \stopxcell \stopxrow
+        \startxrow \startxcell x \stopxcell \startxcell x \stopxcell \stopxrow
+        \startxrow \startxcell x \stopxcell \startxcell x \stopxcell \stopxrow
+    \stopembeddedxtable
+
+    \stopsection
+
+    \page
+
+}
+\stopntyping
+
+These macros happen to use mechanisms that are candidates for indirectness.
+However, it doesn't happen often that you process thousands of pages with mostly
+tables and smallcaps (although tabular digits are a rather valid font feature in
+tables). For instance, in web services squeezing out a few tens of seconds might
+make sense if there is a large queue of documents.
+
+\starttabulate[|lT|cT|cT|cT|]
+\HL
+\NC          \NC \LUATEX \NC \LUAJITTEX \NC \LUAJITTEX\ + jit \NC \NR
+\HL
+\NC direct   \NC 19.1    \NC 15.9       \NC 15.8              \NC \NR
+\NC indirect \NC 18.0    \NC 15.2       \NC 15.0              \NC \NR
+\HL
+\stoptabulate
+
+Surprisingly, even jitting helps a bit here.
+Maybe it relates to the number of
+pages and the amount of calls but we didn't investigate this. By default jitting
+is off anyway. The impact of indirectness is more than in previous examples.
+
+For this test a file was loaded that redefines some core \CONTEXT\ code. This
+also has some overhead which means that numbers for the indirect case will be
+somewhat better if we decide to use these mechanisms in the core code. It is
+tempting to do that but it involves some work and it's always the question if a
+week of experimenting and coding will ever be compensated by less runtime. After all, in
+this last test, a speed of 50 pages per second is not that bad a performance.
+
+When looking at these numbers, keep in mind that it is still not clear if we end
+up using this functionality, and when \CONTEXT\ will use it, it might be in a way
+that gives better or worse timings than mentioned above. For instance, storing \LUA\
+code in the format is possible, but these implementations force us to serialize
+the \type {lua.calls} mechanism and initialize them after format loading. For that
+reason alone, a more native solution is better.
+
+\stopsection
+
+\startsection[title=Exploration]
+
+In the early days of \LUATEX\ Taco and I discussed an approach similar to
+registers which means that there is some \type {\...def} command available. The
+biggest challenge there is to come up with a decent way to define the arguments.
+On the one hand, using a hash syntax is natural to \TEX, but using names is more
+natural to \LUA. So, when we picked up that thread, solutions like this came up
+in a Skype session with Taco:
+
+\starttyping
+\luadef\myfunction#1#2{ tex.print(arg[1]+arg[2]) }
+\stoptyping
+
+The \LUA\ snippet becomes a function with this body:
+
+\starttyping
+local arg = { #1, #2 } -- can be preallocated and reused
+-- the body as defined at the tex end
+tex.print(arg[1]+arg[2])
+\stoptyping
+
+Where \type {arg} is set each time.
+As we wrapped it in a function we can
+also put the arguments on the stack and use:
+
+\starttyping
+\luadef\myfunction#1#2{ tex.print((select(1,...))+(select(2,...))) }
+\stoptyping
+
+Given that we can make select work this way (whether or not by additional
+wrapping). Anyway, both these solutions are ugly and so we need to look further.
+Also, the \type {arg} variant mandates building a table. So, a natural next
+iteration is:
+
+\starttyping
+\luadef\myfunction a b { tex.print(a+b) }
+\stoptyping
+
+Here it becomes already more natural:
+
+\starttyping
+local a = #1
+local b = #2
+-- the body as defined at the tex end
+tex.print(a+b)
+\stoptyping
+
+But, as we don't want to reload the body we need to push \type {#1} into the
+closure. This is a more static definition equivalent:
+
+\starttyping
+local a = select(1,...)
+local b = select(2,...)
+tex.print(a+b)
+\stoptyping
+
+Keep in mind that we are not talking of some template that gets filled in and
+loaded, but about precompiled functions! So, a \type {#1} is not really put there
+but somehow pushed into the closure (we know the stack offsets).
+
+Yet another issue is a more direct alias. Say that we define a function at the
+\LUA\ end and want to access it using this kind of interface.
+
+\starttyping
+function foo(a,b)
+    tex.print(a+b)
+end
+\stoptyping
+
+Given that we have something:
+
+\starttyping
+\luadef \myfunctiona a b { tex.print(a+b) }
+\stoptyping
+
+We can consider:
+
+\starttyping
+\luaref \myfunctionb 2 {foo}
+\stoptyping
+
+The explicit number is debatable as it can be interesting to permit
+an arbitrary number of arguments here.
+
+\starttyping
+\myfunctiona{1}{2}
+\myfunctionb{1}{2}
+\stoptyping
+
+So, if we go for:
+
+\starttyping
+\luaref \myfunctionb {foo}
+\stoptyping
+
+we can use \type {\relax} as terminator:
+
+\starttyping
+\myfunctiona{1}{2}
+\myfunctionb{1}{2}\relax
+\stoptyping
+
+In fact, the call method discussed in a previous section can be used here as well
+as it permits fewer arguments as well as mixed types. Think of this:
+
+\starttyping
+\luadef \myfunctiona a b c { tex.print((a or 0) + (b or 0) + (c or 0)) }
+\luaref \myfunctionb {foo}
+\stoptyping
+
+with
+
+\starttyping
+function foo(a,b,c)
+    tex.print((a or 0) + (b or 0) + (c or 0))
+end
+\stoptyping
+
+These could all be valid:
+
+\starttyping
+\myfunctiona{1}{2}{3}\relax
+\myfunctiona{1}\relax
+\myfunctionb{1}{2}\relax
+\stoptyping
+
+or (as in practice we want numbers):
+
+\starttyping
+\myfunctiona 1 \scratchcounter 3\relax
+\myfunctiona 1 \relax
+\myfunctionb 1 2 \relax
+\stoptyping
+
+We basically get optional arguments for free, as long as we deal with it properly
+at the \LUA\ end. The only condition with the \type {\luadef} case is that there
+can be no more than the given number of arguments, because that's how the function
+body gets set up. In practice this is quite okay.
+
+% After this exploration we can move on to the final implementation and see what we
+% ended up with.
+
+\stopsection
+
+% \startsection[title=The final implementation]
+% {\em todo}
+% \stopsection
+
+\startsection[title=The follow up]
+
+We don't know what eventually will happen with \LUATEX. We might even (at least
+in \CONTEXT) stick to the current approach because there is not much to gain in
+terms of speed, convenience and (most of all) beauty.
+
+{\em Note:} In \LUATEX\ 0.79 onward \type {\indirectlua} has been implemented as
+\type {\luafunction} and the \type {lua.calls} table is available as \type
+{lua.get_functions_table()}.
+A decent token parser has been discussed at the
+\CONTEXT\ 2013 conference and will show up in due time. In addition, so called
+\type {latelua} nodes support function assignments and \type {user} nodes support
+a field for \LUA\ values. Additional information can be associated with any nodes
+using the properties subsystem.
+
+\stopsection
+
+\stopchapter
+
+\stopcomponent