% language=uk

\startcomponent about-nuts

\environment about-environment

\startchapter[title={Going nuts}]

\startsection[title=Introduction]

This is not the first story about speed and it will probably not be the last one
either. This time we discuss a substantial speedup: up to 50\% with \LUAJITTEX.
So, if you don't want to read further, at least know that this speedup came at
the cost of lots of testing and adapting code. Of course you could be one of
those users who doesn't care about that, and it may also be that your documents
don't qualify at all.

Often when I see a kid playing a modern computer game, I wonder how it gets done:
all that high speed rendering, complex environments, shading, lighting,
inter||player communication, many frames per second, adapted story lines,
\unknown. Apart from clever programming, quite a bit of the work gets done by
multiple cores working together, but above all the graphics and physics
processors take much of the workload. The market has driven the development of
this hardware, and with success. In this perspective it's not that much of a
surprise that complex \TEX\ jobs still take some time to get finished: all the
hard work has to be done by interpreted languages using rather traditional
hardware. Of course all kinds of clever tricks make processors perform better
than years ago, but still: we don't get much help from specialized hardware.
\footnote {Apart from proper rendering on screen and printing on paper.} We're
sort of stuck: when I replaced my 6 year old laptop (when I buy one, I always buy
the fastest one possible) with a new one (so again a fast one) the gain in speed
of processing a document was less than a factor of two. The many times faster
graphic capabilities are not of much help there, nor is twice the number of
cores.

So, if we ever want to go much faster, we need to improve the software. The
reasons for trying to speed up \MKIV\ have been mentioned before, but let's
summarize them here:

\startitemize

\startitem
    There was a time when users complained about the speed of \CONTEXT,
    especially compared to other macro packages. I'm not so sure if this is still
    a valid complaint, but I do my best to avoid bottlenecks and much time goes
    into testing efficiency.
\stopitem

\startitem
    Computers don't get that much faster, at least we don't see an impressive
    boost each year any more. We might even see a slowdown when battery life
    dominates: more cores at a lower speed seems to be a trend and that doesn't
    suit current \TEX\ engines well. Of course we assume that \TEX\ will be
    around for some time.
\stopitem

\startitem
    Especially in automated workflows, where multiple products, each demanding a
    couple of runs, are produced, speed pays back in terms of resources and
    response time. Of course the time invested in the speedup is never regained
    by ourselves, but we hope that users appreciate it.
\stopitem

\startitem
    The more we do in \LUA, read: the more demanding users get and the more
    functionality is enabled, the more we need to squeeze out of the processor.
    And we want to do more in \LUA\ in order to get better typeset results.
\stopitem

\startitem
    Although \LUA\ is pretty fast, future versions might be slower. So, the more
    efficient we are, the less we probably suffer from changes.
\stopitem

\startitem
    Using more complex scripts and fonts is so demanding that the number of pages
    per second drops dramatically. Personally I consider rates of 15 pps with
    \LUATEX\ or 20 pps with \LUAJITTEX\ reasonable minima on my laptop. \footnote
    {A Dell 6700 laptop with Core i7 3840QM, 16 GB memory and SSD, running 64 bit
    Windows 8.}
\stopitem

\startitem
    Among the reasons why \LUAJIT\ jitting does not help us much is that (at
    least in \CONTEXT) we don't use that many core functions that qualify for
    jitting. Also, as runs are limited in time and much code kicks in only a few
    times, the analysis and compilation don't pay back at runtime. So we cannot
    simply sit down and wait till matters improve.
\stopitem

\stopitemize

Luigi Scarso and I have been exploring several options, with \LUATEX\ as well as
\LUAJITTEX. We observed that the virtual machine in \LUAJITTEX\ is much faster,
so that engine already gives a boost. The advertised jit feature can best be
disabled as it slows down a run noticeably. We played with \type {ffi} as well,
but there is additional overhead involved (\type {cdata}) as well as limited
support for userdata, so we can forget about that too. \footnote {As we've now
introduced getters we can construct a metatable at the \LUA\ end, as that is what
\type {ffi} likes most. But even then, we don't expect much from it: the fourfold
slowdown that experiments showed will not magically become a large gain.}
Nevertheless, the twice as fast virtual machine of \LUAJIT\ is a real blessing,
especially if you take into account that \CONTEXT\ spends quite some time in
\LUA. We're also looking forward to the announced improved garbage collector of
\LUAJIT.

In the end we started looking at \LUATEX\ itself. What can be gained there,
within the constraints of not having to completely redesign existing
(\CONTEXT) \LUA\ code? \footnote {In the end a substantial change was needed, but
only in accessing node properties. The nice thing about C is that macros often
provide a level of abstraction there, which means that a similar adaptation of
the \TEX\ source code would be more convenient.}

\stopsection

\startsection[title={Two access models}]

Because the \CONTEXT\ code is reasonably well optimized already, the only option
is to look into \LUATEX\ itself. We had played with the \TEX||\LUA\ interface
already and came to the conclusion that some runtime could be gained there. In
the long run it adds up, but it's not too impressive; these extensions are
awaiting integration. Tracing and benchmarking as well as some quick and dirty
patches demonstrated that there were two bottlenecks in accessing fields in
nodes: checking (comparing the metatables) and constructing results (userdata
with a metatable).

In case you're unfamiliar with the concept, this is how nodes work. There is an
abstract object called node that is in \LUA\ qualified as userdata. This object
contains a pointer to \TEX's node memory. \footnote {The traditional \TEX\ node
memory manager is used, but at some point we might change to regular C
(de)allocation. This might be slower but has some advantages too.} As it is real
userdata (not so called light userdata) it also carries a metatable. In the
metatable methods are defined and one of them is the indexer. So when you say
this:

\starttyping
local nn = n.next
\stoptyping

given that \type {n} is a node (userdata), the \type {next} key is resolved
using the \type {__index} metatable value, in our case a function. So, in fact,
there is no \type {next} field: it's kind of virtual. The index function that
gets the relevant data from node memory is a fast operation: after determining
the kind of node, the requested field is located. The return value can be a
number, for instance when we ask for \type {width}, which is also fast to return.
But it can also be a node, as is the case with \type {next}, and then we need to
allocate a new userdata object (memory management overhead) and a metatable has
to be associated. And that comes at a cost.
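
This mechanism can be imitated in plain \LUA, outside \LUATEX, with ordinary
tables. The following is only an illustration under invented names (\type
{memory}, \type {toproxy}); the real node userdata and its \type {__index} are
implemented in C:

```lua
-- Simulated node memory: a plain table indexed by an integer reference.
local memory = { }

local proxymeta -- forward declaration so __index can wrap results

-- Wrap a raw reference in a proxy table, the analogue of allocating a new
-- userdata object and attaching its metatable.
local function toproxy(ref)
    if not ref then return nil end
    return setmetatable({ __ref = ref }, proxymeta)
end

proxymeta = {
    __index = function(p, key)
        local raw = memory[p.__ref][key]
        if key == "next" or key == "prev" then
            return toproxy(raw) -- node valued fields cost an extra allocation
        end
        return raw -- numbers and strings are returned as-is
    end,
}

memory[1] = { id = "glyph", char = 0x123, next = 2 }
memory[2] = { id = "glyph", char = 0x456 }

local n  = toproxy(1)
local nn = n.next -- goes through __index; 'next' is a virtual field
```

Here \type {nn.char} resolves to \type {0x456}: every access goes through the
metamethod, and every node valued result allocates a fresh proxy.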

In a previous update we had already optimized the main \type {__index} function
but felt that some more was possible. For instance we can avoid the lookup of the
metatable for the returned node(s). And, if we don't use indexed access but
instead a function for frequently accessed fields, we can sometimes gain a bit
too.

A logical next step was to avoid some checking, which is okay given that one pays
a bit of attention to coding. So, we provided a special table with some accessors
for frequently used fields. We actually implemented this as a so called \quote
{fast} access model, and adapted part of the \CONTEXT\ code to it, as we wanted
to see if it made sense. We were able to gain 5 to 10\%, which is nice but still
not impressive. In fact, we concluded that for the average run using fast was
indeed faster, but not enough to justify rewriting code to the (often) less nice
looking faster access. A nice side effect of the recoding was that we could add
more advanced profiling.

But, in the process we ran into another possibility: use accessors exclusively
and avoid userdata by passing around references to \TEX\ node memory directly.
As internally nodes can be represented by numbers, we ended up with numbers, but
future versions might use light userdata instead to carry pointers around. Light
userdata is a cheap basic object with no garbage collection involved. We tagged
this method \quote {direct} and one can best treat the values that get passed
around as abstract entities (in \MKIV\ we call this special view on nodes
\quote {nuts}).

So let's summarize this in code. Say that we want to know the next node of
\type {n}:

\starttyping
local nn = n.next
\stoptyping

Here \type {__index} will be resolved and the associated function called. We
can avoid that lookup by applying the \type {__index} method directly (after all,
that one assumes a userdata node):

\starttyping
local getfield = getmetatable(n).__index

local nn = getfield(n,"next") -- userdata
\stoptyping

But this is not a recommended interface for regular users. A normal helper that
does checking is about as fast as the indexed method:

\starttyping
local getfield = node.getfield

local nn = getfield(n,"next") -- userdata
\stoptyping

So, we can use indexes as well as getters mixed and both perform more or less
equally well. A dedicated getter is somewhat more efficient:

\starttyping
local getnext = node.getnext

local nn = getnext(n) -- userdata
\stoptyping

If we forget about checking, we can go faster; in fact the nicely interfaced
\type {__index} is the fast one.

\starttyping
local getfield = node.fast.getfield

local nn = getfield(n,"next") -- userdata
\stoptyping

Even more efficient is the following, as that one knows already what to fetch:

\starttyping
local getnext = node.fast.getnext

local nn = getnext(n) -- userdata
\stoptyping

The next step, away from userdata, was:

\starttyping
local getfield = node.direct.getfield

local nn = getfield(n,"next") -- abstraction
\stoptyping

and:

\starttyping
local getnext = node.direct.getnext

local nn = getnext(n) -- abstraction
\stoptyping

Because we considered three variants a bit too much, and because \type {fast} was
only 5 to 10\% faster in extreme cases, we decided to drop that experimental code
and stick to providing accessors in the node namespace as well as direct variants
for critical cases.
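
To see why the dedicated variants can win, consider this standalone plain \LUA\
sketch (all names invented; a \type {memory} table stands in for node memory and
\type {false} marks the end of a list). The generic getter has to dispatch on the
key at every call, while the dedicated one has the field hardwired, and both pass
around plain numbers, as the direct model does:

```lua
-- Invented stand-in for node memory; references are plain integers.
local memory = {
    [1] = { id = "glyph", char = 0x123, next = 2 },
    [2] = { id = "glyph", char = 0x456, next = false },
}

-- Generic accessor: one table lookup per key, plus an end-of-list check.
local function getfield(ref, key)
    local value = memory[ref][key]
    if value == false then return nil end
    return value
end

-- Dedicated accessor: the field name is fixed, so no key dispatch happens.
local function getnext(ref)
    local value = memory[ref].next
    if value == false then return nil end
    return value
end
```

Both return the same abstract reference; the dedicated one simply does less work
per call, which is what makes \type {getnext} attractive in tight loops.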

Before you start thinking \quote {should I rewrite all my code?}, think twice!
First of all, \type {n.next} is quite fast, and switching between the normal and
direct model also has some cost. So, unless you also adapt all your personal
helper code or provide two variants of each, it only makes sense to use direct
mode in critical situations. Userdata mode is much more convenient when
developing code, and only when you have millions of accesses can you gain by
direct mode. And even then, if the time spent in \LUA\ is small compared to the
time spent in \TEX, it might not even be noticeable. The main reason we made
direct variants is that it does pay off in \OPENTYPE\ font processing, where
complex scripts can result in many millions of calls indeed. And that code will
be set up in such a way that it will use userdata by default; only in well
controlled cases (like \MKIV) will we use direct mode. \footnote {When we are
confident that \type {direct} node code is stable we can consider going direct in
generic code as well, although we need to make sure that third party code keeps
working.}

Another thing to keep in mind is that when you provide hooks for users, you
should assume that they use the regular mode, so you need to cast the plugins
onto direct mode then. Because the idea is that one should be able to swap normal
functions for direct ones (which of course is only possible when no indexes are
used), all relevant functions in the \type {node} namespace are available in
\type {direct} as well. This means that the following code is rather neutral:

\starttyping
local x = node -- or: x = node.direct

for n in x.traverse(head) do
    if x.getid(n) == node.id("glyph") and x.getchar(n) == 0x123 then
        x.setfield(n,"char",0x456)
    end
end
\stoptyping

Of course one needs to make sure that \type {head} fits the model. For this you
can use the cast functions:

\starttyping
node.direct.todirect(node or direct)
node.direct.tonode(direct or node)
\stoptyping
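
This swappability can be mimicked standalone. In the following plain \LUA\
sketch (all names invented; the real \type {node} and \type {node.direct}
libraries only exist inside \LUATEX) both namespaces export the same function
names, and the casts handle the border between the two models:

```lua
-- Invented stand-in for node memory; references are plain integers.
local memory = {
    [1] = { id = "glyph", char = 0x123, next = 2 },
    [2] = { id = "glyph", char = 0x124, next = nil },
}

-- The direct-like namespace works on plain integer references.
local direct = { }

function direct.getid   (d)       return memory[d].id     end
function direct.getchar (d)       return memory[d].char   end
function direct.setfield(d, k, v)        memory[d][k] = v end

function direct.traverse(d)
    return function()
        local current = d
        if current then d = memory[current].next end
        return current
    end
end

-- The node-like namespace wraps references in proxy tables; the casts are
-- the analogues of todirect and tonode.
local proxymeta = { }
local function tonode(d)   return d and setmetatable({ __d = d }, proxymeta) end
local function todirect(n) return n and n.__d end

local node = { }

function node.getid   (n)       return direct.getid   (todirect(n))       end
function node.getchar (n)       return direct.getchar (todirect(n))       end
function node.setfield(n, k, v)        direct.setfield(todirect(n), k, v) end

function node.traverse(n)
    local iterate = direct.traverse(todirect(n))
    return function() return tonode(iterate()) end
end

-- The same loop body runs with x = node (head being a proxy) as well as
-- with x = direct (head being a plain reference).
local function replace(x, head)
    for n in x.traverse(head) do
        if x.getid(n) == "glyph" and x.getchar(n) == 0x123 then
            x.setfield(n, "char", 0x456)
        end
    end
end

replace(direct, 1)
```

After the call, the first glyph carries \type {0x456}; calling \type
{replace(node, tonode(1))} instead would have had the same effect.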

These helpers are flexible enough to deal with either model. Aliasing the
functions to locals is of course more efficient when a large number of calls
happens (when you use \LUAJITTEX\ it will do some of that for you automatically).
Of course, normally we use a more natural variant, using an id traverser:

\starttyping
for n in node.traverse_id(head,node.id("glyph")) do
    if n.char == 0x123 then
        n.char = 0x456
    end
end
\stoptyping

This is not that much slower, especially when it's only run once. Just count the
number of characters on a page (or in your document) and you will see that it's
hard to come up with that many calls. Of course, when processing many pages of
Arabic using a mature font with many features enabled and contextual lookups, you
do run into such quantities. Tens of features times tens of contextual lookup
passes can add up considerably. In Latin scripts you never reach such numbers,
unless you use fonts like Zapfino.

\stopsection

\startsection[title={The transition}]

After weeks of testing, rewriting, skyping, compiling and making decisions, we
reached a more or less stable situation. At that point we were faced with a
speedup that gave us a good feeling, but the transition to the faster variant has
a few consequences.

\startitemize

\startitem We need to use an adapted code base: indexes are to be replaced by
function calls. This is a tedious job that can endanger stability, so it has to
be done with care. \footnote {The reverse is easier, as converting getters and
setters to indexed access is a rather simple conversion, while for instance
changing \type {.next} into a \type {getnext} needs more checking, because that
key is not unique to nodes.} \stopitem

\startitem When using an old engine with the new \MKIV\ code, this approach will
result in a somewhat slower run. Most users will probably accept a temporary
slowdown of 10\%, so we might take this intermediate step. \stopitem

\startitem When the regular getters and setters become available we get back to
normal. Keep in mind that these accessors do some checking on arguments, which
slows them down to the level of using indexes. On the other hand, the dedicated
ones (like \type {getnext}) are more efficient, so there we gain. \stopitem

\startitem As soon as direct becomes available we suddenly see a boost in speed.
In documents of average complexity this is 10 to 20\%, and when we use more
complex scripts and fonts it can go up to 40\%. Here we assume that the macro
package spends at least 50\% of its time in \LUA. \stopitem

\stopitemize

If we take the extremes, traditional indexed mode on the one hand versus
optimized direct mode in \LUAJITTEX\ on the other, a 50\% gain compared to the
old methods is feasible. Because we also retrofitted some fast code into the
regular accessors, indexed mode should also be somewhat faster compared to the
older engine.

In addition to the already provided helpers in the \type {node} namespace, we
added the following:

\starttabulate[|Tl|p|]
\HL
\NC getnext \NC this one is used a lot when analyzing and processing node lists \NC \NR
\NC getprev \NC this one is used less often but fits in well (companion to \type {getnext}) \NC \NR
\NC getfield \NC this is the general accessor, in userdata mode as fast as indexed access \NC \NR
\HL
\NC getid \NC one of the most frequently called getters when parsing node lists \NC \NR
\NC getsubtype \NC especially in font handling this getter gets used \NC \NR
\HL
\NC getfont \NC especially in complex font handling this is a favourite \NC \NR
\NC getchar \NC as is this one \NC \NR
\HL
\NC getlist \NC we often want to recurse into hlists and vlists and this helps \NC \NR
\NC getleader \NC and we also often need to check if glue has a leader specification (like a list) \NC \NR
\HL
\NC setfield \NC we have just one setter as setting is less critical \NC \NR
\HL
\stoptabulate

As \type {getfield} and \type {setfield} are just variants on indexed access, you
can also use them to access attributes. Just pass a number as key. In the \type
{direct} namespace, helpers like \type {insert_before} also deal with direct
nodes.

We currently only provide \type {setfield} because setting happens less often
than getting. Of course you can construct node lists at the \LUA\ end, but that
doesn't add up that fast, and indexed access is then probably as efficient. One
reason why setters are less of an issue is that they don't return nodes, so no
userdata overhead is involved. We could (and might) provide \type {setnext} and
\type {setprev}, although, when you construct lists at the \LUA\ end, you will
probably use the \type {insert_after} helper anyway.

\stopsection

\startsection[title={Observations}]

So how do these variants perform? As we no longer have \type {fast} in the engine
that I use for this text, we can only check \type {getfield}, where we can
simulate fast mode by calling the \type {__index} metamethod directly. In
practice the \type {getnext} helper will be somewhat faster because no key has to
be checked, although the \type {getfield} functions have been optimized according
to the frequencies of the accessed keys already.

\starttabulate
\HL
\NC access method \NC seconds \NC \NR
\HL
\NC node[*] \NC 0.516 \NC \NR
\NC node.fast.getfield \NC 0.616 \NC \NR
\NC node.getfield \NC 0.494 \NC \NR
\NC node.direct.getfield \NC 0.172 \NC \NR
\HL
\stoptabulate

Here we simulate a dumb 20 times node count of 200 paragraphs of \type
{tufte.tex} with a little bit of overhead for wrapping in functions. \footnote
{When typesetting Arabic or using complex fonts we quickly get a tenfold.} We
encounter over three million nodes this way. We average over a couple of runs.

\starttyping
local function check(current)
    local n = 0
    while current do
        n = n + 1
        current = getfield(current,"next") -- current = current.next
    end
    return n
end
\stoptyping

What we see here is that indexed access is quite okay given the number of nodes,
but that direct is much faster. Of course we will never see that gain in practice
because much more happens than counting and because we also spend time in \TEX.
The 300\% speedup will eventually go down to one tenth of that.
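
For the curious, a rough standalone analogue of this benchmark can be written in
plain \LUA\ (the real measurement runs inside \LUATEX\ over actual node lists, so
absolute numbers are not comparable; \type {memory}, \type {toproxy} and \type
{getnext} are invented stand||ins here):

```lua
-- Build a simulated linked node list in a plain table; 'false' ends the list.
local memory = { }
for i = 1, 100000 do
    memory[i] = { next = (i < 100000) and (i + 1) or false }
end

-- Userdata-like model: every step allocates a proxy via __index.
local proxymeta
local function toproxy(ref)
    if not ref then return nil end
    return setmetatable({ __ref = ref }, proxymeta)
end
proxymeta = {
    __index = function(p, key)
        local raw = memory[p.__ref][key]
        if raw == false then return nil end
        return toproxy(raw)
    end,
}

-- Direct-like model: plain integers, no allocation at all.
local function getnext(ref)
    local raw = memory[ref].next
    if raw == false then return nil end
    return raw
end

local function count_indexed(head) -- walk via current = current.next
    local n, current = 0, toproxy(head)
    while current do
        n = n + 1
        current = current.next
    end
    return n
end

local function count_direct(head) -- walk via current = getnext(current)
    local n, current = 0, head
    while current do
        n = n + 1
        current = getnext(current)
    end
    return n
end

local t1 = os.clock() local a = count_indexed(1) t1 = os.clock() - t1
local t2 = os.clock() local b = count_direct (1) t2 = os.clock() - t2
print(a, t1, b, t2)
```

Both walks report the same count; the timings depend on the machine and the
\LUA\ version, but the allocation||free walk is consistently the cheaper one.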

Because \CONTEXT\ avoids node list processing when possible, the baseline
performance is not influenced much.

\starttyping
\starttext \dorecurse{1000}{test\page} \stoptext
\stoptyping

With \LUATEX\ we get some 575 pages per second and with \LUAJITTEX\ more than 610
pages per second.

\starttyping
\setupbodyfont[pagella]

\edef\zapf{\cldcontext
  {context(io.loaddata(resolvers.findfile("zapf.tex")))}}

\starttext \dorecurse{1000}{\zapf\par} \stoptext
\stoptyping

For this test \LUATEX\ needs 3.9 seconds and runs at 54 pages per second, while
\LUAJITTEX\ needs only 2.3 seconds and gives us 93 pages per second.

Just for the record, if we run this:

\starttyping
\starttext
\stoptext
\stoptyping

a \LUATEX\ run takes 0.229 seconds and a \LUAJITTEX\ run 0.178 seconds. This
includes initializing fonts. If we run just this:

\starttyping
\stoptext
\stoptyping

\LUATEX\ needs 0.199 seconds and \LUAJITTEX\ only 0.082 seconds. So, in the
meantime, we hardly spend any time on startup. Launching the binary and managing
the job with \type {mtxrun} calling \type {mtx-context} adds 0.160 seconds of
overhead. Of course this is only true when you have already run \CONTEXT\ once,
as the operating system normally caches files (in our case format files and
fonts). This means that by now an edit|-|preview cycle is quite convenient.
\footnote {I use \SCITE\ with dedicated lexers as editor and currently \type
{sumatrapdf} as previewer.}

As a more practical test we used the current version of \type {fonts-mkiv} (166
pages, using all kinds of font tricks and tracing), \type {about} (60 pages,
quite some traced math) and a torture test of Arabic text (61 pages of dense
text). The following measurements are from 2013-07-05, after adapting some 50
files to the new model. Keep in mind that the old binary can fake a fast getfield
and setfield, but that the other getters are wrapped functions. The more we have,
the slower it gets. We used the mingw versions.

\starttabulate[|l|r|r|r|]
\HL
\NC version \NC fonts \NC about \NC arabic \NC \NR
\HL
\NC old mingw, indexed plus some functions \NC 8.9 \NC 3.2 \NC 20.3 \NC \NR
\NC old mingw, fake functions \NC 9.9 \NC 3.5 \NC 27.4 \NC \NR
\HL
\NC new mingw, node functions \NC 9.0 \NC 3.1 \NC 20.8 \NC \NR
\NC new mingw, indexed plus some functions \NC 8.6 \NC 3.1 \NC 19.6 \NC \NR
\NC new mingw, direct functions \NC 7.5 \NC 2.6 \NC 14.4 \NC \NR
\HL
\stoptabulate

The second row shows what happens when we use the adapted \CONTEXT\ code with an
older binary. We're slower. The last row is what we will have eventually. All
documents show a nice gain in speed, and future extensions to \CONTEXT\ will no
longer have the same impact as before. This is because what we see here also
includes \TEX\ activity. The 300\% increase in speed of node access makes node
processing less influential. On average we gain 25\% here, and as on these
documents \LUAJITTEX\ gives us some 40\% gain on indexed access, it gives more
than 50\% on the direct function based variant.

In the fonts manual some 25 million getter accesses happen while the setters
don't exceed one million. I lost the tracing files, but at some point the Arabic
test showed more than 100 million accesses. So it's safe to conclude that
setters are more or less negligible. In the fonts manual the number of accesses
to the previous node was less than 5000, while the id and next fields were the
clear winners, and the list and leader fields also scored high. Of course it all
depends on the kind of document and the features used, but we think that the
current set of helpers is quite adequate. And because we decided to provide them
for normal nodes as well, there is no need to go direct in more simple cases.

Maybe in the future further tracing might show that adding getters for the width,
height, depth and other properties of glyph, glue, kern, penalty, rule, hlist and
vlist nodes can be of help, but quite probably only in direct mode combined with
extensive list manipulations. We will definitely explore other getters, but only
after the current set has proven to be useful.

\stopsection

\startsection[title={Nuts}]

So why go nuts, and what are nuts? In Dutch \quote {node} sounds a bit like
\quote {noot}, which translates back to \quote {nut}. And as in \CONTEXT\ I
needed a word for these direct nodes, they became \quote {nuts}. It also suits
this project: at some point we were going nuts because we couldn't squeeze more
out of \LUAJITTEX, so we started looking at other options. And we're sure some
folks consider us to be nuts anyway, because we spend time on speeding up. And
adapting the \LUATEX\ and \CONTEXT\ \MKIV\ code mid||summer is also kind of nuts.

At the \CONTEXT\ 2013 conference we will present this new magic, and around that
time we will have done enough tests to see if it works out well. The \LUATEX\
engine will provide the new helpers, but they will stay experimental for a while
as one never knows where we messed up.

I end with another measurement set. Every now and then I play with a \LUA\
variant of the \TEX\ par builder. At some point it will show up in \MKIV, but
first I want to abstract it a bit more and provide some hooks. In order to test
the performance I use the following tests:

% \testfeatureonce{1000}{\tufte \par}

\starttyping
\testfeatureonce{1000}{\setbox0\hbox{\tufte}}

\testfeatureonce{1000}{\setbox0\vbox{\tufte}}

\startparbuilder[basic]
  \testfeatureonce{1000}{\setbox0\vbox{\tufte}}
\stopparbuilder
\stoptyping

We use an \type {\hbox} to determine the baseline performance. Then we break
lines using the built|-|in parbuilder. Next we do the same but now with the
\LUA\ variant. \footnote {If we also enable protrusion and hz, the \LUA\ variant
suffers less because it implements them more efficiently.}

\starttabulate[|l|l|l|l|l|]
\HL
\NC \NC \bf \rlap{luatex} \NC \NC \bf \rlap{luajittex} \NC \NC \NR
\HL
\NC \NC \bf total \NC \bf linebreak \NC \bf total \NC \bf linebreak \NC \NR
\HL
\NC 223 pp nodes \NC 5.67 \NC 2.25 flushing \NC 3.64 \NC 1.58 flushing \NC \NR
\HL
\NC hbox nodes \NC 3.42 \NC \NC 2.06 \NC \NC \NR
\NC vbox nodes \NC 3.63 \NC 0.21 baseline \NC 2.27 \NC 0.21 baseline \NC \NR
\NC vbox lua nodes \NC 7.38 \NC 3.96 \NC 3.95 \NC 1.89 \NC \NR
\HL
\NC 223 pp nuts \NC 4.07 \NC 1.62 flushing \NC 2.36 \NC 1.11 flushing \NC \NR
\HL
\NC hbox nuts \NC 2.45 \NC \NC 1.25 \NC \NC \NR
\NC vbox nuts \NC 2.53 \NC 0.08 baseline \NC 1.30 \NC 0.05 baseline \NC \NR
\NC vbox lua nodes \NC 6.16 \NC 3.71 \NC 3.03 \NC 1.78 \NC \NR
\NC vbox lua nuts \NC 5.45 \NC 3.00 \NC 2.47 \NC 1.22 \NC \NR
\HL
\stoptabulate

We see that in this test nuts have an advantage over nodes. In this case we
mostly measure simple font processing and there is no markup involved. Even a 223
page document with only simple paragraphs needs to be broken across pages,
wrapped in page ornaments and shipped out. The overhead tagged as \quote
{flushing} indicates how much extra time would have been involved in that. These
numbers demonstrate that with nuts the \LUA\ parbuilder performs 10\% better, so
we gain some. In a regular document only part of the processing involves
paragraph building, so switching to a \LUA\ variant has no big impact anyway,
unless we have simple documents (like novels). When we bring hz into the picture
performance will drop (and users occasionally report this), but here we already
found out that this is mostly an implementation issue: the \LUA\ variant suffers
less, so we will backport some of the improvements. \footnote {There are still
some aspects that can be improved. For instance these tests still check lists
for \type {prev} fields, something that is not needed in future versions.}

\stopsection

\startsection[title={\LUA\ 5.3}]

When we were working on this, the first working version of \LUA\ 5.3 was
announced. Apart from some minor changes that won't affect us, the most important
change is the introduction of integers deep down. On the one hand we can benefit
from this, given that we adapt the \TEX|-|\LUA\ interfaces a bit: the distinction
between \type {to_number} and \type {to_integer} for instance. And numbers are
always somewhat special in \TEX, as it relates to reproduction on different
architectures, also over time. There are some changes in conversion to string
(which need attention) and maybe at some time also in the automated casting from
strings to numbers (the last is no big deal for us).

On the one hand the integers might have a positive influence on performance,
especially as scaled points are integers and because fonts use them too (maybe
there is some advantage in memory usage). But we then also need a proper
efficient round function (or operator). I'm wondering if mixed integer and float
usage will be efficient, but on the other hand we don't do that many
calculations, so the benefits might outweigh the drawbacks.
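
Such a round function is easy to sketch ourselves (stock \LUA\ has \type
{math.floor} but no \type {math.round}; the following is our own sketch, not an
official API), for instance rounding half away from zero:

```lua
-- Minimal rounding helper: math.floor truncates toward minus infinity, so
-- adding 0.5 rounds halfway cases up for positive values; negative values
-- are mirrored so that rounding is symmetric around zero.
local function round(x)
    if x >= 0 then
        return math.floor(x + 0.5)
    else
        return -math.floor(-x + 0.5)
    end
end

-- Example: 10 points in scaled points (1 pt = 65536 sp), scaled by 1.2,
-- mapped back to an integer number of scaled points.
local sp = round(10 * 65536 * 1.2)
```

Whether such a helper should live at the \LUA\ end or become an operator in the
engine is exactly the kind of question the integer subtype raises.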

We noticed that 5.2 was somewhat faster, but that the experimental generational
garbage collector makes runs slower. Let's hope that the garbage collector
performance doesn't degrade. But the relative gain of node versus direct will
probably stay.

Because we already have an experimental setup, we will probably experiment a bit
with this in the future. Of course the question then is how \LUAJITTEX\ will work
out: as it is not even fully 5.2 compatible, it remains to be seen whether it
will support the next level. At least in \CONTEXT\ \MKIV\ we can prepare
ourselves, as we did with \LUA\ 5.2, so that we're ready when we follow up.

\stopsection

\stopchapter