diff --git a/doc/context/sources/general/manuals/hybrid/hybrid-jit.tex b/doc/context/sources/general/manuals/hybrid/hybrid-jit.tex
new file mode 100644
index 000000000..d769ccf80
--- /dev/null
+++ b/doc/context/sources/general/manuals/hybrid/hybrid-jit.tex
@@ -0,0 +1,653 @@
+% language=uk engine=luatex
+
+\startcomponent hybrid-backends
+
+\environment hybrid-environment
+
+\logo[SWIGLIB] {SwigLib}
+\logo[LUAJIT] {LuaJIT}
+\logo[LUAJITTEX]{Luajit\TeX}
+\logo[JIT] {jit}
+
+\startchapter[title={Just in time}]
+
+\startsection [title={Introduction}]
+
+Reading occasional announcements about \LUAJIT,\footnote {\LUAJIT\ is written by
+Mike Pall and more information about it and the technology it uses is at \type
+{http://luajit.org}, a site also worth visiting for its clean design.} one starts
+wondering if just||in||time compilation can speed up \LUATEX. As a side track of
+the \SWIGLIB\ project and after some discussion, Luigi Scarso decided to compile
+a version of \LUATEX\ that had the \JIT\ compiler as the \LUA\ engine. That's
+when our journey into \JIT\ began.
+
+We started with \LINUX\ 32-bit as this is what Luigi used at that time. Some
+quick first tests indicated that the \LUAJIT\ compiler made \CONTEXT\ \MKIV\ run
+faster but not that much. Because \LUAJIT\ claims to be much faster than stock
+\LUA, Luigi then played a bit with \type {ffi}, i.e.\ mixing \CCODE\ and \LUA,
+especially data structures. There is indeed quite some speed to gain here;
+unfortunately, we would have to mess up the \CONTEXT\ code base so much that one
+might wonder why \LUA\ was used in the first place. I could confirm these
+observations in a Xubuntu virtual machine in \VMWARE\ running under 32-bit
+Windows 8. So, we decided to conduct some more experiments.
+
+A next step was to create a 64-bit binary because the servers at \PRAGMA\ are
+\KVM\ virtual machines running a 64-bit OpenSuse 12.1 and 12.2. It took a bit of
+effort to get a \JIT\ version compiled because Luigi didn't want to mess up the
+regular codebase too much. This time we observed a speedup of about 40\% on some
+runs so we decided to move on to \WINDOWS\ to see if we could observe a similar
+effect there. And indeed, when we adapted Akira Kakuto's \WINDOWS\ setup a bit we
+could compile a version for \WINDOWS\ using the native \MICROSOFT\ compiler. On
+my laptop a similar speedup was observed, although by then we saw that in
+practice a 25\% speedup was about what we could expect. A bonus is that making
+formats and identifying fonts is also faster.
+
+So, at that stage, we could safely conclude that \LUATEX\ combined with \LUAJIT\
+made sense if you want a somewhat faster version. But where does the speedup come
+from? The easiest way to see if jitting has an effect is to turn it on and off.
+
+\starttyping
+jit.on()
+jit.off()
+\stoptyping
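+
+A minimal sketch of such an on/off comparison is shown below. It is not one of
+the benchmarks in this chapter: the helper \type {timed} and the loop count are
+made up for illustration, and we assume that the global \type {jit} table is
+reachable from the code that runs it (in stock \LUA\ it is absent, hence the
+guards).
+
+\starttyping
+-- hypothetical helper: time n calls to math.sin; the sum is returned as
+-- well so that the loop body cannot be optimized away as dead code
+local function timed(n)
+    local sin   = math.sin
+    local start = os.clock()
+    local sum   = 0
+    for i=1,n do
+        sum = sum + sin(i)
+    end
+    return os.clock() - start, sum
+end
+
+-- first run: compiler enabled (only when we run under LuaJIT)
+if jit then
+    jit.on()
+end
+local on = timed(5000000)
+
+-- second run: compiler disabled and previously compiled traces dropped
+if jit then
+    jit.off()
+    jit.flush()
+end
+local off = timed(5000000)
+
+print(string.format("jit on: %.2f s, jit off: %.2f s",on,off))
+\stoptyping
+
+With stock \LUA\ both timings come from the same interpreter, so any difference
+between them is noise; only with \LUAJIT\ can the two runs really differ.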
+
+To our surprise \CONTEXT\ runs are not much influenced by turning the jitter on
+or off. \footnote {We also tweaked some of the fine|-|tuning parameters of
+\LUAJIT\ but didn't notice any differences. In due time more tests will
+be done.} This means that the improvement comes from other places:
+
+\startitemize[packed,n]
+\startitem The virtual machine is a different one, and targets the platforms that
+it runs on. This means that regular bytecode also runs faster. \stopitem
+\startitem The garbage collector is the one from \LUA\ 5.2, so that can make a
+difference. It looks like memory consumption is somewhat lower. \stopitem
+\startitem Some standard library functions are recognized and supported in a more
+efficient way. Think of \type {math.sin}. \stopitem
+\startitem Some built-in functions like \type {type} are probably dealt with in
+a more efficient way. \stopitem
+\stopitemize
+
+The third item is an important one. We don't use that many standard functions.
+For instance, if we need to go from characters to bytes and vice versa, we have
+to do that for \UTF\ so we use some dedicated functions or \LPEG. If in \CONTEXT\
+we parse strings, we often use \LPEG\ instead of string functions anyway. And if
+we still do use string functions, for instance when dealing with simple strings,
+it only happens a few times.
+
+The more demanding \CONTEXT\ code deals with node lists, which means frequent
+calls to core \LUATEX\ functions. Alas, jitting doesn't help much there unless we
+start messing with \type {ffi} which is not on the agenda. \footnote {If we want
+to improve these mechanisms it makes much more sense to make more helpers.
+However, profiling has shown us that the most demanding code is already quite
+optimized.}
+
+\stopsection
+
+\startsection[title=Benchmarks]
+
+Let's look at some of the benchmarks. The first one uses \METAPOST\ and because
+we want to see if calculations are faster, we draw a path with a special pen so
+that some transformations have to be done in the code that generates the \PDF\
+output. We only show the \MSWINDOWS\ and 64-bit \LINUX\ tests here. The 32-bit
+\LINUX\ tests are consistent with those on \MSWINDOWS\ so we didn't add those timings
+here (also because in the meantime Luigi's machine broke down and he moved on
+to 64 bits).
+
+\typefile{benchmark-1.tex}
+
+The following times are measured in seconds. They are averages of 5~runs. There
+is a significant speedup but jitting doesn't do much.
+
+% mingw crosscompiled 5.2 / new mp : 25.5
+
+\starttabulate[|l|r|r|r|]
+\HL
+\NC \NC traditional \NC \JIT\ on \NC \JIT\ off \NC \NR
+\HL
+\NC \bf Windows 8 \NC 26.0 \NC 20.6 \NC 20.8 \NC \NR
+\NC \bf Linux 64 \NC 34.2 \NC 14.9 \NC 14.1 \NC \NR
+\HL
+\stoptabulate
+
+Our second example uses multiple fonts in a paragraph and adds color as well.
+Although well optimized, font||related code involves node list parsing and a
+bit of calculation. Color again deals with node lists and the backend
+code involves calculations but not that many. The traditional run of the first
+benchmark on \LINUX\ is somewhat odd, but this might have to do with the fact
+that the \METAPOST\ library suffers from running in 64 bits. It is at least an
+indication that optimizations make less sense if there is a different dominant
+weak spot. We have to look into this some time.
+
+\typefile{benchmark-2.tex}
+
+Again jitting has no real benefits here, but the overall gain in speed is quite
+nice. It could be that the garbage collector plays a role here.
+
+% mingw crosscompiled 5.2 / new mp : 64.3
+
+\starttabulate[|l|r|r|r|]
+\HL
+\NC \NC traditional \NC \JIT\ on \NC \JIT\ off \NC \NR
+\HL
+\NC \bf Windows 8 \NC 54.6 \NC 36.0 \NC 35.9 \NC \NR
+\NC \bf Linux 64 \NC 46.5 \NC 32.0 \NC 31.7 \NC \NR
+\HL
+\stoptabulate
+
+This benchmark writes quite a lot of data to the console, which can have an impact on
+performance as \TEX\ flushes on a per||character basis. When one runs \TEX\ as a
+service this has less impact because in that case the output goes into the void.
+There is a lot of file reading going on here, but normally the operating system
+will cache data, so after a first run this effect disappears. \footnote {On \MSWINDOWS\
+it makes sense to use \type {console2} because due to some clever buffering
+tricks it has a much better performance than the default console.}
+
+The third benchmark is one that we often use for testing regression in speed of
+the \CONTEXT\ core code. It measures the overhead in the page builder without
+special tricks being used, like backgrounds. The document has some 1000 pages.
+
+\typefile{benchmark-3.tex}
+
+These numbers are already quite okay for the normal version but the speedup of
+the \LUAJIT\ version is consistent with the expectations we have by now.
+
+% mingw crosscompiled 5.2 / new mp : 6.8
+
+\starttabulate[|l|r|r|r|]
+\HL
+\NC \NC traditional \NC \JIT\ on \NC \JIT\ off \NC \NR
+\HL
+\NC \bf Windows 8 \NC 4.5 \NC 3.6 \NC 3.6 \NC \NR
+\NC \bf Linux 64 \NC 4.8 \NC 3.9 \NC 4.0 \NC \NR
+\HL
+\stoptabulate
+
+The fourth benchmark uses some structuring, which involves \LUA\ tables and
+housekeeping, an itemize, which involves numbering and conversions, and a table
+mechanism that uses more \LUA\ than \TEX.
+
+\typefile{benchmark-4.tex}
+
+Here it looks like \JIT\ slows down the process, but of course we shouldn't take the last
+digit too seriously.
+
+% mingw crosscompiled 5.2 / new mp : 27.4
+
+\starttabulate[|l|r|r|r|]
+\HL
+\NC \NC traditional \NC \JIT\ on \NC \JIT\ off \NC \NR
+\HL
+\NC \bf Windows 8 \NC 20.9 \NC 16.8 \NC 16.5 \NC \NR
+\NC \bf Linux 64 \NC 20.4 \NC 16.0 \NC 16.1 \NC \NR
+\HL
+\stoptabulate
+
+Again, this example does a bit of logging, but not that much reading from file as
+buffers are kept in memory.
+
+We should start wondering when \JIT\ does kick in. This is what the fifth
+benchmark does.
+
+\typefile{benchmark-5.tex}
+
+Here we see \JIT\ having an effect! First of all the \LUAJIT\ versions are now 4~times
+faster. Making the \type {sin} a \type {local} function (the numbers after /) does not
+make much of a difference because the math functions are optimized anyway. See how
+we're still faster when \JIT\ is disabled:
+
+% mingw crosscompiled 5.2 / new mp : 2.5/2.1
+
+\starttabulate[|l|r|r|r|]
+\HL
+\NC \NC traditional \NC \JIT\ on \NC \JIT\ off \NC \NR
+\HL
+\NC \bf Windows 8 \NC 1.97 / 1.54 \NC 0.46 / 0.45 \NC 0.73 / 0.61 \NC \NR
+\NC \bf Linux 64 \NC 1.62 / 1.27 \NC 0.41 / 0.42 \NC 0.67 / 0.52 \NC \NR
+\HL
+\stoptabulate
+
+Unfortunately this kind of calculation (in these amounts) doesn't happen that
+often but maybe some users can benefit.
+
+\stopsection
+
+\startsection[title=Conclusions]
+
+So, does it make sense to complicate the \LUATEX\ build with \LUAJIT ? It does
+when speed matters, for instance when \CONTEXT\ is run as a service. Some 25\% gain
+in speed means less waiting time, better use of \CPU\ cycles, less energy
+consumption, etc. On the other hand, computers are still becoming faster and compared
+to those speed|-|ups the 25\% is not that much. Also, as \TEX\ deals with files,
+the advance of \SSD\ disks and larger and faster memory helps too. Faster and
+larger \CPU\ caches contribute too. Multiple cores, however, don't help that
+much on a system that only runs \TEX. Interesting is that multi|-|core
+architectures tend to run at slower speeds than single cores where more heat can
+be dissipated and in that respect servers mostly running \TEX\ are better off with
+fewer cores that can run at higher frequencies. But anyhow, 25\% is still better
+than nothing and it makes my old laptop feel faster. It prolongs the lifetime
+of machines!
+
+Now, say that we cannot speed up \TEX\ itself that much, but that there is still
+something to gain at the \LUA\ end \emdash\ what can we reasonably expect? First of all
+we need to take into account that only part of the runtime is due to \LUA. Say
+that this is 25\% for a document of average complexity.
+
+\startnarrower
+runtime\low{tex} + runtime\low{lua} = 100
+\stopnarrower
+
+We can consider the time needed by \TEX\ to be constant; so if that is
+75\% of the total time (say 100 seconds) to begin with, we have:
+
+\startnarrower
+75 + runtime\low{lua} = 100
+\stopnarrower
+
+It will be clear that if we bring down the runtime to 80\% (80 seconds) of the
+original we end up with:
+
+\startnarrower
+75 + runtime\low{lua} = 80
+\stopnarrower
+
+And the 25 seconds spent in \LUA\ went down to 5, meaning that \LUA\ processing
+got 5 times faster! It is also clear that getting much more out of \LUA\
+becomes hard. Of course we can squeeze more out of it, but \TEX\ still needs its
+time. It is hard to measure how much time is actually spent in \LUA. We do keep
+track of some times but it is not that accurate. These experiments and the gain
+in speed indicate that we probably spend more time in \LUA\ than we first
+guessed. If you look in the \CONTEXT\ source it's not that hard to imagine that
+indeed we might well spend 50\% or more of our time in \LUA\ and|/|or in
+transferring control between \TEX\ and \LUA. So, in the end there still might
+be something to gain.
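+
+The same arithmetic can be written in a general form. This is only a sketch: the
+symbols are introduced here for illustration, with $t$ the original runtime,
+$t'$ the new runtime, $f$ the fraction of the runtime spent in \LUA\ and $s$ the
+factor by which the \LUA\ part gets faster. The time needed by \TEX\ is assumed
+to stay constant.
+
+\startformula
+t' = t \times \left( (1 - f) + \frac{f}{s} \right)
+\stopformula
+
+With $t = 100$, $f = 0.25$ and $s = 5$ this gives $100 \times (0.75 + 0.05) = 80$
+seconds, as above, and even an infinitely fast \LUA\ cannot bring the run below
+75 seconds. If instead half of the time is spent in \LUA, so $f = 0.5$, then a
+\LUA\ that is merely twice as fast already gives $100 \times (0.5 + 0.25) = 75$
+seconds.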
+
+Let's take benchmark 4 as an example. At some point we measured for a regular
+\LUATEX\ 0.74 run 27.0 seconds and for a \LUAJITTEX\ run 23.3 seconds. If we
+assume that the \LUAJIT\ virtual machine is twice as fast as the normal one, some
+juggling with numbers makes us conclude that \TEX\ takes some 19.6 seconds of
+this. An interesting border case is \type {\directlua}: we sometimes pass quite
+a lot of data and that gets tokenized first (a \TEX\ activity) and the resulting
+token list is converted into a string (also a \TEX\ activity) and then converted
+to bytecode (a \LUA\ task) and when okay executed by \LUA. The time involved in
+conversion to byte code is probably the same for stock \LUA\ and \LUAJIT.
+
+In the \LUATEX\ case, 30\% of the runtime for benchmark 4 is on \LUA's tab, and
+in \LUAJITTEX\ it's 15\%. We can try to bring down the \LUA\ part even more, but
+it makes more sense to gain something at the \TEX\ end. There macro expansion
+can be improved (read: \CONTEXT\ core code) but that is already rather
+optimized.
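+
+The juggling with numbers can be spelled out as follows. The symbols are chosen
+here for illustration: $t$ is the time spent in \TEX\ and $j$ the time spent in
+the \LUAJIT\ virtual machine, while the time spent in stock \LUA\ is assumed to
+be $2j$.
+
+\startformula
+\startalign
+\NC t + j  \NC = 23.3 \NR
+\NC t + 2j \NC = 27.0 \NR
+\NC j      \NC = 27.0 - 23.3 = 3.7 \NR
+\NC t      \NC = 23.3 - 3.7 = 19.6 \NR
+\stopalign
+\stopformula
+
+Under that assumption stock \LUA\ accounts for $7.4 / 27.0 \approx 0.27$ of the
+regular run and \LUAJIT\ for $3.7 / 23.3 \approx 0.16$ of the \LUAJITTEX\ run,
+which are the shares that the text rounds to 30\% and 15\%.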
+
+Just for the sake of completeness Luigi compiled a stock \LUATEX\ binary for 64-bit
+\LINUX\ with the \type {-O3} option (which forces more inlining of functions
+as well as a different switch mechanism). We did a few tests and this is the result:
+
+\starttabulate[|lTB|r|r|]
+\HL
+\NC \NC \LUATEX\ 0.74 -O2 \NC \LUATEX\ 0.74 -O3 \NC \NR
+\HL
+\NC benchmark-1 \NC 15.5 \NC 15.0 \NC \NR
+\NC benchmark-2 \NC 35.8 \NC 34.0 \NC \NR
+\NC benchmark-3 \NC 4.0 \NC 3.9 \NC \NR
+\NC benchmark-4 \NC 16.0 \NC 15.8 \NC \NR
+\HL
+\stoptabulate
+
+This time we used \type {--batch} and \type {--silent} to eliminate terminal
+output. So, if you really want to squeeze out the maximum performance you need
+to compile with \type {-O3}, use \LUAJITTEX\ (with the faster virtual machine)
+but disable \JIT\ (disabled by default anyway).
+
+% tex + jit = 23.3
+% tex + lua = 27.0
+% lua = 2*jit % cf roberto
+%
+% so:
+%
+% 2*tex + 2*jit = 46.6
+% tex + 2*jit = 27.0
+% -------------------- -
+% tex = 19.6
+%
+% ratios:
+%
+% tex : lua = 70 : 30
+% tex : jit = 85 : 15
+
+We have no reason to abandon stock \LUA. Also, because during these experiments
+we were still using \LUA\ 5.1 we started wondering what the move to 5.2 would
+bring. Such a move forward also means that \CONTEXT\ \MKIV\ will not depend on
+specific \LUAJIT\ features, although it is aware of it (this is needed because we
+store bytecodes). But we will definitely explore the possibilities and see where
+we can benefit. In that respect there will be a way to enable and
+disable jitting. So, users have the choice to use either stock \LUATEX\ or the
+\JIT||aware version but we default to the regular binary.
+
+As we use stock \LUA\ as the benchmark, we will use the \type {bit32} library, while
+\LUAJIT\ has its own bit library. Some functions can be aliased so that is no big
+deal. In \CONTEXT\ we use wrappers anyway. More problematic is that we want to
+move on to \LUA\ 5.2 and not all 5.2 features are supported (yet) in \LUAJIT. So,
+if \LUAJIT\ is mandatory in a workflow, then users had better make sure that the
+\LUA\ code is compatible. We don't expect too many problems in \CONTEXT\ \MKIV.
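+
+As an illustration of such aliasing, here is a minimal sketch. It is not the
+wrapper that \CONTEXT\ uses: the table \type {bits} is a made|-|up name, and we
+assume that the \LUAJIT\ library can be loaded with \type {require("bit")}. Only
+functions that exist in both libraries are mapped.
+
+\starttyping
+-- hypothetical wrapper, not the one used in ConTeXt
+local bits = bit32                        -- Lua 5.2 library, when present
+
+if not bits then
+    local ok, b = pcall(require,"bit")    -- LuaJIT bit library
+    if ok then
+        bits = {
+            band    = b.band,
+            bor     = b.bor,
+            bxor    = b.bxor,
+            lshift  = b.lshift,
+            rshift  = b.rshift,
+            lrotate = b.rol,              -- same operation, different name
+            rrotate = b.ror,
+        }
+    end
+end
+
+if bits then
+    print(bits.band(0xFF,0x0F))           -- 15 with either library
+end
+\stoptyping
+
+Such aliases only cover the common cases: \type {bit32} returns unsigned values
+while the \LUAJIT\ library returns signed 32-bit values, so results with the
+highest bit set differ in sign and might need normalizing in a real wrapper.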
+
+\stopsection
+
+\startsection[title=About speed]
+
+It is worth mentioning that the \LUA\ version in \LUATEX\ has a patch for
+converting floats into strings. Instead of some \type {INF#} result we just
+return zero, simply because \TEX\ is integer||based and intercepting incredibly
+small numbers is too cumbersome. We had to apply the same patch in the \JIT\
+version.
+
+The benchmarks only indicate a trend. In a real document much more happens than
+in the above tests. So what are measurements worth? Say that we compile the \TEX
+book. This grandparent of all documents coded in \TEX\ is rather plainly coded
+(using of course plain \TEX) and compiles pretty fast. Processing does not suffer
+from complex expansions, there is no color, hardly any text manipulation, it's
+all 8 bit, the pagebuilder is straightforward as is all spacing. Although on my
+old machine I can get \CONTEXT\ to run at over 200 pages per second, this quickly
+drops to 10\% of that speed when we add some color, backgrounds, headers and
+footers, font switches, etc.
+
+So, running documents like the \TEX book for comparing the speed of, say,
+\PDFTEX, \XETEX, \LUATEX\ and now \LUAJITTEX\ makes no sense. The first one is
+still eight bit, the rest are \UNICODE. Also, the \TEX book uses traditional
+fonts with traditional features, so effectively it doesn't rely on anything
+that the new engines provide, not even \ETEX\ extensions. On the other hand, a
+recent document uses advanced fonts, properties like color and|/|or
+transparencies, hyperlinks, backgrounds, complex cover pages or chapter openings,
+embeds graphics, etc. Such a document might not even process in \PDFTEX\ or
+\XETEX, and if it does, it's still comparing different technologies: eight bit
+input and fast fonts in \PDFTEX, frozen \UNICODE\ and wide font support in
+\XETEX, as opposed to additional trickery and control, written in \LUA, in \LUATEX. So, when we
+investigate speed, we need to take into account what (font and input)
+technologies are used as well as what complicating layout and rendering features
+play a role. In practice speed only matters in an edit|-|view cycle and services
+where users wait for some result.
+
+It's rather hard to find a recent document that can be used to compare these
+engines. The best we could come up with was the rendering of the user interface
+documentation.
+
+\starttabulate[|T|T|T|T||]
+\NC texexec \NC --engine=pdftex \NC --global \NC x-set-12.mkii \NC 5.9 seconds \NC \NR
+\NC texexec \NC --engine=xetex \NC --global \NC x-set-12.mkii \NC 6.2 seconds \NC \NR
+\NC context \NC --engine=luatex \NC --global \NC x-set-12.mkiv \NC 6.2 seconds \NC \NR
+\NC context \NC --engine=luajittex \NC --global \NC x-set-12.mkiv \NC 4.6 seconds \NC \NR
+\stoptabulate
+
+Keep in mind that \type{texexec} is a \RUBY\ script and uses \type {kpsewhich}
+while \type {context} uses \LUA\ and its own (\TDS||compatible) file manager. But
+still, it is interesting to see that there is not that much difference if we keep
+\JIT\ out of the picture. This is because in \MKIV\ we have somewhat more clever
+\XML\ processing, although earlier measurements have demonstrated that in this
+case not that much speedup can be assigned to that.
+
+And so recent versions of \MKIV\ already keep up rather well with the older eight
+bit world. We do way more in \MKIV\ and the interfacing macros are nicer but
+potentially somewhat slower. Some mechanisms might be more efficient because of
+using \LUA, but some actually have more overhead because we keep track of more
+data. Font feature processing is done in \LUA, but somehow can keep up with the
+libraries used in \XETEX, or at least the difference is not that significant,
+although I can think of more demanding tasks. Of course in \LUATEX\ we can go
+beyond what libraries provide.
+
+No matter what one takes into account, performance is not that much worse in
+\LUATEX, and if we enable \JIT\ and so remove some of the traditional \LUA\
+virtual machine overhead, we're even better off. Of course we need to add a
+disclaimer here: don't force us to prove that the relative speed ratios are the
+same for all cases. In fact, it being so hard to measure and compare, performance
+can be considered to be something taken for granted as there is not that much we
+can do about getting nicer numbers, apart from maybe parallelizing which brings
+other complexities into the picture. On our servers, a few other virtual machines
+running \TEX\ services kicking in at the same time, using \CPU\ cycles, network
+bandwidth (as all data lives someplace else) and asking for disk access have much
+more impact than the 25\% we gain. Of course if all processes run faster then
+we've gained something.
+
+For what it's worth: processing this text takes some 2.3 seconds on my laptop for
+regular \LUATEX\ and 1.8 seconds with \LUAJITTEX, including the extra overhead of
+restarting. As this is a rather average example it fits earlier measurements.
+
+Processing a font manual (work in progress) takes \LUAJITTEX\ 15 seconds for 112
+pages compared to 18.4 seconds for \LUATEX. The not yet finished manual loads 20
+different fonts (each with multiple instances), uses colors, has some \METAPOST\
+graphics and does some font juggling. The gain in speed sounds familiar.
+
+\stopsection
+
+\startsection[title=The future]
+
+At the 2012 \LUA\ conference Roberto Ierusalimschy mentioned that the virtual
+machine of \LUAJIT\ is about twice as fast due to it being partly done in
+assembler while the regular machinery is written in standard \CCODE\ and keeps
+portability in mind.
+
+He also presented some plans for future versions of \LUA. There will be some
+lightweight helpers for \UTF. Our experience so far is that only a handful of
+functions are actually needed: byte to character conversions and vice versa,
+iterators for \UTF\ characters and \UTF\ values, and maybe a simple substring
+function are probably enough. Currently \LUATEX\ has some extra string iterators
+and it will provide the converters as well.
+
+There is a good chance that \LPEG\ will become a standard library (which it
+already is in \LUATEX), which is also nice. It's interesting that, especially on
+longer sequences, \LPEG\ can beat the string matchers and replacers, although
+when in a substitution no match and therefore no replacements happen, the regular
+gsub wins. We're talking small numbers here; in daily usage \LPEG\ is about as
+efficient as you can wish. In \CONTEXT\ we have a \type {lpeg.UR} and \type
+{lpeg.US} and it would be nice to have these as native \UTF\ related methods, but
+I must admit that I seldom need them.
+
+This and other extensions coming to the language also have some impact on a \JIT\
+version: the current \LUAJIT\ is already not entirely compatible with \LUA\ 5.2
+so you need to take that into account if you want to use this version of \LUATEX.
+So, unless \LUAJIT\ follows the mainstream development, as a \CONTEXT\ \MKIV\ user
+you should not depend on it. But at the moment it's nice to have this choice.
+
+The still experimental code will end up in the main \LUATEX\ repository in time
+before the \TEX\ Live 2013 code freeze. In order to make it easier to run both
+versions alongside each other, we have added the \LUA\ 5.2 built|-|in library \type {bit32}
+to \LUAJITTEX. We found out that it's too much trouble to add that library to
+\LUA~5.1 but \LUATEX\ has moved on to 5.2 anyway.
+
+\stopsection
+
+\startsection[title=Running]
+
+So, as we will definitely stick to stock \LUA, one might wonder if it makes sense
+to officially support jitting in \CONTEXT. First of all, \LUATEX\ is not
+influenced that much by the low level changes in the \API\ between 5.1 and 5.2.
+Also \LUAJIT\ does support the most important new 5.2 features, so at the moment
+we're mostly okay. We expect that eventually \LUAJIT\ will catch up but if not,
+we are not in big trouble: the performance of stock \LUA\ is quite okay and above
+all, it's portable! \footnote {Stability and portability are important properties
+of \TEX\ engines, which is yet another reason for using \LUA. For those doing
+number crunching in a document, \JIT\ can come in handy.} For the moment you can
+consider \LUAJITTEX\ to be an experiment and research tool, but we will do our
+best to keep it production ready.
+
+So how do we choose between the two engines? After some experimenting with
+alternative startup scenarios and dedicated caches, the following solution was
+reached:
+
+\starttyping
+context --engine=luajittex ...
+\stoptyping
+
+The usual preamble line also works:
+
+\starttyping
+% engine=luajittex
+\stoptyping
+
+As the main infrastructure uses the \type {luatex} and related binaries, this
+will result in a relaunch: the \type {context} script will be restarted using
+\type {luajittex}. This is a simple solution and the overhead is rather minimal,
+especially compared to the somewhat faster run. Alternatively you can copy \type
+{luajittex} over \type {luatex} but that is more drastic. Keep in mind that \type
+{luatex} is the benchmark for development of \CONTEXT, so the \JIT\ aware version
+might fall behind sometimes.
+
+Yet another approach is adapting the configuration file, or better, providing (or
+adapting) your own \type {texmfcnf.lua} in, for instance, the \type {texmf-local/web2c}
+path:
+
+\starttyping
+return {
+    type    = "configuration",
+    version = "1.2.3",
+    date    = "2012-12-12",
+    time    = "12:12:12",
+    comment = "Local overloads",
+    author  = "Hans Hagen, PRAGMA-ADE, Hasselt NL",
+    content = {
+        directives = {
+            ["system.engine"] = "luajittex",
+        },
+    },
+}
+\stoptyping
+
+This has the same effect as always providing \type {--engine=luajittex} but only
+makes sense in well controlled situations as you might easily forget that it's
+the default. Of course one could have that file and just comment out the
+directive unless in test mode.
+
+Because the bytecode of \LUAJIT\ differs from the one used by \LUA\ itself we
+have a dedicated format as well as dedicated bytecode compiled resources (for
+instance \type {tmb} instead of \type {tmc}). For most users this is not
+something they should bother about as it happens automatically.
+
+Based on experiments, by default we have disabled \JIT, so we only benefit from
+the faster virtual machine. Future versions of \CONTEXT\ might provide some
+control over that but first we want to conduct more experiments.
+
+\stopsection
+
+\startsection[title=Addendum]
+
+These developments and experiments took place in November and December 2012. At
+the time of this writing we also made the move to \LUA\ 5.2 in stock \LUATEX; the
+first version to provide this was 0.74. Here are some measurements on Taco
+Hoekwater's 64-bit \LINUX\ machine:
+
+\starttabulate[|lTB|r|r|l|]
+\HL
+\NC \NC \LUATEX\ 0.70 \NC \LUATEX\ 0.74 \NC \NC \NR
+\HL
+\NC benchmark-1 \NC 23.67 \NC 19.57 \NC faster \NC \NR
+\NC benchmark-2 \NC 65.41 \NC 62.88 \NC faster \NC \NR
+\NC benchmark-3 \NC 4.88 \NC 4.67 \NC faster \NC \NR
+\NC benchmark-4 \NC 23.09 \NC 22.71 \NC faster \NC \NR
+\NC benchmark-5 \NC 2.56/2.06 \NC 2.66/2.29 \NC slower \NC \NR
+\HL
+\stoptabulate
+
+There is a good chance that this is due to improvements of the garbage collector,
+virtual machine and string handling. It also looks like memory consumption is a
+bit less. Some speed optimizations in reading files have been removed (at least
+for now) and some patches to the \type {format} function (in the \type {string}
+namespace) that dealt with (for \TEX) unfortunate number conversions have not
+been ported. The code base is somewhat cleaner and we expect to be able to split
+up the binary in a core program plus some libraries that are loaded on demand.
+\footnote {Of course this poses some constraints on stability as components get
+decoupled, but this is one of the issues that we hope to deal with properly in
+the library project.} In general, we don't expect too many issues in the
+transition to \LUA\ 5.2, and \CONTEXT\ is already adapted to support \LUATEX\
+with 5.2 as well as \LUAJITTEX\ with an older version.
+
+Running the same tests on a 32-bit \MSWINDOWS\ machine gives this:
+
+\starttabulate[|lTB|r|r|r|]
+\HL
+\NC \NC \LUATEX\ 0.70 \NC \LUATEX\ 0.74 \NC \NC \NR
+\HL
+\NC benchmark-1 \NC 26.4 \NC 25.5 \NC faster \NC \NR
+\NC benchmark-2 \NC 64.2 \NC 63.6 \NC faster \NC \NR
+\NC benchmark-3 \NC 7.1 \NC 6.9 \NC faster \NC \NR
+\NC benchmark-4 \NC 28.3 \NC 27.0 \NC faster \NC \NR
+\NC benchmark-5 \NC 1.95/1.50 \NC 1.84/1.48 \NC faster \NC \NR
+\HL
+\stoptabulate
+
+The gain is less impressive but the machine is rather old and we can benefit less
+from modern \CPU\ properties (cache, memory bandwidth, etc.). I tend to conclude
+that there is no significant improvement here but it also doesn't get worse.
+However we need to keep in mind that file \IO\ is less optimal in 0.74 so this
+might play a role. As usual, runtime is negatively influenced by the relatively
+slow speed of displaying messages on the console (even when we use \type
+{console2}).
+
+A few days before the end of 2012, Akira Kakuto compiled native \MSWINDOWS\
+binaries for both engines. This time I decided to run a comparison inside the
+\SCITE\ editor, which has very fast console output. \footnote {Most of my personal
+\TEX\ runs are from within \SCITE, while most runs on the servers are in batch
+mode, so normally the overhead of the console is acceptable or even negligible.}
+
+\starttabulate[|lTB|r|r|r|]
+\HL
+\NC \NC \LUATEX\ 0.74 (5.2) \NC \LUAJITTEX\ 0.72 (5.1) \NC \NC \NR
+\HL
+\NC benchmark-1 \NC 25.4 \NC 25.4 \NC similar \NC \NR
+\NC benchmark-2 \NC 54.7 \NC 36.3 \NC faster \NC \NR
+\NC benchmark-3 \NC 4.3 \NC 3.6 \NC faster \NC \NR
+\NC benchmark-4 \NC 20.0 \NC 16.3 \NC faster \NC \NR
+\NC benchmark-5 \NC 1.93/1.48 \NC 0.74/0.61 \NC faster \NC \NR
+\HL
+\stoptabulate
+
+Only the \METAPOST\ library and conversion benchmark didn't show a speedup. The
+regular \TEX\ tests 2||4 gain some 15||35\%. Enabling \JIT\ (off by default)
+slowed down processing. For the sake of completeness I also timed \LUAJITTEX\
+on the console, so here you see the improvement of both engines.
+
+\starttabulate[|lTB|r|r|r|]
+\HL
+\NC \NC \LUATEX\ 0.70 \NC \LUATEX\ 0.74 \NC \LUAJITTEX\ 0.72 \NC \NR
+\HL
+\NC benchmark-1 \NC 26.4 \NC 25.5 \NC 25.9 \NC \NR
+\NC benchmark-2 \NC 64.2 \NC 63.6 \NC 45.5 \NC \NR
+\NC benchmark-3 \NC 7.1 \NC 6.9 \NC 6.0 \NC \NR
+\NC benchmark-4 \NC 28.3 \NC 27.0 \NC 23.3 \NC \NR
+\NC benchmark-5 \NC 1.95/1.50 \NC 1.84/1.48 \NC 0.73/0.60 \NC \NR
+\HL
+\stoptabulate
+
+In this text, the term \JIT\ has come up a lot but you might rightfully wonder if
+the observations here relate to \JIT\ at all. For the moment I tend to conclude
+that the implementation of the virtual machine and garbage collection have more
+impact than the actual just||in||time compilation. More exploration of \JIT\ is
+needed to see if we can really benefit from that. Of course the fact that we use
+a bit less memory is also nice. In case you wonder why I bother about speed at
+all: we happen to run \LUATEX\ mostly as a (remote) service and generating a
+bunch of (related) documents takes a bit of time. Bringing the waiting down from
+15 to 10 seconds might not sound impressive but it makes a difference when it is
+someone's job to generate these sets.
+
+In summary: just before we entered 2013, we saw two rather fundamental updates of
+\LUATEX\ show up: an improved traditional one with \LUA\ 5.2 as well as the
+somewhat faster \LUAJITTEX\ with a mixture of 5.1 and 5.2. And in 2013 we
+will of course try to make them both even more attractive.
+
+\stopsection
+
+\stopchapter
+
+% benchmark-4:
+%
+% tex + jit = 23.3
+% tex + lua = 27.0
+% lua = 2*jit % cf roberto
+%
+% so:
+%
+% 2*tex + 2*jit = 46.6
+% tex + 2*jit = 27.0
+% -------------------- -
+% tex = 19.6
+%
+% ratios:
+%
+% tex : lua = 70 : 30
+% tex : jit = 85 : 15