diff options
Diffstat (limited to 'doc/context/sources/general/manuals/hybrid/hybrid-jit.tex')
-rw-r--r-- | doc/context/sources/general/manuals/hybrid/hybrid-jit.tex | 653 |
1 files changed, 653 insertions, 0 deletions
diff --git a/doc/context/sources/general/manuals/hybrid/hybrid-jit.tex b/doc/context/sources/general/manuals/hybrid/hybrid-jit.tex new file mode 100644 index 000000000..d769ccf80 --- /dev/null +++ b/doc/context/sources/general/manuals/hybrid/hybrid-jit.tex @@ -0,0 +1,653 @@ +% language=uk engine=luatex + +\startcomponent hybrid-backends + +\environment hybrid-environment + +\logo[SWIGLIB] {SwigLib} +\logo[LUAJIT] {LuaJIT} +\logo[LUAJITTEX]{Luajit\TeX} +\logo[JIT] {jit} + +\startchapter[title={Just in time}] + +\startsection [title={Introduction}] + +Reading occasional announcements about \LUAJIT,\footnote {\LUAJIT\ is written by +Mike Pall and more information about it and the technology it uses is at \type +{http://luajit.org}, a site also worth visiting for its clean design.} one starts +wondering if just||in||time compilation can speed up \LUATEX. As a side track of +the \SWIGLIB\ project and after some discussion, Luigi Scarso decided to compile +a version of \LUATEX\ that had the \JIT\ compiler as the \LUA\ engine. That's +when our journey into \JIT\ began. + +We started with \LINUX\ 32-bit as this is what Luigi used at that time. Some +quick first tests indicated that the \LUAJIT\ compiler made \CONTEXT\ \MKIV\ run +faster but not that much. Because \LUAJIT\ claims to be much faster than stock +\LUA, Luigi then played a bit with \type {ffi}, i.e.\ mixing \CCODE\ and \LUA, +especially data structures. There is indeed quite some speed to gain here; +unfortunately, we would have to mess up the \CONTEXT\ code base so much that one +might wonder why \LUA\ was used in the first place. I could confirm these +observations in a Xubuntu virtual machine in \VMWARE\ running under 32-bit +Windows 8. So, we decided to conduct some more experiments. + +A next step was to create a 64-bit binary because the servers at \PRAGMA\ are +\KVM\ virtual machines running a 64-bit OpenSuse 12.1 and 12.2. It took a bit of +effort to get a \JIT\ version compiled because Luigi didn't want to mess up the +regular codebase too much. This time we observed a speedup of about 40\% on some +runs so we decided to move on to \WINDOWS\ to see if we could observe a similar +effect there. And indeed, when we adapted Akira Kakuto's \WINDOWS\ setup a bit we +could compile a version for \WINDOWS\ using the native \MICROSOFT\ compiler. On +my laptop a similar speedup was observed, although by then we saw that in +practice a 25\% speedup was about what we could expect. A bonus is that making +formats and identifying fonts is also faster. + +So, in that stage, we could safely conclude that \LUATEX\ combined with \LUAJIT\ +made sense if you want a somewhat faster version. But where does the speedup come +from? The easiest way to see if jitting has effect is to turn it on and off. + +\starttyping +jit.on() +jit.off() +\stoptyping + +To our surprise \CONTEXT\ runs are not much influenced by turning the jitter on +or off. \footnote {We also tweaked some of the fine|-|tuning parameters of +\LUAJIT\ but didn't notice any differences. In due time more tests will +be done.} This means that the improvement comes from other places: + +\startitemize[packed,n] +\startitem The virtual machine is a different one, and targets the platforms that +it runs on. This means that regular bytecode also runs faster. \stopitem +\startitem The garbage collector is the one from \LUA\ 5.2, so that can make a +difference. It looks like memory consumption is somewhat lower. \stopitem +\startitem Some standard library functions are recognized and supported in a more +efficient way. Think of \type {math.sin}. \stopitem +\startitem Some built-in functions like \type {type} are probably dealt with in +a more efficient way. \stopitem +\stopitemize + +The third item is an important one. We don't use that many standard functions. +For instance, if we need to go from characters to bytes and vice versa, we have +to do that for \UTF\ so we use some dedicated functions or \LPEG. If in \CONTEXT\ +we parse strings, we often use \LPEG\ instead of string functions anyway. And if +we still do use string functions, for instance when dealing with simple strings, +it only happens a few times. + +The more demanding \CONTEXT\ code deals with node lists, which means frequent +calls to core \LUATEX\ functions. Alas, jitting doesn't help much there unless we +start messing with \type {ffi} which is not on the agenda. \footnote {If we want +to improve these mechanisms it makes much more sense to make more helpers. +However, profiling has shown us that the most demanding code is already quite +optimized.} + +\stopsection + +\startsection[title=Benchmarks] + +Let's look at some of the benchmarks. The first one uses \METAPOST\ and because +we want to see if calculations are faster, we draw a path with a special pen so +that some transformations have to be done in the code that generates the \PDF\ +output. We only show the \MSWINDOWS\ and 64-bit \LINUX\ tests here. The 32-bit +tests are consistent with those on \MSWINDOWS\ so we didn't add those timings +here (also because in the meantime Luigi's machine broke down and he moved on +to 64 bits). + +\typefile{benchmark-1.tex} + +The following times are measured in seconds. They are averages of 5~runs. There +is a significant speedup but jitting doesn't do much. + +% mingw crosscompiled 5.2 / new mp : 25.5 + +\starttabulate[|l|r|r|r|] +\HL +\NC \NC traditional \NC \JIT\ on \NC \JIT\ off \NC \NR +\HL +\NC \bf Windows 8 \NC 26.0 \NC 20.6 \NC 20.8 \NC \NR +\NC \bf Linux 64 \NC 34.2 \NC 14.9 \NC 14.1 \NC \NR +\HL +\stoptabulate + +Our second example uses multiple fonts in a paragraph and adds color as well. +Although well optimized, font||related code involves node list parsing and a +bit of calculation. Color again deals with node lists and the backend +code involves calculations but not that many. The traditional run on \LINUX\ is +somewhat odd, but might have to do with the fact that the \METAPOST\ library +suffers from the 64 bits. It is at least an indication that optimizations make +less sense if there is a different dominant weak spot. We have to look into this +some time. + +\typefile{benchmark-2.tex} + +Again jitting has no real benefits here, but the overall gain in speed is quite +nice. It could be that the garbage collector plays a role here. + +% mingw crosscompiled 5.2 / new mp : 64.3 + +\starttabulate[|l|r|r|r|] +\HL +\NC \NC traditional \NC \JIT\ on \NC \JIT\ off \NC \NR +\HL +\NC \bf Windows 8 \NC 54.6 \NC 36.0 \NC 35.9 \NC \NR +\NC \bf Linux 64 \NC 46.5 \NC 32.0 \NC 31.7 \NC \NR +\HL +\stoptabulate + +This benchmark writes quite a lot of data to the console, which can have impact on +performance as \TEX\ flushes on a per||character basis. When one runs \TEX\ as a +service this has less impact because in that case the output goes into the void. +There is a lot of file reading going on here, but normally the operating system +will cache data, so after a first run this effect disappears. \footnote {On \MSWINDOWS\ +it makes sense to use \type {console2} because due to some clever buffering +tricks it has a much better performance than the default console.} + +The third benchmark is one that we often use for testing regression in speed of +the \CONTEXT\ core code. It measures the overhead in the page builder without +special tricks being used, like backgrounds. The document has some 1000 pages. + +\typefile{benchmark-3.tex} + +These numbers are already quite okay for the normal version but the speedup of +the \LUAJIT\ version is consistent with the expectations we have by now. + +% mingw crosscompiled 5.2 / new mp : 6.8 + +\starttabulate[|l|r|r|r|] +\HL +\NC \NC traditional \NC \JIT\ on \NC \JIT\ off \NC \NR +\HL +\NC \bf Windows 8 \NC 4.5 \NC 3.6 \NC 3.6 \NC \NR +\NC \bf Linux 64 \NC 4.8 \NC 3.9 \NC 4.0 \NC \NR +\HL +\stoptabulate + +The fourth benchmark uses some structuring, which involved \LUA\ tables and +housekeeping, an itemize, which involves numbering and conversions, and a table +mechanism that uses more \LUA\ than \TEX. + +\typefile{benchmark-4.tex} + +Here it looks like \JIT\ slows down the process, but of course we shouldn't take the last +digit too seriously. + +% mingw crosscompiled 5.2 / new mp : 27.4 + +\starttabulate[|l|r|r|r|] +\HL +\NC \NC traditional \NC \JIT\ on \NC \JIT\ off \NC \NR +\HL +\NC \bf Windows 8 \NC 20.9 \NC 16.8 \NC 16.5 \NC \NR +\NC \bf Linux 64 \NC 20.4 \NC 16.0 \NC 16.1 \NC \NR +\HL +\stoptabulate + +Again, this example does a bit of logging, but not that much reading from file as +buffers are kept in memory. + +We should start wondering when \JIT\ does kick in. This is what the fifth +benchmark does. + +\typefile{benchmark-5.tex} + +Here we see \JIT\ having an effect! First of all the \LUAJIT\ versions are now 4~times +faster. Making the \type {sin} a \type {local} function (the numbers after /) does not +make much of a difference because the math functions are optimized anyway.. See how +we're still faster when \JIT\ is disabled: + +% mingw crosscompiled 5.2 / new mp : 2.5/2.1 + +\starttabulate[|l|r|r|r|] +\HL +\NC \NC traditional \NC \JIT\ on \NC \JIT\ off \NC \NR +\HL +\NC \bf Windows 8 \NC 1.97 / 1.54 \NC 0.46 / 0.45 \NC 0.73 / 0.61 \NC \NR +\NC \bf Linux 64 \NC 1.62 / 1.27 \NC 0.41 / 0.42 \NC 0.67 / 0.52 \NC \NR +\HL +\stoptabulate + +Unfortunately this kind of calculation (in these amounts) doesn't happen that +often but maybe some users can benefit. + +\stopsection + +\startsection[title=Conclusions] + +So, does it make sense to complicate the \LUATEX\ build with \LUAJIT ? It does +when speed matters, for instance when \CONTEXT\ is run as a service. Some 25\% gain +in speed means less waiting time, better use of \CPU\ cycles, less energy +consumption, etc. On the other hand, computers are still becoming faster and compared +to those speed|-|ups the 25\% is not that much. Also, as \TEX\ deals with files, +the advance of \SSD\ disks and larger and faster memory helps too. Faster and +larger \CPU\ caches contributes too. On the other hand, multiple cores don't help that +much on a system that only runs \TEX. Interesting is that multi|-|core +architectures tend to run at slower speeds than single cores where more heat can +be dissipated and in that respect servers mostly running \TEX\ are better off with +fewer cores that can run at higher frequencies. But anyhow, 25\% is still better +than nothing and it makes my old laptop feel faster. It prolongs the lifetime +of machines! + +Now, say that we cannot speed up \TEX\ itself that much, but that there is still +something to gain at the \LUA\ end \emdash\ what can we reasonably expect? First of all +we need to take into account that only part of the runtime is due to \LUA. Say +that this is 25\% for a document of average complexity. + +\startnarrower +runtime\low{tex} + runtime\low{lua} = 100 +\stopnarrower + +We can consider the time needed by \TEX\ to be constant; so if that is +75\% of the total time (say 100 seconds) to begin with, we have: + +\startnarrower +75 + runtime\low{lua} = 100 +\stopnarrower + +It will be clear that if we bring down the runtime to 80\% (80 seconds) of the +original we end up with: + +\startnarrower +75 + runtime\low{lua} = 80 +\stopnarrower + +And the 25 seconds spent in \LUA\ went down to 5, meaning that \LUA\ processing +got 5 times faster! It is also clear that getting much more out of \LUA\ +becomes hard. Of course we can squeeze more out of it, but \TEX\ still needs its +time. It is hard to measure how much time is actually spent in \LUA. We do keep +track of some times but it is not that accurate. These experiments and the gain +in speed indicate that we probably spend more time in \LUA\ than we first +guessed. If you look in the \CONTEXT\ source it's not that hard to imagine that +indeed we might well spend 50\% or more of our time in \LUA\ and|/|or in +transferring control between \TEX\ and \LUA. So, in the end there still might +be something to gain. + +Let's take benchmark 4 as an example. At some point we measured for a regular +\LUATEX\ 0.74 run 27.0 seconds and for a \LUAJITTEX\ run 23.3 seconds. If we +assume that the \LUAJIT\ virtual machine is twice as fast as the normal one, some +juggling with numbers makes us conclude that \TEX\ takes some 19.6 seconds of +this. An interesting border case is \type {\directlua}: we sometimes pass quite +a lot of data and that gets tokenized first (a \TEX\ activity) and the resulting +token list is converted into a string (also a \TEX\ activity) and then converted +to bytecode (a \LUA\ task) and when okay executed by \LUA. The time involved in +conversion to byte code is probably the same for stock \LUA\ and \LUAJIT. + +In the \LUATEX\ case, 30\% of the runtime for benchmark 4 is on \LUA's tab, and +in \LUAJITTEX\ it's 15\%. We can try to bring down the \LUA\ part even more, but +it makes more sense to gain something at the \TEX\ end. There macro expansion +can be improved (read: \CONTEXT\ core code) but that is already rather +optimized. + +Just for the sake of completeness Luigi compiled a stock \LUATEX\ binary for 64-bit +\LINUX\ with the \type {-o3} option (which forces more inlining of functions +as well as a different switch mechanism). We did a few tests and this is the result: + +\starttabulate[|lTB|r|r|] +\HL +\NC \NC \LUATEX\ 0.74 -o2 \NC \LUATEX\ 0.74 - o3 \NC \NR +\HL +\NC benchmark-1 \NC 15.5 \NC 15.0 \NC \NR +\NC benchmark-2 \NC 35.8 \NC 34.0 \NC \NR +\NC benchmark-3 \NC 4.0 \NC 3.9 \NC \NR +\NC benchmark-4 \NC 16.0 \NC 15.8 \NC \NR +\HL +\stoptabulate + +This time we used \type {--batch} and \type {--silent} to eliminate terminal +output. So, if you really want to squeeze out the maximum performance you need +to compile with \type {-o3}, use \LUAJITTEX\ (with the faster virtual machine) +but disable \JIT\ (disabled by default anyway). + +% tex + jit = 23.3 +% tex + lua = 27.0 +% lua = 2*jit % cf roberto +% +% so: +% +% 2*tex + 2*jit = 46.6 +% tex + 2*jit = 27.0 +% -------------------- - +% tex = 19.6 +% +% ratios: +% +% tex : lua = 70 : 30 +% tex : jit = 85 : 15 + +We have no reason to abandon stock \LUA. Also, because during these experiments +we were still using \LUA\ 5.1 we started wondering what the move to 5.2 would +bring. Such a move forward also means that \CONTEXT\ \MKIV\ will not depend on +specific \LUAJIT\ features, although it is aware of it (this is needed because we +store bytecodes). But we will definitely explore the possibilities and see where +we can benefit. In that respect there will be a way to enable and +disable jitting. So, users have the choice to use either stock \LUATEX\ or the +\JIT||aware version but we default to the regular binary. + +As we use stock \LUA\ as benchmark, we will use the \type {bit32} library, while +\LUAJIT\ has its own bit library. Some functions can be aliased so that is no big +deal. In \CONTEXT\ we use wrappers anyway. More problematic is that we want to +move on to \LUA\ 5.2 and not all 5.2 features are supported (yet) in \LUAJIT. So, +if \LUAJIT\ is mandatory in a workflow, then users had better make sure that the +\LUA\ code is compatible. We don't expect too many problems in \CONTEXT\ \MKIV. + +\stopsection + +\startsection[title=About speed] + +It is worth mentioning that the \LUA\ version in \LUATEX\ has a patch for +converting floats into strings. Instead of some \type {INF#} result we just +return zero, simply because \TEX\ is integer||based and intercepting incredibly +small numbers is too cumbersome. We had to apply the same patch in the \JIT\ +version. + +The benchmarks only indicate a trend. In a real document much more happens than +in the above tests. So what are measurements worth? Say that we compile the \TEX +book. This grandparent of all documents coded in \TEX\ is rather plainly coded +(using of course plain \TEX) and compiles pretty fast. Processing does not suffer +from complex expansions, there is no color, hardly any text manipulation, it's +all 8 bit, the pagebuilder is straightforward as is all spacing. Although on my +old machine I can get \CONTEXT\ to run at over 200 pages per second, this quickly +drops to 10\% of that speed when we add some color, backgrounds, headers and +footers, font switches, etc. + +So, running documents like the \TEX book for comparing the speed of, say, +\PDFTEX, \XETEX, \LUATEX\ and now \LUAJITTEX\ makes no sense. The first one is +still eight bit, the rest are \UNICODE. Also, the \TEX book uses traditional +fonts with traditional features so effectively that it doesn't rely on anything +that the new engines provide, not even \ETEX\ extensions. On the other hand, a +recent document uses advanced fonts, properties like color and|/|or +transparencies, hyperlinks, backgrounds, complex cover pages or chapter openings, +embeds graphics, etc. Such a document might not even process in \PDFTEX\ or +\XETEX, and if it does, it's still comparing different technologies: eight bit +input and fast fonts in \PDFTEX, frozen \UNICODE\ and wide font support in +\XETEX, instead of additional trickery and control, written in \LUA. So, when we +investigate speed, we need to take into account what (font and input) +technologies are used as well as what complicating layout and rendering features +play a role. In practice speed only matters in an edit|-|view cycle and services +where users wait for some result. + +It's rather hard to find a recent document that can be used to compare these +engines. The best we could come up with was the rendering of the user interface +documentation. + +\starttabulate[|T|T|T|T||] +\NC texexec \NC --engine=pdftex \NC --global \NC x-set-12.mkii \NC 5.9 seconds \NC \NR +\NC texexec \NC --engine=xetex \NC --global \NC x-set-12.mkii \NC 6.2 seconds \NC \NR +\NC context \NC --engine=luatex \NC --global \NC x-set-12.mkiv \NC 6.2 seconds \NC \NR +\NC context \NC --engine=luajittex \NC --global \NC x-set-12.mkiv \NC 4.6 seconds \NC \NR +\stoptabulate + +Keep in mind that \type{texexec} is a \RUBY\ script and uses \type {kpsewhich} +while \type {context} uses \LUA\ and its own (\TDS||compatible) file manager. But +still, it is interesting to see that there is not that much difference if we keep +\JIT\ out of the picture. This is because in \MKIV\ we have somewhat more clever +\XML\ processing, although earlier measurements have demonstrated that in this +case not that much speedup can be assigned to that. + +And so recent versions of \MKIV\ already keep up rather well with the older eight +bit world. We do way more in \MKIV\ and the interfacing macros are nicer but +potentially somewhat slower. Some mechanisms might be more efficient because of +using \LUA, but some actually have more overhead because we keep track of more +data. Font feature processing is done in \LUA, but somehow can keep up with the +libraries used in \XETEX, or at least is not that significant a difference, +although I can think of more demanding tasks. Of course in \LUATEX\ we can go +beyond what libraries provide. + +No matter what one takes into account, performance is not that much worse in +\LUATEX, and if we enable \JIT\ and so remove some of the traditional \LUA\ +virtual machine overhead, we're even better off. Of course we need to add a +disclaimer here: don't force us to prove that the relative speed ratios are the +same for all cases. In fact, it being so hard to measure and compare, performance +can be considered to be something taken for granted as there is not that much we +can do about getting nicer numbers, apart from maybe parallelizing which brings +other complexities into the picture. On our servers, a few other virtual machines +running \TEX\ services kicking in at the same time, using \CPU\ cycles, network +bandwidth (as all data lives someplace else) and asking for disk access have much +more impact than the 25\% we gain. Of course if all processes run faster then +we've gained something. + +For what it's worth: processing this text takes some 2.3 seconds on my laptop for +regular \LUATEX\ and 1.8 seconds with \LUAJITTEX, including the extra overhead of +restarting. As this is a rather average example it fits earlier measurements. + +Processing a font manual (work in progress) takes \LUAJITTEX\ 15 seconds for 112 +pages compared to 18.4 seconds for \LUATEX. The not yet finished manual loads 20 +different fonts (each with multiple instances), uses colors, has some \METAPOST\ +graphics and does some font juggling. The gain in speed sounds familiar. + +\stopsection + +\startsection[title=The future] + +At the 2012 \LUA\ conference Roberto Ierusalimschy mentioned that the virtual +machine of \LUAJIT\ is about twice as fast due to it being partly done in +assembler while the regular machinery is written in standard \CCODE\ and keeps +portability in mind. + +He also presented some plans for future versions of \LUA. There will be some +lightweight helpers for \UTF. Our experiences so far are that only a handful of +functions are actually needed: byte to character conversions and vice versa, +iterators for \UTF\ characters and \UTF\ values and maybe a simple substring +function is probably enough. Currently \LUATEX\ has some extra string iterators +and it will provide the converters as well. + +There is a good chance that \LPEG\ will become a standard library (which it +already is in \LUATEX), which is also nice. It's interesting that, especially on +longer sequences, \LPEG\ can beat the string matchers and replacers, although +when in a substitution no match and therefore no replacements happen, the regular +gsub wins. We're talking small numbers here, in daily usage \LPEG\ is about as +efficient as you can wish. In \CONTEXT\ we have a \type {lpeg.UR} and \type +{lpeg.US} and it would be nice to have these as native \UTF\ related methods, but +I must admit that I seldom need them. + +This and other extensions coming to the language also have some impact on a \JIT\ +version: the current \LUAJIT\ is already not entirely compatible with \LUA\ 5.2 +so you need to keep that into account if you want to use this version of \LUATEX. +So, unless \LUAJIT\ follows the mainstream development, as \CONTEXT\ \MKIV\ user +you should not depend on it. But at the moment it's nice to have this choice. + +The yet experimental code will end up in the main \LUATEX\ repository in time +before the \TEX\ Live 2013 code freeze. In order to make it easier to run both +versions alongside, we have added the \LUA\ 5.2 built|-|in library \type {bit32} +to \LUAJITTEX. We found out that it's too much trouble to add that library to +\LUA~5.1 but \LUATEX\ has moved on to 5.2 anyway. + +\stopsection + +\startsection[title=Running] + +So, as we will definitely stick to stock \LUA, one might wonder if it makes sense +to officially support jitting in \CONTEXT. First of all, \LUATEX\ is not +influenced that much by the low level changes in the \API\ between 5.1 and 5.2. +Also \LUAJIT\ does support the most important new 5.2 features, so at the moment +we're mostly okay. We expect that eventually \LUAJIT\ will catch up but if not, +we are not in big trouble: the performance of stock \LUA\ is quite okay and above +all, it's portable! \footnote {Stability and portability are important properties +of \TEX\ engines, which is yet another reason for using \LUA. For those doing +number crunching in a document, \JIT\ can come in handy.} For the moment you can +consider \LUAJITTEX\ to be an experiment and research tool, but we will do our +best to keep it production ready. + +So how do we choose between the two engines? After some experimenting with +alternative startup scenarios and dedicated caches, the following solution was +reached: + +\starttyping +context --engine=luajittex ... +\stoptyping + +The usual preamble line also works: + +\starttyping +% engine=luajittex +\stoptyping + +As the main infrastructure uses the \type {luatex} and related binaries, this +will result in a relaunch: the \type {context} script will be restarted using +\type {luajittex}. This is a simple solution and the overhead is rather minimal, +especially compared to the somewhat faster run. Alternatively you can copy \type +{luajittex} over \type {luatex} but that is more drastic. Keep in mind that \type +{luatex} is the benchmark for development of \CONTEXT, so the \JIT\ aware version +might fall behind sometimes. + +Yet another approach is adapting the configuration file, or better, provide (or +adapt) your own \type {texmfcnf.lua} in for instance \type {texmf-local/web2c} +path: + +\starttyping +return { + type = "configuration", + version = "1.2.3", + date = "2012-12-12", + time = "12:12:12", + comment = "Local overloads", + author = "Hans Hagen, PRAGMA-ADE, Hasselt NL", + content = { + directives = { + ["system.engine"] = "luajittex", + }, + }, +} +\stoptyping + +This has the same effect as always providing \type {--engine=luajittex} but only +makes sense in well controlled situations as you might easily forget that it's +the default. Of course one could have that file and just comment out the +directive unless in test mode. + +Because the bytecode of \LUAJIT\ differs from the one used by \LUA\ itself we +have a dedicated format as well as dedicated bytecode compiled resources (for +instance \type {tmb} instead of \type {tmc}). For most users this is not +something they should bother about as it happens automatically. + +Based on experiments, by default we have disabled \JIT\, so we only benefit from +the faster virtual machine. Future versions of \CONTEXT\ might provide some +control over that but first we want to conduct more experiments. + +\stopsection + +\startsection[title=Addendum] + +These developments and experiments took place in November and December 2012. At +the time of this writing we also made the move to \LUA\ 5.2 in stock \LUATEX; the +first version to provide this was 0.74. Here are some measurements on Taco +Hoekwater's 64-bit \LINUX\ machine: + +\starttabulate[|lTB|r|r|l|] +\HL +\NC \NC \LUATEX\ 0.70 \NC \LUATEX\ 0.74 \NC \NC \NR +\HL +\NC benchmark-1 \NC 23.67 \NC 19.57 \NC faster \NC \NR +\NC benchmark-2 \NC 65.41 \NC 62.88 \NC faster \NC \NR +\NC benchmark-3 \NC 4.88 \NC 4.67 \NC faster \NC \NR +\NC benchmark-4 \NC 23.09 \NC 22.71 \NC faster \NC \NR +\NC benchmark-5 \NC 2.56/2.06 \NC 2.66/2.29 \NC slower \NC \NR +\HL +\stoptabulate + +There is a good chance that this is due to improvements of the garbage collector, +virtual machine and string handling. It also looks like memory consumption is a +bit less. Some speed optimizations in reading files have been removed (at least +for now) and some patches to the \type {format} function (in the \type {string} +namespace) that dealt with (for \TEX) unfortunate number conversions have not +been ported. The code base is somewhat cleaner and we expect to be able to split +up the binary in a core program plus some libraries that are loaded on demand. +\footnote {Of course this poses some constraints on stability as components get +decoupled, but this is one of the issues that we hope to deal with properly in +the library project.} In general, we don't expect too many issues in the +transition to \LUA\ 5.2, and \CONTEXT\ is already adapted to support \LUATEX\ +with 5.2 as well as \LUAJITTEX\ with an older version. + +Running the same tests on a 32-bit \MSWINDOWS\ machine gives this: + +\starttabulate[|lTB|r|r|r|] +\HL +\NC \NC \LUATEX\ 0.70 \NC \LUATEX\ 0.74 \NC \NC \NR +\HL +\NC benchmark-1 \NC 26.4 \NC 25.5 \NC faster \NC \NR +\NC benchmark-2 \NC 64.2 \NC 63.6 \NC faster \NC \NR +\NC benchmark-3 \NC 7.1 \NC 6.9 \NC faster \NC \NR +\NC benchmark-4 \NC 28.3 \NC 27.0 \NC faster \NC \NR +\NC benchmark-5 \NC 1.95/1.50 \NC 1.84/1.48 \NC faster \NC \NR +\HL +\stoptabulate + +The gain is less impressive but the machine is rather old and we can benefit less +from modern \CPU\ properties (cache, memory bandwidth, etc.). I tend to conclude +that there is no significant improvement here but it also doesn't get worse. +However we need to keep in mind that file \IO\ is less optimal in 0.74 so this +might play a role. As usual, runtime is negatively influenced by the relatively +slow speed of displaying messages on the console (even when we use \type +{console2}). + +A few days before the end of 2012, Akira Kakuto compiled native \MSWINDOWS\ +binaries for both engines. This time I decided to run a comparison inside the +\SCITE\ editor, that has very fast console output. \footnote {Most of my personal +\TEX\ runs are from within \SCITE, while most runs on the servers are in batch +mode, so normally the overhead of the console is acceptable or even neglectable.} + +\starttabulate[|lTB|r|r|r|] +\HL +\NC \NC \LUATEX\ 0.74 (5.2) \NC \LUAJITTEX\ 0.72 (5.1) \NC \NC \NR +\HL +\NC benchmark-1 \NC 25.4 \NC 25.4 \NC similar \NC \NR +\NC benchmark-2 \NC 54.7 \NC 36.3 \NC faster \NC \NR +\NC benchmark-3 \NC 4.3 \NC 3.6 \NC faster \NC \NR +\NC benchmark-4 \NC 20.0 \NC 16.3 \NC faster \NC \NR +\NC benchmark-5 \NC 1.93/1.48 \NC 0.74/0.61 \NC faster \NC \NR +\HL +\stoptabulate + +Only the \METAPOST\ library and conversion benchmark didn't show a speedup. The +regular \TEX\ tests 1||3 gain some 15||35\%. Enabling \JIT\ (off by default) +slowed down processing. For the sake of completeness I also timed \LUAJITTEX\ +on the console, so here you see the improvement of both engines. + +\starttabulate[|lTB|r|r|r|] +\HL +\NC \NC \LUATEX\ 0.70 \NC \LUATEX\ 0.74 \NC \LUAJITTEX\ 0.72 \NC \NR +\HL +\NC benchmark-1 \NC 26.4 \NC 25.5 \NC 25.9 \NC \NR +\NC benchmark-2 \NC 64.2 \NC 63.6 \NC 45.5 \NC \NR +\NC benchmark-3 \NC 7.1 \NC 6.9 \NC 6.0 \NC \NR +\NC benchmark-4 \NC 28.3 \NC 27.0 \NC 23.3 \NC \NR +\NC benchmark-5 \NC 1.95/1.50 \NC 1.84/1.48 \NC 0.73/0.60 \NC \NR +\HL +\stoptabulate + +In this text, the term \JIT\ has come up a lot but you might rightfully wonder if +the observations here relate to \JIT\ at all. For the moment I tend to conclude +that the implementation of the virtual machine and garbage collection have more +impact than the actual just||in||time compilation. More exploration of \JIT\ is +needed to see if we can really benefit from that. Of course the fact that we use +a bit less memory is also nice. In case you wonder why I bother about speed at +all: we happen to run \LUATEX\ mostly as a (remote) service and generating a +bunch of (related) documents takes a bit of time. Bringing the waiting down from +15 to 10 seconds might not sound impressive but it makes a difference when it is +someone's job to generate these sets. + +In summary: just before we entered 2013, we saw two rather fundamental updates of +\LUATEX\ show up: an improved traditional one with \LUA\ 5.2 as well as the +somewhat faster \LUAJITTEX\ with a mixture between 5.1 and 5.2. And in 2013 we +will of course try to make them both even more attractive. + +\stopsection + +\stopchapter + +% benchmark-4: +% +% tex + jit = 23.3 +% tex + lua = 27.0 +% lua = 2*jit % cf roberto +% +% so: +% +% 2*tex + 2*jit = 46.6 +% tex + 2*jit = 27.0 +% -------------------- - +% tex = 19.6 +% +% ratios: +% +% tex : lua = 70 : 30 +% tex : jit = 85 : 15 |