Diffstat (limited to 'doc/context/sources/general/manuals/about/about-jitting.tex')
-rw-r--r--  doc/context/sources/general/manuals/about/about-jitting.tex  439
1 file changed, 439 insertions(+), 0 deletions(-)
diff --git a/doc/context/sources/general/manuals/about/about-jitting.tex b/doc/context/sources/general/manuals/about/about-jitting.tex
new file mode 100644
index 000000000..4a8bc763a
--- /dev/null
+++ b/doc/context/sources/general/manuals/about/about-jitting.tex
@@ -0,0 +1,439 @@
+% language=uk engine=luajittex
+
+\startluacode
+
+    local nofjitruns = 5000
+
+    local runnow = string.find(environment.jobname,"about%-jitting") and jit
+
+    local runtimes = table.load("about-jitting-jit.lua") or {
+        nofjitruns = nofjitruns,
+        timestamp = os.currenttime(),
+    }
+
+    document.NOfJitRuns = runtimes.nofjitruns or nofjitruns
+    document.JitRunTimes = runtimes
+
+    function document.JitRun(specification)
+
+        local code = buffers.getcontent(specification.name)
+
+        if runnow then
+
+            local function testrun(how)
+                local test = load(code)()
+                collectgarbage("collect")
+                jit[how]()
+                local t = os.clock()
+                for i=1,document.NOfJitRuns do
+                    test()
+                end
+                t = os.clock() - t
+                jit.off()
+                return string.format("%0.3f",t)
+            end
+
+            local rundata = {
+                off = testrun("off"),
+                on = testrun("on"),
+            }
+
+            runtimes[code] = rundata
+            document.JitTiming = rundata
+
+        else
+
+            local rundata = runtimes[code] or { }
+
+            document.JitTiming = {
+                off = rundata.off or "0",
+                on = rundata.on or "0",
+            }
+
+        end
+
+    end
+
+\stopluacode
+
+\starttexdefinition LuaJitTest #1%
+
+    \ctxlua{document.JitRun { name = "#1" } }
+
+    \starttabulate[|lT|lT|]
+        \NC off \NC \cldcontext{document.JitTiming.off} \NC \NR
+        \NC on  \NC \cldcontext{document.JitTiming.on } \NC \NR
+    \stoptabulate
+
+\stoptexdefinition
+
+\starttexdefinition NOfLuaJitRuns
+    \cldcontext{document.NOfJitRuns}
+\stoptexdefinition
+
+% end of code
+
+\startcomponent about-jitting
+
+\environment about-environment
+
+\definehead[jittestsection][subsubsection][color=,style=bold]
+
+\startchapter[title=Luigi's nightmare]
+
+\startsection[title=Introduction]
+
+If you have a bit of a background in programming and watch kids playing video
+games, whether on a dedicated desktop machine, a console or even a mobile
+device, there is a good chance that you realize how much processing power is
+involved. All those pixels get calculated many times per second, based on a
+dynamic model that not only involves characters, environment, physics and a
+story line but also reacts immediately to user input.
+
+If on the other hand you hit the magic key combination in your text editor that
+renders a document source into for instance a \PDF\ file, you might wonder why
+that takes so many seconds. Of course it matters that some resources get loaded,
+that maybe images are included, and that lots of fuzzy logic makes things
+happen, but the most important factor is without doubt that \TEX\ macros are not
+compiled into machine code but into an intermediate representation. Those macros
+then get expanded, often over and over again, and that is a relatively slow
+process. As (local) macros can be redefined at any time, the engine needs to
+take that into account and there is not much caching going on, unless you
+explicitly define macros that do so. Take this:
+
+\starttyping
+\def\bar{test}
+\def\foo{test \bar\space test}
+\stoptyping
+
+Even if the definition of \type {\foo} stays the same, that of \type {\bar} can
+change:
+
+\starttyping
+\foo \def\bar{foo} \foo
+\stoptyping
+
+There is no mechanism to freeze the meaning of \type {\bar} in \type {\foo},
+something that is possible in the other language used in \CONTEXT:
+
+\starttyping
+local function bar() context("test") end
+function foo() context("test ") bar() context(" test") end
+\stoptyping
+
+Here we can use local functions to limit their scope.
+
+\starttyping
+foo() local function bar() context("foo") end foo()
+\stoptyping
+
+In a way you can say that \TEX\ is a bit more dynamic than \LUA, and optimizing
+(as well as hardening) it is much more difficult. In \CONTEXT\ we have already
+stretched that to the limits, although occasionally I find ways to speed things
+up a bit. Given that we spend a considerable amount of runtime in \LUA\ it
+makes sense to see what we can gain there. We have less possible interference
+and often a more predictable outcome, as \type {bar}s won't suddenly become
+\type {foo}s.
+
+Nevertheless, the dynamic nature of both \TEX\ and \LUA\ has some impact on
+performance, especially when they do most of the work. While games can rely on
+dedicated chips for many of their tasks, \TEX\ cannot. So, we're sort of stuck
+when it comes to speeding up the process to a level similar to that of advanced
+games. In the next sections I will discuss a few aspects of possible speedups
+and the reasons why they don't work out as expected.
+
+\stopsection
+
+\startsection[title=Jitting]
+
+Let's go back once more to Luigi's nightmare of disappointing jit. \footnote
+{Luigi Scarso is the author of \LUAJITTEX\ and we have reported on experiments
+with this variant of \LUATEX\ on several occasions.} We already know that the
+virtual machine of \LUAJIT\ is about twice as fast as the standard \LUA\ virtual
+machine. We also experienced that enabling jit can degrade performance. Although
+we did observe a really drastic drop in performance when testing functions like
+\type {math.random} using the \type {mingw} compiler, we also saw a performance
+boost with simple pure \LUA\ functions. In that respect \LUAJIT\ is an
+impressive effort. So, it makes sense to use \LUAJITTEX, even if in theory it
+could be faster still.
+
+Next some tests will be shown. The timings are snapshots, so different versions
+of \LUAJITTEX\ can have different outcomes. The tests are mostly used for
+discussions between Luigi and me and for further experiments, and believe me:
+we've really done all kinds of tests to see if we can get some speed out of
+jitting. After all, it's hard to believe that we can't gain something from it,
+so we might well be doing something wrong.
+
+Each test is run \NOfLuaJitRuns\ times. These are of course non|-|typical
+examples but they illustrate the principle. Each time we show two measurements:
+one with jit turned on, and one with jit turned off, but in both cases the
+faster virtual machine is enabled. The times shown are of course dependent on
+the architecture and operating system, but as we are only interested in relative
+times it's enough to know that we run 32 bit mingw binaries under 64 bit Windows
+8 on a modern quad core Ivy Bridge \CPU. We did most tests with \LUAJIT\ 2.0.1,
+but as far as we can see 2.0.2 has similar performance.
+
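+For the record, the essence of the measurement is this; a minimal standalone
+sketch, meant to be run with a plain \type {luajit} binary, where the test
+function and the number of runs are just placeholders:
+
+\starttyping
+local nofruns = 5000
+
+local function test()
+    local a = 0
+    for i=1,10000 do
+        a = a + i
+    end
+    return a
+end
+
+local function timed(how)
+    collectgarbage("collect")
+    jit[how]()                    -- jit.on() or jit.off()
+    local t = os.clock()
+    for i=1,nofruns do
+        test()
+    end
+    t = os.clock() - t
+    jit.off()
+    return t
+end
+
+print(string.format("off %0.3f on %0.3f",timed("off"),timed("on")))
+\stoptyping
+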
+\startjittestsection[title={simple loops, no function calls}]
+
+\startbuffer[jittest]
+return function()
+    local a = 0
+    for i=1,10000 do
+        a = a + i
+    end
+end
+\stopbuffer
+
+\typebuffer[jittest] \LuaJitTest{jittest}
+
+\stopjittestsection
+
+\startjittestsection[title={simple loops, with simple function}]
+
+\startbuffer[jittest]
+local function whatever(i)
+    return i
+end
+
+return function()
+    local a = 0
+    for i=1,10000 do
+        a = a + whatever(i)
+    end
+end
+\stopbuffer
+
+\typebuffer[jittest] \LuaJitTest{jittest}
+
+\stopjittestsection
+
+\startjittestsection[title={simple loops, with built-in basic functions}]
+
+\startbuffer[jittest]
+return function()
+    local a = 0
+    for i=1,10000 do
+        a = a + math.sin(1/i)
+    end
+end
+\stopbuffer
+
+\typebuffer[jittest] \LuaJitTest{jittest}
+
+\stopjittestsection
+
+\startjittestsection[title={simple loops, with built-in simple functions}]
+
+\startbuffer[jittest]
+return function()
+    local a = 0
+    for i=1,1000 do
+        a = a + tonumber(tostring(i))
+    end
+end
+\stopbuffer
+
+\typebuffer[jittest] \LuaJitTest{jittest}
+
+\stopjittestsection
+
+\startjittestsection[title={simple loops, with localized built-in simple functions}]
+
+\startbuffer[jittest]
+local tostring, tonumber = tostring, tonumber
+
+return function()
+    local a = 0
+    for i=1,1000 do
+        a = a + tonumber(tostring(i))
+    end
+end
+\stopbuffer
+
+\typebuffer[jittest] \LuaJitTest{jittest}
+
+\stopjittestsection
+
+\startjittestsection[title={simple loops, with built-in complex functions}]
+
+\startbuffer[jittest]
+return function()
+    local a = 0
+    local p = (1-lpeg.P("5"))^0 * lpeg.P("5") + lpeg.Cc(0)
+    for i=1,100 do
+        a = a + lpeg.match(p,tostring(i))
+    end
+end
+\stopbuffer
+
+\typebuffer[jittest] \LuaJitTest{jittest}
+
+\stopjittestsection
+
+\startjittestsection[title={simple loops, with foreign function}]
+
+\startbuffer[jittest]
+return function()
+    local a = 0
+    for i=1,10000 do
+        a = a + font.current()
+    end
+end
+\stopbuffer
+
+\typebuffer[jittest] \LuaJitTest{jittest}
+
+\stopjittestsection
+
+\startjittestsection[title={simple loops, with wrapped foreign functions}]
+
+\startbuffer[jittest]
+local fc = font.current
+
+function font.xcurrent()
+    return fc()
+end
+
+return function()
+    local a = 0
+    for i=1,10000 do
+        a = a + font.xcurrent()
+    end
+end
+\stopbuffer
+
+\typebuffer[jittest] \LuaJitTest{jittest}
+
+\stopjittestsection
+
+What we observe here is that turning on jit doesn't always help. By design the
+current just|-|in|-|time compiler aborts optimization when it encounters a
+function that it does not know how to compile. This means that in \LUAJITTEX\
+most code will not get jitted, because we use a lot of library calls that are
+implemented in C. Also, in version 2.0 we notice that a bit of extra wrapping
+makes performance worse too. This might be why for us jitting doesn't work out
+the way it is advertised. Often performance tests are done with simple functions
+that use built|-|in functions that do get jitted; the more of those are
+supported, the better it gets. However, when you profile a \CONTEXT\ run, you
+will notice that we don't call that many standard library functions, at least
+not so often that jitting would get noticed.
+
+A safe conclusion is that you can benefit a lot from the faster virtual machine,
+but that you should check carefully whether jit has a negative impact. As jit is
+turned on by default in \LUAJIT\ (but off in \LUAJITTEX) such an impact can
+easily go unnoticed, especially because there is always a performance gain due
+to the faster virtual machine, and that gain can outweigh the drawback of
+jitting unjittable code. The net gain might just be a bit smaller than possible
+because of the artifacts mentioned here, but who knows what future versions of
+\LUAJIT\ will bring.
+
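+In case you want to see where jitting bails out, \LUAJIT\ ships with a verbose
+module that reports compiled and aborted traces, including the reasons for
+aborting; a minimal sketch, assuming that the \type {jit.v} module is available,
+and with an arbitrary output file name:
+
+\starttyping
+-- report trace starts and aborts (with reasons) to a file
+local v = require("jit.v")
+v.start("jit-report.txt")   -- v.start() without a name prints to the terminal
+
+-- ... run the code that you want to inspect ...
+
+v.off()
+\stoptyping
+
+The same information can be obtained by starting \type {luajit} with the
+\type {-jv} option.
+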
+Maybe sometime we can benefit from \type {ffi}, but it makes no sense to mess up
+the \CONTEXT\ code with related calls: it looks ugly and it also makes the code
+unusable in stock \LUA, so it is a sort of no|-|go. There are some suggestions
+in \LUAJIT\ related posts about adapting the code to suit the jitter, but again,
+that makes no sense. If we need to keep a specific interpreter in mind, we could
+as well start writing everything in C. So, our hopes are on future versions of
+stock \LUA\ and \LUAJIT. Luigi uncovered the following comment in the source
+code:
+
+\starttyping
+/* C functions can have arbitrary side-effects and are not recorded (yet). */
+\stoptyping
+
+Although the \type {(yet)} indicates that at some point this restriction might
+be lifted, we don't expect that to happen soon. And patching the jit machinery
+ourselves to suit \LUATEX\ is not an option.
+
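+To make the earlier remark about \type {ffi} concrete, this is roughly what such
+a call looks like; a minimal sketch that only runs under \LUAJIT, with an
+arbitrary C function as example:
+
+\starttyping
+local ffi = require("ffi")
+
+-- declare the C function that we want to call directly
+ffi.cdef[[
+    double floor(double x);
+]]
+
+-- this bypasses the Lua library and calls straight into the C runtime
+print(ffi.C.floor(1.5))
+\stoptyping
+
+It is exactly this kind of interpreter specific code that we don't want in the
+\CONTEXT\ source.
+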
+There is an important difference between a \LUATEX\ run and other programs: it
+is just that, a run, and runs are short|-|lived. A lot of code gets executed
+only once or a few times (like loading fonts), or gets executed in such
+different ways that (branch) prediction is hard. If you run a web server using
+\LUA\ it runs for weeks in a row, so optimizing a function pays off, given that
+it gets optimized at all. When you have a \LUA\ enhanced interactive program,
+again, the session is long enough to benefit from jitting (if applied). And when
+you crunch numbers, it might pay off too. In practice, a \TEX\ run has no such
+characteristics.
+
+\stopsection
+
+\startsection[title=Implementation]
+
+In \LUA\ 5.2 there are some changes in the implementation compared to 5.1 and
+before. It is hard to measure the impact of that, but it's probably a win some
+here, lose some there situation. A good example is the way \LUA\ deals with
+strings. Before 5.2 all strings were hashed, but now only short strings are
+(and at most 32 bytes are looked at). Now, consider this:
+
+\startitemize
+    \startitem
+        In \CONTEXT\ we do all font handling in \LUA\ and that involves lots of
+        tables with lots of (nicely hashed) short keys. So, comparing them is
+        pretty fast.
+    \stopitem
+    \startitem
+        We also read a lot from files, and each line passes through filters and
+        such before it gets passed to \TEX. Here hashing is not really needed,
+        although when a line gets processed by filters, hashing might save some
+        time.
+    \stopitem
+    \startitem
+        When we go from \TEX\ to \LUA\ and back, lots of strings are involved,
+        and many of them are unique and used only once. Here hashing might bring
+        a penalty.
+    \stopitem
+    \startitem
+        When we loop over a string with \type {gmatch} or some \type {lpeg}
+        subprogram, lots of (small) strings can get created, and each gets
+        hashed, even if they have a short lifespan (as in the sketch below).
+    \stopitem
+\stopitemize
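+
+Just to illustrate that last case: a loop like the following creates a fresh
+short string for every word it finds, and each of them gets hashed, even though
+they are thrown away right after being used (the input string is of course just
+an example):
+
+\starttyping
+local n = 0
+for word in string.gmatch("one two three four five","%a+") do
+    -- every 'word' is a newly interned (hashed) short string
+    n = n + #word
+end
+print(n)
+\stoptyping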
+
+The above items indicate that we can benefit from hashing, but that it can also
+come with a performance hit. My impression is that on average we're better off
+with hashing, and it's one of the reasons why \LUA\ is so fast (and usable).
+
+In \TEX\ all numbers are integers and in \LUA\ all numbers are floats. On modern
+computers dealing with floating point is fast and we're not crunching numbers
+anyway. We definitely would have an issue if numbers were just integers, and an
+upcoming mixed integer|/|float model might not be to our advantage. We'll see.
+
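+A minimal sketch of what that border looks like in practice, to be run inside
+\LUATEX, for instance from a \type {\ctxlua} call; the dimension is just an
+example:
+
+\starttyping
+-- dimensions cross the Lua border as integers (scaled points),
+-- while on the Lua end all arithmetic is done in floating point
+local sp = tex.sp("10pt")   -- 655360 scaled points
+local pt = sp / 65536       -- 10, computed as a float
+print(sp,pt)
+\stoptyping
+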
+I had expected to benefit from bitwise operations, but so far I never could find
+a real application in \CONTEXT, at least not one that had a positive impact. But
+maybe it's just a way of thinking that hasn't evolved yet. Also, the fact that
+these operations are provided as functions instead of as a real language
+extension makes it less likely that there is a speedup involved.
+
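+Just to show the calling style that is meant here; a minimal sketch where the
+numbers are arbitrary; stock \LUA\ 5.2 provides the \type {bit32} library while
+\LUAJIT\ has its own \type {bit} module:
+
+\starttyping
+-- masking via a library call, not via an operator
+local masked = bit32.band(0x12345678,0x0000FFFF)
+-- in LuaJIT this would be: bit.band(0x12345678,0x0000FFFF)
+print(string.format("0x%04X",masked))
+\stoptyping
+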
+\stopsection
+
+\startsection[title=Garbage collection]
+
+In the beginning I played with tuning the \LUA\ garbage collector in order to
+improve performance. For some documents changing the step and multiplier worked
+out well, but for others it didn't, so I decided that one can best leave the
+values as they are. Turning the garbage collector off gives, as expected, a
+relatively small speedup, and for the average run the extra memory used can be
+neglected. Just keep in mind that a \TEX\ run is never persistent, so memory
+can't keep filling up. I did some tests with the in theory faster (experimental)
+generational mode of the garbage collector, but it made runs significantly
+slower: processing \type {fonts-mkiv.pdf} went from 9 to 9.5 seconds.
+
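+For the record, this is the kind of tuning meant here; a minimal sketch where
+the values are just placeholders, not recommendations:
+
+\starttyping
+-- tweak the collector parameters
+collectgarbage("setpause",200)    -- wait until memory use doubles before a new cycle
+collectgarbage("setstepmul",200)  -- collect at twice the speed of allocation
+
+-- or switch it off (and back on) around a critical stretch of code
+collectgarbage("stop")
+-- ... run some code ...
+collectgarbage("restart")
+\stoptyping
+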
+\stopsection
+
+\startsection[title=Conclusion]
+
+So what, given the unpredictable performance hits of the advertised
+optimizations, is the best approach? It all starts with the \LUA\ (and \TEX)
+code: sloppy coding can have a price. Some of that can be disguised by clever
+interpreters, but some can't. If the code is already fast, there is not much to
+gain. When going from \MKII\ to \MKIV\ more and more \LUA\ got introduced and
+lots of approaches were benchmarked, so I'm already rather confident that there
+is not that much to gain. It will never have the impressive performance of
+interactive games, and that's something we have to live with. As long as \LUA\
+stays lean and mean, things can only get better over time.
+
+\stopsection
+
+\startluacode
+ table.save("about-jitting-jit.lua",document.JitRunTimes)
+\stopluacode
+
+\stopchapter
+
+\stopcomponent