% language=uk engine=luajittex

\startluacode

    local nofjitruns = 5000

    local runnow     = string.find(environment.jobname,"about%-jitting") and jit

    local runtimes   = table.load("about-jitting-jit.lua") or {
        nofjitruns = nofjitruns,
        timestamp  = os.currenttime(),
    }

    document.NOfJitRuns  = runtimes.nofjitruns or nofjitruns
    document.JitRunTimes = runtimes

    function document.JitRun(specification)

        local code = buffers.getcontent(specification.name)

        if runnow then

            local function testrun(how)
                local test = load(code)()
                collectgarbage("collect")
                jit[how]()
                local t = os.clock()
                for i=1,document.NOfJitRuns do
                    test()
                end
                t = os.clock() - t
                jit.off()
                return string.format("%0.3f",t)
            end

            local rundata = {
                off = testrun("off"),
                on  = testrun("on"),
            }

            runtimes[code]     = rundata
            document.JitTiming = rundata

        else

            local rundata      = runtimes[code] or { }

            document.JitTiming = {
                off = rundata.off or "0",
                on  = rundata.on  or "0",
            }


        end

    end

\stopluacode

\starttexdefinition LuaJitTest #1%

    \ctxlua{document.JitRun { name = "#1" } }

    \starttabulate[|lT|lT|]
        \NC off \NC \cldcontext{document.JitTiming.off} \NC \NR
        \NC on  \NC \cldcontext{document.JitTiming.on } \NC \NR
    \stoptabulate

\stoptexdefinition

\starttexdefinition NOfLuaJitRuns
    \cldcontext{document.NOfJitRuns}
\stoptexdefinition

% end of code

\startcomponent about-jitting

\environment about-environment

\definehead[jittestsection][subsubsection][color=,style=bold]

\startchapter[title=Luigi's nightmare]

\startsection[title=Introduction]

If you have a bit of a background in programming and watch kids playing video
games, whether on a dedicated desktop machine, a console or even a mobile
device, there is a good chance that you realize how much processing power is
involved. All those pixels get calculated many times per second, based on a
dynamic model that not only involves characters, environment, physics and a story
line but also reacts immediately to user input.

If, on the other hand, you hit the magic key combination in your text editor
that renders a document source into, for instance, a \PDF\ file, you might wonder
why that takes so many seconds. Of course it matters that some resources are
loaded, that maybe images are included, and that lots of fuzzy logic makes things
happen, but the most important factor is without doubt that \TEX\ macros are not
compiled into machine code but into an intermediate representation. Those macros
then get expanded, often over and over again, and that is a relatively slow
process. As (local) macros can be redefined at any time, the engine needs to take
that into account and there is not much caching going on, unless you explicitly
define macros that do so. Take this:

\starttyping
\def\bar{test}
\def\foo{test \bar\space test}
\stoptyping

Even if the definition of \type {\foo} stays the same, that of \type {\bar} can
change:

\starttyping
\foo \def\bar{foo} \foo
\stoptyping

There is no mechanism to freeze the meaning of \type {\bar} in \type {\foo},
something that is possible in the other language used in \CONTEXT:

\starttyping
local function bar() context("test") end
function foo() context("test ") bar() context(" test") end
\stoptyping

Here we can use local functions to limit their scope.

\starttyping
foo() local function bar() context("foo") end foo()
\stoptyping
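
To try this in a stock \LUA\ interpreter, outside \CONTEXT, the following sketch
replaces \type {context} by a made|-|up stand|-|in that just collects its
arguments. Because \type {foo} captured the first local \type {bar} as an
upvalue, declaring a new local \type {bar} afterwards does not change what \type
{foo} produces. This is unlike the \TEX\ example, where the second \type {\foo}
picks up the new meaning of \type {\bar}.

\starttyping
-- A made-up stand-in for ConTeXt's context(), so that this runs in stock Lua.
local output = { }
local function context(s) output[#output+1] = s end
local function flush()
    local s = table.concat(output)
    output = { }
    return s
end

local function bar() context("test") end
local function foo() context("test ") bar() context(" test") end

foo()
local before = flush()

local function bar() context("foo") end -- a new local, unseen by foo
foo()
local after = flush()

print(before) -- test test test
print(after)  -- test test test, foo still calls the first bar
\stoptyping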

In a way you can say that \TEX\ is a bit more dynamic than \LUA, and optimizing
(as well as hardening) it is much more difficult. In \CONTEXT\ we have already
stretched that to the limits, although occasionally I find ways to speed things
up a bit. Given that we spend a considerable amount of runtime in \LUA\ it makes
sense to see what we can gain there. We have less possible interference and often
a more predictable outcome, as \type {bar}s won't suddenly become \type {foo}s.

Nevertheless, the dynamic nature of both \TEX\ and \LUA\ has some impact on
performance, especially when they do most of the work. While games can rely on
dedicated chips for certain tasks, \TEX\ can't. So, we're sort of stuck when it
comes to speeding up the process to a level similar to that of advanced games. In
the next sections I will discuss a few aspects of possible speedups and the
reasons why they don't work out as expected.

\stopsection

\startsection[title=Jitting]

Let's go back once more to Luigi's nightmare of disappointing jit performance.
\footnote {Luigi Scarso is the author of \LUAJITTEX\ and we have reported on
experiments with this variant of \LUATEX\ on several occasions.} We already know
that the virtual machine of \LUAJIT\ is about twice as fast as the standard
machine. We have also experienced that enabling jit can degrade performance.
Although we did observe some really drastic drops in performance when testing
functions like \type {math.random} using the \type {mingw} compiler, we also saw
a performance boost with simple pure \LUA\ functions. In that respect \LUAJIT\ is
an impressive effort. So, it makes sense to use \LUAJITTEX\ even if in theory it
could be faster.

Next, some tests will be shown. The timings are snapshots, so different versions
of \LUAJITTEX\ can have different outcomes. The tests are mostly used for
discussions between Luigi and me and for further experiments, and believe me:
we've really done all kinds of tests to see if we can get some speed out of
jitting. After all, it's hard to believe that we can't gain something from it,
so we might as well be doing something wrong.

Each test is run \NOfLuaJitRuns\ times. These are of course non|-|typical
examples, but they illustrate the principle. Each time we show two measurements:
one with jit turned off and one with jit turned on, but in both cases the faster
virtual machine is used. The times shown of course depend on the architecture
and operating system, but as we are only interested in relative times it's
enough to know that we run 32 bit mingw binaries under 64 bit Windows 8 on a
modern quad core Ivy Bridge \CPU. We did most tests with \LUAJIT\ 2.0.1, but as
far as we can see 2.0.2 has a similar performance.
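
A simplified, standalone version of such a measurement could look like the
sketch below. The names \type {timeit} and \type {test} are made up for this
example and are not part of \CONTEXT. Under stock \LUA\ the \type {jit} table does
not exist, so the sketch then just times the function; under \LUAJIT\ it restores
whatever jit state was active before. Keep in mind that \type {os.clock}
measures processor time and that the numbers printed depend entirely on machine
and interpreter.

\starttyping
-- Time n calls of f. Under LuaJIT, how = "on" or "off" switches the jit
-- compiler for the duration of the run; in stock Lua it is ignored.
local function timeit(f, n, how)
    collectgarbage("collect")
    local was = type(jit) == "table" and jit.status() -- true when jit is on
    if type(jit) == "table" and how then
        jit[how]()
    end
    local t = os.clock()
    for i = 1, n do
        f()
    end
    t = os.clock() - t
    if type(jit) == "table" then
        if was then jit.on() else jit.off() end -- restore the previous state
    end
    return t
end

-- The first test of this section.
local function test()
    local a = 0
    for i = 1, 10000 do
        a = a + i
    end
end

local off = timeit(test, 100, "off")
local on  = timeit(test, 100, "on")
print(string.format("off %0.3f  on %0.3f", off, on))
\stoptyping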

\startjittestsection[title={simple loops, no function calls}]

\startbuffer[jittest]
return function()
    local a = 0
    for i=1,10000 do
        a = a + i
    end
end
\stopbuffer

\typebuffer[jittest] \LuaJitTest{jittest}

\stopjittestsection

\startjittestsection[title={simple loops, with simple function}]

\startbuffer[jittest]
local function whatever(i)
    return i
end

return function()
    local a = 0
    for i=1,10000 do
        a = a + whatever(i)
    end
end
\stopbuffer

\typebuffer[jittest] \LuaJitTest{jittest}

\stopjittestsection

\startjittestsection[title={simple loops, with built-in basic functions}]

\startbuffer[jittest]
return function()
    local a = 0
    for i=1,10000 do
        a = a + math.sin(1/i)
    end
end
\stopbuffer

\typebuffer[jittest] \LuaJitTest{jittest}

\stopjittestsection

\startjittestsection[title={simple loops, with built-in simple functions}]

\startbuffer[jittest]
return function()
    local a = 0
    for i=1,1000 do
        local a = a + tonumber(tostring(i))
    end
end
\stopbuffer

\typebuffer[jittest] \LuaJitTest{jittest}

\stopjittestsection

\startjittestsection[title={simple loops, with localized built-in simple functions}]

\startbuffer[jittest]
local tostring, tonumber = tostring, tonumber
return function()
    local a = 0
    for i=1,1000 do
        local a = a + tonumber(tostring(i))
    end
end
\stopbuffer

\typebuffer[jittest] \LuaJitTest{jittest}

\stopjittestsection

\startjittestsection[title={simple loops, with built-in complex functions}]

\startbuffer[jittest]
return function()
    local a = 0
    local p = (1-lpeg.P("5"))^0 * lpeg.P("5") + lpeg.Cc(0)
    for i=1,100 do
        local a = a + lpeg.match(p,tostring(i))
    end
end
\stopbuffer

\typebuffer[jittest] \LuaJitTest{jittest}

\stopjittestsection

\startjittestsection[title={simple loops, with foreign function}]

\startbuffer[jittest]
return function()
    local a = 0
    for i=1,10000 do
        a = a + font.current()
    end
end
\stopbuffer

\typebuffer[jittest] \LuaJitTest{jittest}

\stopjittestsection

\startjittestsection[title={simple loops, with wrapped foreign functions}]

\startbuffer[jittest]
local fc = font.current

function font.xcurrent()
    return fc()
end

return function()
    local a = 0
    for i=1,10000 do
        a = a + font.xcurrent()
    end
end
\stopbuffer

\typebuffer[jittest] \LuaJitTest{jittest}

\stopjittestsection

What we do observe here is that turning on jit doesn't always help. By design the
current just|-|in|-|time compiler aborts optimization when it encounters a
function that it doesn't know. This means that in \LUAJITTEX\ most code will not
get jitted, because we use built|-|in library calls a lot. Also, in version 2.0 we
notice that a bit of extra wrapping makes performance worse too. This might be why
for us jitting doesn't work out the way it is advertised. Performance tests are
often done with simple functions that use built|-|in functions that do get
jitted, and the more of those are supported, the better it gets. However, when
you profile a \CONTEXT\ run, you will notice that we don't call that many standard
library functions, at least not so often that jitting would get noticed.
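
To see where the compiler gives up, \LUAJIT\ can report on its traces, including
the aborted ones and the reason for aborting. On the command line this is the
\type {-jv} option. The sketch below assumes that the \type {jit.v} module behind
that option, which ships with \LUAJIT, can be loaded with \type {require}, and
that jit is switched on, because otherwise no traces are recorded at all. The
helper name \type {withtraces} is made up, and we make no claim about what the
report will say for a given loop.

\starttyping
-- Run f while LuaJIT's verbose module reports compiled and aborted traces.
-- Returns false in stock Lua (no jit) or when jit.v cannot be loaded.
-- Note: jit must be on, otherwise nothing gets traced.
local function withtraces(f, logfile)
    if type(jit) ~= "table" then
        return false
    end
    local ok, v = pcall(require, "jit.v")
    if not ok then
        return false
    end
    v.on(logfile) -- nil: report to stderr, like luajit -jv
    local fine, err = pcall(f)
    v.off()
    if not fine then
        error(err, 0)
    end
    return true
end

-- A loop calling a library function, in the spirit of the tests above.
local traced = withtraces(function()
    local a = 0
    for i = 1, 10000 do
        a = a + math.sin(1/i)
    end
end)
\stoptyping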

A safe conclusion is that you can benefit a lot from the fast virtual machine but
should check carefully whether jit has a negative impact. As it is turned on by
default in \LUAJIT\ (but off in \LUAJITTEX) it might well go unnoticed,
especially because there is always a performance gain due to the faster virtual
machine, and that might show more overall gain than the drawback of jitting
unjittable code. The gain might just be a bit less drastic than possible because
of the artifacts mentioned here, but who knows what future versions of \LUAJIT\
will bring.
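
Checking which situation applies is easy, as \LUAJIT\ exposes its state in the
\type {jit} table, which is absent in stock \LUA. The sketch below, with a
made|-|up helper \type {jitstate}, reports the interpreter and whether the
compiler is on, and then switches it off explicitly. That is one way to follow
the advice above when measurements show that jitting hurts the code at hand.

\starttyping
-- Describe the interpreter and whether the jit compiler is currently on.
local function jitstate()
    if type(jit) ~= "table" then
        return "stock " .. _VERSION .. ", no jit"
    end
    local enabled = jit.status() -- first return value: is the compiler on?
    return string.format("%s (%s), jit %s",
        jit.version, jit.arch, enabled and "on" or "off")
end

print(jitstate())

-- In plain LuaJIT the compiler is on by default; switch it off explicitly.
if type(jit) == "table" then
    jit.off()
end

print(jitstate())
\stoptyping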

Maybe sometime we can benefit from \type {ffi}, but it makes no sense to mess up
the \CONTEXT\ code with related calls: it looks ugly and also makes the code
unusable in stock \LUA, so it is sort of a no|-|go. There are some suggestions
in \LUAJIT\ related posts about adapting the code to suit the jitter, but again,
that makes no sense. If we need to keep a specific interpreter in mind, we might
as well start writing everything in C. So, our hopes are on future versions of
stock \LUA\ and \LUAJIT. Luigi uncovered the following comment in the source code:

\starttyping
/* C functions can have arbitrary side-effects and are not
recorded (yet). */
\stoptyping

Although the \type {(yet)} indicates that at some point this restriction can be
lifted, we don't expect this to happen soon. And patching the jit machinery
ourselves to suit \LUATEX\ is no option.

There is an important difference between a \LUATEX\ run and other programs: a
run is short|-|lived. A lot of code gets executed only once or a few times (like
loading fonts), or gets executed in such different ways that (branch) prediction
is hard. If you run a web server using \LUA\ it runs for weeks in a row, so
optimizing a function pays off, provided that it gets optimized. When you have a
\LUA\ enhanced interactive program, again, the session is long enough to benefit
from jitting (if applied). And, when you crunch numbers, it might pay off too. In
practice, a \TEX\ run has no such characteristics.

\stopsection

\startsection[title=Implementation]

In \LUA\ 5.2 there are some changes in the implementation compared to 5.1 and
before. It is hard to measure their impact, but it's probably a win some here,
lose some there situation. A good example is the way \LUA\ deals with strings.
Before 5.2 all strings were hashed, but now only short strings are (at most 32
bytes are looked at). Now, consider this:

\startitemize
    \startitem
        In \CONTEXT\ we do all font handling in \LUA\ and that involves lots of
        tables with lots of (nicely hashed) short keys. So, comparing them is
        pretty fast.
    \stopitem
    \startitem
        We also read a lot from files, and each line passes filters and such
        before it gets passed to \TEX. There hashing is not really needed,
        although when it gets processed by filters it might as well save some
        time.
    \stopitem
    \startitem
        When we go from \TEX\ to \LUA\ and reverse, lots of strings are involved
        and many of them are unique and used once. There hashing might bring a
        penalty.
    \stopitem
    \startitem
        When we loop over a string with \type {gmatch} or some \type {lpeg}
        subprogram lots of (small) strings can get created and each gets hashed,
        even if they have a short livespan.
    \stopitem
\stopitemize

The above items indicate that we can benefit from hashing but that sometimes it
might have a performance hit. My impression is that on average we're better off
with hashing, and it's one of the reasons why \LUA\ is so fast (and usable).
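
The \type {gmatch} case can be illustrated in plain \LUA. Both made|-|up
functions below count whitespace separated words: the first one lets \type
{gmatch} hand over every word as a new string, the second one only asks \type
{string.find} for positions, so no substrings are created. Whether avoiding those
strings pays off depends on the interpreter and the data; this sketch only shows
the technique and makes no performance claim.

\starttyping
-- Every match is returned as a new (short, hence hashed) string.
local function countwords_gmatch(s)
    local n = 0
    for _ in string.gmatch(s, "%S+") do
        n = n + 1
    end
    return n
end

-- Without captures, string.find returns only start and end positions,
-- so no substring is ever created.
local function countwords_find(s)
    local n, pos = 0, 1
    while true do
        local first, last = string.find(s, "%S+", pos)
        if not first then
            return n
        end
        n = n + 1
        pos = last + 1
    end
end

local line = "In TeX all numbers are integers"
print(countwords_gmatch(line), countwords_find(line)) -- 6  6
\stoptyping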

In \TEX\ all numbers are integers and in \LUA\ all numbers are floats. On modern
computers dealing with floating point is fast and we're not crunching numbers
anyway. We definitely would have an issue if numbers were just integers, and an
upcoming mixed integer|/|float model might not work to our advantage. We'll see.
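
To make the integer side concrete: \TEX\ stores dimensions as integers in scaled
points, where 65536 scaled points make one point. In \LUA\ 5.1, 5.2 and \LUAJIT\
such values end up as doubles, while \LUA\ 5.3 and later distinguish integer and
float subtypes, which \type {math.type} reports. The sketch below uses a plain
round|-|to|-|nearest conversion, which is our assumption and not necessarily
identical to \TEX's own conversion; the helper names are made up.

\starttyping
-- TeX dimensions are integers in scaled points: 1pt = 65536sp.
local SP_PER_PT = 65536

local function sptopt(sp)
    return sp / SP_PER_PT -- "/" yields a float in Lua 5.3+, a double before
end

local function pttosp(pt)
    -- simple round to nearest, not necessarily what TeX itself does
    return math.floor(pt * SP_PER_PT + 0.5)
end

-- Only Lua 5.3 and later distinguish integer and float subtypes.
local function numbertype(x)
    if math.type then
        return math.type(x)
    end
    return "float" -- Lua 5.1/5.2 and LuaJIT: every number is a double
end

-- The printed values differ per Lua version.
print(sptopt(786432), numbertype(786432), numbertype(sptopt(786432)))
\stoptyping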

I had expected to benefit from bitwise operations but so far I could never find
a real application in \CONTEXT, at least not one that had a positive impact. But
maybe it's just a way of thinking that hasn't evolved yet. Also, the fact that
functions are used instead of a real language extension makes a speedup less
likely.
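
As an illustration of the function based approach: \LUA\ 5.2 provides \type
{bit32.band} and \LUAJIT\ provides \type {bit.band}, whereas \LUA\ 5.3 and later
have a native \type {&} operator, which cannot be used in code that must also load
in older interpreters. The sketch below picks whichever function exists and
otherwise falls back on plain arithmetic; the flag names and values are
hypothetical. Each test is a function call, which is the overhead referred to
above.

\starttyping
-- bit32.band (Lua 5.2) or bit.band (LuaJIT), if either is present.
local band = (bit32 and bit32.band) or (bit and bit.band)

-- Is the power-of-two flag f set in the non-negative integer flags?
local function hasflag(flags, f)
    if band then
        return band(flags, f) ~= 0
    end
    return flags % (2 * f) >= f -- arithmetic fallback, no library needed
end

local BOLD, ITALIC, SMALLCAPS = 1, 2, 4 -- hypothetical flag values
local style = BOLD + SMALLCAPS

print(hasflag(style, BOLD), hasflag(style, ITALIC), hasflag(style, SMALLCAPS))
-- true false true
\stoptyping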

\stopsection

\startsection[title=Garbage collection]

In the beginning I played with tuning the \LUA\ garbage collector in order to
improve performance. For some documents changing the step and multiplier worked
out well, but for others it didn't, so I decided that one can best leave the
values as they are. As expected, turning the garbage collector off gives a
relatively small speedup, and for the average run the extra memory used can be
neglected. Just keep in mind that a \TEX\ run is never persistent, so memory
can't keep filling up. I did some tests with the (experimental) generational mode
of the garbage collector, which in theory is faster, but it made runs
significantly slower. For instance, producing \type {fonts-mkiv.pdf} went from
9 to 9.5 seconds.
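
Such experiments can be repeated with the standard \type {collectgarbage}
interface. The sketch below runs a made|-|up workload with the collector left
alone, stopped, or switched to generational mode where the interpreter supports
that (experimental in \LUA\ 5.2, an option again in 5.4; \LUA\ 5.1, 5.3 and
\LUAJIT\ reject it, hence the \type {pcall}). It assumes the collector starts in
incremental mode. The pause and step multiplier, set with \type
{collectgarbage("setpause",n)} and \type {collectgarbage("setstepmul",n)}, are
left untouched here.

\starttyping
-- Run fn and return the CPU time it took and the memory in use afterwards
-- (in KB). mode is "default", "stop" or "generational".
local function measure(fn, mode)
    collectgarbage("collect")
    local switched = false
    if mode == "stop" then
        collectgarbage("stop")
    elseif mode == "generational" then
        -- unknown option in Lua 5.1, 5.3 and LuaJIT: pcall returns false
        switched = pcall(collectgarbage, "generational")
    end
    local t = os.clock()
    fn()
    t = os.clock() - t
    local kb = collectgarbage("count")
    if mode == "stop" then
        collectgarbage("restart")
    elseif switched then
        collectgarbage("incremental") -- back to the assumed default mode
    end
    return t, kb
end

-- A made-up workload that allocates many small tables and strings.
local function workload()
    local t = { }
    for i = 1, 100000 do
        t[i] = { i, tostring(i) }
    end
end

for _, mode in ipairs { "default", "stop", "generational" } do
    local t, kb = measure(workload, mode)
    print(string.format("%-12s %0.3f s %8.0f KB", mode, t, kb))
end
\stoptyping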

\stopsection

\startsection[title=Conclusion]

So, given the unpredictable performance hits of advertised optimizations, what is
the best approach? It all starts with the \LUA\ (and \TEX) code: sloppy coding
can have a price. Some of that can be disguised by clever interpreters but some
can't. If the code is already fast, there is not much to gain. When going from
\MKII\ to \MKIV\ more and more \LUA\ got introduced and lots of approaches were
benchmarked, so I'm already rather confident that there is not that much to
gain. It will never have the impressive performance of interactive games and
that's something we have to live with. As long as \LUA\ stays lean and mean,
things can only get better over time.

\stopsection

\startluacode
    table.save("about-jitting-jit.lua",document.JitRunTimes)
\stopluacode

\stopchapter

\stopcomponent