summaryrefslogtreecommitdiff
path: root/doc/context/sources/general/manuals/onandon/onandon-performance.tex
blob: b1b34443d41a47eb4aa6f5b843e4832bdc297037 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
% language=uk

% no zero timing compensation, just simple tests
% m4all book

\startcomponent onandon-performance

\environment onandon-environment

\startchapter[title=Performance]

\startsection[title=Introduction]

This chapter is about performance. Although it concerns \LUATEX\ this text is
only meant for \CONTEXT\ users. This is not because they ever complain about
performance, on the contrary, I never received a complain from them. No, it's
because it gives them some ammunition against the occasionally occurring nagging
about the speed of \LUATEX\ (somewhere on the web or at some meeting). My
experience is that in most such cases those complaining have no clue what they're
talking about, so effectively we could just ignore them, but let's, for the sake
of our users, waste some words on the issue.

\stopsection

\startsection[title=What performance]

So what exactly does performance refer to? If you use \CONTEXT\ there are
probably only two things that matter:

\startitemize[packed]
\startitem How long does one run take? \stopitem
\startitem How many runs do I need? \stopitem
\stopitemize

Processing speed is reported at the end of a run in terms of seconds spent on the
run, but also in pages per second. The runtime is made up out of three
components:

\startitemize[packed]
\startitem start-up time \stopitem
\startitem processing pages \stopitem
\startitem finishing the document \stopitem
\stopitemize

The startup time is rather constant. Let's take my 2013 Dell Precision with
i7-3840QM as reference. A simple

\starttyping
\starttext
\stoptext
\stoptyping

document reports 0.4 seconds but, as we wrap the run in an \type {mtxrun}
management run, we have an additional 0.3 overhead (auxiliary file handling,
\PDF\ viewer management, etc). This includes loading the Latin Modern font. With
\LUAJITTEX, these times are below 0.3 and 0.2 seconds. It might look like a lot
of overhead, but in an edit|-|preview runs it feels snappy. One can try this:

\starttyping
\stoptext
\stoptyping

which bring down the time to about 0.2 seconds for both engines but it doesn't
do anything useful in practice.

Finishing a document is not that demanding, because most gets flushed as we go.
The more (large) fonts we use, the longer it takes to finish a document, but, on
the average that time is not worth noticing. The main runtime contribution comes
from processing the pages.

Okay, this is not always true. For instance, if we process a 400 page book from
2500 small \XML\ files with multiple graphics per page, there is a little
overhead in loading the files and in constructing the \XML\ tree as well as in
inserting the graphics, but in such cases one expects a few seconds longer
runtime. \METAFUN\ manual has some 450 pages with over 2500 runtime|-|generated
\METAPOST\ graphics. It has color, uses quite some fonts, has lots of font
switches (verbatim, too), but, still, one run takes only 18 seconds in stock
\LUATEX\ and less and less that 15 seconds with \LUAJITTEX. Keep these numbers in
mind if a non|-|\CONTEXT\ users bark against the performance tree that his few
page mediocre document takes 10 seconds to compile: the content, styling, quality
of macros and whatever one can come up with all play a role. Personally I find
any rate between 10 and 30 pages per second acceptable, and, if I get the lower
rate, then I normally know pretty well that the job is demanding in all kind of
aspects.

Over time, the \CONTEXT||\LUATEX\ combination, in spite of the fact that more
functionality has been added, has not become slower. In fact, some subsystems
have been sped up. For instance, font handling is very sensitive to adding
functionality. However, each version so far performed a bit better. Whenever some
neat new trickery was added, at the same time improvements were made thanks to
more insight in the matter. In practice, we're not talking of changes in speed by
large factors but more by small percentages. I'm pretty sure that most \CONTEXT\
users never noticed. Recently, a 15\endash30\% speed up (in font handling) was
realized (for more complex fonts), but only when you use such complex fonts and
pages full of text will you see a positive impact on the whole run.

There is one important factor I didn't mention yet: the efficiency of the
console. You can best check that by making a format (\typ {context --make en}).
When that is done by piping the messages to a file, it takes 3.2 seconds on my
laptop and about the same when done from the editor (\SCITE), maybe because the
\LUATEX\ run and the log pane run on a different thread. When I use the standard
console, it takes 3.8 seconds in Windows 10 Creative update (in older versions it
took 4.3 and slightly less when using a console wrapper). The powershell takes
3.2 seconds, which is the same as piping to a file. Interesting is that in Bash
on Windows, it takes 2.8 seconds and 2.6 seconds when piped to a file. Normal
runs are somewhat slower, but it looks like the 64 bit Linux binary is somewhat
faster than the 64 bit mingw version. \footnote {Long ago, we found that \LUATEX\
is very sensitive to for instance the \CPU\ cache, so maybe there are some
differences due to optimization flags and|/|or the fact that bash runs in one
thread, and all file \IO\ takes place in the main Windows instance. Who knows.}
Anyway, it demonstrates that when someone yells a number you need to ask what the
conditions were.

At a \CONTEXT\ meeting, there has been a presentation about possible speed|-|up
of of a run by using, for instance, a separate syntax checker to prevent a
useless run. However, the use case concerned a document that took a minute on the
machine used, while the same document took a few seconds on mine. At the same
meeting, we also did a comparison of speed for a \LATEX\ run using \PDFTEX\ and
the same document migrated to \CONTEXT\ \MKIV\ using \LUATEX\ (Harald K\"onigs
\XML\ torture and compatibility test). Contrary to what one might expect, the
\CONTEXT\ run was significantly faster; the resulting document was a few
gigabytes in size.

\stopsection

\startsection[title=Bottlenecks]

I will discuss a few potential bottlenecks next. A complex integrated system like
\CONTEXT\ has lots of components and some can be quite demanding. However, when
something is not used, it has no (or hardly any) impact on performance. Even when
we spend a lot of time in \LUA, that is not the reason for a slow|-|down.
Sometimes using \LUA\ results in a speedup, sometimes it doesn't matter. Complex
mechanisms like natural tables, for instance, will not suddenly become less
complex. So, let's focus on the \quotation {aspects} that come up in those
complaints: fonts and \LUA. Because I only use \CONTEXT\ and occasionally test
with the plain \TEX\ version that we provide, I will not explore the potential
impact of using truckloads of packages, styles, and such, which I'm sure of plays
a role, but one neglected in my discussion.

\startsubsubject[title=Fonts]

According to the principles of \LUATEX, we process (\OPENTYPE) fonts using \LUA.
That way, we have complete control over any aspect of font handling, and can, as
to be expected in \TEX\ systems, provide users what they need, now and in the
future. In fact, if we didn't had that freedom in \CONTEXT, I'd probably already
quit using \TEX\ a decade ago and found myself some other (programming) niche.

After a font has been loaded, part of the data gets passed to the \TEX\ engine,
so that it can do its work. For instance, in order to be able to typeset a
paragraph, \TEX\ needs to know the dimensions of glyphs. Once a font has been
loaded (that is, the binary blob) it's fetched from a cache the next time.
Initial loading (and preparation) takes some time, depending on the complexity
and the size of the font. Loading from cache is close to instantaneous. After
loading, the dimensions are passed to \TEX\ but all data remains accessible for
any desired usage. The \OPENTYPE\ feature processor, for instance, uses that data
and \CONTEXT, for sure, needs that data (quickly accessible) for different
purposes, too.

When a font is used in so|-|called base mode, we let \TEX\ do the ligaturing and
kerning. This is possible with simple fonts and features. If you have a critical
workflow, you might enable base mode, which can be done per font instance.
Processing in node mode takes some time, but how much depends on the font and
script. Normally, there is no difference between \CONTEXT\ and generic usage. In
\CONTEXT, we also have dynamic features, and the impact on performance depends on
usage. In addition to base and node, we also have plug mode, but that is only used
for testing and therefore not advertised.

Every \type {\hbox} and every paragraph goes through the font handler. Because
we support mixed modes, some analysis takes place, and because we do more in
\CONTEXT, the generic analyzer is more lightweight, which again can mean that a
generic run is not slower than a similar \CONTEXT\ one.

Interesting is that added functionality for variable and|/|or color fonts had no
impact on performance. Runtime|-|added user features can have some impact, but,
when defined well, it can be neglected. I bet that when you add additional node
list handling yourself, its impact on performance will be larger. But in the end
what counts is that the job gets done and the more you demand the higher the
price you pay.

\stopsubsubject

\startsubsubject[title=\LUA]

The second possible bottleneck when using \LUATEX\ can be in using \LUA\ code.
However, using that is laughable as an argument for slow runs. For instance,
\CONTEXT\ \MKIV\ can easily spend half its time in \LUA, and that is not making
it any slower than \MKII\ using \PDFTEX\ doing equally complex things. For
instance, the embedded \METAPOST\ library makes \MKIV\ way faster than \MKII, and
the built|-|in \XML\ processing capabilities in \MKIV\ can easily beat \MKII\
\XML\ handling, apart from the fact that it can do more, like filtering by path
and expression. In fact, files that take, say, half a minute in \MKIV, could as
well have taken 15 minutes or more in \MKII\ (and imagine multiple runs then).

So, for \CONTEXT, using \LUA\ to achieve its objectives is mandatory. The
combination of \TEX, \METAPOST\ and \LUA\ is pretty powerful! Each of these
components is really fast. If \TEX\ is your bottleneck, review your macros! When
\LUA\ seems to be the bad, go over your code and make it better. Much of the
\LUA\ code I see flying around doesn't look that efficient, which is okay, because
the interpreter is really fast, but don't blame \LUA\ beforehand, blame your
coding (style) first. When \METAPOST\ is the bottleneck, well, sometimes not much
can be done about it, but when you know that language well enough, you can often
make it perform better.

For the record: every additional mechanism that kicks in, like character spacing
(the ugly one), case treatments, special word and line trickery, marginal stuff,
graphics, line numbering, underlining, referencing, and a few dozen more will add
a bit to the processing time. In that case, in \CONTEXT, the font related runtime
gets pretty well obscured by other things happening, just that you know.

\stopsubsubject

\stopsection

\startsection[title=Some timing]

Next, I will show some timings related to fonts. For this, I use stock \LUATEX\
(second column) as well as \LUAJITTEX\ (last column), which, of course, performs
much better. The timings are rounded to three decimal places, but, as the system
load is usually only consistent in a set of test runs, the last two decimals only
matter in relative comparison. So, for comparing runs over time, round to the
first decimal. Let's start with loading a bodyfont. This happens once per
document, and one usually only has one bodyfont active. Loading involves
definitions as well as setting up math, so a couple of fonts are actually loaded
even if they're not used later on. A setup normally involves a serif, sans, mono
and math setup (in \CONTEXT). \footnote {The timing for Latin Modern is so low,
because that font is loaded already.}

\environment onandon-speed-000

\ShowSample{onandon-speed-000} % bodyfont

There is a bit of a difference between the font sets, but a safe average is 150
milliseconds, and this is rather constant over runs.

An actual font switch can result in loading a font, but this is a one|-|time overhead.
Loading four variants (regular, bold, italic and bold italic) roughly takes the
following time:

\ShowSample{onandon-speed-001} % four variants

Using them again later on takes no time:

\ShowSample{onandon-speed-002} % four variants

Before we start timing the font handler, a few baseline benchmarks are shown.
When no font is applied and nothing else is done with the node list, we get:

\ShowSample{onandon-speed-009}

A simple monospaced, no|-|features|-|applied, run takes a bit more:

\ShowSample{onandon-speed-010}

Now, we show a one|-|font typesetting run. As with the two benchmarks before, we
just typeset a text in a \type {\hbox}, so no par builder interference happens.
We use the \type {sapolsky} sample text and typeset it 100 times 4, first without
font switches.

\ShowSample{onandon-speed-003}

Much more runtime is needed when we typeset with four font switches. Ebgaramond
is the most demanding. Actually, we're not doing 4 fonts there because ebgaramond
has no bold, so the numbers are a bit lower than expected for this example. One
reason for it being demanding is that it has lots of (contextual) lookups.
Combining lookups saves space and time, so complexity of a font is not always a
good predictor for performance hits.

% \ShowSample{onandon-speed-004}

If we typeset paragraphs, we get the following:

\ShowSample{onandon-speed-005}

We're talking of some 275 pages here.

\ShowSample{onandon-speed-006}

There is, of course overhead in handling paragraphs and pages:

\ShowSample{onandon-speed-011}

Before I discuss these numbers in more detail, two more benchmarks are
shown. The next table concerns a paragraph with only a few (bold) words.

\ShowSample{onandon-speed-007}

The following table has paragraphs with a few mono spaced words
typeset using \type{\type}.

\ShowSample{onandon-speed-008}

When a node list (hbox or paragraph) is processed, each glyph is looked at. One
important property of \LUATEX\ (compared to \PDFTEX) is that it hyphenates the
whole text, not only the most feasible spots. For the \type {sapolsky} snippet,
this results in 200 potential breakpoints registered in an equal number of
discretionary nodes. The snippet has 688 characters grouped into 125 words and,
because it's an English quote, we're not hampered with composed characters or
complex script handling. And, when we mention 100 runs, then we actually mean
400 ones when font switching and bodyfonts are compared

\startnarrower
    \showglyphs \showfontkerns
    \input sapolsky \wordright{Robert M. Sapolsky}
\stopnarrower

In order to get substitutions and positioning right, we need not only to consult
streams of glyphs but also combinations with preceding pre or replace, or
trailing post and replace texts. When a font has a bit more complex substitutions,
as ebgaramond has, multiple (sometimes hundreds of) passes over the list are made.
This is why the more complex a font is, the more runtime is involved.

Another factor, one you could easily deduce from the benchmarks, is intermediate
font switches. Even a few such switches (in the last benchmarks) already result
in a runtime penalty. The four switch benchmarks show an impressive increase of
runtime, but it's good to know that such a situation seldom happens. It's also
important not to confuse, for instance, a verbatim snippet with a bold one. The
bold one is indeed leading to a pass over the list, but verbatim is normally
skipped, because it uses a font that needs no processing. That verbatim or bold
have the same penalty is mainly due to the fact that verbatim itself is costly:
the text is picked up using a different catcode regime and travels through \TEX\
and \LUA\ before it finally gets typeset. This relates to special treatments of
spacing, syntax highlighting, and such.

Also, keep in mind that the page examples are quite unreal. We use a layout with
no margins, just text from edge to edge.

\placefigure
  {\SampleTitle{onandon-speed-005}}
  {\externalfigure[onandon-speed-005][frame=on,orientation=90,width=.45\textheight]}

\placefigure
  {\SampleTitle{onandon-speed-006}}
  {\externalfigure[onandon-speed-006][frame=on,orientation=90,maxwidth=.45\textheight,maxheight=\textwidth]}

\placefigure
  {\SampleTitle{onandon-speed-007}}
  {\externalfigure[onandon-speed-007][frame=on,orientation=90,width=.45\textheight]}

\placefigure
  {\SampleTitle{onandon-speed-008}}
  {\externalfigure[onandon-speed-008][frame=on,orientation=90,width=.45\textheight]}

\placefigure
  {\SampleTitle{onandon-speed-011}}
  {\externalfigure[onandon-speed-011][frame=on,orientation=90,width=.45\textheight]}

So, what is a realistic example? That is hard to say. Unfortunately, no one has
ever asked us to typeset novels. They are rather brain dead-products for a
machinery, so they process fast. On the mentioned laptop, 350 word pages in
Dejavu fonts can be processed at a rate of 75 pages per second with \LUATEX\ and
over 100 pages per second with \LUAJITTEX . On a more modern laptop or a
professional server, the performance is of course better. And, for automated
flows, batch mode is your friend. The rate is not much worse for a document in a
language with a bit more complex character handling, take accents or ligatures.
Of course, \PDFTEX\ is faster on such a dumb document, but kick in some more
functionality, and the advantage quickly disappears. So, if someone complains
that \LUATEX\ needs 10 or more seconds for a simple few page document \unknown\
you can bet that when the fonts are seen as reason, then the setup is pretty bad.
Personally I would not waste time on such a complaint.

\stopsection

\startsection[title=Valid questions]

Here are some reasonable questions that you can ask when someone complains to you
about the slowness of \LUATEX:

\startsubsubject[title={What engines do you compare?}]

If you come from \PDFTEX, you come from an 8-bit world: input and font handling
are based on bytes, and hyphenation is integrated into the par builder. If you use
\UTF-8\ in \PDFTEX, the input is decoded by \TEX\ macros, which carries a speed
penalty. Because in the wide engines macro names can also be \UTF\ sequences,
construction of macro names is less efficient too.

When you try to use wide fonts, there is, again, a penalty. Now, if you use
\XETEX\ or \LUATEX, your input is \UTF-8, which becomes something 32-bit
internally. Fonts are wide, so more resources are needed, apart from these fonts
being larger and in need of more processing due to feature handling. Where
\XETEX\ uses a library, \LUATEX\ uses its own handler. Does that have a
consequence for performance? Yes and no. First of all, it depends on how much
time is spent on fonts at all, but even then, the difference is not that large.
Sometimes \XETEX\ wins, sometimes it's \LUATEX. One thing is clear: \LUATEX\ is
more flexible as we can roll out our own solutions and therefore do more advanced
font magic. For \CONTEXT, it doesn't matter as we use \LUATEX\ exclusively, and
we rely on the flexible font handler, also for future extensions. If really
needed, you can kick in a library-based handler but it's (currently) not
distributed as we lose other functionality, which would, in turn, result in
complaints about that fact (apart from conflicting with the strive for
independence).

There is no doubt that \PDFTEX\ is faster, but, for \CONTEXT, it's an obsolete
engine. The hard-coded-solutions engine \XETEX\ is not feasible for \CONTEXT\
either. So, in practice, \CONTEXT\ users have no choice: \LUATEX\ is used, but
users of other macro packages can use the alternatives if they are not satisfied
with performance. The fact that \CONTEXT\ users don't complain about speed is a
clear signal that this is a no|-|issue. And, if you want more speed, you can always
use \LUAJITTEX. \footnote {In plug mode, we can actually test a library and
experiments have shown that performance on the average is much worse, but it can
be a bit better for complex scripts, although a gain gets unnoticed in normal
documents. So, one can decide to use a library but at the cost of much other
functionality that \CONTEXT\ offers, so we don't support it.} In the last section,
the different engines will be compared in more detail.

Just that you know, when we do the four|-|switches example in plain \TEX\ on my
laptop, I get a rate of 40 pages per second, and, for one font, 180 pages per
second. There is, of course, a bit more going on in \CONTEXT\ in page building
and so, but the difference between plain and \CONTEXT\ is not that large.

\stopsubsubject

\startsubsubject[title={What macro package is used?}]

When plain \TEX\ is used, a follow up question is: what variant? The \CONTEXT\
distribution ships with \type {luatex-plain}, and that is our benchmark. If there
really is a bottleneck, it is worth exploring, but keep in mind that, in order to
be plain, not that much can be done. The \LUATEX\ part is just an example of an
implementation. We already discussed \CONTEXT, and for \LATEX, I don't want to
speculate where performance hits might come from. When we're talking fonts,
\CONTEXT\ can actually be a bit slower than the generic (or \LATEX) variant, because
we can kick in more functionality. Also, when you compare macro packages, keep in
mind that, when node list processing code is added in that package, the impact
depends on interaction with other functionality and depends on the efficiency of
the code. You can't compare mechanisms or draw general conclusions when you don't
know what else is done!

\stopsubsubject

\startsubsubject[title={What do you load?}]

Most \CONTEXT\ modules are small and load fast. Of course, there can be exceptions
when we rely on third party code; for instance, loading tikz takes a bit of
time. It makes no sense to look for ways to speed that system up, because it is
maintained elsewhere. There can probably be gained a bit, but, again, no user
has complained so far.

If \CONTEXT\ is not used, one probably also uses a large \TEX\ installation.
File lookup in \CONTEXT\ is done differently, and can be faster. Even loading
can be more efficient in \CONTEXT, but it's hard to generalize that conclusion.
If one complains about loading fonts being an issue, just try to measure how much
time is spent on loading other code.

\stopsubsubject

\startsubsubject[title={Did you patch macros?}]

Not everyone is a \TEX pert. So, coming up with macros that are expanded many
times and|/|or have inefficient user interfacing, can have some impact. If someone
complains about one subsystem being slow, then honesty demands to complain about
other subsystems as well. You get what you ask for.

\stopsubsubject

\startsubsubject[title={How efficient is the code that you use?}]

Writing super|-|efficient code only makes sense when it's used frequently. In
\CONTEXT, most code is reasonable efficient. It can be that in one document fonts
are responsible for most runtime, but in another document, table construction can
be more demanding while yet another document puts some stress on interactive
features. When hz or protrusion is enabled, then you run substantially slower
anyway, so when you are willing to sacrifice 10 \% or more of runtime, don't
complain about other components. The same is true for enabling \SYNCTEX: if you
are willing to add more than 10 \% of runtime for that, don't wither about the
same amount for font handling. \footnote {In \CONTEXT, we use a \SYNCTEX\
alternative that is somewhat faster, but it remains a fact that enabling more and
more functionality will make the penalty of, for instance, font processing
relatively small.}

\stopsubsubject

\startsubsubject[title={How efficient is the styling that you use?}]

Probably the most easily overlooked optimization is in switching fonts and colors.
Although in \CONTEXT, font switching is fast, I have no clue about it in other
macro packages. But in a style, you can decide to use inefficient (massive) font
switches. The effects can easily be tested by commenting bit and pieces. For
instance, sometimes you need to do a full bodyfont switch when changing a style,
like assigning \type {\small\bf} to the \type {style} key in \type {\setuphead},
but often using e.g.\ \type {\tfd} is much more efficient and works quite as
well. Just try it.

\stopsubsubject

\startsubsubject[title={Are fonts really the bottleneck?}]

We already mentioned that one can look in the wrong direction. Maybe, once someone
is convinced that fonts are the culprit, it gets hard to look at the real issue.
If a similar job in different macro packages has a significantly different runtime,
one can wonder what happens indeed.

It is good to keep in mind that the amount of text is often not as large as you
think. It's easy to do a test with hundreds of paragraphs of text, but, in practice,
we have whitespace, section titles, half empty pages, floats, itemize and similar
constructs, etc. Often, we don't mix many fonts in the running text either. So,
in the end, a real document is your best test.

\stopsubsubject

\startsubsubject[title={If you use \LUA, is that code any good?}]

You can gain from the faster virtual machine of \LUAJITTEX. Don't expect wonders
from the jitting as that only pays off in long runs with the same code used over
and over again. If the gain is high, you can even wonder how well-written your
\LUA\ code is anyway.

\stopsubsubject

\startsubsubject[title={What if they don't believe you?}]

So, say that someone finds \LUATEX\ slow, what can be done about it? Just advice
them to stick to their previously|-|used tool. Then, if arguments come that one
also wants to use \UTF-8, \OPENTYPE\ fonts, a bit of \METAPOST, and is looking
forward to using \LUA\ runtime, the only answer is: take it or leave it. You pay
a price for progress, but, if you do your job well, the price is not that high.
Tell them to spend time on learning and maybe adapting and to bark against their own
tree before barking against those who took that step a decade ago. Most \CONTEXT\
users took that step and someone still using \LUATEX\ after a decade can't be
that stupid. It's always best to first wonder what one actually asks from \LUATEX,
and if the benefit of having \LUA\ on board has an advantage. If not, one can
just use another engine.

Also think of this: when a job is slow, for me it's no problem to identify where
the problem is. The question then is: can something be done about it? Well, I
happily keep the answer for myself. After all, some people always need room to
complain, maybe if only to hide their ignorance or incompetence. Who knows.

\stopsubsubject

\stopsection

\startsection[title={Comparing engines}]

The next comparison is to be taken with a grain of salt and concerns the state of
affairs mid-2017. First of all, you cannot really compare \MKII\ with \MKIV: the
latter has more functionality (or a more advanced implementation of
functionality). And, as mentioned, you can also not really compare \PDFTEX\ and the
wide engines. Anyway, here are some (useless) tests. First, a bunch of loads. Keep
in mind that different engines also deal differently with reading files. For
instance, \MKIV\ uses \LUATEX\ callbacks to normalize the input and has its own
readers. There is a bit more overhead in starting up a \LUATEX\ run, and some
functionality is enabled that is not present in \MKII. The format is also larger,
if only because we preload a lot of useful font, character and script related
data.

\starttyping
\starttext
    \dorecurse {#1} {
        \input knuth
        \par
    }
\stoptext
\stoptyping

When looking at the numbers, one should realize that the times include startup and
job management by the runner scripts. We also run in batchmode to avoid logging
to influence runtime. The average is calculated from 5 runs.

% sample 1, number of runs: 5

\starttabulate[||r|r|r|]
\HL
\BC engine \BC 50 \BC 500 \BC 2500 \NC \NR
\HL
\BC pdftex    \NC 0.43 \NC 0.77 \NC 2.33 \NC \NR
\BC xetex     \NC 0.85 \NC 2.66 \NC 10.79 \NC \NR
\BC luatex    \NC 0.94 \NC 2.50 \NC 9.44 \NC \NR
\BC luajittex \NC 0.68 \NC 1.69 \NC 6.34 \NC \NR
\HL
\stoptabulate

The second example does a few switches in a paragraph:

\starttyping
\starttext
    \dorecurse {#1} {
        \tf \input knuth
        \bf \input knuth
        \it \input knuth
        \bs \input knuth
        \par
    }
\stoptext
\stoptyping

% sample 2, number of runs: 5

\starttabulate[||r|r|r|]
\HL
\BC engine \BC 50 \BC 500 \BC 2500 \NC \NR
\HL
\BC pdftex    \NC 0.58 \NC 2.10 \NC 8.97 \NC \NR
\BC xetex     \NC 1.47 \NC 8.66 \NC 42.50 \NC \NR
\BC luatex    \NC 1.59 \NC 8.26 \NC 38.11 \NC \NR
\BC luajittex \NC 1.12 \NC 5.57 \NC 25.48 \NC \NR
\HL
\stoptabulate

The third example does more, resulting in multiple subranges per style:

\starttyping
\starttext
    \dorecurse {#1} {
        \tf \input knuth \it knuth
        \bf \input knuth \bs knuth
        \it \input knuth \tf knuth
        \bs \input knuth \bf knuth
        \par
    }
\stoptext
\stoptyping

% sample 3, number of runs: 5

\starttabulate[||r|r|r|]
\HL
\BC engine \BC 50 \BC 500 \BC 2500 \NC \NR
\HL
\BC pdftex    \NC 0.59 \NC 2.20 \NC 9.52 \NC \NR
\BC xetex     \NC 1.49 \NC 8.88 \NC 43.85 \NC \NR
\BC luatex    \NC 1.64 \NC 8.91 \NC 41.26 \NC \NR
\BC luajittex \NC 1.15 \NC 5.91 \NC 27.15 \NC \NR
\HL
\stoptabulate

The last example adds some color. Enabling more functionality can have an impact
on performance. In fact, as \MKIV\ uses a lot of \LUA\ and is also more advanced
that \MKII, one can expect a performance hit, but, in practice, the opposite
happens, which can also be due to some fundamental differences deep down at the
macro level.

\starttyping
\setupcolors[state=start] % default in MkIV

\starttext
    \dorecurse {#1} {
        {\red \tf \input knuth \green \it knuth}
        {\red \bf \input knuth \green \bs knuth}
        {\red \it \input knuth \green \tf knuth}
        {\red \bs \input knuth \green \bf knuth}
        \par
    }
\stoptext
\stoptyping

% sample 4, number of runs: 5

\starttabulate[||r|r|r|]
\HL
\BC engine \BC 50 \BC 500 \BC 2500 \NC \NR
\HL
\BC pdftex    \NC 0.61 \NC 2.36 \NC 10.33 \NC \NR
\BC xetex     \NC 1.53 \NC 9.25 \NC 45.59 \NC \NR
\BC luatex    \NC 1.65 \NC 8.91 \NC 41.32 \NC \NR
\BC luajittex \NC 1.15 \NC 5.93 \NC 27.34 \NC \NR
\HL
\stoptabulate

In these measurements, the accuracy is a few decimals, but a pattern is visible.
As expected, \PDFTEX\ wins on simple documents but starts losing when things get
more complex. For these tests, I used 64 bit binaries. A 32-bit \XETEX\ with
\MKII\ performs the same as \LUAJITTEX\ with \MKIV, but a 64-bit \XETEX\ is
actually quite a bit slower. In that case, the mingw cross|-|compiled \LUATEX\
version does pretty well. A 64-bit \PDFTEX\ is also slower (it looks) than a
32-bit version. So, in the end, there are more factors that play a role. Choosing
between \LUATEX\ and \LUAJITTEX\ depends on how well the memory|-|limited
\LUAJITTEX\ variant can handle your documents and fonts.

Because in most of our recent styles we use \OPENTYPE\ fonts and (structural)
features as well as recent \METAFUN\ extensions only present in \MKIV, we cannot
compare engines using such documents. The mentioned performance of \LUATEX\ (or
\LUAJITTEX) and \MKIV\ on the \METAFUN\ manual illustrate that, in most cases, this
combination is a clear winner.

\starttyping
\starttext
    \dorecurse {#1} {
        \null \page
    }
\stoptext
\stoptyping

This gives:

% sample 5, number of runs: 5

\starttabulate[||r|r|r|]
\HL
\BC engine \BC 50 \BC 500 \BC 2500 \NC \NR
\HL
\BC pdftex    \NC 0.46 \NC 1.05 \NC 3.72 \NC \NR
\BC xetex     \NC 0.73 \NC 1.80 \NC 6.56 \NC \NR
\BC luatex    \NC 0.84 \NC 1.44 \NC 4.07 \NC \NR
\BC luajittex \NC 0.61 \NC 1.10 \NC 3.33 \NC \NR
\HL
\stoptabulate

That leaves the zero run:

\starttyping
\starttext
    \dorecurse {#1} {
        % nothing
    }
\stoptext
\stoptyping

This gives the following numbers. In longer runs, the difference in overhead is
negligible.

% sample 6, number of runs: 5

\starttabulate[||r|r|r|]
\HL
\BC engine \BC 50 \BC 500 \BC 2500 \NC \NR
\HL
\BC pdftex    \NC 0.36 \NC 0.36 \NC 0.36 \NC \NR
\BC xetex     \NC 0.57 \NC 0.57 \NC 0.59 \NC \NR
\BC luatex    \NC 0.74 \NC 0.74 \NC 0.74 \NC \NR
\BC luajittex \NC 0.53 \NC 0.53 \NC 0.54 \NC \NR
\HL
\stoptabulate

It will be clear that when we use different fonts, the numbers will also be
different. And, if you use a lot of runtime \METAPOST\ graphics (for instance for
backgrounds), the \MKIV\ runs end up at the top. And, when we process \XML, it
will be clear that going back to \MKII\ is no longer a realistic option. It must
be noted that I occasionally manage to improve performance, but we've now reached
a state where there is not that much to gain. Some functionality is hard to
compare. For instance, in \CONTEXT, we don't use much of the \PDF\ backend
features because we implement them all in \LUA. In fact, even in \MKII, already
done in \TEX, so in the end, the speed difference there is not large and often in
favour of \MKIV.

For the record, I mention that shipping out the about 1250 pages has some overhead
too: about 2 seconds. Here, \LUAJITTEX\ is 20\% more efficient, which is an
indication of quite some \LUA\ involvement. Loading the input files has an
overhead of about half a second. Starting up \LUATEX\ takes more time than
\PDFTEX\ and \XETEX, but that disadvantage disappears with more pages. So, in the
end, there are quite some factors that blur the measurements. In practice, what
matters is convenience: does the runtime feel reasonable and, in most cases, it
does.

If I would replace my laptop with a reasonable comparable alternative, that one
would be some 35\% faster (single threads on processors don't gain much per year).
I guess that this is about the same increase in performance than \CONTEXT\
\MKIV\ got in that period. I don't expect such a gain in the upcoming years, so,
at some point, we're stuck with what we have.

\stopsection

\startsection[title=Summary]

So, how \quotation {slow} is \LUATEX\ really compared to the other engines? If we
go back in time to when the first wide engines showed up, \OMEGA\ was considered
to be slow, although I never tested that myself. Then, when \XETEX\ showed up,
there was not much talk about speed, just about the fact that we could use
\OPENTYPE\ fonts and native \UTF\ input. If you look at the numbers, for sure you
can say that it was much slower than \PDFTEX. So, how come that some people
complain about \LUATEX\ being so slow, especially when we take into account that
it's not that much slower than \XETEX, and that \LUAJITTEX\ is often faster than
\XETEX. Also, computers have become faster. With the wide engines, you get more
functionality and that comes at a price. This was accepted for \XETEX\ and is
also acceptable for \LUATEX. But the price is nto that high if you take into
account that hardware performs better: you just need to compare \LUATEX\ (and
\XETEX) runtime with \PDFTEX\ runtime 15 years ago.

As a comparison, look at games and video. Resolution became much higher as did
color depth. Higher frame rates were in demand. Therefore, the hardware had to
become faster, and it did, and, as a result, the user experience kept up. No user
will say that a modern game is slower than an old one, because the old one does
500 frames per second compared to some 50 for the new game on the modern
hardware. In a similar fashion, the demands for typesetting became higher:
\UNICODE, \OPENTYPE, graphics, \XML, advanced \PDF, more complex (niche)
typesetting, etc. This happened more or less in parallel with computers becoming
more powerful. So, as with games, the user experience didn't degrade with
demands. Comparing \LUATEX\ with \PDFTEX\ is like comparing a low|-|res,
low|-|framerate, low|-|color game with a modern one. You need to have
up|-|to|-|date hardware and even then, the writer of such programs needs to make
sure that they run efficiently, simply because hardware no longer scales like it
did decades ago. You need to look at the bigger picture.

\stopsection

\stopchapter

\stopcomponent