% language=us

\startcomponent followingup-evolution

\environment followingup-style

% Yes, music is still evolving in qualitative ways ...
%
% Home Is - Jacob Collier with VOCES8
%
% and as long as there's interesting new music to run into I keep
% doing this kind of thing.

\startchapter[title={Evolution}]

\startsection[title={Introduction}]

The original idea behind \TEX\ is that of a relatively small kernel with (system
dependent or not) extensions. One such extension is the \DVI\ backend, and
later \PDFTEX\ added a \PDF\ backend. Other extensions are \quote {writing to
files} and \quote {writing to the output medium} using so called specials. This
extension mechanism permits \TEX\ to support, for instance, color and image
inclusion.

The \LUATEX\ project started from \PDFTEX, including its extensions like font
expansion, and combined that with (bi|)|directional typesetting from the, at that
moment, stable \OMEGA\ variant \ALEPH. During more than a decade of development
we integrated expansion in a more efficient way and limited directions to the
four that made sense. The assumption that \UNICODE\ is the future led to \UTF8
being used all over the place.

The \LUATEX\ variant opens up the internals using the \LUA\ extension language.
The idea was (and still is) that instead of adding more and more hard coded
solutions, one can use \LUA\ to do it on demand. So, for instance \OPENTYPE\
fonts are supported by providing a font file reader but the implementation of
features is up to \LUA. From \PDFTEX\ the graphic inclusions were inherited but
an image and \PDF\ reading library provided a few more possibilities, for
instance for querying properties. An important integral part of \LUATEX\ is the
\METAPOST\ library, but apart from that one, the number of libraries is kept at a
minimum. That way we're free of dependencies and compilation hassles.

With version 1.0 the functionality became official and with version 1.1 the
functionality became more or less frozen. The main reason for this is that
further extensions would violate the principle of using \LUA\ instead of hard
coding solutions. Another reason is that at some point you have to provide a
stable machinery for macro packages so that backward as well as forward
compatibility over a longer period is possible. Also, because one can use \TEX\
in (unattended) workflows sudden changes become undesirable.

\stopsection

\startsection[title={What next?}]

Does it stop here? We have reached a reasonably stable state with \CONTEXT\
\MKIV\ and can basically do what we want to do. However, during more than a
decade of development of this \MKII\ follow up, the idea surfaced that we can go
more minimal in the engine. Basically we can go back to where \TEX\ started: a
core plus extension mechanism. What does that mean? First of all, there is the
very efficient frontend: scanning macros, expanding them and constructing node
lists, all within a powerful grouping mechanism. There is no reason to reconsider
that. The core of the interface is also well documented, for instance in the
\TEX\ book. We added some primitives to \LUATEX, but most of them are of no real
importance to users; they make more sense to macro package writers.

Original \TEX\ has a \DVI\ backend which is a simple representation of a page:
characters and rules positioned on some grid. A separate program has to convert
that into something for a printer. There is a basic extension mechanism that
permits injection of so called specials that get passed to the external program
so that for instance an image can be included. Given that \LUATEX\ is mostly used
to generate \PDF, using so called wide fonts in a \UNICODE\ universe, a \DVI\
backend is not that useful. In fact, one can then better use the faster \PDFTEX\
program or just \ETEX\ or \TEX: use the best tool available for the job.

The backend however can be left out and can be implemented in \LUA\ instead. In
fact, most of the backend related code in \CONTEXT\ doesn't really use the
\LUATEX\ backend features at all. The backend is only used to convert the page
stream to a \PDF\ content stream, include images, include fonts and manage low
level objects. Everything specific to \PDF\ is already done in \LUA. Of course
this has a performance penalty but given the overhead already present in
\CONTEXT\ it is bearable.
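
To get a feeling for what converting a page stream into a \PDF\ content stream
amounts to, here is a toy sketch in plain \LUA. It is not taken from \CONTEXT:
the item tables and the function name \type {flushpage} are made up for this
illustration, real node lists carry much more information, and the characters
are assumed not to need escaping. Only the \PDF\ operators are real.

\starttyping
-- Toy page stream flusher. Positions and sizes are in big points.
-- The "items" layout is hypothetical, not the node format of the engine.

local function flushpage(items)
    local result = { }
    for i=1,#items do
        local item = items[i]
        if item.kind == "glyph" then
            -- select the font, set the text matrix and show one character
            result[#result+1] = string.format(
                "BT /F%d %g Tf 1 0 0 1 %g %g Tm (%s) Tj ET",
                item.font, item.size, item.x, item.y, item.char
            )
        elseif item.kind == "rule" then
            -- a filled rectangle: x y width height
            result[#result+1] = string.format(
                "%g %g %g %g re f",
                item.x, item.y, item.width, item.height
            )
        end
    end
    return table.concat(result,"\n")
end

print(flushpage {
    { kind = "glyph", font = 1, size = 10, x = 72, y = 720, char = "A" },
    { kind = "rule", x = 72, y = 715, width = 100, height = 0.4 },
})
\stoptyping

A real backend also has to register fonts and images as resources and wrap the
result in \PDF\ objects, which this sketch leaves out on purpose.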

Alongside the frontend the \METAPOST\ library plays an important role in
\CONTEXT: integration between \TEX, \METAPOST\ and \LUA\ is pretty tight and a
unique property of \CONTEXT. But, for instance the font reader library is no
longer used. Also the interfacing to the \TEX\ Directory Structure was done in
\LUA, originally for performance reasons as it reduced startup time by more than
a second. For some of the frontend code (like hyphenation and par building) we
can kick in \LUA\ variants too but there is not much to gain there. (I know that
some users use them with success.)

So, traditional \TEX\ can be summarized as:

\starttyping
tex core + dvi backend + tex extensions
\stoptyping

where the extension interface provides a few goodies. If we had to summarize
\LUATEX\ we could say:

\starttyping
tex core + dvi & pdf backend + tex extensions + lua callbacks
\stoptyping

The core interprets the input and does the typesetting. In order to be able to
typeset, \TEX\ only needs the dimensions of characters and information about
spacing (which in principle are sort of independent). In math mode a few more
properties are needed, like snippets that make large symbols. In text mode
ligature and kerning information can be used too. However, in \LUATEX, where
normally \OPENTYPE\ fonts are used, that information is provided from \LUA. This
means that one can also think of:

\starttyping
tex core + basic font data + tex extensions + lua callbacks
\stoptyping

Compared to regular \TEX\ this is not that different, and it's what \CONTEXT\ can
make do with. So, it will be no surprise that when I wondered what \LUATEX\ 2.0
could be, a more minimalistic approach was considered: back to the basics.
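
The \quote {basic font data} in this summary can be made concrete with a small
sketch for plain \LUATEX\ (so not \CONTEXT, which installs its own font
definition callback). It uses the \type {define_font} callback of \LUATEX; the
function name \type {definefont}, the file name and all metric values are made
up for the illustration. The table only carries what the typesetter needs.

\starttyping
-- To be loaded in plain LuaTeX, for instance with
-- \directlua{dofile("demofont.lua")} (hypothetical file name).

local function definefont(name,size)
    -- a negative size means "scaled": -1000 then maps onto 10pt (in sp)
    if size < 0 then
        size = math.floor(-655.36 * size + 0.5)
    end
    -- n is given in units of 1/1000 of the font size
    local function scaled(n)
        return math.floor(n * size / 1000 + 0.5)
    end
    return {
        name       = name,
        size       = size,
        -- made-up dimensions for "A" and "g", enough for line breaking
        characters = {
            [0x41] = { width = scaled(700), height = scaled(700), depth = 0 },
            [0x67] = { width = scaled(500), height = scaled(450), depth = scaled(200) },
        },
        -- spacing information (the classic font dimensions)
        parameters = {
            slant         = 0,
            space         = scaled(300),
            space_stretch = scaled(150),
            space_shrink  = scaled(100),
            x_height      = scaled(450),
            quad          = scaled(1000),
            extra_space   = 0,
        },
    }
end

-- once registered, every later font definition passes through this function
callback.register("define_font",definefont)
\stoptyping

Nothing here tells a backend how to draw or embed the shapes, which is the point
of the summary: the frontend can typeset with dimensions and spacing only.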

\stopsection

\startsection[title={Roadmap}]

Before I continue it is good to mention the following. One of the burdens that
\CONTEXT\ users (and developers) carry is that the outside world likes putting
labels on \CONTEXT, like \quotation {A macro package depending on \PDFTEX} at a
time when we supported \DVI\ at the same level using a more or less generic
driver model. The same is true for \MKIV, e.g.\ \quotation {\CONTEXT\ uses a lot
of \LUA\ and moves away from \TEX} while in fact we provide a hybrid tool: you
can use \TEX\ input (which most users do) but also \LUA\ (which can be handy) or
\XML\ (which some publishers demand and definitely seems to be used by some
\CONTEXT\ power users). A special one is \quotation {\CONTEXT\ is kind of plain
\TEX, so you have to program all yourself.} Reality is that \CONTEXT\ is an
integrated system, where \TEX\ and \METAPOST\ work together to provide a lot of
integrated functionality. Because of \LUATEX\ development and the relation
between an updated engine and the beta version of \CONTEXT, the impression can be
that we have an unstable system. This strategy of parallel adaptation is the only
way to really test if things work as expected. Because we have a rather fast
update cycle normally users don't suffer that much from it.

The core of whatever we follow up with is and remains \TEX, just because I like
it. So, when I talk about a small core, I actually still talk about \TEX. The
main reason is that it's way easier (and more readable) to code some solutions in
this hybrid fashion. A pure \LUA\ solution is no fun, maybe even a pain, and I have
no use for it, but a pure \TEX\ solution can be cumbersome too. And \TEX\ input is
just very convenient and for that one needs a \TEX\ interpreter. I would already
have dropped out if \TEX\ were not part of the game: an intriguing, puzzling and
powerful toy. And \METAPOST\ and \LUA\ add even more fun. So, I settle for a mix
between three interesting languages. And, because I seldom run into professional
demand for \LUATEX\ related support (or high end, high performance rendering),
the fun factor has always been the driving force.

All that said, for practical reasons, when we explore a follow up in the
perspective of \CONTEXT, we will use the working title \LUAMETATEX\ instead.
\LUAMETATEX\ has the current \LUATEX\ frontend, some \LUA\ libraries, but no
backend. Gone are the font reader, image inclusion, \DVI\ and \PDF\ backend
(including font inclusion) and the interface to the \TDS. Can that work? As
mentioned, the font reader was already not used in \CONTEXT\ for quite a while. An
alternative page stream builder was also in good working condition in \CONTEXT\
when \LUATEX\ 1.08 was released and around \LUATEX\ 1.09 image inclusion was
replaced (\PDF\ inclusion was already accompanied for a while by a \LUA\
variant). Currently (fall 2018) \CONTEXT\ is able to completely construct the
\PDF\ file which also meant font inclusion. However, it didn't make much sense to
release that code yet because after all, there was minimal gain when using it
with a full blown \LUATEX. Also, switching to this variant involved some runtime
adaptation of code which might confuse users. But above all, it needed more
testing, and releasing something before an upcoming \TEX Live code freeze is a
bad idea.

During \LUATEX\ development a few times we got suggestions for additional
features but merely looking at them already made clear that what works for
someone in a particular case, can introduce side effects that make (for instance)
\CONTEXT\ fail. And, how many folks keep \CONTEXT\ in mind? So, when \LUATEX\
goes into maintenance mode, specific distributions could accept patches outside
our control, which has the danger that a binary (suggesting to be \LUATEX)
doesn't work with \CONTEXT. Of course we cannot change something ourselves either
without looking around. And I'm not even bringing possible negative side effects
on performance into the discussion here.

When developing \LUATEX\ some ideas were dropped or delayed and these can now be
explored without the danger of messing up the stable version. It has always been
relatively easy to adapt \CONTEXT\ to changes so an (at least for now)
experimental follow up can be dealt with too, but this time the concept of \quote
{experimental} is really bound to \CONTEXT. When something is found useful (or
can be improved) it can always (after testing it for a while) be fed back into
\LUATEX, as long as it doesn't break something. I'll decide on that later.

In the documentation of \TEX, when discussing the extension mechanism, Donald
Knuth says:

\startquotation
The goal of a \TEX\ extender should be to minimize alterations to the standard
parts of the program, and to avoid them completely if possible. He or she should
also be quite sure that there's no easy way to accomplish the desired goals with
the standard features that \TEX\ already has. \quotation {Think thrice before
extending}, because that may save a lot of work, and it will also keep
incompatible extensions of \TEX\ from proliferating.
\stopquotation

With the reduction of backend and some frontend code discussed in the next
chapters, combined with hooks that can trigger callbacks, we try to come close to
this objective. Now, the last sentence of this quote relates to stability and
this is also a reason why we enter this new thread: the smaller the core is, the
less subject we are to change. Think of this: I haven't used \CONTEXT\ \MKII\
in over a decade. A \PDFTEX\ format still gets generated but I have no clue if
the engine has been changed in ways that make some code behave differently (it
could also be the ecosystem related to that engine), but I assume it's still
behaving the same. The same has to become true for stock \LUATEX\ and \MKIV\ and
for \CONTEXT\ it can even become more true with \LUAMETATEX. We'll see.

\stopsection

\startsection[title={Experiments}]

This (still sort of) prototype of what \LUAMETATEX\ could be boils down to a much
smaller binary, and not that much more \LUA\ code on top of what we already have.
There are no longer dependencies on third party code, apart from \LUA\ (\type
{pplib} is tuned for \LUATEX\ and a permanent part of the code base). Performance
wise the backend of the experimental version makes a run up to 5\% slower than
when using a native backend (on processing the \LUATEX\ manual) but history has
taught us that we can gain some of that back in due time. Performance also depends
a bit on the properties of the document. Interestingly, better control over
the output showed that \PDF\ output of the mentioned manual was a bit smaller
(but that might change). \footnote {In the meantime the experimental version can
process the \LUATEX\ manual 5\endash10\% faster and the result is still smaller.}

The experiments actually started already years ago with no longer using the font
loader. It sort of went this way:

\startitemize
\startitem
    Stepwise \CONTEXT\ functionality started using a combination of \TEX\ and
    \LUA\ code and we got an idea of what was needed. The most demanding part
    was support for fonts.
\stopitem
\startitem
    Font handling was done in \LUA\ because it's flexible which is what \TEX ies
    are accustomed to. The \OPENTYPE\ and \PDF\ standards would not be called
    standards if some implementation was impossible and so far we're ok. (Some
    more script support will be provided in future versions.)
\stopitem
\startitem
    We stopped using the \FONTFORGE\ font loader but use one written in \LUA\
    instead. One reason for this was that when variable fonts showed up we wanted
    to support them in \CONTEXT\ right from the start (not that there has been much
    demand). The same is true for fonts using color (like emoji). Also, fighting
    the built|-|in \FONTFORGE\ heuristics was hard.
\stopitem
\startitem
    The (large and dependent on \CPLUSPLUS) poppler library used for \PDF\
    embedding has been replaced by a small lightweight library in pure \CCODE.
    This was triggered by a chat during a bacho\TEX\ meeting.
\stopitem
\startitem
    The hard coded \PDF\ inclusion can be swapped with a \LUA\ based one so that
    we can for instance filter the page stream. We already had a hybrid solution
    in \CONTEXT\ anyway for other reasons (merging annotations, layers,
    bookmarks, etc.).
\stopitem
\startitem
    The page stream constructor (shipout and xforms) got replaced by a \LUA\ variant,
    but I decided not to make that an independent option in stock \LUATEX\ with
    \CONTEXT\ \MKIV, although for a while I had the option \type {--lmtx} for
    activating that experimental code.
\stopitem
\startitem
    Then of course bitmap image inclusion had to be done by \LUA\ code, in order
    to see if we can get rid of another external dependency as some of these
    libraries get frequent updates while in practice we only use a very small
    subset of functionality. Indeed this was possible. \footnote {I have a pure
    \LUA\ parser for \PDF\ too, so at some point that might get included in the
    \CONTEXT\ code base.}
\stopitem
\startitem
    With some effort (deciphering specs and such) the font inclusion could also
    be done by \LUA. This was made possible by the fact that we already had
    support for variable fonts. More tricks are possible and will be explored.
\stopitem
\startitem
    Finally the \PDF\ file construction and \PDF\ object management had to be
    implemented. This was actually the easiest part.
\stopitem
\stopitemize

Performance wise the \LUA\ font loader is faster than the built in one. The same
is true for \PDF\ inclusion but in practice that is unnoticeable. Bitmap
inclusion is currently slower for interlaced images (seldom used in print) and
just as efficient for other types. The page stream constructor is definitely
slower but this is compensated by the faster font inclusion and \PDF\ file
construction. Of course it all depends on the kind of content, but these are the
observations as of fall 2018. Anyway, they were enough reason to continue this
experiment.

One thing to keep in mind is that the smaller the binary and the fewer code paths
we have, the better future performance might be. Computers are not becoming much
faster for single thread processes like \TEX, so the less we jump around code
space (memory) the better it probably is for \CPU\ caching (as caches are not
growing much either).

\stopsection

\startsection[title={Conclusion}]

Normally when writing this kind of code I make sure that I can enable such new
mechanisms on top of others but at some point one has to decide how to really
integrate them. For instance, we can do font inclusion independent of \PDF\
generation or page stream construction independent of \PDF\ generation and|/|or
font inclusion but in the end that doesn't make sense and makes the code base a
bit of a mess. So, this is how it will go.

Stock \LUATEX\ with \MKIV\ will use the normal backend but there might be an
option to overload the built|-|in image inclusion so that one can avoid a run
being aborted in case of problematic images. Complete \PDF\ file
construction, which then also includes page stream construction, font embedding
and object management, might be available as an option for \MKIV\ with \LUATEX\ 1.10
(for a while) but will be the default when using \LUAMETATEX. When we move on, \LMTX\
support might evolve into more sophisticated trickery. \footnote {A few months
later I decided that this made no sense, and that it was cleaner to just leave
that approach for \LMTX\ only. So, now both engines use different code
exclusively.}

Once tested a bit in real documents, experimental code will end up in the
distribution. That code can then be turned into production code (read: cleaned up
and reshuffled a bit). We can streamline the engine code base: strip the
components that are not needed any more, remove some obsolete features, optimize
the code, strip some functions from \LUA\ libraries, rename some helpers, and
finally add some documentation. There are some plans to extend \METAPOST\ so
things can also get added. Concerning the \LUA\ interface it means that \type
{slunicode} is removed, the embedded socket related \LUA\ code goes external (but
the library stays), the font loader gets removed, the \type {img} library goes
away, \PNG\ libraries are no longer embedded, and synctex is stripped out (but the
fields in nodes stay or get extended). \footnote {Much later I also decided to
remove the zip file reader library.} The resulting binary will be much smaller
and the code base more independent and smaller too. In the process \LUAJIT\
support might be dropped as well, simply because it no longer is in sync with
stock \LUA, but that also depends on how complex long term maintenance becomes.
\footnote {As we will see in following chapters, indeed support for \LUAJIT\ has
been dropped while \LUA\ got upgraded to 5.4.}

Because such a stripped down binary is no longer what got presented as \LUATEX\
version~1, it will basically become \LUATEX\ version 2, but then we have the
problem that its binary name clashes with the original. This is why it will be
run as \typ {luametatex}. For \CONTEXT\ it's not that relevant as it will run on
both \LUATEX\ 1.10 and its lean and mean successor. I might also provide a plain
\TEX\ (read: generic) version but that is to be decided because it probably
doesn't make much sense to spend time on it. As usual we will test this within
the \CONTEXT\ beta program. The good thing is that it doesn't interact with
\LUATEX, so that other macro packages are not affected. Another side effect can
be that we uncover issues with \LUATEX\ 1.10 and that we can experiment with some
improvements that we feed back into the parent.

At the \CONTEXT\ end of this there are some plans to extend the export, maybe
improve already present \PDF\ tagging (if found useful), add some more input
(\XML) manipulations, and maybe extend (virtual) font handling a bit, now that we
no longer are bound to the currently used packet model. Contrary to what one
might expect this is not really dependent on the engine.

How do we proceed? As with the transition from \MKII\ to \MKIV, it will all
happen stepwise. This means that for a while the code base will be a bit hybrid
but at some point it might be partially split to make things cleaner, not that I
expect many fundamental differences (certainly not in the front|-|end). This
dualistic approach means more work but also ensures that we keep a working
\CONTEXT. We also need to keep an eye on, for instance, generic commands as used in
tikz: we can't drop them so we emulate them (so far with success). At the time of
this writing, early November 2018, the \CONTEXT\ test suite can be processed in
\LMTX\ mode without problems so I'm confident that it will work out ok. The next
chapter describes the results of how we did the above in more detail.

\stopsection

\stopchapter

\stopcomponent