% language=us

%  2,777,600 / 11,561,471 cont-en.fmt

% Hooverphonic - Live at the Ancienne Belgique (Geike Arnaert)

\startcomponent followingup-stripping

\environment followingup-style

\startchapter[title={Stripping}]

\startsection[title={Introduction}]

Normally I need a couple of iterations to reach the implementation that I like
(an average of three rewrites is rather normal). So, I sat down and started
stripping the engine and did so a few times in order to get an idea of how to
proceed. One drawback of going public too soon (and we ran into that with
\LUATEX) is that as soon as there are more users, one gets stuck in a
situation where a different approach is not really possible. This is why from now
on experimental is really experimental, even if that means: it works ok in
\CONTEXT\ (even for production) but we can still change interfaces to be better, e.g.\
more consistent (although we're also stuck with existing \TEX\ terminology).
Anyway, let's proceed.

\stopsection

\startsection[title={The binary}]

In 2014 the \LUATEX\ binary was some 10.9 MB large. The version 1.09 binary of
October 2018 was about 6.8 MB, and the reduction was due to removing the bitmap
generation from \MPLIB\ as well as replacing poppler by pplib. As an exercise I
decided to see how easy it was to make a small version suitable for \CONTEXT\
\LMTX, and as expected the binary shrank to below 3 MB (plus a \LUA\ and \KPSE\
dll). This is a reasonable size given what is still present.

There is hardly any file related code left because in practice it was the backend
that used most of the different file types. That also meant that we could remove \KPSE\
related code and keep all that in the library part. In principle one can load
that library and hook it into the few callbacks that relate to loading files.
Once we're stable I'll probably write some code for that. \footnote {In the
meantime I think it doesn't make much sense to do that.} Launching the binary with a
startup script can deal with all matters needed, because the command line
arguments are available.

We could actually go even smaller by removing the built|-|in \TFM\ and \VF\
readers. For instance, it did not make much sense to read and store information that
is never used anyway, like virtual font data: as long as the backend has access
to what it needs it's fine. By removing unused code, stripping no longer used
fields in the internal font tables (which is also good for memory consumption),
and cleaning up a bit here and there, the experimental binary ended up at a bit
above 2.5 MB (plus a \LUA\ dll). \footnote {Mid January we were just below 2.7 MB
with a static, all inclusive, binary. In March the static ended up at 2.9 MB on
\MSWINDOWS\ and 2.6 MB on \UNIX.}

\stopsection

\startsection[title={Functionality}]

There is no real reason to change much in the functionality of the frontend but
as we have no backend now, some primitives are gone. These have to be implemented
as part of creating a backend.

\starttyping
\dviextension \dvivariable \dvifeedback
\pdfextension \pdfvariable \pdffeedback
\stoptyping

The already obsolete related dimensions are also removed:

\starttyping
\pageleftoffset \pagerightoffset
\pagetopoffset  \pagebottomoffset
\stoptyping

And we no longer need the page dimensions because they are just registers that
are normally used in the backend. So, we got rid of:

\starttyping
\pageheight
\pagewidth
\stoptyping

Some font related inheritances from \PDFTEX\ have also been dropped:

\starttyping
\letterspacefont
\copyfont
\expandglyphsinfont
\ignoreligaturesinfont
\tagcode
\stoptyping

Internally all backend whatsits are gone, but generic \type {literal}, \type
{save}, \type {restore} and \type {setmatrix} nodes can still be created. We are
considering turning them into so called user nodes, but for testing it made
sense to keep them around for a while. \footnote {Don't take this as a reference:
later we will see that more was changed.}

The resource related primitives are backend dependent so they have been
removed. As with other backend related primitives, their arguments depend on the
implementation. So, no more:

\starttyping
\saveboxresource
\useboxresource
\lastsavedboxresourceindex
\stoptyping

and:

\starttyping
\saveimageresource
\useimageresource
\lastsavedimageresourceindex
\lastsavedimageresourcepages
\stoptyping

Of course the rule node subtypes are still there, so the typesetting machinery
will handle them fine. It is no big deal to define a pseudo|-|primitive that
provides the functionality at the \TEX\ level.

The position related primitives are also backend dependent so again they were
removed. \footnote {There was some sentimental element in this. Long ago, even
before \PDFTEX\ showed up, \CONTEXT\ already had a positional mechanism. It
worked by using specials in combination with a program that calculated the
positions from the \DVI\ file. At some point that functionality was integrated
into \PDFTEX. For me it always was a nice example of demonstrating that
complaints like \quotation {\TEX\ is limited because we don't know the position
of an element in the text.} make no sense: \TEX\ can do more than one thinks,
given that one thinks the right way.}

\starttyping
\savepos
\lastxpos
\lastypos
\stoptyping

We could have kept \type {\savepos} but it is better to be consistent. We no longer
need these either:

\starttyping
\outputmode
\draftmode
\synctex
\stoptyping

These could go because we no longer have a backend, and if one needs them it's easy
to define a meaningful variable and listen to that.
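
Such a variable can be as simple as a counter that macros consult. The following
is only a minimal sketch: the names \type {\MyOutputMode} and \type
{\MyIfPdfOutput} are hypothetical and not part of the engine or of \CONTEXT, and
the meaning of the values is an assumption made for this example.

\starttyping
% hypothetical stand-in for the removed \outputmode primitive
\newcount\MyOutputMode
\MyOutputMode=1 % assumption: a positive value means that pdf output is wanted

% #1: code used when pdf output is wanted, #2: code used otherwise
\def\MyIfPdfOutput#1#2%
  {\ifnum\MyOutputMode>0 #1\else #2\fi}

\MyIfPdfOutput{\message{pdf backend}}{\message{no backend}}
\stoptyping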

The \type {\shipout} primitive does not ship out but just flushes the content of
the box, if that hasn't happened already.

Because we have \LUA\ on board, and because we can now use the token scanners to
implement features, we no longer need the hard coded randomizer extensions. In
fact, \METAPOST\ should now also use the \LUA\ randomizer, so that we are
consistent. Anyway, removed are:

\starttyping
\randomseed
\setrandomseed
\normaldeviate
\uniformdeviate
\stoptyping

plus the helpers in the \type {tex} library.
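
A replacement at the macro level could be sketched as follows. This assumes
nothing more than \type {\directlua}, \type {tex.print} and the standard \LUA\
functions \type {math.random} and \type {math.randomseed}. The macro name \type
{\MyUniformDeviate} is made up for this example and, unlike the removed
primitive, it takes a braced argument.

\starttyping
% hypothetical stand-in for \uniformdeviate: an integer in the range [0,#1)
\def\MyUniformDeviate#1%
  {\directlua{tex.print(math.random(0,(#1)-1))}}

% plays the role of \setrandomseed
\directlua{math.randomseed(1234)}

A random number below 100: \MyUniformDeviate{100}
\stoptyping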

\stopsection

\startsection[title={Fonts}]

Fonts are sort of special. We need the data at the \LUA\ end in order to process
\OPENTYPE\ fonts and the backend code needs the virtual commands. The par builder
also needs to access font properties, as does the math renderer, but there is no
real reason to carry virtual font information around (which involves packing and
unpacking virtual packets). So, in the end it made sense to delegate
the \TFM\ and \VF\ loading to \LUA\ as well. As a consequence dumping and
undumping font information could go away too, which is okay, as we didn't preload
fonts in \CONTEXT\ anyway. The saving in binary bytes is not impressive but
neither is keeping unused code around. In principle we can get rid of the internal
representation if we fetch relevant data from the \LUA\ tables but that might be
unwise from the perspective of performance. By removing the no longer needed
fields the memory footprint became somewhat smaller and font loading (passing
from \LUA\ to \TEX) more efficient.

\stopsection

\startsection[title={File IO}]

What came next? A program like \LUATEX\ interacts with its environment and one of
the nice things about \TEX\ is that it has a standard ecosystem, organized as the
\quotation {\TEX\ Directory Structure}. There is a library that interfaces with
this structure: \KPSE, but in \CONTEXT\ \MKIV\ we implement its functionality in
\LUA. The primary reason for this was performance. When we started with \LUATEX\
the startup on my machine (\MSWINDOWS) and a few servers (\LINUX) of a \TEX\
engine took seconds and most of that was due to loading the rather large file
databases, because a \TEX\ Live installation was a gigabyte adventure. With the
\LUA\ variant I could bring that down to milliseconds, because I could pre|-|hash
the database and limit it to files relevant for \CONTEXT\ (still a lot, as fonts
made up most). Nowadays we have \SSD\ disks and plenty of memory for caching, so
these things are less urgent, but on network shares it still matters.

So, as we don't use \KPSE, we can remove that library. By doing that we simplify
compilation a lot as then all dependencies are in the engine's source tree, and
we're no longer dependent on updates. One can argue that we then sacrifice too
much, but already for a decade we don't use it and the \LUA\ variant does the job
well within the \TDS\ ecosystem. Also, in our by now stripped down engine, there
is not that much lookup going on anyway: we're already in \LUA\ when we do fonts.
But on the other hand, some generic usage could benefit from the library being
present, so we face a choice. The choice is made even more difficult by the fact
that we can remove all kinds of tweaks once we delegate for instance control over
command execution to \LUA\ completely. But, we might provide \KPSE\ as a loadable
\LUA\ module so that when needed one can use a stub to start the program with a
\LUA\ script that as its first action loads this library, which can then take care of
further file management. As command line arguments are available in \LUA, one can
also implement the relevant extra switches (and even more if needed).

Now, the interesting thing is that because we have a \LUA\ interface to \KPSE\ we
can actually drop some hard coded solutions. This means that we can have a binary
without \KPSE, in which case one has to cook up callbacks that do what this
library does. But in a version with \KPSE\ embedded one also has to define some
file related callbacks although they can be rather simple. By keeping a handful
of file related callbacks the code base could be simplified a lot. In the process
the recorder option went away (not that we ever used it). It is relatively easy
to support this in the \quote {find} related callbacks and one has to deal with
other files (like images and fonts) also, so keeping this feature was a cheat
anyway.
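
A rather simple file callback of this kind could look like the sketch below. It
rests on several assumptions that are not confirmed here: a \LUATEX\ style \type
{callback.register}, an initialized \type {kpse} library that provides \type
{find_file}, and a \type {find_data_file} callback that receives the requested
name and returns the resolved one. The actual conventions in a given binary might
differ.

\starttyping
% assumptions: callback.register and an initialized kpse library are available
\directlua {
    callback.register("find_data_file", function(name)
        return kpse.find_file(name,"tex") % nil when nothing is found
    end)
}
\stoptyping

The comments are \TEX\ comments on purpose: inside \type {\directlua} the lines
are joined, so a \LUA\ comment would hide everything that follows it.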

At this point it is important to notice that while we're dropping some command
line options, they can still be passed and intercepted at the \LUA\ end. So,
providing a compatible (or alternative) solution is no big deal. For instance,
execution of (shell) programs is a \LUA\ activity and can be managed from there.

\stopsection

\startsection[title={Callbacks}]

Callbacks can be organized in groups. First there are those related to
\IO. We only have to deal with a few types: all kinds of \TEX\ files (data
files), format files and \LUA\ modules (but these too are on the list of
potentially dropped files as this can be programmed in \LUA).

\starttyping
find_write_file
find_data_file open_data_file read_data_file
find_format_file find_lua_file find_clua_file
\stoptyping

The callbacks related to errors stay: \footnote {Some more error handling was
added later, as was intercepting user input related to it.}

\starttyping
show_error_hook show_lua_error_hook
show_error_message show_warning_message
\stoptyping

% We kept the buffer handlers but dropped the output handler later anyway, so we
% have left:
%
% \starttyping
% process_input_buffer
% \stoptyping

The management hooks were kept (but the edit one might go): \footnote {And
indeed, that one went away.}

\starttyping
process_jobname
call_edit
start_run stop_run wrapup_run
pre_dump
start_file stop_file
\stoptyping

Of course the typesetting callbacks remain too as they are the backbone of the
opening up:

\starttyping
buildpage_filter hpack_filter vpack_filter
hyphenate ligaturing kerning
pre_output_filter contribute_filter build_page_insert
pre_linebreak_filter linebreak_filter post_linebreak_filter
insert_local_par append_to_vlist_filter new_graf
hpack_quality vpack_quality
mlist_to_hlist make_extensible
\stoptyping

Finally we mention one of the important callbacks:

\starttyping
define_font
\stoptyping

Without that one defined not much will happen with respect to typesetting. I
could actually remove the \type {\font} primitive but that would be a bit weird
as other font related commands stay. Also, it's one of the fundamental frontend
primitives, so removal was never really considered.

\stopsection

\startsection[title={Bits and pieces}]

In the process some helpers and status queries were removed. From the summary
above you can deduce that this concerns images, backend, and file management.
Unused variables (some inherited from the past and predecessors) were also
removed. These and other changes are the reason why there is a separate manual
for \LUAMETATEX. \footnote {Relatively late in the project I decided to be more
selective in what got initialized in \LUA\ only mode.}

One of my objectives was to see how lean and mean the code base could be. But
even if we don't use that many files, the rather complex build system means that
we need to have (make and configure) files in the tree that are not really used,
because omitting them aborts a build. I played a bit with that but the
problem is that it needs to be dealt with upstream in order to prevent repetitive
work. So, this is something to sort out later. Eventually it would be nice to be
able to compile with a minimal set of source files, also because other programs
(all kinds of \TEX\ variants) that are checked for but not compiled depend on
libraries that we don't need (and therefore don't want) to have in the stripped down
source tree. \footnote {In the end, the source tree was redesigned completely.}

For now we also brought down the number of catcode tables (to 256) \footnote {As
with math families, and if more tables are needed one should wonder about the
\TEX\ code used.}, and the number of languages (to 8192) \footnote {This is
already a lot and because languages are loaded at run time, we can go much lower
than this.} as that saves some initially allocated memory.

\stopsection

\startsection[title={What's next}]

Basically the experiment ends here. A next step is to create a stable code base,
make compilation easy and consider the way the code is packaged. Then some
cleanup can take place. Also, as it's a window to the outside world, \type {ffi}
support will move to the code base and be integral to \LUAMETATEX. And of course
the decision about \LUAJIT\ support has to be made some day soon. The same is
true for \LUA\ 5.4: in \LUATEX\ for now we stick to 5.3 but experimenting with
5.4 in \LUAMETATEX\ can't harm us. \footnote {The choice has been made:
\LUAMETATEX\ will not have a \LUAJIT\ based companion.}

To what extent the \CONTEXT\ code base will have special files for \LMTX\ is
yet to be decided, but we have some ideas about new features that might make that
desirable from the perspective of maintenance. The main question is: do I want to
have hybrid files or clean files for each variant (stock \MKIV\ and \LMTX)?

For the record: at the time of wrapping this up, processing the \LUATEX\ manual
of 294 pages took 13.5 seconds using stock \LUATEX\ while using the stripped down
binary, where \LUA\ takes over some tasks, took 13.9 seconds. \footnote {In the
meantime we're down to around 11.6 seconds. These are all rough numbers and mostly
indicate relative speeds at some point.} The \LUAJITTEX\ variant needed 10.9 and
10.8 seconds. So, there is no real reason to not explore this route, although
\unknown\ the \PDF\ file size shrinks from 1.48MB to 1.18MB (and optionally we
can squeeze out more) but one can wonder if I didn't make big mistakes. It is
good to realize that there is not much performance to gain in the engine simply
because most code is already pretty well optimized. The same is true for the
\CONTEXT\ code: there might be a few places where we can squeeze out a few
milliseconds but probably it will go unnoticed.

On the todo list went removal of \type {\primitive} which we never use (need) and
the possible introduction of a way to protect primitives and macros against
redefinition, but on the other hand, it might impact performance and be not worth
the trouble. In the end it is a macro package issue anyway and we never really
ran into users redefining primitives. \footnote {Indeed this primitive has been
removed.}

\stopsection

\stopchapter

\stopcomponent