% language=us

% Youtube: TheLucs play with Jacob Collier // Don't stop til you get enough

\startcomponent followingup-cleanup

\environment followingup-style

\logo [ALGOL]   {Algol}
\logo [FORTRAN] {FORTRAN}
\logo [SPSS]    {SPSS}
\logo [DEC]     {DEC}
\logo [VAX]     {VAX}
\logo [AMIGA]   {Amiga}

\startchapter[title={Cleanup}]

\startsection[title={Introduction}]

Original \TEX\ is a literate program, which means that code and documentation are
mixed. This mix, called a \WEB, is split into a source file and a \TEX\ file and
both parts are processed independently into a program (binary) and a typeset
document. The evolution of \TEX\ went through stages but in the end a \PASCAL\
\WEB\ file was the result. This fact has led to the more or less standard \WEBC\
compilation infrastructure which is the basis for \TEXLIVE.

% My programming experience started with programming a micro processor kit (using
% an 1802 processor), but at the university I went from \ALGOL\ to \PASCAL\ (okay,
% I also remember lots of \SPSS\ kind|-|of|-|\FORTRAN\ programming). The \PASCAL\
% was the one provided on \DEC\ and \VAX\ machines and it was a bit beyond standard
% \PASCAL. Later I did quite some programming in \MODULA 2 (for a while on an
% \AMIGA) but mostly on personal computers. The reason that I mention this is that
% it still determines the way I look at programs. For instance that code goes
% through a couple of stepwise improvements (and that it can always be done
% better). That you need to keep an eye on memory consumption (can be a nice
% challenge). That a properly formatted source code is important (at least for me).
%
% When into \PASCAL, I ran into the \TEX\ series and as it looked familiar it ended
% up on my bookshelf. However, I could not really get an idea what it was about,
% simply because I had no access to the \TEX\ program. But the magic stayed with
% me. The fact that \LUA\ resembles \PASCAL, made it a good candidate for extending
% \TEX\ (there were other reasons as well). When decades later, after using \TEX\
% in practice, I ended up looking at the source, it was the \LUATEX\ source.

So, \TEX\ is a woven program and this is also true for the starting point of
\LUATEX: \PDFTEX. But, because we wanted to open up the internals, and because
\LUA\ is written in \CCODE, already in an early stage Taco decided to start from
the \CCODE\ translated from \PASCAL. A permanent conversion was achieved using
additional scripts and the original documentation stayed in the source. The one
large file was split into more logical smaller parts and combined with snippets
from \ALEPH .

After we released version 1.0 I went through the documentation parts of the code
and normalized that a bit. The at that moment still sort of simple \WEB\ files
became regular \CCODE\ files, and the idea was (and is) that at some point it
should be possible to process the documentation (using \CONTEXT).

Over time the \CCODE\ code evolved and functions ended up in places that made
most sense at that moment. After the previously described stripping process,
I decided to go through the files and see if a bit of reshuffling made sense,
mostly because that would make documenting easier. (I'm not literate enough to
turn it into a proper literate program.) It was also a good moment to get rid of
unused code (not that much) and unused macros (some more than expected). It also
made sense to change a few names (for instance to avoid potential future clashes
with \type {lua_} core functions). However, all this takes quite some careful
checking and compilation runs, so I expect that after this first cleanup, for
quite some time stepwise improvements can happen (especially in adding comments).
\footnote {This is and will be an ongoing effort. It probably doesn't show, but
getting the code base in the state it is in now, took quite some time. It
probably won't take away complaints and nagging but I've decided no longer to pay
attention to those on the sideline.} \footnote {In the end not much \PDFTEX\ and
\ALEPH\ code is present in \LUAMETATEX , but these were useful intermediate
steps. No matter how lean \LUAMETATEX\ becomes, I have a weak spot for \PDFTEX\
as it always served us well and without it \TEX\ would be less present today.}

One of the things that I keep in mind when doing this, is that we use \LUA. This
component compiles on most relevant platforms and as such we can assume that
\LUAMETATEX\ also should (and can) be made a bit less dependent on old mechanisms
that are used in stock \LUATEX. For instance, we don't come from \PASCAL\ any
longer but there are traces of that transition still present. We also don't use
specific operating system features, and those that we use are also used in \LUA.
And, as we try to share code we can also delegate some (more) to \LUA. For
instance file related code is not dependent on other components in the \TEX\
infrastructure, but maybe at some point the runtime loadable \KPSE\ library can
kick in. So, basically the idea is to sort of go bare bone first and later see
how with the help of \LUA\ we can bring some back. For the record: this is
not needed for \CONTEXT\ as it already has this interface to \TDS. \footnote
{This has been removed from my agenda.}

\stopsection

\startsection[title={Motivation}]

The \LUATEX\ project started as an experiment of adding \LUA\ to \PDFTEX, which
was done by Hartmut and in order to avoid confusion we named it \LUATEX. When we
figured out that this had possibilities we decided to go further and Taco
took the challenge to rework the code base. Part of that work was sponsored by
Idris' Oriental \TEX\ project. I have fond memories of the intensive and rapid
development cycles: online discussions, binaries going my direction,
experimental \CONTEXT\ code going the other way. When at some point we had
reached a sort of stable state, read: usage in \CONTEXT\ had become crucial, a
steady further development started, where Taco redid \METAPOST\ into \MPLIB,
funded by user groups. At some point Luigi took over from Taco the task of
integration of components (also into \TEXLIVE), introduced \LUAJIT\ into the
binary, conducted the (again partially funded) swiglib project, followed by
support for \FFI. A while later I myself started messing around in the code base
directly and continued extending the engine and \LUA\ interfaces.

I could work on this because I have quite some freedom at the place where I work.
We use (part of) \CONTEXT\ for some projects and especially in dealing with \XML\
we could benefit from \LUATEX. It must be said that (long running) projects like
these never pay off (on the contrary, they cost a lot in terms of money and
energy) so it's quite safe to conclude that \LUATEX\ development is to a large
extent a (many man years) work of love for the subject. I guess that no sane
company will do (permit) such a thing. It is also for that reason that I keep
spending time on it, and as a simplification of the code base was always one of
my dreams, this is what I spend my time on now. After all, \LUATEX\ is just
juggling bytes and as it is written in \CCODE, and has no graphical user
interface or complex dependencies, it should be possible to have a relatively
simple setup in terms of code files and compilation. Of course this is also made
possible by the fact that I can use \LUA. It's also why I decided to
\quotation {Just do it}, and then \quotation {Let's see where I end up}. No
matter how it turns out, it makes a good vehicle for further development and
years of fun.

\stopsection

\startsection[title={Files}]

After a decade of adding and moving around code it's about time to reorganize the
code a bit, but we do so without deviating too much from the original setup. For
instance we started out with a small number of \LUA\ interface macros and these
were collected in a few files, and defined in one \type {h} file, but it made
sense to have header files alongside the libraries that implement helpers. This
is a rather tedious job but with music videos or video casts on a second screen
it is bearable.

When I reached a state where we only needed the \LUATEX\ files plus the minimal
set of libraries I tried to get rid of directories in the source tree that were
placeholders, but with \type {automake} files, like those for \PDFTEX\ and
\XETEX. After a couple of attempts I gave up on that because the build setup is
rather hard coded for checking them. Also, there were some (puzzling)
dependencies in the configuring on \OMEGA\ files as well as some \DVI\ related
tools. So, that bit is for later to sort out. \footnote {Of course later the
decision was made to forget about using \type {autotools} and go for an as simple
as possible \type {cmake} solution.}

\stopsection

\startsection[title={Command line arguments}]

As we need to set up a backend and deal with font loading in \LUA, we can
delegate some of the command line handling to \LUA\ as well. Therefore, only
a limited set of options is dealt with: those that determine the startup and \LUA\
behavior. In principle we can even get rid of all and always use a startup script
but for now it makes sense to not deviate too much from a regular \TEX\ run.

At the time of this writing some code is still in place that is a candidate for
removal. For instance, using the \type {&} to define a format file has long been
replaced by \type {--fmt}. There are sentimental reasons for keeping it but at
the same time we need to realize that shells use these special characters too. A
for me unknown (or forgotten) feature of prefixing a jobname with a \type {*}
will be removed as it makes no sense. There is some \MSWINDOWS\ specific last
resort code that probably will go too, unless I can figure out why it is needed
in the first place. \footnote {Intercepting these symbols has been dropped in
favor of the command line flags.}

Now left with a very simple set of command line options it also makes sense to
use a simple option analyzer, so that was a next step as it rids us of a
dependency and produces less code.

So, the option parser has now been replaced by a simple variant that is more in
tune with what will happen when you deal with options in \LUA: no magic. One
problem is that \TEX's first input file is moved from the command line to the
input buffer and an interactive session is emulated. As mentioned before, there
is some extra \type {&}, \type {*} and \type {\\} parsing involved. One can
wonder if this still makes sense in a situation where one has to specify a format
and \LUA\ file (using \type {--fmt} and \type {--ini}) so that might as well be
redone a bit some day. \footnote {In the end only these explicit command line
options were supported.}
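
A minimal sketch of what such a \quote {no magic} analyzer could look like when
written in \LUA\ follows here. It is not the code used in the engine: the
function name \type {parseoptions} and the example values are made up for this
illustration.

\starttyping
-- Hypothetical helper: split a list of arguments into options and files.
local function parseoptions(arguments)
    local options, files = { }, { }
    for i=1,#arguments do
        local a = arguments[i]
        -- first try the --key=value form
        local key, value = string.match(a,"^%-%-([^=]+)=(.*)$")
        if key then
            options[key] = value
        else
            -- then the plain --key form, which acts as a boolean flag
            key = string.match(a,"^%-%-(.+)$")
            if key then
                options[key] = true
            else
                -- anything else is taken to be a filename
                files[#files+1] = a
            end
        end
    end
    return options, files
end

-- the values used here are only examples
local options, files = parseoptions { "--fmt=foo", "--ini", "test.tex" }
\stoptyping

Everything that starts with two dashes is an option, optionally with a value
after the equal sign, and whatever remains is treated as a filename. There is
no special treatment of \type {&}, \type {*} or \type {\\} at all.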

\stopsection

\startsection[title={Platforms}]

When going through the code I noticed conditional sections for long obsolete
platforms: \type {amiga}, \type {dos} and \type {djgpp}, \type {os/2}, \type
{aix}, \type {solaris}, etc. Also, with 64 bit becoming the standard, it makes
sense to assume that users will use a modern 64 bit platform (intel or arm combined
with \MSWINDOWS\ or some popular \UNIX\ variant). We don't need large and complex
code management for obscure platforms and architectures simply because we want to
prove that \LUAMETATEX\ runs everywhere. With respect to \MSWINDOWS\ we use a
cross compiler (\type {mingw}) as reference but native compilation should be no
big deal eventually. We can cross that bridge when we have a simplified
compilation set up. Right now it doesn't make sense to waste time on a native
\MICROSOFT\ compilation as it would also pollute the code with conditional
sections. We'll see what happens when I'm bored. \footnote {In the meantime no
effort is made to let the source compile otherwise than with the cross compiler.
Best is to keep the code as clean as possible with respect to conditional code
sections. So don't bother me with patches.}

\stopsection

\startsection[title={Stubs}]

A \CONTEXT\ run is managed by \MTXRUN\ in combination with a specific script:

\starttyping
mtxrun --script context
\stoptyping

On \MSWINDOWS, we use a stub because using a \type {cmd} file creates an
indirection that is not seen as executable and therefore in other command files
needs to be called in a special way to guarantee continuation. So, there we have
a small binary:

\starttyping
mtxrun.exe ...
\stoptyping

that will call:

\starttyping
luatex --luaonly mtxrun.lua ...
\stoptyping

And when the stub has a different name than \type {mtxrun}, say:

\starttyping
context.exe ...
\stoptyping

it effectively becomes:

\starttyping
luatex --luaonly mtxrun.lua --script context ...
\stoptyping

Because the stripped down version assumes some kind of initialization anyway, a
small extension made it possible to use \LUAMETATEX\ as stub too. So, when we
rename \type {luametatex.exe} to \type {mtxrun.exe} (on \UNIX\ we don't use a
suffix) it will start up as \LUA\ interpreter when it finds a script with the
name \type {mtxrun.lua} in the same path. When we rename it to \type
{context.exe} it will search for \type {context.lua} and all that that script has
to do is this:

\starttyping
arg[0] = "mtxrun"

table.insert(arg,1,"mtx-context")
table.insert(arg,1,"--script")

dofile(os.selfpath .. "/" .. "mtxrun.lua")
\stoptyping

So, it basically becomes a call to \type {mtxrun}, but we stay in \LUAMETATEX.
Because we want an isolated run this will launch \LUAMETATEX\ again with the
right command line arguments. This sounds inefficient but because we have a small
binary this is no real issue, and as that run is isolated, it cannot influence
the caller. The overhead is really small: on my somewhat older laptop it's 0.2
seconds, but we had that management overhead already for decades, so no one
bothers about it. On all platforms using symbolic links works ok too.
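
The name based lookup described above can be illustrated with a few lines of
\LUA. This is only a sketch under assumptions: the helper \type {stubscript} is
hypothetical and the real lookup is done by the binary itself.

\starttyping
-- Hypothetical helper: map the name of the running binary onto a script name.
local function stubscript(program)
    -- strip the directory part, accepting both kinds of separators
    local name = string.match(program,"([^/\\]+)$") or program
    -- strip an optional suffix like .exe
    name = string.gsub(name,"%.[^.]*$","")
    return name .. ".lua"
end

-- the paths used here are only examples
local first  = stubscript("c:/context/bin/context.exe")
local second = stubscript("/opt/context/bin/mtxrun")
\stoptyping

In this sketch a binary named \type {context.exe} ends up looking for \type
{context.lua} and one named \type {mtxrun} for \type {mtxrun.lua}, which is
the behaviour that makes the renamed engine usable as a stub.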

\stopsection

\startsection[title={Global variables}]

There are quite a few global variables and functions in the code base, but in the
process of opening up I got rid of some. The cleanup turned some more into
locals which saved executable bytes (keep in mind that we also use the engine as
\LUA\ interpreter so, the smaller, the more friendly). \footnote {Later the
global variables were collected in so called \CCODE\ structs.} This is work
in progress.

\stopsection

\startsection[title={Memory usage}]

By going over all the code a couple of times, I was able to decrease the amount
of used memory a bit as well as avoid some memory allocations. This has no
consequences for performance but is nicer when multiple runs at the same time
(e.g.\ on virtual machines) have to compete for resources. \footnote {I will
probably have to spend some more time on this in order to reach a state that I'm
satisfied with.}

\stopsection

\startsection[title={\METAPOST}]

The current code base doesn't have that many files. We can imagine that, when
\LUA\ can be compiled on a platform, compiling \LUAMETATEX\ is also not that
complicated. However, the rather complex build infrastructure demonstrates the
opposite. One of the complications is that \MPLIB\ is coded in \CWEB\ and that
needs some juggling to get \CCODE. The process has quite some dependencies. There
are some upstream patches needed, but for now occasionally checking with the
upstream sources used for compiling \MPLIB\ in \LUATEX\ works okay. \footnote
{Later I decided to clean up the \MPLIB\ code: unused font related code was
removed, the \POSTSCRIPT\ backend was untangled, the translation from \CWEB\ to
\CCODE\ got done by a \LUA\ script, aspects like error reporting and \IO\ were
redone, and in the end some new extensions were added. Some of that might trickle
back to the original, as long as it doesn't harm compatibility; after all
\METAPOST\ (the program) is standardized and considered functionally stable.}

As \LUAMETATEX\ is also used for experiments we use a copy of the \LUA\ library
interface. That way we don't interfere with the stable \LUATEX\ situation. When
we play with extensions, we can always decide to backport them, once they are
found useful and in good working order. But, as that interface was just \CCODE\
this was trivial.

\stopsection

\startsection[title={Files}]

In a relatively late stage I decided to clean up some of the filename handling.
First I got rid of the \type {area}, \type {name} and \type {ext} decomposition
and optional recomposition. In the original engine that goes through the string
pool and although there is some recovery in the end, with many files and fonts
being used, the pool can get exhausted. For instance when you have hundreds of
thousands of \typ {\font \foo = bar} kind of definitions, each definition wipes
out the previous entry in the hash, but its font name is kept in the string pool.
I got rid of that side effect by reusing strings but in the end decided to avoid
the pool altogether. It was then a small step to also do that for other
filenames. In the process I also decided that it made no sense to keep the code
around that reads a filename from the console: we now just quit. Restarting the
program with a proper filename is no big deal today. I might do some more cleanup
there. In the end we can best use a callback for handling input from the console.

\stopsection

\stopchapter

\stopcomponent