From f72c2cf29d36ae836c894bad29dfd965d1af0236 Mon Sep 17 00:00:00 2001 From: Hans Hagen Date: Sun, 18 Aug 2019 22:51:53 +0200 Subject: 2019-08-18 22:26:00 --- .../manuals/followingup/followingup-cleanup.tex | 332 +++++++++++++++++++++ 1 file changed, 332 insertions(+) create mode 100644 doc/context/sources/general/manuals/followingup/followingup-cleanup.tex (limited to 'doc/context/sources/general/manuals/followingup/followingup-cleanup.tex') diff --git a/doc/context/sources/general/manuals/followingup/followingup-cleanup.tex b/doc/context/sources/general/manuals/followingup/followingup-cleanup.tex new file mode 100644 index 000000000..7dcb3b3b1 --- /dev/null +++ b/doc/context/sources/general/manuals/followingup/followingup-cleanup.tex @@ -0,0 +1,332 @@ +% language=us + +% Youtube: TheLucs play with Jacob Collier // Don't stop til you get enough + +\startcomponent followingup-cleanup + +\environment followingup-style + +\logo [ALGOL] {Algol} +\logo [FORTRAN] {FORTRAN} +\logo [SPSS] {SPSS} +\logo [DEC] {DEC} +\logo [VAX] {VAX} +\logo [AMIGA] {Amiga} + +\startchapter[title={Cleanup}] + +\startsection[title={Introduction}] + +Original \TEX\ is a literate program, which means that code and documentation are +mixed. This mix, called a \WEB, is split into a source file and a \TEX\ file and +both parts are processed independently into a program (binary) and a typeset +document. The evolution of \TEX\ went through stages but in the end a \PASCAL\ +\WEB\ file was the result. This fact has lead to the more or less standard \WEBC\ +compilation infrastructure which is the basis for \TEXLIVE. + +% My programming experience started with programming a micro processor kit (using +% an 1802 processor), but at the university I went from \ALGOL\ to \PASCAL\ (okay, +% I also remember lots of \SPSS\ kind|-|of|-|\FORTRAN\ programming. The \PASCAL\ +% was the one provided on \DEC\ and \VAX\ machines and it was a bit beyond standard +% \PASCAL. Later I did quite some programming in \MODULA 2 in (for a while an +% \AMIGA) but mostly on personal computers. The reason that I mention this it that +% it still determines the way I look at programs. For instance that code goes +% through a couple if stepwise improvements (and that it can always be done +% better). That you need to keep an eye on memory consumption (can be a nice +% challenge). That a properly formatted source code is important (at least for me). +% +% When into \PASCAL, I ran into the \TEX\ series and as it looked familiar it ended +% up on my bookshelf. However, I could not really get an idea what it was about, +% simply because I had no access to the \TEX\ program. But the magic stayed with +% me. The fact that \LUA\ resembles \PASCAL, made it a good candidate for extending +% \TEX\ (there were other reasons as well). When decades later, after using \TEX\ +% in practice, I ended up looking at the source, it was the \LUATEX\ source. + +So, \TEX\ is a woven program and this is also true for the starting point of +\LUATEX: \PDFTEX. But, because we wanted to open up the internals, and because +\LUA\ is written in \CCODE, already in an early stage Taco decided to start from +the \CCODE\ translated from \PASCAL. A permanent conversion was achieved using +additional scripts and the original documentation stayed in the source. The one +large file was split into more logical smaller parts and combined with snippets +from \ALEPH . + +After we released version 1.0 I went through the documentation parts of the code +and normalized that a bit. The at that moment still sort of simple \WEB\ files +became regular \CCODE\ files, and the idea was (and is) that at some point it +should be possible to process the documentation (using \CONTEXT). + +Over time the \CCODE\ code evolved and functions ended up in places that at that +made most sense at that moment. After the previously described stripping process, +I decided to go through the files and see if a bit of reshuffling made sense, +mostly because that would make documenting easier. (I'm not literate enough to +turn it into a proper literate program.) It was also a good moment to get rid of +unused code (not that much) and unused macros (some more than expected). It also +made sense to change a few names (for instance to avoid potential future clashes +with \type {lua_} core functions). However, all this takes quite some careful +checking and compilation runs, so I expect that after this first cleanup, for +quite some time stepwise improvements can happen (especially in adding comments). +\footnote {This is and will be an ongoing effort. It probably doesn't show, but +getting the code base in the state it is in now, took quite some time. It +probably won't take away complaints and nagging but I've decided no longer to pay +attention to those on the sideline.} \footnote {In the end not much \PDFTEX\ and +\ALEPH\ code is present in \LUAMETATEX , but these were useful intermediate +steps. No matter how lean \LUAMETATEX\ becomes, I have a weak spot for \PDFTEX\ +as it always served us well and without it \TEX\ would be less present today.} + +One of the things that I keep in mind when doing this, is that we use \LUA. This +component compiles on most relevant platforms and as such we can assume that +\LUAMETATEX\ also should (and can be) made a bit less dependent on old mechanisms +that are used in stock \LUATEX. For instance, we don't come from \PASCAL\ any +longer but there are traces of that transition still present. We also don't use +specific operating system features, and those that we use are also used in \LUA. +And, as we try to share code we can also delegate some (more) to \LUA. For +instance file related code is not dependent on other components in the \TEX\ +infrastructure, but maybe at some point the runtime loadable \KPSE\ library can +kick in. So, basically the idea is to sort of go bare bone first and later see +how with the help of \LUA\ we can get bring some back. For the record: this is +not needed for \CONTEXT\ as it already has this interface to \TDS. \footnote +{This has been removed from my agenda.} + +\stopsection + +\startsection[title={Motivation}] + +The \LUATEX\ project started as an experiment of adding \LUA\ to \PDFTEX, which +was done by Hartmut and in order to avoid confusion we named it \LUATEX. When we +figured out that there this had possibilities we decided to go further and Taco +took the challenge to rework the code base. Part of that work was sponsored by +Idris' Oriental \TEX\ project. I have fond memory of the intensive and rapid +development cycles: online discussions, binaries going my directions, +experimental \CONTEXT\ code going the other way. When we had reached a sort of +stable state but at some point, read: usage in \CONTEXT\ had become crucial, a +steady further development started, where Taco redid \METAPOST\ into \MPLIB, +funded by user groups. At some point Luigi took over from Taco the task of +integration of components (also into \TEX Live), introduced \LUAJIT\ into the +binary, conducted the (again partially funded) swiglib project, followed by +support for \FFI. A while later I myself started messing around in the code base +directly and continued extending the engine and \LUA\ interfaces. + +I could work on this because I have quite some freedom at the place where I work. +We use (part of) \CONTEXT\ for some projects and especially in dealing with \XML\ +we could benefit from \LUATEX. It must be said that (long running) projects like +these never pay off (on the contrary, they cost a lot in terms of money and +energy) so it's quite safe to conclude that \LUATEX\ development is to a large +extend a (many man years) work of love for the subject. I guess that no sane +company will do (permit) such a thing. It is also for that reason that I keep +spending time on it, and as a simplification of the code base was always one of +my dreams, this is what I spend my time on now. After all, \LUATEX\ is just +juggling bytes and as it is written in \CCODE, and has no graphical user +interface or complex dependencies, it should be possible to have a relative +simple setup in terms of code files and compilation. Of course this is also made +possible by the fact that I can use \LUA. It's also why I decided to +\quotation {Just do it}, and then \quotation {Let's see where I end up}. No +matter how it turns out, it makes a good vehicle for further development and +years of fun. + +\stopsection + +\startsection[title={Files}] + +After a decade of adding and moving around code it's about time to reorganize the +code a bit, but we do so without deviating too much from the original setup. For +instance we started out with a small number of \LUA\ interface macros and these +were collected in a few files, and defined in one \type {h} file, but it made +sense to have header files alongside the libraries that implement helpers. This +is a rather tedious job but with music videos or video casts on a second screen +it is bearable. + +When I reached a state where we only needed the \LUATEX\ files plus the minimal +set of libraries I tried to get rid of directories in the source tree that were +placeholders, but with \type {automake} files, like those for \PDFTEX\ and +\XETEX. After a couple of attempts I gave up on that because the build setup is +rather hard coded for checking them. Also, there were some (puzzling) +dependencies in the configuring on \OMEGA\ files as well as some \DVI\ related +tools. So, that bit is for later to sort out. \footnote {Of course later the +decision was made to forget about using \type {autotools} and go for an as simple +as possible \type {cmake} solution.} + +\stopsection + +\startsection[title={Command line arguments}] + +As we need to set up a backend and deal with font loading in \LUA, we can as well +delegate some of the command line handling to \LUA\ as well. Therefore, only the +a limited set of options is dealt with: those that determine the startup and \LUA\ +behavior. In principle we can even get rid of all and always use a startup script +but for now it makes sense to not deviate too much from a regular \TEX\ run. + +At the time of this writing some code is still in place that is a candidate for +removal. For instance, using the \type {&} to define a format file has long be +replaced by \type {--fmt}. There are sentimental reasons for keeping it but at +the same time we need to realize that shells use these special characters too. A +for me unknown (or forgotten) feature of prefixing a jobname with a \type {*} +will be removed as it makes no sense. There is some \MSWINDOWS\ specific last +resort code that probably will go too, unless I can figure out why it is needed +in the first place. \footnote {Intercepting these symbols has been dropped in +favor of the command line flags.} + +Now left with a very simple set of command line options it also makes sense to +use a simple option analyzer, so that was a next step as it rid us of a +dependency and produces less code. + +So, the option parser has now been replaced by a simple variant that is more in +tune with what will happen when you deal with options in \LUA: no magic. One +problem is that \TEX's first input file is moved from the command line to the +input buffer and a an interactive session is emulated. As mentioned before, there +is some extra \type {&}, \type {*} and \type {\\} parsing involved. One can +wonder if this still makes sense in a situation where one has to specify a format +and \LUA\ file (using \type {--fmt} and \type {--ini}) so that might as well be +redone a bit some day. \footnote {In the end only these explicit command line +options were supported.} + +\stopsection + +\startsection[title={Platforms}] + +When going through the code I noticed conditional sections for long obsolete +platforms: \type {amiga}, \type {dos} and \type {djgpp}, \type {os/2}, \type +{aix}, \type {solaris}, etc. Also, with 64 bit becoming the standard, it makes +sense to assume that users will use a modern 64 platform (intel or arm combined +with \MSWINDOWS\ or some popular \UNIX\ variant). We don't need large and complex +code management for obscure platforms and architectures simply because we want to +proof that \LUAMETATEX\ runs everywhere. With respect to \MSWINDOWS\ we use a +cross compiler (\type {mingw}) as reference but native compilation should be no +big deal eventually. We can cross that bridge when we have a simplified +compilation set up. Right now it doesn't make sense to waste time on a native +\MICROSOFT\ compilation as it would also pollute the code with conditional +sections. We'll see what happens when I'm bored. \footnote {In the meantime no +effort is made to let the source compile otherwise than with the cross compiler. +Best is to keep the code as clean as possible with respect to conditional code +sections. So don't bother me with patches.} + +\stopsection + +\startsection[title={Stubs}] + +A \CONTEXT\ run is managed by \MTXRUN\ in combination with a specific script + +\starttyping +mtxrun --script context +\stoptyping + +On windows, we use a stub because using a \type {cmd} file create an indirectness +that is not seen as executable and therefore in other command files needs to +be called in a special way to guarantee continuation. So, there we have a small +binary: + +\starttyping +mtxrun.exe ... +\stoptyping + +that will call: + +\starttyping +luatex --luaonly mtxrun.lua ... +\stoptyping + +And when the stub has a different name than \type {mtxrun}, say: + +\starttyping +context.exe ... +\stoptyping + +it effectively becomes: + +\starttyping +luatex --luaonly mtxrun.lua --script context ... +\stoptyping + +Because the stripped down version assumes some kind of initializations anyway a +small extension made it possible to use \LUAMETATEX\ as stub too. So, when we +rename \type {luametatex.exe} to \type {mtxrun.exe} (on \UNIX\ we don't use a +suffix) it will start up as \LUA\ interpreter when it finds a script with the +name \type {mtxrun.lua} in the same path. When we rename it to \type +{context.exe} it will search for \type {context.lua} and all that that script has +to do is this: + +\starttyping +arg[0] = "mtxrun" + +table.insert(arg,1,"mtx-context") +table.insert(arg,1,"--script") + +dofile(os.selfpath .. "/" .. "mtxrun.lua") +\stoptyping + +So, it basically becomes a call to \type {mtxrun}, but we stay in \LUAMETATEX. +Because we want an isolated run this will launch \LUAMETATEX\ again with the +right command line arguments. This sounds inefficient but because we have a small +binary this is no real issue, and as that run is isolated, it cannot influence +the caller. The overhead is really small: on my somewhat older laptop it's .2 +seconds, but we had that management overhead already for decades, so no one +bothers about it. On all platforms using symbolic links works ok too. + +\stopsection + +\startsection[title={Global variables}] + +There are quite a bit global variables and function in the code base, but in the +process of opening up I got rid of some. The cleanup turned some more into +locals which saved executable bytes (keep in mind that we also use the engine as +\LUA\ interpreter so, the smaller, the more friendly). \footnote {Later the +global variables were collected in so called \CCODE\ structs.} This is work +in progress. + +\stopsection + +\startsection[title={Memory usage}] + +By going over all the code a couple of times, I was able to decrease the amount +of used memory a bit as well as avoid some memory allocations. This has no +consequences for performance but is nicer when multiple runs at the same time +(e.g.\ on virtual machines) have to compete for resources. \footnote {I will +probably have to spend some more time on this in order to reach a state that I'm +satisfied with.} + +\stopsection + +\startsection[title={\METAPOST}] + +The current code base doesn't have that many files. We can imagine that, when +\LUA\ can be compiled on a platform, that compiling \LUAMETATEX\ is also no that +complicated. However, the rather complex build infrastructure demonstrates the +opposite. One of the complications is that \MPLIB\ is codes in \CWEB\ and that +needs some juggling to get \CCODE. The process has quite some dependencies. There +are some upstream patches needed, but for now occasionally checking with the +upstream sources used for compiling \MPLIB\ in \LUATEX\ works okay. \footnote +{Later I decided to cleanup the \MPLIB\ code: unused font related code was +removed, the \POSTSCRIPT\ backend was untangled, the translation from \CWEB\ to +\CCODE\ got done by a \LUA\ script, aspects like error reporting and \IO\ were +redone, and in the end some new extensions were added. Some of that might trickle +back to th original, as long as it doesn't harm compatibility; after all +\METAPOST\ (the program) is standardized and considered functionally stable.} + +As \LUAMETATEX\ is also used for experiments we use a copy of the \LUA\ library +interface. That way we don't interfere with the stable \LUATEX\ situation. When +we play with extensions, we can always decide to backport them, once they are +found useful and in good working order. But, as that interface was just \CCODE\ +this was trivial. + +\stopsection + +\startsection[title={Files}] + +In a relative late stage I decided to cleanup some of the filename handling. +First I got rid of the \type {area}, \type {name} and \type {ext} decomposition +and optional recomposition. In the original engine that goes through the string +pool and although there is some recovery in the end, with many files and fonts +being used, the pool can get exhausted. For instance when you have hundreds of +thousands of \typ {\font \foo = bar} kind of definitions, each definition wipes +out the previous entry in the hash, but its font name is kept in the string pool. +I got rid of that side effect by reusing strings but in the end decided to avoid +the pool altogether. It was then a small step to also do that for other +filenames. In the process I also decided that it made no sense to keep the code +around that reads a filename from the console: we now just quit. Restarting the +program with a proper filename is no big deal today. I might do some more cleanup +there. In the end we can best use a callback for handling input from the console. + +\stopsection + +\stopchapter + +\stopcomponent -- cgit v1.2.3