From f72c2cf29d36ae836c894bad29dfd965d1af0236 Mon Sep 17 00:00:00 2001 From: Hans Hagen Date: Sun, 18 Aug 2019 22:51:53 +0200 Subject: 2019-08-18 22:26:00 --- .../manuals/followingup/followingup-evolution.tex | 373 +++++++++++++++++++++ 1 file changed, 373 insertions(+) create mode 100644 doc/context/sources/general/manuals/followingup/followingup-evolution.tex (limited to 'doc/context/sources/general/manuals/followingup/followingup-evolution.tex') diff --git a/doc/context/sources/general/manuals/followingup/followingup-evolution.tex b/doc/context/sources/general/manuals/followingup/followingup-evolution.tex new file mode 100644 index 000000000..730f4cc1b --- /dev/null +++ b/doc/context/sources/general/manuals/followingup/followingup-evolution.tex @@ -0,0 +1,373 @@ +% language=us + +\startcomponent followingup-evolution + +\environment followingup-style + +% Yes, music is still evolving in qualitive ways ... +% +% Home Is - Jacob Collier with VOCES8 +% +% and as long as there's interesting new music to run into I keep +% doing thse kind of things. + +\startchapter[title={Evolution}] + +\startsection[title={Introduction}] + +The original idea behind \TEX\ is that of a relatively small kernel with (either +or not system dependent) extensions. One such extension is the \DVI\ backend, and +later \PDFTEX\ added a \PDF\ backend. Other extensions are \quote {writing to +files} and \quote {writing to the output medium} using so called specials. This +extension mechanism permits \TEX\ to support, for instance, color and image +inclusion. + +The \LUATEX\ project started from \PDFTEX, including its extensions like font +expansion, and combined that with (bi|)|directional typesetting from the, at that +moment, stable \OMEGA\ variant \ALEPH. During the more than a decade development +we integrated expansion in a more efficient way and limited directions to the +four that made sense. The assumption that \UNICODE\ has the future lead to \UTF8 +being used all over the place. + +The \LUATEX\ variant opens up the internals using the \LUA\ extension language. +The idea was (and still is) that instead if adding more and more hard coded +solutions, one can use \LUA\ to do it on demand. So, for instance \OPENTYPE\ +fonts are supported by providing a font file reader but the implementation of +features is up to \LUA. From \PDFTEX\ the graphic inclusions were inherited but +an image and \PDF\ reading library provided a few more possibilities, for +instance for querying properties. An important integral part of \LUATEX\ is the +\METAPOST\ library, but apart from that one, the amount of libraries is kept at a +minimum. That way we're free of dependencies and compilation hassles. + +With version 1.0 the functionality became official and with version 1.1 the +functionality became more of less frozen. The main reason for this is that +further extensions would violate the principle of using \LUA\ instead of hard +coding solutions. Another reason is that at some point you have to provide a +stable machinery for macro packages so that backward as well as forward +compatibility over a longer period is possible. Also, because one can use \TEX\ +in (unattended) workflows sudden changes become undesirable. + +\stopsection + +\startsection[title={What next?}] + +Does it stop here? We have reached a reasonable stable state with \CONTEXT\ +\MKIV\ and can basically do what we want to do. However, during the more than a +decade development of this \MKII\ follow up, the idea surfaced that we can go +more minimal in the engine. Basically we can go back to where \TEX\ started: a +core plus extension mechanism. What does that mean? First of all, there is the +very efficient frontend: scanning macros, expanding them and constructing node +lists, all within a powerful grouping mechanism. There is no reason to reconsider +that. The core of the interface is also well documented, for instance in the +\TEX\ book. We added some primitives to \LUATEX, but most of them are of no real +importance to users; they make more sense to macro package writers. + +Original \TEX\ has a \DVI\ backend which is a simple representation of a page: +characters and rules positioned on some grid. A separate program has to convert +that into something for a printer. There is a basic extension mechanism that +permits injection of so called specials that get passed to the external program +so that for instance an image can be included. Given that \LUATEX\ is mostly used +to generate \PDF, using so called wide fonts in a \UNICODE\ universe, a \DVI\ +backend is not that useful. In fact, one can then better use the faster \PDFTEX\ +program or just \ETEX\ or \TEX: use the best tool available for the job. + +The backend however can be left out and can be implemented in \LUA\ instead. In +fact, most of the backend related code in \CONTEXT\ doesn't really use the +\LUATEX\ backend features at all. The backend is only used to convert the page +stream to a \PDF\ content stream, include images, include fonts and manage low +level objects. Everything specific to \PDF\ is already done in \LUA. Of course +this has a performance penalty but given the overhead already present in +\CONTEXT\ it is bearable. + +Alongside the frontend the \METAPOST\ library plays an important role in +\CONTEXT: integration between \TEX, \METAPOST\ and \LUA\ is pretty tight and a +unique property of \CONTEXT. But, for instance the font reader library is no +longer used. Also the interfacing to the \TEX\ Directory Structure was done in +\LUA, originally for performance reasons as it reduced startup time by more that +a second. For some of the frontend code (like hyphenation and par building) we +can kick in \LUA\ variants too but there is not much to gain there. (I know that +some users use them with success.) + +So, traditional \TEX\ can be summarized as: + +\starttyping +tex core + dvi backend + tex extensions +\stoptyping + +where the extension interface provide a few goodies. If we would have to summarize +\LUATEX\ we could say: + +\starttyping +tex core + dvi & pdf backend + tex extensions + lua callbacks +\stoptyping + +The core interprets the input and does the typesetting. In order to be able to +typeset \TEX\ only needs the dimensions of characters and information about +spacing (which in principle are sort of independent) in math mode a few more +properties are needed, like snippets that make large symbols. In text mode +ligature and kerning information can be used too. However, in \LUATEX, where +normally \OPENTYPE\ fonts are used, that information is provided from \LUA. This +means that one can also think of: + +\starttyping +tex core + basic font data + tex extensions + lua callbacks +\stoptyping + +Compared to regular \TEX\ this is not that different, and it's what \CONTEXT\ can +do with. So, it will be no surprise that when I wondered what \LUATEX\ 2.0 could +be that a more minimalistic approach was considered: back to the basics. + +\stopsection + +\startsection[title={Roadmap}] + +Before I continue it is good to mention the following. One of the burdens that +\CONTEXT\ users (and developers) carry is that the outside world likes putting +labels on \CONTEXT, like \quotation {A macro package depending on \PDFTEX} in a +time that we supported \DVI\ at the same level using a more of less generic +driver model. The same is true for \MKIV, e.g.\ \quotation {\CONTEXT\ uses a lot +of \LUA\ and moves away from \TEX} while in fact we provide a hybrid tool: you +can use \TEX\ input (which most users do) but also \LUA\ (which can be handy) or +\XML\ (which some publishers demand and definitely seems to be used by some +\CONTEXT\ power users). A special one is \quotation {\CONTEXT\ is kind of plain +\TEX, so you have to program all yourself.} Reality is that \CONTEXT\ is an +integrated system, where \TEX\ and \METAPOST\ work together to provide a lot of +integrated functionality. Because of \LUATEX\ development and the relation +between an updated engine and the beta version of \CONTEXT, the impression can be +that we have an unstable system. This strategy of parallel adaptation is the only +way to really test of things work as expected. Because we have a rather fast +update cycle normally users don't suffer that much from it. + +The core of whatever we follow up with is and remains \TEX, just because I like +it. So, when I talk about a small core, I actually still talk about \TEX. The +main reason is that it's way easier (and readable) to code some solutions in this +hybrid fashion. A pure \LUA\ solution is no fun, maybe even a pain, and I have no +use for it, but a pure \TEX\ solution can be cumbersome too. And \TEX\ input is +just very convenient and for that one needs a \TEX\ interpreter. I would already +have dropped out when \TEX\ was not part of the game: an intriguing, puzzling and +powerful toy. And \METAPOST\ and \LUA\ add even more fun. So, I settle for a mix +between three interesting languages. And, because I seldom run into professional +demand for \LUATEX\ related support (or high end, high performance rendering), +the fun factor has always been the driving force. + +All that said, for practical reasons, when we explore a follow up in the +perspective of \CONTEXT, we will use the working title \LUAMETATEX\ instead. +\LUAMETATEX\ has the current \LUATEX\ frontend, some \LUA\ libraries, but no +backend. Gone are the font reader, image inclusion, \DVI\ and \PDF\ backend +(including font inclusion) and the interface to the \TDS. Can that work? As +mentioned, the font reader was already not used in \CONTEXT\ for quite a while. An +alternative page stream builder was also in good working condition in \CONTEXT\ +when \LUATEX\ 1.08 was released and around \LUATEX\ 1.09 image inclusion was +replaced (\PDF\ inclusion was already accompanied for a while by a \LUA\ +variant). Currently (fall 2018) \CONTEXT\ is able to completely construct the +\PDF\ file which also meant font inclusion. However, it didn't make much sense to +release that code yet because after all, there was minimal gain when using it +with a full blown \LUATEX. Also, switching to this variant involved some runtime +adaption of code which might confuse users. But above all, it needed more +testing, and releasing something before an upcoming \TEX Live code freeze is a +bad idea. + +During \LUATEX\ development a few times we got suggestions for additional +features but merely looking at them already made clear that what works for +someone in a particular case, can introduce side effects that make (for instance) +\CONTEXT\ fail. And, how many folks keep \CONTEXT\ in mind? So, when \LUATEX\ +goes into maintenance mode, specific distributions could accept patches outside +our control, which has the danger that a binary (suggesting to be \LUATEX) +doesn't work with \CONTEXT. Of course we cannot change something ourselves either +without looking around. And I'm not even bringing possible negative side effects +on performance into the discussion here. + +When developing \LUATEX\ some ideas were dropped or delayed and these can now be +explored without the danger of messing up the stable version. It has always been +relatively easy to adapt \CONTEXT\ to changes so an (at least for now) +experimental follow up can be dealt with too, but this time the concept of \quote +{experimental} is really bound to \CONTEXT. When something is found useful (or +can be improved) it can always (after testing it for a while) be fed back into +\LUATEX, as long as it doesn't break something. I'll decide on that later. + +In the documentation of \TEX, when discussing the extension mechanism, Donald +Knuth says: + +\startquotation +The goal of a \TEX\ extender should be to minimize alterations to the standard +parts of the program, and to avoid them completely if possible. He or she should +also be quite sure that there's no easy way to accomplish the desired goals with +the standard features that \TEX\ already has. \quotation {Think thrice before +extending}, because that may save a lot of work, and it will also keep +incompatible extensions of \TEX\ from proliferating. +\stopquotation + +With the in the next chapters discussed reduction of backend and some frontend +code, combined with hooks that can trigger callbacks, we try to come close to +this objective. Now, the last sentence of this quote relates to stability and +this is also a reason why we enter this new thread: the smaller the core is, the +less subjected we are to change. Think of this: I haven't used \CONTEXT\ \MKII\ +in over a decade. A \PDFTEX\ format still gets generated but I have no clue if +the engine has been changed in ways that make some code behave differently (it +could also be the ecosystem related to that engine), but I assume it's still +behaving the same. The same has to become true for stock \LUATEX\ and \MKIV\ and +for \CONTEXT\ it can even become more true with \LUAMETATEX. We'll see. + +\stopsection + +\startsection[title={Experiments}] + +This (still sort of) prototype of what \LUAMETATEX\ could be boils down to a much +smaller binary, and not that much more \LUA\ code on top of what we already have. +There are no longer dependencies on third party code, apart from \LUA\ (\type +{pplib} is tuned for \LUATEX\ and permanent part of the code base). Performance +wise the backend of the experimental version makes a run upto 5\% slower than +when using a native backend (on processing the \LUATEX\ manual) but history has +learned that we can gain some of that back in due time. Performance also depends +a bit on the properties of the document. Interesting is that better control over +the output showed that \PDF\ output of the mentioned manual was a bit smaller +(but that might change). \footnote {In the meantime the experimental version can +process the \LUATEX\ manual 5\endash10\% faster and the result is still smaller.} + +The experiments actually started already years ago with no longer using the font +loader. It sort of went this way: + +\startitemize +\startitem + Stepwise \CONTEXT\ functionality started using a combination of \TEX\ and + \LUA\ code and we got an idea of what was needed. The most demanding part + was support for fonts. +\stopitem +\startitem + Font handling was done in \LUA\ because it's flexible which is what \TEX ies + are accustomed to. The \OPENTYPE\ and \PDF\ standards would not be called + standards if some implementation was impossible and so far we're ok. (Some + more script support will be provided in future versions.) +\stopitem +\startitem + We stopped using the fontforge font loader but use one written in \LUA\ + instead. One reason for this was that when variable fonts showed up we wanted + to support it in \CONTEXT\ right from the start (not that there has been much + demand). The same is true for fonts using color (like emoji). Also, fighting + the built|-|in \FONTFORGE\ heuristics was hard. +\stopitem +\startitem + The (large and dependent on \CPLUSPLUS) poppler library used for \PDF\ + embedding has been replaced by a small lightweight library in pure \CCODE. + This was triggered at a chat during a bacho\TEX\ meeting. +\stopitem +\startitem + The hard coded \PDF\ inclusion can be swapped with a \LUA\ based one so that + we can for instance filter the page stream. We already had a hybrid solution + in \CONTEXT\ anyway for other reasons (merging annotations, layers, + bookmarks, etc.). +\stopitem +\startitem + The page stream constructor got a (shipout and xforms) by a \LUA\ variant, + but I decided not to make that an independent option in stock \LUATEX\ with + \CONTEXT\ \MKIV, although for a while I had the option \type {--lmtx} for + activating that experimental code. +\stopitem +\startitem + Then of course bitmap image inclusion had to be done by \LUA\ code, in order + to see if we can get rid of another external dependency as some of these + libraries get frequent updates while in practice we only use a very small + subset of functionality. Indeed this was possible. \footnote {I have a pure + \LUA\ parser for \PDF\ too, so at some point that might get included in the + \CONTEXT\ code base.} +\stopitem +\startitem + With some effort (deciphering specs and such) the font inclusion could also + be done by a \LUA. This was made possible by the fact that we already had + support for variable fonts. More tricks are possible and will be explored. +\stopitem +\startitem + Finally the \PDF\ file construction and \PDF\ object management had to be + implemented. This was actually the easiest part. +\stopitem +\stopitemize + +Performance wise the \LUA\ font loader is faster than the built in one. The same +is true for \PDF\ inclusion but in practice that is unnoticeable. Bitmap +inclusion is currently slower for interlaced images (seldom used in print) and +just as efficient for other types. The page stream constructor is definitely +slower but this is compensated by the faster font inclusion and \PDF\ file +construction. Of course it all depends on the kind of content, but these are the +observation as of fall 2018. Anyway, they were enough reason to continue this +experiment. + +One thing to keep in mind is that the smaller the binary and the less code paths +we have, the better future performance might be. Computers are not becoming much +faster for single thread processes like \TEX, so the less we jump around code +space (memory) the better it probably is for \CPU\ caching (as caches are not +growing much either). + +\stopsection + +\startsection[title={Conclusion}] + +Normally when writing this kind of code I make sure that I can enable such new +mechanisms on top of others but at some point one has to decide how to really +integrate them. For instance, we can do font inclusion independent of \PDF\ +generation or page stream construction independent of \PDF\ generation and|/|or +font inclusion but in the end that doesn't make sense and makes the code base a +bit of a mess. So, this is how it will go. + +Stock \LUATEX\ with \MKIV\ will use the normal backend but probably there might +be an option to overload the built|-|in image inclusion so that one can avoid the +abortion of a run in case of problematic images. Complete \PDF\ file +construction, which then also includes page stream construction, font embedding +and object management might be available as option for \MKIV\ with \LUATEX\ 1.10 +(for a while) but will be default when using \LUAMETATEX. When we move on \LMTX\ +support might evolve in more sophisticated trickery. \footnote {A few months +later I decided that this made no sense, and that it was cleaner to just leave +that approach for \LMTX\ only. So, now both engines use different code +exclusively.} + +Once tested a bit in real documents experimental code will end up in the +distribution. That code can then be turned into production code (read: cleaned up +and reshuffled a bit). We can streamline the engine code base: strip the +components that are not needed any more, remove some obsolete features, optimize +the code, strip some functions from \LUA\ libraries, rename some helpers, and +finally add some documentation. There are some plans to extend \METAPOST\ so also +things can get added. Concerning the \LUA\ interface it means that \type +{slunicode} is removed, the embedded socket related \LUA\ code goes external (but +the library stays), the font loader gets removed, the \type {img} library goes +away, no longer \PNG\ libraries are embedded, synctex is stripped out (but the +fields in nodes stay or get extended). \footnote {Much later I also decided to +remove the zip file reader library.} The resulting binary will be much smaller +and the code base more independent and smaller too. In the process \LUAJIT\ +support might be dropped as well, simply because it no longer is in sync with +stock \LUA, but that also depends on how complex long term maintenance becomes. +\footnote {As we will see in following chapters, indeed support for \LUAJIT\ has +been dropped while \LUA\ got upgraded to 5.4.} + +Because such a stripped down binary is no longer what got presented as \LUATEX\ +version~1, it will basically become \LUATEX\ version 2, but then we have the +problem that its binary name clashes with the original. This is why it will be +run as \typ {luametatex}. For \CONTEXT\ it's not that relevant as it will run on +both \LUATEX\ 1.10 and its lean and mean successor. I might also provide a plain +\TEX\ (read: generic) version but that is to be decided because it probably +doesn't make much sense to spend time on it. As usual we will test this within +the \CONTEXT\ beta program. The good thing is that it doesn't interact with +\LUATEX, so that other macro packages are not affected. Another side effect can +be that we uncover issues with \LUATEX\ 1.10 and that we can experiment with some +improvements that we feed back into the parent. + +At the \CONTEXT\ end of this there are some plans to extend the export, maybe +improve already present \PDF\ tagging (if found useful), add some more input +(xml) manipulations, and maybe extend (virtual) font handling a bit, now that we +no longer are bound to the currently used packet model. Contrary to what one +might expect this is not really dependent on the engine. + +How do we proceed? As with the transition from \MKII\ to \MKIV, it will all +happen stepwise. This means that for a while the code base will be a bit hybrid +but at some point it might be partially split to make things cleaner, not that I +expect many fundamental differences (certainly not in the front|-|end). This +dualistic approach means more work but also makes that we keep a working +\CONTEXT. We also need to keep an eye on for instance generic commands as used in +tikz: we can't drop them so we emulate them (so far with success). As the time of +this writing, begin November 2018, the \CONTEXT\ test suite can be processed in +\LMTX\ mode without problems so I'm confident that it will work out ok. The next +chapter describes the results of how we did the above in more detail. + +\stopsection + +\stopchapter + +\stopcomponent -- cgit v1.2.3