Diffstat (limited to 'doc/context/sources/general/manuals/ontarget/ontarget-registers.tex')
-rw-r--r-- | doc/context/sources/general/manuals/ontarget/ontarget-registers.tex | 164 |
1 files changed, 164 insertions, 0 deletions
diff --git a/doc/context/sources/general/manuals/ontarget/ontarget-registers.tex b/doc/context/sources/general/manuals/ontarget/ontarget-registers.tex
new file mode 100644
index 000000000..e5e03fbda
--- /dev/null
+++ b/doc/context/sources/general/manuals/ontarget/ontarget-registers.tex
@@ -0,0 +1,164 @@
+% language=us runpath=texruns:manuals/ontarget
+
+\startcomponent ontarget-registers
+
+\environment ontarget-style
+
+\startchapter[title={Gaining performance}]
+
+In the meantime (2022) the \LUAMETATEX\ engine has touched many aspects of the
+original \TEX\ implementation. This has resulted in less memory consumption than
+for instance \LUATEX\ when we talk tokens, more efficient macro handling,
+additional storage options and numerous new features and optimizations. Of course
+one can disagree about all of this, but what matters to us is that it facilitates
+\CONTEXT\ well. That macro package went from \MKII\ to \MKIV\ to \MKXL\ (aka
+\LMTX).
+
+Although over the years the macros evolved the basic ideas haven't changed: it is
+a keyword driven macro package that is set up in a way that makes it possible to
+move forward. In spite of what one might think, the fundamentals didn't change
+much. It looks like we made the right decisions at the start, which means that we
+can change low level implementations to match the engine without users noticing
+much. Of course in the area of fonts, input encoding and languages things have
+changed simply because the environment in which we operate changes.
+
+A fundamental difference between \PDFTEX\ and \LUAMETATEX\ is that the latter is
+in many aspects 32 and even 64 bit all over the place. That comes with a huge
+performance hit but also with possibilities (that I won't discuss here now)! On a
+simple document nothing can beat \PDFTEX, even with the optimizations that we can
+apply when using the modern engines. However, on more complex documents reality
+is that \LUAMETATEX\ can outperform \PDFTEX, and documents (read: user demands)
+have become more complex indeed.
+
+So, how does that work in practice? One can add some features to an engine but
+then the macro package has to be adapted. Due to the way \CONTEXT\ is organized
+it was not that hard to keep it in sync with new features, although not all are
+applied yet to full extent. Some new features improved performance, others made
+the machinery (or its usage) a bit slower. The first versions of \LUAMETATEX\
+were some 25\percent\ slower than \LUATEX, simply because the backend is written
+in \LUA. But, end 2022 we can safely say that \LUAMETATEX\ can be 50\percent\
+faster than its ancestor. This is due to a mix of the already mentioned
+optimizations and new features, for instance a more powerful macro parser. The
+backend has become more complex too, but also benefits from a few more helpers.
+
+Because we spend a lot of time in \LUA\ the interfaces to \TEX\ have been
+extended and improved too. Of course we depend on the \LUA\ interpreter being
+kept in optimum state by its authors. It must be said that quite some of the
+interfaces might look obscure but these are not really meant for the average user
+anyway. Also, as soon as one messes with tokens and nodes at that level one
+definitely needs to know what one's doing!
+
+The more stable the engine becomes, the less there is to improve. Occasionally it
+was possible to squeeze out a few more milliseconds on a run but it depends a lot
+on what one does. And \TEX\ is already quite fast anyway. Of course 0.005 seconds
+on a 5 second run is not much but a hundred times such an improvement is
+noticeable, especially when there are multiple runs or when one processes a batch
+of 10.000 documents (each needing two runs).
+
+One interesting aspect of \TEX\ is that it can surprise you every now and then. End
+2022 I decided to play a bit more with a feature that has been around for a
+while:
+
+\starttyping
+\integerdef \fooA 123
+\dimensiondef\fooB 123pt
+\stoptyping
+
+These primitives create a counter and a dimen where the value is stored in the hash
+table. The original reason was that I didn't want to spoil registers. But although
+these are basically constants there is more to it now.
+
+\starttyping
+\countdef\fooC 27
+\dimendef\fooD 56
+\stoptyping
+
+These primitives create a command that stores the register number (here 27 and
+56) with the name. In this case a \quote {variable} is accessed in two steps: the
+\type {\fooC} macro expands to a register accessor with value 27. Next that
+accessor will kick in and fetch (or set) the value in slot 27 of the memory range
+bound to (in total 65K) counters. All these registers sit at the lower end of
+\TEX's memory which is definitely not next to the meaning of \type {\fooC}. So we
+have two memory accesses to get to the number. Contrary to that once we are at
+\type {\fooA} we are also at the value. Although memory access can be fast when
+the relevant slots are cached in practice it can give delays, especially in a
+program like \TEX\ where most data is spread all over the place. And imagine other
+processes competing for access too.
+
+It is for that reason that I decided to replace the more or less \quote
+{constant} property of \type {\fooA} by one that also supports assignments as
+well as the arithmetic commands like \type {\advance}. This was not that hard due
+to the way the \LUAMETATEX\ source is organized. After that using these pseudo
+constants proved to be more efficient than registers, but of course I then had to
+adapt the source. Interestingly that should have been easy because one only needs
+to change the definitions of for instance \type {\newcount} but in practice that
+doesn't work because it will|/|can break for instance generic packages like Tikz.
+
+So, in the end a new allocator was added and just over 1000 lines in some 120
+files (with some overlap) had to be adapted to this. In addition some precautions
+had to be made for access from \LUA\ because the quantities were no longer
+registers. But it was rewarding in the sense that the test suite now ran some
+5\percent\ faster and processing the \LUAMETATEX\ manual went from 8.7 seconds on
+my laptop down to around 8.5, which is not bad.
+
+Now why do we bother so much about performance? If I really want a faster run
+using a decent desktop is of more help. But even then there can be reasons. When
+Mikael and I were discussing math engine developments at some point we noticed
+that a run took twice as much time as a result of (supposedly idle) background
+tasks. Now keep in mind that \TEX\ uses a single core so with plenty of cores it
+should not be that bad. However, when the video chat program takes half of the
+CPU power, or when a mathematical manipulation program idles in the background
+taking 80 percent of a modern machine, or when a popular editor keeps all kinds of
+plugins busy for no reason, or when a supposedly closed browser consumes
+gigabytes of memory and keeps dozens of supposedly idle threads busy, it becomes
+clear that we should not let \TEX\ put a large burden on memory access (and
+cache).
+
+It can get even worse when one runs on virtual machines where the host suggests
+that you get 16 cores so that you can run a dozen \TEX\ jobs in parallel but
+simple measurements show that these shared cores report a much higher ideal
+performance than the one you measure. So, the less demanding a \CONTEXT\ run
+becomes, the better: we're not so much after the .2 seconds on an 8 second run,
+but more after 3 seconds for that same run when using shared resources where it
+became 15 seconds. And this is what observations with respect to the performance
+of the test suite seem to indicate.
+
+In the end it's mostly about comfort: when you process a document of 300 pages,
+10 seconds is quite okay for a few changes, because one can relate time to
+output, but 20 seconds \unknown\ And when processing a document of a few pages the
+waiting time of a second is often less than what one needs to move the mouse
+around to the viewer. Also, when a user starts \TEX\ on the console and
+afterwards opens a browser from there that second is even less noticeable.
+
+Now let's go back to improvements. A related addition was \type {\advanceby} that
+doesn't check for the \type {by} keyword. When there is no such keyword we can
+avoid pushing back the non|-|matching next token which is also noticeable. Here
+about 680 changes were needed. Changes like these only make a difference in
+performance for some very demanding mechanisms in \CONTEXT. Again one cannot
+overload an existing primitive because generic packages can fail (as the test
+suite proved). There were also a few places where a dirty trick had to be changed
+because we cannot alias these constants.
+
+We can give similar stories about other improvements but this one sort of stands
+out because it is so noticeable. Also, other changes involve more drastic low
+level adaptations of \CONTEXT\ so these happen over a longer period of time. Of
+course all has to happen in ways that don't impact users. An example of a
+performance primitive is \typ {\advancebyplusone} which is actually implemented
+but still disabled because the gain is in the hundredths of a second range and I need to
+(again) adapt the source in order to benefit.
+
+The mentioned register variants are implemented for count (integer), dimen
+(dimension), skip (gluespec) and muskip (mugluespec). Token registers are more
+complex as they have reference counters as well as more manipulator primitives.
+The same is true for boxes (although it is tempting to come up with some faster
+access mechanism) and attributes, which also have more diverse accessors. Also,
+token lists and boxes involve way more than a simple assignment or access so any
+gain will drown in other actions. That said, it really makes sense now to drop
+the maximum of 64K registers to some more reasonable 8K (or even less for mu
+skips). That will save a couple of megabytes which sounds like little but still
+puts less burden on the system.
+
+\stopchapter
+
+\stopcomponent
+
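
As a side note to the chapter added by this patch, the two ways of binding a name to a number that it describes can be put side by side in a small test file. This is an untested sketch that assumes a recent ConTeXt LMTX (LuaMetaTeX) installation. The names `\sketchA`, `\sketchB` and `\sketchC` are made up for the illustration and do not occur in the patch, and `\newcount` is used for the register instead of a hard coded `\countdef` slot so that no allocated register gets clobbered.

```tex
% Illustrative sketch only, not part of the patch. Assumes ConTeXt LMTX
% (LuaMetaTeX); \sketchA, \sketchB and \sketchC are hypothetical names.
\starttext

% The value is stored with the name in the hash table, so once the
% name is found the value is found too.
\integerdef  \sketchA 123
\dimensiondef\sketchB 123pt

% The name stores a register number and the value sits elsewhere in
% the register range, which means a second memory access.
\newcount\sketchC
\sketchC 123

% According to the chapter the arithmetic commands work on both kinds.
\advance\sketchA by 1
\advance\sketchC by 1

% \advanceby does not scan for the optional "by" keyword.
\advanceby\sketchA 1
\advanceby\sketchC 1

\the\sketchA\space \the\sketchC\space \the\sketchB

\stoptext
```

Any timing difference between the two kinds will only show up in tight loops or very demanding mechanisms, which matches the modest gains reported in the chapter.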