diff options
Diffstat (limited to 'doc/context/sources/general/manuals/evenmore/evenmore-formats.tex')
-rw-r--r-- | doc/context/sources/general/manuals/evenmore/evenmore-formats.tex | 362 |
1 files changed, 362 insertions, 0 deletions
diff --git a/doc/context/sources/general/manuals/evenmore/evenmore-formats.tex b/doc/context/sources/general/manuals/evenmore/evenmore-formats.tex new file mode 100644 index 000000000..7ec95a7d4 --- /dev/null +++ b/doc/context/sources/general/manuals/evenmore/evenmore-formats.tex @@ -0,0 +1,362 @@ +% language=us runpath=texruns:manuals/evenmore + +% This one accidentally ended up in the older history document followingup, +% btu it's now moved here. + +\environment evenmore-style + +\startcomponent evenmore-format + +\startchapter[title={The format file}] + +It is interesting when someone compares macro package and used parameters like +the size of a format file, the output of \type {\tracingall}, or startup time to +make some point. The point I want to make here is that unless you know exactly +what goes on in a run that involves a real document, which can itself involve +multiple runs, such a comparison is rather pointless. For sure I do benchmark but +I can only draw conclusions of what I (can) know (about). Yes, benchmarking your +own work makes sense but doing that in comparisons to what you consider +comparable variants assumes knowledge of more than your own work and objectives. + + +For instance, when you load few fonts, typeset one page and don't do anything +that demands any processing or multiple runs, you basically don't measure +anything. More interesting are the differences between 10 or 500 pages, a few +font calls or tens of thousands, no color of extensive usage of color and other +properties, interfacing, including inheritance of document constructs, etc. And +even then, when comparing macro packages, it is kind of tricky to deduce much +from what you observe. You really need to know what is going on inside and also +how that relates to for instance adaptive font scaling. You can have a fast start +up but if a users needs one tikz picture, loading that package alone will make +you forget the initial startup time. You always pay a price for advanced features +and integration! And we didn't even talk about the operating system caching +files, running on a network share, sharing processors among virtual machines, +etc. + +Pointless comparing is also true for looking at the log file when enabling \type +{\tracingall}. When a macro package loads stuff at startup you can be sure that +the log file is larger. When a font or language is loaded the first time, or +maybe when math is set up there can be plenty of lines dumped. Advanced analysis +of conditions and trial runs come at a price too. And eventually, when a box is +shown the configured depth and breadth really matter, and it might also be that +the engine provides much more (verbose) detail. So, a comparison is again +pointless. It can also backfire. Over the decades of developing \CONTEXT\ I have +heard people working on systems make claims like \quotation {We prefer not to +\unknown} or \quotation {It is better to it this way \unknown} or (often about +operating system) \quotation {It is bad that \unknown} just to see years later +the same being done in the presumed better alternative. I can have a good laugh +about that: do this and don't do that backfiring. + +That brings us to the format file. When you make a \CONTEXT\ format with the +English user interface, with interfacing being a feature that itself introduces +overhead, the \LUATEX\ engine will show this at the end: + +\starttyping +Beginning to dump on file cont-en.fmt + (format=cont-en 2021.6.9) +48605 strings using 784307 bytes +1050637 memory locations dumped; current usage is 414&523763 +44974 multiletter control sequences +\font\nullfont=nullfont +0 preloaded fonts +\stoptyping + +The file itself is quite large: 11,129,903 bytes. However, it is actually much +larger because the format file is compressed! The real size is 19.399.216. Not +taking that into account when comparing the size of format files is kind of bad +because compression directly relates to what resources a format use and how +usage is distributed over the available memory blobs. The \LUATEX\ engine does +some optimizations and saves the data sparse but the more holes you create, the +worse it gets. For instance, the large character vectors are compartmentalized in +order to handle \UNICODE\ efficiently so the used memory relates to what you +define: do you set up all catcodes or just a subset. Maybe you delay some +initialization to after the format is loaded, in which case a smaller format file +gets compensated by more memory usage and initializaton time afterwards. Maybe +your temporary macros create holes in the token array. The memory that is +configured in the configuration files also matter. Some memory blobs are saved at +their configured size, others dismiss the top part that is not used when saving +the format but allocate the lot when the format is loaded. That means that memory +usage in for instance \LUATEX\ can be much larger than a format file suggests. +Keep in mind that a format file is basically a memory dump. + +Now, how does \LUAMETATEX\ compare to \LUATEX. Again we will look at the size of +the format file, but you need to keep in mind that for various reasons the \LMTX\ +macros are somewhat more efficient than the \MKIV\ ones, in the meantime some new +mechanism were added, which adds more \TEX\ and \LUA\ code, but I still expect +(at least for now) a smaller format file. However when we create the format we +see this (reformatted): + +\starttyping +Dumping format 'cont-en.fmt 2021.6.9' in file 'cont-en.fmt': +tokenlist compacted from 489733 to 488204 entries, +1437 potentially aliased lua call/value entries, +max string length 69, 16 fingerprint ++ 16 engine + 28 preamble ++ 836326 stringpool ++ 10655 nodes + 3905660 tokens ++ 705300 equivalents ++ 23072 math codes + 493024 text codes ++ 38132 primitives + 497352 hashtable ++ 4 fonts + 10272 math + 1008 language + 180 insert ++ 10305643 bytecodes ++ 12 housekeeping = 16826700 total. +\stoptyping + +This looks quite different from the \LUATEX\ output. Here we report more detail: +for each blob we mention the number of bytes used. The final result is a file +that takes 16.826.700 bytes. That number should be compared with the 19.399.216 +for \LUATEX. So, we need less indeed. But, when we compress the \LUAMETATEX\ +format we get this: 5,913,932 which is much less than the 11,129,903 compressed +size that the \LUATEX\ engine makes of it. One reason for using level 3 zip +compression compression in \LUATEX\ is that (definitely when we started) it loads +faster. It adds to creating the dump but doesn't really influence loading, +although that depends a bit on the compiler used. It is not easy to see from +these numbers what goes on, but when you consider the fact that we mostly store +32 bit numbers it will also be clear that many can be zero or have two or three +zero bytes. There's a lot of repetition involved! + +So let's look at some of these numbers. The mentioning of token list compaction +relates to getting rid of holes in memory. Each token takes 8 bytes, 4 for the +token identifier, internally called a cmd and chr, and 4 for a value like an +integer or dimension value, or a glue pointer, or a pointer to a next token, etc. +In our case compaction doesn't save that much. + +The mentioning of potentially aliased \LUA\ call|/|value entries is more a warning. +Because the \LUA\ engine starts fresh each run, you cannot store its \quote +{pointers} and because hashes are randomized this means that you need to delay +initialization to startup time, definitely for function tokens. + +Strings in \TEX\ can be pretty long but in practice they aren't. In \CONTEXT\ the +maximum string length is 69. This makes it possible to use one byte for +registering the string length instead of four which saves quite a bit. Of course +one large string will spoil this game. + +The fingerprint, engine, preamble and later housekeeping bytes can be neglected +but the string pool not. These are the bytes that make up the strings. The bytes +are stored in format but when loaded become dynamically allocated. The \LUATEX\ +engine and its successor don't really have a pool. + +Now comes a confusing number. There are not tens of thousands of nodes allocated. +A node is just a pointer into a large array so actually node references are just +indices. Their size varies from 2 slots to 25; the largest are par nodes, while +shape nodes are allocated dynamically. So what gets reported are the number of +bytes that nodes take. each node slot takes 8 bytes, so a glyph node of 12 +bytes takes 96 bytes, while a glue spec node (think skip registers) takes 5 slots +or 40 bytes. These are amounts of memory that were not realistic when \TEX\ was +written. For the record: in \LUATEX\ glue spec nodes are not shared, so we have +many more. + +The majority of \TEX\ related dump data is for tokens, and here we need 3905660 +which means 488K tokens (each reported value also includes some overhead). The +memory used for the table of equivalents makes for some 88K of them. This table +relates to macros (their names and content). Keep in mind that (math) character +references are also macros. + +The next sections that get loaded are math and text codes. These are the +mentioned compartimized character properties. The number of math codes is not +that large (because we delay much of math) but the text codes are plenty, think +of lc, uc, sf, hj, catcodes, etc. Compared to \LUATEX\ we have more categories +but use less space because we have an more granular storage model. Optimizing +that bit really payed off, also because we have more vectors. + +The way primitives and macro names get resolved is pretty much the same in all +engines but by using the fact that we operate in 32 bit I could actually get rid +of some parallel tables that handle saving and restore. Some optimizations relate +to the fact that the register ranges are part of the game so basically we have +some holes in there when they are not used. I guess this is why \ETEX\ uses a +sparse model for the registers above 255. What also saved a lot is that we don't +need to store font names, because these are available in another way; even in +\LUATEX\ that takes a large, basically useless, chunk. The memory that a macro +without parameters consumes is 8 bytes smaller and in \CONTEXT\ we have lots of +these. +We don't really store fonts, so that section is small, but we do store the math +parameters, and there is not much we can save there. We also have more such +parameters in \LUAMETATEX\ so there we might actually use more storage. The +information related to languages is also minimal because patterns and exceptions +are loaded at runtime. A new category (compared to \LUATEX) is inserts because in +\LUAMETATEX\ we can use an alternative (not register based) variant. As you can +see from the 180 bytes used, indeed \CONTEXT\ is using that variant. + +That leaves a large block of more than 10 million bytes that relates to \LUA\ +byte code. A large part of that is the huge \LUA\ character table that \CONTEXT\ +uses. The implementation of font handling also takes quite a bit and we're not +even talking of all the auxiliary \LUA\ modules, \XML\ processing, etc. When +\CONTEXT\ would load that on demand, which is nearly always, the format file +would be much smaller but one would pay for it later. Loading the (some 600) +\LUA\ byte code chunks takes of course some time as does initialization but not +much. + +All that said, the reason why we have a large format file can be understood well +if one considers what goes in there. The \CONTEXT\ format files for \PDFTEX\ and +\XETEX\ are 3.3 and 4.7 MB each which is smaller but not that much when you +consider the fact that there is no \LUA\ code stored and that there are less +character tables and an \ETEX\ register model used. But a format file is not the +whole story. Runtime memory usage also comes at a price. + +The current memory settings of \CONTEXT\ are as follows; these values get +reported when a format has been generated and can be queried at runtime an any +moment: + +\starttabulate[|l|r|r|r|r|] +\BC \BC max \BC min \BC set \BC stp \BC \NR +\HL +\BC string \NC 2097152 \NC 150000 \NC 500000 \NC 100000 \NC \NR +\BC pool \NC 100000000 \NC 10000000 \NC 20000000 \NC 1000000 \NC \NR +\BC hash \NC 2097152 \NC 150000 \NC 250000 \NC 100000 \NC \NR +\BC lookup \NC 2097152 \NC 150000 \NC 250000 \NC 100000 \NC \NR +\BC node \NC 50000000 \NC 1000000 \NC 5000000 \NC 500000 \NC \NR +\BC token \NC 10000000 \NC 1000000 \NC 10000000 \NC 250000 \NC \NR +\BC buffer \NC 100000000 \NC 1000000 \NC 10000000 \NC 1000000 \NC \NR +\BC input \NC 100000 \NC 10000 \NC 100000 \NC 10000 \NC \NR +\BC file \NC 2000 \NC 500 \NC 2000 \NC 250 \NC \NR +\BC nest \NC 10000 \NC 1000 \NC 10000 \NC 1000 \NC \NR +\BC parameter \NC 100000 \NC 20000 \NC 100000 \NC 10000 \NC \NR +\BC save \NC 500000 \NC 100000 \NC 500000 \NC 10000 \NC \NR +\BC font \NC 100000 \NC 250 \NC 250 \NC 250 \NC \NR +\BC language \NC 10000 \NC 250 \NC 250 \NC 250 \NC \NR +\BC mark \NC 10000 \NC 50 \NC 50 \NC 50 \NC \NR +\BC insert \NC 500 \NC 10 \NC 10 \NC 10 \NC \NR +\stoptabulate + +The maxima is what can be used at most. Apart from the magic number 2097152 all +these maxima can be bumped at compile time but if you need more, you might wonder +of your approach to rendering makes sense. The minima are what always gets +allocated, and again these are hard coded defaults. The size can be configured +and is normally the same as the minima but we use larger values in \CONTEXT. The +step is how much an initial memory blob will grow when more is needed than is +currently available. The last four entries show that we don't start out with many +fonts (especially when we use the \CONTEXT\ compact font model not that many are +needed) and because \CONTEXT\ implements marks in a different way we actually +don't need them. We do use the new insert properties storage model and for now +the set sizes are enough for what we need. + +In practice a \LUAMETATEX\ run uses less memory than a \LUATEX\ one, not only +because memory allocation is more dynamic, but also because of other +optimizations. When the compact font model is used (something \CONTEXT) even less +memory is needed. Even this claim should be made with care. Whenever I discuss +the use of resources one needs to limit the conclusions to \CONTEXT. I can't +speak for other macro packages simply because I don't know the internals and the +design decisions made and their impact on the statistics. As a teaser I show the +impact of some definitions: + +\starttyping +\chardef \MyFooA1234 +\Umathchardef\MyFooB"1 "0 "1234 +\Umathcode 1 2 3 4 +\def \MyFooC{ABC} +\def \MyFooD#1{A#1C} +\def \MyFooE{\directlua{print("some lua")}} +\stoptyping + +The stringpool grows because we store the names (here they are of equal length). +Only symbolic definitions bump the hashtable and equivalents. And with +definitions that have text inside the number of bytes taken by tokens grows fast +because every character in that linked list takes 8 bytes, 4 for the character +with its catcode state and 4 for the link to the next token. + +\starttabulate[|l||||||] +\BC \BC stringpool \BC tokens \BC equivalents \BC hashtable \BC total \NC \NR +\HL +\NC \NC 836408 \NC 3906124 \NC 705316 \NC 497396 \NC 16828987 \NC \NR +\NC \type {\chardef} \NC 836415 \NC 3906116 \NC 705324 \NC 497408 \NC 16829006 \NC \NR +\NC \type {\Umathchardef} \NC 836422 \NC 3906116 \NC 705324 \NC 497420 \NC 16829025 \NC \NR +\NC \type {\Umathcode} \NC 836422 \NC 3906124 \NC 705324 \NC 497420 \NC 16829033 \NC \NR +\NC \type {\def} (no arg) \NC 836429 \NC 3906148 \NC 705332 \NC 497428 \NC 16829080 \NC \NR +\NC \type {\def} (arg) \NC 836436 \NC 3906196 \NC 705340 \NC 497440 \NC 16829155 \NC \NR +\NC \type {\def} (text) \NC 836443 \NC 3906372 \NC 705348 \NC 497452 \NC 16829358 \NC \NR +\stoptabulate + +So, every time a user wants some feature (some extra checking, a warning, color +or font support for some element) that results in a trivial extension to the +core, it can bump the size fo the format file more than you think. Of course when +it leads to some overhaul sharing code can actually make the format shrink too. I +hope it is clear now that there really is not much to deduce from the bare +numbers. Just try to imagine what: + +\starttyping +\definefilesynonym + [type-imp-newcomputermodern-book.mkiv] + [type-imp-newcomputermodern.mkiv] +\stoptyping + +adds to the format. Convenience has a price. + +\stopchapter + +\stopcomponent + +% Some bonus content: + +When processing thousand paragraphs \type {tufte.tex}, staying below 4 seconds +(just over 60 pages per second) all|-|in that looks ok. But it doesn't say that +much. Outputting 1000 pages in 2 seconds tells a bit about the overhead on a page +but again in practice things work out differently. So what do we need to +consider? + +\startitemize + +\startitem + Check what macros and resources are preloaded and what gets always loaded at + runtime. +\stopitem + +\startitem + After a first run it's likely that the operating system has resources in its + cache so start measuring after a few runs. +\stopitem + +\startitem + Best run a test many times and and take the average runtime. +\stopitem + +\startitem + Simple macro performance tests can be faster than in real usage because the + related bytes are in \CPU\ cache memory. So one can only use that to test a + specific improvement (or hit due to added functionality). +\stopitem + +\startitem + The size of the used \TEX\ tree can matter. The file databases need to be + loaded and consulted. +\stopitem + +\startitem + The binary matters: is it optimized, does it load libraries, is it 64 bit or not. +\stopitem + +\startitem + Local and|/|or global font definitions can hit performance and when a style + does many redundant switches it might hit performance. Of course that only is + the case when font switching is adaptive. +\stopitem + +\startitem + The granularity of subsystems impacts performance: advanced color support, + inheritance used in mechanisms, abstraction combined with extensive + support for features, it all matters. +\stopitem + +\startitem + The more features one enables the more it will impact performance as does + preprocessing the input (normalizing, bidi checking, etc). +\stopitem + +\startitem + It matters how the page (and layout) dimensions are defined. Although + language doesn't really play a role (apart from possible hyphenation) + specific scripts might. +\stopitem + +\stopitemize + +These are just a few points, but it might be clear that I don't take comparisons +too serious simply because it's real runs that matter. As long as we're in the +runtime comfort zone we're okay. You can run tests within the domain of a macro +package but comparing macro package makes not that much sense. It can even +backfire, especially when claims were made about what should be or not be in a +kernel (while later violating that) or relying on old stories (or rumors) about a +variant macro package being slow. (The same is true when comparing one's favorite +operating system.) Yes, the \CONTEXT\ format file is huge and performance less +than for instance plain \TEX. If that is a problem and not a virtue then make +sure your own alternative will never end up like that. And just don't come to +conclusions about a system that you don't really know. |