diff --git a/doc/context/sources/general/manuals/hybrid/hybrid-optimize.tex b/doc/context/sources/general/manuals/hybrid/hybrid-optimize.tex
new file mode 100644
index 000000000..00772ee4b
--- /dev/null
+++ b/doc/context/sources/general/manuals/hybrid/hybrid-optimize.tex
@@ -0,0 +1,501 @@
+% language=uk
+
+\startcomponent hybrid-optimize
+
+\environment hybrid-environment
+
+\startchapter[title={Optimizations again}]
+
+\startsection [title={Introduction}]
+
+Occasionally we do some timing on new functionality in either
+\LUATEX\ or \MKIV, so here's another wrapup.
+
+\stopsection
+
+\startsection [title={Font loading}]
+
+In \CONTEXT\ we cache font data in a certain way. Loading a font from the cache
+takes hardly any time. However, preparation takes more time as well as memory as
+we need to go from the fontforge ordering to one we can use. In \MKIV\ we have
+several font tables:
+
+\startitemize[packed]
+\startitem
+ The original fontforge table: this one is only loaded once and converted to
+ another representation that is cached.
+\stopitem
+\startitem
+ The cached font representation that is the basis for further manipulations.
+\stopitem
+\startitem
+ In base mode this table is converted to an (optionally cached) scaled \TFM\
+ table that is passed to \TEX.
+\stopitem
+\startitem
+ In node mode a limited scaled version is passed to \TEX. As with base mode,
+ this table is kept in memory so that we can access the data.
+\stopitem
+\startitem
+ When processing features in node mode additional (shared) subtables are
+ created that extend the memorized cached table.
+\stopitem
+\stopitemize
+
+This model is already quite old and dates from the beginning of \MKIV. Future
+versions might use different derived tables but for the moment we need all this
+data if only because it helps us with the development.
+
+The regular method to construct a font suitable for \TEX, whether we use base
+mode or node mode in \MKIV, is to load the font as a table using \type
+{to_table}, a \type {fontloader} method. This means that all information is
+available (and can be manipulated). In \MKIV\ this table is converted to another
+one and in the process new entries are added and existing ones are freed. Quite
+some garbage collection and table resizing takes place in the process. In the
+cached instance we share identical tables so there we can gain a lot of memory
+and avoid garbage collection.
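+
+As a rough illustration of such sharing, the following plain \LUA\ sketch replaces
+identical flat subtables by one shared instance. It is not the packer that \MKIV\
+uses: the names \type {fingerprint} and \type {share} as well as the sample data
+are made up for this example, and only flat subtables are handled.
+
+\starttyping
+-- Hypothetical helper: a sortable string that identifies the content of a
+-- flat table (keys or values containing ";" or "=" could still collide).
+local function fingerprint(t)
+    local parts = { }
+    for k, v in pairs(t) do
+        parts[#parts+1] = type(k) .. ":" .. tostring(k) .. "="
+                       .. type(v) .. ":" .. tostring(v)
+    end
+    table.sort(parts)
+    return table.concat(parts, ";")
+end
+
+-- Hypothetical helper: lets glyphs with identical subtables under 'field'
+-- point to one and the same table; returns the number of replaced tables.
+local function share(glyphs, field)
+    local seen, saved = { }, 0
+    for _, glyph in pairs(glyphs) do
+        local sub = glyph[field]
+        if type(sub) == "table" then
+            local hash  = fingerprint(sub)
+            local first = seen[hash]
+            if not first then
+                seen[hash] = sub
+            elseif first ~= sub then
+                glyph[field] = first
+                saved = saved + 1
+            end
+        end
+    end
+    return saved
+end
+
+local glyphs = {
+    { name = "a", boundingbox = { 0, -10, 500, 450 } },
+    { name = "b", boundingbox = { 0, -10, 500, 450 } },
+    { name = "c", boundingbox = { 0,   0, 300, 700 } },
+}
+
+local saved = share(glyphs, "boundingbox")
+print(saved, glyphs[1].boundingbox == glyphs[2].boundingbox)
+\stoptyping
+
+The shared tables must be treated as read only afterwards, because a change made
+via one glyph would show up in all glyphs that share the table.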
+
+The difference in usage is as follows:
+
+\starttyping
+do
+    local f = fontloader.open("somefont.otf") -- allocates font object
+    local t = fontloader.to_table(f)          -- allocates table
+    fontloader.close(f)                       -- frees font object
+    for index, glyph in pairs(t.glyphs) do    -- glyphs live in a subtable
+        local width = glyph.width             -- accesses table value
+    end
+end -- frees table
+\stoptyping
+
+Here \type {t} is a complete \LUA\ table and it can get quite large: script fonts
+like Zapfino (for latin) or Husayni (for arabic) have lots of alternate shapes
+and much feature related information, fonts meant for \CJK\ usage have tens of
+thousands of glyphs, and math fonts like Cambria have many glyphs and math
+specific information.
+
+\starttyping
+do
+    local f = fontloader.open("somefont.otf") -- allocates font object
+    for index=0, f.glyphmax-1 do
+        local glyph = f.glyphs[index]         -- assigns user data object
+        if glyph then
+            local width = glyph.width         -- calls virtual table value
+        end
+    end
+    fontloader.close(f)                       -- frees font object
+end
+\stoptyping
+
+In this case there is no big table, and \type {glyph} is a so called userdata
+object. Its entries are created when asked for. So, where in the first example
+the \type {width} of a glyph is a number, in the second case it is a function
+disguised as virtual key that will return a number. In the first case you can
+change the width, in the second case you can't.
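+
+The difference can be mimicked in plain \LUA. In the sketch below a proxy table
+with an \type {__index} metamethod stands in for the userdata object (the real
+one is implemented in C). The name \type {virtualglyph} and the call counter are
+hypothetical, and raising an error on assignment is only an assumption about how
+a read only object could behave.
+
+\starttyping
+local calls = 0
+
+-- Hypothetical stand-in for a userdata glyph: values are fetched on demand.
+local function virtualglyph(raw)
+    return setmetatable({ }, {
+        __index = function(_, key)
+            calls = calls + 1 -- every access is a function call
+            return raw[key]
+        end,
+        __newindex = function(_, key)
+            error("glyph field '" .. tostring(key) .. "' is read only")
+        end,
+    })
+end
+
+local real    = { name = "a", width = 500 } -- like an entry from to_table
+local virtual = virtualglyph { name = "a", width = 500 }
+
+real.width = 510                            -- a real table can be changed
+local ok    = pcall(function() virtual.width = 510 end)
+local width = virtual.width                 -- goes through __index
+
+print(real.width, width, ok, calls)
+\stoptyping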
+
+This means that if you want to keep the data around you need to copy it into
+another table but you can do that stepwise and selectively. Alternatively you can
+keep the font object in memory. As some glyphs can have much data you can imagine
+that when you only need to access the width, the userdata method is more
+efficient. On the other hand, if you need access to all information, the first
+method is more interesting as less overhead is involved.
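+
+A minimal sketch of such selective copying is shown below. The function name
+\type {collectwidths} is made up for this example. It only assumes an object
+that offers \type {glyphmax} and \type {glyphs}, so the font object returned by
+\type {fontloader.open} qualifies. Here a small mock table is used instead so
+that the code also runs outside \LUATEX.
+
+\starttyping
+-- Copies only the widths, so the font object can be closed afterwards.
+local function collectwidths(font)
+    local widths = { }
+    for index=0, font.glyphmax-1 do
+        local glyph = font.glyphs[index]
+        if glyph then
+            widths[index] = glyph.width
+        end
+    end
+    return widths
+end
+
+-- Mock data standing in for a font object (slot 1 is deliberately empty).
+local mock = {
+    glyphmax = 3,
+    glyphs   = { [0] = { width = 250 }, [2] = { width = 600 } },
+}
+
+local widths = collectwidths(mock)
+print(widths[0], widths[1], widths[2])
+\stoptyping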
+
+In the userdata variant only the parent table and its glyph subtable are
+virtualized, as are entries in an optional subfonts table. So, if you ask for the
+kerns table of a glyph you will get a real table as it makes no sense to
+virtualize it. A way in between would have been to request tables per glyph but as
+we will see there is no real benefit in that while it would further complicate
+the code.
+
+When in \LUATEX\ 0.63 the loaded font object became partially virtual it was time
+to revise the loading code to see if we could benefit from this.
+
+In the following tables we distinguish three cases: the original but adapted
+loading code \footnote {For practical reasons we share as much code as possible
+between the methods so some reorganization was needed.}, already a few years old,
+the new sparse loading code, using the userdata approach and no longer a raw
+table, and a mixed approach where we still use the raw table but instead of
+manipulating that one, construct a new one from it. It must be noted that in
+the process of integrating the new method the traditional method suffered.
+
+First we tested Oriental \TEX's Husayni font. This one has lots of features, many
+lookups, and quite some glyphs. Keep in mind that the times concern the
+preparation and not the reload from the cache, which is more or less negligible.
+The memory consumption is a snapshot of the current run just after the font has
+been loaded. Peak memory is what bothers most users. Later we will explain what
+the values between parentheses refer to.
+
+\starttabulate[|l|c|c|c|]
+\FL
+\NC \NC \bf used memory \NC \bf peak memory \NC \bf font loading time \NC \NR
+\TL
+\NC \bf table \NC 113 MB (102) \NC 118 MB (117) \NC 1.8 sec (1.9) \NC \NR
+\NC \bf mixed \NC 114 MB (103) \NC 119 MB (117) \NC 1.9 sec (1.9) \NC \NR
+\NC \bf sparse \NC 117 MB (104) \NC 121 MB (120) \NC 1.9 sec (2.0) \NC \NR
+\NC \bf cached \NC ~75 MB \NC ~80 MB \NC 0.4 sec \NC \NR
+\NC \bf baseline \NC ~67 MB \NC ~71 MB \NC 0.3 sec \NC \NR
+\LL
+\stoptabulate
+
+So, here the new method is not offering any advantages. As this is a font we use
+quite a lot during development, any loading variant will do the job with similar
+efficiency.
+
+Next comes Cambria, a font that carries lots of glyphs and has extensive support
+for math. In order to provide a complete bodyfont setup some six instances are
+loaded. Interesting is that the original module needs 3.9 seconds instead of 6.4,
+which is probably due to a different ordering of code that might influence the
+garbage collector: it looks like in the reorganized code the garbage collector
+kicks in a few times during the font loading. Already long ago we found out that
+this is also somewhat platform dependent.
+
+\starttabulate[|l|c|c|c|]
+\FL
+\NC \NC \bf used memory \NC \bf peak memory \NC \bf font loading time \NC \NR
+\TL
+\NC \bf table \NC 155 MB (126) \NC 210 MB (160) \NC 6.4 sec (6.8) \NC \NR
+\NC \bf mixed \NC 154 MB (130) \NC 210 MB (160) \NC 6.3 sec (6.7) \NC \NR
+\NC \bf sparse \NC 140 MB (123) \NC 199 MB (144) \NC 6.4 sec (6.8) \NC \NR
+\NC \bf cached \NC ~90 MB \NC ~94 MB \NC 0.6 sec \NC \NR
+\NC \bf baseline \NC ~67 MB \NC ~71 MB \NC 0.3 sec \NC \NR
+\LL
+\stoptabulate
+
+Here the sparse method reports less memory usage. There is no other gain as there
+is a lot of access to glyph data due to the fact that this font is rather
+advanced. More virtualization would probably work against us here.
+
+Being a \CJK\ font, the somewhat feature|-|dumb but large AdobeSongStd-Light has
+lots of glyphs. In previous tables we already saw values between parentheses:
+these are values measured with implicit calls to the garbage collector before
+writing the font to the cache. For this font much more memory is used. Garbage
+collection has a positive impact on memory consumption but drastic consequences
+for runtime. Eventually it's the cached timing that matters and that is a
+constant factor but even then it can disturb users if a first run after an update
+takes so much time.
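+
+How such an extra sweep influences the reported memory can be tried in plain
+\LUA\ with the standard \type {collectgarbage} function. The helper \type
+{measure} and the workload are hypothetical and only meant as a sketch of the
+idea. The actual numbers depend on the \LUA\ version and the platform, so none
+are given here.
+
+\starttyping
+-- Reports memory growth (in KB) of an action before and after a full sweep.
+local function measure(action)
+    collectgarbage("collect")
+    local before = collectgarbage("count")
+    local result = action()
+    local used   = collectgarbage("count") - before -- includes garbage
+    collectgarbage("collect")                       -- the extra sweep
+    local kept   = collectgarbage("count") - before -- what is really in use
+    return used, kept, result
+end
+
+local used, kept = measure(function()
+    local keep = { }
+    for i=1,10000 do
+        local scratch = { i, i + 1, i + 2 } -- becomes garbage right away
+        keep[i] = scratch[1]
+    end
+    return keep
+end)
+
+print(string.format("before sweep: %.0f KB, after sweep: %.0f KB", used, kept))
+\stoptyping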
+
+\starttabulate[|l|c|c|c|]
+\FL
+\NC \NC \bf used memory \NC \bf peak memory \NC \bf font loading time \NC \NR
+\TL
+\NC \bf table \NC 180 MB (125) \NC 185 MB (172) \NC 4.4 sec (4.5) \NC \NR
+\NC \bf mixed \NC 190 MB (144) \NC 194 MB (181) \NC 4.4 sec (4.7) \NC \NR
+\NC \bf sparse \NC 153 MB (119) \NC 232 MB (232) \NC 8.7 sec (8.9) \NC \NR
+\NC \bf cached \NC ~96 MB \NC 100 MB \NC 0.7 sec \NC \NR
+\NC \bf baseline \NC ~67 MB \NC ~71 MB \NC 0.3 sec \NC \NR
+\LL
+\stoptabulate
+
+Peak memory is quite high for the sparse method which is due to the fact that we
+have only glyphs (but many) so we have lots of access and small tables being
+created and collected. I suspect that in a regular run the loading time is much
+lower for the sparse case because this is just too much of a difference.
+
+The last test loaded 40 variants of Latin Modern. Each font has a reasonable
+number of glyphs (covering the latin script takes some 400--600 glyphs), the normal
+amount of kerning, but hardly any features. Reloading these 40 fonts takes about
+a second.
+
+\starttabulate[|l|c|c|c|]
+\FL
+\NC \NC \bf used memory \NC \bf peak memory \NC \bf font loading time \NC \NR
+\TL
+\NC \bf table \NC 204 MB (175) \NC 213 MB (181) \NC 13.1 sec (16.4) \NC \NR
+\NC \bf mixed \NC 195 MB (168) \NC 205 MB (174) \NC 13.4 sec (16.5) \NC \NR
+\NC \bf sparse \NC 198 MB (165) \NC 202 MB (170) \NC 13.4 sec (16.6) \NC \NR
+\NC \bf cached \NC 147 MB \NC 151 MB \NC ~1.7 sec \NC \NR
+\NC \bf baseline \NC ~67 MB \NC ~71 MB \NC ~0.3 sec \NC \NR
+\LL
+\stoptabulate
+
+The old method wins in runtime and this makes it hard to decide which strategy to
+follow. Again the numbers between parentheses show what happens when we do an
+extra garbage collection sweep after packaging the font instance. A few more
+sweeps in other spots will bring down memory a few megabytes but at the cost of
+quite some runtime. The original module that uses the table approach is 3~seconds
+faster than the current one. As the code is essentially the same but organized
+differently again we suspect the garbage collector to be the culprit.
+
+So when we came this far, Taco and I did some further tests and on his machine
+Taco ran a profiler on some of the tests. He posted the following conclusion to
+the \LUATEX\ mailing list:
+
+\startnarrower
+It seems that the userdata access is useful if {\em but only if} you are very low
+on memory. In other cases, it just adds extra objects to be garbage collected,
+which makes the collector slower. That is on top of extra time spent on the
+actual calls, and even worse: those extra gc objects tend to be scattered around
+in memory, resulting in extra minor page faults (cpu cache misses) and all that
+has a noticeable effect on run speed: the metatable based access is 20--30\%
+slower than the old massive \type {to_table}.
+
+Therefore, there seems little point in expanding the metadata functionality any
+further. What is there will stay, but adding more metadata objects appears to be
+a waste of time on all sides.
+\stopnarrower
+
+This leaves us with a question: should we replace the old module by the
+experimental one? It makes sense to do this as in practice users will not be
+harmed much. Fonts are cached and loading a cached font is not influenced. The
+new module leaves the choice to the user. He or she can decide to limit memory
+usage (for cache building) by using directives:
+
+\starttyping
+\enabledirectives[fonts.otf.loader.method=table]
+\enabledirectives[fonts.otf.loader.method=mixed]
+\enabledirectives[fonts.otf.loader.method=sparse]
+
+\enabledirectives[fonts.otf.loader.cleanup]
+\enabledirectives[fonts.otf.loader.cleanup=1]
+\enabledirectives[fonts.otf.loader.cleanup=2]
+\enabledirectives[fonts.otf.loader.cleanup=3]
+\stoptyping
+
+The cleanup has three levels and each level adds a garbage collection sweep (in a
+different spot). Of course three sweeps per font that is prepared for caching have
+quite some impact on performance. If your computer has enough memory it makes no
+sense to use any of these directives. For the record: these directives are not
+available in the generic (plain \TEX) variant, at least not in the short term. As
+Taco mentions, cache misses can have drastic consequences and we ran into that
+years ago already when support for \OPENTYPE\ math was added to \LUATEX: all of a
+sudden and without any apparent reason passing a font table to \TEX\ became twice
+as slow on my machine. This is comparable with the new, reorganized table loader
+being slower than the old one. Eventually I'll get back that time, which is unlikely
+to happen with the userdata variant where there is no way to bring down the number
+of function calls and intermediate table creation.
+
+The previously shown values concern all fonts and include creating, caching,
+reloading, creating a scaled instance and passing the data to \TEX. In that
+process quite some garbage collection can happen and that obscures the real
+values. However, in \MKIV\ we report the conversion time when a font gets cached
+so that the user at least sees something happening. These timings are on a per
+font basis. Watch the following values (in seconds):
+
+\starttabulate[|l|l|l|]
+\FL
+\NC \NC \bf table \NC \bf sparse \NC \NR
+\TL
+\NC \bf song \NC 3.2 \NC 3.6 \NC \NR
+\NC \bf cambria \NC 4.9 (0.9 1.0 0.9 1.1 0.5 0.5) \NC 5.6 (1.1 1.1 1.0 1.2 0.6 0.6) \NC \NR
+\NC \bf husayni \NC 1.2 \NC 1.3 \NC \NR
+\LL
+\stoptabulate
+
+In the case of Cambria several fonts are loaded including subfonts from
+\TRUETYPE\ containers. This shows that the table variant is definitely faster. It
+might be that later this is compensated by additional garbage collection but that
+would even worsen the sparse case were more extensive userdata be used. These
+values better reflect what Taco measured in the profiler. Improvements to the
+garbage collector are more likely to happen than a drastic speed up in function
+calls so the table variant is still a safe bet.
+
+There are a few places where the renewed code can be optimized so these numbers
+are not definitive. Also, the loader code was not the only code adapted. As we
+cannot manipulate the main table in the userdata variant, the code related to
+patches and extra features like \type {tlig}, \type {trep} and \type {anum} had
+to be rewritten as well: more code and a bit closer to the final table
+format.
+
+\starttabulate[|l|c|c|]
+\FL
+\NC \NC \bf table \NC \bf sparse \NC \NR
+\TL
+\NC \bf hybrid \NC 310 MB / 10.3 sec \NC 285 MB / 10.5 sec \NC \NR
+\NC \bf mk \NC 884 MB / 47.5 sec \NC 878 MB / 48.7 sec \NC \NR
+\LL
+\stoptabulate
+
+The timings in the previous table concern runs of a few documents where the \type
+{mk} loads quite some large and complex fonts. The runs are timed with an empty
+cache so all fonts are preprocessed. The memory consumption is the peak load as
+reported by the task manager and we need to keep in mind that \LUA\ allocates
+more than it needs. Keep in mind that these values are so high because fonts are
+created. A regular run takes less memory. Interesting is that for \type {mk} the
+original implementation performs better but the difference is about a second
+which again indicates that the garbage collector is a major factor. Timing only
+the total runtime gives:
+
+\starttabulate[|l|c|c|c|c|]
+\FL
+\NC \NC \bf cached \NC \bf original \NC \bf table \NC \bf sparse \NC \NR
+\TL
+\NC \bf mk \NC 38.1 sec \NC 75.5 sec \NC 77.2 sec \NC 80.8 sec \NC \NR
+\LL
+\stoptabulate
+
+Here we used the system timer while in previous tables we used the values as
+reported by the timers built in \MKIV\ (and only reported the font loading
+times).
+
+The timings above are taken on my laptop running Windows 7 and this is not that
+good a platform for precise timings. Taco's measurements were done with
+specialized tools and should be trusted more. It indeed looks like the current
+level of userdata support is about the best compromise one can get.
+
+{\em In the process I also experimented with virtualizing the final \TFM\ table,
+thereby simulating the upcoming virtualization of that table in \LUATEX.
+Interesting is that for instance for \type {mk.pdf} memory consumption went
+down by 20\% but that document is non|-|typical and loads many fonts,
+including virtual punk fonts. However, as access to that table happens
+infrequently virtualization makes much sense there, again only at the toplevel
+of the characters subtable.}
+
+\stopsection
+
+\startsection [title={Hyperlinks}]
+
+At \PRAGMA\ we have a long tradition of creating highly interactive documents. I
+still remember the days that processing a 20,000 page document with numerous
+menus and buttons on each page took a while to get finished, especially if each
+page has a \METAPOST\ graphic as well.
+
+On a regular computer a document with so many links is no real problem. After
+all, the \PDF\ format is designed in such a way that only the partial content has
+to be loaded. However, half a million hyperlinks do demand some memory.
+
+Recently I had to make a document that targets one of these tablets and it is
+no secret that tablets (and e-readers) don't have that much memory. As in
+\CONTEXT\ \MKIV\ we have a bit more control over the backend, it will be no
+surprise that we are able to deal with such issues more comfortably than in
+\MKII.
+
+That specific document (part of a series) contained 1100 pages and each page has
+a navigation menu as well as an alphabetic index into the register. There is a
+table of contents referring to about 200 chapters and these are backlinked to the
+table of contents. There are also some 200 images and tables that end up
+elsewhere and again are crosslinked. Of course there is the usual bunch of inline
+hyperlinks. So, in total this document has some 32,000 hyperlinks. The input is a
+3.03 MB \XML\ file.
+
+\starttabulate[|l|c|c|]
+\FL
+\NC \NC \bf size \NC \bf one run \NC \NR
+\TL
+\NC \bf don't optimize \NC 5.76 MB \NC 59.4 sec \NC \NR
+\NC \bf prefer page references over named ones \NC 5.66 MB \NC 56.2 sec \NC \NR
+\NC \bf aggressively share similar references \NC 5.19 MB \NC 60.2 sec \NC \NR
+\NC \bf optimize page as well as similar references \NC 5.11 MB \NC 56.5 sec \NC \NR
+\NC \bf disable all interactive features \NC 4.19 MB \NC 42.7 sec \NC \NR
+\LL
+\stoptabulate
+
+So, by aggressively sharing hyperlinks and turning all internal named
+destinations into page destinations we bring down the size noticeably and even
+have a faster run. It is for this reason that aggressive sharing is enabled by
+default. If you don't want it, you can disable it with:
+
+\starttyping
+\disabledirectives[refences.sharelinks]
+\stoptyping
+
+Currently we use names for internal (automatically generated) links. We can force
+page links for them but still use names for explicit references so that we can
+reach them from external documents; this is called mixed mode. When no references
+from outside are needed, you can force pagelinks. At some point mixed mode can
+become the default.
+
+\starttyping
+\enabledirectives[references.linkmethod=page]
+\stoptyping
+
+The possible values are \type {page}, \type {mixed} and \type {names}, with \type
+{yes} being equivalent to \type {page}. The \MKII\ way of setting this is still supported:
+
+\starttyping
+\setupinteraction[page=yes]
+\stoptyping
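+
+The link methods described above boil down to a simple decision per reference.
+The following \LUA\ sketch only illustrates that decision. The function \type
+{destinationkind} and its return values are hypothetical and not part of the
+\CONTEXT\ backend code.
+
+\starttyping
+-- method  : "page", "yes", "mixed" or "names"
+-- explicit: true for references that the user defined explicitly
+local function destinationkind(method, explicit)
+    if method == "page" or method == "yes" then
+        return "page" -- everything becomes a page destination
+    elseif method == "mixed" then
+        -- explicit references keep a name so that other documents can
+        -- reach them, generated ones become page destinations
+        return explicit and "name" or "page"
+    else
+        return "name" -- "names": everything is a named destination
+    end
+end
+
+print(destinationkind("mixed", true), destinationkind("mixed", false))
+\stoptyping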
+
+We could probably gain quite some more bytes by turning all repetitive elements
+into shared graphical objects but it only makes sense to spend time on that when
+a project really needs it (and pays for it). There is up to one megabyte of
+(compressed) data related to menus and other screen real estate that qualifies
+for this but it might not be worth the trouble.
+
+The reason for trying to minimize the amount of hyperlink related metadata (in
+\PDF\ terminology annotations) is that on tablets with not that much memory (and
+no virtual memory) we don't want to keep too much of that (redundant) data in
+memory. And indeed, the optimized document feels more responsive than the dirty
+version, but that could as well be related to the viewing applications.
+
+\stopsection
+
+\startsection[title=Constants]
+
+Not every optimization saves memory or runtime. Some are more a consequence of
+changes in circumstances. When \TEX\ had only 256 registers one had to find
+ways to get round this. For instance counters are quite handy and you could
+quickly run out of them. In \CONTEXT\ there are two ways to deal with this.
+Instead of a real count register you can use a macro:
+
+\starttyping
+\newcounter \somecounter
+\increment \somecounter
+\decrement (\somecounter,4)
+\stoptyping
+
+In \MKIV\ many such pseudo counters have been replaced by real ones which is
+somewhat faster in usage.
+
+Often one needs a constant and a convenient way to define such a frozen counter
+is:
+
+\starttyping
+\chardef \myconstant 10
+\ifnum \myvariable = \myconstant ....
+\ifcase \myconstant ...
+\stoptyping
+
+This is both efficient and fast and works out well because \TEX\ treats them as
+numbers in comparisons. However, it is somewhat clumsy, as constants have nothing
+to do with characters. This is why all such definitions have been replaced by:
+
+\starttyping
+\newconstant \myconstant 10
+\setconstant \myconstant 12
+\ifnum \myvariable = \myconstant ....
+\ifcase \myconstant ...
+\stoptyping
+
+We use count registers which means that when you set a constant, you can just
+assign the new value directly or use the \type {\setcounter} macro.
+
+We already had an alternative for conditionals:
+
+\starttyping
+\newconditional \mycondition
+\settrue \mycondition
+\setfalse \mycondition
+\ifconditional \mycondition
+\stoptyping
+
+These will also be adapted to counts but first we need a new primitive.
+
+The advantage of these changes is that at the \LUA\ end we can consult as well as
+change these values. This means that in the end much more code will be adapted.
+Especially changing the constants resulted in quite some cosmetic changes in the
+core code.
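+
+As a hedged sketch of what access from the \LUA\ end can look like: in \LUATEX\
+count registers can be reached by name via \type {tex.count}. The helper \type
+{bumpconstant} is made up for this example and it assumes that \type
+{\myconstant} has been defined as a count register. The fallback table only
+serves to let the code run in a plain \LUA\ interpreter, where no \type {tex}
+table exists.
+
+\starttyping
+-- Inside LuaTeX 'tex' is predefined; the fallback mimics one register.
+local tex = tex or { count = { myconstant = 10 } }
+
+-- Hypothetical helper: consults a named count register and changes it.
+local function bumpconstant(name, delta)
+    local old = tex.count[name]   -- consult the value
+    tex.count[name] = old + delta -- change the value
+    return old, tex.count[name]
+end
+
+local old, new = bumpconstant("myconstant", 2)
+print(old, new)
+\stoptyping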
+
+\stopsection
+
+\startsection[title=Definitions]
+
+Another recent optimization was possible when at the \LUA\ end setting lccodes
+cum suis and some math definitions became possible. As all these initializations
+take place at the \LUA\ end, till then we were just writing \TEX\ code back to
+\TEX, but now we stay at the \LUA\ end. This not only looks nicer, but also
+results in slightly less memory usage during format generation (a few percent).
+Making a format also takes a few tenths of a second less (again a few percent).
+The reason why less memory is needed is that instead of writing tens of thousands
+of \type {\lccode} related commands to \TEX\ we now set the value directly. As
+writes to \TEX\ are collected, quite an amount of tokens get cached.
+
+All such small improvements make \CONTEXT\ \MKIV\ run smoother with each
+advance of \LUATEX. We do have a wishlist for further improvements but so far we
+managed to improve stepwise instead of putting too much pressure on \LUATEX\
+development.
+
+\stopsection
+
+\stopchapter
+
+\stopcomponent