author | Context Git Mirror Bot <phg42.2a@gmail.com> | 2016-08-01 16:40:14 +0200 |
---|---|---|
committer | Context Git Mirror Bot <phg42.2a@gmail.com> | 2016-08-01 16:40:14 +0200 |
commit | 96f283b0d4f0259b7d7d1c64d1d078c519fc84a6 (patch) | |
tree | e9673071aa75f22fee32d701d05f1fdc443ce09c /doc/context/sources/general/manuals/mk/mk-fonts.tex | |
parent | c44a9d2f89620e439f335029689e7f0dff9516b7 (diff) | |
download | context-96f283b0d4f0259b7d7d1c64d1d078c519fc84a6.tar.gz |
2016-08-01 14:21:00
Diffstat (limited to 'doc/context/sources/general/manuals/mk/mk-fonts.tex')
-rw-r--r-- | doc/context/sources/general/manuals/mk/mk-fonts.tex | 841 |
1 files changed, 841 insertions, 0 deletions
diff --git a/doc/context/sources/general/manuals/mk/mk-fonts.tex b/doc/context/sources/general/manuals/mk/mk-fonts.tex new file mode 100644 index 000000000..b5e945923 --- /dev/null +++ b/doc/context/sources/general/manuals/mk/mk-fonts.tex @@ -0,0 +1,841 @@ +% language=uk + +\usemodule[virtual] + +\startcomponent mk-fonts + +\environment mk-environment + +\chapter{A fresh look at fonts} + +\subject{readers} + +Now that we have the file system, \LUA\ script integration, input +encoding and basic logging in place, we have arrived at fonts. +Although today \OPENTYPE\ fonts are the fashion, we still need to +deal with \TEX's native font machinery. Although Latin Modern and +the \TEX\ Gyre collection will bring us many free \OPENTYPE\ +fonts, we can be sure that for a long time \TYPEONE\ variants will +be used as well, and when one has lots of bought fonts, replacing +them with \OPENTYPE\ updates is not always an option. And so, +reimplementing the readers for \TEX\ Font Metrics (\type {tfm} +files) and Virtual Fonts (\type {vf} files), was the first step. + +Because \ALEPH\ font handling was integrated already, Taco decided +to combine the \TFM\ and \OFM\ readers into a new one. The +combined loader is written in C and produces tables that are +accessible from within \LUA. A problem is that once a font is +used, one cannot simply change its metrics. So, we have to make +sure that we apply changes before a font is actually used: + +\starttyping +\font\test=texnansi-lmr at 31.415 pt +\test Yet another nice Kate Bush song: Pi +\stoptyping + +In this example, any change to the fontmetrics has to be done before +\type {test} is invoked. For this purpose the \type {define_font} +callback is provided. 
Below you see an experimental overload: + +\starttyping +callback.register("define_font", function (name,area,size) + return fonts.patches.process(font.read_tfm(name,size)) +end ) +\stoptyping + +The \type {fonts.patches.process} function (currently in \CONTEXT\ +\MKIV) implements a mechanism for tweaking the font parameters in +between. In order to get an idea of further features we played a +bit with ligature replacement, character spacing, kern tweaking +etc. Think of such a function (or a chain of functions) doing +things similar to: + +\starttyping +callback.register("define_font", function (name,area,size) + local tfmblob = font.read_tfm(name,size) -- built-in loader + tfmblob.characters[string.byte("f")].ligatures = nil + return tfmblob -- data structure that TeX will use internally +end ) +\stoptyping + +Of course the above definition is not complete, if only because we +need to handle chained ligatures as well (fl followed by i). + +In practice we prefer a more abstract interface (at the macro +level) but the idea stays the same. Interesting is that having +access to the internals this way already makes our \TEX\ Live more +interesting. (We cannot demonstrate this trickery here because +when this document is processed you cannot be sure if the +experimental interface is still in place.) + +When playing with this we ran into problems with file searching. +When performing the backend role, \LUATEX\ will look in the \TEX\ +tree to see if there is a corresponding virtual file. It took a while and +a bit of tracing (which is not that hard in the \LUA\ based +reader) to figure out that the omega related path definitions in +\type {texmf.cnf} files were not correct, something that went +unnoticed because omega never had a backend integrated and the +\DVI\ processors did multiple searches to get around this.
+ +Currently, if you want to enable extensive tracing of file +searching and loading, you can set an environment variable: + +\starttyping +MTX.INPUT.TRACE=3 +\stoptyping + +This will produce a lot of information about what file is asked +for, what type (tex, font, etc.) determines the search, which +paths are searched, what readers and locators are used (file, +zip, protocol), etc. + +\subject{AFM} + +While Taco implemented the virtual font reader |<|eventually its +data will be merged with the \TFM\ table|>| I started playing with +constructing \TFM\ tables directly. Because \CONTEXT\ has a rather +systematic naming scheme, we can rather easily see which encoding +we are dealing with. This means that in principle we can throw all +encoded \TFM\ files out of our tree and construct the tables using +the \AFM\ file and an encoding vector. + +It took us a good day to figure out the details, but in the end we +were able to trick \LUATEX\ into using \AFM\ files. With a bit of +internal caching it was even reasonably fast. When the basic +conversion mechanism was written we tried to compare the results +with existing \TFM\ metrics as generated by \type {afm2tfm} and +\type {afm2pl}. Doing so was less trivial than we first thought. +To mention a few aspects: + +\startitemize[packed] +\item heights and depths have a limited number of values in \TEX +\item we need to convert to \TEX's scaled points +\item rounding errors of one scaled point occur +\item \type {afm2tfm} can only add kerns when virtual fonts are used +\item \type {afm2tfm} adds some extra ligatures and also does some + kern magic +\item \type {afm2pl} adds even more kerns +\item the tools remove kern pairs between digits +\stopitemize + +In this perspective we need not be too picky on what exactly a +ligature is. An example of a ligature is \type {fi} and such a +character can be in the font.
In the \TFM\ file, the definition of +\type {f} contains information about what to do when it's followed +by an \type {i}: it has to insert a reference (character number) +pointing to the fi glyph. + +However, because \TEX\ was written in \ASCII\ time space, there +was a problem of how to get access to for instance the Spanish +quotation and exclamation marks. Here the ligature mechanism +available in the \TFM\ format was misused in the sense that a +combination of \type {exclam} and \type {quoteleft} becomes \type +{exclamdown}. In a similar fashion two single quotes become a +double quote. And every \TEX ie knows that multiple hyphens +combine into -- (endash) and --- (emdash), where the latter one is +achieved by defining a ligature between an endash and a hyphen. + +Of course we have to deal with conversions from \AFM\ units (1000 +per em) to \TEX's scaled points. Such conversions may be sensitive +to rounding errors. Because we noticed differences of one scaled +point, I tried several strategies to get the results consistent +but so far I didn't manage to find out where these differences +come from. Rounding errors seem to be rather random and I have no +clue what strategy the regular converters follow. Another fuzzy +area is the font parameters (visible as font dimensions for +users): I wonder how many users really know what values are used +and why. + +You may wonder to what extent this rounding problem will influence +consistent typesetting. We have no reason to assume that the +rounding error is operating system dependent. This leaves the +different methods used and personally I have no problems with the +direct reader being not 100\% compatible with the regular tools. +First of all it's an illusion to think that \TEX\ distributions +are stable over the years. Fonts and conversion tools are being +updated every now and then, and metrics change over time (apart +from Computer Modern which is stable by definition).
Also, pattern +files are updated, so paragraphs may be broken into lines differently +anyway. If you really want stability, then you need to store the +fonts and patterns with your document. + +As we already mentioned, the regular converter programs add kerns +as well. Treating common glyph shapes similarly is not uncommon in +\CONTEXT\ so I decided to provide methods for adding \quote +{missing} kerns. For example, with regard to kerning, we can +treat \type {eacute} the same way as an~\type {e}. Some ligatures, +like \type {ae} or \type {fi}, need to be seen from two sides: +when looked at from the left side they resemble an \type {a} and +\type {f}, but when kerned at their right, they are to be treated +as \type {e} and \type {i}. + +So, when all this is taken care of, we will have a reasonably +robust and compatible way to deal with \AFM\ files and when this +variant is enabled, we can prune our \TEX\ trees pretty well. +Also, now that we have font related tables, we can start moving +tables built out of \TEX\ macros (think of protruding and hz) to +\LUA, which will not only save us many hash entries but also +permit faster implementations. + +The question may arise why there is no hard coded \AFM\ reader. +Although some speed up can be achieved by reading the table with +\AFM\ data directly, there would still be the issue of making that +table accessible for manipulations as described (costs time too). +The \AFM\ format is human readable contrary to the \TFM\ format +and therefore such files can conveniently be processed by \LUA. Also, +the possible manipulations may differ per macro package, user, and +even documents. The chances of users and developers reaching an +agreement about such issues are near zero. By writing the reader in +\LUA, a macro package writer can also implement caching mechanisms +that suit the package. Also, keep in mind that we often only need +to load about four \AFM\ files or a few more when we mix fonts.
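To make the idea of adding \quote {missing} kerns a bit more concrete, here is a minimal sketch in plain \LUA. It is not the \MKIV\ code: the table layout (glyph names as keys, a \type {kerns} subtable per glyph), the two look-alike tables and the function name \type {add_missing_kerns} are assumptions made for this illustration only, and the kern values are made up.

```lua
-- Sketch only: hypothetical data layout, not the actual MkIV font tables.
-- characters[first].kerns[second] = kern amount (made-up AFM-style units)
local characters = {
    e      = { kerns = { v = -20 } },
    f      = { kerns = { a = -10 } },
    i      = { kerns = { v = 5 } },
    V      = { kerns = { e = -60, f = -30 } },
    eacute = { },   -- no kerns of its own
    fi     = { },   -- ligature glyph, no kerns of its own
}

-- Hypothetical look-alike tables: what a glyph resembles from each side.
local left_like  = { eacute = "e", fi = "f" }
local right_like = { eacute = "e", fi = "i" }

local function add_missing_kerns(chars, left_like, right_like)
    -- As FIRST glyph of a pair the right side matters: inherit the kerns
    -- of the right-hand look-alike, without overwriting existing ones.
    for name, base in pairs(right_like) do
        local target, source = chars[name], chars[base]
        if target and source and source.kerns then
            target.kerns = target.kerns or { }
            for second, amount in pairs(source.kerns) do
                if target.kerns[second] == nil then
                    target.kerns[second] = amount
                end
            end
        end
    end
    -- As SECOND glyph of a pair the left side matters: wherever a glyph
    -- kerns with the left-hand look-alike, add the same kern for us.
    for _, first in pairs(chars) do
        local kerns = first.kerns
        if kerns then
            for name, base in pairs(left_like) do
                if chars[name] and kerns[base] ~= nil and kerns[name] == nil then
                    kerns[name] = kerns[base]
                end
            end
        end
    end
end

add_missing_kerns(characters, left_like, right_like)
```

After the call, \type {eacute} kerns like \type {e} on both sides, while \type {fi} behaves like \type {i} when it comes first in a pair and like \type {f} when it comes second. Existing kerns are never overwritten, so whatever the font itself provides wins.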
+ +In my main tree (regular distributions) there are some 350 files +in \type {texnansi} encoding that take over 2~MByte. My personal +font tree has over a thousand such entries which means that we can +prune the tree considerably when we use the \AFM\ loader. Why +bother about \TFM\ when \AFM\ can do the job? + +In order to reduce the overhead in reading the \AFM\ file, we now +use external caching, which (in \CONTEXT\ \MKIV) boils down to +serializing the internal \AFM\ tables and compiling them to +bytecode. As a result, the runtime becomes comparable to a run +using regular \TFM\ files. For this document, using the \AFM\ reader +(cached) takes some 0.3 seconds more on a total of 8 seconds (28 pages +in Optima Nova with a couple of graphics). + +While we were playing with this, Hermann Zapf surprised me by +sending me a \CD\ with his marvelous new Palatino Sans. So, +instead of generating \TFM\ metrics, I decided to use \type +{ttf2afm} to generate me an \AFM\ file from the \TRUETYPE\ files +and use these metrics. It worked right out of the box which means +that one can copy a set of font files directly from the source to +the tree. In a demo document the Palatino Sans came out quite well +and so we will use this font to explore the upcoming Open Type +features. + +Because we now have fewer font resources (only two files per font) +we decided to get away from the spread||all||over||the||tree +paradigm. For this we introduced + +\starttyping +../fonts/data/vendor/collection +\stoptyping + +like: + +\starttyping +../fonts/data/tex/latin-modern +../fonts/data/tex-gyre/bonum +../fonts/data/linotype/optima-nova +../fonts/data/linotype/palatino-nova +../fonts/data/linotype/palatino-sans +\stoptyping + +Of course one needs to adapt the related font paths in the +configuration files but getting that done in \TEX\ distributions is +another story. + +\subject{map files} + +Reading an \AFM\ file is only part of the game.
Because we bypass +the regular \TFM\ reader we may internally end up with different +names of fonts (and|/|or files). This also means that the map +files that map an internal name onto a font (outline) file may be +of no use. The map file also specifies the encoding file which +maps character numbers onto names used in font files. + +The map file maps a font name to a (preferably outline) font +resource file. This can be a file with suffix \type {pfb}, \type +{ttf}, \type {otf} or alike. When we convert an \AFM\ file into a +more suitable format, we also store the associated (outline) +filename, that we use later when we assemble the map line data (we +use \type {\pdfmapline} to tell \LUATEX\ how to prepare and embed +a file). + +Eventually \LUATEX\ will take care of all these issues itself +thereby rendering map files and encoding files kind of useless. +When loading an \AFM\ file we already have to read encoding files, +so we have all the information available that normally goes into +the map file. While conducting experiments with reading \AFM\ +files, we therefore could use the \type {\pdfmapline} primitive to +push the right entries into the font inclusion machinery. Because +\CONTEXT\ already handles map data itself we could easily hook +this into the normal handlers for that. (There are some nasty +synchronization issues involved in handling map entries in general +but we will not bother you with that now.) + +Although eventually we may get rid of map files, we also used the +general map file handling in \CONTEXT\ as a playground for the +\XML\ handler that we wrote in \LUA. Playing with many map files +(a few KBytes) coded in \XML\ format, or with one big map file +(easily 800 MBytes) makes a good test case for loading and dumping. + +But why bother too much about map files in \LUATEX\ \unknown\ they +will go away anyway.
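As an illustration of assembling map line data from what is stored with a converted \AFM\ file, the following plain \LUA\ sketch builds a line in the common \type {name psname <encoding <fontfile} form used in \PDFTEX\ style map files. The record layout, its field names, the function name \type {mapline} and all file names are hypothetical; the actual \MKIV\ tables may well look different.

```lua
-- Sketch only: a hypothetical record as it might be kept in the AFM cache.
local cached = {
    name     = "demo-font",      -- internal (TeX side) font name, made up
    fullname = "Demo-Regular",   -- PostScript font name, made up
    encoding = "demo.enc",       -- encoding file, made up
    filename = "demo.pfb",       -- associated outline file, made up
}

-- Build one map line: "<name> <psname> [<encoding] [<fontfile]".
local function mapline(f)
    local parts = { f.name, f.fullname }
    if f.encoding then
        parts[#parts + 1] = "<" .. f.encoding
    end
    if f.filename then
        parts[#parts + 1] = "<" .. f.filename
    end
    return table.concat(parts, " ")
end

print(mapline(cached))
```

On the \TEX\ side such a string would be handed to \type {\pdfmapline}; how and when that happens (and the synchronization issues mentioned above) is deliberately left out of this sketch.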
+ +\subject{OTF \& TTF} + +One of the reasons for starting the \LUATEX\ development was that we wanted to +be able to use \OPENTYPE\ (and \TRUETYPE) fonts in \PDFTEX. As a prelude (and kind of +transition) we first dealt with \TYPEONE\ using either \TFM\ or \AFM. For \TEX\ it does +not really matter what font is used, it only deals with dimensions and generic +characteristics. Of course, when fonts offer more advanced possibilities, we may +need more features in the \TEX\ kernel, but think of \HZ\ or protruding as provided +by \PDFTEX: it's not part of the font (specification) but of the engine. The same +is actually true for kerning and ligature building, although here the font (data) may +provide the information needed to deal with it properly. + +\OPENTYPE\ fonts come with features. Examples of features are using oldstyle figures or +tabular digits instead of the default ones. Dealing with such issues boils down to +replacing one character representation by another or treating combinations of characters +in the input differently depending on the circumstances. There can be relationships +between languages and scripts, but, as \TEX ies know, other relationships exist as well, +for instance between content and visualization. + +Therefore, it will be no surprise that \LUATEX\ does not simply implement the \OPENTYPE\ +specification as such. On the one hand it implements a way to load information stored +in the font, on the other hand it implements mechanisms to fulfil the demands of such +fonts and more. The glue between both is done with \LUA. In the simple case of ligatures +and kerns this goes as follows. A user (or macro package) specifies a font, and this +call can be intercepted using a callback. This callback can use a built in function that +loads an \OTF\ or \TTF\ font. From this table, a font table is constructed that is passed +on to \TEX.
The construction +may involve building ligature and kerning tables using the information present +in the font file, but it may as well mean more. So, given a bare \LUATEX\ system, +\OPENTYPE\ font support does not automatically give you handling of features, or more +precisely, there is no hard coded support for features. + +This may sound like a disadvantage +but as soon as you start looking at how \TEX\ users use their system (in most cases +by using a macro package) you may understand that flexibility is larger this way. Instead +of adding more and more control and exceptions, and thereby making the kernel more +unstable and complex, we delegate control to the macro package. The advantage is that +there are no (everlasting) discussions on how to deal with things and in the end the +user will use a high level interface anyway. Of course the macro package needs proper +access to the font's internals, but this is provided: the code used for reading in the +data comes from FontForge (an advanced font editor) and is presented via \LUA\ tables +in a well organized way. + +Given that users expect \OPENTYPE\ features to be supported, how do we provide an +interface? In \CONTEXT\ the user interface has always been an important aspect and +consistency is a priority. On the other hand, there has been the tradition of specifying +the size explicitly and a new custom introduced by \XETEX\ to enhance the fontname +with directives. Traditional \TEX\ provides: + +\starttyping +\font \name filename [optional size] +\stoptyping + +\XETEX\ accepts + +\starttyping +\font \name "fontname[:optional features]" [optional size] +\font \name fontname[:optional features] [optional size] +\stoptyping + +Instead of a fontname one can pass a filename between square brackets. \LUATEX\ +handles: + +\starttyping +\font \name anything [optional size] +\font \name {anything} [optional size] +\stoptyping + +where anything as well as the size are passed on to the callback.
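What the callback does with \quote {anything} is up to the macro package. As a rough sketch in plain \LUA, the following function splits a specification into a name (or a bracketed filename), a list of features after a colon, or a method after an \type {@} (the latter form is used in the experimental examples in this chapter). The function name \type {parse_specification} and the shape of the returned table are assumptions for this illustration, not the \MKIV\ interface.

```lua
-- Sketch only: a hypothetical parser for font specifications such as
--   lmroman10-regular:dlig;liga    (name plus features)
--   [somefile.otf]:liga            (filename between square brackets)
--   myfont@weird                   (name plus a method after "@")
local function parse_specification(spec)
    local result = { features = { } }
    -- a leading "[...]" denotes a filename instead of a font name
    local file, tail = spec:match("^%[(.-)%](.*)$")
    if file then
        result.file = file
    else
        result.name, tail = spec:match("^([^:@]*)(.*)$")
    end
    -- what follows is either ":feature;feature" or "@method"
    local kind, detail = tail:match("^([:@])(.*)$")
    if kind == "@" then
        result.method = detail
    elseif kind == ":" then
        for feature in detail:gmatch("[^;]+") do
            result.features[#result.features + 1] = feature
        end
    end
    return result
end

local a = parse_specification("lmroman10-regular:dlig;liga")
local b = parse_specification("myfont@weird")
local c = parse_specification("[somefile.otf]:liga")
local d = parse_specification("cmr10")
```

The size is not part of this string: as discussed in this section it is resolved on the \TEX\ side and handed over separately.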
+ +This permits us to implement a traditional specification, support \XETEX\ like +definitions, and easily pass information from a macro package down to the +callback as well. Interpreting anything is done in \LUA. + +While implementing the \LUA\ side of the loader we took a similar approach +as the \AFM\ reader and cached intermediate tables as well as kept track +of font names (in addition to filenames). In order to be able to quickly +determine the (internal) font name of an \OPENTYPE\ font, special loader +functions are provided. + +The size is kind of special, because we can have specifications like + +\starttyping +at 10pt +at 3ex +at \dimexpr\bodyfontsize+1pt\relax +\stoptyping + +This means that we need to handle that on the \TEX\ side and pass the +calculated value to the callback. + +Virtual fonts have a rather special nature. They permit you to define variations +of fonts using other fonts and special (\DVI\ related) operators. However, from the +perspective of \TEX\ itself they don't exist at all. When you create a virtual font +you also end up with a \TFM\ file and \TEX\ only needs this file, which defines +characters in terms of a width, height, depth and italic correction as well as +associates characters with kerning pairs and ligatures. \TEX\ leaves it to the +backend to deal with the actual glyphs and therefore the backend will be confronted +by the internals of a virtual font. Because \PDFTEX\ and therefore \LUATEX\ has the +backend built in, it is capable of handling virtual font information. + +In \LUATEX\ you can build your own virtual font and this will suit us well. It +permits us for instance to complete fonts that lack certain characters (glyphs) and +thereby let us get rid of ugly macro based fallback trickery. Although in \CONTEXT\ +we will provide a high level interface, we will give you a taste of \LUA\ here.
+ +\starttyping +callback.register("define_font", function(name,size) + if name == "demo" then + local f = font.read_tfm('texnansi-lmr10',size) + if f then + local capscale, digscale = 0.85, 0.75 + f.name, f.type = name, 'virtual' + f.fonts = { + { name="texnansi-lmr10" , size=size }, + { name="texnansi-lmss10", size=size*capscale }, + { name="texnansi-lmtt10", size=size*digscale } + } + for k,v in pairs(f.characters) do + local chr = utf.char(k) + if chr:find("[A-Z]") then + v.width = capscale*v.width + v.commands = { + {"special","pdf: 1 0 0 rg"}, + {"font",2}, {"char",k}, + {"special","pdf: 0 g"} + } + elseif chr:find("[0-9]") then + v.width = digscale*v.width + v.commands = { + {"special","pdf: 0 0 1 rg"}, + {"font",3}, {"char",k}, + {"special","pdf: 0 g"} + } + else + v.commands = { + {"font",1}, {"char",k} + } + end + end + return f + end + end + return font.read_tfm(name,size) +end) +\stoptyping + +Here we define a virtual font that uses three real fonts and +which font is used depends on the kind of character we're +dealing with (in real-world situations we can best use the \MKIV\ function +that tells what class a character belongs to). The \type {commands} +table determines which glyphs come out in what way. We use a bit of +literal pdf code to color the special characters but generally color is +not handled at the font level. + +This example can be used like: + +\starttyping +\font\test=demo \test +Hi there, this is the first (number 1) example of playing with +Virtual Fonts, some neat feature of \TeX, once you have access +to it. For instance, we can misuse it to fill in gaps in fonts.
+\stoptyping + +During development of this mechanism, we decided to save some redundant +loading by permitting id's in the fonts array: + +\starttyping +callback.register("define_font", function(name,size) + if name == "demo" then + local f = font.read_tfm('texnansi-lmr10',size) + if f then + local id = font.define(f) + local capscale, digscale = 0.85, 0.75 + f.name, f.type = name, 'virtual' + f.fonts = { + { id=id }, + { name="texnansi-lmss10", size=size*capscale }, + { name="texnansi-lmtt10", size=size*digscale } + } + for k,v in pairs(f.characters) do + local chr = utf.char(k) + if chr:find("[A-Z]") then + v.width = capscale*v.width + v.commands = { + {"special","pdf: 1 0 0 rg"}, + {"slot",2,k}, + {"special","pdf: 0 g"} + } + elseif chr:find("[0-9]") then + v.width = digscale*v.width + v.commands = { + {"special","pdf: 0 0 1 rg"}, + {"slot",3,k}, + {"special","pdf: 0 g"} + } + else + v.commands = { + {"slot",1,k} + } + end + end + return f + end + end + return font.read_tfm(name,size) +end) +\stoptyping + +Hardwiring fontnames in callbacks this way does not deserve a prize and +when possible we will provide better extension interfaces. Anyhow, +in the experimental \CONTEXT\ code we used calls like this, where +\type {demo} is an installed feature. + +\startbuffer +\font\myfont = special@demo-1 at 12pt \myfont +Hi there, this is the first (number 1) example of playing with Virtual Fonts, +some neat feature of \TeX, once you have access to it. For instance, we can +misuse it to fill in gaps in fonts. +\stopbuffer + +\typebuffer \start \getbuffer \par \stop + +Keep in mind that this is just an example. In practice we will not do such things +at the font level but by manipulating \TEX's internals. + +While developing this functionality and especially when Taco was +programming the backend functionality, we used more sane \MKIV\ code.
Think +of (still \LUA) definitions like: + +\startbuffer +\ctxlua { + fonts.definers.methods.install("weird", { + { "copy-range", "lmroman10-regular" } , + { "copy-char", "lmroman10-regular", 65, 66 } , + { "copy-range", "lmsans10-regular", 0x0100, 0x01FF } , + { "copy-range", "lmtypewriter10-regular", 0x0200, 0xFF00 } , + { "fallback-range", "lmtypewriter10-regular", 0x0000, 0x0200 } + }) +} +\stopbuffer + +\typebuffer \getbuffer + +Again, this is not the final user interface, but it shows the +direction we're heading. The result looks like: + +\startbuffer +\font\test={myfont@weird} at 12pt \test +\eacute \rcaron \adoublegrave \char65 +\stopbuffer + +\typebuffer + +This shows up as: + +\start \getbuffer \stop + +Here the \type {@} tells the (new) \CONTEXT\ font handler what constructor +should be used. + +Because some testers already have \XETEX\ font support files, we +also support a \XETEX\ like definition syntax. + +\startbuffer +\font\test={lmroman10-regular:dlig;liga}\test +f i fi ffi \crlf +f i f\kern0pti f\kern0ptf\kern0pti \crlf +\char64259 \space\char64256 \char105 \space \char102\char102\char105 +\stopbuffer + +\typebuffer + +This gives: + +\start \getbuffer \stop + +We are quite tolerant with regards to this specification and will provide less +dense methods as well. Of course we need to implement a whole bunch of +features but we will do this in such a way that we give users full control. + +\subject{encodings} + +By now we've reached a stage where we can get rid of font encodings. We now +have the full unicode range available and no longer depend on the font +encoding when we hyphenate. In a previous chapter we discussed the difference +in size between formats. 
+ +\starttabulate[|c|c|c|c|c|] +\NC \bf date \NC \bf luatex \NC \bf pdftex \NC \NR +\NC 2006-10-23 \NC 3 135 568 \NC 7 095 775 \NC \NR +\NC 2007-02-18 \NC 3 373 206 \NC 7 426 451 \NC \NR +\NC 2007-02-19 \NC 3 060 103 \NC 7 426 451 \NC \NR +\stoptabulate + +The size of the formats has grown a bit due to a few more +patterns and an extra preloaded encoding. But the \LUATEX\ +format shrinks some 10\% now that we can get rid of encoding +support. Some support for encodings is still present, so that +one can keep using the metric files that are installed (for +instance in project related trees that have special fonts) +although \AFM/\TYPEONE\ files or \OPENTYPE\ fonts will be used when +available. + +A couple of years from now, we may throw away some \LUA\ code +related to encodings. + +\subject{files} + +\TEX\ distributions tend to be rather large, both in terms of +files and bytes. Fonts take most of the space. The merged +\TEX Live 2007 trees contain some 60.000 files that take +1.123 MBytes. Of this, 25.000 files concern fonts totaling +to 431 MBytes. A recent \CONTEXT\ distribution spans 1200 files and +20 MBytes and a bit more when third party modules are taken into +account.
The fonts in \TEX Live are distributed as follows: + +\starttabulate[|l|r|r|r|r|] +\HL +\NC \bf format \NC \bf files \NC \bf bytes \NC \bf local files \NC \bf local bytes \NC \NR +\HL +\NC AFM \NC 1.769 \NC 123.068.970 \NC 443 \NC 22.290.132 \NC \NR +\NC TFM \NC 10.613 \NC 44.915.448 \NC 2.346 \NC 8.028.920 \NC \NR +\NC VF \NC 3.798 \NC 6.322.343 \NC 861 \NC 1.391.684 \NC \NR +\NC TYPE1 \NC 2.904 \NC 180.567.337 \NC 456 \NC 18.375.045 \NC \NR +\NC TRUETYPE \NC 22 \NC 1.494.943 \NC \NC \NC \NR +\NC OPENTYPE \NC 144 \NC 17.571.732 \NC \NC \NC \NR +\NC ENC \NC 268 \NC 782.680 \NC \NC \NC \NR +\NC MAP \NC 406 \NC 6.098.982 \NC 110 \NC 129.135 \NC \NR +\NC OFM \NC 39 \NC 10.309.792 \NC \NC \NC \NR +\NC OVF \NC 39 \NC 413.352 \NC \NC \NC \NR +\NC OVP \NC 22 \NC 2.698.027 \NC \NC \NC \NR +\NC SOURCE \NC 4.736 \NC 25.932.413 \NC \NC \NC \NR +\HL +\stoptabulate + +We omitted the more obscure file types. The last two columns show the +numbers for one of my local font trees. + +In due time we will see a shift from \TYPEONE\ to \OPENTYPE\ and \TRUETYPE\ +files and because these fonts are more +complete, they may take some more space. More important is that the \TEX\ specific +font metric files will phase out and the fewer \TYPEONE\ fonts we have, the fewer \AFM\ +companions we need (\AFM\ files are not compressed and therefore relatively +large). Mapping and encoding files can also go away. + +In \LUATEX\ we can do with fewer files, but the number of bytes may grow a bit +depending on how much is cached (especially fonts). Anyhow, we can safely +assume that a \LUATEX\ based distribution will carry fewer files and fewer +bytes around. + +\subject{fallbacks} + +Do we need virtual fonts? Currently in \CONTEXT, when a font encoding is chosen, a +fallback mechanism steps in as soon as a character is not in the encoding. So far, +so good. But occasionally we run into a font that does not (completely) fit an +encoding and we end up defining a non standard one.
In traditional \TEX\ +a side effect of font encodings is that they relate to hyphenation. \CONTEXT\ can +deal with that comfortably and multiple instances of the same set of hyphenation +patterns can be loaded, but for custom encodings this is kind of cumbersome. + +In \LUATEX\ we have just one font encoding: \UNICODE. When \OPENTYPE\ fonts are used, +we don't expect many problems related to missing glyphs, but you can bet on it that +they will occur. This is where in \CONTEXT\ \MKIV\ fallbacks will be used and this +will be implemented using virtual fonts. The advantage of using virtual fonts is that +we still deal with proper characters and hyphenation will take place as expected. And +since virtual fonts can be defined on the fly, we can be flexible in our implementation. +We can think of generic fallbacks, not much different than macro based representations, +or font specific ones, where we even may rely on \METAPOST\ for generating the glyph +data. + +How do we define a fallback character? When building this mechanism I used the +\quote {\textcent} as an example. A cent symbol is roughly defined as follows: + +\starttyping +local t = table.fastcopy(g.characters[0x0063]) -- mkiv function +local s = fonts.constructors.scaled(g.fonts[1].size) -- mkiv function +t.commands = { + {"push"}, + {"slot", 1, 0x0063}, -- the character c + {"pop"}, + {"right", .5*t.width}, + {"down", .2*t.height}, + {"rule", 1.4*t.height, .02*s} +} +t.height = 1.2*t.height +t.depth = 0.2*t.height +\stoptyping + +Here, \type {g} is a loaded font (table) which has type \type {virtual}. The +first font in the \type {fonts} array is the main font. What happens here +is the following: we assign the characteristics of \quote {c} to the cent +symbol (this includes kerning and dimensions) and then define a command +sequence that draws the \quote {c} and a vertical rule through it.
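The surrounding mechanism, installing such a definition only when the font lacks the character, can be sketched in plain \LUA\ as follows. Everything here is an assumption made for illustration: the stub font table \type {g} with its made-up dimensions, the \type {size} field, the \type {fallbacks} registry and the function \type {add_fallbacks} are not the \MKIV\ implementation, and the cent builder only mimics the rough definition shown above.

```lua
-- Sketch only: a stub standing in for a loaded virtual font table.
local g = {
    type  = "virtual",
    size  = 655360,                  -- 10pt in scaled points (hypothetical field)
    fonts = { { id = 1 } },          -- first entry is the main font
    characters = {
        [0x0063] = { width = 290000, height = 280000, depth = 0 },  -- "c", made-up values
    },
}

-- Hypothetical registry: unicode -> function that builds a fallback glyph.
local fallbacks = { }

fallbacks[0x00A2] = function(font)   -- cent sign, built from "c" plus a rule
    local c = font.characters[0x0063]
    if not c then
        return nil                   -- nothing to build from
    end
    return {
        width    = c.width,
        height   = 1.2 * c.height,
        depth    = 0.2 * c.height,
        commands = {
            { "push" },
            { "slot", 1, 0x0063 },
            { "pop" },
            { "right", 0.5 * c.width },
            { "down", 0.2 * c.height },
            { "rule", 1.4 * c.height, 0.02 * font.size },
        },
    }
end

-- Install fallbacks for characters that the font does not provide.
local function add_fallbacks(font)
    local added = { }
    for unicode, builder in pairs(fallbacks) do
        if not font.characters[unicode] then
            local t = builder(font)
            if t then
                font.characters[unicode] = t
                added[#added + 1] = unicode
            end
        end
    end
    return added
end

local first  = add_fallbacks(g)      -- installs the cent sign
local second = add_fallbacks(g)      -- nothing left to do
```

Keeping the builders in a registry keyed by code point means that generic and font specific fallbacks can share the same installer; only the builder functions differ.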
+ +The real code is slightly more complicated because we need to take care of +italic properties when applicable and because we have added some tracing too. +While playing with these kinds of things, it becomes clear what features are +handy, and the reason that we now have a virtual command \type {comment} is +that it permits us to implement tracing (using for instance color specials). + +\def\TestLine#1% + {\start + \font\test=#1\relax + \test + c\quad + \textcent\quad + \ruledhbox{c}\quad + \ruledhbox{\textcent}\quad + \scaron\quad + \eacute\quad + \adiaeresis\quad + \udiaeresis\quad + \char 465\quad + \char 463\quad + \char7685\quad + \stop + \blank} + +\TestLine {lmroman10-regular@demo-2 at 24pt} +\TestLine {lmroman10-italic@demo-2 at 24pt} + +The previous lines are typeset using a similar specification as mentioned +before: + +\starttyping +\font\test=lmroman10-regular@demo-2 +\stoptyping + +Without the fallbacks we get: + +\TestLine {lmroman10-regular at 24pt} +\TestLine {lmroman10-italic at 24pt} + +And with normal (non forced) fallbacks it looks as follows. As it happens, +this font has a cent symbol so no fallback is needed. + +\TestLine {lmroman10-regular@demo-3 at 24pt} +\TestLine {lmroman10-italic@demo-3 at 24pt} + +The font definition callback intercepts the \type {demo-2} and a couple of +chained \LUA\ functions make sure that characters missing in the font are +replaced by fallbacks. In the case of missing composed characters, they are +constructed from their components. In this particular example we have told +the handler to assume that all composed characters are missing. + +\subject{memory} + +Traditional \TEX\ has been designed for speed and a small memory footprint. Today's +implementations are considerably more generous with the amount of memory that +you can use (hash, fonts, main memory, patterns, backend, etc). Depending +on how complicated a document layout is, memory may run into tens of megabytes.
+ +Because \LUATEX\ is not only suitable for wide fonts, but also does away with some of +the optimizations in the \TEX\ code that complicate extensions, it has a larger +footprint than \PDFTEX. When implementing the \OPENTYPE\ font basics, we did quite +some tests with respect to memory usage. Getting the numbers right is non trivial +because the \LUA\ garbage collector is interfering. For instance, on my machine a +test file with the regular \CONTEXT\ setup of Latin Modern fonts made \LUA\ +allocate 130 MB, while the same run on Taco's machine took 100 MB. + +When a font data table is constructed, it is handed over to \TEX, and turned into +the internal font data structures. During the construction of that table at the +\LUA\ end, \CONTEXT\ \MKIV\ disables the garbage collector. By doing this, the time +needed to construct and scale a font can be halved. Curious about the amount of memory +involved in passing such a table, I added the following piece of code: + +\starttyping +if type(fontdata) == "table" then + local s = statistics.luastate_bytes + local t = table.copy(fontdata) + local d = statistics.luastate_bytes-s + texio.write_nl(string.format("table memory footprint: %s",d)) +end +\stoptyping + +It turned out that a Regular Latin Modern font (\OPENTYPE) takes around +800 KB. However, more interesting was that by adding this snippet of test code +which duplicated the table in order to measure its size, the total memory footprint +dropped to 100 MB (about the amount used on Taco's machine). This demonstrates +that one should be very careful with drawing conclusions. + +Because fonts are rather important in \TEX\ and because there can be lots of +them used, it makes sense to keep an eye on memory as well as performance. +Because many manipulations now take place in \LUA, it no longer makes sense +to let \TEX\ buffer fonts. In plain \TEX\ one finds these magic + +\starttyping +\font\preloaded=cmr10 +\font\preloaded=cmr12 +\stoptyping + +lines.
The second definition obscures the first, but the \type {cmr10} stays +loaded. + +\starttyping +\font\one=cmr10 at 10pt +\font\two=cmr10 at 10pt +\stoptyping + +These two definitions make \TEX\ load the font only once. However, since +we can now delegate loading to \LUA, \TEX\ no longer helps us there. For instance, +\TEX\ has no knowledge of the extent to which this \type {cmr10} font has been manipulated +and therefore both instances may actually differ. + +When you use a callback to define the font, \TEX\ passes a font id number. You can +use this number as a reference to a loaded font (that is, one passed to \TEX). If +instead of a table, you return a number, \TEX\ will reuse the already loaded font. +This feature can save you a lot of time, especially when a macro package (like +\CONTEXT) defines fonts dynamically which means that when grouping is used, fonts +get (re)defined a lot. Of course additional caching can take place at the \LUA\ end, +but there one needs to take into account more than just the scaled instance. Think of +\OPENTYPE\ features or virtual font properties. The following are quite certainly +different setups, in spite of the common size. + +\starttyping +\font\one=lmr10@demo-1 at 10pt +\font\two=lmr10@demo-2 at 10pt +\stoptyping + +When scaling a font, one not only needs to handle the regular glyph dimensions, but also the +kerning tables. We found out that dealing with such issues takes some 25\% of the time +spent on loading Latin Modern fonts that have rather extensive kerning tables. +When creating a virtual font, copying glyph tables may happen a lot. Deep copying +tables takes a bit of time. This is one of the reasons why we discussed (and are considering) +some dedicated support functions so that copying and recalculating tables happens faster +(less costly hash lookups and such). On the other hand, the time spent on calculations +(including rounding to scaled points) is negligible. 
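The scaling step described above can be sketched in plain \LUA. This is a minimal illustration and not the \CONTEXT\ code: the helper names \type {round} and \type {scale_font} are hypothetical, and the table layout (a \type {characters} table whose entries carry \type {width}, \type {height}, \type {depth} and an optional \type {kerns} subtable) is an assumption made for this example. It shows why kerning tables add to the cost: every glyph entry is copied and every kern pair is visited and rounded.

```lua
-- Hypothetical sketch: scale glyph dimensions and kerns into a fresh copy.
-- Assumed layout (an assumption for this example):
--   font.characters[index] = { width=, height=, depth=, kerns = { [other] = amount } }

-- round to the nearest integer (think: scaled points)
local function round(n)
    return math.floor(n + 0.5)
end

local function scale_font(font, factor)
    local scaled = { characters = { } }
    for index, chr in pairs(font.characters) do
        -- copy and scale the regular glyph dimensions
        local t = {
            width  = round((chr.width  or 0) * factor),
            height = round((chr.height or 0) * factor),
            depth  = round((chr.depth  or 0) * factor),
        }
        -- kerns need their own pass: one entry per kern pair
        if chr.kerns then
            local k = { }
            for other, amount in pairs(chr.kerns) do
                k[other] = round(amount * factor)
            end
            t.kerns = k
        end
        scaled.characters[index] = t
    end
    return scaled
end

-- made-up demo values: one glyph with one kern pair
local demo = {
    characters = {
        [65] = { width = 1000, height = 700, depth = 0, kerns = { [86] = -80 } },
    },
}

local half = scale_font(demo, 0.5)
```

Because the sketch builds a new table instead of changing the original in place, the unscaled data stays available for further instances at other sizes, at the price of the copying mentioned above.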
+ +The following table shows what happens when we enforce a different +garbage collecting scheme. This test was triggered by another experiment +in which, at regular intervals, for instance after a page is shipped out, we ran, say + +\starttyping +collectgarbage("collect") +\stoptyping + +However, such a complete sweep has drastic consequences for the runtime. +But, since the memory footprint becomes 10--15\% less by doing so, we +played a bit with + +\starttyping +collectgarbage("setstepmul", somenumber) +\stoptyping + +When processing a not so large file but one that loads a bunch of \OPENTYPE\ +fonts, we get the following values. The left set is from Linux (Taco's machine) +and the right set is from mine. + +\starttabulate[|r|r|r|r|r|] +\NC \bf stepmul \NC \bf run (s) \NC \bf mem (MB) \NC \bf run (s) \NC \bf mem (MB) \NC \NR +\HL +\NC 200 \NC 1.58 \NC 69.14 \NC 5.6 \NC 84.17 \NC \NR +\NC 1000 \NC 1.63 \NC 69.14 \NC 6.5 \NC 72.32 \NC \NR +\NC 2000 \NC 1.64 \NC 60.66 \NC 6.8 \NC 73.53 \NC \NR +\NC 10000 \NC 1.71 \NC 59.94 \NC 7.0 \NC 72.30 \NC \NR +\stoptabulate + +Since I use an old laptop running Windows with a probably +different \TEX\ configuration (fonts), and under some load, the two sets +don't compare well, but the general idea is the same. For practical usage +a value of 1000 is probably best, especially because memory intensive font +and script loading only happens at the first couple of pages. + +\stopcomponent |