% language=us

\usemodule[virtual]

\startcomponent mk-fonts

\environment mk-environment

\chapter{A fresh look at fonts}

\subject{readers}

Now that we have the file system, \LUA\ script integration, input encoding and basic logging in place, we have arrived at fonts. Although today \OPENTYPE\ fonts are the fashion, we still need to deal with \TEX's native font machinery. Although Latin Modern and the \TEX\ Gyre collection will bring us many free \OPENTYPE\ fonts, we can be sure that for a long time \TYPEONE\ variants will be used as well, and when one has lots of bought fonts, replacing them with \OPENTYPE\ updates is not always an option. And so, reimplementing the readers for \TEX\ Font Metrics (\type {tfm} files) and Virtual Fonts (\type {vf} files) was the first step.

Because \ALEPH\ font handling was integrated already, Taco decided to combine the \TFM\ and \OFM\ readers into a new one. The combined loader is written in C and produces tables that are accessible from within \LUA. A problem is that once a font is used, one cannot simply change its metrics. So, we have to make sure that we apply changes before a font is actually used:

\starttyping
\font\test=texnansi-lmr at 31.415 pt \test Yet another nice Kate Bush song: Pi
\stoptyping

In this example, any change to the font metrics has to be done before \type {test} is invoked. For this purpose the \type {define_font} callback is provided. Below you see an experimental overload:

\starttyping
callback.register("define_font",
    function (name,area,size)
        return fonts.patches.process(font.read_tfm(name,size))
    end
)
\stoptyping

The \type {fonts.patches.process} function (currently in \CONTEXT\ \MKIV) implements a mechanism for tweaking the font parameters in between. In order to get an idea of further features we played a bit with ligature replacement, character spacing, kern tweaking etc.
Think of such a function (or a chain of functions) doing things similar to:

\starttyping
callback.register("define_font",
    function (name,area,size)
        local tfmblob = font.read_tfm(name,size) -- built-in loader
        tfmblob.characters[string.byte("f")].ligatures = nil
        return tfmblob -- data structure that TeX will use internally
    end
)
\stoptyping

Of course the above definition is not complete, if only because we need to handle chained ligatures as well (fl followed by i). In practice we prefer a more abstract interface (at the macro level) but the idea stays the same. Interesting is that having access to the internals this way already makes our \TEX\ Live more interesting. (We cannot demonstrate this trickery here because when this document is processed you cannot be sure if the experimental interface is still in place.)

When playing with this we ran into problems with file searching. When performing the backend role, \LUATEX\ will look in the \TEX\ tree if there is a corresponding virtual file. It took a while and a bit of tracing (which is not that hard in the \LUA\ based reader) to figure out that the omega related path definitions in \type {texmf.cnf} files were not correct, something that went unnoticed because omega never had a backend integrated and the \DVI\ processors did multiple searches to get around this.

Currently, if you want to enable extensive tracing of file searching and loading, you can set an environment variable:

\starttyping
MTX.INPUT.TRACE=3
\stoptyping

This will produce a lot of information about what file is asked for, what types (tex, font, etc) determine the search, which paths are searched, what readers and locators are used (file, zip, protocol), etc.

\subject{AFM}

While Taco implemented the virtual font reader |<|eventually its data will be merged with the \TFM\ table|>| I started playing with constructing \TFM\ tables directly.
Because \CONTEXT\ has a rather systematic naming scheme, we can rather easily see which encoding we are dealing with. This means that in principle we can throw all encoded \TFM\ files out of our tree and construct the tables using the \AFM\ file and an encoding vector. It took us a good day to figure out the details, but in the end we were able to trick \LUATEX\ into using \AFM\ files. With a bit of internal caching it was even reasonably fast.

When the basic conversion mechanism was written we tried to compare the results with existing \TFM\ metrics as generated by \type {afm2tfm} and \type {afm2pl}. Doing so was less trivial than we first thought. To mention a few aspects:

\startitemize[packed]
\item heights and depths have a limited number of values in \TEX
\item we need to convert to \TEX's scaled points
\item rounding errors of one scaled point occur
\item \type {afm2tfm} can only add kerns when virtual fonts are used
\item \type {afm2tfm} adds some extra ligatures and also does some kern magic
\item \type {afm2pl} adds even more kerns
\item the tools remove kern pairs between digits
\stopitemize

In this perspective we need not be too picky on what exactly a ligature is. An example of a ligature is \type {fi} and such a character can be in the font. In the \TFM\ file, the definition of \type {f} contains information about what to do when it's followed by an \type {i}: it has to insert a reference (character number) pointing to the fi glyph.

However, because \TEX\ was written in \ASCII\ times, there was a problem of how to get access to for instance the Spanish quotation and exclamation marks. Here the ligature mechanism available in the \TFM\ format was misused in the sense that a combination of \type {exclam} and \type {quoteleft} becomes \type {exclamdown}. In a similar fashion two single quotes become a double quote.
And every \TEX ie knows that multiple hyphens combine into -- (endash) and --- (emdash), where the latter one is achieved by defining a ligature between an endash and a hyphen.

Of course we have to deal with conversions from \AFM\ units (1000 per em) to \TEX's scaled points. Such conversions may be sensitive to rounding errors. Because we noticed differences of one scaled point, I tried several strategies to get the results consistent but so far I didn't manage to find out where these differences come from. Rounding errors seem to be rather random and I have no clue what strategy the regular converters follow. Another fuzzy area is the font parameters (visible as font dimensions for users): I wonder how many users really know what values are used and why.

You may wonder to what extent this rounding problem will influence consistent typesetting. We have no reason to assume that the rounding error is operating system dependent. This leaves the different methods used and personally I have no problems with the direct reader being not 100\% compatible with the regular tools. First of all it's an illusion to think that \TEX\ distributions are stable over the years. Fonts and conversion tools are being updated every now and then, and metrics change over time (apart from Computer Modern which is stable by definition). Also, pattern files are updated, so paragraphs may be broken into lines differently anyway. If you really want stability, then you need to store the fonts and patterns with your document.

As we already mentioned, the regular converter programs add kerns as well. Treating common glyph shapes similarly is not uncommon in \CONTEXT\ so I decided to provide methods for adding \quote {missing} kerns. For example, with regard to kerning, we can treat \type {eacute} the same way as an~\type {e}.
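
The idea of borrowing kerns from a similar shape can be sketched in a few lines of \LUA. This is only an illustration and not the \MKIV\ code: the layout of the \type {characters} table, the \type {resembles} mapping and the function name \type {add_missing_kerns} are assumptions made for this example, and the kern values are invented.

\starttyping
-- Hypothetical, simplified character table: kerns are keyed by the name
-- of the following glyph and given in AFM units (invented values).
local characters = {
    e      = { kerns = { v = -10, y = -15 } },
    eacute = { },
    a      = { kerns = { v = -20 } },
    aacute = { kerns = { v = -25 } }, -- already has a kern of its own
}

-- Which base shape an accented glyph resembles (assumed mapping).
local resembles = { eacute = "e", aacute = "a" }

-- Copy kerns from the base glyph, never overwriting existing pairs.
local function add_missing_kerns(characters, resembles)
    for name, base in pairs(resembles) do
        local target, source = characters[name], characters[base]
        if target and source and source.kerns then
            target.kerns = target.kerns or { }
            for other, amount in pairs(source.kerns) do
                if target.kerns[other] == nil then
                    target.kerns[other] = amount
                end
            end
        end
    end
end

add_missing_kerns(characters, resembles)
\stoptyping

Kerns that the font already provides are left alone. The sketch only covers pairs in which the accented glyph comes first.
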
Some ligatures, like \type {ae} or \type {fi}, need to be seen from two sides: when looked at from the left side they resemble an \type {a} and \type {f}, but when kerned at their right, they are to be treated as \type {e} and \type {i}.

So, when all this is taken care of, we will have a reasonably robust and compatible way to deal with \AFM\ files and when this variant is enabled, we can prune our \TEX\ trees pretty well. Also, now that we have font related tables, we can start moving tables built out of \TEX\ macros (think of protruding and hz) to \LUA, which will not only save us many hash entries but also permits faster implementations.

The question may arise why there is no hard coded \AFM\ reader. Although some speed up can be achieved by reading the table with \AFM\ data directly, there would still be the issue of making that table accessible for manipulations as described (costs time too). The \AFM\ format is human readable, contrary to the \TFM\ format, and therefore such files can conveniently be processed by \LUA. Also, the possible manipulations may differ per macro package, user, and even document. The chances of users and developers reaching an agreement about such issues are near zero. By writing the reader in \LUA, a macro package writer can also implement caching mechanisms that suit the package. Also, keep in mind that we often only need to load about four \AFM\ files or a few more when we mix fonts.

In my main tree (regular distributions) there are some 350 files in \type {texnansi} encoding that take over 2~MByte. My personal font tree has over a thousand such entries which means that we can prune the tree considerably when we use the \AFM\ loader. Why bother about \TFM\ when \AFM\ can do the job?

In order to reduce the overhead in reading the \AFM\ file, we now use external caching, which (in \CONTEXT\ \MKIV) boils down to serializing the internal \AFM\ tables and compiling them to bytecode.
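
The principle of such a cache can be shown with plain \LUA\ functions. This is a minimal sketch and not the \MKIV\ implementation: the serializer only handles numbers, strings, booleans and nested tables, and the names \type {serialize} and \type {afmdata} as well as the sample values are made up for this example.

\starttyping
-- Lua 5.1 provides loadstring, later versions use load for the same job.
local load_chunk = loadstring or load

-- Minimal serializer: turns a table into Lua source text.
local function serialize(t)
    local parts = { }
    for k, v in pairs(t) do
        local key
        if type(k) == "number" then
            key = "[" .. k .. "]"
        else
            key = string.format("[%q]", k)
        end
        local value
        if type(v) == "table" then
            value = serialize(v)
        elseif type(v) == "string" then
            value = string.format("%q", v)
        else
            value = tostring(v)
        end
        parts[#parts+1] = key .. "=" .. value
    end
    return "{" .. table.concat(parts, ",") .. "}"
end

-- Invented sample data standing in for a parsed AFM file.
local afmdata = { fontname = "demo", characters = { [65] = { width = 722 } } }

local source   = "return " .. serialize(afmdata)         -- table as source text
local bytecode = string.dump(assert(load_chunk(source))) -- compiled chunk
local restored = assert(load_chunk(bytecode))()          -- reloaded table
\stoptyping

In a real cache the bytecode string is written to a file, so that later runs can skip both the \AFM\ parser and the \LUA\ compiler.
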
As a result, the runtime becomes comparable to a run using regular \TFM\ files. On this document, using the \AFM\ reader (cached) takes some 0.3 seconds more on a total of 8 seconds (28 pages in Optima Nova with a couple of graphics).

While we were playing with this, Hermann Zapf surprised me by sending me a \CD\ with his marvelous new Palatino Sans. So, instead of generating \TFM\ metrics, I decided to use \type {ttf2afm} to generate an \AFM\ file from the \TRUETYPE\ files and use these metrics. It worked right out of the box which means that one can copy a set of font files directly from the source to the tree. In a demo document the Palatino Sans came out quite well and so we will use this font to explore the upcoming Open Type features.

Because we now have fewer font resources (only two files per font) we decided to get away from the spread||all||over||the||tree paradigm. For this we introduced

\starttyping
../fonts/data/vendor/collection
\stoptyping

like:

\starttyping
../fonts/data/tex/latin-modern
../fonts/data/tex-gyre/bonum
../fonts/data/linotype/optima-nova
../fonts/data/linotype/palatino-nova
../fonts/data/linotype/palatino-sans
\stoptyping

Of course one needs to adapt the related font paths in the configuration files but getting that done in \TEX\ distributions is another story.

\subject{map files}

Reading an \AFM\ file is only part of the game. Because we bypass the regular \TFM\ reader we may internally end up with different names of fonts (and|/|or files). This also means that the map files that map an internal name onto a font (outline) file may be of no use. The map file also specifies the encoding file which maps character numbers onto names used in font files.

The map file maps a font name to a (preferably outline) font resource file. This can be a file with suffix \type {pfb}, \type {ttf}, \type {otf} or the like.
When we convert an \AFM\ file into a more suitable format, we also store the associated (outline) filename, that we use later when we assemble the map line data (we use \type {\pdfmapline} to tell \LUATEX\ how to prepare and embed a file). Eventually \LUATEX\ will take care of all these issues itself thereby rendering map files and encoding files kind of useless.

When loading an \AFM\ file we already have to read encoding files, so we have all the information available that normally goes into the map file. While conducting experiments with reading \AFM\ files, we therefore could use the \type {\pdfmapline} primitive to push the right entries into the font inclusion machinery. Because \CONTEXT\ already handles map data itself we could easily hook this into the normal handlers for that. (There are some nasty synchronization issues involved in handling map entries in general but we will not bother you with that now.)

Although eventually we may get rid of map files, we also used the general map file handling in \CONTEXT\ as a playground for the \XML\ handler that we wrote in \LUA. Playing with many map files (a few KBytes) coded in \XML\ format, or with one big map file (easily 800 MBytes) makes a good test case for loading and dumping. But why bother too much about map files in \LUATEX\ \unknown\ they will go away anyway.

\subject{OTF \& TTF}

One of the reasons for starting the \LUATEX\ development was that we wanted to be able to use \OPENTYPE\ (and \TRUETYPE) fonts in \PDFTEX. As a prelude (and kind of transition) we first dealt with \TYPEONE\ using either \TFM\ or \AFM. For \TEX\ it does not really matter what font is used, it only deals with dimensions and generic characteristics. Of course, when fonts offer more advanced possibilities, we may need more features in the \TEX\ kernel, but think of \HZ\ or protruding as provided by \PDFTEX: it's not part of the font (specification) but of the engine.
The same is actually true for kerning and ligature building, although here the font (data) may provide the information needed to deal with it properly.

\OPENTYPE\ fonts come with features. Examples of features are using oldstyle figures or tabular digits instead of the default ones. Dealing with such issues boils down to replacing one character representation by another or treating combinations of characters in the input differently depending on the circumstances. There can be relationships between languages and scripts, but, as \TEX ies know, other relationships exist as well, for instance between content and visualization.

Therefore, it will be no surprise that \LUATEX\ does not simply implement the \OPENTYPE\ specification as such. On the one hand it implements a way to load information stored in the font, on the other hand it implements mechanisms to fulfill the demands of such fonts and more. The glue between both is done with \LUA. In the simple case of ligatures and kerns this goes as follows. A user (or macro package) specifies a font, and this call can be intercepted using a callback. This callback can use a built in function that loads an \OTF\ or \TTF\ font. From this table, a font table is constructed that is passed on to \TEX. The construction may involve building ligature and kerning tables using the information present in the font file, but it may as well mean more. So, given a bare \LUATEX\ system, \OPENTYPE\ font support does not automatically give you handling of features, or more precisely, there is no hard coded support for features.

This may sound like a disadvantage but as soon as you start looking at how \TEX\ users use their system (in most cases by using a macro package) you may understand that flexibility is larger this way. Instead of adding more and more control and exceptions, and thereby making the kernel more unstable and complex, we delegate control to the macro package.
The advantage is that there are no (everlasting) discussions on how to deal with things and in the end the user will use a high level interface anyway. Of course the macro package needs proper access to the font's internals, but this is provided: the code used for reading in the data comes from FontForge (an advanced font editor) and is presented via \LUA\ tables in a well organized way.

Given that users expect \OPENTYPE\ features to be supported, how do we provide an interface? In \CONTEXT\ the user interface has always been an important aspect and consistency is a priority. On the other hand, there has been the tradition of specifying the size explicitly and a new custom introduced by \XETEX\ to enhance the fontname with directives. Traditional \TEX\ provides:

\starttyping
\font \name filename [optional size]
\stoptyping

\XETEX\ accepts

\starttyping
\font \name "fontname[:optional features]" [optional size]
\font \name fontname[:optional features] [optional size]
\stoptyping

Instead of a fontname one can pass a filename between square brackets. \LUATEX\ handles:

\starttyping
\font \name anything [optional size]
\font \name {anything} [optional size]
\stoptyping

where anything as well as the size are passed on to the callback. This permits us to implement a traditional specification, support \XETEX\ like definitions, and easily pass information from a macro package down to the callback as well. Interpreting anything is done in \LUA.

While implementing the \LUA\ side of the loader we took an approach similar to that of the \AFM\ reader: we cached intermediate tables and kept track of font names (in addition to filenames). In order to be able to quickly determine the (internal) font name of an \OPENTYPE\ font, special loader functions are provided.
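
To give an idea of what interpreting such a specification at the \LUA\ end can look like, here is a small sketch that splits a \XETEX\ like definition into a name (or a filename between square brackets) and a list of features. The function name \type {parse_specification}, the shape of the returned table and the sample filename are assumptions made for this example; this is not the parser used in \CONTEXT.

\starttyping
-- Split "name:feat1;feat2" or "[filename]:feat1;feat2" into its parts.
local function parse_specification(spec)
    local result = { features = { } }
    -- a leading [..] means that a filename was given
    local filename, rest = spec:match("^%[(.-)%](.*)$")
    if filename then
        result.filename = filename
    else
        -- everything up to the first colon is the font name
        result.name, rest = spec:match("^([^:]*)(.*)$")
    end
    -- what follows the colon is a list of features separated by semicolons
    local list = rest:match("^:(.*)$")
    if list then
        for feature in list:gmatch("[^;]+") do
            result.features[#result.features+1] = feature
        end
    end
    return result
end

local byname = parse_specification("lmroman10-regular:dlig;liga")
local byfile = parse_specification("[demo.otf]:liga")
local plain  = parse_specification("texnansi-lmr10")
\stoptyping

A traditional specification without features simply ends up with an empty feature list, so one function can serve all three forms.
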
The size is kind of special, because we can have specifications like

\starttyping
at 10pt
at 3ex
at \dimexpr\bodyfontsize+1pt\relax
\stoptyping

This means that we need to handle that on the \TEX\ side and pass the calculated value to the callback.

Virtual fonts have a rather special nature. They permit you to define variations of fonts using other fonts and special (\DVI\ related) operators. However, from the perspective of \TEX\ itself they don't exist at all. When you create a virtual font you also end up with a \TFM\ file and \TEX\ only needs this file, which defines characters in terms of a width, height, depth and italic correction as well as associates characters with kerning pairs and ligatures. \TEX\ leaves it to the backend to deal with the actual glyphs and therefore the backend will be confronted with the internals of a virtual font. Because \PDFTEX\ and therefore \LUATEX\ have the backend built in, they are capable of handling virtual font information.

In \LUATEX\ you can build your own virtual font and this will suit us well. It permits us for instance to complete fonts that lack certain characters (glyphs) and thereby lets us get rid of ugly macro based fallback trickery. Although in \CONTEXT\ we will provide a high level interface, we will give you a taste of \LUA\ here.
\starttyping
callback.register("define_font", function(name,size)
    if name == "demo" then
        local f = font.read_tfm('texnansi-lmr10',size)
        if f then
            local capscale, digscale = 0.85, 0.75
            f.name, f.type = name, 'virtual'
            f.fonts = {
                { name="texnansi-lmr10" , size=size },
                { name="texnansi-lmss10", size=size*capscale },
                { name="texnansi-lmtt10", size=size*digscale }
            }
            for k,v in pairs(f.characters) do
                local chr = utf.char(k)
                if chr:find("[A-Z]") then
                    v.width = capscale*v.width
                    v.commands = {
                        {"special","pdf: 1 0 0 rg"},
                        {"font",2}, {"char",k},
                        {"special","pdf: 0 g"}
                    }
                elseif chr:find("[0-9]") then
                    v.width = digscale*v.width
                    v.commands = {
                        {"special","pdf: 0 0 1 rg"},
                        {"font",3}, {"char",k},
                        {"special","pdf: 0 g"}
                    }
                else
                    v.commands = {
                        {"font",1}, {"char",k}
                    }
                end
            end
            return f
        end
    end
    return font.read_tfm(name,size)
end)
\stoptyping

Here we define a virtual font that uses three real fonts and which font is used depends on the kind of character we're dealing with (in real world situations we can best use the \MKIV\ function that tells what class a character belongs to). The \type {commands} table determines what glyphs come out in what way. We use a bit of literal pdf code to color the special characters but generally color is not handled at the font level.

This example can be used like:

\starttyping
\font\test=demo \test

Hi there, this is the first (number 1) example of playing with Virtual Fonts,
some neat feature of \TeX, once you have access to it. For instance, we can
misuse it to fill in gaps in fonts.
\stoptyping

During development of this mechanism, we decided to save some redundant loading by permitting id's in the fonts array:

\starttyping
callback.register("define_font", function(name,size)
    if name == "demo" then
        local f = font.read_tfm('texnansi-lmr10',size)
        if f then
            local id = font.define(f)
            local capscale, digscale = 0.85, 0.75
            f.name, f.type = name, 'virtual'
            f.fonts = {
                { id=id },
                { name="texnansi-lmss10", size=size*capscale },
                { name="texnansi-lmtt10", size=size*digscale }
            }
            for k,v in pairs(f.characters) do
                local chr = utf.char(k)
                if chr:find("[A-Z]") then
                    v.width = capscale*v.width
                    v.commands = {
                        {"special","pdf: 1 0 0 rg"},
                        {"slot",2,k},
                        {"special","pdf: 0 g"}
                    }
                elseif chr:find("[0-9]") then
                    v.width = digscale*v.width
                    v.commands = {
                        {"special","pdf: 0 0 1 rg"},
                        {"slot",3,k},
                        {"special","pdf: 0 g"}
                    }
                else
                    v.commands = {
                        {"slot",1,k}
                    }
                end
            end
            return f
        end
    end
    return font.read_tfm(name,size)
end)
\stoptyping

Hardwiring fontnames in callbacks this way does not deserve a prize and when possible we will provide better extension interfaces. Anyhow, in the experimental \CONTEXT\ code we used calls like this, where \type {demo} is an installed feature.

\startbuffer
\font\myfont = special@demo-1 at 12pt \myfont
Hi there, this is the first (number 1) example of playing with Virtual Fonts,
some neat feature of \TeX, once you have access to it. For instance, we can
misuse it to fill in gaps in fonts.
\stopbuffer

\typebuffer

\start \getbuffer \par \stop

Keep in mind that this is just an example. In practice we will not do such things at the font level but by manipulating \TEX's internals.

While developing this functionality and especially when Taco was programming the backend functionality, we used more sane \MKIV\ code.
Think of (still \LUA) definitions like:

\startbuffer
\ctxlua {
    fonts.definers.methods.install("weird", {
        { "copy-range", "lmroman10-regular" } ,
        { "copy-char", "lmroman10-regular", 65, 66 } ,
        { "copy-range", "lmsans10-regular", 0x0100, 0x01FF } ,
        { "copy-range", "lmtypewriter10-regular", 0x0200, 0xFF00 } ,
        { "fallback-range", "lmtypewriter10-regular", 0x0000, 0x0200 }
    })
}
\stopbuffer

\typebuffer \getbuffer

Again, this is not the final user interface, but it shows the direction we're heading. The result looks like:

\startbuffer
\font\test={myfont@weird} at 12pt \test
\eacute \rcaron \adoublegrave \char65
\stopbuffer

\typebuffer

This shows up as:

\start \getbuffer \stop

Here the \type {@} tells the (new) \CONTEXT\ font handler what constructor should be used. Because some testers already have \XETEX\ font support files, we also support a \XETEX\ like definition syntax.

\startbuffer
\font\test={lmroman10-regular:dlig;liga}\test
f i fi ffi \crlf
f i f\kern0pti f\kern0ptf\kern0pti \crlf
\char64259 \space\char64256 \char105 \space \char102\char102\char105
\stopbuffer

\typebuffer

This gives:

\start \getbuffer \stop

We are quite tolerant with regard to this specification and will provide less dense methods as well. Of course we need to implement a whole bunch of features but we will do this in such a way that we give users full control.

\subject{encodings}

By now we've reached a stage where we can get rid of font encodings. We now have the full unicode range available and no longer depend on the font encoding when we hyphenate. In a previous chapter we discussed the difference in size between formats.

\starttabulate[|c|c|c|]
\NC \bf date   \NC \bf luatex \NC \bf pdftex \NC \NR
\NC 2006-10-23 \NC 3 135 568  \NC 7 095 775  \NC \NR
\NC 2007-02-18 \NC 3 373 206  \NC 7 426 451  \NC \NR
\NC 2007-02-19 \NC 3 060 103  \NC 7 426 451  \NC \NR
\stoptabulate

The size of the formats has grown a bit due to a few more patterns and an extra preloaded encoding.
But the \LUATEX\ format shrinks some 10\% now that we can get rid of encoding support. Some support for encodings is still present, so that one can keep using the metric files that are installed (for instance in project related trees that have special fonts) although \AFM/\TYPEONE\ files or \OPENTYPE\ fonts will be used when available. A couple of years from now, we may throw away some \LUA\ code related to encodings. \subject{files} \TEX\ distributions tend to be rather large, both in terms of files and bytes. Fonts take most of the space. The merged \TEX Live 2007 trees contain some 60.000 files that take 1.123 MBytes. Of this, 25.000 files concern fonts totaling to 431 MBytes. A recent \CONTEXT\ distribution spans 1200 files and 20 MBytes and a bit more when third party modules are taken into account. The fonts in \TEX Live are distributed as follows: \starttabulate[|l|r|r|r|r|] \HL \NC \bf format \NC \bf files \NC \bf bytes \NC \NC \NC \NR \HL \NC AFM \NC 1.769 \NC 123.068.970 \NC 443 \NC 22.290.132 \NC \NR \NC TFM \NC 10.613 \NC 44.915.448 \NC 2.346 \NC 8.028.920 \NC \NR \NC VF \NC 3.798 \NC 6.322.343 \NC 861 \NC 1.391.684 \NC \NR \NC TYPE1 \NC 2.904 \NC 180.567.337 \NC 456 \NC 18.375.045 \NC \NR \NC TRUETYPE \NC 22 \NC 1.494.943 \NC \NC \NC \NR \NC OPENTYPE \NC 144 \NC 17.571.732 \NC \NC \NC \NR \NC ENC \NC 268 \NC 782.680 \NC \NC \NC \NR \NC MAP \NC 406 \NC 6.098.982 \NC 110 \NC 129.135 \NC \NR \NC OFM \NC 39 \NC 10.309.792 \NC \NC \NC \NR \NC OVF \NC 39 \NC 413.352 \NC \NC \NC \NR \NC OVP \NC 22 \NC 2.698.027 \NC \NC \NC \NR \NC SOURCE \NC 4.736 \NC 25.932.413 \NC \NC \NC \NR \HL \stoptabulate We omitted the more obscure file types. The last two columns show the numbers for one of my local font trees. In due time we will see a shift from \TYPEONE\ to \OPENTYPE\ and \TRUETYPE\ files and because these fonts are more complete, they may take some more space. 
More important is that the \TEX\ specific font metric files will phase out and the fewer \TYPEONE\ fonts we have, the fewer \AFM\ companions we need (\AFM\ files are not compressed and therefore relatively large). Mapping and encoding files can also go away.

In \LUATEX\ we can do with fewer files, but the number of bytes may grow a bit depending on how much is cached (especially fonts). Anyhow, we can safely assume that a \LUATEX\ based distribution will carry fewer files and fewer bytes around.

\subject{fallbacks}

Do we need virtual fonts? Currently in \CONTEXT, when a font encoding is chosen, a fallback mechanism steps in as soon as a character is not in the encoding. So far, so good. But occasionally we run into a font that does not (completely) fit an encoding and we end up with defining a non standard one. In traditional \TEX\ a side effect of font encodings is that they relate to hyphenation. \CONTEXT\ can deal with that comfortably and multiple instances of the same set of hyphenation patterns can be loaded, but for custom encodings this is kind of cumbersome.

In \LUATEX\ we have just one font encoding: \UNICODE. When \OPENTYPE\ fonts are used, we don't expect many problems related to missing glyphs, but you can bet on it that they will occur. This is where in \CONTEXT\ \MKIV\ fallbacks will be used and this will be implemented using virtual fonts. The advantage of using virtual fonts is that we still deal with proper characters and hyphenation will take place as expected. And since virtual fonts can be defined on the fly, we can be flexible in our implementation. We can think of generic fallbacks, not much different from macro based representations, or font specific ones, where we even may rely on \METAPOST\ for generating the glyph data.

How do we define a fallback character? When building this mechanism I used the \quote {\textcent} as an example.
A cent symbol is roughly defined as follows:

\starttyping
local t = table.fastcopy(g.characters[0x0063]) -- mkiv function
local s = fonts.constructors.scaled(g.fonts[1].size) -- mkiv function
t.commands = {
    {"push"},
        {"slot", 1, c},
    {"pop"},
    {"right", .5*t.width},
    {"down", .2*t.height},
    {"rule", 1.4*t.height, .02*s}
}
t.height = 1.2*t.height
t.depth = 0.2*t.height
\stoptyping

Here, \type {g} is a loaded font (table) which has type \type {virtual}. The first font in the \type {fonts} array is the main font. What happens here is the following: we assign the characteristics of \quote {c} to the cent symbol (this includes kerning and dimensions) and then define a command sequence that draws the \quote {c} and a vertical rule through it.

The real code is slightly more complicated because we need to take care of italic properties when applicable and because we have added some tracing too. While playing with these kinds of things, it becomes clear what features are handy, and the reason that we now have a virtual command \type {comment} is that it permits us to implement tracing (using for instance color specials).

\def\TestLine#1%
  {\start
     \font\test=#1\relax
     \test
     c\quad
     \textcent\quad
     \ruledhbox{c}\quad
     \ruledhbox{\textcent}\quad
     \scaron\quad
     \eacute\quad
     \adiaeresis\quad
     \udiaeresis\quad
     \char 465\quad
     \char 463\quad
     \char7685\quad
   \stop
   \blank}

\TestLine {lmroman10-regular@demo-2 at 24pt}
\TestLine {lmroman10-italic@demo-2 at 24pt}

The previous lines are typeset using a similar specification as mentioned before:

\starttyping
\font\test=lmroman10-regular@demo-2
\stoptyping

Without the fallbacks we get:

\TestLine {lmroman10-regular at 24pt}
\TestLine {lmroman10-italic at 24pt}

And with normal (non forced) fallbacks it looks as follows. As it happens, this font has a cent symbol so no fallback is needed.
\TestLine {lmroman10-regular@demo-3 at 24pt}
\TestLine {lmroman10-italic@demo-3 at 24pt}

The font definition callback intercepts the \type {demo-2} and a couple of chained \LUA\ functions make sure that characters missing in the font are replaced by fallbacks. In the case of missing composed characters, they are constructed from their components. In this particular example we have told the handler to assume that all composed characters are missing.

\subject{memory}

Traditional \TEX\ has been designed for speed and a small memory footprint. Today's implementations are considerably more generous with the amount of memory that you can use (hash, fonts, main memory, patterns, backend, etc). Depending on how complicated a document layout is, memory may run into tens of megabytes.

Because \LUATEX\ is not only suitable for wide fonts, but also does away with some of the optimizations in the \TEX\ code that complicate extensions, it has a larger footprint than \PDFTEX. When implementing the \OPENTYPE\ font basics, we did quite some tests with respect to memory usage. Getting the numbers right is non trivial because the \LUA\ garbage collector is interfering. For instance, on my machine a test file with the regular \CONTEXT\ setup of Latin Modern fonts made \LUA\ allocate 130 MB, while the same run on Taco's machine took 100 MB.

When a font data table is constructed, it is handed over to \TEX, and turned into the internal font data structures. During the construction of that table at the \LUA\ end, \CONTEXT\ \MKIV\ disables the garbage collector. By doing this, the time needed to construct and scale a font can be halved.
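
Pausing the collector around an expensive construction needs nothing more than the standard \type {collectgarbage} function. The following is a minimal sketch of the idea, not the \MKIV\ code: the wrapper \type {without_gc} and the stand-in constructor \type {build_font_table} are names invented for this example, and the dimensions are arbitrary.

\starttyping
-- Run a constructor with the garbage collector paused, then resume it,
-- also when the constructor raises an error.
local function without_gc(constructor, ...)
    collectgarbage("stop")
    local ok, result = pcall(constructor, ...)
    collectgarbage("restart")
    if not ok then
        error(result, 0)
    end
    return result
end

-- Hypothetical stand-in for building and scaling a large font table.
local function build_font_table(nofglyphs, scale)
    local characters = { }
    for i = 1, nofglyphs do
        characters[i] = {
            width  = 500 * scale,
            height = 700 * scale,
            depth  = 0,
        }
    end
    return { characters = characters }
end

local fontdata = without_gc(build_font_table, 1000, 0.5)
\stoptyping

The \type {pcall} makes sure that the collector is restarted even when the construction fails. How much time is gained depends on the amount of garbage that the constructor produces.
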
Curious about the amount of memory involved in passing such a table, I added the following piece of code:

\starttyping
if type(fontdata) == "table" then
    local s = statistics.luastate_bytes
    local t = table.copy(fontdata)
    local d = statistics.luastate_bytes-s
    texio.write_nl(string.format("table memory footprint: %s",d))
end
\stoptyping

It turned out that a Regular Latin Modern font (\OPENTYPE) takes around 800 KB. However, more interesting was that by adding this snippet of test code, which duplicated the table in order to measure its size, the total memory footprint dropped to 100 MB (about the amount used on Taco's machine). This demonstrates that one should be very careful with drawing conclusions.

Because fonts are rather important in \TEX\ and because there can be lots of them used, it makes sense to keep an eye on memory as well as performance. Because many manipulations now take place in \LUA, it no longer makes sense to let \TEX\ buffer fonts. In plain \TEX\ one finds these magic

\starttyping
\font\preloaded=cmr10
\font\preloaded=cmr12
\stoptyping

lines. The second definition obscures the first, but the \type {cmr10} stays loaded.

\starttyping
\font\one=cmr10 at 10pt
\font\two=cmr10 at 10pt
\stoptyping

These two definitions make \TEX\ load the font only once. However, since we can now delegate loading to \LUA, \TEX\ no longer helps us there. For instance, \TEX\ has no knowledge to what extent this \type {cmr10} font has been manipulated and therefore both instances may actually differ.

When you use a callback to define the font, \TEX\ passes a font id number. You can use this number as a reference to a loaded font (that is, passed to \TEX). If instead of a table, you return a number, \TEX\ will reuse the already loaded font. This feature can save you a lot of time, especially when a macro package (like \CONTEXT) defines fonts dynamically which means that when grouping is used, fonts get (re)defined a lot.
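
Reusing an id can be sketched as follows. In order to keep the example independent of the engine, the loader and the id allocation are replaced by stand-ins: \type {load_font} and \type {define_font} are hypothetical and play the role that functions like \type {font.read_tfm} and \type {font.define} have in the earlier examples. The hash used here is just the name and the size.

\starttyping
-- Hypothetical stand-ins for the engine side of font loading.
local nofloads, lastid = 0, 0

local function load_font(name, size)
    nofloads = nofloads + 1
    return { name = name, size = size }
end

local function define_font(fontdata)
    lastid = lastid + 1
    return lastid
end

local loaded = { } -- maps "name @ size" onto a font id

-- Returns a number (font id), which a font definition callback may
-- hand back instead of a complete table.
local function define(name, size)
    local hash = name .. " @ " .. size
    local id = loaded[hash]
    if not id then
        id = define_font(load_font(name, size))
        loaded[hash] = id
    end
    return id
end

local one = define("cmr10", 655360) -- 10pt in scaled points
local two = define("cmr10", 655360) -- same instance, nothing is loaded
local big = define("cmr10", 786432) -- 12pt, a new instance
\stoptyping

Here only two fonts are actually loaded, while three definitions took place.
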
Of course additional caching can take place at the \LUA\ end, but there one needs to take into account more than just the scaled instance. Think of \OPENTYPE\ features or virtual font properties. The following are quite certainly different setups, in spite of the common size.

\starttyping
\font\one=lmr10@demo-1 at 10pt
\font\two=lmr10@demo-2 at 10pt
\stoptyping

When scaling a font, one not only needs to handle the regular glyph dimensions, but also the kerning tables. We found out that dealing with such issues takes some 25\% of the time spent on loading Latin Modern fonts that have rather extensive kerning tables. When creating a virtual font, copying glyph tables may happen a lot. Deep copying tables takes a bit of time. This is one of the reasons why we discussed (and consider) some dedicated support functions so that copying and recalculating tables happens faster (less costly hash lookups and such). On the other hand, the time wasted on calculations (including rounding to scaled points) can be neglected.

The following table shows what happens when we enforce a different garbage collecting scheme. This test was triggered by another experiment where at regular intervals, for instance after a page is shipped out, we say

\starttyping
collectgarbage("collect")
\stoptyping

However, such a complete sweep has drastic consequences for the runtime. But, since the memory footprint becomes 10--15\% less by doing so, we played a bit with

\starttyping
collectgarbage("setstepmul", somenumber)
\stoptyping

When processing a not so large file but one that loads a bunch of open type fonts, we get the following values. The left set is on linux (Taco's machine) and the right set is mine.
\starttabulate[|r|r|r|r|r|] \NC \bf stepmul \NC \bf run (s) \NC \bf mem (MB) \NC \bf run (s) \NC \bf mem (MB) \NC \NR \HL \NC 200 \NC 1.58 \NC 69.14 \NC 5.6 \NC 84.17 \NC \NR \NC 1000 \NC 1.63 \NC 69.14 \NC 6.5 \NC 72.32 \NC \NR \NC 2000 \NC 1.64 \NC 60.66 \NC 6.8 \NC 73.53 \NC \NR \NC 10000 \NC 1.71 \NC 59.94 \NC 7.0 \NC 72.30 \NC \NR \stoptabulate Since I use an old laptop running Windows with a probably different \TEX\ configuration (fonts), and under some load, both columns don't compare well, but the general idea is the same. For practical usage a value of 1000 is probably best, especially because memory intensive font and script loading only happens at the first couple of pages. \stopcomponent