author Context Git Mirror Bot <> 2016-08-01 16:40:14 +0200
committer Context Git Mirror Bot <> 2016-08-01 16:40:14 +0200
commit 96f283b0d4f0259b7d7d1c64d1d078c519fc84a6
tree e9673071aa75f22fee32d701d05f1fdc443ce09c /doc/context/sources/general/manuals/mk/mk-fonts.tex
parent c44a9d2f89620e439f335029689e7f0dff9516b7
2016-08-01 14:21:00
Diffstat (limited to 'doc/context/sources/general/manuals/mk/mk-fonts.tex')
1 files changed, 841 insertions, 0 deletions
diff --git a/doc/context/sources/general/manuals/mk/mk-fonts.tex b/doc/context/sources/general/manuals/mk/mk-fonts.tex
new file mode 100644
index 000000000..b5e945923
--- /dev/null
+++ b/doc/context/sources/general/manuals/mk/mk-fonts.tex
@@ -0,0 +1,841 @@
+% language=uk
+\startcomponent mk-fonts
+\environment mk-environment
+\chapter{A fresh look at fonts}
+Now that we have the file system, \LUA\ script integration, input
+encoding and basic logging in place, we have arrived at fonts.
+Although today \OPENTYPE\ fonts are the fashion, we still need to
+deal with \TEX's native font machinery. Even though Latin Modern and
+the \TEX\ Gyre collection will bring us many free \OPENTYPE\
+fonts, we can be sure that for a long time \TYPEONE\ variants will
+be used as well, and when one has lots of bought fonts, replacing
+them with \OPENTYPE\ updates is not always an option. And so,
+reimplementing the readers for \TEX\ Font Metrics (\type {tfm}
+files) and Virtual Fonts (\type {vf} files) was the first step.
+Because \ALEPH\ font handling was integrated already, Taco decided
+to combine the \TFM\ and \OFM\ readers into a new one. The
+combined loader is written in C and produces tables that are
+accessible from within \LUA. A problem is that once a font is
+used, one cannot simply change its metrics. So, we have to make
+sure that we apply changes before a font is actually used:
+\font\test=texnansi-lmr at 31.415 pt
+\test Yet another nice Kate Bush song: Pi
+In this example, any change to the font metrics has to be done before
+\type {\test} is invoked. For this purpose the \type {define_font}
+callback is provided. Below you see an experimental overload:
+callback.register("define_font", function (name,area,size)
+ return fonts.patches.process(font.read_tfm(name,size))
+end )
+The \type {fonts.patches.process} function (currently in \CONTEXT\
+\MKIV) implements a mechanism for tweaking the font parameters in
+between. In order to get an idea of further features we played a
+bit with ligature replacement, character spacing, kern tweaking
+etc. Think of such a function (or a chain of functions) doing
+things similar to:
+callback.register("define_font", function (name,area,size)
+ local tfmblob = font.read_tfm(name,size) -- built-in loader
+ tfmblob.characters[string.byte("f")].ligatures = nil
+ return tfmblob -- datastructure that TeX will use internally
+end )
+Of course the above definition is not complete, if only because we
+need to handle chained ligatures as well (fl followed by i).
+In practice we prefer a more abstract interface (at the macro
+level) but the idea stays the same. The nice thing is that having
+access to the internals this way already makes our \TEX\ Live more
+interesting. (We cannot demonstrate this trickery here because
+when this document is processed you cannot be sure if the
+experimental interface is still in place.)
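The chained case can be made concrete with a small sketch in plain \LUA. It assumes the ligature layout of \LUATEX\ font tables, where \type {characters[code].ligatures} maps the next character to \type {{ char = result, type = ... }}. The helper \type {drop_ligatures} and the test font are hypothetical and are not the \type {fonts.patches} code. Removing the f ligatures also removes the ligature that starts from the ff glyph, because that chain can no longer be reached.

```lua
-- Hypothetical helper: remove unwanted ligatures from a font table laid out
-- like the one font.read_tfm returns:
--   characters[code].ligatures[nextcode] = { char = result, type = 0 }
local function drop_ligatures(tfmdata, unwanted)
  local removed = 0
  for code, chr in pairs(tfmdata.characters) do
    local ligs = chr.ligatures
    if ligs then
      for nextcode, lig in pairs(ligs) do
        -- drop entries that build an unwanted glyph, and entries that start
        -- from one (ff followed by i), since that chain is now unreachable
        if unwanted[lig.char] or unwanted[code] then
          ligs[nextcode] = nil -- clearing existing fields during pairs() is allowed
          removed = removed + 1
        end
      end
      if next(ligs) == nil then
        chr.ligatures = nil
      end
    end
  end
  return removed
end

-- A tiny test font: f, i, l, the ff/fi/fl/ffi ligatures and the hyphen.
local demo = {
  characters = {
    [0x66] = { ligatures = {
      [0x66] = { char = 0xFB00, type = 0 }, -- f f -> ff
      [0x69] = { char = 0xFB01, type = 0 }, -- f i -> fi
      [0x6C] = { char = 0xFB02, type = 0 }, -- f l -> fl
    } },
    [0xFB00] = { ligatures = { [0x69] = { char = 0xFB03, type = 0 } } }, -- ff i -> ffi
    [0x2D] = { ligatures = { [0x2D] = { char = 0x2013, type = 0 } } },   -- - - -> endash
    [0x69] = {}, [0x6C] = {}, [0xFB01] = {}, [0xFB02] = {}, [0xFB03] = {}, [0x2013] = {},
  },
}

local removed = drop_ligatures(demo, {
  [0xFB00] = true, [0xFB01] = true, [0xFB02] = true, [0xFB03] = true,
})
```

The hyphen ligature is left alone, so \type {--} still becomes an endash.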
+When playing with this we ran into problems with file searching.
+When performing the backend role, \LUATEX\ will look in the \TEX\
+tree for a corresponding virtual font file. It took a while and
+a bit of tracing (which is not that hard in the \LUA\ based
+reader) to figure out that the Omega related path definitions in
+\type {texmf.cnf} files were not correct, something that went
+unnoticed because Omega never had a backend integrated and the
+\DVI\ processors did multiple searches to get around this.
+Currently, if you want to enable extensive tracing of file
+searching and loading, you can set an environment variable:
+This will produce a lot of information about which file is asked
+for, which file type (tex, font, etc.) determines the search, which
+paths are searched, which readers and locators are used (file,
+zip, protocol), etc.
+While Taco implemented the virtual font reader |<|eventually its
+data will be merged with the \TFM\ table|>| I started playing with
+constructing \TFM\ tables directly. Because \CONTEXT\ has a rather
+systematic naming scheme, we can rather easily see which encoding
+we are dealing with. This means that in principle we can throw all
+encoded \TFM\ files out of our tree and construct the tables using
+the \AFM\ file and an encoding vector.
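Here is a minimal sketch of this idea in plain \LUA. It assumes the usual AFM \type {CharMetrics} syntax (\type {C code ; WX width ; N name ; B llx lly urx ury ; L next ligature ;}) and an encoding vector given as a table from slot to glyph name. The function names and the slot used for \type {fi} are made up for the example. Kerns, italic correction and unit conversion are left out.

```lua
-- Collect the CharMetrics entries of an AFM file, indexed by glyph name.
-- Dimensions stay in AFM units (1000 per em); scaling happens elsewhere.
local function parse_charmetrics(afm)
  local glyphs = {}
  for line in afm:gmatch("[^\r\n]+") do
    if line:match("^C%s") then
      local g = { width = 0, height = 0, depth = 0, ligatures = {} }
      for field in line:gmatch("[^;]+") do
        local key, rest = field:match("^%s*(%S+)%s*(.-)%s*$")
        if key == "WX" then
          g.width = tonumber(rest)
        elseif key == "N" then
          g.name = rest
        elseif key == "B" then
          local _, lly, _, ury = rest:match("(%S+)%s+(%S+)%s+(%S+)%s+(%S+)")
          g.height = math.max(tonumber(ury), 0)
          g.depth = math.max(-tonumber(lly), 0)
        elseif key == "L" then
          local succ, lig = rest:match("(%S+)%s+(%S+)")
          g.ligatures[succ] = lig
        end
      end
      if g.name then
        glyphs[g.name] = g
      end
    end
  end
  return glyphs
end

-- Combine glyph data with an encoding vector (slot -> glyph name) into a
-- TFM-like characters table; a ligature is kept only when both the next
-- glyph and the result are present in the encoding.
local function build_characters(glyphs, encoding)
  local slot_of = {}
  for slot, name in pairs(encoding) do
    slot_of[name] = slot
  end
  local characters = {}
  for slot, name in pairs(encoding) do
    local g = glyphs[name]
    if g then
      local chr = { width = g.width, height = g.height, depth = g.depth }
      for succ, lig in pairs(g.ligatures) do
        if slot_of[succ] and slot_of[lig] then
          chr.ligatures = chr.ligatures or {}
          chr.ligatures[slot_of[succ]] = { char = slot_of[lig], type = 0 }
        end
      end
      characters[slot] = chr
    end
  end
  return characters
end

local afm = [[
StartCharMetrics 4
C 102 ; WX 333 ; N f ; B 20 0 383 683 ; L i fi ;
C 103 ; WX 500 ; N g ; B 28 -218 470 460 ;
C 105 ; WX 278 ; N i ; B 16 0 253 666 ;
C -1 ; WX 556 ; N fi ; B 20 0 529 683 ;
EndCharMetrics
]]

-- hypothetical encoding vector; slot 12 for fi is chosen for the example only
local encoding = { [12] = "fi", [102] = "f", [103] = "g", [105] = "i" }
local characters = build_characters(parse_charmetrics(afm), encoding)
```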
+It took us a good day to figure out the details, but in the end we
+were able to trick \LUATEX\ into using \AFM\ files. With a bit of
+internal caching it was even reasonably fast. When the basic
+conversion mechanism was written we tried to compare the results
+with existing \TFM\ metrics as generated by \type {afm2tfm} and
+\type {afm2pl}. Doing so was less trivial than we first thought.
+To mention a few aspects:
+\item heights and depths have a limited number of values in \TFM\ files (at most 16 of each)
+\item we need to convert to \TEX's scaled points
+\item rounding errors of one scaled point occur
+\item \type {afm2tfm} can only add kerns when virtual fonts are used
+\item \type {afm2tfm} adds some extra ligatures and also does some
+ kern magic
+\item \type {afm2pl} adds even more kerns
+\item the tools remove kern pairs between digits
+In this perspective we need not be too picky on what exactly a
+ligature is. An example of a ligature is \type {fi} and such a
+character can be in the font. In the \TFM\ file, the definition of
+\type {f} contains information about what to do when it's followed
+by an \type {i}: it has to insert a reference (character number)
+pointing to the fi glyph.
+However, because \TEX\ was written in the \ASCII\ era, there
+was the problem of how to get access to, for instance, the Spanish
+inverted question and exclamation marks. Here the ligature mechanism
+available in the \TFM\ format was misused in the sense that a
+combination of \type {exclam} and \type {quoteleft} becomes \type
+{exclamdown}. In a similar fashion, two single quotes become a
+double quote. And every \TEX ie knows that multiple hyphens
+combine into -- (endash) and --- (emdash), where the latter is
+achieved by defining a ligature between an endash and a hyphen.
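The chained behaviour can be modelled in a few lines of \LUA. This simplified sketch only covers the plain kind of ligature, where two glyphs are replaced by one (\TFM\ ligature programs have more variants). The glyph name table is an assumption made for the example. Because the result of one ligature can take part in the next, three hyphens end up as an emdash.

```lua
-- Simplified ligature program keyed by glyph name: ligatures[left][right] = result.
local ligatures = {
  hyphen    = { hyphen = "endash" },
  endash    = { hyphen = "emdash" },
  exclam    = { quoteleft = "exclamdown" },
  question  = { quoteleft = "questiondown" },
  quoteleft = { quoteleft = "quotedblleft" },
}

-- Scan left to right; when the last glyph built so far and the next input
-- glyph form a ligature, replace the last glyph by the result, so that the
-- result can combine again with what follows.
local function ligature_pass(input)
  local output = {}
  for _, glyph in ipairs(input) do
    local last = output[#output]
    local result = last and ligatures[last] and ligatures[last][glyph]
    if result then
      output[#output] = result
    else
      output[#output + 1] = glyph
    end
  end
  return output
end
```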
+Of course we have to deal with conversions from \AFM\ units (1000
+per em) to \TEX's scaled points. Such conversions may be sensitive
+to rounding errors. Because we noticed differences of one scaled
+point, I tried several strategies to get the results consistent
+but so far I didn't manage to find out where these differences
+come from. Rounding errors seem to be rather random and I have no
+clue what strategy the regular converters follow. Another fuzzy
+area is the font parameters (visible as font dimensions for
+users): I wonder how many users really know what values are used
+and why.
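The conversion itself is simple. What differs between tools is the rounding. The sketch below assumes a size given in scaled points and rounds half away from zero. The truncating variant is only a hypothetical illustration of how two reasonable strategies end up one scaled point apart. It is not a claim about what \type {afm2tfm} or \type {afm2pl} actually do.

```lua
-- AFM dimensions are in units of 1/1000 em; TeX counts in scaled points
-- (65536 sp per point). The font size is given in sp as well.
local function afm_to_sp(units, size)
  local exact = units * size / 1000
  -- round half away from zero
  if exact >= 0 then
    return math.floor(exact + 0.5)
  else
    return -math.floor(-exact + 0.5)
  end
end

-- Hypothetical alternative strategy: truncate toward zero instead of rounding.
local function afm_to_sp_truncated(units, size)
  local exact = units * size / 1000
  if exact >= 0 then
    return math.floor(exact)
  else
    return -math.floor(-exact)
  end
end

local tenpoint = 10 * 65536 -- 655360 sp
local rounded = afm_to_sp(333, tenpoint)             -- exact value is 218234.88 sp
local truncated = afm_to_sp_truncated(333, tenpoint)
```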
+You may wonder to what extent this rounding problem will influence
+consistent typesetting. We have no reason to assume that the
+rounding error is operating system dependent. This leaves the
+different methods used and personally I have no problems with the
+direct reader being not 100\% compatible with the regular tools.
+First of all it's an illusion to think that \TEX\ distributions
+are stable over the years. Fonts and conversion tools are being
+updated every now and then, and metrics change over time (apart
+from Computer Modern which is stable by definition). Also, pattern
+files are updated, so paragraphs may be broken into lines differently
+anyway. If you really want stability, then you need to store the
+fonts and patterns with your document.
+As we already mentioned, the regular converter programs add kerns
+as well. Treating common glyph shapes similarly is not uncommon in
+\CONTEXT\ so I decided to provide methods for adding \quote
+{missing} kerns. For example, with regard to kerning, we can
+treat \type {eacute} the same way as an~\type {e}. Some ligatures,
+like \type {ae} or \type {fi}, need to be seen from two sides:
+when looked at from the left side they resemble an \type {a} and
+\type {f}, but when kerned at their right, they are to be treated
+as \type {e} and \type {i}.
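Such a kern lookup can be sketched as follows. The sketch assumes two hypothetical tables that say which base glyph a shape resembles at its left side and at its right side. When a pair has no kern of its own, the first glyph is looked up by its right side shape and the second glyph by its left side shape. The kern values are invented for the example.

```lua
-- Hypothetical shape tables: which base glyph a glyph looks like at its
-- left side and at its right side. Accented letters look like their base
-- on both sides, ligatures do not.
local leftside  = { eacute = "e", ae = "a", fi = "f" }
local rightside = { eacute = "e", ae = "e", fi = "i" }

-- Kern between two glyphs: use a kern from the font when there is one,
-- otherwise combine the right side of the first glyph with the left side
-- of the second one; no match at all means no kern.
local function kern(kerns, left, right)
  local direct = kerns[left] and kerns[left][right]
  if direct then
    return direct
  end
  local l = rightside[left] or left
  local r = leftside[right] or right
  return kerns[l] and kerns[l][r] or 0
end

-- kerns[left][right] in AFM units, as a font might provide them
local kerns = {
  T = { e = -80, a = -90 },
  e = { T = -40 },
}
```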
+So, when all this is taken care of, we will have a reasonably
+robust and compatible way to deal with \AFM\ files and when this
+variant is enabled, we can prune our \TEX\ trees pretty well.
+Also, now that we have font related tables, we can start moving
+tables built out of \TEX\ macros (think of protruding and hz) to
+\LUA, which will not only save us many hash entries but also
+permit faster implementations.
+The question may arise why there is no hard coded \AFM\ reader.
+Although some speed up can be achieved by reading the table with
+\AFM\ data directly, there would still be the issue of making that
+table accessible for manipulations as described (costs time too).
+Unlike \TFM\ files, \AFM\ files are human readable and can
+therefore conveniently be processed by \LUA. Also,
+the possible manipulations may differ per macro package, user, and
+even document. The chances of users and developers reaching an
+agreement about such issues are near zero. By writing the reader in
+\LUA, a macro package writer can also implement caching mechanisms
+that suit the package. Also, keep in mind that we often only need
+to load about four \AFM\ files or a few more when we mix fonts.
+In my main tree (regular distributions) there are some 350 files
+in \type {texnansi} encoding that take over 2~MByte. My personal
+font tree has over a thousand such entries which means that we can
+prune the tree considerably when we use the \AFM\ loader. Why
+bother about \TFM\ when \AFM\ can do the job?
+In order to reduce the overhead in reading the \AFM\ file, we now
+use external caching, which (in \CONTEXT\ \MKIV) boils down to
+serializing the internal \AFM\ tables and compiling them to
+bytecode. As a result, the runtime becomes comparable to a run
+using regular \TFM\ files. On this document, using the \AFM\ reader
+(cached) adds some 0.3 seconds to a total of 8 seconds (28 pages
+in Optima Nova with a couple of graphics).
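The caching step can be sketched in plain \LUA: serialize the table to \LUA\ source, compile it, and keep the bytecode produced by \type {string.dump}. This is a simplified miniature and not the \MKIV\ cache code. It only handles string and number keys, it keeps the result in memory instead of writing a file, and \type {make_cache} and \type {load_cache} are hypothetical names.

```lua
local loadchunk = loadstring or load -- loadstring in Lua 5.1, load later on

-- Turn a table with string or number keys and number, string, boolean or
-- table values into Lua source code.
local function serialize(t)
  local parts = { "{" }
  for k, v in pairs(t) do
    local key
    if type(k) == "string" then
      key = string.format("[%q]", k)
    else
      key = string.format("[%.17g]", k)
    end
    local value
    if type(v) == "table" then
      value = serialize(v)
    elseif type(v) == "string" then
      value = string.format("%q", v)
    elseif type(v) == "number" then
      value = string.format("%.17g", v) -- enough digits to read back exactly
    else
      value = tostring(v)
    end
    parts[#parts + 1] = key .. "=" .. value .. ","
  end
  parts[#parts + 1] = "}"
  return table.concat(parts)
end

-- Compile the serialized table once and keep the bytecode as cache content.
local function make_cache(t)
  local chunk = assert(loadchunk("return " .. serialize(t)))
  return string.dump(chunk)
end

-- Loading the cache only has to run the precompiled chunk.
local function load_cache(bytecode)
  local chunk = assert(loadchunk(bytecode))
  return chunk()
end

local afmdata = {
  fontname = "Demo", italicangle = -12.5, fixedpitch = false,
  characters = { [102] = { name = "f", width = 333 } },
}
local bytecode = make_cache(afmdata)
local restored = load_cache(bytecode)
```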
+While we were playing with this, Hermann Zapf surprised me by
+sending me a \CD\ with his marvelous new Palatino Sans. So,
+instead of generating \TFM\ metrics, I decided to use \type
+{ttf2afm} to generate an \AFM\ file from the \TRUETYPE\ files
+and use these metrics. It worked right out of the box which means
+that one can copy a set of font files directly from the source to
+the tree. In a demo document the Palatino Sans came out quite well
+and so we will use this font to explore the upcoming Open Type
+Because we now have fewer font resources (only two files per font)
+we decided to get away from the spread||all||over||the||tree
+paradigm. For this we introduced
+Of course one needs to adapt the related font paths in the
+configuration files but getting that done in \TEX\ distributions is
+another story.
+\subject{map files}
+Reading an \AFM\ file is only part of the game. Because we bypass
+the regular \TFM\ reader we may internally end up with different
+names of fonts (and|/|or files). This also means that the map
+files that map an internal name onto a font (outline) file may be
+of no use. The map file also specifies the encoding file which
+maps character numbers onto names used in font files.
+The map file maps a font name to a (preferably outline) font
+resource file. This can be a file with suffix \type {pfb}, \type
+{ttf}, \type {otf} or similar. When we convert an \AFM\ file into a
+more suitable format, we also store the associated (outline)
+filename, which we use later when we assemble the map line data (we
+use \type {\pdfmapline} to tell \LUATEX\ how to prepare and embed
+a file).
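As an illustration, a map line in the usual \PDFTEX\ syntax can be assembled from such stored data. That syntax lists the \TFM\ name, the PostScript name, the encoding file and the font file, the last two prefixed by \type {<}. The field names and the file names in the example below are hypothetical.

```lua
-- Hypothetical helper: build a pdfTeX-style map line from data stored while
-- converting an AFM file: TFM name, PostScript name, optional encoding file
-- and the font file to embed ("<" asks for a subset to be embedded).
local function mapline(entry)
  local fields = { entry.name, entry.psname }
  if entry.encoding then
    fields[#fields + 1] = "<" .. entry.encoding
  end
  fields[#fields + 1] = "<" .. entry.fontfile
  return table.concat(fields, " ")
end

-- The TeX command that hands such a line to the backend; the leading "+"
-- adds the entry to the map data.
local function mapline_command(line)
  return "\\pdfmapline{+" .. line .. "}"
end

local line = mapline {
  name = "texnansi-lmr10", psname = "LMRoman10-Regular",
  encoding = "texnansi-lm.enc", fontfile = "lmr10.pfb",
}
```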
+Eventually \LUATEX\ will take care of all these issues itself
+thereby rendering map files and encoding files kind of useless.
+When loading an \AFM\ file we already have to read encoding files,
+so we have all the information available that normally goes into
+the map file. While conducting experiments with reading \AFM\
+files, we therefore could use the \type {\pdfmapline} primitive to
+push the right entries into the font inclusion machinery. Because
+\CONTEXT\ already handles map data itself we could easily hook
+this into the normal handlers for that. (There are some nasty
+synchronization issues involved in handling map entries in general
+but we will not bother you with that now).
+Although eventually we may get rid of map files, we also used the
+general map file handling in \CONTEXT\ as a playground for the
+\XML\ handler that we wrote in \LUA. Playing with many map files
+(a few KBytes) coded in \XML\ format, or with one big map file
+(easily 800 MBytes) makes a good test case for loading and dumping
+But why bother too much about map files in \LUATEX\ \unknown\ they
+will go away anyway.
+\subject{OTF \& TTF}
+One of the reasons for starting the \LUATEX\ development was that we wanted to
+be able to use \OPENTYPE\ (and \TRUETYPE) fonts in \PDFTEX. As a prelude (and kind of
+transition) we first dealt with \TYPEONE\ using either \TFM\ or \AFM. For \TEX\ it does
+not really matter what font is used, it only deals with dimensions and generic
+characteristics. Of course, when fonts offer more advanced possibilities, we may
+need more features in the \TEX\ kernel, but think of \HZ\ or protruding as provided
+by \PDFTEX: it's not part of the font (specification) but of the engine. The same
+is actually true for kerning and ligature building, although here the font (data) may
+provide the information needed to deal with it properly.
+\OPENTYPE\ fonts come with features. Examples of features are using oldstyle figures or
+tabular digits instead of the default ones. Dealing with such issues boils down to
+replacing one character representation by another or treating combinations of characters
+in the input differently depending on the circumstances. There can be relationships
+between languages and scripts, but, as \TEX ies know, other relationships exist as well,
+for instance between content and visualization.
+Therefore, it will be no surprise that \LUATEX\ does not simply implement the \OPENTYPE\
+specification as such. On the one hand it implements a way to load information stored
+in the font, on the other hand it implements mechanisms to fulfil the demands of such
+fonts and more. The glue between both is done with \LUA. In the simple case of ligatures
+and kerns this goes as follows. A user (or macro package) specifies a font, and this
+call can be intercepted using a callback. This callback can use a built in function that
+loads an \OTF\ or \TTF\ font. From this table, a font table is constructed that is passed
+on to \TEX. The construction
+may involve building ligature and kerning tables using the information present
+in the font file, but it may as well mean more. So, given a bare \LUATEX\ system,
+\OPENTYPE\ font support does not automatically give you handling of features, or more
+precisely, there is no hard coded support for features.
+This may sound as a disadvantage
+but as soon as you start looking at how \TEX\ users use their system (in most cases
+by using a macro package) you may understand that flexibility is larger this way. Instead
+of adding more and more control and exceptions, and thereby making the kernel more
+unstable and complex, we delegate control to the macro package. The advantage is that
+there are no (everlasting) discussions on how to deal with things and in the end the
+user will use a high level interface anyway. Of course the macro package needs proper
+access to the font's internals, but this is provided: the code used for reading in the
+data comes from FontForge (an advanced font editor) and is presented via \LUA\ tables
+in a well organized way.
+Given that users expect \OPENTYPE\ features to be supported, how do we provide an
+interface? In \CONTEXT\ the user interface has always been an important aspect and
+consistency is a priority. On the other hand, there has been the tradition of specifying
+the size explicitly and a new custom introduced by \XETEX\ to enhance font names
+with directives. Traditional \TEX\ provides:
+\font \name filename [optional size]
+\XETEX\ accepts
+\font \name "fontname[:optional features]" [optional size]
+\font \name fontname[:optional features] [optional size]
+Instead of a fontname one can pass a filename between square brackets. \LUATEX\ accepts:
+\font \name anything [optional size]
+\font \name {anything} [optional size]
+where both anything and the size are passed on to the callback.
+This permits us to implement a traditional specification, support \XETEX\ like
+definitions, and easily pass information from a macro package down to the
+callback as well. Interpreting anything is done in \LUA.
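Interpreting \quote {anything} is then plain string handling. The following sketch handles a simplified \XETEX\ like syntax: an optional pair of quotes, a name or a \type {[filename]}, and after a colon a semicolon separated list of \type {+feature}, \type {-feature} or \type {key=value} entries. It illustrates the idea under these assumptions and is not the \CONTEXT\ parser. For instance, a colon inside a file name would confuse it.

```lua
-- Simplified reading of XeTeX like specifications, for example
--   lmroman10-regular:+liga;-kern;script=latn
--   "[lmroman10-regular.otf]:+onum"
local function parse_spec(spec)
  spec = spec:gsub('^"(.*)"$', "%1") -- drop optional surrounding quotes
  local name, features = spec:match("^([^:]*):?(.*)$")
  local result = { features = {} }
  local file = name:match("^%[(.*)%]$")
  if file then
    result.file = file
  else
    result.name = name
  end
  for item in features:gmatch("[^;]+") do
    local sign, key = item:match("^%s*([+-]?)(.-)%s*$")
    local k, v = key:match("^(.-)=(.*)$")
    if k then
      result.features[k] = v                   -- key=value
    elseif key ~= "" then
      result.features[key] = (sign ~= "-")     -- +feature or -feature
    end
  end
  return result
end
```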
+While implementing the \LUA\ side of the loader we took a similar approach
+to the \AFM\ reader: we cache intermediate tables and keep track
+of font names (in addition to filenames). In order to be able to quickly
+determine the (internal) font name of an \OPENTYPE\ font, special loader
+functions are provided.
+The size is kind of special, because we can have specifications like
+at 10pt
+at 3ex
+at \dimexpr\bodyfontsize+1pt\relax
+This means that we need to handle that on the \TEX\ side and pass the
+calculated value to the callback.
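By the time the callback is called, \TEX\ has already evaluated the dimension. As we understand the \LUATEX\ convention, a positive size is an \quote {at} size in scaled points. A negative size encodes a \quote {scaled} factor relative to the design size, where \type {-1000} means scaled 1000. The small helper below, whose name is hypothetical, turns both into scaled points.

```lua
-- Hypothetical helper: turn the size argument of the define_font callback
-- into scaled points. Positive: an "at" size in sp. Negative: a "scaled"
-- factor times -1 (-1000 means scaled 1000), relative to the design size.
local function resolve_size(size, designsize)
  if size < 0 then
    return math.floor(designsize * -size / 1000 + 0.5)
  end
  return size
end

local designsize = 10 * 65536 -- a 10pt design size in sp
```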
+Virtual fonts have a rather special nature. They permit you to define variations
+of fonts using other fonts and special (\DVI\ related) operators. However, from the
+perspective of \TEX\ itself they don't exist at all. When you create a virtual font
+you also end up with a \TFM\ file and \TEX\ only needs this file, which defines
+characters in terms of a width, height, depth and italic correction as well as
+associates characters with kerning pairs and ligatures. \TEX\ leaves it to the
+backend to deal with the actual glyphs and therefore the backend will be confronted
+with the internals of a virtual font. Because \PDFTEX\ and therefore \LUATEX\ has the
+backend built in, it is capable of handling virtual font information.
+In \LUATEX\ you can build your own virtual font and this will suit us well. It
+permits us for instance to complete fonts that lack certain characters (glyphs) and
+thereby let us get rid of ugly macro based fallback trickery. Although in \CONTEXT\
+we will provide a high level interface, we will give you a taste of \LUA\ here.
+callback.register("define_font", function(name,size)
+ if name == "demo" then
+ local f = font.read_tfm('texnansi-lmr10',size)
+ if f then
+ local capscale, digscale = 0.85, 0.75
+ f.name, f.type = name, 'virtual'
+ f.fonts = {
+ { name="texnansi-lmr10" , size=size },
+ { name="texnansi-lmss10", size=size*capscale },
+ { name="texnansi-lmtt10", size=size*digscale }
+ }
+ for k,v in pairs(f.characters) do
+ local chr = utf.char(k)
+ if chr:find("[A-Z]") then
+ v.width = capscale*v.width
+ v.commands = {
+ {"special","pdf: 1 0 0 rg"},
+ {"font",2}, {"char",k},
+ {"special","pdf: 0 g"}
+ }
+ elseif chr:find("[0-9]") then
+ v.width = digscale*v.width
+ v.commands = {
+ {"special","pdf: 0 0 1 rg"},
+ {"font",3}, {"char",k},
+ {"special","pdf: 0 g"}
+ }
+ else
+ v.commands = {
+ {"font",1}, {"char",k}
+ }
+ end
+ end
+ return f
+ end
+ end
+ return font.read_tfm(name,size)
+end )
+Here we define a virtual font that uses three real fonts and
+which font is used depends on the kind of character we're
+dealing with (in real-world situations we can best use the \MKIV\ function
+that tells what class a character belongs to). The \type {commands}
+table determines which glyphs come out and how. We use a bit of
+literal pdf code to color the special characters but generally color is
+not handled at the font level.
+This example can be used like:
+\font\test=demo \test
+Hi there, this is the first (number 1) example of playing with
+Virtual Fonts, some neat feature of \TeX, once you have access
+to it. For instance, we can misuse it to fill in gaps in fonts.
+During development of this mechanism, we decided to save some redundant
+loading by permitting ids in the fonts array:
+callback.register("define_font", function(name,size)
+ if name == "demo" then
+ local f = font.read_tfm('texnansi-lmr10',size)
+ if f then
+ local id = font.define(f)
+ local capscale, digscale = 0.85, 0.75
+ f.name, f.type = name, 'virtual'
+ f.fonts = {
+ { id=id },
+ { name="texnansi-lmss10", size=size*capscale },
+ { name="texnansi-lmtt10", size=size*digscale }
+ }
+ for k,v in pairs(f.characters) do
+ local chr = utf.char(k)
+ if chr:find("[A-Z]") then
+ v.width = capscale*v.width
+ v.commands = {
+ {"special","pdf: 1 0 0 rg"},
+ {"slot",2,k},
+ {"special","pdf: 0 g"}
+ }
+ elseif chr:find("[0-9]") then
+ v.width = digscale*v.width
+ v.commands = {
+ {"special","pdf: 0 0 1 rg"},
+ {"slot",3,k},
+ {"special","pdf: 0 g"}
+ }
+ else
+ v.commands = {
+ {"slot",1,k}
+ }
+ end
+ end
+ return f
+ end
+ end
+ return font.read_tfm(name,size)
+end )
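The second variant uses \type {slot} commands instead of separate \type {font} and \type {char} commands. As far as we know, \LUATEX\ treats a \type {slot} command with font index \type {n} and character \type {k} as shorthand for selecting font \type {n} and then typesetting character \type {k}. The helper below makes that equivalence explicit by expanding such commands. Its name is hypothetical.

```lua
-- Hypothetical helper: rewrite {"slot", n, k} into {"font", n}, {"char", k};
-- all other commands are passed on unchanged.
local function expand_slots(commands)
  local expanded = {}
  for _, command in ipairs(commands) do
    if command[1] == "slot" then
      expanded[#expanded + 1] = { "font", command[2] }
      expanded[#expanded + 1] = { "char", command[3] }
    else
      expanded[#expanded + 1] = command
    end
  end
  return expanded
end

-- the command list of an uppercase character in the example above
local commands = {
  { "special", "pdf: 1 0 0 rg" },
  { "slot", 2, 65 },
  { "special", "pdf: 0 g" },
}
local expanded = expand_slots(commands)
```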
+Hardwiring fontnames in callbacks this way does not deserve a prize and
+when possible we will provide better extension interfaces. Anyhow,
+in the experimental \CONTEXT\ code we used calls like this, where
+\type {demo} is an installed feature.
+\font\myfont = special@demo-1 at 12pt \myfont
+Hi there, this is the first (number 1) example of playing with Virtual Fonts,
+some neat feature of \TeX, once you have access to it. For instance, we can
+misuse it to fill in gaps in fonts.
+\typebuffer \start \getbuffer \par \stop
+Keep in mind that this is just an example. In practice we will not do such things
+at the font level but by manipulating \TEX's internals.
+While developing this functionality and especially when Taco was
+programming the backend functionality, we used more sane \MKIV\ code. Think
+of (still \LUA) definitions like:
+\ctxlua {
+ fonts.definers.methods.install("weird", {
+ { "copy-range", "lmroman10-regular" } ,
+ { "copy-char", "lmroman10-regular", 65, 66 } ,
+ { "copy-range", "lmsans10-regular", 0x0100, 0x01FF } ,
+ { "copy-range", "lmtypewriter10-regular", 0x0200, 0xFF00 } ,
+ { "fallback-range", "lmtypewriter10-regular", 0x0000, 0x0200 }
+ })
+}
+\typebuffer \getbuffer
+Again, this is not the final user interface, but it shows the
+direction we're heading. The result looks like:
+\font\test={myfont@weird} at 12pt \test
+\eacute \rcaron \adoublegrave \char65
+This shows up as:
+\start \getbuffer \stop
+Here the \type {@} tells the (new) \CONTEXT\ font handler what constructor
+should be used.
+Because some testers already have \XETEX\ font support files, we
+also support a \XETEX\ like definition syntax.
+f i fi ffi \crlf
+f i f\kern0pti f\kern0ptf\kern0pti \crlf
+\char64259 \space\char64256 \char105 \space \char102\char102\char105
+This gives:
+\start \getbuffer \stop
+We are quite tolerant with regards to this specification and will provide less
+dense methods as well. Of course we need to implement a whole bunch of
+features but we will do this in such a way that we give users full control.
+By now we've reached a stage where we can get rid of font encodings. We now
+have the full \UNICODE\ range available and no longer depend on the font
+encoding when we hyphenate. In a previous chapter we discussed the difference
+in size between formats.
+\NC \bf date \NC \bf luatex \NC \bf pdftex \NC \NR
+\NC 2006-10-23 \NC 3 135 568 \NC 7 095 775 \NC \NR
+\NC 2007-02-18 \NC 3 373 206 \NC 7 426 451 \NC \NR
+\NC 2007-02-19 \NC 3 060 103 \NC 7 426 451 \NC \NR
+The size of the formats has grown a bit due to a few more
+patterns and an extra preloaded encoding. But the \LUATEX\
+format shrinks by some 10\% (from 3 373 206 to 3 060 103 bytes)
+now that we can get rid of encoding support. Some support for
+encodings is still present, so that one can keep using the metric
+files that are installed (for instance in project related trees
+that have special fonts), although \AFM/\TYPEONE\ files or
+\OPENTYPE\ fonts will be used when available.
+A couple of years from now, we may throw away some \LUA\ code
+related to encodings.
+\TEX\ distributions tend to be rather large, both in terms of
+files and bytes. Fonts take most of the space. The merged
+\TEX Live 2007 trees contain some 60.000 files that take
+1.123 MBytes. Of this, 25.000 files concern fonts, totaling
+431 MBytes. A recent \CONTEXT\ distribution spans 1200 files and
+20 MBytes and a bit more when third party modules are taken into
+account. The fonts in \TEX Live are distributed as follows:
+\NC \bf format \NC \bf files \NC \bf bytes \NC \bf files (local) \NC \bf bytes (local) \NC \NR
+\NC AFM \NC 1.769 \NC 123.068.970 \NC 443 \NC 22.290.132 \NC \NR
+\NC TFM \NC 10.613 \NC 44.915.448 \NC 2.346 \NC 8.028.920 \NC \NR
+\NC VF \NC 3.798 \NC 6.322.343 \NC 861 \NC 1.391.684 \NC \NR
+\NC TYPE1 \NC 2.904 \NC 180.567.337 \NC 456 \NC 18.375.045 \NC \NR
+\NC TRUETYPE \NC 22 \NC 1.494.943 \NC \NC \NC \NR
+\NC OPENTYPE \NC 144 \NC 17.571.732 \NC \NC \NC \NR
+\NC ENC \NC 268 \NC 782.680 \NC \NC \NC \NR
+\NC MAP \NC 406 \NC 6.098.982 \NC 110 \NC 129.135 \NC \NR
+\NC OFM \NC 39 \NC 10.309.792 \NC \NC \NC \NR
+\NC OVF \NC 39 \NC 413.352 \NC \NC \NC \NR
+\NC OVP \NC 22 \NC 2.698.027 \NC \NC \NC \NR
+\NC SOURCE \NC 4.736 \NC 25.932.413 \NC \NC \NC \NR
+We omitted the more obscure file types. The last two columns show the
+numbers for one of my local font trees.
+In due time we will see a shift from \TYPEONE\ to \OPENTYPE\ and \TRUETYPE\
+files and because these fonts are more
+complete, they may take some more space. More important is that the \TEX\ specific
+font metric files will phase out and the fewer \TYPEONE\ fonts we have, the fewer \AFM\
+companions we need (\AFM\ files are not compressed and therefore relatively
+large). Mapping and encoding files can also go away.
+In \LUATEX\ we can do with fewer files, but the number of bytes may grow a bit
+depending on how much is cached (especially fonts). Anyhow, we can safely
+assume that a \LUATEX\ based distribution will carry fewer files and fewer
+bytes around.
+Do we need virtual fonts? Currently in \CONTEXT, when a font encoding is chosen, a
+fallback mechanism steps in as soon as a character is not in the encoding. So far,
+so good. But occasionally we run into a font that does not (completely) fit an
+encoding and we end up defining a non standard one. In traditional \TEX\
+a side effect of font encodings is that they relate to hyphenation. \CONTEXT\ can
+deal with that comfortably and multiple instances of the same set of hyphenation
+patterns can be loaded, but for custom encodings this is kind of cumbersome.
+In \LUATEX\ we have just one font encoding: \UNICODE. When \OPENTYPE\ fonts are used,
+we don't expect many problems related to missing glyphs, but you can bet on it that
+they will occur. This is where in \CONTEXT\ \MKIV\ fallbacks will be used and this
+will be implemented using virtual fonts. The advantage of using virtual fonts is that
+we still deal with proper characters and hyphenation will take place as expected. And
+since virtual fonts can be defined on the fly, we can be flexible in our implementation.
+We can think of generic fallbacks, not much different from macro based representations,
+or font specific ones, where we may even rely on \METAPOST\ for generating the glyphs.
+How do we define a fallback character? When building this mechanism I used the
+\quote {\textcent} as an example. A cent symbol is roughly defined as follows:
+local c = 0x0063 -- the character c, on which the cent symbol is based
+local t = table.fastcopy(g.characters[c]) -- mkiv function
+local s = fonts.constructors.scaled(g.fonts[1].size) -- mkiv function
+t.commands = {
+ {"push"},
+ {"slot", 1, c},
+ {"pop"},
+ {"right", .5*t.width},
+ {"down", .2*t.height},
+ {"rule", 1.4*t.height, .02*s}
+}
+-- the rule runs from .2 below the baseline up to 1.2 times the original height
+t.depth = 0.2*t.height
+t.height = 1.2*t.height
+Here, \type {g} is a loaded font (table) which has type \type {virtual}. The
+first font in the \type {fonts} array is the main font. What happens here
+is the following: we assign the characteristics of \quote {c} to the cent
+symbol (this includes kerning and dimensions) and then define a command
+sequence that draws the \quote {c} and a vertical rule through it.
+The real code is slightly more complicated because we need to take care of
+italic properties when applicable and because we have added some tracing too.
+While playing with this kind of things, it becomes clear what features are
+handy, and the reason that we now have a virtual command \type {comment} is
+that it permits us to implement tracing (using for instance color specials).
+ {\start
+ \font\test=#1\relax
+ \test
+ c\quad
+ \textcent\quad
+ \ruledhbox{c}\quad
+ \ruledhbox{\textcent}\quad
+ \scaron\quad
+ \eacute\quad
+ \adiaeresis\quad
+ \udiaeresis\quad
+ \char 465\quad
+ \char 463\quad
+ \char7685\quad
+ \stop
+ \blank}
+\TestLine {lmroman10-regular@demo-2 at 24pt}
+\TestLine {lmroman10-italic@demo-2 at 24pt}
+The previous lines are typeset using a specification similar to the one mentioned before.
+Without the fallbacks we get:
+\TestLine {lmroman10-regular at 24pt}
+\TestLine {lmroman10-italic at 24pt}
+And with normal (non forced fallbacks) it looks as follows. As it happens,
+this font has a cent symbol so no fallback is needed.
+\TestLine {lmroman10-regular@demo-3 at 24pt}
+\TestLine {lmroman10-italic@demo-3 at 24pt}
+The font definition callback intercepts the \type {demo-2} and a couple of
+chained \LUA\ functions make sure that characters missing in the font are
+replaced by fallbacks. In the case of missing composed characters, they are
+constructed from their components. In this particular example we have told
+the handler to assume that all composed characters are missing.
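The construction of a missing composed character from its components can be sketched as follows. The decomposition table is a hypothetical excerpt that uses spacing accents. Unicode's own decompositions use combining marks instead. The sketch also ignores the vertical placement of the accent. The real \MKIV\ code is more elaborate and, as said, can also be told to treat present characters as missing.

```lua
-- Hypothetical decomposition data: composed character -> base, spacing accent.
local decompositions = {
  [0x00E9] = { 0x0065, 0x00B4 }, -- eacute: e + acute accent
  [0x0161] = { 0x0073, 0x02C7 }, -- scaron: s + caron
}

-- Return the character if the font has it; otherwise build a virtual one that
-- draws the base and then the accent, centered horizontally over the base.
local function compose(characters, code)
  if characters[code] then
    return characters[code]
  end
  local parts = decompositions[code]
  if not parts then
    return nil
  end
  local base, accent = characters[parts[1]], characters[parts[2]]
  if not (base and accent) then
    return nil -- a component is missing as well, so no fallback this way
  end
  local shift = math.floor((base.width - accent.width) / 2 + 0.5)
  local chr = {
    width = base.width,
    height = math.max(base.height, accent.height),
    depth = base.depth,
    commands = {
      { "push" }, { "slot", 1, parts[1] }, { "pop" }, -- base, then back to the start
      { "right", shift }, { "slot", 1, parts[2] },    -- accent, centered
    },
  }
  characters[code] = chr
  return chr
end

-- dimensions in scaled points, made up for the example
local chars = {
  [0x65] = { width = 300000, height = 280000, depth = 0 },
  [0xB4] = { width = 200000, height = 460000, depth = 0 },
}
local eacute = compose(chars, 0xE9)
```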
+Traditional \TEX\ has been designed for speed and a small memory footprint. Today's
+implementations are considerably more generous with the amount of memory that
+you can use (hash, fonts, main memory, patterns, backend, etc). Depending
+on how complicated a document layout is, memory may run into tens of megabytes.
+Because \LUATEX\ is not only suitable for wide fonts, but also does away with some of
+the optimizations in the \TEX\ code that complicate extensions, it has a larger
+footprint than \PDFTEX. When implementing the \OPENTYPE\ font basics, we did quite
+some tests with respect to memory usage. Getting the numbers right is non trivial
+because the \LUA\ garbage collector is interfering. For instance, on my machine a
+test file with the regular \CONTEXT\ setup of Latin Modern fonts made \LUA\
+allocate 130 MB, while the same run on Taco's machine took 100 MB.
+When a font data table is constructed, it is handed over to \TEX, and turned into
+the internal font data structures. During the construction of that table at the
+\LUA\ end, \CONTEXT\ \MKIV\ disables the garbage collector. By doing this, the time
+needed to construct and scale a font can be halved. Curious about the amount of memory
+involved in passing such a table, I added the following piece of code:
+if type(fontdata) == "table" then
+ local s = statistics.luastate_bytes
+ local t = table.copy(fontdata)
+ local d = statistics.luastate_bytes-s
+ texio.write_nl(string.format("table memory footprint: %s",d))
+end
+It turned out that a Regular Latin Modern font (\OPENTYPE) takes around
+800 KB. However, more interesting was that by adding this snippet of test code,
+which duplicated the table in order to measure its size, the total memory footprint
+dropped to 100 MB (about the amount used on Taco's machine). This demonstrates
+that one should be very careful with drawing conclusions.
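Outside \CONTEXT\ the same measurement can be done with standard \LUA. There, \type {collectgarbage("count")} reports the memory in use in kilobytes, whereas \type {statistics.luastate_bytes} is a \CONTEXT\ helper. The sketch stops the collector while copying, for the reason given above: otherwise it may free memory in the middle of the measurement. The test font is made up, and \type {deepcopy} and \type {footprint} are hypothetical names.

```lua
-- Recursive copy, so that the copy really allocates new tables.
local function deepcopy(t)
  local copy = {}
  for k, v in pairs(t) do
    copy[k] = type(v) == "table" and deepcopy(v) or v
  end
  return copy
end

-- Bytes added to the Lua state by copying t; the collector is stopped so
-- that it cannot free memory during the measurement.
local function footprint(t)
  collectgarbage("collect")
  collectgarbage("stop")
  local before = collectgarbage("count")
  local copy = deepcopy(t)
  local after = collectgarbage("count")
  collectgarbage("restart")
  return (after - before) * 1024, copy
end

-- A fake font with a few hundred characters, each with one kern.
local fontdata = { characters = {} }
for code = 32, 400 do
  fontdata.characters[code] = { width = code * 10, height = 500, kerns = { [code + 1] = -20 } }
end

local bytes, copy = footprint(fontdata)
```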
+Because fonts are rather important in \TEX\ and because there can be lots of
+them used, it makes sense to keep an eye on memory as well as performance.
+Because many manipulations now take place in \LUA, it no longer makes sense
+to let \TEX\ buffer fonts. In plain \TEX\ one finds these magic
+lines. The second definition obscures the first, but the \type {cmr10} font stays loaded.
+\font\one=cmr10 at 10pt
+\font\two=cmr10 at 10pt
+These two definitions make \TEX\ load the font only once. However, since
+we can now delegate loading to \LUA, \TEX\ no longer helps us there. For instance,
+\TEX\ has no knowledge to what extent this \type {cmr10} font has been manipulated
+and therefore both instances may actually differ.
+When you use a callback to define the font, \TEX\ passes a font id number. You can
+use this number as a reference to a loaded font (that is, passed to \TEX). If
+instead of a table, you return a number, \TEX\ will reuse the already loaded font.
+This feature can save you a lot of time, especially when a macro package (like
+\CONTEXT) defines fonts dynamically which means that when grouping is used, fonts
+get (re)defined a lot. Of course additional caching can take place at the \LUA\ end,
+but there one needs to take into account more than just the scaled instance. Think of
+\OPENTYPE\ features or virtual font properties. The following are quite certainly
+different setups, in spite of the common size.
+\font\one=lmr10@demo-1 at 10pt
+\font\two=lmr10@demo-2 at 10pt
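Here is a minimal sketch of such caching at the \LUA\ end. It assumes that the cache key is the complete specification plus the size, so that the two definitions above get different ids. In \LUATEX\ the id would come from \type {font.define}. Here a hypothetical \type {fake_define} stands in for it so that the example runs anywhere, and \type {cached_define} is a made-up name as well.

```lua
local loaded = {}

-- Hypothetical wrapper around the real loading step. The key covers the whole
-- specification (features, virtual constructions) and the size, so that
-- different setups of the same file do not share an id.
local function cached_define(specification, size, define)
  local key = specification .. "@" .. size
  local id = loaded[key]
  if not id then
    local tfmdata = { name = specification, size = size } -- stand-in for real loading
    id = define(tfmdata)
    loaded[key] = id
  end
  return id -- returning a number lets TeX reuse an already defined font
end

-- stand-in for font.define: hands out consecutive ids
local count = 0
local function fake_define(tfmdata)
  count = count + 1
  return count
end

local a = cached_define("lmr10@demo-1", 655360, fake_define)
local b = cached_define("lmr10@demo-1", 655360, fake_define)
local c = cached_define("lmr10@demo-2", 655360, fake_define)
```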
+When scaling a font, one not only needs to handle the regular glyph dimensions, but also the
+kerning tables. We found out that dealing with such issues takes some 25\% of the time
+spent on loading Latin Modern fonts that have rather extensive kerning tables.
+When creating a virtual font, copying glyph tables may happen a lot. Deep copying
+tables takes a bit of time. This is one of the reasons why we discussed (and consider)
+some dedicated support functions so that copying and recalculating tables happens faster
+(less costly hash lookups and such). On the other hand, the time wasted on calculations
+(including rounding to scaled points) can be neglected.
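What scaling involves can be shown with a small sketch. Every glyph table is copied, and its dimensions and kerns are multiplied by the same factor and rounded to scaled points. The layout, with design units in \type {units} and kerns in \type {kerns[next]}, is an assumption modelled on \LUATEX\ font tables, and \type {scaled} is a hypothetical name. The kern loop is the part that grows with extensive kerning tables.

```lua
-- Scale a font given in design units to a size in scaled points. Every glyph
-- table is copied (the original stays untouched) and widths, heights, depths
-- and kerns are rounded to whole scaled points.
local function scaled(fontdata, size)
  local factor = size / fontdata.units
  local function scale(x)
    return math.floor(x * factor + 0.5)
  end
  local characters = {}
  for code, chr in pairs(fontdata.characters) do
    local copy = {
      width = scale(chr.width or 0),
      height = scale(chr.height or 0),
      depth = scale(chr.depth or 0),
    }
    if chr.kerns then
      copy.kerns = {}
      for nextcode, amount in pairs(chr.kerns) do
        copy.kerns[nextcode] = scale(amount)
      end
    end
    characters[code] = copy
  end
  return { size = size, characters = characters }
end

local design = {
  units = 1000,
  characters = {
    [0x54] = { width = 611, height = 662, kerns = { [0x65] = -80 } }, -- T, kerned with e
    [0x65] = { width = 444, height = 460 },                             -- e
  },
}
local tenpoint = scaled(design, 10 * 65536)
```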
+The following table shows what happens when we enforce a different
+garbage collecting scheme. This test was triggered by another experiment
+where at regular intervals, for instance after a page is shipped out, a full garbage collection is forced.
+However, such a complete sweep has drastic consequences for the runtime.
+But, since the memory footprint becomes 10--15\% less by doing so, we
+played a bit with
+collectgarbage("setstepmul", somenumber)
+When processing a not so large file but one that loads a bunch of \OPENTYPE\
+fonts, we get the following values. The left set is on Linux (Taco's machine)
+and the right set on mine.
+\NC \bf stepmul \NC \bf run (s) \NC \bf mem (MB) \NC \bf run (s) \NC \bf mem (MB) \NC \NR
+\NC 200 \NC 1.58 \NC 69.14 \NC 5.6 \NC 84.17 \NC \NR
+\NC 1000 \NC 1.63 \NC 69.14 \NC 6.5 \NC 72.32 \NC \NR
+\NC 2000 \NC 1.64 \NC 60.66 \NC 6.8 \NC 73.53 \NC \NR
+\NC 10000 \NC 1.71 \NC 59.94 \NC 7.0 \NC 72.30 \NC \NR
+Since I use an old laptop running Windows with a probably
+different \TEX\ configuration (fonts), and under some load, the two sets of columns
+don't compare well, but the general idea is the same. For practical usage
+a value of 1000 is probably best, especially because memory intensive font
+and script loading only happens during the first couple of pages.
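The experiment can be repeated in a rough form with plain \LUA. The workload below is a made-up stand-in for font loading. The numbers it prints depend on the machine and the \LUA\ version, so they cannot be compared with the table above. As far as we know, the \type {setstepmul} option is accepted by \LUA\ 5.1 up to 5.4, but its range and default differ between versions. The call is therefore wrapped in \type {pcall}.

```lua
-- Allocation-heavy stand-in for loading fonts: many small glyph tables.
local function workload()
  local glyphs = {}
  for i = 1, 20000 do
    glyphs[i] = { width = i, height = i, kerns = { [i % 256] = -i } }
  end
  return glyphs
end

-- Run the workload a few times under one step multiplier and report the
-- CPU time used and the largest memory use seen (in MB).
local function measure(stepmul)
  collectgarbage("collect")
  -- pcall keeps the sketch running on an interpreter that rejects the option
  pcall(collectgarbage, "setstepmul", stepmul)
  local start = os.clock()
  local peak = 0
  for _ = 1, 5 do
    workload()
    peak = math.max(peak, collectgarbage("count"))
  end
  return os.clock() - start, peak / 1024
end

local results = {}
for _, stepmul in ipairs({ 200, 1000, 2000, 10000 }) do
  local seconds, megabytes = measure(stepmul)
  results[#results + 1] = { stepmul = stepmul, seconds = seconds, megabytes = megabytes }
  print(string.format("%6d  %.2f s  %.2f MB", stepmul, seconds, megabytes))
end
```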