


% language=uk

\usemodule[virtual]

\startcomponent mk-fonts

\environment mk-environment

\chapter{A fresh look at fonts}

\subject{readers}

Now that we have the file system, \LUA\ script integration, input
encoding and basic logging in place, we have arrived at fonts.
Although today \OPENTYPE\ fonts are the fashion, we still need to
deal with \TEX's native font machinery. Although Latin Modern and
the \TEX\ Gyre collection will bring us many free \OPENTYPE\
fonts, we can be sure that for a long time \TYPEONE\ variants will
be used as well, and when one has lots of bought fonts, replacing
them with \OPENTYPE\ updates is not always an option. And so,
reimplementing the readers for \TEX\ Font Metrics (\type {tfm}
files) and Virtual Fonts (\type {vf} files), was the first step.

Because \ALEPH\ font handling was integrated already, Taco decided
to combine the \TFM\ and \OFM\ readers into a new one. The
combined loader is written in C and produces tables that are
accessible from within \LUA. A problem is that once a font is
used, one cannot simply change its metrics. So, we have to make
sure that we apply changes before a font is actually used:

\starttyping
\font\test=texnansi-lmr at 31.415 pt
\test Yet another nice Kate Bush song: Pi
\stoptyping

In this example, any change to the font metrics has to be done before
\type {test} is invoked. For this purpose the \type {define_font}
callback is provided. Below you see an experimental overload:

\starttyping
callback.register("define_font", function (name,area,size)
    return fonts.patches.process(font.read_tfm(name,size))
end )
\stoptyping

The \type {fonts.patches.process} function (currently in \CONTEXT\
\MKIV) implements a mechanism for tweaking the font parameters in
between. In order to get an idea of further features we played a
bit with ligature replacement, character spacing, kern tweaking
etc. Think of such a function (or a chain of functions) doing
things similar to:

\starttyping
callback.register("define_font", function (name,area,size)
    local tfmblob = font.read_tfm(name,size) -- built-in loader
    tfmblob.characters[string.byte("f")].ligatures = nil
    return tfmblob -- data structure that TeX will use internally
end )
\stoptyping

Of course the above definition is not complete, if only because we
need to handle chained ligatures as well (fl followed by i).

In practice we prefer a more abstract interface (at the macro
level) but the idea stays the same. Interesting is that having
access to the internals this way already makes our \TEX\ Live more
interesting. (We cannot demonstrate this trickery here because
when this document is processed you cannot be sure if the
experimental interface is still in place.)

When playing with this we ran into problems with file searching.
When performing the backend role, \LUATEX\ will look in the \TEX\
tree if there is a corresponding virtual file. It took a while and
a bit of tracing (which is not that hard in the \LUA\ based
reader) to figure out that the omega related path definitions in
\type {texmf.cnf} files were not correct, something that went
unnoticed because omega never had a backend integrated and the
\DVI\ processors did multiple searches to get around this.

Currently, if you want to enable extensive tracing of file
searching and loading, you can set an environment variable:

\starttyping
MTX.INPUT.TRACE=3
\stoptyping

This will produce a lot of information about which file is asked
for, which type (tex, font, etc.) determines the search, along which
paths the search takes place, which readers and locators are used
(file, zip, protocol), etc.

\subject{AFM}

While Taco implemented the virtual font reader |<|eventually its
data will be merged with the \TFM\ table|>| I started playing with
constructing \TFM\ tables directly. Because \CONTEXT\ has a rather
systematic naming scheme, we can rather easily see which encoding
we are dealing with. This means that in principle we can throw all
encoded \TFM\ files out of our tree and construct the tables using
the \AFM\ file and an encoding vector.

It took us a good day to figure out the details, but in the end we
were able to trick \LUATEX\ into using \AFM\ files. With a bit of
internal caching it was even reasonably fast. When the basic
conversion mechanism was written we tried to compare the results
with existing \TFM\ metrics as generated by \type {afm2tfm} and
\type {afm2pl}. Doing so was less trivial than we first thought.
To mention a few aspects:

\startitemize[packed]
\item heights and depths have a limited number of values in \TEX
\item we need to convert to \TEX's scaled points
\item rounding errors of one scaled point occur
\item \type {afm2tfm} can only add kerns when virtual fonts are used
\item \type {afm2tfm} adds some extra ligatures and also does some
      kern magic
\item \type {afm2pl} adds even more kerns
\item the tools remove kern pairs between digits
\stopitemize

In this perspective we need not be too picky on what exactly a
ligature is. An example of a ligature is \type {fi} and such a
character can be in the font. In the \TFM\ file, the definition of
\type {f} contains information about what to do when it's followed
by an \type {i}: it has to insert a reference (character number)
pointing to the fi glyph.

However, because \TEX\ was written in \ASCII\ time space, there
was a problem of how to get access to for instance the Spanish
quotation and exclamation marks. Here the ligature mechanism
available in the \TFM\ format was misused in the sense that a
combination of \type {exclam} and \type {quoteleft} becomes \type
{exclamdown}. In a similar fashion two single quotes become a
double quote. And every \TEX ie knows that multiple hyphens
combine into -- (endash) and --- (emdash), where the latter one is
achieved by defining a ligature between an endash and a hyphen.
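
The chaining can be illustrated with a small sketch in plain \LUA. The
table layout and the function name are made up for this example and do
not reflect the \TFM\ format or the \LUATEX\ internals; the glyph names
follow the usual Adobe conventions.

\starttyping
-- Hypothetical ligature table: ligatures[first][second] = replacement.
local ligatures = {
    exclam     = { quoteleft  = "exclamdown"    },
    quoteright = { quoteright = "quotedblright" },
    hyphen     = { hyphen     = "endash"        },
    endash     = { hyphen     = "emdash"        },
}

-- Walk over a list of glyph names and replace pairs from left to right.
-- A replacement becomes the new "previous" glyph, so chains just work.
local function apply_ligatures(glyphs)
    local result = { }
    for i = 1, #glyphs do
        local current     = glyphs[i]
        local previous    = result[#result]
        local entry       = previous and ligatures[previous]
        local replacement = entry and entry[current]
        if replacement then
            result[#result] = replacement
        else
            result[#result+1] = current
        end
    end
    return result
end
\stoptyping

Feeding three \type {hyphen} glyphs to this function first produces an
\type {endash} and then, because the endash itself has a ligature with
a hyphen, an \type {emdash}.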

Of course we have to deal with conversions from \AFM\ units (1000
per em) to \TEX's scaled points. Such conversions may be sensitive
to rounding errors. Because we noticed differences of one scaled
point, I tried several strategies to get the results consistent
but so far I didn't manage to find out where these differences
come from. Rounding errors seem to be rather random and I have no
clue what strategy the regular converters follow. Another fuzzy
area is the font parameters (visible as font dimensions for
users): I wonder how many users really know what values are used
and why.
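
To make the rounding issue concrete, here is a minimal sketch in plain
\LUA\ of two obvious conversion strategies. It is not taken from the
actual reader: the function names are hypothetical and the only
assumption is the usual 65536 scaled points per point. It is also not a
claim about what the regular converters do.

\starttyping
-- Convert AFM units (1000 per em) to scaled points for a given
-- design size in points, assuming 1pt = 65536sp.

local function afm_to_sp_rounded(units, size_in_pt)
    return math.floor(units * size_in_pt * 65536 / 1000 + 0.5)
end

local function afm_to_sp_truncated(units, size_in_pt)
    return math.floor(units * size_in_pt * 65536 / 1000)
end

-- The difference between both strategies for one width.
local function difference(units, size_in_pt)
    return afm_to_sp_rounded  (units, size_in_pt)
         - afm_to_sp_truncated(units, size_in_pt)
end
\stoptyping

For a width of 500 units at 10pt both strategies agree, but for a width
of 333 units the exact value has a fractional part of .88, so that
rounding and truncating end up one scaled point apart.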

You may wonder to what extent this rounding problem will influence
consistent typesetting. We have no reason to assume that the
rounding error is operating system dependent. This leaves the
different methods used and personally I have no problems with the
direct reader being not 100\% compatible with the regular tools.
First of all it's an illusion to think that \TEX\ distributions
are stable over the years. Fonts and conversion tools are being
updated every now and then, and metrics change over time (apart
from Computer Modern which is stable by definition). Also, pattern
files are updated, so paragraphs may be broken into lines differently
anyway. If you really want stability, then you need to store the
fonts and patterns with your document.

As we already mentioned, the regular converter programs add kerns
as well. Treating common glyph shapes similarly is not uncommon in
\CONTEXT\ so I decided to provide methods for adding \quote
{missing} kerns. For example, with regard to kerning, we can
treat \type {eacute} the same way as an~\type {e}. Some ligatures,
like \type {ae} or \type {fi}, need to be seen from two sides:
when looked at from the left side they resemble an \type {a} and
\type {f}, but when kerned at their right, they are to be treated
as \type {e} and \type {i}.
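
A minimal sketch of this idea in plain \LUA\ follows. The layout of the
kern table (\type {kerns[first][second]}), the \type {shapes} table and
the function name are assumptions made for this example; the real
\MKIV\ code is organized differently.

\starttyping
-- How a glyph looks when seen from its left and from its right side.
local shapes = {
    eacute = { left = "e", right = "e" },
    ae     = { left = "a", right = "e" },
    fi     = { left = "f", right = "i" },
}

local function add_missing_kerns(kerns, shapes)
    -- The glyph comes first in a pair: its right side matters, so we
    -- borrow the kerns that the right-side lookalike has.
    for glyph, shape in pairs(shapes) do
        local base = kerns[shape.right]
        if base then
            local own = kerns[glyph] or { }
            for second, amount in pairs(base) do
                if own[second] == nil then
                    own[second] = amount
                end
            end
            kerns[glyph] = own
        end
    end
    -- The glyph comes second in a pair: its left side matters, so we
    -- borrow the kern that the first glyph has with the lookalike.
    for first, list in pairs(kerns) do
        for glyph, shape in pairs(shapes) do
            local amount = list[shape.left]
            if amount and list[glyph] == nil then
                list[glyph] = amount
            end
        end
    end
end
\stoptyping

Existing kerns are never overwritten, so kerns that the font designer
did provide for accented characters take precedence.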

So, when all this is taken care of, we will have a reasonably
robust and compatible way to deal with \AFM\ files and when this
variant is enabled, we can prune our \TEX\ trees pretty well.
Also, now that we have font related tables, we can start moving
tables built out of \TEX\ macros (think of protruding and hz) to
\LUA, which will not only save us many hash entries but also
permit faster implementations.

The question may arise why there is no hard coded \AFM\ reader.
Although some speed up can be achieved by reading the table with
\AFM\ data directly, there would still be the issue of making that
table accessible for manipulations as described (costs time too).
\AFM\ files are human readable, contrary to \TFM\ files,
and can therefore conveniently be processed by \LUA. Also,
the possible manipulations may differ per macro package, user, and
even document. The chances of users and developers reaching an
agreement about such issues are near zero. By writing the reader in
\LUA, a macro package writer can also implement caching mechanisms
that suit the package. Also, keep in mind that we often only need
to load about four \AFM\ files or a few more when we mix fonts.

In my main tree (regular distributions) there are some 350 files
in \type {texnansi} encoding that take over 2~MByte. My personal
font tree has over a thousand such entries which means that we can
prune the tree considerably when we use the \AFM\ loader. Why
bother about \TFM\ when \AFM\ can do the job?

In order to reduce the overhead in reading the \AFM\ file, we now
use external caching, which (in \CONTEXT\ \MKIV) boils down to
serializing the internal \AFM\ tables and compiling them to
bytecode. As a result, the runtime becomes comparable to a run
using regular \TFM\ files. On this document, using the \AFM\ reader
(cached) takes some 0.3 seconds more on a total of 8 seconds (28 pages
in Optima Nova with a couple of graphics).
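
The principle of such a cache can be sketched in a few lines of plain
\LUA. This is only an illustration under simplifying assumptions: the
function names are hypothetical, the serializer handles just strings,
numbers, booleans and nested tables, and the \MKIV\ code is more
elaborate. The bytecode string is what one would write to a cache file.

\starttyping
-- Turn a table into Lua source code (limited to simple values).
local function serialize(value)
    local kind = type(value)
    if kind == "table" then
        local parts = { }
        for k, v in pairs(value) do
            local key = type(k) == "string" and string.format("[%q]", k)
                     or "[" .. tostring(k) .. "]"
            parts[#parts+1] = key .. "=" .. serialize(v)
        end
        return "{" .. table.concat(parts, ",") .. "}"
    elseif kind == "string" then
        return string.format("%q", value)
    else
        return tostring(value) -- numbers and booleans
    end
end

local loader = loadstring or load -- Lua 5.1 versus later versions

-- Compile the serialized table to bytecode.
local function to_bytecode(t)
    return string.dump(loader("return " .. serialize(t)))
end

-- Running the bytecode recreates the table without parsing the source.
local function from_bytecode(bytecode)
    return loader(bytecode)()
end
\stoptyping

Keep in mind that bytecode is specific to the \LUA\ version that
produced it, so a cache like this has to be regenerated when the
interpreter changes.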

While we were playing with this, Hermann Zapf surprised me by
sending me a \CD\ with his marvelous new Palatino Sans. So,
instead of generating \TFM\ metrics, I decided to use \type
{ttf2afm} to generate me an \AFM\ file from the \TRUETYPE\ files
and use these metrics. It worked right out of the box which means
that one can copy a set of font files directly from the source to
the tree. In a demo document the Palatino Sans came out quite well
and so we will use this font to explore the upcoming Open Type
features.

Because we now have fewer font resources (only two files per font)
we decided to get away from the spread||all||over||the||tree
paradigm. For this we introduced

\starttyping
../fonts/data/vendor/collection
\stoptyping

like:

\starttyping
../fonts/data/tex/latin-modern
../fonts/data/tex-gyre/bonum
../fonts/data/linotype/optima-nova
../fonts/data/linotype/palatino-nova
../fonts/data/linotype/palatino-sans
\stoptyping

Of course one needs to adapt the related font paths in the
configuration files but getting that done in \TEX\ distributions is
another story.

\subject{map files}

Reading an \AFM\ file is only part of the game. Because we bypass
the regular \TFM\ reader we may internally end up with different
names of fonts (and|/|or files). This also means that the map
files that map an internal name onto a font (outline) file may be
of no use. The map file also specifies the encoding file which
maps character numbers onto names used in font files.

The map file maps a font name to a (preferably outline) font
resource file. This can be a file with suffix \type {pfb}, \type
{ttf}, \type {otf} or alike. When we convert an \AFM\ file into a
more suitable format, we also store the associated (outline)
filename, which we use later when we assemble the map line data (we
use \type {\pdfmapline} to tell \LUATEX\ how to prepare and embed
a file).

Eventually \LUATEX\ will take care of all these issues itself
thereby rendering map files and encoding files kind of useless.
When loading an \AFM\ file we already have to read encoding files,
so we have all the information available that normally goes into
the map file. While conducting experiments with reading \AFM\
files, we therefore could use the \type {\pdfmapline} primitive to
push the right entries into the font inclusion machinery. Because
\CONTEXT\ already handles map data itself we could easily hook
this into the normal handlers for that. (There are some nasty
synchronization issues involved in handling map entries in general
but we will not bother you with that now).

Although eventually we may get rid of map files, we also used the
general map file handling in \CONTEXT\ as a playground for the
\XML\ handler that we wrote in \LUA. Playing with many map files
(a few KBytes) coded in \XML\ format, or with one big map file
(easily 800 MBytes) makes a good test case for loading and dumping.

But why bother too much about map files in \LUATEX\ \unknown\ they
will go away anyway.

\subject{OTF \& TTF}

One of the reasons for starting the \LUATEX\ development was that we wanted to
be able to use \OPENTYPE\ (and \TRUETYPE) fonts in \PDFTEX. As a prelude (and kind of
transition) we first dealt with \TYPEONE\ using either \TFM\ or \AFM. For \TEX\ it does
not really matter what font is used, it only deals with dimensions and generic
characteristics. Of course, when fonts offer more advanced possibilities, we may
need more features in the \TEX\ kernel, but think of \HZ\ or protruding as provided
by \PDFTEX: it's not part of the font (specification) but of the engine. The same
is actually true for kerning and ligature building, although here the font (data) may
provide the information needed to deal with it properly.

\OPENTYPE\ fonts come with features. Examples of features are using oldstyle figures or
tabular digits instead of the default ones. Dealing with such issues boils down to
replacing one character representation by another or treating combinations of characters
in the input differently depending on the circumstances. There can be relationships
between languages and scripts, but, as \TEX ies know, other relationships exist as well,
for instance between content and visualization.

Therefore, it will be no surprise that \LUATEX\ does not simply implement the \OPENTYPE\
specification as such. On the one hand it implements a way to load information stored
in the font, on the other hand it implements mechanisms to fulfil the demands of such
fonts and more. The glue between both is done with \LUA. In the simple case of ligatures
and kerns this goes as follows. A user (or macro package) specifies a font, and this
call can be intercepted using a callback. This callback can use a built-in function that
loads an \OTF\ or \TTF\ font. From the resulting table, a font table is constructed that is
passed on to \TEX. The construction may involve building ligature and kerning tables using
the information present in the font file, but it may as well mean more. So, given a bare
\LUATEX\ system, \OPENTYPE\ font support does not automatically give you handling of
features, or more precisely, there is no hard coded support for features.

This may sound like a disadvantage
but as soon as you start looking at how \TEX\ users use their system (in most cases
by using a macro package) you may understand that flexibility is larger this way. Instead
of adding more and more control and exceptions, and thereby making the kernel more
unstable and complex, we delegate control to the macro package. The advantage is that
there are no (everlasting) discussions on how to deal with things and in the end the
user will use a high level interface anyway. Of course the macro package needs proper
access to the font's internals, but this is provided: the code used for reading in the
data comes from FontForge (an advanced font editor) and is presented via \LUA\ tables
in a well organized way.

Given that users expect \OPENTYPE\ features to be supported, how do we provide an
interface? In \CONTEXT\ the user interface has always been an important aspect and
consistency is a priority. On the other hand, there has been the tradition of specifying
the size explicitly and a new custom introduced by \XETEX\ to enhance the fontname
with directives. Traditional \TEX\ provides:

\starttyping
\font \name filename [optional size]
\stoptyping

\XETEX\ accepts

\starttyping
\font \name "fontname[:optional features]" [optional size]
\font \name  fontname[:optional features]  [optional size]
\stoptyping

Instead of a fontname one can pass a filename between square brackets. \LUATEX\
handles:

\starttyping
\font \name  anything  [optional size]
\font \name {anything} [optional size]
\stoptyping

where anything as well as the size are passed on to the callback.

This permits us to implement a traditional specification, support \XETEX\ like
definitions, and easily pass information from a macro package down to the
callback as well. Interpreting anything is done in \LUA.

While implementing the \LUA\ side of the loader we took an approach similar
to that of the \AFM\ reader: we cache intermediate tables and keep track
of font names (in addition to filenames). In order to be able to quickly
determine the (internal) font name of an \OPENTYPE\ font, special loader
functions are provided.

The size is kind of special, because we can have specifications like

\starttyping
at 10pt
at 3ex
at \dimexpr\bodyfontsize+1pt\relax
\stoptyping

This means that we need to handle that on the \TEX\ side and pass the
calculated value to the callback.

Virtual fonts have a rather special nature. They permit you to define variations
of fonts using other fonts and special (\DVI\ related) operators. However, from the
perspective of \TEX\ itself they don't exist at all. When you create a virtual font
you also end up with a \TFM\ file and \TEX\ only needs this file, which defines
characters in terms of a width, height, depth and italic correction as well as
associates characters with kerning pairs and ligatures. \TEX\ leaves it to the
backend to deal with the actual glyphs and therefore the backend will be confronted
with the internals of a virtual font. Because \PDFTEX\ and therefore \LUATEX\ have the
backend built in, they are capable of handling virtual font information.

In \LUATEX\ you can build your own virtual font and this will suit us well. It
permits us for instance to complete fonts that lack certain characters (glyphs) and
thereby let us get rid of ugly macro based fallback trickery. Although in \CONTEXT\
we will provide a high level interface, we will give you a taste of \LUA\ here.

\starttyping
callback.register("define_font", function(name,size)
    if name == "demo" then
        local f = font.read_tfm('texnansi-lmr10',size)
        if f then
            local capscale, digscale = 0.85, 0.75
            f.name, f.type = name, 'virtual'
            f.fonts = {
                { name="texnansi-lmr10" , size=size },
                { name="texnansi-lmss10", size=size*capscale },
                { name="texnansi-lmtt10", size=size*digscale }
            }
            for k,v in pairs(f.characters) do
                local chr = utf.char(k)
                if chr:find("[A-Z]") then
                    v.width = capscale*v.width
                    v.commands = {
                        {"special","pdf: 1 0 0 rg"},
                        {"font",2}, {"char",k},
                        {"special","pdf: 0 g"}
                    }
                elseif chr:find("[0-9]") then
                    v.width  = digscale*v.width
                    v.commands = {
                        {"special","pdf: 0 0 1 rg"},
                        {"font",3}, {"char",k},
                        {"special","pdf: 0 g"}
                    }
                else
                    v.commands = {
                        {"font",1}, {"char",k}
                    }
                end
            end
            return f
        end
    end
    return font.read_tfm(name,size)
end)
\stoptyping

Here we define a virtual font that uses three real fonts and
which font is used depends on the kind of character we're
dealing with (in real world situations we can best use the \MKIV\ function
that tells what class a character belongs to). The \type {commands}
table determines which glyphs come out in what way. We use a bit of
literal pdf code to color the special characters but generally color is
not handled at the font level.

This example can be used like:

\starttyping
\font\test=demo \test
Hi there, this is the first (number 1) example of playing with
Virtual Fonts, some neat feature of \TeX, once you have access
to it. For instance, we can misuse it to fill in gaps in fonts.
\stoptyping

During development of this mechanism, we decided to save some redundant
loading by permitting id's in the fonts array:

\starttyping
callback.register("define_font", function(name,size)
    if name == "demo" then
        local f = font.read_tfm('texnansi-lmr10',size)
        if f then
            local id = font.define(f)
            local capscale, digscale = 0.85, 0.75
            f.name, f.type = name, 'virtual'
            f.fonts = {
                { id=id },
                { name="texnansi-lmss10", size=size*capscale },
                { name="texnansi-lmtt10", size=size*digscale }
            }
            for k,v in pairs(f.characters) do
                local chr = utf.char(k)
                if chr:find("[A-Z]") then
                    v.width = capscale*v.width
                    v.commands = {
                        {"special","pdf: 1 0 0 rg"},
                        {"slot",2,k},
                        {"special","pdf: 0 g"}
                    }
                elseif chr:find("[0-9]") then
                    v.width  = digscale*v.width
                    v.commands = {
                        {"special","pdf: 0 0 1 rg"},
                        {"slot",3,k},
                        {"special","pdf: 0 g"}
                    }
                else
                    v.commands = {
                        {"slot",1,k}
                    }
                end
            end
            return f
        end
    end
    return font.read_tfm(name,size)
end)
\stoptyping

Hardwiring fontnames in callbacks this way does not deserve a prize and
when possible we will provide better extension interfaces. Anyhow,
in the experimental \CONTEXT\ code we used calls like this, where
\type {demo} is an installed feature.

\startbuffer
\font\myfont = special@demo-1 at 12pt \myfont
Hi there, this is the first (number 1) example of playing with Virtual Fonts,
some neat feature of \TeX, once you have access to it. For instance, we can
misuse it to fill in gaps in fonts.
\stopbuffer

\typebuffer \start \getbuffer \par \stop

Keep in mind that this is just an example. In practice we will not do such things
at the font level but by manipulating \TEX's internals.

While developing this functionality and especially when Taco was
programming the backend functionality, we used more sane \MKIV\ code. Think
of (still \LUA) definitions like:

\startbuffer
\ctxlua {
    fonts.definers.methods.install("weird", {
        { "copy-range",     "lmroman10-regular"                      } ,
        { "copy-char",      "lmroman10-regular",          65,     66 } ,
        { "copy-range",     "lmsans10-regular",       0x0100, 0x01FF } ,
        { "copy-range",     "lmtypewriter10-regular", 0x0200, 0xFF00 } ,
        { "fallback-range", "lmtypewriter10-regular", 0x0000, 0x0200 }
    })
}
\stopbuffer

\typebuffer \getbuffer

Again, this is not the final user interface, but it shows the
direction we're heading. The result looks like:

\startbuffer
\font\test={myfont@weird} at 12pt \test
\eacute \rcaron \adoublegrave \char65
\stopbuffer

\typebuffer

This shows up as:

\start \getbuffer \stop

Here the \type {@} tells the (new) \CONTEXT\ font handler what constructor
should be used.

Because some testers already have \XETEX\ font support files, we
also support a \XETEX\ like definition syntax.

\startbuffer
\font\test={lmroman10-regular:dlig;liga}\test
f i fi ffi \crlf
f i f\kern0pti f\kern0ptf\kern0pti \crlf
\char64259 \space\char64256 \char105 \space \char102\char102\char105
\stopbuffer

\typebuffer

This gives:

\start \getbuffer \stop

We are quite tolerant with regard to this specification and will provide less
dense methods as well. Of course we need to implement a whole bunch of
features but we will do this in such a way that we give users full control.
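
As an illustration of how such a specification could be taken apart on
the \LUA\ side, here is a minimal sketch. It is not the \CONTEXT\
implementation: the function name and the returned fields are
hypothetical, and only the two forms shown above are handled (a name
followed by \type {@} and a constructor, or a name followed by a colon
and a semicolon separated list of features).

\starttyping
-- Split a font specification into a name, an optional constructor
-- (method) and a list of features.
local function parse_specification(specification)
    local result = { name = specification, features = { } }
    -- form 1: name@method
    local name, method = specification:match("^(.-)@(.+)$")
    if name then
        result.name, result.method = name, method
        return result
    end
    -- form 2: name:feature;feature;...
    local base, list = specification:match("^(.-):(.+)$")
    if base then
        result.name = base
        for feature in list:gmatch("[^;]+") do
            result.features[#result.features+1] = feature
        end
    end
    -- otherwise the whole string is taken to be the name
    return result
end
\stoptyping

With this function the string \type {lmroman10-regular:dlig;liga}
yields the name \type {lmroman10-regular} and the features \type {dlig}
and \type {liga}, while \type {myfont@weird} yields the name \type
{myfont} and the constructor \type {weird}.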

\subject{encodings}

By now we've reached a stage where we can get rid of font encodings. We now
have the full unicode range available and no longer depend on the font
encoding when we hyphenate. In a previous chapter we discussed the difference
in size between formats.

\starttabulate[|c|c|c|c|c|]
\NC \bf date   \NC \bf luatex \NC \bf pdftex \NC \NR
\NC 2006-10-23 \NC 3 135 568  \NC 7 095 775  \NC \NR
\NC 2007-02-18 \NC 3 373 206  \NC 7 426 451  \NC \NR
\NC 2007-02-19 \NC 3 060 103  \NC 7 426 451  \NC \NR
\stoptabulate

The size of the formats has grown a bit due to a few more
patterns and an extra preloaded encoding. But the \LUATEX\
format shrinks some 10\% now that we can get rid of encoding
support. Some support for encodings is still present, so that
one can keep using the metric files that are installed (for
instance in project related trees that have special fonts)
although \AFM/\TYPEONE\ files or \OPENTYPE\ fonts will be used when
available.

A couple of years from now, we may throw away some \LUA\ code
related to encodings.

\subject{files}

\TEX\ distributions tend to be rather large, both in terms of
files and bytes. Fonts take most of the space. The merged
\TEX Live 2007 trees contain some 60.000 files that take
1.123 MBytes. Of this, 25.000 files concern fonts totaling
to 431 MBytes. A recent \CONTEXT\ distribution spans 1200 files and
20 MBytes and a bit more when third party modules are taken into
account. The fonts in \TEX Live are distributed as follows:

\starttabulate[|l|r|r|r|r|]
\HL
\NC \bf format \NC \bf files \NC \bf bytes \NC \bf local files \NC \bf local bytes \NC \NR
\HL
\NC AFM      \NC  1.769 \NC 123.068.970 \NC    443 \NC 22.290.132 \NC \NR
\NC TFM      \NC 10.613 \NC  44.915.448 \NC  2.346 \NC  8.028.920 \NC \NR
\NC VF       \NC  3.798 \NC   6.322.343 \NC    861 \NC  1.391.684 \NC \NR
\NC TYPE1    \NC  2.904 \NC 180.567.337 \NC    456 \NC 18.375.045 \NC \NR
\NC TRUETYPE \NC     22 \NC   1.494.943 \NC        \NC            \NC \NR
\NC OPENTYPE \NC    144 \NC  17.571.732 \NC        \NC            \NC \NR
\NC ENC      \NC    268 \NC     782.680 \NC        \NC            \NC \NR
\NC MAP      \NC    406 \NC   6.098.982 \NC    110 \NC    129.135 \NC \NR
\NC OFM      \NC     39 \NC  10.309.792 \NC        \NC            \NC \NR
\NC OVF      \NC     39 \NC     413.352 \NC        \NC            \NC \NR
\NC OVP      \NC     22 \NC   2.698.027 \NC        \NC            \NC \NR
\NC SOURCE   \NC  4.736 \NC  25.932.413 \NC        \NC            \NC \NR
\HL
\stoptabulate

We omitted the more obscure file types. The last two columns show the
numbers for one of my local font trees.

In due time we will see a shift from \TYPEONE\ to \OPENTYPE\ and \TRUETYPE\
files and because these fonts are more
complete, they may take some more space. More important is that the \TEX\ specific
font metric files will phase out and the fewer \TYPEONE\ fonts we have, the fewer \AFM\
companions we need (\AFM\ files are not compressed and therefore relatively
large). Mapping and encoding files can also go away.

In \LUATEX\ we can do with fewer files, but the number of bytes may grow a bit
depending on how much is cached (especially fonts). Anyhow, we can safely
assume that a \LUATEX\ based distribution will carry fewer files and fewer
bytes around.

\subject{fallbacks}

Do we need virtual fonts? Currently in \CONTEXT, when a font encoding is chosen, a
fallback mechanism steps in as soon as a character is not in the encoding. So far,
so good. But occasionally we run into a font that does not (completely) fit an
encoding and we end up with defining a non standard one. In traditional \TEX\
a side effect of font encodings is that they relate to hyphenation. \CONTEXT\ can
deal with that comfortably and multiple instances of the same set of hyphenation
patterns can be loaded, but for custom encodings this is kind of cumbersome.

In \LUATEX\ we have just one font encoding: \UNICODE. When \OPENTYPE\ fonts are used,
we don't expect many problems related to missing glyphs, but you can bet on it that
they will occur. This is where in \CONTEXT\ \MKIV\ fallbacks will be used and this
will be implemented using virtual fonts. The advantage of using virtual fonts is that
we still deal with proper characters and hyphenation will take place as expected. And
since virtual fonts can be defined on the fly, we can be flexible in our implementation.
We can think of generic fallbacks, not much different than macro based representations,
or font specific ones, where we even may rely on \METAPOST\ for generating the glyph
data.

How do we define a fallback character? When building this mechanism I used the
\quote {\textcent} as an example. A cent symbol is roughly defined as follows:

\starttyping
local t = table.fastcopy(g.characters[0x0063]) -- mkiv function
local s = fonts.constructors.scaled(g.fonts[1].size)    -- mkiv function
t.commands = {
    {"push"},
    {"slot", 1, 0x0063},             -- draw the c from the main font
    {"pop"},                         -- and go back to the origin
    {"right", .5*t.width},
    {"down",  .2*t.height},
    {"rule", 1.4*t.height, .02*s}    -- vertical stroke through the c
}
t.depth  = 0.2*t.height              -- set before the height is changed
t.height = 1.2*t.height
\stoptyping

Here, \type {g} is a loaded font (table) which has type \type {virtual}. The
first font in the \type {fonts} array is the main font. What happens here
is the following: we assign the characteristics of \quote {c} to the cent
symbol (this includes kerning and dimensions) and then define a command
sequence that draws the \quote {c} and a vertical rule through it.

The real code is slightly more complicated because we need to take care of
italic properties when applicable and because we have added some tracing too.
While playing with these kinds of things, it becomes clear what features are
handy, and the reason that we now have a virtual command \type {comment} is
that it permits us to implement tracing (using for instance color specials).

\def\TestLine#1%
  {\start
   \font\test=#1\relax
   \test
   c\quad
   \textcent\quad
   \ruledhbox{c}\quad
   \ruledhbox{\textcent}\quad
   \scaron\quad
   \eacute\quad
   \adiaeresis\quad
   \udiaeresis\quad
   \char 465\quad
   \char 463\quad
   \char7685\quad
   \stop
   \blank}

\TestLine {lmroman10-regular@demo-2 at 24pt}
\TestLine {lmroman10-italic@demo-2  at 24pt}

The previous lines are typeset using a specification similar to the one
mentioned before:

\starttyping
\font\test=lmroman10-regular@demo-2
\stoptyping

Without the fallbacks we get:

\TestLine {lmroman10-regular at 24pt}
\TestLine {lmroman10-italic  at 24pt}

And with normal (non forced) fallbacks it looks as follows. As it happens,
this font has a cent symbol so no fallback is needed.

\TestLine {lmroman10-regular@demo-3 at 24pt}
\TestLine {lmroman10-italic@demo-3  at 24pt}

The font definition callback intercepts the \type {demo-2} and a couple of
chained \LUA\ functions make sure that characters missing in the font are
replaced by fallbacks. In the case of missing composed characters, they are
constructed from their components. In this particular example we have told
the handler to assume that all composed characters are missing.
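
The construction of a composed character from its components can be sketched
in plain \LUA\ as follows. This is only an illustration and not the actual
\MKIV\ code: the function \type {compose}, the glyph dimensions and the simple
centering of the accent are assumptions, and italic slant is ignored.

\starttyping
-- Sketch only: centring the accent is a simplification, data is made up.
local function compose(characters, base, accent)
    local b, a = characters[base], characters[accent]
    if not (b and a) then
        return nil                      -- nothing to build from
    end
    local dx = (b.width - a.width) / 2  -- centre the accent over the base
    return {
        width    = b.width,
        height   = math.max(b.height, a.height),
        depth    = b.depth,
        commands = {
            { "push" },
            { "right", dx },
            { "slot", 1, accent },      -- accent, taken from the main font
            { "pop" },
            { "slot", 1, base },        -- base character, advances the width
        },
    }
end

local characters = {
    [0x0065] = { width = 444, height = 431, depth = 0 },  -- e
    [0x00B4] = { width = 500, height = 697, depth = 0 },  -- acute accent
}
characters[0x00E9] = compose(characters, 0x0065, 0x00B4)   -- eacute
\stoptyping

Because the result is a normal character entry with a \type {commands} list,
it behaves like any other character as far as hyphenation is concerned.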

\subject{memory}

Traditional \TEX\ has been designed for speed and a small memory footprint. Today's
implementations are considerably more generous with the amount of memory that
you can use (hash, fonts, main memory, patterns, backend, etc). Depending
on how complicated a document layout is, memory may run into tens of megabytes.

Because \LUATEX\ is not only suitable for wide fonts, but also does away with some of
the optimizations in the \TEX\ code that complicate extensions, it has a larger
footprint than \PDFTEX. When implementing the \OPENTYPE\ font basics, we did quite
some tests with respect to memory usage. Getting the numbers right is non trivial
because the \LUA\ garbage collector interferes. For instance, on my machine a
test file with the regular \CONTEXT\ setup of Latin Modern fonts made \LUA\
allocate 130 MB, while the same run on Taco's machine took 100 MB.

When a font data table is constructed, it is handed over to \TEX, and turned into
the internal font data structures. During the construction of that \TABLE\ at the
\LUA\ end, \CONTEXT\ \MKIV\ disables the garbage collector. By doing this, the time
needed to construct and scale a font can be halved. Curious about the amount of memory
involved in passing such a table, I added the following piece of code:

\starttyping
if type(fontdata) == "table" then
    local s = statistics.luastate_bytes
    local t = table.copy(fontdata)
    local d = statistics.luastate_bytes-s
    texio.write_nl(string.format("table memory footprint: %s",d))
end
\stoptyping

It turned out that a regular Latin Modern font (\OPENTYPE) takes around
800 KB. However, more interesting was that by adding this snippet of test code,
which duplicated the table in order to measure its size, the total memory footprint
dropped to 100 MB (about the amount used on Taco's machine). This demonstrates
that one should be very careful with drawing conclusions.
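
A somewhat more controlled variant of such a measurement can be sketched in
plain \LUA, using only the standard \type {collectgarbage} function. The
helper names \type {deepcopy} and \type {footprint} and the test table are
made up for this illustration. The collector is stopped during the copy, so
that it does not interfere with the numbers.

\starttyping
-- Sketch only: plain Lua, no LuaTeX or MkIV helpers assumed.
local function deepcopy(t)             -- hypothetical stand-in for table.copy
    local c = { }
    for k, v in pairs(t) do
        c[k] = type(v) == "table" and deepcopy(v) or v
    end
    return c
end

local function footprint(t)
    collectgarbage("collect")          -- start from a settled heap
    collectgarbage("stop")             -- keep the collector out of the way
    local before = collectgarbage("count")
    local copy   = deepcopy(t)
    local after  = collectgarbage("count")
    collectgarbage("restart")
    return (after - before) * 1024, copy   -- count is in kilobytes
end

-- a made-up font-like table with 1000 glyphs
local fontdata = { characters = { } }
for i = 1, 1000 do
    fontdata.characters[i] = { width = 500, height = 700, depth = 0 }
end

local bytes = footprint(fontdata)
print(string.format("table memory footprint: %.0f bytes", bytes))
\stoptyping

Even then the number is only an indication, because the act of measuring
changes the allocation pattern, which is exactly the effect described above.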

Because fonts are rather important in \TEX\ and because there can be lots of
them used, it makes sense to keep an eye on memory as well as performance.
Because many manipulations now take place in \LUA, it no longer makes sense
to let \TEX\ buffer fonts. In plain \TEX\ one finds these magic

\starttyping
\font\preloaded=cmr10
\font\preloaded=cmr12
\stoptyping

lines. The second definition obscures the first, but the \type {cmr10} stays
loaded.

\starttyping
\font\one=cmr10 at 10pt
\font\two=cmr10 at 10pt
\stoptyping

These two definitions make \TEX\ load the font only once. However, since
we can now delegate loading to \LUA, \TEX\ no longer helps us there. For instance,
\TEX\ has no knowledge to what extent this \type {cmr10} font has been manipulated
and therefore both instances may actually differ.

When you use a callback to define the font, \TEX\ passes a font id number. You can
use this number as a reference to a loaded font (that is, passed to \TEX). If
instead of a table, you return a number, \TEX\ will reuse the already loaded font.
This feature can save you a lot of time, especially when a macro package (like
\CONTEXT) defines fonts dynamically which means that when grouping is used, fonts
get (re)defined a lot. Of course additional caching can take place at the \LUA\ end,
but there one needs to take into account more than just the scaled instance. Think of
\OPENTYPE\ features or virtual font properties. The following are quite certainly
different setups, in spite of the common size.

\starttyping
\font\one=lmr10@demo-1 at 10pt
\font\two=lmr10@demo-2 at 10pt
\stoptyping
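
The reuse of font id numbers can be sketched as follows. This is plain \LUA\
with a made-up loader, not the \MKIV\ implementation: \type {load_font}, the
cache table and the layout of the key are assumptions. It is also assumed that
the callback receives the name, the size and the id, as described above. The
key includes the complete name, so the two definitions above end up as
different entries.

\starttyping
-- Sketch only: load_font and the key layout are hypothetical.
local loaded   = { }  -- maps "name @ size" to an id already passed to TeX
local nofloads = 0    -- counts how often the expensive loader runs

local function load_font(name, size)   -- stand-in for the real loader
    nofloads = nofloads + 1
    return { name = name, size = size, characters = { } }
end

local function define_font(name, size, id)
    local key   = name .. " @ " .. size -- name includes "@demo-1" and such
    local known = loaded[key]
    if known then
        return known                    -- a number: TeX reuses that font
    end
    loaded[key] = id
    return load_font(name, size)        -- a table: TeX builds a new font
end

-- in LuaTeX one would hook this in with something like:
-- callback.register("define_font", define_font)
local one   = define_font("lmr10@demo-1", 655360, 1)
local two   = define_font("lmr10@demo-2", 655360, 2)
local again = define_font("lmr10@demo-1", 655360, 3)
\stoptyping

Here \type {one} and \type {two} are tables, while \type {again} is the
number \type {1}, so the loader runs only twice.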

When scaling a font, one not only needs to handle the regular glyph dimensions, but also the
kerning tables. We found out that dealing with such issues takes some 25\% of the time
spent on loading Latin Modern fonts that have rather extensive kerning tables.
When creating a virtual font, copying glyph tables may happen a lot. Deep copying
tables takes a bit of time. This is one of the reasons why we discussed (and consider)
some dedicated support functions so that copying and recalculating tables happens faster
(less costly hash lookups and such).  On the other hand, the time wasted on calculations
(including rounding to scaled points) can be neglected.
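
What scaling involves can be sketched in a few lines of plain \LUA. The layout
of the character table (\type {width}, \type {height}, \type {depth} and
\type {kerns}) and the sample values are assumptions made for this
illustration. The point is that each glyph gets a fresh table and that the
nested kerning table has to be visited as well.

\starttyping
-- Sketch only: the table layout and the sample data are assumed.
local function round(x)
    return math.floor(x + 0.5)          -- round to whole scaled points
end

local function scale_characters(characters, factor)
    local scaled = { }
    for unicode, chr in pairs(characters) do
        local t = {
            width  = round(factor * (chr.width  or 0)),
            height = round(factor * (chr.height or 0)),
            depth  = round(factor * (chr.depth  or 0)),
        }
        if chr.kerns then               -- the kerning table is scaled too
            local k = { }
            for other, amount in pairs(chr.kerns) do
                k[other] = round(factor * amount)
            end
            t.kerns = k
        end
        scaled[unicode] = t
    end
    return scaled
end

-- made-up data in units of 1/1000 em: an A with a kern against V
local design = {
    [0x41] = { width = 750, height = 683, depth = 0,
               kerns = { [0x56] = -111 } },
}
local scaled = scale_characters(design, 655.36)  -- 10pt is 655360sp
\stoptyping

The original table is left untouched, so it can be scaled again for another
size.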

The following table shows what happens when we enforce a different
garbage collecting scheme. This test was triggered by another experiment
where at regular intervals, for instance after a page is shipped out, we say

\starttyping
collectgarbage("collect")
\stoptyping

However, such a complete sweep has drastic consequences for the runtime.
But, since the memory footprint becomes 10--15\% less by doing so, we
played a bit with

\starttyping
collectgarbage("setstepmul", somenumber)
\stoptyping

When processing a not so large file but one that loads a bunch of \OPENTYPE\
fonts, we get the following values. The left set is on linux (Taco's machine)
and the right set is mine.

\starttabulate[|r|r|r|r|r|]
\NC \bf stepmul \NC \bf linux: run (s) \NC \bf mem (MB) \NC \bf windows: run (s) \NC \bf mem (MB) \NC \NR
\HL
\NC     200 \NC 1.58    \NC 69.14    \NC 5.6     \NC 84.17     \NC \NR
\NC    1000 \NC 1.63    \NC 69.14    \NC 6.5     \NC 72.32     \NC \NR
\NC    2000 \NC 1.64    \NC 60.66    \NC 6.8     \NC 73.53     \NC \NR
\NC   10000 \NC 1.71    \NC 59.94    \NC 7.0     \NC 72.30     \NC \NR
\stoptabulate

Since I use an old laptop running Windows with a probably
different \TEX\ configuration (fonts), and under some load, the two sets
don't compare well, but the general idea is the same. For practical usage
a value of 1000 is probably best, especially because memory intensive font
and script loading only happens at the first couple of pages.
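
An experiment of this kind can be sketched in plain \LUA. The workload below
is made up and has nothing to do with real font loading, so the numbers it
prints only illustrate the method. It assumes a \LUA\ version in which
\type {collectgarbage} still accepts the \type {setstepmul} option and
returns the previous value.

\starttyping
-- Sketch only: the workload is made up and timings vary per machine.
local function run(stepmul)
    collectgarbage("collect")
    local previous = collectgarbage("setstepmul", stepmul)
    local start, peak = os.clock(), 0
    for page = 1, 50 do
        local junk = { }                -- becomes garbage after each page
        for i = 1, 2000 do
            junk[i] = { width = i, kerns = { i, i + 1 } }
        end
        peak = math.max(peak, collectgarbage("count"))
    end
    collectgarbage("setstepmul", previous)  -- restore the old setting
    return os.clock() - start, peak / 1024  -- seconds, megabytes
end

for _, stepmul in ipairs { 200, 1000, 2000, 10000 } do
    local seconds, megabytes = run(stepmul)
    print(string.format("%5d  %.2f s  %.2f MB", stepmul, seconds, megabytes))
end
\stoptyping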

\stopcomponent