% language=uk
\startcomponent fonts-formats
\environment fonts-environment
\startchapter[title=Font formats][color=darkred]
\startsection[title=Introduction]
In this chapter the font formats as we know them will be introduced. The
descriptions will be rather general but more details can be found in the
appendix. Although \MKIV\ supports all these types, the focus will in the end
be on \OPENTYPE\ fonts, but it does not hurt to see where we are coming from.
\stopsection
\startsection[title=Glyphs]
A typeset text is mostly a sequence of characters turned into glyphs. We talk of
characters when you input the text, but the visualization involves glyphs. When
you copy a part of the screen in an open \PDF\ document or \HTML\ page back to
your editor you end up with characters again. In case you wonder why we make this
distinction between these two states we give an example.
\startplacefigure [location=here,reference=fig:character-glyph,title=From characters to glyphs.]
\startcombination
{\color[maincolor]{\definedfont[Serif*default at 30pt]affiliation}} {upright}
{\color[maincolor]{\definedfont[SerifItalic*default at 30pt]affiliation}} {italic}
\stopcombination
\stopplacefigure
We see here that the shape of the \type {a} is different for an upright serif and
an italic. We also see that in \type {ffi} there is no dot on the \type {i}. The
first case is just a stylistic one but the second one, called a ligature, is
actually one shape. The 11 characters are converted into 9 glyphs. Hopefully the
final document format carries some extra information about this transformation so
that a cut and paste will work out well. In \PDF\ files this is normally the
case. In this document we will not be too picky about the distinction as in most
cases the glyph is closely related to the character as one knows it.
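
The difference can be made visible by switching the ligature feature off. The
following lines are only a sketch: the feature set name \type {noligatures} is
made up for this example, and the glyph counts in the comments assume that the
font has an \type {ffi} ligature in the first place.

\starttyping
% a feature set that inherits from default but disables ligatures
\definefontfeature[noligatures][default][liga=no]

{\definedfont[Serif*default at 30pt]affiliation}     % ffi ligature: 9 glyphs
{\definedfont[Serif*noligatures at 30pt]affiliation} % no ligature: 11 glyphs
\stoptyping
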
So, a font contains glyphs and it also carries some information about
replacements. In addition to that there needs to be at least some information
about their dimensions. Actually, a typesetting engine does not have to
know anything about the actual shape at all.
\startplacefigure [location=here,reference=fig:glyph-dimension-normal,title=The bounding box of some normal glyphs.]
\startcombination[9*1]
{\ruledhbox{\color[maincolor]{\definedfont[Serif*default at 30pt]a}}} {}
{\ruledhbox{\color[maincolor]{\definedfont[Serif*default at 30pt]b}}} {}
{\ruledhbox{\color[maincolor]{\definedfont[Serif*default at 30pt]g}}} {}
{\ruledhbox{\color[maincolor]{\definedfont[Serif*default at 30pt]l}}} {}
{\ruledhbox{\color[maincolor]{\definedfont[Serif*default at 30pt]q}}} {}
{\ruledhbox{\color[maincolor]{\definedfont[Serif*default at 30pt].}}} {}
{\ruledhbox{\color[maincolor]{\definedfont[Serif*default at 30pt];}}} {}
{\ruledhbox{\color[maincolor]{\definedfont[Serif*default at 30pt]?}}} {}
{\ruledhbox{\color[maincolor]{\definedfont[Serif*default at 30pt]ffi}}} {}
\stopcombination
\stopplacefigure
\startplacefigure [location=here,reference=fig:glyph-dimension-italic,title=The bounding box of some italic glyphs.]
\startcombination[9*1]
{\ruledhbox{\color[maincolor]{\definedfont[SerifItalic*default at 30pt]a}}} {}
{\ruledhbox{\color[maincolor]{\definedfont[SerifItalic*default at 30pt]b}}} {}
{\ruledhbox{\color[maincolor]{\definedfont[SerifItalic*default at 30pt]g}}} {}
{\ruledhbox{\color[maincolor]{\definedfont[SerifItalic*default at 30pt]l}}} {}
{\ruledhbox{\color[maincolor]{\definedfont[SerifItalic*default at 30pt]q}}} {}
{\ruledhbox{\color[maincolor]{\definedfont[SerifItalic*default at 30pt].}}} {}
{\ruledhbox{\color[maincolor]{\definedfont[SerifItalic*default at 30pt];}}} {}
{\ruledhbox{\color[maincolor]{\definedfont[SerifItalic*default at 30pt]?}}} {}
{\ruledhbox{\color[maincolor]{\definedfont[SerifItalic*default at 30pt]ffi}}} {}
\stopcombination
\stopplacefigure
The rectangles around the shapes in \in {figure} [fig:glyph-dimension-normal] and
\in {figure} [fig:glyph-dimension-italic] are called bounding boxes. The dashed
line reflects the baseline on which the glyphs eventually get aligned next to
each other. The amount above the baseline is called height, and the amount below
is called depth. The piece of the shape above the baseline is the ascender and
the bit below the descender. The width of the bounding box is not by definition
the width of the glyph. In \TYPEONE\ and \OPENTYPE\ fonts each shape has a so
called advance width and that is the one that will be used.
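
What \TEX\ knows about a glyph can be inspected by putting it in a box and asking
for the dimensions of that box. The next lines are a minimal sketch; the reported
width is the advance width, not the width of the bounding box, and the actual
values depend on the font that \type {Serif} resolves to.

\starttyping
% typeset one glyph in a box without adding it to the page
\setbox\scratchbox\hbox{\definedfont[Serif*default at 30pt]g}

width:  \the\wd\scratchbox \par % the advance width
height: \the\ht\scratchbox \par % the amount above the baseline
depth:  \the\dp\scratchbox \par % the amount below the baseline
\stoptyping
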
\usemodule[fnt-40]
\startplacefigure [location=here,reference=fig:glyph-kerns,title={Kerning in Latin Modern, Cambria, Pagella and Dejavu.}]
\scale[width=\textwidth]{\startcombination[1*4]
{\color[maincolor]{\definedfont[name:lmroman10-regular*default sa 1]\ShowKernedHBox{Very often glyphs get very small spaces inserted horizontally.}}} {}
{\color[maincolor]{\definedfont[name:cambria*default sa 1]\ShowKernedHBox{Very often glyphs get very small spaces inserted horizontally.}}} {}
{\color[maincolor]{\definedfont[name:texgyrepagellaregular*default sa 1]\ShowKernedHBox{Very often glyphs get very small spaces inserted horizontally.}}} {}
{\color[maincolor]{\definedfont[name:dejavuserif*default sa 0.9]\ShowKernedHBox{Very often glyphs get very small spaces inserted horizontally.}}} {}
\stopcombination}
\stopplacefigure
Another traditional property of a font is kerning. In \in {figure}
[fig:glyph-kerns] you see this in action. These examples
demonstrate that not all fonts need (or provide) the same kerns
(in points).
So, as a start, we have now met a couple of properties of a font.
They can be summarized as follows:
\starttabulate[|l|p|]
\NC mapping to glyphs \EQ characters are represented by shapes that have recognizable
properties so that readers know what they mean \NC \NR
\NC ligature building \EQ a sequence of characters gets mapped onto one glyph \NC \NR
\NC dimensions \EQ each glyph has a width, height and depth \NC \NR
\NC inter-glyph kerning \EQ optionally a bit of positive or negative space has to be inserted between glyphs \NC \NR
%NC italic correction \EQ a correction is applied between an oblique shape and what follows \NC \NR
\stoptabulate
Regular font kerning is hardly noticeable and improves the overall look of the
page. Typesetting applications are sometimes capable of inserting additional
space between shapes. This more excessive kerning is not that much related to
the font and is used for special purposes, like making a snippet of text stand
out. In \CONTEXT\ this kind of kerning is available but it is a font independent
feature. Keep in mind that when applying that kind of rather visible kerning
you'd better not have ligatures and fancy replacements enabled, although \CONTEXT\
already tries to deal with that as well as possible.
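
The two kinds of kerning are controlled in different places, as the following
sketch shows. The names \type {nokerns} and \type {spaced} are made up for this
example and the factor is an arbitrary choice.

\starttyping
% font kerning comes from the font and is controlled by a font feature
\definefontfeature[nokerns][default][kern=no]

{\definedfont[Serif*default]Very often}  % with the kerns from the font
{\definedfont[Serif*nokerns]Very often}  % without them

% additional, font independent kerning is set up at the macro level
\definecharacterkerning[spaced][factor=.125]

{\setcharacterkerning[spaced]this snippet stands out}
\stoptyping
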
\stopsection
\startsection[title=The basic process]
In \TEX\ a font is an abstraction: the engine only needs to know about the
mapping from characters to glyphs, what the width, height and depth are, what
sequences need to be translated into ligatures and when kerning has to be
applied. If for the moment we forget about math, these are all the properties
that matter and this is what the \TEX\ font metric files that we see in the next
section provide.
Because one of the principles behind \LUATEX\ is that the core engine (the
binary) stays small and that new functionality is provided in \LUA\ code, the
font subsystem still largely looks the way it always has. As users will normally
use a macro package, most of the loading will be hidden from them. It is however
good to give a quick overview of how for instance \PDFTEX\ deals with fonts using
traditional metric files.
\startFLOWchart[pdftex]
\startFLOWcell
\name {source}
\location {1,1}
\shape {action}
\text {input}
\connection [rl] {parser}
\stopFLOWcell
\startFLOWcell
\name {parser}
\location {2,1}
\shape {action}
\text {characters}
\connection [rl] {builder}
\stopFLOWcell
\startFLOWcell
\name {builder}
\location {3,1}
\shape {action}
\text {glyphs}
\connection [rl] {backend}
\stopFLOWcell
\startFLOWcell
\name {backend}
\location {4,1}
\shape {action}
\text {subset}
\stopFLOWcell
\stopFLOWchart
\startplacefigure [location=here,reference=fig:tfm-pdftex,title={Several translation steps in a traditional \TEX\ flow.}]
\FLOWchart[pdftex]
\stopplacefigure
The input (bytes) gets translated into characters by the input parser. Normally
this is a one|-|to|-|one translation but there are examples of some translation
taking place. You can for instance make characters active and give them a
meaning. So, the eight bit representation of \type {ë} in an editor's code page can
become something else internally, for instance a regular \type {e} with an \type
{¨} overlaid. It can also become another character, which in the code page
would be shown as \type {á} but the user will not know this as by then this byte
is already tokenized. Another example is multibyte translation, for instance
\UTF\ sequences can get remapped to something that is known internally as being a
character of some kind. The \LUATEX\ engine expects \UTF\ so a macro package has
to make sure that translation to this encoding happens beforehand, for instance
using a callback that intercepts the input from file. \footnote {In \CONTEXT\ we
talk of input regimes and these can be mixed, although in practice most users
will stick to \UTF\ and never use regimes.}
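
As an illustration, a source file that was saved in a legacy code page can be
flagged as such, so that its bytes are converted before \TEX\ sees them. This is
only a sketch and it assumes that the file really is encoded in code page 1252.

\starttyping
% from here on input bytes are interpreted as code page 1252
\enableregime[cp1252]

% switch back to the default, which is what most users never leave
\enableregime[utf]
\stoptyping
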
So, the input character (sequence) becomes tokens representing a character. From
these tokens \TEX\ will start building a (linked) node list where each character
becomes a node. In this node there is a reference to the current font. If you
know \TEX\ you will understand that a list can have more than characters: there
can be skips, kerns, rules, references to images, boxes, etc.
At some point \TEX\ will hand this list over to a routine that will turn it
into something that resembles a paragraph or some other snippet of text. In that
stage hyphenation kicks in, ligatures get built and kerning is added. Character
references become glyph indices. This list can finally be broken into lines.
It is no secret that \TEX\ can box and unbox material and that after unboxing
some new formatting has to happen. The traditional engine has some optimizations
that demand a partial reconstruction of the original list but in \LUATEX\ we
removed this kind of optimization so there the process is somewhat simpler. We
will see more of that later.
When \TEX\ ships out a page, the backend will load the real font data and merge
that into the final output. It will now take the glyph index and build the right
data structures and references to the real font. As a font gets subset only the
used glyphs end up in the final output.
There is one tricky aspect involved here: re|-|encoding. In so called map files
one can map a specific metric filename onto a real font name. One can also
specify an encoding vector that tells what a given index really refers to. This
makes it possible to use fonts that have more than 256 glyphs and refer to any of
them. This is also the trick that makes it possible to use \TRUETYPE\ fonts in
\PDFTEX: the backend code filters the right glyphs from the font, and the
remapping of \TEX's glyph indices onto real entries in the font happens via the
encoding vector. In \in {figure} [fig:tfm-bytes] we show a possible route for
input byte 68.
\startFLOWchart[bytes]
\startFLOWcell
\name {source}
\location {1,1}
\shape {action}
\text {bytes (68)}
\connection [rl] {parser}
\stopFLOWcell
\startFLOWcell
\name {parser}
\location {2,1}
\shape {action}
\text {bytes (31)}
\connection [rl] {builder}
\stopFLOWcell
\startFLOWcell
\name {builder}
\location {3,1}
\shape {action}
\text {index (31)}
\connection [rl] {backend}
\stopFLOWcell
\startFLOWcell
\name {backend}
\location {4,1}
\shape {action}
\text {index (88)}
\stopFLOWcell
\stopFLOWchart
\startplacefigure [location=here,reference=fig:tfm-bytes,title={From bytes to indices.}]
\FLOWchart[bytes]
\stopplacefigure
As \LUATEX\ carries much of the baggage of older engines, you can still do it this
way but in \CONTEXT\ \MKIV\ we have made our life much simpler: we use \UNICODE\ as
much as possible. This means that we effectively have removed two steps (see \in
{figure} [fig:tfm-luatex]).
\startFLOWchart[luatex]
\startFLOWcell
\name {source}
\location {1,1}
\shape {action}
\text {input}
\connection [rl] {builder}
\stopFLOWcell
\startFLOWcell
\name {builder}
\location {2,1}
\shape {action}
\text {glyphs}
\stopFLOWcell
\stopFLOWchart
\startplacefigure [location=here,reference=fig:tfm-luatex,title={Simplified mapping in \LUATEX.}]
\FLOWchart[luatex]
\stopplacefigure
There is of course still some work to do for the backend, like subsetting, but
the nasty dependency on the input encoding, font encoding (that itself relates to
hyphenation) and backend re|-|encoding is gone. But keep in mind that the
internal data structure of the font is still quite traditional.
Before we move on to font formats I would like to point out that there is no
space in \TEX. Spaces in the input are converted into glue, with or without some
stretch and|/|or shrink. This also means that accessing character 32 in
traditional \TEX\ will not end up as a space in the output.
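
The glue that replaces a space is derived from font parameters, which can be
queried with the \type {\fontdimen} primitive. The following lines are a small
sketch; the values that show up depend on the font in use.

\starttyping
\definedfont[Serif*default at 10pt]

interword space:   \the\fontdimen2\font \par % natural width of the glue
interword stretch: \the\fontdimen3\font \par % how much it may grow
interword shrink:  \the\fontdimen4\font \par % how much it may shrink
\stoptyping
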
\stopsection
\startsection[title=\TEX\ metrics]
\appendixdata{\in[fontdata:tfm]}
\appendixdata{\in[fontdata:vf]}
Traditional font metrics are packaged in a binary format. Due to the limitations
of that time a font has at most 256 characters. In books dedicated to \TEX\ you
will often find tables that show what glyphs are in a font, so we will not repeat
that here as after all we got rid of that limitation in \LUATEX.
Because 256 is not that much, especially when you mix many scripts and need lots
of symbols from the same font, there are quite a few encodings used in traditional
\TEX, like \type {texnansi}, \type {ec} and \type {qx}. When you use \LUATEX\
exclusively you can do with far fewer font files. This is easier for users,
especially because most of those files were never used anyway. It's interesting
to notice that some of the encodings contain symbols that are never used or used
only once in a document, like the copyright or registered symbols. They are often
accessed by symbolic names and therefore easily could have been omitted and
collected in a dedicated symbol font thereby freeing slots for more useful
characters anyway. The lack of coverage is probably one of the reasons why new
encodings kept popping up. In the next table you see how many files are involved
in Latin Modern which comes in a couple of design sizes. \footnote {The original
Computer Modern fonts have \METAFONT\ source files and (runtime) generated bitmap
files in whatever resolutions are needed for previewing and printing. The
\TYPEONE\ follow|-|up came in several sets, organized by language support. The
Latin Modern fonts have a few more weights and variants than Computer Modern.}
\starttabulate[|l|c|r|r|r|]
\HL
\NC \bf font format \NC \bf type \NC \bf \# files \NC \bf size in bytes \NC \bf \CONTEXT \NC \NR
\HL
\NC type 1 \NC tfm \NC 380 \NC 3.841.708 \NC \NC \NR
\NC \NC afm \NC 25 \NC 2.697.583 \NC \NC \NR
\NC \NC pfb \NC 92 \NC 9.193.082 \NC \NC \NR
\NC \NC enc \NC 15 \NC 37.605 \NC \NC \NR
\NC \NC map \NC 9 \NC 42.040 \NC \NC \NR
\HL[darkgray]
\NC \NC \NC 521 \NC 15.812.018 \NC mkii \NC \NR
\HL
\NC opentype \NC otf \NC 73 \NC 8.224.100 \NC mkiv \NC \NR
\HL
\stoptabulate
A \TFM\ file can contain so called italic corrections. This is an additional kern
that can be added after a character in order to get better spacing between an
italic shape and an upright one. As this is manual work, it is not a very advanced
mechanism, but in addition to width, height, depth, kerns and ligatures it is
nevertheless a useful piece of information. It is, however, a rather \TEX\
specific quantity.
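
In traditional \TEX\ the correction is requested with the \type {\/} primitive,
as in the following sketch. Whether the difference is visible depends on the font
and on the italic correction values that it provides.

\starttyping
{\it half} (upright)   % no correction: the f may touch the parenthesis
{\it half\/} (upright) % \/ adds the italic correction of the preceding f
\stoptyping
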
Since \TEX\ showed up many fonts have been added. In addition support for
commercial fonts was provided. In fact, for that to happen, one only needs
accompanying metric files for \TEX\ itself and map files and encoding vectors
for the backend. Because a metric file also has some general information, like
spacing (including stretch and shrink), the ex|-|height and em|-|width, this
means that sometimes guesses must be made when the original font does not come
with such parameters.
At some point virtual fonts were introduced. In a virtual font a \TFM\ file has
an accompanying \VF\ file. In that file each glyph has a specification that tells
where to find the real glyph. It is even possible to construct glyphs from other
glyphs. In traditional \TEX\ this only concerns the backend, which in \PDFTEX\ is
built in. In \LUATEX\ this mechanism is integrated into the frontend which means
that users can construct such virtual fonts themselves. We will see more of that
later, but for now it's enough to know that when we talk about the representation
of a font (the \TFM\ table) in \LUATEX, this includes virtual functionality.
An important limitation of \TFM\ files, and therefore of traditional \TEX, is
that the number of distinct depths and heights is limited to 16 each. Although
this results in somewhat inaccurate dimensions, in practice this goes unnoticed,
if only because many designs have some consistency in this. On the other hand, it
is a limitation when we start thinking of accents or even multiple accents, which
lead to many more distinct heights and depths.
Concerning ligatures we can remark that quite a few kinds of substitution are
possible although in practice only the multiple to one replacement has been used.
Some fonts that are used in \TEX\ started out as bitmaps but rather soon
\TYPEONE\ outline fonts became the fashion. These are supported using the map
files that we will discuss later. First we look into \TYPEONE\ fonts.
\stopsection
\startsection[title=\TYPEONE]
\appendixdata{\in[fontdata:afm]}
\appendixdata{\in[fontdata:enc]}
\appendixdata{\in[fontdata:map]}
For a long time \TYPEONE\ fonts have dominated the scene. These are \POSTSCRIPT\
fonts that can have more than 256 glyphs in the file that defines the shapes, but
only 256 of them can be used at one time. Of course there can be multiple subsets
active in one document.
In traditional \TEX\ a \TYPEONE\ font is used by making a \TFM\ file from a so
called Adobe metric file (\AFM) that comes with such a font. There are several
tool chains for doing this and \CONTEXT\ \MKII\ ships with one that can be of
help when you need to support commercial fonts. Projects like the Latin Modern
Fonts and \TEX\ Gyre have normalized a whole lot of fonts that came in several
more or less complete encodings into a consistent package of \TYPEONE\ fonts.
This already simplified life a lot but still users had to choose a suitable input
and font encoding for their language and|/|or script. As \TEX\ only cares about
metrics and not about the rendering, it doesn't consider \TYPEONE\ fonts as
something special. Also, as \TEX\ and \POSTSCRIPT\ were developed at about the
same time, support for \TYPEONE\ fonts is well established in \TEX\ distributions.
You can still follow this route but for \CONTEXT\ \MKIV\ this is no longer the
recommended way because there we have changed the whole subsystem to use
\UNICODE. As a result we no longer use \TFM\ files derived from \AFM\ files, but
directly interpret the \AFM\ data. This not only removes the 256 limitation, but
also brings more resolution in height and depth as we no longer have at most 16
alternatives. There can also be more kerns. Of course we need some heuristics to
determine for instance the spacing but that is not different from former times.
Because most \TEX\ users don't use commercial fonts, they will not notice that
\CONTEXT\ \MKIV\ treats \TYPEONE\ fonts this way. One reason is that the free
fonts also come as wide fonts in \OPENTYPE\ format and whenever possible
\CONTEXT\ prefers \OPENTYPE\ over \TYPEONE\ over \TFM.
In the beginning \LUATEX\ could only load a \TFM\ file, which is why loading
\AFM\ files is implemented in \LUA. Later, when the \OPENTYPE\ loader was added,
loading \PFB\ and \AFM\ files also became possible but it's slower and we see no
reason to rewrite the current code in \CONTEXT. We also do a couple of extra
things when loading such a file. As more \TYPEONE\ fonts move on to \OPENTYPE\ we
don't expect that much usage anyway.
\stopsection
\startsection[title=\OPENTYPE]
\appendixdata{\in[fontdata:otf]}
When an engine can deal with \UNICODE\ directly it also means that internally it
uses pretty large numbers for storing characters and glyph indices. The first
\TEX\ descendant that went wide was \OMEGA, later replaced by \ALEPH. However, this
engine never took off and still used its own extended \TFM\ format: \OFM. In fact,
as \LUATEX\ uses some of the \ALEPH\ code, it can also use these extended metric
files but I don't think that there are any useful fonts around so we can forget
about this.
We use the term \OPENTYPE\ for a couple of font formats that share the same
principles: \OPENTYPE\ (\OTF), \TRUETYPE\ (\TTF) and \TRUETYPE\ containers
(\TTC). The \LUATEX\ font reader presents them in a similar format. In the case
of a \TRUETYPE\ container, one does not load the whole font but selects an
instance from it. Internally an \OPENTYPE\ font can have the glyphs organized in
subfonts.
The first \TEX\ descendant to really go wide from front to back is \XETEX. This
engine can use \OPENTYPE\ fonts directly and for a whole category of users this
opened up a new world. However, it is still mostly a traditional engine. The
transition from characters to glyphs is accomplished by external libraries, while
in \LUATEX\ we code in \LUA. This has the disadvantage that it is slower
(although that depends on the job) but the advantage is that we have much more
control and can extend the font handler as we like.
An \OPENTYPE\ font is much more complex than a \TYPEONE\ one. Unless it is a
quick and dirty converted existing font, it will have more glyphs to start with.
Quite likely it will have kerns and ligatures too and of course there are
dimensions. However, there is no concept of a depth and height. These need to be
deduced from the bounding box instead. There is an advance width. This means that
we can start right away using such fonts if we map those properties onto the
\TFM\ table that \LUATEX\ expects.
But there is more, take ligatures. In a traditional font the sequence \type {ffi}
always becomes a ligature, given that the font has such a glyph. In traditional
\TEX\ it is pretty hard to disable that; one option is for instance to insert
kerns manually. In \LUATEX\ there is a way to disable this mechanism, which is
sometimes handy when dealing with mono|-|spaced fonts in verbatim. In an
\OPENTYPE\ font ligatures
are collected in a so called feature. There can be many such features and even
kerning is a feature. Other examples are old style numerals, fractions,
superiors, inferiors, historic ligatures and stylistic alternates.
\starttabulate[|lT|l|l|l|l|]
\NC \type{onum} \NC \ruledhbox{\maincolor\DemoOnumLM\char45 1}
\NC \ruledhbox{\maincolor\DemoOnumLM1234567890}
\NC \ruledhbox{\maincolor\DemoOnumLM\char"A2}
\NC \ruledhbox{\maincolor\DemoOnumLM\char"24} \NC \NR
%NC \type{lnum} \NC \ruledhbox{\maincolor\DemoLnumLM\char45 1}
% \NC \ruledhbox{\maincolor\DemoLnumLM1234567890}
% \NC \ruledhbox{\maincolor\DemoLnumLM\char"A2}
% \NC \ruledhbox{\maincolor\DemoLnumLM\char"24} \NC \NR
\NC \type{tnum} \NC \ruledhbox{\maincolor\DemoTnumLM\char45 1}
\NC \ruledhbox{\maincolor\DemoTnumLM1234567890}
\NC \ruledhbox{\maincolor\DemoTnumLM\char"A2}
\NC \ruledhbox{\maincolor\DemoTnumLM\char"24} \NC \NR
\NC \type{pnum} \NC \ruledhbox{\maincolor\DemoPnumLM\char45 1}
\NC \ruledhbox{\maincolor\DemoPnumLM1234567890}
\NC \ruledhbox{\maincolor\DemoPnumLM\char"A2}
\NC \ruledhbox{\maincolor\DemoPnumLM\char"24} \NC \NR
\stoptabulate
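
Features like the ones in this table are enabled by defining a feature set and
using it when a font is defined. The lines below are a sketch: the set names
\type {oldstyle} and \type {tabular} are made up for this example, and a feature
only has an effect when the font actually provides it.

\starttyping
% both sets inherit kerning, ligatures and so on from default
\definefontfeature[oldstyle][default][onum=yes]
\definefontfeature[tabular][default][tnum=yes]

{\definedfont[Serif*oldstyle at 12pt]1234567890} % old style numerals
{\definedfont[Serif*tabular at 12pt]1234567890}  % numerals of equal width
\stoptyping
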
To all this you need to add that features operate in two dimensions: languages
and scripts. This means that when ligatures are enabled for Dutch the \type {ij}
sequence becomes a single glyph but for German it gets mapped onto two glyphs.
And, to make it even more complex, a substitution can depend on circumstances,
which means that for Dutch \type {fijn} becomes \type {f ij n} but \type {fiets}
becomes \type {fi ets}. It will be no surprise that not all \OPENTYPE\ fonts come
with a complete and rich repertoire of rules. To make things worse, there can be
rules that turn \type {1/2} into one glyph, or transfer the numbers into superior
and inferior alternatives, but leave us with an unacceptably rendered \type
{1/a}, given that the \type {frac} feature is enabled. It looks like features
like this are to be applied to a manually selected range of characters.
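
One way to limit such a feature to a selected range is to group the font switch
around just those characters, as in this sketch. The set name \type {fractions}
is made up for this example and the result depends on the font having a \type
{frac} feature.

\starttyping
\definefontfeature[fractions][default][frac=yes]

% only the grouped characters are subject to the frac rules
Add {\definedfont[Serif*fractions]1/2} cup of water and/or milk.
\stoptyping
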
The fact that an \OPENTYPE\ font can contain many features and rules to apply
them makes it possible to typeset scripts like Arabic. And this is where it gets
vague. A generic \OPENTYPE\ sub|-|engine can do clever things using these rules,
but if you read the specification for some scripts additional intelligence has to
be provided by the typesetting engine.
While users no longer have to care about encodings, map files and back|-|end
issues, they do have to carry knowledge about the possibilities and limitations
of features. Even worse, they need to be aware that fonts can have bugs.
Also, as font vendors have no tradition of providing updates this is something
that we might need to take care of ourselves by tweaking the engine.
One of the problems with the transition from \TYPEONE\ to \OPENTYPE\ is that font
designers can take an existing design and start from that basic repertoire of
shapes. If such a design had oldstyle figures only, there is a good chance that
this will be the case in the \OPENTYPE\ variant too. However, such a default
interferes with the fact that the \type {onum} feature is one that we explicitly
have to enable. This means that writing a generic style where a font is later
plugged in becomes somewhat messy if it assumes that features need to be turned
on.
\TEX\ users expect more control, which means that in practice just an \OPENTYPE\
engine is not enough, but for the average font the \TEX\ model using the
traditional approach still is quite acceptable. After all, not all users use
complex scripts or need advanced features. And, in practice most readers don't
notice the difference anyway.
\stopsection
\startsection[title=\LUA]
\appendixdata{\in[fontdata:lua]}
As mentioned, support for virtual fonts is built into \LUATEX\ and loading the so
called \VF\ files happens when needed. However, that concerns traditional fonts
that we already covered. In \CONTEXT\ we do use the virtual font mechanism for
creating missing glyphs out of existing ones or adding fallbacks when this is not
possible. But this is not related to some kind of font format.
In 2010 and 2011 the first public \OPENTYPE\ math fonts showed up that replace
their \TYPEONE\ originals. In \CONTEXT\ we already went forward and created
virtual \UNICODE\ fonts out of traditional fonts. Of course eventually the
defaults will change to the \OPENTYPE\ alternatives. The specification for such a
virtual font is given in \LUA\ tables and therefore you can consider \LUA\ to be
a font format as well. In \CONTEXT\ such fonts can be defined in so called
goodies files. As we use these files for much more tuning, we come back to that
in a later chapter. In a virtual font you can mix real \TYPEONE\ fonts and real
\OPENTYPE\ fonts using whatever metrics suit best.
An extreme example is the virtual \UNICODE\ Punk font. This font is defined in
the \METAPOST\ language (derived from Don Knuth's \METAFONT\ sources) where each
glyph is one graphic. Normally we get \POSTSCRIPT, but in \LUATEX\ we can also
get output in a comparable \LUA\ table. That output is converted to \PDF\
literals that become part of the virtual font definitions and these eventually
end up in the \PDF\ page stream. So, at the \TEX\ end we have regular (virtual)
characters and all \TEX\ needs is their dimensions, but in the \PDF\ each glyph
is shown using drawing operations. Of course the now available \OPENTYPE\ variant
is more efficient, but it demonstrates the possibilities.
\stopsection
\startsection[title=Files]
We summarize these formats in the following table where we explain what the file
suffixes stand for:
\starttabulate[|Tl|p|]
\HL
\NC tfm \NC This is the traditional \TEX\ font metric file format and it reflects
the internal quantities that \TEX\ uses. The internal data structures
(in \LUATEX) are an extension of the \TFM\ format. \NC \NR
\NC vf \NC This file contains information about how to construct and where to
find virtual glyphs and is meant for the backend. With \LUATEX\ this
format gets more known. \NC \NR
\NC pk \NC This is the bitmap format used for the first generation of \TEX\
fonts but the typesetter never deals with them. Bitmap files are more
or less obsolete. \NC \NR
\HL
\NC ofm \NC This is the \OMEGA\ variant of the \type {tfm} files that caters for
larger fonts. \NC \NR
\NC ovf \NC This is the \OMEGA\ variant of the \type {vf}. \NC \NR
\HL
\NC pfb \NC In this file we find the glyph data (outlines) and some basic
information about the font, like name|-|to|-|index mappings. A
differently byte|-|encoded variant of this format is \type {pfa}.\NC
\NR
\NC afm \NC This file accompanies the \type {pfb} file and provides additional
metrics, kerns and information about ligatures. For \MSWINDOWS\ there is
a binary variant that has the \type {pfm} suffix. \NC \NR
\NC map \NC The backend will consult this file for mapping metric file names onto
real font names. \NC \NR
\NC enc \NC The backend will include (and use) this encoding vector to map
internal indices to font indices using glyph names, if needed. \NC
\NR
\HL
\NC otf \NC This binary format describes not only the font in terms of metrics,
features and properties but also contains the shapes. \NC \NR
\NC ttf \NC This is the \MICROSOFT\ variant of \OPENTYPE. \NC \NR
\NC ttc \NC This is the \MICROSOFT\ container format that combines multiple fonts
in one. \NC \NR
\HL
\NC fea \NC A (\FONTFORGE) feature definition file. Such a file can be loaded and
applied to a font. This is no longer supported in \CONTEXT\ as we have
other means to achieve the same goals. \NC \NR
\NC cid \NC A glyph index (name) to \UNICODE\ mapping file that is referenced
from an \OPENTYPE\ font and is shared between fonts. \NC \NR
\HL
\NC lfg \NC These are \CONTEXT\ specific \LUA\ font goodie files providing
additional information. \NC \NR
\HL
\stoptabulate
If you look at how files are organized in a \TEX\ distribution, you will notice
that these files all get their own place. Therefore adding a \TYPEONE\ font to
the distribution is not that trivial if you want to avoid clashes. Also, files
are simply not found when they are not in the right spot. Just to mention a few
paths:
\starttyping
<root>/fonts/tfm/vendor/typeface
<root>/fonts/vf/vendor/typeface
<root>/fonts/type1/vendor/typeface
<root>/fonts/truetype/vendor/typeface
<root>/fonts/opentype/vendor/typeface
<root>/fonts/fea
<root>/fonts/cid
<root>/fonts/dvips/enc
<root>/fonts/dvips/map
\stoptyping
There can be multiple roots and the right locations are specified in a
configuration file. Currently all engines can use the \DVIPS\ encoding and map
files, so luckily we don't need to duplicate this. For some reason \TRUETYPE\ and
\OPENTYPE\ fonts have different locations and you need to be aware of the fact
that some fonts come in both formats (just to confuse users) so you might end up
with conflicts.
In \CONTEXT\ we try to make life somewhat easier by also supporting a simple path
structure:
\starttyping
<root>/fonts/data/vendor/typeface
\stoptyping
This way files are kept together and installing commercial fonts is less complex
and error prone. Also, in practice we only have one set of files now: one or the
other \OPENTYPE\ format.
If you want to see the difference between a traditional (\PDFTEX\ or \XETEX\ plus
\CONTEXT\ \MKII) setup or a modern one (\LUATEX\ with \CONTEXT\ \MKIV) you can
install the \CONTEXT\ suite (formerly known as minimals). If you explicitly
choose a \LUATEX\ only setup, you will notice that far fewer files get
installed.
\stopsection
\startsection[title=Text]
This is not an in|-|depth explanation of how to define and load fonts in
\CONTEXT. First of all this is covered in other manuals, but more important is
that we assume that the reader is already familiar with the way \CONTEXT\ deals
with fonts. Therefore we limit ourselves to some remarks and expand on this a bit
in later chapters.
The font subsystem has evolved over years and when you look at the low level code
you will probably find it complex. This is true, although in some aspects it is
not as complex as in \MKII\ where we also had to deal with encodings due to the
eight bit limitations. In fact, setting up fonts is easier due to the fact that we
have fewer files to deal with.
The main properties of a (modern) font subsystem for typesetting text are the
following:
\startitemize[n]
\startitem
We need to be able to switch the look and feel efficiently and
consistently, for instance going from regular to bold or italic. So,
when we load a font family we not only load one file, but often
at least four: regular, bold, italic (oblique) and bolditalic
(boldoblique).
\stopitem
\startitem
When we change the size we also need to make sure that these related
sets are changed accordingly. You really want the bold shapes to scale
along with the regular ones.
\stopitem
\startitem
Shapes are organized in serif, sans serif, mono spaced and math, and for
a typesetter that has math all over to work properly you always need
the math. Again, when you change size, all these shapes need to
scale in sync.
\stopitem
\startitem
In one document several families can be combined so the subsystem should
make it possible to switch from one to the other without too much
overhead.
\stopitem
\startitem
Because section heads and other structural elements have their own sizes
there has to be a consistent way to deal with that. It should also be
possible to specify exceptions for them.
\stopitem
\stopitemize
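As a minimal sketch of how these demands surface at the user level, the following
document switches style, size and family. It assumes that the \type {pagella} and
\type {dejavu} typescripts are available in your installation, and the \type
{chapter} setup only illustrates an exception for a structural element:
\starttyping
% a sketch, not a recommended setup: pagella and dejavu are assumed to be present
\setupbodyfont[pagella,10pt]

% heads have their own size: \bfc is bold at a larger size step
\setuphead[chapter][style=\bfc]

\starttext
    \chapter{Demo}

    Regular, {\bf bold}, {\it italic}, {\bi bold italic}, {\ss sans},
    {\tt mono} and math $x^2 + y^2$ all come from one body font setup.

    % a size switch scales all related shapes along, math included
    {\switchtobodyfont[12pt] Regular, {\bf bold} and $x^2 + y^2$. \par}

    % a second family in the same document
    {\switchtobodyfont[dejavu] Regular and {\bf bold} in another family. \par}
\stoptext
\stoptyping
The switches are kept inside groups so that the document falls back to the main
body font afterwards.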
In the next chapters we will cover some details, for instance font features. You
can actually control these when setting up a body font, simply by redefining
the \type {default} feature set, but not all features are dealt with this way.
So let's continue the demands put on a font subsystem.
\startitemize[continue]
\startitem
Sometimes inter|-|character kerning is needed. In \CONTEXT\ this is not a
property of a font because glyphs can be mixed with basically anything.
These kinds of features are applied independently of a font.
\stopitem
\startitem
The same is true for casing (like uppercasing and such) which is not
related to a font but applied to a selected (or marked) piece of the
input stream.
\stopitem
\startitem
Using so called \quotation {small caps} or \quotation {old style}
numerals or \unknown\ can be dealt with by setting the default features
but often these are applied selectively. As these are applied using the
information in a font they do belong to the font subsystem but in
practice they can be seen as independent (assuming that the font supports
them at all).
\stopitem
\startitem
Protrusion (into margins) and expansion (to improve whitespace) are
applied to the font at load time because the engine needs to know about
them. But these two can selectively be turned on and off. They are more
related to line break handling than font defining.
\stopitem
\startitem
Slanting (to fake oblique) and expanding (to fake bold) are regular
features but are applied to the font because the engine needs to know
about them. They permanently influence the shape.
\stopitem
\stopitemize
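A few of these mechanisms can be sketched as follows. The names \type {demo},
\type {demo:smallcaps}, \type {demo:oldstyle} and \type {demo:slanted} are made
up for this example, the values are arbitrary, and we assume that the body font
(here \type {pagella}) provides the \type {smcp} and \type {onum} features:
\starttyping
% extend the default feature set before the body font gets loaded, so
% that the engine knows about protrusion and expansion at load time
\definefontfeature[default][default][protrusion=quality,expansion=quality]

% features that are applied selectively (hypothetical names)
\definefontfeature[demo:smallcaps][smcp=yes]
\definefontfeature[demo:oldstyle][onum=yes]

% slanting permanently influences the shape (arbitrary value)
\definefontfeature[demo:slanted][default][slant=.2]

\setupbodyfont[pagella]

% protrusion and expansion are turned on at the line break level
\setupalign[hanging,hz]

% character kerning is not a font property
\definecharacterkerning[demo][factor=.125]

\starttext
    {\setcharacterkerning[demo]kerned text}, \WORD{uppercased text},
    {\feature[+][demo:smallcaps]small caps} and
    {\feature[+][demo:oldstyle]0123456789}.

    {\definedfont[Serif*demo:slanted]a faked oblique}
\stoptext
\stoptyping
Kerning and casing are applied to a marked piece of text, while the slanted
variant is a separate font instance because the shapes themselves change.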
We will discuss these in this manual too. What we will not discuss in depth is
spacing, even when it depends on the (main body) font size. These use properties
of fonts (like the ex|-|height or em|-|width and maybe the width of the space),
but normally they are controlled by the spacing subsystem. We will however
mention some rather specific possibilities:
\startitemize[continue]
\startitem
The \CONTEXT\ font subsystem provides ways to combine multiple fonts
into one.
\stopitem
\startitem
You can construct artificial fonts, using existing fonts or \METAPOST\
graphics.
\stopitem
\startitem
Fonts can be fixed (dimensions) and completed (for instance accented
characters) when loading.
\stopitem
\startitem
There are extensive tracing options, not only for applied features but
also for loading, checking etc. There is a set of styles that can be
used to study fonts.
\stopitem
\stopitemize
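The following fragment is a sketch of how some of these possibilities can be
reached. The names \type {DemoFallback}, \type {DemoSerif} and \type
{demo:compose} are hypothetical, the range is arbitrary, and the tracker names
are assumed to be available in your version of \CONTEXT:
\starttyping
% combine fonts: take a range of characters from another font
\definefontfallback[DemoFallback][Mono][0x0000-0x0400][check=yes,force=no]
\definefontsynonym[DemoSerif][Serif][fallbacks=DemoFallback]

% complete a font: construct missing accented characters when loading
\definefontfeature[demo:compose][default][compose=yes]

% tracing: report what happens when fonts are defined and loaded
\enabletrackers[fonts.defining,fonts.loading,otf.loading]
\stoptyping
The fallback and the extra feature only take effect for fonts that are defined
after these lines, so they normally go before the body font setup.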
Sometimes users ask for very special trickery and it is no surprise then that some
of that is not widely known (or even discussed in detail). When we get notice of
that we can mention it in this manual.
So how does this all relate to font formats? We mentioned that when loading we
basically load some four files per family (and more if we use specific fonts for
titling). These files just provide the data: metric information, shapes and ways
to remap characters (or sequences) into glyphs, either or not positioned relative
to each other. In traditional \TEX\ only dimensions, kerns and ligatures
mattered, but nowadays we also deal with specific \OPENTYPE\ features. But
still, as you can deduce from the above, this is only part of the story. You need
a complete and properly integrated system. It is no big deal to set up some
environment that uses font files to achieve some typesetting goal, but to provide
users with some consistent and extensible system is a bit more work.
There are basically three font formats: good old bitmaps, \TYPEONE\ and
\OPENTYPE. All need to be supported and expectations are that we also support
their features. But it should be noted that whatever font you use, the quality
of the outcome depends on what information the font can provide. We can improve
processing but are often stuck with the font. There are many thousands of
fonts out there and we need to be able to use them all.
\stopsection
\startsection[title=Math]
In the previous section we already mentioned math fonts. The fonts are just one
aspect of typesetting math and math fonts are special in the sense that they have
to provide the relevant information. For instance a parenthesis comes in several
sizes and at some point turns into a symbol made out of pieces (like a top curve,
middle lines and bottom curve) that overlap. The user never sees such details. In
fact, there are not that many math fonts and these are already set up so there is
not much to mess up here. Nevertheless we mention:
\startitemize [n]
\startitem
Math fonts are loaded in three sizes: text, script and scriptscript. The
optimal relative sizes are defined in the font.
\stopitem
\startitem
There are direction aware math fonts and we support this in \CONTEXT.
\stopitem
\startitem
Bold math is in fact a bolder version of a regular math font (that can
have bold symbols too). Again this is supported.
\stopitem
\stopitemize
The way math is dealt with in \CONTEXT\ is different from the way it is done
traditionally. Already when we started with \MKIV\ we moved to \UNICODE\ and
the setup at the font level is kept simple by delegating some of the work to
the \LUA\ end. We will see some of the mentioned aspects in more detail later.
Because of its complexity and because in a math text math fonts (and related
settings) can be activated many times, quite some effort has been put into
making it efficient. But you need to keep in mind that when we discuss math
related topics later on, this is hardly of concern. Math fonts are loaded only
once so manipulating them a bit has no penalty. And using them later on is hardly
related to the font subsystem.
Concerning formats we can notice that traditional \TEX\ comes with math fonts
that have properties that the engine can use. Because there were not many math
fonts, this was no problem. The \OPENTYPE\ math fonts however are also used in
other applications and therefore are a bit more generic. \footnote {Their
internals are now defined in the \OPENTYPE\ specification.} For this we not only
had to adapt the math engine in \LUATEX\ (although we kept that to the minimum)
but we also had to think differently about loading them. In later chapters we will
see that in the transition to \UNICODE\ math fonts we implemented a mechanism for
combining \TYPEONE\ fonts into virtual \UNICODE\ fonts. We did that because it
made no sense to keep an old and a new loader alongside each other.
There will not be thousands of math fonts flying around. A few dozen is already a
lot and the developers of macro packages can set them up for the users. So, in
practice there is not much that a user needs to know about math font formats.
\stopsection
\startsection[title=Caching]
Because fonts can be large and because we use \LUA\ tables to describe them
a bit of effort has been put into managing them efficiently. Once converted
to the representation that we need they get cached. You can peek into the cache
which is someplace on your system (depending on the setup):
\starttabulate[|l|p|]
\NC \type{fonts/afm} \NC type one fonts, converted from \type {afm} and \type
{pfb} files \NC \NR
\NC \type{fonts/data} \NC font name databases \NC \NR
\NC \type{fonts/mp} \NC fonts created using \METAPOST \NC \NR
\NC \type{fonts/otf} \NC open type fonts, converted from \type {ttf}, \type {otf},
\type {ttc} and \type {ttx} files loaded using the
\FONTFORGE\ loader \NC \NR
\NC \type{fonts/otl} \NC open type fonts, converted from \type {ttf}, \type {otf},
\type {ttc} and \type {ttx} files loaded using the
\CONTEXT\ \LUA\ loader \NC \NR
\NC \type{fonts/shapes} \NC outlines of fonts (for instance for use in \METAFUN) \NC \NR
\stoptabulate
There can be three types of files there. The \type{tma} files are just \LUA\
tables and they can be large. These files can be compiled to bytecode where \type
{tmc} is for stock \LUATEX\ and \type {tmb} for \LUAJITTEX. The \type {tma} files
are optimized for space and memory (aka: packed) but you can expand them with
\type {mtxrun --script font}.
Fonts in the cache are automatically updated when you install new versions of a
font or when the \CONTEXT\ font loader has been updated.
\stopsection
\stopchapter
\stopcomponent