Diffstat (limited to 'doc/context/sources/general/manuals/fonts/fonts-formats.tex')
-rw-r--r-- doc/context/sources/general/manuals/fonts/fonts-formats.tex 896
1 files changed, 896 insertions, 0 deletions
diff --git a/doc/context/sources/general/manuals/fonts/fonts-formats.tex b/doc/context/sources/general/manuals/fonts/fonts-formats.tex
new file mode 100644
index 000000000..9ad6bc9bd
--- /dev/null
+++ b/doc/context/sources/general/manuals/fonts/fonts-formats.tex
@@ -0,0 +1,896 @@
+% language=uk
+
+\startcomponent fonts-formats
+
+\environment fonts-environment
+
+\startchapter[title=Font formats][color=darkred]
+
+\startsection[title=Introduction]
+
+In this chapter the font formats as we know them will be introduced. The
+descriptions will be rather general but more details can be found in the
+appendix. Although \MKIV\ supports all these types, the focus will eventually
+be on \OPENTYPE\ fonts, but it does not hurt to see where we are coming from.
+
+\stopsection
+
+\startsection[title=Glyphs]
+
+A typeset text is mostly a sequence of characters turned into glyphs. We talk of
+characters when you input the text, but the visualization involves glyphs. When
+you copy a part of the screen in an open \PDF\ document or \HTML\ page back to
+your editor you end up with characters again. In case you wonder why we make this
+distinction between these two states we give an example.
+
+\startplacefigure [location=here,reference=fig:character-glyph,title=From characters to glyphs.]
+ \startcombination
+ {\color[maincolor]{\definedfont[Serif*default at 30pt]affiliation}} {upright}
+ {\color[maincolor]{\definedfont[SerifItalic*default at 30pt]affiliation}} {italic}
+ \stopcombination
+\stopplacefigure
+
+We see here that the shape of the \type {a} is different for an upright serif and
+an italic. We also see that in \type {ffi} there is no dot on the \type {i}. The
+first case is just a stylistic one but the second one, called a ligature, is
+actually one shape. The 11 characters are converted into 9 glyphs. Hopefully the
+final document format carries some extra information about this transformation so
+that a cut and paste will work out well. In \PDF\ files this is normally the
+case. In this document we will not be too picky about the distinction as in most
+cases the glyph is rather related to the character as one knows it.
+
+So, a font contains glyphs and it also carries some information about
+replacements. In addition to that there needs to be at least some information
+about the dimensions of them. Actually, a typesetting engine does not have to
+know anything about the actual shape at all.
+
+\startplacefigure [location=here,reference=fig:glyph-dimension-normal,title=The bounding box of some normal glyphs.]
+ \startcombination[9*1]
+ {\ruledhbox{\color[maincolor]{\definedfont[Serif*default at 30pt]a}}} {}
+ {\ruledhbox{\color[maincolor]{\definedfont[Serif*default at 30pt]b}}} {}
+ {\ruledhbox{\color[maincolor]{\definedfont[Serif*default at 30pt]g}}} {}
+ {\ruledhbox{\color[maincolor]{\definedfont[Serif*default at 30pt]l}}} {}
+ {\ruledhbox{\color[maincolor]{\definedfont[Serif*default at 30pt]q}}} {}
+ {\ruledhbox{\color[maincolor]{\definedfont[Serif*default at 30pt].}}} {}
+ {\ruledhbox{\color[maincolor]{\definedfont[Serif*default at 30pt];}}} {}
+ {\ruledhbox{\color[maincolor]{\definedfont[Serif*default at 30pt]?}}} {}
+ {\ruledhbox{\color[maincolor]{\definedfont[Serif*default at 30pt]ffi}}} {}
+ \stopcombination
+\stopplacefigure
+
+\startplacefigure [location=here,reference=fig:glyph-dimension-italic,title=The bounding box of some italic glyphs.]
+ \startcombination[9*1]
+ {\ruledhbox{\color[maincolor]{\definedfont[SerifItalic*default at 30pt]a}}} {}
+ {\ruledhbox{\color[maincolor]{\definedfont[SerifItalic*default at 30pt]b}}} {}
+ {\ruledhbox{\color[maincolor]{\definedfont[SerifItalic*default at 30pt]g}}} {}
+ {\ruledhbox{\color[maincolor]{\definedfont[SerifItalic*default at 30pt]l}}} {}
+ {\ruledhbox{\color[maincolor]{\definedfont[SerifItalic*default at 30pt]q}}} {}
+ {\ruledhbox{\color[maincolor]{\definedfont[SerifItalic*default at 30pt].}}} {}
+ {\ruledhbox{\color[maincolor]{\definedfont[SerifItalic*default at 30pt];}}} {}
+ {\ruledhbox{\color[maincolor]{\definedfont[SerifItalic*default at 30pt]?}}} {}
+ {\ruledhbox{\color[maincolor]{\definedfont[SerifItalic*default at 30pt]ffi}}} {}
+ \stopcombination
+\stopplacefigure
+
+The rectangles around the shapes in \in {figure} [fig:glyph-dimension-normal] and
+\in {figure} [fig:glyph-dimension-italic] are called bounding boxes. The dashed
+line reflects the baseline on which they are eventually aligned next to each other.
+The amount above the baseline is called height, and below is called depth. The
+piece of the shape above the baseline is the ascender and the bit below the
+descender. The width of the bounding box is not by definition the width of the
+glyph. In \TYPEONE\ and \OPENTYPE\ fonts each shape has a so called advance width
+and that is the one that will be used.
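+
+As a small sketch of how these dimensions can be inspected, the following
+snippet puts a single glyph in a box and reports what \TEX\ sees. The box
+register \type {\scratchbox} is a \CONTEXT\ scratch register; the letter and
+the font are chosen only for illustration.
+
+\starttyping
+% box one glyph, then ask TeX for the dimensions it works with
+\setbox\scratchbox\hbox{\definedfont[Serif*default at 30pt]g}
+
+width:  \the\wd\scratchbox \par
+height: \the\ht\scratchbox \par
+depth:  \the\dp\scratchbox \par
+\stoptyping
+
+The reported width is the advance width, which is not necessarily the width of
+the bounding box of the outline.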
+
+\usemodule[fonts-kerns]
+
+\startplacefigure [location=here,reference=fig:glyph-kerns,title={Kerning in Latin Roman, Cambria, Pagella and Dejavu.}]
+ \scale[width=\textwidth]{\startcombination[1*4]
+ {\color[maincolor]{\definedfont[name:lmroman10-regular*default sa 1]\ShowKernedHBox{Very often glyphs get very small spaces inserted horizontally.}}} {}
+ {\color[maincolor]{\definedfont[name:cambria*default sa 1]\ShowKernedHBox{Very often glyphs get very small spaces inserted horizontally.}}} {}
+ {\color[maincolor]{\definedfont[name:texgyrepagellaregular*default sa 1]\ShowKernedHBox{Very often glyphs get very small spaces inserted horizontally.}}} {}
+ {\color[maincolor]{\definedfont[name:dejavuserif*default sa 0.9]\ShowKernedHBox{Very often glyphs get very small spaces inserted horizontally.}}} {}
+ \stopcombination}
+\stopplacefigure
+
+Another traditional property of a font is kerning. In \in {figure}
+[fig:glyph-kerns] you see this in action. These examples
+demonstrate that not all fonts need (or provide) the same kerns
+(in points).
+
+So, as a start, we have now met a couple of properties of a font.
+They can be summarized as follows:
+
+\starttabulate[|l|p|]
+\NC mapping to glyphs \EQ characters are represented by shapes that have recognizable
+ properties so that readers know what they mean \NC \NR
+\NC ligature building \EQ a sequence of characters gets mapped onto one glyph \NC \NR
+\NC dimensions \EQ each glyph has a width, height and depth \NC \NR
+\NC inter-glyph kerning \EQ optionally a bit of positive or negative space has to be inserted between glyphs \NC \NR
+%NC italic correction \EQ a correction is applied between an oblique shape and what follows \NC \NR
+\stoptabulate
+
+Regular font kerning is hardly noticeable and improves the overall look of the
+page. Typesetting applications sometimes are capable of inserting additional
+spaces between shapes. This more excessive kerning is not that much related to
+the font and is used for special purposes, like making a snippet of text stand
+out. In \CONTEXT\ this kind of kerning is available but it is a font independent
+feature. Keep in mind that when applying that kind of rather visible kerning
+you'd better not have ligatures and fancy replacements enabled as \CONTEXT\
+already tries to deal with that as well as possible.
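+
+A minimal sketch of this font independent kerning, assuming the \MKIV\ command
+\type {\kerncharacters}; the factor is an arbitrary value chosen for
+illustration:
+
+\starttyping
+% the optional argument is a factor, not a dimension
+\kerncharacters[.125]{This snippet stands out} while this text is set normally.
+\stoptyping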
+
+\stopsection
+
+\startsection[title=The basic process]
+
+In \TEX\ a font is an abstraction: the engine only needs to know about the
+mapping from characters to glyphs, what the width, height and depth are, what
+sequences need to be translated into ligatures and when kerning has to be
+applied. If for the moment we forget about math, these are all the properties
+that matter and this is what the \TEX\ font metric files that we see in the next
+section provide.
+
+Because one of the principles behind \LUATEX\ is that the core engine (the
+binary) stays small and that new functionality is provided in \LUA\ code, the
+font subsystem largely looks like it always has. As users will normally use
+a macro package most of the loading will be hidden from the user. It is however
+good to give a quick overview of how for instance \PDFTEX\ deals with fonts using
+traditional metric files.
+
+\startFLOWchart[pdftex]
+ \startFLOWcell
+ \name {source}
+ \location {1,1}
+ \shape {action}
+ \text {input}
+ \connection [rl] {parser}
+ \stopFLOWcell
+ \startFLOWcell
+ \name {parser}
+ \location {2,1}
+ \shape {action}
+ \text {characters}
+ \connection [rl] {builder}
+ \stopFLOWcell
+ \startFLOWcell
+ \name {builder}
+ \location {3,1}
+ \shape {action}
+ \text {glyphs}
+ \connection [rl] {backend}
+ \stopFLOWcell
+ \startFLOWcell
+ \name {backend}
+ \location {4,1}
+ \shape {action}
+ \text {subset}
+ \stopFLOWcell
+\stopFLOWchart
+
+\startplacefigure [location=here,reference=fig:tfm-pdftex,title={Several translation steps in a traditional \TEX\ flow.}]
+ \FLOWchart[pdftex]
+\stopplacefigure
+
+The input (bytes) gets translated into characters by the input parser. Normally
+this is a one|-|to|-|one translation but there are examples of some translation
+taking place. You can for instance make characters active and give them a
+meaning. So, the eight bit representation of \type {ë} in an editor's code page can
+become something else internally, for instance a regular \type {e} with an \type
+{¨} overlaid. It can also become another character, which in the code page
+would be shown as \type {á} but the user will not know this as by then this byte
+is already tokenized. Another example is multibyte translation, for instance
+\UTF\ sequences can get remapped to something that is known internally as being a
+character of some kind. The \LUATEX\ engine expects \UTF\ so a macro package has
+to make sure that translation to this encoding happens beforehand, for instance
+using a callback that intercepts the input from file. \footnote {In \CONTEXT\ we
+talk of input regimes and these can be mixed, although in practice most users
+will stick to \UTF\ and never use regimes.}
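+
+As a sketch of what selecting such a regime looks like, assume a source file
+that was saved in the Windows code page 1252 instead of \UTF. The regime name is
+an assumption about that particular file, not a recommended setting:
+
+\starttyping
+% this file is assumed to be saved in code page 1252
+\enableregime[cp1252]
+\stoptyping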
+
+So, the input character (sequence) becomes tokens representing a character. From
+these tokens \TEX\ will start building a (linked) node list where each character
+becomes a node. In this node there is a reference to the current font. If you
+know \TEX\ you will understand that a list can have more than characters: there
+can be skips, kerns, rules, references to images, boxes, etc.
+
+At some point \TEX\ will hand this list over to a routine that will turn it
+into something that resembles a paragraph or some other snippet of text. In that
+stage hyphenation kicks in, ligatures get built and kerning is added. Character
+references become glyph indices. This list can finally be broken into lines.
+
+It is no secret that \TEX\ can box and unbox material and that after unboxing
+some new formatting has to happen. The traditional engine has some optimizations
+that demand a partial reconstruction of the original list but in \LUATEX\ we
+removed this kind of optimization so there the process is somewhat simpler. We
+will see more of that later.
+
+When \TEX\ ships out a page, the backend will load the real font data and merge
+that into the final output. It will now take the glyph index and build the right
+data structures and references to the real font. As a font gets subset only the
+used glyphs end up in the final output.
+
+There is one tricky aspect involved here: re|-|encoding. In so called map files
+one can map a specific metric filename onto a real font name. One can also
+specify an encoding vector that tells what a given index really refers to. This
+makes it possible to use fonts that have more than 256 glyphs and refer to any of
+them. This is also the trick that makes it possible to use \TRUETYPE\ fonts in
+\PDFTEX: the backend code filters the right glyphs from the font, and remapping
+\TEX's glyph indices onto real entries in the font happens via the encoding
+vector. In \in {figure} [fig:tfm-bytes] we show a possible route for input byte
+68.
+
+\startFLOWchart[bytes]
+ \startFLOWcell
+ \name {source}
+ \location {1,1}
+ \shape {action}
+ \text {bytes (68)}
+ \connection [rl] {parser}
+ \stopFLOWcell
+ \startFLOWcell
+ \name {parser}
+ \location {2,1}
+ \shape {action}
+ \text {bytes (31)}
+ \connection [rl] {builder}
+ \stopFLOWcell
+ \startFLOWcell
+ \name {builder}
+ \location {3,1}
+ \shape {action}
+ \text {index (31)}
+ \connection [rl] {backend}
+ \stopFLOWcell
+ \startFLOWcell
+ \name {backend}
+ \location {4,1}
+ \shape {action}
+ \text {index (88)}
+ \stopFLOWcell
+\stopFLOWchart
+
+\startplacefigure [location=here,reference=fig:tfm-bytes,title={From bytes to indices.}]
+ \FLOWchart[bytes]
+\stopplacefigure
+
+As \LUATEX\ carries much of the baggage of older engines, you can still do it this
+way but in \CONTEXT\ \MKIV\ we have made our life much simpler: we use \UNICODE\ as
+much as possible. This means that we effectively have removed two steps (see \in
+{figure} [fig:tfm-luatex]).
+
+\startFLOWchart[luatex]
+ \startFLOWcell
+ \name {source}
+ \location {1,1}
+ \shape {action}
+ \text {input}
+ \connection [rl] {builder}
+ \stopFLOWcell
+ \startFLOWcell
+ \name {builder}
+ \location {2,1}
+ \shape {action}
+ \text {glyphs}
+ \stopFLOWcell
+\stopFLOWchart
+
+\startplacefigure [location=here,reference=fig:tfm-luatex,title={Simplified mapping in \LUATEX.}]
+ \FLOWchart[luatex]
+\stopplacefigure
+
+There is of course still some work to do for the backend, like subsetting, but
+the nasty dependency on the input encoding, font encoding (that itself relates to
+hyphenation) and backend re|-|encoding is gone. But keep in mind that the
+internal data structure of the font is still quite traditional.
+
+Before we move on to font formats I would like to point out that there is no space in
+\TEX. Spaces in the input are converted into glue, with or without some
+stretch and|/|or shrink. This also means that accessing character 32 in
+traditional \TEX\ will not end up as space in the output.
+
+\stopsection
+
+\startsection[title=\TEX\ metrics]
+
+\appendixdata{\in[fontdata:tfm]}
+\appendixdata{\in[fontdata:vf]}
+
+Traditional font metrics are packaged in a binary format. Due to the limitations
+of that time a font has at most 256 characters. In books dedicated to \TEX\ you
+will often find tables that show what glyphs are in a font, so we will not repeat
+that here as after all we got rid of that limitation in \LUATEX.
+
+Because 256 is not that much, especially when you mix many scripts and need lots
+of symbols from the same font, there are quite some encodings used in traditional
+\TEX, like \type {texnansi}, \type {ec} and \type {qx}. When you use \LUATEX\
+exclusively you can do with far fewer font files. This is easier for users,
+especially because most of those files were never used anyway. It's interesting
+to notice that some of the encodings contain symbols that are never used or used
+only once in a document, like the copyright or registered symbols. They are often
+accessed by symbolic names and therefore easily could have been omitted and
+collected in a dedicated symbol font thereby freeing slots for more useful
+characters anyway. The lack of coverage is probably one of the reasons why new
+encodings kept popping up. In the next table you see how many files are involved
+in Latin Modern which comes in a couple of design sizes. \footnote {The original
+Computer Modern fonts have \METAFONT\ source files and (runtime) generated bitmap
+files in whatever resolutions are needed for previewing and printing. The
+\TYPEONE\ follow|-|up came in several sets, organized by language support. The
+Latin Modern fonts have a few more weights and variants than Computer Modern.}
+
+\starttabulate[|l|c|r|r|r|]
+\HL
+\NC \bf font format \NC \bf type \NC \bf \# files \NC \bf size in bytes \NC \bf \CONTEXT \NC \NR
+\HL
+\NC type 1 \NC tfm \NC 380 \NC 3.841.708 \NC \NC \NR
+\NC \NC afm \NC 25 \NC 2.697.583 \NC \NC \NR
+\NC \NC pfb \NC 92 \NC 9.193.082 \NC \NC \NR
+\NC \NC enc \NC 15 \NC 37.605 \NC \NC \NR
+\NC \NC map \NC 9 \NC 42.040 \NC \NC \NR
+\HL[darkgray]
+\NC \NC \NC 521 \NC 15.812.018 \NC mkii \NC \NR
+\HL
+\NC opentype \NC otf \NC 73 \NC 8.224.100 \NC mkiv \NC \NR
+\HL
+\stoptabulate
+
+A \TFM\ file can contain so called italic corrections. This is an additional kern
+that can be added after a character in order to get better spacing between an
+italic shape and an upright one. As this is manual work, it's not a very advanced
+mechanism, but in addition to width, height, depth, kerns and ligatures it is
+nevertheless a useful piece of information. But, it's a rather \TEX\ specific
+quantity.
+
+Since \TEX\ showed up many fonts have been added. In addition support for
+commercial fonts was provided. In fact, for that to happen, one only needs
+accompanying metric files for \TEX\ itself and map files and encoding vectors
+for the backend. Because a metric file also has some general information, like
+spacing (including stretch and shrink), the ex|-|height and em|-|width, this
+means that sometimes guesses must be made when the original font does not come
+with such parameters.
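+
+In \TEX\ these general parameters are accessible as font dimensions. The
+following lines are a sketch that shows the values for the current font, using
+the parameter numbering of traditional text fonts:
+
+\starttyping
+interword space:   \the\fontdimen2\font \par
+interword stretch: \the\fontdimen3\font \par
+interword shrink:  \the\fontdimen4\font \par
+ex-height:         \the\fontdimen5\font \par
+em-width (quad):   \the\fontdimen6\font \par
+\stoptyping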
+
+At some point virtual fonts were introduced. In a virtual font a \TFM\ file has
+an accompanying \VF\ file. In that file each glyph has a specification that tells
+where to find the real glyph. It is even possible to construct glyphs from other
+glyphs. In traditional \TEX\ this only concerns the backend, which in \PDFTEX\ is
+built in. In \LUATEX\ this mechanism is integrated into the frontend which means
+that users can construct such virtual fonts themselves. We will see more of that
+later, but for now it's enough to know that when we talk about the representation
+of a font (the \TFM\ table) in \LUATEX, this includes virtual functionality.
+
+An important limitation of \TFM\ files, and therefore of traditional \TEX, is that
+the number of depths and heights is limited to 16 each. Although this results in
+somewhat inaccurate dimensions, in practice this goes unnoticed, if only because many
+designs have some consistency in this. On the other hand, it is a limitation when
+we start thinking of accents or even multiple accents which lead to many more
+distinctive heights and depths.
+
+Concerning ligatures we can remark that there are quite some substitutions
+possible although in practice only the multiple to one replacement has been used.
+
+Some fonts that are used in \TEX\ started out as bitmaps but rather soon
+\TYPEONE\ outline fonts became the fashion. These are supported using the map
+files that we will discuss later. First we look into \TYPEONE\ fonts.
+
+\stopsection
+
+\startsection[title=\TYPEONE]
+
+\appendixdata{\in[fontdata:afm]}
+\appendixdata{\in[fontdata:enc]}
+\appendixdata{\in[fontdata:map]}
+
+For a long time \TYPEONE\ fonts have dominated the scene. These are \POSTSCRIPT\
+fonts that can have more than 256 glyphs in the file that defines the shapes, but
+only 256 of them can be used at one time. Of course there can be multiple subsets
+active in one document.
+
+In traditional \TEX\ a \TYPEONE\ font is used by making a \TFM\ file from a so
+called Adobe metric file (\AFM) that comes with such a font. There are several
+tool chains for doing this and \CONTEXT\ \MKII\ ships with one that can be of
+help when you need to support commercial fonts. Projects like the Latin Modern
+Fonts and \TEX\ Gyre have normalized a whole lot of fonts that came in several
+more or less complete encodings into a consistent package of \TYPEONE\ fonts.
+This already simplified life a lot but still users had to choose a suitable input
+and font encoding for their language and|/|or script. As \TEX\ only cares about
+metrics and not about the rendering, it doesn't consider \TYPEONE\ fonts as
+something special. Also, as \TEX\ and \POSTSCRIPT\ were developed at about the same
+time, support for \TYPEONE\ fonts is well established in \TEX\ distributions.
+
+You can still follow this route but for \CONTEXT\ \MKIV\ this is no longer the
+recommended way because there we have changed the whole subsystem to use
+\UNICODE. As a result we no longer use \TFM\ files derived from \AFM\ files, but
+directly interpret the \AFM\ data. This not only removes the 256 limitation, but
+also brings more resolution in height and depth as we no longer have at most 16
+alternatives. There can also be more kerns. Of course we need some heuristics to
+determine for instance the spacing but that is not different from former times.
+
+Because most \TEX\ users don't use commercial fonts, they will not notice that
+\CONTEXT\ \MKIV\ treats \TYPEONE\ fonts this way. One reason is that the free
+fonts also come as wide fonts in \OPENTYPE\ format and whenever possible
+\CONTEXT\ prefers \OPENTYPE\ over \TYPEONE\ over \TFM.
+
+In the beginning \LUATEX\ could only load a \TFM\ file, which is why loading
+\AFM\ files is implemented in \LUA. Later, when the \OPENTYPE\ loader was added,
+loading \PFB\ and \AFM\ files also became possible but it's slower and we see no
+reason to rewrite the current code in \CONTEXT. We also do a couple of extra
+things when loading such a file. As more \TYPEONE\ fonts move on to \OPENTYPE\ we
+don't expect that much usage anyway.
+
+\stopsection
+
+\startsection[title=\OPENTYPE]
+
+\appendixdata{\in[fontdata:otf]}
+
+When an engine can deal with \UNICODE\ directly it also means that internally it
+uses pretty large numbers for storing characters and glyph indices. The first
+\TEX\ descendant that went wide was \OMEGA, later replaced by \ALEPH. However, this
+engine never took off and still used its own extended \TFM\ format: \OFM. In fact,
+as \LUATEX\ uses some of the \ALEPH\ code, it can also use these extended metric
+files but I don't think that there are any useful fonts around so we can forget
+about this.
+
+We use the term \OPENTYPE\ for a couple of font formats that share the same
+principles: \OPENTYPE\ (\OTF), \TRUETYPE\ (\TTF) and \TRUETYPE\ collections
+(\TTC). The \LUATEX\ font reader presents them in a similar format. In the case
+of a \TRUETYPE\ collection, one does not load the whole font but selects an
+instance from it. Internally an \OPENTYPE\ font can have the glyphs organized in
+subfonts.
+
+The first \TEX\ descendant to really go wide from front to back is \XETEX. This
+engine can use \OPENTYPE\ fonts directly and for a whole category of users this
+opened up a new world. However, it is still mostly a traditional engine. The
+transition from characters to glyphs is accomplished by external libraries, while
+in \LUATEX\ we code in \LUA. This has the disadvantage that it is slower
+(although that depends on the job) but the advantage is that we have much more
+control and can extend the font handler as we like.
+
+An \OPENTYPE\ font is much more complex than a \TYPEONE\ one. Unless it is a
+quick and dirty converted existing font, it will have more glyphs to start with.
+Quite likely it will have kerns and ligatures too and of course there are
+dimensions. However, there is no concept of a depth and height. These need to be
+deduced from the bounding box instead. There is an advance width. This means that
+we can start right away using such fonts if we map those properties onto the
+\TFM\ table that \LUATEX\ expects.
+
+But there is more, take ligatures. In a traditional font the sequence \type {ffi}
+always becomes a ligature, given that the font has such a glyph. In \LUATEX\
+there is a way to disable this mechanism, which is sometimes handy when dealing
+with mono|-|spaced fonts in verbatim. In a traditional engine it's pretty hard to
+disable that; one option is to insert kerns manually. In an \OPENTYPE\ font ligatures
+are collected in a so called feature. There can be many such features and even
+kerning is a feature. Other examples are old style numerals, fractions,
+superiors, inferiors, historic ligatures and stylistic alternates.
+
+\starttabulate[|lT|l|l|l|l|]
+\NC \type{onum} \NC \ruledhbox{\maincolor\DemoOnumLM\char45 1}
+ \NC \ruledhbox{\maincolor\DemoOnumLM1234567890}
+ \NC \ruledhbox{\maincolor\DemoOnumLM\char"A2}
+ \NC \ruledhbox{\maincolor\DemoOnumLM\char"24} \NC \NR
+%NC \type{lnum} \NC \ruledhbox{\maincolor\DemoLnumLM\char45 1}
+% \NC \ruledhbox{\maincolor\DemoLnumLM1234567890}
+% \NC \ruledhbox{\maincolor\DemoLnumLM\char"A2}
+% \NC \ruledhbox{\maincolor\DemoLnumLM\char"24} \NC \NR
+\NC \type{tnum} \NC \ruledhbox{\maincolor\DemoTnumLM\char45 1}
+ \NC \ruledhbox{\maincolor\DemoTnumLM1234567890}
+ \NC \ruledhbox{\maincolor\DemoTnumLM\char"A2}
+ \NC \ruledhbox{\maincolor\DemoTnumLM\char"24} \NC \NR
+\NC \type{pnum} \NC \ruledhbox{\maincolor\DemoPnumLM\char45 1}
+ \NC \ruledhbox{\maincolor\DemoPnumLM1234567890}
+ \NC \ruledhbox{\maincolor\DemoPnumLM\char"A2}
+ \NC \ruledhbox{\maincolor\DemoPnumLM\char"24} \NC \NR
+\stoptabulate
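+
+A feature set that enables one of these features can be defined as follows. This
+is a sketch: the name \type {demooldstyle} is made up for this example, and
+whether the digits change depends on the font providing \type {onum}.
+
+\starttyping
+% inherit from the default set and switch on oldstyle numerals
+\definefontfeature[demooldstyle][default][onum=yes]
+
+\definedfont[Serif*default]      1234567890 \par
+\definedfont[Serif*demooldstyle] 1234567890 \par
+\stoptyping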
+
+To all this you need to add that features operate in two dimensions: languages
+and scripts. This means that when ligatures are enabled for Dutch the \type {ij}
+sequence becomes a single glyph but for German it gets mapped onto two glyphs.
+And, to make it even more complex, a substitution can depend on circumstances,
+which means that for Dutch \type {fijn} becomes \type {f ij n} but \type {fiets}
+becomes \type {fi ets}. It will be no surprise that not all \OPENTYPE\ fonts come
+with a complete and rich repertoire of rules. To make things worse, there can be
+rules that turn \type {1/2} into one glyph, or transform the numbers into superior
+and inferior alternatives, but leave us with an unacceptably rendered \type
+{1/a}, given that the \type {frac} feature is enabled. It looks like features
+like this are to be applied to a manually selected range of characters.
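+
+A sketch of such a manually limited application, assuming a font that provides
+the \type {frac} feature and a font definition that permits adding features on
+the fly; the feature set name \type {f:fraction} is made up for this example:
+
+\starttyping
+\definefontfeature[f:fraction][frac=yes]
+
+% the feature is only active inside the group
+Add {\feature[+][f:fraction]1/2} cup of sugar, as described in 1/a.
+\stoptyping
+
+Only the grouped \type {1/2} is subject to the feature; the \type {1/a} that
+follows is left alone.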
+
+The fact that an \OPENTYPE\ font can contain many features and rules to apply
+them makes it possible to typeset scripts like Arabic. And this is where it gets
+vague. A generic \OPENTYPE\ sub|-|engine can do clever things using these rules,
+but if you read the specification for some scripts additional intelligence has to
+be provided by the typesetting engine.
+
+While users no longer have to care about encodings, map files and back|-|end
+issues, they do have to carry knowledge about the possibilities and limitations
+of features. Even worse, they need to be aware that fonts can have bugs.
+Also, as font vendors have no tradition of providing updates this is something
+that we might need to take care of ourselves by tweaking the engine.
+
+One of the problems with the transition from \TYPEONE\ to \OPENTYPE\ is that font
+designers can take an existing design and start from that basic repertoire of
+shapes. If such a design had oldstyle figures only, there is a good chance that
+this will be the case in the \OPENTYPE\ variant too. However, such a default
+interferes with the fact that the \type {onum} feature is one that we explicitly
+have to enable. This means that writing a generic style where a font is later
+plugged in becomes somewhat messy if it assumes that features need to be turned
+on.
+
+\TEX\ users expect more control, which means that in practice just an \OPENTYPE\
+engine is not enough, but for the average font the \TEX\ model using the
+traditional approach still is quite acceptable. After all, not all users use
+complex scripts or need advanced features. And, in practice most readers don't
+notice the difference anyway.
+
+\stopsection
+
+\startsection[title=\LUA]
+
+\appendixdata{\in[fontdata:lua]}
+
+As mentioned support for virtual fonts is built into \LUATEX\ and loading the so
+called \VF\ files happens when needed. However, that concerns traditional fonts
+that we already covered. In \CONTEXT\ we do use the virtual font mechanism for
+creating missing glyphs out of existing ones or adding fallbacks when this is not
+possible. But this is not related to some kind of font format.
+
+In 2010 and 2011 the first public \OPENTYPE\ math fonts showed up that replace
+their \TYPEONE\ originals. In \CONTEXT\ we already went forward and created
+virtual \UNICODE\ fonts out of traditional fonts. Of course eventually the
+defaults will change to the \OPENTYPE\ alternatives. The specification for such a
+virtual font is given in \LUA\ tables and therefore you can consider \LUA\ to be
+a font format as well. In \CONTEXT\ such fonts can be defined in so called
+goodies files. As we use these files for much more tuning, we come back to that
+in a later chapter. In a virtual font you can mix real \TYPEONE\ fonts and real
+\OPENTYPE\ fonts using whatever metrics suit best.
+
+An extreme example is the virtual \UNICODE\ Punk font. This font is defined in
+the \METAPOST\ language (derived from Don Knuth's \METAFONT\ sources) where each
+glyph is one graphic. Normally we get \POSTSCRIPT, but in \LUATEX\ we can also
+get output in a comparable \LUA\ table. That output is converted to \PDF\
+literals that become part of the virtual font definitions and these eventually
+end up in the \PDF\ page stream. So, at the \TEX\ end we have regular (virtual)
+characters and all \TEX\ needs is their dimensions, but in the \PDF\ each glyph
+is shown using drawing operations. Of course the now available \OPENTYPE\ variant
+is more efficient, but it demonstrates the possibilities.
+
+\stopsection
+
+\startsection[title=Files]
+
+We summarize these formats in the following table where we explain what the file
+suffixes stand for:
+
+\starttabulate[|Tl|p|]
+\HL
+\NC tfm \NC This is the traditional \TEX\ font metric file format and it reflects
+ the internal quantities that \TEX\ uses. The internal data structures
+ (in \LUATEX) are an extension of the \TFM\ format. \NC \NR
+\NC vf \NC This file contains information about how to construct and where to
+ find virtual glyphs and is meant for the backend. With \LUATEX\ this
+ format becomes better known. \NC \NR
+\NC pk \NC This is the bitmap format used for the first generation of \TEX\
+ fonts but the typesetter never deals with them. Bitmap files are more
+ or less obsolete. \NC \NR
+\HL
+\NC ofm \NC This is the \OMEGA\ variant of the \type {tfm} files that caters for
+ larger fonts. \NC \NR
+\NC ovf \NC This is the \OMEGA\ variant of the \type {vf}. \NC \NR
+\HL
+\NC pfb \NC In this file we find the glyph data (outlines) and some basic
+ information about the font, like name|-|to|-|index mappings. A
+ differently byte|-|encoded variant of this format is \type {pfa}.\NC
+ \NR
+\NC afm \NC This file accompanies the \type {pfb} file and provides additional
+ metrics, kerns and information about ligatures. For \MSWINDOWS\ there is
+ a binary variant that has the \type {pfm} suffix. \NC \NR
+\NC map \NC The backend will consult this file for mapping metric file names onto
+ real font names. \NC \NR
+\NC enc \NC The backend will include (and use) this encoding vector to map
+ internal indices to font indices using glyph names, if needed. \NC
+ \NR
+\HL
+\NC otf \NC This binary format describes not only the font in terms of metrics,
+ features and properties but also contains the shapes. \NC \NR
+\NC ttf \NC This is the \MICROSOFT\ variant of \OPENTYPE. \NC \NR
+\NC ttc \NC This is the \MICROSOFT\ container format that combines multiple fonts
+ in one. \NC \NR
+\HL
+\NC fea \NC A (\FONTFORGE) feature definition file. Such a file can be loaded and
+ applied to a font. This is no longer supported in \CONTEXT\ as we have
+ other means to achieve the same goals. \NC \NR
+\NC cid \NC A glyph index (name) to \UNICODE\ mapping file that is referenced
+ from an \OPENTYPE\ font and is shared between fonts. \NC \NR
+\HL
+\NC lfg \NC These are \CONTEXT\ specific \LUA\ font goodie files providing
+ additional information. \NC \NR
+\HL
+\stoptabulate
+
+If you look at how files are organized in a \TEX\ distribution, you will notice
+that these files all get their own place. Therefore adding a \TYPEONE\ font to
+the distribution is not that trivial if you want to avoid clashes. Also, files
+are simply not found when they are not in the right spot. Just to mention a few
+paths:
+
+\starttyping
+<root>/fonts/tfm/vendor/typeface
+<root>/fonts/vf/vendor/typeface
+<root>/fonts/type1/vendor/typeface
+<root>/fonts/truetype/vendor/typeface
+<root>/fonts/opentype/vendor/typeface
+<root>/fonts/fea
+<root>/fonts/cid
+<root>/fonts/dvips/enc
+<root>/fonts/dvips/map
+\stoptyping
+
+There can be multiple roots and the right locations are specified in a
+configuration file. Currently all engines can use the \DVIPS\ encoding and map
+files, so luckily we don't need to duplicate this. For some reason \TRUETYPE\ and
+\OPENTYPE\ fonts have different locations and you need to be aware of the fact
+that some fonts come in both formats (just to confuse users) so you might end up
+with conflicts.
+
+In \CONTEXT\ we try to make life somewhat easier by also supporting a simple path
+structure:
+
+\starttyping
+<root>/fonts/data/vendor/typeface
+\stoptyping
+
+This way files are kept together and installing commercial fonts is less complex
+and error prone. Also, in practice we only have one set of files now: one or the
+other \OPENTYPE\ format.
+
+If you want to see the difference between a traditional (\PDFTEX\ or \XETEX\ plus
+\CONTEXT\ \MKII) setup and a modern one (\LUATEX\ with \CONTEXT\ \MKIV) you can
+install the \CONTEXT\ suite (formerly known as minimals). If you explicitly
+choose a \LUATEX\ only setup, you will notice that far fewer files get
+installed.
+
+\stopsection
+
+\startsection[title=Text]
+
+This is not an in|-|depth explanation of how to define and load fonts in
+\CONTEXT. First of all this is covered in other manuals, but more important is
+that we assume that the reader is already familiar with the way \CONTEXT\ deals
+with fonts. Therefore we limit ourselves to some remarks and expand on this a bit
+in later chapters.
+
+The font subsystem has evolved over the years and when you look at the low level
+code you will probably find it complex. This is true, although in some aspects it
+is not as complex as in \MKII\ where we also had to deal with encodings due to
+the eight bit limitations. In fact, setting up fonts is easier due to the fact
+that we have fewer files to deal with.
+
+The main properties of a (modern) font subsystem for typesetting text are the
+following:
+
+\startitemize[n]
+ \startitem
+ We need to be able to switch the look and feel efficiently and
+ consistently, for instance going from regular to bold or italic. So,
+ when we load a font family we not only load one file, but often
+ at least four: regular, bold, italic (oblique) and bolditalic
+ (boldoblique).
+ \stopitem
+ \startitem
+ When we change the size we also need to make sure that these related
+ sets are changed accordingly. You really want the bold shapes to scale
+ along with the regular ones.
+ \stopitem
+ \startitem
+ Shapes are organized in serif, sans serif, mono spaced and math, and for
+ the proper working of a typesetter that has math all over you always
+ need the math. Again, when you change size, all these shapes need to
+ scale in sync.
+ \stopitem
+ \startitem
+ In one document several families can be combined so the subsystem should
+ make it possible to switch from one to the other without too much
+ overhead.
+ \stopitem
+ \startitem
+ Because section heads and other structural elements have their own sizes
+ there has to be a consistent way to deal with that. It should also be
+ possible to specify exceptions for them.
+ \stopitem
+\stopitemize
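+
+These points can be illustrated with a minimal sketch. We assume here the
+\type {pagella} typescript that ships with \CONTEXT; the text and the chosen
+sizes are only examples.
+
+\starttyping
+\setupbodyfont[pagella,11pt]
+
+% structural elements get their own (relative) size
+\setuphead[section][style=\bfb]
+
+\starttext
+
+\startsection[title=Switching]
+
+Regular, {\bf bold}, {\it italic} and {\bi bold italic} come from one family.
+
+{\switchtobodyfont[14pt] At another size {\bf bold} scales along, as does
+math: $a^2 + b^2 = c^2$.}
+
+{\ss Sans serif} and {\tt mono spaced} follow the same body font size.
+
+\stopsection
+
+\stoptext
+\stoptyping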
+
+In the next chapters we will cover some details, for instance font features. You
+can actually control these when setting up a body font, simply by redefining
+the \type {default} feature set, but not all features are dealt with this way.
+So let's continue the demands put on a font subsystem.
+
+\startitemize[continue]
+ \startitem
+ Sometimes inter|-|character kerning is needed. In \CONTEXT\ this is not a
+ property of a font because glyphs can be mixed with basically anything.
+ This kind of feature is applied independently of a font.
+ \stopitem
+ \startitem
+ The same is true for casing (like uppercasing and such) which is not
+ related to a font but applied to a selected (or marked) piece of the
+ input stream.
+ \stopitem
+ \startitem
+ Using so called \quotation {small caps} or \quotation {old style}
+ numerals or \unknown\ can be dealt with by setting the default features
+ but often these are applied selectively. As these are applied using the
+ information in a font they do belong to the font subsystem but in
+ practice they can be seen as independent (assuming that the font supports
+ them at all).
+ \stopitem
+ \startitem
+ Protrusion (into margins) and expansion (to improve whitespace) are
+ applied to the font at load time because the engine needs to know about
+ them. But they too can selectively be turned on and off. They are more
+ related to line break handling than to font defining.
+ \stopitem
+ \startitem
+ Slanting (to fake oblique) and expanding (to fake bold) are regular
+ features but are applied to the font because the engine needs to know
+ about them. They permanently influence the shape.
+ \stopitem
+\stopitemize
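+
+Some of these demands can be sketched as follows. The names \type
+{mykerning}, \type {f:smallcaps} and \type {slanted} are made up for this
+example and we assume a font that provides the \type {onum} and \type {smcp}
+features. Take this as an illustration, not as the recommended setup.
+
+\starttyping
+% default features have to be redefined before the body font is set up
+\definefontfeature[default][default][onum=yes]
+\definefontfeature[default][default][protrusion=quality,expansion=quality]
+
+\setupbodyfont[pagella]
+
+% protrusion and expansion are turned on at the paragraph level
+\setupalign[hanging,hz]
+
+% kerning that is independent of the font
+\definecharacterkerning[mykerning][factor=.125]
+
+% a feature that is applied selectively
+\definefontfeature[f:smallcaps][smcp=yes]
+
+% slanting is applied when the font is defined
+\definefontfeature[slanted][default][slant=.2]
+\definefont[MySlanted][Serif*slanted]
+
+\starttext
+    {\setcharacterkerning[mykerning]some kerned text}, \WORD{uppercased},
+    {\feature[+][f:smallcaps]Small Caps} and {\MySlanted slanted} in 1234.
+\stoptext
+\stoptyping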
+
+We will discuss these in this manual too. What we will not discuss in depth is
+spacing, even when it depends on the (main body) font size. Spacing uses
+properties of fonts (like the ex|-|height or em|-|width and maybe the width of
+the space), but normally it is controlled by the spacing subsystem. We will
+however mention some rather specific possibilities:
+
+\startitemize[continue]
+ \startitem
+ The \CONTEXT\ font subsystem provides ways to combine multiple fonts
+ into one.
+ \stopitem
+ \startitem
+ You can construct artificial fonts, using existing fonts or \METAPOST\
+ graphics.
+ \stopitem
+ \startitem
+ Fonts can be fixed (dimensions) and completed (for instance accented
+ characters) when loading.
+ \stopitem
+ \startitem
+ There are extensive tracing options, not only for applied features but
+ also for loading, checking etc. There is a set of styles that can be
+ used to study fonts.
+ \stopitem
+\stopitemize
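+
+Tracing is enabled with trackers. As a minimal sketch, assuming that the
+\type {otf.loading} and \type {fonts.missing} trackers are available in your
+version:
+
+\starttyping
+% ask for reports about font loading and about missing glyphs
+\enabletrackers[otf.loading,fonts.missing]
+
+\setupbodyfont[pagella]
+
+\starttext
+    Some text to typeset; the reports end up in the log file.
+\stoptext
+\stoptyping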
+
+Sometimes users ask for very special trickery and it is no surprise then that
+some of that is now widely known (or even discussed in detail). When we get
+notice of that we can mention it in this manual.
+
+So how does this all relate to font formats? We mentioned that when loading we
+basically load some four files per family (and more if we use specific fonts for
+titling). These files just provide the data: metric information, shapes and ways
+to remap characters (or sequences) into glyphs, whether or not positioned relative
+to each other. In traditional \TEX\ only dimensions, kerns and ligatures
+mattered, but nowadays we also deal with specific \OPENTYPE\ features. But
+still, as you can deduce from the above, this is only part of the story. You need
+a complete and properly integrated system. It is no big deal to set up some
+environment that uses font files to achieve some typesetting goal, but to provide
+users with some consistent and extensible system is a bit more work.
+
+There are basically three font formats: good old bitmaps, \TYPEONE\ and
+\OPENTYPE. All need to be supported and expectations are that we also support
+their features. But it should be noted that whatever font you use, the quality
+of the outcome depends on what information the font can provide. We can improve
+processing but are often stuck with the font. There are many thousands of
+fonts out there and we need to be able to use them all.
+
+\stopsection
+
+\startsection[title=Math]
+
+In the previous section we already mentioned math fonts. The fonts are just one
+aspect of typesetting math and math fonts are special in the sense that they have
+to provide the relevant information. For instance a parenthesis comes in several
+sizes and at some point turns into a symbol made out of pieces (like a top curve,
+middle lines and bottom curve) that overlap. The user never sees such details. In
+fact, there are not that many math fonts and these are already set up so there is
+not much to mess up here. Nevertheless we mention:
+
+\startitemize [n]
+ \startitem
+ Math fonts are loaded in three sizes: text, script and scriptscript. The
+ optimal relative sizes are defined in the font.
+ \stopitem
+ \startitem
+ There are direction aware math fonts and we support this in \CONTEXT.
+ \stopitem
+ \startitem
+ Bold math is in fact a bolder version of a regular math font (that can
+ have bold symbols too). Again this is supported.
+ \stopitem
+\stopitemize
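+
+A minimal sketch of the first point, assuming the \type {pagella} typescript
+with its math companion: the base is set in the text size, the superscript in
+the script size and the nested superscript in the scriptscript size.
+
+\starttyping
+\setupbodyfont[pagella]
+
+\starttext
+    \startformula
+        x + x^{x} + x^{x^{x}}
+    \stopformula
+\stoptext
+\stoptyping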
+
+The way math is dealt with in \CONTEXT\ is different from the way it is done
+traditionally. Already when we started with \MKIV\ we moved to \UNICODE\ and
+the setup at the font level is kept simple by delegating some of the work to
+the \LUA\ end. We will see some of the mentioned aspects in more detail later.
+
+Because of its complexity and because math fonts (and related settings) can be
+activated many times in a math text, quite some effort has been put into
+making it efficient. But you need to keep in mind that when we discuss math
+related topics later on, this is hardly of concern. Math fonts are loaded only
+once so manipulating them a bit has no penalty. And using them later on is hardly
+related to the font subsystem.
+
+Concerning formats we can notice that traditional \TEX\ comes with math fonts
+that have properties that the engine can use. Because there were not many math
+fonts, this was no problem. The \OPENTYPE\ math fonts however are also used in
+other applications and therefore are a bit more generic. \footnote {Their
+internals are now defined in the \OPENTYPE\ specification.} For this we not only
+had to adapt the math engine in \LUATEX\ (although we kept that to the minimum)
+but we also had to think differently about loading them. In later chapters we will
+see that in the transition to \UNICODE\ math fonts we implemented a mechanism for
+combining \TYPEONE\ fonts into virtual \UNICODE\ fonts. We did that because it
+made no sense to keep an old and a new loader side by side.
+
+There will not be thousands of math fonts flying around. A few dozen is already a
+lot and the developers of macro packages can set them up for the users. So, in
+practice there is not much that a user needs to know about math font formats.
+
+\stopsection
+
+\startsection[title=Caching]
+
+Because fonts can be large and because we use \LUA\ tables to describe them
+a bit of effort has been put into managing them efficiently. Once converted
+to the representation that we need they get cached. You can peek into the cache
+which is someplace on your system (depending on the setup):
+
+\starttabulate[|l|p|]
+\NC \type{fonts/data} \NC font name databases \NC \NR
+\NC \type{fonts/mp} \NC fonts created using \METAPOST \NC \NR
+\NC \type{fonts/one} \NC type one fonts, converted from \type {afm} and \type
+ {pfb} files \NC \NR
+\NC \type{fonts/otl} \NC open type fonts, converted from \type {ttf}, \type {otf},
+ \type {ttc} and \type {ttx} files loaded using the
+ \CONTEXT\ \LUA\ loader \NC \NR
+\NC \type{fonts/pdf} \NC font shapes for color fonts \NC \NR
+\NC \type{fonts/shapes} \NC outlines of fonts (for instance for use in \METAFUN) \NC \NR
+\NC \type{fonts/streams} \NC font programs for variable font instances \NC \NR
+\stoptabulate
+
+There can be three types of files there. The \type{tma} files are just \LUA\
+tables and they can be large. These files can be compiled to bytecode where \type
+{tmc} is for stock \LUATEX\ and \type {tmb} for \LUAJITTEX. The \type {tma} files
+are optimized for space and memory (aka: packed) but you can expand them with
+\type {mtxrun --script font}.
+
+Fonts in the cache are automatically updated when you install new versions of a
+font or when the \CONTEXT\ font loader has been updated.
+
+\stopsection
+
+\startsection[title=Paths]
+
+The search for fonts happens on paths defined in \type {texmf.cnf}. The information
+in there is used to generate a file database for fast access with priorities based
+on file type. The \TDS\ is the starting point. The environment variable driven paths
+\type {OSFONTDIR} (set automatically) and \type {EXTRAFONTDIR} are taken into account.
+
+In addition you can set \type {RUNTIMEFONTS} which is, when set, consulted at
+runtime. You can also add a path in your style:
+
+\starttyping
+\usefontpath[c:/data/projects/myproject/fonts]
+\stoptyping
+
+although in general we recommend putting fonts in
+
+\starttyping
+<texroot>/tex/texmf-fonts/fonts/data
+\stoptyping
+
+which is more efficient.
+
+\stopsection
+
+\stopchapter
+
+\stopcomponent