% language=uk

\startcomponent hybrid-math

\environment hybrid-environment

\startchapter[title={Handling math: A retrospective}]

{This is \TUGBOAT\ article .. reference needed.}

% In this article I will reflect on how the plain \TEX\ approach to math
% fonts influenced the way math has been dealt with in \CONTEXT\ \MKII\
% and why (and how) we divert from it in its follow up \MKIV, now that
% \LUATEX\ and \OPENTYPE\ math have come around.

When you start using \TEX, you cannot help but notice that math plays an
important role in this system. As soon as you dive into the code you will see
that there is a concept of families that is closely related to math typesetting.
A family is a set of three sizes: text, script and scriptscript.

\startformula
a^{b^{c}} = \frac{d}{e}
\stopformula

The smaller sizes are used in superscripts and subscripts and in more complex
formulas where elements are stacked on top of each other.

It is no secret that the latest math font technology is not driven by the \TEX\
community but by Microsoft. They have taken a good look at \TEX\ and extended the
\OPENTYPE\ font model with the information that is needed to do things similar to
\TEX\ and beyond. It is a firm proof of \TEX's abilities that after some 30 years
it is still seen as the benchmark for math typesetting. One can only speculate
what Don Knuth would have come up with if today's desktop hardware and printing
technology had been available in those days.

As a reference implementation of a font Microsoft provides Cambria Math. In the
specification the three sizes are there too: a font can provide specifically
designed script and scriptscript variants for text glyphs where that is relevant.
Control is exercised with the \type {ssty} feature.

Another inheritance from \TEX\ and its fonts is the fact that larger symbols can
be made out of snippets and these snippets are available as glyphs in the font,
so no special additional (extension) fonts are needed to get for instance really
large parentheses. The information of when to move up one step in size (given
that there is a larger shape available) or when and how to construct larger
symbols out of snippets is there as well. Placement of accents is made easy by
information in the font and there are a whole lot of parameters that control the
typesetting process. Of course you still need machinery comparable to \TEX's math
subsystem but Microsoft Word has such capabilities.

I'm not going to discuss the nasty details of providing math support in \TEX, but
rather pay some attention to an (at least for me) interesting side effect of
\TEX's math machinery. There are excellent articles by Bogus\l{}aw Jackowski and
Ulrik Vieth about how \TEX\ constructs math and of course Knuth's publications
are the ultimate source of information as well.

Even if you only glance at the implementation of traditional \TEX\ font support,
the previously mentioned families are quite evident. You can have 16 of them but
4 already have a special role: the upright roman font, math italic, math symbol
and math extension. These give us access to some 1000 glyphs in theory, but when
\TEX\ showed up it was mostly a 7-bit engine and input of text was often also
7-bit based, so in practice many fewer shapes are available, and subtracting the
snippets that make up the large symbols brings down the number again.
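
As a rough check of these numbers, and assuming fonts with 256 slots in the
8-bit case and 128 slots in the 7-bit case, four families give:

\startformula
4 \times 256 = 1024 \qquad 4 \times 128 = 512
\stopformula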

Now, say that in a formula you want to have a bold character. This character is
definitely not in the 4 mentioned families. Instead you enable another one, one
that is linked to a bold font. And, of course there is also a family for bold
italic, slanted, bold slanted, monospaced, maybe smallcaps, sans serif, etc. To
complicate things even more, there are quite a few symbols that are not covered
in the foursome so we need another 2 or 3 families just for those. And yes, bold
math symbols will demand even more families.

\startformula
a + \bf b + \bi c = \tt d + \ss e + \cal f
\stopformula

Try to imagine what this means for implementing a font system. When (in for
instance \CONTEXT) you choose a specific body font at a certain size, you not
only switch the regular text fonts, you also initialize math. When dealing with
text and a font switch there, it is no big deal to delay font loading and
initialization till you really need the font. But for math it is different. In
order to set up the math subsystem, the families need to be known and set up and
as each one can have three members you can imagine that you easily initialize
some 30 to 40 fonts. And, when you use several math setups in a document,
switching between them involves at least some re-initialization of those
families.

When Taco Hoekwater and I were discussing \LUATEX\ and especially what was needed
for math, it was sort of natural to extend the number of families to 256. After
all, years of traditional usage had demonstrated that it was pretty hard to come
up with math font support where you could freely mix a whole regular and a whole
bold set of characters simply because you ran out of families. This is a side
effect of math processing happening in several passes: you can change a family
definition within a formula, but as \TEX\ remembers only the family number, a
later definition overloads a previous one. The previous example in a traditional
\TEX\ approach can result in:

\starttyping
a + \fam7 b + \fam8 c = \fam9 d + \fam10 e + \fam11 f
\stoptyping

Here the \type{a} comes from the family that reflects math italic (most likely
family~1) and \type {+} and \type {=} can come from whatever family is told to
provide them (this is driven by their math code properties). As family numbers
are stored in the identification pass and resolved to real fonts only in the
typesetting pass, you can imagine that overloading a family in the middle of a
formula is not an option: it's the number that gets stored and not what it is
bound to. As it is unlikely that we actually use more than 16 families we could
have come up with a pool approach where families are initialized on demand but
that does not work too well with grouping (or at least it complicates matters).
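
The effect can be seen in a small plain \TEX\ experiment. The following is only
a sketch: the name \type {\demofam} is made up for this illustration, and the
comments describe what the engine can be expected to do given that families are
resolved at the end of the formula.

\starttyping
\newfam\demofam
\textfont\demofam=\tenbf

% Both letters are tagged with the same family number. The second
% assignment is the one still in effect when the formula ends, so both
% letters are taken from \tentt and the bold binding is lost.
$ {\fam\demofam a} + \textfont\demofam=\tentt {\fam\demofam b} $
\bye
\stoptyping

Getting a bold \type {a} next to a monospaced \type {b} therefore takes two
families, which is exactly how the family count grows.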

So, when I started thinking of rewriting the math font support for \CONTEXT\
\MKIV, I still had this nicely increased upper limit in mind, if only because I
was still thinking of support for the traditional \TEX\ fonts. However, I soon
realized that it made no sense at all to stick to that approach: \OPENTYPE\ math
was on its way and in the meantime we had started the math font project. But
given that this would easily take some five years to finish, an intermediate
solution was needed. As we can make virtual fonts in \LUATEX, I decided to go
that route and for several years already it has worked quite well. For the moment
the traditional \TEX\ math fonts (Computer Modern, px, tx, Lucida, etc.) are
virtualized into a pseudo|-|\OPENTYPE\ font that follows the \UNICODE\ math
standard. So instead of needing more families, in \CONTEXT\ we could do with
fewer. In fact, we can do with only two: one for regular and one for bold,
although, thinking of it, there is nothing that prevents us from mixing different
font designs (or preferences) in one formula but even then a mere four families
would still be fine.

To summarize this, in \CONTEXT\ \MKIV\ the previous example now becomes:

\starttyping
U+1D44E + U+1D41B + U+1D484 = U+1D68D + U+1D5BE + U+1D4BB
\stoptyping

For a long time I have been puzzled by the fact that one needs so many fonts for
a traditional setup. It was only after implementing the \CONTEXT\ \MKIV\ math
subsystem that I realized that all of this was only needed in order to support
alphabets, i.e.\ just a small subset of a font. In \UNICODE\ we have quite a few
math alphabets and in \CONTEXT\ we have ways to map a regular keyed-in (say)
\quote{a} onto a bold or monospaced one. When writing that code I hadn't even
linked the \UNICODE\ math alphabets to the family approach for traditional \TEX.
Not being a mathematician myself I had no real concept of systematic usage of
alternative alphabets (apart from the occasional different shape for an
occasional physics entity).

Just to give an idea of what \UNICODE\ defines: there are alphabets in regular
(upright), bold, italic, bold italic, script, bold script, fraktur, bold fraktur,
double|-|struck, sans|-|serif, sans|-|serif bold, sans|-|serif italic,
sans|-|serif bold italic and monospace. These are regular alphabets with upper-
and lowercase characters complemented by digits and occasionally Greek.

It was a few years later (somewhere near the end of 2010) that I realized that a
lot of the complications in (and load on) a traditional font system were simply
due to the fact that in order to get one bold character, a whole font had to be
loaded in order for families to express themselves. And that in order to have
several fonts being rendered, one needed lots of initialization for just a few
cases. Instead of wasting one font and family for an alphabet, one could as well
have combined 9 (upper and lowercase) alphabets into one font and used an offset
to access them (in practice we have to handle the digits too). Of course that
would have meant extending the \TEX\ math machinery with some offset or
alternative to some extensive mathcode juggling but that also has some overhead.
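
The offset idea can be sketched with the \UNICODE\ math alphabets, where each
alphabet is a consecutive range of slots. The macro \type {\alphabetslot} below
is a hypothetical helper, not something \CONTEXT\ provides. It assumes an engine
with \type {\numexpr} (such as \PDFTEX\ or \LUATEX) and it ignores the holes in
some of the \UNICODE\ alphabets, so it is an illustration rather than a complete
mapping.

\starttyping
% #1 = code point of the lowercase "a" of an alphabet
% #2 = a lowercase letter in the range a..z
\def\alphabetslot#1#2{\number\numexpr#1+`#2-`a\relax}

% The results are reported as decimal numbers in the log.
\message{bold b: \alphabetslot{"1D41A}{b}}      % U+1D41A is bold a
\message{monospace d: \alphabetslot{"1D68A}{d}} % U+1D68A is monospace a
\stoptyping

One starting point per alphabet plus the distance from \type {a} is all that is
needed here, instead of one font and one family per alphabet.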

If you look at the plain \TEX\ definitions for the family related matters, you
can learn a few things. First of all, there are the regular four families
defined:

\starttyping
\textfont0=\tenrm \scriptfont0=\sevenrm \scriptscriptfont0=\fiverm
\textfont1=\teni  \scriptfont1=\seveni  \scriptscriptfont1=\fivei
\textfont2=\tensy \scriptfont2=\sevensy \scriptscriptfont2=\fivesy
\textfont3=\tenex \scriptfont3=\tenex   \scriptscriptfont3=\tenex
\stoptyping

Each family has three members. There are some related definitions
as well:

\starttyping
\def\rm      {\fam0\tenrm}
\def\mit     {\fam1}
\def\oldstyle{\fam1\teni}
\def\cal     {\fam2}
\stoptyping

So, with \type {\rm} you not only switch to a family (in math mode) but you also
enable a font. The same is true for \type {\oldstyle} and this actually brings us
to another interesting side effect. The fact that oldstyle numerals come from a
math font has implications for the way this rendering is supported in macro
packages. As naturally all development started when \TEX\ came around, package
design decisions were driven by the basic fact that there was only one math font
available. And, as a consequence most users used the Computer Modern fonts and
therefore there was never a real problem in getting those oldstyle characters in
your document.

However, oldstyle figures are a property of a font design (like table digits) and
as such not specially related to math. And, why should one tag each number then?
Of course it's good practice to tag extensively (and tagging makes switching
fonts easy) but to tag each number is somewhat over the top. When more fonts
(usable in \TEX) became available it became more natural to use a proper oldstyle
font for text, and \type {\oldstyle} definitively ended up as a math
command. This was not always easy to understand for users who primarily used
\TEX\ for anything but math.

Another interesting aspect is that with \OPENTYPE\ fonts oldstyle figures are
again an optional feature, but now at a different level. There are a few more
such traditional issues: bullets often come from a math font as well (which works
out ok as they have nice, not so tiny bullets). But the same is true for
triangles, squares, small circles and other symbols. And, to make things worse,
some come from the regular \TEX\ math fonts, and others from additional ones,
like the \AMS\ symbols. Again, \OPENTYPE\ and \UNICODE\ will change this as now
these symbols are quite likely to be found in fonts as they have a larger
repertoire of shapes.

From the perspective of going from \MKII\ to \MKIV\ it boils down to changing old
mechanisms that need to handle all this (dependent on the availability of fonts)
to cleaner setups. Of course, as fonts are never completely consistent, or
complete for that matter, and features can be implemented incorrectly or
incompletely we still end up with issues, but (at least in \CONTEXT) dealing with
that has been moved to runtime manipulation of the fonts themselves (as part of
the so-called font goodies).

Back to the plain definitions, we now arrive at some new families:

\starttyping
\newfam\itfam \def\it{\fam\itfam\tenit}
\newfam\slfam \def\sl{\fam\slfam\tensl}
\newfam\bffam \def\bf{\fam\bffam\tenbf}
\newfam\ttfam \def\tt{\fam\ttfam\tentt}
\stoptyping

The plain \TEX\ format was never meant as a generic solution but instead was an
example of a macro set and serves as a basis for styles used by Don Knuth for his
books. Nevertheless, in spite of the fact that \TEX\ was made to be extended,
pretty soon it became frozen and the macros and font definitions that came with
it became the benchmark. This might be the reason why \UNICODE\ now has a
monospaced alphabet. Once you've added monospaced you might as well add more
alphabets as for sure in some countries they have their own preferences.
\footnote {At the Dante 2011 meeting we had interesting discussions during dinner
about the advantages of using Sütterlinschrift for vector algebra and the
possibilities for providing it in the upcoming \TEX\ Gyre math fonts.}

As with \type {\rm}, the related commands are meant to be used in text as well.
More interesting is to see what follows now:

\starttyping
\textfont        \itfam=\tenit
\textfont        \slfam=\tensl

\textfont        \bffam=\tenbf
\scriptfont      \bffam=\sevenbf
\scriptscriptfont\bffam=\fivebf

\textfont        \ttfam=\tentt
\stoptyping

Only the bold definition has all members. This means that (regular) italic,
slanted, and monospaced are not actually that much math at all. You will probably
only see them in text inside a math formula. From this you can deduce that
contrary to what I said before, these variants were not really meant for
alphabets, but for text in which case we need complete fonts. So why do I still
conclude that we don't need all these families? In practice text inside math is
not always done this way but with a special set of text commands. This is a
consequence of the fact that when we add text, we want to be able to do so in
each language with even language|-|specific properties supported. And, although a
family switch like the above might do well for English, as soon as you want
Polish (extended Latin), Cyrillic or Greek you definitely need more than a family
switch, if only because encodings come into play. In that respect it is
interesting that we do have a family for monospaced, but that \type {\Im} and
\type {\Re} have symbolic names, although a more extensive setup can have a
blackboard family switch.
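
What the missing members mean in practice can be sketched in plain \TEX. This is
an illustration only: \type {\sevenit} is a name made up here (plain \TEX\
preloads \type {cmti7} but does not give it a name), and reusing it for the
scriptscript size is just a shortcut.

\starttyping
% Without the next three lines the italic family only has a text font,
% so using it in a superscript makes TeX complain that the \scriptfont
% of that family is undefined.
\font\sevenit=cmti7
\scriptfont      \itfam=\sevenit
\scriptscriptfont\itfam=\sevenit

$ x^{\it a} $
\bye
\stoptyping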

By the way, the fact that \TEX\ came with italic alongside slanted also has some
implications. Normally a font design has either italic or something slanted (then
called oblique). But, Computer Modern came with both, which is no surprise as
there is a metadesign behind it. And therefore macro packages provide ways to
deal with those variants alongside. I wonder what would have happened if this had
not been the case. Nowadays there is always this regular, italic (or oblique),
bold and bold italic set to deal with, and the whole set can become lighter or
bolder.

In \CONTEXT\ \MKII, however, the set is larger as we also have slanted and bold
slanted and even smallcaps, so most definition sets have 7~definitions instead
of~4. By the way, smallcaps is also special. If Computer Modern had had smallcaps
for all variants, support for them in \CONTEXT\ undoubtedly would have been kept
out of the mentioned~7 but always been a new typeface definition (i.e.\ another
fontclass for insiders). So, when something would have to be smallcaps, one would
simply switch the whole lot to smallcaps (bold smallcaps, etc.). Of course this
is what normally happens, at least in my setups, but nevertheless one can still
find traces of this original Computer Modern|-|driven approach. And while we are
at it: the whole font system still has the ability to use design sizes and combine
different ones in sets, if only because in Computer Modern you don't have all
sizes. The above definitions use ten, seven and five, but for instance for an
eleven point set up you need to creatively choose the proper originals and scale
them to the right family size. Nowadays only a few fonts ship with multiple
design sizes, and although some of that can be compensated for with clever
hinting, it is a pity that we can apply this mechanism only to the traditional
\TEX\ fonts.
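
For the roman family such an eleven point setup could look as follows in plain
\TEX. This is a sketch under assumptions: the names \type {\elevenrm}, \type
{\eightrm} and \type {\sixrm} and the chosen design sizes are picked for this
example only.

\starttyping
% Computer Modern has no cmr11, so the 10pt design is scaled up;
% the script sizes use the nearest available design sizes as they are.
\font\elevenrm=cmr10 at 11pt
\font\eightrm =cmr8
\font\sixrm   =cmr6

\textfont0=\elevenrm \scriptfont0=\eightrm \scriptscriptfont0=\sixrm
\stoptyping

The same choice has to be made for every family in the set, which is where the
creativity comes in.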

Concerning the slanting we can remark that \TEX ies are so fond of this that they
even extended the \TEX\ engines to support slanting in the core machinery (or
more precisely in the backend while the frontend then uses adapted metrics). So,
slanting is available for all fonts.

This brings me to another complication in writing a math font subsystem: bold.
During the development of \CONTEXT\ \MKII\ I was puzzled by the fact that user
demands with respect to bold were so inconsistent. This is again related to the
way a somewhat simple setup looks: explicitly switching to bold characters or
symbols using a \type {\bf} (alike) switch. This works quite well in most cases,
but what if you use math in a section title? Then the whole lot should be in bold
and an embedded bold symbol should be heavy (i.e.\ more bold than bold). As a
consequence (and due to limited availability of complete bold math fonts) in
\MKII\ there are several bold strategies implemented.

However, in a \UNICODE\ universe things become surprisingly easy as \UNICODE\
defines those symbols that have bold companions (whatever you want to call them,
mostly math alphanumerics) so a proper math font has them already. This limited
subset is often available in a font collection and font designers can stick to
that subset. So, eventually we get one regular font (with some bold glyphs
according to the \UNICODE\ specification) and a bold companion that has heavy
variants for those regular bold shapes.

The simple fact that \UNICODE\ distinguishes regular and bold simplifies an
implementation as it's easier to take that as a starting point than users who for
all their goodwill see only their small domain of boldness.

It might sound like \UNICODE\ solves all our problems but this is not entirely
true. For instance, the \UNICODE\ principle that no character should be there
more than once has resulted in holes in the \UNICODE\ alphabets, especially
Greek, blackboard, fraktur and script. As exceptions were made for non|-|math I
see no reason why the few math characters that now put holes in an alphabet could
not have been there. As with many standards, following some principles too
strictly eventually results in all applications that follow the standard having
to implement the same ugly exceptions explicitly. As some standards aim for
longevity I wonder how many programming hours will be wasted this way.

This brings me to the conclusion that in practice 16 families are more than
enough in a \UNICODE|-|aware \TEX\ engine especially when you consider that for a
specific document one can define a nice set of families, just as in plain \TEX.
It's simply the fact that we want to make a macro package that does it all and
therefore has to fit all possible math demands into one mechanism that
complicates life. And the fact that \UNICODE\ clearly demonstrates that we're
only talking about alphabets has brought (at least) \CONTEXT\ back to its basics:
a relatively simple, few|-|family approach combined with a dedicated alphabet
selection system. Of course eventually users may come up with new demands and we
might again end up with a mess. After all, it's the fact that \TEX\ gives us
control that makes it so much fun.

\stopchapter

\stopcomponent