summaryrefslogtreecommitdiff
path: root/doc/context/sources/general/manuals/hybrid/hybrid-math.tex
diff options
context:
space:
mode:
Diffstat (limited to 'doc/context/sources/general/manuals/hybrid/hybrid-math.tex')
-rw-r--r--doc/context/sources/general/manuals/hybrid/hybrid-math.tex347
1 files changed, 347 insertions, 0 deletions
diff --git a/doc/context/sources/general/manuals/hybrid/hybrid-math.tex b/doc/context/sources/general/manuals/hybrid/hybrid-math.tex
new file mode 100644
index 000000000..de10a1b9c
--- /dev/null
+++ b/doc/context/sources/general/manuals/hybrid/hybrid-math.tex
@@ -0,0 +1,347 @@
+% language=uk
+
+\startcomponent hybrid-math
+
+\environment hybrid-environment
+
+\startchapter[title={Handling math: A retrospective}]
+
+{This is \TUGBOAT\ article .. reference needed.}
+
+% In this article I will reflect on how the plain \TEX\ approach to math
+% fonts influenced the way math has been dealt with in \CONTEXT\ \MKII\
+% and why (and how) we divert from it in its follow up \MKIV, now that
+% \LUATEX\ and \OPENTYPE\ math have come around.
+
+When you start using \TEX, you cannot help but notice that math plays an
+important role in this system. As soon as you dive into the code you will see
+that there is a concept of families that is closely related to math typesetting.
+A family is a set of three sizes: text, script and scriptscript.
+
+\startformula
+a^{b^{c}} = \frac{d}{e}
+\stopformula
+
+The smaller sizes are used in superscripts and subscripts and in more complex
+formulas where information is put on top of each other.
+
+It is no secret that the latest math font technology is not driven by the \TEX\
+community but by Microsoft. They have taken a good look at \TEX\ and extended the
+\OPENTYPE\ font model with the information that is needed to do things similar to
+\TEX\ and beyond. It is a firm proof of \TEX's abilities that after some 30 years
+it is still seen as the benchmark for math typesetting. One can only speculate
+what Don Knuth would have come up with if today's desktop hardware and printing
+technology had been available in those days.
+
+As a reference implementation of a font Microsoft provides Cambria Math. In the
+specification the three sizes are there too: a font can provide specifically
+designed script and scriptscript variants for text glyphs where that is relevant.
+Control is exercised with the \type {ssty} feature.
+
+Another inheritance from \TEX\ and its fonts is the fact that larger symbols can
+be made out of snippets and these snippets are available as glyphs in the font,
+so no special additional (extension) fonts are needed to get for instance really
+large parentheses. The information of when to move up one step in size (given
+that there is a larger shape available) or when and how to construct larger
+symbols out of snippets is there as well. Placement of accents is made easy by
+information in the font and there are a whole lot of parameters that control the
+typesetting process. Of course you still need machinery comparable to \TEX's math
+subsystem but Microsoft Word has such capabilities.
+
+I'm not going to discuss the nasty details of providing math support in \TEX, but
+rather pay some attention to an (at least for me) interesting side effect of
+\TEX's math machinery. There are excellent articles by Bogus\l{}aw Jackowski and
+Ulrik Vieth about how \TEX\ constructs math and of course Knuth's publications
+are the ultimate source of information as well.
+
+Even if you only glance at the implementation of traditional \TEX\ font support,
+the previously mentioned families are quite evident. You can have 16 of them but
+4 already have a special role: the upright roman font, math italic, math symbol
+and math extension. These give us access to some 1000 glyphs in theory, but when
+\TEX\ showed up it was mostly a 7-bit engine and input of text was often also
+7-bit based, so in practice many fewer shapes are available, and subtracting the
+snippets that make up the large symbols brings down the number again.
+
+Now, say that in a formula you want to have a bold character. This character is
+definitely not in the 4 mentioned families. Instead you enable another one, one
+that is linked to a bold font. And, of course there is also a family for bold
+italic, slanted, bold slanted, monospaced, maybe smallcaps, sans serif, etc. To
+complicate things even more, there are quite a few symbols that are not covered
+in the foursome so we need another 2 or 3 families just for those. And yes, bold
+math symbols will demand even more families.
+
+\startformula
+a + \bf b + \bi c = \tt d + \ss e + \cal f
+\stopformula
+
+Try to imagine what this means for implementing a font system. When (in for
+instance \CONTEXT) you choose a specific body font at a certain size, you not
+only switch the regular text fonts, you also initialize math. When dealing with
+text and a font switch there, it is no big deal to delay font loading and
+initialization till you really need the font. But for math it is different. In
+order to set up the math subsystem, the families need to be known and set up and
+as each one can have three members you can imagine that you easily initialize
+some 30 to 40 fonts. And, when you use several math setups in a document,
+switching between them involves at least some re-initialization of those
+families.
+
+When Taco Hoekwater and I were discussing \LUATEX\ and especially what was needed
+for math, it was sort of natural to extend the number of families to 256. After
+all, years of traditional usage had demonstrated that it was pretty hard to come
+up with math font support where you could freely mix a whole regular and a whole
+bold set of characters simply because you ran out of families. This is a side
+effect of math processing happening in several passes: you can change a family
+definition within a formula, but as \TEX\ remembers only the family number, a
+later definition overloads a previous one. The previous example in a traditional
+\TEX\ approach can result in:
+
+\starttyping
+a + \fam7 b + \fam8 c = \fam9 d + \fam10 e + \fam11 f
+\stoptyping
+
+Here the \type{a} comes from the family that reflects math italic (most likely
+family~1) and \type {+} and \type {=} can come from whatever family is told to
+provide them (this is driven by their math code properties). As family numbers
+are stored in the identification pass, and in the typesetting pass resolve to
+real fonts you can imagine that overloading a family in the middle of a
+definition is not an option: it's the number that gets stored and not what it is
+bound to. As it is unlikely that we actually use more than 16 families we could
+have come up with a pool approach where families are initialized on demand but
+that does not work too well with grouping (or at least it complicates matters).
+
+So, when I started thinking of rewriting the math font support for \CONTEXT\
+\MKIV, I still had this nicely increased upper limit in mind, if only because I
+was still thinking of support for the traditional \TEX\ fonts. However, I soon
+realized that it made no sense at all to stick to that approach: \OPENTYPE\ math
+was on its way and in the meantime we had started the math font project. But
+given that this would easily take some five years to finish, an intermediate
+solution was needed. As we can make virtual fonts in \LUATEX, I decided to go
+that route and for several years already it has worked quite well. For the moment
+the traditional \TEX\ math fonts (Computer Modern, px, tx, Lucida, etc) are
+virtualized into a pseudo|-|\OPENTYPE\ font that follows the \UNICODE\ math
+standard. So instead of needing more families, in \CONTEXT\ we could do with
+less. In fact, we can do with only two: one for regular and one for bold,
+although, thinking of it, there is nothing that prevents us from mixing different
+font designs (or preferences) in one formula but even then a mere four families
+would still be fine.
+
+To summarize this, in \CONTEXT\ \MKIV\ the previous example now becomes:
+
+\starttyping
+U+1D44E + U+1D41B + 0x1D484 = U+1D68D + U+1D5BE + U+1D4BB
+\stoptyping
+
+For a long time I have been puzzled by the fact that one needs so many fonts for
+a traditional setup. It was only after implementing the \CONTEXT\ \MKIV\ math
+subsystem that I realized that all of this was only needed in order to support
+alphabets, i.e.\ just a small subset of a font. In \UNICODE\ we have quite a few
+math alphabets and in \CONTEXT\ we have ways to map a regular keyed-in (say)
+\quote{a} onto a bold or monospaced one. When writing that code I hadn't even
+linked the \UNICODE\ math alphabets to the family approach for traditional \TEX.
+Not being a mathematician myself I had no real concept of systematic usage of
+alternative alphabets (apart from the occasional different shape for an
+occasional physics entity).
+
+Just to give an idea of what \UNICODE\ defines: there are alphabets in regular
+(upright), bold, italic, bold italic, script, bold script, fraktur, bold fraktur,
+double|-|struck, sans|-|serif, sans|-|serif bold, sans|-|serif italic,
+sans|-|serif bold italic and monospace. These are regular alphabets with upper-
+and lowercase characters complemented by digits and occasionally Greek.
+
+It was a few years later (somewhere near the end of 2010) that I realized that a
+lot of the complications in (and load on) a traditional font system were simply
+due to the fact that in order to get one bold character, a whole font had to be
+loaded in order for families to express themselves. And that in order to have
+several fonts being rendered, one needed lots of initialization for just a few
+cases. Instead of wasting one font and family for an alphabet, one could as well
+have combined 9 (upper and lowercase) alphabets into one font and use an offset
+to access them (in practice we have to handle the digits too). Of course that
+would have meant extending the \TEX\ math machinery with some offset or
+alternative to some extensive mathcode juggling but that also has some overhead.
+
+If you look at the plain \TEX\ definitions for the family related matters, you
+can learn a few things. First of all, there are the regular four families
+defined:
+
+\starttyping
+\textfont0=\tenrm \scriptfont0=\sevenrm \scriptscriptfont0=\fiverm
+\textfont1=\teni \scriptfont1=\seveni \scriptscriptfont1=\fivei
+\textfont2=\tensy \scriptfont2=\sevensy \scriptscriptfont2=\fivesy
+\textfont3=\tenex \scriptfont3=\tenex \scriptscriptfont3=\tenex
+\stoptyping
+
+Each family has three members. There are some related definitions
+as well:
+
+\starttyping
+\def\rm {\fam0\tenrm}
+\def\mit {\fam1}
+\def\oldstyle{\fam1\teni}
+\def\cal {\fam2}
+\stoptyping
+
+So, with \type {\rm} you not only switch to a family (in math mode) but you also
+enable a font. The same is true for \type {\oldstyle} and this actually brings us
+to another interesting side effect. The fact that oldstyle numerals come from a
+math font has implications for the way this rendering is supported in macro
+packages. As naturally all development started when \TEX\ came around, package
+design decisions were driven by the basic fact that there was only one math font
+available. And, as a consequence most users used the Computer Modern fonts and
+therefore there was never a real problem in getting those oldstyle characters in
+your document.
+
+However, oldstyle figures are a property of a font design (like table digits) and
+as such not specially related to math. And, why should one tag each number then?
+Of course it's good practice to tag extensively (and tagging makes switching
+fonts easy) but to tag each number is somewhat over the top. When more fonts
+(usable in \TEX) became available it became more natural to use a proper oldstyle
+font for text and the \type {\oldstyle} more definitely ended up as a math
+command. This was not always easy to understand for users who primarily used
+\TEX\ for anything but math.
+
+Another interesting aspect is that with \OPENTYPE\ fonts oldstyle figures are
+again an optional feature, but now at a different level. There are a few more
+such traditional issues: bullets often come from a math font as well (which works
+out ok as they have nice, not so tiny bullets). But the same is true for
+triangles, squares, small circles and other symbols. And, to make things worse,
+some come from the regular \TEX\ math fonts, and others from additional ones,
+like the \AMS\ symbols. Again, \OPENTYPE\ and \UNICODE\ will change this as now
+these symbols are quite likely to be found in fonts as they have a larger
+repertoire of shapes.
+
+From the perspective of going from \MKII\ to \MKIV\ it boils down to changing old
+mechanisms that need to handle all this (dependent on the availability of fonts)
+to cleaner setups. Of course, as fonts are never completely consistent, or
+complete for that matter, and features can be implemented incorrectly or
+incompletely we still end up with issues, but (at least in \CONTEXT) dealing with
+that has been moved to runtime manipulation of the fonts themselves (as part of
+the so-called font goodies).
+
+Back to the plain definitions, we now arrive at some new families:
+
+\starttyping
+\newfam\itfam \def\it{\fam\itfam\tenit}
+\newfam\slfam \def\sl{\fam\slfam\tensl}
+\newfam\bffam \def\bf{\fam\bffam\tenbf}
+\newfam\ttfam \def\tt{\fam\ttfam\tentt}
+\stoptyping
+
+The plain \TEX\ format was never meant as a generic solution but instead was an
+example of a macro set and serves as a basis for styles used by Don Knuth for his
+books. Nevertheless, in spite of the fact that \TEX\ was made to be extended,
+pretty soon it became frozen and the macros and font definitions that came with
+it became the benchmark. This might be the reason why \UNICODE\ now has a
+monospaced alphabet. Once you've added monospaced you might as well add more
+alphabets as for sure in some countries they have their own preferences.
+\footnote {At the Dante 2011 meeting we had interesting discussions during dinner
+about the advantages of using Sütterlinschrift for vector algebra and the
+possibilities for providing it in the upcoming \TeX\ Gyre math fonts.}
+
+As with \type {\rm}, the related commands are meant to be used in text as well.
+More interesting is to see what follows now:
+
+\starttyping
+\textfont \itfam=\tenit
+\textfont \slfam=\tensl
+
+\textfont \bffam=\tenbf
+\scriptfont \bffam=\sevenbf
+\scriptscriptfont\bffam=\fivebf
+
+\textfont \ttfam=\tentt
+\stoptyping
+
+Only the bold definition has all members. This means that (regular) italic,
+slanted, and monospaced are not actually that much math at all. You will probably
+only see them in text inside a math formula. From this you can deduce that
+contrary to what I said before, these variants were not really meant for
+alphabets, but for text in which case we need complete fonts. So why do I still
+conclude that we don't need all these families? In practice text inside math is
+not always done this way but with a special set of text commands. This is a
+consequence of the fact that when we add text, we want to be able to do so in
+each language with even language|-|specific properties supported. And, although a
+family switch like the above might do well for English, as soon as you want
+Polish (extended Latin), Cyrillic or Greek you definitely need more than a family
+switch, if only because encodings come into play. In that respect it is
+interesting that we do have a family for monospaced, but that \type {\Im} and
+\type {\Re} have symbolic names, although a more extensive setup can have a
+blackboard family switch.
+
+By the way, the fact that \TEX\ came with italic alongside slanted also has some
+implications. Normally a font design has either italic or something slanted (then
+called oblique). But, Computer Modern came with both, which is no surprise as
+there is a metadesign behind it. And therefore macro packages provide ways to
+deal with those variants alongside. I wonder what would have happened if this had
+not been the case. Nowadays there is always this regular, italic (or oblique),
+bold and bold italic set to deal with, and the whole set can become lighter or
+bolder.
+
+In \CONTEXT\ \MKII, however, the set is larger as we also have slanted and bold
+slanted and even smallcaps, so most definition sets have 7~definitions instead
+of~4. By the way, smallcaps is also special. if Computer Modern had had smallcaps
+for all variants, support for them in \CONTEXT\ undoubtedly would have been kept
+out of the mentioned~7 but always been a new typeface definition (i.e.\ another
+fontclass for insiders). So, when something would have to be smallcaps, one would
+simply switch the whole lot to smallcaps (bold smallcaps, etc.). Of course this
+is what normally happens, at least in my setups, but nevertheless one can still
+find traces of this original Computer Modern|-|driven approach. And now we are at
+it: the whole font system still has the ability to use design sizes and combine
+different ones in sets, if only because in Computer Modern you don't have all
+sizes. The above definitions use ten, seven and five, but for instance for an
+eleven point set up you need to creatively choose the proper originals and scale
+them to the right family size. Nowadays only a few fonts ship with multiple
+design sizes, and although some can be compensated with clever hinting it is a
+pity that we can apply this mechanism only to the traditional \TEX\ fonts.
+
+Concerning the slanting we can remark that \TEX ies are so fond of this that they
+even extended the \TEX\ engines to support slanting in the core machinery (or
+more precisely in the backend while the frontend then uses adapted metrics). So,
+slanting is available for all fonts.
+
+This brings me to another complication in writing a math font subsystem: bold.
+During the development of \CONTEXT\ \MKII\ I was puzzled by the fact that user
+demands with respect to bold were so inconsistent. This is again related to the
+way a somewhat simple setup looks: explicitly switching to bold characters or
+symbols using a \type {\bf} (alike) switch. This works quite well in most cases,
+but what if you use math in a section title? Then the whole lot should be in bold
+and an embedded bold symbol should be heavy (i.e.\ more bold than bold). As a
+consequence (and due to limited availability of complete bold math fonts) in
+\MKII\ there are several bold strategies implemented.
+
+However, in a \UNICODE\ universe things become surprisingly easy as \UNICODE\
+defines those symbols that have bold companions (whatever you want to call them,
+mostly math alphanumerics) so a proper math font has them already. This limited
+subset is often available in a font collection and font designers can stick to
+that subset. So, eventually we get one regular font (with some bold glyphs
+according to the \UNICODE\ specification) and a bold companion that has heavy
+variants for those regular bold shapes.
+
+The simple fact that \UNICODE\ distinguishes regular and bold simplifies an
+implementation as it's easier to take that as a starting point than users who for
+all their goodwill see only their small domain of boldness.
+
+It might sound like \UNICODE\ solves all our problems but this is not entirely
+true. For instance, the \UNICODE\ principle that no character should be there
+more than once has resulted in holes in the \UNICODE\ alphabets, especially
+Greek, blackboard, fraktur and script. As exceptions were made for non|-|math I
+see no reason why the few math characters that now put holes in an alphabet could
+not have been there. As with more standards, following some principles too
+strictly eventually results in all applications that follow the standard having
+to implement the same ugly exceptions explicitly. As some standards aim for
+longevity I wonder how many programming hours will be wasted this way.
+
+This brings me to the conclusion that in practice 16 families are more than
+enough in a \UNICODE|-|aware \TEX\ engine especially when you consider that for a
+specific document one can define a nice set of families, just as in plain \TEX.
+It's simply the fact that we want to make a macro package that does it all and
+therefore has to provide all possible math demands into one mechanism that
+complicates life. And the fact that \UNICODE\ clearly demonstrates that we're
+only talking about alphabets has brought (at least) \CONTEXT\ back to its basics:
+a relatively simple, few|-|family approach combined with a dedicated alphabet
+selection system. Of course eventually users may come up with new demands and we
+might again end up with a mess. After all, it's the fact that \TEX\ gives us
+control that makes it so much fun.
+
+\stopchapter
+
+\stopcomponent