% language=uk

\usemodule[fnt-23]
\usemodule[fnt-25]

\startcomponent mk-math

\environment mk-environment

\chapter{Unicode math}

{\em I assume that the reader is somewhat familiar with math in \TEX. Although in \CONTEXT\ we try to support the concepts and symbols used in the \TEX\ community we have our own way of implementing math. The fact that \CONTEXT\ is not used extensively for conventional math journals permits us to rigorously re|-|implement mechanisms. Of course the user interfaces mostly remain the same.}

\subject{introduction}

The \LUATEX\ project entered a new stage when, at the end of 2008 and the beginning of 2009, math got opened up. Although \TEX\ can handle math pretty well we had a few wishes that we hoped to fulfill in the process. That \TEX's math machinery is a rather independent subsystem is reflected in the fact that after parsing there is an intermediate list of so-called noads (math elements), which then gets converted into a node list (glyphs, kerns, penalties, glue and more). This conversion can be intercepted by a callback and a macro package can do whatever it likes with the list of noads as long as it returns a proper list.

Of course \CONTEXT\ does support math and that is visible in its code base:

\startitemize

\item Due to the fact that we need to be able to switch to alternative styles the font system is quite complex and in \CONTEXT\ \MKII\ math font definitions (and changes) account for 50\% of the time involved. In \MKIV\ we can use a more efficient model.

\item Because some usage of \CONTEXT\ demands the mix of several completely differently encoded math fonts there is a dedicated math encoding subsystem in \MKII. In \MKIV\ we will use \UNICODE\ exclusively.

\item Some constructs (and symbols) are implemented in a way that we find suboptimal. In the perspective of \UNICODE\ in \MKIV\ we aim at all symbols being real characters. This is possible because all important constructs (like roots, accents and delimiters) are supported by the engine.

\item In order to fit vertical spacing around math (think for instance of typesetting on a grid) in \MKII\ we have ended up with rather messy and suboptimal code. \footnote {This is because spacing before and after formulas has to cooperate with spacing of structural components that surround it.} The expectation is that we can improve that.

\stopitemize

In the following sections I will discuss a few of the implementation details of the font related issues in \MKIV. Of course a few years from now the actual solutions we implemented might look different but the principles remain the same. Also, as with other components of \LUATEX, Taco and I worked in parallel on the code and its usage, which made both our tasks easier.

\subject{transition}

In \TEX, math typesetting uses a special concept called families. Each math component (number, letter, symbol, etc.) is a member of a family. Because we have three sizes (text, script and scriptscript) this results in a family||size matrix of defined fonts. Because the number of glyphs in a font was limited to 256, in practice it meant that we needed quite a few font definitions. The minimum number of families was~4 (roman, italic, symbol, and extension) but in practice several more could be active (sans, bold, mono|-|spaced, more symbols, etc.) for specific alphabets or extra symbols (for instance \AMS\ set A and B). The total number of families in traditional \TEX\ is limited to 16, and one easily hits this maximum. In that case, some 16 times 3 fonts are defined for one size, of which in practice only a few are really used in the typesetting.

A potential source of confusion is bold math. Bold in math can either mean having some bold letters, or having the whole formula in bold. In practice this means that for a complete bold formula one has to define the whole lot using bold fonts.
A complication is that the math symbols (etc.) are kind of bound to families and so we end up with either redefining symbols, or reusing the families (which is easier and faster). In any case there is a performance issue involved due to the rather massive switch from normal to bold.

In \UNICODE\ all alphabets that make sense as well as all math symbols are part of the definition although unfortunately some alphabets have their letters spread over the \UNICODE\ vector and not in a range (like blackboard). This forces all applications that want to support math to implement similar hacks to deal with it.

In \MKIV\ we will assume that we have \UNICODE\ aware math fonts, like \OPENTYPE. The font that sets the standard is Microsoft Cambria. The upcoming (I'm writing this in January 2009) \TEX Gyre fonts will be compliant with this standard but they're not yet there and so we have a problem. The way out is to define virtual fonts and now that \LUATEX\ math is extended to cover all of \UNICODE\ and provides access to the (intermediate) math lists this has become feasible. This also permits us to test \LUATEX\ with both Cambria and Latin Modern Virtual Math.

The advantage is that we can stick to just one family for all shapes which simplifies the underlying \TEX\ code enormously. First of all we need to define far fewer fonts (which is partially compensated by loading them as part of the virtual font) and all math aspects can now be dealt with using the character data tables.

One tricky aspect of the new approach is that the Latin Modern fonts have design sizes, so we have to define several virtual fonts. On the other hand, fonts like Cambria have alternative script and scriptscript shapes which are controlled by the \type {ssty} feature, a gsub alternate that provides some alternative sizes for a couple of hundred characters that matter.

\starttabulate[|l|l|l|]
\NC text         \NC \type {lmmi12 at 12pt} \NC \type {cambria at 12pt with ssty=no} \NC \NR
\NC script       \NC \type {lmmi8 at 8pt}   \NC \type {cambria at 8pt with ssty=1}   \NC \NR
\NC scriptscript \NC \type {lmmi6 at 6pt}   \NC \type {cambria at 6pt with ssty=2}   \NC \NR
\stoptabulate

So Cambria has not so much design sizes as shapes optimized relative to the text variant. In the following example we see text in red, script in green and scriptscript in blue.

\startbuffer
\definefontfeature[math][analyze=false,script=math,language=dflt]

\definefontfeature[text]        [math][ssty=no]
\definefontfeature[script]      [math][ssty=1]
\definefontfeature[scriptscript][math][ssty=2]
\stopbuffer

\typebuffer \getbuffer

Let us first look at Cambria:

\startbuffer
\startoverlay
    {\definedfont[name:cambriamath*scriptscript at 150pt]\mkblue  X}
    {\definedfont[name:cambriamath*script at 150pt]\mkgreen X}
    {\definedfont[name:cambriamath*text at 150pt]\mkred   X}
\stopoverlay
\stopbuffer

\typebuffer \startlinecorrection \getbuffer \stoplinecorrection

When we compare them scaled down as happens in real script and scriptscript we get:

\startbuffer
\startoverlay
    {\definedfont[name:cambriamath*scriptscript at 120pt]\mkblue  X}
    {\definedfont[name:cambriamath*script at 80pt]\mkgreen X}
    {\definedfont[name:cambriamath*text at 60pt]\mkred   X}
\stopoverlay
\stopbuffer

\typebuffer \startlinecorrection \getbuffer \stoplinecorrection

Next we see (scaled) Latin Modern:

\startbuffer
\startoverlay
    {\definedfont[LMRoman8-Regular at 150pt]\mkblue  X}
    {\definedfont[LMRoman10-Regular at 150pt]\mkgreen X}
    {\definedfont[LMRoman12-Regular at 150pt]\mkred   X}
\stopoverlay
\stopbuffer

\typebuffer \startlinecorrection \getbuffer \stoplinecorrection

In practice we will see:

\startbuffer
\startoverlay
    {\definedfont[LMRoman8-Regular at 120pt]\mkblue  X}
    {\definedfont[LMRoman10-Regular at 80pt]\mkgreen X}
    {\definedfont[LMRoman12-Regular at 60pt]\mkred   X}
\stopoverlay
\stopbuffer

\typebuffer \startlinecorrection \getbuffer
\stoplinecorrection Both methods probably work out well although you need to keep in mind that the \OPENTYPE\ \type {ssty} feature is not so much a design size related feature. An \OPENTYPE\ font can have a specification for the script and scriptscript size. By default we listen to this specification instead of the one imposed by the bodyfont environment. When you turn on tracing \starttyping \enabletrackers[otf.math] \stoptyping you will get messages like: \starttyping asked scriptscript size: 458752, used: 471859.2 (102.86 %) asked script size: 589824, used: 574095.36 (97.33 %) \stoptyping The differences between the defaults and the font recommendations are not that large so by default we listen to the font specification. \usetypescript[cambria] \start \setupbodyfont[cambria] \stop \definefontfeature[math-script] [math-script] [mathsize=no] \definefontfeature[math-scriptscript][math-scriptscript][mathsize=no] \definetypeface [cambria-ns] [rm] [serif] [cambria] [default] \definetypeface [cambria-ns] [tt] [mono] [modern] [default] \definetypeface [cambria-ns] [mm] [math] [cambria] [default] \usetypescript[cambria-ns] \start \setupbodyfont[cambria-ns] \stop \startlinecorrection \scale [width=\textwidth] {\backgroundline [darkgray] {\startoverlay {\white\switchtobodyfont [cambria]$\sum_{i=0}^n$} {\mkred\switchtobodyfont[cambria-ns]$\sum_{i=0}^n$} \stopoverlay \startoverlay {\white\switchtobodyfont [cambria]$\int_{i=0}^n$} {\mkred\switchtobodyfont[cambria-ns]$\int_{i=0}^n$} \stopoverlay \startoverlay {\white\switchtobodyfont [cambria]$\log_{i=0}^n$} {\mkred\switchtobodyfont[cambria-ns]$\log_{i=0}^n$} \stopoverlay \startoverlay {\white\switchtobodyfont [cambria]$\cos_{i=0}^n$} {\mkred\switchtobodyfont[cambria-ns]$\cos_{i=0}^n$} \stopoverlay \startoverlay {\white\switchtobodyfont [cambria]$\prod_{i=0}^n$} {\mkred\switchtobodyfont[cambria-ns]$\prod_{i=0}^n$} \stopoverlay}} \stoplinecorrection \definefontfeature[math-script] [math-script] [mathsize=yes] 
\definefontfeature[math-scriptscript][math-scriptscript][mathsize=yes]

In this overlay the white text is scaled according to the specification in the font, while the red text is scaled according to the bodyfont environment (12/7/5 points).

\subject{going virtual}

The number of math fonts (used) in the \TEX\ community is relatively small and of those only Latin Modern (which builds upon Computer Modern) has design sizes. This means that the number of \UNICODE\ compliant virtual math fonts that we have to make is not that large. We could have used an already present virtual composition mechanism but instead we made a handy helper function that does a more efficient job. This means that a definition looks (a bit simplified) as follows:

\starttyping
mathematics.make_font ( "lmroman10-math", {
    { name="lmroman10-regular", features="virtualmath", main=true },
    { name="lmmi10", vector="tex-mi", skewchar=0x7F },
    { name="lmsy10", vector="tex-sy", skewchar=0x30, parameters=true },
    { name="lmex10", vector="tex-ex", extension=true },
    { name="msam10", vector="tex-ma" },
    { name="msbm10", vector="tex-mb" },
    { name="lmroman10-bold", vector="tex-bf" },
    { name="lmmib10", vector="tex-bi", skewchar=0x7F },
    { name="lmsans10-regular", vector="tex-ss", optional=true },
    { name="lmmono10-regular", vector="tex-tt", optional=true },
} )
\stoptyping

For the \TEX Gyre Pagella it looks this way:

\starttyping
mathematics.make_font ( "px-math", {
    { name="texgyrepagella-regular", features="virtualmath", main=true },
    { name="pxr", vector="tex-mr" },
    { name="pxmi", vector="tex-mi", skewchar=0x7F },
    { name="pxsy", vector="tex-sy", skewchar=0x30, parameters=true },
    { name="pxex", vector="tex-ex", extension=true },
    { name="pxsya", vector="tex-ma" },
    { name="pxsyb", vector="tex-mb" },
} )
\stoptyping

As you can see, it is possible to add alphabets, given that there is a suitable vector that maps glyph indices onto \UNICODE s.
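
To make the role of such a vector more tangible, here is a minimal sketch in plain \LUA. It is not the actual \MKIV\ code: the table \type {vector}, the helper \type {remap_slots} and the widths are made up for this illustration. The two entries assume the traditional math italic encoding, where slots \type {0x0B} and \type {0x0C} hold alpha and beta.

\starttyping
-- Hypothetical vector: traditional font slot -> Unicode code point.
local vector = {
    [0x0B] = 0x1D6FC, -- math italic small alpha
    [0x0C] = 0x1D6FD, -- math italic small beta
}

-- Copy the glyphs of a traditional font into a Unicode indexed table.
-- Slots that are not mentioned in the vector are simply skipped.
local function remap_slots(characters, vector)
    local result = { }
    for slot, unicode in pairs(vector) do
        local chr = characters[slot]
        if chr then
            result[unicode] = chr
        end
    end
    return result
end

-- A mock font with made-up widths, just to exercise the helper.
local traditional = {
    [0x0B] = { width = 6.4 },
    [0x0C] = { width = 5.7 },
}

local remapped = remap_slots(traditional, vector)

print(remapped[0x1D6FC].width)
\stoptyping

A real vector covers a whole font, but the principle is the same for every entry.
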
It is good to know that \type {mathematics.make_font} only defines the way such a font is constructed. The actual construction is delayed till the font is needed.

Such a virtual font is used in typescripts (the building blocks of typeface definitions in \CONTEXT) as follows:

\starttyping
\starttypescript [math] [palatino] [name]
    \definefontsynonym [MathRoman] [pxmath@px-math]
    \loadmapfile[original-youngryu-px.map]
\stoptypescript
\stoptyping

If you're familiar with the way fonts are defined in \CONTEXT, you will notice that we no longer need to define MathItalic, MathSymbol and additional symbol fonts. Of course users don't have to deal with these issues themselves. The \type {@} triggers the virtual font builder.

You can imagine that in \MKII\ switching to another font style or size involves initializing (or at least checking) some 30 to 40 font definitions when it comes to math (the number of used families times 3, the number of math sizes). And even if we take into account that fonts are loaded only once, this checking and enabling takes time. Keep in mind that in \CONTEXT\ we can have several math font sets active in one document, which comes at a price.

In \MKIV\ we use one family (at three sizes). Of course we need to load the font (and more than one in the case of virtual variants) but when switching bodyfont sizes we only need to enable one (already defined) math font. And that really saves time. This is one of the areas where we gain back time that we lose elsewhere by extending core functionality using \LUA\ (like \OPENTYPE\ support).

\subject{dimensions}

By setting font related dimensions you can control the way \TEX\ positions math elements relative to each other. Math fonts have a few more dimensions than regular text fonts. But \OPENTYPE\ math fonts like Cambria have quite a few more. There is a nice booklet published by Microsoft, \quote {Mathematical Typesetting}, where dealing with math is discussed in the perspective of their word processor and \TEX.
In the booklet some of the parameters are discussed and since many of them are rather special it makes no sense (yet) to elaborate on them here. \footnote {Searching for \quote {Ulrich Vieth}, \quote {TeX} and \quote {conferences} might give you some hits on articles on these matters.} Figuring out their meaning was quite a challenge.

I am the first to admit that the current code in \MKIV\ that deals with math parameters is somewhat messy. There are several reasons for this:

\startitemize[packed]
\item We can pass parameters as a \type {MathConstants} table in the \TFM\ table that we pass to the core engine.
\item We can use some named parameters, like \type {x_height}, and pass those in the \type {parameters} table.
\item We can use the traditional font dimension numbers in the \type {parameters} table, but since they overlap for symbol and extensible fonts, that is asking for trouble.
\stopitemize

Because in \MKIV\ we create virtual fonts at run|-|time and use just one family, we fill the \type {MathConstants} table for traditional fonts as well. Future versions may use the upcoming mechanisms of font parameter sets at the macro level. These can be defined for each of the sizes (display, text, script and scriptscript, and the last three in cramped form as well) but since a font only carries one set, we currently use a compromise.

\subject{tracing}

One of the nice aspects of the opened up math machinery is that it permits us to get a more detailed look at what happens. It also fits nicely in the way we always want to visualize things in \CONTEXT\ using color, although most users are probably unaware of many such features because they don't need them as I do.

\startbuffer
\enabletrackers[math.analyzing]
\ruledhbox{$a = \sqrt{b^2 + \sin{c} - {1 \over \gamma}}$}
\disabletrackers[math.analyzing]
\stopbuffer

\typebuffer \startbaselinecorrection \getbuffer \stopbaselinecorrection

This tracker option colors characters depending on their nature and the fact that they are remapped. The tracker was also handy during development of \LUATEX, especially for checking if attributes migrated right in constructed symbols.

For over a year I had been using a partial \UNICODE\ math implementation in some projects but for serious math the vectors needed to be completed. In order to help the \quote {math department} of the \CONTEXT\ development team (Aditya Mahajan, Mojca Miklavec, Taco Hoekwater and myself) we have some extra tracing options, like

\startbuffer
\showmathfontcharacters[list=0x0007B]
\stopbuffer

\typebuffer \start \blank \getbuffer \blank \stop

The simple variant with no arguments would have extended this document with many pages of such descriptions.

Another handy command (defined in module \type{fnt-25}) is the following:

\starttyping
\ShowCompleteFont{name:cambria}{9pt}{1}
\ShowCompleteFont{dummy@lmroman10-math}{10pt}{1}
\stoptyping

For Cambria, for instance, this generates between 50 and 100 pages of character tables.

\startbuffer[mathtest]
$abc \bf abc \bi abc$
$\mathscript abcdefghijklmnopqrstuvwxyz %
    1234567890 ABCDEFGHIJKLMNOPQRSTUVWXYZ$
$\mathfraktur abcdefghijklmnopqrstuvwxyz %
    1234567890 ABCDEFGHIJKLMNOPQRSTUVWXYZ$
$\mathblackboard abcdefghijklmnopqrstuvwxyz %
    1234567890 ABCDEFGHIJKLMNOPQRSTUVWXYZ$
$\mathscript abc IRZ \mathfraktur abc IRZ %
    \mathblackboard abc IRZ \ss abc IRZ 123$
\stopbuffer

If you look at the following samples you can imagine how coloring the characters and replacements helped in figuring out the alphabets. We use the following input (stored in a buffer):

\typebuffer [mathtest]

For testing Cambria we say:

\starttyping
\usetypescript[cambria]
\switchtobodyfont[cambria,11pt]
\enabletrackers[math.analyzing]
\getbuffer[mathtest] % the input shown before
\disabletrackers[math.analyzing]
\stoptyping

And we get:

\usetypescript[cambria] % global

\startlines
\switchtobodyfont[cambria,11pt]
\enabletrackers[math.analyzing]
\getbuffer[mathtest] % the input shown before
\disabletrackers[math.analyzing]
\stoplines

For the virtualized Latin Modern we say:

\starttyping
\usetypescript[modern]
\switchtobodyfont[modern,11pt]
\enabletrackers[math.analyzing]
\getbuffer[mathtest] % the input shown before
\disabletrackers[math.analyzing]
\stoptyping

This gives:

\usetypescript[modern] % global

\startlines
\switchtobodyfont[modern,11pt]
\enabletrackers[math.analyzing]
\getbuffer[mathtest]
\disabletrackers[math.analyzing]
\stoplines

These two samples demonstrate that Cambria has a rather complete repertoire of shapes, which is no surprise because it is a recent font that also serves as a showcase for \UNICODE\ and \OPENTYPE\ driven math.

Commands like \type {\mathscript} set an attribute. When we post|-|process the noad list and encounter this attribute, we remap the characters to the desired variant. Of course this happens selectively. So, a capital~A (\type {0x0041}) becomes a capital script~A (\type {0x1D49C}).
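
The remapping itself can be sketched in a few lines of plain \LUA. This is an illustration under assumptions, not the \MKIV\ implementation: the function name \type {to_script_capital} and the table \type {script_exceptions} are hypothetical and the sketch works on bare code points instead of noads. It also shows why a plain offset is not enough: eight capital script letters were already present in the Letterlike Symbols block and have no slot in the contiguous range.

\starttyping
-- Capital script letters that live outside the contiguous range
-- that starts at U+1D49C (they sit in the Letterlike Symbols block).
local script_exceptions = {
    [0x42] = 0x212C, -- B
    [0x45] = 0x2130, -- E
    [0x46] = 0x2131, -- F
    [0x48] = 0x210B, -- H
    [0x49] = 0x2110, -- I
    [0x4C] = 0x2112, -- L
    [0x4D] = 0x2133, -- M
    [0x52] = 0x211B, -- R
}

-- Map an ASCII capital onto its script variant; anything else is
-- returned untouched, which is the selective part of the remapping.
local function to_script_capital(unicode)
    if unicode < 0x41 or unicode > 0x5A then
        return unicode
    end
    return script_exceptions[unicode] or (0x1D49C + unicode - 0x41)
end

print(string.format("0x%05X", to_script_capital(0x41)))
\stoptyping

With this sketch the \type {IRZ} from the test input ends up at \type {0x2110}, \type {0x211B} and \type {0x1D4B5}.
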
Of course this solution is rather \CONTEXT\ specific and there are other ways to achieve the same goal (like using more families and switching family).

\subject{special cases}

Because we now are operating in the \UNICODE\ domain, we run into problems if we keep defining some of the math symbols in the traditional \TEX\ way. Even with the \AMS\ fonts available we still end up with some characters that are represented by combining others. Take for instance $\neq$ which is composed of two characters. Because in \MKIV\ we want to have all characters in their pure form we use a virtual replacement for them. In \MKIV\ speak it looks like this:

\starttyping
local function negate(main,unicode,basecode)
    local characters = main.characters
    local basechar = characters[basecode]
    local ht, wd = basechar.height, basechar.width
    -- the new character inherits the metrics of the base character
    characters[unicode] = {
        width    = wd,
        height   = ht,
        depth    = basechar.depth,
        italic   = basechar.italic,
        kerns    = basechar.kerns,
        commands = {
            { "slot", 1, basecode }, -- the base character
            { "push" },
            { "down", ht/5 },        -- shift down and back to the middle
            { "right", - wd/2 },
            { "slot", 1, 0x2215 },   -- the slash that negates it
            { "pop" },
        }
    }
end
\stoptyping

In case you're curious, there are indeed kerns, in this case the kerns with the Greek Delta.

Another thing we need to handle is positioning of accents on top of slanted (italic) shapes. For this \TEX\ uses a special character in its fonts (set with \type{\skewchar}). Any character can have in its kerning table a kern towards this special character. From this kern we can calculate the \type {top_accent} variable that we can pass for each character. This variable lives at the same level as \type {width}, \type {height}, \type {depth} and \type {italic} and is calculated as: $w/2 + k$, so it defines the horizontal anchor. A nice side effect is that (in the \CONTEXT\ font management subsystem) this saves us passing information associated with specific fonts such as the skew character.

A couple of concepts are unique to \TEX, like having \type {\hat} and \type {\widehat} where the wide one has sizes.
In \OPENTYPE\ and \UNICODE\ we don't have this distinction so we need special trickery to simulate this. We do so by adding extra code points in a private \UNICODE\ space which in turn results in them being defined automatically and the relevant first size variant being used for \type {\hat}. For some users this might still be too wide but at least it's better than a wrongly positioned \ASCII\ variant. In the future we might use this private space for similar cases.

Arrows, horizontal extenders and radicals also fall in the category \quote {troublesome} if only because they use special dimensions to get the desired effect. Fortunately \OPENTYPE\ math is modeled after \TEX, so in \LUATEX\ we introduce a couple of new constructs to deal with this. One such simplification at the macro level is in the definition of \type {\root}. Here we use the new \type {\Uroot} primitive. The placement related parameters are those used by traditional \TEX, but when they are available the \OPENTYPE\ parameters are applied. The simplified plain definitions are now:

\starttyping
\def\rootradical{\Uroot 0 "221A }

\def\root#1\of{\rootradical{#1}}

\def\sqrt{\rootradical{}}
\stoptyping

The successive sizes of the root will be taken from the font in the same way as traditional \TEX\ does it. In that sense \LUATEX\ is not doing anything differently, it only has more parameters to control the process. The definition of \type {\sqrt} in \CONTEXT\ permits an optional first argument that sets the degree.

\startbuffer
\showmathfontcharacters[list=0x221A]
\stopbuffer

\start \blank \getbuffer \blank \stop

Note that we've collected all characters in family~0 (simply because that is what \TEX\ defaults characters to) and that we use the formal \UNICODE\ slots. When we use the Latin Modern fonts we just remap traditional slots to the right ones.

Another neat trick is used when users choose among the bigger variants of some characters.
The traditional approach is to create a box of a certain size and create a fake delimited variant which is then used.

\starttyping
\definemathcommand [big]  {\choosemathbig\plusone  }
\definemathcommand [Big]  {\choosemathbig\plustwo  }
\definemathcommand [bigg] {\choosemathbig\plusthree}
\definemathcommand [Bigg] {\choosemathbig\plusfour }
\stoptyping

Of course this can become a primitive operation and we might decide to add such a primitive later on so we won't bother you with more details.

Attributes are also used to make life easier for authors who have to enter lots of pairs. Compare:

\startbuffer
\setupmathematics[autopunctuation=no]

$ (a,b) = (1.20,3.40) $
\stopbuffer

\typebuffer \begingroup \getbuffer \endgroup

with:

\startbuffer
\setupmathematics[autopunctuation=yes]

$ (a,b) = (1.20,3.40) $
\stopbuffer

\typebuffer \begingroup \getbuffer \endgroup

So we don't need to use this any more:

\starttyping
$ (a{,}b) = (1{.}20{,}3{.}40) $
\stoptyping

Features like this are implemented on top of an experimental math manipulation framework that is part of \MKIV. When the math font system is stable we will rework the rest of math support and implement additional manipulating frameworks.

\subject{control}

As with all other character related issues, in \MKIV\ everything is driven by a character table (consider it a database). Quite some effort went into getting that one right and although by now math is represented well, more data will be added in due time.

In \MKIV\ we no longer have huge lists of \TEX\ definitions for math related symbols. Everything is initialized using the mentioned table: normal symbols, delimiters, radicals, whether or not with name.
Take for instance the square root:

\start \blank \showmathfontcharacters[list=0x221A] \blank \stop

Its entry is:

\starttyping
[0x221A] = {
    adobename   = "radical",
    category    = "sm",
    cjkwd       = "a",
    description = "SQUARE ROOT",
    direction   = "on",
    linebreak   = "ai",
    mathclass   = "radical",
    mathname    = "surd",
    unicodeslot = 0x221A,
}
\stoptyping

The fraction symbol also comes in sizes. This symbol is not to be confused with the negation symbol \type {0x2215} (which in \TEX\ is known as \type {\not}).

\start \blank \showmathfontcharacters[list=0x2044] \blank \stop

\starttyping
[0x2044] = {
    adobename   = "fraction",
    category    = "sm",
    contextname = "textfraction",
    description = "FRACTION SLASH",
    direction   = "cs",
    linebreak   = "is",
    mathspec    = {
        { class = "binary", name = "slash" },
        { class = "close", name = "solidus" },
    },
    unicodeslot = 0x2044,
}
\stoptyping

However, since most users don't have this symbol visualized in their word processor, they expect the same behaviour from the regular slash. This is why we find a reference to the real symbol in its definition.

\start \blank \showmathfontcharacters[list=0x002F] \blank \stop

The definition is:

\starttyping
[0x002F] = {
    adobename   = "slash",
    category    = "po",
    cjkwd       = "na",
    contextname = "textslash",
    description = "SOLIDUS",
    direction   = "cs",
    linebreak   = "sy",
    mathsymbol  = 0x2044,
    unicodeslot = 0x002F,
}
\stoptyping

One problem left is that currently we have only one class per character (apart from the delimiter and radical usage which have their own definitions). Future releases of \CONTEXT\ will provide support for math dictionaries (as in \OPENMATH\ and \MATHML~3). At that point we will also have a \type {mathdict} entry.

There is another issue with character mappings, one that will seldom reveal itself to the user, but might confuse macro writers when they see an error message.
In traditional \TEX, and therefore also in the Latin Modern fonts, a chain from small to large character goes in two steps: the normal size is taken from one family and the larger variants from another. The larger variant then has a pointer to an even larger one and so on, until there is no larger variant or an extensible recipe is found. The default family is number~0. It is for this reason that some of the definition primitives expect a small and large family part. However, in order to support \OPENTYPE\ in \LUATEX\ the alternative method no longer assumes this split. After all, we no longer have a situation where the 256 limit forces us to take the smaller variant from one font and the larger sequence from another (so we need two family||slot pairs where each family eventually resolves to a font). It is for that reason that the new \type {\U...} primitives expect only one family specification: the small symbol, which then has a pointer to a larger variant when applicable. However deep down in the engine, there is still support for the multiple family solution (after all, we don't want to drop compatibility). As a result, in error messages you can still find references (defaulting to~0) to large specifications, even if you don't use them. In that case you can simply ignore the large symbol (0,0), since it is not used when the small symbol provides a link. \subject{extensibles} In \TEX\ fences can be told to become larger automatically. In traditional \TEX\ a character can have a linked list of next larger shapes ending in a description of how to compose even larger variants. 
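
The traversal of such a linked list can be sketched as follows in plain \LUA. The field names \type {next} and \type {extensible} follow the font table that \LUATEX\ accepts, but the function \type {pick_variant}, the dimensions and the private slots are assumptions made for this illustration, and the real engine takes more parameters into account.

\starttyping
-- Mock character table with made-up dimensions: a parenthesis,
-- one larger variant and a final entry that carries a recipe.
local characters = {
    [0x28]    = { height =  7.5, depth = 2.5, next = 0xF0001 },
    [0xF0001] = { height = 11.5, depth = 6.5, next = 0xF0002 },
    [0xF0002] = { height = 14.5, depth = 9.5,
                  extensible = { top = 0xF0003, bot = 0xF0004, rep = 0xF0005 } },
}

-- Follow the chain until a shape is tall enough or a recipe is found.
-- Returns the chosen slot and whether it has to be composed.
local function pick_variant(characters, unicode, wanted)
    local current, best = unicode, unicode
    while current do
        local chr = characters[current]
        if chr.extensible then
            return current, true  -- compose from the snippets
        end
        best = current
        if chr.height + chr.depth >= wanted then
            return current, false -- a ready made shape suffices
        end
        current = chr.next
    end
    return best, false -- chain exhausted: take the largest shape
end

print(pick_variant(characters, 0x28, 15))
\stoptyping

Asking for a total height of 15 units selects the first larger variant, while asking for 30 units runs to the end of the chain and signals that the shape has to be composed.
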
A parenthesis in Cambria has the following list:

\start
    \switchtobodyfont[cambria,10pt]
    \showmathfontcharacters[list=0x00028]
\stop

In Latin Modern we have:

\start
    \switchtobodyfont[modern,10pt]
    \showmathfontcharacters[list=0x00028]
\stop

Of course \LUATEX\ is downward compatible with respect to this feature, but the internal representation is now closer to what \OPENTYPE\ math provides (which is not that far from how \TEX\ works simply because it's inspired by \TEX). Because Cambria has different parameters we get slightly different results. In the following list of pairs, you see Cambria on the left and Latin Modern on the right. Both start with stepwise larger shapes, followed by a more gradual growth. The thresholds for a next step are driven by parameters set in the \OPENTYPE\ font or by \TEX's default.

\start
    \lineskip1ex
    \dostepwiserecurse{5}{140}{5} {
        \dontleavehmode
        \ruledhbox \bgroup
            \setbox0=\vbox{\vss\hbox{\switchtobodyfont[cambria,10pt]$\left\{
                \vcenter{\hbox{\darkgray\vrule height \recurselevel pt width 5pt}}
            \right\}$}\vss}%
            \setbox2=\vbox{\vss\hbox{\switchtobodyfont[modern, 10pt]$\left\{
                \vcenter{\hbox{\darkgray\vrule height \recurselevel pt width 5pt}}
            \right\}$}\vss}%
            \ifdim\ht0>\ht2
                \setbox2\vbox to \htdp0{\vss\box2\vss}%
            \else
                \setbox0\vbox to \htdp2{\vss\box0\vss}%
            \fi
            \box0\box2
        \egroup
        \quad
    }
    \par
\stop

In traditional \TEX\ horizontal extensibles are not really present. Accents are chosen from a linked list of variants and don't have an extensible specification. This is because most such accents grow in two dimensions and the only extensible like accents are rules and braces. However, in \UNICODE\ we have a few more and also because of symmetry we decided to add horizontal extensibles too.
Take:

\startbuffer
$ \overbrace {a+1} \underbrace {b+2} \doublebrace {c+3} $ \par
$ \overparent{a+1} \underparent{b+2} \doubleparent{c+3} $ \par
\stopbuffer

\typebuffer

This gives:

\getbuffer

Contrary to Cambria, Latin Modern Math, which is just like Computer Modern Math, has no ready overbrace glyphs. Keep in mind that we're dealing with fonts that have only 256 slots and that the traditional font mechanism has the same limitation. For this reason, the (extensible) braces are traditionally made from snippets as is demonstrated below.

\startbuffer
\hbox\bgroup
    \ruledhbox{\getglyph{lmex10}{\char"7A}}
    \ruledhbox{\getglyph{lmex10}{\char"7B}}
    \ruledhbox{\getglyph{lmex10}{\char"7C}}
    \ruledhbox{\getglyph{lmex10}{\char"7D}}
    \ruledhbox{\getglyph{lmex10}{\char"7A\char"7D\char"7C\char"7B}}
    \ruledhbox{\getglyph{name:cambriamath}{\char"23DE}}
    \ruledhbox{\getglyph{lmex10}{\char"7C\char"7B\char"7A\char"7D}}
    \ruledhbox{\getglyph{name:cambriamath}{\char"23DF}}
\egroup
\stopbuffer

\typebuffer

This gives:

\startlinecorrection \getbuffer \stoplinecorrection

The four snippets have the height and depth of the rule that will connect them. Since we want a single interface for all fonts we will no longer use macro based solutions. First of all fonts like Cambria don't have the snippets, and using active character trickery (so that we can adapt the meaning to the font) is not preferred either. This leaves virtual glyphs.

It took us a bit of experimenting to get the right virtual definition because it is a multi||step process:

\startitemize[packed]
\item The right \UNICODE\ character (\type {0x23DE}) points to a character that has no glyph itself but only horizontal extensibles.
\item The snippets that make up the extensible don't have the right dimensions (as they define the size of the connecting rule), so we need to make them virtual themselves and give them a size that matches \LUATEX's expectations.
\item Each virtual snippet contains a reference to the physical snippet and moves it up or down as well as fixes its size.
\item The second and fifth snippet are actually not real glyphs but rules. Their dimensions are derived from the snippets and they are shifted up or down too.
\stopitemize

You might wonder if this is worth the trouble. Well, it is if you take into account that all upcoming math fonts will be organized like Cambria.

\subject{math kerning}

While reading Microsoft's orange booklet, it became clear that \OPENTYPE\ provides advanced kerning possibilities and we decided to put it on the agenda for \LUATEX.

It is possible to define a ladder||like boundary for each corner of a character where the ladder more or less follows the shape of a character. In theory this means that when we attach a superscript to a base character we can use two such ladders to determine the optimal spacing between them.

Let's have a look at a few characters, the upright~f and its italic cousin.

\startcombination[2*1]
    {\ShowGlyphShape{name:cambria-math}{40bp}{0x66}}    {U+00066}
    {\ShowGlyphShape{name:cambria-math}{40bp}{0x1D453}} {U+1D453}
\stopcombination

The ladders on the right can be used to position a super or subscript, that is, they are positioned in the normal way but the ladder, as well as the boundingbox and/or left ladders of the scripts can be used to fine tune the positioning.

Should we use this information? I made this visualizer for checking the anchoring and cursive features of some Arabic fonts and then it made sense to add some of the information related to math as well. \footnote {Taco extended the visualizer for his presentation at Bachotek 2009 so you might run into variants.} The orange booklet shows quite advanced ladders, and when looking at the 3500 shapes in Cambria, it quickly becomes clear that in practice there is not that much detail in the specification. Nevertheless, because without this feature the result is not acceptable, \LUATEX\ gracefully supports it.
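
How a single ladder can be consulted is sketched below in plain \LUA. The table layout and the function name \type {kern_at} are assumptions for this illustration and the numbers are made up; the sketch only shows the staircase lookup for one corner, while a real implementation has to combine the ladder of the base character with that of the script.

\starttyping
-- Hypothetical ladder for the top right corner of a character:
-- below each height the given kern applies, and the last step
-- (without a height) covers everything above.
local top_right = {
    { height = 300, kern = -40 },
    { height = 600, kern = -10 },
    { kern = 30 },
}

-- Return the kern that belongs to the step in which height falls.
local function kern_at(ladder, height)
    for i = 1, #ladder do
        local step = ladder[i]
        if not step.height or height < step.height then
            return step.kern
        end
    end
    return 0 -- an empty ladder means no correction
end

print(kern_at(top_right, 450))
\stoptyping

A script attached at a height of 450 units thus falls on the second step of this made-up ladder.
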
\usetypescript[cambria-y]

\startbuffer
$V^a_a V^a V_a V^1_2 V^1 V_2 f^a f_a f^a_a$\par
$V^f_f V^f V_f V^1_2 V^1 V_2 f^f f_f f^f_f$\par
$T^a_a T^a T_a T^1_2 T^1 T_2 f^a f_f f^a_f$\par
$T^f_f T^f T_f T^1_2 T^1 T_2 f^f f_a f^f_a$\par
\stopbuffer

\startlinecorrection
\startcombination[3*1]
    {\framed[align=normal]{\switchtobodyfont[modern]\getbuffer}}    {latin modern}
    {\framed[align=normal]{\switchtobodyfont[cambria-y]\getbuffer}} {cambria without kerning}
    {\framed[align=normal]{\switchtobodyfont[cambria]\getbuffer}}   {cambria with kerning}
\stopcombination
\stoplinecorrection

% \ShowGlyphShape{name:cambria-math} {40bp}{0x1D43F}
% \ShowGlyphShape{name:cambria-math}{100bp}{0x1D444}
% \ShowGlyphShape{name:cambria-math}{100bp}{0x1D447}
% \ShowGlyphShape{name:cambria-math}{100bp}{0x2112}
% \ShowGlyphShape{name:cambria-math}{100bp}{0x1D432}
% \ShowGlyphShape{name:cambria-math}{100bp}{0x1D43D}
% \ShowGlyphShape{name:cambria-math}{100bp}{0x1D44A}
% \ShowGlyphShape{name:cambria-math}{100bp}{0x1D45D}

\subject{faking glyphs}

A previous section already discussed virtual shapes. In the process of replacing all shapes that are missing in Latin Modern and are composed from snippets instead we ran into the dots.
As they are a nice demonstration of something that, although somewhat of a hack, survived 30 years without problems, we show the definition used in \CONTEXT\ \MKII:

% ldots = 2026
% vdots = 22EE
% cdots = 22EF
% ddots = 22F1
% udots = 22F0

\startbuffer
\def\PLAINldots{\ldotp\ldotp\ldotp}
\def\PLAINcdots{\cdotp\cdotp\cdotp}
\def\PLAINvdots
  {\vbox{\forgetall\baselineskip.4\bodyfontsize\lineskiplimit\zeropoint\kern.6\bodyfontsize\hbox{.}\hbox{.}\hbox{.}}}
\def\PLAINddots
  {\mkern1mu%
   \raise.7\bodyfontsize\ruledvbox{\kern.7\bodyfontsize\hbox{.}}%
   \mkern2mu%
   \raise.4\bodyfontsize\relax\ruledhbox{.}%
   \mkern2mu%
   \raise.1\bodyfontsize\ruledhbox{.}%
   \mkern1mu}
\stopbuffer

\getbuffer \typebuffer

This permitted us to say:

\starttyping
\definemathcommand [ldots] [inner]   {\PLAINldots}
\definemathcommand [cdots] [inner]   {\PLAINcdots}
\definemathcommand [vdots] [nothing] {\PLAINvdots}
\definemathcommand [ddots] [inner]   {\PLAINddots}
\stoptyping

However, in \MKIV\ we use virtual shapes instead.

\definemathcommand [xldots] [inner]   {\PLAINldots}
\definemathcommand [xcdots] [inner]   {\PLAINcdots}
\definemathcommand [xvdots] [nothing] {\PLAINvdots}
\definemathcommand [xddots] [inner]   {\PLAINddots}

The following lines show the virtual shapes in red. In each triplet we see the original, the virtual and the overlaid character.
\startlinecorrection
\switchtobodyfont[modern,17.3pt]%
\dontleavehmode
\ruledhbox{$\xldots$}%
\ruledhbox{$\ldots$}%
\ruledhbox{\startoverlay{$\xldots$}{$\red\ldots$}\stopoverlay}%
\quad
\ruledhbox{$\xcdots$}%
\ruledhbox{$\cdots$}%
\ruledhbox{\startoverlay{$\xcdots$}{$\red\cdots$}\stopoverlay}%
\quad
\ruledhbox{$\xvdots$}%
\ruledhbox{$\vdots$}%
\ruledhbox{\startoverlay{$\xvdots$}{$\red\vdots$}\stopoverlay}%
\quad
\ruledhbox{$\xddots$}%
\ruledhbox{$\ddots$}%
\ruledhbox{\startoverlay{$\xddots$}{$\red\ddots$}\stopoverlay}%
\quad
\ruledhbox{$\xddots$}%
\ruledhbox{$\udots$}%
\ruledhbox{\startoverlay{$\xddots$}{$\red\udots$}\stopoverlay}%
\stoplinecorrection

As you can see here, the virtual variants are rather close to the originals. At 12pt there are no real differences. At other sizes we (somehow) get slightly different results, but the difference is hardly visible. Watch the special spacing above the shapes. It is probably needed for getting the spacing right in matrices (where they are used).

\stopcomponent