% language=uk

\usemodule[fnt-23]
\usemodule[fnt-25]

\startcomponent mk-math

\environment mk-environment

\chapter{Unicode math}

{\em I assume that the reader is somewhat familiar with math in
\TEX. Although in \CONTEXT\ we try to support the concepts and
symbols used in the \TEX\ community we have our own way of
implementing math. The fact that \CONTEXT\ is not used extensively
for conventional math journals permits us to rigorously
re|-|implement mechanisms. Of course the user interfaces mostly
remain the same.}

\subject{introduction}

The \LUATEX\ project entered a new stage when, at the end of 2008 and
the beginning of 2009, math got opened up. Although \TEX\ can handle
math pretty well, we had a few wishes that we hoped to fulfill in
the process. That \TEX's math machinery is a rather independent
subsystem is reflected in the fact that after parsing there is an
intermediate list of so called noads (math elements), which then
gets converted into a node list (glyphs, kerns, penalties, glue and
more). This conversion can be intercepted by a callback and a
macro package can do whatever it likes with the list of noads as
long as it returns a proper list.

Of course \CONTEXT\ does support math and that is visible in its
code base:

\startitemize

\item Due to the fact that we need to be able to switch to
alternative styles the font system is quite complex and in
\CONTEXT\ \MKII\ math font definitions (and changes) are good for
50\% of the time involved. In \MKIV\ we can use a more efficient
model.

\item Because some usage of \CONTEXT\ demands the mix of several
completely differently encoded math fonts there is a dedicated math
encoding subsystem in \MKII. In \MKIV\ we will use \UNICODE\
exclusively.

\item Some constructs (and symbols) are implemented in a way that
we find suboptimal. In the perspective of \UNICODE\ in \MKIV\ we
aim at all symbols being real characters. This is possible because
all important constructs (like roots, accents and delimiters) are
supported by the engine.

\item In order to fit vertical spacing around math (think for
instance of typesetting on a grid) in \MKII\ we have ended up with
rather messy and suboptimal code. \footnote {This is because
spacing before and after formulas has to cooperate with spacing of
structural components that surround it.} The expectation is that
we can improve that.

\stopitemize

In the following sections I will discuss a few of the
implementation details of the font related issues in \MKIV. Of
course a few years from now the actual solutions we implemented
might look different but the principles remain the same. Also, as
with other components of \LUATEX, Taco and I worked in parallel on
the code and its usage, which made both our tasks easier.

\subject{transition}

In \TEX, math typesetting uses a special concept called families.
Each math component (number, letter, symbol, etc.) is a member of a
family. Because we have three sizes (text, script and
scriptscript) this results in a family||size matrix of defined
fonts. Because the number of glyphs in a font was limited to 256,
in practice it meant that we had quite some font definitions. The
minimum number of families was~4 (roman, italic, symbol, and
extension) but in practice several more could be active (sans,
bold, mono|-|spaced, more symbols, etc.) for specific alphabets or
extra symbols (for instance \AMS\ set A and B). The total number
of families in traditional \TEX\ is limited to 16, and one easily
hits this maximum. In that case, some 16 times 3 fonts are defined
for one size of which in practice only a few are really used in the
typesetting.

A potential source of confusion is bold math. Bold in math can
either mean having some bold letters, or having the whole formula
in bold. In practice this means that for a complete bold formula
one has to define the whole lot using bold fonts. A complication
is that the math symbols (etc) are kind of bound to families and
so we end up with either redefining symbols, or reusing the
families (which is easier and faster). In any case there is a
performance issue involved due to the rather massive switch from
normal to bold.

In \UNICODE\ all alphabets that make sense as well as all math
symbols are part of the definition although unfortunately some
alphabets have their letters spread over the \UNICODE\ vector and
not in a range (like blackboard). This forces all applications
that want to support math to implement similar hacks to deal with
it.

In \MKIV\ we will assume that we have \UNICODE\ aware math fonts,
like \OPENTYPE. The font that sets the standard is Microsoft
Cambria. The upcoming (I'm writing this in January 2009) \TEX Gyre
fonts will be compliant with this standard but they're not yet there
and so we have a problem. The way out is to define virtual fonts
and now that \LUATEX\ math is extended to cover all of \UNICODE\
as well as provides access to the (intermediate) math lists this
has become feasible. This also permits us to test \LUATEX\
with both Cambria and Latin Modern Virtual Math.

The advantage is that we can stick to just one family for all
shapes which simplifies the underlying \TEX\ code enormously.
First of all we need to define far fewer fonts (which is partially
compensated by loading them as part of the virtual font) and all
math aspects can now be dealt with using the character data
tables.

One tricky aspect of the new approach is that the Latin Modern
fonts have design sizes, so we have to define several virtual
fonts. On the other hand, fonts like Cambria have alternative
script and scriptscript shapes which is controlled by the \type
{ssty} feature, a gsub alternate that provides some alternative
sizes for a couple of hundred characters that matter.

\starttabulate[|l|l|l|]
\NC text         \NC \type {lmmi12 at 12pt} \NC \type {cambria at 12pt with ssty=no} \NC \NR
\NC script       \NC \type {lmmi8  at  8pt} \NC \type {cambria at  8pt with ssty=1}  \NC \NR
\NC scriptscript \NC \type {lmmi6  at  6pt} \NC \type {cambria at  6pt with ssty=2}  \NC \NR
\stoptabulate

So Cambria does not so much have design sizes as shapes optimized
relative to the text variant: in the following example we see text
in red, script in green and scriptscript in blue.

\startbuffer
\definefontfeature[math][analyze=false,script=math,language=dflt]

\definefontfeature[text]        [math][ssty=no]
\definefontfeature[script]      [math][ssty=1]
\definefontfeature[scriptscript][math][ssty=2]
\stopbuffer

\typebuffer \getbuffer

Let us first look at Cambria:

\startbuffer
\startoverlay
    {\definedfont[name:cambriamath*scriptscript at 150pt]\mkblue  X}
    {\definedfont[name:cambriamath*script       at 150pt]\mkgreen X}
    {\definedfont[name:cambriamath*text         at 150pt]\mkred   X}
\stopoverlay
\stopbuffer

\typebuffer \startlinecorrection \getbuffer \stoplinecorrection

When we compare them scaled down as happens in real script and
scriptscript we get:

\startbuffer
\startoverlay
    {\definedfont[name:cambriamath*scriptscript at 120pt]\mkblue  X}
    {\definedfont[name:cambriamath*script       at  80pt]\mkgreen X}
    {\definedfont[name:cambriamath*text         at  60pt]\mkred   X}
\stopoverlay
\stopbuffer

\typebuffer \startlinecorrection \getbuffer \stoplinecorrection

Next we see (scaled) Latin Modern:

\startbuffer
\startoverlay
    {\definedfont[LMRoman8-Regular  at 150pt]\mkblue  X}
    {\definedfont[LMRoman10-Regular at 150pt]\mkgreen X}
    {\definedfont[LMRoman12-Regular at 150pt]\mkred   X}
\stopoverlay
\stopbuffer

\typebuffer \startlinecorrection \getbuffer \stoplinecorrection

In practice we will see:

\startbuffer
\startoverlay
    {\definedfont[LMRoman8-Regular  at 120pt]\mkblue  X}
    {\definedfont[LMRoman10-Regular at  80pt]\mkgreen X}
    {\definedfont[LMRoman12-Regular at  60pt]\mkred   X}
\stopoverlay
\stopbuffer

\typebuffer \startlinecorrection \getbuffer \stoplinecorrection

Both methods probably work out well although you need to keep in
mind that the \OPENTYPE\ \type {ssty} feature is not so much a
design size related feature.

An \OPENTYPE\ font can have a specification for the script and
scriptscript size. By default we listen to this specification instead
of the one imposed by the bodyfont environment. When you turn on
tracing

\starttyping
\enabletrackers[otf.math]
\stoptyping

you will get messages like:

\starttyping
asked scriptscript size: 458752, used: 471859.2 (102.86 %)
asked script size: 589824, used: 574095.36 (97.33 %)
\stoptyping

The differences between the defaults and the font recommendations
are not that large so by default we listen to the font specification.
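
The sizes in these messages are in scaled points, of which there are
65536 in a point. The following \LUA\ fragment is only a sketch of the
arithmetic and not the code that \MKIV\ runs: the function name is made
up, and the 12pt text size and the scale|-|down percentages 73 and 60
are assumptions, chosen because they reproduce the numbers shown above.

\starttyping
-- Sketch only: reproduce the arithmetic behind the trace lines above.
local sp_per_pt = 65536 -- scaled points per point

-- hypothetical helper: the size asked for versus the size the font suggests
local function compare(textsize_pt, percent, asked_pt)
    local asked = asked_pt * sp_per_pt
    local used  = textsize_pt * sp_per_pt * percent / 100
    return asked, used, string.format("%.2f", used / asked * 100)
end

-- assumed values: 12pt text, scale-downs of 73% (script) and 60% (scriptscript)
print(compare(12, 73, 9)) -- script
print(compare(12, 60, 7)) -- scriptscript
\stoptyping

Under these assumptions the two calls give 97.33 and 102.86, which are
the percentages reported by the tracker.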

\usetypescript[cambria] \start \setupbodyfont[cambria] \stop

\definefontfeature[math-script]      [math-script]      [mathsize=no]
\definefontfeature[math-scriptscript][math-scriptscript][mathsize=no]

\definetypeface [cambria-ns] [rm] [serif] [cambria] [default]
\definetypeface [cambria-ns] [tt] [mono]  [modern]  [default]
\definetypeface [cambria-ns] [mm] [math]  [cambria] [default]

\usetypescript[cambria-ns] \start \setupbodyfont[cambria-ns] \stop

\startlinecorrection
\scale
  [width=\textwidth]
  {\backgroundline
     [darkgray]
     {\startoverlay
       {\white\switchtobodyfont   [cambria]$\sum_{i=0}^n$}
       {\mkred\switchtobodyfont[cambria-ns]$\sum_{i=0}^n$}
     \stopoverlay
     \startoverlay
       {\white\switchtobodyfont   [cambria]$\int_{i=0}^n$}
       {\mkred\switchtobodyfont[cambria-ns]$\int_{i=0}^n$}
     \stopoverlay
     \startoverlay
       {\white\switchtobodyfont   [cambria]$\log_{i=0}^n$}
       {\mkred\switchtobodyfont[cambria-ns]$\log_{i=0}^n$}
     \stopoverlay
     \startoverlay
       {\white\switchtobodyfont   [cambria]$\cos_{i=0}^n$}
       {\mkred\switchtobodyfont[cambria-ns]$\cos_{i=0}^n$}
     \stopoverlay
     \startoverlay
       {\white\switchtobodyfont   [cambria]$\prod_{i=0}^n$}
       {\mkred\switchtobodyfont[cambria-ns]$\prod_{i=0}^n$}
     \stopoverlay}}
\stoplinecorrection

\definefontfeature[math-script]      [math-script]      [mathsize=yes]
\definefontfeature[math-scriptscript][math-scriptscript][mathsize=yes]

In this overlay the white text is scaled according to the
specification in the font, while the red text is scaled according
to the bodyfont environment (12/7/5 points).

\subject{going virtual}

The number of math fonts (used) in the \TEX\ community is
relatively small and of those only Latin Modern (which builds upon
Computer Modern) has design sizes. This means that the amount of
\UNICODE\ compliant virtual math fonts that we have to make is not
that large. We could have used an already present virtual
composition mechanism but instead we made a handy helper function
that does a more efficient job. This means that a definition looks
(a bit simplified) as follows:

\starttyping
mathematics.make_font ( "lmroman10-math", {
  { name="lmroman10-regular", features="virtualmath", main=true },
  { name="lmmi10", vector="tex-mi", skewchar=0x7F },
  { name="lmsy10", vector="tex-sy", skewchar=0x30, parameters=true } ,
  { name="lmex10", vector="tex-ex", extension=true } ,
  { name="msam10", vector="tex-ma" },
  { name="msbm10", vector="tex-mb" },
  { name="lmroman10-bold", vector="tex-bf" } ,
  { name="lmmib10", vector="tex-bi", skewchar=0x7F } ,
  { name="lmsans10-regular", vector="tex-ss", optional=true },
  { name="lmmono10-regular", vector="tex-tt", optional=true },
} )
\stoptyping

For the \TEX Gyre Pagella it looks this way:

\starttyping
mathematics.make_font ( "px-math", {
  { name="texgyrepagella-regular", features="virtualmath", main=true },
  { name="pxr", vector="tex-mr" } ,
  { name="pxmi", vector="tex-mi", skewchar=0x7F },
  { name="pxsy", vector="tex-sy", skewchar=0x30, parameters=true } ,
  { name="pxex", vector="tex-ex", extension=true } ,
  { name="pxsya", vector="tex-ma" },
  { name="pxsyb", vector="tex-mb" },
} )
\stoptyping

As you can see, it is possible to add alphabets, given that there is
a suitable vector that maps glyph indices onto \UNICODE\ code points. It is good
to know that this function only defines the way such a font is
constructed. The actual construction is delayed till the font is
needed.
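
To make the idea of a vector and of delayed construction a bit more
concrete, here is a minimal sketch. It is not the \MKIV\ implementation:
the names \type {vectors}, \type {recipes}, \type {define} and \type
{construct}, the font names and the table layout are all hypothetical.
The two slots are those of alpha and beta in the traditional math
italic encoding.

\starttyping
-- Sketch only: all names here are hypothetical, not the MkIV interface.

-- a vector: slot in the traditional 8-bit font -> Unicode code point
local vectors = {
    ["demo-mi"] = {
        [0x0B] = 0x1D6FC, -- mathematical italic small alpha
        [0x0C] = 0x1D6FD, -- mathematical italic small beta
    },
}

local recipes, built = { }, { }

-- defining a font only stores the recipe
local function define(name, parts)
    recipes[name] = parts
end

-- the real work happens on first use and the result is cached
local function construct(name)
    local font = built[name]
    if not font then
        font = { characters = { } }
        for index, part in ipairs(recipes[name]) do
            for slot, unicode in pairs(vectors[part.vector] or { }) do
                -- refer to the glyph by font index and slot
                font.characters[unicode] = {
                    commands = { { "slot", index, slot } }
                }
            end
        end
        built[name] = font
    end
    return font
end

define("demo-math", { { name = "demo-mi10", vector = "demo-mi" } })
local f = construct("demo-math")
print(f.characters[0x1D6FC].commands[1][3]) -- the slot that holds alpha
\stoptyping

A second call to \type {construct} returns the cached table, so the cost
of building the font is paid only once.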

Such a virtual font is used in typescripts (the building blocks of
typeface definitions in \CONTEXT) as follows:

\starttyping
\starttypescript [math] [palatino] [name]
  \definefontsynonym [MathRoman] [pxmath@px-math]
  \loadmapfile[original-youngryu-px.map]
\stoptypescript
\stoptyping

If you're familiar with the way fonts are defined in \CONTEXT, you will
notice that we no longer need to define MathItalic, MathSymbol and
additional symbol fonts. Of course users don't have to deal with
these issues themselves. The \type {@} triggers the virtual
font builder.

You can imagine that in \MKII\ switching to another font style or size
involves initializing (or at least checking) some 30 to 40
font definitions when it comes to math (the number of used
families times 3, the number of math sizes). And even if we take
into account that fonts are loaded only once, this checking and
enabling takes time. Keep in mind that in \CONTEXT\ we can have
several math font sets active in one document which comes at a
price.

In \MKIV\ we use one family (at three sizes). Of course we need to
load the font (and more than one in the case of virtual variants)
but when switching bodyfont sizes we only need to enable one
(already defined) math font. And that really saves time. This is
one of the areas where we gain back time that we lose elsewhere
by extending core functionality using \LUA\ (like \OPENTYPE\
support).

\subject{dimensions}

By setting font related dimensions you can control the way \TEX\
positions math elements relative to each other. Math fonts have a
few more dimensions than regular text fonts. But \OPENTYPE\ math
fonts like Cambria have quite some more. There is a nice booklet
published by Microsoft, \quote {Mathematical Typesetting}, where
dealing with math is discussed in the perspective of their word
processor and \TEX. In the booklet some of the parameters are
discussed and since many of them are rather special it makes no
sense (yet) to elaborate on them here. \footnote {Googling on
\quote {Ulrich Vieth}, \quote {TeX} and \quote {conferences} might
give you some hits on articles on these matters.} Figuring out
their meaning was quite a challenge.

I am the first to admit that the current code in \MKIV\ that deals
with math parameters is somewhat messy. There are several reasons
for this:

\startitemize[packed]
\item We can pass parameters as \type {MathConstants} table in the
      \TFM\ table that we pass to the core engine.
\item We can use some named parameters, like \type {x_height} and
      pass those in the \type {parameters} table.
\item We can use the traditional font dimension numbers in the
      \type {parameters} table, but since they overlap for symbol and
      extensible fonts, that is asking for trouble.
\stopitemize

Because in \MKIV\ we create virtual fonts at run|-|time and use just
one family, we fill the \type {MathConstants} table for
traditional fonts as well. Future versions may use the upcoming
mechanisms of font parameter sets at the macro level. These can be
defined for each of the sizes (display, text, script and
scriptscript, and the last three in cramped form as well) but
since a font only carries one set, we currently use a compromise.

\subject{tracing}

One of the nice aspects of the opened up math machinery is that it
permits us to get a more detailed look at what happens. It also
fits nicely in the way we always want to visualize things in
\CONTEXT\ using color, although most users are probably unaware of
many such features because they don't need them as I do.

\startbuffer
\enabletrackers[math.analyzing]
\ruledhbox{$a = \sqrt{b^2 + \sin{c} - {1 \over \gamma}}$}
\disabletrackers[math.analyzing]
\stopbuffer

\typebuffer \startbaselinecorrection \getbuffer \stopbaselinecorrection

This tracker option colors characters depending on their nature and the
fact that they are remapped. The tracker was also handy during the development
of \LUATEX, especially for checking whether attributes migrated correctly in
constructed symbols.

For over a year I had been using a partial \UNICODE\ math
implementation in some projects but for serious math the vectors
needed to be completed. In order to help the \quote {math
department} of the \CONTEXT\ development team (Aditya Mahajan,
Mojca Miklavec, Taco Hoekwater and myself) we have some extra
tracing options, like

\startbuffer
\showmathfontcharacters[list=0x0007B]
\stopbuffer

\typebuffer

\start \blank \getbuffer \blank \stop

The simple variant with no arguments would have extended this
document with many pages of such descriptions.

Another handy command (defined in module \type{fnt-25}) is the following:

\starttyping
\ShowCompleteFont{name:cambria}{9pt}{1}
\ShowCompleteFont{dummy@lmroman10-math}{10pt}{1}
\stoptyping

For Cambria, for instance, this will generate between 50 and 100
pages of character tables.

\startbuffer[mathtest]
$abc \bf abc \bi abc$
$\mathscript abcdefghijklmnopqrstuvwxyz %
  1234567890 ABCDEFGHIJKLMNOPQRSTUVWXYZ$
$\mathfraktur abcdefghijklmnopqrstuvwxyz %
  1234567890 ABCDEFGHIJKLMNOPQRSTUVWXYZ$
$\mathblackboard abcdefghijklmnopqrstuvwxyz %
  1234567890 ABCDEFGHIJKLMNOPQRSTUVWXYZ$
$\mathscript abc IRZ \mathfraktur abc IRZ %
  \mathblackboard abc IRZ \ss abc IRZ 123$
\stopbuffer

If you look at the following samples you can imagine how coloring
the characters and replacements helped in figuring out the alphabets.
We use the following input (stored in a buffer):

\typebuffer [mathtest]

For testing Cambria we say:

\starttyping
\usetypescript[cambria]
\switchtobodyfont[cambria,11pt]
\enabletrackers[math.analyzing]
\getbuffer[mathtest] % the input shown before
\disabletrackers[math.analyzing]
\stoptyping

And we get:

\usetypescript[cambria] % global

\startlines
\switchtobodyfont[cambria,10pt]
\enabletrackers[math.analyzing]
\getbuffer[mathtest] % the input shown before
\disabletrackers[math.analyzing]
\stoplines

For the virtualized Latin Modern we say:

\starttyping
\usetypescript[modern]
\switchtobodyfont[modern,11pt]
\enabletrackers[math.analyzing]
\getbuffer[mathtest] % the input shown before
\disabletrackers[math.analyzing]
\stoptyping

This gives:

\usetypescript[modern] % global

\startlines
\switchtobodyfont[modern,11pt]
\enabletrackers[math.analyzing]
\getbuffer[mathtest]
\disabletrackers[math.analyzing]
\stoplines

These two samples demonstrate that Cambria has a rather complete
repertoire of shapes which is no surprise because it is a recent
font that also serves as a showcase for \UNICODE\ and \OPENTYPE\
driven math.

Commands like \type {\mathscript} set an attribute. When we post|-|process
the noad list and encounter this attribute, we remap the characters to
the desired variant. Of course this happens selectively. So, a capital~A
(\type {0x0041}) becomes a capital script~A (\type {0x1D49C}). Of course
this solution is rather \CONTEXT\ specific and there are other ways to
achieve the same goal (like using more families and switching family).
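
The selective nature of this remapping can be illustrated with the
script capitals. The following fragment is a sketch and not the code
used in \MKIV: the function name and table layout are hypothetical. The
code points are those assigned by \UNICODE, where some script capitals
live outside the contiguous range that starts at \type {0x1D49C}.

\starttyping
-- Sketch only: remap capital Latin letters to mathematical script capitals.
local base = 0x1D49C -- MATHEMATICAL SCRIPT CAPITAL A

-- letters that have their script form in the Letterlike Symbols block
local exceptions = {
    [0x42] = 0x212C, -- B
    [0x45] = 0x2130, -- E
    [0x46] = 0x2131, -- F
    [0x48] = 0x210B, -- H
    [0x49] = 0x2110, -- I
    [0x4C] = 0x2112, -- L
    [0x4D] = 0x2133, -- M
    [0x52] = 0x211B, -- R
}

-- hypothetical helper: anything that is not A..Z is returned unchanged
local function toscript(unicode)
    if unicode < 0x41 or unicode > 0x5A then
        return unicode
    end
    return exceptions[unicode] or (base + unicode - 0x41)
end

for _, c in ipairs { 0x41, 0x49, 0x52, 0x5A } do
    print(string.format("U+%04X -> U+%04X", c, toscript(c)))
end
\stoptyping

The letters A, I, R and Z are the ones used in the test buffer: A and Z
come from the contiguous range while I and R are exceptions.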

\subject{special cases}

Because we now are operating in the \UNICODE\ domain, we run into
problems if we keep defining some of the math symbols in the
traditional \TEX\ way. Even with the \AMS\ fonts available we
still end up with some characters that are represented by
combining others. Take for instance $\neq$ which is composed of
two characters. Because in \MKIV\ we want to have all
characters in their pure form we use a virtual replacement for
them. In \MKIV\ speak it looks like this:

\starttyping
local function negate(main,unicode,basecode)
    local characters = main.characters
    local basechar = characters[basecode]
    local ht, wd = basechar.height, basechar.width
    characters[unicode] = {
        width    = wd,
        height   = ht,
        depth    = basechar.depth,
        italic   = basechar.italic,
        kerns    = basechar.kerns,
        commands = {
            { "slot", 1, basecode },
            { "push" },
            { "down",    ht/5},
            { "right", - wd/2},
            { "slot", 1, 0x2215 },
            { "pop" },
        }
    }
end
\stoptyping

In case you're curious, there are indeed kerns, in this case the
kerns with the Greek Delta.

Another thing we need to handle is positioning of accents on top
of slanted (italic) shapes. For this \TEX\ uses a special
character in its fonts (set with \type{\skewchar}). Any character
can have in its kerning table a kern towards this special
character. From this kern we
can calculate the \type {top_accent} variable that we can pass for
each character. This variable lives at the same level as \type
{width}, \type {height}, \type {depth} and \type {italic} and is
calculated as: $w/2 + k$, so it defines the horizontal anchor. A
nice side effect is that (in the \CONTEXT\ font management
subsystem) this saves us passing information associated with
specific fonts such as the skew character.
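
As a sketch of this calculation (with a hypothetical function name and
made|-|up dimensions, so not the actual \MKIV\ code), a character
without a kern towards the skew character gets its anchor in the middle
while a slanted one gets it shifted by the kern:

\starttyping
-- Sketch only: derive a top_accent value from the kern with the skew character.
-- The tables mimic entries in a font's character table; the numbers are made up.
local function topaccent(character, skewchar)
    local kerns = character.kerns
    local k = kerns and kerns[skewchar] or 0
    return character.width / 2 + k -- w/2 + k
end

local skewchar = 0x7F
local upright  = { width = 500 }
local slanted  = { width = 500, kerns = { [skewchar] = 80 } }

print(topaccent(upright, skewchar))
print(topaccent(slanted, skewchar))
\stoptyping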

A couple of concepts are unique to \TEX, like having \type {\hat}
and \type {\widehat} where the wide one has sizes. In \OPENTYPE\ and
\UNICODE\ we don't have this distinction so we need special
trickery to simulate this. We do so by adding extra code points in
a private \UNICODE\ space which in turn results in them being
defined automatically and the relevant first size variant being
used for \type {\hat}. For some users this might still be too wide
but at least it's better than a wrongly positioned \ASCII\ variant.
In the future we might use this private space for similar cases.

Arrows, horizontal extenders and radicals also fall in the
category \quote {troublesome} if only because they use special
dimensions to get the desired effect. Fortunately \OPENTYPE\ math
is modeled after \TEX, so in \LUATEX\ we introduce a couple
of new constructs to deal with this. One such simplification at
the macro level is in the definition of \type {\root}. Here we use
the new \type {\Uroot} primitive. The placement related parameters
are those used by traditional \TEX, but when they are available the
\OPENTYPE\ parameters are applied. The simplified
plain definitions are now:

\starttyping
\def\rootradical{\Uroot 0 "221A }

\def\root#1\of{\rootradical{#1}}

\def\sqrt{\rootradical{}}
\stoptyping

The successive sizes of the root will be taken from the font in the
same way as traditional \TEX\ does it. In that sense \LUATEX\ is not
doing anything differently, it only has more parameters to control
the process. The definition of \type {\sqrt} in \CONTEXT\ permits
an optional first argument that sets the degree.

\startbuffer
\showmathfontcharacters[list=0x221A]
\stopbuffer

\start \blank \getbuffer \blank \stop

Note that we've collected all characters in family~0 (simply
because that is what \TEX\ defaults characters to) and that we use
the formal \UNICODE\ slots. When we use the Latin Modern fonts we
just remap traditional slots to the right ones.

Another neat trick is used when users choose among the bigger variants
of some characters. The traditional approach is to create a box of a
certain size and create a fake delimited variant which is then used.

\starttyping
\definemathcommand [big]  {\choosemathbig\plusone  }
\definemathcommand [Big]  {\choosemathbig\plustwo  }
\definemathcommand [bigg] {\choosemathbig\plusthree}
\definemathcommand [Bigg] {\choosemathbig\plusfour }
\stoptyping

Of course this can become a primitive operation and we might decide
to add such a primitive later on so we won't bother you with more
details.

Attributes are also used to make life easier for authors who have
to enter lots of pairs. Compare:

\startbuffer
\setupmathematics[autopunctuation=no]

$ (a,b) = (1.20,3.40) $
\stopbuffer

\typebuffer \begingroup \getbuffer \endgroup

with:

\startbuffer
\setupmathematics[autopunctuation=yes]

$ (a,b) = (1.20,3.40) $
\stopbuffer

\typebuffer \begingroup \getbuffer \endgroup

So we don't need to use this any more:

\starttyping
$ (a{,}b) = (1{.}20{,}3{.}40) $
\stoptyping

Features like this are implemented on top of an experimental math
manipulation framework that is part of \MKIV. When the math
font system is stable we will rework the rest of math support
and implement additional manipulating frameworks.

\subject{control}

As with all other character related issues, in \MKIV\ everything
is driven by a character table (consider it a database).
Quite some effort went into getting that one right and although by
now math is represented well, more data will be added in due time.

In \MKIV\ we no longer have huge lists of \TEX\ definitions for
math related symbols. Everything is initialized using the mentioned
table: normal symbols, delimiters, radicals, whether or not with name.
Take for instance the square root:

\start \blank \showmathfontcharacters[list=0x221A] \blank \stop


Its entry is:

\starttyping
[0x221A] = {
    adobename = "radical",
    category = "sm",
    cjkwd = "a",
    description = "SQUARE ROOT",
    direction = "on",
    linebreak = "ai",
    mathclass = "radical",
    mathname = "surd",
    unicodeslot = 0x221A,
}
\stoptyping

The fraction symbol also comes in sizes. This symbol is not to be
confused with the negation symbol \type {0x2215}, which in \TEX\ is
known as \type {\not}.

\start \blank \showmathfontcharacters[list=0x2044] \blank \stop

\starttyping
[0x2044] = {
    adobename = "fraction",
    category = "sm",
    contextname = "textfraction",
    description = "FRACTION SLASH",
    direction = "cs",
    linebreak = "is",
    mathspec = {
        { class = "binary", name = "slash" },
        { class = "close", name = "solidus" },
    },
    unicodeslot = 0x2044,
}
\stoptyping

However, since most users don't have this symbol visualized in
their word processor, they expect the same behaviour from the
regular slash. This is why we find a reference to the real symbol
in its definition.

\start \blank \showmathfontcharacters[list=0x002F] \blank \stop

The definition is:

\starttyping
[0x002F] = {
    adobename = "slash",
    category = "po",
    cjkwd = "na",
    contextname = "textslash",
    description = "SOLIDUS",
    direction = "cs",
    linebreak = "sy",
    mathsymbol = 0x2044,
    unicodeslot = 0x002F,
}
\stoptyping
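
How such a reference can be resolved is sketched below. The two entries
are abridged copies of the ones shown above; the function name is
hypothetical and the real lookup in \MKIV\ may well differ.

\starttyping
-- Sketch only: follow the mathsymbol reference of a character entry.
local data = {
    [0x002F] = { description = "SOLIDUS", mathsymbol = 0x2044 },
    [0x2044] = { description = "FRACTION SLASH" },
}

-- hypothetical helper: use the referenced symbol when there is one
local function mathcode(unicode)
    local entry = data[unicode]
    return entry and entry.mathsymbol or unicode
end

print(string.format("U+%04X", mathcode(0x002F)))
print(string.format("U+%04X", mathcode(0x2044)))
\stoptyping

Both calls end up at the fraction slash, so the regular slash and the
real symbol behave the same in math.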

One problem left is that currently we have only one class per
character (apart from the delimiter and radical usage which have
their own definitions). Future releases of \CONTEXT\ will provide
support for math dictionaries (as in \OPENMATH\ and \MATHML~3). At
that point we will also have a \type {mathdict} entry.

There is another issue with character mappings, one that will
seldom reveal itself to the user, but might confuse macro writers
when they see an error message.

In traditional \TEX, and therefore also in the Latin Modern fonts,
a chain from small to large character goes in two steps: the
normal size is taken from one family and the larger variants from
another. The larger variant then has a pointer to an even larger
one and so on, until there is no larger variant or an extensible
recipe is found. The default family is number~0. It is for this
reason that some of the definition primitives expect a small and
large family part.

However, in order to support \OPENTYPE\ in \LUATEX\ the
alternative method no longer assumes this split. After all, we no
longer have a situation where the 256 limit forces us to take the
smaller variant from one font and the larger sequence from another
(so we need two family||slot pairs where each family eventually
resolves to a font).

It is for that reason that the new \type {\U...} primitives expect
only one family specification: the small symbol, which then has a
pointer to a larger variant when applicable. However, deep down in
the engine, there is still support for the multiple family
solution (after all, we don't want to drop compatibility). As a
result, in error messages you can still find references
(defaulting to~0) to large specifications, even if you don't use
them. In that case you can simply ignore the large symbol (0,0),
since it is not used when the small symbol provides a link.
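
The difference between the two interfaces can be illustrated with the
left parenthesis. This is only a minimal sketch: the small and large
parts in the first line are the ones plain \TEX\ assigns to the
delimiter code of the left parenthesis, while family~0 in the second
line is an assumption made for this example.

\starttyping
% traditional: class 4 (open), small variant in family 0 slot "28,
% large variant in family 3 slot "00
\delimiter "4028300

% LuaTeX: class 4, family 0 (assumed), slot "28; larger variants
% are found by following the links in the font
\Udelimiter "4 "0 "28
\stoptyping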

\subject{extensibles}

In \TEX\ fences can be told to become larger automatically. In
traditional \TEX\ a character can have a linked list of next
larger shapes ending in a description of how to compose even
larger variants.

A parenthesis in Cambria has the following list:

\start
    \switchtobodyfont[cambria,10pt]
    \showmathfontcharacters[list=0x00028]
\stop

In Latin Modern we have:

\start
    \switchtobodyfont[modern,10pt]
    \showmathfontcharacters[list=0x00028]
\stop

Of course \LUATEX\ is downward compatible with respect to this
feature, but the internal representation is now closer to what
\OPENTYPE\ math provides (which is not that far from how \TEX\
works simply because it's inspired by \TEX). Because Cambria has
different parameters we get slightly different results. In the
following list of pairs, you see Cambria on the left and Latin
Modern on the right.
Both start with stepwise larger shapes, followed by a more gradual
growth. The thresholds for a next step are driven by parameters
set in the \OPENTYPE\ font or by \TEX's default.
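
A reduced version of the test that produces such samples could look
like the following sketch. The three heights are arbitrarily chosen;
each pass of the loop asks for a taller brace.

\starttyping
\dostepwiserecurse {10} {50} {20} {
    $\left\{ \vcenter{\hbox{\vrule height \recurselevel pt width 5pt}} \right\}$
    \quad
}
\stoptyping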

\start
\lineskip1ex
\dostepwiserecurse{5}{140}{5} {
    \dontleavehmode \ruledhbox \bgroup
        \setbox0=\vbox{\vss\hbox{\switchtobodyfont[cambria,10pt]$\left\{ \vcenter{\hbox{\darkgray\vrule height \recurselevel pt width 5pt}} \right\}$}\vss}%
        \setbox2=\vbox{\vss\hbox{\switchtobodyfont[modern, 10pt]$\left\{ \vcenter{\hbox{\darkgray\vrule height \recurselevel pt width 5pt}} \right\}$}\vss}%
        \ifdim\ht0>\ht2
            \setbox2\vbox to \htdp0{\vss\box2\vss}%
        \else
            \setbox0\vbox to \htdp2{\vss\box0\vss}%
        \fi
        \box0\box2
    \egroup \quad
}
\par \stop

In traditional \TEX\ horizontal extensibles are not really present. Accents
are chosen from a linked list of variants and don't have an extensible
specification. This is because most such accents grow in two dimensions and
the only extensible||like accents are rules and braces. However, \UNICODE\
has a few more, and for reasons of symmetry we decided to add horizontal
extensibles too. Take:

\startbuffer
$ \overbrace {a+1} \underbrace {b+2} \doublebrace {c+3} $ \par
$ \overparent{a+1} \underparent{b+2} \doubleparent{c+3} $ \par
\stopbuffer

\typebuffer

This gives:

\getbuffer

Contrary to Cambria, Latin Modern Math, which is just like
Computer Modern Math, has no ready||made overbrace glyphs. Keep in
mind that we're dealing with fonts that have only 256 slots and
that the traditional font mechanism has the same limitation. For
this reason, the (extensible) braces are traditionally made from
snippets, as is demonstrated below.

\startbuffer
\hbox\bgroup
  \ruledhbox{\getglyph{lmex10}{\char"7A}}
  \ruledhbox{\getglyph{lmex10}{\char"7B}}
  \ruledhbox{\getglyph{lmex10}{\char"7C}}
  \ruledhbox{\getglyph{lmex10}{\char"7D}}
  \ruledhbox{\getglyph{lmex10}{\char"7A\char"7D\char"7C\char"7B}}
  \ruledhbox{\getglyph{name:cambriamath}{\char"23DE}}
  \ruledhbox{\getglyph{lmex10}{\char"7C\char"7B\char"7A\char"7D}}
  \ruledhbox{\getglyph{name:cambriamath}{\char"23DF}}
\egroup
\stopbuffer

\typebuffer

This gives:

\startlinecorrection
\getbuffer
\stoplinecorrection

The four snippets have the height and depth of the rule that will
connect them. Since we want a single interface for all fonts we
will no longer use macro based solutions. First of all, fonts like
Cambria don't have the snippets, and using active character
trickery (so that we can adapt the meaning to the font) is not
preferred either. This leaves virtual glyphs.

It took us a bit of experimenting to get the right virtual definition because
it is a multi||step process:

\startitemize[packed]
\item The right \UNICODE\ character (\type {0x23DE}) points to a character that has
      no glyph itself but only horizontal extensibles.
\item The snippets that make up the extensible don't have the right dimensions
      (as they define the size of the connecting rule), so we need to make them
      virtual themselves and give them a size that matches \LUATEX's expectations.
\item Each virtual snippet contains a reference to the physical snippet and moves
      it up or down as well as fixes its size.
\item The second and fifth snippets are actually not real glyphs but rules. Their
      dimensions are derived from the snippets and they are shifted up or down too.
\stopitemize

You might wonder if this is worth the trouble. Well, it is if you take into
account that all upcoming math fonts will be organized like Cambria.

\subject{math kerning}

While reading Microsoft's orange booklet, it became clear that
\OPENTYPE\ provides advanced kerning possibilities and we decided
to put it on the agenda for \LUATEX.

It is possible to define a ladder||like boundary for each corner
of a character where the ladder more or less follows the shape of
a character. In theory this means that when we attach a
superscript to a base character we can use two such ladders to
determine the optimal spacing between them.
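
As a sketch of how such a ladder is read (this follows the description
of the math kerning data in the \OPENTYPE\ specification as we
understand it; the symbols are ours): a corner has $n$ correction
heights $h_1 < \cdots < h_n$ and $n+1$ kern values $k_0, \ldots, k_n$.
With $h_0 = -\infty$ and $h_{n+1} = \infty$, the kern that applies at
height $h$ is:

\startformula
k(h) = k_i \quad \hbox{when} \quad h_i \le h < h_{i+1}
\stopformula

So each step of the ladder is a range of heights that shares one fixed
kern.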

Let's have a look at a few characters, the upright~f and its
italic cousin.

\startcombination[2*1]
  {\ShowGlyphShape{name:cambria-math}{40bp}{0x66}}    {U+00066}
  {\ShowGlyphShape{name:cambria-math}{40bp}{0x1D453}} {U+1D453}
\stopcombination

The ladders on the right can be used to position a superscript or
subscript. The scripts are first positioned in the normal way, after
which the ladder of the base, together with the boundingbox and/or
left ladders of the scripts, can be used to fine tune the positioning.

Should we use this information? I made this visualizer for
checking the anchoring and cursive features of some Arabic fonts,
and then it made sense to add some of the information related to
math as well. \footnote {Taco extended the visualizer for his
presentation at Bachotek 2009 so you might run into variants.} The
orange booklet shows quite advanced ladders, but when looking at
the 3500 shapes in Cambria, it quickly becomes clear that in
practice there is not that much detail in the specification.
Nevertheless, because without this feature the result is not
acceptable, \LUATEX\ gracefully supports it.

\usetypescript[cambria-y]

\startbuffer
$V^a_a V^a V_a V^1_2 V^1 V_2 f^a f_a f^a_a$\par
$V^f_f V^f V_f V^1_2 V^1 V_2 f^f f_f f^f_f$\par
$T^a_a T^a T_a T^1_2 T^1 T_2 f^a f_f f^a_f$\par
$T^f_f T^f T_f T^1_2 T^1 T_2 f^f f_a f^f_a$\par
\stopbuffer

\startlinecorrection
\startcombination[3*1]
    {\framed[align=normal]{\switchtobodyfont[modern]\getbuffer}}    {latin modern}
    {\framed[align=normal]{\switchtobodyfont[cambria-y]\getbuffer}} {cambria without kerning}
    {\framed[align=normal]{\switchtobodyfont[cambria]\getbuffer}}   {cambria with kerning}
\stopcombination
\stoplinecorrection

% \ShowGlyphShape{name:cambria-math} {40bp}{0x1D43F}
% \ShowGlyphShape{name:cambria-math}{100bp}{0x1D444}
% \ShowGlyphShape{name:cambria-math}{100bp}{0x1D447}
% \ShowGlyphShape{name:cambria-math}{100bp}{0x2112}
% \ShowGlyphShape{name:cambria-math}{100bp}{0x1D432}
% \ShowGlyphShape{name:cambria-math}{100bp}{0x1D43D}
% \ShowGlyphShape{name:cambria-math}{100bp}{0x1D44A}
% \ShowGlyphShape{name:cambria-math}{100bp}{0x1D45D}

\subject{faking glyphs}

A previous section already discussed virtual shapes. In the
process of replacing all shapes that are lacking in Latin Modern
and are composed from snippets instead, we ran into the dots. As
they are a nice demonstration of something that, although somewhat
of a hack, survived 30 years without problems, we show the
definition used in \CONTEXT\ \MKII:

% ldots = 2026
% vdots = 22EE
% cdots = 22EF
% ddots = 22F1
% udots = 22F0

\startbuffer
\def\PLAINldots{\ldotp\ldotp\ldotp}
\def\PLAINcdots{\cdotp\cdotp\cdotp}

\def\PLAINvdots
  {\vbox{\forgetall\baselineskip.4\bodyfontsize\lineskiplimit\zeropoint\kern.6\bodyfontsize\hbox{.}\hbox{.}\hbox{.}}}

\def\PLAINddots
  {\mkern1mu%
   \raise.7\bodyfontsize\ruledvbox{\kern.7\bodyfontsize\hbox{.}}%
   \mkern2mu%
   \raise.4\bodyfontsize\relax\ruledhbox{.}%
   \mkern2mu%
   \raise.1\bodyfontsize\ruledhbox{.}%
   \mkern1mu}
\stopbuffer

\getbuffer \typebuffer

This permitted us to say:

\starttyping
\definemathcommand [ldots] [inner]   {\PLAINldots}
\definemathcommand [cdots] [inner]   {\PLAINcdots}
\definemathcommand [vdots] [nothing] {\PLAINvdots}
\definemathcommand [ddots] [inner]   {\PLAINddots}
\stoptyping

However, in \MKIV\ we use virtual shapes instead.

\definemathcommand [xldots] [inner]   {\PLAINldots}
\definemathcommand [xcdots] [inner]   {\PLAINcdots}
\definemathcommand [xvdots] [nothing] {\PLAINvdots}
\definemathcommand [xddots] [inner]   {\PLAINddots}

The following lines show the virtual shapes in red. In each
triplet we see the original, the virtual and the overlaid
character.

\startlinecorrection
\switchtobodyfont[modern,17.3pt]%
\dontleavehmode
\ruledhbox{$\xldots$}%
\ruledhbox{$\ldots$}%
\ruledhbox{\startoverlay{$\xldots$}{$\red\ldots$}\stopoverlay}%
\quad
\ruledhbox{$\xcdots$}%
\ruledhbox{$\cdots$}%
\ruledhbox{\startoverlay{$\xcdots$}{$\red\cdots$}\stopoverlay}%
\quad
\ruledhbox{$\xvdots$}%
\ruledhbox{$\vdots$}%
\ruledhbox{\startoverlay{$\xvdots$}{$\red\vdots$}\stopoverlay}%
\quad
\ruledhbox{$\xddots$}%
\ruledhbox{$\ddots$}%
\ruledhbox{\startoverlay{$\xddots$}{$\red\ddots$}\stopoverlay}%
\quad
\ruledhbox{$\xddots$}%
\ruledhbox{$\udots$}%
\ruledhbox{\startoverlay{$\xddots$}{$\red\udots$}\stopoverlay}%
\stoplinecorrection

As you can see here, the virtual variants are rather close to the
originals. At 12pt there are no real differences, but (somehow) at
other sizes we get slightly different, though hardly visible,
results. Watch the special spacing above the shapes. It is
probably needed for getting the spacing right in matrices (where
they are used).

\stopcomponent