% language=uk
% \enabletrackers[structures.export]
% \setupbackend[export=yes]
\usemodule[mathml] % also loads calcmath
\startcomponent hybrid-mathml
\environment hybrid-environment
\startchapter[title={Exporting math}]
\startsection [title={Introduction}]
As \CONTEXT\ has an \XML\ export feature and because \TEX\ is often strongly
associated with math typesetting, it makes sense to take a look at coding and
exporting math. In the next sections some aspects are discussed. The examples
shown are a snaphot of the possibilities around June 2011.
\stopsection
\startsection [title={Encoding the math}]
In \CONTEXT\ there are several ways to input math. In the following example we
will use some bogus math with enough structure to get some interesting results.
The most natural way to key in math is using the \TEX\ syntax. Of course you need
to know the right commands for accessing special symbols, but if you're familiar
with a certain domain, this is not that hard.
\startbuffer
\startformula
\frac { x \geq 2 } { y \leq 4 }
\stopformula
\stopbuffer
\typebuffer \getbuffer
When you have an editor that can show more than \ASCII\ the following also works
out well.
\starttyping
\startformula
\frac { x ≥ 2 } { y ≤ 4 }
\stopformula
\stoptyping
One can go a step further and use the proper math italic alphabet but there are
hardly any (monospaced) fonts out there that can visualize it.
\starttyping[escape=yes]
\startformula
\frac { /BTEX\it x/ETEX ≥ 2 } { /BTEX\it y/ETEX ≤ 4 }
\stopformula
\stoptyping
Anyhow, \CONTEXT\ is quite capable of remapping the regular alphabets onto the
real math ones, so you can stick to \type {x} and \type {y}.
Another way to enter the same formula is by using what we call calculator math.
We came up with this format many years ago when \CONTEXT\ had to process student
input using a syntax similar to what the calculators they use at school accept.
\startbuffer
\startformula
\calcmath{(x >= 2)/(y <= 4)}
\stopformula
\stopbuffer
\typebuffer \getbuffer
As \CONTEXT\ is used in a free and open school math project, and because some of
our projects mix \MATHML\ into \XML\ encoded sources, we can also consider using
\MATHML. The conceptually nicest way is to use content markup, where the focus is
on meaning and interchangability and not on rendering. However, we can render it
quite well. OpenMath, now present in \MATHML~3 is also supported.
\startbuffer
\stopbuffer
\typebuffer \processxmlbuffer
In practice \MATHML\ will be coded using the presentational variant. In many
aspects this way of coding is not much different from what \TEX\ does.
\startbuffer
\stopbuffer
\typebuffer \processxmlbuffer
When we enable \XML\ export in the backend of \CONTEXT, all of the above variants
are converted into the following:
%
%
%
%
% 𝑥
% ≥
% 2
%
%
% 𝑦
% ≤
% 4
%
%
%
%
\starttyping[escape=yes]
/BTEX\it x/ETEX
≥
2
/BTEX\it y/ETEX
≤
4
\stoptyping
This is pretty close to what we have entered as presentation \MATHML. The main
difference is that the (display or inline) mode is registered as attribute and
that entities have been resolved to \UTF. Of course one could use \UTF\ directly
in the input.
\stopsection
\startsection [title={Parsing the input}]
In \TEX\ typesetting math happens in two stages. First the input is parsed and
converted into a so called math list. In the following case it's a rather linear
list, but in the case of a fraction it is a tree.
\startbuffer
\startformula
x = - 1.23
\stopformula
\stopbuffer
\typebuffer \getbuffer
A naive export looks as follows. The sequence becomes an \type {mrow}:
\starttyping[escape=yes]
/BTEX\it x/ETEX
=
−
1
.
2
3
\stoptyping
However, we can clean this up without too much danger of getting invalid output:
\starttyping[escape=yes]
/BTEX\it x/ETEX
=
−
1.23
\stoptyping
This is still not optimal, as one can argue that the minus sign is part of the
number. This can be taken care of at the input end:
\startbuffer
\startformula
x = \mn{- 1.23}
\stopformula
\stopbuffer
\typebuffer
Now we get:
\starttyping[escape=yes]
/BTEX\it x/ETEX
=
−1.23
\stoptyping
Tagging a number makes sense anyway, for instance when we use different numbering
schemes:
\startbuffer
\startformula
x = \mn{0x20DF} = 0x20DF
\stopformula
\stopbuffer
\typebuffer
We get the first number nicely typeset in an upright font but the second one
becomes a mix of numbers and identifiers:
\getbuffer
This is nicely reflected in the export:
\starttyping[escape=yes]
/BTEX\it x/ETEX
=
0x20DF
=
0
/BTEX\it x/ETEX
20
/BTEX\it D/ETEX
/BTEX\it F/ETEX
\stoptyping
In a similar fashion we can use \type {\mo} and \type {\mi} although these are
seldom needed, if only because characters and symbols already carry these
properties with them.
\stopsection
\startsection [title={Enhancing the math list}]
When the input is parsed into a math list the individual elements are called
noads. The most basic noad has pointers to a nucleus, a superscript and a
subscript and each of them can be the start of a sublist. All lists (with more
than one character) are quite similar to \type {mrow} in \MATHML. In the export
we do some flattening because otherwise we would get too many redundant \type
{mrow}s, not that it hurts but it saves bytes.
\startbuffer
\startformula
x_n^2
\stopformula
\stopbuffer
\typebuffer
This renders as:
\getbuffer
And it gets exported as:
\starttyping[escape=yes]
/BTEX\it x/ETEX
/BTEX\it n/ETEX
2
\stoptyping
As said, in the math list this looks more or less the same: we have a noad with a
nucleus pointing to a math character (\type {x}) and two additional pointers to
the sub- and superscripts.
After this math list is typeset, we will end up with horizontal and vertical
lists with glyphs, kerns, glue and other nodes. In fact we end up with what can
be considered regular references to slots in a font mixed with positioning
information. In the process the math properties gets lost. This happens between
step~3 and~4 in the next overview.
\starttabulate[|l|l|l|]
\NC 1 \NC \XML \NC optional alternative input \NC \NR
\NC 2 \NC \TEX \NC native math coding \NC \NR
\NC 3 \NC noads \NC intermediate linked list / tree \NC \NR
\NC 4 \NC nodes \NC linked list with processed (typeset) math \NC \NR
\NC 5a \NC \PDF \NC page description suitable for rendering \NC \NR
\NC 5b \NC \XML \NC export reflecting the final document content \NC \NR
\stoptabulate
In \CONTEXT\ \MKIV\ we intercept the math list (with noads) and apply a couple of
manipulations to it, most noticeably relocation of characters. Last in the
(currently some 10) manipulation passes over the math list comes tagging. This
only happens when the export is active or when we produce tagged pdf. \footnote
{Currently the export is the benchmark and the tagged \PDF\ implementation
follows, so there can be temporary incompatibilities.}
By tagging the recognizable math snippets we can later use those persistent
properties to reverse engineer the \MATHML\ from the input.
\stopsection
\startsection [title={Intercepting the typeset content}]
When a page gets shipped out, we also convert the typeset content to an
intermediate form, ready for export later on. Version 0.22 of the exporter has a
rather verbose tracing mechanism and the simple example with sub- and superscript
is reported as follows:
\starttyping[escape=yes]
/BTEX\it x/ETEX
2
/BTEX\it n/ETEX
\stoptyping
This is not yet what we want so some more effort is needed in order to get proper
\MATHML.
\stopsection
\startsection [title={Exporting the result}]
The report that we showed before representing the simple example with super- and
subscripts is strongly related to the visual rendering. It happens that \TEX\
first typesets the superscript and then deals with the subscript. Some spacing is
involved which shows up in the report between the two scripts.
In \MATHML\ we need to swap the order of the scripts, so effectively we need:
\starttyping[escape=yes]
/BTEX\it x/ETEX
/BTEX\it n/ETEX
2
\stoptyping
This swapping (and some further cleanup) is done before the final tree is written
to a file. There we get:
\starttyping[escape=yes]
/BTEX\it x/ETEX
/BTEX\it n/ETEX
2
\stoptyping
This looks pretty close to the intermediate format. In case you wonder with how
much intermediate data we end up, the answer is: quite some. The reason will be
clear: we intercept typeset output and reconstruct the input from that, which
means that we have additional information travelling with the content. Also, we
need to take crossing pages into account and we need to reconstruct paragraphs.
There is also some overhead in making the \XML\ look acceptable but that is
neglectable. In terms of runtime, the overhead of an export (including tagging)
is some 10\% which is not that bad, and there is some room for optimization.
\stopsection
\startsection[title={Special treatments}]
In content \MATHML\ the \type {apply} tag is the cornerstone of the definition.
Because there is enough information the rendering mechanism can deduce when a
function is applied and act accordingly when it comes to figuring out the right
amount of spacing. In presentation \MATHML\ there is no such information and
there the signal is given by putting a character with code \type {U+2061} between
the function identifier and the argument. In \TEX\ input all this is dealt with
in the macro that specifies a function but some ambiguity is left.
Compare the following two formulas:
\startbuffer
\startformula
\tan = \frac { \sin } { \cos }
\stopformula
\stopbuffer
\typebuffer \getbuffer
In the export this shows up as follows:
\starttyping
tan
=
sin
cos
\stoptyping
Watch how we know that \type {tan} is a function and not a multiplication of the
variables \type {t}, \type{a} and~\type {n}.
In most cases functions will get an argument, as in:
\startbuffer
\startformula
\tan (x) = \frac { \sin (x) } { \cos (x) }
\stopformula
\stopbuffer
\typebuffer \getbuffer
\starttyping[escape=yes]
tan
(
/BTEX\it x/ETEX
)
=
sin
(
/BTEX\it x/ETEX
)
cos
(
/BTEX\it x/ETEX
)
\stoptyping
As expected we now see the arguments but it is still not clear that the function
has to be applied.
\startbuffer
\startformula
\apply \tan {(x)} = \frac {
\apply \sin {(x)}
} {
\apply \cos {(x)}
}
\stopformula
\stopbuffer
\typebuffer \getbuffer
This time we get the function application signal in the output. We could add it
automatically in some cases but for the moment we don't do so. Because this
trigger has no visual rendering and no width it will not be visible in an editor.
Therefore we output an entity.
\starttyping[escape=yes]
tan
(
/BTEX\it x/ETEX
)
=
sin
(
/BTEX\it x/ETEX
)
cos
(
/BTEX\it x/ETEX
)
\stoptyping
In the future, we will extend the \type {\apply} macro to also deal with
automatically managed fences. Talking of those, fences are actually supported
when explicitly coded:
\startbuffer
\startformula
\apply \tan {\left(x\right)} = \frac {
\apply \sin {\left(x\right)}
} {
\apply \cos {\left(x\right)}
}
\stopformula
\stopbuffer
\typebuffer \getbuffer
This time we get a bit more structure because delimiters in \TEX\ can be
recognized easily. Of course it helps that in \CONTEXT\ we already have the
infrastructure in place.
\starttyping[escape=yes]
tan
/BTEX\it x/ETEX
=
sin
/BTEX\it x/ETEX
cos
/BTEX\it x/ETEX
\stoptyping
Yet another special treatment is needed for alignments. We use the next example
to show some radicals as well.
\startbuffer
\startformula
\startalign
\NC a^2 \EQ \sqrt{b} \NR
\NC c \EQ \frac{d}{e} \NR
\NC \EQ f \NR
\stopalign
\stopformula
\stopbuffer
\typebuffer
It helps that in \CONTEXT\ we use a bit of structure in math alignments. In fact,
a math alignment is just a regular alignment, with math in its cells. As with
other math, eventually we end up with boxes so we need to make sure that enough
information is passed along to reconstuct the original.
\getbuffer
\starttyping[escape=yes]
/BTEX\it a/ETEX
2
=
/BTEX\it b/ETEX
/BTEX\it c/ETEX
=
/BTEX\it d/ETEX
/BTEX\it e/ETEX
=
/BTEX\it f/ETEX
\stoptyping
Watch how the equal sign ends up in the cell. Contrary to what you might expect,
the relation symbols (currently) don't end up in their own column. Keep in mind
that these tables look structured but that presentational \MATHML\ does not
assume that much structure. \footnote {The spacing could be improved here but
it's just an example, not something real.}
\stopsection
\startsection[title=Units]
Rather early in the history of \CONTEXT\ we had support for units and the main
reason for this was that we wanted consistent spacing. The input of the old
method looks as follows:
\starttyping
10 \Cubic \Meter \Per \Second
\stoptyping
This worked in regular text as well as in math and we even have an \XML\ variant.
A few years ago I played with a different method and the \LUA\ code has been
laying around for a while but never made it into the \CONTEXT\ core. However,
when playing with the export, I decided to pick up that thread. The verbose
variant can now be coded as:
\starttyping
10 \unit{cubic meter per second}
\stoptyping
but equally valid is:
\starttyping
10 \unit{m2/s}
\stoptyping
and also
\starttyping
\unit{10 m2/s}
\stoptyping
is okay. So, one can use the short (often official) symbols as well as more
verbose names. In order to see what gets output we cook up some bogus units.
\startbuffer
30 \unit{kilo pascal square meter / kelvin second}
\stopbuffer
\typebuffer
This gets rendered as: \getbuffer. The export looks as follows:
\starttyping
30 kPa⋅m2/K⋅s
\stoptyping
\startbuffer
\unit{30 kilo pascal square meter / kelvin second}
\stopbuffer
You can also say:
\typebuffer
and get: \getbuffer. This time the export looks like this:
\starttyping
30
kPa⋅m2/K⋅s
\stoptyping
\startbuffer
$30 \unit{kilo pascal square meter / kelvin second }$
\stopbuffer
When we use units in math, the rendering is mostly the same. So,
\typebuffer
Gives: \getbuffer, but the export now looks different:
\starttyping
30
k
P
a
⋅
m
2
/
K
⋅
s
\stoptyping
Watch how we provide some extra information about it being a unit and how the
rendering is controlled as by default a renderer could turn the \type {K} and
other identifiers into math italic. Of course the subtle spacing is lost as we
assume a clever renderer that can use the information provided in the \type
{maction}.
\stopsection
\startsection[title=Conclusion]
So far the results of the export look quite acceptable. It is to be seen to what
extent typographic detail will be added. Thanks to \UNICODE\ math we don't need
to add style directives. Because we carry information with special spaces, we
could add these details if needed but for the moment the focus is on getting the
export robust on the one end, and extending \CONTEXT's math support with some
additional structure.
The export shows in the previous sections was not entirely honest: we didn't show
the wrapper. Say that we have this:
\startbuffer
\startformula
e = mc^2
\stopformula
\stopbuffer
\typebuffer
This shows up as:
\getbuffer
and exports as:
\starttyping[escape=yes]
/BTEX\it e/ETEX
=
/BTEX\it m/ETEX
/BTEX\it c/ETEX
2
\stoptyping
\startbuffer
\placeformula
\startformula
e = mc^2
\stopformula
\stopbuffer
\typebuffer
This becomes:
\getbuffer
and exports as:
\starttyping[escape=yes]
/BTEX\it e/ETEX
=
/BTEX\it m/ETEX
/BTEX\it c/ETEX
2
(1.1)
\stoptyping
The caption can also have a label in front of the number. The best way to deal
with this still under consideration. I leave it to the reader to wonder how we
get the caption at the same level as the content while in practice the number is
part of the formula.
Anyway, the previous pages have demonstrated that with version 0.22 of the
exporter we can already get a quite acceptable math export. Of course more will
follow.
\stopsection
\stopchapter
\stopcomponent