% language=us runpath=texruns:manuals/followingup

\startcomponent followingup-expressions

\environment followingup-style

\startchapter[title={Expressions}]

\startsection[title={Introduction}]

Do we need bitwise expressions? Actually the answer is \quotation {no, at least
not until recently}. In \CONTEXT\ \MKII\ and \MKIV\ we just use integer addition
because we only need to enable things, but in \LMTX\ we want to control the
detailed modes that some mechanisms in the engine provide, and in order to not
have tons of parameters these use bit sets. We manipulate these with the bitwise
macros that actually are efficient \LUA\ function calls. But, as with some other
extensions in \LUAMETATEX, one way to prevent tracing clutter is to have a few
handy primitives. So let's see what we've got.

{\em I haven't checked all operators and combinations yet!}

\stopsection

\startsection[title={Exploration}]

Already early in the \LUAMETATEX\ development (2019) the expression parser was
extended with an integer division operator \type {:} that we actually use in
\LMTX. Soon after that I added basic bitwise operators, but these were never
activated and were kept as comments because I didn't want to impact the scanner
(even if we can afford to lose some performance because the scanner has been
optimized). But in the process of cleaning up \quote {todo} comments in the
source code I eventually arrived at expressions again.

The colon already makes the scanner incompatible because \type {\numexpr 1+2:}
expects a number (which means that we cannot port back) and more operators only
make that less likely. In \CONTEXT\ I nearly always use \type {\relax} as
terminator unless we're sure that lookahead is no issue. \footnote {In the \ETEX\
expression parser, the normal \type {/} rounds the result. Both the \type {*} and
\type {/} operators have a dedicated code path that assures no loss of accuracy.
The \type {:} operator just divides like \LUA's \type {//} which is an integer
division operator. There are subtle differences between the division variants
which can be noticeable when you go round trip. That is actually the main reason
why this was one of the first things added to \LUAMETATEX\ as I wanted to get rid
of a few scaled point rounding issues. The \ETEX\ expression parser is
somewhat complicated because it can deal with a mix of integers, dimensions and
even glue, but always brings the result back to its main operating model. Because
we adopted some of these \ETEX\ extensions rather early in \CONTEXT\ lookahead
pitfalls are taken care of already.}
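
The difference between the two division operators shows up in a small round
trip. The following is a minimal sketch, assuming \LUAMETATEX\ and the behaviour
described in the footnote; the values in the comments follow from that
description and were not taken from an actual run.

\starttyping
\the\numexpr 7/2\relax      % rounding division: 4
\the\numexpr 7:2\relax      % integer division:  3
\the\numexpr (7/2)*2\relax  % round trip:        8
\the\numexpr (7:2)*2\relax  % round trip:        6
\stoptyping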

When going over the code in 2021, mostly because I wanted to get rid of some
commented experiments, I decided that the extension should not go into the
normal scanner but that a dedicated, simple and integer only scanner made more
sense, so during a rainy summer weekend I started playing with that. It eventually
became a bit more than initially intended, although the amount of code is rather
minimal. The performance was about twice that of the already available bitwise
macros but operator precedence was not provided (apart from the multiplication
and division operators). The final implementation was different, not that much
faster on simple bitwise operations but could do more complex things in one go.
Performance was not a real reason to provide this anyway because we're talking
microseconds; it's more about less code and better readability.

The initial primitive command was \type {\bitexpr} and it supported nesting with
parentheses as the other expressions do. Because there are many operators, also
verbose ones, the non|-|optional \type {\relax} token finishes parsing. But
soon we moved on to two dedicated primitives.

\stopsection

\startsection[title={Operators}]

The set of operators that we have to support is the following. Most have
alternatives so that we can get around catcode issues.

\starttabulate[||cT|cT|]
\BC add       \NC +                    \NC        \NC \NR
\BC subtract  \NC -                    \NC        \NC \NR
\BC multiply  \NC *                    \NC        \NC \NR
\BC divide    \NC / :                  \NC        \NC \NR
\BC mod       \NC \letterpercent       \NC mod    \NC \NR
\BC band      \NC &                    \NC band   \NC \NR
\BC bxor      \NC ^                    \NC bxor   \NC \NR
\BC bor       \NC \letterbar \space v  \NC bor    \NC \NR
\BC and       \NC &&                   \NC and    \NC \NR
\BC or        \NC \letterbar\letterbar \NC or     \NC \NR
\BC setbit    \NC <undecided>          \NC bset   \NC \NR
\BC resetbit  \NC <undecided>          \NC breset \NC \NR
\BC left      \NC <<                   \NC        \NC \NR
\BC right     \NC >>                   \NC        \NC \NR
\BC less      \NC <                    \NC        \NC \NR
\BC lessequal \NC <=                   \NC        \NC \NR
\BC equal     \NC = ==                 \NC        \NC \NR
\BC moreequal \NC >=                   \NC        \NC \NR
\BC more      \NC >                    \NC        \NC \NR
\BC unequal   \NC <> != \lettertilde = \NC        \NC \NR
\BC not       \NC ! \lettertilde       \NC not    \NC \NR
\stoptabulate
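
A few of the verbose operators from the table can be tried as follows. This is a
minimal sketch that has not been run; the values in the comments are simply what
the arithmetic should give if the operators behave as their names suggest.

\starttyping
\scratchcounter = \numexpression "F0 band "3C \relax % "30
\scratchcounter = \numexpression "F0 bor  "0F \relax % "FF
\scratchcounter = \numexpression 1 << 4       \relax % 16
\scratchcounter = \numexpression 17 mod 5     \relax % 2
\stoptyping

The verbose forms are the safer choice in macro code because they don't depend
on the catcodes of characters like \type {&} and \type {^}.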

I considered using \type {++} and \type {--} as the \type {bset} and \type
{breset} shortcuts but that leads to issues because in \TEX\ \type {-+-++--10} is
a valid number and one never knows what sequence (without spaces) gets fed into
an expression.
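
The sign handling can be checked with a plain assignment. In this small sketch
the number scanner accepts any sequence of signs before the digits, and each
minus flips the sign of the result:

\starttyping
\scratchcounter = -+-++--10 \relax
\the\scratchcounter % four minus signs cancel out: 10
\stoptyping

So an input like \type {8--3} would be ambiguous: it could be a subtraction of
\type {-3} or an application of a \type {--} operator.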

Originally I'd added some \UNICODE\ characters but for some reason support of
logical operators is suboptimal so I removed that feature. Because these special
characters are multi|-|byte \UTF\ sequences they are not that much better than
verbose words anyway.

% 0x00AC  !    ¬              lua: not
% 0x00D7  *    ×
% 0x00F7  /    ÷
% 0x2227  &&   ∧ c: and       lua: and
% 0x2228  ||   ∨ c: or        lua: or
% 0x2229  &    ∩ c: bitand    lua: band
% 0x222A  |    ∪ c: bitor     lua: bor
%         ^      c: bitxor    lua: bxor
% 0x2260  !=   ≠
% 0x2261  ==   ≡
% 0x2264  <=   ≤
% 0x2265  >=   ≥
% 0x22BB  xor  ⊻
% 0x22BC  nand ⊼
% 0x22BD  nor  ⊽
% 0x22C0  and  ⋀ n-ary logical and
% 0x22C1  or   ⋁ n-ary logical or
% 0x2AA1  <<   ⪡
% 0x2AA2  >>   ⪢

\stopsection

\startsection[title={Integers and dimensions}]

When I was playing a bit with this feature, I wondered if we could mix in some
dimensions. It was actually not that hard to add this: only explicit (verbose)
dimensions had to be intercepted because dimen registers and such are seen as
integers by the integer scanner. Once we were able to handle that, a next step was
to make sure that \typ {2 * 10pt} was permitted, something that the \ETEX\ \type
{\dimexpr} primitive can't handle. So, a variant of the dimen parser had to be
used that makes the unit optional: \type {\dimexpression} and \type
{\numexpression} were born.
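
The difference in what is accepted can be sketched as follows. This has not been
run; both assignments are expected to set the register to \type {20pt}, the
second one under the assumption that the factor may come first as described
above.

\starttyping
\scratchdimen = \dimexpr       10pt * 2 \relax % factor after the dimension
\scratchdimen = \dimexpression 2 * 10pt \relax % factor first is accepted too
\stoptyping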

The resulting parsers worked quite well but were about twice as slow as the
normal expression scanners, which is no surprise because they do more. For
instance we are case insensitive and need to handle letter and other (and in a
few cases alignment and superscript) catcodes too. However, with a slightly tuned
integer parser, also possible because the sentinel \type {\relax} makes parsing
more predictable, and a dedicated unit scanner, in the end both the integer and
dimension parser were performing well. It's not like we run them millions of
times in a document.

\startbuffer
\scratchcounter = \numexpression
    "00000 bor "00001 bor "00020 bor "00400 bor "08000 bor "F0000
\relax
\stopbuffer

Here is an example that results in {0x\inlinebuffer\uchexnumber\scratchcounter}:

\typebuffer

\startbuffer
\scratchcounter = \numexpression
    "FFFFF bxor "10101
\relax
\stopbuffer

And this gives {0x\inlinebuffer\uchexnumber\scratchcounter}:

\typebuffer

We can give numerous examples but you get the picture. In the above table you can
see that some operators have equivalents. The reason for this is that a macro
package can change catcodes and some characters have special meanings. So, the
scanner is rather tolerant.

\startbuffer
\scratchcounterone = 10
\scratchcountertwo = 20
\ifcase \numexpression
    (\scratchcounterone > 5) && (\scratchcountertwo > 5)
\relax yes\else nop\fi
%
\space
%
\scratchcounterone = 2
\scratchcountertwo = 4
\ifcase \numexpression
    (\scratchcounterone > 5) and (\scratchcountertwo > 5)
\relax nop\else yes\fi
\stopbuffer

And this gives \quote {\tttf \inlinebuffer}:

\typebuffer

The normal expansion rules apply, so one can use macros and other symbolic
numbers. The only difference in handling dimensions is that we don't support
\type {true} units but these are obsolete in \LUAMETATEX\ anyway.

\stopsection

\startsection[title={Tracing}]

When \type {\tracingexpressions} is set to one or higher the intermediate \quote
{reverse polish notation} stack that is used for the calculation is shown, for
instance:

\starttyping
4:8: {numexpression rpn: 2 5 > 4 5 > and}
\stoptyping

When you want the output on your console, you need to say:

\starttyping
\tracingexpressions 1
\tracingonline      1
\stoptyping

The fact that we process the expression in two phases makes it possible to provide this
kind of tracing.
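
The traced sequence above can be followed by hand. The walk through below is
only an illustration of how a reverse polish stack is normally evaluated and not
engine output; it assumes that a failed comparison is represented by zero.

\starttyping
token   stack after processing
2       2
5       2 5
>       0        (2 > 5 fails)
4       0 4
5       0 4 5
>       0 0      (4 > 5 fails)
and     0        (both operands are zero)
\stoptyping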

\stopsection

\startsection[title={Performance}]

The following table shows the results of 100.000 evaluations (per line) so you'll
notice that there is a difference. But keep in mind that the new variant can do
more, so it might pay off when we have cases that otherwise demand multiple
traditional expressions.

\starttabulate[|l|c|]
\NC \type {\dimexpr       4pt*2 + 6pt\relax}        \EQ \iftrialtypesetting\else\testfeatureonce{100000}{\scratchdimen  \dimexpr       4pt*2 + 6pt\relax} \elapsedtime\fi \NC \NR
\NC \type {\dimexpression 4pt*2 + 6pt\relax}        \EQ \iftrialtypesetting\else\testfeatureonce{100000}{\scratchdimen  \dimexpression 4pt*2 + 6pt\relax} \elapsedtime\fi \NC \NR
\NC \type {\dimexpression 2*4pt + 6pt\relax}        \EQ \iftrialtypesetting\else\testfeatureonce{100000}{\scratchdimen  \dimexpression 2*4pt + 6pt\relax} \elapsedtime\fi \NC \NR
\TB
\NC \type {\numexpr       4 * 2 + 6\relax}          \EQ \iftrialtypesetting\else\testfeatureonce{100000}{\scratchcounter\numexpr       4 * 2 + 6\relax}   \elapsedtime\fi \NC \NR
\NC \type {\numexpression 2 * 4 + 6\relax}          \EQ \iftrialtypesetting\else\testfeatureonce{100000}{\scratchcounter\numexpression 2 * 4 + 6\relax}   \elapsedtime\fi \NC \NR
\TB
\NC \type {\numexpr       4*2+6\relax}              \EQ \iftrialtypesetting\else\testfeatureonce{100000}{\scratchcounter\numexpr       4*2+6\relax}       \elapsedtime\fi \NC \NR
\NC \type {\numexpression 2*4+6\relax}              \EQ \iftrialtypesetting\else\testfeatureonce{100000}{\scratchcounter\numexpression 2*4+6\relax}       \elapsedtime\fi \NC \NR
\TB
\NC \type {\numexpr       (1+2)*(3+4)\relax}        \EQ \iftrialtypesetting\else\testfeatureonce{100000}{\scratchcounter\numexpr       (1+2)*(3+4)\relax} \elapsedtime\fi \NC \NR
\NC \type {\numexpression (1+2)*(3+4)\relax}        \EQ \iftrialtypesetting\else\testfeatureonce{100000}{\scratchcounter\numexpression (1+2)*(3+4)\relax} \elapsedtime\fi \NC \NR
\TB
\NC \type {\numexpr       (1 + 2) * (3 + 4) \relax} \EQ \iftrialtypesetting\else\testfeatureonce{100000}{\scratchcounter\numexpr       (1 + 2) * (3 + 4) \relax} \elapsedtime\fi \NC \NR
\NC \type {\numexpression (1 + 2) * (3 + 4) \relax} \EQ \iftrialtypesetting\else\testfeatureonce{100000}{\scratchcounter\numexpression (1 + 2) * (3 + 4) \relax} \elapsedtime\fi \NC \NR
\stoptabulate

As usual I'll probably find some way to improve performance a bit but that might
then also concern the traditional one. When we compare them, the new numeric
scanner suffers from more options while the new dimension parser gains on the
units. Also, keep in mind that the \LUAMETATEX\ normal parsers are already
somewhat faster than the ones in \LUATEX. The numbers above are calculated when
this document is rendered, so they may change over time and per run. The two
engines compare as follows (mid 2021):

\starttabulate[|l|c|c|]
\NC                                           \BC \LUATEX \BC \LUAMETATEX \NC \NR
\NC \type {\dimexpr 4pt*2 + 6pt\relax}        \NC 0.073   \NC 0.045 \NC \NR
\NC \type {\numexpr 4 * 2 + 6\relax}          \NC 0.034   \NC 0.028 \NC \NR
\NC \type {\numexpr 4*2+6\relax}              \NC 0.035   \NC 0.032 \NC \NR
\NC \type {\numexpr (1+2)*(3+4)\relax}        \NC 0.050   \NC 0.047 \NC \NR
\NC \type {\numexpr (1 + 2) * (3 + 4) \relax} \NC 0.052   \NC 0.048 \NC \NR
\stoptabulate

Of course tests like these are dubious because often \CPU\ cache will keep the
current code accessible, but who knows.

It will probably take a while before I will use this in the source code because
first I need to make sure that all works as expected and while doing that I might
adapt some of this. But the basic framework is there.

\stopsection

% \start
% \nologbuffering
% \scratchdimen    100pt
% \scratchdimenone 65.536pt
% \scratchdimentwo 65.536bp

% \tracingonline1
% \tracingexpressions1
% \scratchcounter\bitexpr \scratchdimen / 2   \relax\the\scratchcounter\par

% \scratchcounter\numexpression \scratchdimen / 2sp \relax \the\scratchcounter\par
% \scratchcounter\numexpression \scratchdimen / 1pt \relax \the\scratchcounter\par
% \scratchcounter\numexpression \scratchdimenone / 65.536pt \relax \the\scratchcounter\par
% \scratchcounter\numexpression \scratchdimentwo / 2 \relax \the\scratchcounter\par

% \scratchcounter\numexpression \scratchcounterone / 4 \relax \the\scratchcounter\par
% \scratchdimen  \dimexpression \scratchcounterone / 4 \relax \the\scratchdimen\par

% \scratchdimen  \dimexpression 2 * 4pt \relax \the\scratchdimen\par

% \tracingexpressions0
% \tracingonline0

% \startTEXpage
% \tracingonline1
% \tracingexpressions1
% \the\dimexpr -10pt\relax\quad
% \the\dimexpr  10pt\relax\quad
% \the\dimexpr  10.12 pt\relax\quad
% \the\dimexpression -10pt\relax\quad
% \the\dimexpression  10pt\relax\quad
% \stopTEXpage

\stopchapter

\stopcomponent