From f1129626606384a7a55a21a83531f51f8b5dee25 Mon Sep 17 00:00:00 2001
From: Hans Hagen
Date: Tue, 14 Jul 2020 00:25:53 +0200
Subject: 2020-07-13 23:52:00

---
 .../general/manuals/evenmore/evenmore-keywords.tex | 288 +++++++++++++++++++++
 1 file changed, 288 insertions(+)
 create mode 100644 doc/context/sources/general/manuals/evenmore/evenmore-keywords.tex

(limited to 'doc/context/sources/general/manuals/evenmore/evenmore-keywords.tex')

diff --git a/doc/context/sources/general/manuals/evenmore/evenmore-keywords.tex b/doc/context/sources/general/manuals/evenmore/evenmore-keywords.tex
new file mode 100644
index 000000000..372e54781
--- /dev/null
+++ b/doc/context/sources/general/manuals/evenmore/evenmore-keywords.tex
@@ -0,0 +1,288 @@
+% language=us
+
+\environment evenmore-style
+
+\startcomponent evenmore-keywords
+
+\startchapter[title=Keywords]
+
+Some primitives in \TEX\ can take one or more optional keywords and|/|or keywords
+followed by one or more values. In traditional \TEX\ it concerns a handful of
+primitives, in \PDFTEX\ there are plenty of backend related primitives, \LUATEX\
+introduced optional keywords to some math constructs and attributes to boxes,
+while \LUAMETATEX\ adds some more too. The keyword scanner in \TEX\ is kind of
+special. Keywords are used in cases like:
+
+\starttyping
+\hbox spread 10cm {...}
+\advance\scratchcounter by 10
+\vrule width 3cm height 1ex
+\stoptyping
+
+Sometimes there are multiple keywords, as with rules, in which case you can
+imagine use cases like:
+
+\starttyping
+\vrule width 3cm depth 1ex width 10cm depth 0ex height 1ex\relax
+\stoptyping
+
+Here we add a \type {\relax} to end the scanning. If we don't do that and the
+rule specification is followed by arbitrary (read: unpredictable) text, the next
+word can as well be a valid keyword and when followed by a dimension (unlikely) it
+will happily take that as a directive or when not followed by a dimension an error
+message will show up.
+Sometimes the scanning is more restricted, like with glue
+where the optional \type {plus} and \type {minus} are to come in that order, but
+when missing, again a word from the text can be picked up if one doesn't
+explicitly end with a \type {\relax} or some other irrelevant token.
+
+\starttyping
+\scratchskip = 10pt plus 10pt minus 10pt % okay
+\scratchskip = 10pt plus 10pt % okay
+\scratchskip = 10pt minus 10pt % okay
+\scratchskip = 10pt minus 10pt plus 10pt % typesets "plus 10pt"
+\scratchskip = 10pt plus whatever % an error
+\stoptyping
+
+The scanner is case insensitive, so the following specifications are all valid:
+
+\starttyping
+\hbox To 10cm {To}
+\hbox TO 10cm {TO}
+\hbox tO 10cm {tO}
+\hbox to 10cm {to}
+\stoptyping
+
+It happens that keywords are always simple English words so the engine uses a
+cheap check deep down, just offsetting to uppercase, but of course that will not
+work for arbitrary \UTF\ (as used in \LUATEX) and it's also unrelated to the
+upper- and lowercase codes as \TEX\ knows them.
+
+The above lines scan for the keyword \type {to} and after that for a dimension.
+Where keyword scanning is case tolerant, dimension scanning is period tolerant:
+
+\starttyping
+\hbox to 10cm {10cm}
+\hbox to 10.0cm {10.0cm}
+\hbox to .0cm {.0cm}
+\hbox to .cm {.cm}
+\hbox to 10.cm {10.cm}
+\stoptyping
+
+These are all valid and according to the specification; even the single period one
+is okay, although it looks funny. It would not be hard to intercept that but I
+guess that when \TEX\ was written anything that could harm performance was taken
+into account and the above is quite okay. One can even argue for cases like:
+
+\starttyping
+\hbox to \first.\second cm {.cm}
+\stoptyping
+
+Here \type {\first} and|/|or \type {\second} can be empty. Most users won't
+notice these side effects of scanning numbers anyway.
+
+The reason for even spending words on keywords is the following.
+Optional keyword
+scanning is kind of costly, not so much now, but more so decades ago. For
+instance, in the first line below, there is no keyword. The scanner sees a \type
+{1} and, as that is not a keyword, pushes that character back into the input.
+
+\starttyping
+\advance\scratchcounter 10
+\advance\scratchcounter by 10
+\stoptyping
+
+In the case of:
+
+\starttyping
+\scratchskip 10pt plux
+\stoptyping
+
+it has to push back the four scanned tokens \type {plux}. Now, in the engine
+there are lots of cases where lookahead happens and when a condition is not
+satisfied, the just read token is pushed back. Incidentally, when picking up the
+next token triggered some expansion, it's not the original next token that gets
+pushed back, but the first token seen after expansion. Pushing back tokens is
+not that inefficient, although it involves allocating a token and pushing and
+popping input stacks (we're talking of a mix of reading from file, token memory,
+\LUA\ prints, etc) but it always takes a little time and memory. In \LUATEX\
+there are more keywords for boxes, and there we have loops too: in a box
+specification one or more optional attributes are scanned before the optional
+\type {to} or \type {spread}, so again there can be push back when no more \type
+{attr} are seen.
+
+\starttyping
+\hbox attr 1 98 attr 2 99 to 1cm{...}
+\stoptyping
+
+In \LUAMETATEX\ there is even more optional keyword scanning, but we leave that
+for now and just show one example:
+
+\starttyping
+\hbox spread 10em {\hss
+    \hbox orientation 0 yoffset 1mm to 2em {up}\hss
+    \hbox to 2em {here}\hss
+    \hbox orientation 0 xoffset -1mm to 2em {down}\hss
+}
+\stoptyping
+
+Although one cannot mess too much with these low level scanners there was room for
+some optimization so the penalty we pay for more keyword scanning in \LUAMETATEX\
+is not that high. In fact, I often manage to compensate for adding features that
+have a possible performance hit with some gain elsewhere.
+
+Anyway, it will be no surprise that there can be interesting side effects to
+keyword scanning. For instance, using the two character keyword \type {by} in an
+advance can be more efficient because nothing needs to be pushed back. The same is
+true for the sometimes optional equal:
+
+\starttyping
+\scratchskip = 10pt
+\stoptyping
+
+Similar impacts on efficiency can be found in the way the end of a number is
+seen, basically anything not resolving to a number (or digit).
+
+\starttyping
+\scratchcounter 10% space not seen, ends \cs
+\scratchcounter =10% no push back of optional =
+\scratchcounter = 10% extra optional space gobble
+\scratchcounter = 10 % efficient ending of number scanning
+\scratchcounter = 10\relax % depending on engine less efficient
+\stoptyping
+
+In the above examples scanning the number involves: skipping over spaces,
+checking for an optional equal, skipping over spaces, scanning for a sign,
+checking for an optional octal or hexadecimal trigger (single or double quote),
+scanning the number till a non digit is seen. In the case of dimensions there is
+fraction scanning as well as unit scanning.
+
+In any case, the equal is optional and kind of a keyword. Having an \type {equal}
+can be more efficient than not having one, again due to push back in case of no
+equal being seen. In the process spaces have been skipped, so add to the overhead
+the scanning for optional spaces. In \LUAMETATEX\ all that has been optimized a
+bit. By the way, in dimension scanning \type {pt} is actually a keyword and as
+there are several units possible quite some push back can happen there, but
+we scan for the most likely candidates first.
+
+All that said, we're now ready for a surprise. The keyword scanner gets a string
+that it will test for, say \type {to} in case of a box specification. It then
+will fetch tokens from whatever provides the input. A token encodes a so called
+command and a character and can be related to a control sequence.
+For instance,
+the character \type {t} becomes a letter command with related value \number`t.
+So, we have three properties: the command code, the character code and the
+control sequence code. Now, instead of checking if the command code is a letter
+or other character (two checks) a fast check happens for the control sequence
+code being zero. If that is the case, the character code is compared. In practice
+that works out well because the characters that make up a keyword are in the
+range \number"41\ up to \number"5A\ and \number"61\ up to \number"7A, and all other
+character codes are either below that (the ones that relate to primitives where
+the character code is actually a sub command of a limited range) or much larger
+numbers that for instance indicate an entry in some array, where the first useful
+index is above the mentioned ranges.
+
+The surprise is in the fact that there is no checking for letters or other
+characters, so this is why the next code will work too: \footnote {No longer in
+\LUAMETATEX\ where we do a somewhat more robust check.}
+
+\starttyping
+\catcode `O= 1 \hbox tO 10cm {...} % { begingroup
+\catcode `O= 2 \hbox tO 10cm {...} % } endgroup
+\catcode `O= 3 \hbox tO 10cm {...} % $ mathshift
+\catcode `O= 4 \hbox tO 10cm {...} % & alignment
+\catcode `O= 6 \hbox tO 10cm {...} % # parameter
+\catcode `O= 7 \hbox tO 10cm {...} % ^ superscript
+\catcode `O= 8 \hbox tO 10cm {...} % _ subscript
+\catcode `O=11 \hbox tO 10cm {...} % letter
+\catcode `O=12 \hbox tO 10cm {...} % other
+\stoptyping
+
+In the first line, if we would change the catcode of \type {T} and use that
+one it would kind of fail because then \TEX\ sees a begin group character and
+starts the group, but as a second character in a keyword it's okay because \TEX\
+will not look at the category code.
+
+Of course only the cases \type {11} and \type {12} make sense because one can
+imagine that messing with the category codes of regular letters this way will
+definitely give problems with processing the text. In a case like:
+
+\starttyping
+{\catcode `o=3 \hbox to 10cm {oeps}} % $ mathshift {oeps}
+{\catcode `O=3 \hbox to 10cm {Oeps}} % $ mathshift {$eps}
+\stoptyping
+
+we have several issues: the primitive control sequence \type {\hbox} has an \type
+{o} so \TEX\ will stop after \type {\hb} which can be undefined or a valid macro
+and what happens next is hard to predict. Going uppercase will work but then the
+content of the box is bad because there the \type {O} enters math.
+
+\starttyping
+{\catcode `O=3 \hbox tO 10cm {Oeps Oeps}} % {$eps $eps}
+\stoptyping
+
+This will work because there are now two \type {O} in the box so we have balanced
+inline math triggers. But how does one explain that to a user, who probably
+doesn't understand where an error message comes from in the first place? Anyway,
+this kind of tolerance is still not pretty so in \LUAMETATEX\ we now check for
+the command code and stick to letters and other characters. On today's machines
+(and even on my by now ancient workhorse) the performance hit can be neglected.
+Actually, by intercepting the weird cases we also avoid an unnecessary case check
+when we fall through the zero cs test. Of course that also means that the above
+mentioned category code trickery doesn't work any more: only letters and other
+characters are now valid in keyword scanning. Now, it can be that some macro
+programmer actually used those side effects but apart from some macro hacker
+being hurt because mastering those details can no longer be shown off, it is
+users that we care more for, don't we?
+
+Now, don't get me wrong: the above discussion of the performance of keyword and
+equal scanning is not that relevant in practice.
+But for the record, here are some
+timings on a laptop with an i7-3849QM processor using \MINGW\ binaries on a 64 bit
+\MSWINDOWS\ 10. The times are the averages of five times a million such
+assignments and advancements:
+
+\starttabulate[|l|c|c|c|]
+\FL
+\NC one million times                    \NC terminal       \NC \LUAMETATEX\ \NC \LUATEX \NC \NR
+\ML
+\NC \type {\advance\scratchcounter 1}    \NC space          \NC 0.068        \NC 0.085   \NC \NR
+\NC \type {\advance\scratchcounter 1}    \NC \type {\relax} \NC 0.135        \NC 0.149   \NC \NR
+\NC \type {\advance\scratchcounter by 1} \NC space          \NC 0.087        \NC 0.099   \NC \NR
+\NC \type {\advance\scratchcounter by 1} \NC \type {\relax} \NC 0.155        \NC 0.161   \NC \NR
+\NC \type {\scratchcounter 1}            \NC space          \NC 0.057        \NC 0.096   \NC \NR
+\NC \type {\scratchcounter 1}            \NC \type {\relax} \NC 0.125        \NC 0.151   \NC \NR
+\NC \type {\scratchcounter=1}            \NC space          \NC 0.063        \NC 0.080   \NC \NR
+\NC \type {\scratchcounter=1}            \NC \type {\relax} \NC 0.131        \NC 0.138   \NC \NR
+\LL
+\stoptabulate
+
+We differentiate between using a space as terminal or a \type {\relax}. The latter
+is a bit less efficient because more code is involved in resolving the meaning of
+that control sequence (which eventually boils down to nothing) but nevertheless,
+these are not timings that one can lose sleep over, especially when the rest of
+a decent \TEX\ run is taken into account. And yes, \LUAMETATEX\ is a bit faster
+here than \LUATEX, but I would be disappointed if that weren't the case.
+
+% luametatex:
+
+% \luaexpr{(0.068+0.070+0.069+0.067+0.068)/5} 0.068\crlf
+% \luaexpr{(0.137+0.132+0.136+0.137+0.134)/5} 0.135\crlf
+% \luaexpr{(0.085+0.088+0.084+0.089+0.087)/5} 0.087\crlf
+% \luaexpr{(0.145+0.160+0.158+0.156+0.154)/5} 0.155\crlf
+% \luaexpr{(0.060+0.055+0.059+0.055+0.056)/5} 0.057\crlf
+% \luaexpr{(0.118+0.127+0.128+0.122+0.130)/5} 0.125\crlf
+% \luaexpr{(0.063+0.062+0.067+0.061+0.063)/5} 0.063\crlf
+% \luaexpr{(0.127+0.128+0.133+0.128+0.140)/5} 0.131\crlf
+
+% luatex:
+
+% \luaexpr{(0.087+0.090+0.083+0.081+0.086)/5} 0.085\crlf
+% \luaexpr{(0.150+0.151+0.146+0.154+0.145)/5} 0.149\crlf
+% \luaexpr{(0.100+0.092+0.113+0.094+0.098)/5} 0.099\crlf
+% \luaexpr{(0.162+0.165+0.161+0.160+0.157)/5} 0.161\crlf
+% \luaexpr{(0.093+0.101+0.086+0.100+0.098)/5} 0.096\crlf
+% \luaexpr{(0.147+0.151+0.160+0.144+0.151)/5} 0.151\crlf
+% \luaexpr{(0.076+0.085+0.088+0.073+0.078)/5} 0.080\crlf
+% \luaexpr{(0.136+0.138+0.142+0.135+0.140)/5} 0.138\crlf
+
+
+\stopchapter
+
+\stopcomponent
--
cgit v1.2.3