path: root/doc/context/sources/general/manuals/evenmore/evenmore-keywords.tex
Diffstat (limited to 'doc/context/sources/general/manuals/evenmore/evenmore-keywords.tex')
-rw-r--r-- doc/context/sources/general/manuals/evenmore/evenmore-keywords.tex | 288
1 files changed, 288 insertions, 0 deletions
diff --git a/doc/context/sources/general/manuals/evenmore/evenmore-keywords.tex b/doc/context/sources/general/manuals/evenmore/evenmore-keywords.tex
new file mode 100644
index 000000000..372e54781
--- /dev/null
+++ b/doc/context/sources/general/manuals/evenmore/evenmore-keywords.tex
@@ -0,0 +1,288 @@
+% language=us
+
+\environment evenmore-style
+
+\startcomponent evenmore-keywords
+
+\startchapter[title=Keywords]
+
+Some primitives in \TEX\ can take one or more optional keywords and|/|or keywords
+followed by one or more values. In traditional \TEX\ it concerns a handful of
+primitives, in \PDFTEX\ there are plenty of backend related primitives, \LUATEX\
+introduced optional keywords to some math constructs and attributes to boxes,
+while \LUAMETATEX\ adds some more too. The keyword scanner in \TEX\ is kind of
+special. Keywords are used in cases like:
+
+\starttyping
+\hbox spread 10cm {...}
+\advance\scratchcounter by 10
+\vrule width 3cm height 1ex
+\stoptyping
+
+Sometimes there are multiple keywords, as with rules, in which case you can
+imagine use cases like:
+
+\starttyping
+\vrule width 3cm depth 1ex width 10cm depth 0ex height 1ex\relax
+\stoptyping
+
+Here we add a \type {\relax} to end the scanning. If we don't do that and the
+rule specification is followed by arbitrary (read: unpredictable) text, the next
+word can just as well be a valid keyword. When it is followed by a dimension
+(unlikely) \TEX\ will happily take that as a directive, and when it is not
+followed by a dimension an error message will show up. Sometimes the scanning is
+more restricted, like with glue where the optional \type {plus} and \type {minus}
+have to come in that order, but when they are missing, again a word from the text
+can be picked up if one doesn't explicitly end with a \type {\relax} or some
+other token that cannot start a keyword.
+
+\starttyping
+\scratchskip = 10pt plus 10pt minus 10pt % okay
+\scratchskip = 10pt plus 10pt % okay
+\scratchskip = 10pt minus 10pt % okay
+\scratchskip = 10pt minus 10pt plus 10pt % typesets "plus 10pt"
+\scratchskip = 10pt plus whatever % an error
+\stoptyping
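+
+The same pitfall with rules can be sketched as follows. The trailing text is made
+up for the occasion and the exact error messages depend on the engine:
+
+\starttyping
+\vrule width 3cm height 1ex depth charts follow       % "depth" is a keyword: error
+\vrule width 3cm height 1ex\relax depth charts follow % okay, \relax ends the scan
+\stoptyping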
+
+The scanner is case insensitive, so the following specifications are all valid:
+
+\starttyping
+\hbox To 10cm {To}
+\hbox TO 10cm {TO}
+\hbox tO 10cm {tO}
+\hbox to 10cm {to}
+\stoptyping
+
+It happens that keywords are always simple English words so the engine uses a
+cheap check deep down, just offsetting to uppercase, but of course that will not
+work for arbitrary \UTF\ (as used in \LUATEX) and it's also unrelated to the
+upper- and lowercase codes as \TEX\ knows them.
+
+The above lines scan for the keyword \type {to} and after that for a dimension.
+Where keyword scanning is case tolerant, dimension scanning is period tolerant:
+
+\starttyping
+\hbox to 10cm {10cm}
+\hbox to 10.0cm {10.0cm}
+\hbox to .0cm {.0cm}
+\hbox to .cm {.cm}
+\hbox to 10.cm {10.cm}
+\stoptyping
+
+These are all valid and according to the specification; even the single period one
+is okay, although it looks funny. It would not be hard to intercept that but I
+guess that when \TEX\ was written anything that could harm performance was taken
+into account and the above is quite okay. One can even argue for cases like:
+
+\starttyping
+\hbox to \first.\second cm {.cm}
+\stoptyping
+
+Here \type {\first} and|/|or \type {\second} can be empty. Most users won't
+notice these side effects of scanning numbers anyway.
+
+The reason for even spending words on keywords is the following. Optional keyword
+scanning is kind of costly, not so much now, but more so decades ago. For
+instance, in the first line below, there is no keyword. The scanner sees a \type
+{1} and, as that is not the start of the keyword, pushes that character back into
+the input.
+
+\starttyping
+\advance\scratchcounter 10
+\advance\scratchcounter by 10
+\stoptyping
+
+In the case of:
+
+\starttyping
+\scratchskip 10pt plux
+\stoptyping
+
+It has to push back the four scanned tokens \type {plux}. Now, in the engine
+there are lots of cases where lookahead happens and when a condition is not
+satisfied, the just read token is pushed back. Incidentally, when picking up the
+next token triggers some expansion, it's not the original next token that gets
+pushed back, but the first token that results from the expansion. Pushing back
+tokens is not that inefficient, although it involves allocating a token and
+pushing and popping input stacks (we're talking of a mix of reading from file,
+token memory, \LUA\ prints, etc.) but it always takes a little time and memory.
+In \LUATEX\
+there are more keywords for boxes, and there we have loops too: in a box
+specification one or more optional attributes are scanned before the optional
+\type {to} or \type {spread}, so again there can be push back when no more \type
+{attr} are seen.
+
+\starttyping
+\hbox attr 1 98 attr 2 99 to 1cm{...}
+\stoptyping
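+
+The earlier remark about expansion during lookahead can be illustrated with a
+small sketch, where \type {\amount} is a macro made up for this example:
+
+\starttyping
+\def\amount{10}
+\advance\scratchcounter \amount    % looking for "by" expands \amount, the 1 is pushed back
+\advance\scratchcounter by \amount % "by" is found, the number scanner expands \amount
+\stoptyping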
+
+In \LUAMETATEX\ there is even more optional keyword scanning, but we leave that
+for now and just show one example:
+
+\starttyping
+\hbox spread 10em {\hss
+ \hbox orientation 0 yoffset 1mm to 2em {up}\hss
+ \hbox to 2em {here}\hss
+ \hbox orientation 0 xoffset -1mm to 2em {down}\hss
+}
+\stoptyping
+
+Although one cannot mess too much with these low level scanners there was room for
+some optimization so the penalty we pay for more keyword scanning in \LUAMETATEX\
+is not that high. In fact, I often manage to compensate for adding features that
+have a possible performance hit with some gain elsewhere.
+
+Anyway, it will be no surprise that there can be interesting side effects to
+keyword scanning. For instance, using the two character keyword \type {by} in an
+advance can be more efficient because nothing needs to be pushed back. The same is
+true for the sometimes optional equal:
+
+\starttyping
+\scratchskip = 10pt
+\stoptyping
+
+Similar impacts on efficiency can be found in the way the end of a number is
+seen, basically anything not resolving to a number (or digit).
+
+\starttyping
+\scratchcounter 10% space not seen, ends \cs
+\scratchcounter =10% no push back of optional =
+\scratchcounter = 10% extra optional space gobble
+\scratchcounter = 10 % efficient ending of number scanning
+\scratchcounter = 10\relax % depending on engine less efficient
+\stoptyping
+
+In the above examples scanning the number involves: skipping over spaces,
+checking for an optional equal, skipping over spaces, scanning for a sign,
+checking for an optional octal or hexadecimal trigger (single or double quote),
+scanning the number till a non digit is seen. In the case of dimensions there is
+fraction scanning as well as unit scanning too.
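+
+As a sketch of these steps, assuming the usual category codes for the quotes, the
+following assignments all store the value ten:
+
+\starttyping
+\scratchcounter = 10  % plain decimal digits
+\scratchcounter = +10 % an optional sign is scanned first
+\scratchcounter = '12 % single quote: octal
+\scratchcounter = "A  % double quote: hexadecimal
+\stoptyping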
+
+In any case, the equal is optional and kind of a keyword. Having an \type {=}
+can be more efficient than not having one, again due to push back in case of no
+equal being seen. In the process spaces have been skipped, so add to the overhead
+the scanning for optional spaces. In \LUAMETATEX\ all that has been optimized a
+bit. By the way, in dimension scanning \type {pt} is actually a keyword and as
+there are several units possible quite some push back can happen there, but
+we scan for the most likely candidates first.
+
+All that said, we're now ready for a surprise. The keyword scanner gets a string
+that it will test for, say \type {to} in case of a box specification. It then
+will fetch tokens from whatever provides the input. A token encodes a so called
+command and a character and can be related to a control sequence. For instance,
+the character \type {t} becomes a letter command with related value \number`t.
+So, we have three properties: the command code, the character code and the
+control sequence code. Now, instead of checking if the command code is a letter
+or other character (two checks) a fast check happens for the control sequence
+code being zero. If that is the case, the character code is compared. In practice
+that works out well because the characters that make up a keyword are in the
+range \number"41\ up to \number"5A\ and \number"61\ up to \number"7A, and all other
+character codes are either below that (the ones that relate to primitives where
+the character code is actually a sub command of a limited range) or much larger
+numbers that for instance indicate an entry in some array, where the first useful
+index is above the mentioned ranges.
+
+The surprise is in the fact that there is no checking for letters or other
+characters, so this is why the next code will work too: \footnote {No longer in
+\LUAMETATEX\ where we do a bit more robust check.}
+
+\starttyping
+\catcode `O= 1 \hbox tO 10cm {...} % { begingroup
+\catcode `O= 2 \hbox tO 10cm {...} % } endgroup
+\catcode `O= 3 \hbox tO 10cm {...} % $ mathshift
+\catcode `O= 4 \hbox tO 10cm {...} % & alignment
+\catcode `O= 6 \hbox tO 10cm {...} % # parameter
+\catcode `O= 7 \hbox tO 10cm {...} % ^ superscript
+\catcode `O= 8 \hbox tO 10cm {...} % _ subscript
+\catcode `O=11 \hbox tO 10cm {...} % letter
+\catcode `O=12 \hbox tO 10cm {...} % other
+\stoptyping
+
+In the first line, if we changed the catcode of \type {T} and used that one, it
+would kind of fail because then \TEX\ sees a begin group character and starts a
+group, but as a second character in a keyword it's okay because \TEX\ will not
+look at the category code.
+
+Of course only the cases \type {11} and \type {12} make sense because one can
+imagine that messing with the category codes of regular letters this way will
+definitely give problems with processing the text. In a case like:
+
+\starttyping
+{\catcode `o=3 \hbox to 10cm {oeps}} % $ mathshift {oeps}
+{\catcode `O=3 \hbox to 10cm {Oeps}} % $ mathshift {$eps}
+\stoptyping
+
+we have several issues: the primitive control sequence \type {\hbox} has an \type
+{o} so \TEX\ will stop after \type {\hb} which can be undefined or a valid macro
+and what happens next is hard to predict. Going uppercase will work but then the
+content of the box is bad because there the \type {O} enters math.
+
+\starttyping
+{\catcode `O=3 \hbox tO 10cm {Oeps Oeps}} % {$eps $eps}
+\stoptyping
+
+This will work because there are now two \type {O} in the box so we have balanced
+inline math triggers. But how does one explain that to a user, who probably
+doesn't understand where an error message comes from in the first place? Anyway,
+this kind of tolerance is still not pretty so in \LUAMETATEX\ we now check for
+the command code and stick to letters and other characters. On today's machines
+(and even on my by now ancient workhorse) the performance hit can be neglected.
+Actually, by intercepting the weird cases we also avoid an unnecessary case check
+when we fall through the zero cs test. Of course that also means that the above
+mentioned category code trickery doesn't work any more: only letters and other
+characters are now valid in keyword scanning. Now, it can be that some macro
+programmer actually used those side effects, but apart from some macro hacker
+being hurt because mastery of those details can no longer be shown off, it is
+users that we care more for, don't we?
+
+Now don't get me wrong, the above mentioning of performance of keyword and equal
+scanning is not that relevant in practice. But for the record, here are some
+timings on a laptop with an i7-3849QM processor using \MINGW\ binaries on a 64 bit
+\MSWINDOWS\ 10. The times are the averages of five runs of a million such
+assignments and advancements:
+
+\starttabulate[|l|c|c|c|]
+\FL
+\NC one million times \NC terminal \NC \LUAMETATEX\ \NC \LUATEX \NC \NR
+\ML
+\NC \type {\advance\scratchcounter 1} \NC space \NC 0.068 \NC 0.085 \NC \NR
+\NC \type {\advance\scratchcounter 1} \NC \type {\relax} \NC 0.135 \NC 0.149 \NC \NR
+\NC \type {\advance\scratchcounter by 1} \NC space \NC 0.087 \NC 0.099 \NC \NR
+\NC \type {\advance\scratchcounter by 1} \NC \type {\relax} \NC 0.155 \NC 0.161 \NC \NR
+\NC \type {\scratchcounter 1} \NC space \NC 0.057 \NC 0.096 \NC \NR
+\NC \type {\scratchcounter 1} \NC \type {\relax} \NC 0.125 \NC 0.151 \NC \NR
+\NC \type {\scratchcounter=1} \NC space \NC 0.063 \NC 0.080 \NC \NR
+\NC \type {\scratchcounter=1} \NC \type {\relax} \NC 0.131 \NC 0.138 \NC \NR
+\LL
+\stoptabulate
+
+We differentiate between using a space as terminal or a \type {\relax}. The latter
+is a bit less efficient because more code is involved in resolving the meaning of
+that control sequence (which eventually boils down to nothing) but nevertheless,
+these are not timings that one can lose sleep over, especially when the rest of
+a decent \TEX\ run is taken into account. And yes, \LUAMETATEX\ is a bit faster
+here than \LUATEX, but I would be disappointed if that weren't the case.
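+
+For those who want to get an impression on their own machine, here is a minimal
+sketch. It assumes that the \CONTEXT\ helpers \type {\testfeatureonce} and \type
+{\elapsedtime} are available, and it is not necessarily how the numbers above
+were collected:
+
+\starttyping
+\testfeatureonce{1000000}{\advance\scratchcounter 1 }      \elapsedtime
+\testfeatureonce{1000000}{\advance\scratchcounter 1\relax} \elapsedtime
+\stoptyping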
+
+% luametatex:
+
+% \luaexpr{(0.068+0.070+0.069+0.067+0.068)/5} 0.068\crlf
+% \luaexpr{(0.137+0.132+0.136+0.137+0.134)/5} 0.135\crlf
+% \luaexpr{(0.085+0.088+0.084+0.089+0.087)/5} 0.087\crlf
+% \luaexpr{(0.145+0.160+0.158+0.156+0.154)/5} 0.155\crlf
+% \luaexpr{(0.060+0.055+0.059+0.055+0.056)/5} 0.057\crlf
+% \luaexpr{(0.118+0.127+0.128+0.122+0.130)/5} 0.125\crlf
+% \luaexpr{(0.063+0.062+0.067+0.061+0.063)/5} 0.063\crlf
+% \luaexpr{(0.127+0.128+0.133+0.128+0.140)/5} 0.131\crlf
+
+% luatex:
+
+% \luaexpr{(0.087+0.090+0.083+0.081+0.086)/5} 0.085\crlf
+% \luaexpr{(0.150+0.151+0.146+0.154+0.145)/5} 0.149\crlf
+% \luaexpr{(0.100+0.092+0.113+0.094+0.098)/5} 0.099\crlf
+% \luaexpr{(0.162+0.165+0.161+0.160+0.157)/5} 0.161\crlf
+% \luaexpr{(0.093+0.101+0.086+0.100+0.098)/5} 0.096\crlf
+% \luaexpr{(0.147+0.151+0.160+0.144+0.151)/5} 0.151\crlf
+% \luaexpr{(0.076+0.085+0.088+0.073+0.078)/5} 0.080\crlf
+% \luaexpr{(0.136+0.138+0.142+0.135+0.140)/5} 0.138\crlf
+
+
+\stopchapter
+
+\stopcomponent