Diffstat (limited to 'doc')
-rw-r--r--  doc/context/documents/general/manuals/evenmore.pdf                 bin  1509228 -> 1607905 bytes
-rw-r--r--  doc/context/documents/general/manuals/luametatex.pdf               bin  1229180 -> 1229175 bytes
-rw-r--r--  doc/context/sources/general/manuals/evenmore/evenmore-parsing.tex  425
-rw-r--r--  doc/context/sources/general/manuals/evenmore/evenmore-tokens.tex   460
-rw-r--r--  doc/context/sources/general/manuals/evenmore/evenmore.tex          5
-rw-r--r--  doc/context/sources/general/manuals/luametatex/luametatex-tex.tex  13
-rw-r--r--  doc/context/sources/general/manuals/luametatex/luametatex.tex      7
7 files changed, 901 insertions, 9 deletions
diff --git a/doc/context/documents/general/manuals/evenmore.pdf b/doc/context/documents/general/manuals/evenmore.pdf
index 6e5b1105e..0f7da622e 100644
--- a/doc/context/documents/general/manuals/evenmore.pdf
+++ b/doc/context/documents/general/manuals/evenmore.pdf
Binary files differ
diff --git a/doc/context/documents/general/manuals/luametatex.pdf b/doc/context/documents/general/manuals/luametatex.pdf
index 93dcc0868..f3a25d6b3 100644
--- a/doc/context/documents/general/manuals/luametatex.pdf
+++ b/doc/context/documents/general/manuals/luametatex.pdf
Binary files differ
diff --git a/doc/context/sources/general/manuals/evenmore/evenmore-parsing.tex b/doc/context/sources/general/manuals/evenmore/evenmore-parsing.tex
new file mode 100644
index 000000000..d0cc29db1
--- /dev/null
+++ b/doc/context/sources/general/manuals/evenmore/evenmore-parsing.tex
@@ -0,0 +1,425 @@
+% language=us
+
+\environment evenmore-style
+
+\startcomponent evenmore-parsing
+
+\startchapter[title=Parsing]
+
+The macro mechanism in \TEX\ is quite powerful and once you understand the
+concept of mixing parameters and delimiters you can do a lot with it. I assume
+that you know what we're talking about, otherwise quit reading. When grabbing
+arguments, there are a few catches.
+
+\startitemize
+\startitem
+ When they are used, delimiters are mandatory: \TEX\ will go on reading an
+ argument till the (current) delimiter condition is met. This means that when
+ you forget one you end up with way more in the argument than expected or even
+ run out of input.
+\stopitem
+\startitem
+ Because specified arguments and delimiters are mandatory, when you want to
+ parse input, you often need multi|-|step macros that first pick up the input
+ to be parsed, and then piecewise fetch snippets. Bogus delimiters have to be
+ appended to the original in order to catch a runaway argument and checking
+ has to be done to get rid of them when all is ok.
+\stopitem
+\stopitemize
+
+The first item can be illustrated as follows:
+
+\starttyping[option=TEX]
+\def\foo[#1]{...}
+\stoptyping
+
+When \type {\foo} gets expanded \TEX\ first looks for a \type{[} and then starts
+collecting tokens for parameter \type {#1}. It stops doing that when a \type {]}
+is seen. So,
+
+\starttyping[option=TEX]
+\starttext
+ \foo[whatever
+\stoptext
+\stoptyping
+
+will for sure give an error. When collecting tokens, \TEX\ doesn't expand them so
+the \type {\stoptext} is just turned into a token that gets appended.
+
+The second item is harder to explain (or grasp):
+
+\starttyping[option=TEX]
+\def\foo[#1=#2]{(#1/#2)}
+\stoptyping
+
+Here we expect a key and a value, so these will work:
+
+\starttyping[option=TEX]
+\foo[key=value]
+\foo[key=]
+\stoptyping
+
+while these will fail:
+
+\starttyping[option=TEX]
+\foo[key]
+\foo[]
+\stoptyping
+
+unless we have:
+
+\starttyping[option=TEX]
+\foo[key]=]
+\foo[]=]
+\stoptyping
+
+But, when processing the result, we then need to analyze the found arguments and
+correct for them being wrong. For instance, argument \type {#1} can become \type
+{]} or here \type {key]}. When indeed a valid key|/|value combination is given we
+need to get rid of the two \quote {fixup} tokens \type{=]}. Normally we will have
+multiple key|/|value pairs separated by a comma, and in practice we only need to
+catch the missing equal because we can ignore empty cases. There are plenty of
+examples (rather old code but also more modern variants) in the \CONTEXT\
+code base.
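+
+To sketch that traditional approach (the macro names here are made up and the
+real code in \CONTEXT\ is more elaborate, for instance because it loops over
+multiple pairs):
+
+\starttyping[option=TEX]
+% grab the whole argument, append the bogus delimiters =] plus a stopper,
+% and then piecewise fetch the snippets:
+
+\def\grabkeyvalue[#1]%
+  {\dograbkeyvalue#1=]\relax}
+
+\def\dograbkeyvalue#1=#2]#3\relax
+  {\cleanupvalue#2=\relax{#1}}
+
+% when a real key=value pair is given, #2 ends up with the bogus = appended,
+% so we strip a trailing = before reporting the pair:
+
+\def\cleanupvalue#1=#2\relax#3%
+  {(#3/#1)}
+
+\grabkeyvalue[key=value] % gives (key/value)
+\grabkeyvalue[key=]      % gives (key/)
+\grabkeyvalue[key]       % gives (key/)
+\grabkeyvalue[]          % gives (/)
+\stoptyping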
+
+I will now show some new magic that is available in \LUAMETATEX\ as experimental
+code. It will be tested in \LMTX\ for a while and might evolve in the process.
+
+\startbuffer
+\def\foo#1=#2,{(#1/#2)}
+
+\foo 1=2,\ignorearguments
+\foo 1=2\ignorearguments
+\foo 1\ignorearguments
+\foo \ignorearguments
+\stopbuffer
+
+\typebuffer[option=TEX]
+
+Here we pick up a key and value separated by an equal sign. We end the input with
+a special signal command: \type {\ignorearguments}. This tells the parser to quit
+scanning. So, we get this, without any warning with respect to a missing
+delimiter or running away:
+
+\getbuffer
+
+The implementation is actually fairly simple and adds not much overhead.
+Alternatives (and I pondered a few) are just too messy, would remind me too much
+of those awful expression syntaxes, and definitely impact performance of macro
+expansion, therefore: a no|-|go.
+
+Using this new feature, we can implement a key value parser that does a sequence.
+The prototypes used to get here made only use of this one new feature and
+therefore still had to do some testing of the results. But, after looking at the
+code, I decided that a few more helpers could make better looking code. So this
+is what I ended up with:
+
+\startbuffer
+\def\grabparameter#1=#2,%
+ {\ifarguments\or\or
+ % (\whatever/#1/#2)\par%
+ \expandafter\def\csname\namespace#1\endcsname{#2}%
+ \expandafter\grabnextparameter
+ \fi}
+
+\def\grabnextparameter
+ {\expandafterspaces\grabparameter}
+
+\def\grabparameters[#1]#2[#3]%
+ {\def\namespace{#1}%
+ \expandafterspaces\grabparameter#3\ignorearguments\ignorearguments}
+\stopbuffer
+
+\typebuffer[option=TEX]
+
+Now, this one actually does what the \CONTEXT\ \type {\getparameters} command
+does: setting variables in a namespace. Being a parameter driven macro package
+this kind of macro has been part of \CONTEXT\ since the beginning. There are
+some variants and we also need to deal with the multilingual interface. Actually,
+\MKIV\ (and therefore \LMTX) do things a bit differently, but the same principles
+apply.
+
+The \type {\ignorearguments} quits the scanning. Here we need two because we
+actually quit twice. The \type {\expandafterspaces} can be implemented in
+traditional \TEX\ macros but I thought it was nice to have it this way; the fact
+that I only now added it has more to do with cosmetics. One could use the already
+somewhat older extension \type {\futureexpandis} (which expands the second or
+third token depending on the first token seen, in this variant ignoring spaces) or a
+bunch of good old primitives to do the same. The new conditional \type
+{\ifarguments} can be used to act upon the number of arguments given. It reflects
+the most recently expanded macro. There is also a \type {\lastarguments}
+primitive that provides the number of arguments.
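+
+As a minimal illustration of \type {\ifarguments}, here is a variant of the
+grabber with the same branch numbering; the normal case is the one after the two
+\type {\or}s, and how truncated input ends up in the lower branches is left to
+the engine:
+
+\starttyping[option=TEX]
+\def\showpair#1=#2,%
+  {\ifarguments
+     (nothing)\or
+     (just #1)\or
+     (#1/#2)%
+   \fi}
+
+\showpair key=value,
+\showpair key\ignorearguments
+\showpair \ignorearguments
+\stoptyping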
+
+So, what are the benefits? You might think that it is about performance, but in
+practice there are not that many parameter settings going on. When I process the
+\LUAMETATEX\ manual, one or more parameters get set only some 5000 times. And
+even in a way more complex document that I asked my colleague to run I was a bit
+disappointed that only some 30,000 cases were reported. I know of users who have
+documents with hundreds of thousands of cases, but compared to the rest of
+processing this is not where the performance bottleneck is. \footnote {Think of
+thousands of pages of tables with cell settings applied.} This means that a
+change in implementation like the above is not paying off in significantly better
+runtime: all these low level mechanisms in \CONTEXT\ have been very well
+optimized over the years. And faster machines made old bottlenecks go away
+anyway. Take this use case:
+
+\starttyping[option=TEX]
+\grabparameters
+ [foo]
+ [key0=value0,
+ key1=value1,
+ key2=value2,
+ key3=value3]
+\stoptyping
+
+After this, parameters can be accessed with:
+
+\starttyping[option=TEX]
+\def\getvalue#1#2{\csname#1#2\endcsname}
+\stoptyping
+
+used as:
+
+\starttyping[option=TEX]
+\getvalue{foo}{key2}
+\stoptyping
+
+which takes care of characters normally not permitted in macro names, like the
+digits in this example. Of course some namespace protection can be added, like
+adding a colon between the namespace and the key, but let's take just this one.
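+
+For the record, such a colon variant could look like this (the names are made up,
+so these are not the actual \CONTEXT\ helpers):
+
+\starttyping[option=TEX]
+\def\setmyvalue#1#2{\expandafter\def\csname#1:#2\endcsname}
+\def\getmyvalue#1#2{\csname#1:#2\endcsname}
+
+\setmyvalue{foo}{key2}{value2}
+
+\getmyvalue{foo}{key2} % gives value2
+\stoptyping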
+
+Some 10,000 expansions of the grabber take 0.045 seconds on my machine while the
+original \type {\getparameters} takes 0.090 seconds, so although for this case we're
+twice as fast, the 0.045 difference will not be noticed in a real run. After all, when
+these parameters are set some action will take place. Also, we don't actually use
+this macro for collecting settings with the \type {\setupsomething} commands, so
+the additional overhead that is involved adds a baseline to performance that can
+turn any gain into noise. But some users might notice some gain. Of course this
+observation might change once we apply this trickery in more places than
+parameter parsing, because I have to admit that there might be other places in
+the support macros where we can benefit: less code, double performance, but these
+are all support macros that made sense in \MKII\ and not that much in \MKIV\ or
+\LMTX\ and are kept just for convenience and backward compatibility. Think of
+some list processing macros. So, as a kind of nostalgic trip I decided to rewrite
+some low level macros anyway, if only to see what is no longer used and|/|or to
+make the code base somewhat (c)leaner.
+
+Elsewhere I introduce the \type {#0} argument indicator. That one just gobbles
+the argument and does not store a token list on the stack. It saves some
+memory access and token recycling when arguments are not used. Another special
+indicator is \type {#+}. That one will flag an argument to be passed as|-|is. The
+\type {#-} variant will simply discard an argument and move on. The following
+examples demonstrate this:
+
+\startbuffer
+\def\foo [#1]{\detokenize{#1}}
+\def\ofo [#0]{\detokenize{#1}}
+\def\oof [#+]{\detokenize{#1}}
+\def\fof[#1#-#2]{\detokenize{#1#2}}
+\def\fff[#1#0#3]{\detokenize{#1#3}}
+
+\meaning\foo\ : <\foo[{123}]> \crlf
+\meaning\ofo\ : <\ofo[{123}]> \crlf
+\meaning\oof\ : <\oof[{123}]> \crlf
+\meaning\fof\ : <\fof[123]> \crlf
+\meaning\fff\ : <\fff[123]> \crlf
+\stopbuffer
+
+\typebuffer[option=TEX]
+
+This gives:
+
+{\tttf \getbuffer}
+
+% \getcommalistsize[a,b,c] \commalistsize\par
+% \getcommalistsize[{a,b,c}] \commalistsize\par
+
+When playing with new features like the one described here, it makes sense to use
+them in existing macros so that they get well tested. Some of the low level
+system files come in different versions: for \MKII, \MKIV\ and \LMTX. The \MKII\
+files often also have the older implementations, so they are also good for
+looking at the history. The \LMTX\ files can be leaner and meaner than the \MKIV\
+files because they use the latest features. \footnote {Some 70 primitives present
+in \LUATEX\ are not in \LUAMETATEX. On the other hand there are also about 70 new
+primitives. Of those gone, most concerned the backend, fonts or no longer
+relevant features from other engines. Of those new, some are really new
+primitives (conditionals, expansion magic), some control previously hardwired
+behaviour, some give access to properties of for instance boxes, and some are
+just variants of existing ones but with options for control.}
+
+When I was rewriting some of these low level \MKIV\ macros using the newer features,
+at some point I wondered why I still had to jump through some hoops. Why not just
+add some more primitives to deal with that? After all, \LUATEX\ and \LUAMETATEX\
+already have more primitives that are helpful in parsing, so a few dozen more lines
+don't hurt. As long as these primitives are generic and not that specific. In this
+particular case we talk about two new conditionals (in addition to the already
+present comparison primitives):
+
+\starttyping[option=TEX]
+\ifhastok <token> {<token list>}
+\ifhastoks {<token list>} {<token list>}
+\ifhasxtoks {<token list>} {<token list>}
+\stoptyping
+
+You can probably guess what they do from their names. The last one is the
+expandable variant of the second one. The first one is the fast one. When playing
+with these I decided to redo the set checker. In \MKII\ that one is done in good
+old \TEX, in \MKIV\ we use \LUA. So, how about going back to \TEX ?
+
+\starttyping[option=TEX]
+\ifhasxtoks {cd} {abcdef}
+\stoptyping
+
+This check is true, but it doesn't work well with a comma separated list. There
+is a way out, though:
+
+\starttyping[option=TEX]
+\ifhasxtoks {,cd,} {,ab,cd,ef,}
+\stoptyping
+
+However, when I applied that, a user reported that it didn't handle optional
+spaces before commas. So how do we deal with such optional character tokens?
+
+\startbuffer
+\def\setcontains#1#2{\ifhasxtoks{,#1,}{,#2,}}
+
+\ifcondition\setcontains{cd}{ab,cd,ef}YES \else NO \fi
+\ifcondition\setcontains{cd}{ab, cd, ef}YES \else NO \fi
+\stopbuffer
+
+\typebuffer[option=TEX]
+
+We get:
+
+\getbuffer
+
+The \type {\ifcondition} is an old one. When nested in a condition it will be
+seen as an \type {\if...} by the fast skipping scanner, but when expanded it will
+go on and the macro that follows has to expand into a proper condition. That said, we
+can take care of the optional space by entering some new territory. Look at this:
+
+\startbuffer
+\def\setcontains#1#2{\ifhasxtoks{,\expandtoken 9 "20 #1,}{,#2,}}
+
+\ifcondition\setcontains{cd}{ab,cd,ef}YES \else NO \fi
+\ifcondition\setcontains{cd}{ab, cd, ef}YES \else NO \fi
+\stopbuffer
+
+\typebuffer[option=TEX]
+
+We get:
+
+\getbuffer
+
+So how does that work? The \type {\expandtoken} injects a space token with
+catcode~9 which means that it is in the to be ignored category. When a to be
+ignored token is seen, and the to be checked token is a character (letter, other,
+space or ignored) then the character code will be compared. When they match, we
+move on, otherwise we just skip over the ignored token (here the space).
+
+In the \CONTEXT\ code base there are already files that are specific for \MKIV\
+and \LMTX. The most visible difference is that we use the \type {\orelse}
+primitive to construct nicer test trees, and we also use some of the additional
+\type {\future...} and \type {\expandafter...} features. The extensions discussed
+here make for the most recent differences (we're talking end May 2020).
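+
+As a small sketch of the first one, with \type {\orelse} a test tree needs no
+nested \type {\fi}'s:
+
+\starttyping[option=TEX]
+\def\checkwidth#1%
+  {\ifdim#1<5pt          narrow
+   \orelse\ifdim#1<50pt  normal
+   \orelse\ifdim#1<500pt wide
+   \else                 huge
+   \fi}
+
+\checkwidth{12pt} % gives: normal
+\stoptyping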
+
+After implementing this trick I decided to look at the macro definition mechanism
+one more time and see if I could also use this there. Before I demonstrate
+another new feature, I will again show the argument extensions, this time with
+a fourth variant:
+
+\startbuffer[definitions]
+\def\TestA#1#2#3{{(#1)(#2)(#3)}}
+\def\TestB#1#0#3{(#1)(#2)(#3)}
+\def\TestC#1#+#3{(#1)(#2)(#3)}
+\def\TestD#1#-#2{(#1)(#2)}
+\stopbuffer
+
+\typebuffer[definitions][option=TEX] \getbuffer[definitions]
+
+The last one specifies a to be thrashed argument: \type {#-}. It goes further
+than the second one (\type {#0}) which still keeps a reference. This is why in
+this last case the third argument gets number \type {2}. The meanings of these
+four are:
+
+\startlines \tttf
+\meaning\TestA
+\meaning\TestB
+\meaning\TestC
+\meaning\TestD
+\stoplines
+
+There are some subtle differences between these variants, as you can see from
+the following examples:
+
+\startbuffer[usage]
+\TestA1{\red 2}3
+\TestB1{\red 2}3
+\TestC1{\red 2}3
+\TestD1{\red 2}3
+\stopbuffer
+
+\typebuffer[usage][option=TEX]
+
+Here you also see the side effect of keeping the braces. The zero argument (\type
+{#0}) is ignored, and the thrashed argument (\type {#-}) can't even be accessed.
+
+\startlines \tttf \getbuffer[usage] \stoplines
+
+In the next example we see two delimiters being used, a comma and a space, but
+they have catcode~9 which flags them as ignored. This is a signal for the parser
+that both the comma and the space can be skipped. The zero arguments are still on
+the parameter stack, but the thrashed ones result in a smaller stack, not that
+the latter matters much on today's machines.
+
+\startbuffer
+\normalexpanded {
+ \def\noexpand\foo
+ \expandtoken 9 "2C % comma
+ \expandtoken 9 "20 % space
+ #1=#2]%
+}{(#1)(#2)}
+\stopbuffer
+
+\typebuffer[option=TEX] \getbuffer
+
+This means that the next three expansions won't bark:
+
+\startbuffer
+\foo,key=value]
+\foo, key=value]
+\foo key=value]
+\stopbuffer
+
+\typebuffer[option=TEX]
+
+or expanded:
+
+\startlines \tttf \getbuffer \stoplines
+
+Now, why didn't I add these primitives long ago already? After all, I already
+added dozens of new primitives over the years. To quote Andrew Cuomo, what
+follows now are opinions, not facts.
+
+Decades ago, when \TEX\ showed up, there was no Internet. I remember that I got
+my first copy on floppy disks. Computers were slow and memory was limited. The
+\TEX book was the main resource and writing macros was a kind of art. One could
+not look up solutions, so trial and error was a valid way to go. Figuring out
+what was efficient in terms of memory consumption and runtime was often needed
+too. I remember meetings where one was not taken seriously when not talking in the
+right \quote {token}, \quote {node}, \quote {stomach} and \quote {mouth} speak.
+Suggesting extensions could end up in being told that there was no need because
+all could be done in macros, or even in arguments of the \quotation {who needs
+that} kind. I must admit that nowadays I wonder to what extent that was related
+to extensions taking away some of the craftsmanship and showing off. In a way it is no surprise
+that (even trivial to implement) extensions never surfaced. Of course then the
+question is: will extensions that once were considered not of importance be used
+today? We'll see.
+
+Let's end by saying that, as with other experiments, I might port some of the new
+features in \LUAMETATEX\ to \LUATEX, but only after they have become stable and
+have been tested in \LMTX\ for quite a while.
+
+\stopchapter
+
+\stopcomponent
diff --git a/doc/context/sources/general/manuals/evenmore/evenmore-tokens.tex b/doc/context/sources/general/manuals/evenmore/evenmore-tokens.tex
new file mode 100644
index 000000000..d653703a9
--- /dev/null
+++ b/doc/context/sources/general/manuals/evenmore/evenmore-tokens.tex
@@ -0,0 +1,460 @@
+% language=us
+
+% TODO: copy_node_list : return tail
+% TODO: grabnested
+
+\environment evenmore-style
+
+\startcomponent evenmore-tokens
+
+\startchapter[title=Tokens]
+
+\usemodule[article-basic,abbreviations-logos]
+
+\starttext
+
+{\em This is mostly a wrapup of some developments, and definitely not a tutorial.}
+
+Talking deep down \TEX\ is talking about tokens and nodes. Roughly speaking, from
+the perspective of the user, tokens are what goes in and stays in (as macro,
+token list or whatever) and nodes are what gets produced and eventually results in
+output. A character in the input becomes one token (before expansion) and a
+control sequence like \type {\foo} also is turned into a token. Tokens can be
+linked into lists. This actually means that in the engine we can talk of tokens
+in two ways: as the single item with properties that trigger actions, or as a
+compound item with that item and a pointer to the next token (called link). In \LUA\ speak
+token memory can be seen as:
+
+\starttyping
+fixmem = {
+ { info, link },
+ { info, link },
+ { info, link },
+ { info, link },
+ ....
+}
+\stoptyping
+
+Both are 32 bit integers. The \type {info} is a combination of a command code (an
+operator) and a so called chr code (operand) and these determine its behaviour.
+For instance the command code can indicate an integer register and the chr code
+then indicates the number of that register. So, like:
+
+\starttyping
+fixmem = {
+ { { cmd, chr}, index_into_fixmem },
+ { { cmd, chr}, index_into_fixmem },
+ { { cmd, chr}, index_into_fixmem },
+ { { cmd, chr}, index_into_fixmem },
+ ....
+}
+\stoptyping
+
+
+In the following line the characters that make up the three words are tokens
+(letters), and so are the space (spacer), the curly braces (begin- and endgroup
+tokens) and the bold face switch (which becomes one token which resolves to a
+token list of tokens that trigger actions, in this case switching to a bolder font).
+
+\starttyping
+foo {\bf bar} foo
+\stoptyping
+
+When \TEX\ reads a line of input, tokens are expanded immediately but a sequence
+can also become part of a macro body or token list. Here we have $3_{\type{foo}}
++ 1 + 1_{\type+{+} + 1_{\type{\bf}} + 3_{\type{bar}} + 1_{\type+}+} + 1 +
+3_{\type{foo}} = 14$ tokens.
+
+A control sequence normally starts with a backslash. Some are built in, these are
+called primitives, and others are defined by the macro package or the user. There
+is a lookup table that relates the tokenized control sequence to some action. For
+instance:
+
+\starttyping
+\def\foo{foo}
+\stoptyping
+
+creates an entry that leads (directly or following a hash chain) to the three
+letter token list. Every time the input sees \type {\foo} it gets resolved to
+that list via a hash lookup. However, once internalized and part of a token list,
+it is a direct reference. On the other hand,
+
+\starttyping
+\the\count0
+\stoptyping
+
+triggers the \type {\the} action that relates to this control sequence, which
+then reads a next token and operates on that. That next token itself expects a
+number as follow up. In the end the value of \type {\count0} is found and that
+one is also in the so called equivalent lookup table, in what \TEX\ calls
+specific regions.
+
+\starttyping
+equivalents = {
+ { level, type, value },
+ { level, type, value },
+ { level, type, value },
+ ...
+}
+\stoptyping
+
+The value is in most cases similar to the info (cmd & chr) field in fixmem, but
+one difference is that counters, dimensions etc directly store their value, which
+is why we sometimes need the type separately, for instance in order to reclaim
+memory for glue or node specifications. It sounds complicated and it is, but as
+long as you get a rough idea we can continue. Just keep in mind that tokens
+sometimes get expanded on the fly, and sometimes just get stored.
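+
+A quick way to see that difference at the \TEX\ end is to compare a \type {\def}
+with an \type {\edef}:
+
+\starttyping[option=TEX]
+\count0=123
+
+\def \foo{\number\count0 } % the tokens \number \count 0 get stored as they are
+\edef\oof{\number\count0 } % the tokens get expanded while storing: 123
+
+\meaning\foo \crlf
+\meaning\oof \crlf
+\stoptyping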
+
+There are a lot of primitives and each has a unique info. The same is true for
+characters (each category has its own command code, so regular letters can be
+distinguished from other tokens, comment signs, math triggers etc). All important
+basic bits are in the table of equivalents: macros as well as registers, although the
+meaning of a macro and the content of token lists live in the fixmem table and
+the content of boxes in so called node lists (nodes have their own memory).
+
+In traditional \TEX\ the lookup table for primitives, registers and macros is as
+compact as can be: it is an array of so called 32 bit memory words. These can be
+divided into halves and quarters, so in the source you find terms like \type
+{halfword} and \type {quarterword}. The lookup table is a hybrid:
+
+\starttyping
+[level 8] [type 8] [value 16] | [equivalent 32]
+[level 8] [type 8] [value 16] | [equivalent 32]
+[level 8] [type 8] [value 16] | [equivalent 32]
+...
+\stoptyping
+
+The mentioned counters and such are directly encoded in an equivalent and the
+rest is a combination of level, type and value. The level is used for the
+grouping, and in for instance \PDFTEX\ there can therefore be at most 255 levels.
+In \LUATEX\ we use a wider model. There we have 64 bit memory words which means
+that we have way more levels and don't need to have this dual nature:
+
+\starttyping
+[level 16] [type 16] [value 32]
+[level 16] [type 16] [value 32]
+[level 16] [type 16] [value 32]
+...
+\stoptyping
+
+We already showed a \LUA\ representation. The type in this table is what a
+command code is in an \quote {info} field. In such a token the integer encodes
+the command as well as a value (called chr). In the lookup table the type is the
+command code. When \TEX\ is dealing with a control sequence it looks at the
+type, otherwise it filters the command from the token integer. This means that a
+token cannot store an integer (or dimension), but the lookup table actually can
+do that. However, commands can limit the range, for instance characters are bound
+by what \UNICODE\ permits.
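+
+As an illustration from the \TEX\ end, the \type {\expandtoken} primitive injects
+exactly such a combination of a catcode (the command part) and a character code
+(the chr part):
+
+\starttyping[option=TEX]
+\expandtoken 11 "41 % a letter A : catcode (cmd) 11, character code (chr) "41
+\expandtoken 12 "41 % an other A : catcode (cmd) 12, character code (chr) "41
+\stoptyping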
+
+Internally, \LUATEX\ still uses these ranges of fast accessible registers, like
+counters, dimensions and attributes. However, we saw that in \LUATEX\ they don't
+overlap with the level and type. In \LUATEX, at least till version 1.13, we still
+have the shadow array for levels but in \LUAMETATEX\ we just use those in the
+equivalents lookup table. If you look in the \PASCAL\ source you will notice that
+arrays run from \type {[somemin ... somemax]} which in the \CCODE\ source would
+mean using offsets. Actually, the shadow array starts at zero so we waste the
+part that doesn't need shadowing. It is good to remind ourselves that traditional
+\TEX\ is 8 bit character based.
+
+The equivalents lookup table has all kinds of special ranges (combined into
+regions of similar nature, in \TEX\ speak), like those for lowercase mapping,
+specific catcode mappings, etc.\ but we're still talking of $n \times 256$
+entries. In \LUATEX\ all these mappings are in dedicated sparse hash tables
+because we need to support the full \UNICODE\ repertoire. This means that, while
+on the one hand \LUATEX\ uses more memory for the lookup table, on the other
+hand the number of slots can be less. But still there was the waste of the
+shadow level table: I
+didn't calculate the exact saving of ditching that one, but I bet it came close
+to what was available as total memory for programs and data on the first machines
+that I used for running \TEX. But \unknown\ after more than a decade of \LUATEX\
+we now reclaimed that space in \LUAMETATEX. \footnote {Don't expect a gain in
+performance, although using less memory might pay back on a virtual machine or
+when \TEX\ has to share the \CPU\ cache.}
+
+Now, in case you're interested (and actually I just write it down because I don't
+want to forget it myself) the lookup table in \LUAMETATEX\ is laid out as follows:
+
+\starttabulate
+\NC the hash table \NC \NC \NR
+\NC some frozen primitives \NC \NC \NR
+\NC current and defined fonts \NC one slot + many pointers \NC \NR
+\NC undefined control sequence \NC one slot \NC \NR
+\NC internal and register glue \NC pointer to node \NC \NR
+\NC internal and register muglue \NC pointer to node \NC \NR
+\NC internal and register toks \NC pointer to token list \NC \NR
+\NC internal and register boxes \NC pointer to node list \NC \NR
+\NC internal and register counts \NC value in token \NC \NR
+\NC internal and register attributes \NC value in token \NC \NR
+\NC internal and register dimens \NC value in token \NC \NR
+\NC some special data structures \NC pointer to node list \NC \NR
+\NC the (runtime) extended hash table \NC \NC \NR
+\stoptabulate
+
+Normally a user doesn't need to know anything about these specific properties of
+the engine and it might comfort you to know that for a long time I could stay
+away from these details. One difference with the other engines is that we have
+internal variables and registers split more explicitly. The special data
+structures have their own slots and are not just put somewhere (semi random). The
+initialization is a bit more granular in that we properly set the types (cmd codes)
+for registers which in turn is possible because for instance we're able to
+distinguish glue types. This is all part of coming up with a bit more consistent
+interface to tokens from the \LUA\ end. It also permits diagnostics.
+
+Anyway, we now are ready for some more details about tokens. You don't need to
+understand all of it in order to define decent macros. But when you are using
+\LUATEX\ and do want to mess around, here is some insight. Assume we have defined
+these macros:
+
+\startluacode
+ local alsoraw = false
+ function documentdata.StartShowTokens(rawtoo)
+ context.starttabulate { "|T|rT|lT|rT|rT|rT|" .. (rawtoo and "rT|" or "") }
+ context.BC()
+ context.BC() context("cmd")
+ context.BC() context("name")
+ context.BC() context("chr")
+ context.BC() context("cs")
+ if rawtoo then
+ context.BC() context("rawchr")
+ end
+ context.BC() context.NR()
+ context.SL()
+ alsoraw = rawtoo
+ end
+ function documentdata.StopShowTokens()
+ context.stoptabulate()
+ end
+ function documentdata.ShowToken(name)
+ local cmd, chr, cs = token.get_cmdchrcs(name)
+ local _, raw, _ = token.get_cmdchrcs(name,true)
+ context.NC() context("\\string\\"..name)
+ context.NC() context(cmd)
+ context.NC() context(tokens.commands[cmd])
+ context.NC() context(chr)
+ context.NC() context(cs)
+ if alsoraw and chr ~= raw then
+ context.NC() context(raw)
+ end
+ context.NC() context.NR()
+ end
+\stopluacode
+
+\startbuffer
+\def\MacroA{a} \def\MacroB{b}
+\def\macroa{a} \def\macrob{b}
+\def\MACROa{a} \def\MACROb{b}
+\stopbuffer
+
+\typebuffer \getbuffer
+
+How does that end up internally?
+
+\startluacode
+ documentdata.StartShowTokens(true)
+ documentdata.ShowToken("scratchcounterone")
+ documentdata.ShowToken("scratchcountertwo")
+ documentdata.ShowToken("scratchdimen")
+ documentdata.ShowToken("scratchtoks")
+ documentdata.ShowToken("scratchcounter")
+ documentdata.ShowToken("letterpercent")
+ documentdata.ShowToken("everypar")
+ documentdata.ShowToken("%")
+ documentdata.ShowToken("pagegoal")
+ documentdata.ShowToken("pagetotal")
+ documentdata.ShowToken("hangindent")
+ documentdata.ShowToken("hangafter")
+ documentdata.ShowToken("dimdim")
+ documentdata.ShowToken("relax")
+ documentdata.ShowToken("dimen")
+ documentdata.ShowToken("stoptext")
+ documentdata.ShowToken("MacroA")
+ documentdata.ShowToken("MacroB")
+ documentdata.ShowToken("MacroC")
+ documentdata.ShowToken("macroa")
+ documentdata.ShowToken("macrob")
+ documentdata.ShowToken("macroc")
+ documentdata.ShowToken("MACROa")
+ documentdata.ShowToken("MACROb")
+ documentdata.ShowToken("MACROc")
+ documentdata.StopShowTokens()
+\stopluacode
+
+We show the raw chr value but in the \LUA\ interface these are normalized to for
+instance proper register indices. This is because the raw numbers can for
+instance be indices into memory or some \UNICODE\ reference with catcode specific
+bits set. But, while these indices are real and stable, these offsets can
+actually change when the implementation changes. For that reason, in \LUAMETATEX\
+we can better talk of command codes as main indicator and:
+
+\starttabulate
+\NC subcommand \NC for tokens that have variants, like \type {\ifnum} \NC \NR
+\NC register indices \NC for the 64K register banks, like \type {\count0} \NC \NR
+\NC internal indices \NC for internal variables like \type {\parindent} \NC \NR
+\NC characters \NC specific \UNICODE\ slots combined with catcode \NC \NR
+\NC pointers \NC to token lists, macros, \LUA\ functions, nodes \NC \NR
+\stoptabulate
+
+This so called \type {cs} number is a pointer into the table of equivalents. That
+number comes from the hash table. A macro name, when scanned the first
+time, is still a sequence of bytes. This sequence is used to compute a hash
+number, which is a pointer to a slot in the lower part of the hash (lookup)
+table. That slot points to a string and a next hash entry in the higher end. A
+lookup goes as follows:
+
+\startitemize[n,packed]
+ \startitem
+ compute the index into the hash table from the string
+ \stopitem
+ \startitem
+ go to the slot with that index and compare the \type {string} field
+ \stopitem
+ \startitem
+ when there is no match go to the slot indicated by the \type {next} field
+ \stopitem
+ \startitem
+ compare again and keep following \type {next} fields till there is no
+ follow up
+ \stopitem
+ \startitem
+ optionally create a new entry
+ \stopitem
+ \startitem
+ use the index of that entry as index in the table of equivalents
+ \stopitem
+\stopitemize
+
+So, in \LUA\ speak, we have:
+
+\starttyping
+hashtable = {
+ -- lower part, accessed via the calculated hash number
+ { stringpointer, nextindex },
+ { stringpointer, nextindex },
+ ...
+ -- higher part, accessed by following nextindex
+ { stringpointer, nextindex },
+ { stringpointer, nextindex },
+ ...
+}
+\stoptyping
+
+Eventually, after following a lookup chain in the hash table, we end up at a
+pointer to the equivalents lookup table that we already discussed. From then on
+we're talking tokens. When you're lucky, the list is small and you have a quick
+match. The maximum initial hash index is not that large, around 64K (double that
+in \LUAMETATEX), so in practice there will often be some indirect
+(multi|-|compare) match. Increasing the lower end of the hash table might result
+in fewer string comparisons later on, but also increases the time to
+calculate the initial hash needed for accessing the lower part. Here you can sort
+of see that:
+
+\startbuffer
+\dostepwiserecurse{`a}{`z}{1}{
+ \expandafter\def\csname whatever\Uchar#1\endcsname
+ {}
+}
+\dostepwiserecurse{`a}{`z}{1}{
+ \expandafter\let\csname somemore\Uchar#1\expandafter\endcsname
+ \csname whatever\Uchar#1\endcsname
+}
+\stopbuffer
+
+\typebuffer \getbuffer
+
+\startluacode
+ documentdata.StartShowTokens(true)
+ for i=utf.byte("a"),utf.byte("z") do
+ documentdata.ShowToken("whatever"..utf.char(i))
+ documentdata.ShowToken("somemore"..utf.char(i))
+ end
+ documentdata.StopShowTokens()
+\stopluacode
+
+The command code indicates a macro and the action related to it is an expandable
+call. We have no sub command \footnote {We cheat a little here because chr
+actually is an index into token memory but we don't show them as such.} so that
+column shows zeros. The fifth column is the hash entry which can bring us back to
+the verbose name as needed in reporting, while the last column is the index
+into token memory (watch the duplicates for \type {\let} macros: a ref count is
+kept in order to be able to manage such shared references). When you look at the
+cs column you will notice that some numbers are close which (I think) in this
+case indicates some closeness in the calculated hash name and followed chain.
+
+It will be clear that it is best to not make any assumptions with respect to the
+numbers, which is why in \LUAMETATEX\ we sort of normalize them when accessing
+properties.
+
+\starttabulate
+\NC field \NC meaning \NC \NR
+\FL
+\NC command \NC operator \NC \NR
+\NC cmdname \NC internal name of operator \NC \NR
+\NC index \NC sanitized operand \NC \NR
+\NC mode \NC original operand \NC \NR
+\NC csname \NC associated name \NC \NR
+\NC id \NC the index in token memory (a virtual address) \NC \NR
+\NC tok \NC the integer representation \NC \NR
+\ML
+\NC active \NC true when an active character \NC \NR
+\NC expandable \NC true when expandable command \NC \NR
+\NC protected \NC true when a protected command \NC \NR
+\NC frozen \NC true when a frozen command \NC \NR
+\NC user \NC true when a user defined command \NC \NR
+\LL
+\stoptabulate
+
+When a control sequence is an alias to an existing primitive, for instance
+made by \type {\let}, the operand (chr) is picked up from its meaning. Take this:
+
+\startbuffer
+\newif\ifmyconditionone
+\newif\ifmyconditiontwo
+
+ \meaning\ifmyconditionone \crlf
+ \meaning\ifmyconditiontwo \crlf
+ \meaning\myconditiononetrue \crlf
+ \meaning\myconditiontwofalse \crlf
+\myconditiononetrue \meaning\ifmyconditionone \crlf
+\myconditiontwofalse\meaning\ifmyconditiontwo \crlf
+\stopbuffer
+
+\typebuffer \getbuffer
+
+Internally this is:
+
+\startluacode
+ documentdata.StartShowTokens(false)
+ documentdata.ShowToken("ifmyconditionone")
+ documentdata.ShowToken("ifmyconditiontwo")
+ documentdata.ShowToken("iftrue")
+ documentdata.ShowToken("iffalse")
+ documentdata.StopShowTokens()
+\stopluacode
+
+The whole list of available commands is given below. Once they are stable the \LUAMETATEX\ manual
+will document the accessors. In this chapter we use:
+
+\starttyping
+kind, min, max, fixedvalue = token.get_range("primitive")
+cmd, chr, cs = token.get_cmdchrcs("primitive")
+\stoptyping
+
+The kind of command is given in the first column, which can have the following values:
+
+\starttabulate[|l|l|p|]
+\NC 0 \NC no \NC not accessible \NC \NR
+\NC 1 \NC regular \NC possibly with subcommand \NC \NR
+\NC 2 \NC character \NC the \UNICODE\ slot is encoded in the token \NC \NR
+\NC 3 \NC register \NC this is an indexed register (zero up to 64K) \NC \NR
+\NC 4 \NC internal \NC this is an internal register (range given) \NC \NR
+\NC 5 \NC reference \NC this is a reference to a node, \LUA\ function, etc. \NC \NR
+\NC 6 \NC data \NC a general data entry (kind of private) \NC \NR
+\NC 7 \NC token \NC a token reference (that can have a followup) \NC \NR
+\stoptabulate
+
+\usemodule[system-tokens]
+
+\start \switchtobodyfont[7pt] \showsystemtokens \stop
+
+\stopchapter
+
+\stopcomponent
diff --git a/doc/context/sources/general/manuals/evenmore/evenmore.tex b/doc/context/sources/general/manuals/evenmore/evenmore.tex
index b662e5108..c2e4e232b 100644
--- a/doc/context/sources/general/manuals/evenmore/evenmore.tex
+++ b/doc/context/sources/general/manuals/evenmore/evenmore.tex
@@ -22,8 +22,9 @@
\component evenmore-whattex
\component evenmore-numbers
% \component evenmore-parameters
- % \component evenmore-parsing
- % \component evenmore-tokens
+ \startchapter[title=Parameters] {\em This will appear first in \TUGBOAT.} \stopchapter
+ \component evenmore-parsing
+ \component evenmore-tokens
\stopbodymatter
\stopdocument
diff --git a/doc/context/sources/general/manuals/luametatex/luametatex-tex.tex b/doc/context/sources/general/manuals/luametatex/luametatex-tex.tex
index 8aa7408d6..7dfc00313 100644
--- a/doc/context/sources/general/manuals/luametatex/luametatex-tex.tex
+++ b/doc/context/sources/general/manuals/luametatex/luametatex-tex.tex
@@ -653,6 +653,8 @@ tex.currentifbranch
\libindex{scantoks}
+\libindex{getmark}
+
\TEX's attributes (\lpr {attribute}), counters (\prm {count}), dimensions (\prm
{dimen}), skips (\prm {skip}, \prm {muskip}) and token (\prm {toks}) registers
can be accessed and written to using two times five virtual sub|-|tables of the
@@ -788,6 +790,12 @@ tex.scantoks("global",0,3,"$\int\limits^1_2$")
In the function-based interface, it is possible to define values globally by
using the string \type {global} as the first function argument.
+There is a dedicated getter for marks: \type {getmark} that takes two arguments.
+The first argument is one of \type {top}, \type {bottom}, \type {first}, \type
+{splitbottom} or \type {splitfirst}, and the second argument is a marks class
+number. When no arguments are given the current maximum number of classes is
+returned.
+
\stopsubsection
\startsubsection[title={Character code registers: \type {[get|set]*code[s]}}]
@@ -2087,6 +2095,11 @@ macro, in which case the result will also provide information about what
arguments are expected and in the result this is separated from the meaning by a
separator token. The \type {expand} flag determines if the list will be expanded.
+The \type {scan_argument} function expands the given argument. When a braced
+argument is scanned, expansion can be prohibited by passing \type {false}
+(default is \type {true}). In case of a control sequence, passing \type {false}
+will result in a one|-|level expansion (the meaning of the macro).
+
The string scanner scans for something between curly braces and expands on the
way, or when it sees a control sequence it will return its meaning. Otherwise it
will scan characters with catcode \type {letter} or \type {other}. So, given the
diff --git a/doc/context/sources/general/manuals/luametatex/luametatex.tex b/doc/context/sources/general/manuals/luametatex/luametatex.tex
index 0fd7a31b3..acdffcde3 100644
--- a/doc/context/sources/general/manuals/luametatex/luametatex.tex
+++ b/doc/context/sources/general/manuals/luametatex/luametatex.tex
@@ -9,13 +9,6 @@
% mswin 2562k 2555k mswin 2481k 2471k
% ------------------------ ------------------------
-% experiment (if this becomes default we need to check visualizers and disable it when needed):
-
-\startluacode
-nodes.handlers.cleanuppage = nodes.nuts.flatten_discretionaries
-nodes.tasks.prependaction("shipouts", "normalizers", "nodes.handlers.cleanuppage", nil, "nut", "enabled")
-\stopluacode
-
% 20200509 : 258 pages
%
% my 2013 i7 laptop with windows : 11.8 sec mingw64