diff --git a/doc/context/sources/general/manuals/mk/mk-mix.tex b/doc/context/sources/general/manuals/mk/mk-mix.tex
new file mode 100644
index 000000000..dd2c72d5b
--- /dev/null
+++ b/doc/context/sources/general/manuals/mk/mk-mix.tex
@@ -0,0 +1,1014 @@
+% language=uk
+
+\startcomponent mk-mix
+
+\environment mk-environment
+
+\chapter{The \luaTeX\ Mix}
+
+\subject{introduction}
+
+The idea of embedding \LUA\ into \TEX\ originates in some
+experiments with \LUA\ embedded in the \SCITE\ editor. You can add
+functionality to this editor by loading \LUA\ scripts. This is
+accomplished by a library that gives access to the internals of
+the editing component.
+
+The first integration of \LUA\ in \PDFTEX\ was relatively simple:
+from \TEX\ one could call out to \LUA\ and from \LUA\ one could
+print to \TEX. My first application was converting math encoded in a
+calculator syntax to \TEX. Subsequent experiments dealt with
+\METAPOST. At this point integration meant little more than having a
+scripting language as an addition to the macro language. But, even in
+this early stage further possibilities were explored, for instance
+in manipulating the final output (i.e.\ the \PDF\ code). The first
+versions of what by then was already called \LUATEX\ provided
+access to some internals, like counter and dimension registers and
+the dimensions of boxes.
+
+Boosted by the Oriental \TEX\ project, the team started exploring
+more fundamental possibilities: hooks in the input|/|output,
+tokenization, fonts and nodelists. This was followed by opening up
+hyphenation, breaking lines into paragraphs and building
+ligatures. At that point we not only had access to some internals
+but also could influence the way \TEX\ operates.
+
+After that, an excursion was made to \MPLIB, which fulfilled a
+long standing wish for a more natural integration of \METAPOST\
+into \TEX. At that point we ended up with mixtures of \TEX, \LUA\
+and \METAPOST\ code.
+
+As of mid 2008 we still need to open up more of \TEX, like page
+building, math, alignments and the backend. Eventually \LUATEX\
+will be nicely split up in components, rewritten in \CCODE, and we may
+even end up with \LUA\ glueing together the components that make
+up the \TEX\ engine. At that point the interoperation between
+\TEX\ and \LUA\ may be richer than it is now.
+
+In the next sections I will discuss some of the ideas behind
+\LUATEX\ and the relationship between \LUA\ and \TEX\ and how it
+presents itself to users. I will not discuss the interface itself,
+which consists of quite a few functions (organized in pseudo
+libraries) and the mechanisms used to access and replace internals
+(we call them callbacks).
+
+\subject {tex vs. lua}
+
+\TEX\ is a macro language. Everything boils down to either allowing
+stepwise expansion or explicitly preventing it. There are no real
+control features, like loops; tail recursion is a key concept.
+There are a few accessible data structures, like numbers, dimensions,
+glue, token lists and boxes. What happens inside \TEX\ is
+controlled by variables, mostly hidden from view, and optimized
+within the constraints of 30 years ago.
+
+The original idea behind \TEX\ was that an author would write a
+specific collection of macros for each publication, but increasing
+popularity among non-programmers quickly resulted in distributed
+collections of macros, called macro packages. They started small
+but grew and grew and by now have become pretty large. In these
+packages there are macros dealing with fonts, structure, page
+layout, graphic inclusion, etc. There is also code dealing with
+user interfaces, process control, conversion and much of that code
+looks out of place: the lack of control features and string
+manipulation is solved by mimicking other languages, the
+unavailability of a float datatype is compensated by misusing
+dimension registers, and you can find provisions to force or
+inhibit expansion all over the place.
+
+\TEX\ is a powerful typographical programming language but
+lacks some of the handy features of scripting languages. Handy in the
+sense that you will need them when you want to go beyond the
+original purpose of the system. \LUA\ is a powerful scripting
+language, but knows nothing of typesetting. To some extent it
+resembles the language that \TEX\ was written in: \PASCAL. And,
+since \LUA\ is meant for embedding and extending existing systems,
+it makes sense to bring \LUA\ into \TEX. How do they compare?
+Let's give some examples.
+
+About the simplest example of using \LUA\ in \TEX\ is the following:
+
+\starttyping
+\directlua { tex.print(math.sqrt(10)) }
+\stoptyping
+
+This kind of application is probably what most users will want and
+use, if they use \LUA\ at all. However, we can go further than that.
+
+In \TEX\ a loop can be implemented as in the plain format
+(copied with comment):
+
+\starttyping
+\def\loop#1\repeat{\def\body{#1}\iterate}
+\def\iterate{\body\let\next\iterate\else\let\next\relax\fi\next}
+\let\repeat=\fi % this makes \loop...\if...\repeat skippable
+\stoptyping
+
+This is then used as:
+
+\starttyping
+\newcount \mycounter \mycounter=1
+\loop
+ ...
+ \advance\mycounter 1
+ \ifnum\mycounter < 11
+\repeat
+\stoptyping
+
+The definition shows a bit how \TEX\ programming works. Of course
+such definitions can be wrapped in macros, like:
+
+\starttyping
+\forloop{1}{10}{1}{some action}
+\stoptyping
+
+and this is what often happens in more complex macro packages. In
+order to use such control loops without side effects, the macro
+writer needs to take measures that permit for instance nested
+usage and avoid clashes between local variables (counters or
+macros) and user defined ones. Here we use a counter in the
+condition, but in practice expressions will be more complex
+and this is not that trivial to implement.
+
+The original definition of the iterator can be written a bit
+more efficiently:
+
+\starttyping
+\def\iterate{\body \expandafter\iterate \fi}
+\stoptyping
+
+And indeed, in macro packages you will find many such expansion
+control primitives being used, which does not make reading macros
+easier.
+
+Now, don't get me wrong, this does not make \TEX\ less powerful, it's
+just that the language is focused on typesetting and not on
+general purpose programming, and in principle users can do
+without: documents can be preprocessed using another language, and
+document specific styles can be used.
+
+We have to keep in mind that \TEX\ was written in a time when
+resources in terms of memory and \CPU\ cycles were less abundant
+than they are now. The 256 registers per class and the roughly 3000
+hash slots in original \TEX\ were more than enough for typesetting
+a book, but in huge collections of macros they are not all that much. For
+that reason many macro packages use obscure names to hide their
+private registers from users and instead of allocating new ones
+with meaningful names, existing ones are shared. It is therefore
+not completely fair to compare \TEX\ code with \LUA\ code: in \LUA\
+we have plenty of memory and the only limitations are those
+imposed by modern computers.
+
+In \LUA, a loop looks like this:
+
+\starttyping
+for i=1,10 do
+ ...
+end
+\stoptyping
+
+But while in the \TEX\ example, the content directly ends up in
+the input stream, in \LUA\ we need to do that explicitly, so in
+fact we will have:
+
+\starttyping
+for i=1,10 do
+ tex.print("...")
+end
+\stoptyping
+
+And, in order to execute this code snippet, in \LUATEX\ we will do:
+
+\starttyping
+\directlua 0 {
+ for i=1,10 do
+ tex.print("...")
+ end
+}
+\stoptyping
+
+So, eventually we will end up with more code than just \LUA\ code,
+but still the loop itself looks quite readable and more complex loops
+are possible:
+
+\starttyping
+\directlua 0 {
+ local t, n = { }, 0
+ while true do
+ local r = math.random(1,10)
+ if not t[r] then
+ t[r], n = true, n+1
+ tex.print(r)
+ if n == 10 then break end
+ end
+ end
+}
+\stoptyping
+
+This will typeset the numbers 1 to 10 in randomized order.
+Implementing a random number generator in pure \TEX\ takes quite a bit of
+code, and keeping track of the numbers that were already used can be
+done with macros, but neither is very efficient.
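+
+As a variation, and not something taken from any macro package, the same
+result can be had without the retry loop by shuffling a table in place (a
+Fisher|-|Yates shuffle). This is only a sketch: the name \type {shuffled} is
+made up for this example, and \type {tex.print} is the only \LUATEX\ specific
+call in it.
+
+\starttyping
+\directlua 0 {
+    local function shuffled(n)
+        local t = { }
+        for i=1,n do t[i] = i end
+        for i=n,2,-1 do
+            local j = math.random(1,i)
+            t[i], t[j] = t[j], t[i]
+        end
+        return t
+    end
+    for _, v in ipairs(shuffled(10)) do
+        tex.print(v)
+    end
+}
+\stoptyping
+
+The retry loop shown before needs more and more attempts towards the end,
+while this variant always takes nine swaps for ten numbers.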
+
+I already stressed that \TEX\ is a typographical programming
+language and as such some things in \TEX\ are easier than in \LUA,
+given some access to internals:
+
+\starttyping
+\setbox0=\hbox{x} \the\wd0
+\stoptyping
+
+In \LUA\ we can do this as follows:
+
+\starttyping
+\directlua 0 {
+ local n = node.new('glyph')
+ n.font = font.current()
+ n.char = string.byte('x')
+ tex.box[0] = node.hpack(n)
+ tex.print(tex.box[0].width/65536 .. "pt")
+}
+\stoptyping
+
+One pitfall here is that \TEX\ rounds the number differently than
+\LUA. Both implementations can be wrapped in a macro or a function, respectively:
+
+\starttyping
+\def\measured#1{\setbox0=\hbox{#1}\the\wd0\relax}
+\stoptyping
+
+Now we get:
+
+\starttyping
+\measured{x}
+\stoptyping
+
+The same macro using \LUA\ looks as follows:
+
+\starttyping
+\directlua 0 {
+ function measure(chr)
+ local n = node.new('glyph')
+ n.font = font.current()
+ n.char = string.byte(chr)
+ tex.box[0] = node.hpack(n)
+ tex.print(tex.box[0].width/65536 .. "pt")
+ end
+}
+\def\measured#1{\directlua0{measure("#1")}}
+\stoptyping
+
+In both cases, special tricks are needed if you want to pass for
+instance a \type {#} to \TEX's variant, or a \type {"} to \LUA. In
+both cases we can use shortcuts like \type {\#} and in the second
+case we can pass strings as long strings using double square
+brackets to \LUA.
+
+This example is somewhat misleading. Imagine that we want to
+pass more than one character. The \TEX\ variant is already suited
+for that, but the function will now look like:
+
+\starttyping
+\directlua 0 {
+ function measure(str)
+ if str == "" then
+ tex.print("0pt")
+ else
+ local head, tail = nil, nil
+ for chr in str:gmatch(".") do
+ local n = node.new('glyph')
+ n.font = font.current()
+ n.char = string.byte(chr)
+ if not head then
+ head = n
+ else
+ tail.next = n
+ end
+ tail = n
+ end
+ tex.box[0] = node.hpack(head)
+ tex.print(tex.box[0].width/65536 .. "pt")
+ end
+ end
+}
+\stoptyping
+
+And still it's not okay, since \TEX\ inserts kerns between
+characters (depending on the font) and glue between words, and
+doing that all in \LUA\ takes more code. So, it will be clear that
+although we will use \LUA\ to implement advanced features, \TEX\
+itself still has quite some work to do.
+
+In the following example we show code, but this is not of
+production quality. It just demonstrates a new way of dealing
+with text in \TEX.
+
+Occasionally a design demands that at some place the first
+character of each word should be uppercase, or that the first word
+of a paragraph should be in small caps, or that each first line of a
+paragraph has to be in dark blue. When using traditional \TEX\ the user
+then has to fall back on parsing the data stream, and preferably
+you should then start such a sentence with a command that can pick
+up the text. For accentless languages like English this is quite
+doable but as soon as commands (for instance dealing with accents)
+enter the stream this process becomes quite hairy.
+
+The next code shows how \CONTEXT\ \MKII\ defines the \type {\Word}
+and \type {\Words} macros that capitalize the first characters of
+word(s). The spaces are really important here because they signal
+the end of a word.
+
+\starttyping
+\def\doWord#1%
+ {\bgroup\the\everyuppercase\uppercase{#1}\egroup}
+
+\def\Word#1%
+ {\doWord#1}
+
+\def\doprocesswords#1 #2\od
+ {\doifsomething{#1}{\processword{#1} \doprocesswords#2 \od}}
+
+\def\processwords#1%
+ {\doprocesswords#1 \od\unskip}
+
+\let\processword\relax
+
+\def\Words
+ {\let\processword\Word \processwords}
+\stoptyping
+
+Actually, the code is not that complex. We split off words and feed
+them to a macro that picks up the first token (hopefully a character)
+which is then fed into the \type {\uppercase} primitive. This assumes that
+for each character a corresponding uppercase variant is defined using the
+\type {\uccode} primitive. Exceptions can be dealt with by assigning relevant
+code to the token register \type {\everyuppercase}.
+However, such macros are far from robust. What happens if the text
+is generated and not input as-is? What happens with commands in
+the stream that do something with the following tokens?
+
+A \LUA\ based solution can look as follows:
+
+\starttyping
+\def\Words#1{\directlua 0 {
+    local t = { }
+    for s in unicode.utf8.gmatch("#1", "([^ ]+)") do
+        table.insert(t, string.upper(s:sub(1,1)) .. s:sub(2))
+    end
+    tex.sprint(table.concat(t, " "))
+}}
+\stoptyping
+
+But there is no real advantage here, apart from the fact that less code
+is needed. We still operate on the input and therefore we need to look
+to a different kind of solution: operating on the node list.
+
+\starttyping
+function CapitalizeWords(head)
+    local done = false
+    local glyph = node.id("glyph")
+    local kern = node.id("kern")
+    for start in node.traverse_id(glyph,head) do
+        local prev, next = start.prev, start.next
+        -- skip font kerns (subtype 0) next to the glyph
+        if prev and prev.id == kern and prev.subtype == 0 then
+            prev = prev.prev
+        end
+        if next and next.id == kern and next.subtype == 0 then
+            next = next.next
+        end
+        -- the first glyph of a word: no glyph before, a glyph after
+        if (not prev or prev.id ~= glyph) and
+                next and next.id == glyph then
+            done = upper(start) or done
+        end
+    end
+    return head, done
+end
+\stoptyping
+
+A node list is a forward|-|linked list. With a helper
+function in the \type {node} library we can loop over such lists. Instead
+of traversing we can use a regular while loop, but it is probably less
+efficient in this case. But how to apply this function to the relevant
+part of the input? In \LUATEX\ there are several callbacks that operate
+on the horizontal lists and we can use one of them to plug in this
+function. However, in that case the function is applied to probably
+more text than we want.
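+
+In plain \LUATEX\ (outside a macro package that manages callbacks itself)
+hooking the function in could look roughly as follows. The choice of the
+\type {pre_linebreak_filter} callback is an assumption made for this sketch;
+other callbacks that receive a horizontal list work in a similar way.
+
+\starttyping
+\directlua 0 {
+    callback.register("pre_linebreak_filter", function(head)
+        local newhead = CapitalizeWords(head)
+        return newhead
+    end)
+}
+\stoptyping
+
+Registered like this, the function sees every paragraph, which is exactly
+the problem just mentioned.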
+
+The solution for this is to assign attributes to the range of text
+that such a function has to take care of. These attributes (there
+can be many) travel with the nodes. This is also a reason why such
+code normally is not written by end users, but by macro package
+writers: they need to provide the frameworks where you can plug in
+code. In \CONTEXT\ we have several such mechanisms and therefore
+in \MKIV\ this function looks (slightly stripped) as follows:
+
+\starttyping
+function cases.process(namespace,attribute,head)
+    local done, actions = false, cases.actions
+    for start in node.traverse_id(glyph,head) do
+        local attr = has_attribute(start,attribute)
+        if attr and attr > 0 then
+            unset_attribute(start,attribute)
+            local action = actions[attr]
+            if action then
+                local _, ok = action(start)
+                done = done or ok -- remember that something changed
+            end
+        end
+    end
+    return head, done
+end
+\stoptyping
+
+Here we check attributes (these are set at the \TEX\ end) and we have
+all kinds of actions that can be applied, depending on the value of the
+attribute. Here the function that does the actual uppercasing
+is defined somewhere else. The \type {cases} table provides us a
+namespace; such namespaces need to be coordinated by macro package
+writers.
+
+This approach means that the macro code looks completely different; in
+pseudo code we get:
+
+\starttyping
+\def\Words#1{{<setattribute><cases><somevalue>#1}}
+\stoptyping
+
+Or alternatively:
+
+\starttyping
+\def\StartWords{\begingroup<setattribute><cases><somevalue>}
+\def\StopWords {\endgroup}
+\stoptyping
+
+Because starting a paragraph with a group can have unwanted side
+effects (like \type {\everypar} being expanded inside a group) a
+variant is:
+
+\starttyping
+\def\StartWords{<setattribute><cases><somevalue>}
+\def\StopWords {<resetattribute><cases>}
+\stoptyping
+
+So, what happens here is that the user sets an attribute using some high
+level command, and at some point during the transformation of the input into
+node lists, some action takes place. At that point commands, expansion and
+whatever no longer can interfere.
+
+In addition to some infrastructure, macro packages need to carry some
+knowledge, just as with the \type {\uccode} used in \type {\uppercase}.
+The \type {upper} function in the first example looks as follows:
+
+\starttyping
+local function upper(start)
+ local data, char = characters.data, start.char
+ if data[char] then
+ local uc = data[char].uccode
+ if uc and fonts.ids[start.font].characters[uc] then
+ start.char = uc
+ return true
+ end
+ end
+ return false
+end
+\stoptyping
+
+Such code is really macro package dependent: \LUATEX\ only
+provides the means, not the solutions. In \CONTEXT\ we have
+collected information about characters in a \type {data} table
+in the \type {characters} namespace. There we have stored the
+uppercase codes (\type {uccode}). The, again \CONTEXT\ specific,
+\type {fonts} table keeps track of all defined fonts and before
+we change the case, we make sure that this character is present
+in the font. Here the font id (\type {start.font}) is the number by which
+\LUATEX\ keeps track of the used fonts. Each glyph node carries
+such a reference.
+
+In this example, eventually we end up with more code than in \TEX,
+but the solution is much more robust. Just imagine what would happen
+when in the \TEX\ solution we would have:
+
+\starttyping
+\Words{\framed[offset=3pt]{hello world}}
+\stoptyping
+
+It simply does not work. On the other hand, the \LUA\ code never
+sees \TEX\ commands, it only sees the two words represented by
+glyph nodes and separated by glue.
+
+Of course, there is a danger when we start opening \TEX's core
+features. Currently macro packages know what to expect, they know
+what \TEX\ can and cannot do. Of course macro writers have
+exploited every corner of \TEX, even the dark ones. Where dirty
+tricks in the \TEX book had an educational purpose, those of users
+sometimes have obscene traits. If we just stick to the trickery
+introduced for parsing input, converting this into that, doing
+some calculations, and the like, it will be clear that \LUA\ is more
+than welcome. It may hurt to throw away thousands of lines of
+impressive code and replace it by a few lines of \LUA\ but that's
+the price the user pays for abusing \TEX. Eventually \CONTEXT\ \MKIV\
+will be a decent mix of \LUA\ and \TEX\ code, and hopefully the
+solutions programmed in those languages are as clean as possible.
+
+Of course we can discuss until eternity whether \LUA\ is the best
+choice. Taco, Hartmut and I are pretty confident that it is, and
+in the couple of years that we have been working on \LUATEX\ nothing has proved
+us wrong yet. We can fantasize about concepts, only to find out that
+they are impossible to implement or hard to agree on; we just go
+ahead using trial and error. We can talk over and over how opening up
+should be done, which is what the team does in a nicely
+closed and efficient loop, but at some points decisions have to be
+made. Nothing is perfect, neither is \LUATEX, but most users won't
+notice it as long as it extends \TEX's life and makes usage more
+convenient.
+
+Users of \TEX\ and \METAPOST\ will have noticed that both
+languages have their own grouping (scope) model. In \TEX\ grouping
+is focused on content: by grouping the macro writer (or author)
+can limit the scope to a specific part of the text or let certain
+macros live within their own world.
+
+\starttyping
+.1. \bgroup .2. \egroup .1.
+\stoptyping
+
+Everything done at 2 is local unless explicitly told otherwise.
+This means that users can write (and share) macros with a small
+chance of clashes. In \METAPOST\ grouping is available too, but
+variables explicitly need to be saved.
+
+\starttyping
+.1. begingroup ; save p ; path p ; .2. endgroup .1.
+\stoptyping
+
+After using \METAPOST\ for a while this feels quite natural
+because an enforced local scope demands multiple return values
+which is not part of the macro language. Actually, this is another
+fundamental difference between the languages: \METAPOST\ has (a
+kind of) functions, which \TEX\ lacks. In \METAPOST\ you can write
+
+\starttyping
+draw origin for i=1 upto 10 : .. (i,sin(i)) endfor ;
+\stoptyping
+
+but also:
+
+\starttyping
+draw some(0) for i=1 upto 10 : .. some(i) endfor ;
+\stoptyping
+
+with
+
+\starttyping
+vardef some (expr i) =
+    save j ; j := i ; % an expr argument cannot be assigned to, so copy it
+    if j > 4 : j := j - 4 ; fi
+    (j,sin(j))
+enddef ;
+\stoptyping
+
+The condition and assignment in no way interfere with the loop where
+this function is called, as long as some value is returned (a pair in
+this case).
+
+In \TEX\ things work differently. Take this:
+
+\starttyping
+\count0=1
+\message{\advance\count0 by 1 \the\count0}
+\the\count0
+\stoptyping
+
+The terminal will show:
+
+\starttyping
+\advance \count 0 by 1 1
+\stoptyping
+
+At the end the counter still has the value~1. There are quite a few
+situations like this, for instance when data like a table of
+contents has to be written to a file. You cannot write macros where
+such calculations are done and hidden and only the result is seen.
+
+The nice thing about the way \LUA\ is presented to the user is that it
+permits the following:
+
+\starttyping
+\count0=1
+\message{\directlua0{tex.count[0] = tex.count[0] + 1}\the\count0}
+\the\count0
+\stoptyping
+
+This will report~2 to the terminal and typeset a 2 in the
+document. Of course this does not solve everything, but it is a
+step forward. Also, compared to \TEX\ and \METAPOST, grouping is
+done differently: there is a \type {local} prefix that makes
+variables (and functions are variables too) local in modules,
+functions, conditions, loops etc. The \LUA\ code in this story
+contains such locals.
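+
+A small sketch of what that prefix does (the names are made up for this
+example and the code is plain \LUA, so it also runs outside \TEX):
+
+\starttyping
+local n = 1
+
+local function bump()
+    local n = 10 -- shadows the outer n inside this function only
+    n = n + 1
+    return n
+end
+
+for i=1,3 do
+    local n = i * 100 -- a fresh variable, gone after each iteration
+end
+
+print(bump(), n) -- prints 11 and 1
+\stoptyping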
+
+In practice most users will use a macro package and so, if a user
+sees \TEX, he or she sees a user interface, not the code behind
+it. As such, they will also not encounter the code written in
+\LUA\ that deals with for instance fonts or node list
+manipulations. If a user sees \LUA, it will most probably be in
+processing actual data. Therefore, in the next section I will give an
+example of two ways to deal with \XML: one more suitable for
+traditional \TEX, and one inspired by \LUA. It demonstrates how
+the availability of \LUA\ can result in different solutions for
+the same problem.
+
+\subject {an example: xml}
+
+In \CONTEXT\ \MKII, the version that deals with \PDFTEX\ and \XETEX,
+we use a stream based \XML\ parser, written in \TEX. Each \type {<}
+and \type {&} triggers a macro that then parses the tag and/or entity.
+This method is quite efficient in terms of memory but the associated
+code is not simple because it has to deal with attributes, namespaces
+and nesting.
+
+The user interface is not that complex, but involves quite some
+commands. Take for instance the following \XML\ snippet:
+
+\starttyping
+<document>
+ <section>
+ <title>Whatever</title>
+ <p>some text</p>
+ <p>some more</p>
+ </section>
+</document>
+\stoptyping
+
+When using \CONTEXT\ commands, we can imagine the following definitions:
+
+\starttyping
+\defineXMLenvironment[document]{\starttext} {\stoptext}
+\defineXMLargument [title] {\section}
+\defineXMLenvironment[p] {\ignorespaces}{\par}
+\stoptyping
+
+When attributes have to be dealt with, for instance a reference to
+this section, things quickly start looking more complex. Also,
+users need to know what definitions to use in situations like this:
+
+\starttyping
+<table>
+ <tr><td>first</td><td>...</td> <td>last</td></tr>
+ <tr><td>left</td><td>...</td> <td>right</td></tr>
+</table>
+\stoptyping
+
+Here we cannot be sure if a cell does not contain a nested table,
+which is why we need to define the mapping as follows:
+
+\starttyping
+\defineXMLnested[table]{\bTABLE} {\eTABLE}
+\defineXMLnested[tr] {\bTR} {\eTR}
+\defineXMLnested[td] {\bTD} {\eTD}
+\stoptyping
+
+The \type {\defineXMLnested} macro is rather messy because it has
+to collect snippets and keep track of the nesting level, but users
+don't see that code, they just need to know when to use what
+macro. Once it works, it keeps working.
+
+Unfortunately mappings from source to style are never that simple
+in real life. We usually need to collect, filter and relocate
+data. Of course this can be done before feeding the source to
+\TEX, but \MKII\ provides a few mechanisms for that too. If for
+instance you want to reverse the order you can do this:
+
+\starttyping
+<article>
+ <title>Whatever</title>
+ <author>Someone</author>
+ <p>some text</p>
+</article>
+\stoptyping
+
+\starttyping
+\defineXMLenvironment[article]
+ {\defineXMLsave[author]}
+ {\blank author: \XMLflush{author}}
+\stoptyping
+
+This will save the content of the \type {author} element and flush
+it when the end tag \type {article} is seen. So, given previous
+definitions, we will get the title, some text and then the author.
+You may argue that instead we should use for instance \XSLT\ but
+even then a mapping is needed from the \XML\ to \TEX, and it's a
+matter of taste where the burden is put.
+
+Because \CONTEXT\ also wants to support standards like
+\MATHML, there are some more mechanisms but these are hidden from
+the user. And although these do a good job in most cases, the code
+associated with the solutions has never been satisfying.
+
+Supporting \XML\ this way is doable, and \CONTEXT\ has used this method
+for many years in fairly complex situations. However, now that we
+have \LUA\ available, it is possible to see if some things can be done
+simpler (or differently).
+
+After some experimenting I decided to write a full|-|blown \XML\
+parser in \LUA, but contrary to the stream based approach, this
+time the whole tree is loaded in memory. Although this uses more
+memory than a streaming solution, in practice the difference is
+not significant because often in \MKII\ we also needed to store
+whole chunks.
+
+Loading \XML\ files in memory is really fast and once it is done we
+can have access to the elements in a way similar to \XPATH. We can
+selectively pipe data to \TEX\ and manipulate content using \TEX\
+or \LUA. In most cases this is faster than the stream|-|based
+method. Interestingly, we can do this without linking to
+existing \XML\ libraries, and as a result we are pretty
+independent.
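+
+To give an idea of what such a tree can look like, here is a deliberately
+naive sketch in plain \LUA. The table layout (a \type {tag} field plus child
+entries) and the \type {first} and \type {text} helpers are invented for this
+example; they are not the data structures or functions of the \MKIV\ parser.
+
+\starttyping
+local article = { tag = "article",
+    { tag = "title",  "Whatever"  },
+    { tag = "author", "Someone"   },
+    { tag = "p",      "some text" },
+}
+
+-- return the first child element with the given tag
+local function first(node,tag)
+    for _, child in ipairs(node) do
+        if type(child) == "table" and child.tag == tag then
+            return child
+        end
+    end
+end
+
+-- collect the text of an element, descending into nested elements
+local function text(node)
+    local t = { }
+    for _, child in ipairs(node) do
+        t[#t+1] = type(child) == "table" and text(child) or child
+    end
+    return table.concat(t)
+end
+
+print(text(first(article,"title"))) -- Whatever
+\stoptyping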
+
+So how does this look from the perspective of the user? Say that
+we have the simple article definition stored in \type {demo.xml}.
+
+\starttyping
+<?xml version ='1.0'?>
+<article>
+ <title>Whatever</title>
+ <author>Someone</author>
+ <p>some text</p>
+</article>
+\stoptyping
+
+This time we associate so called setups with the elements. Each
+element can have its own setup, and we can use expressions to
+assign them. Here we have just one such setup:
+
+\starttyping
+\startxmlsetups xml:document
+ \xmlsetsetup{main}{article}{xml:article}
+\stopxmlsetups
+\stoptyping
+
+When loading the document it will automatically be associated with the tag \type
+{main}. The previous rule associates setup \type {xml:article}
+with the \type {article} element in tree \type {main}. We need to
+register this setup so that it will be applied to the document
+after loading:
+
+\starttyping
+\xmlregistersetup{xml:document}
+\stoptyping
+
+and the document itself is processed with:
+
+\starttyping
+\xmlprocessfile{main}{demo.xml}{} % optional setup
+\stoptyping
+
+The setup \type {xml:article} can look as follows:
+
+\starttyping
+\startxmlsetups xml:article
+ \section{\xmltext{#1}{/title}}
+ \xmlall{#1}{!(title|author)}
+ \blank author: \xmltext{#1}{/author}
+\stopxmlsetups
+\stoptyping
+
+Here \type {#1} refers to the current node in the \XML\ tree, in
+this case the root element, \type {article}. The second argument
+of \type {\xmltext} and \type {\xmlall} is a path expression,
+comparable with \XPATH: \type {/title} means: the \type {title}
+element anchored to the current root (\type{#1}), and \type
+{!(title|author)} is the negation of (complement to) \type{title}
+or \type {author}. Such expressions can be more complex than the
+one above, like:
+
+\starttyping
+\xmlfirst{#1}{/one/(alpha|beta)/two/text()}
+\stoptyping
+
+which returns the content of the first element that satisfies one of
+the paths (nested tree):
+
+\starttyping
+/one/alpha/two
+/one/beta/two
+\stoptyping
+
+There is a whole bunch of commands like \type {\xmltext} that
+filter content and pipe it into \TEX. These are calling \LUA\
+functions. This is no manual, so we will not discuss them here.
+However, it is important to realize that we have to associate
+setups (consider them free formatted macros) to at least one
+element in order to get started. Also, \XML\ inclusions have to be
+dealt with before assigning the setups. These are simple
+one|-|line commands. You can also assign defaults to elements,
+which saves some work.
+
+Because we can use \LUA\ to access the tree and manipulate
+content, we can now implement parts of \XML\ handling in \LUA. An
+example of this is dealing with so|-|called Cals tables. This is
+done in approximately 150 lines of \LUA\ code, loaded at runtime in a
+module. This time the association uses functions instead of setups and those
+functions will pipe data back to \TEX. In the module you will find:
+
+\starttyping
+\startxmlsetups xml:cals:process
+ \xmlsetfunction {\xmldocument} {cals:table} {lxml.cals.table}
+\stopxmlsetups
+
+\xmlregistersetup{xml:cals:process}
+
+\xmlregisterns{cals}{cals}
+\stoptyping
+
+These commands tell \MKIV\ that elements with a namespace
+specification that contains \type {cals} will be remapped to the
+internal namespace \type {cals} and the setup associates a
+function with this internal namespace.
+
+By now it will be clear that from the perspective of the user
+hardly any \LUA\ is visible. Sure, he or she can deduce that deep
+down some magic takes place, especially when you run into more
+complex expressions like this (the \type {@} denotes an
+attribute):
+
+\starttyping
+\xmlsetsetup
+ {main} {item[@type='mpctext' or @type='mrtext']}
+ {questions:multiple:text}
+\stoptyping
+
+Such expressions resemble \XPATH, but can go much further than
+that, just by adding more functions to the library.
+
+\starttyping
+b[position() > 2 and position() < 5 and text() == 'ok']
+b[position() > 2 and position() < 5 and text() == upper('ok')]
+b[@n=='03' or @n=='08']
+b[number(@n)>2 and number(@n)<6]
+b[find(text(),'ALSO')]
+\stoptyping
+
+Just to give you an idea \unknown\ in the module that implements
+the parser you will find definitions that match the function calls
+in the above expressions.
+
+\starttyping
+xml.functions.find = string.find
+xml.functions.upper = string.upper
+xml.functions.number = tonumber
+\stoptyping
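+
+How such a table can be put to use is sketched below in plain \LUA. The
+predicate is written by hand here, whereas a real parser would generate it
+from the expression; the local \type {functions} table and the \type {at}
+field holding the attributes are assumptions made for this example.
+
+\starttyping
+local functions = {
+    find   = string.find,
+    upper  = string.upper,
+    number = tonumber,
+}
+
+-- hand written counterpart of b[number(@n)>2 and number(@n)<6]
+local function match(e)
+    local n = functions.number(e.at.n)
+    return n ~= nil and n > 2 and n < 6
+end
+
+print(match { tag = "b", at = { n = "03" } }) -- true
+print(match { tag = "b", at = { n = "08" } }) -- false
+\stoptyping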
+
+So much for the different approaches. It's up to the user what
+method to use: stream based \MKII, tree based \MKIV, or a mixture.
+
+The main reason for taking \XML\ as an example of mixing \TEX\ and
+\LUA\ is that it can be a bit mind|-|boggling if you start
+thinking of what happens behind the scenes. Say that we have
+
+\starttyping
+<?xml version ='1.0'?>
+<article>
+ <title>Whatever</title>
+ <author>Someone</author>
+ <p>some <b>bold</b> text</p>
+</article>
+\stoptyping
+
+and that we use the setup shown before with \type {article}.
+
+At some point, we are done with defining setups and load the
+document. The first thing that happens is that the list of
+manipulations is applied: file inclusions are processed first,
+setups and functions are assigned next, maybe some elements are
+deleted or added, etc. When that is done we serialize the tree to
+\TEX, starting with the root element. When piping data to \TEX\ we
+use the current catcode regime; linebreaks and spaces are honored
+as usual.
+
+Each element can have a function (command) associated and when
+this is the case, control is given to that function. In our case
+the root element has such a command, one that will trigger a
+setup. And so, instead of piping content to \TEX, a function is
+called that lets \TEX\ expand the macro that deals with this
+setup.
+
+However, that setup itself calls \LUA\ code that filters the title
+and feeds it into the \type {\section} command. Next it flushes
+everything except the title and author, which again involves
+calling \LUA. Finally it flushes the author. The nested sequence
+of events is as follows:
+
+\startitemize[2*broad]
+
+ \sym{lua:} Load the document and apply setups and the like.
+
+ \sym{lua:} Serialize the \type {article} element, but since
+ there is an associated setup, tell \TEX\ to expand that one
+ instead.
+
+ \startitemize[2*broad]
+
+ \sym{tex:} Execute the setup, first expand the \type {\section}
+ macro, but its argument is a call to \LUA.
+
+ \startitemize[2*broad]
+
+ \sym{lua:} Filter \type {title} from the subtree under
+ \type {article}, print the content to \TEX\ and return
+ control to \TEX.
+
+ \stopitemize
+
+ \sym{tex:} Tell \LUA\ to filter the paragraphs i.e.\ skip \type
+ {title} and \type {author}; since the \type {b} element has
+ no associated setup (or whatever) it is just serialized.
+
+ \startitemize[2*broad]
+
+ \sym{lua:} Filter the requested elements and return control
+ to \TEX.
+
+ \stopitemize
+
+ \sym{tex:} Ask \LUA\ to filter \type {author}.
+
+ \startitemize[2*broad]
+ \sym{lua:} Pipe \type {author}'s content to \TEX.
+ \stopitemize
+
+ \sym{tex:} We're done.
+
+ \stopitemize
+
+ \sym{lua:} We're done.
+
+\stopitemize
+
+This is a really simple case. In my daily work I am dealing
+with rather extensive and complex educational documents where in
+one source there is text, math, graphics, all kinds of fancy stuff,
+questions and answers in several categories and of different kinds,
+which may or may not be reshuffled, omitted or combined. So there
+we are talking about many more levels of \TEX\ calling \LUA\ and \LUA\
+piping to \TEX\ etc. To stay in \TEX\ speak: we're dealing with
+one big ongoing nested expansion (because \LUA\ calls expand), and
+you can imagine that this somewhat stresses \TEX's input stack, but
+so far I have not encountered any problems.
+
+\subject{some remarks}
+
+Here I discussed several possible applications of \LUA\ in \TEX. I
+have not yet mentioned that because \LUATEX\ contains a scripting engine
+plus some extra libraries, it can also be used purely for that.
+This means that support programs can now be written in \LUA\ and
+that there are no longer dependencies on other scripting engines
+being present on the system. Consider this a bonus.
+
+Usage in \TEX\ can be organized in four categories:
+
+\startitemize[n]
+\item Users can use \LUA\ for generating data, doing all kinds of
+ data manipulation, maybe reading data from a file, etc. The
+ only link with \TEX\ is the print function.
+\item Users can use information provided by \TEX\ and use this
+ when making decisions. An example is collecting data in
+ boxes and using \LUA\ to do calculations with the dimensions.
+ Another example is a converter from \METAPOST\ output to
+ \PDF\ literals. No real knowledge of \TEX's internals is
+ needed. The \MKIV\ \XML\ functionality discussed before
+ demonstrates this: it's mostly data processing and piping
+ to \TEX. Other examples are dealing with buffers, defining
+ character mappings, and handling error messages, verbatim
+ \unknown\ the list is long.
+\item Users can extend \TEX's core functionality. An example is
+ support for \OPENTYPE\ fonts: \LUATEX\ itself does not
+ support this format directly, but provides ways to feed
+ \TEX\ with the relevant information. Support for \OPENTYPE\
+ features demands manipulating node lists. Knowledge of
+ internals is a requirement. Advanced spacing and language
+ specific features are made possible by node list
+ manipulations and attributes. The alternative \type {\Words}
+ macro is an example of this.
+\item Users can replace existing \TEX\ functionality. In \MKIV\
+ there are numerous examples of this, for instance all file
+ \IO\ is written in \LUA, including reading from \ZIP\ files
+ and remote locations. Loading and defining fonts is also
+ under \LUA\ control. At some point \MKIV\ will provide
+ dedicated splitters for multicolumn typesetting and
+ probably also better display spacing and display
+ math splitting.
+\stopitemize
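+
+As a small illustration of the first two categories, consider the
+following sketch. It only relies on \type {tex.print} and the \type
+{tex.box} accessor; the box number, the content of the box and the
+printed sentences are arbitrary choices made for this example.
+
+\starttyping
+\startluacode
+    -- category 1: compute something and print the result to TeX
+    local sum = 0
+    for i=1,10 do
+        sum = sum + i
+    end
+    tex.print("The sum is " .. sum .. ".")
+\stopluacode
+
+\setbox0\hbox{some text}
+
+\startluacode
+    -- category 2: use information provided by TeX; the width of
+    -- the box is reported in scaled points (65536 per point)
+    local width = tex.box[0].width
+    tex.print("The box is " .. width/65536 .. "pt wide.")
+\stopluacode
+\stoptyping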
+
+The boundaries between these categories are not frozen. For
+instance, support for image inclusion and \MPLIB\ in \CONTEXT\
+\MKIV\ sits between categories 3 and~4. Categories 3 and~4, and
+probably also~2, are normally the domain of macro package writers
+and more advanced users who contribute to macro packages. Because
+a macro package has to provide some stability it is not a good idea
+to let users mess around with all those internals, because of
+potential interference. On the other hand, normally users operate
+on top of a kernel using some kind of \API\ and history has
+proved that macro packages are stable enough for this.
+
+Sometime around 2010 the team expects \LUATEX\ to be feature
+complete and stable. By that time I can probably provide a more
+detailed categorization.
+
+\stopcomponent