diff options
Diffstat (limited to 'doc/context/sources/general/manuals/mk/mk-mix.tex')
-rw-r--r-- | doc/context/sources/general/manuals/mk/mk-mix.tex | 1014 |
1 files changed, 1014 insertions, 0 deletions
diff --git a/doc/context/sources/general/manuals/mk/mk-mix.tex b/doc/context/sources/general/manuals/mk/mk-mix.tex new file mode 100644 index 000000000..dd2c72d5b --- /dev/null +++ b/doc/context/sources/general/manuals/mk/mk-mix.tex @@ -0,0 +1,1014 @@ +% language=uk + +\startcomponent mk-mix + +\environment mk-environment + +\chapter{The \luaTeX\ Mix} + +\subject{introduction} + +The idea of embedding \LUA\ into \TEX\ originates in some +experiments with \LUA\ embedded in the \SCITE\ editor. You can add +functionality to this editor by loading \LUA\ scripts. This is +accomplished by a library that gives access to the internals of +the editing component. + +The first integration of \LUA\ in \PDFTEX\ was relatively simple: +from \TEX\ one could call out to \LUA\ and from \LUA\ one could +print to \TEX. My first application was converting math encoded a +calculator syntax to \TEX. Following experiments dealt with +\METAPOST. At this point integration meant as little as: having some +scripting language as addition to the macro language. But, even in +this early stage further possibilities were explored, for instance +in manipulating the final output (i.e.\ the \PDF\ code). The first +versions of what by then was already called \LUATEX\ provided +access to some internals, like counter and dimension registers and +the dimensions of boxes. + +Boosted by the oriental \TeX\ project, the team started exploring +more fundamental possibilities: hooks in the input|/|output, +tokenization, fonts and nodelists. This was followed by opening up +hyphenation, breaking lines into paragraphs and building +ligatures. At that point we not only had access to some internals +but also could influence the way \TEX\ operates. + +After that, an excursion was made to \MPLIB, which fulfilled a +long standing wish for a more natural integration of \METAPOST\ +into \TEX. At that point we ended up with mixtures of \TEX, \LUA\ +and \METAPOST\ code. + +Medio 2008 we still need to open up more of \TEX, like page +building, math, alignments and the backend. Eventually \LUATEX\ +will be nicely split up in components, rewritten in \CCODE, and we may +even end up with \LUA\ glueing together the components that make +up the \TEX\ engine. At that point the interoperation between +\TEX\ and \LUA\ may be more rich that it is now. + +In the next sections I will discuss some of the ideas behind +\LUATEX\ and the relationship between \LUA\ and \TEX\ and how it +presents itself to users. I will not discuss the interface itself, +which consists of quite some functions (organized in pseudo +libraries) and the mechanisms used to access and replace internals +(we call them callbacks). + +\subject {tex vs. lua} + +\TEX\ is a macro language. Everything boils down to either allowing +stepwise expansion or explicitly preventing it. There are no real +control features, like loops; tail recursion is a key concept. +There are few accessible data|-|structures like numbers, dimensions, +glue, token lists and boxes. What happens inside \TEX\ is +controlled by variables, mostly hidden from view, and optimized +within the constraints of 30 years ago. + +The original idea behind \TEX\ was that an author would write a +specific collection of macros for each publication, but increasing +popularity among non-programmers quickly resulted in distributed +collections of macros, called macro packages. They started small +but grew and grew and by now have become pretty large. In these +packages there are macros dealing with fonts, structure, page +layout, graphic inclusion, etc. There is also code dealing with +user interfaces, process control, conversion and much of that code +looks out of place: the lack of control features and string +manipulation is solved by mimicking other languages, the +unavailability of a float datatype is compensated by misusing +dimension registers, and you can find provisions to force or +inhibit expansion all over the place. + +\TEX\ is a powerful typographical programming language but +lacks some of the handy features of scripting languages. Handy in the +sense that you will need them when you want to go beyond the +original purpose of the system. \LUA\ is a powerful scripting +language, but knows nothing of typesetting. To some extent it +resembles the language that \TEX\ was written in: \PASCAL. And, +since \LUA\ is meant for embedding and extending existing systems, +it makes sense to bring \LUA\ into \TEX. How do they compare? +Let's give some examples. + +About the simplest example of using \LUA\ in \TEX\ is the following: + +\starttyping +\directlua { tex.print(math.sqrt(10)) } +\stoptyping + +This kind of application is probably what most users will want and +use, if they use \LUA\ at all. However, we can go further than that. + +In \TEX\ a loop can be implemented as in the plain format +(copied with comment): + +\starttyping +\def\loop#1\repeat{\def\body{#1}\iterate} +\def\iterate{\body\let\next\iterate\else\let\next\relax\fi\next} +\let\repeat=\fi % this makes \loop...\if...\repeat skippable +\stoptyping + +This is then used as: + +\starttyping +\newcount \mycounter \mycounter=1 +\loop + ... + \advance\mycounter 1 + \ifnum\mycounter < 11 +\repeat +\stoptyping + +The definition shows a bit how \TEX\ programming works. Of course +such definitions can be wrapped in macros, like: + +\starttyping +\forloop{1}{10}{1}{some action} +\stoptyping + +and this is what often happens in more complex macro packages. In +order to use such control loops without side effects, the macro +writer needs to take measures that permit for instance nested +usage and avoids clashes between local variables (counters or +macros) and user defined ones. Here we use a counter in the +condition, but in practice expressions will be more complex +and this is not that trivial to implement. + +The original definition of the iterator can be written a bit +more efficient: + +\starttyping +\def\iterate{\body \expandafter\iterate \fi} +\stoptyping + +And indeed, in macro packages you will find many such expansion +control primitives being used, which does not make reading macros +easier. + +Now, get me right, this does not make \TEX\ less powerful, it's +just that the language is focused on typesetting and not on +general purpose programming, and in principle users can do +without: documents can be preprocessed using another language, and +document specific styles can be used. + +We have to keep in mind that \TEX\ was written in a time when +resources in terms of memory and \CPU\ cycles weres less abundant +than they are now. The 255 registers per class and the about 3000 +hash slots in original \TEX\ were more than enough for typesetting +a book, but in huge collections of macros they are not all that much. For +that reason many macropackages use obscure names to hide their +private registers from users and instead of allocating new ones +with meaningful names, existing ones are shared. It is therefore +not completely fair to compare \TEX\ code with \LUA\ code: in \LUA\ +we have plenty of memory and the only limitations are those +imposed by modern computers. + +In \LUA, a loop looks like this: + +\starttyping +for i=1,10 do + ... +end +\stoptyping + +But while in the \TEX\ example, the content directly ends up in +the input stream, in \LUA\ we need to do that explicitly, so in +fact we will have: + +\starttyping +for i=1,10 do + tex.print("...") +end +\stoptyping + +And, in order to execute this code snippet, in \LUATEX\ we will do: + +\starttyping +\directlua 0 { + for i=1,10 do + tex.print("...") + end +} +\stoptyping + +So, eventually we will end up with more code than just \LUA\ code, +but still the loop itself looks quite readable and more complex loops +are possible: + +\starttyping +\directlua 0 { + local t, n = { }, 0 + while true do + local r = math.random(1,10) + if not t[r] then + t[r], n = true, n+1 + tex.print(r) + if n == 10 then break end + end + end +} +\stoptyping + +This will typeset the numbers 1 to 10 in randomized order. +Implementing a random number generator in pure \TEX\ takes some bit of +code and keeping track of already defined numbers in macros can be +done with macros, but both are not very efficient. + +I already stressed that \TEX\ is a typographical programming +language and as such some things in \TEX\ are easier than in \LUA, +given some access to internals: + +\starttyping +\setbox0=\hbox{x} \the\wd0 +\stoptyping + +In \LUA\ we can do this as follows: + +\starttyping +\directlua 0 { + local n = node.new('glyph') + n.font = font.current() + n.char = string.byte('x') + tex.box[0] = node.hpack(n) + tex.print(tex.box[0].width/65536 .. "pt") +} +\stoptyping + +One pitfall here is that \TEX\ rounds the number differently than +\LUA. Both implementations can be wrapped in a macro cq. function: + +\starttyping +\def\measured#1{\setbox0=\hbox{#1}\the\wd0\relax} +\stoptyping + +Now we get: + +\starttyping +\measured{x} +\stoptyping + +The same macro using \LUA\ looks as follows: + +\starttyping +\directlua 0 { + function measure(chr) + local n = node.new('glyph') + n.font = font.current() + n.char = string.byte(chr) + tex.box[0] = node.hpack(n) + tex.print(tex.box[0].width/65536 .. "pt") + end +} +\def\measured#1{\directlua0{measure("#1")}} +\stoptyping + +In both cases, special tricks are needed if you want to pass for +instance a \type {#} to \TEX's variant, or a \type {"} to \LUA. In +both cases we can use shortcuts like \type {\#} and in the second +case we can pass strings as long strings using double square +brackets to \LUA. + +This example is somewhat misleading. Imagine that we want to +pass more than one character. The \TEX\ variant is already suited +for that, but the function will now look like: + +\starttyping +\directlua 0 { + function measure(str) + if str == "" then + tex.print("0pt") + else + local head, tail = nil, nil + for chr in str:gmatch(".") do + local n = node.new('glyph') + n.font = font.current() + n.char = string.byte(chr) + if not head then + head = n + else + tail.next = n + end + tail = n + end + tex.box[0] = node.hpack(head) + tex.print(tex.box[0].width/65536 .. "pt") + end + end +} +\stoptyping + +And still it's not okay, since \TEX\ inserts kerns between +characters (depending on the font) and glue between words, and +doing that all in \LUA\ takes more code. So, it will be clear that +although we will use \LUA\ to implement advanced features, \TEX\ +itself still has quite some work to do. + +In the following example we show code, but this is not of +production quality. It just demonstrates a new way of dealing +with text in \TEX. + +Occasionally a design demands that at some place the first +character of each word should be uppercase, or that the first word +of a paragraph should be in small caps, or that each first line of a +paragraph has to be in dark blue. When using traditional \TEX\ the user +then has to fall back on parsing the data stream, and preferably +you should then start such a sentence with a command that can pick +up the text. For accentless languages like English this is quite +doable but as soon as commands (for instance dealing with accents) +enter the stream this process becomes quite hairy. + +The next code shows how \CONTEXT\ \MKII\ defines the \type {\Word} +and \type {\Words} macros that capitalize the first characters of +word(s). The spaces are really important here because they signal +the end of a word. + +\starttyping +\def\doWord#1% + {\bgroup\the\everyuppercase\uppercase{#1}\egroup} + +\def\Word#1% + {\doWord#1} + +\def\doprocesswords#1 #2\od + {\doifsomething{#1}{\processword{#1} \doprocesswords#2 \od}} + +\def\processwords#1% + {\doprocesswords#1 \od\unskip} + +\let\processword\relax + +\def\Words + {\let\processword\Word \processwords} +\stoptyping + +Actually, the code is not that complex. We split of words and feed +them to a macro that picks up the first token (hopefully a character) +which is then fed into the \type {\uppercase} primitive. This assumes that +for each character a corresponding uppercase variant is defined using the +\type {\uccode} primitive. Exceptions can be dealt with by assigning relevant +code to the token register \type {\everyuppercase}. +However, such macros are far from robust. What happens if the text +is generated and not input as-is? What happens with commands in +the stream that do something with the following tokens? + +A \LUA\ based solution can look as follows: + +\starttyping +\def\Words#1{\directlua 0 + for s in unicode.utf8.gmatch("#1", "([^ ])") do + tex.sprint(string.upper(s:sub(1,1)) .. s:sub(2)) + end +} +\stoptyping + +But there is no real advantage here, apart from the fact that less code +is needed. We still operate on the input and therefore we need to look +to a different kind of solution: operating on the node list. + +\starttyping +function CapitalizeWords(head) + local done = false + local glyph = node.id("glyph") + for start in node.traverse_id(glyph,head) do + local prev, next = start.prev, start.next + if prev and prev.id == kern and prev.subtype == 0 then + prev = prev.prev + end + if next and next.id == kern and next.subtype == 0 then + next = next.next + end + if (not prev or prev.id ~= glyph) and + next and next.id == glyph then + done = upper(start) + end + end + return head, done +end +\stoptyping + +A node list is a forward|-|linked list. With a helper +function in the \type {node} library we can loop over such lists. Instead +of traversing we can use a regular while loop, but it is probably less +efficient in this case. But how to apply this function to the relevant +part of the input? In \LUATEX\ there are several callbacks that operate +on the horizontal lists and we can use one of them to plug in this +function. However, in that case the function is applied to probably +more text than we want. + +The solution for this is to assign attributes to the range of text +that such a function has to take care of. These attributes (there +can be many) travel with the nodes. This is also a reason why such +code normally is not written by end users, but by macropackage +writers: they need to provide the frameworks where you can plug in +code. In \CONTEXT\ we have several such mechanisms and therefore +in \MKIV\ this function looks (slightly stripped) as follows: + +\starttyping +function cases.process(namespace,attribute,head) + local done, actions = false, cases.actions + for start in node.traverse_id(glyph,head) do + local attr = has_attribute(start,attribute) + if attr and attr > 0 then + unset_attribute(start,attribute) + local action = actions[attr] + if action then + local _, ok = action(start) + done = done and ok + end + end + end + return head, done +end +\stoptyping + +Here we check attributes (these are set at the \TEX\ end) and we have +all kind of actions that can be applied, depending on the value of the +attribute. Here the function that does the actual uppercasing +is defined somewhere else. The \type {cases} table provides us a +namespace; such namespaces needs to be coordinated by macro package +writers. + +This approach means that the macro code looks completely different; in +pseudo code we get: + +\starttyping +\def\Words#1{{<setattribute><cases><somevalue>#1}} +\stoptyping + +Or alternatively: + +\starttyping +\def\StartWords{\begingroup<setattribute><cases><somevalue>} +\def\StopWords {\endgroup} +\stoptyping + +Because starting a paragraph with a group can have unwanted side +effects (like \type {\everypar} being expanded inside a group) a +variant is: + +\starttyping +\def\StartWords{<setattribute><cases><somevalue>} +\def\StopWords {<resetattribute><cases>} +\stoptyping + +So, what happens here is that the users sets an attribute using some high +level command, and at some point during the transformation of the input into +node lists, some action takes place. At that point commands, expansion and +whatever no longer can interfere. + +In addition to some infrastructure, macro packages need to carry some +knowledge, just as with the \type {\uccode} used in \type {\uppercase}. +The \type {upper} function in the first example looks as follows: + +\starttyping +local function upper(start) + local data, char = characters.data, start.char + if data[char] then + local uc = data[char].uccode + if uc and fonts.ids[start.font].characters[uc] then + start.char = uc + return true + end + end + return false +end +\stoptyping + +Such code is really macro package dependent: \LUATEX\ only +provides the means, not the solutions. In \CONTEXT\ we have +collected information about characters in a \type {data} table +in the \type {characters} namespace. There we have stored the +uppercase codes (\type {uccode}). The, again \CONTEXT\ specific, +\type {fonts} table keeps track of all defined fonts and before +we change the case, we make sure that this character is present +in the font. Here \type {id} is the number by which +\LUATEX\ keeps track of the used fonts. Each glyph node carries +such a reference. + +In this example, eventually we end up with more code than in \TEX, +but the solution is much more robust. Just imagine what would happen +when in the \TEX\ solution we would have: + +\starttyping +\Words{\framed[offset=3pt]{hello world}} +\stoptyping + +It simply does not work. On the other hand, the \LUA\ code never +sees \TEX\ commands, it only sees the two words represented by +glyphs nodes and separated by glue. + +Of course, there is a danger when we start opening \TEX's core +features. Currently macro packages know what to expect, they know +what \TEX\ can and cannot do. Of course macro writers have +exploited every corner of \TEX, even the dark ones. Where dirty +tricks in the \TEX book had an educational purpose, those of users +sometimes have obscene traits. If we just stick to the trickery +introduced for parsing input, converting this into that, doing +some calculations, and alike, it will be clear that \LUA\ is more +than welcome. It may hurt to throw away thousands of lines of +impressive code and replace it by a few lines of \LUA\ but that's +the price the user pays for abusing \TEX. Eventually \CONTEXT\ \MKIV\ +will be a decent mix of \LUA\ and \TEX\ code, and hopefully the +solutions programmed in those languages are as clean as possible. + +Of course we can discuss until eternity whether \LUA\ is the best +choice. Taco, Hartmut and I are pretty confident that it is, and +in the couple of years that we are working on \LUATEX\ nothing has proved +us wrong yet. We can fantasize about concepts, only to find out that +they are impossible to implement or hard to agree on; we just go +ahead using trial and error. We can talk over and over how opening up +should be done, which is what the team does in a nicely +closed and efficient loop, but at some points decisions have to be +made. Nothing is perfect, neither is \LUATEX, but most users won't +notice it as long as it extends \TEX's live and makes usage more +convenient. + +Users of \TEX\ and \METAPOST\ will have noticed that both +languages have their own grouping (scope) model. In \TEX\ grouping +is focused on content: by grouping the macro writer (or author) +can limit the scope to a specific part of the text or keep certain +macros live within their own world. + +\starttyping +.1. \bgroup .2. \egroup .1. +\stoptyping + +Everything done at 2 is local unless explicitly told otherwise. +This means that users can write (and share) macros with a small +chance of clashes. In \METAPOST\ grouping is available too, but +variables explicitly need to be saved. + +\starttyping +.1. begingroup ; save p ; path p ; .2. endgroup .1. +\stoptyping + +After using \METAPOST\ for a while this feels quite natural +because an enforced local scope demands multiple return values +which is not part of the macro language. Actually, this is another +fundamental difference between the languages: \METAPOST\ has (a +kind of) functions, which \TEX\ lacks. In \METAPOST\ you can write + +\starttyping +draw origin for i=1 upto 10 : .. (i,sin(i)) endfor ; +\stoptyping + +but also: + +\starttyping +draw some(0) for i=1 upto 10 : .. some(i) endfor ; +\stoptyping + +with + +\starttyping +vardef some (expr i) = + if i > 4 : i = i - 4 fi ; + (i,sin(i)) +enddef ; +\stoptyping + +The condition and assignment in no way interfere with the loop where +this function is called, as long as some value is returned (a pair in +this case). + +In \TEX\ things work differently. Take this: + +\starttyping +\count0=1 +\message{\advance\count0 by 1 \the\count0} +\the\count0 +\stoptyping + +The terminal wil show: + +\starttyping +\advance \count 0 by 1 1 +\stoptyping + +At the end the counter still has the value~1. There are quite some +situations like this, for instance when data like a table of +contents has to be written to a file. You cannot write macros where +such calculations are done and hidden and only the result is seen. + +The nice thing about the way \LUA\ is presented to the user is that it +permits the following: + +\starttyping +\count0=1 +\message{\directlua0{tex.count[0] = tex.count[0] + 1}\the\count0} +\the\count0 +\stoptyping + +This will report~2 to the terminal and typeset a 2 in the +document. Of course this does not solve everything, but it is a +step forward. Also, compared to \TEX\ and \METAPOST, grouping is +done differently: there is a \type {local} prefix that makes +variables (and functions are variables too) local in modules, +functions, conditions, loops etc. The \LUA\ code in this story +contains such locals. + +In practice most users will use a macro package and so, if a user +sees \TEX, he or she sees a user interface, not the code behind +it. As such, they will also not encounter the code written in +\LUA\ that deals with for instance fonts or node list +manipulations. If a user sees \LUA, it will most probably be in +processing actual data. Therefore, in the next section I will give an +example of two ways to deal with \XML: one more suitable for +traditional \TEX, and one inspired by \LUA. It demonstrates how +the availability of \LUA\ can result in different solutions for +the same problem. + +\subject {an example: xml} + +In \CONTEXT\ \MKII, the version that deals with \PDFTEX\ and \XETEX, +we use a stream based \XML\ parser, written in \TEX. Each \type {<} +and \type {&} triggers a macro that then parses the tag and/or entity. +This method is quite efficient in terms of memory but the associated +code is not simple because it has to deal with attributes, namespaces +and nesting. + +The user interface is not that complex, but involves quite some +commands. Take for instance the following \XML\ snippet: + +\starttyping +<document> + <section> + <title>Whatever</title> + <p>some text</p> + <p>some more</p> + </section> +</document> +\stoptyping + +When using \CONTEXT\ commands, we can imagine the following definitions: + +\starttyping +\defineXMLenvironment[document]{\starttext} {\stoptext} +\defineXMLargument [title] {\section} +\defineXMLenvironment[p] {\ignorespaces}{\par} +\stoptyping + +When attributes have to be dealt with, for instance a reference to +this section, things quickly start looking more complex. Also, +users need to know what definitions to use in situations like this: + +\starttyping +<table> + <tr><td>first</td><td>...</td> <td>last</td></tr> + <tr><td>left</td><td>...</td> <td>right</td></tr> +</table> +\stoptyping + +Here we cannot be sure if a cell does not contain a nested table, +which is why we need to define the mapping as follows: + +\starttyping +\defineXMLnested[table]{\bTABLE} {\eTABLE} +\defineXMLnested[tr] {\bTR} {\eTR} +\defineXMLnested[td] {\bTD} {\eTD} +\stoptyping + +The \type {\defineXMLnested} macro is rather messy because it has +to collect snippets and keep track of the nesting level, but users +don't see that code, they just need to know when to use what +macro. Once it works, it keeps working. + +Unfortunately mappings from source to style are never that simple +in real life. We usually need to collect, filter and relocate +data. Of course this can be done before feeding the source to +\TEX, but \MKII\ provides a few mechanisms for that too. If for +instance you want to reverse the order you can do this: + +\starttyping +<article> + <title>Whatever</title> + <author>Someone</author> + <p>some text</p> +</article> +\stoptyping + +\starttyping +\defineXMLenvironment[article] + {\defineXMLsave[author]} + {\blank author: \XMLflush{author}} +\stoptyping + +This will save the content of the \type {author} element and flush +it when the end tag \type {article} is seen. So, given previous +definitions, we will get the title, some text and then the author. +You may argue that instead we should use for instance \XSLT\ but +even then a mapping is needed from the \XML\ to \TEX, and it's a +matter of taste where the burden is put. + +Because \CONTEXT\ also wants to support standards like +\MATHML, there are some more mechanisms but these are hidden from +the user. And although these do a good job in most cases, the code +associated with the solutions has never been satisfying. + +Supporting \XML\ this way is doable, and \CONTEXT\ has used this method +for many years in fairly complex situations. However, now that we +have \LUA\ available, it is possible to see if some things can be done +simpler (or differently). + +After some experimenting I decided to write a full blown \XML\ +parser in \LUA, but contrary to the stream based approach, this +time the whole tree is loaded in memory. Although this uses more +memory than a streaming solution, in practice the difference is +not significant because often in \MKII\ we also needed to store +whole chunks. + +Loading \XML\ files in memory is real fast and once it is done we +can have access to the elements in a way similar to \XPATH. We can +selectively pipe data to \TEX\ and manipulate content using \TEX\ +or \LUA. In most cases this is faster than the stream|-|based +method. Interesting is that we can do this without linking to +existing \XML\ libraries, and as a result we are pretty +independent. + +So how does this look from the perspective of the user? Say that +we have the simple article definition stored in \type {demo.xml}. + +\starttyping +<?xml version ='1.0'?> +<article> + <title>Whatever</title> + <author>Someone</author> + <p>some text</p> +</article> +\stoptyping + +This time we associate so called setups with the elements. Each +element can have its own setup, and we can use expressions to +assign them. Here we have just one such setup: + +\starttyping +\startxmlsetups xml:document + \xmlsetsetup{main}{article}{xml:article} +\stopxmlsetups +\stoptyping + +When loading the document it will automatically be associated with the tag \type +{main}. The previous rule associates setup \type {xml:article} +with the \type {article} element in tree \type {main}. We need to +register this setup so that it will be applied to the document +after loading: + +\starttyping +\xmlregistersetup{xml:document} +\stoptyping + +and the document itself is processed with: + +\starttyping +\xmlprocessfile{main}{demo.xml}{} % optional setup +\stoptyping + +The setup \type {xml:article} can look as follows: + +\starttyping +\startxmlsetups xml:article + \section{\xmltext{#1}{/title}} + \xmlall{#1}{!(title|author)} + \blank author: \xmltext{#1}{/author} +\stopxmlsetups +\stoptyping + +Here \type {#1} refers to the current node in the \XML\ tree, in +this case the root element, \type {article}. The second argument +of \type {\xmltext} and \type {\xmlall} is a path expression, +comparable with \XPATH: \type {/title} means: the \type {title} +element anchored to the current root (\type{#1}), and \type +{!(title|author)} is the negation of (complement to) \type{title} +or \type {author}. Such expressions can be more complex that the +one above, like: + +\starttyping +\xmlfirst{#1}{/one/(alpha|beta)/two/text()} +\stoptyping + +which returns the content of the first element that satisfies one of +the paths (nested tree): + +\starttyping +/one/alpha/two +/one/beta/two +\stoptyping + +There is a whole bunch of commands like \type {\xmltext} that +filter content and pipe it into \TEX. These are calling \LUA\ +functions. This is no manual, so we will not discuss them here. +However, it is important to realize that we have to associate +setups (consider them free formatted macros) to at least one +element in order to get started. Also, \XML\ inclusions have to be +dealt with before assigning the setups. These are simple +one|-|line commands. You can also assign defaults to elements, +which saves some work. + +Because we can use \LUA\ to access the tree and manipulate +content, we can now implement parts of \XML\ handling in \LUA. An +example of this is dealing with so|-|called Cals tables. This is +done in approximately 150 lines of \LUA\ code, loaded at runtime in a +module. This time the association uses functions instead of setups and those +functions will pipe data back to \TEX. In the module you will find: + +\starttyping +\startxmlsetups xml:cals:process + \xmlsetfunction {\xmldocument} {cals:table} {lxml.cals.table} +\stopxmlsetups + +\xmlregistersetup{xml:cals:process} + +\xmlregisterns{cals}{cals} +\stoptyping + +These commands tell \MKIV\ that elements with a namespace +specification that contains \type {cals} will be remapped to the +internal namespace \type {cals} and the setup associates a +function with this internal namespace. + +By now it will be clear that from the perspective of the user +hardly any \LUA\ is visible. Sure, he or she can deduce that deep +down some magic takes place, especially when you run into more +complex expressions like this (the \type {@} denotes an +attribute): + +\starttyping +\xmlsetsetup + {main} {item[@type='mpctext' or @type='mrtext']} + {questions:multiple:text} +\stoptyping + +Such expressions resemble \XPATH, but can go much further than +that, just by adding more functions to the library. + +\starttyping +b[position() > 2 and position() < 5 and text() == 'ok'] +b[position() > 2 and position() < 5 and text() == upper('ok')] +b[@n=='03' or @n=='08'] +b[number(@n)>2 and number(@n)<6] +b[find(text(),'ALSO')] +\stoptyping + +Just to give you an idea \unknown\ in the module that implements +the parser you will find definitions that match the function calls +in the above expressions. + +\starttyping +xml.functions.find = string.find +xml.functions.upper = string.upper +xml.functions.number = tonumber +\stoptyping + +So much for the different approaches. It's up to the user what +method to use: stream based \MKII, tree based \MKIV, or a mixture. + +The main reason for taking \XML\ as an example of mixing \TEX\ and +\LUA\ is in that it can be a bit mind boggling if you start +thinking of what happens behind the screens. Say that we have + +\starttyping +<?xml version ='1.0'?> +<article> + <title>Whatever</title> + <author>Someone</author> + <p>some <b>bold</b> text</p> +</article> +\stoptyping + +and that we use the setup shown before with \type {article}. + +At some point, we are done with defining setups and load the +document. The first thing that happens is that the list of +manipulations is applied: file inclusions are processed first, +setups and functions are assigned next, maybe some elements are +deleted or added, etc. When that is done we serialize the tree to +\TEX, starting with the root element. When piping data to \TEX\ we +use the current catcode regime; linebreaks and spaces are honored +as usual. + +Each element can have a function (command) associated and when +this is the case, control is given to that function. In our case +the root element has such a command, one that will trigger a +setup. And so, instead of piping content to \TEX, a function is +called that lets \TEX\ expand the macro that deals with this +setup. + +However, that setup itself calls \LUA\ code that filters the title +and feeds it into the \type {\section} command, next it flushes +everything except the title and author, which again involves +calling \LUA. Last it flushes the author. The nested sequence +of events is as follows: + +\startitemize[2*broad] + + \sym{lua:} Load the document and apply setups and alike. + + \sym{lua:} Serialize the \type {article} element, but since + there is an associated setup, tell \TEX\ do expand that one + instead. + + \startitemize[2*broad] + + \sym{tex:} Execute the setup, first expand the \type {\section} + macro, but its argument is a call to \LUA. + + \startitemize[2*broad] + + \sym{lua:} Filter \type {title} from the subtree under + \type {article}, print the content to \TEX\ and return + control to \TEX. + + \stopitemize + + \sym{tex:} Tell \LUA\ to filter the paragraphs i.e.\ skip \type + {title} and \type {author}; since the \type {b} element has + no associated setup (or whatever) it is just serialized. + + \startitemize[2*broad] + + \sym{lua:} Filter the requested elements and return control + to \TEX. + + \stopitemize + + \sym{tex:} Ask \LUA\ to filter \type {author}. + + \startitemize[2*broad] + \sym{lua:} Pipe \type {author}'s content to \TEX. + \stopitemize + + \sym{tex:} We're done. + + \stopitemize + + \sym{lua:} We're done. + +\stopitemize + +This is a really simple case. In my daily work I am dealing +with rather extensive and complex educational documents where in +one source there is text, math, graphics, all kind of fancy stuff, +questions and answers in several categories and of different kinds, +either or not to be reshuffled, omitted or combined. So there +we are talking about many more levels of \TEX\ calling \LUA\ and \LUA\ +piping to \TEX\ etc. To stay in \TEX\ speak: we're dealing with +one big ongoing nested expansion (because \LUA calls expand), and +you can imagine that this somewhat stresses \TEX's input stack, but +so far I have not encountered any problems. + +\subject{some remarks} + +Here I discussed several possible applications of \LUA\ in \TEX. I +didn't mention yet that because \LUATEX\ contains a scripting engine +plus some extra libraries, it can also be used purely for that. +This means that support programs can now be written in \LUA\ and +that there are no longer dependencies of other scripting engines +being present on the system. Consider this a bonus. + +Usage in \TEX\ can be organized in four categories: + +\startitemize[n] +\item Users can use \LUA\ for generating data, do all kind of + data manipulations, maybe read data from file, etc. The + only link with \TEX\ is the print function. +\item Users can use information provided by \TEX\ and use this + when making decisions. An example is collecting data in + boxes and use \LUA\ to do calculations with the dimensions. + Another example is a converter from \METAPOST\ output to + \PDF\ literals. No real knowledge of \TEX's internals is + needed. The \MKIV\ \XML\ functionality discussed before + demonstrates this: it's mostly data processing and piping + to \TEX. Other examples are dealing with buffers, defining + character mappings, and handling error messages, verbatim + \unknown\ the list is long. +\item Users can extend \TEX's core functionality. An example is + support for \OPENTYPE\ fonts: \LUATEX\ itself does not + support this format directly, but provides ways to feed + \TEX\ with the relevant information. Support for \OPENTYPE\ + features demands manipulating node lists. Knowledge of + internals is a requirement. Advanced spacing and language + specific features are made possible by node list + manipulations and attributes. The alternative \type {\Words} + macro is an example of this. +\item Users can replace existing \TEX\ functionality. In \MKIV\ + there are numerous example of this, for instance all file + \IO\ is written in \LUA, including reading from \ZIP\ files + and remote locations. Loading and defining fonts is also + under \LUA\ control. At some point \MKIV\ will provide + dedicated splitters for multicolumn typesetting and + probably also better display spacing and display + math splitting. +\stopitemize + +The boundaries between these categories are not frozen. For +instance, support for image inclusion and \MPLIB\ in \CONTEXT\ +\MKIV\ sits between category 3 and~4. Category 3 and~4, and +probably also~2 are normally the domain of macro package writers +and more advanced users who contribute to macro packages. Because +a macropackage has to provide some stability it is not a good idea +to let users mess around with all those internals, because of +potential interference. On the other hand, normally users operate +on top of a kernel using some kind of \API\ and history has +proved that macro packages are stable enough for this. + +Sometime around 2010 the team expects \LUATEX\ to be feature +complete and stable. By that time I can probably provide a more +detailed categorization. + +\stopcomponent |