Diffstat (limited to 'doc')
15 files changed, 11598 insertions, 0 deletions
diff --git a/doc/context/documents/general/manuals/luatex.pdf b/doc/context/documents/general/manuals/luatex.pdf Binary files differnew file mode 100644 index 000000000..52f96d25a --- /dev/null +++ b/doc/context/documents/general/manuals/luatex.pdf diff --git a/doc/context/sources/general/manuals/luatex/luatex-contents.tex b/doc/context/sources/general/manuals/luatex/luatex-contents.tex new file mode 100644 index 000000000..f002716b1 --- /dev/null +++ b/doc/context/sources/general/manuals/luatex/luatex-contents.tex @@ -0,0 +1,20 @@ +\environment luatex-style +\environment luatex-logos + +\startcomponent luatex-contents + +\starttitle[title=Contents] + +\start + + \definecolor[maincolor][black] + + \placecontent + [criterium=text, + level=subsection] + +\stop + +\stoptitle + +\stopcomponent diff --git a/doc/context/sources/general/manuals/luatex/luatex-enhancements.tex b/doc/context/sources/general/manuals/luatex/luatex-enhancements.tex new file mode 100644 index 000000000..c16c66d62 --- /dev/null +++ b/doc/context/sources/general/manuals/luatex/luatex-enhancements.tex @@ -0,0 +1,708 @@ +\environment luatex-style +\environment luatex-logos + +\startcomponent luatex-enhancements + +\startchapter[reference=enhancements,title={Basic \TEX\ enhancements}] + +\section{Introduction} + +From day one, \LUATEX\ has offered extra features compared to the superset of +\PDFTEX\ and \ALEPH. That has not been limited to the possibility to execute +\LUA\ code via \type {\directlua}, but \LUATEX\ also adds functionality via new +\TEX-side primitives. + +When \LUATEX\ starts up in \quote {iniluatex} mode (\type {luatex -ini}), it +defines only the primitive commands known by \TEX82 and the one extra command +\type {\directlua}. As is fitting, a \LUA\ function has to be called to add the +extra primitives to the user environment. 
The simplest method to get access to +all of the new primitive commands is by adding this line to the format generation +file: + +\starttyping +\directlua { tex.enableprimitives('',tex.extraprimitives()) } +\stoptyping + +But be aware that the curly braces may not have the proper \type {\catcode} +assigned to them at this early time (giving a \quote {Missing number} error), so +you may need to put these assignments before the above line: + +\starttyping +\catcode `\{=1 +\catcode `\}=2 +\stoptyping + +More fine-grained control over primitives is possible; you can look up the details in +\in {section} [luaprimitives]. For simplicity's sake, this manual assumes that you +have executed the \type {\directlua} command as given above. + +The startup behavior documented above is considered stable in the sense that +there will not be backward|-|incompatible changes any more. However, we can +decide to promote some primitives to the \LUATEX\ namespace. For instance, after +version 0.80.1 we promoted some rather generic \PDFTEX\ primitives to core +\LUATEX\ ones, and the ones inherited from \ALEPH\ (\OMEGA) are also promoted. +Effectively this means that we now have the \type {tex}, \type {etex}, \type +{luatex} and \type {pdftex} (sub)sets left. + +\section{Version information} + +There are three new primitives to test the version of \LUATEX: + +\starttabulate[|l|p|p|] +\NC \bf primitive \NC \bf explanation \NC \bf value \NC \NR +\NC \type {\luatexbanner} \NC the banner reported on the command line \NC \luatexbanner \NC \NR +\NC \type {\luatexversion} \NC a combination of major and minor number \NC \the\luatexversion \NC \NR +\NC \type {\luatexrevision} \NC the revision number, the current value is \NC \luatexrevision \NC \NR +\stoptabulate + +The official \LUATEX\ version is defined as follows: + +\startitemize +\startitem + The major version is the integer result of \type {\luatexversion} divided by + 100.
The primitive is an \quote {internal variable}, so you may need to prefix + its use with \type {\the} depending on the context. +\stopitem +\startitem + The minor version is the two-digit result of \type {\luatexversion} modulo 100. +\stopitem +\startitem + The revision is given by \type {\luatexrevision}. This primitive expands to + a positive integer. +\stopitem +\startitem + The full version number consists of the major version, minor version and + revision, separated by dots. +\stopitem +\stopitemize + +\section{\UNICODE\ text support} + +Text input and output is now considered to be \UNICODE\ text, so input characters +can use the full range of \UNICODE\ ($2^{20}+2^{16}-1 = \hbox{0x10FFFF}$). Later +chapters will talk of characters and glyphs. Although these are not +interchangeable, they are closely related. During typesetting, a character is +always converted to a suitable graphic representation of that character in a +specific font. However, while processing a list of to|-|be|-|typeset nodes, its +contents may still be seen as a character. Inside \LUATEX\ there is no clear +separation between the two concepts. Because the subtype of a glyph node can be +changed in \LUA\ it is also up to the user. + +A few primitives are affected by this, all in a similar fashion: each of them has +to accommodate a larger range of acceptable numbers. For instance, \type +{\char} now accepts values between~0 and $1{,}114{,}111$. This should not be a +problem for well|-|behaved input files, but it could create incompatibilities for +input that would have generated an error when processed by older \TEX|-|based +engines. The affected commands with an altered initial (left of the equals sign) +or secondary (right of the equals sign) value are: \type {\char}, \type +{\lccode}, \type {\uccode}, \type {\catcode}, \type {\sfcode}, \type {\efcode}, +\type {\lpcode}, \type {\rpcode}, \type {\chardef}.
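+
+The extended range could be exercised along the following lines. This is only a
+sketch: the names \type {\euro} and \type {\gclef} are made up for the
+illustration, and the current font is assumed to contain the requested glyphs.
+
+\starttyping
+\chardef\euro  = "20AC  % fits in the old 16-bit range
+\chardef\gclef = "1D11E % only valid with the extended range
+\catcode "1D11E = 12    % the left-hand side accepts the same range
+\char"1D11E \gclef      % both request character U+1D11E
+\stoptyping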
+ +As far as the core engine is concerned, all input and output to text files is +\UTF-8 encoded. Input files can be pre|-|processed using the \type {reader} +callback. This will be explained in a later chapter. + +Output in byte|-|sized chunks can be achieved by using characters just outside of +the valid \UNICODE\ range, starting at the value $1{,}114{,}112$ (0x110000). When +the time comes to print a character $c\geq 1{,}114{,}112$, \LUATEX\ will actually +print the single byte corresponding to $c$ minus $1{,}114{,}112$. + +Output to the terminal uses \type {^^} notation for the lower control range +($c<32$), with the exception of \type {^^I}, \type {^^J} and \type {^^M}. These +are considered \quote {safe} and therefore printed as-is. + +Normalization of the \UNICODE\ input can be handled by a macro package during +callback processing (this will be explained in \in{section}[iocallback]). + +\section{Extended tables} + +The register numbers of all traditional \TEX\ and \ETEX\ registers can be 16-bit +numbers. The affected commands are: + +\startfourcolumns +\starttyping +\count +\dimen +\skip +\muskip +\marks +\toks +\countdef +\dimendef +\skipdef +\muskipdef +\toksdef +\insert +\box +\unhbox +\unvbox +\copy +\unhcopy +\unvcopy +\wd +\ht +\dp +\setbox +\vsplit +\stoptyping +\stopfourcolumns + +The glyph properties \type {\efcode}, \type {\lpcode} and \type {\rpcode}, +which were introduced in \PDFTEX\ and deal with font expansion (hz) and character +protruding, are also 16-bit. Because font memory management has been rewritten, +these character properties are no longer shared among font instances that +originate from the same metric file. + +\section{Attributes} + +\subsection{Attribute registers} + +Attributes are a completely new concept in \LUATEX. Syntactically, they behave a +lot like counters: attributes obey \TEX's nesting stack and can be used after +\type {\the} etc.\ just like the normal \type {\count} registers.
+ +\startsyntax +\attribute <16-bit number> <optional equals> <32-bit number>!crlf +\attributedef <csname> <optional equals> <16-bit number> +\stopsyntax + +Conceptually, an attribute is either \quote {set} or \quote {unset}. Unset +attributes have a special negative value to indicate that they are unset, that +value is the lowest legal value: \type {-"7FFFFFFF} in hexadecimal, a.k.a. +$-2147483647$ in decimal. It follows that the value \type {-"7FFFFFFF} cannot be +used as a legal attribute value, but you {\it can\/} assign \type {-"7FFFFFFF} to +\quote {unset} an attribute. All attributes start out in this \quote {unset} +state in \INITEX. + +Attributes can be used as extra counter values, but their usefulness comes mostly +from the fact that the numbers and values of all \quote {set} attributes are +attached to all nodes created in their scope. These can then be queried from any +\LUA\ code that deals with node processing. Further information about how to use +attributes for node list processing from \LUA\ is given in~\in {chapter}[nodes]. + +\subsection{Box attributes} + +Nodes typically receive the list of attributes that is in effect when they are +created. This moment can be quite asynchronous. For example: in paragraph +building, the individual line boxes are created after the \type {\par} command has +been processed, so they will receive the list of attributes that is in effect +then, not the attributes that were in effect in, say, the first or third line of +the paragraph. + +Similar situations happen in \LUATEX\ regularly. A few of the more obvious +problematic cases are dealt with: the attributes for nodes that are created +during hyphenation, kerning and ligaturing borrow their attributes from their +surrounding glyphs, and it is possible to influence box attributes directly. + +When you assemble a box in a register, the attributes of the nodes contained in +the box are unchanged when such a box is placed, unboxed, or copied. 
In this +respect attributes act the same as characters that have been converted to +references to glyphs in fonts. For instance, when you use attributes to implement +color support, each node carries information about its eventual color. In that +case, unless you implement mechanisms that deal with it, applying a color to +already boxed material will have no effect. Keep in mind that this +incompatibility is mostly due to the fact that separate specials and literals are +a more unnatural approach to colors than attributes. + +It is possible to fine-tune the list of attributes that are applied to a \type +{hbox}, \type {vbox} or \type {vtop} by the use of the keyword \type {attr}. An +example: + +\starttyping +\attribute2=5 +\setbox0=\hbox {Hello} +\setbox2=\hbox attr1=12 attr2=-"7FFFFFFF{Hello} +\stoptyping + +This will set the attribute list of box~2 to $1=12$, and the attributes of box~0 +will be $2=5$. As you can see, assigning the maximum negative value causes an +attribute to be ignored. + +The \type {attr} keyword(s) should come before a \type {to} or \type {spread}, if +that is also specified. + +\section{\LUA\ related primitives} + +\subsection{\type {\directlua}} + +In order to merge \LUA\ code with \TEX\ input, a few new primitives are needed. +The primitive \type {\directlua} is used to execute \LUA\ code immediately. The +syntax is + +\startsyntax +\directlua <general text>!crlf +\directlua name <general text> <general text>!crlf +\directlua <16-bit number> <general text> +\stopsyntax + +The last \syntax {<general text>} is expanded fully, and then fed into the \LUA\ +interpreter. After reading and expansion has been applied to the \syntax +{<general text>}, the resulting token list is converted to a string as if it was +displayed using \type {\the\toks}. On the \LUA\ side, each \type {\directlua} +block is treated as a separate chunk. 
In such a chunk you can use the \type +{local} directive to keep your variables from interfering with those used by the +macro package. + +The conversion to and from a token list means that you normally can not use \LUA\ +line comments (starting with \type {--}) within the argument. As there typically +will be only one \quote {line} the first line comment will run on until the end +of the input. You will either need to use \TEX|-|style line comments (starting +with \%), or change the \TEX\ category codes locally. Another possibility is to +say: + +\starttyping +\begingroup +\endlinechar=10 +\directlua ... +\endgroup +\stoptyping + +Then \LUA\ line comments can be used, since \TEX\ does not replace line endings +with spaces. + +The \syntax {name <general text>} specifies the name of the \LUA\ chunk, mainly +shown in the stack backtrace of error messages created by \LUA\ code. The \syntax +{<general text>} is expanded fully, thus macros can be used to generate the chunk +name, i.e. + +\starttyping +\directlua name{\jobname:\the\inputlineno} ... +\stoptyping + +to include the name of the input file as well as the input line into the chunk +name. + +Likewise, the \syntax {<16-bit number>} designates a name of a \LUA\ chunk, but +in this case the name will be taken from the \type {lua.name} array (see the +documentation of the \type {lua} table further in this manual). + +The chunk name should not start with a \type {@}, or it will be displayed as a +file name (this is a quirk in the current \LUA\ implementation). + +The \type {\directlua} command is expandable. Since it passes \LUA\ code to the +\LUA\ interpreter its expansion from the \TEX\ viewpoint is usually empty. +However, there are some \LUA\ functions that produce material to be read by \TEX, +the so called print functions. The most simple use of these is \type +{tex.print(<string> s)}. 
The characters of the string \type {s} will be placed on +the \TEX\ input buffer, that is, \quote {before \TEX's eyes} to be read by \TEX\ +immediately. For example: + +\startbuffer +\count10=20 +a\directlua{tex.print(tex.count[10]+5)}b +\stopbuffer + +\typebuffer + +expands to + +\getbuffer + +Here is another example: + +\startbuffer +$\pi = \directlua{tex.print(math.pi)}$ +\stopbuffer + +\typebuffer + +will result in + +\getbuffer + +Note that the expansion of \type {\directlua} is a sequence of characters, not of +tokens, unlike other \TEX\ commands. So formally speaking its expansion is +null, but it places material on a pseudo-file to be immediately read by \TEX, like +\ETEX's \type {\scantokens} does. For a description of print functions look at \in +{section} [sec:luaprint]. + +Because the \syntax {<general text>} is a chunk, the normal \LUA\ error handling +is triggered if there is a problem in the included code. The \LUA\ error messages +should be clear enough, but the contextual information is still quite poor. +Often, you will only see the line number of the right brace at the end of the +code. + +While on the subject of errors: some of the things you can do inside \LUA\ code +can break \LUATEX\ quite badly. If you are not careful while working with the +node list interface, you may even end up with assertion errors from within the +\TEX\ portion of the executable. + +The behavior documented in the above subsection is considered stable in the sense +that there will not be backward-incompatible changes any more. + +\subsection{\type {\latelua}} + +\type {\latelua} stores \LUA\ code in a whatsit that will be processed at the time +of shipping out. Its intended use is a cross between \type {\pdfliteral} and +\type {\write}. Within the \LUA\ code you can print \PDF\ statements directly to the +\PDF\ file via \type {pdf.print}, or you can write to other output streams via +\type {texio.write} or simply using \LUA\ I/O routines.
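+
+A minimal sketch of such a delayed write, assuming a plain-like format in which
+\type {\count0} holds the page number:
+
+\starttyping
+\latelua {
+  texio.write_nl("log", "shipping out page " .. tex.count[0])
+}
+\stoptyping
+
+The message is written when the page that contains the whatsit is shipped out,
+not at the point where \type {\latelua} was read, so \type {tex.count[0]} is
+evaluated at that later moment.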
+ +\startsyntax +\latelua <general text>!crlf +\latelua name <general text> <general text>!crlf +\latelua <16-bit number> <general text> +\stopsyntax + +Expansion of macros etcetera in the final \type {<general text>} is delayed until +just before the whatsit is executed (like in \type {\write}). With regard to the \PDF\ +output stream, \type {\latelua} behaves as \type {\pdfliteral page}. The \syntax {name +<general text>} and \syntax {<16-bit number>} behave in the same way as they do +for \type {\directlua}. + +\subsection{\type {\luaescapestring}} + +This primitive converts a \TEX\ token sequence so that it can be safely used as +the contents of a \LUA\ string: embedded backslashes, double and single quotes, +and newlines and carriage returns are escaped. This is done by prepending an +extra token consisting of a backslash with category code~12, and for the line +endings, converting them to \type {n} and \type {r} respectively. The token +sequence is fully expanded. + +\startsyntax +\luaescapestring <general text> +\stopsyntax + +Most often, this command is not actually the best way to deal with the +differences between \TEX\ and \LUA. In very short bits of \LUA\ +code it is often not needed, and for longer stretches of \LUA\ code it +is easier to keep the code in a separate file and load it using \LUA's +\type {dofile}: + +\starttyping +\directlua { dofile('mysetups.lua') } +\stoptyping + +\subsection{\type {\luafunction}} + +The \type {\directlua} command involves tokenization of its argument (after +picking up an optional name or number specification). The tokenlist is then +converted into a string and given to \LUA\ to turn into a function that is +called. The overhead is rather small but when you use this primitive hundreds or +thousands of times, it can become noticeable. For this reason there is a variant +call available: \type {\luafunction}.
This command is used as follows: + +\starttyping +\directlua { + local t = lua.get_functions_table() + t[1] = function() tex.print("!") end + t[2] = function() tex.print("?") end +} + +\luafunction1 +\luafunction2 +\stoptyping + +Of course the functions can also be defined in a separate file. There is no limit +on the number of functions apart from normal \LUA\ limitations. Of course there +is the limitation of no arguments but that would involve parsing and thereby give +no gain. The function, when called in fact gets one argument, being the index, so +in the following example the number \type {8} gets typeset. + +\starttyping +\directlua { + local t = lua.get_functions_table() + t[8] = function(slot) tex.print(slot) end +} +\stoptyping + +\section{\type {\clearmarks}} + +This primitive complements the \ETEX\ mark primitives and clears a mark class +completely, resetting all three connected mark texts to empty. It is an +immediate command. + +\startsyntax +\clearmarks <16-bit number> +\stopsyntax + +\section{\type {\noligs} and \type {\nokerns}} + +These primitives prohibit ligature and kerning insertion at the time when the +initial node list is built by \LUATEX's main control loop. They are part of a +temporary trick and will be removed in the near future. For now, you need to +enable these primitives when you want to do node list processing of \quote +{characters}, where \TEX's normal processing would get in the way. + +\startsyntax +\noligs <integer>!crlf +\nokerns <integer> +\stopsyntax + +These primitives can now be implemented by overloading the ligature building and +kerning functions, i.e.\ by assigning dummy functions to their associated +callbacks. + +\section{\type {\formatname}} + +The \type {\formatname} syntax is identical to \type {\jobname}. In \INITEX, the +expansion is empty. Otherwise, the expansion is the value that \type {\jobname} had +during the \INITEX\ run that dumped the currently loaded format. 
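+
+As an illustration, the expansion can be captured with \type {\edef} to report
+the format or to detect an \INITEX\ run. This is a sketch only: the macro names
+\type {\currentformat} and \type {\noformat} are hypothetical, and braces are
+assumed to have their usual category codes.
+
+\starttyping
+\edef\currentformat{\formatname}
+\def\noformat{}
+\ifx\currentformat\noformat
+  \message{no format loaded (INITEX run)}
+\else
+  \message{running with format: \currentformat}
+\fi
+\stoptyping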
+ +\section{\type {\scantextokens}} + +The syntax of \type {\scantextokens} is identical to \type {\scantokens}. This +primitive is a slightly adapted version of \ETEX's \type {\scantokens}. The +differences are: + +\startitemize +\startitem + The last (and usually only) line does not have a \type {\endlinechar} + appended. +\stopitem +\startitem + \type {\scantextokens} never raises an EOF error, and it does not execute + \type {\everyeof} tokens. +\stopitem +\startitem + The \quote{\unknown\ while end of file \unknown} error tests are not + executed, allowing the expansion to end on a different grouping level or + while a conditional is still incomplete. +\stopitem +\stopitemize + +\section {Alignments} + +\subsection{\tex {alignmark}} + +This primitive duplicates the functionality of \type {#} inside alignment +preambles. + +\subsection{\tex {aligntab}} + +This primitive duplicates the functionality of \type {&} inside alignments and +preambles. + +\section{Catcode tables} + +Catcode tables are a new feature that allows you to switch to a predefined +catcode regime in a single statement. You can have a practically unlimited number +of different tables. This subsystem is backward compatible: if you never use the +following commands, your document will not notice any difference in behavior +compared to traditional \TEX. The contents of each catcode table is independent +from any other catcode tables, and their contents is stored and retrieved from +the format file. + +\subsection{\type {\catcodetable}} + +\startsyntax +\catcodetable <15-bit number> +\stopsyntax + +The primitive \type {\catcodetable} switches to a different catcode table. Such a +table has to be previously created using one of the two primitives below, or it +has to be zero. Table zero is initialized by \INITEX. 
+ +\subsection{\type {\initcatcodetable}} + +\startsyntax +\initcatcodetable <15-bit number> +\stopsyntax + +The primitive \type {\initcatcodetable} creates a new table with catcodes identical +to those defined by \INITEX: + +\starttabulate[|r|l|l|l|l|] +\NC 0 \NC \type {\letterbackslash} \NC \NC \type {escape} \NC\NR +\NC 5 \NC \type {\letterhat\letterhat M} \NC return \NC \type {car_ret} \NC (this name may change) \NC\NR +\NC 9 \NC \type {\letterhat\letterhat @} \NC null \NC \type {ignore} \NC\NR +\NC 10 \NC \type {<space>} \NC space \NC \type {spacer} \NC\NR +\NC 11 \NC \type {a} -- \type {z} \NC \NC \type {letter} \NC\NR +\NC 11 \NC \type {A} -- \type {Z} \NC \NC \type {letter} \NC\NR +\NC 12 \NC everything else \NC \NC \type {other} \NC\NR +\NC 14 \NC \type {\letterpercent} \NC \NC \type {comment} \NC\NR +\NC 15 \NC \type {\letterhat\letterhat ?} \NC delete \NC \type {invalid_char} \NC\NR +\stoptabulate + +The new catcode table is allocated globally: it will not go away after the +current group has ended. If the supplied number is identical to the currently +active table, an error is raised. + +\subsection{\type {\savecatcodetable}} + +\startsyntax +\savecatcodetable <15-bit number> +\stopsyntax + +\type {\savecatcodetable} copies the current set of catcodes to a new table with +the requested number. The definitions in this new table are all treated as if +they were made in the outermost level. + +The new table is allocated globally: it will not go away after the current group +has ended. If the supplied number is the currently active table, an error is +raised. + +\section{Suppressing errors} + +\subsection{\type {\suppressfontnotfounderror}} + +\startsyntax +\suppressfontnotfounderror = 1 +\stopsyntax + +If this new integer parameter is non|-|zero, then \LUATEX\ will not complain +about font metrics that are not found. 
Instead it will silently skip the font +assignment, making the requested csname for the font \type {\ifx} equal to +\type {\nullfont}, so that it can be tested against that without bothering the user. + +\subsection{\type {\suppresslongerror}} + +\startsyntax +\suppresslongerror = 1 +\stopsyntax + +If this new integer parameter is non|-|zero, then \LUATEX\ will not complain +about \type {\par} commands encountered in contexts where that is normally +prohibited (most prominently in the arguments of non-long macros). + +\subsection{\type {\suppressifcsnameerror}} + +\startsyntax +\suppressifcsnameerror = 1 +\stopsyntax + +If this new integer parameter is non|-|zero, then \LUATEX\ will not complain +about non-expandable commands appearing in the middle of a \type {\ifcsname} +expansion. Instead, it will keep getting expanded tokens from the input until it +encounters an \type {\endcsname} command. Use with care! This command is +experimental: if the input expansion is unbalanced with respect to \type {\csname} \ldots +\type {\endcsname} pairs, the \LUATEX\ process may hang indefinitely. + +\subsection{\type {\suppressoutererror}} + +\startsyntax +\suppressoutererror = 1 +\stopsyntax + +If this new integer parameter is non|-|zero, then \LUATEX\ will not complain +about \type {\outer} commands encountered in contexts where that is normally +prohibited. + +\subsection{\type {\suppressmathparerror}} + +The following setting will permit \type {\par} tokens in a math formula: + +\startsyntax +\suppressmathparerror = 1 +\stopsyntax + +With this setting, the following code is valid: + +\starttyping +$ x + 1 = + +a $ +\stoptyping + +\section{\type {\matheqnogapstep}} + +By default \TEX\ will add one quad between the equation and the number. This is +hardcoded. A new primitive can control this: + +\startsyntax +\matheqnogapstep = 1000 +\stopsyntax + +Because a math quad from the math text font is used instead of a dimension, we +use a step to control the size. A value of zero will suppress the gap.
The step +is divided by 1000, which is the usual way to mimic floating point factors in +\TEX. + +\section{\type {\outputbox}} + +\startsyntax +\outputbox = 65535 +\stopsyntax + +This new integer parameter allows you to alter the number of the box that will be +used to store the page sent to the output routine. Its default value is 255, and +the acceptable range is from 0 to 65535. + +\section{\type {\fontid}} + +\startsyntax +\fontid\font +\stopsyntax + +This primitive expands into a number. It is not a register so there is no need to +prefix with \type {\number} (and using \type {\the} gives an error). The currently +used font id is \fontid\font. Here are some more: + +\starttabulate[|l|c|] +\NC \type {\bf} \NC \bf \fontid\font \NC \NR +\NC \type {\it} \NC \it \fontid\font \NC \NR +\NC \type {\bi} \NC \bi \fontid\font \NC \NR +\stoptabulate + +These numbers depend on the macro package used because each one has its own way +of dealing with fonts. They can also differ per run, as they can depend on the +order of loading fonts. For instance, when in \CONTEXT\ virtual math \UNICODE\ +fonts are used, we can easily get over a hundred ids in use. Not all ids have to +be bound to a real font; after all, an id is just a number. + +\section{\type {\gleaders}} + +This type of leaders is anchored to the origin of the box to be shipped out. So +they are like normal \type {\leaders} in that they align nicely, except that the +alignment is based on the {\it largest\/} enclosing box instead of the {\it +smallest\/}. The \type {g} stresses this global nature. + +\section{\type {\Uchar}} + +The expandable command \type {\Uchar} reads a number between~0 and $1{,}114{,}111$ +and expands to the associated \UNICODE\ character.
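+
+Because the command is expandable, it can be used where the unexpandable \type
+{\char} would be left untouched, for instance inside an \type {\edef}. A small
+sketch (the macro name \type {\gclef} is made up for this illustration):
+
+\starttyping
+\edef\gclef{\Uchar"1D11E} % \gclef now holds the character U+1D11E itself
+\message{\meaning\gclef}
+\stoptyping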
+ +\section{\type {\hyphenationmin}} + +This primitive can be used to set the minimal word length, so setting it to a value +of~$5$ means that only words of 6 characters and more will be hyphenated, of course +within the constraints of the \type {\lefthyphenmin} and \type {\righthyphenmin} +values (as stored in the glyph node). This primitive accepts a number and stores +the value with the language. + +\section{Debugging} + +If \type {\tracingonline} is larger than~2, the node list display will also print +the node number of the nodes. + +\section{Images and Forms} + +\LUATEX\ accepts optional dimension parameters for \type {\pdfrefximage} and +\type {\pdfrefxform} in the same format as for \type {\pdfximage}. With images, +these dimensions are then used instead of the ones given to \type {\pdfximage} +but the original dimensions are not overwritten, so that a \type {\pdfrefximage} +without dimensions still provides the image with dimensions defined by \type +{\pdfximage}. These optional parameters are not implemented for \type +{\pdfxform}. + +\starttyping +\pdfrefximage width 20mm height 10mm depth 5mm \pdflastximage +\pdfrefxform width 20mm height 10mm depth 5mm \pdflastxform +\stoptyping + +\section{File syntax} + +\LUATEX\ will accept a braced argument as a file name: + +\starttyping +\input {plain} +\openin 0 {plain} +\stoptyping + +This allows for embedded spaces, without the need for double quotes. Macro +expansion takes place inside the argument. + +\section{Font syntax} + +\LUATEX\ will accept a braced argument as a font name: + +\starttyping +\font\myfont = {cmr10} +\stoptyping + +This allows for embedded spaces, without the need for double quotes. Macro +expansion takes place inside the argument. 
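+
+The two braced forms can be sketched as follows. The macro \type {\family} and
+the file name \type {my macros} are hypothetical and only serve to show macro
+expansion and an embedded space:
+
+\starttyping
+\def\family{cmr}
+\font\bodyfont = {\family 10} at 12pt % the name expands to cmr10
+\input {my macros}                    % a file name with an embedded space
+\stoptyping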
+ +\stopchapter + +\stopcomponent diff --git a/doc/context/sources/general/manuals/luatex/luatex-fonts.tex b/doc/context/sources/general/manuals/luatex/luatex-fonts.tex new file mode 100644 index 000000000..8ea4058a6 --- /dev/null +++ b/doc/context/sources/general/manuals/luatex/luatex-fonts.tex @@ -0,0 +1,618 @@ +\environment luatex-style +\environment luatex-logos + +\startcomponent luatex-fonts + +\startchapter[reference=fonts,title={Font structure}] + +All \TEX\ fonts are represented to \LUA\ code as tables, and internally as +\CCODE~structures. All keys in the table below are saved in the internal font +structure if they are present in the table returned by the \type {define_font} +callback, or if they result from the normal \TFM|/|\VF\ reading routines if there +is no \type {define_font} callback defined. + +The column \quote {from \VF} means that this key will be created by the \type +{font.read_vf()} routine, \quote {from \TFM} means that the key will be created +by the \type {font.read_tfm()} routine, and \quote{used} means whether or not +the \LUATEX\ engine itself will do something with the key. + +The top|-|level keys in the table are as follows: + +\starttabulate[|Tl|l|l|l|l|p|] +\NC \ssbf key \NC \bf from vf \NC \bf from tfm \NC \bf used\NC \bf value type \NC + \bf description +\NC \NR +\NC name \NC yes \NC yes \NC yes \NC string \NC + metric (file) name +\NC \NR +\NC area \NC no \NC yes \NC yes \NC string \NC + (directory) location, typically empty +\NC \NR +\NC used \NC no \NC yes \NC yes \NC boolean\NC + used already? 
(initial: false) +\NC \NR +\NC characters \NC yes \NC yes \NC yes \NC table \NC + the defined glyphs of this font +\NC \NR +\NC checksum \NC yes \NC yes \NC no \NC number \NC + default: 0 +\NC \NR +\NC designsize \NC no \NC yes \NC yes \NC number \NC + expected size (default: 655360 == 10pt) +\NC \NR +\NC direction \NC no \NC yes \NC yes \NC number \NC + default: 0 (TLT) +\NC \NR +\NC encodingbytes \NC no \NC no \NC yes \NC number \NC + default: depends on \type {format} +\NC \NR +\NC encodingname \NC no \NC no \NC yes \NC string \NC + encoding name +\NC \NR +\NC fonts \NC yes \NC no \NC yes \NC table \NC + locally used fonts +\NC \NR +\NC psname \NC no \NC no \NC yes \NC string \NC + actual (\POSTSCRIPT) name (this is the PS fontname in the incoming font + source, also used as fontname identifier in the \PDF\ output, new in 0.43) +\NC \NR +\NC fullname \NC no \NC no \NC yes \NC string \NC + output font name, used as a fallback in the \PDF\ output + if the psname is not set +\NC \NR +\NC header \NC yes \NC no \NC no \NC string \NC + header comments, if any +\NC \NR +\NC hyphenchar \NC no \NC no \NC yes \NC number \NC + default: TeX's \type {\hyphenchar} +\NC \NR +\NC parameters \NC no \NC yes \NC yes \NC hash \NC + default: 7 parameters, all zero +\NC \NR +\NC size \NC no \NC yes \NC yes \NC number \NC + loaded (at) size. 
(default: same as designsize) +\NC \NR +\NC skewchar \NC no \NC no \NC yes \NC number \NC + default: TeX's \type {\skewchar} +\NC \NR +\NC type \NC yes \NC no \NC yes \NC string \NC + basic type of this font +\NC \NR +\NC format \NC no \NC no \NC yes \NC string \NC + disk format type +\NC \NR +\NC embedding \NC no \NC no \NC yes \NC string \NC + \PDF\ inclusion +\NC \NR +\NC filename \NC no \NC no \NC yes \NC string \NC + disk file name +\NC \NR +\NC tounicode \NC no \NC yes \NC yes \NC number \NC + if 1, \LUATEX\ assumes per-glyph tounicode entries are + present in the font +\NC \NR +\NC stretch \NC no \NC no \NC yes \NC number \NC + the \quote {stretch} value from \type {\expandglyphsinfont} +\NC \NR +\NC shrink \NC no \NC no \NC yes \NC number \NC + the \quote {shrink} value from \type {\expandglyphsinfont} +\NC \NR +\NC step \NC no \NC no \NC yes \NC number \NC + the \quote {step} value from \type {\expandglyphsinfont} +\NC \NR +\NC auto_expand \NC no \NC no \NC yes \NC boolean\NC + the \quote {autoexpand} keyword from\crlf \type {\expandglyphsinfont} +\NC \NR +\NC expansion_factor \NC no \NC no \NC no \NC number \NC + the actual expansion factor of an expanded font +\NC \NR +\NC attributes \NC no \NC no \NC yes \NC string \NC + the \type {\pdffontattr} +\NC \NR +\NC cache \NC no \NC no \NC yes \NC string \NC + this key controls caching of the lua table on the \type {tex} end. \type {yes}: + use a reference to the table that is passed to \LUATEX\ (this is the + default). \type {no}: don't store the table reference, don't cache any lua + data for this font. \type {renew}: don't store the table reference, but save a + reference to the table that is created at the first access to one of its + fields in font.fonts. (new in 0.40.0, before that caching was always + \type {yes}). 
Note: the saved reference is thread-local, so be careful when + you are using coroutines: an error will be thrown if the table has been + cached in one thread, but you reference it from another thread ($\approx$ + coroutine) +\NC \NR +\NC nomath \NC no \NC no \NC yes \NC boolean\NC + this key allows a minor speedup for text fonts. If it is present and true, + then \LUATEX\ will not check the character entries for math-specific keys. +\NC \NR +\NC slant \NC no \NC no \NC yes \NC number \NC + This has the same semantics as the \type {SlantFont} operator in font map + files. +\NC \NR +\NC extent \NC no \NC no \NC yes \NC number \NC + This has the same semantics as the \type {ExtendFont} operator in font map + files. +\NC \NR +\stoptabulate + +The key \type {name} is always required. The keys \type {stretch}, \type +{shrink}, \type {step} and optionally \type {auto_expand} only have meaning when +used together: they can be used to replace a post|-|loading \type +{\expandglyphsinfont} command. The \type {expansion_factor} is a value that can be +present inside a font in \type {font.fonts}. It is the actual expansion factor (a +value between \type {-shrink} and \type {stretch}, with step \type {step}) of a +font that was automatically generated by the font expansion algorithm. The key +\type {attributes} can be used to replace \type {\pdffontattr}. The key \type {used} +is set by the engine when a font is actively in use; this makes sure that the +font's definition is written to the output file (\DVI\ or \PDF). The \TFM\ reader +sets it to false. The \type {direction} is a number signalling the \quote +{normal} direction for this font.
There are sixteen possibilities: + +\starttabulate[|Tc|c|c|c|] +\NC \ssbf number \NC \bf meaning \NC \bf number \NC \bf meaning \NC\NR +\NC 0 \NC LT \NC 8 \NC TT \NC\NR +\NC 1 \NC LL \NC 9 \NC TL \NC\NR +\NC 2 \NC LB \NC 10 \NC TB \NC\NR +\NC 3 \NC LR \NC 11 \NC TR \NC\NR +\NC 4 \NC RT \NC 12 \NC BT \NC\NR +\NC 5 \NC RL \NC 13 \NC BL \NC\NR +\NC 6 \NC RB \NC 14 \NC BB \NC\NR +\NC 7 \NC RR \NC 15 \NC BR \NC\NR +\stoptabulate + +These are \OMEGA|-|style direction abbreviations: the first character indicates +the \quote {first} edge of the character glyphs (the edge that is seen first in +the writing direction), the second the \quote {top} side. + +The \type {parameters} is a hash with mixed key types. There are seven possible +string keys, as well as a number of integer indices (these start from 8 up). The +seven strings are actually used instead of the bottom seven indices, because that +gives a nicer user interface. + +The names and their internal remapping are: + +\starttabulate[|lT|c|] +\NC \ssbf name \NC \bf internal remapped number \NC\NR +\NC slant \NC 1 \NC\NR +\NC space \NC 2 \NC\NR +\NC space_stretch \NC 3 \NC\NR +\NC space_shrink \NC 4 \NC\NR +\NC x_height \NC 5 \NC\NR +\NC quad \NC 6 \NC\NR +\NC extra_space \NC 7 \NC\LR +\stoptabulate + +The keys \type {type}, \type {format}, \type {embedding}, \type {fullname} and +\type {filename} are used to embed \OPENTYPE\ fonts in the result \PDF. + +The \type {characters} table is a list of character hashes indexed by an integer +number. The number is the \quote {internal code} \TEX\ knows this character by. + +Two very special string indexes can be used also: \type {left_boundary} is a +virtual character whose ligatures and kerns are used to handle word boundary +processing. \type {right_boundary} is similar but not actually used for anything +(yet!). + +Other index keys are ignored. + +Each character hash itself is a hash. 
For example, here is the character \quote +{f} (decimal 102) in the font cmr10 at 10 points: + +\starttyping +[102] = { + ['width'] = 200250, + ['height'] = 455111, + ['depth'] = 0, + ['italic'] = 50973, + ['kerns'] = { + [63] = 50973, + [93] = 50973, + [39] = 50973, + [33] = 50973, + [41] = 50973 + }, + ['ligatures'] = { + [102] = { + ['char'] = 11, + ['type'] = 0 + }, + [108] = { + ['char'] = 13, + ['type'] = 0 + }, + [105] = { + ['char'] = 12, + ['type'] = 0 + } + } +} +\stoptyping + +The following top|-|level keys can be present inside a character hash: + +\starttabulate[|lT|c|c|c|l|p|] +\NC \ssbf key \NC \bf from vf \NC \bf from tfm \NC \bf used \NC \bf value type \NC \bf description \NC\NR +\NC width \NC yes \NC yes \NC yes \NC number \NC character's width, in sp (default 0) \NC\NR +\NC height \NC no \NC yes \NC yes \NC number \NC character's height, in sp (default 0) \NC\NR +\NC depth \NC no \NC yes \NC yes \NC number \NC character's depth, in sp (default 0) \NC\NR +\NC italic \NC no \NC yes \NC yes \NC number \NC character's italic correction, in sp (default zero) \NC\NR +\NC top_accent \NC no \NC no \NC maybe \NC number \NC character's top accent alignment place, in sp (default zero) \NC\NR +\NC bot_accent \NC no \NC no \NC maybe \NC number \NC character's bottom accent alignment place, in sp (default zero) \NC\NR +\NC left_protruding \NC no \NC no \NC maybe \NC number \NC character's \type {\lpcode} \NC\NR +\NC right_protruding \NC no \NC no \NC maybe \NC number \NC character's \type {\rpcode} \NC\NR +\NC expansion_factor \NC no \NC no \NC maybe \NC number \NC character's \type {\efcode} \NC\NR +\NC tounicode \NC no \NC no \NC maybe \NC string \NC character's Unicode equivalent(s), in UTF-16BE hexadecimal format\NC\NR +\NC next \NC no \NC yes \NC yes \NC number \NC the \quote{next larger} character index \NC\NR +\NC extensible \NC no \NC yes \NC yes \NC table \NC the constituent parts of an extensible recipe \NC\NR +\NC vert_variants \NC no \NC no \NC yes 
\NC table \NC constituent parts of a vertical variant set\NC \NR +\NC horiz_variants \NC no \NC no \NC yes \NC table \NC constituent parts of a horizontal variant set\NC \NR +\NC kerns \NC no \NC yes \NC yes \NC table \NC kerning information \NC\NR +\NC ligatures \NC no \NC yes \NC yes \NC table \NC ligaturing information \NC\NR +\NC commands \NC yes \NC no \NC yes \NC array \NC virtual font commands \NC\NR +\NC name \NC no \NC no \NC no \NC string \NC the character (\POSTSCRIPT) name \NC\NR +\NC index \NC no \NC no \NC yes \NC number \NC the (\OPENTYPE\ or \TRUETYPE) font glyph index \NC\NR +\NC used \NC no \NC yes \NC yes \NC boolean \NC typeset already (default: false)? \NC\NR +\NC mathkern \NC no \NC no \NC yes \NC table \NC math cut-in specifications \NC\NR +\stoptabulate + +The values of \type {top_accent}, \type {bot_accent} and \type {mathkern} are +used only for math accent and superscript placement, see the \at {math chapter} +[math] in this manual for details. + +The values of \type {left_protruding} and \type {right_protruding} are used only +when \type {\protrudechars} is non-zero. + +Whether or not \type {expansion_factor} is used depends on the font's global +expansion settings, as well as on the value of \type {\adjustspacing}. + +The usage of \type {tounicode} is this: if this font specifies a \type +{tounicode=1} at the top level, then \LUATEX\ will construct a \type {/ToUnicode} +entry for the \PDF\ font (or font subset) based on the character|-|level \type +{tounicode} strings, where they are available. If a character does not have a +sensible \UNICODE\ equivalent, do not provide a string either (no empty strings). + +If the font-level \type {tounicode} is not set, then \LUATEX\ will build up \type +{/ToUnicode} based on the \TEX\ code points you used, and any character-level +\type {tounicodes} will be ignored. 
{\it At the moment, the string format is +exactly the format that is expected by Adobe \CMAP\ files (\UTF-16BE in +hexadecimal encoding), minus the enclosing angle brackets. This may change in the +future.} Small example: the \type {tounicode} for a \type {fi} ligature would be +\type {00660069}. + +The presence of \type {extensible} will overrule \type {next}, if that is also +present. It in turn can be overruled by \type {vert_variants}. + +The \type {extensible} table is very simple: + +\starttabulate[|lT|l|p|] +\NC \ssbf key \NC \bf type \NC \bf description \NC\NR +\NC top \NC number \NC \quote{top} character index \NC\NR +\NC mid \NC number \NC \quote{middle} character index \NC\NR +\NC bot \NC number \NC \quote{bottom} character index \NC\NR +\NC rep \NC number \NC \quote{repeatable} character index \NC\NR +\stoptabulate + +The \type {horiz_variants} and \type {vert_variants} are arrays of components. +Each of those components is itself a hash of up to five keys: + +\starttabulate[|lT|l|p|] +\NC \ssbf key \NC \bf type \NC \bf explanation \NC\NR +\NC glyph \NC number \NC The character index (note that this is an encoding number, not a name). \NC \NR +\NC extender \NC number \NC One (1) if this part is repeatable, zero (0) otherwise. \NC \NR +\NC start \NC number \NC Maximum overlap at the starting side (in scaled points). \NC \NR +\NC end \NC number \NC Maximum overlap at the ending side (in scaled points). \NC \NR +\NC advance \NC number \NC Total advance width of this item (can be zero or missing, + then the natural size of the glyph for character \type {component} + is used). \NC \NR +\stoptabulate + +The \type {kerns} table is a hash indexed by character index (and \quote +{character index} is defined as either a non|-|negative integer or the string +value \type {right_boundary}), with the values the kerning to be applied, in +scaled points.
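
As a minimal sketch of the \type {kerns} layout just described, the plain \LUA\ fragment below builds a character hash whose \type {kerns} table mixes integer indices with the string index \type {right_boundary}. The helper \type {kern_for} and all kern values are illustrative assumptions; they are not part of \LUATEX.

```lua
-- Hypothetical character hash; the kern values (in scaled points) are made up.
local f_char = {
  width = 200250,
  kerns = {
    [39] = 50973,            -- kern applied when character 39 follows
    right_boundary = 12000,  -- kern applied at a right word boundary
  },
}

-- Illustrative helper (not a LuaTeX function): return the kern to apply when
-- `next_index` follows `char`. `next_index` is either a non-negative integer
-- or the string 'right_boundary'; a missing entry means no kerning.
local function kern_for(char, next_index)
  local kerns = char.kerns
  return (kerns and kerns[next_index]) or 0
end

print(kern_for(f_char, 39))
print(kern_for(f_char, 'right_boundary'))
print(kern_for(f_char, 65))
```

Because the table is a hash rather than an array, absent entries cost nothing, and a character without a \type {kerns} table is handled by the same lookup.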
+ +The \type {ligatures} table is a hash indexed by character index (and \quote +{character index} is defined as either a non|-|negative integer or the string +value \type {right_boundary}), with the values being yet another small hash, with +two fields: + +\starttabulate[|lT|l|p|] +\NC \ssbf key \NC \bf type \NC \bf description \NC \NR +\NC type \NC number \NC the type of this ligature command, default 0 \NC \NR +\NC char \NC number \NC the character index of the resultant ligature \NC \NR +\stoptabulate + +The \type {char} field in a ligature is required. + +The \type {type} field inside a ligature is the numerical or string value of one +of the eight possible ligature types supported by \TEX. When \TEX\ inserts a new +ligature, it puts the new glyph in the middle of the left and right glyphs. The +original left and right glyphs can optionally be retained, and when at least one +of them is kept, it is also possible to move the new \quote {insertion point} +forward one or two places. The glyph that ends up to the right of the insertion +point will become the next \quote {left}. + +\starttabulate[|l|c|l|l|] +\NC \bf textual (Knuth) \NC \bf number \NC \bf string \NC result \NC\NR +\NC l + r =: n \NC 0 \NC \type {=:} \NC \|n \NC\NR +\NC l + r =:\| n \NC 1 \NC \type {=:|} \NC \|nr \NC\NR +\NC l + r \|=: n \NC 2 \NC \type {|=:} \NC \|ln \NC\NR +\NC l + r \|=:\| n \NC 3 \NC \type {|=:|} \NC \|lnr \NC\NR +\NC l + r =:\|\> n \NC 5 \NC \type {=:|>} \NC n\|r \NC\NR +\NC l + r \|=:\> n \NC 6 \NC \type {|=:>} \NC l\|n \NC\NR +\NC l + r \|=:\|\> n \NC 7 \NC \type {|=:|>} \NC l\|nr \NC\NR +\NC l + r \|=:\|\>\> n \NC 11 \NC \type {|=:|>>} \NC ln\|r \NC\NR +\stoptabulate + +The default value is~0, and can be left out. That signifies a \quote {normal} +ligature where the ligature replaces both original glyphs. In this table the~\| +indicates the final insertion point. + +The \type {commands} array is explained below. 
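
The correspondence between the string and numeric forms of the ligature \type {type} can be written down as a small lookup table. The following is only a sketch in plain \LUA: the codes are transcribed from the table above, while the helper \type {ligature} is a hypothetical convenience function, not something \LUATEX\ provides.

```lua
-- String forms of the eight ligature types and their numeric codes,
-- transcribed from the table above.
local ligature_types = {
  ['=:']     = 0,
  ['=:|']    = 1,
  ['|=:']    = 2,
  ['|=:|']   = 3,
  ['=:|>']   = 5,
  ['|=:>']   = 6,
  ['|=:|>']  = 7,
  ['|=:|>>'] = 11,
}

-- Illustrative helper (not a LuaTeX function): build one ligature entry.
-- `kind` may be a number, one of the strings above, or nil (default 0).
local function ligature(char, kind)
  local t = kind or 0
  if type(t) == 'string' then
    -- 0 is truthy in Lua, so assert only fails for unknown strings.
    t = assert(ligature_types[t], 'unknown ligature type: ' .. t)
  end
  return { char = char, type = t }
end

-- The ligatures of 'f' in cmr10 as shown earlier: ff, fi and fl.
local f_ligatures = {
  [102] = ligature(11),
  [105] = ligature(12, '=:'),
  [108] = ligature(13, 0),
}
```

Normalizing to numbers is a design choice of this sketch only; according to the text above, the \type {type} field may hold either form.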
+ +\section {Real fonts} + +Whether or not a \TEX\ font is a \quote {real} font that should be written to the +\PDF\ document is decided by the \type {type} value in the top|-|level font +structure. If the value is \type {real}, then this is a proper font, and the +inclusion mechanism will attempt to add the needed font object definitions to the +\PDF. + +Values for \type {type}: + +\starttabulate[|Tl|p|] +\NC \ssbf value \NC \bf description \NC\NR +\NC real \NC this is a base font \NC\NR +\NC virtual \NC this is a virtual font \NC\NR +\stoptabulate + +The actions to be taken depend on a number of different variables: + +\startitemize[packed] +\startitem + Whether the used font fits in an 8-bit encoding scheme or not. +\stopitem +\startitem + The type of the disk font file. +\stopitem +\startitem + The level of embedding requested. +\stopitem +\stopitemize + +A font that uses anything other than an 8-bit encoding vector has to be written +to the \PDF\ in a different way. + +The rule is: if the font table has \type {encodingbytes} set to~2, then this is a +wide font, in all other cases it isn't. The value~2 is the default for \OPENTYPE\ +and \TRUETYPE\ fonts loaded via \LUA. For \TYPEONE\ fonts, you have to set \type +{encodingbytes} to~2 explicitly. For \PK\ bitmap fonts, wide font encoding is not +supported at all. + +If no special care is needed, \LUATEX\ currently falls back to the +mapfile|-|based solution used by \PDFTEX\ and \DVIPS. This behavior will be +removed in the future, when the existing code becomes integrated in the new +subsystem. + +But if this is a \quote {wide} font, then the new subsystem kicks in, and some +extra fields have to be present in the font structure. In this case, \LUATEX\ +does not use a map file at all. + +The extra fields are: \type {format}, \type {embedding}, \type {fullname}, \type +{cidinfo} (as explained above), \type {filename}, and the \type {index} key in +the separate characters. 
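
The wide font rule and the list of extra fields can be turned into a small sanity check to run over a font table before returning it to the engine. This is a sketch under stated assumptions: the function names are hypothetical, the sample font table is invented, and treating every listed field as mandatory is a simplification made here, not a statement about what \LUATEX\ enforces.

```lua
-- Top-level fields the text lists for wide fonts. The per-character `index`
-- key is not checked in this sketch.
local wide_fields = { 'format', 'embedding', 'fullname', 'cidinfo', 'filename' }

-- A font is wide only if encodingbytes is 2; in all other cases it is not.
local function is_wide(font)
  return font.encodingbytes == 2
end

-- Hypothetical helper: list the wide-font fields that are still missing.
local function missing_wide_fields(font)
  local missing = {}
  if not is_wide(font) then
    return missing
  end
  for _, key in ipairs(wide_fields) do
    if font[key] == nil then
      missing[#missing + 1] = key
    end
  end
  return missing
end

-- Invented example table; the names do not refer to a real font.
local example = {
  name          = 'example',
  encodingbytes = 2,
  format        = 'opentype',
  embedding     = 'subset',
  fullname      = 'Example-Regular',
  filename      = 'example.otf',
}

print(table.concat(missing_wide_fields(example), ', '))
```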
+ +Values for \type {format} are: + +\starttabulate[|Tl|p|] +\NC \ssbf value \NC \bf description \NC \NR +\NC type1 \NC this is a \POSTSCRIPT\ \TYPEONE\ font \NC \NR +\NC type3 \NC this is a bitmapped (\PK) font \NC \NR +\NC truetype \NC this is a \TRUETYPE\ or \TRUETYPE|-|based \OPENTYPE\ font \NC \NR +\NC opentype \NC this is a \POSTSCRIPT|-|based \OPENTYPE\ font \NC \NR +\stoptabulate + +\type {type3} fonts are provided for backward compatibility only, and do not +support the new wide encoding options. + +Values for \type {embedding} are: + +\starttabulate[|Tl|p|] +\NC \ssbf value \NC \bf description \NC \NR +\NC no \NC don't embed the font at all \NC \NR +\NC subset \NC include and attempt to subset the font \NC \NR +\NC full \NC include this font in its entirety \NC \NR +\stoptabulate + +It is not possible to artificially modify the transformation matrix +for the font at the moment. + +The other fields are used as follows: The \type {fullname} will be the +\POSTSCRIPT|/|\PDF\ font name. The \type {cidinfo} will be used as the character +set (the CID \type {/Ordering} and \type {/Registry} keys). The \type {filename} +points to the actual font file. If you include the full path in the \type +{filename} or if the file is in the local directory, \LUATEX\ will run a little +bit more efficiently because it will not have to re|-|run the \type {find_xxx_file} +callback in that case. + +Be careful: when mixing old and new fonts in one document, it is possible to +create \POSTSCRIPT\ name clashes that can result in printing errors. When this +happens, you have to change the \type {fullname} of the font. + +Typeset strings are written out in a wide format using 2~bytes per glyph, using +the \type {index} key in the character information as value. The overall effect +is like having an encoding based on numbers instead of traditional (\POSTSCRIPT) +name|-|based reencoding.
The way to get the correct \type {index} numbers for +\TYPEONE\ fonts is by loading the font via \type {fontloader.open}; use the table +indices as \type {index} fields. + +This type of reencoding means that there is no longer a clear connection between +the text in your input file and the strings in the output \PDF\ file. Dealing +with this is high on the agenda. + +\section[virtualfonts]{Virtual fonts} + +You have to take the following steps if you want \LUATEX\ to treat the returned +table from \type {define_font} as a virtual font: + +\startitemize[packed] +\startitem + Set the top|-|level key \type {type} to \type {virtual}. +\stopitem +\startitem + Make sure there is at least one valid entry in \type {fonts} (see below). +\stopitem +\startitem + Give a \type {commands} array to every character (see below). +\stopitem +\stopitemize + +The presence of the toplevel \type {type} key with the specific value \type +{virtual} will trigger handling of the rest of the special virtual font fields in +the table, but the mere existence of 'type' is enough to prevent \LUATEX\ from +looking for a virtual font on its own. + +Therefore, this also works \quote {in reverse}: if you are absolutely certain +that a font is not a virtual font, assigning the value \type {base} or \type +{real} to \type {type} will inhibit \LUATEX\ from looking for a virtual font +file, thereby saving you a disk search. + +The \type {fonts} is another \LUA\ array. The values are one- or two|-|key +hashes themselves, each entry indicating one of the base fonts in a virtual font. +In case your font is referring to itself, you can use the \type {font.nextid()} +function which returns the index of the next to be defined font which is probably +the currently defined one. 
+ +An example makes this easy to understand: + +\starttyping +fonts = { + { name = 'ptmr8a', size = 655360 }, + { name = 'psyr', size = 600000 }, + { id = 38 } +} +\stoptyping + +says that the first referenced font (index 1) in this virtual font is \type +{ptmr8a} loaded at 10pt, and the second is \type {psyr} loaded at a little over +9pt. The third one is a previously defined font that is known to \LUATEX\ as fontid +\quote {38}. + +The array index numbers are used by the character command definitions that are +part of each character. + +The \type {commands} array is a hash where each item is another small array, +with the first entry representing a command and the extra items being the +parameters to that command. The allowed commands and their arguments are: + +\starttabulate[|Tl|l|l|p|] +\NC \ssbf command name \NC \bf arguments \NC \bf arg type \NC \bf description \NC\NR +\NC font \NC 1 \NC number \NC select a new font from the local \type {fonts} table\NC\NR +\NC char \NC 1 \NC number \NC typeset this character number from the current font, + and move right by the character's width\NC\NR +\NC node \NC 1 \NC node \NC output this node (list), and move right + by the width of this list\NC\NR +\NC slot \NC 2 \NC number \NC a shortcut for the combination of a font and char command\NC\NR +\NC push \NC 0 \NC \NC save current position\NC\NR +\NC nop \NC 0 \NC \NC do nothing \NC\NR +\NC pop \NC 0 \NC \NC pop position \NC\NR +\NC rule \NC 2 \NC 2 numbers \NC output a rule $ht*wd$, and move right.\NC\NR +\NC down \NC 1 \NC number \NC move down on the page\NC\NR +\NC right \NC 1 \NC number \NC move right on the page\NC\NR +\NC special \NC 1 \NC string \NC output a \type {\special} command\NC\NR +\NC lua \NC 1 \NC string \NC execute a \LUA\ script (at \type {\latelua} time)\NC\NR +\NC image \NC 1 \NC image \NC output an image (the argument can be either an \type + {<image>} variable or an \type {image_spec} table)\NC\NR +\NC comment \NC any \NC any \NC the arguments of this
command are ignored\NC\NR +\stoptabulate + +Here is a rather elaborate glyph commands example: + +\starttyping +... +commands = { + { 'push' }, -- remember where we are + { 'right', 5000 }, -- move right about 0.08pt + { 'font', 3 }, -- select the fonts[3] entry + { 'char', 97 }, -- place character 97 (ASCII 'a') + { 'pop' }, -- go all the way back + { 'down', -200000 }, -- move upwards by about 3pt + { 'special', 'pdf: 1 0 0 rg' }, -- switch to red color + { 'rule', 500000, 20000 }, -- draw a bar + { 'special','pdf: 0 g' } -- back to black +} +... +\stoptyping + +The default value for \type {font} is always~1 at the start of the +\type {commands} array. Therefore, if the virtual font is essentially only a +re|-|encoding, then you usually do not have to create an explicit \quote {font} +command in the array. + +Rules inside of \type {commands} arrays are built up using only two dimensions: +they do not have depth. For correct vertical placement, an extra \type {down} +command may be needed. + +Regardless of the amount of movement you create within the \type {commands}, the +output pointer will always move by exactly the width that was given in the \type +{width} key of the character hash. Any movements that take place inside the \type +{commands} array are ignored on the upper level. + +\subsection{Artificial fonts} + +Even in a \quote {real} font, there can be virtual characters. When \LUATEX\ +encounters a \type {commands} field inside a character when it becomes time to +typeset the character, it will interpret the commands, just like for a true +virtual character. In this case, if you have created no \quote {fonts} array, +then the default (and only) \quote {base} font is taken to be the current font +itself. In practice, this means that you can create virtual duplicates of +existing characters which is useful if you want to create composite characters. + +Note: this feature does {\it not\/} work the other way around.
There can not be +\quote {real} characters in a virtual font! You cannot use this technique for +font re-encoding either; you need a truly virtual font for that (because +characters that are already present cannot be altered). + +\subsection{Example virtual font} + +Finally, here is a plain \TEX\ input file with a virtual font demonstration: + +\startbuffer +\directlua { + callback.register('define_font', + function (name,size) + if name == 'cmr10-red' then + f = font.read_tfm('cmr10',size) + f.name = 'cmr10-red' + f.type = 'virtual' + f.fonts = {{ name = 'cmr10', size = size }} + for i,v in pairs(f.characters) do + if (string.char(i)):find('[tacohanshartmut]') then + v.commands = { + {'special','pdf: 1 0 0 rg'}, + {'char',i}, + {'special','pdf: 0 g'}, + } + else + v.commands = {{'char',i}} + end + end + else + f = font.read_tfm(name,size) + end + return f + end + ) +} + +\font\myfont = cmr10-red at 10pt \myfont This is a line of text \par +\font\myfontx= cmr10 at 10pt \myfontx Here is another line of text \par +\stopbuffer + +\typebuffer + +% \getbuffer + +\stopchapter + +\stopcomponent diff --git a/doc/context/sources/general/manuals/luatex/luatex-introduction.tex b/doc/context/sources/general/manuals/luatex/luatex-introduction.tex new file mode 100644 index 000000000..23b921129 --- /dev/null +++ b/doc/context/sources/general/manuals/luatex/luatex-introduction.tex @@ -0,0 +1,86 @@ +\environment luatex-style +\environment luatex-logos + +\startcomponent luatex-introduction + +\startchapter[title=Introduction] + +This book will eventually become the reference manual of \LUATEX. At the moment, +it simply reports the behavior of the executable matching the snapshot or beta +release date in the title page. + +Features may come and go. The current version of \LUATEX\ can be used for +production (in fact it is used in production by the authors) but users cannot +depend on complete stability, nor on functionality staying the same. 
This means +that when you update your binary, you also need to check if something fundamental +has changed. Normally this is communicated in articles or messages to a mailing +list. We're still not at version 1 but when we reach that state the interface +will be stable. Of course we then can decide to move towards version 2 with +different properties. + +Don't expect \LUATEX\ to behave the same as \PDFTEX ! Although the core +functionality of that 8 bit engine is present, \LUATEX\ can behave differently, and +not only due to its 32 bit character: there is native \UTF\ input, support for wide +fonts, and the math machinery is tuned for \OPENTYPE\ math. Also, the log output +can differ (and will likely differ more as we move forward). + +\LUATEX\ consists of a number of interrelated but (still) distinguishable parts. +The organization of the source code is adapted so that it can glue all these +components together. We continue cleaning up side effects of the accumulated +code in \TEX\ engines (especially code that is not needed any longer). + +\startitemize[packed] + \startitem + Most of \PDFTEX\ version 1.40.9, converted to C (with patches from later + releases). Some experimental features have been removed and some utility + macros are not inherited as their functionality can be done in \LUA. We + still use the \type {\pdf*} primitive namespace. + \stopitem + \startitem + The direction model and some other bits from \ALEPH\ RC4 (derived from + \OMEGA) are included. The related primitives are part of core \LUATEX. + \stopitem + \startitem + We currently use \LUA\ 5.2.*. At some point we might decide to move to + 5.3.* but that is yet to be decided. + \stopitem + \startitem + There are a few \LUA\ libraries that we consider part of the core \LUA\ + machinery. + \stopitem + \startitem + There are additional \LUA\ libraries that interface to the internals of + \TEX.
+ \stopitem + \startitem + There are various \TEX\ extensions but only those that cannot be done + using the \LUA\ interfaces. + \stopitem + \startitem + The fontloader uses parts of \FONTFORGE\ 2008.11.17 combined with + additional code specific for usage in a \TEX\ engine. + \stopitem + \startitem + The \METAPOST\ library. + \stopitem +\stopitemize + +Neither \ALEPH's I/O translation processes, nor tcx files, nor \ENCTEX\ can be +used; these encoding|-|related functions are superseded by a \LUA|-|based +solution (reader callbacks). + +The yearly \TEXLIVE\ version is the stable version; any version between them is +considered beta. Keep in mind that new (or changed) features also need to be +reflected in the macro package that you use. + +\blank[3*big] + +\starttabulate +\NC \LUATEX \EQ Version \number\luatexversion.\luatexrevision \NC \NR +\NC \CONTEXT \EQ \contextversion \NC \NR +\NC timestamp \EQ \currentdate \NC \NR +\stoptabulate + +\stopchapter + +\stopcomponent diff --git a/doc/context/sources/general/manuals/luatex/luatex-languages.tex b/doc/context/sources/general/manuals/luatex/luatex-languages.tex new file mode 100644 index 000000000..56978b0fd --- /dev/null +++ b/doc/context/sources/general/manuals/luatex/luatex-languages.tex @@ -0,0 +1,514 @@ +\environment luatex-style +\environment luatex-logos + +\startcomponent luatex-languages + +\startchapter[reference=languages,title={Languages and characters, fonts and glyphs}] + +\LUATEX's internal handling of the characters and glyphs that eventually become +typeset is quite different from the way \TEX82 handles those same objects. The +easiest way to explain the difference is to focus on unrestricted horizontal mode +(i.e.\ paragraphs) and hyphenation first. Later on, it will be easy to deal +with the differences that occur in horizontal and math modes. + +In \TEX82, the characters you type are converted into \type {char_node} records +when they are encountered by the main control loop.
\TEX\ attaches and processes +the font information while creating those records, so that the resulting \quote +{horizontal list} contains the final forms of ligatures and implicit kerning. +This packaging is needed because we may want to get the effective width of for +instance a horizontal box. + +When it becomes necessary to hyphenate words in a paragraph, \TEX\ converts (one +word at a time) the \type {char_node} records into a string array by replacing +ligatures with their components and ignoring the kerning. Then it runs the +hyphenation algorithm on this string, and converts the hyphenated result back +into a \quote {horizontal list} that is consecutively spliced back into the +paragraph stream. Keep in mind that the paragraph may contain unboxed horizontal +material, which then already contains ligatures and kerns and the words therein +are part of the hyphenation process. + +The \type {char_node} records are somewhat misnamed, as they are glyph positions +in specific fonts, and therefore not really \quote {characters} in the linguistic +sense. There is no language information inside the \type {char_node} records. +Instead, language information is passed along using \type {language whatsit} +records inside the horizontal list. + +In \LUATEX, the situation is quite different. The characters you type are always +converted into \type {glyph_node} records with a special subtype to identify them +as being intended as linguistic characters. \LUATEX\ stores the needed language +information in those records, but does not do any font|-|related processing at +the time of node creation. It only stores the index of the current font. + +When it becomes necessary to typeset a paragraph, \LUATEX\ first inserts all +hyphenation points right into the whole node list.
Next, it processes all the +font information in the whole list (creating ligatures and adjusting kerning), +and finally it adjusts all the subtype identifiers so that the records are \quote +{glyph nodes} from now on. + +That was the broad overview. The rest of this chapter will deal with the minutiae +of the new process. + +\section[charsandglyphs]{Characters and glyphs} + +\TEX82 (including \PDFTEX) differentiates between \type {char_node}s and \type +{lig_node}s. The former are simple items that contained nothing but a \quote +{character} and a \quote {font} field, and they lived in the same memory as +tokens did. The latter also contained a list of components, and a subtype +indicating whether this ligature was the result of a word boundary, and it was +stored in the same place as other nodes like boxes and kerns and glues. + +In \LUATEX, these two types are merged into one, somewhat larger structure called +a \type {glyph_node}. Besides having the old character, font, and component +fields, and the new special fields like \quote {attr} +(see~\in{section}[glyphnodes]), these nodes also contain: + +\startitemize + +\startitem A subtype, split into four main types: + + \startitemize + \startitem + \type {character}, for characters to be hyphenated: the lowest bit + (bit 0) is set to 1. + \stopitem + \startitem + \type {glyph}, for specific font glyphs: the lowest bit (bit 0) is + not set. 
+ \stopitem + \startitem + \type {ligature}, for ligatures (bit 1 is set) + \stopitem + \startitem + \type {ghost}, for \quote {ghost objects} (bit 2 is set) + \stopitem + \stopitemize + + The latter two make further use of two extra fields (bits 3 and 4): + + \startitemize + \startitem + \type {left}, for ligatures created from a left word boundary and for + ghosts created from \type {\leftghost} + \stopitem + \startitem + \type {right}, for ligatures created from a right word boundary and + for ghosts created from \type {\rightghost} + \stopitem + \stopitemize + + For ligatures, both bits can be set at the same time (in case of a + single|-|glyph word). + +\stopitem + +\startitem + \type {glyph_node}s of type \quote {character} also contain language data, + split into four items that were current when the node was created: the + \type {\setlanguage} (15 bits), \type {\lefthyphenmin} (8 bits), \type + {\righthyphenmin} (8 bits), and \type {\uchyph} (1 bit). +\stopitem + +\stopitemize + +Incidentally, \LUATEX\ allows 16383 separate languages, and words can be 256 +characters long. + +The new primitive \type {\hyphenationmin} can be used to signal the minimal length +of a word. This value is stored with the (current) language. + +Because the \type {\uchyph} value is saved in the actual nodes, its handling is +subtly different from \TEX82: changes to \type {\uchyph} become effective +immediately, not at the end of the current partial paragraph. + +Typeset boxes now always have their language information embedded in the nodes +themselves, so there is no longer a possible dependency on the surrounding +language settings. In \TEX82, a mid-paragraph statement like \type {\unhbox0} would +process the box using the current paragraph language unless there was a +\type {\setlanguage} issued inside the box. In \LUATEX, all language variables are +already frozen.
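
The subtype bits listed earlier in this section can be decoded with a few lines of plain \LUA. This is a minimal sketch that assumes the bit layout exactly as given there (bit 0 for character, bit 1 for ligature, bit 2 for ghost, bits 3 and 4 for left and right); the function names are hypothetical and the code does not touch real nodes.

```lua
-- Test bit n (0 = lowest) of a non-negative integer using plain arithmetic,
-- so the sketch does not depend on a particular bit library.
local function bit_set(value, n)
  return math.floor(value / 2^n) % 2 == 1
end

-- Hypothetical helper: turn a glyph node subtype number into named flags.
local function describe_subtype(subtype)
  return {
    character = bit_set(subtype, 0),      -- to be hyphenated
    glyph     = not bit_set(subtype, 0),  -- specific font glyph
    ligature  = bit_set(subtype, 1),
    ghost     = bit_set(subtype, 2),
    left      = bit_set(subtype, 3),      -- left boundary or \leftghost
    right     = bit_set(subtype, 4),      -- right boundary or \rightghost
  }
end

-- A ligature formed from a single-glyph word: bits 1, 3 and 4 are set.
local flags = describe_subtype(2 + 8 + 16)
print(flags.ligature, flags.left, flags.right)
```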
+ +\section{The main control loop} + +In \LUATEX's main loop, almost all input characters that are to be typeset are +converted into \type {glyph} node records with subtype \quote {character}, but +there are a few exceptions. + +First, the \type {\accent} primitive creates nodes with subtype \quote {glyph} +instead of \quote {character}: one for the actual accent and one for the +accentee. The primary reason for this is that \type {\accent} in \TEX82 is +explicitly dependent on the current font encoding, so it would not make much +sense to attach a new meaning to the primitive's name, as that would invalidate +many old documents and macro packages. A secondary reason is that in \TEX82, +\type {\accent} prohibits hyphenation of the current word. Since in \LUATEX\ +hyphenation only takes place on \quote {character} nodes, it is possible to +achieve the same effect. + +This change of meaning did happen with \type {\char}, which now generates \quote +{glyph} nodes with a character subtype. In traditional \TEX\ there was a strong +relationship between the 8|-|bit input encoding, hyphenation and glyphs taken +from a font. In \LUATEX\ we have \UTF\ input, and in most cases this maps +directly to a character in a font, apart from glyph replacement in the font +engine. If you want to access arbitrary glyphs in a font directly you can always +use \LUA\ to do so, because fonts are available as \LUA\ tables. + +Second, all the results of processing in math mode eventually become nodes with +\quote {glyph} subtypes. + +Third, the \ALEPH|-|derived commands \type {\leftghost} and \type {\rightghost} +create nodes of a third subtype: \quote {ghost}. These nodes are ignored +completely by all further processing until the stage where inter|-|glyph kerning +is added. + +Fourth, automatic discretionaries are handled differently. \TEX82 inserts an +empty discretionary after sensing an input character that matches the \type +{\hyphenchar} in the current font.
This test is wrong, in our opinion: whether or +not hyphenation takes place should not depend on the current font, it is a +language property. + +In \LUATEX, it works like this: if \LUATEX\ senses a string of input characters +that matches the value of the new integer parameter \type {\exhyphenchar}, it will +insert an explicit discretionary after that series of nodes. Initex sets the \type +{\exhyphenchar=`\-}. Incidentally, this is a global parameter instead of a +language-specific one because it may be useful to change the value depending on +the document structure instead of the text language. + +The insertion of discretionaries after a sequence of explicit hyphens happens at +the same time as the other hyphenation processing, {\it not\/} inside the main +control loop. + +The only use \LUATEX\ has for \type {\hyphenchar} is at the check whether a word +should be considered for hyphenation at all. If the \type {\hyphenchar} of the font +attached to the first character node in a word is negative, then hyphenation of +that word is abandoned immediately. {\bf This behavior is added for backward +compatibility only, and the use of \type {\hyphenchar=-1} as a means of +preventing hyphenation should not be used in new \LUATEX\ documents.} + +Fifth, \type {\setlanguage} no longer creates whatsits. The meaning of \type +{\setlanguage} is changed so that it is now an integer parameter like all others. +That integer parameter is used in \type {\glyph_node} creation to add language +information to the glyph nodes. In conjunction, the \type {\language} primitive is +extended so that it always also updates the value of \type {\setlanguage}. + +Sixth, the \type {\noboundary} command (this command prohibits word boundary +processing where that would normally take place) now does create whatsits. These +whatsits are needed because the exact place of the \type {\noboundary} command in +the input stream has to be retained until after the ligature and font processing +stages. 
+ +Finally, there is no longer a \type {main_loop} label in the code. Remember that +\TEX82 did quite a lot of processing while adding \type {char_nodes} to the +horizontal list? For speed reasons, it handled that processing code outside of +the \quote {main control} loop, and only the first character of any \quote {word} +was handled by that \quote {main control} loop. In \LUATEX, there is no longer a +need for that (all hard work is done later), and the (now very small) bits of +character|-|handling code have been moved back inline. When \type +{\tracingcommands} is on, this is visible because the full word is reported, +instead of just the initial character. + +\section[patternsexceptions]{Loading patterns and exceptions} + +The hyphenation algorithm in \LUATEX\ is quite different from the one in \TEX82, +although it uses essentially the same user input. + +After expansion, the argument for \type {\patterns} has to be proper \UTF8 with +individual patterns separated by spaces, no \type {\char} or \type {\chardef}d +commands are allowed. The current implementation is even more strict, and will +reject all non|-|\UNICODE\ characters, but that will be changed in the future. +For now, the generated errors are a valuable tool in discovering font-encoding +specific pattern files. + +Likewise, the expanded argument for \type {\hyphenation} also has to be proper +\UTF8, but here a tiny little bit of extra syntax is provided: + +\startitemize[n] +\startitem + Three sets of arguments in curly braces (\type {{}{}{}}) indicates a desired + complex discretionary, with arguments as in \type {\discretionary}'s command in + normal document input. +\stopitem +\startitem + A \type {-} indicates a desired simple discretionary, cf.\ \type {\-} and \type + {\discretionary{-}{}{}} in normal document input. +\stopitem +\startitem + Internal command names are ignored. 
This rule is provided especially for \type + {\discretionary}, but it also helps to deal with \type {\relax} commands that + may sneak in. +\stopitem +\startitem + An \type {=} indicates a (non|-|discretionary) hyphen in the document input. +\stopitem +\stopitemize + +The expanded argument is first converted back to a space-separated string while +dropping the internal command names. This string is then converted into a +dictionary by a routine that creates key|-|value pairs by converting the other +listed items. It is important to note that the keys in an exception dictionary +can always be generated from the values. Here are a few examples: + +\starttabulate[|l|l|l|] +\NC \ssbf value \NC \ssbf implied key (input) \NC \ssbf effect \NC\NR +\NC \type {ta-ble} \NC table \NC \type {ta\-ble} ($=$ \type {ta\discretionary{-}{}{}ble}) \NC\NR +\NC \type {ba{k-}{}{c}ken} \NC backen \NC \type {ba\discretionary{k-}{}{c}ken} \NC\NR +\stoptabulate + +The resultant patterns and exception dictionary will be stored under the language +code that is the present value of \type {\language}. + +In the last line of the table, you see there is no \type {\discretionary} command +in the value: the command is optional in the \TEX-based input syntax. The +underlying reason for that is that it is conceivable that a whole dictionary of +words is stored as a plain text file and loaded into \LUATEX\ using one of the +functions in the \LUA\ \type {lang} library. This loading method is quite a bit +faster than going through the \TEX\ language primitives, but some (most?) of that +speed gain would be lost if it had to interpret command sequences while doing so. + +It is possible to specify extra hyphenation points in compound words by using +\type {{-}{}{-}} for the explicit hyphen character (replace \type {-} by the +actual explicit hyphen character if needed). 
For example, this matches the word +\quote {multi|-|word|-|boundaries} and allows an extra break in between \quote +{boun} and \quote {daries}: + +\starttyping +\hyphenation{multi{-}{}{-}word{-}{}{-}boun-daries} +\stoptyping + +The motivation behind the \ETEX\ extension \type {\savinghyphcodes} was that +hyphenation heavily depended on font encodings. This is no longer true in +\LUATEX, and the corresponding primitive is ignored pending complete removal. The +future semantics of \type {\uppercase} and \type {\lowercase} are still under +consideration, no changes have taken place yet. + +\section{Applying hyphenation} + +The internal structures \LUATEX\ uses for the insertion of discretionaries in +words are very different from the ones in \TEX82, and that means there are some +noticeable differences in handling as well. + +First and foremost, there is no \quote {compressed trie} involved in hyphenation. +The algorithm still reads \PATGEN-generated pattern files, but \LUATEX\ uses a +finite state hash to match the patterns against the word to be hyphenated. This +algorithm is based on the \quote {libhnj} library used by \OPENOFFICE, which in +turn is inspired by \TEX. The memory allocation for this new implementation is +completely dynamic, so the \WEBC\ setting for \type {trie_size} is ignored. + +Differences between \LUATEX\ and \TEX82 that are a direct result of that: + +\startitemize +\startitem + \LUATEX\ happily hyphenates the full \UNICODE\ character range. +\stopitem +\startitem + Pattern and exception dictionary size is limited by the available memory + only, all allocations are done dynamically. The trie|-|related settings in + \type {texmf.cnf} are ignored. +\stopitem +\startitem + Because there is no \quote {trie preparation} stage, language patterns never + become frozen. This means that the primitive \type {\patterns} (and its \LUA\ + counterpart \type {lang.patterns}) can be used at any time, not only in + ini\TEX.
+\stopitem +\startitem + Only the string representation of \type {\patterns} and \type {\hyphenation} is + stored in the format file. At format load time, they are simply + re|-|evaluated. It follows that there is no real reason to preload languages + in the format file. In fact, it is usually not a good idea to do so. It is + much smarter to load patterns no sooner than the first time they are actually + needed. +\stopitem +\startitem + \LUATEX\ uses the language-specific variables \type {\prehyphenchar} and \type + {\posthyphenchar} in the creation of implicit discretionaries, instead of + \TEX82's \type {\hyphenchar}, and the values of the language|-|specific variables + \type {\preexhyphenchar} and \type {\postexhyphenchar} for explicit + discretionaries (instead of \TEX82's empty discretionary). +\stopitem +\startitem + The values of the two counters related to hyphenation, \type {hyphenpenalty} + and \type {exhyphenpenalty}, are now stored in the discretionary nodes. This + permits a local overload for explicit \type {\discretionary} commands. The + value current when the hyphenation pass is applied is used. When no callbacks + are used this is compatible with traditional \TEX. When you apply the \LUA\ + \type {lang.hyphenate} function the current values are used. +\stopitem +\stopitemize + +Inserted characters and ligatures inherit their attributes from the nearest glyph +node item (usually the preceding one, but the following one for the items +inserted at the left-hand side of a word). + +Word boundaries are no longer implied by font switches, but by language switches. +One word can have two separate fonts and still be hyphenated correctly (but it +cannot have two different languages: the \type {\setlanguage} command forces a +word boundary). + +All languages start out with \type {\prehyphenchar=`\-}, \type {\posthyphenchar=0}, +\type {\preexhyphenchar=0} and \type {\postexhyphenchar=0}.
When you assign the +values of one of these four parameters, you are actually changing the settings +for the current \type {\language}, this behavior is compatible with \type {\patterns} +and \type {\hyphenation}. + +\LUATEX\ also hyphenates the first word in a paragraph. Words can be up to 256 +characters long (up from 64 in \TEX82). Longer words generate an error right now, +but eventually either the limitation will be removed or perhaps it will become +possible to silently ignore the excess characters (this is what happens in +\TEX82, but there the behavior cannot be controlled). + +If you are using the \LUA\ function \type {lang.hyphenate}, you should be aware +that this function expects to receive a list of \quote {character} nodes. It will +not operate properly in the presence of \quote {glyph}, \quote {ligature}, or +\quote {ghost} nodes, nor does it know how to deal with kerning. In the near +future, it will be able to skip over \quote {ghost} nodes, and we may add a less +fuzzy function you can call as well. + +The hyphenation exception dictionary is maintained as key|-|value hash, and that +is also dynamic, so the \type {hyph_size} setting is not used either. + +\section{Applying ligatures and kerning} + +After all possible hyphenation points have been inserted in the list, \LUATEX\ +will process the list to convert the \quote {character} nodes into \quote {glyph} +and \quote {ligature} nodes. This is actually done in two stages: first all +ligatures are processed, then all kerning information is applied to the result +list. But those two stages are somewhat dependent on each other: If the used font +makes it possible to do so, the ligaturing stage adds virtual \quote {character} +nodes to the word boundaries in the list. While doing so, it removes and +interprets \type {noboundary} nodes. The kerning stage deletes those word +boundary items after it is done with them, and it does the same for \quote +{ghost} nodes. 
Finally, at the end of the kerning stage, all remaining \quote +{character} nodes are converted to \quote {glyph} nodes. + +This work separation is worth mentioning because, if you overrule from \LUA\ only +one of the two callbacks related to font handling, then you have to make sure you +perform the tasks normally done by \LUATEX\ itself in order to make sure that the +other, non|-|overruled, routine continues to function properly. + +Work in this area is not yet complete, but most of the possible cases are handled +by our rewritten ligaturing engine. We are working hard to make sure all of the +possible inputs will become supported soon. + +For example, take the word \type {office}, hyphenated \type {of-fice}, using a +\quote {normal} font with all the \type {f}-\type {f} and \type {f}-\type {i} +type ligatures: + +\starttabulate[|l|l|] +\NC Initial: \NC \type {{o}{f}{f}{i}{c}{e}} \NC\NR +\NC After hyphenation: \NC \type {{o}{f}{{-},{},{}}{f}{i}{c}{e}} \NC\NR +\NC First ligature stage: \NC \type {{o}{{f-},{f},{<ff>}}{i}{c}{e}} \NC\NR +\NC Final result: \NC \type {{o}{{f-},{<fi>},{<ffi>}}{c}{e}} \NC\NR +\stoptabulate + +That's bad enough, but let us assume that there is also a hyphenation point +between the \type {f} and the \type {i}, to create \type {of-f-ice}. Then the +final result should be: + +\starttyping +{o}{{f-}, + {{f-}, + {i}, + {<fi>}}, + {{<ff>-}, + {i}, + {<ffi>}}}{c}{e} +\stoptyping + +with discretionaries in the post-break text as well as in the replacement text of +the top-level discretionary that resulted from the first hyphenation point. 
+ +Here is that nested solution again, in a different representation: + +\starttabulate[|l|l|l|l|] +\NC \NC pre \NC post \NC replace \NC \NR +\NC topdisc \NC \type {f-}$^1$ \NC sub1 \NC sub2 \NC \NR +\NC sub1 \NC \type {f-}$^2$ \NC \type {i}$^3$ \NC \type {<fi>}$^4$ \NC \NR +\NC sub2 \NC \type {<ff>-}$^5$\NC \type {i}$^6$ \NC \type {<ffi>}$^7$ \NC \NR +\stoptabulate + +When line breaking is choosing its breakpoints, the following fields will +eventually be selected: + +\starttabulate[|l|l|l|] +\NC \type {of-f-ice} \NC \type {f-}$^1$ \NC \NR +\NC \NC \type {f-}$^2$ \NC \NR +\NC \NC \type {i}$^3$ \NC \NR +\NC \type {of-fice} \NC \type {f-}$^1$ \NC \NR +\NC \NC \type {<fi>}$^4$ \NC \NR +\NC \type {off-ice} \NC \type {<ff>-}$^5$ \NC \NR +\NC \NC \type {i}$^6$ \NC \NR +\NC \type {office} \NC \type {<ffi>}$^7$ \NC \NR +\stoptabulate + +The current solution in \LUATEX\ is not able to handle nested discretionaries, +but it is in fact smart enough to handle this fictional \type {of-f-ice} example. +It does so by combining two sequential discretionary nodes as if they were a +single object (where the second discretionary node is treated as an extension of +the first node). + +One can observe that the \type {of-f-ice} and \type {off-ice} cases both end with +the same actual post replacement list (\type {i}), and that this would be the +case even if that \type {i} was the first item of a potential following ligature +like \type {ic}. This allows \LUATEX\ to do away with one of the fields, and thus +make the whole stuff fit into just two discretionary nodes. 
+ +The mapping of the seven list fields to the six fields in this discretionary node +pair is as follows: + +\starttabulate[|l|p|] +\NC \bf field \NC \bf description \NC \NR +\NC \type {disc1.pre} \NC \type {f-}$^1$ \NC \NR +\NC \type {disc1.post} \NC \type {<fi>}$^4$ \NC \NR +\NC \type {disc1.replace} \NC \type {<ffi>}$^7$ \NC \NR +\NC \type {disc2.pre} \NC \type {f-}$^2$ \NC \NR +\NC \type {disc2.post} \NC \type {i}$^{3{,}6}$\NC \NR +\NC \type {disc2.replace} \NC \type {<ff>-}$^5$\NC \NR +\stoptabulate + +What is actually generated after ligaturing has been applied is therefore: + +\starttyping +{o}{{f-}, + {<fi>}, + {<ffi>}} + {{f-}, + {i}, + {<ff>-}}{c}{e} +\stoptyping + +The two discretionaries have different subtypes from a discretionary appearing on +its own: the first has subtype 4, and the second has subtype 5. The need for +these special subtypes stems from the fact that not all of the fields appear in +their \quote {normal} location. The second discretionary especially looks odd, +with things like the \type {<ff>-} appearing in \type {disc2.replace}. The fact +that some of the fields have different meanings (and different processing code +internally) is what makes it necessary to have different subtypes: this enables +\LUATEX\ to distinguish this sequence of two joined discretionary nodes from the +case of two standalone discretionaries appearing in a row. + +Of course there is still that relationship with fonts: ligatures can be implemented by +mapping a sequence of glyphs onto one glyph, but also by selective replacement and +kerning. This means that the above examples are just representing the traditional +approach. + +\section{Breaking paragraphs into lines} + +This code is still almost unchanged, but because of the above|-|mentioned changes +with respect to discretionaries and ligatures, line breaking will potentially be +different from traditional \TEX. 
The actual line breaking code is still based on +the \TEX82 algorithms, and it does not expect there to be discretionaries inside +of discretionaries. + +But that situation is now fairly common in \LUATEX, due to the changes to the +ligaturing mechanism. And also, the \LUATEX\ discretionary nodes are implemented +slightly different from the \TEX82 nodes: the \type {no_break} text is now +embedded inside the disc node, where previously these nodes kept their place in +the horizontal list (the discretionary node contained a counter indicating how +many nodes to skip). + +The combined effect of these two differences is that \LUATEX\ does not always use +all of the potential breakpoints in a paragraph, especially when fonts with many +ligatures are used. + +\stopchapter + +\stopcomponent diff --git a/doc/context/sources/general/manuals/luatex/luatex-libraries.tex b/doc/context/sources/general/manuals/luatex/luatex-libraries.tex new file mode 100644 index 000000000..df03e348d --- /dev/null +++ b/doc/context/sources/general/manuals/luatex/luatex-libraries.tex @@ -0,0 +1,6199 @@ +\environment luatex-style +\environment luatex-logos + +% HH: to be checked + +\startcomponent luatex-libraries + +\startchapter[reference=libraries,title={\LUATEX\ \LUA\ Libraries}] + +The implied use of the built|-|in \LUA\ modules \type {epdf}, \type {fontloader}, +\type {mplib}, and \type {pdfscanner} is deprecated. If you want to use these, +please start your source file with a proper \type {require} line. In the future, +\LUATEX\ will switch to loading these modules on demand. + +The interfacing between \TEX\ and \LUA\ is facilitated by a set of library +modules. The \LUA\ libraries in this chapter are all defined and initialized by +the \LUATEX\ executable. Together, they allow \LUA\ scripts to query and change a +number of \TEX's internal variables, run various internal \TEX\ functions, and +set up \LUATEX's hooks to execute \LUA\ code. + +The following sections are in alphabetical order. 
+ +\section{The \type {callback} library} + +This library has functions that register, find and list callbacks. Callbacks are +\LUA\ functions that are called in well|-|defined places. There are two kinds of +callbacks: those that mix with existing functionality, and those that (when +enabled) replace functionality. In most cases the second category is expected to +behave similarly to the built|-|in functionality because the next step expects +specific data. For instance, you can replace the hyphenation routine. The +function gets a list that can be hyphenated (or not). The final list should be +valid and is (normally) used for constructing a paragraph. Another function can +replace the ligature builder and|/|or kerner. Doing something else is possible +but in the end might not give the user the expected outcome. + +The first thing you need to do is to register a callback: + +\startfunctioncall +id, error = callback.register (<string> callback_name, <function> func) +id, error = callback.register (<string> callback_name, nil) +id, error = callback.register (<string> callback_name, false) +\stopfunctioncall + +Here the \syntax {callback_name} is a predefined callback name, see below. The +function returns the internal \type {id} of the callback or \type {nil}, if the +callback could not be registered. In the latter case, \type {error} contains an +error message, otherwise it is \type {nil}. + +\LUATEX\ internalizes the callback function in such a way that it does not matter +if you redefine a function accidentally. + +Callback assignments are always global. You can use the special value \type {nil} +instead of a function for clearing the callback. + +For some minor speed gain, you can assign the boolean \type {false} to the +non|-|file related callbacks, doing so will prevent \LUATEX\ from executing +whatever it would execute by default (when no callback function is registered at +all).
Be warned: this may cause all sorts of grief unless you know {\em exactly} +what you are doing! + +Currently, callbacks are not dumped into the format file. + +\startfunctioncall +<table> info = callback.list() +\stopfunctioncall + +The keys in the table are the known callback names, the value is a boolean where +\type {true} means that the callback is currently set (active). + +\startfunctioncall +<function> f = callback.find (callback_name) +\stopfunctioncall + +If the callback is not set, \type {callback.find} returns \type {nil}. + +\subsection{File discovery callbacks} + +The behavior documented in this subsection is considered stable in the sense that +there will not be backward|-|incompatible changes any more. + +\subsubsection{\type {find_read_file} and \type {find_write_file}} + +Your callback function should have the following conventions: + +\startfunctioncall +<string> actual_name = function (<number> id_number, <string> asked_name) +\stopfunctioncall + +Arguments: + +\startitemize + +\sym{id_number} + +This number is zero for the log or \type {\input} files. For \TEX's \type {\read} +or \type {\write} the number is incremented by one, so \type {\read0} becomes~1. + +\sym{asked_name} + +This is the user|-|supplied filename, as found by \type {\input}, \type {\openin} +or \type {\openout}. + +\stopitemize + +Return value: + +\startitemize + +\sym{actual_name} + +This is the filename used. For the very first file that is read in by \TEX, you +have to make sure you return an \type {actual_name} that has an extension and +that is suitable for use as \type {jobname}. If you don't, you will have to +manually fix the name of the log file and output file after \LUATEX\ is finished, +and an eventual format filename will become mangled. That is because these file +names depend on the jobname. + +You have to return \type {nil} if the file cannot be found. 
+ +\stopitemize + +\subsubsection{\type {find_font_file}} + +Your callback function should have the following conventions: + +\startfunctioncall +<string> actual_name = function (<string> asked_name) +\stopfunctioncall + +The \type {asked_name} is an \OTF\ or \TFM\ font metrics file. + +Return \type {nil} if the file cannot be found. + +\subsubsection{\type {find_output_file}} + +Your callback function should have the following conventions: + +\startfunctioncall +<string> actual_name = function (<string> asked_name) +\stopfunctioncall + +The \type {asked_name} is the \PDF\ or \DVI\ file for writing. + +\subsubsection{\type {find_format_file}} + +Your callback function should have the following conventions: + +\startfunctioncall +<string> actual_name = function (<string> asked_name) +\stopfunctioncall + +The \type {asked_name} is a format file for reading (the format file for writing +is always opened in the current directory). + +\subsubsection{\type {find_vf_file}} + +Like \type {find_font_file}, but for virtual fonts. This applies to both \ALEPH's +\OVF\ files and traditional Knuthian \VF\ files. + +\subsubsection{\type {find_map_file}} + +Like \type {find_font_file}, but for map files. + +\subsubsection{\type {find_enc_file}} + +Like \type {find_font_file}, but for enc files. + +\subsubsection{\type {find_sfd_file}} + +Like \type {find_font_file}, but for subfont definition files. + +\subsubsection{\type {find_pk_file}} + +Like \type {find_font_file}, but for pk bitmap files. The argument \type +{asked_name} is a bit special in this case. Its form is + +\starttyping +<base res>dpi/<fontname>.<actual res>pk +\stoptyping + +So you may be asked for \type {600dpi/manfnt.720pk}. It is up to you to find a +\quote {reasonable} bitmap file to go with that specification. + +\subsubsection{\type {find_data_file}} + +Like \type {find_font_file}, but for embedded files (\type {\pdfobj file '...'}). 
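
As an illustration of the \type {find_pk_file} naming scheme described above,
the following is a minimal sketch, not the engine's own lookup. The helper name
\type {parse_pk_name} and the choice to look only for a file called \type
{<fontname>.<actual res>pk} in the current directory are assumptions made for
this example; a real callback would search wherever your bitmaps live.

\starttyping
-- Hypothetical helper: split a request such as "600dpi/manfnt.720pk"
-- into base resolution, font name and actual resolution.
local function parse_pk_name(asked_name)
    local base, font, actual =
        asked_name:match("^(%d+)dpi/(.+)%.(%d+)pk$")
    if not base then
        return nil
    end
    return tonumber(base), font, tonumber(actual)
end

-- Sketch of a find_pk_file callback (assumption: bitmaps are stored
-- in the current directory as <fontname>.<actual res>pk).
local function find_pk_file(asked_name)
    local base, font, actual = parse_pk_name(asked_name)
    if not base then
        return nil
    end
    local candidate = string.format("%s.%dpk", font, actual)
    local f = io.open(candidate, "rb")
    if f then
        f:close()
        return candidate
    end
    return nil -- nil signals that the file cannot be found
end

-- The callback table only exists inside LuaTeX.
if callback then
    callback.register("find_pk_file", find_pk_file)
end
\stoptyping

The parsing is kept separate from the lookup so that the \quote {reasonable}
fallback policy (for instance trying a nearby resolution) can be changed
without touching the name handling.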
+ +\subsubsection{\type {find_opentype_file}} + +Like \type {find_font_file}, but for \OPENTYPE\ font files. + +\subsubsection{\type {find_truetype_file} and \type {find_type1_file}} + +Your callback function should have the following conventions: + +\startfunctioncall +<string> actual_name = function (<string> asked_name) +\stopfunctioncall + +The \type {asked_name} is a font file. This callback is called while \LUATEX\ is +building its internal list of needed font files, so the actual timing may +surprise you. Your return value is later fed back into the matching \type +{read_file} callback. + +Strangely enough, \type {find_type1_file} is also used for \OPENTYPE\ (\OTF) +fonts. + +\subsubsection{\type {find_image_file}} + +Your callback function should have the following conventions: + +\startfunctioncall +<string> actual_name = function (<string> asked_name) +\stopfunctioncall + +The \type {asked_name} is an image file. Your return value is used to open a file +from the harddisk, so make sure you return something that is considered the name +of a valid file by your operating system. + +\subsection[iocallback]{File reading callbacks} + +The behavior documented in this subsection is considered stable in the sense that +there will not be backward-incompatible changes any more. + +\subsubsection{\type {open_read_file}} + +Your callback function should have the following conventions: + +\startfunctioncall +<table> env = function (<string> file_name) +\stopfunctioncall + +Argument: + +\startitemize + +\sym{file_name} + +The filename returned by a previous \type {find_read_file} or the return value of +\type {kpse.find_file()} if there was no such callback defined. + +\stopitemize + +Return value: + +\startitemize + +\sym{env} + +This is a table containing at least one required and one optional callback +function for this file. 
The required field is \type {reader} and the associated +function will be called once for each new line to be read, the optional one is +\type {close} that will be called once when \LUATEX\ is done with the file. + +\LUATEX\ never looks at the rest of the table, so you can use it to store your +private per|-|file data. Both the callback functions will receive the table as +their only argument. + +\stopitemize + +\subsubsubsection{\type {reader}} + +\LUATEX\ will run this function whenever it needs a new input line from the file. + +\startfunctioncall +function(<table> env) + return <string> line +end +\stopfunctioncall + +Your function should return either a string or \type {nil}. The value \type {nil} +signals that the end of file has occurred, and will make \TEX\ call the optional +\type {close} function next. + +\subsubsubsection{\type {close}} + +\LUATEX\ will run this optional function when it decides to close the file. + +\startfunctioncall +function(<table> env) +end +\stopfunctioncall + +Your function should not return any value. + +\subsubsection{General file readers} + +There is a set of callbacks for the loading of binary data files. These all use +the same interface: + +\startfunctioncall +function(<string> name) + return <boolean> success, <string> data, <number> data_size +end +\stopfunctioncall + +The \type {name} will normally be a full path name as it is returned by either +one of the file discovery callbacks or the internal version of \type +{kpse.find_file()}. + +\startitemize + +\sym{success} + +Return \type {false} when a fatal error occurred (e.g.\ when the file cannot be +found, after all). + +\sym{data} + +The bytes comprising the file. + +\sym{data_size} + +The length of the \type {data}, in bytes. + +\stopitemize + +Return an empty string and zero if the file was found but there was a +reading problem. 
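
The two reading interfaces described above can be sketched as follows. This is
only an illustration built on the standard \LUA\ \type {io} library. The
function names and the \type {handle} field are hypothetical, and two return
values are assumptions because the text does not specify them: \type {nil} from
\type {open_read_file} when the file cannot be opened, and \type {true} as the
\type {success} value when the file was found but could not be read.

\starttyping
-- Line-based reading: returns the env table with reader and close.
local function open_read_file(file_name)
    local handle = io.open(file_name, "r")
    if not handle then
        return nil -- assumption: behaviour on failure is not documented
    end
    return {
        handle = handle, -- private per-file data, ignored by LuaTeX
        reader = function(env)
            -- returns nil at end of file, which ends the reading
            return env.handle:read("*l")
        end,
        close = function(env)
            env.handle:close()
        end,
    }
end

-- Binary reading: success flag, data, and data size in bytes.
local function read_binary_file(name)
    local f = io.open(name, "rb")
    if not f then
        return false, "", 0 -- fatal: the file cannot be opened
    end
    local data = f:read("*a")
    f:close()
    if not data then
        return true, "", 0 -- assumption: found, but a reading problem
    end
    return true, data, #data
end

-- The callback table only exists inside LuaTeX.
if callback then
    callback.register("open_read_file", open_read_file)
    callback.register("read_data_file", read_binary_file)
end
\stoptyping

Because both \type {reader} and \type {close} receive the table itself, the
file handle can live in the table and no global state is needed.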
+ +The list of functions is as follows: + +\starttabulate[|l|p|] +\NC \type {read_font_file} \NC ofm or tfm files \NC \NR +\NC \type {read_vf_file} \NC virtual fonts \NC \NR +\NC \type {read_map_file} \NC map files \NC \NR +\NC \type {read_enc_file} \NC encoding files \NC \NR +\NC \type {read_sfd_file} \NC subfont definition files \NC \NR +\NC \type {read_pk_file} \NC pk bitmap files \NC \NR +\NC \type {read_data_file} \NC embedded files (\type {\pdfobj file ...}) \NC \NR +\NC \type {read_truetype_file} \NC \TRUETYPE\ font files \NC \NR +\NC \type {read_type1_file} \NC \TYPEONE\ font files \NC \NR +\NC \type {read_opentype_file} \NC \OPENTYPE\ font files \NC \NR +\stoptabulate + +\subsection{Data processing callbacks} + +\subsubsection{\type {process_input_buffer}} + +This callback allows you to change the contents of the line input buffer just +before \LUATEX\ actually starts looking at it. + +\startfunctioncall +function(<string> buffer) + return <string> adjusted_buffer +end +\stopfunctioncall + +If you return \type {nil}, \LUATEX\ will pretend like your callback never +happened. You can gain a small amount of processing time from that. + +This callback does not replace any internal code. + +\subsubsection{\type {process_output_buffer}} + +This callback allows you to change the contents of the line output buffer just +before \LUATEX\ actually starts writing it to a file as the result of a \type +{\write} command. It is only called for output to an actual file (that is, +excluding the log, the terminal, and \type {\write18} calls). + +\startfunctioncall +function(<string> buffer) + return <string> adjusted_buffer +end +\stopfunctioncall + +If you return \type {nil}, \LUATEX\ will pretend like your callback never +happened. You can gain a small amount of processing time from that. + +This callback does not replace any internal code. 
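
A small sketch of a \type {process_input_buffer} callback follows. It removes a
\UTF8 byte order mark from the start of a line and returns \type {nil} for
every other line, so that \LUATEX\ keeps the buffer unchanged. The function
name \type {strip_bom} is hypothetical, and stripping the mark is merely one
plausible use of the callback, not something the engine requires.

\starttyping
-- The three bytes EF BB BF form the UTF-8 byte order mark.
local bom = "\239\187\191"

local function strip_bom(buffer)
    if buffer:sub(1, 3) == bom then
        return buffer:sub(4) -- adjusted buffer
    end
    return nil -- unchanged: LuaTeX acts as if no callback ran
end

-- The callback table only exists inside LuaTeX.
if callback then
    callback.register("process_input_buffer", strip_bom)
end
\stoptyping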
+ +\subsubsection{\type {process_jobname}} + +This callback allows you to change the jobname given by \type {\jobname} in \TEX\ +and \type {tex.jobname} in Lua. It does not affect the internal job name or the +name of the output or log files. + +\startfunctioncall +function(<string> jobname) + return <string> adjusted_jobname +end +\stopfunctioncall + +The only argument is the actual job name; you should not use \type {tex.jobname} +inside this function or infinite recursion may occur. If you return \type {nil}, +\LUATEX\ will pretend your callback never happened. + +This callback does not replace any internal code. + +\subsubsection{\type {token_filter}} + +This callback allows you to replace the way \LUATEX\ fetches lexical tokens. + +\startfunctioncall +function() + return <table> token +end +\stopfunctioncall + +The calling convention for this callback is a bit more complicated than for most +other callbacks. The function should either return a \LUA\ table representing a +valid to|-|be|-|processed token or tokenlist, or something else like \type {nil} +or an empty table. + +If your \LUA\ function does not return a table representing a valid token, it +will be immediately called again, until it eventually does return a useful token +or tokenlist (or until you reset the callback value to nil). See the description +of \type {token} for some handy functions to be used in conjunction with this +callback. + +If your function returns a single usable token, then that token will be processed +by \LUATEX\ immediately. If the function returns a token list (a table consisting +of a list of consecutive token tables), then that list will be pushed to the +input stack at a completely new token list level, with its token type set to +\quote {inserted}. In either case, the returned token(s) will not be fed back +into the callback function. + +Setting this callback to \type {false} has no effect (because otherwise nothing +would happen, forever). 
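
To round off the data processing callbacks, here is a hedged sketch for the
\type {process_jobname} callback described earlier in this subsection. It
replaces runs of whitespace in the reported job name by underscores; the
function name \type {adjust_jobname} and the replacement policy are
assumptions chosen for illustration only. Note that it works on its argument
and never consults \type {tex.jobname}.

\starttyping
local function adjust_jobname(jobname)
    -- gsub returns the new string and the number of replacements
    local adjusted, count = jobname:gsub("%s+", "_")
    if count == 0 then
        return nil -- nothing changed: behave as if no callback ran
    end
    return adjusted
end

-- The callback table only exists inside LuaTeX.
if callback then
    callback.register("process_jobname", adjust_jobname)
end
\stoptyping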
+ +\subsection{Node list processing callbacks} + +The description of nodes and node lists is in~\in{chapter}[nodes]. + +\subsubsection{\type {buildpage_filter}} + +This callback is called whenever \LUATEX\ is ready to move stuff to the main +vertical list. You can use this callback to do specialized manipulation of the +page building stage like imposition or column balancing. + +\startfunctioncall +function(<string> extrainfo) +end +\stopfunctioncall + +The string \type {extrainfo} gives some additional information about what \TEX's +state is with respect to the \quote {current page}. The possible values are: + +\starttabulate[|lT|p|] +\NC \ssbf value \NC \bf explanation \NC \NR +\NC alignment \NC a (partial) alignment is being added \NC \NR +\NC after_output \NC an output routine has just finished \NC \NR +\NC box \NC a typeset box is being added \NC \NR +%NC pre_box \NC interline material is being added \NC \NR +%NC adjust \NC \type {\vadjust} material is being added \NC \NR +\NC new_graf \NC the beginning of a new paragraph \NC \NR +\NC vmode_par \NC \type {\par} was found in vertical mode \NC \NR +\NC hmode_par \NC \type {\par} was found in horizontal mode \NC \NR +\NC insert \NC an insert is added \NC \NR +\NC penalty \NC a penalty (in vertical mode) \NC \NR +\NC before_display \NC immediately before a display starts \NC \NR +\NC after_display \NC a display is finished \NC \NR +\NC end \NC \LUATEX\ is terminating (it's all over) \NC \NR +\stoptabulate + +This callback does not replace any internal code. + +\subsubsection{\type {pre_linebreak_filter}} + +This callback is called just before \LUATEX\ starts converting a list of nodes +into a stack of \type {\hbox}es, after the addition of \type {\parfillskip}. + +\startfunctioncall +function(<node> head, <string> groupcode) + return true | false | <node> newhead +end +\stopfunctioncall + +The string called \type {groupcode} identifies the nodelist's context within +\TEX's processing. 
The range of possibilities is given in the table below, but +not all of those can actually appear in \type {pre_linebreak_filter}; some are +for the \type {hpack_filter} and \type {vpack_filter} callbacks that are +explained further on. + +\starttabulate[|lT|p|] +\NC \ssbf value \NC \bf explanation \NC \NR +\NC <empty> \NC main vertical list \NC \NR +\NC hbox \NC \type {\hbox} in horizontal mode \NC \NR +\NC adjusted_hbox \NC \type {\hbox} in vertical mode \NC \NR +\NC vbox \NC \type {\vbox} \NC \NR +\NC vtop \NC \type {\vtop} \NC \NR +\NC align \NC \type {\halign} or \type {\valign} \NC \NR +\NC disc \NC discretionaries \NC \NR +\NC insert \NC packaging an insert \NC \NR +\NC vcenter \NC \type {\vcenter} \NC \NR +\NC local_box \NC \type {\localleftbox} or \type {\localrightbox} \NC \NR +\NC split_off \NC top of a \type {\vsplit} \NC \NR +\NC split_keep \NC remainder of a \type {\vsplit} \NC \NR +\NC align_set \NC alignment cell \NC \NR +\NC fin_row \NC alignment row \NC \NR +\stoptabulate + +As for all the callbacks that deal with nodes, the return value can be one of +three things: + +\startitemize +\startitem + boolean \type {true} signals successful processing +\stopitem +\startitem + \type {<node>} signals that the \quote {head} node should be replaced by the + returned node +\stopitem +\startitem + boolean \type {false} signals that the \quote {head} node list should be + ignored and flushed from memory +\stopitem +\stopitemize + +This callback does not replace any internal code. + +\subsubsection{\type {linebreak_filter}} + +This callback replaces \LUATEX's line breaking algorithm. + +\startfunctioncall +function(<node> head, <boolean> is_display) + return <node> newhead +end +\stopfunctioncall + +The returned node is the head of the list that will be added to the main vertical +list. The boolean argument is \type {true} if this paragraph is interrupted by a +following math display.
+ +If you return something that is not a \type {<node>}, \LUATEX\ will apply the +internal linebreak algorithm on the list that starts at \type {<head>}. +Otherwise, the \type {<node>} you return is supposed to be the head of a list of +nodes that are all allowed in vertical mode, and at least one of those has to +represent a hbox. Failure to do so will result in a fatal error. + +Setting this callback to \type {false} is possible, but dangerous, because it is +possible you will end up in an unfixable \quote {deadcycles loop}. + +\subsubsection{\type {post_linebreak_filter}} + +This callback is called just after \LUATEX\ has converted a list of nodes into a +stack of \type {\hbox}es. + +\startfunctioncall +function(<node> head, <string> groupcode) + return true | false | <node> newhead +end +\stopfunctioncall + +This callback does not replace any internal code. + +\subsubsection{\type {hpack_filter}} + +This callback is called when \TEX\ is ready to start boxing some horizontal mode +material. Math items and line boxes are ignored at the moment. + +\startfunctioncall +function(<node> head, <string> groupcode, <number> size, + <string> packtype [, <string> direction]) + return true | false | <node> newhead +end +\stopfunctioncall + +The \type {packtype} is either \type {additional} or \type {exactly}. If \type +{additional}, then the \type {size} is a \type {\hbox spread ...} argument. If +\type {exactly}, then the \type {size} is a \type {\hbox to ...}. In both cases, +the number is in scaled points. + +The \type {direction} is either one of the three-letter direction specifier +strings, or \type {nil}. + +This callback does not replace any internal code. + +\subsubsection{\type {vpack_filter}} + +This callback is called when \TEX\ is ready to start boxing some vertical mode +material. Math displays are ignored at the moment. + +This function is very similar to the \type {hpack_filter}. 
Besides the fact +that it is called at different moments, there is an extra variable that matches +\TEX's \type {\maxdepth} setting. + +\startfunctioncall +function(<node> head, <string> groupcode, <number> size, <string> + packtype, <number> maxdepth [, <string> direction]) + return true | false | <node> newhead +end +\stopfunctioncall + +This callback does not replace any internal code. + +\subsubsection{\type {pre_output_filter}} + +This callback is called when \TEX\ is ready to start boxing the box 255 for \type +{\output}. + +\startfunctioncall +function(<node> head, <string> groupcode, <number> size, <string> packtype, + <number> maxdepth [, <string> direction]) + return true | false | <node> newhead +end +\stopfunctioncall + +This callback does not replace any internal code. + +\subsubsection{\type {hyphenate}} + +\startfunctioncall +function(<node> head, <node> tail) +end +\stopfunctioncall + +No return values. This callback has to insert discretionary nodes in the node +list it receives. + +Setting this callback to \type {false} will prevent the internal discretionary +insertion pass. + +\subsubsection{\type {ligaturing}} + +\startfunctioncall +function(<node> head, <node> tail) +end +\stopfunctioncall + +No return values. This callback has to apply ligaturing to the node list it +receives. + +You don't have to worry about return values because the \type {head} node that is +passed on to the callback is guaranteed not to be a glyph_node (if need be, a +temporary node will be prepended), and therefore it cannot be affected by the +mutations that take place. After the callback, the internal value of the \quote +{tail of the list} will be recalculated. + +The \type {next} of \type {head} is guaranteed to be non-nil. + +The \type {next} of \type {tail} is guaranteed to be nil, and therefore the +second callback argument can often be ignored. It is provided for orthogonality, +and because it can sometimes be handy when special processing has to take place. 
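+
+A minimal sketch of a \type {ligaturing} callback that simply delegates to the
+built|-|in pass through \type {node.ligaturing} from the node library. It only
+illustrates the calling convention described above; a real callback would do
+its own processing before or instead of that call.
+
+\starttyping
+-- Sketch: hand the list to the internal ligature builder unchanged.
+callback.register("ligaturing", function(head, tail)
+    -- No return value is needed: head itself is never a glyph node.
+    node.ligaturing(head, tail)
+end)
+\stoptyping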
+ +Setting this callback to \type {false} will prevent the internal ligature +creation pass. + +\subsubsection{\type {kerning}} + +\startfunctioncall +function(<node> head, <node> tail) +end +\stopfunctioncall + +No return values. This callback has to apply kerning between the nodes in the +node list it receives. See \type {ligaturing} for calling conventions. + +Setting this callback to \type {false} will prevent the internal kern insertion +pass. + +\subsubsection{\type {mlist_to_hlist}} + +This callback replaces \LUATEX's math list to node list conversion algorithm. + +\startfunctioncall +function(<node> head, <string> display_type, <boolean> need_penalties) + return <node> newhead +end +\stopfunctioncall + +The returned node is the head of the list that will be added to the vertical or +horizontal list, the string argument is either \quote {text} or \quote {display} +depending on the current math mode, the boolean argument is \type {true} if +penalties have to be inserted in this list, \type {false} otherwise. + +Setting this callback to \type {false} is bad, it will almost certainly result in +an endless loop. + +\subsection{Information reporting callbacks} + +\subsubsection{\type {pre_dump}} + +\startfunctioncall +function() +end +\stopfunctioncall + +This function is called just before dumping to a format file starts. It does not +replace any code and there are neither arguments nor return values. + +\subsubsection{\type {start_run}} + +\startfunctioncall +function() +end +\stopfunctioncall + +This callback replaces the code that prints \LUATEX's banner. Note that for +successful use, this callback has to be set in the lua initialization script, +otherwise it will be seen only after the run has already started. + +\subsubsection{\type {stop_run}} + +\startfunctioncall +function() +end +\stopfunctioncall + +This callback replaces the code that prints \LUATEX's statistics and \quote +{output written to} messages. 
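+
+As a sketch, the banner and the closing statistics can be replaced by short
+messages of your own. The message texts are purely illustrative, and as noted
+above the \type {start_run} part only has an effect when it is registered from
+the \LUA\ initialization script.
+
+\starttyping
+-- Sketch: replace the banner and the final statistics.
+callback.register("start_run", function()
+    texio.write_nl("term", "run started")    -- illustrative text
+end)
+
+callback.register("stop_run", function()
+    texio.write_nl("term", "run finished")   -- illustrative text
+end)
+\stoptyping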
+ +\subsubsection{\type {start_page_number}} + +\startfunctioncall +function() +end +\stopfunctioncall + +Replaces the code that prints the \type {[} and the page number at the beginning of +\type {\shipout}. This callback will also override the printing of box information +that normally takes place when \type {\tracingoutput} is positive. + +\subsubsection{\type {stop_page_number}} + +\startfunctioncall +function() +end +\stopfunctioncall + +Replaces the code that prints the \type {]} at the end of \type {\shipout}. + +\subsubsection{\type {show_error_hook}} + +\startfunctioncall +function() +end +\stopfunctioncall + +This callback is run from inside the \TEX\ error function, and the idea is to +allow you to do some extra reporting on top of what \TEX\ already does (none of +the normal actions are removed). You may find some of the values in the \type +{status} table useful. + +This callback does not replace any internal code. + +\iffalse % this has been retracted for the moment + + \startitemize + + \sym{message} + + is the formal error message \TEX\ has given to the user. (the line after the + \type {'!'}). + + \sym{indicator} + + is either a filename (when it is a string) or a location indicator (a number) + that can mean lots of different things like a token list id or a \type {\read} + number. + + \sym{lineno} + + is the current line number. + \stopitemize + + This is an investigative item for 'testing the water' only. The final goal is the + total replacement of \TEX's error handling routines, but that needs lots of + adjustments in the web source because \TEX\ deals with errors in a somewhat + haphazard fashion. This is why the exact definition of \type {indicator} is not + given here. + +\fi + +\subsubsection{\type {show_error_message}} + +\startfunctioncall +function() +end +\stopfunctioncall + +This callback replaces the code that prints the error message. The usual +interaction after the message is not affected.
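+
+For the \type {show_error_hook} callback described above, a sketch of some extra
+reporting could look as follows. It assumes that the \type {status} table
+provides the \type {filename} and \type {linenumber} entries; the wording of the
+message is only illustrative.
+
+\starttyping
+-- Sketch: add one extra line to the normal error report.
+callback.register("show_error_hook", function()
+    -- Assumption: status.filename and status.linenumber are available.
+    texio.write_nl("term and log",
+        "error near " .. tostring(status.filename) ..
+        ", line " .. tostring(status.linenumber))
+end)
+\stoptyping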
+ +\subsubsection{\type {show_lua_error_hook}} + +\startfunctioncall +function() +end +\stopfunctioncall + +This callback replaces the code that prints the extra \LUA\ error message. + +\subsubsection{\type {start_file}} + +\startfunctioncall +function(category,filename) +end +\stopfunctioncall + +This callback replaces the code that prints \LUATEX's message when a file is +opened, like \type {(filename} for regular files. The category is a number: + +\starttabulate[|||] +\NC 1 \NC a normal data file, like a \TEX\ source \NC \NR +\NC 2 \NC a font map coupling font names to resources \NC \NR +\NC 3 \NC an image file (\type {png}, \type {pdf}, etc) \NC \NR +\NC 4 \NC an embedded font subset \NC \NR +\NC 5 \NC a fully embedded font \NC \NR +\stoptabulate + +\subsubsection{\type {stop_file}} + +\startfunctioncall +function(category) +end +\stopfunctioncall + +This callback replaces the code that prints \LUATEX's message when a file is +closed, like the \type {)} for regular files. + +\subsection{PDF-related callbacks} + +\subsubsection{\type {finish_pdffile}} + +\startfunctioncall +function() +end +\stopfunctioncall + +This callback is called when all document pages are already written to the \PDF\ +file and \LUATEX\ is about to finalize the output document structure. Its +intended use is final update of \PDF\ dictionaries such as \type {/Catalog} or +\type {/Info}. The callback does not replace any code. There are neither +arguments nor return values. + +\subsubsection{\type {finish_pdfpage}} + +\startfunctioncall +function(shippingout) +end +\stopfunctioncall + +This callback is called after the \PDF\ page stream has been assembled and before +the page object gets finalized. + +\subsection{Font-related callbacks} + +\subsubsection{\type {define_font}} + +\startfunctioncall +function(<string> name, <number> size, <number> id) + return <table> font | <number> id +end +\stopfunctioncall + +The string \type {name} is the filename part of the font specification, as given +by the user.
+ +The number \type {size} is a bit special: + +\startitemize[packed] +\startitem + If it is positive, it specifies an \quote{at size} in scaled points. +\stopitem +\startitem + If it is negative, its absolute value represents a \quote {scaled} setting + relative to the designsize of the font. +\stopitem +\stopitemize + +The \type {id} is the internal number assigned to the font. + +The internal structure of the \type {font} table that is to be returned is +explained in \in {chapter} [fonts]. That table is saved internally, so you can +put extra fields in the table for your later \LUA\ code to use. Alternatively, +the return value can be the id of a previously defined font. This is useful if a previous +definition can be reused instead of creating a whole new font structure. + +Setting this callback to \type {false} is pointless as it will prevent font +loading completely but will nevertheless generate errors. + +\section{The \type {epdf} library} + +The \type {epdf} library provides Lua bindings to many \PDF\ access functions +that are defined by the poppler pdf viewer library (written in C$+{}+$ by +Kristian H\o gsberg, based on xpdf by Derek Noonburg). Within \LUATEX\ (and +\PDFTEX), xpdf functionality has long been used to embed \PDF\ files. +The \type {epdf} library makes it possible to scrutinize an external \PDF\ file. It +gives access to its document structure, e.g., catalog, cross-reference table, +individual pages, objects, annotations, info, and metadata. The \LUATEX\ team is +evaluating the possibility of reducing the binding to a set of basic low level \PDF\ +primitives and delegating the complete set of functions to an external shared +object module. + +The \type {epdf} library is still in alpha state: \PDF\ access is currently +read|-|only. It is not yet possible to alter a \PDF\ file or to assemble it from +scratch, and it is unlikely that we will support that at all. Many function +bindings are also still missing.
At some point we might also decide to limit the interface +to a reasonable subset. + +For a start, a \PDF\ file is opened by \type {epdf.open()} with a file name, e.g.: + +\starttyping +doc = epdf.open("foo.pdf") +\stoptyping + +This normally returns a \type {PDFDoc} userdata variable; but if the file could +not be opened successfully, instead of a fatal error just the value \type {nil} is +returned. + +All Lua functions in the \type {epdf} library are named after the poppler +functions listed in the poppler header files for the various classes, e.g., files +\type {PDFDoc.h}, \type {Dict.h}, and \type {Array.h}. These files can be found +in the poppler subdirectory within the \LUATEX\ sources. Which functions are +already implemented in the \type {epdf} library can be found in the \LUATEX\ +source file \type {lepdflib.cc}. For using the \type {epdf} library, knowledge of +the \PDF\ file architecture is indispensable. + +There are many different userdata types defined by the \type {epdf} library, +currently these are \type {AnnotBorderStyle}, \type {AnnotBorder}, \type +{Annots}, \type {Annot}, \type {Array}, \type {Attribute}, \type {Catalog}, \type +{Dict}, \type {EmbFile}, \type {GString}, \type {LinkDest}, \type {Links}, \type +{Link}, \type {ObjectStream}, \type {Object}, \type {PDFDoc}, \type +{PDFRectangle}, \type {Page}, \type {Ref}, \type {Stream}, \type {StructElement}, +\type {StructTreeRoot}, \type {TextSpan}, \type {XRefEntry} and \type {XRef}. + +All these userdata names and the Lua access functions closely follow the +class naming in the poppler header files, including the choice of mixed upper +and lower case letters.
The Lua function calls use object|-|oriented syntax, +e.g., the following calls return the \type {Page} object for page~1: + +\starttyping +pageref = doc:getCatalog():getPageRef(1) +pageobj = doc:getXRef():fetch(pageref.num, pageref.gen) +\stoptyping + +But writing such chained calls is risky, as an intermediate function may return +\type {nil} on error; therefore Lua type checks (e.g., against \type {nil}) +should be done between function calls. If a non-object item is requested (e.g., +a \type {Dict} item by calling \type {page:getPieceInfo()}, cf.~\type {Page.h}) +but not available, the Lua functions return \type {nil} (without error). If a +function should return an \type {Object}, but it does not exist, a \type {Null} +object is returned instead (also without error; this is in|-|line with poppler +behavior). + +All library objects have a \type {__gc} metamethod for garbage collection. The +\type {__tostring} metamethod gives the type name for each object. + +All object constructors: + +\startfunctioncall +<PDFDoc> = epdf.open(<string> PDF filename) +<Annot> = epdf.Annot(<XRef>, <Dict>, <Catalog>, <Ref>) +<Annots> = epdf.Annots(<XRef>, <Catalog>, <Object>) +<Array> = epdf.Array(<XRef>) +<Attribute> = epdf.Attribute(<Type>,<Object>)| epdf.Attribute(<string>, <int>, <Object>) +<Dict> = epdf.Dict(<XRef>) +<Object> = epdf.Object() +<PDFRectangle> = epdf.PDFRectangle() +\stopfunctioncall + +The functions \type {StructElement_Type}, \type {Attribute_Type} and \type +{AttributeOwner_Type} return a hash table \type {{<string>,<integer>}}.
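+
+The chained page lookup shown earlier can be written more defensively, with a
+\type {nil} check after each step. This is only a sketch: the file name is the
+same placeholder as before, and the helper name \type {fetchfirstpage} is
+made up for the illustration.
+
+\starttyping
+-- Sketch: fetch the Page object for page 1, or nil when any step fails.
+local function fetchfirstpage(filename)
+    local doc = epdf.open(filename)
+    if not doc then
+        return nil
+    end
+    local catalog = doc:getCatalog()
+    local xref    = doc:getXRef()
+    if not catalog or not xref then
+        return nil
+    end
+    local pageref = catalog:getPageRef(1)
+    if not pageref then
+        return nil
+    end
+    return xref:fetch(pageref.num, pageref.gen)
+end
+
+local pageobj = fetchfirstpage("foo.pdf")
+\stoptyping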
+ +\type {Annot} methods: + +\startfunctioncall +<boolean> = <Annot>:isOK() +<Object> = <Annot>:getAppearance() +<AnnotBorder> = <Annot>:getBorder() +<boolean> = <Annot>:match(<Ref>) +\stopfunctioncall + +\type {AnnotBorderStyle} methods: + +\startfunctioncall +<number> = <AnnotBorderStyle>:getWidth() +\stopfunctioncall + +\type {Annots} methods: + +\startfunctioncall +<integer> = <Annots>:getNumAnnots() +<Annot> = <Annots>:getAnnot(<integer>) +\stopfunctioncall + +\type {Array} methods: + +\startfunctioncall + <Array>:incRef() + <Array>:decRef() +<integer> = <Array>:getLength() + <Array>:add(<Object>) +<Object> = <Array>:get(<integer>) +<Object> = <Array>:getNF(<integer>) +<string> = <Array>:getString(<integer>) +\stopfunctioncall + +\type {Attribute} methods: + +\startfunctioncall +<boolean> = <Attribute>:isOk() +<integer> = <Attribute>:getType() +<integer> = <Attribute>:getOwner() +<string> = <Attribute>:getTypeName() +<string> = <Attribute>:getOwnerName() +<Object> = <Attribute>:getValue() +<Object> = <Attribute>:getDefaultValue() +<string> = <Attribute>:getName() +<integer> = <Attribute>:getRevision() + <Attribute>:setRevision(<unsigned integer>) +<boolean> = <Attribute>:isHidden() + <Attribute>:setHidden(<boolean>) +<string> = <Attribute>:getFormattedValue() +<string> = <Attribute>:setFormattedValue(<string>) +\stopfunctioncall + +\type {Catalog} methods: + +\startfunctioncall +<boolean> = <Catalog>:isOK() +<integer> = <Catalog>:getNumPages() +<Page> = <Catalog>:getPage(<integer>) +<Ref> = <Catalog>:getPageRef(<integer>) +<string> = <Catalog>:getBaseURI() +<string> = <Catalog>:readMetadata() +<Object> = <Catalog>:getStructTreeRoot() +<integer> = <Catalog>:findPage(<integer> object number, <integer> object generation) +<LinkDest> = <Catalog>:findDest(<string> name) +<Object> = <Catalog>:getDests() +<integer> = <Catalog>:numEmbeddedFiles() +<EmbFile> = <Catalog>:embeddedFile(<integer>) +<integer> = <Catalog>:numJS() +<string> = <Catalog>:getJS(<integer>)
+<Object> = <Catalog>:getOutline() +<Object> = <Catalog>:getAcroForm() +\stopfunctioncall + +\type {EmbFile} methods: + +\startfunctioncall +<string> = <EmbFile>:name() +<string> = <EmbFile>:description() +<integer> = <EmbFile>:size() +<string> = <EmbFile>:modDate() +<string> = <EmbFile>:createDate() +<string> = <EmbFile>:checksum() +<string> = <EmbFile>:mimeType() +<Object> = <EmbFile>:streamObject() +<boolean> = <EmbFile>:isOk() +\stopfunctioncall + +\type {Dict} methods: + +\startfunctioncall + <Dict>:incRef() + <Dict>:decRef() +<integer> = <Dict>:getLength() + <Dict>:add(<string>, <Object>) + <Dict>:set(<string>, <Object>) + <Dict>:remove(<string>) +<boolean> = <Dict>:is(<string>) +<Object> = <Dict>:lookup(<string>) +<Object> = <Dict>:lookupNF(<string>) +<integer> = <Dict>:lookupInt(<string>, <string>) +<string> = <Dict>:getKey(<integer>) +<Object> = <Dict>:getVal(<integer>) +<Object> = <Dict>:getValNF(<integer>) +<boolean> = <Dict>:hasKey(<string>) +\stopfunctioncall + +\type {Link} methods: + +\startfunctioncall +<boolean> = <Link>:isOK() +<boolean> = <Link>:inRect(<number>, <number>) +\stopfunctioncall + +\type {LinkDest} methods: + +\startfunctioncall +<boolean> = <LinkDest>:isOK() +<integer> = <LinkDest>:getKind() +<string> = <LinkDest>:getKindName() +<boolean> = <LinkDest>:isPageRef() +<integer> = <LinkDest>:getPageNum() +<Ref> = <LinkDest>:getPageRef() +<number> = <LinkDest>:getLeft() +<number> = <LinkDest>:getBottom() +<number> = <LinkDest>:getRight() +<number> = <LinkDest>:getTop() +<number> = <LinkDest>:getZoom() +<boolean> = <LinkDest>:getChangeLeft() +<boolean> = <LinkDest>:getChangeTop() +<boolean> = <LinkDest>:getChangeZoom() +\stopfunctioncall + +\type {Links} methods: + +\startfunctioncall +<integer> = <Links>:getNumLinks() +<Link> = <Links>:getLink(<integer>) +\stopfunctioncall + +\type {Object} methods: + +\startfunctioncall + <Object>:initBool(<boolean>) + <Object>:initInt(<integer>) + <Object>:initReal(<number>) + 
<Object>:initString(<string>) + <Object>:initName(<string>) + <Object>:initNull() + <Object>:initArray(<XRef>) + <Object>:initDict(<XRef>) + <Object>:initStream(<Stream>) + <Object>:initRef(<integer> object number, <integer> object generation) + <Object>:initCmd(<string>) + <Object>:initError() + <Object>:initEOF() +<Object> = <Object>:fetch(<XRef>) +<integer> = <Object>:getType() +<string> = <Object>:getTypeName() +<boolean> = <Object>:isBool() +<boolean> = <Object>:isInt() +<boolean> = <Object>:isReal() +<boolean> = <Object>:isNum() +<boolean> = <Object>:isString() +<boolean> = <Object>:isName() +<boolean> = <Object>:isNull() +<boolean> = <Object>:isArray() +<boolean> = <Object>:isDict() +<boolean> = <Object>:isStream() +<boolean> = <Object>:isRef() +<boolean> = <Object>:isCmd() +<boolean> = <Object>:isError() +<boolean> = <Object>:isEOF() +<boolean> = <Object>:isNone() +<boolean> = <Object>:getBool() +<integer> = <Object>:getInt() +<number> = <Object>:getReal() +<number> = <Object>:getNum() +<string> = <Object>:getString() +<string> = <Object>:getName() +<Array> = <Object>:getArray() +<Dict> = <Object>:getDict() +<Stream> = <Object>:getStream() +<Ref> = <Object>:getRef() +<integer> = <Object>:getRefNum() +<integer> = <Object>:getRefGen() +<string> = <Object>:getCmd() +<integer> = <Object>:arrayGetLength() + = <Object>:arrayAdd(<Object>) +<Object> = <Object>:arrayGet(<integer>) +<Object> = <Object>:arrayGetNF(<integer>) +<integer> = <Object>:dictGetLength(<integer>) + = <Object>:dictAdd(<string>, <Object>) + = <Object>:dictSet(<string>, <Object>) +<Object> = <Object>:dictLookup(<string>) +<Object> = <Object>:dictLookupNF(<string>) +<string> = <Object>:dictgetKey(<integer>) +<Object> = <Object>:dictgetVal(<integer>) +<Object> = <Object>:dictgetValNF(<integer>) +<boolean> = <Object>:streamIs(<string>) + = <Object>:streamReset() +<integer> = <Object>:streamGetChar() +<integer> = <Object>:streamLookChar() +<integer> = <Object>:streamGetPos() + = 
<Object>:streamSetPos(<integer>) +<Dict> = <Object>:streamGetDict() +\stopfunctioncall + +\type {Page} methods: + +\startfunctioncall +<boolean> = <Page>:isOk() +<integer> = <Page>:getNum() +<PDFRectangle> = <Page>:getMediaBox() +<PDFRectangle> = <Page>:getCropBox() +<boolean> = <Page>:isCropped() +<number> = <Page>:getMediaWidth() +<number> = <Page>:getMediaHeight() +<number> = <Page>:getCropWidth() +<number> = <Page>:getCropHeight() +<PDFRectangle> = <Page>:getBleedBox() +<PDFRectangle> = <Page>:getTrimBox() +<PDFRectangle> = <Page>:getArtBox() +<integer> = <Page>:getRotate() +<string> = <Page>:getLastModified() +<Dict> = <Page>:getBoxColorInfo() +<Dict> = <Page>:getGroup() +<Stream> = <Page>:getMetadata() +<Dict> = <Page>:getPieceInfo() +<Dict> = <Page>:getSeparationInfo() +<Dict> = <Page>:getResourceDict() +<Object> = <Page>:getAnnots() +<Links> = <Page>:getLinks(<Catalog>) +<Object> = <Page>:getContents() +\stopfunctioncall + +\type {PDFDoc} methods: + +\startfunctioncall +<boolean> = <PDFDoc>:isOk() +<integer> = <PDFDoc>:getErrorCode() +<string> = <PDFDoc>:getErrorCodeName() +<string> = <PDFDoc>:getFileName() +<XRef> = <PDFDoc>:getXRef() +<Catalog> = <PDFDoc>:getCatalog() +<number> = <PDFDoc>:getPageMediaWidth() +<number> = <PDFDoc>:getPageMediaHeight() +<number> = <PDFDoc>:getPageCropWidth() +<number> = <PDFDoc>:getPageCropHeight() +<integer> = <PDFDoc>:getNumPages() +<string> = <PDFDoc>:readMetadata() +<Object> = <PDFDoc>:getStructTreeRoot() +<integer> = <PDFDoc>:findPage(<integer> object number, <integer> object generation) +<Links> = <PDFDoc>:getLinks(<integer>) +<LinkDest> = <PDFDoc>:findDest(<string>) +<boolean> = <PDFDoc>:isEncrypted() +<boolean> = <PDFDoc>:okToPrint() +<boolean> = <PDFDoc>:okToChange() +<boolean> = <PDFDoc>:okToCopy() +<boolean> = <PDFDoc>:okToAddNotes() +<boolean> = <PDFDoc>:isLinearized() +<Object> = <PDFDoc>:getDocInfo() +<Object> = <PDFDoc>:getDocInfoNF() +<integer> = <PDFDoc>:getPDFMajorVersion() +<integer> = 
<PDFDoc>:getPDFMinorVersion() +\stopfunctioncall + +\type {PDFRectangle} methods: + +\startfunctioncall +<boolean> = <PDFRectangle>:isValid() +\stopfunctioncall + +%\type {Ref} methods: +% +%\startfunctioncall +%\stopfunctioncall + +\type {Stream} methods: + +\startfunctioncall +<integer> = <Stream>:getKind() +<string> = <Stream>:getKindName() + = <Stream>:reset() + = <Stream>:close() +<integer> = <Stream>:getChar() +<integer> = <Stream>:lookChar() +<integer> = <Stream>:getRawChar() +<integer> = <Stream>:getUnfilteredChar() + = <Stream>:unfilteredReset() +<integer> = <Stream>:getPos() +<boolean> = <Stream>:isBinary() +<Stream> = <Stream>:getUndecodedStream() +<Dict> = <Stream>:getDict() +\stopfunctioncall + +\type {StructElement} methods: + +\startfunctioncall +<string> = <StructElement>:getTypeName() +<integer> = <StructElement>:getType() +<boolean> = <StructElement>:isOk() +<boolean> = <StructElement>:isBlock() +<boolean> = <StructElement>:isInline() +<boolean> = <StructElement>:isGrouping() +<boolean> = <StructElement>:isContent() +<boolean> = <StructElement>:isObjectRef() +<integer> = <StructElement>:getMCID() +<Ref> = <StructElement>:getObjectRef() +<Ref> = <StructElement>:getParentRef() +<boolean> = <StructElement>:hasPageRef() +<Ref> = <StructElement>:getPageRef() +<StructTreeRoot> = <StructElement>:getStructTreeRoot() +<string> = <StructElement>:getID() +<string> = <StructElement>:getLanguage() +<integer> = <StructElement>:getRevision() + <StructElement>:setRevision(<unsigned integer>) +<string> = <StructElement>:getTitle() +<string> = <StructElement>:getExpandedAbbr() +<integer> = <StructElement>:getNumChildren() +<StructElement> = <StructElement>:getChild() + = <StructElement>:appendChild(<StructElement>) +<integer> = <StructElement>:getNumAttributes() +<Attribute> = <StructElement>:getAttribute(<integer>) +<string> = <StructElement>:appendAttribute(<Attribute>) +<Attribute> = <StructElement>:findAttribute(<Attribute::Type>, <boolean>, <Attribute::Owner>)
+<string> = <StructElement>:getAltText() +<string> = <StructElement>:getActualText() +<string> = <StructElement>:getText(<boolean>) +<table> = <StructElement>:getTextSpans() +\stopfunctioncall + +\type {StructTreeRoot} methods: + +\startfunctioncall +<StructElement> = <StructTreeRoot>:findParentElement +<PDFDoc> = <StructTreeRoot>:getDoc +<Dict> = <StructTreeRoot>:getRoleMap +<Dict> = <StructTreeRoot>:getClassMap +<integer> = <StructTreeRoot>:getNumChildren +<StructElement> = <StructTreeRoot>:getChild + <StructTreeRoot>:appendChild +<StructElement> = <StructTreeRoot>:findParentElement +\stopfunctioncall + +\type {TextSpan} has only one method: + +\startfunctioncall +<string> = <TextSpan>:getText() +\stopfunctioncall + +\type {XRef} methods: + +\startfunctioncall +<boolean> = <XRef>:isOk() +<integer> = <XRef>:getErrorCode() +<boolean> = <XRef>:isEncrypted() +<boolean> = <XRef>:okToPrint() +<boolean> = <XRef>:okToPrintHighRes() +<boolean> = <XRef>:okToChange() +<boolean> = <XRef>:okToCopy() +<boolean> = <XRef>:okToAddNotes() +<boolean> = <XRef>:okToFillForm() +<boolean> = <XRef>:okToAccessibility() +<boolean> = <XRef>:okToAssemble() +<Object> = <XRef>:getCatalog() +<Object> = <XRef>:fetch(<integer> object number, <integer> object generation) +<Object> = <XRef>:getDocInfo() +<Object> = <XRef>:getDocInfoNF() +<integer> = <XRef>:getNumObjects() +<integer> = <XRef>:getRootNum() +<integer> = <XRef>:getRootGen() +<integer> = <XRef>:getSize() +<Object> = <XRef>:getTrailerDict() +\stopfunctioncall + +There is an experimental function \type {epdf.openMemStream} that takes three +arguments: + +\starttabulate +\NC \type {stream} \NC this is a (in low level \LUA\ speak) light userdata + object, i.e.\ a pointer to a sequence of bytes \NC \NR +\NC \type {length} \NC this is the length of the stream in bytes \NC \NR +\NC \type {name} \NC this is a unique identifier that is used for hashing the + stream, so that multiple uses of the same stream do not take more memory \NC \NR +\stoptabulate + +Instead of
a light userdata stream you can also pass a \LUA\ string, in which +case the given length is (at most) the string length. + +The returned object can be used in the \type {img} library instead of a filename. +Both the memory stream and its use in the image library are experimental and can +change. In case you wonder where this can be used: when you use the swiglib +library for GraphicsMagick, it can return such a userdata object. This permits +conversion in memory and passing the result directly to the backend. This might +save some runtime in one|-|pass workflows. This feature is currently not meant +for production. + +%*********************************************************************** + +\section{The \type {font} library} + +The font library provides the interface into the internals of the font system, +and it also contains helper functions to load traditional \TEX\ font metrics +formats. Other font loading functionality is provided by the \type {fontloader} +library that will be discussed in the next section. + +\subsection{Loading a \TFM\ file} + +The behavior documented in this subsection is considered stable in the sense that +there will not be backward-incompatible changes any more. + +\startfunctioncall +<table> fnt = font.read_tfm(<string> name, <number> s) +\stopfunctioncall + +The number is a bit special: + +\startitemize +\startitem + If it is positive, it specifies an \quote {at size} in scaled points. +\stopitem +\startitem + If it is negative, its absolute value represents a \quote {scaled} + setting relative to the designsize of the font. +\stopitem +\stopitemize + +The internal structure of the metrics font table that is returned is explained in +\in {chapter} [fonts]. + +\subsection{Loading a \VF\ file} + +The behavior documented in this subsection is considered stable in the sense that +there will not be backward-incompatible changes any more.
+ +\startfunctioncall +<table> vf_fnt = font.read_vf(<string> name, <number> s) +\stopfunctioncall + +The meaning of the number \type {s} and the format of the returned table are +similar to the ones in the \type {read_tfm()} function. + +\subsection{The fonts array} + +The whole table of \TEX\ fonts is accessible from \LUA\ using a virtual array. + +\starttyping +font.fonts[n] = { ... } +<table> f = font.fonts[n] +\stoptyping + +See \in {chapter} [fonts] for the structure of the tables. Because this is a +virtual array, you cannot call \type {pairs} on it, but see below for the \type +{font.each} iterator. + +The two metatable functions implementing the virtual array are: + +\startfunctioncall +<table> f = font.getfont(<number> n) +font.setfont(<number> n, <table> f) +\stopfunctioncall + +Note that at the moment, each access to the \type {font.fonts} or call to \type +{font.getfont} creates a lua table for the whole font. This process can be quite +slow. In a later version of \LUATEX, this interface will change (it will start +using userdata objects instead of actual tables). + +Also note the following: assignments can only be made to fonts that have already +been defined in \TEX, but have not been accessed {\it at all\/} since that +definition. This limits the usability of the write access to \type {font.fonts} +quite a lot, a less stringent ruleset will likely be implemented later. + +\subsection{Checking a font's status} + +You can test for the status of a font by calling this function: + +\startfunctioncall +<boolean> f = font.frozen(<number> n) +\stopfunctioncall + +The return value is one of \type {true} (unassignable), \type {false} (can be +changed) or \type {nil} (not a valid font at all). 
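+
+The three possible results of \type {font.frozen} can be told apart as in the
+following sketch. The helper name \type {fontstate} and the returned strings
+are made up for the illustration; only \type {font.frozen} itself comes from
+the library.
+
+\starttyping
+-- Sketch: describe whether font number n can still be assigned to.
+local function fontstate(n)
+    local frozen = font.frozen(n)
+    if frozen == nil then
+        return "not a valid font"
+    elseif frozen then
+        return "frozen (unassignable)"
+    else
+        return "can still be changed"
+    end
+end
+\stoptyping
+
+The explicit test against \type {nil} comes first because \type {nil} and
+\type {false} are both false in a plain \type {if} test.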
+ +\subsection{Defining a font directly} + +You can define your own font into \type {font.fonts} by calling this function: + +\startfunctioncall +<number> i = font.define(<table> f) +\stopfunctioncall + +The return value is the internal id number of the defined font (the index into +\type {font.fonts}). If the font creation fails, an error is raised. The table +is a font structure, as explained in \in {chapter} [fonts]. + +\subsection{Projected next font id} + +\startfunctioncall +<number> i = font.nextid() +\stopfunctioncall + +This returns the font id number that would be returned by a \type {font.define} +call if it was executed at this spot in the code flow. This is useful for virtual +fonts that need to reference themselves. + +\subsection{Font id} + +\startfunctioncall +<number> i = font.id(<string> csname) +\stopfunctioncall + +This returns the font id associated with \type {csname} string, or $-1$ if \type +{csname} is not defined. + +\subsection{Currently active font} + +\startfunctioncall +<number> i = font.current() +font.current(<number> i) +\stopfunctioncall + +This gets or sets the currently used font number. + +\subsection{Maximum font id} + +\startfunctioncall +<number> i = font.max() +\stopfunctioncall + +This is the largest used index in \type {font.fonts}. + +\subsection{Iterating over all fonts} + +\startfunctioncall +for i,v in font.each() do + ... +end +\stopfunctioncall + +This is an iterator over each of the defined \TEX\ fonts. The first returned +value is the index in \type {font.fonts}, the second the font itself, as a \LUA\ +table. The indices are listed incrementally, but they do not always form an array +of consecutive numbers: in some cases there can be holes in the sequence. 
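As an illustration of the iterator, the sketch below collects one line per defined font. The function name \type {list_fonts} is hypothetical, and the \type {name} key is assumed to be present in the font table (see \in {chapter} [fonts]); a fallback is used when it is missing. Because each font is delivered as a complete \LUA\ table, this can be slow with many fonts.

```lua
-- Hypothetical helper: describe every defined TeX font.
-- Uses only font.each and font.frozen from the font library.
function list_fonts()
    local lines = {}
    for id, f in font.each() do
        -- 'name' is assumed to exist in the font table; fall back if not
        local name = f.name or "(unnamed)"
        -- font.frozen returns true when the font can no longer be assigned
        local state = font.frozen(id) and "frozen" or "assignable"
        lines[#lines + 1] = string.format("%d: %s (%s)", id, name, state)
    end
    return lines
end
```

Since the indices can have holes, the position of an entry in the returned list does not have to match the font id that the entry mentions.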
+ +\section{The \type {fontloader} library} + +\subsection{Getting quick information on a font} + +\startfunctioncall +<table> info = fontloader.info(<string> filename) +\stopfunctioncall + +This function returns either \type {nil}, or a \type {table}, or an array of +small tables (in the case of a TrueType collection). The returned table(s) will +contain some fairly interesting information items from the font(s) defined by the +file: + +\starttabulate[|lT|l|p|] +\NC \ssbf key \NC \bf type \NC \bf explanation \NC \NR +\NC fontname \NC string \NC the \POSTSCRIPT\ name of the font\NC \NR +\NC fullname \NC string \NC the formal name of the font\NC \NR +\NC familyname \NC string \NC the family name this font belongs to\NC \NR +\NC weight \NC string \NC a string indicating the color value of the font\NC \NR +\NC version \NC string \NC the internal font version\NC \NR +\NC italicangle \NC float \NC the slant angle\NC \NR +\NC units_per_em \NC number \NC 1000 for \POSTSCRIPT-based fonts, usually 2048 for \TRUETYPE\NC \NR +\NC pfminfo \NC table \NC (see \in{section}[fontloaderpfminfotable])\NC \NR +\stoptabulate + +Getting information through this function is (sometimes much) more efficient than +loading the font properly, and is therefore handy when you want to create a +dictionary of available fonts based on a directory contents. + +\subsection{Loading an \OPENTYPE\ or \TRUETYPE\ file} +If you want to use an \OPENTYPE\ font, you have to get the metric information +from somewhere. 
Using the \type {fontloader} library, the simplest way to get +that information is thus: + +\starttyping +function load_font (filename) + local metrics = nil + local font = fontloader.open(filename) + if font then + metrics = fontloader.to_table(font) + fontloader.close(font) + end + return metrics +end + +myfont = load_font('/opt/tex/texmf/fonts/data/arial.ttf') +\stoptyping + +The main function call is + +\startfunctioncall +<userdata> f, <table> w = fontloader.open(<string> filename) +<userdata> f, <table> w = fontloader.open(<string> filename, <string> fontname) +\stopfunctioncall + +The first return value is a userdata representation of the font. The second +return value is a table containing any warnings and errors reported by fontloader +while opening the font. In normal typesetting, you would probably ignore the +second return value, but it can be useful for debugging purposes. + +For \TRUETYPE\ collections (when filename ends in 'ttc') and \DFONT\ collections, +you have to use a second string argument to specify which font you want from the +collection. Use the \type {fontname} strings that are returned by \type +{fontloader.info} for that. + +To turn the font into a table, \type {fontloader.to_table} is used on the font +returned by \type {fontloader.open}. + +\startfunctioncall +<table> f = fontloader.to_table(<userdata> font) +\stopfunctioncall + +This table cannot be used directly by \LUATEX\ and should be turned into another +one as described in~\in {chapter} [fonts]. Do not forget to store the \type +{fontname} value in the \type {psname} field of the metrics table to be returned +to \LUATEX, otherwise the font inclusion backend will not be able to find the +correct font in the collection. + +See \in {section} [fontloadertables] for details on the userdata object returned +by \type {fontloader.open()} and the layout of the \type {metrics} table returned +by \type {fontloader.to_table()}. 
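The collection handling described above can be sketched as follows. The helper name \type {open_from_collection} and its \type {index} parameter are hypothetical. The sketch assumes that a single font makes \type {fontloader.info} return one table with a \type {fontname} key, while a collection yields an array of such tables.

```lua
-- Hypothetical helper: open one font from a file that may be a collection.
-- Returns the font userdata, the warnings table and the fontname used,
-- or nil plus a message when the font cannot be identified.
function open_from_collection(filename, index)
    local info = fontloader.info(filename)
    if not info then
        return nil, "no font information for " .. filename
    end
    local fontname, f, warnings
    if info.fontname then
        -- assumption: a single font gives one table, not an array
        fontname = info.fontname
        f, warnings = fontloader.open(filename)
    else
        -- assumption: a collection gives an array of small tables
        local entry = info[index or 1]
        if not entry then
            return nil, "no such font in " .. filename
        end
        fontname = entry.fontname
        f, warnings = fontloader.open(filename, fontname)
    end
    return f, warnings, fontname
end
```

The third return value is the string that has to end up in the \type {psname} field of the metrics table, as noted above.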
+ +The font file is parsed and partially interpreted by the font loading routines +from \FONTFORGE. The file format can be \OPENTYPE, \TRUETYPE, \TRUETYPE\ +Collection, \CFF, or \TYPEONE. + +There are a few advantages to this approach compared to reading the actual font +file ourselves: + +\startitemize + +\startitem + The font is automatically re|-|encoded, so that the \type {metrics} table for + \TRUETYPE\ and \OPENTYPE\ fonts is using \UNICODE\ for the character indices. +\stopitem + +\startitem + Many features are pre|-|processed into a format that is easier to handle than + just the bare tables would be. +\stopitem + +\startitem + \POSTSCRIPT|-|based \OPENTYPE\ fonts do not store the character height and + depth in the font file, so the character boundingbox has to be calculated in + some way. +\stopitem + +\startitem + In the future, it may be interesting to allow \LUA\ scripts access to + the font program itself, perhaps even creating or changing the font. +\stopitem + +\stopitemize + +A loaded font is discarded with: + +\startfunctioncall +fontloader.close(<userdata> font) +\stopfunctioncall + +\subsection{Applying a \quote{feature file}} + +You can apply a \quote{feature file} to a loaded font: + +\startfunctioncall +<table> errors = fontloader.apply_featurefile(<userdata> font, <string> filename) +\stopfunctioncall + +A \quote {feature file} is a textual representation of the features in an +\OPENTYPE\ font. See + +\starttyping +http://www.adobe.com/devnet/opentype/afdko/topic_feature_file_syntax.html +\stoptyping + +and + +\starttyping +http://fontforge.sourceforge.net/featurefile.html +\stoptyping + +for a more detailed description of feature files. + +If the function fails, the return value is a table containing any errors reported +by fontloader while applying the feature file. On success, \type {nil} is +returned. 
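Because \type {nil} signals success here, a small wrapper can make the result easier to test. This is a sketch only: the name \type {apply_features} is hypothetical, and the entries of the error table are assumed to be strings (they are passed through \type {tostring} to be safe).

```lua
-- Hypothetical wrapper around fontloader.apply_featurefile.
-- Returns true on success, or false plus the collected messages.
function apply_features(f, featurefile)
    local errors = fontloader.apply_featurefile(f, featurefile)
    if errors == nil then
        -- nil means that the feature file was applied without errors
        return true
    end
    local messages = {}
    for _, e in ipairs(errors) do
        -- assumption: each entry is a string
        messages[#messages + 1] = tostring(e)
    end
    return false, table.concat(messages, "\n")
end
```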
+ +\subsection{Applying an \quote{\AFM\ file}} + +You can apply an \quote {\AFM\ file} to a loaded font: + +\startfunctioncall +<table> errors = fontloader.apply_afmfile(<userdata> font, <string> filename) +\stopfunctioncall + +An \AFM\ file is a textual representation of (some of) the meta information +in a \TYPEONE\ font. See + +\starttyping +ftp://ftp.math.utah.edu/u/ma/hohn/linux/postscript/5004.AFM_Spec.pdf +\stoptyping + +for more information about \AFM\ files. + +Note: If you \type {fontloader.open()} a \TYPEONE\ file named \type {font.pfb}, +the library will automatically search for and apply \type {font.afm} if it exists +in the same directory as the file \type {font.pfb}. In that case, there is no +need for an explicit call to \type {apply_afmfile()}. + +If the function fails, the return value is a table containing any errors reported +by fontloader while applying the \AFM\ file. On success, \type {nil} is returned. + +\subsection[fontloadertables]{Fontloader font tables} + +As mentioned earlier, the return value of \type {fontloader.open()} is a userdata +object. One way to have access to the actual metrics is to call \type +{fontloader.to_table()} on this object, returning the table structure that is +explained in the following subsections. + +However, it turns out that the result from \type {fontloader.to_table()} +sometimes needs very large amounts of memory (depending on the font's complexity +and size) so it is possible to access the userdata object directly. + +\startitemize +\startitem + All top|-|level keys that would be returned by \type {to_table()} + can also be accessed directly. +\stopitem +\startitem + The top|-|level key \quote {glyphs} returns a {\it virtual\/} array that + allows indices from \type {f.glyphmin} to \type {f.glyphmax}. 
+\stopitem +\startitem + The items in that virtual array (the actual glyphs) are themselves also + userdata objects, and each has accessors for all of the keys explained in the + section \quote {Glyph items} below. +\stopitem +\startitem + The top|-|level key \quote {subfonts} returns an {\it actual} array of userdata + objects, one for each of the subfonts (or nil, if there are no subfonts). +\stopitem +\stopitemize + +A short example may be helpful. This code generates a printout of all +the glyph names in the font \type {PunkNova.kern.otf}: + +\starttyping +local f = fontloader.open('PunkNova.kern.otf') +print (f.fontname) +if f.glyphcnt > 0 then + for i=f.glyphmin,f.glyphmax do + local g = f.glyphs[i] + if g then + print(g.name) + end + end +end +fontloader.close(f) +\stoptyping + +In this case, the \LUATEX\ memory requirement stays below 100MB on the test +computer, while the internal structure generated by \type {to_table()} needs more +than 2GB of memory (the font itself is 6.9MB in disk size). + +Only the top|-|level font, the subfont table entries, and the glyphs are virtual +objects, everything else still produces normal \LUA\ values and tables. 
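The same direct access can be used to build a small lookup table without calling \type {to_table()}. The sketch below maps \UNICODE\ code points to glyph names; the function name \type {unicode_to_name} is hypothetical, and it uses only the \type {glyphcnt}, \type {glyphmin}, \type {glyphmax} and \type {glyphs} keys plus the per|-|glyph \type {name} and \type {unicode} keys.

```lua
-- Hypothetical helper: map unicode code points to glyph names by
-- walking the virtual glyphs array of an opened font.
function unicode_to_name(f)
    local map = {}
    if f.glyphcnt > 0 then
        for i = f.glyphmin, f.glyphmax do
            local g = f.glyphs[i]
            -- slots can be empty, and unicode is -1 for unencoded glyphs
            if g and g.unicode and g.unicode >= 0 then
                map[g.unicode] = g.name
            end
        end
    end
    return map
end
```

Only the resulting map is kept in memory, so the font can be closed with \type {fontloader.close} right after the call.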
+ +If you want to know the valid fields in a font or glyph structure, call the \type +{fields} function on an object of a particular type (either glyph or font): + +\startfunctioncall +<table> fields = fontloader.fields(<userdata> font) +<table> fields = fontloader.fields(<userdata> font_glyph) +\stopfunctioncall + +For instance: + +\startfunctioncall +local fields = fontloader.fields(f) +local fields = fontloader.fields(f.glyphs[0]) +\stopfunctioncall + +\subsubsection{Table types} + +\subsubsubsection{Top-level} + +The top|-|level keys in the returned table are (the explanations in this part of +the documentation are not yet finished): + +\starttabulate[|lT|l|p|] +\NC \ssbf key \NC \bf type \NC \bf explanation \NC \NR +\NC table_version \NC number \NC indicates the metrics version (currently~0.3)\NC \NR +\NC fontname \NC string \NC \POSTSCRIPT\ font name\NC \NR +\NC fullname \NC string \NC official (human-oriented) font name\NC \NR +\NC familyname \NC string \NC family name\NC \NR +\NC weight \NC string \NC weight indicator\NC \NR +\NC copyright \NC string \NC copyright information\NC \NR +\NC filename \NC string \NC the file name\NC \NR +\NC version \NC string \NC font version\NC \NR +\NC italicangle \NC float \NC slant angle\NC \NR +\NC units_per_em \NC number \NC 1000 for \POSTSCRIPT-based fonts, usually 2048 for \TRUETYPE\NC \NR +\NC ascent \NC number \NC height of ascender in \type {units_per_em}\NC \NR +\NC descent \NC number \NC depth of descender in \type {units_per_em}\NC \NR +\NC upos \NC float \NC \NC \NR +\NC uwidth \NC float \NC \NC \NR +\NC uniqueid \NC number \NC \NC \NR +\NC glyphs \NC array \NC \NC \NR +\NC glyphcnt \NC number \NC number of included glyphs\NC \NR +\NC glyphmax \NC number \NC maximum used index in the glyphs array\NC \NR +\NC glyphmin \NC number \NC minimum used index in the glyphs array\NC \NR +\NC hasvmetrics \NC number \NC \NC \NR +\NC onlybitmaps \NC number \NC \NC \NR +\NC serifcheck \NC number \NC \NC \NR +\NC isserif \NC number 
\NC \NC \NR +\NC issans \NC number \NC \NC \NR +\NC encodingchanged \NC number \NC \NC \NR +\NC strokedfont \NC number \NC \NC \NR +\NC use_typo_metrics \NC number \NC \NC \NR +\NC weight_width_slope_only \NC number \NC \NC \NR +\NC head_optimized_for_cleartype \NC number \NC \NC \NR +\NC uni_interp \NC enum \NC \type {unset}, \type {none}, \type {adobe}, + \type {greek}, \type {japanese}, \type {trad_chinese}, + \type {simp_chinese}, \type {korean}, \type {ams}\NC \NR +\NC origname \NC string \NC the file name, as supplied by the user\NC \NR +\NC map \NC table \NC \NC \NR +\NC private \NC table \NC \NC \NR +\NC xuid \NC string \NC \NC \NR +\NC pfminfo \NC table \NC \NC \NR +\NC names \NC table \NC \NC \NR +\NC cidinfo \NC table \NC \NC \NR +\NC subfonts \NC array \NC \NC \NR +\NC commments \NC string \NC \NC \NR +\NC fontlog \NC string \NC \NC \NR +\NC cvt_names \NC string \NC \NC \NR +\NC anchor_classes \NC table \NC \NC \NR +\NC ttf_tables \NC table \NC \NC \NR +\NC ttf_tab_saved \NC table \NC \NC \NR +\NC kerns \NC table \NC \NC \NR +\NC vkerns \NC table \NC \NC \NR +\NC texdata \NC table \NC \NC \NR +\NC lookups \NC table \NC \NC \NR +\NC gpos \NC table \NC \NC \NR +\NC gsub \NC table \NC \NC \NR +\NC mm \NC table \NC \NC \NR +\NC chosenname \NC string \NC \NC \NR +\NC macstyle \NC number \NC \NC \NR +\NC fondname \NC string \NC \NC \NR +%NC design_size \NC number \NC \NC \NR +\NC fontstyle_id \NC number \NC \NC \NR +\NC fontstyle_name \NC table \NC \NC \NR +%NC design_range_bottom \NC number \NC \NC \NR +%NC design_range_top \NC number \NC \NC \NR +\NC strokewidth \NC float \NC \NC \NR +\NC mark_classes \NC table \NC \NC \NR +\NC creationtime \NC number \NC \NC \NR +\NC modificationtime \NC number \NC \NC \NR +\NC os2_version \NC number \NC \NC \NR +\NC sfd_version \NC number \NC \NC \NR +\NC math \NC table \NC \NC \NR +\NC validation_state \NC table \NC \NC \NR +\NC horiz_base \NC table \NC \NC \NR +\NC vert_base \NC table \NC \NC \NR +\NC extrema_bound \NC 
number \NC \NC \NR +\stoptabulate + +\subsubsubsection{Glyph items} + +The \type {glyphs} is an array containing the per|-|character +information (quite a few of these are only present if nonzero). + +\starttabulate[|lT|l|p|] +\NC \ssbf key \NC \bf type \NC \bf explanation \NC \NR +\NC name \NC string \NC the glyph name \NC \NR +\NC unicode \NC number \NC unicode code point, or -1 \NC \NR +\NC boundingbox \NC array \NC array of four numbers, see note below \NC \NR +\NC width \NC number \NC only for horizontal fonts \NC \NR +\NC vwidth \NC number \NC only for vertical fonts \NC \NR +\NC tsidebearing \NC number \NC only for vertical ttf/otf fonts, and only if nonzero \NC \NR +\NC lsidebearing \NC number \NC only if nonzero and not equal to boundingbox[1] \NC \NR +\NC class \NC string \NC one of "none", "base", "ligature", "mark", "component" + (if not present, the glyph class is \quote {automatic}) \NC \NR +\NC kerns \NC array \NC only for horizontal fonts, if set \NC \NR +\NC vkerns \NC array \NC only for vertical fonts, if set \NC \NR +\NC dependents \NC array \NC linear array of glyph name strings, only if nonempty\NC \NR +\NC lookups \NC table \NC only if nonempty \NC \NR +\NC ligatures \NC table \NC only if nonempty \NC \NR +\NC anchors \NC table \NC only if set \NC \NR +\NC comment \NC string \NC only if set \NC \NR +\NC tex_height \NC number \NC only if set \NC \NR +\NC tex_depth \NC number \NC only if set \NC \NR +\NC italic_correction \NC number \NC only if set \NC \NR +\NC top_accent \NC number \NC only if set \NC \NR +\NC is_extended_shape \NC number \NC only if this character is part of a math extension list \NC \NR +\NC altuni \NC table \NC alternate \UNICODE\ items \NC \NR +\NC vert_variants \NC table \NC \NC \NR +\NC horiz_variants \NC table \NC \NC \NR +\NC mathkern \NC table \NC \NC \NR +\stoptabulate + +On \type {boundingbox}: The boundingbox information for \TRUETYPE\ fonts and +\TRUETYPE-based \OTF\ fonts is read directly from the font file. 
+\POSTSCRIPT-based fonts do not have this information, so the boundingbox of +traditional \POSTSCRIPT\ fonts is generated by interpreting the actual bezier +curves to find the exact boundingbox. This can be a slow process, so the +boundingboxes of \POSTSCRIPT-based \OTF\ fonts (and raw \CFF\ fonts) are +calculated using an approximation of the glyph shape based on the actual glyph +points only, instead of taking the whole curve into account. This means that +glyphs that have missing points at extrema will have a too|-|tight boundingbox, +but the processing is so much faster that in our opinion the tradeoff is worth +it. + +The \type {kerns} and \type {vkerns} are linear arrays of small hashes: + +\starttabulate[|lT|l|p|] +\NC \ssbf key \NC \bf type \NC \bf explanation \NC \NR +\NC char \NC string \NC \NC \NR +\NC off \NC number \NC \NC \NR +\NC lookup \NC string \NC \NC \NR +\stoptabulate + +The \type {lookups} is a hash, based on lookup subtable names, with +the value of each key inside that a linear array of small hashes: + +% TODO: fix this description +\starttabulate[|lT|l|p|] +\NC \ssbf key \NC \bf type \NC \bf explanation \NC \NR +\NC type \NC enum \NC \type {position}, \type {pair}, \type + {substitution}, \type {alternate}, \type + {multiple}, \type {ligature}, \type {lcaret}, + \type {kerning}, \type {vkerning}, \type + {anchors}, \type {contextpos}, \type + {contextsub}, \type {chainpos}, \type + {chainsub}, \type {reversesub}, \type {max}, + \type {kernback}, \type {vkernback} \NC \NR +\NC specification \NC table \NC extra data \NC \NR +\stoptabulate + +For the first seven values of \type {type}, there can be additional +sub|-|information, stored in the sub-table \type {specification}: + +\starttabulate[|lT|l|p|] +\NC \ssbf value \NC \bf type \NC \bf explanation \NC \NR +\NC position \NC table \NC a table of the \type {offset_specs} type \NC \NR +\NC pair \NC table \NC one string: \type {paired}, and an array of one + or two \type {offset_specs} tables: 
\type + {offsets} \NC \NR +\NC substitution \NC table \NC one string: \type {variant} \NC \NR +\NC alternate \NC table \NC one string: \type {components} \NC \NR +\NC multiple \NC table \NC one string: \type {components} \NC \NR +\NC ligature \NC table \NC two strings: \type {components}, \type {char} \NC \NR +\NC lcaret \NC array \NC linear array of numbers \NC \NR +\stoptabulate + +Tables for \type {offset_specs} contain up to four number|-|valued fields: \type +{x} (a horizontal offset), \type {y} (a vertical offset), \type {h} (an advance +width correction) and \type {v} (an advance height correction). + +The \type {ligatures} is a linear array of small hashes: + +\starttabulate[|lT|l|p|] +\NC \ssbf key \NC \bf type \NC \bf explanation \NC \NR +\NC lig \NC table \NC uses the same substructure as a single item in + the \type {lookups} table explained above \NC \NR +\NC char \NC string \NC \NC \NR +\NC components \NC array \NC linear array of named components \NC \NR +\NC ccnt \NC number \NC \NC \NR +\stoptabulate + +The \type {anchor} table is indexed by a string signifying the anchor type, which +is one of + +\starttabulate[|lT|l|p|] +\NC \ssbf key \NC \bf type \NC \bf explanation \NC \NR +\NC mark \NC table \NC placement mark \NC \NR +\NC basechar \NC table \NC mark for attaching combining items to a base char \NC \NR +\NC baselig \NC table \NC mark for attaching combining items to a ligature \NC \NR +\NC basemark \NC table \NC generic mark for attaching combining items to connect to \NC \NR +\NC centry \NC table \NC cursive entry point \NC \NR +\NC cexit \NC table \NC cursive exit point \NC \NR +\stoptabulate + +The content of these is a short array of defined anchors, with the +entry keys being the anchor names. 
For all except \type {baselig}, the +value is a single table with this definition: + +\starttabulate[|lT|l|p|] +\NC \ssbf key \NC \bf type \NC \bf explanation \NC \NR +\NC x \NC number \NC x location \NC \NR +\NC y \NC number \NC y location \NC \NR +\NC ttf_pt_index \NC number \NC truetype point index, only if given \NC \NR +\stoptabulate + +For \type {baselig}, the value is a small array of such anchor sets, one for +each constituent item of the ligature. + +For clarification, an anchor table could for example look like this: + +\starttyping +['anchor'] = { + ['basemark'] = { + ['Anchor-7'] = { ['x']=170, ['y']=1080 } + }, + ['mark'] ={ + ['Anchor-1'] = { ['x']=160, ['y']=810 }, + ['Anchor-4'] = { ['x']=160, ['y']=800 } + }, + ['baselig'] = { + [1] = { ['Anchor-2'] = { ['x']=160, ['y']=650 } }, + [2] = { ['Anchor-2'] = { ['x']=460, ['y']=640 } } + } + } +\stoptyping + +Note: The \type {baselig} table can be sparse! + +\subsubsubsection{map table} + +The top|-|level map is a list of encoding mappings. Each of those is a table +itself. 
+ +\starttabulate[|lT|l|p|] +\NC \ssbf key \NC \bf type \NC \bf explanation \NC \NR +\NC enccount \NC number \NC \NC \NR +\NC encmax \NC number \NC \NC \NR +\NC backmax \NC number \NC \NC \NR +\NC remap \NC table \NC \NC \NR +\NC map \NC array \NC non|-|linear array of mappings\NC \NR +\NC backmap \NC array \NC non|-|linear array of backward mappings\NC \NR +\NC enc \NC table \NC \NC \NR +\stoptabulate + +The \type {remap} table is very small: + +\starttabulate[|lT|l|p|] +\NC \ssbf key \NC \bf type \NC \bf explanation \NC \NR +\NC firstenc \NC number \NC \NC \NR +\NC lastenc \NC number \NC \NC \NR +\NC infont \NC number \NC \NC \NR +\stoptabulate + +The \type {enc} table is a bit more verbose: + +\starttabulate[|lT|l|p|] +\NC \ssbf key \NC \bf type \NC \bf explanation \NC \NR +\NC enc_name \NC string \NC \NC \NR +\NC char_cnt \NC number \NC \NC \NR +\NC char_max \NC number \NC \NC \NR +\NC unicode \NC array \NC of \UNICODE\ position numbers\NC \NR +\NC psnames \NC array \NC of \POSTSCRIPT\ glyph names\NC \NR +\NC builtin \NC number \NC \NC \NR +\NC hidden \NC number \NC \NC \NR +\NC only_1byte \NC number \NC \NC \NR +\NC has_1byte \NC number \NC \NC \NR +\NC has_2byte \NC number \NC \NC \NR +\NC is_unicodebmp \NC number \NC only if nonzero\NC \NR +\NC is_unicodefull \NC number \NC only if nonzero\NC \NR +\NC is_custom \NC number \NC only if nonzero\NC \NR +\NC is_original \NC number \NC only if nonzero\NC \NR +\NC is_compact \NC number \NC only if nonzero\NC \NR +\NC is_japanese \NC number \NC only if nonzero\NC \NR +\NC is_korean \NC number \NC only if nonzero\NC \NR +\NC is_tradchinese \NC number \NC only if nonzero [name?]\NC \NR +\NC is_simplechinese \NC number \NC only if nonzero\NC \NR +\NC low_page \NC number \NC \NC \NR +\NC high_page \NC number \NC \NC \NR +\NC iconv_name \NC string \NC \NC \NR +\NC iso_2022_escape \NC string \NC \NC \NR +\stoptabulate + +\subsubsubsection{private table} + +This is the font's private \POSTSCRIPT\ dictionary, if any. 
Keys and values are +both strings. + +\subsubsubsection{cidinfo table} + +\starttabulate[|lT|l|p|] +\NC \ssbf key \NC \bf type \NC \bf explanation \NC \NR +\NC registry \NC string \NC \NC \NR +\NC ordering \NC string \NC \NC \NR +\NC supplement \NC number \NC \NC \NR +\NC version \NC number \NC \NC \NR +\stoptabulate + +\subsubsubsection[fontloaderpfminfotable]{pfminfo table} + +The \type {pfminfo} table contains most of the OS/2 information: + +\starttabulate[|lT|l|p|] +\NC \ssbf key \NC \bf type \NC \bf explanation \NC \NR +\NC pfmset \NC number \NC \NC \NR +\NC winascent_add \NC number \NC \NC \NR +\NC windescent_add \NC number \NC \NC \NR +\NC hheadascent_add \NC number \NC \NC \NR +\NC hheaddescent_add \NC number \NC \NC \NR +\NC typoascent_add \NC number \NC \NC \NR +\NC typodescent_add \NC number \NC \NC \NR +\NC subsuper_set \NC number \NC \NC \NR +\NC panose_set \NC number \NC \NC \NR +\NC hheadset \NC number \NC \NC \NR +\NC vheadset \NC number \NC \NC \NR +\NC pfmfamily \NC number \NC \NC \NR +\NC weight \NC number \NC \NC \NR +\NC width \NC number \NC \NC \NR +\NC avgwidth \NC number \NC \NC \NR +\NC firstchar \NC number \NC \NC \NR +\NC lastchar \NC number \NC \NC \NR +\NC fstype \NC number \NC \NC \NR +\NC linegap \NC number \NC \NC \NR +\NC vlinegap \NC number \NC \NC \NR +\NC hhead_ascent \NC number \NC \NC \NR +\NC hhead_descent \NC number \NC \NC \NR +\NC os2_typoascent \NC number \NC \NC \NR +\NC os2_typodescent \NC number \NC \NC \NR +\NC os2_typolinegap \NC number \NC \NC \NR +\NC os2_winascent \NC number \NC \NC \NR +\NC os2_windescent \NC number \NC \NC \NR +\NC os2_subxsize \NC number \NC \NC \NR +\NC os2_subysize \NC number \NC \NC \NR +\NC os2_subxoff \NC number \NC \NC \NR +\NC os2_subyoff \NC number \NC \NC \NR +\NC os2_supxsize \NC number \NC \NC \NR +\NC os2_supysize \NC number \NC \NC \NR +\NC os2_supxoff \NC number \NC \NC \NR +\NC os2_supyoff \NC number \NC \NC \NR +\NC os2_strikeysize \NC number \NC \NC \NR +\NC os2_strikeypos \NC 
number \NC \NC \NR +\NC os2_family_class \NC number \NC \NC \NR +\NC os2_xheight \NC number \NC \NC \NR +\NC os2_capheight \NC number \NC \NC \NR +\NC os2_defaultchar \NC number \NC \NC \NR +\NC os2_breakchar \NC number \NC \NC \NR +\NC os2_vendor \NC string \NC \NC \NR +\NC codepages \NC table \NC A two-number array of encoded code pages\NC \NR +\NC unicoderages \NC table \NC A four-number array of encoded unicode ranges\NC \NR +\NC panose \NC table \NC \NC \NR +\stoptabulate + +The \type {panose} subtable has exactly 10 string keys: + +\starttabulate[|lT|l|p|] +\NC \ssbf key \NC \bf type \NC \bf explanation \NC \NR +\NC familytype \NC string \NC Values as in the \OPENTYPE\ font + specification: \type {Any}, \type {No Fit}, + \type {Text and Display}, \type {Script}, + \type {Decorative}, \type {Pictorial} \NC + \NR +\NC serifstyle \NC string \NC See the \OPENTYPE\ font specification for + values \NC \NR +\NC weight \NC string \NC id. \NC \NR +\NC proportion \NC string \NC id. \NC \NR +\NC contrast \NC string \NC id. \NC \NR +\NC strokevariation \NC string \NC id. \NC \NR +\NC armstyle \NC string \NC id. \NC \NR +\NC letterform \NC string \NC id. \NC \NR +\NC midline \NC string \NC id. \NC \NR +\NC xheight \NC string \NC id. \NC \NR +\stoptabulate + +\subsubsubsection[fontloadernamestable]{names table} + +Each item has two top|-|level keys: + +\starttabulate[|lT|l|p|] +\NC \ssbf key \NC \bf type \NC \bf explanation \NC \NR +\NC lang \NC string \NC language for this entry \NC \NR +\NC names \NC table \NC \NC \NR +\stoptabulate + +The \type {names} keys are the actual \TRUETYPE\ name strings. 
The possible keys +are: + +\starttabulate[|lT|p|] +\NC \ssbf key \NC \bf explanation \NC \NR +\NC copyright \NC \NC \NR +\NC family \NC \NC \NR +\NC subfamily \NC \NC \NR +\NC uniqueid \NC \NC \NR +\NC fullname \NC \NC \NR +\NC version \NC \NC \NR +\NC postscriptname \NC \NC \NR +\NC trademark \NC \NC \NR +\NC manufacturer \NC \NC \NR +\NC designer \NC \NC \NR +\NC descriptor \NC \NC \NR +\NC venderurl \NC \NC \NR +\NC designerurl \NC \NC \NR +\NC license \NC \NC \NR +\NC licenseurl \NC \NC \NR +\NC idontknow \NC \NC \NR +\NC preffamilyname \NC \NC \NR +\NC prefmodifiers \NC \NC \NR +\NC compatfull \NC \NC \NR +\NC sampletext \NC \NC \NR +\NC cidfindfontname \NC \NC \NR +\NC wwsfamily \NC \NC \NR +\NC wwssubfamily \NC \NC \NR +\stoptabulate + +\subsubsubsection{anchor_classes table} + +The anchor_classes classes: + +\starttabulate[|lT|l|p|] +\NC \ssbf key \NC \bf type \NC \bf explanation \NC \NR +\NC name \NC string \NC a descriptive id of this anchor class\NC \NR +\NC lookup \NC string \NC \NC \NR +\NC type \NC string \NC one of \type {mark}, \type {mkmk}, \type {curs}, \type {mklg} \NC \NR +\stoptabulate + +% type is actually a lookup subtype, not a feature name. Officially, these +% strings should be gpos_mark2mark etc. + +\subsubsubsection{gpos table} + +The \type {gpos} table has one array entry for each lookup. (The \type {gpos_} +prefix is somewhat redundant.) 
+ +\starttabulate[|lT|l|p|] +\NC \ssbf key \NC \bf type \NC \bf explanation \NC \NR +\NC type \NC string \NC one of \type {gpos_single}, \type {gpos_pair}, + \type {gpos_cursive}, \type {gpos_mark2base},\crlf + \type {gpos_mark2ligature}, \type + {gpos_mark2mark}, \type {gpos_context},\crlf \type + {gpos_contextchain} \NC \NR +\NC flags \NC table \NC \NC \NR +\NC name \NC string \NC \NC \NR +\NC features \NC array \NC \NC \NR +\NC subtables \NC array \NC \NC \NR +\stoptabulate + +The flags table has a true value for each of the lookup flags that is actually +set: + +\starttabulate[|lT|l|p|] +\NC \ssbf key \NC \bf type \NC \bf explanation \NC \NR +\NC r2l \NC boolean \NC \NC \NR +\NC ignorebaseglyphs \NC boolean \NC \NC \NR +\NC ignoreligatures \NC boolean \NC \NC \NR +\NC ignorecombiningmarks \NC boolean \NC \NC \NR +\NC mark_class \NC string \NC \NC \NR +\stoptabulate + +The features subtable items of gpos have: + +\starttabulate[|lT|l|p|] +\NC \ssbf key \NC \bf type \NC \bf explanation \NC \NR +\NC tag \NC string \NC \NC \NR +\NC scripts \NC table \NC \NC \NR +\stoptabulate + +The scripts table within features has: + +\starttabulate[|lT|l|p|] +\NC \ssbf key \NC \bf type \NC \bf explanation \NC \NR +\NC script \NC string \NC \NC \NR +\NC langs \NC array of strings \NC \NC \NR +\stoptabulate + +The subtables table has: + +\starttabulate[|lT|l|p|] +\NC \ssbf key \NC \bf type \NC \bf explanation \NC \NR +\NC name \NC string \NC \NC \NR +\NC suffix \NC string \NC (only if used)\NC \NR % used by gpos_single to get a default +\NC anchor_classes \NC number \NC (only if used)\NC \NR +\NC vertical_kerning \NC number \NC (only if used)\NC \NR +\NC kernclass \NC table \NC (only if used)\NC \NR +\stoptabulate + + +The kernclass with subtables table has: + +\starttabulate[|lT|l|p|] +\NC \ssbf key \NC \bf type \NC \bf explanation \NC \NR +\NC firsts \NC array of strings \NC \NC \NR +\NC seconds \NC array of strings \NC \NC \NR +\NC lookup \NC string or array \NC associated 
lookup(s) \NC \NR +\NC offsets \NC array of numbers \NC \NC \NR +\stoptabulate + +\subsubsubsection{gsub table} + +This has identical layout to the \type {gpos} table, except for the +type: + +\starttabulate[|lT|l|p|] +\NC \ssbf key \NC \bf type \NC \bf explanation \NC \NR +\NC type \NC string \NC one of \type {gsub_single}, \type {gsub_multiple}, + \type {gsub_alternate}, \type + {gsub_ligature},\crlf \type {gsub_context}, \type + {gsub_contextchain}, \type + {gsub_reversecontextchain} \NC \NR +\stoptabulate + +\subsubsubsection{ttf_tables and ttf_tab_saved tables} + +\starttabulate[|lT|l|p|] +\NC \ssbf key \NC \bf type \NC \bf explanation \NC \NR +\NC tag \NC string \NC \NC \NR +\NC len \NC number \NC \NC \NR +\NC maxlen \NC number \NC \NC \NR +\NC data \NC number \NC \NC \NR +\stoptabulate + +\subsubsubsection{mm table} + +\starttabulate[|lT|l|p|] +\NC \ssbf key \NC \bf type \NC \bf explanation \NC \NR +\NC axes \NC table \NC array of axis names \NC \NR +\NC instance_count \NC number \NC \NC \NR +\NC positions \NC table \NC array of instance positions + (\#axes * instances )\NC \NR +\NC defweights \NC table \NC array of default weights for instances \NC \NR +\NC cdv \NC string \NC \NC \NR +\NC ndv \NC string \NC \NC \NR +\NC axismaps \NC table \NC \NC \NR +\stoptabulate + +The \type {axismaps}: + +\starttabulate[|lT|l|p|] +\NC \ssbf key \NC \bf type \NC \bf explanation \NC \NR +\NC blends \NC table \NC an array of blend points \NC \NR +\NC designs \NC table \NC an array of design values \NC \NR +\NC min \NC number \NC \NC \NR +\NC def \NC number \NC \NC \NR +\NC max \NC number \NC \NC \NR +\stoptabulate + +\subsubsubsection{mark_classes table} + +The keys in this table are mark class names, and the values are a +space|-|separated string of glyph names in this class. 
+ +\subsubsubsection{math table} + +\starttabulate[|lT|p|] +\NC ScriptPercentScaleDown \NC \NC \NR +\NC ScriptScriptPercentScaleDown \NC \NC \NR +\NC DelimitedSubFormulaMinHeight \NC \NC \NR +\NC DisplayOperatorMinHeight \NC \NC \NR +\NC MathLeading \NC \NC \NR +\NC AxisHeight \NC \NC \NR +\NC AccentBaseHeight \NC \NC \NR +\NC FlattenedAccentBaseHeight \NC \NC \NR +\NC SubscriptShiftDown \NC \NC \NR +\NC SubscriptTopMax \NC \NC \NR +\NC SubscriptBaselineDropMin \NC \NC \NR +\NC SuperscriptShiftUp \NC \NC \NR +\NC SuperscriptShiftUpCramped \NC \NC \NR +\NC SuperscriptBottomMin \NC \NC \NR +\NC SuperscriptBaselineDropMax \NC \NC \NR +\NC SubSuperscriptGapMin \NC \NC \NR +\NC SuperscriptBottomMaxWithSubscript \NC \NC \NR +\NC SpaceAfterScript \NC \NC \NR +\NC UpperLimitGapMin \NC \NC \NR +\NC UpperLimitBaselineRiseMin \NC \NC \NR +\NC LowerLimitGapMin \NC \NC \NR +\NC LowerLimitBaselineDropMin \NC \NC \NR +\NC StackTopShiftUp \NC \NC \NR +\NC StackTopDisplayStyleShiftUp \NC \NC \NR +\NC StackBottomShiftDown \NC \NC \NR +\NC StackBottomDisplayStyleShiftDown \NC \NC \NR +\NC StackGapMin \NC \NC \NR +\NC StackDisplayStyleGapMin \NC \NC \NR +\NC StretchStackTopShiftUp \NC \NC \NR +\NC StretchStackBottomShiftDown \NC \NC \NR +\NC StretchStackGapAboveMin \NC \NC \NR +\NC StretchStackGapBelowMin \NC \NC \NR +\NC FractionNumeratorShiftUp \NC \NC \NR +\NC FractionNumeratorDisplayStyleShiftUp \NC \NC \NR +\NC FractionDenominatorShiftDown \NC \NC \NR +\NC FractionDenominatorDisplayStyleShiftDown \NC \NC \NR +\NC FractionNumeratorGapMin \NC \NC \NR +\NC FractionNumeratorDisplayStyleGapMin \NC \NC \NR +\NC FractionRuleThickness \NC \NC \NR +\NC FractionDenominatorGapMin \NC \NC \NR +\NC FractionDenominatorDisplayStyleGapMin \NC \NC \NR +\NC SkewedFractionHorizontalGap \NC \NC \NR +\NC SkewedFractionVerticalGap \NC \NC \NR +\NC OverbarVerticalGap \NC \NC \NR +\NC OverbarRuleThickness \NC \NC \NR +\NC OverbarExtraAscender \NC \NC \NR +\NC UnderbarVerticalGap \NC \NC \NR +\NC 
UnderbarRuleThickness \NC \NC \NR +\NC UnderbarExtraDescender \NC \NC \NR +\NC RadicalVerticalGap \NC \NC \NR +\NC RadicalDisplayStyleVerticalGap \NC \NC \NR +\NC RadicalRuleThickness \NC \NC \NR +\NC RadicalExtraAscender \NC \NC \NR +\NC RadicalKernBeforeDegree \NC \NC \NR +\NC RadicalKernAfterDegree \NC \NC \NR +\NC RadicalDegreeBottomRaisePercent \NC \NC \NR +\NC MinConnectorOverlap \NC \NC \NR +\NC FractionDelimiterSize \NC \NC \NR +\NC FractionDelimiterDisplayStyleSize \NC \NC \NR +\stoptabulate + +\subsubsubsection{validation_state table} + +\starttabulate[|lT|p|] +\NC \ssbf key \NC \bf explanation \NC \NR +\NC bad_ps_fontname \NC \NC \NR +\NC bad_glyph_table \NC \NC \NR +\NC bad_cff_table \NC \NC \NR +\NC bad_metrics_table \NC \NC \NR +\NC bad_cmap_table \NC \NC \NR +\NC bad_bitmaps_table \NC \NC \NR +\NC bad_gx_table \NC \NC \NR +\NC bad_ot_table \NC \NC \NR +\NC bad_os2_version \NC \NC \NR +\NC bad_sfnt_header \NC \NC \NR +\stoptabulate + +\subsubsubsection{horiz_base and vert_base table} + +\starttabulate[|lT|l|p|] +\NC \ssbf key \NC \bf type \NC \bf explanation \NC \NR +\NC tags \NC table \NC an array of script list tags\NC \NR +\NC scripts \NC table \NC \NC \NR +\stoptabulate + +The \type {scripts} subtable: + +\starttabulate[|lT|l|p|] +\NC \ssbf key \NC \bf type \NC \bf explanation \NC \NR +\NC baseline \NC table \NC \NC \NR +\NC default_baseline \NC number \NC \NC \NR +\NC lang \NC table \NC \NC \NR +\stoptabulate + + +The \type {lang} subtable: + +\starttabulate[|lT|l|p|] +\NC \ssbf key \NC \bf type \NC \bf explanation \NC \NR +\NC tag \NC string \NC a script tag \NC \NR +\NC ascent \NC number \NC \NC \NR +\NC descent \NC number \NC \NC \NR +\NC features \NC table \NC \NC \NR +\stoptabulate + +The \type {features} points to an array of tables with the same layout except +that in those nested tables, the tag represents a language. + +\subsubsubsection{altuni table} + +An array of alternate \UNICODE\ values. 
Inside that array are hashes with: + +\starttabulate[|lT|l|p|] +\NC \ssbf key \NC \bf type \NC \bf explanation \NC \NR +\NC unicode \NC number \NC this glyph is also used for this unicode \NC \NR +\NC variant \NC number \NC the alternative is driven by this unicode selector \NC \NR +\stoptabulate + +\subsubsubsection{vert_variants and horiz_variants table} + +\starttabulate[|lT|l|p|] +\NC \ssbf key \NC \bf type \NC \bf explanation \NC \NR +\NC variants \NC string \NC \NC \NR +\NC italic_correction \NC number \NC \NC \NR +\NC parts \NC table \NC \NC \NR +\stoptabulate + +The \type {parts} table is an array of smaller tables: + +\starttabulate[|lT|l|p|] +\NC \ssbf key \NC \bf type \NC \bf explanation \NC \NR +\NC component \NC string \NC \NC \NR +\NC extender \NC number \NC \NC \NR +\NC start \NC number \NC \NC \NR +\NC end \NC number \NC \NC \NR +\NC advance \NC number \NC \NC \NR +\stoptabulate + + +\subsubsubsection{mathkern table} + +\starttabulate[|lT|l|p|] +\NC \ssbf key \NC \bf type \NC \bf explanation \NC \NR +\NC top_right \NC table \NC \NC \NR +\NC bottom_right \NC table \NC \NC \NR +\NC top_left \NC table \NC \NC \NR +\NC bottom_left \NC table \NC \NC \NR +\stoptabulate + +Each of the subtables is an array of small hashes with two keys: + +\starttabulate[|lT|l|p|] +\NC \ssbf key \NC \bf type \NC \bf explanation \NC \NR +\NC height \NC number \NC \NC \NR +\NC kern \NC number \NC \NC \NR +\stoptabulate + +\subsubsubsection{kerns table} + +Substructure is identical to the per|-|glyph subtable. + +\subsubsubsection{vkerns table} + +Substructure is identical to the per|-|glyph subtable. 
+ +\subsubsubsection{texdata table} + +\starttabulate[|lT|l|p|] +\NC \ssbf key \NC \bf type \NC \bf explanation \NC \NR +\NC type \NC string \NC \type {unset}, \type {text}, \type {math}, \type {mathext} \NC \NR +\NC params \NC array \NC 22 font numeric parameters \NC \NR +\stoptabulate + +\subsubsubsection{lookups table} + +Top|-|level \type {lookups} is quite different from the ones at character level. +The keys in this hash are strings, the values the actual lookups, represented as +dictionary tables. + +\starttabulate[|lT|l|p|] +\NC \ssbf key \NC \bf type \NC \bf explanation \NC \NR +\NC type \NC string \NC \NC \NR +\NC format \NC enum \NC one of \type {glyphs}, \type {class}, \type {coverage}, \type {reversecoverage} \NC \NR +\NC tag \NC string \NC \NC \NR +\NC current_class \NC array \NC \NC \NR +\NC before_class \NC array \NC \NC \NR +\NC after_class \NC array \NC \NC \NR +\NC rules \NC array \NC an array of rule items\NC \NR +\stoptabulate + +Rule items have one common item and one specialized item: + +\starttabulate[|lT|l|p|] +\NC \ssbf key \NC \bf type \NC \bf explanation \NC \NR +\NC lookups \NC array \NC a linear array of lookup names\NC \NR +\NC glyphs \NC array \NC only if the parent's format is \type {glyphs}\NC \NR +\NC class \NC array \NC only if the parent's format is \type {class}\NC \NR +\NC coverage \NC array \NC only if the parent's format is \type {coverage}\NC \NR +\NC reversecoverage \NC array \NC only if the parent's format is \type {reversecoverage}\NC \NR +\stoptabulate + +A glyph table is: + +\starttabulate[|lT|l|p|] +\NC \ssbf key \NC \bf type \NC \bf explanation \NC \NR +\NC names \NC string \NC \NC \NR +\NC back \NC string \NC \NC \NR +\NC fore \NC string \NC \NC \NR +\stoptabulate + +A class table is: + +\starttabulate[|lT|l|p|] +\NC \ssbf key \NC \bf type \NC \bf explanation \NC \NR +\NC current \NC array \NC of numbers \NC \NR +\NC before \NC array \NC of numbers \NC \NR +\NC after \NC array \NC of numbers \NC \NR +\stoptabulate + 
+coverage: + +\starttabulate[|lT|l|p|] +\NC \ssbf key \NC \bf type \NC \bf explanation \NC \NR +\NC current \NC array \NC of strings \NC \NR +\NC before \NC array \NC of strings\NC \NR +\NC after \NC array \NC of strings \NC \NR +\stoptabulate + +reversecoverage: + +\starttabulate[|lT|l|p|] +\NC \ssbf key \NC \bf type \NC \bf explanation \NC \NR +\NC current \NC array \NC of strings \NC \NR +\NC before \NC array \NC of strings\NC \NR +\NC after \NC array \NC of strings \NC \NR +\NC replacements \NC string \NC \NC \NR +\stoptabulate + +%*********************************************************************** + +\section{The \type {img} library} + +The \type {img} library can be used as an alternative to \type {\pdfximage} and +\type {\pdfrefximage}, and the associated \quote {satellite} commands like \type +{\pdfximagebbox}. Image objects can also be used within virtual fonts via the +\type {image} command listed in~\in {section} [virtualfonts]. + +\subsection{\type {img.new}} + +\startfunctioncall +<image> var = img.new() +<image> var = img.new(<table> image_spec) +\stopfunctioncall + +This function creates a userdata object of type \quote {image}. The \type +{image_spec} argument is optional. If it is given, it must be a table, and that +table must contain a \type {filename} key. A number of other keys can also be +useful, these are explained below. + +You can either say + +\starttyping +a = img.new() +\stoptyping + +followed by + +\starttyping +a.filename = "foo.png" +\stoptyping + +or you can put the file name (and some or all of the other keys) into a table +directly, like so: + +\starttyping +a = img.new({filename='foo.pdf', page=1}) +\stoptyping + +The generated \type {<image>} userdata object allows access to a set of +user|-|specified values as well as a set of values that are normally filled in +and updated automatically by \LUATEX\ itself. 
Some of those are derived from the +actual image file, others are updated to reflect the \PDF\ output status of the +object. + +There is one required user-specified field: the file name (\type {filename}). It +can optionally be augmented by the requested image dimensions (\type {width}, +\type {depth}, \type {height}), user|-|specified image attributes (\type {attr}), +the requested \PDF\ page identifier (\type {page}), the requested boundingbox +(\type {pagebox}) for \PDF\ inclusion, the requested color space object (\type +{colorspace}). + +The function \type {img.new} does not access the actual image file, it just +creates the \type {<image>} userdata object and initializes some memory +structures. The \type {<image>} object and its internal structures are +automatically garbage collected. + +Once the image is scanned, all the values in the \type {<image>} except \type +{width}, \type {height} and \type {depth}, become frozen, and you cannot change +them any more. + +\subsection{\type {img.keys}} + +\startfunctioncall +<table> keys = img.keys() +\stopfunctioncall + +This function returns a list of all the possible \type {image_spec} keys, both +user-supplied and automatic ones. + +% hahe: i need to add r/w ro column... 
+\starttabulate[|l|l|p|] +\NC \bf field name\NC \bf type \NC \bf description \NC \NR +\NC attr \NC string \NC the image attributes for \LUATEX \NC \NR +\NC bbox \NC table \NC table with 4 boundingbox dimensions + \type {llx}, \type {lly}, \type {urx}, + and \type {ury} overruling the \type {pagebox} + entry\NC \NR +\NC colordepth \NC number \NC the number of bits used by the color space\NC \NR +\NC colorspace \NC number \NC the color space object number \NC \NR +\NC depth \NC number \NC the image depth for \LUATEX\ + (in scaled points)\NC \NR +\NC filename \NC string \NC the image file name \NC \NR +\NC filepath \NC string \NC the full (expanded) file name of the image\NC \NR +\NC height \NC number \NC the image height for \LUATEX\ + (in scaled points)\NC \NR +\NC imagetype \NC string \NC one of \type {pdf}, \type {png}, \type {jpg}, \type {jp2}, + \type {jbig2}, or \type {nil} \NC \NR +\NC index \NC number \NC the \PDF\ image name suffix \NC \NR +\NC objnum \NC number \NC the \PDF\ image object number \NC \NR +\NC page \NC number or string \NC the identifier for the requested image page + (default is the number 1)\NC \NR +\NC pagebox \NC string \NC the requested bounding box, one of + \type {none}, \type {media}, \type {crop}, + \type {bleed}, \type {trim}, \type {art} \NC \NR +\NC pages \NC number \NC the total number of available pages \NC \NR +\NC rotation \NC number \NC the image rotation from included \PDF\ file, + in multiples of 90~deg.
\NC \NR +\NC stream \NC string \NC the raw stream data for an \type {/XObject} + \type {/Form} object\NC \NR +\NC transform \NC number \NC the image transform, integer number 0..7\NC \NR +\NC width \NC number \NC the image width for \LUATEX\ + (in scaled points)\NC \NR +\NC xres \NC number \NC the horizontal natural image resolution + (in \DPI) \NC \NR +\NC xsize \NC number \NC the natural image width \NC \NR +\NC yres \NC number \NC the vertical natural image resolution + (in \DPI) \NC \NR +\NC ysize \NC number \NC the natural image height \NC \NR +\stoptabulate + +A running (undefined) dimension in \type {width}, \type {height}, or \type +{depth} is represented as \type {nil} in \LUA, so if you want to load an image at +its \quote {natural} size, you do not have to specify any of those three fields. + +The \type {stream} parameter allows you to fabricate an \type {/XObject} \type +{/Form} object from a string giving the stream contents, e.g., for a filled +rectangle: + +\startfunctioncall +a.stream = "0 0 20 10 re f" +\stopfunctioncall + +When writing the image, an \type {/XObject} \type {/Form} object is created, like +with embedded \PDF\ file writing. The object is written out only once. The \type +{stream} key requires that the \type {bbox} table is also given. The \type +{stream} key conflicts with the \type {filename} key. The \type {transform} key +works as usual also with \type {stream}. + +The \type {bbox} key needs a table with four boundingbox values, e.g.: + +\startfunctioncall +a.bbox = {"30bp", 0, "225bp", "200bp"} +\stopfunctioncall + +This replaces and overrules any given \type {pagebox} value; with given \type +{bbox} the box dimensions coming with an embedded \PDF\ file are ignored. The +\type {xsize} and \type {ysize} dimensions are set accordingly, when the image is +scaled. The \type {bbox} parameter is ignored for non-\PDF\ images. + +The \type {transform} key allows you to mirror and rotate the image in steps of 90~deg.
+The default value~$0$ gives an unmirrored, unrotated image. Values $1$--$3$ give +counterclockwise rotation by $90$, $180$, or $270$~degrees, whereas with values +$4$--$7$ the image is first mirrored and then rotated counterclockwise by $90$, +$180$, or $270$~degrees. The \type {transform} operation gives the same visual +result as if you would externally preprocess the image by a graphics tool and +then use it in \LUATEX. If a \PDF\ file to be embedded already contains a \type +{/Rotate} specification, the rotation result is the combination of the \type +{/Rotate} rotation followed by the \type {transform} operation. + +\subsection{\type {img.scan}} + +\startfunctioncall +<image> var = img.scan(<image> var) +<image> var = img.scan(<table> image_spec) +\stopfunctioncall + +When you say \type {img.scan(a)} for a new image, the file is scanned, and +variables such as \type {xsize}, \type {ysize}, \type {imagetype}, number of +\type {pages}, and the resolution are extracted. Each of the \type {width}, \type +{height}, \type {depth} fields is set up according to the image dimensions, if +it was not given an explicit value already. An image file will never be +scanned more than once for a given image variable. With all subsequent \type +{img.scan(a)} calls only the dimensions are again set up (if they have been +changed by the user in the meantime). + +For ease of use, you can also say right away + +\starttyping +<image> a = img.scan ({ filename = "foo.png" }) +\stoptyping + +without a prior \type {img.new}. + +Nothing is written yet at this point, so you can do \type {a=img.scan}, retrieve +the available info like image width and height, and then throw away \type {a} +again by saying \type {a=nil}. In that case no image object will be reserved in +the \PDF, and the used memory will be cleaned up automatically.
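+
+The scan|-|and|-|discard pattern described above could look roughly like the
+following sketch. The file name \type {foo.png} is only a placeholder and is
+assumed to be findable by \LUATEX; the fields that are read are the ones listed
+in the table of \type {img.keys}.
+
+\starttyping
+-- scan without a prior img.new; nothing is written to the PDF yet
+local a = img.scan({ filename = "foo.png" }) -- placeholder file name
+
+-- values extracted from the image file itself
+texio.write_nl("natural size: " .. tostring(a.xsize) .. " x " .. tostring(a.ysize))
+texio.write_nl("resolution:   " .. tostring(a.xres)  .. " x " .. tostring(a.yres))
+
+-- dimensions set up by the scan (in scaled points)
+texio.write_nl("box size:     " .. tostring(a.width) .. " x " .. tostring(a.height))
+
+-- drop the reference; no image object is reserved in the PDF
+a = nil
+\stoptyping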
+ +\subsection{\type {img.copy}} + +\startfunctioncall +<image> var = img.copy(<image> var) +<image> var = img.copy(<table> image_spec) +\stopfunctioncall + +If you say \type {a = b}, then both variables point to the same \type {<image>} +object. If you want to write out an image with different sizes, you can make a copy with +\type {b=img.copy(a)}. + +Afterwards, \type {a} and \type {b} still reference the same actual image +dictionary, but the dimensions for \type {b} can now be changed from their +initial values that were just copies from \type {a}. + +\subsection{\type {img.write}} + +\startfunctioncall +<image> var = img.write(<image> var) +<image> var = img.write(<table> image_spec) +\stopfunctioncall + +By \type {img.write(a)} a \PDF\ object number is allocated, and a whatsit node of +subtype \type {pdf_refximage} is generated and put into the output list. By this +the image \type {a} is placed into the page stream, and the image file is written +out into an image stream object after the shipping of the current page is +finished. + +Again you can do a terse call like + +\starttyping +img.write ({ filename = "foo.png" }) +\stoptyping + +The \type {<image>} variable is returned in case you want it for later +processing. + +\subsection{\type {img.immediatewrite}} + +\startfunctioncall +<image> var = img.immediatewrite(<image> var) +<image> var = img.immediatewrite(<table> image_spec) +\stopfunctioncall + +By \type {img.immediatewrite(a)} a \PDF\ object number is allocated, and the +image file for image \type {a} is written out immediately into the \PDF\ file as +an image stream object (like with \type {\immediate}\type {\pdfximage}). The object +number of the image stream dictionary is then available by the \type {objnum} +key. No \type {pdf_refximage} whatsit node is generated.
You will need an +\type {img.write(a)} or \type {img.node(a)} call to let the image appear on the +page, or reference it by another trick; else you will have a dangling image +object in the \PDF\ file. + +Also here you can do a terse call like + +\starttyping +a = img.immediatewrite ({ filename = "foo.png" }) +\stoptyping + +The \type {<image>} variable is returned and you will most likely need it. + +\subsection{\type {img.node}} + +\startfunctioncall +<node> n = img.node(<image> var) +<node> n = img.node(<table> image_spec) +\stopfunctioncall + +This function allocates a \PDF\ object number and returns a whatsit node of +subtype \type {pdf_refximage}, filled with the image parameters \type {width}, +\type {height}, \type {depth}, and \type {objnum}. Also here you can do a terse +call like: + +\starttyping +n = img.node ({ filename = "foo.png" }) +\stoptyping + +This example outputs an image: + +\starttyping +node.write(img.node{filename="foo.png"}) +\stoptyping + +\subsection{\type {img.types}} + +\startfunctioncall +<table> types = img.types() +\stopfunctioncall + +This function returns a list with the supported image file type names, currently +these are \type {pdf}, \type {png}, \type {jpg}, \type {jp2} (JPEG~2000), and +\type {jbig2}. + +\subsection{\type {img.boxes}} + +\startfunctioncall +<table> boxes = img.boxes() +\stopfunctioncall + +This function returns a list with the supported \PDF\ page box names, currently +these are \type {media}, \type {crop}, \type {bleed}, \type {trim}, and \type +{art} (all in lowercase letters). + +%*********************************************************************** + +\section{The \type {kpse} library} + +This library provides two separate, but nearly identical interfaces to the +\KPATHSEA\ file search functionality: there is a \quote {normal} procedural +interface that shares its kpathsea instance with \LUATEX\ itself, and an object +oriented interface that is completely on its own. 
+ +\subsection{\type {kpse.set_program_name} and \type {kpse.new}} + +Before the search library can be used at all, its database has to be initialized. +There are three possibilities, two of which belong to the procedural interface. + +First, when \LUATEX\ is used to typeset documents, this initialization happens +automatically and the \KPATHSEA\ executable and program names are set to \type +{luatex} (that is, unless explicitly prohibited by the user's startup script; +see~\in {section} [init] for more details). + +Second, in \TEXLUA\ mode, the initialization has to be done explicitly via the +\type {kpse.set_program_name} function, which sets the \KPATHSEA\ executable +(and optionally program) name. + +\startfunctioncall +kpse.set_program_name(<string> name) +kpse.set_program_name(<string> name, <string> progname) +\stopfunctioncall + +The second argument controls the use of the \quote {dotted} values in the \type +{texmf.cnf} configuration file, and defaults to the first argument. + +Third, if you prefer the object oriented interface, you have to call a different +function. It has the same arguments, but it returns a userdata variable. + +\startfunctioncall +local kpathsea = kpse.new(<string> name) +local kpathsea = kpse.new(<string> name, <string> progname) +\stopfunctioncall + +Apart from these two functions, the calling conventions of the interfaces are +identical. Depending on the chosen interface, you either call \type +{kpse.find_file()} or \type {kpathsea:find_file()}, with identical arguments and +return values.
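+
+As an illustration, the two interfaces might be used side by side as in the
+sketch below. It assumes a \TEXLUA\ run (so that the explicit initialization is
+needed) and a \TEX\ distribution in which \type {plain.tex} can be found; both
+are assumptions made for the example only.
+
+\starttyping
+-- procedural interface: shares its kpathsea instance with LuaTeX itself
+kpse.set_program_name("luatex")
+local f = kpse.find_file("plain.tex")
+
+-- object oriented interface: a completely independent instance
+local kpathsea = kpse.new("luatex")
+local g = kpathsea:find_file("plain.tex")
+
+-- both calls take the same arguments and return the same kind of value
+print(f, g)
+\stoptyping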
+ +\subsection{\type {find_file}} + +The most often used function in the library is find_file: + +\startfunctioncall +<string> f = kpse.find_file(<string> filename) +<string> f = kpse.find_file(<string> filename, <string> ftype) +<string> f = kpse.find_file(<string> filename, <boolean> mustexist) +<string> f = kpse.find_file(<string> filename, <string> ftype, <boolean> mustexist) +<string> f = kpse.find_file(<string> filename, <string> ftype, <number> dpi) +\stopfunctioncall + +Arguments: +\startitemize[intro] + +\sym{filename} + +the name of the file you want to find, with or without extension. + +\sym{ftype} + +maps to the \type {-format} argument of \KPSEWHICH. The supported \type {ftype} +values are the same as the ones supported by the standalone \type {kpsewhich} +program: + +\startsimplecolumns +\starttyping +gf +pk +bitmap font +tfm +afm +base +bib +bst +cnf +ls-R +fmt +map +mem +mf +mfpool +mft +mp +mppool +MetaPost support +ocp +ofm +opl +otp +ovf +ovp +graphic/figure +tex +TeX system documentation +texpool +TeX system sources +PostScript header +Troff fonts +type1 fonts +vf +dvips config +ist +truetype fonts +type42 fonts +web2c files +other text files +other binary files +misc fonts +web +cweb +enc files +cmap files +subfont definition files +opentype fonts +pdftex config +lig files +texmfscripts +lua +font feature files +cid maps +mlbib +mlbst +clua +\stoptyping +\stopsimplecolumns + +The default type is \type {tex}. Note: this is different from \KPSEWHICH, which +tries to deduce the file type itself from looking at the supplied extension. + +\sym{mustexist} + +is similar to \KPSEWHICH's \type {-must-exist}, and the default is \type {false}. +If you specify \type {true} (or a non|-|zero integer), then the \KPSE\ library +will search the disk as well as the \type {ls-R} databases. + +\sym{dpi} + +This is used for the size argument of the formats \type {pk}, \type {gf}, and +\type {bitmap font}. 
\stopitemize + + +\subsection{\type {lookup}} + +A more powerful (but slower) generic method for finding files is also available. +It returns a string for each found file. + +\startfunctioncall +<string> f, ... = kpse.lookup(<string> filename, <table> options) +\stopfunctioncall + +The options match commandline arguments from \type {kpsewhich}: + +\starttabulate[|l|l|p|] +\NC \ssbf key \NC \ssbf type \NC \ssbf description \NC \NR +\NC debug \NC number \NC set debugging flags for this lookup\NC \NR +\NC format \NC string \NC use specific file type (see list above)\NC \NR +\NC dpi \NC number \NC use this resolution for this lookup; default 600\NC \NR +\NC path \NC string \NC search in the given path\NC \NR +\NC all \NC boolean \NC output all matches, not just the first\NC \NR +\NC mustexist \NC boolean \NC search the disk as well as ls-R if necessary\NC \NR +\NC mktexpk \NC boolean \NC disable/enable mktexpk generation for this lookup\NC \NR +\NC mktextex \NC boolean \NC disable/enable mktextex generation for this lookup\NC \NR +\NC mktexmf \NC boolean \NC disable/enable mktexmf generation for this lookup\NC \NR +\NC mktextfm \NC boolean \NC disable/enable mktextfm generation for this lookup\NC \NR +\NC subdir \NC string + or table \NC only output matches whose directory part + ends with the given string(s) \NC \NR +\stoptabulate + +\subsection{\type {init_prog}} + +Extra initialization for programs that need to generate bitmap fonts. + +\startfunctioncall +kpse.init_prog(<string> prefix, <number> base_dpi, <string> mfmode) +kpse.init_prog(<string> prefix, <number> base_dpi, <string> mfmode, <string> fallback) +\stopfunctioncall + +\subsection{\type {readable_file}} + +Test if an (absolute) file name is a readable file. 
+ +\startfunctioncall +<string> f = kpse.readable_file(<string> name) +\stopfunctioncall + +The return value is the actual absolute filename you should use, because the disk +name is not always the same as the requested name, due to aliases and +system|-|specific handling under e.g.\ \MSDOS. + +Returns \type {nil} if the file does not exist or is not readable. + +\subsection{\type {expand_path}} + +Like kpsewhich's \type {-expand-path}: + +\startfunctioncall +<string> r = kpse.expand_path(<string> s) +\stopfunctioncall + +\subsection{\type {expand_var}} + +Like kpsewhich's \type {-expand-var}: + +\startfunctioncall +<string> r = kpse.expand_var(<string> s) +\stopfunctioncall + +\subsection{\type {expand_braces}} + +Like kpsewhich's \type {-expand-braces}: + +\startfunctioncall +<string> r = kpse.expand_braces(<string> s) +\stopfunctioncall + +\subsection{\type {show_path}} + +Like kpsewhich's \type {-show-path}: + +\startfunctioncall +<string> r = kpse.show_path(<string> ftype) +\stopfunctioncall + + +\subsection{\type {var_value}} + +Like kpsewhich's \type {-var-value}: + +\startfunctioncall +<string> r = kpse.var_value(<string> s) +\stopfunctioncall + +\subsection{\type {version}} + +Returns the kpathsea version string. + +\startfunctioncall +<string> r = kpse.version() +\stopfunctioncall + + +\section{The \type {lang} library} + +This library provides the interface to \LUATEX's structure +representing a language, and the associated functions. + +\startfunctioncall +<language> l = lang.new() +<language> l = lang.new(<number> id) +\stopfunctioncall + +This function creates a new userdata object. An object of type \type {<language>} +is the first argument to most of the other functions in the \type {lang} +library. These functions can also be used as if they were object methods, using +the colon syntax. + +Without an argument, the next available internal id number will be assigned to +this object. 
With argument, an object will be created that links to the internal +language with that id number. + +\startfunctioncall +<number> n = lang.id(<language> l) +\stopfunctioncall + +returns the internal \type {\language} id number this object refers to. + +\startfunctioncall +<string> n = lang.hyphenation(<language> l) +lang.hyphenation(<language> l, <string> n) +\stopfunctioncall + +Either returns the current hyphenation exceptions for this language, or adds new +ones. The syntax of the string is explained in~\in {section} +[patternsexceptions]. + +\startfunctioncall +lang.clear_hyphenation(<language> l) +\stopfunctioncall + +Clears the exception dictionary for this language. + +\startfunctioncall +<string> n = lang.clean(<string> o) +\stopfunctioncall + +Creates a hyphenation key from the supplied hyphenation value. The syntax of the +argument string is explained in~\in {section} [patternsexceptions]. This function +is useful if you want to do something else based on the words in a dictionary +file, like spell|-|checking. + +\startfunctioncall +<string> n = lang.patterns(<language> l) +lang.patterns(<language> l, <string> n) +\stopfunctioncall + +Adds additional patterns for this language object, or returns the current set. +The syntax of this string is explained in~\in {section} [patternsexceptions]. + +\startfunctioncall +lang.clear_patterns(<language> l) +\stopfunctioncall + +Clears the pattern dictionary for this language. + +\startfunctioncall +<number> n = lang.prehyphenchar(<language> l) +lang.prehyphenchar(<language> l, <number> n) +\stopfunctioncall + +Gets or sets the \quote {pre|-|break} hyphen character for implicit hyphenation +in this language (initially the hyphen, decimal 45). 
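+
+The functions introduced so far could be combined as in the following sketch.
+The exception \type {ta-ble} assumes the usual \TEX\ notation for hyphenation
+exceptions and is used purely as an example value.
+
+\starttyping
+local l = lang.new()                           -- next free internal id
+texio.write_nl("language id: " .. tostring(lang.id(l)))
+
+lang.hyphenation(l, "ta-ble")                  -- add an exception (assumed notation)
+texio.write_nl(tostring(lang.hyphenation(l)))  -- query the current exceptions
+lang.clear_hyphenation(l)                      -- and remove them again
+
+-- the same functions can be called with the colon syntax
+l:prehyphenchar(45)
+texio.write_nl("pre-break char: " .. tostring(l:prehyphenchar()))
+\stoptyping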
+ +\startfunctioncall +<number> n = lang.posthyphenchar(<language> l) +lang.posthyphenchar(<language> l, <number> n) +\stopfunctioncall + +Gets or sets the \quote {post|-|break} hyphen character for implicit hyphenation +in this language (initially null, decimal~0, indicating emptiness). + +\startfunctioncall +<number> n = lang.preexhyphenchar(<language> l) +lang.preexhyphenchar(<language> l, <number> n) +\stopfunctioncall + +Gets or sets the \quote {pre|-|break} hyphen character for explicit hyphenation +in this language (initially null, decimal~0, indicating emptiness). + +\startfunctioncall +<number> n = lang.postexhyphenchar(<language> l) +lang.postexhyphenchar(<language> l, <number> n) +\stopfunctioncall + +Gets or sets the \quote {post|-|break} hyphen character for explicit hyphenation +in this language (initially null, decimal~0, indicating emptiness). + +\startfunctioncall +<boolean> success = lang.hyphenate(<node> head) +<boolean> success = lang.hyphenate(<node> head, <node> tail) +\stopfunctioncall + +Inserts hyphenation points (discretionary nodes) in a node list. If \type {tail} +is given as argument, processing stops on that node. Currently, \type {success} +is always true if \type {head} (and \type {tail}, if specified) are proper nodes, +regardless of possible other errors. + +Hyphenation works only on \quote {characters}, a special subtype of all the glyph +nodes with the node subtype having the value \type {1}. Glyph nodes with +different subtypes are not processed. See \in {section~} [charsandglyphs] for +more details. + +\section{The \type {lua} library} + +This library contains one read|-|only item: + +\starttyping +<string> s = lua.version +\stoptyping + +This returns the \LUA\ version identifier string. The value is currently +\directlua {tex.print(lua.version)}. + +\subsection{\LUA\ bytecode registers} + +\LUA\ registers can be used to communicate \LUA\ functions across \LUA\ chunks.
+The accepted values for assignments are functions and \type {nil}. Likewise, the +retrieved value is either a function or \type {nil}. + +\starttyping +lua.bytecode[<number> n] = <function> f +lua.bytecode[<number> n]() +\stoptyping + +The contents of the \type {lua.bytecode} array are stored inside the format file +as actual \LUA\ bytecode, so it can also be used to preload \LUA\ code. + +Note: The function must not contain any upvalues. Currently, functions containing +upvalues can be stored (and their upvalues are set to \type {nil}), but this is +an artifact of the current \LUA\ implementation and thus subject to change. + +The associated function calls are + +\startfunctioncall +<function> f = lua.getbytecode(<number> n) +lua.setbytecode(<number> n, <function> f) +\stopfunctioncall + +Note: Since a \LUA\ file loaded using \type {loadfile(filename)} is essentially +an anonymous function, a complete file can be stored in a bytecode register like +this: + +\startfunctioncall +lua.bytecode[n] = loadfile(filename) +\stopfunctioncall + +Now all definitions (functions, variables) contained in the file can be +created by executing this bytecode register: + +\startfunctioncall +lua.bytecode[n]() +\stopfunctioncall + +Note that the path of the file is stored in the \LUA\ bytecode to be used in +stack backtraces and therefore dumped into the format file if the above code is +used in \INITEX. If it contains private information, e.g.\ the user name, this +information is then contained in the format file as well. This should be kept in +mind when preloading files into a bytecode register in \INITEX. + +\subsection{\LUA\ chunk name registers} + +There is an array of 65536 (0--65535) potential chunk names for use with the +\type {\directlua} and \type {\latelua} primitives. + +\startfunctioncall +lua.name[<number> n] = <string> s +<string> s = lua.name[<number> n] +\stopfunctioncall + +If you want to unset a lua name, you can assign \type {nil} to it.
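+
+A small sketch of how the two kinds of registers might be used together
+follows. The register numbers and the chunk name \type {mychunk} are arbitrary
+choices made for this example.
+
+\starttyping
+-- store a function that captures no local variables in register 1
+lua.setbytecode(1, function()
+    texio.write_nl("hello from bytecode register 1")
+end)
+
+-- later, possibly from a different Lua chunk: fetch and call it
+local f = lua.getbytecode(1)
+if f then
+    f()
+end
+
+-- free the register again
+lua.setbytecode(1, nil)
+
+-- give chunk number 3 a name, then unset it
+lua.name[3] = "mychunk"
+lua.name[3] = nil
+\stoptyping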
+ +\section{The \type {mplib} library} + +The \MP\ library interface registers itself in the table \type {mplib}. It is +based on \MPLIB\ version \ctxlua {context(mplib.version())}. + +\subsection{\type {mplib.new}} + +To create a new \METAPOST\ instance, call + +\startfunctioncall +<mpinstance> mp = mplib.new({...}) +\stopfunctioncall + +This creates the \type {mp} instance object. The argument hash can have a number +of different fields, as follows: + +\starttabulate[|lT|l|p|p|] +\NC \ssbf name \NC \bf type \NC \bf description \NC \bf default \NC \NR +\NC error_line \NC number \NC error line width \NC 79 \NC \NR +\NC print_line \NC number \NC line length in ps output \NC 100 \NC \NR +\NC random_seed \NC number \NC the initial random seed \NC variable \NC \NR +\NC interaction \NC string \NC the interaction mode, + one of + \type {batch}, + \type {nonstop}, + \type {scroll}, + \type {errorstop} \NC \type {errorstop} \NC \NR +\NC job_name \NC string \NC \type {--jobname} \NC \type {mpout} \NC \NR +\NC find_file \NC function \NC a function to find files \NC only local files \NC \NR +\stoptabulate + +The \type {find_file} function should be of this form: + +\starttyping +<string> found = finder (<string> name, <string> mode, <string> type) +\stoptyping + +with: + +\starttabulate[|lT|l|p|] +\NC \bf name \NC \bf the requested file \NC \NR +\NC mode \NC the file mode: \type {r} or \type {w} \NC \NR +\NC type \NC the kind of file, one of: \type {mp}, \type {tfm}, \type {map}, + \type {pfb}, \type {enc} \NC \NR +\stoptabulate + +Return either the full pathname of the found file, or \type {nil} if the file +cannot be found. + +Note that the new version of \MPLIB\ no longer uses binary mem files, so the way +to preload a set of macros is simply to start off with an \type {input} command +in the first \type {mp:execute()} call. 
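+
+Putting these pieces together, an instance with a \KPATHSEA\ based finder could
+be set up roughly as sketched below. The mapping from the \type {pfb} and \type
+{enc} kinds to the \type {kpse.find_file} type names \type {type1 fonts} and
+\type {enc files} is an assumption of this sketch, as are the job name \type
+{example} and the availability of the \type {plain} macros.
+
+\starttyping
+-- kinds reported by MPlib that need a different kpse file type name
+-- (assumed mapping, see the ftype list of kpse.find_file)
+local special = { pfb = "type1 fonts", enc = "enc files" }
+
+local function finder(name, mode, ftype)
+    if mode == "w" then
+        return name -- output files are written where requested
+    end
+    return kpse.find_file(name, special[ftype] or ftype)
+end
+
+local mp = mplib.new({
+    find_file   = finder,
+    interaction = "nonstop",
+    job_name    = "example", -- arbitrary name for this sketch
+})
+
+-- no mem files any more: preload the macros with an explicit input
+local result = mp:execute("input plain;")
+if result.status > 1 then
+    texio.write_nl(result.log or result.term or "metapost reported errors")
+end
+\stoptyping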
+ +\subsection{\type {mp:statistics}} + +You can request statistics with: + +\startfunctioncall +<table> stats = mp:statistics() +\stopfunctioncall + +This function returns the vital statistics for an \MPLIB\ instance. There are +four fields, giving the maximum number of used items in each of four allocated +object classes: + +\starttabulate[|lT|l|p|] +\NC main_memory \NC number \NC memory size \NC \NR +\NC hash_size \NC number \NC hash size\NC \NR +\NC param_size \NC number \NC simultaneous macro parameters\NC \NR +\NC max_in_open \NC number \NC input file nesting levels\NC \NR +\stoptabulate + +Note that in the new version of \MPLIB, this is informational only. The objects +are all allocated dynamically, so there is no chance of running out of space +unless the available system memory is exhausted. + +\subsection{\type {mp:execute}} + +You can ask the \METAPOST\ interpreter to run a chunk of code by calling + +\startfunctioncall +<table> rettable = mp:execute('metapost language chunk') +\stopfunctioncall + +for various bits of \METAPOST\ language input. Be sure to check the \type +{rettable.status} (see below) because when a fatal \METAPOST\ error occurs the +\MPLIB\ instance will become unusable thereafter. + +Generally speaking, it is best to keep your chunks small, but beware that all +chunks have to obey proper syntax, like each of them is a small file. For +instance, you cannot split a single statement over multiple chunks. + +In contrast with the normal standalone \type {mpost} command, there is {\em no} +implied \quote{input} at the start of the first chunk. + +\subsection{\type {mp:finish}} + +\startfunctioncall +<table> rettable = mp:finish() +\stopfunctioncall + +If for some reason you want to stop using an \MPLIB\ instance while processing is +not yet actually done, you can call \type {mp:finish}. 
Eventually, used memory +will be freed and open files will be closed by the \LUA\ garbage collector, but +an explicit \type {mp:finish} is the only way to capture the final part of the +output streams. + +\subsection{Result table} + +The return value of \type {mp:execute} and \type {mp:finish} is a table with a +few possible keys (only \type {status} is always guaranteed to be present). + +\starttabulate[|l|l|p|] +\NC log \NC string \NC output to the \quote {log} stream \NC \NR +\NC term \NC string \NC output to the \quote {term} stream \NC \NR +\NC error \NC string \NC output to the \quote {error} stream + (only used for \quote {out of memory}) \NC \NR +\NC status \NC number \NC the return value: + \type {0} = good, + \type {1} = warning, + \type {2} = errors, + \type {3} = fatal error \NC \NR +\NC fig \NC table \NC an array of generated figures (if any) \NC \NR +\stoptabulate + +When \type {status} equals~3, you should stop using this \MPLIB\ instance +immediately: it is no longer capable of processing input. + +If it is present, each of the entries in the \type {fig} array is a userdata +representing a figure object, and each of those has a number of object methods +you can call: + +\starttabulate[|l|l|p|] +\NC boundingbox \NC function \NC returns the bounding box, as an array of 4 + values\NC \NR +\NC postscript \NC function \NC returns a string that is the ps output of the + \type {fig}. This function accepts two optional + integer arguments for specifying the values of + \type {prologues} (first argument) and \type + {procset} (second argument)\NC \NR +\NC svg \NC function \NC returns a string that is the svg output of the + \type {fig}.
This function accepts an optional + integer argument for specifying the value of + \type {prologues}\NC \NR +\NC objects \NC function \NC returns the actual array of graphic objects in + this \type {fig} \NC \NR +\NC copy_objects \NC function \NC returns a deep copy of the array of graphic + objects in this \type {fig} \NC \NR +\NC filename \NC function \NC the filename this \type {fig}'s \POSTSCRIPT\ + output would have been written to in standalone + mode \NC \NR +\NC width \NC function \NC the \type {fontcharwd} value \NC \NR +\NC height \NC function \NC the \type {fontcharht} value \NC \NR +\NC depth \NC function \NC the \type {fontchardp} value \NC \NR +\NC italcorr \NC function \NC the \type {fontcharit} value \NC \NR +\NC charcode \NC function \NC the (rounded) \type {charcode} value \NC \NR +\stoptabulate + +Note: you can call \type {fig:objects()} only once for any one \type {fig} +object! + +When the bounding box represents a \quote {negated rectangle}, i.e.\ when the +first set of coordinates is larger than the second set, the picture is empty. + +Graphical objects come in various types, each of which has a different list of +accessible values. The types are: \type {fill}, \type {outline}, \type {text}, +\type {start_clip}, \type {stop_clip}, \type {start_bounds}, \type {stop_bounds}, +\type {special}. + +There is a helper function (\type {mplib.fields(obj)}) to get the list of +accessible values for a particular object, but you can just as easily use the +tables given below. + +All graphical objects have a field \type {type} that gives the object type as a +string value; it is not explicitly mentioned in the following tables. In the +following, \type {number}s are \POSTSCRIPT\ points represented as a floating +point number, unless stated otherwise. Field values that are of type \type +{table} are explained in the next section.
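+
+A sketch of how the result table and the figure methods might be used together.
+It assumes an existing instance \type {mp} in which the plain macros have already
+been loaded; the \METAPOST\ code itself is only an example:
+
+\starttyping
+local result = mp:execute("beginfig(1); draw fullcircle scaled 10; endfig;")
+
+if result.status == 3 then
+    print("fatal error: this instance can no longer be used")
+elseif result.fig then
+    for _, fig in ipairs(result.fig) do
+        local bb = fig:boundingbox()    -- array of 4 values
+        if bb[1] > bb[3] then
+            print("empty picture")      -- negated rectangle
+        else
+            -- objects() may be called only once per figure
+            for _, object in ipairs(fig:objects()) do
+                print(object.type)
+            end
+        end
+    end
+end
+\stoptyping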
+ +\subsubsection{fill} + +\starttabulate[|l|l|p|] +\NC path \NC table \NC the list of knots \NC \NR +\NC htap \NC table \NC the list of knots for the reversed trajectory \NC \NR +\NC pen \NC table \NC knots of the pen \NC \NR +\NC color \NC table \NC the object's color \NC \NR +\NC linejoin \NC number \NC line join style (bare number)\NC \NR +\NC miterlimit \NC number \NC miterlimit\NC \NR +\NC prescript \NC string \NC the prescript text \NC \NR +\NC postscript \NC string \NC the postscript text \NC \NR +\stoptabulate + +The entries \type {htap} and \type {pen} are optional. + +There is a helper function (\type {mplib.pen_info(obj)}) that returns a table +containing a bunch of vital characteristics of the used pen (all values are +floats): + +\starttabulate[|l|l|p|] +\NC width \NC number \NC width of the pen \NC \NR +\NC sx \NC number \NC $x$ scale \NC \NR +\NC rx \NC number \NC $xy$ multiplier \NC \NR +\NC ry \NC number \NC $yx$ multiplier \NC \NR +\NC sy \NC number \NC $y$ scale \NC \NR +\NC tx \NC number \NC $x$ offset \NC \NR +\NC ty \NC number \NC $y$ offset \NC \NR +\stoptabulate + +\subsubsection{outline} + +\starttabulate[|l|l|p|] +\NC path \NC table \NC the list of knots \NC \NR +\NC pen \NC table \NC knots of the pen \NC \NR +\NC color \NC table \NC the object's color \NC \NR +\NC linejoin \NC number \NC line join style (bare number) \NC \NR +\NC miterlimit \NC number \NC miterlimit \NC \NR +\NC linecap \NC number \NC line cap style (bare number) \NC \NR +\NC dash \NC table \NC representation of a dash list \NC \NR +\NC prescript \NC string \NC the prescript text \NC \NR +\NC postscript \NC string \NC the postscript text \NC \NR +\stoptabulate + +The entry \type {dash} is optional.
+ +\subsubsection{text} + +\starttabulate[|l|l|p|] +\NC text \NC string \NC the text \NC \NR +\NC font \NC string \NC font tfm name \NC \NR +\NC dsize \NC number \NC font size \NC \NR +\NC color \NC table \NC the object's color \NC \NR +\NC width \NC number \NC \NC \NR +\NC height \NC number \NC \NC \NR +\NC depth \NC number \NC \NC \NR +\NC transform \NC table \NC a text transformation \NC \NR +\NC prescript \NC string \NC the prescript text \NC \NR +\NC postscript \NC string \NC the postscript text \NC \NR +\stoptabulate + +\subsubsection{special} + +\starttabulate[|l|l|p|] +\NC prescript \NC string \NC special text \NC \NR +\stoptabulate + +\subsubsection{start_bounds, start_clip} + +\starttabulate[|l|l|p|] +\NC path \NC table \NC the list of knots \NC \NR +\stoptabulate + +\subsubsection{stop_bounds, stop_clip} + +There are no fields available. + +\subsection{Subsidiary table formats} + +\subsubsection{Paths and pens} + +Paths and pens (which are really just a special type of path as far as \MPLIB\ is +concerned) are represented by an array where each entry is a table that +represents a knot. + +\starttabulate[|lT|l|p|] +\NC left_type \NC string \NC when present: endpoint, but usually absent \NC \NR +\NC right_type \NC string \NC like \type {left_type} \NC \NR +\NC x_coord \NC number \NC X coordinate of this knot \NC \NR +\NC y_coord \NC number \NC Y coordinate of this knot \NC \NR +\NC left_x \NC number \NC X coordinate of the precontrol point of this knot \NC \NR +\NC left_y \NC number \NC Y coordinate of the precontrol point of this knot \NC \NR +\NC right_x \NC number \NC X coordinate of the postcontrol point of this knot \NC \NR +\NC right_y \NC number \NC Y coordinate of the postcontrol point of this knot \NC \NR +\stoptabulate + +There is one special case: pens that are (possibly transformed) ellipses have an +extra string-valued key \type {type} with value \type {elliptical} besides the +array part containing the knot list.
+ +\subsubsection{Colors} + +A color is an integer array with 0, 1, 3 or 4 values: + +\starttabulate[|l|l|p|] +\NC 0 \NC marking only \NC no values \NC \NR +\NC 1 \NC greyscale \NC one value in the range $[0,1]$, \quote {black} is $0$ \NC \NR +\NC 3 \NC \RGB \NC three values in the range $[0,1]$, \quote {black} is $0,0,0$ \NC \NR +\NC 4 \NC \CMYK \NC four values in the range $[0,1]$, \quote {black} is $0,0,0,1$ \NC \NR +\stoptabulate + +If the color model of the internal object was \type {uninitialized}, then it was +initialized to the values representing \quote {black} in the colorspace \type +{defaultcolormodel} that was in effect at the time of the \type {shipout}. + +\subsubsection{Transforms} + +Each transform is a six|-|item array. + +\starttabulate[|l|l|p|] +\NC 1 \NC number \NC represents x \NC \NR +\NC 2 \NC number \NC represents y \NC \NR +\NC 3 \NC number \NC represents xx \NC \NR +\NC 4 \NC number \NC represents yx \NC \NR +\NC 5 \NC number \NC represents xy \NC \NR +\NC 6 \NC number \NC represents yy \NC \NR +\stoptabulate + +Note that the translation (index 1 and 2) comes first. This differs from the +ordering in \POSTSCRIPT, where the translation comes last. + +\subsubsection{Dashes} + +Each \type {dash} is a two|-|item hash, using the same model as \POSTSCRIPT\ for the +representation of the dashlist. \type {dashes} is an array of \quote {on} and +\quote {off} values, and \type {offset} is the phase of the pattern. + +\starttabulate[|l|l|p|] +\NC dashes \NC hash \NC an array of on-off numbers \NC \NR +\NC offset \NC number \NC the starting offset value \NC \NR +\stoptabulate + +\subsection{Character size information} + +These functions find the size of a glyph in a defined font. The \type {fontname} +is the same name as the argument to \type {infont}; the \type {char} is a glyph +id in the range 0 to 255; the returned \type {w} is in AFM units.
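+
+The three functions described below share the same calling convention. In this
+sketch the instance \type {mp} is assumed to exist already, and the font name
+\type {cmr10} and the glyph number 65 are arbitrary examples:
+
+\starttyping
+local w = mp:char_width ("cmr10", 65)
+local h = mp:char_height("cmr10", 65)
+local d = mp:char_depth ("cmr10", 65)
+print(w, h, d)
+\stoptyping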
+ +\subsubsection{\type {mp:char_width}} + +\startfunctioncall +<number> w = mp:char_width(<string> fontname, <number> char) +\stopfunctioncall + +\subsubsection{\type {mp:char_height}} + +\startfunctioncall +<number> w = mp:char_height(<string> fontname, <number> char) +\stopfunctioncall + +\subsubsection{\type {mp:char_depth}} + +\startfunctioncall +<number> w = mp:char_depth(<string> fontname, <number> char) +\stopfunctioncall + +\section{The \type {node} library} + +The \type {node} library contains functions that facilitate dealing with (lists +of) nodes and their values. They allow you to create, alter, copy, delete, and +insert \LUATEX\ node objects, the core objects within the typesetter. + +\LUATEX\ nodes are represented in \LUA\ as userdata with the metadata type +\type {luatex.node}. The various parts within a node can be accessed using +named fields. + +Each node has at least the three fields \type {next}, \type {id}, and \type +{subtype}: + +\startitemize[intro] + +\startitem + The \type {next} field returns the userdata object for the next node in a + linked list of nodes, or \type {nil}, if there is no next node. +\stopitem + +\startitem + The \type {id} indicates \TEX's \quote{node type}. The field \type {id} has a + numeric value for efficiency reasons, but some of the library functions also + accept a string value instead of \type {id}. +\stopitem + +\startitem + The \type {subtype} is another number. It often gives further information + about a node of a particular \type {id}, but it is most important when + dealing with \quote {whatsits}, because they are differentiated solely based + on their \type {subtype}. +\stopitem + +\stopitemize + +The other available fields depend on the \type {id} (and for \quote {whatsits}, +the \type {subtype}) of the node. Further details on the various fields and their +meanings are given in~\in{chapter}[nodes]. 
+ +Support for \type {unset} (alignment) nodes is partial: they can be queried and +modified from \LUA\ code, but not created. + +Nodes can be compared to each other, but: you are actually comparing indices into +the node memory. This means that equality tests can only be trusted under very +limited conditions. It will not work correctly in any situation where one of the +two nodes has been freed and|/|or reallocated: in that case, there will be false +positives. + +At the moment, memory management of nodes should still be done explicitly by the +user. Nodes are not \quote {seen} by the \LUA\ garbage collector, so you have to +call the node freeing functions yourself when you are no longer in need of a node +(list). Nodes form linked lists without reference counting, so you have to be +careful that when control returns back to \LUATEX\ itself, you have not deleted +nodes that are still referenced from a \type {next} pointer elsewhere, and that +you did not create nodes that are referenced more than once. + +There are statistics available with regards to the allocated node memory, which +can be handy for tracing. + +\subsection{Node handling functions} + +\subsubsection{\type {node.is_node}} + +\startfunctioncall +<boolean> t = node.is_node(<any> item) +\stopfunctioncall + +This function returns true if the argument is a userdata object of +type \type {<node>}. + +\subsubsection{\type {node.types}} + +\startfunctioncall +<table> t = node.types() +\stopfunctioncall + +This function returns an array that maps node id numbers to node type strings, +providing an overview of the possible top|-|level \type {id} types. + +\subsubsection{\type {node.whatsits}} + +\startfunctioncall +<table> t = node.whatsits() +\stopfunctioncall + +\TEX's \quote{whatsits} all have the same \type {id}. The various subtypes are +defined by their \type {subtype} fields. The function is much like \type +{node.types}, except that it provides an array of \type {subtype} mappings. 
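+
+For instance, the following sketch prints the numeric value and the name of
+every node type and whatsit subtype. It uses \type {pairs} instead of \type
+{ipairs} because the first index is not necessarily 1:
+
+\starttyping
+for id, name in pairs(node.types()) do
+    print(id, name)
+end
+
+-- the same for the whatsit subtypes
+for subtype, name in pairs(node.whatsits()) do
+    print(subtype, name)
+end
+\stoptyping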
+ +\subsubsection{\type {node.id}} + +\startfunctioncall +<number> id = node.id(<string> type) +\stopfunctioncall + +This converts a single type name to its internal numeric representation. + +\subsubsection{\type {node.subtype}} + +\startfunctioncall +<number> subtype = node.subtype(<string> type) +\stopfunctioncall + +This converts a single whatsit name to its internal numeric representation (\type +{subtype}). + +\subsubsection{\type {node.type}} + +\startfunctioncall +<string> type = node.type(<any> n) +\stopfunctioncall + +If the argument is a number, then this function converts an internal numeric +representation to an external string representation. Otherwise, it will return +the string \type {node} if the object represents a node, and \type {nil} +otherwise. + +\subsubsection{\type {node.fields}} + +\startfunctioncall +<table> t = node.fields(<number> id) +<table> t = node.fields(<number> id, <number> subtype) +\stopfunctioncall + +This function returns an array of valid field names for a particular type of +node. If you want to get the valid fields for a \quote {whatsit}, you have to +supply the second argument also. In other cases, any given second argument will +be silently ignored. + +This function accepts string \type {id} and \type {subtype} values as well. + +\subsubsection{\type {node.has_field}} + +\startfunctioncall +<boolean> t = node.has_field(<node> n, <string> field) +\stopfunctioncall + +This function returns a boolean that is only true if \type {n} is +actually a node, and it has the field. + +\subsubsection{\type {node.new}} + +\startfunctioncall +<node> n = node.new(<number> id) +<node> n = node.new(<number> id, <number> subtype) +\stopfunctioncall + +Creates a new node. All of the new node's fields are initialized to either zero +or \type {nil} except for \type {id} and \type {subtype} (if supplied). If you +want to create a new whatsit, then the second argument is required, otherwise it +need not be present.
As with all node functions, this function creates a node on +the \TEX\ level. + +This function accepts string \type {id} and \type {subtype} values as well. + +\subsubsection{\type {node.free}} + +\startfunctioncall +node.free(<node> n) +\stopfunctioncall + +Removes the node \type {n} from \TEX's memory. Be careful: no checks are done on +whether this node is still pointed to from a register or some \type {next} field: +it is up to you to make sure that the internal data structures remain correct. + +\subsubsection{\type {node.flush_list}} + +\startfunctioncall +node.flush_list(<node> n) +\stopfunctioncall + +Removes the node list \type {n} and the complete node list following \type {n} +from \TEX's memory. Be careful: no checks are done on whether any of these nodes +is still pointed to from a register or some \type {next} field: it is up to you +to make sure that the internal data structures remain correct. + +\subsubsection{\type {node.copy}} + +\startfunctioncall +<node> m = node.copy(<node> n) +\stopfunctioncall + +Creates a deep copy of node \type {n}, including all nested lists as in the case +of a hlist or vlist node. Only the \type {next} field is not copied. + +\subsubsection{\type {node.copy_list}} + +\startfunctioncall +<node> m = node.copy_list(<node> n) +<node> m = node.copy_list(<node> n, <node> m) +\stopfunctioncall + +Creates a deep copy of the node list that starts at \type {n}. If \type {m} is +also given, the copy stops just before node \type {m}. + +Note that you cannot copy attribute lists this way, specialized functions for +dealing with attribute lists will be provided later but are not there yet. +However, there is normally no need to copy attribute lists as when you do +assignments to the \type {attr} field or make changes to specific attributes, the +needed copying and freeing takes place automatically. 
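+
+A short sketch of the life cycle of a node, using only the functions described
+above. The node type \type {glyph} is an arbitrary choice:
+
+\starttyping
+local n = node.new("glyph")      -- string ids are accepted
+local m = node.copy(n)           -- deep copy, next is not copied
+n.next = m                       -- link the two nodes into a list
+
+local l = node.copy_list(n)      -- copy of the whole two-node list
+
+-- nodes are not garbage collected, so free them explicitly
+node.flush_list(l)
+node.flush_list(n)               -- this also frees m
+\stoptyping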
+ +\subsubsection{\type {node.next}} + +\startfunctioncall +<node> m = node.next(<node> n) +\stopfunctioncall + +Returns the node following this node, or \type {nil} if there is no such node. + +\subsubsection{\type {node.prev}} + +\startfunctioncall +<node> m = node.prev(<node> n) +\stopfunctioncall + +Returns the node preceding this node, or \type {nil} if there is no such node. + +\subsubsection{\type {node.current_attr}} + +\startfunctioncall +<node> m = node.current_attr() +\stopfunctioncall + +Returns the currently active list of attributes, if there is one. + +The intended usage of \type {current_attr} is as follows: + +\starttyping +local x1 = node.new("glyph") +x1.attr = node.current_attr() +local x2 = node.new("glyph") +x2.attr = node.current_attr() +\stoptyping + +or: + +\starttyping +local x1 = node.new("glyph") +local x2 = node.new("glyph") +local ca = node.current_attr() +x1.attr = ca +x2.attr = ca +\stoptyping + +The attribute lists are ref counted and the assignment takes care of incrementing +the refcount. You cannot expect the value \type {ca} to be valid any more when +you assign attributes (using \type {tex.setattribute}) or when control has been +passed back to \TEX. + +Note: this function is somewhat experimental, and it returns the {\it actual} +attribute list, not a copy thereof. Therefore, changing any of the attributes in +the list will change these values for all nodes that have the current attribute +list assigned to them. + +\subsubsection{\type {node.hpack}} + +\startfunctioncall +<node> h, <number> b = node.hpack(<node> n) +<node> h, <number> b = node.hpack(<node> n, <number> w, <string> info) +<node> h, <number> b = node.hpack(<node> n, <number> w, <string> info, <string> dir) +\stopfunctioncall + +This function creates a new hlist by packaging the list that begins at node \type +{n} into a horizontal box. With only a single argument, this box is created using +the natural width of its components. 
In the three argument form, \type {info} +must be either \type {additional} or \type {exactly}, and \type {w} is the +additional (\type {\hbox spread}) or exact (\type {\hbox to}) width to be used. The +second return value is the badness of the generated box. + +Caveat: at this moment, there can be unexpected side|-|effects to this function, +like updating some of the \type {\marks} and \type {\inserts}. Also note that the +content of \type {h} is the original node list \type {n}: if you call \type +{node.free(h)} you will also free the node list itself, unless you explicitly set +the \type {list} field to \type {nil} beforehand. And in a similar way, calling +\type {node.free(n)} will invalidate \type {h} as well! + +\subsubsection{\type {node.vpack}} + +\startfunctioncall +<node> h, <number> b = node.vpack(<node> n) +<node> h, <number> b = node.vpack(<node> n, <number> w, <string> info) +<node> h, <number> b = node.vpack(<node> n, <number> w, <string> info, <string> dir) +\stopfunctioncall + +This function creates a new vlist by packaging the list that begins at node \type +{n} into a vertical box. With only a single argument, this box is created using +the natural height of its components. In the three argument form, \type {info} +must be either \type {additional} or \type {exactly}, and \type {w} is the +additional (\type {\vbox spread}) or exact (\type {\vbox to}) height to be used. + +The second return value is the badness of the generated box. + +See the description of \type {node.hpack()} for a few memory allocation caveats. 
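+
+The memory caveats can be illustrated with a small sketch. Two kern nodes are
+used here only because they are easy to set up; the sizes are arbitrary:
+
+\starttyping
+local a = node.new("kern")
+a.kern = 10 * 65536              -- 10pt in scaled points
+local b = node.new("kern")
+b.kern = 5 * 65536
+a.next = b
+
+local box, badness = node.hpack(a)   -- natural width
+print(box.width, badness)
+
+-- freeing the box also frees the packed list, so a and b must not be
+-- freed (or used) separately afterwards
+node.free(box)
+\stoptyping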
+ +\subsubsection{\type {node.dimensions}} + +\startfunctioncall +<number> w, <number> h, <number> d = node.dimensions(<node> n) +<number> w, <number> h, <number> d = node.dimensions(<node> n, <string> dir) +<number> w, <number> h, <number> d = node.dimensions(<node> n, <node> t) +<number> w, <number> h, <number> d = node.dimensions(<node> n, <node> t, <string> dir) +\stopfunctioncall + +This function calculates the natural in-line dimensions of the node list starting +at node \type {n} and terminating just before node \type {t} (or the end of the +list, if there is no second argument). The return values are scaled points. An +alternative format that starts with glue parameters as the first three arguments +is also possible: + +\startfunctioncall +<number> w, <number> h, <number> d = + node.dimensions(<number> glue_set, <number> glue_sign, + <number> glue_order, <node> n) +<number> w, <number> h, <number> d = + node.dimensions(<number> glue_set, <number> glue_sign, + <number> glue_order, <node> n, <string> dir) +<number> w, <number> h, <number> d = + node.dimensions(<number> glue_set, <number> glue_sign, + <number> glue_order, <node> n, <node> t) +<number> w, <number> h, <number> d = + node.dimensions(<number> glue_set, <number> glue_sign, + <number> glue_order, <node> n, <node> t, <string> dir) +\stopfunctioncall + +This calling method takes glue settings into account and is especially useful for +finding the actual width of a sublist of nodes that are already boxed, for +example in code like this, which prints the width of the space inbetween the +\type {a} and \type {b} as it would be if \type {\box0} was used as-is: + +\starttyping +\setbox0 = \hbox to 20pt {a b} + +\directlua{print (node.dimensions( + tex.box[0].glue_set, + tex.box[0].glue_sign, + tex.box[0].glue_order, + tex.box[0].head.next, + node.tail(tex.box[0].head) +)) } +\stoptyping + +\subsubsection{\type {node.mlist_to_hlist}} + +\startfunctioncall +<node> h = node.mlist_to_hlist(<node> n, + <string> 
display_type, <boolean> penalties) +\stopfunctioncall + +This runs the internal mlist to hlist conversion, converting the math list in +\type {n} into the horizontal list \type {h}. The interface is exactly the same +as for the callback \type {mlist_to_hlist}. + +\subsubsection{\type {node.slide}} + +\startfunctioncall +<node> m = node.slide(<node> n) +\stopfunctioncall + +Returns the last node of the node list that starts at \type {n}. As a +side|-|effect, it also creates a reverse chain of \type {prev} pointers between +nodes. + +\subsubsection{\type {node.tail}} + +\startfunctioncall +<node> m = node.tail(<node> n) +\stopfunctioncall + +Returns the last node of the node list that starts at \type {n}. + +\subsubsection{\type {node.length}} + +\startfunctioncall +<number> i = node.length(<node> n) +<number> i = node.length(<node> n, <node> m) +\stopfunctioncall + +Returns the number of nodes contained in the node list that starts at \type {n}. +If \type {m} is also supplied it stops at \type {m} instead of at the end of the +list. The node \type {m} is not counted. + +\subsubsection{\type {node.count}} + +\startfunctioncall +<number> i = node.count(<number> id, <node> n) +<number> i = node.count(<number> id, <node> n, <node> m) +\stopfunctioncall + +Returns the number of nodes contained in the node list that starts at \type {n} +that have a matching \type {id} field. If \type {m} is also supplied, counting +stops at \type {m} instead of at the end of the list. The node \type {m} is not +counted. + +This function also accepts string \type {id}s. + +\subsubsection{\type {node.traverse}} + +\startfunctioncall +<node> t = node.traverse(<node> n) +\stopfunctioncall + +This is a \LUA\ iterator that loops over the node list that starts at \type {n}. +Typical code like this: + +\starttyping +for n in node.traverse(head) do + ...
+end +\stoptyping + +is functionally equivalent to: + +\starttyping +do + local n + local function f (head,var) + local t + if var == nil then + t = head + else + t = var.next + end + return t + end + while true do + n = f (head, n) + if n == nil then break end + ... + end +end +\stoptyping + +It should be clear from the definition of the function \type {f} that even though +it is possible to add or remove nodes from the node list while traversing, you +have to take great care to make sure all the \type {next} (and \type {prev}) +pointers remain valid. + +If the above is unclear to you, see the section \quote {For Statement} in the +\LUA\ Reference Manual. + +\subsubsection{\type {node.traverse_id}} + +\startfunctioncall +<node> t = node.traverse_id(<number> id, <node> n) +\stopfunctioncall + +This is an iterator that loops over all the nodes in the list that starts at +\type {n} that have a matching \type {id} field. + +See the previous section for details. The change is in the local function \type +{f}, which now does an extra while loop checking against the upvalue \type {id}: + +\starttyping + local function f(head,var) + local t + if var == nil then + t = head + else + t = var.next + end + while t and t.id ~= id do + t = t.next + end + return t + end +\stoptyping + +\subsubsection{\type {node.end_of_math}} + +\startfunctioncall +<node> t = node.end_of_math(<node> start) +\stopfunctioncall + +Looks for and returns the next \type {math_node} following the \type {start}. If +the given node is a math endnode this helper returns that node, else it follows +the list and returns the next math endnode. If no such node is found, \type {nil} +is returned. + +\subsubsection{\type {node.remove}} + +\startfunctioncall +<node> head, current = node.remove(<node> head, <node> current) +\stopfunctioncall + +This function removes the node \type {current} from the list following \type +{head}. It is your responsibility to make sure it is really part of that list.
+The return values are the new \type {head} and \type {current} nodes. The +returned \type {current} is the node following the \type {current} in the calling +argument, and is only passed back as a convenience (or \type {nil}, if there is +no such node). The returned \type {head} is more important, because if the +function is called with \type {current} equal to \type {head}, it will be +changed. + +\subsubsection{\type {node.insert_before}} + +\startfunctioncall +<node> head, new = node.insert_before(<node> head, <node> current, <node> new) +\stopfunctioncall + +This function inserts the node \type {new} before \type {current} into the list +following \type {head}. It is your responsibility to make sure that \type +{current} is really part of that list. The return values are the (potentially +mutated) \type {head} and the node \type {new}, set up to be part of the list +(with correct \type {next} field). If \type {head} is initially \type {nil}, it +will become \type {new}. + +\subsubsection{\type {node.insert_after}} + +\startfunctioncall +<node> head, new = node.insert_after(<node> head, <node> current, <node> new) +\stopfunctioncall + +This function inserts the node \type {new} after \type {current} into the list +following \type {head}. It is your responsibility to make sure that \type +{current} is really part of that list. The return values are the \type {head} and +the node \type {new}, set up to be part of the list (with correct \type {next} +field). If \type {head} is initially \type {nil}, it will become \type {new}. + +\subsubsection{\type {node.first_glyph}} + +\startfunctioncall +<node> n = node.first_glyph(<node> n) +<node> n = node.first_glyph(<node> n, <node> m) +\stopfunctioncall + +Returns the first node in the list starting at \type {n} that is a glyph node +with a subtype indicating it is a glyph, or \type {nil}. If \type {m} is given, +processing stops at (but including) that node, otherwise processing stops at the +end of the list. 
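+
+Because \type {node.remove} returns both the new head and the node following the
+removed one, it combines well with a manual loop. The following sketch removes
+all kern nodes from a list; the helper name \type {remove_kerns} is made up for
+this example:
+
+\starttyping
+local kern_id = node.id("kern")
+
+local function remove_kerns(head)
+    local current = head
+    while current do
+        if current.id == kern_id then
+            local removed = current
+            -- head changes when the first node is removed
+            head, current = node.remove(head, current)
+            node.free(removed)   -- remove only unlinks the node
+        else
+            current = current.next
+        end
+    end
+    return head
+end
+\stoptyping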
+ +\subsubsection{\type {node.ligaturing}} + +\startfunctioncall +<node> h, <node> t, <boolean> success = node.ligaturing(<node> n) +<node> h, <node> t, <boolean> success = node.ligaturing(<node> n, <node> m) +\stopfunctioncall + +Apply \TEX-style ligaturing to the specified nodelist. The tail node \type {m} is +optional. The two returned nodes \type {h} and \type {t} are the new head and +tail (both \type {n} and \type {m} can change into a new ligature). + +\subsubsection{\type {node.kerning}} + +\startfunctioncall +<node> h, <node> t, <boolean> success = node.kerning(<node> n) +<node> h, <node> t, <boolean> success = node.kerning(<node> n, <node> m) +\stopfunctioncall + +Apply \TEX|-|style kerning to the specified nodelist. The tail node \type {m} is +optional. The two returned nodes \type {h} and \type {t} are the head and tail +(either one of these can be an inserted kern node, because special kernings with +word boundaries are possible). + +\subsubsection{\type {node.unprotect_glyphs}} + +\startfunctioncall +node.unprotect_glyphs(<node> n) +\stopfunctioncall + +Subtracts 256 from all glyph node subtypes. This and the next function are +helpers to convert from \type {characters} to \type {glyphs} during node +processing. + +\subsubsection{\type {node.protect_glyphs}} + +\startfunctioncall +node.protect_glyphs(<node> n) +\stopfunctioncall + +Adds 256 to all glyph node subtypes in the node list starting at \type {n}, +except that if the value is 1, it adds only 255. The special handling of 1 means +that \type {characters} will become \type {glyphs} after subtraction of 256. + +\subsubsection{\type {node.last_node}} + +\startfunctioncall +<node> n = node.last_node() +\stopfunctioncall + +This function pops the last node from \TEX's \quote{current list}. It returns +that node, or \type {nil} if the current list is empty. 
+ +\subsubsection{\type {node.write}} + +\startfunctioncall +node.write(<node> n) +\stopfunctioncall + +This is an experimental function that will append a node list to \TEX's \quote +{current list}. The node list is not deep|-|copied! There is no error checking +either! + +\subsubsection{\type {node.protrusion_skippable}} +\startfunctioncall +<boolean> skippable = node.protrusion_skippable(<node> n) +\stopfunctioncall + +Returns \type {true} if, for the purpose of line boundary discovery when +character protrusion is active, this node can be skipped. + +\subsection{Attribute handling} + +Attributes appear as a linked list of userdata objects in the \type {attr} field of +individual nodes. They can be handled individually, but it is much safer and more +efficient to use the dedicated functions associated with them. + +\subsubsection{\type {node.has_attribute}} + +\startfunctioncall +<number> v = node.has_attribute(<node> n, <number> id) +<number> v = node.has_attribute(<node> n, <number> id, <number> val) +\stopfunctioncall + +Tests if a node has the attribute with number \type {id} set. If \type {val} is +also supplied, also tests if the value matches \type {val}. It returns the value, +or, if no match is found, \type {nil}. + +\subsubsection{\type {node.set_attribute}} + +\startfunctioncall +node.set_attribute(<node> n, <number> id, <number> val) +\stopfunctioncall + +Sets the attribute with number \type {id} to the value \type {val}. Duplicate +assignments are ignored. {\em [needs explanation]} + +\subsubsection{\type {node.unset_attribute}} + +\startfunctioncall +<number> v = node.unset_attribute(<node> n, <number> id) +<number> v = node.unset_attribute(<node> n, <number> id, <number> val) +\stopfunctioncall + +Unsets the attribute with number \type {id}. If \type {val} is also supplied, it +will only perform this operation if the value matches \type {val}. Missing +attributes or attribute|-|value pairs are ignored.
+ +If the attribute was actually deleted, returns its old value. Otherwise, returns +\type {nil}. + +\section{The \type {pdf} library} + +This contains variables and functions that are related to the \PDF\ backend. + +\subsection{\type {pdf.mapfile}, \type {pdf.mapline}} + +\startfunctioncall +pdf.mapfile(<string> map file) +pdf.mapline(<string> map line) +\stopfunctioncall + +These two functions can be used to replace primitives \type {\pdfmapfile} and +\type {\pdfmapline} from \PDFTEX. They expect a string as their only parameter and have +no return value. + +The functions also replace the former variables \type {pdf.pdfmapfile} and +\type {pdf.pdfmapline}. + +\subsection{\type {pdf.catalog}, \type {pdf.info}, \type {pdf.names}, + \type {pdf.trailer}} + +These variables offer a read|-|write interface to the corresponding \PDFTEX\ +token lists. The value types are strings and they are written out to the \PDF\ +file directly after the \PDFTEX\ token registers. + +The preferred interface is now \type {pdf.setcatalog}, \type {pdf.setinfo}, +\type {pdf.setnames} and \type {pdf.settrailer} for setting these properties +and \type {pdf.getcatalog}, \type {pdf.getinfo}, \type {pdf.getnames} and +\type {pdf.gettrailer} for querying them. + +The corresponding \quote {\type {pdf}} parameter names \type {pdf.pdfcatalog}, +\type {pdf.pdfinfo}, \type {pdf.pdfnames}, and \type {pdf.pdftrailer} are +not available. + +\subsection{\type {pdf.<set/get>pageattributes}, \type {pdf.<set/get>pageresources}, + \type {pdf.<set/get>pagesattributes}} + +These variables offer a read|-|write interface to related token lists. The value +types are strings. The variables have no interaction with the corresponding +\PDFTEX\ token registers \type {\pdfpageattr}, \type {\pdfpageresources}, and \type +{\pdfpagesattr}. They are written out to the \PDF\ file directly after the +\PDFTEX\ token registers.
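+
+A short sketch of the function|-|style interface, assuming that the setters take
+a single string and the getters return the string that is currently set. The
+\PDF\ fragments used as values are arbitrary examples.
+
+\starttyping
+-- document-level strings (see the catalog/info/names/trailer subsection)
+pdf.setinfo("/Title (An example document)")
+local info = pdf.getinfo()
+
+-- page-level strings
+pdf.setpageattributes("/Rotate 90")
+local attr = pdf.getpageattributes()
+\stoptyping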
+ +The preferred interface is now \type {pdf.setpageattributes}, \type +{pdf.setpagesattributes} and \type {pdf.setpageresources} for setting these +properties and \type {pdf.getpageattributes}, \type {pdf.getpagesattributes} +and \type {pdf.getpageresources} for querying them. + +\subsection{\type {pdf.h}, \type {pdf.v}} + +These are the \type {h} and \type {v} values that define the current location on +the output page, measured from its lower left corner. The values can be queried +using scaled points as units. + +\starttyping +local h = pdf.h +local v = pdf.v +\stoptyping + +\subsection{\type {pdf.getpos}, \type {pdf.gethpos}, \type {pdf.getvpos}} + +These are the function variants of \type {pdf.h} and \type {pdf.v}. Sometimes +using a function is preferred over a key so this saves wrapping. Also, these +functions are faster than the key based access, as \type {h} and \type {v} keys +are not real variables but looked up using a metatable call. The \type {getpos} +function returns two values, the others return one. + +\starttyping +local h, v = pdf.getpos() +\stoptyping + +\subsection{\type {pdf.hasmatrix}, \type {pdf.getmatrix}} + +The current matrix transformation is available via the \type {getmatrix} command, +which returns 6 values: \type {sx}, \type {rx}, \type {ry}, \type {sy}, \type +{tx}, and \type {ty}. The \type {hasmatrix} function returns \type {true} when a +matrix is applied. + +\starttyping +if pdf.hasmatrix() then + local sx, rx, ry, sy, tx, ty = pdf.getmatrix() + -- do something useful or not +end +\stoptyping + +\subsection{\type {pdf.print}} + +A print function to write stuff to the \PDF\ document that can be used from +within a \type {\latelua} argument. This function is not to be used inside +\type {\directlua} unless you know {\it exactly} what you are doing.
+ +\startfunctioncall +pdf.print(<string> s) +pdf.print(<string> type, <string> s) +\stopfunctioncall + +The optional parameter can be used to mimic the behavior of \type {\pdfliteral}: +the \type {type} is \type {direct} or \type {page}. + +\subsection{\type {pdf.immediateobj}} + +This function creates a \PDF\ object and immediately writes it to the \PDF\ file. +It is modelled after \PDFTEX's \type {\immediate} \type {\pdfobj} primitives. All +function variants return the object number of the newly generated object. + +\startfunctioncall +<number> n = pdf.immediateobj(<string> objtext) +<number> n = pdf.immediateobj("file", <string> filename) +<number> n = pdf.immediateobj("stream", <string> streamtext, <string> attrtext) +<number> n = pdf.immediateobj("streamfile", <string> filename, <string> attrtext) +\stopfunctioncall + +The first version puts the \type {objtext} raw into an object. Only the object +wrapper is automatically generated, but any internal structure (like \type {<< +>>} dictionary markers) needs to be provided by the user. The second version with +keyword \type {"file"} as first argument puts the contents of the file with name +\type {filename} raw into the object. The third version with keyword \type +{"stream"} creates a stream object and puts the \type {streamtext} raw into the +stream. The stream length is automatically calculated. The optional \type +{attrtext} goes into the dictionary of that object. The fourth version with +keyword \type {"streamfile"} does the same as the third one, except that it reads the +stream data raw from a file. + +An optional first argument can be given to make the function use a previously +reserved \PDF\ object.
+ +\startfunctioncall +<number> n = pdf.immediateobj(<integer> n, <string> objtext) +<number> n = pdf.immediateobj(<integer> n, "file", <string> filename) +<number> n = pdf.immediateobj(<integer> n, "stream", <string> streamtext, <string> attrtext) +<number> n = pdf.immediateobj(<integer> n, "streamfile", <string> filename, <string> attrtext) +\stopfunctioncall + +\subsection{\type {pdf.obj}} + +This function creates a \PDF\ object, which is written to the \PDF\ file only +when referenced, e.g., by \type {pdf.refobj()}. + +All function variants return the object number of the newly generated object, and +there are two separate calling modes. + +The first mode is modelled after \PDFTEX's \type {\pdfobj} primitive. + +\startfunctioncall +<number> n = pdf.obj(<string> objtext) +<number> n = pdf.obj("file", <string> filename) +<number> n = pdf.obj("stream", <string> streamtext, <string> attrtext) +<number> n = pdf.obj("streamfile", <string> filename, <string> attrtext) +\stopfunctioncall + +An optional first argument can be given to make the function use a previously +reserved \PDF\ object. + +\startfunctioncall +<number> n = pdf.obj(<integer> n, <string> objtext) +<number> n = pdf.obj(<integer> n, "file", <string> filename) +<number> n = pdf.obj(<integer> n, "stream", <string> streamtext, <string> attrtext) +<number> n = pdf.obj(<integer> n, "streamfile", <string> filename, <string> attrtext) +\stopfunctioncall + +The second mode accepts a single argument table with key--value pairs. + +\startfunctioncall +<number> n = pdf.obj { + type = <string>, + immediate = <boolean>, + objnum = <number>, + attr = <string>, + compresslevel = <number>, + objcompression = <boolean>, + file = <string>, + string = <string> +} +\stopfunctioncall + +The \type {type} field can have the values \type {raw} and \type {stream}; this +field is required, the others are optional (within constraints).
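+
+As an illustration of the table mode, the sketch below creates a stream object
+from a string and then references it so that it ends up in the file. The stream
+text and the dictionary entry in \type {attr} are made|-|up values; only the
+keys come from the list above.
+
+\starttyping
+-- create a stream object; it is only written out once it is referenced
+local n = pdf.obj {
+    type   = "stream",
+    string = "some example stream data", -- arbitrary content
+    attr   = "/Example true",            -- hypothetical dictionary entry
+}
+
+-- reference the object by its number so that it gets flushed
+pdf.refobj(n)
+\stoptyping
+
+Only one of \type {string} and \type {file} is given here, in line with the
+constraints of the separate parameter version.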
+ +Note: this mode makes \type {pdf.obj} look more flexible than it actually is: the +constraints from the separate parameter version still apply, so for example you +can't have both \type {string} and \type {file} at the same time. + +\subsection{\type {pdf.refobj}} + +This function, the \LUA\ version of the \type {\pdfrefobj} primitive, references an +object by its object number, so that the object will be written out. + +\startfunctioncall +pdf.refobj(<integer> n) +\stopfunctioncall + +This function works in both the \type {\directlua} and \type {\latelua} environment. +Inside \type {\directlua} a new whatsit node \quote {pdf_refobj} is created, which +will be marked for flushing during page output and the object is then written +directly after the page, when also the resources objects are written out. Inside +\type {\latelua} the object will be marked for flushing. + +This function has no return values. + +\subsection{\type {pdf.reserveobj}} + +This function creates an empty \PDF\ object and returns its number. + +\startfunctioncall +<number> n = pdf.reserveobj() +<number> n = pdf.reserveobj("annot") +\stopfunctioncall + +\subsection{\type {pdf.registerannot}} + +This function adds an object number to the \type {/Annots} array for the current +page without doing anything else. This function can only be used from within +\type {\latelua}. + +\startfunctioncall +pdf.registerannot (<number> objnum) +\stopfunctioncall + +\section{The \type {pdfscanner} library} + +The \type {pdfscanner} library allows interpretation of PDF content streams and +\type {/ToUnicode} (cmap) streams. You can get those streams from the \type +{epdf} library, as explained in an earlier section. 
There is only a single +top|-|level function in this library: + +\startfunctioncall +pdfscanner.scan (<Object> stream, <table> operatortable, <table> info) +\stopfunctioncall + +The first argument, \type {stream}, should be either a PDF stream object, or a +PDF array of PDF stream objects (those options comprise the possible return +values of \type {<Page>:getContents()} and \type {<Object>:getStream()} in the +\type {epdf} library). + +The second argument, \type {operatortable}, should be a Lua table where the keys +are PDF operator name strings and the values are Lua functions (defined by you) +that are used to process those operators. The functions are called whenever the +scanner finds one of these PDF operators in the content stream(s). The functions +are called with two arguments: the \type {scanner} object itself, and the \type +{info} table that was passed as the third argument to \type {pdfscanner.scan}. + +Internally, \type {pdfscanner.scan} loops over the PDF operators in the +stream(s), collecting operands on an internal stack until it finds a PDF +operator. If that PDF operator's name exists in \type {operatortable}, then the +associated function is executed. After the function has run (or when there is no +function to execute) the internal operand stack is cleared in preparation for the +next operator, and processing continues. + +The \type {scanner} argument to the processing functions is needed because it +offers various methods to get the actual operands from the internal operand +stack. + +A simple example of processing a PDF's document stream could look like this: + +\starttyping +function Do (scanner, info) + local val = scanner:pop() + local name = val[2] -- val[1] == 'name' + local resources = info.resources + local xobject = resources:lookup("XObject"):getDict():lookup(name) + print (info.space ..'Use XObject '.. 
name) + if xobject and xobject:isStream() then + local dict = xobject:getStream():getDict() + if dict then + local name = dict:lookup("Subtype") + if name:getName() == "Form" then + local newinfo = { + space = info.space .. " " , + resources = dict:lookup("Resources"):getDict() + } + pdfscanner.scan(xobject, operatortable, newinfo) + end + end + end +end + +operatortable = { Do = Do } + +doc = epdf.open(arg[1]) +pagenum = 1 + +while pagenum <= doc:getNumPages() do + local page = doc:getCatalog():getPage(pagenum) + local info = { + space = " " , + resources = page:getResourceDict() + } + print('Page ' .. pagenum) + pdfscanner.scan(page:getContents(), operatortable, info) + pagenum = pagenum + 1 +end +\stoptyping + +This example iterates over all the actual content in the PDF, and prints out the +found XObject names. While the code demonstrates quite a few of the \type {epdf} +functions, let's focus on the \type {pdfscanner} specific code instead. + +From the bottom up, the line + +\starttyping + pdfscanner.scan(page:getContents(), operatortable, info) +\stoptyping + +runs the scanner with the PDF page's top-level content. + +The third argument, \type {info}, contains two entries: \type {space} is used to +indent the printed output, and \type {resources} is needed so that embedded \type +{XForms} can find their own content. + +The second argument, \type {operatortable}, defines a processing function for a +single PDF operator, \type {Do}. + +The function \type {Do} prints the name of the current XObject, and then starts a +new scanner for that object's content stream, under the condition that the +XObject is in fact a \type {/Form}. That nested scanner is called with a new \type +{info} argument with an updated \type {space} value so that the indentation of +the output nicely nests, and with a new \type {resources} field to help the next +iteration down to properly process any other, embedded XObjects.
+ +Of course, this is not a very useful example in practice, but for the purpose of +demonstrating \type {pdfscanner}, it is just long enough. It makes use of only +one \type {scanner} method: \type {scanner:pop()}. That function pops the top +operand of the internal stack, and returns a lua table where the object at index +one is a string representing the type of the operand, and object two is its +value. + +The list of possible operand types and associated lua value types is: + +\starttabulate[|lT|p|] +\NC integer \NC <number> \NC \NR +\NC real \NC <number> \NC \NR +\NC boolean \NC <boolean> \NC \NR +\NC name \NC <string> \NC \NR +\NC operator \NC <string> \NC \NR +\NC string \NC <string> \NC \NR +\NC array \NC <table> \NC \NR +\NC dict \NC <table> \NC \NR +\stoptabulate + +In case of \type {integer} or \type {real}, the value is always a \LUA\ (floating +point) number. + +In case of \type {name}, the leading slash is always stripped. + +In case of \type {string}, please bear in mind that PDF actually supports +different types of strings (with different encodings) in different parts of the +PDF document, so you may need to reencode some of the results; \type {pdfscanner} +always outputs the byte stream without reencoding anything. \type {pdfscanner} +does not differentiate between literal strings and hexadecimal strings (the +hexadecimal values are decoded), and it treats the stream data for inline images +as a string that is the single operand for \type {EI}. + +In case of \type {array}, the table content is a list of \type {pop} return +values. + +In case of \type {dict}, the table keys are PDF name strings and the values are +\type {pop} return values.
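+
+The operand layout described above can be walked recursively. The following
+sketch turns a \type {pop} return value into a printable string and uses it in
+an operator table for the two text|-|showing operators \type {Tj} (one string
+operand) and \type {TJ} (one array operand). The name \type {describe} is
+hypothetical, and the check for a missing operand is a precaution, not
+documented behavior.
+
+\starttyping
+-- hypothetical helper: format one return value of scanner:pop()
+local function describe(operand)
+    if not operand then
+        return "(no operand)" -- precaution for an empty stack
+    end
+    local kind, value = operand[1], operand[2]
+    if kind == "array" then
+        -- the array content is a list of pop-style return values
+        local parts = { }
+        for i = 1, #value do
+            parts[i] = describe(value[i])
+        end
+        return "[ " .. table.concat(parts, " ") .. " ]"
+    elseif kind == "dict" then
+        -- keys are PDF name strings, values are pop-style return values
+        local parts = { }
+        for key, entry in pairs(value) do
+            parts[#parts + 1] = "/" .. key .. " " .. describe(entry)
+        end
+        return "<< " .. table.concat(parts, " ") .. " >>"
+    else
+        return kind .. ":" .. tostring(value)
+    end
+end
+
+local operatortable = {
+    Tj = function (scanner, info) print("Tj", describe(scanner:pop())) end,
+    TJ = function (scanner, info) print("TJ", describe(scanner:pop())) end,
+}
+\stoptyping
+
+Such a table can be passed as the second argument of \type {pdfscanner.scan} in
+the same way as the table in the earlier example.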
+ +\blank + +There are a few more methods defined that you can ask \type {scanner}: + +\starttabulate[|lT|p|] +\NC pop \NC as explained above \NC \NR +\NC popNumber \NC return only the value of a \type {real} or \type {integer} \NC \NR +\NC popName \NC return only the value of a \type {name} \NC \NR +\NC popString \NC return only the value of a \type {string} \NC \NR +\NC popArray \NC return only the value of a \type {array} \NC \NR +\NC popDict \NC return only the value of a \type {dict} \NC \NR +\NC popBool \NC return only the value of a \type {boolean} \NC \NR +\NC done \NC abort further processing of this \type {scan()} call \NC \NR +\stoptabulate + +The \type {popXXX} methods are convenience functions, and come in handy when you know the +type of the operands beforehand (which you usually do, in PDF). For example, the +\type {Do} function could have used \type {local name = scanner:popName()} +instead, because the single operand to the \type {Do} operator is always a PDF +name object. + +The \type {done} function allows you to abort processing of a stream once you +have learned everything you want to learn. This comes in handy while parsing +\type {/ToUnicode}, because there usually is trailing garbage that you are not +interested in. Without \type {done}, processing only ends at the end of the +stream, possibly wasting CPU cycles. + +\section{The \type {status} library} + +This contains a number of run|-|time configuration items that you may find useful +in message reporting, as well as an iterator function that gets all of the names +and values as a table. + +\startfunctioncall +<table> info = status.list() +\stopfunctioncall + +The keys in the table are the known items, the value is the current value. Almost +all of the values in \type {status} are fetched through a metatable at run|-|time +whenever they are accessed, so you cannot use \type {pairs} on \type {status}, +but you {\it can\/} use \type {pairs} on \type {info}, of course. 
If you do not +need the full list, you can also ask for a single item by using its name as an +index into \type {status}. + +The current list is: + +\starttabulate[|lT|p|] +\NC \ssbf key \NC \bf explanation \NC \NR +\NC pdf_gone \NC written \PDF\ bytes \NC \NR +\NC pdf_ptr \NC not yet written \PDF\ bytes \NC \NR +\NC dvi_gone \NC written \DVI\ bytes \NC \NR +\NC dvi_ptr \NC not yet written \DVI\ bytes \NC \NR +\NC total_pages \NC number of written pages \NC \NR +\NC output_file_name \NC name of the \PDF\ or \DVI\ file \NC \NR +\NC log_name \NC name of the log file \NC \NR +\NC banner \NC terminal display banner \NC \NR +\NC var_used \NC variable (one|-|word) memory in use \NC \NR +\NC dyn_used \NC token (multi|-|word) memory in use \NC \NR +\NC str_ptr \NC number of strings \NC \NR +\NC init_str_ptr \NC number of \INITEX\ strings \NC \NR +\NC max_strings \NC maximum allowed strings \NC \NR +\NC pool_ptr \NC string pool index \NC \NR +\NC init_pool_ptr \NC \INITEX\ string pool index \NC \NR +\NC pool_size \NC current size allocated for string characters \NC \NR +\NC node_mem_usage \NC a string giving insight into currently used nodes \NC \NR +\NC var_mem_max \NC number of allocated words for nodes \NC \NR +\NC fix_mem_max \NC number of allocated words for tokens \NC \NR +\NC fix_mem_end \NC maximum number of used tokens \NC \NR +\NC cs_count \NC number of control sequences \NC \NR +\NC hash_size \NC size of hash \NC \NR +\NC hash_extra \NC extra allowed hash \NC \NR +\NC font_ptr \NC number of active fonts \NC \NR +\NC max_in_stack \NC max used input stack entries \NC \NR +\NC max_nest_stack \NC max used nesting stack entries \NC \NR +\NC max_param_stack \NC max used parameter stack entries \NC \NR +\NC max_buf_stack \NC max used buffer position \NC \NR +\NC max_save_stack \NC max used save stack entries \NC \NR +\NC stack_size \NC input stack size \NC \NR +\NC nest_size \NC nesting stack size \NC \NR +\NC param_size \NC parameter stack size \NC \NR +\NC buf_size 
\NC current allocated size of the line buffer \NC \NR +\NC save_size \NC save stack size \NC \NR +\NC obj_ptr \NC max \PDF\ object pointer \NC \NR +\NC obj_tab_size \NC \PDF\ object table size \NC \NR +\NC pdf_os_cntr \NC max \PDF\ object stream pointer \NC \NR +\NC pdf_os_objidx \NC \PDF\ object stream index \NC \NR +\NC pdf_dest_names_ptr \NC max \PDF\ destination pointer \NC \NR +\NC dest_names_size \NC \PDF\ destination table size \NC \NR +\NC pdf_mem_ptr \NC max \PDF\ memory used \NC \NR +\NC pdf_mem_size \NC \PDF\ memory size \NC \NR +\NC largest_used_mark \NC max referenced marks class \NC \NR +\NC filename \NC name of the current input file \NC \NR +\NC inputid \NC numeric id of the current input \NC \NR +\NC linenumber \NC location in the current input file \NC \NR +\NC lasterrorstring \NC last error string\NC \NR +\NC luabytecodes \NC number of active \LUA\ bytecode registers \NC \NR +\NC luabytecode_bytes \NC number of bytes in \LUA\ bytecode registers \NC \NR +\NC luastate_bytes \NC number of bytes in use by \LUA\ interpreters \NC \NR +\NC output_active \NC \type {true} if the \type {\output} routine is active \NC \NR +\NC callbacks \NC total number of executed callbacks so far \NC \NR +\NC indirect_callbacks \NC number of those that were themselves + a result of other callbacks (e.g. file readers) \NC \NR +\NC luatex_svn \NC the luatex repository id \NC \NR +\NC luatex_version \NC the luatex version number \NC \NR +\NC luatex_revision \NC the luatex revision string \NC \NR +\NC ini_version \NC \type {true} if this is an \INITEX\ run \NC \NR +\stoptabulate + +\section{The \type {tex} library} + +The \type {tex} table contains a large list of virtual internal \TEX\ +parameters that are partially writable. + +The designation \quote {virtual} means that these items are not properly defined +in \LUA, but are only front\-ends that are handled by a metatable that operates +on the actual \TEX\ values. 
As a result, most of the \LUA\ table operators (like +\type {pairs} and \type {#}) do not work on such items. + +At the moment, it is possible to access almost every parameter that has these +characteristics: + +\startitemize[packed] +\item You can use it after \type {\the}. +\item It is a single token. +\item Some special others, see the list below. +\stopitemize + +This excludes parameters that need extra arguments, like \type {\the\scriptfont}. + +The subset comprising simple integer and dimension registers is +writable as well as readable (stuff like \type {\tracingcommands} and +\type {\parindent}). + +\subsection{Internal parameter values} + +For all the parameters in this section, it is possible to access them directly +using their names as index in the \type {tex} table, or by using one of the +functions \type {tex.get()} and \type {tex.set()}. + +The exact parameters and return values differ depending on the actual parameter, +and so does whether \type {tex.set} has any effect. For the parameters that {\it +can\/} be set, it is possible to use \type {global} as the first argument to +\type {tex.set}; this makes the assignment global instead of local. + +\startfunctioncall +tex.set (<string> n, ...) +tex.set ('global', <string> n, ...) +... = tex.get (<string> n) +\stopfunctioncall + +\subsubsection{Integer parameters} + +The integer parameters accept and return \LUA\ numbers.
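+
+The two access styles can be compared in a short sketch. It uses \type
+{tex.tolerance}, one of the read|-|write integer parameters; the value \type
+{500} is arbitrary.
+
+\starttyping
+-- direct access through the virtual tex table
+tex.tolerance = 500
+local t = tex.tolerance
+
+-- the same parameter through the function interface
+tex.set("tolerance", 500)           -- local assignment
+tex.set("global", "tolerance", 500) -- global assignment
+local u = tex.get("tolerance")
+\stoptyping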
+ +Read|-|write: + +\starttwocolumns +\starttyping +tex.adjdemerits +tex.binoppenalty +tex.brokenpenalty +tex.catcodetable +tex.clubpenalty +tex.day +tex.defaulthyphenchar +tex.defaultskewchar +tex.delimiterfactor +tex.displaywidowpenalty +tex.doublehyphendemerits +tex.endlinechar +tex.errorcontextlines +tex.escapechar +tex.exhyphenpenalty +tex.fam +tex.finalhyphendemerits +tex.floatingpenalty +tex.globaldefs +tex.hangafter +tex.hbadness +tex.holdinginserts +tex.hyphenpenalty +tex.interlinepenalty +tex.language +tex.lastlinefit +tex.lefthyphenmin +tex.linepenalty +tex.localbrokenpenalty +tex.localinterlinepenalty +tex.looseness +tex.mag +tex.maxdeadcycles +tex.month +tex.newlinechar +tex.outputpenalty +tex.pausing +tex.pdfadjustspacing +tex.pdfcompresslevel +tex.pdfdecimaldigits +tex.pdfgamma +tex.pdfgentounicode +tex.pdfimageapplygamma +tex.pdfimagegamma +tex.pdfimagehicolor +tex.pdfimageresolution +tex.pdfinclusionerrorlevel +tex.pdfminorversion +tex.pdfobjcompresslevel +tex.pdfoutput +tex.pdfpagebox +tex.pdfpkresolution +tex.pdfprotrudechars +tex.pdftracingfonts +tex.pdfuniqueresname +tex.postdisplaypenalty +tex.predisplaydirection +tex.predisplaypenalty +tex.pretolerance +tex.relpenalty +tex.righthyphenmin +tex.savinghyphcodes +tex.savingvdiscards +tex.showboxbreadth +tex.showboxdepth +tex.time +tex.tolerance +tex.tracingassigns +tex.tracingcommands +tex.tracinggroups +tex.tracingifs +tex.tracinglostchars +tex.tracingmacros +tex.tracingnesting +tex.tracingonline +tex.tracingoutput +tex.tracingpages +tex.tracingparagraphs +tex.tracingrestores +tex.tracingscantokens +tex.tracingstats +tex.uchyph +tex.vbadness +tex.widowpenalty +tex.year +\stoptyping +\stoptwocolumns + +Read|-|only: + +\startthreecolumns +\starttyping +tex.deadcycles +tex.insertpenalties +tex.parshape +tex.prevgraf +tex.spacefactor +\stoptyping +\stopthreecolumns + +\subsubsection{Dimension parameters} + +The dimension parameters accept \LUA\ numbers (signifying scaled points) or +strings (with 
included dimension). The result is always a number in scaled +points. + +Read|-|write: + +\startthreecolumns +\starttyping +tex.boxmaxdepth +tex.delimitershortfall +tex.displayindent +tex.displaywidth +tex.emergencystretch +tex.hangindent +tex.hfuzz +tex.hoffset +tex.hsize +tex.lineskiplimit +tex.mathsurround +tex.maxdepth +tex.nulldelimiterspace +tex.overfullrule +tex.pagebottomoffset +tex.pageheight +tex.pageleftoffset +tex.pagerightoffset +tex.pagetopoffset +tex.pagewidth +tex.parindent +tex.pdfdestmargin +tex.pdfhorigin +tex.pdflinkmargin +tex.pdfpageheight +tex.pdfpagewidth +tex.pdfpxdimen +tex.pdfthreadmargin +tex.pdfvorigin +tex.predisplaysize +tex.scriptspace +tex.splitmaxdepth +tex.vfuzz +tex.voffset +tex.vsize +\stoptyping +\stopthreecolumns + +Read|-|only: + +\startthreecolumns +\starttyping +tex.pagedepth +tex.pagefilllstretch +tex.pagefillstretch +tex.pagefilstretch +tex.pagegoal +tex.pageshrink +tex.pagestretch +tex.pagetotal +tex.prevdepth +\stoptyping +\stopthreecolumns + +\subsubsection{Direction parameters} + +The direction parameters are read|-|only and return a \LUA\ string. + +\startthreecolumns +\starttyping +tex.bodydir +tex.mathdir +tex.pagedir +tex.pardir +tex.textdir +\stoptyping +\stopthreecolumns + +\subsubsection{Glue parameters} + +The glue parameters accept and return a userdata object that represents a \type +{glue_spec} node. + +\startthreecolumns +\starttyping +tex.abovedisplayshortskip +tex.abovedisplayskip +tex.baselineskip +tex.belowdisplayshortskip +tex.belowdisplayskip +tex.leftskip +tex.lineskip +tex.parfillskip +tex.parskip +tex.rightskip +tex.spaceskip +tex.splittopskip +tex.tabskip +tex.topskip +tex.xspaceskip +\stoptyping +\stopthreecolumns + +\subsubsection{Muglue parameters} + +All muglue parameters are to be used read|-|only and return a \LUA\ string. 
+ +\startthreecolumns +\starttyping +tex.medmuskip +tex.thickmuskip +tex.thinmuskip +\stoptyping +\stopthreecolumns + +\subsubsection{Tokenlist parameters} + +The tokenlist parameters accept and return \LUA\ strings. \LUA\ strings are +converted to and from token lists using \type {\the} \type {\toks} style expansion: +all category codes are either space (10) or other (12). It follows that assigning +to some of these, like \quote {tex.output}, is actually useless, but it feels bad +to make exceptions in view of a coming extension that will accept full|-|blown +token strings. + +\startthreecolumns +\starttyping +tex.errhelp +tex.everycr +tex.everydisplay +tex.everyeof +tex.everyhbox +tex.everyjob +tex.everymath +tex.everypar +tex.everyvbox +tex.output +tex.pdfpageattr +tex.pdfpageresources +tex.pdfpagesattr +tex.pdfpkmode +\stoptyping +\stopthreecolumns + +\subsection{Convert commands} + +All \quote {convert} commands are read|-|only and return a \LUA\ string. The +supported commands at this moment are: + +\starttwocolumns +\starttyping +tex.eTeXVersion +tex.eTeXrevision +tex.formatname +tex.jobname +tex.luatexbanner +tex.luatexrevision +tex.pdfnormaldeviate +tex.fontname(number) +tex.pdffontname(number) +tex.pdffontobjnum(number) +tex.pdffontsize(number) +tex.uniformdeviate(number) +tex.number(number) +tex.romannumeral(number) +tex.pdfpageref(number) +tex.pdfxformname(number) +tex.fontidentifier(number) +\stoptyping +\stoptwocolumns + +If you are wondering why this list looks haphazard; these are all the cases of +the \quote {convert} internal command that do not require an argument, as well as +the ones that require only a simple numeric value. + +The special (lua-only) case of \type {tex.fontidentifier} returns the \type +{csname} string that matches a font id number (if there is one). + +if these are really needed in a macro package. + +\subsection{Last item commands} + +All \quote {last item} commands are read|-|only and return a number. 
+ +The supported commands at this moment are: + +\startthreecolumns +\starttyping +tex.lastpenalty +tex.lastkern +tex.lastskip +tex.lastnodetype +tex.inputlineno +tex.pdflastobj +tex.pdflastxform +tex.pdflastximage +tex.pdflastximagepages +tex.pdflastannot +tex.pdflastxpos +tex.pdflastypos +tex.pdfrandomseed +tex.pdflastlink +tex.luatexversion +tex.eTeXminorversion +tex.eTeXversion +tex.currentgrouplevel +tex.currentgrouptype +tex.currentiflevel +tex.currentiftype +tex.currentifbranch +tex.pdflastximagecolordepth +\stoptyping +\stopthreecolumns + +\subsection{Attribute, count, dimension, skip and token registers} + +\TEX's attributes (\type {\attribute}), counters (\type {\count}), dimensions (\type +{\dimen}), skips (\type {\skip}) and token (\type {\toks}) registers can be accessed +and written to using two times five virtual sub|-|tables of the \type {tex} +table: + +\startthreecolumns +\starttyping +tex.attribute +tex.count +tex.dimen +tex.skip +tex.toks +\stoptyping +\stopthreecolumns + +It is possible to use the names of relevant \type {\attributedef}, \type {\countdef}, +\type {\dimendef}, \type {\skipdef}, or \type {\toksdef} control sequences as indices +to these tables: + +\starttyping +tex.count.scratchcounter = 0 +enormous = tex.dimen['maxdimen'] +\stoptyping + +In this case, \LUATEX\ looks up the value for you on the fly. You have to use a +valid \type {\countdef} (or \type {\attributedef}, or \type {\dimendef}, or \type +{\skipdef}, or \type {\toksdef}), anything else will generate an error (the intent +is to eventually also allow \type {<chardef tokens>} and even macros that expand +into a number). + +The attribute and count registers accept and return \LUA\ numbers. + +The dimension registers accept \LUA\ numbers (in scaled points) or strings (with +an included absolute dimension; \type {em} and \type {ex} and \type {px} are +forbidden). The result is always a number in scaled points. + +The token registers accept and return \LUA\ strings. 
\LUA\ strings are converted +to and from token lists using \type {\the} \type {\toks} style expansion: all +category codes are either space (10) or other (12). + +The skip registers accept and return \type {glue_spec} userdata node objects (see +the description of the node interface elsewhere in this manual). + +As an alternative to array addressing, there are also accessor functions defined +for all cases, for example, here is the set of possibilities for \type {\skip} +registers: + +\startfunctioncall +tex.setskip (<number> n, <node> s) +tex.setskip (<string> s, <node> s) +tex.setskip ('global',<number> n, <node> s) +tex.setskip ('global',<string> s, <node> s) +<node> s = tex.getskip (<number> n) +<node> s = tex.getskip (<string> s) +\stopfunctioncall + +In the function-based interface, it is possible to define values globally by +using the string \type {global} as the first function argument. + +\subsection{Character code registers} + +\TEX's character code tables (\type {\lccode}, \type {\uccode}, \type {\sfcode}, \type +{\catcode}, \type {\mathcode}, \type {\delcode}) can be accessed and written to using +six virtual subtables of the \type {tex} table + +\startthreecolumns +\starttyping +tex.lccode +tex.uccode +tex.sfcode +tex.catcode +tex.mathcode +tex.delcode +\stoptyping +\stopthreecolumns + +The function call interfaces are roughly as above, but there are a few twists. 
+\type {sfcode}s are the simple ones: + +\startfunctioncall +tex.setsfcode (<number> n, <number> s) +tex.setsfcode ('global', <number> n, <number> s) +<number> s = tex.getsfcode (<number> n) +\stopfunctioncall + +The function call interface for \type {lccode} and \type {uccode} additionally +allows you to set the associated sibling at the same time: + +\startfunctioncall +tex.setlccode (['global'], <number> n, <number> lc) +tex.setlccode (['global'], <number> n, <number> lc, <number> uc) +<number> lc = tex.getlccode (<number> n) +tex.setuccode (['global'], <number> n, <number> uc) +tex.setuccode (['global'], <number> n, <number> uc, <number> lc) +<number> uc = tex.getuccode (<number> n) +\stopfunctioncall + +The function call interface for \type {catcode} also allows you to specify a +category table to use on assignment or on query (default in both cases is the +current one): + +\startfunctioncall +tex.setcatcode (['global'], <number> n, <number> c) +tex.setcatcode (['global'], <number> cattable, <number> n, <number> c) +<number> lc = tex.getcatcode (<number> n) +<number> lc = tex.getcatcode (<number> cattable, <number> n) +\stopfunctioncall + +The interfaces for \type {delcode} and \type {mathcode} use small array tables to +set and retrieve values: + +\startfunctioncall +tex.setmathcode (['global'], <number> n, <table> mval ) +<table> mval = tex.getmathcode (<number> n) +tex.setdelcode (['global'], <number> n, <table> dval ) +<table> dval = tex.getdelcode (<number> n) +\stopfunctioncall + +Where the table for \type {mathcode} is an array of 3 numbers, like this: + +\starttyping +{<number> mathclass, <number> family, <number> character} +\stoptyping + +And the table for \type {delcode} is an array with 4 numbers, like this: + +\starttyping +{<number> small_fam, <number> small_char, <number> large_fam, <number> large_char} +\stoptyping + +Normally, the third and fourth values in a delimiter code assignment will be zero +according to \type {\Udelcode} usage, but the 
returned table can have values there +(if the delimiter code was set using \type {\delcode}, for example). Unset \type +{delcode}s can be recognized because \type {dval[1]} is $-1$. + +\subsection{Box registers} + +It is possible to set and query actual boxes, using the node interface as defined +in the \type {node} library: + +\starttyping +tex.box +\stoptyping + +for array access, or + +\starttyping +tex.setbox(<number> n, <node> s) +tex.setbox(<string> cs, <node> s) +tex.setbox('global', <number> n, <node> s) +tex.setbox('global', <string> cs, <node> s) +<node> n = tex.getbox(<number> n) +<node> n = tex.getbox(<string> cs) +\stoptyping + +for function|-|based access. In the function-based interface, it is possible to +define values globally by using the string \type {global} as the first function +argument. + +Be warned that an assignment like + +\starttyping +tex.box[0] = tex.box[2] +\stoptyping + +does not copy the node list, it just duplicates a node pointer. If \type {\box2} +is cleared by \TEX\ commands later on, the contents of \type {\box0} become +invalid as well. To prevent this from happening, always use \type +{node.copy_list()} unless you are assigning to a temporary variable: + +\starttyping +tex.box[0] = node.copy_list(tex.box[2]) +\stoptyping + +\subsection{Math parameters} + +It is possible to set and query the internal math parameters using: + +\startfunctioncall +tex.setmath(<string> n, <string> t, <number> n) +tex.setmath('global', <string> n, <string> t, <number> n) +<number> n = tex.getmath(<string> n, <string> t) +\stopfunctioncall + +As before, an optional first parameter \type {global} indicates a global +assignment. + +The first string is the parameter name minus the leading \quote {Umath}, and the +second string is the style name minus the trailing \quote {style}. 
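+
+As a minimal sketch of the calls above (the parameter \type {quad}, the style
+\type {display} and the value of 10pt are only illustrative choices), a math
+parameter could be set and read back like this:
+
+\starttyping
+-- set the quad parameter for display style to 10pt, in scaled points
+tex.setmath("quad", "display", tex.sp("10pt"))
+-- read the value back
+local quad = tex.getmath("quad", "display")
+-- the same assignment, but made globally
+tex.setmath("global", "quad", "display", tex.sp("10pt"))
+\stoptyping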
+ +Just to be complete, the values for the math parameter name are: + +\starttyping +quad axis operatorsize +overbarkern overbarrule overbarvgap +underbarkern underbarrule underbarvgap +radicalkern radicalrule radicalvgap +radicaldegreebefore radicaldegreeafter radicaldegreeraise +stackvgap stacknumup stackdenomdown +fractionrule fractionnumvgap fractionnumup +fractiondenomvgap fractiondenomdown fractiondelsize +limitabovevgap limitabovebgap limitabovekern +limitbelowvgap limitbelowbgap limitbelowkern +underdelimitervgap underdelimiterbgap +overdelimitervgap overdelimiterbgap +subshiftdrop supshiftdrop subshiftdown +subsupshiftdown subtopmax supshiftup +supbottommin supsubbottommax subsupvgap +spaceafterscript connectoroverlapmin +ordordspacing ordopspacing ordbinspacing ordrelspacing +ordopenspacing ordclosespacing ordpunctspacing ordinnerspacing +opordspacing opopspacing opbinspacing oprelspacing +opopenspacing opclosespacing oppunctspacing opinnerspacing +binordspacing binopspacing binbinspacing binrelspacing +binopenspacing binclosespacing binpunctspacing bininnerspacing +relordspacing relopspacing relbinspacing relrelspacing +relopenspacing relclosespacing relpunctspacing relinnerspacing +openordspacing openopspacing openbinspacing openrelspacing +openopenspacing openclosespacing openpunctspacing openinnerspacing +closeordspacing closeopspacing closebinspacing closerelspacing +closeopenspacing closeclosespacing closepunctspacing closeinnerspacing +punctordspacing punctopspacing punctbinspacing punctrelspacing +punctopenspacing punctclosespacing punctpunctspacing punctinnerspacing +innerordspacing inneropspacing innerbinspacing innerrelspacing +inneropenspacing innerclosespacing innerpunctspacing innerinnerspacing +\stoptyping + +The values for the style parameter name are: + +\starttyping +display crampeddisplay +text crampedtext +script crampedscript +scriptscript crampedscriptscript +\stoptyping + +\subsection{Special list heads} + +The virtual table \type 
{tex.lists} contains the set of internal registers that +keep track of building page lists. + +\starttabulate[|lT|p|] +\NC \bf field \NC \bf description \NC \NR +\NC page_ins_head \NC circular list of pending insertions \NC \NR +\NC contrib_head \NC the recent contributions \NC \NR +\NC page_head \NC the current page content \NC \NR +%NC temp_head \NC \NC \NR +\NC hold_head \NC used for held-over items for next page \NC \NR +\NC adjust_head \NC head of the current \type {\vadjust} list \NC \NR +\NC pre_adjust_head \NC head of the current \type {\vadjust pre} list \NC \NR +%NC align_head \NC \NC \NR +\stoptabulate + +\subsection{Semantic nest levels} + +The virtual table \type {tex.nest} contains the currently active +semantic nesting state. It has two main parts: a zero-based array of userdata for +the semantic nest itself, and the numerical value \type {tex.nest.ptr}, which +gives the highest available index. Neither the array items in \type {tex.nest[]} +nor \type {tex.nest.ptr} can be assigned to (as this would confuse the +typesetting engine beyond repair), but you can assign to the individual values +inside the array items, e.g.\ \type {tex.nest[tex.nest.ptr].prevdepth}. + +\type {tex.nest[tex.nest.ptr]} is the current nest state, \type {tex.nest[0]} the +outermost (main vertical list) level. + +The known fields are: + +\starttabulate[|lT|l|l|p|] +\NC \ssbf key \NC \bf type \NC \bf modes \NC \bf explanation \NC \NR +\NC mode \NC number \NC all \NC The current mode. This is a number representing the + main mode at this level:\crlf + \type {0} == no mode (this happens during \type {\write})\crlf + \type {1} == vertical,\crlf + \type {127} = horizontal,\crlf + \type {253} = display math.\crlf + \type {-1} == internal vertical,\crlf + \type {-127} = restricted horizontal,\crlf + \type {-253} = inline math. 
\NC \NR +\NC modeline \NC number \NC all \NC source input line where this mode was entered in, + negative inside the output routine \NC \NR +\NC head \NC node \NC all \NC the head of the current list \NC \NR +\NC tail \NC node \NC all \NC the tail of the current list \NC \NR +\NC prevgraf \NC number \NC vmode \NC number of lines in the previous paragraph \NC \NR +\NC prevdepth \NC number \NC vmode \NC depth of the previous paragraph (equal to \type {\pdfignoreddimen} + when it is to be ignored) \NC \NR +\NC spacefactor \NC number \NC hmode \NC the current space factor \NC \NR +\NC dirs \NC node \NC hmode \NC used for temporary storage by the line break algorithm\NC \NR +\NC noad \NC node \NC mmode \NC used for temporary storage of a pending fraction numerator, + for \type {\over} etc. \NC \NR +\NC delimptr \NC node \NC mmode \NC used for temporary storage of the previous math delimiter, + for \type {\middle} \NC \NR +\NC mathdir \NC boolean \NC mmode \NC true when during math processing the \type {\mathdir} is not + the same as the surrounding \type {\textdir} \NC \NR +\NC mathstyle \NC number \NC mmode \NC the current \type {\mathstyle} \NC \NR +\stoptabulate + +\subsection[sec:luaprint]{Print functions} + +The \type {tex} table also contains the three print functions that are the +major interface from \LUA\ scripting to \TEX. + +The arguments to these three functions are all stored in an in|-|memory virtual +file that is fed to the \TEX\ scanner as the result of the expansion of +\type {\directlua}. + +The total amount of returnable text from a \type {\directlua} command is only +limited by available system \RAM. However, each separate printed string has to +fit completely in \TEX's input buffer. + +The result of using these functions from inside callbacks is undefined +at the moment. + +\subsubsection{\type {tex.print}} + +\startfunctioncall +tex.print(<string> s, ...) +tex.print(<number> n, <string> s, ...) 
+tex.print(<table> t) +tex.print(<number> n, <table> t) +\stopfunctioncall + +Each string argument is treated by \TEX\ as a separate input line. If there is a +table argument instead of a list of strings, this has to be a consecutive array +of strings to print (the first non-string value will stop the printing process). + +The optional parameter can be used to print the strings using the catcode regime +defined by \type {\catcodetable}~\type {n}. If \type {n} is $-1$, the currently +active catcode regime is used. If \type {n} is $-2$, the resulting catcodes are +the result of \type {\the} \type {\toks}: all category codes are 12 (other) except for +the space character, that has category code 10 (space). Otherwise, if \type {n} +is not a valid catcode table, then it is ignored, and the currently active +catcode regime is used instead. + +The very last string of the very last \type {tex.print()} command in a \type +{\directlua} will not have the \type {\endlinechar} appended, all others do. + +\subsubsection{\type {tex.sprint}} + +\startfunctioncall +tex.sprint(<string> s, ...) +tex.sprint(<number> n, <string> s, ...) +tex.sprint(<table> t) +tex.sprint(<number> n, <table> t) +\stopfunctioncall + +Each string argument is treated by \TEX\ as a special kind of input line that +makes it suitable for use as a partial line input mechanism: + +\startitemize[packed] +\startitem + \TEX\ does not switch to the \quote {new line} state, so that leading spaces + are not ignored. +\stopitem +\startitem + No \type {\endlinechar} is inserted. +\stopitem +\startitem + Trailing spaces are not removed. + + Note that this does not prevent \TEX\ itself from eating spaces as result of + interpreting the line. For example, in + +\starttyping +before\directlua{tex.sprint("\\relax")tex.sprint(" inbetween")}after +\stoptyping + the space before \type {inbetween} will be gobbled as a result of the \quote + {normal} scanning of \type {\relax}. 
+\stopitem +\stopitemize + +If there is a table argument instead of a list of strings, this has to +be a consecutive array of strings to print (the first non-string value +will stop the printing process). + +The optional argument sets the catcode regime, as with \type {tex.print()}. + +\subsubsection{\type {tex.tprint}} + +\startfunctioncall +tex.tprint({<number> n, <string> s, ...}, {...}) +\stopfunctioncall + +This function is basically a shortcut for repeated calls to \type +{tex.sprint(<number> n, <string> s, ...)}, once for each of the supplied argument +tables. + +\subsubsection{\type {tex.write}} + +\startfunctioncall +tex.write(<string> s, ...) +tex.write(<table> t) +\stopfunctioncall + +Each string argument is treated by \TEX\ as a special kind of input line that +makes it suitable for use as a quick way to dump information: + +\startitemize +\item All catcodes on that line are either \quote{space} (for '~') or + \quote{character} (for all others). +\item There is no \type {\endlinechar} appended. +\stopitemize + +If there is a table argument instead of a list of strings, this has to be a +consecutive array of strings to print (the first non-string value will stop the +printing process). + +\subsection{Helper functions} + +\subsubsection{\type {tex.round}} + +\startfunctioncall +<number> n = tex.round(<number> o) +\stopfunctioncall + +Rounds \LUA\ number \type {o}, and returns a number that is in the range of a +valid \TEX\ register value. If the number starts out of range, it generates a +\quote {number too big} error as well. + +\subsubsection{\type {tex.scale}} + +\startfunctioncall +<number> n = tex.scale(<number> o, <number> delta) +<table> n = tex.scale(<table> o, <number> delta) +\stopfunctioncall + +Multiplies the \LUA\ numbers \type {o} and \type {delta}, and returns a rounded +number that is in the range of a valid \TEX\ register value. In the table +version, it creates a copy of the table with all numeric top||level values scaled +in that manner. 
If the multiplied number(s) are out of range, it generates +\quote{number too big} error(s) as well. + +Note: the precision of the output of this function will depend on your computer's +architecture and operating system, so use with care! An interface to \LUATEX's +internal, 100\% portable scale function will be added at a later date. + +\subsubsection{\type {tex.sp}} + +\startfunctioncall +<number> n = tex.sp(<number> o) +<number> n = tex.sp(<string> s) +\stopfunctioncall + +Converts the number \type {o} or a string \type {s} that represents an explicit +dimension into an integer number of scaled points. + +For parsing the string, the same scanning and conversion rules are used that +\LUATEX\ would use if it was scanning a dimension specifier in its \TEX|-|like +input language (this includes generating errors for bad values), except for the +following: + +\startitemize[n] +\startitem + only explicit values are allowed, control sequences are not handled +\stopitem +\startitem + infinite dimension units (\type {fil...}) are forbidden +\stopitem +\startitem + \type {mu} units do not generate an error (but may not be useful either) +\stopitem +\stopitemize + +\subsubsection{\type {tex.definefont}} + +\startfunctioncall +tex.definefont(<string> csname, <number> fontid) +tex.definefont(<boolean> global, <string> csname, <number> fontid) +\stopfunctioncall + +Associates \type {csname} with the internal font number \type {fontid}. The +definition is global if (and only if) \type {global} is specified and true (the +setting of \type {globaldefs} is not taken into account). + +\subsubsection{\type {tex.error}} + +\startfunctioncall +tex.error(<string> s) +tex.error(<string> s, <table> help) +\stopfunctioncall + +This creates an error somewhat like the combination of \type {\errhelp} and \type +{\errmessage} would. During this error, deletions are disabled. + +The array part of the \type {help} table has to contain strings, one for each +line of error help. 
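+
+A small sketch of such a call (the message and the help lines are made up for
+illustration only):
+
+\starttyping
+tex.error("Missing configuration value", {
+    "The expected setting was not found.",
+    "Press return to continue with the default.",
+})
+\stoptyping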
+ +\subsubsection{\type {tex.hashtokens}} + +\startfunctioncall +for i,v in pairs (tex.hashtokens()) do ... end +\stopfunctioncall + +Returns a name and token table pair (see~\in {section} [luatokens] about token +tables) iterator for every non-zero entry in the hash table. This can be useful +for debugging, but note that this also reports control sequences that may be +unreachable at this moment due to local redefinitions: it is strictly a dump of +the hash table. + +\subsection[luaprimitives]{Functions for dealing with primitives } + +\subsubsection{\type {tex.enableprimitives}} + +\startfunctioncall +tex.enableprimitives(<string> prefix, <table> primitive names) +\stopfunctioncall + +This function accepts a prefix string and an array of primitive names. + +For each combination of \quote {prefix} and \quote {name}, the \type +{tex.enableprimitives} first verifies that \quote {name} is an actual primitive +(it must be returned by one of the \type {tex.extraprimitives()} calls explained +below, or part of \TEX82, or \type {\directlua}). If it is not, \type +{tex.enableprimitives} does nothing and skips to the next pair. + +But if it is, then it will construct a csname variable by concatenating the +\quote {prefix} and \quote {name}, unless the \quote {prefix} is already the +actual prefix of \quote {name}. In the latter case, it will discard the \quote +{prefix}, and just use \quote {name}. + +Then it will check for the existence of the constructed csname. If the csname is +currently undefined (note: that is not the same as \type {\relax}), it will +globally define the csname to have the meaning: run code belonging to the +primitive \quote {name}. If for some reason the csname is already defined, it +does nothing and tries the next pair. 
+ +An example: + +\starttyping + tex.enableprimitives('LuaTeX', {'formatname'}) +\stoptyping + +will define \type {\LuaTeXformatname} with the same intrinsic meaning as the +documented primitive \type {\formatname}, provided that the control sequence \type +{\LuaTeXformatname} is currently undefined. + +Second example: + +\starttyping + tex.enableprimitives('Omega',tex.extraprimitives ('omega')) +\stoptyping + +will define a whole series of csnames like \type {\Omegatextdir}, \type +{\Omegapardir}, etc., but it will stick with \type {\OmegaVersion} instead of +creating the doubly-prefixed \type {\OmegaOmegaVersion}. + +When \LUATEX\ is run with \type {--ini} only the \TEX82 primitives and \type +{\directlua} are available, so no extra primitives {\bf at all}. + +If you want to have all the new functionality available using their default +names, as it is now, you will have to add + +\starttyping + \ifx\directlua\undefined \else + \directlua {tex.enableprimitives('',tex.extraprimitives ())} + \fi +\stoptyping + +near the beginning of your format generation file. Or you can choose different +prefixes for different subsets, as you see fit. + +Calling some form of \type {tex.enableprimitives()} is highly important though, +because if you do not, you will end up with a \TEX82-lookalike that can run \LUA\ +code but not do much else. The defined csnames are (of course) saved in the +format and will be available at runtime. + +\subsubsection{\type {tex.extraprimitives}} + +\startfunctioncall +<table> t = tex.extraprimitives(<string> s, ...) +\stopfunctioncall + +This function returns a list of the primitives that originate from the engine(s) +given by the requested string value(s). 
The possible values and their (current) +return values are: + +\startluacode +function document.showprimitives(tag) + for k, v in table.sortedpairs(tex.extraprimitives(tag)) do + if v == ' ' then + v = '\\normalcontrolspace' + end + context.type(v) + context.space() + end +end +\stopluacode + +\starttabulate[|l|pl|] +\NC \bf name\NC \bf values \NC \NR +\NC tex \NC \ctxlua{document.showprimitives('tex') } \NC \NR +\NC core \NC \ctxlua{document.showprimitives('core') } \NC \NR +\NC etex \NC \ctxlua{document.showprimitives('etex') } \NC \NR +\NC pdftex \NC \ctxlua{document.showprimitives('pdftex') } \NC \NR +\NC luatex \NC \ctxlua{document.showprimitives('luatex') } \NC \NR +\NC umath \NC \ctxlua{document.showprimitives('umath') } \NC \NR +\stoptabulate + +Note that \type {'luatex'} does not contain \type {directlua}, as that +is considered to be a core primitive, along with all the \TEX82 primitives, so it +is part of the list that is returned from \type {'core'}. + +\type {'umath'} is a subset of \type {'luatex'} that covers the Unicode math +primitives as it might be desired to handle the prefixing of that subset +differently. + +Running \type {tex.extraprimitives()} will give you the complete list of +primitives that are not yet available after \type {-ini} startup. It is exactly equivalent to \type +{tex.extraprimitives('etex', 'pdftex', 'luatex')}. + +\subsubsection{\type {tex.primitives}} + +\startfunctioncall +<table> t = tex.primitives() +\stopfunctioncall + +This function returns a hash table listing all primitives that \LUATEX\ knows +about. The keys in the hash are primitive names, the values are tables +representing tokens (see~\in{section}[luatokens]). The third value is always +zero. + +\subsection{Core functionality interfaces} + +\subsubsection{\type {tex.badness}} + +\startfunctioncall +<number> b = tex.badness(<number> t, <number> s) +\stopfunctioncall + +This helper function is useful during linebreak calculations. 
\type {t} and \type +{s} are scaled values; the function returns the badness for when total \type {t} +is supposed to be made from amounts that sum to \type {s}. The returned number is +a reasonable approximation of $100(t/s)^3$; + +\subsubsection{\type {tex.linebreak}} + +\startfunctioncall +local <node> nodelist, <table> info = + tex.linebreak(<node> listhead, <table> parameters) +\stopfunctioncall + +The understood parameters are as follows: + +\starttabulate[|l|l|p|] +\NC \bf name \NC \bf type \NC \bf description \NC \NR +\NC pardir \NC string \NC \NC \NR +\NC pretolerance \NC number \NC \NC \NR +\NC tracingparagraphs \NC number \NC \NC \NR +\NC tolerance \NC number \NC \NC \NR +\NC looseness \NC number \NC \NC \NR +\NC hyphenpenalty \NC number \NC \NC \NR +\NC exhyphenpenalty \NC number \NC \NC \NR +\NC pdfadjustspacing \NC number \NC \NC \NR +\NC adjdemerits \NC number \NC \NC \NR +\NC pdfprotrudechars \NC number \NC \NC \NR +\NC linepenalty \NC number \NC \NC \NR +\NC lastlinefit \NC number \NC \NC \NR +\NC doublehyphendemerits \NC number \NC \NC \NR +\NC finalhyphendemerits \NC number \NC \NC \NR +\NC hangafter \NC number \NC \NC \NR +\NC interlinepenalty \NC number or table \NC if a table, then it is an array like \type {\interlinepenalties} \NC \NR +\NC clubpenalty \NC number or table \NC if a table, then it is an array like \type {\clubpenalties} \NC \NR +\NC widowpenalty \NC number or table \NC if a table, then it is an array like \type {\widowpenalties} \NC \NR +\NC brokenpenalty \NC number \NC \NC \NR +\NC emergencystretch \NC number \NC in scaled points \NC \NR +\NC hangindent \NC number \NC in scaled points \NC \NR +\NC hsize \NC number \NC in scaled points \NC \NR +\NC leftskip \NC glue_spec node \NC \NC \NR +\NC rightskip \NC glue_spec node \NC \NC \NR +\NC pdfignoreddimen \NC number \NC in scaled points \NC \NR +\NC parshape \NC table \NC \NC \NR +\stoptabulate + +Note that there is no interface for \type {\displaywidowpenalties}, you have to 
+pass the right choice for \type {widowpenalties} yourself. + +The meaning of the various keys should be fairly obvious from the table (the +names match the \TEX\ and \PDFTEX\ primitives) except for the last 5 entries. The +four \type {pdf...line...} keys are ignored if their value equals \type +{pdfignoreddimen}. + +It is your own job to make sure that \type {listhead} is a proper paragraph list: +this function does not add any nodes to it. To be exact, if you want to replace +the core line breaking, you may have to do the following (when you are not +actually working in the \type {pre_linebreak_filter} or \type {linebreak_filter} +callbacks, or when the original list starting at listhead was generated in +horizontal mode): + +\startitemize +\startitem + add an \quote {indent box} and perhaps a \type {local_par} node at the start + (only if you need them) +\stopitem +\startitem + replace any found final glue by an infinite penalty (or add such a penalty, + if the last node is not a glue) +\stopitem +\startitem + add a glue node for the \type {\parfillskip} after that penalty node +\stopitem +\startitem + make sure all the \type {prev} pointers are OK +\stopitem +\stopitemize + +The result is a node list, it still needs to be vpacked if you want to assign it +to a \type {\vbox}. + +The returned \type {info} table contains four values that are all numbers: + +\starttabulate[|l|p|] +\NC prevdepth \NC depth of the last line in the broken paragraph \NC \NR +\NC prevgraf \NC number of lines in the broken paragraph \NC \NR +\NC looseness \NC the actual looseness value in the broken paragraph \NC \NR +\NC demerits \NC the total demerits of the chosen solution \NC \NR +\stoptabulate + +Note there are a few things you cannot interface using this function: You cannot +influence font expansion other than via \type {pdfadjustspacing}, because the +settings for that take place elsewhere. The same is true for hbadness and hfuzz +etc. 
All these are in the \type {hpack()} routine, and that fetches its own +variables via globals. + +\subsubsection{\type {tex.shipout}} + +\startfunctioncall +tex.shipout(<number> n) +\stopfunctioncall + +Ships out box number \type {n} to the output file, and clears the box register. + +\section[texconfig]{The \type {texconfig} table} + +This is a table that is created empty. A startup \LUA\ script could +fill this table with a number of settings that are read out by +the executable after loading and executing the startup file. + +\starttabulate[|lT|l|l|p|] +\NC \ssbf key \NC \bf type \NC \bf default \NC \bf explanation \NC \NR +\NC kpse_init \NC boolean \NC true +\NC + \type {false} totally disables \KPATHSEA\ initialisation, and enables + interpretation of the following numeric key--value pairs. (only ever unset + this if you implement {\it all\/} file find callbacks!) +\NC \NR +\NC + shell_escape \NC string \NC \type {'f'} \NC + Use \type {'y'} or \type {'t'} or \type {'1'} to enable \type {\write18} + unconditionally, \type {'p'} to enable the commands that are listed in \type + {shell_escape_commands} +\NC \NR +\NC + shell_escape_commands \NC string \NC \NC Comma-separated list of command + names that may be executed by \type {\write18} even if \type {shell_escape} + is set to \type {'p'}. 
Do {\it not\/} use spaces around commas, separate any + required command arguments by using a space, and use the ASCII double quote + (\type {"}) for any needed argument or path quoting +\NC \NR + +\NC string_vacancies \NC number \NC 75000 \NC cf.\ web2c docs \NC \NR +\NC pool_free \NC number \NC 5000 \NC cf.\ web2c docs \NC \NR +\NC max_strings \NC number \NC 15000 \NC cf.\ web2c docs \NC \NR +\NC strings_free \NC number \NC 100 \NC cf.\ web2c docs \NC \NR +\NC nest_size \NC number \NC 50 \NC cf.\ web2c docs \NC \NR +\NC max_in_open \NC number \NC 15 \NC cf.\ web2c docs \NC \NR +\NC param_size \NC number \NC 60 \NC cf.\ web2c docs \NC \NR +\NC save_size \NC number \NC 4000 \NC cf.\ web2c docs \NC \NR +\NC stack_size \NC number \NC 300 \NC cf.\ web2c docs \NC \NR +\NC dvi_buf_size \NC number \NC 16384 \NC cf.\ web2c docs \NC \NR +\NC error_line \NC number \NC 79 \NC cf.\ web2c docs \NC \NR +\NC half_error_line \NC number \NC 50 \NC cf.\ web2c docs \NC \NR +\NC max_print_line \NC number \NC 79 \NC cf.\ web2c docs \NC \NR +\NC hash_extra \NC number \NC 0 \NC cf.\ web2c docs \NC \NR +\NC pk_dpi \NC number \NC 72 \NC cf.\ web2c docs \NC \NR +\NC trace_file_names \NC boolean \NC true +\NC + \type {false} disables \TEX's normal file open|-|close feedback (the + assumption is that callbacks will take care of that) +\NC \NR +\NC file_line_error \NC boolean \NC false +\NC + do \type {file:line} style error messages +\NC \NR +\NC halt_on_error \NC boolean \NC false +\NC + abort run on the first encountered error +\NC \NR +\NC formatname \NC string \NC +\NC + if no format name was given on the commandline, this key will be tested first + instead of simply quitting +\NC \NR +\NC jobname \NC string \NC +\NC + if no input file name was given on the commandline, this key will be tested + first instead of simply giving up +\NC \NR +\stoptabulate + +Note: the numeric values that match web2c parameters are only used if \type +{kpse_init} is explicitly set to \type {false}. 
In all other cases, the normal +values from \type {texmf.cnf} are used. + +\section{The \type {texio} library} + +This library takes care of the low|-|level I/O interface. + +\subsection{Printing functions} + +\subsubsection{\type {texio.write}} + +\startfunctioncall +texio.write(<string> target, <string> s, ...) +texio.write(<string> s, ...) +\stopfunctioncall + +Without the \type {target} argument, writes all given strings to the same +location(s) \TEX\ writes messages to at this moment. If \type {\batchmode} is in +effect, it writes only to the log, otherwise it writes to the log and the +terminal. The optional \type {target} can be one of three possibilities: \type +{term}, \type {log} or \type {term and log}. + +Note: If several strings are given, and if the first of these strings is or might +be one of the targets above, the \type {target} must be specified explicitly to +prevent \LUA\ from interpreting the first string as the target. + +\subsubsection{\type {texio.write_nl}} + +\startfunctioncall +texio.write_nl(<string> target, <string> s, ...) +texio.write_nl(<string> s, ...) +\stopfunctioncall + +This function behaves like \type {texio.write}, but makes sure that the given +strings will appear at the beginning of a new line. You can pass a single empty +string if you only want to move to the next line. + +\section[luatokens]{The \type {token} library} + +The \type {token} table contains interface functions to \TEX's handling of +tokens. These functions are most useful when combined with the \type +{token_filter} callback, but they could be used standalone as well. + +A token is represented in \LUA\ as a small table. 
For the moment, this table +consists of three numeric entries: + +\starttabulate[|l|l|p|] +\NC \bf index \NC \bf meaning \NC \bf description \NC \NR +\NC 1 \NC command code \NC this is a value between~$0$ and~$130$ (approximately)\NC \NR +\NC 2 \NC command modifier \NC this is a value between~$0$ and~$2^{21}$ \NC \NR +\NC 3 \NC control sequence id \NC for commands that are not the result of control + sequences, like letters and characters, it is zero, + otherwise, it is a number pointing into the \quote + {equivalence table} \NC \NR +\stoptabulate + +\subsection{\type {token.get_next}} + +\startfunctioncall +token t = token.get_next() +\stopfunctioncall + +This fetches the next input token from the current input source, without +expansion. + +\subsection{\type {token.is_expandable}} + +\startfunctioncall +<boolean> b = token.is_expandable(<token> t) +\stopfunctioncall + +This tests if the token \type {t} could be expanded. + +\subsection{\type {token.expand}} + +\startfunctioncall +token.expand(<token> t) +\stopfunctioncall + +If a token is expandable, this will expand one level of it, so that the first +token of the expansion will now be the next token to be read by \type +{token.get_next()}. + +\subsection{\type {token.is_activechar}} + +\startfunctioncall +<boolean> b = token.is_activechar(<token> t) +\stopfunctioncall + +This is a special test that is sometimes handy. Discovering whether some control +sequence is the result of an active character turned out to be very hard +otherwise. + +\subsection{\type {token.create}} + +\startfunctioncall +token t = token.create(<string> csname) +token t = token.create(<number> charcode) +token t = token.create(<number> charcode, <number> catcode) +\stopfunctioncall + +This is the token factory. If you feed it a string, then it is the name of a +control sequence (without leading backslash), and it will be looked up in the +equivalence table. 
+ +If you feed it a number, then this is assumed to be an input character, and an +optional second number gives its category code. This means it is possible to +overrule a character's category code, with a few exceptions: the category codes~0 +(escape), 9~(ignored), 13~(active), 14~(comment), and 15 (invalid) cannot occur +inside a token. The values~0, 9, 14 and~15 are therefore illegal as input to +\type {token.create()}, and active characters will be resolved immediately. + +Note: unknown string sequences and never defined active characters will result in +a token representing an \quote {undefined control sequence} with a near|-|random +name. It is {\em not} possible to define brand new control sequences using +\type {token.create}! + +\subsection{\type {token.command_name}} + +\startfunctioncall +<string> commandname = token.command_name(<token> t) +\stopfunctioncall + +This returns the name associated with the \quote {command} value of the token in +\LUATEX. There is not always a direct connection between these names and +primitives. For instance, all \type {\ifxxx} tests are grouped under \type +{if_test}, and the \quote {command modifier} defines which test is to be run. + +\subsection{\type {token.command_id}} + +\startfunctioncall +<number> i = token.command_id(<string> commandname) +\stopfunctioncall + +This returns a number that is the inverse operation of the previous command, to +be used as the first item in a token table. + +\subsection{\type {token.csname_name}} + +\startfunctioncall +<string> csname = token.csname_name(<token> t) +\stopfunctioncall + +This returns the name associated with the \quote {equivalence table} value of the +token in \LUATEX. It returns the string value of the command used to create the +current token, or an empty string if there is no associated control sequence. 
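+
+As a hedged sketch of how these lookups fit together (it only uses the functions
+described in this section; \type {relax} is just an example name):
+
+\starttyping
+local t = token.create("relax")
+-- name of the command code, the first entry of the token table
+texio.write_nl(token.command_name(t))
+-- name of the control sequence, derived from the third entry
+texio.write_nl(token.csname_name(t))
+-- command_id maps the command name back to its number
+local id = token.command_id(token.command_name(t))
+\stoptyping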
+ +Keep in mind that there are potentially two control sequences that return the +same csname string: single character control sequences and active characters have +the same \quote {name}. + +\subsection{\type {token.csname_id}} + +\startfunctioncall +<number> i = token.csname_id(<string> csname) +\stopfunctioncall + +This returns a number that is the inverse operation of the previous command, to +be used as the third item in a token table. + +\subsection{The \type {newtoken} library} + +The current \type {token} library will be replaced by a new one that is more +flexible and powerful. The transition takes place in steps. In version 0.80 we +have \type {newtoken} and in version 0.85 the old lib will be replaced +completely. So if you use this new mechanism in production code you need to be +aware of incompatible updates between 0.80 and 0.90. Because the related in- and +output code will also be cleaned up and rewritten you should be aware of +incompatible logging and error reporting too. + +The old library presents tokens as triplets of numbers, the new library presents +a userdata object. The old library used a callback to intercept tokens in the +input but the new library provides a basic scanner infrastructure that can be +used to write macros that accept a wide range of arguments. This interface is on +purpose kept general and as performance is quite ok one can build additional +parsers without too much overhead. It's up to macro package writers to see how +they can benefit from this as the main principle behind \LUATEX\ is to provide a +minimal set of tools and no solutions. 
+ +The current functions in the \type {newtoken} namespace are given in the next +table: + +\starttabulate[|lT|lT|p|] +\NC \bf function \NC \bf argument \NC \bf result \NC \NR +\HL +\NC is_token \NC token \NC checks if the given argument is a token userdatum \NC \NR +\NC get_next \NC \NC returns the next token in the input \NC \NR +\NC scan_keyword \NC string \NC returns true if the given keyword is gobbled \NC \NR +\NC scan_int \NC \NC returns a number \NC \NR +\NC scan_dimen \NC infinity, mu-units \NC returns a number representing a dimension and|/|or two numbers being the filler and order \NC \NR +\NC scan_glue \NC mu-units \NC returns a glue spec node \NC \NR +\NC scan_toks \NC definer, expand \NC returns a table of tokens (this can become a linked list in later releases) \NC \NR +\NC scan_code \NC bitset \NC returns a character if its category is in the given bitset (representing catcodes) \NC \NR +\NC scan_string \NC \NC returns a string given between \type {{}}, as \type {\macro} or as sequence of characters with catcode 11 or 12 \NC \NR +\NC scan_word \NC \NC returns a sequence of characters with catcode 11 or 12 as string \NC \NR +\NC create \NC \NC returns a userdata token object of the given control sequence name (or character); this interface can change \NC \NR +\stoptabulate + +The scanners can be considered stable apart from the one scanning for tokens +(\type {scan_toks}). This is because future releases can return a linked list instead of a table (as +with nodes). The \type {scan_code} function takes an optional number, the \type +{scan_keyword} function a normal \LUA\ string. The \type {infinity} boolean signals +that we also permit \type {fill} as dimension and the \type {mu-units} flag tells the +scanner that we expect math units. When scanning tokens we can indicate that we +are defining a macro, in which case the result will also provide information +about what arguments are expected and in the result this is separated from the +meaning by a separator token.
The \type {expand} flag determines if the list will +be expanded. + +The string scanner scans for something between curly braces and expands on the +way, or when it sees a control sequence it will return its meaning. Otherwise it +will scan characters with catcode \type {letter} or \type {other}. So, given the +following definition: + +\startbuffer +\def\bar{bar} +\def\foo{foo-\bar} +\stopbuffer + +\typebuffer \getbuffer + +we get: + +\starttabulate[|l|Tl|l|] +\NC \type {\directlua{newtoken.scan_string()}{foo}} \NC \directlua{context("{\\red\\type {"..newtoken.scan_string().."}}")} {foo} \NC full expansion \NR +\NC \type {\directlua{newtoken.scan_string()}foo} \NC \directlua{context("{\\red\\type {"..newtoken.scan_string().."}}")} foo \NC letters and others \NR +\NC \type {\directlua{newtoken.scan_string()}\foo} \NC \directlua{context("{\\red\\type {"..newtoken.scan_string().."}}")}\foo \NC meaning \NR +\stoptabulate + +The \type {\foo} case only gives the meaning, but one can pass an already +expanded definition (\type {\edef}'d). In the case of the braced variant one can of +course use the \type {\detokenize} and \type {\unexpanded} primitives as there we +do expand. + +The \type {scan_word} scanner can be used to implement for instance a number scanner: + +\starttyping +function newtoken.scan_number(base) + return tonumber(newtoken.scan_word(),base) +end +\stoptyping + +This scanner accepts any valid \LUA\ number so it is a way to pick up floats +in the input. + +The creator function can be used as follows: + +\starttyping +local t = newtoken.create("relax") +\stoptyping + +This gives back a token object that has the properties of the \type {\relax} +primitive.
The possible properties of tokens are: + +\starttabulate[|lT|p|] +\NC command \NC a number representing the internal command number \NC \NR +\NC cmdname \NC the type of the command (for instance the catcode in case of a + character or the classifier that determines the internal + treatment) \NC \NR +\NC csname \NC the associated control sequence (if applicable) \NC \NR +\NC id \NC the unique id of the token \NC \NR +%NC tok \NC \NC \NR % might change +\NC active \NC a boolean indicating the active state of the token \NC \NR +\NC expandable \NC a boolean indicating if the token (macro) is expandable \NC \NR +\NC protected \NC a boolean indicating if the token (macro) is protected \NC \NR +\stoptabulate + +The numbers that represent a catcode are the same as in \TEX\ itself, so using +this information assumes that you know a bit about \TEX's internals. The other +numbers and names are used consistently but are not frozen. So, when you use them +for comparisons, it is best to query a known primitive or character first to see the +values. + +More interesting are the scanners. You can use the \LUA\ interface as follows: + +\starttyping +\directlua { + function mymacro(n) + ... + end +} + +\def\mymacro#1{% + \directlua { + mymacro(\number\dimexpr#1) + }% +} + +\mymacro{12pt} +\mymacro{\dimen0} +\stoptyping + +You can also do this: + +\starttyping +\directlua { + function mymacro() + local d = newtoken.scan_dimen() + ... + end +} + +\def\mymacro{% + \directlua { + mymacro() + }% +} + +\mymacro 12pt +\mymacro \dimen0 +\stoptyping + +It is quite clear from looking at the code what the first method needs as +argument(s). For the second method you need to look at the \LUA\ code to see what +gets picked up. Instead of passing from \TEX\ to \LUA\ we let \LUA\ fetch from +the input stream. + +In the first case the input is tokenized and then turned into a string when it's +passed to \LUA\ where it gets interpreted.
In the second case only a function +call gets interpreted but then the input is picked up by explicitly calling the +scanner functions. These return proper \LUA\ variables so no further conversion +has to be done. This is more efficient but in practice (given what \TEX\ has to +do) this effect should not be overestimated. For numbers and dimensions it saves a +bit but for passing strings conversion to and from tokens has to be done anyway +(although we can probably speed up the process in later versions if needed). + +When the interface is stable and has replaced the old one completely we will add +some more information here. By that time the internals have been cleaned up a bit +more so we know then what will stay and go. A positive side effect of this +transition is that we can simplify the input part because we no longer need to +intercept using callbacks. + +\stopchapter + +\stopcomponent diff --git a/doc/context/sources/general/manuals/luatex/luatex-logos.tex b/doc/context/sources/general/manuals/luatex/luatex-logos.tex new file mode 100644 index 000000000..7406dd602 --- /dev/null +++ b/doc/context/sources/general/manuals/luatex/luatex-logos.tex @@ -0,0 +1,19 @@ +\startenvironment luatex-logos + +\logo[DFONT] {dfont} +\logo[CFF] {cff} +\logo[CMAP] {CMap} +\logo[PATGEN] {patgen} +\logo[MP] {MetaPost} +\logo[METAPOST] {MetaPost} +\logo[MPLIB] {MPlib} +\logo[COCO] {coco} +\logo[SUNOS] {SunOS} +\logo[BSD] {bsd} +\logo[SYSV] {sysv} +\logo[DPI] {dpi} +\logo[DLL] {dll} +\logo[OPENOFFICE]{OpenOffice} +\logo[OCP] {OCP} + +\stopenvironment diff --git a/doc/context/sources/general/manuals/luatex/luatex-lua.tex b/doc/context/sources/general/manuals/luatex/luatex-lua.tex new file mode 100644 index 000000000..3fe2ec9ad --- /dev/null +++ b/doc/context/sources/general/manuals/luatex/luatex-lua.tex @@ -0,0 +1,542 @@ +\environment luatex-style +\environment luatex-logos + +\startcomponent luatex-lua + +\startchapter[reference=lua,title={\LUA\ general}] + 
+\section[init]{Initialization} + +\subsection{\LUATEX\ as a \LUA\ interpreter} + +There are some situations that make \LUATEX\ behave like a standalone \LUA\ +interpreter: + +\startitemize[packed] +\startitem + if a \type {--luaonly} option is given on the commandline, or +\stopitem +\startitem + if the executable is named \type {texlua} or \type {luatexlua}, or +\stopitem +\startitem + if the only non|-|option argument (file) on the commandline has the extension + \type {lua} or \type {luc}. +\stopitem +\stopitemize + +In this mode, it will set \LUA's \type {arg[0]} to the found script name, pushing +preceding options in negative values and the rest of the commandline in the +positive values, just like the \LUA\ interpreter. + +\LUATEX\ will exit immediately after executing the specified \LUA\ script and is, +in effect, a somewhat bulky standalone \LUA\ interpreter with a bunch of extra +preloaded libraries. + +\subsection{\LUATEX\ as a \LUA\ byte compiler} + +There are two situations that make \LUATEX\ behave like the \LUA\ byte compiler: + +\startitemize[packed] +\startitem if a \type {--luaconly} option is given on the commandline, or \stopitem +\startitem if the executable is named \type {texluac} \stopitem +\stopitemize + +In this mode, \LUATEX\ is exactly like \type {luac} from the standalone \LUA\ +distribution, except that it does not have the \type {-l} switch, and that it +accepts (but ignores) the \type {--luaconly} switch. + +\subsection{Other commandline processing} + +When the \LUATEX\ executable starts, it looks for the \type {--lua} commandline +option. If there is no \type {--lua} option, the commandline is interpreted in a +similar fashion as in traditional \PDFTEX\ and \ALEPH. Some options are accepted +but have no consequence. 
The following command|-|line options are understood: + +\starttabulate[|lT|p|] +\NC --fmt=FORMAT \NC load the format file \type {FORMAT} \NC\NR +\NC --lua=FILE \NC load and execute a \LUA\ initialization script\NC\NR +\NC --safer \NC disable easily exploitable \LUA\ commands \NC\NR +\NC --nosocket \NC disable the \LUA\ socket library \NC\NR +\NC --help \NC display help and exit \NC\NR +\NC --ini \NC be iniluatex, for dumping formats \NC\NR +\NC --interaction=STRING \NC set interaction mode: \type {batchmode}, \type {nonstopmode} + \type {scrollmode} or \type {errorstopmode} \NC \NR +\NC --halt-on-error \NC stop processing at the first error\NC \NR +\NC --kpathsea-debug=NUMBER \NC set path searching debugging flags according to + the bits of \type {NUMBER} \NC \NR +\NC --progname=STRING \NC set the program name to \type {STRING} \NC \NR +\NC --version \NC display version and exit \NC \NR +\NC --credits \NC display credits and exit \NC \NR +\NC --recorder \NC enable filename recorder \NC \NR +\NC --etex \NC ignored \NC \NR +\NC --output-comment=STRING \NC use \type {STRING} for \DVI\ file comment instead of + date (no effect for \PDF) \NC \NR +\NC --output-directory=DIR \NC use \type {DIR} as the directory to write files to \NC \NR +\NC --draftmode \NC switch on draft mode i.e.\ generate no output in \PDF\ mode \NC \NR +\NC --output-format=FORMAT \NC use \type {FORMAT} for job output; \type {FORMAT} is \type {dvi} or + \type {pdf} \NC \NR +\NC --[no-]shell-escape \NC disable/enable \type {\write18{SHELL COMMAND}} \NC \NR +\NC --enable-write18 \NC enable \type {\write18{SHELL COMMAND}} \NC \NR +\NC --disable-write18 \NC disable \type {\write18{SHELL COMMAND}} \NC \NR +\NC --shell-restricted \NC restrict \type {\write18} to a list of commands + given in \type {texmf.cnf} \NC \NR +\NC --debug-format \NC enable format debugging \NC \NR +\NC --[no-]file-line-error \NC disable/enable \type {file:line:error} style messages \NC \NR +\NC --[no-]file-line-error-style \NC 
aliases of \type {--[no-]file-line-error} \NC \NR +\NC --jobname=STRING \NC set the job name to \type {STRING} \NC \NR +\NC --[no-]parse-first-line \NC ignored \NC \NR +\NC --translate-file= \NC ignored \NC \NR +\NC --default-translate-file= \NC ignored \NC \NR +\NC --8bit \NC ignored \NC \NR +\NC --[no-]mktex=FMT \NC disable/enable \type {mktexFMT} generation with \type {FMT} + is \type {tex} or \type {tfm} \NC \NR +\NC --synctex=NUMBER \NC enable \type {synctex} \NC \NR +\stoptabulate + +A note on the creation of the various temporary files and the \type {\jobname}. +The value to use for \type {\jobname} is decided as follows: + +\startitemize +\startitem + If \type {--jobname} is given on the command line, its argument will be the + value for \type {\jobname}, without any changes. The argument will not be used + for actual input so it need not exist. The \type {--jobname} switch only + controls the \type {\jobname} setting. +\stopitem +\startitem + Otherwise, \type {\jobname} will be the name of the first file that is read + from the file system, with any path components and the last extension (the + part following the last \type {.}) stripped off. +\stopitem +\startitem + An exception to the previous point: if the command line goes into interactive + mode (by starting with a command) and there are no files input via \type + {\everyjob} either, then the \type {\jobname} is set to \type {texput} as a + last resort. +\stopitem +\stopitemize + +The file names for output files that are generated automatically are created by +attaching the proper extension (\type {.log}, \type {.pdf}, etc.) to the found +\type {\jobname}. These files are created in the directory pointed to by \type +{--output-directory}, or in the current directory, if that switch is not present. + +\blank + +Without the \type {--lua} option, command line processing works like it does in +any other web2c-based typesetting engine, except that \LUATEX\ has a few extra +switches. 
+ +If the \type {--lua} option is present, \LUATEX\ will enter an alternative mode +of commandline processing in comparison to the standard web2c programs. + +In this mode, a small series of actions is taken in order. First, it will parse +the commandline as usual, but it will only interpret a small subset of the +options immediately: \type {--safer}, \type {--nosocket}, \type +{--[no-]shell-escape}, \type {--enable-write18}, \type {--disable-write18}, \type +{--shell-restricted}, \type {--help}, \type {--version}, and \type {--credits}. + +Now it searches for the requested \LUA\ initialization script. If it cannot be +found using the actual name given on the commandline, a second attempt is made by +prepending the value of the environment variable \type {LUATEXDIR}, if that +variable is defined in the environment. + +Then it checks the various safety switches. You can use those to disable some +\LUA\ commands that can easily be abused by a malicious document. At the moment, +\type {--safer} \type {nil}s the following functions: + +\starttabulate[|l|l|] +\NC \bf library \NC \bf functions \NC \NR +\NC \type {os} \NC \type {execute} \type {exec} \type {setenv} \type {rename} \type {remove} \type {tmpdir} \NC \NR +\NC \type {io} \NC \type {popen} \type {output} \type {tmpfile} \NC \NR +\NC \type {lfs} \NC \type {rmdir} \type {mkdir} \type {chdir} \type {lock} \type {touch} \NC \NR +\stoptabulate + +Furthermore, it disables loading of compiled \LUA\ libraries and it makes \type +{io.open()} fail on files that are opened for anything besides reading. + +\type {--nosocket} makes the socket library unavailable, so that \LUA\ cannot use +networking. + +The switches \type {--[no-]shell-escape}, \type {--[enable|disable]-write18}, and +\type {--shell-restricted} have the same effects as in \PDFTEX, and additionally +make \type {io.popen()}, \type {os.execute}, \type {os.exec} and \type {os.spawn} +adhere to the requested option. 
+ +Next the initialization script is loaded and executed. From within the script, +the entire commandline is available in the \LUA\ table \type {arg}, beginning with +\type {arg[0]}, containing the name of the executable. As a consequence, the warning +about unrecognized options is suppressed. + +Commandline processing happens very early on. So early, in fact, that none of +\TEX's initializations have taken place yet. For that reason, the tables that +deal with typesetting, like \type {tex}, \type {token}, \type {node} and +\type {pdf}, are off|-|limits during the execution of the startup file (they +are nilled). Special care is taken that \type {texio.write} and \type +{texio.write_nl} function properly, so that you can at least report your actions +to the log file when (and if) it is eventually opened (note that \TEX\ does +not even know its \type {\jobname} yet at this point). See \in {chapter} [libraries] +for more information about the \LUATEX-specific \LUA\ extension tables. + +Everything you do in the \LUA\ initialization script will remain visible during +the rest of the run, with the exception of the aforementioned \type {tex}, +\type {token}, \type {node} and \type {pdf} tables: those will be +initialized to their documented state after the execution of the script. You +should not store anything in variables or within tables with these four global +names, as they will be overwritten completely. + +We recommend you use the startup file only for your own \TEX|-|independent +initializations (if you need any), to parse the commandline, set values in the +\type {texconfig} table, and register the callbacks you need. + +\LUATEX\ allows some of the commandline options to be overridden by reading +values from the \type {texconfig} table at the end of script execution (see the +description of the \type {texconfig} table later on in this document for more +details on which ones exactly).
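+
+The recommendations above can be sketched as a small initialization script.
+This is only an illustration: the file name \type {init.lua} and the messages
+are made up, and the script relies on nothing but the \type {arg} table and
+\type {texio.write_nl} as described above:
+
+\starttyping
+-- init.lua, used as: luatex --lua=init.lua document.tex
+texio.write_nl("startup script called by " .. tostring(arg[0]))
+for i = 1, #arg do
+    -- report each commandline argument that follows the executable name
+    texio.write_nl(string.format("arg[%d] = %s", i, arg[i]))
+end
+\stoptyping
+
+The typesetting tables (\type {tex}, \type {token}, \type {node}, \type {pdf})
+are deliberately not touched here, because they are not available yet at this
+stage.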
+ +Unless the \type {texconfig} table tells \LUATEX\ not to initialize \KPATHSEA\ +at all (set \type {texconfig.kpse_init} to \type {false} for that), \LUATEX\ +acts on some more commandline options after the initialization script is +finished: in order to initialize the built|-|in \KPATHSEA\ library properly, +\LUATEX\ needs to know the correct program name to use, and for that it needs to +check \type {--progname}, or \type {--ini} and \type {--fmt}, if \type +{--progname} is missing. + +\section{\LUA\ behaviour} + +\LUA's \type {tonumber} function may return values in scientific notation, +thereby confusing the \TEX\ end of things when it is used as the right|-|hand +side of an assignment to a \type {\dimen} or \type {\count}. + +Loading dynamic \LUA\ libraries will fail if there are two \LUA\ libraries loaded +at the same time (which will typically happen on \type {win32}, because there is +one \LUA\ 5.2 inside \LUATEX, and another will likely be linked to the \DLL\ file +of the module itself). We plan to fix that later by switching \LUATEX\ itself to +using the \DLL\ version of \LUA\ 5.2 instead of including a static +version in the binary. + +\LUATEX\ is able to use the \KPATHSEA\ library to find \type {require()}d modules. +For this purpose, \type {package.searchers[2]} is replaced by a different loader +function, that decides at runtime whether to use \KPATHSEA\ or the built|-|in core +\LUA\ function. It uses \KPATHSEA\ when that is already initialized at that point +in time, otherwise it reverts to using the normal \type {package.path} loader. + +Initialization of \KPATHSEA\ can happen either implicitly (when \LUATEX\ starts +up and the startup script has not set \type {texconfig.kpse_init} to false), or +explicitly by calling the \LUA\ function \type {kpse.set_program_name()}. + +\LUATEX\ is able to use dynamically loadable \LUA\ libraries, unless +\type {--safer} was given as an option on the command line.
For this purpose, +\type {package.searchers[3]} is replaced by a different loader function, that +decides at runtime whether to use \KPATHSEA\ or the built|-|in core \LUA\ +function. It uses \KPATHSEA\ when that is already initialized at that point in +time, otherwise it reverts to using the normal \type {package.cpath} loader. + +This functionality required an extension to kpathsea: + +\startnarrower +There is a new kpathsea file format: \type {kpse_clua_format} that searches for +files with extension \type {.dll} and \type {.so}. The \type {texmf.cnf} setting +for this variable is \type {CLUAINPUTS}, and by default it has this value: + +\starttyping +CLUAINPUTS=.:$SELFAUTOLOC/lib/{$progname,$engine,}/lua// +\stoptyping % $ + +This path is imperfect (it requires a \TDS\ subtree below the binaries +directory), but the architecture has to be in the path somewhere, and the +currently simplest way to do that is to search below the binaries directory only. +Of course it is no big deal to write an alternative loader and use that in a macro +package. + +One level up (a \type {lib} directory parallel to \type {bin}) would have been +nicer, but that is not doable because \TEXLIVE\ uses a \type {bin/<arch>} +structure. +\stopnarrower + +In keeping with the other \TEX|-|like programs in \TEXLIVE, the two \LUA\ functions +\type {os.execute} and \type {io.popen}, as well as the two new functions \type +{os.exec} and \type {os.spawn} that are explained below, take the value of \type +{shell_escape} and|/|or \type {shell_escape_commands} into account.
Whenever +\LUATEX\ is run with the assumed intention to typeset a document (and by that we +mean that it is called as \type {luatex}, as opposed to \type {texlua}, and that +the commandline option \type {--luaonly} was not given), it will only run the +four functions above if the matching \type {texmf.cnf} variable(s) or their \type +{texconfig} (see \in {section} [texconfig]) counterparts allow execution of the +requested system command. In \quote {script interpreter} runs of \LUATEX, these +settings have no effect, and all four functions work as normal. + +The \type {f:read("*line")} and \type {f:lines()} functions from the \type {io} library +have been adjusted so that they are line|-|ending neutral: any of \type {LF}, +\type {CR} or \type {CR+LF} are acceptable line endings. + +\type {luafilesystem} has been extended: there are two extra boolean functions +(\type {lfs.isdir(filename)} and \type {lfs.isfile(filename)}) and one extra +string field in its attributes table (\type {permissions}). There is an +additional function \type {lfs.shortname()} which takes a file name and returns +its short name on \type {win32} platforms. On other platforms, it just returns +the given argument. The file name is not tested for existence. Finally, for +non|-|\type {win32} platforms only, there is the new function \type +{lfs.readlink()} that takes an existing symbolic link as argument and returns its +content. It returns an error on \type {win32}. + +The \type {string} library has an extra function: \type {string.explode(s[,m])}. +This function returns an array containing the string argument \type {s} split +into sub-strings based on the value of the string argument \type {m}.
The second +argument is a string that is either empty (this splits the string into +characters), a single character (this splits on each occurrence of that +character, possibly introducing empty strings), or a single character followed by +the plus sign \type {+} (this special version does not create empty sub-strings). +The default value for \type {m} is \quote {\type { +}} (multiple spaces). Note: +\type {m} is not hidden by surrounding braces as it would be if this function were +written in \TEX\ macros. + +The \type {string} library also has six extra iterators that return strings +piecemeal: + +\startitemize +\startitem + \type {string.utfvalues(s)}: an integer value in the \UNICODE\ range +\stopitem +\startitem + \type {string.utfcharacters(s)}: a string with a single \UTF-8 character in it +\stopitem +\startitem + \type {string.characters(s)}: a string containing one byte +\stopitem +\startitem + \type {string.characterpairs(s)}: two strings each containing one byte, or an + empty second string if the string length was odd +\stopitem +\startitem + \type {string.bytes(s)}: a single byte value +\stopitem +\startitem + \type {string.bytepairs(s)}: two byte values, or nil instead of a number as + its second return value if the string length was odd +\stopitem +\stopitemize + +The \type {string.characterpairs()} and \type {string.bytepairs()} iterators +are useful especially in the conversion of \UTF-16 encoded data into \UTF-8. + +There is also a two|-|argument form of \type {string.dump()}. The second argument +is a boolean which, if true, strips the symbols from the dumped data. This +matches an extension made in \type {luajit}. + +The \type {string} library functions \type {len}, \type {lower}, \type {sub} +etc.\ are not \UNICODE|-|aware.
For strings in the \UTF8 encoding, i.e., strings +containing characters above code point 127, the corresponding functions from the +\type {slnunicode} library can be used, e.g., \type {unicode.utf8.len}, \type +{unicode.utf8.lower} etc. The exceptions are \type {unicode.utf8.find}, that +always returns byte positions in a string, and \type {unicode.utf8.match} and +\type {unicode.utf8.gmatch}. While the latter two functions in general {\it +are} \UNICODE|-|aware, they fall|-|back to non|-|\UNICODE|-|aware behavior when +using the empty capture \type {()} but other captures work as expected. For the +interpretation of character classes in \type {unicode.utf8} functions refer to +the library sources at \hyphenatedurl {http://luaforge.net/projects/sln}. Version +5.3 of \LUA\ will provide some native \UTF8 support. + +\blank + +The \type {os} library has a few extra functions and variables: + +\startitemize + +\startitem + \type {os.selfdir} is a variable that holds the directory path of the + actual executable. For example: \type {\directlua {tex.sprint(os.selfdir)}}. +\stopitem + +\startitem + \type {os.exec(commandline)} is a variation on \type {os.execute}. Here + \type {commandline} can be either a single string or a single table. + + If the argument is a table: \LUATEX\ first checks if there is a value at + integer index zero. If there is, this is the command to be executed. + Otherwise, it will use the value at integer index one. (if neither are + present, nothing at all happens). + + The set of consecutive values starting at integer~1 in the table are the + arguments that are passed on to the command (the value at index~1 becomes + \type {arg[0]}). The command is searched for in the execution path, so there + is normally no need to pass on a fully qualified pathname. + + If the argument is a string, then it is automatically converted into a table + by splitting on whitespace. 
In this case, it is impossible for the command + and first argument to differ from each other. + + In the string argument format, whitespace can be protected by putting (part + of) an argument inside single or double quotes. One layer of quotes is + interpreted by \LUATEX, and all occurrences of \type {\"}, \type {\'} or \type + {\\} within the quoted text are unescaped. In the table format, there is no + string handling taking place. + + This function normally does not return control back to the \LUA\ script: the + command will replace the current process. However, it will return the two + values \type {nil} and \type {'error'} if there was a problem while + attempting to execute the command. + + On \MSWINDOWS, the current process is actually kept in memory until after the + execution of the command has finished. This prevents crashes in situations + where \TEXLUA\ scripts are run inside integrated \TEX\ environments. + + The original reason for this command is that it cleans out the current + process before starting the new one, making it especially useful for use in + \TEXLUA. +\stopitem + +\startitem + \type {os.spawn(commandline)} is a returning version of \type {os.exec}, + with otherwise identical calling conventions. + + If the command ran ok, then the return value is the exit status of the + command. Otherwise, it will return the two values \type {nil} and \type + {'error'}. +\stopitem + +\startitem + \type {os.setenv('key','value')} sets a variable in the environment. + Passing \type {nil} instead of a value string will remove the variable. +\stopitem + +\startitem + \type {os.env} is a hash table containing a dump of the variables and + values in the process environment at the start of the run. It is writeable, + but the actual environment is {\em not\/} updated automatically. +\stopitem + +\startitem + \type {os.gettimeofday()} returns the current \quote {\UNIX\ time}, but as a + float. 
This function is not available on the \SUNOS\ platforms, so do not use + this function for portable documents. +\stopitem + +\startitem + \type {os.times()} returns the current process times according to the + \UNIX\ C library function \quote {times}. This function is not available on + the \MSWINDOWS\ and \SUNOS\ platforms, so do not use this function for + portable documents. +\stopitem + +\startitem + \type {os.tmpdir()} creates a directory in the \quote {current directory} + with the name \type {luatex.XXXXXX} where the \type {X}-es are replaced by a + unique string. The function also returns this string, so you can \type + {lfs.chdir()} into it, or \type {nil} if it failed to create the directory. + The user is responsible for cleaning up at the end of the run; it does not + happen automatically. +\stopitem + +\startitem + \type {os.type} is a string that gives a global indication of the class of + operating system. The possible values are currently \type {windows}, \type + {unix}, and \type {msdos} (you are unlikely to find this value \quote {in the + wild}). +\stopitem + +\startitem + \type {os.name} is a string that gives a more precise indication of the + operating system. These possible values are not yet fixed, and for \type + {os.type} values \type {windows} and \type {msdos}, the \type {os.name} + values are simply \type {windows} and \type {msdos}. + + The list for the type \type {unix} is more precise: \type {linux}, \type + {freebsd}, \type {kfreebsd}, \type {cygwin}, \type {openbsd}, \type + {solaris}, \type {sunos} (pre-solaris), \type {hpux}, \type {irix}, \type + {macosx}, \type {gnu} (hurd), \type {bsd} (unknown, but \BSD|-|like), \type + {sysv} (unknown, but \SYSV|-|like), \type {generic} (unknown). +\stopitem + +\startitem + \type {os.version} is planned as a future extension. +\stopitem + +\startitem + \type {os.uname()} returns a table with specific operating system + information acquired at runtime.
The keys in the returned table are all + string valued, and their names are: \type {sysname}, \type {machine}, \type + {release}, \type {version}, and \type {nodename}. +\stopitem + +\stopitemize + +In stock \LUA, many things depend on the current locale. In \LUATEX, we can't do +that, because it makes documents unportable. While \LUATEX\ is running if +forces the following locale settings: + +\starttyping +LC_CTYPE=C +LC_COLLATE=C +LC_NUMERIC=C +\stoptyping + +\section {\LUA\ modules} + +The implied use of the built|-|in Lua modules in this section is deprecated. If +you want to use one of these libraries, please start your source file with a +proper \type {require} line. At some point \LUATEX\ will switch to loading these +modules on demand. + +Some modules that are normally external to \LUA\ are statically linked in with +\LUATEX, because they offer useful functionality: + +\startitemize + +\startitem + \type {slnunicode}, from the \type {Selene} libraries, \hyphenatedurl + {http://luaforge.net/projects/sln}. (version 1.1) This library has been + slightly extended so that the \type {unicode.utf8.*} functions also accept the + first 256 values of plane~18. This is the range \LUATEX\ uses for raw binary + output, as explained above. +\stopitem + +\startitem + \type {luazip}, from the kepler project, + \hyphenatedurl{http://www.keplerproject.org/luazip/}. (version 1.2.1, but + patched for compilation with \LUA\ 5.2) +\stopitem + +\startitem + \type {luafilesystem}, also from the kepler project, \hyphenatedurl + {http://www.keplerproject.org/luafilesystem/}. (version 1.5.0) +\stopitem + +\startitem + \type {lpeg}, by Roberto Ierusalimschy, \hyphenatedurl + {http://www.inf.puc-rio.br/~roberto/lpeg/lpeg.html}. (version 0.10.2) This + library is not \UNICODE|-|aware, but interprets strings on a + byte|-|per|-|byte basis. 
This mainly means that \type {lpeg.S} cannot be + used with \UTF\ characters encoded in more than two bytes, and thus \type + {lpeg.S} will look for one of those two bytes when matching, not the + combination of the two. The same is true for \type {lpeg.R}, although the + latter will display an error message if used with multibyte characters. + Therefore \type {lpeg.R('aä')} results in the message \type {bad argument + #1 to 'R' (range must have two characters)}, since to \type {lpeg}, \type {ä} + is two 'characters' (bytes), so \type {aä} totals three. In practice this is + no real issue. +\stopitem + +\startitem + \type {lzlib}, by Tiago Dionizio, \hyphenatedurl + {http://luaforge.net/projects/lzlib/}. (version 0.2) +\stopitem + +\startitem + \type {md5}, by Roberto Ierusalimschy \hyphenatedurl + {http://www.inf.puc-rio.br/~roberto/md5/md5-5/md5.html}. +\stopitem + +\startitem + \type {luasocket}, by Diego Nehab \hyphenatedurl + {http://w3.impa.br/~diego/software/luasocket/} (version 2.0.2). The \type + {.lua} support modules from \type {luasocket} are also preloaded inside the + executable, there are no external file dependencies. +\stopitem + +\stopitemize + +\stopchapter + +\stopcomponent diff --git a/doc/context/sources/general/manuals/luatex/luatex-math.tex b/doc/context/sources/general/manuals/luatex/luatex-math.tex new file mode 100644 index 000000000..88809d9d9 --- /dev/null +++ b/doc/context/sources/general/manuals/luatex/luatex-math.tex @@ -0,0 +1,671 @@ +\environment luatex-style +\environment luatex-logos + +\startcomponent luatex-math + +\startchapter[reference=math,title={Math}] + +The handling of mathematics in \LUATEX\ differs quite a bit from how \TEX82 (and +therefore \PDFTEX) handles math. First, \LUATEX\ adds primitives and extends some +others so that \UNICODE\ input can be used easily. Second, all of \TEX82's +internal special values (for example for operator spacing) have been made +accessible and changeable via control sequences. 
Third, there are extensions that +make it easier to use \OPENTYPE\ math fonts. And finally, there are some +extensions that have been proposed in the past that are now added to the engine. + +\section{The current math style} + +It is possible to discover the math style that will be used for a formula in an +expandable fashion (while the math list is still being read). To make this +possible, \LUATEX\ adds the new primitive: \type {\mathstyle}. This is a \quote +{convert command} like e.g. \type {\romannumeral}: its value can only be read, +not set. + +\subsection{\type {\mathstyle}} + +The returned value is between 0 and 7 (in math mode), or $-1$ (all other modes). +For easy testing, the eight math style commands have been altered so that they can +be used as numeric values, so you can write code like this: + +\starttyping +\ifnum\mathstyle=\textstyle + \message{normal text style} +\else \ifnum\mathstyle=\crampedtextstyle + \message{cramped text style} +\fi \fi +\stoptyping + +\subsection{\type {\Ustack}} + +There are a few math commands in \TEX\ where the style that will be used is not +known straight from the start. These commands (\type {\over}, \type {\atop}, +\type {\overwithdelims}, \type {\atopwithdelims}) would therefore normally return +wrong values for \type {\mathstyle}. To fix this, \LUATEX\ introduces a special +prefix command, \type {\Ustack}: + +\starttyping +$\Ustack {a \over b}$ +\stoptyping + +The \type {\Ustack} command will scan the next brace and start a new math group +with the correct (numerator) math style. + +\section{Unicode math characters} + +Character handling is now extended up to the full \UNICODE\ range (the \type {\U} +prefix), which is compatible with \XETEX. + +The math primitives from \TEX\ are kept as they are, except for the ones that +convert from input to math commands: \type {\mathcode} and \type {\delcode}. These +two now allow for a 21-bit character argument on the left hand side of the equals +sign.
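+
+As a minimal sketch of the extended left|-|hand side (the two \UNICODE\ code
+points are arbitrary illustrations, and the right|-|hand values are simply the
+ones that plain \TEX\ assigns to \type {x} and \type {<}):
+
+\starttyping
+% U+1D465 (mathematical italic small x) gets the math code of plain TeX's x
+\mathcode "1D465 = "7178
+% U+27E8 (mathematical left angle bracket) gets the delimiter code of plain TeX's <
+\delcode "27E8 = "26830A
+\stoptyping
+
+Only the character number on the left uses the full 21-bit range here; the
+values on the right stay within the traditional \TEX82 ranges.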
+ +Some of the new \LUATEX\ primitives read more than one separate value. This is +shown in the tables below by a plus sign in the second column. + +The input for such primitives would look like this: + +\starttyping +\def\overbrace{\Umathaccent 0 1 "23DE } +\stoptyping + +Altered \TEX82 primitives: + +\starttabulate[|l|l|l|] +\NC \bf primitive \NC \bf value range (in hex) \NC \NR +\NC \type {\mathcode} \NC 0--10FFFF = 0--8000 \NC \NR +\NC \type {\delcode} \NC 0--10FFFF = 0--FFFFFF \NC \NR +\stoptabulate + +Unaltered: + +\starttabulate[|l|l|l|] +\NC \bf primitive \NC \bf value range (in hex) \NC \NR +\NC \type {\mathchardef} \NC 0--8000 \NC \NR +\NC \type {\mathchar} \NC 0--7FFF \NC \NR +\NC \type {\mathaccent} \NC 0--7FFF \NC \NR +\NC \type {\delimiter} \NC 0--7FFFFFF \NC \NR +\NC \type {\radical} \NC 0--7FFFFFF \NC \NR +\stoptabulate + +New primitives that are compatible with \XETEX: + +\starttabulate[|l|l|l|l|] +\NC \bf primitive \NC \bf value range (in hex) \NC \NR +\NC \type {\Umathchardef} \NC 0+0+0--7+FF+10FFFF$^1$ \NC \NR +\NC \type {\Umathcharnumdef}$^5$ \NC -80000000--7FFFFFFF$^3$ \NC \NR +\NC \type {\Umathcode} \NC 0--10FFFF = 0+0+0--7+FF+10FFFF$^1$ \NC \NR +\NC \type {\Udelcode} \NC 0--10FFFF = 0+0--FF+10FFFF$^2$ \NC \NR +\NC \type {\Umathchar} \NC 0+0+0--7+FF+10FFFF \NC \NR +\NC \type {\Umathaccent} \NC 0+0+0--7+FF+10FFFF$^{2,4}$ \NC \NR +\NC \type {\Udelimiter} \NC 0+0+0--7+FF+10FFFF$^2$ \NC \NR +\NC \type {\Uradical} \NC 0+0--FF+10FFFF$^2$ \NC \NR +\NC \type {\Umathcharnum} \NC -80000000--7FFFFFFF$^3$ \NC \NR +\NC \type {\Umathcodenum} \NC 0--10FFFF = -80000000--7FFFFFFF$^3$ \NC \NR +\NC \type {\Udelcodenum} \NC 0--10FFFF = -80000000--7FFFFFFF$^3$ \NC \NR +\stoptabulate + +Note 1: \type {\Umathchardef<csname>="8"0"0} and \type +{\Umathchardef<number>="8"0"0} are also accepted. + +Note 2: The new primitives that deal with delimiter-style objects do not set up a +\quote {large family}. 
Selecting a suitable size for display purposes is expected +to be dealt with by the font via the \type {\Umathoperatorsize} parameter (more +information can be found in a later section). + +Note 3: For these three primitives, all information is packed into a single +signed integer. For the first two (\type {\Umathcharnum} and \type {\Umathcodenum}), +the lowest 21 bits are the character code, the 3 bits above that represent the +math class, and the family data is kept in the topmost bits (this means that the +values for math families 128--255 are actually negative). For \type {\Udelcodenum} +there is no math class; the math family information is stored in the bits +directly on top of the character code. Using these three commands is not as +natural as using the two- and three-value commands, so unless you know exactly +what you are doing and absolutely require the speedup resulting from the faster +input scanning, it is better to use the verbose commands instead. + +Note 4: The \type {\Umathaccent} command accepts optional keywords to control +various details regarding math accents. See \in {section} [mathacc] below for +details.
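+
+As a worked illustration of the packing described in Note 3 (a sketch derived
+only from the bit layout stated there; the class, family and character values
+are arbitrary), a math class $c$, family $f$ and character code $u$ combine
+into the number
+
+\startformula
+    n = u + 2^{21}\,c + 2^{24}\,f
+\stopformula
+
+For class 3, family 1 and character \type {"3D} this gives
+$61 + 3 \times 2097152 + 16777216 = 23068733$, which is \type {"160003D} in
+hexadecimal. Under that layout the following two lines should request the same
+math character:
+
+\starttyping
+\Umathchar    "3 "1 "3D
+\Umathcharnum "160003D
+\stoptyping
+
+For families 128--255 the sum no longer fits in a signed 32-bit integer, so
+$2^{32}$ has to be subtracted from it, which is why those values are negative.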
+ +New primitives that exist in \LUATEX\ only (all of these will be explained +in following sections): + +\starttabulate[|l|l|l|l|] +\NC \bf primitive \NC \bf value range (in hex) \NC \NR +\NC \type {\Uroot} \NC 0+0--FF+10FFFF$^2$ \NC \NR +\NC \type {\Uoverdelimiter} \NC 0+0--FF+10FFFF$^2$ \NC \NR +\NC \type {\Uunderdelimiter} \NC 0+0--FF+10FFFF$^2$ \NC \NR +\NC \type {\Udelimiterover} \NC 0+0--FF+10FFFF$^2$ \NC \NR +\NC \type {\Udelimiterunder} \NC 0+0--FF+10FFFF$^2$ \NC \NR +\stoptabulate + +\section{Cramped math styles} + +\LUATEX\ has four new primitives to set the cramped math styles directly: + +\starttyping +\crampeddisplaystyle +\crampedtextstyle +\crampedscriptstyle +\crampedscriptscriptstyle +\stoptyping + +These additional commands are not all that valuable on their own, but they come +in handy as arguments to the math parameter settings that will be added shortly. + +\section{Math parameter settings} + +In \LUATEX, the font dimension parameters that \TEX\ used in math typesetting are +now accessible via primitive commands. In fact, refactoring of the math engine +has resulted in many more parameters than were accessible before. 
+ +\starttabulate +\NC \bf primitive name \NC \bf description \NC \NR +\NC \type {\Umathquad} \NC the width of 18mu's \NC \NR +\NC \type {\Umathaxis} \NC height of the vertical center axis of + the math formula above the baseline \NC \NR +\NC \type {\Umathoperatorsize} \NC minimum size of large operators in display mode \NC \NR +\NC \type {\Umathoverbarkern} \NC vertical clearance above the rule \NC \NR +\NC \type {\Umathoverbarrule} \NC the width of the rule \NC \NR +\NC \type {\Umathoverbarvgap} \NC vertical clearance below the rule \NC \NR +\NC \type {\Umathunderbarkern} \NC vertical clearance below the rule \NC \NR +\NC \type {\Umathunderbarrule} \NC the width of the rule \NC \NR +\NC \type {\Umathunderbarvgap} \NC vertical clearance above the rule \NC \NR +\NC \type {\Umathradicalkern} \NC vertical clearance above the rule \NC \NR +\NC \type {\Umathradicalrule} \NC the width of the rule \NC \NR +\NC \type {\Umathradicalvgap} \NC vertical clearance below the rule \NC \NR +\NC \type {\Umathradicaldegreebefore}\NC the forward kern that takes place before placement of + the radical degree \NC \NR +\NC \type {\Umathradicaldegreeafter} \NC the backward kern that takes place after placement of + the radical degree \NC \NR +\NC \type {\Umathradicaldegreeraise} \NC this is the percentage of the total height and depth of + the radical sign that the degree is raised by. It is + expressed in \type {percents}, so 60\% is expressed as the + integer $60$. 
\NC \NR +\NC \type {\Umathstackvgap} \NC vertical clearance between the two + elements in a \type {\atop} stack \NC \NR +\NC \type {\Umathstacknumup} \NC numerator shift upward in \type {\atop} stack \NC \NR +\NC \type {\Umathstackdenomdown} \NC denominator shift downward in \type {\atop} stack \NC \NR +\NC \type {\Umathfractionrule} \NC the width of the rule in a \type {\over} \NC \NR +\NC \type {\Umathfractionnumvgap} \NC vertical clearance between the numerator and the rule \NC \NR +\NC \type {\Umathfractionnumup} \NC numerator shift upward in \type {\over} \NC \NR +\NC \type {\Umathfractiondenomvgap} \NC vertical clearance between the denominator and the rule \NC \NR +\NC \type {\Umathfractiondenomdown} \NC denominator shift downward in \type {\over} \NC \NR +\NC \type {\Umathfractiondelsize} \NC minimum delimiter size for \type {\...withdelims} \NC \NR +\NC \type {\Umathlimitabovevgap} \NC vertical clearance for limits above operators \NC \NR +\NC \type {\Umathlimitabovebgap} \NC vertical baseline clearance for limits above operators \NC \NR +\NC \type {\Umathlimitabovekern} \NC space reserved at the top of the limit \NC \NR +\NC \type {\Umathlimitbelowvgap} \NC vertical clearance for limits below operators \NC \NR +\NC \type {\Umathlimitbelowbgap} \NC vertical baseline clearance for limits below operators \NC \NR +\NC \type {\Umathlimitbelowkern} \NC space reserved at the bottom of the limit \NC \NR +\NC \type {\Umathoverdelimitervgap} \NC vertical clearance for limits above delimiters \NC \NR +\NC \type {\Umathoverdelimiterbgap} \NC vertical baseline clearance for limits above delimiters \NC \NR +\NC \type {\Umathunderdelimitervgap} \NC vertical clearance for limits below delimiters \NC \NR +\NC \type {\Umathunderdelimiterbgap} \NC vertical baseline clearance for limits below delimiters \NC \NR +\NC \type {\Umathsubshiftdrop} \NC subscript drop for boxes and subformulas \NC \NR +\NC \type {\Umathsubshiftdown} \NC subscript drop for characters \NC \NR +\NC 
\type {\Umathsupshiftdrop} \NC superscript drop (raise, actually) for boxes and subformulas \NC \NR +\NC \type {\Umathsupshiftup} \NC superscript raise for characters \NC \NR +\NC \type {\Umathsubsupshiftdown} \NC subscript drop in the presence of a superscript \NC \NR +\NC \type {\Umathsubtopmax} \NC the top of standalone subscripts cannot be higher than this + above the baseline \NC \NR +\NC \type {\Umathsupbottommin} \NC the bottom of standalone superscripts cannot be less than + this above the baseline \NC \NR +\NC \type {\Umathsupsubbottommax} \NC the bottom of the superscript of a combined super- and subscript + must be at least as high as this above the baseline \NC \NR +\NC \type {\Umathsubsupvgap} \NC vertical clearance between super- and subscript \NC \NR +\NC \type {\Umathspaceafterscript} \NC additional space added after a super- or subscript \NC \NR +\NC \type {\Umathconnectoroverlapmin}\NC minimum overlap between parts in an extensible recipe \NC \NR +\stoptabulate + +Each of the parameters in this section can be set by a command like this: + +\starttyping +\Umathquad\displaystyle=1em +\stoptyping + +They obey grouping, and you can use \type {\the\Umathquad\displaystyle} if +needed. + +\section{Font-based Math Parameters} + +While it is nice to have these math parameters available for tweaking, it would +be tedious to have to set each of them by hand. For this reason, \LUATEX\ +initializes a bunch of these parameters whenever you assign a font identifier to +a math family based on either the traditional math font dimensions in the font +(for assignments to math family~2 and~3 using \TFM|-|based fonts like \type +{cmsy} and \type {cmex}), or based on the named values in a potential \type +{MathConstants} table when the font is loaded via \LUA.
If there is a \type +{MathConstants} table, this takes precedence over font dimensions, and in that +case no attention is paid to which family is being assigned to: the \type +{MathConstants} table in the last assigned family sets all parameters. + +In the table below, the one|-|letter style abbreviations and symbolic \TFM\ font +dimension names match those used in the \TEX book. Assignments to \type +{\textfont} set the values for the cramped and uncramped display and text styles. +Use \type {\scriptfont} for the script styles, and \type {\scriptscriptfont} for the +scriptscript styles (totalling eight parameters for three font sizes). In the +\TFM\ case, assignments only happen in family~2 and family~3 (and of course only +for the parameters for which there are font dimensions). + +Besides the parameters below, \LUATEX\ also looks at the \quote {space} font +dimension parameter. For math fonts, this should be set to zero. + +\start + +\switchtobodyfont[8pt] + +\starttabulate[|l|l|l|p|] +\NC \bf variable \NC \bf style \NC \bf default value opentype \NC \bf default value tfm \NC \NR +\NC \type {\Umathaxis} \NC -- \NC AxisHeight \NC axis_height \NC \NR +\NC \type {\Umathoperatorsize} \NC D, D' \NC DisplayOperatorMinHeight \NC $^6$ \NC \NR +\NC \type {\Umathfractiondelsize} \NC D, D' \NC FractionDelimiterDisplayStyleSize$^9$ \NC delim1 \NC \NR +\NC " \NC T, T', S, S', SS, SS' \NC FractionDelimiterSize$^9$ \NC delim2 \NC \NR +\NC \type {\Umathfractiondenomdown} \NC D, D' \NC FractionDenominatorDisplayStyleShiftDown \NC denom1 \NC \NR +\NC " \NC T, T', S, S', SS, SS' \NC FractionDenominatorShiftDown \NC denom2 \NC \NR +\NC \type {\Umathfractiondenomvgap} \NC D, D' \NC FractionDenominatorDisplayStyleGapMin \NC 3*default_rule_thickness \NC \NR +\NC " \NC T, T', S, S', SS, SS' \NC FractionDenominatorGapMin \NC default_rule_thickness \NC \NR +\NC \type {\Umathfractionnumup} \NC D, D' \NC FractionNumeratorDisplayStyleShiftUp \NC num1 \NC \NR +\NC " \NC T, T', S, S', SS, SS'
\NC FractionNumeratorShiftUp \NC num2 \NC \NR +\NC \type {\Umathfractionnumvgap} \NC D, D' \NC FractionNumeratorDisplayStyleGapMin \NC 3*default_rule_thickness \NC \NR +\NC " \NC T, T', S, S', SS, SS' \NC FractionNumeratorGapMin \NC default_rule_thickness \NC \NR +\NC \type {\Umathfractionrule} \NC -- \NC FractionRuleThickness \NC default_rule_thickness \NC \NR +\NC \type {\Umathlimitabovebgap} \NC -- \NC UpperLimitBaselineRiseMin \NC big_op_spacing3 \NC \NR +\NC \type {\Umathlimitabovekern} \NC -- \NC 0$^1$ \NC big_op_spacing5 \NC \NR +\NC \type {\Umathlimitabovevgap} \NC -- \NC UpperLimitGapMin \NC big_op_spacing1 \NC \NR +\NC \type {\Umathlimitbelowbgap} \NC -- \NC LowerLimitBaselineDropMin \NC big_op_spacing4 \NC \NR +\NC \type {\Umathlimitbelowkern} \NC -- \NC 0$^1$ \NC big_op_spacing5 \NC \NR +\NC \type {\Umathlimitbelowvgap} \NC -- \NC LowerLimitGapMin \NC big_op_spacing2 \NC \NR +\NC \type {\Umathoverdelimitervgap} \NC -- \NC StretchStackGapBelowMin \NC big_op_spacing1 \NC \NR +\NC \type {\Umathoverdelimiterbgap} \NC -- \NC StretchStackTopShiftUp \NC big_op_spacing3 \NC \NR +\NC \type {\Umathunderdelimitervgap} \NC-- \NC StretchStackGapAboveMin \NC big_op_spacing2 \NC \NR +\NC \type {\Umathunderdelimiterbgap} \NC-- \NC StretchStackBottomShiftDown \NC big_op_spacing4 \NC \NR +\NC \type {\Umathoverbarkern} \NC -- \NC OverbarExtraAscender \NC default_rule_thickness \NC \NR +\NC \type {\Umathoverbarrule} \NC -- \NC OverbarRuleThickness \NC default_rule_thickness \NC \NR +\NC \type {\Umathoverbarvgap} \NC -- \NC OverbarVerticalGap \NC 3*default_rule_thickness \NC \NR +\NC \type {\Umathquad} \NC -- \NC <font_size(f)>$^1$ \NC math_quad \NC \NR +\NC \type {\Umathradicalkern} \NC -- \NC RadicalExtraAscender \NC default_rule_thickness \NC \NR +\NC \type {\Umathradicalrule} \NC -- \NC RadicalRuleThickness \NC <not set>$^2$ \NC \NR +\NC \type {\Umathradicalvgap} \NC D, D' \NC RadicalDisplayStyleVerticalGap \NC (default_rule_thickness+\crlf + (abs(math_x_height)/4))$^3$ 
\NC \NR +\NC " \NC T, T', S, S', SS, SS' \NC RadicalVerticalGap \NC (default_rule_thickness+\crlf + (abs(default_rule_thickness)/4))$^3$ \NC \NR +\NC \type {\Umathradicaldegreebefore} \NC -- \NC RadicalKernBeforeDegree \NC <not set>$^2$ \NC \NR +\NC \type {\Umathradicaldegreeafter} \NC -- \NC RadicalKernAfterDegree \NC <not set>$^2$ \NC \NR +\NC \type {\Umathradicaldegreeraise} \NC -- \NC RadicalDegreeBottomRaisePercent \NC <not set>$^{2,7}$ \NC \NR +\NC \type {\Umathspaceafterscript} \NC -- \NC SpaceAfterScript \NC script_space$^4$ \NC \NR +\NC \type {\Umathstackdenomdown} \NC D, D' \NC StackBottomDisplayStyleShiftDown \NC denom1 \NC \NR +\NC " \NC T, T', S, S', SS, SS' \NC StackBottomShiftDown \NC denom2 \NC \NR +\NC \type {\Umathstacknumup} \NC D, D' \NC StackTopDisplayStyleShiftUp \NC num1 \NC \NR +\NC " \NC T, T', S, S', SS, SS' \NC StackTopShiftUp \NC num3 \NC \NR +\NC \type {\Umathstackvgap} \NC D, D' \NC StackDisplayStyleGapMin \NC 7*default_rule_thickness \NC \NR +\NC " \NC T, T', S, S', SS, SS' \NC StackGapMin \NC 3*default_rule_thickness \NC \NR +\NC \type {\Umathsubshiftdown} \NC -- \NC SubscriptShiftDown \NC sub1 \NC \NR +\NC \type {\Umathsubshiftdrop} \NC -- \NC SubscriptBaselineDropMin \NC sub_drop \NC \NR +\NC \type {\Umathsubsupshiftdown} \NC -- \NC SubscriptShiftDownWithSuperscript$^8$ \NC \NC \NR +\NC \NC \NC \quad\ or SubscriptShiftDown \NC sub2 \NC \NR +\NC \type {\Umathsubtopmax} \NC -- \NC SubscriptTopMax \NC (abs(math_x_height * 4) / 5) \NC \NR +\NC \type {\Umathsubsupvgap} \NC -- \NC SubSuperscriptGapMin \NC 4*default_rule_thickness \NC \NR +\NC \type {\Umathsupbottommin} \NC -- \NC SuperscriptBottomMin \NC (abs(math_x_height) / 4) \NC \NR +\NC \type {\Umathsupshiftdrop} \NC -- \NC SuperscriptBaselineDropMax \NC sup_drop \NC \NR +\NC \type {\Umathsupshiftup} \NC D \NC SuperscriptShiftUp \NC sup1 \NC \NR +\NC " \NC T, S, SS, \NC SuperscriptShiftUp \NC sup2 \NC \NR +\NC " \NC D', T', S', SS' \NC SuperscriptShiftUpCramped \NC sup3 \NC \NR +\NC 
\type {\Umathsupsubbottommax} \NC -- \NC SuperscriptBottomMaxWithSubscript \NC (abs(math_x_height * 4) / 5) \NC \NR +\NC \type {\Umathunderbarkern} \NC -- \NC UnderbarExtraDescender \NC default_rule_thickness \NC \NR +\NC \type {\Umathunderbarrule} \NC -- \NC UnderbarRuleThickness \NC default_rule_thickness \NC \NR +\NC \type {\Umathunderbarvgap} \NC -- \NC UnderbarVerticalGap \NC 3*default_rule_thickness \NC \NR +\NC \type {\Umathconnectoroverlapmin} \NC -- \NC MinConnectorOverlap \NC 0$^5$ \NC \NR +\stoptabulate + +\stop + +Note 1: \OPENTYPE\ fonts set \type {\Umathlimitabovekern} and \type +{\Umathlimitbelowkern} to zero and set \type {\Umathquad} to the font size of the +used font, because these are not supported in the \type {MATH} table. + +Note 2: \TFM\ fonts do not set \type {\Umathradicalrule} because \TEX82\ uses the +height of the radical instead. If this parameter is still unset when +\LUATEX\ has to typeset a radical, a backward compatibility mode will kick in +that assumes that an oldstyle \TEX\ font is used. Also, they do not set \type +{\Umathradicaldegreebefore}, \type {\Umathradicaldegreeafter}, and \type +{\Umathradicaldegreeraise}. These are then automatically initialized to +$5/18$quad, $-10/18$quad, and 60. + +Note 3: If \TFM\ fonts are used, then the \type {\Umathradicalvgap} is not set until +the first time \LUATEX\ has to typeset a formula because this needs parameters +from both family~2 and family~3. This provides backward compatibility +with \TEX82, but only partially: once the \type +{\Umathradicalvgap} is set, it will not be recalculated any more. + +Note 4: (also if \TFM\ fonts are used) A similar situation arises with respect to \type +{\Umathspaceafterscript}: it is not set until the first time \LUATEX\ has to +typeset a formula. This provides some backward compatibility with \TEX82. But +once the \type {\Umathspaceafterscript} is set, \type {\scriptspace} will never be +looked at again.
+ +Note 5: \TFM\ fonts set \type {\Umathconnectoroverlapmin} to zero because \TEX82\ +always stacks extensibles without any overlap. + +Note 6: The \type {\Umathoperatorsize} is only used in \type {\displaystyle}, and is +only set in \OPENTYPE\ fonts. In \TFM\ font mode, it is artificially set to one +scaled point more than the initial attempt's size, so that the \quote +{first next} will always be tried, just like in \TEX82. + +Note 7: The \type {\Umathradicaldegreeraise} is a special case because it is the +only parameter that is expressed as a percentage instead of as a number of scaled +points. + +Note 8: \type {SubscriptShiftDownWithSuperscript} does not actually exist in the +\quote {standard} \OPENTYPE\ math font Cambria, but it is useful enough to be +added. + +Note 9: \type {FractionDelimiterDisplayStyleSize} and \type +{FractionDelimiterSize} do not actually exist in the \quote {standard} \OPENTYPE\ +math font Cambria, but were useful enough to be added. + +\section{Math spacing setting} + +Besides the parameters mentioned in the previous sections, there are also 64 new +primitives to control the math spacing table (as explained in Chapter~18 of the +\TEX book).
The primitive names are a simple matter of combining two math atom +types, but for completeness' sake, here is the whole list: + +\starttwocolumns +\starttyping +\Umathordordspacing +\Umathordopspacing +\Umathordbinspacing +\Umathordrelspacing +\Umathordopenspacing +\Umathordclosespacing +\Umathordpunctspacing +\Umathordinnerspacing +\Umathopordspacing +\Umathopopspacing +\Umathopbinspacing +\Umathoprelspacing +\Umathopopenspacing +\Umathopclosespacing +\Umathoppunctspacing +\Umathopinnerspacing +\Umathbinordspacing +\Umathbinopspacing +\Umathbinbinspacing +\Umathbinrelspacing +\Umathbinopenspacing +\Umathbinclosespacing +\Umathbinpunctspacing +\Umathbininnerspacing +\Umathrelordspacing +\Umathrelopspacing +\Umathrelbinspacing +\Umathrelrelspacing +\Umathrelopenspacing +\Umathrelclosespacing +\Umathrelpunctspacing +\Umathrelinnerspacing +\Umathopenordspacing +\Umathopenopspacing +\Umathopenbinspacing +\Umathopenrelspacing +\Umathopenopenspacing +\Umathopenclosespacing +\Umathopenpunctspacing +\Umathopeninnerspacing +\Umathcloseordspacing +\Umathcloseopspacing +\Umathclosebinspacing +\Umathcloserelspacing +\Umathcloseopenspacing +\Umathcloseclosespacing +\Umathclosepunctspacing +\Umathcloseinnerspacing +\Umathpunctordspacing +\Umathpunctopspacing +\Umathpunctbinspacing +\Umathpunctrelspacing +\Umathpunctopenspacing +\Umathpunctclosespacing +\Umathpunctpunctspacing +\Umathpunctinnerspacing +\Umathinnerordspacing +\Umathinneropspacing +\Umathinnerbinspacing +\Umathinnerrelspacing +\Umathinneropenspacing +\Umathinnerclosespacing +\Umathinnerpunctspacing +\Umathinnerinnerspacing +\stoptyping +\stoptwocolumns + +These parameters are of type \type {\muskip}, so setting a parameter can be done +like this: + +\starttyping +\Umathopordspacing\displaystyle=4mu plus 2mu +\stoptyping + +They are all initialized by initex to the values mentioned in the table in +Chapter~18 of the \TEX book. 
+ +Note 1: For ease of use as well as for backward compatibility, \type +{\thinmuskip}, \type {\medmuskip} and \type {\thickmuskip} are treated +specially. In their case a pointer to the corresponding internal parameter is +saved, not the actual \type {\muskip} value. This means that any later changes to +one of these three parameters will be taken into account. + +Note 2: Careful readers will realise that there are also primitives for the items +marked \type {*} in the \TEX book. These will not actually be used as those +combinations of atoms cannot actually happen, but it seemed better not to break +orthogonality. They are initialized to zero. + +\section[mathacc]{Math accent handling} + +\LUATEX\ supports both top accents and bottom accents in math mode, and math +accents stretch automatically (if this is supported by the font the accent comes +from, of course). Bottom and combined accents as well as fixed-width math accents +are controlled by optional keywords following \type {\Umathaccent}. + +The keyword \type {bottom} after \type {\Umathaccent} signals that a bottom accent +is needed, and the keyword \type {both} signals that both a top and a bottom +accent are needed (in this case two accents need to be specified, of course). + +Then the set of three integers defining the accent is read. This set of integers +can be prefixed by the \type {fixed} keyword to indicate that a non-stretching +variant is requested (in case of both accents, this step is repeated). + +A simple example: + +\starttyping +\Umathaccent both fixed 0 0 "20D7 fixed 0 0 "20D7 {example} +\stoptyping + +If a math top accent has to be placed and the accentee is a character and has a +non-zero \type {top_accent} value, then this value will be used to place the +accent instead of the \type {\skewchar} kern used by \TEX82. + +The \type {top_accent} value represents a vertical line somewhere in the +accentee.
The accent will be shifted horizontally such that its own \type +{top_accent} line coincides with the one from the accentee. If the \type +{top_accent} value of the accent is zero, then half the width of the accent +followed by its italic correction is used instead. + +The vertical placement of a top accent depends on the \type {x_height} of the +font of the accentee (as explained in the \TEX book), but if that value turns out +to be zero and the font has a \type {MathConstants} table, then \type {AccentBaseHeight} +is used instead. + +If a math bottom accent has to be placed, the \type {bot_accent} value is checked +instead of \type {top_accent}. Because bottom accents do not exist in \TEX82, the +\type {\skewchar} kern is ignored. + +The vertical placement of a bottom accent is straight below the accentee; no +correction takes place. + +\section{Math root extension} + +The new primitive \type {\Uroot} allows the construction of a radical noad +including a degree field. Its syntax is an extension of \type {\Uradical}: + +\starttyping +\Uradical <fam integer> <char integer> <radicand> +\Uroot <fam integer> <char integer> <degree> <radicand> +\stoptyping + +The placement of the degree is controlled by the math parameters \type +{\Umathradicaldegreebefore}, \type {\Umathradicaldegreeafter}, and \type +{\Umathradicaldegreeraise}. The degree will be typeset in \type +{\scriptscriptstyle}. + +\section{Math kerning in super- and subscripts} + +The character fields in a \LUA|-|loaded OpenType math font can have a \quote +{mathkern} table. The format of this table is the same as the \quote {mathkern} +table that is returned by the \type {fontloader} library, except that all height +and kern values have to be specified in actual scaled points. + +When a super- or subscript has to be placed next to a math item, \LUATEX\ checks +whether the super- or subscript and the nucleus are both simple character items.
+If they are, and if the fonts of both character items are OpenType fonts (as +opposed to legacy \TEX\ fonts), then \LUATEX\ will use the OpenType MATH +algorithm for deciding on the horizontal placement of the super- or subscript. + +This works as follows: + +\startitemize + \startitem + The vertical position of the script is calculated. + \stopitem + \startitem + The default horizontal position is flat next to the base character. + \stopitem + \startitem + For superscripts, the italic correction of the base character is added. + \stopitem + \startitem + For a superscript, two vertical values are calculated: the bottom of the + script (after shifting up), and the top of the base. For a subscript, the two + values are the top of the (shifted down) script, and the bottom of the base. + \stopitem + \startitem + For each of these two locations: + \startitemize + \startitem + find the mathkern value at this height for the base (for a subscript + placement, this is the bottom_right corner, for a superscript + placement the top_right corner) + \stopitem + \startitem + find the mathkern value at this height for the script (for a + subscript placement, this is the top_left corner, for a superscript + placement the bottom_left corner) + \stopitem + \startitem + add the found values together to get a preliminary result. + \stopitem + \stopitemize + \stopitem + \startitem + The horizontal kern to be applied is the smaller of the two results from + the previous step. + \stopitem +\stopitemize + +The mathkern value at a specific height is the kern value that is specified by the +next higher height and kern pair, or the highest one in the character (if there is no +value high enough in the character), or simply zero (if the character has no mathkern +pairs at all).
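+
+As a worked illustration of this lookup rule (the height and kern pairs are
+invented for the example and do not come from any real font), assume that one
+corner of a character carries the two pairs (height 100, kern 10) and
+(height 300, kern 25), all in scaled points. Reading the rule literally, the
+lookups would then give:
+
+\starttabulate[|l|l|p|]
+\NC \bf height \NC \bf kern \NC \bf reason \NC \NR
+\NC 50  \NC 10 \NC the next higher pair is the one at height 100 \NC \NR
+\NC 200 \NC 25 \NC the next higher pair is the one at height 300 \NC \NR
+\NC 400 \NC 25 \NC no pair is high enough, so the highest one is used \NC \NR
+\stoptabulate
+
+A character without any mathkern pairs yields zero at every height.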
+ +\section{Scripts on horizontally extensible items like arrows} + +The primitives \type {\Uunderdelimiter} and \type {\Uoverdelimiter} allow the +placement of a subscript or superscript on an automatically extensible item and +\type {\Udelimiterunder} and \type {\Udelimiterover} allow the placement of an +automatically extensible item as a subscript or superscript on a nucleus. The +input: + +% these produce radical noads .. in fact the code base has the numbers wrong for +% quite a while, so no one seems to use this + +\startbuffer +$\Uoverdelimiter 0 "2194 {\hbox{\strut overdelimiter}}$ +$\Uunderdelimiter 0 "2194 {\hbox{\strut underdelimiter}}$ +$\Udelimiterover 0 "2194 {\hbox{\strut delimiterover}}$ +$\Udelimiterunder 0 "2194 {\hbox{\strut delimiterunder}}$ +\stopbuffer + +\typebuffer will render this: + +\blank \startnarrower \getbuffer \stopnarrower \blank + +The vertical placements are controlled by \type {\Umathunderdelimiterbgap}, \type +{\Umathunderdelimitervgap}, \type {\Umathoverdelimiterbgap}, and \type +{\Umathoverdelimitervgap} in a similar way as limit placements on large operators. +The superscript in \type {\Uoverdelimiter} is typeset in a suitable scripted style, +the subscript in \type {\Uunderdelimiter} is cramped as well. + +\section {Extensible delimiters} + +\LUATEX\ internally uses a structure that supports \OPENTYPE\ \quote +{MathVariants} as well as \TFM\ \quote {extensible recipes}. + +\section{Other Math changes} + +\subsection {Verbose versions of single-character math commands} + +\LUATEX\ defines six new primitives that have the same function as +\type {^}, \type {_}, \type {$}, and \type {$$}. %$ + +\starttabulate[|l|l|l|l|] +\NC \bf primitive \NC \bf explanation \NC \NR +\NC \type {\Usuperscript} \NC Duplicates the functionality of \type {^} \NC \NR +\NC \type {\Usubscript} \NC Duplicates the functionality of \type {_} \NC \NR +\NC \type {\Ustartmath} \NC Duplicates the functionality of \type {$}, % $ + when used in non-math mode. 
\NC \NR +\NC \type {\Ustopmath} \NC Duplicates the functionality of \type {$}, % $ + when used in inline math mode. \NC \NR +\NC \type {\Ustartdisplaymath} \NC Duplicates the functionality of \type {$$}, % $$ + when used in non-math mode. \NC \NR +\NC \type {\Ustopdisplaymath} \NC Duplicates the functionality of \type {$$}, % $$ + when used in display math mode. \NC \NR +\stoptabulate + +The \type {\Ustopmath} and \type {\Ustopdisplaymath} primitives check if the current +math mode is the correct one (inline vs.\ displayed), but you can freely intermix +the four mathon|/|mathoff commands with explicit dollar sign(s). + +\subsection{Allowed math commands in non-math modes} + +The commands \type {\mathchar}, and \type {\Umathchar} and control sequences that +are the result of \type {\mathchardef} or \type {\Umathchardef} are also +acceptable in the horizontal and vertical modes. In those cases, the \type +{\textfont} from the requested math family is used. + +\section{Math todo} + +The following items are still todo. + +\startitemize +\startitem + Pre-scripts. +\stopitem +\startitem + Multi-story stacks. +\stopitem +\startitem + Flattened accents for high characters (maybe). +\stopitem +\startitem + Better control over the spacing around displays and handling of equation numbers. +\stopitem +\startitem + Support for multi|-|line displays using \MATHML\ style alignment points. 
+\stopitem +\stopitemize + +\stopchapter + +\stopcomponent diff --git a/doc/context/sources/general/manuals/luatex/luatex-modifications.tex b/doc/context/sources/general/manuals/luatex/luatex-modifications.tex new file mode 100644 index 000000000..630528bec --- /dev/null +++ b/doc/context/sources/general/manuals/luatex/luatex-modifications.tex @@ -0,0 +1,499 @@ +\environment luatex-style +\environment luatex-logos + +\startcomponent luatex-modifications + +\startchapter[reference=modifications,title={Modifications}] + +\startsection[title=The merged engines] + +\startsubsection[title=The need for change] + +The first version of \LUATEX\ only had a few extra primitives and it was largely +the same as \PDFTEX. Then we merged substantial parts of \ALEPH\ into the code +and got more primitives. When things became more stable the decision was made to clean +up the rather hybrid nature of the program. This means that some primitives have +been promoted to core primitives, often with a different name, and that others +were removed. This made it possible to start cleaning up the code base. We will +describe most of these changes in the following paragraphs. + +Besides the expected changes caused by new functionality, there are a number of +not|-|so|-|expected changes. These are sometimes a side|-|effect of a new +(conflicting) feature, or, more often than not, a change necessary to clean up +the internal interfaces. These will also be mentioned. + +\stopsubsection + +\startsubsection[title=Changes from \TEX\ 3.1415926] + +Of course it all starts with traditional \TEX. Even if we started with \PDFTEX, +most still comes from the original. But we diverge a bit. + +\startitemize + +\startitem + The current code base is written in \CCODE, not \PASCAL. We use \CWEB\ + when possible. +\stopitem + +\startitem + See \in {chapter} [languages] for many small changes related to paragraph + building, language handling and hyphenation.
The most important change is + that adding a brace group in the middle of a word (like in \type {of{}fice}) + does not prevent ligature creation. +\stopitem + +\startitem + There is no pool file, all strings are embedded during compilation. +\stopitem + +\startitem + The specifier \type {plus 1 fillll} does not generate an error. The extra + \quote{l} is simply typeset. +\stopitem + +\startitem + The upper limit to \type {\endlinechar} and \type {\newlinechar} is 127. +\stopitem + +\startitem + The hz optimization code has been partially redone so that we no longer need + to create extra font instances. The front- and backend have been decoupled and + more efficient (\PDF) code is generated. +\stopitem + +\stopitemize + +\stopsubsection + +\startsubsection[title=Changes from \ETEX\ 2.2] + +Being the de factor standard extension of course we provide the \ETEX\ +functionality, but with a few small adaptions. + +\startitemize + +\startitem + The \ETEX\ functionality is always present and enabled so the prepended + asterisk or \type {-etex} switch for \INITEX\ is not needed. +\stopitem + +\startitem + The \TEXXET\ extension is not present, so the primitives \type + {\TeXXeTstate}, \type {\beginR}, \type {\beginL}, \type {\endR} and \type + {\endL} are missing. +\stopitem + +\startitem + Some of the tracing information that is output by \ETEX's \type + {\tracingassigns} and \type {\tracingrestores} is not there. +\stopitem + +\startitem + Register management in \LUATEX\ uses the \ALEPH\ model, so the maximum value + is 65535 and the implementation uses a flat array instead of the mixed + flat|\&|sparse model from \ETEX. +\stopitem + +\startitem + The \type {\savinghyphcodes} command is a no|-|op. \in {Chapter} [languages] + explains why. +\stopitem + +\startitem + When kpathsea is used to find files, \LUATEX\ uses the \type {ofm} file + format to search for font metrics. 
In turn, this means that \LUATEX\ looks at + the \type {OFMFONTS} configuration variable (like \OMEGA\ and \ALEPH) instead + of \type {TFMFONTS} (like \TEX\ and \PDFTEX). Likewise for virtual fonts + (\LUATEX\ uses the variable \type {OVFFONTS} instead of \type {VFFONTS}). +\stopitem + +\stopitemize + +\stopsubsection + +\startsubsection[title=Changes from \PDFTEX\ 1.40] + +Because we want to produce \PDF\ the most natural starting point was the popular +\PDFTEX\ program. We inherit the stable features, dropped most of the +experimental code and promoted some functionality to core \LUATEX\ functionality +which in turn triggered renaming primitives. + +\startitemize + +\startitem + The (experimental) support for snap nodes has been removed, because it is + much more natural to build this functionality on top of node processing and + attributes. The associated primitives that are now gone are: \type + {\pdfsnaprefpoint}, \type {\pdfsnapy}, and \type {\pdfsnapycomp}. +\stopitem + +\startitem + The (experimental) support for specialized spacing around nodes has also been + removed. The associated primitives that are now gone are: \type + {\pdfadjustinterwordglue}, \type {\pdfprependkern}, and \type {\pdfappendkern}, as + well as the five supporting primitives \type {\knbscode}, \type {\stbscode}, \type + {\shbscode}, \type {\knbccode}, and \type {\knaccode}. 
+\stopitem + +\startitem + A number of \quote {pdftex primitives} have been removed as they can be + implemented using \LUA: + + \start \raggedright + \type {\pdfelapsedtime}, \type {\pdfescapehex}, \type {\pdfescapename}, \type + {\pdfescapestring}, \type {\pdffiledump}, \type {\pdffilemoddate}, \type + {\pdffilesize}, \type {\pdfforcepagebox}, \type {\pdflastmatch}, \type + {\pdfmatch}, \type {\pdfmdfivesum}, \type {\pdfmovechars}, \type + {\pdfoptionalwaysusepdfpagebox}, \type {\pdfoptionpdfinclusionerrorlevel}, + \type {\pdfresettimer}, \type {\pdfshellescape}, \type {\pdfstrcmp} and \type + {\pdfunescapehex} + \par \stop +\stopitem + +\startitem + The version related primitives \type {\pdftexbanner}, \type {\pdftexversion} + and \type {\pdftexrevision} are no longer present as there is no longer a + strict relationship with \PDFTEX\ development. +\stopitem + +\startitem + The experimental snapper mechanism has been removed and therefore also the + primitives: + + \start \raggedright + \type {\pdfignoreddimen}, \type {\pdffirstlineheight}, \type + {\pdfeachlineheight}, \type {\pdfeachlinedepth} and \type + {\pdflastlinedepth} + \par \stop +\stopitem + +\startitem + The experimental primitives \type {\primitive}, \type {\ifprimitive}, \type + {\ifabsnum} and \type {\ifabsdim} are promoted to core primitives. The \type + {\pdf*} prefixed originals are not available. +\stopitem + +\startitem + The \PNG\ transparency fix from 1.40.6 is not applied as high|-|level + support is pending. +\stopitem + +\startitem + Two extra token lists are provided, \type {\pdfxformresources} and \type + {\pdfxformattr}, as an alternative to \type {\pdfxform} keywords. +\stopitem + +\startitem + The current version of \LUATEX\ no longer replaces and|/|or merges fonts in + embedded \PDF\ files with fonts of the enveloping \PDF\ document. This + regression may be temporary, depending on what the rewritten font backend will + look like.
+\stopitem + +\startitem + The primitives \type {\pdfpagewidth} and \type {\pdfpageheight} have been removed + because \type {\pagewidth} and \type {\pageheight} have that purpose. +\stopitem + +\startitem + The primitives \type {\pdfnormaldeviate}, \type {\pdfuniformdeviate}, \type + {\pdfsetrandomseed} and \type {\pdfrandomseed} have been promoted to core + primitives without \type {pdf} prefix so the original commands are no longer + recognized. +\stopitem + +\startitem + The primitives \type {\ifincsname}, \type {\expanded} and \type {\quitvmode} are now + core primitives. +\stopitem + +\startitem + As the hz and protrusion mechanism are part of the core the related + primitives \type {\lpcode}, \type {\rpcode}, \type {\efcode}, \type + {\leftmarginkern}, \type {\rightmarginkern} are promoted to core primitives. The + two commands \type {\protrudechars} and \type {\adjustspacing} replace their + prefixed with \type {\pdf} originals. +\stopitem + +\startitem + The \type {\tagcode} primitive is promoted to core primitive. +\stopitem + +\startitem + The \type {\letterspacefont} feature is now part of the core but will not be + changed (improved). We just provide it for legacy use. +\stopitem + +\startitem + The \type {\pdfnoligatures} primitive is now \type {\ignoreligaturesinfont}. +\stopitem + +\startitem + The \type {\pdffontexpand} primitive is now \type {\expandglyphsinfont}. +\stopitem + +\startitem + Because position tracking is also available in \DVI\ mode the + \type {\savepos}, \type {\lastxpos} and \type {\lastypos} commands now + replace their \type {pdf} prefixed originals. +\stopitem + +\startitem + Candidates for removal are \type {\pdfcolorstackinit} and \type + {\pdfcolorstack}. +\stopitem + +\startitem + Candidates for replacement are \type {\pdfoutput} (\type {\outputmode}) and + \type {\pdfmatrix} (something with a normal syntax). 
+\stopitem + +\stopitemize + +\stopsubsection + +\startsubsection[title=Changes from \ALEPH\ RC4] + +Because we wanted proper directional typesetting the \ALEPH\ mechanisms looked +most attractive. These are rather close to the ones provided by \OMEGA, so what +we say next applies to both these programs. + +\startitemize + +\startitem + The extended 16-bit math primitives (\type {\omathcode} etc.) have been + removed. +\stopitem + +\startitem + The \OCP\ processing is no longer supported at all. As a consequence, the + following primitives have been removed: + + \start \raggedright + \type {\ocp}, \type {\externalocp}, \type {\ocplist}, \type {\pushocplist}, + \type {\popocplist}, \type {\clearocplists}, \type {\addbeforeocplist}, \type + {\addafterocplist}, \type {\removebeforeocplist}, \type {\removeafterocplist} + and \type {\ocptracelevel} + \par \stop +\stopitem + +\startitem + \LUATEX\ only understands 4~of the 16~direction specifiers of \ALEPH: \type + {TLT} (latin), \type {TRT} (arabic), \type {RTT} (cjk), \type {LTL} + (mongolian). All other direction specifiers generate an error. +\stopitem + +\startitem + The input translations from \ALEPH\ are not implemented, the related + primitives are not available: + + \start \raggedright + \type {\DefaultInputMode}, \type {\noDefaultInputMode}, \type {\noInputMode}, + \type {\InputMode}, \type {\DefaultOutputMode}, \type {\noDefaultOutputMode}, + \type {\noOutputMode}, \type {\OutputMode}, \type {\DefaultInputTranslation}, + \type {\noDefaultInputTranslation}, \type {\noInputTranslation}, \type + {\InputTranslation}, \type {\DefaultOutputTranslation}, \type + {\noDefaultOutputTranslation}, \type {\noOutputTranslation} and \type + {\OutputTranslation} + \par \stop +\stopitem + +\startitem + Several bugs have been fixed. The \type {\hoffset} bug when \type {\pagedir TRT} + is gone, removing the need for an explicit fix to \type {\hoffset}. Also, the bug + causing \type {\fam} to fail for family numbers above 15 is fixed.
A fair amount + of other minor bugs are fixed as well, most of these related to \type + {\tracingcommands} output. +\stopitem + +\startitem + The scanner for direction specifications now allows an optional space after + the direction is completely parsed. +\stopitem + +\startitem + The \type {^^} notation can come in five and six item repetitions also, to + insert characters that do not fit in the BMP. +\stopitem + +\startitem + Glues {\it immediately after} direction change commands are not legal + breakpoints. +\stopitem + +\startitem + Several mechanisms that need to be right|-|to|-|left aware have been + improved. For instance placement of formula numbers. +\stopitem + +\startitem + The page dimension related primitives \type {\pagewidth} and \type {\pageheight} have + been promoted to core primitives. +\stopitem + +\startitem + The primitives \type {\charwd}, \type {\charht}, \type {\chardp} and \type {\charit} + have been removed as we have the \ETEX\ variants \type {\fontchar*}. +\stopitem + +\startitem + The two dimension registers \type {\pagerightoffset} and \type + {\pagebottomoffset} are now core primitives. +\stopitem + +\startitem + The direction related primitives \type {\pagedir}, \type {\bodydir}, \type + {\pardir}, \type {\textdir}, \type {\mathdir} and \type {\boxdir} are now + core primitives. +\stopitem + +\startitem + The promotion of primitives to core primitives as well as the removal of all + others means that the initialization namespace \type {aleph} is gone. +\stopitem + +\stopitemize + +\stopsubsection + +\startsubsection[title=Changes from standard \WEBC] + +The compilation framework is \WEBC\ and we keep using that but without the +\PASCAL\ to \CCODE\ step. This framework also provides some common features that +deal with reading bytes from files and locating files in \TDS. This is what we do +differently: + +\startitemize + +\startitem + There is no mltex support. +\stopitem + +\startitem + There is no enctex support.
+\stopitem + +\startitem + The following commandline switches are silently ignored, even in non|-|\LUA\ + mode: \type {-8bit}, \type {-translate-file}, \type {-mltex}, \type {-enc} + and \type {-etex}. +\stopitem + +\startitem + The \type {\openout} whatsits are not written to the log file. +\stopitem + +\startitem + Some of the so|-|called web2c extensions are hard to set up in non|-|\KPSE\ + mode because \type {texmf.cnf} is not read: \type {shell-escape} is off (but + that is not a problem because of \LUA's \type {os.execute}), and the paranoia + checks on \type {openin} and \type {openout} do not happen (however, it is + easy for a \LUA\ script to do this itself by overloading \type {io.open}). +\stopitem + +\startitem + The \quote{E} option does not do anything useful. +\stopitem + +\stopitemize + +\stopsubsection + +\stopsection + +\startsection[title=Implementation notes] + +\startsubsection[title=Memory allocation] + +The single internal memory heap that traditional \TEX\ used for tokens and nodes +is split into two separate arrays. Each of these will grow dynamically when +needed. + +The \type {texmf.cnf} settings related to main memory are no longer used (these +are: \type {main_memory}, \type {mem_bot}, \type {extra_mem_top} and \type +{extra_mem_bot}). \quote {Out of main memory} errors can still occur, but the +limiting factor is now the amount of RAM in your system, not a predefined limit. + +Also, the memory (de)allocation routines for nodes are completely rewritten. The +relevant code now lives in the C file \type {texnode.c}, and basically uses a +dozen or so \quote {avail} lists instead of a doubly|-|linked model. An extra +function layer is added so that the code can ask for nodes by type instead of +directly requisitioning a certain amount of memory words. + +Because of the split into two arrays and the resulting differences in the data +structures, some of the macros have been duplicated. 
For instance, there are now +\type {vlink} and \type {vinfo} as well as \type {token_link} and \type +{token_info}. All access to the variable memory array is now hidden behind a +macro called \type {vmem}. + +The implementation of the growth of two arrays (via reallocation) introduces a +potential pitfall: the memory arrays should never be used as the left hand side +of a statement that can modify the array in question. + +The input line buffer and pool size are now also reallocated when needed, and the +\type {texmf.cnf} settings \type {buf_size} and \type {pool_size} are silently +ignored. + +\stopsubsection + +\startsubsection[title=Sparse arrays] + +The \type {\mathcode}, \type {\delcode}, \type {\catcode}, \type {\sfcode}, \type {\lccode} +and \type {\uccode} tables are now sparse arrays that are implemented in~\CCODE. +They are no longer part of the \TEX\ \quote {equivalence table} and because each +had 1.1 million entries with a few memory words each, this makes a major +difference in memory usage. + +The \type {\catcode}, \type {\sfcode}, \type {\lccode} and \type {\uccode} assignments do +not yet show up when using the etex tracing routines \type {\tracingassigns} and +\type {\tracingrestores} (code simply not written yet). + +A side|-|effect of the current implementation is that \type {\global} is now more +expensive in terms of processing than non|-|global assignments. + +See \type {mathcodes.c} and \type {textcodes.c} if you are interested in the +details. + +Also, the glyph ids within a font are now managed by means of a sparse array and +glyph ids can go up to index $2^{21}-1$. + +\stopsubsection + +\startsubsection[title=Simple single-character csnames] + +Single|-|character commands are no longer treated specially in the internals, +they are stored in the hash just like the multiletter csnames. + +The code that displays control sequences explicitly checks if the length is one +when it has to decide whether or not to add a trailing space. 
+ +Active characters are internally implemented as a special type of multi|-|letter +control sequences that uses a prefix that is otherwise impossible to obtain. + +\stopsubsection + +\startsubsection[title=Compressed format] + +The format is passed through zlib, allowing it to shrink to roughly half of the +size it would have had in uncompressed form. This takes a bit more \CPU\ cycles +but much less disk \IO, so it should still be faster. + +\stopsubsection + +\startsubsection[title=Binary file reading] + +All of the internal code is changed in such a way that if one of the \type +{read_xxx_file} callbacks is not set, then the file is read by a C function using +basically the same convention as the callback: a single read into a buffer big +enough to hold the entire file contents. While this uses more memory than the +previous code (that mostly used \type {getc} calls), it can be quite a bit faster +(depending on your I/O subsystem). + +\stopsubsection + +\stopsection + +\stopchapter + +\stopcomponent diff --git a/doc/context/sources/general/manuals/luatex/luatex-nodes.tex b/doc/context/sources/general/manuals/luatex/luatex-nodes.tex new file mode 100644 index 000000000..6d4127341 --- /dev/null +++ b/doc/context/sources/general/manuals/luatex/luatex-nodes.tex @@ -0,0 +1,1291 @@ +\environment luatex-style +\environment luatex-logos + +\startcomponent luatex-nodes + +\startchapter[reference=nodes,title={Nodes}] + +\section{\LUA\ node representation} + +\TEX's nodes are represented in \LUA\ as userdata objects with a variable set of +fields. In the following syntax tables, the type of such a userdata object +is represented as \syntax {<node>}. + +The current return value of \type {node.types()} is: +\startluacode + for id, name in table.sortedhash(node.types()) do + context.type(name) + context(" (%s), ",id) + end + context.removeunwantedspaces() + context.removepunctuation() +\stopluacode +. % period + +The \type {\lastnodetype} primitive is \ETEX\ compliant.
The valid range is still +$[-1,15]$ and glyph nodes (formerly known as char nodes) have number~0 while +ligature nodes are mapped to~7. That way macro packages can use the same symbolic +names as in traditional \ETEX. Keep in mind that the internal node numbers are +different and that there are more node types than~15. + +\subsection{Auxiliary items} + +A few node|-|typed userdata objects do not occur in the \quote {normal} list of +nodes, but can be pointed to from within that list. They are not quite the same +as regular nodes, but it is easier for the library routines to treat them as if +they were. + +\subsubsection{glue_spec items} + +Skips are about the only type of data objects in traditional \TEX\ that are not a +simple value. The structure that represents the glue components of a skip is +called a \type {glue_spec}, and it has the following accessible fields: + +\starttabulate[|lT|l|p|] +\NC \ssbf key \NC \bf type \NC \bf explanation \NC \NR +\NC width \NC number \NC \NC \NR +\NC stretch \NC number \NC \NC \NR +\NC stretch_order \NC number \NC \NC \NR +\NC shrink \NC number \NC \NC \NR +\NC shrink_order \NC number \NC \NC \NR +\NC writable \NC boolean \NC If this is true, you can't assign to this + \type {glue_spec} because it is one of the + preallocated special cases. \NC \NR +\stoptabulate + +These objects are reference counted, so there is actually an extra read-only +field named \type {ref_count} as well. This item type will likely disappear in +the future, and the glue fields themselves will become part of the nodes +referencing glue items. + +The effective width of some glue subtypes depends on the stretch or shrink needed +to make the encapsulating box fit its dimensions. For instance, in a paragraph +lines normally have glue representing spaces and these stretch or shrink to make +the content fit in the available space.
The \type {effective_glue} function that +takes a glue node and a parent (hlist or vlist) returns the effective width of +that glue item. + +\subsubsection{attribute_list and attribute items} + +The newly introduced attribute registers are non|-|trivial, because the value +that is attached to a node is essentially a sparse array of key|-|value pairs. + +It is generally easiest to deal with attribute lists and attributes by using the +dedicated functions in the \type {node} library, but for completeness, here is +the low|-|level interface. + +An \type {attribute_list} item is used as a head pointer for a list of attribute +items. It has only one user-visible field: + +\starttabulate[|lT|l|p|] +\NC \ssbf field \NC \bf type \NC \bf explanation \NC \NR +\NC next \NC \syntax{<node>} \NC + pointer to the first attribute +\NC \NR +\stoptabulate + +A normal node's attribute field will point to an item of type \type +{attribute_list}, and the \type {next} field in that item will point to the first +defined \quote {attribute} item, whose \type {next} will point to the second +\quote {attribute} item, etc. + +Valid fields in \type {attribute} items: + +\starttabulate[|lT|l|p|] +\NC \ssbf field \NC \bf type \NC \bf explanation \NC \NR +\NC next \NC \syntax{<node>} \NC pointer to the next attribute \NC \NR +\NC number \NC number \NC the attribute type id \NC \NR +\NC value \NC number \NC the attribute value \NC \NR +\stoptabulate + +As mentioned it's better to use the official helpers rather than edit these +fields directly. For instance the \type {prev} field is used for other purposes +and there is no double linked list. + +\subsubsection{action item} + +Valid fields: \showfields{action}\crlf +Id: \showid{action} + +These are a special kind of item that only appears inside \PDF\ start link +objects. 
+ +\starttabulate[|lT|l|p|] +\NC \ssbf field \NC \bf type \NC \bf explanation \NC \NR +\NC action_type \NC number \NC \NC \NR +\NC action_id \NC number or string \NC \NC \NR +\NC named_id \NC number \NC \NC \NR +\NC file \NC string \NC \NC \NR +\NC new_window \NC number \NC \NC \NR +\NC data \NC string \NC \NC \NR +\NC ref_count \NC number \NC + read-only +\NC \NR +\stoptabulate + +\subsection{Main text nodes} + +These are the nodes that comprise actual typesetting commands. + +A few fields are present in all nodes regardless of their type, these are: + +\starttabulate[|lT|l|p|] +\NC \ssbf field \NC \bf type \NC \bf explanation \NC \NR +\NC next \NC \syntax{<node>} \NC the next node in a list, or nil \NC \NR +\NC id \NC number \NC the node's type (\type {id}) number \NC \NR +\NC subtype \NC number \NC the node \type {subtype} identifier \NC \NR +\stoptabulate + +The \type {subtype} is sometimes just a stub entry. Not all nodes actually use +the \type {subtype}, but this way you can be sure that all nodes accept it as a +valid field name, and that is often handy in node list traversal. In the +following tables \type {next} and \type {id} are not explicitly mentioned. + +Besides these three fields, almost all nodes also have an \type {attr} field, and +there is also a field called \type {prev}. That last field is always present, +but only initialized on explicit request: when the function \type {node.slide()} +is called, it will set up the \type {prev} fields to be a backwards pointer in +the argument node list.
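+
+As a minimal sketch of how these common fields are typically used, the following
+\LUA\ fragment walks a list forwards via \type {next} (through \type
+{node.traverse}) and then backwards via \type {prev}, after \type {node.slide()}
+has set up the back pointers. The helper name \type {show_list} is hypothetical,
+and the code assumes that \type {head} is the first node of a complete list, so
+that the backwards walk ends at \type {head}.
+
+\starttyping
+-- hypothetical helper, not part of the node library
+local function show_list(head)
+    -- forward pass: every node accepts id and subtype as field names
+    for n in node.traverse(head) do
+        texio.write_nl(node.type(n.id) .. " id=" .. n.id ..
+            " subtype=" .. tostring(n.subtype))
+    end
+    -- node.slide returns the last node and initializes the prev fields
+    local n = node.slide(head)
+    -- backward pass: assumes head has no predecessor of its own
+    while n do
+        texio.write_nl(node.type(n.id))
+        n = n.prev
+    end
+end
+\stoptyping
+
+Without the call to \type {node.slide()} the \type {prev} fields should not be
+relied upon, which is why the backward pass starts from its return value.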
+ +\subsubsection{hlist nodes} + +Valid fields: \showfields{hlist}\crlf +Id: \showid{hlist} + +\starttabulate[|lT|l|p|] +\NC \ssbf field \NC \bf type \NC \bf explanation \NC \NR +\NC subtype \NC number \NC \type {0} = unknown origin, + \type {1} = created by linebreaking, + \type {2} = explicit box command, + \type {3} = paragraph indentation box, + \type {4} = alignment column or row, + \type {5} = alignment cell \NC \NR +\NC attr \NC \syntax{<node>} \NC The head of the associated attribute + list \NC \NR +\NC width \NC number \NC \NC \NR +\NC height \NC number \NC \NC \NR +\NC depth \NC number \NC \NC \NR +\NC shift \NC number \NC a displacement perpendicular to the + character progression direction \NC \NR +\NC glue_order \NC number \NC a number in the range $[0,4]$, indicating + the glue order \NC \NR +\NC glue_set \NC number \NC the calculated glue ratio \NC \NR +\NC glue_sign \NC number \NC \type {0} = normal, + \type {1} = stretching, + \type {2} = shrinking \NC \NR +\NC head \NC \syntax{<node>} \NC the first node of the body of this + list \NC \NR +\NC dir \NC string \NC the direction of this box, + see~\in[dirnodes] \NC \NR +\stoptabulate + +A warning: never assign a node list to the \type {head} field unless you are sure +its internal link structure is correct, otherwise an error may result. + +Note: the new field name \type {head} was introduced in 0.65 to replace the old +name \type {list}. Use of the name \type {list} is now deprecated, but it will +stay available until at least version 0.80. + +\subsubsection{vlist nodes} + +Valid fields: As for hlist, except that \quote {shift} is a displacement +perpendicular to the line progression direction, and \quote {subtype} only has +subtypes~0, 4, and~5. 
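+
+To illustrate how the \type {glue_set}, \type {glue_sign} and \type {glue_order}
+fields of a list node relate to the \type {glue_spec} items described earlier,
+here is a rough \LUA\ sketch that computes the width a glue node takes up inside
+its parent list. The helper name \type {glue_width} is hypothetical and the code
+assumes the traditional \TEX\ rule: only glue whose order matches the order
+recorded in the parent stretches or shrinks. The result is not rounded. In
+practice the built|-|in \type {effective_glue} function is the better choice.
+
+\starttyping
+-- hypothetical helper; a sketch of the traditional TeX rule only
+local function glue_width(glue, parent)
+    local spec  = glue.spec       -- the glue_spec item of this glue node
+    local width = spec.width
+    if parent.glue_sign == 1 and spec.stretch_order == parent.glue_order then
+        -- the parent list is stretching
+        width = width + spec.stretch * parent.glue_set
+    elseif parent.glue_sign == 2 and spec.shrink_order == parent.glue_order then
+        -- the parent list is shrinking
+        width = width - spec.shrink * parent.glue_set
+    end
+    return width
+end
+\stoptyping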
+ +\subsubsection{rule nodes} + +Valid fields: \showfields{rule}\crlf +Id: \showid{rule} + +\starttabulate[|lT|l|p|] +\NC \ssbf field \NC \bf type \NC \bf explanation \NC \NR +\NC subtype \NC number \NC unused \NC \NR +\NC attr \NC \syntax{<node>} \NC \NC \NR +\NC width \NC number \NC the width of the rule; the special value + $-1073741824$ is used for \quote + {running} glue dimensions \NC \NR +\NC height \NC number \NC the height of the rule (can be + negative) \NC \NR +\NC depth \NC number \NC the depth of the rule (can be + negative) \NC \NR +\NC dir \NC string \NC the direction of this rule, + see~\in[dirnodes] \NC \NR +\stoptabulate + +\subsubsection{ins nodes} + +Valid fields: \showfields{ins}\crlf +Id: \showid{ins} + +\starttabulate[|lT|l|p|] +\NC \ssbf field \NC \bf type \NC \bf explanation \NC \NR +\NC subtype \NC number \NC the insertion class \NC \NR +\NC attr \NC \syntax{<node>} \NC \NC \NR +\NC cost \NC number \NC the penalty associated with this + insert \NC \NR +\NC height \NC number \NC \NC \NR +\NC depth \NC number \NC \NC \NR +\NC head/list \NC \syntax{<node>} \NC the first node of the body of this + insert \NC \NR +\NC spec \NC \syntax{<node>} \NC a pointer to the \type {\splittopskip} + glue spec \NC \NR +\stoptabulate + +A warning: never assign a node list to the \type {head} field unless you are sure +its internal link structure is correct, otherwise an error may result. You can use +\type {list} instead (often in functions you want to use local variables with similar +names and both names are equally sensible).
+ +\subsubsection{mark nodes} + +Valid fields: \showfields{mark}\crlf +Id: \showid{mark} + +\starttabulate[|lT|l|p|] +\NC \ssbf field \NC \bf type \NC \bf explanation \NC \NR +\NC subtype \NC number \NC unused \NC \NR +\NC attr \NC \syntax{<node>} \NC \NC \NR +\NC class \NC number \NC the mark class \NC \NR +\NC mark \NC table \NC a table representing a token list \NC \NR +\stoptabulate + +\subsubsection{adjust nodes} + +Valid fields: \showfields{adjust}\crlf +Id: \showid{adjust} + +\starttabulate[|lT|l|p|] +\NC \ssbf field \NC \bf type \NC \bf explanation \NC \NR +\NC subtype \NC number \NC \type {0} = normal, + \type {1} = \quote{pre} \NC \NR +\NC attr \NC \syntax{<node>} \NC \NC \NR +\NC head/list \NC \syntax{<node>} \NC adjusted material \NC \NR +\stoptabulate + +A warning: never assign a node list to the \type {head} field unless you are sure +its internal link structure is correct, otherwise an error may result. + +\subsubsection{disc nodes} + +Valid fields: \showfields{disc}\crlf +Id: \showid{disc} + +\starttabulate[|lT|l|p|] +\NC \ssbf field \NC \bf type \NC \bf explanation \NC \NR +\NC subtype \NC number \NC indicates the source of a discretionary: + \type {0} = the \type {\discretionary} command, + \type {1} = the \type {\-} command, + \type {2} = added automatically following a \type {-}, + \type {3} = added by the hyphenation algorithm (simple), + \type {4} = added by the hyphenation algorithm (hard, first item), + \type {5} = added by the hyphenation algorithm (hard, second item) \NC \NR +\NC attr \NC \syntax{<node>} \NC \NC \NR +\NC pre \NC \syntax{<node>} \NC pointer to the pre|-|break text \NC \NR +\NC post \NC \syntax{<node>} \NC pointer to the post|-|break text \NC \NR +\NC replace \NC \syntax{<node>} \NC pointer to the no|-|break text \NC \NR +\NC penalty \NC number \NC the penalty associated with the break, + normally \type {\hyphenpenalty} or \type + {\exhyphenpenalty} \NC \NR +\stoptabulate + +The subtype numbers~4 and~5 belong to the
\quote {of-f-ice} explanation given +elsewhere. + +Warning: never assign a node list to the \type {pre}, \type {post} or \type +{replace} field unless you are sure its internal link structure is correct, +otherwise an error may result. This limitation will disappear in the future. + +\subsubsection{math nodes} + +Valid fields: \showfields{math}\crlf +Id: \showid{math} + +\starttabulate[|lT|l|p|] +\NC \ssbf field \NC \bf type \NC \bf explanation \NC \NR +\NC subtype \NC number \NC \type {0} = on, + \type {1} = off \NC \NR +\NC attr \NC \syntax{<node>} \NC \NC \NR +\NC surround \NC number \NC width of the \type {\mathsurround} kern \NC \NR +\stoptabulate + +\subsubsection{glue nodes} + +Valid fields: \showfields{glue}\crlf +Id: \showid{glue} + +\starttabulate[|lT|l|p|] +\NC \ssbf field \NC \bf type \NC \bf explanation \NC \NR +\NC subtype \NC number \NC \type {0} = \type {\skip}, + \type {1-18} = internal glue parameters, + \type {100-103} = \quote {leader} subtypes \NC \NR +\NC attr \NC \syntax{<node>} \NC \NC \NR +\NC spec \NC \syntax{<node>} \NC pointer to a glue_spec item \NC \NR +\NC leader \NC \syntax{<node>} \NC pointer to a box or rule for leaders \NC \NR +\stoptabulate + +The exact meanings of the subtypes are as follows: + +\starttabulate[|rT|l|] +\NC 1 \NC \type {\lineskip} \NC \NR +\NC 2 \NC \type {\baselineskip} \NC \NR +\NC 3 \NC \type {\parskip} \NC \NR +\NC 4 \NC \type {\abovedisplayskip} \NC \NR +\NC 5 \NC \type {\belowdisplayskip} \NC \NR +\NC 6 \NC \type {\abovedisplayshortskip} \NC \NR +\NC 7 \NC \type {\belowdisplayshortskip} \NC \NR +\NC 8 \NC \type {\leftskip} \NC \NR +\NC 9 \NC \type {\rightskip} \NC \NR +\NC 10 \NC \type {\topskip} \NC \NR +\NC 11 \NC \type {\splittopskip} \NC \NR +\NC 12 \NC \type {\tabskip} \NC \NR +\NC 13 \NC \type {\spaceskip} \NC \NR +\NC 14 \NC \type {\xspaceskip} \NC \NR +\NC 15 \NC \type {\parfillskip} \NC \NR +\NC 16 \NC \type {\thinmuskip} \NC \NR +\NC 17 \NC \type {\medmuskip} \NC \NR +\NC 18 \NC \type
{\thickmuskip} \NC \NR +\NC 100 \NC \type {\leaders} \NC \NR +\NC 101 \NC \type {\cleaders} \NC \NR +\NC 102 \NC \type {\xleaders} \NC \NR +\NC 103 \NC \type {\gleaders} \NC \NR +\stoptabulate + +\subsubsection{kern nodes} + +Valid fields: \showfields{kern}\crlf +Id: \showid{kern} + +\starttabulate[|lT|l|p|] +\NC \ssbf field \NC \bf type \NC \bf explanation \NC \NR +\NC subtype \NC number \NC \type {0} = from font, + \type {1} = from \type {\kern} or \type {\/}, + \type {2} = from \type {\accent} \NC \NR +\NC attr \NC \syntax{<node>} \NC \NC \NR +\NC kern \NC number \NC \NC \NR +\stoptabulate + + +\subsubsection{penalty nodes} + +Valid fields: \showfields{penalty}\crlf +Id: \showid{penalty} + +\starttabulate[|lT|l|p|] +\NC \ssbf field \NC \bf type \NC \bf explanation \NC \NR +\NC subtype \NC number \NC not used \NC \NR +\NC attr \NC \syntax{<node>} \NC \NC \NR +\NC penalty \NC number \NC \NC \NR +\stoptabulate + +\subsubsection[glyphnodes]{glyph nodes} + +Valid fields: \showfields{glyph}\crlf +Id: \showid{glyph} + +\starttabulate[|lT|l|p|] +\NC \ssbf field \NC \ssbf type \NC \ssbf explanation \NC \NR +\NC subtype \NC number \NC bitfield \NC \NR +\NC attr \NC \syntax{<node>} \NC \NC \NR +\NC char \NC number \NC \NC \NR +\NC font \NC number \NC \NC \NR +\NC lang \NC number \NC \NC \NR +\NC left \NC number \NC \NC \NR +\NC right \NC number \NC \NC \NR +\NC uchyph \NC boolean \NC \NC \NR +\NC components \NC \syntax{<node>} \NC pointer to ligature components \NC \NR +\NC xoffset \NC number \NC \NC \NR +\NC yoffset \NC number \NC \NC \NR +\NC width \NC number \NC \NC \NR +\NC height \NC number \NC \NC \NR +\NC depth \NC number \NC \NC \NR +\NC expansion_factor \NC number \NC \NC \NR +\stoptabulate + +A warning: never assign a node list to the components field unless you are sure +its internal link structure is correct, otherwise an error may result.
Valid +bits for the \type {subtype} field are: + +\starttabulate[|c|l|] +\NC \ssbf bit \NC \bf meaning \NC \NR +\NC 0 \NC character \NC \NR +\NC 1 \NC ligature \NC \NR +\NC 2 \NC ghost \NC \NR +\NC 3 \NC left \NC \NR +\NC 4 \NC right \NC \NR +\stoptabulate + +See \in {section} [charsandglyphs] for a detailed description of the \type +{subtype} field. + +The \type {expansion_factor} has been introduced as part of the separation +between font- and backend. It is the result of extensive experiments with a more +efficient implementation of expansion. Early versions of \LUATEX\ already +replaced multiple instances of fonts in the backend by scaling but contrary to +\PDFTEX\ in \LUATEX\ we now also got rid of font copies in the frontend and +replaced them by expansion factors that travel with glyph nodes. Apart from a +cleaner approach this is also a step towards a better separation between front- +and backend. + +The \type {is_char} function checks if a node is a glyphnode with a subtype still +less than 256. This function can be used to determine if applying font logic to a +glyph node makes sense. + +\subsubsection{margin_kern nodes} + +Valid fields: \showfields{margin_kern}\crlf +Id: \showid{margin_kern} + +\starttabulate[|lT|l|p|] +\NC \ssbf field \NC \bf type \NC \bf explanation \NC \NR +\NC subtype \NC number \NC \type {0} = left side, + \type {1} = right side \NC \NR +\NC attr \NC \syntax{<node>} \NC \NC \NR +\NC width \NC number \NC \NC \NR +\NC glyph \NC \syntax{<node>} \NC \NC \NR +\stoptabulate + +\subsection{Math nodes} + +These are the so||called \quote {noad}s and the nodes that are specifically +associated with math processing. Most of these nodes contain subnodes so that the +list of possible fields is actually quite small. First, the subnodes: + +\subsubsection{Math kernel subnodes} + +Many object fields in math mode are either simple characters in a specific family +or math lists or node lists. 
There are four associated subnodes that represent +these cases (in the following node descriptions these are indicated by the word +\type {<kernel>}). + +The \type {next} and \type {prev} fields for these subnodes are unused. + +\subsubsubsection{math_char and math_text_char subnodes} + +Valid fields: \showfields{math_char}\crlf +Id: \showid{math_char} + +\starttabulate[|lT|l|p|] +\NC \ssbf field \NC \bf type \NC \bf explanation \NC \NR +\NC attr \NC \syntax{<node>} \NC \NC \NR +\NC char \NC number \NC \NC \NR +\NC fam \NC number \NC \NC \NR +\stoptabulate + +The \type {math_char} is the simplest subnode field: it contains the character +and family for a single glyph object. The \type {math_text_char} is a special +case that you will not normally encounter; it arises temporarily during math list +conversion (its sole function is to suppress a following italic correction). + +\subsubsubsection{sub_box and sub_mlist subnodes} + +Valid fields: \showfields{sub_box}\crlf +Id: \showid{sub_box} + +\starttabulate[|lT|l|p|] +\NC \ssbf field \NC \bf type \NC \bf explanation \NC \NR +\NC attr \NC \syntax{<node>}\NC \NC \NR +\NC head \NC \syntax{<node>}\NC \NC \NR +\stoptabulate + +These two subnode types are used for subsidiary list items. For \type {sub_box}, +the \type {head} points to a \quote {normal} vbox or hbox. For \type {sub_mlist}, +the \type {head} points to a math list that is yet to be converted. + +A warning: never assign a node list to the \type {head} field unless you are sure +its internal link structure is correct, otherwise an error may result. + +\subsubsection{Math delimiter subnode} + +There is a fifth subnode type that is used exclusively for delimiter fields. As +before, the \type {next} and \type {prev} fields are unused.
+ +\subsubsubsection{delim subnodes} + +Valid fields: \showfields{delim}\crlf +Id: \showid{delim} + +\starttabulate[|lT|l|p|] +\NC \ssbf field \NC \bf type \NC \bf explanation \NC \NR +\NC attr \NC \syntax{<node>} \NC \NC \NR +\NC small_char \NC number \NC \NC \NR +\NC small_fam \NC number \NC \NC \NR +\NC large_char \NC number \NC \NC \NR +\NC large_fam \NC number \NC \NC \NR +\stoptabulate + +The fields \type {large_char} and \type {large_fam} can be zero; in that case the +font that is used for the \type {small_fam} is expected to provide the large +version as an extension to the \type {small_char}. + +\subsubsection{Math core nodes} + +First, there are the objects (the \TEX book calls them \quote {atoms}) that are +associated with the simple math objects: Ord, Op, Bin, Rel, Open, Close, Punct, +Inner, Over, Under, Vcent. These all have the same fields, and they are combined +into a single node type with separate subtypes for differentiation. + +\subsubsubsection{simple nodes} + +Valid fields: \showfields{noad}\crlf +Id: \showid{noad} + +\starttabulate[|lT|l|p|] +\NC \ssbf field \NC \bf type \NC \bf explanation \NC \NR +\NC subtype \NC number \NC see below \NC \NR +\NC attr \NC \syntax{<node>} \NC \NC \NR +\NC nucleus \NC \syntax{<kernel>} \NC \NC \NR +\NC sub \NC \syntax{<kernel>} \NC \NC \NR +\NC sup \NC \syntax{<kernel>} \NC \NC \NR +\stoptabulate + +Operators are a bit special because they occupy three \type {subtype} values.
+ +\starttabulate[|lT|p|] +\NC \ssbf number \NC \bf node subtype \NC \NR +\NC 0 \NC Ord \NC \NR +\NC 1 \NC Op: \type {\displaylimits} \NC \NR +\NC 2 \NC Op: \type {\limits} \NC \NR +\NC 3 \NC Op: \type {\nolimits} \NC \NR +\NC 4 \NC Bin \NC \NR +\NC 5 \NC Rel \NC \NR +\NC 6 \NC Open \NC \NR +\NC 7 \NC Close \NC \NR +\NC 8 \NC Punct \NC \NR +\NC 9 \NC Inner \NC \NR +\NC 10 \NC Under \NC \NR +\NC 11 \NC Over \NC \NR +\NC 12 \NC Vcent \NC \NR +\stoptabulate + +\subsubsubsection{accent nodes} + +Valid fields: \showfields{accent}\crlf +Id: \showid{accent} + +\starttabulate[|lT|l|p|] +\NC \ssbf field \NC \bf type \NC \bf explanation \NC \NR +\NC subtype \NC number \NC the first bit is used for a fixed top + accent flag (if the \type {accent} + field is present), the second bit for a + fixed bottom accent flag (if the \type + {bot_accent} field is present); example: + the actual value \type {3} means: do + not stretch either accent \NC \NR +\NC attr \NC \syntax{<node>} \NC \NC \NR +\NC nucleus \NC \syntax{<kernel>} \NC \NC \NR +\NC sub \NC \syntax{<kernel>} \NC \NC \NR +\NC sup \NC \syntax{<kernel>} \NC \NC \NR +\NC accent \NC \syntax{<kernel>} \NC \NC \NR +\NC bot_accent \NC \syntax{<kernel>} \NC \NC \NR +\stoptabulate + +\subsubsubsection{style nodes} + +Valid fields: \showfields{style}\crlf Id: \showid{style} + +\starttabulate[|lT|l|p|] +\NC \ssbf field \NC \bf type \NC \bf explanation \NC \NR +\NC style \NC string \NC contains the style \NC \NR +\stoptabulate + +There are eight possibilities for the string value: one of \quote {display}, +\quote {text}, \quote {script}, or \quote {scriptscript}. Each of these can have +a trailing \type {'} to signify \quote {cramped} styles. 
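+
+The eight style names follow mechanically from the rule above. A minimal sketch
+in plain \LUA\ (the helper name \type {style_names} is made up for this
+illustration and is not part of any \LUATEX\ library):
+
+\starttyping
+-- hypothetical helper: the four base styles, each with a cramped variant
+local function style_names()
+    local base   = { "display", "text", "script", "scriptscript" }
+    local result = { }
+    for _, name in ipairs(base) do
+        result[#result+1] = name         -- normal style
+        result[#result+1] = name .. "'"  -- cramped style
+    end
+    return result
+end
+
+for _, name in ipairs(style_names()) do
+    print(name)
+end
+\stoptyping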
+ +\subsubsubsection{choice nodes} + +Valid fields: \showfields{choice}\crlf Id: \showid{choice} + +\starttabulate[|lT|l|p|] +\NC \ssbf field \NC \bf type \NC \bf explanation \NC \NR +\NC attr \NC \syntax{<node>} \NC \NC \NR +\NC display \NC \syntax{<node>} \NC \NC \NR +\NC text \NC \syntax{<node>} \NC \NC \NR +\NC script \NC \syntax{<node>} \NC \NC \NR +\NC scriptscript \NC \syntax{<node>} \NC \NC \NR +\stoptabulate + +A warning: never assign a node list to the display, text, script, or +scriptscript field unless you are sure its internal link structure is +correct, otherwise an error may result. + +\subsubsubsection{radical nodes} + +Valid fields: \showfields{radical}\crlf Id: \showid{radical} + +\starttabulate[|lT|l|p|] +\NC \ssbf field \NC \bf type \NC \bf explanation \NC \NR +\NC attr \NC \syntax{<node>} \NC \NC \NR +\NC nucleus \NC \syntax{<kernel>} \NC \NC \NR +\NC sub \NC \syntax{<kernel>} \NC \NC \NR +\NC sup \NC \syntax{<kernel>} \NC \NC \NR +\NC left \NC \syntax{<delim>} \NC \NC \NR +\NC degree \NC \syntax{<kernel>} \NC + Only set by \type {\Uroot} +\NC \NR +\stoptabulate + +A warning: never assign a node list to the nucleus, sub, sup, left, or degree +field unless you are sure its internal link structure is correct, otherwise an +error may result.
+ +The radical noad is also used for under- and overdelimiters, which is indicated +by the subtypes: + +\starttabulate[|lT|l|] +\NC 0 \NC \type {\radical} \NC \NR +\NC 1 \NC \type {\Uradical} \NC \NR +\NC 2 \NC \type {\Uroot} \NC \NR +\NC 3 \NC \type {\Uunderdelimiter} \NC \NR +\NC 4 \NC \type {\Uoverdelimiter} \NC \NR +\NC 5 \NC \type {\Udelimiterunder} \NC \NR +\NC 6 \NC \type {\Udelimiterover} \NC \NR +\stoptabulate + +\subsubsubsection{fraction nodes} + +Valid fields: \showfields{fraction}\crlf +Id: \showid{fraction} + +\starttabulate[|lT|l|p|] +\NC \ssbf field \NC \bf type \NC \bf explanation \NC \NR +\NC attr \NC \syntax{<node>} \NC \NC \NR +\NC width \NC number \NC \NC \NR +\NC num \NC \syntax{<kernel>} \NC \NC \NR +\NC denom \NC \syntax{<kernel>} \NC \NC \NR +\NC left \NC \syntax{<delim>} \NC \NC \NR +\NC right \NC \syntax{<delim>} \NC \NC \NR +\stoptabulate + +A warning: never assign a node list to the num or denom field unless you are +sure its internal link structure is correct, otherwise an error may result. + +\subsubsubsection{fence nodes} + +Valid fields: \showfields{fence}\crlf Id: \showid{fence} + +\starttabulate[|lT|l|p|] +\NC \ssbf field \NC \bf type \NC \bf explanation \NC \NR +\NC subtype \NC number \NC + \type {1} = \type {\left}, + \type {2} = \type {\middle}, + \type {3} = \type {\right} +\NC \NR +\NC attr \NC \syntax{<node>} \NC \NC \NR +\NC delim \NC \syntax{<delim>} \NC \NC \NR +\stoptabulate + +\subsection{whatsit nodes} + +Whatsit nodes come in many subtypes that you can ask for by running +\type {node.whatsits()}: +\startluacode + for id, name in table.sortedpairs(node.whatsits()) do + context.type(name) + context(" (%s), ",id) + end + context.removeunwantedspaces() + context.removepunctuation() +\stopluacode +.
% period + +\subsubsection{open nodes} + +Valid fields: \showfields{whatsit,open}\crlf +Id: \showid{whatsit,open} + +\starttabulate[|lT|l|p|] +\NC \ssbf field \NC \bf type \NC \bf explanation \NC \NR +\NC attr \NC \syntax{<node>} \NC \NC \NR +\NC stream \NC number \NC \TEX's stream id number \NC \NR +\NC name \NC string \NC file name \NC \NR +\NC ext \NC string \NC file extension \NC \NR +\NC area \NC string \NC file area (this may become obsolete) \NC \NR +\stoptabulate + +\subsubsection{write nodes} + +Valid fields: \showfields{whatsit,write}\crlf +Id: \showid{whatsit,write} + +\starttabulate[|lT|l|p|] +\NC \ssbf field \NC \bf type \NC \bf explanation \NC \NR +\NC attr \NC \syntax{<node>} \NC \NC \NR +\NC stream \NC number \NC \TEX's stream id number \NC \NR +\NC data \NC table \NC a table representing the token list + to be written \NC \NR +\stoptabulate + +\subsubsection{close nodes} + +Valid fields: \showfields{whatsit,close}\crlf +Id: \showid{whatsit,close} + +\starttabulate[|lT|l|p|] +\NC \ssbf field \NC \bf type \NC \bf explanation \NC \NR +\NC attr \NC \syntax{<node>} \NC \NC \NR +\NC stream \NC number \NC \TEX's stream id number \NC \NR +\stoptabulate + +\subsubsection{special nodes} + +Valid fields: \showfields{whatsit,special}\crlf +Id: \showid{whatsit,special} + +\starttabulate[|lT|l|p|] +\NC \ssbf field \NC \bf type \NC \bf explanation \NC \NR +\NC attr \NC \syntax{<node>} \NC \NC \NR +\NC data \NC string \NC the \type {\special} information \NC \NR +\stoptabulate + +\subsubsection{language nodes} + +\LUATEX\ does not have language whatsits any more. All language information is +already present inside the glyph nodes themselves. This whatsit subtype will be +removed in the next release. 
+ +\subsubsection{local_par nodes} + +Valid fields: \showfields{whatsit,local_par}\crlf +Id: \showid{whatsit,local_par} + +\starttabulate[|lT|l|p|] +\NC \ssbf field \NC \bf type \NC \bf explanation \NC \NR +\NC attr \NC \syntax{<node>} \NC \NC \NR +\NC pen_inter \NC number \NC local interline penalty (from \type + {\localinterlinepenalty}) \NC \NR +\NC pen_broken \NC number \NC local broken penalty (from \type + {\localbrokenpenalty}) \NC \NR +\NC dir \NC string \NC the direction of this par; see~\in + [dirnodes] \NC \NR +\NC box_left \NC \syntax{<node>} \NC the \type {\localleftbox} \NC \NR +\NC box_left_width \NC number \NC width of the \type {\localleftbox} \NC \NR +\NC box_right \NC \syntax{<node>} \NC the \type {\localrightbox} +\NC \NR +\NC box_right_width \NC number \NC width of the \type {\localrightbox} \NC \NR +\stoptabulate + +A warning: never assign a node list to the \type {box_left} or \type {box_right} +field unless you are sure its internal link structure is correct, otherwise an +error may result. + +\subsubsection[dirnodes]{dir nodes} + +Valid fields: \showfields{whatsit,dir}\crlf +Id: \showid{whatsit,dir} + +\starttabulate[|lT|l|p|] +\NC \ssbf field \NC \bf type \NC \bf explanation \NC \NR +\NC attr \NC \syntax{<node>} \NC \NC \NR +\NC dir \NC string \NC the direction (but see below) \NC \NR +\NC level \NC number \NC nesting level of this direction whatsit \NC \NR +\NC dvi_ptr \NC number \NC a saved dvi buffer byte offset \NC \NR +\NC dir_h \NC number \NC a saved dvi position \NC \NR +\stoptabulate + +A note on \type {dir} strings. Direction specifiers are three|-|letter +combinations of \type {T}, \type {B}, \type {R}, and \type {L}. + +These are built up out of three separate items: + +\startitemize[packed] +\startitem + the first is the direction of the \quote{top} of paragraphs. +\stopitem +\startitem + the second is the direction of the \quote{start} of lines.
+\stopitem +\startitem + the third is the direction of the \quote{top} of glyphs. +\stopitem +\stopitemize + +However, only four combinations are accepted: \type {TLT}, \type {TRT}, \type +{RTT}, and \type {LTL}. + +Inside actual \type {dir} whatsit nodes, the representation of \type {dir} is not +a three-letter but a four|-|letter combination. The first character in this case +is always either \type {+} or \type {-}, indicating whether the value is pushed +or popped from the direction stack. + +\subsubsection{pdf_literal nodes} + +Valid fields: \showfields{whatsit,pdf_literal}\crlf +Id: \showid{whatsit,pdf_literal} + +\starttabulate[|lT|l|p|] +\NC \ssbf field \NC \bf type \NC \bf explanation \NC \NR +\NC attr \NC \syntax{<node>} \NC \NC \NR +\NC mode \NC number \NC the \quote {mode} setting of this + literal \NC \NR +\NC data \NC string \NC the \type {\pdfliteral} information \NC \NR +\stoptabulate + +Mode values: + +\starttabulate[|lT|p|] +\NC \ssbf value \NC \ssbf corresponding \type {\pdftex} keyword \NC \NR +\NC 0 \NC setorigin \NC \NR +\NC 1 \NC page \NC \NR +\NC 2 \NC direct \NC \NR +\stoptabulate + +\subsubsection{pdf_refobj nodes} + +Valid fields: \showfields{whatsit,pdf_refobj}\crlf +Id: \showid{whatsit,pdf_refobj} + +\starttabulate[|lT|l|p|] +\NC \ssbf field \NC \bf type \NC \bf explanation \NC \NR +\NC attr \NC \syntax{<node>} \NC \NC \NR +\NC objnum \NC number \NC the referenced \PDF\ object number \NC \NR +\stoptabulate + +\subsubsection{pdf_refxform nodes} + +Valid fields: \showfields{whatsit,pdf_refxform}\crlf +Id: \showid{whatsit,pdf_refxform}. + +\starttabulate[|lT|l|p|] +\NC \ssbf field \NC \bf type \NC \bf explanation \NC \NR +\NC attr \NC \syntax{<node>} \NC \NC \NR +\NC width \NC number \NC \NC \NR +\NC height \NC number \NC \NC \NR +\NC depth \NC number \NC \NC \NR +\NC objnum \NC number \NC the referenced \PDF\ object number \NC \NR +\stoptabulate + +Be aware that \type {pdf_refxform} nodes have dimensions that are used by \LUATEX. 
+ +\subsubsection{pdf_refximage nodes} + +Valid fields: \showfields{whatsit,pdf_refximage}\crlf +Id: \showid{whatsit,pdf_refximage} + +\starttabulate[|lT|l|p|] +\NC \ssbf field \NC \bf type \NC \bf explanation \NC \NR +\NC attr \NC \syntax{<node>} \NC \NC \NR +\NC width \NC number \NC \NC \NR +\NC height \NC number \NC \NC \NR +\NC depth \NC number \NC \NC \NR +\NC objnum \NC number \NC the referenced \PDF\ object number \NC \NR +\stoptabulate + +Be aware that \type {pdf_refximage} nodes have dimensions that are used by +\LUATEX. + +\subsubsection{pdf_annot nodes} + +Valid fields: \showfields{whatsit,pdf_annot}\crlf +Id: \showid{whatsit,pdf_annot} + +\starttabulate[|lT|l|p|] +\NC \ssbf field \NC \bf type \NC \bf explanation \NC \NR +\NC attr \NC \syntax{<node>} \NC \NC \NR +\NC width \NC number \NC \NC \NR +\NC height \NC number \NC \NC \NR +\NC depth \NC number \NC \NC \NR +\NC objnum \NC number \NC the referenced \PDF\ object number \NC \NR +\NC data \NC string \NC the annotation data \NC \NR +\stoptabulate + +\subsubsection{pdf_start_link nodes} + +Valid fields: \showfields{whatsit,pdf_start_link}\crlf +Id: \showid{whatsit,pdf_start_link} + +\starttabulate[|lT|l|p|] +\NC \ssbf field \NC \bf type \NC \bf explanation \NC \NR +\NC attr \NC \syntax{<node>} \NC \NC \NR +\NC width \NC number \NC \NC \NR +\NC height \NC number \NC \NC \NR +\NC depth \NC number \NC \NC \NR +\NC objnum \NC number \NC the referenced \PDF\ object number \NC \NR +\NC link_attr \NC table \NC the link attribute token list \NC \NR +\NC action \NC \syntax{<node>} \NC the action to perform \NC \NR +\stoptabulate + +\subsubsection{pdf_end_link nodes} + +Valid fields: \showfields{whatsit,pdf_end_link}\crlf +Id: \showid{whatsit,pdf_end_link} + +\starttabulate[|lT|l|p|] +\NC \ssbf field \NC \bf type \NC \bf explanation \NC \NR +\NC attr \NC \syntax{<node>} \NC \NC \NR +\stoptabulate + +\subsubsection{pdf_dest nodes} + +Valid fields: \showfields{whatsit,pdf_dest}\crlf +Id: \showid{whatsit,pdf_dest} + 
+\starttabulate[|lT|l|p|] +\NC \ssbf field \NC \bf type \NC \bf explanation \NC \NR +\NC attr \NC \syntax{<node>} \NC \NC \NR +\NC width \NC number \NC \NC \NR +\NC height \NC number \NC \NC \NR +\NC depth \NC number \NC \NC \NR +\NC named_id \NC number \NC is the dest_id a string value? \NC \NR +\NC dest_id \NC number \NC the destination id \NC \NR +\NC \NC string \NC the destination name \NC \NR +\NC dest_type \NC number \NC type of destination \NC \NR +\NC xyz_zoom \NC number \NC \NC \NR +\NC objnum \NC number \NC the \PDF\ object number \NC \NR +\stoptabulate + +\subsubsection{pdf_thread nodes} + +Valid fields: \showfields{whatsit,pdf_thread}\crlf +Id: \showid{whatsit,pdf_thread} + +\starttabulate[|lT|l|p|] +\NC \ssbf field \NC \bf type \NC \bf explanation \NC \NR +\NC attr \NC \syntax{<node>} \NC \NC \NR +\NC width \NC number \NC \NC \NR +\NC height \NC number \NC \NC \NR +\NC depth \NC number \NC \NC \NR +\NC named_id \NC number \NC is the tread_id a string value? \NC \NR +\NC tread_id \NC number \NC the thread id \NC \NR +\NC \NC string \NC the thread name \NC \NR +\NC thread_attr \NC number \NC extra thread information \NC \NR +\stoptabulate + +\subsubsection{pdf_start_thread nodes} + +Valid fields: \showfields{whatsit,pdf_start_thread}\crlf +Id: \showid{whatsit,pdf_start_thread} + +\starttabulate[|lT|l|p|] +\NC \ssbf field \NC \bf type \NC \bf explanation \NC \NR +\NC attr \NC \syntax{<node>} \NC \NC \NR +\NC width \NC number \NC \NC \NR +\NC height \NC number \NC \NC \NR +\NC depth \NC number \NC \NC \NR +\NC named_id \NC number \NC is the tread_id a string value? 
\NC \NR +\NC tread_id \NC number \NC the thread id \NC \NR +\NC \NC string \NC the thread name \NC \NR +\NC thread_attr \NC number \NC extra thread information \NC \NR +\stoptabulate + +\subsubsection{pdf_end_thread nodes} + +Valid fields: \showfields{whatsit,pdf_end_thread}\crlf +Id: \showid{whatsit,pdf_end_thread} + +\starttabulate[|lT|l|p|] +\NC \ssbf field \NC \bf type \NC \bf explanation \NC \NR +\NC attr \NC \syntax{<node>} \NC \NC \NR +\stoptabulate + +\subsubsection{save_pos nodes} + +Valid fields: \showfields{whatsit,save_pos}\crlf +Id: \showid{whatsit,save_pos} + +\starttabulate[|lT|l|p|] +\NC \ssbf field \NC \bf type \NC \bf explanation \NC \NR +\NC attr \NC \syntax{<node>} \NC \NC \NR +\stoptabulate + +\subsubsection{late_lua nodes} + +Valid fields: \showfields{whatsit,late_lua}\crlf +Id: \showid{whatsit,late_lua} + +\starttabulate[|lT|l|p|] +\NC \ssbf field \NC \bf type \NC \bf explanation \NC \NR +\NC attr \NC \syntax{<node>} \NC \NC \NR +\NC data \NC string \NC data to execute \NC \NR +\NC string \NC string \NC data to execute \NC \NR +\NC name \NC string \NC the name to use for lua error reporting \NC \NR +\stoptabulate + +The difference between \type {data} and \type {string} is that on assignment, the +\type {data} field is converted to a token list, cf. use as \type {\latelua}. The +\type {string} version is treated as a literal string. 
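+
+A minimal sketch of the two assignments, assuming it runs inside \LUATEX\ and
+that the new node is later inserted into a list that gets shipped out (the
+\LUA\ snippet stored in the node and the \type {name} value are arbitrary
+examples):
+
+\starttyping
+local n = node.new("whatsit", "late_lua")
+
+-- assigned via 'data': converted to a token list, as with \latelua
+n.data = "texio.write_nl('page shipped out')"
+
+-- alternatively, assigned via 'string': kept as a literal string
+-- n.string = "texio.write_nl('page shipped out')"
+
+n.name = "demo"  -- hypothetical name, used for lua error reporting
+\stoptyping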
+ +\subsubsection{pdf_colorstack nodes} + +Valid fields: \showfields{whatsit,pdf_colorstack}\crlf +Id: \showid{whatsit,pdf_colorstack} + +\starttabulate[|lT|l|p|] +\NC \ssbf field \NC \bf type \NC \bf explanation \NC \NR +\NC attr \NC \syntax{<node>} \NC \NC \NR +\NC stack \NC number \NC colorstack id number \NC \NR +\NC command \NC number \NC command to execute \NC \NR +\NC data \NC string \NC data \NC \NR +\stoptabulate + +\subsubsection{pdf_setmatrix nodes} + +Valid fields: \showfields{whatsit,pdf_setmatrix}\crlf +Id: \showid{whatsit,pdf_setmatrix} + +\starttabulate[|lT|l|p|] +\NC \ssbf field \NC \bf type \NC \bf explanation \NC \NR +\NC attr \NC \syntax{<node>} \NC \NC \NR +\NC data \NC string \NC data \NC \NR +\stoptabulate + +\subsubsection{pdf_save nodes} + +Valid fields: \showfields{whatsit,pdf_save}\crlf +Id: \showid{whatsit,pdf_save} + +\starttabulate[|lT|l|p|] +\NC \ssbf field \NC \bf type \NC \bf explanation \NC \NR +\NC attr \NC \syntax{<node>} \NC \NC \NR +\stoptabulate + +\subsubsection{pdf_restore nodes} + +Valid fields: \showfields{whatsit,pdf_restore}\crlf +Id: \showid{whatsit,pdf_restore} + +\starttabulate[|lT|l|p|] +\NC \ssbf field \NC \bf type \NC \bf explanation \NC \NR +\NC attr \NC \syntax{<node>} \NC \NC \NR +\stoptabulate + +\subsubsection{user_defined nodes} + +User|-|defined whatsit nodes can only be created and handled from \LUA\ code. In +effect, they are an extension to the extension mechanism. The \LUATEX\ engine +will simply step over such whatsits without ever looking at the contents. 
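+
+A minimal sketch of creating such a whatsit, assuming it runs inside \LUATEX;
+the \type {user_id} value \type {4711} is an arbitrary number chosen for this
+example, and \type {100} is the type code for a number value:
+
+\starttyping
+local n = node.new("whatsit", "user_defined")
+n.user_id = 4711  -- arbitrary id, used to recognize our own whatsits later
+n.type    = 100   -- the value is a number
+n.value   = 42
+\stoptyping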
+ +Valid fields: \showfields{whatsit,user_defined}\crlf +Id: \showid{whatsit,user_defined} + +\starttabulate[|lT|l|p|] +\NC \ssbf field \NC \bf type \NC \bf explanation \NC \NR +\NC attr \NC \syntax{<node>} \NC \NC \NR +\NC user_id \NC number \NC id number \NC \NR +\NC type \NC number \NC type of the value \NC \NR +\NC value \NC number \NC \NC \NR +\NC \NC string \NC \NC \NR +\NC \NC \syntax{<node>} \NC \NC \NR +\NC \NC table \NC \NC \NR +\stoptabulate + +The \type {type} can have one of five distinct values: + +\starttabulate[|lT|p|] +\NC \ssbf value \NC \bf explanation \NC \NR +\NC 97 \NC the value is an attribute node list \NC \NR +\NC 100 \NC the value is a number \NC \NR +\NC 110 \NC the value is a node list \NC \NR +\NC 115 \NC the value is a string \NC \NR +\NC 116 \NC the value is a token list in \LUA\ table form \NC \NR +\stoptabulate + +\section{Two access models} + +After doing lots of tests with \LUATEX\ and \LUAJITTEX\, with and without just in +time compilation enabled, and with and without using ffi, we came to the +conclusion that userdata prevents a speedup. We also found that the checking of +metatables as well as assignment comes with overhead that can't be neglected. +This is normally not really a problem but when processing fonts for more complex +scripts it could have quite some overhead. + +Because the userdata approach has some benefits, this remains the recommended way +to access nodes. We did several experiments with faster access using this model, +but eventually settled for the \quote {direct} approach. For code that is proven +to be okay, one can use this access model that operates on nodes more directly. + +Deep down in \TEX\ a node has a number which is an entry in a memory table. In +fact, this model, where \TEX\ manages memory is real fast and one of the reasons +why plugging in callbacks that operate on nodes is quite fast. 
No matter what +future memory model \LUATEX\ has, an internal reference will always be a simple +data type (like a number or light userdata in \LUA\ speak). So, if you use the +direct model, even if you know that you currently deal with numbers, you should +not depend on that property but treat it as an abstraction just like traditional +nodes. Using a simple basic datatype has the penalty that +less checking can be done, but less checking is also the reason why it's somewhat +faster. An important aspect is that one cannot mix both methods, although you can +cast a node from one model to the other. + +So our advice is: use the indexed approach when possible and investigate the +direct one when speed might be an issue. For that reason we also provide the +\type {get*} and \type {set*} functions in the top level node namespace. There is +a limited set of getters. When implementing this direct approach the regular +index by key variant was also optimized, so direct access only makes sense when +we're accessing nodes millions of times (which happens in some font processing +for instance). + +We're talking mostly about getters because setters are less important. Documents +do not have that many content related nodes and setting many thousands of properties +is hardly a burden contrary to millions of consultations. + +Normally you will access nodes like this: + +\starttyping +local next = current.next +if next then + -- do something +end +\stoptyping + +Here \type {next} is not a real field, but a virtual one. Accessing it results in +a metatable method being called. In practice it boils down to looking up the node +type and based on the node type checking for the field name. In a worst case you +have a node type that sits at the end of the lookup list and a field that is last +in the lookup chain. However, in successive versions of \LUATEX\ these lookups +have been optimized and the most frequently accessed nodes and fields have a +higher priority.
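+
+To put the lookup in context, here is a minimal sketch of a loop that counts
+the glyph nodes in a list, first with indexed access and then with the direct
+model. The function names \type {count_glyphs} and \type {count_glyphs_direct}
+are made up for this illustration; \type {head} is assumed to be a userdata
+node, for instance one passed to a callback.
+
+\starttyping
+local glyph_id = node.id("glyph")
+
+-- indexed access: every field lookup goes through the metatable
+local function count_glyphs(head)
+    local n, current = 0, head
+    while current do
+        if current.id == glyph_id then
+            n = n + 1
+        end
+        current = current.next
+    end
+    return n
+end
+
+-- direct access: cast once, then only use node.direct functions
+local direct = node.direct
+
+local function count_glyphs_direct(head)
+    local n, current = 0, direct.todirect(head)
+    while current do
+        if direct.getid(current) == glyph_id then
+            n = n + 1
+        end
+        current = direct.getnext(current)
+    end
+    return n
+end
+\stoptyping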
+ +Because in practice the \type {next} accessor results in a function call, there +is some overhead involved. Calling \type {node.next} explicitly does the same as the +\type {current.next} lookup and performs a tiny bit +faster (but not that much because it is still a function call but one that knows +what to look up). + +\starttyping +local next = node.next(current) +if next then + -- do something +end +\stoptyping + +There are several such function based accessors now: + +\starttabulate[|T|p|] +\NC getnext \NC parsing a node list always involves this one \NC \NR +\NC getprev \NC used less but a logical companion to getnext \NC \NR +\NC getboth \NC returns the next and prev pointer of a node \NC \NR +\NC getid \NC consulted a lot \NC \NR +\NC getsubtype \NC consulted less but also a topper \NC \NR +\NC getfont \NC used a lot in otf handling (glyph nodes are consulted a lot) \NC \NR +\NC getchar \NC idem and also in other places \NC \NR +\NC getdisc \NC returns the \type {pre}, \type {post} and \type {replace} fields \NC \NR +\NC getlist \NC we often parse nested lists so this is a convenient one too + (only works for hlist and vlist!)
\NC \NR +\NC getleader \NC comparable to list, seldom used in \TEX\ (but needs frequent consulting + like lists; leaders could have been made a dedicated node type) \NC \NR +\NC getfield \NC generic getter, sufficient for the rest (other field names are + often shared so a specific getter makes no sense then) \NC \NR +\stoptabulate + +Some have setter counterparts: + +\starttabulate[|T|p|] +\NC setnext \NC assigns a value to the next field \NC \NR +\NC setprev \NC assigns a value to the prev field \NC \NR +\NC setboth \NC assigns a value to the prev and next field \NC \NR +\NC setlink \NC links two nodes \NC \NR +\NC setchar \NC sets the character field \NC \NR +\NC setdisc \NC sets the \type {pre}, \type {post} and \type {replace} fields and optionally the + \type {subtype} and \type {penalty} fields \NC \NR +\stoptabulate + +It doesn't make sense to add more. Profiling demonstrated that these fields are +accessed far more often than other fields. Even in complex documents, many +node and field types never get seen, or seen only a few times. Most functions in +the \type {node} namespace have a companion in \type {node.direct}, but of course +not the ones that don't deal with nodes themselves.
The following table +summarizes this: + +\start \def\yes{$+$} \def\nop{$-$} + +\starttabulate[|T|c|c|] +\HL +\NC \bf function \NC \bf node \NC \bf direct \NC \NR +\HL +\NC \type {copy_list} \NC \yes \NC \yes \NC \NR +\NC \type {copy} \NC \yes \NC \yes \NC \NR +\NC \type {count} \NC \yes \NC \yes \NC \NR +\NC \type {current_attr} \NC \yes \NC \yes \NC \NR +\NC \type {dimensions} \NC \yes \NC \yes \NC \NR +\NC \type {do_ligature_n} \NC \yes \NC \yes \NC \NR +\NC \type {effective_glue} \NC \yes \NC \yes \NC \NR +\NC \type {end_of_math} \NC \yes \NC \yes \NC \NR +\NC \type {family_font} \NC \yes \NC \nop \NC \NR +\NC \type {fields} \NC \yes \NC \nop \NC \NR +\NC \type {first_character} \NC \yes \NC \nop \NC \NR +\NC \type {first_glyph} \NC \yes \NC \yes \NC \NR +\NC \type {flush_list} \NC \yes \NC \yes \NC \NR +\NC \type {flush_node} \NC \yes \NC \yes \NC \NR +\NC \type {free} \NC \yes \NC \yes \NC \NR +\NC \type {getboth} \NC \yes \NC \yes \NC \NR +\NC \type {getbox} \NC \nop \NC \yes \NC \NR +\NC \type {getchar} \NC \yes \NC \yes \NC \NR +\NC \type {getdisc} \NC \yes \NC \yes \NC \NR +\NC \type {getfield} \NC \yes \NC \yes \NC \NR +\NC \type {getfont} \NC \yes \NC \yes \NC \NR +\NC \type {getid} \NC \yes \NC \yes \NC \NR +\NC \type {getleader} \NC \yes \NC \yes \NC \NR +\NC \type {getlist} \NC \yes \NC \yes \NC \NR +\NC \type {getnext} \NC \yes \NC \yes \NC \NR +\NC \type {getprev} \NC \yes \NC \yes \NC \NR +\NC \type {getsubtype} \NC \yes \NC \yes \NC \NR +\NC \type {has_attribute} \NC \yes \NC \yes \NC \NR +\NC \type {has_field} \NC \yes \NC \yes \NC \NR +\NC \type {has_glyph} \NC \yes \NC \yes \NC \NR +\NC \type {hpack} \NC \yes \NC \yes \NC \NR +\NC \type {id} \NC \yes \NC \nop \NC \NR +\NC \type {insert_after} \NC \yes \NC \yes \NC \NR +\NC \type {insert_before} \NC \yes \NC \yes \NC \NR +\NC \type {is_char} \NC \yes \NC \yes \NC \NR +\NC \type {is_direct} \NC \nop \NC \yes \NC \NR +\NC \type {is_node} \NC \yes \NC \yes \NC \NR +\NC \type {kerning} \NC \yes \NC
\nop \NC \NR +\NC \type {last_node} \NC \yes \NC \yes \NC \NR +\NC \type {length} \NC \yes \NC \yes \NC \NR +\NC \type {ligaturing} \NC \yes \NC \nop \NC \NR +\NC \type {mlist_to_hlist} \NC \yes \NC \nop \NC \NR +\NC \type {new} \NC \yes \NC \yes \NC \NR +\NC \type {next} \NC \yes \NC \nop \NC \NR +\NC \type {prev} \NC \yes \NC \nop \NC \NR +\NC \type {protect_glyphs} \NC \yes \NC \yes \NC \NR +\NC \type {protrusion_skippable} \NC \yes \NC \yes \NC \NR +\NC \type {remove} \NC \yes \NC \yes \NC \NR +\NC \type {set_attribute} \NC \yes \NC \yes \NC \NR +\NC \type {setboth} \NC \yes \NC \yes \NC \NR +\NC \type {setbox} \NC \yes \NC \yes \NC \NR +\NC \type {setchar} \NC \yes \NC \yes \NC \NR +\NC \type {setdisc} \NC \yes \NC \yes \NC \NR +\NC \type {setfield} \NC \yes \NC \yes \NC \NR +\NC \type {setlink} \NC \yes \NC \yes \NC \NR +\NC \type {setnext} \NC \yes \NC \yes \NC \NR +\NC \type {setprev} \NC \yes \NC \yes \NC \NR +\NC \type {slide} \NC \yes \NC \yes \NC \NR +\NC \type {subtype} \NC \yes \NC \nop \NC \NR +\NC \type {tail} \NC \yes \NC \yes \NC \NR +\NC \type {todirect} \NC \yes \NC \yes \NC \NR +\NC \type {tonode} \NC \yes \NC \yes \NC \NR +\NC \type {tostring} \NC \yes \NC \yes \NC \NR +\NC \type {traverse_id} \NC \yes \NC \yes \NC \NR +\NC \type {traverse} \NC \yes \NC \yes \NC \NR +\NC \type {types} \NC \yes \NC \nop \NC \NR +\NC \type {type} \NC \yes \NC \nop \NC \NR +\NC \type {unprotect_glyphs} \NC \yes \NC \yes \NC \NR +\NC \type {unset_attribute} \NC \yes \NC \yes \NC \NR +\NC \type {usedlist} \NC \yes \NC \yes \NC \NR +\NC \type {vpack} \NC \yes \NC \yes \NC \NR +\NC \type {whatsits} \NC \yes \NC \nop \NC \NR +\NC \type {write} \NC \yes \NC \yes \NC \NR +\stoptabulate + +\stop + +The \type {node.next} and \type {node.prev} functions will stay but for +consistency there are variants called \type {getnext} and \type {getprev}. 
We had +to use \type {get} because \type {node.id} and \type {node.subtype} are already +taken for providing meta information about nodes. Note: The getters do only basic +checking for valid keys. You should just stick to the keys mentioned in the +sections that describe node properties. + +\stopchapter + +\stopcomponent diff --git a/doc/context/sources/general/manuals/luatex/luatex-style.tex b/doc/context/sources/general/manuals/luatex/luatex-style.tex new file mode 100644 index 000000000..4bb4557b0 --- /dev/null +++ b/doc/context/sources/general/manuals/luatex/luatex-style.tex @@ -0,0 +1,332 @@ +\startenvironment luatex-style + +% I'll clean this up some day. + +\usemodule[abr-02] + +\setuplayout + [height=middle, + width=middle, + backspace=2cm, + topspace=10mm, + bottomspace=10mm, + header=10mm, + footer=10mm, + footerdistance=10mm, + headerdistance=10mm] + +\setuppagenumbering + [alternative=doublesided] + +\setuptolerance + [stretch,tolerant] + +\setuptype + [lines=hyphenated] + +\setuptyping + [lines=hyphenated] + +\setupitemize + [each] + [packed] + +\setupwhitespace + [medium] + +\def\|{\string|} +\def\>{\string>} + +\def\showfields#1{\ctxlua + { + local t = string.split('#1',',') + local r = { } + for _, a in pairs(node.fields(t[1],t[2])) do + if not (a == 'id' or a == 'subtype' or a =='next' or a=='prev') then + table.insert(r,'\\type{'.. a .. '}') + end + end + tex.sprint(table.concat(r, ', ')) + }% +} + +\def\showid#1{\ctxlua + { + local t = string.split('#1',',') + tex.sprint('\\type{'.. node.id(t[1]) .. '}') + if t[2] then + tex.sprint(', \\type{'.. node.subtype(t[2]) .. 
'}') + end + }% +} + +\starttexdefinition unexpanded todo #1 + \dontleavehmode + \startcolor[red] + \bf<TODO: #1> + \stopcolor +\stoptexdefinition + +\definecolor[blue] [b=.5] +\definecolor[red] [r=.5] +\definecolor[green][g=.5] + +\definecolor[maincolor] [b=.5] +\definecolor[othercolor][r=.5,g=.5] + +% \doifmodeelse {atpragma} { +% +% % \setupbodyfont +% % [lucidaot,10pt] +% +% \setupbodyfont +% [dejavu,10pt] +% +% \setuphead [chapter] [style=\bfd] +% \setuphead [section] [style=\bfb] +% \setuphead [subsection] [style=\bfa] +% \setuphead [subsubsection][style=\bf] +% +% } { +% +% \definetypeface[mainfacenormal] [ss][sans] [iwona] [default] +% \definetypeface[mainfacenormal] [rm][serif][palatino] [default] +% \definetypeface[mainfacenormal] [tt][mono] [modern] [default][rscale=1.1] +% \definetypeface[mainfacenormal] [mm][math] [iwona] [default] +% +% \definetypeface[mainfacemedium] [ss][sans] [iwona-medium][default] +% \definetypeface[mainfacemedium] [rm][serif][palatino] [default] +% \definetypeface[mainfacemedium] [tt][mono] [modern] [default][rscale=1.1] +% \definetypeface[mainfacemedium] [mm][math] [iwona-medium][default] +% +% \setupbodyfont +% [mainfacenormal,10pt] +% +% \setuphead [chapter] [style=\mainfacemedium\bfd] +% \setuphead [section] [style=\mainfacemedium\bfb] +% \setuphead [subsection] [style=\mainfacemedium\bfa] +% \setuphead [subsubsection][style=\mainfacemedium\bf] +% +% } + +\writestatus{luatex manual}{we assume that dejavu math is available} + +\setupbodyfont % assumes dejavu-math + [dejavu,10pt] + +\setuphead [chapter] [style=\bfd] +\setuphead [section] [style=\bfb] +\setuphead [subsection] [style=\bfa] +\setuphead [subsubsection][style=\bf] + +\setuphead [chapter] [color=maincolor] +\setuphead [section] [color=maincolor] +\setuphead [subsection] [color=maincolor] +\setuphead [subsubsection][color=maincolor] + +\definehead + [remark] + [subsubsubject] + +\setupheadertexts + [] + +\definemixedcolumns + [twocolumns] + [n=2, + balance=yes, + 
before=\blank, + after=\blank] + +\definemixedcolumns + [threecolumns] + [twocolumns] + [n=3] + +\definemixedcolumns + [fourcolumns] + [threecolumns] + [n=4] + +\setuptyping + [color=maincolor] + +\setuptype + [color=maincolor] + +\definetyping + [functioncall] + +\startMPdefinitions + + color luaplanetcolor ; luaplanetcolor := \MPcolor{maincolor} ; + color luaholecolor ; luaholecolor := white ; + numeric luaextraangle ; luaextraangle := 0 ; + numeric luaorbitfactor ; luaorbitfactor := .25 ; + + vardef lualogo = image ( + + % Graphic design by A. Nakonechnyj. Copyright (c) 1998, All rights reserved. + + save d, r, p ; numeric d, r, p ; + + d := sqrt(2)/4 ; r := 1/4 ; p := r/8 ; + + fill fullcircle scaled 1 + withcolor luaplanetcolor ; + draw fullcircle rotated 40.5 scaled (1+r) + dashed evenly scaled p + withpen pencircle scaled (p/2) + withcolor (luaorbitfactor * luaholecolor) ; + fill fullcircle scaled r shifted (d+1/8,d+1/8) + rotated luaextraangle + withcolor luaplanetcolor ; + fill fullcircle scaled r shifted (d-1/8,d-1/8) + withcolor luaholecolor ; + luaorbitfactor := .25 ; + ) enddef ; + +\stopMPdefinitions + +\startuseMPgraphic{luapage} + StartPage ; + + fill Page withcolor \MPcolor{othercolor} ; + + luaorbitfactor := 1 ; + picture p ; p := lualogo xsized (3PaperWidth/5) ; + draw p shifted center Page shifted (0,-ypart center ulcorner p) ; + + StopPage ; +\stopuseMPgraphic + +\starttexdefinition luaextraangle + % we can also just access the last page and so in mp directly + \ctxlua { + context(\lastpage == 0 and 0 or \realfolio*360/\lastpage) + } +\stoptexdefinition + +\startuseMPgraphic{luanumber} + luaextraangle := \luaextraangle; + luaorbitfactor := 0.25 ; + picture p ; p := lualogo ; + setbounds p to boundingbox fullcircle ; + draw p ysized 1cm ; +\stopuseMPgraphic + +\definelayer + [page] + [width=\paperwidth, + height=\paperheight] + +\setupbackgrounds + [leftpage] + [background=page] + +\setupbackgrounds + [rightpage] + [background=page] + 
+\startsetups pagenumber:right + \setlayerframed + [page] + [preset=rightbottom,offset=1cm] + [frame=off,height=1cm,offset=overlay] + {\useMPgraphic{luanumber}} + \setlayerframed + [page] + [preset=rightbottom,offset=1cm,x=1.5cm] + [frame=off,height=1cm,width=1cm,offset=overlay] + {\pagenumber} + \setlayerframed + [page] + [preset=rightbottom,offset=1cm,x=2.5cm] + [frame=off,height=1cm,offset=overlay] + {\getmarking[chapter]} +\stopsetups + +\startsetups pagenumber:left + \setlayerframed + [page] + [preset=leftbottom,offset=1cm,x=2.5cm] + [frame=off,height=1cm,offset=overlay] + {\getmarking[chapter]} + \setlayerframed + [page] + [preset=leftbottom,offset=1cm,x=1.5cm] + [frame=off,height=1cm,width=1cm,offset=overlay] + {\pagenumber} + \setlayerframed + [page] + [preset=leftbottom,offset=1cm] + [frame=off,height=1cm,offset=overlay] + {\useMPgraphic{luanumber}} +\stopsetups + +\unexpanded\def\nonterminal#1>{\mathematics{\langle\hbox{\rm #1}\rangle}} + +% taco's brainwave -) + +\newcatcodetable\syntaxcodetable + +\unexpanded\def\makesyntaxcodetable + {\begingroup + \catcode`\<=13 \catcode`\|=12 + \catcode`\!= 0 \catcode`\\=12 + \savecatcodetable\syntaxcodetable + \endgroup} + +\makesyntaxcodetable + +\unexpanded\def\startsyntax {\begingroup\catcodetable\syntaxcodetable \dostartsyntax} +\unexpanded\def\syntax {\begingroup\catcodetable\syntaxcodetable \dosyntax} + \let\stopsyntax \relax + +\unexpanded\def\syntaxenvbody#1% + {\par + \tt + \startnarrower + \maincolor #1 + \stopnarrower + \par} + +\unexpanded\def\syntaxbody#1% + {\begingroup + \maincolor \tt #1% + \endgroup} + +\bgroup \catcodetable\syntaxcodetable + +!gdef!dostartsyntax#1\stopsyntax{!let<!nonterminal!syntaxenvbody{#1}!endgroup} +!gdef!dosyntax #1{!let<!nonterminal!syntaxbody{#1}!endgroup} + +!egroup + +% end of wave + +\setupinteraction + [state=start, + focus=standard, + style=, + color=, + contrastcolor=] + +\placebookmarks + [chapter,section,subsection] + +\setuplist + 
[chapter,section,subsection,subsubsection] + [interaction=all] + +\setuplist + [chapter] + [style=bold, + color=maincolor] + +% Hans doesn't like the bookmarks opening by default so we comment this: +% +% \setupinteractionscreen +% [option=bookmark] + +\stopenvironment diff --git a/doc/context/sources/general/manuals/luatex/luatex-titlepage.tex b/doc/context/sources/general/manuals/luatex/luatex-titlepage.tex new file mode 100644 index 000000000..cf40b8eb8 --- /dev/null +++ b/doc/context/sources/general/manuals/luatex/luatex-titlepage.tex @@ -0,0 +1,69 @@ +\environment luatex-style +\environment luatex-logos + +\startcomponent luatex-titlepage + +\startstandardmakeup + + \switchtobodyfont + [mainfacemedium] + + \definedfont[Bold*default at \the\dimexpr.08\paperheight\relax] \setupinterlinespace + + \setlayer + [page] + {\useMPgraphic{luapage}} + + \setlayerframed + [page] + [preset=middletop, + voffset=.05\paperheight] + [align=middle, + foregroundcolor=blue, + frame=off] + {Lua\TeX\\Reference} + + \definedfont[Bold*default at 24pt] \setupinterlinespace + + \setlayerframed + [page] + [preset=middletop, + voffset=.35\paperheight] + [align=middle, + foregroundcolor=blue, + frame=off] + {\doifsomething{\documentvariable{snapshot}}{snapshot \documentvariable{snapshot}}% + \doifsomething{\documentvariable{beta}} {beta \documentvariable{beta}}} + +\stopstandardmakeup + +\startstandardmakeup + + \start + \raggedleft + \definedfont[Bold*default at 48pt] + \setupinterlinespace + \blue Lua\TeX \endgraf Reference \endgraf Manual \endgraf + \stop + + \vfill + + \definedfont[Bold*default at 12pt] + + \starttabulate[|l|l|] + \NC copyright \EQ Lua\TeX\ development team \NC \NR + \NC more info \EQ www.luatex.org \NC \NR + \NC version \EQ \currentdate \doifsomething{\documentvariable{snapshot}}{(snapshot \documentvariable{snapshot})} \NC \NR + \stoptabulate + +\stopstandardmakeup + +\setupbackgrounds + [leftpage] + [setups=pagenumber:left] + +\setupbackgrounds + [rightpage] + 
[setups=pagenumber:right] + +\stopcomponent diff --git a/doc/context/sources/general/manuals/luatex/luatex.tex b/doc/context/sources/general/manuals/luatex/luatex.tex new file mode 100644 index 000000000..079c34e61 --- /dev/null +++ b/doc/context/sources/general/manuals/luatex/luatex.tex @@ -0,0 +1,30 @@ +% \tex vs \type vs \syntax vs. \luatex +% \em \it \/ + +\environment luatex-style +\environment luatex-logos + +\dontcomplain + +\startdocument + [beta=0.80.1] + +\component luatex-titlepage + +\startfrontmatter + \component luatex-contents + \component luatex-introduction +\stopfrontmatter + +\startbodymatter + \component luatex-enhancements + \component luatex-lua + \component luatex-languages + \component luatex-fonts + \component luatex-math + \component luatex-nodes + \component luatex-libraries + \component luatex-modifications +\stopbodymatter + +\stopdocument