2015-10-07 12:05:00

author: Context Git Mirror Bot <phg42.2a@gmail.com> 2015-10-07 14:15:06 +0200
committer: Context Git Mirror Bot <phg42.2a@gmail.com> 2015-10-07 14:15:06 +0200
commit: ee1c809d23ce322e7946f941545f7e0fa27ae5c6 (patch)
tree: 3e32a64b19cf9706e5ff0df289eb56e77571a5ca /doc/context/sources/general/manuals/luatex/luatex-modifications.tex
parent: 961f357ef202a44da1f4b315c82ef143a6f51497 (diff)
download: context-ee1c809d23ce322e7946f941545f7e0fa27ae5c6.tar.gz
1 files changed, 499 insertions, 0 deletions
diff --git a/doc/context/sources/general/manuals/luatex/luatex-modifications.tex b/doc/context/sources/general/manuals/luatex/luatex-modifications.tex
new file mode 100644
index 000000000..630528bec
--- /dev/null
+++ b/doc/context/sources/general/manuals/luatex/luatex-modifications.tex
@@ -0,0 +1,499 @@
+\environment luatex-style
+\environment luatex-logos
+
+\startcomponent luatex-modifications
+
+\startchapter[reference=modifications,title={Modifications}]
+
+\startsection[title=The merged engines]
+
+\startsubsection[title=The need for change]
+
+The first version of \LUATEX\ only had a few extra primitives and it was largely
+the same as \PDFTEX. Then we merged substantial parts of \ALEPH\ into the code
+and got more primitives. Once the code became more stable, the decision was made
+to clean up the rather hybrid nature of the program. This means that some
+primitives have been promoted to core primitives, often with a different name,
+and that others were removed. This made it possible to start cleaning up the code
+base. We will describe most of these changes in the following paragraphs.
+
+Besides the expected changes caused by new functionality, there are a number of
+not|-|so|-|expected changes. These are sometimes a side|-|effect of a new
+(conflicting) feature, or, more often than not, a change necessary to clean up
+the internal interfaces. These will also be mentioned.
+
+\stopsubsection
+
+\startsubsection[title=Changes from \TEX\ 3.1415926]
+
+Of course it all starts with traditional \TEX. Even if we started with \PDFTEX,
+most still comes from the original. But we diverge a bit.
+
+\startitemize
+
+\startitem
+    The current code base is written in \CCODE, not \PASCAL. We use \CWEB\
+    when possible.
+\stopitem
+
+\startitem
+    See \in {chapter} [languages] for many small changes related to paragraph
+    building, language handling and hyphenation. The most important change is
+    that adding a brace group in the middle of a word (like in \type {of{}fice})
+    does not prevent ligature creation.
+\stopitem
+
+\startitem
+    There is no pool file; all strings are embedded during compilation.
+\stopitem
+
+\startitem
+    The specifier \type {plus 1 fillll} does not generate an error. The extra
+    \quote{l} is simply typeset.
+\stopitem
+
+\startitem
+    The upper limit to \type {\endlinechar} and \type {\newlinechar} is 127.
+\stopitem
+
+\startitem
+    The hz optimization code has been partially redone so that we no longer need
+    to create extra font instances. The front- and backend have been decoupled and
+    more efficient (\PDF) code is generated.
+\stopitem
+
+\stopitemize
+
+\stopsubsection
+
+\startsubsection[title=Changes from \ETEX\ 2.2]
+
+Because \ETEX\ is the de facto standard extension, we of course provide its
+functionality, but with a few small adaptations.
+
+\startitemize
+
+\startitem
+    The \ETEX\ functionality is always present and enabled so the prepended
+    asterisk or \type {-etex} switch for \INITEX\ is not needed.
+\stopitem
+
+\startitem
+    The \TEXXET\ extension is not present, so the primitives \type
+    {\TeXXeTstate}, \type {\beginR}, \type {\beginL}, \type {\endR} and \type
+    {\endL} are missing.
+\stopitem
+
+\startitem
+    Some of the tracing information that is output by \ETEX's \type
+    {\tracingassigns} and \type {\tracingrestores} is not there.
+\stopitem
+
+\startitem
+    Register management in \LUATEX\ uses the \ALEPH\ model, so the maximum value
+    is 65535 and the implementation uses a flat array instead of the mixed
+    flat|\&|sparse model from \ETEX.
+\stopitem
+
+\startitem
+    The \type {\savinghyphcodes} command is a no|-|op. \in {Chapter} [languages]
+    explains why.
+\stopitem
+
+\startitem
+    When kpathsea is used to find files, \LUATEX\ uses the \type {ofm} file
+    format to search for font metrics. In turn, this means that \LUATEX\ looks at
+    the \type {OFMFONTS} configuration variable (like \OMEGA\ and \ALEPH) instead
+    of \type {TFMFONTS} (like \TEX\ and \PDFTEX). Likewise for virtual fonts
+    (\LUATEX\ uses the variable \type {OVFFONTS} instead of \type {VFFONTS}).
+\stopitem
+
+\stopitemize
+
+\stopsubsection
+
+\startsubsection[title=Changes from \PDFTEX\ 1.40]
+
+Because we want to produce \PDF\ the most natural starting point was the popular
+\PDFTEX\ program. We inherited the stable features, dropped most of the
+experimental code and promoted some functionality to core \LUATEX\ functionality,
+which in turn triggered the renaming of primitives.
+
+\startitemize
+
+\startitem
+    The (experimental) support for snap nodes has been removed, because it is
+    much more natural to build this functionality on top of node processing and
+    attributes. The associated primitives that are now gone are: \type
+    {\pdfsnaprefpoint}, \type {\pdfsnapy}, and \type {\pdfsnapycomp}.
+\stopitem
+
+\startitem
+    The (experimental) support for specialized spacing around nodes has also been
+    removed. The associated primitives that are now gone are: \type
+    {\pdfadjustinterwordglue}, \type {\pdfprependkern}, and \type {\pdfappendkern}, as
+    well as the five supporting primitives \type {\knbscode}, \type {\stbscode}, \type
+    {\shbscode}, \type {\knbccode}, and \type {\knaccode}.
+\stopitem
+
+\startitem
+    A number of \quote {pdftex primitives} have been removed as they can be
+    implemented using \LUA:
+
+    \start \raggedright
+    \type {\pdfelapsedtime}, \type {\pdfescapehex}, \type {\pdfescapename}, \type
+    {\pdfescapestring}, \type {\pdffiledump}, \type {\pdffilemoddate}, \type
+    {\pdffilesize}, \type {\pdfforcepagebox}, \type {\pdflastmatch}, \type
+    {\pdfmatch}, \type {\pdfmdfivesum}, \type {\pdfmovechars}, \type
+    {\pdfoptionalwaysusepdfpagebox}, \type {\pdfoptionpdfinclusionerrorlevel},
+    \type {\pdfresettimer}, \type {\pdfshellescape}, \type {\pdfstrcmp} and \type
+    {\pdfunescapehex}
+    \par \stop
+\stopitem
+
+\startitem
+    The version related primitives \type {\pdftexbanner}, \type {\pdftexversion}
+    and \type {\pdftexrevision} are no longer present as there is no longer a
+    strict relationship with \PDFTEX\ development.
+\stopitem
+
+\startitem
+    The experimental snapper mechanism has been removed and therefore also the
+    primitives:
+
+    \start \raggedright
+    \type {\pdfignoreddimen}, \type {\pdffirstlineheight}, \type
+    {\pdfeachlineheight}, \type {\pdfeachlinedepth} and \type
+    {\pdflastlinedepth}
+    \par \stop
+\stopitem
+
+\startitem
+    The experimental primitives \type {\primitive}, \type {\ifprimitive}, \type
+    {\ifabsnum} and \type {\ifabsdim} are promoted to core primitives. The \type
+    {\pdf*} prefixed originals are not available.
+\stopitem
+
+\startitem
+    The \PNG\ transparency fix from 1.40.6 is not applied as high|-|level
+    support is pending.
+\stopitem
+
+\startitem
+    Two extra token lists are provided, \type {\pdfxformresources} and \type
+    {\pdfxformattr}, as an alternative to \type {\pdfxform} keywords.
+\stopitem
+
+\startitem
+    The current version of \LUATEX\ no longer replaces and|/|or merges fonts in
+    embedded \PDF\ files with fonts of the enveloping \PDF\ document. This
+    regression may be temporary, depending on what the rewritten font backend
+    will look like.
+\stopitem
+
+\startitem
+    The primitives \type {\pdfpagewidth} and \type {\pdfpageheight} have been removed
+    because \type {\pagewidth} and \type {\pageheight} have that purpose.
+\stopitem
+
+\startitem
+    The primitives \type {\pdfnormaldeviate}, \type {\pdfuniformdeviate}, \type
+    {\pdfsetrandomseed} and \type {\pdfrandomseed} have been promoted to core
+    primitives without \type {pdf} prefix so the original commands are no longer
+    recognized.
+\stopitem
+
+\startitem
+    The primitives \type {\ifincsname}, \type {\expanded} and \type {\quitvmode} are now
+    core primitives.
+\stopitem
+
+\startitem
+    As the hz and protrusion mechanisms are part of the core, the related
+    primitives \type {\lpcode}, \type {\rpcode}, \type {\efcode}, \type
+    {\leftmarginkern}, \type {\rightmarginkern} are promoted to core primitives. The
+    two commands \type {\protrudechars} and \type {\adjustspacing} replace their
+    \type {\pdf} prefixed originals.
+\stopitem
+
+\startitem
+    The \type {\tagcode} primitive is promoted to core primitive.
+\stopitem
+
+\startitem
+    The \type {\letterspacefont} feature is now part of the core but will not be
+    changed (improved). We just provide it for legacy use.
+\stopitem
+
+\startitem
+    The \type {\pdfnoligatures} primitive is now \type {\ignoreligaturesinfont}.
+\stopitem
+
+\startitem
+    The \type {\pdffontexpand} primitive is now \type {\expandglyphsinfont}.
+\stopitem
+
+\startitem
+    Because position tracking is also available in \DVI\ mode the
+    \type {\savepos}, \type {\lastxpos} and \type {\lastypos} commands now
+    replace their \type {pdf} prefixed originals.
+\stopitem
+
+\startitem
+    Candidates for removal are \type {\pdfcolorstackinit} and \type
+    {\pdfcolorstack}.
+\stopitem
+
+\startitem
+    Candidates for replacement are \type {\pdfoutput} (\type {\outputmode}) and
+    \type {\pdfmatrix} (something with a normal syntax).
+\stopitem
+
+\stopitemize
+
+\stopsubsection
+
+\startsubsection[title=Changes from \ALEPH\ RC4]
+
+Because we wanted proper directional typesetting the \ALEPH\ mechanisms looked
+most attractive. These are rather close to the ones provided by \OMEGA, so what
+we say next applies to both these programs.
+
+\startitemize
+
+\startitem
+    The extended 16-bit math primitives (\type {\omathcode} etc.) have been
+    removed.
+\stopitem
+
+\startitem
+    The \OCP\ processing is no longer supported at all. As a consequence, the
+    following primitives have been removed:
+
+    \start \raggedright
+    \type {\ocp}, \type {\externalocp}, \type {\ocplist}, \type {\pushocplist},
+    \type {\popocplist}, \type {\clearocplists}, \type {\addbeforeocplist}, \type
+    {\addafterocplist}, \type {\removebeforeocplist}, \type {\removeafterocplist}
+    and \type {\ocptracelevel}
+    \par \stop
+\stopitem
+
+\startitem
+    \LUATEX\ only understands 4~of the 16~direction specifiers of \ALEPH: \type
+    {TLT} (latin), \type {TRT} (arabic), \type {RTT} (cjk), \type {LTL}
+    (mongolian). All other direction specifiers generate an error.
+\stopitem
+
+\startitem
+    The input translations from \ALEPH\ are not implemented, so the related
+    primitives are not available:
+
+    \start \raggedright
+    \type {\DefaultInputMode}, \type {\noDefaultInputMode}, \type {\noInputMode},
+    \type {\InputMode}, \type {\DefaultOutputMode}, \type {\noDefaultOutputMode},
+    \type {\noOutputMode}, \type {\OutputMode}, \type {\DefaultInputTranslation},
+    \type {\noDefaultInputTranslation}, \type {\noInputTranslation}, \type
+    {\InputTranslation}, \type {\DefaultOutputTranslation}, \type
+    {\noDefaultOutputTranslation}, \type {\noOutputTranslation} and \type
+    {\OutputTranslation}
+    \par \stop
+\stopitem
+
+\startitem
+    Several bugs have been fixed. The \type {\hoffset} bug when \type {\pagedir TRT}
+    is gone, removing the need for an explicit fix to \type {\hoffset}. Also, the bug
+    causing \type {\fam} to fail for family numbers above 15 is fixed. A fair number
+    of other minor bugs are fixed as well, most of these related to \type
+    {\tracingcommands} output.
+\stopitem
+
+\startitem
+    The scanner for direction specifications now allows an optional space after
+    the direction is completely parsed.
+\stopitem
+
+\startitem
+    The \type {^^} notation can come in five and six item repetitions also, to
+    insert characters that do not fit in the BMP.
+\stopitem
+
+\startitem
+    Glues {\it immediately after} direction change commands are not legal
+    breakpoints.
+\stopitem
+
+\startitem
+    Several mechanisms that need to be right|-|to|-|left aware have been
+    improved. For instance placement of formula numbers.
+\stopitem
+
+\startitem
+    The page dimension related primitives \type {\pagewidth} and \type {\pageheight} have
+    been promoted to core primitives.
+\stopitem
+
+\startitem
+    The primitives \type {\charwd}, \type {\charht}, \type {\chardp} and \type {\charit}
+    have been removed as we have the \ETEX\ variants \type {\fontchar*}.
+\stopitem
+
+\startitem
+    The two dimension registers \type {\pagerightoffset} and \type
+    {\pagebottomoffset} are now core primitives.
+\stopitem
+
+\startitem
+    The direction related primitives \type {\pagedir}, \type {\bodydir}, \type
+    {\pardir}, \type {\textdir}, \type {\mathdir} and \type {\boxdir} are now
+    core primitives.
+\stopitem
+
+\startitem
+    The promotion of primitives to core primitives as well as the removal of all
+    others means that the initialization namespace \type {aleph} is gone.
+\stopitem
+
+\stopitemize
+
+\stopsubsection
+
+\startsubsection[title=Changes from standard \WEBC]
+
+The compilation framework is \WEBC\ and we keep using that but without the
+\PASCAL\ to \CCODE\ step. This framework also provides some common features that
+deal with reading bytes from files and locating files in \TDS. This is what we do
+differently:
+
+\startitemize
+
+\startitem
+    There is no mltex support.
+\stopitem
+
+\startitem
+    There is no enctex support.
+\stopitem
+
+\startitem
+    The following commandline switches are silently ignored, even in non|-|\LUA\
+    mode: \type {-8bit}, \type {-translate-file}, \type {-mltex}, \type {-enc}
+    and \type {-etex}.
+\stopitem
+
+\startitem
+    The \type {\openout} whatsits are not written to the log file.
+\stopitem
+
+\startitem
+    Some of the so|-|called web2c extensions are hard to set up in non|-|\KPSE\
+    mode because \type {texmf.cnf} is not read: \type {shell-escape} is off (but
+    that is not a problem because of \LUA's \type {os.execute}), and the paranoia
+    checks on \type {openin} and \type {openout} do not happen (however, it is
+    easy for a \LUA\ script to do this itself by overloading \type {io.open}).
+\stopitem
+
+\startitem
+    The \quote{E} option does not do anything useful.
+\stopitem
+
+\stopitemize
+
+\stopsubsection
+
+\stopsection
+
+\startsection[title=Implementation notes]
+
+\startsubsection[title=Memory allocation]
+
+The single internal memory heap that traditional \TEX\ used for tokens and nodes
+is split into two separate arrays. Each of these will grow dynamically when
+needed.
+
+The \type {texmf.cnf} settings related to main memory are no longer used (these
+are: \type {main_memory}, \type {mem_bot}, \type {extra_mem_top} and \type
+{extra_mem_bot}). \quote {Out of main memory} errors can still occur, but the
+limiting factor is now the amount of RAM in your system, not a predefined limit.
+
+Also, the memory (de)allocation routines for nodes are completely rewritten. The
+relevant code now lives in the C file \type {texnode.c}, and basically uses a
+dozen or so \quote {avail} lists instead of a doubly|-|linked model. An extra
+function layer is added so that the code can ask for nodes by type instead of
+directly requisitioning a certain amount of memory words.
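The idea of several avail lists plus a by-type request layer can be sketched in C as follows. This is only an illustration under stated assumptions, not the code in texnode.c: the names node_pool, pool_init, get_node, new_node and free_node, the three node types and their sizes are all hypothetical. It keeps one singly linked free list per node size inside a single growable array.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch, not the code from texnode.c: one "avail" (free)
   list per node size, threaded through a single growable array. */

#define MAX_NODE_SIZE 8                 /* assumed largest node, in words */

typedef enum { glue_node, kern_node, glyph_node } node_type; /* illustrative */
static const int node_size[] = { 3, 2, 5 };                  /* made-up sizes */

typedef struct {
    int *mem;                           /* the node memory array */
    int  cap;                           /* allocated words */
    int  top;                           /* first word never handed out */
    int  avail[MAX_NODE_SIZE + 1];      /* avail[s]: free list head, 0 = empty */
} node_pool;

/* Returns 1 on success, 0 when the initial allocation fails. */
int pool_init(node_pool *p, int cap)
{
    if (cap < 1)
        cap = 1;
    p->mem = calloc((size_t) cap, sizeof(int));
    if (p->mem == NULL)
        return 0;
    p->cap = cap;
    p->top = 1;                         /* word 0 is reserved: 0 means "null" */
    memset(p->avail, 0, sizeof(p->avail));
    return 1;
}

/* Low-level layer: ask for a block of 'size' words. Returns 0 on failure. */
int get_node(node_pool *p, int size)
{
    int n;
    assert(size >= 1 && size <= MAX_NODE_SIZE);
    n = p->avail[size];
    if (n != 0) {
        p->avail[size] = p->mem[n];     /* pop the head of the free list */
    } else {
        if (p->top + size > p->cap) {   /* grow the array on demand */
            int new_cap = p->cap + p->top + size;
            int *grown = realloc(p->mem, (size_t) new_cap * sizeof(int));
            if (grown == NULL)
                return 0;
            p->mem = grown;
            p->cap = new_cap;
        }
        n = p->top;
        p->top += size;
    }
    memset(p->mem + n, 0, (size_t) size * sizeof(int));
    return n;
}

/* Extra function layer: callers ask by type, not by number of words. */
int new_node(node_pool *p, node_type t)
{
    return get_node(p, node_size[t]);
}

/* Push a node back onto the avail list that matches its size. */
void free_node(node_pool *p, int n, node_type t)
{
    int size = node_size[t];
    p->mem[n] = p->avail[size];         /* first word links to the old head */
    p->avail[size] = n;
}
```

A freed node is reused by the next request of the same size, so in this sketch allocation and release are both constant-time and no neighbouring blocks have to be inspected or merged.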
+
+Because of the split into two arrays and the resulting differences in the data
+structures, some of the macros have been duplicated. For instance, there are now
+\type {vlink} and \type {vinfo} as well as \type {token_link} and \type
+{token_info}. All access to the variable memory array is now hidden behind a
+macro called \type {vmem}.
+
+The implementation of the growth of two arrays (via reallocation) introduces a
+potential pitfall: the memory arrays should never be used as the left hand side
+of a statement that can modify the array in question.
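The pitfall described above can be illustrated with a small C sketch. The names varmem, get_word and link_fresh_word are hypothetical and only stand in for the real memory arrays and allocation routines. The point is that C does not sequence the evaluation of the left operand of an assignment relative to a function call on the right, so the target address may be computed from the old array pointer before the call moves the array.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for a growable memory array. */
int *varmem = NULL;       /* the array itself; may move when it grows */
int  varmem_size = 0;     /* allocated words */
int  varmem_used = 1;     /* next free word; index 0 means "null" */

/* Hands out one fresh word. May reallocate, and thus move, varmem.
   Returns 0 when memory is exhausted. */
int get_word(void)
{
    if (varmem_used >= varmem_size) {
        int new_size = varmem_size == 0 ? 4 : 2 * varmem_size;
        int *grown = realloc(varmem, (size_t) new_size * sizeof(int));
        if (grown == NULL)
            return 0;
        varmem = grown;
        varmem_size = new_size;
    }
    varmem[varmem_used] = 0;
    return varmem_used++;
}

/* RISKY, do not write this:
 *
 *     varmem[p] = get_word();
 *
 * The compiler may compute the address of varmem[p] from the old pointer
 * before get_word() reallocates, so the store can hit freed memory.
 */

/* Safe: complete the call first, then index the (possibly moved) array. */
int link_fresh_word(int p)
{
    int q;
    assert(p > 0 && p < varmem_used);
    q = get_word();
    if (q != 0)
        varmem[p] = q;
    return q;
}
```

Going through a temporary costs nothing at run time and makes the order of evaluation explicit, which is why it is the usual remedy for this kind of reallocation hazard.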
+
+The input line buffer and pool size are now also reallocated when needed, and the
+\type {texmf.cnf} settings \type {buf_size} and \type {pool_size} are silently
+ignored.
+
+\stopsubsection
+
+\startsubsection[title=Sparse arrays]
+
+The \type {\mathcode}, \type {\delcode}, \type {\catcode}, \type {\sfcode}, \type {\lccode}
+and \type {\uccode} tables are now sparse arrays that are implemented in~\CCODE.
+They are no longer part of the \TEX\ \quote {equivalence table} and because each
+had 1.1 million entries with a few memory words each, this makes a major
+difference in memory usage.
+
+The \type {\catcode}, \type {\sfcode}, \type {\lccode} and \type {\uccode} assignments do
+not yet show up when using the etex tracing routines \type {\tracingassigns} and
+\type {\tracingrestores} (code simply not written yet).
+
+A side|-|effect of the current implementation is that \type {\global} assignments
+are now more expensive in terms of processing than non|-|global ones.
+
+See \type {mathcodes.c} and \type {textcodes.c} if you are interested in the
+details.
+
+Also, the glyph ids within a font are now managed by means of a sparse array and
+glyph ids can go up to index $2^{21}-1$.
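To show why a sparse array saves memory compared to a flat table, here is a minimal C sketch. It is not the implementation found in mathcodes.c or textcodes.c; the three-level layout with 7 bits per level, the type sparse_array and the functions sa_new, sa_get, sa_set and sa_free are assumptions made for illustration. Entries that were never set cost no storage and report a default value.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical three-level sparse array: 7 + 7 + 7 = 21 index bits. */
#define PART_BITS 7
#define PART_SIZE (1 << PART_BITS)               /* 128 slots per level */
#define PART_MASK (PART_SIZE - 1)
#define MAX_INDEX ((1 << (3 * PART_BITS)) - 1)   /* 2^21 - 1 */

typedef struct {
    int    dflt;     /* value reported for entries that were never set */
    int ***tree;     /* tree[high][mid][low]; lower levels allocated lazily */
} sparse_array;

sparse_array *sa_new(int dflt)
{
    sparse_array *sa = malloc(sizeof(*sa));
    if (sa == NULL)
        return NULL;
    sa->dflt = dflt;
    sa->tree = calloc(PART_SIZE, sizeof(int **));   /* all slots empty */
    if (sa->tree == NULL) {
        free(sa);
        return NULL;
    }
    return sa;
}

int sa_get(const sparse_array *sa, int n)
{
    int h, m;
    assert(sa != NULL);
    if (n < 0 || n > MAX_INDEX)
        return sa->dflt;
    h = (n >> (2 * PART_BITS)) & PART_MASK;
    m = (n >> PART_BITS) & PART_MASK;
    if (sa->tree[h] == NULL || sa->tree[h][m] == NULL)
        return sa->dflt;                 /* untouched region: no storage used */
    return sa->tree[h][m][n & PART_MASK];
}

/* Returns 1 on success, 0 on a bad index or allocation failure. */
int sa_set(sparse_array *sa, int n, int value)
{
    int h, m, i;
    assert(sa != NULL);
    if (n < 0 || n > MAX_INDEX)
        return 0;
    h = (n >> (2 * PART_BITS)) & PART_MASK;
    m = (n >> PART_BITS) & PART_MASK;
    if (sa->tree[h] == NULL) {
        sa->tree[h] = calloc(PART_SIZE, sizeof(int *));
        if (sa->tree[h] == NULL)
            return 0;
    }
    if (sa->tree[h][m] == NULL) {
        int *leaf = malloc(PART_SIZE * sizeof(int));
        if (leaf == NULL)
            return 0;
        for (i = 0; i < PART_SIZE; i++)
            leaf[i] = sa->dflt;          /* new leaf starts out at the default */
        sa->tree[h][m] = leaf;
    }
    sa->tree[h][m][n & PART_MASK] = value;
    return 1;
}

void sa_free(sparse_array *sa)
{
    int h, m;
    if (sa == NULL)
        return;
    for (h = 0; h < PART_SIZE; h++) {
        if (sa->tree[h] == NULL)
            continue;
        for (m = 0; m < PART_SIZE; m++)
            free(sa->tree[h][m]);
        free(sa->tree[h]);
    }
    free(sa->tree);
    free(sa);
}
```

In this sketch, setting a single entry allocates one mid-level block and one leaf of 128 integers, whereas a flat table would reserve a slot for every possible index up front.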
+
+\stopsubsection
+
+\startsubsection[title=Simple single-character csnames]
+
+Single|-|character commands are no longer treated specially in the internals;
+they are stored in the hash just like the multiletter csnames.
+
+The code that displays control sequences explicitly checks if the length is one
+when it has to decide whether or not to add a trailing space.
+
+Active characters are internally implemented as a special type of multi|-|letter
+control sequences that uses a prefix that is otherwise impossible to obtain.
+
+\stopsubsection
+
+\startsubsection[title=Compressed format]
+
+The format is passed through zlib, allowing it to shrink to roughly half of the
+size it would have had in uncompressed form. This takes a few more \CPU\ cycles
+but much less disk \IO, so it should still be faster.
+
+\stopsubsection
+
+\startsubsection[title=Binary file reading]
+
+All of the internal code is changed in such a way that if one of the \type
+{read_xxx_file} callbacks is not set, then the file is read by a C function using
+basically the same convention as the callback: a single read into a buffer big
+enough to hold the entire file contents. While this uses more memory than the
+previous code (that mostly used \type {getc} calls), it can be quite a bit faster
+(depending on your I/O subsystem).
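The single-read convention can be sketched in C as follows. The helper read_whole_file is a hypothetical name, not a function from the LuaTeX sources, and the sketch assumes a platform where seeking to the end of a binary stream reports the file size (as is the case on POSIX systems).

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: read an entire binary file with one fread call
   into a malloc'ed buffer. On success the buffer is returned and *size
   holds its length; on failure NULL is returned and *size is 0.
   The caller owns the buffer and must free() it. */
unsigned char *read_whole_file(const char *name, size_t *size)
{
    FILE *f;
    unsigned char *buf = NULL;
    long len;

    assert(name != NULL && size != NULL);
    *size = 0;
    f = fopen(name, "rb");
    if (f == NULL)
        return NULL;

    /* Determine the size first, so that one buffer fits everything. */
    if (fseek(f, 0, SEEK_END) == 0 && (len = ftell(f)) >= 0
        && fseek(f, 0, SEEK_SET) == 0) {
        buf = malloc((size_t) len + 1);   /* + 1 keeps empty files non-NULL */
        if (buf != NULL && fread(buf, 1, (size_t) len, f) != (size_t) len) {
            free(buf);                    /* short read: treat as failure */
            buf = NULL;
        }
        if (buf != NULL)
            *size = (size_t) len;
    }
    fclose(f);
    return buf;
}
```

Compared to a loop of getc calls, this trades memory for speed: the whole file is resident at once, but the contents can afterwards be parsed with plain pointer arithmetic.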
+
+\stopsubsection
+
+\stopsection
+
+\stopchapter
+
+\stopcomponent