author | Context Git Mirror Bot <phg42.2a@gmail.com> | 2015-10-07 14:15:06 +0200 |
---|---|---|
committer | Context Git Mirror Bot <phg42.2a@gmail.com> | 2015-10-07 14:15:06 +0200 |
commit | ee1c809d23ce322e7946f941545f7e0fa27ae5c6 (patch) | |
tree | 3e32a64b19cf9706e5ff0df289eb56e77571a5ca /doc/context/sources/general/manuals/luatex/luatex-modifications.tex | |
parent | 961f357ef202a44da1f4b315c82ef143a6f51497 (diff) | |
download | context-ee1c809d23ce322e7946f941545f7e0fa27ae5c6.tar.gz |
2015-10-07 12:05:00
Diffstat (limited to 'doc/context/sources/general/manuals/luatex/luatex-modifications.tex')
-rw-r--r-- | doc/context/sources/general/manuals/luatex/luatex-modifications.tex | 499 |
1 files changed, 499 insertions, 0 deletions
diff --git a/doc/context/sources/general/manuals/luatex/luatex-modifications.tex b/doc/context/sources/general/manuals/luatex/luatex-modifications.tex
new file mode 100644
index 000000000..630528bec
--- /dev/null
+++ b/doc/context/sources/general/manuals/luatex/luatex-modifications.tex
@@ -0,0 +1,499 @@
+\environment luatex-style
+\environment luatex-logos
+
+\startcomponent luatex-modifications
+
+\startchapter[reference=modifications,title={Modifications}]
+
+\startsection[title=The merged engines]
+
+\startsubsection[title=The need for change]
+
+The first version of \LUATEX\ only had a few extra primitives and it was largely
+the same as \PDFTEX. Then we merged substantial parts of \ALEPH\ into the code
+and got more primitives. When we got more stable the decision was made to clean
+up the rather hybrid nature of the program. This means that some primitives have
+been promoted to core primitives, often with a different name, and that others
+were removed. This made it possible to start cleaning up the code base. We will
+describe most of these changes in the following paragraphs.
+
+Besides the expected changes caused by new functionality, there are a number of
+not|-|so|-|expected changes. These are sometimes a side|-|effect of a new
+(conflicting) feature, or, more often than not, a change necessary to clean up
+the internal interfaces. These will also be mentioned.
+
+\stopsubsection
+
+\startsubsection[title=Changes from \TEX\ 3.1415926]
+
+Of course it all starts with traditional \TEX. Even if we started with \PDFTEX,
+most still comes from the original. But we diverge a bit.
+
+\startitemize
+
+\startitem
+    The current code base is written in \CCODE, not \PASCAL. We use \CWEB\
+    when possible.
+\stopitem
+
+\startitem
+    See \in {chapter} [languages] for many small changes related to paragraph
+    building, language handling and hyphenation. The most important change is
+    that adding a brace group in the middle of a word (like in \type {of{}fice})
+    does not prevent ligature creation.
+\stopitem
+
+\startitem
+    There is no pool file, all strings are embedded during compilation.
+\stopitem
+
+\startitem
+    The specifier \type {plus 1 fillll} does not generate an error. The extra
+    \quote{l} is simply typeset.
+\stopitem
+
+\startitem
+    The upper limit to \type {\endlinechar} and \type {\newlinechar} is 127.
+\stopitem
+
+\startitem
+    The hz optimization code has been partially redone so that we no longer need
+    to create extra font instances. The front- and backend have been decoupled and
+    more efficient (\PDF) code is generated.
+\stopitem
+
+\stopitemize
+
+\stopsubsection
+
+\startsubsection[title=Changes from \ETEX\ 2.2]
+
+Being the de facto standard extension of course we provide the \ETEX\
+functionality, but with a few small adaptations.
+
+\startitemize
+
+\startitem
+    The \ETEX\ functionality is always present and enabled so the prepended
+    asterisk or \type {-etex} switch for \INITEX\ is not needed.
+\stopitem
+
+\startitem
+    The \TEXXET\ extension is not present, so the primitives \type
+    {\TeXXeTstate}, \type {\beginR}, \type {\beginL}, \type {\endR} and \type
+    {\endL} are missing.
+\stopitem
+
+\startitem
+    Some of the tracing information that is output by \ETEX's \type
+    {\tracingassigns} and \type {\tracingrestores} is not there.
+\stopitem
+
+\startitem
+    Register management in \LUATEX\ uses the \ALEPH\ model, so the maximum value
+    is 65535 and the implementation uses a flat array instead of the mixed
+    flat|\&|sparse model from \ETEX.
+\stopitem
+
+\startitem
+    The \type {\savinghyphcodes} command is a no|-|op. \in {Chapter} [languages]
+    explains why.
+\stopitem
+
+\startitem
+    When kpathsea is used to find files, \LUATEX\ uses the \type {ofm} file
+    format to search for font metrics. In turn, this means that \LUATEX\ looks at
+    the \type {OFMFONTS} configuration variable (like \OMEGA\ and \ALEPH) instead
+    of \type {TFMFONTS} (like \TEX\ and \PDFTEX). Likewise for virtual fonts
+    (\LUATEX\ uses the variable \type {OVFFONTS} instead of \type {VFFONTS}).
+\stopitem
+
+\stopitemize
+
+\stopsubsection
+
+\startsubsection[title=Changes from \PDFTEX\ 1.40]
+
+Because we want to produce \PDF\ the most natural starting point was the popular
+\PDFTEX\ program. We inherited the stable features, dropped most of the
+experimental code and promoted some functionality to core \LUATEX\ functionality
+which in turn triggered renaming primitives.
+
+\startitemize
+
+\startitem
+    The (experimental) support for snap nodes has been removed, because it is
+    much more natural to build this functionality on top of node processing and
+    attributes. The associated primitives that are now gone are: \type
+    {\pdfsnaprefpoint}, \type {\pdfsnapy}, and \type {\pdfsnapycomp}.
+\stopitem
+
+\startitem
+    The (experimental) support for specialized spacing around nodes has also been
+    removed. The associated primitives that are now gone are: \type
+    {\pdfadjustinterwordglue}, \type {\pdfprependkern}, and \type {\pdfappendkern}, as
+    well as the five supporting primitives \type {\knbscode}, \type {\stbscode}, \type
+    {\shbscode}, \type {\knbccode}, and \type {\knaccode}.
+\stopitem
+
+\startitem
+    A number of \quote {pdftex primitives} have been removed as they can be
+    implemented using \LUA:
+
+    \start \raggedright
+    \type {\pdfelapsedtime}, \type {\pdfescapehex}, \type {\pdfescapename}, \type
+    {\pdfescapestring}, \type {\pdffiledump}, \type {\pdffilemoddate}, \type
+    {\pdffilesize}, \type {\pdfforcepagebox}, \type {\pdflastmatch}, \type
+    {\pdfmatch}, \type {\pdfmdfivesum}, \type {\pdfmovechars}, \type
+    {\pdfoptionalwaysusepdfpagebox}, \type {\pdfoptionpdfinclusionerrorlevel},
+    \type {\pdfresettimer}, \type {\pdfshellescape}, \type {\pdfstrcmp} and \type
+    {\pdfunescapehex}
+    \par \stop
+\stopitem
+
+\startitem
+    The version related primitives \type {\pdftexbanner}, \type {\pdftexversion}
+    and \type {\pdftexrevision} are no longer present as there is no longer a
+    strict relationship with \PDFTEX\ development.
+\stopitem
+
+\startitem
+    The experimental snapper mechanism has been removed and therefore also the
+    primitives:
+
+    \start \raggedright
+    \type {\pdfignoreddimen}, \type {\pdffirstlineheight}, \type
+    {\pdfeachlineheight}, \type {\pdfeachlinedepth} and \type
+    {\pdflastlinedepth}
+    \par \stop
+\stopitem
+
+\startitem
+    The experimental primitives \type {\primitive}, \type {\ifprimitive}, \type
+    {\ifabsnum} and \type {\ifabsdim} are promoted to core primitives. The \type
+    {\pdf*} prefixed originals are not available.
+\stopitem
+
+\startitem
+    The \PNG\ transparency fix from 1.40.6 is not applied as high|-|level
+    support is pending.
+\stopitem
+
+\startitem
+    Two extra token lists are provided, \type {\pdfxformresources} and \type
+    {\pdfxformattr}, as an alternative to \type {\pdfxform} keywords.
+\stopitem
+
+\startitem
+    The current version of \LUATEX\ no longer replaces and|/|or merges fonts in
+    embedded \PDF\ files with fonts of the enveloping \PDF\ document. This
+    regression may be temporary, depending on what the rewritten font backend will
+    look like.
+\stopitem
+
+\startitem
+    The primitives \type {\pdfpagewidth} and \type {\pdfpageheight} have been removed
+    because \type {\pagewidth} and \type {\pageheight} have that purpose.
+\stopitem
+
+\startitem
+    The primitives \type {\pdfnormaldeviate}, \type {\pdfuniformdeviate}, \type
+    {\pdfsetrandomseed} and \type {\pdfrandomseed} have been promoted to core
+    primitives without \type {pdf} prefix so the original commands are no longer
+    recognized.
+\stopitem
+
+\startitem
+    The primitives \type {\ifincsname}, \type {\expanded} and \type {\quitvmode} are now
+    core primitives.
+\stopitem
+
+\startitem
+    As the hz and protrusion mechanisms are part of the core the related
+    primitives \type {\lpcode}, \type {\rpcode}, \type {\efcode}, \type
+    {\leftmarginkern}, \type {\rightmarginkern} are promoted to core primitives. The
+    two commands \type {\protrudechars} and \type {\adjustspacing} replace their
+    \type {\pdf} prefixed originals.
+\stopitem
+
+\startitem
+    The \type {\tagcode} primitive is promoted to core primitive.
+\stopitem
+
+\startitem
+    The \type {\letterspacefont} feature is now part of the core but will not be
+    changed (improved). We just provide it for legacy use.
+\stopitem
+
+\startitem
+    The \type {\pdfnoligatures} primitive is now \type {\ignoreligaturesinfont}.
+\stopitem
+
+\startitem
+    The \type {\pdffontexpand} primitive is now \type {\expandglyphsinfont}.
+\stopitem
+
+\startitem
+    Because position tracking is also available in \DVI\ mode the
+    \type {\savepos}, \type {\lastxpos} and \type {\lastypos} commands now
+    replace their \type {pdf} prefixed originals.
+\stopitem
+
+\startitem
+    Candidates for removal are \type {\pdfcolorstackinit} and \type
+    {\pdfcolorstack}.
+\stopitem
+
+\startitem
+    Candidates for replacement are \type {\pdfoutput} (\type {\outputmode}) and
+    \type {\pdfmatrix} (something with a normal syntax).
+\stopitem
+
+\stopitemize
+
+\stopsubsection
+
+\startsubsection[title=Changes from \ALEPH\ RC4]
+
+Because we wanted proper directional typesetting the \ALEPH\ mechanisms looked
+most attractive. These are rather close to the ones provided by \OMEGA, so what
+we say next applies to both these programs.
+
+\startitemize
+
+\startitem
+    The extended 16-bit math primitives (\type {\omathcode} etc.) have been
+    removed.
+\stopitem
+
+\startitem
+    The \OCP\ processing is no longer supported at all. As a consequence, the
+    following primitives have been removed:
+
+    \start \raggedright
+    \type {\ocp}, \type {\externalocp}, \type {\ocplist}, \type {\pushocplist},
+    \type {\popocplist}, \type {\clearocplists}, \type {\addbeforeocplist}, \type
+    {\addafterocplist}, \type {\removebeforeocplist}, \type {\removeafterocplist}
+    and \type {\ocptracelevel}
+    \par \stop
+\stopitem
+
+\startitem
+    \LUATEX\ only understands 4~of the 16~direction specifiers of \ALEPH: \type
+    {TLT} (latin), \type {TRT} (arabic), \type {RTT} (cjk), \type {LTL}
+    (mongolian). All other direction specifiers generate an error.
+\stopitem
+
+\startitem
+    The input translations from \ALEPH\ are not implemented, the related
+    primitives are not available:
+
+    \start \raggedright
+    \type {\DefaultInputMode}, \type {\noDefaultInputMode}, \type {\noInputMode},
+    \type {\InputMode}, \type {\DefaultOutputMode}, \type {\noDefaultOutputMode},
+    \type {\noOutputMode}, \type {\OutputMode}, \type {\DefaultInputTranslation},
+    \type {\noDefaultInputTranslation}, \type {\noInputTranslation}, \type
+    {\InputTranslation}, \type {\DefaultOutputTranslation}, \type
+    {\noDefaultOutputTranslation}, \type {\noOutputTranslation} and \type
+    {\OutputTranslation}
+    \par \stop
+\stopitem
+
+\startitem
+    Several bugs have been fixed. The \type {\hoffset} bug when \type {\pagedir TRT}
+    is gone, removing the need for an explicit fix to \type {\hoffset}. Also, the bug
+    causing \type {\fam} to fail for family numbers above 15 is fixed. A fair amount
+    of other minor bugs are fixed as well, most of these related to \type
+    {\tracingcommands} output.
+\stopitem
+
+\startitem
+    The scanner for direction specifications now allows an optional space after
+    the direction is completely parsed.
+\stopitem
+
+\startitem
+    The \type {^^} notation can come in five and six item repetitions also, to
+    insert characters that do not fit in the BMP.
+\stopitem
+
+\startitem
+    Glues {\it immediately after} direction change commands are not legal
+    breakpoints.
+\stopitem
+
+\startitem
+    Several mechanisms that need to be right|-|to|-|left aware have been
+    improved. For instance placement of formula numbers.
+\stopitem
+
+\startitem
+    The page dimension related primitives \type {\pagewidth} and \type {\pageheight} have
+    been promoted to core primitives.
+\stopitem
+
+\startitem
+    The primitives \type {\charwd}, \type {\charht}, \type {\chardp} and \type {\charit}
+    have been removed as we have the \ETEX\ variants \type {\fontchar*}.
+\stopitem
+
+\startitem
+    The two dimension registers \type {\pagerightoffset} and \type
+    {\pagebottomoffset} are now core primitives.
+\stopitem
+
+\startitem
+    The direction related primitives \type {\pagedir}, \type {\bodydir}, \type
+    {\pardir}, \type {\textdir}, \type {\mathdir} and \type {\boxdir} are now
+    core primitives.
+\stopitem
+
+\startitem
+    The promotion of primitives to core primitives as well as the removal of all
+    others means that the initialization namespace \type {aleph} is gone.
+\stopitem
+
+\stopitemize
+
+\stopsubsection
+
+\startsubsection[title=Changes from standard \WEBC]
+
+The compilation framework is \WEBC\ and we keep using that but without the
+\PASCAL\ to \CCODE\ step. This framework also provides some common features that
+deal with reading bytes from files and locating files in \TDS. This is what we do
+differently:
+
+\startitemize
+
+\startitem
+    There is no mltex support.
+\stopitem
+
+\startitem
+    There is no enctex support.
+\stopitem
+
+\startitem
+    The following commandline switches are silently ignored, even in non|-|\LUA\
+    mode: \type {-8bit}, \type {-translate-file}, \type {-mltex}, \type {-enc}
+    and \type {-etex}.
+\stopitem
+
+\startitem
+    The \type {\openout} whatsits are not written to the log file.
+\stopitem
+
+\startitem
+    Some of the so|-|called web2c extensions are hard to set up in non|-|\KPSE\
+    mode because \type {texmf.cnf} is not read: \type {shell-escape} is off (but
+    that is not a problem because of \LUA's \type {os.execute}), and the paranoia
+    checks on \type {openin} and \type {openout} do not happen (however, it is
+    easy for a \LUA\ script to do this itself by overloading \type {io.open}).
+\stopitem
+
+\startitem
+    The \quote{E} option does not do anything useful.
+\stopitem
+
+\stopitemize
+
+\stopsubsection
+
+\stopsection
+
+\startsection[title=Implementation notes]
+
+\startsubsection[title=Memory allocation]
+
+The single internal memory heap that traditional \TEX\ used for tokens and nodes
+is split into two separate arrays. Each of these will grow dynamically when
+needed.
+
+The \type {texmf.cnf} settings related to main memory are no longer used (these
+are: \type {main_memory}, \type {mem_bot}, \type {extra_mem_top} and \type
+{extra_mem_bot}). \quote {Out of main memory} errors can still occur, but the
+limiting factor is now the amount of RAM in your system, not a predefined limit.
+
+Also, the memory (de)allocation routines for nodes are completely rewritten. The
+relevant code now lives in the C file \type {texnode.c}, and basically uses a
+dozen or so \quote {avail} lists instead of a doubly|-|linked model. An extra
+function layer is added so that the code can ask for nodes by type instead of
+directly requisitioning a certain amount of memory words.
+
+Because of the split into two arrays and the resulting differences in the data
+structures, some of the macros have been duplicated. For instance, there are now
+\type {vlink} and \type {vinfo} as well as \type {token_link} and \type
+{token_info}. All access to the variable memory array is now hidden behind a
+macro called \type {vmem}.
+
+The implementation of the growth of two arrays (via reallocation) introduces a
+potential pitfall: the memory arrays should never be used as the left hand side
+of a statement that can modify the array in question.
+
+The input line buffer and pool size are now also reallocated when needed, and the
+\type {texmf.cnf} settings \type {buf_size} and \type {pool_size} are silently
+ignored.
+
+\stopsubsection
+
+\startsubsection[title=Sparse arrays]
+
+The \type {\mathcode}, \type {\delcode}, \type {\catcode}, \type {\sfcode}, \type {\lccode}
+and \type {\uccode} tables are now sparse arrays that are implemented in~\CCODE.
+They are no longer part of the \TEX\ \quote {equivalence table} and because each
+had 1.1 million entries with a few memory words each, this makes a major
+difference in memory usage.
+
+The \type {\catcode}, \type {\sfcode}, \type {\lccode} and \type {\uccode} assignments do
+not yet show up when using the etex tracing routines \type {\tracingassigns} and
+\type {\tracingrestores} (code simply not written yet).
+
+A side|-|effect of the current implementation is that \type {\global} is now more
+expensive in terms of processing than non|-|global assignments.
+
+See \type {mathcodes.c} and \type {textcodes.c} if you are interested in the
+details.
+
+Also, the glyph ids within a font are now managed by means of a sparse array and
+glyph ids can go up to index $2^{21}-1$.
+
+\stopsubsection
+
+\startsubsection[title=Simple single-character csnames]
+
+Single|-|character commands are no longer treated specially in the internals,
+they are stored in the hash just like the multiletter csnames.
+
+The code that displays control sequences explicitly checks if the length is one
+when it has to decide whether or not to add a trailing space.
+
+Active characters are internally implemented as a special type of multi|-|letter
+control sequences that uses a prefix that is otherwise impossible to obtain.
+
+\stopsubsection
+
+\startsubsection[title=Compressed format]
+
+The format is passed through zlib, allowing it to shrink to roughly half of the
+size it would have had in uncompressed form. This takes a bit more \CPU\ cycles
+but much less disk \IO, so it should still be faster.
+
+\stopsubsection
+
+\startsubsection[title=Binary file reading]
+
+All of the internal code is changed in such a way that if one of the \type
+{read_xxx_file} callbacks is not set, then the file is read by a C function using
+basically the same convention as the callback: a single read into a buffer big
+enough to hold the entire file contents. While this uses more memory than the
+previous code (that mostly used \type {getc} calls), it can be quite a bit faster
+(depending on your I/O subsystem).
+
+\stopsubsection
+
+\stopsection
+
+\stopchapter
+
+\stopcomponent
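
The reallocation pitfall mentioned under "Memory allocation" in the added file (a memory array should not appear on the left hand side of a statement that can itself grow that array) can be illustrated with a small C sketch. This is not code from the LuaTeX sources: the type `growable` and the functions `alloc_slot` and `link_new_slot` are hypothetical names made up for this illustration, and a plain array of `int` stands in for the real node memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical growable array standing in for one of the memory arrays.
   None of these names come from the LuaTeX sources. */
typedef struct {
    int   *data;      /* storage; may move when the array grows */
    size_t used;      /* number of slots handed out so far */
    size_t capacity;  /* number of slots currently allocated */
} growable;

/* Hand out the index of a fresh slot, growing the storage when it is full.
   Returns (size_t)-1 when the reallocation fails. After a successful call,
   any pointer into a->data that was computed earlier may be stale. */
static size_t alloc_slot(growable *a)
{
    assert(a != NULL);
    if (a->used == a->capacity) {
        size_t new_cap = a->capacity ? 2 * a->capacity : 4;
        int *moved = realloc(a->data, new_cap * sizeof *moved);
        if (moved == NULL)
            return (size_t)-1;
        a->data = moved;
        a->capacity = new_cap;
    }
    a->data[a->used] = 0;
    return a->used++;
}

/* Store the index of a newly allocated slot in slot `parent`.

   Risky form (do not use):
       a->data[parent] = (int)alloc_slot(a);
   The address of a->data[parent] may be computed before the call, and the
   call may move the storage, so the store could hit freed memory.

   Safe form: finish the call first, then index the array. */
static int link_new_slot(growable *a, size_t parent)
{
    size_t child = alloc_slot(a);   /* may reallocate a->data */
    if (child == (size_t)-1)
        return -1;
    a->data[parent] = (int)child;   /* a->data is read after the call */
    return (int)child;
}
```

A caller would start from `growable a = { NULL, 0, 0 };`, obtain a root with `alloc_slot(&a)`, and then chain further slots with `link_new_slot`, releasing the storage with `free(a.data)` at the end. The point of the sketch is only the ordering: the value that may trigger growth is computed into a temporary before the array is indexed.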