2019-12-30 19:16:00

author: Hans Hagen <pragma@wxs.nl> 2019-12-30 20:42:59 +0100
committer: Context Git Mirror Bot <phg@phi-gamma.net> 2019-12-30 20:42:59 +0100
commit: 54732448eb933607bdcb11a457756741dc4e0b44 (patch)
tree: d0f312dd29af54ee85d89f6d6f242be7ee6b5454 /doc/context/sources/general/manuals/luametatex/luametatex-modifications.tex
parent: ede5a2aae42ff502be35d800e97271cf0bdc889b (diff)
download: context-54732448eb933607bdcb11a457756741dc4e0b44.tar.gz
1 files changed, 440 insertions, 0 deletions
diff --git a/doc/context/sources/general/manuals/luametatex/luametatex-modifications.tex b/doc/context/sources/general/manuals/luametatex/luametatex-modifications.tex
new file mode 100644
index 000000000..dca6c3781
--- /dev/null
+++ b/doc/context/sources/general/manuals/luametatex/luametatex-modifications.tex
@@ -0,0 +1,440 @@
+% language=uk
+
+\environment luametatex-style
+
+\startcomponent luametatex-modifications
+
+\startchapter[reference=modifications,title={The original engines}]
+
+\startsection[title=The merged engines]
+
+\startsubsection[title=The rationale]
+
+\topicindex {engines}
+\topicindex {history}
+
+The first version of \LUATEX, made by Hartmut after we discussed the possibility
+of an extension language, only had a few extra primitives and it was largely the
+same as \PDFTEX. It was presented to the public in 2005. As part of the Oriental
+\TEX\ project, Taco merged substantial parts of \ALEPH\ into the code and some
+more primitives were added. Then we started more fundamental experiments. After
+many years, when the engine had become more stable, the decision was made to
+clean up the rather hybrid nature of the program. This means that some primitives
+were promoted to core primitives, often with a different name, and that others
+were removed. This also made it possible to start cleaning up the code base. In
+\in {chapter} [enhancements] we discussed some new primitives; here we will cover
+most of the adapted ones.
+
+For more than a decade new functionality was added stepwise, and after 10 years
+the more or less stable version 1.0 was presented. But we continued and after
+some 15 years the \LUAMETATEX\ follow up entered its first testing stage. But
+before details about the engine are discussed in successive chapters, we first
+summarize where we started from. Keep in mind that in \LUAMETATEX\ we have a bit
+less than in \LUATEX, so this section differs from the one in the \LUATEX\
+manual.
+
+Besides the expected changes caused by new functionality, there are a number of
+not|-|so|-|expected changes. These are sometimes a side|-|effect of a new
+(conflicting) feature, or, more often than not, a change necessary to clean up
+the internal interfaces. These will also be mentioned.
+
+\stopsubsection
+
+\startsubsection[title=Changes from \TEX\ 3.1415926]
+
+\topicindex {\TEX}
+
+Of course it all starts with traditional \TEX. Even if we started with \PDFTEX,
+most still comes from the original Knuthian \TEX. But we deviate a bit.
+
+\startitemize
+
+\startitem
+    The current code base is written in \CCODE, not \PASCAL. The original \CWEB\
+    documentation is kept when possible and not wrapped in tagged comments. As a
+    consequence instead of one large file plus change files, we now have multiple
+    files organized in categories like \type {tex}, \type {luaf}, \type
+    {languages}, \type {fonts}, \type {libraries}, etc. There are some artifacts
+    of the conversion to \CCODE, but these got (and get) removed stepwise. The
+    documentation, which actually comes from the mix of engines (via so called
+    change files) is kept as much as possible. Of course we want to stay as close
+    as possible to the original so that the documentation of the fundamentals
+    behind \TEX\ by Don Knuth still applies. However, because we use \CCODE, some
+    documentation is a bit off. Also, most global variables are now collected in
+    structures, but the original names were kept. There are lots of so called
+    macros too.
+\stopitem
+
+\startitem
+    See \in {chapter} [languages] for many small changes related to paragraph
+    building, language handling and hyphenation. The most important change is
+    that adding a brace group in the middle of a word (like in \type {of{}fice})
+    does not prevent ligature creation. Also, the hyphenation, ligature building
+    and kerning have been split so that we can hook in alternative or extra code
+    wherever we like. There are various options to control discretionary
+    injection and related penalties are now integrated in these nodes. Language
+    information is now bound to glyphs. The number of languages in \LUAMETATEX\
+    is smaller than in \LUATEX.
+\stopitem
+
+\startitem
+    There is no pool file, all strings are embedded during compilation. This also
+    removed some memory constraints. We kept token and node memory management
+    because it is convenient and efficient but parts were reimplemented in order
+    to remove some constraints. Token memory management is largely the same.
+\stopitem
+
+\startitem
+    The specifier \type {plus 1 fillll} does not generate an error. The extra
+    \quote {l} is simply typeset.
+\stopitem
+
+\startitem
+    The upper limit to \prm {endlinechar} and \prm {newlinechar} is 127.
+\stopitem
+
+\startitem
+    Because the backend is not built|-|in, the magnification (\prm {mag})
+    primitive is not doing anything. A \type {shipout} just discards the content
+    of the given box. The write related primitives have to be implemented in the
+    used macro package using \LUA. None of the \PDFTEX\ derived primitives is
+    present.
+\stopitem
+
+\startitem
+    There is more control over some (formerly hard|-|coded) math properties. In fact,
+    there is a whole extra bit of math related code because we need to deal with
+    \OPENTYPE\ fonts.
+\stopitem
+
+\startitem
+    The \type {\outer} and \type {\long} prefixes are silently ignored. It is
+    permitted to use \type {\par} in math.
+\stopitem
+
+\startitem
+    Because there is no font loader, a \LUA\ variant is free to support the
+    \OMEGA\ \type {ofm} file format or not. As there are hardly any such fonts
+    it probably makes no sense.
+\stopitem
+
+\startitem
+    The lack of a backend means that some primitives related to it are not
+    implemented. This is no big deal because it is possible to use the scanner
+    library to implement them as needed, which depends on the macro package and
+    backend.
+\stopitem
+
+\startitem
+    When detailed logging is enabled more detail is output with respect to what
+    nodes are involved. This is a side effect of the core nodes having more
+    detailed subtype information. The benefit of more detail outweighs any wish
+    to be byte compatible in the logging. One can always write additional logging
+    in \LUA.
+\stopitem
+
+\stopitemize
+
+\stopsubsection
+
+\startsubsection[title=Changes from \ETEX\ 2.2]
+
+\topicindex {\ETEX}
+
+As \ETEX\ is the de|-|facto standard extension, we of course provide its
+features, but with a few small adaptations.
+
+\startitemize
+
+\startitem
+    The \ETEX\ functionality is always present and enabled so the prepended
+    asterisk or \type {-etex} switch for \INITEX\ is not needed.
+\stopitem
+
+\startitem
+    The \TEXXET\ extension is not present, so the primitives \type
+    {\TeXXeTstate}, \type {\beginR}, \type {\beginL}, \type {\endR} and \type
+    {\endL} are missing. Instead we used the \OMEGA|/|\ALEPH\ approach to
+    directionality as a starting point, although it has been changed quite a bit,
+    so that we're probably not that far from \TEXXET.
+\stopitem
+
+\startitem
+    Some of the tracing information that is output by \ETEX's \prm
+    {tracingassigns} and \prm {tracingrestores} is not there. Also keep in mind
+    that tracing doesn't involve what \LUA\ does.
+\stopitem
+
+\startitem
+    Register management in \LUAMETATEX\ uses the \OMEGA|/|\ALEPH\ model, so the
+    maximum value is 65535 and the implementation uses a flat array instead of
+    the mixed flat & sparse model from \ETEX.
+\stopitem
+
+\startitem
+    Because we don't use change files on top of original \TEX, the integration of
+    \ETEX\ functionality is a bit more natural, code wise.
+\stopitem
+
+\stopitemize
+
+\stopsubsection
+
+\startsubsection[title=Changes from \PDFTEX\ 1.40]
+
+\topicindex {\PDFTEX}
+
+Because we want to produce \PDF\ the most natural starting point was the popular
+\PDFTEX\ program. We inherited the stable features, dropped most of the
+experimental code and promoted some functionality to core \LUATEX\ functionality
+which in turn triggered renaming primitives. However, as the backend was dropped,
+not that much from \PDFTEX\ is present any more. Basically all we now inherit
+from \PDFTEX\ is expansion and protrusion but even that has been adapted. So
+don't expect \LUAMETATEX\ to be compatible.
+
+\startitemize
+
+\startitem
+    The experimental primitives \lpr {ifabsnum} and \lpr {ifabsdim} have been
+    promoted to core primitives.
+\stopitem
+
+\startitem
+    The primitives \lpr {ifincsname}, \lpr {expanded} and \lpr {quitvmode}
+    have become core primitives.
+\stopitem
+
+\startitem
+    As the hz (expansion) and protrusion mechanisms are part of the core the
+    related primitives \lpr {lpcode}, \lpr {rpcode}, \lpr {efcode}, \lpr
+    {leftmarginkern}, \lpr {rightmarginkern} are promoted to core primitives. The
+    two commands \lpr {protrudechars} and \lpr {adjustspacing} control these
+    processes.
+\stopitem
+
+\startitem
+    In \LUAMETATEX\ three extra primitives can be used to overload the font
+    specific settings: \lpr {adjustspacingstep} (max: 100), \lpr
+    {adjustspacingstretch} (max: 1000) and \lpr {adjustspacingshrink} (max: 500).
+\stopitem
+
+\startitem
+    The hz optimization code has been partially redone so that we no longer need
+    to create extra font instances. The front- and backend have been decoupled
+    and the glyph and kern nodes carry the used values. In \LUATEX\ that made a
+    more efficient generation of \PDF\ code possible. It also resulted in much
+    cleaner code. The backend code is gone, but of course the information is
+    still carried around.
+\stopitem
+
+\startitem
+    When \lpr {adjustspacing} has value~2, hz optimization will be applied to
+    glyphs and kerns. When the value is~3, only glyphs will be treated. A value
+    smaller than~2 disables this feature. With a value of~1, font expansion is
+    applied after \TEX's normal paragraph breaking routines have broken the
+    paragraph into lines. In this case, line breaks are identical to standard
+    \TEX\ behavior (as with \PDFTEX). But \unknown\ this is a left|-|over from
+    the early days of \PDFTEX\ when this feature was part of a research topic. At
+    some point level~1 might be dropped from \LUAMETATEX.
+\stopitem
+
+\startitem
+    When \lpr {protrudechars} has a value larger than zero characters at the edge
+    of a line can be made to hang out. A value of~2 will take the protrusion into
+    account when breaking a paragraph into lines. A value of~3 will try to deal
+    with right|-|to|-|left rendering; this is still an experimental feature.
+\stopitem
+
+\startitem
+    The pixel multiplier dimension \lpr {pxdimen} has been inherited as a core
+    primitive.
+\stopitem
+
+\startitem
+    The primitive \lpr {tracingfonts} is now a core primitive but doesn't relate
+    to the backend.
+\stopitem
+
+\stopitemize
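+
+As a minimal sketch of the settings described above: the values below are
+illustrative assumptions, not recommendations, and the fonts still need
+expansion and protrusion data, which is up to the macro package.
+
+\starttyping
+\adjustspacing        2  % hz applied to glyphs and kerns
+\adjustspacingstep    5  % overloads the font specific step
+\adjustspacingstretch 20 % overloads the font specific stretch
+\adjustspacingshrink  20 % overloads the font specific shrink
+\protrudechars        2  % protrusion is taken into account when breaking lines
+\stoptyping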
+
+\stopsubsection
+
+\startsubsection[title=Changes from \ALEPH\ RC4]
+
+\topicindex {\ALEPH}
+
+In \LUATEX\ we took the 32 bit aspects and much of the directional mechanisms and
+merged them into the \PDFTEX\ code base as a starting point for further development.
+Then we simplified directionality, fixed it and opened it up. In \LUAMETATEX\ not
+that much of the latter is left. We only have two horizontal directions. Instead
+of vertical directions we introduce an orientation model bound to boxes.
+
+The already reduced|-|to|-|four set of directions now only has two members:
+left|-|to|-|right and right|-|to|-|left. They don't do much as it is the backend
+that has to deal with them. When paragraphs are constructed a change in
+horizontal direction is irrelevant for calculating the dimensions. So, basically
+most of what we do is registering state and passing that on until the backend can
+do something with it.
+
+Here is a summary of inherited functionality:
+
+\startitemize
+
+\startitem
+    The \type {^^} notation has been extended: after \type {^^^^} four
+    hexadecimal characters are expected and after \type {^^^^^^} six hexadecimal
+    characters have to be given. The original \TEX\ interpretation is still valid
+    for the \type {^^} case but the four and six variants do no backtracking,
+    i.e.\ when they are not followed by the right number of hexadecimal digits
+    they issue an error message. Because \type {^^^} is a normal \TEX\ case, we
+    don't support an odd number of carets such as \type {^^^^^} either.
+\stopitem
+
+\startitem
+    Glues {\it immediately after} direction change commands are not legal
+    breakpoints. There is a bit more sanity testing for the direction state.
+\stopitem
+
+\startitem
+    The placement of math formula numbers is direction aware and adapts
+    accordingly. Boxes carry directional information but rules don't.
+\stopitem
+
+\startitem
+    There are no direction related primitives for page and body directions. The
+    paragraph, text and math directions are specified using primitives that
+    take a number.
+\stopitem
+
+\stopitemize
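+
+The extended notation can be illustrated with a small sketch. The code points
+are arbitrary examples and we assume lowercase hexadecimal digits, as in
+traditional \TEX:
+
+\starttyping
+^^41         % two hexadecimal digits: character 0x41 (A)
+^^^^0041     % four hexadecimal digits: the same character
+^^^^^^01f600 % six hexadecimal digits: character 0x1F600
+\stoptyping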
+
+\stopsubsection
+
+\startsubsection[title=Changes from standard \WEBC]
+
+\topicindex {\WEBC}
+
+The \LUAMETATEX\ codebase is not dependent on the \WEBC\ framework. The
+interaction with the file system and \TDS\ is up to \LUA. There still might be
+traces but eventually the code base should be lean and mean. The \METAPOST\
+library is coded in \CWEB\ and in order to be independent from related tools,
+conversion to \CCODE\ is done with a \LUA\ script run by, surprise, \LUAMETATEX.
+
+\stopsubsection
+
+\stopsection
+
+\startsection[title=Implementation notes]
+
+\startsubsection[title=Memory allocation]
+
+\topicindex {memory}
+
+The single internal memory heap that traditional \TEX\ used for tokens and nodes
+is split into two separate arrays. Each of these will grow dynamically when
+needed. Internally a token or node is an index into these arrays. This permits
+an efficient implementation and is also responsible for the performance of
+the core. The original documentation in \TEX\ The Program mostly applies!
+
+\stopsubsection
+
+\startsubsection[title=Sparse arrays]
+
+The \prm {mathcode}, \prm {delcode}, \prm {catcode}, \prm {sfcode}, \prm {lccode}
+and \prm {uccode} (and the new \lpr {hjcode}) tables are now sparse arrays that
+are implemented in~\CCODE. They are no longer part of the \TEX\ \quote
+{equivalence table} and because each had 1.1 million entries with a few memory
+words each, this makes a major difference in memory usage. Performance is not
+really hurt by this.
+
+The \prm {catcode}, \prm {sfcode}, \prm {lccode}, \prm {uccode} and \lpr {hjcode}
+assignments don't show up when using the \ETEX\ tracing routines \prm
+{tracingassigns} and \prm {tracingrestores} but we don't see that as a real
+limitation. It also saves a lot of clutter.
+
+A side|-|effect of the current implementation is that \prm {global} assignments
+are now more expensive in terms of processing than non|-|global ones but not many
+users will notice that.
+
+The glyph ids within a font are also managed by means of a sparse array as glyph
+ids can go up to index $2^{21}-1$ but these are never accessed directly so again
+users will not notice this.
+
+\stopsubsection
+
+\startsubsection[title=Simple single|-|character csnames]
+
+\topicindex {csnames}
+
+Single|-|character commands are no longer treated specially in the internals,
+they are stored in the hash just like the multiletter csnames.
+
+The code that displays control sequences explicitly checks if the length is one
+when it has to decide whether or not to add a trailing space.
+
+Active characters are internally implemented as a special type of multi|-|letter
+control sequences that uses a prefix that is otherwise impossible to obtain.
+
+\stopsubsection
+
+\startsubsection[title=Binary file reading]
+
+\topicindex {files+binary}
+
+All of the internal code is changed in such a way that if one of the \type
+{read_xxx_file} callbacks is not set, then the file is read by a \CCODE\ function
+using basically the same convention as the callback: a single read into a buffer
+big enough to hold the entire file contents. While this uses more memory than the
+previous code (that mostly used \type {getc} calls), it can be quite a bit faster
+(depending on your \IO\ subsystem). So far we never had issues with this approach.
+
+\stopsubsection
+
+\startsubsection[title=Tabs and spaces]
+
+\topicindex {space}
+\topicindex {newline}
+
+We conform to the way other \TEX\ engines handle trailing tabs and spaces. For
+decades trailing tabs and spaces (before a newline) were removed from the input
+but this behaviour was changed in September 2017 to only handle spaces. We are
+aware that this can introduce compatibility issues in existing workflows but
+because we don't want too many differences with upstream \TEXLIVE\ we just follow
+up on that patch (which is a functional one and not really a fix). It is up to
+macro package maintainers to deal with possible compatibility issues and in
+\LUAMETATEX\ they can do so via the callbacks that deal with reading from files.
+
+The previous behaviour was a known side effect and (as that kind of input
+normally comes from generated sources) it was normally dealt with by adding a
+comment token to the line in case the spaces and|/|or tabs were intentional and
+to be kept. We are aware of the fact that this contradicts some of our other
+choices, but consistency with other engines takes precedence here. We still
+stick to our view that at the log level we can be (and might be) more
+incompatible. We already expose some more details anyway.
+
+\stopsubsection
+
+\startsubsection[title=Logging]
+
+The information that goes into the log file can be different from \LUATEX, and
+might even differ a bit more in the future. The main reason is that inside the
+engine we have more granularity, which for instance means that we output subtype
+related information when nodes are printed. Of course we could have offered a
+compatibility mode but it serves no purpose. Over time there have been many
+subtle changes to control logs in the \TEX\ ecosystems so another one is
+bearable.
+
+In a similar fashion, the behaviour is a bit different when \TEX\ expects
+input, which in turn is a side effect of removing the interception of \type {*}
+and \type {&} which made for cleaner code (quite a bit had accumulated as a side
+effect of continuous adaptations in the \TEX\ ecosystems). There was already code
+that was never executed, simply as a side effect of the way \LUATEX\ initializes
+itself (one needs to enable classes of primitives for instance).
+
+\stopsubsection
+
+\stopsection
+
+\stopchapter
+
+\stopcomponent