diff options
author | Context Git Mirror Bot <phg42.2a@gmail.com> | 2016-08-01 16:40:14 +0200 |
---|---|---|
committer | Context Git Mirror Bot <phg42.2a@gmail.com> | 2016-08-01 16:40:14 +0200 |
commit | 96f283b0d4f0259b7d7d1c64d1d078c519fc84a6 (patch) | |
tree | e9673071aa75f22fee32d701d05f1fdc443ce09c /doc/context/sources/general/manuals/mk/mk-goingbeta.tex | |
parent | c44a9d2f89620e439f335029689e7f0dff9516b7 (diff) | |
download | context-96f283b0d4f0259b7d7d1c64d1d078c519fc84a6.tar.gz |
2016-08-01 14:21:00
Diffstat (limited to 'doc/context/sources/general/manuals/mk/mk-goingbeta.tex')
-rw-r--r-- | doc/context/sources/general/manuals/mk/mk-goingbeta.tex | 343 |
1 files changed, 343 insertions, 0 deletions
diff --git a/doc/context/sources/general/manuals/mk/mk-goingbeta.tex b/doc/context/sources/general/manuals/mk/mk-goingbeta.tex new file mode 100644 index 000000000..9937a373d --- /dev/null +++ b/doc/context/sources/general/manuals/mk/mk-goingbeta.tex @@ -0,0 +1,343 @@ +% language=uk + +\startcomponent mk-goingbeta + +\environment mk-environment + +\doifmodeelse {tug} { + + \title {Lua\TeX\ going beta} + + \subject{by Hans Hagen \& Taco Hoekwater} + + This is Chapter~XI from \notabene {\CONTEXT, from \MKII\ to \MKIV}, a document + that describes our explorations, experiments and decisions made while + we develop \LUATEX. + + \blank[3*big] + +} { + + \chapter {Going beta} + +} + +\subject{introduction} + +We're closing in on the day that we will go beta with \LUATEX\ (end of July +2007). By now we have a rather good picture of its potential and to what +extend \LUATEX\ will solve some of our persistent problems. Let's first +summarize our reasons for and objectives with \LUATEX. + +\startitemize + +\item The world has moved from 8~bits to 32~bits and more, and this is +quite noticeable in the arena of fonts. Although \TYPEONE\ fonts could host +more than 256 glyphs, the associated technology was limited to 256. The advent +of \OPENTYPE\ fonts will make it easier to support multiple languages at the +same time without the need to switch fonts at awkward times. + +\item At the same time \UNICODE\ is replacing 8~bit based encoding vectors and +code pages (input regimes). The most popular and rather efficient \UTF8 encoding +has become a de factor standard in document encoding and interchange. + +\item Although we can do real neat tricks with \TEX, given some nasty programming, +we are touching the limits of its possibilities. In order for it to survive we +need to extend the engine but not at the cost of base compatibility. + +\item Coding solutions in a macro language is fine, but sometimes you long to a more +procedural approach. Manipulating text, handling \IO, interfacing \unknown\ the +technology moves on and we need to move along too. + +\stopitemize + +Hence \LUATEX: a merge of the mainstream traditional \TEX\ engines, stripped from +broken or incomplete features and opened up to an embedded \LUA\ scripting engine. + +We will describe the impact of this new engine by starting from its core components +reflected in the specific \LUA\ interface libraries. Missing here is embedded support +for \METAPOST, because it's not yet there (apart from the fact that we use \LUA\ to +convert \METAPOST\ graphics into \TEX). Also missing is the interfacing to the \PDF\ +backend, which is also on the agenda for later. Special extensions, for instance those +dealing with runtime statistics are also not discussed. Since we use \CONTEXT\ as +testbed, we will refer to the \LUATEX\ aware version of this macro package, \MKIV, but +most conclusions are rather generic. + +\subject{tex internals} + +In order to manipulate \TEX's data structures, we need access to all those registers. +Already early in the development, dimension and counters were accessible and when +token and node interfaces were implemented, those registers also were interfaced. + +Those who read the previous chapters will have noticed that we hardly discussed this +option. The reason is that we didn't yet needed that access much in order to implement +font support and list processing. After all, most of the data that we need to access and +manipulate is not in the registers at all. Information meant for \LUA\ can be stored +in \LUA\ data structures. In fact, the basic call + +\starttyping +\directlua 0 {some lua code} +\stoptyping + +has shown to be a pretty good starting point and the fact that one can print back to +the \TEX\ engine overcomes the need to store results in shared variables. + +\starttyping +\def\valueofpi{\directlua0{tex.sprint(math.pi()}} +\stoptyping + +The number of such direct calls is not that large anyway. More often a call to \LUA\ +will be initiated by a callback, i.e.\ a hook into the \TEX\ machinery. + +What will be the impact of access on \CONTEXT\ \MKIV ? This is yet hard to tell. In a +later stage of the development, when parts of the \TEX\ machinery will be rewritten in +order to get rid of the current global nature of many variables, we will gain more +control and access to \TEX's internals. Core functionality will be isolated, can be +extended and|/|or overloaded and at that moment access to internals is much more +needed. But certainly that will be beyond the current registers and variables. + +\subject{callbacks} + +These are the spine of \LUATEX: here both worlds communicate with each other. A callback +is a place in the \TEX\ kernel where some information is passed to \LUA\ and some result +is returned that is then used along the road. The reference manual mentions them all and +we will not repeat them here. Interesting is that in \MKIV\ most of them are used and for +tasks that are rather natural to their place and function. + +\starttyping +callback.register("tex_wants_to_do_this", + function but_use_lua_to_do_it_instead(a,b,c) + -- do whatever you like with a, b and c + return a, b, c + end +) +\stoptyping + +The impact of callbacks on \MKIV\ is big. It provides us a way to solve persistent +problems or reimplement existing solutions in more convenient ways. Because we tested +realistic functionality on real (moderately complex) documents using a pretty large +macro package, we can safely conclude that callbacks are quite efficient. Stepwise +\LUA\ kicks in in order to: + +\startitemize[packed] +\item influence the input medium so that it provides a sequence of \UTF\ characters +\item manipulate the stream of characters that will be turned into a list of tokens +\item convert the list of tokens into another list of tokens +\item enhance the list of nodes that will be turned into a typeset paragraph +\item tweak the mechanisms that come into play when lines are constructed +\item finalize the result that will end up in the output medium +\stopitemize + +Interesting is that manipulating tokens is less useful than it may look at first +sight. This has to do with the fact that it's (mostly) an expanded stream and at that +time we've lost some information or need to do quite some coding in order to analyze +the information and act upon it. + +Will \CONTEXT\ users see any of this? Chances are small that they will, although we +will provide hooks so that they can add special code themselves. Users activating +a callback has some danger, since it may overload already existing functionality. +Chaining functionality in a callback also has drawbacks, if only that one may be +confronted with already processed results and|/|or may destroy this result in +unpredictable ways. So, as with most low level \TEX\ features, \CONTEXT\ users will +work with more abstract interfaces. + +\subject{in- and output} + +In \MKIV\ we will no longer use the \KPSE\ library directly. Instead we use a +reimplementation in \LUA\ that not only is more efficient, but also more powerful: +it can read from \ZIP\ files, use protocols, be more clever in searching, reencodes +the input streams when needed, etc. The impact on \MKIV\ is large. Most \TEX\ code +that deals with input reencoding has gone away and is replaced by \LUA\ code. + +Although it is not directly related with reading from the input medium, in that stage +we also replaced verbatim handling code. Such (often messy) catcode related situations +are now handled more flexible, thanks to fast catcode table switching (a new +\LUATEX\ feature) and features like syntax highlighting can be made more neat. + +Buffers, a quite old but frequently used feature of \CONTEXT, are now kept in +memory instead of files. This speeds up runs. Auxiliary data, aka multi||pass +information, will no longer be stored in \TEX\ files but in \LUA\ files. In +\CONTEXT\ we have one such auxiliary file and in \MKII\ this file is selectively +filtered, but in \MKIV\ we will be less careful with memory and load all that +data once. Such speed improvements compensate the fact that \LUATEX\ is somewhat +slower than it's ancestor \PDFTEX. (Actually, the fact that \LUATEX\ is a bit +slower that \PDFTEX\ is mostly due to the fact that it has \ALEPH\ code on +board.) + +Users often wonder why there are so many temporary files, but these mostly relate +to \METAPOST\ support. These will go away once we have \METAPOST\ as a library. + +In a similar way support for \XML\ will be enriched. We already have experimental +loaders, filters and other code, and integration is on the agenda. Since \CONTEXT\ uses +\XML\ for some sub systems, this may have some impact. + +Other \IO\ related improvements involve debugging, error handling and logging. We can pop +up helpers and debug screens (\MKIV\ can produce \XHTML\ output and then launch a +browser). Users can choose more verbose logging of \IO\ and ask for log data to be +formatted in \XML. These parts need some additional work, because in the end we will +also reimplement and extend \TEX's error handling. + +Another consequence of this will be that we will be able to package \TEX\ more +conveniently. We can put all the files that are needed into a \ZIP\ file so that we only +need to ship that \ZIP\ file and a binary. + + +\subject{font readers} + +Handling \OPENTYPE\ involves more that just loading yet another font format. Of course +loading an \OPENTYPE\ file is a necessity but we need to do more. Such fonts come with +features. Features can involve replacing one representation of a character by another +one of combining sequences into other sequences and finaly resolving them to one or more +glyphs. + +Given the numerous options we will have to spend quite some time on extending \CONTEXT\ +with new features. Instead of defining more and more font instances (the traditional \TEX\ way +of doing things) we will will provides feature switching. In the end this will make +the often confusing font mechanisms less complex for the user to understand. Instead of +for instance loading an extra font (set) that provides old style numerals, we will +decouple this completely from fonts and provide it as yet another property of a piece +of text. The good news is that much of the most important machinery is alresady in +place (ligature building and such). Here we also have to decide what we let \TEX\ do +and what we do by processing node lists. For instance kerning and ligature building +can either be done by \TEX\ or by \LUA. Given the fact that \TEX\ does some juggling +with character kerning while determining hyphenation points, we can as well disable +\TEX's kerning and let \LUA\ handle it. Thereby \TEX\ only has to deal with paragraph +building. (After all, we need to leave \TEX\ some core functionality to deal with.) + +Another everlasting burden on macro writers and users is dealing with character +representations missing from a font. Of course, since we use named glyphs in +\CONTEXT\ \MKII\ already much of this can be hidden, but in \MKIV\ we can +create virtual fonts on the fly and keep thinking in terms of characters and +glyphs instead of dealing with boxes and other structures that don't go well with +for instance hyphenating words. + +This brings us to hyphenation, historically bound to fonts in traditional \TEX. This +dependency will go away. In \MKII\ we already ship \UTF8\ based patterns fore some time +and these can be conveniently used in \MKIV\ too. We experimented with using hyphenated +word lists and this looks promising. You may expect more advanced ways of dealing with +words, hyphenation and paragraph building in the near future. When we presented the +first version of \LUATEX\ a few years ago, we only had the basic \type {\directlua} call +available and could do a bit of string manipulation on the input. A fancy demo was to +color wrongly spelled words. Now we can do that more robustly on the node lists. + +Loading and preparing fonts for usage in \LUATEX\ or actually \MKIV\ because this depends +on the macro package takes some runtime. For this reason we introduces caching +into \MKIV: data that is used frequently is written to a cache and converted to \LUA\ +bytecode. Loading the converted files is incredibly fast. Of course there is aprice to +pay: disk space, but that comes cheap these days. Also, it may as well be compensated +by the fact that we can kick out many redundant files from the core \TEX\ distributions +(metric files for instance). + +\subject{tokens handlers} + +Do we need to handle tokens? So far in experimental \MKIV\ code we only used these hooks +to demonstrate what \TEX\ does with your characters. For a while we also constructed +token lists when we wanted to inject \type {\pdfliteral} code in node lists, but that +became obsolete when automatic string to token conversion was introduced in the node +conversion code. Now we inject literal whatsit nodes. It may be worth noticing that +playing with token lists gave us some good insight in bottlenecks because quite some +small table allocation and garbage collections goes on. + +\subject{nodes and attributes} + +These are the most promissing new features. In itself, nodes are not new, nor are +attributes. In some sense when we use primitives like \type {\hbox}, \type {\vskip}, +\type {\lastpenalty} the result is a node, but we can only control and inspect their +properties within hard coded bounds. We cannot really look into boxes, and the last +penalty may be obscured by a whatsit (a mark, a special, a write, etc.). Attributes +could be fakes with marks and macro bases stacks of states. Native attributes +are more powerful and each node can cary a truckload of them. + +With \LUATEX, out of a sudden we can look into \TEX's internals and manipulate +them. Although I don't claim to be a real expert on these internals, even after +over a decade of \TEX\ programming, I'm sometimes surprised what I found there. +When we are playing with these interfaces, we often run into situations +where we need to add much print statements to the \LUA\ code in order to find +out what \TEX\ is returning. It all has to do with the way \TEX\ collects +information and when it decides to act. In regular \TEX\ much goes unnoticed, but +when one has for instance a callback that deals with page building there are many +places where this gets called and some of these places need special treatment. + +Undoubtely this will have a huge impact on \CONTEXT\ \MKIV. Instead of parsing +an input stream, we can now manipulate node lists in order to achieve (slight) +inter||character spacing which is often needed in sectioning titles. The nice +thing about this new approach is that we no longer have interference from +characters that need multiple tokens (input characters) in order to be +constructed, which complicates parsing (needed to split glyphs in \MKII). + +Signaling where to letterspace is done with the mentioned attributes. There can be +many of them and they behave like fonts: they obey grouping, travel with the nodes +and are therefore insensitive for box and page splitting. They can be set at the +\TEX\ end but needs to be handled at the \LUA\ side. One may wonder what kind +of macro packages would be around when \TEX\ has attributes right from its start. + +In \MKII\ letterspacing is handled by parsing the input and injecting skips. +Another approach would be to use a font where each character has more kerns or space +around it (a virtual font can do that). But that would not only demand knowledge of +what fonts need that that treatment, but also many more fonts and generating them is +no fun for users. In \PDFTEX\ there is a letterspace feature, where virtual fonts +are generated on the fly, and with such an approach one has to compensate for the +first and last character in a line, in order to get rid of the left- and +rightmost added space (being part of the glyph). The solution where nodes are +manipulated does put that burden upon the user. + +Another example of node processing is adding specific kerns around some punctuation +symbols, as is custom in French. You don't want to know what it takes to do that +in traditional \TEX, but if I mention the fact that colons become active characters +you can imagine the nightmare. Hours of hacking and maybe even days of dealing with +mechanisms that make these active colons workable in places where colons are used +for non text are now even more wasted time if you consider that it takes a few lines +of code in \MKIV. Currently we let \CONTEXT\ support both good old \TEX\ +(represented by \PDFTEX), \XETEX\ (a \UNICODE\ and \OPENTYPE\ aware variant) and +\LUATEX\ by shared and dedicated \MKII\ and \MKIV\ code. + +Vertical spacing can be a pain. Okay, currently \MKII\ has a rather sophisticated way to +deal with vertical spacing in ways that give documents a consistent look and feel, but +every now and then we run into border cases that cannot be dealt with simply because +we cannot look back in time. This is needed because \TEX\ adds content to the main +vertical list and then it's gone from our view. Take for instance section titles. We don't +want them dangling at the bottom of a page. But at the same time we want itemized lists +to look well, i.e.\ keep items together in some situations. Graphics that follow a section +title pose similar problems. Adding penalties helps but these may come too late, or +even worse, they may obscure previous skips which then cannot be dealt with by successive +skips. To simplify the problem: take a skip of 12pt, followed by a penalty, followed by +another skip of 24pt. In \CONTEXT\ this has to become a penalty followed by one skip +of 24pt. + +Dealing with this in the page builder is rather easy. Ok, due to the way \TEX\ adds +content to the page stream, we need to collect, treat and flush, but currently this +works all right. In \CONTEXT\ \MKIV\ we will have skips with three additional properties: +priority over other skips, penalties, and a category (think of: ignore, force, +replace, add). + +When we experimented with this kind of things we quickly decided that additional +experiments with grid snapping also made sense. These mechanisms are among the more +complex ones on \CONTEXT. A simple snap feature took a few lines of \LUA\ code and +hooking it into \MKIV\ was not that complex either. Eventually we will reimplement +all vertical spacing and grid snapping code of \MKII\ in \LUA. Because one of +\CONTEXT\ column mechanism is grid aware, we may as well adath that and|/|or implement +an additional mechanism. + +A side effect of being able to do this in \LUATEX\ is that the code taken from \PDFTEX\ +is cleaned up: all (recently added) static kerning code is removed (inter||character +spacing, pre- and post character kerning, experimental code that can fix the heights +and depths of lines, etc.). The core engine will only deal with dynamic features, +like \HZ\ and protruding. + +So, the impact on \MKIV\ of nodes and attributes is pretty big! Horizontal spacing isues, +vertical spacing, grid snapping are just a few of the things we will reimplement. Other +things are line numbering, multiple content streams with synchronization, both are +already present in \MKII\ but we can do a better job in \MKIV. + +\subject{generic code} + +In the previous text \MKIV\ was mentioned often, but some of the features are rather +generic in nature. So, how generic can interfaces be implemented? When the \MKIV\ code +has matured, much of the \LUA\ and glue||to||\TEX\ code will be generic in nature. +Eventually \CONTEXT\ will become a top layer on what we internally call \METATEX, a +collection of kernel modules that one can use to build specialized macro packages. +To some extent \METATEX\ can be for \LUATEX\ what plain is for \TEX. But if and how +fast this will be reality depends on the amount of time that we (and other members of +the \CONTEXT\ development team) can allocate to this. + +\stopcomponent |