% language=uk

\environment mk-environment

\startcomponent mk-halfway

\chapter{Halfway}

\subject{introduction}

We are about halfway into the \LUATEX\ project now. At the time of
writing this document we are only a few days away from version
0.40 (the version for Bacho\TEX\ and \TEX\ Live), and around Euro\TEX\
2009 we will release version 0.50. Starting with version 0.30
(which we released around the meeting of the Korean \TEX\ User
Group), all releases numbered in steps of 0.10 are supported and usable
for (controlled) production work. We have always stated that all
interfaces may change until they are documented to be stable, and
we expect to document the first stable parts in version 0.50.
Currently we plan to release version 1.00 sometime in 2012, 30
years after \TEX82, with 0.60 and 0.70 in 2010 and 0.80 and 0.90 in
2011. But of course things might turn out differently.

In this update we assume that the reader knows what \LUATEX\ is and
what it does.

\subject{design principles}

We started this project because we wanted an extensible engine.
We chose \LUA\ as the glue language. We do not regret this choice, as it
permitted us to open up \TEX's internals reasonably well. There have been
a few extensions to \TEX\ itself, and there will be a few more, but none
of them is fundamental in the sense that it influences
typesetting. Extending \TEX\ in that area is up to the macro package
writer, who can use the \LUA\ language combined with \TEX\ macros. In a
similar fashion we made some decisions about which \LUA\ libraries are
included. What we have now is what you will get. Future versions of
\LUATEX\ will have the ability to load additional libraries, but these
will not be part of the core distribution. There is simply too much
choice, and we do not want to enter endless discussions about what is
best. More flexibility would also add a maintenance burden that we
do not want. Portability has always been a virtue of \TEX\ and we want
to keep it that way.

\subject{lua scripting}

Before 0.40 there could be multiple instances of the \LUA\ interpreter
active at the same time, but we have now decided to limit the number of
instances to just one. The reason is simple: sharing all functionality
among multiple \LUA\ interpreter instances does more harm than good, and
\LUA\ has enough possibilities to create namespaces anyway. The new
limit also simplifies the internal source code, which is a good
thing. While the \type {\directlua} command is now more or less frozen, we
might extend the functionality of \type {\latelua}, especially in
relation to what is possible in the backend. Both commands still
accept a number, but this now refers to an index in a user||definable
table of names; the corresponding name is shown when an error occurs.

\subject {input and output}

The current \LUATEX\ release permits multiple instances of \KPSE,
which can be handy if you mix, for instance, a macro package and
\MPLIB, as both have their own \quote{progname} (and engine) namespace.
However, right from the start it has been possible to bring most input
under \LUA\ control and one can overload the usual \KPSE\
mechanisms. This is what we do in \CONTEXT\ (and probably only there).

Logging, etc., is also under \LUA\ control. There is no support for
writing to \TEX's opened output channels except for the log and the
terminal. We are investigating limited write control to numbered
channels but this has a very low priority.

Reading from zip files and sockets has been available
for a while now.

Among the first things that have been implemented is a mechanism for
managing sets of category codes (\type{\catcode} values), although this
is not really needed for practical usage, as we aim at full compatibility.
It just makes printing back to \TEX\ from \LUA\ a bit more comfortable.

\subject {interface to tex}

Registers can always be accessed from \LUA\ by number and (when
defined at the \TEX\ end) also by name. When a register is written
from \LUA, grouping is honored. Most internal registers can be accessed
(mostly read-only). Box registers can be manipulated, but users
need to be aware of potential memory management issues.
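To make \quote{grouping is honored} concrete, the following toy model in Python mimics the save stack that \TEX\ uses for local assignments: the first local change of a register inside a group records the old value, which is restored when the group ends, while a global change survives the end of the group. It is a sketch under our own assumptions; the class \type{Registers} and its methods are hypothetical names and do not reflect the \LUA\ interface of \LUATEX.

\starttyping
class Registers:
    """Toy model of grouping-aware count registers (hypothetical names).

    Each register has a value and the group level at which it was last
    assigned; level 1 is the outermost level, used for global assignments.
    """

    def __init__(self, size=256):
        self.value = [0] * size
        self.level = [1] * size
        self.cur_level = 1
        # entries are (register, old_value, old_level); None marks a group start
        self.save_stack = []

    def begin_group(self):
        self.cur_level += 1
        self.save_stack.append(None)

    def end_group(self):
        if self.cur_level == 1:
            raise RuntimeError("end_group without matching begin_group")
        while True:
            entry = self.save_stack.pop()
            if entry is None:
                break
            reg, old_value, old_level = entry
            if self.level[reg] == 1:
                # a global assignment happened meanwhile: keep it
                continue
            self.value[reg] = old_value
            self.level[reg] = old_level
        self.cur_level -= 1

    def set_local(self, reg, val):
        if self.level[reg] != self.cur_level:
            # first change at this level: remember the old state
            self.save_stack.append((reg, self.value[reg], self.level[reg]))
            self.level[reg] = self.cur_level
        self.value[reg] = val

    def set_global(self, reg, val):
        self.value[reg] = val
        self.level[reg] = 1


if __name__ == "__main__":
    regs = Registers()
    regs.set_local(0, 5)
    regs.begin_group()
    regs.set_local(0, 10)   # local: undone at the end of the group
    regs.set_global(1, 7)   # global: survives the end of the group
    regs.end_group()
    print(regs.value[0], regs.value[1])  # prints 5 7
\stoptyping

Keeping the saved state on a stack, rather than copying all registers at each group start, means that only registers that actually change inside a group cost anything.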

There will be provisions to use the primitives related to setting
codes (lowercase codes and such). Some of this functionality will be
available in version 0.50.

\subject {fonts}

The internal font model has been extended to the full \UNICODE\
range. There are readers for \OPENTYPE, \TYPEONE, and traditional
\TEX\ fonts. Users can create virtual fonts on the fly and have
complete control over what goes into \TEX. Font-specific features
can either be mapped onto the traditional ligature and kerning
mechanisms or be implemented in \LUA.

We use code from \FONTFORGE\ that has been stripped down to get a
smaller code base. Using the \FONTFORGE\ code has the advantage
that \LUATEX\ sees the fonts in the same way as this editor does,
which makes debugging easier and developing fonts more
convenient.

The interface is already rather stable but some of the keys in loaded
tables might change. Almost all of the font interface will be stable
in version 0.50.

\subject {tokens}

It is possible to intercept tokenization. Once intercepted, a token
table can be manipulated before being piped back into \LUATEX.  We
still support \OMEGA's translation processes but that might become
obsolete at some point.

Future versions of \LUATEX\ might use \LUA's so-called \quote {user data}
concept, but the interface will mostly be the same. Because of this
pending change, this subsystem will not be frozen yet in version 0.50.

\subject {nodes}

Users have access to the node lists at various stages. This interface
has already been quite stable for some time, but some cleanup might
still take place. Currently node memory still has to be managed
explicitly, but eventually we will make releasing unused nodes automatic.

We have plans for keeping more extensive information within
a paragraph (initial whatsit) so that one can build alternative
paragraph builders in \LUA. There will be a vertical packer (in
addition to the horizontal packer) and we will open up the page
builder (inserts etc.). The basic interface will be stable in version
0.50.

\subject {attributes}

This new kid on the block is now available for most subsystems, but
we might change some of its default behaviour. As of 0.40 you can
also use negative values for attributes. The original idea of
using negative values for special purposes has been abandoned, as
we are considering a secondary, limited (but faster and more efficient)
variant instead. The basic principles will be stable around version 0.50,
but we reserve the freedom to change some aspects of attributes
until we reach version 1.00.

\subject {hyphenation}

In \LUATEX\ we have clearly separated hyphenation, ligature
building and kerning. Pattern management and hyphenation have been
reimplemented from scratch but use the same principles as
traditional \TEX. Patterns can be loaded at run time and exceptions
are quite efficient now. There are a few extensions, like embedded
discretionaries in exceptions and pre- as well as posthyphens.
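The principle shared with traditional \TEX\ is Frank Liang's pattern scheme as used by \TEX82: digits inside a pattern assign a score to the positions between letters, the highest score from all matching patterns wins, and an odd final score permits a break. The Python sketch below illustrates only that principle. The function names are hypothetical, the small pattern list is invented for the example rather than taken from a real pattern file, and \LUATEX's own implementation is not reproduced here. The \type{left_min} and \type{right_min} arguments play the role of \type{\lefthyphenmin} and \type{\righthyphenmin} (2 and 3 in plain \TEX).

\starttyping
def parse_pattern(pattern):
    """Split a TeX-style pattern like 'hen5at' into letters and digits.

    Returns (letters, digits) where digits[i] is the score for the
    position just before letters[i]; len(digits) == len(letters) + 1.
    """
    letters, digits = [], [0]
    for ch in pattern:
        if ch.isdigit():
            digits[-1] = int(ch)
        else:
            letters.append(ch)
            digits.append(0)
    return "".join(letters), digits


def hyphenate(word, patterns, left_min=2, right_min=3):
    """Return the offsets in word where a hyphen may be inserted."""
    text = "." + word.lower() + "."       # '.' marks the word boundaries
    scores = [0] * (len(text) + 1)         # scores[k]: position before text[k]
    for start in range(len(text)):
        for end in range(start + 1, len(text) + 1):
            digits = patterns.get(text[start:end])
            if digits is not None:
                for i, d in enumerate(digits):
                    # the highest value from all matching patterns wins
                    scores[start + i] = max(scores[start + i], d)
    # offset j in word is position j + 1 in text; odd scores permit a break
    return [j for j in range(left_min, len(word) - right_min + 1)
            if scores[j + 1] % 2 == 1]


def show(word, points):
    """Insert hyphens at the given offsets."""
    pieces, last = [], 0
    for p in points:
        pieces.append(word[last:p])
        last = p
    pieces.append(word[last:])
    return "-".join(pieces)


# A tiny pattern set, invented for illustration only.
PATTERNS = dict(parse_pattern(p) for p in
                ["hy3ph", "he2n", "hena4", "hen5at",
                 "1na", "n2at", "1tio", "2io", "o2n"])

if __name__ == "__main__":
    points = hyphenate("hyphenation", PATTERNS)
    print(points, show("hyphenation", points))  # [2, 6] hy-phen-ation
\stoptyping

Note how \type{hen5at} overrides the even score of \type{n2at} at the same position: because the maximum wins, later and more specific patterns can both allow and forbid breaks.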

On the agenda is fixing some \quote{hyphenchar}-related issues, and future
releases might deal with compound words as well. There are some
known limitations that we hope to have solved in version 0.50.

\subject {images}

Image handling is part of the backend. This part of the \PDFTEX\
code has been rewritten and can now be controlled from \LUA. There
are already a few more options than in \PDFTEX\ (simple
transformations). The image code will also be integrated in the
virtual font handler.

\subject {paragraph building}

The paragraph builder has been rewritten in \CCODE\ (soon to be
converted back to \CWEB). There is a callback related to the builder,
so it is possible to replace the default line breaker with one written
in \LUA.

There are no further short|-|term revisions on the agenda, apart from
writing an advanced (third order) Arabic routine for the Oriental
\TEX\ project.

Future releases may provide a bit more control over \type{\parshape}s
and multiple paragraph shapes.

\subject {metapost}

The closely related \MPLIB\ project has resulted in a \METAPOST\
library that is included in \LUATEX. There can be multiple
instances active at the same time and \METAPOST\ processing is
very fast. Conversion to \PDF\ is to be done with \LUA.

On the to-do list is a bit more interoperability (pre- and
postscript tables) and this will make it into release 0.50
(maybe even in version 0.40 already).

\subject {mathematics}

Version 0.50 will have a stable version of \UNICODE\
math support. Math is backward compatible but provides solutions
for dealing with \OPENTYPE\ math fonts. We provide math lists in
their intermediate form (noads) so that it is possible to
manipulate math in great detail.

The relevant math parameters are reorganized according to what
\OPENTYPE\ math provides (we use the Cambria font as our reference). Parameters
are grouped by style. Future versions of \LUATEX\ will build upon
this base to provide a simple mechanism for switching style sets
and font families in-formula.

There are new primitives for placing accents (top and bottom
variants and extensible characters), creating radicals, and making
delimiters. Math characters are permitted in text mode.

There will be an additional alignment mechanism analogous to
what \MATHML\ provides. Expect more.

\subject {page building}

Not much work has been done on opening up the page builder,
although we do have access to the intermediate lists. Opening it up
is unlikely to happen before 0.50.

\subject {going cweb}

After releasing version 0.50 around Euro\TEX\ 2009 there will be a
period of relative silence. Apart from bug fixes and (private)
experiments there will be no release for a while. At the time of the
0.50 release the \LUATEX\ source code will probably be entirely in
plain C. After that we will concentrate hard on
consolidating and upgrading the code base back into \CWEB.

\subject {cleanup}

Cleanup of code is a continuous process. Cleanup is needed because
we deal with a merge of traditional \TEX, \ETEX\ extensions,
\PDFTEX\ functionality and some \OMEGA\ (\ALEPH) code.

Compatibility is a prerequisite, with the exception of logging and
rather special ligature reconstruction code.

We also use the opportunity to slowly move away from all the global
variables that are used in the \PASCAL\ version.

\subject {alignments}

We do have some ideas about opening up alignments, but this has a
low priority and it will not happen before the 0.50 release.

\subject {error handling}

Once all code is converted to \CWEB, we will look into error
handling and recovery. This does not have high priority, as it is easier
to deal with after the conversion to \CWEB.

\subject {backend}

The backend code will be rewritten stepwise. The image-related
code has already been redone, and currently everything related to
positioning and directions is being redesigned and made more consistent.
Some bugs in the \ALEPH\ code (inherited from \OMEGA) have been
fixed, and we are trying to come up with a consistent way of dealing
with directions. Conceptually this is somewhat messy because much
directionality is delegated to the backend.

We are experimenting with positioning (preroll) and better literal
injection. Currently we still use the somewhat fuzzy \PDFTEX\ methods
that evolved over time (direct, page and normal injection) but we
will come up with a clearer model.

Accuracy of the output (\PDF) will be improved and character
extension (hz) will be done more efficiently. Experimental code
seems to work okay. This will become available from release 0.40
onwards, and further cleanup will take place when the \CWEB\
code is there, as much of the \PDF\ backend code is already \CCODE.

\subject{context mkiv}

When we started with \LUATEX\ we decided to use a branch of
\CONTEXT\ for testing as it involves quite drastic changes, many
rewrites, a tight connection with binary versions, etc.

As a result, for some time now we have had two versions of \CONTEXT: \MKII\
and \MKIV, where the former targets \PDFTEX\ and \XETEX, and
the latter exclusively uses \LUATEX. Although the user interface
is downward compatible, the code bases diverge more and
more. Therefore at the last \CONTEXT\ meeting it was decided to
freeze the current version of \MKII\ and only apply bug fixes
and an occasional simple extension.

This policy change opened the road to rather drastic splitting of the
code, also because full compatibility between \MKII\ and \MKIV\ is not
required. Around \LUATEX\ version 0.40 the new, currently still
experimental, document structure related code will be merged into the
regular \MKIV\ version. This might have some impact as it opens up new
possibilities.

\subject {the future}

In the future, \MKIV\ will try to create (more) clearly separated
layers of functionality so that it will become possible to make
subsets of \CONTEXT\ for special purposes. This is done under the name
\METATEX. Think of layering like:

\startitemize[packed]
\item \IO, catcodes, callback management, helpers
\item input regimes, characters, filtering
\item nodes, attributes and noads
\item user interface
\item languages, scripts, fonts and math
\item spacing, par building and page construction
\item \XML, graphics, \METAPOST, job management, and structure (huge impact)
\item modules, styles, specific features
\item tools
\stopitemize

\subject{fonts}

At this moment \MKIV\ is already quite capable of dealing with
\OPENTYPE\ fonts. The driving force behind this is the Oriental
\TEX\ project, which brings along some very complex and feature||rich
Arabic font technology. Much time has gone into reverse
engineering the specification and how these fonts
behave in Uniscribe (which we use as our reference for Arabic).

Dealing with the huge \CJK\ fonts is less a font issue and more
a matter of node list processing. Around the annual meeting of
the Korean User Group we got much of the machinery working, thanks
to discussions on the spot and on the mailing list.

\subject {math}

Between \LUATEX\ versions 0.30 and 0.40 the math machinery was opened
up (stage one). In order to test this new functionality, \MKIV's math
subsystem (that was then already partially \UNICODE\ aware) had to be
adapted.

First of all, \UNICODE\ permits us to use only one math family, and so
\MKIV\ now does that. The implementation uses Microsoft's Cambria
Math font as a benchmark. It creates virtual fonts from the other (old
and new) math fonts so that they appear to match up to Cambria
Math. Because the \TEX\ Gyre math project is not yet up to speed, \MKIV\
currently uses virtual variants of these fonts that are created at
run time. Missing pieces in, for instance, Latin Modern and friends
are compensated for by means of virtual characters.

Because it is now possible to parse the intermediate noad lists, \MKIV\ can
do some manipulations before the formula is typeset. This is used, for
instance, for alphabet remapping, forcing sizes, and spacing
around punctuation.
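As an illustration of alphabet remapping, assuming the target is \UNICODE's Mathematical Alphanumeric Symbols block, the Python sketch below maps plain Latin letters to their math italic counterparts. It works on plain characters rather than on real noads, the function names are hypothetical, and it is not the \MKIV\ implementation. Note the one irregularity: the math italic small h is not encoded in that block, because U+210E (PLANCK CONSTANT) already fills that role.

\starttyping
ITALIC_CAPITAL_A = 0x1D434   # MATHEMATICAL ITALIC CAPITAL A
ITALIC_SMALL_A = 0x1D44E     # MATHEMATICAL ITALIC SMALL A
PLANCK_CONSTANT = 0x210E     # used instead of the unassigned italic small h


def to_math_italic(ch):
    """Map an ASCII letter to its Unicode math italic form; leave others."""
    if ch == "h":
        return chr(PLANCK_CONSTANT)
    if "A" <= ch <= "Z":
        return chr(ITALIC_CAPITAL_A + ord(ch) - ord("A"))
    if "a" <= ch <= "z":
        return chr(ITALIC_SMALL_A + ord(ch) - ord("a"))
    return ch


def remap(formula):
    """Remap every letter of a formula given as a plain string."""
    return "".join(to_math_italic(ch) for ch in formula)


if __name__ == "__main__":
    print([hex(ord(c)) for c in remap("E=mh")])
    # ['0x1d438', '0x3d', '0x1d45a', '0x210e']
\stoptyping

A table-driven remapper would handle other alphabets (bold, script, fraktur) in the same way, with an explicit exception list for the characters that live outside the block.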

Although \MKIV\ already supports most of the math that users expect
there is still room for improvement once there is even more control
over the machinery. This is possible because \MKIV\ is not bound to
downward compatibility.

As with all other \LUATEX\ related \MKIV\ code, it is expected that we
will have to rewrite most of the current code a few times as we
proceed, so \MKIV\ math support is not yet stable either. We can take
such drastic measures because \MKIV\ is still experimental and because
users are willing to update macros and engine frequently and in
sync. In the process we hope to get away from all ad||hoc boxing,
kerning and similar makeshift solutions for creating constructs, by using
the new accent, delimiter, and radical primitives.

\subject {tracing and testing}

Whenever possible we add tracing and visualization features to
\CONTEXT\ because the progress reports and articles need them. Recent
extensions concerned tracing math and tracing \OPENTYPE\ processing.

The \OPENTYPE\ tracing options are a great help in reaching the goals
of the Oriental \TEX\ project step by step. This project
gave the \LUATEX\ project its initial boost and aims at high
quality right|-|to|-|left typesetting. In the process complex (test)
fonts are made which, combined with the tracing mentioned, help us
to reveal the secrets of \OPENTYPE.

\stopcomponent