1 files changed, 437 insertions, 0 deletions
diff --git a/doc/context/sources/general/manuals/mk/mk-structure.tex b/doc/context/sources/general/manuals/mk/mk-structure.tex
new file mode 100644
index 000000000..f199feb7b
--- /dev/null
+++ b/doc/context/sources/general/manuals/mk/mk-structure.tex
@@ -0,0 +1,437 @@
+% language=uk
+
+\usemodule[narrowtt]
+
+\environment mk-environment
+
+\startcomponent mk-structure
+
+\chapter{Everything structure}
+
+At the time of this writing, \CONTEXT\ \MKIV\ spends some 50\% of
+its time in \LUA. There are several reasons for this.
+
+\startitemize[packed]
+\item All \IO\ goes via \LUA, including messages and logging. This includes
+      file searching which happened to be done by the \KPSE\ library.
+\item Much font handling is done by \LUA\ too, for instance \OPENTYPE\ features
+      are completely handled by \LUA.
+\item Because \TEX\ is highy optimized, its influence on runtime is less
+      prominent. Even if we delegate some tasks to \LUA, \TEX\ still has
+      work to do.
+\stopitemize
+
+Among the reported statistics of a 242 page version of \type
+{mk.pdf} (not containing this chapter) we find the following:
+
+\startntyping
+input load time           - 0.094 seconds
+startup time              - 0.905 seconds (including runtime option file processing)
+jobdata time              - 0.140 seconds saving, 0.062 seconds loading
+fonts load time           - 5.413 seconds
+xml load time             - 0.000 seconds, lpath calls: 46, cached calls: 31
+lxml load time            - 0.000 seconds preparation, backreferences: 0
+mps conversion time       - 0.000 seconds
+node processing time      - 1.747 seconds including kernel
+kernel processing time    - 0.343 seconds
+attribute processing time - 2.075 seconds
+language load time        - 0.109 seconds, n=4
+graphics processing time  - 0.109 seconds including tex, n=7
+metapost processing time  - 0.484 seconds, loading: 0.016 seconds, execution: 0.203 seconds, n: 65
+current memory usage      - 332 MB
+loaded patterns           - gb:gb:pat:exc:3 nl:nl:pat:exc:4 us:us:pat:exc:2
+control sequences         - 34245 of 165536
+callbacks                 - direct: 235579, indirect: 18665, total: 254244 (1050 per page)
+runtime                   - 25.818 seconds, 242 processed pages, 242 shipped pages, 9.373 pages/second
+\stopntyping
+
+The startup time includes initial font loading (we don't store fonts
+in the format). Jobdata time involves loading and saving multipass data
+used for tables of contents, references, positioning, etc. The time needed
+for loading fonts is over 5 seconds due to the fact that we load a couple of
+real large and complex fonts. Node processing time mostly is related to
+\OPENTYPE\ feature support. The kernel processing time refers to hyphenation
+and line breaking, for which (of course) we use \TEX. Direct callbacks are
+implicit calls to \LUA, using \type {\directlua} while the indirect calls
+concern overloaded \TEX\ functions and callbacks triggered by \TEX\ itself.
+
+Depending on the system load on my laptop, the throughput is around
+10 pages per second for this document, which is due to the fact
+that some font trickery takes place using a few arabic fonts, some
+chinese, a bunch of metapost punk instances, Zapfino, etc.
+
+The times reported are accumulated times and contain quite some
+accumulated rounding errors so assuming that the operating system
+rounds up the times, the totals in practice might be higher. So,
+looking at the numbers, you might wonder if the load on \LUA\ will
+become even larger. This is not necessary. Some tasks can be done
+better in \LUA\ but not always with less code, especially when we
+want to extend functionality and to provide more robust solutions.
+Also, even if we win some processing time we might as well waste
+it in interfacing between \TEX\ and \LUA. For instance, we can
+delegate pretty printing to \LUA, but most documents don't contain
+verbatim at all. We can handle section management by \LUA, but how
+many section headers does a document have?
+
+When the future of \TEX\ is discussed, among the ideas presented
+is to let \TEX\ stick to typesetting and implement it as a
+component (or library) on top of a (maybe dedicated) language.
+This might sound like a nice idea, but eventually we will end up
+with some kind of user interface and a substantial amount of code
+dedicated to dealing with fonts, structure, character management,
+math etc.
+
+In the process of converting \CONTEXT\ to \MKIV\ we try to use
+each language (\TEX, \LUA, \METAPOST) for what it is best suited
+for. Instead of starting from scratch, we start with existing code
+and functionality, because we need a running system. Eventually we
+might find \TEX's role as language being reduced to (or maybe we can
+better talk of \quote {focused on}) mostly aspects of
+typesetting, but \CONTEXT\ as a whole will not be much different
+from the perspective of the user.
+
+So, this is how the transition of \CONTEXT\ takes place:
+
+\startitemize[packed]
+\item We started with replacing isolated bits and pieces of code
+      where \LUA\ is a more natural candidate, like file \IO, encoding
+      issues.
+\item We implement new functionality, for instance \OPENTYPE\
+      and \TYPEONE\ support.
+\item We reimplement mechanisms that are not efficient as we want them
+      to be, like buffers and verbatim.
+\item We add new features, for instance tree based \XML\ processing.
+\item After evaluating we reimplement again when needed (or when \LUATEX\
+      evolves).
+\stopitemize
+
+Yet another transition is the one we will discuss next:
+
+\startitemize[packed]
+\item We replace complex mechanisms by new ones where we separate
+      management and typesetting.
+\stopitemize
+
+This not so trivial effort because it affects many aspects of \CONTEXT\ and
+as such we need to adapt a lot of code at the same time: all things
+related to structure:
+
+\startitemize[packed]
+\item sectioning (chapters, sections, etc)
+\item numbering (pages, itemize, enumeration, floats, etc)
+\item marks (used for headers and footers)
+\item lists (tables of contents, lists of floats, sorted lists)
+\item registers (including collapsing of page ranges)
+\item cross referencing (to text as well as pages)
+\item notes (footnotes, endnotes, etc)
+\stopitemize
+
+All these mechanisms are somehow related. A section head can occur
+in a list, can be cross referenced, might be shows in a header and
+of course can have a number. Such a number can have multiple
+components (1.A.3) where each component can have its own
+conversion, rendering (fonts, colors) and selectively have less
+components. In tables of contents either or not we want to see all
+components, separators etc. Such a table can be generated at each
+level, which demands filtering mechanisms. The same is true for
+registers. There we have page numbers too, and these may be
+prefixed by section numbers, possibly rendered differently than
+the original section number.
+
+Much if this is possible in \CONTEXT\ \MKII, but the code that
+deals with this is not always nice and clean and right from the start
+of the \LUATEX\ project it has been on the agenda to clean it up. The code
+evolved over time and
+functionality was added when needed. But, the projects
+that we deal with demand more (often local) control over the
+components of a number.
+
+What makes structure related data complex is that we need to keep
+track of each aspect in order to be able to reproduce the
+rendering in for instance a table of contents, where we also may
+want to change some of the aspects (for instance separators in a
+different color). Another pending issue is \XML\ and although we
+could normally deal with this quite well, it started making sense
+to make all multi|-|pass data (registers, tables of content,
+sorted lists, references, etc.) more \XML\ aware. This is a
+somewhat hairy task, if only because we need to switch between
+\TEX\ mode and \XML\ mode when needed and at the same time keep an
+eye on unwanted expansion: do we keep structure in the content or
+not?
+
+Rewriting the code that deals with these aspects of typesetting is
+the first step in a separation of code in \MKII\ and \MKIV. Until
+now we tried to share much code, but this no longer makes sense.
+Also, at the \CONTEXT\ conference in Bohinj (2008) it was decided
+that given the development of \MKIV, it made sense to freeze
+\MKII\ (apart from bug fixes and minor extensions). This decision
+opens the road to more drastic changes. We will roll back some of
+the splits in code that made sharing code possible and just
+replace whole components of \CONTEXT\ as a whole. This also gives
+us the opportunity to review code more drastically than until now
+in the perspective of \ETEX.
+
+Because this stage in the rewrite of \CONTEXT\ might bring some
+compatibility issues with it (especially for users who use the
+more obscure tuning options), I will discuss some of the changes
+here. A bit of understanding might make users more tolerant.
+
+The core data structure that we need to deal with is a number, which
+can be constructed in several ways.
+
+\def\NotaBeneR{\inframed[frame=off,background=color,backgroundcolor=mktransparentred]}
+\def\NotaBeneG{\inframed[frame=off,background=color,backgroundcolor=mktransparentgreen]}
+\def\NotaBeneB{\inframed[frame=off,background=color,backgroundcolor=mktransparentblue]}
+\def\NotaBeneY{\inframed[frame=off,background=color,backgroundcolor=mktransparentyellow]}
+\def\NotaBeneS{\inframed[frame=off,background=color,backgroundcolor=mktransparentgray]}
+
+\starttabulate[|l|l|]
+\NC sectioning   \NC \NotaBeneR{1.A.2.II} some title \NC \NR
+\NC pagenumber   \NC page \NotaBeneR{1.A}\NotaBeneG{--}\NotaBeneB{23} \NC \NR
+\NC reference    \NC in chapter \NotaBeneR{2.II} \NC \NR
+\NC marking      \NC \NotaBeneR{A}: some title with preceding number \NC \NR
+\NC contents     \NC \NotaBeneR{2.II} some title with some page number \NotaBeneR{1.A}\NotaBeneG{--}\NotaBeneB{23} \NC \NR
+\NC index        \NC some word \NotaBeneB{23}, \NotaBeneR{A}\NotaBeneG{--}\NotaBeneB{42}---\NotaBeneR{B}\NotaBeneG{--}\NotaBeneB{48} \NC \NR
+\NC itemize      \NC \NotaBeneY{a} first item \NotaBeneY{a.1} subitem item \NC \NR
+\NC enumerate    \NC example \NotaBeneR{1.A.2.II}\NotaBeneG{.}\NotaBeneY{a} \NC \NR
+\NC floatcaption \NC figure \NotaBeneR{1}\NotaBeneG{--}\NotaBeneB{2} \NC \NR
+\NC footnotes    \NC note \NotaBeneS{\symbol[3]} \NC \NR
+\stoptabulate
+
+In this table we see how numbers are composed:
+
+\starttabulate[|l|p|]
+\NC \NotaBeneR{section number} \NC It has several components, separated by symbols
+                                   and with an optional final symbol \NC \NR
+\NC \NotaBeneG{separator}      \NC This can be different for each level and can
+                                   have dedicated rendering options \NC \NR
+\NC \NotaBeneB{page number}    \NC That can be preceded by a (partial) sectionnumber
+                                   and separated from the page number by another symbol \NC \NR
+\NC \NotaBeneY{counter}        \NC It can be preceded by a (partial) sectionnumber and
+                                   can also have subnumbers with its own separation
+                                   properties \NC \NR
+\NC \NotaBeneS{symbol}         \NC Sometimes numbers get represented by symbols in which
+                                   case we use pagewise restarting symbol sets \NC \NR
+\stoptabulate
+
+Say that at some point we store a section number and/or page
+number. With the number we need to store information about the
+conversion (number, character, roman numeral, etc) and the
+separators, including their rendering. However, when we reuse that
+stored information we might want to discard some components and/or
+use a different rendering. In traditional \CONTEXT\ we have
+control over some aspects but due to the way numbers are stored
+for later reuse this control is limited.
+
+Say that we have cloned a subsection head as follows:
+
+\starttyping
+\definehead[MyHead][section]
+\stoptyping
+
+This is used as:
+
+\starttyping
+\MyHead[example]{Example}
+\stoptyping
+
+In \MKII\ we save a list entry (which has the number, the title
+and a reference to the page) and a reference to the the number,
+the title and the page (tagged \type {example}). Page numbers are
+stored in such a way that we can filter at specific section
+levels. This permits local tables of contents.
+
+The entry in the multi pass data file looks as follows (we collect all
+multi pass data in one file):
+
+\starttyping
+\mainreference{}{example}{2--0-1-1-0-0-0-0--1}{1}{{I.I}{Example}}%
+\listentry{MyHead}{2}{I.I}{Example}{2--0-1-1-0-0-0-0--1}{1}%
+\stoptyping
+
+In \MKIV\ we store more information and use tables for that. Currently
+the entry looks as follows:
+
+\starttyping
+structure.lists.collected={
+ {
+   ...
+ },
+ {
+  metadata={
+   catcodes=4,
+   coding="tex",
+   internal=2,
+   kind="section",
+   name="MyHead",
+   reference="example",
+  },
+  pagenumber={
+   numbers={ 1, 1, 0 },
+  },
+  sectionnumber={
+   conversion="R",
+   conversionset="default",
+   numbers={ 0, 2 },
+   separatorset="default",
+  },
+  sectiontitle={
+   label="MyHead",
+   title="Example",
+  },
+ },
+ {
+  ...
+ },
+}
+\stoptyping
+
+There can be much more information in each of the subtables. For
+instance, the \type {pagenumber} and \type {sectionnumber}
+subtables can have \type {prefix}, \type {separatorset},
+\type{conversion}, \type {conversionset}, \type {stopper}, \type
+{segments} and \type {connector} fields, and the \type {metadata}
+table can contain information about the \XML\ root document so
+that associated filtering and handling can be reconstructed. With the
+section title we store information about the preceding label text
+(seldom used, think of \quote{Part B}).
+
+This entry is used for lists as well as cross referencing.
+Actually, the stored information is also used for markings
+(running heads). This means that these mechanisms must be able to
+distinguish between where and how information is stored.
+
+These tables look rather verbose and indeed they are. We end up
+with much larger multi|-|pass data files but fortunately loading them
+is quite efficient. Serializing on the other hand might cost some time
+which is compensated by the fact that we no longer store
+information in token lists associated with nodes in \TEX's lists
+and in the future we might even move more data handling to the
+\LUA\ end. Also, in future versions we will share similar data
+(like page number information) more efficiently.
+
+Storing date at the \LUA\ end also has consequences for the
+typesetting. When specific data is needed a call to \LUA\ is
+necessary. In the future we might offer both push and pull methods
+(\LUA\ pushing information to the typesetting code versus \LUA\
+triggering typesetting code). For lists we pull, and for registers
+we currently push. Depending on our experiences we might change
+these strategies.
+
+A side effect of the rewrite is that we force more consistency.
+For instance, you see a \type {conversion} field in the list. This
+is the old way of defining the way a number gets converted. The
+modern approach is to use sets. Because we now have a more
+stringent inheritance model at the user interface level, this
+might lead to incompatible conversions at lower levels (when
+unset). Instead of cooking up some nasty compatibility hacks, we
+accept some incompatibility, if only because users have to adapt
+their styles to new font technology anyway. And for older
+documents there is still \MKII.
+
+Instead of introducing many extra configuration variables (for each
+level of sectioning) we introduce sets. These replace some of the
+existing parameters and are the follow up on some (undocumented)
+precursor of sets. Examples of sets are:
+
+\starttyping
+\definestructureseparatorset [default][][.]
+\definestructureconversionset[default][][numbers]
+\definestructureresetset     [default][][0]
+\definestructureprefixset    [default][section-2,section-3][]
+\definestructureseparatorset [appendix][][.]
+\definestructureconversionset[appendix][Romannumerals,Characters][]
+\definestructureresetset     [appendix][][0]
+\stoptyping
+
+The third parameter is the default value. The sets that relate to typesetting
+can have a rendering specification:
+
+\starttyping
+\definestructureseparatorset
+  [demosep]
+  [demo->!,demo->?,demo->*,demo->@]
+  [demo->/]
+\stoptyping
+
+Here we apply \type{demo} to each of the separators as well as to the
+default. The renderer is defined with:
+
+\starttyping
+\defineprocessor[demo][style=\bfb,color=red]
+\stoptyping
+
+You can imagine that, although this is quite possible in \TEX,
+dealing with sets, splitting them, handling the rendering, etc.\
+is easier in \LUA\ that in \TEX. Of course the code still looks
+somewhat messy, if only because the problem is messy. Part if this
+mess is related to the fact that we might have to specify all
+components that make up a number.
+
+\starttabulate
+\NC section    \NC section number as part of head  \NC \NR
+\NC list       \NC section number as part of list entry  \NC \NR
+\NC            \NC section number as part of page number prefix \NC \NR
+\NC            \NC (optionally prefixed) page number \NC \NR
+\NC counter    \NC section number as part of counter prefix  \NC \NR
+\NC            \NC (optionally prefixed) counter value(s) \NC \NR
+\NC pagenumber \NC section number as part of page number \NC \NR
+\NC            \NC pagenumber components (realpage, page, subpage) \NC \NR
+\stoptabulate
+
+As a result we have upto 3 sets of parameters:
+
+\starttabulate
+\NC section    \NC \type{section*} \NC \NR
+\NC list       \NC \type{section*} \type{prefix*} \type{page*} \NC \NR
+\NC counter    \NC \type{section*} \type{number*} \NC \NR
+\NC pagenumber \NC \type{prefix*} \type{page*} \NC \NR
+\stoptabulate
+
+When reimplementing the structure related commands, we also have
+to take mechanisms into account that relate to them. For instance,
+index sorter code is also used for sorted lists, so when we adapt
+one mechanism we also have to adapt the other. The same is true
+for cross references, that are used all over the place. It helps
+that for the moment we can omit the more obscure interaction
+related mechanism, if only because users will seldom use them.
+Such mechanisms are also related to the backend and we're not yet
+in the stage where we upgrade the backend code. In case you wonder
+why references can be such a problematic areas think of the
+following:
+
+\starttyping
+\goto{here}[page(10),StartSound{ping},StartVideo{demo}]
+\goto{there}[page(10),VideLayer{example},JS(SomeScript{hi world})]
+\goto{anywhere}[url(mypreviouslydefinedurl)]
+\stoptyping
+
+The \CONTEXT\ cross reference mechanism permits mixed usage of simple
+hyperlinks (jump to some page) and more advanced viewer actions like
+showing widgets and runnign \JAVASCRIPT\ code. And even a simple
+reference like:
+
+\starttyping
+\at{here and there}[somefile::sometarget]
+\stoptyping
+
+involves some code because we need to handle the three words as
+well as the outer reference. \footnote {Currently \CONTEXT\ does
+its own splitting of multiword references, and does so by reusing
+hyperlink resources in the backend format. This might change in
+the future.} The reason why we need to reimplement referencing
+along with structure lays in the fact that for some structure
+components (like section headers and float references) we no
+longer store cross reference information separately but filter it
+from the data stored in the list (see example before).
+
+The \LUA\ code involved in dealing with the more complex
+references shown here is much more flexible and robust than the
+original \TEX\ code. This is a typical example of where the
+accumulated time spent on the \TEX\ based solution is large
+compared to the time spent on the \LUA\ variant. It's like driving
+200 km by car through hilly terrain and wondering how one did that
+in earlier times. Just like today scenery is not by definition better
+than yestedays, \MKIV\ code is not always better than \MKII\ code.
+
+\stopcomponent