Diffstat (limited to 'doc/context/sources/general/manuals/mk/mk-structure.tex')
-rw-r--r-- | doc/context/sources/general/manuals/mk/mk-structure.tex | 437 |
1 files changed, 437 insertions, 0 deletions
diff --git a/doc/context/sources/general/manuals/mk/mk-structure.tex b/doc/context/sources/general/manuals/mk/mk-structure.tex
new file mode 100644
index 000000000..f199feb7b
--- /dev/null
+++ b/doc/context/sources/general/manuals/mk/mk-structure.tex
@@ -0,0 +1,437 @@
+% language=uk
+
+\usemodule[narrowtt]
+
+\environment mk-environment
+
+\startcomponent mk-structure
+
+\chapter{Everything structure}
+
+At the time of this writing, \CONTEXT\ \MKIV\ spends some 50\% of
+its time in \LUA. There are several reasons for this.
+
+\startitemize[packed]
+\item All \IO\ goes via \LUA, including messages and logging. This includes
+    file searching which used to be done by the \KPSE\ library.
+\item Much font handling is done by \LUA\ too, for instance \OPENTYPE\ features
+    are completely handled by \LUA.
+\item Because \TEX\ is highly optimized, its influence on runtime is less
+    prominent. Even if we delegate some tasks to \LUA, \TEX\ still has
+    work to do.
+\stopitemize
+
+Among the reported statistics of a 242 page version of \type
+{mk.pdf} (not containing this chapter) we find the following:
+
+\startntyping
+input load time - 0.094 seconds
+startup time - 0.905 seconds (including runtime option file processing)
+jobdata time - 0.140 seconds saving, 0.062 seconds loading
+fonts load time - 5.413 seconds
+xml load time - 0.000 seconds, lpath calls: 46, cached calls: 31
+lxml load time - 0.000 seconds preparation, backreferences: 0
+mps conversion time - 0.000 seconds
+node processing time - 1.747 seconds including kernel
+kernel processing time - 0.343 seconds
+attribute processing time - 2.075 seconds
+language load time - 0.109 seconds, n=4
+graphics processing time - 0.109 seconds including tex, n=7
+metapost processing time - 0.484 seconds, loading: 0.016 seconds, execution: 0.203 seconds, n: 65
+current memory usage - 332 MB
+loaded patterns - gb:gb:pat:exc:3 nl:nl:pat:exc:4 us:us:pat:exc:2
+control sequences - 34245 of 165536
+callbacks - direct: 235579, indirect: 18665, total: 254244 (1050 per page)
+runtime - 25.818 seconds, 242 processed pages, 242 shipped pages, 9.373 pages/second
+\stopntyping
+
+The startup time includes initial font loading (we don't store fonts
+in the format). Jobdata time involves loading and saving multipass data
+used for tables of contents, references, positioning, etc. The time needed
+for loading fonts is over 5 seconds due to the fact that we load a couple of
+really large and complex fonts. Node processing time is mostly related to
+\OPENTYPE\ feature support. The kernel processing time refers to hyphenation
+and line breaking, for which (of course) we use \TEX. Direct callbacks are
+implicit calls to \LUA, using \type {\directlua} while the indirect calls
+concern overloaded \TEX\ functions and callbacks triggered by \TEX\ itself.
+
+Depending on the system load on my laptop, the throughput is around
+10 pages per second for this document, which is due to the fact
+that some font trickery takes place using a few arabic fonts, some
+chinese, a bunch of metapost punk instances, Zapfino, etc.
+
+The times reported are accumulated times and contain quite some
+accumulated rounding errors so assuming that the operating system
+rounds up the times, the totals in practice might be higher. So,
+looking at the numbers, you might wonder if the load on \LUA\ will
+become even larger. This is not necessarily so. Some tasks can be done
+better in \LUA\ but not always with less code, especially when we
+want to extend functionality and to provide more robust solutions.
+Also, even if we win some processing time we might as well waste
+it in interfacing between \TEX\ and \LUA. For instance, we can
+delegate pretty printing to \LUA, but most documents don't contain
+verbatim at all. We can handle section management by \LUA, but how
+many section headers does a document have?
+
+When the future of \TEX\ is discussed, among the ideas presented
+is to let \TEX\ stick to typesetting and implement it as a
+component (or library) on top of a (maybe dedicated) language.
+This might sound like a nice idea, but eventually we will end up
+with some kind of user interface and a substantial amount of code
+dedicated to dealing with fonts, structure, character management,
+math etc.
+
+In the process of converting \CONTEXT\ to \MKIV\ we try to use
+each language (\TEX, \LUA, \METAPOST) for what it is best suited
+for. Instead of starting from scratch, we start with existing code
+and functionality, because we need a running system. Eventually we
+might find \TEX's role as language being reduced to (or maybe we can
+better talk of \quote {focused on}) mostly aspects of
+typesetting, but \CONTEXT\ as a whole will not be much different
+from the perspective of the user.
+
+So, this is how the transition of \CONTEXT\ takes place:
+
+\startitemize[packed]
+\item We started with replacing isolated bits and pieces of code
+    where \LUA\ is a more natural candidate, like file \IO\ and encoding
+    issues.
+\item We implement new functionality, for instance \OPENTYPE\
+    and \TYPEONE\ support.
+\item We reimplement mechanisms that are not as efficient as we want them
+    to be, like buffers and verbatim.
+\item We add new features, for instance tree based \XML\ processing.
+\item After evaluating we reimplement again when needed (or when \LUATEX\
+    evolves).
+\stopitemize
+
+Yet another transition is the one we will discuss next:
+
+\startitemize[packed]
+\item We replace complex mechanisms by new ones where we separate
+    management and typesetting.
+\stopitemize
+
+This is not a trivial effort because it affects many aspects of \CONTEXT\ and
+as such we need to adapt a lot of code at the same time: all things
+related to structure:
+
+\startitemize[packed]
+\item sectioning (chapters, sections, etc)
+\item numbering (pages, itemize, enumeration, floats, etc)
+\item marks (used for headers and footers)
+\item lists (tables of contents, lists of floats, sorted lists)
+\item registers (including collapsing of page ranges)
+\item cross referencing (to text as well as pages)
+\item notes (footnotes, endnotes, etc)
+\stopitemize
+
+All these mechanisms are somehow related. A section head can occur
+in a list, can be cross referenced, might be shown in a header and
+of course can have a number. Such a number can have multiple
+components (1.A.3) where each component can have its own
+conversion, rendering (fonts, colors) and selectively have fewer
+components. In tables of contents we may or may not want to see all
+components, separators etc. Such a table can be generated at each
+level, which demands filtering mechanisms. The same is true for
+registers. There we have page numbers too, and these may be
+prefixed by section numbers, possibly rendered differently than
+the original section number.
+
+Much of this is possible in \CONTEXT\ \MKII, but the code that
+deals with this is not always nice and clean and right from the start
+of the \LUATEX\ project it has been on the agenda to clean it up. The code
+evolved over time and functionality was added when needed. But, the projects
+that we deal with demand more (often local) control over the
+components of a number.
+
+What makes structure related data complex is that we need to keep
+track of each aspect in order to be able to reproduce the
+rendering in for instance a table of contents, where we also may
+want to change some of the aspects (for instance separators in a
+different color).
+Another pending issue is \XML\ and although we
+could normally deal with this quite well, it started making sense
+to make all multi|-|pass data (registers, tables of contents,
+sorted lists, references, etc.) more \XML\ aware. This is a
+somewhat hairy task, if only because we need to switch between
+\TEX\ mode and \XML\ mode when needed and at the same time keep an
+eye on unwanted expansion: do we keep structure in the content or
+not?
+
+Rewriting the code that deals with these aspects of typesetting is
+the first step in a separation of code in \MKII\ and \MKIV. Until
+now we tried to share much code, but this no longer makes sense.
+Also, at the \CONTEXT\ conference in Bohinj (2008) it was decided
+that given the development of \MKIV, it made sense to freeze
+\MKII\ (apart from bug fixes and minor extensions). This decision
+opens the road to more drastic changes. We will roll back some of
+the splits in code that made sharing code possible and just
+replace whole components of \CONTEXT. This also gives
+us the opportunity to review code more drastically than until now
+in the perspective of \ETEX.
+
+Because this stage in the rewrite of \CONTEXT\ might bring some
+compatibility issues with it (especially for users who use the
+more obscure tuning options), I will discuss some of the changes
+here. A bit of understanding might make users more tolerant.
+
+The core data structure that we need to deal with is a number, which
+can be constructed in several ways.
+
+\def\NotaBeneR{\inframed[frame=off,background=color,backgroundcolor=mktransparentred]}
+\def\NotaBeneG{\inframed[frame=off,background=color,backgroundcolor=mktransparentgreen]}
+\def\NotaBeneB{\inframed[frame=off,background=color,backgroundcolor=mktransparentblue]}
+\def\NotaBeneY{\inframed[frame=off,background=color,backgroundcolor=mktransparentyellow]}
+\def\NotaBeneS{\inframed[frame=off,background=color,backgroundcolor=mktransparentgray]}
+
+\starttabulate[|l|l|]
+\NC sectioning \NC \NotaBeneR{1.A.2.II} some title \NC \NR
+\NC pagenumber \NC page \NotaBeneR{1.A}\NotaBeneG{--}\NotaBeneB{23} \NC \NR
+\NC reference \NC in chapter \NotaBeneR{2.II} \NC \NR
+\NC marking \NC \NotaBeneR{A}: some title with preceding number \NC \NR
+\NC contents \NC \NotaBeneR{2.II} some title with some page number \NotaBeneR{1.A}\NotaBeneG{--}\NotaBeneB{23} \NC \NR
+\NC index \NC some word \NotaBeneB{23}, \NotaBeneR{A}\NotaBeneG{--}\NotaBeneB{42}---\NotaBeneR{B}\NotaBeneG{--}\NotaBeneB{48} \NC \NR
+\NC itemize \NC \NotaBeneY{a} first item \NotaBeneY{a.1} subitem item \NC \NR
+\NC enumerate \NC example \NotaBeneR{1.A.2.II}\NotaBeneG{.}\NotaBeneY{a} \NC \NR
+\NC floatcaption \NC figure \NotaBeneR{1}\NotaBeneG{--}\NotaBeneB{2} \NC \NR
+\NC footnotes \NC note \NotaBeneS{\symbol[3]} \NC \NR
+\stoptabulate
+
+In this table we see how numbers are composed:
+
+\starttabulate[|l|p|]
+\NC \NotaBeneR{section number} \NC It has several components, separated by symbols
+    and with an optional final symbol \NC \NR
+\NC \NotaBeneG{separator} \NC This can be different for each level and can
+    have dedicated rendering options \NC \NR
+\NC \NotaBeneB{page number} \NC That can be preceded by a (partial) sectionnumber
+    and separated from the page number by another symbol \NC \NR
+\NC \NotaBeneY{counter} \NC It can be preceded by a (partial) sectionnumber and
+    can also have subnumbers with its own separation
+    properties \NC \NR
+\NC \NotaBeneS{symbol} \NC Sometimes numbers get represented by symbols in
which + case we use pagewise restarting symbol sets \NC \NR +\stoptabulate + +Say that at some point we store a section number and/or page +number. With the number we need to store information about the +conversion (number, character, roman numeral, etc) and the +separators, including their rendering. However, when we reuse that +stored information we might want to discard some components and/or +use a different rendering. In traditional \CONTEXT\ we have +control over some aspects but due to the way numbers are stored +for later reuse this control is limited. + +Say that we have cloned a subsection head as follows: + +\starttyping +\definehead[MyHead][section] +\stoptyping + +This is used as: + +\starttyping +\MyHead[example]{Example} +\stoptyping + +In \MKII\ we save a list entry (which has the number, the title +and a reference to the page) and a reference to the the number, +the title and the page (tagged \type {example}). Page numbers are +stored in such a way that we can filter at specific section +levels. This permits local tables of contents. + +The entry in the multi pass data file looks as follows (we collect all +multi pass data in one file): + +\starttyping +\mainreference{}{example}{2--0-1-1-0-0-0-0--1}{1}{{I.I}{Example}}% +\listentry{MyHead}{2}{I.I}{Example}{2--0-1-1-0-0-0-0--1}{1}% +\stoptyping + +In \MKIV\ we store more information and use tables for that. Currently +the entry looks as follows: + +\starttyping +structure.lists.collected={ + { + ... + }, + { + metadata={ + catcodes=4, + coding="tex", + internal=2, + kind="section", + name="MyHead", + reference="example", + }, + pagenumber={ + numbers={ 1, 1, 0 }, + }, + sectionnumber={ + conversion="R", + conversionset="default", + numbers={ 0, 2 }, + separatorset="default", + }, + sectiontitle={ + label="MyHead", + title="Example", + }, + }, + { + ... + }, +} +\stoptyping + +There can be much more information in each of the subtables. 
+For instance, the \type {pagenumber} and \type {sectionnumber}
+subtables can have \type {prefix}, \type {separatorset},
+\type{conversion}, \type {conversionset}, \type {stopper}, \type
+{segments} and \type {connector} fields, and the \type {metadata}
+table can contain information about the \XML\ root document so
+that associated filtering and handling can be reconstructed. With the
+section title we store information about the preceding label text
+(seldom used, think of \quote{Part B}).
+
+This entry is used for lists as well as cross referencing.
+Actually, the stored information is also used for markings
+(running heads). This means that these mechanisms must be able to
+distinguish between where and how information is stored.
+
+These tables look rather verbose and indeed they are. We end up
+with much larger multi|-|pass data files but fortunately loading them
+is quite efficient. Serializing on the other hand might cost some time
+which is compensated by the fact that we no longer store
+information in token lists associated with nodes in \TEX's lists
+and in the future we might even move more data handling to the
+\LUA\ end. Also, in future versions we will share similar data
+(like page number information) more efficiently.
+
+Storing data at the \LUA\ end also has consequences for the
+typesetting. When specific data is needed a call to \LUA\ is
+necessary. In the future we might offer both push and pull methods
+(\LUA\ pushing information to the typesetting code versus \LUA\
+triggering typesetting code). For lists we pull, and for registers
+we currently push. Depending on our experiences we might change
+these strategies.
+
+A side effect of the rewrite is that we force more consistency.
+For instance, you see a \type {conversion} field in the list. This
+is the old way of defining the way a number gets converted. The
+modern approach is to use sets.
+Because we now have a more
+stringent inheritance model at the user interface level, this
+might lead to incompatible conversions at lower levels (when
+unset). Instead of cooking up some nasty compatibility hacks, we
+accept some incompatibility, if only because users have to adapt
+their styles to new font technology anyway. And for older
+documents there is still \MKII.
+
+Instead of introducing many extra configuration variables (for each
+level of sectioning) we introduce sets. These replace some of the
+existing parameters and are the follow up on some (undocumented)
+precursor of sets. Examples of sets are:
+
+\starttyping
+\definestructureseparatorset [default][][.]
+\definestructureconversionset[default][][numbers]
+\definestructureresetset [default][][0]
+\definestructureprefixset [default][section-2,section-3][]
+\definestructureseparatorset [appendix][][.]
+\definestructureconversionset[appendix][Romannumerals,Characters][]
+\definestructureresetset [appendix][][0]
+\stoptyping
+
+The third parameter is the default value. The sets that relate to typesetting
+can have a rendering specification:
+
+\starttyping
+\definestructureseparatorset
+    [demosep]
+    [demo->!,demo->?,demo->*,demo->@]
+    [demo->/]
+\stoptyping
+
+Here we apply \type{demo} to each of the separators as well as to the
+default. The renderer is defined with:
+
+\starttyping
+\defineprocessor[demo][style=\bfb,color=red]
+\stoptyping
+
+You can imagine that, although this is quite possible in \TEX,
+dealing with sets, splitting them, handling the rendering, etc.\
+is easier in \LUA\ than in \TEX. Of course the code still looks
+somewhat messy, if only because the problem is messy. Part of this
+mess is related to the fact that we might have to specify all
+components that make up a number.
+
+\starttabulate
+\NC section \NC section number as part of head \NC \NR
+\NC list \NC section number as part of list entry \NC \NR
+\NC \NC section number as part of page number prefix \NC \NR
+\NC \NC (optionally prefixed) page number \NC \NR
+\NC counter \NC section number as part of counter prefix \NC \NR
+\NC \NC (optionally prefixed) counter value(s) \NC \NR
+\NC pagenumber \NC section number as part of page number \NC \NR
+\NC \NC pagenumber components (realpage, page, subpage) \NC \NR
+\stoptabulate
+
+As a result we have up to 3 sets of parameters:
+
+\starttabulate
+\NC section \NC \type{section*} \NC \NR
+\NC list \NC \type{section*} \type{prefix*} \type{page*} \NC \NR
+\NC counter \NC \type{section*} \type{number*} \NC \NR
+\NC pagenumber \NC \type{prefix*} \type{page*} \NC \NR
+\stoptabulate
+
+When reimplementing the structure related commands, we also have
+to take mechanisms into account that relate to them. For instance,
+index sorter code is also used for sorted lists, so when we adapt
+one mechanism we also have to adapt the other. The same is true
+for cross references, that are used all over the place. It helps
+that for the moment we can omit the more obscure interaction
+related mechanisms, if only because users will seldom use them.
+Such mechanisms are also related to the backend and we're not yet
+in the stage where we upgrade the backend code. In case you wonder
+why references can be such a problematic area think of the
+following:
+
+\starttyping
+\goto{here}[page(10),StartSound{ping},StartVideo{demo}]
+\goto{there}[page(10),VideLayer{example},JS(SomeScript{hi world})]
+\goto{anywhere}[url(mypreviouslydefinedurl)]
+\stoptyping
+
+The \CONTEXT\ cross reference mechanism permits mixed usage of simple
+hyperlinks (jump to some page) and more advanced viewer actions like
+showing widgets and running \JAVASCRIPT\ code.
+And even a simple
+reference like:
+
+\starttyping
+\at{here and there}[somefile::sometarget]
+\stoptyping
+
+involves some code because we need to handle the three words as
+well as the outer reference. \footnote {Currently \CONTEXT\ does
+its own splitting of multiword references, and does so by reusing
+hyperlink resources in the backend format. This might change in
+the future.} The reason why we need to reimplement referencing
+along with structure lies in the fact that for some structure
+components (like section headers and float references) we no
+longer store cross reference information separately but filter it
+from the data stored in the list (see example before).
+
+The \LUA\ code involved in dealing with the more complex
+references shown here is much more flexible and robust than the
+original \TEX\ code. This is a typical example of where the
+accumulated time spent on the \TEX\ based solution is large
+compared to the time spent on the \LUA\ variant. It's like driving
+200 km by car through hilly terrain and wondering how one did that
+in earlier times. Just like today's scenery is not by definition better
+than yesterday's, \MKIV\ code is not always better than \MKII\ code.
+
+\stopcomponent