% language=uk
\environment mk-environment
\startcomponent mk-order
\chapter{The order of things}
Normally the text that makes up a paragraph comes directly from
the input stream or from macro expansions (think of labels). When \TEX\
has collected enough content to make a paragraph, for instance
because a \type {\par} token signals it, \TEX\ will try to create
one. The raw material available for making such a paragraph is
linked in a list of nodes: references to glyphs in a font, kerns
(fixed spacing), glue (flexible spacing), penalties (consider them
to be directives), whatsits (can be anything, e.g.\ \PDF\ literals
or hyperlinks). The result is a list of horizontal boxes (wrappers
around lists that represent \quote {lines}) and this is either wrapped in
a vertical box or added to the main vertical list that keeps the
page stream.
The treatment consists of four activities:
\startitemize[packed]
\item construction of ligatures (an f plus an i can become fi)
\item hyphenation of words that cross a line boundary
\item kerning of characters based on information in the font
\item breaking the list into lines in an optimal way
\stopitemize
The process of breaking a paragraph into lines is also influenced by
protrusion (think of hanging punctuation) and expansion
(hz|-|optimization), but here we will not take these processes
into account. There are numerous variables that control
the process and the resulting quality.
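To give an idea of such variables, the following snippet peeks at a
few of the classical knobs from the \LUA\ end; the value assigned to
\type {emergencystretch} at the end is just an arbitrary example.
\starttyping
-- a few of the classical line break knobs, as seen from Lua:
texio.write_nl(string.format(
    "pretolerance %s, tolerance %s, emergencystretch %s",
    tex.pretolerance, tex.tolerance, tex.emergencystretch))

-- they can be set as well, for instance to permit some extra
-- stretch between words (the value is an arbitrary example):
tex.emergencystretch = tex.sp("12pt")
\stoptyping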
These activities are rather interwoven and optimized. For
instance, in order to hyphenate, ligatures have to be decomposed
and|/|or constructed. Hyphenation happens when needed. Decisions
about optimal breakpoints in lines can be influenced by penalties
(like: not too many hyphenated words in a row) and by permitting extra
stretch between words. Because a paragraph can be boxed and
unboxed, decomposed and fed into the machinery again, information
is kept around. Just imagine the following: you want to measure
the width of a word and therefore you box it. In order to get the
right dimensions, \TEX\ has to construct the ligatures and add
kerns. However, when we unbox that word and feed it into the
paragraph builder, potential hyphenation points have to be
consulted, and such a point might lie between the characters
that resulted in the ligature. You can imagine that adding (and
removing) inter|-|character kerns complicates the process even
more.
At the cost of some extra runtime and memory usage, in \LUATEX\
these steps are more isolated. There is a function that builds
ligatures, one that kerns characters, and another one that
hyphenates all words in a list, not just the ones that are
candidates for breaking. The potential breakpoints (called
discretionaries) can contain ligature information as well. The
linebreak process is also a separate function.
The order in which this happens now is:
\startitemize[packed,intro]
\item hyphenation of words
\item building of ligatures from sequences of glyphs
\item kerning of glyphs
\item breaking all this into lines
\stopitemize
One can discuss endlessly about the terminology here: are we dealing
with characters or with glyphs? When a glyph node is made, it
contains a reference to a slot in a font. Because in traditional
\TEX\ the number of slots is limited to 256, the relationship
between a character in the input and the shape in the font, called
a glyph, is kind of indirect (the input encoding versus font
encoding issue), while in \LUATEX\ we can keep the font in
\UNICODE\ encoding if we want. In traditional \TEX, hyphenation is
based on the font encoding and therefore on glyphs, and although in
\LUATEX\ this is still the case, there we can more safely talk of
characters until we start mapping them to shapes that have no
\UNICODE\ point. This is of course macro package dependent, but in
\CONTEXT\ \MKIV\ we normalize all input to \UNICODE\ exclusively.
The last step is now really isolated, and for that reason we can
best talk in terms of preparation of the to-be paragraph when
we refer to the first three activities. In \LUATEX\ these three
are available as functions that operate on a node list. They each
have their own callback, so we can disable them by replacing the
default functions by dummies. Then we can hook a new function
into the two places that matter, \type {hpack_filter} and \type
{pre_linebreak_filter}, and move the preparation there.
A simple overload is shown below. Because the first node is always
a whatsit that holds directional information (and at some point in
the future maybe even more paragraph related state info), we can
safely assume that \type {head} does not change. Of course this
situation might change when you start adding your own
functionality.
\starttyping
local function my_preparation(head)
    local tail = node.slide(head)           -- also adds prev pointers
    lang.hyphenate(head,tail)               -- returns done
    head, tail = node.ligaturing(head,tail) -- returns head, tail, done
    head, tail = node.kerning(head,tail)    -- returns head, tail, done
    return head
end

callback.register("pre_linebreak_filter", my_preparation)
callback.register("hpack_filter",         my_preparation)

local dummy = function(head,tail) return tail end

callback.register("hyphenate",  dummy)
callback.register("ligaturing", dummy)
callback.register("kerning",    dummy)
\stoptyping
It might be clear that the order of actions matters. It might also
be clear that you are responsible for that order yourself. There
is no pre||cooked mechanism for guarding your actions, and there are
several reasons for this:
\startitemize
\item Each macro package does things its own way so any hard-coded
mechanism would be replaced and overloaded anyway. Compare this to
the usage of catcodes, font systems, auxiliary files, user
interfaces, handling of inserts etc. The combination of callbacks,
the three mentioned functions and the availability of \LUA\ makes
it possible to implement any system you like.
\item Macro packages might want to provide hooks for specialized
node list processing, and since there are many places where code
can be hooked in, some kind of oversight is needed (real people
have to keep track of interference between user supplied features;
no program can do that).
\item User functions can mess up the node list, and successive
actions then might make the wrong assumptions. In order to guard
against this, macro packages might add tracing options, and again
there are too many ways to communicate with users. Debugging and
tracing have to be embedded in the bigger system in a natural way.
\stopitemize
In \CONTEXT\ \MKIV\ there are already a few places where users can
hook code into the task list, but so far we haven't really
encouraged that. The interfaces are simply not stable enough yet.
On the other hand, there are already quite a few node list
manipulators at work. The most prominent one is the \OPENTYPE\
feature handler. That one replaces the ligature and kerning
functions (at least for some fonts). It also means that we need to
keep an eye on possible interferences between \CONTEXT\ \MKIV\
mechanisms and those provided by \LUATEX.
For fonts, that is actually quite simple: the \LUATEX\ functions
use ligature and kerning information stored in the \TFM\ table,
and for \OPENTYPE\ fonts we simply don't provide that information
when we define a font, so in that case \LUATEX\ will not ligature
and kern. Users can influence this process to some extent by
setting the \type {mode} for a specific instance of a font to
\type {base} or \type {node}. Because \TYPEONE\ fonts have no
features like \OPENTYPE\ fonts do, such fonts are (at least
currently) always processed in base mode.
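The following sketch illustrates the idea (it is not the actual
\MKIV\ loader): when a font is being defined, one can strip the
ligature and kern information from the font table before handing it
to \LUATEX, so that the built||in passes have nothing to do and node
mode handlers can take over.
\starttyping
-- a minimal sketch, not the real MkIV code: remove the base mode
-- information from a font table before passing it back to luatex
local function strip_base_info(tfmdata)
    for _, chr in pairs(tfmdata.characters) do
        chr.ligatures = nil -- no built-in ligature building
        chr.kerns     = nil -- no built-in kerning
    end
    return tfmdata
end
\stoptyping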
Deep down in \CONTEXT\ we call a sequence of actions a \quote
{task}. One such task is \quote {processors} and the actions
discussed so far are in this category. Within this category we
have subcategories:
\starttabulate[|l|p|]
\NC \bf subcategory \NC \bf intended usage \NC \NR
\HL
\NC before \NC experimental (or module) plugins \NC \NR
\NC normalizers \NC cleanup and preparation handlers \NC \NR
\NC characters \NC operations on individual characters \NC \NR
\NC words \NC operations on words \NC \NR
\NC fonts \NC font related manipulations \NC \NR
\NC lists \NC manipulations on the list as a whole \NC \NR
\NC after \NC experimental (or module) plugins \NC \NR
\stoptabulate
Here \quote {plugins} are experimental handlers or specialized
ones provided in modules that are not part of the kernel. The categories
are not that distinctive and only provide a convenient way to group
actions.
Examples of normalizers are: checking for missing characters and
replacing character references by fallbacks. Character processors
are for instance directional analysers (for right to left
typesetting), case swapping, and specialized character triggered
hyphenation (like compound words); a sketch of such a processor
follows below. Word processors deal with hyphenation (here we use
the default function provided by \LUATEX) and spell checking. The
font processors deal with \OPENTYPE\ as well as with the ligature
building and kerning of other font types. Finally, the list
processors are responsible for tasks like special spacing (french
punctuation) and kerning (additional inter||character kerning). Of
course, this all is rather \CONTEXT\ specific and we expect to add
quite a few more non||trivial handlers in the upcoming years.
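As an illustration of the kind of handler that fits the \quote
{characters} subcategory, here is a toy processor that uppercases
the basic latin characters in a list; this is a simplified sketch
and not the actual \MKIV\ case swapping code.
\starttyping
local glyph = node.id("glyph")

-- a toy character processor: uppercase all lowercase basic latin
-- characters in the list (a naive sketch, not the MkIV code)
local function uppercasing(head,tail)
    for n in node.traverse_id(glyph,head) do
        local c = n.char
        if c >= 0x61 and c <= 0x7A then -- a .. z
            n.char = c - 0x20
        end
    end
    return head, tail
end
\stoptyping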
Many of these handlers are triggered by attributes. Nodes can have
many attributes and each can have many values. Traditionally \TEX\
had only a few attributes: language and font, where the first is
not even a real attribute and the second is only bound to glyph
nodes. In \LUATEX\ the language is also a glyph property. The nice
thing about attributes is that they can be set at the \TEX\ end
and obey grouping. This makes them for instance perfect for
implementing color mechanisms. Because attributes are part of the
nodes, and not nodes themselves, they don't influence or interfere
with processing unless one explicitly tests for them and acts
accordingly.
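A minimal sketch of such an attribute driven handler is shown below;
the attribute number 200 is an arbitrary choice, and in practice a
macro package will manage the allocation of attribute registers.
\starttyping
-- at the TeX end one sets the attribute, obeying grouping:
--   {\attribute200=5 some colored text}
local my_attribute = 200 -- arbitrary; normally allocated by the package

local function colorizer(head,tail)
    for n in node.traverse(head) do
        local v = node.has_attribute(n,my_attribute)
        if v then
            -- act on the value v, for instance inject color directives
        end
    end
    return head, tail
end
\stoptyping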
In addition to the mentioned task \quote {processors} we also have
a task \quote {shipouts} and there will be more tasks in future
versions of \CONTEXT. Again we have subcategories, currently:
\starttabulate[|l|p|]
\NC \bf subcategory \NC \bf intended usage \NC \NR
\HL
\NC before \NC experimental (or module) plugins \NC \NR
\NC normalizers \NC cleanup and preparation handlers \NC \NR
\NC finishers \NC manipulations on the list as a whole \NC \NR
\NC after \NC experimental (or module) plugins \NC \NR
\stoptabulate
An example of a normalizer is cleanup of the \quote {to be shipped
out} list. Finishers deal with color, transparency, overprint,
negated content (sometimes used in page imposition), special
effects (like outline fonts) and viewer layers (something
\PDF). Quite possibly hyperlink support will also be handled there,
but not before the backend code is rewritten.
The previous description is far from complete. For instance, not
all handlers use the same interface: some work from \type {head}
onwards, some need a \type {tail} pointer too. Some report back
success or failure. So the task handler needs to normalize their
usage. Also, some effort goes into optimizing the tasks in such a
way that processing the document is still reasonably fast. Keep in
mind that each construction of a box invokes a callback, and there
are many boxes used for constructing a page. Even a nilled
callback is one, so for a simple one word paragraph four callbacks
are triggered: the (nilled) hyphenate, ligature and kern callbacks
as well as the one called \type {pre_linebreak_filter}. The task
handler that we plug into the filter callbacks calls many functions,
each of them does one or more passes over the node list, and
in turn might do many calls to functions. You can imagine that
we're quite happy that \TEX\ as well as \LUA\ is so efficient.
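To get a feeling for these numbers one can simply count invocations;
the following sketch assumes the \type {my_preparation} function
shown earlier (in the same chunk) and reports a total at the end of
the run.
\starttyping
local calls = 0

local function counting_filter(head)
    calls = calls + 1
    return my_preparation(head) -- the function defined earlier
end

callback.register("pre_linebreak_filter", counting_filter)
callback.register("hpack_filter",         counting_filter)

-- report the damage at the end of the run:
callback.register("stop_run", function()
    texio.write_nl("filter callbacks triggered: " .. calls)
end)
\stoptyping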
As I already mentioned, implementing a task handler, as well as
deciding what actions within tasks to perform in what order, is
specific to the way a macro package is set up. The following code
can serve as a starting point:
\starttyping
filters = { } -- global namespace

local list = { }

function filters.add(fnc,n)
    if not n or n > #list + 1 then
        table.insert(list,fnc)   -- append at the end
    elseif n < 1 then
        table.insert(list,1,fnc) -- prepend
    else
        table.insert(list,n,fnc) -- insert at position n
    end
end

function filters.remove(fnc,n) -- removal is by position
    if n and n > 0 and n <= #list then
        table.remove(list,n)
    end
end

local function run_filters(head,...)
    local tail = node.slide(head)
    for _, fnc in ipairs(list) do
        head, tail = fnc(head,tail,...)
    end
    return head
end

local function hyphenation(head,tail)
    return head, tail, lang.hyphenate(head,tail) -- returns done
end

local function ligaturing(head,tail)
    return node.ligaturing(head,tail) -- returns head, tail, done
end

local function kerning(head,tail)
    return node.kerning(head,tail) -- returns head, tail, done
end

filters.add(hyphenation)
filters.add(ligaturing)
filters.add(kerning)

callback.register("pre_linebreak_filter", run_filters)
callback.register("hpack_filter",         run_filters)
\stoptyping
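Adding a user filter then boils down to the following; the tracer
shown here is of course just a hypothetical example.
\starttyping
-- a hypothetical user filter that just reports the list length:
local function my_tracer(head,tail)
    texio.write_nl("nodes in list: " .. node.length(head))
    return head, tail
end

filters.add(my_tracer)   -- run after the default actions
filters.add(my_tracer,1) -- or: run before all others
\stoptyping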
Although one can inject extra filters by using the \type {add}
function, it may be clear that this can be dangerous due to
interference. Therefore a slightly more secure variant is the
following, where \type {main} is reserved for macro package
actions and the others can be used by add||ons.
\starttyping
filters = { } -- global namespace

local list = {
    pre = { }, main = { }, post = { },
}

local order = {
    "pre", "main", "post"
}

local function somewhere(where)
    if not where then
        texio.write_nl("error: invalid filter category")
    elseif not list[where] then
        texio.write_nl(string.format("error: invalid filter category '%s'",where))
    else
        return list[where]
    end
    return false
end

function filters.add(where,fnc,n)
    local list = somewhere(where)
    if not list then
        -- error already reported
    elseif not n or n > #list + 1 then
        table.insert(list,fnc)   -- append at the end
    elseif n < 1 then
        table.insert(list,1,fnc) -- prepend
    else
        table.insert(list,n,fnc) -- insert at position n
    end
end

function filters.remove(where,fnc,n) -- removal is by position
    local list = somewhere(where)
    if list and n and n > 0 and n <= #list then
        table.remove(list,n)
    end
end

local function run_filters(head,...)
    local tail = node.slide(head)
    for _, lst in ipairs(order) do
        for _, fnc in ipairs(list[lst]) do
            head, tail = fnc(head,tail,...)
        end
    end
    return head
end

filters.add("main",hyphenation)
filters.add("main",ligaturing)
filters.add("main",kerning)

callback.register("pre_linebreak_filter", run_filters)
callback.register("hpack_filter",         run_filters)
\stoptyping
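An add||on can now keep its hands off the \type {main} category; the
counter below is again just a hypothetical example.
\starttyping
-- a hypothetical add-on: count glyph nodes after the main actions
local glyph = node.id("glyph")

local function glyph_counter(head,tail)
    texio.write_nl("glyph nodes: " .. node.count(glyph,head))
    return head, tail
end

filters.add("post", glyph_counter)
\stoptyping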
Of course, \CONTEXT\ users who try to use this code will
be punished by losing much of the functionality already
present, simply because we use yet another variant of the
above code.
\stopcomponent