\environment luatex-style
\environment luatex-logos

\startcomponent luatex-modifications

\startchapter[reference=modifications,title={Modifications}]

\startsection[title=The merged engines]

\startsubsection[title=The need for change]

The first version of \LUATEX\ only had a few extra primitives and it was largely
the same as \PDFTEX. Then we merged substantial parts of \ALEPH\ into the code
and got more primitives. Once things became more stable, the decision was made to
clean up the rather hybrid nature of the program. This means that some primitives
have been promoted to core primitives, often with a different name, and that
others were removed. This made it possible to start cleaning up the code base. We
describe most of these changes in the following sections.

Besides the expected changes caused by new functionality, there are a number of
not|-|so|-|expected changes. These are sometimes a side|-|effect of a new
(conflicting) feature, or, more often than not, a change necessary to clean up
the internal interfaces. These will also be mentioned.

\stopsubsection

\startsubsection[title=Changes from \TEX\ 3.1415926]

Of course it all starts with traditional \TEX. Even if we started with \PDFTEX,
most still comes from the original. But we diverge a bit.

\startitemize

\startitem
    The current code base is written in \CCODE, not \PASCAL. We use \CWEB\
    when possible.
\stopitem

\startitem
    See \in {chapter} [languages] for many small changes related to paragraph
    building, language handling and hyphenation. The most important change is
    that adding a brace group in the middle of a word (like in \type {of{}fice})
    does not prevent ligature creation.
\stopitem

\startitem
    There is no pool file: all strings are embedded during compilation.
\stopitem

\startitem
    The specifier \type {plus 1 fillll} does not generate an error. The extra
    \quote{l} is simply typeset.
\stopitem

\startitem
    The upper limit to \type {\endlinechar} and \type {\newlinechar} is 127.
\stopitem

\startitem
    The hz optimization code has been partially redone so that we no longer need
    to create extra font instances. The front- and backend have been decoupled and
    more efficient (\PDF) code is generated.
\stopitem

\stopitemize

\stopsubsection

\startsubsection[title=Changes from \ETEX\ 2.2]

Because \ETEX\ is the de facto standard extension, we of course provide the
\ETEX\ functionality, but with a few small adaptations.

\startitemize

\startitem
    The \ETEX\ functionality is always present and enabled so the prepended
    asterisk or \type {-etex} switch for \INITEX\ is not needed.
\stopitem

\startitem
    The \TEXXET\ extension is not present, so the primitives \type
    {\TeXXeTstate}, \type {\beginR}, \type {\beginL}, \type {\endR} and \type
    {\endL} are missing.
\stopitem

\startitem
    Some of the tracing information that is output by \ETEX's \type
    {\tracingassigns} and \type {\tracingrestores} is not there.
\stopitem

\startitem
    Register management in \LUATEX\ uses the \ALEPH\ model, so the maximum value
    is 65535 and the implementation uses a flat array instead of the mixed
    flat|\&|sparse model from \ETEX.
\stopitem

\startitem
    The \type {\savinghyphcodes} command is a no|-|op. \in {Chapter} [languages]
    explains why.
\stopitem

\startitem
    When kpathsea is used to find files, \LUATEX\ uses the \type {ofm} file
    format to search for font metrics. In turn, this means that \LUATEX\ looks at
    the \type {OFMFONTS} configuration variable (like \OMEGA\ and \ALEPH) instead
    of \type {TFMFONTS} (like \TEX\ and \PDFTEX). Likewise for virtual fonts
    (\LUATEX\ uses the variable \type {OVFFONTS} instead of \type {VFFONTS}).
\stopitem

\stopitemize
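
As a small illustration of the flat register model, the following plain \TEX\
sketch uses the highest register number mentioned above. It only assumes the
limit of 65535 from the list; the values themselves are arbitrary.

\starttyping
% the highest register number is 65535 for the register types used here
\count 65535 = 42
\dimen 65535 = 10pt
\toks  65535 = {some tokens}

% report the stored values on the console and in the log
\message{\the\count 65535, \the\dimen 65535, \the\toks 65535}
\stoptyping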

\stopsubsection

\startsubsection[title=Changes from \PDFTEX\ 1.40]

Because we want to produce \PDF\ the most natural starting point was the popular
\PDFTEX\ program. We inherited the stable features, dropped most of the
experimental code and promoted some functionality to core \LUATEX\ functionality,
which in turn triggered the renaming of primitives.

\startitemize

\startitem
    The (experimental) support for snap nodes has been removed, because it is
    much more natural to build this functionality on top of node processing and
    attributes. The associated primitives that are now gone are: \type
    {\pdfsnaprefpoint}, \type {\pdfsnapy}, and \type {\pdfsnapycomp}.
\stopitem

\startitem
    The (experimental) support for specialized spacing around nodes has also been
    removed. The associated primitives that are now gone are: \type
    {\pdfadjustinterwordglue}, \type {\pdfprependkern}, and \type {\pdfappendkern}, as
    well as the five supporting primitives \type {\knbscode}, \type {\stbscode}, \type
    {\shbscode}, \type {\knbccode}, and \type {\knaccode}.
\stopitem

\startitem
    A number of \quote {pdftex primitives} have been removed as they can be
    implemented using \LUA:

    \start \raggedright
    \type {\pdfelapsedtime}, \type {\pdfescapehex}, \type {\pdfescapename}, \type
    {\pdfescapestring}, \type {\pdffiledump}, \type {\pdffilemoddate}, \type
    {\pdffilesize}, \type {\pdfforcepagebox}, \type {\pdflastmatch}, \type
    {\pdfmatch}, \type {\pdfmdfivesum}, \type {\pdfmovechars}, \type
    {\pdfoptionalwaysusepdfpagebox}, \type {\pdfoptionpdfinclusionerrorlevel},
    \type {\pdfresettimer}, \type {\pdfshellescape}, \type {\pdfstrcmp} and \type
    {\pdfunescapehex}
    \par \stop
\stopitem

\startitem
    The version related primitives \type {\pdftexbanner}, \type {\pdftexversion}
    and \type {\pdftexrevision} are no longer present as there is no longer a
    strict relationship with \PDFTEX\ development.
\stopitem

\startitem
    The experimental snapper mechanism has been removed and therefore also the
    primitives:

    \start \raggedright
    \type {\pdfignoreddimen}, \type {\pdffirstlineheight}, \type
    {\pdfeachlineheight}, \type {\pdfeachlinedepth} and \type
    {\pdflastlinedepth}
    \par \stop
\stopitem

\startitem
    The experimental primitives \type {\primitive}, \type {\ifprimitive}, \type
    {\ifabsnum} and \type {\ifabsdim} are promoted to core primitives. The \type
    {\pdf*} prefixed originals are not available.
\stopitem

\startitem
    The \PNG\ transparency fix from 1.40.6 is not applied as high|-|level
    support is pending.
\stopitem

\startitem
    Two extra token lists are provided, \type {\pdfxformresources} and \type
    {\pdfxformattr}, as an alternative to \type {\pdfxform} keywords.
\stopitem

\startitem
    The current version of \LUATEX\ no longer replaces and|/|or merges fonts in
    embedded \PDF\ files with fonts of the enveloping \PDF\ document. This
    regression may be temporary, depending on what the rewritten font backend
    will look like.
\stopitem

\startitem
    The primitives \type {\pdfpagewidth} and \type {\pdfpageheight} have been removed
    because \type {\pagewidth} and \type {\pageheight} have that purpose.
\stopitem

\startitem
    The primitives \type {\pdfnormaldeviate}, \type {\pdfuniformdeviate}, \type
    {\pdfsetrandomseed} and \type {\pdfrandomseed} have been promoted to core
    primitives without \type {pdf} prefix so the original commands are no longer
    recognized.
\stopitem

\startitem
    The primitives \type {\ifincsname}, \type {\expanded} and \type {\quitvmode} are now
    core primitives.
\stopitem

\startitem
    As the hz and protrusion mechanisms are part of the core, the related
    primitives \type {\lpcode}, \type {\rpcode}, \type {\efcode}, \type
    {\leftmarginkern} and \type {\rightmarginkern} are promoted to core
    primitives. The two commands \type {\protrudechars} and \type
    {\adjustspacing} replace their \type {pdf} prefixed originals.
\stopitem

\startitem
    The \type {\tagcode} primitive is promoted to core primitive.
\stopitem

\startitem
    The \type {\letterspacefont} feature is now part of the core but will not be
    changed (improved). We just provide it for legacy use.
\stopitem

\startitem
    The \type {\pdfnoligatures} primitive is now \type {\ignoreligaturesinfont}.
\stopitem

\startitem
    The \type {\pdffontexpand} primitive is now \type {\expandglyphsinfont}.
\stopitem

\startitem
    Because position tracking is also available in \DVI\ mode the
    \type {\savepos}, \type {\lastxpos} and \type {\lastypos} commands now
    replace their \type {pdf} prefixed originals.
\stopitem

\startitem
    Candidates for removal are \type {\pdfcolorstackinit} and \type
    {\pdfcolorstack}.
\stopitem

\startitem
    Candidates for replacement are \type {\pdfoutput} (\type {\outputmode}) and
    \type {\pdfmatrix} (something with a normal syntax).
\stopitem

\startitem
    The introspective primitives \type {\pdflastximagecolordepth} and \type
    {\pdfximagebbox} have been removed. One can use external applications to
    determine these properties or use the built|-|in \type {img} library.
\stopitem

\stopitemize
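
A macro package can often bridge the removals and renames listed above. The
following is a minimal sketch and not the way any particular format does it. It
assumes that the promoted primitives are enabled under the names given in the
list. The macro name \type {\filesize} is made up for this example. Its \LUA\
part uses \type {lfs.attributes} from the \type {lfs} library that comes with
\LUATEX, and, unlike the removed \type {\pdffilesize}, it only looks at the
given path and does not search the \TEX\ tree.

\starttyping
% old names as aliases for some of the promoted primitives
\let\pdfpagewidth     \pagewidth
\let\pdfpageheight    \pageheight
\let\pdfsavepos       \savepos
\let\pdflastxpos      \lastxpos
\let\pdflastypos      \lastypos
\let\pdfuniformdeviate\uniformdeviate
\let\pdfsetrandomseed \setrandomseed

% a hypothetical stand-in for \pdffilesize: it expands to the size in
% bytes, or to nothing when the file cannot be accessed
\def\filesize#1{\directlua{
    local size = lfs.attributes("\luaescapestring{#1}","size")
    if size then tex.write(tostring(size)) end
}}
\stoptyping

Because \type {\directlua} is expandable, such a macro can be used in places
where the original primitive was used, for instance inside an \type {\edef}.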

One change involves the so called xforms and ximages. In \PDFTEX\ these are
implemented as so called whatsits. But contrary to other whatsits they have
dimensions that need to be taken into account when for instance calculating
optimal linebreaks. In \LUATEX\ these are now promoted to normal nodes, which
simplifies code that needs those dimensions.

Another reason for promotion is that these are useful concepts. Backends can
provide the ability to use content that has been rendered in several places,
and images are also common. For that reason we also changed the names:

\starttabulate[|l|l|]
\NC \bf new name                         \NC \bf old name \NC \NR
\NC \type {\saveboxresource}             \NC \type {\pdfxform}           \NC \NR
\NC \type {\saveimageresource}           \NC \type {\pdfximage}          \NC \NR
\NC \type {\useboxresource}              \NC \type {\pdfrefxform}        \NC \NR
\NC \type {\useimageresource}            \NC \type {\pdfrefximage}       \NC \NR
\NC \type {\lastsavedboxresourceindex}   \NC \type {\pdflastxform}       \NC \NR
\NC \type {\lastsavedimageresourceindex} \NC \type {\pdflastximage}      \NC \NR
\NC \type {\lastsavedimageresourcepages} \NC \type {\pdflastximagepages} \NC \NR
\stoptabulate

There are a few \type {\pdf...} primitives that relate to this but these are
typical backend specific ones. The index that gets returned is to be considered
as \quote {just a number} and although it still has the same meaning (object
related) as before, you should not depend on that.
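
The renamed primitives can be combined as in the following sketch. It assumes
plain \TEX\ conventions with the new names enabled. The macro name \type
{\myresource} is made up for this example, and the index is only stored and
passed on, never interpreted.

\starttyping
\setbox0\hbox{Reusable content}

% store the box as a resource (formerly \pdfxform)
\saveboxresource 0

% remember the index, treating it as just a number
\edef\myresource{\the\lastsavedboxresourceindex}

% use the resource twice in one line (formerly \pdfrefxform)
\leavevmode
\useboxresource\myresource\relax
\quad
\useboxresource\myresource\relax
\stoptyping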

\stopsubsection

\startsubsection[title=Changes from \ALEPH\ RC4]

Because we wanted proper directional typesetting the \ALEPH\ mechanisms looked
most attractive. These are rather close to the ones provided by \OMEGA, so what
we say next applies to both these programs.

\startitemize

\startitem
    The extended 16-bit math primitives (\type {\omathcode} etc.) have been
    removed.
\stopitem

\startitem
    The \OCP\ processing is no longer supported at all. As a consequence, the
    following primitives have been removed:

    \start \raggedright
    \type {\ocp}, \type {\externalocp}, \type {\ocplist}, \type {\pushocplist},
    \type {\popocplist}, \type {\clearocplists}, \type {\addbeforeocplist}, \type
    {\addafterocplist}, \type {\removebeforeocplist}, \type {\removeafterocplist}
    and \type {\ocptracelevel}
    \par \stop
\stopitem

\startitem
    \LUATEX\ only understands 4~of the 16~direction specifiers of \ALEPH: \type
    {TLT} (latin), \type {TRT} (arabic), \type {RTT} (cjk), \type {LTL}
    (mongolian). All other direction specifiers generate an error.
\stopitem

\startitem
    The input translations from \ALEPH\ are not implemented, so the related
    primitives are not available:

    \start \raggedright
    \type {\DefaultInputMode}, \type {\noDefaultInputMode}, \type {\noInputMode},
    \type {\InputMode}, \type {\DefaultOutputMode}, \type {\noDefaultOutputMode},
    \type {\noOutputMode}, \type {\OutputMode}, \type {\DefaultInputTranslation},
    \type {\noDefaultInputTranslation}, \type {\noInputTranslation}, \type
    {\InputTranslation}, \type {\DefaultOutputTranslation}, \type
    {\noDefaultOutputTranslation}, \type {\noOutputTranslation} and \type
    {\OutputTranslation}
    \par \stop
\stopitem

\startitem
    Several bugs have been fixed. The \type {\hoffset} bug when \type {\pagedir TRT}
    is gone, removing the need for an explicit fix to \type {\hoffset}. Also, the
    bug causing \type {\fam} to fail for family numbers above 15 is fixed. A fair
    number of other minor bugs are fixed as well, most of these related to \type
    {\tracingcommands} output.
\stopitem

\startitem
    The scanner for direction specifications now allows an optional space after
    the direction is completely parsed.
\stopitem

\startitem
    The \type {^^} notation can come in five and six item repetitions also, to
    insert characters that do not fit in the BMP.
\stopitem

\startitem
    Glues {\it immediately after} direction change commands are not legal
    breakpoints.
\stopitem

\startitem
    Several mechanisms that need to be right|-|to|-|left aware have been
    improved. For instance placement of formula numbers.
\stopitem

\startitem
    The page dimension related primitives \type {\pagewidth} and \type {\pageheight} have
    been promoted to core primitives.
\stopitem

\startitem
    The primitives \type {\charwd}, \type {\charht}, \type {\chardp} and \type {\charit}
    have been removed as we have the \ETEX\ variants \type {\fontchar*}.
\stopitem

\startitem
    The two dimension registers \type {\pagerightoffset} and \type
    {\pagebottomoffset} are now core primitives.
\stopitem

\startitem
    The direction related primitives \type {\pagedir}, \type {\bodydir}, \type
    {\pardir}, \type {\textdir}, \type {\mathdir} and \type {\boxdir} are now
    core primitives.
\stopitem

\startitem
    The promotion of primitives to core primitives as well as the removal of all
    others means that the initialization namespace \type {aleph} is gone.
\stopitem

\stopitemize
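
The direction primitives from the list above can be used as in the following
sketch. It assumes plain \TEX\ conventions with these primitives enabled under
the names shown, and uses only two of the four supported specifiers.

\starttyping
\begingroup
    % a right-to-left paragraph
    \pardir TRT \textdir TRT
    Some right-to-left text with an
    {\textdir TLT embedded left-to-right run}
    in the middle.
    \par
\endgroup
\stoptyping

The group keeps the direction settings local, so the paragraphs that follow are
typeset with the directions that were in effect before.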

\stopsubsection

\startsubsection[title=Changes from standard \WEBC]

The compilation framework is \WEBC\ and we keep using that but without the
\PASCAL\ to \CCODE\ step. This framework also provides some common features that
deal with reading bytes from files and locating files in \TDS. This is what we do
differently:

\startitemize

\startitem
    There is no mltex support.
\stopitem

\startitem
    There is no enctex support.
\stopitem

\startitem
    The following commandline switches are silently ignored, even in non|-|\LUA\
    mode: \type {-8bit}, \type {-translate-file}, \type {-mltex}, \type {-enc}
    and \type {-etex}.
\stopitem

\startitem
    The \type {\openout} whatsits are not written to the log file.
\stopitem

\startitem
    Some of the so|-|called web2c extensions are hard to set up in non|-|\KPSE\
    mode because \type {texmf.cnf} is not read: \type {shell-escape} is off (but
    that is not a problem because of \LUA's \type {os.execute}), and the paranoia
    checks on \type {openin} and \type {openout} do not happen (however, it is
    easy for a \LUA\ script to do this itself by overloading \type {io.open}).
\stopitem

\startitem
    The \quote{E} option does not do anything useful.
\stopitem

\stopitemize
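
The remark about overloading \type {io.open} can be illustrated as follows. This
is a minimal sketch under stated assumptions, not a complete sandbox: the policy
(refuse write access to absolute \UNIX\ style paths and to paths that contain
\type {..}) and the name \type {original_open} are chosen for this example
only, and other routes to the file system are not covered.

\starttyping
\directlua{
    local original_open = io.open
    io.open = function(name,mode)
        mode = mode or "r"
        % plain (non pattern) searches, so no percent escapes are needed
        local writing = string.find(mode,"w",1,true)
                     or string.find(mode,"a",1,true)
                     or string.find(mode,"+",1,true)
        local outside = string.sub(name,1,1) == "/"
                     or string.find(name,"..",1,true)
        if writing and outside then
            % same convention as a failing io.open: nil plus a message
            return nil, "write access denied: " .. name
        end
        return original_open(name,mode)
    end
}
\stoptyping

The comments use the \TEX\ comment character because line ends inside \type
{\directlua} become spaces, which would make a \LUA\ comment swallow the rest of
the code.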

\stopsubsection

\stopsection

\startsection[title=Implementation notes]

\startsubsection[title=Memory allocation]

The single internal memory heap that traditional \TEX\ used for tokens and nodes
is split into two separate arrays. Each of these will grow dynamically when
needed.

The \type {texmf.cnf} settings related to main memory are no longer used (these
are: \type {main_memory}, \type {mem_bot}, \type {extra_mem_top} and \type
{extra_mem_bot}). \quote {Out of main memory} errors can still occur, but the
limiting factor is now the amount of RAM in your system, not a predefined limit.

Also, the memory (de)allocation routines for nodes are completely rewritten. The
relevant code now lives in the C file \type {texnode.c}, and basically uses a
dozen or so \quote {avail} lists instead of a doubly|-|linked model. An extra
function layer is added so that the code can ask for nodes by type instead of
directly requisitioning a certain amount of memory words.

Because of the split into two arrays and the resulting differences in the data
structures, some of the macros have been duplicated. For instance, there are now
\type {vlink} and \type {vinfo} as well as \type {token_link} and \type
{token_info}. All access to the variable memory array is now hidden behind a
macro called \type {vmem}.

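The implementation of the growth of two arrays (via reallocation) introduces a
potential pitfall: the memory arrays should never be used as the left hand side
of a statement that can modify the array in question. In \CCODE\ the address of
the left hand side may be computed before the right hand side is evaluated, so
when the right hand side triggers a reallocation, the result can end up in the
old block that has already been freed.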

The input line buffer and pool size are now also reallocated when needed, and the
\type {texmf.cnf} settings \type {buf_size} and \type {pool_size} are silently
ignored.

\stopsubsection

\startsubsection[title=Sparse arrays]

The \type {\mathcode}, \type {\delcode}, \type {\catcode}, \type {\sfcode}, \type {\lccode}
and \type {\uccode} tables are now sparse arrays that are implemented in~\CCODE.
They are no longer part of the \TEX\ \quote {equivalence table} and because each
had 1.1 million entries with a few memory words each, this makes a major
difference in memory usage.

The \type {\catcode}, \type {\sfcode}, \type {\lccode} and \type {\uccode} assignments do
not yet show up when using the etex tracing routines \type {\tracingassigns} and
\type {\tracingrestores} (code simply not written yet).

A side|-|effect of the current implementation is that \type {\global} is now more
expensive in terms of processing than non|-|global assignments.

See \type {mathcodes.c} and \type {textcodes.c} if you are interested in the
details.

Also, the glyph ids within a font are now managed by means of a sparse array and
glyph ids can go up to index $2^{21}-1$.

\stopsubsection

\startsubsection[title=Simple single-character csnames]

Single|-|character commands are no longer treated specially in the internals:
they are stored in the hash just like the multiletter csnames.

The code that displays control sequences explicitly checks if the length is one
when it has to decide whether or not to add a trailing space.

Active characters are internally implemented as a special type of multi|-|letter
control sequences that uses a prefix that is otherwise impossible to obtain.

\stopsubsection

\startsubsection[title=Compressed format]

The format is passed through zlib, allowing it to shrink to roughly half of the
size it would have had in uncompressed form. This takes a few more \CPU\ cycles
but much less disk \IO, so it should still be faster.

\stopsubsection

\startsubsection[title=Binary file reading]

All of the internal code is changed in such a way that if one of the \type
{read_xxx_file} callbacks is not set, then the file is read by a C function using
basically the same convention as the callback: a single read into a buffer big
enough to hold the entire file contents. While this uses more memory than the
previous code (that mostly used \type {getc} calls), it can be quite a bit faster
(depending on your I/O subsystem).

\stopsubsection

\stopsection

\stopchapter

\stopcomponent