% language=uk

\startcomponent hybrid-backends

\environment hybrid-environment

\startchapter[title={Backend code}]

\startsection [title={Introduction}]

In \CONTEXT\ we've always separated the backend code into so-called driver files.
This means that the typesetting code only calls the \API\ and contains no backend
specific code. That way we can support backends like dvipsone (and dviwindo),
dvips, acrobat, pdftex and dvipdfmx with one interface. A similar model is used
in \MKIV, although at the moment we only have one backend: \PDF. \footnote {At
this moment we only support the native \PDF\ backend, but future versions might
support \XML\ (\HTML) output as well.}

Some \CONTEXT\ users like to add their own \PDF\ specific code to their styles or
modules. However, such extensions can interfere with existing code, especially
when resources are involved, so they should be made via the official helper
macros.

The next sections give an overview of the current approach. There are still
quite a few rough edges, but these will be polished as soon as the backend code
is more isolated in \LUATEX\ itself.

\stopsection

\startsection [title={Structure}]

A \PDF\ file is a tree of indirect objects. Each object has a number and the file
contains a table (or multiple tables) that relates these numbers to positions in
the file (or to positions in a compressed object stream). That way a file can be
viewed without reading all data: a viewer only loads what is needed.

\starttyping
1 0 obj
<< /Name (test) /Address 2 0 R >>
endobj
2 0 obj
[ (Main Street) (24) (postal code) (MyPlace) ]
endobj
\stoptyping

For the sake of the discussion we also consider strings like \type {(test)} to be
objects. The next table lists what we can encounter in a \PDF\ file. Objects can
be indirect, in which case they are accessed through a reference like \type {2 0 R},
or direct.

\starttabulate[|l|l|p|]
\FL
\NC \bf type \NC \bf form \NC \bf meaning \NC \NR
\TL
\NC constant   \NC \type{/...}       \NC A symbol (prescribed string). \NC \NR
\NC string     \NC \type{(...)}      \NC A sequence of characters in pdfdoc encoding. \NC \NR
\NC unicode    \NC \type{<...>}      \NC A sequence of characters in utf16 encoding. \NC \NR
\NC number     \NC \type{3.1415}     \NC A number constant. \NC \NR
\NC boolean    \NC \type{true/false} \NC A boolean constant. \NC \NR
\NC reference  \NC \type{N 0 R}      \NC A reference to an object. \NC \NR
\NC dictionary \NC \type{<< ... >>}  \NC A collection of key value pairs where each
                   value is itself an object or a reference to one. \NC \NR
\NC array      \NC \type{[ ... ]}    \NC A list of objects or references to objects. \NC \NR
\NC stream     \NC                   \NC A sequence of bytes, accompanied by a dictionary
                   that contains descriptive data such as its length. \NC \NR
\NC xform      \NC                   \NC A special kind of object containing a reusable
                   blob of data, for example an image. \NC \NR
\LL
\stoptabulate
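To make the table a bit more concrete, here is a toy serializer in plain \LUA\
(5.3 or later) that turns \LUA\ values into the notation listed above. It is only
a sketch: the helpers \type {constant}, \type {reference} and \type {serialize}
are hypothetical and not part of \type {lpdf}. Dictionary keys are sorted just to
get predictable output, and streams and xforms are left out.

\starttyping
-- Toy serializer (hypothetical, not lpdf): maps Lua values onto the
-- PDF notation from the table above. Streams and xforms are left out.
local function constant(name) return { kind = "constant", value = name } end
local function reference(n)  return { kind = "reference", value = n } end

local serialize -- declared first so nested values can recurse

local function escape(s)
  -- backslashes and parentheses must be escaped in a literal string
  return (s:gsub("[\\()]", "\\%0"))
end

serialize = function(v)
  local t = type(v)
  if t == "nil" then
    return "null"
  elseif t == "boolean" then
    return v and "true" or "false"
  elseif t == "number" then
    return tostring(v)
  elseif t == "string" then
    return "(" .. escape(v) .. ")"
  elseif v.kind == "constant" then
    return "/" .. v.value
  elseif v.kind == "reference" then
    return v.value .. " 0 R"
  elseif #v > 0 then -- a sequence becomes an array
    local parts = {}
    for i = 1, #v do parts[i] = serialize(v[i]) end
    return "[ " .. table.concat(parts, " ") .. " ]"
  else -- any other table becomes a dictionary, keys sorted for stable output
    local keys = {}
    for k in pairs(v) do keys[#keys + 1] = k end
    table.sort(keys)
    local parts = {}
    for i, k in ipairs(keys) do
      parts[i] = "/" .. k .. " " .. serialize(v[k])
    end
    return "<< " .. table.concat(parts, " ") .. " >>"
  end
end

print(serialize { Name = "test", Address = reference(2) })
-- << /Address 2 0 R /Name (test) >>
\stoptyping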

When writing additional backend code, we mostly create dictionaries.

\starttyping
<< /Name (test) /Address 2 0 R >>
\stoptyping

In this case the indirect object that \type {2 0 R} refers to can look like:

\starttyping
[ (Main Street) (24) (postal code) (MyPlace) ]
\stoptyping

It all starts at the document's root object. From there we access the page tree
and resources. Each page carries its own resource information, which makes random
access easier. A page has a page stream, and there we find the content to be
rendered as a mixture of (\UNICODE) strings and special drawing and rendering
operators. We will not discuss them here as they are mostly generated by the
engine itself or by dedicated subsystems like the \METAPOST\ converter. There we
use literal or \type {\latelua} whatsits to inject code into the current stream.

In the \CONTEXT\ \MKII\ backend driver code you will see objects in their
verbose form. The content is passed on using special primitives, like \type
{\pdfobj}, \type {\pdfannot}, \type {\pdfcatalog}, etc. In \MKIV\ no such
primitives are used. In fact, some of them are overloaded to do nothing at all.
In the \LUA\ backend code you will find function calls like:

\starttyping
local d = lpdf.dictionary {
    Name    = lpdf.string("test"),
    Address = lpdf.array {
        "Main Street", "24", "postal code", "MyPlace",
    }
}
\stoptyping

Equally valid is:

\starttyping
local d = lpdf.dictionary()
d.Name = "test"
\stoptyping

Eventually the object will end up in the file using calls like:

\starttyping
local r = pdf.immediateobj(tostring(d))
\stoptyping

or using the wrapper (which permits tracing):

\starttyping
local r = lpdf.flushobject(d)
\stoptyping

The object content will be serialized according to the formal specification, so
the proper \type {<< >>} etc.\ are added. If you only want the content, without
these delimiters, you can call the object as a function:

\starttyping
local dict = d()
\stoptyping
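The difference between serializing an object and calling it can be mimicked with
an ordinary \LUA\ metatable that has both a \type {__tostring} and a \type
{__call} handler. The sketch below assumes that this is roughly how such objects
behave; the names \type {dictionary} and \type {content} are hypothetical, and
values are naively written as literal strings without any escaping.

\starttyping
-- Toy dictionary type (hypothetical, not the lpdf code): tostring gives
-- the full << ... >> form, calling the object gives only the content.
local mt = {}

local function content(t)
  local keys = {}
  for k in pairs(t) do keys[#keys + 1] = k end
  table.sort(keys) -- predictable order for the example
  local parts = {}
  for i, k in ipairs(keys) do
    parts[i] = "/" .. k .. " (" .. tostring(t[k]) .. ")" -- no escaping
  end
  return table.concat(parts, " ")
end

mt.__call     = content
mt.__tostring = function(t) return "<< " .. content(t) .. " >>" end

local function dictionary(t)
  return setmetatable(t or {}, mt)
end

local d = dictionary()
d.Name = "test"
print(tostring(d)) -- << /Name (test) >>
print(d())         -- /Name (test)
\stoptyping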

An example of using references is:

\starttyping
local a = lpdf.array {
    "Main Street", "24", "postal code", "MyPlace",
}
local d = lpdf.dictionary {
    Name    = lpdf.string("test"),
    Address = lpdf.reference(lpdf.flushobject(a)), -- refer to the flushed array
}
local r = lpdf.flushobject(d)
\stoptyping
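The referencing model behind this example can be sketched in a few lines of
plain \LUA. This is a toy with hypothetical names, not the \type {lpdf} code:
each flushed object gets the next free number, a reference is that number
followed by \type {0 R}, and the body is written as \type {N 0 obj ... endobj}
blocks. The cross reference table that a real file also needs is left out.

\starttyping
-- Toy model of indirect objects (hypothetical names, not lpdf): every
-- flushed object gets the next free number, and other objects refer to
-- it with an "N 0 R" reference.
local objects = {}  -- object number -> serialized content
local last = 0      -- last number handed out

local function flushobject(content)
  last = last + 1
  objects[last] = content
  return last
end

local function reference(n)
  return string.format("%d 0 R", n)
end

-- Write the objects the way they appear in the body of a PDF file.
local function body()
  local lines = {}
  for n = 1, last do
    lines[n] = string.format("%d 0 obj %s endobj", n, objects[n])
  end
  return table.concat(lines, "\n")
end

-- Same structure as the lpdf example: flush the array, then refer to it.
local a = flushobject("[ (Main Street) (24) (postal code) (MyPlace) ]")
local d = flushobject("<< /Name (test) /Address " .. reference(a) .. " >>")
print(body())
\stoptyping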

\stopsection

We have the following creators. Their arguments are optional.

\starttabulate[|l|p|]
\FL
\NC \bf function \NC \bf optional parameter \NC \NR
\TL
%NC \type{lpdf.stream}      \NC indexed table of operators \NC \NR
\NC \type{lpdf.dictionary}  \NC hash with key/values \NC \NR
\NC \type{lpdf.array}       \NC indexed table of objects \NC \NR
\NC \type{lpdf.unicode}     \NC string \NC \NR
\NC \type{lpdf.string}      \NC string \NC \NR
\NC \type{lpdf.number}      \NC number \NC \NR
\NC \type{lpdf.constant}    \NC string \NC \NR
\NC \type{lpdf.null}        \NC \NC \NR
\NC \type{lpdf.boolean}     \NC boolean \NC \NR
%NC \type{lpdf.true}        \NC \NC \NR
%NC \type{lpdf.false}       \NC \NC \NR
\NC \type{lpdf.reference}   \NC string \NC \NR
\NC \type{lpdf.verbose}     \NC indexed table of strings \NC \NR
\LL
\stoptabulate

Flushing objects is done with:

\starttyping
lpdf.flushobject(obj)
\stoptyping

Reserving an object number is of course also possible:

\starttyping
local r = lpdf.reserveobject()
\stoptyping

Such a reserved object is flushed later with:

\starttyping
lpdf.flushobject(r,obj)
\stoptyping

We also support named objects:

\starttyping
lpdf.reserveobject("myobject")

lpdf.flushobject("myobject",obj)
\stoptyping
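Reserving is what makes forward references possible: a number is handed out
before the content exists. The toy below, again with hypothetical names and not
taken from \type {lpdf}, mimics the three calling conventions shown above:
flushing directly, flushing a reserved number, and flushing a named object.

\starttyping
-- Toy model (hypothetical names, not lpdf) of reserving object numbers.
-- A reserved number can be referred to before its content exists.
local objects, names = {}, {}
local last = 0

local function reserveobject(name)
  last = last + 1
  if name then names[name] = last end -- remember named objects
  return last
end

-- flushobject(content), flushobject(number, content) or
-- flushobject(name, content), like the calls shown above
local function flushobject(target, content)
  if content == nil then
    content, target = target, reserveobject()
  elseif type(target) == "string" then
    target = assert(names[target], "unknown named object")
  end
  objects[target] = content
  return target
end

-- A forward reference: reserve first, refer to it, fill it in later.
local r = reserveobject()
local p = flushobject("<< /Next " .. r .. " 0 R >>")
flushobject(r, "<< /Prev " .. p .. " 0 R >>")

reserveobject("myobject")
flushobject("myobject", "(named content)")
\stoptyping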

\startsection [title={Resources}]

While \LUATEX\ itself will embed all resources related to regular typesetting,
\MKIV\ has to take care of embedding those related to special tricks, like
annotations, spot colors, layers, shades, transparencies, metadata, etc. If you
ever looked at the \MKII\ \type {spec-*} files you might have gotten the
impression that it quickly becomes messy. The code there is actually rather old
and evolved along with the \PDF\ format, while \PDFTEX\ and \DVIPDFMX\ matured to
their current state. As a result we have a dedicated object referencing model
that sometimes results in multiple passes due to forward references. We could
have gotten away from that with the latest versions of \PDFTEX, as it provides
means to reserve object numbers, but it makes little sense to do that now that
\MKII\ is frozen.

Because third party modules (like tikz) can also add resources, we provide, as in
\MKII, an \API\ that makes sure that no interference takes place. Think of macros
like:

\starttyping
\pdfbackendsetcatalog       {key}{string}
\pdfbackendsetinfo          {key}{string}
\pdfbackendsetname          {key}{string}

\pdfbackendsetpageattribute {key}{string}
\pdfbackendsetpagesattribute{key}{string}
\pdfbackendsetpageresource  {key}{string}

\pdfbackendsetextgstate     {key}{pdfdata}
\pdfbackendsetcolorspace    {key}{pdfdata}
\pdfbackendsetpattern       {key}{pdfdata}
\pdfbackendsetshade         {key}{pdfdata}
\stoptyping

You are free to use the \LUA\ interface instead, as it offers more
possibilities. The names are similar, for instance:

\starttyping
lpdf.addtoinfo(key,anything_valid_pdf)
\stoptyping

At the time of this writing (\LUATEX\ 0.50) there are still places where \TEX\
and \LUA\ code are interwoven in a non optimal way, but that will change once the
backend is completely separated and we can do more \TEX\ trickery at the \LUA\
end.

Also, we currently expose more of the backend code than we would like, and future
versions will offer more restricted access. The following functions will stay
public:

\starttyping
lpdf.addtopageresources   (key,value)
lpdf.addtopageattributes  (key,value)
lpdf.addtopagesattributes (key,value)

lpdf.adddocumentextgstate (key,value)
lpdf.adddocumentcolorspace(key,value)
lpdf.adddocumentpattern   (key,value)
lpdf.adddocumentshade     (key,value)

lpdf.addtocatalog         (key,value)
lpdf.addtoinfo            (key,value)
lpdf.addtonames           (key,value)
\stoptyping
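The point of collecting resources through such functions is that the backend can
merge everything into one dictionary at the end, instead of each module writing
its own. The following toy collector illustrates that idea. The names are
hypothetical, and so is the policy of raising an error when two different values
are registered under the same key; we don't claim that \type {lpdf} behaves that
way.

\starttyping
-- Toy resource collector (hypothetical, not lpdf): modules register
-- named entries per category and the backend writes one merged
-- dictionary at the end. Raising an error on conflicting values is an
-- assumption made for this sketch.
local collected = {} -- category -> { key -> PDF code }

local function addresource(category, key, value)
  local entries = collected[category]
  if not entries then
    entries = {}
    collected[category] = entries
  end
  if entries[key] ~= nil and entries[key] ~= value then
    error("conflicting values for /" .. category .. " /" .. key)
  end
  entries[key] = value
end

local function sortedkeys(t)
  local keys = {}
  for k in pairs(t) do keys[#keys + 1] = k end
  table.sort(keys)
  return keys
end

-- Serialize everything once, sorted so that the output is reproducible.
local function finalize()
  local outer = {}
  for _, category in ipairs(sortedkeys(collected)) do
    local inner = {}
    for _, key in ipairs(sortedkeys(collected[category])) do
      inner[#inner + 1] = "/" .. key .. " " .. collected[category][key]
    end
    outer[#outer + 1] = "/" .. category .. " << " .. table.concat(inner, " ") .. " >>"
  end
  return "<< " .. table.concat(outer, " ") .. " >>"
end

-- Independent "modules" adding two transparency states and a color space.
addresource("ExtGState", "TrA", "<< /ca 0.5 >>")
addresource("ExtGState", "TrB", "<< /CA 0.25 >>")
addresource("ColorSpace", "SpotA", "12 0 R")
print(finalize())
\stoptyping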

There are several tracing options built in and some more will be added in due
time:

\starttyping
\enabletrackers
  [backend.finalizers,
   backend.resources,
   backend.objects,
   backend.detail]
\stoptyping

As with all trackers you can also pass them on the command line, for example:

\starttyping
context --trackers=backend.* yourfile
\stoptyping

The reference related backend mechanisms have their own trackers.

\stopsection

\startsection [title={Transformations}]

At the time of this writing there is still some backend related code at the \TEX\
end that needs a cleanup. Most noticeable is the code that deals with
transformations (like scaling). At some point \PDFTEX\ introduced a primitive for
this, but it did not cover the complete transformation matrix, so we never used
it. In \LUATEX\ we will come up with a better mechanism. Until then we stick to
the \MKII\ method.

\stopsection

\startsection [title={Annotations}]

The \LUA\ based backend of \MKIV\ is not that much less code, but it is definitely
cleaner. There is quite some code because in \CONTEXT\ we also handle
annotations and destinations in \LUA. In other words: \TEX\ is not bothered by
the backend any more. We could make that split without too much impact as we
never depended on \PDFTEX\ hyperlink related features and used generic
annotations instead. It's for that reason that \CONTEXT\ has always been able to
nest hyperlinks and have annotations with a chain of actions.

Another reason for doing it all at the \LUA\ end is that, as in \MKII, we have to
deal with the rather hybrid cross reference mechanism, which uses a sort of
language of its own, and parsing that is easier at the \LUA\ end. Think of:

\starttyping
\definereference[somesound][StartSound(attention)]

\at {just some page} [someplace,somesound,StartMovie(somemovie)]
\stoptyping

We parse the specification, expanding shortcuts when needed, create an action
chain, make sure that the movie related resources are taken care of (normally the
movie itself will be a figure), and turn the three words into hyperlinks. As this
all happens in \LUA\ we have less \TEX\ code. Contrary to what you might expect,
the \LUA\ code is not that much faster, because the \MKII\ \TEX\ code is already rather
optimized.
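A much simplified version of the parsing step could look as follows. This is a
hypothetical sketch and not the \CONTEXT\ parser: it splits the list on commas
(so arguments that contain commas are not supported), expands shortcuts like the
one defined with \type {\definereference}, and turns each item into either a
destination or an action with an argument.

\starttyping
-- Toy parser (hypothetical, not the ConTeXt code) for a reference list
-- such as "someplace,somesound,StartMovie(somemovie)". Splitting on
-- commas means that arguments containing commas are not supported.
local shortcuts = {
  somesound = "StartSound(attention)", -- as set by \definereference
}

local function parseitem(item)
  local action, argument = item:match("^(%a+)%((.-)%)$")
  if action then
    return { action = action, argument = argument }
  end
  return { destination = item }
end

local function parsereferences(spec)
  local chain = {}
  for item in spec:gmatch("[^,]+") do
    item = item:match("^%s*(.-)%s*$") -- trim surrounding spaces
    item = shortcuts[item] or item    -- expand a shortcut
    chain[#chain + 1] = parseitem(item)
  end
  return chain
end

local chain = parsereferences("someplace,somesound,StartMovie(somemovie)")
for i, step in ipairs(chain) do
  print(i, step.destination or (step.action .. ": " .. step.argument))
end
\stoptyping

Each entry of the resulting chain would then become one action (or the
destination) of the annotation that is attached to the words.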

Special features like \JAVASCRIPT\ as well as widgets (and forms) are also
reimplemented. Support for \JAVASCRIPT\ is not that complex at all, but because
in \CONTEXT\ we can organize scripts in collections and automatically include
the functions that are used, some code is still needed. As we now do this in
\LUA\ we use less \TEX\ memory. Reimplementing widgets took a bit more work as I
used the opportunity to remove hacks for older viewers. As support for widgets is
somewhat unstable in viewers, quite some testing was needed, especially because
we keep supporting cloned and copied fields (resulting in widget trees).

An interesting complication with widgets is that each instance can have a lot of
properties. As we want to be able to use thousands of them in one document, each
with different properties, we used efficient storage in \MKII\ and want to do the
same in \LUA. Most code at the \TEX\ end is related to passing all those options.

You could use the \LUA\ functions that relate to annotations etc.\ but normally
you will use the regular \CONTEXT\ user interface. For practical reasons, the
backend code is grouped in several tables:

The \type {backends} table has subtables for each backend and currently there is
only one: \type {pdf}. Each backend in turn provides a few tables. In the \type
{codeinjections} namespace we collect functions that don't interfere with the
typesetting or typeset result, like inserting all kinds of resources (movies,
attachments, etc.), widget related functionality, and in fact everything that
does not fit into the other categories. In \type {nodeinjections} we organize
functions that inject literal \PDF\ code in the nodelist which then ends up in
the \PDF\ stream: color, layers, etc. The \type {registrations} table is reserved
for functions related to resources that result from node injections: spot colors,
transparencies, etc. Once the backend code is finished we might come up with
another organization. No matter what we end up with, the way the \type {backends}
table is supposed to be organized determines the \API, and those who have seen
the \MKII\ backend code will recognize some of it.
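The organization described above can be pictured as nested \LUA\ tables. In the
sketch below only the names \type {backends}, \type {pdf}, \type {codeinjections},
\type {nodeinjections} and \type {registrations} come from the text; the \type
{spotcolor} functions and the returned \PDF\ operators are hypothetical
placeholders that only show how a node injection can trigger a registration.

\starttyping
-- Sketch of the backends table layout. The three category names come
-- from the text; spotcolor and its behaviour are hypothetical.
local backends = {
  pdf = {
    codeinjections = {}, -- resources that don't affect the typeset result
    nodeinjections = {}, -- literal PDF code that ends up in the page stream
    registrations  = {}, -- resources needed by those node injections
  },
}

local pdfbackend = backends.pdf
local registered = {}

-- Remember that a spot color is used, so it can be embedded later.
function pdfbackend.registrations.spotcolor(name)
  registered[name] = true
end

-- Return the literal that would be injected; a real implementation would
-- create a node instead of returning a string.
function pdfbackend.nodeinjections.spotcolor(name, tint)
  pdfbackend.registrations.spotcolor(name)
  return string.format("/%s cs %.2f scn", name, tint)
end

local literal = pdfbackend.nodeinjections.spotcolor("SpotA", 0.5)
print(literal)
\stoptyping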

\stopsection

\startsection [title={Metadata}]

It has always been possible to set the information fields in a \PDF, but
standardization forces us to add large, verbose metadata blobs. As such a blob is
coded in \XML, we use the built-in \XML\ parser to fill a template. Thanks to
extensive testing and research by Peter Rolf we now have rather complete support
for \PDF/X related demands. This will definitely evolve with the advance of the
\PDF\ specification. You can replace the information with your own, but we
suggest that you stay away from this metadata mess as far as possible.

\stopsection

\startsection [title={Helpers}]

If you look into the \type {lpdf-*.lua} files you will find more
functions. Some are public helpers, like:

\starttabulate
\NC \type {lpdf.toeight(str)}   \NC returns \type {(string)} \NC \NR
%NC \type {lpdf.cleaned(str)}   \NC returns \type {escaped string} \NC \NR
\NC \type {lpdf.tosixteen(str)} \NC returns \type {<utf16 sequence>} \NC \NR
\stoptabulate
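The following sketch shows what such conversions amount to. These are
hypothetical re-implementations, not the \type {lpdf} code. \type {toeight} only
escapes backslashes and parentheses and assumes that its input needs no further
re-encoding to pdfdoc. \type {tosixteen} writes UTF-16BE as hexadecimal digits,
prefixed with the byte order mark \type {feff} that \PDF\ uses to recognize such
text strings; whether the real \type {lpdf.tosixteen} adds that mark in exactly
this way is an assumption here.

\starttyping
-- Hypothetical re-implementations for illustration, not the lpdf code.

-- Literal string: escape backslashes and parentheses; the input is assumed
-- to need no further re-encoding.
local function toeight(str)
  return "(" .. (str:gsub("[\\()]", "\\%0")) .. ")"
end

-- Hexadecimal UTF-16BE string with a byte order mark, built from UTF-8
-- input (requires Lua 5.3 or later for utf8 and the bitwise operators).
local function tosixteen(str)
  local hex = { "<feff" }
  for _, c in utf8.codes(str) do
    if c < 0x10000 then
      hex[#hex + 1] = string.format("%04x", c)
    else -- characters outside the BMP become a surrogate pair
      c = c - 0x10000
      hex[#hex + 1] = string.format("%04x%04x", 0xD800 + (c >> 10), 0xDC00 + (c & 0x3FF))
    end
  end
  hex[#hex + 1] = ">"
  return table.concat(hex)
end

print(toeight("test (1)"))       -- (test \(1\))
print(tosixteen("caf\u{E9}"))    -- <feff00630061006600e9>
\stoptyping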

An example of another public function is:

\starttyping
lpdf.sharedobj(content)
\stoptyping

This one flushes the object and returns the object number; when an object with
the same content was already defined, that one is reused instead. In addition to
this code driven optimization, some other optimization and reuse takes place, but
all that happens without user
intervention.
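The sharing idea can be sketched with a cache keyed by the serialized content.
This is a toy with hypothetical names, not the \type {lpdf} code. Because the key
is the exact string, content that is equivalent but formatted differently is not
shared.

\starttyping
-- Toy version (hypothetical, not lpdf) of shared objects: identical
-- content is flushed once and later requests get the same number.
local objects, shared = {}, {}
local last = 0

local function sharedobj(content)
  local n = shared[content]
  if not n then
    last = last + 1
    objects[last] = content
    shared[content] = last
    n = last
  end
  return n
end

local a = sharedobj("<< /Type /ExtGState /ca 0.5 >>")
local b = sharedobj("<< /Type /ExtGState /CA 0.5 >>")
local c = sharedobj("<< /Type /ExtGState /ca 0.5 >>") -- reuses a's number
print(a, b, c)
\stoptyping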

\stopsection

\stopchapter

\stopcomponent