doc/context/sources/general/manuals/xml/xml-mkiv-converter.tex


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300

\environment xml-mkiv-style

\startcomponent xml-mkiv-converter

\startchapter[title={Setting up a converter}]

\startsection[title={from structure to setup}]

We use a very simple document structure for demonstrating how a converter is
defined. In practice a mapping will be more complex, especially when we have a
style with complex chapter openings using data coming from all kind of places,
different styling of sections with the same name, selectively (out of order)
flushed content, special formatting, etc.

\typefile{manual-demo-1.xml}

Say that this document is stored in the file \type {demo.xml}, then the following
code can be used as starting point:

\starttyping
\startxmlsetups xml:demo:base
  \xmlsetsetup{#1}{document|section|p}{xml:demo:*}
\stopxmlsetups

\xmlregisterdocumentsetup{demo}{xml:demo:base}

\startxmlsetups xml:demo:document
  \starttitle[title={Contents}]
    \placelist[chapter]
  \stoptitle
  \xmlflush{#1}
\stopxmlsetups

\startxmlsetups xml:demo:section
  \startchapter[title=\xmlfirst{#1}{/title}]
    \xmlfirst{#1}{/content}
  \stopchapter
\stopxmlsetups

\startxmlsetups xml:demo:p
  \xmlflush{#1}\endgraf
\stopxmlsetups

\xmlprocessfile{demo}{demo.xml}{}
\stoptyping

Watch out! These are not just setups, but specific \XML\ setups which get an
argument passed (the \type {#1}). If for some reason your \XML\ processing fails,
it might be that you mistakenly have used a normal setup definition. The argument
\type {#1} represents the current node (element) and is a unique identifier. For
instance a \type {<p>..</p>} can have an identifier {demo::5}. So, we can get
something:

\starttyping
\xmlflush{demo::5}\endgraf
\stoptyping

but as well:

\starttyping
\xmlflush{demo::6}\endgraf
\stoptyping

Keep in mind that the references tor the actual nodes (elements) are
abstractions, you never see those \type {<id>::<number>}'s, because we will use
either the abstract \type {#1} (any node) or an explicit reference like \type
{demo}. The previous setup when issued will be like:

\starttyping
\startchapter[title=\xmlfirst{demo::3}{/title}]
  \xmlfirst{demo::4}{/content}
\stopchapter
\stoptyping

Here the \type {title} is used to typeset the chapter title but also for an entry
in the table of contents. At the moment the title is typeset the \XML\ node gets
looked up and expanded in real text. However, for the list it gets stored for
later use. One can argue that this is not needed for \XML, because one can just
filter all the titles and use page references, but then one also looses the
control one normally has over such titles. For instance it can be that some
titles are rendered differently and for that we need to keep track of usage.
Doing that with transformations or filtering is often more complex than leaving
that to \TEX. As soon as the list gets typeset, the reference (\type {demo::#3})
is used for the lookup. This is because by default the title is stored as given.
So, as long as we make sure the \XML\ source is loaded before the table of
contents is typeset we're ok. Later we will look into this in more detail, for
now it's enough to know that in most cases the abstract \type {#1} reference will
work out ok.

Contrary to the style definitions this interface looks rather low level (with no
optional arguments) and the main reason for this is that we want processing to be
fast. So, the basic framework is:

\starttyping
\startxmlsetups xml:demo:base
  % associate setups with elements
\stopxmlsetups

\xmlregisterdocumentsetup{demo}{xml:demo:base}

% define setups for matches

\xmlprocessfile{demo}{demo.xml}{}
\stoptyping

In this example we mostly just flush the content of an element and in the case of
a section we flush explicit child elements. The \type {#1} in the example code
represents the current element. The line:

\starttyping
\xmlsetsetup{demo}{*}{-}
\stoptyping

sets the default for each element to \quote {just ignore it}. A \type {+} would
make the default to always flush the content. This means that at this point we
only handle:

\starttyping
<section>
  <title>Some title</title>
  <content>
    <p>a paragraph of text</p>
  </content>
</section>
\stoptyping

In the next section we will deal with the slightly more complex itemize and
figure placement. At first sight all these setups may look overkill but keep in
mind that normally the number of elements is rather limited. The complexity is
often in the style and having access to each snippet of content is actually
quite handy for that.

\stopsection

\startsection[title={alternative solutions}]

Dealing with an itemize is rather simple (as long as we forget about
attributes that control the behaviour):

\starttyping
<itemize>
  <item>first</item>
  <item>second</item>
</itemize>
\stoptyping

First we need to add \type {itemize} to the setup assignment (unless we've used
the wildcard \type {*}):

\starttyping
\xmlsetsetup{demo}{document|section|p|itemize}{xml:demo:*}
\stoptyping

The setup can look like:

\starttyping
\startxmlsetups xml:demo:itemize
  \startitemize
    \xmlfilter{#1}{/item/command(xml:demo:itemize:item)}
  \stopitemize
\stopxmlsetups

\startxmlsetups xml:demo:itemize:item
  \startitem
    \xmlflush{#1}
  \stopitem
\stopxmlsetups
\stoptyping

An alternative is to map item directly:

\starttyping
\xmlsetsetup{demo}{document|section|p|itemize|item}{xml:demo:*}
\stoptyping

and use:

\starttyping
\startxmlsetups xml:demo:itemize
  \startitemize
    \xmlflush{#1}
  \stopitemize
\stopxmlsetups

\startxmlsetups xml:demo:item
  \startitem
    \xmlflush{#1}
  \stopitem
\stopxmlsetups
\stoptyping

Sometimes, a more local solution using filters and \type {/command(...)} makes more
sense, especially when the \type {item} tag is used for other purposes as well.

Explicit flushing with \type {command} is definitely the way to go when you have
complex products. In one of our projects we compose math school books from many
thousands of small \XML\ files, and from one source set several products are
typeset. Within a book sections get done differently, content gets used, ignored
or interpreted differently depending on the kind of content, so there is a
constant checking of attributes that drive the rendering. In that a generic setup
for a title element makes less sense than explicit ones for each case. (We're
talking of huge amounts of files here, including multiple images on each rendered
page.)

When using \type {command} you can pass two arguments, the first is the setup for
the match, the second one for the miss, as in:

\starttyping
\xmlfilter{#1}{/element/command(xml:true,xml:false)}
\stoptyping

Back to the example, this leaves us with dealing with the resources, like
figures:

\starttyping
<resource type='figure'>
  <caption>A picture of a cow.</caption>
  <content><external file="cow.pdf"/></content>
</resource>
\stoptyping

Here we can use a more restricted match:

\starttyping
\xmlsetsetup{demo}{resource[@type='figure']}{xml:demo:figure}
\xmlsetsetup{demo}{external}{xml:demo:*}
\stoptyping

and the definitions:

\starttyping
\startxmlsetups xml:demo:figure
  \placefigure
    {\xmlfirst{#1}{/caption}}
    {\xmlfirst{#1}{/content}}
\stopxmlsetups

\startxmlsetups xml:demo:external
  \externalfigure[\xmlatt{#1}{file}]
\stopxmlsetups
\stoptyping

At this point it is good to notice that \type {\xmlatt{#1}{file}} is passed as it
is: a macro call. This means that when a macro like \type {\externalfigure} uses
the first argument frequently without first storing its value, the lookup is done
several times. A solution for this is:

\starttyping
\startxmlsetups xml:demo:external
  \expanded{\externalfigure[\xmlatt{#1}{file}]}
\stopxmlsetups
\stoptyping

Because the lookup is rather fast, normally there is no need to bother about this
too much because internally \CONTEXT\ already makes sure such expansion happens
only once.

An alternative definition for placement is the following:

\starttyping
\xmlsetsetup{demo}{resource}{xml:demo:resource}
\stoptyping

with:

\starttyping
\startxmlsetups xml:demo:resource
  \placefloat
    [\xmlatt{#1}{type}]
    {\xmlfirst{#1}{/caption}}
    {\xmlfirst{#1}{/content}}
\stopxmlsetups
\stoptyping

This way you can specify \type {table} as type too. Because you can define your
own float types, more complex variants are also possible. In that case it makes
sense to provide some default behaviour too:

\starttyping
\definefloat[figure-here][figure][default=here]
\definefloat[figure-left][figure][default=left]
\definefloat[table-here] [table] [default=here]
\definefloat[table-left] [table] [default=left]

\startxmlsetups xml:demo:resource
  \placefloat
    [\xmlattdef{#1}{type}{figure}-\xmlattdef{#1}{location}{here}]
    {\xmlfirst{#1}{/caption}}
    {\xmlfirst{#1}{/content}}
\stopxmlsetups
\stoptyping

In this example we support two types and two locations. We default to a figure
placed (when possible) at the current location.

\stopsection

\stopchapter

\stopcomponent