summaryrefslogtreecommitdiff
path: root/doc/context/sources/general/manuals/xml/xml-mkiv-tricks.tex
diff options
context:
space:
mode:
Diffstat (limited to 'doc/context/sources/general/manuals/xml/xml-mkiv-tricks.tex')
-rw-r--r--doc/context/sources/general/manuals/xml/xml-mkiv-tricks.tex814
1 files changed, 814 insertions, 0 deletions
diff --git a/doc/context/sources/general/manuals/xml/xml-mkiv-tricks.tex b/doc/context/sources/general/manuals/xml/xml-mkiv-tricks.tex
new file mode 100644
index 000000000..f8c65ecc9
--- /dev/null
+++ b/doc/context/sources/general/manuals/xml/xml-mkiv-tricks.tex
@@ -0,0 +1,814 @@
+\environment xml-mkiv-style
+
+\startcomponent xml-mkiv-tricks
+
+\startchapter[title={Tips and tricks}]
+
+\startsection[title={tracing}]
+
+It can be hard to debug code as much happens kind of behind the screens.
+Therefore we have a couple of tracing options. Of course you can typeset some
+status information, using for instance:
+
+\startxmlcmd {\cmdbasicsetup{xmlshow}}
+ typeset the tree given by \cmdinternal {cd:node}
+\stopxmlcmd
+
+\startxmlcmd {\cmdbasicsetup{xmlinfo}}
+ typeset the name in the element given by \cmdinternal {cd:node}
+\stopxmlcmd
+
+\startxmlcmd {\cmdbasicsetup{xmlpath}}
+ returns the complete path (including namespace prefix and index) of the
+ given \cmdinternal {cd:node}
+\stopxmlcmd
+
+\startbuffer[demo]
+<?xml version "1.0"?>
+<document>
+ <section>
+ <content>
+ <p>first</p>
+ <p><b>second</b></p>
+ </content>
+ </section>
+ <section>
+ <content>
+ <p><b>third</b></p>
+ <p>fourth</p>
+ </content>
+ </section>
+</document>
+\stopbuffer
+
+Say that we have the following \XML:
+
+\typebuffer[demo]
+
+and the next definitions:
+
+\startbuffer
+\startxmlsetups xml:demo:base
+ \xmlsetsetup{#1}{p|b}{xml:demo:*}
+\stopxmlsetups
+
+\startxmlsetups xml:demo:p
+ \xmlflush{#1}
+ \par
+\stopxmlsetups
+
+\startxmlsetups xml:demo:b
+ \par
+ \xmlpath{#1} : \xmlflush{#1}
+ \par
+\stopxmlsetups
+
+\xmlregisterdocumentsetup{example-10}{xml:demo:base}
+
+\xmlprocessbuffer{example-10}{demo}{}
+\stopbuffer
+
+\typebuffer
+
+This will give us:
+
+\blank \startpacked \getbuffer \stoppacked \blank
+
+If you use \type {\xmlshow} you will get a complete subtree which can
+be handy for tracing but can also lead to large documents.
+
+We also have a bunch of trackers that can be enabled, like:
+
+\starttyping
+\enabletrackers[xml.show,xml.parse]
+\stoptyping
+
+The full list (currently) is:
+
+\starttabulate[|lT|p|]
+\NC xml.entities \NC show what entities are seen and replaced \NC \NR
+\NC xml.path \NC show the result of parsing an lpath expression \NC \NR
+\NC xml.parse \NC show stepwise resolving of expressions \NC \NR
+\NC xml.profile \NC report all parsed lpath expressions (in the log) \NC \NR
+\NC xml.remap \NC show what namespaces are remapped \NC \NR
+\NC lxml.access \NC report errors with respect to resolving (symbolic) nodes \NC \NR
+\NC lxml.comments \NC show the comments that are encountered (if at all) \NC \NR
+\NC lxml.loading \NC show what files are loaded and converted \NC \NR
+\NC lxml.setups \NC show what setups are being associated to elements \NC \NR
+\stoptabulate
+
+In one of our workflows we produce books from \XML\ where the (educational)
+content is organized in many small files. Each book has about 5~chapters and each
+chapter is made of sections that contain text, exercises, resources, etc.\ and so
+the document is assembled from thousands of files (don't worry, runtime inclusion
+is pretty fast). In order to see where in the sources content resides we can
+trace the filename.
+
+\startxmlcmd {\cmdbasicsetup{xmlinclusion}}
+ returns the file where the node comes from
+\stopxmlcmd
+
+\startxmlcmd {\cmdbasicsetup{xmlinclusions}}
+ returns the list of files where the node comes from
+\stopxmlcmd
+
+\startxmlcmd {\cmdbasicsetup{xmlbadinclusions}}
+ returns a list of files that were not included due to some problem
+\stopxmlcmd
+
+Of course you have to make sure that these names end up somewhere visible, for
+instance in the margin.
+
+\stopsection
+
+\startsection[title={expansion}]
+
+For novice users the concept of expansion might sound frightening and to some
+extend it is. However, it is important enough to spend some words on it here.
+
+It is good to realize that most setups are sort of immediate. When one setup is
+issued, it can call another one and so on. Normally you won't notice that but
+there are cases where that can be a problem. In \TEX\ you can define a macro,
+take for instance:
+
+\starttyping
+\startxmlsetups xml:foo
+ \def\foobar{\xmlfirst{#1}{/bar}}
+\stopxmlsetups
+\stoptyping
+
+you store the reference top node \type {bar} in \type {\foobar} maybe for later use. In
+this case the content is not yet fetched, it will be done when \type {\foobar} is
+called.
+
+\starttyping
+\startxmlsetups xml:foo
+ \edef\foobar{\xmlfirst{#1}{/bar}}
+\stopxmlsetups
+\stoptyping
+
+Here the content of \type {bar} becomes the body of the macro. But what if
+\type {bar} itself contains elements that also contain elements. When there
+is a setup for \type {bar} it will be triggered and so on.
+
+When that setup looks like:
+
+\starttyping
+\startxmlsetups xml:bar
+ \def\barfoo{\xmlflush{#1}}
+\stopxmlsetups
+\stoptyping
+
+Here we get something like:
+
+\starttyping
+\foobar => {\def\barfoo{...}}
+\stoptyping
+
+When \type {\barfoo} is not defined we get an error and when it is known and expands
+to something weird we might also get an error.
+
+Especially when you don't know what content can show up, this can result in errors
+when an expansion fails, for example because some macro being used is not defined.
+To prevent this we can define a macro:
+
+\starttyping
+\starttexdefinition unexpanded xml:bar:macro #1
+ \def\barfoo{\xmlflush{#1}}
+\stoptexdefinition
+
+\startxmlsetups xml:bar
+ \texdefinition{xml:bar:macro}{#1}
+\stopxmlsetups
+\stoptyping
+
+The setup \type {xml:bar} will still expand but the replacement text now is just the
+call to the macro, think of:
+
+\starttyping
+\foobar => {\texdefinition{xml:bar:macro}{#1}}
+\stoptyping
+
+But this is often not needed, most \CONTEXT\ commands can handle the expansions
+quite well but it's good to know that there is a way out. So, now to some
+examples. Imagine that we have an \XML\ file that looks as follows:
+
+\starttyping
+<?xml version='1.0' ?>
+<demo>
+ <chapter>
+ <title>Some <em>short</em> title</title>
+ <content>
+ zeta
+ <index>
+ <key>zeta</key>
+ <content>zeta again</content>
+ </index>
+ alpha
+ <index>
+ <key>alpha</key>
+ <content>alpha <em>again</em></content>
+ </index>
+ gamma
+ <index>
+ <key>gamma</key>
+ <content>gamma</content>
+ </index>
+ beta
+ <index>
+ <key>beta</key>
+ <content>beta</content>
+ </index>
+ delta
+ <index>
+ <key>delta</key>
+ <content>delta</content>
+ </index>
+ done!
+ </content>
+ </chapter>
+</demo>
+\stoptyping
+
+There are a few structure related elements here: a chapter (with its list entry)
+and some index entries. Both are multipass related and therefore travel around.
+This means that when we let data end up in the auxiliary file, we need to make
+sure that we end up with either expanded data (i.e.\ no references to the \XML\
+tree) or with robust forward and backward references to elements in the tree.
+
+Here we discuss three approaches (and more may show up later): pushing \XML\ into
+the auxiliary file and using references to elements either or not with an
+associated setup. We control the variants with a switch.
+
+\starttyping
+\newcount\TestMode
+
+\TestMode=0 % expansion=xml
+\TestMode=1 % expansion=yes, index, setup
+\TestMode=2 % expansion=yes
+\stoptyping
+
+We apply a couple of setups:
+
+\starttyping
+\startxmlsetups xml:mysetups
+ \xmlsetsetup{\xmldocument}{demo|index|content|chapter|title|em}{xml:*}
+\stopxmlsetups
+
+\xmlregistersetup{xml:mysetups}
+\stoptyping
+
+The main document is processed with:
+
+\starttyping
+\startxmlsetups xml:demo
+ \xmlflush{#1}
+ \subject{contents}
+ \placelist[chapter][criterium=all]
+ \subject{index}
+ \placeregister[index][criterium=all]
+ \page % else buffer is forgotten when placing header
+\stopxmlsetups
+\stoptyping
+
+First we show three alternative ways to deal with the chapter. The first case
+expands the \XML\ reference so that we have an \XML\ stream in the auxiliary
+file. This stream is processed as a small independent subfile when needed. The
+second case registers a reference to the current element (\type {#1}). This means
+that we have access to all data of this element, like attributes, title and
+content. What happens depends on the given setup. The third variant does the same
+but here the setup is part of the reference.
+
+\starttyping
+\startxmlsetups xml:chapter
+ \ifcase \TestMode
+ % xml code travels around
+ \setuphead[chapter][expansion=xml]
+ \startchapter[title=eh: \xmltext{#1}{title}]
+ \xmlfirst{#1}{content}
+ \stopchapter
+ \or
+ % index is used for access via setup
+ \setuphead[chapter][expansion=yes,xmlsetup=xml:title:flush]
+ \startchapter[title=\xmlgetindex{#1}]
+ \xmlfirst{#1}{content}
+ \stopchapter
+ \or
+ % tex call to xml using index is used
+ \setuphead[chapter][expansion=yes]
+ \startchapter[title=hm: \xmlreference{#1}{xml:title:flush}]
+ \xmlfirst{#1}{content}
+ \stopchapter
+ \fi
+\stopxmlsetups
+
+\startxmlsetups xml:title:flush
+ \xmltext{#1}{title}
+\stopxmlsetups
+\stoptyping
+
+We need to deal with emphasis and the content of the chapter.
+
+\starttyping
+\startxmlsetups xml:em
+ \begingroup\em\xmlflush{#1}\endgroup
+\stopxmlsetups
+
+\startxmlsetups xml:content
+ \xmlflush{#1}
+\stopxmlsetups
+\stoptyping
+
+A similar approach is followed with the index entries. Watch how we use the
+numbered entries variant (in this case we could also have used just \type
+{entries} and \type {keys}).
+
+\starttyping
+\startxmlsetups xml:index
+ \ifcase \TestMode
+ \setupregister[index][expansion=xml,xmlsetup=]
+ \setstructurepageregister
+ [index]
+ [entries:1=\xmlfirst{#1}{content},
+ keys:1=\xmltext{#1}{key}]
+ \or
+ \setupregister[index][expansion=yes,xmlsetup=xml:index:flush]
+ \setstructurepageregister
+ [index]
+ [entries:1=\xmlgetindex{#1},
+ keys:1=\xmltext{#1}{key}]
+ \or
+ \setupregister[index][expansion=yes,xmlsetup=]
+ \setstructurepageregister
+ [index]
+ [entries:1=\xmlreference{#1}{xml:index:flush},
+ keys:1=\xmltext{#1}{key}]
+ \fi
+\stopxmlsetups
+
+\startxmlsetups xml:index:flush
+ \xmlfirst{#1}{content}
+\stopxmlsetups
+\stoptyping
+
+Instead of this flush, you can use the predefined setup \type {xml:flush}
+unless it is overloaded by you.
+
+The file is processed by:
+
+\starttyping
+\starttext
+ \xmlprocessfile{main}{test.xml}{}
+\stoptext
+\stoptyping
+
+We don't show the result here. If you're curious what the output is, you can test
+it yourself. In that case it also makes sense to peek into the \type {test.tuc}
+file to see how the information travels around. The \type {metadata} fields carry
+information about how to process the data.
+
+The first case, the \XML\ expansion one, is somewhat special in the sense that
+internally we use small pseudo files. You can control the rendering by tweaking
+the following setups:
+
+\starttyping
+\startxmlsetups xml:ctx:sectionentry
+ \xmlflush{#1}
+\stopxmlsetups
+
+\startxmlsetups xml:ctx:registerentry
+ \xmlflush{#1}
+\stopxmlsetups
+\stoptyping
+
+{\em When these methods work out okay the other structural elements will be
+dealt with in a similar way.}
+
+\stopsection
+
+\startsection[title={special cases}]
+
+Normally the content will be flushed under a special (so called) catcode regime.
+This means that characters that have a special meaning in \TEX\ will have no such
+meaning in an \XML\ file. If you want content to be treated as \TEX\ code, you can
+use one of the following:
+
+\startxmlcmd {\cmdbasicsetup{xmlflushcontext}}
+ flush the given \cmdinternal {cd:node} using the \TEX\ character
+ interpretation scheme
+\stopxmlcmd
+
+\startxmlcmd {\cmdbasicsetup{xmlcontext}}
+ flush the match of \cmdinternal {cd:lpath} for the given \cmdinternal
+ {cd:node} using the \TEX\ character interpretation scheme
+\stopxmlcmd
+
+We use this in cases like:
+
+\starttyping
+....
+ \xmlsetsetup {#1} {
+ tm|texformula|
+ } {xml:*}
+....
+
+\startxmlsetups xml:tm
+ \mathematics{\xmlflushcontext{#1}}
+\stopxmlsetups
+
+\startxmlsetups xml:texformula
+ \placeformula\startformula\xmlflushcontext{#1}\stopformula
+\stopxmlsetups
+\stoptyping
+
+\stopsection
+
+\startsection[title={collecting}]
+
+Say that your document has
+
+\starttyping
+<table>
+ <tr>
+ <td>foo</td>
+ <td>bar<td>
+ </tr>
+</table>
+\stoptyping
+
+And that you need to convert that to \TEX\ speak like:
+
+\starttyping
+\bTABLE
+ \bTR
+ \bTD foo \eTD
+ \bTD bar \eTD
+ \eTR
+\eTABLE
+\stoptyping
+
+A simple mapping is:
+
+\starttyping
+\startxmlsetups xml:table
+ \bTABLE \xmlflush{#1} \eTABLE
+\stopxmlsetups
+\startxmlsetups xml:tr
+ \bTR \xmlflush{#1} \eTR
+\stopxmlsetups
+\startxmlsetups xml:td
+ \bTD \xmlflush{#1} \eTD
+\stopxmlsetups
+\stoptyping
+
+The \type {\bTD} command is a so called delimited command which means that it
+picks up its argument by looking for an \type {\eTD}. For the simple case here
+this works quite well because the flush is inside the pair. This is not the case
+in the following variant:
+
+\starttyping
+\startxmlsetups xml:td:start
+ \bTD
+\stopxmlsetups
+\startxmlsetups xml:td:stop
+ \eTD
+\stopxmlsetups
+\startxmlsetups xml:td
+ \xmlsetup{#1}{xml:td:start}
+ \xmlflush{#1}
+ \xmlsetup{#1}{xml:td:stop}
+\stopxmlsetups
+\stoptyping
+
+When for some reason \TEX\ gets confused you can revert to a mechanism that
+collects content.
+
+\starttyping
+\startxmlsetups xml:td:start
+ \startcollect
+ \bTD
+ \stopcollect
+\stopxmlsetups
+\startxmlsetups xml:td:stop
+ \startcollect
+ \eTD
+ \stopcollect
+\stopxmlsetups
+\startxmlsetups xml:td
+ \startcollecting
+ \xmlsetup{#1}{xml:td:start}
+ \xmlflush{#1}
+ \xmlsetup{#1}{xml:td:stop}
+ \stopcollecting
+\stopxmlsetups
+\stoptyping
+
+You can even implement solutions that effectively do this:
+
+\starttyping
+\startcollecting
+ \startcollect \bTABLE \stopcollect
+ \startcollect \bTR \stopcollect
+ \startcollect \bTD \stopcollect
+ \startcollect foo\stopcollect
+ \startcollect \eTD \stopcollect
+ \startcollect \bTD \stopcollect
+ \startcollect bar\stopcollect
+ \startcollect \eTD \stopcollect
+ \startcollect \eTR \stopcollect
+ \startcollect \eTABLE \stopcollect
+\stopcollecting
+\stoptyping
+
+Of course you only need to go that complex when the situation demands it. Here is
+another weird one:
+
+\starttyping
+\startcollecting
+ \startcollect \setupsomething[\stopcollect
+ \startcollect foo=\stopcollect
+ \startcollect FOO,\stopcollect
+ \startcollect bar=\stopcollect
+ \startcollect BAR,\stopcollect
+ \startcollect ]\stopcollect
+\stopcollecting
+\stoptyping
+
+\stopsection
+
+\startsection[title={selectors and injectors}]
+
+This section describes a bit special feature, one that we needed for a project
+where we could not touch the original content but could add specific sections for
+our own purpose. Hopefully the example demonstrates its useability.
+
+\enabletrackers[lxml.selectors]
+
+\startbuffer[foo]
+<?xml version="1.0" encoding="UTF-8"?>
+
+<?context-directive message info 1: this is a demo file ?>
+<?context-message-directive info 2: this is a demo file ?>
+
+<one>
+ <two>
+ <?context-select begin t1 t2 t3 ?>
+ <three>
+ t1 t2 t3
+ <?context-directive injector crlf t1 ?>
+ t1 t2 t3
+ </three>
+ <?context-select end ?>
+ <?context-select begin t4 ?>
+ <four>
+ t4
+ </four>
+ <?context-select end ?>
+ <?context-select begin t8 ?>
+ <four>
+ t8.0
+ t8.0
+ </four>
+ <?context-select end ?>
+ <?context-include begin t4 ?>
+ <!--
+ <three>
+ t4.t3
+ <?context-directive injector crlf t1 ?>
+ t4.t3
+ </three>
+ -->
+ <three>
+ t3
+ <?context-directive injector crlf t1 ?>
+ t3
+ </three>
+ <?context-include end ?>
+ <?context-select begin t8 ?>
+ <four>
+ t8.1
+ t8.1
+ </four>
+ <?context-select end ?>
+ <?context-select begin t8 ?>
+ <four>
+ t8.2
+ t8.2
+ </four>
+ <?context-select end ?>
+ <?context-select begin t4 ?>
+ <four>
+ t4
+ t4
+ </four>
+ <?context-select end ?>
+ <?context-directive injector page t7 t8 ?>
+ foo
+ <?context-directive injector blank t1 ?>
+ bar
+ <?context-directive injector page t7 t8 ?>
+ bar
+ </two>
+</one>
+\stopbuffer
+
+\typebuffer[foo]
+
+First we show how to plug in a directive. Processing instructions like the
+following are normally ignored by an \XML\ processor, unless they make sense
+to it.
+
+\starttyping
+<?context-directive message info 1: this is a demo file ?>
+<?context-message-directive info 2: this is a demo file ?>
+\stoptyping
+
+We can define a message handler as follows:
+
+\startbuffer
+\def\MyMessage#1#2#3{\writestatus{#1}{#2 #3}}
+
+\xmlinstalldirective{message}{MyMessage}
+\stopbuffer
+
+\typebuffer \getbuffer
+
+When this file is processed you will see this on the console:
+
+\starttyping
+info > 1: this is a demo file
+info > 2: this is a demo file
+\stoptyping
+
+The file has some sections that can be used or ignored. The recipe for
+obeying \type {t1} and \type {t4} is the following:
+
+\startbuffer
+\xmlsetinjectors[t1]
+\xmlsetinjectors[t4]
+
+\startxmlsetups xml:initialize
+ \xmlapplyselectors{#1}
+ \xmlsetsetup {#1} {
+ one|two|three|four
+ } {xml:*}
+\stopxmlsetups
+
+\xmlregistersetup{xml:initialize}
+
+\startxmlsetups xml:one
+ [ONE \xmlflush{#1} ONE]
+\stopxmlsetups
+
+\startxmlsetups xml:two
+ [TWO \xmlflush{#1} TWO]
+\stopxmlsetups
+
+\startxmlsetups xml:three
+ [THREE \xmlflush{#1} THREE]
+\stopxmlsetups
+
+\startxmlsetups xml:four
+ [FOUR \xmlflush{#1} FOUR]
+\stopxmlsetups
+\stopbuffer
+
+\typebuffer \getbuffer
+
+This typesets:
+
+\startnarrower
+\xmlprocessbuffer{main}{foo}{}
+\stopnarrower
+
+The include coding is kind of special: it permits adding content (in a comment)
+and ignoring the rest so that we indeed can add something without interfering
+with the original. Of course in a normal workflow such messy solutions are
+not needed, but alas, often workflows are not that clean, especially when one
+has no real control over the source.
+
+\startxmlcmd {\cmdbasicsetup{xmlsetinjectors}}
+ enables a list of injectors that will be used
+\stopxmlcmd
+
+\startxmlcmd {\cmdbasicsetup{xmlresetinjectors}}
+ resets the list of injectors
+\stopxmlcmd
+
+\startxmlcmd {\cmdbasicsetup{xmlinjector}}
+ expands an injection (command); normally this one is only used
+ (in some setup) or for testing
+\stopxmlcmd
+
+\startxmlcmd {\cmdbasicsetup{xmlapplyselectors}}
+ analyze the tree \cmdinternal {cd:node} for marked sections that
+ will be injected
+\stopxmlcmd
+
+We have some injections predefined:
+
+\starttyping
+\startsetups xml:directive:injector:page
+ \page
+\stopsetups
+
+\startsetups xml:directive:injector:column
+ \column
+\stopsetups
+
+\startsetups xml:directive:injector:blank
+ \blank
+\stopsetups
+\stoptyping
+
+In the example we see:
+
+\starttyping
+<?context-directive injector page t7 t8 ?>
+\stoptyping
+
+When we set \type {\xmlsetinjector[t7]} a pagebreak will injected in that spot.
+Tags like \type {t7}, \type {t8} etc.\ can represent versions.
+
+\stopsection
+
+\startsection[title=preprocessing]
+
+% local match = lpeg.match
+% local replacer = lpeg.replacer("BAD TITLE:","<bold>BAD TITLE:</bold>")
+%
+% function lxml.preprocessor(data,settings)
+% return match(replacer,data)
+% end
+
+\startbuffer[pre-code]
+\startluacode
+ function lxml.preprocessor(data,settings)
+ return string.find(data,"BAD TITLE:")
+ and string.gsub(data,"BAD TITLE:","<bold>BAD TITLE:</bold>")
+ or data
+ end
+\stopluacode
+\stopbuffer
+
+\startbuffer[pre-xml]
+\startxmlsetups pre:demo:initialize
+ \xmlsetsetup{#1}{*}{pre:demo:*}
+\stopxmlsetups
+
+\xmlregisterdocumentsetup{pre:demo}{pre:demo:initialize}
+
+\startxmlsetups pre:demo:root
+ \xmlflush{#1}
+\stopxmlsetups
+
+\startxmlsetups pre:demo:bold
+ \begingroup\bf\xmlflush{#1}\endgroup
+\stopxmlsetups
+
+\starttext
+ \xmlprocessbuffer{pre:demo}{demo}{}
+\stoptext
+\stopbuffer
+
+Say that you have the following \XML\ setup:
+
+\typebuffer[pre-xml]
+
+and that (such things happen) the input looks like this:
+
+\startbuffer[demo]
+<root>
+BAD TITLE: crap crap crap ...
+
+BAD TITLE: crap crap crap ...
+</root>
+\stopbuffer
+
+\typebuffer[demo]
+
+You can then clean up these \type {BAD TITLE}'s as follows:
+
+\typebuffer[pre-code]
+
+and get as result:
+
+\start \getbuffer[pre-code,pre-xml] \stop
+
+The preprocessor function gets as second argument the current settings, an d
+the field \type {currentresource} can be used to limit the actions to
+specific resources, in our case it's \type {buffer: demo}. Afterwards you can
+reset the proprocessor with:
+
+\startluacode
+lxml.preprocessor = nil
+\stopluacode
+
+Future versions might give some more control over preprocessors. For now consider
+it to be a quick hack.
+
+\stopsection
+
+\stopchapter
+
+\stopcomponent