Diffstat (limited to 'doc/context/sources/general/manuals/hybrid/hybrid-backend.tex')
-rw-r--r-- | doc/context/sources/general/manuals/hybrid/hybrid-backend.tex | 389 |
1 files changed, 389 insertions, 0 deletions
diff --git a/doc/context/sources/general/manuals/hybrid/hybrid-backend.tex b/doc/context/sources/general/manuals/hybrid/hybrid-backend.tex new file mode 100644 index 000000000..4b6055151 --- /dev/null +++ b/doc/context/sources/general/manuals/hybrid/hybrid-backend.tex @@ -0,0 +1,389 @@ +% language=uk + +\startcomponent hybrid-backends + +\environment hybrid-environment + +\startchapter[title={Backend code}] + +\startsection [title={Introduction}] + +In \CONTEXT\ we've always separated the backend code into so-called driver files. +This means that in the code related to typesetting only calls to the \API\ take +place, and no backend specific code is to be used. That way we can support +backends like dvipsone (and dviwindo), dvips, acrobat, pdftex and dvipdfmx with +one interface. A similar model is used in \MKIV\ although at the moment we only +have one backend: \PDF. \footnote {At this moment we only support the native +\PDF\ backend but future versions might support \XML\ (\HTML) output as well.} + +Some \CONTEXT\ users like to add their own \PDF\ specific code to their styles or +modules. However, such extensions can interfere with existing code, especially +when resources are involved. Therefore this has to be done via the official helper +macros. + +In the next sections an overview will be given of the current approach. There are +still quite some rough edges but these will be polished as soon as the backend +code is more isolated in \LUATEX\ itself. + +\stopsection + +\startsection [title={Structure}] + +A \PDF\ file is a tree of indirect objects. Each object has a number and the file +contains a table (or multiple tables) that relates these numbers to positions in +the file (or positions in a compressed object stream). That way a file can be viewed +without reading all data: a viewer only loads what is needed.
+ +\starttyping +1 0 obj << + /Name (test) /Address 2 0 R +>> +2 0 obj [ + (Main Street) (24) (postal code) (MyPlace) +] +\stoptyping + +For the sake of the discussion we consider strings like \type {(test)} also to be +objects. In the next table we list what we can encounter in a \PDF\ file. There +can be indirect objects in which case a reference is used (\type{2 0 R}) and +direct ones. + +\starttabulate[|l|l|p|] +\FL +\NC \bf type \NC \bf form \NC \bf meaning \NC \NR +\TL +\NC constant \NC \type{/...} \NC A symbol (prescribed string). \NC \NR +\NC string \NC \type{(...)} \NC A sequence of characters in pdfdoc encoding. \NC \NR +\NC unicode \NC \type{<...>} \NC A sequence of characters in utf16 encoding. \NC \NR +\NC number \NC \type{3.1415} \NC A number constant. \NC \NR +\NC boolean \NC \type{true/false} \NC A boolean constant. \NC \NR +\NC reference \NC \type{N 0 R} \NC A reference to an object. \NC \NR +\NC dictionary \NC \type{<< ... >>} \NC A collection of key value pairs where the + value itself is an (indirect) object. \NC \NR +\NC array \NC \type{[ ... ]} \NC A list of objects or references to objects. \NC \NR +\NC stream \NC \NC A sequence of bytes, optionally packaged with a dictionary + that contains descriptive data. \NC \NR +\NC xform \NC \NC A special kind of object containing a reusable blob of data, + for example an image. \NC \NR +\LL +\stoptabulate + +While writing additional backend code, we mostly create dictionaries. + +\starttyping +<< /Name (test) /Address 2 0 R >> +\stoptyping + +In this case the indirect object can look like: + +\starttyping +[ (Main Street) (24) (postal code) (MyPlace) ] +\stoptyping + +It all starts in the document's root object. From there we access the page tree +and resources. Each page carries its own resource information which makes random +access easier. A page has a page stream and there we find the content to be +rendered as a mixture of (\UNICODE) strings and special drawing and rendering +operators.
Here we will not discuss them as they are mostly generated by the +engine itself or dedicated subsystems like the \METAPOST\ converter. There we use +literal or \type {\latelua} whatsits to inject code into the current stream. + +In the \CONTEXT\ \MKII\ backend driver code you will see objects in their +verbose form. The content is passed on using special primitives, like \type +{\pdfobj}, \type{\pdfannot}, \type {\pdfcatalog}, etc. In \MKIV\ no such +primitives are used. In fact, some of them are overloaded to do nothing at all. +In the \LUA\ backend code you will find function calls like: + +\starttyping +local d = lpdf.dictionary { + Name = lpdf.string("test"), + Address = lpdf.array { + "Main Street", "24", "postal code", "MyPlace", + } +} +\stoptyping + +Equally valid is: + +\starttyping +local d = lpdf.dictionary() +d.Name = "test" +\stoptyping + +Eventually the object will end up in the file using calls like: + +\starttyping +local r = pdf.immediateobj(tostring(d)) +\stoptyping + +or using the wrapper (which permits tracing): + +\starttyping +local r = lpdf.flushobject(d) +\stoptyping + +The object content will be serialized according to the formal specification so +the proper \type {<< >>} etc.\ are added. If you want the content instead you can +use a function call: + +\starttyping +local dict = d() +\stoptyping + +An example of using references is: + +\starttyping +local a = lpdf.array { + "Main Street", "24", "postal code", "MyPlace", +} +local d = lpdf.dictionary { + Name = lpdf.string("test"), + Address = lpdf.reference(a), +} +local r = lpdf.flushobject(d) +\stoptyping + +\stopsection + +We have the following creators. Their arguments are optional.
+ +\starttabulate[|l|p|] +\FL +\NC \bf function \NC \bf optional parameter \NC \NR +\TL +%NC \type{lpdf.stream} \NC indexed table of operators \NC \NR +\NC \type{lpdf.dictionary} \NC hash with key/values \NC \NR +\NC \type{lpdf.array} \NC indexed table of objects \NC \NR +\NC \type{lpdf.unicode} \NC string \NC \NR +\NC \type{lpdf.string} \NC string \NC \NR +\NC \type{lpdf.number} \NC number \NC \NR +\NC \type{lpdf.constant} \NC string \NC \NR +\NC \type{lpdf.null} \NC \NC \NR +\NC \type{lpdf.boolean} \NC boolean \NC \NR +%NC \type{lpdf.true} \NC \NC \NR +%NC \type{lpdf.false} \NC \NC \NR +\NC \type{lpdf.reference} \NC string \NC \NR +\NC \type{lpdf.verbose} \NC indexed table of strings \NC \NR +\LL +\stoptabulate + +Flushing objects is done with: + +\starttyping +lpdf.flushobject(obj) +\stoptyping + +Reserving an object is of course possible and done with: + +\starttyping +local r = lpdf.reserveobject() +\stoptyping + +Such an object is flushed with: + +\starttyping +lpdf.flushobject(r,obj) +\stoptyping + +We also support named objects: + +\starttyping +lpdf.reserveobject("myobject") + +lpdf.flushobject("myobject",obj) +\stoptyping + +\startsection [title={Resources}] + +While \LUATEX\ itself will embed all resources related to regular typesetting, +\MKIV\ has to take care of embedding those related to special tricks, like +annotations, spot colors, layers, shades, transparencies, metadata, etc. If you +ever took a look in the \MKII\ \type {spec-*} files you might have gotten the +impression that it quickly becomes messy. The code there is actually rather old +and evolved in sync with the \PDF\ format as well as \PDFTEX\ and \DVIPDFMX\ +maturing to their current state. As a result we have a dedicated object +referencing model that sometimes results in multiple passes due to forward +references.
We could have gotten away from that with the latest versions of +\PDFTEX\ as it provides means to reserve object numbers but it does not make much +sense to do that now that \MKII\ is frozen. + +Third party modules (like tikz) can also add resources. As in \MKII, this is +done using an \API\ that makes sure that no interference takes place. Think of +macros like: + +\starttyping +\pdfbackendsetcatalog {key}{string} +\pdfbackendsetinfo {key}{string} +\pdfbackendsetname {key}{string} + +\pdfbackendsetpageattribute {key}{string} +\pdfbackendsetpagesattribute{key}{string} +\pdfbackendsetpageresource {key}{string} + +\pdfbackendsetextgstate {key}{pdfdata} +\pdfbackendsetcolorspace {key}{pdfdata} +\pdfbackendsetpattern {key}{pdfdata} +\pdfbackendsetshade {key}{pdfdata} +\stoptyping + +One is free to use the \LUA\ interface instead, as there one has more +possibilities. The names are similar, like: + +\starttyping +lpdf.addtoinfo(key,anything_valid_pdf) +\stoptyping + +At the time of this writing (\LUATEX\ .50) there are still places where \TEX\ and +\LUA\ code is interwoven in a non optimal way, but that will change in the future +as the backend is completely separated and we can do more \TEX\ trickery at the +\LUA\ end. + +Also, currently we expose more of the backend code than we like and future +versions will have more restricted access.
The following functions will stay +public: + +\starttyping +lpdf.addtopageresources (key,value) +lpdf.addtopageattributes (key,value) +lpdf.addtopagesattributes(key,value) + +lpdf.adddocumentextgstate(key,value) +lpdf.adddocumentcolorspace(key,value) +lpdf.adddocumentpattern (key,value) +lpdf.adddocumentshade (key,value) + +lpdf.addtocatalog (key,value) +lpdf.addtoinfo (key,value) +lpdf.addtonames (key,value) +\stoptyping + +There are several tracing options built in and some more will be added in due +time: + +\starttyping +\enabletrackers + [backend.finalizers, + backend.resources, + backend.objects, + backend.detail] +\stoptyping + +As with all trackers you can also pass them on the command line, for example: + +\starttyping +context --trackers=backend.* yourfile +\stoptyping + +The reference related backend mechanisms have their own trackers. + +\stopsection + +\startsection [title={Transformations}] + +There is at the time of this writing still some backend related code at the \TEX\ +end that needs a cleanup. Most noticeable is the code that deals with +transformations (like scaling). At some moment a primitive was introduced in +\PDFTEX\ but it did not completely cover the transform matrix so we never +used it. In \LUATEX\ we will come up with a better mechanism. Till that moment we +stick to the \MKII\ method. + +\stopsection + +\startsection [title={Annotations}] + +The \LUA\ based backend of \MKIV\ is not so much less code, but definitely +cleaner. The reason why there is still quite some code is that in \CONTEXT\ we also +handle annotations and destinations in \LUA. In other words: \TEX\ is not +bothered by the backend any more. We could make that split without too much +impact as we never depended on \PDFTEX\ hyperlink related features and used +generic annotations instead. It's for that reason that \CONTEXT\ has always been +able to nest hyperlinks and have annotations with a chain of actions.
+ +Another reason for doing it all at the \LUA\ end is that as in \MKII\ we have to +deal with the rather hybrid cross reference mechanism, which uses a sort of +language; parsing this is also easier at the \LUA\ end. Think of: + +\starttyping +\definereference[somesound][StartSound(attention)] + +\at {just some page} [someplace,somesound,StartMovie(somemovie)] +\stoptyping + +We parse the specification expanding shortcuts when needed, create an action +chain, make sure that the movie related resources are taken care of (normally the +movie itself will be a figure), and turn the three words into hyperlinks. As this +all happens in \LUA\ we have less \TEX\ code. Contrary to what you might expect, +the \LUA\ code is not that much faster as the \MKII\ \TEX\ code is rather +optimized. + +Special features like \JAVASCRIPT\ as well as widgets (and forms) are also +reimplemented. Support for \JAVASCRIPT\ is not that complex at all, but as in +\CONTEXT\ we can organize scripts in collections and have automatic inclusion of +used functions, still some code is needed. As we now do this in \LUA\ we use less +\TEX\ memory. Reimplementing widgets took a bit more work as I used the +opportunity to remove hacks for older viewers. As support for widgets is somewhat +unstable in viewers quite some testing was needed, especially because we keep +supporting cloned and copied fields (resulting in widget trees). + +An interesting complication with widgets is that each instance can have a lot of +properties and as we want to be able to use thousands of them in one document, +each with different properties, we have efficient storage in \MKII\ and want to +do the same in \LUA. Most code at the \TEX\ end is related to passing all those +options. + +You could use the \LUA\ functions that relate to annotations etc.\ but normally +you will use the regular \CONTEXT\ user interface.
For practical reasons, the +backend code is grouped in several tables: + +The \type{backends} table has subtables for each backend and currently there is +only one: \type {pdf}. Each backend provides its own tables. In the +\type{codeinjections} namespace we collect functions that don't interfere with +the typesetting or typeset result, like inserting all kinds of resources (movies, +attachments, etc.), widget related functionality, and in fact everything that does +not fit into the other categories. In \type {nodeinjections} we organize +functions that inject literal \PDF\ code in the node list which then ends up in +the \PDF\ stream: color, layers, etc. The \type {registrations} table is reserved +for functions related to resources that result from node injections: spot colors, +transparencies, etc. Once the backend code is finished we might come up with +another organization. No matter what we end up with, the way the \type {backends} +table is supposed to be organized determines the \API\ and those who have seen +the \MKII\ backend code will recognize some of it. + +\stopsection + +\startsection [title={Metadata}] + +We always had the opportunity to set the information fields in a \PDF\ but +standardization forces us to add these large verbose metadata blobs. As this blob +is coded in \XML\ we use the built in \XML\ parser to fill a template. Thanks to +extensive testing and research by Peter Rolf we now have rather complete +support for \PDF/x related demands. This will definitely evolve with the advance +of the \PDF\ specification. You can replace the information with your own but we +suggest that you stay away from this metadata mess as far as possible. + +\stopsection + +\startsection [title={Helpers}] + +If you look into the \type {lpdf-*.lua} files you will find more +functions.
Some are public helpers, like: + +\starttabulate +\NC \type {lpdf.toeight(str)} \NC returns \type {(string)} \NC \NR +%NC \type {lpdf.cleaned(str)} \NC returns \type {escaped string} \NC \NR +\NC \type {lpdf.tosixteen(str)} \NC returns \type {<utf16 sequence>} \NC \NR +\stoptabulate + +An example of another public function is: + +\starttyping +lpdf.sharedobj(content) +\stoptyping + +This one flushes the object and returns the object number. Already defined +objects are reused. In addition to this code driven optimization, some other +optimization and reuse takes place but all that happens without user +intervention. + +\stopsection + +\stopchapter + +\stopcomponent
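The behaviour described in the Helpers section can be modelled outside ConTeXt. What follows is a minimal standalone Python sketch, not the code found in the lpdf-*.lua files. The names `to_eight`, `to_sixteen` and `SharedObjects` are hypothetical stand-ins for `lpdf.toeight`, `lpdf.tosixteen` and `lpdf.sharedobj`. The escaping of parentheses and backslashes and the `feff` prefix are assumptions based on general PDF string conventions; the manual itself only states the `(string)` and `<utf16 sequence>` forms and the reuse of already defined objects.

```python
# Standalone model of the helpers described above; NOT the ConTeXt code.
# All names (to_eight, to_sixteen, SharedObjects) are hypothetical.


def to_eight(text):
    """Return a PDF literal string: (text).

    Assumption: backslashes and parentheses are escaped with a backslash,
    as PDF literal strings require for unbalanced parentheses.
    """
    escaped = (
        text.replace("\\", "\\\\")
            .replace("(", "\\(")
            .replace(")", "\\)")
    )
    return "(" + escaped + ")"


def to_sixteen(text):
    """Return a PDF hex string holding the text as UTF-16BE.

    Assumption: the sequence starts with the FEFF byte order mark.
    """
    return "<feff" + text.encode("utf-16-be").hex() + ">"


class SharedObjects:
    """Hand out one object number per distinct content string."""

    def __init__(self):
        self._numbers = {}   # content -> object number
        self.flushed = []    # (number, content) pairs, in flush order

    def sharedobj(self, content):
        number = self._numbers.get(content)
        if number is None:
            # First time we see this content: "flush" it under a new number.
            number = len(self._numbers) + 1
            self._numbers[content] = number
            self.flushed.append((number, content))
        # Already defined content reuses the number it got earlier.
        return number


pool = SharedObjects()
first = pool.sharedobj("<< /Name " + to_eight("test") + " >>")
again = pool.sharedobj("<< /Name " + to_eight("test") + " >>")
other = pool.sharedobj("[ " + to_sixteen("A") + " ]")
```

In this sketch `first` and `again` hold the same object number because the serialized content is identical, while `other` gets a new one. The real backend keeps such bookkeeping internal, so user code only sees the returned object number.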