% language=uk

\startcomponent mk-zapfino

\environment mk-environment

\nonknuthmode

\definefontfeature
  [SampleFont]
  [language=dflt,
   script=latn,
   calt=yes,
   clig=yes,
   rlig=yes,
   tlig=yes,
   mode=node]

\font\Sample=ZapfinoExtraLTPro*SampleFont at 24pt

\def\SampleChar#1{\dontleavehmode\struttedbox{\Sample\fontchar{#1}}}
\def\SampleText#1{\dontleavehmode\struttedbox{\Sample#1}}

\doifmodeelse {tug} {
    \title{Zapfing fonts}
    \subject{by Hans Hagen \& Taco Hoekwater}
    This is Chapter~XII from \notabene {\CONTEXT, from \MKII\ to \MKIV}, a
    document that describes our explorations, experiments and decisions made
    while we develop \LUATEX. This text has not been copy-edited.
    \blank[3*big]
} {
    \chapter{Zapfing fonts}
}

\subject {remark}

{\it The actual form of the tables shown here might have changed in the
meantime. However, since this document describes the stepwise development of
\LUATEX\ and \CONTEXT\ \MKIV\ we don't update the following information. The
rendering might differ from earlier rendering simply because the code used to
process this chapter evolves.}

\subject {features}

In previous chapters we've seen support for \OPENTYPE\ features creep into
\LUATEX\ and \CONTEXT\ \MKIV. However, it may not have been clear that so far
we were just feeding the traditional \TEX\ machinery with the right data:
ligatures and kerns. Here we will show what so-called features can do for you.
Not much \LUA\ code will be shown, if only because relatively complex code is
needed to handle this kind of trickery with acceptable performance.

In order to support features in their full glory more is needed than \TEX's
ligature and kern mechanisms: we need to manipulate the node list. As a result,
we now have a second mechanism built into \MKIV\ and users can choose which
method they like most. The first method, called \type {base}, is less powerful
and less complete than the one named \type {node}. Eventually \CONTEXT\ will
use the node method by default.

There are two variants of features: substitutions and positioning. Here we
concentrate on substitutions, of which there are several. Positioning is used,
for instance, for the specialized kerning needed when typesetting Arabic.

One character representation can be replaced by one or more fixed alternatives
or alternatives chosen from a list of alternatives (substitutions or
alternates). Multiple characters can be replaced by one character
(substitutions, alternates or a ligature). The replacements can depend on
preceding and|/|or following glyphs, in which case we say that the replacement
is driven by rules. Rules can deal with single glyphs, combinations of glyphs,
classes (defined in the font) of glyphs and|/|or ranges of glyphs.

Because the available documentation of \OPENTYPE\ is rather minimalistic and
because most fonts are relatively simple, you can imagine that figuring out how
to implement support for fonts with advanced features is not entirely trivial
and involves some trial and error. What also complicates things is that
features can interfere. Yet another complicating factor is that the order of
application matters: applying one rule may obscure a later rule. Such fonts
don't ship with manuals and examples of correct output are not part of the
purchase.

We like testing \LUATEX's \OPENTYPE\ support with Palatino Regular and Palatino
Sans and good old \TYPEONE\ support with Optima Nova. So it makes sense to test
advanced features with Zapfino Pro. This font has many features, which happen
to be implemented by Adam Twardoch, a well known font expert who is familiar
with the \TEX\ community. We had the feeling that when \LUATEX\ can support
Zapfino Pro, designed by Hermann Zapf and enhanced by Adam, we have reached a
crucial point in the development.

The first thing that you will observe when using this font is that the files
are larger than normal, especially the cached versions in \MKIV.
This made me extend some of the serialization code that we use for caching font
data so that it could handle huge tables better, but at the cost of some speed.
Once we could handle the data conveniently and, as a side effect, look into the
font data with an editor, it became clear that implementing the \type {calt}
and \type {clig} features would take a bit of coding.

\subject{example}

Before discussing details, we will show two of the test texts that \CONTEXT\
users normally use when testing layouts or new features, a quote from E.R.\
Tufte and one from Hermann Zapf. The \TEX\ code shows how features are set in
\CONTEXT.

\startbuffer
\definefontfeature
  [zapfino]
  [language=nld,script=latn,mode=node,
   calt=yes,clig=yes,liga=yes,rlig=yes,tlig=yes]

\definefont
  [Zapfino]
  [ZapfinoExtraLTPro*zapfino at 24pt]
  [line=40pt]

\Zapfino
\input tufte \par
\stopbuffer

\typebuffer

\blank[disable] \start \getbuffer \stop

You don't even have to look too closely in order to notice that characters are
represented by different glyphs, depending on the context in which they appear.

\startbuffer
\definefontsynonym
  [Zapfino]
  [ZapfinoExtraLTPro]
  [features=zapfino]

\definedfont
  [Zapfino at 24pt]

\setupinterlinespace
  [line=40pt]

\input zapf \par
\stopbuffer

\typebuffer

\blank[disable] \start \getbuffer \stop

\subject{obeying rules}

When we were testing node based feature support, the only way to check this was
to identify the rules that lead to certain glyphs. The more distinctive glyphs
are good candidates for this. For instance

\startitemize[packed]
\item there is a special glyph representing \SampleChar{c_slash_o}
\item in the input stream this is the character sequence \type{c/o}
\item so there must be a rule that tells us that this sequence becomes that
      ligature
\stopitemize

As said, in this case, the replacement glyph is supposed to be a ligature and
indeed there is such a ligature: \type {c_slash_o}.
Of course, this replacement will only take place when the sequence is
surrounded by spaces.

However, when testing this, we were not looking at this rule but at the
(randomly chosen) rule that was meant to intercept the alternative \type {h.2}
followed by \type {z.4}. Interestingly, this did resolve to a ligature, but the
shape associated with this ligature was an~\type {h}, which is not right.
Actually, a few more of such rules turned out to be wrong. It took a bit of an
effort to reach this conclusion because of the mentioned interferences of
features and rules. At that time, the rule entry (in raw \LUATEX\ table format)
looked as follows:

\starttyping
[44] = {
    ["format"] = "coverage",
    ["rules"] = {
        [1] = {
            ["coverage"] = {
                ["ncovers"] = {
                    [1] = "h.2",
                    [2] = "z.4",
                }
            },
            ["lookups"] = {
                [1] = {
                    ["lookup_tag"] = "L084",
                    ["seq"] = 0,
                }
            }
        }
    },
    ["script_lang_index"] = 1,
    ["tag"] = "calt",
    ["type"] = "chainsub"
}
\stoptyping

Instead of reinventing the wheel, we used the \FONTFORGE\ libraries for reading
the \OPENTYPE\ font files. Therefore the \LUATEX\ table resembles the internal
\FONTFORGE\ data structures. Currently we show the version~1 format.

Here \type {ncovers} means that when the current character has shape
\SampleChar {h.2} (\type{h.2}) and the next one is \SampleChar{z.4}
(\type{z.4}) (a sequence) then we need to apply the lookup internally tagged
\type {L084}. Such a rule can be more extensive, for instance instead of \type
{h.2} one can have a list of characters, and there can be \type {bcovers} and
\type {fcovers} as well, which means that preceding or following characters
need to be taken into account.

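
To make the matching step a bit more concrete, here is a minimal sketch in
plain \LUA. It is not the \MKIV\ code: it assumes that the glyph run has
already been reduced to a list of glyph names, it treats each \type {ncovers}
entry as a single name (a real rule can list several names per position), and
it ignores \type {bcovers} and \type {fcovers}. The function name \type
{matches} is made up for this illustration.

\starttyping
-- Hypothetical sketch, not the MkIV implementation.
-- rule:   a table shaped like the version 1 entry shown above
-- glyphs: a list of glyph names, pos: index of the current glyph
local function matches(rule, glyphs, pos)
    for _, r in ipairs(rule.rules) do
        local seq = r.coverage.ncovers
        local ok  = true
        for i = 1, #seq do
            if glyphs[pos + i - 1] ~= seq[i] then
                ok = false
                break
            end
        end
        if ok then
            -- the tag of the lookup that has to be applied
            return r.lookups[1].lookup_tag
        end
    end
    return nil
end

local rule = {
    format = "coverage",
    rules  = {
        {
            coverage = { ncovers = { "h.2", "z.4" } },
            lookups  = { { lookup_tag = "L084", seq = 0 } },
        },
    },
    tag  = "calt",
    type = "chainsub",
}

print(matches(rule, { "a", "h.2", "z.4" }, 2)) -- L084
print(matches(rule, { "a", "h.2", "z.4" }, 1)) -- nil
\stoptyping

In a real implementation one would probably not compare names at every
position but prepare hashes in advance; the sketch only shows what a hit means.
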
When this rule matches, it resolves to a specification like:

\starttyping
[6] = {
    ["flags"] = 0,
    ["lig"] = {
        ["char"] = "h",
        ["components"] = "h.2 z.4",
    },
    ["script_lang_index"] = 65535,
    ["tag"] = "L084",
    ["type"] = "ligature",
}
\stoptyping

Here \type {tag} and \type {script_lang_index} are kind of special and are part
of a private feature system, i.e.\ they make up the cross reference between
rules and glyphs.

Watch how the components don't match the character, which is even more peculiar
when we realize that these are the initials of the author of the font. It took
a couple of Skype sessions and mails before we came to the conclusion that this
was probably a glitch in the font. So, what to do when a font has bugs like
this? Should one disable the feature? That would be a pity because a font like
Zapfino depends on it. On the other hand, given the number of rules and given
the fact that there are different rule sets for some languages, you can imagine
that making up the rules and checking them is not trivial.

We should realize that Zapfino is an extraordinary case, because it uses the
\OPENTYPE\ features extensively. We can also be sure that the problems will be
fixed once they are known, if only because Adam Twardoch (who did the job) has
exceptionally high standards, but it may take a while before the fix reaches
the user (who then has to update his or her font). As said, it also takes some
effort to run into the situation described here so the likelihood of running
into this rule is small. This also brings to our attention the fact that fonts
can now contain bugs and updating them makes sense but can break existing
documents. Since such fonts are copyrighted and not available online, font
vendors need to find ways to communicate these fixes to their customers.

Can we add some additional checks for problems like this?
For a while I thought that it was possible by assuming that ligatures have
names like \type {h.2_z.4} but alas, sequences of glyphs are mapped onto
ligatures using mappings like the following:

\starttabulate[||||]
\NC \type{three fraction four.2} \NC \type{threequarters} \NC \SampleChar{threequarters} \NC\NR
\NC \type{three fraction four}   \NC \type{threequarters} \NC \SampleChar{threequarters} \NC\NR
\NC \type{d r}                   \NC \type{d_r}           \NC \SampleChar{d_r}           \NC\NR
\NC \type{e period}              \NC \type{e_period}      \NC \SampleChar{e_period}      \NC\NR
\NC \type{f i}                   \NC \type{fi}            \NC \SampleChar{fi}            \NC\NR
\NC \type{f l}                   \NC \type{fl}            \NC \SampleChar{fl}            \NC\NR
\NC \type{f f i}                 \NC \type{f_f_i}         \NC \SampleChar{f_f_i}         \NC\NR
\NC \type{f t}                   \NC \type{f_t}           \NC \SampleChar{f_t}           \NC\NR
\stoptabulate

Some ligatures have no \type {_} in their names and there are also some
inconsistencies: compare \type {fl} and \type {f_f_i}. Here font history is
painfully reflected in inconsistency and no solution can be found here.

So, in order to get rid of this problem, \MKIV\ implements a method to ignore
certain rules, but this only makes sense if one knows how the rules are tagged
internally. So, in practice this is no solution. However, you can imagine that
at some point \CONTEXT\ ships with a database of fixes that are applied to
known fonts with certain version numbers.

We also found out that the font table that we used was not good enough for our
purpose because the exact order in which rules have to be applied was not
available. Then we noticed that in the meantime \FONTFORGE\ had moved on to
version~2 and after consulting the author we quickly came to the conclusion
that it made sense to use the updated representation.

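
The name based check that was considered above, and why it falls short, can be
sketched in a few lines of plain \LUA. This is only an illustration: the
function name \type {suspicious} is made up, and the entries are the \type
{char} and \type {components} pairs taken from the examples in this chapter.

\starttyping
-- Hypothetical check: guess the ligature name by joining the
-- component names with underscores and compare it with the real name.
local function suspicious(lig)
    local expected = (lig.components:gsub(" ", "_"))
    return lig.char ~= expected
end

print(suspicious { char = "h",   components = "h.2 z.4" }) -- true
print(suspicious { char = "d_r", components = "d r"     }) -- false
print(suspicious { char = "fi",  components = "f i"     }) -- true
\stoptyping

The first entry is the glitch that we want to catch, but the perfectly valid
\type {fi} is flagged as well, so a check like this produces false alarms.
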
In version~2 the snippet with the previously mentioned rule looks as follows:

\starttyping
["ks_latn_l_66_c_19"]={
  ["format"]="coverage",
  ["rules"]={
    [1]={
      ["coverage"]={
        ["current"]={
          [1]="h.2",
          [2]="z.4",
        }
      },
      ["lookups"]={
        [1]={
          ["lookup"]="ls_l_84",
          ["seq"]=0,
        }
      }
    }
  },
  ["type"]="chainsub",
},
\stoptyping

The main rule table is now indexed by name which is possible because the order
of rules is specified somewhere else. The key \type {ncovers} has been replaced
by \type {current}. As long as \LUATEX\ is in beta stage, we have the freedom
to change such labels as some of them are rather \FONTFORGE\ specific.

This rule is mentioned in a feature specification table. Here specific features
are associated with languages and scripts. This is just one of the entries
concerning \type {calt}. You can imagine that it took a while to figure out how
best to deal with this, but eventually the \MKIV\ code could do the trick. The
cryptic names are replacements for pointers in the \FONTFORGE\ datastructure.
In order to be able to use \FONTFORGE\ for font development and analysis, the
decision was made to stick closely to its idiom.

\starttyping
["gsub"]={
  ...
  [67]={
    ["features"]={
      [1]={
        ["scripts"]={
          [1]={
            ["langs"]={
              [1]="AFK ",
              [2]="DEU ",
              [3]="NLD ",
              [4]="ROM ",
              [5]="TRK ",
              [6]="dflt",
            },
            ["script"]="latn",
          }
        },
        ["tag"]="calt",
      }
    },
    ["name"]="ks_latn_l_66",
    ["subtables"]={
      [1]={
        ["name"]="ks_latn_l_66_c_0",
      },
      ...
      [20]={
        ["name"]="ks_latn_l_66_c_19",
      },
      ...
    },
    ["type"]="gsub_context_chain",
  },
\stoptyping

\subject{practice}

The few snapshots of the font table probably don't make much sense if you
haven't seen the whole table. Well, it certainly helps to see the whole
picture, but we're talking of a 14 MB file (1.5 MB bytecode).

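
To give an impression of how the snapshots hang together, the following sketch
in plain \LUA\ collects the subtable names that belong to a given feature,
script and language. It is an illustration under assumptions, not the \MKIV\
code: the table is a cut down imitation of the version~2 snippet shown above,
the function name \type {subtables_for} is made up, and we simply assume that
walking the table in index order is what is wanted.

\starttyping
-- Hypothetical sketch; the table mimics the version 2 snippet shown above.
local gsub = {
    {
        features = {
            {
                scripts = {
                    { langs = { "DEU ", "NLD ", "dflt" }, script = "latn" },
                },
                tag = "calt",
            },
        },
        name      = "ks_latn_l_66",
        subtables = {
            { name = "ks_latn_l_66_c_0"  },
            { name = "ks_latn_l_66_c_19" },
        },
        type = "gsub_context_chain",
    },
}

-- Collect the subtable names that apply to a feature tag, script and
-- language; language tags are padded to four characters as in the font.
local function subtables_for(gsub, tag, script, lang)
    local result = { }
    for _, lookup in ipairs(gsub) do
        local wanted = false
        for _, feature in ipairs(lookup.features or { }) do
            if feature.tag == tag then
                for _, s in ipairs(feature.scripts) do
                    if s.script == script then
                        for _, l in ipairs(s.langs) do
                            if l == lang then
                                wanted = true
                            end
                        end
                    end
                end
            end
        end
        if wanted then
            for _, subtable in ipairs(lookup.subtables) do
                result[#result + 1] = subtable.name
            end
        end
    end
    return result
end

print(table.concat(subtables_for(gsub, "calt", "latn", "NLD "), " "))
-- ks_latn_l_66_c_0 ks_latn_l_66_c_19
print(#subtables_for(gsub, "calt", "latn", "ENG ")) -- 0
\stoptyping
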
When resolving ligatures, we can follow a straightforward approach:

\startitemize[packed]
\item walk over the nodelist and at each character (glyph node) call a function
\item this function inspects the character and takes a look at the following
      ones
\item when a ligature is identified, the sequence of nodes is replaced
\stopitemize

Substitutions are not much different but there we look at just one character.

However, contextual substitutions (and ligatures) are more complex. Here we
need to loop over a list of rules (dependent on script and language) and this
involves a sequence as well as preceding and following characters. When we have
a hit, the sequence will be replaced by another one, determined by a lookup in
the character table. Since this is a rather time consuming operation,
especially because many surrounding characters need to be taken into account,
you can imagine that we need a bit of trickery to get an acceptable
performance. Fortunately \LUA\ is pretty fast when it comes down to
manipulating strings and tables, so we can prepare some handy datastructures in
advance.

When testing the implementation of features one needs to be aware of the fact
that some appearances are also implemented using the regular ligature
mechanisms.
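
The straightforward ligature approach listed above can be sketched as follows.
This is plain \LUA\ and only an illustration: a list of glyph names stands in
for the node list, the ligature table is a hand made excerpt based on the
mappings shown earlier, trying the longest sequence first is our assumption,
and the names \type {find_ligature} and \type {ligate} are made up.

\starttyping
-- Hypothetical sketch: a list of glyph names stands in for the node list.
local ligatures = {
    f = {
        { components = { "f", "f", "i" }, char = "f_f_i" },
        { components = { "f", "i" },      char = "fi"    },
        { components = { "f", "l" },      char = "fl"    },
    },
}

-- Inspect the glyph at pos and the following ones; return a match, if any.
local function find_ligature(glyphs, pos)
    for _, lig in ipairs(ligatures[glyphs[pos]] or { }) do
        local ok = true
        for i, name in ipairs(lig.components) do
            if glyphs[pos + i - 1] ~= name then
                ok = false
                break
            end
        end
        if ok then
            return lig
        end
    end
end

-- Walk over the list and replace each identified sequence by its ligature.
local function ligate(glyphs)
    local result, pos = { }, 1
    while pos <= #glyphs do
        local lig = find_ligature(glyphs, pos)
        if lig then
            result[#result + 1] = lig.char
            pos = pos + #lig.components
        else
            result[#result + 1] = glyphs[pos]
            pos = pos + 1
        end
    end
    return result
end

print(table.concat(ligate({ "o", "f", "f", "i", "c", "e" }), " "))
-- o f_f_i c e
print(table.concat(ligate({ "f", "l", "a", "t" }), " "))
-- fl a t
\stoptyping
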
Take the following definitions:

\startbuffer[a]
\definefontfeature
  [none]
  [language=dflt,script=latn,mode=node,liga=no]
\definefontfeature
  [calt]
  [language=dflt,script=latn,mode=node,liga=no,calt=yes]
\definefontfeature
  [clig]
  [language=dflt,script=latn,mode=node,liga=no,clig=yes]
\definefontfeature
  [dlig]
  [language=dflt,script=latn,mode=node,liga=no,dlig=yes]
\definefontfeature
  [liga]
  [language=dflt,script=latn,mode=node]
\stopbuffer

\startbuffer[b]
\starttabulate[||||]
\NC \type{none } \NC \definedfont[ZapfinoExtraLTPro*none at 24pt]\hbox{on the synthesis}\NC\definedfont[ZapfinoExtraLTPro*none at 24pt]\hbox{winnow the wheat}\NC \NR
\NC \type{calt } \NC \definedfont[ZapfinoExtraLTPro*calt at 24pt]\hbox{on the synthesis}\NC\definedfont[ZapfinoExtraLTPro*calt at 24pt]\hbox{winnow the wheat}\NC \NR
\NC \type{clig } \NC \definedfont[ZapfinoExtraLTPro*clig at 24pt]\hbox{on the synthesis}\NC\definedfont[ZapfinoExtraLTPro*clig at 24pt]\hbox{winnow the wheat}\NC \NR
\NC \type{dlig } \NC \definedfont[ZapfinoExtraLTPro*dlig at 24pt]\hbox{on the synthesis}\NC\definedfont[ZapfinoExtraLTPro*dlig at 24pt]\hbox{winnow the wheat}\NC \NR
\NC \type{liga } \NC \definedfont[ZapfinoExtraLTPro*liga at 24pt]\hbox{on the synthesis}\NC\definedfont[ZapfinoExtraLTPro*liga at 24pt]\hbox{winnow the wheat}\NC \NR
\stoptabulate
\stopbuffer

\typebuffer[a]

This gives:

\start \getbuffer[a] \getbuffer[b] \stop

Here are Adam's recommendations with regard to the \type {dlig} feature:
\quotation {The \type{dlig} feature is supposed to be used only upon user's
discretion, usually on single runs, words or even pairs. It makes little sense
to enable \type {dlig} for an entire sentence or paragraph. That's how the
\OPENTYPE\ specification envisions it.}

When testing features it helps to use words that look similar, so next we will
show some of the examples that we used.
When we look at these examples, we need to understand that when a specific
character representation is analyzed, the rules can take preceding and
following characters into account. The rules take characters as well as their
shapes, or more precisely: one of their shapes since Zapfino has many variants,
into account. Since different rules are used for languages (okay, this is
limited to only a subset of languages that use the latin script) not only
shapes but also the way words are constructed is taken into account. Designing
the rules is definitely not trivial.

When testing the implementation we ran into cases where the initial \type {t}
showed up wrong, for instance in the Dutch word \type {troef}. Because space
can be part of the rules, we need to handle the cases where words end and
start, and boxes are then kind of special.

\definefontfeature
  [zapfing]
  [language=dflt,
   script=latn,
   calt=yes,
   clig=yes,
   rlig=yes,
   tlig=yes,
   mode=node]

\font\Zapfing=ZapfinoExtraLTPro*zapfing at 24pt

\startbuffer
troef troef troef troeftroef troef \par
\ruledhbox{troef troef troef troeftroef troef} \par
\ruledhbox{troef 123} \par
\ruledhbox{troef} \ruledhbox{troef } \ruledhbox{ troef} \ruledhbox { troef } \par
\stopbuffer

\typebuffer

\start \Zapfing \getbuffer \stop

Unfortunately, this does not work well with punctuation, which is less
prominent in the rules than space. In our favourite test quote of Tufte, we
have lots of commas and there it shows up:

\startbuffer
review review review, review \par
itemize, review \par
itemize, review, \par
\stopbuffer

\typebuffer

\start \Zapfing \getbuffer \stop

Of course we can decide to extend the rule base at runtime and this may well
happen when we experiment more with this font.

The next one was one of our first test lines. Watch the initial and the Zapfino
ligature.

\startbuffer
Welcome to Zapfino
\stopbuffer

\typebuffer

\start \Zapfing \getbuffer \stop

For a while there was a bug in the rule handler that resulted in the variant of
the \type {y} that has a very large descender. Incidentally the word \type
{synthesize} is also a good test case for the \type {the} pattern which gets
special treatment because there is a ligature available.

\startbuffer
synopsize versus synthesize versus
synthase versus sympathy versus synonym
\stopbuffer

\typebuffer

\start \Zapfing \getbuffer \stop

Here are some examples that use the \type {g}, \type {d} and \type {f} in
several places.

\startbuffer
eggen groet ogen hagen \par
dieren druiven onder aard donder modder \par
fiets effe flater triest troef \par
\stopbuffer

\typebuffer

\start \Zapfing \getbuffer \stop

Let's see how well Hermann has taken care of the \type {h}'s representations.
There are quite a few variants of the lowercase one:

\starttabulate
\NC \type {h}      \NC \SampleChar{h}      \NC \NR
\NC \type {h.2}    \NC \SampleChar{h.2}    \NC \NR
\NC \type {h.3}    \NC \SampleChar{h.3}    \NC \NR
\NC \type {h.4}    \NC \SampleChar{h.4}    \NC \NR
\NC \type {h.5}    \NC \SampleChar{h.5}    \NC \NR
\NC \type {h.init} \NC \SampleChar{h.init} \NC \NR
\NC \type {h.sups} \NC \SampleChar{h.sups} \NC \NR
\NC \type {h.sc}   \NC \SampleChar{h.sc}   \NC \NR
\NC \type {orn.73} \NC \SampleChar{orn.73} \NC \NR
\stoptabulate

How about the uppercase variant, as used in his name:

\startbuffer
M Mr Mr. H He Her Herm Herma Herman Hermann Z Za Zap Zapf \par
Mr. Hermann Zapf
\stopbuffer

\typebuffer

\start \Zapfing \getbuffer \stop

Of course we have to test another famous name:

\startbuffer
D Do Don Dona Donal Donald K Kn Knu Knut Knuth \par
Don Knuth Donald Knuth Donald E. Knuth DEK \par
Prof. Dr. Donald E. Knuth \par
\stopbuffer

\typebuffer

\start \Zapfing \getbuffer \stop

Unfortunately the \LUA\ and \TEX\ logos don't come out that well:

\startbuffer
L Lu Lua l lu lua t te tex TeX luatex luaTeX LuaTeX
\stopbuffer

\typebuffer

\start \Zapfing \getbuffer \stop

This font has quite a few ornaments and there is an \type {ornm} feature that
can be applied. We're still not sure about its usage, but when one keys in text
in lowercase, \type {hermann} comes out as follows:

\definefontfeature
  [gebarentaal]
  [language=dflt,
   script=latn,
   mode=node,
   ornm=yes,
   liga=no]

{\font\Sample=ZapfinoExtraLTPro*gebarentaal at 24pt \Sample hermann}

As said in the beginning, dirty implementation details will be kept away from
the reader. Also, you should not be surprised if the current code has some bugs
or does some things wrong. And if spacing looks a bit weird to you, keep in
mind that we're still in the middle of sorting things out.

\start \Zapfing Taco Hoekwater \& Hans Hagen \stop

\stopcomponent