if not modules then modules = { } end modules ['font-ots'] = { -- sequences
    version   = 1.001,
    comment   = "companion to font-ini.mkiv",
    author    = "Hans Hagen, PRAGMA-ADE, Hasselt NL",
    copyright = "PRAGMA ADE / ConTeXt Development Team",
    license   = "see context related readme files",
}

--[[ldx--
This module is a bit more split up than I'd like but since we also want to test
with plain
The specification of OpenType is (or at least a decade ago was) kind of vague. Apart from the lack of a proper free specification there's also the problem that Microsoft and Adobe may have their own interpretation of how and in what order to apply features. In general the Microsoft website has more detailed specifications and is a better reference. There is also some information in the FontForge help files. In the end we rely most on the Microsoft specification.
Because so much is possible, fonts might contain bugs and/or be made to work with certain renderers. These may evolve over time, which may have the side effect that fonts suddenly behave differently. We don't want to catch all font issues.
After a lot of experiments (mostly by Taco, me and Idris) the first implementation
became quite useful. When it did most of what we wanted, a more optimized version
evolved. Of course all errors are mine and of course the code can be improved. There
are quite some optimizations going on here and processing speed is currently quite
acceptable and has been improved over time. Many complex scripts are not yet
supported, but I will look into them as soon as
The specification leaves room for interpretation. In case of doubt the Microsoft implementation is the reference as it is the most complete one. As they deal with lots of scripts and fonts, Kai and Ivo did a lot of testing of the generic code and their suggestions help improve the code. I'm aware that not all border cases can be taken care of, unless we accept excessive runtime, and even then the interference with other mechanisms (like hyphenation) is not trivial.
Especially discretionary handling has been improved much by Kai Eigner who uses complex (latin) fonts. The current implementation is a compromise between his patches and my code and in the meantime performance is quite ok. We cannot check all border cases without compromising speed but so far we're okay. Given good test cases we can probably improve it here and there. Especially chain lookups are nontrivial with discretionaries but things got much better over time thanks to Kai.
Glyphs are indexed not by unicode but in their own way. This is because there is no
relationship with unicode at all, apart from the fact that a font might cover certain
ranges of characters. One character can have multiple shapes. However, at the
The initial data table is rather close to the open type specification and also not
that different from the one produced by
This module is sparsely documented because it has been a moving target. The table format of the reader changed a bit over time and we experiment a lot with different methods for supporting features. By now the structures are quite stable.
Incrementing the version number will force a re-cache. We jump the number by one when there's a fix in the reader or processing code that can lead to different results.
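The re-cache rule can be pictured with a small sketch. This is not the actual cache code: `current_version`, `needs_recache` and the shape of the cached table are assumptions made for illustration only.

```lua
-- Hypothetical sketch: a cached font table remembers the module version that
-- wrote it; any mismatch (or a missing entry) triggers a rebuild.
local current_version = 1.001

local function needs_recache(cached)
    return not cached or cached.version ~= current_version
end

print(needs_recache(nil))                 -- true
print(needs_recache({ version = 1.000 })) -- true
print(needs_recache({ version = 1.001 })) -- false
```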
This code is also used outside context but in context it has to work with other mechanisms. Both put some constraints on the code here.
--ldx]]--

-- Remark: We assume that cursives don't cross discretionaries which is okay because it
-- is only used in semitic scripts.
--
-- Remark: We assume that marks precede base characters.
--
-- Remark: When complex ligatures extend into discs nodes we can get side effects. Normally
-- this doesn't happen; ff\d{l}{l}{l} in lm works but ff\d{f}{f}{f}.
--
-- Todo: check if we copy attributes to disc nodes if needed.
--
-- Todo: it would be nice if we could get rid of components. In other places we can use
-- the unicode properties.
--
-- Remark: We do some disc juggling where we need to keep in mind that the pre, post and
-- replace fields can have prev pointers to a nesting node ... I wonder if that is still
-- needed.
--
-- Remark: This is not possible:
--
-- \discretionary {alpha-} {betagammadelta}
--   {\discretionary {alphabeta-} {gammadelta}
--      {\discretionary {alphabetagamma-} {delta}
--         {alphabetagammadelta}}}
--
-- Remark: Something is messed up: we have two mark / ligature indices, one at the
-- injection end and one here ... this is based on KE's patches but there is something
-- fishy there as I'm pretty sure that for husayni we need some connection (as it's much
-- more complex than an average font) but I need proper examples of all cases, not of
-- only some.
--
-- Remark: I wonder if indexed would be faster than unicoded. It would be a major
-- rewrite to have char being unicode + an index field in glyph nodes. Also more
-- assignments have to be made in order to keep things in sync. So, it's a no-go.
--
-- Remark: We can provide a fast loop when there are no disc nodes (tests show a 1%
-- gain). Smaller functions might perform better cache-wise. But ... memory becomes
-- faster anyway, so ...
local type, next, tonumber = type, next, tonumber local random = math.random local formatters = string.formatters local insert = table.insert local registertracker = trackers.register local logs = logs local trackers = trackers local nodes = nodes local attributes = attributes local fonts = fonts local otf = fonts.handlers.otf local tracers = nodes.tracers local trace_singles = false registertracker("otf.singles", function(v) trace_singles = v end) local trace_multiples = false registertracker("otf.multiples", function(v) trace_multiples = v end) local trace_alternatives = false registertracker("otf.alternatives", function(v) trace_alternatives = v end) local trace_ligatures = false registertracker("otf.ligatures", function(v) trace_ligatures = v end) local trace_contexts = false registertracker("otf.contexts", function(v) trace_contexts = v end) local trace_marks = false registertracker("otf.marks", function(v) trace_marks = v end) local trace_kerns = false registertracker("otf.kerns", function(v) trace_kerns = v end) local trace_cursive = false registertracker("otf.cursive", function(v) trace_cursive = v end) local trace_preparing = false registertracker("otf.preparing", function(v) trace_preparing = v end) local trace_bugs = false registertracker("otf.bugs", function(v) trace_bugs = v end) local trace_details = false registertracker("otf.details", function(v) trace_details = v end) local trace_steps = false registertracker("otf.steps", function(v) trace_steps = v end) local trace_skips = false registertracker("otf.skips", function(v) trace_skips = v end) local trace_directions = false registertracker("otf.directions", function(v) trace_directions = v end) local trace_plugins = false registertracker("otf.plugins", function(v) trace_plugins = v end) local trace_chains = false registertracker("otf.chains", function(v) trace_chains = v end) local trace_kernruns = false registertracker("otf.kernruns", function(v) trace_kernruns = v end) local trace_discruns = false 
registertracker("otf.discruns", function(v) trace_discruns = v end) local trace_compruns = false registertracker("otf.compruns", function(v) trace_compruns = v end) local trace_testruns = false registertracker("otf.testruns", function(v) trace_testruns = v end) local optimizekerns = true local report_direct = logs.reporter("fonts","otf direct") local report_subchain = logs.reporter("fonts","otf subchain") local report_chain = logs.reporter("fonts","otf chain") local report_process = logs.reporter("fonts","otf process") local report_warning = logs.reporter("fonts","otf warning") local report_run = logs.reporter("fonts","otf run") registertracker("otf.substitutions", "otf.singles","otf.multiples","otf.alternatives","otf.ligatures") registertracker("otf.positions", "otf.marks","otf.kerns","otf.cursive") registertracker("otf.actions", "otf.substitutions","otf.positions") registertracker("otf.sample", "otf.steps","otf.substitutions","otf.positions","otf.analyzing") local nuts = nodes.nuts local tonode = nuts.tonode local tonut = nuts.tonut local getfield = nuts.getfield local setfield = nuts.setfield local getnext = nuts.getnext local setnext = nuts.setnext local getprev = nuts.getprev local setprev = nuts.setprev local getboth = nuts.getboth local setboth = nuts.setboth local getid = nuts.getid local getattr = nuts.getattr local setattr = nuts.setattr local getprop = nuts.getprop local setprop = nuts.setprop local getfont = nuts.getfont local getsubtype = nuts.getsubtype local setsubtype = nuts.setsubtype local getchar = nuts.getchar local setchar = nuts.setchar local getdisc = nuts.getdisc local setdisc = nuts.setdisc local setlink = nuts.setlink local getcomponents = nuts.getcomponents -- the original one, not yet node-aux local setcomponents = nuts.setcomponents -- the original one, not yet node-aux local getdir = nuts.getdir local getwidth = nuts.getwidth local ischar = nuts.is_char local usesfont = nuts.uses_font local insert_node_after = nuts.insert_after local 
copy_node = nuts.copy local copy_node_list = nuts.copy_list local find_node_tail = nuts.tail local flush_node_list = nuts.flush_list local flush_node = nuts.flush_node local end_of_math = nuts.end_of_math local traverse_nodes = nuts.traverse local traverse_id = nuts.traverse_id local set_components = nuts.set_components local take_components = nuts.take_components local count_components = nuts.count_components local copy_no_components = nuts.copy_no_components local copy_only_glyphs = nuts.copy_only_glyphs local setmetatableindex = table.setmetatableindex ----- zwnj = 0x200C ----- zwj = 0x200D local nodecodes = nodes.nodecodes local glyphcodes = nodes.glyphcodes local disccodes = nodes.disccodes local glyph_code = nodecodes.glyph local glue_code = nodecodes.glue local disc_code = nodecodes.disc local math_code = nodecodes.math local dir_code = nodecodes.dir local localpar_code = nodecodes.localpar ----- discretionary_code = disccodes.discretionary local ligature_code = glyphcodes.ligature local a_state = attributes.private('state') local a_noligature = attributes.private("noligature") local injections = nodes.injections local setmark = injections.setmark local setcursive = injections.setcursive local setkern = injections.setkern local setpair = injections.setpair local resetinjection = injections.reset local copyinjection = injections.copy local setligaindex = injections.setligaindex local getligaindex = injections.getligaindex local fontdata = fonts.hashes.identifiers local fontfeatures = fonts.hashes.features local otffeatures = fonts.constructors.features.otf local registerotffeature = otffeatures.register local onetimemessage = fonts.loggers.onetimemessage or function() end local getrandom = utilities and utilities.randomizer and utilities.randomizer.get otf.defaultnodealternate = "none" -- first last -- We use a few semi-global variables. The handler can be called nested but this assumes -- that the same font is used. 
local tfmdata = false local characters = false local descriptions = false local marks = false local classes = false local currentfont = false local factor = 0 local threshold = 0 local checkmarks = false local sweepnode = nil local sweepprev = nil local sweepnext = nil local sweephead = { } local notmatchpre = { } local notmatchpost = { } local notmatchreplace = { } local handlers = { } local isspace = injections.isspace local getthreshold = injections.getthreshold local checkstep = (tracers and tracers.steppers.check) or function() end local registerstep = (tracers and tracers.steppers.register) or function() end local registermessage = (tracers and tracers.steppers.message) or function() end -- local function checkdisccontent(d) -- local pre, post, replace = getdisc(d) -- if pre then for n in traverse_id(glue_code,pre) do print("pre",nodes.idstostring(pre)) break end end -- if post then for n in traverse_id(glue_code,post) do print("pos",nodes.idstostring(post)) break end end -- if replace then for n in traverse_id(glue_code,replace) do print("rep",nodes.idstostring(replace)) break end end -- end local function logprocess(...) if trace_steps then registermessage(...) end report_direct(...) end local function logwarning(...) report_direct(...) 
end

local f_unicode = formatters["U+%X"]      -- was ["%U"]
local f_uniname = formatters["U+%X (%s)"] -- was ["%U (%s)"]
local f_unilist = formatters["% t (% t)"]

local function gref(n) -- currently the same as in font-otb
    if type(n) == "number" then
        local description = descriptions[n]
        local name = description and description.name
        if name then
            return f_uniname(n,name)
        else
            return f_unicode(n)
        end
    elseif n then
        local num, nam = { }, { }
        for i=1,#n do
            local ni = n[i]
            if tonumber(ni) then -- later we will start at 2
                local di = descriptions[ni]
                num[i] = f_unicode(ni)
                nam[i] = di and di.name or "-"
            end
        end
        return f_unilist(num,nam)
    else
        return "We get hits on a mark, but we're not sure if it has to be applied so we need to explicitly test for basechar, baselig and basemark entries.
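The mark handlers first have to find the glyph that a mark attaches to, skipping any marks in between. A minimal sketch of that backward walk follows, assuming a plain array of character codes instead of a node list; `is_mark` and `find_base` are illustrative stand-ins for the `marks` table and the `getprev` loop, not functions of this module.

```lua
-- Toy model: is_mark plays the role of the marks table of the current font.
local is_mark = { [0x0301] = true, [0x0308] = true }

local function find_base(chars, i)
    -- walk back from the mark at position i until a non-mark is found
    local j = i - 1
    while j >= 1 do
        local c = chars[j]
        if not is_mark[c] then
            return j, c
        end
        j = j - 1
    end
    return nil -- nothing usable precedes the mark, so there is no base
end

local chars = { 0x0061, 0x0308, 0x0301 } -- a, diaeresis, acute
local index, base = find_base(chars, 3)
print(index, base)
```

Once a base is found, the real handlers still need matching anchors on both the base and the mark before anything is positioned.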
--ldx]]-- function handlers.gpos_mark2base(head,start,dataset,sequence,markanchors,rlmode) local markchar = getchar(start) if marks[markchar] then local base = getprev(start) -- [glyph] [start=mark] if base then local basechar = ischar(base,currentfont) if basechar then if marks[basechar] then while base do base = getprev(base) if base then basechar = ischar(base,currentfont) if basechar then if not marks[basechar] then break end else if trace_bugs then logwarning("%s: no base for mark %s, case %i",pref(dataset,sequence),gref(markchar),1) end return head, start, false end else if trace_bugs then logwarning("%s: no base for mark %s, case %i",pref(dataset,sequence),gref(markchar),2) end return head, start, false end end end local ba = markanchors[1][basechar] if ba then local ma = markanchors[2] local dx, dy, bound = setmark(start,base,factor,rlmode,ba,ma,characters[basechar],false,checkmarks) if trace_marks then logprocess("%s, bound %s, anchoring mark %s to basechar %s => (%p,%p)", pref(dataset,sequence),bound,gref(markchar),gref(basechar),dx,dy) end return head, start, true elseif trace_bugs then -- onetimemessage(currentfont,basechar,"no base anchors",report_fonts) logwarning("%s: mark %s is not anchored to %s",pref(dataset,sequence),gref(markchar),gref(basechar)) end elseif trace_bugs then logwarning("%s: nothing preceding, case %i",pref(dataset,sequence),1) end elseif trace_bugs then logwarning("%s: nothing preceding, case %i",pref(dataset,sequence),2) end elseif trace_bugs then logwarning("%s: mark %s is no mark",pref(dataset,sequence),gref(markchar)) end return head, start, false end function handlers.gpos_mark2ligature(head,start,dataset,sequence,markanchors,rlmode) local markchar = getchar(start) if marks[markchar] then local base = getprev(start) -- [glyph] [optional marks] [start=mark] if base then local basechar = ischar(base,currentfont) if basechar then if marks[basechar] then while base do base = getprev(base) if base then basechar = 
ischar(base,currentfont) if basechar then if not marks[basechar] then break end else if trace_bugs then logwarning("%s: no base for mark %s, case %i",pref(dataset,sequence),gref(markchar),1) end return head, start, false end else if trace_bugs then logwarning("%s: no base for mark %s, case %i",pref(dataset,sequence),gref(markchar),2) end return head, start, false end end end local ba = markanchors[1][basechar] if ba then local ma = markanchors[2] if ma then local index = getligaindex(start) ba = ba[index] if ba then local dx, dy, bound = setmark(start,base,factor,rlmode,ba,ma,characters[basechar],false,checkmarks) if trace_marks then logprocess("%s, index %s, bound %s, anchoring mark %s to baselig %s at index %s => (%p,%p)", pref(dataset,sequence),index,bound,gref(markchar),gref(basechar),index,dx,dy) end return head, start, true else if trace_bugs then logwarning("%s: no matching anchors for mark %s and baselig %s with index %a",pref(dataset,sequence),gref(markchar),gref(basechar),index) end end end elseif trace_bugs then -- logwarning("%s: char %s is missing in font",pref(dataset,sequence),gref(basechar)) onetimemessage(currentfont,basechar,"no base anchors",report_fonts) end elseif trace_bugs then logwarning("%s: prev node is no char, case %i",pref(dataset,sequence),1) end elseif trace_bugs then logwarning("%s: prev node is no char, case %i",pref(dataset,sequence),2) end elseif trace_bugs then logwarning("%s: mark %s is no mark",pref(dataset,sequence),gref(markchar)) end return head, start, false end function handlers.gpos_mark2mark(head,start,dataset,sequence,markanchors,rlmode) local markchar = getchar(start) if marks[markchar] then local base = getprev(start) -- [glyph] [basemark] [start=mark] local slc = getligaindex(start) if slc then -- a rather messy loop ... 
needs checking with husayni while base do local blc = getligaindex(base) if blc and blc ~= slc then base = getprev(base) else break end end end if base then local basechar = ischar(base,currentfont) if basechar then -- subtype test can go local ba = markanchors[1][basechar] -- slot 1 has been made copy of the class hash if ba then local ma = markanchors[2] local dx, dy, bound = setmark(start,base,factor,rlmode,ba,ma,characters[basechar],true,checkmarks) if trace_marks then logprocess("%s, bound %s, anchoring mark %s to basemark %s => (%p,%p)", pref(dataset,sequence),bound,gref(markchar),gref(basechar),dx,dy) end return head, start, true end end end elseif trace_bugs then logwarning("%s: mark %s is no mark",pref(dataset,sequence),gref(markchar)) end return head, start, false end function handlers.gpos_cursive(head,start,dataset,sequence,exitanchors,rlmode,step,i) -- to be checked local startchar = getchar(start) if marks[startchar] then if trace_cursive then logprocess("%s: ignoring cursive for mark %s",pref(dataset,sequence),gref(startchar)) end else local nxt = getnext(start) while nxt do local nextchar = ischar(nxt,currentfont) if not nextchar then break elseif marks[nextchar] then -- should not happen (maybe warning) nxt = getnext(nxt) else local exit = exitanchors[3] if exit then local entry = exitanchors[1][nextchar] if entry then entry = entry[2] if entry then local dx, dy, bound = setcursive(start,nxt,factor,rlmode,exit,entry,characters[startchar],characters[nextchar]) if trace_cursive then logprocess("%s: moving %s to %s cursive (%p,%p) using bound %s in %s mode",pref(dataset,sequence),gref(startchar),gref(nextchar),dx,dy,bound,mref(rlmode)) end return head, start, true end end end break end end end return head, start, false end --[[ldx--I will implement multiple chain replacements once I run into a font that uses it. It's not that complex to handle.
--ldx]]--

local chainprocs = { }

local function logprocess(...)
    if trace_steps then
        registermessage(...)
    end
    report_subchain(...)
end

local logwarning = report_subchain

local function logprocess(...)
    if trace_steps then
        registermessage(...)
    end
    report_chain(...)
end

local logwarning = report_chain

-- We could share functions but that would lead to extra function calls with many
-- arguments, redundant tests and confusing messages.

-- The reversesub is a special case, which is why we need to store the replacements
-- in a bit weird way. There is no lookup and the replacement comes from the lookup
-- itself. It is meant mostly for dealing with Urdu.

local function reversesub(head,start,stop,dataset,sequence,replacements,rlmode)
    local char        = getchar(start)
    local replacement = replacements[char]
    if replacement then
        if trace_singles then
            logprocess("%s: single reverse replacement of %s by %s",cref(dataset,sequence),gref(char),gref(replacement))
        end
        resetinjection(start)
        setchar(start,replacement)
        return head, start, true
    else
        return head, start, false
    end
end

chainprocs.reversesub = reversesub

--[[ldx--This chain stuff is somewhat tricky since we can have a sequence of actions to be applied: single, alternate, multiple or ligature where ligature can be an invalid one in the sense that it will replace multiple by one but not necessarily one that looks like the combination (i.e. it is the counterpart of multiple then). For example, the following is valid:
Therefore we don't really do the replacement here already unless we have the single lookup case. The efficiency of the replacements can be improved by deleting as little as needed but that would also make the code even messier.
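To make the relation between multiple and ligature concrete, here is a toy sketch that works on a plain array of glyph names instead of nodes. The helpers `apply_multiple` and `apply_ligature` are hypothetical and only illustrate that one action maps one glyph to many and the other maps many to one.

```lua
-- one glyph at position i becomes the glyphs in replacement
local function apply_multiple(glyphs, i, replacement)
    table.remove(glyphs, i)
    for k = #replacement, 1, -1 do
        table.insert(glyphs, i, replacement[k])
    end
end

-- n glyphs starting at position i become a single glyph
local function apply_ligature(glyphs, i, n, replacement)
    for _ = 1, n do
        table.remove(glyphs, i)
    end
    table.insert(glyphs, i, replacement)
end

local glyphs = { "a", "b", "c" }
apply_ligature(glyphs, 1, 2, "x")      -- a b -> x, need not look like "ab"
print(table.concat(glyphs, " "))       -- x c
apply_multiple(glyphs, 1, { "p", "q" })
print(table.concat(glyphs, " "))       -- p q c
```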
--ldx]]--

--[[ldx--Here we replace start by a single variant.
--ldx]]--

-- To be done (example needed): what if > 1 steps

-- this is messy: do we need this disc checking also in alternates?

local function reportzerosteps(dataset,sequence)
    logwarning("%s: no steps",cref(dataset,sequence))
end

local function reportmoresteps(dataset,sequence)
    logwarning("%s: more than 1 step",cref(dataset,sequence))
end

-- local function reportbadsteps(dataset,sequence)
--     logwarning("%s: bad step, no proper return values",cref(dataset,sequence))
-- end

function chainprocs.gsub_single(head,start,stop,dataset,sequence,currentlookup,chainindex)
    local steps    = currentlookup.steps
    local nofsteps = currentlookup.nofsteps
    if nofsteps > 1 then
        reportmoresteps(dataset,sequence)
    end
    if nofsteps == 0 then
        reportzerosteps(dataset,sequence)
    else
        local current = start
        local mapping = steps[1].coverage
        while current do
            local currentchar = ischar(current)
            if currentchar then
                local replacement = mapping[currentchar]
                if not replacement or replacement == "" then
                    if trace_bugs then
                        logwarning("%s: no single for %s",cref(dataset,sequence,chainindex),gref(currentchar))
                    end
                else
                    if trace_singles then
                        logprocess("%s: replacing single %s by %s",cref(dataset,sequence,chainindex),gref(currentchar),gref(replacement))
                    end
                    resetinjection(current)
                    setchar(current,replacement)
                end
                return head, start, true
            elseif currentchar == false then
                -- can't happen
                break
            elseif current == stop then
                break
            else
                current = getnext(current)
            end
        end
    end
    return head, start, false
end

--[[ldx--Here we replace start by a sequence of new glyphs.
--ldx]]--

function chainprocs.gsub_multiple(head,start,stop,dataset,sequence,currentlookup)
    local steps    = currentlookup.steps
    local nofsteps = currentlookup.nofsteps
    if nofsteps > 1 then
        reportmoresteps(dataset,sequence)
    end
    if nofsteps == 0 then
        reportzerosteps(dataset,sequence)
    else
        local startchar   = getchar(start)
        local replacement = steps[1].coverage[startchar]
        if not replacement or replacement == "" then
            if trace_bugs then
                logwarning("%s: no multiple for %s",cref(dataset,sequence),gref(startchar))
            end
        else
            if trace_multiples then
                logprocess("%s: replacing %s by multiple characters %s",cref(dataset,sequence),gref(startchar),gref(replacement))
            end
            return multiple_glyphs(head,start,replacement,sequence.flags[1],dataset[1])
        end
    end
    return head, start, false
end

--[[ldx--Here we replace start by new glyph. First we delete the rest of the match.
--ldx]]-- -- char_1 mark_1 -> char_x mark_1 (ignore marks) -- char_1 mark_1 -> char_x -- to be checked: do we always have just one glyph? -- we can also have alternates for marks -- marks come last anyway -- are there cases where we need to delete the mark function chainprocs.gsub_alternate(head,start,stop,dataset,sequence,currentlookup) local steps = currentlookup.steps local nofsteps = currentlookup.nofsteps if nofsteps > 1 then reportmoresteps(dataset,sequence) end if nofsteps == 0 then reportzerosteps(dataset,sequence) else local kind = dataset[4] local what = dataset[1] local value = what == true and tfmdata.shared.features[kind] or what -- todo: optimize in ctx local current = start local mapping = steps[1].coverage while current do local currentchar = ischar(current) if currentchar then local alternatives = mapping[currentchar] if alternatives then local choice, comment = get_alternative_glyph(current,alternatives,value) if choice then if trace_alternatives then logprocess("%s: replacing %s by alternative %a to %s, %s",cref(dataset,sequence),gref(currentchar),choice,gref(choice),comment) end resetinjection(start) setchar(start,choice) else if trace_alternatives then logwarning("%s: no variant %a for %s, %s",cref(dataset,sequence),value,gref(currentchar),comment) end end end return head, start, true elseif currentchar == false then -- can't happen break elseif current == stop then break else current = getnext(current) end end end return head, start, false end --[[ldx--When we replace ligatures we use a helper that handles the marks. I might change this function (move code inline and handle the marks by a separate function). We assume rather stupid ligatures (no complex disc nodes).
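The ligature handler walks a nested coverage table: each level is keyed by the next character and a `ligature` field marks where a match ends. A minimal sketch of that walk follows, assuming an array of character codes instead of a node list; `match_ligature` and the sample `coverage` data are illustrative, and mark skipping and disc nodes are left out.

```lua
local f, i, l = 0x66, 0x69, 0x6C

-- toy coverage tree: f i -> fi, f f -> ff, f f i -> ffi
local coverage = {
    [f] = {
        [i] = { ligature = 0xFB01 },
        [f] = { ligature = 0xFB00, [i] = { ligature = 0xFB03 } },
    },
}

local function match_ligature(chars, start)
    local first = chars[start]
    local node  = coverage[first]
    if not node then
        return nil
    end
    local last = start
    for k = start + 1, #chars do
        local c   = chars[k]
        local nxt = node[c]
        if not nxt then
            break
        end
        node, last = nxt, k -- descend as long as the sequence continues
    end
    -- like the handler, only the deepest node reached is consulted
    return node.ligature, last
end

print(string.format("U+%04X upto %d", match_ligature({ f, f, i }, 1)))
```

As in the handler itself, there is no backtracking: if the deepest node reached has no `ligature` field the match fails, even when a shorter prefix would have matched.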
--ldx]]-- function chainprocs.gsub_ligature(head,start,stop,dataset,sequence,currentlookup,chainindex) local steps = currentlookup.steps local nofsteps = currentlookup.nofsteps if nofsteps > 1 then reportmoresteps(dataset,sequence) end if nofsteps == 0 then reportzerosteps(dataset,sequence) else local startchar = getchar(start) local ligatures = steps[1].coverage[startchar] if not ligatures then if trace_bugs then logwarning("%s: no ligatures starting with %s",cref(dataset,sequence,chainindex),gref(startchar)) end else local current = getnext(start) local discfound = false local last = stop local nofreplacements = 1 local skipmark = currentlookup.flags[1] -- sequence.flags? while current do -- todo: ischar ... can there really be disc nodes here? local id = getid(current) if id == disc_code then if not discfound then discfound = current end if current == stop then break -- okay? or before the disc else current = getnext(current) end else local schar = getchar(current) if skipmark and marks[schar] then -- marks -- if current == stop then -- maybe add this -- break -- else current = getnext(current) -- end else local lg = ligatures[schar] if lg then ligatures = lg last = current nofreplacements = nofreplacements + 1 if current == stop then break else current = getnext(current) end else break end end end end local ligature = ligatures.ligature if ligature then if chainindex then stop = last end if trace_ligatures then if start == stop then logprocess("%s: replacing character %s by ligature %s case 3",cref(dataset,sequence,chainindex),gref(startchar),gref(ligature)) else logprocess("%s: replacing character %s upto %s by ligature %s case 4",cref(dataset,sequence,chainindex),gref(startchar),gref(getchar(stop)),gref(ligature)) end end head, start = toligature(head,start,stop,ligature,dataset,sequence,skipmark,discfound) return head, start, true, nofreplacements, discfound elseif trace_bugs then if start == stop then logwarning("%s: replacing character %s by ligature 
fails",cref(dataset,sequence,chainindex),gref(startchar)) else logwarning("%s: replacing character %s upto %s by ligature fails",cref(dataset,sequence,chainindex),gref(startchar),gref(getchar(stop))) end end end end return head, start, false, 0, false end function chainprocs.gpos_single(head,start,stop,dataset,sequence,currentlookup,rlmode,chainindex) local steps = currentlookup.steps local nofsteps = currentlookup.nofsteps if nofsteps > 1 then reportmoresteps(dataset,sequence) end if nofsteps == 0 then reportzerosteps(dataset,sequence) else local startchar = getchar(start) local step = steps[1] local kerns = step.coverage[startchar] if not kerns then -- skip elseif step.format == "pair" then local dx, dy, w, h = setpair(start,factor,rlmode,sequence.flags[4],kerns) -- currentlookup.flags ? if trace_kerns then logprocess("%s: shifting single %s by (%p,%p) and correction (%p,%p)",cref(dataset,sequence),gref(startchar),dx,dy,w,h) end else -- needs checking .. maybe no kerns format for single local k = setkern(start,factor,rlmode,kerns,injection) if trace_kerns then logprocess("%s: shifting single %s by %p",cref(dataset,sequence),gref(startchar),k) end end end return head, start, false end function chainprocs.gpos_pair(head,start,stop,dataset,sequence,currentlookup,rlmode,chainindex) -- todo: injections ? 
local steps = currentlookup.steps local nofsteps = currentlookup.nofsteps if nofsteps > 1 then reportmoresteps(dataset,sequence) end if nofsteps == 0 then reportzerosteps(dataset,sequence) else local snext = getnext(start) if snext then local startchar = getchar(start) local step = steps[1] local kerns = step.coverage[startchar] -- always 1 step if kerns then local prev = start while snext do local nextchar = ischar(snext,currentfont) if not nextchar then break end local krn = kerns[nextchar] if not krn and marks[nextchar] then prev = snext snext = getnext(snext) elseif not krn then break elseif step.format == "pair" then local a, b = krn[1], krn[2] if optimizekerns then -- this permits a mixed table, but we could also decide to optimize this -- in the loader and use format 'kern' if not b and a[1] == 0 and a[2] == 0 and a[4] == 0 then local k = setkern(snext,factor,rlmode,a[3],"injections") if trace_kerns then logprocess("%s: shifting single %s by %p",cref(dataset,sequence),gref(startchar),k) end return head, start, true end end if a and #a > 0 then local startchar = getchar(start) local x, y, w, h = setpair(start,factor,rlmode,sequence.flags[4],a,"injections") -- currentlookups flags? 
if trace_kerns then logprocess("%s: shifting first of pair %s and %s by (%p,%p) and correction (%p,%p)",cref(dataset,sequence),gref(startchar),gref(nextchar),x,y,w,h) end end if b and #b > 0 then local startchar = getchar(start) local x, y, w, h = setpair(snext,factor,rlmode,sequence.flags[4],b,"injections") if trace_kerns then logprocess("%s: shifting second of pair %s and %s by (%p,%p) and correction (%p,%p)",cref(dataset,sequence),gref(startchar),gref(nextchar),x,y,w,h) end end return head, start, true elseif krn ~= 0 then local k = setkern(snext,factor,rlmode,krn) if trace_kerns then logprocess("%s: inserting kern %s between %s and %s",cref(dataset,sequence),k,gref(getchar(prev)),gref(nextchar)) end return head, start, true else break end end end end end return head, start, false end function chainprocs.gpos_mark2base(head,start,stop,dataset,sequence,currentlookup,rlmode) local steps = currentlookup.steps local nofsteps = currentlookup.nofsteps if nofsteps > 1 then reportmoresteps(dataset,sequence) end if nofsteps == 0 then reportzerosteps(dataset,sequence) else local markchar = getchar(start) if marks[markchar] then local markanchors = steps[1].coverage[markchar] -- always 1 step if markanchors then local base = getprev(start) -- [glyph] [start=mark] if base then local basechar = ischar(base,currentfont) if basechar then if marks[basechar] then while base do base = getprev(base) if base then local basechar = ischar(base,currentfont) if basechar then if not marks[basechar] then break end else if trace_bugs then logwarning("%s: no base for mark %s, case %i",pref(dataset,sequence),gref(markchar),1) end return head, start, false end else if trace_bugs then logwarning("%s: no base for mark %s, case %i",pref(dataset,sequence),gref(markchar),2) end return head, start, false end end end local ba = markanchors[1][basechar] if ba then local ma = markanchors[2] if ma then local dx, dy, bound = setmark(start,base,factor,rlmode,ba,ma,characters[basechar],false,checkmarks) 
if trace_marks then logprocess("%s, bound %s, anchoring mark %s to basechar %s => (%p,%p)", cref(dataset,sequence),bound,gref(markchar),gref(basechar),dx,dy) end return head, start, true end end elseif trace_bugs then logwarning("%s: prev node is no char, case %i",cref(dataset,sequence),1) end elseif trace_bugs then logwarning("%s: prev node is no char, case %i",cref(dataset,sequence),2) end elseif trace_bugs then logwarning("%s: mark %s has no anchors",cref(dataset,sequence),gref(markchar)) end elseif trace_bugs then logwarning("%s: mark %s is no mark",cref(dataset,sequence),gref(markchar)) end end return head, start, false end function chainprocs.gpos_mark2ligature(head,start,stop,dataset,sequence,currentlookup,rlmode) local steps = currentlookup.steps local nofsteps = currentlookup.nofsteps if nofsteps > 1 then reportmoresteps(dataset,sequence) end if nofsteps == 0 then reportzerosteps(dataset,sequence) else local markchar = getchar(start) if marks[markchar] then local markanchors = steps[1].coverage[markchar] -- always 1 step if markanchors then local base = getprev(start) -- [glyph] [optional marks] [start=mark] if base then local basechar = ischar(base,currentfont) if basechar then if marks[basechar] then while base do base = getprev(base) if base then local basechar = ischar(base,currentfont) if basechar then if not marks[basechar] then break end else if trace_bugs then logwarning("%s: no base for mark %s, case %i",cref(dataset,sequence),markchar,1) end return head, start, false end else if trace_bugs then logwarning("%s: no base for mark %s, case %i",cref(dataset,sequence),markchar,2) end return head, start, false end end end local ba = markanchors[1][basechar] if ba then local ma = markanchors[2] if ma then local index = getligaindex(start) ba = ba[index] if ba then local dx, dy, bound = setmark(start,base,factor,rlmode,ba,ma,characters[basechar],false,checkmarks) if trace_marks then logprocess("%s, bound %s, anchoring mark %s to baselig %s at index %s => 
(%p,%p)", cref(dataset,sequence),a or bound,gref(markchar),gref(basechar),index,dx,dy) end return head, start, true end end end elseif trace_bugs then logwarning("%s, prev node is no char, case %i",cref(dataset,sequence),1) end elseif trace_bugs then logwarning("%s, prev node is no char, case %i",cref(dataset,sequence),2) end elseif trace_bugs then logwarning("%s, mark %s has no anchors",cref(dataset,sequence),gref(markchar)) end elseif trace_bugs then logwarning("%s, mark %s is no mark",cref(dataset,sequence),gref(markchar)) end end return head, start, false end function chainprocs.gpos_mark2mark(head,start,stop,dataset,sequence,currentlookup,rlmode) local steps = currentlookup.steps local nofsteps = currentlookup.nofsteps if nofsteps > 1 then reportmoresteps(dataset,sequence) end if nofsteps == 0 then reportzerosteps(dataset,sequence) else local markchar = getchar(start) if marks[markchar] then local markanchors = steps[1].coverage[markchar] -- always 1 step if markanchors then local base = getprev(start) -- [glyph] [basemark] [start=mark] local slc = getligaindex(start) if slc then -- a rather messy loop ... 
needs checking with husayni while base do local blc = getligaindex(base) if blc and blc ~= slc then base = getprev(base) else break end end end if base then -- subtype test can go local basechar = ischar(base,currentfont) if basechar then local ba = markanchors[1][basechar] if ba then local ma = markanchors[2] if ma then local dx, dy, bound = setmark(start,base,factor,rlmode,ba,ma,characters[basechar],true,checkmarks) if trace_marks then logprocess("%s, bound %s, anchoring mark %s to basemark %s => (%p,%p)", cref(dataset,sequence),bound,gref(markchar),gref(basechar),dx,dy) end return head, start, true end end elseif trace_bugs then logwarning("%s: prev node is no mark, case %i",cref(dataset,sequence),1) end elseif trace_bugs then logwarning("%s: prev node is no mark, case %i",cref(dataset,sequence),2) end elseif trace_bugs then logwarning("%s: mark %s has no anchors",cref(dataset,sequence),gref(markchar)) end elseif trace_bugs then logwarning("%s: mark %s is no mark",cref(dataset,sequence),gref(markchar)) end end return head, start, false end function chainprocs.gpos_cursive(head,start,stop,dataset,sequence,currentlookup,rlmode) local steps = currentlookup.steps local nofsteps = currentlookup.nofsteps if nofsteps > 1 then reportmoresteps(dataset,sequence) end if nofsteps == 0 then reportzerosteps(dataset,sequence) else local startchar = getchar(start) local exitanchors = steps[1].coverage[startchar] -- always 1 step if exitanchors then if marks[startchar] then if trace_cursive then logprocess("%s: ignoring cursive for mark %s",pref(dataset,sequence),gref(startchar)) end else local nxt = getnext(start) while nxt do local nextchar = ischar(nxt,currentfont) if not nextchar then break elseif marks[nextchar] then -- should not happen (maybe warning) nxt = getnext(nxt) else local exit = exitanchors[3] if exit then local entry = exitanchors[1][nextchar] if entry then entry = entry[2] if entry then local dx, dy, bound = 
setcursive(start,nxt,factor,rlmode,exit,entry,characters[startchar],characters[nextchar])
                                    if trace_cursive then
                                        logprocess("%s: moving %s to %s cursive (%p,%p) using bound %s in %s mode",pref(dataset,sequence),gref(startchar),gref(nextchar),dx,dy,bound,mref(rlmode))
                                    end
                                    return head, start, true
                                end
                            end
                        elseif trace_bugs then
                            onetimemessage(currentfont,startchar,"no entry anchors",report_fonts)
                        end
                        break
                    end
                end
            end
        elseif trace_cursive and trace_details then
            logprocess("%s, cursive %s is already done",pref(dataset,sequence),gref(getchar(start)),alreadydone)
        end
    end
    return head, start, false
end

-- what pointer to return, spec says stop
-- to be discussed ... is bidi changer a space?
-- elseif char == zwnj and sequence[n][32] then -- brrr

local function show_skip(dataset,sequence,char,ck,class)
    logwarning("%s: skipping char %s, class %a, rule %a, lookuptype %a",cref(dataset,sequence),gref(char),class,ck[1],ck[8] or ck[2])
end

-- A previous version had disc collapsing code in the (single sub) handler plus some
-- checking in the main loop, but that left the pre/post sequences undone. The best
-- solution is to add some checking there and backtrack when a replace/post matches
-- but it takes a bit of work to figure out an efficient way (this is what the sweep*
-- names refer to). I might look into that variant one day again as it can replace
-- some other code too. In that approach we can have a special version for sub and pos
-- which gains some speed. This method does the test and passes info to the handlers
-- (sweepnode, sweepmode, sweepprev, sweepnext, etc). Here collapsing is handled in the
-- main loop which also makes code elsewhere simpler (i.e. no need for the other special
-- runners and disc code in ligature building). I also experimented with pushing preceding
-- glyph sequences in the replace/pre fields beforehand which saves checking afterwards
-- but at the cost of duplicate glyphs (memory) but it's too much overhead (runtime).
--
-- In the meantime Kai had moved the code from the single chain into a more general handler
-- and this one (renamed to chaindisk) is used now. I optimized the code a bit and brought
-- it in sync with the other code. Hopefully I didn't introduce errors. Note: this somewhat
-- complex approach is meant for fonts that implement (for instance) ligatures by character
-- replacement which to some extent is not that suitable for hyphenation. I also use some
-- helpers. This method passes some states but reparses the list. There is room for a bit of
-- speed up but that will be done in the context version. (In fact a partial rewrite of all
-- code can bring some more efficiency.)
--
-- I didn't test it with extremes but successive disc nodes still can give issues but in
-- order to handle that we need more complex code which also slows down even more. The main
-- loop variant could deal with that: test, collapse, backtrack.

local userkern = nuts.pool and nuts.pool.newkern -- context

do
    if not userkern then -- generic
        local thekern = nuts.new("kern",1) -- userkern
        local setkern = nuts.setkern       -- not injections.setkern
        userkern = function(k)
            local n = copy_node(thekern)
            setkern(n,k)
            return n
        end
    end
end

-- replace glue nodes in a list by kerns of the same width
local function checked(head)
    local current = head
    while current do
        if getid(current) == glue_code then
            local kern = userkern(getwidth(current))
            if head == current then
                local next = getnext(current)
                if next then
                    setlink(kern,next)
                end
                flush_node(current)
                head    = kern
                current = next
            else
                local prev, next = getboth(current)
                setlink(prev,kern,next)
                flush_node(current)
                current = next
            end
        else
            current = getnext(current)
        end
    end
    return head
end

local function setdiscchecked(d,pre,post,replace)
    if pre     then pre     = checked(pre)     end
    if post    then post    = checked(post)    end
    if replace then replace = checked(replace) end
    setdisc(d,pre,post,replace)
end

local noflags = { false, false, false, false }

local function chainrun(head,start,last,dataset,sequence,rlmode,ck,skipped)

    local size =
ck[5] - ck[4] + 1 local flags = sequence.flags or noflags local done = false local skipmark = flags[1] local chainlookups = ck[6] -- current match if chainlookups then local nofchainlookups = #chainlookups -- Lookups can be like { 1, false, 3 } or { false, 2 } or basically anything and -- #lookups can be less than #current if size == 1 then -- if nofchainlookups > size then -- -- bad rules -- end local chainlookup = chainlookups[1] for j=1,#chainlookup do local chainstep = chainlookup[j] local chainkind = chainstep.type local chainproc = chainprocs[chainkind] if chainproc then local ok head, start, ok = chainproc(head,start,last,dataset,sequence,chainstep,rlmode,1) if ok then done = true end else logprocess("%s: %s is not yet supported (1)",cref(dataset,sequence),chainkind) end end else -- See LookupType 5: Contextual Substitution Subtable. Now it becomes messy. The -- easiest case is where #current maps on #lookups i.e. one-to-one. But what if -- we have a ligature. Cf the spec we then need to advance one character but we -- really need to test it as there are fonts out there that are fuzzy and have -- too many lookups: -- -- U+1105 U+119E U+1105 U+119E : sourcehansansklight: script=hang ccmp=yes -- -- Even worse are these family emoji shapes as they can have multiple lookups -- per slot (probably only for gpos). 
local i = 1 local laststart = start while start do if skipped then while start do local char, id = ischar(start,currentfont) if char then local class = classes[char] if class then if class == skipmark or class == skipligature or class == skipbase or (markclass and class == "mark" and not markclass[char]) then start = getnext(start) else break end else break end else break end end end local chainlookup = chainlookups[i] if chainlookup then for j=1,#chainlookup do local chainstep = chainlookup[j] local chainkind = chainstep.type local chainproc = chainprocs[chainkind] if chainproc then local ok, n head, start, ok, n = chainproc(head,start,last,dataset,sequence,chainstep,rlmode,i) -- messy since last can be changed ! if ok then done = true if n and n > 1 and i + n > nofchainlookups then -- this is a safeguard, we just ignore the rest of the lookups break end end else -- actually an error logprocess("%s: %s is not yet supported (2)",cref(dataset,sequence),chainkind) end end end i = i + 1 if i > size or not start then break elseif start then laststart = start start = getnext(start) end end if not start then start = laststart end end else -- todo: needs checking for holes in the replacements local replacements = ck[7] if replacements then head, start, done = reversesub(head,start,last,dataset,sequence,replacements,rlmode) else done = true if trace_contexts then logprocess("%s: skipping match",cref(dataset,sequence)) end end end return head, start, done end local function chaindisk(head,start,dataset,sequence,rlmode,ck,skipped) if not start then return head, start, false end local startishead = start == head local seq = ck[3] local f = ck[4] local l = ck[5] local s = #seq local done = false local sweepnode = sweepnode local sweeptype = sweeptype local sweepoverflow = false local keepdisc = not sweepnode local lookaheaddisc = nil local backtrackdisc = nil local current = start local last = start local prev = getprev(start) local hasglue = false -- fishy: so we can overflow 
    -- and then go on in the sweep?
    -- todo : id can also be glue_code as we checked spaces

    local i = f
    while i <= l do
        local id = getid(current)
        if id == glyph_code then
            i       = i + 1
            last    = current
            current = getnext(current)
        elseif id == glue_code then
            i       = i + 1
            last    = current
            current = getnext(current)
            hasglue = true
        elseif id == disc_code then
            if keepdisc then
                keepdisc = false
                lookaheaddisc = current
                local replace = getfield(current,"replace")
                if not replace then
                    sweepoverflow = true
                    sweepnode     = current
                    current       = getnext(current)
                else
                    while replace and i <= l do
                        if getid(replace) == glyph_code then
                            i = i + 1
                        end
                        replace = getnext(replace)
                    end
                    current = getnext(replace)
                end
                last = current
            else
                head, current = flattendisk(head,current)
            end
        else
            last    = current
            current = getnext(current)
        end
        if current then
            -- go on
        elseif sweepoverflow then
            -- we already are following up on sweepnode
            break
        elseif sweeptype == "post" or sweeptype == "replace" then
            current = getnext(sweepnode)
            if current then
                sweeptype     = nil
                sweepoverflow = true
            else
                break
            end
        else
            break -- added
        end
    end

    if sweepoverflow then
        local prev = current and getprev(current)
        if not current or prev ~= sweepnode then
            local head = getnext(sweepnode)
            local tail = nil
            if prev then
                tail = prev
                setprev(current,sweepnode)
            else
                tail = find_node_tail(head)
            end
            setnext(sweepnode,current)
            setprev(head)
            setnext(tail)
            appenddisc(sweepnode,head)
        end
    end

    if l < s then
        local i = l
        local t = sweeptype == "post" or sweeptype == "replace"
        while current and i < s do
            local id = getid(current)
            if id == glyph_code then
                i       = i + 1
                current = getnext(current)
            elseif id == glue_code then
                i       = i + 1
                current = getnext(current)
                hasglue = true
            elseif id == disc_code then
                if keepdisc then
                    keepdisc = false
                    if notmatchpre[current] ~= notmatchreplace[current] then
                        lookaheaddisc = current
                    end
                    -- we assume a simple text only replace (we could use nuts.count)
                    local replace = getfield(current,"replace")
                    while replace and i < s do
                        if getid(replace) == glyph_code
then i = i + 1 end replace = getnext(replace) end current = getnext(current) elseif notmatchpre[current] ~= notmatchreplace[current] then head, current = flattendisk(head,current) else current = getnext(current) -- HH end else current = getnext(current) end if not current and t then current = getnext(sweepnode) if current then sweeptype = nil end end end end if f > 1 then local current = prev local i = f local t = sweeptype == "pre" or sweeptype == "replace" if not current and t and current == checkdisk then current = getprev(sweepnode) end while current and i > 1 do -- missing getprev added / moved outside local id = getid(current) if id == glyph_code then i = i - 1 elseif id == glue_code then i = i - 1 hasglue = true elseif id == disc_code then if keepdisc then keepdisc = false if notmatchpost[current] ~= notmatchreplace[current] then backtrackdisc = current end -- we assume a simple text only replace (we could use nuts.count) local replace = getfield(current,"replace") while replace and i > 1 do if getid(replace) == glyph_code then i = i - 1 end replace = getnext(replace) end elseif notmatchpost[current] ~= notmatchreplace[current] then head, current = flattendisk(head,current) end end current = getprev(current) if t and current == checkdisk then current = getprev(sweepnode) end end end local done = false if lookaheaddisc then local cf = start local cl = getprev(lookaheaddisc) local cprev = getprev(start) local insertedmarks = 0 while cprev do local char = ischar(cf,currentfont) if char and marks[char] then insertedmarks = insertedmarks + 1 cf = cprev startishead = cf == head cprev = getprev(cprev) else break end end setlink(cprev,lookaheaddisc) setprev(cf) setnext(cl) if startishead then head = lookaheaddisc end local pre, post, replace = getdisc(lookaheaddisc) local new = copy_node_list(cf) local cnew = new if pre then setlink(find_node_tail(cf),pre) end if replace then local tail = find_node_tail(new) setlink(tail,replace) end for i=1,insertedmarks do cnew = 
getnext(cnew) end cl = start local clast = cnew for i=f,l do cl = getnext(cl) clast = getnext(clast) end if not notmatchpre[lookaheaddisc] then local ok = false cf, start, ok = chainrun(cf,start,cl,dataset,sequence,rlmode,ck,skipped) if ok then done = true end end if not notmatchreplace[lookaheaddisc] then local ok = false new, cnew, ok = chainrun(new,cnew,clast,dataset,sequence,rlmode,ck,skipped) if ok then done = true end end if hasglue then setdiscchecked(lookaheaddisc,cf,post,new) else setdisc(lookaheaddisc,cf,post,new) end start = getprev(lookaheaddisc) sweephead[cf] = getnext(clast) sweephead[new] = getnext(cl) elseif backtrackdisc then local cf = getnext(backtrackdisc) local cl = start local cnext = getnext(start) local insertedmarks = 0 while cnext do local char = ischar(cnext,currentfont) if char and marks[char] then insertedmarks = insertedmarks + 1 cl = cnext cnext = getnext(cnext) else break end end if cnext then setprev(cnext,backtrackdisc) end setnext(backtrackdisc,cnext) setprev(cf) setnext(cl) local pre, post, replace, pretail, posttail, replacetail = getdisc(backtrackdisc,true) local new = copy_node_list(cf) local cnew = find_node_tail(new) for i=1,insertedmarks do cnew = getprev(cnew) end local clast = cnew for i=f,l do clast = getnext(clast) end if not notmatchpost[backtrackdisc] then local ok = false cf, start, ok = chainrun(cf,start,last,dataset,sequence,rlmode,ck,skipped) if ok then done = true end end if not notmatchreplace[backtrackdisc] then local ok = false new, cnew, ok = chainrun(new,cnew,clast,dataset,sequence,rlmode,ck,skipped) if ok then done = true end end if post then setlink(posttail,cf) else post = cf end if replace then setlink(replacetail,new) else replace = new end if hasglue then setdiscchecked(backtrackdisc,pre,post,replace) else setdisc(backtrackdisc,pre,post,replace) end start = getprev(backtrackdisc) sweephead[post] = getnext(clast) sweephead[replace] = getnext(last) else local ok = false head, start, ok = 
chainrun(head,start,last,dataset,sequence,rlmode,ck,skipped) if ok then done = true end end return head, start, done end local function chaintrac(head,start,dataset,sequence,rlmode,ck,skipped,match) local rule = ck[1] local lookuptype = ck[8] or ck[2] local nofseq = #ck[3] local first = ck[4] local last = ck[5] local char = getchar(start) logwarning("%s: rule %s %s at char %s for (%s,%s,%s) chars, lookuptype %a", cref(dataset,sequence),rule,match and "matches" or "nomatch",gref(char),first-1,last-first+1,nofseq-last,lookuptype) end local function handle_contextchain(head,start,dataset,sequence,contexts,rlmode) local sweepnode = sweepnode local sweeptype = sweeptype local currentfont = currentfont local diskseen = false local checkdisc = sweeptype and getprev(head) local flags = sequence.flags or noflags local done = false local skipmark = flags[1] local skipligature = flags[2] local skipbase = flags[3] local markclass = sequence.markclass local skipped = false local startprev, startnext = getboth(start) for k=1,#contexts do -- i've only seen ccmp having > 1 (e.g. 
dejavu) local match = true local current = start local last = start local ck = contexts[k] local seq = ck[3] local s = #seq local size = 1 -- f..l = mid string if s == 1 then -- this seldom happens as it makes no sense (bril, ebgaramond, husayni, minion) local char = ischar(current,currentfont) if char then if not seq[1][char] then match = false end end else -- maybe we need a better space check (maybe check for glue or category or combination) -- we cannot optimize for n=2 because there can be disc nodes local f = ck[4] local l = ck[5] -- current match -- seq[f][ischar(current,currentfont)] is not nil size = l - f + 1 if size > 1 then -- before/current/after | before/current | current/after local discfound -- = nil local n = f + 1 -- last = getnext(last) -- the second in current (first already matched) last = startnext -- the second in current (first already matched) while n <= l do if not last and (sweeptype == "post" or sweeptype == "replace") then last = getnext(sweepnode) sweeptype = nil end if last then local char, id = ischar(last,currentfont) if char then local class = classes[char] if class then if class == skipmark or class == skipligature or class == skipbase or (markclass and class == "mark" and not markclass[char]) then skipped = true if trace_skips then show_skip(dataset,sequence,char,ck,class) end last = getnext(last) elseif seq[n][char] then if n < l then last = getnext(last) end n = n + 1 else if discfound then notmatchreplace[discfound] = true if notmatchpre[discfound] then match = false end else match = false end break end else if discfound then notmatchreplace[discfound] = true if notmatchpre[discfound] then match = false end else match = false end break end elseif char == false then if discfound then notmatchreplace[discfound] = true if notmatchpre[discfound] then match = false end else match = false end break elseif id == disc_code then diskseen = true discfound = last notmatchpre[last] = nil notmatchpost[last] = true notmatchreplace[last] = 
nil local pre, post, replace = getdisc(last) if pre then local n = n while pre do if seq[n][getchar(pre)] then n = n + 1 pre = getnext(pre) if n > l then break end else notmatchpre[last] = true break end end if n <= l then notmatchpre[last] = true end else notmatchpre[last] = true end if replace then -- so far we never entered this branch while replace do if seq[n][getchar(replace)] then n = n + 1 replace = getnext(replace) if n > l then break end else notmatchreplace[last] = true if notmatchpre[last] then match = false end break end end -- why here again if notmatchpre[last] then match = false end end -- maybe only if match last = getnext(last) else match = false break end else match = false break end end end -- before if match and f > 1 then -- local prev = getprev(start) -- if prev then if startprev then local prev = startprev if prev == checkdisc and (sweeptype == "pre" or sweeptype == "replace") then prev = getprev(sweepnode) -- sweeptype = nil end if prev then local discfound -- = nil local n = f - 1 while n >= 1 do if prev then local char, id = ischar(prev,currentfont) if char then local class = classes[char] if class then if class == skipmark or class == skipligature or class == skipbase or (markclass and class == "mark" and not markclass[char]) then skipped = true if trace_skips then show_skip(dataset,sequence,char,ck,class) end prev = getprev(prev) elseif seq[n][char] then if n > 1 then prev = getprev(prev) end n = n - 1 else if discfound then notmatchreplace[discfound] = true if notmatchpost[discfound] then match = false end else match = false end break end else if discfound then notmatchreplace[discfound] = true if notmatchpost[discfound] then match = false end else match = false end break end elseif char == false then if discfound then notmatchreplace[discfound] = true if notmatchpost[discfound] then match = false end else match = false end break elseif id == disc_code then -- the special case: f i where i becomes dottless i .. 
diskseen = true discfound = prev notmatchpre[prev] = true notmatchpost[prev] = nil notmatchreplace[prev] = nil local pre, post, replace, pretail, posttail, replacetail = getdisc(prev,true) if pre ~= start and post ~= start and replace ~= start then if post then local n = n while posttail do if seq[n][getchar(posttail)] then n = n - 1 if posttail == post then break else posttail = getprev(posttail) if n < 1 then break end end else notmatchpost[prev] = true break end end if n >= 1 then notmatchpost[prev] = true end else notmatchpost[prev] = true end if replace then -- we seldom enter this branch (e.g. on brill efficient) while replacetail do if seq[n][getchar(replacetail)] then n = n - 1 if replacetail == replace then break else replacetail = getprev(replacetail) if n < 1 then break end end else notmatchreplace[prev] = true if notmatchpost[prev] then match = false end break end end if not match then break end end end -- maybe only if match prev = getprev(prev) elseif id == glue_code and seq[n][32] and isspace(prev,threshold,id) then n = n - 1 prev = getprev(prev) else match = false break end else match = false break end end else match = false end else match = false end end -- after if match and s > l then local current = last and getnext(last) if not current and (sweeptype == "post" or sweeptype == "replace") then current = getnext(sweepnode) -- sweeptype = nil end if current then local discfound -- = nil -- removed optimization for s-l == 1, we have to deal with marks anyway local n = l + 1 while n <= s do if current then local char, id = ischar(current,currentfont) if char then local class = classes[char] if class then if class == skipmark or class == skipligature or class == skipbase or (markclass and class == "mark" and not markclass[char]) then skipped = true if trace_skips then show_skip(dataset,sequence,char,ck,class) end current = getnext(current) -- was absent elseif seq[n][char] then if n < s then -- new test current = getnext(current) -- was absent end n = 
n + 1 else if discfound then notmatchreplace[discfound] = true if notmatchpre[discfound] then match = false end else match = false end break end else if discfound then notmatchreplace[discfound] = true if notmatchpre[discfound] then match = false end else match = false end break end elseif char == false then if discfound then notmatchreplace[discfound] = true if notmatchpre[discfound] then match = false end else match = false end break elseif id == disc_code then diskseen = true discfound = current notmatchpre[current] = nil notmatchpost[current] = true notmatchreplace[current] = nil local pre, post, replace = getdisc(current) if pre then local n = n while pre do if seq[n][getchar(pre)] then n = n + 1 pre = getnext(pre) if n > s then break end else notmatchpre[current] = true break end end if n <= s then notmatchpre[current] = true end else notmatchpre[current] = true end if replace then -- so far we never entered this branch while replace do if seq[n][getchar(replace)] then n = n + 1 replace = getnext(replace) if n > s then break end else notmatchreplace[current] = true -- different than others, needs checking if "not" is okay if not notmatchpre[current] then match = false end break end end if not match then break end else -- skip 'm end -- maybe only if match current = getnext(current) elseif id == glue_code and seq[n][32] and isspace(current,threshold,id) then n = n + 1 current = getnext(current) else match = false break end else match = false break end end else match = false end end end if match then if trace_contexts then chaintrac(head,start,dataset,sequence,rlmode,ck,skipped,true) end if diskseen or sweepnode then head, start, done = chaindisk(head,start,dataset,sequence,rlmode,ck,skipped) else head, start, done = chainrun(head,start,last,dataset,sequence,rlmode,ck,skipped) end if done then break -- out of contexts (new, needs checking) end -- elseif trace_chains then -- chaintrac(head,start,dataset,sequence,rlmode,ck,skipped,match) end end if diskseen then 
notmatchpre = { } notmatchpost = { } notmatchreplace = { } end return head, start, done end handlers.gsub_context = handle_contextchain handlers.gsub_contextchain = handle_contextchain handlers.gsub_reversecontextchain = handle_contextchain handlers.gpos_contextchain = handle_contextchain handlers.gpos_context = handle_contextchain -- this needs testing local function chained_contextchain(head,start,stop,dataset,sequence,currentlookup,rlmode) local steps = currentlookup.steps local nofsteps = currentlookup.nofsteps if nofsteps > 1 then reportmoresteps(dataset,sequence) end return handle_contextchain(head,start,dataset,sequence,currentlookup,rlmode) end chainprocs.gsub_context = chained_contextchain chainprocs.gsub_contextchain = chained_contextchain chainprocs.gsub_reversecontextchain = chained_contextchain chainprocs.gpos_contextchain = chained_contextchain chainprocs.gpos_context = chained_contextchain -- experiment (needs no handler in font-otc so not now): -- -- function otf.registerchainproc(name,f) -- -- chainprocs[name] = f -- chainprocs[name] = function(head,start,stop,dataset,sequence,currentlookup,rlmode) -- local done = currentlookup.nofsteps > 0 -- if not done then -- reportzerosteps(dataset,sequence) -- else -- head, start, done = f(head,start,stop,dataset,sequence,currentlookup,rlmode) -- if not head or not start then -- reportbadsteps(dataset,sequence) -- end -- end -- return head, start, done -- end -- end local missing = setmetatableindex("table") local function logprocess(...) if trace_steps then registermessage(...) end report_process(...) 
end local logwarning = report_process local function report_missing_coverage(dataset,sequence) local t = missing[currentfont] if not t[sequence] then t[sequence] = true logwarning("missing coverage for feature %a, lookup %a, type %a, font %a, name %a", dataset[4],sequence.name,sequence.type,currentfont,tfmdata.properties.fullname) end end local resolved = { } -- we only resolve a font,script,language pair once -- todo: pass all these 'locals' in a table local sequencelists = setmetatableindex(function(t,font) local sequences = fontdata[font].resources.sequences if not sequences or not next(sequences) then sequences = false end t[font] = sequences return sequences end) -- fonts.hashes.sequences = sequencelists do -- overcome local limit local autofeatures = fonts.analyzers.features local featuretypes = otf.tables.featuretypes local defaultscript = otf.features.checkeddefaultscript local defaultlanguage = otf.features.checkeddefaultlanguage local wildcard = "*" local default = "dflt" local function initialize(sequence,script,language,enabled,autoscript,autolanguage) local features = sequence.features if features then local order = sequence.order if order then local featuretype = featuretypes[sequence.type or "unknown"] for i=1,#order do local kind = order[i] local valid = enabled[kind] if valid then local scripts = features[kind] local languages = scripts and ( scripts[script] or scripts[wildcard] or (autoscript and defaultscript(featuretype,autoscript,scripts)) ) local enabled = languages and ( languages[language] or languages[wildcard] or (autolanguage and defaultlanguage(featuretype,autolanguage,languages)) ) if enabled then return { valid, autofeatures[kind] or false, sequence, kind } end end end else -- can't happen end end return false end function otf.dataset(tfmdata,font) -- generic variant, overloaded in context local shared = tfmdata.shared local properties = tfmdata.properties local language = properties.language or "dflt" local script = properties.script 
or "dflt"
        local enabled      = shared.features
        local autoscript   = enabled and enabled.autoscript
        local autolanguage = enabled and enabled.autolanguage
        local res = resolved[font]
        if not res then
            res = { }
            resolved[font] = res
        end
        local rs = res[script]
        if not rs then
            rs = { }
            res[script] = rs
        end
        local rl = rs[language]
        if not rl then
            rl = {
                -- indexed but we can also add specific data by key
            }
            rs[language] = rl
            local sequences = tfmdata.resources.sequences
            if sequences then
                for s=1,#sequences do
                    local v = enabled and initialize(sequences[s],script,language,enabled,autoscript,autolanguage)
                    if v then
                        rl[#rl+1] = v
                    end
                end
            end
        end
        return rl
    end

end

-- Functions like kernrun, comprun etc evolved over time and in the end look rather
-- complex. It's a bit of a compromise between extensive copying and creating subruns.
-- The logic has been improved a lot by Kai and Ivo who use complex fonts which
-- really helped to identify border cases on the one hand and gain insight into the diverse
-- ways fonts implement features (not always that consistent and efficient). At the same
-- time I tried to keep the code relatively efficient so that the overhead in runtime
-- stays acceptable.

local function report_disc(what,n)
    report_run("%s: %s > %s",what,n,languages.serializediscretionary(n))
end

local function kernrun(disc,k_run,font,attr,...)
    --
    -- we catch