Diffstat (limited to 'tex/context/base/mkiv/font-ots.lua')
-rw-r--r-- tex/context/base/mkiv/font-ots.lua 225
1 files changed, 103 insertions, 122 deletions
diff --git a/tex/context/base/mkiv/font-ots.lua b/tex/context/base/mkiv/font-ots.lua
index 6d7c5fb25..48f85c365 100644
--- a/tex/context/base/mkiv/font-ots.lua
+++ b/tex/context/base/mkiv/font-ots.lua
@@ -7,92 +7,90 @@ if not modules then modules = { } end modules ['font-ots'] = { -- sequences
license = "see context related readme files",
}
---[[ldx--
-<p>I need to check the description at the microsoft site ... it has been improved
-so maybe there are some interesting details there. Most below is based on old and
-incomplete documentation and involved quite a bit of guesswork (checking with the
-abstract uniscribe of those days. But changing things is tricky!</p>
-
-<p>This module is a bit more split up that I'd like but since we also want to test
-with plain <l n='tex'/> it has to be so. This module is part of <l n='context'/>
-and discussion about improvements and functionality mostly happens on the
-<l n='context'/> mailing list.</p>
-
-<p>The specification of OpenType is (or at least decades ago was) kind of vague.
-Apart from a lack of a proper free specifications there's also the problem that
-Microsoft and Adobe may have their own interpretation of how and in what order to
-apply features. In general the Microsoft website has more detailed specifications
-and is a better reference. There is also some information in the FontForge help
-files. In the end we rely most on the Microsoft specification.</p>
-
-<p>Because there is so much possible, fonts might contain bugs and/or be made to
-work with certain rederers. These may evolve over time which may have the side
-effect that suddenly fonts behave differently. We don't want to catch all font
-issues.</p>
-
-<p>After a lot of experiments (mostly by Taco, me and Idris) the first implementation
-was already quite useful. When it did most of what we wanted, a more optimized version
-evolved. Of course all errors are mine and of course the code can be improved. There
-are quite some optimizations going on here and processing speed is currently quite
-acceptable and has been improved over time. Many complex scripts are not yet supported
-yet, but I will look into them as soon as <l n='context'/> users ask for it.</p>
-
-<p>The specification leaves room for interpretation. In case of doubt the Microsoft
-implementation is the reference as it is the most complete one. As they deal with
-lots of scripts and fonts, Kai and Ivo did a lot of testing of the generic code and
-their suggestions help improve the code. I'm aware that not all border cases can be
-taken care of, unless we accept excessive runtime, and even then the interference
-with other mechanisms (like hyphenation) are not trivial.</p>
-
-<p>Especially discretionary handling has been improved much by Kai Eigner who uses complex
-(latin) fonts. The current implementation is a compromis between his patches and my code
-and in the meantime performance is quite ok. We cannot check all border cases without
-compromising speed but so far we're okay. Given good test cases we can probably improve
-it here and there. Especially chain lookups are non trivial with discretionaries but
-things got much better over time thanks to Kai.</p>
-
-<p>Glyphs are indexed not by unicode but in their own way. This is because there is no
-relationship with unicode at all, apart from the fact that a font might cover certain
-ranges of characters. One character can have multiple shapes. However, at the
-<l n='tex'/> end we use unicode so and all extra glyphs are mapped into a private
-space. This is needed because we need to access them and <l n='tex'/> has to include
-then in the output eventually.</p>
-
-<p>The initial data table is rather close to the open type specification and also not
-that different from the one produced by <l n='fontforge'/> but we uses hashes instead.
-In <l n='context'/> that table is packed (similar tables are shared) and cached on disk
-so that successive runs can use the optimized table (after loading the table is
-unpacked).</p>
-
-<p>This module is sparsely documented because it is has been a moving target. The
-table format of the reader changed a bit over time and we experiment a lot with
-different methods for supporting features. By now the structures are quite stable</p>
-
-<p>Incrementing the version number will force a re-cache. We jump the number by one
-when there's a fix in the reader or processing code that can result in different
-results.</p>
-
-<p>This code is also used outside context but in context it has to work with other
-mechanisms. Both put some constraints on the code here.</p>
-
---ldx]]--
-
--- Remark: We assume that cursives don't cross discretionaries which is okay because it
--- is only used in semitic scripts.
+-- I need to check the description at the microsoft site ... it has been improved so
+-- maybe there are some interesting details there. Most below is based on old and
+-- incomplete documentation and involved quite a bit of guesswork (checking with the
+-- abstract uniscribe of those days). But changing things is tricky!
+--
+-- This module is a bit more split up than I'd like but since we also want to test
+-- with plain TeX it has to be so. This module is part of ConTeXt and discussion
+-- about improvements and functionality mostly happens on the ConTeXt mailing list.
+--
+-- The specification of OpenType is (or at least decades ago was) kind of vague.
+-- Apart from the lack of a proper free specification there's also the problem that
+-- Microsoft and Adobe may have their own interpretation of how and in what order to
+-- apply features. In general the Microsoft website has more detailed specifications
+-- and is a better reference. There is also some information in the FontForge help
+-- files. In the end we rely most on the Microsoft specification.
+--
+-- Because there is so much possible, fonts might contain bugs and/or be made to
+-- work with certain renderers. These may evolve over time which may have the side
+-- effect that suddenly fonts behave differently. We don't want to catch all font
+-- issues.
+--
+-- After a lot of experiments (mostly by Taco, me and Idris) the first
+-- implementation was already quite useful. When it did most of what we wanted, a
+-- more optimized version evolved. Of course all errors are mine and of course the
+-- code can be improved. There are quite some optimizations going on here and
+-- processing speed is currently quite acceptable and has been improved over time.
+-- Many complex scripts are not supported yet, but I will look into them as soon
+-- as ConTeXt users ask for it.
+--
+-- The specification leaves room for interpretation. In case of doubt the Microsoft
+-- implementation is the reference as it is the most complete one. As they deal with
+-- lots of scripts and fonts, Kai and Ivo did a lot of testing of the generic code
+-- and their suggestions help improve the code. I'm aware that not all border cases
+-- can be taken care of, unless we accept excessive runtime, and even then the
+-- interference with other mechanisms (like hyphenation) is not trivial.
+--
+-- Especially discretionary handling has been improved much by Kai Eigner who uses
+-- complex (latin) fonts. The current implementation is a compromise between his
+-- patches and my code and in the meantime performance is quite ok. We cannot check
+-- all border cases without compromising speed but so far we're okay. Given good
+-- test cases we can probably improve it here and there. Especially chain lookups
+-- are non trivial with discretionaries but things got much better over time thanks
+-- to Kai.
+--
+-- Glyphs are indexed not by unicode but in their own way. This is because there is
+-- no relationship with unicode at all, apart from the fact that a font might cover
+-- certain ranges of characters. One character can have multiple shapes. However, at
+-- the TeX end we use unicode so all extra glyphs are mapped into a private
+-- space. This is needed because we need to access them and TeX has to include them
+-- in the output eventually.
+--
+-- The initial data table is rather close to the OpenType specification and also
+-- not that different from the one produced by FontForge but we use hashes instead.
+-- In ConTeXt that table is packed (similar tables are shared) and cached on disk so
+-- that successive runs can use the optimized table (after loading the table is
+-- unpacked).
+--
+-- This module is sparsely documented because it has been a moving target. The
+-- table format of the reader changed a bit over time and we experiment a lot with
+-- different methods for supporting features. By now the structures are quite stable.
+--
+-- Incrementing the version number will force a re-cache. We jump the number by one
+-- when there's a fix in the reader or processing code that can result in different
+-- results.
+--
+-- This code is also used outside ConTeXt but in ConTeXt it has to work with other
+-- mechanisms. Both put some constraints on the code here.
+--
+-- Remark: We assume that cursives don't cross discretionaries which is okay because
+-- they are only used in Semitic scripts.
--
-- Remark: We assume that marks precede base characters.
--
--- Remark: When complex ligatures extend into discs nodes we can get side effects. Normally
--- this doesn't happen; ff\d{l}{l}{l} in lm works but ff\d{f}{f}{f}.
+-- Remark: When complex ligatures extend into disc nodes we can get side effects.
+-- Normally this doesn't happen; ff\d{l}{l}{l} in lm works but ff\d{f}{f}{f}.
--
-- Todo: check if we copy attributes to disc nodes if needed.
--
--- Todo: it would be nice if we could get rid of components. In other places we can use
--- the unicode properties. We can just keep a lua table.
+-- Todo: it would be nice if we could get rid of components. In other places we can
+-- use the unicode properties. We can just keep a lua table.
--
--- Remark: We do some disc juggling where we need to keep in mind that the pre, post and
--- replace fields can have prev pointers to a nesting node ... I wonder if that is still
--- needed.
+-- Remark: We do some disc juggling where we need to keep in mind that the pre, post
+-- and replace fields can have prev pointers to a nesting node ... I wonder if that
+-- is still needed.
--
-- Remark: This is not possible:
--
@@ -1038,10 +1036,8 @@ function handlers.gpos_pair(head,start,dataset,sequence,kerns,rlmode,skiphash,st
end
end
---[[ldx--
-<p>We get hits on a mark, but we're not sure if the it has to be applied so
-we need to explicitly test for basechar, baselig and basemark entries.</p>
---ldx]]--
+-- We get hits on a mark, but we're not sure if it has to be applied so we need
+-- to explicitly test for basechar, baselig and basemark entries.
function handlers.gpos_mark2base(head,start,dataset,sequence,markanchors,rlmode,skiphash)
local markchar = getchar(start)
@@ -1236,10 +1232,8 @@ function handlers.gpos_cursive(head,start,dataset,sequence,exitanchors,rlmode,sk
return head, start, false
end
---[[ldx--
-<p>I will implement multiple chain replacements once I run into a font that uses
-it. It's not that complex to handle.</p>
---ldx]]--
+-- I will implement multiple chain replacements once I run into a font that uses it.
+-- It's not that complex to handle.
local chainprocs = { }
@@ -1292,29 +1286,22 @@ end
chainprocs.reversesub = reversesub
---[[ldx--
-<p>This chain stuff is somewhat tricky since we can have a sequence of actions to be
-applied: single, alternate, multiple or ligature where ligature can be an invalid
-one in the sense that it will replace multiple by one but not neccessary one that
-looks like the combination (i.e. it is the counterpart of multiple then). For
-example, the following is valid:</p>
-
-<typing>
-<line>xxxabcdexxx [single a->A][multiple b->BCD][ligature cde->E] xxxABCDExxx</line>
-</typing>
-
-<p>Therefore we we don't really do the replacement here already unless we have the
-single lookup case. The efficiency of the replacements can be improved by deleting
-as less as needed but that would also make the code even more messy.</p>
---ldx]]--
-
---[[ldx--
-<p>Here we replace start by a single variant.</p>
---ldx]]--
-
--- To be done (example needed): what if > 1 steps
-
--- this is messy: do we need this disc checking also in alternates?
+-- This chain stuff is somewhat tricky since we can have a sequence of actions to be
+-- applied: single, alternate, multiple or ligature where ligature can be an invalid
+-- one in the sense that it will replace multiple by one but not necessarily one that
+-- looks like the combination (i.e. it is the counterpart of multiple then). For
+-- example, the following is valid:
+--
+-- xxxabcdexxx [single a->A][multiple b->BCD][ligature cde->E] xxxABCDExxx
+--
+-- Therefore we don't really do the replacement here already unless we have the
+-- single lookup case. The efficiency of the replacements can be improved by
+-- deleting as little as needed but that would also make the code even more messy.
+--
+-- Here we replace start by a single variant.
+--
+-- To be done: what if > 1 steps (example needed)
+-- This is messy: do we need this disc checking also in alternates?
local function reportzerosteps(dataset,sequence)
logwarning("%s: no steps",cref(dataset,sequence))
@@ -1390,9 +1377,7 @@ function chainprocs.gsub_single(head,start,stop,dataset,sequence,currentlookup,r
return head, start, false
end
---[[ldx--
-<p>Here we replace start by new glyph. First we delete the rest of the match.</p>
---ldx]]--
+-- Here we replace start by a new glyph. First we delete the rest of the match.
-- char_1 mark_1 -> char_x mark_1 (ignore marks)
-- char_1 mark_1 -> char_x
@@ -1444,9 +1429,7 @@ function chainprocs.gsub_alternate(head,start,stop,dataset,sequence,currentlooku
return head, start, false
end
---[[ldx--
-<p>Here we replace start by a sequence of new glyphs.</p>
---ldx]]--
+-- Here we replace start by a sequence of new glyphs.
function chainprocs.gsub_multiple(head,start,stop,dataset,sequence,currentlookup,rlmode,skiphash,chainindex)
local mapping = currentlookup.mapping
@@ -1470,11 +1453,9 @@ function chainprocs.gsub_multiple(head,start,stop,dataset,sequence,currentlookup
return head, start, false
end
---[[ldx--
-<p>When we replace ligatures we use a helper that handles the marks. I might change
-this function (move code inline and handle the marks by a separate function). We
-assume rather stupid ligatures (no complex disc nodes).</p>
---ldx]]--
+-- When we replace ligatures we use a helper that handles the marks. I might change
+-- this function (move code inline and handle the marks by a separate function). We
+-- assume rather stupid ligatures (no complex disc nodes).
-- compare to handlers.gsub_ligature which is more complex ... why
@@ -2532,7 +2513,7 @@ local function handle_contextchain(head,start,dataset,sequence,contexts,rlmode,s
-- fonts can have many steps (each doing one check) or many contexts
-- todo: make a per-char cache so that we have small contexts (when we have a context
- -- n == 1 and otherwise it can be more so we can even distingish n == 1 or more)
+ -- n == 1 and otherwise it can be more so we can even distinguish n == 1 or more)
local nofcontexts = contexts.n -- #contexts