Diffstat (limited to 'tex/context/base/sort-ini.lua')
-rw-r--r-- | tex/context/base/sort-ini.lua | 31 |
1 files changed, 30 insertions, 1 deletions
diff --git a/tex/context/base/sort-ini.lua b/tex/context/base/sort-ini.lua
index 28bd3b4d8..a8ad1a6e0 100644
--- a/tex/context/base/sort-ini.lua
+++ b/tex/context/base/sort-ini.lua
@@ -9,7 +9,36 @@ if not modules then modules = { } end modules ['sort-ini'] = {
 -- It took a while to get there, but with Fleetwood Mac's "Don't Stop"
 -- playing in the background we sort of got it done.
 
--- todo: cleanup splits (in other modules)
+--[[<p>The code here evolved from the rather old mkii approach. There
+we concatenate the key and (raw) entry into a new string. Numbers and
+special characters get some treatment so that they sort ok. In
+addition some normalization (lowercasing, accent stripping) takes
+place and again data is appended or prepended. Eventually these
+strings are sorted using a regular string sorter. The relative order
+of characters is dealt with by weighting them. It took a while to
+figure this all out but eventually it worked ok for most languages,
+given that the right datatables were provided.</p>
+
+<p>Here we do follow a similar approach but this time we don't append
+the manipulated keys and entries but create tables for each of them
+with entries being tables themselves having different properties. In
+these tables characters are represented by numbers and sorting takes
+place using these numbers. Strings are simplified using lowercasing
+as well as shape codes. Numbers are filtered and after getting an offset
+they end up at the right end of the spectrum (a more clever parser will
+be added some day). There are definitely more solutions to the problem
+and it is a nice puzzle to solve.</p>
+
+<p>In the future more methods can be added, as there is practically no
+limit to what goes into the tables. For that we will provide hooks.</p>
+
+<p>Todo: decomposition with specific order of accents, this is
+relatively easy to do.</p>
+
+<p>Todo: investigate what standards and conventions there are and see
+how they map onto this mechanism. I've learned that users can come up
+with any demand so nothing here is frozen.</p>
+]]--
 
 local utf = unicode.utf8
 local gsub, rep, sub, sort, concat = string.gsub, string.rep, string.sub, table.sort, table.concat
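
The approach described in the new comment (entries split into tables of numbers, with numbers shifted by an offset so they sort after letters) can be illustrated with a minimal sketch. This is not the code from sort-ini.lua: the names `weight`, `number_offset`, `split`, `compare` and `sorted` are hypothetical, the weights cover ASCII letters only, and shape codes and accent handling are left out.

```lua
-- Hypothetical weights: letters a..z map to 1..26, anything else to 0.
local function weight(ch)
    local b = string.byte(ch)
    if b >= 97 and b <= 122 then
        return b - 96
    end
    return 0
end

-- Assumption: any value above the largest letter weight will do.
local number_offset = 1000

-- Turn a string into a table of numbers: lowercase it, then store one
-- weight per character; a run of digits becomes one number plus offset.
local function split(str)
    local result = { }
    str = string.lower(str)
    local i, n = 1, #str
    while i <= n do
        local digits = string.match(str, "^%d+", i)
        if digits then
            result[#result+1] = number_offset + tonumber(digits)
            i = i + #digits
        else
            result[#result+1] = weight(string.sub(str, i, i))
            i = i + 1
        end
    end
    return result
end

-- Compare two entries number by number; the shorter one wins a tie.
local function compare(a, b)
    local sa, sb = a.split, b.split
    for i = 1, math.min(#sa, #sb) do
        if sa[i] ~= sb[i] then
            return sa[i] < sb[i]
        end
    end
    return #sa < #sb
end

-- Each entry is a table that keeps the raw string next to its split form.
local function sorted(list)
    local entries = { }
    for i = 1, #list do
        entries[i] = { raw = list[i], split = split(list[i]) }
    end
    table.sort(entries, compare)
    local result = { }
    for i = 1, #entries do
        result[i] = entries[i].raw
    end
    return result
end

-- prints: alpha, Beta, item 2, item 10
print(table.concat(sorted { "item 10", "Beta", "item 2", "alpha" }, ", "))
```

Because a digit run is stored as a single number, "item 2" sorts before "item 10", which a plain string comparison would not give. Keeping the raw string next to the split table leaves room for further per-entry properties, in the spirit of the hooks mentioned in the comment.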