author | Hans Hagen <pragma@wxs.nl> | 2010-10-06 10:20:00 +0200 |
---|---|---|
committer | Hans Hagen <pragma@wxs.nl> | 2010-10-06 10:20:00 +0200 |
commit | e34ee22d154fbde65af2d2c6283e0049b41dee8b (patch) | |
tree | 6cb862be83fd861d5cf57e2c9aa764221d83f152 /tex/context/base/sort-ini.lua | |
parent | 26e9babbd527be8c77f9eabf089aa0763aabc3bd (diff) | |
download | context-e34ee22d154fbde65af2d2c6283e0049b41dee8b.tar.gz |
beta 2010.10.06 10:20
Diffstat (limited to 'tex/context/base/sort-ini.lua')
-rw-r--r-- | tex/context/base/sort-ini.lua | 31 |
1 files changed, 30 insertions, 1 deletions
diff --git a/tex/context/base/sort-ini.lua b/tex/context/base/sort-ini.lua
index 28bd3b4d8..a8ad1a6e0 100644
--- a/tex/context/base/sort-ini.lua
+++ b/tex/context/base/sort-ini.lua
@@ -9,7 +9,36 @@ if not modules then modules = { } end modules ['sort-ini'] = {
 -- It took a while to get there, but with Fleetwood Mac's "Don't Stop"
 -- playing in the background we sort of got it done.
 
--- todo: cleanup splits (in other modules)
+--[[<p>The code here evolved from the rather old mkii approach. There
+we concatenate the key and (raw) entry into a new string. Numbers and
+special characters get some treatment so that they sort ok. In
+addition some normalization (lowercasing, accent stripping) takes
+place and again data is appended or prepended. Eventually these
+strings are sorted using a regular string sorter. The relative order
+of character is dealt with by weighting them. It took a while to
+figure this all out but eventually it worked ok for most languages,
+given that the right datatables were provided.</p>
+
+<p>Here we do follow a similar approach but this time we don't append
+the manipulated keys and entries but create tables for each of them
+with entries being tables themselves having different properties. In
+these tables characters are represented by numbers and sorting takes
+place using these numbers. Strings are simplified using lowercasing
+as well as shape codes. Numbers are filtered and after getting an offset
+they end up at the right end of the spectrum (more clever parser will
+be added some day). There are definitely more solutions to the problem
+and it is a nice puzzle to solve.</p>
+
+<p>In the future more methods can be added, as there is practically no
+limit to what goes into the tables. For that we will provide hooks.</p>
+
+<p>Todo: decomposition with specific order of accents, this is
+relatively easy to do.</p>
+
+<p>Todo: investigate what standards and conventions there are and see
+how they map onto this mechanism. I've learned that users can come up
+with any demand so nothing here is frozen.</p>
+]]--
 
 local utf = unicode.utf8
 local gsub, rep, sub, sort, concat = string.gsub, string.rep, string.sub, table.sort, table.concat
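
The comment added in this commit describes sorting on tables of numbers instead of on concatenated strings. The following is a minimal sketch of that idea, not the code from sort-ini.lua: the names `weights`, `digit_offset`, `split_key`, `compare_keys` and `sort_entries` are hypothetical, the weight table covers plain ASCII letters only, and treating each digit separately with a fixed offset is just one possible reading of "numbers get an offset".

```lua
-- Hypothetical sketch; none of these names come from sort-ini.lua.

-- Relative order of characters. A real table would be supplied per language.
local weights = { }
local alphabet = "abcdefghijklmnopqrstuvwxyz"
for i = 1, #alphabet do
    weights[alphabet:sub(i, i)] = i
end

-- Assumption: digits are shifted by an offset so they sort after all letters.
local digit_offset = 1000

-- Turn a string into a table of numbers: lowercase it, then map each
-- character to its weight. Characters without a weight are dropped.
local function split_key(str)
    local codes = { }
    for ch in str:lower():gmatch(".") do
        local w = weights[ch]
        if w then
            codes[#codes + 1] = w
        elseif ch:match("%d") then
            codes[#codes + 1] = digit_offset + tonumber(ch)
        end
    end
    return codes
end

-- Compare two code tables number by number; on a shared prefix the
-- shorter table sorts first.
local function compare_keys(a, b)
    for i = 1, math.min(#a, #b) do
        if a[i] ~= b[i] then
            return a[i] < b[i]
        end
    end
    return #a < #b
end

-- Each entry is a table holding the raw string and its numeric codes;
-- sorting only ever looks at the codes.
local function sort_entries(list)
    local entries = { }
    for i = 1, #list do
        entries[i] = { raw = list[i], codes = split_key(list[i]) }
    end
    table.sort(entries, function(a, b)
        return compare_keys(a.codes, b.codes)
    end)
    local result = { }
    for i = 1, #entries do
        result[i] = entries[i].raw
    end
    return result
end
```

With this sketch, `sort_entries({ "Beta", "alpha", "2nd", "al" })` returns the entries in the order `al`, `alpha`, `Beta`, `2nd`: case is ignored, the shorter key wins on a shared prefix, and the entry starting with a digit ends up last. Because each entry is a table, further properties (for instance a separate code table for accent shapes) could be added without changing the comparison loop.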