author Hans Hagen <pragma@wxs.nl> 2010-10-06 10:20:00 +0200
committer Hans Hagen <pragma@wxs.nl> 2010-10-06 10:20:00 +0200
commit e34ee22d154fbde65af2d2c6283e0049b41dee8b
tree 6cb862be83fd861d5cf57e2c9aa764221d83f152 /tex/context/base/sort-ini.lua
parent 26e9babbd527be8c77f9eabf089aa0763aabc3bd
beta 2010.10.06 10:20
Diffstat (limited to 'tex/context/base/sort-ini.lua')
-rw-r--r-- tex/context/base/sort-ini.lua | 31
1 files changed, 30 insertions, 1 deletions
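
The comment added by this patch describes turning each entry into a table of numbers and sorting on those numbers, with lowercasing applied and numbers shifted by an offset so that they land at the far end of the range. A minimal sketch of that idea in plain Lua might look like the following. It is not the code in sort-ini.lua: the names `NUMBER_OFFSET`, `split_entry`, `compare_codes` and `sort_entries` are hypothetical, the offset value is an assumption, only ASCII input is handled, and shape codes (accent stripping) are left out.

```lua
-- Hypothetical sketch, not the sort-ini.lua implementation. All names
-- below are invented for illustration and only ASCII input is handled.

local NUMBER_OFFSET = 0x10000 -- assumed offset, larger than any byte value

-- Turn a string into a list of numeric codes: letters are lowercased,
-- and each run of digits collapses into one code beyond all characters.
local function split_entry(str)
    local codes = { }
    local i, n = 1, #str
    while i <= n do
        local digits = string.match(str, "^%d+", i)
        if digits then
            codes[#codes+1] = NUMBER_OFFSET + tonumber(digits)
            i = i + #digits
        else
            codes[#codes+1] = string.byte(string.lower(string.sub(str, i, i)))
            i = i + 1
        end
    end
    return codes
end

-- Compare two code lists position by position; a shorter list that is
-- a prefix of a longer one sorts first.
local function compare_codes(a, b)
    local n = math.min(#a, #b)
    for i = 1, n do
        if a[i] ~= b[i] then
            return a[i] < b[i]
        end
    end
    return #a < #b
end

-- Sort a list of strings by their code lists and return a new list.
local function sort_entries(list)
    local records = { }
    for i = 1, #list do
        records[i] = { entry = list[i], codes = split_entry(list[i]) }
    end
    table.sort(records, function(x, y)
        return compare_codes(x.codes, y.codes)
    end)
    local result = { }
    for i = 1, #records do
        result[i] = records[i].entry
    end
    return result
end

print(table.concat(sort_entries({ "item10", "Item2", "itema" }), " "))
-- prints "itema Item2 item10"
```

In this sketch the letter entry comes first and the two numbered entries follow in numeric order (2 before 10), whereas a plain string sort would compare the digits character by character and be sensitive to case.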
diff --git a/tex/context/base/sort-ini.lua b/tex/context/base/sort-ini.lua
index 28bd3b4d8..a8ad1a6e0 100644
--- a/tex/context/base/sort-ini.lua
+++ b/tex/context/base/sort-ini.lua
@@ -9,7 +9,36 @@ if not modules then modules = { } end modules ['sort-ini'] = {
-- It took a while to get there, but with Fleetwood Mac's "Don't Stop"
-- playing in the background we sort of got it done.
--- todo: cleanup splits (in other modules)
+--[[<p>The code here evolved from the rather old mkii approach. There
+we concatenate the key and (raw) entry into a new string. Numbers and
+special characters get some treatment so that they sort ok. In
+addition some normalization (lowercasing, accent stripping) takes
+place and again data is appended or prepended. Eventually these
+strings are sorted using a regular string sorter. The relative order
+of characters is dealt with by weighting them. It took a while to
+figure this all out but eventually it worked ok for most languages,
+given that the right datatables were provided.</p>
+
+<p>Here we do follow a similar approach but this time we don't append
+the manipulated keys and entries but create tables for each of them
+with entries being tables themselves having different properties. In
+these tables characters are represented by numbers and sorting takes
+place using these numbers. Strings are simplified using lowercasing
+as well as shape codes. Numbers are filtered and after getting an offset
+they end up at the right end of the spectrum (a more clever parser will
+be added some day). There are definitely more solutions to the problem
+and it is a nice puzzle to solve.</p>
+
+<p>In the future more methods can be added, as there is practically no
+limit to what goes into the tables. For that we will provide hooks.</p>
+
+<p>Todo: decomposition with specific order of accents, this is
+relatively easy to do.</p>
+
+<p>Todo: investigate what standards and conventions there are and see
+how they map onto this mechanism. I've learned that users can come up
+with any demand so nothing here is frozen.</p>
+]]--
local utf = unicode.utf8
local gsub, rep, sub, sort, concat = string.gsub, string.rep, string.sub, table.sort, table.concat