Diffstat (limited to 'doc/context/sources/general/manuals/languages/languages-options.tex')
-rw-r--r-- doc/context/sources/general/manuals/languages/languages-options.tex | 424
1 files changed, 424 insertions, 0 deletions
diff --git a/doc/context/sources/general/manuals/languages/languages-options.tex b/doc/context/sources/general/manuals/languages/languages-options.tex
new file mode 100644
index 000000000..e2e5a61c3
--- /dev/null
+++ b/doc/context/sources/general/manuals/languages/languages-options.tex
@@ -0,0 +1,424 @@
+% language=uk
+
+\startcomponent languages-options
+
+\environment languages-environment
+
+\startchapter[title=Options][color=darkblue]
+
+\startsection[title=Introduction]
+
+Hyphenation of words is controlled by so-called patterns. They take a word and
+try to match parts with a pattern that describes where a hyphen can be injected.
+Preferred and discouraged injection points accumulate to a score that in the end
+determines where so-called discretionary nodes get injected in the list of
+glyphs that make a word. The patterns are language specific.
+
+This mechanism is agnostic when it comes to the characters involved: they are
+just numbers. However, when in a next step font features like ligature building
+and kerning are applied we also have to deal with language specific properties
+(and meanings). Often a ligature at the boundary of a composed word can make
+reading confusing and has to be avoided. Some of that can be controlled by the
+font when it implements language specific features but because that approach is
+not based on a dictionary it is more about playing safe and prevention than about
+quality.
+
+In the next sections a mechanism is discussed that also uses patterns. This time
+it is about controlling fonts as well as how hyphenation patterns are applied.
+This process kicks in before hyphenation is applied but it definitely has to be
+seen as part of that same process. It is integrated in the hyphenation machinery
+and acts as a preprocessor with the possibility to feed back and move forward. The
+implementation is such that when it's not used there is no performance penalty.
+\footnote {There are by now plenty of alternative approaches to these problems
+but after some discussion about the pros and cons of each this new mechanism was
+made. I admit that the fun factor played a role. It is also one of the things we
+can do in \LUAMETATEX\ without worrying about a possible negative impact on
+\LUATEX\ users other than \CONTEXT .}
+
+There are several predefined operations that are characterized by keywords and
+shortcuts and collected in an option list that is part of a language goodie file.
+Examples can be found in the distribution in files with the suffix \type {llg}
+(\LUA\ language goodie). The framework of such a file is:
+
+\starttyping
+return {
+ name = "whatever",
+ version = "1.00",
+ comment = "Goodies for experiments and demo.",
+ author = "Hans Hagen",
+ copyright = "ConTeXt development team",
+ options = {
+ { ... },
+ ........
+ { ... },
+ }
+}
+\stoptyping
+
+These options will eventually result in patterns that are bound to words,
+think of:
+
+\starttabulate[|T||||]
+\NC effe \NC \type {ef|fe} \NC \type {..|..} \NC inhibit ligature \NC \NR
+\NC foobar \NC \type {foo=bar} \NC \type {...=...} \NC inhibit kerning \NC \NR
+\NC somemore \NC \type {some+more} \NC \type {....+....} \NC compound word \NC \NR
+\stoptabulate
+
+The whole repertoire is:
+
+\starttabulate[||T|]
+\NC \type {a|b} \NC a:norightligature, b:noleftligature \NC \NR
+\NC \type {a=b} \NC a:norightkern, b:noleftkern \NC \NR
+\NC \type {a<b} \NC b:noleftkern \NC \NR
+\NC \type {a>b} \NC a:norightkern \NC \NR
+\NC \type {a+b} \NC a:compound:b \NC \NR
+\stoptabulate
+
+Later we will see how some can be combined. An option can be defined using entries
+in a subtable:
+
+\starttabulate[|T|||]
+\NC patterns \NC hash \NC \type {[snippet] = "replacement pattern"} \NC \NR
+\NC words \NC string \NC string of words, separated by whitespace \NC \NR
+\NC prefixes \NC string \NC snippets that combine with words (at the start) \NC \NR
+\NC suffixes \NC string \NC snippets that combine with words (at the end) \NC \NR
+\NC matches \NC array or number \NC a number or table indicating which match matters \NC \NR
+\NC actions \NC hash \NC \type {[character] = "action(s)"} \NC \NR
+\NC characters \NC string \NC permitted characters (additional hjcodes) \NC \NR
+\NC return \NC integer \NC what to do next \NC \NR
+\stoptabulate
+
+The default return value is~2 but there are some more:
+
+\starttabulate[|T||]
+\NC 0 \NC go to the next (valid) word \NC \NR
+\NC 1 \NC restart \NC \NR
+\NC 2 \NC exceptions and after that patterns \NC \NR
+\NC 3 \NC patterns \NC \NR
+\stoptabulate
+
+There are some safeguards built in that force a restart. For instance when a word
+is replaced a restart is enforced unless we skip the word. A restart will not
+permit a second replacement (after all we need to avoid endless loops).
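+
+A minimal sketch of an option that sets the return value, assuming that the key
+is given in the subtable like the other entries. Because \type {return} is a
+reserved word in \LUA\ the key has to be quoted. The pattern and word are only
+there for illustration.
+
+\starttyping
+{
+    patterns = {
+        ff = "f|f",
+    },
+    words = [[
+        effe
+    ]],
+    -- value 3 from the table above: continue with the patterns only
+    ["return"] = 3,
+}
+\stoptyping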
+
+In a multi|-|line word list, lines that start with a comment trigger (\LUA's
+double dash or the usual \TEX\ percent sign) are skipped.
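+
+A sketch of a word list with both comment triggers, assuming that such lines
+are simply skipped. The commented words are hypothetical and only show the
+syntax.
+
+\starttyping
+{
+    patterns = {
+        ff = "f|f",
+    },
+    words = [[
+        effe
+        -- effeffe : a line starting with the Lua comment trigger
+        % effen    : a line starting with the TeX comment trigger
+    ]],
+}
+\stoptyping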
+
+\stopsection
+
+\startsection[title=Inhibiting]
+
+The next definition replaces \type {ff} by \type {f|f} in the words given and
+eventually blocks a ligature.
+
+\starttyping
+{
+ patterns = {
+ ff = "f|f",
+ },
+ words = [[
+ effe
+ ]],
+}
+\stoptyping
+
+Some fonts provide the \type {ij} ligature or do some special kerning between
+these characters (something Dutch). Because it depends on the font logic whether
+a dedicated replacement or kerning is used, this is an example where we block
+both:
+
+\starttyping
+{
+ patterns = {
+ ij = "i|j",
+ },
+ actions = {
+ ["|"] = "nokern noligature",
+ },
+ words = [[
+ ijverig
+ -- fijn -- to ligature fi or ij, that's the question
+ ]],
+}
+\stoptyping
+
+A more extensive definition is the following. Here we explicitly define that only
+the first match in a word gets treated, and we not only block ligatures but also
+kerns.
+
+\starttyping
+{
+ patterns = {
+ ff = "f|f",
+ },
+ matches = { 1 },
+ actions = {
+ ["|"] = "noligature nokern"
+ },
+ words = [[
+ effe
+ effeffe
+ ]],
+}
+\stoptyping
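+
+The \type {matches} entry is described as a number or a table. A sketch with a
+plain number, under the assumption that it selects just that match, so that
+here only the second \type {ff} in \type {effeffe} would be treated:
+
+\starttyping
+{
+    patterns = {
+        ff = "f|f",
+    },
+    -- assumption: a single number picks the match that matters
+    matches = 2,
+    words = [[
+        effeffe
+    ]],
+}
+\stoptyping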
+
+You can also omit the pattern when you inject specifiers yourself:
+
+\starttyping
+{
+ actions = {
+ ["|"] = "noligature nokern"
+ },
+ words = [[
+ ef|fe
+ ef|fef|fe
+ ]],
+}
+\stoptyping
+
+You can also use different shortcuts:
+
+\starttyping
+{
+ actions = {
+ ["1"] = "noligature",
+ ["2"] = "nokern"
+ },
+ words = [[
+ ef1fe
+ ef1fef2fe
+ ]],
+}
+\stoptyping
+
+Although I cannot come up with a nice example, there can be reasons for
+inhibiting kerns. Here we inhibit kerns left of the upcoming character:
+
+\starttyping
+{
+ patterns = {
+ fo = "f<o",
+ rm = "r<m",
+ },
+ words = [[
+ information
+ ]],
+}
+\stoptyping
+
+And here we inhibit kerns right of the previous and left of the upcoming character:
+
+\starttyping
+{
+ patterns = {
+ th = "t=h",
+ },
+ words = [[
+ thrive
+ ]],
+}
+\stoptyping
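+
+The remaining variant from the repertoire, \type {a>b}, only inhibits kerns
+right of the previous character. A sketch along the same lines, with the word
+chosen just for illustration:
+
+\starttyping
+{
+    patterns = {
+        -- t gets norightkern, h is left alone
+        th = "t>h",
+    },
+    words = [[
+        thrive
+    ]],
+}
+\stoptyping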
+
+Just look in the files in the distribution for realistic examples, like
+
+\starttyping
+{
+ patterns = {
+ fi = "f|i",
+ },
+ words = [[
+ deafish dwarfish elfish oafish selfish
+ ]],
+ suffixes = [[
+ ness ly
+ ]]
+}
+\stoptyping
+
+where we block ligatures in 15 words. There's also a \type {prefixes} key.
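+
+A sketch of the \type {prefixes} key, assuming that it combines with the words
+in the same way as \type {suffixes} does, but at the start. The prefix is
+chosen for illustration only.
+
+\starttyping
+{
+    patterns = {
+        fi = "f|i",
+    },
+    words = [[
+        selfish
+    ]],
+    -- should also cover: unselfish
+    prefixes = [[
+        un
+    ]]
+}
+\stoptyping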
+
+\stopsection
+
+\startsection[title=Replacements]
+
+Replacements are probably not used that much but here is one for German. Not
+only is the uppercase variant of ß seldom used, many fonts don't provide it
+either, so it is best to replace it:
+
+\starttyping
+{
+ characters = "ẞ", -- uppercase ß, not visible in all verbatim fonts
+ patterns = {
+ ["ẞ"] = "SS", -- key is uppercase ß
+ },
+}
+\stoptyping
+
+Here we define that character as valid, something that normally is done by the
+patterns, but the patterns don't include it. If we do not specify it here, the
+hyphenator will skip this word. For the record: this can also be done with a font
+feature that decomposes the character.
+
+\stopsection
+
+\startsection[title=Compound words]
+
+You might want to suppress ligatures and maybe even kerning when compound words
+are involved.
+
+\starttyping
+{
+ patterns = {
+ ff = "f+f",
+ },
+ words = [[
+ aaaaffaaaa
+ bbffbb
+ ]],
+}
+\stoptyping
+
+Again you can also say:
+
+\starttyping
+{
+ words = [[
+ aaaaf+faaaa
+ bbf+fbb
+ ]],
+}
+\stoptyping
+
+But patterns make sense when you have a large list (that might come from some
+other source than yourself).
+
+The next specification will turn two times three \type {bla}'s into a compound
+word but also make sure that we have at least 4 characters left and right of a
+potential break.
+
+\starttyping
+ {
+ left = 4,
+ right = 4,
+ words = [[
+ blablabla+blablabla
+ ]],
+ }
+\stoptyping
+
+\stopsection
+
+\startsection[title=Performance]
+
+Although these mechanisms introduce overhead, the performance hit in \LMTX\ is
+not that large. This is because the number of words in a document is limited and
+\LUA\ is fast enough.
+
+\stopsection
+
+\startsection[title=Plugins]
+
+{\em This interface is preliminary but for the record I put an example here
+anyway.}
+
+\starttyping
+local n = 0
+function document.myhack(original)
+ n = n + 1
+ print(n,original)
+ return original
+end
+
+languages.installhandler("de","document.myhack")
+\stoptyping
+
+One can manipulate a text as in:
+
+\starttyping
+function document.myhack(original)
+ local t = table.reverse(utf.split(original)) -- characters in reverse order
+ local f = t[#t] -- first character of the original
+ local l = t[1] -- last character of the original
+ if characters.upper(f) == f then
+ -- move the capital to the new first character
+ t[1] = characters.upper(l)
+ t[#t] = characters.lower(f)
+ end
+ return table.concat(t)
+end
+
+languages.installhandler("en","document.myhack")
+\stoptyping
+
+The text will be fed again into the hyphenator and treated in the normal way.
+There are some safeguards against the text being processed twice.
+
+\stopsection
+
+\startsection[title=Embedded definitions]
+
+You can also embed definitions in the source file:
+
+\starttyping
+\startlanguageoptions[de]
+ Zapf|innovation
+\stoplanguageoptions
+\stoptyping
+
+\stopsection
+
+\startsection[title=Exceptions]
+
+When you set exceptions in a goodie file, it will use the plugin mechanism to
+check for them. This is a bit more efficient than using the internal checker,
+which actually also goes via a \LUA\ hash.
+
+\starttyping
+{
+ exceptions = [[
+ a-very{-}{-}{w}eird{1}{2}{3}(w)ord
+ ]],
+}
+\stoptyping
+
+Watch out: when you specify a discretionary replacement three braced values are
+passed: the pre, post and replace text. The replace text is used in the lookup,
+unless you add a string between parentheses, which then will be used instead. A
+digit between brackets will apply a penalty according to the following logic (in
+the engine): A zero digit results in \type {\hyphenpenalty}, otherwise the
+digits~1 up to~9 will be used as multiplier for \type {\exceptionpenalty} when
+that value is larger than 100000, otherwise \type {\exceptionpenalty} is used.
+
+\stopsection
+
+\startsection[title=Tracing]
+
+The following tracker can be used:
+
+\starttyping
+\enabletrackers[languages.goodies]
+\stoptyping
+
+In addition the style \type {languages-goodies} implements some tracing options.
+You can just run that one to see what it does.
+
+The engine itself also has a tracing option: \type {\tracinghyphenation}. When
+set to zero nothing is shown, when set to one redundant patterns will be
+reported. A value of two reports what words get fed into the hyphenator and if
+they got hyphenated. A value of three gives more detail: when a word gets
+hyphenated the relevant (resulting) part of the node list is shown. You need to
+set \type {\tracingonline} to a value larger than zero to get this reported to
+the console. Expect lots of extra output to the console for large documents but
+it can be revealing.
+
+\stopsection
+
+\stopchapter
+
+\stopcomponent
+
+%D Musical timestamp: end March 2021: running into Joe Parrish's amazing
+%D interpretation of Stravinsky's "Rite of Spring" on guitars.
+%D
+%D Also on YT: The Rite of Spring by London Symphony Orchestra (conducted
+%D by Simon Rattle).