Diffstat (limited to 'doc/context/sources/general/manuals/languages')
10 files changed, 0 insertions, 2238 deletions
diff --git a/doc/context/sources/general/manuals/languages/languages-appendix.tex b/doc/context/sources/general/manuals/languages/languages-appendix.tex deleted file mode 100644 index d8f33e44a..000000000 --- a/doc/context/sources/general/manuals/languages/languages-appendix.tex +++ /dev/null @@ -1,48 +0,0 @@ -% language=uk - -\startcomponent languages-appendix - -\environment languages-environment - -\startchapter[title=Appendix][color=darkgray] - -\startsection[title=The language files] - - Todo. - -\stopsection - -\startsection[title=The \type {mtx-patterns} script] - - Todo. - -\stopsection - -\startsection[title=Installed sorters] - -\startbuffer -\usemodule[s-languages-sorting] - -\showinstalledsorting -\stopbuffer - -\typebuffer \blank[2*line] \getbuffer - -\stopsection - -\startsection[title=Verbose counters] - -\startbuffer -\usemodule[s-languages-counters] - -\showverbosecounters[language={en,es}] -\stopbuffer - -\typebuffer \blank[2*line] \getbuffer - -\stopsection - -\stopchapter - -\stopcomponent - diff --git a/doc/context/sources/general/manuals/languages/languages-basics.tex b/doc/context/sources/general/manuals/languages/languages-basics.tex deleted file mode 100644 index 840897096..000000000 --- a/doc/context/sources/general/manuals/languages/languages-basics.tex +++ /dev/null @@ -1,348 +0,0 @@ -% language=uk - -\startcomponent languages-basics - -\environment languages-environment - -\startchapter[title=Some basics][color=darkyellow] - -\startsection[title={Introduction}] - -In this chapter we will see how we can toggle between languages. A first -introduction to patterns will be given. Some details of how to control the -hyphenation with specific patterns will be given in a later chapter. - -\stopsection - -\startsection[title={Available languages}] - -When you use the English version of \CONTEXT\ you will default to US English as -main language. This means that hyphenation will be US specific, which by the way -is different from the rules in GB. 
All labels that are generated by the system -are also in English. Languages can often be accessed by names like \type -{english} or \type {dutch} although it is quite common to use the short tags like -\type {en} and \type {nl}. Because we want to be as compatible as possible with -\MKII, there are quite a few synonyms. The following table lists the languages -for which support is built|-|in.\footnote {More languages can be defined. It is -up to users to provide the information.} - -\startbuffer -\usemodule[languages-system] - -\loadinstalledlanguages -\showinstalledlanguages -\stopbuffer - -\getbuffer - -You can call up such a table with the following commands: - -\typebuffer - -Instead you can run \type {context --global languages-system.mkiv}. - -As you can see, many languages have hyphenation patterns but for Japanese, -Korean, Chinese as well as Arabic languages they make no sense. The patterns are -loaded on demand. The number is the internal number that is used in the engine; a -user never has to use that number. Numbers $<1$ are used to disable hyphenation. -The file tag is used to locate and load a specification. Such files have names -like \type {lang-nl.lua}. - -Some languages share the same hyphenation patterns but can have demands that -differ, like labels or quotes. The characters shown in the table are those found -in the pattern files. The number of patterns differs a lot between languages. -This relates to the system behind them. Some languages use word stems, others -base their hyphenation on syllables. Some languages have inflections which add to -the complexity while others can combine words in ways that demand special care -for word boundaries. Of course a low or high number can signal a low quality as -well, but most pattern collections are assembled over many years and updated when -for instance spelling rules change. I think that we can safely say that most patterns -are quite stable and of good quality.
- -\stopsection - -\startsection[title=Switching] - -The document language is set with - -\starttyping -\mainlanguage[en] -\stoptyping - -but when you want to apply the proper hyphenation rules to an embedded language -you can use: - -\starttyping -\language[en] -\stoptyping - -or just: - -\starttyping -\en -\stoptyping - -The main language determines what labels show up, how numbering happens, in what -way dates get formatted, etc. Normally the \typ {\mainlanguage} command comes -before the \typ {\starttext} command. - -\stopsection - -\startsection[title=Hyphenation] - -In \LUATEX\ each character that gets typeset not only carries a font id and character -code, but also a language number. You can switch language whenever you want and -the change will be carried with the characters. Switching within a word doesn't make -sense but it is permitted: - -\starttabulate[|||T|] -\NC 1 \NC \type{\de incrediblykompliziert} \NC \hyphenatedword{\de incrediblykompliziert} \NC \NR -\NC 2 \NC \type{\en incrediblykompliziert} \NC \hyphenatedword{\en incrediblykompliziert} \NC \NR -\NC 3 \NC \type{\en incredibly\de kompliziert} \NC \hyphenatedword{\en incredibly\de kompliziert} \NC \NR -\NC 4 \NC \type{\en incredibly\de\-kompliziert} \NC \hyphenatedword{\en incredibly\de\-kompliziert} \NC \NR -\NC 5 \NC \type{\en incredibly\de-kompliziert} \NC \hyphenatedword{\en incredibly\de-kompliziert} \NC \NR -\stoptabulate - -In the line 4 we have a \type {\-} between the two words, and in the last -line just a \type {-}. If you look closely you will notice that the snippets -can be quite small. If we typeset a word with a 1mm text width we get this: - -\blank \start \en \hsize 1mm incredibly \par \stop \blank - -If you are familiar with the details of hyphenation, you know that the number of -characters at the end and beginning of a word is controlled by the two variables -\typ {\lefthyphenmin} and \typ {\righthyphenmin}. However, these only influence -the hyphenation process. 
What bits and pieces eventually end up on a line is -determined by the par builder and there the \type {\hsize} matters. In practice -you will not run into these situations, unless you have extremely long words and a -narrow column. - -Hyphenation normally is limited to regular characters that make up the alphabet of -a language. It is insensitive to capitalization as the following text shows: - -\blank - -\startnarrower -\hyphenatedword {This time the musical distraction while developing code came -from watching youtube performances of Cory Henry (also known from Snarky Puppy, -a conglomerate of excellent players). Just search the web for his name with \quote -{Stevie Wonder and Michael Jackson Tribute}. There is no keyboard he can't play. -Another interesting keyboard player is Sun Rai (a short name for Rai -Thistlethwayte, just google for \quote {The Beatles, Come Together, Live Piano -Acoustic with Loop Pedal}, or do a combined search with \quote {Matt -Chamberlain}). Okay, and talking of keyboards, let's not forget Vika Yermolyeva -(vkgoeswild) as she's one of a kind too on the web. And then there is Jacob -Collier, in one word: incredible (or hyphenated the Dutch way {\nl incredible}, -let me repeat that in French {\fr incredible}).} \footnote {Get me right, there -are of course many more fantastic musicians.} -\stopnarrower - -\blank - -Of course, names are often short and don't need to be hyphenated -(or the left and right settings prohibit it). Another complication with names is -that they can come from another language so we either need to switch language -temporarily or we need to add an exception (more about that later). - -\stopsection - -\startsection[title=Primitives] - -In traditional \TEX\ the language is not a property of a character but is -triggered by a signal in the (so called) list.
Think of: - -\starttyping -<language 1>this is <language 2>nederlands<language 1> mixed with english -\stoptyping - -This number is set by the primitive \typ {\language}. Language triggers are -injected into the list depending on the value of this number. There is also a \typ -{\setlanguage} primitive that can inject triggers without setting the \typ -{\language} number. Because in \LUATEX\ the state is kept with the character -you don't need to worry about the subtle differences here. - -In \CONTEXT\ the \typ {\language} and \typ {\setlanguage} commands are overloaded -by a more advanced switch macro. You cannot assume that they work as explained in -general manuals about \TEX. Currently you can still assign a number but that -might change. Just consider the language to be an abstraction and don't mess with -this number. Both commands not only change the current language but also do -specific initializations when needed. - -What characters get involved in hyphenation is historically determined by the so -called \type {\lccode} values. Each character can have such a value which maps -an uppercase to a lowercase character. This concept has been extended in \ETEX\ -where it binds to a pattern set (language). However, in \CONTEXT\ the user never -has to worry about such details. - -% The \type {\patterns} primitive is -% The \type {\hyphenation} primitive is - -In traditional hyphenation a word will not be hyphenated if the sum of \typ -{\lefthyphenmin} and \typ {\righthyphenmin} exceeds 62. This limitation is not -present in the to be presented \LUA\ variant of this routine as there is no -good reason for this limitation other than implementation constraints. - -\stopsection - -\startsection[title=Control] - -We already mentioned \typ {\lefthyphenmin} and \typ {\righthyphenmin}. These -two variables control the area in a word that is subject to hyphenation.
-Setting these values is a matter of taste but making them too small can result in -bad hyphenation when the patterns are made with the assumption that certain -minima are used. Using a \typ {\lefthyphenmin} of 2 while the patterns are made -with a value of 3 in mind is a bad idea. - -\startlinecorrection[blank] -\startluacode -context.bTABLE { option = "stretch", align= "middle" } - context.bTR() - context.bTD { ny = 2, align = "middle,lohi", style = "monobold" } - context.verbatim("\\lefthyphenmin") - context.eTD() - context.bTD { nx = 5, style = "monobold" } - context.verbatim("\\righthyphenmin") - context.eTD() - context.eTR() - context.bTR() - for right=1,5 do - context.bTD() - context.mono(right) - context.eTD() - end - context.eTR() - for left=1,5 do - context.bTR() - context.bTD() - context.mono(left) - context.eTD() - for right=1,5 do - context.bTD() - context("\\lefthyphenmin %s \\righthyphenmin %s \\hyphenatedword{interesting}",left,right) - context.eTD() - end - context.eTR() - end -context.eTABLE() -\stopluacode -\stoplinecorrection - -When \TEX\ breaks a paragraph into lines it will try to do so without hyphenation. -When that fails (read: when the badness becomes too high) a next effort will take -hyphenation into account. \footnote {Because in \LUATEX\ we always hyphenate -there is no real gain in trying not to hyphenate. Because in traditional \TEX\ -hyphenation happens on the fly a pass without hyphenating makes more sense.} When -the badness is still too high, an optional emergency pass can be made but only -when the tolerances are set to permit this. In \CONTEXT\ you can try these -settings when you get too many over- or underfull boxes reported on the console. - -\starttyping -\setupalign[tolerant] -\setupalign[verytolerant] -\setupalign[verytolerant,stretch] -\stoptyping - -Personally I tend to use the last setting, especially in automated flows. After -all, \TEX\ will not apply stretch unless it's really needed.
- -The two \typ {\*hyphenmin} parameters can be set any time and the current value -is stored with each character. They can also be set with the language which we -will see later. - -When \TEX\ hyphenates words it has to decide where a word starts and ends. In -traditional \TEX\ a word normally starts at a character that falls within the -scope of the hyphenator. It ends when a box (hlist or vlist) is seen, but also -at a rule, discretionary, accent (forget about this in \CONTEXT) or math. An -example will be given in the chapter that discusses the \LUA\ alternative. - -\stopsection - -\startsection[title=Installing] - - todo - -\stopsection - -\startsection[title=Modes] - -Languages are one of the mechanisms where you can access the current state. There are -for instance two (official) macros that contain the current (main) language: - -\startbuffer -\starttabulate[||Tc|] -\HL -\NC \bf macro \NC \bf value \NC \NR -\HL -\NC \type {\currentmainlanguage} \NC \currentmainlanguage \NC \NR -\NC \type {\currentlanguage} \NC \currentlanguage \NC \NR -\HL -\stoptabulate -\stopbuffer - -\getbuffer - -When we have set \type {\language[nl]} we get this: - -\start \nl \getbuffer \stop - -If you write a style that needs to adapt to a language you can use modes.
There -are several ways to do this: - -\startbuffer -\language[nl] - -\startmode[**en] - \color[darkred]{main english} -\stopmode - -\startmode[*en] - \color[darkred]{local english} -\stopmode - -\startmode[**nl] - \color[darkblue]{main dutch} -\stopmode - -\startmode[*nl] - \color[darkblue]{local dutch} -\stopmode - -\startmodeset - [*en] {\color[darkgreen]{english set}} - [*nl] {\color[darkgreen]{dutch set}} -\stopmodeset -\stopbuffer - -\typebuffer - -This typesets: - -\blank \startpacked \setupindenting[no] \getbuffer \stoppacked \blank - -When you use setups you can use the following trick: - -\startbuffer -\language[nl] - -\startsetups language:en - \color[darkorange]{something english} -\stopsetups - -\startsetups language:nl - \color[darkorange]{something dutch} -\stopsetups - -\setups[language:\currentlanguage] -\stopbuffer - -\typebuffer - -As expected we get: - -\blank \start \setupindenting[no] \getbuffer \stop \blank - -\stopsection - -\stopchapter - -\stopcomponent diff --git a/doc/context/sources/general/manuals/languages/languages-environment.tex b/doc/context/sources/general/manuals/languages/languages-environment.tex deleted file mode 100644 index 2669fe0c5..000000000 --- a/doc/context/sources/general/manuals/languages/languages-environment.tex +++ /dev/null @@ -1,15 +0,0 @@ -\startenvironment languages-environment - - \environment manuals-explaining-environment - \environment manuals-explaining-cover - - \usemodule [s] [languages-hyphenation] - \usemodule [s] [languages-frequencies] - \usemodule [s] [languages-sorting] - \usemodule [s] [languages-counters] - \usemodule [s] [languages-system] - - \usemodule [s] [math-extensibles] - -\stopenvironment - diff --git a/doc/context/sources/general/manuals/languages/languages-goodies.tex b/doc/context/sources/general/manuals/languages/languages-goodies.tex deleted file mode 100644 index 32de54db3..000000000 --- a/doc/context/sources/general/manuals/languages/languages-goodies.tex +++ /dev/null @@ -1,138 
+0,0 @@ -% language=uk - -\startcomponent languages-goodies - -\environment languages-environment - -\startchapter[title=Goodies][color=darkorange] - -\startsection[title=Introduction] - -There are some features that will only be used in rare cases. They were often -implemented as experiments but found useful enough to keep around. - -\stopsection - -\startsection[title=Spell checking] - -There are some means to check the spelling of words in your document but get it -right: \CONTEXT\ is not a spell|-|checker. These features were added in order to -be able to do some quick checking of documents written by multiple authors. There -are currently three options and we only show simple examples. - -First you need to load word lists. These can be text files with just words -separated by spacing. - -\starttyping[color=maincolor] -foobar foo-bar foo=bar foo{}{}{}bar foo{}{}{bar} -\stoptyping - -All these words become \type {foobar} which means that one can use words with -discretionary specifications. A text list is loaded with: - -\startbuffer -\loadspellchecklist[en][t:/manuals/lua/words-en.txt] -\stopbuffer - -\typebuffer \getbuffer - -Instead you can load a \LUA\ file with words. Here we use the same structure that -we use for the spell checker provided for \SCITE: - -\starttyping[color=maincolor] -return { - max = 9, - min = 6, - n = 2, - words = { - ["barfoo"] = "Barfoo", - ["foobarred"] = "foobarred", - } -} -\stoptyping - -We use the same load command (you can also load bytecode files with suffix \type -{luc} this way): - -\startbuffer -\loadspellchecklist[nl][t:/scite/data/context/lexers/data/spell-nl.lua] -\stopbuffer - -\typebuffer \getbuffer - -Usage boils down to enabling the checker. If needed we can add more methods. The -first method colors the known and unknown words. Words shorter than the -threshold of 4 will be skipped.
- -\startbuffer -\setupspellchecking[state=start,method=1] -\en Is this written right or is this wromg?\par % m -> n error -\nl Is dit goed geschreven of niet?\par -\setupspellchecking[state=stop] -\stopbuffer - -\typebuffer \startpacked \getbuffer \stoppacked - -You can change the colors: - -\starttyping -\definecolor[word:yes] [g=.75] -\definecolor[word:no] [r=.75] -\stoptyping - -The second method doesn't show anything but produces a file (\type -{jobname.words}) with used words. The \type {found} value of \type {list} is used -as key in the produced table. - -\startbuffer -\setupspellchecking[state=start,method=2,list=found] -\en Is this written right or is this wrong?\par -\nl Is dit goed geschreven of niet?\par -\setupspellchecking[state=stop] -\stopbuffer - -\typebuffer \startpacked \getbuffer \stoppacked - -The produced table is: - -\typefile{\jobname.words} - -The result can be traced with a module: - -\startbuffer -\usemodule[s-languages-words] - -\showwords -\stopbuffer - -\typebuffer - -This shows up as: - -\getbuffer - -The third mechanism colors languages differently. We only defined a few colors: - -\starttyping -\definecolor[word:en] [b=.75] -\definecolor[word:de] [r=.75] -\definecolor[word:nl] [g=.75] -\definecolor[word:unknown][r=.75,g=.75] -\stoptyping - -but you can of course define a color for your favourite language in a similar way.
- -\startbuffer -\setupspellchecking[state=start,method=3] -\en Is this written right or is this wrong?\par -\nl Is dit goed geschreven of niet?\par -\setupspellchecking[state=stop] -\stopbuffer - -\typebuffer \startpacked \getbuffer \stoppacked - -\stopsection - -\stopchapter - -\stopcomponent diff --git a/doc/context/sources/general/manuals/languages/languages-hyphenation.tex b/doc/context/sources/general/manuals/languages/languages-hyphenation.tex deleted file mode 100644 index 6698717be..000000000 --- a/doc/context/sources/general/manuals/languages/languages-hyphenation.tex +++ /dev/null @@ -1,876 +0,0 @@ -% language=uk - -\startcomponent languages-hyphenation - -\environment languages-environment - -\startchapter[title=Hyphenation][color=darkmagenta] - -\startsection[title=How it works] - -Proper hyphenation is one of the strong points of \TEX. Hyphenation in \TEX\ is -done using so called hyphenation patterns. Making these patterns is an art -and most users (including me) happily use whatever is available. Patterns can be -created automatically using \type {patgen} but often manual tweaking is needed -too. A pattern looks as follows: - -\starttyping -pat1tern -\stoptyping - -This means as much as: you can split the word \type {pattern} in two pieces, with -a hyphen between the two \type {t}'s. Actually it will also split the word \type -{patterns} because the hyphenation mechanism looks at substrings. When no number -between characters in a pattern is given, a zero is assumed. This means as much -as {\em undefined}. An even number inhibits hyphenation, an odd number permits -it. The larger the number (weight), the more influence it has. A more restricted -pattern is: - -\starttyping -.pat1tern. -\stoptyping - -Here the periods set the word boundaries. The pattern dictionary for us -english has smaller patterns and the next trace shows how these are applied. 
- -\starthyphenation[traditional] -\showhyphenationtrace[en][pattern] -\stophyphenation - -The effective hyphenation of a word is determined by several factors: - -\startitemize[packed] -\startitem the current language, each language can have different patterns \stopitem -\startitem the characters, as some characters might block hyphenation \stopitem -\startitem the settings of \type {\lefthyphenmin} and \type {\righthyphenmin} \stopitem -\stopitemize - -A place where a word can be hyphenated is called a discretionary. When \TEX\ -analyzes a stream, it will inject discretionary nodes into that stream. - -\starttyping -pat\discretionary{-}{}{}tern. -\stoptyping - -In traditional \TEX\ hyphenation, ligature building and kerning are tightly -interwoven which is quite effective. However, there was also a strong -relationship between the current font and hyphenation. This is a side effect of -traditional \TEX\ having at most 256 characters in a font and the fact that the -used character is in fact a reference to a slot in a font. There, a character in the -input initially ends up as a character node and eventually becomes a glyph node. -For instance two characters \type {fi} can become a ligature glyph representing -this combination. - -In \LUATEX\ the hyphenation, ligature building and kerning stages are separated -and can be overloaded. In \CONTEXT\ all three can be replaced by code written in -\LUA. Because normally hyphenation happens before font logic is applied, there is -no relationship with font encoding. I wrote the first \LUA\ version of the -hyphenator on a rainy weekend and the result was not that bad so it was presented -at the 2014 \CONTEXT\ meeting. After some polishing I decided to add this routine -to the standard \MKIV\ repertoire which then involved some proper interfacing.
- -You can enable the \LUA\ variant with the following command: - -\starttyping -\setuphyphenation[method=traditional] -\stoptyping - -We call this method \type {traditional} because in principle we can have -many more methods and this one is (supposed to be) mostly compatible with the -built-in method. This is a global setting. You can switch back with: - -\starttyping -\setuphyphenation[method=default] -\stoptyping - -In the next sections we will see how we can provide alternatives within the -traditional method. These alternatives can be set locally and therefore can operate -over a limited range of characters. - -One complication in interfacing is that \TEX\ has grouping (which permits local -settings) and we want to limit some of the above functionality using groups. At -the same time hyphenation is a paragraph related action so we need to enable the -hyphenation related code at a global level (or at least make sure that it gets -exercised by forcing a \type {\par}). That means that the alternative -hyphenator has to be quite compatible so that we could just enable it for a whole -document. This can have an impact on performance but in practice that can be -neglected. In \LUATEX\ the \LUA\ variant is 4~times slower than the built-in one, -in \LUAJITTEX\ it's 3~times slower. But the good news is that the amount of time -spent in the hyphenator is relatively small compared to other manipulations and -macro expansion. The additional time needed for loading and preparing the -patterns into a more \LUA\ specific format can be neglected. - -You can check how words get hyphenated using the patterns management script: - -\starttyping ->mtxrun --script patterns --hyphenate language - -hyphenator | -hyphenator | . l a n g u a g e . . l a n g u a g e .
-hyphenator | 0a2n0 0 0 2 0 0 0 0 0 0 -hyphenator | 2a0n0g0 0 2 2 0 0 0 0 0 0 -hyphenator | 0n1g0u0 0 2 2 1 0 0 0 0 0 -hyphenator | 0g0u4a0 0 2 2 1 0 4 0 0 0 -hyphenator | 2g0e0.0 0 2 2 1 0 4 2 0 0 -hyphenator | .0l2a2n1g0u4a2g0e0. . l a n-g u a g e . -hyphenator | -mtx-patterns | us 3 3 : language : lan-guage -\stoptyping - -\stopsection - -\startsection[title=The last words] - -Mid 2014 we had to upgrade a style for a \PDF\ assembly service: chapters from -(technical) school books are combined into arbitrary new books. There are some -nasty aspects with this flow: for instance, all section numbers in a chapter are -replaced by new numbers and this also involves figure and table prefixes. -It boils down to splitting up books, analyzing the typeset content and -preparing it for replacements. The structure is described in \XML\ files so that -we can generate tables of contents. The reason for not generating from \XML\ -sources is that the publisher doesn't have an \XML\ workflow and that books -already were available. Also, books from several series are combined and even -within a series structure (and rendering) differs. - -What has this to do with hyphenation? Writing a style for such a flow always -results in a more complex one than estimated and as usual it's in the details. -The original style was written in \MKII\ and used some box juggling to achieve -reasonable results but in \MKIV\ we can do better. - -Each chapter has a title and books get titles and subtitles as well. The titles -are typeset each time a new book is composed. This happens within some layout -constraints.
Think of constraints like these: - -\startitemize[packed] -\startitem the title goes on top of a shape that doesn't permit much overflow \stopitem -\startitem there can be very long words (not uncommon in Dutch or German) \stopitem -\startitem a short word or hyphenated part should not end up on the last line \stopitem -\startitem the left and right hyphenation minima are at least four \stopitem -\stopitemize - -The last requirement is a compromise because in most cases publishers seem to -want ragged right, non-hyphenated rendering (at least in Dutch schoolbooks). The -arguments for this are quite weak and probably originate in fear of bad rendering -given past experiences. It's this kind of situation that drives the development -of the more obscure features that ship with \CONTEXT\ and a (partial) solution -for this specific case will be given later. - -If you look at thousands of titles and turn these into (small) paragraphs \TEX\ -does a pretty good job. It's the few exceptions that we need to catch. The next -examples demonstrate such an extreme case.
- -\startbuffer[example] -\dorecurse{5} { % dejavu - \startlinecorrection[blank] - \bTABLE - \bTR - \bTD[align=middle,width=2em,foregroundstyle=bold] - #1 - \eTD - \bTD[align={verytolerant,flushleft},width=15em,offset=1ex] - \hsize \dimexpr11\emwidth-#1\dimexpr.5\emwidth\relax - \dontcomplain - \lefthyphenmin=4\righthyphenmin=4 - \blackrule[color=darkyellow,width=\hsize,height=-3pt,depth=5pt]\par - \begstrut\getbuffer[long]\endstrut\par - \eTD - \bTD[align={verytolerant,flushleft},width=15em,offset=1ex] - \sethyphenationfeatures[demo] - \hsize \dimexpr11\emwidth-#1\dimexpr.5\emwidth\relax - \dontcomplain - \blackrule[color=darkyellow,width=\hsize,height=-3pt,depth=5pt]\par - \begstrut\getbuffer[long]\endstrut\par - \eTD - \eTR - \eTABLE - \stoplinecorrection -} -\stopbuffer - -\definehyphenationfeatures - [demo] - [rightwords=1, - lefthyphenmin=4, - righthyphenmin=4] - -\startbuffer[long] -a verylongword and then anevenlongerword -\stopbuffer - -\starthyphenation[traditional] - \enabletrackers[hyphenator.visualize] - \getbuffer[example]\par - \disabletrackers[hyphenator.visualize] -\stophyphenation - -Of course in practice there needs to be some reasonable width and when we pose -these limits the longest possible word should fit into the allocated space. In -these examples the rule shows the width. In the right column we see a red -colored word and that one will not get hyphenated. - -\stopsection - -\startsection[title=Explicit hyphens] - -Another special case that we needed to handle were (compound) words with explicit -hyphens. Because often data comes from \XML\ files we cannot really control the -typesetting as in a \TEX\ document where the author sees what gets done. So here -we need a way to turn these hyphens into proper hyphenation directives and at the -same time permit the words to be hyphenated.
- -\definehyphenationfeatures - [demo] - [hyphens=yes, - lefthyphenmin=4, - righthyphenmin=4] - -\startbuffer[long] -a very-long-word and then an-even-longer-word -\stopbuffer - -\starthyphenation[traditional] - \enabletrackers[hyphenator.visualize] - \getbuffer[example]\par - \disabletrackers[hyphenator.visualize] -\stophyphenation - -\stopsection - -\startsection[title=Extended patterns] - -As with other opened up mechanisms, in \MKIV\ we can extend functionality. As an -example I have implemented the extensions discussed in the article by László -Németh in the Proceedings of Euro\TEX\ 2006: {\em Hyphenation in OpenOffice.org} -(TUGboat, Volume 27, 2006). The syntax for these extensions is somewhat ugly and -involves optional offsets and ranges. \footnote {I'm not sure if there were ever -patterns released that used this syntax.} - -\startbuffer -\registerhyphenationpattern[nl][e1ë/e=e] -\registerhyphenationpattern[nl][a9atje./a=t,1,3] -\registerhyphenationpattern[en][eigh1tee/t=t,5,1] -\registerhyphenationpattern[de][c1k/k=k] -\registerhyphenationpattern[de][schif1f/ff=f,5,2] -\stopbuffer - -\typebuffer \getbuffer - -These patterns result in the following hyphenations: - -\starthyphenation[traditional] - \switchtobodyfont[big] - \starttabulate[|||] - \NC reëel \NC \language[nl]\hyphenatedcoloredword{reëel} \NC \NR - \NC omaatje \NC \language[nl]\hyphenatedcoloredword{omaatje} \NC \NR - \NC eighteen \NC \language[en]\hyphenatedcoloredword{eighteen} \NC \NR - \NC Zucker \NC \language[de]\hyphenatedcoloredword{Zucker} \NC \NR - \NC Schiffahrt \NC \language[de]\hyphenatedcoloredword{Schiffahrt} \NC \NR - \stoptabulate -\stophyphenation - -In a specification, the \type {.} indicates a word boundary and numbers indicate -the weight of a breakpoint. The optional extended specification comes after the -\type {/}. The values separated by a \type {=} are the pre and post sequences: -these end up at the end of the current line and beginning of the next one.
The -optional numbers are the start position and length. These default to~1 and~2, so -in the first example they identify \type {eë} (the weights don't count). - -There is a pitfall here. When the language already has patterns that for -instance prohibit a hyphen between \type {e} and \type {ë}, like \type{e2ë}, we -need to make sure that we give our new one a higher priority, which is why we -used a \type{e9ë}. - -This feature is somewhat experimental and can be improved. Here is a more \LUA-ish -way of setting such patterns: - -\starttyping -local registerpattern = - languages.hyphenators.traditional.registerpattern - -registerpattern("nl","e1ë", { - start = 1, - length = 2, - before = "e", - after = "e", -} ) - -registerpattern("nl","a9atje./a=t,1,3") -\stoptyping - -Just adding extra patterns to an existing set without much testing is not wise. For -instance we could add these to the Dutch dictionary: - -\starttyping -\registerhyphenationpattern[nl][e3ë/e=e] -\registerhyphenationpattern[nl][o3ë/o=e] -\registerhyphenationpattern[nl][e3ï/e=i] -\registerhyphenationpattern[nl][i3ë/i=e] -\registerhyphenationpattern[nl][a5atje./a=t,1,3] -\registerhyphenationpattern[nl][toma8at5je] -\stoptyping - -That would work well for words like - -\starttyping -coëfficiënt -geïntroduceerd -copiëren -omaatje -tomaatje -\stoptyping - -However, the last word only goes right because we explicitly added a pattern -for it. One reason is that the existing patterns already contain rules to -prevent weird hyphenations. The same is true for the accented characters. So, -consider these examples and coordinate additional patterns with other users -so that errors can be identified. - -\stopsection - -\startsection[title=Exceptions] - -We have a variant on the \TEX\ primitive \type {\hyphenation}, the official way -to register a specific way to hyphenate a word.
- -\startbuffer -\registerhyphenationexception[aaaaa-bbbbb] -aaaaabbbbb \par -\stopbuffer - -\typebuffer - -This code is self explaining and results in: - -\blank - -\starthyphenation[traditional] -\setupindenting[no]\hsize 1mm \lefthyphenmin 1 \righthyphenmin 1 \getbuffer -\stophyphenation - -There can be multiple hyphens and even multiple words in such a specification: - -\startbuffer -\registerhyphenationexception[aaaaa-bbbbb cc-ccc-ddd-dd] -aaaaabbbbb \par -cccccddddd \par -\stopbuffer - -\typebuffer - -We get: - -\blank - -\starthyphenation[traditional] -\setupindenting[no]\hsize 1mm \lefthyphenmin 1 \righthyphenmin 1 \getbuffer -\stophyphenation - - -\stopsection - -\startsection[title=Boundaries] - -A box, rule, math or discretionary will end a word and prohibit hyphenation -of that word. Take this example: - -\startbuffer[demo] -whatever \par -whatever\hbox{!} \par -\vl whatever\vl \par -whatever$x$ \par -whatever-whatever \par -\stopbuffer - -\typebuffer[demo] - -These lines will hyphenate differently and in traditional \TEX\ you need to -insert penalties and|/|or glue to get around it. In the \LUA\ variant we can -enable that limitation. - -\startbuffer -\definehyphenationfeatures - [strict] - [rightedge=tex] -\stopbuffer - -\typebuffer \getbuffer - -Here we show the three variants: traditional \TEX\ and \LUA\ with and without -strict settings. - -\starttabulate[|p|p|p|] -\HL -\NC \ttbf \hbox to 11em{default\hss} -\NC \ttbf \hbox to 11em{traditional\hss} -\NC \ttbf \hbox to 11em{traditional strict\hss} -\NC \NR -\HL -\NC \starthyphenation[default] \hsize1mm \getbuffer[demo] \stophyphenation -\NC \starthyphenation[traditional] \hsize1mm \getbuffer[demo] \stophyphenation -\NC \starthyphenation[traditional] \sethyphenationfeatures[strict] - \hsize1mm \getbuffer[demo] \stophyphenation -\NC \NR -\HL -\stoptabulate - -By default \CONTEXT\ is configured to hyphenate words that start with an -uppercase character. 
This behaviour is controlled in \TEX\ by the \typ {\uchyph} -variable. A positive value enables this and a zero or negative one disables it. - -\starttabulate[|p|p|p|p|] -\HL -\NC \ttbf \hbox to 8em{default 0\hss} -\NC \ttbf \hbox to 8em{traditional 0\hss} -\NC \ttbf \hbox to 8em{default 1\hss} -\NC \ttbf \hbox to 8em{traditional 1\hss} -\NC \NR -\HL -\NC \starthyphenation[default] \hsize1mm \uchyph\zerocount TEXified \dontcomplain \stophyphenation -\NC \starthyphenation[traditional] \hsize1mm \uchyph\zerocount TEXified \dontcomplain \stophyphenation -\NC \starthyphenation[default] \hsize1mm \uchyph\plusone TEXified \dontcomplain \stophyphenation -\NC \starthyphenation[traditional] \hsize1mm \uchyph\plusone TEXified \dontcomplain \stophyphenation -\NC \NR -\HL -\stoptabulate - -The \LUA\ variant behaves the same as the built-in implementation (that of course -remains the reference). - -\stopsection - -\startsection[title=Plug-ins] - -The default hyphenator is similar to the built-in one, with a couple of -extensions as mentioned. However, you can plug in your own code, given that it -does return a proper hyphenation result. One reason for providing this plug is -that there are users who want to play with hyphenators based on a different -logic. In \CONTEXT\ we already have some methods to deal with languages that -(for instance) have no spaces but split on words or syllables. A tighter -integration with the hyphenator can have advantages so I will explore these -options when there is demand. - -A result table indicates where we can break a word. If we have a four character -word and can break after the second character, the result looks like this: - -\starttyping -result = { false, true, false, false } -\stoptyping - -Instead of \type {true} we can also have a table that has entries like the -extensions discussed in a previous section. Let's give an example of a -plug-in. 
- -\startbuffer -\startluacode - local subset = { - a = true, - e = true, - i = true, - o = true, - u = true, - y = true, - } - - languages.hyphenators.traditional.installmethod("test", - function(dictionary,word,n) - local t = { } - for i=1,#word do - local w = word[i] - if subset[w] then - t[i] = { - before = "<" .. w, - after = w .. ">", - left = false, - right = false, - } - else - t[i] = false - end - end - return t - end - ) -\stopluacode -\stopbuffer - -\typebuffer \getbuffer - -Here we hyphenate on vowels and surround them by angle brackets when -split over lines. This alternative is installed as follows: - -\startbuffer -\definehyphenationfeatures - [demo] - [alternative=test] -\stopbuffer - -\typebuffer \getbuffer - -We can now use it as follows: - -\starttyping -\setuphyphenation[method=traditional] -\sethyphenationfeatures[demo] -\stoptyping - -When applied to the tufte example we get: - -\startbuffer[demo] -\starthyphenation[traditional] - \setuptolerance[tolerant] - \sethyphenationfeatures[demo] - \dontleavehmode - \input tufte\relax -\stophyphenation -\stopbuffer - -\blank \startnarrower \getbuffer[demo] \stopnarrower \blank - -A more realistic (but not perfect) example is the following: - -\startbuffer -\startluacode - local packslashes = false - - local specials = { - ["!"] = "before", ["?"] = "before", - ['"'] = "before", ["'"] = "before", - ["/"] = "before", ["\\"] = "before", - ["#"] = "before", - ["$"] = "before", - ["%"] = "before", - ["&"] = "before", - ["*"] = "before", - ["+"] = "before", ["-"] = "before", - [","] = "before", ["."] = "before", - [":"] = "before", [";"] = "before", - ["<"] = "before", [">"] = "before", - ["="] = "before", - ["@"] = "before", - ["("] = "before", - ["["] = "before", - ["{"] = "before", - ["^"] = "before", ["_"] = "before", - ["`"] = "before", - ["|"] = "before", - ["~"] = "before", - -- - [")"] = "after", - ["]"] = "after", - ["}"] = "after", - } - - languages.hyphenators.traditional.installmethod("url", - 
function(dictionary,word,n) - local t = { } - local p = nil - for i=1,#word do - local w = word[i] - local s = specials[w] - if s == "after" then - s = { - start = 1, - length = 1, - after = w, - left = false, - right = false, - } - specials[w] = s - elseif s == "before" then - s = { - start = 1, - length = 1, - before = w, - left = false, - right = false, - } - specials[w] = s - end - if not s then - s = false - elseif w == p and w == "/" then - t[i-1] = false - end - t[i] = s - if packslashes then - p = w - end - end - return t - end - ) -\stopluacode -\stopbuffer - -\typebuffer \getbuffer - -Again we define a plug: - -\startbuffer -\definehyphenationfeatures - [url] - [characters=all, - alternative=url] -\stopbuffer - -\typebuffer \getbuffer - -So, we only break a line after symbols. - -\startlinecorrection[blank] - \starthyphenation[traditional] - \tt - \sethyphenationfeatures[url] - \scale[width=\hsize]{\hyphenatedcoloredword{http://www.pragma-ade.nl}} - \stophyphenation -\stoplinecorrection - -A quick test can look as follows: - -\startbuffer -\starthyphenation[traditional] - \sethyphenationfeatures[url] - \tt - \dontcomplain - \hsize 1mm - http://www.pragma-ade.nl -\stophyphenation -\stopbuffer - -\typebuffer - -Or: - -\getbuffer - -\stopsection - -\startsection[title=Blocking ligatures] - -Yet another predefined feature is the ability to block a ligature. In -traditional \TEX\ this can be done by putting a \type {{}} between -the characters, although that effect can get lost when the text is -manipulated. The natural way to do this in a \UNICODE\ environment -is to use the special characters \type {zwj} and \type {zwnj}. 
- -We use the following example lines: - -\startbuffer[sample] -supereffective \blank -superef\zwnj fective -\stopbuffer - -\typebuffer[sample] - -and define two featuresets: - -\startbuffer -\definehyphenationfeatures - [demo-1] - [characters=\zwnj\zwj, - joiners=yes] - -\definehyphenationfeatures - [demo-2] - [joiners=no] -\stopbuffer - -\typebuffer \getbuffer - -We limit the width to 1mm and get: - -\startlinecorrection[blank] -\bTABLE[option=stretch,offset=.5ex] - \bTR - \bTD \tx - \type{method=default} - \eTD - \bTD \tx - \type{method=traditional} - \eTD - \bTD \tx - \type{method=traditional}\par - \type{featureset=demo-1} - \eTD - \bTD \tx - \type{method=traditional}\par - \type{featureset=demo-2} - \eTD - \eTR - \bTR - \bTD - \hsize 1mm \dontcomplain - \starthyphenation[default] - \getbuffer[sample] - \stophyphenation - \eTD - \bTD - \hsize 1mm \dontcomplain - \starthyphenation[traditional] - \getbuffer[sample] - \stophyphenation - \eTD - \bTD - \hsize 1mm \dontcomplain - \starthyphenation[traditional] - \sethyphenationfeatures[demo-1] - \getbuffer[sample] - \stophyphenation - \eTD - \bTD - \hsize 1mm \dontcomplain - \starthyphenation[traditional] - \sethyphenationfeatures[demo-2] - \getbuffer[sample] - \stophyphenation - \eTD - \eTR -\eTABLE -\stoplinecorrection - -\stopsection - -\startsection[title=Special characters] - -The \type {characters} example can be used (to some extent) to do the -same as the breakpoints mechanism (compounds). 
- -\startbuffer -\definehyphenationfeatures - [demo-3] - [characters={()[]}] -\stopbuffer - -\typebuffer \blank \getbuffer \blank - -\startbuffer[demo] -\starthyphenation[traditional] - \sethyphenationfeatures[demo-3] - \dontcomplain - \hsize 1mm - we use (super)special(ized) patterns -\stophyphenation -\stopbuffer - -\typebuffer[demo] \blank \getbuffer[demo] \blank - -We can make this more clever by adding patterns: - -\startbuffer -\registerhyphenationpattern[en][)9] -\registerhyphenationpattern[en][9(] -\stopbuffer - -\typebuffer \blank \getbuffer \blank - -This gives: - -\blank \getbuffer[demo] \blank - -A detailed trace shows that these patterns get applied: - -\starthyphenation[traditional] - \ttx - \showhyphenationtrace[en][(super)special(ized)] -\stophyphenation - -\unregisterhyphenationpattern[en][)9] -\unregisterhyphenationpattern[en][9(] - -The somewhat weird hyphens at the edges will in practice not show up because -there is always one regular character there. - -\stopsection - -\startsection[title=Counting] - -There is not much you can do about patterns. It's a craft to make them and so -they are shipped with the distribution. In order to hyphenate well, \TEX\ looks -at some character properties. In \CONTEXT\ only the characters used in the -patterns of a language get tagged as valid in a word. - -The following example illustrates that there can be corner cases. In fact, this -example might render differently depending on the patterns available. First we -define an extra language, based on French. - -\startbuffer -\installlanguage[frf][default=fr,patterns=fr,factor=yes] -\stopbuffer - -\typebuffer \getbuffer - -Here we set the \type {factor} parameter which tells the loader that it should -look at the characters used in a special way: some count for none, and some count -for more than one when determining the min values used to determine if and where -hyphenation is to be applied. 
- -\startbuffer -\startmixedcolumns[n=3,balance=yes] - \hsize 1mm \dontcomplain - \language[fr] aesop oedipus æsop œdipus \column - \hsize 1mm \dontcomplain - \language[frf] aesop oedipus æsop œdipus \column - \startexceptions æ-sop \stopexceptions - \hsize 1mm \dontcomplain - \language[frf] aesop oedipus æsop œdipus -\stopmixedcolumns -\stopbuffer - -\typebuffer - -We get three (when writing this manual) different columns: - -\getbuffer - -The trick is in the \type {factor}: when set to \type {yes} an \type {æ} is -counted as two characters. Combining marks count as zero but you will not -find them being used as we already resolve them in an earlier stage. - -\startluacode -context.startcolumns { n = 2 } -context.starttabulate { "|Tc|c|c|l|" } -for u, data in table.sortedhash(languages.hjcounts) do - if data.category ~= "combining" then - context.NC() context("%05U",u) - context.NC() context("%c",u) - context.NC() context(data.count) - context.NC() context(data.category) - context.NC() context.NR() - end -end -context.stoptabulate() -context.stopcolumns() -\stopluacode - -It is very unlikely to find an \type {ffi} in the input and even an \type {ij} is -rare. The \type {æ} is marked as character and the \type {œ} as a ligature in -\UNICODE. Maybe all the characters here are dubious but at least we provide a -way to experiment with them. - -\stopsection - -\startsection[title=Tracing] - -Among the tracing options (low level trackers) there is one for pattern developers: - -\startbuffer -\usemodule[s-languages-hyphenation] - -\startcomparepatterns[de,nl,en,fr] - \input zapf \quad (\showcomparepatternslegend) -\stopcomparepatterns -\stopbuffer - -\typebuffer - -The different hyphenation points are shown with colored bars. Some valid points -might not be shown because the font engine can collapse successive -discretionaries. 
- -\getbuffer - -\stopsection - -\stopchapter - -\stopcomponent diff --git a/doc/context/sources/general/manuals/languages/languages-introduction.tex b/doc/context/sources/general/manuals/languages/languages-introduction.tex deleted file mode 100644 index 25bbb1a90..000000000 --- a/doc/context/sources/general/manuals/languages/languages-introduction.tex +++ /dev/null @@ -1,69 +0,0 @@ -% language=uk - -\startcomponent languages-introduction - -\environment languages-environment - -\startchapter[title=Introduction][color=darkgray] - -This document describes an important property of the \TEX\ typesetting system and -\CONTEXT\ in particular: the ability to deal with different languages at the same -time. With languages we refer to natural languages. So, we're not going to -discuss the \TEX\ language itself, not \METAPOST, nor \LUA. - -The original application of \TEX\ was English that uses the Latin script. The -fonts that came with \TEX\ were suitable for that usage. When lines became too -long they could be hyphenated using so called hyphenation patterns. Due to the -implementation for many years there was a close relationship between fonts and -hyphenation. Although at some point many more languages and scripts were -supported, it was only when the \UNICODE\ aware variants showed up that -hyphenation and fonts were decoupled. This makes it much easier to mix -languages that use different scripts. Although Greek, Cyrillic, Arabic, Chinese, -Japanese, Korean and other languages have been supported for a while using -(sometimes dirty) tricks, we now have cleaner implementations. - -We can hyphenate words in all languages (and scripts) that have a need for it, -that is, split it at the end of a line and add a symbol before and|/|or after the -break. The way words are broken into parts is called hyphenation and so called -patterns are used to achieve that goal. The way these patterns are constructed -and applied was part of the research related to \TEX\ development. 
The method -used is also applied in other programs and is probably one of the few popular -ways to deal with hyphenation. There have been ideas about extensions that cover -the demands of certain languages but so far nothing better has shown up. In the -end \TEX\ does a pretty decent job and more advanced tricks don't necessarily -lead to better results. - -Hyphenation is driven by a language number and that's about it. This means that -one cannot claim that \TEX\ in its raw form supports languages, other than that -it can hyphenate and use fonts that provide the glyphs. It's up to a macro package -to wrap this into a mechanism that provides the user an interface. So, when we -speak about language support, hyphenation is only one aspect. Labels, like the -\type {figure} in {\em figure~1.2} need to adapt to the main document language. -When dates are shown they can be language specific. Scientific units and math -function names can also be subjected to translation. Registers and other lists -have to be sorted according to specific rules. Spacing can differ per language. - -In this manual we will cover some of the functionality in \CONTEXT\ \MKIV\ that -relates to languages (and scripts). This manual is a complement to other manuals, -articles and documentation. Here we mostly focus on the language aspects. Some of -the content (or maybe most) might look alien and complex to you. This is because -one purpose of this manual is to provide a place to wrap up some aspects of -\CONTEXT. If you're not interested in that, just stick to the more general -manuals that also cover language aspects. - -\startnotabene - This document is still under construction. The functionality discussed here - will stay and more might show up. Of course there are errors, and they're all - mine. The text is not checked for spelling errors. Feel free to let me know - what should get added. 
-\stopnotabene - -\startlines -Hans Hagen -PRAGMA ADE, Hasselt NL -2013 \emdash\ 2016 -\stoplines - -\stopchapter - -\stopcomponent diff --git a/doc/context/sources/general/manuals/languages/languages-labels.tex b/doc/context/sources/general/manuals/languages/languages-labels.tex deleted file mode 100644 index 6c806ab1c..000000000 --- a/doc/context/sources/general/manuals/languages/languages-labels.tex +++ /dev/null @@ -1,141 +0,0 @@ -% language=uk - -\startcomponent languages-labels - -\environment languages-environment - -\startchapter[title=Labels][color=darkcyan] - -\startsection[title=Introduction] - -When we started using \TEX, I naturally started with plain \TEX. But it didn't -take long before we tried \LATEX. Because our documents were in Dutch one of the -first fights with this package was to get rid of the English labels. Because -rather soon we decided to cook up an alternative package, a decent label -mechanism was one of the first things to show up. And as soon as multiple -language typesetting gets into view, such a mechanism becomes one of those -language dependent features. In this chapter the basics will be covered. - -\stopsection - -\startsection[title=Defining labels] - -Before we define a label we need to define a label class. You probably seldom -need that but this is how it's done: - -\startbuffer -\definelabelclass [mylabel] -\stopbuffer - -\typebuffer \getbuffer - -There are some classes predefined: - -\starttabulate[|lB|l|] - \NC head \NC (complete) titles like \headtext {chapter} and \headtext {figure} \NC \NR - \NC label \NC in||text labels like \labeltext {chapter} and \labeltext {figure} \NC \NR - \NC mathlabel \NC function names like \mathlabeltext{sin} and \mathlabeltext{cos} \NC \NR - \NC taglabel \NC labels used for tagging purposes in the backend \NC \NR - \NC btxlabel \NC labels used in typesetting bibliographic items \NC \NR -\stoptabulate - -The physical units mechanism also uses labels: unit, operator, prefix and suffix. 
-All these labels are defined per language with a fall back on English. - -Given that we have defined class \type {mylabel}, a label itself is set like -this: - -\startbuffer -\setupmylabeltext - [en] - [first={<after first}, - second={{before second>},{<after second}}] -\stopbuffer - -\typebuffer \getbuffer - -The first argument (the language) is optional. In the next section we will see -how these labels are used. A lot of labels are predefined, in \MKIV\ this happens -in the file \type {lang-txt.lua}. There is no need to adapt this file as you can -always add labels at run time. - -\stopsection - -\startsection[title=Using labels] - -How a label is called depends on the way it needs to be used. In any case the -main language set determines the language of the label. So, when in a Dutch text -we temporarily switch to German, the Dutch labels are used. - -\startbuffer -\starttabulate[||||] - \NC \bf command \NC \ttbf first \NC \ttbf {second} \NC \NR - \HL - \NC \type {\leftmylabeltext {tag}} \NC \leftmylabeltext {first} \NC \leftmylabeltext {second} \NC \NR - \NC \type {\rightmylabeltext{tag}} \NC \rightmylabeltext{first} \NC \rightmylabeltext{second} \NC \NR - \NC \type {\mylabeltext {tag}} \NC \mylabeltext {first} \NC \mylabeltext {second} \NC \NR - \NC \type {\mylabeltexts {tag}{text}} \NC \mylabeltexts {first}{text} \NC \mylabeltexts {second}{text} \NC \NR -\stoptabulate -\stopbuffer - -\getbuffer - -\stopsection - -\startsection[title=Hooks] - -Some mechanisms have label support built in, most noticeably section -heads and numbered items, like figure captions. 
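For numbered items the regular label class is used. As a small sketch (the
abbreviated text is just an example value, and the framed box only stands in
for a real figure), the label that prefixes figure captions can be changed like
this:

\starttyping
% replace the default English caption label for figures
\setuplabeltext [en] [figure={Fig.~}]

\placefigure {A demo caption.} {\framed{demo}}
\stoptyping

Section heads are hooked into the label mechanism in a similar way: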
- -\startbuffer -\definehead - [myhead] - [subsection] - -\setuphead - [myhead] - [bodypartlabel=bodypartmyhead] - -\setuplabeltext - [en] - [bodypartmyhead=My Head: ] - -\myhead{Welcome} -\stopbuffer - -\typebuffer \getbuffer - -The head text label class can be used as follows: - -\startbuffer -\setupheadtext - [SomeHead=Just A Title] - -\subsection - [title=\headtext{SomeHead}] -\stopbuffer - -\typebuffer \getbuffer - -A label will obey the style settings, as in: - -\startbuffer -\definehead - [MyFancyHead] - [subsection] - [style={\bs\setcharactercasing[Words]}] - -\setupheadtext - [SomeHead=just another title] - -\MyFancyHead - [title=\headtext{SomeHead}] -\stopbuffer - -\typebuffer \getbuffer - -\stopsection - -\stopchapter - -\stopcomponent diff --git a/doc/context/sources/general/manuals/languages/languages-mkiv.tex b/doc/context/sources/general/manuals/languages/languages-mkiv.tex deleted file mode 100644 index 98ca84814..000000000 --- a/doc/context/sources/general/manuals/languages/languages-mkiv.tex +++ /dev/null @@ -1,61 +0,0 @@ -% language=uk - -% author : Hans Hagen -% copyright : PRAGMA ADE & ConTeXt Development Team -% license : Creative Commons Attribution ShareAlike 4.0 International -% reference : pragma-ade.nl | contextgarden.net | texlive (related) distributions -% origin : the ConTeXt distribution -% -% comment : Because this manual is distributed with TeX distributions it comes with a rather -% liberal license. We try to adapt these documents to upgrades in the (sub)systems -% that they describe. Using parts of the content otherwise can therefore conflict -% with existing functionality and we cannot be held responsible for that. Many of -% the manuals contain characteristic graphics and personal notes or examples that -% make no sense when used out-of-context. - -\enablemode[simple] % ,oversized - -\startbuffer[abstract] - - This book explains how we support languages (and scripts) in \CONTEXT\ \MKIV\ - and \LUATEX. 
Some of the mechanisms discussed are generic and not \CONTEXT\ - specific. We discuss the way languages are dealt with in the engine, - hyphenation, standard features and additional goodies. Tracing and the - extensibility of code are also discussed. - -\stopbuffer - -\environment languages-environment - -\startdocument - [author=Hans Hagen, - title=Languages in \ConTeXt, - subtitle=explaining luatex and mkiv, - affiliation=PRAGMA ADE, - comment=work in progress, - cover:color:1=darkmagenta, - cover:color:2=darkorange, - cover:color:3=darkyellow, - cover:color:4=darkcyan, - cover:color:5=darkgray] - - \startfrontmatter - \component manuals-explaining-contents - \component languages-introduction - \stopfrontmatter - - \startbodymatter - \component languages-basics - \component languages-hyphenation - \component languages-labels - \component languages-numbering - \component languages-typesetting - \component languages-goodies - \component languages-sorting - \stopbodymatter - - \startappendices - \component languages-appendix - \stopappendices - -\stopdocument diff --git a/doc/context/sources/general/manuals/languages/languages-numbering.tex b/doc/context/sources/general/manuals/languages/languages-numbering.tex deleted file mode 100644 index 3464826df..000000000 --- a/doc/context/sources/general/manuals/languages/languages-numbering.tex +++ /dev/null @@ -1,307 +0,0 @@ -% language=uk - -\startcomponent languages-numbering - -\environment languages-environment - -\startchapter[title=Numbering][color=darkgreen] - -\startsection[title=Introduction] - -Numbering is complex and in \CONTEXT\ it's not easy either. This is because we -not only have 1, 2, 3 \unknown\ but also sub numbers like 1a, 1b, 1c \unknown\ -or 1.a, 1.b, 1.c \unknown\ There can be many levels, different separators, final -symbols. As we're talking languages we only discuss conversion here: the -mechanism that turns a number into, for instance, a letter. 
It happens that the -mapping from a number onto a letter is language dependent. The next lines show -the English, Spanish and Slovenian alphabetic numbers: - -\blank -\startpacked -\startcolor[maincolor] -\dontleavehmode - {\ttbf\mainlanguage[en]\dorecurse{28} - {\hbox to 1.5em{\convertnumber{alphabetic}{#1}\hss}}}\par -\dontleavehmode - {\ttbf\mainlanguage[es]\dorecurse{28} - {\hbox to 1.5em{\convertnumber{alphabetic}{#1}\hss}}}\par -\dontleavehmode - {\ttbf\mainlanguage[sl]\dorecurse{28} - {\hbox to 1.5em{\convertnumber{alphabetic}{#1}\hss}}}\par -\stopcolor -\stoppacked -\blank - -You convert a number into a letter with: - -\starttyping -\convertnumber{alphabetic}{15} -\stoptyping - -There is also \type {\uconvertnumber} which does not expand unless typesetting -is going on. Normally you don't need to bother about this. - -The \type {alphabetic} converter adapts to the current main language. When a -language has no special alphabet, the regular 26 characters are used. - -A converter can also convert to a roman numeral, a language specific ordered -list, a day or month, an ordinal string and again there can be a language -specific conversion. The general conversion macro takes a conversion name and -a number. When a conversion can be set (for instance in an itemized list, or in -section numbering) you can use these names. You can define additional -converters if needed, as long as the converter can handle a number. - -\starttyping -\defineconversion [alphabetic] [\alphabeticnumerals] -\stoptyping - -Here \type {\alphabeticnumerals} is a converter. If you look into the source of -\CONTEXT\ you will see that many converters are calling out to \LUA, where we -have implemented those specific conversions. The following table has long and -short names. The short ones are historic. 
- -\starttabulate -\FL -\NC month \NC \type {\monthlong} \NC \NR -\NC month:mnem \NC \type {\monthshort} \NC \NR -\ML -\NC character \NC \type {\character} \NC \NR -\NC Character \NC \type {\Character} \NC \NR -\NC characters \NC \type {\characters} \NC \NR -\NC Characters \NC \type {\Characters} \NC \NR -\ML -\NC AK \NC \type {\smallcappedcharacters} \NC \NR -\NC KA \NC \type {\smallcappedcharacters} \NC \NR -\ML -\NC alphabetic a \NC \type {\alphabeticnumerals} \NC \NR -\NC Alphabetic A \NC \type {\Alphabeticnumerals} \NC \NR -\ML -\NC number numbers n \NC \type {\numbers} \NC \NR -\NC Numbers N \NC \type {\Numbers} \NC \NR -\NC mediaeval m \NC \type {\mediaeval} \NC \NR -\ML -\NC word words \NC \type {\verbosenumber} \NC \NR -\NC Word Words \NC \type {\VerboseNumber} \NC \NR -\ML -\NC ordinal \NC \type {\ordinalnumber} \NC \NR -\NC Ordinal \NC \type {\Ordinalnumber} \NC \NR -\ML -\NC romannumerals i r \NC \type {\romannumerals} \NC \NR -\NC Romannumerals I R \NC \type {\Romannumerals} \NC \NR -\ML -\NC o \NC \type {\oldstylenumerals} \NC \NR -\NC O \NC \type {\oldstylenumerals} \NC \NR -\NC or \NC \type {\oldstyleromannumerals} \NC \NR -\ML -\NC KR \NC \type {\smallcappedromannumerals} \NC \NR -\NC RK \NC \type {\smallcappedromannumerals} \NC \NR -\ML -\NC greek g \NC \type {\greeknumerals} \NC \NR -\NC Greek G \NC \type {\Greeknumerals} \NC \NR -\NC mathgreek \NC \type {\mathgreek} \NC \NR -\ML -\NC abjadnumerals \NC \type {\abjadnumerals} \NC \NR -\NC abjadnodotnumerals \NC \type {\abjadnodotnumerals} \NC \NR -\NC abjadnaivenumerals \NC \type {\abjadnaivenumerals} \NC \NR -\ML -\NC thainumerals \NC \type {\thainumerals} \NC \NR -\NC devanagarinumerals \NC \type {\devanagarinumerals} \NC \NR -\NC gurmurkhinumerals \NC \type {\gurmurkhinumerals} \NC \NR -\NC gujaratinumerals \NC \type {\gujaratinumerals} \NC \NR -\NC tibetannumerals \NC \type {\tibetannumerals} \NC \NR -\NC greeknumerals \NC \type {\greeknumerals} \NC \NR -\NC Greeknumerals \NC \type 
{\Greeknumerals} \NC \NR -\NC arabicnumerals \NC \type {\arabicnumerals} \NC \NR -\NC persiannumerals \NC \type {\persiannumerals} \NC \NR -\NC arabicexnumerals \NC \type {\arabicexnumerals} \NC \NR -\NC arabicdecimals \NC \type {\arabicdecimals} \NC \NR -\NC persiandecimals \NC \type {\persiandecimals} \NC \NR -\ML -\NC koreannumerals kr \NC \type {\koreannumerals} \NC \NR -\NC koreanparenthesisnumerals kr-p \NC \type {\koreanparenthesisnumerals} \NC \NR -\NC koreancirclenumerals kr-c \NC \type {\koreancirclenumerals} \NC \NR -\ML -\NC chinesenumerals cn \NC \type {\chinesenumerals} \NC \NR -\NC chinesecapnumerals cn-c \NC \type {\chinesecapnumerals} \NC \NR -\NC chineseallnumerals cn-a \NC \type {\chineseallnumerals} \NC \NR -\ML -\NC sloveniannumerals \NC \type {\sloveniannumerals} \NC \NR -\NC slovenianNumerals \NC \type {\slovenianNumerals} \NC \NR -\ML -\NC spanishnumerals \NC \type {\spanishnumerals} \NC \NR -\NC spanishNumerals \NC \type {\spanishNumerals} \NC \NR -\LR -\stoptabulate - -The \type {alphabetic} and \type {Alphabetic} converters adapt to Slovenian and -Spanish as do their small capped alternatives. There are more general helpers for it -too: - -\starttyping -\languagecharacters{number} -\languageCharacters{number} -\stoptyping - -Also language related is the \type {\continuednumber} macro. Here we see an -application: - -\startbuffer -1 \continuednumber{1} -1, 2 \continuednumber{2} -1, 2, 3 \continuednumber{3} -\stopbuffer - -\typebuffer - -This renders as: - -\startlines[color=maincolor] -\getbuffer -\stoplines - -Such a macro is typically used in combination with counters and it just typesets -a label text depending on the value being non|-|zero. 
- -\startbuffer -\setuplabeltext[en][continued={and so on}] -1, 2, 3 \continuednumber{3} -1, 2, 3 \convertnumber{continued}{3} -\stopbuffer - -\typebuffer - -This gives: - -\startlines[color=maincolor] -\getbuffer -\stoplines - -In the rare case that you want to check if a conversion is defined you can use - -\starttyping -\doifelseconversiondefined{name}{true}{false} -\stoptyping - -So, - -\startbuffer -\doifelseconversiondefined{characters}{we can convert}{forget about it} -\stopbuffer - -\typebuffer - -Gives: - -\startlines[color=maincolor] -\getbuffer -\stoplines - -There are also some non language related converters that we mention here for -completeness: - -\blank -\type {set 0}: \startcolor[maincolor]\dorecurse{20}{\convertnumber{set 0}{#1} }\stopcolor\par -\type {set 1}: \startcolor[maincolor]\dorecurse{20}{\convertnumber{set 1}{#1} }\stopcolor\par -\type {set 2}: \startcolor[maincolor]\dorecurse{20}{\convertnumber{set 2}{#1} }\stopcolor\par -\type {set 3}: \startcolor[maincolor]\dorecurse{20}{\convertnumber{set 3}{#1} }\stopcolor\par -\blank - -When a set overruns we start again at the first element. - -The ordinal converter produces output like \color [maincolor] {\convertnumber -{ordinal}{123}} and \color [maincolor] {\convertnumber {ordinal}{654}}. The -corresponding string renderer is \type {\highordinalstr}. - -% quite limited currently op not documented here: -% -% \wordtonumber{two}{3} -% \wordtonumber{fivethousand}{unknown} - -\stopsection - -\startsection[title=Dates] - -Dates are also language dependent. The following macros take a number and return -the name of the month or day. 
- -\starttabulate -\NC \type {\monthlong } \NC \monthlong {10} \NC \NR -\NC \type {\monthshort} \NC \monthshort{10} \NC \NR -\NC \type {\MONTH } \NC \MONTH {10} \NC \NR -\NC \type {\MONTHLONG } \NC \MONTHLONG {10} \NC \NR -\NC \type {\MONTHSHORT} \NC \MONTHSHORT{10} \NC \NR -\NC \type {\weekday } \NC \weekday {5} \NC \NR -\NC \type {\WEEKDAY } \NC \WEEKDAY {5} \NC \NR -\stoptabulate - -The current date can be typeset with \type {\currentdate} and a -specific date with \type {\date}, for instance: - -\startbuffer -\currentdate[weekday,day,month,year] -\currentdate[WEEKDAY,day,MONTH,year] -\date[d=12,m=12,y=1998][weekday] -\date[d=12,m=12,y=1998] -\stopbuffer - -\typebuffer - -\startlines[color=maincolor] -\getbuffer -\stoplines - -Possible elements of the specification are: - -\starttabulate -\FL -\NC + ord \NC ordinal suffix \NC \NR -\NC ++ highord \NC high ordinal suffix \NC \NR -\ML -\NC mnem: \NC mnemonic prefix \NC \NR -\ML -\NC Y y year \NC year 4 digits \NC \NR -\NC yy \NC year 2 digits \NC \NR -\ML -\NC M \NC month 1 or 2 digits \NC \NR -\NC mm \NC month 2 digits \NC \NR -\ML -\NC D \NC day 1 or 2 digits \NC \NR -\NC dd \NC day 2 digits \NC \NR -\ML -\NC W \NC 1 digit \NC \NR -\ML -\NC month m \NC language dependent (can be mnemonic) \NC \NR -\NC day d \NC language dependent \NC \NR -\NC weekday w \NC language dependent \NC \NR -\ML -\NC MONTH \NC month uppercased \NC \NR -\NC WEEKDAY \NC weekday uppercased \NC \NR -\ML -\NC referral \NC YYYYMMDD \NC \NR -\ML -\NC space \\ \NC space \NC \NR -\NC <word> \NC word \NC \NR -\LL -\stoptabulate - -There are also some converters built in (more can be added), for instance: - -\startbuffer -The current {\em gregorian} date \currentdate [month, day, {, }, year] is -in {\em jalali} \currentdate [jalali:to, month, day, {, }, year] but we -can also ask for a specific one, so {\em jalali} \date [y=1395, m=4, d=18] -[month, day, {, }, year] is {\em gregorian} \date [y=1395, m=4, d=18] -[jalali:from, month, day, {, }, year]. 
-\stopbuffer - -\typebuffer \startnarrower \getbuffer \stopnarrower - -\stopsection - -% \startsection[title=Counters] -% -% \stopsection - -\stopchapter - -\stopcomponent diff --git a/doc/context/sources/general/manuals/languages/languages-sorting.tex b/doc/context/sources/general/manuals/languages/languages-sorting.tex deleted file mode 100644 index ce24f385c..000000000 --- a/doc/context/sources/general/manuals/languages/languages-sorting.tex +++ /dev/null @@ -1,235 +0,0 @@ -% language=uk - -\startcomponent languages-sorting - -\environment languages-environment - -\startchapter[title=Sorting][color=darkblue] - -\startsection[title=Introduction] - -Sorting is complex, not so much for English, Dutch, German, etc. only texts but -there are languages and scripts that are more demanding. There are several -complications: - -\startitemize - - \startitem - There can be characters that have accents, like à, á, â, ã, ä - \unknown\ that have a base shape a and in an index these often end up - close to each other. The order can differ per language. - \stopitem - - \startitem - There are upper and lowercase words and there can be different - expectations to them being mixed or separated. - \stopitem - \startitem - Some scripts have characters that are combinations, like Æ, and - one might want to see them as one character or two, in which the - second one obeys the sorting order. The shape can dominate here. - \stopitem - \startitem - Some scripts, like Japanese, are a combination of several scripts - and sorting then depends on normalization. - \stopitem - \startitem - When there are many glyphs, like in Chinese, the order can depend - on the complexity of the glyph and when we're lucky that order is - reflected in the numeric character order. - \stopitem -\stopitemize - -Often the rules are somewhat strict and one can doubt if the same rules would -have been imposed if computers had been developed earlier.
Given discussions one -can doubt if the rules are really consistent or just there because someone (or a -group) with influence set the standard (not so much different from grammar). So, -if we deal with sorting, we do that in such a way that users can (to some extent) -influence the outcome. After all, one important aspect of typesetting and -organizing content is that the user gets the feeling of control and a diversion -from a standard can be part of that. The reader will often not notice these -details. In the next sections we will explore the way sorting is done in -\CONTEXT. The method evolved over a few decades. In \MKII\ sorting happened -between runs and it was just part of the processing of a document that users -never really saw in action. Sorting just happened and few users will have noticed -that we moved from a \MODULA\ program to a \PERL\ script and ended up with a -\RUBY\ script. In fact, there is a \LUA\ replacement but it never got tested well -because we moved on to \MKIV. There everything happens inside the engine using \LUA. -Some principles stayed the same but we are more flexible now. - -\stopsection - -\startsection[title=How it works] - -How does sorting work out? Take these words: - -\startlines -abracadabra -abräcàdábra -àbracádabrä -ábracadàbra -äbrácadabrà -\stoplines - -As long as they end up in an order where the reader can find them, we're okay. -After all we're pretty good at pattern recognition. - -There are probably many ways to implement a sorter but the one we use is more or -less a follow up on the one we had for over a decade and was the result of an -evolution based on user demand. It boils down to cleaning up the string in such a -way that it can be split into meaningful characters. One can argue that we should -use some kind of standardized sorting method but the problem is that we always have -to deal with for instance embedded \TEX\ commands and mixed content, for instance -numbers.
And users using the same language can have different opinions about the -rules too. - -A word (or sequence of words) is split into characters. Because there can be -\TEX\ commands in there some cleanup happens beforehand. After that we create -several lists with numbers that will be compared when sorting two entries. - -\startluacode - --- local ignoredoffset = sorters.constants.ignoredoffset --- local replacementoffset = sorters.constants.replacementoffset --- local digitsoffset = sorters.constants.digitsoffset --- local digitsmaximum = sorters.constants.digitsmaximum - -local context = context - -local utfchar = utf.char -local utfyte = utf.byte -local concat = table.concat -local gsub = string.gsub -local formatters = string.formatters - -local f_char = formatters["%s"] -local f_byte = formatters["x%02X"] - -local meaning = { - ch = "raw character", - mm = "minus mapping", - zm = "zero mapping", - pm = "plus mapping", - mc = "lowercase - 1", - zc = "lowercase", - pc = "lowercase + 1", - uc = "unicode", -} - -local function show(s,key,bodyfont) - local c = s[key] - local t = { } - for i=1,#c do - local ci = c[i] - if type(ci) == "string" then - t[i] = f_char(ci) - else - t[i] = f_byte(ci) - end - end - t = concat(t,"~") - context.NC() context.maincolor() context(key) - context.NC() context.maincolor() context(meaning[key]) - context.NC() if bodyfont then context.switchtobodyfont{bodyfont} end context(t) - context.NC() context.NR() -end - -function document.ShowSortSplit(str,language,bodyfont) - sorters.setlanguage(language or "en") - local s = sorters.splitters.utf(str) - context.starttabulate{ "|Tl|Tlj2|Tp|" } - context.FL() - context.NC() - context.NC() context.maincolor() context(language) - context.NC() if bodyfont then context.switchtobodyfont{bodyfont} end context.maincolor() context(str) - context.NC() context.NR() - context.ML() - show(s,"ch",bodyfont) - show(s,"uc") - show(s,"zc") - show(s,"mc") - show(s,"pc") - show(s,"zm") - show(s,"mm") - show(s,"pm") 
- context.LL() - context.stoptabulate() -end - -\stopluacode - -We can best demonstrate this with a few examples. As usual an English language -example is trivial. - -\ctxlua{document.ShowSortSplit("abracadabra","en")} - -When we add an uppercase character we get a slightly different outcome: - -\ctxlua{document.ShowSortSplit("Abracadabra","en")} - -Some characters will be split, like \type {æ}: - -\ctxlua{document.ShowSortSplit("æsop","en")} - -It gets more complex when language specific demands kick in. Compare an English, German -and Austrian split: - -\ctxlua{document.ShowSortSplit("Abräcàdábra","en")} -\ctxlua{document.ShowSortSplit("Abräcàdábra","de")} -\ctxlua{document.ShowSortSplit("Abräcàdábra","de-at")} - -The way a character gets replaced, like \type {ä} into \type {ae}, is defined in -\type {sort-lan.lua} using \LUA\ tables. We will not explain all the obscure -details here; most of the work is already done, so users are not bothered by -these definitions. And new ones can often be made by copying and adapting an -existing one. - -The sorting itself is specified by a sequence: - -\starttabulate[|TlCT{maincolor}|Tl|] -\NC default \NC zc,pc,zm,pm,uc \NC \NR -\NC before \NC mm,mc,uc \NC \NR -\NC after \NC pm,mc,uc \NC \NR -\NC first \NC pc,mm,uc \NC \NR -\NC last \NC mc,mm,uc \NC \NR -\stoptabulate - -The raw character is what we get after the (language specific) replacement has -been applied and the unicodes are used when comparing. Lowercasing is done using -the \UNICODE\ lowercase code, but one can define language specific ones too. The -plus and minus variants can be used to force lowercase before or after uppercase. -The mapping is based on an alphabet specification so this can differ per language -and again we also provide plus and minus values that depend on case. When a -character has no case we use shapes instead. For instance, the shape of \type -{à} is \type {a}.
Digits are treated specially and currently get an offset so that -they end up last in the sort order. - -\defineregister[jindex] - -\startbuffer -ぱあ \jindex{ぱあ} -ぱー \jindex{ぱー} -ぱぁ \jindex{ぱぁ} -\stopbuffer - -{\switchtobodyfont[ipaex]\startlines\typebuffer\stoplines} - -This three-entry index\jindex{ぱあ}\jindex{ぱー}\jindex{ぱぁ} should be sorted in the order: -{\switchtobodyfont[ipaex]\ruledhbox{ぱー}\enspace\ruledhbox{ぱぁ}\enspace\ruledhbox{ぱあ}}. - -{\mainlanguage[jp]\switchtobodyfont[ipaex]\placeregister[jindex][language=jp,n=1,method=default]} -{\mainlanguage[jp]\switchtobodyfont[ipaex]\placeregister[jindex][language=jp,n=1,method=zm]} - -\ctxlua{document.ShowSortSplit("ぱあ","jp","ipaex")} -\ctxlua{document.ShowSortSplit("ぱー","jp","ipaex")} -\ctxlua{document.ShowSortSplit("ぱぁ","jp","ipaex")} - -{\em To be continued!} - -\stopsection - -% ぱー $\prec$ ぱぁ $\prec$ ぱあ - -\stopchapter - -\stopcomponent