%D \module
%D   [       file=lang-mis,
%D        version=1997.03.20, % used to be supp-lan.tex
%D          title=\CONTEXT\ Language Macros,
%D       subtitle=Compounds,
%D         author=Hans Hagen,
%D           date=\currentdate,
%D      copyright={PRAGMA ADE \& \CONTEXT\ Development Team}]
%C
%C This module is part of the \CONTEXT\ macro||package and is
%C therefore copyrighted by \PRAGMA. See mreadme.pdf for
%C details.

\writestatus{loading}{ConTeXt Language Macros / Compounds}

%D More or less replaced.

%D \gdef\starttest#1\stoptest{\starttabulate[|l|l|p|]#1\stoptabulate}
%D \gdef\test            #1{\NC\detokenize{#1}\NC\hyphenatedword{#1}\NC#1\NC\NR}

\unprotect

%D One of \TEX's strong points in building paragraphs is the way
%D hyphenation is handled. Although really good hyphenation of
%D non||English languages needs some extensions to the program,
%D fairly good results can be reached with the standard
%D mechanisms and an additional macro, at least in Dutch.

%D \CONTEXT\ originates in the wish to typeset educational
%D materials, especially in a technical environment. In
%D production oriented environments, a lot of compound words
%D are used. Because the Dutch language poses no limits on
%D combining words, we often favor putting dashes between those
%D words, because it facilitates reading, at least for those
%D who are not that accustomed to it.
%D
%D In \TEX\ compound words, separated by a hyphen, are not
%D hyphenated at all. In spite of the multiple pass paragraph
%D typesetting this can lead to parts of words sticking into
%D the margin. The solution lies in saying \type
%D {spoelwater||terugwinunit} instead of \type
%D {spoelwater-terugwinunit}. By using a one character command
%D like \type {|}, delimited by the same character \type {|},
%D we get ourselves both a decent visualization (in \TEXEDIT\
%D and colored verbatim we color these commands yellow) and an
%D efficient way of combining words.
%D
%D The sequence \type{||} simply leads to two words connected by
%D a hyphen.
%D Because we want to distinguish such a hyphen from
%D the one inserted when \TEX\ hyphenates a word, we use a
%D slightly longer one.
%D
%D \hyphenation {spoel-wa-ter te-rug-win-unit}
%D
%D \starttest
%D \test {spoelwater||terugwinunit}
%D \stoptest
%D
%D As we already said, the \type{|} is a command. This command
%D accepts an optional argument before its delimiter, which is
%D also a \type{|}.
%D
%D \hyphenation {po-ly-meer che-mie}
%D
%D \starttest
%D \test {polymeer|*|chemie}
%D \stoptest
%D
%D Arguments like \type{*} are not interpreted and inserted
%D directly, in contrast to arguments like:
%D
%D \starttest
%D \test {polymeer|~|chemie}
%D \test {|(|polymeer|)|chemie}
%D \test {polymeer|(|chemie|)| }
%D \stoptest
%D
%D Although such situations seldom occur |<|we typeset thousands
%D of pages before we encountered one that forced us to enhance
%D this mechanism|>| we also have to take care of commas.
%D
%D \hyphenation {uit-stel-len}
%D
%D \starttest
%D \test {op||, in|| en uitstellen}
%D \stoptest
%D
%D The next special case (concerning quotes) was brought to my
%D attention by Piet Tutelaers, one of the driving forces
%D behind rebuilding hyphenation patterns for the Dutch
%D language.\footnote{In 1996 the spelling of the Dutch
%D language was slightly reformed, which made this topic
%D relevant again.} We'll also take care of this case.
%D
%D \starttest
%D \test {AOW|'|er}
%D \test {cd|'|tje}
%D \test {ex|-|PTT|'|er}
%D \test {rock|-|'n|-|roller}
%D \stoptest
%D
%D Tobias Burnus pointed out that I should also support
%D something like
%D
%D \starttest
%D \test {well|_|known}
%D \stoptest
%D
%D to stress the compoundness of hyphenated words.
%D
%D Of course we also have to take care of the special case:
%D
%D \starttest
%D \test {text||color and ||font}
%D \stoptest

%D \macros
%D   {installdiscretionaries}
%D
%D The mechanism described here is one of the older inner parts
%D of \CONTEXT.
%D The most recent extensions concern some
%D special cases as well as the possibility to install other
%D characters as delimiters. The preferred way of specifying
%D compound words is using \type{||}, which is installed by:
%D
%D \starttyping
%D \installdiscretionary | -
%D \stoptyping
%D
%D Some alternative definitions are:
%D
%D \startbuffer
%D \installdiscretionary * -
%D \installdiscretionary + -
%D \installdiscretionary / -
%D \installdiscretionary ~ -
%D \stopbuffer
%D
%D \typebuffer
%D
%D after which we can say:
%D
%D \start \getbuffer
%D \starttest
%D \test {test**test**test}
%D \test {test++test++test}
%D \test {test//test//test}
%D \test {test~~test~~test}
%D \stoptest
%D \stop

%D \macros
%D   {compoundhyphen,
%D    beginofsubsentence,endofsubsentence}
%D
%D Now let's go to the macros. First we define some variables.
%D In the main \CONTEXT\ modules these can be tuned by a setup
%D command. Watch the (maybe) better looking compound hyphen.

\ifx\compoundhyphen    \undefined \unexpanded\def\compoundhyphen    {\hbox{-\kern-.25ex-}} \fi
\ifx\beginofsubsentence\undefined \unexpanded\def\beginofsubsentence{\hbox{\emdash}} \fi
\ifx\endofsubsentence  \undefined \unexpanded\def\endofsubsentence  {\hbox{\emdash}} \fi

%D The last two variables are needed for subsentences
%D |<|like this one|>| which we have not mentioned yet.
%D
%D We want to enable breaking but at the same time don't want
%D compound characters like |-| or || to be separated from the
%D words.
%D \TEX\ hackers will recognise the next two macros:

\ifx\prewordbreak \undefined \unexpanded\def\prewordbreak {\penalty\plustenthousand\hskip\zeropoint\relax} \fi
\ifx\postwordbreak\undefined \unexpanded\def\postwordbreak {\penalty\zerocount \hskip\zeropoint\relax} \fi
\ifx\hspaceamount \undefined \def\hspaceamount#1#2{.16667\emwidth} \fi % language specific

\unexpanded\def\permithyphenation{\ifhmode\prewordbreak\fi} % doesn't remove spaces

%D \macros
%D   {beginofsubsentencespacing,endofsubsentencespacing}
%D
%D In the previous macros we provided two hooks which can be
%D used to support nested sub||sentences. In \CONTEXT\ these
%D hooks are used to insert a small space when needed.

\ifx\beginofsubsentencespacing\undefined \let\beginofsubsentencespacing\relax \fi
\ifx\endofsubsentencespacing  \undefined \let\endofsubsentencespacing  \relax \fi

%D The following piece of code is a torture test for compound
%D handling. The \type {\relax} before the \type {\ifmmode} is
%D needed because of the alignment scanner (in \ETEX\ this
%D problem is not present because there a protected macro is
%D not expanded). Thanks to Tobias Burnus for providing this
%D example.
%D
%D \startformula
%D \left|f(x_n)-{1\over2}\right| =
%D    {\cases{|{1\over2}-x_n| &for $0\le x_n < {1\over2}$\cr
%D            |x_n-{1\over2}| &for ${1\over2}2xxx

\def\lang_discretionaries_hyphen_like#1#2%
  {\ifconditional\spaceafterdiscretionary
     \prewordbreak\hbox{#1}\relax
   \else\ifconditional\punctafterdiscretionary
     \prewordbreak\hbox{#1}\relax
   \else
     \prewordbreak#2\postwordbreak % was prewordbreak
   \fi\fi}

\definetextmodediscretionary {}
  {\lang_discretionaries_hyphen_like\textmodehyphen\textmodehyphendiscretionary}

\definetextmodediscretionary -
  {\lang_discretionaries_hyphen_like\normalhyphen\normalhyphendiscretionary}

\definetextmodediscretionary _
  {\lang_discretionaries_hyphen_like\composedhyphen\composedhyphendiscretionary}

\definetextmodediscretionary )
  {\lang_discretionaries_hyphen_like{)}{\discretionary{-)}{}{)}}}

\definetextmodediscretionary (
  {\ifdim\lastskip>\zeropoint
     (\prewordbreak
   \else
     \prewordbreak\discretionary{}{(-}{(}\prewordbreak
   \fi}

\definetextmodediscretionary ~
  {\prewordbreak\discretionary{-}{}{\thinspace}\postwordbreak}

\definetextmodediscretionary '
  {\prewordbreak\discretionary{-}{}{'}\postwordbreak}

\definetextmodediscretionary ^
  {\prewordbreak\discretionary{\hbox{$|$}}{}{\hbox{$|$}}%
   \allowbreak\postwordbreak} % bugged

\definetextmodediscretionary <
  {\beginofsubsentence\prewordbreak\beginofsubsentencespacing}

\definetextmodediscretionary >
  {\endofsubsentencespacing\prewordbreak\endofsubsentence}

\definetextmodediscretionary =
  {\prewordbreak\midsentence\prewordbreak} % {\prewordbreak\compoundhyphen}

% french

\definetextmodediscretionary : {\removeunwantedspaces\prewordbreak\kern\hspaceamount\empty{:}:}
\definetextmodediscretionary ; {\removeunwantedspaces\prewordbreak\kern\hspaceamount\empty{;};}
\definetextmodediscretionary ? {\removeunwantedspaces\prewordbreak\kern\hspaceamount\empty{?}?}
\definetextmodediscretionary !
{\removeunwantedspaces\prewordbreak\kern\hspaceamount\empty{!}!}

\definetextmodediscretionary *
  {\prewordbreak\discretionary{-}{}{\kern.05em}\prewordbreak}

% spanish

\definetextmodediscretionary ?? {\prewordbreak\questiondown}
\definetextmodediscretionary !! {\prewordbreak\exclamdown}

% \ifx\normalcompound\undefined \let\normalcompound=| \fi

%D \installdiscretionary | +
%D \installdiscretionary + =

\def\defaultdiscretionaryhyphen{\compoundhyphen}

\installdiscretionary | \defaultdiscretionaryhyphen % installs in ctx and prt will fall back on it

%D \macros
%D   {fakecompoundhyphen}
%D
%D In headers and footers as well as in active pieces of text
%D we need a dirty hack. Try to imagine what is needed to
%D safely break the next text across a line and at the same
%D time make the words interactive.
%D
%D \starttyping
%D \goto{Some||Long||Word}
%D \stoptyping

\unexpanded\def\fakecompoundhyphen
  {\def\|{\mathortext\vert\lang_compounds_fake_hyphen}}

\def\lang_compounds_fake_hyphen
  {\def##1|%
     {\doifelsenothing{##1}\compoundhyphen{##1}%
      \kern\compoundbreakpoint\allowbreak}}

%D \macros
%D   {midworddiscretionary}
%D
%D If needed, one can add a discretionary hyphen using \type
%D {\midworddiscretionary}. This macro does the same as
%D \PLAIN\ \TEX's \type {\-}, but, like the ones implemented
%D earlier, this one also looks ahead for spaces and grouping
%D tokens.

\unexpanded\def\midworddiscretionary
  {\futurelet\nexttoken\lang_discretionaries_mid_word}

\def\lang_discretionaries_mid_word
  {\ifx\nexttoken\blankspace\else
   \ifx\nexttoken\bgroup    \else
   \ifx\nexttoken\egroup    \else
     \discretionary{-}{}{}%
   \fi\fi\fi}

%D \macros
%D   {installcompoundcharacter}
%D
%D When Tobias Burnus started translating the Dutch manual of
%D \PPCHTEX\ into German, he suggested that \CONTEXT\ support
%D the \type{german.sty} method of handling compound
%D characters, especially the umlaut. This package is meant for
%D use with \PLAIN\ \TEX\ as well as \LATEX.
%D
%D I decided to make compound character support as versatile
%D as possible. As a result one can define one's own
%D compound character support, like:
%D
%D \starttyping
%D \installcompoundcharacter "a {\"a}
%D \installcompoundcharacter "e {\"e}
%D \installcompoundcharacter "i {\"i}
%D \installcompoundcharacter "u {\"u}
%D \installcompoundcharacter "o {\"o}
%D \installcompoundcharacter "s {\SS}
%D \stoptyping
%D
%D or even
%D
%D \starttyping
%D \installcompoundcharacter "ck {\discretionary {k-}{k}{ck}}
%D \installcompoundcharacter "ff {\discretionary{ff-}{f}{ff}}
%D \stoptyping
%D
%D The support is not limited to alphabetic characters, so the
%D next definition is also valid.
%D
%D \starttyping
%D \installcompoundcharacter ". {.\doifnextcharelse{\spacetoken}{}{\kern.125em}}
%D \stoptyping
%D
%D The implementation looks familiar and uses the same tricks as
%D mentioned earlier in this module. We take care of two
%D arguments, which complicates things a bit.

\installcorenamespace{compoundnormal}
\installcorenamespace{compoundsingle}
\installcorenamespace{compoundmultiple}
\installcorenamespace{compounddefinition}

%D When we started working on MK IV code, we needed a different
%D approach for defining the active character itself. In MK II as
%D well as in MK IV we now use the catcode vectors.
\setnewconstant\compoundcharactermode\plusone

\newcount\c_lang_compounds_character

\def\installcompoundcharacter #1#2#3 #4% {#4} no grouping
  {\ifcase\compoundcharactermode
     % ignore mode
   \else
     \chardef\c_lang_compounds_character`#1%
     \expandafter\chardef\csname\??compoundnormal\string#1\endcsname\c_lang_compounds_character
     \def\!!stringa{#3}%
     \expandafter\def\csname\ifx\!!stringa\empty\??compoundsingle\else\??compoundmultiple\fi\detokenize{#1#2#3}\endcsname{#4}%
     \setevalue{\??compounddefinition\detokenize{#1}}{\noexpand\lang_compounds_handle_character{\detokenize{#1}}}% better: numbers
     \expandafter\letcatcodecommand\expandafter\ctxcatcodes\expandafter\c_lang_compounds_character\csname\??compounddefinition\detokenize{#1}\endcsname
   \fi}

%D We can also ignore definitions (needed in for instance \XML). Beware,
%D this macro is supposed to be used grouped!

\def\ignorecompoundcharacter
  {\compoundcharactermode\zerocount}

%D In handling the compound characters we have to take care of
%D \type{\bgroup} and \type{\egroup} tokens, so we end up with
%D a multi||step interpretation macro. We look ahead for a
%D \type{\bgroup}, \type{\egroup} or \type{\blankspace}. As I am
%D no user of this mechanism myself, the credit for testing it
%D goes to Tobias Burnus, the first German user of \CONTEXT.
%D
%D We define these macros as \type{\long} because we can
%D expect \type{\par} tokens. We need to look into the future
%D with \type{\futurelet} to prevent spaces from
%D disappearing.
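%D
%D As a minimal sketch of this look||ahead technique (the names
%D \type {\peekdemo} and \type {\dopeekdemo} are hypothetical and
%D not part of this module), \type {\futurelet} stores the
%D upcoming token in \type {\nexttoken} without absorbing it, so
%D the token is still there after the test:
%D
%D \starttyping
%D \def\peekdemo{\futurelet\nexttoken\dopeekdemo}
%D
%D \def\dopeekdemo % \nexttoken is only inspected, never removed
%D   {\ifx\nexttoken\blankspace
%D      [space]%
%D    \else\ifx\nexttoken\bgroup
%D      [group]%
%D    \else
%D      [other]%
%D    \fi\fi}
%D \stoptyping
%D
%D With these definitions \type {\peekdemo{x}} should typeset
%D \type {[group]x} and \type {\peekdemo x} should typeset
%D \type {[other]x}.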
\def\lang_compounds_handle_character#1%
  {\def\lang_compounds_handle_character_finish{\lang_compounds_handle_character_finish_indeed{#1}}%
   \futurelet\nexttoken\xhandlecompoundcharacter}

\def\lang_compounds_handle_character_finish_indeed
  {\ifx\nexttoken\bgroup
    %\expandafter\lang_compounds_handle_character_pickup % handle "{ee} -> \"ee
    %\expandafter\gobbleoneargument % forget "{ee} -> ee
     \expandafter\lang_compounds_handle_character_one % ignore "{ee} -> "ee
   \else\ifx\nexttoken\egroup
     \doubleexpandafter\lang_compounds_handle_character_normal
   \else\ifx\nexttoken\blankspace
     \tripleexpandafter\lang_compounds_handle_character_normal
   \else
     \tripleexpandafter\lang_compounds_handle_character_pickup
   \fi\fi\fi}

\def\lang_compounds_handle_character_normal#1%
  {\csname\??compoundnormal\string#1\endcsname}

\def\lang_compounds_handle_character_pickup#1#2% preserve space
  {\def\lang_compounds_handle_character_finish{\lang_compounds_handle_character_finish_indeed#1#2}%
   \futurelet\nexttoken\lang_compounds_handle_character_finish}

\def\lang_compounds_handle_character_finish_indeed
  {\ifx\nexttoken\bgroup
     \expandafter\lang_compounds_handle_character_one
   \else\ifx\nexttoken\egroup
     \doubleexpandafter\lang_compounds_handle_character_one
   \else\ifx\nexttoken\blankspace
     \tripleexpandafter\lang_compounds_handle_character_one
   \else
     \tripleexpandafter\lang_compounds_handle_character_two
   \fi\fi\fi}

%D Besides taking care of the grouping and space tokens, we have
%D to deal with three situations. First we check whether the next
%D character equals the first one; if so, we just insert
%D the original. Next we check whether a compound character is
%D indeed defined. We either execute the compound character or
%D just insert the first. So we have
%D
%D \starttyping
%D
%D \stoptyping
%D
%D In later modules we will see how these commands are used.
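%D
%D As an illustration only of the three situations described
%D above, read off from the handler macros that follow and
%D assuming that just the \type {"a} definition shown earlier has
%D been installed, the input could look like this:
%D
%D \starttyping
%D ""   % second character equals the first: the original " is inserted
%D "a   % a compound is defined for "a:     its definition is executed
%D "x   % no compound is defined for "x:    the original " is inserted, then x
%D \stoptyping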
\def\lang_compounds_handle_character_one#1#2%
  {\if\string#1\string#2% was: \ifx#1#2%
     \def\next{\csname\??compoundnormal\string#1\endcsname}%
   \else\ifcsname\??compoundsingle\string#1\string#2\endcsname
     \def\next{\csname\??compoundsingle\string#1\string#2\endcsname}%
   \else
     \def\next{\csname\??compoundnormal\string#1\endcsname#2}%
   \fi\fi
   \next}

\def\lang_compounds_handle_character_two#1#2#3%
  {\if\string#1\string#2%
     \def\next{\csname\??compoundnormal\string#1\endcsname#3}%
   \else\ifcsname\??compoundmultiple\string#1\string#2\string#3\endcsname
     \def\next{\csname\??compoundmultiple\string#1\string#2\string#3\endcsname}%
   \else\ifcsname\??compoundsingle\string#1\string#2\endcsname
     \def\next{\csname\??compoundsingle\string#1\string#2\endcsname#3}%
   \else
     \def\next{\csname\??compoundnormal\string#1\endcsname#2#3}%
   \fi\fi\fi
   \next}

%D For very obscure applications (see \type {lang-sla.tex} for
%D an application) we provide:

\def\simplifiedcompoundcharacter#1#2%
  {\ifcsname\??compoundsingle\string#1\string#2\endcsname
     \doubleexpandafter\firstofoneargument\csname\??compoundsingle\string#1\string#2\endcsname
   \else
     #2%
   \fi}

%D \macros
%D   {disablediscretionaries,disablecompoundcharacters}
%D
%D Occasionally we need to disable this mechanism. For the
%D moment we assume that \type {|} is used.

\let\disablediscretionaries   \ignorediscretionaries
\let\disablecompoundcharacters\ignorecompoundcharacter

%D \macros
%D   {normalcompound}
%D
%D Handy in for instance XML. (Kind of obsolete)

\ifdefined\normalcompound \else \let\normalcompound=| \fi

%D \macros
%D   {compound}
%D
%D We will overload the already active \type {|} so we have
%D to save its meaning in order to be able to use this handy
%D macro.
%D
%D \starttyping
%D so test\compound{}test can be used instead of test||test
%D \stoptyping

\bgroup

\catcode\barasciicode\activecatcode

\unexpanded\gdef\compound#1{|#1|}

\doglobal \appendtoks
    \def|#1|{\ifx#1\empty\empty-\else#1\fi}%
\to \everysimplifycommands

\egroup

%D Here we hook some code into the clean up mechanism needed
%D for verbatim data.

\appendtoks
    \disablecompoundcharacters
    \disablediscretionaries
\to \everycleanupfeatures

\protect \endinput