% language=uk

\usemodule[fnt-24]

\startcomponent mk-cjk

\environment mk-environment

\definefontfallback
  [FullTyping] [adobemyungjostd-medium]
  [0x3000-0xFFFF] [check=yes,force=no]

\definefontfallback
  [FullTyping] [adobesongstd-light]
  [0x3000-0xFFFF] [check=yes,force=no]

\definefontsynonym
  [MyTyping] [lmmono10-regular]
  [fallbacks=FullTyping]

\definefont[MyTypingFont][MyTyping sa 1]

\nonknuthmode

\chapter{Chinese, Japanese and Korean, aka CJK}

\start \setuptyping[style=\MyTypingFont] % begin of typing hackery

{\em This aspect of \MKIV\ is under construction. We use non-realistic examples.
We need to reimplement Chinese numbering in \LUA, etc.\ etc.}

{\em todo: There is no need for checking the width if the halfwidth feature is
turned on.}

\subject{introduction}

In \CONTEXT\ \MKII\ we support \CJK\ languages. Intercharacter spacing as well as
linebreaks are taken care of. Chinese numbering is dealt with and labels and
other language specific aspects are supported too. The implementation uses active
characters and some special encoding subsystem. Although it works quite okay, in
\MKIV\ we follow a different route. The current implementation is an intermediate
one and is used to explore the possibilities and identify needs.

One handicap in implementing \CJK\ support is that the wishlist of features and
behaviour is somewhat dependent on who you talk to. This means that the
implementation will have some default behaviour but can be tuned to specific
needs. The current implementation uses the script-related analyser and is
triggered by fonts but at some point I may decide to provide analysing
independent of fonts.

As with all things \TEX, we need to find a proper font to get our document
typeset and because \CJK\ fonts are normally quite large they are not always
available on your system by default.

\subject{scripts and languages}

I'm no expert on \CJK\ and will never be one so don't expect much insight into
the scripts and languages here.
Here we only look at the way a sequence of characters in the input turns into a
typeset paragraph. For that it is important to keep in mind that in a Korean or
Japanese text we might find Chinese characters and that the spacing rules become
somewhat blurred by that. For instance Korean has spaces between words and words
can be broken at any point, while Chinese has no spaces.

Officially Chinese runs from top to bottom but here we focus on the horizontal
variant. When turned into glyphs the characters normally are of equal width and
in principle we could expect them all to be vertically aligned. However, a font
can have characters that take half that space: so-called halfwidth characters.
And, of course, in practice a font might have shapes that fall into this category
but happen to have their own width which deviates from this.

This means that a mechanism that deals with \CJK\ has to take care of a few
things:

\startitemize[packed]
\item Spaces at the end of the line (or actually anywhere in the input stream)
      need to be removed but only for Chinese.
\item Opening and closing symbols as well as punctuation need special treatment,
      especially when they are halfwidth.
\item Korean uses proportionally spaced punctuation and mixes with other latin
      fonts, while Chinese often uses built-in latin shapes.
\item We may break anywhere but not after an opening symbol like~( and not
      before a closing symbol like~).
\item We need to deal with mixed Chinese and Korean spacing rules.
\stopitemize

Let's start with showing some Korean. We use one of the fonts shipped by Adobe as
part of Acrobat but first we define a Korean featureset and a font.

\startbuffer
\definefontfeature
  [korean]
  [script=hang,language=kor,mode=node,analyze=yes]

\definefont[KoreanSample][adobemyungjostd-medium*korean]
\stopbuffer

\typebuffer \getbuffer

Korean looks like this:

\startbuffer
\KoreanSample \setscript[hangul]
모든 인간은 태어날 때부터 자유로우며 그 존엄과 권리에 있어 동등하다.
인간은 천부적으로 이성과 양심을 부여받았으며 서로 형제애의 정신으로 행동하여야 한다.
\stopbuffer

\typebuffer \start \getbuffer \stop

The Korean script reflects syllables and is very structured. Although modern
fonts contain prebuilt syllables one can also use the jamo alphabet to build them
from components. The following example is provided by Dohyun Kim:

\startbuffer
\definefontfeature
  [medievalkorean]
  [mode=node,script=hang,lang=kor,ccmp=yes,ljmo=yes,vjmo=yes,tjmo=yes]
\definefontfeature
  [modernkorean]
  [mode=node,script=hang,lang=kor]

\enabletrackers[scripts.analyzing]

\setscript[hangul]

\definedfont [UnBatang*medievalkorean at 20pt] ᄒᆞᆫ글 \ruledhbox{ᄒᆞᆫ글} \ruledhbox{ᄒᆞᆫ} \ruledhbox{글}\blank
\definedfont [UnBatang*modernkorean at 20pt] ᄒᆞᆫ글 \ruledhbox{ᄒᆞᆫ글} \ruledhbox{ᄒᆞᆫ} \ruledhbox{글}\blank

\disabletrackers[scripts.analyzing]
\stopbuffer

\typebuffer \start \getbuffer \stop

There are subtle differences between the medieval and modern shapes. It was this
example that led to more advanced \type {tounicode} support in \MKIV\ so that
copy and paste works out well now for such input.
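
The remark about building syllables from jamo can be made concrete for the modern
subset. As a sketch (this is the composition rule from the Unicode standard, not
something taken from the \MKIV\ code), the code point $S$ of a precomposed
syllable follows from the index $L$ of the leading consonant (0 to 18, counted
from \type {U+1100}), the index $V$ of the vowel (0 to 20, counted from \type
{U+1161}) and the index $T$ of the trailing consonant (0 to 27, where 0 means
that there is none and 1 corresponds to \type {U+11A8}):

\startformula
S = 44032 + (21 L + V) \times 28 + T
\stopformula

Here 44032 is \type {0xAC00}, the first precomposed syllable. For the second
syllable in the example above we have the leading consonant \type {U+1100}
($L=0$), the vowel \type {U+1173} ($V=18$) and the trailing consonant \type
{U+11AF} ($T=8$), so:

\startformula
S = 44032 + (21 \times 0 + 18) \times 28 + 8 = 44544
\stopformula

which is \type {0xAE00}, the precomposed form of that syllable. The vowel \type
{U+119E} in the first syllable falls outside the modern vowel range, so no
precomposed code point exists for it and the shape has to be built by the font,
presumably by way of the \type {ljmo}, \type {vjmo} and \type {tjmo} features.
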
For Chinese we define a couple of features:

\startbuffer
\definefontfeature
  [chinese-traditional]
  [mode=node,script=hang,lang=zht]
\definefontfeature
  [chinese-simple]
  [mode=node,script=hang,lang=zhs]

\definefontfeature
  [chinese-traditional-hw]
  [mode=node,script=hang,lang=zht,hwid=yes]
\definefontfeature
  [chinese-simple-hw]
  [mode=node,script=hang,lang=zhs,hwid=yes]
\stopbuffer

\typebuffer \getbuffer

\startbuffer
\definefont[ChineseSampleFW][adobesongstd-light*chinese-traditional]
\definefont[ChineseSampleHW][adobesongstd-light*chinese-traditional-hw]

\setscript[hanzi]

\ChineseSampleFW
兡也包因沘氓侷柵苗孫孫財崧淫設弼琶跑愍窟榜蒸奭稽
霄瓢館縲擻鼕〈孃魔釁〉佉沎岠狋垚柛胅娭涘罞偟惈牻荺
傒焱菏酡廅滘絺赩塴榗箂踃嬁澕蓴醊獧螗餟燱螬駸礑鎞
瀧鄿瀯騬醹躕鱕。

\ChineseSampleHW
兡也包因沘氓侷柵苗孫孫財崧淫設弼琶跑愍窟榜蒸奭稽
霄瓢館縲擻鼕〈孃魔釁〉佉沎岠狋垚柛胅娭涘罞偟惈牻荺
傒焱菏酡廅滘絺赩塴榗箂踃嬁澕蓴醊獧螗餟燱螬駸礑鎞
瀧鄿瀯騬醹躕鱕。
\stopbuffer

\typebuffer \start \getbuffer \stop

A few more samples:

\startbuffer
\definefont[ChFntAT][name:adobesongstd-light*chinese-traditional-hw at 16pt]
\definefont[ChFntBT][name:songti*chinese-traditional at 16pt]
\definefont[ChFntCT][name:fangsong*chinese-traditional at 16pt]

\definefont[ChFntAS][name:adobesongstd-light*chinese-simple-hw at 16pt]
\definefont[ChFntBS][name:songti*chinese-simple at 16pt]
\definefont[ChFntCS][name:fangsong*chinese-simple at 16pt]
\stopbuffer

\typebuffer \getbuffer

In these fonts traditional comes out as follows:

\start \setscript[hanzi]
\startlines
\ChFntAT 我〈能吞下玻璃而不傷身〉體。
\ChFntBT 我〈能吞下玻璃而不傷身〉體。
\ChFntCT 我〈能吞下玻璃而不傷身〉體。
\stoplines
\stop

And simplified as:

\start \setscript[hanzi]
\startlines
\ChFntAS 我〈能吞下玻璃而不伤身〉体。
\ChFntBS 我〈能吞下玻璃而不伤身〉体。
\ChFntCS 我〈能吞下玻璃而不伤身〉体。
\stoplines
\stop

\subject {tracing}

As usual in \CONTEXT, we have some tracing built in. When you say

\startbuffer
\enabletrackers[scripts.analyzing]
\stopbuffer

\typebuffer

you will get the output colored according to the category that the analyser
assigned to each character.

When you say

\startbuffer
\enabletrackers[scripts.injections]
\stopbuffer

\typebuffer

some rudimentary information will be written to the log about what gets inserted
in the node list. Analyzed input looks like:

\startbuffer
아아, 나는 이제야 도(道)를 알았도다. 마음이 어두운 자는 이목이 누(累)가 되지 않는다.
이목만을 믿는 자는 보고 듣는 것이 더욱 밝혀져서 병이 되는 것이다.
이제 내 마부가 발을 말굽에 밟혀서 뒷차에 실리었으므로, 나는 드디어 혼자 고삐를 늦추어 강에 띄우고, 무릎을 구부려 발을 모으고 안장 위에 앉았다.
한번 떨어지면 강이나 물로 땅을 삼고, 물로 옷을 삼으며, 물로 몸을 삼고, 물로 성정을 삼을 것이다.
이제야 내 마음은 한번 떨어질 것을 판단한 터이므로, 내 귓속에 강물 소리가 없어졌다.
무릇 아홉 번 건너는데도 걱정이 없어 의자 위에서 좌와(坐臥)하고 기거(起居)하는 것 같았다.
\stopbuffer

\typebuffer

\start
\enabletrackers[scripts.analyzing]
\KoreanSample \setscript[hangul]
\getbuffer
\disabletrackers[scripts.analyzing]
\stop

For developers (and those who provide them with input) we have another tracing
option:

\startbuffer
\definedfont[arialuni*korean at 10pt]
\setscript[hangul]
\ShowCombinationsKorean
\stopbuffer

\typebuffer

We need to use a font that supports Chinese as well as Korean. This gives quite
a lot of output.

\start \getbuffer \stop

% 안녕하세요? (Hello)
% 감사합니다. (Thank you)

\page

\stop % end of typing hackery

\stopcomponent

% \font\JapaneseFontA=name:kozminprovi-regular
%
% \startlines
% Hankaku : {\JapaneseFontA アイウエオカキクケコサシスセソタチツテ}
% Romanj digits : {\JapaneseFontA 0123456789}
% Romanj lowercase : {\JapaneseFontA abcdefghi}
% Romanj uppercase : {\JapaneseFontA ABCDEFGHI}
% \stoplines
%
% \enabletrackers[scripts.analyzing]
%
% \start \raggedright \dontleavehmode
% \ruledhbox\bgroup \ChFntBS ,\egroup \quad
% \ruledhbox\bgroup \ChFntBS 〉\egroup \quad
% \ruledhbox\bgroup \ChFntBS 〈\egroup \par
% \stop
%
% \def\DoChineseSample#1#2#3%
% {\ruledvtop{#1\hsize#2\relax#3}}
%
% \def\ChineseSampleA#1#2{%
% \blank
% \subsubject{hsize #2, fullwidth}
% \dontleavehmode
% \DoChineseSample{#1}{#2}{吞吞吞,吞吞吞吞。}\quad
% \DoChineseSample{#1}{#2}{吞吞吞,,吞吞吞吞。}\quad
% \DoChineseSample{#1}{#2}{吞吞吞〉吞吞吞吞。}\quad
% \DoChineseSample{#1}{#2}{吞吞吞〉,吞吞吞吞。}
% \blank[small]
% \dontleavehmode
% \DoChineseSample{#1}{#2}{吞吞吞〉〉吞吞吞吞。}\quad
% \DoChineseSample{#1}{#2}{吞吞吞〉〉吞吞吞吞。}\quad
% \DoChineseSample{#1}{#2}{〈吞吞吞吞吞吞吞。}\quad
% \DoChineseSample{#1}{#2}{〈〈吞吞吞吞吞吞吞。}
% \blank[small]
% \dontleavehmode
% \DoChineseSample{#1}{#2}{吞吞吞…吞吞吞吞。}\quad
% \DoChineseSample{#1}{#2}{吞吞吞……吞吞吞吞。}
% \dontleavehmode
% \blank
% }
%
% \ChineseSampleA\ChFntBS{4.25em}
% \ChineseSampleA\ChFntBS{4.00em}
% \ChineseSampleA\ChFntBS{3.75em}
% \ChineseSampleA\ChFntBS{3.50em}
% \ChineseSampleA\ChFntBS{3.25em}
% \ChineseSampleA\ChFntBS{3.00em}
%
% \def\ChineseSampleB#1#2{%
% \blank
% \subsubject{hsize #2, halfwidth}
% \dontleavehmode
% \DoChineseSample{#1}{#2}{吞吞吞,吞吞吞吞。}\quad
% \DoChineseSample{#1}{#2}{吞吞吞‘吞吞吞吞。}\quad
% \DoChineseSample{#1}{#2}{吞吞吞’吞吞吞吞。}\quad
% \blank
% }
%
% \ChineseSampleB\ChFntBS{4.25em}
% \ChineseSampleB\ChFntBS{4.00em}
% \ChineseSampleB\ChFntBS{3.75em}
% \ChineseSampleB\ChFntBS{3.50em}
% \ChineseSampleB\ChFntBS{3.25em}
% \ChineseSampleB\ChFntBS{3.00em}
%
% \disabletrackers[scripts.analyzing]