Diffstat (limited to 'doc/context/sources/general/manuals/mk/mk-cjk.tex')
-rw-r--r--doc/context/sources/general/manuals/mk/mk-cjk.tex320
1 files changed, 320 insertions, 0 deletions
diff --git a/doc/context/sources/general/manuals/mk/mk-cjk.tex b/doc/context/sources/general/manuals/mk/mk-cjk.tex
new file mode 100644
index 000000000..dfe17a29c
--- /dev/null
+++ b/doc/context/sources/general/manuals/mk/mk-cjk.tex
@@ -0,0 +1,320 @@
+% language=uk
+
+\usemodule[fnt-24]
+
+\startcomponent mk-cjk
+
+\environment mk-environment
+
+\definefontfallback [FullTyping] [adobemyungjostd-medium] [0x3000-0xFFFF] [check=yes,force=no]
+\definefontfallback [FullTyping] [adobesongstd-light] [0x3000-0xFFFF] [check=yes,force=no]
+
+\definefontsynonym [MyTyping] [lmmono10-regular] [fallbacks=FullTyping]
+\definefont[MyTypingFont][MyTyping sa 1]
+
+\nonknuthmode
+
+\chapter{Chinese, Japanese and Korean, aka CJK}
+
+\start \setuptyping[style=\MyTypingFont] % begin of typing hackery
+
+{\em This aspect of \MKIV\ is under construction. We use non-realistic examples.
+We need to reimplement Chinese numbering in \LUA, etc.\ etc.}
+
+{\em todo: There is no need for checking the width if the halfwidth feature is turned on.}
+
+\subject{introduction}
+
+In \CONTEXT\ \MKII\ we support \CJK\ languages. Intercharacter spacing as
+well as line breaks are taken care of. Chinese numbering is dealt with, and
+labels and other language-specific aspects are supported too. The implementation
+uses active characters and a special encoding subsystem. Although it works
+quite well, in \MKIV\ we follow a different route.
+
+The current implementation is an intermediate one and is used to explore the
+possibilities and identify needs. One handicap in implementing \CJK\ support is
+that the wishlist of features and behaviour depends somewhat on who you talk
+to. This means that the implementation will have some default behaviour but can be
+tuned to specific needs. The current implementation uses the script-related
+analyser and is triggered by fonts, but at some point I may decide to provide
+analysis independent of fonts.
+
+As with all things \TEX, we need a proper font to get our document typeset,
+and because \CJK\ fonts are normally quite large, they are not always available on
+your system by default.
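+
+When the main font lacks \CJK\ glyphs, the fallback mechanism that we use at the
+top of this file for the typing font can also be applied to a text font. The
+following is only a sketch. The names \type {CJKSerif}, \type {MySerif} and
+\type {MySerifFont} are made up for this example, and we assume that Latin
+Modern Roman and the Adobe Song font are installed.
+
+\startbuffer
+% take the glyphs in the range U+3000 to U+FFFF from the Adobe Song font,
+% using the same options as for the typing font at the top of this file
+\definefontfallback [CJKSerif] [adobesongstd-light] [0x3000-0xFFFF] [check=yes,force=no]
+
+% a Latin font that falls back on these glyphs
+\definefontsynonym [MySerif] [lmroman10-regular] [fallbacks=CJKSerif]
+\definefont [MySerifFont] [MySerif sa 1]
+\stopbuffer
+
+\typebuffer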
+
+\subject{scripts and languages}
+
+I'm no expert on \CJK\ and will never be one, so don't expect much insight into the
+scripts and languages here. We only look at the way a sequence of characters
+in the input turns into a typeset paragraph. For that it is important to keep in
+mind that a Korean or Japanese text may contain Chinese characters, and that
+this blurs the spacing rules somewhat. For instance, Korean has spaces
+between words and words can be broken at any point, while Chinese has no spaces.
+
+Officially Chinese runs from top to bottom, but here we focus on the horizontal
+variant. When turned into glyphs the characters are normally of equal width,
+and in principle we could expect them all to be vertically aligned. However, a
+font can have characters that take half that space: so-called halfwidth
+characters. And, of course, in practice a font might have shapes that fall into
+this category but have a width of their own that deviates from this.
+
+This means that a mechanism that deals with \CJK\ has to take care of a few
+things:
+
+\startitemize[packed]
+\item Spaces at the end of the line (or actually anywhere in the input stream)
+ need to be removed, but only for Chinese.
+\item Opening and closing symbols as well as punctuation need special treatment,
+ especially when they are halfwidth.
+\item Korean uses proportionally spaced punctuation and mixes with other Latin fonts,
+ while Chinese often uses built-in Latin shapes.
+\item We may break anywhere, but not after an opening symbol like~( and not
+ before a closing symbol like~).
+\item We need to deal with mixed Chinese and Korean spacing rules.
+\stopitemize
+
+Let's start by showing some Korean. We use one of the fonts shipped
+by Adobe as part of Acrobat, but first we define a Korean feature set and
+a font.
+
+\startbuffer
+\definefontfeature
+ [korean]
+ [script=hang,language=kor,mode=node,analyze=yes]
+
+\definefont[KoreanSample][adobemyungjostd-medium*korean]
+\stopbuffer
+
+\typebuffer \getbuffer
+
+Korean looks like this:
+
+\startbuffer
+\KoreanSample \setscript[hangul]
+
+모든 인간은 태어날 때부터 자유로우며 그 존엄과 권리에 있어 동등하다.
+인간은 천부적으로 이성과 양심을 부여받았으며 서로 형제애의 정신으로
+행동하여야 한다.
+\stopbuffer
+
+\typebuffer \start \getbuffer \stop
+
+The Korean script reflects syllables and is very structured.
+Although modern fonts contain prebuilt syllables, one can also use
+the jamo alphabet to build them from components. The following
+example was provided by Dohyun Kim:
+
+\startbuffer
+\definefontfeature [medievalkorean] [mode=node,script=hang,lang=kor,ccmp=yes,ljmo=yes,vjmo=yes,tjmo=yes]
+\definefontfeature [modernkorean] [mode=node,script=hang,lang=kor]
+
+\enabletrackers[scripts.analyzing]
+\setscript[hangul]
+\definedfont [UnBatang*medievalkorean at 20pt] ᄒᆞᆫ글 \ruledhbox{ᄒᆞᆫ글} \ruledhbox{ᄒᆞᆫ} \ruledhbox{글}\blank
+\definedfont [UnBatang*modernkorean at 20pt] ᄒᆞᆫ글 \ruledhbox{ᄒᆞᆫ글} \ruledhbox{ᄒᆞᆫ} \ruledhbox{글}\blank
+\disabletrackers[scripts.analyzing]
+\stopbuffer
+
+\typebuffer \start \getbuffer \stop
+
+There are subtle differences between the medieval and modern
+shapes. It was this example that led to more advanced \type
+{tounicode} support in \MKIV, so that copy and paste now works
+well for such input.
+
+For Chinese we define a couple of features:
+
+\startbuffer
+\definefontfeature
+ [chinese-traditional]
+ [mode=node,script=hani,lang=zht]
+\definefontfeature
+ [chinese-simple]
+ [mode=node,script=hani,lang=zhs]
+\definefontfeature
+ [chinese-traditional-hw]
+ [mode=node,script=hani,lang=zht,hwid=yes]
+\definefontfeature
+ [chinese-simple-hw]
+ [mode=node,script=hani,lang=zhs,hwid=yes]
+\stopbuffer
+
+\typebuffer \getbuffer
+
+\startbuffer
+\definefont[ChineseSampleFW][adobesongstd-light*chinese-traditional]
+\definefont[ChineseSampleHW][adobesongstd-light*chinese-traditional-hw]
+\setscript[hanzi]
+
+\ChineseSampleFW
+兡也包因沘氓侷柵苗孫孫財崧淫設弼琶跑愍窟榜蒸奭稽
+霄瓢館縲擻鼕〈孃魔釁〉佉沎岠狋垚柛胅娭涘罞偟惈牻荺
+傒焱菏酡廅滘絺赩塴榗箂踃嬁澕蓴醊獧螗餟燱螬駸礑鎞
+瀧鄿瀯騬醹躕鱕。
+
+\ChineseSampleHW
+兡也包因沘氓侷柵苗孫孫財崧淫設弼琶跑愍窟榜蒸奭稽
+霄瓢館縲擻鼕〈孃魔釁〉佉沎岠狋垚柛胅娭涘罞偟惈牻荺
+傒焱菏酡廅滘絺赩塴榗箂踃嬁澕蓴醊獧螗餟燱螬駸礑鎞
+瀧鄿瀯騬醹躕鱕。
+\stopbuffer
+
+\typebuffer \start \getbuffer \stop
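+
+To see what the halfwidth feature does to individual glyphs, one can put a few
+characters from the sample above in ruled boxes and compare the widths. This is
+a sketch only and is not typeset here. It uses the two fonts just defined.
+Whether a glyph actually gets narrower depends on the font providing a
+halfwidth variant for it.
+
+\startbuffer
+\start \setscript[hanzi]
+\dontleavehmode
+\ruledhbox{\ChineseSampleFW 〈孃魔〉。}\quad % without hwid
+\ruledhbox{\ChineseSampleHW 〈孃魔〉。}      % with hwid=yes
+\stop
+\stopbuffer
+
+\typebuffer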
+
+A few more samples:
+
+\startbuffer
+\definefont[ChFntAT][name:adobesongstd-light*chinese-traditional-hw at 16pt]
+\definefont[ChFntBT][name:songti*chinese-traditional at 16pt]
+\definefont[ChFntCT][name:fangsong*chinese-traditional at 16pt]
+
+\definefont[ChFntAS][name:adobesongstd-light*chinese-simple-hw at 16pt]
+\definefont[ChFntBS][name:songti*chinese-simple at 16pt]
+\definefont[ChFntCS][name:fangsong*chinese-simple at 16pt]
+\stopbuffer
+
+\typebuffer \getbuffer
+
+In these fonts, traditional Chinese comes out as follows:
+
+\start \setscript[hanzi]
+\startlines
+\ChFntAT 我〈能吞下玻璃而不傷身〉體。
+\ChFntBT 我〈能吞下玻璃而不傷身〉體。
+\ChFntCT 我〈能吞下玻璃而不傷身〉體。
+\stoplines
+\stop
+
+And simplified Chinese as:
+
+\start \setscript[hanzi]
+\startlines
+\ChFntAS 我〈能吞下玻璃而不伤身〉体。
+\ChFntBS 我〈能吞下玻璃而不伤身〉体。
+\ChFntCS 我〈能吞下玻璃而不伤身〉体。
+\stoplines
+\stop
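+
+The line-breaking rules mentioned at the beginning of this chapter can be
+checked by setting the same short sample in boxes of decreasing width and
+looking at where the lines end. An opening symbol should not end a line and a
+closing symbol should not start one. What follows is a sketch along the lines of
+the commented tests at the end of this file. It uses the \type {\ChFntBS} font
+defined above, and the widths are arbitrary.
+
+\startbuffer
+\start \setscript[hanzi]
+\dontleavehmode
+% the same text in boxes of decreasing width
+\ruledvtop{\ChFntBS \hsize 4.0em\relax 吞吞吞〈吞吞吞吞〉吞。}\quad
+\ruledvtop{\ChFntBS \hsize 3.5em\relax 吞吞吞〈吞吞吞吞〉吞。}\quad
+\ruledvtop{\ChFntBS \hsize 3.0em\relax 吞吞吞〈吞吞吞吞〉吞。}
+\stop
+\stopbuffer
+
+\typebuffer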
+
+\subject {tracing}
+
+As usual in \CONTEXT, we have some tracing built in. When you say
+
+\startbuffer
+\enabletrackers[scripts.analyzing]
+\stopbuffer
+
+\typebuffer
+
+you will get the output colored according to the category that the
+analyser has put each character in. When you say
+
+\startbuffer
+\enabletrackers[scripts.injections]
+\stopbuffer
+
+\typebuffer
+
+some rudimentary information about what gets inserted in the node list
+will be written to the log.
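+
+Both trackers can also be enabled in one go. As far as we know, trackers are
+not reset at the end of a group, so, as in the samples in this chapter, they
+are disabled explicitly afterwards. This is a sketch that reuses the
+\type {\KoreanSample} font defined earlier.
+
+\startbuffer
+\start
+  \enabletrackers[scripts.analyzing,scripts.injections]
+  \KoreanSample \setscript[hangul]
+  모든 인간은 태어날 때부터 자유로우며 그 존엄과 권리에 있어 동등하다.
+  \par % finish the paragraph before the trackers are turned off
+  \disabletrackers[scripts.analyzing,scripts.injections]
+\stop
+\stopbuffer
+
+\typebuffer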
+
+Analyzed input looks like:
+
+\startbuffer
+아아, 나는 이제야 도(道)를 알았도다. 마음이 어두운 자는 이목이
+누(累)가 되지 않는다. 이목만을 믿는 자는 보고 듣는 것이
+더욱 밝혀져서 병이 되는 것이다. 이제 내 마부가 발을 말굽에
+밟혀서 뒷차에 실리었으므로, 나는 드디어 혼자 고삐를 늦추어
+강에 띄우고, 무릎을 구부려 발을 모으고 안장 위에 앉았다.
+한번 떨어지면 강이나 물로 땅을 삼고, 물로 옷을 삼으며,
+물로 몸을 삼고, 물로 성정을 삼을 것이다. 이제야 내 마음은
+한번 떨어질 것을 판단한 터이므로, 내 귓속에 강물 소리가 없어졌다.
+무릇 아홉 번 건너는데도 걱정이 없어 의자 위에서 좌와(坐臥)하고
+기거(起居)하는 것 같았다.
+\stopbuffer
+
+\typebuffer \start \enabletrackers[scripts.analyzing] \KoreanSample \setscript[hangul] \getbuffer \disabletrackers[scripts.analyzing] \stop
+
+For developers (and those who provide them with input) we have another tracing option:
+
+\startbuffer
+\definedfont[arialuni*korean at 10pt] \setscript[hangul] \ShowCombinationsKorean
+\stopbuffer
+
+\typebuffer
+
+We need to use a font that supports Chinese as well as Korean. This gives quite a lot of output.
+
+\start \getbuffer \stop
+
+% 안녕하세요? (Hello)
+% 감사합니다. (Thank you)
+
+\page \stop % end of typing hackery
+
+\stopcomponent
+
+% \font\JapaneseFontA=name:kozminprovi-regular
+%
+% \startlines
+% Hankaku : {\JapaneseFontA アイウエオカキクケコサシスセソタチツテ}
+% Romaji digits : {\JapaneseFontA 0123456789}
+% Romaji lowercase : {\JapaneseFontA abcdefghi}
+% Romaji uppercase : {\JapaneseFontA ABCDEFGHI}
+% \stoplines
+%
+% \enabletrackers[scripts.analyzing]
+%
+% \start \raggedright \dontleavehmode
+% \ruledhbox\bgroup \ChFntBS ,\egroup \quad
+% \ruledhbox\bgroup \ChFntBS 〉\egroup \quad
+% \ruledhbox\bgroup \ChFntBS 〈\egroup \par
+% \stop
+%
+% \def\DoChineseSample#1#2#3%
+% {\ruledvtop{#1\hsize#2\relax#3}}
+%
+% \def\ChineseSampleA#1#2{%
+% \blank
+% \subsubject{hsize #2, fullwidth}
+% \dontleavehmode
+% \DoChineseSample{#1}{#2}{吞吞吞,吞吞吞吞。}\quad
+% \DoChineseSample{#1}{#2}{吞吞吞,,吞吞吞吞。}\quad
+% \DoChineseSample{#1}{#2}{吞吞吞〉吞吞吞吞。}\quad
+% \DoChineseSample{#1}{#2}{吞吞吞〉,吞吞吞吞。}
+% \blank[small]
+% \dontleavehmode
+% \DoChineseSample{#1}{#2}{吞吞吞〉〉吞吞吞吞。}\quad
+% \DoChineseSample{#1}{#2}{吞吞吞〉〉吞吞吞吞。}\quad
+% \DoChineseSample{#1}{#2}{〈吞吞吞吞吞吞吞。}\quad
+% \DoChineseSample{#1}{#2}{〈〈吞吞吞吞吞吞吞。}
+% \blank[small]
+% \dontleavehmode
+% \DoChineseSample{#1}{#2}{吞吞吞…吞吞吞吞。}\quad
+% \DoChineseSample{#1}{#2}{吞吞吞……吞吞吞吞。}
+% \dontleavehmode
+% \blank
+% }
+%
+% \ChineseSampleA\ChFntBS{4.25em}
+% \ChineseSampleA\ChFntBS{4.00em}
+% \ChineseSampleA\ChFntBS{3.75em}
+% \ChineseSampleA\ChFntBS{3.50em}
+% \ChineseSampleA\ChFntBS{3.25em}
+% \ChineseSampleA\ChFntBS{3.00em}
+%
+% \def\ChineseSampleB#1#2{%
+% \blank
+% \subsubject{hsize #2, halfwidth}
+% \dontleavehmode
+% \DoChineseSample{#1}{#2}{吞吞吞,吞吞吞吞。}\quad
+% \DoChineseSample{#1}{#2}{吞吞吞‘吞吞吞吞。}\quad
+% \DoChineseSample{#1}{#2}{吞吞吞’吞吞吞吞。}\quad
+% \blank
+% }
+%
+% \ChineseSampleB\ChFntBS{4.25em}
+% \ChineseSampleB\ChFntBS{4.00em}
+% \ChineseSampleB\ChFntBS{3.75em}
+% \ChineseSampleB\ChFntBS{3.50em}
+% \ChineseSampleB\ChFntBS{3.25em}
+% \ChineseSampleB\ChFntBS{3.00em}
+%
+% \disabletrackers[scripts.analyzing]
+