Diffstat (limited to 'doc/context/sources/general/manuals/languages/languages-hyphenation.tex')
-rw-r--r-- doc/context/sources/general/manuals/languages/languages-hyphenation.tex 102
1 files changed, 84 insertions, 18 deletions
diff --git a/doc/context/sources/general/manuals/languages/languages-hyphenation.tex b/doc/context/sources/general/manuals/languages/languages-hyphenation.tex
index 48e6eb385..96271d1aa 100644
--- a/doc/context/sources/general/manuals/languages/languages-hyphenation.tex
+++ b/doc/context/sources/general/manuals/languages/languages-hyphenation.tex
@@ -1,9 +1,9 @@
% language=uk
-\environment languages-environment
-
\startcomponent languages-hyphenation
+\environment languages-environment
+
\startchapter[title=Hyphenation][color=darkmagenta]
\startsection[title=How it works]
@@ -339,7 +339,7 @@ aaaaabbbbb \par
\typebuffer
-\noindentation This code is self explaining and results in:
+This code is self-explanatory and results in:
\blank
@@ -347,8 +347,7 @@ aaaaabbbbb \par
\setupindenting[no]\hsize 1mm \lefthyphenmin 1 \righthyphenmin 1 \getbuffer
\stophyphenation
-\noindentation There can be multiple hyphens and even multiple words in such a
-specification:
+There can be multiple hyphens and even multiple words in such a specification:
\startbuffer
\registerhyphenationexception[aaaaa-bbbbb cc-ccc-ddd-dd]
@@ -358,7 +357,7 @@ cccccddddd \par
\typebuffer
-\noindentation We get:
+We get:
\blank
@@ -385,8 +384,8 @@ whatever-whatever \par
\typebuffer[demo]
These lines will hyphenate differently and in traditional \TEX\ you need to
-insert penalties and|/|or glue to get around it. In the \LUA\ variant we can
-enable that limitation.
+insert penalties and|/|or glue to get around it unless you instruct \LUATEX\ to
+be more. In the \LUA\ variant we can enable that limitation.
\startbuffer
\definehyphenationfeatures
@@ -446,7 +445,7 @@ extensions as mentioned. However, you can plug in your own code, given that it
does return a proper hyphenation result. One reason for providing this plug is
that there are users who want to play with hyphenators based on a different
logic. In \CONTEXT\ we already have some methods to deal with languages that
-(for instance) have no spaces but split on words or syllabes. A more tight
+(for instance) have no spaces but split on words or syllables. A more tight
integration with the hyphenator can have advantages so I will explore these
options when there is demand.
@@ -520,7 +519,7 @@ When applied to one the tufte example we get:
\starthyphenation[traditional]
\setuptolerance[tolerant]
\sethyphenationfeatures[demo]
- \noindentation % \dontleavehmode
+ \dontleavehmode
\input tufte\relax
\stophyphenation
\stopbuffer
@@ -626,7 +625,7 @@ So, we only break a line after symbols.
\stophyphenation
\stoplinecorrection
-\noindentation A quick test can look as follows:
+A quick test can look as follows:
\startbuffer
\starthyphenation[traditional]
@@ -663,7 +662,7 @@ superef\zwnj fective
\typebuffer[sample]
-\noindentation and define two featuresets:
+and define two featuresets:
\startbuffer
\definehyphenationfeatures
@@ -678,7 +677,7 @@ superef\zwnj fective
\typebuffer \getbuffer
-\noindentation We limit the width to 1mm and get:
+We limit the width to 1mm and get:
\startlinecorrection[blank]
\bTABLE[option=stretch,offset=.5ex]
@@ -748,7 +747,7 @@ same as the breakpoints mechanism (compounds).
\starthyphenation[traditional]
\sethyphenationfeatures[demo-3]
\dontcomplain
- \hsize 1mm \noindentation
+ \hsize 1mm
we use (super)special(ized) patterns
\stophyphenation
\stopbuffer
@@ -764,11 +763,11 @@ We can make this more clever by adding patterns:
\typebuffer \blank \getbuffer \blank
-\noindentation This gives:
+This gives:
\blank \getbuffer[demo] \blank
-\noindentation A detailed trace shows that these patterns get applied:
+A detailed trace shows that these patterns get applied:
\starthyphenation[traditional]
\ttx
@@ -778,8 +777,75 @@ We can make this more clever by adding patterns:
\unregisterhyphenationpattern[en][)9]
\unregisterhyphenationpattern[en][9(]
-\noindentation The somewhat weird hyphens at the edges will in practice not show
-up because there is always one regular character there.
+The somewhat weird hyphens at the edges will in practice not show up because
+there is always one regular character there.
+
+\stopsection
+
+\startsection[title=Counting]
+
+There is not much you can do about patterns. It's a craft to make them and so
+they are shipped with the distribution. In order to hyphenate well, \TEX\ looks
+at some character properties. In \CONTEXT\ only the characters used in the
+patterns of a language get tagged as valid in a word.
+
+The following example illustrates that there can be corner cases. In fact, this
+example might render differently depending on the patterns available. First we
+define an extra language, based on French.
+
+\startbuffer
+\installlanguage[frf][default=fr,patterns=fr,factor=yes]
+\stopbuffer
+
+\typebuffer \getbuffer
+
+Here we set the \type {factor} parameter which tells the loader that it should
+treat the characters used in a special way: some count as zero and some count as
+more than one when they are measured against the min values that determine if
+and where hyphenation is applied.
+
+\startbuffer
+\startmixedcolumns[n=3,balance=yes]
+ \hsize 1mm \dontcomplain
+ \language[fr] aesop oedipus æsop œdipus \column
+ \hsize 1mm \dontcomplain
+ \language[frf] aesop oedipus æsop œdipus \column
+ \startexceptions æ-sop \stopexceptions
+ \hsize 1mm \dontcomplain
+ \language[frf] aesop oedipus æsop œdipus
+\stopmixedcolumns
+\stopbuffer
+
+\typebuffer
+
+We get three different columns (at the time of writing this manual):
+
+\getbuffer
+
+The trick is in the \type {factor}: when set to \type {yes} an \type {æ} is
+counted as two characters. Combining marks count as zero but you will not
+find them being used as we already resolve them in an earlier stage.
+
+\startluacode
+context.startcolumns { n = 2 }
+context.starttabulate { "|Tc|c|c|l|" }
+for u, data in table.sortedhash(languages.hjcounts) do
+ if data.category ~= "combining" then
+ context.NC() context("%05U",u)
+ context.NC() context("%c",u)
+ context.NC() context(data.count)
+ context.NC() context(data.category)
+ context.NC() context.NR()
+ end
+end
+context.stoptabulate()
+context.stopcolumns()
+\stopluacode
+
+It is very unlikely to find an \type {ffi} in the input and even an \type {ij} is
+rare. The \type {æ} is marked as a character and the \type {œ} as a ligature in
+\UNICODE. Maybe all the characters here are dubious but at least we provide a
+way to experiment with them.
\stopsection
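
A note on the counting described in the new section: the idea that an \type {æ} counts as two characters and a combining mark as zero can be sketched in plain Lua as below. This is only an illustration under stated assumptions. The \type {weights} table is hypothetical and is not the real \type {languages.hjcounts} data, and the check against the min values is a simplification, not the hyphenator that \CONTEXT\ actually uses.

```lua
-- Hypothetical per-character weights (an assumption for illustration only;
-- the real values live in languages.hjcounts and may differ).
local weights = {
    [0x00E6] = 2, -- æ counts as two characters
    [0x0153] = 2, -- œ assumed to count as two as well
    [0x0301] = 0, -- combining acute accent counts as zero
}

-- Sum the weights of all code points in a UTF-8 encoded word.
-- Characters without an entry count as one.
local function effective_length(word)
    local n = 0
    for _, u in utf8.codes(word) do
        n = n + (weights[u] or 1)
    end
    return n
end

-- Simplified check (an assumption): a word can only have a break point
-- when its weighted length covers both min values.
local function has_room(word, leftmin, rightmin)
    return effective_length(word) >= leftmin + rightmin
end

print(effective_length("aesop"))  --> 5
print(effective_length("æsop"))   --> 5
print(effective_length("œdipus")) --> 7
print(has_room("æsop", 2, 3))     --> true
```

With these weights \type {æsop} is measured like \type {aesop}, which is the effect that \type {factor=yes} is described as having in the text above. The sketch needs Lua 5.3 or later for the \type {utf8} library.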