summaryrefslogtreecommitdiff
path: root/tex/context/patterns/common/lang-bg.rme
diff options
context:
space:
mode:
Diffstat (limited to 'tex/context/patterns/common/lang-bg.rme')
-rw-r--r--tex/context/patterns/common/lang-bg.rme889
1 files changed, 1 insertions, 888 deletions
diff --git a/tex/context/patterns/common/lang-bg.rme b/tex/context/patterns/common/lang-bg.rme
index 25a3e2ca5..fff218f71 100644
--- a/tex/context/patterns/common/lang-bg.rme
+++ b/tex/context/patterns/common/lang-bg.rme
@@ -1,890 +1,3 @@
% generated by mtxrun --script pattern --convert
-% copyright: Copyright (C) 2000, 2004, 2017 by Anton Zinoviev <anton@lml.bas.bg>
-% title: Bulgarian hyphenation patterns
-% version: 21 October 2017
-% language:
-% name: Bulgarian
-% tag: bg
-% notice: >
-% This file is part of the hyph-utf8 package.
-% See http://www.hyphenation.org for more information.
-% authors:
-% -
-% name: Anton Zinoviev
-% contact: anton:lml.bas.bg
-% licence:
-% text: >
-% This software may be used, modified, copied, distributed, and sold,
-% both in source and binary form provided that the above copyright
-% notice and these terms are retained. The name of the author may not
-% be used to endorse or promote products derived from this software
-% without prior permission. THIS SOFTWARE IS PROVIDES "AS IS" AND
-% ANY EXPRESS OR IMPLIED WARRANTIES ARE DISCLAIMED. IN NO EVENT
-% SHALL THE AUTHOR BE LIABLE FOR ANY DAMAGES ARISING IN ANY WAY OUT
-% OF THE USE OF THIS SOFTWARE.
-% hyphenmins:
-% typesetting:
-% left: 2
-% right: 2
-% changes: See below
-% ==========================================
-% Copyright (C) 2000,2004,2017 by Anton Zinoviev <anton@lml.bas.bg>
-%
-% This software may be used, modified, copied, distributed, and sold,
-% both in source and binary form provided that the above copyright
-% notice and these terms are retained. The name of the author may not
-% be used to endorse or promote products derived from this software
-% without prior permission. THIS SOFTWARE IS PROVIDES "AS IS" AND
-% ANY EXPRESS OR IMPLIED WARRANTIES ARE DISCLAIMED. IN NO EVENT
-% SHALL THE AUTHOR BE LIABLE FOR ANY DAMAGES ARISING IN ANY WAY OUT
-% OF THE USE OF THIS SOFTWARE.
-%
-% Bulgarian hyphenation patterns
-%
-% Generated by ./hyph-bg.sh --safe-morphology --standalone-tex
-%
-% Both left and right hyphenmins should be set to 2.
-%
-% % Automated Bulgarian Hyphenation
-% % Anton Zinoviev
-% % 21 October 2017
-%
-% Principles of the Bulgarian hyphenation
-% =======================================
-%
-% One specificity of the Bulgarian language is that the average length
-% of the words is greater than in English. When typesetting a Bulgarian
-% text, hyphenation is more important than when typesetting an English
-% text. Knuth's algorithm for line-breaking is such that in most
-% English paragraphs no hyphenation will be used. With a Bulgarian
-% text, however, even the Knuth's algorithm will use hyphenation in most
-% paragraphs. Hyphenation becomes an absolute necessity if we want to
-% obtain nice, justified paragraphs when using a software with dumb
-% line-breaking algorithm, such as LibreOffice.
-%
-% According to Decree 936 of the Council of Ministers promulgated on 27
-% November 1950, the Institute for Bulgarian Language at the Bulgarian
-% Academy of Sciences is authorised to publish the rules of the
-% orthography of the Bulgarian language (within certain limits).
-%
-% Hyphenation rules between 1945 and 1983
-% ---------------------------------------
-%
-% Between 1945 and 1983 Bulgarian used syllable hyphenation with two
-% morphological exceptions: hyphenation is preferred between a prefix
-% and a stem and at the boundary of compound words. The following were
-% the rules governing the hyphenation:
-%
-% 1. One letter does not stay alone. Words of one syllable can not be
-% hyphenated.
-% 2. No hyphenation before or after ь.
-% 3. In a sequence of vowels at least one vowel stays before the
-% hyphen.
-% 4. A single consonant between two vowels links with the second vowel.
-% For example по-ле /po-le/, ра-бо-та /ra-bo-ta/.
-% 5. In a sequence of consonants between two vowels, at least one
-% consonant stays with the second vowel. For example те-сто /te-sto/
-% or тес-то /tes-to/.[^b]
-% 6. In a sequence of consonants between two vowels, if the first
-% consonant is sonorant (й /y/, л /l/, м /m/, н /n/, р /r/), then it
-% stays with the first vowel. For example гер-дан /ger-dan/, сен-ки
-% /sen-ki/.
-% 7. The hyphenation separates two successive equal consonants. For
-% example времен-но /vremen-no/, пролет-та /prolet-ta/.
-% 8. When the letters дж /dzh/ and дз /dz/ denote a single consonant,
-% then they are not separated. For example боя-джия /boya-dzhiya/
-% but not бояд-жия /boyad-zhiya/. When these letters denote two
-% consonants, then the normal rules apply: над-живявам
-% /nad-zhivyavam/.
-% 9. Word prefixes may not be broken. Compound words are hyphenated
-% either at the boundary of the components or the hyphenation rules
-% are applied to each of the components separately. For example:
-% пред-упреждавам /pred-uprezhdavam/ (not пре-дупреждавам
-% /pre-duprezhdavam/), пред-известие /pred-izvestie/ (not
-% пре-дизвестие /pre-dizvestie/), за-движвам /za-dvizhvam/ (not
-% зад-вижвам /zad-vizhvam/), авто-клуб /avto-klub/ (not авток-луб
-% /avtok-lub/), вакуум-апарат /vakuum-aparat/ (not вакуу-мапарат
-% /vakuu-maparat/).
-%
-% In some rare cases the proper application of rule 9 depends on the
-% semantics of the word. For example пре-дреша /pre-dresha/ 'change
-% clothes' but пред-реша /pred-resha/ 'predetermine' or прес-пите
-% /pres-pite/ 'the snow-drifts' but пре-спите /pre-spite/ 'sleep for a
-% while/overnight'.
-%
-% [^b]: In several publications this rule is formulated with the
-% additional restriction that the sequence of consonants begins with
-% an obstruent. I believe this restriction is unintentional. It
-% makes no sense to forbid a hyphenation of the form AB-A but to
-% permit ABB-A (A denotes a vowel and B – a consonant).
-%
-% Hyphenation rules between 1983 and 2012
-% ---------------------------------------
-%
-% The Orthographic dictionary published by the Institute for Bulgarian
-% language in 1983 introduced new hyphenation rules. The complexity of
-% the previous rules was the main reason for the change. The new rules
-% aimed at two objectives: simplicity and unambiguity.
-%
-% The new rules are:
-%
-% 1. A consonant between two vowels links with the second vowel. For
-% example ви-со-чи-на /vi-so-chi-na/.
-% 2. In a sequence of two or more consonants between two vowels, at
-% least one consonant stays with first vowel and at least one with
-% the second vowel. For example сес-тра /ses-tra/ and сест-ра
-% /sest-ra/.
-% 3. Two equal consonants are separated. For example плен-ник
-% /plen-nik/.
-% 4. In a sequence of two or more vowels, the first vowel stays before
-% the hyphen. For example пре-одолея /pre-odoleya/ and прео-долея
-% /preo-doleya/.
-% 5. In a sequence of three or more vowels, the last vowel stays after
-% the hyphen. For example мао-изъм /mao-izam/ but not маои-зъм
-% /maoi-zam/.
-% 6. The letter й /y/ between a vowel and a consonant stays with the
-% vowel. For example май-ка /may-ka/.
-% 7. When a sequence of two or more consonants follows й /y/ then at
-% least one consonant links with й /y/. For example айс-берг
-% /ays-berg/ (not ай-сберг /ay-sberg/).
-% 8. The letter й /y/ between two vowels links with the second vowel.
-% For example ма-йор /ma-yor/.
-% 9. No hyphenation before or after ь.
-% 10. When the letters дж /dzh/ denote a single consonant, then they are
-% not separated. For example су-джук /su-dzhuk/ (not суд-жук
-% /sud-zhuk/) but над-живея /nad-zhiveya/.
-% 11. There must be at least one vowel before and after the hyphen.
-% 12. One letter does not stay alone.
-%
-% The total disregard of the morphology by these rules leads to some
-% strange results. For example пре-дизвестие /pre-dizvestie/ is
-% permitted and пред-известие /pred-izvestie/ is forbidden, зад-вижвам
-% /zad-vizhvam/ is permitted and за-движвам /za-dvizhvam/ is forbidden,
-% авток-луб /avtok-lub/ is permitted and авто-клуб /avto-klub/ is
-% forbidden, вакуу-мапарат /vakuu-maparat/ is permitted and
-% вакуум-апарат /vakuum-aparat/ is forbidden. Because of this, the new
-% rules were not universally accepted. The old rules are still
-% mentioned in various places in Internet, they are included even in
-% some grammar books published by the publishing houses of the Ministry
-% of Education and of Sofia University. The software developers,
-% however, soon came into love with the new hyphenation rules.
-%
-% Hyphenation rules after 2012
-% ----------------------------
-%
-% In 2012 new rules came into force. There are two differences with
-% respect to the previous rules:
-%
-% 1. Rule 5 of the previous rules is revoked. For example маои-зъм
-% /maoi-zam/ becomes a valid hyphenation.
-% 2. The new rules permit morphologically based hyphenation (however it
-% is not obligatory). For example пред-известие /pred-izvestie/,
-% за-движвам /za-dvizhvam/, авто-клуб /avto-klub/, вакуум-апарат
-% /vakuum-aparat/ are valid hyphenations.
-%
-% Good hyphenation is a complex matter and it seems the linguists at the
-% Institute for Bulgarian Language have recognised this. They no longer
-% attempt to provide universal rules about everything. Instead, they
-% provide some very permissible rules while the good application of
-% these rules is leaved to the discretion and the experience of the
-% printers and the developers of hyphenation software.
-%
-% It makes sense to use at least two different sets of hyphenation rules
-% for Bulgarian. In most cases a more restrictive version should be
-% used, one which attempts to eliminate the controversial cases of
-% hyphenation. When typesetting a Bulgarian text in a narrow newspaper
-% column, however, it will be appropriate to use more liberal
-% hyphenation rules. It should be noted that one of the reasons for the
-% hyphenation reform in 1983 was the desire to fix the chaotic
-% hyphenation in the Bulgarian newspapers at that time.
-%
-% Computer implementations
-% ========================
-%
-% Mathematical analysis of the Bulgarian hyphenation
-% --------------------------------------------------
-%
-% The earliest mathematical analysis of the Bulgarian hyphenation rules
-% belongs to Veska Noncheva.[^1] In 1988 she proposed a mathematical
-% formalisation of the hyphenation rules in a table with 22 rows.[^2]
-%
-% [^1]: <http://www.researchgate.net/profile/Veska_Noncheva>
-%
-% [^2]: Нончева В. Алгоритъм за автоматично пренасяне на думи в
-% българския език. Математика и математическо
-% образование. Сб. доклади на 17. ПК на СМБ. С., БАН, 1988, 479-482.
-%
-% In the same year Eugene Belogay[^3] proposed an alternative
-% formalisation with only 9 rules.[^4] Belogay proved that his rules are
-% consistent and that they form a minimal set. The rules of Belogay
-% have negative character – every hyphenation which is not forbidden by
-% a rule is possible hyphenation.
-%
-% [^3]: <http://www.linkedin.com/in/belogay>
-%
-% [^4]: Белогай Е. Алгоритъм за автоматично пренасяне на думи. Компютър
-% за вас (1988) 3, 12-14.
-%
-% The following are the first 7 rules, as formulated by Belogay:
-%
-% 1. Б-А
-% 2. А-ББ
-% 3. Б-ТТ, ТТ-Б
-% 4. ААА-Б
-% 5. й-ББ
-% 6. Б-ь
-% 7. д-ж
-%
-% Here А denotes an arbitrary vowel letter, Б denotes an arbitrary
-% consonant letter (including ь and й), ТТ denotes a sequence of two
-% equal consonant letters and the letters й, ь, д and ж denote
-% themselves. For example the rule "Б-А" says that we are not permitted
-% to separate a consonant letter from immediately following vowel
-% letter.
-%
-% The eighth rule of Belogay says that hyphenation is forbidden before
-% the first and after the last vowel letter. The ninth rule of Belogay
-% says that hyphenation is forbidden immediately after the first or
-% immediately before the last letter of the word.
-%
-% Notice that is is very easy to translate the rules of Belogay in the
-% form, required for the hyphenation algorithm of Knuth and Liang used
-% in TeX.[^a] Let us remind that this algorithm matches the word with a
-% set of string patterns in which the odd numbers say hyphenation is
-% permitted in this position and even numbers say the hyphenation is
-% forbidden. When two patterns give conflicting numbers for the same
-% position, then the greater number wins.
-%
-% First, since the rules of Belogay are negative (they say where
-% hyphenation is forbidden, not where it is permitted), we have to
-% permit the hyphenation everywhere:
-%
-% 1. А1
-% 2. Б1
-%
-% Then, the first seven rules of Belogay obtain the form:
-%
-% 1. Б2А
-% 2. А2ББ
-% 3. Б2ТТ ТТ2Б
-% 4. ААА2Б
-% 5. й2ББ
-% 6. Б2ь
-% 7. д2ж
-%
-% Since no Bulgarian word starts with more that four consonants and no
-% Bulgarian word ends with more than three consonants, the eighth rule
-% of Belogay can be translated in the following way:
-%
-% 1. .Б2
-% 2. .ББ2
-% 3. .БББ2
-% 4. 2Б.
-% 5. 2ББ.
-%
-% The ninth rule of Belogay means that left and right hyphen mins should
-% be set to 2.
-%
-% The work of Eugene Belogay was not limited to merely a mathematical
-% analysis of the Bulgarian hyphenation rules. In his paper he
-% published a short algorithm in Pascal which implements these rules.
-% It didn't take long for this algorithm to be used in various text
-% processing software. The algorithm of Belogay was famous for many
-% years. Even as late as 1997 in one book about TeX, the author didn't
-% care to give any explanations but simply wrote about "the algorithm of
-% Belogay" as something well known to the reader.[^5]
-%
-% [^a]: Liang, Franklin Mark. Word Hy-phen-a-tion by
-% Com-put-er (Doctoral Dissertation). Stanford University, 1983
-%
-% [^5]: Василев В. Ултимативният ТеХ. Удоволствието да правим
-% предпечатна подготовка сами. София, Интела, 1997, 36
-%
-% Bulgarian hyphenation in TeX
-% ----------------------------
-%
-% One unfortunate design decision of Knuth was that the hyphenation
-% algorithm of TeX applied the hyphenation patterns not to the input
-% character codes but to the internal codes of the glyphs in the font.
-% This created a problem for the Cyrillic languages because in TeX the
-% Cyrillic fonts did not have standardised encoding. Perhaps this is
-% one of the reasons why the earliest implementations of the Bulgarian
-% hyphenation in TeX did not rely on the internal hyphenation algorithm
-% of TeX. Instead, external tools were used to insert soft hyphens in
-% all Bulgarian words. For example such a tool would replace the word
-% сричкопренасяне /srichkoprenasyane/ with
-% срич\\-коп\\-ре\\-на\\-ся\\-не /srich\\-kop\\-re\\-na\\-sya\\-ne/.
-% The saying "To every disadvantage there is a corresponding advantage"
-% is true – since Cyrillic and Latin letters use different character
-% codes, an external tool could easily insert soft hyphens in all
-% Bulgarian words while leaving the TeX commands intact.
-%
-% The earliest known attempt to use the hyphenation algorithm of TeX for
-% Bulgarian was made by Ognyan Tonev in 1990.[^6] He described his work
-% as "a not very good translation of the rules. I work in this
-% direction. But I don't have a 100% working complect of patterns. So,
-% the copy I send to you[^7] is only a beta-version." The hyphenation
-% patterns of Tonev don't work correctly and it seems he never completed
-% his work.
-%
-% [^6]: The author of this text was unable to find current information
-% about Ognyan Tonev in Internet. Apparently in 1990 he worked in
-% the Center of Informatics and Computer Technology of the Bulgarian
-% Academy of Sciences.
-%
-% [^7]: To Yannis Haralambous,
-% <http://perso.telecom-bretagne.eu/yannisharalambous>
-%
-% The first usable Bulgarian hyphenation patterns for TeX were developed
-% by Georgi Boshnakov[^8] in 1994. In order to solve the encoding
-% problem, Boshnakov had developed TeX fonts supporting the MIK encoding
-% (the prevalent encoding at that time in Bulgaria). This allowed him
-% to introduce a fully working implementation only a few months after
-% LaTeX2e became the official LaTeX version. Later Boshnakov modified
-% his work with the Babel system. The hyphenation patterns of Boshnakov
-% did their job well enough, so that for almost quarter a century after
-% their initial creation, they remained the only Bulgarian hyphenation
-% patterns in the standard distributions of TeX and CTAN.
-%
-% [^8]: <http://www.maths.manchester.ac.uk/~gb/>
-%
-% There are some similarities between the patterns of Boshnakov and the
-% patterns of Belogay. The following are the main differences.
-%
-% First, Boshnakov used an ingenious and more compact implementation of
-% the second and the third rule. Instead of {А2ББ, Б2ТТ, ТТ2Б}, or
-% 8×22×22+22×22+22×22=4840 patterns in total, Boshnakov has patterns of
-% the form 2Б3Б2 and 4Т3Т4, or only 22×22=484 in total, with the same
-% effect.
-%
-% The second main difference between the patterns of Boshnakov and the
-% patterns of Belogay concerns the letter combination дж /dzh/. In
-% Bulgarian this letter combination can denote either a single
-% consonant, or a sequence of two consonants and the hyphenation rules
-% change respectively. Unfortunately, it is impossible to know the
-% meaning of дж /dzh/ without a vocabulary. The solution of Belogay was
-% a cautious one – his rules do the hyphenation in a way which will be
-% correct regardless of whether дж /dzh/ is a single consonant or a
-% sequence of two consonant. On the other hand, the approach of
-% Boshnakov is a bold one – since дж /dzh/ is more often a single
-% consonant, his rules assume that it is always a single consonant. The
-% number of the cases when this decision leads to bad hyphenations is
-% insignificant in comparison with the cases in which we obtain improved
-% hyphenation.
-%
-% The third main difference between the patterns of Boshnakov and the
-% patterns of Belogay concerns the eighth rule – its implementation in
-% the rules of Boshnakov is rather limited which leads to wrong
-% hyphenations like бри-дж /bri-dzh/. A full implementation of this
-% rule would require 11660 patterns in total and this would be too much
-% for the computers in 1994.
-%
-% Later developments
-% ------------------
-%
-% In 1995 Atanas Topalov defended a Masters thesis in the Faculty of
-% Mathematics and Informatics at Sofia University titled "Algorithms and
-% software about text processing".[^9] One of the main topics in his
-% thesis was the Bulgarian hyphenation. Topalov criticised vehemently
-% the official hyphenation rules and their total disregard of the
-% morphology. He wrote:
-%
-% > If we look at the history of the problems of the hyphenation, we
-% > will discover something very strange. Instead of the expected
-% > involvement with the depths and aspiration for more admissible and
-% > satisfactory style, we can find a growing tendency for
-% > simplification. One unpleasant discovery is that the development of
-% > the hyphenation software stays firmly on the principle "let us do
-% > the easiest thing". The earliest works which have been studied are
-% > from 1978. It turned out that they present the best approach
-% > concerning the automated hyphenation. The authors have chosen the
-% > most difficult but the most correct (from literary point of view)
-% > method for hyphenation, namely the morphological approach.
-%
-% Topalov proposed his own hyphenation algorithm. The hyphenation it
-% generated was smooth and easy to read. One obvious defect of the
-% algorithm of Topalov was that it contradicted the official hyphenation
-% rules at that time. One can argue, however, that his algorithm is
-% compatible with the current hyphenation rules.
-%
-% [^9]: The thesis of Atanas Topalov can be accessed at the author's
-% website <http://www.mind-print.com>
-%
-% In 1999 Svetla Koeva[^10] wrote a paper about the automated Bulgarian
-% hyphenation.[^11] At that time she was a junior member of the
-% Department of Computational Linguistics at the Institute for Bulgarian
-% Language but now she is a director of the whole institute. The paper
-% of Koeva contains a list of hyphenation patterns which can be used as
-% a basis of automated hyphenation. In 2004 with the help of Stoyan
-% Mihov[^12] the rules of Koeva were formalised with regular relations
-% and rewriting rules. They were implemented in a software product
-% named ItaEst which provided Bulgarian hyphenation and grammar checking
-% for various software products of Microsoft and Apple.
-%
-% [^10]: <http://dcl.bas.bg/svetla_koeva/>
-%
-% [^11]: Коева, Светла. Правила за пренасяне на части от думите на нов
-% ред. Български език. 1999/2000, 1, 84-86
-%
-% [^12]: <http://lml.bas.bg/~stoyan/>
-%
-% The main differences between the hyphenation of Koeva and the official
-% hyphenation rules effective after 2012 is that the separation of a
-% long sequence of consonants between two vowels is done according to
-% the rules valid before 1983. For example се-стра /se-stra/ and
-% ай-сберг /ay-sberg/ are permitted. The main difference between the
-% hyphenation of Koeva and the official hyphenation rules effective
-% before 1983 is that the rules of Koeva disregard the morphology of the
-% words. The following rule of Koeva is specific: in a sequence of two
-% sonorant consonants between two vowels, we are permitted to separate
-% the first vowel from the first consonant, for example материа-лна
-% /materia-lna/.
-%
-% In 2000 Anton Zinoviev[^13] created new hyphenation patterns for TeX.
-% He didn't know about the previous work of Boshnakov and he didn't
-% bother to make his work available in the various TeX distributions and
-% CTAN. His work was used mostly by the local Linux enthusiasts and the
-% colleagues of Zinoviev. In 2001 Radostin Radnev[^14] created a free
-% grammar dictionary of Bulgarian[^15] where he used the hyphenation
-% patterns of Zinoviev. From there the work of Zinoviev propagated to
-% OpenOffice, LibreOffice and various online dictionaries, including
-% <http://bg.wiktionary.org> and <http://rechnik.chitanka.info>.
-%
-% [^13]: The author of this text.
-%
-% [^14]: <http://bg.linkedin.com/in/radostinradnev>
-%
-% [^15]: <http://bgoffice.sourceforge.net/>
-%
-% The following are the main differences between the hyphenation of
-% Zinoviev and the hyphenation of Boshnakov.
-%
-% First, the eighth rule of Belogay is fully implemented.
-%
-% Second, the rules of Zinoviev try to detect when the letters дж /dzh/
-% (and дз /dz/) denote a single consonant and when they denote a
-% sequence of two consonants. By default, however, Zinoviev (like
-% Boshnakov) assumes that дж /dzh/ is a single consonant and hyphenates
-% accordingly.
-%
-% Third, the rules of Zinoviev disable some cases of unpleasant
-% hyphenations:
-%
-% 1. In a consonant sequence like тст /tst/, the two equal consonants т
-% /t/ are separated. For example братст-во /bratst-vo/ is forbidden
-% while братс-тво /brats-tvo/ and брат-ство /brat-stvo/ are
-% permitted.
-% 2. The hyphenation is forbidden after a sonorant consonant following
-% an obstruent consonant. For example отм-ра /otm-ra/ is forbidden
-% and от-мра /ot-mra/ is permitted.
-% 3. The hyphenation separates two consecutive kindred voiced/voiceless
-% consonants. For example субп-родукт /subp-roduct/ is forbidden and
-% суб-продукт /sub-product/ is permitted.
-%
-% At the start of his work on the Bulgarian hyphenation, Zinoviev had
-% the opportunity to discuss the hyphenation with Svetla Koeva. He
-% remembers that some cases of unpleasant hyphenation were suggested to
-% him by Koeva. Unfortunately, he hasn't taken notes so now he doesn't
-% know which cases of unpleasant hyphenation have been suggested to him
-% by Koeva and which are his own findings.
-%
-% The present work
-% ================
-%
-% Motivation
-% ----------
-%
-% The present work was carried out on the initiative of the leader of
-% the Bulgarian localisation team of Mozilla, who contacted Zinoviev,
-% Boshnakov and the maintainers of the TeX hyphenation patterns.[^17]
-% This work pursues the following main objectives:
-%
-% 1. to update the hyphenation patterns in accordance with the current
-% hyphenation rules;
-% 2. to generate the hyphenation patterns by a publicly available
-% script;
-% 3. to make the hyphenation patterns customisable;
-% 4. to provide documentation for the future developers.
-%
-% [^16]: <http://mozillians.org/en-US/u/stoyan/>
-%
-% [^17]: <http://hyphenation.org>
-%
-% The current official hyphenating rules for Bulgarian are rather
-% liberal. Very often, in a long sequence of consonants we are
-% permitted to split the word at any position, for example аген-т-с-т-во
-% /agen-t-s-t-vo/. This is prone to many unusual and unexpected results
-% that interrupt the attention of the reader or deceive his expectations
-% during the movement of his eyes to the next line. On the other hand,
-% in order to produce nice justified paragraphs there is no need for so
-% many hyphenation possibilities. It would be sufficient even if only
-% one possible separation between any two syllables was permitted.
-%
-% Therefore, it makes sense to use a more restrictive version of the
-% Bulgarian hyphenation, one which eliminates the controversial cases of
-% hyphenation. Only when typesetting a Bulgarian text in a very narrow
-% newspaper column it will be appropriate to use a more liberal version.
-% It should be noted that some specialised English dictionaries also
-% separate the word-division positions into two categories – preferred
-% positions and less recommended positions.
-%
-% There are two methods to determine the optimal division within a
-% sequence of consonants between two vowels:
-%
-% * we can hyphenate according to the syllables in the word or
-% * we can hyphenate morphologically.
-%
-% Hyphenation according to the syllables in the word
-% --------------------------------------------------
-%
-% Let us look at the properties of the Bulgarian syllables. All
-% syllables have the following structure:
-%
-% > onset - nucleus - code
-%
-% The nucleus in Bulgarian is always a vowel. Both the onset and the
-% code are (possibly empty) sequences of consonants.
-%
-% The Bulgarian syllables adhere to the Sonority Sequencing Principle.
-% According to this principle, the consonants within the onset have
-% raising sonority and the consonants within the code have decreasing
-% sonority.
-%
-% Several grammar books agree that the following sonority scale is valid
-% for Bulgarian:
-%
-% > voiceless obtrusive < voiced obtrusive < sonorant consonant < vowel
-%
-% According to the investigations of the author, the only exception to
-% this law is due to the letter в /v/ which is a voiced obtrusive but it
-% can be used also as a voiceless obtrusive. This exception is due to a
-% spelling particularity of the Bulgarian language. Whenever the letter
-% в /v/ seemingly violates the Sonority Sequencing Principle, in the
-% spoken language this letter is read as ф /f/, that is as a voiceless
-% obtrusive (for example the word отвсякъде /otvsyakade/ is read as
-% отфсякъде /otfsyakade/).[^18]
-%
-% [^18]: No Primitive Slavonic word contains the phoneme ф /f/.
-% Therefore, we can safely assume that in the Primitive Slavonic
-% language the consonant ф /f/ was a positional variant of the consonant
-% в /v/.
-%
-% The author has found that the sonorant consonants in Bulgarian have
-% their own sonority scale:
-%
-% > м /m/ < н /n/ < л /l/ < р /r/ < й /y/
-%
-% Only a few words such as жанр /zhanr/ and химн /himn/ violate this
-% scale. Such words are always loan-words and their pronunciation is
-% somewhat problematic for the native Bulgarian speakers.
-%
-% In addition to the Sonority Sequencing Principle, the consonant
-% clusters within the Bulgarian syllable adhere to the following
-% additional principles:
-%
-% 1. Both in the onset and in the code, the labial and dorsal plosives
-% precede the coronal plosives and affricates.
-% 2. If the onset or the code contains two plosives or affricates, then
-% there are no fricatives between them. Few words with the Latin
-% root 'text' are exceptions: контекст /kontekst/.
-% 3. If the onset or the code contains two fricatives other than в /v/,
-% then there are no plosives or affricates between them.
-% 4. If the onset or the code contains two plosives or affricates, then
-% they both have equal sonority (both are voiced, or both are
-% voiceless).
-% 5. If the onset or the code contains two fricatives other than в /v/,
-% then they both have equal sonority (both are voiced, or both are
-% voiceless).
-% 6. Neither the onset, nor the code may contain two labial plosives, or
-% two coronal plosives or affricates or two dorsal plosives.
-% 7. Neither the onset, nor the code may contain two equal consonants
-% with the exception of в /v/ (for example втвърди /vtvardi/).[^19]
-%
-% [^19]: Actually, the letter в /v/ is not a real exception because in
-% all such cases this letter denotes two different consonants – в /v/
-% and ф /f/. Only in the Russian loan-word взвод /vzvod/ the two
-% letters в /v/ denote a repeating consonant в /v/.
-%
-% From all these properties of the Bulgarian syllable we can deduce the
-% following hyphenation rules:
-%
-% 1. In a sequence МК where М is a consonant with higher sonority than
-% K, we are not permitted to hyphenate before М. Exception: when М
-% is в /v/ and К is a voiceless consonant.
-% 2. In a sequence КМ where М is a consonant with higher sonority than
-% K, we are not permitted to hyphenate after М.
-% 3. In a sequence KBT where K and T are plosives or affricates and B is
-% fricative, we separate K from T.
-% 4. In a sequence CKB where K is a plosive or affricate and C and B are
-% fricatives other than в /v/, we separate C from B.
-% 5. If in a consonant sequence a coronal plosive or affricate Т is
-% followed by a labial or dorsal plosive К, then we separate Т from К.
-% 6. If a consonant sequence contains two plosives or affricates, one
-% voiced and one voiceless, then we separate them.
-% 7. If a consonant sequence contains two fricatives other than в /v/,
-% one voiced and one voiceless, then we separate them.
-% 8. If a consonant sequence contains two labial plosives or two coronal
-% plosives or affricates or two dorsal plosives then they are
-% separated.
-% 9. If a consonant sequence contains two equal consonants (not
-% necessarily consecutive), then they are separated.
-%
-% With so many prohibitive rules, a question arises: if we apply all
-% these rules, aren't we going to eliminate too many hyphenation
-% possibilities? The answer is no. It can be demonstrated that between
-% any two consecutive syllables at least one separation point will be
-% permitted.
-%
-%
-% Hyphenation according to the morphology
-% ---------------------------------------
-%
-% Between 1983 and 2012 the official orthographic rules of the
-% Bulgarian language forbade morphologically based hyphenation. After
-% 2012 such hyphenation is permitted (but not obligatory).
-%
-% The most important case when it is very desirable to use
-% morphologically based hyphenation is the case of the compound words.
-% Divisions such as авток-луб /avtok-lub/ and вакуу-мапарат
-% /vakuu-maparat/ are extremely irritating even if they are formally
-% correct. Unfortunately, we do not have a vocabulary of the compound
-% Bulgarian words that would permit us to produce rules for automated
-% hyphenation. Therefore, the current Bulgarian hyphenation patterns do
-% not attempt to apply morphological hyphenation to such words.
-%
-% Second in importance (but far more significant in terms of numbers) is
-% the case with the word prefixes. While the eyes of the reader still
-% look at the start of the word, the word is still unknown to him. At
-% this point, it is very important not to deceive his expectations. For
-% example, when the reader sees над- /nad-/ at the end of the line, he
-% will expect that this is the prefix над- /nad-/ with semantics 'attain
-% more than'. This expectation will be fooled if this wasn't really a
-% prefix, but a deceiving (while formally correct) hyphenation of the
-% word надремя /nadremya/ 'have dozed enough' where the real prefix is
-% not над- /nad-/ but на- /na-/ with semantics 'achieve a state after
-% accumulation'. Such hyphenation distracts the reader and makes the
-% reading more difficult.
-%
-% Third in importance is the case with the word suffixes. With respect
-% to the hyphenation rules we can divide the suffixes into three
-% categories:
-%
-% 1. Suffixes starting with a vowel, for example -ар /-ar/. It is not
-% appropriate to follow the morphology with such suffixes because
-% this will contradict the whole hyphenation tradition of the
-% Bulgarian language. For example крав-ар /krav-ar/ is unwarranted.
-% 2. Suffixes starting with one consonant, for example -ка /-ka/.
-% Usually with such suffixes the syllable boundary in the word
-% coincides with morpheme boundary so no specific cares are
-% necessary, for example кравар-ка /kravar-ka/. The exceptions are
-% rare, for example: обек-тната /obek-tnata/ instead of обект-ната
-% /obekt-nata/.
-% 3. Suffixes starting with more than one consonant (-ски /-ski/, -ство
-% /-stvo/). It is possible to use morphological hyphenation rules
-% with such suffixes.
-%
-% Even if it is possible to use morphological hyphenation with the
-% suffixes of the third category, it turns out, this is not as useful as
-% it is with the case of the prefixes. When the eyes of the reader have
-% reached this part of the word, the word is already more or less known
-% to the reader. Therefore, at this point the morphological hyphenation
-% does not provide any significant advantages in comparison to the
-% simpler hyphenation based only on the syllables in the word. Consider
-% for example the word геройс-тво /geroys-tvo/ with suffix -ство
-% /-stvo/. When the reader sees геройс- /geroys-/ at the end of the
-% line this will give him an early clue that the suffix of the word is
-% -ство /-stvo/. Such non-morphological hyphenation does not deceive
-% the expectations of the reader. On the contrary, it makes the reading
-% easier because it gives clues to the reader about what follows on the
-% next line.
-%
-% Because of these considerations, the current Bulgarian hyphenation
-% patterns do not attempt to use morphological hyphenation with respect
-% to the suffixes of the words. Though it would be useful to implement
-% rules about the suffixes of the second cateogory. Hopefully, some
-% future version will have such rules.
-%
-% Occasionally,[^20] a fourth morphological requirement is stated: that
-% hyphenation should conform with the boundary between the word and the
-% definitive articles -та /-ta/ and -те /-te/ (postfixed in Bulgarian).
-% There is no need to pay attention to this rule because it seems to be
-% satisfied by its own nature. The author has searched in a dictionary
-% with over 860000 Bulgarian words for cases when the hyphenation rules
-% would hyphenate badly with respect to the definitive article. He was
-% unable to find even one such case with the hyphenation rules valid
-% after 1983 and only about 10 cases with the rules valid before 1983
-% (one of them is живопи-ста /zhivopi-sta/ instead of живопис-та
-% /zhivopis-ta/).
-%
-% One unavoidable characteristic of any morphologically based automated
-% hyphenation is that it can create wrong hyphenations. Because of
-% this, one useful option is to use the morphology in a safe way – to
-% use it in order to forbid bad hyphenations but to create no new
-% hyphenation possibilities solely on the basis of the morphology.
-%
-% Take for example the word дозрея /dozreya/ 'ripen fully'. According
-% to the phonological rules, we should hyphenate it as доз-рея
-% /doz-reya/. According to the morphology, however, we should hyphenate
-% as до-зрея /do-zreyq/ because this word is formed with the prefix до-
-% /do-/ with semantics 'complete or supplement' and this semantics would
-% be lost if the reader sees доз- /doz-/ at the end of the line.
-% Therefore, there are three methods to hyphenate this word:
-%
-% 1. доз-рея /doz-reya/ when morphology is not used;
-% 2. до-зрея /do-zreya/ when morphology is fully used;
-% 3. дозрея /dozreya/ (no hyphenation) when morphology is used in a safe
-% way.
-%
-% The option to use the morphology in a safe way is very attractive when
-% the software uses a smart line-breaking algorithm which can produce
-% good results even with less hyphenation possibilities. TeX is one
-% such software. It should be noted that this option does not eliminate
-% too many hyphenation possibilities because the morpheme boundaries
-% most of the time are also syllable boundaries.
-%
-% [^20]: Правописен и правоговорен наръчник. Състав. Иван Хаджов,
-% Цв. Минков; Ред. Ив. Хаджов и др. София, Бълг. кн., 1945
-%
-% The following are results of a statistics about the quality of the
-% morphological rules (the number after the sign ± is the expected
-% standard deviation of our estimations):
-%
-% With the option `--morphology`:
-%
-% * in 0.1% ±0.3% of the dictionary words the morphological patterns
-% create very wrong hyphenation;
-% * in 89.8% ±0.1% of the dictionary words the morphological patterns
-% hyphenate identically with the case when no morphology patterns are
-% used;
-% * in 0.3% ±0.2% of the dictionary words the morphological patterns
-% hyphenate differently in comparison to the case when no morphology
-% patterns are used and the word is hyphenated in a way which
-% contradicts the morphology;
-% * in 0.6% ±0.1% of the dictionary words the morphological patterns
-% hyphenate differently in comparison to the case when no morphology
-% patterns are used and there is a possible hyphenation which is
-% compatible with the word morphology but which is nevertheless
-% forbidden by the morphology patterns.
-%
-% With the option `--safe-morphology`:
-%
-% * in 0% of the dictionary words the morphological patterns create very
-% wrong hyphenation;
-% * in 90.0% ±0.1% of the dictionary words the morphological patterns
-% hyphenate identically with the case when no morphology patterns are
-% used;
-% * in 0.3% ±0.2% of the dictionary words the morphological patterns
-% hyphenate differently in comparison to the case when no morphology
-% patterns are used and the word is hyphenated in a way which
-% contradicts the morphology;
-% * in 0.6% ±0.1% of the dictionary words the morphological patterns
-% hyphenate differently in comparison to the case when no morphology
-% patterns are used and there is a possible hyphenation which is
-% compatible both with the word morphology and with the syllable
-% boundaries but which is nevertheless forbidden by the morphology
-% patterns.
-%
-% Notice that the morphological patterns create a different hyphenation
-% only in about 10% of the words. The following explanation can be
-% given for this surprising fact. First, the natural evolution of the
-% human languages tends to simplify the complex sequences of consonants.
-% Therefore, no morpheme contains a complex sequence of consonants. And
-% second, the Bulgarian orthography is morphological. This means that
-% the morphemes are written according to their actual pronunciation,
-% however the simplifications in the spoken languages which take place
-% at the morpheme boundaries are not taken into account in the
-% orthography. The independent operation of these two factors leads to
-% the result that most of the time the morpheme boundaries coincide with
-% the conventional syllable boundaries. The main exception to this is
-% when a morpheme starts with a vowel, in this case its syllable will
-% include one or more consonants of the preceeding morpheme. The second
-% exception is when a morpheme ends with a vowel and the next morpheme
-% starts with a sequence of two or more consonants.
-%
-% Usage of the script `hyph-bg.sh`
-% --------------------------------
-%
-% The `hyph-bg.sh` is all-in-one script which can generate both
-% documentation (this text) and Bulgarian hyphenation patterns. When
-% given the option `--help` the script gives short usage instructions:
-%
-% ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-% hyph-bg.sh --help
-% Show this info
-% hyph-bg.sh [--doc-html | --doc-latex | --doc-txt]
-% Print documentation in various formats
-% hyph-bg.sh [other options]
-% Generate Bulgarian hyphenation patterns
-%
-% Options when generating hyphenation patterns:
-%
-% --standalone-tex
-% Produce hyphenation patterns for TeX with \patterns{ ... }.
-%
-% --no-hyphen-mins
-% Hyphenation patterns which do not require hyphen mins.
-% Otherwise: both left and right hyphen mins should be set to 2.
-%
-% --safe-dz
-% Do not try to guess whether DZ is a single consonant or not.
-% Only use hyphenation which will be correct in both cases.
-%
-% --permissible
-% Permit any formally correct hyphenation, including unnatural
-% divisions, such as studen-tstvo. Useful for educational tools
-% or when typesetting Bulgarian text in a very short column.
-%
-% --morphology
-% Apply morphology when hyphenating, for example: za-dvizhvam.
-% May hyphenate incorrectly in some cases.
-%
-% --safe-morphology
-% Apply morphology when hyphenating. Never hyphenates incorrectly
-% but may prohibit some correct hyphenations.
-%
-% --no-morphology
-% Disregard the morphology. Default.
-%
-% --1945
-% Hyphenate according to the rules effective between 1945 and 1982
-%
-% --1983
-% Hyphenate according to the rules effective between 1983 and 2011
-%
-% --2012
-% Hyphenate according to the rules effective after 2012. Default.
-% ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-%
-% The following are the recommended ways to generate hyphenation
-% patterns by this script:
-%
-% `hyph-bg.sh --standalone-tex --safe-morphology`
-% : For TeX. Apply the morphology in a safe way when the software
-% uses a smart line-breaking algorithm.
-%
-% `hyph-bg.sh`
-% : For most other software.
-%
-% `hyph-bg.sh --no-hyphen-mins`
-% : The current versions of Mozilla (as of 2017) seem to ignore the
-% hyphen mins in words that contain a dash.
-%
-% `hyph-bg.sh --morphology`
-% : For professional typography with human proof-reader.
-%
-% `hyph-bg.sh --permissible`
-% : For educational tools and online dictionaries which can show only one
-% kind of hyphenation.
-%
-% Notice that some specialised English dictionaries separate the
-% word-division positions into two categories – preferred positions and
-% less recommended positions. It would be best if the Bulgarian online
-% dictionaries could do the same. For example hyphen "-" can be used to
-% display the preferred positions and dot "." – the less recommended
-% positions. If a word-division position is permitted only by the
-% patterns of `hyph-bg.sh --permissible`, then this position is less
-% recommended.
-%
-
-\message{Bulgarian hyphenation patterns (options: --safe-morphology --standalone-tex, version 21 October 2017)}
+% no comment \ No newline at end of file