Diffstat (limited to 'tex/context/patterns/mkiv/lang-bg.lua')
-rw-r--r-- | tex/context/patterns/mkiv/lang-bg.lua | 890 |
1 files changed, 1 insertions, 889 deletions
diff --git a/tex/context/patterns/mkiv/lang-bg.lua b/tex/context/patterns/mkiv/lang-bg.lua index 7bcc69108..36ee29044 100644 --- a/tex/context/patterns/mkiv/lang-bg.lua +++ b/tex/context/patterns/mkiv/lang-bg.lua @@ -6,895 +6,7 @@ return { ["metadata"]={ ["mnemonic"]="bg", ["source"]="hyph-bg", - ["texcomment"]="% copyright: Copyright (C) 2000, 2004, 2017 by Anton Zinoviev <anton@lml.bas.bg>\ -% title: Bulgarian hyphenation patterns\ -% version: 21 October 2017\ -% language:\ -% name: Bulgarian\ -% tag: bg\ -% notice: >\ -% This file is part of the hyph-utf8 package.\ -% See http://www.hyphenation.org for more information.\ -% authors:\ -% -\ -% name: Anton Zinoviev\ -% contact: anton:lml.bas.bg\ -% licence:\ -% text: >\ -% This software may be used, modified, copied, distributed, and sold,\ -% both in source and binary form provided that the above copyright\ -% notice and these terms are retained. The name of the author may not\ -% be used to endorse or promote products derived from this software\ -% without prior permission. THIS SOFTWARE IS PROVIDES \"AS IS\" AND\ -% ANY EXPRESS OR IMPLIED WARRANTIES ARE DISCLAIMED. IN NO EVENT\ -% SHALL THE AUTHOR BE LIABLE FOR ANY DAMAGES ARISING IN ANY WAY OUT\ -% OF THE USE OF THIS SOFTWARE.\ -% hyphenmins:\ -% typesetting:\ -% left: 2\ -% right: 2\ -% changes: See below\ -% ==========================================\ -% Copyright (C) 2000,2004,2017 by Anton Zinoviev <anton@lml.bas.bg>\ -%\ -% This software may be used, modified, copied, distributed, and sold,\ -% both in source and binary form provided that the above copyright\ -% notice and these terms are retained. The name of the author may not\ -% be used to endorse or promote products derived from this software\ -% without prior permission. THIS SOFTWARE IS PROVIDES \"AS IS\" AND\ -% ANY EXPRESS OR IMPLIED WARRANTIES ARE DISCLAIMED. 
IN NO EVENT\ -% SHALL THE AUTHOR BE LIABLE FOR ANY DAMAGES ARISING IN ANY WAY OUT\ -% OF THE USE OF THIS SOFTWARE.\ -%\ -% Bulgarian hyphenation patterns\ -%\ -% Generated by ./hyph-bg.sh --safe-morphology --standalone-tex\ -%\ -% Both left and right hyphenmins should be set to 2.\ -%\ -% % Automated Bulgarian Hyphenation\ -% % Anton Zinoviev\ -% % 21 October 2017\ -% \ -% Principles of the Bulgarian hyphenation\ -% =======================================\ -% \ -% One specificity of the Bulgarian language is that the average length\ -% of the words is greater than in English. When typesetting a Bulgarian\ -% text, hyphenation is more important than when typesetting an English\ -% text. Knuth's algorithm for line-breaking is such that in most\ -% English paragraphs no hyphenation will be used. With a Bulgarian\ -% text, however, even the Knuth's algorithm will use hyphenation in most\ -% paragraphs. Hyphenation becomes an absolute necessity if we want to\ -% obtain nice, justified paragraphs when using a software with dumb\ -% line-breaking algorithm, such as LibreOffice.\ -% \ -% According to Decree 936 of the Council of Ministers promulgated on 27\ -% November 1950, the Institute for Bulgarian Language at the Bulgarian\ -% Academy of Sciences is authorised to publish the rules of the\ -% orthography of the Bulgarian language (within certain limits).\ -% \ -% Hyphenation rules between 1945 and 1983\ -% ---------------------------------------\ -% \ -% Between 1945 and 1983 Bulgarian used syllable hyphenation with two\ -% morphological exceptions: hyphenation is preferred between a prefix\ -% and a stem and at the boundary of compound words. The following were\ -% the rules governing the hyphenation:\ -% \ -% 1. One letter does not stay alone. Words of one syllable can not be\ -% hyphenated.\ -% 2. No hyphenation before or after ь.\ -% 3. In a sequence of vowels at least one vowel stays before the\ -% hyphen.\ -% 4. 
A single consonant between two vowels links with the second vowel.\ -% For example по-ле /po-le/, ра-бо-та /ra-bo-ta/.\ -% 5. In a sequence of consonants between two vowels, at least one\ -% consonant stays with the second vowel. For example те-сто /te-sto/\ -% or тес-то /tes-to/.[^b]\ -% 6. In a sequence of consonants between two vowels, if the first\ -% consonant is sonorant (й /y/, л /l/, м /m/, н /n/, р /r/), then it\ -% stays with the first vowel. For example гер-дан /ger-dan/, сен-ки\ -% /sen-ki/.\ -% 7. The hyphenation separates two successive equal consonants. For\ -% example времен-но /vremen-no/, пролет-та /prolet-ta/.\ -% 8. When the letters дж /dzh/ and дз /dz/ denote a single consonant,\ -% then they are not separated. For example боя-джия /boya-dzhiya/\ -% but not бояд-жия /boyad-zhiya/. When these letters denote two\ -% consonants, then the normal rules apply: над-живявам\ -% /nad-zhivyavam/.\ -% 9. Word prefixes may not be broken. Compound words are hyphenated\ -% either at the boundary of the components or the hyphenation rules\ -% are applied to each of the components separately. For example:\ -% пред-упреждавам /pred-uprezhdavam/ (not пре-дупреждавам\ -% /pre-duprezhdavam/), пред-известие /pred-izvestie/ (not\ -% пре-дизвестие /pre-dizvestie/), за-движвам /za-dvizhvam/ (not\ -% зад-вижвам /zad-vizhvam/), авто-клуб /avto-klub/ (not авток-луб\ -% /avtok-lub/), вакуум-апарат /vakuum-aparat/ (not вакуу-мапарат\ -% /vakuu-maparat/).\ -% \ -% In some rare cases the proper application of rule 9 depends on the\ -% semantics of the word. For example пре-дреша /pre-dresha/ 'change\ -% clothes' but пред-реша /pred-resha/ 'predetermine' or прес-пите\ -% /pres-pite/ 'the snow-drifts' but пре-спите /pre-spite/ 'sleep for a\ -% while/overnight'.\ -% \ -% [^b]: In several publications this rule is formulated with the\ -% additional restriction that the sequence of consonants begins with\ -% an obstruent. I believe this restriction is unintentional. 
It\ -% makes no sense to forbid a hyphenation of the form AB-A but to\ -% permit ABB-A (A denotes a vowel and B – a consonant).\ -% \ -% Hyphenation rules between 1983 and 2012\ -% ---------------------------------------\ -% \ -% The Orthographic dictionary published by the Institute for Bulgarian\ -% language in 1983 introduced new hyphenation rules. The complexity of\ -% the previous rules was the main reason for the change. The new rules\ -% aimed at two objectives: simplicity and unambiguity.\ -% \ -% The new rules are:\ -% \ -% 1. A consonant between two vowels links with the second vowel. For\ -% example ви-со-чи-на /vi-so-chi-na/.\ -% 2. In a sequence of two or more consonants between two vowels, at\ -% least one consonant stays with first vowel and at least one with\ -% the second vowel. For example сес-тра /ses-tra/ and сест-ра\ -% /sest-ra/.\ -% 3. Two equal consonants are separated. For example плен-ник\ -% /plen-nik/.\ -% 4. In a sequence of two or more vowels, the first vowel stays before\ -% the hyphen. For example пре-одолея /pre-odoleya/ and прео-долея\ -% /preo-doleya/.\ -% 5. In a sequence of three or more vowels, the last vowel stays after\ -% the hyphen. For example мао-изъм /mao-izam/ but not маои-зъм\ -% /maoi-zam/.\ -% 6. The letter й /y/ between a vowel and a consonant stays with the\ -% vowel. For example май-ка /may-ka/.\ -% 7. When a sequence of two or more consonants follows й /y/ then at\ -% least one consonant links with й /y/. For example айс-берг\ -% /ays-berg/ (not ай-сберг /ay-sberg/).\ -% 8. The letter й /y/ between two vowels links with the second vowel.\ -% For example ма-йор /ma-yor/.\ -% 9. No hyphenation before or after ь.\ -% 10. When the letters дж /dzh/ denote a single consonant, then they are\ -% not separated. For example су-джук /su-dzhuk/ (not суд-жук\ -% /sud-zhuk/) but над-живея /nad-zhiveya/.\ -% 11. There must be at least one vowel before and after the hyphen.\ -% 12. 
One letter does not stay alone.\ -% \ -% The total disregard of the morphology by these rules leads to some\ -% strange results. For example пре-дизвестие /pre-dizvestie/ is\ -% permitted and пред-известие /pred-izvestie/ is forbidden, зад-вижвам\ -% /zad-vizhvam/ is permitted and за-движвам /za-dvizhvam/ is forbidden,\ -% авток-луб /avtok-lub/ is permitted and авто-клуб /avto-klub/ is\ -% forbidden, вакуу-мапарат /vakuu-maparat/ is permitted and\ -% вакуум-апарат /vakuum-aparat/ is forbidden. Because of this, the new\ -% rules were not universally accepted. The old rules are still\ -% mentioned in various places in Internet, they are included even in\ -% some grammar books published by the publishing houses of the Ministry\ -% of Education and of Sofia University. The software developers,\ -% however, soon came into love with the new hyphenation rules.\ -% \ -% Hyphenation rules after 2012\ -% ----------------------------\ -% \ -% In 2012 new rules came into force. There are two differences with\ -% respect to the previous rules:\ -% \ -% 1. Rule 5 of the previous rules is revoked. For example маои-зъм\ -% /maoi-zam/ becomes a valid hyphenation.\ -% 2. The new rules permit morphologically based hyphenation (however it\ -% is not obligatory). For example пред-известие /pred-izvestie/,\ -% за-движвам /za-dvizhvam/, авто-клуб /avto-klub/, вакуум-апарат\ -% /vakuum-aparat/ are valid hyphenations.\ -% \ -% Good hyphenation is a complex matter and it seems the linguists at the\ -% Institute for Bulgarian Language have recognised this. They no longer\ -% attempt to provide universal rules about everything. Instead, they\ -% provide some very permissible rules while the good application of\ -% these rules is leaved to the discretion and the experience of the\ -% printers and the developers of hyphenation software.\ -% \ -% It makes sense to use at least two different sets of hyphenation rules\ -% for Bulgarian. 
In most cases a more restrictive version should be\ -% used, one which attempts to eliminate the controversial cases of\ -% hyphenation. When typesetting a Bulgarian text in a narrow newspaper\ -% column, however, it will be appropriate to use more liberal\ -% hyphenation rules. It should be noted that one of the reasons for the\ -% hyphenation reform in 1983 was the desire to fix the chaotic\ -% hyphenation in the Bulgarian newspapers at that time.\ -% \ -% Computer implementations\ -% ========================\ -% \ -% Mathematical analysis of the Bulgarian hyphenation\ -% --------------------------------------------------\ -% \ -% The earliest mathematical analysis of the Bulgarian hyphenation rules\ -% belongs to Veska Noncheva.[^1] In 1988 she proposed a mathematical\ -% formalisation of the hyphenation rules in a table with 22 rows.[^2]\ -% \ -% [^1]: <http://www.researchgate.net/profile/Veska_Noncheva>\ -% \ -% [^2]: Нончева В. Алгоритъм за автоматично пренасяне на думи в\ -% българския език. Математика и математическо\ -% образование. Сб. доклади на 17. ПК на СМБ. С., БАН, 1988, 479-482.\ -% \ -% In the same year Eugene Belogay[^3] proposed an alternative\ -% formalisation with only 9 rules.[^4] Belogay proved that his rules are\ -% consistent and that they form a minimal set. The rules of Belogay\ -% have negative character – every hyphenation which is not forbidden by\ -% a rule is possible hyphenation.\ -% \ -% [^3]: <http://www.linkedin.com/in/belogay>\ -% \ -% [^4]: Белогай Е. Алгоритъм за автоматично пренасяне на думи. Компютър\ -% за вас (1988) 3, 12-14.\ -% \ -% The following are the first 7 rules, as formulated by Belogay:\ -% \ -% 1. Б-А\ -% 2. А-ББ\ -% 3. Б-ТТ, ТТ-Б\ -% 4. ААА-Б\ -% 5. й-ББ\ -% 6. Б-ь\ -% 7. д-ж\ -% \ -% Here А denotes an arbitrary vowel letter, Б denotes an arbitrary\ -% consonant letter (including ь and й), ТТ denotes a sequence of two\ -% equal consonant letters and the letters й, ь, д and ж denote\ -% themselves. 
For example the rule \"Б-А\" says that we are not permitted\ -% to separate a consonant letter from immediately following vowel\ -% letter.\ -% \ -% The eighth rule of Belogay says that hyphenation is forbidden before\ -% the first and after the last vowel letter. The ninth rule of Belogay\ -% says that hyphenation is forbidden immediately after the first or\ -% immediately before the last letter of the word.\ -% \ -% Notice that is is very easy to translate the rules of Belogay in the\ -% form, required for the hyphenation algorithm of Knuth and Liang used\ -% in TeX.[^a] Let us remind that this algorithm matches the word with a\ -% set of string patterns in which the odd numbers say hyphenation is\ -% permitted in this position and even numbers say the hyphenation is\ -% forbidden. When two patterns give conflicting numbers for the same\ -% position, then the greater number wins.\ -% \ -% First, since the rules of Belogay are negative (they say where\ -% hyphenation is forbidden, not where it is permitted), we have to\ -% permit the hyphenation everywhere:\ -% \ -% 1. А1\ -% 2. Б1\ -% \ -% Then, the first seven rules of Belogay obtain the form:\ -% \ -% 1. Б2А\ -% 2. А2ББ\ -% 3. Б2ТТ ТТ2Б\ -% 4. ААА2Б\ -% 5. й2ББ\ -% 6. Б2ь\ -% 7. д2ж\ -% \ -% Since no Bulgarian word starts with more that four consonants and no\ -% Bulgarian word ends with more than three consonants, the eighth rule\ -% of Belogay can be translated in the following way:\ -% \ -% 1. .Б2\ -% 2. .ББ2\ -% 3. .БББ2\ -% 4. 2Б.\ -% 5. 2ББ.\ -% \ -% The ninth rule of Belogay means that left and right hyphen mins should\ -% be set to 2.\ -% \ -% The work of Eugene Belogay was not limited to merely a mathematical\ -% analysis of the Bulgarian hyphenation rules. In his paper he\ -% published a short algorithm in Pascal which implements these rules.\ -% It didn't take long for this algorithm to be used in various text\ -% processing software. The algorithm of Belogay was famous for many\ -% years. 
Even as late as 1997 in one book about TeX, the author didn't\ -% care to give any explanations but simply wrote about \"the algorithm of\ -% Belogay\" as something well known to the reader.[^5]\ -% \ -% [^a]: Liang, Franklin Mark. Word Hy-phen-a-tion by\ -% Com-put-er (Doctoral Dissertation). Stanford University, 1983\ -% \ -% [^5]: Василев В. Ултимативният ТеХ. Удоволствието да правим\ -% предпечатна подготовка сами. София, Интела, 1997, 36\ -% \ -% Bulgarian hyphenation in TeX\ -% ----------------------------\ -% \ -% One unfortunate design decision of Knuth was that the hyphenation\ -% algorithm of TeX applied the hyphenation patterns not to the input\ -% character codes but to the internal codes of the glyphs in the font.\ -% This created a problem for the Cyrillic languages because in TeX the\ -% Cyrillic fonts did not have standardised encoding. Perhaps this is\ -% one of the reasons why the earliest implementations of the Bulgarian\ -% hyphenation in TeX did not rely on the internal hyphenation algorithm\ -% of TeX. Instead, external tools were used to insert soft hyphens in\ -% all Bulgarian words. For example such a tool would replace the word\ -% сричкопренасяне /srichkoprenasyane/ with\ -% срич\\\\-коп\\\\-ре\\\\-на\\\\-ся\\\\-не /srich\\\\-kop\\\\-re\\\\-na\\\\-sya\\\\-ne/.\ -% The saying \"To every disadvantage there is a corresponding advantage\"\ -% is true – since Cyrillic and Latin letters use different character\ -% codes, an external tool could easily insert soft hyphens in all\ -% Bulgarian words while leaving the TeX commands intact.\ -% \ -% The earliest known attempt to use the hyphenation algorithm of TeX for\ -% Bulgarian was made by Ognyan Tonev in 1990.[^6] He described his work\ -% as \"a not very good translation of the rules. I work in this\ -% direction. But I don't have a 100% working complect of patterns. 
So,\ -% the copy I send to you[^7] is only a beta-version.\" The hyphenation\ -% patterns of Tonev don't work correctly and it seems he never completed\ -% his work.\ -% \ -% [^6]: The author of this text was unable to find current information\ -% about Ognyan Tonev in Internet. Apparently in 1990 he worked in\ -% the Center of Informatics and Computer Technology of the Bulgarian\ -% Academy of Sciences.\ -% \ -% [^7]: To Yannis Haralambous,\ -% <http://perso.telecom-bretagne.eu/yannisharalambous>\ -% \ -% The first usable Bulgarian hyphenation patterns for TeX were developed\ -% by Georgi Boshnakov[^8] in 1994. In order to solve the encoding\ -% problem, Boshnakov had developed TeX fonts supporting the MIK encoding\ -% (the prevalent encoding at that time in Bulgaria). This allowed him\ -% to introduce a fully working implementation only a few months after\ -% LaTeX2e became the official LaTeX version. Later Boshnakov modified\ -% his work with the Babel system. The hyphenation patterns of Boshnakov\ -% did their job well enough, so that for almost quarter a century after\ -% their initial creation, they remained the only Bulgarian hyphenation\ -% patterns in the standard distributions of TeX and CTAN.\ -% \ -% [^8]: <http://www.maths.manchester.ac.uk/~gb/>\ -% \ -% There are some similarities between the patterns of Boshnakov and the\ -% patterns of Belogay. The following are the main differences.\ -% \ -% First, Boshnakov used an ingenious and more compact implementation of\ -% the second and the third rule. Instead of {А2ББ, Б2ТТ, ТТ2Б}, or\ -% 8×22×22+22×22+22×22=4840 patterns in total, Boshnakov has patterns of\ -% the form 2Б3Б2 and 4Т3Т4, or only 22×22=484 in total, with the same\ -% effect.\ -% \ -% The second main difference between the patterns of Boshnakov and the\ -% patterns of Belogay concerns the letter combination дж /dzh/. 
In\ -% Bulgarian this letter combination can denote either a single\ -% consonant, or a sequence of two consonants and the hyphenation rules\ -% change respectively. Unfortunately, it is impossible to know the\ -% meaning of дж /dzh/ without a vocabulary. The solution of Belogay was\ -% a cautious one – his rules do the hyphenation in a way which will be\ -% correct regardless of whether дж /dzh/ is a single consonant or a\ -% sequence of two consonant. On the other hand, the approach of\ -% Boshnakov is a bold one – since дж /dzh/ is more often a single\ -% consonant, his rules assume that it is always a single consonant. The\ -% number of the cases when this decision leads to bad hyphenations is\ -% insignificant in comparison with the cases in which we obtain improved\ -% hyphenation.\ -% \ -% The third main difference between the patterns of Boshnakov and the\ -% patterns of Belogay concerns the eighth rule – its implementation in\ -% the rules of Boshnakov is rather limited which leads to wrong\ -% hyphenations like бри-дж /bri-dzh/. A full implementation of this\ -% rule would require 11660 patterns in total and this would be too much\ -% for the computers in 1994.\ -% \ -% Later developments\ -% ------------------\ -% \ -% In 1995 Atanas Topalov defended a Masters thesis in the Faculty of\ -% Mathematics and Informatics at Sofia University titled \"Algorithms and\ -% software about text processing\".[^9] One of the main topics in his\ -% thesis was the Bulgarian hyphenation. Topalov criticised vehemently\ -% the official hyphenation rules and their total disregard of the\ -% morphology. He wrote:\ -% \ -% > If we look at the history of the problems of the hyphenation, we\ -% > will discover something very strange. Instead of the expected\ -% > involvement with the depths and aspiration for more admissible and\ -% > satisfactory style, we can find a growing tendency for\ -% > simplification. 
One unpleasant discovery is that the development of\ -% > the hyphenation software stays firmly on the principle \"let us do\ -% > the easiest thing\". The earliest works which have been studied are\ -% > from 1978. It turned out that they present the best approach\ -% > concerning the automated hyphenation. The authors have chosen the\ -% > most difficult but the most correct (from literary point of view)\ -% > method for hyphenation, namely the morphological approach.\ -% \ -% Topalov proposed his own hyphenation algorithm. The hyphenation it\ -% generated was smooth and easy to read. One obvious defect of the\ -% algorithm of Topalov was that it contradicted the official hyphenation\ -% rules at that time. One can argue, however, that his algorithm is\ -% compatible with the current hyphenation rules.\ -% \ -% [^9]: The thesis of Atanas Topalov can be accessed at the author's\ -% website <http://www.mind-print.com>\ -% \ -% In 1999 Svetla Koeva[^10] wrote a paper about the automated Bulgarian\ -% hyphenation.[^11] At that time she was a junior member of the\ -% Department of Computational Linguistics at the Institute for Bulgarian\ -% Language but now she is a director of the whole institute. The paper\ -% of Koeva contains a list of hyphenation patterns which can be used as\ -% a basis of automated hyphenation. In 2004 with the help of Stoyan\ -% Mihov[^12] the rules of Koeva were formalised with regular relations\ -% and rewriting rules. They were implemented in a software product\ -% named ItaEst which provided Bulgarian hyphenation and grammar checking\ -% for various software products of Microsoft and Apple.\ -% \ -% [^10]: <http://dcl.bas.bg/svetla_koeva/>\ -% \ -% [^11]: Коева, Светла. Правила за пренасяне на части от думите на нов\ -% ред. Български език. 
1999/2000, 1, 84-86\ -% \ -% [^12]: <http://lml.bas.bg/~stoyan/>\ -% \ -% The main differences between the hyphenation of Koeva and the official\ -% hyphenation rules effective after 2012 is that the separation of a\ -% long sequence of consonants between two vowels is done according to\ -% the rules valid before 1983. For example се-стра /se-stra/ and\ -% ай-сберг /ay-sberg/ are permitted. The main difference between the\ -% hyphenation of Koeva and the official hyphenation rules effective\ -% before 1983 is that the rules of Koeva disregard the morphology of the\ -% words. The following rule of Koeva is specific: in a sequence of two\ -% sonorant consonants between two vowels, we are permitted to separate\ -% the first vowel from the first consonant, for example материа-лна\ -% /materia-lna/.\ -% \ -% In 2000 Anton Zinoviev[^13] created new hyphenation patterns for TeX.\ -% He didn't know about the previous work of Boshnakov and he didn't\ -% bother to make his work available in the various TeX distributions and\ -% CTAN. His work was used mostly by the local Linux enthusiasts and the\ -% colleagues of Zinoviev. In 2001 Radostin Radnev[^14] created a free\ -% grammar dictionary of Bulgarian[^15] where he used the hyphenation\ -% patterns of Zinoviev. From there the work of Zinoviev propagated to\ -% OpenOffice, LibreOffice and various online dictionaries, including\ -% <http://bg.wiktionary.org> and <http://rechnik.chitanka.info>.\ -% \ -% [^13]: The author of this text.\ -% \ -% [^14]: <http://bg.linkedin.com/in/radostinradnev>\ -% \ -% [^15]: <http://bgoffice.sourceforge.net/>\ -% \ -% The following are the main differences between the hyphenation of\ -% Zinoviev and the hyphenation of Boshnakov.\ -% \ -% First, the eighth rule of Belogay is fully implemented.\ -% \ -% Second, the rules of Zinoviev try to detect when the letters дж /dzh/\ -% (and дз /dz/) denote a single consonant and when they denote a\ -% sequence of two consonants. 
By default, however, Zinoviev (like\ -% Boshnakov) assumes that дж /dzh/ is a single consonant and hyphenates\ -% accordingly.\ -% \ -% Third, the rules of Zinoviev disable some cases of unpleasant\ -% hyphenations:\ -% \ -% 1. In a consonant sequence like тст /tst/, the two equal consonants т\ -% /t/ are separated. For example братст-во /bratst-vo/ is forbidden\ -% while братс-тво /brats-tvo/ and брат-ство /brat-stvo/ are\ -% permitted.\ -% 2. The hyphenation is forbidden after a sonorant consonant following\ -% an obstruent consonant. For example отм-ра /otm-ra/ is forbidden\ -% and от-мра /ot-mra/ is permitted.\ -% 3. The hyphenation separates two consecutive kindred voiced/voiceless\ -% consonants. For example субп-родукт /subp-roduct/ is forbidden and\ -% суб-продукт /sub-product/ is permitted.\ -% \ -% At the start of his work on the Bulgarian hyphenation, Zinoviev had\ -% the opportunity to discuss the hyphenation with Svetla Koeva. He\ -% remembers that some cases of unpleasant hyphenation were suggested to\ -% him by Koeva. Unfortunately, he hasn't taken notes so now he doesn't\ -% know which cases of unpleasant hyphenation have been suggested to him\ -% by Koeva and which are his own findings.\ -% \ -% The present work\ -% ================\ -% \ -% Motivation\ -% ----------\ -% \ -% The present work was carried out on the initiative of the leader of\ -% the Bulgarian localisation team of Mozilla, who contacted Zinoviev,\ -% Boshnakov and the maintainers of the TeX hyphenation patterns.[^17]\ -% This work pursues the following main objectives:\ -% \ -% 1. to update the hyphenation patterns in accordance with the current\ -% hyphenation rules;\ -% 2. to generate the hyphenation patterns by a publicly available\ -% script;\ -% 3. to make the hyphenation patterns customisable;\ -% 4. 
to provide documentation for the future developers.\ -% \ -% [^16]: <http://mozillians.org/en-US/u/stoyan/>\ -% \ -% [^17]: <http://hyphenation.org>\ -% \ -% The current official hyphenating rules for Bulgarian are rather\ -% liberal. Very often, in a long sequence of consonants we are\ -% permitted to split the word at any position, for example аген-т-с-т-во\ -% /agen-t-s-t-vo/. This is prone to many unusual and unexpected results\ -% that interrupt the attention of the reader or deceive his expectations\ -% during the movement of his eyes to the next line. On the other hand,\ -% in order to produce nice justified paragraphs there is no need for so\ -% many hyphenation possibilities. It would be sufficient even if only\ -% one possible separation between any two syllables was permitted.\ -% \ -% Therefore, it makes sense to use a more restrictive version of the\ -% Bulgarian hyphenation, one which eliminates the controversial cases of\ -% hyphenation. Only when typesetting a Bulgarian text in a very narrow\ -% newspaper column it will be appropriate to use a more liberal version.\ -% It should be noted that some specialised English dictionaries also\ -% separate the word-division positions into two categories – preferred\ -% positions and less recommended positions.\ -% \ -% There are two methods to determine the optimal division within a\ -% sequence of consonants between two vowels:\ -% \ -% * we can hyphenate according to the syllables in the word or\ -% * we can hyphenate morphologically.\ -% \ -% Hyphenation according to the syllables in the word\ -% --------------------------------------------------\ -% \ -% Let us look at the properties of the Bulgarian syllables. All\ -% syllables have the following structure:\ -% \ -% > onset - nucleus - code\ -% \ -% The nucleus in Bulgarian is always a vowel. 
Both the onset and the\ -% code are (possibly empty) sequences of consonants.\ -% \ -% The Bulgarian syllables adhere to the Sonority Sequencing Principle.\ -% According to this principle, the consonants within the onset have\ -% raising sonority and the consonants within the code have decreasing\ -% sonority.\ -% \ -% Several grammar books agree that the following sonority scale is valid\ -% for Bulgarian:\ -% \ -% > voiceless obtrusive < voiced obtrusive < sonorant consonant < vowel\ -% \ -% According to the investigations of the author, the only exception to\ -% this law is due to the letter в /v/ which is a voiced obtrusive but it\ -% can be used also as a voiceless obtrusive. This exception is due to a\ -% spelling particularity of the Bulgarian language. Whenever the letter\ -% в /v/ seemingly violates the Sonority Sequencing Principle, in the\ -% spoken language this letter is read as ф /f/, that is as a voiceless\ -% obtrusive (for example the word отвсякъде /otvsyakade/ is read as\ -% отфсякъде /otfsyakade/).[^18]\ -% \ -% [^18]: No Primitive Slavonic word contains the phoneme ф /f/.\ -% Therefore, we can safely assume that in the Primitive Slavonic\ -% language the consonant ф /f/ was a positional variant of the consonant\ -% в /v/.\ -% \ -% The author has found that the sonorant consonants in Bulgarian have\ -% their own sonority scale:\ -% \ -% > м /m/ < н /n/ < л /l/ < р /r/ < й /y/\ -% \ -% Only a few words such as жанр /zhanr/ and химн /himn/ violate this\ -% scale. Such words are always loan-words and their pronunciation is\ -% somewhat problematic for the native Bulgarian speakers.\ -% \ -% In addition to the Sonority Sequencing Principle, the consonant\ -% clusters within the Bulgarian syllable adhere to the following\ -% additional principles:\ -% \ -% 1. Both in the onset and in the code, the labial and dorsal plosives\ -% precede the coronal plosives and affricates.\ -% 2. 
If the onset or the code contains two plosives or affricates, then\ -% there are no fricatives between them. Few words with the Latin\ -% root 'text' are exceptions: контекст /kontekst/.\ -% 3. If the onset or the code contains two fricatives other than в /v/,\ -% then there are no plosives or affricates between them.\ -% 4. If the onset or the code contains two plosives or affricates, then\ -% they both have equal sonority (both are voiced, or both are\ -% voiceless).\ -% 5. If the onset or the code contains two fricatives other than в /v/,\ -% then they both have equal sonority (both are voiced, or both are\ -% voiceless).\ -% 6. Neither the onset, nor the code may contain two labial plosives, or\ -% two coronal plosives or affricates or two dorsal plosives.\ -% 7. Neither the onset, nor the code may contain two equal consonants\ -% with the exception of в /v/ (for example втвърди /vtvardi/).[^19]\ -% \ -% [^19]: Actually, the letter в /v/ is not a real exception because in\ -% all such cases this letter denotes two different consonants – в /v/\ -% and ф /f/. Only in the Russian loan-word взвод /vzvod/ the two\ -% letters в /v/ denote a repeating consonant в /v/.\ -% \ -% From all these properties of the Bulgarian syllable we can deduce the\ -% following hyphenation rules:\ -% \ -% 1. In a sequence МК where М is a consonant with higher sonority than\ -% K, we are not permitted to hyphenate before М. Exception: when М\ -% is в /v/ and К is a voiceless consonant.\ -% 2. In a sequence КМ where М is a consonant with higher sonority than\ -% K, we are not permitted to hyphenate after М.\ -% 3. In a sequence KBT where K and T are plosives or affricates and B is\ -% fricative, we separate K from T.\ -% 4. In a sequence CKB where K is a plosive or affricate and C and B are\ -% fricatives other than в /v/, we separate C from B.\ -% 5. 
If in a consonant sequence a coronal plosive or affricate Т is\ -% followed by a labial or dorsal plosive К, then we separate Т from К.\ -% 6. If a consonant sequence contains two plosives or affricates, one\ -% voiced and one voiceless, then we separate them.\ -% 7. If a consonant sequence contains two fricatives other than в /v/,\ -% one voiced and one voiceless, then we separate them.\ -% 8. If a consonant sequence contains two labial plosives or two coronal\ -% plosives or affricates or two dorsal plosives then they are\ -% separated.\ -% 9. If a consonant sequence contains two equal consonants (not\ -% necessarily consecutive), then they are separated.\ -% \ -% With so many prohibitive rules, a question arises: if we apply all\ -% these rules, aren't we going to eliminate too many hyphenation\ -% possibilities? The answer is no. It can be demonstrated that between\ -% any two consecutive syllables at least one separation point will be\ -% permitted.\ -% \ -% \ -% Hyphenation according to the morphology\ -% ---------------------------------------\ -% \ -% Between 1983 and 2012 the official orthographic rules of the\ -% Bulgarian language forbade morphologically based hyphenation. After\ -% 2012 such hyphenation is permitted (but not obligatory).\ -% \ -% The most important case when it is very desirable to use\ -% morphologically based hyphenation is the case of the compound words.\ -% Divisions such as авток-луб /avtok-lub/ and вакуу-мапарат\ -% /vakuu-maparat/ are extremely irritating even if they are formally\ -% correct. Unfortunately, we do not have a vocabulary of the compound\ -% Bulgarian words that would permit us to produce rules for automated\ -% hyphenation. Therefore, the current Bulgarian hyphenation patterns do\ -% not attempt to apply morphological hyphenation to such words.\ -% \ -% Second in importance (but far more significant in terms of numbers) is\ -% the case with the word prefixes. 
While the eyes of the reader still\ -% look at the start of the word, the word is still unknown to him. At\ -% this point, it is very important not to deceive his expectations. For\ -% example, when the reader sees над- /nad-/ at the end of the line, he\ -% will expect that this is the prefix над- /nad-/ with semantics 'attain\ -% more than'. This expectation will be fooled if this wasn't really a\ -% prefix, but a deceiving (while formally correct) hyphenation of the\ -% word надремя /nadremya/ 'have dozed enough' where the real prefix is\ -% not над- /nad-/ but на- /na-/ with semantics 'achieve a state after\ -% accumulation'. Such hyphenation distracts the reader and makes the\ -% reading more difficult.\ -% \ -% Third in importance is the case with the word suffixes. With respect\ -% to the hyphenation rules we can divide the suffixes into three\ -% categories:\ -% \ -% 1. Suffixes starting with a vowel, for example -ар /-ar/. It is not\ -% appropriate to follow the morphology with such suffixes because\ -% this will contradict the whole hyphenation tradition of the\ -% Bulgarian language. For example крав-ар /krav-ar/ is unwarranted.\ -% 2. Suffixes starting with one consonant, for example -ка /-ka/.\ -% Usually with such suffixes the syllable boundary in the word\ -% coincides with morpheme boundary so no specific cares are\ -% necessary, for example кравар-ка /kravar-ka/. The exceptions are\ -% rare, for example: обек-тната /obek-tnata/ instead of обект-ната\ -% /obekt-nata/.\ -% 3. Suffixes starting with more than one consonant (-ски /-ski/, -ство\ -% /-stvo/). It is possible to use morphological hyphenation rules\ -% with such suffixes.\ -% \ -% Even if it is possible to use morphological hyphenation with the\ -% suffixes of the third category, it turns out, this is not as useful as\ -% it is with the case of the prefixes. When the eyes of the reader have\ -% reached this part of the word, the word is already more or less known\ -% to the reader. 
Therefore, at this point the morphological hyphenation\ -% does not provide any significant advantages in comparison to the\ -% simpler hyphenation based only on the syllables in the word. Consider\ -% for example the word геройс-тво /geroys-tvo/ with suffix -ство\ -% /-stvo/. When the reader sees геройс- /geroys-/ at the end of the\ -% line this will give him an early clue that the suffix of the word is\ -% -ство /-stvo/. Such non-morphological hyphenation does not deceive\ -% the expectations of the reader. On the contrary, it makes the reading\ -% easier because it gives clues to the reader about what follows on the\ -% next line.\ -% \ -% Because of these considerations, the current Bulgarian hyphenation\ -% patterns do not attempt to use morphological hyphenation with respect\ -% to the suffixes of the words. Though it would be useful to implement\ -% rules about the suffixes of the second category. Hopefully, some\ -% future version will have such rules.\ -% \ -% Occasionally,[^20] a fourth morphological requirement is stated: that\ -% hyphenation should conform with the boundary between the word and the\ -% definite articles -та /-ta/ and -те /-te/ (postfixed in Bulgarian).\ -% There is no need to pay attention to this rule because it seems to be\ -% satisfied by its own nature. The author has searched in a dictionary\ -% with over 860000 Bulgarian words for cases when the hyphenation rules\ -% would hyphenate badly with respect to the definite article. He was\ -% unable to find even one such case with the hyphenation rules valid\ -% after 1983 and only about 10 cases with the rules valid before 1983\ -% (one of them is живопи-ста /zhivopi-sta/ instead of живопис-та\ -% /zhivopis-ta/).\ -% \ -% One unavoidable characteristic of any morphologically based automated\ -% hyphenation is that it can create wrong hyphenations. 
Because of\ -% this, one useful option is to use the morphology in a safe way – to\ -% use it in order to forbid bad hyphenations but to create no new\ -% hyphenation possibilities solely on the basis of the morphology.\ -% \ -% Take for example the word дозрея /dozreya/ 'ripen fully'. According\ -% to the phonological rules, we should hyphenate it as доз-рея\ -% /doz-reya/. According to the morphology, however, we should hyphenate\ -% as до-зрея /do-zreya/ because this word is formed with the prefix до-\ -% /do-/ with semantics 'complete or supplement' and this semantics would\ -% be lost if the reader sees доз- /doz-/ at the end of the line.\ -% Therefore, there are three methods to hyphenate this word:\ -% \ -% 1. доз-рея /doz-reya/ when morphology is not used;\ -% 2. до-зрея /do-zreya/ when morphology is fully used;\ -% 3. дозрея /dozreya/ (no hyphenation) when morphology is used in a safe\ -% way.\ -% \ -% The option to use the morphology in a safe way is very attractive when\ -% the software uses a smart line-breaking algorithm which can produce\ -% good results even with fewer hyphenation possibilities. TeX is one\ -% such software. It should be noted that this option does not eliminate\ -% too many hyphenation possibilities because the morpheme boundaries\ -% most of the time are also syllable boundaries.\ -% \ -% [^20]: Правописен и правоговорен наръчник. Състав. Иван Хаджов,\ -% Цв. Минков; Ред. Ив. Хаджов и др. София, Бълг. 
кн., 1945\ -% \ -% The following are results of a statistics about the quality of the\ -% morphological rules (the number after the sign ± is the expected\ -% standard deviation of our estimations):\ -% \ -% With the option `--morphology`:\ -% \ -% * in 0.1% ±0.3% of the dictionary words the morphological patterns\ -% create very wrong hyphenation;\ -% * in 89.8% ±0.1% of the dictionary words the morphological patterns\ -% hyphenate identically with the case when no morphology patterns are\ -% used;\ -% * in 0.3% ±0.2% of the dictionary words the morphological patterns\ -% hyphenate differently in comparison to the case when no morphology\ -% patterns are used and the word is hyphenated in a way which\ -% contradicts the morphology;\ -% * in 0.6% ±0.1% of the dictionary words the morphological patterns\ -% hyphenate differently in comparison to the case when no morphology\ -% patterns are used and there is a possible hyphenation which is\ -% compatible with the word morphology but which is nevertheless\ -% forbidden by the morphology patterns.\ -% \ -% With the option `--safe-morphology`:\ -% \ -% * in 0% of the dictionary words the morphological patterns create very\ -% wrong hyphenation;\ -% * in 90.0% ±0.1% of the dictionary words the morphological patterns\ -% hyphenate identically with the case when no morphology patterns are\ -% used;\ -% * in 0.3% ±0.2% of the dictionary words the morphological patterns\ -% hyphenate differently in comparison to the case when no morphology\ -% patterns are used and the word is hyphenated in a way which\ -% contradicts the morphology;\ -% * in 0.6% ±0.1% of the dictionary words the morphological patterns\ -% hyphenate differently in comparison to the case when no morphology\ -% patterns are used and there is a possible hyphenation which is\ -% compatible both with the word morphology and with the syllable\ -% boundaries but which is nevertheless forbidden by the morphology\ -% patterns.\ -% \ -% Notice that the morphological 
patterns create a different hyphenation\ -% only in about 10% of the words. The following explanation can be\ -% given for this surprising fact. First, the natural evolution of the\ -% human languages tends to simplify the complex sequences of consonants.\ -% Therefore, no morpheme contains a complex sequence of consonants. And\ -% second, the Bulgarian orthography is morphological. This means that\ -% the morphemes are written according to their actual pronunciation,\ -% however the simplifications in the spoken languages which take place\ -% at the morpheme boundaries are not taken into account in the\ -% orthography. The independent operation of these two factors leads to\ -% the result that most of the time the morpheme boundaries coincide with\ -% the conventional syllable boundaries. The main exception to this is\ -% when a morpheme starts with a vowel, in this case its syllable will\ -% include one or more consonants of the preceding morpheme. The second\ -% exception is when a morpheme ends with a vowel and the next morpheme\ -% starts with a sequence of two or more consonants.\ -% \ -% Usage of the script `hyph-bg.sh`\ -% --------------------------------\ -% \ -% The `hyph-bg.sh` is an all-in-one script which can generate both\ -% documentation (this text) and Bulgarian hyphenation patterns. When\ -% given the option `--help` the script gives short usage instructions:\ -% \ -% ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\ -% hyph-bg.sh --help\ -% Show this info\ -% hyph-bg.sh [--doc-html | --doc-latex | --doc-txt]\ -% Print documentation in various formats\ -% hyph-bg.sh [other options]\ -% Generate Bulgarian hyphenation patterns\ -% \ -% Options when generating hyphenation patterns:\ -% \ -% --standalone-tex\ -% Produce hyphenation patterns for TeX with \\patterns{ ... 
}.\ -% \ -% --no-hyphen-mins\ -% Hyphenation patterns which do not require hyphen mins.\ -% Otherwise: both left and right hyphen mins should be set to 2.\ -% \ -% --safe-dz\ -% Do not try to guess whether DZ is a single consonant or not.\ -% Only use hyphenation which will be correct in both cases.\ -% \ -% --permissible\ -% Permit any formally correct hyphenation, including unnatural\ -% divisions, such as studen-tstvo. Useful for educational tools\ -% or when typesetting Bulgarian text in a very short column.\ -% \ -% --morphology\ -% Apply morphology when hyphenating, for example: za-dvizhvam.\ -% May hyphenate incorrectly in some cases.\ -% \ -% --safe-morphology\ -% Apply morphology when hyphenating. Never hyphenates incorrectly\ -% but may prohibit some correct hyphenations.\ -% \ -% --no-morphology\ -% Disregard the morphology. Default.\ -% \ -% --1945\ -% Hyphenate according to the rules effective between 1945 and 1982\ -% \ -% --1983\ -% Hyphenate according to the rules effective between 1983 and 2011\ -% \ -% --2012\ -% Hyphenate according to the rules effective after 2012. Default.\ -% ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\ -% \ -% The following are the recommended ways to generate hyphenation\ -% patterns by this script:\ -% \ -% `hyph-bg.sh --standalone-tex --safe-morphology`\ -% : For TeX. 
Apply the morphology in a safe way when the software\ -% uses a smart line-breaking algorithm.\ -% \ -% `hyph-bg.sh`\ -% : For most other software.\ -% \ -% `hyph-bg.sh --no-hyphen-mins`\ -% : The current versions of Mozilla (as of 2017) seem to ignore the\ -% hyphen mins in words that contain a dash.\ -% \ -% `hyph-bg.sh --morphology`\ -% : For professional typography with human proof-reader.\ -% \ -% `hyph-bg.sh --permissible`\ -% : For educational tools and online dictionaries which can show only one\ -% kind of hyphenation.\ -% \ -% Notice that some specialised English dictionaries separate the\ -% word-division positions into two categories – preferred positions and\ -% less recommended positions. It would be best if the Bulgarian online\ -% dictionaries could do the same. For example hyphen \"-\" can be used to\ -% display the preferred positions and dot \".\" – the less recommended\ -% positions. If a word-division position is permitted only by the\ -% patterns of `hyph-bg.sh --permissible`, then this position is less\ -% recommended.\ -% \ -% \ -% \\message{Bulgarian hyphenation patterns (options: --safe-morphology --standalone-tex, version 21 October 2017)}\ -% ", + ["texcomment"]="% no comment", }, ["patterns"]={ ["characters"]="абвгдежзийклмнопрстуфхцчшщъюя", |