% generated by mtxrun --script pattern --convert % copyright: Copyright (C) 2000, 2004, 2017 by Anton Zinoviev % title: Bulgarian hyphenation patterns % version: 21 October 2017 % language: % name: Bulgarian % tag: bg % notice: > % This file is part of the hyph-utf8 package. % See http://www.hyphenation.org for more information. % authors: % - % name: Anton Zinoviev % contact: anton:lml.bas.bg % licence: % text: > % This software may be used, modified, copied, distributed, and sold, % both in source and binary form provided that the above copyright % notice and these terms are retained. The name of the author may not % be used to endorse or promote products derived from this software % without prior permission. THIS SOFTWARE IS PROVIDES "AS IS" AND % ANY EXPRESS OR IMPLIED WARRANTIES ARE DISCLAIMED. IN NO EVENT % SHALL THE AUTHOR BE LIABLE FOR ANY DAMAGES ARISING IN ANY WAY OUT % OF THE USE OF THIS SOFTWARE. % hyphenmins: % typesetting: % left: 2 % right: 2 % changes: See below % ========================================== % Copyright (C) 2000,2004,2017 by Anton Zinoviev % % This software may be used, modified, copied, distributed, and sold, % both in source and binary form provided that the above copyright % notice and these terms are retained. The name of the author may not % be used to endorse or promote products derived from this software % without prior permission. THIS SOFTWARE IS PROVIDES "AS IS" AND % ANY EXPRESS OR IMPLIED WARRANTIES ARE DISCLAIMED. IN NO EVENT % SHALL THE AUTHOR BE LIABLE FOR ANY DAMAGES ARISING IN ANY WAY OUT % OF THE USE OF THIS SOFTWARE. % % Bulgarian hyphenation patterns % % Generated by ./hyph-bg.sh --safe-morphology --standalone-tex % % Both left and right hyphenmins should be set to 2. 
% % % Automated Bulgarian Hyphenation % % Anton Zinoviev % % 21 October 2017 % % Principles of the Bulgarian hyphenation % ======================================= % % One peculiarity of the Bulgarian language is that the average length % of the words is greater than in English. When typesetting a Bulgarian % text, hyphenation is more important than when typesetting an English % text. Knuth's algorithm for line-breaking is such that in most % English paragraphs no hyphenation will be used. With a Bulgarian % text, however, even Knuth's algorithm will use hyphenation in most % paragraphs. Hyphenation becomes an absolute necessity if we want to % obtain nice, justified paragraphs when using software with a dumb % line-breaking algorithm, such as LibreOffice. % % According to Decree 936 of the Council of Ministers promulgated on 27 % November 1950, the Institute for Bulgarian Language at the Bulgarian % Academy of Sciences is authorised to publish the rules of the % orthography of the Bulgarian language (within certain limits). % % Hyphenation rules between 1945 and 1983 % --------------------------------------- % % Between 1945 and 1983 Bulgarian used syllable hyphenation with two % morphological exceptions: hyphenation was preferred between a prefix % and a stem and at the boundary of compound words. The following were % the rules governing the hyphenation: % % 1. One letter does not stay alone. Words of one syllable cannot be % hyphenated. % 2. No hyphenation before or after ь. % 3. In a sequence of vowels at least one vowel stays before the % hyphen. % 4. A single consonant between two vowels links with the second vowel. % For example по-ле /po-le/, ра-бо-та /ra-bo-ta/. % 5. In a sequence of consonants between two vowels, at least one % consonant stays with the second vowel. For example те-сто /te-sto/ % or тес-то /tes-to/.[^b] % 6. 
In a sequence of consonants between two vowels, if the first % consonant is sonorant (й /y/, л /l/, м /m/, н /n/, р /r/), then it % stays with the first vowel. For example гер-дан /ger-dan/, сен-ки % /sen-ki/. % 7. The hyphenation separates two successive equal consonants. For % example времен-но /vremen-no/, пролет-та /prolet-ta/. % 8. When the letters дж /dzh/ and дз /dz/ denote a single consonant, % then they are not separated. For example боя-джия /boya-dzhiya/ % but not бояд-жия /boyad-zhiya/. When these letters denote two % consonants, then the normal rules apply: над-живявам % /nad-zhivyavam/. % 9. Word prefixes may not be broken. Compound words are hyphenated % either at the boundary of the components or the hyphenation rules % are applied to each of the components separately. For example: % пред-упреждавам /pred-uprezhdavam/ (not пре-дупреждавам % /pre-duprezhdavam/), пред-известие /pred-izvestie/ (not % пре-дизвестие /pre-dizvestie/), за-движвам /za-dvizhvam/ (not % зад-вижвам /zad-vizhvam/), авто-клуб /avto-klub/ (not авток-луб % /avtok-lub/), вакуум-апарат /vakuum-aparat/ (not вакуу-мапарат % /vakuu-maparat/). % % In some rare cases the proper application of rule 9 depends on the % semantics of the word. For example пре-дреша /pre-dresha/ 'change % clothes' but пред-реша /pred-resha/ 'predetermine' or прес-пите % /pres-pite/ 'the snow-drifts' but пре-спите /pre-spite/ 'sleep for a % while/overnight'. % % [^b]: In several publications this rule is formulated with the % additional restriction that the sequence of consonants begins with % an obstruent. I believe this restriction is unintentional. It % makes no sense to forbid a hyphenation of the form AB-A but to % permit ABB-A (A denotes a vowel and B – a consonant). % % Hyphenation rules between 1983 and 2012 % --------------------------------------- % % The Orthographic dictionary published by the Institute for Bulgarian % language in 1983 introduced new hyphenation rules. 
The complexity of % the previous rules was the main reason for the change. The new rules % aimed at two objectives: simplicity and unambiguity. % % The new rules are: % % 1. A consonant between two vowels links with the second vowel. For % example ви-со-чи-на /vi-so-chi-na/. % 2. In a sequence of two or more consonants between two vowels, at % least one consonant stays with first vowel and at least one with % the second vowel. For example сес-тра /ses-tra/ and сест-ра % /sest-ra/. % 3. Two equal consonants are separated. For example плен-ник % /plen-nik/. % 4. In a sequence of two or more vowels, the first vowel stays before % the hyphen. For example пре-одолея /pre-odoleya/ and прео-долея % /preo-doleya/. % 5. In a sequence of three or more vowels, the last vowel stays after % the hyphen. For example мао-изъм /mao-izam/ but not маои-зъм % /maoi-zam/. % 6. The letter й /y/ between a vowel and a consonant stays with the % vowel. For example май-ка /may-ka/. % 7. When a sequence of two or more consonants follows й /y/ then at % least one consonant links with й /y/. For example айс-берг % /ays-berg/ (not ай-сберг /ay-sberg/). % 8. The letter й /y/ between two vowels links with the second vowel. % For example ма-йор /ma-yor/. % 9. No hyphenation before or after ь. % 10. When the letters дж /dzh/ denote a single consonant, then they are % not separated. For example су-джук /su-dzhuk/ (not суд-жук % /sud-zhuk/) but над-живея /nad-zhiveya/. % 11. There must be at least one vowel before and after the hyphen. % 12. One letter does not stay alone. % % The total disregard of the morphology by these rules leads to some % strange results. 
For example пре-дизвестие /pre-dizvestie/ is % permitted and пред-известие /pred-izvestie/ is forbidden, зад-вижвам % /zad-vizhvam/ is permitted and за-движвам /za-dvizhvam/ is forbidden, % авток-луб /avtok-lub/ is permitted and авто-клуб /avto-klub/ is % forbidden, вакуу-мапарат /vakuu-maparat/ is permitted and % вакуум-апарат /vakuum-aparat/ is forbidden. Because of this, the new % rules were not universally accepted. The old rules are still % mentioned in various places on the Internet, and they are even included in % some grammar books published by the publishing houses of the Ministry % of Education and of Sofia University. The software developers, % however, soon fell in love with the new hyphenation rules. % % Hyphenation rules after 2012 % ---------------------------- % % In 2012 new rules came into force. There are two differences with % respect to the previous rules: % % 1. Rule 5 of the previous rules is revoked. For example маои-зъм % /maoi-zam/ becomes a valid hyphenation. % 2. The new rules permit morphologically based hyphenation (although it % is not obligatory). For example пред-известие /pred-izvestie/, % за-движвам /za-dvizhvam/, авто-клуб /avto-klub/, вакуум-апарат % /vakuum-aparat/ are valid hyphenations. % % Good hyphenation is a complex matter and it seems the linguists at the % Institute for Bulgarian Language have recognised this. They no longer % attempt to provide universal rules about everything. Instead, they % provide some very permissive rules, while the good application of % these rules is left to the discretion and the experience of the % printers and the developers of hyphenation software. % % It makes sense to use at least two different sets of hyphenation rules % for Bulgarian. In most cases a more restrictive version should be % used, one which attempts to eliminate the controversial cases of % hyphenation. 
When typesetting a Bulgarian text in a narrow newspaper % column, however, it will be appropriate to use more liberal % hyphenation rules. It should be noted that one of the reasons for the % hyphenation reform in 1983 was the desire to fix the chaotic % hyphenation in the Bulgarian newspapers at that time. % % Computer implementations % ======================== % % Mathematical analysis of the Bulgarian hyphenation % -------------------------------------------------- % % The earliest mathematical analysis of the Bulgarian hyphenation rules % is due to Veska Noncheva.[^1] In 1988 she proposed a mathematical % formalisation of the hyphenation rules in a table with 22 rows.[^2] % % [^1]: % % [^2]: Нончева В. Алгоритъм за автоматично пренасяне на думи в % българския език. Математика и математическо % образование. Сб. доклади на 17. ПК на СМБ. С., БАН, 1988, 479-482. % % In the same year Eugene Belogay[^3] proposed an alternative % formalisation with only 9 rules.[^4] Belogay proved that his rules are % consistent and that they form a minimal set. The rules of Belogay % are negative in character – every hyphenation which is not forbidden by % a rule is a possible hyphenation. % % [^3]: % % [^4]: Белогай Е. Алгоритъм за автоматично пренасяне на думи. Компютър % за вас (1988) 3, 12-14. % % The following are the first 7 rules, as formulated by Belogay: % % 1. Б-А % 2. А-ББ % 3. Б-ТТ, ТТ-Б % 4. ААА-Б % 5. й-ББ % 6. Б-ь % 7. д-ж % % Here А denotes an arbitrary vowel letter, Б denotes an arbitrary % consonant letter (including ь and й), ТТ denotes a sequence of two % equal consonant letters and the letters й, ь, д and ж denote % themselves. For example the rule "Б-А" says that we are not permitted % to separate a consonant letter from the immediately following vowel % letter. % % The eighth rule of Belogay says that hyphenation is forbidden before % the first and after the last vowel letter. 
The ninth rule of Belogay % says that hyphenation is forbidden immediately after the first or % immediately before the last letter of the word. % % Notice that it is very easy to translate the rules of Belogay into the % form required by the hyphenation algorithm of Knuth and Liang used % in TeX.[^a] Recall that this algorithm matches the word against a % set of string patterns in which odd numbers say that hyphenation is % permitted at a position and even numbers say that it is % forbidden. When two patterns give conflicting numbers for the same % position, the greater number wins. % % First, since the rules of Belogay are negative (they say where % hyphenation is forbidden, not where it is permitted), we have to % permit the hyphenation everywhere: % % 1. А1 % 2. Б1 % % Then, the first seven rules of Belogay obtain the form: % % 1. Б2А % 2. А2ББ % 3. Б2ТТ ТТ2Б % 4. ААА2Б % 5. й2ББ % 6. Б2ь % 7. д2ж % % Since no Bulgarian word starts with more than four consonants and no % Bulgarian word ends with more than three consonants, the eighth rule % of Belogay can be translated in the following way: % % 1. .Б2 % 2. .ББ2 % 3. .БББ2 % 4. 2Б. % 5. 2ББ. % % The ninth rule of Belogay means that left and right hyphen mins should % be set to 2. % % The work of Eugene Belogay was not limited to merely a mathematical % analysis of the Bulgarian hyphenation rules. In his paper he % published a short algorithm in Pascal which implements these rules. % It didn't take long for this algorithm to be used in various text % processing software. The algorithm of Belogay was famous for many % years. Even as late as 1997, in one book about TeX, the author didn't % care to give any explanations but simply wrote about "the algorithm of % Belogay" as something well known to the reader.[^5] % % [^a]: Liang, Franklin Mark. Word Hy-phen-a-tion by % Com-put-er (Doctoral Dissertation). Stanford University, 1983 % % [^5]: Василев В. Ултимативният ТеХ. 
Удоволствието да правим % предпечатна подготовка сами. София, Интела, 1997, 36 % % Bulgarian hyphenation in TeX % ---------------------------- % % One unfortunate design decision of Knuth was that the hyphenation % algorithm of TeX applied the hyphenation patterns not to the input % character codes but to the internal codes of the glyphs in the font. % This created a problem for the Cyrillic languages because in TeX the % Cyrillic fonts did not have standardised encoding. Perhaps this is % one of the reasons why the earliest implementations of the Bulgarian % hyphenation in TeX did not rely on the internal hyphenation algorithm % of TeX. Instead, external tools were used to insert soft hyphens in % all Bulgarian words. For example such a tool would replace the word % сричкопренасяне /srichkoprenasyane/ with % срич\\-коп\\-ре\\-на\\-ся\\-не /srich\\-kop\\-re\\-na\\-sya\\-ne/. % The saying "To every disadvantage there is a corresponding advantage" % is true – since Cyrillic and Latin letters use different character % codes, an external tool could easily insert soft hyphens in all % Bulgarian words while leaving the TeX commands intact. % % The earliest known attempt to use the hyphenation algorithm of TeX for % Bulgarian was made by Ognyan Tonev in 1990.[^6] He described his work % as "a not very good translation of the rules. I work in this % direction. But I don't have a 100% working complect of patterns. So, % the copy I send to you[^7] is only a beta-version." The hyphenation % patterns of Tonev don't work correctly and it seems he never completed % his work. % % [^6]: The author of this text was unable to find current information % about Ognyan Tonev in Internet. Apparently in 1990 he worked in % the Center of Informatics and Computer Technology of the Bulgarian % Academy of Sciences. % % [^7]: To Yannis Haralambous, % % % The first usable Bulgarian hyphenation patterns for TeX were developed % by Georgi Boshnakov[^8] in 1994. 
In order to solve the encoding % problem, Boshnakov developed TeX fonts supporting the MIK encoding % (the prevalent encoding at that time in Bulgaria). This allowed him % to introduce a fully working implementation only a few months after % LaTeX2e became the official LaTeX version. Later Boshnakov adapted % his work for the Babel system. The hyphenation patterns of Boshnakov % did their job well enough, so that for almost a quarter of a century after % their initial creation, they remained the only Bulgarian hyphenation % patterns in the standard distributions of TeX and CTAN. % % [^8]: % % There are some similarities between the patterns of Boshnakov and the % patterns of Belogay. The following are the main differences. % % First, Boshnakov used an ingenious and more compact implementation of % the second and the third rule. Instead of {А2ББ, Б2ТТ, ТТ2Б}, or % 8×22×22+22×22+22×22=4840 patterns in total, Boshnakov has patterns of % the form 2Б3Б2 and 4Т3Т4, or only 22×22=484 in total, with the same % effect. % % The second main difference between the patterns of Boshnakov and the % patterns of Belogay concerns the letter combination дж /dzh/. In % Bulgarian this letter combination can denote either a single % consonant, or a sequence of two consonants and the hyphenation rules % change accordingly. Unfortunately, it is impossible to know the % meaning of дж /dzh/ without a vocabulary. The solution of Belogay was % a cautious one – his rules do the hyphenation in a way which will be % correct regardless of whether дж /dzh/ is a single consonant or a % sequence of two consonants. On the other hand, the approach of % Boshnakov is a bold one – since дж /dzh/ is more often a single % consonant, his rules assume that it is always a single consonant. The % number of the cases when this decision leads to bad hyphenations is % insignificant in comparison with the cases in which we obtain improved % hyphenation. 
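%
% The pattern mechanics discussed in this section can be tried out with a
% short Python sketch. It is only an illustration under stated
% assumptions, not the script that generated this file: it expands the
% class templates of Belogay (in the translated form given earlier) into
% concrete patterns and applies them in the Knuth-Liang manner, where the
% greatest number wins and an odd number permits a break. The names
% VOWELS, CONSONANTS, TEMPLATES, expand, parse and hyphenate are
% hypothetical, and so is the split of the alphabet into 8 vowel letters
% and 22 consonant letters (counting й and ь as consonants, which is what
% the 22×22 arithmetic above suggests).
%
```python
from itertools import product

# Assumed letter classes: 8 vowel letters, 22 consonant letters (with й and ь).
VOWELS = "аеиоуъюя"
CONSONANTS = "бвгджзйклмнпрстфхцчшщь"

# Class templates: А = any vowel, Б = any consonant, Т = one repeated consonant.
TEMPLATES = ["А1", "Б1",                                   # permit everywhere
             "Б2А", "А2ББ", "Б2ТТ", "ТТ2Б", "ААА2Б", "й2ББ", "Б2ь", "д2ж",
             ".Б2", ".ББ2", ".БББ2", "2Б.", "2ББ."]        # eighth rule


def expand(template):
    """Yield every concrete pattern described by a class template."""
    if "Т" in template:
        variants = [template.replace("Т", t) for t in CONSONANTS]
    else:
        variants = [template]
    for variant in variants:
        choices = [VOWELS if ch == "А" else CONSONANTS if ch == "Б" else ch
                   for ch in variant]
        for combo in product(*choices):
            yield "".join(combo)


def parse(pattern):
    """Split 'б2а' into ('ба', [0, 2, 0]); digits[i] sits before letters[i]."""
    letters, digits = "", [0]
    for ch in pattern:
        if ch.isdigit():
            digits[-1] = int(ch)
        else:
            letters += ch
            digits.append(0)
    return letters, digits


PATTERNS = {}
for template in TEMPLATES:
    for pattern in expand(template):
        letters, digits = parse(pattern)
        old = PATTERNS.get(letters, [0] * len(digits))
        # Two templates may yield the same letters; keep the greater numbers.
        PATTERNS[letters] = [max(a, b) for a, b in zip(old, digits)]
MAX_LEN = max(len(key) for key in PATTERNS)


def hyphenate(word, left_min=2, right_min=2):
    """Return the word with a '-' at every position the patterns permit."""
    dotted = f".{word}."
    values = [0] * (len(dotted) + 1)       # values[i] sits before dotted[i]
    for start in range(len(dotted)):
        for end in range(start + 1, min(start + MAX_LEN, len(dotted)) + 1):
            for offset, digit in enumerate(PATTERNS.get(dotted[start:end], [])):
                values[start + offset] = max(values[start + offset], digit)
    pieces, last = [], 0
    for i in range(left_min, len(word) - right_min + 1):
        if values[i + 1] % 2 == 1:         # gap before word[i]
            pieces.append(word[last:i])
            last = i
    pieces.append(word[last:])
    return "-".join(pieces)


print(hyphenate("работа"), hyphenate("сестра"), hyphenate("бридж"))
```
%
% The output marks every permitted break at once, so сес-т-ра stands for
% both сес-тра and сест-ра. The word бридж stays whole in this sketch
% because the eighth rule is expanded in full.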
% % The third main difference between the patterns of Boshnakov and the % patterns of Belogay concerns the eighth rule – its implementation in % the rules of Boshnakov is rather limited which leads to wrong % hyphenations like бри-дж /bri-dzh/. A full implementation of this % rule would require 11660 patterns in total and this would be too much % for the computers in 1994. % % Later developments % ------------------ % % In 1995 Atanas Topalov defended a Masters thesis in the Faculty of % Mathematics and Informatics at Sofia University titled "Algorithms and % software about text processing".[^9] One of the main topics in his % thesis was the Bulgarian hyphenation. Topalov criticised vehemently % the official hyphenation rules and their total disregard of the % morphology. He wrote: % % > If we look at the history of the problems of the hyphenation, we % > will discover something very strange. Instead of the expected % > involvement with the depths and aspiration for more admissible and % > satisfactory style, we can find a growing tendency for % > simplification. One unpleasant discovery is that the development of % > the hyphenation software stays firmly on the principle "let us do % > the easiest thing". The earliest works which have been studied are % > from 1978. It turned out that they present the best approach % > concerning the automated hyphenation. The authors have chosen the % > most difficult but the most correct (from literary point of view) % > method for hyphenation, namely the morphological approach. % % Topalov proposed his own hyphenation algorithm. The hyphenation it % generated was smooth and easy to read. One obvious defect of the % algorithm of Topalov was that it contradicted the official hyphenation % rules at that time. One can argue, however, that his algorithm is % compatible with the current hyphenation rules. 
% % [^9]: The thesis of Atanas Topalov can be accessed at the author's % website % % In 1999 Svetla Koeva[^10] wrote a paper about the automated Bulgarian % hyphenation.[^11] At that time she was a junior member of the % Department of Computational Linguistics at the Institute for Bulgarian % Language but now she is a director of the whole institute. The paper % of Koeva contains a list of hyphenation patterns which can be used as % a basis of automated hyphenation. In 2004 with the help of Stoyan % Mihov[^12] the rules of Koeva were formalised with regular relations % and rewriting rules. They were implemented in a software product % named ItaEst which provided Bulgarian hyphenation and grammar checking % for various software products of Microsoft and Apple. % % [^10]: % % [^11]: Коева, Светла. Правила за пренасяне на части от думите на нов % ред. Български език. 1999/2000, 1, 84-86 % % [^12]: % % The main differences between the hyphenation of Koeva and the official % hyphenation rules effective after 2012 is that the separation of a % long sequence of consonants between two vowels is done according to % the rules valid before 1983. For example се-стра /se-stra/ and % ай-сберг /ay-sberg/ are permitted. The main difference between the % hyphenation of Koeva and the official hyphenation rules effective % before 1983 is that the rules of Koeva disregard the morphology of the % words. The following rule of Koeva is specific: in a sequence of two % sonorant consonants between two vowels, we are permitted to separate % the first vowel from the first consonant, for example материа-лна % /materia-lna/. % % In 2000 Anton Zinoviev[^13] created new hyphenation patterns for TeX. % He didn't know about the previous work of Boshnakov and he didn't % bother to make his work available in the various TeX distributions and % CTAN. His work was used mostly by the local Linux enthusiasts and the % colleagues of Zinoviev. 
In 2001 Radostin Radnev[^14] created a free % grammar dictionary of Bulgarian[^15] where he used the hyphenation % patterns of Zinoviev. From there the work of Zinoviev propagated to % OpenOffice, LibreOffice and various online dictionaries, including % and . % % [^13]: The author of this text. % % [^14]: % % [^15]: % % The following are the main differences between the hyphenation of % Zinoviev and the hyphenation of Boshnakov. % % First, the eighth rule of Belogay is fully implemented. % % Second, the rules of Zinoviev try to detect when the letters дж /dzh/ % (and дз /dz/) denote a single consonant and when they denote a % sequence of two consonants. By default, however, Zinoviev (like % Boshnakov) assumes that дж /dzh/ is a single consonant and hyphenates % accordingly. % % Third, the rules of Zinoviev disable some cases of unpleasant % hyphenations: % % 1. In a consonant sequence like тст /tst/, the two equal consonants т % /t/ are separated. For example братст-во /bratst-vo/ is forbidden % while братс-тво /brats-tvo/ and брат-ство /brat-stvo/ are % permitted. % 2. The hyphenation is forbidden after a sonorant consonant following % an obstruent consonant. For example отм-ра /otm-ra/ is forbidden % and от-мра /ot-mra/ is permitted. % 3. The hyphenation separates two consecutive kindred voiced/voiceless % consonants. For example субп-родукт /subp-rodukt/ is forbidden and % суб-продукт /sub-produkt/ is permitted. % % At the start of his work on the Bulgarian hyphenation, Zinoviev had % the opportunity to discuss the hyphenation with Svetla Koeva. He % remembers that some cases of unpleasant hyphenation were suggested to % him by Koeva. Unfortunately, he did not take notes, so now he doesn't % know which cases of unpleasant hyphenation were suggested to him % by Koeva and which are his own findings. 
% % The present work % ================ % % Motivation % ---------- % % The present work was carried out on the initiative of the leader of % the Bulgarian localisation team of Mozilla, who contacted Zinoviev, % Boshnakov and the maintainers of the TeX hyphenation patterns.[^17] % This work pursues the following main objectives: % % 1. to update the hyphenation patterns in accordance with the current % hyphenation rules; % 2. to generate the hyphenation patterns by a publicly available % script; % 3. to make the hyphenation patterns customisable; % 4. to provide documentation for the future developers. % % [^16]: % % [^17]: % % The current official hyphenating rules for Bulgarian are rather % liberal. Very often, in a long sequence of consonants we are % permitted to split the word at any position, for example аген-т-с-т-во % /agen-t-s-t-vo/. This is prone to many unusual and unexpected results % that interrupt the attention of the reader or deceive his expectations % during the movement of his eyes to the next line. On the other hand, % in order to produce nice justified paragraphs there is no need for so % many hyphenation possibilities. It would be sufficient even if only % one possible separation between any two syllables was permitted. % % Therefore, it makes sense to use a more restrictive version of the % Bulgarian hyphenation, one which eliminates the controversial cases of % hyphenation. Only when typesetting a Bulgarian text in a very narrow % newspaper column it will be appropriate to use a more liberal version. % It should be noted that some specialised English dictionaries also % separate the word-division positions into two categories – preferred % positions and less recommended positions. % % There are two methods to determine the optimal division within a % sequence of consonants between two vowels: % % * we can hyphenate according to the syllables in the word or % * we can hyphenate morphologically. 
% % Hyphenation according to the syllables in the word % -------------------------------------------------- % % Let us look at the properties of the Bulgarian syllables. All % syllables have the following structure: % % > onset - nucleus - coda % % The nucleus in Bulgarian is always a vowel. Both the onset and the % coda are (possibly empty) sequences of consonants. % % The Bulgarian syllables adhere to the Sonority Sequencing Principle. % According to this principle, the consonants within the onset have % rising sonority and the consonants within the coda have decreasing % sonority. % % Several grammar books agree that the following sonority scale is valid % for Bulgarian: % % > voiceless obstruent < voiced obstruent < sonorant consonant < vowel % % According to the investigations of the author, the only exception to % this law is due to the letter в /v/ which is a voiced obstruent but % can also be used as a voiceless obstruent. This exception is due to a % spelling particularity of the Bulgarian language. Whenever the letter % в /v/ seemingly violates the Sonority Sequencing Principle, in the % spoken language this letter is read as ф /f/, that is as a voiceless % obstruent (for example the word отвсякъде /otvsyakade/ is read as % отфсякъде /otfsyakade/).[^18] % % [^18]: No Primitive Slavonic word contains the phoneme ф /f/. % Therefore, we can safely assume that in the Primitive Slavonic % language the consonant ф /f/ was a positional variant of the consonant % в /v/. % % The author has found that the sonorant consonants in Bulgarian have % their own sonority scale: % % > м /m/ < н /n/ < л /l/ < р /r/ < й /y/ % % Only a few words such as жанр /zhanr/ and химн /himn/ violate this % scale. Such words are always loan-words and their pronunciation is % somewhat problematic for native Bulgarian speakers. 
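%
% The two sonority scales above can be written down as a small Python
% sketch that checks a consonant cluster against the Sonority Sequencing
% Principle. This is an illustration only, not part of the pattern
% generator. The names sonority, is_valid_onset and is_valid_coda are
% hypothetical, and the assignment of each letter to a class is an
% assumption of the sketch (щ is treated as voiceless, and в is treated
% only as voiced, so its spelling exception shows up as a violation).
% Equal sonority between neighbours is accepted, which is also an
% assumption.
%
```python
# Letter classes are an assumption made for this sketch.
VOICELESS = "кпстфхцчшщ"   # voiceless obstruents
VOICED = "бвгджз"           # voiced obstruents
SONORANTS = "мнлрй"         # ordered by the scale м < н < л < р < й


def sonority(letter):
    """Return a rank: voiceless < voiced < sonorants (in their own order)."""
    if letter in VOICELESS:
        return 0
    if letter in VOICED:
        return 1
    if letter in SONORANTS:
        return 2 + SONORANTS.index(letter)
    raise ValueError(f"not a consonant letter covered by the sketch: {letter!r}")


def is_valid_onset(cluster):
    """True if sonority never falls from one consonant to the next."""
    ranks = [sonority(ch) for ch in cluster]
    return all(a <= b for a, b in zip(ranks, ranks[1:]))


def is_valid_coda(cluster):
    """True if sonority never rises from one consonant to the next."""
    ranks = [sonority(ch) for ch in cluster]
    return all(a >= b for a, b in zip(ranks, ranks[1:]))


print(is_valid_onset("стр"), is_valid_coda("нр"), is_valid_coda("мн"))
```
%
% With these definitions стр is accepted as an onset and рт as a coda,
% while the final clusters of жанр and химн (нр and мн) are rejected, in
% line with the exceptions named above.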
% % In addition to the Sonority Sequencing Principle, the consonant % clusters within the Bulgarian syllable adhere to the following % additional principles: % % 1. Both in the onset and in the coda, the labial and dorsal plosives % precede the coronal plosives and affricates. % 2. If the onset or the coda contains two plosives or affricates, then % there are no fricatives between them. A few words with the Latin % root 'text' are exceptions: контекст /kontekst/. % 3. If the onset or the coda contains two fricatives other than в /v/, % then there are no plosives or affricates between them. % 4. If the onset or the coda contains two plosives or affricates, then % they both have equal sonority (both are voiced, or both are % voiceless). % 5. If the onset or the coda contains two fricatives other than в /v/, % then they both have equal sonority (both are voiced, or both are % voiceless). % 6. Neither the onset, nor the coda may contain two labial plosives, or % two coronal plosives or affricates or two dorsal plosives. % 7. Neither the onset, nor the coda may contain two equal consonants % with the exception of в /v/ (for example втвърди /vtvardi/).[^19] % % [^19]: Actually, the letter в /v/ is not a real exception because in % all such cases this letter denotes two different consonants – в /v/ % and ф /f/. Only in the Russian loan-word взвод /vzvod/ do the two % letters в /v/ denote a repeating consonant в /v/. % % From all these properties of the Bulgarian syllable we can deduce the % following hyphenation rules: % % 1. In a sequence МК where М is a consonant with higher sonority than % K, we are not permitted to hyphenate before М. Exception: when М % is в /v/ and К is a voiceless consonant. % 2. In a sequence КМ where М is a consonant with higher sonority than % K, we are not permitted to hyphenate after М. % 3. In a sequence KBT where K and T are plosives or affricates and B is % fricative, we separate K from T. % 4. 
In a sequence CKB where K is a plosive or affricate and C and B are % fricatives other than в /v/, we separate C from B. % 5. If in a consonant sequence a coronal plosive or affricate Т is % followed by a labial or dorsal plosive К, then we separate Т from К. % 6. If a consonant sequence contains two plosives or affricates, one % voiced and one voiceless, then we separate them. % 7. If a consonant sequence contains two fricatives other than в /v/, % one voiced and one voiceless, then we separate them. % 8. If a consonant sequence contains two labial plosives or two coronal % plosives or affricates or two dorsal plosives then they are % separated. % 9. If a consonant sequence contains two equal consonants (not % necessarily consecutive), then they are separated. % % With so many prohibitive rules, a question arises: if we apply all % these rules, aren't we going to eliminate too many hyphenation % possibilities? The answer is no. It can be demonstrated that between % any two consecutive syllables at least one separation point will be % permitted. % % % Hyphenation according to the morphology % --------------------------------------- % % Between 1983 and 2012 the official orthographic rules of the % Bulgarian language forbade morphologically based hyphenation. After % 2012 such hyphenation is permitted (but not obligatory). % % The most important case when it is very desirable to use % morphologically based hyphenation is the case of the compound words. % Divisions such as авток-луб /avtok-lub/ and вакуу-мапарат % /vakuu-maparat/ are extremely irritating even if they are formally % correct. Unfortunately, we do not have a vocabulary of the compound % Bulgarian words that would permit us to produce rules for automated % hyphenation. Therefore, the current Bulgarian hyphenation patterns do % not attempt to apply morphological hyphenation to such words. % % Second in importance (but far more significant in terms of numbers) is % the case with the word prefixes. 
While the eyes of the reader still % look at the start of the word, the word is still unknown to him. At % this point, it is very important not to deceive his expectations. For % example, when the reader sees над- /nad-/ at the end of the line, he % will expect that this is the prefix над- /nad-/ with semantics 'attain % more than'. This expectation will be fooled if this wasn't really a % prefix, but a deceiving (while formally correct) hyphenation of the % word надремя /nadremya/ 'have dozed enough' where the real prefix is % not над- /nad-/ but на- /na-/ with semantics 'achieve a state after % accumulation'. Such hyphenation distracts the reader and makes the % reading more difficult. % % Third in importance is the case with the word suffixes. With respect % to the hyphenation rules we can divide the suffixes into three % categories: % % 1. Suffixes starting with a vowel, for example -ар /-ar/. It is not % appropriate to follow the morphology with such suffixes because % this will contradict the whole hyphenation tradition of the % Bulgarian language. For example крав-ар /krav-ar/ is unwarranted. % 2. Suffixes starting with one consonant, for example -ка /-ka/. % Usually with such suffixes the syllable boundary in the word % coincides with morpheme boundary so no specific cares are % necessary, for example кравар-ка /kravar-ka/. The exceptions are % rare, for example: обек-тната /obek-tnata/ instead of обект-ната % /obekt-nata/. % 3. Suffixes starting with more than one consonant (-ски /-ski/, -ство % /-stvo/). It is possible to use morphological hyphenation rules % with such suffixes. % % Even if it is possible to use morphological hyphenation with the % suffixes of the third category, it turns out, this is not as useful as % it is with the case of the prefixes. When the eyes of the reader have % reached this part of the word, the word is already more or less known % to the reader. 
Therefore, at this point the morphological hyphenation % does not provide any significant advantages in comparison to the % simpler hyphenation based only on the syllables in the word. Consider % for example the word геройс-тво /geroys-tvo/ with suffix -ство % /-stvo/. When the reader sees геройс- /geroys-/ at the end of the % line this will give him an early clue that the suffix of the word is % -ство /-stvo/. Such non-morphological hyphenation does not deceive % the expectations of the reader. On the contrary, it makes the reading % easier because it gives clues to the reader about what follows on the % next line. % % Because of these considerations, the current Bulgarian hyphenation % patterns do not attempt to use morphological hyphenation with respect % to the suffixes of the words, although it would be useful to implement % rules for the suffixes of the second category. Hopefully, some % future version will have such rules. % % Occasionally,[^20] a fourth morphological requirement is stated: that % hyphenation should conform to the boundary between the word and the % definite articles -та /-ta/ and -те /-te/ (postfixed in Bulgarian). % There is no need to pay attention to this rule because it seems to be % satisfied automatically. The author has searched a dictionary % of over 860000 Bulgarian words for cases where the hyphenation rules % would hyphenate badly with respect to the definite article. He was % unable to find even one such case with the hyphenation rules valid % after 1983 and only about 10 cases with the rules valid before 1983 % (one of them is живопи-ста /zhivopi-sta/ instead of живопис-та % /zhivopis-ta/). % % One unavoidable characteristic of any morphologically based automated % hyphenation is that it can create wrong hyphenations.
Because of % this, one useful option is to use the morphology in a safe way – to % use it in order to forbid bad hyphenations but to create no new % hyphenation possibilities solely on the basis of the morphology. % % Take for example the word дозрея /dozreya/ 'ripen fully'. According % to the phonological rules, we should hyphenate it as доз-рея % /doz-reya/. According to the morphology, however, we should hyphenate % it as до-зрея /do-zreya/ because this word is formed with the prefix до- % /do-/ with semantics 'complete or supplement' and this semantics would % be lost if the reader sees доз- /doz-/ at the end of the line. % Therefore, there are three ways to hyphenate this word: % % 1. доз-рея /doz-reya/ when morphology is not used; % 2. до-зрея /do-zreya/ when morphology is fully used; % 3. дозрея /dozreya/ (no hyphenation) when morphology is used in a safe % way. % % The option to use the morphology in a safe way is very attractive when % the software uses a smart line-breaking algorithm which can produce % good results even with fewer hyphenation possibilities. TeX is one % such program. It should be noted that this option does not eliminate % too many hyphenation possibilities because the morpheme boundaries % most of the time are also syllable boundaries. % % [^20]: Правописен и правоговорен наръчник. Състав. Иван Хаджов, % Цв. Минков; Ред. Ив. Хаджов и др. София, Бълг.
кн., 1945 % % The following statistics describe the quality of the % morphological rules (the number after the ± sign is the estimated % standard deviation of each estimate): % % With the option `--morphology`: % % * in 0.1% ±0.3% of the dictionary words the morphological patterns % create very wrong hyphenation; % * in 89.8% ±0.1% of the dictionary words the morphological patterns % hyphenate identically to the case when no morphology patterns are % used; % * in 0.3% ±0.2% of the dictionary words the morphological patterns % hyphenate differently in comparison to the case when no morphology % patterns are used and the word is hyphenated in a way which % contradicts the morphology; % * in 0.6% ±0.1% of the dictionary words the morphological patterns % hyphenate differently in comparison to the case when no morphology % patterns are used and there is a possible hyphenation which is % compatible with the word morphology but which is nevertheless % forbidden by the morphology patterns. % % With the option `--safe-morphology`: % % * in 0% of the dictionary words the morphological patterns create very % wrong hyphenation; % * in 90.0% ±0.1% of the dictionary words the morphological patterns % hyphenate identically to the case when no morphology patterns are % used; % * in 0.3% ±0.2% of the dictionary words the morphological patterns % hyphenate differently in comparison to the case when no morphology % patterns are used and the word is hyphenated in a way which % contradicts the morphology; % * in 0.6% ±0.1% of the dictionary words the morphological patterns % hyphenate differently in comparison to the case when no morphology % patterns are used and there is a possible hyphenation which is % compatible both with the word morphology and with the syllable % boundaries but which is nevertheless forbidden by the morphology % patterns. % % Notice that the morphological patterns create a different hyphenation % only in about 10% of the words.
The following explanation can be % given for this surprising fact. First, the natural evolution of % human languages tends to simplify the complex sequences of consonants. % Therefore, no morpheme contains a complex sequence of consonants. And % second, the Bulgarian orthography is morphological. This means that % the morphemes are written according to their actual pronunciation, % but the simplifications in the spoken language which take place % at the morpheme boundaries are not taken into account in the % orthography. The independent operation of these two factors leads to % the result that most of the time the morpheme boundaries coincide with % the conventional syllable boundaries. The main exception to this is % when a morpheme starts with a vowel; in this case its syllable will % include one or more consonants of the preceding morpheme. The second % exception is when a morpheme ends with a vowel and the next morpheme % starts with a sequence of two or more consonants. % % Usage of the script `hyph-bg.sh` % -------------------------------- % % `hyph-bg.sh` is an all-in-one script which can generate both the % documentation (this text) and the Bulgarian hyphenation patterns. When % given the option `--help` the script gives short usage instructions: % % ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ % hyph-bg.sh --help % Show this info % hyph-bg.sh [--doc-html | --doc-latex | --doc-txt] % Print documentation in various formats % hyph-bg.sh [other options] % Generate Bulgarian hyphenation patterns % % Options when generating hyphenation patterns: % % --standalone-tex % Produce hyphenation patterns for TeX with \patterns{ ... }. % % --no-hyphen-mins % Hyphenation patterns which do not require hyphen mins. % Otherwise: both left and right hyphen mins should be set to 2. % % --safe-dz % Do not try to guess whether DZ is a single consonant or not. % Only use hyphenation which will be correct in both cases.
% % --permissible % Permit any formally correct hyphenation, including unnatural % divisions, such as studen-tstvo. Useful for educational tools % or when typesetting Bulgarian text in a very short column. % % --morphology % Apply morphology when hyphenating, for example: za-dvizhvam. % May hyphenate incorrectly in some cases. % % --safe-morphology % Apply morphology when hyphenating. Never hyphenates incorrectly % but may prohibit some correct hyphenations. % % --no-morphology % Disregard the morphology. Default. % % --1945 % Hyphenate according to the rules effective between 1945 and 1982 % % --1983 % Hyphenate according to the rules effective between 1983 and 2011 % % --2012 % Hyphenate according to the rules effective after 2012. Default. % ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ % % The following are the recommended ways to generate hyphenation % patterns with this script: % % `hyph-bg.sh --standalone-tex --safe-morphology` % : For TeX. Apply the morphology in a safe way when the software % uses a smart line-breaking algorithm. % % `hyph-bg.sh` % : For most other software. % % `hyph-bg.sh --no-hyphen-mins` % : The current versions of Mozilla (as of 2017) seem to ignore the % hyphen mins in words that contain a dash. % % `hyph-bg.sh --morphology` % : For professional typography with a human proofreader. % % `hyph-bg.sh --permissible` % : For educational tools and online dictionaries which can show only one % kind of hyphenation. % % Notice that some specialised English dictionaries separate the % word-division positions into two categories – preferred positions and % less recommended positions. It would be best if the Bulgarian online % dictionaries could do the same. For example, a hyphen "-" can be used to % display the preferred positions and a dot "." the less recommended % ones. If a word-division position is permitted only by the % patterns of `hyph-bg.sh --permissible`, then this position is less % recommended.
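The "-" and "." display convention suggested above can be sketched in a few lines of Python. This is only an illustration and not part of `hyph-bg.sh`: the function `mark_divisions` and its arguments are hypothetical, and the break positions are assumed to have been computed beforehand by some hyphenation engine, once with the default patterns and once with the `--permissible` patterns. Apart from the division studen-tstvo, which the help text cites, the positions used in the example are illustrative assumptions.

```python
def mark_divisions(word, preferred, permissible):
    """Show preferred breaks as "-" and less recommended ones as ".".

    A position i stands for a possible break between word[i - 1] and word[i].
    `preferred` holds the positions found with the default patterns,
    `permissible` those found with the --permissible patterns.
    """
    parts = []
    for i, letter in enumerate(word):
        if i in preferred:
            parts.append("-")
        elif i in permissible:
            # Allowed only by the permissible patterns: less recommended.
            parts.append(".")
        parts.append(letter)
    return "".join(parts)


# Illustrative positions (assumptions, not output of the real patterns),
# except position 6, the division studen-tstvo cited in the help text.
print(mark_divisions("studentstvo", preferred={3, 7}, permissible={3, 6, 7}))
```

With these assumed positions the call prints `stu-den.t-stvo`. Taking the two position sets as input keeps the sketch independent of any particular hyphenation library.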
% \message{Bulgarian hyphenation patterns (options: --safe-morphology --standalone-tex, version 21 October 2017)}