
A suggestion for transliteration
(for the Bulgarian language)

     There are two types of transliteration, and we have to distinguish between them. The first is transliteration intended for foreigners (for example, our names on identity cards, geographic names on maps, street signs and signposts). The second is transliteration intended for Bulgarians (for example, when we write to another Bulgarian but cannot use Cyrillic letters because of a technical problem and have to code the text with Latin letters).
     To distinguish between the two, Professor Lubomir Ivanov considers it more appropriate to use the term transliteration in both cases, and in the second case to speak of a reversed transliteration with a technical purpose.
     The two types of transliteration cannot be unified in one scheme, because their requirements are very different and even opposite. When we transliterate for foreigners we must do it phonetically, and we must choose a specific foreign language and follow its rules. When we transliterate for Bulgarians this is not necessary, because our readers know how to read the words and even know how they are written with Cyrillic letters. We only need a simple way to code the Cyrillic text into Latin, one that makes it easy and almost automatic to go in the opposite direction when we need to reconstruct the text.

     For transliteration intended for foreigners, we consider that the following scheme should be used:
 
а - a    е - e     к - k    п - p    ф - f     щ - sht
б - b    ж - zh    л - l    р - r    х - h     ъ - a
в - v    з - z     м - m    с - s    ц - ts    ь - y
г - g    и - i     н - n    т - t    ч - ch    ю - yu
д - d    й - y     о - o    у - u    ш - sh    я - ya

     The table above was introduced in 1995 in the Toponymic Guidelines for Antarctica. It was later made official by government decrees 61/02.04.1999 and 10/11.02.2000, and it is used by the passport services. The table is also included in the Bulgarian spelling dictionary published by the Institute for Bulgarian Language. If you want to learn more about the reasons for introducing this transliteration table, we recommend Professor Lubomir Ivanov's article "On the Romanization of Bulgarian and English".
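
     As an illustration only, the table for foreigners can be applied letter by letter. The sketch below is our own minimal example, not part of the decrees: the names OFFICIAL and romanize are hypothetical, and capital letters are handled in the simplest possible way (only the first Latin letter of an equivalent is capitalized).

```python
# The official table from above, keyed by lowercase Bulgarian letters.
OFFICIAL = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "е": "e",
    "ж": "zh", "з": "z", "и": "i", "й": "y", "к": "k", "л": "l",
    "м": "m", "н": "n", "о": "o", "п": "p", "р": "r", "с": "s",
    "т": "t", "у": "u", "ф": "f", "х": "h", "ц": "ts", "ч": "ch",
    "ш": "sh", "щ": "sht", "ъ": "a", "ь": "y", "ю": "yu", "я": "ya",
}


def romanize(text):
    """Transliterate Bulgarian text for foreign readers (hypothetical helper)."""
    out = []
    for ch in text:
        latin = OFFICIAL.get(ch.lower())
        if latin is None:
            out.append(ch)                  # not a Bulgarian letter: keep as is
        elif ch.isupper():
            out.append(latin.capitalize())  # "Щ" becomes "Sht"
        else:
            out.append(latin)
    return "".join(out)


print(romanize("София"))     # Sofiya
print(romanize("България"))  # Balgariya
```

     Note that this direction cannot be reversed reliably: both а and ъ become a, and both й and ь become y.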

     For reversed transliteration, intended for technical purposes, we suggest coding the 30 letters of the Cyrillic alphabet with the 26 letters of the Latin alphabet, so that every letter has a single Latin equivalent, except for the double letters (Ю, Я and Щ) and for Ь. These four letters should be coded as ЙУ, ЙА, ШТ and Й.
     How are we going to code the other 26 letters? For 20 of them it is easy: as in the table from the questionnaire. Six letters are left: Й, Ц, Ъ, Ш, Ч and Ж. We suggest coding Й, Ц and Ъ as J, C and Y, because they sound similar, because that is how they are written in the Slavic languages that use the Latin alphabet, and because that is how they are coded in the existing phonetic keyboard layout.
     Three more letters remain to be coded: Ш, Ч and Ж. The Latin letters not yet used are w, q and x, and we suggest assigning them in this order because of a slight graphic similarity.

The resulting coding is shown in the following table, but it is easier to picture if you click on the yellow arrow at Metodii.com and look at the second version of the questionnaire, which is coded this way.
 
 
а - a    е - e    к - k    п - p    ф - f    щ - wt
б - b    ж - x    л - l    р - r    х - h    ъ - y
в - v    з - z    м - m    с - s    ц - c    ь - j
г - g    и - i    н - n    т - t    ч - q    ю - ju
д - d    й - j    о - o    у - u    ш - w    я - ja
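
     The table above can be applied mechanically. The following sketch is only an illustration under our own assumptions: the names REVERSIBLE and encode are hypothetical, and the additional rules for ambiguous cases are not handled here.

```python
# The reversible table from above, keyed by lowercase Bulgarian letters.
REVERSIBLE = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "е": "e",
    "ж": "x", "з": "z", "и": "i", "й": "j", "к": "k", "л": "l",
    "м": "m", "н": "n", "о": "o", "п": "p", "р": "r", "с": "s",
    "т": "t", "у": "u", "ф": "f", "х": "h", "ц": "c", "ч": "q",
    "ш": "w", "щ": "wt", "ъ": "y", "ь": "j", "ю": "ju", "я": "ja",
}


def encode(text):
    """Code Cyrillic text with Latin letters (hypothetical helper)."""
    out = []
    for ch in text:
        latin = REVERSIBLE.get(ch.lower())
        if latin is None:
            out.append(ch)                  # digits, spaces, punctuation
        elif ch.isupper():
            out.append(latin.capitalize())
        else:
            out.append(latin)
    return "".join(out)


print(encode("щека"))    # wteka
print(encode("ябълка"))  # jabylka
print(encode("чаша"))    # qawa
```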

     The suggested version has three advantages:

First, it is comparatively easy to learn, because the coding of almost every letter is logical; only for three letters is it entirely arbitrary. That means you need to memorize the coding of only three letters (Ш, Ч, Ж).

The second advantage is brevity. Most Bulgarian letters are coded with one Latin letter; only three are exceptions.

The third and most important advantage is that decoding is very easy, because almost every Bulgarian letter corresponds to exactly one Latin letter. A mistake is possible only with the double letters or with Ь. For example, Щека can be decoded as Штека, which is a mistake, but not as bad as decoding Схема as Шема (щека and штека are spelt differently but sound almost the same). Even this kind of mistake is unlikely, because the double letters are always decoded together, which is wrong only for the words пустошта, нашта, вашта, нашто, вашто. A mistake is also possible when decoding Й, Ь and Ѝ (И with a grave accent): all three are coded with J, but the rule is simple. When J is a separate word, it is Ѝ; when it is followed by О and preceded by a consonant, it is Ь; in every other case it is Й.
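
The decoding described in the third point can be sketched as follows. This is a minimal illustration under our own assumptions: it handles lowercase text only, the names SINGLE, DOUBLE, decode_j and decode are hypothetical, and the J rule is exactly the simple one stated above.

```python
# Latin letters that stand for exactly one Bulgarian letter.
SINGLE = {
    "a": "а", "b": "б", "v": "в", "g": "г", "d": "д", "e": "е",
    "x": "ж", "z": "з", "i": "и", "k": "к", "l": "л", "m": "м",
    "n": "н", "o": "о", "p": "п", "r": "р", "s": "с", "t": "т",
    "u": "у", "f": "ф", "h": "х", "c": "ц", "q": "ч", "w": "ш",
    "y": "ъ",
}
# Double letters are always decoded together.
DOUBLE = {"wt": "щ", "ju": "ю", "ja": "я"}
LATIN_VOWELS = "aeiouy"


def decode_j(text, i):
    """Choose ѝ, ь or й for the letter j at position i."""
    before = text[i - 1] if i > 0 else " "
    after = text[i + 1] if i + 1 < len(text) else " "
    if not before.isalpha() and not after.isalpha():
        return "ѝ"                      # j is a separate word
    if after == "o" and before.isalpha() and before not in LATIN_VOWELS:
        return "ь"                      # consonant + j + o
    return "й"                          # every other case


def decode(text):
    """Decode lowercase Latin-coded text back to Cyrillic."""
    out = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair in DOUBLE:
            out.append(DOUBLE[pair])
            i += 2
        elif text[i] == "j":
            out.append(decode_j(text, i))
            i += 1
        else:
            out.append(SINGLE.get(text[i], text[i]))
            i += 1
    return "".join(out)


print(decode("sinjo"))     # синьо
print(decode("kaxi j"))    # кажи ѝ
print(decode("pustowta"))  # пустоща (the mistake discussed above)
```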

     So far the standard is simple, and practically every correctly spelt text can be reversed. We now add four more rules, which allow Latin words to be included in the text and make the text 100% reversible (even if it is misspelt). These four rules make the standard more complicated, but we expect that people will not have to remember them and that only computer programs will use them.

     1. The complementary rule for a text that includes Latin words is that these words have to be enclosed in apostrophes (single quotation marks). If there is an apostrophe in the text itself, it must be doubled. (Decoding does not distinguish between the apostrophe and the opening and closing quotation marks, because some text editors automatically replace the apostrophe with an opening or closing quotation mark. That is why the five opening and closing marks (‘ ’ ‚ ‹ ›) should also be doubled. The Unicode code points of these symbols are U+2018, U+2019, U+201A, U+2039, U+203A.)
     2. A complementary rule for keeping two separate letters from being read as one double letter is to write the symbol / between them. For example, пустошта is coded as pustow/ta, and the name Майа is coded as Maj/a (there is such a name and it is different from Мая). If the person writing in Latin does not know this rule and writes pustowta, it will be decoded as пустоща instead of пустошта. This is not a big problem, because that form of the Bulgarian word пустош is practically the only word that is coded following this rule.
     3. A complementary rule for distinguishing between Й, Ь and Ѝ (И with a grave accent) is to write the symbol #, & or * respectively before J. Of course, these symbols are used only when the letter is not written according to the spelling rules. For example, the sentence "Й и Ь са букви" is coded as "#J i &J sa bukvi". (The spelling rules say that when J is a separate word, it must be decoded as Ѝ. When J is at the beginning of a word or after a vowel, it must be decoded as Й. In every other case it must be decoded as Ь. These rules apply when the right decoding is not marked explicitly with the symbols #, & and *.)
     4. A complementary rule for coding the symbols /, #, & and * themselves. These symbols should be enclosed in apostrophes whenever they could collide with rules 2 and 3; in any other case they are left as they are. For example, the symbol sequences ш/т and т#ьо are coded as w'/'t and t'#'&jo respectively.
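
     Rules 2 and 3 are the kind of thing a program can apply automatically. The sketch below is our own illustration, not a reference implementation: all names are hypothetical, rules 1 and 4 are left out, and the input is assumed to contain only Bulgarian letters, spaces and punctuation other than /, #, &, * and apostrophes. The coder adds a marker only when the spelling rules of rule 3 would give a different letter, and the decoder uses the same rules for every J without a marker.

```python
ENCODE = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "е": "e",
    "ж": "x", "з": "z", "и": "i", "й": "j", "к": "k", "л": "l",
    "м": "m", "н": "n", "о": "o", "п": "p", "р": "r", "с": "s",
    "т": "t", "у": "u", "ф": "f", "х": "h", "ц": "c", "ч": "q",
    "ш": "w", "щ": "wt", "ъ": "y", "ь": "j", "ю": "ju", "я": "ja",
    "ѝ": "j",
}
DECODE = {lat: cyr for cyr, lat in ENCODE.items()
          if len(lat) == 1 and lat != "j"}
PAIRS = {"wt": "щ", "ju": "ю", "ja": "я"}
MARKER_FOR = {"й": "#", "ь": "&", "ѝ": "*"}           # rule 3
LETTER_FOR = {m: c for c, m in MARKER_FOR.items()}
VOWELS = "аеиоуъюя"


def default_j(prev, nxt):
    """Spelling rules of rule 3, given the neighbouring characters."""
    if not prev.isalpha() and not nxt.isalpha():
        return "ѝ"                                    # separate word
    if not prev.isalpha() or prev.lower() in VOWELS:
        return "й"                                    # word start or after a vowel
    return "ь"


def encode(text):
    out = []
    for i, ch in enumerate(text):
        low = ch.lower()
        if low not in ENCODE:
            out.append(ch)
            continue
        prev = text[i - 1] if i > 0 else " "
        nxt = text[i + 1] if i + 1 < len(text) else " "
        lat = ENCODE[low].capitalize() if ch.isupper() else ENCODE[low]
        if low in MARKER_FOR:
            if default_j(prev, nxt) != low:
                lat = MARKER_FOR[low] + lat           # rule 3: explicit marker
            if nxt.lower() in "уа":
                lat += "/"                            # rule 2: j/u, j/a
        elif low == "ш" and nxt.lower() == "т":
            lat += "/"                                # rule 2: w/t
        out.append(lat)
    return "".join(out)


def decode(text):
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        pair = text[i:i + 2].lower()
        if ch in LETTER_FOR and text[i + 1:i + 2].lower() == "j":
            cyr = LETTER_FOR[ch]
            out.append(cyr.upper() if text[i + 1].isupper() else cyr)
            i += 2
        elif ch == "/":
            i += 1                                    # separator carries no letter
        elif pair in PAIRS:
            out.append(PAIRS[pair].upper() if ch.isupper() else PAIRS[pair])
            i += 2
        elif ch.lower() == "j":
            out.append(ch)                            # resolved in the second pass
            i += 1
        else:
            cyr = DECODE.get(ch.lower(), ch)
            out.append(cyr.upper() if ch.isupper() else cyr)
            i += 1
    for k, tok in enumerate(out):
        if tok in ("j", "J"):
            prev = out[k - 1] if k > 0 else " "
            nxt = out[k + 1] if k + 1 < len(out) else " "
            cyr = default_j(prev, nxt)
            out[k] = cyr.upper() if tok == "J" else cyr
    return "".join(out)


print(encode("Й и Ь са букви"))  # #J i &J sa bukvi
print(encode("пустошта"))        # pustow/ta
print(decode("Maj/a"))           # Майа
```

     With these two rules in place, coding a text and decoding it again gives back the original text for input that meets the assumptions above.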

     For the reversed transliteration (coding a word written in Latin letters with the Cyrillic alphabet) we recommend using the same table and enclosing the word in apostrophes. For example, the sentence "Аз си инсталирах Windows 98" is coded as "Аз си инсталирах 'Шиндошс' 98". Of course, the word Windows written in that way can be recognized only by people who know the standard. If we care less about the spelling than about the way the words are pronounced, it is better to code them phonetically. For example, Windows becomes 'Уиндолс'. Phonetic coding is most appropriate when a Bulgarian has to read a speech in a language he does not know.
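
     The Windows example can be reproduced with a few lines of code. This is only a sketch under our own assumptions: the names are hypothetical, and the Latin letter j is mapped to й, since the table gives it more than one Cyrillic reading.

```python
# The same table read from Latin to Cyrillic, one letter for one letter.
LATIN_TO_CYRILLIC = {
    "a": "а", "b": "б", "c": "ц", "d": "д", "e": "е", "f": "ф",
    "g": "г", "h": "х", "i": "и", "j": "й", "k": "к", "l": "л",
    "m": "м", "n": "н", "o": "о", "p": "п", "q": "ч", "r": "р",
    "s": "с", "t": "т", "u": "у", "v": "в", "w": "ш", "x": "ж",
    "y": "ъ", "z": "з",
}


def latin_word_to_cyrillic(word):
    """Code a Latin word with Cyrillic letters and put it in apostrophes."""
    letters = []
    for ch in word:
        cyr = LATIN_TO_CYRILLIC.get(ch.lower(), ch)
        letters.append(cyr.upper() if ch.isupper() else cyr)
    return "'" + "".join(letters) + "'"


print(latin_word_to_cyrillic("Windows"))  # 'Шиндошс'
```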
 
 
