Categories
English Blog

No Transliterations

I’ve received several requests to add transliterations to this website.

A transliteration is simply a way of writing Hindi words using the English alphabet, e.g. “aap kaise hain?”.

I entertained these requests very seriously, but I have finally concluded that I will not add transliterations.

Now, I reserve the right to change my mind in the future, of course. I take the suggestions and comments of people who use this site seriously, and I appreciate feedback. I spent a lot of time experimenting with code for automatic transliteration, and I thought about these issues carefully. I would like to explain my rationale for those who are interested in transliteration.

1. Automatic Transliteration is Difficult

There are thousands of words on this site written in Devanagari. Manually updating every word to include a parallel transliteration is totally impractical. So, the only way to add transliterations is to use automatic transliteration. In other words, I would have to write software that transliterates the Devanagari text on my website.

There are three general approaches to automatic transliteration:

  • Use a unique symbol for each Devanagari symbol
  • Use a rule-based algorithm
  • Use a statistical algorithm

I’ll explain why none of these is really a good option.

Unique Symbols

The simplest approach is to assign a unique symbol to each Devanagari letter. Some transliteration schemes like ITRANS do this. ITRANS has a different purpose though: it is intended to encode Devanagari, but not necessarily to make it highly legible. Consider an example: the city “चंडीगढ़” is typically written “Chandigarh”. In ITRANS, this is written “cha.nDiiga.Dh”; this is not easily legible. It is also aesthetically displeasing; the mixture of capital letters with lowercase letters and intervening symbols like “.” makes it look odd. However, this is not meant as a criticism of ITRANS; its goals are different. If I add transliterations to the site, the purpose is to add legible text for people who can’t read Devanagari.

Now, I could simply choose different symbols. However, this will produce confusing transliterations, due to the phenomenon of “schwa deletion” in Hindi, and other factors. For instance, “करना” would be transliterated as “karana” not as “karna”.
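The “karana” problem can be reproduced with a minimal sketch of the symbol-per-letter approach. The tiny tables and the function name transliterate_naive below are hypothetical, chosen only for this example; they are not the code I experimented with.

```python
# Hypothetical, deliberately tiny symbol tables: just enough for "करना".
CONSONANTS = {"क": "k", "र": "r", "न": "n"}
VOWEL_SIGNS = {"ा": "a", "ि": "i", "ी": "i"}
VIRAMA = "\u094d"  # halant: cancels a consonant's default vowel


def transliterate_naive(word: str) -> str:
    """Map each Devanagari symbol to a fixed Latin symbol, one at a time."""
    out = []
    for i, ch in enumerate(word):
        if ch in CONSONANTS:
            out.append(CONSONANTS[ch])
            nxt = word[i + 1] if i + 1 < len(word) else ""
            # Always write the default vowel unless a vowel sign or a
            # virama follows. Nothing here knows about pronunciation.
            if nxt not in VOWEL_SIGNS and nxt != VIRAMA:
                out.append("a")
        elif ch in VOWEL_SIGNS:
            out.append(VOWEL_SIGNS[ch])
        elif ch != VIRAMA:
            out.append(ch)  # pass unknown symbols through unchanged
    return "".join(out)


print(transliterate_naive("करना"))  # karana
```

Because every consonant gets its default vowel, the unpronounced middle vowel of “करना” is written out, which is exactly the confusing result described above.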

Rule Based Approach

The next approach is to use a rule-based algorithm. I wrote such an algorithm, and it actually works pretty well. Here is a small sample:

हालांकि यह निराशा की बात तो है लेकिन आश्चर्य की बात नहीं कि इतने कम विद्यालय हिंदी सिखाते हैं. अंग्रेजी के अलावा अमेरिका में सबसे अधिक बोली जाने वाली भाषा स्पेनिश है, और फ्रेंच भाषा ने अंग्रेजी भाषा पर बहुत प्रभाव डाला है, इस लिए आश्चर्य की बात नहीं है कि ये दो भाषाएं सबसे अधिक सिखाई जाने वाली भाषाएं हैं. अमेरिकन जनता यूरोपीय भाषाओं से जितना परिचित है हिंदी से उतना नहीं.

[hālānki yah nirāshā kī bāt tō hai lēkin āshchary kī bāt nahīn ki itnē kam vidyālay hindī sikhātē hain. angrējī kē alāvā amērikā mēn sabsē adhik bōlī jānē vālī bhāshā spēnish hai, aur frēnch bhāshā nē angrējī bhāshā par bahut prabhāv ḍālā hai, is liē āshchary kī bāt nahīn hai ki yē dō bhāshāēn sabsē adhik sikhāī jānē vālī bhāshāēn hain. amērikan jantā yūrōpīy bhāshāōn sē jitnā parichit hai hindī sē utnā nahīn]

However, I don’t want to use it for several reasons.

First, such algorithms are fairly complex, which means they are difficult to develop and maintain. Here are some of the tricky details that the algorithm has to deal with:

Schwa Deletion

For instance, the algorithm must determine when to add the default vowel (अ), and when to suppress it. In the word “करना”, people generally do not transliterate the middle vowel, because it is not pronounced. Thus, most people write “karna”. There is a simple rule that can predict this: since there are vowels on both sides, this vowel is suppressed.

However, this rule fails on a word like अर्थव्यवस्था, which is generally transliterated “arthvyavastha”. The rule will suppress the “a” between “v” and “s”: “arthvyavstha”. We could modify the rule so that it doesn’t suppress vowels followed by a conjunct. However, this will again produce an undesirable transliteration: “arthavyavastha”. The rule cannot possibly recognize the fact that अर्थव्यवस्था is a compound word, and thus is an exception to the normal rule.
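Here is a rough sketch of the simple rule, to show both the success and the failure. It is one possible formulation, not my actual algorithm: it assumes the word is scanned from right to left and that two default vowels in a row are never both suppressed. The tables and function names are hypothetical, and the sketch ignores word-final default vowels.

```python
VIRAMA = "\u094d"  # halant: joins a consonant to the next one
SCHWA = "ə"        # placeholder for the default vowel, written "a" at the end

# Hypothetical tables, only large enough for the two example words.
CONSONANTS = {"क": "k", "र": "r", "न": "n", "थ": "th",
              "व": "v", "य": "y", "स": "s"}
INDEPENDENT_VOWELS = {"अ": "a"}
VOWEL_SIGNS = {"ा": "a"}


def syllables(word):
    """Split a word into [onset, vowel] pairs; a conjunct stays in one onset."""
    result, onset, i = [], "", 0
    while i < len(word):
        ch = word[i]
        nxt = word[i + 1] if i + 1 < len(word) else ""
        if ch in CONSONANTS:
            onset += CONSONANTS[ch]
            if nxt == VIRAMA:
                i += 1  # skip the virama; the consonant joins the next onset
            elif nxt not in VOWEL_SIGNS:
                result.append([onset, SCHWA])
                onset = ""
        elif ch in VOWEL_SIGNS or ch in INDEPENDENT_VOWELS:
            vowel = VOWEL_SIGNS.get(ch) or INDEPENDENT_VOWELS[ch]
            result.append([onset, vowel])
            onset = ""
        i += 1  # symbols outside the tables are silently ignored
    if onset:
        result.append([onset, ""])  # word ends in a bare consonant
    return result


def delete_schwas(syls):
    """Suppress a default vowel that has a vowel on both sides."""
    # Right to left, skipping the first and last syllables. A vowel that
    # was just suppressed no longer counts as a vowel on the right side.
    for i in range(len(syls) - 2, 0, -1):
        if syls[i][1] == SCHWA and syls[i + 1][1] != "":
            syls[i][1] = ""
    return syls


def transliterate(word):
    syls = delete_schwas(syllables(word))
    return "".join(onset + ("a" if v == SCHWA else v) for onset, v in syls)


print(transliterate("करना"))          # karna
print(transliterate("अर्थव्यवस्था"))  # arthvyavstha
```

The rule has no notion of word boundaries inside a compound, so it treats अर्थव्यवस्था as one long string of syllables and removes a vowel that speakers keep.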

Mismatch of Phonemes

As one example, consider the transliteration of “व”. This is a phoneme in Hindi. In other words, it represents a set of sounds (“w” and “v”), not a single sound. Hindi speakers don’t differentiate these sounds, so it doesn’t matter. But the English alphabet does differentiate, so what do we do? Defining rules for this case is very tricky too. For instance, we want to write “wala” for “वाला”, but “vah” for “वह”.

In Hindi, there are four sounds that the English alphabet represents with “t”. Likewise, there are four sounds that the English alphabet represents with “d”. My original idea was to use diacritical marks like dots and lines to distinguish these letters. Thus, “ḍ” for “ड” but “d” for “द”. I would use “h” to represent aspiration, e.g. “ḍh” for “ढ”. However, this is not natural (Hindi speakers don’t actually write transliterations this way), and it requires the reader to be familiar with my fairly arbitrary conventions.

Statistical Algorithms

The next type of algorithm uses statistical methods, such as “conditional random fields” or “maximum entropy models”, to derive a transliteration scheme from a large parallel corpus of Hindi words and their respective transliterations. Once a model is constructed from the corpus, an algorithm can select the transliteration that has the highest probability of being correct based on certain features of the Hindi word. This is impractical for my purposes, since I don’t have such a corpus, and this is far too much effort; I’m not trying to earn a PhD in natural language processing or artificial intelligence, folks!

Parsing Page Content

The second aspect of automatic transliteration is where to add the transliterations. Here are some options:

  • If the user hovers the mouse over a word, show its transliteration
  • Try to append transliterations of sentences next to the sentences
  • Add an option to transliterate the entire page, replacing Devanagari text

I don’t really like any of these options.

Many users probably won’t realize that a mouseover feature exists, and it doesn’t allow a person to read more than one word at a time.

It is very difficult to delimit sentences that are interspersed with arbitrary HTML content. I would run the risk of corrupting pages if I modify them to include transliterations.
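To illustrate the problem, here is a small sketch that collects the Devanagari text from a fragment of HTML using Python’s standard html.parser module. The class name, function name, and sample markup are made up for this example.

```python
import re
from html.parser import HTMLParser

# Any character in the Unicode Devanagari block (U+0900 to U+097F).
DEVANAGARI = re.compile(r"[\u0900-\u097F]")


class DevanagariTextCollector(HTMLParser):
    """Collect every text node that contains Devanagari characters."""

    def __init__(self):
        super().__init__()
        self.fragments = []

    def handle_data(self, data):
        if DEVANAGARI.search(data):
            self.fragments.append(data)


def devanagari_fragments(html):
    collector = DevanagariTextCollector()
    collector.feed(html)
    collector.close()
    return collector.fragments


# One sentence, with a single word in bold (hypothetical markup).
sample = "<p>यह <b>अच्छा</b> है।</p>"
for fragment in devanagari_fragments(sample):
    print(repr(fragment))
```

Even this one short sentence arrives as three separate pieces of text because of the bold tag. To append a transliteration, the code would have to stitch the pieces back together, find the sentence boundary, and then insert new markup without breaking the tags around it.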

I could transliterate the entire page, but then the user won’t be able to read Devanagari text and transliterations in parallel.

2. Automatic Transliteration Defeats its Own Purpose

What exactly is the goal of an automatic transliteration? Well, I assume that the goal is to enable people who can’t read Devanagari to read and pronounce Hindi text. However, no matter how good the transliteration scheme is, the reader will have to become familiar with Hindi before it makes any sense. How would someone know that “h” represents aspiration and “d” is a dental consonant in “dh”, unless they first learn about Hindi phonetics anyway? If the reader is going to learn about Hindi consonants and vowels, why not just learn Devanagari?

Now, learning how to read transliterated Hindi is a very good thing to do, since most Hindi speakers use it when typing (and even writing) Hindi. But it is only really legible if you are already familiar with the language.

3. Devanagari is a Nearly Ideal Transcription of the Sounds of Hindi

The Devanagari script represents the sounds of the Hindi language almost perfectly. Each letter represents a distinct sound, with few exceptions. People who are learning Hindi actually have a huge advantage. They can “sound out” almost any word that they read. Contrast this with learning how to pronounce English words!

4. I Want to Encourage People to Learn Devanagari

By only writing in Devanagari, I encourage people to learn it. I have provided some good resources for learning Devanagari on this site. I encourage people to learn it first.

Now, I realize that different people have different goals. A person might want to learn a few phrases without investing the effort to learn Devanagari. However, I cannot accommodate everyone on my site.

5. There Are Tools Like “Google Translate”

Google Translate will produce transliterations and back-transliterations. You can use it to hear the pronunciation of words too. People can use this tool while learning Hindi.

6. Automatic Transliterations Won’t Be Natural

Automatic transliterations (except those produced by a statistical algorithm) are not “natural”, i.e. Hindi speakers don’t actually write that way.

In summary, I have seriously entertained the idea of adding transliterations, but I ultimately decided that I will not add them.

How To Learn Hindi

Recently, I’ve met several non-native speakers who speak English very well. I asked them how they learned English so well, and, interestingly, every one of them had the same response:

“I watched a lot of television and movies”.

Their consistent testimony suggests that input is the key to language learning. This is no secret; many people regard input as the most fundamental aspect of language acquisition.

This website has a wealth of information that is very useful to Hindi learners, but you won’t learn Hindi just by studying Hindi; you need a lot of input and a lot of practice. This website is a great tool, but you need more to understand Hindi well.

Input

“Input” simply refers to reading Hindi and listening to Hindi. This is the most important aspect of learning Hindi: to learn Hindi well, you need a TON of input. It is not possible to say exactly how much a person needs, but if you really want to learn Hindi well, then it will probably take several years’ worth of consistent daily input.

Reading

For reading, I suggest news websites like BBC Hindi. BBC Hindi has a good mixture of writing styles and content. There are other online Hindi news sites, such as Aaj Tak and Jagran, but they contain a lot of tabloid journalism and annoying advertisements. For the purpose of language learning, though, it doesn’t matter, as long as you can find some interesting content. You can also read blogs for less formal, more colloquial Hindi. It’s a good idea to vary the content so that your input isn’t too biased. I try to read at least 10 news articles every day, but there’s no reason to set an arbitrary goal like “10 per day”; try to enjoy the process of getting input. You won’t get “burned out”, and you’ll, well, enjoy it! I read articles that seem interesting, such as science and technology articles, foreign affairs, entertainment news, blogs, etc. I like to read printed material too: Hindi short stories, books, and other material.

When reading, sometimes I am intentional, and sometimes I am not. I just do whatever I feel like doing. If I want to stop and look up a word in a dictionary, I do; otherwise, I ignore it. If I want to re-read a sentence, I do; otherwise, I just keep going. If I want to analyze the grammar, I do; but, often, I just try to read continuously and enjoy the content. Developing the skill to notice things in context is very useful for a language learner.

In my experience, I’ve learned a lot of words, grammar, and idioms just by reading. In the beginning, reading was laborious, but over time my speed has improved greatly. Don’t get discouraged, just keep reading!

Listening

Fortunately, the Hindi film and television industry is prolific, so there’s no shortage of material! There are many movies and television series available in Hindi. I usually try to watch at least one hour of Hindi television each day. It’s a good excuse to watch T.V.! I prefer T.V. to movies generally, since the show becomes familiar and this familiarity makes it easier to learn. Repeating the same content can be beneficial; watching the same T.V. show or movie multiple times is fine, as long as you don’t get bored and as long as you are still learning. You can watch television shows on YouTube. Some Indian television networks have official YouTube channels, such as Star Classics, SAB TV, or Zee TV.

You’ll learn a lot of grammar and vocabulary just by listening and reading. You’ll eventually notice that you know new grammatical constructions and new vocabulary, and you might not even remember how you learned it!

Getting Started

But, how does an absolute beginner get started? It is very important that the input that you receive is comprehensible. If you understand little or nothing of the input, then you probably will not benefit from it. I studied grammar and vocabulary first, then I began to get a lot of input. In other words, I started with a little understanding from studying, the input increased my understanding, then I could understand more of the input, which increased my understanding even more, then I could understand more input… this process continues, and understanding gradually accelerates. It’s a “snowball” effect.

Transcripts can be useful for a beginner or intermediate learner too. I’ve created a few transcripts on this site. The learner can read the transcript while listening to the audio recording. Creating transcripts is good practice for more advanced learners, but it is a very tedious process.

Output

“Output” refers to speaking and writing Hindi. You will need a lot of input before you can produce any output. You can begin speaking whenever you feel the desire to speak.

It is important to practice with a native speaker. Practicing with non-native speakers can be harmful, since they can’t properly validate your output, and you might acquire some bad habits.

You can begin by writing. Chat programs are a good way to practice initially. You have more time to compose sentences, and you can be certain about what the other person is saying since you have a transcript right in front of you! I often “chat” on IM programs with family members and friends in Hindi.

Speaking will be very difficult for most people. It will be awkward, frustrating, and embarrassing. You will struggle. You simply have to persevere. Eventually, you will learn to speak if you practice enough. Most natives will be very kind and encouraging toward your efforts; however, they will probably overwhelm you in conversation, and they will often be impatient, supplying translations rather than waiting for your response, etc. It is a common experience for an American to hear a foreigner speaking English, but it is a very rare experience for an Indian to hear a foreigner speaking Hindi. As a result, you will become a spectacle. This can be quite embarrassing, but it is only because they are so fascinated. Don’t get discouraged; keep going! Eventually, you will have to endure the struggle of beginning to speak, but you don’t have to do anything until you are ready.

Vocabulary

Many people like spaced repetition systems (SRS) like Anki for vocabulary. I also think they are a good way to create a catalog of words and to review them. I personally don’t like to use them, however. I prefer to acquire vocabulary naturally. Learning vocabulary and grammar naturally is better than staring at flashcards. You will learn how to use words in context rather than just a gloss for the word. Knowing a translation for a word doesn’t tell you how to use it, although cleverly designed flashcards could overcome some of these limitations. The experience is much more vivid with multimedia too, which makes it easier to remember.

Summary

So, in summary, if you want to learn Hindi really well, get copious amounts of input by reading, watching T.V., listening to podcasts, etc., and when you are ready, practice speaking a lot with native speakers.

Really, there is no single “right” way to learn Hindi. If something works for you, then keep doing it. However, certain general principles apply to everyone; everyone will need lots of exposure (input) to the language in order to learn it properly.

My Mistakes

I’ve made a few mistakes in language learning:

First, I simply assumed that learning a foreign language well as an adult is not a practical goal. As a result of this attitude, I didn’t make a serious effort to learn. I am now convinced that this assumption is false. I am fascinated by language, so I studied Hindi, but study doesn’t always result in ability.

Second, I studied too much. I came from a background where I studied grammar intensely. I studied classical/koine Greek, which is taught grammatically, since the goal is to read texts and analyze them. Thus, when I started to learn Hindi, I applied the same techniques of intense grammar analysis. I don’t regret learning grammar; I have an analytical mind, and I enjoy analyzing grammar. However, language is a skill, and skill is only acquired through practice and experience.

Lastly, I didn’t get enough input. It was frustrating in the beginning since I understood very little. This was discouraging, so I didn’t listen to enough Hindi. I didn’t understand the importance of input. But, despite all of this, I did listen to a fair amount of Hindi, and eventually I understood more and more. It took me a while to realize how effective input is. Now I realize how helpful input has been, and I intend to get a lot more of it! I want the majority of my effort to be based on input.

So, my learning experience has been inefficient, but now that I understand my mistakes, I will correct them.

I’m proud of what I’ve learned so far, but I am not satisfied. I am able to understand spoken Hindi well enough to transcribe entire podcasts and T.V. episodes word for word, but I still struggle to understand spoken Hindi sometimes. I can write Hindi well in blog posts, etc. I can speak Hindi, but I am diffident; my confidence will only improve with practice.

I still have a lot to learn. Learning a language isn’t a goal, per se, but a process.

My personal goal for the upcoming year is to get a lot of input and to practice speaking with my wife very consistently. I’m confident that, if I am diligent, I can dramatically improve my ability over the next year.

I hope this will help you to learn Hindi. If you’d like, share your experience with learning Hindi.

Anki

I’ve recently begun using Anki, a “spaced repetition system” (SRS).

Anki is software that helps its users to remember large amounts of information in an efficient way. It can be used to learn many things, but it is especially useful for learning vocabulary.

An SRS is an efficient way to learn vocabulary words. In Anki, you can create “decks” of “cards”, metaphors taken from traditional “flash cards”. On each card, you can enter information, including the word or phrase that you want to remember (i.e., the “front side” of the card) and its translation or meaning or an example (i.e., the “back side” of the card).

For instance, you could enter “खुश” on the front side of a card, and “happy” on the back side, or you could enter “X को Y पता होना” on the front and “for X to know Y, e.g., मुझे यह पता है” on the back.

However, Anki is far more efficient than traditional flash cards. When you study using Anki, the software presents a card to you, and you can rate your confidence (e.g., “I have no idea” / “I know that” / “I know that very well”). Based on your rating, Anki adjusts the cards it will present in the future. Anki uses a scheduling algorithm (originally based on SuperMemo’s SM-2) to decide when each card is shown again. This is vastly more efficient, because Anki will only present a small subset of the words to you each time, and it presents the words that you need to study the most. Thus, you spend your time studying what you need to study, and you don’t spend too much time studying. The idea of spaced repetition is that the better you know words, the less often you need to study them. Anki exploits this fact to make studying more efficient.
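The idea of spaced repetition can be sketched in a few lines. This is a toy illustration and not Anki’s actual scheduler: the rating names, the growth factors, and the use of plain day numbers are all assumptions made for the example.

```python
from dataclasses import dataclass

# Hypothetical growth factors; a real scheduler tunes these per card.
GROWTH = {"know": 2, "know_well": 4}


@dataclass
class Card:
    front: str
    back: str
    interval: int = 1  # days to wait before the next review
    due: int = 0       # day number on which the card is next shown


def review(card: Card, rating: str, today: int) -> None:
    """Reschedule a card after the learner rates it."""
    if rating == "no_idea":
        card.interval = 1  # forgotten: start over with a short gap
    else:
        card.interval *= GROWTH[rating]  # known: wait longer next time
    card.due = today + card.interval


def due_cards(deck: list[Card], today: int) -> list[Card]:
    """Return only the cards that need to be studied today."""
    return [card for card in deck if card.due <= today]


deck = [Card("खुश", "happy"), Card("X को Y पता होना", "for X to know Y")]
review(deck[0], "know_well", today=0)  # next due on day 4
review(deck[1], "no_idea", today=0)    # next due on day 1
print([card.front for card in due_cards(deck, today=1)])
```

On day 1, only the card that was rated “no idea” comes back; the well-known card stays out of the way until day 4. That is the whole trick: known words are shown less and less often.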

Anki can be fun – it’s almost like a game, and the sessions don’t last very long. There’s no need to stare at long lists of vocabulary for hours.

Anki offers a lot of additional features too. It can be used online, and you can synchronize decks across multiple computers, for instance.

I’ve enjoyed using Anki so far. Here’s one way that I use Anki: I read news articles online, such as those on BBC Hindi, and I enter any words that I don’t recognize into Anki so that I can learn them. I also enter some words that I know but that I want to remember. I typically do this in the mornings, then review in the evenings.

There are many other SRS programs available besides Anki, but Anki is very popular.

Try Anki, and let me know how your experience is.