Speech recognition technology cannot quite master Arabic.
New apps tackle Arabic translation
Long gone are the days of flipping through foreign dictionaries for just the right phrase to get an idea across. Some of the most advanced hand-held technology instantly translates and transliterates spoken word for spoken word, smoothing out the most awkward of interlingual conversations.
But while electronic language translators are evolving to process street signs and speech in real time, Arabic's cursive script and verb-first word order make it more of a challenge than the European languages that have so far been the focus (the devices face similar problems with Asian languages).
New translation technology relies heavily on databases of translated documents, books and websites. But far fewer texts pair Arabic with English than, for example, French with English, so those databases are much smaller and offer fewer word patterns. That makes it harder for a translation to take account of the context in which words are used, and the results are poorer.
Rather than teaching a machine a complicated language with exceptions to rules, and exceptions to those exceptions, developers are taking a new approach: letting computers work out grammatical rules and sentence structure for themselves by analysing large amounts of text and recognising patterns.
In the case of Google, which has access to millions of translated documents, statistical machine translation has turned up billions of such patterns in dozens of languages.
The result is a machine that makes intelligent guesses based on mathematical probabilities, forming phrases and sentences that, more often than not, make sense in the intended context.
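The idea of choosing the most probable output can be sketched with the textbook "noisy channel" formulation of statistical machine translation. In this formulation, each candidate English sentence gets a score equal to how well it explains the source text multiplied by how fluent it is as English. The candidates and probabilities below are invented for illustration. This is not Google's system, which combines many more signals.

```python
import math

# Hypothetical candidates with invented probabilities, for illustration only.
# translation_prob: how well the candidate accounts for the source sentence, P(source | english)
# language_prob: how likely the candidate is as fluent English, P(english)
candidates = [
    {"text": "the girl plays with the doll", "translation_prob": 0.30, "language_prob": 0.020},
    {"text": "plays the girl with the doll", "translation_prob": 0.35, "language_prob": 0.0005},
    {"text": "the girl is playing with the doll", "translation_prob": 0.25, "language_prob": 0.015},
]

def best_translation(options):
    """Return the candidate maximising P(source | english) * P(english), compared in log space."""
    def score(c):
        return math.log(c["translation_prob"]) + math.log(c["language_prob"])
    return max(options, key=score)["text"]

print(best_translation(candidates))  # the girl plays with the doll
```

The word-for-word candidate matches the source slightly better, but the fluency score pushes the natural English ordering to the top.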
Perhaps the technology that makes best use of these models is a mobile application released by Google this year. The conversation mode of Google Translate layers speech recognition with text translation, allowing users to speak a word or phrase into their smartphone and hear it come back in another language.
The programme compares the voice input with audio from a database made up largely of YouTube videos, ranging from news reports to poorly recorded user-created content.
It recognises phonemes, the basic units of sound in a language, and the patterns in which they could form words or sentences, taking into account different accents and intonations.
Currently it recognises 25 languages and dialects - British English and American English, for instance, are considered separate.
The technology is not perfect, and background noise or multiple speakers can confuse the system as it looks for a string of sounds or words in the database.
In regions where video content is in short supply or where, as in the case of Arabic, the language can vary drastically in dialect and use, the gaps can be more difficult to fill.
Because speech-to-speech translation involves many steps (recognising sounds as words, converting them to text, translating that text and then rendering it back as speech), there is much room for error, especially for languages with unusual sentence structure.
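A rough illustration of why the number of steps matters: suppose each stage gets its part right independently of the others. In that case, the chance of a fully correct result is the product of the stage accuracies. The figures below are hypothetical and chosen only to show the arithmetic.

```python
from math import prod

# Hypothetical per-stage accuracies; the article gives no real figures.
stages = {
    "speech recognition": 0.90,
    "text translation": 0.85,
    "speech synthesis": 0.98,
}

def end_to_end_accuracy(stage_accuracies):
    """Probability that every stage succeeds, assuming errors are independent."""
    return prod(stage_accuracies)

overall = end_to_end_accuracy(stages.values())
print(f"{overall:.3f}")  # 0.750
```

Three stages that are each right at least 85 per cent of the time still produce a fully correct result only about three times in four.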
"Each language brings perhaps complex morphologies, tone, and so on," said Michael Cohen, head of Google's speech recognition team.
"It is a tremendous amount of detail to get right, but as we continue to feed the engine with data and those details, our hope is that it will become ubiquitous. It is evolving very quickly."
Speech synthesis, or producing an artificial human voice, can be tricky with Arabic's guttural rhotic and uvular consonants - such as the "kha" sound in Ras al Khaimah - which make use of the back of the tongue.
Even in English, speech synthesis "sounds mechanical and gets tiring [to listen to], where we need natural-sounding stress patterns and intonation contours," Mr Cohen said.
Context, especially, can be hard to translate. "Every language has its own syntactic rules, and there is always context that comes into play that would not be translatable.
"An extreme example would be to imagine translating a poem. You would need the cultural context to understand the meaning behind the words in order to provide a meaningful translation. That's a hard problem in many languages, including Arabic."
Unlike most languages, Arabic often puts the verb at the start of a sentence, although a subject-verb-object structure is also commonly used. For instance, a sentence in Arabic could be translated to read "the girl plays with the doll" or, more often, "plays the girl with the doll".
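The reordering problem can be illustrated with a hand-written rule that moves a sentence-initial verb after its subject. The role labels are hypothetical and would have to come from a parser. Statistical systems such as Google's learn reordering from data rather than relying on a single rule like this one.

```python
# Each token is a (word, role) pair. The role labels are hypothetical;
# a real system would need a parser to supply them.
def vso_to_svo(tokens):
    """Move a sentence-initial verb to just after the subject."""
    if not tokens or tokens[0][1] != "VERB":
        return [word for word, _ in tokens]  # not verb-first: leave the order alone
    verb, rest = tokens[0], tokens[1:]
    # Treat the run of SUBJ-labelled tokens after the verb as the subject.
    end = 0
    while end < len(rest) and rest[end][1] == "SUBJ":
        end += 1
    reordered = rest[:end] + [verb] + rest[end:]
    return [word for word, _ in reordered]

gloss = [("plays", "VERB"), ("the", "SUBJ"), ("girl", "SUBJ"),
         ("with", "OTHER"), ("the", "OTHER"), ("doll", "OTHER")]
print(" ".join(vso_to_svo(gloss)))  # the girl plays with the doll
```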
"Translating from Arabic to English, the word order is probably the most difficult aspect," said David Talbot, a research scientist on Google's translation team.
"Getting the correct word order often has a significant impact on the quality of a translation."
The connected, cursive letters and numerous styles of script also make Arabic a tough candidate for Quest Visual's new, seemingly magical Word Lens mobile app.
When the phone's camera is pointed at text - whether on street signs, newspapers, or menus - the image is altered and augmented through pixel shading, replacing the original words with the Word Lens translation.
The app uses algorithms to recognise sharply defined image features such as letter edges and angles. It requires no internet connection, relying instead on a dictionary stored on the phone.
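Edge detection of the kind described can be sketched with the Sobel operator, a standard technique that measures how sharply brightness changes around each pixel. The article does not say which method Word Lens uses, so this choice is an assumption made for illustration. The example runs on a tiny made-up image of a single vertical pen stroke.

```python
import math

# Tiny made-up grayscale image: 0 is blank paper, 9 is ink (one vertical stroke).
IMAGE = [[0, 0, 0, 9, 0, 0, 0] for _ in range(5)]

# Standard 3x3 Sobel kernels for horizontal (x) and vertical (y) brightness change.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]

def apply_kernel(img, kernel, r, c):
    """Weighted sum of the 3x3 neighbourhood centred on pixel (r, c)."""
    return sum(kernel[i][j] * img[r + i - 1][c + j - 1]
               for i in range(3) for j in range(3))

def edge_strength(img):
    """Gradient magnitude at each interior pixel; border pixels are left at 0."""
    height, width = len(img), len(img[0])
    out = [[0.0] * width for _ in range(height)]
    for r in range(1, height - 1):
        for c in range(1, width - 1):
            gx = apply_kernel(img, SOBEL_X, r, c)
            gy = apply_kernel(img, SOBEL_Y, r, c)
            out[r][c] = math.hypot(gx, gy)
    return out

print(edge_strength(IMAGE)[2])  # [0.0, 0.0, 36.0, 0.0, 36.0, 0.0, 0.0]
```

The two non-zero values mark the left and right sides of the stroke. When letters are joined, as Mr Good describes for Arabic, the blank gaps that separate one letter's edges from the next are often missing.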
But Arabic, say the developers, is a way off - largely because its type can be so variable.
"With Arabic, it is difficult for the computer to make a separation between the letters because they are joined and mushed together, and there are so many different fonts or styles that add or leave out the dots around the text," said Otavio Good, the American programmer who developed the software.
"When you're dealing with European languages, you can walk around the street and most words will generally look the same, like Helvetica. In Arabic, there is a lot more room for style, which makes it difficult."
Word Lens currently translates only between English and Spanish, with French in development. Mr Good said that while Arabic is possible, it leaves too much room for error.
"Arabic is fascinating, and something I've always wanted to look into," he said. "But getting a computer to read things in Arabic in the real world, I'm not sure that is something that we can do in the near future."
How digital voice translation works
The user's voice is recorded and sent as an audio file to a speech recognition engine, where it is compared against a database using statistical models on three levels: acoustic, lexical and linguistic.
The acoustic model analyses the audio signal for phonemes, of which there are about 40 in English. "House", for example, breaks down as hh aw s.
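The acoustic step can be sketched as labelling each short frame of audio with its most likely phoneme. In this sketch each phoneme is modelled, purely hypothetically, as a bell curve over a single made-up feature value. Real recognisers use dozens of features per frame and much richer statistical models.

```python
import math

# Hypothetical acoustic model: one Gaussian (mean, standard deviation) per phoneme
# over a single invented feature value per audio frame.
PHONEME_MODELS = {
    "hh": (1.0, 0.5),
    "aw": (5.0, 0.8),
    "s": (9.0, 0.6),
}

def log_gaussian(x, mean, std):
    """Log of the normal probability density at x."""
    return -0.5 * math.log(2 * math.pi * std ** 2) - (x - mean) ** 2 / (2 * std ** 2)

def frames_to_phonemes(frames):
    """Label each frame with its most likely phoneme, then merge repeated labels."""
    labels = [max(PHONEME_MODELS, key=lambda p: log_gaussian(x, *PHONEME_MODELS[p]))
              for x in frames]
    merged = []
    for label in labels:
        if not merged or merged[-1] != label:
            merged.append(label)
    return merged

frames = [0.9, 1.2, 4.6, 5.3, 5.1, 8.8, 9.2]
print(frames_to_phonemes(frames))  # ['hh', 'aw', 's']
```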
Next, the lexical model estimates which words those phonemes are likely to form, using conditional probabilities.
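The lexical step can be sketched with Bayes' rule. The probability of a word, given the phonemes heard, is proportional to how well the word's pronunciation matches those phonemes multiplied by how common the word is. The lexicon, the word frequencies and the 90 per cent per-phoneme accuracy are all hypothetical.

```python
# Hypothetical pronunciation lexicon, in the same notation as "hh aw s" above.
LEXICON = {
    "house": ["hh", "aw", "s"],
    "how's": ["hh", "aw", "z"],
    "how": ["hh", "aw"],
}
# Hypothetical prior probability of each word, P(word).
WORD_PRIOR = {"house": 0.5, "how's": 0.2, "how": 0.3}

def match_likelihood(heard, pronunciation, p_correct=0.9):
    """P(heard | word), assuming each phoneme is heard correctly with probability p_correct."""
    if len(heard) != len(pronunciation):
        return 0.0  # this toy model ignores inserted or dropped phonemes
    likelihood = 1.0
    for h, p in zip(heard, pronunciation):
        likelihood *= p_correct if h == p else 1 - p_correct
    return likelihood

def word_posteriors(heard):
    """P(word | heard) for every word in the lexicon, via Bayes' rule."""
    scores = {w: match_likelihood(heard, pron) * WORD_PRIOR[w] for w, pron in LEXICON.items()}
    total = sum(scores.values())
    if total == 0:
        return {w: 0.0 for w in scores}
    return {w: s / total for w, s in scores.items()}

posteriors = word_posteriors(["hh", "aw", "s"])
print(max(posteriors, key=posteriors.get))  # house
```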
Lastly, the linguistic model groups those words into phrases, and the phrases into sentences, again using statistical probability.
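The linguistic step is commonly handled by an n-gram language model, which scores a word sequence by how often each word follows the previous one in training text. The minimal bigram sketch below uses add-one smoothing and is trained on a tiny made-up corpus. It shows the model preferring "the house is big" over the sound-alike "the how's is big".

```python
import math
from collections import Counter

# Tiny made-up training corpus; real language models learn from vast amounts of text.
CORPUS = [
    "the house is big",
    "the house is old",
    "how is the house",
    "how's the weather",
]

def train_bigrams(sentences):
    """Count word pairs, with <s> and </s> marking sentence start and end."""
    histories, bigrams = Counter(), Counter()
    for s in sentences:
        words = ["<s>"] + s.split() + ["</s>"]
        histories.update(words[:-1])  # how often each word is followed by another
        bigrams.update(zip(words, words[1:]))
    vocab = {w for s in sentences for w in s.split()} | {"</s>"}
    return histories, bigrams, len(vocab)

def sentence_log_prob(sentence, histories, bigrams, vocab_size):
    """Add-one smoothed bigram log probability of a sentence."""
    words = ["<s>"] + sentence.split() + ["</s>"]
    return sum(
        math.log((bigrams[(a, b)] + 1) / (histories[a] + vocab_size))
        for a, b in zip(words, words[1:])
    )

histories, bigrams, v = train_bigrams(CORPUS)
for hypothesis in ["the house is big", "the how's is big"]:
    print(hypothesis, round(sentence_log_prob(hypothesis, histories, bigrams, v), 2))
```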
Once the system has made its best guess at the words and sentences, it converts them to text. That text is then translated into another language using partial translations mined from the web, and the translated text and speech are sent back to the device to be rendered.
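The translation step can be sketched as a lookup in a table of partial translations that always prefers the longest matching phrase. The English-to-Spanish entries are hypothetical. Greedy matching is also a simplification, since real phrase-based systems weigh many possible segmentations and word orders by probability.

```python
# Hypothetical phrase table of partial translations (English -> Spanish), of the
# kind that might be mined from parallel web text. Real tables hold millions of
# entries with probabilities; here each phrase has a single translation.
PHRASE_TABLE = {
    ("good", "morning"): "buenos días",
    ("my", "friend"): "mi amigo",
    ("good",): "bueno",
    ("morning",): "mañana",
    ("my",): "mi",
    ("friend",): "amigo",
}
MAX_PHRASE_LEN = max(len(k) for k in PHRASE_TABLE)

def translate(sentence):
    """Greedy left-to-right translation, always preferring the longest known phrase."""
    words = sentence.lower().split()
    output, i = [], 0
    while i < len(words):
        for n in range(min(MAX_PHRASE_LEN, len(words) - i), 0, -1):
            phrase = tuple(words[i:i + n])
            if phrase in PHRASE_TABLE:
                output.append(PHRASE_TABLE[phrase])
                i += n
                break
        else:
            output.append(words[i])  # unknown word: pass it through untranslated
            i += 1
    return " ".join(output)

print(translate("Good morning my friend"))  # buenos días mi amigo
```

Preferring "good morning" as a whole avoids the word-by-word "bueno mañana", a small example of why context matters.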
* Erin Conroy