[Multilingual Monday] The Google Translate Chronicles

Oct 26, 2009 22:25



So I was in Bolingbrook for a Hallowe'en party, and I ran into, among other people, muckefuck, and of course we had to start talking about languages. Someone who was part of our conversation made reference to automatic machine translation. "Well I understand there are pretty good ones these days!" he said, and I could only laugh, given my experiences with automatic translation. In a nutshell: much of automatic translation grew out of the same Systran base, and there will never be a perfect automatic translator. Problems include context, human error, homonyms, and idiomatic usage. While even the most recent automatic translators seemingly have certain idioms programmed in, they're still far from ideal, particularly when you don't know the source language well enough to work out the Babelfish artifacts.

LANGUAGE: English to Hebrew
SOURCE TEXT: I love this song.
RESULTING TEXT: אני אוהבת את השיר הזה.
THE PROBLEM: This is fine Hebrew ... if you're female. Google Translate, for some unfathomable reason, seems to translate the first person as though the speaker is female. I don't know why this is, but I can see this being problematic for, say, the straight guy trying to romance some Israeli girl and figuring he can do it via automatic translation. A similar sentence, "You said it," came out as אתה אמרת את זה, which also shows that the second person is automatically assumed to be male. Verbs and pronouns in Semitic languages (with few exceptions) reflect the gender of the people involved in a sentence, and there's no way you can click on "she has a vag" or "he has a penis" in order for the translation software to know, unless you use gender-specific pronouns in the source language.

LANGUAGE: Hebrew to English
SOURCE TEXT: בוועדה לזכויות הילד ידונו היום בבעיית השכרות של הנוער. דו"ח של מכון המחקר בכנסת מצייר תמונת מצב עגומה: כחמישית מהבנים בכיתה ו' הודו ששתו לפחות פעם בשבוע, שליש מבני 15 עד 17 השתכרו לפחות פעם בשנה
RESULTING TEXT: Child rights committee today to discuss the problem of youth drunkenness. Report of the Knesset Research Institute paints a bleak picture: a fifth grade boys and India who drank at least once a week, a third of 15 to 17 earn at least once a year
THE PROBLEM: You can certainly get the gist of this news blurb, but there are amusing little screwups here and there. "India" is caused by הודו, hodu, which indeed is the word for "India," but in this case it is the verb "admitted". "Earn" is caused by השתכרו, which can mean "earned," but can also mean "got drunk".
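To make the homograph problem concrete, here's a toy sketch in Python. To be clear, this is NOT how Google Translate works under the hood; it's a made-up, context-blind glossary lookup (the names SENSES, first_sense and all_senses are mine, purely for illustration) that uses only the two words and the senses mentioned above. It shows how always grabbing the first listed sense produces exactly this kind of "India" and "earn" mistake.

```python
# Toy glossary: every sense listed here comes from the example above.
# The structure and the function names are hypothetical, for illustration only.
SENSES = {
    "הודו": ["India", "admitted"],
    "השתכרו": ["earned", "got drunk"],
}


def all_senses(word):
    """Return every listed sense, or the word itself if it is unknown."""
    return SENSES.get(word, [word])


def first_sense(word):
    """Context-blind lookup: always pick the first listed sense."""
    return all_senses(word)[0]


if __name__ == "__main__":
    for word in SENSES:
        # The first sense is what a context-blind lookup would output;
        # the full list shows what a human reader would choose from.
        print(word, "->", first_sense(word), "| candidates:", all_senses(word))
```

A human reader picks "admitted" and "got drunk" because the rest of the sentence is about kids and drinking; a lookup like this one has no such context to go on.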

LANGUAGE: Japanese to English
SOURCE TEXT: 2009年版カレンダーの制作について、多忙の中、発売に間に合う見込みがなくなってしまいました。 楽しみにしていたみなさまの期待に添えず、大変心苦しいのですが、仕事が落ち着くしばらくの間、BG関連製品製作を中止することに決定いたしました。
RESULTING TEXT: For production version of the calendar year 2009, during the busy time for the expected release disappeared. Minasama添Ezu expectations were looking forward to that bothers is very calm for a while to work, BG has decided to stop making products.
THE PROBLEM: The problem??? You mean you don't see the problem?!?! :: laugh :: The further from English you go, the less likely you'll get an even remotely understandable translation. Word order gets particularly slaughtered in translations from Japanese to English even when there's no need; 2009年版, for example, is "2009 year edition," so why did it get split up in the resulting translation? "Minasama" is just a romanized transcription of みなさま; the word means "everyone" and, while writing it this way isn't unheard of, it's also not the standard rendering, causing the translator to vomit on it. Since kanji often have more than one reading, the translator doesn't even bother with the first part of 添えず, here "without meeting" (someone's expectations).

But there is hope; some of the more obvious software translation errors are, well, at least being patched. Da pointed out the infamous 干 problem in Chinese. This character is a simplified form of several characters -- 乾, "dry" and 幹, "trunk" -- so it can have several meanings, and one it has managed to acquire is "fuck". That is how the character had been translated for quite a while, thanks in no small part to dictionaries listing this as a primary meaning. Apparently someone caught on about EVERYTHING in China saying "fuck" because of translation software, because now 不干胶 on Google Translate comes out correctly as "adhesive label", and not as a "no fuck label" or something similar.

I'd love to hear your tales of automatic translation!

漢字, multilingual monday, עברית, 日本語, hebrew, translation, chinese, kanji, japanese, babelfish, hanzi, 中文, google translate
