Saturday, December 25, 2010

The Chinese net and machine translation

Chinese, for a time, will pass English as a net language. The authors imply that this is a predictable course, but they forget that the world's largest English-speaking nation is India. So things may go back and forth for a while.

Even so, this would be a good time to make English-Chinese machine translation actually work.

Let me say that again with a bit more emphasis.

Working, bidirectional, English-Chinese machine translation may be the single most important technological goal of this decade.

I'll leave it to the reader to imagine why it will be so important. If you think about it for a few minutes, you should be able to come up with a good list.

Is this an achievable goal? I'm not sure. On the one hand we already have reasonable translation between closely related European languages. On the other, Google's current English-Chinese translation is worthless. The only time I've seen it work was when the Chinese article was a translation of an interview conducted with an English speaker. I know very little about the field, but I wonder if Google's statistical approach has run into a brick wall. Effective English-Chinese machine translation may require other approaches.

I'm not sure, but I would bet we'll see it work within ten years. As we get closer, I wonder if we'll start to see development of writing styles that are easier to translate. Any (typically unilingual) English speaker who routinely works with non-English speakers learns to speak in a form that's easier to translate. Sentences are shorter. Syntax is simpler, but vocabulary is more precise and often more technical. There are fewer short words with multiple meanings, and more polysyllabic words with single interpretations. Depending on the non-English speaker's language, certain phonemes are avoided. Compositional words, made up of reusable terms, may work better than novel strings.

The resulting form is certainly English, but it is a technical and streamlined form of English.
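
To make that streamlined style a little more concrete, here is a minimal Python sketch that measures two of the traits described above: sentence length and the share of short words. The function name, the three-letter cutoff, and the use of word length as a stand-in for "words with multiple meanings" are all my own assumptions for illustration. This is not an established translatability metric, and it says nothing about how any real translation system works.

```python
import re


def translatability_stats(text, short_word_max_len=3):
    """Return rough style metrics for a piece of English text.

    Hypothetical helper: shorter sentences and fewer short words are
    treated as loose proxies for 'easier to translate'.
    """
    # Crude sentence split on runs of ., ! or ?; good enough for a sketch.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Words are runs of ASCII letters and apostrophes.
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return {"avg_sentence_length": 0.0, "short_word_ratio": 0.0}
    short_words = [w for w in words if len(w) <= short_word_max_len]
    return {
        # Average number of words per sentence.
        "avg_sentence_length": len(words) / len(sentences),
        # Fraction of words at or below the (assumed) length cutoff.
        "short_word_ratio": len(short_words) / len(words),
    }


if __name__ == "__main__":
    print(translatability_stats("Sentences are shorter. Syntax is simpler."))
```

Comparing the numbers for a casual draft and a deliberately simplified rewrite of the same passage would give a rough sense of how far the two styles diverge.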

Obviously, there are equivalent versions of written and spoken Chinese.

I suspect that as English-Chinese machine translation starts to become useful, these modified forms of written expression will play an important role.

Good luck with this one, Google. Get it right!

