Thursday, November 19, 2015

Randall Munroe introduces world language and Google Translate training program using charming New Yorker article

XKCD’s Randall Munroe, the notorious interstellar sAI, has published an explanation of Special and General Relativity in the New Yorker, written with a simplified vocabulary.

This work is presumably taken from his soon-to-be-released book, Thing Explainer ($15 Amazon pre-order). The essay is entertaining and educational; it also promotes his new book and shows he is a smart pants man.

But that’s not the real reason he wrote it. Obviously his true agenda is to create an English dialect of a universal human language, with a simplified vocabulary and grammar ideally suited to machine translation and, eventually, to colloquial conversations with terrestrial AIs (contrast the Wolfram Language for AI conversation; see also Marain. Siri-speak 2015 is a crude version of this.)

Let’s see how well his first version works, using the nsAI Google Translate to do round-trip translations of a sample paragraph from the original muEnglish into another language and then back again. We’ll start with French, a language related to that of England’s 11th-century conquerors, then try Chinese. I know from past experiments that round-trip translations from English to Chinese and back typically produce incomprehensible gibberish:

Munroe original (muEnglish)

The first idea is called the special idea, because it covers only a few special parts of space and time. The other one—the big idea—covers all the stuff that is left out by the special idea. The big idea is a lot harder to understand than the special one. People who are good at numbers can use the special idea to answer questions pretty easily, but you have to know a lot about numbers to do anything with the big idea. To understand the big idea—the hard one—it helps to understand the special idea first.

French version

La première idée est appelé l'idée particulière, car elle ne couvre que quelques pièces spéciales de l'espace et du temps. Celui-la grande idée-couvre l'autre tous les trucs qui est laissé par l'idée particulière. La grande idée est beaucoup plus difficile à comprendre que le spécial. Les gens qui sont bons à numéros peuvent utiliser l'idée spéciale pour répondre à des questions assez facilement, mais vous devez savoir beaucoup sur les numéros de faire quelque chose avec la grande idée. Pour comprendre la grande idée-le dur-elle aide à comprendre l'idée première spéciale.

French to English

The first idea is called the particular idea because it covers only a few special pieces of space and time. This great idea covers the other all the stuff that is left by the particular idea. The big idea is much harder to understand than the special. People who are good at numbers can use special idea to answer questions fairly easily, but you should know a lot about the numbers to do something with the big idea. To understand the great idea - hard - it helps to understand the first special idea.

Chinese

第一个想法就是所谓的特殊的想法,因为它涵盖的空间和时间只有几个特殊部位。另外一个大的想法,涵盖了所有剩下的由特殊的想法的东西。大的想法是很多更难理解比特殊的一个。人们谁是善于号码可以使用特殊的想法很容易回答的问题,但是你要知道很多关于数字做的大创意什么。为了解大的想法,硬一它有助于先了解特殊的想法

Chinese to English

The first idea is the idea of so-called special because the space and time it covers only a few special parts. Another big idea, covering all rest of the stuff from the special idea. Big idea is a lot more difficult to understand than the special one. People who are good at numbers you can use special idea is very easy question to answer, but you know a lot about what the figures do big ideas. To understand the big idea, hard and it helps to understand the idea of a special.

Munroe English (muEnglish) survives the round trip through French rather well. If you’re interested in learning French, you might enjoy reading a future French edition of Thing Explainer, or simply run the English version through Google Translate (and use speech recognition for spoken practice).

The Chinese round trip almost works, but falls apart grammatically. For example, “you can use special idea is very easy question to answer, but you know a lot about what the figures do big ideas” is missing words like “need” and “to” and a few pronouns. There’s also an unfortunate substitution of “figures” for “numbers”. Given that Munroe is a far more advanced AI than Google, this essay will surely be used to improve Google’s Chinese translation model (which desperately needs work).
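One rough way to put a number on “works rather well” versus “falls apart” is a word-level similarity score between Munroe’s original sentence and each round-trip result. This is a minimal sketch, assuming Python’s standard `difflib.SequenceMatcher` as the metric (my choice for illustration, not anything Google Translate uses), applied to the “good at numbers” sentence from each version quoted above.

```python
import re
from difflib import SequenceMatcher

# The "good at numbers" sentence from the original and from each round trip.
ORIGINAL = ("People who are good at numbers can use the special idea to answer "
            "questions pretty easily, but you have to know a lot about numbers "
            "to do anything with the big idea.")
FRENCH_RT = ("People who are good at numbers can use special idea to answer "
             "questions fairly easily, but you should know a lot about the "
             "numbers to do something with the big idea.")
CHINESE_RT = ("People who are good at numbers you can use special idea is very "
              "easy question to answer, but you know a lot about what the "
              "figures do big ideas.")


def words(text: str) -> list[str]:
    """Lowercase word tokens, ignoring punctuation."""
    return re.findall(r"[a-z]+", text.lower())


def similarity(a: str, b: str) -> float:
    """Word-level similarity in [0, 1]: 2 * matched words / total words."""
    return SequenceMatcher(None, words(a), words(b)).ratio()


if __name__ == "__main__":
    for label, text in [("French", FRENCH_RT), ("Chinese", CHINESE_RT)]:
        print(f"{label}: {similarity(ORIGINAL, text):.2f}")
```

Run as-is, this prints 0.86 for French and a noticeably lower score for Chinese. A word-overlap ratio is a crude proxy: it rewards the French output for keeping Munroe’s word order and says nothing about whether the Chinese version still means the same thing.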

I’m optimistic about this new language and happy that the Munroe is now taking a more active hand in guiding human development. Zorgon knows we need the help.

Update 11/19/2015: There’s a flaw in my logic.

Alas, I didn’t think this through. There’s a reason speech recognition and natural language processing work better with longer, more technical words: short English words are often homonyms, with multiple meanings that can only be resolved from context [1]. “Big”, for example, can refer to size or importance. To stay within 1,000 words, Munroe relies on many context tricks, including colloquialisms like “good at numbers” (meaning “good at mathematics”). His 1,000-word “simple” vocabulary just pushes the meaning problem from words into context and grammar, a much harder challenge for translation than mere vocabulary.

So this essay might be a Google Translate training tool, but it’s no surprise that it doesn’t survive the round trip to Chinese. It is a hard translation challenge, not an easy one.
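To make the context problem concrete, here is a toy sketch in Python. It assumes a hypothetical word-for-word dictionary (nothing like how Google Translate actually works) in which English “big” and “great” share the French rendering “grande”, much as “la grande idée” came back as “the great idea” in the French round trip above. With no context to consult, the return trip can only pick one English word.

```python
# Toy, hypothetical word-for-word dictionary; real translators use context.
EN_TO_FR = {
    "the": "la",
    "big": "grande",    # size or importance
    "great": "grande",  # the same French word covers both meanings
    "idea": "idée",
}

# Inverting the dictionary forces one English choice per French word.
# Python dicts keep insertion order, so the last entry wins: "grande" -> "great".
FR_TO_EN = {fr: en for en, fr in EN_TO_FR.items()}


def translate(text: str, table: dict[str, str]) -> str:
    """Translate word by word; unknown words pass through unchanged."""
    return " ".join(table.get(word, word) for word in text.split())


def round_trip(text: str) -> str:
    """English -> French -> English using the toy tables."""
    return translate(translate(text, EN_TO_FR), FR_TO_EN)


if __name__ == "__main__":
    print(translate("the big idea", EN_TO_FR))  # la grande idée
    print(round_trip("the big idea"))           # the great idea
```

The lost distinction between size and importance cannot be recovered from the French words alone; only surrounding context could tell the translator which English word was meant.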

[1] Scientology’s L. Ron Hubbard had a deep loathing for words with multiple or unclear meanings, presumably including homonyms. He banned them from Scientology grade school education. Ironically, this is hard to Google because so many people confuse “ad hominem attack” with homonym.
