Saturday, December 25, 2010

The Chinese net and machine translation

Chinese, for a time, will pass English as a net language. The authors imply that's a predictable course, but they forget that the world's largest English-speaking nation is India. So things may go back and forth for a while.

Even so, this would be a good time to make English-Chinese machine translation actually work.

Let me say that again with a bit more emphasis.

Working, bidirectional, English-Chinese machine translation may be the single most important technological goal of this decade.

I'll leave it to the reader to imagine why it will be so important. If you think about it for a few minutes, you should be able to come up with a good list.

Is this an achievable goal? I'm not sure. On the one hand we already have reasonable translation between closely related European languages. On the other, Google's current English-Chinese translation is worthless. The only time I've seen it work was when the Chinese article was a translation of an interview conducted with an English speaker. I know very little about the field, but I wonder if Google's statistical approach has run into a brick wall. Effective English-Chinese machine translation may require other approaches.

I'm not sure, but I would bet we'll see it work within ten years. As we get closer, I wonder if we'll start to see development of writing styles that are easier to translate. Any (typically unilingual) English speaker who routinely works with non-English speakers learns to speak in a form that's easier to translate. Sentences are shorter. Syntax is simpler, but vocabulary is more precise and often more technical. There are fewer short words with multiple meanings, and more polysyllabic words with single interpretations. Depending on the non-English speaker's language, certain phonemes are avoided. Compositional words, made up of reusable terms, may work better than novel strings.

The resulting form is certainly English, but it is a technical and streamlined form of English.

Obviously, there are equivalent versions of written and spoken Chinese.

I suspect that as English-Chinese machine translation starts to become useful, these modified forms of written expression will play an important role.

Good luck with this one, Google. Get it right!

Thursday, December 23, 2010

Digital cameras 2011-2017

David Pogue writes today about affordable, bigger-than-pocketable cameras with reasonable ISO 800 images. That made me think about how the digital camera market breaks down at the end of 2010. There are about 5 markets left today; cameras like the Canon G series are being replaced by these newer cameras.

  1. iPhone 4 and equivalent
  2. Very compact simple cameras. (Canon, everyone else)
  3. Non-pocketable smaller-than-SLR fixed lens cameras (Canon, Panasonic, Samsung)
  4. SLRs (Canon and Nikon)
  5. MILC - Mirrorless interchangeable lens cameras (everyone but Canon and Nikon)

The second category is dead. Why buy an ultra-compact when you own an iPhone 4 or the equivalent?

So that leaves 4 "camera" categories for 2011-2012. The middle two occupy the same two niches that have existed since the 1960s ...

  1. iPhone 4 and equivalent.
  2. Non-pocketable smaller-than-SLR fixed lens cameras -> same niche as the film rangefinder
  3. SLRs (Canon and Nikon) -> same as film SLRs
  4. MILC

The last category is the interesting one. We've been expecting the MILC for the past decade, so it's hardly a surprise. Manufacturers are now ready to chop the prism and the mirror. By 2012 Nikon and Canon will have MILCs that work with their current lenses, and low end SLRs will fade away.

More than the SLR is at risk; MILCs can be much smaller than an SLR. That doesn't leave much room for the "rangefinder". So by 2017 we'll have only two categories:

  1. iPhone 8 and equivalent
  2. MILC (Canon and Nikon)

Next year I expect to replace my 4-5 yo Digital Rebel XT with a 2011 model that bloody well better shoot ISO 1200 as well as my current camera shoots ISO 400 [1]. That will almost certainly be the last SLR I'll buy.

[1] More pixels means I can use less optical zoom, so larger F-stop, so the overall light sensitivity per end-image pixel is greater than the ISO difference alone. Needless to say, I'm not impressed with megapixels. I want photons.

Monday, December 20, 2010

Which animal throws best? It's a mystery.

Which animal is biggest? Smallest? Fastest?

Good questions, and there are lots of answers. It's the sort of thing we talk about over breakfast. This morning, feeling clever, I asked my kids which animal throws best.

It's an important question.  A large animal that can throw a hefty object has a terrific attack and defense advantage against prey and predator alike. A social animal capable of throwing a hefty rock at 80 kph could, you know, rule the world.

There are very few animals that can throw well. It's quite a trick. To do it well an animal has to be bipedal, it needs really good binocular vision, and it has to be able to calculate trajectories.

Gorillas can throw things, but I don't think they compare to the naked ape.

Of course the kids didn't trust my answer. They wanted confirmation.

That's when I uncovered the mystery. The mystery of this search ...

"which animal throws best" - Google Search: "No results found for 'which animal throws best'."

No freakin' results?! Google is telling me nobody has asked this question with this phrasing on the net?!

That's not possible.

Is it?

Sunday, December 19, 2010

Gordon's scale of corporate evil - 2nd edition

A post on why Facebook, Netflix and Amazon will join the Google-Apple truce reminded me it's time to update Gordon's 15 point scale of publicly traded corporate evil.

Here's the current list, with notes on who's moved up and down ...

  1. Philip Morris: 15 (defines the upper bound)
  2. Goldman Sachs: 14 (up - working for SPECTRE).
  3. Exxon: 13 (global warming advocacy)
  4. AT&T, Verizon: 12 (up two)
  5. Facebook: 11 (down 1 - still most evil tech company)
  6. For profit health insurance companies: 11
  7. Microsoft: 10
  8. Average publicly traded company: 8
  9. Google: 5 (down one)
  10. Apple: 5
  11. CARE International: 1 (They're not a PTC, so this is merely a non-evil reference point)

The greater enemy: why Facebook, Netflix and Amazon will join the Google-Apple truce

The Google-Apple war officially ended in September. I've not seen any convincing explanation of why the war ended; my best guess is that both companies realized that Verizon, AT&T and Comcast are the greater enemy. The three big carriers want to bring the cable TV business model to the net through the IP Multimedia Subsystem ... (emphases mine)

Mobile Carriers Dream of Charging per Page | Epicenter | Wired.com:

... The companies, Allot Communications and Openet — suppliers to large wireless companies including AT&T and Verizon — showed off a new product in a web seminar Tuesday, which included a PowerPoint presentation (1.5-MB .pdf) that was sent to Wired by a trusted source.

The idea? Make it possible for your wireless provider to monitor everything you do online and charge you extra for using Facebook, Skype or Netflix. For instance, in the seventh slide of the above PowerPoint, a Vodafone user would be charged two cents per MB for using Facebook, three euros a month to use Skype and $0.50 monthly for a speed-limited version of YouTube. But traffic to Vodafone’s services would be free, allowing the mobile carrier to create video services that could undercut NetFlix on price....

... “It certainly is exactly the thing we have been warning the companies will do if they have the opportunity and explains why AT&T and Verizon are so insistent that the wireless rules be solely about blocking and not anything else,” said Public Knowledge legal director Harold Feld....

... The ideas don’t look too different from the way cable companies price their video offerings, with different packages of programming at different levels.

... “I have been saying that this is where they want to go for a while,” van Schewick wrote to Wired. “The IP Multimedia Subsystem (IMS), a technology that is being deployed in many wireline and wireless networks throughout the country, explicitly envisages this sort of pricing as one of the pricing schemes supported by IMS.”...

... And as van Schewick points out, this model is already showing up in European mobile networks, where some networks charge users an extra fee to use internet telephony or to use an e-mail client on their phone....

... For instance, Comcast runs an online video service called FanCast that competes with NetFlix and YouTube, and is trying to buy NBC, which owns more than 30 percent of Hulu.com. And every cable and satellite company offers pay-movie services for an extra monthly fee and a la carte video on demand that compete with third-party streaming video services, like Blockbuster and Amazon....

I love the Orwellian twist of calling a cable-company business model venture "Openet".

Google and Apple will never be best buds again, but the vision of a net run like cable TV has concentrated their minds. At the moment then, though betrayals are certain, we have Google, Apple, Netflix, Amazon and even Facebook on one side. On the other side we have Verizon, AT&T and Comcast. Microsoft, the wounded Titan, lurks in the background, perhaps contemplating an acquisition.

As a consumer and citizen, there's no doubt which side I support. On a scale of corporate evil, AT&T & Verizon are far above Google and Apple (Facebook is another matter). Politically Google and Apple are pretty much on the Dem side, and Verizon, AT&T and Comcast are very much GOP.

Should be interesting, and scary.

Saturday, December 18, 2010

Hans Rosling's TED Talk Trendanalyzer is now Google Docs Motion Chart

It's not new, but it's still awesome to see Hans Rosling demo the Trendanalyzer wealth/mortality graph. It's a great antidote to Rationalist fatigue; we have made awesome progress in 200 years. Measured in wealth and life expectancy, the hellholes of the modern world are still better than most of the Europe of 1800.

Check out the mortality spikes at WW I/Spanish Flu and WW II, and the fall of South Africa.

The real trick is getting the data, but the software is still cool enough. Must be awesome to have something like that. Must cost a fortune though.

Except, it's free. My friend Rob M. pointed me to Motion Chart, Google's rebranding of the Trendanalyzer. It comes with Google Docs. You can make some pretty nice graphics for a blog post from it.

Yes, we've come a long way.

Research the past 200 years of memetic propagation: Google Books ngram viewer

Google Ngram Viewer. Awesome.

The frequency of the word "dementia" spiked in the period 1870-1885. It then rose gradually to 1920 and fell until 1962 or so. Since then it has skyrocketed.

"Hepatitis" was used (Upper case "H", searches are case sensitive at the moment) around 1818, but then it disappeared until 1940. I wonder what it meant in 1818; it wouldn't be the modern meaning of the word.

"Schizophrenia" does not appear at all before 1910.

"Slide rule" starts rising in 1880, spikes in 1940, and then falls smoothly to plateau around 1990.

The usage pattern of the "n-word" is remarkable -- there are peaks in 1860 (the war to preserve slavery), around 1940 (black Americans in the armed forces), 1970 (civil rights), and the late 90s (the meaning and usage of the n-word changes?).

Imagine how a teacher could use this.

Google's kid problem - something for the GOP-led House to chew on

Google has a kid problem.

The latest example is the new Google eBooks iOS app. It seemed like a good option for my kids' iOS devices. Problem is, like every Google app I've looked at, it has an embedded browser. Disabling Safari doesn't disable WebKit use, so the browser is always available. A full Google search prompt is only a few clicks away, so the iPhone eBook app effectively disables iOS parental controls.

Just like Google's ad platform disables iOS parental controls.

I wish Apple would give all apps with embedded browsers an NC17 rating. Still, this is Google's problem -- not Apple's. Google has the same problem with Android devices (no parental controls at all) and with Google search (no effective parental controls). I'm sure Chrome OS will be no better.

I guess we have to wait until the founders have kids. The one bright side of the GOP dominated House is that they might give Google a hard time about this. Google is no friend of the GOP, and those guys know how to turn the heat up.

Google may have to start paying attention to kids. They can start by having their iOS apps disable WebKit use when Safari is disabled.

Thursday, December 16, 2010

Jon Swift (Al Weisel) died Feb 27, 2010

During the Bush years I found solace in the ironic writings of Jon Swift. He didn't write every day, so I didn't notice when he stopped.

Today, browsing old links, I saw his blog. The last post was in March of 2009, a link to a friend's tragedy.

There's a lot of spam in the comment thread on that post, but midway down is a story ...

I don't know how else to tell you all who love this blog. I am Jon Swift's Mom and I guess I'm going to OUT him. He was Al Weisel, my beloved son. Al was on his way to his father's funeral in VA when he suffered 2 aortic aneurysms, a leaky aortic valve and an aortic artery dissection from his heart to his pelvis. He had 3 major surgeries within 24 hours and sometime during those surgeries also suffered a severe stroke. We, his 2 sisters, his brother, his partner and his best friend since he was 9 years old were with him as he took his last breath. We have all lost a shining start who warmed our hearts, tormented us and made us laugh as he giggled at our pulling something over on us. He passed away on February 27, 2010. My beloved child will live on in so many hearts. I miss him more than I can say. If you are on Facebook, go to organizations and join "Friends of Al Weisel, Unite!" It will give you just a taste of how special he was. Farewell, Jon (Al)

The blog went silent almost a year before he died. This news got some coverage in March of 2010, but I don't read many lefty blogs (I'm lefty enough). If you enjoyed Jon's writings, spare a moment for his partner, friends and family.

This post has links to some of his best writing.

I'm guessing his partner didn't have his passwords, or there would have been a follow-up post and the commenting would have been disabled.

The Hugo Awards revisited - 58 years later

Tor's Jo Walton is reviewing the Hugo Awards Nominees -- starting in 1953.  I started reading science fiction in the 1960s, so many of the early winners are vaguely familiar.

Jo refers us to the Locus Index to SF Awards, such as this 1959 Hugo listing including Alfred Bester's 1950s short story "The Men Who Murdered Mohammed" - which is not nearly as inflammatory as it sounds.

Many of these stories can, with some work, be ordered through well connected libraries. I plan to try a few.

Yahoo kills Delicious - don't say you weren't warned about the Cloud

Not surprising ...

Michael Tsai - Blog - Yahoo Shuts Down Delicious:

Delicious was a good service, and I’m sorry to see all the data and metadata that people have entered go away

Some fear Flickr is next - like Digital Railroad. I wrote about Yahoo's likely use of Flickr in 2008.

Don't say I didn't warn you.

I don't put anything "in the Cloud" I can't walk away from. One reason I use Simplenote is that I have completely usable and completely current local copies of all data. If they go away tomorrow I can switch to an alternative in minutes.

Google deserves credit for supporting "data freedom" -- which is the only thing that can make the Cloud tolerable. Most recently the Data Liberation Front gave OS X users an easy way to download entire online albums. Google has a spotty data history, but I give them credit for their data freedom team.

Alta Vista RIP

In the 90s I taught physicians how to use the Internet. It gave me an excuse to attend Society of Teachers of Family Medicine and American Academy of Family Practice conferences. I didn't use PowerPoint; my "slides" were a set of web pages shared between different frames, like a slide on Digital's Alta Vista Search Engine:

[Screenshot of the Alta Vista slide]

In 1997 I wrote a series of tips on using Alta Vista like ...

+noir +film -"pinot noir"

Matches may be required, or prohibited. Precede a required word or phrase with + and a prohibited one with -. This query finds documents containing film and noir, but not containing pinot noir.
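
The +/- rule in that tip is easy to sketch in code. This is only a rough illustration in Python of how required and prohibited terms could be checked against one document. It is not how Alta Vista was implemented; the function names parse_query and matches are made up for this sketch, and unsigned (optional) terms are simply ignored.

```python
import re

def parse_query(query):
    """Split a +/- style query into required and prohibited terms."""
    required, prohibited = [], []
    # Each token is an optional sign followed by a quoted phrase or a bare word.
    for sign, phrase, word in re.findall(r'([+-]?)(?:"([^"]+)"|(\S+))', query):
        term = (phrase or word).lower()
        if sign == "+":
            required.append(term)
        elif sign == "-":
            prohibited.append(term)
        # Unsigned terms are optional; this sketch ignores them.
    return required, prohibited

def matches(query, document):
    """True if the document has every required term and no prohibited one."""
    required, prohibited = parse_query(query)
    text = document.lower()

    def has(term):
        # Whole-word (or whole-phrase) match, case-insensitive.
        return re.search(r"\b" + re.escape(term) + r"\b", text) is not None

    return all(has(t) for t in required) and not any(has(t) for t in prohibited)
```

With the query from the tip, matches('+noir +film -"pinot noir"', ...) accepts a page about a classic film noir and rejects one that also mentions pinot noir.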

By 1999 though I was writing ...

There are about 800 million web pages that are publicly accessible (Feb. 1999 [1]). This excludes, for example, the New York Times and the Encyclopedia Britannica! These are the pages that a search engine can find for you.

Until 1999 the best search engine was AltaVista. It had about the widest coverage (16% [1],[2]), very good performance, and a powerful but slightly complex search language. When AltaVista failed, Profusion was a great way to try every other useful search engine.

Then came Google, and everybody else became history.

This document introduces Google, and talks about how to use AltaVista and Profusion on those occasions that Google doesn't succeed.

Google crushed Alta Vista in the late 90s. After that things went downhill - for Digital (acquired by short lived Compaq) as well as Alta Vista. I remember a strange project in the twilight years, a Snow Crash/Neuromancer inspired Alta Vista virtual office tower for business in cyberspace. I had an address there. It was kind of silly.

Even in the twilight years though, Alta Vista had a good translation service. I missed Alta Vista's "near" operator, even though Google searches were much better (Google now has it).

Eventually Yahoo bought the remnants of Alta Vista, probably for the patents and the remnant traffic. Today Yahoo shuttered Alta Vista.

RIP Alta Vista. Historians will forget there was anything before Google. The truth is Alta Vista was pretty good, and if Google had never existed the web might have been much the same.

Tuesday, December 14, 2010

Gawker was hacked yesterday. Today LinkedIn?

Yesterday we learned Gawker was hacked. I got this message today ...

We have recently disabled your account for security reasons. To reset your password, follow these quick steps:
....
The LinkedIn Team

My LinkedIn password was not the same as the disposable Gawker password. It wasn't an ultra secure 64 character random string, but it was a 5th percentile good quality password, one of my class III credentials. It wouldn't fall to a standard attack.

So was LinkedIn hacked? Is this a false alarm? Are they being extra cautious after the Gawker hack?

There's another possibility. Since my Gmail account was hacked I don't enter my Google credentials on untrusted machines. Practically speaking, that means only OS X machines I control. Since that day I divide my credentials into five classes.

  • I: You want it? Take it.
  • II: I'd rather you didn't.
  • III: Help!! Help!!
  • IV: I'll fight you for it.
  • V: Kreegah bundolo! Kill!!

Category IV and V credentials are only used on trusted machines. Category I is used everywhere. Category II and III I'll use on my work machine -- an XP box with corporate class antiviral software. In other words, a vulnerable machine.
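
For the programmers in the audience, the rule above boils down to a small lookup table. Here is a minimal sketch in Python; the names CredentialClass, Machine, MIN_TRUST and may_enter are invented for illustration, and the three machine levels are an assumption that simplifies "trusted", "work" and "everything else".

```python
from enum import IntEnum

class CredentialClass(IntEnum):
    I = 1    # You want it? Take it.
    II = 2   # I'd rather you didn't.
    III = 3  # Help!! Help!!
    IV = 4   # I'll fight you for it.
    V = 5    # Kreegah bundolo! Kill!!

class Machine(IntEnum):
    UNTRUSTED = 0  # public or borrowed machine (assumed level)
    WORK = 1       # corporate XP box: managed, but vulnerable
    TRUSTED = 2    # a machine under my own control

# Minimum machine trust needed before a credential of each class is typed in.
MIN_TRUST = {
    CredentialClass.I: Machine.UNTRUSTED,
    CredentialClass.II: Machine.WORK,
    CredentialClass.III: Machine.WORK,
    CredentialClass.IV: Machine.TRUSTED,
    CredentialClass.V: Machine.TRUSTED,
}

def may_enter(credential, machine):
    """True if this class of credential may be entered on this machine."""
    return machine >= MIN_TRUST[credential]
```

Under this table a class III credential is allowed on the work machine, which is exactly the exposure to a keystroke logger.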

The fourth possibility is that one of my Category III credentials has fallen to a keystroke logger on my corporate laptop.

Yech.

I've reset my LinkedIn password (and reviewed the list of reset emails), and, on reflection, I've moved those credentials into "Class IV". So I won't use those credentials on an untrusted machine.

What's next?

Update 12/14/10: LinkedIn wasn't hacked, unless you consider that they've hacked themselves. They'd matched every email address posted by the Gawker hackers, and reset the passwords associated with them. They explain that today (emphases mine) ...

We recently sent you a message stating that your LinkedIn password had been disabled for security reasons. (Note: If you have more than one email registered with us, you will receive more than one password reset message. You only need to act on one of them.)

This was in response to a security breach on a different site, Gawker.com, where a number of usernames and passwords were exposed. We want to make sure those leaked emails and passwords were not being used to attack any LinkedIn members.

There is no indication that your LinkedIn account has been affected, but since it shares an email with the compromised Gawker accounts, we decided to ensure its safety by asking you to reset its password ...

They would have done better to explain that yesterday. What a screw-up.

Monday, December 13, 2010

The Gawker hack - and two factor authentication

I got my email from Gawker today ...

... the user name and password associated with your comment account were released on the internet...

Gawker was hacked - big time. Forbes has the gory details ...

The Real Lessons Of Gawker’s Security Mess - The Firewall - the world of security - Forbes

... Despite this, they do not really seem to be acknowledging the scale of what happened. They still try to put some blame back on users, suggesting that if they had a weak password they might be compromised. Well, that really does not make much of a difference when you expose the entire database table and have way too much faith in the 34 year old encryption algorithm reported to be used to safeguard the data...

Briefly, I take security far more seriously than Team Gawker. They were a big fat soft target.

I don't remember creating a Gawker account - I probably created it on io9 originally. I'm sure I used my throwaway password (still far more robust than most). I have retired that password, but it will now be a part of a future dictionary attack. I need to check that Emily doesn't use it any more either.

In the wake of these events there are typically calls to "use strong passwords". Except, of course, if the server side password store encryption is hacked then even the world's best password is useless. And, of course, there are keystroke loggers out there.

This is what I do now, but, really, we need two factor authentication urgently.
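
To make "two factor" concrete, here is a minimal sketch of one common second factor: a time-based one-time code in the style of RFC 4226 (HOTP) and RFC 6238 (TOTP), written in Python with only the standard library. It is an illustration, not something Gawker or anyone else mentioned here uses. The shared secret, the 30 second step and the verify helper are assumptions for the example.

```python
import hashlib
import hmac
import struct
import time

def hotp(secret, counter, digits=6):
    """Counter-based one-time code (RFC 4226 style, HMAC-SHA1)."""
    digest = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = digest[-1] & 0x0F  # dynamic truncation: low nibble picks 4 bytes
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)

def totp(secret, now=None, step=30, digits=6):
    """Time-based one-time code (RFC 6238 style): HOTP over a time counter."""
    if now is None:
        now = time.time()
    return hotp(secret, int(now // step), digits)

def verify(secret, password_ok, submitted_code, now=None):
    """Hypothetical login check: the password AND the current code must match."""
    return password_ok and hmac.compare_digest(totp(secret, now), submitted_code)
```

The point of the second factor is that a leaked password table, or a keystroke logger replaying an old code, is not enough on its own; the attacker also needs the device holding the secret.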

I did go through Gawker's password reset procedure, which seems to have given me a new username and password. There's no way currently to get to their accounts page so I'll just leave it as it is.

Update 12/14/10: This Lifehacker (Gawker) article on lessons learned from a hacked Google account is quite ironic now. They didn't learn any lessons.

The snowy 70s and nordic skiing

I came of age in Montreal in the 70s. It was a snowy time, and, not coincidentally, Cross Country (Nordic) skiing was relatively popular. There were cross country resorts as far south as mid-Pennsylvania.

Then came the 80s. The snows went away, the resorts closed, and cross country skiing declined. When global warming became obvious in the late 90s I figured that was the end of my favorite sport.

Now some are wondering if the 70s are back ...

Snow storm snarls Midwest: Is US facing another extreme winter? - CSMonitor.com

.... Scientists at the University of Wisconsin in Madison are among those trying to understand the mysterious interplay between Pacific and North Atlantic weather phenomena that threaten to dunk the Eastern US into a second year in a row of 1970s-style blizzards and cold snaps...

... Scientists speculate that heat released from storms racing up the US East Coast toward the Labrador Sea may be feeding the so-called North Atlantic Oscillation – nicknamed "The Greenland Block" – in ways that are not yet understood. The region of high pressure over Greenland has pushed huge troughs of Canadian air into the US, causing the fifth biggest snow storm on record in Minneapolis over the weekend and now threatening Orlando, Fla., with 20 degree F temperatures.

The atmospheric upset has had the opposite effect on parts of the West, where cities like Long Beach, Calif., and Phoenix saw record high temperatures Monday...

So now I know why my childhood was snowy. It's news to me. I found a bit more about it in this Feb 2010 article inspired by DC snow ...

The North Atlantic Oscillation, a mid-oceanic pressure system, has some distinct internal variability, but generally it alternates between roughly 25-year-periods of warm, then cold, temperatures. During the previous cold phase, which lasted from about 1960 to 1985, there were major winter storms in the Washington, D.C., area every couple of years — big snow storms hampered John F. Kennedy's inauguration in 1961 and a week of sub-zero temperatures chilled many people attending Ronald Reagan's second inauguration. Like the current storm, these storms dumped lots of snow: A 1979 storm dropped 18.7 inches and a 1983 storm dropped 16.6 inches. The storm that struck the capital region in December 2009 also dumped 16.6 inches of snow in D.C.

Those biggies of the past were usually also associated with El Niño, like this year. The North Atlantic Oscillation brings colder weather; the El Niño, which arises out of an unusually warm equatorial Pacific Ocean and occurs roughly every two to seven years, brings moisture to the Mid-Atlantic...

From my selfish point of view, snowy winters are excellent. Even with global warming Minnesota is cold enough for winter snow -- we're just too dry. These past two years we've gotten the moisture we need.

Maybe cross country skiing will make a bit of a comeback, even if the long term outlook is a bit bleak.