Friday, October 31, 2008

Google is doing OCR on PDF-wrapped document scans

The Google blog struggles to explain why their latest technical achievement is important...
Official Google Blog: A picture of a thousand words?

... We are now able to perform OCR on any scanned documents that we find stored in Adobe's PDF format... This is a small but important step forward in our mission of making all the world's information accessible and useful...
So why is this important, yet hard to appreciate?

The first problem is that most people think of PDF as a text container. Indexing a text container is nothing special. What's less appreciated is that PDF is the de facto standard way to package a scanned document [1].

So what's novel about doing character recognition on a scan? OCR on 600 dpi B&W document scans is no great trick. Adobe's PDF client has more or less done that for about 10 years [2], and Windows' (formerly Xerox) ancient and under-appreciated document imaging has had this ability since the dawn of time.

The trick is implementing this affordably across the millions, perhaps billions, of PDFs indexed on Google's servers.

That's impressive, and it is going to open up a vast amount of knowledge.

Good for you, Google!

[1] I figured this would happen back in the 90s, before there was any clear answer to the question of how to represent scanned documents. There are bizarre technical issues with scanning into PDF, but it's a great format overall. (Hint: ancient fax-style lossless compression of a "B&W" document scan is much more efficient than readable JPEG compression of a grayscale scan.)

[2] They sometimes hide or remove this feature depending on what else they're selling.
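Footnote [1]'s hint is easy to see with a toy example. Below is a minimal run-length encoder for a single row of a bilevel (black-and-white) scan, assuming a letter-width page at 600 dpi, which works out to 5,100 pixels per row. It's only a sketch of the core idea: real fax compression (CCITT Group 3 and Group 4, which PDF supports through its CCITTFaxDecode filter) also Huffman-codes the run lengths, and Group 4 codes each row relative to the one above it. A row of mostly white paper with a few strokes of ink collapses into a handful of numbers, while a grayscale row starts out as a full byte per pixel.

```python
from itertools import groupby


def run_lengths(scanline):
    """Encode one row of a bilevel scan (0 = white, 1 = black) as run lengths.

    Following the fax convention, runs alternate white/black starting with
    white; a row that begins with black gets a zero-length white run first.
    """
    runs = []
    if scanline and scanline[0] == 1:
        runs.append(0)  # empty white run keeps the colours alternating
    for _pixel, group in groupby(scanline):
        runs.append(sum(1 for _ in group))
    return runs


def decode_runs(runs, width):
    """Rebuild the row from alternating white/black run lengths."""
    row = []
    colour = 0  # rows start white
    for n in runs:
        row.extend([colour] * n)
        colour ^= 1  # flip between white (0) and black (1)
    if len(row) != width:
        raise ValueError(f"runs cover {len(row)} pixels, expected {width}")
    return row


if __name__ == "__main__":
    # Hypothetical 5,100-pixel row: white margin, two 10-pixel ink strokes, more white.
    row = [0] * 2000 + [1] * 10 + [0] * 30 + [1] * 10 + [0] * 3050
    print(run_lengths(row))  # [2000, 10, 30, 10, 3050]
```

Five numbers stand in for 5,100 pixels, and decoding is exact, which is why the "B&W" route is both smaller and lossless.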

Firewalls and separation of powers: banking, government, medicine and pharmacy

When I wrote about firewalls a few weeks ago I focused on contagion ...
Gordon's Notes: Systemic failure and financial firewalls

... even if there are deeper economic and cultural failures, there are also more straightforward firewall failures in our current crisis. These are usually called "regulatory failures" but regulation can come in many forms. I think the most interesting forms are those that are designed to stop the spread of contagions.

Fires, seizures, epidemics, hurricanes and financial crises are all, famously, "chaotic". They have non-linear perturbation sensitivities, and they can roar up and die down in ways that are only loosely predictable.

Excepting hurricanes, we have firewalls for these things. In our brains are systems to dampen seizures should they arise, and, we think, to limit where they spread. In our buildings we have, well, firewalls. In public health we find immunization rings, targeted interventions, quarantine and the like...

... Firewalls don't show up, to my knowledge, in classical economics. I'm sure they show up in modern economic models of regulation and in studies of "complex adaptive systems" [1]. Maybe this latest crisis will bring models of financial system firewalls, like the mourned Glass-Steagall act, to the level of popular economics.
Glass-Steagall separated commercial and investment banking. One effect was to reduce the risk of contagion, but I think the intent was to reduce conflicts of interest.

Managing systemic conflicts of interest is one reason America's political system separates power between Congress, the Executive and the Courts. (One of the reasons Bush was able to fully leverage his incompetence was that the GOP controlled all three, and had near-control of the media as well.)

Conflict of interest is one reason, for example, that it's a very bad idea for orthopedic surgeons to own imaging facilities.

Speaking of which, there's yet another separation of powers that's waned over the past twenty years.

At one time American physicians also dispensed medications, and pharmacists also prescribed. That's still true in many nations. Shockingly, the result was very high use of very inappropriate medications. The cure was separation of powers. Physicians would prescribe and pharmacists would dispense.

Time passed. Lessons were forgotten. Market deism and libertarian ideals joined forces. Now we have minute clinics owned by dispensing organizations, and oncologists who make a large share of their revenue by the margin on dispensed drugs.

I am very confident that we will rediscover that there was a good reason to separate prescribing and dispensing. We'll find, for example, that minute clinics dramatically increase the cost of "treating" self-limited conditions -- not to mention the sale of diet pills, supplements, and candy.

Firewalls to contain epidemic chaos. Separation of powers to manage fundamental conflicts of interest in an imperfect world of imperfect communication and incomplete knowledge. They both reduce efficiency. They are both essential. Sometimes they're the same thing.

I do wish our meta-memory wasn't so short.

Don't listen to Judith Warner. Worry.

Judith Warner tries to tell us not to worry, that things are getting better.

Of course, on closer inspection she's really saying that either Obama/Biden will win, in which case we should be happy now, or McCain/Palin will win, in which case we should celebrate Bush's end because soon we'll despair.

I admit, there's a deeper logic there. After all, relatively soon we'll all be dead, so we might as well be happy now.

In the meantime, don't believe the polls. Turnout of "young" (under 35) voters in Florida's early voting has been lousy. It's 2004 all over again: Kerry might have won if the under-35s had actually voted the way they said they would.

If the "young" stay home, they will give us President Palin.

Punishment for the boomers? Well, I can understand that. It's silly, though; the young will live with the consequences longer than I will.

Thursday, October 30, 2008

George Will?!

George Will endorses Obama.

Who's left? 

Dick Cheney?

Jesus.

I can't possibly support somebody George Will has endorsed.

My head has exploded.

Again.

The Economist's endorsement: What's surprising about it.

The Economist has been running a worldwide "electoral college". Last I looked, 80% of their US readers voted for Obama.

80%.

That's current readers of The Economist, a journal that used to be rationalist and 19th-century liberal but became a pale imitation of the Wall Street Journal editorial pages in the 90s. Even this readership, the very heart of McCain's former constituency, is massively pro-Obama.

So I figured the "paper" would endorse Obama. If 80% of the US readership of a WSJ-editorial-lite publication wants Obama, the editors aren't going to be stupid.

Still, in 2000 they endorsed Bush. They never adequately apologized.

In 2004, they weakly, half-heartedly, with poisoned pen, "endorsed" John Kerry as "the incoherent".

So I was expecting a grudging, muttered, meaningless endorsement.

Instead we got ...

Obama on the cover, striding along. Headline "It's time".

There's nothing poisonous about this endorsement (emphasis mine):
An endorsement of Barack Obama | It's time | The Economist
... all the shortcomings of the campaign, both John McCain and Barack Obama offer hope of national redemption. Now America has to choose between them. The Economist does not have a vote, but if it did, it would cast it for Mr Obama. We do so wholeheartedly ...
"Wholeheartedly". A carefully chosen word.

They are not forgiven. They will never be forgiven for their 2000 endorsement of GWB.

Still.

It's something.

Why Obama would be a great president – in two lines from Bill Clinton

This says it all …

Joan Walsh - Salon.com

…Clinton shared the candidate's measured, investigative approach to the financial crisis in September: calling his advisors, calling Clintons' advisors, calling both Clintons and others. What Obama told everyone, Clinton said, was, "'Tell me what's right. Don't tell me what's popular, tell me what's right, and I'll figure out how to sell it.' That's what a president does. He will be a very fine decision maker, working for the American people."…

Not because he’s a very good writer. Not because he’s black or multi-ethnic. Not because he’s as smart as anyone you know. Not because of his lifetime worldwide experience. Not because he’s a “great communicator”. Not entirely because of his iron focus and calm discipline.

Those are good things, but they’re not why he’d be a great president.

He’d be a great president because he uses Reason to political ends.

Reason about problems. Reason about people. Reason about politics. Reason about what’s doable and how to do it.

Desperate times can sometimes lead a nation to wake up from a drunken stupor and make intelligent choices. Easy times produce George W Bush. Desperate times produced Roosevelt, Lincoln, and Churchill.

I don’t think we Americans, today, deserve a good president, much less a great one. As a people we’ve earned a President Palin. That would be just.

I’m not asking for justice.

I’m asking for mercy.

If the fates are merciful, we’ll get Barack Obama.

A lyrical essay on America and Obama

Like Roger Cohen I'm an American immigrant. Unlike Cohen, I'm not too positive about America's current culture. Ask me on November 5th.

Unlike me, Cohen is a very good writer ...

ROGER COHEN - American Stories - NYTimes.com

Of the countless words Barack Obama has uttered since he opened his campaign for president on an icy Illinois morning in February 2007, a handful have kept reverberating in my mind:

“For as long as I live, I will never forget that in no other country on earth is my story even possible.”

Perhaps the words echo because I’m a naturalized American, and I came here, like many others, seeking relief from Britain’s subtle barriers of religion and class, and possibility broader than in Europe’s confines....

... Americans are decent people. They’re not interested in where you came from. They’re interested in who you are. That has not changed.

But much has in the last eight years. This is a moment of anguish. The Bush presidency has engineered the unlikely double whammy of undermining free-market capitalism and essential freedoms, the nation’s twin badges.

American luster is gone. The American idea has, in Joyce Carol Oates’s words, become a “cruel joke.” Americans are worrying and hurting.

So it is important to step back, from the last machinations of this endless campaign, and think again about what America is.

It is renewal, the place where impossible stories get written.

It is the overcoming of history, the leaving behind of war and barriers, in the name of a future freed from the cruel gyre of memory.

It is reinvention, the absorption of one identity in something larger — the notion that “out of many, we are truly one.”

It is a place better than Bush’s land of shadows where a leader entrusted with the hopes of the earth cannot find within himself a solitary phrase to uplift the soul.

Multiple polls now show Obama with a clear lead. But nobody can know the outcome and nobody should underestimate the immense psychological leap that sending a black couple to the White House would represent.

What I am sure of is this: an ever more interconnected world, where financial chain reactions spread with the virulence of plagues, thirsts for American renewal and a form of American leadership sensitive to humanity’s tied fate...

...Watching the way he has allowed his opponents’ weaknesses to reveal themselves, the way he has enticed them into self-defeating exhaustion pounding against the wall of his equanimity, I have come to understand better what he meant.

Stories require restraint, too. Restraint engages the imagination, which has always been stirred by the American idea, and can be once again.

I feel that we're in free fall as a nation. There's a tree growing from the cliff, and if we can twist just right and get a bit of a breeze we might be able to stop in it. We'll still have a heck of a climb to the top, but it's not impossible.

Miss this tree, and there may not be another one. Not for us, and maybe not for humanity.

Yeah, I know, it sounds melodramatic.

Truth.

Wednesday, October 29, 2008

Geeks for Obama: Tim O'Reilly's endorsement

Tim O'Reilly is a big name in the geek world. His eponymous publishing house has sold some of the best programming books for many years.

In a radical departure from his usual writing, he's published a stirring endorsement of Barack Obama.

Geeks tend to be libertarians or centrists, though substantially less religious than the average American. (I'm far more concerned with the 'problem of the weak' than most American geeks.) My tribe should have been a natural constituency for McCain.

The old, pre-Palin, non-demented McCain.

Instead, donation reports from Google, Apple, Microsoft and my own employer suggest that about 80% of geeks support Obama. Not coincidentally, the Economist's worldwide reader survey found that 80% of registered US Economist readers support Obama.

O'Reilly fits right into this picture.

Microsoft lessons from Target Trutech and my fancy JBL

Two years ago I spent far too much money on a fancy JBL iPod clock radio.

It was defective by design. My 3G black-and-white iPod wouldn't always start when the alarm went off. Later-model iPods would always start, but they played a random tune (a known defect of this device). The embedded OS crashed randomly, typically every few weeks. It's a pain to reset when the 9V backup battery is installed, so I gave up on the battery backup. The time and alarm configuration is cryptic to begin with, and there are many combinations to get wrong.

In the meantime I bought my 10-year-old an $8 house-brand Target Trutech clock radio. It's been very reliable, despite extreme abuse.

Tonight I threw in the towel. I bought another $8 "Trutech" for our room. It's very simple to program, the 9V battery backup works, the radio even makes some noise and the power adapter is quite compact.

I thought about paying more than $8, but my pre-JBL clock radio was a $90 SONY CD player/radio that died after about 10 months of gentle use.

In the 21st century there's no particular correlation between price and quality, and most brand names are meaningless. (Apple being the obvious exception.)

Today one can either buy the very cheapest device and save money, or buy a luxury brand (Apple, Bose, ?) and expect some support. The great middle is gone.

Speaking of which, Target is selling $300 ASUS "netbooks" that run Linux and bundle OpenOffice. They include a 4 GB solid-state "drive", 512 MB of memory and built-in wireless. Within a year they'll sell for $200 and have 8 GB of storage and 1 GB of memory.

Microsoft is not a luxury brand.

2009 will be an ugly year for Microsoft.

Mainstream media wearies of calling McCain on his lying

A not so good development.

McCain/Palin lying has been so regular and sustained that it's no longer "news". It's "dog bites man" and thus not reported by big media.

ABC is an exception, but the 'Carved B' hoax is getting a complete pass. FactCheck.org is routinely calling the lies, but they're running out of euphemisms and migrating to descriptions like "whopper". Alas, they're not mainstream.

I put Obama's chances at no more than 50%, so this is not a good time for the big guys to stop pointing out that Palin and McCain are non-stop liars.

Tuesday, October 28, 2008

Was Babbage's computer truly forgotten?

In Our Time's Ada Lovelace program, by necessity, involved quite a bit of discussion of Charles Babbage. Babbage, with some help from Lovelace, imagined a good portion of the computing machine that Turing and others would later build.

Melvyn's guests felt that there was no direct connection between Babbage (1830) and Turing (1945), and that Babbage's contributions were essentially lost to science.

This seems a bit odd, as the Wikipedia article on Babbage mentions that his son created six difference engines. I can add a note from my own library. I have a copy of the 1911 Encyclopedia Britannica (11th ed.), which includes an article on Babbage and one on Calculating Machines. I've scanned all of the former and portions of the latter (PDF 8MB [1]). (See also the "love to know" 1911 project, but its article doesn't match my copy. Microsoft apparently republished the encyclopedia in 1995.)

Briefly, the article on Babbage focuses on his mathematical pursuits, including an essay deploring the decline of science in 19th century England (some things never change). The article describes both the Difference and Analytical engines much as we understand them now, though it misses the significance of the programming design. The article on Calculating Machines praises the Difference Engine as a real device with well understood principles, but states that the Analytical Engine did not progress beyond sketches. It does, however, refer interested readers to a comprehensive book by Babbage's son.
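The "well understood principles" of the Difference Engine are the method of finite differences: for a polynomial of degree n, the nth differences are constant, so once the starting value and its differences are set up, every further table entry follows by addition alone, with no multiplication. Here's a minimal Python sketch of that method. The example polynomials are arbitrary choices of mine, and computing the starting differences by direct evaluation merely stands in for the machine's initial setup.

```python
def initial_differences(poly, degree, start=0):
    """Return the starting column values [f(start), Δf, Δ²f, ..., Δⁿf].

    Computed here by direct evaluation; this plays the role of the
    initial setup, after which only addition is needed.
    """
    values = [poly(start + i) for i in range(degree + 1)]
    diffs = []
    while values:
        diffs.append(values[0])
        # Next row of the difference table: pairwise differences.
        values = [b - a for a, b in zip(values, values[1:])]
    return diffs


def tabulate(diffs, count):
    """Produce `count` successive table entries using only addition."""
    cols = list(diffs)
    table = []
    for _ in range(count):
        table.append(cols[0])
        # Each column adds in the (not yet updated) column to its right;
        # the highest-order difference never changes.
        for i in range(len(cols) - 1):
            cols[i] += cols[i + 1]
    return table


if __name__ == "__main__":
    f = lambda x: x * x + x + 41
    print(tabulate(initial_differences(f, 2), 6))  # [41, 43, 47, 53, 61, 71]
```

Each pass of the loop does only additions, and the highest-order column stays fixed, which is why a machine with a fixed number of columns can tabulate any polynomial up to that degree.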

I'm left with the impression that Melvyn's guests understated the extent to which portions of Babbage's work survived into the 20th century.

[1] OS X Black and White PDFs are vastly larger than Adobe's B&W PDFs.

Monday, October 27, 2008

Apple blocks Opera Mini from the iPhone

This is why we iPhone users must encourage everyone to buy a Google Android gPhone ...

Opera Sings an Ode to Browsers Everywhere - Bits Blog - NYTimes.com

... For smartphones, Opera Mobile is a full-featured browser that can display most Web sites. Handset makers pay Opera about 50 cents to $1 per copy for each phone made with the browser on it.

For less sophisticated phones and slower networks, it offers Opera Mini, which takes advantage of a server computer, run by Opera, to handle the processing of Web pages. The server then sends a simplified version of each page to the phone in a compressed form.

Because that makes for much faster browsing no matter what the phone and network, Mr. von Tetzchner said, Opera Mini is increasingly popular on smartphones, even those that use the latest third-generation, or 3G, wireless data networks.

“3G isn’t really that fast,” he said. “We try to deal with the real world.”

Mr. von Tetzchner said that Opera’s engineers have developed a version of Opera Mini that can run on an Apple iPhone, but Apple won’t let the company release it because it competes with Apple’s own Safari browser...
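Opera Mini's split, as described above, puts the heavy lifting on Opera's server and sends the phone a simplified, compressed page. Here's a toy Python sketch of that general proxy idea, assuming a crude simplification (keep visible text, drop scripts and styles) and zlib compression. Both choices are mine for illustration; the article doesn't describe Opera's actual server pipeline or wire format, and the function names are hypothetical.

```python
import zlib
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> contents."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0  # >0 while inside a skipped element

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def simplify_and_compress(html: str) -> bytes:
    """Hypothetical server step: reduce a page to plain text, then deflate it."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    text = "\n".join(parser.parts)
    return zlib.compress(text.encode("utf-8"), level=9)


def phone_side_decode(payload: bytes) -> str:
    """Hypothetical client step: inflate the payload and get displayable text."""
    return zlib.decompress(payload).decode("utf-8")
```

The phone only inflates and displays; parsing and filtering stay on the server, which is why an approach like this can help even on a 3G network.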

Wouldn't you like to have the choice of Opera Mini? Google Android customers will have that choice.

Apple needs the lash of competition to be a barely tolerable companion.

Great news: brain speed increases to age 39

I figured brain speed peaked at 25, so 39 is great news ...
Slashdot | Brains Work Best At Age of 39

... Scientists at the University of California Los Angeles are reporting that while some people may think 'life begins at 40,' all it seems to do is slow down. According to recent research, at age 39 our brain reaches its peak speed, and it's all downhill after that...
Why the negative spin? This is kind of nice.

So now I have the brain of a 29-year-old ... (assuming the decline after 39 mirrors the rise before it, like a bell curve, which is probably optimistic).
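The arithmetic behind the joke, assuming (as above) that the decline after the age-39 peak mirrors the rise before it: an actual age a past the peak maps to the age equally far before the peak.

```latex
a_{\text{brain}} = 39 - (a - 39) = 2 \cdot 39 - a,
\qquad \text{e.g. } a = 49 \;\Rightarrow\; a_{\text{brain}} = 78 - 49 = 29.
```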

Sunday, October 26, 2008

If you're told you can't vote -- call 866-OUR-VOTE

The good guys are ready for the big fight. They've got a strike team standing by to fight GOP voter suppression attacks ...
Do you know what to do? (Scripting News)

... Call 866-OUR-VOTE or go to 866OURVOTE.org to get information on where to vote and the facts on your right to vote. A trained team of advisors is available to help you resolve your problem...

The Singularity University

It's easy to mock this group, not least because of the wealth and power of some of the attendees ...
Rough Type: Nicholas Carr's Blog: This post will self-destruct in five minutes

... On Saturday, September 20, 2008, a carefully selected group of the tech world's best and brightest assembled in a windowless conference room at NASA's Ames Research Center in Silicon Valley - barely a mile from the Googleplex as the rocket flies - to discuss preparations for our impending post-human future. This was the founding meeting of Singularity University, an academic institution whose mission, as founder Dr. Peter Diamandis told the elite audience, would be 'to assemble, educate and inspire a cadre of leaders who strive to understand and facilitate the development of exponentially advancing technologies (bio, nano, info, etc); and to apply, focus and guide these to the best benefit of humanity and its environment.'...
The group is keeping a low profile, for obvious reasons. Neither Vernor Vinge nor Bill Joy attended.

When it comes to the "rapture of the nerds" I hold the middle ground. I fear the era of transcendent non-biological minds, but I think it's beyond 2060. Unlike Nicholas Carr, I'll avoid the easy sarcasm. I'm glad they're thinking about the problem. Maybe they'll figure out how we could create an artificial mind that wouldn't be, for example, insane or murderous.