Saturday, March 12, 2005

What's worse than no privacy? Lies.

A while back I posted on the (gross) errors that my shadow medical profile is likely accumulating thanks to a persistent billing error. I thought I was making a prediction, but tomorrow is today. The data stolen from ChoicePoint (and Lexis/Nexis and everywhere else) is full of errors:
MSNBC - ChoicePoint files found riddled with errors
By Bob Sullivan MSNBC

... Pierce, a privacy advocate, obtained her report nearly two years ago, long before the current controversy. Thanks to the unknown source -- perhaps a company employee, Pierce said, but she has no way of knowing -- she got a rare privilege most consumers don't: a chance to see what ChoicePoint knows about her.

... What first caught Pierce's eye, she said, was a heading titled "possible Texas criminal history." A short paragraph suggested additional, "manual" research, because three Texas court records had been found that might be connected to her. "A manual search on PIERCE D.S." is recommended, it said.

Pierce says she's only visited Texas twice briefly, and never had any trouble with the law there.

"But if I was applying for a job, and there were other candidates, and this was on my record, the company would obviously go for another person," she said. "It raises a question in your mind."

... On ChoicePoint's Web site, the National Comprehensive Report is described as a collection of searches that glean data from "national and state databases for a summary of assets, driver licenses, professional licenses, real property, vehicles, and more. Each report offers the ability to add associates to the report, which include relatives, others linked to the same addresses as the subject and neighbors."

... Under former addresses, an ex-boyfriend's address was listed. Pierce said she never lived there, and in fact, he moved into that house after they broke up. The report also listed three automobiles she never owned and three companies listed that she never owned or worked for.

Under the relatives section, her sister's ex-husband was listed. And there are seven other people listed as relatives who Pierce doesn't know...

...Most alarming to Pierce is the fact that, with all this information, the ChoicePoint report she received had glaring omissions, too. Many of her former addresses aren't listed; and despite the host of other people listed on her report, many relatives and nearby neighbors were missing.

... Pierce's experience neatly parallels that of Richard Smith, another privacy advocate, who paid a $20 fee and received a similar report from ChoicePoint several years ago. The company offers a wide variety of reports on individuals; Smith purchased a commercial version that's sold to curious consumers.

Smith's dossier had the same kind of errors that Pierce reported. His file also suggested a manual search of Texas court records was required, and listed him as connected to 30 businesses which he knew nothing about.

Some of the mistakes on Smith's report were comical: That his wife had a child three years before they were married, that he had been married previously to another woman, and most absurd, that he had died in 1976...
It's a longish article, and pretty depressing. The quality of the data is awful, it can control your destiny, and you can't see it or fix it. I would have been surprised if anything else were true. Most of my blogly bloviating is pure opinion; in this particular domain I have actual expertise (shock! It's even exotic expertise).

Even if there weren't inevitable and severe matching errors associated with gathering data from multiple sources, the data would only be as good as its sources. Then there's the risk of the inevitable inferences that must be performed to process the data. Lastly, there are ChoicePoint's motivations. They don't get in trouble if they label someone a "child molester" who isn't -- how would anyone find out? They get in serious trouble if they mislabel a "child molester" as "clear".

Think about it. What are they more likely to do? Err on the side of labeling a good person as bad, or a bad person as good? Which error costs them money?
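To put rough numbers on that incentive, here is a toy expected-cost sketch. Everything in it is an assumption made for illustration: the function names, the cost figures, and the idea that a vendor reasons this explicitly. It is not a description of how ChoicePoint actually works. It only shows that when a false "clear" costs far more than a false flag, the cheapest policy is to flag people on very weak evidence.

```python
# Toy expected-cost model. All names and numbers are invented for illustration.

def flag_threshold(cost_false_flag, cost_false_clear):
    """Match probability above which flagging is the cheaper choice for the vendor."""
    return cost_false_flag / (cost_false_flag + cost_false_clear)


def should_flag(p_match, cost_false_flag, cost_false_clear):
    """Flag when the expected cost of clearing exceeds the expected cost of flagging."""
    expected_cost_if_cleared = p_match * cost_false_clear
    expected_cost_if_flagged = (1 - p_match) * cost_false_flag
    return expected_cost_if_cleared > expected_cost_if_flagged


if __name__ == "__main__":
    # Equal costs: flag only when a match is more likely than not.
    print(flag_threshold(1, 1))       # 0.5
    # A false "clear" assumed to cost 99x a false flag: flag at a 1% chance.
    print(flag_threshold(1, 99))      # 0.01
    # A 5% chance that the record belongs to this person is enough to flag.
    print(should_flag(0.05, 1, 99))   # True
```

With the lopsided (made-up) costs, a 5% chance that a court record belongs to you is plenty to put "possible criminal history" on your report.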

Add these four things together:
  1. Fundamental problems related to "matching" identities managed in different systems.
  2. Mismatch between the quality needs of the acquiring systems vs. the use to which the data is put by ChoicePoint.
  3. Semantic issues too complex to mention here.
  4. The intense motivation to err on the dark side of life.
and it would be astounding if ChoicePoint records were not full of severe errors and prone to cause harm to the (relatively) innocent.
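The first item is the easiest to show in miniature. The sketch below links records from different sources using only a surname plus initials, the same shape of key as the "PIERCE D.S." search quoted above. The records, source names, and field names are all hypothetical, and real matching systems use more fields than this; the point is only that a loose key merges different people into one profile.

```python
from collections import defaultdict

# Hypothetical records from three unrelated systems; every value is invented.
RECORDS = [
    {"source": "court_index", "surname": "Pierce", "given": "Dana Sue", "state": "TX"},
    {"source": "vehicle_registry", "surname": "Pierce", "given": "Doris S", "state": "CA"},
    {"source": "property_rolls", "surname": "Pierce", "given": "Diane Marie", "state": "CA"},
]


def loose_key(surname, given_names):
    """Build a 'SURNAME I.I.' key, discarding everything but the initials."""
    initials = ".".join(name[0].upper() for name in given_names.split()) + "."
    return f"{surname.upper()} {initials}"


def link_records(records):
    """Group records that share a loose key, as a naive matcher would."""
    groups = defaultdict(list)
    for record in records:
        groups[loose_key(record["surname"], record["given"])].append(record)
    return dict(groups)


if __name__ == "__main__":
    for key, group in link_records(RECORDS).items():
        print(key, [r["source"] for r in group])
```

Here a Texas court record and a California vehicle registration for two different women land in the same "PIERCE D.S." bucket, and nothing in the merged profile says so.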

Alas, do Americans care? Not now they don't. They will one day.

PS. David Brin covered this topic in great depth many years ago. He wrote a book about it (The Transparent Society), but you get the main ideas here and (more recently) here. The Amazon reviews make interesting reading -- the best are pained admissions that Brin might be right.
