Category Archives: Government Process IQ Trainwrecks

Bad Data Quality affects UK Pensioners

From the Guardian:

Government watchdog blames lack of data and bad governance for poor performance by MyCSP and delays in payments for thousands of pensioners

According to the UK’s National Audit Office, more than three-quarters of civil service pension records – 1.25m – are incomplete or incorrect, which it says has caused hardship and distress to many pensioners.

The NAO report cites:

  • System capabilities not being in place to deal with payroll and pensions processes and data
  • A lack of data governance and oversight
  • A lack of any methods to track benefits and other KPIs

This case has been cited by critics as a “textbook case” of how not to reform public services.

The full NAO Audit report can be found here: https://www.nao.org.uk/report/investigation-into-members-experience-of-civil-service-pension-administration/

Accounting Accountability

From Europe we learn of two stories with similar characteristics that tick all the boxes for classic Information Quality Trainwrecks.


From Germany we hear that, due to errors in internal accounting at the recently nationalised Hypo Real Estate, the German national debt was overstated by €55 billion (approx US$76 billion). This was doubly embarrassing for Germany, as it had spent months criticising the accuracy of accounting by the Greek Government.

According to the Financial Post website:

In an era of austerity where their government has squabbled tirelessly for two years over a mooted €6-billion tax cut, Germans found it hard to fathom that their government was so suddenly and unexpectedly 55-billion euros better off.

The net effect of the error being found and fixed is that Germany’s debt-to-GDP ratio will be 2.6 percentage points lower than previously thought.

The root cause appears to be a failure to standardise accounting practices between two banks that were being merged as part of a restructuring of the German banking system. This resulted in the missing billions being accounted for incorrectly on the balance sheet of the German government, which owns the banks in question.

From Ireland we have a similar story of missing billions. In this case a very simple accounting error resulted in monies that were loaned from one State agency (the National Treasury Management Agency) to another State agency (the Housing Finance Agency) being accounted for by the Department of Finance in a way which resulted in €3.6 billion being added to the Irish national debt figures.

This (almost coincidentally) resulted in a 2% misstatement of the Irish national debt. Also coincidentally, it is exactly the same figure by which the Irish Government is seeking to reduce net expenditure in its forthcoming budget.

The problem was first spotted by the NTMA in August of last year (2010) but, despite a number of emails and phone calls from the NTMA to the Department of Finance, the error was not fixed until October 2011. For some reason there was a failure in the Department to recognise the error, understand its significance, or take action on it.

The Secretary General of the Department of Finance blames middle management:

Secretary general of the department Kevin Cardiff said the error was made at “middle management” level and was never communicated up to a more senior level. He said the department was initiating an internal inquiry to examine the issue and would establish an external review to look at the systems and to put safeguards in place to ensure such mistakes were not repeated in the future.

Information quality professionals, of course, would consider looking at the SYSTEM, and part of that is the organisational culture in the Department which prevented a significant error in information from being acted upon.

Lessons to Learn:

There are a lot of lessons to learn from these stories. Among them:

  1. When bringing data together from different organisations, particularly when those organisations are being merged, it is important to ensure you review and standardise the “Information Product Specification” so that everyone knows what the standard terms, business rules, and meaning of data are in the NEW organisation and ACROSS organisational boundaries. Something as simple as knowing who has to put a value in the DEBIT column and where the corresponding CREDIT needs to be put should be clearly defined (a minimal sketch of such a cross-check appears after this list). Operational Definitions of critical concepts are essential.
  2. When errors are found, there need to be clear and open channels of communication that allow the errors to be logged, assessed, and acted on where they have a material or significant effect. Organisational cultures where internal politics or historic arrogance lead managers to assume that the issue isn’t there or isn’t their problem ultimately result in the issue becoming bigger and more difficult to deal with.
  3. Don’t shoot the messenger. Don’t blame the knowledge worker. But ensure that there are mechanisms by which people can take accountability and responsibility. And that starts at the top.
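To make lesson 1 concrete: below is a minimal sketch, in Python, of the kind of automated cross-check that an agreed Information Product Specification makes possible. The ledger layout and figures are illustrative assumptions, not the Department of Finance’s actual systems:

```python
from collections import defaultdict

def unbalanced_transactions(postings, tolerance=0.005):
    """Return transactions whose debits and credits do not balance.

    `postings` is an iterable of (txn_id, account, debit, credit) tuples --
    a hypothetical, simplified ledger layout.
    """
    totals = defaultdict(lambda: [0.0, 0.0])
    for txn_id, _account, debit, credit in postings:
        totals[txn_id][0] += debit
        totals[txn_id][1] += credit
    # Any transaction where total debits != total credits breaks the agreed
    # specification and needs investigation before the figures are published.
    return {t: (d, c) for t, (d, c) in totals.items() if abs(d - c) > tolerance}

# The second transaction books one side of a EUR 3.6bn inter-agency loan,
# but the corresponding entry on the other side is never recorded.
postings = [
    ("T1", "NTMA:loans_out", 3.6e9, 0.0),
    ("T1", "HFA:loans_in",   0.0,   3.6e9),
    ("T2", "NTMA:loans_out", 3.6e9, 0.0),   # credit side missing
]
print(unbalanced_transactions(postings))    # {'T2': (3600000000.0, 0.0)}
```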

Green Card, Red Faces

The United States Government is being sued in a massive class-action suit representing Green Card applicants from over 30 countries which alleges that the United States unfairly denied 22,000 people a Green Card due to a computer blunder.

This story is reported in the Irish Times and the Wall Street Journal.

It is not in the remit of this blog to debate the merits of awarding working visas on the basis of a random lottery, but this is precisely what the Green Card system is, offering places to 50,000 people each year based on a random selection of applications submitted over a 30-day period. According to the WSJ:

In early May, the State Department notified 22,000 people they were chosen. But soon after, it informed them the electronic draw would have to be held again because a computer glitch caused 90% of the winners to be selected from the first two days of applications instead of the entire 30-day registration period.

Many of these 22,000 people are qualified workers who had jobs lined up contingent on their getting the Green Card. The WSJ cites the example of a French neuropsychology PhD holder (who earned her PhD in the US) whose job offer was contingent on her Green Card.

The root causes that contributed to this problem are:

  1. The random sampling process did not pull records evenly from the entire 30-day period: the selection was weighted towards the first two days, with 90% of the “winners” drawn from those two days.
  2. There was no review of the sampling process and its outputs before the notifications were sent to the applicants and published by the State Department. It appears there was a time lag between the error being identified and the decision being taken to scrap the May Visa Lottery draw.

The first error looks like a possible case of a poorly designed sampling strategy in the software. The regulations governing the lottery draw require that there be a “fair and random sampling” of applicants. As 90% of the winners were drawn from the first two days, the implication is that the draw was not fair enough or was not random enough. At the risk of sounding a little clinical, however, fair and random do not always go hand in hand when it comes to statistical sampling.

If the sampling strategy was to pool all the applications into a single population (N) and then randomly pull 50,000 applicants (sample size n), then all applicants had a statistically equal chance of being selected. The fact that the sampling pulled records from the same date range is an interesting correlation or coincidence. Indeed, the date of application would be irrelevant to the sampling extraction, as everyone would be in one single population. Of course, that depends to a degree on the design of the software that created the underlying data set (were identifiers assigned randomly or sequentially before the selection/sampling process began, etc.).

This is more or less how your local state or national lottery works… a defined set of balls is pulled randomly, creating an identifier which is associated with a ticket you have bought (i.e. the numbers you have picked). You then have a certain statistical chance of a) having your identifier pulled and b) being the only person with that identifier in that draw (or else you have to share the winnings).

If the sampling strategy was to pull a random sample of 1666.6667 records from each of the 30 days, that is a different approach. Each person has the same chance as anyone else who applied on the same day, with each day having an equal chance of the same number of applicants being selected. Of course, it raises the question of what to do with the rounding difference carried through the 30 days (equating to 20 people) while still being fair and random (a mini-lottery perhaps).
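To make the contrast concrete, here is a minimal sketch of both sampling strategies in Python. The data structures, and the mini-lottery for the 20-place remainder, are illustrative assumptions rather than details of the State Department’s actual system:

```python
import random

def pooled_draw(applications, k=50_000):
    """Strategy 1: pool every application into one population (N) and draw
    k (sample size n) uniformly at random. Application date is irrelevant;
    every applicant has an equal chance."""
    return random.sample(applications, k)

def per_day_draw(applications_by_day, k=50_000):
    """Strategy 2: draw an equal share from each day, then fill the rounding
    remainder with a mini-lottery over everyone not yet drawn."""
    share, remainder = divmod(k, len(applications_by_day))  # 1666 per day, 20 left over
    winners = []
    for day_apps in applications_by_day.values():
        winners.extend(random.sample(day_apps, share))
    drawn = set(winners)
    leftovers = [a for day_apps in applications_by_day.values()
                 for a in day_apps if a not in drawn]
    winners.extend(random.sample(leftovers, remainder))
    return winners
```

Either way, a dry run against a synthetic 30-day population, tallying winners per application day, would have exposed a 90% concentration in two days before a single notification went out.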

Which raises the question: if the approach was the “random in a given day” sampling strategy, why was the software not tested before the draw to ensure that it was working correctly?

In relation to the time lag between publication of the results and the identification of the error, this suggests a broken or missing control process in the validation of the sampling to ensure that it conforms to the expected statistical model. Again, in such a critical process it would not be unreasonable to have extensive checks, but the checking should be done BEFORE the results are published.

Given the basis of the Class Action suit, expect to see some statistical debate in the evidence being put forward on both sides.

Gas by-products give a pain in the gut

Courtesy of Lwanga Yonke comes this great story about how the choice of unit of measure for reporting, particularly for regulatory reporting or Corporate Social Responsibility reports, can be very important.

The natural gas industry’s claim that it is making great strides in reducing the polluted wastewater it discharges to rivers is proving difficult to assess because of inconsistent reporting and a big data entry error in the system for tracking contaminated fluids.

The issue:

Back in February the natural gas industry in the US released statistics which appeared to show that it had managed to recycle at least 65% of the toxic waste brine that is a by-product of natural gas production. Unfortunately they had their data input a little bit askew, thanks to one company that had reported data back to the State of Pennsylvania using the wrong unit of measure – confusing barrels with gallons.

For those of us who aren’t into the minutiae of natural gas extraction, the Wall Street Journal helpfully points out that there are 42 gallons in a barrel. So, by reporting 5.2 million barrels of wastewater recycled instead of the 5.2 million gallons that were actually recycled, the helpful data entry error overstated the recycling success by a factor of 42.

Which is, coincidentally, the answer to Life, the Universe and Everything.
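Converting everything to a canonical unit on intake, plus a crude plausibility check, would have caught a 42x jump before publication. A minimal sketch, with assumed field names and thresholds rather than the Pennsylvania DEP’s actual validation rules:

```python
# Normalise every filing to one canonical unit before aggregation.
GALLONS_PER_UNIT = {"gallon": 1.0, "barrel": 42.0}

def to_gallons(quantity, unit):
    """Convert a reported quantity to gallons, rejecting unknown units
    rather than guessing."""
    try:
        return quantity * GALLONS_PER_UNIT[unit.lower()]
    except KeyError:
        raise ValueError(f"unrecognised unit of measure: {unit!r}")

def suspicious_jump(current, previous, factor=10.0):
    """Flag any figure more than `factor` times the prior period's figure
    for human review before publication."""
    return previous > 0 and current / previous > factor

reported = to_gallons(5_200_000, "barrel")  # keyed in as barrels...
actual = to_gallons(5_200_000, "gallon")    # ...but measured in gallons
print(reported / actual)                    # 42.0
print(suspicious_jump(reported, actual))    # True
```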

According to the Wall Street Journal, it may be impossible to accurately identify the rate of waste water recycling in the natural gas industry in the US.

Not counting Seneca’s bad numbers — and assuming that the rest of the state’s data is accurate — drillers reported that they generated about 5.4 million barrels of wastewater in the second half of 2010. Of that, DEP lists about 2.8 million barrels going to treatment plants that discharge into rivers and streams, about 460,000 barrels being sent to underground disposal wells, and about 2 million barrels being recycled or treated at plants with no river discharge.

That would suggest a recycling rate of around 38 percent, a number that stands in stark contrast to the 90 percent recycling rate claimed by some industry representatives. But Kathryn Klaber, president of the Marcellus Shale Coalition, an industry group, stood by the 90 percent figure this week after it was questioned by The Associated Press, The New York Times and other news organizations.

The WSJ article goes on to point out that there is a lack of clarity about what should actually be reported as recycled waste water, and issues with the tracking and reporting of discharges of waste water from gas extraction.

At least one company, Range Resources of Fort Worth, Texas, said it hadn’t been reporting much of its recycled wastewater at all, because it believed the DEP’s tracking system only covered water that the company sent out for treatment or disposal, not fluids it reused on the spot.

Another company that had boasted of a near 100 percent recycling rate, Cabot Oil & Gas, Houston-based, told The AP that the figure only included fluids that gush from a well once it is opened for production by a process known as hydraulic fracturing. Company spokesman George Stark said it didn’t include different types of wastewater unrelated to fracturing, like groundwater or rainwater contaminated during the drilling process by chemically tainted drilling muds.

So, a finger flub on data entry, combined with lack of agreement on meaning and usage of data in the industry, and gaps in regulation and enforcement of standards means that there is, as of now, no definitive right answer to the question “how much waste water is recycled from gas production in Pennsylvania?”.

What does your gut tell you?


Calculation errors cast doubt on TSA backscatter safety

It was reported in the past week on Wired.com and CNN that the TSA in the United States is to conduct extensive radiation safety tests on its recently introduced backscatter full body scanners (affectionately known as the “nudie scanner” in some quarters).

An internal review of the previous safety testing which had been done on the devices revealed a litany of

  • calculation errors,
  • missing data and
  • other discrepancies on paperwork

In short, Information Quality problems. A TSA spokesperson described the issues to CNN as being “record keeping errors”.

The errors affected approximately 25% of the scanners in operation, which Wired.com identifies as being from the same manufacturer, and included errors in the calculation of the radiation exposure that occurs when passing through the machine. The calculations were out by a factor of 10.

Wired.com interviewed a TSA spokesperson and they provided the following information:

Rapiscan technicians in the field are required to test radiation levels 10 times in a row, and divide by 10 to produce an average radiation measurement. Often, the testers failed to divide results by 10.

For their part, the manufacturer is redesigning the form used by technicians conducting tests to avoid the error in the future. Also, it appears from documentation linked to from the Wired.com story that the manufacturer spotted the risk of calculation error in December 2010.
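Redesigning the paper form is one fix; taking the manual arithmetic out of the process altogether is another. A minimal sketch of the latter, with illustrative field names rather than Rapiscan’s actual test procedure:

```python
from statistics import mean

def record_radiation_survey(readings, expected_count=10):
    """Capture the ten raw readings and compute the average in software,
    so the divide-by-10 step can never be forgotten on a form."""
    if len(readings) != expected_count:
        raise ValueError(f"expected {expected_count} readings, got {len(readings)}")
    return {"readings": readings, "average": mean(readings)}

survey = record_radiation_survey(
    [0.031, 0.029, 0.030, 0.032, 0.028, 0.031, 0.030, 0.029, 0.033, 0.030])
print(survey["average"])  # ~0.0303 -- the technician never divides anything
```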

Here at IQTrainwrecks.com we are not nuclear scientists or physicists or medical doctors (at least not at the moment) so we can’t comment on whether the factor of 10 error in the calculations is a matter for any real health concern.

But the potential health impacts of radiation exposure are often a source of concern for people. Given the public disquiet in the US and elsewhere about the privacy implications and other issues surrounding this technology any errors which cast doubt on the veracity and trustworthiness of the technology, its governance and management, and the data on which decisions to use it are based will create headlines and headaches.


The Wrong Arm of the (f)Law

Courtesy of Steve Tuck and Privacy International comes this great story from the UK of how a simple error, if left uncorrected, can result in significantly unwelcome outcomes. It is also a cautionary tale for those of us who might think that flagging a record as being “incorrect” or inaccurate might solve the problem… such flags are only as good as the policing that surrounds them.

Matthew Jillard lives on Repton Road in a suburb of Birmingham. In the past 18 months he has been raided over 40 times by the police. During Christmas week he was raided no fewer than 5 times, with some “visits” taking place at 3am and 5am, disturbing him, his family, his family’s guests, his neighbours, his neighbours’ guests….

According to Mr Jillard,

9 times out of 10 they are really apologetic.

Which suggests that 1 time out of 10 the visiting police might be annoyed at Mr Jillard for living at the wrong address(??)

The root cause: the police are confusing Mr Jillard’s address with a house around the corner on Repton Grove.

Complaints to the police force in question have been met with apologies and assurances that the police have had training on how important it is to get the address right for a search. Some officers have blamed their Sat Nav for leading them astray.

Given the cost to the police of mounting raids, getting it wrong 40 times will be putting a dent in their budget. The costs to the police of putting right any damage done to Mr Jillard’s home during the incorrect raids (which have included kicking in his door at 3am on Christmas Day) will also be mounting up.

The police have said that “measures” have been taken to prevent Mr Jillard’s home being raided, including putting a marker against his address on the police computer systems. None of these measures appear to have stopped the raids, which come at an average frequency of more than one a fortnight (40 raids in 18 months).
This Trainwreck highlights the impact of apparently simple errors in data:

  1. Mr Jillard’s home is being disturbed without cause on a frequent basis.
  2. His neighbours must be increasingly suspicious of him, what with the police calling around more often than the milkman.
  3. The police force is incurring costs and wasting manpower with a continuing cycle of fruitless raids.
  4. The real target of the raids is now probably aware of the fact that the police are looking for them and will have moved their activities away from Repton Grove.

Zero Entertainment in Norway

This story comes from our Norwegian Correspondent, Mr. Arnt-Erik Hansen (former IAIDQ Director of Member Services). We let him tell his tale in his own words, with only minor editing….

++++

Norway is the only country in the world, as far as I know, with full transparency with regard to personal income, wealth and state tax. One of the big autumnal entertainments in Norway is the annual publication of the list of taxpayers, their incomes, and the amount of tax they have paid. Here you can search for your neighbour, your friends, enemies, public companies or government officials to find what their income was, how much they contributed to the state and how wealthy they are. On October 20, 2010 the list was published on the internet.

Newspapers, TV, and blogs (including this one) find this a great event for generating stories. Every year there are stories about the person earning the least and the most. Not only that, they compare a person’s income to the average income in the country for their age. They will even go as far as determining the street in Norway with the lowest average income. This is business intelligence on real live data.

However, the Norwegian tax authorities are not immune to data quality problems. Here are some of the stories I read in the newspapers today (all online, of course [editor: links to the stories will be posted here soon]).

A Polish citizen, with an address in Poland and a modest Norwegian income, is reported to have paid more than 119'000'000 Norwegian Kroner (approx US$20.4 million, or €14.6 million) in taxes for 2009. As a consequence he can claim to be the biggest contributor to the state coffers. Thanks Poland, I am sure that someone will be there to collect the money soon.

This is of course wrong and the reason is, most likely, that someone punched in the wrong number – a number with too many zeros. We’ll call this “The Fat Finger Zero Error“.

Apart from errors in the data itself, there appear to be errors in the actual interpretation of the data. For example, a lady working as a cleaner got a tax claim of no less than 84'005'501 Norwegian Kroner (€10.3 million or US$14.4 million). Her income for 2009 was reported by one newspaper to be 324'000 (US$55,600, or €39,818), so this is an obvious error. Another newspaper reported her income to be 240'000.

A key question is: What has happened here?

The answer from the tax authorities when asked to comment on this was simply: errors happen, unfortunately, and will be corrected immediately. However, the reason was, most likely, that someone punched in the wrong number – a number with too many zeros. Well, it seems that someone has a problem with zeros.
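A simple ratio check on data load would catch most fat-finger zeros before publication. A minimal sketch, with an assumed record layout and a deliberately generous threshold:

```python
def implausible_tax_records(records, max_ratio=1.0):
    """Flag records where tax paid exceeds reported income -- a fat-finger
    extra zero (or three) shows up immediately."""
    return [r for r in records
            if r["income"] > 0 and r["tax_paid"] / r["income"] > max_ratio]

records = [
    {"name": "cleaner",       "income": 324_000, "tax_paid": 84_005_501},
    {"name": "typical filer", "income": 500_000, "tax_paid": 180_000},
]
for r in implausible_tax_records(records):
    print(r["name"], round(r["tax_paid"] / r["income"]))  # cleaner 259
```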

There is another story to come about the “Fat Finger Zero” error. But first we need to share some insight into how the tax reporting and collection process in Norway works.

It’s not too different to any other country, except that in Norway the State sends you your forms already filled in with the information the tax authorities have about you, and you simply have to sign them and send them back. And, like most tax authorities, they most likely know more about you than you do yourself.

For instance, banks in Norway send megabytes of data about all customers and their accounts to the tax authorities. Which leads us to our third IQTrainwreck example in this story…

This year two banks managed to put zeros behind the customer account balances instead of in front for 500 customers. So €00000500 (€500) became €50000000 (€50 million).

A possible root cause: the definition of the attributes in the file has gone astray.

But from two banks?

Or was the requirement wrong?
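Whatever the requirement said, the symptom is consistent with a fixed-width numeric field being padded on the wrong side. A minimal sketch of the suspected mechanics (the eight-character field width is an assumption based on the example above):

```python
def encode_balance(amount, width=8):
    """Correct encoding for a fixed-width numeric field: right-justify
    and pad with leading zeros, so 500 -> '00000500'."""
    return str(amount).rjust(width, "0")

def encode_balance_wrong(amount, width=8):
    """The suspected bug: left-justify and pad with trailing zeros,
    so 500 -> '50000000' -- a 100,000-fold inflation once parsed."""
    return str(amount).ljust(width, "0")

print(int(encode_balance(500)))        # 500
print(int(encode_balance_wrong(500)))  # 50000000
```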

It seems that Norway has a problem with zeros.

Oh, almost forgot to mention – today the Norwegian State Fund was valued above the 3’000 Billion mark for the first time. That’s 3’000’000’000’000 Norwegian Kroner (US$514,610,220,303).

I think I understand the problems with zeros. (But how much of that is due to tax collection errors? The sting in the tail for Norwegian taxpayers is that if there is an error in their tax calculations they have to pay the decided amount and then are refunded the amount of any error.)

PS! If you look up my name you will find income 0, wealth 0 and taxes 0. The reason: I lived in Switzerland in 2009 and in that country you are invited to pay taxes and it is not a criminal offence if you don’t tell the state everything.

Police Untelligence

From The Register comes this wonderful example of the problems that can arise where data is used for unintended purposes, resulting in poor quality outcomes for all involved.

The NYPD have been regularly raiding the home of an elderly Brooklyn couple. They’ve been hit 50 times over the past 4 years, which might mark them out as leading crime kingpins but for the fact that their address has wound up included in police data used to test notification systems. The Reg tags this as “a glitch in one of the department’s computers”, but Information Quality trainwreck observers will immediately recognise that the problem isn’t with the technology but with the Information.

The trainwreck is compounded by two facts which emerge in the article:

  1. NYPD believed that they had removed the couple’s address from the system back in 2007, but this appears not to have been the case (or perhaps it was restored from a backup)
  2. The solution the NYPD have now implemented is to put a flag on the couple’s address advising officers NOT to respond to calls to that address.

The latter “solution” echoes many of the pitfalls information quality professionals encounter on a daily basis where a “quick fix” is put in to address a specific symptom which then triggers (as el Reg puts it) “the law of unintended consequences”.  To cut through implication and suggestion, let’s pose the question – what happens if there is an actual incident at this couple’s home which requires a police response?

What might the alternative approaches or solutions be to this?

(And are the NYPD in discussions with the Slovak Border police about the perils of using live data or live subjects for testing?)

Did you check on the cheques we sent to County Jail?

Courtesy of Keith Underdown comes yet another classic IQ Trainwreck, which he came across on CBS News.

It seems that up to 3900 prisoners received cheques (or ‘checks’ to our North American readers) of US$250 each, despite the very low probability that they would be able to actually use them to stimulate the economy. Of the 3900, 2200 were, it seems, entitled to receive them as they had not been incarcerated in any one of the three months prior to the enactment of the Stimulus bill.

However, that still leaves 1700 prisoners who received cheques they should not have. The root cause?

According to CBS News:

…government records didn’t accurately show they were in prison

A classic information quality problem… accuracy of master data being used in a process resulting in an unexpected or undesired outcome.

While most prisons have intercepted and returned the cheques, there will now need to be a process to identify, for each prisoner, whether the Recovery payment was actually due. Again, a necessary manual check (no pun intended) at this stage, but one which will add to the cost and time involved in processing the Recovery cheques.

Of course, we’ve already written here about the problem with Stimulus cheques being sent to deceased people.

These cases highlight the fact that an Information Quality problem doesn’t have to massively impact your bottom line or affect significant numbers of people to have an impact on your reputation.

US Government Health (S)Care.

Courtesy of Jim Harris at the excellent OCDQBlog.com comes this classic example of a real-life Information Quality Trainwreck concerning US healthcare. Keith Underdown also sent us the link to the story on USAToday’s site.

It seems that 1800 US military veterans have recently been sent letters informing them that they have the degenerative neurological disease ALS (a condition similar to that which physicist Stephen Hawking has).

At least some of the letters, it turns out, were sent in error.

[From the LA Times]

As a result of the panic the letters caused, the agency plans to create a more rigorous screening process for its notification letters and is offering to reimburse veterans for medical expenses incurred as a result of the letters.

“That’s the least they can do,” said former Air Force reservist Gale Reid in Montgomery, Ala. She racked up more than $3,000 in bills for medical tests last week to get a second opinion. Her civilian doctor concluded she did not have ALS, also known as Lou Gehrig’s disease.

So, poor quality information entered a process, resulting in incorrect decisions, distressing communications, and additional costs to individuals and government agencies. Yes. This is ticking all the boxes to be an IQ Trainwreck.

The LA Times reports that the Department of Veterans Affairs estimates that 600 letters were sent to people who did not have ALS. That is a 33% error rate. The cause of the error? According to the USA Today story:

Jim Bunker, president of the National Gulf War Resource Center, said VA officials told him the letters dated Aug. 12 were the result of a computer coding error that mistakenly labeled the veterans with amyotrophic lateral sclerosis, or ALS.

Oh. A coding error on medical data. We have never seen that before on IQTrainwrecks.com in relation to private health insurer/HMO data. Gosh no.

Given the impact that a diagnosis of an illness which kills affected people within an average of 5 years can have on people, the simple coding error has been bumped up to a classic IQTrainwreck.

There are actually two information quality issues at play here, however, which illustrate one of the common problems in convincing people that there is an information quality problem in the first place. While the VA now ESTIMATES (and I put that in capitals for a reason) that the error rate was 600 out of 1800, the LA Times reporting tells us that:

… the VA has increased its estimate on the number of veterans who received the letters in error. Earlier this week, it refuted a Gulf War veterans group’s estimate of 1,200, saying the agency had been contacted by fewer than 10 veterans who had been wrongly notified.

So, the range of estimates for the error rate goes from 10 in 1800 (under 0.6%) to 600 in 1800 (33%) to 1200 in 1800 (66%). The interesting thing for me as an information quality practitioner is that the VA’s initial estimate was based on the number of people who had contacted the agency.

This is an important lesson… the number of reported errors (anecdotes) may be less than the number of actual errors, and the only real way to know is to examine the quality of the data and look for evidence of errors and inconsistency so you can Act on Fact.
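“Act on Fact” here would mean auditing a random sample of the affected records rather than waiting for complaints to arrive. A minimal sketch of the idea, where the verification step and sample size are illustrative assumptions:

```python
import math
import random

def estimate_error_rate(records, verify, sample_size=300, z=1.96):
    """Audit a random sample with a `verify` callback (e.g. a manual check
    against the underlying diagnosis records) and return the estimated
    error rate with an approximate 95% confidence interval."""
    sample = random.sample(records, min(sample_size, len(records)))
    errors = sum(1 for record in sample if not verify(record))
    p = errors / len(sample)
    margin = z * math.sqrt(p * (1 - p) / len(sample))
    return p, max(0.0, p - margin), min(1.0, p + margin)

# With only 1800 letters in question, auditing a few hundred would have
# bounded the true error rate far better than counting inbound phone calls.
```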

The positive news… the VA is changing its procedures. The bad news about that… it looks like they are investing money in inspecting defects out of the process rather than making sure the correct fact is correctly coded in patient records.