Orderless in Seattle: Software "glitch" shuts down Swedish Medical Center's medical-records system

A commenter noted yesterday that after 25 years in practice, they had had one lost [paper] chart (as opposed to an IT system crash, in which every chart is lost temporarily).

As coincidence would have it, there's this story in the news:

Software glitch shuts down Swedish medical-records system
Tuesday, January 25, 2011
By Carol M. Ostrom
Seattle Times health reporter

A four-hour shutdown of Swedish Medical Center's centralized electronic medical-records system Monday morning was caused by a glitch in another company's software, said Swedish chief information officer Janice Newell.

There's that word "glitch" again, the one I see so frequently in the health IT sector whenever a system suffers a major crash that could harm patients. Why do we not call it a "glitch" when a doctor amputates the wrong body part, or kills someone?

The system, made by Epic Systems, a Wisconsin-based electronic medical-records vendor, turned itself off because it noticed an error in the add-on software, Newell said, and Swedish was forced to go to its highest level of backup operation.

Turned itself off? Back we go to the old Unix adage that "either you're in control of your information system, or it's in control of you."
To prove that point, note that "the highest level of backup operation" had a bit of a problem:

That allowed medical providers to see patient records but not to add or change information, such as medication orders.

I'm sure sick and unstable patients, such as those in the ICUs, as well as their physicians and nurses, appreciated this minor "glitch." Look, Ma, no orders!
(Do events like this ever happen in the middle of a Joint Commission inspection?)
The "glitch" didn't just affect a few charts:

The outage affected all of Swedish's campuses, including First Hill, Cherry Hill, Ballard and its Issaquah emergency facility, as well as Swedish's clinics and affiliated groups such as the Polyclinic.

I cannot imagine a paper-based "glitch" that could affect so many, so suddenly, other than a wide-scale catastrophe.

During the outage, new information was put on paper records [that 5,000-year-old, obsolete papyrus-based technology that's simply ruining healthcare, according to the IT pundits - ed.] and transferred into patient records in the Epic system after the system went back up in the afternoon. [By whom? Busy doctors? - ed.] Epic, Newell said, is "really good at fail-safe activity," and if it detects something awry that could corrupt data, it shuts itself off, which it did Monday at about 10 a.m.

That means interfaced systems need to undergo the highest levels of scrutiny in real-world use if they can, in effect, shut down an entire enterprise clinical system.
I note that the "other company's software" that brought the whole system to a grinding halt was not identified, nor was the nature of the "glitch" itself. Was the problem truly caused by "another vendor" via a bug in its product, or by a faulty upgrade, or by an internal staff error related to that vendor's software?
It seems we now have yet another defense for HIT "glitches" other than "blame the users": it's not OUR fault; blame the other vendors.

Newell said the shutdown likely affected about 600 providers, 2,500 staffers and perhaps up to 2,000 patients, but no safety problems were reported.

As I've noted on this blog before, it is peculiar how such "glitches" never seem to produce safety problems, or even acknowledgments of increased risk.

Staff members were notified of the shutdown via error messages, e-mails, intranet, a hospital overhead paging system and personal pagers.

"Warning! Warning! EHR and CPOE down! Grab your pencils!" Just what busy doctors and nurses want to hear when they arrive for a harrowing day of patient care.
I wonder if the alert was expressed in a manner not understandable to patients, e.g., "Code 1100011" (99 in binary!) or something similar, as in a medical emergency.

Newell said she was "99.9 percent sure" other hospitals have had similar shutdowns [that's certainly reassuring about health IT - ed.], because software, hardware and even power systems are not perfect. [That's why we have resilience engineering, redundancy, etc. - ed.]
"Anybody who hasn't had this happen has not been up on an electronic medical record very long," Newell said. "I would bet a year's pay on that."

One logical fallacy used to justify an action or situation is the appeal to common practice. Is what I am seeing here what might be called an appeal to common malpractice?
Or is the fallacy simply a manifestation of the adage "misery loves company"?

Newell said this is not the first shutdown of Epic, which was fully installed in Swedish facilities in 2009 after a nearly two-year process. But it was the longest-running one, she acknowledged.


Swedish is exploring creating "more sophisticated levels of backup" with other hospitals, Newell said, locating a giant server in a different geographic area to protect against various disasters such as earthquakes or floods.

Maybe they should have done that after the aforementioned other "glitches."

I repeat the adage:

"Either you're in control of your information system, or it's in control of you."

Indeed, if the information system is mission-critical, and you cannot control it, you literally have no business disrupting clinicians en masse and putting patients at risk by letting it control you.

Finally, on the topic of 'cybernetic extremophiles', I note that we have several Mars Rovers and very distant space probes such as Voyager 1 whose onboard computers (in the case of Voyager, built long ago with much less advanced technology than today's IT) have been working flawlessly in environments far more hostile than a hospital data center, and long beyond their stated life expectancies.

The Voyager 1 spacecraft is a 722-kilogram (1,592 lb) robotic space probe launched by NASA on September 5, 1977 to study the outer Solar System and eventually interstellar space. Operating for 33 years, 4 months, and 22 days, the spacecraft receives routine commands and transmits data back to the Deep Space Network. Currently in extended mission, the spacecraft is tasked with locating and studying the boundaries of the Solar System, including the Kuiper belt, the heliosphere and interstellar space. The primary mission ended November 20, 1980, after encountering the Jovian system in 1979 and the Saturnian system in 1980.[2] It was the first probe to provide detailed images of the two largest planets and their moons.

As of January 23, 2011, Voyager 1 was about 115.963 AU (17.242 billion km, or 10.8 billion miles) or about 0.00183 of a light-year from the Sun. Radio signals traveling at the speed of light between Voyager 1 and Earth take more than 16 hours to cross the distance between the two.
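
For the curious, the light-time and light-year figures in that passage are easy to sanity-check. The sketch below is only a rough check under stated assumptions: it uses the standard values for the astronomical unit and the speed of light, a Julian year for the light-year, and it treats the quoted distance from the Sun as if it were the distance from Earth (the two differ by at most about 1 AU, or roughly eight light-minutes). The variable names are mine, purely for illustration.

```python
# Rough sanity check of the quoted Voyager 1 figures (distance as of January 23, 2011).
# Assumption: the Sun-to-probe distance is used as a stand-in for the Earth-to-probe distance.

AU_IN_METERS = 149_597_870_700             # astronomical unit (IAU definition)
SPEED_OF_LIGHT_M_S = 299_792_458           # exact, by definition of the meter
SECONDS_PER_JULIAN_YEAR = 365.25 * 86_400  # year length used to define the light-year

distance_au = 115.963                      # figure quoted in the passage above

distance_m = distance_au * AU_IN_METERS
light_time_hours = distance_m / SPEED_OF_LIGHT_M_S / 3600
light_year_m = SPEED_OF_LIGHT_M_S * SECONDS_PER_JULIAN_YEAR
distance_ly = distance_m / light_year_m

print(f"light travel time: {light_time_hours:.2f} hours")   # 16.07 hours
print(f"distance: {distance_ly:.5f} light-years")           # 0.00183 light-years
```

Both results agree with the quoted "more than 16 hours" and "about 0.00183 of a light-year."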


While these are exceptional examples of resilience in IT systems far less complex than hospital IT, I believe healthcare can do better in terms of computer "glitches" affecting mission-critical systems that are a bit closer than 10 billion miles away.

-- SS
