How do you audit a regulated data processing spreadsheet? Part 2: Scanning for Data Problems


Once I've got the basic idea of what I'm looking at and how it's been populated and controlled, it's time to look a the data.

The first thing that I do is a scan of the data. For smaller tables I'll zoom out and take a full view of the whole. For a large, table-shaped dataset like you find in many spreadsheets, I'll start with a Z-shaped scan of the data: across the columns in the top few rows of data, diagonally down to the bottom few rows, and then back across again.

This allows your eye to catch some obvious signs, for example:

  • Error Codes, Warnings (look for green flags and #CODE!)
  • Circular references
  • Blank spaces
  • Text amidst a field of numbers
  • Are results the expected/reasonable order of magnitude?
  • Sudden or recurrent changes in units
  • Inconsistent number formats
  • Unexpected changes in formatting that could indicate a cut/paste problem
  • Something that breaks the expected shape of the data - for example one row that has an extra number
  • Any other unusual patterns or inconsistencies in the data that come to the eye

Some things that often show up are weirdly repeating numbers, dates and times that make no sense or seem to be recorded in the wrong order, or patterns that emerge based on who did the work or when - e.g. one person's numbers are consistently lower than everyone else's.

Depending on the scope of the data I like to keep track of the things I see during these scans rather then investigating them right away.

Also, if I'm not the "Quality Control" step (and I will post some other time on why I think QA should not be the QC step), if I see too many obvious problems off the bat I'll send it back to the owner with a message that someone needs to do some QC before it gets any further review or audit.

Now that I've got my list of things to check out, I'll look a little closer, with the source data at hand:

  • Is the dataset complete? Do we have the expected number of inputs / subject / groups?
  • Do the number of results match what you would expect?
  • What are the warnings and errors about? Are they benign or expected?
  • Are any blanks valid? Are there error codes for valid data and what does that signify?
  • Check a portion (usually 10-15%) of input data against the source, including both random points and those suspicious data points.
  • I'll also look at anything that stood out with the functionality, using the Show Formulas (Ctrl+`) and Trace precedents/dependents (Ctrl+[ and Ctrl+] to navigate) to help me.

Finally I'll look at any charts, graphs or formatted results tables and do a sanity check - number of points, order of magnitude, check that the corners and a few in the middle match, that kind of thing.

Until next time, thanks for reading!

– Brendan

p.s. Enjoy this message? Read more at the Hyland Quality Systems website.

The Daily HaiQu

I'm Brendan Hyland. I help regulated facilities transform their software, spreadsheets, workflows and documents from time-consuming, deviation-invoking, regulatory burdens, to the competitive advantage they were meant to be. Join me every week as we take a few minutes to explore, design, test and improve the critical systems we use in our facilities.

Read more from The Daily HaiQu

It’s the first step of the problem solving framework that I was taught back in Engineering school. Not ‘Plan’. Not “Define”. “I want to and I can”. That particular framework - the McMaster Six Step - never gained the popularity of the ones now used today, but in the end they all contain the same basic elements - research, planning & design, implementation, evaluation and iteration - just stated in different ways. However I’ve never really seen this particular element called out explicitly...

Black music production equipment with headphones on desk

My eight year old son figured out a hack to make the music service work better for him. The kids have a Google smart speaker that is attached to a Spotify account so they can just ask for any of their favourite music. Anyone who has pre-teens in the house probably knows how much such a setup is used - all day every day. Coming from someone who had to run to the double-cassette boom box to press the record button any time a new favourite song came on the radio just so I could listen to it...

A reporter interviews a smiling man holding a book.

I’ve seen several quality leaders complain this week about their disappointment with generative AI - they’re not getting the results they expected. And I understand why - context is king! If you just ask AI to write a procedure or generate a quality document, you’ll get generic, mediocre output. Without enough context, AI can only produce something generic based on its training data. But how do you give it that context? By the time you’ve gone back and forth trying to “engineer the prompt” to...