How do you audit a regulated data processing spreadsheet? Part 2: Scanning for Data Problems


Once I've got the basic idea of what I'm looking at and how it's been populated and controlled, it's time to look a the data.

The first thing that I do is a scan of the data. For smaller tables I'll zoom out and take a full view of the whole. For a large, table-shaped dataset like you find in many spreadsheets, I'll start with a Z-shaped scan of the data: across the columns in the top few rows of data, diagonally down to the bottom few rows, and then back across again.

This allows your eye to catch some obvious signs, for example:

  • Error Codes, Warnings (look for green flags and #CODE!)
  • Circular references
  • Blank spaces
  • Text amidst a field of numbers
  • Are results the expected/reasonable order of magnitude?
  • Sudden or recurrent changes in units
  • Inconsistent number formats
  • Unexpected changes in formatting that could indicate a cut/paste problem
  • Something that breaks the expected shape of the data - for example one row that has an extra number
  • Any other unusual patterns or inconsistencies in the data that come to the eye

Some things that often show up are weirdly repeating numbers, dates and times that make no sense or seem to be recorded in the wrong order, or patterns that emerge based on who did the work or when - e.g. one person's numbers are consistently lower than everyone else's.

Depending on the scope of the data I like to keep track of the things I see during these scans rather then investigating them right away.

Also, if I'm not the "Quality Control" step (and I will post some other time on why I think QA should not be the QC step), if I see too many obvious problems off the bat I'll send it back to the owner with a message that someone needs to do some QC before it gets any further review or audit.

Now that I've got my list of things to check out, I'll look a little closer, with the source data at hand:

  • Is the dataset complete? Do we have the expected number of inputs / subject / groups?
  • Do the number of results match what you would expect?
  • What are the warnings and errors about? Are they benign or expected?
  • Are any blanks valid? Are there error codes for valid data and what does that signify?
  • Check a portion (usually 10-15%) of input data against the source, including both random points and those suspicious data points.
  • I'll also look at anything that stood out with the functionality, using the Show Formulas (Ctrl+`) and Trace precedents/dependents (Ctrl+[ and Ctrl+] to navigate) to help me.

Finally I'll look at any charts, graphs or formatted results tables and do a sanity check - number of points, order of magnitude, check that the corners and a few in the middle match, that kind of thing.

Until next time, thanks for reading!

– Brendan

p.s. Enjoy this message? Read more at the Hyland Quality Systems website.

The HaiQu Newsletter

I'm Brendan Hyland. I help regulated facilities transform their software, spreadsheets, workflows and documents from time-consuming, deviation-invoking, regulatory burdens, to the competitive advantage they were meant to be. Join me every week as we take a few minutes to explore, design, test and improve the critical systems we use in our facilities.

Read more from The HaiQu Newsletter

I’ve seen this pattern repeatedly: Instrument data is saved as an Excel or CSV. Data is then copied to a bare-bones spreadsheet with several columns of calculations or transformations. Results from the spreadsheet’s calculations are copied for use further down the data analysis pipeline. And I, the auditor, get handed a signed and dated pdf of the worksheet. Was this email forwarded to you? Subscribe Here! Why yes, M. Inspector, this spreadsheet was validated! Ok, they are rarely this bad - I...

The Spreadsheet Risk Reduction Guide

Calling quality people everywhere! I've just released something I think you'll find useful. Over the past 20 years working in regulated environments, I've seen the same spreadsheet problems show up again and again: A quick Excel tool gets built to solve an immediate need. It works. Gets "validated" with a few hand calculations. Then gets reused and modified for different datasets or slightly different purposes. Eventually - sometimes months or years later - someone discovers an error. Or an...

I'm thrilled to announce that I'll be presenting at the CCSQA/NERCSQA Joint Annual Meeting in October! The conference is 2 days, Thursday & Friday, 16-17 October 2025, in Laval, Quebec, Canada. On-site and virtual attendance available. Here's the announcement link: https://sqa.org/CCSQA/CCSQA/Events/Upcoming_Events.aspx My session is on Friday morning, and titled "The power of a specification: Freeing your creative self to go beyond compliance." Here's the abstract: As busy quality...