How do you audit a regulated data processing spreadsheet? Part 2: Scanning for Data Problems

Once I've got the basic idea of what I'm looking at and how it's been populated and controlled, it's time to look a the data.

The first thing that I do is a scan of the data. For smaller tables I'll zoom out and take a full view of the whole. For a large, table-shaped dataset like you find in many spreadsheets, I'll start with a Z-shaped scan of the data: across the columns in the top few rows of data, diagonally down to the bottom few rows, and then back across again.

This allows your eye to catch some obvious signs, for example:

Error Codes, Warnings (look for green flags and #CODE!)
Circular references
Blank spaces
Text amidst a field of numbers
Are results the expected/reasonable order of magnitude?
Sudden or recurrent changes in units
Inconsistent number formats
Unexpected changes in formatting that could indicate a cut/paste problem
Something that breaks the expected shape of the data - for example one row that has an extra number
Any other unusual patterns or inconsistencies in the data that come to the eye

Some things that often show up are weirdly repeating numbers, dates and times that make no sense or seem to be recorded in the wrong order, or patterns that emerge based on who did the work or when - e.g. one person's numbers are consistently lower than everyone else's.

Depending on the scope of the data I like to keep track of the things I see during these scans rather then investigating them right away.

Also, if I'm not the "Quality Control" step (and I will post some other time on why I think QA should not be the QC step), if I see too many obvious problems off the bat I'll send it back to the owner with a message that someone needs to do some QC before it gets any further review or audit.

Now that I've got my list of things to check out, I'll look a little closer, with the source data at hand:

Is the dataset complete? Do we have the expected number of inputs / subject / groups?
Do the number of results match what you would expect?
What are the warnings and errors about? Are they benign or expected?
Are any blanks valid? Are there error codes for valid data and what does that signify?
Check a portion (usually 10-15%) of input data against the source, including both random points and those suspicious data points.
I'll also look at anything that stood out with the functionality, using the Show Formulas (Ctrl+`) and Trace precedents/dependents (Ctrl+[ and Ctrl+] to navigate) to help me.

Finally I'll look at any charts, graphs or formatted results tables and do a sanity check - number of points, order of magnitude, check that the corners and a few in the middle match, that kind of thing.

Until next time, thanks for reading!

– Brendan

p.s. Enjoy this message? Read more at the Hyland Quality Systems website.

The Daily HaiQu

How do you audit a regulated data processing spreadsheet? Part 2: Scanning for Data Problems

One Quick Thing that Will Greatly Improve How You Work with AI

SQA 2025: Is it the right tool for the job?

How do you prove you're you when signing a document?