How do you audit a regulated data processing spreadsheet? Part 2: Scanning for Data Problems


Once I've got the basic idea of what I'm looking at and how it's been populated and controlled, it's time to look at the data.

The first thing that I do is a scan of the data. For smaller tables I'll zoom out and take a full view of the whole. For a large, table-shaped dataset like you find in many spreadsheets, I'll start with a Z-shaped scan of the data: across the columns in the top few rows of data, diagonally down to the bottom few rows, and then back across again.

This allows your eye to catch some obvious signs, for example:

  • Error codes and warnings (look for the green flags in cell corners and for codes such as #REF!, #VALUE! or #DIV/0!)
  • Circular references
  • Blank spaces
  • Text amidst a field of numbers
  • Are results the expected/reasonable order of magnitude?
  • Sudden or recurrent changes in units
  • Inconsistent number formats
  • Unexpected changes in formatting that could indicate a cut/paste problem
  • Something that breaks the expected shape of the data - for example one row that has an extra number
  • Any other unusual patterns or inconsistencies in the data that come to the eye
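
Part of this first pass can be backed up with a script if the sheet is exported to CSV. Below is a rough Python sketch that flags error codes, blanks, text in mostly numeric columns, and rows that break the shape of the table. The function name scan_table, the 80% "mostly numeric" threshold and the simple number pattern are illustrative assumptions, not a standard tool, and a script like this supplements the visual scan rather than replacing it.

```python
import re

# Excel's built-in error values; extend the set if your tool uses others.
ERROR_CODES = {"#DIV/0!", "#N/A", "#NAME?", "#NULL!", "#NUM!", "#REF!", "#VALUE!"}
# Deliberately simple: plain integers and decimals only (assumption).
NUMBER = re.compile(r"^-?\d+(\.\d+)?$")


def scan_table(rows):
    """Return (row, column, problem) tuples for a header row plus data rows.

    `rows` is a list of lists of cell text, e.g. from csv.reader.
    Row and column numbers are 1-based, counting the header as row 1.
    """
    header, data = rows[0], rows[1:]
    findings = []

    # Decide which columns are "mostly numeric" so stray text stands out.
    # Blanks and error codes are ignored when making that call.
    numeric_cols = set()
    for c in range(len(header)):
        cells = [r[c].strip() for r in data if c < len(r)]
        cells = [v for v in cells if v and v not in ERROR_CODES]
        hits = sum(1 for v in cells if NUMBER.match(v))
        if cells and hits / len(cells) >= 0.8:
            numeric_cols.add(c)

    for r_idx, row in enumerate(data, start=2):
        if len(row) != len(header):
            findings.append((r_idx, None, "row length differs from header"))
        for c, value in enumerate(row):
            text = value.strip()
            if text in ERROR_CODES:
                findings.append((r_idx, c + 1, "error code " + text))
            elif text == "":
                findings.append((r_idx, c + 1, "blank"))
            elif c in numeric_cols and not NUMBER.match(text):
                findings.append((r_idx, c + 1, "text in numeric column"))
    return findings
```

Feeding it list(csv.reader(f)) from an exported sheet gives a list of findings to investigate. It says nothing about whether a blank or an error code is valid, only where they are.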

Some things that often show up are weirdly repeating numbers, dates and times that make no sense or seem to be recorded in the wrong order, or patterns that emerge based on who did the work or when - e.g. one person's numbers are consistently lower than everyone else's.
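
These patterns are also easy to check in a few lines of Python once the data is out of the spreadsheet. The sketch below assumes ISO-style timestamps (for example 2024-03-01T09:15) and (operator, result) pairs. The function names and the repeat threshold of three are hypothetical choices for illustration.

```python
from collections import Counter, defaultdict
from datetime import datetime
from statistics import mean


def out_of_order(timestamps):
    """Return indexes where a timestamp is earlier than the one before it."""
    parsed = [datetime.fromisoformat(t) for t in timestamps]
    return [i for i in range(1, len(parsed)) if parsed[i] < parsed[i - 1]]


def repeated_values(values, min_count=3):
    """Return {value: count} for values that occur at least `min_count` times."""
    counts = Counter(values)
    return {v: n for v, n in counts.items() if n >= min_count}


def mean_by_operator(records):
    """Return {operator: mean result} from (operator, result) pairs."""
    groups = defaultdict(list)
    for operator, result in records:
        groups[operator].append(result)
    return {op: mean(vals) for op, vals in groups.items()}
```

None of these prove a problem on their own. A repeated value or a low operator average is only a prompt to go and look at the source records.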

Depending on the scope of the data, I like to keep track of the things I see during these scans rather than investigating them right away.

Also, if I'm not the "Quality Control" step (and I will post some other time on why I think QA should not be the QC step) and I see too many obvious problems right off the bat, I'll send the spreadsheet back to its owner with a message that someone needs to do some QC before it gets any further review or audit.

Now that I've got my list of things to check out, I'll look a little closer, with the source data at hand:

  • Is the dataset complete? Do we have the expected number of inputs / subjects / groups?
  • Does the number of results match what you would expect?
  • What are the warnings and errors about? Are they benign or expected?
  • Are any blanks valid? Are there error codes for valid data and what does that signify?
  • Check a portion (usually 10-15%) of input data against the source, including both random points and those suspicious data points.
  • I'll also look at anything that stood out in the functionality, using Show Formulas (Ctrl+`) and Trace Precedents/Dependents (Ctrl+[ and Ctrl+] to navigate) to help me.
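
Picking the rows to verify against the source can be sketched in Python as follows. This is a minimal example, assuming rows are identified by a 0-based index. The function name pick_verification_sample, the 10% default and the min_random parameter are illustrative assumptions. Every flagged row is included, and random rows are added until the target fraction is reached, with at least min_random random rows even when the flagged rows alone already meet the target.

```python
import math
import random


def pick_verification_sample(n_rows, flagged, fraction=0.10, min_random=1, seed=None):
    """Return sorted row indexes to check against the source.

    All flagged rows are included, then random rows are added until the
    sample covers at least `fraction` of the dataset.
    """
    rng = random.Random(seed)  # a fixed seed makes the selection reproducible
    chosen = {i for i in flagged if 0 <= i < n_rows}
    target = math.ceil(n_rows * fraction)
    remaining = [i for i in range(n_rows) if i not in chosen]
    extra = max(min_random, target - len(chosen))
    chosen.update(rng.sample(remaining, min(extra, len(remaining))))
    return sorted(chosen)
```

Recording the seed alongside the audit notes lets someone else reproduce exactly which rows were checked.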

Finally, I'll look at any charts, graphs or formatted results tables and do a sanity check: number of points, order of magnitude, whether the corners and a few values in the middle match the underlying data, that kind of thing.
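
For a formatted results table, the corners-and-middle check can be sketched like this in Python. It assumes both tables are rectangular lists of rows and that values are compared exactly, so rounded display values would need a tolerance instead. The name spot_check and the default of three middle cells are hypothetical.

```python
import random


def spot_check(source, reported, n_middle=3, seed=None):
    """Compare two equally shaped tables (lists of rows) at a few cells.

    Returns a list of problems; an empty list means the spot check passed.
    """
    if len(source) != len(reported) or any(
        len(s) != len(r) for s, r in zip(source, reported)
    ):
        return ["shape differs"]
    if not source or not source[0]:
        return []
    last_r, last_c = len(source) - 1, len(source[0]) - 1
    # Always check the four corners (0-based row and column indexes).
    cells = {(0, 0), (0, last_c), (last_r, 0), (last_r, last_c)}
    # Add a few randomly chosen interior cells.
    rng = random.Random(seed)
    interior = [(r, c) for r in range(1, last_r) for c in range(1, last_c)]
    cells.update(rng.sample(interior, min(n_middle, len(interior))))
    return [
        f"mismatch at row {r}, column {c}"
        for r, c in sorted(cells)
        if source[r][c] != reported[r][c]
    ]
```

A clean result only means the sampled cells agree. It is a sanity check, not a full reconciliation.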

Until next time, thanks for reading!

– Brendan

p.s. Enjoy this message? Read more at the Hyland Quality Systems website.

The Daily HaiQu

I'm Brendan Hyland. I help regulated facilities transform their software, spreadsheets, workflows and documents from time-consuming, deviation-invoking, regulatory burdens, to the competitive advantage they were meant to be. Join me every week as we take a few minutes to explore, design, test and improve the critical systems we use in our facilities.
