cancel
Showing results for 
Search instead for 
Did you mean: 
intapiuser
Community Team Member
Community Team Member
Data is a beautiful mess.
As data heroes, we spot the mess and get rid of it to highlight the beauty.
Here are a couple useful tricks to detect incorrect data.

We will assume a table that describes the number of meows made by each cat in each box, on a given day.
why analytics? because cats

  1. NOTE THE DETECTED DATA TYPES

    If a field was detected as a different type, its values may not follow the right format

  1. EXPECT THE UNEXPECTED: NULLS, NUMERIC ANOMALIES, TEXT TYPOS

    See which text values are contained in your categorial fields:

    SELECT DISTINCT a.cat
    FROM table a

    Find numeric anomalies by detecting range (or simply visualize the field): 

    SELECT
        CASE WHEN a.meows < 0 THEN "under 0"
        WHEN a.meows >= 0 AND a.meows <= 10 THEN " 0 - 10"
        WHEN a.meows > 10 AND a.meows <= 50 THEN " 11 - 50"
        ELSE "over 50"
        END meows_range,
    COUNT(*) AS instances
    FROM table a
    GROUP BY 1


  2. EXPECT THE UNEXPECTED UNDER CERTAIN CONDITIONS

    Repeat the previous bullet, this time under certain business logic
    Here we expect Jingle the cat to meow at least once every time he snuggles in any box, therefore we will check Jingle's specific range

    SELECT
        CASE WHEN a.meows < 0 THEN "under 0"
        WHEN a.meows >= 0 AND a.meows <= 10 THEN " 0 - 10"
        WHEN a.meows > 10 AND a.meows <= 50 THEN " 11 - 50"
        ELSE "over 50"
        END meow_range,
    COUNT(*) AS instances
    FROM table a
    WHERE a.cat = 'Jingle'
    GROUP BY 1


  3. VALIDATE WITH REALITY

    Sample some specific and aggregate data points and check with external application
    Why are there entries of Mia the cat given she spent the day at the vet?
  4. FIND THOSE DUPLICATES

    Make sure that you know what combination of fields make a unique identifier in your table
    SELECT a.cat, a.box, COUNT(*) AS duplicates
    FROM table a
    GROUP BY a.cat, a.box
     

Rate this article:
Version history
Last update:
‎03-02-2023 08:33 AM
Updated by:
Contributors