If you torture the data long enough, it will confess to anything: How to avoid misrepresentation

Within a company you have access to an enormous amount of data, and communicating accurate insights from it is challenging. This blog covers the mental biases and common mistakes that people make when analyzing data, namely Confirmation Bias, Selection Bias, and Survivorship Bias.

Understanding Confirmation Bias

Confirmation bias is the inclination to seek out or interpret information that aligns with preexisting beliefs while disregarding contradictory evidence. In a business setting, this often means ignoring indications that a feature, product, or other business element isn't functioning well because attention is fixed on a single favorable metric.

This phenomenon leads individuals to downplay negative signals while emphasizing positive ones.

Identifying Confirmation Bias

Several indicators can signal the presence of confirmation bias in the analysis of data:

  • Selective Positivity: If exclusively positive news is highlighted, confirmation bias is likely obscuring less favorable metrics (a quick smell-test sketch follows this list).
  • Limited Metrics: Presenting only a handful of metrics could indicate an attempt to simplify, yet it might also be concealing contrary data.
  • Complex Proxies: When intricate proxy metrics are touted with confidence, there might be an effort to extract positive signals from the data due to the absence of direct measurements.
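As a quick illustration of the first sign, here is a minimal smell-test sketch in Python. The report contents and the all-positive rule are made up; a real review would ask which metrics were omitted, not just check the signs of those shown.

```python
# Hypothetical metrics report: metric name -> week-over-week change.
report = {"signups": 0.08, "retention": 0.02, "activation": 0.05}

# Crude smell test: if every reported metric moved in the right
# direction, ask which metrics were left out of the report.
if all(delta > 0 for delta in report.values()):
    print("All reported metrics improved; check for omitted metrics.")
```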

Counteracting Confirmation Bias

To approach confirmation bias systematically, consider these strategies:

1. Academic Approach:

Establish a peer review system where a fellow analyst scrutinizes your analysis before the presentation. This ensures diverse perspectives and helps validate conclusions. Additionally, replicating tests within the company, if resources permit, can corroborate findings.

2. Third-Party Audit:

Leverage data analysts to randomly assess shared analyses. If discrepancies or misleading visuals emerge, engage in a collaborative review with the creator. Once corrections are made, distribute updates to affected stakeholders.

3. Insurance Methodology:

Define clear parameters for data claims, for instance by restricting certain causal phrases. Claims that go beyond the established boundaries aren't approved for sharing, in line with an "insurance" mindset within the company (a claim-linting sketch follows these strategies).

4. Cultural Approach:

Highlight the company's mission as paramount. Emphasize that the mission's importance surpasses personal satisfaction with product or feature success. Acknowledge that mistakes are inevitable, and they serve as stepping stones for better decision-making in the future.
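To make the insurance methodology concrete, here is a minimal claim-linting sketch in Python. The restricted phrase list and the `lint_claims` helper are hypothetical; an actual policy would define its own phrases and review workflow.

```python
import re

# Hypothetical causal phrases that trigger review before an analysis
# may be shared; a real policy would define its own list.
RESTRICTED_PHRASES = ["caused", "causes", "proves", "guarantees"]

def lint_claims(summary: str) -> list[str]:
    """Return any restricted phrases found in an analysis summary."""
    return [p for p in RESTRICTED_PHRASES
            if re.search(r"\b" + re.escape(p) + r"\b", summary, re.IGNORECASE)]

claim = "The redesign caused a 12% lift in retention."
flagged = lint_claims(claim)
if flagged:
    print(f"Hold for review; restricted causal phrases: {flagged}")
```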

By integrating these strategies, businesses can mitigate the influence of confirmation bias, fostering a more objective and holistic analysis of data.

Understanding Selection Bias

Selection bias arises when the group chosen for analysis does not accurately represent the broader population under study.

Acquiring an Unrepresentative Group Unintentionally

1. Convenience:
This occurs when a group is chosen for analysis due to ease of measurement, leading to a non-representative sample.

Example:
When evaluating the value of a new feature, asking only three friends for their opinion may not reflect the broader customer base (see the sampling sketch after the list below).

Common Convenient Biases:

    • Engaged customers
    • Latest user cohort
    • Specific geographic region
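As a minimal sketch of the difference, the snippet below contrasts a convenience sample of engaged customers with a simple random sample drawn from the whole base. The customer records and the 20% engagement rate are invented for illustration.

```python
import random

# Hypothetical customer base: (customer_id, is_engaged); ~20% engaged.
customers = [(i, i % 5 == 0) for i in range(1, 1001)]

# Convenience sample: the first 50 engaged customers (biased).
convenience = [c for c in customers if c[1]][:50]

# Simple random sample across the whole base (representative in expectation).
random.seed(42)  # fixed seed so the illustration is reproducible
representative = random.sample(customers, 50)

print(f"Engaged share, convenience sample: "
      f"{sum(e for _, e in convenience) / len(convenience):.0%}")
print(f"Engaged share, random sample: "
      f"{sum(e for _, e in representative) / len(representative):.0%}")
```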

2. Self-Selection:
This happens when individuals who voluntarily participate in analysis possess traits that differ from the entire population.

Example:
Distributing a product satisfaction survey tends to attract respondents who hold unusually strong opinions or have ample time on their hands, skewing results.

Common Biased Self-Selectors:

    • Extremely negative individuals
    • Overly positive individuals
    • Early adopters
    • Power users

Selection Bias in Business Context

Suppose you’re launching a premium feature in your BI Tool. If you invite only the most active users to try it, their motivations might cloud the analysis:

  • They might test all new features, irrespective of value.
  • They might seek contact with your company.
  • They might aim to suggest new features.

Although well-intentioned, their feedback might mislead you.

Remedies:

  • Actively engage a representative sample for testing.
  • Employ qualifying questions to select a balanced sample.
  • If many early adopters respond, consider weighting their feedback relative to that of typical customers, as sketched below.
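Here is a minimal sketch of that weighting idea, assuming hypothetical segment shares and survey scores. Each segment is weighted by its share of the customer base divided by its share of respondents (post-stratification), so over-represented early adopters count proportionally less.

```python
# Hypothetical shares: early adopters are 10% of the customer base but
# 60% of survey respondents, so their feedback is over-represented.
population_share = {"early_adopter": 0.10, "typical": 0.90}
respondent_share = {"early_adopter": 0.60, "typical": 0.40}

# Post-stratification weight = population share / respondent share.
weights = {seg: population_share[seg] / respondent_share[seg]
           for seg in population_share}

# Hypothetical satisfaction scores (1-5) per respondent.
responses = [("early_adopter", 5), ("early_adopter", 4), ("early_adopter", 5),
             ("typical", 3), ("typical", 2), ("typical", 3)]

raw_mean = sum(score for _, score in responses) / len(responses)
weighted_mean = (sum(weights[seg] * score for seg, score in responses)
                 / sum(weights[seg] for seg, _ in responses))

print(f"Raw mean: {raw_mean:.2f}")            # pulled up by early adopters
print(f"Weighted mean: {weighted_mean:.2f}")  # closer to typical customers
```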

Understanding Survivorship Bias

Survivorship bias is the error of drawing conclusions only from the people or things that made it past a selection process while overlooking those that did not. Such biased interpretations of events and data are prevalent within the business landscape. Imagine you’re overseeing a two-week trial phase for a recently launched dashboard/data product.

At the midpoint of the trial, you observe only a few remaining active users. Let’s suppose that these users primarily consist of data analysts, and they are progressively generating more intricate analyses using your BI tool.

Possible Conclusions and Their Caveats

1. My dashboard/data product resonates with data analysts (inferring a norm):

Without investigating the users who stopped engaging with your dashboard, you don’t know how many data analysts were in the initial trial group. If the majority of those who started the trial were data analysts but most of them discontinued usage, your conclusion would be contradicted. Examine the non-surviving trial participants before drawing firm conclusions: a handful of engaged data analyst users does not imply that your product resonates with all data analysts.

To reach a more informed conclusion:

Analyze the entire trial participant pool to unveil distinct patterns between engaged and non-engaged cohorts. Understanding the characteristics of both groups before deducing a norm is crucial.
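Here is a minimal sketch of that comparison, using a made-up trial roster. The survivors are all analysts, yet most of the analysts who started have churned, so "resonates with data analysts" is not supported by the survivors alone.

```python
# Hypothetical trial roster: (user_id, role, still_active_at_midpoint).
trial_users = [
    (1, "data_analyst", True),  (2, "data_analyst", False),
    (3, "data_analyst", False), (4, "manager", False),
    (5, "engineer", False),     (6, "data_analyst", True),
    (7, "manager", False),      (8, "data_analyst", False),
]

def role_counts(users):
    """Count how many users hold each role."""
    counts = {}
    for _, role, _ in users:
        counts[role] = counts.get(role, 0) + 1
    return counts

survivors = [u for u in trial_users if u[2]]
churned = [u for u in trial_users if not u[2]]

print("Survivors:", role_counts(survivors))  # all data analysts...
print("Churned:  ", role_counts(churned))    # ...but most analysts churned too

started = sum(1 for _, role, _ in trial_users if role == "data_analyst")
remaining = sum(1 for _, role, _ in survivors if role == "data_analyst")
print(f"Analyst retention: {remaining}/{started}")  # 2/5 in this example
```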

2. My dashboard empowers deep analysis (inferring causality):

Currently, you lack a point of comparison for your users’ capabilities. While they may be utilizing your dashboard to perform advanced tasks, their proficiency might transcend your product’s design. Their success could be due to their analytical skills rather than your dashboard’s influence.

To reach a more informed conclusion:

Pre-assess users’ skill levels to gauge the true value added by your product’s features. Compare their performance using other tools for similar tasks. If your product demonstrates clear benefits, you can leverage these results as a testament to its effectiveness.
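Here is a minimal sketch of such a comparison, with hypothetical task timings. Measuring the same users on a baseline tool they already know controls for their skill, so the remaining gap is closer to the product’s own contribution.

```python
from statistics import mean

# Hypothetical minutes to complete the same analysis task per user,
# on our dashboard versus a baseline tool the user already knows.
task_minutes = {
    "user_a": {"our_tool": 12, "baseline": 20},
    "user_b": {"our_tool": 15, "baseline": 18},
    "user_c": {"our_tool": 10, "baseline": 22},
}

ours = [t["our_tool"] for t in task_minutes.values()]
base = [t["baseline"] for t in task_minutes.values()]

# If skilled analysts are fast on every tool, the baseline strips their
# skill out of the comparison; what remains is closer to product value.
print(f"Mean minutes, our tool: {mean(ours):.1f}")
print(f"Mean minutes, baseline: {mean(base):.1f}")
print(f"Mean improvement: {mean(b - o for o, b in zip(ours, base)):.1f} min")
```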

Key Takeaways

  1. Analyze the entire trial cohort, encompassing both engaged and non-engaged users.
  2. Gauge your product’s impact by comparing user behavior on your tool with their behavior on competing tools.
  3. Evaluate users’ skill levels before attributing success solely to your product.

By adopting these approaches, you can mitigate survivorship bias and arrive at more accurate and nuanced conclusions about your product’s performance.