Compliance departments are under great pressure to leverage data analytics in day-to-day operations. Internally, pressure stems from mandates to do more with fewer resources, mitigate and control for a growing list of fraud scenarios and investigate increasingly complex fraud schemes that are otherwise difficult to unravel.
Externally, pressure is coming from regulatory guidance, audit committees and the competition. These forces have compliance departments moving quickly toward data-driven operations and integrated use of data analytics. However, there are fundamental biases in data usage that even seasoned data scientists have trouble avoiding. Compliance professionals should be aware of these biases and take proactive steps to mitigate them.
For simplicity, biases should be viewed as the impact the environment or subconscious has on decision-making, on the way we interpret data and on the results we yield from our work. We may not always be aware of biases while experiencing their effects, but luckily, we can learn to recognize several common types of bias and understand ways to mitigate their impact.
Confirmation bias
Confirmation bias occurs when an analysis is conducted or perceived in a way that supports preconceived hypotheses and ignores data or interpretations that run contrary. For example, a trade surveillance analyst may believe the options traders at his bank are more likely to manipulate the market. As a result, he may spend less time investigating traders in other products and increase diligence on options trade alerts.
With increased focus on options traders, it stands to reason that the analyst will escalate more options trading alerts, irrespective of the relative riskiness of the underlying behavior. Subsequent reporting may then show that options traders were subject to the most escalations, which appears to justify additional scrutiny and reinforces the original belief.
Selection (sample) bias
While confirmation bias arises during data analysis, selection bias occurs earlier, in the initial choice of which data to review and which to leave out. Compliance reviews and compliance investigations often fall prey to this bias when selecting a sample to test. Often a “random” or “risk-based” sample of 25 is used.
Either approach can be subject to selection bias: the “random” sample could actually be the 25 most recent transactions, or transactions from a single chronological grouping, which oversamples a specific period.
Similarly, if the risk-based 25 are the highest-value transactions, the sample improperly ignores smaller-value transactions. It may even be that fraud is more likely to occur in those smaller-value transactions, leaving the organization exposed to fraud risk.
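To make the difference concrete, here is a minimal Python sketch that compares a "most recent 25" pull with a seeded random sample and a sample spread across value bands. The ledger is synthetic, and the field names, the value-band thresholds and the function names are all assumptions made for illustration, not a prescribed sampling methodology.

```python
import random
from collections import defaultdict


def value_band(txn):
    # Hypothetical thresholds; set these from your own risk assessment.
    if txn["amount"] < 1_000:
        return "small"
    if txn["amount"] < 50_000:
        return "medium"
    return "large"


def simple_random_sample(transactions, k, seed=0):
    """Draw k transactions uniformly at random, without replacement."""
    return random.Random(seed).sample(transactions, k)


def stratified_sample(transactions, k, seed=0):
    """Draw up to k transactions, split evenly across value bands."""
    rng = random.Random(seed)
    bands = defaultdict(list)
    for txn in transactions:
        bands[value_band(txn)].append(txn)
    per_band = max(1, k // len(bands))
    picked = []
    for name in sorted(bands):
        members = bands[name]
        picked.extend(rng.sample(members, min(per_band, len(members))))
    return picked


# Synthetic ledger (made-up data): 1,000 transactions, 10 per day.
transactions = [
    {"id": i, "day": i // 10, "amount": (i * 7919) % 100_000 + 1}
    for i in range(1000)
]

recent_25 = transactions[-25:]  # what a "random" pull sometimes really is
random_25 = simple_random_sample(transactions, 25)
stratified = stratified_sample(transactions, 25)

print("days covered, most recent 25:", len({t["day"] for t in recent_25}))
print("days covered, random 25:", len({t["day"] for t in random_25}))
print("bands covered, stratified:", sorted({value_band(t) for t in stratified}))
```

Fixing the seed keeps the sample reproducible, so a reviewer can confirm later exactly how the 25 items were chosen.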
Survivorship bias
Survivorship bias is a subset of selection bias that considers not only what data is selected for an analysis, but also what data was never available for selection in the first place. For example, a compliance professional might notice that vendor due diligence is flagging IT vendors more frequently than other vendor types, suggesting IT vendors are inherently a riskier group.
However, it may be the case that the IT department regularly submits vendors for due diligence review, whereas the sales department does so less frequently. When the compliance professional notices IT vendors being flagged more often, she may infer that IT vendors are a riskier category that warrants ongoing monitoring. In fact, the inference is incorrect: IT vendors appear riskier only because relatively few other vendors are being reviewed at all.
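One way to check for this is to look at flag rates and review coverage alongside raw flag counts. The sketch below uses made-up numbers, and the function name `flag_summary` and the vendor counts are assumptions for illustration only.

```python
def flag_summary(reviews, vendor_counts):
    """Summarize due diligence results per vendor type.

    reviews: list of (vendor_type, flagged) pairs, one per vendor reviewed.
    vendor_counts: total vendors in use per type, reviewed or not.
    """
    summary = {}
    for vtype, total in vendor_counts.items():
        outcomes = [flagged for t, flagged in reviews if t == vtype]
        flags = sum(outcomes)
        summary[vtype] = {
            "flags": flags,
            "reviewed": len(outcomes),
            "flag_rate": flags / len(outcomes) if outcomes else None,
            "coverage": len(outcomes) / total if total else None,
        }
    return summary


# Made-up data: IT submits most of its vendors, sales submits few.
reviews = (
    [("IT", True)] * 18 + [("IT", False)] * 162
    + [("Sales", True)] * 4 + [("Sales", False)] * 16
)
vendor_counts = {"IT": 200, "Sales": 200}

summary = flag_summary(reviews, vendor_counts)
for vtype, stats in summary.items():
    print(vtype, stats)
```

In this illustrative data, IT has far more flags in absolute terms, yet the flag rate among reviewed sales vendors is higher and most sales vendors were never reviewed, so the raw count says little about relative risk.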
False cause fallacy (causation vs. correlation)
False cause fallacy is a commonly cited and widely understood form of bias in which correlation is mistaken for causation: “ice cream increases the likelihood of sunburn because ice cream sales are highly correlated with aloe vera sales.” Of course, ice cream sales and time spent outdoors both increase in the summer, but neither causes the other.
Despite familiarity with this fallacy, it often occurs in data analysis and can go unrecognized. Compliance professionals working on a travel spend analysis may notice, on average, members of Group A booked hotel stays outside the preferred vendor list more frequently than other groups.
The inference is that Group A was not adhering to travel policies. This could be correlation rather than causation if it turns out that Group A is responsible for travel to more suburban locations and the preferred vendor list skews toward large cities.
When Group A travels to large cities where preferred vendor options exist, its compliance may be well above average. Simply put, more data points need to be considered before concluding that a correlation reflects causation.
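The travel example can be sketched by computing the preferred-vendor rate twice: once per group overall, and once per group within each location type. The bookings below are invented for illustration, and the field names and the `preferred_rate` helper are assumptions rather than part of any particular travel system.

```python
from collections import defaultdict


def preferred_rate(bookings, key):
    """Share of bookings made with a preferred vendor, grouped by key(booking)."""
    totals = defaultdict(lambda: [0, 0])  # key -> [preferred, all bookings]
    for b in bookings:
        k = key(b)
        totals[k][0] += b["preferred"]
        totals[k][1] += 1
    return {k: p / n for k, (p, n) in totals.items()}


def make_bookings(group, location, n, n_preferred):
    """Build n made-up bookings, n_preferred of them with a preferred vendor."""
    return [
        {"group": group, "location": location, "preferred": i < n_preferred}
        for i in range(n)
    ]


# Invented data: Group A travels mostly to suburban locations.
bookings = (
    make_bookings("A", "suburban", 80, 8)
    + make_bookings("A", "city", 20, 19)
    + make_bookings("Other", "suburban", 10, 1)
    + make_bookings("Other", "city", 90, 72)
)

overall = preferred_rate(bookings, key=lambda b: b["group"])
by_location = preferred_rate(bookings, key=lambda b: (b["group"], b["location"]))

print("overall:", overall)
print("by location:", by_location)
```

With these numbers, Group A looks far less compliant overall but matches or beats the other groups within each location type, which points to destination mix, not policy adherence, as the driver.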
What can be done?
Recognizing bias can exist in every analysis is the first step toward mitigating bias. There are also several steps a compliance function can take to help identify and avoid these biases, such as independent validation, documentation review and training.
Independent validation is a powerful tool wherein someone recreates an analysis from scratch, even at a high level. If the independent recreation yields dramatically different results, bias may have been present in one version of the analysis, and a comparison of the two approaches can tease out the biased decisions.
Compliance professionals can perform their own high-level independent validation of an analysis (or request one from a peer) without needing a deep technical skillset and without needing to understand complex analytics performed in the original analysis.
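A high-level comparison of this kind can be as simple as lining up the headline figures from the two versions and listing the ones that disagree. The sketch below is one possible approach; the metric names, the values and the 5% tolerance are hypothetical and would need to be set for the analysis at hand.

```python
def compare_results(original, recreated, rel_tol=0.05):
    """Return the metrics on which two analyses materially disagree.

    A metric is reported when it is missing from one side or when the
    relative difference between the two values exceeds rel_tol.
    """
    issues = {}
    for name in sorted(set(original) | set(recreated)):
        a, b = original.get(name), recreated.get(name)
        if a is None or b is None:
            issues[name] = "missing from one analysis"
            continue
        denom = max(abs(a), abs(b))
        if denom and abs(a - b) / denom > rel_tol:
            issues[name] = f"original={a}, recreated={b}"
    return issues


# Hypothetical headline figures from the original work and the recreation.
original = {"alerts_reviewed": 1200, "escalation_rate": 0.08, "options_share": 0.61}
recreated = {"alerts_reviewed": 1195, "escalation_rate": 0.081, "options_share": 0.34}

issues = compare_results(original, recreated)
for name, detail in issues.items():
    print(name, "->", detail)
```

A flagged metric does not show which version is right; it only marks where the two sets of assumptions should be compared.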
Documentation review is the complement to independent validation: it does not involve recreating work, but is instead a methodical review of the assumptions and choices made in the original analysis.
All good data-driven work is documented, and a compliance professional should feel empowered to request the documentation behind others’ work product and step through it individually or with their counterpart as needed. Each decision point in an analysis or script should comport with the compliance professional’s understanding of the subject being analyzed.
Finally, training on critical thinking and data correlations can help compliance professionals identify the many variables at play in their analyses. Through education and practice, a compliance professional will be able to find the hidden connections between variables, question assumptions and tease out causation from correlation.
Michael Costa is a managing director at StoneTurn.