Applications that use artificial intelligence and machine learning techniques present unique challenges to testers. These systems are largely black boxes that use multiple algorithms, sometimes hundreds, to process data in a series of layers and return a result.
While testing can be a complex undertaking for any application, at a fundamental level it involves ensuring that the results returned are what you expect for a given input. With AI / ML systems, that's a problem. The software returns a response, but testers have no independent way of determining whether it is the correct answer. The difficulty is not always obvious, because testers don't necessarily know what the correct answer should be for a given set of inputs.
In fact, some application results can be laughable. Individual product recommendations from ecommerce engines are often wide of the mark, but as long as they collectively inspire shoppers to add items to their carts, they are considered a business success. So how do you determine whether your ML application is achieving the required level of success before deployment?
Thus, the definition of a correct answer depends not only on the application but also on the degree of precision it requires. If the answer has to be exact, that's simple, but how close is close enough? And will it always be close enough?
This is ultimately the black hole for testers. If you do not have an operational statistical definition of precision, based on the needs of the problem domain, you cannot objectively tell whether a result is correct or not.
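One way to make "close enough" operational is to fix a tolerance for each prediction and a required pass rate, then compare the pass rate against a confidence bound rather than a single point estimate. The sketch below illustrates that idea under stated assumptions: the function names, the tolerance, the required rate, and the choice of a Wilson score interval are illustrative, not a standard acceptance rule.

```python
import math
from typing import Sequence


def wilson_lower_bound(successes: int, trials: int, z: float = 1.96) -> float:
    """Lower end of the Wilson score interval for a binomial proportion."""
    if trials == 0:
        return 0.0
    p_hat = successes / trials
    z2 = z * z
    centre = p_hat + z2 / (2 * trials)
    margin = z * math.sqrt(p_hat * (1 - p_hat) / trials + z2 / (4 * trials * trials))
    return (centre - margin) / (1 + z2 / trials)


def passes_acceptance(predicted: Sequence[float], expected: Sequence[float],
                      tolerance: float, required_rate: float,
                      z: float = 1.96) -> bool:
    """Hypothetical acceptance rule.

    A prediction counts as "close enough" if it lies within `tolerance` of the
    expected value. The model passes only if the lower end of the (default 95%)
    Wilson interval on the close-enough rate is at least `required_rate`.
    """
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    hits = sum(1 for p, e in zip(predicted, expected) if abs(p - e) <= tolerance)
    return wilson_lower_bound(hits, len(predicted), z) >= required_rate
```

For example, if all 100 predictions in a labeled hold-out set fall within tolerance, the lower bound is about 0.963: enough to pass a 95% requirement but not a 97% one. Setting the tolerance and the required rate remains a decision for people who understand the problem domain.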
It only gets worse from there. Testers may not know whether an answer is right or wrong, even for a binary response. In some circumstances it may be possible to go back to the training data and find a similar case, but in many others there is still no obvious way to validate the results.
Does it matter? Yes, and probably more than in traditional business applications. The vast majority of results from a traditional business application can be easily categorized as correct or incorrect. Testers don’t need to know how the underlying algorithms work, although it would be helpful if they did.
ML applications are not that clear-cut. A result may look correct yet be wrong because of bias or skewed training data. Wrong answers can also come from using an unsuitable ML model that occasionally or consistently produces suboptimal responses. This is where Explainable AI (XAI) can help.
Explainable AI explained
XAI is a way for an AI or ML application to explain why it arrived at a particular result. By providing a defined path from input to output, XAI can help a tester understand the logic connecting inputs and outputs, logic that might otherwise be impenetrable.
XAI is a young field, and most business AI / ML applications have not yet embraced it. The techniques behind the term are loosely defined. While app users can gain confidence if they are given a rationale for a result, any explanation also helps development and testing teams validate algorithms and training data and ensure that the results accurately reflect the problem domain.
Pepper, the SoftBank robot that responds to tactile stimulation, is a fascinating example of an early XAI effort. Pepper has been programmed to talk through its instructions as it performs them. Talking through instructions is a form of XAI, in that it allows users to understand why the robot is performing a specific sequence of activities. Through this process, Pepper will also identify contradictions or ambiguities and know when to ask for further clarification.
Imagine how such a feature could help testers. Using test data, a tester can obtain a result, ask the application how it arrived at that result, and then manipulate the input data to document why the result is valid.
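The input manipulation described above can be approximated with a simple one-at-a-time sensitivity check: nudge each input, rerun the model, and record how far the result moves. This is a minimal sketch; `toy_credit_model` and its feature names are hypothetical stand-ins for the application under test, and dedicated tools use more careful perturbation schemes.

```python
from typing import Callable, Dict, List, Tuple

Features = Dict[str, float]


def toy_credit_model(x: Features) -> float:
    # Hypothetical stand-in for the application under test.
    return 0.5 * x["income"] - 0.8 * x["debt"] + 0.1 * x["age"]


def sensitivity_report(model: Callable[[Features], float], base: Features,
                       delta: float = 1.0) -> List[Tuple[str, float]]:
    """Nudge each feature by `delta`, one at a time, and record the change in output."""
    base_out = model(base)
    report = []
    for name in base:
        probe = dict(base)          # copy so other features keep their base values
        probe[name] += delta
        report.append((name, model(probe) - base_out))
    # Largest absolute effect first, so the tester sees the dominant inputs.
    return sorted(report, key=lambda item: abs(item[1]), reverse=True)
```

For a base input of income 50, debt 20, and age 40, the report ranks debt first (about -0.8), then income (+0.5), then age (+0.1). A tester can compare that ranking with what domain experts expect and record any surprises.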
But that only scratches the surface; XAI must serve multiple constituencies. For developers, it can help validate the technical approach and the algorithms used. For testers, it helps confirm accuracy and quality. For end users, it is a way to build trust in the app.
The three legs of the XAI stool
So how does XAI work? There is a long way to go here, but a few techniques show promise. XAI rests on the principles of transparency, interpretability, and explainability.
- Transparency means you can examine the algorithms to clearly discern how they process the input data. While it might not tell you how those algorithms were trained, it does provide insight into the path to a result, and it is intended to be interpreted by design and development teams.
- Interpretability is how the results are presented for human understanding. In other words, if you have an application and get a particular result, you should be able to see and understand how that result was achieved, based on the input data and the processing algorithms. There should be a logical path between the input data and the output results.
- Explainability remains a vague concept as researchers attempt to define exactly how it might work. We may want to be able to query the system about its results, or to obtain detailed explanations of specific processing stages. But until there is better consensus, this property remains a gray area.
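Transparency and interpretability are easiest to see in a model whose structure can be inspected directly, such as a decision tree, where every result comes with the exact sequence of tests that produced it. The sketch below uses a hypothetical hand-built tree; the feature names, thresholds, and labels are invented for illustration and do not come from a trained model.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Node:
    feature: Optional[str] = None   # feature tested at this node (None on a leaf)
    threshold: float = 0.0
    left: Optional["Node"] = None   # branch taken when value <= threshold
    right: Optional["Node"] = None  # branch taken when value > threshold
    label: Optional[str] = None     # set only on leaves


def predict_with_path(node: Node, x: Dict[str, float]) -> Tuple[str, List[str]]:
    """Return the predicted label together with every test applied on the way."""
    path: List[str] = []
    while node.label is None:
        value = x[node.feature]
        if value <= node.threshold:
            path.append(f"{node.feature} = {value} <= {node.threshold}")
            node = node.left
        else:
            path.append(f"{node.feature} = {value} > {node.threshold}")
            node = node.right
    return node.label, path


# Hypothetical two-level tree, for illustration only.
tree = Node("debt_ratio", 0.4,
            left=Node("income", 30000,
                      left=Node(label="review"),
                      right=Node(label="approve")),
            right=Node(label="decline"))
```

Calling `predict_with_path(tree, {"debt_ratio": 0.25, "income": 52000})` returns `"approve"` along with the two tests that led there, which is the kind of input-to-output trail a tester can check against requirements. Deep neural networks offer no such direct trail, which is why quantitative after-the-fact techniques are used for them.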
Several techniques can help make AI / ML applications explainable. They tend to rely on quantitative measures as a stand-in for a qualitative explanation of a particular result.
Two common techniques are Shapley values and integrated gradients. Both offer quantitative measures that assess the contribution of each input feature (or group of features) to a particular outcome.
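To make the two techniques concrete, here is a minimal sketch of both for a tiny model with three numeric inputs. It assumes that a feature "absent" from a Shapley coalition takes a baseline value (one common convention among several), and it estimates the gradients needed for integrated gradients with finite differences rather than the automatic differentiation that real frameworks use. The function `toy_model` is hypothetical.

```python
import math
from itertools import combinations
from typing import Callable, List, Sequence, Set

Model = Callable[[Sequence[float]], float]


def toy_model(v: Sequence[float]) -> float:
    # Hypothetical model: two additive terms plus an interaction between v[0] and v[1].
    return 2 * v[0] + v[0] * v[1] + 3 * v[2]


def shapley_values(model: Model, x: Sequence[float],
                   baseline: Sequence[float]) -> List[float]:
    """Exact Shapley values by enumerating every coalition (cost grows as 2**n).

    Assumption: a feature outside the coalition is replaced by its baseline value.
    """
    n = len(x)

    def value(coalition: Set[int]) -> float:
        return model([x[i] if i in coalition else baseline[i] for i in range(n)])

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):  # coalition sizes 0 .. n-1 drawn from the other features
            weight = math.factorial(size) * math.factorial(n - size - 1) / math.factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def integrated_gradients(model: Model, x: Sequence[float], baseline: Sequence[float],
                         steps: int = 200, eps: float = 1e-5) -> List[float]:
    """Integrated gradients along the straight path from `baseline` to `x`.

    The path integral uses a midpoint Riemann sum; each gradient is a central
    finite difference (an assumption: real frameworks use automatic differentiation).
    """
    n = len(x)
    grad_sums = [0.0] * n
    for k in range(steps):
        alpha = (k + 0.5) / steps
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        for i in range(n):
            up, down = list(point), list(point)
            up[i] += eps
            down[i] -= eps
            grad_sums[i] += (model(up) - model(down)) / (2 * eps)
    return [(xi - b) * g / steps for xi, b, g in zip(x, baseline, grad_sums)]
```

For `x = [1, 2, 3]` and an all-zero baseline, both methods attribute approximately `[3, 1, 9]`: the interaction term `v[0] * v[1]`, worth 2, is split evenly between the first two inputs. In both cases the attributions sum to `f(x) - f(baseline) = 13`, which is what makes them usable as an accounting of where a result came from.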
Likewise, the contrastive explanations method is a post-hoc calculation that tries to explain an individual outcome in terms of why it occurred rather than a competing outcome. In other words, why did the model return this result and not that one?
Again, this is a quantitative measure, one that assesses the likelihood of one outcome over another. The numbers indicate the relative strength of each input's influence on the outcome.
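For a linear scoring model, the contrastive question has an exact quantitative answer: the log-odds of the actual outcome (the "fact") over a competing one (the "foil") decomposes into one term per input. The sketch below assumes a hypothetical linear model with one softmax-style score per class; the weights, biases, class names, and feature names are invented. Non-linear models need approximate contrastive methods instead.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical linear model: one weight vector and one bias per class.
FEATURES = ["income", "debt", "tenure"]
WEIGHTS: Dict[str, List[float]] = {
    "approve": [0.9, -1.2, 0.3],
    "review": [0.4, -0.2, 0.1],
}
BIAS: Dict[str, float] = {"approve": -0.5, "review": 0.2}


def contrastive_explanation(x: List[float], fact: str,
                            foil: str) -> Tuple[float, List[Tuple[str, float]]]:
    """Why `fact` rather than `foil`?

    With softmax scores s_c = w_c . x + b_c, P(fact) / P(foil) = exp(s_fact - s_foil),
    and s_fact - s_foil splits into (w_fact_i - w_foil_i) * x_i per feature plus a bias term.
    """
    contributions = [
        (name, (wf - wo) * xi)
        for name, wf, wo, xi in zip(FEATURES, WEIGHTS[fact], WEIGHTS[foil], x)
    ]
    log_odds = sum(c for _, c in contributions) + BIAS[fact] - BIAS[foil]
    ratio = math.exp(log_odds)  # how many times more likely fact is than foil
    return ratio, sorted(contributions, key=lambda c: abs(c[1]), reverse=True)
```

With `x = [3.0, 0.5, 1.0]`, the sketch reports a ratio of about 1.65 (`exp(0.5)`), meaning "approve" is roughly 1.65 times as likely as "review", with income contributing most (+1.5) and debt pulling the other way (-0.5).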
Data only gets you part of the way
Ultimately, because AI / ML applications are data driven, and explaining how they manipulate that data requires quantitative methods, we have no way other than data science to provide explanations. The problem is that numeric weights can contribute to interpretability, but they are still far from true explainability.
AI / ML development teams should understand and apply techniques such as these, for their own benefit and that of testers and users. In particular, without an explanation of the result at some level, it may not be possible for testers to determine whether the returned result is correct or not.
To ensure the quality and integrity of AI / ML applications, testers must have a way to determine where results are coming from. XAI is a start, but it will take some time to fully realize this technique.