Explaining machine learning models to business users using BigQuery ML and Google Data Studio

Muhajirakbarhsb
4 min read · Mar 16, 2022

With explainable ML, data scientists can understand what factors contribute to predictions, even for the most complex deep learning models. But how can we build an explainable ML model easily?

In this post, we’ll look at how to take advantage of explainable ML by creating models in SQL with BigQuery ML, and then explaining those model predictions to stakeholders and domain experts through “What-If Scenario” dashboards in Google Data Studio.

What is explainability?

Explainability is a way to understand what a machine learning model is doing. There are two kinds. With local explainability, we ask how the model arrived at an individual prediction: why did it predict that this customer will unsubscribe from the app? Why does it flag this particular email as spam?
With global explainability, we ask about feature importance across the model as a whole: how important is length of stay in predicting whether users will buy more products? How important is level of education in predicting salary?

A Classification model

We’ll start with a simple example: a short query that fits an XGBoost classification model to predict whether a user purchased a particular product. BQML includes robust defaults along with many options for specifying model behavior, and BigQuery provides model fit metrics, training logs, and other model details.

Build Model
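A minimal sketch of such a CREATE MODEL statement, assuming a hypothetical training table `my_dataset.purchases` with feature columns Age, Gender, and ExpectedSalary and a label column Purchased (all of these names are placeholders, not taken from the original screenshot):

```sql
-- Fit an XGBoost (boosted tree) classifier in BigQuery ML.
-- Dataset, table, and column names are hypothetical placeholders.
CREATE OR REPLACE MODEL `my_dataset.purchase_model`
OPTIONS (
  model_type = 'BOOSTED_TREE_CLASSIFIER',  -- XGBoost under the hood
  input_label_cols = ['Purchased'],
  enable_global_explain = TRUE             -- lets us query ML.GLOBAL_EXPLAIN later
) AS
SELECT
  Age,
  Gender,
  ExpectedSalary,
  Purchased
FROM
  `my_dataset.purchases`;
```

Once training finishes, the evaluation metrics and training logs mentioned above are available on the model’s page in the BigQuery console or via ML.EVALUATE.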

Explainable ML in BQML

Explainable ML in BigQuery ML supports a variety of machine learning models, including both time series and non-time series models, and each model type uses a different explainability method. Because the model we use is XGBoost, BigQuery ML uses Tree SHAP to compute the explanation values.
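For global feature importance, a model trained with enable_global_explain can be queried with ML.GLOBAL_EXPLAIN. A sketch using the hypothetical model from above:

```sql
-- Model-level feature attributions, obtained by aggregating the
-- per-row Tree SHAP values of the boosted tree model.
SELECT *
FROM ML.GLOBAL_EXPLAIN(MODEL `my_dataset.purchase_model`);
```

This returns one attribution value per feature, which is also what we will later materialize as a datamart for the dashboard.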

For local explanations of individual predictions, the following query creates three hypothetical transactions with varying Age, Gender, and Expected Salary. The model predicts a purchase for the first transaction, largely because of its high expected salary.

BQML Explainable AI Query and Result
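A sketch of what such a query might look like, reusing the hypothetical model and column names from above (the input values are made up as well):

```sql
-- Local explanations for three hypothetical transactions.
-- top_k_features limits how many attributions are returned per row.
SELECT *
FROM ML.EXPLAIN_PREDICT(
  MODEL `my_dataset.purchase_model`,
  (
    SELECT 42 AS Age, 'Male' AS Gender, 150000 AS ExpectedSalary
    UNION ALL
    SELECT 23, 'Female', 35000
    UNION ALL
    SELECT 31, 'Male', 52000
  ),
  STRUCT(3 AS top_k_features)
);
```

The output contains the predicted label, its probability, and a top_feature_attributions array showing how strongly each feature pushed the prediction up or down.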

Creating a “What-If Scenario Dashboard” in Google Data Studio

As you can see, you can use instance-level explanations to show how different inputs change the resulting prediction. This helps build confidence that the model is working properly, and BQML makes it easy to implement. While BQML unlocks a rich set of capabilities on its own, it becomes even more valuable when explainable ML is brought to non-technical stakeholders such as business domain experts or executives.

There is a tutorial that describes how to create a “what-if” dashboard using Looker; here, we will try to replicate that dashboard using Google Data Studio. In the Looker example, the semantic model is used to define the BQML SQL; in ours, we define the SQL in the Data Studio data source and build the dashboard on top of it. Once the dashboard is constructed, end users can enter their own transaction details using dashboard controls and view the prediction and model explanation, all without writing any code!

Google Data Studio Explainable ML Dashboard

BQML Predictions in Google Data Studio

The script below drives the “What-If Scenario” dashboard: define a parameter for each hypothetical user input, then build a derived table that calls ML.EXPLAIN_PREDICT on a subquery selecting those input parameters.
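A sketch of that data source query, assuming hypothetical parameters named p_age, p_gender, and p_expected_salary have been added to the BigQuery data source and allowed to be modified in reports:

```sql
-- Custom query behind the "What-If Scenario" dashboard.
-- @p_age, @p_gender, and @p_expected_salary are hypothetical Data Studio
-- parameters that end users set through dashboard controls.
SELECT *
FROM ML.EXPLAIN_PREDICT(
  MODEL `my_dataset.purchase_model`,
  (
    SELECT
      @p_age             AS Age,
      @p_gender          AS Gender,
      @p_expected_salary AS ExpectedSalary
  )
);
```

Each parameter is then exposed on the report as an input control, so viewers can change the inputs and the query re-runs with their values.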

Next Steps

If you have a lot of data feeding the dashboard, you can also automate this manual process so that the model is retrained monthly or quarterly, depending on business needs.

The idea is to chain all the queries so that the completion of one query triggers the next. Here is what the data flow looks like:

Data Pipeline

We can retrain the model shortly after the data is updated, using a trigger from Cloud Scheduler or Pub/Sub. A Cloud Function written in Node.js handles the update: it retrains the model and then publishes a message to Pub/Sub so that a new global-explanation datamart can be created and displayed on our “what-if” dashboard.

After the new model is deployed, we send a signal to Pub/Sub to rebuild the global-explanation table, so that the global explanation on our dashboard is also refreshed with the latest data and model.
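Whatever wires the trigger together (Cloud Scheduler, Pub/Sub, the Cloud Function), the SQL the pipeline ultimately runs comes down to two statements: retrain the model and rebuild the global-explanation table the dashboard reads. A sketch with the same hypothetical names as before:

```sql
-- Step 1: retrain the model on the freshest data.
CREATE OR REPLACE MODEL `my_dataset.purchase_model`
OPTIONS (
  model_type = 'BOOSTED_TREE_CLASSIFIER',
  input_label_cols = ['Purchased'],
  enable_global_explain = TRUE
) AS
SELECT Age, Gender, ExpectedSalary, Purchased
FROM `my_dataset.purchases`;

-- Step 2: rebuild the global-explanation datamart behind the dashboard.
CREATE OR REPLACE TABLE `my_dataset.global_explanation` AS
SELECT *
FROM ML.GLOBAL_EXPLAIN(MODEL `my_dataset.purchase_model`);
```

Both statements could live in a single scheduled query script, or be issued by the Cloud Function once it receives the Pub/Sub trigger.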

Summary

In this guide, we built an explainable XGBoost model entirely with SQL queries by leveraging BigQuery ML, and visualized it in a “what-if” dashboard in Google Data Studio. We also touched on how machine learning models can be explained easily to business users. Lastly, we set up a data pipeline to automate the entire process so that the model and its explanations stay up to date on a monthly, weekly, or daily basis.

I hope I was able to convey how much information we can extract from the data and models we already have with little effort, and how effective that can be in supporting and improving the business.
