scikit-learn

Classification

Computing a counterfactual of a sklearn classifier is done by using the ceml.sklearn.models.generate_counterfactual() function.

We must specify the model we want to use, the input whose prediction we want to explain, and the requested target prediction (the prediction of the counterfactual). In addition, we can restrict the features that may be changed when computing a counterfactual, specify a regularization of the counterfactual, and specify the optimization algorithm used for computing it.
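To make the roles of the target prediction and the feature whitelist concrete, here is a minimal sketch for a hand-written linear classifier. It is purely illustrative and is not how ceml computes counterfactuals; the names linear_counterfactual, predict and margin are hypothetical and not part of ceml. For a decision rule w.x + b >= 0, the closest point (in the L2 sense) with the opposite prediction is reached by moving along the weight vector, restricted to the whitelisted features.

```python
def predict(w, b, x):
    """Toy linear classifier: class 1 if w.x + b >= 0, else class 0."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else 0


def linear_counterfactual(w, b, x, whitelist=None, margin=1e-6):
    """Closest point to x (L2 distance) that flips the toy classifier,
    changing only the whitelisted features. Hypothetical helper, not ceml."""
    n = len(x)
    allowed = range(n) if whitelist is None else whitelist
    # Zero out the weights of features we are not allowed to change.
    w_masked = [w[i] if i in allowed else 0.0 for i in range(n)]
    norm_sq = sum(v * v for v in w_masked)
    if norm_sq == 0.0:
        raise ValueError("No whitelisted feature influences the prediction.")
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    # Aim just past the decision boundary w.x + b = 0.
    target_score = -margin if score >= 0 else margin
    step = (target_score - score) / norm_sq
    return [xi + step * wi for xi, wi in zip(x, w_masked)]


w, b = [1.0, -2.0, 0.5], -1.0
x = [3.0, 0.5, 2.0]
x_cf = linear_counterfactual(w, b, x, whitelist=[0, 2])
print("Prediction on x:", predict(w, b, x))
print("Counterfactual:", x_cf, "->", predict(w, b, x_cf))
```

The second feature is not in the whitelist, so it keeps its original value, while the other two move just far enough to change the prediction.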

A complete example of a classification task is given below:

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

from ceml.sklearn import generate_counterfactual


if __name__ == "__main__":
    # Load data
    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=4242)

    # Whitelist of features - list of features we can change/use when computing a counterfactual
    features_whitelist = None   # We can use all features

    # Create and fit model
    model = DecisionTreeClassifier(max_depth=3)
    model.fit(X_train, y_train)

    # Select data point for explaining its prediction
    x = X_test[1,:]
    print("Prediction on x: {0}".format(model.predict([x])))

    # Compute counterfactual
    print("\nCompute counterfactual ....")
    print(generate_counterfactual(model, x, y_target=0, features_whitelist=features_whitelist))

Regression

The interface for computing a counterfactual of a regression model is exactly the same.

But because it might be very difficult or even impossible (e.g. knn or decision tree) to achieve a requested prediction exactly, we can specify a tolerance range in which the prediction is accepted.

We can do so by defining a function that takes a prediction as input and returns True if the prediction is accepted (i.e. it lies in the range of tolerated predictions) and False otherwise. For instance, if our target value is 25.0 but we are also happy if the prediction deviates from it by no more than 0.5, we could come up with the following function:

done = lambda z: np.abs(z - 25.0) <= 0.5

This function can be passed as a value of the optional argument done to the ceml.sklearn.models.generate_counterfactual() function.
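If several targets or tolerances are needed, the same idea can be wrapped in a small factory. The sketch below is one possible way to write it; make_done is a hypothetical helper name and not part of ceml. It uses the built-in abs, which also works elementwise on NumPy arrays, since the text does not state whether the prediction is passed as a scalar or as an array.

```python
def make_done(y_target, tolerance):
    """Return a predicate that accepts predictions within `tolerance` of `y_target`.

    Hypothetical helper for illustration; not part of ceml.
    """
    def done(z):
        return abs(z - y_target) <= tolerance
    return done


done = make_done(25.0, 0.5)
print(done(25.3))   # within the tolerated range
print(done(26.0))   # outside the tolerated range
```

The resulting callable can be used wherever the lambda above would be used.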

A complete example of a regression task is given below:

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import numpy as np
# Note: load_boston was removed in scikit-learn 1.2, so this example needs an older release.
from sklearn.datasets import load_boston
from sklearn.model_selection import train_test_split
from sklearn.linear_model import Ridge

from ceml.sklearn import generate_counterfactual


if __name__ == "__main__":
    # Load data
    X, y = load_boston(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=4242)

    # Whitelist of features - list of features we can change/use when computing a counterfactual
    features_whitelist = [0, 1, 2, 3, 4]    # Use the first five features only

    # Create and fit model
    model = Ridge()
    model.fit(X_train, y_train)

    # Select data point for explaining its prediction
    x = X_test[1,:]
    print("Prediction on x: {0}".format(model.predict([x])))

    # Compute counterfactual
    print("\nCompute counterfactual ....")
    y_target = 25.0
    # Since we might not be able to achieve `y_target` exactly, we tell ceml
    # that we are happy if we do not deviate more than 0.5 from it.
    done = lambda z: np.abs(y_target - z) <= 0.5
    print(generate_counterfactual(model, x, y_target=y_target, features_whitelist=features_whitelist,
                                  C=1.0, regularization="l2", optimizer="bfgs", done=done))

Pipeline

Often our machine learning pipeline contains more than one model. For example, we might first scale the input and/or reduce its dimensionality before classifying it.

The interface for computing a counterfactual when using a pipeline is identical to the one when using a single model only. We can simply pass a sklearn.pipeline.Pipeline instance as the value of the parameter model to the function ceml.sklearn.models.generate_counterfactual().

Take a look at the ceml.sklearn.pipeline.PipelineCounterfactual class to see which preprocessing steps are supported.
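As background on why a scaling step fits naturally into such a pipeline: standardization is an invertible affine map, so a change found in the scaled space translates directly back to the original features. The sketch below illustrates only this relationship with made-up numbers; it does not describe how PipelineCounterfactual is implemented. The helpers scale and unscale are hypothetical, and mean and std play the role of the per-feature statistics that a fitted StandardScaler stores in mean_ and scale_.

```python
def scale(x, mean, std):
    """Standardize each feature: z = (x - mean) / std."""
    return [(xi - m) / s for xi, m, s in zip(x, mean, std)]


def unscale(z, mean, std):
    """Invert the standardization: x = z * std + mean."""
    return [zi * s + m for zi, m, s in zip(z, mean, std)]


# Made-up per-feature statistics and input (not taken from a real dataset).
mean = [5.8, 3.0]
std = [0.8, 0.4]
x = [6.6, 2.6]

z = scale(x, mean, std)
# Hypothetical counterfactual in the scaled space: move the first feature by -0.5.
z_cf = [z[0] - 0.5, z[1]]
x_cf = unscale(z_cf, mean, std)
print("Counterfactual in original feature space:", x_cf)
```

A shift of -0.5 in the scaled space corresponds to a shift of -0.5 * 0.8 = -0.4 in the original first feature, while the second feature is unchanged.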

A complete example of a classification pipeline with the standard scaler sklearn.preprocessing.StandardScaler and logistic regression sklearn.linear_model.LogisticRegression is given below:

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

from ceml.sklearn import generate_counterfactual


if __name__ == "__main__":
    # Load data
    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=4242)

    # Whitelist of features - list of features we can change/use when computing a counterfactual
    features_whitelist = [1, 3]   # Use the second and fourth feature only

    # Create and fit the pipeline
    scaler = StandardScaler()
    model = LogisticRegression(solver='lbfgs', multi_class='multinomial')   # Note that ceml requires: multi_class='multinomial'

    model = make_pipeline(scaler, model)
    model.fit(X_train, y_train)

    # Select data point for explaining its prediction
    x = X_test[1,:]
    print("Prediction on x: {0}".format(model.predict([x])))

    # Compute counterfactual
    print("\nCompute counterfactual ....")
    print(generate_counterfactual(model, x, y_target=0, features_whitelist=features_whitelist))

Change optimization parameters

Sometimes it is necessary to change the default parameters of the optimization methods, e.g. the solver or the maximum number of iterations. This can be done by passing the optional argument optimizer_args to the ceml.sklearn.models.generate_counterfactual() function. The value of optimizer_args must be a dictionary in which parameters such as verbosity, solver, maximum number of iterations and tolerance thresholds can be set. Note that not all parameters are used by every optimization algorithm (e.g. "epsilon", "solver" and "solver_verbosity" are only used if optimizer="mp").

A short code snippet demonstrating how to change some optimization parameters is given below:

import cvxpy as cp
from ceml.sklearn import generate_counterfactual
#.......

#model = ......
#x_orig = .....
#y_target = .....

# Change optimization parameters
#opt = ....
opt_args = {"epsilon": 10.e-4, "solver": cp.SCS, "solver_verbosity": False, "max_iter": 200}

# Compute counterfactual explanations
x_cf, y_cf, delta = generate_counterfactual(model, x_orig, y_target, features_whitelist=None,
                                            C=0.1, regularization="l1", optimizer=opt,
                                            optimizer_args=opt_args, return_as_dict=False)
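Once a counterfactual such as x_cf has been obtained, it is often useful to see which features actually changed. The sketch below is one possible way to do this; changed_features is a hypothetical helper, not part of ceml, and the input values are made up rather than real ceml output. Since the sign convention of the returned delta is not stated here, the helper compares the two points directly and reports x_cf - x_orig.

```python
def changed_features(x_orig, x_cf, tol=1e-8):
    """Map feature index -> (x_cf - x_orig) for every feature that moved more than `tol`.

    Hypothetical helper for illustration; not part of ceml.
    """
    return {
        i: c - o
        for i, (o, c) in enumerate(zip(x_orig, x_cf))
        if abs(c - o) > tol
    }


# Made-up values for illustration only.
x_orig = [5.9, 3.0, 4.2, 1.5]
x_cf = [5.9, 3.0, 2.4, 1.5]

changes = changed_features(x_orig, x_cf)
for index, change in changes.items():
    print("Feature {0} changed by {1:+.2f}".format(index, change))
```

With an "l1" regularization, as in the snippet above, one would typically expect only a few features to appear in this mapping, because the L1 norm favours sparse changes.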