scikit-learn
Classification
A counterfactual of a sklearn classifier is computed with the ceml.sklearn.models.generate_counterfactual() function.
We must specify the model we want to use, the input whose prediction we want to explain, and the requested target prediction (the prediction of the counterfactual). In addition, we can restrict the features that may be changed when computing a counterfactual, specify a regularization of the counterfactual, and specify the optimization algorithm used for computing it.
A complete example of a classification task is given below:
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.tree import DecisionTreeClassifier

from ceml.sklearn import generate_counterfactual


if __name__ == "__main__":
    # Load data
    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=4242)

    # Whitelist of features - list of features we can change/use when computing a counterfactual
    features_whitelist = None   # We can use all features

    # Create and fit model
    model = DecisionTreeClassifier(max_depth=3)
    model.fit(X_train, y_train)

    # Select data point for explaining its prediction
    x = X_test[1,:]
    print("Prediction on x: {0}".format(model.predict([x])))

    # Compute counterfactual
    print("\nCompute counterfactual ....")
    print(generate_counterfactual(model, x, y_target=0, features_whitelist=features_whitelist))
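After computing a counterfactual, it can be useful to see which features were actually changed and whether the change stays within the whitelist. The following is a minimal sketch of such a check, not part of ceml: the helper names `changed_features` and `respects_whitelist` are hypothetical, `x_cf` stands for a counterfactual returned by generate_counterfactual(), and the numbers are made up for illustration.

```python
def changed_features(x_orig, x_cf, tol=1e-9):
    """Return the indices at which x_cf differs from x_orig by more than tol."""
    return [i for i, (a, b) in enumerate(zip(x_orig, x_cf)) if abs(a - b) > tol]


def respects_whitelist(x_orig, x_cf, features_whitelist=None, tol=1e-9):
    """Check that only whitelisted features were changed (None allows all features)."""
    if features_whitelist is None:
        return True
    allowed = set(features_whitelist)
    return all(i in allowed for i in changed_features(x_orig, x_cf, tol))


# Illustrative values only (hypothetical input and counterfactual)
x_orig = [5.1, 3.5, 1.4, 0.2]
x_cf = [5.1, 3.0, 1.4, 0.9]

print(changed_features(x_orig, x_cf))              # [1, 3]
print(respects_whitelist(x_orig, x_cf, [1, 3]))    # True
print(respects_whitelist(x_orig, x_cf, [1]))       # False
```

Both helpers iterate over the inputs element by element, so they should also work on one-dimensional numpy arrays.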
Regression
The interface for computing a counterfactual of a regression model is exactly the same.
However, because it can be very difficult or even impossible to achieve a requested prediction exactly (e.g. with a k-nearest-neighbors or decision tree regressor), we can specify a tolerance range within which a prediction is accepted.
We can do so by defining a function that takes a prediction as input and returns True if the prediction is accepted (it lies in the range of tolerated predictions) and False otherwise. For instance, if our target value is 25.0 but we are also happy if the prediction deviates from it by no more than 0.5, we could come up with the following function:
done = lambda z: np.abs(z - 25.0) <= 0.5
This function can be passed as the value of the optional argument done to the ceml.sklearn.models.generate_counterfactual() function.
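If several targets or tolerances are needed, the predicate can be built by a small factory instead of being written out each time. This is only a sketch: `make_done` is a hypothetical helper, not part of ceml, and it uses the built-in `abs` in place of `np.abs` so that it runs without numpy.

```python
def make_done(y_target, tolerance):
    """Build a predicate that accepts predictions within `tolerance` of `y_target`."""
    def done(z):
        return abs(z - y_target) <= tolerance
    return done


done = make_done(25.0, 0.5)
print(done(24.7))   # True: deviates by about 0.3
print(done(26.0))   # False: deviates by 1.0
```

The resulting `done` takes a single prediction and returns a truth value, which is the shape of function described above.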
A complete example of a regression task is given below:
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import numpy as np
# Note: load_boston was removed in scikit-learn 1.2, so this example requires an older scikit-learn version
from sklearn.datasets import load_boston
from sklearn.model_selection import train_test_split
from sklearn.linear_model import Ridge

from ceml.sklearn import generate_counterfactual


if __name__ == "__main__":
    # Load data
    X, y = load_boston(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=4242)

    # Whitelist of features - list of features we can change/use when computing a counterfactual
    features_whitelist = [0, 1, 2, 3, 4]    # Use the first five features only

    # Create and fit model
    model = Ridge()
    model.fit(X_train, y_train)

    # Select data point for explaining its prediction
    x = X_test[1,:]
    print("Prediction on x: {0}".format(model.predict([x])))

    # Compute counterfactual
    print("\nCompute counterfactual ....")
    y_target = 25.0
    # Since we might not be able to achieve `y_target` exactly, we tell ceml
    # that we are happy if we do not deviate more than 0.5 from it.
    done = lambda z: np.abs(y_target - z) <= 0.5
    print(generate_counterfactual(model, x, y_target=y_target, features_whitelist=features_whitelist, C=1.0, regularization="l2", optimizer="bfgs", done=done))
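The regularization argument ("l1" or "l2") refers to how the distance between the original input and the counterfactual is measured. As a rough illustration, the two norms can be computed by hand for a given pair of points. This sketch only shows the norms themselves; how ceml weights the penalty with C, or whether it squares the l2 term, is not shown here and should not be inferred from it. The function names and values are hypothetical.

```python
import math


def l1_distance(x_orig, x_cf):
    """Sum of absolute per-feature changes (Manhattan distance)."""
    return sum(abs(a - b) for a, b in zip(x_orig, x_cf))


def l2_distance(x_orig, x_cf):
    """Euclidean distance between the original input and the counterfactual."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x_orig, x_cf)))


# Illustrative values only
x_orig = [1.0, 2.0, 3.0]
x_cf = [1.0, 5.0, 7.0]

print(l1_distance(x_orig, x_cf))   # 7.0
print(l2_distance(x_orig, x_cf))   # 5.0
```

In general, an l1 penalty tends to favor counterfactuals that change few features, while an l2 penalty tends to spread smaller changes over more features.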
Pipeline
Often our machine learning pipeline contains more than one model. E.g. we first scale the input and/or reduce the dimensionality before classifying it.
The interface for computing a counterfactual when using a pipeline is identical to the one when using a single model only. We can simply pass a sklearn.pipeline.Pipeline instance as the value of the parameter model to the function ceml.sklearn.models.generate_counterfactual().
Take a look at the ceml.sklearn.pipeline.PipelineCounterfactual class to see which preprocessings are supported.
A complete example of a classification pipeline with the standard scaler sklearn.preprocessing.StandardScaler and logistic regression sklearn.linear_model.LogisticRegression is given below:
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.linear_model import LogisticRegression

from ceml.sklearn import generate_counterfactual


if __name__ == "__main__":
    # Load data
    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=4242)

    # Whitelist of features - list of features we can change/use when computing a counterfactual
    features_whitelist = [1, 3]   # Use the second and fourth feature only

    # Create and fit the pipeline
    scaler = StandardScaler()
    model = LogisticRegression(solver='lbfgs', multi_class='multinomial')   # Note that ceml requires: multi_class='multinomial'

    model = make_pipeline(scaler, model)
    model.fit(X_train, y_train)

    # Select data point for explaining its prediction
    x = X_test[1,:]
    print("Prediction on x: {0}".format(model.predict([x])))

    # Compute counterfactual
    print("\nCompute counterfactual ....")
    print(generate_counterfactual(model, x, y_target=0, features_whitelist=features_whitelist))
Change optimization parameters
Sometimes it is necessary to change the default parameters of the optimization methods, e.g. the solver or the maximum number of iterations.
This can be done by passing the optional argument optimizer_args to the ceml.sklearn.models.generate_counterfactual() function.
The value of optimizer_args must be a dictionary in which parameters such as verbosity, solver, maximum number of iterations and tolerance thresholds can be changed. Note that not all parameters are used by every optimization algorithm (e.g. "epsilon", "solver" and "solver_verbosity" are only used if optimizer="mp").
A short code snippet demonstrating how to change some optimization parameters is given below:
import cvxpy as cp
from ceml.sklearn import generate_counterfactual
#.......

#model = ......
#x_orig = .....
#y_target = .....

# Change optimization parameters
#opt = ....
opt_args = {"epsilon": 10.e-4, "solver": cp.SCS, "solver_verbosity": False, "max_iter": 200}

# Compute counterfactual explanations
x_cf, y_cf, delta = generate_counterfactual(model, x_orig, y_target, features_whitelist=None, C=0.1, regularization="l1", optimizer=opt, optimizer_args=opt_args, return_as_dict=False)
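Because some entries of optimizer_args only take effect for a particular optimizer, a small check before calling generate_counterfactual() can flag entries that would have no effect. The sketch below is not part of ceml. The helper name and the constant `MP_ONLY_KEYS` are hypothetical, and the key set contains only the three examples named in the text above, so it may be incomplete.

```python
# Keys that, per the note above, are only used if optimizer="mp" (possibly incomplete)
MP_ONLY_KEYS = {"epsilon", "solver", "solver_verbosity"}


def unused_optimizer_args(optimizer, optimizer_args):
    """Return the keys in optimizer_args that only take effect when optimizer == "mp"."""
    if optimizer == "mp":
        return []
    return sorted(k for k in optimizer_args if k in MP_ONLY_KEYS)


opt_args = {"epsilon": 1e-3, "solver_verbosity": False, "max_iter": 200}

print(unused_optimizer_args("bfgs", opt_args))   # ['epsilon', 'solver_verbosity']
print(unused_optimizer_args("mp", opt_args))     # []
```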