# scikit-learn

## Classification

To compute a counterfactual of a sklearn classifier, use the ceml.sklearn.models.generate_counterfactual() function.

We must specify the model we want to use, the input whose prediction we want to explain, and the requested target prediction (the prediction of the counterfactual). In addition, we can restrict the features that may be changed when computing a counterfactual, specify a regularization of the counterfactual, and choose the optimization algorithm used for computing it.
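To make the role of the feature restriction concrete, here is a minimal sketch, independent of ceml, that checks whether a candidate counterfactual changes only whitelisted features. The helpers `changed_features` and `respects_whitelist` are hypothetical names and not part of ceml, the sample values are made up, and treating `None` as "all features allowed" is an assumption that mirrors the `features_whitelist = None` setting used in the classification example in this section.

```python
def changed_features(x, x_cf, tol=1e-9):
    """Return the indices of features whose value differs between x and x_cf."""
    return [i for i, (a, b) in enumerate(zip(x, x_cf)) if abs(a - b) > tol]


def respects_whitelist(x, x_cf, features_whitelist=None):
    """Check that x_cf differs from x only on whitelisted feature indices.

    Assumption: None means that every feature may change.
    """
    if features_whitelist is None:
        return True
    allowed = set(features_whitelist)
    return all(i in allowed for i in changed_features(x, x_cf))


x = [5.1, 3.5, 1.4, 0.2]      # original input (made-up values)
x_cf = [5.1, 3.0, 1.4, 0.9]   # hypothetical counterfactual candidate

print(changed_features(x, x_cf))            # features at index 1 and 3 differ
print(respects_whitelist(x, x_cf, [1, 3]))  # only whitelisted features changed
print(respects_whitelist(x, x_cf, [0, 1]))  # feature 3 changed but is not allowed
```

A check like this can be useful as a sanity test on any returned counterfactual, regardless of which optimizer produced it.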

A complete example of a classification task is given below:

```python
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.tree import DecisionTreeClassifier

from ceml.sklearn import generate_counterfactual


if __name__ == "__main__":
    # Load data
    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=4242)

    # Whitelist of features - list of features we can change/use when computing a counterfactual
    features_whitelist = None   # We can use all features

    # Create and fit model
    model = DecisionTreeClassifier(max_depth=3)
    model.fit(X_train, y_train)

    # Select data point for explaining its prediction
    x = X_test[1,:]
    print("Prediction on x: {0}".format(model.predict([x])))

    # Compute counterfactual
    print("\nCompute counterfactual ....")
    print(generate_counterfactual(model, x, y_target=0, features_whitelist=features_whitelist))
```

## Regression

The interface for computing a counterfactual of a regression model is exactly the same.

However, because it can be very difficult or even impossible (e.g. for a k-nearest neighbors or decision tree model) to achieve a requested prediction exactly, we can specify a tolerance range within which a prediction is accepted.

We can do so by defining a function that takes a prediction as input and returns True if the prediction is accepted (i.e. it lies in the range of tolerated predictions) and False otherwise. For instance, if our target value is 25.0 but we are also happy if the prediction deviates from it by no more than 0.5, we could come up with the following function:

```python
import numpy as np

done = lambda z: np.abs(z - 25.0) <= 0.5
```

This function can be passed as a value of the optional argument done to the ceml.sklearn.models.generate_counterfactual() function.
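As an illustration of why such a tolerance is needed, the sketch below wraps the check in a small factory and applies it to a toy piecewise-constant regressor that stands in for a decision tree. Both `make_done` and `toy_tree_predict` are hypothetical names invented for this example, and the output values of the toy model are made up. The built-in `abs` is used so the snippet runs without NumPy.

```python
def make_done(y_target, tolerance):
    """Build a predicate that accepts predictions within +/- tolerance of y_target."""
    def done(z):
        return abs(z - y_target) <= tolerance
    return done


def toy_tree_predict(x):
    """Toy stand-in for a decision tree regressor: it can only output one of
    three constant values, so a target of 25.0 is never hit exactly."""
    if x < 1.0:
        return 18.3
    if x < 2.0:
        return 24.7
    return 31.2


done = make_done(y_target=25.0, tolerance=0.5)

for x in (0.5, 1.5, 2.5):
    y = toy_tree_predict(x)
    print("x={0}, prediction={1}, accepted={2}".format(x, y, done(y)))
```

Only the middle region of the toy model yields a prediction inside the tolerated range, which is exactly the situation the tolerance function is meant to handle.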

A complete example of a regression task is given below:

```python
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import numpy as np
# Note: load_boston was removed in scikit-learn 1.2, so this example needs an older release
from sklearn.datasets import load_boston
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.linear_model import Ridge

from ceml.sklearn import generate_counterfactual


if __name__ == "__main__":
    # Load data
    X, y = load_boston(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=4242)

    # Whitelist of features - list of features we can change/use when computing a counterfactual
    features_whitelist = [0, 1, 2, 3, 4]    # Use the first five features only

    # Create and fit model
    model = Ridge()
    model.fit(X_train, y_train)

    # Select data point for explaining its prediction
    x = X_test[1,:]
    print("Prediction on x: {0}".format(model.predict([x])))

    # Compute counterfactual
    print("\nCompute counterfactual ....")
    y_target = 25.0
    # Since we might not be able to achieve y_target exactly, we tell ceml
    # that we are happy if we do not deviate more than 0.5 from it.
    done = lambda z: np.abs(y_target - z) <= 0.5
    print(generate_counterfactual(model, x, y_target=y_target, features_whitelist=features_whitelist, C=1.0, regularization="l2", optimizer="bfgs", done=done))
```

## Pipeline

Often our machine learning pipeline contains more than one model. For example, we might first scale the input and/or reduce its dimensionality before classifying it.

The interface for computing a counterfactual when using a pipeline is identical to the one when using a single model only. We can simply pass a sklearn.pipeline.Pipeline instance as the value of the parameter model to the function ceml.sklearn.models.generate_counterfactual().

Take a look at the ceml.sklearn.pipeline.PipelineCounterfactual class to see which preprocessing steps are supported.
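The following sketch illustrates the idea that a pipeline's prediction is the composition of preprocessing and model, and that a counterfactual is a change to the raw (unscaled) input. It does not show how ceml computes counterfactuals. It uses a hand-written standard scaling step, a toy linear classifier, and a brute-force search along a single feature; all function names and parameter values are hypothetical.

```python
def scale(x, mean, std):
    """Standardize each feature: (value - mean) / std."""
    return [(v - m) / s for v, m, s in zip(x, mean, std)]


def classify(z, weights, bias):
    """Toy linear classifier on scaled features: class 1 if score > 0, else 0."""
    score = sum(w * v for w, v in zip(weights, z)) + bias
    return 1 if score > 0 else 0


def pipeline_predict(x, mean, std, weights, bias):
    """Prediction of the whole pipeline: preprocessing first, then the model."""
    return classify(scale(x, mean, std), weights, bias)


def naive_counterfactual(x, y_target, feature, predict, step=0.1, max_steps=100):
    """Brute-force search (illustration only): change one raw feature in
    growing steps until the pipeline predicts y_target.
    Returns the modified input, or None if no such input was found."""
    for k in range(1, max_steps + 1):
        for sign in (1, -1):
            x_cf = list(x)
            x_cf[feature] = x[feature] + sign * k * step
            if predict(x_cf) == y_target:
                return x_cf
    return None


# Hypothetical pipeline parameters (not fitted on real data)
mean, std = [2.0, 10.0], [1.0, 5.0]
weights, bias = [1.0, 2.0], 0.0
predict = lambda x: pipeline_predict(x, mean, std, weights, bias)

x = [2.0, 5.0]
print("Prediction on x: {0}".format(predict(x)))

# Only the second feature (index 1) may be changed
x_cf = naive_counterfactual(x, y_target=1, feature=1, predict=predict)
print("Counterfactual: {0}, prediction: {1}".format(x_cf, predict(x_cf)))
```

Note that the search operates on the raw input and only ever calls the full pipeline, which matches the usage pattern of passing the whole pipeline as the model.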

A complete example of a classification pipeline with the standard scaler sklearn.preprocessing.StandardScaler and logistic regression sklearn.linear_model.LogisticRegression is given below:

```python
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.linear_model import LogisticRegression

from ceml.sklearn import generate_counterfactual


if __name__ == "__main__":
    # Load data
    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=4242)

    # Whitelist of features - list of features we can change/use when computing a counterfactual
    features_whitelist = [1, 3]   # Use the second and fourth feature only

    # Create and fit the pipeline
    scaler = StandardScaler()
    model = LogisticRegression(solver='lbfgs', multi_class='multinomial')   # Note that ceml requires: multi_class='multinomial'

    model = make_pipeline(scaler, model)
    model.fit(X_train, y_train)

    # Select data point for explaining its prediction
    x = X_test[1,:]
    print("Prediction on x: {0}".format(model.predict([x])))

    # Compute counterfactual
    print("\nCompute counterfactual ....")
    print(generate_counterfactual(model, x, y_target=0, features_whitelist=features_whitelist))
```