ceml.sklearn

ceml.sklearn.counterfactual

class ceml.sklearn.counterfactual.SklearnCounterfactual(model, **kwds)

Bases: ceml.model.counterfactual.Counterfactual, abc.ABC

Base class for computing a counterfactual of a sklearn model.

The SklearnCounterfactual class can compute counterfactuals of sklearn models.

Parameters

model (object) – The sklearn model that is used for computing the counterfactual.

model

An instance of a sklearn model.

Type

object

mymodel

Rebuild model.

Type

instance of ceml.model.ModelWithLoss

Note

The class SklearnCounterfactual cannot be instantiated because it contains an abstract method.
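The behaviour described in the note follows from Python's abc machinery. The following is a minimal sketch using hypothetical stand-in classes (BaseCounterfactual and IdentityCounterfactual are illustrative names, not part of ceml); it assumes the constructor calls rebuild_model to populate mymodel.

```python
from abc import ABC, abstractmethod


class BaseCounterfactual(ABC):
    """Hypothetical stand-in for an abstract base such as SklearnCounterfactual."""

    def __init__(self, model):
        self.model = model
        # Assumption: the wrapped model is built by the subclass-specific hook.
        self.mymodel = self.rebuild_model(model)

    @abstractmethod
    def rebuild_model(self, model):
        """Wrap the given model; every concrete subclass must provide this."""


class IdentityCounterfactual(BaseCounterfactual):
    """Hypothetical subclass that keeps the model unchanged."""

    def rebuild_model(self, model):
        return model


def try_instantiate(cls, model):
    """Return an instance, or None if Python refuses to instantiate cls."""
    try:
        return cls(model)
    except TypeError:
        return None


base_instance = try_instantiate(BaseCounterfactual, model="toy")
child_instance = try_instantiate(IdentityCounterfactual, model="toy")
```

Only the subclass that implements rebuild_model can be created; the abstract base is rejected with a TypeError.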

compute_counterfactual(x, y_target, features_whitelist=None, regularization='l1', C=1.0, optimizer='auto', optimizer_args=None, return_as_dict=True, done=None)

Computes a counterfactual of a given input x.

Parameters
  • x (numpy.ndarray) – The data point x whose prediction has to be explained.

  • y_target (int or float) – The requested prediction of the counterfactual.

  • features_whitelist (list(int), optional) –

    List of feature indices (dimensions of the input space) that can be used when computing the counterfactual.

    If features_whitelist is None, all features can be used.

    The default is None.

  • regularization (str or ceml.costfunctions.costfunctions.CostFunction, optional) –

    Regularizer of the counterfactual. Penalty for deviating from the original input x. Supported values:

    • l1: Penalizes the absolute deviation.

    • l2: Penalizes the squared deviation.

    regularization can be a description of the regularization, an instance of ceml.costfunctions.costfunctions.CostFunction (or ceml.costfunctions.costfunctions.CostFunctionDifferentiable if the cost function is differentiable) or None if no regularization is requested.

    If regularization is None, no regularization is used.

    The default is “l1”.

  • C (float or list(float), optional) –

    The regularization strength. If C is a list, all values in C are tried and as soon as a counterfactual is found, this counterfactual is returned and no other values of C are tried.

    If no regularization is used (regularization=None), C is ignored.

    The default is 1.0.

  • optimizer (str or instance of ceml.optim.optimizer.Optimizer, optional) –

    Name/Identifier of the optimizer that is used for computing the counterfactual. See ceml.optim.optimizer.prepare_optim() for details.

    Use “auto” if you do not know what optimizer to use - a suitable optimizer is chosen automatically.

    As an alternative, we can use any (custom) optimizer that is derived from the ceml.optim.optimizer.Optimizer class.

    Some models (see paper) support the use of mathematical programs for computing counterfactuals. In this case, you can use the option “mp” - please read the documentation of the corresponding model for further information.

    The default is “auto”.

  • optimizer_args (dict, optional) –

    Dictionary for overriding the default hyperparameters of the optimization algorithm.

    The default is None.

  • return_as_dict (boolean, optional) –

    If True, returns the counterfactual, its prediction and the needed changes to the input as dictionary. If False, the results are returned as a triple.

    The default is True.

  • done (callable, optional) –

    A callable that returns True if a counterfactual with a given output/prediction is accepted and False otherwise.

    If done is None, the output/prediction of the counterfactual must match y_target exactly.

    The default is None.

    Note

    In case of a regression it might not always be possible to achieve a given output/prediction exactly.

Returns

A dictionary where the counterfactual is stored in ‘x_cf’, its prediction in ‘y_cf’ and the changes to the original input in ‘delta’.

(x_cf, y_cf, delta) : triple if return_as_dict is False

Return type

dict or triple

Raises

Exception – If no counterfactual was found.
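The handling of a list-valued C and the two return forms can be sketched as follows. This is an illustration only: solve_with_strengths and toy_solver are hypothetical helpers, and the sign convention delta = x - x_cf is an assumption, not taken from the library.

```python
def solve_with_strengths(C, solve_for_c, return_as_dict=True):
    """Try each regularization strength in order and stop at the first success.

    solve_for_c is a hypothetical callable returning (x_cf, y_cf, delta)
    for a given strength, or None if no counterfactual was found.
    """
    strengths = C if isinstance(C, list) else [C]
    for c in strengths:
        result = solve_for_c(c)
        if result is None:
            continue  # this strength failed, try the next one
        x_cf, y_cf, delta = result
        if return_as_dict:
            return {"x_cf": x_cf, "y_cf": y_cf, "delta": delta}
        return x_cf, y_cf, delta
    raise Exception("No counterfactual found")


def toy_solver(c):
    """Toy stand-in: the search only succeeds for weak regularization."""
    if c > 0.5:
        return None
    x_orig = [1.0, 2.0]
    x_cf = [1.0, 3.5]
    # Assumed convention: delta is the original input minus the counterfactual.
    delta = [a - b for a, b in zip(x_orig, x_cf)]
    return x_cf, 1, delta


as_dict = solve_with_strengths([1.0, 0.1, 0.01], toy_solver)
as_triple = solve_with_strengths(0.1, toy_solver, return_as_dict=False)
```

In this sketch the value 0.01 is never tried, because 0.1 already yields a counterfactual.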

abstract rebuild_model(model)

Rebuilds a sklearn model.

Converts a sklearn model into a ceml.model.ModelWithLoss instance so that we have a model-specific cost function and can compute the derivative with respect to the input.

Parameters

model – The sklearn model that is used for computing the counterfactual.

Returns

The wrapped model

Return type

ceml.model.ModelWithLoss

ceml.sklearn.plausibility

ceml.sklearn.plausibility.prepare_computation_of_plausible_counterfactuals(X, y, gmms, projection_mean_sub=None, projection_matrix=None, density_thresholds=None)

Computes all steps that are independent of a concrete sample when computing plausible counterfactual explanations. Because computing a plausible counterfactual involves a considerable amount of work that does not depend on the concrete sample we want to explain, it makes sense to precompute as much as possible and avoid redundant computations.

Parameters
  • X (numpy.ndarray) – Data points.

  • y (numpy.ndarray) – Labels of data points X. Assumed to be [0, 1, 2, …].

  • gmms (list(int)) – List of class dependent Gaussian Mixture Models (GMMs).

  • projection_mean_sub (numpy.ndarray, optional) –

    The negative bias of the affine preprocessing.

    The default is None.

  • projection_matrix (numpy.ndarray, optional) –

    The projection matrix of the affine preprocessing.

    The default is None.

  • density_thresholds (float, optional) –

    Density threshold at which we consider a counterfactual to be plausible.

    If no density threshold is specified (density_thresholds is set to None), the median density of the samples X is chosen as a threshold.

    The default is None.

Returns

All precomputable quantities needed for the computation of plausible counterfactuals.

Return type

dict
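The default threshold rule (median density of the samples) can be illustrated with a small sketch. It assumes a single one-dimensional Gaussian in place of a fitted class-dependent GMM; gaussian_density, median_density_threshold and is_plausible are hypothetical helper names.

```python
import math
import statistics


def gaussian_density(x, mean, std):
    """Density of a one-dimensional Gaussian (stand-in for a fitted GMM)."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def median_density_threshold(samples, mean, std):
    """Median of the sample densities, used as the plausibility cut-off."""
    return statistics.median(gaussian_density(x, mean, std) for x in samples)


samples = [-2.0, -1.0, 0.0, 1.0, 2.0]
threshold = median_density_threshold(samples, mean=0.0, std=1.0)


def is_plausible(x):
    """A point counts as plausible if its density reaches the threshold."""
    return gaussian_density(x, 0.0, 1.0) >= threshold
```

With this rule, roughly half of the training samples lie at or above the threshold, so a counterfactual is only accepted in regions that are at least as dense as a typical sample.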

ceml.sklearn.decisiontree

class ceml.sklearn.decisiontree.DecisionTreeCounterfactual(model, **kwds)

Bases: ceml.sklearn.counterfactual.SklearnCounterfactual, ceml.sklearn.decisiontree.PlausibleCounterfactualOfDecisionTree

Class for computing a counterfactual of a decision tree model.

See parent class ceml.sklearn.counterfactual.SklearnCounterfactual.

compute_all_counterfactuals(x, y_target, features_whitelist=None, regularization='l1')

Computes all counterfactuals of a given input x.

Parameters
  • model (a sklearn.tree.DecisionTreeClassifier or sklearn.tree.DecisionTreeRegressor instance.) – The decision tree model that is used for computing the counterfactual.

  • x (numpy.ndarray) – The input x whose prediction is supposed to be explained.

  • y_target (int or float or a callable that returns True if a given prediction is accepted.) – The requested prediction of the counterfactual.

  • features_whitelist (list(int), optional) –

    List of feature indices (dimensions of the input space) that can be used when computing the counterfactual.

    If features_whitelist is None, all features can be used.

    The default is None.

  • regularization (str or callable, optional) –

    Regularizer of the counterfactual. Penalty for deviating from the original input x.

    Supported values:

    • l1: Penalizes the absolute deviation.

    • l2: Penalizes the squared deviation.

    You can use your own custom penalty function by setting regularization to a callable that can be called on a potential counterfactual and returns a scalar.

    If regularization is None, no regularization is used.

    The default is “l1”.

Returns

List of all counterfactuals.

Return type

list(np.array)

Raises
  • TypeError – If an invalid argument is passed to the function.

  • ValueError – If no counterfactual exists.
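For intuition, enumerating counterfactuals of a decision tree can be sketched by visiting every leaf that predicts y_target and moving x just far enough to satisfy the decision path of that leaf. The code below is a simplified illustration on a hand-written tree, not the library's implementation; the tuple-based tree encoding, the eps offset, the whitelist handling and the final sorting are all assumptions.

```python
import math

# Internal nodes are (feature, threshold, left, right); leaves are predictions.
# Samples with x[feature] <= threshold go to the left child.
TREE = (0, 2.0,
        (1, 1.0, "A", "B"),
        "A")


def leaf_boxes(node, box=None):
    """Yield (prediction, box) per leaf; box maps feature -> (low, high]."""
    box = {} if box is None else box
    if not isinstance(node, tuple):
        yield node, box
        return
    feature, threshold, left, right = node
    low, high = box.get(feature, (-math.inf, math.inf))
    yield from leaf_boxes(left, {**box, feature: (low, min(high, threshold))})
    yield from leaf_boxes(right, {**box, feature: (max(low, threshold), high)})


def project(x, box, eps=1e-3):
    """Move x into the box while changing each feature as little as possible."""
    x_cf = list(x)
    for feature, (low, high) in box.items():
        if x_cf[feature] <= low:
            x_cf[feature] = low + eps  # lower bound is exclusive
        elif x_cf[feature] > high:
            x_cf[feature] = high       # upper bound is inclusive
    return x_cf


def all_counterfactuals(tree, x, y_target, features_whitelist=None):
    """One candidate per leaf predicting y_target, sorted by l1 distance to x."""
    candidates = []
    for prediction, box in leaf_boxes(tree):
        if prediction != y_target:
            continue
        x_cf = project(x, box)
        changed = {i for i, (a, b) in enumerate(zip(x, x_cf)) if a != b}
        if features_whitelist is not None and not changed <= set(features_whitelist):
            continue  # this candidate would touch a forbidden feature
        candidates.append(x_cf)
    if not candidates:
        raise ValueError("No counterfactual exists")
    return sorted(candidates, key=lambda c: sum(abs(a - b) for a, b in zip(x, c)))


x = [1.0, 0.5]  # reaches the first leaf and is predicted "A"
counterfactuals = all_counterfactuals(TREE, x, "B")
```

Because every leaf corresponds to an axis-aligned box, no gradient-based optimizer is needed in this sketch; each candidate is obtained by a direct projection.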

compute_counterfactual(x, y_target, features_whitelist=None, regularization='l1', C=None, optimizer=None, return_as_dict=True)

Computes a counterfactual of a given input x.

Parameters
  • model (a sklearn.tree.DecisionTreeClassifier or sklearn.tree.DecisionTreeRegressor instance.) – The decision tree model that is used for computing the counterfactual.

  • x (numpy.ndarray) – The input x whose prediction is supposed to be explained.

  • y_target (int or float or a callable that returns True if a given prediction is accepted.) – The requested prediction of the counterfactual.

  • features_whitelist (list(int), optional) –

    List of feature indices (dimensions of the input space) that can be used when computing the counterfactual.

    If features_whitelist is None, all features can be used.

    The default is None.

  • regularization (str or callable, optional) –

    Regularizer of the counterfactual. Penalty for deviating from the original input x.

    Supported values:

    • l1: Penalizes the absolute deviation.

    • l2: Penalizes the squared deviation.

    You can use your own custom penalty function by setting regularization to a callable that can be called on a potential counterfactual and returns a scalar.

    If regularization is None, no regularization is used.

    The default is “l1”.

  • C (None) –

    Not used - is always None.

    The only reason for including this parameter is to match the signature of other ceml.sklearn.counterfactual.SklearnCounterfactual children.

  • optimizer (None) –

    Not used - is always None.

    The only reason for including this parameter is to match the signature of other ceml.sklearn.counterfactual.SklearnCounterfactual children.

  • return_as_dict (boolean, optional) –

    If True, returns the counterfactual, its prediction and the needed changes to the input as dictionary. If False, the results are returned as a triple.

    The default is True.

Returns

A dictionary where the counterfactual is stored in ‘x_cf’, its prediction in ‘y_cf’ and the changes to the original input in ‘delta’.

(x_cf, y_cf, delta) : triple if return_as_dict is False

Return type

dict or triple

rebuild_model(model)

Rebuilds a sklearn.tree.DecisionTreeClassifier or sklearn.tree.DecisionTreeRegressor model.

Does nothing.

Parameters

model (instance of sklearn.tree.DecisionTreeClassifier or sklearn.tree.DecisionTreeRegressor) – The sklearn decision tree model.

Returns

Return type

None

Note

In contrast to many other SklearnCounterfactual instances, we do not rebuild the model because we neither need nor can compute gradients in a decision tree. We compute the set of counterfactuals without using a “common” optimization algorithm like Nelder-Mead.

ceml.sklearn.decisiontree.decisiontree_generate_counterfactual(model, x, y_target, features_whitelist=None, regularization='l1', return_as_dict=True, done=None, plausibility=None)

Computes a counterfactual of a given input x.

Parameters
  • model (a sklearn.tree.DecisionTreeClassifier or sklearn.tree.DecisionTreeRegressor instance.) – The decision tree model that is used for computing the counterfactual.

  • x (numpy.ndarray) – The input x whose prediction has to be explained.

  • y_target (int or float or a callable that returns True if a given prediction is accepted.) – The requested prediction of the counterfactual.

  • features_whitelist (list(int), optional) –

    List of feature indices (dimensions of the input space) that can be used when computing the counterfactual.

    If features_whitelist is None, all features can be used.

    The default is None.

  • regularization (str or callable, optional) –

    Regularizer of the counterfactual. Penalty for deviating from the original input x. Supported values:

    • l1: Penalizes the absolute deviation.

    • l2: Penalizes the squared deviation.

    You can use your own custom penalty function by setting regularization to a callable that can be called on a potential counterfactual and returns a scalar.

    If regularization is None, no regularization is used.

    The default is “l1”.

  • return_as_dict (boolean, optional) –

    If True, returns the counterfactual, its prediction and the needed changes to the input as dictionary. If False, the results are returned as a triple.

    The default is True.

  • done (callable, optional) – Not used.

  • plausibility (dict, optional.) –

    If set to a valid dictionary (see ceml.sklearn.plausibility.prepare_computation_of_plausible_counterfactuals()), a plausible counterfactual (as proposed in Artelt et al. 2020) is computed. Note that in this case, all other parameters are ignored.

    If plausibility is None, the closest counterfactual is computed.

    The default is None.

Returns

A dictionary where the counterfactual is stored in ‘x_cf’, its prediction in ‘y_cf’ and the changes to the original input in ‘delta’.

(x_cf, y_cf, delta) : triple if return_as_dict is False

Return type

dict or triple

Raises

Exception – If no counterfactual was found.
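The effect of the regularization argument, including a custom callable, can be sketched as ranking candidate counterfactuals by their penalty. The helper make_penalty and the candidate points are hypothetical and only illustrate the documented options.

```python
def make_penalty(regularization, x_orig):
    """Turn a regularization description into a function of a candidate x_cf."""
    if regularization is None:
        return lambda x_cf: 0.0
    if callable(regularization):
        return regularization  # custom penalty: candidate -> scalar
    if regularization == "l1":
        return lambda x_cf: sum(abs(a - b) for a, b in zip(x_cf, x_orig))
    if regularization == "l2":
        return lambda x_cf: sum((a - b) ** 2 for a, b in zip(x_cf, x_orig))
    raise TypeError("Unknown regularization: {!r}".format(regularization))


x_orig = [0.0, 0.0]
candidates = [[3.0, 0.0], [2.0, 2.0]]

closest_l1 = min(candidates, key=make_penalty("l1", x_orig))
closest_l2 = min(candidates, key=make_penalty("l2", x_orig))


def weighted(x_cf):
    """Custom penalty: changing the second feature is ten times as expensive."""
    return abs(x_cf[0] - x_orig[0]) + 10.0 * abs(x_cf[1] - x_orig[1])


closest_custom = min(candidates, key=make_penalty(weighted, x_orig))
```

Here l1 prefers the candidate that changes a single feature, while l2 prefers the one that spreads a smaller change over both features.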

ceml.sklearn.knn

class ceml.sklearn.knn.KNN(model, dist='l2', **kwds)

Bases: ceml.model.model.ModelWithLoss

Class for rebuilding/wrapping the sklearn.neighbors.KNeighborsClassifier and sklearn.neighbors.KNeighborsRegressor classes.

The KNN class rebuilds a sklearn knn model.

Parameters
  • model (instance of sklearn.neighbors.KNeighborsClassifier or sklearn.neighbors.KNeighborsRegressor) – The knn model.

  • dist (str or callable, optional) –

    Computes the distance between a prototype and a data point.

    Supported values:

    • l1: Penalizes the absolute deviation.

    • l2: Penalizes the squared deviation.

    You can use your own custom distance function by setting dist to a callable that can be called on a data point and returns a scalar.

    The default is “l2”.

    Note: dist must not be None.

X

The training data set.

Type

numpy.array

y

The ground truth of the training data set.

Type

numpy.array

dist

The distance function.

Type

callable

Raises

TypeError – If model is not an instance of sklearn.neighbors.KNeighborsClassifier or sklearn.neighbors.KNeighborsRegressor

get_loss(y_target, pred=None)

Creates and returns a loss function.

Builds a cost function where we penalize the minimum distance to the nearest prototype which is consistent with the target y_target.

Parameters
  • y_target (int) – The target class.

  • pred (callable, optional) –

    A callable that maps an input to an input. E.g. using the ceml.optim.input_wrapper.InputWrapper class.

    If pred is None, no transformation is applied to the input before passing it into the loss function.

    The default is None.

Returns

Initialized cost function. Target label is y_target.

Return type

ceml.backend.jax.costfunctions.TopKMinOfListDistCost
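The idea behind this loss can be sketched in plain Python. The version below only considers the single nearest training sample with the target label and ignores the top-k aspect of a real knn model; make_nearest_target_cost and squared_l2 are hypothetical names.

```python
def squared_l2(a, b):
    """Squared Euclidean distance between two points."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def make_nearest_target_cost(X, y, y_target, dist=squared_l2):
    """Build a cost: distance from x to the closest sample labelled y_target."""
    targets = [sample for sample, label in zip(X, y) if label == y_target]
    if not targets:
        raise ValueError("No training sample carries the target label")

    def cost(x):
        return min(dist(x, sample) for sample in targets)

    return cost


X = [[0.0, 0.0], [1.0, 0.0], [4.0, 4.0]]
y = [0, 0, 1]

cost_to_class_1 = make_nearest_target_cost(X, y, y_target=1)
cost_to_class_0 = make_nearest_target_cost(X, y, y_target=0)
```

Minimizing such a cost pulls the input towards the nearest sample of the requested class, which is what drives the prediction to change.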

predict(x)

Note

This function is a placeholder only.

This function does not predict anything and just returns the given input.

class ceml.sklearn.knn.KnnCounterfactual(model, dist='l2', **kwds)

Bases: ceml.sklearn.counterfactual.SklearnCounterfactual

Class for computing a counterfactual of a knn model.

See parent class ceml.sklearn.counterfactual.SklearnCounterfactual.

rebuild_model(model)

Rebuilds a sklearn.neighbors.KNeighborsClassifier or sklearn.neighbors.KNeighborsRegressor model.

Converts a sklearn.neighbors.KNeighborsClassifier or sklearn.neighbors.KNeighborsRegressor instance into a ceml.sklearn.knn.KNN instance.

Parameters

model (instance of sklearn.neighbors.KNeighborsClassifier or sklearn.neighbors.KNeighborsRegressor) – The sklearn knn model.

Returns

The wrapped knn model.

Return type

ceml.sklearn.knn.KNN

ceml.sklearn.knn.knn_generate_counterfactual(model, x, y_target, features_whitelist=None, dist='l2', regularization='l1', C=1.0, optimizer='nelder-mead', optimizer_args=None, return_as_dict=True, done=None)

Computes a counterfactual of a given input x.

Parameters
  • model (a sklearn.neighbors.KNeighborsClassifier or sklearn.neighbors.KNeighborsRegressor instance.) – The knn model that is used for computing the counterfactual.

  • x (numpy.ndarray) – The input x whose prediction has to be explained.

  • y_target (int or float or a callable that returns True if a given prediction is accepted.) – The requested prediction of the counterfactual.

  • features_whitelist (list(int), optional) –

    List of feature indices (dimensions of the input space) that can be used when computing the counterfactual.

    If features_whitelist is None, all features can be used.

    The default is None.

  • dist (str or callable, optional) –

    Computes the distance between a prototype and a data point.

    Supported values:

    • l1: Penalizes the absolute deviation.

    • l2: Penalizes the squared deviation.

    You can use your own custom distance function by setting dist to a callable that can be called on a data point and returns a scalar.

    The default is “l2”.

    Note: dist must not be None.

  • regularization (str or ceml.costfunctions.costfunctions.CostFunction, optional) –

    Regularizer of the counterfactual. Penalty for deviating from the original input x. Supported values:

    • l1: Penalizes the absolute deviation.

    • l2: Penalizes the squared deviation.

    regularization can be a description of the regularization, an instance of ceml.costfunctions.costfunctions.CostFunction (or ceml.costfunctions.costfunctions.CostFunctionDifferentiable if your cost function is differentiable) or None if no regularization is requested.

    If regularization is None, no regularization is used.

    The default is “l1”.

  • C (float or list(float), optional) –

    The regularization strength. If C is a list, all values in C are tried and as soon as a counterfactual is found, this counterfactual is returned and no other values of C are tried.

    C is ignored if no regularization is used (regularization=None).

    The default is 1.0.

  • optimizer (str or instance of ceml.optim.optimizer.Optimizer, optional) –

    Name/Identifier of the optimizer that is used for computing the counterfactual. See ceml.optim.optimizer.desc_to_optim() for details.

    As an alternative, we can use any (custom) optimizer that is derived from the ceml.optim.optimizer.Optimizer class.

    The default is “nelder-mead”.

  • optimizer_args (dict, optional) –

    Dictionary for overriding the default hyperparameters of the optimization algorithm.

    The default is None.

  • return_as_dict (boolean, optional) –

    If True, returns the counterfactual, its prediction and the needed changes to the input as dictionary. If False, the results are returned as a triple.

    The default is True.

  • done (callable, optional) –

    A callable that returns True if a counterfactual with a given output/prediction is accepted and False otherwise.

    If done is None, the output/prediction of the counterfactual must match y_target exactly.

    The default is None.

    Note

    In case of a regression it might not always be possible to achieve a given output/prediction exactly.

Returns

A dictionary where the counterfactual is stored in ‘x_cf’, its prediction in ‘y_cf’ and the changes to the original input in ‘delta’.

(x_cf, y_cf, delta) : triple if return_as_dict is False

Return type

dict or triple
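For regression targets, a done callable with a tolerance is a natural way to avoid requiring an exact match. A minimal sketch, assuming done is called with the prediction of a candidate counterfactual; make_done and the tolerance value are illustrative.

```python
def make_done(y_target, tolerance):
    """Accept any prediction within `tolerance` of the requested target."""

    def done(y_pred):
        return abs(y_pred - y_target) <= tolerance

    return done


done = make_done(y_target=25.0, tolerance=0.5)

accepted = done(25.3)  # within the tolerance band
rejected = done(26.0)  # too far from the target
```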

ceml.sklearn.linearregression

class ceml.sklearn.linearregression.LinearRegression(model, **kwds)

Bases: ceml.model.model.ModelWithLoss

Class for rebuilding/wrapping the sklearn.linear_model.base.LinearModel class

The LinearRegression class rebuilds a linear regression model from a given weight vector and intercept.

Parameters

model (instance of sklearn.linear_model.base.LinearModel) – The linear regression model (e.g. sklearn.linear_model.LinearRegression or sklearn.linear_model.Ridge).

w

The weight vector (a matrix if we have a multi-dimensional output).

Type

numpy.ndarray

b

The intercept/bias (a vector if we have a multi-dimensional output).

Type

numpy.ndarray

dim

Dimensionality of the input data.

Type

int

get_loss(y_target, pred=None)

Creates and returns a loss function.

Build a squared-error cost function where the target is y_target.

Parameters
  • y_target (float) – The target value.

  • pred (callable, optional) –

    A callable that maps an input to the output (regression).

    If pred is None, the class method predict is used for mapping the input to the output (regression).

    The default is None.

Returns

Initialized squared-error cost function. Target is y_target.

Return type

ceml.backend.jax.costfunctions.SquaredError
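The structure of such a loss can be sketched without any backend. The following assumes a scalar output; make_squared_error and the toy weights are hypothetical and stand in for the library's cost function class.

```python
def make_squared_error(y_target, pred):
    """Build a loss that measures how far pred(x) is from y_target."""

    def loss(x):
        return (pred(x) - y_target) ** 2

    return loss


# Toy linear model: prediction is w . x + b.
w = [2.0, -1.0]
b = 0.5


def predict(x):
    return sum(wi * xi for wi, xi in zip(w, x)) + b


loss = make_squared_error(y_target=3.0, pred=predict)

loss_far = loss([1.0, 1.0])    # prediction 1.5, so the loss is 1.5 ** 2
loss_hit = loss([1.0, -0.5])   # prediction 3.0, so the loss vanishes
```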

predict(x)

Predict the output of a given input.

Computes the regression on a given input x.

Parameters

x (numpy.ndarray) – The input x whose output is going to be predicted.

Returns

An array containing the predicted output.

Return type

jax.numpy.array

class ceml.sklearn.linearregression.LinearRegressionCounterfactual(model, **kwds)

Bases: ceml.sklearn.counterfactual.SklearnCounterfactual, ceml.optim.cvx.MathematicalProgram, ceml.optim.cvx.ConvexQuadraticProgram

Class for computing a counterfactual of a linear regression model.

See parent class ceml.sklearn.counterfactual.SklearnCounterfactual.

rebuild_model(model)

Rebuild a sklearn.linear_model.base.LinearModel model.

Converts a sklearn.linear_model.base.LinearModel into a ceml.sklearn.linearregression.LinearRegression.

Parameters

model (instance of sklearn.linear_model.base.LinearModel) – The sklearn linear regression model (e.g. sklearn.linear_model.LinearRegression or sklearn.linear_model.Ridge).

Returns

The wrapped linear regression model.

Return type

ceml.sklearn.linearregression.LinearRegression

ceml.sklearn.linearregression.linearregression_generate_counterfactual(model, x, y_target, features_whitelist=None, regularization='l1', C=1.0, optimizer='mp', optimizer_args=None, return_as_dict=True, done=None)

Computes a counterfactual of a given input x.

Parameters
  • model (a sklearn.linear_model.base.LinearModel instance.) – The linear regression model (e.g. sklearn.linear_model.LinearRegression or sklearn.linear_model.Ridge) that is used for computing the counterfactual.

  • x (numpy.ndarray) – The input x whose prediction has to be explained.

  • y_target (float) – The requested prediction of the counterfactual.

  • features_whitelist (list(int), optional) –

    List of feature indices (dimensions of the input space) that can be used when computing the counterfactual.

    If features_whitelist is None, all features can be used.

    The default is None.

  • regularization (str or ceml.costfunctions.costfunctions.CostFunction, optional) –

    Regularizer of the counterfactual. Penalty for deviating from the original input x.

    Supported values:

    • l1: Penalizes the absolute deviation.

    • l2: Penalizes the squared deviation.

    regularization can be a description of the regularization, an instance of ceml.costfunctions.costfunctions.CostFunction (or ceml.costfunctions.costfunctions.CostFunctionDifferentiable if your cost function is differentiable) or None if no regularization is requested.

    If regularization is None, no regularization is used.

    The default is “l1”.

  • C (float or list(float), optional) –

    The regularization strength. If C is a list, all values in C are tried and as soon as a counterfactual is found, this counterfactual is returned and no other values of C are tried.

    C is ignored if no regularization is used (regularization=None).

    The default is 1.0.

  • optimizer (str or instance of ceml.optim.optimizer.Optimizer, optional) –

    Name/Identifier of the optimizer that is used for computing the counterfactual. See ceml.optim.optimizer.prepare_optim() for details.

    Linear regression supports the use of mathematical programs for computing counterfactuals - set optimizer to “mp” for using a convex quadratic program for computing the counterfactual. Note that in this case the hyperparameter C is ignored.

    As an alternative, we can use any (custom) optimizer that is derived from the ceml.optim.optimizer.Optimizer class.

    The default is “mp”.

  • optimizer_args (dict, optional) –

    Dictionary for overriding the default hyperparameters of the optimization algorithm.

    The default is None.

  • return_as_dict (boolean, optional) –

    If True, returns the counterfactual, its prediction and the needed changes to the input as dictionary. If False, the results are returned as a triple.

    The default is True.

  • done (callable, optional) –

    A callable that returns True if a counterfactual with a given output/prediction is accepted and False otherwise.

    If done is None, the output/prediction of the counterfactual must match y_target exactly.

    The default is None.

    Note

    It might not always be possible to achieve a given output/prediction exactly.

Returns

A dictionary where the counterfactual is stored in ‘x_cf’, its prediction in ‘y_cf’ and the changes to the original input in ‘delta’.

(x_cf, y_cf, delta) : triple if return_as_dict is False

Return type

dict or triple

Raises

Exception – If no counterfactual was found.
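For a scalar-output linear model, the counterfactual with the smallest squared (l2) change has a closed form, which gives some intuition for why a mathematical program is attractive here. The sketch below uses that closed form instead of the convex program the library relies on, and it differs from the default l1 regularization; linear_counterfactual_l2 is a hypothetical helper and the sign of delta is an assumption.

```python
def linear_counterfactual_l2(w, b, x, y_target, features_whitelist=None):
    """Smallest l2 change to x such that w . x_cf + b equals y_target."""
    allowed = range(len(w)) if features_whitelist is None else features_whitelist
    # Features outside the whitelist must stay fixed, so their weight is masked.
    w_allowed = [w[i] if i in allowed else 0.0 for i in range(len(w))]
    norm_sq = sum(wi * wi for wi in w_allowed)
    if norm_sq == 0.0:
        raise Exception("No counterfactual found")

    y = sum(wi * xi for wi, xi in zip(w, x)) + b
    step = (y_target - y) / norm_sq
    x_cf = [xi + step * wi for xi, wi in zip(x, w_allowed)]
    y_cf = sum(wi * xi for wi, xi in zip(w, x_cf)) + b
    delta = [a - c for a, c in zip(x, x_cf)]  # assumed convention: x - x_cf
    return {"x_cf": x_cf, "y_cf": y_cf, "delta": delta}


w = [1.0, 2.0]
b = 0.0
x = [1.0, 1.0]  # current prediction is 3.0

free = linear_counterfactual_l2(w, b, x, y_target=8.0)
restricted = linear_counterfactual_l2(w, b, x, y_target=8.0, features_whitelist=[0])
```

Restricting the whitelist to the first feature forces a larger change in that feature, since the more influential second feature may no longer move.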

ceml.sklearn.lvq

class ceml.sklearn.lvq.CQPHelper(mymodel, x_orig, y_target, indices_other_prototypes, features_whitelist=None, regularization='l1', optimizer_args=None, **kwds)

Bases: ceml.optim.cvx.ConvexQuadraticProgram

class ceml.sklearn.lvq.LVQ(model, dist='l2', **kwds)

Bases: ceml.model.model.ModelWithLoss

Class for rebuilding/wrapping the sklearn_lvq.GlvqModel, sklearn_lvq.GmlvqModel, sklearn_lvq.LgmlvqModel, sklearn_lvq.RslvqModel, sklearn_lvq.MrslvqModel and sklearn_lvq.LmrslvqModel classes.

The LVQ class rebuilds a sklearn-lvq lvq model.

Parameters
  • model (instance of sklearn_lvq.GlvqModel, sklearn_lvq.GmlvqModel, sklearn_lvq.LgmlvqModel, sklearn_lvq.RslvqModel, sklearn_lvq.MrslvqModel or sklearn_lvq.LmrslvqModel) – The lvq model.

  • dist (str or callable, optional) –

    Computes the distance between a prototype and a data point.

    Supported values:

    • l1: Penalizes the absolute deviation.

    • l2: Penalizes the squared deviation.

    You can use your own custom distance function by setting dist to a callable that can be called on a data point and returns a scalar.

    The default is “l2”.

    Note: dist must not be None.

prototypes

The prototypes.

Type

numpy.array

labels

The labels of the prototypes.

Type

numpy.array

dist

The distance function.

Type

callable

model

The original sklearn-lvq model.

Type

object

model_class

The class of the sklearn-lvq model.

Type

class

dim

Dimensionality of the input data.

Type

int

Raises

TypeError – If model is not an instance of sklearn_lvq.GlvqModel, sklearn_lvq.GmlvqModel, sklearn_lvq.LgmlvqModel, sklearn_lvq.RslvqModel, sklearn_lvq.MrslvqModel or sklearn_lvq.LmrslvqModel

get_loss(y_target, pred=None)

Creates and returns a loss function.

Builds a cost function where we penalize the minimum distance to the nearest prototype which is consistent with the target y_target.

Parameters
  • y_target (int) – The target class.

  • pred (callable, optional) –

    A callable that maps an input to an input. E.g. using the ceml.optim.input_wrapper.InputWrapper class.

    If pred is None, no transformation is applied to the input before putting it into the loss function.

    The default is None.

Returns

Initialized cost function. Target label is y_target.

Return type

ceml.backend.jax.costfunctions.MinOfListDistCost

predict(x)

Note

This function is a placeholder only.

This function does not predict anything and just returns the given input.

class ceml.sklearn.lvq.LvqCounterfactual(model, dist='l2', cqphelper=<class 'ceml.sklearn.lvq.CQPHelper'>, **kwds)

Bases: ceml.sklearn.counterfactual.SklearnCounterfactual, ceml.optim.cvx.MathematicalProgram, ceml.optim.cvx.DCQP

Class for computing a counterfactual of a lvq model.

See parent class ceml.sklearn.counterfactual.SklearnCounterfactual.

rebuild_model(model)

Rebuilds a sklearn_lvq.GlvqModel, sklearn_lvq.GmlvqModel, sklearn_lvq.LgmlvqModel, sklearn_lvq.RslvqModel, sklearn_lvq.MrslvqModel or sklearn_lvq.LmrslvqModel model.

Converts a sklearn_lvq.GlvqModel, sklearn_lvq.GmlvqModel, sklearn_lvq.LgmlvqModel, sklearn_lvq.RslvqModel, sklearn_lvq.MrslvqModel or sklearn_lvq.LmrslvqModel instance into a ceml.sklearn.lvq.LVQ instance.

Parameters

model (instance of sklearn_lvq.GlvqModel, sklearn_lvq.GmlvqModel, sklearn_lvq.LgmlvqModel, sklearn_lvq.RslvqModel, sklearn_lvq.MrslvqModel or sklearn_lvq.LmrslvqModel) – The sklearn-lvq lvq model.

Returns

The wrapped lvq model.

Return type

ceml.sklearn.lvq.LVQ

solve(x_orig, y_target, regularization, features_whitelist, return_as_dict, optimizer_args)

Approximately solves the DCQP by using the penalty convex-concave procedure.

Parameters

x0 (numpy.ndarray) – The initial data point for the penalty convex-concave procedure - this could be anything, however a “good” initial solution might lead to a better result.

ceml.sklearn.lvq.lvq_generate_counterfactual(model, x, y_target, features_whitelist=None, dist='l2', regularization='l1', C=1.0, optimizer='auto', optimizer_args=None, return_as_dict=True, done=None)

Computes a counterfactual of a given input x.

Parameters
  • model (a sklearn_lvq.GlvqModel, sklearn_lvq.GmlvqModel, sklearn_lvq.LgmlvqModel, sklearn_lvq.RslvqModel, sklearn_lvq.MrslvqModel or sklearn_lvq.LmrslvqModel instance.) –

    The lvq model that is used for computing the counterfactual.

    Note: Only lvq models from sklearn-lvq are supported.

  • x (numpy.ndarray) – The input x whose prediction has to be explained.

  • y_target (int or float or a callable that returns True if a given prediction is accepted.) – The requested prediction of the counterfactual.

  • features_whitelist (list(int), optional) –

    List of feature indices (dimensions of the input space) that can be used when computing the counterfactual.

    If features_whitelist is None, all features can be used.

    The default is None.

  • dist (str or callable, optional) –

    Computes the distance between a prototype and a data point.

    Supported values:

    • l1: Penalizes the absolute deviation.

    • l2: Penalizes the squared deviation.

    You can use your own custom distance function by setting dist to a callable that can be called on a data point and returns a scalar.

    The default is “l2”.

    Note: dist must not be None.

  • regularization (str or callable, optional) –

    Regularizer of the counterfactual. Penalty for deviating from the original input x. Supported values:

    • l1: Penalizes the absolute deviation.

    • l2: Penalizes the squared deviation.

    regularization can be a description of the regularization, an instance of ceml.costfunctions.costfunctions.CostFunction (or ceml.costfunctions.costfunctions.CostFunctionDifferentiable if your cost function is differentiable) or None if no regularization is requested.

    If regularization is None, no regularization is used.

    The default is “l1”.

  • C (float or list(float), optional) –

    The regularization strength. If C is a list, all values in C are tried and as soon as a counterfactual is found, this counterfactual is returned and no other values of C are tried.

    C is ignored if no regularization is used (regularization=None).

    The default is 1.0.

  • optimizer (str or instance of ceml.optim.optimizer.Optimizer, optional) –

    Name/Identifier of the optimizer that is used for computing the counterfactual. See ceml.optim.optimizer.prepare_optim() for details.

    As an alternative, we can use any (custom) optimizer that is derived from the ceml.optim.optimizer.Optimizer class.

    Use “auto” if you do not know what optimizer to use - a suitable optimizer is chosen automatically.

    The default is “auto”.

    Learning vector quantization supports the use of mathematical programs for computing counterfactuals - set optimizer to “mp” for using a convex quadratic program (G(M)LVQ) or a DCQP (otherwise) for computing the counterfactual. Note that in this case the hyperparameter C is ignored. Because the DCQP is a non-convex problem, we are not guaranteed to find the best solution (it might even happen that we do not find a solution at all) - we use the penalty convex-concave procedure for approximately solving the DCQP.

  • optimizer_args (dict, optional) –

    Dictionary for overriding the default hyperparameters of the optimization algorithm.

    The default is None.

  • return_as_dict (boolean, optional) –

    If True, returns the counterfactual, its prediction and the needed changes to the input as dictionary. If False, the results are returned as a triple.

    The default is True.

  • done (callable, optional) – Not used.

Returns

A dictionary where the counterfactual is stored in ‘x_cf’, its prediction in ‘y_cf’ and the changes to the original input in ‘delta’.

(x_cf, y_cf, delta) : triple if return_as_dict is False

Return type

dict or triple

Raises

Exception – If no counterfactual was found.
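
The handling of a list-valued C described above can be pictured as a simple loop. The following is a minimal sketch and not ceml's actual implementation: first_successful_c and solve_for_c are hypothetical names, and the toy solver only stands in for a real optimizer run.

```python
def first_successful_c(c_values, solve_for_c):
    """Try each regularization strength in order; return the first result found.

    `solve_for_c` is a hypothetical callable that returns a counterfactual
    (any object) or None if no counterfactual was found for that C.
    """
    # Accept a single float as well as a list of floats.
    if not isinstance(c_values, (list, tuple)):
        c_values = [c_values]

    for c in c_values:
        result = solve_for_c(c)
        if result is not None:
            return c, result  # stop early: the remaining values are not tried

    # Mirrors the documented behavior of raising if nothing was found.
    raise Exception("No counterfactual found")


# Toy solver (assumption for illustration): succeeds only for C <= 0.1.
tried = []

def toy_solver(c):
    tried.append(c)
    return [0.0, 1.0] if c <= 0.1 else None

c_used, x_cf = first_successful_c([1.0, 0.1, 0.01], toy_solver)
```

Here the value 0.01 is never tried because 0.1 already produced a result.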

ceml.sklearn.models

ceml.sklearn.models.generate_counterfactual(model, x, y_target, features_whitelist=None, dist='l2', regularization='l1', C=1.0, optimizer='auto', optimizer_args=None, return_as_dict=True, done=None)

Computes a counterfactual of a given input x.

Parameters
  • model (object) – The sklearn model that is used for computing the counterfactual.

  • x (numpy.ndarray) – The input x whose prediction has to be explained.

  • y_target (int or float or a callable that returns True if a given prediction is accepted.) – The requested prediction of the counterfactual.

  • features_whitelist (list(int), optional) –

    List of feature indices (dimensions of the input space) that can be used when computing the counterfactual.

    If features_whitelist is None, all features can be used.

    The default is None.

  • dist (str or callable, optional) –

    Computes the distance between a prototype and a data point.

    Supported values:

    • l1: Penalizes the absolute deviation.

    • l2: Penalizes the squared deviation.

    You can use your own custom distance function by setting dist to a callable that can be called on a data point and returns a scalar.

    The default is “l2”.

    Note: dist must not be None.

    Note

    Only needed if model is an LVQ or KNN model.

  • regularization (str or callable, optional) –

    Regularizer of the counterfactual. Penalty for deviating from the original input x.

    Supported values:

    • l1: Penalizes the absolute deviation.

    • l2: Penalizes the squared deviation.

    regularization can be a description of the regularization, an instance of ceml.costfunctions.costfunctions.CostFunction (or ceml.costfunctions.costfunctions.CostFunctionDifferentiable if your cost function is differentiable) or None if no regularization is requested.

    If regularization is None, no regularization is used.

    The default is “l1”.

  • C (float or list(float), optional) –

    The regularization strength. If C is a list, all values in C are tried and as soon as a counterfactual is found, this counterfactual is returned and no other values of C are tried.

    C is ignored if no regularization is used (regularization=None).

    The default is 1.0.

  • optimizer (str or instance of ceml.optim.optimizer.Optimizer, optional) –

    Name/Identifier of the optimizer that is used for computing the counterfactual. See ceml.optim.optimizer.prepare_optim() for details.

    Use “auto” if you do not know what optimizer to use - a suitable optimizer is chosen automatically.

    As an alternative, we can use any (custom) optimizer that is derived from the ceml.optim.optimizer.Optimizer class.

    The default is “auto”.

  • optimizer_args (dict, optional) –

    Dictionary for overriding the default hyperparameters of the optimization algorithm.

    The default is None.

  • return_as_dict (boolean, optional) –

    If True, returns the counterfactual, its prediction and the needed changes to the input as dictionary. If False, the results are returned as a triple.

    The default is True.

  • done (callable, optional) –

    A callable that returns True if a counterfactual with a given output/prediction is accepted and False otherwise.

    If done is None, the output/prediction of the counterfactual must match y_target exactly.

    The default is None.

    Note

    In case of a regression it might not always be possible to achieve a given output/prediction exactly.

Returns

A dictionary where the counterfactual is stored in ‘x_cf’, its prediction in ‘y_cf’ and the changes to the original input in ‘delta’.

(x_cf, y_cf, delta) : triple if return_as_dict is False

Return type

dict or triple

Raises

ValueError – If model contains an unsupported model.
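
The two return formats can be illustrated with a small helper. This is a sketch under assumptions: package_result is a hypothetical name, plain Python lists stand in for numpy arrays, and the sign convention of delta (original minus counterfactual) is assumed rather than taken from this reference.

```python
def package_result(x_orig, x_cf, y_cf, return_as_dict=True):
    """Return a counterfactual either as a dictionary or as a triple."""
    # Assumed convention: delta = x_orig - x_cf (element-wise).
    delta = [a - b for a, b in zip(x_orig, x_cf)]
    if return_as_dict:
        return {"x_cf": x_cf, "y_cf": y_cf, "delta": delta}
    return x_cf, y_cf, delta


as_dict = package_result([1.0, 2.0], [0.5, 2.0], 1)
as_triple = package_result([1.0, 2.0], [0.5, 2.0], 1, return_as_dict=False)
```

With return_as_dict=True the fields are accessed by key (as_dict["x_cf"]); with False they are unpacked positionally.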

ceml.sklearn.naivebayes

class ceml.sklearn.naivebayes.GaussianNB(model, **kwds)

Bases: ceml.model.model.ModelWithLoss

Class for rebuilding/wrapping the sklearn.naive_bayes.GaussianNB class.

The GaussianNB class rebuilds a gaussian naive bayes model from a given set of parameters (priors, means and variances).

Parameters

model (instance of sklearn.naive_bayes.GaussianNB) – The gaussian naive bayes model.

class_priors

Class-dependent priors.

Type

numpy.ndarray

means

Class- and feature-dependent means.

Type

numpy.ndarray

variances

Class- and feature-dependent variances.

Type

numpy.ndarray

dim

Dimensionality of the input data.

Type

int

is_binary

True if model is a binary classifier, False otherwise.

Type

boolean

get_loss(y_target, pred=None)

Creates and returns a loss function.

Builds a negative-log-likelihood cost function where the target is y_target.

Parameters
  • y_target (int) – The target class.

  • pred (callable, optional) –

    A callable that maps an input to the output (class probabilities).

    If pred is None, the class method predict is used for mapping the input to the output (class probabilities).

    The default is None.

Returns

Initialized negative-log-likelihood cost function. Target label is y_target.

Return type

ceml.backend.jax.costfunctions.NegLogLikelihoodCost

predict(x)

Predict the output of a given input.

Computes the class probabilities for a given input x.

Parameters

x (numpy.ndarray) – The input x that is going to be classified.

Returns

An array containing the class probabilities.

Return type

jax.numpy.array
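
To make the roles of class_priors, means and variances concrete, the class probabilities of a Gaussian naive Bayes model and the negative log-likelihood of a target class can be sketched in pure Python. This is an illustration only: the function names are hypothetical, lists replace arrays, and the actual class works on jax arrays.

```python
import math

def gaussian_nb_probs(x, class_priors, means, variances):
    """Class probabilities of a Gaussian naive Bayes model (sketch)."""
    log_joint = []
    for prior, mu, var in zip(class_priors, means, variances):
        # log p(y) + sum_j log N(x_j | mu_j, var_j); features are independent
        ll = math.log(prior)
        for xj, mj, vj in zip(x, mu, var):
            ll += -0.5 * math.log(2.0 * math.pi * vj) - (xj - mj) ** 2 / (2.0 * vj)
        log_joint.append(ll)
    # Normalize with the log-sum-exp trick for numerical stability.
    m = max(log_joint)
    exps = [math.exp(v - m) for v in log_joint]
    total = sum(exps)
    return [e / total for e in exps]

def neg_log_likelihood(probs, y_target):
    """Cost that is small when the target class has high probability."""
    return -math.log(probs[y_target])

probs = gaussian_nb_probs(
    x=[0.0, 0.0],
    class_priors=[0.5, 0.5],
    means=[[0.0, 0.0], [2.0, 2.0]],
    variances=[[1.0, 1.0], [1.0, 1.0]],
)
```

Because x coincides with the mean of class 0, the cost for y_target=0 is lower than for y_target=1.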

class ceml.sklearn.naivebayes.GaussianNbCounterfactual(model, **kwds)

Bases: ceml.sklearn.counterfactual.SklearnCounterfactual, ceml.optim.cvx.MathematicalProgram, ceml.optim.cvx.SDP, ceml.optim.cvx.DCQP

Class for computing a counterfactual of a gaussian naive bayes model.

See parent class ceml.sklearn.counterfactual.SklearnCounterfactual.

rebuild_model(model)

Rebuild a sklearn.naive_bayes.GaussianNB model.

Converts a sklearn.naive_bayes.GaussianNB into a ceml.sklearn.naivebayes.GaussianNB.

Parameters

model (instance of sklearn.naive_bayes.GaussianNB) – The sklearn gaussian naive bayes model.

Returns

The wrapped gaussian naive bayes model.

Return type

ceml.sklearn.naivebayes.GaussianNB

solve(x_orig, y_target, regularization, features_whitelist, return_as_dict, optimizer_args)

Approximately solves the DCQP by using the penalty convex-concave procedure.

Parameters

x0 (numpy.ndarray) – The initial data point for the penalty convex-concave procedure - this could be anything, however a “good” initial solution might lead to a better result.

ceml.sklearn.naivebayes.gaussiannb_generate_counterfactual(model, x, y_target, features_whitelist=None, regularization='l1', C=1.0, optimizer='auto', optimizer_args=None, return_as_dict=True, done=None)

Computes a counterfactual of a given input x.

Parameters
  • model (a sklearn.naive_bayes.GaussianNB instance.) – The gaussian naive bayes model that is used for computing the counterfactual.

  • x (numpy.ndarray) – The input x whose prediction has to be explained.

  • y_target (int or float or a callable that returns True if a given prediction is accepted.) – The requested prediction of the counterfactual.

  • features_whitelist (list(int), optional) –

    List of feature indices (dimensions of the input space) that can be used when computing the counterfactual.

    If features_whitelist is None, all features can be used.

    The default is None.

  • regularization (str or ceml.costfunctions.costfunctions.CostFunction, optional) –

    Regularizer of the counterfactual. Penalty for deviating from the original input x. Supported values:

    • l1: Penalizes the absolute deviation.

    • l2: Penalizes the squared deviation.

    regularization can be a description of the regularization, an instance of ceml.costfunctions.costfunctions.CostFunction (or ceml.costfunctions.costfunctions.CostFunctionDifferentiable if your cost function is differentiable) or None if no regularization is requested.

    If regularization is None, no regularization is used.

    The default is “l1”.

  • C (float or list(float), optional) –

    The regularization strength. If C is a list, all values in C are tried and as soon as a counterfactual is found, this counterfactual is returned and no other values of C are tried.

    C is ignored if no regularization is used (regularization=None).

    The default is 1.0.

  • optimizer (str or instance of ceml.optim.optimizer.Optimizer, optional) –

    Name/Identifier of the optimizer that is used for computing the counterfactual. See ceml.optim.optimizer.prepare_optim() for details.

    As an alternative, we can use any (custom) optimizer that is derived from the ceml.optim.optimizer.Optimizer class.

    Use “auto” if you do not know what optimizer to use - a suitable optimizer is chosen automatically.

    The default is “auto”.

    Gaussian naive Bayes supports the use of mathematical programs for computing counterfactuals - set optimizer to “mp” for using a semi-definite program (binary classifier) or a DCQP (otherwise) for computing the counterfactual. Note that in this case the hyperparameter C is ignored. Because the DCQP is a non-convex problem, we are not guaranteed to find the best solution (it might even happen that we do not find a solution at all) - we use the penalty convex-concave procedure for approximately solving the DCQP.

  • optimizer_args (dict, optional) –

    Dictionary for overriding the default hyperparameters of the optimization algorithm.

    The default is None.

  • return_as_dict (boolean, optional) –

    If True, returns the counterfactual, its prediction and the needed changes to the input as dictionary. If False, the results are returned as a triple.

    The default is True.

  • done (callable, optional) – Not used.

Returns

A dictionary where the counterfactual is stored in ‘x_cf’, its prediction in ‘y_cf’ and the changes to the original input in ‘delta’.

(x_cf, y_cf, delta) : triple if return_as_dict is False

Return type

dict or triple

Raises

Exception – If no counterfactual was found.
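
One way to think about features_whitelist is as a 0/1 mask that blocks changes to all features not listed. The sketch below is an assumption about how such a mask could be built and applied; whitelist_mask and apply_masked_step are hypothetical helpers, not part of ceml.

```python
def whitelist_mask(dim, features_whitelist=None):
    """Build a mask with 1.0 for features that may change and 0.0 otherwise."""
    if features_whitelist is None:
        return [1.0] * dim  # all features can be used
    allowed = set(features_whitelist)
    return [1.0 if i in allowed else 0.0 for i in range(dim)]

def apply_masked_step(x, step, mask):
    """Apply an update to x, leaving masked-out features untouched."""
    return [xi + mi * si for xi, si, mi in zip(x, step, mask)]

mask = whitelist_mask(3, features_whitelist=[0, 2])
x_new = apply_masked_step([1.0, 2.0, 3.0], [0.5, 0.5, 0.5], mask)
```

Feature 1 keeps its original value because it is not in the whitelist.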

ceml.sklearn.lda

class ceml.sklearn.lda.Lda(model, **kwds)

Bases: ceml.model.model.ModelWithLoss

Class for rebuilding/wrapping the sklearn.discriminant_analysis.LinearDiscriminantAnalysis class.

The Lda class rebuilds an lda model from the given parameters.

Parameters

model (instance of sklearn.discriminant_analysis.LinearDiscriminantAnalysis) – The lda model.

class_priors

Class-dependent priors.

Type

numpy.ndarray

means

Class-dependent means.

Type

numpy.ndarray

sigma_inv

Inverted covariance matrix.

Type

numpy.ndarray

dim

Dimensionality of the input data.

Type

int

Raises

TypeError – If model is not an instance of sklearn.discriminant_analysis.LinearDiscriminantAnalysis

get_loss(y_target, pred=None)

Creates and returns a loss function.

Builds a negative-log-likelihood cost function where the target is y_target.

Parameters
  • y_target (int) – The target class.

  • pred (callable, optional) –

    A callable that maps an input to the output (class probabilities).

    If pred is None, the class method predict is used for mapping the input to the output (class probabilities).

    The default is None.

Returns

Initialized negative-log-likelihood cost function. Target label is y_target.

Return type

ceml.backend.jax.costfunctions.NegLogLikelihoodCost

predict(x)

Predict the output of a given input.

Computes the class probabilities for a given input x.

Parameters

x (numpy.ndarray) – The input x that is going to be classified.

Returns

An array containing the class probabilities.

Return type

jax.numpy.array
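
The attributes class_priors, means and sigma_inv are sufficient to compute lda class probabilities. The following pure-Python sketch shows the standard computation with a shared inverse covariance matrix; lda_probs is a hypothetical name and nested lists stand in for arrays, so it should be read as an illustration rather than the class's code.

```python
import math

def lda_probs(x, class_priors, means, sigma_inv):
    """Class probabilities of an lda model with shared inverse covariance."""
    scores = []
    for prior, mu in zip(class_priors, means):
        d = [xi - mi for xi, mi in zip(x, mu)]
        # Mahalanobis term d^T Sigma^{-1} d
        sd = [sum(row[j] * d[j] for j in range(len(d))) for row in sigma_inv]
        maha = sum(di * si for di, si in zip(d, sd))
        scores.append(math.log(prior) - 0.5 * maha)
    # Softmax over the per-class scores (shared constants cancel out).
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

identity = [[1.0, 0.0], [0.0, 1.0]]
means = [[0.0, 0.0], [2.0, 0.0]]
p_mid = lda_probs([1.0, 0.0], [0.5, 0.5], means, identity)
p_near0 = lda_probs([0.0, 0.0], [0.5, 0.5], means, identity)
```

A point halfway between both means gets probability 0.5 for each class, while a point on the mean of class 0 is assigned mostly to class 0.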

class ceml.sklearn.lda.LdaCounterfactual(model, **kwds)

Bases: ceml.sklearn.counterfactual.SklearnCounterfactual, ceml.optim.cvx.MathematicalProgram, ceml.optim.cvx.ConvexQuadraticProgram, ceml.optim.cvx.PlausibleCounterfactualOfHyperplaneClassifier

Class for computing a counterfactual of a lda model.

See parent class ceml.sklearn.counterfactual.SklearnCounterfactual.

rebuild_model(model)

Rebuild a sklearn.discriminant_analysis.LinearDiscriminantAnalysis model.

Converts a sklearn.discriminant_analysis.LinearDiscriminantAnalysis into a ceml.sklearn.lda.Lda.

Parameters

model (instance of sklearn.discriminant_analysis.LinearDiscriminantAnalysis) – The sklearn lda model - note that store_covariance must be set to True.

Returns

The wrapped lda model.

Return type

ceml.sklearn.lda.Lda

ceml.sklearn.lda.lda_generate_counterfactual(model, x, y_target, features_whitelist=None, regularization='l1', C=1.0, optimizer='mp', optimizer_args=None, return_as_dict=True, done=None, plausibility=None)

Computes a counterfactual of a given input x.

Parameters
  • model (a sklearn.discriminant_analysis.LinearDiscriminantAnalysis instance.) – The lda model that is used for computing the counterfactual.

  • x (numpy.ndarray) – The input x whose prediction has to be explained.

  • y_target (int or float or a callable that returns True if a given prediction is accepted.) – The requested prediction of the counterfactual.

  • features_whitelist (list(int), optional) –

    List of feature indices (dimensions of the input space) that can be used when computing the counterfactual.

    If features_whitelist is None, all features can be used.

    The default is None.

  • regularization (str or ceml.costfunctions.costfunctions.CostFunction, optional) –

    Regularizer of the counterfactual. Penalty for deviating from the original input x. Supported values:

    • l1: Penalizes the absolute deviation.

    • l2: Penalizes the squared deviation.

    regularization can be a description of the regularization, an instance of ceml.costfunctions.costfunctions.CostFunction (or ceml.costfunctions.costfunctions.CostFunctionDifferentiable if your cost function is differentiable) or None if no regularization is requested.

    If regularization is None, no regularization is used.

    The default is “l1”.

  • C (float or list(float), optional) –

    The regularization strength. If C is a list, all values in C are tried and as soon as a counterfactual is found, this counterfactual is returned and no other values of C are tried.

    C is ignored if no regularization is used (regularization=None).

    The default is 1.0.

  • optimizer (str or instance of ceml.optim.optimizer.Optimizer, optional) –

    Name/Identifier of the optimizer that is used for computing the counterfactual. See ceml.optim.optimizer.prepare_optim() for details.

    Linear discriminant analysis supports the use of mathematical programs for computing counterfactuals - set optimizer to “mp” for using a convex quadratic program for computing the counterfactual. Note that in this case the hyperparameter C is ignored.

    As an alternative, we can use any (custom) optimizer that is derived from the ceml.optim.optimizer.Optimizer class.

    The default is “mp”.

  • optimizer_args (dict, optional) –

    Dictionary for overriding the default hyperparameters of the optimization algorithm.

    The default is None.

  • return_as_dict (boolean, optional) –

    If True, returns the counterfactual, its prediction and the needed changes to the input as dictionary. If False, the results are returned as a triple.

    The default is True.

  • done (callable, optional) – Not used.

  • plausibility (dict, optional.) –

    If set to a valid dictionary (see ceml.sklearn.plausibility.prepare_computation_of_plausible_counterfactuals()), a plausible counterfactual (as proposed in Artelt et al. 2020) is computed. Note that in this case, all other parameters are ignored.

    If plausibility is None, the closest counterfactual is computed.

    The default is None.

Returns

A dictionary where the counterfactual is stored in ‘x_cf’, its prediction in ‘y_cf’ and the changes to the original input in ‘delta’.

(x_cf, y_cf, delta) : triple if return_as_dict is False

Return type

dict or triple

Raises

Exception – If no counterfactual was found.
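
For intuition about the closest counterfactual of a binary lda model: its decision boundary is a hyperplane w·x + b = 0, and with an l2 penalty and no feature restrictions the closest point on the other side follows from an orthogonal projection. The sketch below uses this closed form instead of the convex quadratic program mentioned above; the function name, the weights w and b, and the margin value are all assumptions for illustration.

```python
def closest_point_across_hyperplane(x, w, b, margin=1e-6):
    """Smallest l2 change to x that flips the sign of w.x + b."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    norm_sq = sum(wi * wi for wi in w)
    # Move along w until just past the boundary; the margin avoids
    # landing exactly on the hyperplane.
    sign = 1.0 if score >= 0 else -1.0
    t = (score + sign * margin) / norm_sq
    return [xi - t * wi for xi, wi in zip(x, w)]

w, b = [1.0, 0.0], -1.0
x_cf = closest_point_across_hyperplane([2.0, 0.0], w, b)
new_score = sum(wi * xi for wi, xi in zip(w, x_cf)) + b
```

Only the first feature changes here because w has no component along the second one.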

ceml.sklearn.qda

class ceml.sklearn.qda.Qda(model, **kwds)

Bases: ceml.model.model.ModelWithLoss

Class for rebuilding/wrapping the sklearn.discriminant_analysis.QuadraticDiscriminantAnalysis class.

The Qda class rebuilds a qda model from the given parameters.

Parameters

model (instance of sklearn.discriminant_analysis.QuadraticDiscriminantAnalysis) – The qda model.

class_priors

Class-dependent priors.

Type

numpy.ndarray

means

Class-dependent means.

Type

numpy.ndarray

sigma_inv

Class-dependent inverted covariance matrices.

Type

numpy.ndarray

dim

Dimensionality of the input data.

Type

int

is_binary

True if model is a binary classifier, False otherwise.

Type

boolean

Raises

TypeError – If model is not an instance of sklearn.discriminant_analysis.QuadraticDiscriminantAnalysis

get_loss(y_target, pred=None)

Creates and returns a loss function.

Builds a negative-log-likelihood cost function where the target is y_target.

Parameters
  • y_target (int) – The target class.

  • pred (callable, optional) –

    A callable that maps an input to the output (class probabilities).

    If pred is None, the class method predict is used for mapping the input to the output (class probabilities).

    The default is None.

Returns

Initialized negative-log-likelihood cost function. Target label is y_target.

Return type

ceml.backend.jax.costfunctions.NegLogLikelihoodCost

predict(x)

Predict the output of a given input.

Computes the class probabilities for a given input x.

Parameters

x (numpy.ndarray) – The input x that is going to be classified.

Returns

An array containing the class probabilities.

Return type

jax.numpy.array

class ceml.sklearn.qda.QdaCounterfactual(model, **kwds)

Bases: ceml.sklearn.counterfactual.SklearnCounterfactual, ceml.optim.cvx.MathematicalProgram, ceml.optim.cvx.SDP, ceml.optim.cvx.DCQP

Class for computing a counterfactual of a qda model.

See parent class ceml.sklearn.counterfactual.SklearnCounterfactual.

rebuild_model(model)

Rebuild a sklearn.discriminant_analysis.QuadraticDiscriminantAnalysis model.

Converts a sklearn.discriminant_analysis.QuadraticDiscriminantAnalysis into a ceml.sklearn.qda.Qda.

Parameters

model (instance of sklearn.discriminant_analysis.QuadraticDiscriminantAnalysis) – The sklearn qda model - note that store_covariance must be set to True.

Returns

The wrapped qda model.

Return type

ceml.sklearn.qda.Qda

solve(x_orig, y_target, regularization, features_whitelist, return_as_dict, optimizer_args)

Approximately solves the DCQP by using the penalty convex-concave procedure.

Parameters

x0 (numpy.ndarray) – The initial data point for the penalty convex-concave procedure - this could be anything, however a “good” initial solution might lead to a better result.

ceml.sklearn.qda.qda_generate_counterfactual(model, x, y_target, features_whitelist=None, regularization='l1', C=1.0, optimizer='auto', optimizer_args=None, return_as_dict=True, done=None)

Computes a counterfactual of a given input x.

Parameters
  • model (a sklearn.discriminant_analysis.QuadraticDiscriminantAnalysis instance.) – The qda model that is used for computing the counterfactual.

  • x (numpy.ndarray) – The input x whose prediction has to be explained.

  • y_target (int or float or a callable that returns True if a given prediction is accepted.) – The requested prediction of the counterfactual.

  • features_whitelist (list(int), optional) –

    List of feature indices (dimensions of the input space) that can be used when computing the counterfactual.

    If features_whitelist is None, all features can be used.

    The default is None.

  • regularization (str or ceml.costfunctions.costfunctions.CostFunction, optional) –

    Regularizer of the counterfactual. Penalty for deviating from the original input x. Supported values:

    • l1: Penalizes the absolute deviation.

    • l2: Penalizes the squared deviation.

    regularization can be a description of the regularization, an instance of ceml.costfunctions.costfunctions.CostFunction (or ceml.costfunctions.costfunctions.CostFunctionDifferentiable if your cost function is differentiable) or None if no regularization is requested.

    If regularization is None, no regularization is used.

    The default is “l1”.

  • C (float or list(float), optional) –

    The regularization strength. If C is a list, all values in C are tried and as soon as a counterfactual is found, this counterfactual is returned and no other values of C are tried.

    C is ignored if no regularization is used (regularization=None).

    The default is 1.0.

  • optimizer (str or instance of ceml.optim.optimizer.Optimizer, optional) –

    Name/Identifier of the optimizer that is used for computing the counterfactual. See ceml.optim.optimizer.prepare_optim() for details.

    As an alternative, we can use any (custom) optimizer that is derived from the ceml.optim.optimizer.Optimizer class.

    The default is “auto”.

    Quadratic discriminant analysis supports the use of mathematical programs for computing counterfactuals - set optimizer to “mp” for using a semi-definite program (binary classifier) or a DCQP (otherwise) for computing the counterfactual. Note that in this case the hyperparameter C is ignored. Because the DCQP is a non-convex problem, we are not guaranteed to find the best solution (it might even happen that we do not find a solution at all) - we use the penalty convex-concave procedure for approximately solving the DCQP.

  • optimizer_args (dict, optional) –

    Dictionary for overriding the default hyperparameters of the optimization algorithm.

    The default is None.

  • return_as_dict (boolean, optional) –

    If True, returns the counterfactual, its prediction and the needed changes to the input as dictionary. If False, the results are returned as a triple.

    The default is True.

  • done (callable, optional) – Not used.

Returns

A dictionary where the counterfactual is stored in ‘x_cf’, its prediction in ‘y_cf’ and the changes to the original input in ‘delta’.

(x_cf, y_cf, delta) : triple if return_as_dict is False

Return type

dict or triple

Raises

Exception – If no counterfactual was found.

ceml.sklearn.pipeline

class ceml.sklearn.pipeline.PipelineCounterfactual(model, **kwds)

Bases: ceml.sklearn.counterfactual.SklearnCounterfactual

Class for computing a counterfactual of a sklearn pipeline.

See parent class ceml.sklearn.counterfactual.SklearnCounterfactual.

build_loss(regularization, x_orig, y_target, pred, grad_mask, C, input_wrapper)

Build a loss function.

Overrides the build_loss method of the base class ceml.sklearn.counterfactual.SklearnCounterfactual.

Parameters
  • regularization (str or ceml.costfunctions.costfunctions.CostFunction) –

    Regularizer of the counterfactual. Penalty for deviating from the original input x.

    Supported values:

    • l1: Penalizes the absolute deviation.

    • l2: Penalizes the squared deviation.

    regularization can be a description of the regularization, an instance of ceml.costfunctions.costfunctions.CostFunction (or ceml.costfunctions.costfunctions.CostFunctionDifferentiable if your cost function is differentiable) or None if no regularization is requested.

    If regularization is None, no regularization is used.

  • x_orig (numpy.array) – The original input whose prediction has to be explained.

  • y_target (int or float) – The requested output.

  • pred (callable) –

    A callable that maps an input to the output.

    If pred is None, the class method predict is used for mapping the input to the output.

  • grad_mask (numpy.array) – Gradient mask determining which dimensions can be used.

  • C (float or list(float)) –

    The regularization strength. If C is a list, all values in C are tried and as soon as a counterfactual is found, this counterfactual is returned and no other values of C are tried.

    C is ignored if no regularization is used (regularization=None).

  • input_wrapper (callable) – Converts the input (e.g. if we want to exclude some features/dimensions, we might have to include these missing features before applying any function to it).

Returns

Initialized cost function. Target is set to y_target.

Return type

ceml.costfunctions.costfunctions.CostFunction
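
How regularization, C and input_wrapper interact can be sketched as follows. This is a simplified stand-in and not the method's implementation: it assumes the total loss has the form target loss plus C times the penalty, it omits grad_mask, and all helper names (l1, l2, build_loss_sketch, target_loss) are hypothetical.

```python
def l1(x, x_orig):
    """Sum of absolute deviations from the original input."""
    return sum(abs(a - b) for a, b in zip(x, x_orig))

def l2(x, x_orig):
    """Sum of squared deviations from the original input."""
    return sum((a - b) ** 2 for a, b in zip(x, x_orig))

REGULARIZERS = {"l1": l1, "l2": l2}

def build_loss_sketch(regularization, x_orig, target_loss, C=1.0, input_wrapper=None):
    """Combine a target loss with an optional distance penalty."""
    wrap = input_wrapper if input_wrapper is not None else (lambda x: x)
    if regularization is None:
        # No penalty requested: C is ignored.
        return lambda x: target_loss(wrap(x))
    # A string selects a built-in penalty; a callable is used as given.
    if isinstance(regularization, str):
        penalty = REGULARIZERS[regularization]
    else:
        penalty = regularization
    return lambda x: target_loss(wrap(x)) + C * penalty(wrap(x), x_orig)

# Hypothetical target loss: squared distance of the first feature to 3.0.
target_loss = lambda x: (x[0] - 3.0) ** 2
x_orig = [1.0, 1.0]

loss_l1 = build_loss_sketch("l1", x_orig, target_loss, C=2.0)
loss_none = build_loss_sketch(None, x_orig, target_loss, C=100.0)
```

At x = [3.0, 1.0] the target loss vanishes, so loss_l1 only reports the scaled l1 penalty and loss_none reports zero regardless of C.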

compute_counterfactual(x, y_target, features_whitelist=None, regularization='l1', C=1.0, optimizer='auto', optimizer_args=None, return_as_dict=True, done=None)

Computes a counterfactual of a given input x.

Parameters
  • x (numpy.ndarray) – The data point x whose prediction has to be explained.

  • y_target (int or float) – The requested prediction of the counterfactual.

  • features_whitelist (list(int), optional) –

    List of feature indices (dimensions of the input space) that can be used when computing the counterfactual.

    If features_whitelist is None, all features can be used.

    The default is None.

  • regularization (str or ceml.costfunctions.costfunctions.CostFunction, optional) –

    Regularizer of the counterfactual. Penalty for deviating from the original input x. Supported values:

    • l1: Penalizes the absolute deviation.

    • l2: Penalizes the squared deviation.

    regularization can be a description of the regularization, an instance of ceml.costfunctions.costfunctions.CostFunction (or ceml.costfunctions.costfunctions.CostFunctionDifferentiable if the cost function is differentiable) or None if no regularization is requested.

    If regularization is None, no regularization is used.

    The default is “l1”.

  • C (float or list(float), optional) –

    The regularization strength. If C is a list, all values in C are tried and as soon as a counterfactual is found, this counterfactual is returned and no other values of C are tried.

    If no regularization is used (regularization=None), C is ignored.

    The default is 1.0.

  • optimizer (str or instance of ceml.optim.optimizer.Optimizer, optional) –

    Name/Identifier of the optimizer that is used for computing the counterfactual. See ceml.optim.optimizer.prepare_optim() for details.

    Use “auto” if you do not know what optimizer to use - a suitable optimizer is chosen automatically.

    As an alternative, we can use any (custom) optimizer that is derived from the ceml.optim.optimizer.Optimizer class.

    Some models (see paper) support the use of mathematical programs for computing counterfactuals. In this case, you can use the option “mp” - please read the documentation of the corresponding model for further information.

    The default is “auto”.

  • optimizer_args (dict, optional) –

    Dictionary for overriding the default hyperparameters of the optimization algorithm.

    The default is None.

  • return_as_dict (boolean, optional) –

    If True, returns the counterfactual, its prediction and the needed changes to the input as dictionary. If False, the results are returned as a triple.

    The default is True.

  • done (callable, optional) –

    A callable that returns True if a counterfactual with a given output/prediction is accepted and False otherwise.

    If done is None, the output/prediction of the counterfactual must match y_target exactly.

    The default is None.

    Note

    In case of a regression it might not always be possible to achieve a given output/prediction exactly.

Returns

A dictionary where the counterfactual is stored in ‘x_cf’, its prediction in ‘y_cf’ and the changes to the original input in ‘delta’.

(x_cf, y_cf, delta) : triple if return_as_dict is False

Return type

dict or triple

Raises

Exception – If no counterfactual was found.

rebuild_model(model)

Rebuild a sklearn.pipeline.Pipeline model.

Converts a sklearn.pipeline.Pipeline into a ceml.sklearn.pipeline.PipelineModel.

Parameters

model (instance of sklearn.pipeline.Pipeline) – The sklearn pipeline model.

Returns

The wrapped pipeline model.

Return type

ceml.sklearn.pipeline.PipelineModel

class ceml.sklearn.pipeline.PipelineModel(models, **kwds)

Bases: ceml.model.model.ModelWithLoss

Class for rebuilding/wrapping the sklearn.pipeline.Pipeline class.

The PipelineModel class rebuilds a pipeline model from a given list of sklearn models.

Parameters

models (list(object)) – Ordered list of all sklearn models in the pipeline.

models

Ordered list of all sklearn models in the pipeline.

Type

list(object)

get_loss(y_target, pred=None)

Creates and returns a loss function.

Builds a cost function where the target is y_target.

Parameters
  • y_target (int or float) – The requested output.

  • pred (callable, optional) –

    A callable that maps an input to the output.

    If pred is None, the class method predict is used for mapping the input to the output.

    The default is None.

Returns

Initialized cost function. Target is set to y_target.

Return type

ceml.costfunctions.costfunctions.CostFunction

predict(x)

Predicts the output of a given input.

Computes the prediction of a given input x.

Parameters

x (numpy.ndarray) – The input x.

Returns

Output of the pipeline (might be a scalar or something higher-dimensional).

Return type

numpy.array
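
Predicting with an ordered list of models amounts to feeding the output of each stage into the next one. The sketch below models every stage as a plain callable, which is a simplification: in a sklearn pipeline the intermediate steps transform the data and the final step predicts. The names pipeline_predict, scale and score are hypothetical.

```python
def pipeline_predict(models, x):
    """Feed x through an ordered list of callables and return the final output."""
    out = x
    for model in models:
        out = model(out)
    return out

# Hypothetical stages: a fixed standardizer followed by a linear scorer.
scale = lambda x: [(xi - 1.0) / 2.0 for xi in x]
score = lambda z: sum(z)

y = pipeline_predict([scale, score], [3.0, 5.0])
```

The order of the list matters: swapping the two stages would apply the scorer to the raw input first.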

ceml.sklearn.pipeline.pipeline_generate_counterfactual(model, x, y_target, features_whitelist=None, regularization='l1', C=1.0, optimizer='nelder-mead', optimizer_args=None, return_as_dict=True, done=None)

Computes a counterfactual of a given input x.

Parameters
  • model (a sklearn.pipeline.Pipeline instance.) – The model pipeline that is used for computing the counterfactual.

  • x (numpy.ndarray) – The input x whose prediction has to be explained.

  • y_target (int or float or a callable that returns True if a given prediction is accepted.) – The requested prediction of the counterfactual.

  • features_whitelist (list(int), optional) –

    List of feature indices (dimensions of the input space) that can be used when computing the counterfactual.

    If features_whitelist is None, all features can be used.

    The default is None.

  • regularization (str or ceml.costfunctions.costfunctions.CostFunction, optional) –

    Regularizer of the counterfactual. Penalty for deviating from the original input x.

    Supported values:

    • l1: Penalizes the absolute deviation.

    • l2: Penalizes the squared deviation.

    regularization can be a description of the regularization, an instance of ceml.costfunctions.costfunctions.CostFunction (or ceml.costfunctions.costfunctions.CostFunctionDifferentiable if your cost function is differentiable) or None if no regularization is requested.

    If regularization is None, no regularization is used.

    The default is “l1”.

  • C (float or list(float), optional) –

    The regularization strength. If C is a list, all values in C are tried and as soon as a counterfactual is found, this counterfactual is returned and no other values of C are tried.

    C is ignored if no regularization is used (regularization=None).

    The default is 1.0.

  • optimizer (str or instance of ceml.optim.optimizer.Optimizer, optional) –

    Name/Identifier of the optimizer that is used for computing the counterfactual. See ceml.optim.optimizer.prepare_optim() for details.

    Use “auto” if you do not know what optimizer to use - a suitable optimizer is chosen automatically.

    As an alternative, we can use any (custom) optimizer that is derived from the ceml.optim.optimizer.Optimizer class.

    The default is “nelder-mead”.

    Some models (see paper) support the use of mathematical programs for computing counterfactuals. In this case, you can use the option “mp” - please read the documentation of the corresponding model for further information.

  • optimizer_args (dict, optional) –

    Dictionary for overriding the default hyperparameters of the optimization algorithm.

    The default is None.

  • return_as_dict (boolean, optional) –

    If True, returns the counterfactual, its prediction and the needed changes to the input as dictionary. If False, the results are returned as a triple.

    The default is True.

  • done (callable, optional) –

    A callable that returns True if a counterfactual with a given output/prediction is accepted and False otherwise.

    If done is None, the output/prediction of the counterfactual must match y_target exactly.

    The default is None.

    Note

    In case of a regression it might not always be possible to achieve a given output/prediction exactly.

Returns

A dictionary where the counterfactual is stored in ‘x_cf’, its prediction in ‘y_cf’ and the changes to the original input in ‘delta’.

If return_as_dict is False, the triple (x_cf, y_cf, delta) is returned instead.

Return type

dict or triple

Raises

Exception – If no counterfactual was found.
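
The two return shapes described above can be handled uniformly by calling code. The following is a minimal sketch, not part of ceml: unpack_counterfactual is a hypothetical helper, and the sample values are invented for illustration rather than produced by a real model.

```python
def unpack_counterfactual(result):
    """Return (x_cf, y_cf, delta) from either documented return shape."""
    if isinstance(result, dict):
        # return_as_dict=True: keys 'x_cf', 'y_cf' and 'delta' as documented.
        return result["x_cf"], result["y_cf"], result["delta"]
    # return_as_dict=False: the result is already a triple.
    x_cf, y_cf, delta = result
    return x_cf, y_cf, delta


# Hypothetical results; the numbers carry no meaning.
as_dict = {"x_cf": [1.0, 2.5], "y_cf": 1, "delta": [0.0, -0.5]}
as_triple = ([1.0, 2.5], 1, [0.0, -0.5])

same = unpack_counterfactual(as_dict) == unpack_counterfactual(as_triple)
```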

ceml.sklearn.randomforest

class ceml.sklearn.randomforest.EnsembleVotingCost(models, y_target, input_wrapper=None, epsilon=0, **kwds)

Bases: ceml.costfunctions.costfunctions.CostFunction

Loss function of an ensemble of models.

The loss is the negative fraction of models that predict the correct output.

Parameters
  • models (list(object)) – List of models

  • y_target (int, float or a callable that returns True if a given prediction is accepted.) – The requested prediction.

  • input_wrapper (callable, optional) –

    Converts the input (e.g. if we want to exclude some features/dimensions, we might have to include these missing features before applying any function to it).

    The default is None.

score_impl(x)

Implementation of the loss function.
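
As a rough illustration of "the negative fraction of models that predict the correct output", the sketch below uses plain Python callables as stand-ins for the ensemble members. It is an assumption-laden sketch, not the library's implementation: ensemble_voting_loss and the toy threshold models are hypothetical, and the epsilon argument of the real class is not modelled.

```python
def ensemble_voting_loss(models, y_target, x):
    """Negative fraction of models whose prediction for x is accepted."""
    # y_target may be a plain value or a callable acceptance test.
    accept = y_target if callable(y_target) else (lambda y: y == y_target)
    votes = sum(1 for model in models if accept(model(x)))
    return -votes / len(models)


# Three hypothetical "models": threshold rules on the first feature.
models = [
    lambda x: int(x[0] > 0.0),
    lambda x: int(x[0] > 1.0),
    lambda x: int(x[0] > 2.0),
]

# Two of the three models predict class 1 for this input.
loss_value = ensemble_voting_loss(models, 1, [1.5])
# With a callable target that accepts every prediction, all models count.
loss_callable = ensemble_voting_loss(models, lambda y: y >= 0, [1.5])
```

The loss is lowest (-1.0) when every member of the ensemble agrees with the requested prediction.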

class ceml.sklearn.randomforest.RandomForest(model, **kwds)

Bases: ceml.model.model.ModelWithLoss

Class for rebuilding/wrapping the sklearn.ensemble.RandomForestClassifier or sklearn.ensemble.RandomForestRegressor class.

Parameters

model (instance of sklearn.ensemble.RandomForestClassifier or sklearn.ensemble.RandomForestRegressor) – The random forest model.

Raises

TypeError – If model is not an instance of sklearn.ensemble.RandomForestClassifier or sklearn.ensemble.RandomForestRegressor

get_loss(y_target, input_wrapper=None)

Creates and returns a loss function.

Parameters
  • y_target (int, float or a callable that returns True if a given prediction is accepted.) – The requested prediction.

  • input_wrapper (callable) – Converts the input (e.g. if we want to exclude some features/dimensions, we might have to include these missing features before applying any function to it).

Returns

Initialized loss function. The target output is y_target.

Return type

ceml.sklearn.randomforest.EnsembleVotingCost

predict(x)

Predict the output of a given input.

Computes the prediction (class label or regression value) of a given input x.

Parameters

x (numpy.ndarray) – The input x whose output is going to be predicted.

Returns

Prediction.

Return type

int or float

class ceml.sklearn.randomforest.RandomForestCounterfactual(model, **kwds)

Bases: ceml.sklearn.counterfactual.SklearnCounterfactual

Class for computing a counterfactual of a random forest model.

See parent class ceml.sklearn.counterfactual.SklearnCounterfactual.

build_loss(regularization, x_orig, y_target, pred, grad_mask, C, input_wrapper)

Build the (non-differentiable) cost function: Regularization + Loss
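
A minimal sketch of how a regularizer and a loss might be combined into one objective. It assumes that the strength C scales the regularization term, which this section does not state explicitly; build_total_cost and the toy loss are hypothetical names, not ceml functions.

```python
def build_total_cost(regularizer, loss, C=1.0):
    """Combine a regularizer and a loss into a single objective."""
    def total_cost(x):
        # Assumption: C weights the regularization term only.
        return C * regularizer(x) + loss(x)
    return total_cost


x_orig = [1.0, 2.0]

def l1_penalty(x):
    # Absolute deviation from the original input.
    return sum(abs(a - b) for a, b in zip(x, x_orig))

def toy_loss(x):
    # Hypothetical loss: 0 if a toy "prediction" hits the target, else 1.
    return 0.0 if x[0] + x[1] >= 4.0 else 1.0

cost = build_total_cost(l1_penalty, toy_loss, C=0.5)
cost_at_original = cost([1.0, 2.0])   # no deviation, target missed
cost_at_candidate = cost([2.0, 2.0])  # deviation of 1, target reached
```

Because the toy loss is piecewise constant, the combined objective has no useful gradient, which is why a derivative-free optimizer is the natural fit here.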

compute_counterfactual(x, y_target, features_whitelist=None, regularization='l1', C=1.0, optimizer='nelder-mead', optimizer_args=None, return_as_dict=True, done=None)

Computes a counterfactual of a given input x.

Parameters
  • x (numpy.ndarray) – The input x whose prediction has to be explained.

  • y_target (int or float) – The requested prediction of the counterfactual.

  • features_whitelist (list(int), optional) –

    List of feature indices (dimensions of the input space) that can be used when computing the counterfactual.

    If features_whitelist is None, all features can be used.

    The default is None.

  • regularization (str or ceml.costfunctions.costfunctions.CostFunction, optional) –

    Regularizer of the counterfactual. Penalty for deviating from the original input x. Supported values:

    • l1: Penalizes the absolute deviation.

    • l2: Penalizes the squared deviation.

    regularization can be a description of the regularization, an instance of ceml.costfunctions.costfunctions.CostFunction (or ceml.costfunctions.costfunctions.CostFunctionDifferentiable if the cost function is differentiable) or None if no regularization is requested.

    If regularization is None, no regularization is used.

    The default is “l1”.

  • C (float or list(float), optional) –

    The regularization strength. If C is a list, all values in C are tried and as soon as a counterfactual is found, this counterfactual is returned and no other values of C are tried.

    If no regularization is used (regularization=None), C is ignored.

    The default is 1.0.

  • optimizer (str or instance of ceml.optim.optimizer.Optimizer, optional) –

    Name/Identifier of the optimizer that is used for computing the counterfactual. See ceml.optim.optimizer.prepare_optim() for details.

    As an alternative, we can use any (custom) optimizer that is derived from the ceml.optim.optimizer.Optimizer class.

    The default is “nelder-mead”.

    Note

    The cost function of a random forest model is not differentiable, so a gradient-based optimization algorithm cannot be used.

  • optimizer_args (dict, optional) –

    Dictionary for overriding the default hyperparameters of the optimization algorithm.

    The default is None.

  • return_as_dict (boolean, optional) –

    If True, returns the counterfactual, its prediction and the needed changes to the input as dictionary. If False, the results are returned as a triple.

    The default is True.

  • done (callable, optional) –

    A callable that returns True if a counterfactual with a given output/prediction is accepted and False otherwise.

    If done is None, the output/prediction of the counterfactual must match y_target exactly.

    The default is None.

    Note

    In case of a regression it might not always be possible to achieve a given output/prediction exactly.

Returns

A dictionary where the counterfactual is stored in ‘x_cf’, its prediction in ‘y_cf’ and the changes to the original input in ‘delta’.

If return_as_dict is False, the triple (x_cf, y_cf, delta) is returned instead.

Return type

dict or triple

Raises

Exception – If no counterfactual was found.
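
For regression targets, where an exact match may be unreachable, the done argument can encode a tolerance. The sketch below shows one way to build such a callable; make_done and its tol argument are hypothetical names chosen for illustration, not part of ceml.

```python
def make_done(y_target, tol=0.1):
    """Build an acceptance test: prediction within tol of y_target."""
    def done(y_pred):
        return abs(y_pred - y_target) <= tol
    return done


# Accept any prediction within 0.5 of the requested value 25.0.
done = make_done(y_target=25.0, tol=0.5)

accepted = done(25.3)   # inside the tolerance band
rejected = done(26.0)   # outside the tolerance band
```

Such a callable would then be passed as done=done to compute_counterfactual.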

rebuild_model(model)

Rebuilds a sklearn.ensemble.RandomForestClassifier or sklearn.ensemble.RandomForestRegressor model.

Converts a sklearn.ensemble.RandomForestClassifier or sklearn.ensemble.RandomForestRegressor instance into a ceml.sklearn.randomforest.RandomForest instance.

Parameters

model (instance of sklearn.ensemble.RandomForestClassifier or sklearn.ensemble.RandomForestRegressor) – The sklearn random forest model.

Returns

The wrapped random forest model.

Return type

ceml.sklearn.randomforest.RandomForest

ceml.sklearn.randomforest.randomforest_generate_counterfactual(model, x, y_target, features_whitelist=None, regularization='l1', C=1.0, optimizer='nelder-mead', optimizer_args=None, return_as_dict=True, done=None)

Computes a counterfactual of a given input x.

Parameters
  • model (a sklearn.ensemble.RandomForestClassifier or sklearn.ensemble.RandomForestRegressor instance.) – The random forest model that is used for computing the counterfactual.

  • x (numpy.ndarray) – The input x whose prediction has to be explained.

  • y_target (int or float or a callable that returns True if a given prediction is accepted.) – The requested prediction of the counterfactual.

  • features_whitelist (list(int), optional) –

    List of feature indices (dimensions of the input space) that can be used when computing the counterfactual.

    If features_whitelist is None, all features can be used.

    The default is None.

  • regularization (str or ceml.costfunctions.costfunctions.CostFunction, optional) –

    Regularizer of the counterfactual. Penalty for deviating from the original input x.

    Supported values:

    • l1: Penalizes the absolute deviation.

    • l2: Penalizes the squared deviation.

    regularization can be a description of the regularization, an instance of ceml.costfunctions.costfunctions.CostFunction (or ceml.costfunctions.costfunctions.CostFunctionDifferentiable if your cost function is differentiable) or None if no regularization is requested.

    If regularization is None, no regularization is used.

    The default is “l1”.

  • C (float or list(float), optional) –

    The regularization strength. If C is a list, all values in C are tried and as soon as a counterfactual is found, this counterfactual is returned and no other values of C are tried.

    C is ignored if no regularization is used (regularization=None).

    The default is 1.0.

  • optimizer (str or instance of ceml.optim.optimizer.Optimizer, optional) –

    Name/Identifier of the optimizer that is used for computing the counterfactual. See ceml.optim.optimizer.prepare_optim() for details.

    As an alternative, we can use any (custom) optimizer that is derived from the ceml.optim.optimizer.Optimizer class.

    The default is “nelder-mead”.

    Note

    The cost function of a random forest model is not differentiable, so a gradient-based optimization algorithm cannot be used.

  • optimizer_args (dict, optional) –

    Dictionary for overriding the default hyperparameters of the optimization algorithm.

    The default is None.

  • return_as_dict (boolean, optional) –

    If True, returns the counterfactual, its prediction and the needed changes to the input as dictionary. If False, the results are returned as a triple.

    The default is True.

  • done (callable, optional) – Not used.

Returns

A dictionary where the counterfactual is stored in ‘x_cf’, its prediction in ‘y_cf’ and the changes to the original input in ‘delta’.

If return_as_dict is False, the triple (x_cf, y_cf, delta) is returned instead.

Return type

dict or triple

Raises

Exception – If no counterfactual was found.
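
The behaviour of a list-valued C (try each strength in order, return the first counterfactual found, raise if none is found) can be sketched as follows. This is an illustration under stated assumptions: first_counterfactual and toy_solve are hypothetical, and the convention that a failed attempt returns None is invented for the sketch.

```python
def first_counterfactual(solve, C):
    """Try each regularization strength in order; stop at the first success."""
    strengths = C if isinstance(C, list) else [C]
    for c in strengths:
        result = solve(c)  # hypothetical solver: returns None on failure
        if result is not None:
            return c, result
    raise Exception("No counterfactual found")


tried = []

def toy_solve(c):
    # Toy stand-in: pretend the search only succeeds for weak regularization.
    tried.append(c)
    return {"x_cf": [0.0]} if c <= 0.1 else None

c_used, result = first_counterfactual(toy_solve, [1.0, 0.1, 0.01])
```

Note that the remaining value 0.01 is never tried once 0.1 succeeds, so the order of the list matters.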

ceml.sklearn.isolationforest

class ceml.sklearn.isolationforest.IsolationForest(model, **kwds)

Bases: ceml.model.model.ModelWithLoss

Class for rebuilding/wrapping the sklearn.ensemble.IsolationForest class.

Parameters

model (instance of sklearn.ensemble.IsolationForest) – The isolation forest model.

Raises

TypeError – If model is not an instance of sklearn.ensemble.IsolationForest

get_loss(y_target, input_wrapper=None)

Creates and returns a loss function.

Parameters
  • y_target (int) – The target class - either +1 or -1

  • input_wrapper (callable) – Converts the input (e.g. if we want to exclude some features/dimensions, we might have to include these missing features before applying any function to it).

Returns

Initialized loss function. Target label is y_target.

Return type

ceml.sklearn.isolationforest.IsolationForestCost

predict(x)

Predict the output of a given input.

Computes the class label of a given input x.

Parameters

x (numpy.ndarray) – The input x that is going to be classified.

Returns

Prediction.

Return type

int

class ceml.sklearn.isolationforest.IsolationForestCost(models, y_target, input_wrapper=None, epsilon=0, **kwds)

Bases: ceml.costfunctions.costfunctions.CostFunction

Loss function of an isolation forest.

The loss is the negative averaged length of the decision paths.

Parameters
  • models (list(object)) – List of decision trees.

  • y_target (int) – The requested prediction - either -1 or +1.

  • input_wrapper (callable, optional) –

    Converts the input (e.g. if we want to exclude some features/dimensions, we might have to include these missing features before applying any function to it).

    The default is None.

score_impl(x)

Implementation of the loss function.

class ceml.sklearn.isolationforest.IsolationForestCounterfactual(model, **kwds)

Bases: ceml.sklearn.counterfactual.SklearnCounterfactual

Class for computing a counterfactual of an isolation forest model.

See parent class ceml.sklearn.counterfactual.SklearnCounterfactual.

compute_counterfactual(x, y_target, features_whitelist=None, regularization='l1', C=1.0, optimizer='nelder-mead', optimizer_args=None, return_as_dict=True, done=None)

Computes a counterfactual of a given input x.

Parameters
  • x (numpy.ndarray) – The input x whose prediction has to be explained.

  • y_target (int or float) – The requested prediction of the counterfactual.

  • features_whitelist (list(int), optional) –

    List of feature indices (dimensions of the input space) that can be used when computing the counterfactual.

    If features_whitelist is None, all features can be used.

    The default is None.

  • regularization (str or ceml.costfunctions.costfunctions.CostFunction, optional) –

    Regularizer of the counterfactual. Penalty for deviating from the original input x. Supported values:

    • l1: Penalizes the absolute deviation.

    • l2: Penalizes the squared deviation.

    regularization can be a description of the regularization, an instance of ceml.costfunctions.costfunctions.CostFunction (or ceml.costfunctions.costfunctions.CostFunctionDifferentiable if the cost function is differentiable) or None if no regularization is requested.

    If regularization is None, no regularization is used.

    The default is “l1”.

  • C (float or list(float), optional) –

    The regularization strength. If C is a list, all values in C are tried and as soon as a counterfactual is found, this counterfactual is returned and no other values of C are tried.

    If no regularization is used (regularization=None), C is ignored.

    The default is 1.0.

  • optimizer (str or instance of ceml.optim.optimizer.Optimizer, optional) –

    Name/Identifier of the optimizer that is used for computing the counterfactual. See ceml.optim.optimizer.prepare_optim() for details.

    As an alternative, we can use any (custom) optimizer that is derived from the ceml.optim.optimizer.Optimizer class.

    The default is “nelder-mead”.

    Note

    The cost function of an isolation forest model is not differentiable, so a gradient-based optimization algorithm cannot be used.

  • optimizer_args (dict, optional) –

    Dictionary for overriding the default hyperparameters of the optimization algorithm.

    The default is None.

  • return_as_dict (boolean, optional) –

    If True, returns the counterfactual, its prediction and the needed changes to the input as dictionary. If False, the results are returned as a triple.

    The default is True.

  • done (callable, optional) –

    A callable that returns True if a counterfactual with a given output/prediction is accepted and False otherwise.

    If done is None, the output/prediction of the counterfactual must match y_target exactly.

    The default is None.

    Note

    In case of a regression it might not always be possible to achieve a given output/prediction exactly.

Returns

A dictionary where the counterfactual is stored in ‘x_cf’, its prediction in ‘y_cf’ and the changes to the original input in ‘delta’.

If return_as_dict is False, the triple (x_cf, y_cf, delta) is returned instead.

Return type

dict or triple

Raises

Exception – If no counterfactual was found.

rebuild_model(model)

Rebuilds a sklearn.ensemble.IsolationForest model.

Converts a sklearn.ensemble.IsolationForest into a ceml.sklearn.isolationforest.IsolationForest.

Parameters

model (instance of sklearn.ensemble.IsolationForest) – The sklearn isolation forest model.

Returns

The wrapped isolation forest model.

Return type

ceml.sklearn.isolationforest.IsolationForest

ceml.sklearn.isolationforest.isolationforest_generate_counterfactual(model, x, y_target, features_whitelist=None, regularization='l1', C=1.0, optimizer='nelder-mead', optimizer_args=None, return_as_dict=True)

Computes a counterfactual of a given input x.

Parameters
  • model (a sklearn.ensemble.IsolationForest instance.) – The isolation forest model that is used for computing the counterfactual.

  • x (numpy.ndarray) – The input x whose prediction has to be explained.

  • y_target (int) – The requested prediction of the counterfactual - either -1 or +1.

  • features_whitelist (list(int), optional) –

    List of feature indices (dimensions of the input space) that can be used when computing the counterfactual.

    If features_whitelist is None, all features can be used.

    The default is None.

  • regularization (str or ceml.costfunctions.costfunctions.CostFunction, optional) –

    Regularizer of the counterfactual. Penalty for deviating from the original input x.

    Supported values:

    • l1: Penalizes the absolute deviation.

    • l2: Penalizes the squared deviation.

    regularization can be a description of the regularization, an instance of ceml.costfunctions.costfunctions.CostFunction (or ceml.costfunctions.costfunctions.CostFunctionDifferentiable if your cost function is differentiable) or None if no regularization is requested.

    If regularization is None, no regularization is used.

    The default is “l1”.

  • C (float or list(float), optional) –

    The regularization strength. If C is a list, all values in C are tried and as soon as a counterfactual is found, this counterfactual is returned and no other values of C are tried.

    C is ignored if no regularization is used (regularization=None).

    The default is 1.0.

  • optimizer (str or instance of ceml.optim.optimizer.Optimizer, optional) –

    Name/Identifier of the optimizer that is used for computing the counterfactual. See ceml.optim.optimizer.prepare_optim() for details.

    As an alternative, we can use any (custom) optimizer that is derived from the ceml.optim.optimizer.Optimizer class.

    The default is “nelder-mead”.

    Note

    The cost function of an isolation forest model is not differentiable, so a gradient-based optimization algorithm cannot be used.

  • optimizer_args (dict, optional) –

    Dictionary for overriding the default hyperparameters of the optimization algorithm.

    The default is None.

  • return_as_dict (boolean, optional) –

    If True, returns the counterfactual, its prediction and the needed changes to the input as dictionary. If False, the results are returned as a triple.

    The default is True.

Returns

A dictionary where the counterfactual is stored in ‘x_cf’, its prediction in ‘y_cf’ and the changes to the original input in ‘delta’.

If return_as_dict is False, the triple (x_cf, y_cf, delta) is returned instead.

Return type

dict or triple

Raises

Exception – If no counterfactual was found.
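
The interplay between features_whitelist and an input wrapper can be illustrated with a small sketch: the optimizer works on the whitelisted features only, and the wrapper re-inserts the frozen ones before the model is evaluated. This is one possible realization, assumed for illustration; make_input_wrapper is a hypothetical helper and ceml may implement the same idea differently.

```python
def make_input_wrapper(x_orig, features_whitelist):
    """Map a reduced vector (whitelisted features only) back to a full input."""
    def wrapper(x_reduced):
        x_full = list(x_orig)  # frozen features keep their original values
        for value, idx in zip(x_reduced, features_whitelist):
            x_full[idx] = value  # only whitelisted features may change
        return x_full
    return wrapper


x_orig = [5.0, 3.0, 1.5, 0.2]
wrap = make_input_wrapper(x_orig, features_whitelist=[0, 2])

# The optimizer proposes new values for features 0 and 2 only.
x_full = wrap([6.0, 2.0])
```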

ceml.sklearn.softmaxregression

class ceml.sklearn.softmaxregression.SoftmaxCounterfactual(model, **kwds)

Bases: ceml.sklearn.counterfactual.SklearnCounterfactual, ceml.optim.cvx.MathematicalProgram, ceml.optim.cvx.ConvexQuadraticProgram, ceml.optim.cvx.PlausibleCounterfactualOfHyperplaneClassifier

Class for computing a counterfactual of a softmax regression model.

See parent class ceml.sklearn.counterfactual.SklearnCounterfactual.

rebuild_model(model)

Rebuilds a sklearn.linear_model.LogisticRegression model.

Converts a sklearn.linear_model.LogisticRegression into a ceml.sklearn.softmaxregression.SoftmaxRegression.

Parameters

model (instance of sklearn.linear_model.LogisticRegression) – The sklearn softmax regression model.

Returns

The wrapped softmax regression model.

Return type

ceml.sklearn.softmaxregression.SoftmaxRegression

class ceml.sklearn.softmaxregression.SoftmaxRegression(model, **kwds)

Bases: ceml.model.model.ModelWithLoss

Class for rebuilding/wrapping the sklearn.linear_model.LogisticRegression class.

The SoftmaxRegression class rebuilds a softmax regression model from a given weight vector and intercept.

Parameters

model (instance of sklearn.linear_model.LogisticRegression) – The softmax regression model.

w

The weight vector (a matrix if we have more than two classes).

Type

numpy.ndarray

b

The intercept/bias (a vector if we have more than two classes).

Type

numpy.ndarray

dim

Dimensionality of the input data.

Type

int

is_multiclass

True if model is a multiclass classifier (more than two classes), False if it is a binary classifier.

Type

boolean

Raises

TypeError – If model is not an instance of sklearn.linear_model.LogisticRegression

get_loss(y_target, pred=None)

Creates and returns a loss function.

Builds a negative-log-likelihood cost function where the target is y_target.

Parameters
  • y_target (int) – The target class.

  • pred (callable, optional) –

    A callable that maps an input to the output (class probabilities).

    If pred is None, the class method predict is used for mapping the input to the output (class probabilities).

    The default is None.

Returns

Initialized negative-log-likelihood cost function. Target label is y_target.

Return type

ceml.backend.jax.costfunctions.NegLogLikelihoodCost

predict(x)

Predict the output of a given input.

Computes the class probabilities for a given input x.

Parameters

x (numpy.ndarray) – The input x that is going to be classified.

Returns

An array containing the class probabilities.

Return type

jax.numpy.array
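
To make the roles of w, b and the negative-log-likelihood loss concrete, here is a minimal pure-Python sketch of the multiclass case. It is not the library's code (which, per the return types above, works on jax arrays): softmax_predict and neg_log_likelihood are hypothetical names, and the weights are invented.

```python
import math


def softmax_predict(w, b, x):
    """Class probabilities softmax(w x + b); w has one row per class."""
    logits = [sum(wi * xi for wi, xi in zip(row, x)) + bi
              for row, bi in zip(w, b)]
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def neg_log_likelihood(probs, y_target):
    """Loss that is small when the target class has high probability."""
    return -math.log(probs[y_target])


# Hypothetical three-class model on two features.
w = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
b = [0.0, 0.0, 0.0]

probs = softmax_predict(w, b, [2.0, 0.0])
loss_class0 = neg_log_likelihood(probs, 0)
loss_class2 = neg_log_likelihood(probs, 2)
```

A counterfactual search for target class 2 would move x so that loss_class2 decreases, that is, until class 2 receives the highest probability.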

ceml.sklearn.softmaxregression.softmaxregression_generate_counterfactual(model, x, y_target, features_whitelist=None, regularization='l1', C=1.0, optimizer='mp', optimizer_args=None, return_as_dict=True, done=None, plausibility=None)

Computes a counterfactual of a given input x.

Parameters
  • model (a sklearn.linear_model.LogisticRegression instance.) –

    The softmax regression model that is used for computing the counterfactual.

    Note: model.multi_class must be set to multinomial.

  • x (numpy.ndarray) – The input x whose prediction has to be explained.

  • y_target (int or float or a callable that returns True if a given prediction is accepted.) – The requested prediction of the counterfactual.

  • features_whitelist (list(int), optional) –

    List of feature indices (dimensions of the input space) that can be used when computing the counterfactual.

    If features_whitelist is None, all features can be used.

    The default is None.

  • regularization (str or ceml.costfunctions.costfunctions.CostFunction, optional) –

    Regularizer of the counterfactual. Penalty for deviating from the original input x.

    Supported values:

    • l1: Penalizes the absolute deviation.

    • l2: Penalizes the squared deviation.

    regularization can be a description of the regularization, an instance of ceml.costfunctions.costfunctions.CostFunction (or ceml.costfunctions.costfunctions.CostFunctionDifferentiable if your cost function is differentiable) or None if no regularization is requested.

    If regularization is None, no regularization is used.

    The default is “l1”.

  • C (float or list(float), optional) –

    The regularization strength. If C is a list, all values in C are tried and as soon as a counterfactual is found, this counterfactual is returned and no other values of C are tried.

    C is ignored if no regularization is used (regularization=None).

    The default is 1.0.

  • optimizer (str or instance of ceml.optim.optimizer.Optimizer, optional) –

    Name/Identifier of the optimizer that is used for computing the counterfactual. See ceml.optim.optimizer.prepare_optim() for details.

    Softmax regression supports the use of mathematical programs for computing counterfactuals - set optimizer to “mp” for using a convex quadratic program for computing the counterfactual. Note that in this case the hyperparameter C is ignored.

    As an alternative, we can use any (custom) optimizer that is derived from the ceml.optim.optimizer.Optimizer class.

    The default is “mp”.

  • optimizer_args (dict, optional) –

    Dictionary for overriding the default hyperparameters of the optimization algorithm.

    The default is None.

  • return_as_dict (boolean, optional) –

    If True, returns the counterfactual, its prediction and the needed changes to the input as dictionary. If False, the results are returned as a triple.

    The default is True.

  • done (callable, optional) – Not used.

  • plausibility (dict, optional) –

    If set to a valid dictionary (see ceml.sklearn.plausibility.prepare_computation_of_plausible_counterfactuals()), a plausible counterfactual (as proposed in Artelt et al. 2020) is computed. Note that in this case, all other parameters are ignored.

    If plausibility is None, the closest counterfactual is computed.

    The default is None.

Returns

A dictionary where the counterfactual is stored in ‘x_cf’, its prediction in ‘y_cf’ and the changes to the original input in ‘delta’.

If return_as_dict is False, the triple (x_cf, y_cf, delta) is returned instead.

Return type

dict or triple

Raises

Exception – If no counterfactual was found.

ceml.sklearn.utils

ceml.sklearn.utils.build_regularization_loss(regularization, x, input_wrapper=None)

Build a regularization loss.

Parameters
  • regularization (str, ceml.costfunctions.costfunctions.CostFunction or None) –

    Description of the regularization, instance of ceml.costfunctions.costfunctions.CostFunction (or ceml.costfunctions.costfunctions.CostFunctionDifferentiable if your cost function is differentiable) or None if no regularization is requested.

    See ceml.sklearn.utils.desc_to_regcost() for a list of supported descriptions.

    If no regularization is requested, an instance of ceml.backend.jax.costfunctions.costfunctions.DummyCost is returned. This cost function always outputs zero, no matter what the input is.

  • x (numpy.array) – The original input from which we do not want to deviate much.

  • input_wrapper (callable, optional) –

    Converts the input (e.g. if we want to exclude some features/dimensions, we might have to include these missing features before applying any function to it).

    If input_wrapper is None, the input is passed without any modifications.

    The default is None.

Returns

An instance of ceml.costfunctions.costfunctions.CostFunction or the user defined, callable, regularization.

Return type

callable

Raises

TypeError – If regularization has an invalid type.

ceml.sklearn.utils.desc_to_dist(desc)

Converts a description of a distance metric into a jax.numpy function.

Supported descriptions:

  • l1: l1-norm

  • l2: l2-norm

Parameters

desc (str) – Description of the distance metric.

Returns

The distance function implemented as a jax.numpy function.

Return type

callable

Raises

ValueError – If desc contains an invalid description.

ceml.sklearn.utils.desc_to_regcost(desc, x, input_wrapper)

Converts a description of a regularization into a jax.numpy function.

Supported descriptions:

  • l1: l1-regularization

  • l2: l2-regularization

Parameters
  • desc (str) – Description of the regularization.

  • x (numpy.array) – The original input from which we do not want to deviate much.

  • input_wrapper (callable) – Converts the input (e.g. if we want to exclude some features/dimensions, we might have to include these missing features before applying any function to it).

Returns

The regularization function implemented as a jax.numpy function.

Return type

callable

Raises

ValueError – If desc contains an invalid description.
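
The mapping from a description string to a penalty can be sketched in pure Python as below. This is an illustration, not the library's implementation (which returns jax.numpy-based functions): desc_to_penalty is a hypothetical name, and the formulas follow the wording used in this reference (absolute deviation for l1, squared deviation for l2).

```python
def desc_to_penalty(desc, x_orig, input_wrapper=None):
    """Return a penalty function for deviating from x_orig."""
    # Without a wrapper the candidate input is used as-is.
    wrap = input_wrapper if input_wrapper is not None else (lambda z: z)
    if desc == "l1":
        return lambda x: sum(abs(a - b) for a, b in zip(wrap(x), x_orig))
    if desc == "l2":
        return lambda x: sum((a - b) ** 2 for a, b in zip(wrap(x), x_orig))
    raise ValueError(f"Invalid regularization description: {desc!r}")


x_orig = [1.0, 2.0]
l1 = desc_to_penalty("l1", x_orig)
l2 = desc_to_penalty("l2", x_orig)

l1_value = l1([2.0, 0.0])  # |2 - 1| + |0 - 2|
l2_value = l2([2.0, 0.0])  # (2 - 1)^2 + (0 - 2)^2
```

The l1 penalty tends to favour counterfactuals that change few features, while the squared penalty punishes large changes to any single feature more strongly.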