explainer.sage.batch

This module contains the Batch SAGE explainer.

Classes

BatchSage(model_function, feature_names, ...)

Batch SAGE Explainer

class explainer.sage.batch.BatchSage(model_function, feature_names, loss_function, n_inner_samples=1, storage=None, imputer=None)

Bases: object

Batch SAGE Explainer

Computes SAGE importance values according to the original definition in https://arxiv.org/abs/2004.00668. A storage is updated with all observations from a stream, and an explanation is then created with access to all of these observations at once. This can be computationally expensive for a large number of observations.
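To make the idea concrete, the following is a minimal sketch of a permutation-sampling SAGE estimate in plain Python. It is not the BatchSage implementation: the function name sage_sketch, the toy model, and the n_permutations parameter are assumptions made for illustration, and absent features are filled from a single randomly chosen stored observation per permutation, which is a simplification.

```python
import random


def sage_sketch(model, loss, x_data, y_data, feature_names, n_permutations=200, seed=0):
    """Monte Carlo estimate of SAGE values via permutation sampling (illustrative only)."""
    rng = random.Random(seed)
    totals = {name: 0.0 for name in feature_names}
    n = len(x_data)
    for _ in range(n_permutations):
        # Pick the observation to explain and a background observation for imputation.
        i = rng.randrange(n)
        x, y = x_data[i], y_data[i]
        background = x_data[rng.randrange(n)]
        order = list(feature_names)
        rng.shuffle(order)
        # Start with every feature "absent", i.e. replaced by background values.
        x_masked = dict(background)
        previous_loss = loss(y, model(x_masked))
        for name in order:
            x_masked[name] = x[name]  # reveal the true value of this feature
            current_loss = loss(y, model(x_masked))
            # The loss reduction is credited to the feature that was just revealed.
            totals[name] += previous_loss - current_loss
            previous_loss = current_loss
    return {name: total / n_permutations for name, total in totals.items()}


def model(x):
    # Toy model (hypothetical): uses only feature "a".
    return 2.0 * x["a"]


def squared_error(y_true, y_pred):
    return (y_true - y_pred) ** 2


x_data = [{"a": float(i), "b": float(i % 3)} for i in range(10)]
y_data = [2.0 * x["a"] for x in x_data]
values = sage_sketch(model, squared_error, x_data, y_data, ["a", "b"])
print(values)
```

Because the toy model ignores feature "b", revealing it never changes the loss and its estimated importance is zero, while "a" receives all of the credit.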

Parameters:

model_function (Callable[[Any], Any]) – The model function to be explained (e.g. model.predict_one (river), model.predict_proba (sklearn)).
loss_function (Union[Metric, Callable[[Any, Dict], float]]) – The loss function for which the importance values are calculated. This can either be a callable function or a predefined river.metrics.base.Metric.
  - river.metrics.base.Metric: Any Metric implemented in river (e.g. river.metrics.CrossEntropy() for classification or river.metrics.MSE() for regression).
  - callable function: The loss_function needs to follow the signature loss_function(y_true, y_pred) and handle the output dimensions of the model function. Smaller values are interpreted as being better unless overridden with loss_bigger_is_better=True. y_pred is passed as a dict.
feature_names (Sequence[Union[str, int, float]]) – List of feature names to be explained for the model.
storage (BaseStorage, optional) – Optional incremental data storage mechanism. Defaults to GeometricReservoirStorage(size=100) for dynamic modelling settings (dynamic_setting=True) and UniformReservoirStorage(size=100) in static modelling settings (dynamic_setting=False).
imputer (BaseImputer, optional) – Incremental imputing strategy to be used. Defaults to MarginalImputer(sampling_strategy='joint').
n_inner_samples (int) – Number of model evaluations per feature and explanation step (observation). Defaults to 1.
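As an illustration of the callable form of loss_function, here is a minimal sketch of a cross-entropy loss that follows the loss_function(y_true, y_pred) signature. The exact layout of the y_pred dict is an assumption: the sketch assumes it maps class labels to predicted probabilities, which may differ from what a given model function returns.

```python
import math


def cross_entropy_loss(y_true, y_pred):
    """Negative log-probability assigned to the true label (smaller is better).

    Assumption: y_pred maps class labels to probabilities,
    e.g. {"cat": 0.8, "dog": 0.2}.
    """
    eps = 1e-12
    # A label missing from the dict is treated as having probability zero.
    p = y_pred.get(y_true, 0.0)
    # Clip to avoid log(0) for missing or zero-probability labels.
    p = min(max(p, eps), 1.0)
    return -math.log(p)


loss_confident = cross_entropy_loss("cat", {"cat": 0.8, "dog": 0.2})
loss_wrong = cross_entropy_loss("cat", {"cat": 0.1, "dog": 0.9})
print(loss_confident, loss_wrong)
```

A function of this shape returns smaller values for better predictions, which matches the default interpretation described above.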

feature_names

The feature names of the dataset.

Type: Sequence[Union[str, int, float]]

n_inner_samples

Number of model evaluations per feature and explanation step (observation).

Type: int

explain_many(x_data, y_data, n_inner_samples=None, verbose=True)

Explain multiple observations (x_data, y_data) with all data stored.

Parameters:

x_data (List[dict]) – A list of input data to be explained, as dicts mapping from feature names to feature values.
y_data (List[Any]) – Target labels of the observations in x_data.
n_inner_samples (int, optional) – Number of model evaluations per feature for the current explanation step (observation). Defaults to None.
verbose (bool) – Flag indicating if the explanation should print to console (True) or not (False).

Returns:

The current SAGE feature importance scores.

Return type:

(dict)
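The expected input shapes can be sketched as follows. The feature names, rows, and labels are made up for illustration; the only things taken from the description above are that x_data is a list of dicts keyed by feature name and that y_data is a list of targets.

```python
# Hypothetical tabular data: one row per observation.
feature_names = ["age", "income"]
rows = [[25, 40_000.0], [52, 85_000.0], [37, 61_500.0]]
labels = [0, 1, 1]

# x_data: one dict per observation, mapping feature names to feature values.
x_data = [dict(zip(feature_names, row)) for row in rows]
# y_data: one target per observation, in the same order as x_data.
y_data = list(labels)

print(x_data[0])
print(len(x_data), len(y_data))
```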

explain_many_original(x_data, y_data, n_inner_samples=None, verbose=True)

Explain multiple observations (x_data, y_data) with all data stored according to the original definition in https://arxiv.org/abs/2004.00668.

Parameters:

x_data (List[dict]) – A list of input data to be explained, as dicts mapping from feature names to feature values.
y_data (List[Any]) – Target labels of the observations in x_data.
n_inner_samples (int, optional) – Number of model evaluations per feature for the current explanation step (observation). Defaults to None.
verbose (bool) – Flag indicating if the explanation should print to console (True) or not (False).

Returns:

The current SAGE feature importance scores.

Return type:

(dict)

explain_one(x_i, y_i, n_inner_samples=None, original_sage=False, verbose=True)

Explain one observation (x_i, y_i) with all data stored.

Parameters:

x_i (dict) – The input features of the current observation as a dict of feature names to feature values.
y_i (Any) – Target label of the current observation.
n_inner_samples (int, optional) – Number of model evaluations per feature for the current explanation step (observation). Defaults to None.
original_sage (bool) – Flag indicating if the original definition of SAGE is used (True) or not (False). Defaults to False.
verbose (bool) – Flag indicating if the explanation should print to console (True) or not (False).

Returns:

The current SAGE feature importance scores.

Return type:

(dict)
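The role of n_inner_samples can be sketched with a small marginal-imputation helper in plain Python. This is an illustration under stated assumptions, not the library's imputer: the helper name marginal_prediction, the toy model, and the reading of "joint" sampling as drawing all absent features from one stored observation are hypothetical.

```python
import random


def marginal_prediction(model, x_i, known_features, background, n_inner_samples=1, rng=None):
    """Average the model output over n_inner_samples imputations of the absent features."""
    rng = rng or random.Random(0)
    total = 0.0
    for _ in range(n_inner_samples):
        # Assumed "joint" strategy: all absent features come from the same stored observation.
        donor = rng.choice(background)
        x_imputed = dict(donor)
        for name in known_features:
            x_imputed[name] = x_i[name]  # keep the true values of the known features
        total += model(x_imputed)
    return total / n_inner_samples


def model(x):
    # Toy model (hypothetical): sum of the two features.
    return x["a"] + x["b"]


background = [{"a": 0.0, "b": 1.0}, {"a": 0.0, "b": 2.0}, {"a": 0.0, "b": 3.0}]
x_i = {"a": 10.0, "b": 0.0}

# Only "a" is known; "b" is imputed from the stored observations.
partial = marginal_prediction(model, x_i, {"a"}, background, n_inner_samples=50)
# Both features are known, so imputation has no effect.
full = marginal_prediction(model, x_i, {"a", "b"}, background, n_inner_samples=5)
print(partial, full)
```

A larger n_inner_samples averages over more imputations, which reduces the variance of each prediction at the cost of more model evaluations.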

update_storage(x_i, y_i)

Updates the data storage with one observation (x_i, y_i).

Parameters:

x_i (dict) – The input features of the current observation.
y_i (Any) – Target label of the current observation.
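For intuition about what a fixed-size storage does with a stream of observations, here is a minimal sketch of uniform reservoir sampling in plain Python. The class UniformReservoirSketch is hypothetical and only illustrates the general technique; it is not the library's UniformReservoirStorage and makes no claim about its internals.

```python
import random


class UniformReservoirSketch:
    """Keeps a uniform random sample of at most `size` observations from a stream."""

    def __init__(self, size, seed=0):
        self.size = size
        self.items = []
        self._n_seen = 0
        self._rng = random.Random(seed)

    def update(self, x_i, y_i):
        self._n_seen += 1
        if len(self.items) < self.size:
            # The reservoir is not full yet: always keep the observation.
            self.items.append((x_i, y_i))
        else:
            # Keep the new observation with probability size / n_seen,
            # replacing a uniformly chosen stored observation.
            j = self._rng.randrange(self._n_seen)
            if j < self.size:
                self.items[j] = (x_i, y_i)


reservoir = UniformReservoirSketch(size=100)
for t in range(1000):
    reservoir.update({"a": float(t)}, t % 2)

print(len(reservoir.items))
```

With this scheme every observation seen so far has the same chance of being in the reservoir, so memory stays bounded no matter how long the stream runs.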