explainer.sage
This module gathers the SAGE explanation methods.
- class explainer.sage.BatchSage(model_function, feature_names, loss_function, n_inner_samples=1, storage=None, imputer=None)
Bases: object
Batch SAGE Explainer
Computes SAGE importance values according to its original definition in https://arxiv.org/abs/2004.00668. A storage is updated with all observations from a stream, and an explanation is created with access to all of these observations at once. This can be computationally expensive for large numbers of observations.
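To make the original definition more concrete, here is a minimal sketch of SAGE's permutation-sampling estimator, written as a hypothetical standalone function rather than this module's API. It assumes a regression model that maps a feature dict to a float, a loss(y_true, y_pred) callable on floats, and that features outside the known set are imputed jointly from stored observations. For a deterministic illustration it averages over every stored observation instead of drawing n_inner_samples of them; `restricted_prediction` and `sage_permutation_estimate` are illustrative names.

```python
# Hypothetical sketch of SAGE permutation sampling; not the explainer.sage API.
import random
from typing import Any, Callable, Dict, List, Sequence, Set, Tuple

Observation = Dict[str, float]


def restricted_prediction(
    model: Callable[[Observation], float],
    x: Observation,
    subset: Set[str],
    storage: List[Observation],
) -> float:
    """Mean prediction when every feature outside `subset` is replaced,
    jointly, by the values of one stored observation (marginal imputation)."""
    total = 0.0
    for donor in storage:
        # Features in `subset` come from x; all others come from the same donor.
        x_imputed = {name: x[name] if name in subset else donor[name] for name in x}
        total += model(x_imputed)
    return total / len(storage)


def sage_permutation_estimate(
    model: Callable[[Observation], float],
    loss: Callable[[Any, float], float],
    feature_names: Sequence[str],
    data: List[Tuple[Observation, Any]],
    n_permutations: int = 1,
    seed: int = 0,
) -> Dict[str, float]:
    """Average loss reduction caused by adding each feature, in random order,
    to the set of known features."""
    rng = random.Random(seed)
    storage = [x for x, _ in data]
    importance = {name: 0.0 for name in feature_names}
    n_updates = 0
    for x, y in data:
        for _ in range(n_permutations):
            order = list(feature_names)
            rng.shuffle(order)
            subset: Set[str] = set()
            prev_loss = loss(y, restricted_prediction(model, x, subset, storage))
            for name in order:
                subset.add(name)
                new_loss = loss(y, restricted_prediction(model, x, subset, storage))
                # A feature is credited with the loss reduction it causes.
                importance[name] += prev_loss - new_loss
                prev_loss = new_loss
            n_updates += 1
    return {name: total / n_updates for name, total in importance.items()}
```

For instance, with `model = lambda x: 2 * x["a"]`, the squared error as loss, and four observations with a in {0, 1, 2, 3}, an arbitrary b and y = 2a, the ignored feature b receives exactly 0.0 and a receives 5.0, the mean squared deviation of the predictions from their marginal mean.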
- Parameters:
model_function (Callable[[Any], Any]) – The model function to be explained (e.g. model.predict_one (river), model.predict_proba (sklearn)).
loss_function (Union[Metric, Callable[[Any, Dict], float]]) –
The loss function for which the importance values are calculated. This can either be a callable function or a predefined river.metrics.base.Metric.
- river.metrics.base.Metric: Any metric implemented in river (e.g. river.metrics.CrossEntropy() for classification or river.metrics.MSE() for regression).
- callable function: The loss_function needs to follow the signature loss_function(y_true, y_pred) and handle the output dimensions of the model function. Smaller values are interpreted as being better unless overridden with loss_bigger_is_better=True. y_pred is passed as a dict.
feature_names (Sequence[Union[str, int, float]]) – List of feature names to be explained for the model.
storage (BaseStorage, optional) – Optional incremental data storage mechanism. Defaults to GeometricReservoirStorage(size=100) for dynamic modelling settings (dynamic_setting=True) and UniformReservoirStorage(size=100) in static modelling settings (dynamic_setting=False).
imputer (BaseImputer, optional) – Incremental imputing strategy to be used. Defaults to MarginalImputer(sampling_strategy='joint').
n_inner_samples (int) – Number of model evaluations per feature and explanation step (observation). Defaults to 1.
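As an illustration of the callable option for loss_function, here is a minimal sketch of a cross-entropy loss with the loss_function(y_true, y_pred) signature. It assumes the model function is a classifier whose y_pred dict maps class labels to probabilities (as river's predict_proba_one returns); the name `log_loss` and the eps clipping constant are illustrative choices, not part of this module.

```python
# Hypothetical callable loss; assumes y_pred maps class labels to probabilities.
import math
from typing import Any, Dict


def log_loss(y_true: Any, y_pred: Dict[Any, float], eps: float = 1e-15) -> float:
    """Cross-entropy of the true class; smaller values are better."""
    # A class missing from the prediction dict is treated as probability 0.
    p = y_pred.get(y_true, 0.0)
    # Clip the probability so that log(0) cannot occur.
    p = min(max(p, eps), 1.0 - eps)
    return -math.log(p)
```

Because smaller values mean a better fit, such a function matches the default interpretation and would be passed as loss_function=log_loss without further flags.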
- n_inner_samples
Number of model evaluations per feature and explanation step (observation).
- Type:
int
- explain_many(x_data, y_data, n_inner_samples=None, verbose=True)
Explain a list of observations (x_data, y_data) with all data stored.
- Parameters:
x_data (List[dict]) – A list of input data to be explained, as dicts mapping from feature names to feature values.
y_data (List[Any]) – Target labels of the observations in x_data.
n_inner_samples (int, optional) – Number of model evaluations per feature for the current explanation step (observation). Defaults to None.
verbose (bool) – Flag indicating if the explanation should print to console (True) or not (False).
- Returns:
The current SAGE feature importance scores.
- Return type:
(dict)
- explain_many_original(x_data, y_data, n_inner_samples=None, verbose=True)
Explain a list of observations (x_data, y_data) with all data stored, according to the original definition in https://arxiv.org/abs/2004.00668.
- Parameters:
x_data (List[dict]) – A list of input data to be explained, as dicts mapping from feature names to feature values.
y_data (List[Any]) – Target labels of the observations in x_data.
n_inner_samples (int, optional) – Number of model evaluations per feature for the current explanation step (observation). Defaults to None.
verbose (bool) – Flag indicating if the explanation should print to console (True) or not (False).
- Returns:
The current SAGE feature importance scores.
- Return type:
(dict)
- explain_one(x_i, y_i, n_inner_samples=None, original_sage=False, verbose=True)
Explain one observation (x_i, y_i) with all data stored.
- Parameters:
x_i (dict) – The input features of the current observation as a dict of feature names to feature values.
y_i (Any) – Target label of the current observation.
n_inner_samples (int, optional) – Number of model evaluations per feature for the current explanation step (observation). Defaults to None.
original_sage (bool) – Flag indicating if the original definition of SAGE is used (True) or not (False). Defaults to False.
- Returns:
The current SAGE feature importance scores.
- Return type:
(dict)
- class explainer.sage.IncrementalSage(model_function, loss_function, feature_names, *, smoothing_alpha=None, storage=None, imputer=None, n_inner_samples=1, dynamic_setting=True, loss_bigger_is_better=False)
Bases: BaseIncrementalFeatureImportance
Incremental SAGE Explainer
Computes SAGE importance values incrementally by applying exponential smoothing. For each input tuple (x_i, y_i), one update of the explanation procedure is performed.
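The exponential smoothing step can be sketched as follows, assuming a per-observation estimate of each feature's loss reduction is already available (for example from a single permutation pass). The class `SmoothedImportance` is hypothetical; its smoothing_alpha mirrors the parameter of the same name, and starting all values at 0 is an assumption of the sketch.

```python
# Hypothetical sketch of exponentially smoothed importance values.
from typing import Dict, Sequence


class SmoothedImportance:
    """Exponential moving average of per-observation importance estimates."""

    def __init__(self, feature_names: Sequence[str], smoothing_alpha: float = 0.001) -> None:
        if not 0.0 < smoothing_alpha <= 1.0:
            raise ValueError("smoothing_alpha must lie in (0, 1]")
        self.alpha = smoothing_alpha
        # Assumption: every importance value starts at 0.
        self.values: Dict[str, float] = {name: 0.0 for name in feature_names}

    def update(self, estimate: Dict[str, float]) -> Dict[str, float]:
        # Blend the newest estimate into the history; older estimates decay
        # by a factor of (1 - alpha) at every update.
        for name, value in estimate.items():
            self.values[name] = (1.0 - self.alpha) * self.values[name] + self.alpha * value
        return dict(self.values)
```

With smoothing_alpha=1 only the latest estimate counts, while small values such as 0.001 average over a window on the order of 1/alpha recent observations.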
- Parameters:
model_function (Callable) – The model function to be explained (e.g. model.predict_one (river), model.predict_proba (sklearn)).
loss_function (Union[Metric, Callable[[Any, Dict], float]]) –
The loss function for which the importance values are calculated. This can either be a callable function or a predefined river.metrics.base.Metric.
- river.metrics.base.Metric: Any metric implemented in river (e.g. river.metrics.CrossEntropy() for classification or river.metrics.MSE() for regression).
- callable function: The loss_function needs to follow the signature loss_function(y_true, y_pred) and handle the output dimensions of the model function. Smaller values are interpreted as being better unless overridden with loss_bigger_is_better=True. y_pred is passed as a dict.
feature_names (Sequence[Union[str, int, float]]) – List of feature names to be explained for the model.
smoothing_alpha (float, optional) – The smoothing parameter for the exponential smoothing of the importance values. Should lie in the interval (0, 1]. Defaults to 0.001.
storage (BaseStorage, optional) – Optional incremental data storage mechanism. Defaults to GeometricReservoirStorage(size=100) for dynamic modelling settings (dynamic_setting=True) and UniformReservoirStorage(size=100) in static modelling settings (dynamic_setting=False).
imputer (BaseImputer, optional) – Incremental imputing strategy to be used. Defaults to MarginalImputer(sampling_strategy='joint').
n_inner_samples (int) – Number of model evaluations per feature and explanation step (observation). Defaults to 1.
dynamic_setting (bool) – Flag to indicate if the modelling setting is dynamic (True: changing model and adaptive explanation) or static (False: all observations contribute equally to the final importance). Defaults to True.
loss_bigger_is_better (bool) – Flag that indicates if a bigger loss value indicates a better fit (True) or not (False). This is only used to represent the marginal and model loss more sensibly. Defaults to False.
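The two default storages differ in how long old observations survive. The sketch below assumes the usual definitions: a uniform reservoir (reservoir sampling, where every observation seen so far is equally likely to be kept) and a geometric reservoir (once full, each new observation overwrites a random slot, so an old observation survives k further updates with probability (1 - 1/size)^k). `UniformReservoir` and `GeometricReservoir` are hypothetical stand-ins, not the library's UniformReservoirStorage and GeometricReservoirStorage.

```python
# Hypothetical reservoir sketches; not the library's storage classes.
import random
from typing import Any, List, Optional


class UniformReservoir:
    """Reservoir sampling: a uniform random sample of all items seen so far."""

    def __init__(self, size: int, seed: Optional[int] = None) -> None:
        self.size = size
        self.data: List[Any] = []
        self.n_seen = 0
        self._rng = random.Random(seed)

    def update(self, item: Any) -> None:
        self.n_seen += 1
        if len(self.data) < self.size:
            self.data.append(item)
            return
        # Keep the new item with probability size / n_seen.
        j = self._rng.randrange(self.n_seen)
        if j < self.size:
            self.data[j] = item


class GeometricReservoir:
    """Once full, every new item replaces a random slot, so recent data dominates."""

    def __init__(self, size: int, seed: Optional[int] = None) -> None:
        self.size = size
        self.data: List[Any] = []
        self._rng = random.Random(seed)

    def update(self, item: Any) -> None:
        if len(self.data) < self.size:
            self.data.append(item)
        else:
            self.data[self._rng.randrange(self.size)] = item
```

This matches the split described for dynamic_setting: the geometric variant favours recent observations for a changing model, while the uniform variant lets all observations contribute equally.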
- marginal_prediction
The current marginal prediction of the model_function (smoothed over time).
- n_inner_samples
Number of model evaluations per feature and explanation step (observation).
- Type:
int
- explain_one(x_i, y_i, n_inner_samples=None, update_storage=True)
Explain one observation (x_i, y_i).
- Parameters:
x_i (dict) – The input features of the current observation as a dict of feature names to feature values.
y_i (Any) – Target label of the current observation.
n_inner_samples (int, optional) – Number of model evaluations per feature for the current explanation step (observation). Defaults to None.
update_storage (bool) – Flag if the underlying incremental data storage mechanism is to be updated with the new observation (True) or not (False). Defaults to True.
- Returns:
The current SAGE feature importance scores.
- Return type:
(dict)
- property explained_loss
Explained loss property: the difference between the current marginal loss and the current model loss.
- property marginal_loss
Marginal loss property: the loss of the model without any features (the loss of the default prediction), smoothed over time.
- property model_loss
Model loss property: the loss of the model with features, smoothed over time.
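How the three loss properties relate can be sketched under a few assumptions: each loss is tracked as an exponential moving average, and the marginal prediction is itself a smoothed average of past prediction dicts, i.e. a default prediction that ignores the current features. The class `LossTracker` and its `update` method are hypothetical; the sketch does not model loss_bigger_is_better.

```python
# Hypothetical sketch of smoothed model, marginal, and explained loss.
from typing import Any, Callable, Dict

Prediction = Dict[Any, float]


class LossTracker:
    """Smoothed model loss, marginal loss, and their difference."""

    def __init__(self, loss: Callable[[Any, Prediction], float], smoothing_alpha: float = 0.001) -> None:
        self.loss = loss
        self.alpha = smoothing_alpha
        self.marginal_prediction: Prediction = {}
        self.model_loss = 0.0
        self.marginal_loss = 0.0

    def _smooth(self, old: float, new: float) -> float:
        return (1.0 - self.alpha) * old + self.alpha * new

    def update(self, y_true: Any, y_pred: Prediction) -> None:
        # The marginal prediction is a running average of the model outputs.
        for key in set(self.marginal_prediction) | set(y_pred):
            self.marginal_prediction[key] = self._smooth(
                self.marginal_prediction.get(key, 0.0), y_pred.get(key, 0.0)
            )
        self.model_loss = self._smooth(self.model_loss, self.loss(y_true, y_pred))
        self.marginal_loss = self._smooth(
            self.marginal_loss, self.loss(y_true, self.marginal_prediction)
        )

    @property
    def explained_loss(self) -> float:
        # Loss reduction the features achieve over the default prediction.
        return self.marginal_loss - self.model_loss
```

A positive explained loss means the features help the model beat its own feature-agnostic average prediction; SAGE distributes this quantity among the features.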
- class explainer.sage.IntervalSage(model_function, feature_names, loss_function, n_inner_samples=1, interval_length=1000, storage_length=1000, storage=None, imputer=None)
Bases: BatchSage
Interval SAGE Explainer
Computes SAGE importance values according to its original definition in https://arxiv.org/abs/2004.00668 at set time intervals. A storage of the last n observations (with n specified by storage_length) is kept, and the explanations are created on it.
- Parameters:
model_function (Callable[[Any], Any]) – The Model function to be explained (e.g. model.predict_one (river), model.predict_proba (sklearn)).
loss_function (Union[Metric, Callable[[Any, Dict], float]]) –
The loss function for which the importance values are calculated. This can either be a callable function or a predefined river.metrics.base.Metric.
- river.metrics.base.Metric: Any metric implemented in river (e.g. river.metrics.CrossEntropy() for classification or river.metrics.MSE() for regression).
- callable function: The loss_function needs to follow the signature loss_function(y_true, y_pred) and handle the output dimensions of the model function. Smaller values are interpreted as being better unless overridden with loss_bigger_is_better=True. y_pred is passed as a dict.
feature_names (Sequence[Union[str, int, float]]) – List of feature names to be explained for the model.
storage (IntervalStorage, optional) – Optional incremental data storage mechanism. Defaults to IntervalStorage(size=interval_length).
imputer (BaseImputer, optional) – Incremental imputing strategy to be used. Defaults to MarginalImputer(sampling_strategy='joint').
n_inner_samples (int) – Number of model evaluations per feature and explanation step (observation). Defaults to 1.
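interval_length (int) – Length of the explanation interval after which the explanations are created. Defaults to 1000.
storage_length (int) – Number of most recent observations kept in the storage on which the explanations are created. Defaults to 1000.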
- n_inner_samples
Number of model evaluations per feature and explanation step (observation).
- Type:
int
- explain_one(x_i, y_i, n_inner_samples=None, update_storage=True, force_explain=False, verbose=True)
Explain one observation (x_i, y_i) if enough time (interval_length) has passed since the last explanation.
- Parameters:
x_i (dict) – The input features of the current observation as a dict of feature names to feature values.
y_i (Any) – Target label of the current observation.
n_inner_samples (int, optional) – Number of model evaluations per feature for the current explanation step (observation). Defaults to None.
update_storage (bool) – Flag if the underlying incremental data storage mechanism is to be updated with the new observation (True) or not (False). Defaults to True.
force_explain (bool) – Overrides the interval_length restriction and explains the current sample. This does not change the interval_length globally, so the explainer keeps running in the same rhythm as before. Defaults to False.
verbose (bool) – Flag indicating if the explanation should print to console (True) or not (False).
- Returns:
The current SAGE feature importance scores.
- Return type:
(dict)
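The interval behaviour of explain_one can be sketched as an observation counter plus a bounded window of recent observations, assuming "time" is counted in observations. `IntervalScheduler` and its `explain_fn` argument are hypothetical; explain_fn stands in for a batch SAGE computation on the stored window, and deriving the schedule from the counter alone is one way to keep the rhythm unchanged after a forced explanation.

```python
# Hypothetical sketch of interval-triggered explanations; not the IntervalSage API.
from collections import deque
from typing import Any, Callable, Deque, Dict, List, Tuple

Pair = Tuple[Dict[str, Any], Any]


class IntervalScheduler:
    """Recompute a batch explanation every `interval_length` observations
    on a sliding window of the last `storage_length` observations."""

    def __init__(
        self,
        explain_fn: Callable[[List[Pair]], Dict[str, float]],
        interval_length: int = 1000,
        storage_length: int = 1000,
    ) -> None:
        self.explain_fn = explain_fn
        self.interval_length = interval_length
        # A bounded deque drops the oldest observation once it is full.
        self.storage: Deque[Pair] = deque(maxlen=storage_length)
        self.n_seen = 0
        self.importance: Dict[str, float] = {}

    def explain_one(
        self,
        x_i: Dict[str, Any],
        y_i: Any,
        update_storage: bool = True,
        force_explain: bool = False,
    ) -> Dict[str, float]:
        if update_storage:
            self.storage.append((x_i, y_i))
        self.n_seen += 1
        # The schedule depends only on n_seen, so a forced explanation
        # does not shift the regular rhythm.
        if force_explain or self.n_seen % self.interval_length == 0:
            self.importance = self.explain_fn(list(self.storage))
        return dict(self.importance)
```

Between scheduled explanations the most recent importance values are returned unchanged, which keeps the per-observation cost low at the price of explanations that lag behind the stream by up to interval_length observations.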
Modules
- This module contains the Batch SAGE explainer.
- This module contains the incremental SAGE explainer.
- This module contains the interval SAGE explainer.