storage#

class storage.BatchStorage(store_targets=True)[source]#

Bases: BaseStorage

A Batch Storage storing all seen samples.

update(x, y=None)[source]#

Given a data point, updates the storage.

Parameters:
  • x (dict) – Features of the data point.

  • y (Optional[Any]) – Target as float or integer.

Returns:

None

class storage.GeometricReservoirStorage(size, constant_probability=None, store_targets=False)[source]#

Bases: ReservoirStorage

Geometric Reservoir Storage

update(x, y=None)[source]#

class storage.IntervalStorage(size, store_targets=True)[source]#

Bases: BaseStorage

An Interval Storage storing the last k samples, where k is given by size.
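The "last k samples" behaviour can be illustrated with a bounded double-ended queue. The following is a minimal sketch, not the library's implementation: the class name SlidingWindowSketch, the private attributes, and the (features, targets) return shape of get_data are all assumptions made for illustration.

```python
from collections import deque


class SlidingWindowSketch:
    """Hypothetical stand-in for an interval storage (not the library class)."""

    def __init__(self, size, store_targets=True):
        self.size = size
        self.store_targets = store_targets
        # deque(maxlen=size) silently drops the oldest entry once it is full.
        self._x = deque(maxlen=size)
        self._y = deque(maxlen=size)

    def update(self, x, y=None):
        # Append the newest sample; the oldest one falls out automatically.
        self._x.append(x)
        if self.store_targets:
            self._y.append(y)

    def get_data(self):
        # Assumed return shape: (list of feature dicts, list of targets).
        return list(self._x), list(self._y)


window = SlidingWindowSketch(size=3)
for i in range(5):
    window.update({"f": i}, i)
features, targets = window.get_data()
```

After five updates with size=3, only the three most recent samples remain in the window.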

get_data()[source]#

update(x, y=None)[source]#

class storage.ReservoirStorage(size, store_targets=False)[source]#

Bases: BaseStorage, ABC

Base class for reservoir storages.

size#

Size of the reservoir.

Type:

int

store_targets#

Flag indicating whether the target values are also stored.

Type:

bool

class storage.SequenceStorage(store_targets=True)[source]#

Bases: IntervalStorage

An Interval Storage storing the last sample.

class storage.TreeStorage(cat_feature_names, num_feature_names, max_depth=5, leaf_reservoir_length=10, grace_period=200, seed=None)[source]#

Bases: BaseStorage

A Tree Storage that trains incremental decision trees for each feature.

feature_names#

List of features stored.

Type:

list[str]

cat_feature_names#

List of categorical features stored.

Type:

list[str]

num_feature_names#

List of numerical features stored.

Type:

list[str]

performances#

Dictionary of performance metrics per incremental decision tree for each feature stored.

Type:

dict[Any, Union[R2, Accuracy]]

data_reservoirs#

Dictionary of data reservoirs for each feature and leaf nodes.

Type:

dict[str, dict]

__call__(feature_name)[source]#

Given a feature name, returns the associated data reservoirs.

Parameters:

feature_name (str) – The feature name for which to return the data reservoirs.

Returns:

Tuple of the data reservoir and a flag indicating whether the feature is stored as a numerical or a categorical feature.

Return type:

(dict, str)

Raises:

ValueError – If feature_name is stored neither as a categorical feature nor as a numerical feature.

static get_path_through_tree(node, x_i)[source]#

Given a data point and a starting node, traverses the decision tree.

Parameters:
  • node – Root node of the model.

  • x_i – Data point to traverse the tree with.

Returns:

The walked path through the decision tree.

Return type:

str
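To make the idea of a "walked path" concrete, here is a minimal sketch of a tree traversal that records each branch decision in a string. It does not use the library's tree nodes: the Node dataclass, its fields, and the encoding of the path as a string of branch indices are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """Hypothetical binary tree node; feature=None marks a leaf."""
    feature: Optional[str] = None
    threshold: float = 0.0
    children: list = field(default_factory=list)  # [left, right]


def get_path_sketch(node, x_i):
    """Walk from `node` down to a leaf and return the branch indices taken."""
    path = ""
    while node.feature is not None:
        # Branch 0 if the value is at most the threshold, otherwise branch 1.
        branch = 0 if x_i[node.feature] <= node.threshold else 1
        path += str(branch)
        node = node.children[branch]
    return path


# Root splits on "a"; its right child splits on "b".
tree = Node("a", 0.5, [Node(), Node("b", 2.0, [Node(), Node()])])
path = get_path_sketch(tree, {"a": 1.0, "b": 1.0})
```

Under these assumptions, each distinct path string identifies one leaf, which is what allows a separate data reservoir to be kept per leaf node.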

update(x, y=None)[source]#

Given a data point, it updates the storage.

Parameters:
  • x (Dict) – Features of the data point.

  • y (Optional[Any]) – Target as float or integer (not used)

Returns:

None

class storage.UniformReservoirStorage(size=1000, store_targets=False)[source]#

Bases: ReservoirStorage

Uniform Reservoir Storage

Summarizes a data stream by keeping track of a fixed-length reservoir of observations. Each past observation of the stream has an equal probability of being in the reservoir at the current time. For more information, see https://en.wikipedia.org/wiki/Reservoir_sampling.

stored_samples#

Number of samples observed in the stream.

Type:

int

update(x, y=None)[source]#

Updates the reservoir with the current sample if necessary.

The update mechanism follows the optimal algorithm as stated here: https://en.wikipedia.org/wiki/Reservoir_sampling#Optimal:_Algorithm_L.

Parameters:
  • x (dict) – Current observation’s features.

  • y (Any, optional) – Current observation’s label. Defaults to None.
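The update rule referenced above (Algorithm L) can be sketched in a streaming form as follows. This is an illustrative sketch rather than the library's code: the class name ReservoirSketch, the seed argument, and the private helpers are assumptions, and the small clamp on W is added here only to keep the logarithm finite.

```python
import math
import random


class ReservoirSketch:
    """Hypothetical uniform reservoir using Algorithm L (not the library class)."""

    def __init__(self, size, seed=None):
        self.size = size
        self.stored_samples = 0  # number of samples observed in the stream
        self.reservoir = []
        self._rng = random.Random(seed)
        self._w = self._shrink(1.0)
        # 1-based stream position of the next sample that enters the reservoir.
        self._next = size + self._skip() + 1

    def _u(self):
        # Uniform draw from the open interval (0, 1), so log() is defined.
        u = self._rng.random()
        while u == 0.0:
            u = self._rng.random()
        return u

    def _shrink(self, w):
        # W <- W * u^(1/size), clamped just below 1 to keep log1p(-W) finite.
        return min(w * math.exp(math.log(self._u()) / self.size), 1.0 - 1e-12)

    def _skip(self):
        # Geometrically distributed number of samples to skip.
        return math.floor(math.log(self._u()) / math.log1p(-self._w))

    def update(self, x, y=None):
        self.stored_samples += 1
        if len(self.reservoir) < self.size:
            # Fill phase: the first `size` samples are always stored.
            self.reservoir.append((x, y))
        elif self.stored_samples == self._next:
            # Replace a uniformly chosen slot, then schedule the next hit.
            self.reservoir[self._rng.randrange(self.size)] = (x, y)
            self._w = self._shrink(self._w)
            self._next += self._skip() + 1


sketch = ReservoirSketch(size=5, seed=0)
for i in range(1000):
    sketch.update({"f": i}, i)
```

Unlike the simpler variant that draws a random number for every incoming sample, this scheme draws random numbers only when a replacement happens and skips all samples in between.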

Modules

storage.base

This module contains base storage objects.

storage.batch_storage

This module contains the batch storage.

storage.geometric_reservoir_storage

This module contains the GeometricReservoirStorage.

storage.interval_storage

This module contains the IntervalStorage.

storage.reservoir_storage

This module contains the base ReservoirStorage.

storage.sequence_storage

storage.tree_storage

This module contains the TreeStorage and the MeanVarRegressor leaf classifier.

storage.uniform_reservoir_storage

This module contains the UniformReservoirStorage.