storage.tree_storage#

This module contains the TreeStorage and the MeanVarRegressor leaf regressor.

Functions

get_all_tree_paths(node[, walked_path, paths])

Return type: List[str]

walk_through_tree(node, x_i[, until_leaf])

Traverses a decision tree from a starting node, following a given data point.

Classes

MeanVarRegressor()

A simple regressor model intended to be used as a leaf model in Decision Tree Regressors.

TreeStorage(cat_feature_names, num_feature_names)

A Tree Storage that trains incremental decision trees for each feature.

class storage.tree_storage.MeanVarRegressor[source]#

Bases: Regressor

A simple regressor model intended to be used as a leaf model in Decision Tree Regressors.

The regressor tracks the mean and standard deviation of the incoming numerical labels and draws each prediction from a normal distribution with the current mean and standard deviation.

learn_one(x, y)[source]#

Updates the summary statistics based on the target labels.

Parameters:
  • x (Any) – Input features (ignored; only the target updates the statistics).

  • y (base.typing.RegTarget) – Target value; a number that can be converted to a float.

Returns:

The Regressor itself.

Return type:

base.Regressor

predict_one(x=None)[source]#

Predicts a value based on the current summary statistics.

Parameters:

x (Any) – input features (that are not used for prediction)

Returns:

The predicted value.

Return type:

float
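The behaviour described for MeanVarRegressor can be illustrated with a minimal sketch. It is not the library's implementation: the class name MeanVarSketch, the seed argument, the std property, and the use of Welford's running update are all assumptions made for this example.

```python
import math
import random


class MeanVarSketch:
    """Hypothetical stand-in for a mean/variance leaf regressor (not the library class)."""

    def __init__(self, seed=None):
        self.n = 0        # number of targets seen so far
        self.mean = 0.0   # running mean of the targets
        self._m2 = 0.0    # running sum of squared deviations (Welford's method)
        self._rng = random.Random(seed)

    def learn_one(self, x, y):
        # x is ignored; only the target updates the summary statistics.
        y = float(y)
        self.n += 1
        delta = y - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (y - self.mean)
        return self

    @property
    def std(self):
        # Sample standard deviation; 0.0 until at least two targets were seen.
        return math.sqrt(self._m2 / (self.n - 1)) if self.n > 1 else 0.0

    def predict_one(self, x=None):
        # Draw from a normal distribution with the current mean and std.
        # With std == 0 this returns the mean itself.
        return self._rng.gauss(self.mean, self.std)


model = MeanVarSketch(seed=0)
for y in [1.0, 2.0, 3.0, 4.0]:
    model.learn_one({"any": "features"}, y)
prediction = model.predict_one()
```

Because predictions are sampled rather than fixed at the mean, repeated calls to predict_one return different values unless the standard deviation is zero.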

class storage.tree_storage.TreeStorage(cat_feature_names, num_feature_names, max_depth=5, leaf_reservoir_length=10, grace_period=200, seed=None)[source]#

Bases: BaseStorage

A Tree Storage that trains incremental decision trees for each feature.

feature_names#

List of features stored.

Type:

list[str]

cat_feature_names#

List of categorical features stored.

Type:

list[str]

num_feature_names#

List of numerical features stored.

Type:

list[str]

performances#

Dictionary of performance metrics per incremental decision tree for each feature stored.

Type:

dict[Any, Union[R2, Accuracy]]

data_reservoirs#

Dictionary of data reservoirs for each feature and leaf nodes.

Type:

dict[str, dict]

__call__(feature_name)[source]#

Given a feature name, returns the associated data reservoirs.

Parameters:

feature_name (str) – The feature name for which to return the data reservoirs.

Returns:

Tuple of the data reservoir and a flag indicating whether the feature is stored as a numerical or a categorical feature.

Return type:

(dict, str)

Raises:

ValueError – If feature_name is stored neither as a categorical feature nor as a numerical feature.
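The lookup contract of __call__ (a reservoir plus a type flag, or a ValueError for unknown features) can be sketched as follows. The class ReservoirLookupSketch is hypothetical, and the flag values "cat" and "num" are assumptions for this example; the documentation above only states that the flag is a str.

```python
class ReservoirLookupSketch:
    """Hypothetical illustration of a per-feature lookup (not the library class)."""

    def __init__(self, cat_feature_names, num_feature_names):
        self.cat_feature_names = list(cat_feature_names)
        self.num_feature_names = list(num_feature_names)
        # One (initially empty) reservoir dict per stored feature.
        self.data_reservoirs = {
            name: {} for name in self.cat_feature_names + self.num_feature_names
        }

    def __call__(self, feature_name):
        # The flag values "cat" and "num" are assumed for this sketch.
        if feature_name in self.cat_feature_names:
            return self.data_reservoirs[feature_name], "cat"
        if feature_name in self.num_feature_names:
            return self.data_reservoirs[feature_name], "num"
        raise ValueError(f"Unknown feature: {feature_name!r}")


storage = ReservoirLookupSketch(cat_feature_names=["color"], num_feature_names=["age"])
reservoir, kind = storage("age")
```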

static get_path_through_tree(node, x_i)[source]#

Given a data point and a starting node, traverses the decision tree.

Parameters:
  • node – Root node of the model.

  • x_i – Data point to traverse the tree with.

Returns:

The walked path through the decision tree.

Return type:

str

update(x, y=None)[source]#

Updates the storage with a data point.

Parameters:
  • x (Dict) – Features of the data point as a dict.

  • y (Optional[Any]) – Target as float or integer (not used).

Returns:

None

storage.tree_storage.get_all_tree_paths(node, walked_path='', paths=None)[source]#

Return type:

List[str]

storage.tree_storage.walk_through_tree(node, x_i, until_leaf=True)[source]#

Traverses a decision tree from a starting node, following a given data point.

Parameters:
  • node (Union[Branch, Leaf]) – Node at which to start the traversal.

  • x_i (dict) – Data point as a dict.

  • until_leaf (bool) – Flag indicating whether to traverse the tree until a leaf node is reached (True) or only to the next node (False).

Yields:

The next node in the tree.

Return type:

Iterable[Union[Branch, Leaf]]
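The generator-style traversal described for walk_through_tree can be sketched with stand-in node classes. LeafSketch, BranchSketch, and walk_sketch are hypothetical names introduced for this example; the real Branch and Leaf types, and details such as whether the starting node itself is yielded, may differ.

```python
from dataclasses import dataclass


@dataclass
class LeafSketch:
    label: str


@dataclass
class BranchSketch:
    feature: str
    threshold: float
    left: object   # child taken when x[feature] <= threshold
    right: object  # child taken otherwise

    def next(self, x):
        # Pick the child that the data point falls into.
        return self.left if x[self.feature] <= self.threshold else self.right


def walk_sketch(node, x_i, until_leaf=True):
    # Yield each node below `node` that x_i passes through.
    while isinstance(node, BranchSketch):
        node = node.next(x_i)
        yield node
        if not until_leaf:
            return  # stop after a single step


tree = BranchSketch(
    "age", 30,
    LeafSketch("young"),
    BranchSketch("income", 50, LeafSketch("low"), LeafSketch("high")),
)
x = {"age": 42, "income": 70}
full_walk = list(walk_sketch(tree, x))                   # down to the leaf
one_step = list(walk_sketch(tree, x, until_leaf=False))  # only the next node
```

In this sketch, until_leaf=False yields exactly one node, which matches the "just the next node" behaviour described for the flag.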