Getting started!

Usage

PyPads is easy to use. Just define what needs to be tracked in the config and call PyPads.

A simple example looks like the following:

from pypads.base import PyPads
# define the configuration, in this case we want to track the parameters,
# outputs and inputs of every called function that is hooked by pypads_fit or pypads_predict
config = {"events": {
    "parameters": {"on": ["pypads_fit"]},
    "output": {"on": ["pypads_fit", "pypads_predict"]},
    "input": {"on": ["pypads_fit"]}
}}
# A simple initialization of the class will activate the tracking
PyPads(config=config)

# An example
from sklearn import datasets
from sklearn.tree import DecisionTreeClassifier

# load the iris dataset
dataset = datasets.load_iris()

# fit a model to the data
model = DecisionTreeClassifier()
model.fit(dataset.data, dataset.target) # pypads will track the parameters, output, and input of the model fit function.
# get the predictions
predicted = model.predict(dataset.data) # pypads will track only the output of the model predict function.

The hooks used by each event are defined in the mapping JSON file, where each hook lists the functions it listens to.

Mapping file example

For the previous example, the sklearn mapping JSON file would look like the following:

{
  "default_hooks": {
    "modules": {
      "fns": {}
    },
    "classes": {
      "fns": {
        "pypads_init": [
          "__init__"
        ],
        "pypads_fit": [
          "fit",
          "fit_predict",
          "fit_transform"
        ],
        "pypads_predict": [
          "fit_predict",
          "predict",
          "score"
        ],
        "pypads_transform": [
          "fit_transform",
          "transform"
        ]
      }
    },
    "fns": {}
  },
  "algorithms": [
    {
      "name": "base sklearn estimator",
      "other_names": [],
      "implementation": {
        "sklearn": "sklearn.base.BaseEstimator"
      },
      "hooks": {
        "pypads_fit": [
          "fit",
          "fit_predict",
          "fit_transform"
        ],
        "pypads_predict": [
          "fit_predict",
          "predict"
        ],
        "pypads_transform": [
          "fit_transform",
          "transform"
        ]
      }
    },
    {
      "name": "sklearn classification metrics",
      "other_names": [],
      "implementation": {
        "sklearn": "sklearn.metrics.classification"
      },
      "hooks": {
        "pypads_metric": [
          ".*"
        ]
      }
    },
    {
      "name": "sklearn datasets",
      "other_names": [],
      "implementation": {
        "sklearn": "sklearn.datasets"
      },
      "hooks": {
        "pypads_dataset": [
          "load*"
        ]
      }
    },
    {
      "name": "sklearn cross validation",
      "other_names": [],
      "implementation": {
        "sklearn": "sklearn.model_selection._search.BaseSearchCV"
      },
      "hooks": {
        "pypads_param_search": [
          "fit"
        ]
      }
    },
    {
      "name": "sklearn cross validation",
      "other_names": [],
      "implementation": {
        "sklearn": "sklearn.model_selection._validation._fit_and_score"
      },
      "hooks": {
        "pypads_param_search_exec": "always"
      }
    },
    {
      "name": "sklearn cross validation",
      "other_names": [],
      "implementation": {
        "sklearn": "sklearn.model_selection._split.BaseCrossValidator"
      },
      "hooks": {
        "pypads_split": [
          "split"
        ]
      }
    },
    {
      "name": "sklearn shuffle split",
      "other_names": [],
      "implementation": {
        "sklearn": "sklearn.model_selection._split.BaseShuffleSplit"
      },
      "hooks": {
        "pypads_split": [
          "split"
        ]
      }
    },
    {
      "name": "base sklearn estimator",
      "other_names": [],
      "implementation": {
        "sklearn": "sklearn.base.RegressorMixin"
      },
      "hooks": {
        "pypads_metric": [
          "score"
        ]
      }
    },
    {
      "name": "base sklearn estimator",
      "other_names": [],
      "implementation": {
        "sklearn": "sklearn.base.ClassifierMixin"
      },
      "hooks": {
        "pypads_metric": [
          "score"
        ]
      }
    },
    {
      "name": "base sklearn estimator",
      "other_names": [],
      "implementation": {
        "sklearn": "sklearn.base.DensityMixin"
      },
      "hooks": {
        "pypads_metric": [
          "score"
        ]
      }
    },
    {
      "name": "base decision tree",
      "other_names": [],
      "implementation": {
        "sklearn": "sklearn.tree.tree.BaseDecisionTree"
      }
    },
    {
      "name": "logistic regression",
      "other_names": [
        "logit regression",
        "maximum-entropy classification",
        "MaxEnt",
        "log-linear classifier"
      ],
      "implementation": {
        "sklearn": "sklearn.linear_model.logistic.LogisticRegression"
      },
      "type": "Classification",
      "hyper_parameters": {
        "model_parameters": [
          {
            "name": "penalty_norm",
            "kind_of_value": "{l1, l2}",
            "optional": "False",
            "description": "Used to specify the norm used in the penalization.",
            "sklearn": {
              "default_value": "'l2'",
              "path": "penalty"
            }
          },
          {
            "name": "dual",
            "kind_of_value": "boolean",
            "optional": "False",
            "description": "Dual or primal formulation.",
            "sklearn": {
              "default_value": "False",
              "path": "dual"
            }
          },
          {
            "name": "tolerance",
            "kind_of_value": "float",
            "optional": "False",
            "description": "Tolerance for stopping criteria.",
            "sklearn": {
              "default_value": "0.0001",
              "path": "tol"
            }
          },
          {
            "name": "inverse_regularisation_strength",
            "kind_of_value": "float",
            "optional": "False",
            "description": "Inverse of regularization strength; must be a positive float. Like in support vector machines, smaller values specify stronger regularization.",
            "sklearn": {
              "default_value": "1.0",
              "path": "C"
            }
          },
          {
            "name": "fit_intercept",
            "kind_of_value": "boolean",
            "optional": "False",
            "description": "Specifies if a constant (a.k.a. bias or intercept) should be added to the decision function.",
            "sklearn": {
              "default_value": "True",
              "path": "fit_intercept"
            }
          },
          {
            "name": "intercept_scaling",
            "kind_of_value": "float",
            "optional": "False",
            "description": "Useful only when the solver \\u2018liblinear\\u2019 is used and self.fit_intercept is set to True. In this case, x becomes [x, self.intercept_scaling], i.e. a \\u201csynthetic\\u201d feature with constant value equal to intercept_scaling is appended to the instance vector. The intercept becomes intercept_scaling * synthetic_feature_weight.",
            "sklearn": {
              "default_value": "1",
              "path": "intercept_scaling"
            }
          },
          {
            "name": "class_weight",
            "kind_of_value": "{dict, 'balanced', None}",
            "optional": "False",
            "description": "Weights associated with classes.",
            "sklearn": {
              "default_value": "None",
              "path": "class_weight"
            }
          },
          {
            "name": "random_state",
            "kind_of_value": "{integer, RandomState instance, None}",
            "optional": "True",
            "description": "The seed of the pseudo random number generator to use when shuffling the data. If int, random_state is the seed used by the random number generator; If RandomState instance, random_state is the random number generator; If None, the random number generator is the RandomState instance used by np.random.",
            "sklearn": {
              "default_value": "None",
              "path": "random_state"
            }
          },
          {
            "name": "solver",
            "kind_of_value": "{'newton-cg', 'lbfgs', 'liblinear', 'sag', 'saga'}",
            "optional": "False",
            "description": "Solver to use in the computational routines.",
            "sklearn": {
              "default_value": "'liblinear'",
              "path": "solver"
            }
          },
          {
            "name": "multi_class",
            "kind_of_value": "{'ovr', 'multinomial'}",
            "optional": "False",
            "description": "If the option chosen is \\u2018ovr\\u2019, then a binary problem is fit for each label. Else the loss minimised is the multinomial loss fit across the entire probability distribution.",
            "sklearn": {
              "default_value": "'ovr'",
              "path": "multi_class"
            }
          },
          {
            "name": "verbose",
            "kind_of_value": "integer",
            "optional": "True",
            "description": "For the liblinear and lbfgs solvers set verbose to any positive number for verbosity.",
            "sklearn": {
              "default_value": "0",
              "path": "verbose"
            }
          }
        ],
        "optimisation_parameters": [
          {
            "name": "max_iterations",
            "kind_of_value": "integer",
            "optional": "False",
            "description": "Maximum number of iterations.",
            "sklearn": {
              "default_value": "100",
              "path": "max_iter"
            }
          },
          {
            "name": "reuse_previous",
            "kind_of_value": "boolean",
            "optional": "False",
            "description": "When set to True, reuse the solution of the previous call to fit as initialization, otherwise, just erase the previous solution.",
            "sklearn": {
              "default_value": "False",
              "path": "warm_start"
            }
          },
          {
            "name": "jobs",
            "kind_of_value": "integer",
            "optional": "False",
            "description": "Number of CPU cores used when parallelizing over classes.",
            "sklearn": {
              "default_value": "1",
              "path": "n_jobs"
            }
          }
        ],
        "execution_parameters": []
      }
    }
  ],
  "metadata": {
    "author": "Michael Granitzer",
    "library": "sklearn",
    "library_version": "0.19.1",
    "mapping_version": "0.1"
  }
}

For example, the “pypads_fit” hook is triggered by any fit, fit_predict or fit_transform call made by a tracked class that defines those methods.
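
To make this concrete, here is a small sketch, assuming PyPads has been initialized as in the usage example above and the sklearn mapping shown here is active (KMeans is just an illustrative choice of estimator):

from sklearn import datasets
from sklearn.cluster import KMeans

X = datasets.load_iris().data  # "load_iris" matches the pypads_dataset hook ("load*")

km = KMeans(n_clusters=3)
km.fit(X)          # "fit" is listed under pypads_fit
km.fit_predict(X)  # "fit_predict" is listed under both pypads_fit and pypads_predict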

Defining a hook for an event

A hook can be defined in the mapping file in three different ways.

  1. Always:

    {
      "name": "sklearn classification metrics",
      "other_names": [],
      "implementation": {
        "sklearn": "sklearn.metrics.classification"
      },
      "hooks": {
        "pypads_metric": "always"
      }
    }
    

    This hook always triggers. If you annotate a module with it, all of its functions and classes will be tracked.

  2. QualNameHook:

    {
      "name": "sklearn classification metrics",
      "other_names": [],
      "implementation": {
        "sklearn": "sklearn.metrics.classification"
      },
      "hooks": {
        "pypads_metric": ["f1_score"]
      }
    }
    

    Tracks any function whose qualified name matches the given regex.

  3. PackageNameHook:

    {
      "name": "sklearn classification metrics",
      "other_names": [],
      "implementation": {
        "sklearn": "sklearn.metrics"
      },
      "hooks": {
        "pypads_metric": [{"type": "package_name", "name":".*classification.*"}]
      }
    }
    

    Tracks all attributes of any module whose package name matches the given regex.
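
The regex matching used by options 2 and 3 can be illustrated with a standalone sketch. This is only an illustration of the matching idea with hypothetical helper names, not PyPads' internal implementation, and the exact matching semantics may differ:

import re

# Hypothetical helpers for illustration only (not part of PyPads):
# a QualNameHook compares the function's qualified name against the regex,
# a PackageNameHook compares the defining module's package path against it.
def qual_name_matches(pattern, qual_name):
    return re.fullmatch(pattern, qual_name) is not None

def package_name_matches(pattern, package_name):
    return re.fullmatch(pattern, package_name) is not None

print(qual_name_matches("f1_score", "f1_score"))                                     # True
print(package_name_matches(".*classification.*", "sklearn.metrics.classification"))  # True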

Define an event

Once the hooks are defined, they are linked to the events we want them to trigger. In the example below, the hook pypads_metric is linked to an event we call Metrics. This is done by passing a dictionary as the config parameter to the PyPads class:

config = {"events": {
    "Metrics": {"on": ["pypads_metric"]}
}}
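
Putting it together, a minimal end-to-end sketch could look like the following. It assumes the sklearn mapping above (which hooks sklearn.metrics.classification with pypads_metric, matching sklearn 0.19.x) is active and that PyPads resolves the Metrics event to a logging function:

from pypads.base import PyPads
from sklearn import datasets
from sklearn.metrics import f1_score
from sklearn.tree import DecisionTreeClassifier

# Link the pypads_metric hook to an event named "Metrics".
config = {"events": {
    "Metrics": {"on": ["pypads_metric"]}
}}
PyPads(config=config)

data = datasets.load_iris()
model = DecisionTreeClassifier().fit(data.data, data.target)
predicted = model.predict(data.data)

# f1_score is defined in sklearn.metrics.classification, which the mapping
# hooks with pypads_metric, so the "Metrics" event fires for this call.
f1_score(data.target, predicted, average="macro")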

PyPads loggers

PyPads has a set of built-in logging functions that are mapped by default to pre-defined events; see the default settings of PyPads in the documentation. The user can also define custom logging functions for custom events; details on how to do that can be found in the documentation as well.