Module leapyear.model

LeapYear models.

Data objects generated from training or evaluating models used in machine learning.

Regression-Based Models

class leapyear.model.GLM(affinity: bool, l1reg: float, l2reg: float, model: GeneralizedLinearModel)

A representation of a trained Generalized Linear Model (GLM).

Differentially private versions of GLMs are calibrated using various methods, e.g.

Objects of this class store parameters and structure of a regression model and can be used to generate predictions for regression and classification problems.

affinity: bool

Alias for field number 0

l1reg: float

Alias for field number 1

l2reg: float

Alias for field number 2

model: leapyear._tidl.protocol.algorithms.generalizedlinearmodel.GeneralizedLinearModel

Alias for field number 3

property coefficients

Model coefficients, excluding intercepts.

Return type


property intercept

Model intercept, if model has only one coefficient set.

Return type


property intercepts

Model intercepts, if any.

Return type


property model_type

Model type (e.g. linear, logistic).

Return type



Decision function of the generalized linear model.

Computes the height of the regression function (xbeta) at the provided points. This is a purely linear transformation of the input features.

For a logistic model, observations are ultimately classified based on the sign of this decision function.
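A minimal NumPy sketch of the computation described above. The function names, the explicit `coefficients` and `intercept` arguments, and the treatment of a score of exactly zero as negative are assumptions made for illustration; they are not the LeapYear API.

```python
import numpy as np

def decision_function(xs, coefficients, intercept):
    """Linear score x . beta + intercept for each row of xs."""
    xs = np.asarray(xs, dtype=float)
    return xs @ np.asarray(coefficients, dtype=float) + intercept

def classify_by_sign(scores):
    """Logistic-style classification: True where the score is positive.

    Treating a score of exactly 0 as False is an assumption of this sketch.
    """
    return np.asarray(scores) > 0

xs = np.array([[1.0, 2.0], [0.0, -1.0]])
scores = decision_function(xs, coefficients=[0.5, -1.0], intercept=0.25)
labels = classify_by_sign(scores)
```

Here the first point scores -1.25 and the second 1.25, so only the second is classified as positive.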


xs (ndarray) – a set of datapoints for which to predict


The predicted decision function

Return type



Prediction function of the generalized linear model.

For linear problems, returns the height of the regression line (decision function) at the data points provided.

For classification problems, returns a boolean classification, which is based on the sign of this decision function.


xs (ndarray) – a set of datapoints for which to predict


the predictions for the points according to the model

Return type



Probabilities given by generalized linear model.

For logistic classification problems, returns the probability that the model assigns to a positive response (True outcome variable) for each of the data points provided.
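As an illustration, the probability of a positive response in a standard logistic model is the sigmoid of the decision function. The sketch below assumes that standard link; `positive_probability` is a hypothetical helper, not a LeapYear function.

```python
import numpy as np

def positive_probability(scores):
    """Logistic sigmoid of the decision scores, written to avoid overflow."""
    scores = np.asarray(scores, dtype=float)
    # exp(-|z|) is always in (0, 1], so it cannot overflow.
    e = np.exp(-np.abs(scores))
    return np.where(scores >= 0, 1.0 / (1.0 + e), e / (1.0 + e))

probs = positive_probability([-1000.0, 0.0, 1000.0])
```

A score of zero maps to a probability of 0.5, which matches the sign-based classification rule of the decision function.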


xs (ndarray) – array with input data


array of probability scores assigned by the model

Return type



Logarithm of probabilities given by generalized linear model.

For logistic classification problems, returns the natural logarithm of the probability that the model assigns to a True outcome for each of the data points provided.
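Taking the logarithm of a probability that has already underflowed to zero gives negative infinity, so log-probabilities are usually computed directly from the decision scores. A sketch under the standard logistic-link assumption, with a hypothetical helper name:

```python
import numpy as np

def log_positive_probability(scores):
    """log(sigmoid(z)) = -log(1 + exp(-z)), computed stably with logaddexp."""
    scores = np.asarray(scores, dtype=float)
    return -np.logaddexp(0.0, -scores)

log_probs = log_positive_probability([-1000.0, 0.0, 1000.0])
```

For a score of -1000 this returns -1000 rather than negative infinity.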


xs (ndarray) – array with input data


array of log-probability scores assigned by the model

Return type



Convert to a dictionary.

Return type

Dict[str, Any]

classmethod from_dict(cls, d)

Convert from a dictionary.

Unsupported Backends

Not supported for the following LeapYear compute backend(s): snowflake.

Return type



Convert the trained model to SHAP format.

The converted model can then be used to construct a LinearExplainer object that generates Shapley explanations for new records to which the model is applied.

Note that:

  • Model execution and generation of model score explanations are expected to be done in a production setting by an automated system with direct access to record-level information.

  • Feature explanations for categorical features are currently not supported. Consider one-hot encoding features to get the benefits of explainable model scores.


>>> import shap
>>> from leapyear import analytics as la
>>> ...
>>> glm_model = la.logreg(xs, y, ds).run()
>>> glm_explainer = shap.LinearExplainer(glm_model.to_shap(), X_reference)
>>> glm_shap_values = glm_explainer.shap_values(X_to_predict)
>>> ...

In this example:

  • The input X_reference used to initialize the explainer object is a pandas.DataFrame containing explanatory variables in the same order as used to train models. It is used to infer what model scores and feature distribution should be considered “typical”.

  • The input X_to_predict is a pandas.DataFrame capturing the explanatory variables in the same order as used to train models.
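To give a feel for what the explainer returns, the sketch below computes the textbook closed form of Shapley values for a linear model when features are treated as independent: each feature's contribution is its coefficient times its deviation from the reference mean. This is an illustration in plain NumPy, not the SHAP library's implementation, and `linear_shap_values` is a hypothetical name.

```python
import numpy as np

def linear_shap_values(coefficients, x_reference, x_to_predict):
    """phi[i, j] = beta_j * (x[i, j] - mean_j(reference))."""
    coefficients = np.asarray(coefficients, dtype=float)
    baseline = np.asarray(x_reference, dtype=float).mean(axis=0)
    return (np.asarray(x_to_predict, dtype=float) - baseline) * coefficients

coef = np.array([2.0, -1.0])
X_ref = np.array([[0.0, 0.0], [2.0, 4.0]])  # column means are [1.0, 2.0]
X_new = np.array([[3.0, 2.0]])
phi = linear_shap_values(coef, X_ref, X_new)
```

The contributions in each row sum to the difference between that record's linear score and the average score over the reference data, which is why the reference data determines what counts as "typical".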

See the SHAP Linear documentation for more information.

LeapYear has been tested with SHAP version 0.39.0. Older or newer versions are not guaranteed to work.

Return type


Tree-Based Models

class leapyear.model.RandomForestClassifier(ntrees: int, height: int, model: DecisionForest)

A representation of a trained Random Forest classification model.

Provides methods for making predictions and reporting feature importance statistics.

ntrees: int

Alias for field number 0

height: int

Alias for field number 1

model: leapyear._tidl.protocol.algorithms.decisiontree.DecisionForest

Alias for field number 2


Prediction function of the random forest classification model.

For classification problems, returns the most likely class according to the model.


xs (ndarray) – array with input data


array of most likely outcome labels assigned by the model

Return type



Prediction probability function of the random forest model.

For each of the data points provided, returns the probability that the model assigns to each possible outcome.


xs (ndarray) – array with input data


array of probability scores assigned by the model to input data points and possible outcomes

Return type



Logarithm of probabilities given by random forest model.

For each of the data points provided, returns the natural logarithm of the probability that the model assigns to each possible outcome.


xs (ndarray) – array with input data


array of log-probability scores assigned by the model to input data points and possible outcomes

Return type


property feature_importance

Relative feature importance.

Feature importances are derived based on the information collected during model training with differentially private computations, specifically:

  1. For each tree and for each split of the tree, look up the value (gain) of introducing the split, as calculated on training data during model calibration, and attribute it to the splitting feature. See the reference below for the specific calculation of split gain, which is based on a notion of Gini impurity.

  2. To compute tree-specific feature importances, sum up split gains across all splits within each tree, weighted (multiplied) by parent node size, and re-scale these tree-specific feature importances to sum up to 1 for each tree.

  3. Average feature importances across all trees in the random forest ensemble to get final feature importance.

  • Hastie, Tibshirani, Friedman. “The Elements of Statistical Learning, 2nd Edition.” 2001.
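The three steps above can be sketched as follows. The representation of a tree as a list of `(feature_index, gain, parent_node_size)` tuples is hypothetical and chosen only for illustration; the actual DecisionForest structure is not described here.

```python
from collections import defaultdict

def tree_importance(splits):
    """Steps 1-2: size-weighted gains per feature, rescaled to sum to 1."""
    totals = defaultdict(float)
    for feature, gain, parent_size in splits:
        totals[feature] += gain * parent_size
    norm = sum(totals.values())
    if norm <= 0:
        return {}  # a tree without informative splits contributes nothing
    return {feature: value / norm for feature, value in totals.items()}

def forest_importance(trees):
    """Step 3: average the per-tree importances over all trees."""
    summed = defaultdict(float)
    for splits in trees:
        for feature, value in tree_importance(splits).items():
            summed[feature] += value
    return {feature: value / len(trees) for feature, value in summed.items()}

trees = [
    [(0, 0.4, 100), (1, 0.2, 50)],  # weighted gains 40 and 10 -> 0.8 and 0.2
    [(1, 0.5, 100)],                # single split -> feature 1 gets 1.0
]
importance = forest_importance(trees)
```

With these two trees, feature 0 ends up with importance 0.4 and feature 1 with 0.6. The result has the same shape as the documented return type, a mapping from feature index to a float.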

Return type

Mapping[int, float]


Convert to a dictionary.

Return type

Dict[str, Any]

classmethod from_dict(cls, d)

Convert from a dictionary.

Unsupported Backends

Not supported for the following LeapYear compute backend(s): snowflake.

Return type



Convert the trained model to SHAP format.

The converted model can then be used to construct a TreeExplainer object that generates Shapley explanations for new records to which the model is applied.

Note that:

  • Model execution and generation of model score explanations are expected to be done in a production setting by an automated system with direct access to record-level information.

  • Feature explanations for categorical features are currently not supported. Consider one-hot encoding features to get the benefits of explainable model scores.


>>> import shap
>>> from leapyear import analytics as la
>>> ...
>>> rfc_model = la.random_forest(xs, y, ds).run()
>>> rfc_explainer = shap.TreeExplainer(rfc_model.to_shap(), X_reference)
>>> rfc_shap_values = rfc_explainer.shap_values(X_to_predict)
>>> ...

In this example:

  • The input X_reference used to initialize the explainer object is a pandas.DataFrame containing explanatory variables in the same order as used to train models. It is used to infer what model scores and feature distribution should be considered “typical”.

  • The input X_to_predict is a pandas.DataFrame capturing the explanatory variables in the same order as used to train models.

See the SHAP Tree documentation for more information.

LeapYear has been tested with SHAP version 0.39.0. Older or newer versions are not guaranteed to work.

Return type


class leapyear.model.RandomForestRegressor(ntrees: int, height: int, model: DecisionForest)

A representation of a trained Random Forest regression model.

ntrees: int

Alias for field number 0

height: int

Alias for field number 1

model: leapyear._tidl.protocol.algorithms.decisiontree.DecisionForest

Alias for field number 2


Prediction function of the random forest regression model.

For each of the data points provided, returns the prediction that the model assigns.


xs (ndarray) – array with input data


array of predictions assigned by the model to input data points

Return type



Convert to a dictionary.

Return type

Dict[str, Any]

classmethod from_dict(cls, d)

Convert from a dictionary.

Unsupported Backends

Not supported for the following LeapYear compute backend(s): snowflake.

Return type



Convert the trained model to SHAP format.

The converted model can then be used to construct a TreeExplainer object that generates Shapley explanations for new records to which the model is applied.

Note that:

  • Model execution and generation of model score explanations are expected to be done in a production setting by an automated system with direct access to record-level information.

  • Feature explanations for categorical features are currently not supported. Consider one-hot encoding features to get the benefits of explainable model scores.


>>> import shap
>>> from leapyear import analytics as la
>>> ...
>>> rf_model = la.regression_trees(xs, y, ds).run()
>>> rf_explainer = shap.TreeExplainer(rf_model.to_shap(), X_reference)
>>> rf_shap_values = rf_explainer.shap_values(X_to_predict)
>>> ...

In this example:

  • The input X_reference used to initialize the explainer object is a pandas.DataFrame containing explanatory variables in the same order as used to train models. It is used to infer what model scores and feature distribution should be considered “typical”.

  • The input X_to_predict is a pandas.DataFrame capturing the explanatory variables in the same order as used to train models.

See the SHAP Tree documentation for more information.

LeapYear has been tested with SHAP version 0.39.0. Older or newer versions are not guaranteed to work.

Return type


class leapyear.model.GradientBoostedTreeClassifier(max_depth: int, model: WeightedDecisionForest)

A representation of a trained gradient boosted tree classifier model.

This includes two named fields:

  • max_depth - the maximum depth of the individual decision trees.

  • model - a model object of class WeightedDecisionForest, including information about individual decision trees and their weights.

max_depth: int

Alias for field number 0

model: leapyear._tidl.protocol.algorithms.decisiontree.WeightedDecisionForest

Alias for field number 1


Prediction function of the gradient boosted tree classification model.

For classification problems, returns the most likely class according to the model.


xs (ndarray) – array with input data


array of most likely outcome labels assigned by the model

Return type



Prediction probability function of the GBT model.

For each of the data points provided, returns the probability that the model assigns to each possible outcome.


xs (ndarray) – array with input data


array of probability scores assigned by the model to input data points and possible outcomes

Return type



Logarithm of probabilities given by GBT model.

For each of the data points provided, returns the natural logarithm of the probability that the model assigns to each possible outcome.


xs (ndarray) – array with input data


array of log-probability scores assigned by the model to input data points and possible outcomes

Return type



Convert to a dictionary.

Return type

Dict[str, Any]

classmethod from_dict(cls, d)

Convert from a dictionary.

Unsupported Backends

Not supported for the following LeapYear compute backend(s): snowflake.

Return type



Convert the trained model to SHAP format.

The converted model can then be used to construct a TreeExplainer object that generates Shapley explanations for new records to which the model is applied.

Note that:

  • Model execution and generation of model score explanations are expected to be done in a production setting by an automated system with direct access to record-level information.

  • Feature explanations for categorical features are currently not supported. Consider one-hot encoding features to get the benefits of explainable model scores.


>>> import shap
>>> from leapyear import analytics as la
>>> ...
>>> gbt_model = la.gradient_boosted_tree_classifier(xs, y, ds).run()
>>> gbt_explainer = shap.TreeExplainer(gbt_model.to_shap(), X_reference)
>>> gbt_shap_values = gbt_explainer.shap_values(X_to_predict)
>>> ...

In this example:

  • The input X_reference used to initialize the explainer object is a pandas.DataFrame containing explanatory variables in the same order as used to train models. It is used to infer what model scores and feature distribution should be considered “typical”.

  • The input X_to_predict is a pandas.DataFrame capturing the explanatory variables in the same order as used to train models.

See the SHAP Tree documentation for more information.

LeapYear has been tested with SHAP version 0.39.0. Older or newer versions are not guaranteed to work.

Return type


Clustering Models

class leapyear.model.ClusterModel(niters: int, nclusters: int, model: _CM)

A representation of the trained K-means clustering model.

This model is generated by running a K-means clustering algorithm and contains cluster centroids (centers).

niters: int

Alias for field number 0

nclusters: int

Alias for field number 1

model: ClusterModel

Alias for field number 2

property centroids

Model centroids.

Return type

ndarray[Any, dtype[float64]]


Prediction function of the clustering model.

Returns the labels for each point in xs.
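In K-means, a point's label is conventionally the index of its nearest centroid. The sketch below assumes Euclidean distance and uses a hypothetical helper name; it illustrates the idea rather than the library's own code.

```python
import numpy as np

def nearest_centroid_labels(xs, centroids):
    """Index of the closest centroid (squared Euclidean distance) per point."""
    xs = np.asarray(xs, dtype=float)
    centroids = np.asarray(centroids, dtype=float)
    # Broadcast to shape (n_points, n_clusters, n_features).
    diffs = xs[:, None, :] - centroids[None, :, :]
    squared_distances = (diffs ** 2).sum(axis=2)
    return squared_distances.argmin(axis=1)

centroids = np.array([[0.0, 0.0], [10.0, 10.0]])
xs = np.array([[1.0, -1.0], [9.0, 8.0], [4.0, 4.0]])
cluster_labels = nearest_centroid_labels(xs, centroids)
```

The third point, (4, 4), is closer to the origin than to (10, 10), so it receives label 0.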


xs (ndarray) – A 2-dimensional array of data points.


The associated cluster labels predicted by the model.

Return type



Convert to a dictionary.

Return type

Dict[str, Any]

classmethod from_dict(cls, d)

Convert from a dictionary.

Unsupported Backends

Not supported for the following LeapYear compute backend(s): snowflake.

Return type


Model Evaluation Objects

class leapyear.model.ConfusionCurve(model: _CC)

The Confusion curve object.

This model is generated from running and contains the metrics of true positive, false positive, true negative and false negative rates for a sequence of thresholds. Other common metrics are provided as properties of this model.
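For intuition, the four rates at each threshold can be derived from true labels and model scores as in the sketch below. It is a plain, non-private computation with hypothetical names; the convention that a score at or above the threshold counts as a positive prediction is an assumption, and the data must contain both classes to avoid division by zero.

```python
import numpy as np

def confusion_rates(y_true, scores, thresholds):
    """Return TPR, FPR, TNR and FNR lists, one value per threshold."""
    y_true = np.asarray(y_true, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    rates = {"tpr": [], "fpr": [], "tnr": [], "fnr": []}
    for threshold in thresholds:
        predicted = scores >= threshold  # assumed convention
        tp = np.sum(predicted & y_true)
        fp = np.sum(predicted & ~y_true)
        tn = np.sum(~predicted & ~y_true)
        fn = np.sum(~predicted & y_true)
        rates["tpr"].append(float(tp / (tp + fn)))
        rates["fpr"].append(float(fp / (fp + tn)))
        rates["tnr"].append(float(tn / (fp + tn)))
        rates["fnr"].append(float(fn / (tp + fn)))
    return rates

rates = confusion_rates([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1], thresholds=[0.5, 0.3])
```

At every threshold, TPR and FNR sum to 1, as do FPR and TNR.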

model: leapyear._tidl.protocol.algorithms.confusioncurve.ConfusionCurve

Alias for field number 0

property df

Return a dataframe containing most of the analytics.

Return type


property thresholds


Outputs the list of thresholds used for generating the confusion curve.

Return type


property tpr

Compute true positive rates.

Outputs a list of true positive rate (sensitivity, recall) values, associated with chosen thresholds.

Aliases: tpr (true positive rate), sensitivity, recall

Return type


property sensitivity

Compute true positive rates.

Outputs a list of true positive rate (sensitivity, recall) values, associated with chosen thresholds.

Aliases: tpr (true positive rate), sensitivity, recall

Return type


property recall

Compute true positive rates.

Outputs a list of true positive rate (sensitivity, recall) values, associated with chosen thresholds.

Aliases: tpr (true positive rate), sensitivity, recall

Return type


property fpr

Compute false positive rates.

Outputs a list of false positive rate (fallout) values, associated with chosen thresholds.

Aliases: fpr (false positive rate), fallout

Return type


property fallout

Compute false positive rates.

Outputs a list of false positive rate (fallout) values, associated with chosen thresholds.

Aliases: fpr (false positive rate), fallout

Return type


property tnr

Compute true negative rates.

Outputs a list of true negative rate (specificity) values, associated with chosen thresholds.

Aliases: tnr (true negative rate), specificity

Return type


property specificity

Compute true negative rates.

Outputs a list of true negative rate (specificity) values, associated with chosen thresholds.

Aliases: tnr (true negative rate), specificity

Return type


property fnr

Compute false negative rates.

Outputs a list of false negative rate (miss rate) values, associated with chosen thresholds.

Aliases: fnr (false negative rate), missrate

Return type


property missrate

Compute false negative rates.

Outputs a list of false negative rate (miss rate) values, associated with chosen thresholds.

Aliases: fnr (false negative rate), missrate

Return type


property precision

Compute precision.

Aliases: precision, ppv (positive predictive value)

Return type


property ppv

Compute precision.

Aliases: precision, ppv (positive predictive value)

Return type


property npv

Negative predictive value.

Return type


property accuracy


Return type


property f1score


Return type


property mcc

Matthews correlation coefficient.
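The standard definition of the Matthews correlation coefficient in terms of confusion counts can be sketched as follows. The helper name is hypothetical, and returning 0.0 when the denominator vanishes is a common convention assumed here, not documented behaviour.

```python
import math

def matthews_corrcoef(tp, fp, tn, fn):
    """MCC = (tp*tn - fp*fn) / sqrt((tp+fp)(tp+fn)(tn+fp)(tn+fn))."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0  # assumed convention for a degenerate confusion matrix
    return (tp * tn - fp * fn) / denominator

mcc_value = matthews_corrcoef(tp=2, fp=1, tn=1, fn=0)
```

The value ranges from -1 (total disagreement) through 0 (no better than chance) to 1 (perfect prediction).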

Return type


property auc_roc

Area under the ROC curve.

Calculates the area under Receiver Operating Characteristic (ROC) curve.
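One common way to approximate this area from a finite set of (FPR, TPR) points is the trapezoidal rule. Whether LeapYear uses this rule is not stated here, so the sketch below is an assumption, and `auc_trapezoid` is a hypothetical name.

```python
import numpy as np

def auc_trapezoid(fpr, tpr):
    """Trapezoidal area under the curve through the given (fpr, tpr) points."""
    fpr = np.asarray(fpr, dtype=float)
    tpr = np.asarray(tpr, dtype=float)
    order = np.lexsort((tpr, fpr))  # sort by fpr, breaking ties by tpr
    x, y = fpr[order], tpr[order]
    return float(np.sum((x[1:] - x[:-1]) * (y[1:] + y[:-1]) / 2.0))

auc = auc_trapezoid(fpr=[0.0, 0.0, 0.5, 1.0], tpr=[0.0, 0.5, 1.0, 1.0])
```

The estimate depends on how densely the thresholds sample the curve; with few thresholds it can differ noticeably from the exact area.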

Return type


property auc_pr

Area under the Precision-Recall curve.

Return type


property gmeasure

The geometric mean of precision and recall.
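As a small worked example, with a hypothetical helper name:

```python
import math

def g_measure(precision, recall):
    """Geometric mean of precision and recall."""
    return math.sqrt(precision * recall)

g = g_measure(precision=0.25, recall=1.0)
```

With a precision of 0.25 and a recall of 1.0, the G-measure is 0.5.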



Return type




The Fbeta-score is the weighted harmonic mean of precision and recall.


beta (float) – Non-negative float for the relative proportion of precision and recall.


Return type

The Fbeta score
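The usual definition of the Fbeta-score can be sketched as below. The function name and the choice to return 0.0 when both precision and recall are zero are assumptions of this sketch.

```python
def fbeta_score(precision, recall, beta):
    """F_beta = (1 + beta^2) * P * R / (beta^2 * P + R)."""
    if beta < 0:
        raise ValueError("beta must be non-negative")
    denominator = beta ** 2 * precision + recall
    if denominator == 0:
        return 0.0  # assumed convention when precision and recall are both 0
    return (1 + beta ** 2) * precision * recall / denominator

f1 = fbeta_score(precision=0.5, recall=1.0, beta=1.0)
f2 = fbeta_score(precision=0.5, recall=1.0, beta=2.0)
```

With beta equal to 1 this reduces to the F1 score. Larger values of beta weight recall more heavily, which is why `f2` exceeds `f1` in this example, where recall is the higher of the two inputs.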