Module leapyear.analytics.classes

Classes related to LeapYear analyses.

Analysis Classes

This section documents classes that define and process analyses in the LeapYear system.

Main Classes

class leapyear.analytics.classes.Analysis(analysis, relation)

Any analysis that can be performed on the LeapYear server.

It contains an analysis and a relation on which the analysis should be performed.

check(*, cache=None, allow_max_budget_allocation=None, precise=None, **kwargs)

Check the analysis for errors.

If any errors are present, the function will raise a descriptive error. If no errors are found, then the function will return self.

Return type

Analysis[~_Result, ~_Model, ~_ModelMetadata]

run(*, detach: Literal[True], cache: Optional[bool] = None, allow_max_budget_allocation: Optional[bool] = None, precise: Optional[bool] = None, rich_result: bool = False, max_timeout_sec: Optional[float] = None, minimum_dataset_size: Optional[int] = None, **kwargs: Any) → leapyear.analytics.classes.AsyncAnalysis[_Result, _Model, _ModelMetadata]
run(*, rich_result: Literal[True], cache: Optional[bool] = None, allow_max_budget_allocation: Optional[bool] = None, precise: Optional[bool] = None, max_timeout_sec: Optional[float] = None, minimum_dataset_size: Optional[int] = None, **kwargs: Any) → leapyear.analytics.classes.RichResult[_Model, _ModelMetadata]
run(*, cache: Optional[bool] = None, allow_max_budget_allocation: Optional[bool] = None, precise: Optional[bool] = None, max_timeout_sec: Optional[float] = None, minimum_dataset_size: Optional[int] = None, **kwargs: Any) → _Model

Run analysis.

Parameters
  • detach – If True, the call returns immediately with an AsyncAnalysis object once the analysis has been sent to the server, and the analysis is evaluated in the background on the server. The client can check the result of the analysis later using the AsyncAnalysis object.

  • cache – If True, then the first time this analysis is executed on the LeapYear server, the result will be cached. Subsequent calls to the identical analysis (with cache=True) will fetch the cached version and not contribute to the security cost. If None, the default caching behavior will be obtained from the connection.

  • allow_max_budget_allocation – Default is True. If False, raise a leapyear.exceptions.DataSetTooSmallException when the randomness calibration system would run an analysis with the maximum privacy exposure per computation. If None, the value is obtained from the connection.

  • precise – When True, request an answer with no noise added to the computation.

  • rich_result – When True, return a result with additional (potentially analysis-specific) metadata, including the privacy exposure expended in the process of performing the analysis. Defaults to False.

  • max_timeout_sec – When detach=False, the maximum amount of time (in seconds) the user is willing to wait for a response. If None (the default), the client polls the server indefinitely. For computations on big data or long-running machine learning tasks, we recommend using detach=True together with the functions provided in AsyncAnalysis.

  • minimum_dataset_size – When minimum_dataset_size is set, prevent computations on data sets that have fewer rows than the specified value. We recommend using this when an analysis could filter down to a small number of records, potentially consuming more privacy budget than is desired. Setting this will spend a small amount of privacy budget to estimate the number of rows involved in a computation. This value is superseded by an admin-defined minimum_dataset_size parameter, if the admin’s value is larger.
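The None-handling and precedence rules for these parameters can be summarized in a few lines of plain Python. This is a sketch, not LeapYear's implementation: CONNECTION_DEFAULTS, resolve_flag, and effective_minimum_dataset_size are hypothetical names, the connection-level default for cache is invented for illustration, and applying whichever minimum is set when only one of the user or admin values exists is an assumption.

```python
from typing import Dict, Optional

# Hypothetical connection-level defaults. The real defaults come from the
# LeapYear connection; the value for "cache" here is invented for illustration.
CONNECTION_DEFAULTS: Dict[str, bool] = {"cache": False, "allow_max_budget_allocation": True}


def resolve_flag(name: str, value: Optional[bool]) -> bool:
    """Use the explicit value if given; otherwise fall back to the connection default."""
    return CONNECTION_DEFAULTS[name] if value is None else value


def effective_minimum_dataset_size(user_value: Optional[int],
                                   admin_value: Optional[int]) -> Optional[int]:
    """The admin-defined minimum supersedes the user's value when it is larger.

    Assumption: if only one of the two values is set, that one applies.
    """
    candidates = [v for v in (user_value, admin_value) if v is not None]
    return max(candidates) if candidates else None


print(resolve_flag("allow_max_budget_allocation", None))  # True
print(effective_minimum_dataset_size(100, 500))  # 500
print(effective_minimum_dataset_size(1000, 500))  # 1000
```

Taking the maximum means a user can tighten, but never loosen, the admin's threshold.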

Returns

The result of the analysis. Multiple return types are possible.

Return type

Union[_Model, AsyncAnalysis, RichResult[_Model, _ModelMetadata]]

maximum_privacy_exposure(minimum_dataset_size=None)

Maximum privacy exposure associated with running this analysis.

Estimate the maximum incremental privacy exposure that could result from running this computation for the current user. The result is expressed, for each data source, as a fraction of that source's privacy exposure limit.

Note: Estimating maximum privacy exposure may incur a small amount of privacy exposure.

Parameters

minimum_dataset_size (Optional[int]) – When minimum_dataset_size is set, prevent computations on data sets that have fewer rows than the specified value.

Returns

A dictionary that maps a TableIdentifier to the estimated maximum fractional privacy exposure that running the analysis would incur for the associated table.

Return type

FractionalPrivacyExposure
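One way to use this estimate is to screen an analysis before calling run(). The sketch below works on a plain dictionary standing in for FractionalPrivacyExposure; the string table names, the 0.10 threshold, and the helper tables_over_threshold are hypothetical, and in real use the dictionary would come from maximum_privacy_exposure() rather than being written by hand.

```python
from typing import Dict, List

# Stand-in for FractionalPrivacyExposure: plain strings replace the
# TableIdentifier keys used by LeapYear (an assumption for illustration).
FractionalExposure = Dict[str, float]


def tables_over_threshold(exposure: FractionalExposure, threshold: float) -> List[str]:
    """Return tables whose estimated maximum fractional exposure exceeds threshold, largest first."""
    over = [(table, frac) for table, frac in exposure.items() if frac > threshold]
    over.sort(key=lambda item: item[1], reverse=True)
    return [table for table, _ in over]


estimate = {"db.customers": 0.02, "db.transactions": 0.15}
print(tables_over_threshold(estimate, 0.10))  # ['db.transactions']
```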

class leapyear.analytics.classes.AsyncAnalysis(async_job_id, *, analysis, rich_result)

Asynchronous job for running analysis queries.

check_status()

Check the status of the given asynchronous job.

Return type

AsyncJobStatus

cancel()

Cancel the job.

wait_to_cancel(**kwargs)

Wait for the given asynchronous job to finish.

Same as ‘wait’, but doesn’t error on cancellations. Takes the same arguments as ‘wait’.

Return type

None
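The waiting behavior described for AsyncAnalysis (and for max_timeout_sec in run()) amounts to a polling loop. The following is a generic sketch, not LeapYear's code: the status strings, the JobCancelled exception, the poll_interval_sec parameter, and the standalone wait and wait_to_cancel functions are hypothetical, and the real AsyncJobStatus values are not listed in this document.

```python
import time
from typing import Any, Callable, Optional

# Hypothetical terminal status strings; the real AsyncJobStatus values differ.
TERMINAL = {"finished", "failed", "cancelled"}


class JobCancelled(Exception):
    """Raised by wait() when the job was cancelled (hypothetical exception)."""


def wait(check_status: Callable[[], str], max_timeout_sec: Optional[float] = None,
         poll_interval_sec: float = 0.5, error_on_cancel: bool = True) -> str:
    """Poll check_status until a terminal status is seen; None means wait forever."""
    deadline = None if max_timeout_sec is None else time.monotonic() + max_timeout_sec
    while True:
        status = check_status()
        if status in TERMINAL:
            if status == "cancelled" and error_on_cancel:
                raise JobCancelled("job was cancelled")
            return status
        if deadline is not None and time.monotonic() >= deadline:
            raise TimeoutError(f"job still {status!r} after {max_timeout_sec} s")
        time.sleep(poll_interval_sec)


def wait_to_cancel(check_status: Callable[[], str], **kwargs: Any) -> None:
    """Same as wait(), but a cancelled job is not treated as an error."""
    wait(check_status, error_on_cancel=False, **kwargs)
```

For example, with statuses = iter(["running", "cancelled"]), calling wait_to_cancel(lambda: next(statuses), poll_interval_sec=0.01) returns None, whereas wait() on the same sequence would raise JobCancelled.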

process_result(result)

Process the result of running an analysis.

Return type

Union[~_Model, RichResult[~_Model, ~_ModelMetadata]]

serialize()

Serialize an external analysis to a string.

Return type

str

Analysis Subclasses

This section describes the subclasses of Analysis, generally determined by what type of output they produce.

class leapyear.analytics.classes.BoundsAnalysis(analysis, relation)

Analysis that results in a lower and upper bound.

class leapyear.analytics.classes.ClusteringAnalysis(analysis, relation)

Analysis that results in a clustering model.

class leapyear.analytics.classes.ConfusionModelAnalysis(analysis, relation)

Analysis that results in a ConfusionCurve object.

class leapyear.analytics.classes.CountAnalysis(analysis, relation)

Analysis that results in a scalar count value.

class leapyear.analytics.classes.CountAnalysisWithRI(analysis, relation)

Analysis that computes a scalar count value.

The user can request additional information about the computation with run(rich_result=True). A RandomizationInterval object will be generated.

class leapyear.analytics.classes.CrossValidationAnalysis(analysis, relation)

Analysis that produces multiple results of the same type.

class leapyear.analytics.classes.DescribeAnalysis(analysis, relation)

Analysis that produces a model describing a dataset.

class leapyear.analytics.classes.FailAnalysis(analysis, relation)

An analysis that always fails.

class leapyear.analytics.classes.ForestModelClassifierAnalysis(analysis, relation)

Analysis that results in a forest model.

class leapyear.analytics.classes.ForestModelRegressionAnalysis(analysis, relation)

Analysis that results in a forest model.

class leapyear.analytics.classes.GradientBoostedTreeClassifierModelAnalysis(analysis, relation)

Gradient boosted tree classifier analysis.

When executed, this analysis returns a model object of class GradientBoostedTreeClassifier.

class leapyear.analytics.classes.GroupbyAggAnalysis(analysis, relation)

Analysis that results in a GroupbyAgg.

class leapyear.analytics.classes.GenLinAnalysis(analysis, relation)

Analysis that results in a generalized linear model.

class leapyear.analytics.classes.Histogram2DAnalysis(analysis, relation)

Analysis that results in a 2D histogram.

class leapyear.analytics.classes.HistogramAnalysis(analysis, relation)

Analysis that results in a histogram.

class leapyear.analytics.classes.HyperOptAnalysis(analysis, relation)

Analysis that returns the result of hyperparameter optimization.

class leapyear.analytics.classes.MatrixAnalysis(analysis, relation)

Analysis that results in a matrix of float values.

class leapyear.analytics.classes.ScalarAnalysis(analysis, relation)

Analysis that results in a scalar float value.

class leapyear.analytics.classes.ScalarAnalysisWithRI(analysis, relation)

Analysis that computes a scalar value.

The user can request additional information about the computation with run(rich_result=True). In this case, a RandomizationInterval object will be generated.

This interval is likely to include the exact value of the computation on the data sample.

class leapyear.analytics.classes.ScalarFromHistogramAnalysis(f, *args)

Analysis that uses a histogram to compute a scalar value.
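ScalarFromHistogramAnalysis takes a function f together with further arguments. As an illustration of the kind of function that derives a scalar from a histogram, the hypothetical mean_from_histogram below approximates a mean from bin edges and counts. The edges/counts representation is an assumption; this document does not describe how LeapYear represents histograms or which functions it passes as f.

```python
from typing import Sequence


def mean_from_histogram(edges: Sequence[float], counts: Sequence[float]) -> float:
    """Approximate the mean by placing each bin's count at the bin midpoint."""
    if len(edges) != len(counts) + 1:
        raise ValueError("need exactly one more edge than counts")
    total = sum(counts)
    if total <= 0:
        raise ValueError("histogram is empty")
    midpoints = [(lo + hi) / 2 for lo, hi in zip(edges[:-1], edges[1:])]
    return sum(m * c for m, c in zip(midpoints, counts)) / total


print(mean_from_histogram([0, 10, 20, 30], [5, 10, 5]))  # 15.0
```

The midpoint rule trades accuracy for simplicity: values within a bin are treated as if they all sat at its center.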

class leapyear.analytics.classes.SleepAnalysis(analysis, relation)

An analysis that will sleep for a set number of microseconds.

class leapyear.analytics.classes.TypeAnalysis(analysis, relation)

Analysis that results in a list of counts associated with types.

Rich Results

This section documents classes related to rich results and privacy exposure measurements.

class leapyear.analytics.classes.RandomizationInterval(estimation_method: str, confidence_level: float, low: float, high: float)

An interval estimating the uncertainty in the exact answer, given randomized output.

A RandomizationInterval can be generated for a subset of the analyses offered by LeapYear by running an analysis with run(rich_result=True).

The interval between low and high is expected to include the exact value of the computation on the data sample with the stated confidence_level (e.g. 95%).

Parameters
  • confidence_level (float) – The confidence that the exact answer lies within the interval.

  • low (float) – The lower bound of the interval.

  • high (float) – The upper bound of the interval.

  • estimation_method (str) –

    The method used to compute the interval depends on the analysis type.

    With the 'bayesian' method, the randomization interval is obtained from a posterior distribution analysis based on a non-informative prior, the randomized output, and the scale of the randomization applied.

    With the 'approximate' method, the randomization interval is estimated using a simplified simulation process.

    Note

    The 'approximate' estimation method tends to produce biased intervals for small data samples.
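To make the two estimation methods concrete, here is a toy sketch that assumes the randomization is additive Laplace noise with a known scale b. That noise model is an assumption for illustration; this document does not state which distribution LeapYear uses. Under a flat (non-informative) prior, the posterior of the exact value given the observed output y is Laplace(y, b), so a central interval at confidence c has half-width -b·ln(1 - c). The simulation variant draws noise samples and takes empirical quantiles. The Interval class and both functions are hypothetical stand-ins, not the library's code, and this sketch does not attempt to reproduce the small-sample behavior described in the Note.

```python
import math
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Stand-in mirroring RandomizationInterval's documented fields."""
    estimation_method: str
    confidence_level: float
    low: float
    high: float


def bayesian_interval(observed: float, noise_scale: float, confidence: float) -> Interval:
    """Flat prior + Laplace(0, b) noise: the posterior of the exact value is Laplace(observed, b)."""
    half_width = -noise_scale * math.log(1.0 - confidence)
    return Interval("bayesian", confidence, observed - half_width, observed + half_width)


def approximate_interval(observed: float, noise_scale: float, confidence: float,
                         n_draws: int = 20_000, seed: int = 0) -> Interval:
    """Simulate Laplace noise draws and take empirical quantiles of observed - noise."""
    rng = random.Random(seed)
    rate = 1.0 / noise_scale
    # The difference of two i.i.d. exponentials with rate 1/b is Laplace(0, b).
    draws = sorted(observed - (rng.expovariate(rate) - rng.expovariate(rate))
                   for _ in range(n_draws))
    alpha = 1.0 - confidence
    lo_idx = int(math.floor(alpha / 2 * n_draws))
    hi_idx = int(math.ceil((1 - alpha / 2) * n_draws)) - 1
    return Interval("approximate", confidence, draws[lo_idx], draws[hi_idx])
```

With b = 1 and c = 0.95, the Bayesian half-width is ln(20) ≈ 3.0, and the simulated interval should land close to it.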

class leapyear.analytics.classes.FractionalPrivacyExposure

A fractional measure of privacy exposure for a collection of tables.

This is represented by a dictionary, mapping a TableIdentifier to the fraction of total privacy exposure that has been expended for the table corresponding to the TableIdentifier.

Aggregate Results

This section documents classes related to aggregate results from groupBy operations.

class leapyear.analytics.classes.GroupbyAgg(aggregate_type: Sequence[_ml.GroupbyAgg], key_columns: Sequence[Attribute], aggs: Mapping[Tuple[Any, ...], float])

A GroupBy Aggregate.

aggregate_type: Sequence[leapyear._tidl.protocol.query.select.machinelearning.GroupbyAgg]

Alias for field number 0

key_columns: Sequence[leapyear.dataset.attribute.Attribute]

Alias for field number 1

aggs: Mapping[Tuple[Any, ...], float]

Alias for field number 2

to_dataframe(groups_as_index=True)

Convert to a pandas DataFrame.

Parameters

groups_as_index (bool, optional) – Whether the groupBy columns should be the MultiIndex of the resulting DataFrame. If False, then the groupBy columns are made into columns of the output. By default True.

Returns

A pandas DataFrame, with a multi-index corresponding to the key columns of the groupBy operation if groups_as_index = True.

Return type

pd.DataFrame
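Because GroupbyAgg is a named tuple whose aggs field maps a tuple of group-key values to a float, the groups_as_index=False layout can be sketched without pandas. GroupbyAggSketch, to_rows, value_name, and the example data are hypothetical; plain strings stand in for the real aggregate_type enum and Attribute objects, and this is not how the library implements to_dataframe.

```python
from typing import Any, Dict, List, Mapping, NamedTuple, Sequence, Tuple


class GroupbyAggSketch(NamedTuple):
    """Hypothetical stand-in with the same three fields as GroupbyAgg."""
    aggregate_type: Sequence[str]           # the real type is a protocol enum
    key_columns: Sequence[str]              # the real type is Sequence[Attribute]
    aggs: Mapping[Tuple[Any, ...], float]   # group-key tuple -> aggregate value


def to_rows(result: GroupbyAggSketch, value_name: str = "value") -> List[Dict[str, Any]]:
    """Flatten aggs into one dict per group, with keys as ordinary columns."""
    rows = []
    for key, value in result.aggs.items():
        row = dict(zip(result.key_columns, key))
        row[value_name] = value
        rows.append(row)
    return rows


agg = GroupbyAggSketch(["mean"], ["region", "year"],
                       {("east", 2020): 12.5, ("west", 2020): 9.0})
print(to_rows(agg))
# [{'region': 'east', 'year': 2020, 'value': 12.5}, {'region': 'west', 'year': 2020, 'value': 9.0}]
```

With pandas, pd.DataFrame(rows) gives a flat layout comparable to groups_as_index=False, and calling .set_index(["region", "year"]) on it gives a MultiIndex layout comparable to the default.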