Module leapyear.analytics.classes
Classes related to LeapYear analyses.
Analysis Classes
This section documents classes that define and process analyses in the LeapYear system.
Main Classes
-
class
leapyear.analytics.classes.
Analysis
(analysis, relation)¶ Any analysis that can be performed on the LeapYear server.
It contains an analysis and a relation on which the analysis should be performed.
-
check
(*, cache=None, allow_max_budget_allocation=None, precise=None, **kwargs)¶ Check the analysis for errors.
If any errors are present, the function will raise a descriptive error. If no errors are found, then the function will return self.
- Return type
Analysis[_Result, _Model, _ModelMetadata]
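Because check returns self when no errors are found, it can be chained in front of another call. The sketch below illustrates only that chaining pattern. StubAnalysis is a hypothetical stand-in, not the LeapYear Analysis class, and ValueError is a placeholder for whatever descriptive error the library actually raises.

```python
class StubAnalysis:
    """Hypothetical stand-in for Analysis; not the LeapYear implementation."""

    def __init__(self, valid: bool) -> None:
        self.valid = valid

    def check(self) -> "StubAnalysis":
        # Mirror the documented contract: raise on error, otherwise return self.
        if not self.valid:
            raise ValueError("analysis is invalid")  # placeholder error type
        return self

    def run(self) -> float:
        return 42.0  # placeholder result


# check() hands back the same object, so the calls can be chained.
analysis = StubAnalysis(valid=True)
result = analysis.check().run()
```

An invalid analysis fails at check, before any run call is reached.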
-
run
(*, detach: Literal[True], cache: Optional[bool] = None, allow_max_budget_allocation: Optional[bool] = None, precise: Optional[bool] = None, rich_result: bool = False, max_timeout_sec: Optional[float] = None, minimum_dataset_size: Optional[int] = None, **kwargs: Any) → leapyear.analytics.classes.AsyncAnalysis[_Result, _Model, _ModelMetadata]¶
-
run
(*, rich_result: Literal[True], cache: Optional[bool] = None, allow_max_budget_allocation: Optional[bool] = None, precise: Optional[bool] = None, max_timeout_sec: Optional[float] = None, minimum_dataset_size: Optional[int] = None, **kwargs: Any) → leapyear.analytics.classes.RichResult[_Model, _ModelMetadata]
-
run
(*, cache: Optional[bool] = None, allow_max_budget_allocation: Optional[bool] = None, precise: Optional[bool] = None, max_timeout_sec: Optional[float] = None, minimum_dataset_size: Optional[int] = None, **kwargs: Any) → _Model
Run analysis.
- Parameters
detach – If True, the analysis is sent to the server and the call returns immediately with an AsyncAnalysis object. The analysis is evaluated in the background on the server, and the client can check its result later using the AsyncAnalysis object.
cache – If True, the result is cached the first time this analysis is executed on the LeapYear server. Subsequent calls to the identical analysis (with cache=True) fetch the cached version and do not contribute to the security cost. If None, the default caching behavior is obtained from the connection.
allow_max_budget_allocation – Default is True. If False, raise a leapyear.exceptions.DataSetTooSmallException when the randomness calibration system would run an analysis with the maximum privacy exposure per computation. If None, the default value is obtained from the connection.
precise – When True, request an answer with no noise added to the computation.
rich_result – When True, return a result with additional (potentially analysis-specific) metadata, including the privacy exposure expended in the process of performing the analysis. Defaults to False.
max_timeout_sec – When detach=False, specifies the maximum amount of time (in seconds) the user is willing to wait for a response. If set to None, the analysis polls the server indefinitely. For computations on big data or long-running machine learning tasks, we recommend using detach=True together with the functions provided in AsyncAnalysis. Defaults to waiting forever.
minimum_dataset_size – When set, prevent computations on data sets that have fewer rows than the specified value. We recommend using this when an analysis could filter down to a small number of records, potentially consuming more privacy budget than is desired. Setting this spends a small amount of privacy budget to estimate the number of rows involved in a computation. This value is superseded by an admin-defined minimum_dataset_size parameter if the admin’s value is larger.
- Returns
The result of the analysis. Multiple return types are possible.
- Return type
Union[_Model, AsyncAnalysis, RichResult[_Model, _ModelMetadata]]
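Since run can return three different kinds of object, calling code may need to branch on what came back. The following is a minimal sketch of that dispatch using stand-ins only: stub_run, StubAsyncAnalysis, StubRichResult and its field names are hypothetical and do not come from the LeapYear library. The one behavior taken from the signatures above is that detach=True yields an AsyncAnalysis even when rich_result is also set.

```python
from dataclasses import dataclass, field
from typing import Union


@dataclass
class StubAsyncAnalysis:
    """Hypothetical stand-in for AsyncAnalysis."""
    job_id: int


@dataclass
class StubRichResult:
    """Hypothetical stand-in for RichResult; field names are assumptions."""
    model: float
    metadata: dict = field(default_factory=dict)


def stub_run(*, detach: bool = False, rich_result: bool = False
             ) -> Union[float, StubAsyncAnalysis, StubRichResult]:
    # detach=True returns a job handle immediately, whatever rich_result is.
    if detach:
        return StubAsyncAnalysis(job_id=1)
    # rich_result=True wraps the result together with extra metadata.
    if rich_result:
        return StubRichResult(model=3.0)
    # Otherwise the bare result comes back.
    return 3.0


def describe(outcome: Union[float, StubAsyncAnalysis, StubRichResult]) -> str:
    """Branch on the concrete type returned by stub_run."""
    if isinstance(outcome, StubAsyncAnalysis):
        return "async"
    if isinstance(outcome, StubRichResult):
        return "rich"
    return "plain"
```

A static type checker can resolve the same branching at the call site when the real overloads are used with literal True arguments.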
-
maximum_privacy_exposure
(minimum_dataset_size=None)¶ Maximum privacy exposure associated with running this analysis.
Estimate the maximum incremental privacy exposure that could result from running this computation for the current user. The result is represented as a percentage of the privacy exposure limit for each data source.
Note: Estimating maximum privacy exposure may incur a small amount of privacy exposure.
- Parameters
minimum_dataset_size (Optional[int]) – When minimum_dataset_size is set, prevent computations on data sets that have fewer rows than the specified value.
- Returns
A dictionary that maps a TableIdentifier to the estimated maximum fractional privacy exposure that running the analysis would incur for the associated table.
- Return type
-
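One way to use such an estimate is as a guard before running an analysis: compare each table's value with a threshold chosen by the caller. The sketch below works on a plain dictionary with made-up numbers. The string keys stand in for TableIdentifier objects, tables_over_threshold is a hypothetical helper, and the values are assumed to be in the same unit as the threshold.

```python
from typing import Dict, List


def tables_over_threshold(exposure: Dict[str, float], limit: float) -> List[str]:
    """Return the tables whose estimated exposure exceeds `limit`.

    Keys are plain strings standing in for TableIdentifier objects, and the
    values are assumed to use the same unit as `limit`.
    """
    return sorted(table for table, value in exposure.items() if value > limit)


# Made-up estimate, shaped like the mapping described above.
estimate = {"db.sales": 0.5, "db.users": 2.5}
too_costly = tables_over_threshold(estimate, limit=1.0)
```

If the returned list is non-empty, the caller might narrow the query or raise minimum_dataset_size before running the analysis.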
-
class
leapyear.analytics.classes.
AsyncAnalysis
(async_job_id, *, analysis, rich_result)¶ Asynchronous job for running analysis queries.
-
check_status
()¶ Check the status of the given asynchronous job.
- Return type
-
cancel
()¶ Cancel the job.
-
wait_to_cancel
(**kwargs)¶ Wait for the given asynchronous job to finish.
Same as wait, but does not raise an error if the job is cancelled. Takes the same arguments as wait.
- Return type
-
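A detached job is typically followed up by polling its status until it finishes or a deadline passes. The sketch below shows that loop with a stand-in object. StubJob, the poll helper, and the status strings "running", "done" and "timed out" are all invented for illustration, because the return type of check_status is not specified here.

```python
import time
from typing import Optional


class StubJob:
    """Hypothetical stand-in for AsyncAnalysis; status strings are invented."""

    def __init__(self, polls_until_done: int) -> None:
        self._remaining = polls_until_done

    def check_status(self) -> str:
        # Report "running" a fixed number of times, then "done".
        if self._remaining > 0:
            self._remaining -= 1
            return "running"
        return "done"


def poll(job: StubJob, interval_sec: float, max_timeout_sec: Optional[float]) -> str:
    """Poll until the job stops running or the deadline passes."""
    deadline = None if max_timeout_sec is None else time.monotonic() + max_timeout_sec
    while True:
        status = job.check_status()
        if status != "running":
            return status
        if deadline is not None and time.monotonic() >= deadline:
            return "timed out"
        time.sleep(interval_sec)


final_status = poll(StubJob(polls_until_done=3), interval_sec=0.001, max_timeout_sec=5.0)
```

Passing max_timeout_sec=None makes the loop wait indefinitely, which mirrors the default described for run.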
Analysis Subclasses
This section describes the subclasses of Analysis, generally determined by what type of output they produce.
-
class
leapyear.analytics.classes.
BoundsAnalysis
(analysis, relation)¶ Analysis that results in a lower and upper bound.
-
class
leapyear.analytics.classes.
ClusteringAnalysis
(analysis, relation)¶ Analysis that results in a clustering model.
-
class
leapyear.analytics.classes.
ConfusionModelAnalysis
(analysis, relation)¶ Analysis that results in a ConfusionCurve object.
-
class
leapyear.analytics.classes.
CountAnalysis
(analysis, relation)¶ Analysis that results in a scalar count value.
-
class
leapyear.analytics.classes.
CountAnalysisWithRI
(analysis, relation)¶ Analysis that computes a scalar count value.
The user can request additional information about the computation with run(rich_result=True). A RandomizationInterval object will be generated.
-
class
leapyear.analytics.classes.
CrossValidationAnalysis
(analysis, relation)¶ Analysis that results in multiple results of the same type.
-
class
leapyear.analytics.classes.
DescribeAnalysis
(analysis, relation)¶ Analysis that produces a model describing a dataset.
-
class
leapyear.analytics.classes.
FailAnalysis
(analysis, relation)¶ An analysis that always fails.
-
class
leapyear.analytics.classes.
ForestModelClassifierAnalysis
(analysis, relation)¶ Analysis that results in a forest model.
-
class
leapyear.analytics.classes.
ForestModelRegressionAnalysis
(analysis, relation)¶ Analysis that results in a forest model.
-
class
leapyear.analytics.classes.
GradientBoostedTreeClassifierModelAnalysis
(analysis, relation)¶ Gradient boosted tree classifier analysis.
When executed, this analysis returns a model object of class GradientBoostedTreeClassifier.
-
class
leapyear.analytics.classes.
GroupbyAggAnalysis
(analysis, relation)¶ Analysis that results in a GroupbyAgg.
-
class
leapyear.analytics.classes.
GenLinAnalysis
(analysis, relation)¶ Analysis that results in a generalized linear model.
-
class
leapyear.analytics.classes.
Histogram2DAnalysis
(analysis, relation)¶ Analysis that results in a 2D histogram.
-
class
leapyear.analytics.classes.
HistogramAnalysis
(analysis, relation)¶ Analysis that results in a histogram.
-
class
leapyear.analytics.classes.
HyperOptAnalysis
(analysis, relation)¶ Analysis that returns the result of hyperparameter optimization.
-
class
leapyear.analytics.classes.
MatrixAnalysis
(analysis, relation)¶ Analysis that results in a matrix of float values.
-
class
leapyear.analytics.classes.
ScalarAnalysis
(analysis, relation)¶ Analysis that results in a scalar float value.
-
class
leapyear.analytics.classes.
ScalarAnalysisWithRI
(analysis, relation)¶ Analysis that computes a scalar value.
The user can request additional information about the computation with run(rich_result=True). In this case, a RandomizationInterval object will be generated. This interval is likely to include the exact value of the computation on the data sample.
-
class
leapyear.analytics.classes.
ScalarFromHistogramAnalysis
(f, *args)¶ Analysis that uses a histogram to compute a scalar value.
-
class
leapyear.analytics.classes.
SleepAnalysis
(analysis, relation)¶ An analysis that will sleep for a set number of microseconds.
-
class
leapyear.analytics.classes.
TypeAnalysis
(analysis, relation)¶ Analysis that results in a list of counts associated with types.
Rich Results
This section documents classes related to rich results and privacy exposure measurements.
-
class
leapyear.analytics.classes.
RandomizationInterval
(estimation_method: str, confidence_level: float, low: float, high: float)¶ An interval estimating the uncertainty in the exact answer, given randomized output.
A RandomizationInterval can be generated for a subset of the analyses offered by LeapYear by running an analysis with run(rich_result=True).
The interval between low and high is expected to include the exact value of the computation on the data sample with the stated confidence_level (e.g. 95%).
- Parameters
confidence_level (float) – The confidence that the exact answer lies within the interval.
low (float) – The lower bound of the interval.
high (float) – The upper bound of the interval.
estimation_method (str) – The method used to compute the interval, which depends on the analysis type.
With the 'bayesian' method, the randomization interval is obtained using a posterior distribution analysis based on a non-informative prior, the knowledge of the randomized output, and the scale of the randomization effect applied.
With the 'approximate' method, the randomization interval is estimated using a simplified simulation process.
Note
The 'approximate' estimation method tends to produce biased intervals for small data samples.
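To make the four fields concrete, the sketch below defines a stand-in tuple with the same field names and two small helpers that a caller might write. IntervalSketch, width and contains are hypothetical names, and the numbers are made up. This is not the library's RandomizationInterval class.

```python
from typing import NamedTuple


class IntervalSketch(NamedTuple):
    """Stand-in with the same four fields as RandomizationInterval."""
    estimation_method: str
    confidence_level: float
    low: float
    high: float


def width(interval: IntervalSketch) -> float:
    """Size of the interval; a wider interval means more uncertainty."""
    return interval.high - interval.low


def contains(interval: IntervalSketch, value: float) -> bool:
    """Whether a candidate exact value falls inside the interval."""
    return interval.low <= value <= interval.high


# Made-up example: a 95% interval from 98 to 104.
ri = IntervalSketch("bayesian", 0.95, low=98.0, high=104.0)
interval_width = width(ri)
```

A narrow interval at a high confidence_level suggests the randomized output is close to the exact value on the data sample.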
-
class
leapyear.analytics.classes.
FractionalPrivacyExposure
¶ A fractional measure of privacy exposure for a collection of tables.
This is represented by a dictionary, mapping a TableIdentifier to the fraction of total privacy exposure that has been expended for the table corresponding to the TableIdentifier.
Aggregate Results
This section documents classes related to aggregate results from group by operations.
-
class
leapyear.analytics.classes.
GroupbyAgg
(aggregate_type: Sequence[_ml.GroupbyAgg], key_columns: Sequence[Attribute], aggs: Mapping[Tuple[Any, …], float])¶ A GroupBy Aggregate.
-
aggregate_type
: Sequence[leapyear._tidl.protocol.query.select.machinelearning.GroupbyAgg]¶ Alias for field number 0
-
key_columns
: Sequence[leapyear.dataset.attribute.Attribute]¶ Alias for field number 1
-
to_dataframe
(groups_as_index=True)¶ Convert to a pandas DataFrame.
- Parameters
groups_as_index (bool, optional) – Whether the groupBy columns should be the MultiIndex of the resulting DataFrame. If False, then the groupBy columns are made into columns of the output. By default True.
- Returns
A pandas DataFrame, with a multi-index corresponding to the key columns of the groupBy operation if groups_as_index = True.
- Return type
pd.DataFrame
-
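The aggs field maps a tuple of group-key values to a float, so the flat layout (the groups_as_index=False case) amounts to one record per group with the key values spread into columns. The standard-library sketch below shows only that data shape. flatten_groups, the plain string column names and the sample data are hypothetical, and this is not how to_dataframe is implemented.

```python
from typing import Any, Dict, List, Mapping, Sequence, Tuple


def flatten_groups(key_columns: Sequence[str],
                   aggs: Mapping[Tuple[Any, ...], float],
                   value_name: str = "value") -> List[Dict[str, Any]]:
    """Turn {group-key tuple: aggregate} into one flat record per group."""
    rows = []
    for key, value in aggs.items():
        row = dict(zip(key_columns, key))  # one column per groupBy key
        row[value_name] = value            # the aggregate for this group
        rows.append(row)
    return rows


# Made-up aggregate keyed by (state, year).
rows = flatten_groups(["state", "year"], {("CA", 2020): 10.0, ("NY", 2020): 7.5})
```

Records in this shape can be passed to the pandas DataFrame constructor, and calling set_index on the key columns then gives the multi-index layout.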