Module leapyear.functions¶
Functions for Attributes.
Datetime functions¶
Time functions for Attributes.
-
leapyear.functions.time.
add_months
(start_date, num_months)¶ Return the date that is num_months after start_date.
NOTE: This function can take a datetime as input, but produces a date output regardless.
Examples
>>> str(dt) '2004-02-29 23:59:59' >>> add_months(dt, 1) '2004-03-29' >>> add_months(dt, 13) '2005-03-01'
- Return type
Attribute
-
leapyear.functions.time.
date_add
(start, days)¶ Return the date that is days days after start.
NOTE: This function can take a datetime as input, but produces a date output regardless.
Examples
>>> str(dt) '2004-02-28 23:59:59' >>> date_add(dt, 1) '2004-02-29' >>> date_add(dt, 2) '2004-03-01'
- Return type
Attribute
-
leapyear.functions.time.
date_sub
(start, days)¶ Return the date that is days days before start.
NOTE: This function can take a datetime as input, but produces a date output regardless.
Examples
>>> str(dt) '2004-03-01 23:59:59' >>> date_sub(dt, 1) '2004-02-29' >>> date_sub(dt, 2) '2004-02-29'
- Return type
Attribute
-
leapyear.functions.time.
datediff
(end, start)¶ Return the number of days from start to end.
Examples
1. Create date diff column ‘date_sub’ that is a difference between datetime column ‘col1’ and datetime column ‘col2’ in ds1:
>>> import leapyear.functions as f >>> ds2 = ds1.with_attributes({'date_sub':f.time.datediff(f.col('col1'),f.col('col2'))})
- Return type
Attribute
-
leapyear.functions.time.
dayofmonth
(e)¶ Extract the day of the month as an integer from a given date/datetime attribute.
- Return type
Attribute
-
leapyear.functions.time.
dayofyear
(e)¶ Extract the day of the year as an integer from a given date/datetime attribute.
- Return type
Attribute
-
leapyear.functions.time.
hour
(e)¶ Extract the hours as an integer from a given date/datetime attribute.
- Return type
Attribute
-
leapyear.functions.time.
last_day
(e)¶ Return the last day of the month which the given date belongs to.
NOTE: This function can take a datetime as input, but produces a date output regardless.
Examples
>>> str(dt) '2004-02-01 23:59:59' >>> last_day(dt) '2004-02-29'
- Return type
Attribute
-
leapyear.functions.time.
minute
(e)¶ Extract the minutes as an integer from a given date/datetime attribute.
- Return type
Attribute
-
leapyear.functions.time.
month
(e)¶ Extract the month as an integer from a given date/datetime attribute.
- Return type
Attribute
-
leapyear.functions.time.
months_between
(date1, date2)¶ Return integer number of months between dates date1 and date2.
This is based on both the day and the month, not the number of days between the dates. Note in the last line of the examples that there is 1 month between
dateC
anddateB
even though there are only 29 days between them. The result is negative ifdate1
is >=1 month beforedate2
. The number of months is always rounded down to the nearest integer.Examples
>>> str(dateA) '2004-01-01' >>> str(dateB) '2004-02-01' >>> str(dateC) '2004-03-02' >>> months_between(dateA, dateB) -1 >>> months_between(dateB, dateA) 1 >>> months_between(dateC, dateB) 1
- Return type
Attribute
-
leapyear.functions.time.
next_day
(date, day_of_week)¶ Return the next date that falls on the specified day of the week.
NOTE: This function can take a datetime as input, but produces a date output regardless.
Examples
>>> str(dt) '2004-02-23 23:59:59' >>> next_day(dt, "Sunday") '2004-02-29'
- Unsupported Backends
Not supported for the following LeapYear compute backend(s): snowflake.
- Return type
Attribute
-
leapyear.functions.time.
quarter
(e)¶ Extract the quarter as an integer from a given date/datetime attribute.
- Return type
Attribute
-
leapyear.functions.time.
second
(e)¶ Extract the seconds as an integer from a given date/datetime attribute.
- Return type
Attribute
-
leapyear.functions.time.
to_date
(e)¶ Convert the column into Date type.
Examples
Create truncated column ‘date_trunc’ from a datetime column ‘col1’ in ds1:
>>> import leapyear.functions as f >>> ds2 = ds1.with_attributes({'date_trunc':f.time.to_date(f.col('col1'))})
- Return type
Attribute
-
leapyear.functions.time.
to_datetime
(ts)¶ Convert the column into Datetime type.
- Return type
Attribute
-
leapyear.functions.time.
trunc
(date, fmt)¶ Return date truncated to the unit specified by the format.
fmt can be one of: “year”, “month”, “day”
NOTE: This function can take a datetime as input, but produces a date output regardless.
Examples
>>> str(dt) '2004-02-29 23:59:59' >>> trunc(dt, "month") '2004-02-01'
- Return type
Attribute
-
leapyear.functions.time.
weekofyear
(e)¶ Extract the week number as an integer from a given date/datetime attribute.
Week numbers range from 1 to (up to) 53, and new weeks start on Monday. Dates at the begining / end of a year may be considered to be part of a week from the previous / next year. Full documentation of week numbering can be found here.
- Return type
Attribute
-
leapyear.functions.time.
year
(e)¶ Extract the year as an integer from a given date/datetime attribute.
- Return type
Attribute
-
leapyear.functions.time.
yearofweek
(e)¶ Extract the year based on the ISO week numbering where a week associated the previous year may spill over into the next calendar year.
Full documentation of week numbering can be found here.
- Return type
Attribute
-
leapyear.functions.time.
year_with_week
(e)¶ Extract the year and the ISO week from a Date column and returns a Factor which combines the two.
For example, given a row containing ‘dt(2021,1,1)’ this function returns ‘2020-53’.
Full documentation with week numbering can be found here.
- Return type
Attribute
-
leapyear.functions.time.
dayofweek
(e)¶ Extract the day of the week number as an integer from a given date/datetime attribute, where Monday = 1, …, Sunday = 7.
- Return type
Attribute
-
leapyear.functions.time.
parse_clamped_time
(e, bounds, *, fmt=None)¶ Parse a Text column and return a DateTime column using the given format clamped to bounds.
Format specification and default format depends on the back end:
- Spark
default format:
yyyy-MM-dd HH:mm:ss
(Java reference)- Snowflake
default format:
yyyy-MM-DD HH:MI:SS
(Snowflake reference)
Examples
>>> import datetime.datetime as dt >>> parse_clamped_time(f.col("date"), bounds=(dt(2000, 1, 1), dt(2020, 12, 31)), fmt="yyyMMdd HHmmss")
- Return type
Attribute
Math functions¶
Math functions for Attributes.
-
leapyear.functions.math.
acos
(e)¶ Compute the cosine inverse of the given value.
The returned angle is in the range 0 through \(\pi\).
- Return type
Attribute
-
leapyear.functions.math.
asin
(e)¶ Compute the sine inverse of the given value.
The returned angle is in the range -pi/2 through pi/2.
- Return type
Attribute
-
leapyear.functions.math.
atan
(e)¶ Compute the tangent inverse of the given value.
- Return type
Attribute
-
leapyear.functions.math.
cbrt
(e)¶ Compute the cube-root of the given value.
- Return type
Attribute
-
leapyear.functions.math.
ceil
(e)¶ Compute the ceiling of the given value.
- Return type
Attribute
-
leapyear.functions.math.
cos
(e)¶ Compute the cosine of the given value.
- Return type
Attribute
-
leapyear.functions.math.
cosh
(e)¶ Compute the hyperbolic cosine of the given value.
- Unsupported Backends
Not supported for the following LeapYear compute backend(s): snowflake.
- Return type
Attribute
-
leapyear.functions.math.
degrees
(e)¶ Convert an angle measured in radians to an equivalent angle measured in degrees.
- Return type
Attribute
-
leapyear.functions.math.
exp
(e)¶ Compute the exponential of the given value.
- Return type
Attribute
-
leapyear.functions.math.
expm1
(e)¶ Compute the exponential of the given value minus one.
- Return type
Attribute
-
leapyear.functions.math.
floor
(e)¶ Compute the floor of the given value.
- Return type
Attribute
-
leapyear.functions.math.
hypot
(x, y)¶ Compute sqrt(x^2 + y^2) without intermediate overflow or underflow.
- Unsupported Backends
Not supported for the following LeapYear compute backend(s): snowflake.
- Return type
Attribute
-
leapyear.functions.math.
log
(e)¶ Compute the natural logarithm of the given value.
- Return type
Attribute
-
leapyear.functions.math.
log10
(e)¶ Compute the logarithm of the given value in base 10.
- Return type
Attribute
-
leapyear.functions.math.
log1p
(e)¶ Compute the natural logarithm of the given value plus one.
- Return type
Attribute
-
leapyear.functions.math.
log2
(e)¶ Compute the logarithm of the given column in base 2.
- Return type
Attribute
-
leapyear.functions.math.
sigmoid
(e)¶ Compute the sigmoid of the given value.
- Unsupported Backends
Not supported for the following LeapYear compute backend(s): snowflake.
- Return type
Attribute
-
leapyear.functions.math.
pow
(base, exp)¶ Return the value of the first argument raised to the power of the second argument.
- Return type
Attribute
-
leapyear.functions.math.
radians
(e)¶ Convert an angle measured in degrees to an equivalent angle measured in radians.
- Return type
Attribute
-
leapyear.functions.math.
round
(e, scale=0)¶ Round the value of e to scale decimal places.
Examples
1. Create a new attribute ‘col1_trn’ which is derived by rounding ‘col1’ to 2 digits:
>>> import leapyear.functions as f >>> ds2 = ds1.with_attributes({'col1_trn': f.math.round(f.col('col1'),2)})
- Return type
Attribute
-
leapyear.functions.math.
signum
(e)¶ Compute the signum of the given value.
- Return type
Attribute
-
leapyear.functions.math.
sin
(e)¶ Compute the sine of the given value.
- Return type
Attribute
-
leapyear.functions.math.
sinh
(e)¶ Compute the hyperbolic sine of the given value.
- Return type
Attribute
-
leapyear.functions.math.
sqrt
(e)¶ Compute the square root of the specified float value.
- Return type
Attribute
-
leapyear.functions.math.
tan
(e)¶ Compute the tangent of the given value.
- Unsupported Backends
Not supported for the following LeapYear compute backend(s): snowflake.
- Return type
Attribute
-
leapyear.functions.math.
tanh
(e)¶ Compute the hyperbolic tangent of the given value.
- Return type
Attribute
-
leapyear.functions.math.
erf
(e)¶ Compute the error function on the given value.
- Unsupported Backends
Not supported for the following LeapYear compute backend(s): snowflake.
- Return type
Attribute
-
leapyear.functions.math.
erfc
(e)¶ Compute the error function on the given value.
- Unsupported Backends
Not supported for the following LeapYear compute backend(s): snowflake.
- Return type
Attribute
-
leapyear.functions.math.
inverf
(e)¶ Compute the error function on the given value.
- Return type
Attribute
-
leapyear.functions.math.
inverfc
(e)¶ Compute the error function on the given value.
- Return type
Attribute
Non-aggregate functions¶
Non-aggregate functions for Attributes.
-
leapyear.functions.non_aggregate.
all
(*exprs)¶ Return True if all columns are True.
- Return type
Attribute
-
leapyear.functions.non_aggregate.
attr_in
(input_attr_name, in_list)¶ Expression for filtering rows based on attribute values that are IN a list of values.
- Parameters
input_attr_name (String) – Name of Attribute based on which to filter
in_list (Iterable) – List values for filtering
- Returns
Boolean Attribute
- Return type
Examples
Filtering based on an attribute taking a value from a list of values:
>>> import leapyear.functions as f >>> ds2 = ds.where(f.attr_in('col1',in_list=[val1,val2,val3]))
-
leapyear.functions.non_aggregate.
attr_not_in
(input_attr_name, not_in_list)¶ Expression for filtering rows based on attribute values that are IN a list of values.
- Parameters
- Returns
Boolean Attribute
- Return type
Examples
Filtering based on an attribute NOT taking a value from a list of values:
>>> import leapyear.functions as f >>> ds2 = ds.where(f.attr_not_in('col1',not_in_list=[val1,val2,val3]))
-
leapyear.functions.non_aggregate.
any
(*exprs)¶ Return True if any column is True.
- Return type
Attribute
-
leapyear.functions.non_aggregate.
abs
(e)¶ Compute the absolute value.
- Return type
Attribute
-
leapyear.functions.non_aggregate.
col
(col_name)¶ Return an Attribute based on the given name.
- Return type
Attribute
-
leapyear.functions.non_aggregate.
column
(col_name)¶ Return an Attribute based on the given name.
- Return type
Attribute
-
leapyear.functions.non_aggregate.
greatest
(*exprs)¶ Return the greatest value of the list of column names, skipping null values.
- Return type
Attribute
-
leapyear.functions.non_aggregate.
isnull
(e)¶ Return true iff the column is null.
- Return type
Attribute
-
leapyear.functions.non_aggregate.
least
(*exprs)¶ Return the greatest value of the list of column names, skipping null values.
- Return type
Attribute
-
leapyear.functions.non_aggregate.
lit
(literal)¶ Create an Attribute of literal value.
-
leapyear.functions.non_aggregate.
negate
(e)¶ Unary minus.
- Return type
Attribute
-
leapyear.functions.non_aggregate.
not_
(e)¶ Inversion of boolean expression.
- Return type
Attribute
-
leapyear.functions.non_aggregate.
when
(condition, value)¶ Evaluate a list of conditions and returns one of multiple possible result expressions.
If otherwise is not defined at the end, null is returned for unmatched conditions.
Example
Encoding gender string column into an integer.
>>> df.select(when(col("gender") == "male", 0) ... .when(col("gender") == "female", 1) ... .otherwise(2))
- Return type
Attribute
-
leapyear.functions.non_aggregate.
to_text
(e)¶ Convert an attribute to a string representation.
- Return type
Attribute
String functions¶
Math functions for Attributes.
-
leapyear.functions.string.
ascii
(e)¶ Convert the first character of the string to its ASCII value.
- Return type
Attribute
-
leapyear.functions.string.
concat
(*exprs, sep=None)¶ Concatenation of strings with optional separator.
Concatenating factors {‘a’, ‘b’} and {‘c’, ‘d’} gives {‘ac’, ‘ad’, ‘bc’, ‘cd’}. Including the separator ” ” gives {‘a c’, ‘a d’, ‘b c’, ‘c d’}.
If any of the expressions are type TEXT, then the result will be TEXT.
Examples
1. Create new column ‘col1_trn’ which concatenation of ‘col1’ and ‘col2’ with separator ‘-‘. It can be used to contruct a date column from day and month columns:
>>> import leapyear.functions as f >>> ds2 = ds1.with_attributes({'col1_trn':f.concat(ds1['col1'],ds1['col2'],sep='-')})
- Return type
Attribute
-
leapyear.functions.string.
instr
(attr, substr)¶ Give the position of substring in the Attribute, otherwise 0.
- Return type
Attribute
-
leapyear.functions.string.
length
(attr)¶ Return the length of the string.
- Return type
Attribute
-
leapyear.functions.string.
levenshtein
(attr1, attr2)¶ Compute the Levenshtein distance between strings.
- Unsupported Backends
Not supported for the following LeapYear compute backend(s): snowflake.
- Return type
Attribute
-
leapyear.functions.string.
locate
(attr, substr, pos=None)¶ Give the position of substr in the Attribute, optionally after pos.
Position is returned as a positive integer starting at 1 if the substring is found, and 0 if the substring is not found.
-
leapyear.functions.string.
lpad
(attr, len_, pad)¶ Pad the string on the left with pad to make the total length len_.
When pad is an empty string, the string is returned without modification, or truncated to len_.
- Unsupported Backends
Not supported for the following LeapYear compute backend(s): snowflake.
- Return type
Attribute
-
leapyear.functions.string.
ltrim
(attr)¶ Remove whitespace characters on the beginning of the string.
- Return type
Attribute
-
leapyear.functions.string.
reverse
(attr)¶ Reverse the string.
- Return type
Attribute
-
leapyear.functions.string.
repeat
(attr, n)¶ Repeat the string n times.
- Unsupported Backends
Not supported for the following LeapYear compute backend(s): snowflake.
- Return type
Attribute
-
leapyear.functions.string.
rpad
(attr, len_, pad)¶ Pad the string on the right with pad to make the total length len_.
When pad is an empty string, the string is returned without modification, or truncated to len_.
- Unsupported Backends
Not supported for the following LeapYear compute backend(s): snowflake.
- Return type
Attribute
-
leapyear.functions.string.
rtrim
(attr)¶ Remove whitespace characters at the end of the string.
- Return type
Attribute
-
leapyear.functions.string.
soundex
(attr)¶ Return the soundex code for the specified expression.
- Unsupported Backends
Not supported for the following LeapYear compute backend(s): snowflake.
- Return type
Attribute
-
leapyear.functions.string.
substring
(attr, start, len_)¶ Return the substring of length len_ starting at start.
- Return type
Attribute
-
leapyear.functions.string.
substring_index
(attr, delim, len_)¶ Return the substring from string str before count occurrences of the delimiter delim.
If count is positive, everything the left of the final delimiter (counting from left) is returned. If count is negative, every to the right of the final delimiter (counting from the right) is returned. substring_index performs a case-sensitive match when searching for delim.
- Unsupported Backends
Not supported for the following LeapYear compute backend(s): snowflake.
- Return type
Attribute
-
leapyear.functions.string.
translate
(attr, match, replace)¶ Translate any character in the src by a character in replace.
The characters in replace correspond to the characters in matching. The translate will happen when any character in the string matches the character in the match.
- Return type
Attribute
-
leapyear.functions.string.
trim
(attr)¶ Remove the whitespace from the beginning and end of the string.
- Return type
Attribute
-
leapyear.functions.string.
lex_lt
(attr1, attr2)¶ Lexicographical less-than operation.
- Unsupported Backends
Not supported for the following LeapYear compute backend(s): snowflake.
- Return type
Attribute
-
leapyear.functions.string.
lex_lte
(attr1, attr2)¶ Lexicographical less-than-or-equal operation.
- Unsupported Backends
Not supported for the following LeapYear compute backend(s): snowflake.
- Return type
Attribute
-
leapyear.functions.string.
lex_gt
(attr1, attr2)¶ Lexicographical greater-than operation.
- Unsupported Backends
Not supported for the following LeapYear compute backend(s): snowflake.
- Return type
Attribute
-
leapyear.functions.string.
lex_gte
(attr1, attr2)¶ Lexicographical greater-than-or-equal operation.
- Unsupported Backends
Not supported for the following LeapYear compute backend(s): snowflake.
- Return type
Attribute
-
leapyear.functions.string.
remove_accents
(attr)¶ Remove all accents from the string.
- Unsupported Backends
Not supported for the following LeapYear compute backend(s): snowflake.
- Return type
Attribute
-
leapyear.functions.string.
lower
(attr)¶ Convert strings to lowercase.
- Return type
Attribute
-
leapyear.functions.string.
upper
(attr)¶ Convert strings to uppercase.
- Return type
Attribute
-
leapyear.functions.string.
regex_extract
(attr, pattern, group_idx)¶ Use a regular expression pattern to extract a part of the string.
This function always results in a nullable Text type.
The regex syntax depends on the backend:
- Spark
https://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html
- Snowflake
https://docs.snowflake.com/en/sql-reference/functions-regexp.html#general-usage-notes
- Return type
Attribute
-
leapyear.functions.string.
regex_replace
(attr, pattern, replace)¶ Use a regular expression pattern to replace a part of the string.
This function always results in a nullable Text type.
The regex syntax depends on the backend:
- Spark
https://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html
- Snowflake
https://docs.snowflake.com/en/sql-reference/functions-regexp.html#general-usage-notes
- Return type
Attribute
Windowing functions¶
Window functions for Attributes.
-
leapyear.functions.window.
lead
(e, i)¶ Compute the lead value.
- Return type
WindowAttribute
-
leapyear.functions.window.
lag
(e, i)¶ Compute the lag value.
- Return type
WindowAttribute
-
leapyear.functions.window.
first
(e)¶ Compute the first non-null value.
- Return type
WindowAttribute
-
leapyear.functions.window.
last
(e)¶ Compute the last non-null value.
- Return type
WindowAttribute
-
leapyear.functions.window.
count
(e=None)¶ Compute the number of non-null entries.
- Return type
WindowAttribute
-
leapyear.functions.window.
approx_count_distinct
(attrs)¶ Compute the number of distinct entries (using an approximate streaming algorithm).
- Return type
WindowAttribute
-
leapyear.functions.window.
mean
(e)¶ Compute the mean value.
Examples
1. Create a window specification that partitions based on ‘col1’ and orders by a ‘date_col’ and picks past 14 rows from current row 0. Then, we can compute mean of ‘col2’ over this window:
>>> import leapyear.functions as f >>> from leapyear.dataset import Window >>> ws1 = Window.partition_by(f.col('col1')) .order_by(f.col('date_col').asc()) .rows_between(start=-14,end=0) >>> ds2 = ds1.project(['date_col','col1','col2']) .with_attribute('mean_col2', f.window.mean(f.col('col2')).over(ws1))
- Return type
WindowAttribute
-
leapyear.functions.window.
avg
(e)¶ Compute the mean value.
Examples
1. Create a window specification that partitions based on ‘col1’ and orders by a ‘date_col’ and picks past 14 rows from current row 0. Then, we can compute mean of ‘col2’ over this window:
>>> import leapyear.functions as f >>> from leapyear.dataset import Window >>> ws1 = Window.partition_by(f.col('col1')) .order_by(f.col('date_col').asc()) .rows_between(start=-14,end=0) >>> ds2 = ds1.project(['date_col','col1','col2']) .with_attribute('mean_col2', f.window.mean(f.col('col2')).over(ws1))
- Return type
WindowAttribute
-
leapyear.functions.window.
sum
(e)¶ Compute the sum.
- Return type
WindowAttribute
-
leapyear.functions.window.
or_
(e)¶ Compute the or of boolean values.
- Return type
WindowAttribute
-
leapyear.functions.window.
and_
(e)¶ Compute the and of boolean values.
- Return type
WindowAttribute
-
leapyear.functions.window.
min
(e)¶ Compute the min value.
- Return type
WindowAttribute
-
leapyear.functions.window.
max
(e)¶ Compute the max value.
- Return type
WindowAttribute
-
leapyear.functions.window.
stddev
(e)¶ Compute the sample standard deviation.
- Return type
WindowAttribute
-
leapyear.functions.window.
stddev_samp
(e)¶ Compute the sample standard deviation.
- Return type
WindowAttribute
-
leapyear.functions.window.
stddev_pop
(e)¶ Compute the population standard deviation.
- Return type
WindowAttribute
-
leapyear.functions.window.
variance
(e)¶ Compute the sample variance.
- Return type
WindowAttribute
-
leapyear.functions.window.
variance_samp
(e)¶ Compute the sample variance.
- Return type
WindowAttribute
-
leapyear.functions.window.
variance_pop
(e)¶ Compute the population variance.
- Return type
WindowAttribute
-
leapyear.functions.window.
skewness
(e)¶ Compute the skewness.
- Return type
WindowAttribute
-
leapyear.functions.window.
kurtosis
(e)¶ Compute the kurtosis.
- Return type
WindowAttribute
-
leapyear.functions.window.
covar_samp
(e1, e2)¶ Compute the sample covariance.
- Return type
WindowAttribute
-
leapyear.functions.window.
covar_pop
(e1, e2)¶ Compute the population covariance.
- Return type
WindowAttribute
-
leapyear.functions.window.
corr
(e1, e2)¶ Compute the correlation.
- Return type
WindowAttribute