bucketize_lit

bucketize_lit(*items, return_dtype=None)

Returns a Polars expression that assigns a label to each row based on its index, cycling through the provided items in a round-robin fashion.

bucketize_lit() is a simplified version of bucketize(), designed for common use cases involving literal values. For more advanced scenarios, consider using bucketize() directly.

Parameters

items : Any = ()

Literal values to cycle through. You can provide these either as multiple separate arguments or as a single iterable containing the values. All items must be of the same type, and at least two items are required. See the table below for supported types and their conversions.

return_dtype : pl.DataType | pl.DataTypeExpr | None = None

An optional Polars data type to cast the resulting expression to.
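The rules stated for items (separate arguments or one iterable, a single shared type, at least two values) can be modelled in plain Python. This is a minimal sketch under assumptions, not the library's code: normalize_items is a hypothetical helper, treating only a lone list or tuple as "a single iterable" is an assumption, and the exception types are illustrative.

```python
def normalize_items(*items):
    """Hypothetical helper: accept values as separate arguments or as one iterable."""
    # Assumption: a single list or tuple argument is the container of items.
    if len(items) == 1 and isinstance(items[0], (list, tuple)):
        items = tuple(items[0])
    if len(items) < 2:
        # Assumption: the real function may raise a different exception type.
        raise ValueError("at least two items are required")
    first_type = type(items[0])
    # Exact type comparison, so True (bool) and 1 (int) count as different types.
    if any(type(item) is not first_type for item in items):
        raise TypeError("all items must be of the same type")
    return list(items)


print(normalize_items(True, False))      # [True, False]
print(normalize_items([1, 2, 3]))        # [1, 2, 3]
print(normalize_items([1, 2], [3, 4]))   # [[1, 2], [3, 4]]
```

In this model, two or more list arguments are kept as list-valued items, which matches the list, tuple row of the supported types.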

Returns

: pl.Expr

A Polars expression that cycles through the provided values: each row receives the item whose position equals the row index modulo the number of items.
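The modulo rule can be illustrated in plain Python. The sketch below is only a conceptual model and not how bucketize_lit() is implemented; round_robin is a hypothetical name introduced for illustration.

```python
def round_robin(items, n_rows):
    """Return the label each row would get: row i receives items[i % len(items)]."""
    n_items = len(items)
    return [items[i % n_items] for i in range(n_rows)]


# Five rows, two items: the labels alternate, starting from the first item.
labels = round_robin([True, False], 5)
print(labels)  # [True, False, True, False, True]
```

With three items and seven rows, the same rule yields the items in order twice and then the first item again.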

Supported Type Conversions

Python Type          Converted To
bool                 pl.Boolean
datetime.datetime    pl.Datetime
datetime.date        pl.Date
datetime.time        pl.Time
datetime.timedelta   pl.Duration
int                  pl.Int64
float                pl.Float64
str                  pl.String
list, tuple          pl.List
Others               no cast involved
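One way to read the table above is as a lookup keyed by the exact Python type. The sketch below is an assumption-laden illustration, not the library's conversion logic: DTYPE_NAMES and dtype_name_for are hypothetical names, and the Polars types are written as strings so the example runs without Polars installed.

```python
import datetime

# Hypothetical mapping that mirrors the table; values are names, not real dtypes.
DTYPE_NAMES = {
    bool: "pl.Boolean",
    datetime.datetime: "pl.Datetime",
    datetime.date: "pl.Date",
    datetime.time: "pl.Time",
    datetime.timedelta: "pl.Duration",
    int: "pl.Int64",
    float: "pl.Float64",
    str: "pl.String",
    list: "pl.List",
    tuple: "pl.List",
}


def dtype_name_for(value):
    """Hypothetical helper: find the table row that applies to a Python value."""
    # Look up the exact type. bool subclasses int and datetime.datetime
    # subclasses datetime.date, so isinstance() checks would depend on ordering.
    return DTYPE_NAMES.get(type(value), "no cast involved")


print(dtype_name_for(True))                           # pl.Boolean
print(dtype_name_for(datetime.datetime(2024, 1, 1)))  # pl.Datetime
print(dtype_name_for(datetime.date(2024, 1, 1)))      # pl.Date
```

The exact-type lookup is a design choice for the sketch: it keeps True from being classified as an int and a datetime from being classified as a date.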

Examples

DataFrame Context

Cycle through boolean values to mark alternating rows:

import polars as pl
import turtle_island as ti

pl.Config.set_fmt_table_cell_list_len(10)
df = pl.DataFrame({"x": [1, 2, 3, 4, 5]})
df.with_columns(ti.bucketize_lit(True, False).alias("bucketized"))
shape: (5, 2)
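┌─────┬────────────┐
│ x   ┆ bucketized │
│ --- ┆ ---        │
│ i64 ┆ bool       │
╞═════╪════════════╡
│ 1   ┆ true       │
│ 2   ┆ false      │
│ 3   ┆ true       │
│ 4   ┆ false      │
│ 5   ┆ true       │
└─────┴────────────┘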

Cast the result to a specific data type using return_dtype=:

df.with_columns(
    ti.bucketize_lit(True, False, return_dtype=pl.Int64).alias("bucketized")
)
shape: (5, 2)
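┌─────┬────────────┐
│ x   ┆ bucketized │
│ --- ┆ ---        │
│ i64 ┆ i64        │
╞═════╪════════════╡
│ 1   ┆ 1          │
│ 2   ┆ 0          │
│ 3   ┆ 1          │
│ 4   ┆ 0          │
│ 5   ┆ 1          │
└─────┴────────────┘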

List Namespace Context

In the list namespace, the expression is evaluated against each list separately. Conceptually, each list is treated as a pl.Series whose rows are the elements of that list, so the cycle is applied element by element within each list.

Cycle through boolean values to mark alternating elements:

df2 = pl.DataFrame(
    {
        "x": [[1, 2, 3, 4], [5, 6, 7, 8]],
        "y": [[9, 10, 11, 12], [13, 14, 15, 16]],
    }
)
df2.with_columns(
    pl.all().list.eval(ti.bucketize_lit(True, False))
)
shape: (2, 2)
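┌────────────────────────────┬────────────────────────────┐
│ x                          ┆ y                          │
│ ---                        ┆ ---                        │
│ list[bool]                 ┆ list[bool]                 │
╞════════════════════════════╪════════════════════════════╡
│ [true, false, true, false] ┆ [true, false, true, false] │
│ [true, false, true, false] ┆ [true, false, true, false] │
└────────────────────────────┴────────────────────────────┘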
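The per-list behaviour can also be modelled in plain Python. This sketch assumes that the element index restarts at zero for every list, as the output above suggests; round_robin_per_list is a hypothetical name and not part of the library.

```python
def round_robin_per_list(items, column):
    """Label the elements of each list independently, restarting the cycle per list."""
    n_items = len(items)
    return [
        [items[i % n_items] for i in range(len(values))]
        for values in column
    ]


# Lists of different lengths: each one starts again from the first item.
column = [[1, 2, 3], [4, 5, 6, 7, 8]]
marks = round_robin_per_list([True, False], column)
print(marks)  # [[True, False, True], [True, False, True, False, True]]
```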