bucketize_lit

bucketize_lit(*items, return_dtype=None)

Returns a Polars expression that assigns a label to each row based on its index, cycling through the provided items in a round-robin fashion.

bucketize_lit() is a simplified version of bucketize(), designed for common use cases involving literal values. For more advanced scenarios, consider using bucketize() directly.
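
The round-robin rule can be modelled in plain Python. The sketch below only illustrates the semantics described above and is not the library's implementation; the helper name `round_robin_labels` is hypothetical.

```python
def round_robin_labels(n_rows, items):
    """Model of the cycling rule: row i receives items[i % len(items)]."""
    # Hypothetical helper for illustration; not part of turtle_island.
    return [items[i % len(items)] for i in range(n_rows)]


labels = round_robin_labels(5, ["a", "b", "c"])
print(labels)  # ['a', 'b', 'c', 'a', 'b']
```

With two items the same rule produces a strictly alternating pattern, which is the case shown in the examples further down.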

Parameters

items : Any = (): Literal values to cycle through. You can provide these either as multiple separate arguments or as a single iterable containing the values. All items must be of the same type, and at least two items are required. See the table below for supported types and their conversions.
return_dtype : pl.DataType | pl.DataTypeExpr | None = None: An optional Polars data type to cast the resulting expression to.
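
The rules stated for items (separate arguments or one iterable, a single shared type, at least two values) could be checked along the following lines. This is a sketch under assumptions: `normalize_items` is a hypothetical helper, the error types are guesses, and treating a lone string as a scalar rather than as an iterable is an assumption, so the library's actual validation may differ.

```python
from collections.abc import Iterable


def normalize_items(*items):
    """Hypothetical model of how the items argument could be interpreted."""
    # Assumption: a single non-string iterable is unpacked into its values.
    if (
        len(items) == 1
        and isinstance(items[0], Iterable)
        and not isinstance(items[0], (str, bytes))
    ):
        items = tuple(items[0])
    # Rule from the parameter description: at least two items.
    if len(items) < 2:
        raise ValueError("at least two items are required")
    # Rule from the parameter description: all items share one type.
    first_type = type(items[0])
    if any(type(item) is not first_type for item in items):
        raise TypeError("all items must be of the same type")
    return tuple(items)


# Both calling styles yield the same values.
print(normalize_items(1, 2, 3))    # (1, 2, 3)
print(normalize_items([1, 2, 3]))  # (1, 2, 3)
```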

Returns

pl.Expr: A Polars expression that cycles through the provided values, selecting each value by the row index modulo the number of items.

Supported Type Conversions

Python Type	Converted To
`bool`	`pl.Boolean`
`datetime.datetime`	`pl.Datetime`
`datetime.date`	`pl.Date`
`datetime.time`	`pl.Time`
`datetime.timedelta`	`pl.Duration`
`int`	`pl.Int64`
`float`	`pl.Float64`
`str`	`pl.String`
`list`, `tuple`	`pl.List`
Other types	No cast applied
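
One way to read the table is as an ordered lookup from Python type to Polars dtype. The sketch below models that lookup with dtype names as plain strings so it runs without Polars installed. `target_dtype_name` and `_CONVERSIONS` are hypothetical names, and how the library actually resolves types is not shown in this page.

```python
import datetime

# Order matters: bool is a subclass of int, and datetime.datetime is a
# subclass of datetime.date, so the more specific type is listed first.
_CONVERSIONS = [
    (bool, "pl.Boolean"),
    (datetime.datetime, "pl.Datetime"),
    (datetime.date, "pl.Date"),
    (datetime.time, "pl.Time"),
    (datetime.timedelta, "pl.Duration"),
    (int, "pl.Int64"),
    (float, "pl.Float64"),
    (str, "pl.String"),
    ((list, tuple), "pl.List"),
]


def target_dtype_name(value):
    """Return the dtype name from the table, or None when no cast applies."""
    for py_type, dtype_name in _CONVERSIONS:
        if isinstance(value, py_type):
            return dtype_name
    return None


print(target_dtype_name(True))                      # pl.Boolean
print(target_dtype_name(datetime.date(2024, 1, 1)))  # pl.Date
```

The ordering is the design point worth noting: checking `int` before `bool` would classify `True` as `pl.Int64`.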

Examples

DataFrame Context

Cycle through boolean values to mark alternating rows:

import polars as pl
import turtle_island as ti

pl.Config.set_fmt_table_cell_list_len(10)
df = pl.DataFrame({"x": [1, 2, 3, 4, 5]})
df.with_columns(ti.bucketize_lit(True, False).alias("bucketized"))

shape: (5, 2)

x	bucketized
i64	bool
1	true
2	false
3	true
4	false
5	true

Cast the result to a specific data type using the return_dtype parameter:

df.with_columns(
    ti.bucketize_lit(True, False, return_dtype=pl.Int64).alias("bucketized")
)

shape: (5, 2)

x	bucketized
i64	i64
1	1
2	0
3	1
4	0
5	1

List Namespace Context

Working with Lists as Series

In the list namespace, it can help to think of each list as its own pl.Series, where every element of the list corresponds to one row. The items are then assigned by each element's position within its list.

Cycle through boolean values to mark alternating elements:

df2 = pl.DataFrame(
    {
        "x": [[1, 2, 3, 4], [5, 6, 7, 8]],
        "y": [[9, 10, 11, 12], [13, 14, 15, 16]],
    }
)
df2.with_columns(
    pl.all().list.eval(ti.bucketize_lit(True, False))
)

shape: (2, 2)

x	y
list[bool]	list[bool]
[true, false, true, false]	[true, false, true, false]
[true, false, true, false]	[true, false, true, false]
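
The per-list behaviour can be modelled in plain Python as well. This is a sketch, not the library's code: `cycle_within_lists` is a hypothetical helper, and it assumes the position counter restarts at 0 for every list, which matches the output above but is not stated explicitly in this page.

```python
def cycle_within_lists(column, items):
    """Apply the round-robin rule to each list independently."""
    # Assumption: the position counter restarts at 0 for every list.
    return [[items[i % len(items)] for i in range(len(row))] for row in column]


x = [[1, 2, 3, 4], [5, 6, 7, 8]]
result = cycle_within_lists(x, [True, False])
print(result)
# [[True, False, True, False], [True, False, True, False]]
```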