is_every_nth_row

is_every_nth_row(n, offset=0, *, name='bool_nth_row')

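Returns a Polars expression that is True for every n-th row, counting from offset: the rows at index offset, offset + n, offset + 2n, and so on. With the default offset of 0, this means the row index modulo n equals 0.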

is_every_nth_row() can be seen as the predicate counterpart of pl.Expr.gather_every().

pl.Expr.gather_every() is typically used in a select() context and returns only the selected rows, so the result may have fewer rows than the input. is_every_nth_row() instead produces a boolean predicate expression. Use it with select() or with_columns() to preserve the original row structure for further processing, or with filter() to get the same rows as pl.Expr.gather_every().

Since expressions are only evaluated at runtime, their validity cannot be checked until execution. If offset= is greater than the number of rows in the DataFrame, the result will be a column filled with False.
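The selection rule can be modelled in plain Python. This is only an illustrative sketch of the documented behavior, not the library's implementation. The helper name nth_row_mask and its n_rows argument are hypothetical, and the sketch assumes n is positive and offset is non-negative.

```python
def nth_row_mask(n_rows: int, n: int, offset: int = 0) -> list[bool]:
    """Plain-Python model of the mask (hypothetical helper, not part of the library).

    A row is marked True when its index is offset, offset + n, offset + 2n, ...
    Assumes n > 0 and offset >= 0, as the parameter descriptions require.
    """
    return [i >= offset and (i - offset) % n == 0 for i in range(n_rows)]


print(nth_row_mask(5, 2))             # every second row, starting at index 0
print(nth_row_mask(5, 3, offset=1))   # every third row, starting at index 1
print(nth_row_mask(5, 2, offset=10))  # offset beyond the last row: all False
```

In this model, an offset past the last row leaves no index that satisfies the condition, which is why the resulting column contains only False.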

Parameters

n : int

The interval to use for row selection. Should be positive.

offset : int = 0

Start the index at this offset. Cannot be negative.

name : str = 'bool_nth_row'

The name of the resulting column.

Returns

pl.Expr

A boolean Polars expression.

Examples

DataFrame Context

Mark every second row:

import polars as pl
import turtle_island as ti

pl.Config.set_fmt_table_cell_list_len(10)
df = pl.DataFrame({"x": [1, 2, 3, 4, 5]})
df.with_columns(ti.is_every_nth_row(2))
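shape: (5, 2)
┌─────┬──────────────┐
│ x   ┆ bool_nth_row │
│ --- ┆ ---          │
│ i64 ┆ bool         │
╞═════╪══════════════╡
│ 1   ┆ true         │
│ 2   ┆ false        │
│ 3   ┆ true         │
│ 4   ┆ false        │
│ 5   ┆ true         │
└─────┴──────────────┘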

To invert the result, use either the ~ operator or pl.Expr.not_():

df.with_columns(
    ~ti.is_every_nth_row(2).alias("~2"),
    ti.is_every_nth_row(2).not_().alias("not_2"),
)
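shape: (5, 3)
┌─────┬───────┬───────┐
│ x   ┆ ~2    ┆ not_2 │
│ --- ┆ ---   ┆ ---   │
│ i64 ┆ bool  ┆ bool  │
╞═════╪═══════╪═══════╡
│ 1   ┆ false ┆ false │
│ 2   ┆ true  ┆ true  │
│ 3   ┆ false ┆ false │
│ 4   ┆ true  ┆ true  │
│ 5   ┆ false ┆ false │
└─────┴───────┴───────┘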

Use offset= to shift the starting index:

df.with_columns(ti.is_every_nth_row(3, 1))
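shape: (5, 2)
┌─────┬──────────────┐
│ x   ┆ bool_nth_row │
│ --- ┆ ---          │
│ i64 ┆ bool         │
╞═════╪══════════════╡
│ 1   ┆ false        │
│ 2   ┆ true         │
│ 3   ┆ false        │
│ 4   ┆ false        │
│ 5   ┆ true         │
└─────┴──────────────┘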

For reference, here’s the output using pl.Expr.gather_every():

df.select(pl.col("x").gather_every(3, 1))
shape: (2, 1)
┌─────┐
│ x   │
│ --- │
│ i64 │
╞═════╡
│ 2   │
│ 5   │
└─────┘

You can also combine multiple is_every_nth_row() expressions to build more complex row selections. For example, to mark rows that fall on every second or every third position:

df.select(
    ti.is_every_nth_row(2).alias("2"),
    ti.is_every_nth_row(3).alias("3"),
    ti.is_every_nth_row(2).or_(ti.is_every_nth_row(3)).alias("2_or_3")
)
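shape: (5, 3)
┌───────┬───────┬────────┐
│ 2     ┆ 3     ┆ 2_or_3 │
│ ---   ┆ ---   ┆ ---    │
│ bool  ┆ bool  ┆ bool   │
╞═══════╪═══════╪════════╡
│ true  ┆ true  ┆ true   │
│ false ┆ false ┆ false  │
│ true  ┆ false ┆ true   │
│ false ┆ true  ┆ true   │
│ true  ┆ false ┆ true   │
└───────┴───────┴────────┘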

List Namespace Context

In the list namespace, it may be easier to think of each element of a list as a row. Conceptually, each list is treated as its own pl.Series, and each item in the list corresponds to one row.

Mark every second element:

df2 = pl.DataFrame(
    {
        "x": [[1, 2, 3, 4], [5, 6, 7, 8]],
        "y": [[9, 10, 11, 12], [13, 14, 15, 16]],
    }
)
df2.with_columns(pl.all().list.eval(ti.is_every_nth_row(2)))
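shape: (2, 2)
┌────────────────────────────┬────────────────────────────┐
│ x                          ┆ y                          │
│ ---                        ┆ ---                        │
│ list[bool]                 ┆ list[bool]                 │
╞════════════════════════════╪════════════════════════════╡
│ [true, false, true, false] ┆ [true, false, true, false] │
│ [true, false, true, false] ┆ [true, false, true, false] │
└────────────────────────────┴────────────────────────────┘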