import polars as pl
import turtle_island as ti
pl.Config.set_fmt_table_cell_list_len(10)
df = pl.DataFrame({"x": [1, 2, 3, 4, 5]})
df.with_columns(ti.is_every_nth_row(2))
x | bool_nth_row |
---|---|
i64 | bool |
1 | true |
2 | false |
3 | true |
4 | false |
5 | true |
Returns a Polars expression that is True for every n-th row (index modulo n equals 0).
is_every_nth_row() can be seen as the complement of pl.Expr.gather_every(). While pl.Expr.gather_every() is typically used in a select() context and may return a DataFrame with fewer rows, is_every_nth_row() produces a predicate expression that can be used with select() or with_columns() to preserve the original row structure for further processing, or with filter() to achieve the same result as pl.Expr.gather_every().
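The relationship described above can be sketched in plain Python. This is only a model of the semantics, not the library's implementation: the helper `every_nth_mask` is a hypothetical name, and the offset rule (index at least `offset`, then every n-th index after it) is an assumption inferred from the offset example further down.

```python
# Plain-Python model of the mask; NOT the library's implementation.
def every_nth_mask(length: int, n: int, offset: int = 0) -> list[bool]:
    """True where the 0-based index is >= offset and (index - offset) % n == 0."""
    if n <= 0:
        raise ValueError("n should be positive")
    if offset < 0:
        raise ValueError("offset cannot be negative")
    return [i >= offset and (i - offset) % n == 0 for i in range(length)]


x = [1, 2, 3, 4, 5]
mask = every_nth_mask(len(x), n=2)

# with_columns() analogue: one mask entry per row, so no rows are lost.
marked = list(zip(x, mask))

# filter() analogue: keep only the rows where the mask is True.
kept = [value for value, keep in zip(x, mask) if keep]

# gather_every(2) analogue: take every second value directly.
gathered = x[::2]

print(marked)
print(kept == gathered)
```

The mask keeps all five rows available for later steps, while filtering by it yields the same values as gathering them directly.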
Ensure offset= does not exceed the total number of rows
Since expressions are only evaluated at runtime, their validity cannot be checked until execution. If offset= is greater than the number of rows in the DataFrame, the result will be a column filled with False.
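A minimal plain-Python sketch of this behaviour, assuming the mask is True only at indices at or after the offset. The helpers `every_nth_mask` and `offset_in_range` are hypothetical names used for illustration; with Polars, the row count for such a pre-check would come from `df.height`.

```python
def every_nth_mask(length: int, n: int, offset: int = 0) -> list[bool]:
    """Model of the mask: True where index >= offset and (index - offset) % n == 0."""
    return [i >= offset and (i - offset) % n == 0 for i in range(length)]


def offset_in_range(length: int, offset: int) -> bool:
    """Hypothetical pre-check to run before building the expression."""
    return 0 <= offset < length


x = [1, 2, 3, 4, 5]

# An offset past the last row never matches, so every entry is False.
too_far = every_nth_mask(len(x), n=2, offset=10)
print(too_far)

# Checking the offset up front catches the problem before execution.
print(offset_in_range(len(x), 10))
print(offset_in_range(len(x), 1))
```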
n : int
The interval to use for row selection. Should be positive.
offset : int = 0
Start the index at this offset. Cannot be negative.
name : str = 'bool_nth_row'
The name of the resulting column.
pl.Expr
A boolean Polars expression.
Mark every second row:
x | bool_nth_row |
---|---|
i64 | bool |
1 | true |
2 | false |
3 | true |
4 | false |
5 | true |
To invert the result, use either the ~ operator or pl.Expr.not_():
x | ~2 | not_2 |
---|---|---|
i64 | bool | bool |
1 | false | false |
2 | true | true |
3 | false | false |
4 | true | true |
5 | false | false |
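The inversion can be modelled in plain Python by negating each mask entry. The Polars calls in the comments are an assumption about how the two columns above might have been produced, based on their names; they are not taken from the original example code.

```python
x = [1, 2, 3, 4, 5]
n = 2

# Model of the mask for every second row (0-based index modulo n equals 0).
mask = [i % n == 0 for i in range(len(x))]

# Element-wise negation, the plain-Python counterpart of ~expr or expr.not_().
# Assumed library form (not verified here):
#   (~ti.is_every_nth_row(2)).alias("~2")
#   ti.is_every_nth_row(2).not_().alias("not_2")
inverted = [not m for m in mask]

print(inverted)
```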
Use offset= to shift the starting index:
x | bool_nth_row |
---|---|
i64 | bool |
1 | false |
2 | true |
3 | false |
4 | false |
5 | true |
For reference, here’s the output using pl.Expr.gather_every():
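A plain-Python sketch of the comparison, assuming n=3 and offset=1. These values are inferred from the offset table above, where the second and fifth rows are marked, and are not stated in the text. Slicing with a start and a step plays the role of gather_every(), and filtering by the mask selects the same values.

```python
x = [1, 2, 3, 4, 5]
n, offset = 3, 1  # assumed values, inferred from the table above

# Model of the mask: True at the offset and at every n-th index after it.
mask = [i >= offset and (i - offset) % n == 0 for i in range(len(x))]

# filter() analogue: keep the values where the mask is True.
kept = [value for value, keep in zip(x, mask) if keep]

# gather_every(n, offset) analogue: start at `offset`, step by `n`.
gathered = x[offset::n]

print(mask)
print(kept, gathered)
```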
You can also combine multiple is_every_nth_row() expressions to construct more complex row selections. For example, to select rows that are part of every second or every third row:
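As a plain-Python model of this combination, the two masks can be joined with a logical OR. With Polars expressions the same idea would use the `|` operator between the two expressions, which is assumed here rather than copied from the original example.

```python
x = [1, 2, 3, 4, 5]

# Model masks for every second and every third row (0-based indices).
every_second = [i % 2 == 0 for i in range(len(x))]
every_third = [i % 3 == 0 for i in range(len(x))]

# A row is selected if it belongs to either pattern.
combined = [a or b for a, b in zip(every_second, every_third)]

# filter() analogue: keep the values where the combined mask is True.
selected = [value for value, keep in zip(x, combined) if keep]

print(combined)
print(selected)
```

Only the second row (index 1) is excluded, since 1 is a multiple of neither 2 nor 3.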
In the list namespace, it may be easier to think of each row as an element in a list. Conceptually, you’re working with a pl.Series, where each row corresponds to one item in the list.
Mark every second element:
x | y |
---|---|
list[bool] | list[bool] |
[true, false, true, false] | [true, false, true, false] |
[true, false, true, false] | [true, false, true, false] |
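The per-list behaviour can be modelled in plain Python by building one mask for each list cell. The input values below are hypothetical, chosen only to match the shape of the table above (two rows, four elements per list); the original input data is not shown.

```python
# Hypothetical input: two list columns, two rows, four elements per list.
data = {
    "x": [[1, 2, 3, 4], [5, 6, 7, 8]],
    "y": [[9, 10, 11, 12], [13, 14, 15, 16]],
}
n = 2

# The index restarts at 0 inside every list, so each cell gets its own mask.
marked = {
    name: [[i % n == 0 for i in range(len(cell))] for cell in column]
    for name, column in data.items()
}

for name, column in marked.items():
    print(name, column)
```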