```python
import polars as pl
import turtle_island as ti

df = pl.DataFrame(
    {
        "col1": [1, 2, 3, 4, 5],
        "col2": [6, 7, 8, 9, 10],
        "col3": [11, 12, 13, 14, 15],
    }
)
```
🐢 Turtle Island
Turtle Island is a lightweight utility library that provides helper functions to reduce boilerplate when writing Polars expressions. It aims to simplify common expression patterns and improve developer productivity when working with the Polars API.
🚀 Installation
Turtle Island is not yet published on PyPI. The recommended way to install it is with `uv add`:
```bash
uv add git+https://github.com/jrycw/turtle-island.git
```
📦 Recommended Import
To keep your code clean and idiomatic, it’s recommended to import Turtle Island as a top-level module:
```python
import turtle_island as ti
```
⚙️ Core Spirit
The core spirit of Turtle Island is to embrace expressions over columns.
When wrangling data, it’s common to create temporary helper columns as part of the transformation process. However, many of these columns are just intermediate artifacts — not part of the final output we actually want. They exist solely to assist with intermediate steps.
Polars offers a powerful distinction between contexts and expressions, allowing us to focus on expression-based transformations without needing to materialize every intermediate result as a column. Turtle Island builds on this principle, encouraging users to rely more on expressions — flexible, composable, and context-aware — rather than temporary columns.
Let’s walk through an example to clarify this approach.
Problem: Column Manipulation Based on Row Index
Say we have a DataFrame `df`, and we want to transform the values in `col1` and `col2` such that:
- In odd-numbered rows (1st, 3rd, …, i.e. zero-based row index 0, 2, …), the values remain unchanged.
- In even-numbered rows (2nd, 4th, …, i.e. zero-based row index 1, 3, …), the values are taken from `col3`.
Conventional Approach (Column-Oriented)
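```python
(
    df.with_row_index()  # adds a u32 "index" column
    .with_columns(
        pl.when(pl.col("index").mod(2).eq(0))  # 1st, 3rd, 5th rows
        .then(pl.col("col1", "col2"))  # keep the original values
        .otherwise(pl.col("col3"))  # otherwise take the values from col3
    )
)
```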
| index (u32) | col1 (i64) | col2 (i64) | col3 (i64) |
|---|---|---|---|
| 0 | 1 | 6 | 11 |
| 1 | 12 | 12 | 12 |
| 2 | 3 | 8 | 13 |
| 3 | 14 | 14 | 14 |
| 4 | 5 | 10 | 15 |
In conventional Polars, achieving this pattern usually takes two steps:
- Add an `index` column using `pl.DataFrame.with_row_index()` to track row positions.
- Use a `when-then-otherwise` expression inside `with_columns()` to check whether `(index % 2) == 0`.
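Because the `index` column must be created first, the conditional logic can only be defined after that step, so intermediate columns are materialized one step at a time. As the output above shows, the helper `index` column also remains in the result unless it is dropped explicitly.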
Expression-Oriented Approach (Turtle Island Style)
With Turtle Island, you can express the same logic in a single `with_columns()` context, thanks to expression-based helpers:
```python
df.with_columns(
    ti.case_when(
        case_list=[(ti.is_every_nth_row(2), pl.col("col1", "col2"))],
        otherwise="col3",
    )
)
```
| col1 (i64) | col2 (i64) | col3 (i64) |
|---|---|---|
| 1 | 6 | 11 |
| 12 | 12 | 12 |
| 3 | 8 | 13 |
| 14 | 14 | 14 |
| 5 | 10 | 15 |
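`ti.is_every_nth_row()` returns a Polars expression rather than a materialized column. Think of it as a virtual column: you can use it directly in conditional logic without creating intermediate columns. With an argument of 2, it marks the 1st, 3rd, and 5th rows (index 0, 2, 4). This is why those rows keep their original values in the output above.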
Meanwhile, `ti.case_when()` offers a cleaner, more ergonomic way to write complex conditions. It takes a `case_list` of `(condition, value)` pairs plus an `otherwise` fallback. It’s optional, but especially useful when handling multiple branches. In practice, I’ve found it far easier to read and maintain than long chains of `when-then` statements.
🔧 Add-ons
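Some Turtle Island functions also work inside Polars’ list namespace. Because they take and return expressions, they can be passed to `.list.eval()` with `pl.element()` as input. For instance, you can cycle the elements of each list downward by one position, wrapping the last element to the front, using `ti.cycle()` within `.list.eval()`: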
```python
df2 = pl.DataFrame(
    {
        "x": [[1, 2, 3], [4, 5, 6]],
        "y": [[7, 8, 9], [10, 11, 12]],
    }
)
df2.with_columns(pl.all().list.eval(ti.cycle(pl.element(), 1)))
```
| x (list[i64]) | y (list[i64]) |
|---|---|
| [3, 1, 2] | [9, 7, 8] |
| [6, 4, 5] | [12, 10, 11] |