🐢 Turtle Island

Turtle Island is a lightweight utility library that provides helper functions to reduce boilerplate when writing Polars expressions. It aims to simplify common expression patterns and improve developer productivity when working with the Polars API.

Disclaimer

Turtle Island is in early development. The API is still evolving and may change without notice. Use with caution in production environments.

🚀 Installation

Turtle Island is not yet published on PyPI. The recommended way to install it is with uv add, pointing directly at the GitHub repository:

uv add git+https://github.com/jrycw/turtle-island.git

📦 Recommended Import

To keep your code clean and idiomatic, it’s recommended to import Turtle Island as a top-level module under the alias ti:

import turtle_island as ti

⚙️ Core Spirit

The core spirit of Turtle Island is to embrace expressions over columns.

When wrangling data, it’s common to create temporary helper columns as part of the transformation process. However, many of these columns are just intermediate artifacts, not part of the final output we actually want; they exist solely to assist with intermediate steps.

Polars draws a clear distinction between contexts (such as select(), with_columns(), and filter()) and expressions (such as pl.col(...) or pl.when(...)), allowing us to focus on expression-based transformations without needing to materialize every intermediate result as a column. Turtle Island builds on this principle, encouraging users to rely on expressions, which are flexible, composable, and context-aware, rather than on temporary columns.

Let’s walk through an example to clarify this approach.

Problem: Column Manipulation Based on Row Index

import polars as pl
import turtle_island as ti


df = pl.DataFrame(
    {
        "col1": [1, 2, 3, 4, 5],
        "col2": [6, 7, 8, 9, 10],
        "col3": [11, 12, 13, 14, 15],
    }
)

Say we have a DataFrame df, and we want to transform the values in col1 and col2 based on each row’s position:

If the row is at an odd position (1st, 3rd, …, i.e. 0-based index 0, 2, …), the values remain unchanged.
If the row is at an even position (2nd, 4th, …, i.e. 0-based index 1, 3, …), the values should be taken from col3.

Conventional Approach (Column-Oriented)

(
    df.with_row_index().with_columns(
        pl.when(pl.col("index").mod(2).eq(0))
        .then(pl.col("col1", "col2"))
        .otherwise("col3"),
    )
)

shape: (5, 4)

index	col1	col2	col3
u32	i64	i64	i64
0	1	6	11
1	12	12	12
2	3	8	13
3	14	14	14
4	5	10	15

In conventional Polars, achieving this pattern usually involves two steps:

Add an index column using pl.DataFrame.with_row_index() to track row positions.
Use a when-then-otherwise expression inside with_columns() to check whether (index % 2) == 0.

Because the index column must be created first, the conditional logic can only be defined after that step, leading to a step-by-step materialization of intermediate columns. The helper index column also stays in the result (hence the shape (5, 4)) unless it is removed explicitly, for example with .drop("index").

Expression-Oriented Approach (Turtle Island Style)

With Turtle Island, you can express the same logic in a single with_columns() context, thanks to expression-based helpers:

df.with_columns(
    ti.case_when(
        case_list=[(ti.is_every_nth_row(2), pl.col("col1", "col2"))],
        otherwise="col3",
    )
)

shape: (5, 3)

col1	col2	col3
i64	i64	i64
1	6	11
12	12	12
3	8	13
14	14	14
5	10	15

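ti.is_every_nth_row() returns a Polars expression rather than a materialized column. Here, ti.is_every_nth_row(2) evaluates to True for the rows at 0-based indices 0, 2, and 4, which is why those rows keep their original col1 and col2 values. Think of it as a virtual column: you can use it directly in conditional logic without creating intermediate columns.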

Meanwhile, ti.case_when() offers a cleaner, more ergonomic way to write conditional logic: it takes a case_list of (condition, value) pairs plus an otherwise fallback. It’s optional, but especially useful when handling multiple branches. In practice, I’ve found it far easier to read and maintain than long chains of when/then calls.

🔧 Add-ons

Some Turtle Island functions also integrate with Polars’ list namespace. For instance, ti.cycle() can be used inside .list.eval() to rotate the elements of each list by one position, so that the last element wraps around to the front:

df2 = pl.DataFrame(
    {
        "x": [[1, 2, 3], [4, 5, 6]],
        "y": [[7, 8, 9], [10, 11, 12]],
    }
)
df2.with_columns(pl.all().list.eval(ti.cycle(pl.element(), 1)))

shape: (2, 2)

x	y
list[i64]	list[i64]
[3, 1, 2]	[9, 7, 8]
[6, 4, 5]	[12, 10, 11]