🐢 Turtle Island

Turtle Island is a lightweight utility library that provides helper functions to reduce boilerplate when writing Polars expressions. It aims to simplify common expression patterns and improve developer productivity when working with the Polars API.

Warning: Disclaimer

Turtle Island is in early development. The API is still evolving and may change without notice. Use with caution in production environments.

🚀 Installation

Turtle Island is not yet published on PyPI. The recommended way to install it is using uv add:

uv add git+https://github.com/jrycw/turtle-island.git

⚙️ Core Spirit

The core spirit of Turtle Island is to embrace expressions over columns.

When wrangling data, it’s common to create temporary helper columns as part of the transformation process. Many of these columns are intermediate artifacts: they exist only to support a later step and are not part of the final output we actually want.

Polars offers a powerful distinction between contexts and expressions, allowing us to focus on expression-based transformations without needing to materialize every intermediate result as a column. Turtle Island builds on this principle, encouraging users to rely more on expressions — flexible, composable, and context-aware — rather than temporary columns.

Let’s walk through an example to clarify this approach.

Problem: Column Manipulation Based on Row Index

import polars as pl
import turtle_island as ti


df = pl.DataFrame(
    {
        "col1": [1, 2, 3, 4, 5],
        "col2": [6, 7, 8, 9, 10],
        "col3": [11, 12, 13, 14, 15],
    }
)

Say we have a DataFrame df, and we want to transform the values in col1 and col2 such that:

  • In the 1st, 3rd, 5th, … rows (0-based row index 0, 2, 4, …), the values remain unchanged.
  • In the 2nd, 4th, … rows (0-based row index 1, 3, …), the values should be taken from col3.

Conventional Approach (Column-Oriented)

(
    df.with_row_index().with_columns(
        pl.when(pl.col("index").mod(2).eq(0))
        .then(pl.col("col1", "col2"))
        .otherwise("col3"),
    )
)
shape: (5, 4)
index  col1  col2  col3
u32    i64   i64   i64
0      1     6     11
1      12    12    12
2      3     8     13
3      14    14    14
4      5     10    15

In conventional Polars, achieving this pattern usually involves several steps:

  1. Add an index column using pl.DataFrame.with_row_index() to track row positions.
  2. Use a when-then-otherwise expression inside with_columns() to check whether (index % 2) == 0.

Because the index column must exist before the condition can reference it, the intermediate column has to be materialized first. It also stays in the result (note the extra index column in the output above) unless it is dropped afterwards.

Expression-Oriented Approach (Turtle Island Style)

With Turtle Island, you can express the same logic in a single with_columns() context, thanks to expression-based helpers:

df.with_columns(
    ti.case_when(
        case_list=[(ti.is_every_nth_row(2), pl.col("col1", "col2"))],
        otherwise="col3",
    )
)
shape: (5, 3)
col1  col2  col3
i64   i64   i64
1     6     11
12    12    12
3     8     13
14    14    14
5     10    15

ti.is_every_nth_row() returns a Polars expression rather than a materialized column. Think of it as a virtual column—you can use it directly in conditional logic without creating intermediate columns.

Meanwhile, ti.case_when() offers a cleaner, more ergonomic way to write complex conditions. It’s optional, but especially useful when handling multiple branches. In practice, I’ve found it far easier to read and maintain than long chains of when-then statements.

🔧 Add-ons

Some Turtle Island functions also integrate with Polars’ list namespace, offering a seamless experience. For instance, you can cycle elements downward by one position using ti.cycle() within .list.eval():

df2 = pl.DataFrame(
    {
        "x": [[1, 2, 3], [4, 5, 6]],
        "y": [[7, 8, 9], [10, 11, 12]],
    }
)
df2.with_columns(pl.all().list.eval(ti.cycle(pl.element(), 1)))
shape: (2, 2)
x          y
list[i64]  list[i64]
[3, 1, 2]  [9, 7, 8]
[6, 4, 5]  [12, 10, 11]