df.filter()

df.filter(..) selects the rows for which the given condition, a boolean expression, evaluates to true.

Setup

import numpy as np
import pandas as pd
import polars as pl

np.random.seed(42)
data = {"nrs": [1, 2, 3, 4, 5], "random": np.random.rand(5)}

df_pl = pl.DataFrame(data)
print(df_pl)

shape: (5, 2)
┌─────┬──────────┐
│ nrs ┆ random   │
│ --- ┆ ---      │
│ i64 ┆ f64      │
╞═════╪══════════╡
│ 1   ┆ 0.37454  │
│ 2   ┆ 0.950714 │
│ 3   ┆ 0.731994 │
│ 4   ┆ 0.598658 │
│ 5   ┆ 0.156019 │
└─────┴──────────┘

df_pd = pd.DataFrame(data)
print(df_pd)

   nrs    random
0    1  0.374540
1    2  0.950714
2    3  0.731994
3    4  0.598658
4    5  0.156019

Example

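df.filter(..) in Polars plays the same role as df.query(..) in pandas: both keep only the rows that satisfy a boolean condition. Polars takes the condition as an expression, while pandas takes it as a string.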

out_pl = df_pl.filter((pl.col("nrs") > 2) & (pl.col("random") > 0.5))
print(out_pl)

shape: (2, 2)
┌─────┬──────────┐
│ nrs ┆ random   │
│ --- ┆ ---      │
│ i64 ┆ f64      │
╞═════╪══════════╡
│ 3   ┆ 0.731994 │
│ 4   ┆ 0.598658 │
└─────┴──────────┘

out_pd = df_pd.query("nrs > 2 & random > 0.5")
print(out_pd)

   nrs    random
2    3  0.731994
3    4  0.598658

Reference

The examples in this section have been adapted from the Polars user guide.