df.with_columns()

df.with_columns([..]) creates new columns from expressions, which Polars can evaluate in parallel. Unlike df.select([..]), which returns only the columns produced by its expressions, it keeps every column of the original dataframe and appends the new ones.

Setup

import numpy as np
import pandas as pd
import polars as pl

np.random.seed(42)
data = {"nrs": [1, 2, 3, 4, 5], "random": np.random.rand(5)}

df_pl = pl.DataFrame(data)
print(df_pl)

shape: (5, 2)
┌─────┬──────────┐
│ nrs ┆ random   │
│ --- ┆ ---      │
│ i64 ┆ f64      │
╞═════╪══════════╡
│ 1   ┆ 0.37454  │
│ 2   ┆ 0.950714 │
│ 3   ┆ 0.731994 │
│ 4   ┆ 0.598658 │
│ 5   ┆ 0.156019 │
└─────┴──────────┘

df_pd = pd.DataFrame(data)
print(df_pd)

   nrs    random
0    1  0.374540
1    2  0.950714
2    3  0.731994
3    4  0.598658
4    5  0.156019

Example

df.with_columns([..]) plays roughly the same role as df.assign(..) in pandas: both return a new dataframe with the extra columns added.

out_pl = df_pl.with_columns(
    pl.sum("nrs").alias("nrs_sum"), pl.col("random").count().alias("count")
)
print(out_pl)

shape: (5, 4)
┌─────┬──────────┬─────────┬───────┐
│ nrs ┆ random   ┆ nrs_sum ┆ count │
│ --- ┆ ---      ┆ ---     ┆ ---   │
│ i64 ┆ f64      ┆ i64     ┆ u32   │
╞═════╪══════════╪═════════╪═══════╡
│ 1   ┆ 0.37454  ┆ 15      ┆ 5     │
│ 2   ┆ 0.950714 ┆ 15      ┆ 5     │
│ 3   ┆ 0.731994 ┆ 15      ┆ 5     │
│ 4   ┆ 0.598658 ┆ 15      ┆ 5     │
│ 5   ┆ 0.156019 ┆ 15      ┆ 5     │
└─────┴──────────┴─────────┴───────┘

out_pd = df_pd.assign(
    nrs_sum=lambda df_: df_.nrs.sum(), count=lambda df_: df_.random.count()
)
print(out_pd)

   nrs    random  nrs_sum  count
0    1  0.374540       15      5
1    2  0.950714       15      5
2    3  0.731994       15      5
3    4  0.598658       15      5
4    5  0.156019       15      5

Reference

The examples in this section have been adapted from the Polars user guide.