Strings
Setup
Check for existence of a pattern
out_pl = df_pl.select(
pl.col("animal"),
pl.col("animal").str.contains("cat|bit").alias("regex"),
pl.col("animal").str.contains("rab$", literal=True).alias("literal"),
pl.col("animal").str.starts_with("rab").alias("starts_with"),
pl.col("animal").str.ends_with("dog").alias("ends_with"),
)
print(out_pl)
shape: (4, 5)
┌─────────────┬───────┬─────────┬─────────────┬───────────┐
│ animal ┆ regex ┆ literal ┆ starts_with ┆ ends_with │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ bool ┆ bool ┆ bool ┆ bool │
╞═════════════╪═══════╪═════════╪═════════════╪═══════════╡
│ Crab ┆ false ┆ false ┆ false ┆ false │
│ cat and dog ┆ true ┆ false ┆ false ┆ true │
│ rab$bit ┆ true ┆ true ┆ true ┆ false │
│ null ┆ null ┆ null ┆ null ┆ null │
└─────────────┴───────┴─────────┴─────────────┴───────────┘
out_pd = df_pd.assign(
# assign keeps existing columns, so this line only mirrors the Polars select above
animal=lambda df_: df_.animal,
regex=lambda df_: df_.animal.str.contains("cat|bit"),
literal=lambda df_: df_.animal.str.contains("rab$", regex=False),
starts_with=lambda df_: df_.animal.str.startswith("rab"),
ends_with=lambda df_: df_.animal.str.endswith("dog"),
)
print(out_pd)
There's a slight difference in syntax between Polars and Pandas when it comes to methods for checking whether each string element starts or ends with a given pattern:
- In Polars, you use pl.col(..).str.starts_with(..) and pl.col(..).str.ends_with(..).
- In Pandas, the equivalent methods are pd.Series.str.startswith(..) and pd.Series.str.endswith(..).
Extract a pattern
data2 = {
"a": [
"http://vote.com/ballon_dor?candidate=messi&ref=polars",
"http://vote.com/ballon_dor?candidat=jorginho&ref=polars",
"http://vote.com/ballon_dor?candidate=ronaldo&ref=polars",
]
}
Extract all occurrences of a pattern
Replace a pattern
Reference
The examples in this section have been adapted from the Polars user guide.