Mutate
Basic examples
To create new variables based transformation of existing ones:
import tidypolars4sci as tp
from tidypolars4sci.data import starwars
df = (starwars
.head(5) # <= to select onlye the fist 5 for the example
.select('name', 'mass')
# create two new variables:
.mutate(mass2 = tp.col('mass') * 2,
mass2_squared = tp.col('mass2') * tp.col('mass2'),
)
)
df.print()
shape: (5, 4)
┌──────────────────────────────────────────────────┐
│ name mass mass2 mass2_squared │
│ str f64 f64 f64 │
╞══════════════════════════════════════════════════╡
│ Luke Skywalker 77.00 154.00 23,716.00 │
│ C-3PO 75.00 150.00 22,500.00 │
│ R2-D2 32.00 64.00 4,096.00 │
│ Darth Vader 136.00 272.00 73,984.00 │
│ Leia Organa 49.00 98.00 9,604.00 │
└──────────────────────────────────────────────────┘
Change type of many variables at once
We can change the types of many variables that match a name pattern
# select some rows and varibles
df = (starwars
.head(5)
.select("name", "homeworld", "species")
)
df.print()
shape: (5, 3)
┌──────────────────────────────────────┐
│ name homeworld species │
│ str str str │
╞══════════════════════════════════════╡
│ Luke Skywalker Tatooine Human │
│ C-3PO Tatooine Droid │
│ R2-D2 Naboo Droid │
│ Darth Vader Tatooine Human │
│ Leia Organa Alderaan Human │
└──────────────────────────────────────┘
# change to factor (i.e., category) those whose name matches hom|sp
df = df.mutate(tp.across(tp.matches("hom|sp"), tp.as_factor))
df.print()
shape: (5, 3)
┌──────────────────────────────────────┐
│ name homeworld species │
│ str cat cat │
╞══════════════════════════════════════╡
│ Luke Skywalker Tatooine Human │
│ C-3PO Tatooine Droid │
│ R2-D2 Naboo Droid │
│ Darth Vader Tatooine Human │
│ Leia Organa Alderaan Human │
└──────────────────────────────────────┘
Using dynamic variable names
We can use dynamic names to create the new variable:
new_var = "mass2_squared"
df = (starwars
.head(5) # <= to select onlye the fist 5 for the example
.select('name', 'mass')
# create a new variable using dynamic name:
.mutate(**{new_var : tp.col('mass') **2 })
)
df.print()
shape: (5, 3)
┌─────────────────────────────────────────┐
│ name mass mass2_squared │
│ str f64 f64 │
╞═════════════════════════════════════════╡
│ Luke Skywalker 77.00 5,929.00 │
│ C-3PO 75.00 5,625.00 │
│ R2-D2 32.00 1,024.00 │
│ Darth Vader 136.00 18,496.00 │
│ Leia Organa 49.00 2,401.00 │
└─────────────────────────────────────────┘