PySpark Column | getItem method
PySpark Column's getItem(~) method extracts a value from the lists or dictionaries in a PySpark Column.
Parameters
1. key | any
The key value depends on the column type:
for lists, key should be an integer index indicating the position of the value that you wish to extract.
for dictionaries, key should be the key of the value that you wish to extract.
Return Value
A new PySpark Column.
Examples
Consider the following PySpark DataFrame:
+------+
|  vals|
+------+
|[5, 6]|
|[7, 8]|
+------+
Extracting n-th item in lists
To extract the second value from each list in the vals column, pass the index 1 to getItem(~), since list positions start at 0.
Note that we could also use [~] syntax instead of getItem(~).
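Specifying an index position that is out of bounds for the list will return a null value. This holds when ANSI mode (spark.sql.ansi.enabled) is off; with ANSI mode on, Spark may raise an error instead.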
Extracting values using keys in dictionaries
Consider the following PySpark DataFrame:
+----------------+
|            vals|
+----------------+
|        {A -> 4}|
|{A -> 5, B -> 6}|
+----------------+
To extract the value where the key is 'A', pass 'A' to getItem(~).
Note that referring to keys that do not exist will return null.