PySpark SQL Functions | count method
PySpark SQL Functions' count(~) is an aggregate method used in conjunction with the agg(~) method to compute the number of items in each group.
Parameters
1. col | string or Column
The column to perform the count on.
Return Value
A new PySpark Column.
Examples
Consider the following PySpark DataFrame:
+-----+-----+
| name|class|
+-----+-----+
| Alex|    A|
|  Bob|    B|
|Cathy|    A|
+-----+-----+
Counting the number of items in each group
To count the number of rows for each class group:
Here, note the following:
We first group by the class column using groupBy(~), and then, for each group, count how many rows there are. Technically speaking, F.count('class') counts the number of non-null class values in each group, which here is equivalent to counting the number of rows in each group because the class column has no null values.
We assign a label to the resulting aggregate column using the alias(~) method. Note that the default label assigned is 'count(class)'.