PySpark SQL Functions | mean method
Last updated: Aug 12, 2023
Tags: PySpark
PySpark SQL Functions' mean(~) method returns the mean (average) of the values in the specified column.
Parameters
1. col | string or Column
The column for which to compute the mean.
Return Value
A PySpark Column (pyspark.sql.column.Column).
Examples
Consider the following PySpark DataFrame:
+----+---+
|name|age|
+----+---+
|Alex| 25|
| Bob| 30|
+----+---+
Getting the mean of a PySpark column
To obtain the mean age:
To get the mean age as a plain Python number (a float here) instead of a DataFrame:
import pyspark.sql.functions as F
list_rows = df.select(F.mean("age")).collect()
list_rows[0][0]
27.5
Here, we convert the PySpark DataFrame returned by select(~) into a list of Row objects using the collect() method. This list is guaranteed to contain exactly one Row because mean(~) reduces the column values to a single number. list_rows[0] gives us that Row, and a second [0] accesses its first (and only) field.
Getting the mean of each group in PySpark
Consider the following PySpark DataFrame:
df = spark.createDataFrame([["Alex", 20, "A"],
                            ["Bob", 30, "B"],
                            ["Cathy", 50, "A"]], ["name", "age", "class"])
+-----+---+-----+
| name|age|class|
+-----+---+-----+
| Alex| 20|    A|
|  Bob| 30|    B|
|Cathy| 50|    A|
+-----+---+-----+
To get the mean age of each class:
Here, we use alias("MEAN AGE") to assign a label to the aggregated age column.
Published by Isshin Inada
Official PySpark Documentation
https://spark.apache.org/docs/latest/api/python/reference/api/pyspark.sql.functions.mean.html