PySpark RDD | glom method
Last updated: Aug 12, 2023
Tags: PySpark
PySpark RDD's glom() method returns an RDD in which the elements of each partition are grouped into a single list, so the result has one list per partition.
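To make the "one list per partition" behaviour concrete, here is a minimal plain-Python sketch that needs no Spark installation. The name glom_model and the three-partition layout are hypothetical and chosen only for illustration; this is a model of the behaviour, not Spark's implementation.

```python
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def glom_model(partitions: Iterable[Iterable[T]]) -> Iterator[List[T]]:
    """Yield one list per partition (a stand-in for RDD.glom(), not Spark code)."""
    for partition in partitions:
        # Materialise every element of this partition into a single list.
        yield list(partition)


# Hypothetical layout: four elements spread over three partitions.
layout = [["A"], ["B"], ["C", "A"]]

# Flattening the partitions loses the boundaries, like a plain collect().
flattened = [item for part in layout for item in part]

# Grouping keeps the boundaries, like glom().collect().
grouped = list(glom_model(iter(part) for part in layout))

print(flattened)  # ['A', 'B', 'C', 'A']
print(grouped)    # [['A'], ['B'], ['C', 'A']]
```

The flattened view shows only the elements, while the grouped view also shows where one partition ends and the next begins.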
Parameters
This method does not take in any parameters.
Return Value
A PySpark RDD (pyspark.rdd.PipelinedRDD).
Examples
Consider an RDD split across three partitions. Collecting it returns the following elements:
['A', 'B', 'C', 'A']
Getting the values of each partition in PySpark RDD
To see the content of each partition, call glom() and collect the result:
[['A'], ['B'], ['C', 'A']]
Here:
Partition 1 holds 'A'
Partition 2 holds 'B'
Partition 3 holds 'C' and 'A'
Published by Isshin Inada
Citation
Official PySpark Documentation
https://spark.apache.org/docs/latest/api/python/reference/api/pyspark.RDD.glom.html