PySpark RDD | keys method
Last updated: Aug 12, 2023
Tags: PySpark
PySpark RDD's keys(~) method returns a new RDD holding the keys of a pair RDD, that is, the first element of each tuple of length two.
Parameters
This method does not take in any parameters.
Return Value
A PySpark RDD (pyspark.rdd.PipelinedRDD).
Examples
Consider the following PySpark pair RDD:
# Create an RDD using the parallelize method
[('a', 3), ('a', 2), ('b', 5), ('c', 1)]
Getting the keys of a pair RDD in PySpark
To get the keys of the pair RDD as a list of strings, call keys() followed by collect():
['a', 'a', 'b', 'c']
Note that keys() does not check whether the RDD is a pair RDD; it returns the first element of each item.
Published by Isshin Inada
Official PySpark Documentation
https://spark.apache.org/docs/latest/api/python/reference/api/pyspark.RDD.keys.html