pyspark.RDD.keyBy#
- RDD.keyBy(f)[source]#

  Creates tuples of the elements in this RDD by applying f.

  New in version 0.9.1.

  Parameters
  - f : function
    a function to compute the key

  Returns
  - RDD
    an RDD of (key, element) tuples, where each key is f(element)
  Examples

  >>> rdd1 = sc.parallelize(range(0, 3)).keyBy(lambda x: x * x)
  >>> rdd2 = sc.parallelize(zip(range(0, 5), range(0, 5)))
  >>> [(x, list(map(list, y))) for x, y in sorted(rdd1.cogroup(rdd2).collect())]
  [(0, [[0], [0]]), (1, [[1], [1]]), (2, [[], [2]]), (3, [[], [3]]), (4, [[2], [4]])]
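The behavior of keyBy can be understood without a Spark cluster: it maps each element x to the pair (f(x), x). A plain-Python sketch of that semantics (the helper name key_by is ours, not part of the PySpark API):

```python
# Sketch of keyBy's per-element semantics on a plain list:
# each element x becomes the pair (f(x), x).
def key_by(elements, f):
    return [(f(x), x) for x in elements]

pairs = key_by(range(3), lambda x: x * x)
print(pairs)  # [(0, 0), (1, 1), (4, 2)]
```

Note that the key comes first in each tuple, which is why the result can feed directly into pair-RDD operations such as cogroup in the example above.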