pyspark.RDD.take

RDD.take(num: int) → List[T]
Take the first num elements of the RDD.

It works by first scanning one partition, and then using the results from that partition to estimate the number of additional partitions needed to satisfy the limit.

Translated from the Scala implementation in RDD#take().

New in version 0.7.0.

Parameters
num : int
    the number of elements to take
 
Returns
list
    the first num elements of the RDD
 
Notes

This method should only be used if the resulting array is expected to be small, as all the data is loaded into the driver’s memory.

Examples

>>> sc.parallelize([2, 3, 4, 5, 6]).cache().take(2)
[2, 3]
>>> sc.parallelize([2, 3, 4, 5, 6]).take(10)
[2, 3, 4, 5, 6]
>>> sc.parallelize(range(100), 100).filter(lambda x: x > 90).take(3)
[91, 92, 93]
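The scan-and-estimate strategy described above can be sketched in plain Python, without Spark. Here `partitions` is simply a list of lists standing in for RDD partitions, and the growth heuristics (4x growth on an empty first scan, a 1.5x safety factor on the estimate) are illustrative assumptions rather than the exact constants Spark uses:

```python
def take_sketch(partitions, num):
    """Sketch of take(): scan one partition first, then use the observed
    yield to estimate how many more partitions are needed."""
    results = []
    num_partitions = len(partitions)
    scanned = 0
    parts_to_scan = 1  # start by scanning a single partition
    while len(results) < num and scanned < num_partitions:
        for part in partitions[scanned:scanned + parts_to_scan]:
            # take only as many elements as are still missing
            results.extend(part[:num - len(results)])
        scanned += parts_to_scan
        if len(results) < num:
            if not results:
                # the scanned partitions were empty: grow the scan aggressively
                parts_to_scan = scanned * 4
            else:
                # estimate remaining partitions from the rate observed so far,
                # with a safety factor so we rarely need a third pass
                parts_to_scan = max(
                    1, int(1.5 * num * scanned / len(results)) - scanned
                )
    return results[:num]

parts = [[2, 3], [4, 5], [6]]
print(take_sketch(parts, 2))   # -> [2, 3]
print(take_sketch(parts, 10))  # -> [2, 3, 4, 5, 6]
```

This mirrors why `take` is cheap when the first partition already satisfies the limit, and why the driver collects all returned elements in memory, hence the warning in the Notes above.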