Commit
[SPARK-5878] fix DataFrame.repartition() in Python
Also add tests for distinct()

Author: Davies Liu <[email protected]>

Closes apache#4667 from davies/repartition and squashes the following commits:

79059fd [Davies Liu] add test
cb4915e [Davies Liu] fix repartition
Davies Liu authored and rxin committed Feb 18, 2015
1 parent de0dd6d commit c1b6fa9
Showing 1 changed file with 7 additions and 1 deletion.
8 changes: 7 additions & 1 deletion python/pyspark/sql/dataframe.py
@@ -434,12 +434,18 @@ def unpersist(self, blocking=True):
     def repartition(self, numPartitions):
         """ Return a new :class:`DataFrame` that has exactly `numPartitions`
         partitions.
+
+        >>> df.repartition(10).rdd.getNumPartitions()
+        10
         """
-        return DataFrame(self._jdf.repartition(numPartitions, None), self.sql_ctx)
+        return DataFrame(self._jdf.repartition(numPartitions), self.sql_ctx)
 
     def distinct(self):
         """
         Return a new :class:`DataFrame` containing the distinct rows in this DataFrame.
+
+        >>> df.distinct().count()
+        2L
         """
         return DataFrame(self._jdf.distinct(), self.sql_ctx)
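
The change above amounts to dropping a stray `None` from the call that the Python wrapper forwards to the underlying `_jdf` object. The sketch below is a pure-Python illustration of that call shape, not PySpark code: `FakeJavaDataFrame`, `Wrapper`, and `repartition_old` are hypothetical stand-ins, and it assumes the wrapped `repartition` accepts only the partition count. In real PySpark a mismatched call would presumably surface through Py4J rather than as a Python `TypeError`.

```python
# Hypothetical stand-ins for illustration; none of these classes exist in PySpark.
class FakeJavaDataFrame:
    """Mimics a wrapped object whose repartition() takes exactly one argument."""

    def __init__(self, rows, num_partitions=1):
        self.rows = list(rows)
        self.num_partitions = num_partitions

    def repartition(self, numPartitions):
        return FakeJavaDataFrame(self.rows, numPartitions)

    def distinct(self):
        # dict.fromkeys drops duplicates while keeping first-seen order.
        return FakeJavaDataFrame(dict.fromkeys(self.rows), self.num_partitions)

    def count(self):
        return len(self.rows)


class Wrapper:
    """Thin Python-side wrapper that forwards calls to self._jdf."""

    def __init__(self, jdf):
        self._jdf = jdf

    def repartition_old(self, numPartitions):
        # Call shape before the commit: an extra None is forwarded.
        return Wrapper(self._jdf.repartition(numPartitions, None))

    def repartition(self, numPartitions):
        # Call shape after the commit: only the partition count is forwarded.
        return Wrapper(self._jdf.repartition(numPartitions))

    def distinct(self):
        return Wrapper(self._jdf.distinct())


# Made-up sample rows containing one duplicate.
df = Wrapper(FakeJavaDataFrame([("Alice", 2), ("Bob", 5), ("Alice", 2)]))

try:
    df.repartition_old(10)
    old_call_failed = False
except TypeError:
    # The one-argument method rejects the extra None.
    old_call_failed = True

new_partitions = df.repartition(10)._jdf.num_partitions
distinct_count = df.distinct()._jdf.count()
```

The two doctests added in the commit check the same two properties against a real `DataFrame`: the partition count after `repartition(10)` and the row count after `distinct()`.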
