[SPARK-42713][PYTHON][DOCS] Add '__getattr__' and '__getitem__' of DataFrame and Column to API reference

### What changes were proposed in this pull request?
Add '__getattr__' and '__getitem__' of DataFrame and Column to API reference

### Why are the changes needed?
'__getattr__' and '__getitem__' are widely used, but we did not document them.

### Does this PR introduce _any_ user-facing change?
Yes, new documentation.

### How was this patch tested?
added doctests
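For context, the examples added to the docstrings are doctests: Python's built-in `doctest` module collects the `>>>` snippets from a docstring and checks their output, which is the mechanism the Spark doc build and test runners rely on. A minimal self-contained illustration with a hypothetical toy function (not the Spark test harness):

```python
import doctest


def getitem_demo(d, key):
    """Return ``d[key]``; the ``>>>`` example below is a runnable doctest.

    >>> getitem_demo({"key": "value"}, "key")
    'value'
    """
    return d[key]


# Collect the examples embedded in the docstring and run them; globs makes
# the demo function visible inside the doctest's namespace.
finder = doctest.DocTestFinder()
runner = doctest.DocTestRunner()
for test in finder.find(getitem_demo, globs={"getitem_demo": getitem_demo}):
    runner.run(test)

print(runner.failures)  # 0 when the docstring example passes
```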

Closes apache#40331 from zhengruifeng/py_doc.

Authored-by: Ruifeng Zheng <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
zhengruifeng authored and HyukjinKwon committed Mar 8, 2023
1 parent 8e83ab7 commit e28f7f3
Showing 4 changed files with 96 additions and 0 deletions.
2 changes: 2 additions & 0 deletions python/docs/source/reference/pyspark.sql/column.rst
@@ -24,6 +24,8 @@ Column
.. autosummary::
    :toctree: api/

    Column.__getattr__
    Column.__getitem__
    Column.alias
    Column.asc
    Column.asc_nulls_first
2 changes: 2 additions & 0 deletions python/docs/source/reference/pyspark.sql/dataframe.rst
@@ -25,6 +25,8 @@ DataFrame
.. autosummary::
    :toctree: api/

    DataFrame.__getattr__
    DataFrame.__getitem__
    DataFrame.agg
    DataFrame.alias
    DataFrame.approxQuantile
59 changes: 59 additions & 0 deletions python/pyspark/sql/column.py
@@ -639,11 +639,70 @@ def dropFields(self, *fieldNames: str) -> "Column":
        return Column(jc)

    def __getattr__(self, item: Any) -> "Column":
        """
        An expression that gets an item at position ``ordinal`` out of a list,
        or gets an item by key out of a dict.

        .. versionadded:: 1.3.0

        .. versionchanged:: 3.4.0
            Support Spark Connect.

        Parameters
        ----------
        item
            a literal value.

        Returns
        -------
        :class:`Column`
            Column representing the item got by key out of a dict.

        Examples
        --------
        >>> df = spark.createDataFrame([('abcedfg', {"key": "value"})], ["l", "d"])
        >>> df.select(df.d.key).show()
        +------+
        |d[key]|
        +------+
        | value|
        +------+
        """
        if item.startswith("__"):
            raise AttributeError(item)
        return self[item]

    def __getitem__(self, k: Any) -> "Column":
        """
        An expression that gets an item at position ``ordinal`` out of a list,
        or gets an item by key out of a dict.

        .. versionadded:: 1.3.0

        .. versionchanged:: 3.4.0
            Support Spark Connect.

        Parameters
        ----------
        k
            a literal value, or a slice object without step.

        Returns
        -------
        :class:`Column`
            Column representing the item got by key out of a dict, or substrings sliced by
            the given slice object.

        Examples
        --------
        >>> df = spark.createDataFrame([('abcedfg', {"key": "value"})], ["l", "d"])
        >>> df.select(df.l[slice(1, 3)], df.d['key']).show()
        +------------------+------+
        |substring(l, 1, 3)|d[key]|
        +------------------+------+
        |               abc| value|
        +------------------+------+
        """
        if isinstance(k, slice):
            if k.step is not None:
                raise ValueError("slice with step is not supported.")
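The bodies above show a common delegation pattern: `__getattr__` refuses dunder names (so protocols such as `copy` and `pickle` see a normal `AttributeError` and fall back cleanly) and forwards everything else to `__getitem__`, which in turn rejects stepped slices. A minimal toy sketch of the same pattern — the `Expr` class is hypothetical and uses plain Python indexing, not Spark's 1-based `substring` semantics:

```python
import copy
from typing import Any


class Expr:
    """Toy stand-in for a Column-like object (illustrative, not PySpark)."""

    def __init__(self, value: Any) -> None:
        self._value = value

    def __getitem__(self, k: Any) -> Any:
        if isinstance(k, slice) and k.step is not None:
            # Same guard as in the diff: stepped slices are rejected.
            raise ValueError("slice with step is not supported.")
        # Plain Python indexing here, unlike Spark's 1-based substring.
        return self._value[k]

    def __getattr__(self, item: str) -> Any:
        # Refuse dunder lookups; without this, copy.deepcopy's probe for
        # __deepcopy__ would reach __getitem__ and raise the wrong error.
        if item.startswith("__"):
            raise AttributeError(item)
        return self[item]


expr = Expr({"key": "value"})
print(expr.key)        # value (attribute access delegates to __getitem__)
print(expr["key"])     # value (direct item access)
copy.deepcopy(expr)    # works because __deepcopy__ raised AttributeError
```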
33 changes: 33 additions & 0 deletions python/pyspark/sql/dataframe.py
@@ -2847,6 +2847,28 @@ def __getitem__(self, item: Union[int, str, Column, List, Tuple]) -> Union[Colum
        .. versionadded:: 1.3.0

        .. versionchanged:: 3.4.0
            Support Spark Connect.

        Parameters
        ----------
        item : int, str, :class:`Column`, list or tuple
            column index, column name, column, or a list or tuple of columns

        Returns
        -------
        :class:`Column` or :class:`DataFrame`
            a specified column, or a filtered or projected dataframe.

            * If the input `item` is an int or str, the output is a :class:`Column`.
            * If the input `item` is a :class:`Column`, the output is a :class:`DataFrame`
              filtered by this given :class:`Column`.
            * If the input `item` is a list or tuple, the output is a :class:`DataFrame`
              projected by this given list or tuple.

        Examples
        --------
        >>> df = spark.createDataFrame([
@@ -2862,6 +2884,14 @@ def __getitem__(self, item: Union[int, str, Column, List, Tuple]) -> Union[Colum
        |  5|
        +---+

        >>> df.select(df[1]).show()
        +-----+
        | name|
        +-----+
        |Alice|
        |  Bob|
        +-----+

        Select multiple string columns as index.

        >>> df[["name", "age"]].show()
@@ -2905,6 +2935,9 @@ def __getattr__(self, name: str) -> Column:
        .. versionadded:: 1.3.0

        .. versionchanged:: 3.4.0
            Support Spark Connect.

        Parameters
        ----------
        name : str
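The Returns section of `DataFrame.__getitem__` above describes a dispatch on the item's type: int or str yields a `Column`, a `Column` condition yields a filtered `DataFrame`, and a list or tuple yields a projected one. A schematic sketch of that dispatch with assumed toy classes — `Column` and `DataFrame` here are placeholders, and the `filter`/`select` bodies are stubs, not the real implementation:

```python
from typing import Any, List, Sequence, Union


class Column:
    """Toy named-column placeholder (illustrative, not PySpark)."""

    def __init__(self, name: str) -> None:
        self.name = name


class DataFrame:
    """Toy frame modeling only the __getitem__ type dispatch."""

    def __init__(self, columns: List[str]) -> None:
        self.columns = columns

    def select(self, cols: Sequence[Union[str, Column]]) -> "DataFrame":
        # Projection: keep only the requested columns.
        return DataFrame([c.name if isinstance(c, Column) else c for c in cols])

    def filter(self, condition: Column) -> "DataFrame":
        # Stub: a real filter drops rows; the schema is unchanged.
        return DataFrame(list(self.columns))

    def __getitem__(self, item: Any) -> Union[Column, "DataFrame"]:
        if isinstance(item, int):
            return Column(self.columns[item])  # positional index -> Column
        if isinstance(item, str):
            return Column(item)                # column name -> Column
        if isinstance(item, Column):
            return self.filter(item)           # condition -> filtered frame
        if isinstance(item, (list, tuple)):
            return self.select(item)           # columns -> projected frame
        raise TypeError(f"Unexpected item type: {type(item)}")


df = DataFrame(["age", "name"])
print(df["age"].name, df[1].name)   # age name
print(df[["name", "age"]].columns)  # ['name', 'age']
```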
