[SPARK-19064][PYSPARK] Fix pip installing of sub components
## What changes were proposed in this pull request?

Fix installation of the mllib and ml sub-components, and clean up cached files more eagerly in the test script and make-distribution.

## How was this patch tested?

Updated sanity test script to import mllib and ml sub-components.

Author: Holden Karau <[email protected]>

Closes apache#16465 from holdenk/SPARK-19064-fix-pip-install-sub-components.
holdenk committed Jan 25, 2017
1 parent 92afaa9 commit 965c82d
Showing 5 changed files with 15 additions and 2 deletions.
2 changes: 2 additions & 0 deletions dev/make-distribution.sh
@@ -220,6 +220,8 @@ cp -r "$SPARK_HOME/data" "$DISTDIR"
 if [ "$MAKE_PIP" == "true" ]; then
 echo "Building python distribution package"
 pushd "$SPARK_HOME/python" > /dev/null
+# Delete the egg info file if it exists, this can cache older setup files.
+rm -rf pyspark.egg-info || echo "No existing egg info file, skipping deletion"
 python setup.py sdist
 popd > /dev/null
 else
2 changes: 2 additions & 0 deletions dev/pip-sanity-check.py
@@ -18,6 +18,8 @@
 from __future__ import print_function

 from pyspark.sql import SparkSession
+from pyspark.ml.param import Params
+from pyspark.mllib.linalg import *
 import sys

 if __name__ == "__main__":
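The added imports make the sanity check fail at import time if a sub-package was left out of the installed distribution. The same idea can be sketched as a helper that reports which modules fail to import instead of stopping at the first failure. This is an illustration only: `find_missing_modules` and `PYSPARK_SUBPACKAGES` are hypothetical names and are not part of `dev/pip-sanity-check.py`.

```python
import importlib


def find_missing_modules(module_names):
    """Return the names from module_names that cannot be imported."""
    missing = []
    for name in module_names:
        try:
            importlib.import_module(name)
        except ImportError:
            # Covers a module that is absent as well as one whose own
            # dependencies (for example numpy) are not installed.
            missing.append(name)
    return missing


# Hypothetical list for illustration: the sub-packages this commit adds
# to setup.py that a pip-installed PySpark should expose.
PYSPARK_SUBPACKAGES = [
    "pyspark.ml.param",
    "pyspark.ml.linalg",
    "pyspark.ml.stat",
    "pyspark.mllib.linalg",
    "pyspark.mllib.stat",
]
```

Calling `find_missing_modules(PYSPARK_SUBPACKAGES)` inside the test virtualenv would return an empty list when every sub-package was installed correctly.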
1 change: 1 addition & 0 deletions dev/requirements.txt
@@ -1,3 +1,4 @@
 jira==1.0.3
 PyGithub==1.26.0
 Unidecode==0.04.19
+pypandoc==1.3.3
7 changes: 5 additions & 2 deletions dev/run-pip-tests
@@ -78,11 +78,14 @@ for python in "${PYTHON_EXECS[@]}"; do
 mkdir -p "$VIRTUALENV_PATH"
 virtualenv --python=$python "$VIRTUALENV_PATH"
 source "$VIRTUALENV_PATH"/bin/activate
-# Upgrade pip
-pip install --upgrade pip
+# Upgrade pip & friends
+pip install --upgrade pip pypandoc wheel
+pip install numpy # Needed so we can verify mllib imports

 echo "Creating pip installable source dist"
 cd "$FWDIR"/python
+# Delete the egg info file if it exists, this can cache the setup file.
+rm -rf pyspark.egg-info || echo "No existing egg info file, skipping deletion"
 $python setup.py sdist


5 changes: 5 additions & 0 deletions python/setup.py
@@ -162,7 +162,12 @@ def _supports_symlinks():
 url='https://github.com/apache/spark/tree/master/python',
 packages=['pyspark',
 'pyspark.mllib',
+'pyspark.mllib.linalg',
+'pyspark.mllib.stat',
 'pyspark.ml',
+'pyspark.ml.linalg',
+'pyspark.ml.param',
+'pyspark.ml.stat',
 'pyspark.sql',
 'pyspark.streaming',
 'pyspark.bin',
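Because `packages` is an explicit list, a sub-package that exists on disk but is missing from the list is left out of the built distribution, which is the failure this commit fixes. A minimal sketch of a check for that drift follows. It assumes every package is a directory containing `__init__.py`; `discover_packages` and `unlisted_packages` are hypothetical helpers and are not part of `python/setup.py`.

```python
import os


def discover_packages(root_dir, top_package):
    """Return dotted names of all regular packages under root_dir/top_package."""
    found = []
    base = os.path.join(root_dir, top_package)
    for dirpath, dirnames, filenames in os.walk(base):
        if "__init__.py" not in filenames:
            # Not a regular package: skip it and do not descend further.
            dirnames[:] = []
            continue
        rel = os.path.relpath(dirpath, root_dir)
        found.append(rel.replace(os.sep, "."))
    return sorted(found)


def unlisted_packages(root_dir, top_package, listed):
    """Return packages found on disk that are absent from the listed names."""
    listed_set = set(listed)
    return [p for p in discover_packages(root_dir, top_package)
            if p not in listed_set]
```

Run against the `python` directory with the `packages` list from `setup.py`, a non-empty result would point at sub-packages that need to be added. The sketch only detects directories containing `__init__.py`, so entries that `setup.py` maps by other means would not be reported.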
