[SPARK-19955][PYSPARK] Jenkins Python Conda based test.
## What changes were proposed in this pull request?

Allow Jenkins Python tests to use the installed conda to test Python 2.7 support & test pip installability.

## How was this patch tested?

Updated shell scripts, ran tests locally with installed conda, ran tests in Jenkins.

Author: Holden Karau <[email protected]>

Closes apache#17355 from holdenk/SPARK-19955-support-python-tests-with-conda.
holdenk committed Mar 29, 2017
1 parent c622a87 commit d6ddfdf
Showing 3 changed files with 47 additions and 28 deletions.
66 changes: 42 additions & 24 deletions dev/run-pip-tests
@@ -35,32 +35,37 @@ function delete_virtualenv() {
 }
 trap delete_virtualenv EXIT

+PYTHON_EXECS=()
 # Some systems don't have pip or virtualenv - in those cases our tests won't work.
-if ! hash virtualenv 2>/dev/null; then
-  echo "Missing virtualenv skipping pip installability tests."
+if hash virtualenv 2>/dev/null && [ ! -n "$USE_CONDA" ]; then
+  echo "virtualenv installed - using. Note if this is a conda virtual env you may wish to set USE_CONDA"
+  # Figure out which Python execs we should test pip installation with
+  if hash python2 2>/dev/null; then
+    # We do this since we are testing with virtualenv and the default virtual env python
+    # is in /usr/bin/python
+    PYTHON_EXECS+=('python2')
+  elif hash python 2>/dev/null; then
+    # If python2 isn't installed fallback to python if available
+    PYTHON_EXECS+=('python')
+  fi
+  if hash python3 2>/dev/null; then
+    PYTHON_EXECS+=('python3')
+  fi
+elif hash conda 2>/dev/null; then
+  echo "Using conda virtual environments"
+  PYTHON_EXECS=('3.5')
+  USE_CONDA=1
+else
+  echo "Missing virtualenv & conda, skipping pip installability tests"
   exit 0
 fi
 if ! hash pip 2>/dev/null; then
   echo "Missing pip, skipping pip installability tests."
   exit 0
 fi

-# Figure out which Python execs we should test pip installation with
-PYTHON_EXECS=()
-if hash python2 2>/dev/null; then
-  # We do this since we are testing with virtualenv and the default virtual env python
-  # is in /usr/bin/python
-  PYTHON_EXECS+=('python2')
-elif hash python 2>/dev/null; then
-  # If python2 isn't installed fallback to python if available
-  PYTHON_EXECS+=('python')
-fi
-if hash python3 2>/dev/null; then
-  PYTHON_EXECS+=('python3')
-fi
-
 # Determine which version of PySpark we are building for archive name
-PYSPARK_VERSION=$(python -c "exec(open('python/pyspark/version.py').read());print __version__")
+PYSPARK_VERSION=$(python3 -c "exec(open('python/pyspark/version.py').read());print(__version__)")
 PYSPARK_DIST="$FWDIR/python/dist/pyspark-$PYSPARK_VERSION.tar.gz"
 # The pip install options we use for all the pip commands
 PIP_OPTIONS="--upgrade --no-cache-dir --force-reinstall "
@@ -75,18 +80,24 @@ for python in "${PYTHON_EXECS[@]}"; do
     echo "Using $VIRTUALENV_BASE for virtualenv"
     VIRTUALENV_PATH="$VIRTUALENV_BASE"/$python
     rm -rf "$VIRTUALENV_PATH"
-    mkdir -p "$VIRTUALENV_PATH"
-    virtualenv --python=$python "$VIRTUALENV_PATH"
-    source "$VIRTUALENV_PATH"/bin/activate
-    # Upgrade pip & friends
-    pip install --upgrade pip pypandoc wheel
-    pip install numpy # Needed so we can verify mllib imports
+    if [ -n "$USE_CONDA" ]; then
+      conda create -y -p "$VIRTUALENV_PATH" python=$python numpy pandas pip setuptools
+      source activate "$VIRTUALENV_PATH"
+    else
+      mkdir -p "$VIRTUALENV_PATH"
+      virtualenv --python=$python "$VIRTUALENV_PATH"
+      source "$VIRTUALENV_PATH"/bin/activate
+    fi
+    # Upgrade pip & friends if using virtualenv
+    if [ ! -n "$USE_CONDA" ]; then
+      pip install --upgrade pip pypandoc wheel numpy
+    fi

     echo "Creating pip installable source dist"
     cd "$FWDIR"/python
     # Delete the egg info file if it exists, this can cache the setup file.
     rm -rf pyspark.egg-info || echo "No existing egg info file, skipping deletion"
-    $python setup.py sdist
+    python setup.py sdist


     echo "Installing dist into virtual env"
@@ -112,6 +123,13 @@ for python in "${PYTHON_EXECS[@]}"; do

     cd "$FWDIR"

+    # conda / virtualenv environments need to be deactivated differently
+    if [ -n "$USE_CONDA" ]; then
+      source deactivate
+    else
+      deactivate
+    fi
+
   done
 done
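
For readers who want to check the branch order in `dev/run-pip-tests` without running the script, the selection logic can be restated as a small Python sketch. This is an illustration only: `choose_env_tool` and its `which` parameter are hypothetical names that do not appear in the commit, and `shutil.which` is assumed as a stand-in for the shell's `hash` lookup. The separate `pip` check that follows in the script is not modeled.

```python
import shutil


def choose_env_tool(use_conda=False, which=shutil.which):
    """Return (tool, python_execs), or None when the pip tests would be skipped.

    Hypothetical helper that mirrors the shell branches; not part of the commit.
    """
    if which("virtualenv") and not use_conda:
        execs = []
        # Prefer python2, falling back to plain python, as the script does.
        if which("python2"):
            execs.append("python2")
        elif which("python"):
            execs.append("python")
        if which("python3"):
            execs.append("python3")
        return ("virtualenv", execs)
    if which("conda"):
        # In the conda branch the entries are Python versions, not executable names.
        return ("conda", ["3.5"])
    return None


def fake_which(available):
    """Build a lookup that only 'finds' the given names (for illustration)."""
    return lambda name: "/usr/bin/" + name if name in available else None


if __name__ == "__main__":
    print(choose_env_tool(which=fake_which({"virtualenv", "python", "python3"})))
    print(choose_env_tool(use_conda=True, which=fake_which({"virtualenv", "conda"})))
```

Passing the lookup as a parameter keeps the sketch testable on machines that have neither tool installed.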

3 changes: 2 additions & 1 deletion dev/run-tests-jenkins
@@ -22,7 +22,8 @@
 # Environment variables are populated by the code here:
 #+ https://github.com/jenkinsci/ghprb-plugin/blob/master/src/main/java/org/jenkinsci/plugins/ghprb/GhprbTrigger.java#L139

-FWDIR="$(cd "`dirname $0`"/..; pwd)"
+FWDIR="$( cd "$( dirname "$0" )/.." && pwd )"
 cd "$FWDIR"

+export PATH=/home/anaconda/bin:$PATH
 exec python -u ./dev/run-tests-jenkins.py "$@"
6 changes: 3 additions & 3 deletions python/run-tests.py
@@ -111,9 +111,9 @@ def run_individual_python_test(test_name, pyspark_python):


 def get_default_python_executables():
-    python_execs = [x for x in ["python2.6", "python3.4", "pypy"] if which(x)]
-    if "python2.6" not in python_execs:
-        LOGGER.warning("Not testing against `python2.6` because it could not be found; falling"
+    python_execs = [x for x in ["python2.7", "python3.4", "pypy"] if which(x)]
+    if "python2.7" not in python_execs:
+        LOGGER.warning("Not testing against `python2.7` because it could not be found; falling"
                        " back to `python` instead")
         python_execs.insert(0, "python")
     return python_execs
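
The fallback in `get_default_python_executables` can be tried in isolation with the sketch below. It is a standalone approximation, not the code in `python/run-tests.py`: the diff does not show where `which` and `LOGGER` are defined, so `shutil.which` (Python 3.3+) and a plain `logging` logger are assumed here, and the injectable `which` parameter is added purely for illustration.

```python
import logging
import shutil

LOGGER = logging.getLogger(__name__)


def get_default_python_executables(which=shutil.which):
    """Return the interpreters to test against, preferring python2.7.

    `which` is an assumed, injectable lookup; the original calls a helper
    named `which` whose definition is outside the diff.
    """
    candidates = ["python2.7", "python3.4", "pypy"]
    python_execs = [x for x in candidates if which(x)]
    if "python2.7" not in python_execs:
        LOGGER.warning("Not testing against `python2.7` because it could not be found; "
                       "falling back to `python` instead")
        # The generic interpreter goes first so it takes python2.7's place.
        python_execs.insert(0, "python")
    return python_execs


if __name__ == "__main__":
    # Simulate a machine that only has python3.4 on the PATH.
    only_py34 = lambda name: "/usr/bin/python3.4" if name == "python3.4" else None
    print(get_default_python_executables(which=only_py34))
```

Note that the fallback inserts `python` without checking that it exists, which matches the behavior shown in the diff.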