
Fix: Corrects $SPARK_HOME errors in spark container #1009

Open: wants to merge 11 commits into master


brockneedscoffee

When Spark is installed via Miniconda, SPARK_HOME is not set, so Spark cannot run in Azure Machine Learning (AML). You get errors when you try to create the Spark context, or when you set the framework to PySpark in an AML pipeline step. This PR installs Spark, sets SPARK_HOME, and installs the Open MPI dependency.
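For context, the core of the fix is making SPARK_HOME visible to every process in the image, which is what Dockerfile `ENV` instructions do. The following is a minimal sketch, not necessarily what this PR uses. It assumes Spark is unpacked to `/opt/spark` (a hypothetical path) and that PySpark should run on the conda interpreter under `/opt/conda`:

```dockerfile
# Assumed (hypothetical) install location for the unpacked Spark distribution.
ENV SPARK_HOME /opt/spark
# Put spark-submit, pyspark, etc. on PATH for later instructions and at container runtime.
ENV PATH $SPARK_HOME/bin:$PATH
# Assumption: run PySpark with the conda Python installed under /opt/conda.
ENV PYSPARK_PYTHON /opt/conda/bin/python
```

`ENV` values persist into running containers, so a Spark context created in an AML step should be able to locate the Spark installation without extra setup.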


welcome bot commented Mar 16, 2021

💖 Thanks for opening your first pull request! 💖 We use semantic commit messages to streamline the release process. Before your pull request can be merged, you should make sure your first commit and PR title start with a semantic prefix. This helps us to create release messages and credit you for your hard work!
Examples of commit messages with semantic prefixes:

  • fix: Fix LightGBM crashes with empty partitions
  • feat: Make HTTP on Spark back-offs configurable
  • docs: Update Spark Serving usage
  • build: Add codecov support
  • perf: improve LightGBM memory usage
  • refactor: make python code generation rely on classes
  • style: Remove nulls from CNTKModel
  • test: Add test coverage for CNTKModel

Make sure to check out the developer guide for guidance on testing your change.

mhamilton723 linked an issue on Mar 16, 2021 that may be closed by this pull request

mhamilton723 (Collaborator)

/azp run

azure-pipelines bot

Azure Pipelines successfully started running 1 pipeline(s).


codecov bot commented Mar 16, 2021

Codecov Report

Merging #1009 (a17834c) into master (2c223f6) will increase coverage by 0.35%.
The diff coverage is n/a.

❗ Current head a17834c differs from the pull request's most recent head 0040785. Consider uploading reports for commit 0040785 to get more accurate results.

@@            Coverage Diff             @@
##           master    #1009      +/-   ##
==========================================
+ Coverage   84.13%   84.49%   +0.35%     
==========================================
  Files         201      199       -2     
  Lines        9306     9172     -134     
  Branches      554      543      -11     
==========================================
- Hits         7830     7750      -80     
+ Misses       1476     1422      -54     
Impacted Files Coverage Δ
...om/microsoft/ml/spark/train/AutoTrainedModel.scala 50.00% <0.00%> (-35.72%) ⬇️
.../org/apache/spark/ml/param/UntypedArrayParam.scala 37.50% <0.00%> (-20.40%) ⬇️
...n/scala/com/microsoft/ml/spark/stages/Lambda.scala 80.00% <0.00%> (-13.34%) ⬇️
...a/com/microsoft/ml/spark/io/http/HTTPClients.scala 76.66% <0.00%> (-6.67%) ⬇️
...icrosoft/ml/spark/featurize/CleanMissingData.scala 88.88% <0.00%> (-4.96%) ⬇️
.../com/microsoft/ml/spark/stages/ClassBalancer.scala 81.81% <0.00%> (-4.85%) ⬇️
...icrosoft/ml/spark/automl/TuneHyperparameters.scala 73.91% <0.00%> (-4.35%) ⬇️
.../microsoft/ml/spark/vw/VowpalWabbitRegressor.scala 70.00% <0.00%> (-2.73%) ⬇️
.../com/microsoft/ml/spark/automl/FindBestModel.scala 85.00% <0.00%> (-1.54%) ⬇️
.../microsoft/ml/spark/core/utils/ModelEquality.scala 85.71% <0.00%> (-1.25%) ⬇️
... and 71 more


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 2c223f6...0040785.

brockneedscoffee changed the title from "Adds Spark and Open MPI" to "Fix: Corrects $SPARK_HOME errors in spark container" on Mar 16, 2021

brockneedscoffee (Author)

/azp run

azure-pipelines bot

Azure Pipelines successfully started running 1 pipeline(s).

brockneedscoffee (Author)

/azp run

azure-pipelines bot

Azure Pipelines successfully started running 1 pipeline(s).

brockneedscoffee (Author)

/azp run

azure-pipelines bot

Azure Pipelines successfully started running 1 pipeline(s).

mhamilton723 (Collaborator)

@brockneedscoffee, looks good, but we don't test the Docker notebook as part of the build yet, so could you test these changes manually?

Also, could you group all the RUN steps into a single RUN step and put the ENV instructions at the top of the Dockerfile? Putting all of the RUN steps together (and deleting any .tgz files you download in those steps) makes the Docker image considerably smaller. Thanks!

ENV PATH /opt/conda/bin:$PATH
ENV JAVA_HOME /usr/lib/jvm/java-1.8.0-openjdk-amd64

# Build-essentials
Review comment (Collaborator):

Group RUN steps together for a smaller Docker image.
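The size saving comes from Docker's layering. Each `RUN` instruction produces its own image layer, so an archive downloaded in one `RUN` and deleted in a later one still takes up space in the earlier layer. The sketch below shows the single-step layout the review asks for, with `ENV` at the top. Several details are assumptions and may not match what this PR uses: the Spark/Hadoop versions, the Apache archive URL pattern, the `/opt/spark` path, and the Debian/Ubuntu package names (`ca-certificates`, `wget`, `openmpi-bin`, `libopenmpi-dev`).

```dockerfile
# Assumed versions (place these after FROM so RUN can see them).
ARG SPARK_VERSION=3.0.1
ARG HADOOP_VERSION=2.7

# ENV instructions grouped at the top, as requested in review.
ENV SPARK_HOME /opt/spark
ENV PATH $SPARK_HOME/bin:$PATH

# One RUN layer: install system packages (including Open MPI), fetch and unpack Spark,
# then delete the archive and apt lists before the layer is committed.
RUN apt-get update \
    && apt-get install -y --no-install-recommends ca-certificates wget openmpi-bin libopenmpi-dev \
    && wget -q "https://archive.apache.org/dist/spark/spark-${SPARK_VERSION}/spark-${SPARK_VERSION}-bin-hadoop${HADOOP_VERSION}.tgz" \
    && tar -xzf "spark-${SPARK_VERSION}-bin-hadoop${HADOOP_VERSION}.tgz" -C /opt \
    && mv "/opt/spark-${SPARK_VERSION}-bin-hadoop${HADOOP_VERSION}" "$SPARK_HOME" \
    && rm "spark-${SPARK_VERSION}-bin-hadoop${HADOOP_VERSION}.tgz" \
    && rm -rf /var/lib/apt/lists/*
```

The trade-off is build caching. Changing any command in the chain invalidates the whole layer, so a small edit means re-downloading Spark on the next build.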

Successfully merging this pull request may close this issue: Spark not configured in docker image
3 participants