Fix: Corrects $SPARK_HOME errors in spark container #1009

brockneedscoffee · 2021-03-16T19:37:38Z

When Spark is installed via Miniconda, SPARK_HOME is not set, so you cannot run Spark in AML. You get errors when you try to create the Spark context, or when you set the framework to PySpark in an AML pipeline step. This PR installs Spark, sets SPARK_HOME, and installs the Open MPI dependency.
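
For anyone reproducing the problem, here is a minimal Python sketch of a sanity check that can be run inside the container. It assumes that a usable Spark install is one where `SPARK_HOME` is set and the directory contains `bin/spark-submit`. The helper name `find_spark_home` is hypothetical and is not part of this PR or of PySpark.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def find_spark_home(env: Optional[Mapping[str, str]] = None) -> Optional[Path]:
    """Return SPARK_HOME as a Path if it is set and looks like a Spark install.

    Hypothetical helper for illustration only. Pass a mapping to check
    something other than the current process environment.
    """
    env = os.environ if env is None else env
    value = env.get("SPARK_HOME")
    if not value:
        # Unset or empty: the situation this PR describes.
        return None
    home = Path(value)
    # Assumption: a Spark distribution ships bin/spark-submit, so a missing
    # launcher script means SPARK_HOME points at the wrong directory.
    if not (home / "bin" / "spark-submit").is_file():
        return None
    return home


if __name__ == "__main__":
    home = find_spark_home()
    if home is None:
        print("SPARK_HOME is unset or does not point at a Spark install")
    else:
        print(f"SPARK_HOME looks valid: {home}")
```

Running the script in the image before and after the change should show whether the environment variable is picked up. It does not start a Spark context, so it only covers the configuration half of the issue.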

welcome · 2021-03-16T19:37:46Z

💖 Thanks for opening your first pull request! 💖 We use semantic commit messages to streamline the release process. Before your pull request can be merged, you should make sure your first commit and PR title start with a semantic prefix. This helps us to create release messages and credit you for your hard work!
Examples of commit messages with semantic prefixes:

fix: Fix LightGBM crashes with empty partitions
feat: Make HTTP on Spark back-offs configurable
docs: Update Spark Serving usage
build: Add codecov support
perf: improve LightGBM memory usage
refactor: make python code generation rely on classes
style: Remove nulls from CNTKModel
test: Add test coverage for CNTKModel

Make sure to check out the developer guide for guidance on testing your change.

mhamilton723 · 2021-03-16T19:48:21Z

/azp run

azure-pipelines · 2021-03-16T19:48:35Z

Azure Pipelines successfully started running 1 pipeline(s).

codecov · 2021-03-16T19:54:05Z

Codecov Report

Merging #1009 (a17834c) into master (2c223f6) will increase coverage by 0.35%.
The diff coverage is n/a.

❗ Current head a17834c differs from pull request most recent head 0040785. Consider uploading reports for the commit 0040785 to get more accurate results

@@            Coverage Diff             @@
##           master    #1009      +/-   ##
==========================================
+ Coverage   84.13%   84.49%   +0.35%     
==========================================
  Files         201      199       -2     
  Lines        9306     9172     -134     
  Branches      554      543      -11     
==========================================
- Hits         7830     7750      -80     
+ Misses       1476     1422      -54

| Impacted Files | Coverage Δ | |
|---|---|---|
| ...om/microsoft/ml/spark/train/AutoTrainedModel.scala | `50.00% <0.00%> (-35.72%)` | ⬇️ |
| .../org/apache/spark/ml/param/UntypedArrayParam.scala | `37.50% <0.00%> (-20.40%)` | ⬇️ |
| ...n/scala/com/microsoft/ml/spark/stages/Lambda.scala | `80.00% <0.00%> (-13.34%)` | ⬇️ |
| ...a/com/microsoft/ml/spark/io/http/HTTPClients.scala | `76.66% <0.00%> (-6.67%)` | ⬇️ |
| ...icrosoft/ml/spark/featurize/CleanMissingData.scala | `88.88% <0.00%> (-4.96%)` | ⬇️ |
| .../com/microsoft/ml/spark/stages/ClassBalancer.scala | `81.81% <0.00%> (-4.85%)` | ⬇️ |
| ...icrosoft/ml/spark/automl/TuneHyperparameters.scala | `73.91% <0.00%> (-4.35%)` | ⬇️ |
| .../microsoft/ml/spark/vw/VowpalWabbitRegressor.scala | `70.00% <0.00%> (-2.73%)` | ⬇️ |
| .../com/microsoft/ml/spark/automl/FindBestModel.scala | `85.00% <0.00%> (-1.54%)` | ⬇️ |
| .../microsoft/ml/spark/core/utils/ModelEquality.scala | `85.71% <0.00%> (-1.25%)` | ⬇️ |
| ... and 71 more | | |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 2c223f6...0040785.

brockneedscoffee · 2021-03-16T20:16:17Z

/azp run

azure-pipelines · 2021-03-16T20:16:37Z

Azure Pipelines successfully started running 1 pipeline(s).

brockneedscoffee · 2021-03-25T04:58:25Z

/azp run

azure-pipelines · 2021-03-25T04:58:46Z

Azure Pipelines successfully started running 1 pipeline(s).

brockneedscoffee · 2021-03-25T04:59:14Z

/azp run

azure-pipelines · 2021-03-25T04:59:33Z

Azure Pipelines successfully started running 1 pipeline(s).

mhamilton723 · 2021-03-28T17:18:26Z

@brockneedscoffee - looks good, but we don't test the docker notebook as part of the build yet, so could you manually test these out?

Also, could you group all the RUN steps together into a single RUN step and put the ENV instructions at the top of the Dockerfile? Putting all of the RUN steps together (and deleting any .tgz files you download in those steps) makes the docker image considerably smaller. Thanks!

mhamilton723 · 2021-03-28T17:19:05Z

tools/docker/demo/Dockerfile

 ENV PATH /opt/conda/bin:$PATH
 ENV JAVA_HOME /usr/lib/jvm/java-1.8.0-openjdk-amd64

+# Build-essentials


Group run steps together for smaller docker image

added spark install and openmpi

b93a7e2

brockneedscoffee requested a review from mhamilton723 as a code owner March 16, 2021 19:37

brockneedscoffee and others added 4 commits March 16, 2021 15:37

Merge branch 'master' into update-spark-container

893700a

fix: installs spark, sets spark home, and adds open mpi so you can us…

d5b569b

…e spark in AML

Merge branch 'update-spark-container' of https://github.com/brockneed…

a54ce92

…scoffee/mmlspark into update-spark-container

fix: fixes microsoft#1008

9a3bbd2

mhamilton723 linked an issue Mar 16, 2021 that may be closed by this pull request

Spark not configured in docker image #1008

Open

fix: corrects openmpi install error by adding build-essentials

26b3db2

brockneedscoffee changed the title ~~Adds Spark and Open MPI~~ Fix: Corrects $SPARK_HOME errors in spark container Mar 16, 2021

brockneedscoffee added 2 commits March 24, 2021 22:43

Merge branch 'master' into update-spark-container

b36be87

add build essentials

c840b8f

add build essentials

a17834c

mhamilton723 requested changes Mar 28, 2021

brockneedscoffee added 2 commits April 1, 2021 21:13

Merge branch 'master' into update-spark-container

fcfe51f

Merge branch 'master' into update-spark-container

0040785

imatiach-msft approved these changes Dec 31, 2021
