Added an E2E test for buffered async gradient aggregation without quantization
amitaga committed Jan 7, 2016
1 parent eaf3956 commit 6ed460b
Showing 6 changed files with 10,363 additions and 0 deletions.

Large diffs are not rendered by default (4 of the 6 changed files).
@@ -0,0 +1,16 @@
#!/bin/bash

. $TEST_ROOT_DIR/run-test-common

ConfigDir=$TEST_DIR/..
LogFileName=stderr
Instances=3
NumCPUThreads=$(threadsPerInstance $Instances)

# cntkmpirun <MPI args> <CNTK config file name> <additional CNTK args>
cntkmpirun "-n $Instances" cntk.config "numCPUThreads=$NumCPUThreads precision=double speechTrain=[SGD=[ParallelTrain=[DataParallelSGD=[gradientBits=64]]]] speechTrain=[SGD=[ParallelTrain=[DataParallelSGD=[useBufferedAsyncGradientAggregation=true]]]] speechTrain=[SGD=[ParallelTrain=[parallelizationStartEpoch=2]]] speechTrain=[SGD=[maxEpochs=4]] speechTrain=[SGD=[ParallelTrain=[syncPerfStats=5]]]"
ExitCode=$?
sed 's/^/MPI Rank 0: /' $TEST_RUN_DIR/"$LogFileName"_speechTrain.logrank0
sed 's/^/MPI Rank 1: /' $TEST_RUN_DIR/"$LogFileName"_speechTrain.logrank1
sed 's/^/MPI Rank 2: /' $TEST_RUN_DIR/"$LogFileName"_speechTrain.logrank2
exit $ExitCode
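The long `cntkmpirun` invocation above packs five separate `speechTrain=[SGD=[...]]` overrides, plus thread and precision settings, into one string. As a readability variant, that string could be assembled one setting per line before being passed along. This is a sketch only: the `NumCPUThreads` value is a placeholder here, since the real script derives it from the `threadsPerInstance` helper provided by `run-test-common`.

```shell
#!/bin/bash
# Sketch: build the CNTK command-line override string one setting per line.
# numCPUThreads=2 is a placeholder; the actual test computes it via
# threadsPerInstance from run-test-common.
overrides="numCPUThreads=2 precision=double"
overrides+=" speechTrain=[SGD=[ParallelTrain=[DataParallelSGD=[gradientBits=64]]]]"
overrides+=" speechTrain=[SGD=[ParallelTrain=[DataParallelSGD=[useBufferedAsyncGradientAggregation=true]]]]"
overrides+=" speechTrain=[SGD=[ParallelTrain=[parallelizationStartEpoch=2]]]"
overrides+=" speechTrain=[SGD=[maxEpochs=4]]"
overrides+=" speechTrain=[SGD=[ParallelTrain=[syncPerfStats=5]]]"
echo "$overrides"
```

The assembled string is word-for-word the same argument the one-line version passes, so behavior is unchanged; each override simply becomes an independent, reviewable line.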
@@ -0,0 +1,41 @@
dataDir: ../../Data
tags:
  # running on every BVT job in 'S' (Speech) leg in Debug-GPU and Release-CPU configurations:
  - bvt-s (flavor=='debug') ^ (device=='cpu')
  # running unconditionally on every Nightly job in 'S' leg
  - nightly-s

testCases:
  Must train epochs in exactly same order and parameters for each MPI Rank:
    patterns:
      - ^MPI Rank {{integer}}
      - Starting Epoch {{integer}}
      - learning rate per sample = {{float}}
      - momentum = {{float}}

  Epochs must be finished with expected results for each MPI Rank:
    patterns:
      - ^MPI Rank {{integer}}
      - Finished Epoch[{{integer}} of {{integer}}]
      - TrainLossPerSample = {{float,tolerance=0%}}
      - EvalErrPerSample = {{float,tolerance=0%}}
      - AvgLearningRatePerSample = {{float,tolerance=0%}}

  Per-minibatch training results must match for each MPI Rank:
    patterns:
      - ^MPI Rank {{integer}}
      - Epoch[{{integer}} of {{integer}}]-Minibatch[{{integer}}-{{integer}}
      - SamplesSeen = {{integer}}
      - TrainLossPerSample = {{float,tolerance=0%}}
      - EvalErr[0]PerSample = {{float,tolerance=0%}}

  DataParallelSGD training parameters must match for each MPI Rank:
    patterns:
      - ^MPI Rank {{integer}}
      - Starting minibatch loop
      - DataParallelSGD training
      - MyRank = {{integer}}
      - NumNodes = 3
      - NumGradientBits = 64
      - distributed reading is ENABLED
      - BufferedAsyncGradientAggregation is ENABLED

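The `{{integer}}` and `{{float,tolerance=0%}}` tokens in these patterns act as typed wildcards that the test driver matches against the training log. A minimal Python sketch of how such placeholders could be expanded into regular expressions follows; it is an illustration only, not the actual CNTK test driver, and it omits the baseline comparison that the tolerance annotation implies.

```python
import re

# Matches {{integer}} or any {{float,...}} placeholder in a test pattern.
PLACEHOLDER = re.compile(r"\{\{(integer|float[^}]*)\}\}")

def pattern_to_regex(pattern):
    """Expand placeholders into a compiled regex (illustrative sketch only).

    The real CNTK test driver additionally checks matched float values
    against baseline logs within the declared tolerance.
    """
    parts = []
    last = 0
    for m in PLACEHOLDER.finditer(pattern):
        # Literal text between placeholders is escaped verbatim.
        parts.append(re.escape(pattern[last:m.start()]))
        if m.group(1) == "integer":
            parts.append(r"[-+]?\d+")        # signed whole number
        else:
            parts.append(r"[-+]?\d+(?:\.\d+)?")  # decimal number
        last = m.end()
    parts.append(re.escape(pattern[last:]))
    return re.compile("".join(parts))

line = "MPI Rank 0: Finished Epoch[2 of 4]"
pat = pattern_to_regex("Finished Epoch[{{integer}} of {{integer}}]")
print(bool(pat.search(line)))  # True
```

Escaping the literal text between placeholders matters here: patterns such as `Epoch[{{integer}} of {{integer}}]` contain `[` and `]`, which would otherwise be misread as a regex character class.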