The following is a brief directory structure and description for this example:
├── data # Data set directory
├── distribute_k8s # Distributed training related files
│   ├── distribute_k8s_BF16.yaml # k8s yaml to create a training job with BF16 feature
│   ├── distribute_k8s_FP32.yaml # k8s yaml to create a training job
│ └── launch.py # Script to set env for distributed training
├── README.md # Documentation
├── result # Output directory
│ └── README.md # Documentation describing output directory
└── train.py # Training script
Progressive Layered Extraction (PLE) was proposed by Tencent in 2020.
Please prepare the data set and the DeepRec environment.

- Manually
  - Follow dataset preparation to prepare the data set.
  - Download the code:
    ```
    git clone https://github.com/alibaba/DeepRec
    ```
  - Follow How to Build to build the DeepRec whl package, then install it:
    ```
    pip install $DEEPREC_WHL
    ```
- Docker (Recommended)
  ```
  docker pull alideeprec/deeprec-release-modelzoo:latest
  docker run -it alideeprec/deeprec-release-modelzoo:latest /bin/bash
  # In docker container
  cd /root/modelzoo/ple
  ```
Training:

```
python train.py

# Memory acceleration with jemalloc.
# The required ENV `MALLOC_CONF` is already set in the code.
LD_PRELOAD=./libjemalloc.so.2.5.1 python train.py
```
Use the argument `--bf16` to enable the DeepRec BF16 feature:

```
python train.py --bf16

# Memory acceleration with jemalloc.
# The required ENV `MALLOC_CONF` is already set in the code.
LD_PRELOAD=./libjemalloc.so.2.5.1 python train.py --bf16
```
In the community TensorFlow environment, use the argument `--tf` to disable all of DeepRec's features:

```
python train.py --tf
```
Use arguments to set up a custom configuration:
- DeepRec Features:
  - `START_STATISTIC_STEP` and `STOP_STATISTIC_STEP`: ENVs that configure CPU memory optimization. Already set to `100` & `110` in the code by default.
  - `--bf16`: Enable the DeepRec BF16 feature. FP32 is used by default.
  - `--emb_fusion`: Whether to enable embedding fusion. Default is `True`.
  - `--op_fusion`: Whether to enable the Auto Graph Fusion feature. Default is `True`.
  - `--optimizer`: Choose the optimizer for the deep model from ['adam', 'adamasync', 'adagraddecay', 'adagrad', 'gradientdescent']. `adam` is used by default.
  - `--smartstaged`: Whether to enable the SmartStaged feature of DeepRec. Default is `True`.
  - `--micro_batch`: Set the number of micro batches for Auto Micro Batch. Default is `0`. (Not really enabled)
  - `--ev`: Whether to enable DeepRec EmbeddingVariable. Default is `False`.
  - `--group_embedding`: Use the GroupEmbedding feature.
  - `--adaptive_emb`: Whether to enable Adaptive Embedding. Default is `False`.
  - `--ev_elimination`: Set Feature Elimination of the EmbeddingVariable feature. Options: [None, 'l2', 'gstep']. Default is `None`.
  - `--ev_filter`: Set Feature Filter of the EmbeddingVariable feature. Options: [None, 'counter', 'cbf']. Default is `None`.
  - `--dynamic_ev`: Whether to enable Dynamic-dimension Embedding Variable. Default is `False`. (Not really enabled)
  - `--multihash`: Whether to enable Multi-Hash Variable. Default is `False`. (Not really enabled)
  - `--incremental_ckpt`: Set the save interval for Incremental Checkpoints. Default is `0`.
  - `--workqueue`: Whether to enable WorkQueue. Default is `False`.
  - `--parquet_dataset`: Whether to enable ParquetDataset. Default is `True`.
  - `--parquet_dataset_shuffle`: Whether to enable shuffle for ParquetDataset. Default is `False`.
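For example, the memory-optimization ENVs can be overridden before launching. The `export` lines below mirror the defaults above; the commented `train.py` invocation is only illustrative and requires this repo's script:

```shell
# Override the CPU memory optimization statistics window (defaults: 100 and 110).
export START_STATISTIC_STEP=100
export STOP_STATISTIC_STEP=110

# Then launch with any combination of the feature flags, e.g.:
# python train.py --bf16 --optimizer adamasync
```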
- Basic Settings:
  - `--data_location`: Full path of the train & eval data. Default is `./data`.
  - `--steps`: Number of steps to train. With the default (`0`), the number of steps is computed based on the dataset size, with the number of epochs equal to 1000.
  - `--no_eval`: Do not evaluate the trained model on the eval dataset.
  - `--batch_size`: Batch size for training. Default is `2048`.
  - `--output_dir`: Full path of the output directory for logs and the saved model. Default is `./result`.
  - `--checkpoint`: Full path of the checkpoint output directory. Default is `$(OUTPUT_DIR)/model_$(MODEL_NAME)_$(TIMESTAMP)`.
  - `--save_steps`: Number of steps between saved checkpoints; zero to disable. Default is `None`.
  - `--seed`: Random seed. Default is `2021`.
  - `--timeline`: Number of steps between profile hooks that record the timeline; zero to disable. Default is `None`.
  - `--keep_checkpoint_max`: Maximum number of recent checkpoints to keep. Default is `1`.
  - `--learning_rate`: Learning rate for the network. Default is `0.1`.
  - `--l2_regularization`: L2 regularization for the model. Default is `None`.
  - `--protocol`: Protocol ('grpc', 'grpc++', 'star_server') used when starting servers in distributed training. Default is `grpc`.
  - `--inter`: Number of inter-op parallelism threads. Default is `0`.
  - `--intra`: Number of intra-op parallelism threads. Default is `0`.
  - `--input_layer_partitioner`: Slice size of the input layer partitioner (in MB). Default is `0`.
  - `--dense_layer_partitioner`: Slice size of the dense layer partitioner (in kB). Default is `0`.
  - `--tf`: Use the TF 1.15.5 API and disable all DeepRec features.
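As an illustration of how a subset of these flags might be declared, here is a minimal argparse sketch. The names and defaults are taken from the list above, but this is not the actual `train.py` parser:

```python
import argparse

def build_parser():
    # Illustrative subset of the flags described above; not the real train.py parser.
    parser = argparse.ArgumentParser(description="PLE training (flag sketch)")
    parser.add_argument("--bf16", action="store_true",
                        help="Enable DeepRec BF16 feature (FP32 by default)")
    parser.add_argument("--tf", action="store_true",
                        help="Use TF 1.15.5 API and disable DeepRec features")
    parser.add_argument("--optimizer", default="adam",
                        choices=["adam", "adamasync", "adagraddecay",
                                 "adagrad", "gradientdescent"])
    parser.add_argument("--batch_size", type=int, default=2048)
    parser.add_argument("--learning_rate", type=float, default=0.1)
    parser.add_argument("--seed", type=int, default=2021)
    parser.add_argument("--data_location", default="./data")
    parser.add_argument("--output_dir", default="./result")
    return parser

args = build_parser().parse_args(["--bf16", "--batch_size", "4096"])
print(args.bf16, args.batch_size, args.optimizer)  # → True 4096 adam
```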
Distributed training:
- Prepare a K8S cluster and a shared storage volume.
- Create a PVC (PersistentVolumeClaim) for the storage volume in the cluster.
- Prepare the docker image via the DockerFile.
- Edit the k8s yaml file:
  - `replicas`: numbers of chief, worker and ps.
  - `image`: where nodes can pull the docker image from.
  - `claimName`: PVC name.
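The fields to edit typically sit in a fragment like the following. This is only a hedged sketch of a TFJob-style spec; the authoritative structure is in the files under `distribute_k8s/`, and the `claimName` value here is a made-up example:

```yaml
# Illustrative fragment only; see distribute_k8s/distribute_k8s_FP32.yaml for the real file.
spec:
  tfReplicaSpecs:
    Worker:
      replicas: 2                 # numbers of chief / worker / ps
      template:
        spec:
          containers:
            - name: tensorflow
              image: alideeprec/deeprec-release-modelzoo:latest  # where nodes pull the image
          volumes:
            - name: data
              persistentVolumeClaim:
                claimName: deeprec-pvc                           # your PVC name
```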
The benchmark is performed on the Alibaba Cloud ECS general purpose instance family with high clock speeds - ecs.g8i.4xlarge.
- Hardware
  - Model name: Intel(R) Xeon(R) Platinum 8475B
  - CPU(s): 16
  - Socket(s): 1
  - Core(s) per socket: 8
  - Thread(s) per core: 2
  - Memory: 64G
- Software
  - Kernel: Linux version 5.15.0-58-generic (buildd@lcy02-amd64-101) (AMX patched)
  - OS: Ubuntu 22.04.2 LTS
  - GCC: 11.3.0
  - Docker: 20.10.21
Model | Framework | DType | Accuracy | AUC | Throughput |
---|---|---|---|---|---|
PLE | Community TensorFlow | FP32 | 1.000000 | 0.498449 | 21182.44 (baseline) |
PLE | DeepRec w/ oneDNN | FP32 | 1.000000 | 0.497499 | 28608.60 (1.35x) |
PLE | DeepRec w/ oneDNN | FP32+BF16 | 0.998046 | 0.499049 | 33542.94 (1.58x) |
- Community TensorFlow version is v1.15.5.
- Due to the small size of the dataset, the results do not converge, so ACC and AUC have limited reference value.
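The speedup factors in the table are simply throughput ratios against the community TensorFlow FP32 baseline:

```python
# Throughput values from the benchmark table above.
baseline = 21182.44
for throughput in (28608.60, 33542.94):
    # Speedup relative to the FP32 community TensorFlow baseline.
    print(f"{throughput / baseline:.2f}x")  # → 1.35x, then 1.58x
```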
The train & eval datasets use the Taobao dataset. We provide the dataset in two formats:

- CSV Format: put the data files taobao_train_data & taobao_test_data into ./data/. These files are available at Taobao CSV Dataset.
- Parquet Format: put the data files taobao_train_data.parquet & taobao_test_data.parquet into ./data/. These files are available at Taobao Parquet Dataset.
The dataset contains 20 columns, details as follows:
Name | clk | buy | pid | adgroup_id | cate_id | campaign_id | customer | brand | user_id | cms_segid | cms_group_id | final_gender_code | age_level | pvalue_level | shopping_level | occupation | new_user_class_level | tag_category_list | tag_brand_list | price |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Type | tf.int32 | tf.int32 | tf.string | tf.string | tf.string | tf.string | tf.string | tf.string | tf.string | tf.string | tf.string | tf.string | tf.string | tf.string | tf.string | tf.string | tf.string | tf.string | tf.string | tf.int32 |
The data in the `tag_category_list` and `tag_brand_list` columns are separated by `'|'`.
The 'clk' and 'buy' columns are used as labels.
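For example, the multi-value tag columns can be split on `'|'` before lookup. The row values below are made up for illustration:

```python
# Hypothetical row values for the two multi-value columns.
row = {
    "tag_category_list": "4281|4282|6432",
    "tag_brand_list": "955|28063",
}

# Split each '|'-separated field into a list of individual tags.
tags = {name: value.split("|") for name, value in row.items()}
print(tags["tag_category_list"])  # → ['4281', '4282', '6432']
```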
Input feature columns are as follows:
Column name | Hash bucket size | Embedding dimension |
---|---|---|
pid | 10 | 16 |
adgroup_id | 100000 | 16 |
cate_id | 10000 | 16 |
campaign_id | 100000 | 16 |
customer | 100000 | 16 |
brand | 100000 | 16 |
user_id | 100000 | 16 |
cms_segid | 100 | 16 |
cms_group_id | 100 | 16 |
age_level | 10 | 16 |
pvalue_level | 10 | 16 |
shopping_level | 10 | 16 |
occupation | 10 | 16 |
new_user_class_level | 10 | 16 |
tag_category_list | 100000 | 16 |
tag_brand_list | 100000 | 16 |
-------------------- | Num Buckets | ------------------- |
price | 50 | 16 |
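Conceptually, each string column is hashed into its bucket range before the 16-dimension embedding lookup. A stdlib-only sketch of the idea, using `crc32` as a stand-in for the framework's hash function (so indices will differ from TF's):

```python
from zlib import crc32

def hash_bucket(value: str, bucket_size: int) -> int:
    # Map a string feature to a bucket index in [0, bucket_size).
    # crc32 is a stand-in here; TF's hash_bucket uses a different function.
    return crc32(value.encode("utf-8")) % bucket_size

# E.g. a user_id-like value hashed into its 100000-bucket range.
idx = hash_bucket("some_user_id", 100000)
print(idx)
```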
- Distribute training model
- Benchmark
- DeepRec DockerFile