Distributed DLRM

This implementation is based on the DLRM dist_exp branch and adds Facebook features and optimizations.

Currently, you need to download the following PR to get the latest updates. (This will be fixed soon.)

Usage

Currently, it is launched with mpirun across multiple nodes. Either create a hostfile or pass a host list on the command line. Pass the DLRM parameters the same way as for the single-node master branch.

mpirun -np 128 -hostfile hostfile python dlrm_s_pytorch.py ...
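As a minimal sketch of the hostfile step, the snippet below writes a hostfile and derives the rank count for `-np`. The host names (`node01` to `node04`), the 8 slots per host, and the Open MPI `slots=` syntax are all assumptions for illustration; other MPI implementations use a different hostfile format, so check the documentation for the one you run.

```shell
# Hypothetical host names; replace with the nodes in your cluster.
hosts="node01 node02 node03 node04"
# Assumption: one rank per GPU and 8 GPUs per node.
slots_per_host=8

# Write the hostfile using Open MPI syntax: "<host> slots=<n>".
: > hostfile
for h in $hosts; do
    echo "$h slots=$slots_per_host" >> hostfile
done

# Total rank count to pass to mpirun -np.
num_hosts=$(wc -l < hostfile)
np=$((num_hosts * slots_per_host))

# Print the launch command instead of running it; fill in the DLRM parameters.
echo "mpirun -np $np -hostfile hostfile python dlrm_s_pytorch.py ..."
```

With 16 hosts at 8 slots each, the same arithmetic gives the 128 ranks used in the command above.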

Example

large_arch_emb=$(printf '14000%.0s' {1..64})
large_arch_emb=${large_arch_emb//"01"/"0-1"}

python dlrm_s_pytorch.py \
   --arch-sparse-feature-size=128 \
   --arch-mlp-bot="2000-1024-1024-128" \
   --arch-mlp-top="4096-4096-4096-1" \
   --arch-embedding-size=$large_arch_emb \
   --data-generation=random \
   --loss-function=bce \
   --round-targets=True \
   --learning-rate=0.1 \
   --mini-batch-size=2048 \
   --print-freq=10240 \
   --print-time \
   --test-mini-batch-size=16384 \
   --test-num-workers=16 \
   --num-indices-per-lookup-fixed=1 \
   --num-indices-per-lookup=100 \
   --arch-projection-size 30 \
   --use-gpu
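
The value passed to `--arch-embedding-size` is a dash-separated list with one entry per embedding table. The sketch below, which assumes bash (brace expansion, pattern substitution, and arrays are not POSIX sh), shows how the two `large_arch_emb` assignments produce that list and checks the result. The `tables` array is a helper introduced here for illustration only.

```shell
# printf reuses its format for every argument; %.0s prints zero characters
# of each one, so this emits "14000" 64 times with no separator.
large_arch_emb=$(printf '14000%.0s' {1..64})

# "01" occurs only where one "14000" ends and the next begins,
# so this inserts a dash at each of the 63 boundaries.
large_arch_emb=${large_arch_emb//01/0-1}

# Split on dashes to confirm the table count and the per-table size.
IFS='-' read -r -a tables <<< "$large_arch_emb"
echo "tables: ${#tables[@]}, rows per table: ${tables[0]}"
```

To change the number of tables, adjust the upper bound of `{1..64}`. The substitution only works for sizes that do not themselves contain the sequence `01`.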

Please check the README.md in the PR for more details.
