Evaluation worker (sjtu-marl#24)

* Update sc2 implementation
* Disambiguate the behavior of sampling
* Add qmix
* Ready for smarts
* VecEnv supports auto reset
* Use simple metric record
* Fix: exploration is not exploit
* tmp save
* update
* Update environment returns
* Fix: sequential rollout frames shifting
* Add configs for running smarts
* Update
* Update usage
* Fix bug: sequential rollout
* Identify with PID
* Fix: on adapter for smarts
* update ddpg yaml
* update
* ppo runable
* Enable info adapter for smarts
* Apply custom metric to simulation rollout
* fix bug in dqn (action_mask is None)
* update
* update
* Fix: no info
* Apply reward shaping
* bug fix: ppo ratio and exploration
* save params
* Light controller and dataset server
* temp test for Mujoco
* Add data shapes example (gym)
* Tag logging with log level
* Depart task request handling from server
* Alternative policy pool and register collected
* Formatted
* Launched new coordinator
* Temporary collect helper
* Task cache dataclass
* Remove useless state id parameter
* Gym passed
* update
* fix resource error and upgrade ray
* upgrade dependencies
* fix: request index
* update yamls
* deprecated environment
* rename filename
* apply safe load
* Mappo+gfootball (sjtu-marl#29)
* tmp saving, starting add mappo+gfootball; now add customized rollout function
* rollout function & policy & env tests over
* use formal action mask
* format code
* mappo run gfootball 5_vs_5 against bot, runnable version; fix & more feature need to be added
* Make SubProcVecEnv support AddEnv
* Feature 1.make num env in subprocVecEnv flexible 2.add gpu for AgentInterface during training ps. eager to add read/write limits
* add missing commits
* update ignores
* refactor rollout mechanism, and test
* test pass: google football base env
* enable wrappers for gfootball env
* support RNN state transmission
* avoid list numpy warning
* auto tensor caster for loss computing, test required
* test pass for offline dataset
* temp save
* temp save

Co-authored-by: ming <[email protected]>

* update
* make rollout/env/databackend test pass
* test mappo: in progress
* update configs and fix: episode info record
* Mappo+gfootball (sjtu-marl#30)
* tmp saving, starting add mappo+gfootball; now add customized rollout function
* rollout function & policy & env tests over
* use formal action mask
* format code
* mappo run gfootball 5_vs_5 against bot, runnable version; fix & more feature need to be added
* Make SubProcVecEnv support AddEnv
* Feature 1.make num env in subprocVecEnv flexible 2.add gpu for AgentInterface during training ps. eager to add read/write limits
* add missing commits
* update ignores
* refactor rollout mechanism, and test
* add vtrace
* test pass: google football base env
* enable wrappers for gfootball env
* support RNN state transmission
* avoid list numpy warning
* auto tensor caster for loss computing, test required
* test pass for offline dataset
* temp save
* temp save
* tmp saving
* report problem
* test_mappo.py pass still problems such as rnn_states never change? next step: check data and then run
* update
* all passed
* only mappo loss has some bugs
* enable info logging
* still debug
* dict rnn state is required
* reparameterize yamls

Co-authored-by: ziyuwan <[email protected]>

* temp save
* collect tests
* make vecenv support sequential envs
* add ignored env tests
* still in progress: sequential
* test pass
* check output from ctde
* expose local buffer config
* copy next frame for sequential games
* let remote logger launch in a cluster
* fix: data frame is broken in sequential games
* tmp saving: adding algorithm tests
* add tests for mappo/dqn
* add all tests for algorithm except 'bc', "SAC" implementaion has problems should be fixed
* reformatted and leave fixme for sac test
* init some test suits
* temp save
* in progress: add learner and server tests
* fix some bugs in sac to pass test
* test coverage of learner is 85%
* just make qmix pass tests, but it may be totally wrong.
* temp save
* updates
* remove deprecated test & add example test
* pre merge
* add test for task handler
* fix rnn states name
* ignore deprecated funcs
* improve gfootball test
* fix bug: no validate_func method
* add tests for env agent interface
* add test for evaluation
* update settings
* add test for managers
* udpate algorithm tests
* adding updates
* add tests for evaluation
* reformatted tests and SC2 implementation
* fix: no rollout_worker_manger
* add test for misc and modify misc
* add test for conv in dqn; modify dqn conv's code to make it more general
* offlinne dataset unit test
* :( fix bugs
* add tests model ---> 94% cov
* remove deprecated file
* reformatted and enable sc2
* env cov rise
* init utils test
* tmp: simplify rollout implementation with easy actor configs
* migrate exp tools
* init worker test
* remove useless comments
* delete deprecated file
* metric_type is not required by rollout_worker
* add template geneator for types
* async_rollout_worker_passed
* init parameter server tests
* parameter server test passed
* update test cases for payoff manager
* improve tests for postprocessors
* resolve task dispatching conflicts
* update
* fix: no optimization tasks executed because of wrong actorpool use
* fix: FakeStepping should remove callback
* fix: error raised when transformation in maatari
* fix and refine: tensorboard error and make ExprManager human readable
* disable verbose for test
* replace agent_reward with reward
* fix parameter list for all env test
* test examples
* remove info from env.timestep() which can be processed by np.asarray()
* fix qmix sc2, make it runnable
* make qmix tests passed
* add comments
* handle cc error and fix tests for SC2
* update settings
* update docs
* remove deprecated examples

Co-authored-by: Morning <[email protected]>
Co-authored-by: ziyuwan <[email protected]>
Co-authored-by: ziyuwan <[email protected]>
Co-authored-by: Hanjing Wang <[email protected]>
Coldison · Feb 18, 2022 · 53d64ba
1 parent 3785309
commit 53d64ba

Showing 345 changed files with 19,865 additions and 5,329 deletions.
diff --git a/.coveragerc b/.coveragerc
@@ -0,0 +1,40 @@
+[run]
+branch = False
+omit = 
+	# ignore typings and base class
+	*/__init__.py
+	malib/agent/agent_interface.py
+	malib/backend/coordinator/base_coordinator.py
+	malib/envs/smarts/*
+	malib/envs/env.py
+	malib/rollout/base_worker.py
+	malib/evaluator/base_evaluator.py
+	# ignore imitation training suit
+	malib/algorithm/imitation/*
+	malib/algorithm/common/reward.py
+	# ignore random policy, just for test
+	malib/algorithm/random/*
+	malib/utils/*
+	malib/rpc/*
+	# ignore cli
+	malib/runner.py
+	# no usage
+	malib/rollout/sync_rollout_worker.py
+
+
+[report]
+skip_empty = True
+omit =
+	malib/envs/smarts/*
+	malib/algorithm/imitation/*
+	malib/settings.py
+	malib/registration.py
+	malib/backend/coordinator/light_server.py
+	# deprecated
+	malib/backend/datapool/data_array.py
+	malib/algorithm/common/reward.py
+	malib/utils/*
+	malib/rpc/*
+
+[html]
+directory = cov_html
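
The `omit` entries in the `.coveragerc` above are multi-line INI values, with comment lines mixed into the lists. As a rough illustration of how such a list can be read back and checked against a path, here is a minimal Python sketch. It uses a shortened excerpt rather than the full file, the helper names `omit_patterns` and `is_omitted` are hypothetical, and `fnmatch` only approximates whatever matching coverage.py applies internally.

```python
import configparser
import fnmatch

# Shortened, illustrative excerpt of the config shown in the diff (not the full file).
SAMPLE_RC = """\
[run]
branch = False
omit =
    # ignore typings and base class
    */__init__.py
    malib/utils/*

[report]
skip_empty = True
omit =
    malib/settings.py
"""


def omit_patterns(rc_text, section):
    """Return the non-empty patterns listed under `omit` in one section."""
    parser = configparser.ConfigParser()
    parser.read_string(rc_text)
    raw = parser.get(section, "omit", fallback="")
    # configparser drops full-line comments, so only pattern lines remain.
    return [line.strip() for line in raw.splitlines() if line.strip()]


def is_omitted(path, patterns):
    """Approximate check: does any glob pattern match the path?"""
    return any(fnmatch.fnmatch(path, pattern) for pattern in patterns)


run_omit = omit_patterns(SAMPLE_RC, "run")
print(run_omit)
print(is_omitted("malib/utils/logger.py", run_omit))
print(is_omitted("malib/rollout/rollout_worker.py", run_omit))
```

A check like this can help spot paths that are excluded from the report by accident, for example a whole package hidden behind a broad pattern such as `malib/utils/*`.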
diff --git a/.gitignore b/.gitignore
@@ -133,4 +133,5 @@ dmypy.json
 .idea
 _build
 logs
-demos
+demos
+prof/
diff --git a/.gitmodules b/.gitmodules
@@ -0,0 +1,3 @@
+[submodule "malib/envs/smarts/_env"]
+	path = malib/envs/smarts/_env
+	url = https://github.com/huawei-noah/SMARTS.git
diff --git a/Makefile b/Makefile
@@ -7,7 +7,7 @@
 #
 #.PHONY: profiling
 #profiling:
-
+# run visualization for cov_html: ruby -run -ehttpd . -p8000
 .PHONY: clean
 clean:
 	rm -rf ./logs/*
@@ -24,3 +24,40 @@ docs:
 .PHONY: rm-pycache
 rm-pycache:
 	find . -type f -name '*.py[co]' -delete -o -type d -name __pycache__ -delete
+
+.PHONY: test
+test:
+	pytest --cov-config=.coveragerc --cov=malib --cov-report html --doctest-modules tests
+	rm -f .coverage.*
+
+.PHONY: test-dataset
+test-dataset:
+	pytest -v --doctest-modules tests/dataset
+
+.PHONY: test-parameter-server
+test-parameter-server:
+	pytest -v --doctest-modules tests/paramter_server
+
+.PHONY: test-coordinator
+test-coordinator:
+	pytest -v --doctest-modules tests/coordinator
+
+.PHONY: test-backend
+test-backend: test-dataset test-parameter-server test-coordinator
+
+.PHONY: test-algorithm
+test-algorithm:
+	pytest -v --doctest-modules tests/algorithm
+
+.PHONY: test-rollout
+test-rollout:
+	pytest -v --doctest-modules tests/rollout
+
+.PHONY: test-agent
+test-agent:
+	pytest --doctest-modules tests/agent
+
+.PHONY: test-env-api
+test-env-api:
+	pytest -v --doctest-modules tests/env_api
+
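
Each test target in the Makefile above is paired with a `.PHONY` declaration, and the two names have to agree for the declaration to have any effect. The following Python sketch shows one way to lint that pairing. The function name `unmatched_phony` and the sample Makefile text are made up for illustration, and the parser handles only simple one-target rules, not the full Make syntax.

```python
import re


def unmatched_phony(makefile_text):
    """Return names declared under .PHONY that have no rule of the same name."""
    phony = set()
    targets = set()
    for line in makefile_text.splitlines():
        # Skip recipe lines (tab-indented) and comments.
        if line.startswith(("\t", "#")):
            continue
        declared = re.match(r"^\.PHONY\s*:\s*(.*)$", line)
        if declared:
            phony.update(declared.group(1).split())
            continue
        # A simple rule header: "name:" not followed by "=" (which would be ":=").
        rule = re.match(r"^([A-Za-z0-9_.-]+)\s*:(?!=)", line)
        if rule:
            targets.add(rule.group(1))
    return phony - targets


# Hypothetical Makefile text with one deliberately misspelled declaration.
SAMPLE = "\n".join([
    ".PHONY: test",
    "test:",
    "\tpytest -v tests",
    ".PHONY: test-rollot",
    "test-rollout:",
    "\tpytest -v tests/rollout",
])

print(unmatched_phony(SAMPLE))
```

A misspelled `.PHONY` name does not break the build. It only means that a file or directory with the same name as the target could make `make` treat the target as up to date.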
diff --git a/docs/source/api/malib.agent.indepdent_irl_agent.rst b/docs/source/api/malib.agent.indepdent_irl_agent.rst
@@ -0,0 +1,7 @@
+malib.agent.indepdent\_irl\_agent module
+========================================
+
+.. automodule:: malib.agent.indepdent_irl_agent
+   :members:
+   :undoc-members:
+   :show-inheritance:
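
The API stubs added in this commit all share the same shape: a title with escaped underscores, an underline of matching length, and an `automodule` directive with three options. A minimal Python sketch that renders such a stub is given below. The function name `module_stub` is hypothetical, and the sketch makes no claim about which tool produced the files in the repository.

```python
def module_stub(module):
    """Render a reStructuredText stub for one module in the layout used above."""
    # Underscores are escaped in the title so reST does not read them as markup.
    title = module.replace("_", r"\_") + " module"
    lines = [
        title,
        "=" * len(title),  # the underline must be at least as long as the title
        "",
        f".. automodule:: {module}",
        "   :members:",
        "   :undoc-members:",
        "   :show-inheritance:",
    ]
    return "\n".join(lines) + "\n"


print(module_stub("malib.algorithm.qmix.q_mixer"))
```

Computing the underline from the escaped title keeps the two in step, which matters because Sphinx warns when a title underline is too short.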
diff --git a/docs/source/api/malib.agent.rst b/docs/source/api/malib.agent.rst
@@ -17,4 +17,5 @@ Submodules
    malib.agent.centralized_agent
    malib.agent.ctde_agent
    malib.agent.indepdent_agent
+   malib.agent.indepdent_irl_agent
    malib.agent.sync_agent
diff --git a/docs/source/api/malib.algorithm.common.reward.rst b/docs/source/api/malib.algorithm.common.reward.rst
@@ -0,0 +1,7 @@
+malib.algorithm.common.reward module
+====================================
+
+.. automodule:: malib.algorithm.common.reward
+   :members:
+   :undoc-members:
+   :show-inheritance:
diff --git a/docs/source/api/malib.algorithm.common.rst b/docs/source/api/malib.algorithm.common.rst
@@ -16,4 +16,5 @@ Submodules
    malib.algorithm.common.misc
    malib.algorithm.common.model
    malib.algorithm.common.policy
+   malib.algorithm.common.reward
    malib.algorithm.common.trainer
diff --git a/docs/source/api/malib.algorithm.discrete_sac.loss.rst b/docs/source/api/malib.algorithm.discrete_sac.loss.rst
@@ -0,0 +1,7 @@
+malib.algorithm.discrete\_sac.loss module
+=========================================
+
+.. automodule:: malib.algorithm.discrete_sac.loss
+   :members:
+   :undoc-members:
+   :show-inheritance:
diff --git a/docs/source/api/malib.algorithm.discrete_sac.policy.rst b/docs/source/api/malib.algorithm.discrete_sac.policy.rst
@@ -0,0 +1,7 @@
+malib.algorithm.discrete\_sac.policy module
+===========================================
+
+.. automodule:: malib.algorithm.discrete_sac.policy
+   :members:
+   :undoc-members:
+   :show-inheritance:
diff --git a/docs/source/api/malib.algorithm.discrete_sac.rst b/docs/source/api/malib.algorithm.discrete_sac.rst
@@ -0,0 +1,17 @@
+malib.algorithm.discrete\_sac package
+=====================================
+
+.. automodule:: malib.algorithm.discrete_sac
+   :members:
+   :undoc-members:
+   :show-inheritance:
+
+Submodules
+----------
+
+.. toctree::
+   :maxdepth: 2
+
+   malib.algorithm.discrete_sac.loss
+   malib.algorithm.discrete_sac.policy
+   malib.algorithm.discrete_sac.trainer
diff --git a/docs/source/api/malib.algorithm.discrete_sac.trainer.rst b/docs/source/api/malib.algorithm.discrete_sac.trainer.rst
@@ -0,0 +1,7 @@
+malib.algorithm.discrete\_sac.trainer module
+============================================
+
+.. automodule:: malib.algorithm.discrete_sac.trainer
+   :members:
+   :undoc-members:
+   :show-inheritance:
diff --git a/docs/source/api/malib.algorithm.imitation.advirl.loss.rst b/docs/source/api/malib.algorithm.imitation.advirl.loss.rst
@@ -0,0 +1,7 @@
+malib.algorithm.imitation.advirl.loss module
+============================================
+
+.. automodule:: malib.algorithm.imitation.advirl.loss
+   :members:
+   :undoc-members:
+   :show-inheritance:
diff --git a/docs/source/api/malib.algorithm.imitation.advirl.reward.rst b/docs/source/api/malib.algorithm.imitation.advirl.reward.rst
@@ -0,0 +1,7 @@
+malib.algorithm.imitation.advirl.reward module
+==============================================
+
+.. automodule:: malib.algorithm.imitation.advirl.reward
+   :members:
+   :undoc-members:
+   :show-inheritance:
diff --git a/docs/source/api/malib.algorithm.imitation.advirl.rst b/docs/source/api/malib.algorithm.imitation.advirl.rst
@@ -0,0 +1,17 @@
+malib.algorithm.imitation.advirl package
+========================================
+
+.. automodule:: malib.algorithm.imitation.advirl
+   :members:
+   :undoc-members:
+   :show-inheritance:
+
+Submodules
+----------
+
+.. toctree::
+   :maxdepth: 2
+
+   malib.algorithm.imitation.advirl.loss
+   malib.algorithm.imitation.advirl.reward
+   malib.algorithm.imitation.advirl.trainer
diff --git a/docs/source/api/malib.algorithm.imitation.advirl.trainer.rst b/docs/source/api/malib.algorithm.imitation.advirl.trainer.rst
@@ -0,0 +1,7 @@
+malib.algorithm.imitation.advirl.trainer module
+===============================================
+
+.. automodule:: malib.algorithm.imitation.advirl.trainer
+   :members:
+   :undoc-members:
+   :show-inheritance:
diff --git a/docs/source/api/malib.algorithm.imitation.bc.loss.rst b/docs/source/api/malib.algorithm.imitation.bc.loss.rst
@@ -0,0 +1,7 @@
+malib.algorithm.imitation.bc.loss module
+========================================
+
+.. automodule:: malib.algorithm.imitation.bc.loss
+   :members:
+   :undoc-members:
+   :show-inheritance:
diff --git a/docs/source/api/malib.algorithm.imitation.bc.policy.rst b/docs/source/api/malib.algorithm.imitation.bc.policy.rst
@@ -0,0 +1,7 @@
+malib.algorithm.imitation.bc.policy module
+==========================================
+
+.. automodule:: malib.algorithm.imitation.bc.policy
+   :members:
+   :undoc-members:
+   :show-inheritance:
diff --git a/docs/source/api/malib.algorithm.imitation.bc.rst b/docs/source/api/malib.algorithm.imitation.bc.rst
@@ -0,0 +1,17 @@
+malib.algorithm.imitation.bc package
+====================================
+
+.. automodule:: malib.algorithm.imitation.bc
+   :members:
+   :undoc-members:
+   :show-inheritance:
+
+Submodules
+----------
+
+.. toctree::
+   :maxdepth: 2
+
+   malib.algorithm.imitation.bc.loss
+   malib.algorithm.imitation.bc.policy
+   malib.algorithm.imitation.bc.trainer
diff --git a/docs/source/api/malib.algorithm.imitation.bc.trainer.rst b/docs/source/api/malib.algorithm.imitation.bc.trainer.rst
@@ -0,0 +1,7 @@
+malib.algorithm.imitation.bc.trainer module
+===========================================
+
+.. automodule:: malib.algorithm.imitation.bc.trainer
+   :members:
+   :undoc-members:
+   :show-inheritance:
diff --git a/docs/source/api/malib.algorithm.imitation.imitation_trainer.rst b/docs/source/api/malib.algorithm.imitation.imitation_trainer.rst
@@ -0,0 +1,7 @@
+malib.algorithm.imitation.imitation\_trainer module
+===================================================
+
+.. automodule:: malib.algorithm.imitation.imitation_trainer
+   :members:
+   :undoc-members:
+   :show-inheritance:
diff --git a/docs/source/api/malib.algorithm.imitation.rst b/docs/source/api/malib.algorithm.imitation.rst
@@ -0,0 +1,24 @@
+malib.algorithm.imitation package
+=================================
+
+.. automodule:: malib.algorithm.imitation
+   :members:
+   :undoc-members:
+   :show-inheritance:
+
+Subpackages
+-----------
+
+.. toctree::
+   :maxdepth: 2
+
+   malib.algorithm.imitation.advirl
+   malib.algorithm.imitation.bc
+
+Submodules
+----------
+
+.. toctree::
+   :maxdepth: 2
+
+   malib.algorithm.imitation.imitation_trainer
diff --git a/docs/source/api/malib.algorithm.mappo.actor_critic.rst b/docs/source/api/malib.algorithm.mappo.actor_critic.rst
@@ -0,0 +1,7 @@
+malib.algorithm.mappo.actor\_critic module
+==========================================
+
+.. automodule:: malib.algorithm.mappo.actor_critic
+   :members:
+   :undoc-members:
+   :show-inheritance:
diff --git a/docs/source/api/malib.algorithm.mappo.data_generator.rst b/docs/source/api/malib.algorithm.mappo.data_generator.rst
@@ -0,0 +1,7 @@
+malib.algorithm.mappo.data\_generator module
+============================================
+
+.. automodule:: malib.algorithm.mappo.data_generator
+   :members:
+   :undoc-members:
+   :show-inheritance:
diff --git a/docs/source/api/malib.algorithm.mappo.loss.rst b/docs/source/api/malib.algorithm.mappo.loss.rst
@@ -0,0 +1,7 @@
+malib.algorithm.mappo.loss module
+=================================
+
+.. automodule:: malib.algorithm.mappo.loss
+   :members:
+   :undoc-members:
+   :show-inheritance:
diff --git a/docs/source/api/malib.algorithm.mappo.policy.rst b/docs/source/api/malib.algorithm.mappo.policy.rst
@@ -0,0 +1,7 @@
+malib.algorithm.mappo.policy module
+===================================
+
+.. automodule:: malib.algorithm.mappo.policy
+   :members:
+   :undoc-members:
+   :show-inheritance:
diff --git a/docs/source/api/malib.algorithm.mappo.rst b/docs/source/api/malib.algorithm.mappo.rst
@@ -0,0 +1,21 @@
+malib.algorithm.mappo package
+=============================
+
+.. automodule:: malib.algorithm.mappo
+   :members:
+   :undoc-members:
+   :show-inheritance:
+
+Submodules
+----------
+
+.. toctree::
+   :maxdepth: 2
+
+   malib.algorithm.mappo.actor_critic
+   malib.algorithm.mappo.data_generator
+   malib.algorithm.mappo.loss
+   malib.algorithm.mappo.policy
+   malib.algorithm.mappo.trainer
+   malib.algorithm.mappo.utils
+   malib.algorithm.mappo.vtrace
diff --git a/docs/source/api/malib.algorithm.mappo.trainer.rst b/docs/source/api/malib.algorithm.mappo.trainer.rst
@@ -0,0 +1,7 @@
+malib.algorithm.mappo.trainer module
+====================================
+
+.. automodule:: malib.algorithm.mappo.trainer
+   :members:
+   :undoc-members:
+   :show-inheritance:
diff --git a/docs/source/api/malib.algorithm.mappo.utils.rst b/docs/source/api/malib.algorithm.mappo.utils.rst
@@ -0,0 +1,7 @@
+malib.algorithm.mappo.utils module
+==================================
+
+.. automodule:: malib.algorithm.mappo.utils
+   :members:
+   :undoc-members:
+   :show-inheritance:
diff --git a/docs/source/api/malib.algorithm.mappo.vtrace.rst b/docs/source/api/malib.algorithm.mappo.vtrace.rst
@@ -0,0 +1,7 @@
+malib.algorithm.mappo.vtrace module
+===================================
+
+.. automodule:: malib.algorithm.mappo.vtrace
+   :members:
+   :undoc-members:
+   :show-inheritance:
diff --git a/docs/source/api/malib.algorithm.ppo.rst b/docs/source/api/malib.algorithm.ppo.rst
@@ -14,4 +14,4 @@ Submodules
 
    malib.algorithm.ppo.loss
    malib.algorithm.ppo.policy
-   malib.algorithm.ppo.ppo_trainer
+   malib.algorithm.ppo.trainer
diff --git a/docs/source/api/malib.algorithm.ppo.trainer.rst b/docs/source/api/malib.algorithm.ppo.trainer.rst
@@ -0,0 +1,7 @@
+malib.algorithm.ppo.trainer module
+==================================
+
+.. automodule:: malib.algorithm.ppo.trainer
+   :members:
+   :undoc-members:
+   :show-inheritance:
diff --git a/docs/source/api/malib.algorithm.qmix.q_mixer.rst b/docs/source/api/malib.algorithm.qmix.q_mixer.rst
@@ -0,0 +1,7 @@
+malib.algorithm.qmix.q\_mixer module
+====================================
+
+.. automodule:: malib.algorithm.qmix.q_mixer
+   :members:
+   :undoc-members:
+   :show-inheritance:
diff --git a/docs/source/api/malib.algorithm.qmix.rst b/docs/source/api/malib.algorithm.qmix.rst
@@ -13,5 +13,5 @@ Submodules
    :maxdepth: 2
 
    malib.algorithm.qmix.loss
-   malib.algorithm.qmix.policy
+   malib.algorithm.qmix.q_mixer
    malib.algorithm.qmix.trainer