Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Naming convention - Policy #40

Closed
kkshyu opened this issue Oct 23, 2016 · 4 comments
Closed

Naming convention - Policy #40

kkshyu opened this issue Oct 23, 2016 · 4 comments

Comments

@kkshyu
Copy link
Contributor

kkshyu commented Oct 23, 2016

Environment and Policy both contain act method, but they do quite different things.
In my opinion, act is a verb to perform sth. Therefore, in Policy abstract class, it should be a noun action just like bestAction. However, chooseAction and chooseBestAction are good, too.

class Policy(object):
    """Abstract class for all policies, i.e. objects that can take any space as input, and output an action.
    """

    def __init__(self, q_network, n_actions,random_state):
        self.q_network = q_network
        self.n_actions = n_actions
        self.random_state = random_state

        pass

    def bestAction(self, state):
        """ Returns the best Action
        """
        action = self.q_network.chooseBestAction(state)
        V = max(self.q_network.qValues(state))
        return action, V

    def act(self, state):
        """Main method of the Policy class. It can be called by agent.py, given a state,
        and should return a valid action w.r.t. the environment given to the constructor.
        """
        raise NotImplementedError()
@VinF
Copy link
Owner

VinF commented Oct 24, 2016

I agree that chooseAction may be more appropriate so as to avoid the confusion with the method in Environment(however I don't think chooseBestAction would be appropriate because the policies have to deal with the exploration/exploitation dilemma which is done within that method).

@kkshyu
Copy link
Contributor Author

kkshyu commented Oct 25, 2016

Did you mean chooseBestAtction have already dealt with exploration/exploitation dilemma?
I thought the dilemma should be implemented in act(chooseAction).
In my opinion, the method bestAction in policies is similar with chooseBestAction in q-network. Actually I don't understand the purpose of this method.

@VinF
Copy link
Owner

VinF commented Oct 25, 2016

Ok, I think I misunderstood your first message. Either we have act--->chooseAction and bestAction--->chooseBestAction or we keep bestAction and we change act--->action. (of course, chooseBestAction or bestAction has already dealt with exploration/exploitation)
And indeed bestAction in policies returns currently directly chooseBestAction in q-network. It's an additional encapsulation but if it's not required in the future, the best is probably to remove it, indeed.
Thanks for the feedback! Either you can do a PR or I'll do the changes.

@kkshyu kkshyu closed this as completed Oct 26, 2016
@kkshyu
Copy link
Contributor Author

kkshyu commented Oct 26, 2016

#41

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants